Very ML
State-of-the-art Machine Learning News Feed
/r/MachineLearning
последний пост 7 часов назад
[D] Parallel Reasoning Streams: Making LLMs Think Wider, Not Just Longer
[D] Parallel Reasoning Streams: Making LLMs Think Wider, Not Just Longer

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

7 часов назад @ reddit.com
[D] How do you structure you AI projects to avoid drifts?
[D] How do you structure you AI projects to avoid drifts?

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

11 часов назад @ reddit.com
[D] On the essence of the diffusion model
[D] On the essence of the diffusion model

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

13 часов назад @ reddit.com
[D] HTTP Anomaly Detection Research ?
[D] HTTP Anomaly Detection Research ?

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

13 часов назад @ reddit.com
[D] GPT confidently generated a fake NeurIPS architecture. Loss function, code, the works. How does this get fixed?
[D] GPT confidently generated a fake NeurIPS architecture. Loss function, code, the works. How does this get fixed? [D] GPT confidently generated a fake NeurIPS architecture. Loss function, code, the works. How does this get fixed?

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

15 часов назад @ reddit.com
[D] The More We Heal, The More AI Heals.
[D] The More We Heal, The More AI Heals.

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

17 часов назад @ reddit.com
[D] What's the SOTA audio classification model/method?
[D] What's the SOTA audio classification model/method?

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

20 часов назад @ reddit.com
[P] I built an open plant species classification model trained on 2M+ iNaturalist images
[P] I built an open plant species classification model trained on 2M+ iNaturalist images

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

21 час назад @ reddit.com
[D] The sudden improvements in ARC-2 by frontier models
[D] The sudden improvements in ARC-2 by frontier models

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day назад @ reddit.com
[D] Interview preparation for research scientist/engineer or Member of Technical staff position for frontier labs
[D] Interview preparation for research scientist/engineer or Member of Technical staff position for frontier labs

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day, 1 hour назад @ reddit.com
[D] Examining Author Counts and Citation Counts at ML Conferences
[D] Examining Author Counts and Citation Counts at ML Conferences

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day, 5 hours назад @ reddit.com
[D] Well I'm kinda stuck here...
[D] Well I'm kinda stuck here...

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day, 7 hours назад @ reddit.com
[R] Reproduced "Scale-Agnostic KAG" paper, found the PR formula is inverted compared to its source
[R] Reproduced "Scale-Agnostic KAG" paper, found the PR formula is inverted compared to its source

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day, 9 hours назад @ reddit.com
[D] ARR October 2026 Discussion
[D] ARR October 2026 Discussion

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day, 10 hours назад @ reddit.com
[R] Found the same information-dynamics (entropy spike → ~99% retention → power-law decay) across neural nets, CAs, symbolic models, and quantum sims. Looking for explanations or ways to break it.
[R] Found the same information-dynamics (entropy spike → ~99% retention → power-law decay) across neural nets, CAs, symbolic models, and quantum sims. Looking for explanations or ways to break it.

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day, 11 hours назад @ reddit.com
Towards Data Science
последний пост 11 часов назад
The Machine Learning “Advent Calendar” Day 12: Logistic Regression in Excel
The Machine Learning “Advent Calendar” Day 12: Logistic Regression in Excel The Machine Learning “Advent Calendar” Day 12: Logistic Regression in Excel

If you already know this model, here is a question for you:Is Logistic Regression a regressor or a classifier?

There are gamma regression, logistic regression, Poisson regression…perspective, it is a regression.

There are gamma regression, logistic regression, Poisson regression… In the machine learning perspective, it is used for classification.

How Logistic Regression worksWe start with : ax + b, just like the linear regression.

Gradient CalculationFor logistic regression, the gradients of the average log-loss follow a very simple structure.

11 часов назад @ towardsdatascience.com
Decentralized Computation: The Hidden Principle Behind Deep Learning
Decentralized Computation: The Hidden Principle Behind Deep Learning Decentralized Computation: The Hidden Principle Behind Deep Learning

Whereas today’s most successful AI systems – Deep Neural Networks – look very different.

Deep Learning: A Battle between Centralization and DecentralizationNow let’s look at both sides of this bridge.

However, in Deep Neural Networks, parameters are organized in layers which are stacked on top of each other.

Well, as we showed above, in Deep Neural Networks it’s via the propagation of gradients (gradient flow).

Deep learning.

12 часов назад @ towardsdatascience.com
EDA in Public (Part 1): Cleaning and Exploring Sales Data with Pandas
EDA in Public (Part 1): Cleaning and Exploring Sales Data with Pandas

Hey everyone! Welcome to the start of a major data journey that I’m calling “EDA in Public.” For those who know me, I believe the best way to learn anything is to tackle a real-world problem and share the entire messy process — including mistakes, victories, and everything in between. If you’ve been looking to level up […]

The post EDA in Public (Part 1): Cleaning and Exploring Sales Data with Pandas appeared first on Towards Data Science.

15 часов назад @ towardsdatascience.com
Spectral Community Detection in Clinical Knowledge Graphs
Spectral Community Detection in Clinical Knowledge Graphs Spectral Community Detection in Clinical Knowledge Graphs

You can reproduce the full pipeline, from synthetic note generation to Neo4j graph analysis and spectral computations, in Google Colab and/or a local Python environment.

Methodology OverviewIn this section we outline the steps of the project, from synthetic clinical text generation to community detection and spectral analysis.

Apply the Leiden community detection algorithm to identify clusters of patients that share related conditions.

Since this is a bipartite graph (patients connected through shared diseases), the maximum separation between any two patients is 4 additional patients.

Computing the Algebraic Connectivity for a Sample Leiden CommunityWe illustrate the process using Leiden co…

18 часов назад @ towardsdatascience.com
The Machine Learning “Advent Calendar” Day 11: Linear Regression in Excel
The Machine Learning “Advent Calendar” Day 11: Linear Regression in Excel The Machine Learning “Advent Calendar” Day 11: Linear Regression in Excel

Now, when I say, Linear Regression, I mean Ordinary Least Square Linear Regression.

Their argument is that machine learning is a “new” field, while Linear Regression existed long before, so it cannot be considered ML.

In other words, Linear Regression is one of the oldest models, but also one of the most fundamental in machine learning.

In this article, our Linear Regression model has exactly 2 weights.

Linear Regression itself can separate two classes (0 and 1), but more robust versions lead to Logistic Regression and SVM.

1 day, 12 hours назад @ towardsdatascience.com
Drawing Shapes with the Python Turtle Module
Drawing Shapes with the Python Turtle Module Drawing Shapes with the Python Turtle Module

In this article, we will use Python to create graphical outputs by using the Python module Turtle.

Python’s Turtle ModuleThe turtle module in Python is a module that allows graphical outputs through code.

Let us create our turtle and screen objects:from turtle import Turtle, Screen my_turtle = Turtle() print(my_turtle)Turtle Object created (Image by Author)As can be seen from the screenshot above, the turtle object has been created, and its location is defined.

Building the AlgorithmWe have used the turtle module to draw a triangle, a square, and a pentagon.

This is a basic-level example of what could be done with the Turtle module.

1 day, 13 hours назад @ towardsdatascience.com
7 Pandas Performance Tricks Every Data Scientist Should Know
7 Pandas Performance Tricks Every Data Scientist Should Know 7 Pandas Performance Tricks Every Data Scientist Should Know

A recent State of Data Science survey reports that 77% of practitioners use Pandas for data exploration and processing.

Specify dtypes upfrontWhen you force Pandas to guess data types, it has to scan the entire file.

That’s why the next trick is all about choosing the right data types to make your Pandas operations faster and lighter.

And when you’re using less memory, operations like filtering and joins run faster because there’s less data for Pandas to shuffle around.

If you build good habits around how you write and structure your Pandas code, performance becomes much less of a problem.

1 day, 15 hours назад @ towardsdatascience.com
How Agent Handoffs Work in Multi-Agent Systems
How Agent Handoffs Work in Multi-Agent Systems How Agent Handoffs Work in Multi-Agent Systems

An agentic handoff is the moment when an agent directly and dynamically passes control to another agent after finishing its work.

Here is the flow:Supervisor agent analyzes user intent and decides that it needs assistance from the research agent Supervisor passes control (and state*) to the research agent The research agent performs the task and decides whether to handoff back to the supervisor agent or end the conversation.

(4) Handoffs in LangGraphThere are two mechanisms for agent handoffs in LangGraph:Conditional edgesCommand object(4.1) Conditional Edges (Static Routing-Based Handoff)A conditional edge is the classic graph-routing method for handing off control between agents.

This is …

1 day, 16 hours назад @ towardsdatascience.com
The Machine Learning “Advent Calendar” Day 10: DBSCAN in Excel
The Machine Learning “Advent Calendar” Day 10: DBSCAN in Excel The Machine Learning “Advent Calendar” Day 10: DBSCAN in Excel

Summary in 3 StepsDBSCAN asks three simple questions for each point:How many neighbors do you have within a small radius (eps)?

Here is the summary of the DBSCAN algorithm in 3 steps:DBSCAN in excel – all images by authorLet us begin step by step.

It is the smallest number of neighbors a point must have (inside the eps radius) to be considered a Core point.

Once the Core points are known, we simply check which points are density-reachable from them.

In Excel, we can represent this as a simple connectivity table that shows which points are linked through Core neighbors.

2 days, 12 hours назад @ towardsdatascience.com
How to Maximize Agentic Memory for Continual Learning
How to Maximize Agentic Memory for Continual Learning How to Maximize Agentic Memory for Continual Learning

In this article, I’ll provide a high-level overview of how I achieve continual learning with LLMs by continually updating the agents.md file.

AGI and continual learningI would also like to add a note on AGI and continual learning.

True continual learning is sometimes said to be one of the last hindrances to achieving AGI.

Unfortunately, true continual learning is not achieved yet, but it’s likely a capability we’ll see more of in the coming years.

ConclusionIn this article, I’ve talked about how to become a far more effective engineer by utilizing agents.md for continual learning.

2 days, 13 hours назад @ towardsdatascience.com
Don’t Build an ML Portfolio Without These Projects
Don’t Build an ML Portfolio Without These Projects Don’t Build an ML Portfolio Without These Projects

Think of these simple projects as the “warm-up reps” at the gym.

Aim to build a wide range of projects, each using different tools, datasets, and machine learning algorithms.

GitHub – egorhowell/Data-Science-Projects: A selection of small Data Science Projects.

A selection of small Data Science Projects.

I recommend you start blogging here on Towards Data Science, as it’s very easy to use, has a large data science community, and already has an in-built audience.

2 days, 15 hours назад @ towardsdatascience.com
Optimizing PyTorch Model Inference on AWS Graviton
Optimizing PyTorch Model Inference on AWS Graviton Optimizing PyTorch Model Inference on AWS Graviton

AWS GravitonAWS Graviton is a family of processors based on Arm Neoverse CPUs, that are custom designed and built by AWS for optimal price-performance and energy efficiency.

The intention of this post is to demonstrate tips for boosting performance on an AWS Graviton instance.

Importantly, our intention is not to draw a comparison between AWS Graviton and alternative chips, nor is it to advocate for the use of one chip over the other.

Model CompilationThe support of PyTorch compilation for AWS Graviton is an area of focused effort of the AWS Graviton team.

Both libraries include dedicated support for running on AWS Graviton (e.g., see here and here).

2 days, 16 hours назад @ towardsdatascience.com
The Machine Learning “Advent Calendar” Day 9: LOF in Excel
The Machine Learning “Advent Calendar” Day 9: LOF in Excel The Machine Learning “Advent Calendar” Day 9: LOF in Excel

We did Isolation Forest with it, and we will do LOF with it again.

First, we define the LRD, Local Reachability Density, which is simply the inverse of the average reachability distance.

LOF in Excel – image by authorNow, we can compare Isolation Forest and LOF side by side.

Isolation Forest gives it the lowest score,and LOF gives it the highest LOF value.

Only then can you decide whether LOF, or k-distance, or Isolation Forest is the right choice for your specific situation.

3 days, 10 hours назад @ towardsdatascience.com
Personal, Agentic Assistants: A Practical Blueprint for a Secure, Multi-User, Self-Hosted Chatbot
Personal, Agentic Assistants: A Practical Blueprint for a Secure, Multi-User, Self-Hosted Chatbot Personal, Agentic Assistants: A Practical Blueprint for a Secure, Multi-User, Self-Hosted Chatbot

how I’ve built a self-hosted, end-to-end platform that gives each user a personal, agentic chatbot that can autonomously search through only the files that the user explicitly allows it to access.

That led to this week’s challenge:Build an agentic chatbot equipped with tools to access a user’s personal notes securely, without compromising privacy.

Not a shared assistant but a private agent for every user where user has full control over which files their agent can read and reason about.

Flow 1: User file management: What happens when we submit a file?

The agent is equipped with tools, including a semantic vector-search tool, and can only search documents the user has permission to access.

3 days, 12 hours назад @ towardsdatascience.com
How to Develop AI-Powered Solutions, Accelerated by AI
How to Develop AI-Powered Solutions, Accelerated by AI How to Develop AI-Powered Solutions, Accelerated by AI

What this actually means is: how to successfully develop AI-powered solutions, accelerated by AI itself.

For AI solutions, remember that “Effort” includes dealing with AI’s complexity such as data acquisition, system evaluation, or determining required guardrails.

Or, you could explore AI-powered solutions, such as using a large language model (LLM) to generate a product description or suggest a category.

: Many software development tools now include Generative AI features that help lower the entry barrier to coding.

AI solutions integrated in the backend and not user-facing, might have a lot of potential too.

3 days, 13 hours назад @ towardsdatascience.com
Distill.pub Distill.pub
последний пост None
TheSequence TheSequence
последний пост 1 day, 16 hours назад
The Sequence Opinion #770: The Post-GPU Era: Why AI Needs a New Kind of Computer
The Sequence Opinion #770: The Post-GPU Era: Why AI Needs a New Kind of Computer The Sequence Opinion #770: The Post-GPU Era: Why AI Needs a New Kind of Computer

What got me thinking about this idea was the announcement of Unconventional AI which raised a considerable amount of money of work precisesly on this problem.

Recent events underscore this concern: a new startup called Unconventional AI made headlines by raising an unprecedented $475 million seed round to develop radically new computing hardware for AI.

The human brain performs extraordinary feats on only ~20 watts of power, whereas training a single large AI model can devour megawatt-hours.

The sheer gap suggests that AI might require a new form of computing to continue its trajectory.

The Reign of Matrix Multiplications and GPUs

1 day, 16 hours назад @ thesequence.substack.com
The Sequence AI of the Week #769: Inside Gemini Deep Think
The Sequence AI of the Week #769: Inside Gemini Deep Think The Sequence AI of the Week #769: Inside Gemini Deep Think

Created Using GPT-5Gemini Deep Think is one of the most innovative architectures of recent times and, yet, we know so little about it.

Today, I would like to summarize some of the things I learned about Deep Think.

Gemini DeepThink made news when it score a gold medal at the 2025 international math olympiad using a parallel technique over the standard Gemini model.

It embodies the current frontier idea that how a model uses its compute at inference time matters as much as raw parameter count.

From chain-of-thought hacks to “thinking models”

2 days, 16 hours назад @ thesequence.substack.com
The Sequence Knowledge #768: Using Rephrasing for Synthetic Data Generation
The Sequence Knowledge #768: Using Rephrasing for Synthetic Data Generation The Sequence Knowledge #768: Using Rephrasing for Synthetic Data Generation

Created Using GPT-5Today we will Discuss:Understanding the different types of rephrasing methods for synthetic data generation.

Diving inside Microsoft’s Evol-Instruct method to create highly sophisticated synthetic instruction datasets.

💡 AI Concept of the Day: Understanding the Types of Rephrasing Methods for Synthetic Data GenerationRephrasing is the most reliable way to expand a labeled dataset without changing its ground truth.

In language tasks this means paraphrasing instructions, questions, or rationales; in code it means altering comments, identifiers, or scaffolding while keeping unit tests green; in multimodal alignment it means rewriting captions or prompts without altering the …

3 days, 16 hours назад @ thesequence.substack.com
The Sequence Radar #767: Last Week in AI: Google Logic, Amazon Utility, and Mistral Efficiency
The Sequence Radar #767: Last Week in AI: Google Logic, Amazon Utility, and Mistral Efficiency The Sequence Radar #767: Last Week in AI: Google Logic, Amazon Utility, and Mistral Efficiency

Subscribe and don’t miss out:📝 Editorial: Last Week in AI: Google Logic, Amazon Utility, and Mistral EfficiencyThe focus of model development shifted noticeably this week.

The most technically significant release is Gemini 3 Deep Think.

Instead of the standard immediate next-token prediction, Deep Think utilizes a “parallel thinking” process.

🤖 AI Tech ReleasesGemini 3 Deep ThinkGoogle released Gemini 3 Deep Think, its innovative reasoning models that scored gold medals in the recent international math olympiad.

Mistral 3Mistral released Mistral 3, which includes 3 small models (14B, 8B, and 3B) and Mistral Large 3.

5 days, 16 hours назад @ thesequence.substack.com
The Sequence Opinion #766:Why Agents Need a “Headless” Internet
The Sequence Opinion #766:Why Agents Need a “Headless” Internet The Sequence Opinion #766:Why Agents Need a “Headless” Internet

Created Using Gemini 3Today’s installment discusses a topic that I have spent a lot of time thinking about recently.

Do we need to reimagine the web as it is for AI agents?

This idea is not as crazy as it might sound and there are already solid efforts in the space.

However, there are also plenty of challenges.

The Bifurcation of the Web

1 week, 1 day назад @ thesequence.substack.com
The Sequence AI of the Week #765: Diving into Claude Opus 4.5
The Sequence AI of the Week #765: Diving into Claude Opus 4.5 The Sequence AI of the Week #765: Diving into Claude Opus 4.5

Created Using GPT-5Today, we are going to dive into the hottest AI release of last week.

Claude Opus 4.5 is Anthropic’s new flagship model in the Claude 4.5 family, and it’s very clearly optimized around a single thesis: large language models are no longer just chatbots, they’re operating systems for agents.

At the core, Opus 4.5 is a large decoder-only transformer trained with next-token prediction on a broad mixture of internet text, code, documents, and synthetic data, continuing the Claude lineage.

Anthropic doesn’t publish layer counts or parameter numbers, but from its behavior and public documentation it’s clear that the model combines high capacity with careful optimization for long…

1 week, 2 days назад @ thesequence.substack.com
The Sequence Knowledge #764: Wanna do Synthetic Data? Learn About Rephrasing
The Sequence Knowledge #764: Wanna do Synthetic Data? Learn About Rephrasing The Sequence Knowledge #764: Wanna do Synthetic Data? Learn About Rephrasing

Created Using Gemini 3Today we will Discuss:An introduction to rephrasing methods for synthetic data generation.

A review of HuggingFace’s Cosmopedia synthetically generated dataset.

💡 AI Concept of the Day: An Introduction to RephrasingRephrasing is the workhorse of synthetic data generation: you start with a correctly labeled seed example and produce semantically equivalent variants that preserve the label while expanding coverage.

For text, that means rewriting an instruction, query, or rationale without changing its truth conditions; for code, it’s altering comments, variable names, or scaffolding without affecting behavior; for multimodal tasks, it can be caption restyling or prompt re…

1 week, 3 days назад @ thesequence.substack.com
The Sequence Radar #763: Last Week AI Trifecta: Opus 4.5, DeepSeek Math, and FLUX.2
The Sequence Radar #763: Last Week AI Trifecta: Opus 4.5, DeepSeek Math, and FLUX.2 The Sequence Radar #763: Last Week AI Trifecta: Opus 4.5, DeepSeek Math, and FLUX.2

The AI of the week dives into Claude Opus 4.5 ofc.

Subscribe and don’t miss out:📝 Editorial: Last Week AI Trifecta: Opus 4.5, DeepSeek Math, and FLUX.2The pace of AI development is picking up, and frankly, this week has been a bit of a blur.

I’ve been playing around with the new releases—Claude Opus 4.5, DeepSeek Math V2, and the new FLUX.2 from Black Forest Labs—and the “feeling” of using these models is distinct.

First, let’s talk about Claude Opus 4.5.

🤖 AI Tech ReleasesClaude Opus 4.5Anthropic released Claude Opus 4.5 which sets new leves of coding, computer usage and agentic tasks.

1 week, 5 days назад @ thesequence.substack.com
The Sequence Opinion #762: Trillion-Parameter Diplomacy: China, the US, and the Battle for Open Models
The Sequence Opinion #762: Trillion-Parameter Diplomacy: China, the US, and the Battle for Open Models The Sequence Opinion #762: Trillion-Parameter Diplomacy: China, the US, and the Battle for Open Models

Today, we are going to debate a hot topic: US vs. China in open source AI.

What began with a few leaked research models has exploded into a global competition to push the boundaries of what open models can do.

In this essay, we’ll dive into the state of this race as of late 2024–2025, comparing the leading contributions from China and the US (broadly including Western efforts).

Think of this as a walkthrough of the open LLM landscape: technical, opinionated, and zooming in and out between micro details and the big picture.

For a deeper analysis on this topic, check out the US vs. China open source AI evaluation dashboard released by my company LayerLens: https://app.layerlens.ai/evaluation-…

2 weeks, 1 day назад @ thesequence.substack.com
The Sequence AI of the Week #761: Olmo 3 vs. The Black Box: What a Truly Inspectable LLM Looks Like
The Sequence AI of the Week #761: Olmo 3 vs. The Black Box: What a Truly Inspectable LLM Looks Like The Sequence AI of the Week #761: Olmo 3 vs. The Black Box: What a Truly Inspectable LLM Looks Like

Rather than trying to win with a radically new network design, the Allen Institute for AI treats architecture, data, training curriculum, and openness as a single coherent object.

At a high level, Olmo 3 comes in two main parameter scales, around 7 billion and 32 billion parameters.

Each scale is offered in several behavioral variants that sit on top of a shared base architecture.

The Base model is the foundational system trained on a multi-stage curriculum; it is meant to be a strong backbone for further pretraining or fine-tuning.

What is distinctive is not just that these variants exist, but that their entire training flow is documented and reproducible, so that researchers can see how e…

2 weeks, 2 days назад @ thesequence.substack.com
The Sequence Knowledge #760: Everything You Need to Know About Generative Synthesis in AI Models
The Sequence Knowledge #760: Everything You Need to Know About Generative Synthesis in AI Models The Sequence Knowledge #760: Everything You Need to Know About Generative Synthesis in AI Models

Created Using Gemini 3Today we will Discuss:An overview of the most important generative synthesis methods.

A review of Stanford University’s research on the STaR method for synthetic data generation for reasoning.

💡 AI Concept of the Day: Not All Generative Synthesis Methods are Created EqualHere’s a clean way to frame generative synthesis across two axes: (1) spec-first vs. goal-conditioned control and (2) the model class you use to realize it—autoregressive (AR) decoders (LLMs for text/code, AR TTS, etc.)

Spec-first begins with an explicit blueprint—schema, fields, distributions, difficulty knobs—and asks the model to instantiate it.

Either control style can be implemented with either mo…

2 weeks, 3 days назад @ thesequence.substack.com
The Sequence Radar #
The Sequence Radar # The Sequence Radar #

For me, this journey has been an incredible learning experience about the real state of the AI market.

Google answered on the flagship front with Gemini 3 Pro, designed as a general-purpose reasoning and agentic coding model.

It’s designed to be model-agnostic but is deeply wired into the Gemini stack from day one.

SAM3Meta released Segment Anything 3(SAM 3), its object segmentation and tracking model, they also released the SAM 3 playground.

Olmo 3The Allen Institute for AI(AI2) released Olmo3, a completely open source family of models, datasets and training stack.

2 weeks, 5 days назад @ thesequence.substack.com
The Sequence Opinion #758: From Language to Landscape: The Age of Spatially Intelligent AI
The Sequence Opinion #758: From Language to Landscape: The Age of Spatially Intelligent AI The Sequence Opinion #758: From Language to Landscape: The Age of Spatially Intelligent AI

Created Using GPT-5I spent last week researching world models like Marble and SIMA2 and decided to put together a insanely long essay.

Achieving spatial intelligence would enable AI not just to talk about the world, but to truly understand and operate within it.

In this essay, we survey the current landscape and future potential of world models as a path toward spatial intelligence.

We begin with the technical foundations of world models and their key capabilities.

Technical Foundations of World Models

3 weeks, 1 day назад @ thesequence.substack.com
The Sequence AI of the Week #757: 3D World Models in Action: Inside DeepMind’s SIMA 2 Architecture
The Sequence AI of the Week #757: 3D World Models in Action: Inside DeepMind’s SIMA 2 Architecture The Sequence AI of the Week #757: 3D World Models in Action: Inside DeepMind’s SIMA 2 Architecture

Create Using GPT-5World models are becoming a reality in front of our eyes!

Today, we would like to dive into one of the most exciting ones.

DeepMind’s SIMA 2 is best understood as a systems project disguised as a gaming demo: it is a full-stack embodied agent that wraps a Gemini model in a visuomotor control loop, trains it across many 3D games, and then lets it improve itself through model-driven task generation and self-play.

Rather than proposing a new neural building block in isolation, SIMA 2 offers a reference architecture for how a large multimodal model can perceive, reason, and act in complex simulated worlds using exactly the same interface as a human player.

3 weeks, 2 days назад @ thesequence.substack.com
The Sequence Knowledge #756: The Simplest Approach to Synthetic Data Generation
The Sequence Knowledge #756: The Simplest Approach to Synthetic Data Generation The Sequence Knowledge #756: The Simplest Approach to Synthetic Data Generation

Diving into Microsoft’s WinzardLM model that uses generative synthesis for following instructions.

💡 AI Concept of the Day: Understanding Generative SynthesisToday, let’s dive into one of the most straightforward mechanisms for synthetic data generation.

Generative synthesis is the process of creating new data by modeling the underlying patterns and distributions of real-world datasets.

Rather than simply augmenting data with random perturbations, generative synthesis learns the generative process itself, allowing it to produce realistic and diverse samples across domains such as text, images, time series, and structured data.

The approach has become foundational in synthetic data generatio…

3 weeks, 3 days назад @ thesequence.substack.com
📓 Cool Blogs
ODS.ai Habr ODS.ai Habr
последний пост 2 months, 3 weeks назад
SWE-MERA — новый динамический бенчмарк для моделей агентной генерации кода
SWE-MERA — новый динамический бенчмарк для моделей агентной генерации кода SWE-MERA — новый динамический бенчмарк для моделей агентной генерации кода

Однако все задачи в MERA CODE, как впрочем и в SWE-bench и других бенчмарках подобного назначения, следуют классической парадигме, когда у нас есть фиксированный обучающий набор данных и, что более важно, фиксированный проверочный набор.

Но большие языковые модели для кодинга, которые мы и пытаемся оценивать нашим набором, также учатся на GitHub – со времен еще первой модели LLaMa.

Кажется, что 700 задач немного, но это уже очень приличное количество, и что самое важное — это новые задачи.

Current behavior: from sympy import ask, Q, Symbol x = Symbol('x') print(ask(Q.finite(x**-1), Q.real(x))) # Output: True Expected behavior: The function should return None to indicate uncertainty, as x**-…

2 months, 3 weeks назад @ habr.com
DRAGON: динамический бенчмарк для оценки RAG-систем на русском языке
DRAGON: динамический бенчмарк для оценки RAG-систем на русском языке DRAGON: динамический бенчмарк для оценки RAG-систем на русском языке

Ответ: Кэисукэ ТибаSPARQL-запрос SimpleSELECT DISTINCT ?s ?r ?o WHERE { { SELECT ?s ?r ?o WHERE { ?s ?r ?o . }

GROUP BY ?s ?r HAVING(count(?o) = 1) } { SELECT ?s ?r ?o WHERE { ?s ?r ?o . }

Ответ: Национальная система платежных карт (НСПК) Центр биометрических технологий (ЦБТ) ЕБСSELECT ?s ?r ?o ?len WHERE { { SELECT ?s ?r (COUNT(?o1) as ?len) (GROUP_CONCAT(DISTINCT(STR(?o1));separator="|") AS ?o) WHERE { ?s ?r ?o1 . }

FILTER(?o != ?o1) } GROUP BY ?o ?o1 ?r ?r1 HAVING(COUNT(?s) = 1) } UNION { SELECT ?s ?r ?o ?r1 ?s1 WHERE { ?s ?r ?o .

FILTER(?o != ?o1) } GROUP BY ?o ?o1 ?r ?r1 HAVING(COUNT(?s) = 1) } UNION { SELECT ?s ?r ?o ?r1 ?s1 WHERE { ?s ?r ?o .

4 months, 2 weeks назад @ habr.com
RKNN Toolkit2: конвертация моделей и симуляция NPU Rockchip
RKNN Toolkit2: конвертация моделей и симуляция NPU Rockchip RKNN Toolkit2: конвертация моделей и симуляция NPU Rockchip

В этой статье я хочу поделиться своим опытом по конвертации нейросети в формат rknn с помощью библиотеки rknn-toolkit2.

Вот как выглядят веса pytorch модели в Netron:веса pytorch модели в NetronВажно!

Конвертация onnx модели в rknnДалее создается объект RKNN , который управляет процессом конвертации и инференса модели на платформе Rockchip.

На этом этапе происходит подготавка модели к конвертации в формат RKNN и последующему запуску на NPU Rockchip.

Создание и экспорт rknn моделиНа этом этапе происходит конвертация ONNX-модели во внутренний формат RKNN, оптимизация графа и подготовка к запуску на NPU Rockchip.

4 months, 3 weeks назад @ habr.com
MERA Code: всесторонняя оценка генерации кода в прикладных сценариях
MERA Code: всесторонняя оценка генерации кода в прикладных сценариях MERA Code: всесторонняя оценка генерации кода в прикладных сценариях

🔗MERA Code🔗GitHub с кодом и данными🔗Коллекция на Hugging Face🔗Статья на arxiv🔗Репозиторий проекта на GitVerseЧто такое MERA Code?

Современные кодовые языковые модели и модели общего назначения (ChatGPT, Claude, Qwen, YandexGPT, GigaChat и др.)

Список текущих задач MERA Code и их характеристикКаталог задач MERA Code и их подробное описание представлено на сайте.

В MERA Code промпты строго подобраны под задачу и корректный выбор ответа.

В заключениеMERA Code — это попытка закрыть важный пробел в тестировании LLM: насколько они действительно полезны в реальной, локализованной разработке.

4 months, 3 weeks назад @ habr.com
Байесовская собака: анализ пёсьего компаса
Байесовская собака: анализ пёсьего компаса Байесовская собака: анализ пёсьего компаса

", подумал я. И, к счастью, у меня как раз под рукой оказался идеальный подопытный.

Стандартное арифметическое среднее между 360° и 0° даст нам 180°, несмотря на то, что и 360°, и 0° указывают в одном направлении.

Нулевая гипотеза утверждает, что данные распределены равномерно по кругу, альтернативная — что это не так.

from pingouin import circ_vtest v, pval = circ_vtest(data['radians'], dir=np.pi) print(f"V-statistics: {v:.3f}; p-value: {pval:.6f}")>> V-statistics: 24.127; p-value: 0.002904Вот мы и подобрались к чему-то интересному!

Априорное распределение и функция правдоподобияПредположим, что у нас есть:Априорное распределение с параметрамиФункция правдоподобия для нового наблюдения с п…

8 months, 2 weeks назад @ habr.com
Machine Learning Mastery
последний пост 2 weeks назад
Fine-Tuning a BERT Model
Fine-Tuning a BERT Model Fine-Tuning a BERT Model

This article is divided into two parts; they are: • Fine-tuning a BERT Model for GLUE Tasks • Fine-tuning a BERT Model for SQuAD Tasks GLUE is a benchmark for evaluating natural language understanding (NLU) tasks.

2 weeks назад @ machinelearningmastery.com
The Journey of a Token: What Really Happens Inside a Transformer
The Journey of a Token: What Really Happens Inside a Transformer The Journey of a Token: What Really Happens Inside a Transformer

Large language models (LLMs) are based on the transformer architecture, a complex deep neural network whose input is a sequence of token embeddings.

2 weeks, 2 days назад @ machinelearningmastery.com
Pretrain a BERT Model from Scratch
Pretrain a BERT Model from Scratch Pretrain a BERT Model from Scratch

This article is divided into three parts; they are: • Creating a BERT Model the Easy Way • Creating a BERT Model from Scratch with PyTorch • Pre-training the BERT Model If your goal is to create a BERT model so that you can train it on your own data, using the Hugging Face `transformers` library is the easiest way to get started.

2 weeks, 2 days назад @ machinelearningmastery.com
K-Means Cluster Evaluation with Silhouette Analysis
K-Means Cluster Evaluation with Silhouette Analysis K-Means Cluster Evaluation with Silhouette Analysis

Clustering models in machine learning must be assessed by how well they separate data into meaningful groups with distinctive characteristics.

2 weeks, 3 days назад @ machinelearningmastery.com
The Complete Guide to Docker for Machine Learning Engineers
The Complete Guide to Docker for Machine Learning Engineers The Complete Guide to Docker for Machine Learning Engineers

Machine learning models often behave differently across environments.

2 weeks, 4 days назад @ machinelearningmastery.com
Preparing Data for BERT Training
Preparing Data for BERT Training Preparing Data for BERT Training

This article is divided into four parts; they are: • Preparing Documents • Creating Sentence Pairs from Document • Masking Tokens • Saving the Training Data for Reuse Unlike decoder-only models, BERT's pretraining is more complex.

2 weeks, 4 days назад @ machinelearningmastery.com
BERT Models and Its Variants
BERT Models and Its Variants BERT Models and Its Variants

This article is divided into two parts; they are: • Architecture and Training of BERT • Variations of BERT BERT is an encoder-only model.

2 weeks, 6 days назад @ machinelearningmastery.com
From Shannon to Modern AI: A Complete Information Theory Guide for Machine Learning
From Shannon to Modern AI: A Complete Information Theory Guide for Machine Learning From Shannon to Modern AI: A Complete Information Theory Guide for Machine Learning

In 1948, Claude Shannon published a paper that changed how we think about information forever.

3 weeks, 1 day назад @ machinelearningmastery.com
Why Decision Trees Fail (and How to Fix Them)
Why Decision Trees Fail (and How to Fix Them) Why Decision Trees Fail (and How to Fix Them)

Decision tree-based models for predictive machine learning tasks like classification and regression are undoubtedly rich in advantages — such as their ability to capture nonlinear relationships among features and their intuitive interpretability that makes it easy to trace decisions.

3 weeks, 2 days назад @ machinelearningmastery.com
Training a Tokenizer for BERT Models
Training a Tokenizer for BERT Models Training a Tokenizer for BERT Models

This article is divided into two parts; they are: • Picking a Dataset • Training a Tokenizer To keep things simple, we'll use English text only.

3 weeks, 3 days назад @ machinelearningmastery.com
Forecasting the Future with Tree-Based Models for Time Series
Forecasting the Future with Tree-Based Models for Time Series Forecasting the Future with Tree-Based Models for Time Series

Decision tree-based models in machine learning are frequently used for a wide range of predictive tasks such as classification and regression, typically on structured, tabular data.

3 weeks, 3 days назад @ machinelearningmastery.com
The Complete AI Agent Decision Framework
The Complete AI Agent Decision Framework The Complete AI Agent Decision Framework

You've learned about

3 weeks, 4 days назад @ machinelearningmastery.com
Mastering JSON Prompting for LLMs
Mastering JSON Prompting for LLMs Mastering JSON Prompting for LLMs

LLMs

4 weeks назад @ machinelearningmastery.com
5 Essential Python Scripts for Intermediate Machine Learning Practitioners
5 Essential Python Scripts for Intermediate Machine Learning Practitioners 5 Essential Python Scripts for Intermediate Machine Learning Practitioners

As a machine learning engineer, you probably enjoy working on interesting tasks like experimenting with model architectures, fine-tuning hyperparameters, and analyzing results.

4 weeks, 1 day назад @ machinelearningmastery.com
Datasets for Training a Language Model
Datasets for Training a Language Model Datasets for Training a Language Model

Share Post ShareA language model is a mathematical model that describes a human language as a probability distribution over its vocabulary.

In this article, you’ll learn about datasets used to train language models and how to source common datasets from public repositories.

A Good Dataset for Training a Language ModelA good language model should learn correct language usage, free of biases and errors.

For language model training, datasets typically contain text strings.

Post-Processing the DatasetsBefore training a language model, you may want to post-process the dataset to clean the data.

1 month назад @ machinelearningmastery.com
ML in Production
последний пост None
Sorta Insightful Sorta Insightful
последний пост 3 weeks, 5 days назад
Authentic Imperfection
Authentic Imperfection Authentic Imperfection

* * *I’ve been thinking about the anger surrounding generative AI.

To keep things fair, he took the best human images and best AI images, meaning human art from famous artists, and AI art from prompters skilled at removing obvious tells of image generation.

When people complain about AI slop, I see it as a complaint against the deluge of default style AI images.

We’ve seen this happen in all forms: AI text, AI music, older forms of computer generated content like CGI.

As much as we celebrate imperfection, digital imperfection is a step too far.

3 weeks, 5 days назад @ alexirpan.com
Ten Years Later
Ten Years Later Ten Years Later

Every now and then, someone asks me why I blog, and I don’t know really know what to tell them.

That’s another reason I’m not celebrating 10 years with more gusto, I know I’ve been writing less.

Indiana Jones and the Great Circle: I don’t know how they did it, but Indiana Jones and the Great Circle was just fun all the way through.

My one complaint is that the hand-to-hand combat feels like the worst part of the game, so of course they put a bunch of upgrades behind learning parry timings you’ll never use later.

I have not tried Peak, but Another Crab’s Treasure was really good and is worth playing if you’re interested in a Souls-like.

3 months, 3 weeks назад @ alexirpan.com
Brony Musicians Seize The Means of Production: My Eyewitness Account to BABSCon 2025
Brony Musicians Seize The Means of Production: My Eyewitness Account to BABSCon 2025 Brony Musicians Seize The Means of Production: My Eyewitness Account to BABSCon 2025

A music concert in the evenings, typically set up as a rave with EDM or rock music made by brony musicians.

She has been involved in organizing pony music concerts for over a decade, for both BABSCon and other pony conventions.

Thank you, BABSCon ChairsThe brony musicians immediately jump into an emergency Discord call with Pinkaboo, to get her side of the story.

Other conventions start tweeting in support of the brony musicians, with no one taking BABSCon’s side.

It’s hard for me to explain why I like MLP fan music, because brony music really isn’t accessible.

4 months, 3 weeks назад @ alexirpan.com
Who is AI For?
Who is AI For? Who is AI For?

I think the easy answer to this question is that right now, AI is for the AI developers.

Code is useful, it makes money, it is a testbed for AI speeding up the development of AI, and it is easy.

I’m working in AI because it pays well and is potentially really good for the world.

The artists did not know what AI was, but when they learned, they quickly decided they did not want it.

It feels like the most likely outcome is that people go all-in on pushing raw intelligence, in the way that AI developers can measure it, leaving behind those that are not like AI developers.

8 months, 2 weeks назад @ alexirpan.com
Lil'Log
последний пост None
The Spectator
последний пост None
Off the Convex Path
последний пост None
Piekniewski's blog
последний пост None
fast.ai NLP fast.ai NLP
последний пост None
Sebastian Ruder
последний пост None
Andrew Karpathy blog
последний пост None
大トロ 大トロ
последний пост None
🔬 Science
Papers With Code Papers With Code
последний пост 4 months, 3 weeks назад
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy /henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos.

Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point tracking, monocular depth, and camera pose estimation into a high-performing and feedforward 3D point tracker.

It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion, with a fully differentiable and end-to-end architecture, allowing scalable training across a wide range of datasets, including synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage.

By learning geometry and motion jointly from …

4 months, 3 weeks назад @ paperswithcode.com
/antof27/ Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation
/antof27/ Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation /antof27/ Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation

Calisthenics skill classification is the computer vision task of inferring the skill performed by an athlete from images, enabling automatic performance assessment and personalized analytics.

Traditional methods for calisthenics skill recognition are based on pose estimation methods to determine the position of skeletal data from images, which is later fed to a classification algorithm to infer the performed skill.

This work proposes a direct approach to calisthenics skill recognition, which leverages depth estimation and athlete patch retrieval to avoid the computationally expensive human pose estimation module.

Using Depth Anything V2 for depth estimation and YOLOv10 for athlete localizat…

4 months, 3 weeks назад @ paperswithcode.com
/snowflakedb/ Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI
/snowflakedb/ Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI /snowflakedb/ Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI

Inference is now the dominant AI workload, yet existing systems force trade-offs between latency, throughput, and cost.

Arctic Inference, an open-source vLLM plugin from Snowflake AI Research, introduces Shift Parallelism, a dynamic parallelism strategy that adapts to real-world traffic while integrating speculative decoding, SwiftKV compute reduction, and optimized embedding inference.

It achieves up to 3.4 times faster request completion, 1.75 times faster generation, and 1.6M tokens/sec per GPU for embeddings, outperforming both latency- and throughput-optimized deployments.

Already powering Snowflake Cortex AI, Arctic Inference delivers state-of-the-art, cost-effective inference for ent…

4 months, 3 weeks назад @ paperswithcode.com
/NVIDIA/ FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale
/NVIDIA/ FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale /NVIDIA/ FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale

FourCastNet 3 advances global weather modeling by implementing a scalable, geometric machine learning (ML) approach to probabilistic ensemble forecasting.

The approach is designed to respect spherical geometry and to accurately model the spatially correlated probabilistic nature of the problem, resulting in stable spectra and realistic dynamics across multiple scales.

FourCastNet 3 delivers forecasting accuracy that surpasses leading conventional ensemble models and rivals the best diffusion-based methods, while producing forecasts 8 to 60 times faster than these approaches.

In contrast to other ML approaches, FourCastNet 3 demonstrates excellent probabilistic calibration and retains realis…

4 months, 3 weeks назад @ paperswithcode.com
/jingyanw/ Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?
/jingyanw/ Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work? /jingyanw/ Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?

We study A/B experiments that are designed to compare the performance of two recommendation algorithms.

The bias arising from this type of data sharing is known as "symbiosis bias".

In this paper, we highlight that, for decision-making purposes, the sign of the GTE often matters more than its precise magnitude when selecting the better algorithm.

We formalize this insight under a multi-armed bandit framework and theoretically characterize when the sign of the expected GTE estimate under data sharing aligns with or contradicts the sign of the true GTE.

Our analysis identifies the level of exploration versus exploitation as a key determinant of how symbiosis bias impacts algorithm selection.

4 months, 3 weeks назад @ paperswithcode.com
/qqq-yi/ DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression
/qqq-yi/ DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression /qqq-yi/ DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression

Task-agnostic prompt compression leverages the redundancy in natural language to reduce computational overhead and enhance information density within prompts, especially in long-context scenarios.

Existing methods predominantly rely on information entropy as the metric to compress lexical units, aiming to achieve minimal information loss.

However, these approaches overlook two critical aspects: (i) the importance of attention-critical tokens at the algorithmic level, and (ii) shifts in information entropy during the compression process.

Motivated by these challenges, we propose a dynamic attention-aware approach for task-agnostic prompt compression (DAC).

This approach effectively integrate…

4 months, 3 weeks назад @ paperswithcode.com
/lukasellinger/ Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions
/lukasellinger/ Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions /lukasellinger/ Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions

Large Language Models (LLMs) can provide accurate word definitions and explanations for any context.

However, the scope of the definition changes for different target groups, like children or language learners.

We investigate how simplification impacts homonym definition quality across three target groups: Normal, Simple, and ELI5.

Our results show that simplification drastically degrades definition completeness by neglecting polysemy, increasing the risk of misunderstanding.

Fine-tuning Llama 3.1 8B with Direct Preference Optimization substantially improves homonym response quality across all prompt types.

4 months, 3 weeks назад @ paperswithcode.com
/pspdada/ Mitigating Object Hallucinations via Sentence-Level Early Intervention
/pspdada/ Mitigating Object Hallucinations via Sentence-Level Early Intervention /pspdada/ Mitigating Object Hallucinations via Sentence-Level Early Intervention

Multimodal large language models (MLLMs) have revolutionized cross-modal understanding but continue to struggle with hallucinations - fabricated content contradicting visual inputs.

Existing hallucination mitigation methods either incur prohibitive computational costs or introduce distribution mismatches between training data and model outputs.

We identify a critical insight: hallucinations predominantly emerge at the early stages of text generation and propagate through subsequent outputs.

To address this, we propose **SENTINEL** (**S**entence-level **E**arly i**N**tervention **T**hrough **IN**-domain pr**E**ference **L**earning), a framework that eliminates dependency on human annotations…

4 months, 3 weeks назад @ paperswithcode.com
/owos/ FLEXITOKENS: Flexible Tokenization for Evolving Language Models
/owos/ FLEXITOKENS: Flexible Tokenization for Evolving Language Models /owos/ FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Language models (LMs) are challenging to adapt to new data distributions by simple finetuning.

This is due to the rigidity of their subword tokenizers, which typically remain unchanged during adaptation.

This inflexibility often leads to inefficient tokenization, causing overfragmentation of out-of-distribution domains, unseen languages, or scripts.

In this work, we develop byte-level LMs with learnable tokenizers to make tokenization adaptive.

Our models include a submodule that learns to predict boundaries between the input byte sequence, encoding it into variable-length segments.

4 months, 3 weeks назад @ paperswithcode.com
/wojiufukele/ Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion
/wojiufukele/ Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion /wojiufukele/ Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion

To address the challenges posed by cascading reactions caused by component failures in autonomous cargo ships (ACS) and the uncertainties in emergency decision-making, this paper proposes a novel hybrid feature fusion framework for constructing a graph-structured dataset of failure modes.

A hierarchical feature fusion framework is constructed, using Word2Vec encoding to encode subsystem/component features, BERT-KPCA to process failure modes/reasons, and Sentence-BERT to quantify the semantic association between failure impact and emergency decision-making.

The dataset covers 12 systems, 1,262 failure modes, and 6,150 propagation paths.

In the label prediction results, the Shore-based Meteor…

4 months, 3 weeks назад @ paperswithcode.com
/YF-W/ Tri-Learn Graph Fusion Network for Attributed Graph Clustering
/YF-W/ Tri-Learn Graph Fusion Network for Attributed Graph Clustering /YF-W/ Tri-Learn Graph Fusion Network for Attributed Graph Clustering

In recent years, models based on Graph Convolutional Networks (GCN) have made significant strides in the field of graph data analysis.

Although the Graph Transformer architecture has mitigated some of these issues, its performance is still limited when processing heterogeneous graph data.

To address these challenges, this study proposes a novel deep clustering framework that comprising GCN, Autoencoder (AE), and Graph Transformer, termed the Tri-Learn Graph Fusion Network (Tri-GFN).

The tri-learning mechanism allows mutual learning among these modules, while the feature fusion strategy enables the model to capture complex relationships, yielding highly discriminative representations for gra…

4 months, 3 weeks назад @ paperswithcode.com
/mr-ravin/ APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation
/mr-ravin/ APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation /mr-ravin/ APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation

We propose the APTx Neuron, a novel, unified neural computation unit that integrates non-linear activation and linear transformation into a single trainable expression.

The APTx Neuron is derived from the APTx activation function, thereby eliminating the need for separate activation layers and making the architecture both computationally efficient and elegant.

The proposed neuron follows the functional form $y = \sum_{i=1}^{n} ((\alpha_i + \tanh(\beta_i x_i)) \cdot \gamma_i x_i) + \delta$, where all parameters $\alpha_i$, $\beta_i$, $\gamma_i$, and $\delta$ are trainable.

We validate our APTx Neuron-based architecture on the MNIST dataset, achieving up to 96.69\% test accuracy in just 20 ep…

4 months, 3 weeks назад @ paperswithcode.com
/Rec4Fun/ A Reproducibility Study of Product-side Fairness in Bundle Recommendation
/Rec4Fun/ A Reproducibility Study of Product-side Fairness in Bundle Recommendation /Rec4Fun/ A Reproducibility Study of Product-side Fairness in Bundle Recommendation

While this problem has been widely studied in traditional recommendation settings, its implications for bundle recommendation (BR) remain largely unexplored.

Existing fairness frameworks and metrics designed for traditional recommender systems may not directly translate to this multi-layered setting.

In this paper, we conduct a comprehensive reproducibility study of product-side fairness in BR across three real-world datasets using four state-of-the-art BR methods.

We analyze exposure disparities at both the bundle and item levels using multiple fairness metrics, uncovering important patterns.

Overall, our findings offer actionable insights for building fairer bundle recommender systems and…

4 months, 3 weeks назад @ paperswithcode.com
/cbobed/ OntView: What you See is What you Meant
/cbobed/ OntView: What you See is What you Meant /cbobed/ OntView: What you See is What you Meant

However, the lack of tools that provide effective visualization is still a significant challenge.

In this paper, we present OntView, an ontology viewer that is designed to provide users with an intuitive visual representation of ontology concepts and their formal definitions through a user-friendly interface.

Building on the use of a DL reasoner, OntView follows a "What you see is what you meant" paradigm, showing the actual inferred knowledge.

One key aspect for this is its ability to visualize General Concept Inclusions (GCI), a feature absent in existing visualization tools.

OntView has been released with an open-source license for the whole community.

4 months, 3 weeks назад @ paperswithcode.com
/Rec4Fun/ RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction
/Rec4Fun/ RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction /Rec4Fun/ RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction

These approaches fail to capture elaborate relations hidden in real-world bundle structures, resulting in suboptimal bundle representations.

To overcome this limitation, we propose RaMen, a novel method that provides a holistic multi-strategy approach for bundle construction.

RaMen utilizes both intrinsic (characteristics) and extrinsic (collaborative signals) information to model bundle structures through Explicit Strategy-aware Learning (ESL) and Implicit Strategy-aware Learning (ISL).

Integrating diverse strategies enables RaMen to learn more comprehensive and robust bundle representations.

Meanwhile, Multi-strategy Alignment & Discrimination module is employed to facilitate knowledge tr…

4 months, 3 weeks назад @ paperswithcode.com
Papers With Code Papers With Code
последний пост 4 months, 3 weeks назад
/PrimisAI/ Adaptive Multi-Agent Reasoning via Automated Workflow Generation
/PrimisAI/ Adaptive Multi-Agent Reasoning via Automated Workflow Generation /PrimisAI/ Adaptive Multi-Agent Reasoning via Automated Workflow Generation

The rise of Large Reasoning Models (LRMs) promises a significant leap forward in language model capabilities, aiming to tackle increasingly sophisticated tasks with unprecedented efficiency and accuracy.

However, despite their impressive performance, recent studies have highlighted how current reasoning models frequently fail to generalize to novel, unseen problems, often resorting to memorized solutions rather than genuine inferential reasoning.

In this paper, we introduce Nexus Architect, an enhanced iteration of our multi-agent system framework, Nexus, equipped with a novel automated workflow synthesis mechanism.

Given a user's prompt and a small set of representative examples, the Archi…

4 months, 3 weeks назад @ paperswithcode.com
/sharanya02/ Real Time Captioning of Sign Language Gestures in Video Meetings
/sharanya02/ Real Time Captioning of Sign Language Gestures in Video Meetings /sharanya02/ Real Time Captioning of Sign Language Gestures in Video Meetings

One of the most tested ways to establish such a communication is through the use of sign based languages.

However, not many people are aware of the smaller intricacies involved with sign language.

Sign language recognition using computer vision aims at eliminating the communication barrier between deaf-mute and ordinary people so that they can properly communicate with others.

In recent studies, it has been found that people with hearing disabilities prefer to sign over typing during these video calls.

In this paper, we are proposing a browser extension that will automatically translate sign language to subtitles for everyone else in the video call.

4 months, 3 weeks назад @ paperswithcode.com
/alessiopittiglio/ Leveraging Context for Multimodal Fallacy Classification in Political Debates
/alessiopittiglio/ Leveraging Context for Multimodal Fallacy Classification in Political Debates /alessiopittiglio/ Leveraging Context for Multimodal Fallacy Classification in Political Debates

In this paper, we present our submission to the MM-ArgFallacy2025 shared task, which aims to advance research in multimodal argument mining, focusing on logical fallacies in political debates.

Our approach uses pretrained Transformer-based models and proposes several ways to leverage context.

In the fallacy classification subtask, our models achieved macro F1-scores of 0.4444 (text), 0.3559 (audio), and 0.4403 (multimodal).

Our multimodal model showed performance comparable to the text-only model, suggesting potential for improvements.

PDFAbstract

4 months, 3 weeks назад @ paperswithcode.com
/RS2002/ One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms
/RS2002/ One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms /RS2002/ One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms

On-demand ride-sharing platforms face the fundamental challenge of dynamically bundling passengers with diverse origins and destinations and matching them with vehicles in real time, all under significant uncertainty.

However, conventional MARL-based ride-sharing approaches heavily rely on the accurate estimation of Q-values or V-values, which becomes problematic in large-scale, highly uncertain environments.

To address these challenges, we propose two novel alternative methods that bypass value function estimation.

First, we adapt GRPO to ride-sharing, replacing the PPO baseline with the group average reward to eliminate critic estimation errors and reduce training bias.

Second, inspired b…

4 months, 3 weeks назад @ paperswithcode.com
/LiXinran6/ Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation
/LiXinran6/ Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation /LiXinran6/ Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation

Include the markdown at the top of your GitHub README.md file to showcase the performance of the model.

Badges are live and will be dynamically updated with the latest ranking of this paper.

4 months, 3 weeks назад @ paperswithcode.com
/ShimSoonYong/ ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space
/ShimSoonYong/ ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space

We introduce a novel classification framework, ZClassifier, that replaces conventional deterministic logits with diagonal Gaussian-distributed logits. Code: https://github.com/ShimSoonYong/ZClassifier

4 months, 3 weeks назад @ paperswithcode.com
/briziorusso/ On Gradual Semantics for Assumption-Based Argumentation
/briziorusso/ On Gradual Semantics for Assumption-Based Argumentation

In this paper, we fill this gap and propose a family of novel gradual semantics for equipping assumptions, which are the core components in ABA frameworks, with dialectical strengths. Code: https://github.com/briziorusso/GradualABA

4 months, 3 weeks назад @ paperswithcode.com
/wumingqi/ Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
/wumingqi/ Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

4 months, 3 weeks назад @ paperswithcode.com
/IsaacYQH/ WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling
/IsaacYQH/ WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling

Despite rapid progress in end-to-end AI music generation, AI-driven modeling of professional Digital Signal Processing (DSP) workflows remains challenging. Code: https://github.com/IsaacYQH/WildFX

4 months, 3 weeks назад @ paperswithcode.com
/summer1278/ Addressing Data Imbalance in Transformer-Based Multi-Label Emotion Detection with Weighted Loss
/summer1278/ Addressing Data Imbalance in Transformer-Based Multi-Label Emotion Detection with Weighted Loss

This paper explores the application of a simple weighted loss function to Transformer-based models for multi-label emotion detection in SemEval-2025 Shared Task 11. Code: https://github.com/summer1278/semeval2025-task11

4 months, 3 weeks назад @ paperswithcode.com
/gabrielkmbo/ Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs
/gabrielkmbo/ Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs

We present Step-wise Policy for Rare-tool Knowledge (SPaRK), a novel reinforcement learning framework that teaches large language models to explore diverse tool usage patterns beyond conventional high-temperature sampling. Code: https://github.com/gabrielkmbo/explore-rl

4 months, 3 weeks назад @ paperswithcode.com
/Cavendish518/ Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation
/Cavendish518/ Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation

Service robots are increasingly deployed in diverse and dynamic environments, where both physical layouts and social contexts change over time and across locations. Code: https://github.com/Cavendish518/LE-Nav

4 months, 3 weeks назад @ paperswithcode.com
/MatteoFasulo/ AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles
/MatteoFasulo/ AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

4 months, 3 weeks назад @ paperswithcode.com
/VCA-EPFL/ SystolicAttention: Fusing FlashAttention within a Single Systolic Array
/VCA-EPFL/ SystolicAttention: Fusing FlashAttention within a Single Systolic Array

The frequent data swaps between the systolic array and external vector units result in low systolic array utilization. Code: https://github.com/VCA-EPFL/FSA

4 months, 3 weeks назад @ paperswithcode.com
/Buddhi19/ Precision Spatio-Temporal Feature Fusion for Robust Remote Sensing Change Detection
/Buddhi19/ Precision Spatio-Temporal Feature Fusion for Robust Remote Sensing Change Detection

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

4 months, 3 weeks назад @ paperswithcode.com
Papers With Code Papers With Code
последний пост 4 months, 3 weeks назад
/fudanvi/ Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning
/fudanvi/ Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

4 months, 3 weeks назад @ paperswithcode.com
/benedekrozemberczki/ PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training
/benedekrozemberczki/ PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training

Spatiotemporal graph neural networks (ST-GNNs) are powerful tools for modeling spatial and temporal data dependencies. Code: https://github.com/benedekrozemberczki/pytorch_geometric_temporal

4 months, 3 weeks назад @ paperswithcode.com
/chengxuphd/ DCR: Quantifying Data Contamination in LLMs Evaluation
/chengxuphd/ DCR: Quantifying Data Contamination in LLMs Evaluation

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

4 months, 3 weeks назад @ paperswithcode.com
/gitter-lab/ Assay2Mol: large language model-based drug design using BioAssay context
/gitter-lab/ Assay2Mol: large language model-based drug design using BioAssay context

Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. Code: https://github.com/gitter-lab/Assay2Mol

4 months, 3 weeks назад @ paperswithcode.com
/hayatkhan8660-maker/ DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition
/hayatkhan8660-maker/ DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition

We employ forward Kullback-Leibler (KL) divergence alongside spatio-temporal focal modulation to effectively transfer both local and global context from the Video-FocalNet Base (teacher) to the proposed VFL-Net (student). Code: https://github.com/hayatkhan8660-maker/DVFL-Net

4 months, 3 weeks назад @ paperswithcode.com
/JudyJuezhuLong/ Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows
/JudyJuezhuLong/ Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

4 months, 3 weeks назад @ paperswithcode.com
/joaojcorreia/ A Fuzzy Approach to Project Success: Measuring What Matters
/joaojcorreia/ A Fuzzy Approach to Project Success: Measuring What Matters

This paper introduces a novel approach to project success evaluation by integrating fuzzy logic into an existing construct. Code: https://github.com/joaojcorreia/FuzzyLogic_ProjectSuccess

4 months, 3 weeks назад @ paperswithcode.com
/kunkunlin1221/ InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing
/kunkunlin1221/ InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing

Extensive experiments demonstrate the effectiveness of InstructFLIP by outperforming SOTA models in accuracy and substantially reducing training redundancy across diverse domains in FAS. Code: https://github.com/kunkunlin1221/InstructFLIP

4 months, 3 weeks назад @ paperswithcode.com
/Linvyl/ Describe Anything Model for Visual Question Answering on Text-rich Images
/Linvyl/ Describe Anything Model for Visual Question Answering on Text-rich Images

Recent progress has been made in region-aware vision-language modeling, particularly with the emergence of the Describe Anything Model (DAM). Code: https://github.com/Linvyl/DAM-QA

4 months, 3 weeks назад @ paperswithcode.com
/abhijeet3922/ Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker
/abhijeet3922/ Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker

We propose multi-step custom implementation utilizing widely adopted hybrid search (metadata & embedding) and state of the art late interaction re-ranker to retrieve best matching pages. Code: https://github.com/abhijeet3922/vision-RAG

4 months, 3 weeks назад @ paperswithcode.com
/ziangcao0312/ PhysX: Physical-Grounded 3D Asset Generation
/ziangcao0312/ PhysX: Physical-Grounded 3D Asset Generation

3D modeling is moving from virtual to physical. Code: https://github.com/ziangcao0312/PhysX

4 months, 3 weeks назад @ paperswithcode.com
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Code: https://github.com/henry123-boy/SpaTrackerV2

4 months, 3 weeks назад @ paperswithcode.com
/cncs-fit/ Emergence of Functionally Differentiated Structures via Mutual Information Optimization in Recurrent Neural Networks
/cncs-fit/ Emergence of Functionally Differentiated Structures via Mutual Information Optimization in Recurrent Neural Networks

Analysis of network performance, correlation patterns, and weight matrices reveals that mutual information minimization yields high task performance alongside clear functional modularity and moderate structural modularity. Code: https://github.com/cncs-fit/mio_rnn

4 months, 3 weeks назад @ paperswithcode.com
/coswindywang/ Making Language Model a Hierarchical Classifier and Generator
/coswindywang/ Making Language Model a Hierarchical Classifier and Generator

Language heads of the last layer are copied to different selected intermediate layers, and fine-tuned with different task inputs. Code: https://github.com/coswindywang/HdLM

4 months, 3 weeks назад @ paperswithcode.com
/ahmedehabb/ From Roots to Rewards: Dynamic Tree Reasoning with RL
/ahmedehabb/ From Roots to Rewards: Dynamic Tree Reasoning with RL

Modern language models address complex questions through chain-of-thought (CoT) reasoning (Wei et al., 2023) and retrieval augmentation (Lewis et al., 2021), yet struggle with error propagation and knowledge integration. Code: https://github.com/ahmedehabb/From-Roots-to-Rewards-Dynamic-Tree-Reasoning-with-RL

4 months, 3 weeks назад @ paperswithcode.com
💼 University and corporation labs
DeepMind DeepMind
последний пост 10 часов назад
Improved Gemini audio models for powerful voice experiences
Improved Gemini audio models for powerful voice experiences Improved Gemini audio models for powerful voice experiences

Earlier this week, we introduced greater control over audio generation with an upgrade to our Gemini 2.5 Pro and Flash Text-to-Speech models.

Today, we’re releasing an updated Gemini 2.5 Flash Native Audio for live voice agents.

Gemini 2.5 Flash Native Audio is now available across Google products including Google AI Studio, Vertex AI, and has also started rolling out in Gemini Live and Search Live, bringing the naturalness of native audio to Search Live for the first time.

Beyond powering helpful agents, native audio unlocks new possibilities for global communication.

Live Voice Agents

10 часов назад @ blog.google
Deepening our partnership with the UK AI Security Institute
Deepening our partnership with the UK AI Security Institute Deepening our partnership with the UK AI Security Institute

Today, we're announcing an expanded partnership with the UK AI Security Institute (AISI) through a new Memorandum of Understanding focused on foundational security and safety research, to help ensure artificial intelligence is developed safely and benefits everyone.

The research partnership with AISI is an important part of our broader collaboration with the UK government on accelerating safe and beneficial AI progress.

This is why we have partnered with the UK AISI since its inception in November 2023 to test our most capable models.

We are actively working with AISI to build more robust evaluations for AI models, and our teams have collaborated on safety research to move the field forward…

2 days, 4 hours назад @ deepmind.google
Strengthening our partnership with the UK government to support prosperity and security in the AI era
Strengthening our partnership with the UK government to support prosperity and security in the AI era Strengthening our partnership with the UK government to support prosperity and security in the AI era

The UK has already laid a strong foundation to seize this moment and is uniquely positioned to translate AI innovation into public benefit.

That’s why we are excited to deepen our collaboration with the UK government to accelerate this work and offer a blueprint for other countries.

Accelerating access to frontier AI in key sectors: Science & EducationOur partnership will center on providing access to frontier AI in two areas foundational to the UK’s long-term success: scientific discovery and education.

The UK has a rich history of applying new technologies to drive scientific progress, from Hooke’s microscope to Faraday’s electrical experiments.

Establishing Google DeepMind’s first automa…

2 days, 13 hours назад @ deepmind.google
FACTS Benchmark Suite: Systematically evaluating the factuality of large language models
FACTS Benchmark Suite: Systematically evaluating the factuality of large language models FACTS Benchmark Suite: Systematically evaluating the factuality of large language models

The FACTS Benchmark SuiteToday, we’re teaming up with Kaggle to introduce the FACTS Benchmark Suite.

A Search Benchmark that tests a model’s ability to use Search as a tool to retrieve information and synthesize it correctly.

Similar to our previous release, we are following standard industry practice and keeping an evaluation set held-out as a private set.

The FACTS Benchmark Suite Score (or FACTS Score) is calculated as the average accuracy of both public and private sets across the four benchmarks.

Kaggle will oversee the management of the FACTS Benchmark Suite.

3 days, 17 hours назад @ deepmind.google
Engineering more resilient crops for a warming climate
Engineering more resilient crops for a warming climate Engineering more resilient crops for a warming climate

Scientists are using AlphaFold in their research to strengthen an enzyme that’s vital to photosynthesis, paving the way for more heat-tolerant crops.

As global warming accompanies more droughts and heatwaves, harvests of some staple crops are shrinking.

But less visible is what is happening inside these plants, where high heat can break down the molecular machinery that keeps them alive.

Plants use photosynthesis to produce the glucose that fuels their growth via an intricate choreography of enzymes inside plant cells.

"Our job is to learn from those examples and build that same resilience into the crops we depend on."

1 week, 1 day назад @ deepmind.google
AlphaFold: Five years of impact
AlphaFold: Five years of impact AlphaFold: Five years of impact

They used AlphaFold alongside comparative genomics to better understand how plants perceive changes in their environment, paving the way for more resilient crops.

AlphaFold has been cited in more than 35,000 papers and more than 200,000 papers incorporated elements of AlphaFold 2 in their methodology.

An independent analysis of AlphaFold 2’s impact, carried out by the Innovation Growth Lab, suggests that researchers using AlphaFold 2 see an increase of over 40% in their submission of novel experimental protein structures.

Those protein structures are more likely to be dissimilar to known structures, encouraging the exploration of uncharted areas of science.

The AlphaFold Server is empowerin…

2 weeks, 3 days назад @ deepmind.google
Revealing a key protein behind heart disease
Revealing a key protein behind heart disease Revealing a key protein behind heart disease

Both have a family history of heart disease – a reminder of what’s at stake in their work to better understand and ultimately help treat this deadly condition.

That protein, apoB100, has defied mapping not only because it’s enormous (for a protein), but also because it connects to fats and other molecules in complicated ways.

ApoB100 forms the molecular scaffold of “bad cholesterol”, which is known to scientists as low-density lipoprotein (LDL).

Discovering the structure of its key protein promised to shed light on how bad cholesterol becomes harmful inside the body, giving scientists a better chance to develop ways to prevent and treat ASCVD.

The images weren’t sharp enough to map the stru…

2 weeks, 3 days назад @ deepmind.google
How we’re bringing AI image verification to the Gemini app
How we’re bringing AI image verification to the Gemini app How we’re bringing AI image verification to the Gemini app

At Google, we’ve long invested in ways to provide you with helpful context about information you see online.

Now, as generative media becomes increasingly prevalent and high-fidelity, we are deploying tools to help you more easily determine whether the content you're interacting with was created or edited using AI.

Starting today, we’re making it easier for everyone to verify if an image was generated with or edited by Google AI right in the Gemini app, using SynthID, our digital watermarking technology that embeds imperceptible signals into AI-generated content.

Since then, over 20 billion AI-generated pieces of content have been watermarked using SynthID, and we have been testing our Synt…

3 weeks, 1 day назад @ blog.google
Build with Nano Banana Pro, our Gemini 3 Pro Image model
Build with Nano Banana Pro, our Gemini 3 Pro Image model Build with Nano Banana Pro, our Gemini 3 Pro Image model

Today, we’re releasing Nano Banana Pro (Gemini 3 Pro Image), a higher-fidelity model built on Gemini 3 Pro for developers to access studio-quality image generation.

This follows our release of Nano Banana (Gemini 2.5 Flash Image) just a few months ago.

Since then, we’ve loved seeing the community put its key features to work — from character consistency to photo restoration, and even using its capabilities to make local edits in an infinite canvas.

This state-of-the-art image generation and editing model is starting to roll out in paid preview to build a new wave of intelligent, multimodal applications with the Gemini API in Google AI Studio and Vertex AI for enterprises.

This model unlocks…

3 weeks, 1 day назад @ blog.google
Introducing Nano Banana Pro
Introducing Nano Banana Pro Introducing Nano Banana Pro

How Nano Banana Pro helps you bring any idea or design to lifeNano Banana Pro can help you visualize any idea and design anything — from prototypes, to representing data as infographics, to turning handwritten notes into diagrams.

With Nano Banana Pro, now you can:Generate more accurate, context-rich visuals based on enhanced reasoning, world knowledge and real-time informationWith Gemini 3’s advanced reasoning, Nano Banana Pro doesn’t just create beautiful images, it also helps you create more helpful content.

You can get accurate educational explainers to learn more about a new subject, like context-rich infographics and diagrams based on the content you provide or facts from the real wor…

3 weeks, 1 day назад @ blog.google
Start building with Gemini 3
Start building with Gemini 3 Start building with Gemini 3

Google AntigravityTo advance how the model and IDE work together, we’re introducing Google Antigravity to showcase what’s possible with Gemini 3.

It’s a faster way to develop: you act as the architect, collaborating with intelligent agents that operate autonomously across the editor, terminal, and browser.

These agents plan and execute complex software tasks, communicating their work with the user via detailed artifacts.

This elevates all aspects of development, from building features, UI iteration, and fixing bugs to researching and generating reports.

Visit the Google Antigravity website to download the public preview at no charge, now available for MacOS, Windows and Linux.

3 weeks, 3 days назад @ blog.google
We’re expanding our presence in Singapore to advance AI in the Asia-Pacific region
We’re expanding our presence in Singapore to advance AI in the Asia-Pacific region We’re expanding our presence in Singapore to advance AI in the Asia-Pacific region

Advancing Gemini and frontier AI impactOur growing team in Singapore will consist of exceptional research scientists, software engineers, and AI impact experts focused on critical areas of research and development.

Regarding multilinguality: We have collaborated with AI Singapore to launch Project Aquarium, an open data platform for Southeast Asian Languages.

We have collaborated with AI Singapore to launch Project Aquarium, an open data platform for Southeast Asian Languages.

These examples reflect what's possible when cutting-edge AI research meets Singapore's forward-looking innovation and strong public purpose.

Through our new AI research lab, we will continue collaborating with the reg…

3 weeks, 3 days назад @ deepmind.google
A new era of intelligence with Gemini 3
A new era of intelligence with Gemini 3 A new era of intelligence with Gemini 3

Every generation of Gemini has built on the last, enabling you to do more.

And starting today, we’re shipping Gemini at the scale of Google.

That includes Gemini 3 in AI Mode in Search with more complex reasoning and new dynamic experiences.

This is the first time we are shipping Gemini in Search on day one.

Gemini 3 is also coming today to the Gemini app, to developers in AI Studio and Vertex AI, and in our new agentic development platform, Google Antigravity — more below.

3 weeks, 3 days назад @ blog.google
Introducing Google Antigravity
Introducing Google Antigravity Introducing Google Antigravity 3 weeks, 3 days назад @ antigravity.google
WeatherNext 2: Our most advanced weather forecasting model
WeatherNext 2: Our most advanced weather forecasting model WeatherNext 2: Our most advanced weather forecasting model

In recent years, artificial intelligence (AI) has dramatically enhanced what’s possible in weather forecasting and the ways in which we can use it.

Today, Google DeepMind and Google Research are introducing WeatherNext 2, our most advanced and efficient forecasting model.

Using this technology, we’ve supported weather agencies in making decisions based on a range of scenarios through our experimental cyclone predictions.

By incorporating WeatherNext technology, we’ve now upgraded weather forecasts in Search, Gemini, Pixel Weather and Google Maps Platform’s Weather API.

In the coming weeks, it will also help power weather information in Google Maps.

3 weeks, 4 days назад @ blog.google
Google
последний пост 11 часов назад
A developer's guide to Gemini Live API in Vertex AI
A developer's guide to Gemini Live API in Vertex AI A developer's guide to Gemini Live API in Vertex AI

Today, we announced the general availability of Gemini Live API on Vertex AI, which is powered by the latest Gemini 2.5 Flash Native Audio model.

In this post we'll look at two templates and three reference demos that help you understand how to best use Gemini Live API.

Gemini Live API fundamentally changes the engineering approach with a unified, low-latency, native audio architecture.

Native audio processing: Gemini 2.5 Flash Native Audio model processes raw audio natively through a single, low-latency model.

Next-generation conversation featuresGemini Live API gives you a suite of production-ready features that define a new standard for AI agents:

11 часов назад @ cloud.google.com
How to connect Looker to Gemini Enterprise in minutes, with MCP Toolbox and ADK
How to connect Looker to Gemini Enterprise in minutes, with MCP Toolbox and ADK How to connect Looker to Gemini Enterprise in minutes, with MCP Toolbox and ADK

We can all agree that the quality of AI-driven answers relies on the consistency of the underlying data.

Building off the recent introduction of Looker’s Model Context Protocol (MCP) server, in this blog we take you through the process of creating an Agent Development Kit (ADK) agent that is connected to Looker via the MCP Toolbox for Databases and exposing it within Gemini Enterprise.

Instead of managing tool logic and authentication themselves, agents act as MCP clients and request tools from the Toolbox.

The MCP Toolbox handles all the underlying complexities, including secure connections to Looker, authentication and query execution.

The MCP Toolbox for Databases natively supports Looke…

11 часов назад @ cloud.google.com
Cloud CISO Perspectives: Our 2026 Cybersecurity Forecast report
Cloud CISO Perspectives: Our 2026 Cybersecurity Forecast report Cloud CISO Perspectives: Our 2026 Cybersecurity Forecast report

Marina Kaganovich, executive trust leadThe heightened capability of agentic AI to take actions and execute tasks autonomously elevates the importance of cybersecurity basics.

Vesselin Tzvetkov, senior cybersecurity advisorAs Francis noted, agentic security operations are set to become the standard for modern SOCs, dramatically enhancing the speed and capabilities of security organizations.

Vinod D’Souza, director, manufacturing and industryIn 2026, agentic AI will help the manufacturing and industrial sector cross the critical threshold from static automation to true autonomy.

By rooting security strategies in data-centered Zero Trust, organizations stop treating security as a gatekeeper an…

11 часов назад @ cloud.google.com
Gemini Live API Now GA on Vertex AI
Gemini Live API Now GA on Vertex AI Gemini Live API Now GA on Vertex AI

Today, we are excited to announce that Gemini Live API, powered by the latest Gemini 2.5 Flash Native Audio model, is generally available on Vertex AI.

A new standard with real-time multimodal AI agentsGemini Live API represents a new standard for bringing AI to life.

Deploying on Vertex AI allows you to leverage our expanding global infrastructure across multiple regions, delivering reliability for your users.

Building real-world impact with Gemini Live APIThe true power of Gemini Live API is demonstrated by the companies who are using it today to redefine their customer experiences.

Shopify, the leading global commerce platform, developed Sidekick, a multimodal AI assistant powered by Gem…

12 часов назад @ cloud.google.com
How we built a multi-agent system for superior business forecasting
How we built a multi-agent system for superior business forecasting How we built a multi-agent system for superior business forecasting

This innovative solution combines two powerful, specialized AI agents: a prediction agent built by Google Cloud and App Orchid’s Data Agent offering.

Google prediction agent - The forecasting powerhouseThe prediction agent, which is primarily the custom engineering work of Google Cloud, is the system’s window to the future.

App Orchid Data Agent - The enterprise intelligence data expertAccurate predictions depend on high-quality, AI-ready data, which is where App Orchid’s Data Agent excels.

The combined business forecasting agentAt the heart of the solution is a unified business forecasting agent, which brings together the capabilities of our unique prediction and data agents in a discrete …

1 day, 11 hours назад @ cloud.google.com
AI agents are here. Is your infrastructure ready?
AI agents are here. Is your infrastructure ready? AI agents are here. Is your infrastructure ready?

In a recent IDC global survey of over 1,300 AI decision-makers, inference was already cited as the largest AI workload segment, accounting for 47% of all AI operations.

This surge in demand is exposing a critical vulnerability for many organizations: the AI efficiency gap.

The TCO crisis in an age of agentsThe AI efficiency gap is the difference between the theoretical performance of an AI stack and the actual, real-world performance achieved.

That is why we created AI Hypercomputer: an integrated supercomputer system designed to deliver exceptional performance and efficiency for demanding AI workloads.

Get your free copy of the whitepaper to learn more: The AI Efficiency Gap: From TCO Cris…

1 day, 11 hours назад @ cloud.google.com
Announcing MCP support in Apigee: Turn existing APIs into secure and governed agentic tools
Announcing MCP support in Apigee: Turn existing APIs into secure and governed agentic tools Announcing MCP support in Apigee: Turn existing APIs into secure and governed agentic tools

When a tools/list or tools/call request is made to the MCP endpoint, Apigee uses the operations documented in the OpenAPI spec as the MCP tools list.

And, with the recent launch of Apigee API insights, you can also use the new “Insights” tab in Apigee API hub’s catalog to view traffic and performance metrics for your MCP endpoints.

Benefits of Apigee’s approach to MCP supportOur main goal with MCP support in Apigee is to make sure that you can secure, govern, and monitor usage of MCP tools with the same policies and workflows in Apigee that you’re already familiar with.

Centralized tool catalog: After you deploy an MCP proxy, Apigee automatically registers your MCP endpoint in Apigee API hu…

2 days, 13 hours назад @ cloud.google.com
Announcing Model Context Protocol (MCP) support for Google services
Announcing Model Context Protocol (MCP) support for Google services Announcing Model Context Protocol (MCP) support for Google services

Today we’re announcing the release of fully-managed, remote MCP servers.

Google’s existing API infrastructure is now enhanced to support MCP, providing a unified layer across all Google and Google Cloud services.

Developers can now simply point their AI agents or standard MCP clients like Gemini CLI and AI Studio to a globally-consistent and enterprise-ready endpoint for Google and Google Cloud services.

With the new Cloud API Registry and Apigee API Hub, developers can find trusted MCP tools from Google and their own organizations, respectively.

We pair this ease of discovery with rigorous control: administrators can manage access via Google Cloud IAM, rely on audit logging for observabili…

2 days, 13 hours назад @ cloud.google.com
AlphaEvolve on Google Cloud: AI for agentic discovery and optimization
AlphaEvolve on Google Cloud: AI for agentic discovery and optimization AlphaEvolve on Google Cloud: AI for agentic discovery and optimization

To help you overcome this challenge, we are releasing AlphaEvolve, a Gemini-powered coding agent for designing advanced algorithms, to Google Cloud, in private preview..

How AlphaEvolve can help businesses across industriesYou can apply this same engine to your own proprietary data and unique algorithmic challenges.

Get started on Google CloudAlphaEvolve is made to help with complex optimization problems that you can define in code and objectively measure.

The AlphaEvolve Service API is now available through an Early access program with Google Cloud.

If you have one of these problems and are interested in participating in the Early Access Program, please reach out to your Google Cloud Repre…

3 days, 11 hours назад @ cloud.google.com
From adoption to impact: Putting the DORA AI Capabilities Model to work
From adoption to impact: Putting the DORA AI Capabilities Model to work From adoption to impact: Putting the DORA AI Capabilities Model to work

From this disparity, we can conclude that how they are using AI is a critical factor.

We wanted to understand the particular capabilities and conditions that enable teams to achieve positive outcomes, leading us to develop the DORA AI Capabilities Model report.

This companion guide to the 2025 DORA Report is designed to help you navigate our new reality.

Seven capabilities that amplify successSuccessfully using AI requires cultivating your technical and cultural environment.

From the same set of respondents who participated in the 2025 DORA survey, we identified seven foundational capabilities that are proven to amplify the positive impact of AI on organizational performance:

3 days, 11 hours назад @ cloud.google.com
Using MCP with Web3: How to secure agents making blockchain transactions
Using MCP with Web3: How to secure agents making blockchain transactions Using MCP with Web3: How to secure agents making blockchain transactions

At Google Cloud, we sit at a unique intersection of two transformative technologies: AI and Web3.

However, the practical viability of this new paradigm hinges on who hosts the agent, and who holds the private key to the operations.

Most of today’s examples showcase an agent directly holding a private key, and most cryptocurrency model context protocol (MCP) servers can only be used if you configure them with a private key.

The agent-controlled modelThis model is designed for a world where users interact with agents hosted by a third party — a realistic assumption for mainstream adoption.

In this scenario, you don’t give the agent your private key.

1 week назад @ cloud.google.com
Replit is delivering enterprise-grade vibe coding with Google Cloud
Replit is delivering enterprise-grade vibe coding with Google Cloud Replit is delivering enterprise-grade vibe coding with Google Cloud

Vibe coding has been all the rage this year.

Today, Replit and Google Cloud are expanding their strategic partnership to bring vibe coding capabilities to enterprise developers and teams.

Google models, including Gemini 3, 2.5 Flash Lite, 2.5 Flash, and Imagen 4, are now supported on Replit, powering both coding and multimodal use cases — and driving significant token usage to Google Cloud.

Replit and Google Cloud will partner to help enterprise customers embrace vibe coding and help their developers be more productive — through joint go-to-market on Google Cloud Marketplace and through Google Cloud’s extensive co-sell programs.

“Our growing partnership will deliver more capabilities to Rep…

1 week, 1 day назад @ cloud.google.com
Accelerate model downloads on GKE with NVIDIA Run:ai Model Streamer
Accelerate model downloads on GKE with NVIDIA Run:ai Model Streamer Accelerate model downloads on GKE with NVIDIA Run:ai Model Streamer

The chart above shows how quickly the model streamer can fetch a 141GB Llama 3.3-7 70B model from Cloud Storage as compared to the default vLLM model loader (lower is better).

By streaming the model into GPU memory, the model streamer slashes potentially the most time-consuming part of the startup process.

Instead of waiting for an entire model to be downloaded before loading, the streamer fetches model tensors directly from object storage and streams them concurrently to GPU memory.

For workloads that rely on model parallelism— where a single model is partitioned and executed across multiple GPUs— the model streamer goes a step further.

Performance and simplicityThe latest updates to the M…

1 week, 1 day назад @ cloud.google.com
GKE Turns 10 Hackathon: Announcing the winners and highlights
GKE Turns 10 Hackathon: Announcing the winners and highlights GKE Turns 10 Hackathon: Announcing the winners and highlights

The GKE Turns 10 Hackathon was an electrifying showcase of developer ingenuity!

Building on the excitement from our initial announcement, the hackathon challenged participants to build powerful AI agents that interact with microservice applications using the robustness of Google Kubernetes Engine (GKE) and the intelligence of Google AI models like Gemini.

The goal was to seamlessly integrate next-generation agentic AI capabilities, all orchestrated on GKE, to elevate existing applications to new heights.

Grand prize winner: Amie WeiThe grand prize winner, Amie Wei, was invited to attend KubeCon + CloudNativeCon North America 2025, where she shared her experience and insights.

She highlighte…

1 week, 3 days назад @ cloud.google.com
Upskill for the holidays: Check out no-cost AI training now
Upskill for the holidays: Check out no-cost AI training now Upskill for the holidays: Check out no-cost AI training now

Google Cloud AI Infrastructure (no credits required): This learning path offers on-demand courses in AI infrastructure for intermediate to advanced learners.

Google AI Essentials (no credits required): Enhance your productivity across all roles and industries by gaining essential AI skills.

Learn directly from Google AI experts how to use AI responsibly, and earn a certificate upon completion.

AI Boost Bites | Amplify Exec Voices with AI: (no credits required): Transform your daily work with practical, hands-on lessons from Google AI experts with short, 10-minute lessons.

Future-Proof Your AI Learning Strategy (no credits required): Continuous learning is key, especially in this rapidly evo…

1 week, 3 days назад @ cloud.google.com
OpenAI
последний пост None
Microsoft Microsoft
последний пост 1 day, 11 hours назад
Agent Lightning: Adding reinforcement learning to AI agents without code rewrites
Agent Lightning: Adding reinforcement learning to AI agents without code rewrites Agent Lightning: Adding reinforcement learning to AI agents without code rewrites

To address this, a research team from Microsoft Research Asia – Shanghai has introduced Agent Lightning.

Whether it involves multiple collaborating agents or dynamic tool use, Agent Lightning breaks it down into a sequence of transitions.

Agent Lightning as middlewareAgent Lightning serves as middleware between RL algorithms and agent environments, providing with modular components that enable scalable RL through standardized protocols and well-defined interfaces.

In practice, developers can keep their existing agent frameworks and switch model calls to the Agent Lightning API without changing their agent code (Figure 5).

By bridging existing agentic systems with reinforcement learning, Age…

1 day, 11 hours назад @ microsoft.com
Promptions helps make AI prompting more precise with dynamic UI controls
Promptions helps make AI prompting more precise with dynamic UI controls Promptions helps make AI prompting more precise with dynamic UI controls

To address this, we are excited to introduce Promptions (prompt + options), a UI framework that helps developers build AI interfaces with more precise user control.

We compared the static design from the first study, called the “Static Prompt Refinement Control” (Static PRC), against a “Dynamic Prompt Refinement Control” (Dynamic PRC) with features that responded to participants’ feedback.

Comparison of user preferences for Static PRC versus Dynamic PRC across key evaluation criteria.

(1) The Option Module reads the user’s prompt and conversation history and (2) generates prompt options.

Key usability challenges include clarifying how dynamic options affect AI output and managing the comple…

2 days, 11 hours назад @ microsoft.com
GigaTIME: Scaling tumor microenvironment modeling using virtual population generated by multimodal AI
GigaTIME: Scaling tumor microenvironment modeling using virtual population generated by multimodal AI GigaTIME: Scaling tumor microenvironment modeling using virtual population generated by multimodal AI

C, Scatter plot comparing the subtype-level GigaTIME-translated virtual mIF activations between TCGA and Providence virtual populations.

To our knowledge, this is the first large-scale study exploring multimodal AI for scaling virtual mIF generation.

H, A case study showcasing the activation maps across different virtual mIF channels for a H&E slide in our virtual population, and virtual mIF of sample patches from this slide.

By applying GigaTIME to Providence real-world data, we generated a virtual population of 14,256 patients with virtual mIF and key clinical attributes.

G, Bar plot comparing pan-cancer patient stratification performance in terms of survival log rank p-values among virtu…

3 days, 12 hours назад @ microsoft.com
Ideas: Community building, machine learning, and the future of AI
Ideas: Community building, machine learning, and the future of AI Ideas: Community building, machine learning, and the future of AI

This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS.

In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

So around that time when I started my PhD at Penn, I was working in machine learning theory and algorithmic economics.

How had you experienced a lack of community or network of women in machine learning before the founding of WiML?

So particularly when working on topics related to fairness, I’ve ended up focusing a bunch on stuff to do with marginalized groups as part of my responsible AI work.

1 week, 4 days назад @ microsoft.com
Reducing Privacy leaks in AI: Two approaches to contextual integrity
Reducing Privacy leaks in AI: Two approaches to contextual integrity Reducing Privacy leaks in AI: Two approaches to contextual integrity

The theory of contextual integrity frames privacy as the appropriateness of information flow within specific social contexts.

Each tackles contextual integrity from a different angle, but both aim to build directly into AI systems a greater sensitivity to information-sharing norms.

Contextual Integrity in LLMs via Reasoning and Reinforcement Learning, accepted at NeurIPS 2025, takes a different approach to applying contextual integrity.

Contextual integrity through reasoning and reinforcement learningIn our second paper, we explore whether contextual integrity can be built into the model itself rather than enforced through external checks at inference time.

To address this trade-off, we int…

2 weeks, 3 days назад @ microsoft.com
Fara-7B: An Efficient Agentic Model for Computer Use
Fara-7B: An Efficient Agentic Model for Computer Use Fara-7B: An Efficient Agentic Model for Computer Use

Today, we are pleased to announce Fara-7B, our first agentic SLM designed specifically for computer use.

Unlike traditional chat models that generate text-based responses, Computer Use Agent (CUA) models like Fara-7B leverage computer interfaces, such as a mouse and keyboard, to complete tasks on behalf of users.

This results in reduced latency and improved privacy, as user data remains local.

Fara-7B breaks ground on a new pareto frontier, showing that on-device computer use agents are approaching the capabilities of frontier models.

For guidance on how to use our model safely, and the security considerations to be mindful of when using our model, please refer to our Model card (opens in n…

2 weeks, 4 days назад @ microsoft.com
MMCTAgent: Enabling multimodal reasoning over large video and image collections
MMCTAgent: Enabling multimodal reasoning over large video and image collections MMCTAgent: Enabling multimodal reasoning over large video and image collections

Real-world reasoning increasingly involves analyzing long-form video content, where context spans minutes or hours, far beyond the context limits of most models.

The Planner agent decomposes a user query, identifies the appropriate reasoning tools, performs multimodal operations, and drafts a preliminary answer.

MMCTAgent’s Planner–Critic architecture enables multimodal reasoning over long-form video through structured ingestion, retrieval, and iterative feedback.

The VideoAgent extends this architecture to long-form video reasoning.

Takeaways and next stepsMMCTAgent demonstrates a scalable agentic approach to multimodal reasoning with a Planner–Critic architecture.

1 month назад @ microsoft.com
BlueCodeAgent: A blue teaming agent enabled by automated red teaming for CodeGen AI
BlueCodeAgent: A blue teaming agent enabled by automated red teaming for CodeGen AI BlueCodeAgent: A blue teaming agent enabled by automated red teaming for CodeGen AI

Many studies have explored red teaming code LLMs, testing whether the models can reject unsafe requests and whether their generated code exhibits insecure patterns.

Knowledge-enhanced blue teaming: Building on the foundation of red-teaming knowledge, BlueCodeAgent significantly improves blue-teaming performance by leveraging constitutions derived from knowledge and dynamic testing.

Generalization to seen and unseen risks: Empowered by comprehensive red-teaming knowledge, BlueCodeAgent generalizes effectively to unseen risks.

A blue teaming agent enabled by red teamingFigure 2: Overview of BlueCodeAgent, an end-to-end blue teaming framework powered by automated red teaming for code security.…

1 month назад @ microsoft.com
When industry knowledge meets PIKE-RAG: The innovation behind Signify’s customer service boost
When industry knowledge meets PIKE-RAG: The innovation behind Signify’s customer service boost When industry knowledge meets PIKE-RAG: The innovation behind Signify’s customer service boost

Spotlight: Event Series Microsoft Research Forum Join us for a continuous exchange of ideas about research in the era of general AI.

These differentiated advantages stem from PIKE-RAG’s unique approach to understanding and processing professional knowledge.

“It’s also worth noting that the researchers at Microsoft Research Asia demonstrated strong industry knowledge and rigorous scientific methodology.

Through this collaboration, we validated that PIKE-RAG’s general approach can greatly improve the accuracy of professional knowledge Q&A and accelerate scenario customization.

Our researchers also gained valuable experience in handling domain-specific data,” explained Jiang Bian, partner rese…

1 month назад @ microsoft.com
Magentic Marketplace: an open-source simulation environment for studying agentic markets
Magentic Marketplace: an open-source simulation environment for studying agentic markets Magentic Marketplace: an open-source simulation environment for studying agentic markets

To help navigate this uncertainty, we built Magentic Marketplace (opens in new tab)— an open-source simulation environment for exploring the numerous possibilities of agentic markets and their societal implications at scale.

To explore these dynamics in depth, the Magentic Marketplace platform enables controlled experimentation across diverse agentic marketplace scenarios.

With Magentic Marketplace, researchers can model how agents representing customers and businesses interact—shedding light on the dynamics that could shape future digital markets.

Magentic Marketplace includes two agent types: Assistant Agents (customers) and Service Agents (businesses).

Unlike traditional markets, which d…

1 month, 1 week назад @ microsoft.com
RedCodeAgent: Automatic red-teaming agent against diverse code agents
RedCodeAgent: Automatic red-teaming agent against diverse code agents RedCodeAgent: Automatic red-teaming agent against diverse code agents

In the context of code, effective red-teaming requires more than simply checking whether the target code agent rejects unsafe requests.

After the second request was rejected by the code agent, RedCodeAgent invoked both Code Substitution and GCG to optimize the prompt.

Ultimately, RedCodeAgent successfully combined the suggestion from Code Substitution (i.e., using pathlib) with the adversarial suffix generated by GCG, making the target code agent delete the specified file.

In the context of code, it is not enough for the target code agent to simply avoid rejecting the request; the target code agent must also generate and execute code that performs the intended function.

Quantitatively, we f…

1 month, 1 week назад @ microsoft.com
Tell me when: Building agents that can wait, monitor, and act
Tell me when: Building agents that can wait, monitor, and act Tell me when: Building agents that can wait, monitor, and act

This matters because monitoring tasks are everywhere.

To address this, we are introducing SentinelStep (opens in new tab), a mechanism that enables agents to complete long-running monitoring tasks.

Most real-world monitoring tasks share this limitation, making systematic bench marking very challenging.

In response, we are developing SentinelBench, a suite of synthetic web environments for evaluating monitoring tasks.

By embedding patience into plans, agents can responsibly monitor conditions and act when it matters—staying proactive without wasting resources.

1 month, 3 weeks назад @ microsoft.com
Ideas: More AI-resilient biosecurity with the Paraphrase Project
Ideas: More AI-resilient biosecurity with the Paraphrase Project Ideas: More AI-resilient biosecurity with the Paraphrase Project

Today, I’m excited to talk about the Paraphrase Project, an effort I co-led exploring how advances in AI tools for protein design might impact biosecurity.

These “patches,” akin to those in cybersecurity, have now been shared with organizations globally to strengthen biosecurity screening.

The project highlights that the same AI tools capable of incredible good can also be misused, requiring us to be vigilant, thoughtful, and creative so we continue to get the most benefit out of AI tools while working to ensure that we avoid costly misuses.

So things like, how similar is this to that template, wild-type protein structure that we used as our conditioning information?

But I feel like broadly…

2 months, 1 week назад @ microsoft.com
Ideas: More AI-resilient biosecurity with the Paraphrase Project
Ideas: More AI-resilient biosecurity with the Paraphrase Project Ideas: More AI-resilient biosecurity with the Paraphrase Project

Today, I’m excited to talk about the Paraphrase Project, an effort I co-led exploring how advances in AI tools for protein design might impact biosecurity.

These “patches,” akin to those in cybersecurity, have now been shared with organizations globally to strengthen biosecurity screening.

The project highlights that the same AI tools capable of incredible good can also be misused, requiring us to be vigilant, thoughtful, and creative so we continue to get the most benefit out of AI tools while working to ensure that we avoid costly misuses.

So things like, how similar is this to that template, wild-type protein structure that we used as our conditioning information?

But I feel like broadly…

2 months, 1 week назад @ microsoft.com
When AI Meets Biology: Promise, Risk, and Responsibility
When AI Meets Biology: Promise, Risk, and Responsibility When AI Meets Biology: Promise, Risk, and Responsibility

In computer-based studies, we found that AI protein design (AIPD) tools could generate modified versions of proteins of concern, such as ricin.

Azure AI Foundry Labs Get a glimpse of potential future directions for AI, with these experimental technologies from Microsoft Research.

Stratified tiers of information : Data and code are classified into several tiers according to their potential hazard, from low-risk summaries through sensitive technical data to critical software pipelines.

The Age of AI in the Life Sciences: Benefits and Biosecurity Considerations, National Academies of Science, Engineering, and Medicine, 2025.

(opens in new tab)Protecting scientific integrity in an age of genera…

2 months, 1 week назад @ microsoft.com
MIT AI MIT AI
последний пост 8 часов назад
Enabling small language models to solve complex reasoning tasks
Enabling small language models to solve complex reasoning tasks Enabling small language models to solve complex reasoning tasks

As language models (LMs) improve at tasks like image generation, trivia questions, and simple math, you might think that human-like reasoning is around the corner.

Small LMs can’t do this reliably on their own; large language models (LLMs) sometimes can, particularly if they’re optimized for reasoning tasks, but they take a while to respond, and they use a lot of computing power.

Then, the LLM relays these instructions and guidelines in a clear way to smaller models.

For instance, whereas existing reasoning models like OpenAI’s o1 perform reasoning in text, DisCIPL “reasons” by writing Python code, which is more compact.

DisCIPL’s efficiency gains stem partly from using small Llama models a…

8 часов назад @ news.mit.edu
New MIT program to train military leaders for the AI age
New MIT program to train military leaders for the AI age New MIT program to train military leaders for the AI age

Artificial intelligence can enhance decision-making and enable action with reduced risk and greater precision, making it a critical tool for national security.

“The potential for artificial intelligence is just starting to be fully realized.

The 2N6 curriculum is application focused, and the content is built to satisfy the U.S. Navy’s sub-specialty code for Applied Artificial Intelligence.

“The admiral made the connection, envisioning an applied AI program similar to 2N.”2N6 will run as a pilot program for at least two years.

The program’s first cohort will comprise only U.S. Navy officers, with plans to expand more broadly.

10 часов назад @ news.mit.edu
New method improves the reliability of statistical estimations
New method improves the reliability of statistical estimations New method improves the reliability of statistical estimations

Let’s say an environmental scientist is studying whether exposure to air pollution is associated with lower birth weights in a particular county.

In simulations and experiments with real data, their method was the only technique that consistently generated accurate confidence intervals.

Finally, they assume the source data are similar to the target data where one wants to estimate.

A smooth solutionThe new method for generating confidence intervals explicitly accounts for this potential bias.

When they compared their method to other common techniques, they found it was the only one that could consistently produce reliable confidence intervals for spatial analyses.

23 часа назад @ news.mit.edu
New materials could boost the energy efficiency of microelectronics
New materials could boost the energy efficiency of microelectronics New materials could boost the energy efficiency of microelectronics

MIT researchers have developed a new fabrication method that could enable the production of more energy efficient electronics by stacking multiple functional components on top of one existing circuit.

This new electronics integration platform allows scientists to fabricate transistors and memory devices in one compact stack on a semiconductor chip.

Stacking active components would reduce the distance data must travel and improve a chip’s energy efficiency.

These compact memory transistors demonstrated switching speeds of only 10 nanoseconds, hitting the limit of the team’s measurement instruments.

In the future, they want to build upon these demonstrations by integrating back-end memory tra…

1 day, 23 hours назад @ news.mit.edu
MIT affiliates named 2025 Schmidt Sciences AI2050 Fellows
MIT affiliates named 2025 Schmidt Sciences AI2050 Fellows MIT affiliates named 2025 Schmidt Sciences AI2050 Fellows

Two current MIT affiliates and seven additional alumni are among those named to the 2025 cohort of AI2050 Fellows.

Zongyi Li, a postdoc in the MIT Computer Science and Artificial Intelligence Lab, and Tess Smidt ’12, an associate professor of electrical engineering and computer science (EECS), were both named as AI2050 Early Career Fellows.

He received his PhD in computing and mathematical sciences from Caltech, where he was advised by Anima Anandkumar and Andrew Stuart.

Li's work has been supported by a Kortschak Scholarship, PIMCO Fellowship, Amazon AI4Science Fellowship, Nvidia Fellowship, and MIT-Novo Nordisk AI Fellowship.

Besides the AI2050 fellowship, she has received an Air Force Yo…

4 days, 8 hours назад @ news.mit.edu
MIT researchers “speak objects into existence” using AI and robotics
MIT researchers “speak objects into existence” using AI and robotics MIT researchers “speak objects into existence” using AI and robotics

“We’re connecting natural language processing, 3D generative AI, and robotic assembly,” says Alexander Htet Kyaw, an MIT graduate student and Morningside Academy for Design (MAD) fellow.

Generative AI and robotics are moving us ever closer to the day when we can ask for an object and have it created within a few minutes.

In fact, MIT researchers have developed a speech-to-reality system, an AI-driven workflow that allows them to provide input to a robotic arm and “speak objects into existence,” creating things like furniture in as little as five minutes.

This is followed by creation of a feasible assembly sequence and automated path planning for the robotic arm to assemble physical objects …

1 week назад @ news.mit.edu
Robots that spare warehouse workers the heavy lifting
Robots that spare warehouse workers the heavy lifting Robots that spare warehouse workers the heavy lifting

The company’s unloading robots combine generative AI and machine-learning algorithms with sensors, cameras, and machine-vision software to navigate new environments on day one and improve performance over time.

The Pickle Robot Company wants its machines to do the heavy lifting.

The robots can unload anywhere from 400 to 1,500 cases per hour depending on size and weight.

“Our immediate product roadmap is load and unload,” Meyer says.

What does it mean for the robot unloading a truck to talk to the robot palletizing, or for the forklift to talk to the inventory drone?

1 week назад @ news.mit.edu
A smarter way for large language models to think about hard problems
A smarter way for large language models to think about hard problems A smarter way for large language models to think about hard problems

To make large language models (LLMs) more accurate when answering harder questions, researchers can let the model spend more time thinking about potential solutions.

To address this, MIT researchers developed a smarter way to allocate computational effort as the LLM solves a problem.

In addition, their method allows smaller, less resource-intensive LLMs to perform as well as or even better than larger models on complex problems.

Computation for contemplationA recent approach called inference-time scaling lets a large language model take more time to reason about difficult problems.

A separate model, known as a process reward model (PRM), scores each potential solution or reasoning path.

1 week, 1 day назад @ news.mit.edu
MIT engineers design an aerial microrobot that can fly as fast as a bumblebee
MIT engineers design an aerial microrobot that can fly as fast as a bumblebee MIT engineers design an aerial microrobot that can fly as fast as a bumblebee

MIT researchers have demonstrated aerial microrobots that can fly with speed and agility that is comparable to their biological counterparts.

So far, aerial microrobots have only been able to fly slowly along smooth trajectories, far from the swift, agile flight of real insects — until now.

Like real insects, these robots could flit through tight spaces larger robots can’t reach, while simultaneously dodging stationary obstacles and pieces of falling rubble.

We need to have robust flight control,” How says.

This is exciting because it points toward future insect-scale robots with agility approaching that of their biological counterparts,” she adds.

1 week, 2 days назад @ news.mit.edu
Helping power-system planners prepare for an unknown future
Helping power-system planners prepare for an unknown future Helping power-system planners prepare for an unknown future

“The components each describe basic actions in an energy system: transfer, storage, transformation, and entering or exiting the network,” explains Macdonald.

This flexibility has led other research groups to begin using Macro for their own projects.

It also provides an added benefit when it comes to power system planning.

As with the global climate simulator, using Macro to perform a complete analysis of a proposed policy can take days.

“Then, before the legislator actually drafts the bill, the academic team would run the full Macro model to confirm the accuracy of the results from the emulator,” says Knittel.

1 week, 2 days назад @ news.mit.edu
New control system teaches soft robots the art of staying safe
New control system teaches soft robots the art of staying safe New control system teaches soft robots the art of staying safe

Soft robots, with their deformable bodies, promise a future where machines move more seamlessly alongside people, assist in caregiving, or handle delicate items in industrial settings.

This motivates the need for safe control strategies for soft robots.

In domestic settings, robots could help with chores or caregiving tasks, interacting safely with children or the elderly — a key step toward making soft robots reliable partners in real-world environments.

Looking ahead, the team plans to extend their methods to three-dimensional soft robots and explore integration with learning-based strategies.

“However, as soft robots become faster, stronger, and more capable, that may no longer be enough…

1 week, 3 days назад @ news.mit.edu
MIT Sea Grant students explore the intersection of technology and offshore aquaculture in Norway
MIT Sea Grant students explore the intersection of technology and offshore aquaculture in Norway MIT Sea Grant students explore the intersection of technology and offshore aquaculture in Norway

Two MIT students recently traveled to Trondheim, Norway to explore the cutting-edge technologies being developed and deployed in offshore aquaculture.

Beckett Devoe, a senior in artificial intelligence and decision-making, and Tony Tang, a junior in mechanical engineering, first worked with MIT Sea Grant through the Undergraduate Research Opportunities Program (UROP).

To help better understand this emerging industry, MIT Sea Grant created a collaborative initiative, AquaCulture Shock, with funding from an Aquaculture Technologies and Education Travel Grant through the National Sea Grant College Program.

Kelasidi collaborated with MIT Sea Grant director Michael Triantafyllou and professor of…

1 week, 4 days назад @ news.mit.edu
Driving American battery innovation forward
Driving American battery innovation forward Driving American battery innovation forward

Advancements in battery innovation are transforming both mobility and energy systems alike, according to Kurt Kelty, vice president of battery, propulsion, and sustainability at General Motors (GM).

At the MIT Energy Initiative (MITEI) Fall Colloquium, Kelty explored how GM is bringing next-generation battery technologies from lab to commercialization, driving American battery innovation forward.

“How do you drive down the cost?” Kelty asked the audience.

Lithium-iron-phosphate (LFP) batteries are the chemistry of choice in China, known for low cost, high cycle life, and high safety.

Using a bidirectional charger with a two-way flow of energy, EVs could charge, but also send power from thei…

1 week, 4 days назад @ news.mit.edu
Exploring how AI will shape the future of work
Exploring how AI will shape the future of work Exploring how AI will shape the future of work

“MIT hasn’t just prepared me for the future of work — it’s pushed me to study it.

As AI systems become more capable, more of our online activity will be carried out by artificial agents.

What happens when AI begins making many of our decisions?”These are some of the questions MIT Sloan School of Management PhD candidate Benjamin Manning is researching.

My mom definitely won’t ever get over telling people about it.”Of his MIT Sloan experience, Manning says he didn’t know it was possible to learn so much so quickly.

“Another part of my research agenda explores how well AI systems can simulate human responses.

1 week, 4 days назад @ news.mit.edu
Researchers discover a shortcoming that makes LLMs less reliable
Researchers discover a shortcoming that makes LLMs less reliable Researchers discover a shortcoming that makes LLMs less reliable

Large language models (LLMs) sometimes learn the wrong lessons, according to an MIT study.

This shortcoming could reduce the reliability of LLMs that perform tasks like handling customer inquiries, summarizing clinical notes, and generating financial reports.

In prior work, the researchers found that LLMs pick up patterns in the parts of speech that frequently appear together in training data.

Missing the meaningThe researchers tested this phenomenon by designing synthetic experiments in which only one syntactic template appeared in the model’s training data for each domain.

They are also interested in exploring this phenomenon in reasoning models, special types of LLMs designed to tackle m…

2 weeks, 2 days назад @ news.mit.edu
Berkeley AI
последний пост 1 month, 1 week назад
RL without TD learning
RL without TD learning RL without TD learning

RL without TD learningIn this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer.

We can do Reinforcement Learning (RL) based on divide and conquer, instead of temporal difference (TD) learning.

There are two classes of algorithms in RL: on-policy RL and off-policy RL.

We compared TRL with $n$-step TD learning with different values of $n$, from $1$ (pure TD) to $\infty$ (pure MC).

I still think one of the most important problems in RL (and even in machine learning) is to find a scalable off-policy RL algorithm.

1 month, 1 week назад @ bair.berkeley.edu
What exactly does word2vec learn?
What exactly does word2vec learn? What exactly does word2vec learn?

What exactly does word2vec learn?

What exactly does word2vec learn, and how?

In this framing, it’s clear that word2vec is a minimal neural language model.

As a result, the theory predicts exactly what features are learned in terms of the corpus statistics and the algorithmic hyperparameters.

We find that over the course of learning, word2vec builds these linear representations in a sequence of noisy learning steps, and their geometry is well-described by a spiked random matrix model.

3 months, 1 week назад @ bair.berkeley.edu
Whole-Body Conditioned Egocentric Video Prediction
Whole-Body Conditioned Egocentric Video Prediction Whole-Body Conditioned Egocentric Video Prediction

Whole-Body Conditioned Egocentric Video Prediction×Predicting Ego-centric Video from human Actions (PEVA).

We trained a model to Predict Ego-centric Video from human Actions (PEVA) for Whole-Body-Conditioned Egocentric Video Prediction.

We train an autoregressive conditional diffusion transformer on Nymeria, a large-scale dataset pairing real-world egocentric video with body pose capture.

We include some samples here:Body Movement Actions Move Forward Rotate Left Rotate Right Left Hand Actions Move Left Hand Up Move Left Hand Down Move Left Hand Left Move Left Hand Right Right Hand Actions Move Right Hand Up Move Right Hand Down Move Right Hand Left Move Right Hand RightLong RolloutHere you…

5 months, 2 weeks назад @ bair.berkeley.edu
Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)
Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign) Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications.

To mitigate the imminent prompt injection threat, we propose two fine-tuning-defenses, StruQ and SecAlign.

Prompt Injection Attack: CausesBelow is the threat model of prompt injection attacks.

Prompt injection threat model in LLM-integrated applicationsWe propose that prompt injection has two causes.

Below are resources to learn more and keep updated on prompt injection attacks and defenses.

8 months назад @ bair.berkeley.edu
Repurposing Protein Folding Models for Generation with Latent Diffusion
Repurposing Protein Folding Models for Generation with Latent Diffusion Repurposing Protein Folding Models for Generation with Latent Diffusion

Repurposing Protein Folding Models for Generation with Latent DiffusionPLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure, by learning the latent space of protein folding models.

In PLAID, we develop a method that learns to sample from the latent space of protein folding models to generate new proteins.

Unlike many previous protein structure generative models, PLAID addresses the multimodal co-generation problem setting: simultaneously generating both discrete sequence and continuous all-atom structural coordinates.

In this way, we can use structural understanding information in the weights of pretrained protein folding models for the p…

8 months, 1 week назад @ bair.berkeley.edu
Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment
Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway DeploymentTraining Diffusion Models with Reinforcement LearningWe deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone.

The challenges of phantom jamsA stop-and-go wave moving backwards through highway traffic.

Smoothing behavior of RL AVs.

Overall, the steps towards deployment involved:Training in data-driven simulations: We used highway traffic data from I-24 to create a training environment with realistic wave dynamics, then validate the trained agent’s performance and robustness in a variety of new traffic scenarios.…

8 months, 3 weeks назад @ bair.berkeley.edu
AWS Machine Learning AWS Machine Learning
последний пост 10 часов назад
Building a voice-driven AWS assistant with Amazon Nova Sonic
Building a voice-driven AWS assistant with Amazon Nova Sonic Building a voice-driven AWS assistant with Amazon Nova Sonic

In this post, we explore how to build a sophisticated voice-powered AWS operations assistant using Amazon Nova Sonic for speech processing and Strands Agents for multi-agent orchestration.

The following diagram illustrates how Amazon Nova Sonic integrates with Strands Agents to create a seamless multi-agent system that processes voice commands and executes AWS operations in real-time.

Solution overviewThe Strands Agents Nova Voice Assistant demonstrates a new paradigm for AWS infrastructure management through conversational artificial intelligence (AI).

These examples show how to integrate Amazon Nova Sonic for voice processing and configure the supervisor agent for intelligent task routing…

10 часов назад @ aws.amazon.com
How Harmonic Security improved their data-leakage detection system with low-latency fine-tuned models using Amazon SageMaker, Amazon Bedrock, and Amazon Nova Pro
How Harmonic Security improved their data-leakage detection system with low-latency fine-tuned models using Amazon SageMaker, Amazon Bedrock, and Amazon Nova Pro How Harmonic Security improved their data-leakage detection system with low-latency fine-tuned models using Amazon SageMaker, Amazon Bedrock, and Amazon Nova Pro

To achieve sub-500 millisecond latency while maintaining accuracy, we developed two classification approaches using a fine-tuned ModernBERT model.

First, a binary classification model was prioritized to detect Mergers & Acquisitions (M&A) content, a critical category for helping prevent sensitive data leaks.

Moreover, our analysis showed that dropout and learning rate were the most important hyperparameters, accounting for 48% and 21% of the variance of the F1 score for the binary classification model.

The following subsections provide detailed results for binary M&A classification and multi-label classification across multiple sensitive data types.

Binary classificationWe evaluated the fin…

1 day, 10 hours назад @ aws.amazon.com
How Swisscom builds enterprise agentic AI for customer support and sales using Amazon Bedrock AgentCore
How Swisscom builds enterprise agentic AI for customer support and sales using Amazon Bedrock AgentCore How Swisscom builds enterprise agentic AI for customer support and sales using Amazon Bedrock AgentCore

In this post, we’ll show how Swisscom implemented Amazon Bedrock AgentCore to build and scale their enterprise AI agents for customer support and sales operations.

The architecture diagram below illustrates the vision and associated challenges for a generic customer agent without the Amazon Bedrock AgentCore.

How Amazon Bedrock AgentCore addresses the challengesAmazon Bedrock AgentCore provides Swisscom with a comprehensive solution that addresses their enterprise-scale agentic AI challenges.

The traffic traverses the VPC endpoints for Bedrock and Bedrock AgentCore, keeping the traffic private.

Akarsha Sehwag is a Generative AI Data Scientist for Amazon Bedrock AgentCore GTM team.

1 day, 10 hours назад @ aws.amazon.com
Scaling MLflow for enterprise AI: What’s New in SageMaker AI with MLflow
Scaling MLflow for enterprise AI: What’s New in SageMaker AI with MLflow Scaling MLflow for enterprise AI: What’s New in SageMaker AI with MLflow

Administrators can define a maintenance window during the creation of the MLflow App, during which in-place version upgrades of the MLflow App take place.

SageMaker model customization and MLflow integrationBy default, SageMaker model customization integrates with MLflow, providing automatic linking between model customization jobs and MLflow experiments.

ConclusionThese features make the new MLflow Apps in SageMaker AI ready for enterprise-scale ML and generative AI workloads with minimal administrative burden.

Get started now by visiting the SageMaker AI with MLflow product detail page and Accelerate generative AI development using managed MLflow on Amazon SageMaker AI, and send your feed…

1 day, 10 hours назад @ aws.amazon.com
Amazon Bedrock AgentCore Observability with Langfuse
Amazon Bedrock AgentCore Observability with Langfuse Amazon Bedrock AgentCore Observability with Langfuse

In this post, we explain how to integrate Langfuse observability with Amazon Bedrock AgentCore to gain deep visibility into an AI agent’s performance, debug issues faster, and optimize costs.

Amazon Bedrock AgentCore is a comprehensive agentic platform that can deploy and operate highly capable AI agents securely, at scale.

How Langfuse tracing worksLangfuse uses OpenTelemetry to trace and monitor agents deployed on Amazon Bedrock AgentCore.

Solution overviewThis post shows how to deploy a Strands agent on Amazon Bedrock AgentCore Runtime with Langfuse observability.

Amazon Bedrock Model Access for Anthropic Claude 3.7 in us-west-2 regionAmazon Bedrock AgentCore permissionsPython 3.10+Docke…

1 day, 10 hours назад @ aws.amazon.com
Implement automated smoke testing using Amazon Nova Act headless mode
Implement automated smoke testing using Amazon Nova Act headless mode Implement automated smoke testing using Amazon Nova Act headless mode

Automated smoke testing using Amazon Nova Act headless mode helps development teams validate core functionality in continuous integration and continuous delivery (CI/CD) pipelines.

This post shows how to implement automated smoke testing using Amazon Nova Act headless mode in CI/CD pipelines.

We walk through the following steps to implement automated smoke testing with Amazon Nova Act:Set up your project and dependencies.

Create smoke test for login validationLet’s expand your foundation code to include a complete login test with proper structure.

ConclusionIn this post, we showed how to implement automated smoke testing using Amazon Nova Act headless mode for CI/CD pipelines.

2 days, 9 hours назад @ aws.amazon.com
Real-world reasoning: How Amazon Nova Lite 2.0 handles complex customer support scenarios
Real-world reasoning: How Amazon Nova Lite 2.0 handles complex customer support scenarios Real-world reasoning: How Amazon Nova Lite 2.0 handles complex customer support scenarios

This post evaluates the reasoning capabilities of our latest offering in the Nova family, Amazon Nova Lite 2.0, using practical scenarios that test these critical dimensions.

Nova models (Lite, Micro, Pro, Premier) use Amazon Bedrock Converse API, which provides a unified interface for conversational interactions.

The evaluation prompt structure:```python EVALUATION_PROMPT = """ # Customer Support Response Evaluation Task You are an expert evaluator assessing customer support responses.

Problem Identification: Nova Lite 2.0 excelled at identifying all key issues—crucial where missing problems lead to incomplete solutions.

Key takeawaysAmazon Nova Lite 2.0 demonstrates impressive reasoning c…

3 days, 7 hours назад @ aws.amazon.com
Create AI-powered chat assistants for your enterprise with Amazon Quick Suite
Create AI-powered chat assistants for your enterprise with Amazon Quick Suite Create AI-powered chat assistants for your enterprise with Amazon Quick Suite

Benefits of Quick Suite chat agentsQuick Suite chat agents make advanced AI capabilities accessible to non-technical business users.

PrerequisitesTo build a custom chat agent in Quick Suite, you must have the following:An active Quick Suite instanceA Quick Suite subscription for the required capabilities: Professional – Create, configure, and share, spaces and custom chat agents Enterprise (includes Professional capabilities) – Create knowledge basesFor more information about Quick Suite’s subscription tiers, see Amazon Quick Suite pricing.

Quick Suite chat agents are self-aware of all the Quick Suite capabilities and associated implementation practices.

Choose Create chat agent Choose Skip…

3 days, 11 hours назад @ aws.amazon.com
How AWS delivers generative AI to the public sector in weeks, not years
How AWS delivers generative AI to the public sector in weeks, not years How AWS delivers generative AI to the public sector in weeks, not years

The AWS Generative AI Innovation Center is already making this happen, consistently delivering production-ready solutions for government organizations.

Drawing from over a thousand implementations, the Generative AI Innovation Center combines AWS infrastructure and security conformance to help you transform mission delivery.

Each successful engagement creates a blueprint for the next, continuously expanding what’s possible for public sector AI.

Learn more about the AWS Generative AI Innovation Center and how they’re helping public sector organizations turn AI potential into production reality.

About the authorsKate Zimmerman serves as the Generative AI Innovation Center Geo Leader for World…

4 days, 11 hours назад @ aws.amazon.com
S&P Global Data integration expands Amazon Quick Research capabilities
S&P Global Data integration expands Amazon Quick Research capabilities S&P Global Data integration expands Amazon Quick Research capabilities

Today, we are pleased to announce a new integration between Amazon Quick Research and S&P Global.

This integration brings both S&P Global Energy news, research, and insights and S&P Global Market Intelligence data to Quick Research customers in one deep research agent.

S&P Global Energy: Comprehensive commodity and energy intelligenceThe S&P Global Energy integration, now available in Amazon Quick Research, utilizes an AI Ready Data MCP server to deliver comprehensive access to commodity and energy market intelligence spanning Oil, Gas, Power, Metals, Clean Energy, Agriculture, and Shipping sectors across global markets.

S&P Global Market Intelligence: Trusted financial intelligenceThe S&P …

4 days, 11 hours назад @ aws.amazon.com
Streamline AI agent tool interactions: Connect API Gateway to AgentCore Gateway with MCP
Streamline AI agent tool interactions: Connect API Gateway to AgentCore Gateway with MCP Streamline AI agent tool interactions: Connect API Gateway to AgentCore Gateway with MCP

Amazon Bedrock AgentCore Gateway now supports Amazon API Gateway as a target, translating MCP requests to AgentCore Gateway into RESTful requests to API Gateway.

This integration between AgentCore Gateway and API Gateway simplifies the connection between API Gateway and AgentCore Gateway.

Your agentic applications can now connect to your new or existing API Gateway API.

This integration between AgentCore Gateway and API Gateway supports IAM and API key authorization.

API keys created using an API Gateway mapped with API Gateway usage plans, helps you monitor and control API usage.

4 days, 11 hours назад @ aws.amazon.com
Create an intelligent insurance underwriter agent powered by Amazon Nova 2 Lite and Amazon Quick Suite
Create an intelligent insurance underwriter agent powered by Amazon Nova 2 Lite and Amazon Quick Suite Create an intelligent insurance underwriter agent powered by Amazon Nova 2 Lite and Amazon Quick Suite

Sign in to Quick Suite using credentials with the Quick Suite Author Pro role.

Create and launch the Quick Suite chat agentIn this section, you create a custom chat agent in Quick Suite.

Add an optional description for your custom chat agent that helps users understand the purpose of the chat agent.

On the Configure chat agent page, provide the following information: For Agent identity, define the identity of your chat agent.

You will see the progress Launching chat agent… and in a few minutes, you will see Successfully launched chat agent.

4 days, 12 hours назад @ aws.amazon.com
How Myriad Genetics achieved fast, accurate, and cost-efficient document processing using the AWS open-source Generative AI Intelligent Document Processing Accelerator
How Myriad Genetics achieved fast, accurate, and cost-efficient document processing using the AWS open-source Generative AI Intelligent Document Processing Accelerator How Myriad Genetics achieved fast, accurate, and cost-efficient document processing using the AWS open-source Generative AI Intelligent Document Processing Accelerator

This post explores how Myriad Genetics partnered with the AWS Generative AI Innovation Center (GenAIIC) to transform their healthcare document processing pipeline using Amazon Bedrock and Amazon Nova foundation models.

Pattern 2 – Uses Amazon Textract and Amazon Bedrock with Amazon Nova, Anthropic’s Claude, or custom fine-tuned Amazon Nova models.

Key Information Extraction: The configured LLM extracted medical information using extraction prompt provided in the config file.

Automating Key Information Extraction with generative AIMyriad’s information extraction was manual, requiring up to 10 full-time employees contributing 78 hours daily in the Women’s Health unit alone, which created oper…

2 weeks, 2 days назад @ aws.amazon.com
How CBRE powers unified property management search and digital assistant using Amazon Bedrock
How CBRE powers unified property management search and digital assistant using Amazon Bedrock How CBRE powers unified property management search and digital assistant using Amazon Bedrock

This blog post describes how CBRE and AWS partnered to transform how property management professionals access information, creating a next-generation search and digital assistant experience that unifies access across many types of property data using Amazon Bedrock, Amazon OpenSearch Service, Amazon Relational Database Service, Amazon Elastic Container Service, and AWS Lambda.

The solution reduced SQL query generation time from an average of 12 seconds earlier to 4 seconds using Amazon Nova Pro.

DocInteract Component (Unstructured Document Search): This pathway is specifically designed for intelligent search and interaction with unstructured documents.

Prompt engineering and management opti…

2 weeks, 2 days назад @ aws.amazon.com
Managed Tiered KV Cache and Intelligent Routing for Amazon SageMaker HyperPod
Managed Tiered KV Cache and Intelligent Routing for Amazon SageMaker HyperPod Managed Tiered KV Cache and Intelligent Routing for Amazon SageMaker HyperPod

Today we’re excited to announce that Amazon SageMaker HyperPod now supports Managed Tiered KV Cache and Intelligent Routing capabilities through the HyperPod Inference Operator.

Optimizing LLM inference with Managed Tiered KV Cache and Intelligent RoutingLet’s break down the new features:Managed Tiered KV Cache : Automatic management of attention states across CPU memory (L1) and distributed tiered storage (L2) with configurable cache sizes and eviction policies.

Observability: Built-in HyperPod Observability integration for observability of metrics and logs for Managed Tiered KV Cache and Intelligent Routing in Amazon Managed Grafana.

Managed Tiered KV CacheManaged Tiered KV Cache and Inte…

2 weeks, 2 days назад @ aws.amazon.com
NVIDIA
последний пост 12 часов назад
Cheers to AI: ADAM Robot Bartender Makes Drinks at Vegas Golden Knights Game
Cheers to AI: ADAM Robot Bartender Makes Drinks at Vegas Golden Knights Game Cheers to AI: ADAM Robot Bartender Makes Drinks at Vegas Golden Knights Game

ADAM, a robot developed with NVIDIA Isaac libraries, is pouring drinks and turning heads in one of the NHL’s most exciting venues.

ADAM, short for Automated Dual Arm Mixologist, was developed by Las-Vegas based Richtech Robotics.

“The hospitality industry faces significant labor challenges, and ADAM is our answer to meeting those needs while elevating the customer experience,” said Matt Casella, president of Richtech Robotics.

“With NVIDIA’s Isaac platform, we’ve developed a solution that’s scalable, consistent, and frankly, creates memorable moments for fans.

Running Real-Time AI at the Edge With JetsonADAM runs on NVIDIA Jetson AGX Orin, the powerful edge AI platform capable of 275 TOPS o…

12 часов назад @ blogs.nvidia.com
As AI Grows More Complex, Model Builders Rely on NVIDIA
As AI Grows More Complex, Model Builders Rely on NVIDIA As AI Grows More Complex, Model Builders Rely on NVIDIA

It’s the latest example of how leading AI builders train and deploy at scale on NVIDIA’s full-stack AI infrastructure.

Runway last week announced Gen-4.5, a new frontier video generation model that’s the current top-rated video model in the world, according to the Artificial Analysis leaderboard.

Now optimized for NVIDIA Blackwell, Gen-4.5 was developed entirely on NVIDIA GPUs across initial research and development, pre-training, post-training and inference.

Runway also announced GWM-1, a state-of-the-art general world model trained on NVIDIA Blackwell that’s built to simulate reality in real time.

NVIDIA Blackwell Across Clouds and Data CentersNVIDIA Blackwell is widely available from lea…

1 day, 9 hours назад @ blogs.nvidia.com
Ride Into Adventure With Capcom’s ‘Monster Hunter Stories’ Series in the Cloud
Ride Into Adventure With Capcom’s ‘Monster Hunter Stories’ Series in the Cloud Ride Into Adventure With Capcom’s ‘Monster Hunter Stories’ Series in the Cloud

Monster Hunter Stories and Monster Hunter Stories 2: Wings of Ruin are soaring into the cloud this week, bringing colorful worlds, charming companions and turn-based monster battles across devices.

Saddle UpCapcom’s Monster Hunter Stories and Monster Hunter Stories 2: Wings of Ruin arrive in the cloud this week.

In Monster Hunter Stories, an RPG that expands the Monster Hunter world, players are no longer hunting monsters but raising them.

The first installment of the Monster Hunter Stories series returns, fully voiced in Japanese and English, with additional features such as a new museum mode where players can listen to music and view concept art — offering an even deeper dive into the wor…

1 day, 14 hours назад @ blogs.nvidia.com
Opt-In NVIDIA Software Enables Data Center Fleet Management
Opt-In NVIDIA Software Enables Data Center Fleet Management Opt-In NVIDIA Software Enables Data Center Fleet Management

The optional service will allow data center operators to monitor the health of their entire AI GPU fleet to maximize uptime.

As the scale and complexity of AI infrastructure grows, data center operators need continuous visibility into factors including performance, temperature and power usage.

These insights enable data center operators to actively monitor and adjust data center configurations across large-scale, distributed systems — validating that these systems are operating at their highest efficiency and reliability.

With the service, data center operators will be able to:Track spikes in power usage to keep within energy budgets while maximizing performance per watt.

Making sure that A…

2 days, 4 hours назад @ blogs.nvidia.com
How NVIDIA H100 GPUs on CoreWeave’s AI Cloud Platform Delivered a Record-Breaking Graph500 Run
How NVIDIA H100 GPUs on CoreWeave’s AI Cloud Platform Delivered a Record-Breaking Graph500 Run How NVIDIA H100 GPUs on CoreWeave’s AI Cloud Platform Delivered a Record-Breaking Graph500 Run

NVIDIA last month announced a record-breaking benchmark result of 410 trillion traversed edges per second (TEPS), ranking No.

The level of performance recently achieved by NVIDIA and CoreWeave enables searching through every friend relationship on Earth in just about three milliseconds.

To process graphs, CPUs move graph data across compute nodes.

A common approach is to process the graph where it is with active messages, where developers send messages that can process graph data in place.

As such, in this redesigned system, active messaging runs completely on GPUs, bypassing the CPU.

2 days, 7 hours назад @ blogs.nvidia.com
3 Ways NVIDIA Is Powering the Industrial Revolution
3 Ways NVIDIA Is Powering the Industrial Revolution 3 Ways NVIDIA Is Powering the Industrial Revolution

The NVIDIA accelerated computing platform is leading supercomputing benchmarks once dominated by CPUs, enabling AI, science, business and computing efficiency worldwide.

The top five performers in this industry standard benchmark were all NVIDIA GPUs, delivering an average of 70.1 gigaflops per watt.

The leadership of NVIDIA GPUs in the TOP100 is both proof of this trajectory and a signal of what comes next — breakthroughs across every discipline.

It’s the foundation for the three scaling laws that represent the roadmap for AI’s next workflow: pretraining, post‑training and test‑time scaling.

NVIDIA platforms are the only to run on all of the leading generative AI models and handle 1.4 mill…

2 days, 10 hours назад @ blogs.nvidia.com
NVIDIA Kaggle Grandmasters Win Artificial General Intelligence Competition
NVIDIA Kaggle Grandmasters Win Artificial General Intelligence Competition NVIDIA Kaggle Grandmasters Win Artificial General Intelligence Competition

NVIDIA researchers on Friday won a key Kaggle competition many in the field treat as a real-time pulse check on humanity’s progress toward artificial general intelligence (AGI).

The ARC-AGI benchmark measures how well AI systems perform abstract reasoning and then generalize from very few examples using grid-based visual puzzles.

ARC-AGI-2 is a harder, updated version that removes overlap with public training data.

The ARC-AGI benchmark has become one of the most closely watched indicators of real progress toward general reasoning in AI.

Instead, it leaned on three ideas any developer can appreciate: synthetic data, test-time training, and disciplined engineering.

1 week назад @ developer.nvidia.com
NVIDIA CUDA 13.1 Powers Next-Gen GPU Programming with NVIDIA CUDA Tile and Performance Gains
NVIDIA CUDA 13.1 Powers Next-Gen GPU Programming with NVIDIA CUDA Tile and Performance Gains NVIDIA CUDA 13.1 Powers Next-Gen GPU Programming with NVIDIA CUDA Tile and Performance Gains

NVIDIA CUDA 13.1 introduces the largest and most comprehensive update to the CUDA platform since it was invented two decades ago.

CUDA Tile programmingTo help create software for current and future GPUs, NVIDIA CUDA 13.1 is launching CUDA Tile, which enables you to write GPU kernels at a layer above SIMT.

In this first version of the software:CUDA tile is supported on NVIDIA Blackwell (compute capability 10.x and 12.x) products only.

cub::DeviceScan::ExclusiveSum(d_input,..., mr);Learn moreThe release of CUDA 13.1 brings many new features and ushers in a new era of GPU programming with CUDA Tile.

Check out CUDA Tile resources, download CUDA Toolkit 13.1, and get started today.

1 week, 1 day назад @ developer.nvidia.com
Simplify GPU Programming with NVIDIA CUDA Tile in Python
Simplify GPU Programming with NVIDIA CUDA Tile in Python Simplify GPU Programming with NVIDIA CUDA Tile in Python

The release of NVIDIA CUDA 13.1 introduces tile-based programming for GPUs, making it one of the most fundamental additions to GPU programming since CUDA was invented.

With the launch of NVIDIA cuTile Python, you can write tile kernels in Python.

cuTile Python is an expression of the CUDA Tile programming model in Python, built on top of the CUDA Tile IR specification.

cuTile Python exampleWhat does cuTile Python code look like?

The source page also supports cuTile kernels and performance metrics at the source-line level, just like CUDA C kernels.

1 week, 1 day назад @ developer.nvidia.com
Focus on Your Algorithm—NVIDIA CUDA Tile Handles the Hardware
Focus on Your Algorithm—NVIDIA CUDA Tile Handles the Hardware Focus on Your Algorithm—NVIDIA CUDA Tile Handles the Hardware

With its largest advancement since the NVIDIA CUDA platform was invented in 2006, CUDA 13.1 is launching NVIDIA CUDA Tile.

Figure 1 shows the conceptual differences between the tile model we’re introducing with CUDA Tile, and the CUDA SIMT model.

CUDA Tile IR: The foundation of tile programmingThe foundation of CUDA Tile is CUDA Tile IR (intermediate representation).

CUDA Tile IR: For developers looking to build their own DSL compiler or library, CUDA Tile IR is where you’ll interface with CUDA Tile.

The CUDA Tile IR documentation and specification include information on the CUDA Tile IR programming abstractions, syntax, and semantics.

1 week, 1 day назад @ developer.nvidia.com
NVIDIA Awards up to $60,000 Research Fellowships to PhD Students
NVIDIA Awards up to $60,000 Research Fellowships to PhD Students NVIDIA Awards up to $60,000 Research Fellowships to PhD Students

NVIDIA Awards up to $60,000 Research Fellowships to PhD StudentsFor 25 years, the NVIDIA Graduate Fellowship Program has supported graduate students doing outstanding work relevant to NVIDIA technologies.

Today, the program announced the latest awards of up to $60,000 each to 10 Ph.D. students involved in research that spans all areas of computing innovation.

Yijia Shao , Stanford University — Researching human-agent collaboration by developing AI agents that can communicate and coordinate with humans during task execution, and designing new human-agent interaction interfaces.

, Stanford University — Researching human-agent collaboration by developing AI agents that can communicate and coor…

1 week, 1 day назад @ blogs.nvidia.com
Robots’ Holiday Wishes Come True: NVIDIA Jetson Platform Offers High-Performance Edge AI at Festive Prices
Robots’ Holiday Wishes Come True: NVIDIA Jetson Platform Offers High-Performance Edge AI at Festive Prices Robots’ Holiday Wishes Come True: NVIDIA Jetson Platform Offers High-Performance Edge AI at Festive Prices

The NVIDIA Jetson AGX Thor, AGX Orin and Jetson Orin Nano Super developer kits — on sale through Jan. 11 — give anyone the power to build intelligent robots of the future.

The Jetson Orin Nano Super Developer Kit — the world’s most affordable generative AI supercomputer — offers desktop-class AI in a palm-sized kit for exploring, building and deploying cutting-edge generative AI, vision and robotics.

Read more below on how the NVIDIA Jetson platform presents the future of robotics.

NVIDIA Jetson Orin Nano Super Serves as Brain of Self-Paddling CanoeRobotics enthusiast Dave Niewinski has built a self‑paddling canoe using the Jetson Orin Nano Super Developer Kit, letting boaters relax and gli…

1 week, 1 day назад @ blogs.nvidia.com
Game the Halls: GeForce NOW Brings Holiday Cheer With 30 New Games in the Cloud
Game the Halls: GeForce NOW Brings Holiday Cheer With 30 New Games in the Cloud Game the Halls: GeForce NOW Brings Holiday Cheer With 30 New Games in the Cloud

GeForce NOW is decking the digital halls with 30 new games to keep spirits high all month long.

The “Half-Price Holiday” sale keeps the savings rolling after Black Friday, with premium GeForce NOW memberships available at 50% off for the first month for a limited time.

And GeForce NOW is bringing members a new way to jump into the worlds of some of the most iconic games.

The update expands on existing automatic login support for Xbox, Epic Games and Ubisoft, further streamlining the GeForce NOW cloud gaming experience.

(Steam)(Steam) Sacred 2 Remaster (Steam)(Steam) Songs of Silence (Epic Games Store)(Epic Games Store) Zero Hour (Epic Games Store)To improve the overall quality of service fo…

1 week, 1 day назад @ blogs.nvidia.com
Mixture of Experts Powers the Most Intelligent Frontier AI Models, Runs 10x Faster on NVIDIA Blackwell NVL72
Mixture of Experts Powers the Most Intelligent Frontier AI Models, Runs 10x Faster on NVIDIA Blackwell NVL72 Mixture of Experts Powers the Most Intelligent Frontier AI Models, Runs 10x Faster on NVIDIA Blackwell NVL72

Kimi K2 Thinking, DeepSeek-R1, Mistral Large 3 and others run 10x faster on NVIDIA GB200 NVL72.

The extreme codesign of NVIDIA GB200 NVL72 systems combines hardware and software optimizations for maximum performance and efficiency, making it practical and straightforward to scale MoE models.

“NVIDIA GB200 NVL72 rack-scale design makes MoE model serving dramatically more efficient,” said Lin Qiao, cofounder and CEO of Fireworks AI.

Powering Intelligence at ScaleThe NVIDIA GB200 NVL72 rack-scale system is designed to deliver strong performance beyond MoE models.

Learn more about how GB200 NVL72 scales complex MoE models in this technical deep dive.

1 week, 2 days назад @ blogs.nvidia.com
NVIDIA Partners With Mistral AI to Accelerate New Family of Open Models
NVIDIA Partners With Mistral AI to Accelerate New Family of Open Models NVIDIA Partners With Mistral AI to Accelerate New Family of Open Models

The new Mistral 3 family, spanning frontier-level to compact models, is optimized for NVIDIA platforms, enabling Mistral AI’s vision for distributed intelligence across cloud to the edge.

Today, Mistral AI announced the Mistral 3 family of open-source multilingual, multimodal models, optimized across NVIDIA supercomputing and edge platforms.

This combination makes the announcement a step toward the era of — what Mistral AI calls ‘distributed intelligence,’ bridging the gap between research breakthroughs and real-world applications.

Mistral AI isn’t just driving state of the art for frontier large language models; it also released nine small language models that help developers run AI anywhe…

1 week, 3 days назад @ blogs.nvidia.com
Facebook
последний пост 3 weeks назад
Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization
Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization

Zoomer has delivered training time reductions, and significant QPS improvements, making it the de-facto tool for AI performance optimization across Meta’s entire AI infrastructure.

Zoomer is Meta’s automated, one-stop-shop platform for performance profiling, debugging, analysis, and optimization of AI training and inference workloads.

AI Performance Optimization Using ZoomerZoomer is an automated debugging and optimization platform that works across all of our AI model types (ads recommendations, GenAI, computer vision, etc.)

Memory Analysis : Comprehensive analysis of GPU memory usage patterns, allocation tracking, and leak detection.

Realtime Memory Profiling : GPU memory allocation track…

3 weeks назад @ engineering.fb.com
Open Source Is Good for the Environment
Open Source Is Good for the Environment Open Source Is Good for the Environment

But have you heard about open hardware?

And did you know open source can have a positive impact on the environment?

On this episode of the Meta Tech Podcast, Pascal Hartig sits down with Dharmesh and Lisa to talk about all things open hardware, and Meta’s biggest announcements from the 2025 Open Compute Project (OCP) Summit – including a new open methodology for leveraging AI to understand Scope 3 emissions.

You’ll also hear how AI and open hardware are helping Meta push to achieve net zero emissions in 2030, including how AI is being used to develop new concrete mixes for data center construction.

And if you’re interested in learning more about career opportunities at Meta visit the Meta C…

4 weeks назад @ engineering.fb.com
Meta’s Generative Ads Model (GEM): The Central Brain Accelerating Ads Recommendation AI Innovation
Meta’s Generative Ads Model (GEM): The Central Brain Accelerating Ads Recommendation AI Innovation Meta’s Generative Ads Model (GEM): The Central Brain Accelerating Ads Recommendation AI Innovation

We’re sharing details about Meta’s Generative Ads Recommendation Model (GEM), a new foundation model that delivers increased ad performance and advertiser ROI by enhancing other ads recommendation models’ ability to serve relevant ads.

GEM propagates its learnings, leveraging a suite of post-training techniques across the entire ads model fleet, enabling a paradigm shift in Meta’s Ads Recommendation system.

GEM leverages enhanced training scalability that efficiently utilizes thousands of GPUs for building and iterating an LLM-scale ads foundation model.

The Generative Ads Recommendation Model (GEM) is Meta’s most advanced ads foundation model, built on an LLM-inspired paradigm and trained …

1 month назад @ engineering.fb.com
Scaling LLM Inference: Innovations in Tensor Parallelism, Context Parallelism, and Expert Parallelism
Scaling LLM Inference: Innovations in Tensor Parallelism, Context Parallelism, and Expert Parallelism Scaling LLM Inference: Innovations in Tensor Parallelism, Context Parallelism, and Expert Parallelism

At Meta, we are constantly pushing the boundaries of LLM inference systems to power applications such as the Meta AI App.

These metrics highlight the distinct computational demands of LLM inference: Prefill is compute-intensive, while decoding is memory bandwidth-intensive.

Communication: Communication latency increases when parallelizing across multiple hosts.

In EP-based inference, we utilize a two-shot, all-to-all communication pattern to exchange tokens between data parallelism and expert parallelism ranks based on routing.

We are committed to continuous innovation to ensure efficient and scalable LLM inference for millions of users worldwide.

1 month, 3 weeks назад @ engineering.fb.com
How Meta Is Leveraging AI To Improve the Quality of Scope 3 Emission Estimates for IT Hardware
How Meta Is Leveraging AI To Improve the Quality of Scope 3 Emission Estimates for IT Hardware How Meta Is Leveraging AI To Improve the Quality of Scope 3 Emission Estimates for IT Hardware

We leveraged AI to help us improve this database and understand our Scope 3 emissions associated with IT hardware by:Identifying similar components and applying existing PCFs to similar components that lack these carbon estimates.

Understanding the carbon footprint of IT racks and applying generative AI (GenAI) as a categorization algorithm to create a new and standard taxonomy .

If these similar components are not identified their carbon footprint estimates will remain at a lower data quality.

These similar components can be mapped to a representative proxy PCF, allowing us to use high-quality PCF data in similar components.

For example, we can scale the carbon footprint calculation for a …

1 month, 4 weeks назад @ engineering.fb.com
OCP Summit 2025: The Open Future of Networking Hardware for AI
OCP Summit 2025: The Open Future of Networking Hardware for AI OCP Summit 2025: The Open Future of Networking Hardware for AI

At Open Compute Project Summit (OCP) 2025, we’re sharing details about the direction of next-generation network fabrics for our AI training clusters.

At Meta, we believe that open hardware is a catalyst for innovation — especially as data center infrastructure increasingly supports new and emerging AI technologies.

Open hardware plays a crucial role in enabling disaggregation, allowing us to break down traditional data center technologies into their core components.

Today, through OCP, we continue to advance open network technologies for the next generation of AI applications.

Ethernet for Scale-Up Networking in OCP: Meta’s Industry LeadershipAt Meta, we recognize that the future of AI and …

2 months назад @ engineering.fb.com
LLMs Are the Key to Mutation Testing and Better Compliance
LLMs Are the Key to Mutation Testing and Better Compliance LLMs Are the Key to Mutation Testing and Better Compliance

By leveraging LLMs we’ve been able to overcome the barriers that have prevented mutation testing from being efficiently deployed at scale.

Our presentations shared insights into how we’ve used LLMs to solve the major barriers that have prevented mutation testing at scale and highlighted new areas in automated software testing where LLMs can have a significant impact.

Mutation Testing Isn’t ScalableTraditional mutation testing generates a very large number of mutants, making it computationally expensive and difficult to scale to large industrial codebases.

Mutation Testing Requires a Lot of Computational ResourcesMutation testing is costly in terms of computational resources and developer ef…

2 months, 1 week назад @ engineering.fb.com
AssetGen: Generating 3D Worlds With AI
AssetGen: Generating 3D Worlds With AI AssetGen: Generating 3D Worlds With AI

Imagine being able to use AI to create 3D virtual worlds using prompts as easily as you can generate images.

In his keynote, Mark Zuckerberg shared his vision of a future where anyone can create virtual worlds using AI-powered tools like the ones available in the upcoming Meta Horizon Studio.

But AI is already making it easier than ever to create 3D assets.

On this episode of the Meta Tech Podcast, Pascal Hartig is joined by Mahima and Rakesh from Meta’s XR Tech team to discuss AssetGen, a new foundation model for 3D assets.

They talk about how they built and trained AssetGen, the important role LLMs have to play in the future of VR, and how they’re tackling the ambitious goal of generating…

2 months, 2 weeks назад @ engineering.fb.com
Meta’s Infrastructure Evolution and the Advent of AI
Meta’s Infrastructure Evolution and the Advent of AI Meta’s Infrastructure Evolution and the Advent of AI

As our user base grew globally, we scaled beyond single data center buildings and into data center regions consisting of multiple buildings.

Enter AI Workloads (2020)While we were navigating the challenges of scaling, we were also seeing glimpses of how AI workloads would impact our infrastructure.

To build out our AI infrastructure, we’ve leveraged solutions from partners like AMD and NVIDIA as well as our own custom silicon.

Constructing Prometheus has been a monumental engineering feat, with infrastructure spanning five or more data center buildings in a single data center region.

We are still early in the evolution and adoption of AI workloads.

2 months, 2 weeks назад @ engineering.fb.com
Networking at the Heart of AI — @Scale: Networking 2025 Recap
Networking at the Heart of AI — @Scale: Networking 2025 Recap Networking at the Heart of AI — @Scale: Networking 2025 Recap

AI is everywhere and, as network engineers, we are right in the thick of it: building the network infrastructure for AI.

Setting Context: Rapid Changes and EvolutionGiven AI continues to drive so much innovation in networking and general infrastructure, we once again focused @Scale: Networking on AI networking, sharing the new insights and progress in the field.

The Models and the Primary AI Workloads Are Rapidly Evolving.

More from @Scale:Networking 2025Please visit the @Scale YouTube channel to check out all the talks from this year’s Networking @Scale.

We look forward to what promises to be another rapid year of network and AI innovation that we’ll cover at the next @Scale: Networking in…

2 months, 2 weeks назад @ engineering.fb.com
A New Ranking Framework for Better Notification Quality on Instagram
A New Ranking Framework for Better Notification Quality on Instagram A New Ranking Framework for Better Notification Quality on Instagram

We’ve introduced a diversity-aware notification ranking framework to reduce uniformity and deliver a more varied and engaging mix of notifications.

Instagram leverages machine learning (ML) models to decide who should get a notification, when to send it, and what content to include.

To tackle this, we’ve introduced a diversity-aware notification ranking framework that helps deliver more diverse, better curated, and less repetitive notifications.

Introducing Instagram’s Diversity-Aware Notification Ranking FrameworkInstagram’s diversity-aware notification ranking framework is designed to enhance the notification experience by balancing the predicted potential for user engagement with the nee…

3 months, 1 week назад @ engineering.fb.com
Federation Platform and Privacy Waves: How Meta distributes compliance-related tasks at scale
Federation Platform and Privacy Waves: How Meta distributes compliance-related tasks at scale Federation Platform and Privacy Waves: How Meta distributes compliance-related tasks at scale

We’re exploring Meta’s Federation Platform, a scalable set of tools for managing compliance-related tasks, along with Privacy Waves, our method for batching these tasks and ensuring accountability.

To facilitate this, we developed the Federation Platform and Privacy Waves program:The Federation Platform breaks down large compliance-related initiatives into smaller, manageable workstreams.

Internal surveys reveal significantly higher positive sentiment for Privacy Waves tasks compared to ad-hoc tasks.

Step 6: Reporting and recognitionThe centralized distribution of tasks via Federation Platform and Privacy Waves streamline operational effectiveness and verification.

Expansions for the Federa…

4 months назад @ engineering.fb.com
Diff Risk Score: AI-driven risk-aware software development
Diff Risk Score: AI-driven risk-aware software development Diff Risk Score: AI-driven risk-aware software development

Built on a fine-tuned Llama LLM, DRS evaluates code changes and metadata to produce a risk score and highlight potentially risky code snippets.

Production risk was one of the areas we tackled first.

The demand to build such features also led us to build the Risk Awareness Platform to provide risk analysis APIs and tool integrations.

We believe code risk can play a significant role in improving this tradeoff, so we will build more risk-aware features while improving their quality.

While code changes cause the plurality of SEVs at Meta, configuration changes are another large category.

4 months, 1 week назад @ engineering.fb.com
Building a human-computer interface for everyone
Building a human-computer interface for everyone Building a human-computer interface for everyone

What if you could control any device using only subtle hand movements?

New research from Meta’s Reality Labs is pointing even more firmly toward wrist-worn devices using surface electromyography (sEMG) becoming the future of human-computer interaction.

Generalization has been one of the most significant challenges in the field of human-computer interaction (HCI).

They discuss the road to creating a first-of-its-kind, generic human-computer neuromotor interface, what happens when software and hardware engineering meet neuroscience, and more!

And if you’re interested in learning more about career opportunities at Meta visit the Meta Careers page.

4 months, 1 week назад @ engineering.fb.com
Using AI to make lower-carbon, faster-curing concrete
Using AI to make lower-carbon, faster-curing concrete Using AI to make lower-carbon, faster-curing concrete

But concrete suppliers can utilize AI to develop and scale innovative concrete mixes as drop-in replacements, accelerating the discovery and integration of sustainable materials for large-scale use.

Meta’s AI model for green concreteDesigning concrete formulas is a complex, multi-objective problem.

To accelerate the concrete mix design process, Meta developed an AI model for sustainable concrete using BoTorch and Ax, Meta’s open-source software for Bayesian optimization and adaptive experimentation, respectively.

Our AI pipeline consists of the workflow of generating baseline data, training an AI model, using it to develop and validate new hypotheses, and then improving the baseline data an…

4 months, 4 weeks назад @ engineering.fb.com
Uber Engineering
последний пост None
neptune.ai neptune.ai
последний пост 1 week, 2 days назад
We are joining OpenAI
We are joining OpenAI We are joining OpenAI

Piotr Niedźwiedź, CEO/CTO and founder of neptune.aiI’m excited to share that we’ve entered into a definitive agreement to be acquired by OpenAI, subject to closing conditions.

We are thrilled to join the OpenAI team and help their AI researchers build better models faster.

Neptune is a metrics dashboard company.”We’ve worked closely with OpenAI to create the metrics dashboard that helps teams building foundation models.

Our future with OpenAINeptune will join OpenAI and continue to support AI researchers with tools to monitor, debug, and evaluate frontier models.

We are looking forward to working with top AI researchers and supporting OpenAI’s mission of ensuring that AGI benefits all of hu…

1 week, 2 days назад @ neptune.ai
Synthetic Data for LLM Training
Synthetic Data for LLM Training Synthetic Data for LLM Training

For instance, financial data is highly sensitive and protected by very strict regulations, and synthetic data mimics the real data distribution without revealing customer information.

Read more about how leading foundation model teams curate their training data and other topics in the State of Foundation Model Training Report 2025.

Choosing the right synthetic data generation technique depends on the type of data and its complexity.

Synthetic tabular data generation is a promising direction to overcome these challenges by learning the distribution of the tabular data.

Post-processingAs the distribution of tabular data is highly complex, it makes the synthetic tabular data generation very ch…

1 month назад @ neptune.ai
What are LLM Embeddings: All you Need to Know
What are LLM Embeddings: All you Need to Know What are LLM Embeddings: All you Need to Know

TL;DR LLM embeddings are the numerical, vector representations of text that Large Language Models (LLMs) use to process information.

Unlike their predecessor word embeddings, LLM embeddings are context-aware and dynamically change to capture semantic and syntactic relationships based on the surrounding text.

What are the applications of LLM embeddings?

Word EmbeddingsSparse Word Embeddings One-Hot Vectors 1970s TF-IDF1980s Co-Occurrence MatrixStatic Word Embeddings Word2Vec 2013 GloVe 2014Contextualized word embeddings ELMo 2018 GPT-1 2018 BERT 2018 LLAMA 2023 DeepSeek-V1 2023 GPT-4 2023Static word embeddingsStatic word embeddings, such as word2vec in 2013, marked a significant development.…

1 month назад @ neptune.ai
Detecting and Fixing ‘Dead Neurons’ in Foundation Models
Detecting and Fixing ‘Dead Neurons’ in Foundation Models Detecting and Fixing ‘Dead Neurons’ in Foundation Models

TL;DR Dead neurons silently waste compute and reduce effective model capacity in foundation models.

Dead neurons’ impactRecent studies into dead neurons in the context of foundation models show interesting, albeit worrying, results.

These large reported fractions of dead neurons in foundation models are a concern from a computational perspective.

Before we move on to discuss how to detect and fix dead neurons, let’s touch upon an important distinction between dead neurons and vanishing gradients.

Further reading How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models Read moreVisualizing activation distributionsIs your foundation model suffering from dead neurons?

1 month, 2 weeks назад @ neptune.ai
Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training
Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training

In the first part of this series, we covered the fundamentals of instruction fine-tuning (IFT).

def calculate_irs(instruction, output, reference_model): evaluation_prompt = f""" Instruction: {instruction} Model Output: {output} Rate how well the output follows the instruction on these criteria: 1.

| SourceHINT addresses a computational inefficiency in standard instruction fine-tuning: repeatedly reprocessing the same task instruction with every input example.

Read more about foundation model training infrastructure and other topics in Neptune’s 2025 State of Foundation Model Training Report.

First, during initial instruction fine-tuning across multiple diverse tasks, the model learns genera…

1 month, 2 weeks назад @ neptune.ai
How to Optimize LLM Inference
How to Optimize LLM Inference How to Optimize LLM Inference

Large Language Model (LLM) inference at scale is challenging as it involves transferring massive amounts of model parameters and data and performing computations on large tensors.

In the following, we’ll use the Llama model family architecture as a specific example to understand the LLM workload at inference.

For a far more detailed analysis of the LLM workload at inference, see the chapter All About Transformer Inference in the book How to Scale Your Model, published by Google DeepMind.

See also How to Run LLMs Locally Read moreA quick primer on hardware for LLM inferenceA typical LLM inference cluster consists of several nodes, each with a multi-core CPU and multiple accelerator devices, …

1 month, 4 weeks назад @ neptune.ai
A Researcher’s Guide to LLM Grounding
A Researcher’s Guide to LLM Grounding A Researcher’s Guide to LLM Grounding

In this article, we’ll explore the fundamental concepts of LLM grounding as well as strategies for optimally grounding models.

What is LLM grounding?

LLM grounding is analogous.

If relevant knowledge cannot be inferred from the data, then LLM grounding cannot yield more relevant responses.

When grounding LLMs using RAG, consider retaining only a few of the top hits (i.e., top-k) for your retrieval queries.

2 months, 2 weeks назад @ neptune.ai
Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions
Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions

TL;DR Instruction fine-tuning (IFT) refines pre-trained large language models (LLMs) to follow specific task instructions by training on prompt-response pairs.

Instruction fine-tuning in a nutshellIFT tailors LLMs to follow user instructions by bridging their inherent next-word prediction with human-defined objectives.

Related LLM Fine-Tuning and Model Selection Using Neptune and Transformers Read moreParameter-efficient instruction fine-tuningWhile major foundation models like GPT-4 or Llama-2 undergo full parameter instruction fine-tuning during development, parameter-efficient fine-tuning (PEFT) methods have become widely adopted for instruction fine-tuning since the LoRA paper was publi…

2 months, 3 weeks назад @ neptune.ai
Understanding Prompt Injection: Risks, Methods, and Defense Measures
Understanding Prompt Injection: Risks, Methods, and Defense Measures Understanding Prompt Injection: Risks, Methods, and Defense Measures

Prompt injection 101: When prompts go rogueThe term ‘Prompt Injection’ comes from SQL injection attacks.

There is another claim of the independent discovery of prompt injection attacks, which suggests that Riley Goodside publicly exhibited a prompt injection in a tweet back in September 2022.

The indirect prompt injection attacks are classified into active, passive, user-driven and virtual prompt attacks.

Virtual prompt injection attacksThis injection type is closely related to passive injection attacks previously described.

Prompt injection: current challenges & lessons learnedThe arms race between prompt injection attacks and defenses is a challenge for researchers, developers, and users.

4 months, 1 week назад @ neptune.ai
SabiYarn: Advancing Low-Resource Languages With Multitask NLP Pre-Training [Paper Reflections]
SabiYarn: Advancing Low-Resource Languages With Multitask NLP Pre-Training [Paper Reflections] SabiYarn: Advancing Low-Resource Languages With Multitask NLP Pre-Training [Paper Reflections]

This simple idea avoids computing loss on input prompt tokens the model already knows.

Prompt tokens are (too) expensive in low-resource settingsDuring pre-training, LLMs are trained in causal language modeling through a next-token prediction task.

=> Mo fẹ́ràn ìrẹsì,” the model is trained to predict every token, from the prompt to the actual answer:Step Prompt Next token 1 Translate English Static prompt 2 Translate English to Static prompt 3 Translate English to Yoruba: Static prompt 4 Translate English to Yoruba: I 5 Translate English to Yoruba: I love 6 Translate English to Yoruba: I love rice.

This is straightforward to implement in PyTorch by masking out the prompt tokens in the label …

4 months, 1 week назад @ neptune.ai
How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models
How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models

What gradient issues occur during foundation model training?

During training, gradient descent updates model parameters by computing the gradients of the loss function via forward and backward passes.

The green line corresponds to a learning rate of 10, while the orange line has a learning rate of 0.1.

The gradient norm for the orange line with LR = 0.1 is very high in the first steps, while the gradient norm of the green line with LR = 10 diverges to NaN after a few steps.

Techniques for gradient stabilizationMonitoring gradient norms and training loss provides insights into the learning dynamics of the foundation models.

5 months, 1 week назад @ neptune.ai
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning [Paper Reflection]
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning [Paper Reflection] STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning [Paper Reflection]

Unstructured pruning removes individual weights, while structured pruning removes entire model components.

In the context of MoEs, as expert structures from training MoEs correspond to such patterns, pruning experts is a natural fit for structured pruning.

Thus, structured pruning does not significantly decrease kurtosis, leaving plenty of margin for unstructured pruning.

Since structured pruning primarily reduces architectural redundancy rather than reshaping the underlying weight distribution, our two-phase approach—leveraging unstructured pruning after structured pruning—outperforms unstructured-only pruning.

Since STUN does not make any assumption about base MoE models, it is generaliza…

6 months, 1 week назад @ neptune.ai
Evaluating RAG Pipelines
Evaluating RAG Pipelines Evaluating RAG Pipelines

Related Building LLM Applications With Vector Databases Read moreDimensions of RAG evaluationEvaluating a RAG pipeline means assessing its behavior across three dimensions:1.

The evaluation of the RAG pipeline is a multi-step process, starting with creating an evaluation dataset, then evaluating the individual components (retriever, generator, etc.

Curating an evaluation datasetThe first step in the RAG evaluation process is the creation of a ground truth dataset.

MAP considers both the presence and rank of relevant chunks but fails to consider the relative position of relevant chunks.

However, not all retrieved chunks are equally relevant and sometimes, the most relevant chunks might not b…

7 months назад @ neptune.ai
How to Build an LLM Agent With AutoGen: Step-by-Step Guide
How to Build an LLM Agent With AutoGen: Step-by-Step Guide How to Build an LLM Agent With AutoGen: Step-by-Step Guide

The efficiency of an LLM agent depends on the selection of the right LLM model.

In this article, we’ll introduce the fundamental building blocks of LLM agents and then walk through the process of building an LLM agent step by step.

Building an LLM agent from scratchIn the following, we’ll build a trip-planning LLM agent from scratch.

Using AutoGen’s OpenAI Assistant Agent, we instantiate a prompt that the LLM agent will follow throughout its interactions.

Related Ethical Considerations and Best Practices in LLM Development Read moreEnhancing LLM agent performanceWhile architecting an LLM agent, you have to keep in mind opportunities to improve the performance of the LLM agent.

8 months, 3 weeks назад @ neptune.ai
Bayesian Deep Learning is Needed in the Age of Large-Scale AI [Paper Reflection]
Bayesian Deep Learning is Needed in the Age of Large-Scale AI [Paper Reflection] Bayesian Deep Learning is Needed in the Age of Large-Scale AI [Paper Reflection]

Moreover, I will make the case for why Bayesian deep learning can satisfy these desiderata and briefly review recent advances in the field.

The case for Bayesian deep learningBayesian deep learning uses the foundational statistical principles of Bayesian inference to endow deep learning systems with the ability to make probabilistic predictions.

However, Bayesian deep learning is unfortunately still not as easy to use as standard deep learning, which you can do these days in a few lines of PyTorch code.

If you want to use a Bayesian deep learning model, first, you have to think about specifying the prior.

If this is the case, trying out Bayesian deep learning is likely worth your while.

9 months назад @ neptune.ai
▶️ YouTube
Yannic Kilcher Yannic Kilcher
последний пост 1 month, 1 week назад
[Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff)
[Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff) [Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff)

https://arxiv.org/abs/2510.17558 Abstract:

We propose an extension of the decoder Transformer that conditions its generative process on random latent variables which are learned without supervision thanks to a variational procedure. Experimental evaluations show that allowing such a conditioning translates into substantial improvements on downstream tasks. Author: François Fleuret Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the con…

1 month, 1 week назад @ youtube.com
[Video Response] What Cloudflare's code mode misses about MCP and tool calling
[Video Response] What Cloudflare's code mode misses about MCP and tool calling [Video Response] What Cloudflare's code mode misses about MCP and tool calling

Theo's Video: https://www.youtube.com/watch?v=bAYZjVAodoo

Cloudflare article: https://blog.cloudflare.com/code-mode/ Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https://www.subscribestar.com/yannickilcher

Patreon: https://www.patreon.com/yannickilcher

Bitcoin (BTC): bc1q49lsw3q325tr58ygf8…

1 month, 3 weeks назад @ youtube.com
[Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant)
[Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant) [Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant)

Paper: https://arxiv.org/abs/2508.21038 Abstract:

Vector embeddings have been tasked with an ever-increasing set of retrieval tasks over the years, with a nascent rise in using them for reasoning, instruction-following, coding, and more. These new benchmarks push embeddings to work for any query and any notion of relevance that could be given. While prior works have pointed out theoretical limitations of vector embeddings, there is a common assumption that these difficulties are exclusively due to unrealistic queries, and those that are not can be overcome with better training data and larger models. In this work, we demonstrate that we may encounter these theoretical limitations in realist…

2 months назад @ youtube.com
AGI is not coming!
AGI is not coming! AGI is not coming!

jack Morris's investigation into GPT-OSS training data https://x.com/jxmnop/status/1953899426075816164?t=3YRhVQDwQLk2gouTSACoqA&s=09

4 months назад @ youtube.com
Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)
Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis) Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

Paper: https://research.trychroma.com/context-rot Abstract:

Large Language Models (LLMs) are typically presumed to process context uniformly—that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks.

In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows. Authors: Kelly Hong, Anton Troynikov, Jeff Huber Links:

4 months, 3 weeks назад @ youtube.com
Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)
Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review) Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)

Paper: https://arxiv.org/abs/2507.02092

Code: https://github.com/alexiglad/EBT

Website: https://energy-based-transformers.github.io/ Abstract:

Inference-time computation techniques, analogous to human System 2 Thinking, have recently become popular for improving model performances. However, most existing approaches suffer from several limitations: they are modality-specific (e.g., working only in text), problem-specific (e.g., verifiable domains like math and coding), or require additional supervision/training on top of unsupervised pretraining (e.g., verifiers or verifiable rewards). In this paper, we ask the question "Is it possible to generalize these System 2 Thinking approaches, and de…

4 months, 3 weeks назад @ youtube.com
On the Biology of a Large Language Model (Part 2)
On the Biology of a Large Language Model (Part 2) On the Biology of a Large Language Model (Part 2)

An in-depth look at Anthropic's Transformer Circuit Blog Post

Part 1 here: https://youtu.be/mU3g2YPKlsA

Discord here: https;//ykilcher.com/discord https://transformer-circuits.pub/2025/attribution-graphs/biology.html Abstract:

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology. Authors:

Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*,

David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton,

Trenton Bricken, Callum McDougall◊, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy …

7 months, 1 week назад @ youtube.com
On the Biology of a Large Language Model (Part 1)
On the Biology of a Large Language Model (Part 1) On the Biology of a Large Language Model (Part 1)

An in-depth look at Anthropic's Transformer Circuit Blog Post https://transformer-circuits.pub/2025/attribution-graphs/biology.html Abstract:

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology. Authors:

Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*,

David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton,

Trenton Bricken, Callum McDougall◊, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson,

Sam Zimmerman, Kelley Rivoire, Thom…

8 months, 1 week назад @ youtube.com
Henry AI Labs Henry AI Labs
последний пост None
3blue1brown 3blue1brown
последний пост 2 weeks, 4 days назад
The most absurd product I've made
The most absurd product I've made The most absurd product I've made

Because why not make a pi creature neck pillow?

Available at 3b1b.co/store

2 weeks, 4 days назад @ youtube.com
How Laplace transforms solve differential equations
How Laplace transforms solve differential equations How Laplace transforms solve differential equations

Studying the forced harmonic oscillator by taking a Laplace transform and studying its poles.

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Chapter on the Laplace Transform:

https://youtu.be/j0wJBEZdwLs Chapter on the S-plane and Simple Harmonic Motion:

https://youtu.be/-j8PzkZ70Lg Timestamps:

0:00 - Opening puzzle

1:06 - Key properties of a Laplace Transform

3:29 - Qualitative analysis with Laplace Transforms

4:29 - The Laplace Transforms of a Derivative

6:06 - The forced oscillator

11:59 - Intuition from the transformed solution

1…

1 month, 1 week назад @ youtube.com
The dynamics of e^(πi)
The dynamics of e^(πi) The dynamics of e^(πi)

A fuller version of this explanation, also including the reason we care about complex exponents in the first place: https://youtu.be/-j8PzkZ70Lg

2 months назад @ youtube.com
But what is a Laplace Transform?
But what is a Laplace Transform? But what is a Laplace Transform?

Visualizing the most important tool for differential equations.

Previous chapter: https://youtu.be/-j8PzkZ70Lg

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Artwork by Kurt Bruns Engine animation borrowed with permission from this (excellent) blog: https://ciechanow.ski/internal-combustion-engine/ Timestamps:

0:00 - Understanding the engine

1:16 - Key background ideas

5:41 - Definition and intuition

10:43 - Complex integration

20:43 - Analytic continuation

23:52 - The transform of exponentials

26:15 - A deep look at cos(t)

32:59 - W…

2 months назад @ youtube.com
The dynamics of e^(πi)
The dynamics of e^(πi) The dynamics of e^(πi)

A fuller version of this explanation, also including the reason we care about complex exponents in the first place: https://youtu.be/-j8PzkZ70Lg

2 months назад @ youtube.com
Why complex exponents matter | Laplace Transform Prelude
Why complex exponents matter | Laplace Transform Prelude Why complex exponents matter | Laplace Transform Prelude

How dynamics explain Euler's formula, and vice versa.

Early view of the Laplace Transform video: https://www.patreon.com/posts/laplace-early-140428165

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Timestamps:

0:00 - Intro

1:51 - Euler's formula explained dynamically

9:27 - The harmonic oscillator

21:08 - General linear equations

22:47 - Motivating the Laplace Transform ------------------ These animations are largely made using a custom Python library, manim. See the FAQ comments here:

https://3b1b.co/faq#manim Music by Vincent Rubin…

2 months, 1 week назад @ youtube.com
Why ruler and compass? | Guest video by ⁨@bensyversen⁩
Why ruler and compass? | Guest video by ⁨@bensyversen⁩ Why ruler and compass? | Guest video by ⁨@bensyversen⁩

What role were ruler and compass constructions really serving?

Check out Ben's channel: @bensyversen Interview with the author of this video: https://youtu.be/VohYM99j8e0

Supporters get early views of new videos: https://3b1b.co/support Written, produced, edited, and animated by Ben Syversen

Additional editing: Jack Saxon

3d Blender model: Jan-Hendrik Müller

Additional Blender help: Thibaut Modrzyk (@Deepia)

Illustrations: Alex Zepherin/DonDada Studio

Drums: Jeremy Gustin

Additional music from Epidemic Sound Special thanks to Viktor Blåsjö: https://intellectualmathematics.com/opinionated-history-of-mathematics/ References/Recommended reading: Euclid’s Elements:

Visual edition of Book 1: htt…

2 months, 3 weeks назад @ youtube.com
Incomplete open cubes
Incomplete open cubes Incomplete open cubes

Full video: https://youtu.be/_BrFKp-U8GI

3 months назад @ youtube.com
Exploration & Epiphany
Exploration & Epiphany Exploration & Epiphany

Sol Lewitt's "Incomplete Open Cubes" and rediscovering Burnside's lemma in group theory

This is a guest video by Paul Dancstep: https://youtu.be/JEeM2ABUMoo

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to share the videos.

Home page: https://www.3blue1brown.com Thanks to the Wadsworth Atheneum for granting permission to use LeWitt's notebooks. Talks by Paul you can find online: What is Category Theory:

https://www.youtube.com/watch?app=desktop&v=eXBwU9ieLL0 How to Predict Eclipses:

https://www.exploratorium.edu/eclipse/video/how-predict-eclipses Theo Jansen's Strandbeests

https://www.youtube.com/w…

3 months назад @ youtube.com
Simulating Phase Change | Guest video by Vilas Winstein
Simulating Phase Change | Guest video by Vilas Winstein Simulating Phase Change | Guest video by Vilas Winstein

Deriving the Boltzmann formula, defining temperature, and simulating liquid/vapor.

@SpectralCollective has the second part: https://youtu.be/yEcysu5xZH0

You can play with a simulation of this model here: https://vilas.us/simulations/liquidvapor/

These lessons are funded directly by viewers: https://3b1b.co/support

Home page: https://www.3blue1brown.com Notes from Vilas:

1) This open problem is to prove the ergodicity of the deterministic dynamical systems that are used to model the molecule-level physics. A good example of such a dynamical system is the box with particles evolving according to Newton's laws with elastic collisions, like in the video. 2) This video assumes that all probabili…

3 months, 2 weeks назад @ youtube.com
How AI connects text and images
How AI connects text and images How AI connects text and images

From this guest video by @WelchLabsVideo on how diffusion models work: https://youtu.be/iv-5mZ_9CPY

3 months, 3 weeks назад @ youtube.com
The AI that solved IMO Geometry Problems | Guest video by @Aleph0
The AI that solved IMO Geometry Problems | Guest video by @Aleph0 The AI that solved IMO Geometry Problems | Guest video by @Aleph0

How AlphaGeometry combines logic and intuition.

Share stories about AI in math research for an upcoming video: https://forms.gle/gr9aZVdUrW5T3yDg9

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com AlphaGeometry announcement:

https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/ Similar open-source model, Newclid, by Harmonic:

https://harmonic.fun/news#blog-post-geometry Timestamps:

0:00 - What's surprising

1:33 - Solve without AI

7:10 - Where AI comes in

12:48 - Grant's comments ------------------…

3 months, 3 weeks назад @ youtube.com
But how do AI videos actually work? | Guest video by @WelchLabsVideo
But how do AI videos actually work? | Guest video by @WelchLabsVideo But how do AI videos actually work? | Guest video by @WelchLabsVideo

Diffusion models, CLIP, and the math of turning text into images

Welch Labs Book: https://www.welchlabs.com/resources/imaginary-numbers-book Sections

0:00 - Intro

3:37 - CLIP

6:25 - Shared Embedding Space

8:16 - Diffusion Models & DDPM

11:44 - Learning Vector Fields

22:00 - DDIM

25:25 Dall E 2

26:37 - Conditioning

30:02 - Guidance

33:39 - Negative Prompts

34:27 - Outro

35:32 - About guest videos + Grant’s Reaction Special Thanks to:

Jonathan Ho - Jonathan is the Author of the DDPM paper and the Classifier Free Guidance Paper.

https://arxiv.org/pdf/2006.11239

https://arxiv.org/pdf/2207.12598 Preetum Nakkiran - Preetum has an excellent introductory diffusion tutorial:

https://arxiv.org/pdf/24…

4 months, 2 weeks назад @ youtube.com
Summer of Math Exposition #4 | Teachers, I'd love to hear from you
Summer of Math Exposition #4 | Teachers, I'd love to hear from you Summer of Math Exposition #4 | Teachers, I'd love to hear from you

Make a math explainer, get feedback, and receive prizes: https://some.3b1b.co

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos. ------------------ These animations are largely made using a custom Python library, manim. See the FAQ comments here:

https://3b1b.co/faq#manim

https://github.com/3b1b/manim

https://github.com/ManimCommunity/manim/ All code for specific videos is visible here:

https://github.com/3b1b/videos/ The music is by Vincent Rubinetti.

https://www.vincentrubinetti.com

https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown

https://open.spotify.com/…

7 months, 1 week назад @ youtube.com
Where my explanation of Grover’s algorithm failed
Where my explanation of Grover’s algorithm failed Where my explanation of Grover’s algorithm failed

Addressing viewer questions from the last video.

These lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to share the videos. ------------------ These animations are largely made using a custom Python library, manim. See the FAQ comments here:

https://3b1b.co/faq#manim

https://github.com/3b1b/manim

https://github.com/ManimCommunity/manim/ All code for specific videos is visible here:

https://github.com/3b1b/videos/ The music is by Vincent Rubinetti.

https://www.vincentrubinetti.com

https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown

https://open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u ------------------ 3blue1brown is a ch…

7 months, 1 week назад @ youtube.com
Two Minute Papers Two Minute Papers
последний пост 1 day, 20 hours назад
DeepMind’s Crazy New AI Masters Games That Don’t Exist
DeepMind’s Crazy New AI Masters Games That Don’t Exist DeepMind’s Crazy New AI Masters Games That Don’t Exist

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Using DeepSeek on Lambda:

https://lambda.ai/inference-models/deepseek-r1 📝 The SIMA 2 paper is available here:

https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Fred R, Gordon Child, Juan Benet, Michael…

1 day, 20 hours назад @ youtube.com
AlphaFold - The Most Important AI Breakthrough Ever Made
AlphaFold - The Most Important AI Breakthrough Ever Made AlphaFold - The Most Important AI Breakthrough Ever Made

Full interview: https://www.youtube.com/watch?v=Vhcwjzeukts

1 day, 21 hours назад @ youtube.com
30x Better Physics: Why Everyone Missed This Genius Solution
30x Better Physics: Why Everyone Missed This Genius Solution 30x Better Physics: Why Everyone Missed This Genius Solution

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Using DeepSeek on Lambda:

https://lambda.ai/inference-models/deepseek-r1 My hobby channel with guitars and labcoats 🥼:

https://www.youtube.com/watch?v=GjMMhn4pS38

https://www.youtube.com/watch?v=BxS62W6V48E 📝 The paper is available here:

https://arxiv.org/abs/2505.21946 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Chri…

5 days, 19 hours назад @ youtube.com
He Kinda Solved Biology - Nobel Prize Winner John Jumper Interview
He Kinda Solved Biology - Nobel Prize Winner John Jumper Interview He Kinda Solved Biology - Nobel Prize Winner John Jumper Interview

Thank you so much to John for being so kind and insightful, and to the film crew as well - they all did an incredible job. To celebrate the 5th anniversary of #AlphaFold, I was invited by Google DeepMind to interview Nobel Prize Winner and Distinguished Scientist, John Jumper. Note that we have no business ties with them. AlphaFold: https://deepmind.google/science/alphafold/

The full Thinking Game Movie: https://www.youtube.com/watch?v=d95J8yzvjbQ My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu

1 week, 3 days назад @ youtube.com
Unreal Engine 5.7: Billions Of Triangles, In Real Time
Unreal Engine 5.7: Billions Of Triangles, In Real Time Unreal Engine 5.7: Billions Of Triangles, In Real Time

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The Unreal Engine 5.7 is available here:

https://www.unrealengine.com/en-US/news/unreal-engine-5-7-is-now-available Sources:

https://www.youtube.com/watch?v=Mj_-2SdsYLw

https://www.youtube.com/watch?v=ngzPTqtZWo4

https://advances.realtimerendering.com/s2023/2023%20Siggraph%20-%20Substrate.pdf 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Be…

2 weeks, 5 days назад @ youtube.com
Blender 5.0 Is Here - A Revolution…For Free!
Blender 5.0 Is Here - A Revolution…For Free! Blender 5.0 Is Here - A Revolution…For Free!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Get Blender 5.0 here: https://www.blender.org/

Example scenes: https://www.blender.org/download/demo-files/

Multiple scattering paper: https://cg.iit.bme.hu/~szirmay/volreuse_link.htm 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall…

3 weeks, 1 day назад @ youtube.com
DeepMind’s New AI Mastered Minecraft… Without Ever Playing It
DeepMind’s New AI Mastered Minecraft… Without Ever Playing It DeepMind’s New AI Mastered Minecraft… Without Ever Playing It

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide:

Rent one of their GPUs with over 16GB of VRAM

Open a terminal

Just get Ollama following the command from here - https://ollama.com/download/linux

Then run ollama run gpt-oss:120b - https://ollama.com/library/gpt-oss:120b 📝 The paper is available here:

https://danijar.com/project/dreamer4/ Source:

https://www.youtube.com/watch?v=6bnM84xGxbg 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patre…

3 weeks, 3 days назад @ youtube.com
Games Have Never Simulated Clothing Like This Before
Games Have Never Simulated Clothing Like This Before Games Have Never Simulated Clothing Like This Before

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide:

Rent one of their GPUs with over 16GB of VRAM

Open a terminal

Just get Ollama with this command - https://ollama.com/download/linux

Then run ollama run gpt-oss:120b - https://ollama.com/library/gpt-oss:120b 📝 The paper "Fast Physics-Based Modeling of Knots and Ties Using Templates" is available here:

https://wanghmin.github.io/publication/guo-2025-fpb/ Sources:

https://www.youtube.com/watch?v=2RQcoLV_bVk

https://www.youtube.com/watch?v=7d158rQ1R3k

https://www.youtube.com/watch?v=qirVdKg3qgs

https://www.youtube.com/watch?v=TPokJdN2bkw

https://www.youtube.com/watch?v=DRzT3c1jk14

https://www.youtube.com/w…

3 weeks, 5 days назад @ youtube.com
The Secret Behind Those Perfect Chocolate Commercials
The Secret Behind Those Perfect Chocolate Commercials The Secret Behind Those Perfect Chocolate Commercials

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers 📝 The paper "A practical octree liquid simulator with adaptive surface resolution" is available here:

https://cs.uwaterloo.ca/~c2batty/papers/Ando2020/Ando2020.pdf Sources:

https://www.youtube.com/watch?v=kdt5Cs1VYJA

https://www.youtube.com/watch?v=YmmSDZ6dBdY

https://www.youtube.com/shorts/FVIDRU9-FW8

https://www.youtube.com/watch?v=gNZtx3ijjpo&pp=ygUHb2N0cmVlcw%3D%3D

https://www.youtube.com/shorts/1Euba1QvhW0

https://www.youtube.com/shorts/k2P9yWSMaXE

https://www.youtube.com/watch?v=Z5qbxQI6dgw

https://www.youtube.com/watch?v=laoGmqNtUMI 📝 My paper on simulations that look almost like reality is availa…

4 weeks назад @ youtube.com
The Physics Glitch Everyone Gave Up On… Finally Fixed
The Physics Glitch Everyone Gave Up On… Finally Fixed The Physics Glitch Everyone Gave Up On… Finally Fixed

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers 📝 The paper "Multi-Material Mesh-Based Surface Tracking with Implicit Topology Changes" is available here under one of these links hopefully:

https://pub.ista.ac.at/group_wojtan/projects/2024_MultimatMeshing/SuperDuperTopoFixer.pdf

https://dl.acm.org/doi/10.1145/3658223 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 Sources:

https://www.youtube.com/watch?v=dtBqv-qIFLo

https://www.youtube.com/watch?v=EZul6DR-fHc

https://www.youtube…

1 month назад @ youtube.com
NVIDIA’s New AI Just Made Real Physics Look Slow
NVIDIA’s New AI Just Made Real Physics Look Slow NVIDIA’s New AI Just Made Real Physics Look Slow

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide:

Rent one of their GPUs with over 16GB of VRAM

Open a terminal

Just get Ollama with this command - https://ollama.com/download/linux

Then run ollama run gpt-oss:120b - https://ollama.com/library/gpt-oss:120b 📝 The paper "Neural Robot Dynamics" is available here:

https://neural-robot-dynamics.github.io/

https://github.com/NVlabs/neural-robot-dynamics 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our gener…

1 month, 1 week назад @ youtube.com
They Said It Was Impossible… Weta FX Just Solved It
They Said It Was Impossible… Weta FX Just Solved It They Said It Was Impossible… Weta FX Just Solved It

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide:

Rent one of their GPUs with over 16GB of VRAM

Open a terminal

Just get Ollama with this command - https://ollama.com/download/linux

Then run ollama run gpt-oss:120b - https://ollama.com/library/gpt-oss:120b 📝 The paper "A unified multi-scale method for simulating immersed bubbles" is available here:

https://alexey.stomakhin.com/research/unibubbles.html 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our g…

1 month, 2 weeks назад @ youtube.com
How AI Just Leveled Up Fashion in Games
How AI Just Leveled Up Fashion in Games How AI Just Leveled Up Fashion in Games

❤️ Check out the Fully Connected Conference by Weights & Biases - https://wandb.me/fclon2025-2min

20% discount code: FCLON2025-2MIN 📝 The paper is available here:

https://dress-1-to-3.github.io/ ❤️ Get cool perks and support The Papers on Patreon! Link: https://www.patreon.com/c/TwoMinutePapers 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, Juan Benet, Michael Tedder, Owe…

1 month, 2 weeks назад @ youtube.com
NVIDIA’s New AI’s Movements Are So Real It’s Uncanny
NVIDIA’s New AI’s Movements Are So Real It’s Uncanny NVIDIA’s New AI’s Movements Are So Real It’s Uncanny

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide:

Rent one of their GPUs with over 16GB of VRAM

Open a terminal

Just get Ollama with this command - https://ollama.com/download/linux

Then run ollama run gpt-oss:120b - https://ollama.com/library/gpt-oss:120b 📝 The paper is available here:

https://add-moo.github.io/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Chr…

1 month, 3 weeks назад @ youtube.com
The Worst Bug In Games Is Now Gone Forever
The Worst Bug In Games Is Now Gone Forever The Worst Bug In Games Is Now Gone Forever

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide:

Rent one of their GPUs with over 16GB of VRAM

Open a terminal

Just get Ollama with this command - https://ollama.com/download/linux

Then run ollama run gpt-oss:120b - https://ollama.com/library/gpt-oss:120b 📝Paper: https://drive.google.com/file/d/1OrOKJH_im1L4j1cJB18sfvNHEbZVSqjL/view

Code and examples are available here: https://github.com/st-tech/ppf-contact-solver

Guide on how to try it: https://drive.google.com/file/d/1n068Ai_hlfgapf2xkAutOHo3PkLpJXA4/view Sources:

https://www.youtube.com/watch?v=5GDIoshj9Rw

https://www.youtube.com/watch?v=X53VuYLP0VY

https://www.youtube.com/shorts/x0WjJgotCXU

http…

1 month, 4 weeks назад @ youtube.com
DataFest Video DataFest Video
последний пост None
Семинары JetBrains Research Семинары JetBrains Research
последний пост None
Яндекс. Компьютерные науки Яндекс. Компьютерные науки
последний пост 1 week, 1 day назад
Как решить проблему разнообразия ответов LLM
Как решить проблему разнообразия ответов LLM Как решить проблему разнообразия ответов LLM

Это отрывок из доклада Алексея Колесова, CTO в Яндекс R&D. На Practical ML Conf 2025 он рассказал, как ребята учили YandexGPT 5.1 лучше помнить факты и применять знания о них. А ещё показал, как у нас стабильно заработал online RL. Полная запись уже на канале! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

1 week, 1 day назад @ youtube.com
ML Global Recap'25
ML Global Recap'25 ML Global Recap'25

Митап Яндекса для ML-сообщества, на котором расскажем о шести международных конференциях и главных трендах в рекомендательных технологиях, компьютерном зрении, технологиях распознавания речи и NLP.

1 week, 2 days назад @ youtube.com
Секция на проверку базовых технических навыков ML-инженеров
Секция на проверку базовых технических навыков ML-инженеров Секция на проверку базовых технических навыков ML-инженеров

Мок-интервью: как проходит секция на базовые алгоритмы для ML-инженеров Подробнее: https://yandex.ru/jobs/interview/mldev

Вакансии: https://yandex.ru/jobs/vacancies?professions=ml-developer Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/+Ug9D4CjJrJxmZGRi #ml, #machinelearning, #mlengineer, #mlinterview, #datascience, #яндекс, #yandex, #itcareer, #mldeveloper, #techinterview, #algorithms, #машиннообучение, #вакансии

1 week, 4 days назад @ youtube.com
Какие есть виды LLM-аугментаций
Какие есть виды LLM-аугментаций Какие есть виды LLM-аугментаций

Это отрывок из доклада Алексея Колесова, CTO в Яндекс R&D. На Practical ML Conf 2025 он рассказал, как ребята учили YandexGPT 5.1 лучше помнить факты и применять знания о них. А ещё показал, как у нас стабильно заработал online RL. Полная запись уже на канале! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

1 week, 6 days назад @ youtube.com
Как LLM предсказывает токены
Как LLM предсказывает токены Как LLM предсказывает токены

Это отрывок из доклада Алексея Колесова, CTO в Яндекс R&D. На Practical ML Conf 2025 он рассказал, как ребята учили YandexGPT 5.1 лучше помнить факты и применять знания о них. А ещё показал, как у нас стабильно заработал online RL. Полная запись уже на канале! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

2 weeks, 3 days назад @ youtube.com
«Реал Мадрид» или «Барселона»: кто круче по мнению LLM
«Реал Мадрид» или «Барселона»: кто круче по мнению LLM «Реал Мадрид» или «Барселона»: кто круче по мнению LLM

Это отрывок из доклада Алексея Колесова, CTO в Яндекс R&D. На Practical ML Conf 2025 он рассказал, как ребята учили YandexGPT 5.1 лучше помнить факты и применять знания о них. А ещё показал, как у нас стабильно заработал online RL. Полная запись уже на канале! #YandexGPT #LLM #AI #MachineLearning #GenerativeAI #YandexForML #YandexForDevelopers #Яндекс #AIDevDay #NeuralNetworks #ArtificialIntelligence #Football #Soccer #RealMadrid #Barcelona #ElClasico #LaLiga #Messi #Ronaldo #TechAndSports #AIInSports #FootballFans #SportsAnalytics #YandexTech #AIComparison

2 weeks, 5 days назад @ youtube.com
Что должен знать AI-ассистент
Что должен знать AI-ассистент Что должен знать AI-ассистент

Это отрывок из доклада Алексея Колесова, CTO в Яндекс R&D. На Practical ML Conf 2025он рассказал, как ребята учили YandexGPT 5.1 лучше помнить факты и применять знания о них. А ещё показал, как у нас стабильно заработал online RL. Полная запись уже на канале! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

3 weeks, 2 days назад @ youtube.com
Data Dojo — встреча ML-сообщества в Москве
Data Dojo — встреча ML-сообщества в Москве Data Dojo — встреча ML-сообщества в Москве

Data Dojo — это сообщество ML-экспертов. Здесь обсуждают тренды, разбирают реальные задачи, делятся опытом и практикуются. Додзё в японской культуре — место Пути, где совершенствуют не только мастерство, но и дух. Мы перенесли этот принцип в мир данных. Программа: Приветственное слово | Владислав Офицеров, модератор встречи, руководитель команды развития нейронных технологий международного поиска, и Пётр Ермаков, ML-бренд-директор Лекция: Обзор трендов и предварительный итоги года | Сергей Овчаренко, руководитель отдела мультимодального анализа и генерации Лекция: Научить AI не бредить, сдать физику и получить права: как мы готовили задачи ML‑квалификации Yandex Cup | Сергей Фиронов, ведущи…

3 weeks, 3 days назад @ youtube.com
Как обучать LLM: процесс в двух частях
Как обучать LLM: процесс в двух частях Как обучать LLM: процесс в двух частях

Это отрывок из доклада Алексея Колесова, CTO в Яндекс R&D. На Practical ML Conf 2025 он рассказал, как ребята учили YandexGPT 5.1 лучше помнить факты и применять знания о них. А ещё показал, как у нас стабильно заработал online RL. Полная запись уже на канале! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

4 weeks назад @ youtube.com
Визуально-языковые модели (VLM) в Яндексе: подходы, данные, подводные камни / Сергей Овчаренко
Визуально-языковые модели (VLM) в Яндексе: подходы, данные, подводные камни / Сергей Овчаренко Визуально-языковые модели (VLM) в Яндексе: подходы, данные, подводные камни / Сергей Овчаренко

Это Сергей Овчаренко, руководитель отдела мультимодальных анализа и генерации в Яндекс R&D. В своём докладе Сергей рассказал о VLM в Яндексе: какие подходы мы используем и с какими подводными камнями сталкиваемся. А еще — о претрейне и о том, почему добиться хорошего качества бывает непросто, даже когда, казалось бы, всё делаешь правильно. Узнать больше о мероприятиях для разработчиков можно тут: https://events.yandex.ru Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/yandexforml #ML #AI #MachineLearning #DeepLearning #LLM #VLM #NeuralNetworks #Transformers #GenerativeAI #NLP #ComputerVision #DataScience #BigData #MLOps #ModelTraining #AIResearch #ArtificialIntellig…

4 weeks, 1 day назад @ youtube.com
Релиз: что может пойти не так? / Алексей Колесов
Релиз: что может пойти не так? / Алексей Колесов Релиз: что может пойти не так? / Алексей Колесов

Это Алексей Колесов, CTO в Яндекс R&D. Поговорили честно и без прикрас об обратной стороне релизов — о нюансах и неожиданном поведении LLM, с которыми сталкивались на своём опыте, и о том, как решали такие кейсы. Узнать больше о мероприятиях для разработчиков можно тут: https://events.yandex.ru Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/yandexforml #ML #AI #MachineLearning #DeepLearning #LLM #VLM #NeuralNetworks #Transformers #GenerativeAI #NLP #ComputerVision #DataScience #BigData #MLOps #ModelTraining #AIResearch #ArtificialIntelligence #AIDevelopment #AIFuture #Tech #Engineering #Yandex #SberAI #AvitoTech #TBank #AIConference #YandexML #DataEngineering #Reco…

4 weeks, 1 day назад @ youtube.com
Кэш для товарного поиска Лавки на основе LLM / Евгений Комаров
Кэш для товарного поиска Лавки на основе LLM / Евгений Комаров Кэш для товарного поиска Лавки на основе LLM / Евгений Комаров

Это Евгений Комаров, руководитель команды ML Поиска в Яндекс Лавке. Поиск товаров — одна из самых нагруженных частей Лавки. В докладе Евгений рассказал, как команда реализовала кэш на основе LLM, чтобы повысить релевантность и скорость отклика товарного поиска. Узнать больше о мероприятиях для разработчиков можно тут: https://events.yandex.ru Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/yandexforml #ML #AI #MachineLearning #DeepLearning #LLM #VLM #NeuralNetworks #Transformers #GenerativeAI #NLP #ComputerVision #DataScience #BigData #MLOps #ModelTraining #AIResearch #ArtificialIntelligence #AIDevelopment #AIFuture #Tech #Engineering #Yandex #SberAI #AvitoTech #TBa…

4 weeks, 1 day назад @ youtube.com
Как найти лучшую генеративную модель для своей задачи / Кирилл Власов
Как найти лучшую генеративную модель для своей задачи / Кирилл Власов Как найти лучшую генеративную модель для своей задачи / Кирилл Власов

Это Кирилл Власов, PO AI Studio в Yandex Cloud. Когда мы работаем с AI-проектами, первый вопрос, который мы задаём себе, — какую модель выбрать? Многообразие растёт: каждая компания утверждает, что именно у неё — лучшая модель в мире. В своём докладе Кирилл делится тем, как выжить в этом хаосе и перейти к системной работе с пайплайном промптинга, эвала и трейсинга. Узнать больше о мероприятиях для разработчиков можно тут: https://events.yandex.ru Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/yandexforml #ML #AI #MachineLearning #DeepLearning #LLM #VLM #NeuralNetworks #Transformers #GenerativeAI #NLP #ComputerVision #DataScience #BigData #MLOps #ModelTraining #AIRe…

4 weeks, 1 day назад @ youtube.com
Какие темы завлекли гостей на Practical ML Conf
Какие темы завлекли гостей на Practical ML Conf Какие темы завлекли гостей на Practical ML Conf

Делимся отрывком с Practical ML Conf 2025 — главной конфы Яндекса по машинному обучению. Тут мы поймали нескольких участников и спросили, какие темы, по их мнению, самые интересные и что они думают об ивенте 😎 #PracticalMLConf #YandexForML #YandexForDevelopers #Яндекс #AI #ML #MachineLearning #ArtificialIntelligence #NeuralNetworks #DeepLearning #DataScience #AIAgents #GenerativeAI #LLM #YandexTech #AIFuture #MLFuture #TechConference #ITConference #AIConference #YandexAI #YandexML #AIDevDay #AICommunity #MLCommunity #YandexEvents #AIEducation #AITech #AIinPractice #TechEvents

1 month назад @ youtube.com
Ценообразование в Яндекс Лавке / Всеволод Парамонов, Андрей Шевцов
Ценообразование в Яндекс Лавке / Всеволод Парамонов, Андрей Шевцов Ценообразование в Яндекс Лавке / Всеволод Парамонов, Андрей Шевцов

В докладе на Data Fest Siberia Всеволод Парамонов и Андрей Шевцов, разработчики группы аналитики ценообразования Яндекс Лавки, рассказали, что лежит под капотом динамического ценообразования в сервисе. Ребята разобрали, как устроены модели спроса и эластичности и какие математические формулы приводят в движение весь процесс. Наш телеграм-канал Yandex for ML: https://t.me/+Ug9D4CjJrJxmZGRi #YandexForML #DataFest #Яндекс #MachineLearning #ML #AI #RAG #LLM #VLM #YandexGPT #МультимодальныеМодели #DataFestSiberia #NeuralNetworks #DeepLearning #ComputerVision #ЯндексПоиск #MLTech #AIinYandex #MLTalks #TechConference

1 month назад @ youtube.com
ML Trainings ML Trainings
последний пост 5 days, 21 hours назад
Капитанский мостик №23: Терминатор знает кунфу | ИИ находит дыру | Доминирование на исходе
Капитанский мостик №23: Терминатор знает кунфу | ИИ находит дыру | Доминирование на исходе Капитанский мостик №23: Терминатор знает кунфу | ИИ находит дыру | Доминирование на исходе

0:00:00 Начало

0:00:36 Темные фабрики

0:06:44 Терминатор знает кунфу

0:12:08 Они не верят в роботов

0:17:07 OpenAI на Нептуне

0:20:48 ИИ находит дыру

0:26:59 ИИ-ландшафт России

0:36:54 Trainium от Amazon

0:39:47 TPU лучше

0:46:04 Baidu тоже так думает

0:48:01 ИИ сожрал память

0:53:34 LoRA на смартфоне

0:58:42 Доминирование на исходе

1:04:09 ИИ против коррупции ИИ-саммари: В этом подкасте обсуждаются последние новости в области робототехники и автоматизации, включая соревнование по машинному переводу, появление темных фабрик, развитие роботов, таких как Т-800, и влияние OpenAI на экосистему разработки AI. Также рассматриваются вопросы безопасности смарт-контрактов и их уязвимости. В этом раз…

5 days, 21 hours назад @ youtube.com
Стем навыки и будущее LLM: мнение Дмитрия Колодезева
Стем навыки и будущее LLM: мнение Дмитрия Колодезева Стем навыки и будущее LLM: мнение Дмитрия Колодезева 1 week, 4 days назад @ youtube.com
Расизм и биотехнологии: где провести черту
Расизм и биотехнологии: где провести черту Расизм и биотехнологии: где провести черту 1 week, 4 days назад @ youtube.com
Комитеты по этике: как решать сложные вопросы
Комитеты по этике: как решать сложные вопросы Комитеты по этике: как решать сложные вопросы 1 week, 4 days назад @ youtube.com
Киборги и парализованные этические вопросы
Киборги и парализованные этические вопросы Киборги и парализованные этические вопросы 1 week, 4 days назад @ youtube.com
Квантовые компьютеры и будущее шифрования
Квантовые компьютеры и будущее шифрования Квантовые компьютеры и будущее шифрования 1 week, 4 days назад @ youtube.com
Дмитрий и Валентин обсуждают будущее технологий
Дмитрий и Валентин обсуждают будущее технологий Дмитрий и Валентин обсуждают будущее технологий 1 week, 4 days назад @ youtube.com
Боль и анестезия: уроки из медицины
Боль и анестезия: уроки из медицины Боль и анестезия: уроки из медицины 1 week, 4 days назад @ youtube.com
Капитанский мостик №22: Биодроны пока не атакуют | Пузыря ИИ нет | ИИ поможет демографии
Капитанский мостик №22: Биодроны пока не атакуют | Пузыря ИИ нет | ИИ поможет демографии Капитанский мостик №22: Биодроны пока не атакуют | Пузыря ИИ нет | ИИ поможет демографии

0:00:00 Начало

0:00:25 Биодроны пока не атакуют

0:13:16 Агенты делают науку

0:28:03 Кванты = новый ИИ

0:35:28 Пузыря ИИ нет

0:39:25 Суцкевер про ИИ

0:45:47 Илон Маск про роботов

0:51:52 Роботы-пограничники

0:59:46 ИИ заменит 11% людей

1:04:20 США (не) продают железо

1:08:54 ИИ не дает вам летать

1:12:06 ИИ поможет демографии ИИ-саммари: В этом подкасте обсуждаются актуальные темы, такие как чипирование голубей, этические вопросы в науке и технологиях, влияние киберпанка на будущее, а также роль ИИ в научных конференциях и исследованиях в физике и химии. В этом разговоре обсуждаются ключевые темы, связанные с автоматизацией, применением технологий в медицине, влиянием NLP, будущим квантовых …

1 week, 5 days назад @ youtube.com
Петербург следит за сосульками с помощью нейросети
Петербург следит за сосульками с помощью нейросети Петербург следит за сосульками с помощью нейросети 2 weeks, 4 days назад @ youtube.com
Переход от чат ботов к реальным системам будущего
Переход от чат ботов к реальным системам будущего Переход от чат ботов к реальным системам будущего 2 weeks, 4 days назад @ youtube.com
Моделируем мир за текстом и видео
Моделируем мир за текстом и видео Моделируем мир за текстом и видео 2 weeks, 4 days назад @ youtube.com
Валентин Малых о доверии к обобщениям LLM
Валентин Малых о доверии к обобщениям LLM Валентин Малых о доверии к обобщениям LLM 2 weeks, 4 days назад @ youtube.com
Валентин Малых говорит о проблемах понимания чат GPT
Валентин Малых говорит о проблемах понимания чат GPT Валентин Малых говорит о проблемах понимания чат GPT 2 weeks, 4 days назад @ youtube.com
Валентин Малых говорит о worldmodels
Валентин Малых говорит о worldmodels Валентин Малых говорит о worldmodels 2 weeks, 4 days назад @ youtube.com
🎧 Podcasts
Lex Fridman AI Podcast Lex Fridman AI Podcast
последний пост 8 часов назад
#487 – Irving Finkel: Deciphering Secrets of Ancient Civilizations & Flood Myths
#487 – Irving Finkel: Deciphering Secrets of Ancient Civilizations & Flood Myths #487 – Irving Finkel: Deciphering Secrets of Ancient Civilizations & Flood Myths

Irving Finkel is a scholar of ancient languages and a longtime curator at the British Museum, renowned for his expertise in Mesopotamian history and cuneiform writing.

He specializes in reading and interpreting cuneiform inscriptions, including tablets from Sumerian, Akkadian, Babylonian, and Assyrian contexts.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep487-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lexMiro: Online collaborative whiteboard platform.

Go to https://miro.com/Chevron: Reliable energy for data centers.

8 часов назад @ lexfridman.com
#486 – Michael Levin: Hidden Reality of Alien Intelligence & Biological Life
#486 – Michael Levin: Hidden Reality of Alien Intelligence & Biological Life #486 – Michael Levin: Hidden Reality of Alien Intelligence & Biological Life

Michael Levin is a biologist at Tufts University working on novel ways to understand and control complex pattern formation in biological systems.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep486-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lexMiro: Online collaborative whiteboard platform.

Go to https://miro.com/MasterClass: Online classes from world-class experts.

(2:42:41) – Mind uploading(3:01:22) – Alien intelligence(3:16:17) – Advice for young people(3:22:46) – Questions for AGI

1 week, 5 days назад @ lexfridman.com
#485 – David Kirtley: Nuclear Fusion, Plasma Physics, and the Future of Energy
#485 – David Kirtley: Nuclear Fusion, Plasma Physics, and the Future of Energy #485 – David Kirtley: Nuclear Fusion, Plasma Physics, and the Future of Energy

David Kirtley is a nuclear fusion engineer and CEO of Helion Energy, a company working on building the world's first commercial fusion power plant by 2028.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep485-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc. Transcript:

https://lexfridman.com/david-kirtley-transcript CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: https://lexfridman.com/contact EPISODE LINKS:

David's X: htt…

3 weeks, 4 days назад @ lexfridman.com
#484 – Dan Houser: GTA, Red Dead Redemption, Rockstar, Absurd & Future of Gaming
#484 – Dan Houser: GTA, Red Dead Redemption, Rockstar, Absurd & Future of Gaming #484 – Dan Houser: GTA, Red Dead Redemption, Rockstar, Absurd & Future of Gaming

Dan Houser is co-founder of Rockstar Games and is a legendary creative mind behind Grand Theft Auto (GTA) and Red Dead Redemption series of video games.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep484-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://box.com/aiUPLIFT Desk: Standing desks and office ergonomics.

Go to https://drinkLMNT.com/lexOUTLINE:(00:00) – Introduction(01:29) – Sponsors, Comments, and Reflections(11:32) – Greatest films of all time(23:45) – Making video games(26:36) – GTA 3(29:55) – Open world video games(32:42) – Character creation(36:09) – Superintelligent AI in A Bette…

1 month, 1 week назад @ lexfridman.com
#483 – Julia Shaw: Criminal Psychology of Murder, Serial Killers, Memory & Sex
#483 – Julia Shaw: Criminal Psychology of Murder, Serial Killers, Memory & Sex #483 – Julia Shaw: Criminal Psychology of Murder, Serial Killers, Memory & Sex

Julia Shaw is a criminal psychologist and author who in her books explores human nature, including psychopathy, violent crime, the psychology of evil, police interrogation, false memory manipulation, deception detection, and human sexuality.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep483-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lexBetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lexLMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lexAG1: All-in-one daily nutrition drink.

1 month, 4 weeks назад @ lexfridman.com
#482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature
#482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature #482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature

Pavel Durov is the founder and CEO of Telegram.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep482-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Transcript:https://lexfridman.com/pavel-durov-transcriptCONTACT LEX:Feedback – give feedback to Lex: https://lexfridman.com/surveyAMA – submit questions, videos or call-in: https://lexfridman.com/amaHiring – join our team: https://lexfridman.com/hiringOther – other ways to get in touch: https://lexfridman.com/contactEPISODE LINKS:Pavel’s Telegram: https://t.me/durovPavel’s X: https://x.com/durovTelegram: https://telegram.org/Telegram Contests: https://contest.c…

2 months, 1 week назад @ lexfridman.com
#481 – Norman Ohler: Hitler, Nazis, Drugs, WW2, Blitzkrieg, LSD, MKUltra & CIA
#481 – Norman Ohler: Hitler, Nazis, Drugs, WW2, Blitzkrieg, LSD, MKUltra & CIA #481 – Norman Ohler: Hitler, Nazis, Drugs, WW2, Blitzkrieg, LSD, MKUltra & CIA

Norman Ohler is a historian and author of “Blitzed: Drugs in the Third Reich,” a book that investigates the role of psychoactive drugs, particularly stimulants such as methamphetamine, in the military history of World War II.

It is a book that two legendary historians Ian Kershaw and Antony Beevor give very high praise for its depth of research.

Norman also wrote “Tripped: Nazi Germany, the CIA, and the Dawn of the Psychedelic Age”, and he is working on a new book “Stoned Sapiens” looking at the history of human civilization through the lens of drugs.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep481-scSee below for timestamps, transcript, and to give f…

2 months, 3 weeks назад @ lexfridman.com
#480 – Dave Hone: T-Rex, Dinosaurs, Extinction, Evolution, and Jurassic Park
#480 – Dave Hone: T-Rex, Dinosaurs, Extinction, Evolution, and Jurassic Park #480 – Dave Hone: T-Rex, Dinosaurs, Extinction, Evolution, and Jurassic Park

Dave Hone is a paleontologist, expert on dinosaurs, co-host of the Terrible Lizards podcast, and author of numerous scientific papers and books on the behavior and ecology of dinosaurs.

He lectures at Queen Mary University of London on topics of Ecology, Zoology, Biology, and Evolution.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep480-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://go.lindy.ai/lexBetterHelp: Online therapy and counseling.

Go to https://shopify.com/lexLMNT: Zero-sugar electrolyte drink mix.

3 months, 1 week назад @ lexfridman.com
#479 – Dave Plummer: Programming, Autism, and Old-School Microsoft Stories
#479 – Dave Plummer: Programming, Autism, and Old-School Microsoft Stories #479 – Dave Plummer: Programming, Autism, and Old-School Microsoft Stories

Dave Plummer is a programmer, former Microsoft software engineer (Windows 95, NT, XP), creator of Task Manager, author of two books on autism, and host of the Dave’s Garage YouTube channel, where he shares stories from his career, insights on software development, and deep dives into technology.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep479-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lexZocDoc: App that helps patients find healthcare providers.

Go to https://zocdoc.com/lexFin: AI agent for customer service.

Go to https://fin.ai/lexAllio Capital: AI-powered investment app that use…

3 months, 2 weeks назад @ lexfridman.com
#478 – Scott Horton: The Case Against War and the Military Industrial Complex
#478 – Scott Horton: The Case Against War and the Military Industrial Complex #478 – Scott Horton: The Case Against War and the Military Industrial Complex

Scott Horton is the director of the Libertarian Institute, editorial director of Antiwar.com, host of The Scott Horton Show, co-host of Provoked, and for the past three decades a staunch critic of U.S. military interventionism.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep478-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://alliocapital.com/Hampton: Community for high-growth founders and CEOs.

Go to https://joinhampton.com/lexBetterHelp: Online therapy and counseling.

Go to https://drinkag1.com/lexOUTLINE:(00:00) – Introduction(00:35) – Sponsors, Comments, and Reflections(09:14) – From the Cold War to …

3 months, 3 weeks назад @ lexfridman.com
#477 – Keyu Jin: China’s Economy, Tariffs, Trade, Trump, Communism & Capitalism
#477 – Keyu Jin: China’s Economy, Tariffs, Trade, Trump, Communism & Capitalism #477 – Keyu Jin: China’s Economy, Tariffs, Trade, Trump, Communism & Capitalism

Keyu Jin is an economist specializing in China’s economy, international macroeconomics, global trade imbalances, and financial policy.

She is the author of The New China Playbook: Beyond Socialism and Capitalism.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep477-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://alliocapital.com/UPLIFT Desk: Standing desks and office ergonomics.

Go to https://upliftdesk.com/lexHampton: Community for high-growth founders and CEOs.

4 months назад @ lexfridman.com
#476 – Jack Weatherford: Genghis Khan and the Mongol Empire
#476 – Jack Weatherford: Genghis Khan and the Mongol Empire #476 – Jack Weatherford: Genghis Khan and the Mongol Empire

Jack Weatherford is an anthropologist and historian specializing in Genghis Khan and the Mongol Empire.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep476-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://alliocapital.com/ZocDoc: App that helps patients find healthcare providers.

Go to https://zocdoc.com/lexFin: AI agent for customer service.

Go to https://shopify.com/lexMasterClass: Online classes from world-class experts.

4 months, 2 weeks назад @ lexfridman.com
#475 – Demis Hassabis: Future of AI, Simulating Reality, Physics and Video Games
#475 – Demis Hassabis: Future of AI, Simulating Reality, Physics and Video Games #475 – Demis Hassabis: Future of AI, Simulating Reality, Physics and Video Games

Demis Hassabis is the CEO of Google DeepMind and Nobel Prize winner for his groundbreaking work in protein structure prediction using AI.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep475-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://joinhampton.com/lexFin: AI agent for customer service.

Go to https://shopify.com/lexLMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lexAG1: All-in-one daily nutrition drink.

4 months, 3 weeks назад @ lexfridman.com
#474 – DHH: Future of Programming, AI, Ruby on Rails, Productivity & Parenting
#474 – DHH: Future of Programming, AI, Ruby on Rails, Productivity & Parenting #474 – DHH: Future of Programming, AI, Ruby on Rails, Productivity & Parenting

David Heinemeier Hansson (aka DHH) is a legendary programmer, creator of Ruby on Rails, co-owner & CTO of 37signals that created Basecamp, HEY, & ONCE, and is a NYT-best-selling author (with Jason Fried) of 4 books: REWORK, REMOTE, Getting Real, and It Doesn’t Have To Be Crazy At Work.

He is also a race car driver, including a class-winning performance at the 24 hour Le Mans race.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep474-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lexLindy: No-code AI agent builder.

Go to https://go.lindy.ai/lexLMNT: Zero-sugar electrolyte drink …

5 months назад @ lexfridman.com
#473 – Iran War Debate: Nuclear Weapons, Trump, Peace, Power & the Middle East
#473 – Iran War Debate: Nuclear Weapons, Trump, Peace, Power & the Middle East #473 – Iran War Debate: Nuclear Weapons, Trump, Peace, Power & the Middle East

Debate on Iran war between Scott Horton and Mark Dubowitz.

Scott Horton is the author and director of the Libertarian Institute, editorial director of Antiwar.com, host of The Scott Horton Show, and for the past three decades, a staunch critic of U.S. foreign policy and military interventionism.

Mark Dubowitz is the chief executive of the Foundation for Defense of Democracies, host of the Iran Breakdown podcast, and a leading expert on Iran and its nuclear program for over 20 years.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep473-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://drinkLMNT.c…

5 months, 2 weeks назад @ lexfridman.com
Microsoft Research Podcast Microsoft Research Podcast
последний пост 1 week, 4 days назад
Ideas: Community building, machine learning, and the future of AI
Ideas: Community building, machine learning, and the future of AI Ideas: Community building, machine learning, and the future of AI

This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS.

In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

So around that time when I started my PhD at Penn, I was working in machine learning theory and algorithmic economics.

How had you experienced a lack of community or network of women in machine learning before the founding of WiML?

So particularly when working on topics related to fairness, I’ve ended up focusing a bunch on stuff to do with marginalized groups as part of my responsible AI work.

1 week, 4 days назад @ microsoft.com
Ideas: More AI-resilient biosecurity with the Paraphrase Project
Ideas: More AI-resilient biosecurity with the Paraphrase Project Ideas: More AI-resilient biosecurity with the Paraphrase Project

Today, I’m excited to talk about the Paraphrase Project, an effort I co-led exploring how advances in AI tools for protein design might impact biosecurity.

These “patches,” akin to those in cybersecurity, have now been shared with organizations globally to strengthen biosecurity screening.

The project highlights that the same AI tools capable of incredible good can also be misused, requiring us to be vigilant, thoughtful, and creative so we continue to get the most benefit out of AI tools while working to ensure that we avoid costly misuses.

So things like, how similar is this to that template, wild-type protein structure that we used as our conditioning information?

But I feel like broadly…

2 months, 1 week назад @ microsoft.com
Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education
Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education

KOHANE: So I think you’ve “nerd sniped” me because you [LAUGHTER]—which is all too easy—but I think there’s a central issue here.

But I actually think this is dark matter of human organizational technology that is not well understood.

AZEEM AZHAR: We didn’t talk about, you know, AI in its ability to potentially do this, which is to extend the clinician’s presence throughout the week.

And so I think there’s always going to be an opening for either differences of opinion or agreeing with you too much.

And this gets into whether AI is really going to get almost to the ab initio understanding of human biology.

3 months, 3 weeks назад @ microsoft.com
Reimagining healthcare delivery and public health with AI
Reimagining healthcare delivery and public health with AI Reimagining healthcare delivery and public health with AI

We are sorry, the page you requested cannot be found.

The page you are looking for could not be found or is no longer available.

4 months, 1 week назад @ microsoft.com
Navigating medical education in the era of generative AI
Navigating medical education in the era of generative AI Navigating medical education in the era of generative AI

Prior to med school, Daniel pursued experiences that cultivated his interest in the application of AI in medical practice and education.

Really, really looking forward to this chat.

There’s AI before ChatGPT and before, you know, generative AI really became a big thing, and then afterwards.

And then after we talk about what’s really happening, what do you think should happen in medical education given the reality of generative AI?

And I do agree [that] AI really gives us real hope that we can make it true.

4 months, 3 weeks назад @ microsoft.com
AI Testing and Evaluation: Reflections
AI Testing and Evaluation: Reflections AI Testing and Evaluation: Reflections

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

We have examples, like the pharmaceutical or medical device industry experts with whom you spoke, that’s really, you know, testing … there is a pre-deployment requirement.

And the third is just how rigid versus adaptive these testing and evaluation regimes or frameworks are in these different domains.

I really agree that there has been a lot of emphasis to date on, sort of, testing models upstream, the AI model evaluation.

You know, I think there’s been real progress already in the AI evaluation and testing ecosystem in the public-private partnership context.

4 months, 3 weeks назад @ microsoft.com
AI Testing and Evaluation: Learnings from cybersecurity
AI Testing and Evaluation: Learnings from cybersecurity AI Testing and Evaluation: Learnings from cybersecurity

Absolutely, I really, really was.

As a principal director on the Microsoft AI Red Team, Tori leads all AI security and safety red team operations, as well as dangerous capability testing, to directly inform C-suite decision-makers.

This year, we’ve pulled a lot of those assets and insights into the Azure [AI] Foundry AI Red Teaming Agent (opens in new tab).

So you can get a little taste of what we do day to day in the AI Red Teaming Agent.

WESTERHOFF: I think the most important takeaway from those lessons is that AI security is truly a team sport.

5 months назад @ microsoft.com
How AI will accelerate biomedical research and discovery
How AI will accelerate biomedical research and discovery How AI will accelerate biomedical research and discovery

Dr. Eric Topol is the executive vice president of the biomedical research non-profit Scripps Research, where he founded and now directs the Scripps Research Translational Institute.

Let’s continue our deep dive on AI and biomedical research with this conversation with Noubar Afeyan:LEE: Noubar, thanks so much for joining.

And there’s the origin story of contact with AI, you know, before the emergence of generative AI and afterwards.

What is going on today with respect to AI really being used for something meaningful in the design and development of drugs?

TOPOL: You would read about how, you know, data is the new oil and, you know, gold and whatnot.

5 months назад @ microsoft.com
AI Testing and Evaluation: Learnings from pharmaceuticals and medical devices
AI Testing and Evaluation: Learnings from pharmaceuticals and medical devices AI Testing and Evaluation: Learnings from pharmaceuticals and medical devices

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

During the pre-market phase, medical testing establishes baseline safety and effectiveness metrics through bench testing, performance standards, and clinical studies.

SULLIVAN: So medical devices face a pretty prescriptive multi-level testing path before they hit the market.

We are looking into medical devices, as well, obviously, but also other technologies in advanced medical computing.

So we see Phase 3 trials as something that occurs in the medical devices and pharmaceuticals field.

5 months, 1 week назад @ microsoft.com
AI Testing and Evaluation: Learnings from genome editing
AI Testing and Evaluation: Learnings from genome editing AI Testing and Evaluation: Learnings from genome editing

As generative AI continues to advance, Microsoft has gathered a range of experts—from genome editing to cybersecurity—to share how their fields approach evaluation and risk assessment.

CHARO: Well, you know, genome editing is both very old and very new.

Now the earliest forms of genome editing were very inefficient, and so we didn’t worry that much.

But the bottom-line thing to remember, the way to really think about it is, we don’t regulate genome editing; we regulate the things that use genome editing.

And she said, you know, we don’t regulate genome editing; we regulate the things that use genome editing.

5 months, 2 weeks назад @ microsoft.com
AI Testing and Evaluation: Learnings from Science and Industry
AI Testing and Evaluation: Learnings from Science and Industry AI Testing and Evaluation: Learnings from Science and Industry

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

And I think, really, there are two reasons why tech is so, kind of, representative of that kind of challenge that I’ve always found fascinating.

Continues to be a really important topic in the AI policy conversation right now, I think, for really good reason.

Testing is an important component for governance and AI and, of course, in all of these other domains, as well.

I think about almost, like, in the near to mid-term, like three issues that we need to address in the AI, kind of, policy and testing context.

5 months, 3 weeks назад @ microsoft.com
The AI Revolution in Medicine, Revisited: How AI is reshaping the future of healthcare and medical research
The AI Revolution in Medicine, Revisited: How AI is reshaping the future of healthcare and medical research The AI Revolution in Medicine, Revisited: How AI is reshaping the future of healthcare and medical research

LEE: Yeah, yeah.

It cannot—as, you know, Bill was saying—it cannot learn from your document.

And I don’t know if the two of you remember, but I ended up doing a lot of tests.

I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3- mini (opens in new tab).

Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know.

6 months назад @ microsoft.com
What AI's impact on individuals means for the health workforce and industry
What AI's impact on individuals means for the health workforce and industry What AI's impact on individuals means for the health workforce and industry

So I don’t think we should be surprised that business schools matter on this because we care about management.

That’s really going to change the way, like, middle school works, was my thinking at the time.

We’ve gone from AI being highly discriminative to AI that’s able to explore the world in particular ways.

The symptoms that they’re showing are quite different, and also their compliance is really, really different.

LEE: Yeah, really, really interesting.

6 months, 2 weeks назад @ microsoft.com
Abstracts: Zero-shot models in single-cell biology with Alex Lu
Abstracts: Zero-shot models in single-cell biology with Alex Lu Abstracts: Zero-shot models in single-cell biology with Alex Lu

And single-cell foundation models claim to be capable of unraveling deeper insights than ever before.

Basically, we showed that single-cell foundation models perform worse in settings that are fundamental to biological discovery than much simpler machine learning and statistical methods that were used in the field before single-cell foundation models emerged and are the go-to standard for unpacking meaning from these complicated experiments.

And the way to understand this is because single-cell foundation models are trained in a way that tries to expose these models to millions of single-cells.

But let’s also talk about the impact for methodologists, people who are trying to improve these s…

6 months, 3 weeks назад @ microsoft.com
Abstracts: Aurora with Megan Stanley and Wessel Bruinsma
Abstracts: Aurora with Megan Stanley and Wessel Bruinsma Abstracts: Aurora with Megan Stanley and Wessel Bruinsma

This is such exciting work about environmental forecasting, so we’re happy to have the two of you join us today.

Mostly because AI weather forecasting models are computationally much more efficient and can even be more accurate.

What’s unfortunate though, about this big step forward, is that these developments are mostly limited to the setting of weather forecasting.

Weather forecasting is very important, obviously, but there are many other important environmental forecasting problems out there, such as air pollution forecasting or ocean wave forecasting.

STANLEY: Current approaches have really focused training very specifically on weather forecasting models.

6 months, 3 weeks назад @ microsoft.com
NLP Highlights NLP Highlights
последний пост None
Data Skeptic
последний пост 4 days, 13 hours назад
Cracking the Cold Start Problem
Cracking the Cold Start Problem Cracking the Cold Start Problem

In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations. Boya shares insigh…

4 days, 13 hours назад @ dataskeptic.com
Designing Recommender Systems for Digital Humanities
Designing Recommender Systems for Digital Humanities Designing Recommender Systems for Digital Humanities

In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Monasterium.net, Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique research needs and information-seeking behaviors. Florian explains the technical challen…

2 weeks, 5 days назад @ dataskeptic.com
DataRec Library for Reproducible in Recommend Systems
DataRec Library for Reproducible in Recommend Systems DataRec Library for Reproducible in Recommend Systems

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Mario Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon reviews. The conversation covers Alberto's research journey through knowledge graphs, graph-based recommen…

4 weeks, 1 day назад @ dataskeptic.com
Shilling Attacks on Recommender Systems
Shilling Attacks on Recommender Systems Shilling Attacks on Recommender Systems

In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a …

1 month, 1 week назад @ dataskeptic.com
Music Playlist Recommendations
Music Playlist Recommendations Music Playlist Recommendations

In this episode, Rebecca Salganik, a PhD student at the University of Rochester with a background in vocal performance and composition, discusses her research on fairness in music recommendation systems. She explores three key types of fairness—group, individual, and counterfactual—and examines how algorithms create challenges like popularity bias (favoring mainstream content) and multi-interest bias (underserving users with diverse tastes). Rebecca introduces LARP, her multi-stage multimodal framework for playlist continuation that uses contrastive learning to align text and audio representations, learn song relationships, and create playlist-level embeddings to address the cold start prob…

1 month, 2 weeks назад @ dataskeptic.com
Bypassing the Popularity Bias
Bypassing the Popularity Bias Bypassing the Popularity Bias 1 month, 4 weeks назад @ dataskeptic.com
Sustainable Recommender Systems for Tourism
Sustainable Recommender Systems for Tourism Sustainable Recommender Systems for Tourism

In this episode, we speak with Ashmi Banerjee, a doctoral candidate at the Technical University of Munich, about her pioneering research on AI-powered recommender systems in tourism. Ashmi illuminates how these systems can address exposure bias while promoting more sustainable tourism practices through innovative approaches to data acquisition and algorithm design. Key highlights include leveraging large language models for synthetic data generation, developing recommendation architectures that balance user satisfaction with environmental concerns, and creating frameworks that distribute tourism more equitably across destinations. Ashmi's insights offer valuable perspectives for both AI res…

2 months назад @ dataskeptic.com
Interpretable Real Estate Recommendations
Interpretable Real Estate Recommendations Interpretable Real Estate Recommendations

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich interviews Dr. Kunal Mukherjee, a postdoctoral research associate at Virginia Tech, about the paper "Z-REx: Human-Interpretable GNN Explanations for Real Estate Recommendations" The discussion explores how the post-COVID real estate landscape has created a need for better recommendation systems that can introduce home buyers to emerging neighborhoods they might not know about. Dr. Mukherjee, explains how his team developed a graph neural network approach that not only recommends properties but provides human-interpretable explanations for why certain regions are suggested. The conversation covers the advantages o…

2 months, 3 weeks назад @ dataskeptic.com
Why Am I Seeing This?
Why Am I Seeing This? Why Am I Seeing This?

In this episode of Data Skeptic, we explore the challenges of studying social media recommender systems when exposure data isn't accessible. Our guests Sabrina Guidotti, Gregor Donabauer, and Dimitri Ognibene introduce their innovative "recommender neutral user model" for inferring the influence of opaque algorithms.

3 months назад @ dataskeptic.com
Eco-aware GNN Recommenders
Eco-aware GNN Recommenders Eco-aware GNN Recommenders

In this episode of Data Skeptic, we dive into eco-friendly AI with Antonio Purificato, a PhD student from Sapienza University of Rome. Antonio discusses his research on "EcoAware Graph Neural Networks for Sustainable Recommendations" and explores how we can measure and reduce the environmental impact of recommender systems without sacrificing performance.

3 months, 2 weeks назад @ dataskeptic.com
Networks and Recommender Systems
Networks and Recommender Systems Networks and Recommender Systems

Kyle reveals the next season's topic will be "Recommender Systems". Asaf shares insights on how network science contributes to the recommender system field.

3 months, 3 weeks назад @ dataskeptic.com
Network of Past Guests Collaborations
Network of Past Guests Collaborations Network of Past Guests Collaborations

Kyle and Asaf discuss a project in which we link former guests of the podcast based on their co-authorship of academic papers.

4 months, 3 weeks назад @ dataskeptic.com
The Network Diversion Problem
The Network Diversion Problem The Network Diversion Problem

In this episode, Professor Pål Grønås Drange from the University of Bergen, introduces the field of Parameterized Complexity - a powerful framework for tackling hard computational problems by focusing on specific structural aspects of the input. This framework allows researchers to solve NP-complete problems more efficiently when certain parameters, like the structure of the graph, are "well-behaved". At the center of the discussion is the network diversion problem, where the goal isn’t to block all routes between two points in a network, but to force flow - such as traffic, electricity, or data - through a specific path. While this problem appears deceptively similar to the classic "Min.Cu…

5 months, 1 week назад @ dataskeptic.com
Complex Dynamic in Networks
Complex Dynamic in Networks Complex Dynamic in Networks

In this episode, we learn why simply analyzing the structure of a network is not enough, and how the dynamics - the actual mechanisms of interaction between components - can drastically change how information or influence spreads. Our guest, Professor Baruch Barzel of Bar-Ilan University, is a leading researcher in network dynamics and complex systems ranging from biology to infrastructure and beyond. BarzelLab BarzelLab on Youtube Paper in focus: Universality in network dynamics, 2013

5 months, 2 weeks назад @ dataskeptic.com
Github Network Analysis
Github Network Analysis Github Network Analysis 5 months, 3 weeks назад @ dataskeptic.com
SuperDataScience SuperDataScience
последний пост 16 часов назад
948: In Case You Missed It in November 2025
948: In Case You Missed It in November 2025 948: In Case You Missed It in November 2025

In this November episode of “In Case You Missed It” series, Jon Krohn selects his favorite clips from the month. Hear from Shirish Gupta and Tyler Cox (Episode 939), Vikoy Pandey (Episode 941), Marc Dupuis (Episode 937), and Maya Ackerman (Episode 943) on getting back to human motivation and the importance of evaluating the tools and data we use. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/948⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

16 часов назад @ podtrac.com
947: How to Get Hired at Top Firms like Netflix and Spotify, with Jeff Li
947: How to Get Hired at Top Firms like Netflix and Spotify, with Jeff Li 947: How to Get Hired at Top Firms like Netflix and Spotify, with Jeff Li

Jeff Li tells Jon Krohn what it's like to work at scale as a data scientist and a machine learning engineer at Netflix, Spotify and DoorDash, as well as how to get a foot in the door at these companies. Jeff also discusses how to run forecasts and trends, and how to read their results. Listen to hear Jeff Li discuss how Spotify became a podcast powerhouse, his startup move.ai, and the tools he uses every day. This episode is brought to you by the ⁠⁠Dell⁠⁠, by ⁠⁠Intel⁠⁠, by Fabi, and by Airia. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/947⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship informat…

3 days, 16 hours назад @ podtrac.com
946: How Robotaxis Are Transforming Cities
946: How Robotaxis Are Transforming Cities 946: How Robotaxis Are Transforming Cities

Jon Krohn looks into the benefits of robotaxis, from safety to affordability, in this Five-Minute Friday. Hear about Waymo’s partnership with Jaguar Land Rover, the latest safety studies concerning driverless vehicles, and a case for robotaxis becoming the preferred method of transport in the US, where households spend roughly 15% of their budget on vehicle ownership. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/946⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 week назад @ podtrac.com
945: AI is a Joke, with Joel Beasley
945: AI is a Joke, with Joel Beasley 945: AI is a Joke, with Joel Beasley

Is there humor in data? Joel Beasley, host of Modern CTO, tells Jon Krohn how he used AI to turn his sights to stand-up comedy. He also shares his tips on tech leadership that he learned from his popular podcast, Modern CTO, and how he is using generative AI as a collaborative partner in his creative work. This episode is brought to you by the ⁠⁠Dell⁠⁠, by ⁠⁠Intel⁠⁠, by Fabi, and by Gurobi⁠⁠⁠. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/945⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (02:14) Joel Beasley on his comedy career (19:04) Applying the ‘me…

1 week, 3 days назад @ podtrac.com
944: Gemini 3 Pro: Google’s Back on Top
944: Gemini 3 Pro: Google’s Back on Top 944: Gemini 3 Pro: Google’s Back on Top

Google is steaming ahead with launching its top-league new Gemini 3 Pro model across their product suite, from Google Search to Vertex AI cloud services. The multinational tech company is also letting eager early adopters like Wayfair and GitHub. Get all the detailed data, its performance across hard-to-game industry benchmarks, and what this all means for the way you use generative AI, in this week’s episode. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/944⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

2 weeks назад @ podtrac.com
943: Creative Machines: AI in Music and Art, with Prof. Maya Ackerman
943: Creative Machines: AI in Music and Art, with Prof. Maya Ackerman 943: Creative Machines: AI in Music and Art, with Prof. Maya Ackerman

Creative human-AI partnerships and AI-generated music: WaveAI CEO and co-founder Maya Ackerman speaks with Jon Krohn about learning to see – and accept – AI’s potential as a creative partner in a human-centric, AI-forward future. Listen to the episode to hear Maya Ackerman discuss reframing hallucination as a creative force, her work at WaveAI, and how to push the boundaries of creativity using generative AI. This episode is brought to you by the ⁠⁠Dell⁠⁠, by ⁠⁠Intel⁠⁠, by Gurobi⁠⁠⁠ and by Airia. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/943⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship informat…

2 weeks, 3 days назад @ podtrac.com
942: Odds of AGI by 2040? LEAP Expert Forecasts and Workforce Implications
942: Odds of AGI by 2040? LEAP Expert Forecasts and Workforce Implications 942: Odds of AGI by 2040? LEAP Expert Forecasts and Workforce Implications

What’s on the horizon for AI? Jon Krohn wades through opinions from more than experts, curated by the Longitudinal Expert AI Panel (LEAP), about what we can expect from the industry. From estimates on AI-assisted workers through energy consumption to AI performance in highly skilled domains, find out just how much LEAP thinkers believe AI is permeating our daily work and life in this Five-Minute Friday. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/942⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

3 weeks назад @ podtrac.com
941: Multi-Agent Human Societies, with Dr. Vijoy Pandey
941: Multi-Agent Human Societies, with Dr. Vijoy Pandey 941: Multi-Agent Human Societies, with Dr. Vijoy Pandey

Vijoy Pandey imagines a bold new society in which agents and humans make scientific discoveries and complete physical tasks together, and he tells Jon Krohn about his work at AGNTCY, Cisco’s open-source platform for the Internet of Agents. Listen to the episode to hear Vijoy Pandey talk about how a future society in which multi-agents and humans interact may be a real possibility, what TCP/IP is, how to find trustworthy AI agents, and how to get your hands on AGNTCY today! This episode is brought to you by the Dell⁠⁠⁠⁠⁠⁠⁠⁠⁠, by⁠ ⁠⁠Intel⁠⁠⁠, by ⁠Fabi⁠ and by ⁠Gurobi⁠⁠⁠⁠. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/941⁠⁠⁠ Interested in sponsoring a SuperDataScience Po…

3 weeks, 3 days назад @ podtrac.com
SDS 940: In Case You Missed It in October 2025
SDS 940: In Case You Missed It in October 2025 SDS 940: In Case You Missed It in October 2025

Jon Krohn curates a selection of clips from the month that was. Hear from the orchestrators of an expanding AI universe in this episode of In Case You Missed It, with news, views and groundbreaking ideas from Sheamus McGovern, Jerry Yurchisin, Stephanie Hare, Larissa Schneider, and Adrian Kosowsky. We cover baby dragons, the Hippocratic Oath, and, of course, all the latest in artificial intelligence! Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/940⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

4 weeks назад @ podtrac.com
939: Mixture-of-Experts and State-Space Models on Edge Devices, with Tyler Cox and Shirish Gupta
939: Mixture-of-Experts and State-Space Models on Edge Devices, with Tyler Cox and Shirish Gupta 939: Mixture-of-Experts and State-Space Models on Edge Devices, with Tyler Cox and Shirish Gupta

State space models (SSMs), granite models, and Mamba: Dell’s Tyler Cox and Shirish Gupta discuss with Jon Krohn why state space models can process information so efficiently, and how Dell’s AI factory helps enterprises manage custom AI workloads. Hear the latest on the Dell Pro AI Studio and Dell’s partnerships with IBM and Hugging Face in this episode. This episode is brought to you by the Trainium2, the latest AI chip from AWS and by Gurobi. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/939⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (02:58) Dell Pro AI…

1 month назад @ podtrac.com
938: Frontier AI Agents for Data Science, with Sphinx’s Rohan Kodialam
938: Frontier AI Agents for Data Science, with Sphinx’s Rohan Kodialam 938: Frontier AI Agents for Data Science, with Sphinx’s Rohan Kodialam

Jon Krohn speaks to Rohan Kodialam, Cofounder and CEO of Sphinx, the company that redefines how machine intelligence reasons data with frontier AI. In this Feature Friday, Jon and Rohan discuss the benefits of using Sphinx to assist with data analysis. Get under the hood to learn how Sphinx operates, from running commands to ensuring your data stays secure, and find out how you can get your hands on this great tool for free. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/938⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month назад @ podtrac.com
937: How to Design AI-First Products, with Marc Dupuis
937: How to Design AI-First Products, with Marc Dupuis 937: How to Design AI-First Products, with Marc Dupuis

AI tools won’t eliminate but elevate data scientists, says Marc Dupuis. The CEO of fabi.ai talks to Jon Krohn about the new wave of AI-driven platforms that integrate workflows within popular work tools like Slack and email, and how building AI-first products means widening access to all ability levels. This episode is brought to you by the Gurobi⁠⁠⁠⁠, by ⁠⁠Dell⁠⁠ and by ⁠⁠Intel. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/937 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (09:31) Will fabi.ai outshine data science practitioners (20:40) Resolving workflows w…

1 month, 1 week назад @ podtrac.com
936: LLMs Are Delighted to Help Phishing Scams
936: LLMs Are Delighted to Help Phishing Scams 936: LLMs Are Delighted to Help Phishing Scams

How much power – and risk – do we carry around with us in our pockets? A Reuters investigation about how easily LLMs can be utilized for online phishing scams is the subject of this week’s Five-Minute Friday with Jon Krohn. By asking six of the most popular LLMs (Grok, ChatGPT, Meta AI, Claude, DeepSeek and Gemini) to generate phishing emails specifically targeting elderly people, Reuters found the safety sometimes severely lacking in the models. Listen to the episode to hear Jon quantify this problem with real-world examples, why mere content warnings in LLM models don’t work, and the troubling results of the phishing requests. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/…

1 month, 1 week назад @ podtrac.com
935: Global Issues Accelerated by AI (with Solutions), feat. Stephanie Hare
935: Global Issues Accelerated by AI (with Solutions), feat. Stephanie Hare 935: Global Issues Accelerated by AI (with Solutions), feat. Stephanie Hare

Jon Krohn speaks to researcher, broadcaster and author Stephanie Hare about how the Hippocratic Oath might apply to artificial intelligence, and a guiding ethos for pushing innovation while protecting users from harm. A code of conduct, she says, could be one approach to ensuring that people are using technology more mindfully and ethically, as well as an opportunity for users to feel that they belong to a wider, global community. Although she sympathizes with people concerned by overregulation undermining innovation, Stephanie also notes that we expect certain standards to be met elsewhere, such as vehicle and drug safety, as well as fair journalistic practices. As Stephanie explains, we n…

1 month, 2 weeks назад @ podtrac.com
934: Is AI Replacing Junior Workers?
934: Is AI Replacing Junior Workers? 934: Is AI Replacing Junior Workers?

With the number of jobs dramatically slowing in the last year, many question if this decline is down to companies turning to AI for completing entry-level tasks in particular. Research published earlier this month by Yale University shows no major difference in the types of roles and tasks in so-called `white-collar jobs` since late 2022, an auspicious date that coincides with the launch of ChatGPT. In this week‘s Five-Minute Friday, host Jon Krohn discusses if and when AI will undercut junior-level jobs, particularly in the US. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/934⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience…

1 month, 2 weeks назад @ podtrac.com
Data Science at Home Data Science at Home
последний пост 2 weeks, 2 days назад
Your AI Strategy is Burning Money: Here’s How to Fix It (Ep.295)
Your AI Strategy is Burning Money: Here’s How to Fix It (Ep.295) Your AI Strategy is Burning Money: Here’s How to Fix It (Ep.295)

Most companies don’t have an AI problem.

In this conversation, he breaks down when AI actually makes sense, where AWS costs spiral out of control, and why your “cool demo” keeps dying before launch.

If you’re tired of AI hype and ready for straight answers, hit play.

Our Discord community is full of ML engineers, researchers, and AI enthusiasts discussing papers, sharing projects, and helping each other level up.

Whether you’re debugging your first neural net or training your tenth transformer, there’s a place for you.

2 weeks, 2 days назад @ datascienceathome.com
From Tokens to Vectors: The Efficiency Hack That Could Save AI (Ep. 294)
From Tokens to Vectors: The Efficiency Hack That Could Save AI (Ep. 294) From Tokens to Vectors: The Efficiency Hack That Could Save AI (Ep. 294)

LLMs generate text painfully slow, one low-info token at a time.

Researchers just figured out how to compress 4 tokens into smart vectors & cut costs by 44%—with full code & proofs!

🔥📊SponsorsThis episode is brought to you by Statistical HorizonsAt Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics and AI methods practical and accessible.

Join thousands of researchers and professionals who’ve advanced their careers with Statistical Horizons.

Get $200 off any seminar with code DATA25 at https://statisticalhorizons.com

1 month назад @ datascienceathome.com
Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 293)
Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 293) Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 293)

VortexNet uses actual whirlpools to build neural networks.

By borrowing equations from fluid dynamics, this new architecture might solve deep learning’s toughest problems—from vanishing gradients to long-range dependencies.

Today we explain how vortex shedding, the Strouhal number, and turbulent flows might change everything in AI.

SponsorsThis episode is brought to you by Statistical HorizonsAt Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics and AI methods practical and accessible.

Join thousands of researchers and professionals who’ve advanced their careers with Statistical Horizons.

1 month, 1 week назад @ datascienceathome.com
The Scientists Growing Living Computers in Swiss Labs (Ep. 292)
The Scientists Growing Living Computers in Swiss Labs (Ep. 292) The Scientists Growing Living Computers in Swiss Labs (Ep. 292)

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

With a focus on dual-use innovation, Amethix is shaping a future where intelligent machines extend human capability, not replace it.

Discover more at https://amethix.com This episode is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Learn more at intrepid.aiReferencesWebsite: finalspark.comDiscord account: / discordNewsletter: https://finalspark.com/#newsletterTopics: Biological computing • Neural engineering • Energy-effic…

1 month, 2 weeks назад @ datascienceathome.com
When AI Hears Thunder But Misses the Fear (Ep. 291)
When AI Hears Thunder But Misses the Fear (Ep. 291) When AI Hears Thunder But Misses the Fear (Ep. 291)

Sanjoy Chowdhury reveals AI’s hidden weakness: while systems can see objects and hear sounds perfectly, they can’t reason across senses like humans do.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at https://amethix.comThis episode is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Whether it’s in the sky, on the ground, or in orbit—if it’s intelligent and mobile, Intrepid helps you build it.

2 months назад @ datascienceathome.com
Why VCs Are Funding $100M Remote Control Toys (Ep. 290)
Why VCs Are Funding $100M Remote Control Toys (Ep. 290) Why VCs Are Funding $100M Remote Control Toys (Ep. 290)

ReferencesWar On The Rocks: https://warontherocks.com/2025/08/ukraine-isnt-the-model-for-winning-the-innovation-war/LinkedIn: https://www.linkedin.com/in/jonasrsinger/Spotify: https://tr.ee/Omy_1X8k1UApple Podcast: https://podcasts.apple.com/us/podcast/defence-innovation-podcast/id1797131332YouTube: https://youtube.com/@DefenceInnovationpodcast?si=cu2WlnVgL5XKnM0pSponsorsThis episode is proudly sponsored by Amethix Technologies.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at https://amethix.comThis episode is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers…

2 months, 3 weeks назад @ datascienceathome.com
How Hacker Culture Died (Ep. 289)
How Hacker Culture Died (Ep. 289) How Hacker Culture Died (Ep. 289)

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.comDSH is brought to you by Intrepid AI.

🐦 Twitter: @DataScienceAtHome📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/Instagram: https://www.instagram.com/datascienceathome/Facebook: https://www.facebook.com/datascienceAHLinkedIn: https://www.linkedin.com/company/data-science-at-home-podcastDiscord Channel: https://discord.gg/4UNKGf3NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Send us mail at:[email protected]’t forget to like, subscribe, and hit the 🔔 for…

3 months, 2 weeks назад @ datascienceathome.com
Robots Suck (But It’s Not Their Fault) (Ep. 288)
Robots Suck (But It’s Not Their Fault) (Ep. 288) Robots Suck (But It’s Not Their Fault) (Ep. 288)

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.comDSH is brought to you by Intrepid AI.

🐦 Twitter: @DataScienceAtHome📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/Instagram: https://www.instagram.com/datascienceathome/Facebook: https://www.facebook.com/datascienceAHLinkedIn: https://www.linkedin.com/company/data-science-at-home-podcastDiscord Channel: https://discord.gg/4UNKGf3NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Send us mail at:[email protected]’t forget to like, subscribe, and hit the 🔔 for…

4 months, 1 week назад @ datascienceathome.com
Your Favorite AI Startup is Probably Bullshit (Ep. 287)
Your Favorite AI Startup is Probably Bullshit (Ep. 287) Your Favorite AI Startup is Probably Bullshit (Ep. 287)

The brutal truth about why Silicon Valley is blowing billions on glorified autocomplete while pretending it’s the next iPhone.

We’re diving deep into the AI investment circus where VCs who can’t code are funding companies that barely understand their own technology.

From blockchain déjà vu to the “ChatGPT wrapper” economy—this episode will make you question every AI valuation you’ve ever seen.

Fair warning: We’re naming names and calling out the hype.

Don’t listen if you work at a “revolutionary AI startup” that’s just OpenAI’s API with a pretty interface.

4 months, 1 week назад @ datascienceathome.com
Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything (Ep. 286) [RB]
Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything (Ep. 286) [RB] Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything (Ep. 286) [RB]

From the viral article “Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything” on my newsletter at https://defragzone.substack.com/p/techs-dumbest-mistake-why-firinghere are my thoughts about AI replacing programmers…🎙️ Sponsors AGNTCY — The open source collective building the Internet of Agents🌐 https://www.agntcy.org✨ Connect with us!

🐦 Twitter: @DataScienceAtHome📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/Instagram: https://www.instagram.com/datascienceathome/Facebook: https://www.facebook.com/datascienceAHLinkedIn: https://www.linkedin.com/company/data-science-at-home-podcastDiscord Channel: https://discord.gg/4UNKGf3NEW TO DATA SCIENCE AT HOME?

Data Scie…

5 months, 1 week назад @ datascienceathome.com
Brains in the Machine: The Rise of Neuromorphic Computing (Ep. 285)
Brains in the Machine: The Rise of Neuromorphic Computing (Ep. 285) Brains in the Machine: The Rise of Neuromorphic Computing (Ep. 285)

In this episode of Data Science at Home, we explore the fascinating world of neuromorphic computing — a brain-inspired approach to computation that could reshape the future of AI and robotics.

The episode breaks down how neuromorphic systems differ from conventional AI architectures like transformers and LLMs, diving into spiking neural networks (SNNs), their benefits in energy efficiency and real-time processing, and their limitations in training and scalability.

Real-world applications are highlighted, including low-power drones, hearing aids, and event-based cameras.

Francesco closes with a vision of hybrid systems where neuromorphic chips and LLMs coexist, blending biological inspiratio…

5 months, 3 weeks назад @ datascienceathome.com
DSH/Warcoded – AI in the Invisible Battlespace (Ep. 284)
DSH/Warcoded – AI in the Invisible Battlespace (Ep. 284) DSH/Warcoded – AI in the Invisible Battlespace (Ep. 284)

This episode explores the invisible battlespace of cyber and electronic warfare, where AI takes center stage.

SponsorsBuilding multi-agent software is hard — agent-to-agent and agent-to-tool communication is still the wild west.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.comWarcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

6 months, 1 week назад @ datascienceathome.com
DSH/Warcoded Swarming the Battlefield (Ep. 283)
DSH/Warcoded Swarming the Battlefield (Ep. 283) DSH/Warcoded Swarming the Battlefield (Ep. 283)

Swarming the Battlefield explores how artificial intelligence is revolutionizing combat through coordinated drone swarms.

This episode uncovers how these intelligent agents turn the chaos of the battlefield into a synchronized dance of machine warfare.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.comWarcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

6 months, 2 weeks назад @ datascienceathome.com
DSH/Warcoded Kill Chains and Algorithmic Warfare – Autonomy in Targeting and Engagement (Ep. 282)
DSH/Warcoded Kill Chains and Algorithmic Warfare – Autonomy in Targeting and Engagement (Ep. 282) DSH/Warcoded Kill Chains and Algorithmic Warfare – Autonomy in Targeting and Engagement (Ep. 282)

In this gripping follow-up, we dive into how AI is transforming kinetic operations—from identifying a threat to executing a strike.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.comWarcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Whether it’s in the sky, on the ground, or in orbit—if it’s intelligent and mobile, Intrepid helps you build it.

7 months назад @ datascienceathome.com
DSH/Warcoded: Eyes and Ears of the Machine – AI Reconnaissance and Surveillance (Ep. 281)
DSH/Warcoded: Eyes and Ears of the Machine – AI Reconnaissance and Surveillance (Ep. 281) DSH/Warcoded: Eyes and Ears of the Machine – AI Reconnaissance and Surveillance (Ep. 281)

Welcome to DSH/WarcodedWe explore how AI is transforming ISR (Intelligence, Surveillance, Reconnaissance)—from satellite imagery to drone feeds.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com.”Warcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Learn more at intrepid.ai.”#AI #defensetech #ISR #LLM #Warcoded #DataScienceAtHome #OSINT #SIGINT #dronewarfare

7 months, 1 week назад @ datascienceathome.com