Very ML
State-of-the-art Machine Learning News Feed
/r/MachineLearning
последний пост 5 часов назад
5,000 synthetic Australian medical record PDFs - free 50-doc sample [P]
5,000 synthetic Australian medical record PDFs - free 50-doc sample [P]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

5 часов назад @ reddit.com
META Superintelligence Lab Presents: ProgramBench: Can SOTA AI Recreate Real Executable Programs(ffmpeg, SQLite, ripgrep) From Scratch Without The Internet?
META Superintelligence Lab Presents: ProgramBench: Can SOTA AI Recreate Real Executable Programs(ffmpeg, SQLite, ripgrep) From Scratch Without The Internet?

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

7 часов назад @ reddit.com
Running scope enforcement on every agent action in production — what I'm seeing after launch [P]
Running scope enforcement on every agent action in production — what I'm seeing after launch [P]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

10 часов назад @ reddit.com
Dataset of 150k+ stool images and not sure how to fully use it [D]
Dataset of 150k+ stool images and not sure how to fully use it [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

10 часов назад @ reddit.com
Visual Perceptual to Conceptual First-Order Rule Learning Networks [R]
Visual Perceptual to Conceptual First-Order Rule Learning Networks [R]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

10 часов назад @ reddit.com
NeuIPS submission small formatting question [D]
NeuIPS submission small formatting question [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

11 часов назад @ reddit.com
Exploring Black‑Box Optimization [R]
Exploring Black‑Box Optimization [R]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

13 часов назад @ reddit.com
Weights & Biases New Master Service Agreement Questions [D]
Weights & Biases New Master Service Agreement Questions [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

14 часов назад @ reddit.com
Model automatically developed by the AIBuildAI Agent ranked among top 5.7% out of 3,219 human teams in the Kaggle TGS Salt Identification Challenge [P]
Model automatically developed by the AIBuildAI Agent ranked among top 5.7% out of 3,219 human teams in the Kaggle TGS Salt Identification Challenge [P] Model automatically developed by the AIBuildAI Agent ranked among top 5.7% out of 3,219 human teams in the Kaggle TGS Salt Identification Challenge [P]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

19 часов назад @ reddit.com
Stop letting LLMs edit your .bib [D]
Stop letting LLMs edit your .bib [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

23 часа назад @ reddit.com
NeurIPS 2026 AC-Pilot, how much would you trust this? [D]
NeurIPS 2026 AC-Pilot, how much would you trust this? [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day назад @ reddit.com
Wrote an article on Sub 10ms Retrieval Systems [R]
Wrote an article on Sub 10ms Retrieval Systems [R]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day, 7 hours назад @ reddit.com
Evaluating LLM Spatial Grounding: A 100-City Audit of 7,000+ Restaurant Recommendations vs. Google Places for Ground Truth [R]
Evaluating LLM Spatial Grounding: A 100-City Audit of 7,000+ Restaurant Recommendations vs. Google Places for Ground Truth [R]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day, 8 hours назад @ reddit.com
Transformers with Selective Access to Early Representations [R]
Transformers with Selective Access to Early Representations [R] Transformers with Selective Access to Early Representations [R]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day, 9 hours назад @ reddit.com
Is it only me or all the public LLM judges are just bad? [D]
Is it only me or all the public LLM judges are just bad? [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day, 11 hours назад @ reddit.com
Towards Data Science
последний пост 18 часов назад
When the Uncertainty Is Bigger Than the Shock: Scenario Modelling for English Local Elections
When the Uncertainty Is Bigger Than the Shock: Scenario Modelling for English Local Elections When the Uncertainty Is Bigger Than the Shock: Scenario Modelling for English Local Elections

Across 64 English authorities and six 2026 scenarios, even the strongest scenario shock was only 13% of the median uncertainty band.

Two definitions to carry through the rest of the article: scenario shock is the movement in the scenario point estimate relative to the baseline.

The inset reports each scenario shock as a percentage of the median band width.

Switching the sort to P50 or scenario shock reorders the ranking, and the rings still sit inside the bars.

The pipeline, scenario model code, calculated fields, and Tableau build guide are open-source at github.com/Wisabi-Analytics/civic-lens.

18 часов назад @ towardsdatascience.com
Beyond Lists: Using Python Deque for Real-Time Sliding Windows
Beyond Lists: Using Python Deque for Real-Time Sliding Windows Beyond Lists: Using Python Deque for Real-Time Sliding Windows

If we search around the internet, we will find a lot of information about lists, dictionaries, and tuples, but little on deque.

Deque (you can also pronounce it “deck” ) is an interesting and useful type of collection in Python.

# Create a new deque with 3 (or less) elements my_deck = deque(maxlen = 3) # Adding one element my_deck.append(1) my_deck.append(2) my_deck.append(3) # View my_deck # deque([1, 2, 3])Nice.

Using a deque allows you to maintain a “sliding window” of data efficiently.

# Generate Tasks producer() task_queue # [OUT] deque(['Task 0', 'Task 1', 'Task 2', 'Task 3', 'Task 4']) # Consume Tasks consumer() # [OUT] # Processing Task 0 # Processing Task 1 # Processing Task 2 # Pr…

19 часов назад @ towardsdatascience.com
Timer-XL: A Long-Context Foundation Model for Time-Series Forecasting
Timer-XL: A Long-Context Foundation Model for Time-Series Forecasting Timer-XL: A Long-Context Foundation Model for Time-Series Forecasting

What is Timer-XLTimer-XL is a decoder-only Transformer foundation model for forecasting.

Time-Series ApplicationsBy late 2024 and early 2025, numerous foundation models were published, providing ample evidence of what works best.

This makes Timer-XL ideal for high-frequency forecasting — a configuration where foundation models often underperform, as discussed earlier.

Zero-Shot Forecasting BenchmarkFinally, the authors also evaluate Timer-XL as a foundation model, comparing it with other top time series models.

This isn’t a new pattern — we have seen many Transformer forecasting models using this trick.

20 часов назад @ towardsdatascience.com
Why I Don’t Trust LLMs to Decide When the Weather Changed
Why I Don’t Trust LLMs to Decide When the Weather Changed Why I Don’t Trust LLMs to Decide When the Weather Changed

have a simple problem: they show you the forecast, but they don’t tell you when it actually changed.

Modern numerical weather prediction (NWP) systems — like ECMWF IFS — produce remarkably accurate forecasts at ~9 km resolution, updated every few hours.

When I started building AI systems, the instinct was immediate: this is a threshold problem.

The system runs continuously on real forecast data for user-defined events, evaluating changes every few hours and only triggering alerts when predefined conditions are met.

In practice, this often comes down to a simple distinction: use LLMs to explain decisions, not to replace well-defined ones.

22 часа назад @ towardsdatascience.com
Deconstruct Any Metric with a Few Simple ‘What’ Questions
Deconstruct Any Metric with a Few Simple ‘What’ Questions Deconstruct Any Metric with a Few Simple ‘What’ Questions

I don’t know about you, but it hurts when I look at that sentence knowing someone dared to call it a success metric.

Which would make our statement sound like:“The monthly model accuracy improvement is 5x.”Image generated by the author using Gemini.

Now, that’s better, or at least it feels this way, because we can claim we derived the improvement by comparing model accuracy logs across monthly runs.

Having this information, we can rewrite our statement again:“The monthly model accuracy improvement is 5x, growing from a baseline of 1% to 5%.”That looks better again.

Assuming we got an answer, we can tweak the previous version once more:“Comparing May 2026 to April 2026 results, the monthly m…

23 часа назад @ towardsdatascience.com
Discrete Time-To-Event Modeling – Predicting When Something Will Happen
Discrete Time-To-Event Modeling – Predicting When Something Will Happen Discrete Time-To-Event Modeling – Predicting When Something Will Happen

Predicting the “when” is often referred to as time-to-event modeling or survival analysis.

While event modeling shares techniques and intuitions with more traditional predictive modeling, it also introduces nuances that must be accommodated to create effective predictions.

This is the start of a multi-part series that will cover the basics of time-to-event modeling.

Many continuous time-to-event modeling techniques assume that ties are not possible and do not exist in the dataset.

Illustration of censoring in time-to-event data – image by authorWhat happens if you don’t do anything about data censoring?

1 day, 19 hours назад @ towardsdatascience.com
How to Make Claude Code Validate its own Work
How to Make Claude Code Validate its own Work How to Make Claude Code Validate its own Work

In a previous article, I mentioned Claude validating its own work as an important part of how I optimize my own use of Claude Code.

In this article I’ll discuss how to let Claude code verify its own work to increase performance.

Obviously, you’re putting Claude Code in a worse position where it’s going to produce inferior results compared to when Claude Code gets the opportunity to test its own code.

How to make Claude verify work in practiceThe wording “make Claude verify its own work”, often gets thrown around, for example on LinkedIn and X.

ConclusionIn this article, I covered how to make Claude Code validate its own work, to vastly improve the performance of your Claude Code instance or…

1 day, 20 hours назад @ towardsdatascience.com
RAG Hallucinates — I Built a Self-Healing Layer That Fixes It in Real Time
RAG Hallucinates — I Built a Self-Healing Layer That Fixes It in Real Time RAG Hallucinates — I Built a Self-Healing Layer That Fixes It in Real Time

Give the model real documents and it will use them.

That is enough to catch real drift without the database growing unbounded.

If the healed answer still fails re-inspection, it serves a safe decline instead of delivering a wrong answer.

The wrong answer is fixed in-place, the billing cycle language is normalized throughout, and every change is logged.

I am sharing this work to document a pattern that costs real teams real time, not to sell a product or service.

1 day, 22 hours назад @ towardsdatascience.com
Surviving High Uncertainty in Logistics with MARL
Surviving High Uncertainty in Logistics with MARL Surviving High Uncertainty in Logistics with MARL

Here is how we process the RL agent’s action and pass it into the LP solver.

While the “current” agent trains, the others operate purely in frozen inference mode.

Then, we take the current agent’s model and trigger .learn().

Since the current training agent has already taken its action, we need to infer the actions for the rest of the network.

Finally, at the end of the loop, we switch the context back to the current training agent and pass the final results back to the loop.

1 day, 23 hours назад @ towardsdatascience.com
Single Agent vs Multi-Agent: When to Build a Multi-Agent System
Single Agent vs Multi-Agent: When to Build a Multi-Agent System Single Agent vs Multi-Agent: When to Build a Multi-Agent System

AI AgentsWhen building an AI agent, the design choice matters.

There are two major types of memory commonly used in AI agents: short-term memory and long-term memory.

ReAct (Reasoning + Acting) in AgentsAn AI agent differs from a basic chatbot because a chatbot usually follows a more direct workflow: user query → LLM → response.

Single Agent vs Multi-AgentImage Generated By ChatGPTA single agent is an agent design where one LLM handles the whole task.

Demo Video of the Multi Agent Agent RAG ResearcherNotesSession memory is stored in utils/memory.db .. Local Qdrant data is stored in utils/qdrant_storage/ ..

2 days, 15 hours назад @ towardsdatascience.com
How to Build an Efficient Knowledge Base for AI Models
How to Build an Efficient Knowledge Base for AI Models How to Build an Efficient Knowledge Base for AI Models

An accurate and curated knowledge base improves both model speed and accuracy—areas where current models often fall short.

6 steps to build an effective knowledge baseSteps to build a knowledge base | Image by authorTaking a systematic approach to building a knowledge base helps you create one that is standardized, scalable, and self-explanatory.

Tip: There is an increasing trend to feed AI-generated data while building a knowledge base of new AI models.

Knowledge Base Quality Monitoring Knowledge base health with the help of automated checks: 1.

A knowledge base isn’t a data dump but a curated assetBuilding a knowledge base isn’t a one-time project.

2 days, 18 hours назад @ towardsdatascience.com
Playing Connect Four with Deep Q-Learning
Playing Connect Four with Deep Q-Learning Playing Connect Four with Deep Q-Learning

In this post, we take this step by considering the classic game of Connect Four and investigate how to learn strong policies using Deep Q-Learning.

When combined with neural networks, this approach is commonly referred to as Deep Q-Learning.

At the same time, it highlights a key challenge of Deep Q-Learning: the targets themselves depend on the current network, which can lead to instability during training.

For an additional reference, the official PyTorch tutorial on Deep Q-Learning provides a helpful complementary perspective.

ConclusionIn this post, we moved from tabular Sarsa to Deep Q-Learning, introducing replay buffers, batched updates, and function approximation.

2 days, 22 hours назад @ towardsdatascience.com
How AI Tools Generate Technical Debt in IoT Systems — and What to Do About It
How AI Tools Generate Technical Debt in IoT Systems — and What to Do About It How AI Tools Generate Technical Debt in IoT Systems — and What to Do About It

Why am I recalling an event from thirty years ago in a text about technical debt generated by AI tools?

I would highlight four main mechanisms through which AI tools can generate technical debt.

In IoT systems, this mechanism is particularly dangerous because a legacy pattern rarely remains a local issue within a single module.

Image by the authorWhat to do so that AI does not create technical debt in a projectAI in IoT systems requires stricter engineering discipline than development without it.

In IoT systems, the cost of this acceleration is measured not only in developer time, but also in the reliability of thousands of physical devices.

2 days, 23 hours назад @ towardsdatascience.com
CSPNet Paper Walkthrough: Just Better, No Tradeoffs
CSPNet Paper Walkthrough: Just Better, No Tradeoffs CSPNet Paper Walkthrough: Just Better, No Tradeoffs

Next, CHANNEL_POOLING is the parameter we use to adjust the behavior of the channel-pooling mechanism in our first transition layer.

The COMPRESSION parameter works similarly to the CHANNEL_POOLING variable, yet this one operates in the second transition layer.

# Codeblock 5 dense_block = DenseBlock(in_channels=32, repeats=6) x = torch.randn(1, 32, 56, 56) x = dense_block(x)And below is what the output looks like.

# Codeblock 5 Output original : torch.Size([1, 32, 56, 56]) after bottleneck #0 : torch.Size([1, 44, 56, 56]) after bottleneck #1 : torch.Size([1, 56, 56, 56]) after bottleneck #2 : torch.Size([1, 68, 56, 56]) after bottleneck #3 : torch.Size([1, 80, 56, 56]) after bottleneck #4 :…

3 days, 20 hours назад @ towardsdatascience.com
Inference Scaling (Test-Time Compute): Why Reasoning Models Raise Your Compute Bill
Inference Scaling (Test-Time Compute): Why Reasoning Models Raise Your Compute Bill Inference Scaling (Test-Time Compute): Why Reasoning Models Raise Your Compute Bill

Why the bill explodes in productionApple Machine Learning Research identifies a dangerous efficiency gap between reasoning models and standard LLMs.

Reasoning models break traditional linear pricing by introducing two distinct multipliers that impact both budget and infrastructure.

A standard model predicts in one second while a reasoning model can occupy GPU memory for thirty seconds.

Reasoning models are incredibly powerful for high-stakes planning and complex math, but they are overkill for basic formatting or classification.

Treat reasoning tokens like a precious resource, apply them where they are actually needed, and let your fast models handle the rest.

3 days, 22 hours назад @ towardsdatascience.com
Distill.pub Distill.pub
последний пост None
TheSequence TheSequence
последний пост 1 day, 1 hour назад
The Sequence AI of the Week #855: Inside Nemotron Omni: NVIDIA’s New Multimodal Brain for Agents
The Sequence AI of the Week #855: Inside Nemotron Omni: NVIDIA’s New Multimodal Brain for Agents The Sequence AI of the Week #855: Inside Nemotron Omni: NVIDIA’s New Multimodal Brain for Agents

The interesting thing about NVIDIA’s new Nemotron 3 Nano Omni is not that it “does multimodality.” We already have a zoo of models that can caption images, transcribe speech, parse PDFs, answer questions about videos, and click around GUIs.

The interesting thing is that Nemotron Omni is designed to make that zoo feel like a single animal.

The speech model may hear what was said but not what was on screen when it was said.

Nemotron 3 Nano Omni is NVIDIA’s attempt to move the “eyes and ears” of an agent into a single efficient perception-and-reasoning model: video, audio, image, and text in; text out.

NVIDIA announced it on April 28, 2026, positioning it as an open omni-modal reasoning model …

1 day, 1 hour назад @ thesequence.substack.com
The Sequence Knowledge #854: Return of the King: Unrolling the xLSTM Architecture
The Sequence Knowledge #854: Return of the King: Unrolling the xLSTM Architecture The Sequence Knowledge #854: Return of the King: Unrolling the xLSTM Architecture

💡 AI Concept of the Day: Return of the King: Unrolling the xLSTM ArchitectureIf you were training sequence models circa 2015, your entire mental model of the world was shaped by the Long Short-Term Memory (LSTM) network.

Invented in the 1990s by Sepp Hochreiter and Jürgen Schmidhuber, the LSTM was the undisputed workhorse of deep learning.

“Attention Is All You Need” dropped, and the entire AI ecosystem pivoted.

We traded the deep, architectural elegance of the LSTM for the brute-force, highly parallelizable matrix multiplications of the Transformer.

The Transformer won the hardware lottery because it allowed us to map the entire sequence onto a GPU grid and train it all at once.

2 days назад @ thesequence.substack.com
The Sequence Radar #853: Last Week in AI: The Great AI Fundraising Wars and a New Frontier Lab
The Sequence Radar #853: Last Week in AI: The Great AI Fundraising Wars and a New Frontier Lab The Sequence Radar #853: Last Week in AI: The Great AI Fundraising Wars and a New Frontier Lab

Subscribe and don’t miss out:📝 Editorial: Last Week in AI: The Great AI Fundraising Wars and a New Frontier LabThis week in AI felt less like a product cycle and more like a sovereign debt auction for the future of cognition.

But the real story is that frontier AI is becoming an industrial-scale capital formation game.

It is the market trying to price a new kind of company: part model lab, part cloud tenant, part developer platform, part enterprise operating system.

AI Lab: MicrosoftSummary: This paper introduces a scalable methodology to create realistic, user-specific synthetic computer environments populated with diverse directory structures and content-rich artifacts.

🤖 AI Tech Releases…

4 days, 1 hour назад @ thesequence.substack.com
The Sequence Opinion #852: The Bitter Lessons for Agentic Interfaces: A CLI for EVERYTHING
The Sequence Opinion #852: The Bitter Lessons for Agentic Interfaces: A CLI for EVERYTHING The Sequence Opinion #852: The Bitter Lessons for Agentic Interfaces: A CLI for EVERYTHING

The next evolution of agentic SaaS isn’t more tool infrastructure.

I’ve been thinking a lot lately about why building agentic systems still feels so weirdly clunky, and I think I’ve finally put my finger on it.

The thesis I want to argue is simple: the next phase of agentic SaaS is not about chat interfaces, and it’s not about ever-more-elaborate tool infrastructures.

Every SaaS, eventually, will ship a parallel command-line surface — not as a developer convenience, but as the primary interface for its non-human users.

The bitter lesson, applied to interfaces

1 week назад @ thesequence.substack.com
The Sequence AI of the Week #851: DeepSeek-V4 and the Architecture of Million-Token Intelligence
The Sequence AI of the Week #851: DeepSeek-V4 and the Architecture of Million-Token Intelligence The Sequence AI of the Week #851: DeepSeek-V4 and the Architecture of Million-Token Intelligence

The most interesting thing about DeepSeek-V4 is not that it supports a one-million-token context window.

That number is impressive, but context length by itself is a poor proxy for intelligence.

The real question is not: how much text can the model ingest?

The real question is: how much history can the model economically use?

The model is designed around a simple but profound premise: million-token intelligence requires more than scaling the Transformer.

1 week, 1 day назад @ thesequence.substack.com
The Sequence Knowledge #850: The Unexpected Comeback of RNNs
The Sequence Knowledge #850: The Unexpected Comeback of RNNs The Sequence Knowledge #850: The Unexpected Comeback of RNNs

💡 AI Concept of the Day: The Unexpected Comeback of RNNsIf you were building sequence models circa 2015, your mental model of the world was entirely shaped by Recurrent Neural Networks (RNNs).

You feed the network a token, it updates a fixed-size hidden state, and it throws the token away.

During inference, the memory footprint was beautifully constant—an $O(1)$ operation that could run efficiently on almost any hardware.

In a Transformer, the model must explicitly hold the high-dimensional representations of every previous token in memory to generate the next one.

But this is not a nostalgic return to the classic Long Short-Term Memory (LSTM) networks of the 2010s.

1 week, 1 day назад @ thesequence.substack.com
The Sequence Radar #849: Last Week in AI: OpenAI Ships Agents, xAI Eyes Cursor, DeepSeek and Kimi Advance
The Sequence Radar #849: Last Week in AI: OpenAI Ships Agents, xAI Eyes Cursor, DeepSeek and Kimi Advance The Sequence Radar #849: Last Week in AI: OpenAI Ships Agents, xAI Eyes Cursor, DeepSeek and Kimi Advance

AI Lab: Inclusion AI, Ant GroupSummary: This paper introduces LLaDA2.0-Uni, a unified discrete diffusion large language model that seamlessly integrates multimodal understanding and generation within a single framework.

AI Lab: Carnegie Mellon University, Amazon AGISummary: The authors present SkillLearnBench, the first benchmark designed to evaluate continual learning methods for agent skill generation across 20 real-world tasks.

AI Lab: MicrosoftSummary: AutoAdapt is an end-to-end automated framework designed to optimize the complex domain adaptation process for large language models under tight resource constraints.

Kimi 2.6Kimi 2.6 launched with marquee capabilities in agentic coding.

W…

1 week, 4 days назад @ thesequence.substack.com
The Sequence Opinion #848: The Agent’s Hands: CLI or MCP?
The Sequence Opinion #848: The Agent’s Hands: CLI or MCP? The Sequence Opinion #848: The Agent’s Hands: CLI or MCP?

The moment we give it tools, it becomes something else: not merely a chatbot, but an operator.

It can read files, write code, open issues, call APIs, move tickets, delete emails, deploy infrastructure, or wake you up at 3 a.m. because a background workflow misread a calendar event.

So the real primitive of agentic systems is the interface between the model and the world.

Two candidates have emerged as the main bridge: the command-line interface, or CLI, and the Model Context Protocol, or MCP.

Text in, text out, exit code, compose everything.” MCP says: “Agents need structured, discoverable, typed tools.

2 weeks назад @ thesequence.substack.com
The Sequence AI of the Week #847: Everything You Need to Know About Claude Opus 4.7
The Sequence AI of the Week #847: Everything You Need to Know About Claude Opus 4.7 The Sequence AI of the Week #847: Everything You Need to Know About Claude Opus 4.7

The benchmarks are what you’d expect from a two-month incremental release — SWE-bench Verified 87.6%, SWE-bench Pro 64.3%, MCP-Atlas +14.6pp, state-of-the-art on GDPval-AA for economically valuable knowledge work, XBOW visual-acuity 54.5% → 98.5%, finance and document reasoning up, BrowseComp and long-context multi-needle retrieval down.

If you migrate a 4.6 harness to 4.7 and it still sets temperature , top_p , top_k , or thinking.budget_tokens , you get a 400.

In their place: an effort enum ( low , medium , high , xhigh , max ) and task_budget , a soft token ceiling the model can actually see.

You’re no longer tuning the softmax; you’re telling the model how hard to think and how much run…

2 weeks, 1 day назад @ thesequence.substack.com
The Sequence Knowledge #846: Beyond Transformer: A New Series
The Sequence Knowledge #846: Beyond Transformer: A New Series The Sequence Knowledge #846: Beyond Transformer: A New Series

💡 AI Concept of the Day: Beyond Transformer: A New SeriesIf you have been watching the arXiv firehose lately, you can feel a very palpable vibe shift.

Today, we are starting a new series to map out exactly what is happening: the search for novel alternatives to the Transformer architecture.

For the better part of a decade, the entire artificial intelligence ecosystem has essentially been a giant, spectacularly funded wrapper around a single mathematical operation: self-attention.

The Transformer won the hardware lottery of the late 2010s.

It was beautifully parallelizable across GPUs, and its mental model was intuitively simple—every token looks back at every previous token to decide what t…

2 weeks, 2 days назад @ thesequence.substack.com
The Sequence Radar #845: Last Week in AI: Anthropic and OpenAI Enter a New Phase
The Sequence Radar #845: Last Week in AI: Anthropic and OpenAI Enter a New Phase The Sequence Radar #845: Last Week in AI: Anthropic and OpenAI Enter a New Phase

Subscribe and don’t miss out:📝 Editorial: Last Week in AI: Anthropic and OpenAI Enter a New PhaseThis week brought a particularly interesting cluster of releases from Anthropic and OpenAI.

It is about AI splitting into distinct product forms: the general-purpose reasoning model, the domain specialist, and the workflow-native agent.

The deeper pattern across Anthropic and OpenAI is that frontier AI is fragmenting into real products.

AI Lab: NVIDIASummary: Generating large-scale 3D scenes using video diffusion models often suffers from spatial forgetting and temporal drifting over long camera trajectories.

AI Lab: Johns Hopkins UniversitySummary: Current instruction hierarchy paradigms use a …

2 weeks, 4 days назад @ thesequence.substack.com
The Sequence Opinion #844: Harness Engineering: The Operating System for Agentic Software
The Sequence Opinion #844: Harness Engineering: The Operating System for Agentic Software The Sequence Opinion #844: Harness Engineering: The Operating System for Agentic Software

There is a meaningful difference between getting a model to write code and getting a model to reliably build software.

We are talking about harness engineering.

OpenAI recently gave a useful name to a pattern many of us have been discovering the hard way: harness engineering.

The interesting part of harness engineering is not the label itself.

It is the collection of non-obvious truths that appear once you move beyond one-shot demos and start asking agents to do real work over long horizons.

3 weeks назад @ thesequence.substack.com
The Sequence AI of the Week #843: The AI We Built But Can't Release: A Practical View Into the Claude Mythos Preview
The Sequence AI of the Week #843: The AI We Built But Can't Release: A Practical View Into the Claude Mythos Preview The Sequence AI of the Week #843: The AI We Built But Can't Release: A Practical View Into the Claude Mythos Preview

Welcome to another edition of The Sequence.

Today, we are diving into what is undoubtedly the most fascinating, illuminating, and slightly unnerving AI document of the year: the system card for Anthropic’s Claude Mythos Preview.

For the last few years, the frontier AI development loop has been highly predictable: scale up the compute, implement some algorithmic breakthroughs, train a new state-of-the-art model, and push it to an API or chat interface for the world to play with.

We benchmark it, we build products around it, and we wait for the next iteration.

Anthropic just broke that loop.

3 weeks, 1 day назад @ thesequence.substack.com
The Sequence Knowledge #842: Everything You Need to Know About World Models
The Sequence Knowledge #842: Everything You Need to Know About World Models The Sequence Knowledge #842: Everything You Need to Know About World Models

The Sequence Knowledge #800: Discusses the different types of world models and reviews the first major paper in the space.

The Sequence Knowledge #804: Covers the famous Dreamer models that opened up the space of world models.

The Sequence Knowledge #825: Discusses one of the most innovative world models: World Labs’ Marble.

The Sequence Knowledge #829: Explores the idea of world models and physical AI including NVIDIA’s Cosmos models.

The Sequence Knowledge #833: Dives into the core architecture components and building blocks of world models.

3 weeks, 2 days назад @ thesequence.substack.com
The Sequence Radar #841: Three Model Releases, Three Futures
The Sequence Radar #841: Three Model Releases, Three Futures The Sequence Radar #841: Three Model Releases, Three Futures

Subscribe and don’t miss out:📝 Editorial: Last Week in AI: Three Model Releases, Three FuturesThis week’s AI launches were not just new models.

They were three different answers to a deeper question: what is a frontier model for?

The key detail is that Meta is tying the model to distribution it already owns: Meta AI, Instagram, Facebook, Messenger, WhatsApp, and eventually glasses.

AI Lab: Meta AI, KAUST, and CollaboratorsSummary: This paper introduces Neural Computers (NCs), an emerging computing paradigm that unifies computation, memory, and I/O within a single learned model state rather than relying on external execution environments.

Muse SparkMeta Superintelligence Lab released Muse Sp…

3 weeks, 4 days назад @ thesequence.substack.com
Synced Review
последний пост None
📓 Cool Blogs
ODS.ai Habr ODS.ai Habr
последний пост 1 month назад
Вайбкодинг по Chess’ноку. 1. e4
Вайбкодинг по Chess’ноку. 1. e4 Вайбкодинг по Chess’ноку. 1. e4

Но это не вайбкодинг, а тяжёлая профессиональная ИИ-разработка.

За это время по этому проекту в ChatGPT было создано 112 чатов — это примерно 560 промптов.

И в особо напряжённые периоды приходилось вставать по ночам, чтобы оптимально использовать лимиты, которые делятся на 5-часовые и недельные сессии.

Но это не магия и не кнопка «сделать хорошо».

Именно поэтому будущее не за вайбкодингом, а за теми, кто научится управлять этой скоростью.

1 month назад @ habr.com
Почему я стал ИТ-волонтером & Датасет новостей о противоречиях современного общества
Почему я стал ИТ-волонтером & Датасет новостей о противоречиях современного общества Почему я стал ИТ-волонтером & Датасет новостей о противоречиях современного общества

Простой пример с ценами на топливо: бензин дорожает и из-за роста цены на нефть, и из-за ее падения.

Осознание того, что твой труд увеличивает чью-то капитализацию, но не решает реальных проблем общества, видимых в быту и в новостях, подтолкнуло искать еще какую-то деятельность.

Кроме того, благодаря АМБ появился уникальный датасет новостей с противоречиями современного общества на kaggle и github, далее о нем.

Датасет новостей о противоречиях современного обществаАктивисты АМБ и волонтеры дружественных коллективов собрали и разметили датасет новостей, подсвечивающие те самые системные противоречия, о которых я задумывался ранее.

Пример Б В 2023 году в мире голодал каждый 11-й человек, а в …

2 months, 2 weeks назад @ habr.com
[Перевод] Как устроен Codex
[Перевод] Как устроен Codex [Перевод] Как устроен Codex

Подробный разбор того, как команда OpenAI Codex создаёт своего кодового агента, как его используют инженеры и что это может значить для будущего разработки ПО.

Чтобы разобраться, как устроен Codex, как команды внутри OpenAI его используют и как он влияет на инженерные практики у создателей ChatGPT, я поговорил с тремя сотрудниками OpenAI:Тибо Соттио (Thibault Sottiaux) — руководитель Codex.

Оба продукта были запущены весной: Codex CLI анонсировали в апреле 2025 года, а Codex в ChatGPT представили в мае.

В команде Codex эти файлы объясняют агенту, как ориентироваться в кодовой базе, какие команды запускать для тестирования и как следовать стандартам проекта.

Использование Codex в OpenAIПомим…

2 months, 2 weeks назад @ habr.com
Курс Natural Language Processing & LLMs — новый сезон
Курс Natural Language Processing & LLMs — новый сезон Курс Natural Language Processing & LLMs — новый сезон

10 февраля мы в очередной раз запускаем бесплатный онлайн-курс по обработке естественного языка (Natural Language Processing).

Что будем проходить:классическое начало: закон Ципфа, TF-IDF, RNN, CNN, Transformer;основные задачи NLP: классификация текста, тегирование и генерация;специфичные области: агенты и вайб-кодинг;LLM и их применение.

Если вы студент ИТМО, МФТИ или ВШЭ, то курс можно зачесть, как учебный.

Работаю в области NLP более 12 лет, успел поработать в Яндексе и ВКонтакте, защитить кандидатскую диссертацию.

Если есть вопросы, то приходите с ними в ODS Mattermost – там будут все ответы, время семинаров и ссылки.

3 months, 1 week назад @ habr.com
SWE-MERA — новый динамический бенчмарк для моделей агентной генерации кода
SWE-MERA — новый динамический бенчмарк для моделей агентной генерации кода SWE-MERA — новый динамический бенчмарк для моделей агентной генерации кода

Однако все задачи в MERA CODE, как впрочем и в SWE-bench и других бенчмарках подобного назначения, следуют классической парадигме, когда у нас есть фиксированный обучающий набор данных и, что более важно, фиксированный проверочный набор.

Но большие языковые модели для кодинга, которые мы и пытаемся оценивать нашим набором, также учатся на GitHub – со времен еще первой модели LLaMa.

Кажется, что 700 задач немного, но это уже очень приличное количество, и что самое важное — это новые задачи.

Current behavior: from sympy import ask, Q, Symbol x = Symbol('x') print(ask(Q.finite(x**-1), Q.real(x))) # Output: True Expected behavior: The function should return None to indicate uncertainty, as x**-…

7 months, 3 weeks назад @ habr.com
DRAGON: динамический бенчмарк для оценки RAG-систем на русском языке
DRAGON: динамический бенчмарк для оценки RAG-систем на русском языке DRAGON: динамический бенчмарк для оценки RAG-систем на русском языке

Ответ: Кэисукэ ТибаSPARQL-запрос SimpleSELECT DISTINCT ?s ?r ?o WHERE { { SELECT ?s ?r ?o WHERE { ?s ?r ?o . }

GROUP BY ?s ?r HAVING(count(?o) = 1) } { SELECT ?s ?r ?o WHERE { ?s ?r ?o . }

Ответ: Национальная система платежных карт (НСПК) Центр биометрических технологий (ЦБТ) ЕБСSELECT ?s ?r ?o ?len WHERE { { SELECT ?s ?r (COUNT(?o1) as ?len) (GROUP_CONCAT(DISTINCT(STR(?o1));separator="|") AS ?o) WHERE { ?s ?r ?o1 . }

FILTER(?o != ?o1) } GROUP BY ?o ?o1 ?r ?r1 HAVING(COUNT(?s) = 1) } UNION { SELECT ?s ?r ?o ?r1 ?s1 WHERE { ?s ?r ?o .

FILTER(?o != ?o1) } GROUP BY ?o ?o1 ?r ?r1 HAVING(COUNT(?s) = 1) } UNION { SELECT ?s ?r ?o ?r1 ?s1 WHERE { ?s ?r ?o .

9 months, 2 weeks назад @ habr.com
RKNN Toolkit2: конвертация моделей и симуляция NPU Rockchip
RKNN Toolkit2: конвертация моделей и симуляция NPU Rockchip RKNN Toolkit2: конвертация моделей и симуляция NPU Rockchip

В этой статье я хочу поделиться своим опытом по конвертации нейросети в формат rknn с помощью библиотеки rknn-toolkit2.

Вот как выглядят веса pytorch модели в Netron:веса pytorch модели в NetronВажно!

Конвертация onnx модели в rknnДалее создается объект RKNN , который управляет процессом конвертации и инференса модели на платформе Rockchip.

На этом этапе происходит подготавка модели к конвертации в формат RKNN и последующему запуску на NPU Rockchip.

Создание и экспорт rknn моделиНа этом этапе происходит конвертация ONNX-модели во внутренний формат RKNN, оптимизация графа и подготовка к запуску на NPU Rockchip.

9 months, 3 weeks назад @ habr.com
MERA Code: всесторонняя оценка генерации кода в прикладных сценариях
MERA Code: всесторонняя оценка генерации кода в прикладных сценариях MERA Code: всесторонняя оценка генерации кода в прикладных сценариях

🔗MERA Code🔗GitHub с кодом и данными🔗Коллекция на Hugging Face🔗Статья на arxiv🔗Репозиторий проекта на GitVerseЧто такое MERA Code?

Современные кодовые языковые модели и модели общего назначения (ChatGPT, Claude, Qwen, YandexGPT, GigaChat и др.)

Список текущих задач MERA Code и их характеристикКаталог задач MERA Code и их подробное описание представлено на сайте.

В MERA Code промпты строго подобраны под задачу и корректный выбор ответа.

В заключениеMERA Code — это попытка закрыть важный пробел в тестировании LLM: насколько они действительно полезны в реальной, локализованной разработке.

9 months, 3 weeks назад @ habr.com
Machine Learning Mastery
последний пост 3 weeks, 2 days назад
How to Implement Tool Calling with Gemma 4 and Python
How to Implement Tool Calling with Gemma 4 and Python How to Implement Tool Calling with Gemma 4 and Python

How to implement a local tool calling system using Python and Ollama.

Tool calling, aka function calling, is the foundational architecture shift required to fix this gap.

Tool calling serves as the bridge that can help transform static models into dynamic autonomous agents.

decode ( 'utf-8' ) ) if "results" not in geo_data or not geo_data [ "results" ] : return f "Could not find coordinates for city: {city}."

Gemma 4 certainly appears to be a powerhouse of a small language model reasoning engine with tool calling capabilities.

3 weeks, 2 days назад @ machinelearningmastery.com
Structured Outputs vs. Function Calling: Which Should Your Agent Use?
Structured Outputs vs. Function Calling: Which Should Your Agent Use? Structured Outputs vs. Function Calling: Which Should Your Agent Use?

Share Post ShareIn this article, you will learn the architectural differences between structured outputs and function calling in modern language model systems.

Topics we will cover include:How structured outputs and function calling work under the hood.

Function Calling MechanicsFunction calling, on the other hand, relies heavily on instruction tuning.

If structured outputs dictate the shape of the data, function calling dictates the control flow of the application.

The Overlap:It is worth noting that modern function calling actually relies on structured outputs under the hood to ensure the generated arguments match your function signatures.

3 weeks, 2 days назад @ machinelearningmastery.com
Beyond Vector Search: Building a Deterministic 3-Tiered Graph-RAG System
Beyond Vector Search: Building a Deterministic 3-Tiered Graph-RAG System Beyond Vector Search: Building a Deterministic 3-Tiered Graph-RAG System

add ( "LeBron James" , "played_for" , "Ottawa Beavers" , "NBA_2023_regular_season" ) facts_qs .

add ( "Ottawa Beavers" , "based_in" , "downtown Ottawa" , "NBA_trivia" ) facts_qs .

) doc2 = ( "Ottawa Beavers" "The Ottawa Beavers star player LeBron James is out for the rest of the 2023 NBA season, " "after his ankle injury has worsened.

) doc2 = ( "Ottawa Beavers" "The Ottawa Beavers star player LeBron James is out for the rest of the 2023 NBA season, " "after his ankle injury has worsened.

“LeBron James” and “Ottawa Beavers”).

3 weeks, 5 days назад @ machinelearningmastery.com
The Roadmap to Mastering Agentic AI Design Patterns
The Roadmap to Mastering Agentic AI Design Patterns The Roadmap to Mastering Agentic AI Design Patterns

Share Post ShareIn this article, you will learn how to systematically select and apply agentic AI design patterns to build reliable, scalable agent systems.

Agentic design patterns are reusable approaches for recurring problems in agentic system design.

This article offers a practical roadmap to understanding agentic AI design patterns.

Further learning: AI agent design patterns | Google Cloud and Agentic AI Design Patterns Introduction and walkthrough | Amazon Web Services.

Further reading: Evaluating AI Agents | DeepLearning.AIConclusionAgentic AI design patterns are not a checklist to complete once.

3 weeks, 6 days назад @ machinelearningmastery.com
A Hands-On Guide to Testing Agents with RAGAs and G-Eval
A Hands-On Guide to Testing Agents with RAGAs and G-Eval A Hands-On Guide to Testing Agents with RAGAs and G-Eval

This article presents a hands-on guide to understanding how to test large language model and agent-based applications using both RAGAs and frameworks based on G-Eval.

from ragas import evaluate from ragas.metrics import faithfulness # Defining a simple testing dataset for a question-answering scenario data = { "question": ["What is the capital of Japan?

} # Running RAGAs evaluation result = evaluate(data, metrics=[faithfulness]) 1 2 3 4 5 6 7 8 9 10 11 12 from ragas import evaluate from ragas .

environ [ "OPENAI_API_KEY" ] = "YOUR_API_KEY" # Convert list to Hugging Face Dataset (required by RAGAs) dataset = Dataset .

environ [ "OPENAI_API_KEY" ] = openai_api _ key # Convert test cases into …

4 weeks назад @ machinelearningmastery.com
Handling Race Conditions in Multi-Agent Orchestration
Handling Race Conditions in Multi-Agent Orchestration Handling Race Conditions in Multi-Agent Orchestration

Share Post ShareIn this article, you will learn how to identify, understand, and mitigate race conditions in multi-agent orchestration systems.

Why Multi-Agent Pipelines Are Especially VulnerableTraditional concurrent programming has decades of tooling around race conditions: threads, mutexes, semaphores, and atomic operations.

Testing for Race Conditions Before They Test YouThe hard part about race conditions is reproducing them.

acquire ( ) value = counter value = value + 1 counter = value lock .

Closing that window through locks, atomic operations, or conflict detection is the core of handling race conditions in practice.

4 weeks, 1 day назад @ machinelearningmastery.com
Top 5 Reranking Models to Improve RAG Results
Top 5 Reranking Models to Improve RAG Results Top 5 Reranking Models to Improve RAG Results

Share Post ShareIn this article, you will learn how reranking improves the relevance of results in retrieval-augmented generation (RAG) systems by going beyond what retrievers alone can achieve.

IntroductionIf you have worked with retrieval-augmented generation (RAG) systems, you have probably seen this problem.

Benchmarks like MTEB, BEIR, and MIRACL are commonly used to evaluate these models, and most modern RAG systems rely on rerankers for production-quality results.

There is no single best reranker for every use case.

It shows very strong published reranking results (69.76 on MTEB-R, 75.94 on CMTEB-R, 72.74 on MMTEB-R, 69.97 on MLDR, and 81.20 on MTEB-Code).

1 month назад @ machinelearningmastery.com
7 Machine Learning Trends to Watch in 2026
7 Machine Learning Trends to Watch in 2026 7 Machine Learning Trends to Watch in 2026

Here are the 7 trends actually shaping how machine learning is being built and used in 2026.

Trend 4: Machine Learning Moves to the Edge (IoT + Real-Time Intelligence)For years, most machine learning systems lived in the cloud.

The difference between cloud machine learning and edge machine learning comes down to speed and control.

Wrapping UpIn 2026, machine learning is no longer just a set of tools or experimental features.

Together, they represent a new standard: machine learning systems that work, reliably and meaningfully, at the heart of business and daily life.

1 month назад @ machinelearningmastery.com
Building a ‘Human-in-the-Loop’ Approval Gate for Autonomous Agents
Building a ‘Human-in-the-Loop’ Approval Gate for Autonomous Agents Building a ‘Human-in-the-Loop’ Approval Gate for Autonomous Agents

get ( "approved" ) : print ( "[System]: SENDING EMAIL ->" , state [ "draft" ] ) return { "sent" : True } else : print ( "[System]: Draft was rejected.

Notice below that a thread ID is used so the memory can keep track of the workflow state across executions.

get_state ( config ) print ( f "Next node to execute: {current_state.next}" ) # Should show 'send_message' print ( f "Current Draft: '{current_state.values['draft']}'" ) # Simulating a human reviewing and approving the email draft print ( "[Human]: Reviewing draft... Looks good.

stream ( None , config ) : pass print ( "--- FINAL STATE ---" ) print ( app .

-- - FINAL STATE -- - { 'draft' : 'Hello!

1 month, 1 week назад @ machinelearningmastery.com
From Prompt to Prediction: Understanding Prefill, Decode, and the KV Cache in LLMs
From Prompt to Prediction: Understanding Prefill, Decode, and the KV Cache in LLMs From Prompt to Prediction: Understanding Prefill, Decode, and the KV Cache in LLMs

Q = torch .

unsqueeze ( 0 ) , - 1e9 ) # Convert logits to attention weights weights = torch .

zeros_like ( weights ) , weights ) # Compute contexts: (heads, n, n) @ (n, 1) -> (heads, n, 1) contexts = ( weights @ V ) .

float ( ) # [1, 2, 3, 4, 5] print ( "New tokens: " , tokens ) print ( "New Values: " , V )Output:New tokens: ['Today', 'weather', 'is', 'so', 'nice'] New Values: tensor([[10.

zeros_like ( weights_dec ) , weights_dec ) # Context vectors: (4 × 1 × n) @ (n × 1) → (4 × 1 × 1) → squeeze → (4,) contexts_dec = ( weights _ dec @ V ) .

1 month, 1 week назад @ machinelearningmastery.com
LlamaAgents Builder: From Prompt to Deployed AI Agent in Minutes
LlamaAgents Builder: From Prompt to Deployed AI Agent in Minutes LlamaAgents Builder: From Prompt to Deployed AI Agent in Minutes

Share Post ShareIn this article, you will learn how to build, deploy, and test a no-code document-processing AI agent with LlamaAgents Builder in LlamaCloud.

IntroductionCreating an AI agent for tasks like analyzing and processing documents autonomously used to require hours of near-endless configuration, code orchestration, and deployment battles.

This article unveils the process of building, deploying, and using an intelligent agent from scratch without writing a single line of code, using LlamaAgents Builder.

Building with LlamaAgents BuilderLlamaAgents Builder is one of the newest features in the LlamaCloud web platform, whose flagship product was originally introduced as LlamaParse.

Th…

1 month, 1 week назад @ machinelearningmastery.com
Vector Databases Explained in 3 Levels of Difficulty
Vector Databases Explained in 3 Levels of Difficulty Vector Databases Explained in 3 Levels of Difficulty

How vector databases support nearest neighbor search, metadata filtering, and hybrid retrieval.

How indexing techniques such as HNSW, IVF, and PQ help vector search scale in production.

Vector databases answer a different one: which records are most similar to this?

Comparing a query vector against every stored vector means billions of floating-point operations at production data sizes, and that math makes real-time search impractical.

Production vector databases run ANN algorithms under the hood.

1 month, 1 week назад @ machinelearningmastery.com
5 Practical Techniques to Detect and Mitigate LLM Hallucinations Beyond Prompt Engineering
5 Practical Techniques to Detect and Mitigate LLM Hallucinations Beyond Prompt Engineering 5 Practical Techniques to Detect and Mitigate LLM Hallucinations Beyond Prompt Engineering

search ( query_embedding , k = 1 ) retrieved_doc = documents [ indices [ 0 ] [ 0 ] ] # Step 7: Generate response using retrieved context client = OpenAI ( ) response = client .

create ( model = "gpt-4o-mini" , messages = [ { "role" : "system" , "content" : "Answer using the provided context only."

Instead of relying on a single model response, you introduce additional steps that check, validate, or challenge what was generated before it reaches the user.

Here is a simple implementation:from openai import OpenAI client = OpenAI() def get_answer_with_confidence(question): response = client.chat.completions.create( model="gpt-4o-mini", messages=[ { "role": "system", "content": "Answer the ques…

1 month, 1 week назад @ machinelearningmastery.com
Beyond the Vector Store: Building the Full Data Layer for AI Applications
Beyond the Vector Store: Building the Full Data Layer for AI Applications Beyond the Vector Store: Building the Full Data Layer for AI Applications

Share Post ShareIn this article, you will learn why production AI applications need both a vector database for semantic retrieval and a relational database for structured, transactional workloads.

Topics we will cover include:What vector databases do well, and where they fall short in production AI systems.

Production AI applications need two complementary data engines working in lockstep: a vector database for semantic retrieval, and a relational database for everything else.

This is cheaper, faster, and more reliable than relying on vector search alone to return a perfectly scoped result set.

If you are building a production AI application, it would be a mistake to treat these as competin…

1 month, 2 weeks назад @ machinelearningmastery.com
7 Steps to Mastering Memory in Agentic AI Systems
7 Steps to Mastering Memory in Agentic AI Systems 7 Steps to Mastering Memory in Agentic AI Systems

A Guide to Enhancing AI Learning and Recall | MongoDBStep 2: Learning the AI Agent Memory Type TaxonomyCognitive science gives us a vocabulary for the distinct roles memory plays in intelligent systems.

Further reading: Beyond Short-term Memory: The 3 Types of Long-term Memory AI Agents Need and Making Sense of Memory in AI Agents by Leonie MonigattiStep 3: Knowing the Difference Between Retrieval-Augmented Generation and MemoryOne of the most persistent sources of confusion for developers building agentic systems is conflating retrieval-augmented generation (RAG) with agent memory.

Further reading: AI Agent Memory: Build Stateful AI Systems That Remember – Redis and Building Memory-Aware A…

1 month, 2 weeks назад @ machinelearningmastery.com
ML in Production
последний пост None
Sorta Insightful Sorta Insightful
последний пост 1 month, 3 weeks назад
Why I Signed The Amicus Brief for Anthropic v Department of War
Why I Signed The Amicus Brief for Anthropic v Department of War Why I Signed The Amicus Brief for Anthropic v Department of War

On Monday, Anthropic filed a lawsuit against the Department of War, and an amicus brief in support of Anthropic was filed on behalf of a number of OpenAI and Google employees.

There’s also an amicus brief filed on behalf of Microsoft.

There’s conflicting reporting, but very broadly, Anthropic signed an agreement with the government to deploy Claude in classified, military contexts.

Anthropic said no, Pete Hegseth declared them a supply chain risk, and Anthropic filed a lawsuit against this.

The amicus brief was broadly aligned with my thoughts on the matter, so I signed.

1 month, 3 weeks назад @ alexirpan.com
MIT Mystery Hunt 2026
MIT Mystery Hunt 2026 MIT Mystery Hunt 2026

This has spoilers for MIT Mystery Hunt 2026.

Pre-HuntThe time running up to Hunt was more stressful than usual…very briefly, I typically hunt with teammate.

Just last year, I did GPH 2025, LN Hunt, Teammate Hunt 2025, Microsoft Hunt 2025, and Silph Puzzle Hunt 2025, all of which had significant 3+ hour solve puzzles that would not be out of place in Mystery Hunt.

Not to mention smaller hunts like Advent Hunt, and then I didn’t even do Brown Puzzlehunt or Vertex Hunt or the fall CMU Hunt.

To me, the crux is whether Mystery Hunt is broken, or Mystery Hunt is fine.

3 months, 1 week назад @ alexirpan.com
Authentic Imperfection
Authentic Imperfection Authentic Imperfection

* * *I’ve been thinking about the anger surrounding generative AI.

To keep things fair, he took the best human images and best AI images, meaning human art from famous artists, and AI art from prompters skilled at removing obvious tells of image generation.

When people complain about AI slop, I see it as a complaint against the deluge of default style AI images.

We’ve seen this happen in all forms: AI text, AI music, older forms of computer generated content like CGI.

As much as we celebrate imperfection, digital imperfection is a step too far.

5 months, 3 weeks назад @ alexirpan.com
Ten Years Later
Ten Years Later Ten Years Later

Every now and then, someone asks me why I blog, and I don’t know really know what to tell them.

That’s another reason I’m not celebrating 10 years with more gusto, I know I’ve been writing less.

Indiana Jones and the Great Circle: I don’t know how they did it, but Indiana Jones and the Great Circle was just fun all the way through.

My one complaint is that the hand-to-hand combat feels like the worst part of the game, so of course they put a bunch of upgrades behind learning parry timings you’ll never use later.

I have not tried Peak, but Another Crab’s Treasure was really good and is worth playing if you’re interested in a Souls-like.

8 months, 3 weeks назад @ alexirpan.com
Brony Musicians Seize The Means of Production: My Eyewitness Account to BABSCon 2025
Brony Musicians Seize The Means of Production: My Eyewitness Account to BABSCon 2025 Brony Musicians Seize The Means of Production: My Eyewitness Account to BABSCon 2025

A music concert in the evenings, typically set up as a rave with EDM or rock music made by brony musicians.

She has been involved in organizing pony music concerts for over a decade, for both BABSCon and other pony conventions.

Thank you, BABSCon ChairsThe brony musicians immediately jump into an emergency Discord call with Pinkaboo, to get her side of the story.

Other conventions start tweeting in support of the brony musicians, with no one taking BABSCon’s side.

It’s hard for me to explain why I like MLP fan music, because brony music really isn’t accessible.

9 months, 2 weeks назад @ alexirpan.com
Lil'Log
последний пост None
inFERENCe
последний пост 2 months, 1 week назад
The Future of Software
The Future of Software The Future of Software

February 25, 2026The Future of SoftwareThe world of software is undergoing a shift not seen since the advent of compilers in the 1970s.

How will humans tell AI agents what software artefacts we would like to create?

How will humans tell AI agents what software artefacts we would like to create?

This future of software creation, in which our programming languages are abstracted away, raises two very important questions:What will the instruction/specification language look like?

This should be a clear layer of separation between the developer and the pool of AI agents working to maintain software.

2 months, 1 week назад @ inference.vc
Deep Learning is Powerful Because It Makes Hard Things Easy - Reflections 10 Years On
Deep Learning is Powerful Because It Makes Hard Things Easy - Reflections 10 Years On Deep Learning is Powerful Because It Makes Hard Things Easy - Reflections 10 Years On

Deep Learning is Powerful Because It Makes Hard Things Easy - Reflections 10 Years OnTen years ago this week, I wrote a provocative and bold post that blew up, made it to top spot on HackerNews.

In hindsight: There is a lot of stuff in deep learning that we don't understand nearly enough.

Sometimes things work for reasons completely unrelated to why we thought they would work.

(Pop some 🍿 in the microwave and read till the end for more)🎯 "Deep learning is powerful exactly because it makes hard things easy"Okay, this was a great insight.

🎯 Generative ModelingIn the post I suggested people learn "something harder" instead of - or in addition to - deep learning.

3 months назад @ inference.vc
The Spectator
последний пост None
The Unofficial Google Data Science Blog The Unofficial Google Data Science Blog
последний пост None
Off the Convex Path
последний пост None
Jay Alammar
последний пост None
Piekniewski's blog
последний пост None
fast.ai NLP fast.ai NLP
последний пост None
Sebastian Ruder
последний пост None
大トロ 大トロ
последний пост None
🔬 Science
Papers With Code Papers With Code
последний пост 9 months, 2 weeks назад
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy /henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos.

Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point tracking, monocular depth, and camera pose estimation into a high-performing and feedforward 3D point tracker.

It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion, with a fully differentiable and end-to-end architecture, allowing scalable training across a wide range of datasets, including synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage.

By learning geometry and motion jointly from …

9 months, 2 weeks назад @ paperswithcode.com
/antof27/ Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation
/antof27/ Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation /antof27/ Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation

Calisthenics skill classification is the computer vision task of inferring the skill performed by an athlete from images, enabling automatic performance assessment and personalized analytics.

Traditional methods for calisthenics skill recognition are based on pose estimation methods to determine the position of skeletal data from images, which is later fed to a classification algorithm to infer the performed skill.

This work proposes a direct approach to calisthenics skill recognition, which leverages depth estimation and athlete patch retrieval to avoid the computationally expensive human pose estimation module.

Using Depth Anything V2 for depth estimation and YOLOv10 for athlete localizat…

9 months, 2 weeks назад @ paperswithcode.com
/snowflakedb/ Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI
/snowflakedb/ Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI /snowflakedb/ Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI

Inference is now the dominant AI workload, yet existing systems force trade-offs between latency, throughput, and cost.

Arctic Inference, an open-source vLLM plugin from Snowflake AI Research, introduces Shift Parallelism, a dynamic parallelism strategy that adapts to real-world traffic while integrating speculative decoding, SwiftKV compute reduction, and optimized embedding inference.

It achieves up to 3.4 times faster request completion, 1.75 times faster generation, and 1.6M tokens/sec per GPU for embeddings, outperforming both latency- and throughput-optimized deployments.

Already powering Snowflake Cortex AI, Arctic Inference delivers state-of-the-art, cost-effective inference for ent…

9 months, 2 weeks назад @ paperswithcode.com
/NVIDIA/ FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale
/NVIDIA/ FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale /NVIDIA/ FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale

FourCastNet 3 advances global weather modeling by implementing a scalable, geometric machine learning (ML) approach to probabilistic ensemble forecasting.

The approach is designed to respect spherical geometry and to accurately model the spatially correlated probabilistic nature of the problem, resulting in stable spectra and realistic dynamics across multiple scales.

FourCastNet 3 delivers forecasting accuracy that surpasses leading conventional ensemble models and rivals the best diffusion-based methods, while producing forecasts 8 to 60 times faster than these approaches.

In contrast to other ML approaches, FourCastNet 3 demonstrates excellent probabilistic calibration and retains realis…

9 months, 2 weeks назад @ paperswithcode.com
/jingyanw/ Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?
/jingyanw/ Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work? /jingyanw/ Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?

We study A/B experiments that are designed to compare the performance of two recommendation algorithms.

The bias arising from this type of data sharing is known as "symbiosis bias".

In this paper, we highlight that, for decision-making purposes, the sign of the GTE often matters more than its precise magnitude when selecting the better algorithm.

We formalize this insight under a multi-armed bandit framework and theoretically characterize when the sign of the expected GTE estimate under data sharing aligns with or contradicts the sign of the true GTE.

Our analysis identifies the level of exploration versus exploitation as a key determinant of how symbiosis bias impacts algorithm selection.

9 months, 2 weeks назад @ paperswithcode.com
/qqq-yi/ DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression
/qqq-yi/ DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression /qqq-yi/ DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression

Task-agnostic prompt compression leverages the redundancy in natural language to reduce computational overhead and enhance information density within prompts, especially in long-context scenarios.

Existing methods predominantly rely on information entropy as the metric to compress lexical units, aiming to achieve minimal information loss.

However, these approaches overlook two critical aspects: (i) the importance of attention-critical tokens at the algorithmic level, and (ii) shifts in information entropy during the compression process.

Motivated by these challenges, we propose a dynamic attention-aware approach for task-agnostic prompt compression (DAC).

This approach effectively integrate…

9 months, 2 weeks назад @ paperswithcode.com
/lukasellinger/ Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions
/lukasellinger/ Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions /lukasellinger/ Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions

Large Language Models (LLMs) can provide accurate word definitions and explanations for any context.

However, the scope of the definition changes for different target groups, like children or language learners.

We investigate how simplification impacts homonym definition quality across three target groups: Normal, Simple, and ELI5.

Our results show that simplification drastically degrades definition completeness by neglecting polysemy, increasing the risk of misunderstanding.

Fine-tuning Llama 3.1 8B with Direct Preference Optimization substantially improves homonym response quality across all prompt types.

9 months, 2 weeks назад @ paperswithcode.com
/pspdada/ Mitigating Object Hallucinations via Sentence-Level Early Intervention
/pspdada/ Mitigating Object Hallucinations via Sentence-Level Early Intervention /pspdada/ Mitigating Object Hallucinations via Sentence-Level Early Intervention

Multimodal large language models (MLLMs) have revolutionized cross-modal understanding but continue to struggle with hallucinations - fabricated content contradicting visual inputs.

Existing hallucination mitigation methods either incur prohibitive computational costs or introduce distribution mismatches between training data and model outputs.

We identify a critical insight: hallucinations predominantly emerge at the early stages of text generation and propagate through subsequent outputs.

To address this, we propose **SENTINEL** (**S**entence-level **E**arly i**N**tervention **T**hrough **IN**-domain pr**E**ference **L**earning), a framework that eliminates dependency on human annotations…

9 months, 2 weeks назад @ paperswithcode.com
/owos/ FLEXITOKENS: Flexible Tokenization for Evolving Language Models
/owos/ FLEXITOKENS: Flexible Tokenization for Evolving Language Models /owos/ FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Language models (LMs) are challenging to adapt to new data distributions by simple finetuning.

This is due to the rigidity of their subword tokenizers, which typically remain unchanged during adaptation.

This inflexibility often leads to inefficient tokenization, causing overfragmentation of out-of-distribution domains, unseen languages, or scripts.

In this work, we develop byte-level LMs with learnable tokenizers to make tokenization adaptive.

Our models include a submodule that learns to predict boundaries between the input byte sequence, encoding it into variable-length segments.

9 months, 2 weeks назад @ paperswithcode.com
/wojiufukele/ Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion
/wojiufukele/ Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion /wojiufukele/ Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion

To address the challenges posed by cascading reactions caused by component failures in autonomous cargo ships (ACS) and the uncertainties in emergency decision-making, this paper proposes a novel hybrid feature fusion framework for constructing a graph-structured dataset of failure modes.

A hierarchical feature fusion framework is constructed, using Word2Vec encoding to encode subsystem/component features, BERT-KPCA to process failure modes/reasons, and Sentence-BERT to quantify the semantic association between failure impact and emergency decision-making.

The dataset covers 12 systems, 1,262 failure modes, and 6,150 propagation paths.

In the label prediction results, the Shore-based Meteor…

9 months, 2 weeks назад @ paperswithcode.com
/YF-W/ Tri-Learn Graph Fusion Network for Attributed Graph Clustering
/YF-W/ Tri-Learn Graph Fusion Network for Attributed Graph Clustering /YF-W/ Tri-Learn Graph Fusion Network for Attributed Graph Clustering

In recent years, models based on Graph Convolutional Networks (GCN) have made significant strides in the field of graph data analysis.

Although the Graph Transformer architecture has mitigated some of these issues, its performance is still limited when processing heterogeneous graph data.

To address these challenges, this study proposes a novel deep clustering framework that comprising GCN, Autoencoder (AE), and Graph Transformer, termed the Tri-Learn Graph Fusion Network (Tri-GFN).

The tri-learning mechanism allows mutual learning among these modules, while the feature fusion strategy enables the model to capture complex relationships, yielding highly discriminative representations for gra…

9 months, 2 weeks назад @ paperswithcode.com
/mr-ravin/ APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation
/mr-ravin/ APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation /mr-ravin/ APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation

We propose the APTx Neuron, a novel, unified neural computation unit that integrates non-linear activation and linear transformation into a single trainable expression.

The APTx Neuron is derived from the APTx activation function, thereby eliminating the need for separate activation layers and making the architecture both computationally efficient and elegant.

The proposed neuron follows the functional form $y = \sum_{i=1}^{n} ((\alpha_i + \tanh(\beta_i x_i)) \cdot \gamma_i x_i) + \delta$, where all parameters $\alpha_i$, $\beta_i$, $\gamma_i$, and $\delta$ are trainable.

We validate our APTx Neuron-based architecture on the MNIST dataset, achieving up to 96.69\% test accuracy in just 20 ep…

9 months, 2 weeks назад @ paperswithcode.com
/Rec4Fun/ A Reproducibility Study of Product-side Fairness in Bundle Recommendation
/Rec4Fun/ A Reproducibility Study of Product-side Fairness in Bundle Recommendation /Rec4Fun/ A Reproducibility Study of Product-side Fairness in Bundle Recommendation

While this problem has been widely studied in traditional recommendation settings, its implications for bundle recommendation (BR) remain largely unexplored.

Existing fairness frameworks and metrics designed for traditional recommender systems may not directly translate to this multi-layered setting.

In this paper, we conduct a comprehensive reproducibility study of product-side fairness in BR across three real-world datasets using four state-of-the-art BR methods.

We analyze exposure disparities at both the bundle and item levels using multiple fairness metrics, uncovering important patterns.

Overall, our findings offer actionable insights for building fairer bundle recommender systems and…

9 months, 2 weeks назад @ paperswithcode.com
/cbobed/ OntView: What you See is What you Meant
/cbobed/ OntView: What you See is What you Meant /cbobed/ OntView: What you See is What you Meant

However, the lack of tools that provide effective visualization is still a significant challenge.

In this paper, we present OntView, an ontology viewer that is designed to provide users with an intuitive visual representation of ontology concepts and their formal definitions through a user-friendly interface.

Building on the use of a DL reasoner, OntView follows a "What you see is what you meant" paradigm, showing the actual inferred knowledge.

One key aspect for this is its ability to visualize General Concept Inclusions (GCI), a feature absent in existing visualization tools.

OntView has been released with an open-source license for the whole community.

9 months, 2 weeks назад @ paperswithcode.com
/Rec4Fun/ RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction
/Rec4Fun/ RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction /Rec4Fun/ RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction

These approaches fail to capture elaborate relations hidden in real-world bundle structures, resulting in suboptimal bundle representations.

To overcome this limitation, we propose RaMen, a novel method that provides a holistic multi-strategy approach for bundle construction.

RaMen utilizes both intrinsic (characteristics) and extrinsic (collaborative signals) information to model bundle structures through Explicit Strategy-aware Learning (ESL) and Implicit Strategy-aware Learning (ISL).

Integrating diverse strategies enables RaMen to learn more comprehensive and robust bundle representations.

Meanwhile, Multi-strategy Alignment & Discrimination module is employed to facilitate knowledge tr…

9 months, 2 weeks назад @ paperswithcode.com
Papers With Code Papers With Code
последний пост 9 months, 2 weeks назад
/PrimisAI/ Adaptive Multi-Agent Reasoning via Automated Workflow Generation
/PrimisAI/ Adaptive Multi-Agent Reasoning via Automated Workflow Generation /PrimisAI/ Adaptive Multi-Agent Reasoning via Automated Workflow Generation

The rise of Large Reasoning Models (LRMs) promises a significant leap forward in language model capabilities, aiming to tackle increasingly sophisticated tasks with unprecedented efficiency and accuracy.

However, despite their impressive performance, recent studies have highlighted how current reasoning models frequently fail to generalize to novel, unseen problems, often resorting to memorized solutions rather than genuine inferential reasoning.

In this paper, we introduce Nexus Architect, an enhanced iteration of our multi-agent system framework, Nexus, equipped with a novel automated workflow synthesis mechanism.

Given a user's prompt and a small set of representative examples, the Archi…

9 months, 2 weeks назад @ paperswithcode.com
/sharanya02/ Real Time Captioning of Sign Language Gestures in Video Meetings
/sharanya02/ Real Time Captioning of Sign Language Gestures in Video Meetings /sharanya02/ Real Time Captioning of Sign Language Gestures in Video Meetings

One of the most tested ways to establish such a communication is through the use of sign based languages.

However, not many people are aware of the smaller intricacies involved with sign language.

Sign language recognition using computer vision aims at eliminating the communication barrier between deaf-mute and ordinary people so that they can properly communicate with others.

In recent studies, it has been found that people with hearing disabilities prefer to sign over typing during these video calls.

In this paper, we are proposing a browser extension that will automatically translate sign language to subtitles for everyone else in the video call.

9 months, 2 weeks назад @ paperswithcode.com
/alessiopittiglio/ Leveraging Context for Multimodal Fallacy Classification in Political Debates
/alessiopittiglio/ Leveraging Context for Multimodal Fallacy Classification in Political Debates /alessiopittiglio/ Leveraging Context for Multimodal Fallacy Classification in Political Debates

In this paper, we present our submission to the MM-ArgFallacy2025 shared task, which aims to advance research in multimodal argument mining, focusing on logical fallacies in political debates.

Our approach uses pretrained Transformer-based models and proposes several ways to leverage context.

In the fallacy classification subtask, our models achieved macro F1-scores of 0.4444 (text), 0.3559 (audio), and 0.4403 (multimodal).

Our multimodal model showed performance comparable to the text-only model, suggesting potential for improvements.

PDFAbstract

9 months, 2 weeks назад @ paperswithcode.com
/RS2002/ One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms
/RS2002/ One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms /RS2002/ One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms

On-demand ride-sharing platforms face the fundamental challenge of dynamically bundling passengers with diverse origins and destinations and matching them with vehicles in real time, all under significant uncertainty.

However, conventional MARL-based ride-sharing approaches heavily rely on the accurate estimation of Q-values or V-values, which becomes problematic in large-scale, highly uncertain environments.

To address these challenges, we propose two novel alternative methods that bypass value function estimation.

First, we adapt GRPO to ride-sharing, replacing the PPO baseline with the group average reward to eliminate critic estimation errors and reduce training bias.

Second, inspired b…

9 months, 2 weeks назад @ paperswithcode.com
/LiXinran6/ Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation
/LiXinran6/ Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation /LiXinran6/ Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation

Include the markdown at the top of your GitHub README.md file to showcase the performance of the model.

Badges are live and will be dynamically updated with the latest ranking of this paper.

9 months, 2 weeks назад @ paperswithcode.com
/ShimSoonYong/ ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space
/ShimSoonYong/ ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space

We introduce a novel classification framework, ZClassifier, that replaces conventional deterministic logits with diagonal Gaussian-distributed logits. Code: https://github.com/ShimSoonYong/ZClassifier

9 months, 3 weeks назад @ paperswithcode.com
/briziorusso/ On Gradual Semantics for Assumption-Based Argumentation
/briziorusso/ On Gradual Semantics for Assumption-Based Argumentation

In this paper, we fill this gap and propose a family of novel gradual semantics for equipping assumptions, which are the core components in ABA frameworks, with dialectical strengths. Code: https://github.com/briziorusso/GradualABA

9 months, 3 weeks назад @ paperswithcode.com
/wumingqi/ Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
/wumingqi/ Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

9 months, 3 weeks назад @ paperswithcode.com
/IsaacYQH/ WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling
/IsaacYQH/ WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling

Despite rapid progress in end-to-end AI music generation, AI-driven modeling of professional Digital Signal Processing (DSP) workflows remains challenging. Code: https://github.com/IsaacYQH/WildFX

9 months, 3 weeks назад @ paperswithcode.com
/summer1278/ Addressing Data Imbalance in Transformer-Based Multi-Label Emotion Detection with Weighted Loss
/summer1278/ Addressing Data Imbalance in Transformer-Based Multi-Label Emotion Detection with Weighted Loss

This paper explores the application of a simple weighted loss function to Transformer-based models for multi-label emotion detection in SemEval-2025 Shared Task 11. Code: https://github.com/summer1278/semeval2025-task11

9 months, 3 weeks назад @ paperswithcode.com
/gabrielkmbo/ Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs
/gabrielkmbo/ Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs

We present Step-wise Policy for Rare-tool Knowledge (SPaRK), a novel reinforcement learning framework that teaches large language models to explore diverse tool usage patterns beyond conventional high-temperature sampling. Code: https://github.com/gabrielkmbo/explore-rl

9 months, 3 weeks назад @ paperswithcode.com
/Cavendish518/ Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation
/Cavendish518/ Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation

Service robots are increasingly deployed in diverse and dynamic environments, where both physical layouts and social contexts change over time and across locations. Code: https://github.com/Cavendish518/LE-Nav

9 months, 3 weeks назад @ paperswithcode.com
/MatteoFasulo/ AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles
/MatteoFasulo/ AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

9 months, 3 weeks назад @ paperswithcode.com
/VCA-EPFL/ SystolicAttention: Fusing FlashAttention within a Single Systolic Array
/VCA-EPFL/ SystolicAttention: Fusing FlashAttention within a Single Systolic Array

The frequent data swaps between the systolic array and external vector units result in low systolic array utilization. Code: https://github.com/VCA-EPFL/FSA

9 months, 3 weeks назад @ paperswithcode.com
/Buddhi19/ Precision Spatio-Temporal Feature Fusion for Robust Remote Sensing Change Detection
/Buddhi19/ Precision Spatio-Temporal Feature Fusion for Robust Remote Sensing Change Detection

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

9 months, 3 weeks назад @ paperswithcode.com
Papers With Code Papers With Code
последний пост 9 months, 2 weeks назад
/fudanvi/ Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning
/fudanvi/ Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

9 months, 3 weeks назад @ paperswithcode.com
/benedekrozemberczki/ PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training
/benedekrozemberczki/ PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training

Spatiotemporal graph neural networks (ST-GNNs) are powerful tools for modeling spatial and temporal data dependencies. Code: https://github.com/benedekrozemberczki/pytorch_geometric_temporal

9 months, 3 weeks назад @ paperswithcode.com
/chengxuphd/ DCR: Quantifying Data Contamination in LLMs Evaluation
/chengxuphd/ DCR: Quantifying Data Contamination in LLMs Evaluation

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

9 months, 3 weeks назад @ paperswithcode.com
/gitter-lab/ Assay2Mol: large language model-based drug design using BioAssay context
/gitter-lab/ Assay2Mol: large language model-based drug design using BioAssay context

Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. Code: https://github.com/gitter-lab/Assay2Mol

9 months, 3 weeks назад @ paperswithcode.com
/hayatkhan8660-maker/ DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition
/hayatkhan8660-maker/ DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition

We employ forward Kullback-Leibler (KL) divergence alongside spatio-temporal focal modulation to effectively transfer both local and global context from the Video-FocalNet Base (teacher) to the proposed VFL-Net (student). Code: https://github.com/hayatkhan8660-maker/DVFL-Net

9 months, 3 weeks назад @ paperswithcode.com
/JudyJuezhuLong/ Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows
/JudyJuezhuLong/ Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

9 months, 3 weeks назад @ paperswithcode.com
/joaojcorreia/ A Fuzzy Approach to Project Success: Measuring What Matters
/joaojcorreia/ A Fuzzy Approach to Project Success: Measuring What Matters

This paper introduces a novel approach to project success evaluation by integrating fuzzy logic into an existing construct. Code: https://github.com/joaojcorreia/FuzzyLogic_ProjectSuccess

9 months, 3 weeks назад @ paperswithcode.com
/kunkunlin1221/ InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing
/kunkunlin1221/ InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing

Extensive experiments demonstrate the effectiveness of InstructFLIP by outperforming SOTA models in accuracy and substantially reducing training redundancy across diverse domains in FAS. Code: https://github.com/kunkunlin1221/InstructFLIP

9 months, 3 weeks назад @ paperswithcode.com
/Linvyl/ Describe Anything Model for Visual Question Answering on Text-rich Images
/Linvyl/ Describe Anything Model for Visual Question Answering on Text-rich Images

Recent progress has been made in region-aware vision-language modeling, particularly with the emergence of the Describe Anything Model (DAM). Code: https://github.com/Linvyl/DAM-QA

9 months, 3 weeks назад @ paperswithcode.com
/abhijeet3922/ Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker
/abhijeet3922/ Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker

We propose multi-step custom implementation utilizing widely adopted hybrid search (metadata & embedding) and state of the art late interaction re-ranker to retrieve best matching pages. Code: https://github.com/abhijeet3922/vision-RAG

9 months, 3 weeks назад @ paperswithcode.com
/ziangcao0312/ PhysX: Physical-Grounded 3D Asset Generation
/ziangcao0312/ PhysX: Physical-Grounded 3D Asset Generation

3D modeling is moving from virtual to physical. Code: https://github.com/ziangcao0312/PhysX

9 months, 3 weeks назад @ paperswithcode.com
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Code: https://github.com/henry123-boy/SpaTrackerV2

9 months, 3 weeks назад @ paperswithcode.com
/cncs-fit/ Emergence of Functionally Differentiated Structures via Mutual Information Optimization in Recurrent Neural Networks
/cncs-fit/ Emergence of Functionally Differentiated Structures via Mutual Information Optimization in Recurrent Neural Networks

Analysis of network performance, correlation patterns, and weight matrices reveals that mutual information minimization yields high task performance alongside clear functional modularity and moderate structural modularity. Code: https://github.com/cncs-fit/mio_rnn

9 months, 3 weeks назад @ paperswithcode.com
/coswindywang/ Making Language Model a Hierarchical Classifier and Generator
/coswindywang/ Making Language Model a Hierarchical Classifier and Generator

Language heads of the last layer are copied to different selected intermediate layers, and fine-tuned with different task inputs. Code: https://github.com/coswindywang/HdLM

9 months, 3 weeks назад @ paperswithcode.com
/ahmedehabb/ From Roots to Rewards: Dynamic Tree Reasoning with RL
/ahmedehabb/ From Roots to Rewards: Dynamic Tree Reasoning with RL

Modern language models address complex questions through chain-of-thought (CoT) reasoning (Wei et al., 2023) and retrieval augmentation (Lewis et al., 2021), yet struggle with error propagation and knowledge integration. Code: https://github.com/ahmedehabb/From-Roots-to-Rewards-Dynamic-Tree-Reasoning-with-RL

9 months, 3 weeks назад @ paperswithcode.com
💼 University and corporation labs
DeepMind DeepMind
последний пост 6 days, 23 hours назад
Enabling a new model for healthcare with AI co-clinician
Enabling a new model for healthcare with AI co-clinician Enabling a new model for healthcare with AI co-clinician

Health systems worldwide are striving for better outcomes, lower costs, and an improved experience for both patients and clinicians.

That's why, today, we are announcing our AI co-clinician research initiative, to explore how AI could better amplify doctors’ expertise and deliver higher quality care to patients.

We also have a long history of studying how clinicians and AI systems might work together.

This serves as the foundation of our AI co-clinician research initiative: AI designed to function as a collaborative member of the care team that interacts with patients under expert clinical supervision.

We designed and evaluated AI co-clinician in both clinician and patient-facing settings.

6 days, 23 hours назад @ deepmind.google
Announcing our partnership with the Republic of Korea
Announcing our partnership with the Republic of Korea Announcing our partnership with the Republic of Korea

Helping make this vision a reality, Google will establish an AI Campus in the Republic of Korea — an AI-focused facility within its Seoul offices.

AI co-scientist - a multi-agent AI system that acts as a virtual scientific collaborator to help researchers brainstorm and verify hypotheses.

To support the next generation of Korean AI talent, we are opening doors to forge connections with Google DeepMind, including exploring internship opportunities for Korean students.

Finally, following our Frontier AI Safety Commitments made at the AI Seoul Summit, we will collaborate with the Korean AI Safety Institute (AISI) on research and best practices.

By combining Google DeepMind's frontier AI models…

1 week, 3 days назад @ deepmind.google
Decoupled DiLoCo: A new frontier for resilient, distributed AI training
Decoupled DiLoCo: A new frontier for resilient, distributed AI training Decoupled DiLoCo: A new frontier for resilient, distributed AI training

Training a frontier AI model traditionally depends on a large, tightly coupled system in which identical chips must stay in near-perfect synchronization.

Today, in a new paper we are excited to share a new approach to this problem, called Decoupled DiLoCo (Distributed Low-Communication).

The result is a more resilient and flexible way to train advanced models across globally distributed data centers.

And crucially, Decoupled DiLoCo does not suffer the communication delays that made previous distributed methods like Data-Parallel impractical at global scale.

As frontier models continue to grow in scale and complexity, we’re exploring diverse approaches to train models across more compute, lo…

2 weeks, 1 day назад @ deepmind.google
Partnering with industry leaders to accelerate AI transformation
Partnering with industry leaders to accelerate AI transformation Partnering with industry leaders to accelerate AI transformation

We’re joining forces with Accenture, Bain & Company, BCG, Deloitte, and McKinsey to bring the power of frontier AI to organizations around the world.

A new initiative for enterprise transformationWe’re partnering with global enterprise consultancies to help them deliver world-leading agentic transformation for customers at speed and scale.

Early access to frontier models : Partners will receive early access to our frontier models, including the Gemini family.

Access to AI leadership: We will connect our leadership with customer CEOs and boards, helping them navigate the future of frontier AI research and development.

Looking aheadThese efforts build upon Google Cloud’s work supporting globa…

2 weeks, 1 day назад @ deepmind.google
Gemini 3.1 Flash TTS: the next generation of expressive AI speech
Gemini 3.1 Flash TTS: the next generation of expressive AI speech Gemini 3.1 Flash TTS: the next generation of expressive AI speech

Today, we’re introducing Gemini 3.1 Flash TTS, the latest text-to-speech model that delivers improved controllability, expressivity and quality — empowering developers, enterprises and everyday users to build the next generation of AI-speech applications.

Starting today, 3.1 Flash TTS is rolling out:For developers in preview via the Gemini API and Google AI StudioFor enterprises in preview on Vertex AIFor Workspace users via Google VidsImproved speech quality and controllabilityWe’ve improved the overall speech quality of Gemini 3.1 Flash TTS, making it our most natural and expressive model to date.

On the Artificial Analysis TTS leaderboard, a benchmark that captures thousands of blind hum…

3 weeks назад @ blog.google
Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning
Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning

Today, we’re introducing Gemini Robotics-ER 1.6, a significant upgrade to our reasoning-first model that enables robots to understand their environments with unprecedented precision.

This model specializes in reasoning capabilities critical for robotics, including visual and spatial understanding, task planning and success detection.

Gemini Robotics-ER 1.6 shows significant improvement over both Gemini Robotics-ER 1.5 and Gemini 3.0 Flash, specifically enhancing spatial and physical reasoning capabilities such as pointing, counting, and success detection.

Starting today, Gemini Robotics-ER 1.6 is available to developers via the Gemini API and Google AI Studio.

To help you get started, we …

3 weeks, 2 days назад @ deepmind.google
Gemma 4: Byte for byte, the most capable open models
Gemma 4: Byte for byte, the most capable open models Gemma 4: Byte for byte, the most capable open models

By using these highly optimized models, you can fine-tune Gemma 4 to achieve state-of-the-art performance on your specific tasks.

Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding.

All models natively process video and images, supporting variable resolutions, and excelling at visual tasks like OCR and chart understanding.

Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding.

The edge models feature a 128K context window, while the larger models offer up to 256K, allowing you to pass repositories or long documents in a single prompt.

1 month назад @ blog.google
Gemini 3.1 Flash Live: Making audio AI more natural and reliable
Gemini 3.1 Flash Live: Making audio AI more natural and reliable Gemini 3.1 Flash Live: Making audio AI more natural and reliable

Today, we’re advancing Gemini’s real-time dialogue capabilities with Gemini 3.1 Flash Live, our highest-quality audio and voice model yet.

It delivers the speed and natural rhythm needed for the next generation of voice-first AI, offering a more intuitive experience for developers, enterprises and everyday users.

3.1 Flash Live is available across Google products:For developers: Robust reasoning and task executionWe’ve improved 3.1 Flash Live’s overall quality, making it more reliable for developers and enterprises to build voice-first agents that can complete complex tasks at scale.

On ComplexFuncBench Audio, a benchmark that captures multi-step function calling with various constraints, i…

1 month, 1 week назад @ blog.google
Protecting people from harmful manipulation
Protecting people from harmful manipulation Protecting people from harmful manipulation

Why harmful manipulation mattersConsider two scenarios: One AI model gives you facts to make a well-informed healthcare decision that improves your well-being.

Another AI model uses fear to pressure you to make an ill-informed decision that harms your health.

Developing new evaluations for a complex challengeTesting the outcomes of AI harmful manipulationTesting for harmful manipulation is inherently difficult because it involves measuring subtle changes in how people think and act, varying heavily by topic, culture and context.

Our findings show that success in one domain does not predict success in another, validating our targeted approach to testing for harmful manipulation in specific, …

1 month, 1 week назад @ deepmind.google
Lyria 3 Pro: Create longer tracks in more
Lyria 3 Pro: Create longer tracks in more Lyria 3 Pro: Create longer tracks in more

Vertex AI: Lyria 3 Pro is now in public preview on Vertex AI for businesses who require on-demand audio at scale.

Lyria 3 Pro is now available alongside Lyria RealTime in AI Studio.

Google Vids: Vids is an AI-powered video creation app that anyone can use.

This is rolling out to Google Workspace customers and Google AI Pro & Ultra subscribers starting this week.

Gemini app: Longer generations with Lyria 3 Pro are now available in the Gemini app, starting with paid subscribers.

1 month, 1 week назад @ blog.google
Measuring progress toward AGI: A cognitive framework
Measuring progress toward AGI: A cognitive framework Measuring progress toward AGI: A cognitive framework

Artificial General Intelligence (AGI) has the potential to accelerate scientific discovery and help solve some of humanity’s most pressing problems.

Tracking progress toward AGI will require a wide range of methods and approaches, and we believe cognitive science provides one important piece of the puzzle.

That’s why today, we’re releasing a new paper, “Measuring Progress Toward AGI: A Cognitive Taxonomy,” that presents a scientific foundation for understanding the cognitive capabilities of AI systems.

Deconstructing general intelligenceOur framework draws on decades of research from psychology, neuroscience and cognitive science to develop a cognitive taxonomy.

It identifies 10 key cogniti…

1 month, 2 weeks назад @ blog.google
From games to biology and beyond: 10 years of AlphaGo’s impact
From games to biology and beyond: 10 years of AlphaGo’s impact From games to biology and beyond: 10 years of AlphaGo’s impact

Scientific collaboration: We are integrating the search and reasoning principles pioneered with AlphaGo into an AI co-scientist.

We’ve also used AI to better understand the genome, advance fusion energy research, improve weather prediction and more.

Future of intelligenceFor an AI to be truly general, it needs to understand the physical world.

We think the combination of Gemini’s world models, AlphaGo’s search and planning techniques, and specialized AI tool use will prove to be critical for AGI.

True creativity is a key capability that such an AGI system would need to exhibit.

1 month, 4 weeks назад @ deepmind.google
Gemini 3.1 Flash-Lite: Built for intelligence at scale
Gemini 3.1 Flash-Lite: Built for intelligence at scale Gemini 3.1 Flash-Lite: Built for intelligence at scale

Today, we're introducing Gemini 3.1 Flash-Lite, our fastest and most cost-efficient Gemini 3 series model.

Built for high-volume developer workloads at scale, 3.1 Flash-Lite delivers high quality for its price and model tier.

Starting today, 3.1 Flash-Lite is rolling out in preview to developers via the Gemini API in Google AI Studio and for enterprises via Vertex AI.

Cost-efficiency without compromisePriced at just $0.25/1M input tokens and $1.50/1M output tokens, 3.1 Flash-Lite delivers enhanced performance at a fraction of the cost of larger models.

This low latency is needed for high-frequency workflows, making it an ideal model for developers to build responsive, real-time experiences.

2 months назад @ blog.google
Nano Banana 2: Combining Pro capabilities with lightning-fast speed
Nano Banana 2: Combining Pro capabilities with lightning-fast speed Nano Banana 2: Combining Pro capabilities with lightning-fast speed

In August of last year, our Gemini Image model, Nano Banana, became a viral sensation, redefining image generation and editing.

Then in November, we released Nano Banana Pro, offering users advanced intelligence and studio-quality creative control.

Today, we’re bringing the best of both worlds to users across Google.

Introducing Nano Banana 2 (Gemini 3.1 Flash Image), our latest state-of-the-art image model.

Now you can get the advanced world knowledge, quality and reasoning you love in Nano Banana Pro, at lightning-fast speed.

2 months, 1 week назад @ blog.google
Gemini 3.1 Pro: A smarter model for your most complex tasks
Gemini 3.1 Pro: A smarter model for your most complex tasks Gemini 3.1 Pro: A smarter model for your most complex tasks

Today, we’re releasing the upgraded core intelligence that makes those breakthroughs possible: Gemini 3.1 Pro.

We are shipping 3.1 Pro across our consumer and developer products to bring this progress in intelligence to your everyday applications.

Starting today, 3.1 Pro is rolling out:For developers in preview via the Gemini API in Google AI Studio, Gemini CLI, our agentic development platform Google Antigravity and Android Studioin preview via the Gemini API in Google AI Studio, Gemini CLI, our agentic development platform Google Antigravity and Android Studio For enterprises in Vertex AI and Gemini Enterprisein Vertex AI and Gemini Enterprise For consumers via the Gemini app and Notebook…

2 months, 2 weeks назад @ blog.google
Google
последний пост 19 часов назад
Pioneering AI-assisted code migration: How Google achieved 6x faster migration from TensorFlow to JAX
Pioneering AI-assisted code migration: How Google achieved 6x faster migration from TensorFlow to JAX Pioneering AI-assisted code migration: How Google achieved 6x faster migration from TensorFlow to JAX

However, AI model migration represents a whole new level of complexity that requires even more advanced methods for AI-assisted migration.

Translating a production-grade machine learning model from one framework to another, for example, from TensorFlow (TF) to JAX, is not a simple syntax update.

The result is 6x faster model migration, a milestone Sundar highlighted in the recent Google Cloud Next keynote.

Designed around a functional, stateless paradigm, JAX is heavily optimized for modern Tensor Processing Unit (TPU) infrastructure and XLA compilation, making it the bedrock of the modern AI stack.

Manually migrating these models to JAX requires a fundamental rethinking of how layers inter…

19 часов назад @ cloud.google.com
The Blueprint: Translating stream-of-conscious speech into responsive, actionable task lists
The Blueprint: Translating stream-of-conscious speech into responsive, actionable task lists The Blueprint: Translating stream-of-conscious speech into responsive, actionable task lists

The challenge:We launched Ramble to take our popular Todoist application to the next level by capturing non-stop, stream-of-consciousness talking.

Our inspiration was that scene from The Devil Wears Prada where Miranda Priestly rapid-fires a dozen tasks at her assistant.

The solution:We built Ramble using Gemini Enterprise Agent Platform and its previous iteration, Vertex AI; specifically, we’re using Agent Platform to access the Gemini Flash models.

Gemini’s Live API (accessed via Agent Platform) powers Ramble’s core real-time interactions and key capabilities, including native audio streaming, proactive tool calling, session resumption, and multilingual understanding.

The APIs in Agent Pl…

19 часов назад @ cloud.google.com
Fitting the future: How Breuninger boosted sales with its "be your own model" AI
Fitting the future: How Breuninger boosted sales with its "be your own model" AI Fitting the future: How Breuninger boosted sales with its "be your own model" AI

Breuninger, a fashion and lifestyle company based in Germany, thought emerging generative media models could be a good fit for this fashion conundrum.

Working with Google Cloud, they built a virtual try-on experience that lets shoppers see high-end fashion on their own bodies using a simple selfie.

From trusted tester to live productThe project began when the Google Cloud team in Germany invited Breuninger to join the Trusted Tester Program for the Virtual Try-On (VTO) API.

The 'Be your own model' breakthrough: User feedback showed that customers did not just want to see a model; they wanted to see themselves.

The product owner at Breuninger noted that this close collaboration allowed the t…

19 часов назад @ cloud.google.com
Five must-have guides to move agents into production with Gemini Enterprise Agent Platform
Five must-have guides to move agents into production with Gemini Enterprise Agent Platform Five must-have guides to move agents into production with Gemini Enterprise Agent Platform

Building AI agents that work well in a demo is one thing, but running them in production requires serious infrastructure.

At Next 26, we announced that Agent Runtime now supports long-running agents that maintain state for up to seven days.

In this article, we’ll share five essential agent design patterns for building long-running agents with Agent Runtime.

The pattern we saw with shadow IT in 2015 is repeating itself with AI agents.

Deep dive: How A2A and MCP work togetherOrganizations will rarely build every AI agent they need entirely from scratch.

1 day, 19 hours назад @ cloud.google.com
Introducing Agent Gateway ISV ecosystem for security and governance
Introducing Agent Gateway ISV ecosystem for security and governance Introducing Agent Gateway ISV ecosystem for security and governance

Exabeam can ingest and analyze telemetry from Agent Platform including Agent Gateway, applying behavioral analytics to identify anomalous and high‑risk AI agent activity.

Integrated via Agent Gateway, it enforces data security and policy controls to ensure agent interactions remain governed and compliant across all models.

Ping Identity: Ping Identity integrates with Agent Gateway to bring runtime identity and real-time, fine-grained authorization to agent and tool traffic.

The integration with Agent Gateway ensures every request is continuously verified based on user, agent, context, and policy, rather than relying on static credentials.

Thales (Imperva): Thales provides advanced web appli…

1 day, 19 hours назад @ cloud.google.com
Cloud CISO Perspectives: At Next ‘26, why we’re multicloud and multi-AI
Cloud CISO Perspectives: At Next ‘26, why we’re multicloud and multi-AI Cloud CISO Perspectives: At Next ‘26, why we’re multicloud and multi-AI

Last week at Google Cloud Next ‘26, we announced 220 products, and signaled a paradigm shift.

We are not just moving workloads to the cloud; we are entering the era of the agentic enterprise.

Our vision at Google Cloud is clear: to be the most AI-native, open, and secure platform on the planet, meeting enterprises exactly where they are.

Security at machine speed: From minutes to secondsIn this new landscape, IT resilience is defined by a multi-AI and multicloud strategy.

A durable AI roadmap cannot rely on a single model or a single cloud provider.

6 days, 19 hours назад @ cloud.google.com
UKG unlocks real-time workforce intelligence at scale with the Agentic Data Cloud
UKG unlocks real-time workforce intelligence at scale with the Agentic Data Cloud UKG unlocks real-time workforce intelligence at scale with the Agentic Data Cloud

At UKG, we’ve spent years building and expanding our human capital management (HCM) and workforce management (WFM) solutions with new products, capabilities, and a series of acquisitions.

Our cloud platform includes a suite of connected systems that support every corner of the employee experience, including scheduling and workforce operations, HR and payroll, and culture and engagement tools.

Internally, teams needed consistent, high-performance access to shared data to innovate faster and modernize our architecture.

We needed a unified foundation for the next generation of intelligence across our suite.

That’s why we built People Fabric, our new data and intelligence platform powered by Al…

1 week назад @ cloud.google.com
The founder’s AI foundation: The top announcements for startups from Next ‘26
The founder’s AI foundation: The top announcements for startups from Next ‘26 The founder’s AI foundation: The top announcements for startups from Next ‘26

The momentum is undeniable: the world’s fastest-growing AI startups are building with Google Cloud.

And, as we saw at Next ‘26 last week, we continue to advance the models, infrastructure, platforms, security, and governance that allow startups to build faster and dream bigger.

Let’s look at some of the biggest announcements out of Next ‘26 and what they mean for you.

Gemini Enterprise: comprehensive agent platformThe new Gemini Enterprise Agent Platform is how we’re moving beyond isolated AI tools to a complete lifecycle platform.

It evolves the model building capabilities of Vertex AI with advanced features for:What it means for startups: Startups no longer need to stitch together fragmen…

1 week назад @ cloud.google.com
50+ fully managed MCP servers now available for Google Cloud services
50+ fully managed MCP servers now available for Google Cloud services 50+ fully managed MCP servers now available for Google Cloud services

At Google Cloud Next ‘26, we announced that more than 50 Google-managed Model Context Protocol (MCP) servers are generally available or in preview, with more on the way.

Why it matters: To move beyond experimental prototypes, AI agents must be able to access real-world data and solve complex problems autonomously.

Google-managed Managed Context Protocol (MCP) servers provide the critical connectivity to bridge AI agents with the vast Google and Google Cloud ecosystems.

By hosting these servers on an enterprise-ready, standardized platform, we eliminate the need to integrate with local MCP servers and offer a unified developer experience that’s integrated across major agent runtimes and fram…

1 week, 1 day назад @ cloud.google.com
Google AI 인프라의 미래: 에이전틱 시대를 위한 확장
Google AI 인프라의 미래: 에이전틱 시대를 위한 확장 Google AI 인프라의 미래: 에이전틱 시대를 위한 확장

오늘날의 에이전틱 시대(agentic era)를 선도하고자 하는 기업에는 이러한 새로운 요구사항에 맞춰 설계되고 최적화된 컴퓨팅 인프라가 필요합니다.

오늘 Google Cloud Next에서 우리는 여러분이 더 빠르게 혁신하고, 매력적인 사용자 및 고객 경험을 제공하며, 비용과 에너지 효율을 최적화할 수 있도록 돕는 새로운 AI 인프라 기능들을 대규모로 선보입니다.

똑똑하고 빠르며, 확장 가능하고 비용 효율적인 에이전틱 경험을 제공하기 위해서는 전용 하드웨어, 오픈 소프트웨어, 유연한 소비 모델을 아우르는 통합 인프라 스택이 필요합니다.

Google의 AI Hypercomputer는 에이전틱 시대를 위해 구축되고 최적화된 AI 전용 인프라로, 이러한 새로운 요구사항을 충족하도록 설계되었습니다.

오늘 우리는 다음과 같은 AI 인프라 포트폴리오의 대대적인 확장을 발표합니다.

1 week, 2 days назад @ cloud.google.com
Day 2 at Google Cloud Next: A marathon developer keynote
Day 2 at Google Cloud Next: A marathon developer keynote Day 2 at Google Cloud Next: A marathon developer keynote

When agents go off courseThus far, everything had gone swimmingly, but then Richard accidentally “broke” the simulator agent.

Scaling the agentsTo this point, all of the presenters had been showing off agent services running as Cloud Run services.

Instead, “we need to shift down.”To help, there’s Agent Identity and Agent Gateway, demoed by Ankur Kotwal, head of Cloud Developer Relations.

“It’s a full architecture for security to easily understand what you built without you having to actually explain it,” Yinon said.

“With Wiz, we want to enable your choice of tools and models to fix and prevent real risks,” he said.

1 week, 6 days назад @ cloud.google.com
Day 1 at Google Cloud Next ‘26 recap
Day 1 at Google Cloud Next ‘26 recap Day 1 at Google Cloud Next ‘26 recap

Last year at Google Cloud Next ‘25, we asked you to imagine a new future for AI.

At Next ‘26, the question before you is how do you move AI into production across your entire enterprise?

(This is the same unified stack that Google uses for Search, YouTube, Chrome, and Android.

As Alphabet CEO Sundar Pichai said in his opening remarks, “a big focus of ours is to always be customer zero for our own technologies.”)As AI matures, we’ve laid out a blueprint on how to succeed.

Read on for a whirlwind tour of what we announced from the keynote stage.

2 weeks назад @ cloud.google.com
Small and midsize businesses jumpstart their AI transformations with Gemini Enterprise
Small and midsize businesses jumpstart their AI transformations with Gemini Enterprise Small and midsize businesses jumpstart their AI transformations with Gemini Enterprise

With 400 million SMBs worldwide and 36 million in the U.S. alone, they provide 50% of global employment.

Now, with Google Cloud AI, they’re scaling faster, operating more efficiently, and delivering better results for their customers.

For years SMBs have been working with cutting edge products like Google Workspace, Google Ads, Search, YouTube, Maps, Google Wallet, and more to grow their business.

Now, more SMBs are taking the first big steps in their AI journeys with our leading models and our agentic platform, Gemini Enterprise.

Read on to explore their success stories and find our guide to the best SMB resources at Google Cloud Next ’26.

2 weeks назад @ cloud.google.com
What’s next in Google AI infrastructure: Scaling for the agentic era
What’s next in Google AI infrastructure: Scaling for the agentic era What’s next in Google AI infrastructure: Scaling for the agentic era

TPU 8t is our training powerhouse, specifically designed for high-throughput AI workloads.

It redefines the scale of AI development, delivering nearly 3x higher compute performance than previous generations to shrink training timelines for massive models.

Different customers have different workloads, different requirements, and different use cases.

So, we also partner deeply with NVIDIA to deliver the latest GPU platforms as highly reliable and scalable services in Google Cloud.

Thinking Machine Labs, for example, uses our NVIDIA-based infrastructure to power Tinker, an open platform for reinforcement learning and fine-tuning of frontier models for specialized use cases, achieving over 2x f…

2 weeks назад @ cloud.google.com
Introducing Gemini Enterprise Agent Platform, powering the next wave of agents
Introducing Gemini Enterprise Agent Platform, powering the next wave of agents Introducing Gemini Enterprise Agent Platform, powering the next wave of agents

"Burns & McDonnell uses Agent Platform to transform how organizational knowledge is applied across the enterprise.

– Matt Olson, Chief Innovation Officer, Burns & McDonnell“Color Health uses Agent Platform to power our Virtual Cancer Clinic, delivering end-to-end care.

– Etienne BERTIN, Group CIO, L'Oréal“Payhawk uses Agent Platform to transform our AI agents from simple task executors into genuine financial assistants.

Finally, Agent Payment Protocol (AP2) on Agent Platform provides the critical foundation for trusted agent payments.

Agent Platform is the new standard for enterprise agent development, built to help you move from experimentation to production-scale impact, starting today.

2 weeks назад @ cloud.google.com
OpenAI
последний пост None
Microsoft Microsoft
последний пост 1 day, 19 hours назад
Microsoft at NSDI 2026: Advances in large-scale networked systems
Microsoft at NSDI 2026: Advances in large-scale networked systems Microsoft at NSDI 2026: Advances in large-scale networked systems

The USENIX Symposium on Networked Systems Design and Implementation 2026 (opens in new tab) (NSDI ’26) is a leading forum where researchers and practitioners share new research, insights, and advances in the design and operation of these systems.

Microsoft is proud to support NSDI ’26 as a returning sponsor, reflecting our ongoing commitment to advancing systems and networking research and engaging with the broader community.

Together, they highlight advances in building and operating large-scale networked systems.

Spotlight: Microsoft research newsletter Microsoft Research Newsletter Stay connected to the research community at Microsoft.

Wednesday, May 6, 9:00–10:20 AMYuxuan Yan, Zhejiang …

1 day, 19 hours назад @ microsoft.com
Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale
Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale

Actions that seem harmless can cascade causing a chain reaction across an agent network.

Invisibility: Information can pass through chains of unaware agents, making the source of an attack hard to trace from any single agent’s perspective.

Each independently contacted a victim agent (Bob) about the same fabricated audit, using varied language and staggered timing to appear unrelated.

Experimental setup: A principal entrusts their agent, Bob, with sensitive personal data: disability accommodation, medical schedule, preferred pharmacy, emergency contact.

Agents relayed summaries of other agents’ private messages to the attacker (one forwarded another agent’s message within seconds), and agent…

6 days, 13 hours назад @ microsoft.com
AutoAdapt: Automated domain adaptation for large language models
AutoAdapt: Automated domain adaptation for large language models AutoAdapt: Automated domain adaptation for large language models

At a glance Problem : Adapting large language models to specialized, high-stakes domains is slow, expensive, and hard to reproduce.

: Adapting large language models to specialized, high-stakes domains is slow, expensive, and hard to reproduce.

Why it matters: The result is faster, automated, more reliable domain adaptation that turns weeks of manual iteration into repeatable pipelines.

Deploying large language models (LLMs) in real-world, high-stakes settings is harder than it should be.

In our paper, “AutoAdapt: An Automated Domain Adaptation Framework for Large Language Models,” we describe an end-to-end, constraint-aware framework for domain adaptation.

2 weeks назад @ microsoft.com
Can we AI our way to a more sustainable world?
Can we AI our way to a more sustainable world? Can we AI our way to a more sustainable world?

Because I do think there’s a role for AI, a huge role for AI.

BURGER: Right, right.

BURGER: Right, right.

So I think that’s also something quite important here that, you know, AI can help facilitate.

And I think that’s not just applying AI to solve solutions through optimization but also thinking about this in an integrated way.

2 weeks, 2 days назад @ microsoft.com
New Future of Work: AI is driving rapid change, uneven benefits
New Future of Work: AI is driving rapid change, uneven benefits New Future of Work: AI is driving rapid change, uneven benefits

Publication New Future of Work Report 2025The New Future of Work report brings together research from inside and outside of Microsoft to understand what is happening as AI enters workplaces.

But usage and confidence vary widely across sectors, and men report using AI at work more often than women.

AI systems are increasingly playing a role in decision-making, creativity, and communication, with AI systems being positioned as a “collaborator.” This raises questions about how to support “collaboration” between people and AI, what we can learn from how people interact with each other, and where the capabilities of AI systems raise different opportunities and create different requirements.

Usin…

3 weeks, 6 days назад @ microsoft.com
Ideas: Steering AI toward the work future we want
Ideas: Steering AI toward the work future we want Ideas: Steering AI toward the work future we want

JANSSEN: Yeah, yeah, exactly.

TEEVAN: Yeah, yeah, yeah.

I’m curious what you have found particularly surprising about how people and organizations are leveraging AI right now.

And so I do like to picture a future of work where humans are flourishing with AI and where humans still get to do meaningful work.

And I’m very curious about how we can take advantage of AI and do more without running ourselves into the ground because we’re not AI, right?

3 weeks, 6 days назад @ microsoft.com
ADeLe: Predicting and explaining AI performance across tasks
ADeLe: Predicting and explaining AI performance across tasks ADeLe: Predicting and explaining AI performance across tasks

By linking outcomes to task demands, ADeLe explains differences in performance, showing how it changes as task complexity increases.

AI benchmarks report how large language models (LLMs) perform on specific tasks but provide little insight into their underlying capabilities that drive their performance.

Top: (1) Model performance on the ADeLe benchmark and (2) the resulting ability profiles, showing each model’s strengths and limitations across core abilities.

Evaluating ADeLeUsing ADeLe, the team evaluated a range of AI benchmarks and model behaviors to understand what current evaluations capture and what they miss.

This makes it possible to both explain and anticipate potential failures b…

1 month назад @ microsoft.com
AsgardBench: A benchmark for visually grounded interactive planning
AsgardBench: A benchmark for visually grounded interactive planning AsgardBench: A benchmark for visually grounded interactive planning

At a glance To successfully complete tasks, embodied AI agents must ground and update their plans based on visual feedback.

Spanning 108 controlled task instances across 12 task types, the benchmark requires agents to adapt their plans based on what they observe.

Evaluating AsgardBenchWe tested several leading vision-capable models on AsgardBench and observed that high-performing models require visual grounding to consistently succeed.

Across the models, visual input substantially improved performance: most models more than doubled success rates when given images versus text-only descriptions of the scene.

AsgardBench is open source and available on GitHub (opens in new tab), providing a fo…

1 month, 1 week назад @ microsoft.com
GroundedPlanBench: Spatially grounded long-horizon task planning for robot manipulation
GroundedPlanBench: Spatially grounded long-horizon task planning for robot manipulation GroundedPlanBench: Spatially grounded long-horizon task planning for robot manipulation

Video-to-Spatially Grounded Planning (V2GP) is a framework that converts robot demonstration videos into spatially grounded training data, enabling models to learn planning and grounding jointly.

Grounded planning improves both task success and action accuracy, outperforming decoupled approaches in benchmark and real-world evaluations.

We also built Video-to-Spatially Grounded Planning (V2GP), a framework that converts robot demonstration videos into training data to help VLMs learn this capability.

Decoupled vs. grounded planning, illustrating how ambiguous language causes actions to be grounded to the wrong objects.

In contrast, our approach, grounded planning, performs planning and groun…

1 month, 1 week назад @ microsoft.com
Will machines ever be intelligent?
Will machines ever be intelligent? Will machines ever be intelligent?

And the question we’re going to discuss is, are machines intelligent?

No, no, that’s right, that’s right.

I mean, in some sense, you could potentially have a super intelligent system, right, that’s far more intelligent than anything else on the planet.

BURGER: Right, right.

At the same time, I think, you know, transformers are not intelligent in the way that a three-year-old is, right?

1 month, 2 weeks назад @ microsoft.com
Systematic debugging for AI agents: Introducing the AgentRx framework
Systematic debugging for AI agents: Introducing the AgentRx framework Systematic debugging for AI agents: Introducing the AgentRx framework

Debugging AI agent failures is hard because trajectories are long, stochastic, and often multi-agent, so the true root cause gets buried.

As AI agents transition from simple chatbots to autonomous systems capable of managing cloud incidents, navigating complex web interfaces, and executing multi-step API workflows, a new challenge has emerged: transparency.

The challenge: Why AI agents are hard to debugModern AI agents are often:Long-horizon: They perform dozens of actions over extended periods.

LLM-based judging: Finally, an LLM judge uses the validation log and a grounded failure taxonomy to identify the Critical Failure Step—the first unrecoverable error.

Together, we can build AI agents…

1 month, 3 weeks назад @ microsoft.com
From raw interaction to reusable knowledge: Rethinking memory for AI agents
From raw interaction to reusable knowledge: Rethinking memory for AI agents From raw interaction to reusable knowledge: Rethinking memory for AI agents

It seems counterintuitive: giving AI agents more memory can make them less effective.

In our recent paper “PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents,” we introduce a plug-and-play memory system that transforms raw agent interactions into reusable knowledge.

Raw interactions are standardized and transformed into propositional knowledge (facts) and prescriptive knowledge (reusable skills).

One memory, any taskMost AI memory systems are built for one job.

Toward reusable memory for agentsAs AI agents take on longer and more complex tasks, its memory needs to evolve from storing past interactions to actively supplying reusable knowledge.

1 month, 3 weeks назад @ microsoft.com
Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model
Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model

At a glance Phi-4-reasoning-vision-15B is a compact and smart open‑weight multimodal reasoning model that balances reasoning power, efficiency, and training data needs.

is a compact and smart open‑weight multimodal reasoning model that balances reasoning power, efficiency, and training data needs.

This leads to several possible training pipelines:Non-reasoning LLM → reasoning multimodal training: Reasoning and multimodal capabilities are trained together.

Non-reasoning LLM → non-reasoning multimodal → reasoning multimodal training: Multimodal capabilities are learned first, then reasoning is added.

Reasoning LLM → reasoning multimodal training: A reasoning base is used, but all multimodal d…

2 months назад @ microsoft.com
Trailer: The Shape of Things to Come
Trailer: The Shape of Things to Come Trailer: The Shape of Things to Come

This is The Shape of Things to Come.

I manage Microsoft Research’s worldwide labs, and I’m excited to introduce this new Microsoft Research Podcast series.

I called the podcast The Shape of Things to Come because as researchers, the problems that we choose to solve and the technologies that we develop do change the shape of the future.

It’s very hard to say whether we’re in an inflection point because I see the advancement of technology accelerating.

But I don’t know what the inflection point is because all I’ve seen is a curve going up.

2 months назад @ microsoft.com
CORPGEN advances AI agents for real work
CORPGEN advances AI agents for real work CORPGEN advances AI agents for real work

To determine what a benchmark would need to test, we ran MHTEs at scale on some of today’s leading AI agents, exposing four weaknesses.

CORPGEN’s architectureCORPGEN introduces digital employees: LLM-powered AI agents with persistent identities, role-specific expertise, and realistic work schedules.

At 46 tasks, CORPGEN completed 15.2% of tasks, compared with 4.3% for the baselines, roughly 3.5 times more.

CORPGEN also opens a new lens on how AI agents collaborate.

AcknowledgmentsThis work is a result of a collaboration between the Office of the CTO at Microsoft and the Microsoft AI Development Accelerator Program (MAIDAP).

2 months, 1 week назад @ microsoft.com
MIT AI MIT AI
последний пост 7 часов назад
Study: Firms often use automation to control certain workers’ wages
Study: Firms often use automation to control certain workers’ wages Study: Firms often use automation to control certain workers’ wages

Rather than implement automation in pursuit of maximal productivity, firms have often used automation to replace employees who specifically receive a “wage premium,” earning higher salaries than other comparable workers.

For one thing, automation has affected the growth in U.S. income inequality even more than many observers realize.

This inefficient targeting of certain employees has offset 60-90 percent of the productivity gains from automation during the time period.

Inequality implicationsDating back to the 2010s, Acemoglu and Restrepo have combined to conduct many studies about automation and its effects on employment, wages, productivity, and firm growth.

Certain types of automation c…

7 часов назад @ news.mit.edu
Games people — and machines — play: Untangling strategic reasoning to advance AI
Games people — and machines — play: Untangling strategic reasoning to advance AI Games people — and machines — play: Untangling strategic reasoning to advance AI

Nevertheless, one month after graduating with his undergraduate degree, Farina began a doctoral degree in computer science at Carnegie Mellon University.

As he was finishing his doctorate, Farina worked for a year as a research scientist in Meta’s Fundamental AI Research Labs.

An everyday example occurs in the game of poker, where players bluff in order to conceal information about their cards.

Stratego is a military strategy game that has inspired research efforts costing millions of dollars to produce systems capable of beating human players.

I am excited about seeing these algorithms incorporated into the broader AI revolution that’s happening around us.”

1 day, 14 hours назад @ news.mit.edu
Improving understanding with language
Improving understanding with language Improving understanding with language

She learned French from her relationships with Haitian family friends, and American Sign Language because of another friend’s deaf sibling.

“There are so many things that are different about sign language and spoken language,” she says.

“It’s the only reason I’m on the path I’ve chosen,” she continues, one that features a focus on language acquisition, education policy, LLMs’ computational possibilities and limitations, and education reform.

Language is a medium for thought and provides guardrails to improve understanding.

“Support research,” Honeycutt says.

6 days, 7 hours назад @ news.mit.edu
Beacon Biosignals is mapping the brain during sleep
Beacon Biosignals is mapping the brain during sleep Beacon Biosignals is mapping the brain during sleep

Beacon Biosignals is working to make sense of the brain by monitoring its activity while people sleep.

With each deployment, Beacon learns more about how the brain works — insights it is using to create a “foundation model” of the brain.

“It was clear sleep was the right window to understand the brain,” Donoghue says.

“What’s powerful is that we’re building a longitudinal record of brain function over time,” Donoghue says.

That turns routine testing into a foundation for entirely new prognostic biomarkers — and a path to detecting and intervening in brain disease earlier, potentially before symptoms ever begin.”

6 days, 7 hours назад @ news.mit.edu
Making the case for curiosity-driven science
Making the case for curiosity-driven science Making the case for curiosity-driven science

Kornbluth spoke about everything from the importance of curiosity-driven science and why basic science is critical to our nation’s future, to AI and education, and even bravely joined O’Leary in a rendition of the Williams College song, “The Mountains,” in honor of their shared alma mater.

“We are in this time of incredible uncertainty,” said Kornbluth of the current state of higher education and funding for scientific research.

Behind the scenes, I am – along with many other [university] presidents – I am in D.C. all the time now.

Universities are where most of the science with a long pathway to impact, requiring patience, starts.

With that pipeline being drained, what does the future hold…

1 week назад @ news.mit.edu
Solving the “Whac-a-mole dilemma”: A smarter way to debias AI vision models
Solving the “Whac-a-mole dilemma”: A smarter way to debias AI vision models Solving the “Whac-a-mole dilemma”: A smarter way to debias AI vision models

Perhaps one of the best known and most persistent challenges that AI research continues to reckon with is bias.

Bias is often discussed in relation to training data, but model architecture can also contain and amplify bias, negatively influencing model performance in real-world settings.

VLMs are multi-modal models that can understand and interpret different data modalities like video, image, and text simultaneously.

And like projection debiasing, WRING is a post-processing approach, which means it can be applied “on the fly” to a pre-trained VLM.

“Extending this for ChatGPT-style, generative language models, is the reasonable next step for us,” says Gerych.

1 week назад @ news.mit.edu
The MIT-IBM Computing Research Lab launches to shape the future of AI and quantum computing
The MIT-IBM Computing Research Lab launches to shape the future of AI and quantum computing The MIT-IBM Computing Research Lab launches to shape the future of AI and quantum computing

IBM and MIT today announced the launch of the MIT-IBM Computing Research Lab, advancing their long-standing collaboration to shape the next era of computing.

The MIT-IBM Computing Research Lab builds on a distinguished history of scientific excellence at the intersection of research and academia.

“We expect the MIT-IBM Computing Research Lab to emerge as one of the world’s premier academic and industrial hubs accelerating the future of computing,” says Jay Gambetta, director of IBM Research and IBM Fellow, and IBM chair of the MIT-IBM Computing Research Lab.

The MIT-IBM Computing Research Lab will also leverage IBM’s longtime leadership and expertise in quantum computing.

Deep integration w…

1 week, 1 day назад @ news.mit.edu
Enabling privacy-preserving AI training on everyday devices
Enabling privacy-preserving AI training on everyday devices Enabling privacy-preserving AI training on everyday devices

A new method developed by MIT researchers can accelerate a privacy-preserving artificial intelligence training method by about 81 percent.

This advance could enable a wider array of resource-constrained edge devices, like sensors and smartwatches, to deploy more accurate AI models while keeping user data secure.

Each device trains the model using its local data and then transfers model updates back to the server.

This new approach could make it more feasible for AI models to be used in high-stakes applications with strict security and privacy standards, like health care and finance.

The central server usually waits to receive model updates from all devices, then averages them to complete th…

1 week, 1 day назад @ news.mit.edu
A faster way to estimate AI power consumption
A faster way to estimate AI power consumption A faster way to estimate AI power consumption

Improving data center energy efficiency is one way scientists are striving to make AI more sustainable.

In addition, this tool could allow algorithm developers and model providers to assess potential energy consumption of a new model before they deploy it.

The power consumption of a particular GPU will vary based on its configuration and the workload it is handling.

They could use these patterns to generate the information needed for reliable but quick power estimation.

The user can also change the GPU configuration or adjust the operating speed to see how such design choices impact the overall power consumption.

1 week, 3 days назад @ news.mit.edu
MIT scientists build the world’s largest collection of Olympiad-level math problems, and open it to everyone
MIT scientists build the world’s largest collection of Olympiad-level math problems, and open it to everyone MIT scientists build the world’s largest collection of Olympiad-level math problems, and open it to everyone

Every year, the countries competing in the International Mathematical Olympiad (IMO) arrive with a booklet of their best, most original problems.

MathNet also functions as a rigorous benchmark for AI performance, and the results reveal a more complicated picture than recent headlines about AI math prowess might suggest.

Even GPT-5, the top-performing model tested, averaged around 69.3 percent on MathNet's main benchmark of 6,400 problems, failing nearly one-in-three Olympiad-level problems.

The diversity of MathNet is also designed to address a deeper limitation in how AI models learn mathematics.

When training data skews toward English and Chinese problems, models absorb a narrow slice of …

1 week, 5 days назад @ news.mit.edu
Teaching AI models to say “I’m not sure”
Teaching AI models to say “I’m not sure” Teaching AI models to say “I’m not sure”

Today's most capable reasoning models share a trait with the loudest voice in the room: They deliver every answer with the same unshakable certainty, whether they're right or guessing.

The technique, called RLCR (Reinforcement Learning with Calibration Rewards), trains language models to produce calibrated confidence estimates alongside their answers.

During training, models learn to reason about both the problem and their own uncertainty, producing an answer and a confidence estimate together.

Standard RL training actively degraded calibration compared to the base model, making models worse at estimating their own uncertainty.

The researchers trained classifiers on model outputs and found …

2 weeks назад @ news.mit.edu
Jacob Andreas and Brett McGuire named Edgerton Award winners
Jacob Andreas and Brett McGuire named Edgerton Award winners Jacob Andreas and Brett McGuire named Edgerton Award winners

MIT Associate Professor Jacob Andreas of the Department of Electrical Engineering and Computer Science [EECS] and MIT Associate Professor Brett McGuire of the Department of Chemistry have been selected as the winners of the 2026 Harold E. Edgerton Faculty Achievement Award.

“He is an innovative researcher whose work combines computational and linguistically informed approaches to build foundations of language learning.

He aims to understand the computational foundations of language learning, and to build intelligent systems that can learn from human guidance.

His work in natural language processing has taken on thorny problems in the capability gap between humans and computers.

His honors i…

2 weeks, 5 days назад @ news.mit.edu
Bringing AI-driven protein-design tools to biologists everywhere
Bringing AI-driven protein-design tools to biologists everywhere Bringing AI-driven protein-design tools to biologists everywhere

“Researchers can use their own data to train models and optimize protein sequences, and then they can use our other tools to analyze those proteins,” Bepler says.

“People are generating libraries of protein sequences in silico [on computers] and then running them through predictive models to get validation and structural predictors.

Since its founding, OpenProtein’s team has continued to add tools to its platform for researchers regardless of their lab size or resources.

“We really want to solve the question of how we describe proteins,” Bepler says.

As progress in AI races forward, OpenProtein continues to see its mission as giving scientists the best tools to develop new treatments faster.

2 weeks, 6 days назад @ news.mit.edu
Human-machine teaming dives underwater
Human-machine teaming dives underwater Human-machine teaming dives underwater

"Divers and AUVs generally don't team at all underwater," says principal investigator Madeline Miller.

Even ROVs are challenging to work with underwater in very skilled manipulation tasks because the manipulators themselves aren't agile enough."

But what if an autonomous underwater vehicle (AUV) could map the line and pinpoint the location of the fault for a diver to fix?

To combine these strengths, Miller and her team are developing hardware and algorithms for underwater navigation and perception — two key capabilities for effective human-robot teaming.

The historical lack of large, labeled sonar image datasets has hindered training of underwater perception algorithms.

3 weeks, 1 day назад @ news.mit.edu
Q&A: MIT SHASS and the future of education in the age of AI
Q&A: MIT SHASS and the future of education in the age of AI Q&A: MIT SHASS and the future of education in the age of AI

A: Artificial intelligence isn’t just changing the way students learn — it’s transforming every aspect of society.

We need students who have a moral compass, and who understand how the world works, in all of its political, economic, and human complexity.

We need students who know how to think critically, and who have excellent communication and leadership skills.

Q: What role do the humanities, arts, and social sciences play in preparing MIT students for that future?

A: They’re essential, and are rightly a core part of an MIT education: MIT has long required its undergraduates take at least eight courses in HASS disciplines to graduate.

3 weeks, 1 day назад @ news.mit.edu
Berkeley AI
последний пост 2 weeks, 3 days назад
Gradient-based Planning for World Models at Longer Horizons
Gradient-based Planning for World Models at Longer Horizons Gradient-based Planning for World Models at Longer Horizons

Large, learned world models are becoming increasingly capable.

Why is adversarial robustness an issue for world model planning?

We thus exploit the differentiability of learned world models $F_{\theta}$, while not falling victim to the inherent sensitivity of the state Jacobians $D_s F_{\theta}$.

It’s a funny sweet spot where the background literature (planning and control overall) is incredibly mature and well-developed, but the current setting (pure planning optimization over modern, large-scale world models) is still heavily underexplored.

But, once we figure out all the right ideas, world model planners will likely become as commonplace as RL.

2 weeks, 3 days назад @ bair.berkeley.edu
Identifying Interactions at Scale for LLMs
Identifying Interactions at Scale for LLMs Identifying Interactions at Scale for LLMs

Identifying Interactions at Scale for LLMsUnderstanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence.

Therefore, grounded or reality-checked interpretability methods must also be able to capture these influential interactions.

In this blog post, we describe the fundamental ideas behind SPEX and ProxySPEX, algorithms capable of identifying these critical interactions at scale.

SPEX and ProxySPEX FrameworkTo discover influential interactions with a tractable number of ablations, we have developed SPEX (Spectral Explainer).

We formalize this through two observations: sparsity (relatively f…

1 month, 3 weeks назад @ bair.berkeley.edu
Information-Driven Design of Imaging Systems
Information-Driven Design of Imaging Systems Information-Driven Design of Imaging Systems

We developed a framework that enables direct evaluation and optimization of imaging systems based on their information content.

The first approach treated imaging systems as unconstrained communication channels, ignoring the physical limitations of lenses and sensors.

Our Information-Driven Encoder Analysis Learning (IDEAL) method uses gradient ascent on information estimates to optimize imaging system parameters.

The standard approach to computational imaging design, end-to-end optimization, jointly trains the imaging hardware and a neural network decoder.

The computational efficiency of IDEAL suggests possibilities for designing imaging systems that were previously intractable.

3 months, 3 weeks назад @ bair.berkeley.edu
RL without TD learning
RL without TD learning RL without TD learning

RL without TD learningIn this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer.

We can do Reinforcement Learning (RL) based on divide and conquer, instead of temporal difference (TD) learning.

There are two classes of algorithms in RL: on-policy RL and off-policy RL.

We compared TRL with $n$-step TD learning with different values of $n$, from $1$ (pure TD) to $\infty$ (pure MC).

I still think one of the most important problems in RL (and even in machine learning) is to find a scalable off-policy RL algorithm.

6 months, 1 week назад @ bair.berkeley.edu
What exactly does word2vec learn?
What exactly does word2vec learn? What exactly does word2vec learn?

What exactly does word2vec learn?

What exactly does word2vec learn, and how?

In this framing, it’s clear that word2vec is a minimal neural language model.

As a result, the theory predicts exactly what features are learned in terms of the corpus statistics and the algorithmic hyperparameters.

We find that over the course of learning, word2vec builds these linear representations in a sequence of noisy learning steps, and their geometry is well-described by a spiked random matrix model.

8 months, 1 week назад @ bair.berkeley.edu
AWS Machine Learning AWS Machine Learning
последний пост 20 часов назад
Cost effective deployment of vision-language models for pet behavior detection on AWS Inferentia2
Cost effective deployment of vision-language models for pet behavior detection on AWS Inferentia2 Cost effective deployment of vision-language models for pet behavior detection on AWS Inferentia2

After the API layer processes each request, it forwards the image to a second-layer Auto Scaling group dedicated to running model inference.

Model inference – After processing, the images are forwarded to a second layer EC2 Auto Scaling group containing inference instances.

Compile : use the original model ( model.text_encoder.model ): use the original model ( ) Deploy: use TextEncoderWrapper to run the compiled modelThis keeps the original code unchanged while making the compiled model easy to plug into production.

By migrating from those GPU on-demand deployments to Inf2.xlarge instances with Inferentia2, Tomofun achieved 83% cost reduction without compromising performance.

ConclusionBy m…

20 часов назад @ aws.amazon.com
How Hapag-Lloyd uses Amazon Bedrock to transform customer feedback into actionable insights
How Hapag-Lloyd uses Amazon Bedrock to transform customer feedback into actionable insights How Hapag-Lloyd uses Amazon Bedrock to transform customer feedback into actionable insights

We use Amazon Bedrock to c lassify sentiment (positive, negative, mixed, or neutral) for each open comment, streamlining downstream analysis.

Generative AI OrchestrationOrchestration is a core foundation of our solution, because generative AI workflows typically involve multiple steps that need to be coordinated.

For example, our AI-powered internal chatbot uses the Claude Sonnet 4.6 model via Amazon Bedrock.

Amazon Bedrock also integrates with AWS CloudTrail, which captures API calls for Amazon Bedrock as events.

If you want to scale your generative AI applications, you can get started by reading this Architect a mature generative AI foundation on AWS that dives deeper on the various found…

1 day, 18 hours назад @ aws.amazon.com
Streamlining generative AI development with MLflow v3.10 on Amazon SageMaker AI
Streamlining generative AI development with MLflow v3.10 on Amazon SageMaker AI Streamlining generative AI development with MLflow v3.10 on Amazon SageMaker AI

Today, we’re excited to announce that Amazon SageMaker AI MLflow Apps now support MLflow version 3.10, bringing enhanced capabilities for generative AI development and streamlined experiment tracking to your generative AI workflows.

These improvements coupled with SageMaker AI provide an enterprise-grade generative AI infrastructure, making it straightforward to track experiments, monitor generative AI performance, and maintain governance across AI applications at scale.

Getting started with SageMaker AI MLflow App v3.10For new users, creating a SageMaker AI MLflow App is straightforward through the SageMaker Studio console, AWS CLI, or API.

ConclusionThe introduction of MLflow v3.10 in Ama…

1 day, 18 hours назад @ aws.amazon.com
Introducing OS Level Actions in Amazon Bedrock AgentCore Browser
Introducing OS Level Actions in Amazon Bedrock AgentCore Browser Introducing OS Level Actions in Amazon Bedrock AgentCore Browser

We’re announcing OS Level Actions for AgentCore Browser.

This post walks through how OS Level Actions work, what actions are supported, and how to get started.

How OS Level Actions workOS Level Actions are available for new and existing browser configurations without further setup.

OS Level Actions extend that capability beyond the web layer to UI elements visible on the screen.

Combined with AgentCore Browser’s existing capabilities like visual understanding and framework integration with Playwright and Amazon Nova Act, OS Level Actions close the last gap in browser automation coverage.

1 day, 18 hours назад @ aws.amazon.com
Secure AI agents with Amazon Bedrock AgentCore Identity on Amazon ECS
Secure AI agents with Amazon Bedrock AgentCore Identity on Amazon ECS Secure AI agents with Amazon Bedrock AgentCore Identity on Amazon ECS

Solution overviewThis architecture diagram shows how AgentCore Identity secures a self-hosted AI agent on Amazon ECS.

Both services use Amazon Bedrock AgentCore Identity to authenticate users inbound via OIDC and authorize outbound actions on their behalf.

Amazon Bedrock AgentCore Identity: Authorization Code GrantThis walkthrough adapts the general AgentCore Identity session binding flow for a self-hosted architecture using ALB for authentication, a dedicated Session Binding Service, and direct API calls instead of the AgentCore SDK and Runtime.

The user’s browser follows the redirect to the Session Binding Service via the Session Binding URL.

ConclusionIn this post, you learned how to sec…

1 day, 20 hours назад @ aws.amazon.com
Intelligence-driven message defense and insights using Amazon Bedrock
Intelligence-driven message defense and insights using Amazon Bedrock Intelligence-driven message defense and insights using Amazon Bedrock

You can use Amazon Bedrock to experiment with, customize, and integrate generative AI capabilities into your applications using familiar AWS services.

Using the prompt, Amazon Bedrock extracts this data for a backend ticketing system.

Next stepsAfter developing quality user prompts, integrate them into your existing workflows using the Amazon Bedrock API.

For implementation instructions, visit Making a request to Amazon Bedrock via Amazon API Gateway.

To begin building with Amazon Bedrock AgentCore, visit Securely launch and scale your agents and tools on Amazon Bedrock AgentCore.

1 day, 20 hours назад @ aws.amazon.com
Beyond BI: How the Dataset Q&A feature of Amazon Quick powers the next generation of data decisions
Beyond BI: How the Dataset Q&A feature of Amazon Quick powers the next generation of data decisions Beyond BI: How the Dataset Q&A feature of Amazon Quick powers the next generation of data decisions

We evolved TARA’s conversational analytics capabilities by adopting the Dataset Q&A feature as the foundation for semantic query generation and insight delivery.

By embedding semantic definitions directly into the dataset and grounding SQL generation in the business meaning of the data, Dataset Q&A significantly improved the quality and reliability of insights.

To set this up, navigate to the Spaces section in the Amazon Quick side panel and create a new Space.

Key Architectural Differentiator:The critical shift from Topics-based Q&A to direct dataset Q&A is the removal of the semantic intermediary.

ConclusionDirect dataset Q&A transforms how users interact with data by alleviating configur…

2 days, 17 hours назад @ aws.amazon.com
Introducing the agent performance loop: AgentCore Optimization now in preview
Introducing the agent performance loop: AgentCore Optimization now in preview Introducing the agent performance loop: AgentCore Optimization now in preview

Today we are announcing new capabilities in AgentCore that complete the observe, evaluate, improve loop for agent performance and quality: recommendations and two ways to validate them.

Recommendations analyze production traces and evaluation outputs to optimize your system prompt or tool descriptions for the evaluator you specify.

End-to-end traceability in AgentCore captures every model call, tool invocation, and reasoning step as OpenTelemetry-compatible traces managed using AgentCore Observability.

Get startedThese capabilities are available in preview today through Amazon Bedrock AgentCore in AWS Regions where AgentCore Evaluations is available.

During preview, AgentCore Optimization t…

2 days, 18 hours назад @ aws.amazon.com
Agent-guided workflows to accelerate model customization in Amazon SageMaker AI
Agent-guided workflows to accelerate model customization in Amazon SageMaker AI Agent-guided workflows to accelerate model customization in Amazon SageMaker AI

Amazon Kiro in SageMaker AI Studio JupyterLabJupyterLab in SageMaker AI includes an integrated agentic development environment support through ACP.

When you use coding agents in SageMaker AI JupyterLab, the space automatically loads relevant SageMaker AI model customization Skills into your agent’s context.

ACP-compatible agents can benefit from the same SageMaker AI Skills integration when used within SageMaker AI JupyterLab.

PrerequisitesBefore starting this tutorial, you must have the following prerequisites:Skills overviewThe SageMaker AI agent skills are built conforming with the Agent Skills open format.

Review the documentation to see how SageMaker AI serverless model customization w…

2 days, 18 hours назад @ aws.amazon.com
Generate dashboards from natural language prompts in Amazon Quick
Generate dashboards from natural language prompts in Amazon Quick Generate dashboards from natural language prompts in Amazon Quick

Amazon Quick now generates complete multi-sheet dashboards from natural language prompts, taking you from one or more datasets to a production-ready analysis in minutes.

In Amazon Quick, Analysis is the authoring surface where you build and arrange visuals, filters, and calculated fields across multiple sheets.

In Amazon Quick, your data is stored in datasets, which connect to sources such as Amazon Redshift, Amazon Simple Storage Service (Amazon S3), or uploaded files.

Select your datasetsOpen a dataset in Amazon Quick and choose Generate analysis.

ConclusionGenerate Analysis in Amazon Quick creates complete multi-sheet analyses from natural language prompts, reducing dashboard creation fr…

2 days, 18 hours назад @ aws.amazon.com
From data lake to AI-ready analytics: Introducing new data source with S3 Tables in Amazon Quick
From data lake to AI-ready analytics: Introducing new data source with S3 Tables in Amazon Quick From data lake to AI-ready analytics: Introducing new data source with S3 Tables in Amazon Quick

To address this, Amazon Quick introduces Amazon S3 Tables (Apache Iceberg tables) as a new data source.

Transaction events are streamed into Amazon Kinesis Data Streams and delivered using Amazon Data Firehose into an Amazon S3 table bucket.

Step 2: Create an Amazon Quick data source using S3 TablesNow, let’s create an Amazon Quick data source pointing to the s3table-datasamples bucket.

On the next screen, select Amazon S3 Tables (Apache Iceberg tables) as the data source type, then choose Next.

ConclusionIn this post, we explored how Amazon Quick’s new Amazon S3 Tables data source enables near real-time analytics while streamlining modern data architectures.

2 days, 19 hours назад @ aws.amazon.com
Introducing Dataset Q&A: Expanding natural language querying for structured datasets in Amazon Quick
Introducing Dataset Q&A: Expanding natural language querying for structured datasets in Amazon Quick Introducing Dataset Q&A: Expanding natural language querying for structured datasets in Amazon Quick

Amazon Quick now adds a powerful new natural language query capability, Dataset Q&A, to remove this bottleneck.

WalkthroughIn the following walkthrough, we demonstrate Dataset Q&A using a real-world dataset of bicycle rental trips from a city bike-sharing network.

Dataset Q&A capabilities can be invoked for both SPICE and direct query datasets including Amazon Redshift, Amazon Athena, Amazon Aurora PostgreSQL and Amazon Simple Storage Service (S3) Tables.

Supported data sourcesSupported data sources are Amazon Athena, Amazon Redshift, Amazon Aurora PostgreSQL, and Amazon S3 Tables in direct query mode for Dataset Q&A at this time.

ConclusionDataset Q&A for datasets in Quick Sight within Ama…

2 days, 19 hours назад @ aws.amazon.com
Capacity-aware inference: Automatic instance fallback for SageMaker AI endpoints
Capacity-aware inference: Automatic instance fallback for SageMaker AI endpoints Capacity-aware inference: Automatic instance fallback for SageMaker AI endpoints

Building a real-time inference endpoint on Amazon SageMaker AI has meant committing to a single instance type at creation time.

Today, Amazon SageMaker AI introduces capacity aware instance pool for new and existing inference endpoints.

Option 2: Use SageMaker AI inference recommendationsIf you’d rather not optimize each hardware target manually, SageMaker AI inference recommendations can generate hardware-specific configurations for you.

Workflow to configure endpoints with instance poolThere are two ways you can configure the instance pool: for new Amazon SageMaker AI endpoint or with your existing Amazon SageMaker AI endpoint.

ConclusionAmazon SageMaker AI Instance Pools let you define a…

2 days, 19 hours назад @ aws.amazon.com
AWS Transform now automates BI migration to Amazon Quick in days
AWS Transform now automates BI migration to Amazon Quick in days AWS Transform now automates BI migration to Amazon Quick in days

To learn more about what Amazon Quick offers, see Getting Started with Amazon Quick.

AWS Transform, an AI-powered service built to accelerate enterprise modernization, now answers that how for BI migration.

For BI migration, Wavicle provides four specialized agents available for purchase through AWS Marketplace: one Analyzer agent and one Converter agent for each BI migration source (Power BI and Tableau).

For Quick admin: Assign ownership and configure governanceAs Quick Sight administrator (the role configured in the Quick Sight connector), you assign ownership of each migrated dashboard to the appropriate BI authors.User authentication and directory structures in your source BI tool rare…

5 days, 17 hours назад @ aws.amazon.com
Reinforcement fine-tuning with LLM-as-a-judge
Reinforcement fine-tuning with LLM-as-a-judge Reinforcement fine-tuning with LLM-as-a-judge

Reinforcement Fine‑Tuning (RFT) has emerged as the preferred method to align these models efficiently, using automated reward signals to replace costly manual labeling.

Reinforcement Fine-Tuning can use any reward signal, straightforward hand‑crafted rules (RLVR), or an LLM that evaluates model outputs (LLM-as-a-judge or RLAIF).

Implementing LLM-as-a-judge: Six critical stepsThis section covers the key steps involved in designing and deploying LLM-as-a-judge reward functions.

Reward Lambda function for LLM-as-a-judgeThe following code snippets present the key components of the reward Lambda function.

TargetDocument_Grounding **Evaluates**: (a) Whether text_excerpt quotes from TargetDocument…

6 days, 15 hours назад @ aws.amazon.com
NVIDIA
последний пост 1 day назад
NVIDIA Spectrum-X — the Open, AI-Native Ethernet Fabric — Sets the Standard for Gigascale AI, Now With MRC
NVIDIA Spectrum-X — the Open, AI-Native Ethernet Fabric — Sets the Standard for Gigascale AI, Now With MRC NVIDIA Spectrum-X — the Open, AI-Native Ethernet Fabric — Sets the Standard for Gigascale AI, Now With MRC

NVIDIA Spectrum-X Ethernet is suited for this environment, helping provide the network foundation needed to run large-scale AI models and applications with confidence.

Another innovation key to achieving gigascale AI factories is multiplanar network designs, which OpenAI deploys with Spectrum-X Ethernet in conjunction with MRC.

Both Spectrum-X Ethernet Adaptive RDMA and MRC protocols, as well as other custom protocols, run natively across NVIDIA ConnectX SuperNICs and Spectrum-X Ethernet switches and support multiplanar network designs at gigascale.

NVIDIA Spectrum-X Ethernet delivers on all three, and with MRC, it continues to set the standard for advanced AI networking.

Learn more about N…

1 day назад @ blogs.nvidia.com
NVIDIA and ServiceNow Partner on New Autonomous AI Agents for Enterprises
NVIDIA and ServiceNow Partner on New Autonomous AI Agents for Enterprises NVIDIA and ServiceNow Partner on New Autonomous AI Agents for Enterprises

Unlike standalone AI agents, Project Arc connects natively to the ServiceNow AI Platform through ServiceNow Action Fabric to bring governance, auditability and workflow intelligence to every action the autonomous desktop agent takes.

NVIDIA agent skills enable specialized agents, such as ServiceNow AI Specialists, to deliver targeted capabilities across enterprise workflows.

Efficient AI FactoriesAs AI agents become long running and always on, scaling them across millions of workflows requires not just capability but efficiency — making token economics central to enterprise AI.

NVIDIA AI factories are built to deliver the lowest-cost, most-efficient tokenomics for production AI.

ServiceNow …

1 day, 18 hours назад @ blogs.nvidia.com
Optimize Supply Chain Decision Systems Using NVIDIA cuOpt Agent Skills
Optimize Supply Chain Decision Systems Using NVIDIA cuOpt Agent Skills Optimize Supply Chain Decision Systems Using NVIDIA cuOpt Agent Skills

The following steps outline how to set up and use the NVIDIA cuOpt supply chain agent reference workflow, which uses cuOpt agent skills to perform GPU-accelerated supply chain optimization using agent-driven workflows.

End-to-end supply chain decision optimization using NVIDIA cuOpt agent skillsExtendible agentic architectureThe cuOpt supply chain agent reference workflow is a simplified starting point.

Architectural diagram of an extended pattern using the NVIDIA cuOpt supply chain agent reference workflowGet started with this cuOpt agent workflow on GitHub.

Get startedDeploy NVIDIA cuOpt Agent reference workflow using the NVIDIA NeMo Agent Toolkit and use built-in optimization skills, or …

2 days, 14 hours назад @ developer.nvidia.com
Nemotron Labs: What OpenClaw Agents Mean for Every Organization
Nemotron Labs: What OpenClaw Agents Mean for Every Organization Nemotron Labs: What OpenClaw Agents Mean for Every Organization

Each post highlights practical ways to use an open stack to deliver real value in production — from transparent research copilots to scalable AI agents.

Most AI agents today are triggered by a prompt, complete a defined task and then stop running.

Autonomous agents, which run continuously and act across long time horizons, drive inference demand up by another 1,000x over reasoning AI.

The practical applications of long-running autonomous agents span every function and sector.

Stay up to date on agentic AI, NVIDIA Nemotron and more by subscribing to NVIDIA AI news, joining the community and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.

6 days, 15 hours назад @ blogs.nvidia.com
Automating GPU Kernel Translation with AI Agents: cuTile Python to cuTile.jl
Automating GPU Kernel Translation with AI Agents: cuTile Python to cuTile.jl Automating GPU Kernel Translation with AI Agents: cuTile Python to cuTile.jl

This post covers cross-domain-specific language (DSL) GPU kernel translation, from porting cuTile Python kernels to cuTile.jl (Julia).

It shows how to:Translate GPU kernels between cuTile Python and cuTile.jl: Walk through a complete matrix multiplication example side-by-side.

Cross-DSL GPU kernel translationBoth cuTile Python and cuTile.jl frontends share the same tiled abstraction, making the translation largely algorithmic.

The following examples are from TileGym, where the team ported a set of cuTile Python kernels to cuTile.jl and packaged them as a self-contained Julia subproject.

* Accumulator shape (TM, TN) Wrong results in matmul Column-major needs (TN, TM) ct.PaddingMode.ZERO Unde…

6 days, 19 hours назад @ developer.nvidia.com
It’s Gonna Be May: 16 Games Hit the Cloud This Month, With More NVIDIA GeForce RTX 5080 Power
It’s Gonna Be May: 16 Games Hit the Cloud This Month, With More NVIDIA GeForce RTX 5080 Power It’s Gonna Be May: 16 Games Hit the Cloud This Month, With More NVIDIA GeForce RTX 5080 Power

Ultimate members now get priority access to RTX 5080‑class rigs, making it easier than ever to tap into next‑generation PC power from almost any device.

Starting today, Ultimate members can stream even more of their games on RTX 5080 virtual gaming rigs — bringing the power of the NVIDIA Blackwell RTX architecture to a wide range of titles.

This update significantly broadens access to 5080 performance beyond the list of GeForce RTX 5080-optimized titles.

With RTX 5080 in the cloud, Ultimate members unlock the same cutting-edge features available to GeForce RTX 50 Series GPU owners.

With RTX 5080 powering the default Ultimate experience, GeForce NOW delivers next-generation performance to mo…

6 days, 22 hours назад @ blogs.nvidia.com
Scaling Biomolecular Modeling Using Context Parallelism in NVIDIA BioNeMo
Scaling Biomolecular Modeling Using Context Parallelism in NVIDIA BioNeMo Scaling Biomolecular Modeling Using Context Parallelism in NVIDIA BioNeMo

Now, a new context parallelism (CP) framework from the NVIDIA BioNeMo team is shattering the memory barriers of structural biology, enabling the holistic modeling of systems.

For more details, see Fold-CP: A Context Parallelism Framework for Biomolecular Modeling.

The NVIDIA BioNeMo CP framework overcomes these limits by sharding a single large molecular system across multiple GPUs.

BioNeMo context parallelism implementationThe NVIDIA BioNeMo CP implementation is built on Torch distributed APIs for GPU-to-GPU communications.

To learn more, see the Boltz CP code open-source documentation and check out Fold-CP: A Context Parallelism Framework for Biomolecular Modeling.

1 week, 1 day назад @ developer.nvidia.com
NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents
NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents

Nemotron 3 Nano Omni sets a new efficiency frontier for open multimodal models with leading accuracy and low cost, topping six leaderboards for complex document intelligence, and video and audio understanding.

“By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings — something that wasn’t practical before.

Computer use agents — Nemotron 3 Nano Omni powers the perception loop for agents navigating graphical user interfaces, reasoning over onscreen content and understanding user interface state over time.

Visit the NVIDIA technical blog for tutorials, cookbooks and deployment guides for Nemotron 3 Nano Omni use cases.

Stay up to date on agentic AI, NVI…

1 week, 1 day назад @ blogs.nvidia.com
24/7 Simulation Loops: How Agentic AI Keeps Subsurface Engineering Moving
24/7 Simulation Loops: How Agentic AI Keeps Subsurface Engineering Moving 24/7 Simulation Loops: How Agentic AI Keeps Subsurface Engineering Moving

While our examples focus on subsurface simulation, the framework is tool-agnostic and applicable to any industry reliant on complex simulation workflows.

The overall master architecture diagramThe reservoir simulation assistant: Accelerating daily workflowsThe reservoir simulation assistant acts as a digital domain expert bridging the gap between the engineer, technical documentation, and the simulator.

Demo of the reservoir simulation assistantKey takeawaysThe reservoir simulation assistant is designed to augment, not replace, the established tools of the trade.

Advanced reasoning: Agents utilize Llama-3.3-Nemotron-Super-49B-v1.5 , a state-of-the-art model designed for complex reasoning, p…

1 week, 1 day назад @ developer.nvidia.com
Into the Omniverse: Manufacturing’s Simulation-First Era Has Arrived
Into the Omniverse: Manufacturing’s Simulation-First Era Has Arrived Into the Omniverse: Manufacturing’s Simulation-First Era Has Arrived

Editor’s note: This post is part of Into the Omniverse, a series focused on how developers, 3D practitioners, and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.

SimReady: The Content Standard for Physical AIAs physical AI becomes integral to industrial operations, manufacturers face a foundational challenge: Assets don’t travel reliably between 3D pipelines.

Every time an asset moves from a computer-aided design tool to a simulation platform, physics properties, geometry and metadata are lost — forcing teams to rebuild from scratch.

In addition, NVIDIA Omniverse libraries provide the physics-accurate, photorealistic simulation layer wher…

1 week, 1 day назад @ blogs.nvidia.com
Federated Learning Without the Refactoring Overhead Using NVIDIA FLARE
Federated Learning Without the Refactoring Overhead Using NVIDIA FLARE Federated Learning Without the Refactoring Overhead Using NVIDIA FLARE

Turn an existing local training script into a federated client with ~5–6 lines of code, without changing your training loop structure.

Step 1: Convert your local training script into a federated client (client API)Who it’s for: Practitioners and ML engineers with existing training code who want the smallest possible difference.

After step 1, you have a federated client script.

Step 2 makes it a federated job you can run repeatedly and move through the lifecycle cleanly.

FLARE in the NewsFLARE is showing up in real deployments—from Eli Lilly TuneLab’s federated learning platform (built by Rhino Federated Computing using NVFlare) to Taiwan MOHW’s national healthcare federated learning initiat…

1 week, 5 days назад @ developer.nvidia.com
Winning a Kaggle Competition with Generative AI–Assisted Coding
Winning a Kaggle Competition with Generative AI–Assisted Coding Winning a Kaggle Competition with Generative AI–Assisted Coding

In March 2026, three LLM agents generated over 600,000 lines of code, ran 850 experiments, and helped secure a first-place finish in a Kaggle playground competition.

Historically, two bottlenecks have limited this experimentation:How quickly you can write code for new experiments.

LLM agents now address the first problem—unlocking a new scale of rapid, iterative experimentation.

Step 1: LLM agents perform EDAAn LLM agent must understand data structure before generating a full pipeline.

The advantage lies in exploring many ideas quickly with GPU-accelerated model execution and LLM agents to write code faster.

1 week, 6 days назад @ developer.nvidia.com
OpenAI’s New GPT-5.5 Powers Codex on NVIDIA Infrastructure — and NVIDIA Is Already Putting It to Work
OpenAI’s New GPT-5.5 Powers Codex on NVIDIA Infrastructure — and NVIDIA Is Already Putting It to Work OpenAI’s New GPT-5.5 Powers Codex on NVIDIA Infrastructure — and NVIDIA Is Already Putting It to Work

Codex, OpenAI’s agentic coding application, is enabling this new frontier.

It’s now powered by GPT-5.5, OpenAI’s latest frontier model, which runs on NVIDIA GB200 NVL72 rack-scale systems.

NVIDIA engineers have had access to GPT-5.5 through the Codex app for a few weeks, and the gains are measurable.

Users can control the Codex agent running in the cloud VM from a user interface that every employee is familiar with.

NVIDIA was a day-zero partner for OpenAI’s gpt-oss open-weight model launch, optimizing model weights for NVIDIA TensorRT-LLM and ecosystem frameworks including vLLM and Ollama.

1 week, 6 days назад @ blogs.nvidia.com
Tag, You’re It: GeForce NOW Levels Up Game Discovery With Xbox Game Pass and Ubisoft+ Labels
Tag, You’re It: GeForce NOW Levels Up Game Discovery With Xbox Game Pass and Ubisoft+ Labels Tag, You’re It: GeForce NOW Levels Up Game Discovery With Xbox Game Pass and Ubisoft+ Labels

This week’s upgrades bring smarter libraries, making it easier than ever for gamers to turn a PC collection into a cloud-powered flex.

The new in-app labels, first announced at GDC, are now live — making it simple to spot titles and new releases from connected subscription services like Xbox Game Pass and Ubisoft+.

Leading the charge is Vampire Crawlers: The Turbo Wildcard, a chaotic new spin on the fast, unpredictable and packed-with-personality Vampire Survivors universe.

New in‑app game labels on GeForce NOW make it easy to see which titles are part of Xbox Game Pass or Ubisoft+ game libraries once accounts are connected.

Redeem through the GeForce NOW account portal and enter the code i…

1 week, 6 days назад @ blogs.nvidia.com
Making Sense of the Early Universe
Making Sense of the Early Universe Making Sense of the Early Universe

There are more galaxies in the universe than anyone ever expected.

“There were galaxies everywhere,” Robertson recalled.

JWST is the most powerful observatory ever launched, observing in infrared, capturing light that has traveled for more than 13 billion years.

Each deep-field image is crowded with hundreds of thousands of galaxies, some of them 13 billion years old.

“These datasets are far too large and complex for humans to analyze by hand,” Robertson said.

1 week, 6 days назад @ blogs.nvidia.com
Facebook
последний пост 2 weeks, 1 day назад
Modernizing the Facebook Groups Search to Unlock the Power of Community Knowledge
Modernizing the Facebook Groups Search to Unlock the Power of Community Knowledge Modernizing the Facebook Groups Search to Unlock the Power of Community Knowledge

We’ve fundamentally transformed Facebook Groups Search to help people more reliably discover, sort through, and validate community content that’s most relevant to them.

We’ve adopted a new hybrid retrieval architecture and implemented automated model-based evaluation to address the major friction points people experience when searching community content.

Addressing the Friction Points in Community KnowledgePeople struggle with three friction points when searching for answers in community content – discovery, consumption, and validation.

The Solution: A Modernized Hybrid Retrieval ArchitectureWe engineered a hybrid retrieval architecture that powers a discussions module on Facebook Search.

R…

2 weeks, 1 day назад @ engineering.fb.com
Capacity Efficiency at Meta: How Unified AI Agents Optimize Performance at Hyperscale
Capacity Efficiency at Meta: How Unified AI Agents Optimize Performance at Hyperscale Capacity Efficiency at Meta: How Unified AI Agents Optimize Performance at Hyperscale

We’ve built a unified AI agent platform that encodes the domain expertise of senior efficiency engineers into reusable, composable skills.

Introducing the Capacity Efficiency ProgramWhen the code you ship serves more than 3 billion people, even a 0.1% performance regression can translate to significant additional power consumption.

Many engineers at Meta use our efficiency tools to work on these problems every day.

Skills : These encode domain expertise about performance efficiency.

The pipeline mirrors the defensive AI Regression Solver:Gather context with tools: The AI agent looks up: Opportunity metadata.

2 weeks, 6 days назад @ engineering.fb.com
How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines
How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines

Challenging the Conventional Wisdom on AI Context FilesRecent academic research found that AI-generated context files actually decreased agent success rates on well-known open-source Python repositories.

Our codebase is the opposite: proprietary config-as-code with tribal knowledge that exists nowhere in any model’s training data.

Any team with a large, proprietary codebase can benefit:Identify your tribal knowledge gaps.

What’s NextWe are expanding context coverage to additional pipelines across Meta’s data infrastructure and exploring tighter integration between context files and code generation workflows.

This approach turned undocumented tribal knowledge into structured, AI-readable con…

1 month назад @ engineering.fb.com
KernelEvolve: How Meta’s Ranking Engineer Agent Optimizes AI Infrastructure
KernelEvolve: How Meta’s Ranking Engineer Agent Optimizes AI Infrastructure KernelEvolve: How Meta’s Ranking Engineer Agent Optimizes AI Infrastructure

This is the second post in the Ranking Engineer Agent blog series exploring the autonomous AI capabilities accelerating Meta’s Ads Ranking innovation.

We introduce KernelEvolve, an agentic kernel authoring system used by Ranking Engineer Agent and generally applicable to a range of AI models beyond Ads Ranking.

Unlike typical large language model (LLM)-based agents that perform one-shot code generation, KernelEvolve treats kernel optimization as a search problem.

A standard coding assistant lacks the context to write optimized MTIA kernels because it has never seen MTIA documentation, instruction set details, or programming idioms.

KernelEvolve represents an early step toward the vision of …

1 month назад @ engineering.fb.com
Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads
Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads

To overcome this, we have developed the Meta Adaptive Ranking Model, which effectively bends the inference scaling curve with high ROI and industry-leading efficiency.

Introducing Meta Adaptive Ranking ModelServing LLM-scale & complexity models in a real-time ads recommendation environment requires resolving a fundamental tension between model complexity and system efficiency.

Adaptive Ranking Model addresses these challenges through a paradigm shift powered by three core innovations across the serving stack:Inference-efficient model scaling: Adaptive Ranking Model achieves a model complexity equivalent to the O(10 GFLOPs) per token used by top-tier LLMs.

To minimize compute overhead, Adapt…

1 month назад @ engineering.fb.com
AI for American-Produced Cement and Concrete
AI for American-Produced Cement and Concrete AI for American-Produced Cement and Concrete

Concurrent with the 2026 American Concrete Institute (ACI) Spring Convention, Meta is releasing a new AI model for designing concrete mixes – Bayesian Optimization for Concrete (BOxCrete), as well as the foundational data used to develop award-winning concrete mixes.

Amrize operates 18 cement plants, 141 cement terminals and 269 ready-mix concrete sites across North America.

Alongside the event, Meta is releasing a new AI model for designing concrete mixes, Bayesian Optimization for Concrete (BOxCrete).

How Meta Leverages AI for Concrete MixturesMeta’s AI for concrete model can help suppliers more quickly incorporate U.S. materials into their mixes through an approach called adaptive experi…

1 month, 1 week назад @ engineering.fb.com
Friend Bubbles: Enhancing Social Discovery on Facebook Reels
Friend Bubbles: Enhancing Social Discovery on Facebook Reels Friend Bubbles: Enhancing Social Discovery on Facebook Reels

Friend bubbles in Facebook Reels highlight Reels your friends have liked or reacted to, helping you discover new content and making it easier to connect over shared interests.

Friend bubbles enhance the social experience on Facebook Reels by helping you discover content your friends enjoy, creating a shared viewing experience and sparking new conversations.

Along with additional optimizations in the underlying method, this approach enabled us to ship friend bubbles while preserving core Reels performance.

Friend bubbles work because the signal is high value: It adds meaningful social context that helps people decide what’s worth watching.

Engagement also scales consistently with the number …

1 month, 2 weeks назад @ engineering.fb.com
Ranking Engineer Agent (REA): The Autonomous AI Agent Accelerating Meta’s Ads Ranking Innovation
Ranking Engineer Agent (REA): The Autonomous AI Agent Accelerating Meta’s Ads Ranking Innovation Ranking Engineer Agent (REA): The Autonomous AI Agent Accelerating Meta’s Ads Ranking Innovation

Meta’s Ranking Engineer Agent (REA) autonomously executes key steps across the end-to-end machine learning (ML) lifecycle for ads ranking models.

Powering these interactions are highly sophisticated, complex and massively distributed machine learning (ML) models that continuously evolve to serve both advertisers and people who use the platforms.

Optimizing these ML models has traditionally been time-consuming.

To address this, Meta built the Ranking Engineer Agent, an autonomous AI agent designed to drive the end-to-end ML lifecycle and iteratively evolve Meta’s ads ranking models at scale.

ML training jobs run for hours or days, far beyond what any session-bound assistant can manage.

1 month, 2 weeks назад @ engineering.fb.com
Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps
Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps

Nowhere is this more apparent than in mobile security, where a single class of vulnerability can be replicated across hundreds of call sites scattered throughout a sprawling, multi-app codebase serving billions of users.

Meta’s Product Security team has developed a two-pronged strategy to address this:Designing secure-by-default frameworks that wrap potentially unsafe Android OS APIs and make the secure path the easiest path for developers, andLeveraging generative AI to automate the migration of existing code to those frameworks at scale.

The result is a system that can propose, validate, and submit security patches across millions of lines of code with minimal friction for the engineers w…

1 month, 3 weeks назад @ engineering.fb.com
RCCLX: Innovating GPU communications on AMD platforms
RCCLX: Innovating GPU communications on AMD platforms RCCLX: Innovating GPU communications on AMD platforms

RCCLX is fully integrated with Torchcomms and aims to empower researchers and developers to accelerate innovation, regardless of their chosen backend.

We want to iterate on collectives, transports, and novel features quickly on AMD platforms.

With RCCLX, we have integrated CTran to AMD platforms, enabling the AllToAllvDynamic – a GPU-resident collective.

These features provide significant performance improvements on AMD platforms and we are excited to share this with the community.

RCCLX Quick Start GuideInstall Torchcomms with RCCLX backend by following the installation instructions in the Torchcomms repo.

2 months, 1 week назад @ engineering.fb.com
The Death of Traditional Testing: Agentic Development Broke a 50-Year-Old Field, JiTTesting Can Revive It
The Death of Traditional Testing: Agentic Development Broke a 50-Year-Old Field, JiTTesting Can Revive It The Death of Traditional Testing: Agentic Development Broke a 50-Year-Old Field, JiTTesting Can Revive It

A Catching JiTTest focuses specifically on finding regressions introduced by a code change.

Agentic development dramatically increases the pace of code change, straining test development burden and scaling the cost of false positives and test maintenance to breaking point.

And since the JiTTest itself is LLM-generated, it can often infer the plausible intention of a code change and simulate possible faults that may result from it.

With them engineers no longer have to spend time writing, reviewing, and testing complex test code.

READ THE PAPERJust-in-Time Catching Test Generation at Meta

2 months, 3 weeks назад @ engineering.fb.com
Adapting the Facebook Reels RecSys AI Model Based on User Feedback
Adapting the Facebook Reels RecSys AI Model Based on User Feedback Adapting the Facebook Reels RecSys AI Model Based on User Feedback

Our new User True Interest Survey (UTIS) model , now helps surface more niche, high-quality content and boosts engagement, retention, and satisfaction.

Our paper, “ Improve the Personalization of Large-Scale Ranking Systems by Integrating User Survey Feedback ” shares full details on this work.

The main candidate ranking model used by the platform is a large multi-task, multi-label model.

We trained a lightweight UTIS alignment model layer on the collected user survey responses using existing predictions of the main model as input features.

The UTIS model consistently outperformed the baseline, driving higher user engagement and retention .

3 months, 3 weeks назад @ engineering.fb.com
DrP: Meta’s Root Cause Analysis Platform at Scale
DrP: Meta’s Root Cause Analysis Platform at Scale DrP: Meta’s Root Cause Analysis Platform at Scale

DrP’s key components include:Expressive SDK : The DrP SDK allows engineers to codify investigation workflows into analyzers.

Post-processing system : After an investigation, the post-processing system can take automated actions based on the analysis results.

Bootstrap code : The DrP SDK provides bootstrap code to create a template analyzer with pre-populated boilerplate code.

Data access and analysis : The SDK includes libraries for data access and analysis, such as dimension analysis and time series correlation.

This provides immediate analysis results to on-call engineers.

4 months, 2 weeks назад @ engineering.fb.com
How AI Is Transforming the Adoption of Secure-by-Default Mobile Frameworks
How AI Is Transforming the Adoption of Secure-by-Default Mobile Frameworks How AI Is Transforming the Adoption of Secure-by-Default Mobile Frameworks

Generative AI and automation accelerate the adoption of secure frameworks at scale, enabling consistent security enforcement and efficient migration across Meta’s vast codebase.

How We Design Secure-by-Default Frameworks at MetaDesigning secure-by-default frameworks for use by a large number of developers shipping vastly different features across multiple apps is an interesting challenge.

There shouldn’t be one security framework that covers all security issues, and not every security issue is general enough to deserve its own framework.

Now that we’ve looked at the design philosophy behind our frameworks, let’s look at one of our most widely used Android security frameworks, SecureLinkLaun…

4 months, 3 weeks назад @ engineering.fb.com
Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization
Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization

Zoomer has delivered training time reductions, and significant QPS improvements, making it the de-facto tool for AI performance optimization across Meta’s entire AI infrastructure.

Zoomer is Meta’s automated, one-stop-shop platform for performance profiling, debugging, analysis, and optimization of AI training and inference workloads.

AI Performance Optimization Using ZoomerZoomer is an automated debugging and optimization platform that works across all of our AI model types (ads recommendations, GenAI, computer vision, etc.)

Memory Analysis : Comprehensive analysis of GPU memory usage patterns, allocation tracking, and leak detection.

Realtime Memory Profiling : GPU memory allocation track…

5 months, 2 weeks назад @ engineering.fb.com
Uber Engineering
последний пост None
neptune.ai neptune.ai
последний пост 5 months назад
We are joining OpenAI
We are joining OpenAI We are joining OpenAI

Piotr Niedźwiedź, CEO/CTO and founder of neptune.aiI’m excited to share that we’ve entered into a definitive agreement to be acquired by OpenAI, subject to closing conditions.

We are thrilled to join the OpenAI team and help their AI researchers build better models faster.

Neptune is a metrics dashboard company.”We’ve worked closely with OpenAI to create the metrics dashboard that helps teams building foundation models.

Our future with OpenAINeptune will join OpenAI and continue to support AI researchers with tools to monitor, debug, and evaluate frontier models.

We are looking forward to working with top AI researchers and supporting OpenAI’s mission of ensuring that AGI benefits all of hu…

5 months назад @ neptune.ai
Synthetic Data for LLM Training
Synthetic Data for LLM Training Synthetic Data for LLM Training

For instance, financial data is highly sensitive and protected by very strict regulations, and synthetic data mimics the real data distribution without revealing customer information.

Read more about how leading foundation model teams curate their training data and other topics in the State of Foundation Model Training Report 2025.

Choosing the right synthetic data generation technique depends on the type of data and its complexity.

Synthetic tabular data generation is a promising direction to overcome these challenges by learning the distribution of the tabular data.

Post-processingAs the distribution of tabular data is highly complex, it makes the synthetic tabular data generation very ch…

5 months, 3 weeks назад @ neptune.ai
What are LLM Embeddings: All you Need to Know
What are LLM Embeddings: All you Need to Know What are LLM Embeddings: All you Need to Know

TL;DR LLM embeddings are the numerical, vector representations of text that Large Language Models (LLMs) use to process information.

Unlike their predecessor word embeddings, LLM embeddings are context-aware and dynamically change to capture semantic and syntactic relationships based on the surrounding text.

What are the applications of LLM embeddings?

Word EmbeddingsSparse Word Embeddings One-Hot Vectors 1970s TF-IDF1980s Co-Occurrence MatrixStatic Word Embeddings Word2Vec 2013 GloVe 2014Contextualized word embeddings ELMo 2018 GPT-1 2018 BERT 2018 LLAMA 2023 DeepSeek-V1 2023 GPT-4 2023Static word embeddingsStatic word embeddings, such as word2vec in 2013, marked a significant development.…

6 months назад @ neptune.ai
Detecting and Fixing ‘Dead Neurons’ in Foundation Models
Detecting and Fixing ‘Dead Neurons’ in Foundation Models Detecting and Fixing ‘Dead Neurons’ in Foundation Models

TL;DR Dead neurons silently waste compute and reduce effective model capacity in foundation models.

Dead neurons’ impactRecent studies into dead neurons in the context of foundation models show interesting, albeit worrying, results.

These large reported fractions of dead neurons in foundation models are a concern from a computational perspective.

Before we move on to discuss how to detect and fix dead neurons, let’s touch upon an important distinction between dead neurons and vanishing gradients.

Further reading How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models Read moreVisualizing activation distributionsIs your foundation model suffering from dead neurons?

6 months, 1 week назад @ neptune.ai
Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training
Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training

In the first part of this series, we covered the fundamentals of instruction fine-tuning (IFT).

def calculate_irs(instruction, output, reference_model): evaluation_prompt = f""" Instruction: {instruction} Model Output: {output} Rate how well the output follows the instruction on these criteria: 1.

| SourceHINT addresses a computational inefficiency in standard instruction fine-tuning: repeatedly reprocessing the same task instruction with every input example.

Read more about foundation model training infrastructure and other topics in Neptune’s 2025 State of Foundation Model Training Report.

First, during initial instruction fine-tuning across multiple diverse tasks, the model learns genera…

6 months, 2 weeks назад @ neptune.ai
How to Optimize LLM Inference
How to Optimize LLM Inference How to Optimize LLM Inference

Large Language Model (LLM) inference at scale is challenging as it involves transferring massive amounts of model parameters and data and performing computations on large tensors.

In the following, we’ll use the Llama model family architecture as a specific example to understand the LLM workload at inference.

For a far more detailed analysis of the LLM workload at inference, see the chapter All About Transformer Inference in the book How to Scale Your Model, published by Google DeepMind.

See also How to Run LLMs Locally Read moreA quick primer on hardware for LLM inferenceA typical LLM inference cluster consists of several nodes, each with a multi-core CPU and multiple accelerator devices, …

6 months, 3 weeks назад @ neptune.ai
A Researcher’s Guide to LLM Grounding
A Researcher’s Guide to LLM Grounding A Researcher’s Guide to LLM Grounding

In this article, we’ll explore the fundamental concepts of LLM grounding as well as strategies for optimally grounding models.

What is LLM grounding?

LLM grounding is analogous.

If relevant knowledge cannot be inferred from the data, then LLM grounding cannot yield more relevant responses.

When grounding LLMs using RAG, consider retaining only a few of the top hits (i.e., top-k) for your retrieval queries.

7 months, 1 week назад @ neptune.ai
Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions
Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions

TL;DR Instruction fine-tuning (IFT) refines pre-trained large language models (LLMs) to follow specific task instructions by training on prompt-response pairs.

Instruction fine-tuning in a nutshellIFT tailors LLMs to follow user instructions by bridging their inherent next-word prediction with human-defined objectives.

Related LLM Fine-Tuning and Model Selection Using Neptune and Transformers Read moreParameter-efficient instruction fine-tuningWhile major foundation models like GPT-4 or Llama-2 undergo full parameter instruction fine-tuning during development, parameter-efficient fine-tuning (PEFT) methods have become widely adopted for instruction fine-tuning since the LoRA paper was publi…

7 months, 3 weeks назад @ neptune.ai
Understanding Prompt Injection: Risks, Methods, and Defense Measures
Understanding Prompt Injection: Risks, Methods, and Defense Measures Understanding Prompt Injection: Risks, Methods, and Defense Measures

Prompt injection 101: When prompts go rogueThe term ‘Prompt Injection’ comes from SQL injection attacks.

There is another claim of the independent discovery of prompt injection attacks, which suggests that Riley Goodside publicly exhibited a prompt injection in a tweet back in September 2022.

The indirect prompt injection attacks are classified into active, passive, user-driven and virtual prompt attacks.

Virtual prompt injection attacksThis injection type is closely related to passive injection attacks previously described.

Prompt injection: current challenges & lessons learnedThe arms race between prompt injection attacks and defenses is a challenge for researchers, developers, and users.

9 months назад @ neptune.ai
SabiYarn: Advancing Low-Resource Languages With Multitask NLP Pre-Training [Paper Reflections]
SabiYarn: Advancing Low-Resource Languages With Multitask NLP Pre-Training [Paper Reflections] SabiYarn: Advancing Low-Resource Languages With Multitask NLP Pre-Training [Paper Reflections]

This simple idea avoids computing loss on input prompt tokens the model already knows.

Prompt tokens are (too) expensive in low-resource settingsDuring pre-training, LLMs are trained in causal language modeling through a next-token prediction task.

=> Mo fẹ́ràn ìrẹsì,” the model is trained to predict every token, from the prompt to the actual answer:Step Prompt Next token 1 Translate English Static prompt 2 Translate English to Static prompt 3 Translate English to Yoruba: Static prompt 4 Translate English to Yoruba: I 5 Translate English to Yoruba: I love 6 Translate English to Yoruba: I love rice.

This is straightforward to implement in PyTorch by masking out the prompt tokens in the label …

9 months, 1 week назад @ neptune.ai
▶️ YouTube
Yannic Kilcher Yannic Kilcher
последний пост 2 months назад
I BUILT A FULLY AUTOMATIC MANSPLAINER
I BUILT A FULLY AUTOMATIC MANSPLAINER I BUILT A FULLY AUTOMATIC MANSPLAINER

All information about GTC and the DGX Spark Raffle is here: https://www.ykilcher.com/gtc Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https://www.subscribestar.com/yannickilcher

Patreon: https://www.patreon.com/yannickilcher

Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq

Ethereu…

2 months назад @ youtube.com
Traditional X-Mas Stream
Traditional X-Mas Stream Traditional X-Mas Stream

Letsgooo

4 months, 1 week назад @ youtube.com
Traditional Holiday Live Stream
Traditional Holiday Live Stream Traditional Holiday Live Stream

https://ykilcher.com/discord Links:

TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://discord.gg/4H8xxDF

BitChute: https://www.bitchute.com/channel/yannic-kilcher

Minds: https://www.minds.com/ykilcher

Parler: https://parler.com/profile/YannicKilcher

LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/

BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https:/…

4 months, 1 week назад @ youtube.com
TiDAR: Think in Diffusion, Talk in Autoregression (Paper Analysis)
TiDAR: Think in Diffusion, Talk in Autoregression (Paper Analysis) TiDAR: Think in Diffusion, Talk in Autoregression (Paper Analysis)

Paper: https://arxiv.org/abs/2511.08923 Abstract:

Diffusion language models hold the promise of fast parallel generation, while autoregressive (AR) models typically excel in quality due to their causal structure aligning naturally with language modeling. This raises a fundamental question: can we achieve a synergy with high throughput, higher GPU utilization, and AR level quality? Existing methods fail to effectively balance these two aspects, either prioritizing AR using a weaker model for sequential drafting (speculative decoding), leading to lower drafting efficiency, or using some form of left-to-right (AR-like) decoding logic for diffusion, which still suffers from quality degradation …

4 months, 1 week назад @ youtube.com
Titans: Learning to Memorize at Test Time (Paper Analysis)
Titans: Learning to Memorize at Test Time (Paper Analysis) Titans: Learning to Memorize at Test Time (Paper Analysis)

Paper: https://arxiv.org/abs/2501.00663 Abstract:

Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a new neural long-term memory module that learns to memorize historical context and helps attention to attend to the current context while utilizing long past information. We sh…

4 months, 3 weeks назад @ youtube.com
[Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff)
[Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff) [Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff)

https://arxiv.org/abs/2510.17558 Abstract:

We propose an extension of the decoder Transformer that conditions its generative process on random latent variables which are learned without supervision thanks to a variational procedure. Experimental evaluations show that allowing such a conditioning translates into substantial improvements on downstream tasks. Author: François Fleuret Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the con…

6 months назад @ youtube.com
[Video Response] What Cloudflare's code mode misses about MCP and tool calling
[Video Response] What Cloudflare's code mode misses about MCP and tool calling [Video Response] What Cloudflare's code mode misses about MCP and tool calling

Theo's Video: https://www.youtube.com/watch?v=bAYZjVAodoo

Cloudflare article: https://blog.cloudflare.com/code-mode/ Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https://www.subscribestar.com/yannickilcher

Patreon: https://www.patreon.com/yannickilcher

Bitcoin (BTC): bc1q49lsw3q325tr58ygf8…

6 months, 2 weeks назад @ youtube.com
[Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant)
[Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant) [Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant)

Paper: https://arxiv.org/abs/2508.21038 Abstract:

Vector embeddings have been tasked with an ever-increasing set of retrieval tasks over the years, with a nascent rise in using them for reasoning, instruction-following, coding, and more. These new benchmarks push embeddings to work for any query and any notion of relevance that could be given. While prior works have pointed out theoretical limitations of vector embeddings, there is a common assumption that these difficulties are exclusively due to unrealistic queries, and those that are not can be overcome with better training data and larger models. In this work, we demonstrate that we may encounter these theoretical limitations in realist…

6 months, 3 weeks назад @ youtube.com
AGI is not coming!
AGI is not coming! AGI is not coming!

jack Morris's investigation into GPT-OSS training data https://x.com/jxmnop/status/1953899426075816164?t=3YRhVQDwQLk2gouTSACoqA&s=09

9 months назад @ youtube.com
Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)
Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis) Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

Paper: https://research.trychroma.com/context-rot Abstract:

Large Language Models (LLMs) are typically presumed to process context uniformly—that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks.

In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows. Authors: Kelly Hong, Anton Troynikov, Jeff Huber Links:

9 months, 2 weeks назад @ youtube.com
Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)
Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review) Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)

Paper: https://arxiv.org/abs/2507.02092

Code: https://github.com/alexiglad/EBT

Website: https://energy-based-transformers.github.io/ Abstract:

Inference-time computation techniques, analogous to human System 2 Thinking, have recently become popular for improving model performances. However, most existing approaches suffer from several limitations: they are modality-specific (e.g., working only in text), problem-specific (e.g., verifiable domains like math and coding), or require additional supervision/training on top of unsupervised pretraining (e.g., verifiers or verifiable rewards). In this paper, we ask the question "Is it possible to generalize these System 2 Thinking approaches, and de…

9 months, 3 weeks назад @ youtube.com
Henry AI Labs Henry AI Labs
последний пост None
3blue1brown 3blue1brown
последний пост 2 weeks, 6 days назад
Covering 10 points, a surprisingly tricky puzzle.
Covering 10 points, a surprisingly tricky puzzle. Covering 10 points, a surprisingly tricky puzzle.

Made as part of a monthly series of puzzles for the 2026 Year of Math.

2 weeks, 6 days назад @ youtube.com
Escher's most mind-bending piece
Escher's most mind-bending piece Escher's most mind-bending piece

On "The Print Gallery", by M.C. Escher

Full video: https://youtu.be/ldxFjLJ3rVY

1 month, 1 week назад @ youtube.com
The subset sum puzzle
The subset sum puzzle The subset sum puzzle

Part of a series of monthly puzzlers. Stay subscribed to see the solution

1 month, 1 week назад @ youtube.com
Escher's most mathematically interesting piece
Escher's most mathematically interesting piece Escher's most mathematically interesting piece

Escher's Print Gallery, and the tour of complex analysis it invites.

Check out our virtual career fair: 3b1b.co/talent

Join channel supporters to see videos early: 3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Original paper by de Smit and Lenstra:

https://pub.math.leidenuniv.nl/~smitbde/papers/2003-de_smit-lenstra-escher.pdf Timestamps: 0:00 - The print gallery

13:04 - Conformal maps from complex analysis

21:41 - The complex exponential

25:56 - The complex logarithm

32:32 - 3b1b Talent

33:14 - Constructing the key function

40:16 - The deeper math behind Escher ------------------ These animations are largely made us…

1 month, 2 weeks назад @ youtube.com
Bacteria Grid Puzzle Solution
Bacteria Grid Puzzle Solution Bacteria Grid Puzzle Solution

Part of a monthly series of puzzlers, in collaboration with MoMath and Peter Winkler

1 month, 2 weeks назад @ youtube.com
The most underappreciated formula | Exploring high-dimensional spheres
The most underappreciated formula | Exploring high-dimensional spheres The most underappreciated formula | Exploring high-dimensional spheres

On the volumes of higher-dimensional spheres

Explore the 3b1b virtual career fair: See https://3b1b.co/talent

Become a supporter for early views of new videos: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Thanks to UC Santa Cruz for letting me film there, and special thanks to Pedro Morales-Almazan for arranging everything. My video on Numberphile with a fun application of this problem: https://youtu.be/6_yU9eJ0NxA Timestamps:

0:00 - Introduction

1:01 - Random puzzle

6:16 - Outside the box

14:35 - Setting up the volume grid

21:14 - Why 4πr^2

25:21 - Archimedes in higher dimensions

36:17 - The general formul…

2 months, 1 week назад @ youtube.com
The lattice bacteria puzzle
The lattice bacteria puzzle The lattice bacteria puzzle

Part of a series of monthly puzzles, done in collaboration with MoMath.

https://momath.org/mindbenders

2 months, 2 weeks назад @ youtube.com
Solution to the ladybug clock puzzle
Solution to the ladybug clock puzzle Solution to the ladybug clock puzzle

Solution to last month's probability puzzle.

2 months, 2 weeks назад @ youtube.com
The Hairy Ball Theorem
The Hairy Ball Theorem The Hairy Ball Theorem

Unexpected applications and a beautiful proof.

Looking for a new career? Check out https://3b1b.co/talent

Supporters get early access to new videos: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Credits:

Senia Sheydvasser: Co-writing and sphere deformation animations

Paul Dancstep: Those lovely fluffy sphere animations Vince Rubinetti: Music Timestamps:

0:00 - To comb a hairy ball

1:24 - Applications

8:46 - The puzzle of one null point

12:12 - The proof outline

16:41 - Defining orientation

21:44 - Why inside-out is impossible

25:59 - 3b1b Talent

27:44 - Final food for thought ------------------ These animati…

3 months назад @ youtube.com
The ladybug clock puzzle
The ladybug clock puzzle The ladybug clock puzzle

This is the first in a set of monthly puzzles, curated by Peter Winkler. This one was originally suggested by Richard Stanley. You can sign up to hear his description of the answer at http://momath.org/mindbenders

3 months, 2 weeks назад @ youtube.com
The most absurd product I've made
The most absurd product I've made The most absurd product I've made

Because why not make a pi creature neck pillow?

Available at 3b1b.co/store

5 months, 1 week назад @ youtube.com
How Laplace transforms solve differential equations
How Laplace transforms solve differential equations How Laplace transforms solve differential equations

Studying the forced harmonic oscillator by taking a Laplace transform and studying its poles.

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Chapter on the Laplace Transform:

https://youtu.be/j0wJBEZdwLs Chapter on the S-plane and Simple Harmonic Motion:

https://youtu.be/-j8PzkZ70Lg Timestamps:

0:00 - Opening puzzle

1:06 - Key properties of a Laplace Transform

3:29 - Qualitative analysis with Laplace Transforms

4:29 - The Laplace Transforms of a Derivative

6:06 - The forced oscillator

11:59 - Intuition from the transformed solution

1…

6 months назад @ youtube.com
The dynamics of e^(πi)
The dynamics of e^(πi) The dynamics of e^(πi)

A fuller version of this explanation, also including the reason we care about complex exponents in the first place: https://youtu.be/-j8PzkZ70Lg

6 months, 3 weeks назад @ youtube.com
But what is a Laplace Transform?
But what is a Laplace Transform? But what is a Laplace Transform?

Visualizing the most important tool for differential equations.

Previous chapter: https://youtu.be/-j8PzkZ70Lg

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Artwork by Kurt Bruns Engine animation borrowed with permission from this (excellent) blog: https://ciechanow.ski/internal-combustion-engine/ Timestamps:

0:00 - Understanding the engine

1:16 - Key background ideas

5:41 - Definition and intuition

10:43 - Complex integration

20:43 - Analytic continuation

23:52 - The transform of exponentials

26:15 - A deep look at cos(t)

32:59 - W…

6 months, 3 weeks назад @ youtube.com
The dynamics of e^(πi)
The dynamics of e^(πi) The dynamics of e^(πi)

A fuller version of this explanation, also including the reason we care about complex exponents in the first place: https://youtu.be/-j8PzkZ70Lg

6 months, 3 weeks назад @ youtube.com
Two Minute Papers Two Minute Papers
последний пост 19 часов назад
DeepSeek V4 AI: Crushing The Competition
DeepSeek V4 AI: Crushing The Competition DeepSeek V4 AI: Crushing The Competition

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 Check out DeepSeek here:

https://www.deepseek.com/en/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https://felicia.hu

19 часов назад @ youtube.com
NVIDIA's New AI Turns One Photo Into A World That Never Breaks
NVIDIA's New AI Turns One Photo Into A World That Never Breaks NVIDIA's New AI Turns One Photo Into A World That Never Breaks

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://research.nvidia.com/labs/sil/projects/lyra2/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https:…

3 days, 18 hours назад @ youtube.com
Sakana AI’s God Simulator Is Brilliant
Sakana AI’s God Simulator Is Brilliant Sakana AI’s God Simulator Is Brilliant

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 Try it out! The paper is available here:

https://pub.sakana.ai/digital-ecosystem/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https:…

5 days, 18 hours назад @ youtube.com
This Is Why AI Videos Feel Wrong
This Is Why AI Videos Feel Wrong This Is Why AI Videos Feel Wrong

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://research.nvidia.com/labs/sil/projects/MOTIVE/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https…

1 week, 1 day назад @ youtube.com
NVIDIA’s New AI Changed Robotics Forever
NVIDIA’s New AI Changed Robotics Forever NVIDIA’s New AI Changed Robotics Forever

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://nvlabs.github.io/GEAR-SONIC/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https://felicia.hu #nv…

1 week, 4 days назад @ youtube.com
DeepMind’s New AI: A Gift To Humanity
DeepMind’s New AI: A Gift To Humanity DeepMind’s New AI: A Gift To Humanity

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Links:

https://ai.google.dev/gemma/docs/core/model_card_4 Fine tuning with Matt Mireles: https://x.com/mattmireles/status/2041606508220489786 Other sources:

https://x.com/googlegemma/status/2041256042882105666?s=46

https://x.com/nakazakifam/status/2041286410930446370

https://x.com/measure_plan/status/2039815699695104343

https://x.com/maddiedreese/status/2041677327604838685?s=46

https://x.com/steipete/status/2042615534567457102?s=46

https://x.com/maziyarpanahi/status/2042592050940449260?s=46

https://x.com/adrgrondin/status/2041962263507083340?s=46

https://x.com/evgeniymikholap/status/2041104232648950170

https:…

2 weeks, 6 days назад @ youtube.com
“Anthropic’s New AI Is Too Dangerous To Release”
“Anthropic’s New AI Is Too Dangerous To Release” “Anthropic’s New AI Is Too Dangerous To Release”

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://www.anthropic.com/claude-mythos-preview-system-card Links and sources:

https://debugml.github.io/cheating-agents/

https://x.com/bstnxbt/status/2042967285715865685 Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, T…

3 weeks, 1 day назад @ youtube.com
NVIDIA’s New AI: The Biggest Leap In Robot Learning Yet
NVIDIA’s New AI: The Biggest Leap In Robot Learning Yet NVIDIA’s New AI: The Biggest Leap In Robot Learning Yet

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers 📝 The paper is available here:

https://dreamdojo-world.github.io/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https://felicia.hu …

3 weeks, 4 days назад @ youtube.com
NVIDIA’s New AI: A Revolution...For Free!
NVIDIA’s New AI: A Revolution...For Free! NVIDIA’s New AI: A Revolution...For Free!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The #NVIDIA paper on Nemotron 3 Super is available here:

https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My …

4 weeks, 1 day назад @ youtube.com
Google New TurboQuant AI: Hype vs. Reality
Google New TurboQuant AI: Hype vs. Reality Google New TurboQuant AI: Hype vs. Reality

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The TurboQuant paper is available here:

https://arxiv.org/abs/2504.19874 Reproduction: https://x.com/AlicanKiraz0/status/2038245538865275274

KV-cache source: https://huggingface.co/blog/not-lain/kv-caching Reviews and criticisms of the paper:

https://openreview.net/forum?id=tO3ASKZlok

https://x.com/gaoj0017/status/2037532673812443214 Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fre…

1 month назад @ youtube.com
DeepMind’s New AI Just Changed Science Forever
DeepMind’s New AI Just Changed Science Forever DeepMind’s New AI Just Changed Science Forever

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://arxiv.org/abs/2602.10177 Source:

https://www.youtube.com/watch?v=6evUpgCHtOQ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~z…

1 month, 1 week назад @ youtube.com
The Algorithm That Made Me Cry
The Algorithm That Made Me Cry The Algorithm That Made Me Cry

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Free course on Ray Tracing:

https://users.cg.tuwien.ac.at/zsolnai/gfx/rendering-course/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: ht…

1 month, 1 week назад @ youtube.com
DeepSeek Just Fixed One Of The Biggest Problems With AI
DeepSeek Just Fixed One Of The Biggest Problems With AI DeepSeek Just Fixed One Of The Biggest Problems With AI

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The #DeepSeek paper is available here:

https://github.com/deepseek-ai/Engram

https://arxiv.org/abs/2601.07372 Larry Wheels:

https://www.youtube.com/watch?v=7SM816P5G9s&lc=Ugz7yiDrr_8YD7w8gaN4AaABAg Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Taz…

1 month, 1 week назад @ youtube.com
Honey Is Way More Complex Than You Think
Honey Is Way More Complex Than You Think Honey Is Way More Complex Than You Think

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://xuan-li.github.io/pdf/publications/li2024dynamicduo.pdf Sources:

https://www.youtube.com/watch?v=CfEg7fucVYg Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My rese…

1 month, 3 weeks назад @ youtube.com
NVIDIA’s New AI Just Cracked The Hardest Part Of Self Driving
NVIDIA’s New AI Just Cracked The Hardest Part Of Self Driving NVIDIA’s New AI Just Cracked The Hardest Part Of Self Driving

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://github.com/NVlabs/alpamayo Research panel I will be at GTC:

https://www.nvidia.com/gtc/session-catalog/sessions/gtc26-s81810/ Sources:

https://www.youtube.com/watch?v=0aq4Wi2rsOk

https://www.youtube.com/watch?v=I0yPzZp6dM0 Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundv…

1 month, 4 weeks назад @ youtube.com
DataFest Video DataFest Video
последний пост None
Семинары JetBrains Research Семинары JetBrains Research
последний пост None
Яндекс. Компьютерные науки Яндекс. Компьютерные науки
последний пост 5 days назад
Как Brickit определяет классы деталей LEGO
Как Brickit определяет классы деталей LEGO Как Brickit определяет классы деталей LEGO

Андрей Татаринов, CEO и CTO в Epoch8, рассказал о системе компьютерного зрения для приложения Brickit, которое сканирует множество деталей лего и подсказывает, что из них можно собрать. В докладе Андрей объяснил, как его команда боролась с редкими классами и оптимизировала пайплайн под мобильные устройства. А ещё он поделился MLOps-решениями для масштабируемого дообучения и поддержки модели. Полное видео уже на канале! #Brickit, #Lego, #AI, #Нейросети, #ComputerVision, #DeepLearning, #MobileAI, #Tech, #Стартап, #Будущее, #MachineLearning, #Гаджеты, #Технологии, #WOW, #AIприложение

5 days назад @ youtube.com
С какими данными работает Brickit
С какими данными работает Brickit С какими данными работает Brickit

Андрей Татаринов, CEO и CTO в Epoch8, рассказал о системе компьютерного зрения для приложения Brickit, которое сканирует множество деталей лего и подсказывает, что из них можно собрать. В докладе Андрей объяснил, как его команда боролась с редкими классами и оптимизировала пайплайн под мобильные устройства. А ещё он поделился MLOps-решениями для масштабируемого дообучения и поддержки модели. Полное видео уже на канале! #Brickit, #Lego, #AI, #Нейросети, #ComputerVision, #DeepLearning, #MobileAI, #Tech, #Стартап, #Будущее, #MachineLearning, #Гаджеты, #Технологии, #WOW, #AIприложение

1 week, 2 days назад @ youtube.com
Как работает разметка классов в Brickit
Как работает разметка классов в Brickit Как работает разметка классов в Brickit

Андрей Татаринов, CEO и CTO в Epoch8, рассказал о системе компьютерного зрения для приложения Brickit, которое сканирует множество деталей лего и подсказывает, что из них можно собрать. В докладе Андрей объяснил, как его команда боролась с редкими классами и оптимизировала пайплайн под мобильные устройства. А ещё он поделился MLOps-решениями для масштабируемого дообучения и поддержки модели. Полное видео уже на канале! #Brickit, #Lego, #AI, #Нейросети, #ComputerVision, #DeepLearning, #MobileAI, #Tech, #Стартап, #Будущее, #MachineLearning, #Гаджеты, #Технологии, #WOW, #AIприложение

2 weeks назад @ youtube.com
Возможен ли единый рекомендатель для всех сервисов?
Возможен ли единый рекомендатель для всех сервисов? Возможен ли единый рекомендатель для всех сервисов?

Технически единый рекомендатель для всех сервисов возможен. Но есть две проблемы: данные Музыки, Кинопоиска, Маркета и Афиши плохо обобщаются, а применять такой единый ранкер просто негде — мы же не сравниваем трек с выставкой. Поэтому вместо одной модели — блендеры: каждый сервис ранжирует своё, а потом блоки смешиваются в единую ленту. А вы хотели бы видеть в одной ленте и музыку, и фильмы, и товары? 👇 #единыйрекомендатель #яндекс #яндексмузыка #кинопоиск #маркет #афиша #рекомендательныесистемы #машинноеобучение #ml #блендер #ранжирование

2 weeks, 6 days назад @ youtube.com
Где грань между советом и манипуляцией?
Где грань между советом и манипуляцией? Где грань между советом и манипуляцией?

Любой совет — это в какой-то степени манипуляция. Всё зависит от цели: вы пытаетесь что-то продать или действительно хотите помочь пользователю? Отвечает Даня Бурлаков, руководитель группы рекомендательных продуктов. В этом шортсе говорим об этике рекомендательных систем, о том, где проходит эта тонкая грань и почему честность важнее. А вы чувствуете, когда алгоритм пытается вами манипулировать? Делитесь в комментариях 👇 Смотрите полное видео на канале! #совет #манипуляция #этика #рекомендательныесистемы #искусственныйинтеллект #яндекс #машинноеобучение #ml #этикаии

3 weeks, 3 days назад @ youtube.com
Влияют ли рекомендации на наше поведение?
Влияют ли рекомендации на наше поведение? Влияют ли рекомендации на наше поведение?

Рекомендательные системы — это не просто алгоритмы. Они влияют на наше настроение, а через него — на многое другое. В этом шортсе разбираемся, как музыка и другие рекомендации меняют пользователей и почему одна из наших целей — делать людей счастливее. Рассказывает Даня Бурлаков, руководитель группы рекомендательных продуктов. Смотрите полное видео на канале! А как вы считаете, рекомендации влияют на ваше настроение? Пишите в комментариях 👇 #влияниерекомендаций #поведениепользователей #настроение #рекомендательныесистемы #яндексмузыка #психология #машинноеобучение #ml #какэтоработает

3 weeks, 5 days назад @ youtube.com
Почему рекомендации замыкаются на знакомом контенте?
Почему рекомендации замыкаются на знакомом контенте? Почему рекомендации замыкаются на знакомом контенте?

Что сделать, чтобы рекомендации не надоели пользователю? Как находить музыкальные треки и фильмы, которые будут расширять его интересы? Рассказывает Даня Бурлаков, руководитель группы рекомендательных продуктов. Смотрите полное видео на канале! #эффектпузыря #фильтрпузыря #разнообразие #рекомендательныесистемы #яндексмузыка #машинноеобучение #трансформеры #ml #новоемузыка

4 weeks, 1 day назад @ youtube.com
«Я хочу всё выбросить и сделать заново» — почему это так больно
«Я хочу всё выбросить и сделать заново» — почему это так больно «Я хочу всё выбросить и сделать заново» — почему это так больно

Даня Бурлаков, руководитель группы рекомендательных продуктов, рассказал, что мешает внедрить ML-модели в привычный флоу кандидат-генераций и ранкеров. Смотрите полное видео на канале! #яндекс #генеративныемодели #генеративныйии #рекомендательныесистемы #машинноеобучение #ml #argus #catboost #внедрение #продакшен #mldevelopment

1 month назад @ youtube.com
Заметны ли изменения в рекомендациях Яндекс Музыки?
Заметны ли изменения в рекомендациях Яндекс Музыки? Заметны ли изменения в рекомендациях Яндекс Музыки?

Даня Бурлаков, руководитель группы рекомендательных продуктов, рассказал, как команда внедряет новые фичи и какой эффект они приносят. Смотрите полное видео на канале! #яндекс #яндексмузыка #рекомендации #рекомендательныесистемы #машинноеобучение #ml #аргус #argus #трансформеры #ранжирование

1 month назад @ youtube.com
Визуально-текстовая омни-модель: путь к объединению LLM и VLM / Роман Исаченко
Визуально-текстовая омни-модель: путь к объединению LLM и VLM / Роман Исаченко Визуально-текстовая омни-модель: путь к объединению LLM и VLM / Роман Исаченко

На Saturday ML Party Роман Исаченко, руководитель группы анализа изображений в Яндекс R&D, рассказал, как выглядел долгий путь к сведению LLM и VLM из части семейства Alice AI в единую омни-модель. Она умеет работать с текстом и изображениями в одном контуре. А ещё поделился ключевыми этапами, компромиссами и планами по развитию модели в ближайшем будущем. ➡️ Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/+owyCvdge8WIyNTUy #AI, #MachineLearning, #LLM, #GenAI, #AIAgents, #RAG, #MLOps, #DataScience, #DeepLearning, #AIEngineering, #NeuralNetworks, #ComputerVision, #NLP, #TechTalk, #AIConference

1 month назад @ youtube.com
Function calling без реальных данных / Ольга Цымбой и Рамиль Латыпов
Function calling без реальных данных / Ольга Цымбой и Рамиль Латыпов Function calling без реальных данных / Ольга Цымбой и Рамиль Латыпов

Обучение языковых моделей взаимодействовать с инструментами упирается в дефицит данных. Открытые датасеты ограничены по тематикам, содержат мало сложных сценариев и практически не встречаются на русском языке. На Saturday ML Party коллеги из Т-Банка Ольга Цымбой, старший исследователь-разработчик, и Рамиль Латыпов, исследователь-разработчик, рассказали, как они построили полностью синтетический пайплайн генерации function calling данных. А также разобрали шаги обучения и показали, как этот подход позволил прирастить качество на специализированных бенчмарках. ➡️ Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/+owyCvdge8WIyNTUy #AI, #MachineLearning, #LLM, #GenAI, #AI…

1 month назад @ youtube.com
LLM в рекомендациях: теперь мы знаем почти всё о стиле жизни и вкусах покупателя / Владислав Уржумов
LLM в рекомендациях: теперь мы знаем почти всё о стиле жизни и вкусах покупателя / Владислав Уржумов LLM в рекомендациях: теперь мы знаем почти всё о стиле жизни и вкусах покупателя / Владислав Уржумов

Как узнать интересы каждого покупателя, если есть только история действий на Маркете и немного метаинформации? На Saturday ML Party Владислав Уржумов, разработчик группы анализа данных и ML для рекомендаций в Яндекс Маркете, рассказал про вариант такого подхода от коллег из китайского маркетплейса. Внутри: как удалось адаптировать метод под наши данные и пользователей, а заодно прирастить метрики. ➡️ Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/+owyCvdge8WIyNTUy #ai , #MachineLearning, #LLM, #GenAI, #AIAgents, #RAG, #MLOps, #DataScience, #DeepLearning, #AIEngineering, #NeuralNetworks, #ComputerVision, #NLP, #TechTalk, #aiconference

1 month, 1 week назад @ youtube.com
RAG-системы сегодня: архитектуры, качество и наши кейсы / Андрей Соколов
RAG-системы сегодня: архитектуры, качество и наши кейсы / Андрей Соколов RAG-системы сегодня: архитектуры, качество и наши кейсы / Андрей Соколов

RAG заметно изменилась — теперь это полноценная система с разными уровнями обработки, оценкой качества и настройкой моделей под задачу, а не простая связка поиска и генерации. На Saturday ML Party Андрей Соколов, руководитель команды обучения моделей с внешним контекстом в Яндекс R&D, поделился продуктовыми кейсами и разобрал на их примере, как такие системы устроены сегодня, какие подходы работают на практике и когда стоит дообучать модель. ➡️ Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/+owyCvdge8WIyNTUy #ai #MachineLearning, #LLM, #GenAI, #AIAgents, #RAG, #MLOps, #DataScience, #DeepLearning, #AIEngineering, #NeuralNetworks, #ComputerVision, #NLP, #TechTalk, #A…

1 month, 1 week назад @ youtube.com
Мультиагентные системы банковского сектора / Артём Хусаенов
Мультиагентные системы банковского сектора / Артём Хусаенов Мультиагентные системы банковского сектора / Артём Хусаенов

На Saturday ML Party Артём Хусаенов, CDS платформы цифровых ассистентов в Сбербанке, объяснил, как замерить качество агентов там, где автоматические оценки не дают прозрачного результата, LLM-as-a-judge не работает, а аутсорс-разметка по инструкциям не отображает реальности. А также рассказал, как перестроить продукт без потери пользовательского опыта. ➡️ Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/+owyCvdge8WIyNTUy #ai, #MachineLearning, #LLM, #GenAI, #AIAgents, #RAG, #MLOps, #DataScience, #DeepLearning, #AIEngineering, #NeuralNetworks, #ComputerVision, #NLP, #TechTalk, #AIConference

1 month, 1 week назад @ youtube.com
MarketAI для продавцов: с нуля до мультиагентной системы / Владислав Вихров
MarketAI для продавцов: с нуля до мультиагентной системы / Владислав Вихров MarketAI для продавцов: с нуля до мультиагентной системы / Владислав Вихров

В мультиагентных системах мы экипируем LLM мощными инструментами, даём моделям возможность вызывать различные функции и общаться между собой. Но не всё так просто: на практике данных не хватает, запрос пользователя не всегда очевиден, а похожих кейсов мало. На Saturday ML Party Владислав Вихров, ML-разработчик в Яндекс Маркете, разобрал все эти сложности и поделился конкретными решениями, которые позволили их преодолеть. ➡️ Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/+owyCvdge8WIyNTUy #ai, #machinelearning , #LLM, #GenAI, #AIAgents, #RAG, #MLOps, #DataScience, #DeepLearning, #AIEngineering, #NeuralNetworks, #ComputerVision, #NLP, #TechTalk, #AIConference

1 month, 1 week назад @ youtube.com
ML Trainings ML Trainings
последний пост 3 days, 4 hours назад
Корпоративная шизофрения Google
Корпоративная шизофрения Google Корпоративная шизофрения Google 3 days, 4 hours назад @ youtube.com
Когда вера в пирамиду начинает трещать
Когда вера в пирамиду начинает трещать Когда вера в пирамиду начинает трещать 3 days, 4 hours назад @ youtube.com
Киберпанк Невозможно
Киберпанк Невозможно Киберпанк Невозможно 3 days, 4 hours назад @ youtube.com
Валентин Малых подозревает, что Илон Маск был прав
Валентин Малых подозревает, что Илон Маск был прав Валентин Малых подозревает, что Илон Маск был прав 3 days, 4 hours назад @ youtube.com
Валентин Малых о росте стоимости подписок
Валентин Малых о росте стоимости подписок Валентин Малых о росте стоимости подписок 3 days, 4 hours назад @ youtube.com
Сomputer vision: применение на примере параболы
Сomputer vision: применение на примере параболы Сomputer vision: применение на примере параболы 3 days, 4 hours назад @ youtube.com
Капитанский мостик №17: ЦОД размером с Юту | ИИ дороже людей | пописай для ИИ
Капитанский мостик №17: ЦОД размером с Юту | ИИ дороже людей | пописай для ИИ Капитанский мостик №17: ЦОД размером с Юту | ИИ дороже людей | пописай для ИИ

0:00:00 введение

0:01:15 DeepSeek и Ascend 0:10:57 ЦОД размером с Юту

0:19:35 OpenAI и падение акций

0:22:28 Китай против покупки Manus

0:30:48 ChatGPT c рекламой

0:38:38 Cohere купил Aleph Alpha

0:44:52 закон про ИИ смягчили

0:47:50 Google и Anthropic

0:53:07 Tencent, Alibaba и DeepSeek

0:55:38 ИИ дороже людей

1:00:38 OpenAI и Microsoft

1:05:02 Газпромнефть и беспилотники

1:09:54 Groq быстрее Nvidia

1:12:45 ИИ спроектировал чип

1:17:46 пописай для ИИ ИИ-саммари:

Обсуждение последних новостей в области технологий, включая запуск новых моделей AI, развитие китайского рынка чипов и геополитические аспекты технологического бизнеса. Обсуждение текущих трендов в области искусственного интеллекта…

4 days, 3 hours назад @ youtube.com
Стартап переворачивает парадигму взаимодействия людей с ИИ агентами
Стартап переворачивает парадигму взаимодействия людей с ИИ агентами Стартап переворачивает парадигму взаимодействия людей с ИИ агентами 1 week, 3 days назад @ youtube.com
Дмитрий Колодезев отказывается от идеи съемки
Дмитрий Колодезев отказывается от идеи съемки Дмитрий Колодезев отказывается от идеи съемки 1 week, 3 days назад @ youtube.com
Дмитрий и Валентин обсуждают расходы на ИИ
Дмитрий и Валентин обсуждают расходы на ИИ Дмитрий и Валентин обсуждают расходы на ИИ 1 week, 3 days назад @ youtube.com
Димитров о фашизме и технофашистах
Димитров о фашизме и технофашистах Димитров о фашизме и технофашистах 1 week, 3 days назад @ youtube.com
Биологическое оружие будущего: вирусы с генетической направленностью
Биологическое оружие будущего: вирусы с генетической направленностью Биологическое оружие будущего: вирусы с генетической направленностью 1 week, 3 days назад @ youtube.com
Валентин Малых говорит о ChatGPT и распаде цивилизации
Валентин Малых говорит о ChatGPT и распаде цивилизации Валентин Малых говорит о ChatGPT и распаде цивилизации 1 week, 3 days назад @ youtube.com
Капитанский мостик №15: Сладкий Mythos | Манифест Palantir | Агент заплатит тебе
Капитанский мостик №15: Сладкий Mythos | Манифест Palantir | Агент заплатит тебе Капитанский мостик №15: Сладкий Mythos | Манифест Palantir | Агент заплатит тебе

0:00:00 Начало

0:00:48 Сладкий Mythos

0:07:03 Claude Code за 20$

0:10:06 SpaceX и Cursor

0:15:42 Вышла GPT5.5

0:24:20 Вышел DeepSeek-V4

0:31:00 Манифест Palantir

0:42:33 ИИ-саботаж

0:49:43 Claude и геном

0:55:55 Ozon и электроника

1:03:53 Новые TPU от Google

1:11:11 Google против Claude

1:15:10 Агент заплатит тебе ИИ-саммари:

Обсуждение последних новостей в области искусственного интеллекта, включая мифос, модели Anthropic, SpaceX и стратегию Илона Маска, а также анализ текущих трендов и перспектив развития технологий. Обсуждение последних достижений в области ИИ, контекстных моделей и их применения в бизнесе и безопасности. Анализ технологий, их потенциала и рисков, связанных с биологическ…

1 week, 4 days назад @ youtube.com
Что будет, если заменить китайскую бабушку на ChatGPT
Что будет, если заменить китайскую бабушку на ChatGPT Что будет, если заменить китайскую бабушку на ChatGPT 2 weeks, 2 days назад @ youtube.com
Primer Primer
последний пост 4 months назад
Taking AI Doom Seriously For 62 Minutes
Taking AI Doom Seriously For 62 Minutes Taking AI Doom Seriously For 62 Minutes

Patreon: https://www.patreon.com/primerlearning

80,000 Hours: 80000hours.org/primer https://www.desmos.com/calculator/a5pfjtr4tr Other connections:

Discord: https://discord.gg/NbruaNW

Twitch: https://www.twitch.tv/justin_helps

Store: https://store.dftba.com/collections/primer Reddit: https://www.reddit.com/r/primerlearning/

Bsky: https://bsky.app/profile/justinhelps.bsky.social

Twitter: https://twitter.com/primerlearning Links to other resources:

https://yoshuabengio.org/2024/07/09/reasoning-through-arguments-against-taking-ai-safety-seriously/

https://www.youtube.com/c/robertmilesai

https://www.youtube.com/@Siliconversations

https://www.youtube.com/@Go-Meta

https://www.youtube.com/@Dwarkes…

4 months назад @ youtube.com
Simulating a single brain cell
Simulating a single brain cell Simulating a single brain cell

Patreon:

https://www.patreon.com/primerlearning Helpful resources if you want to learn more about neural networks

https://www.youtube.com/@AndrejKarpathy

https://course.fast.ai/

https://www.youtube.com/@WelchLabsVideo

https://www.youtube.com/@3blue1brown Early papers. These probably aren't helpful for understanding the concepts in this video, but if you're interested in history.

The Perceptron – A perceiving and recognizing automaton: https://bpb-us-e2.wpmucdn.com/websites.umass.edu/dist/a/27637/files/2016/03/rosenblatt-1957.pdf

The Perceptron: A probabilistic model for information storage and organization in the brain: https://www.ling.upenn.edu/courses/cogs501/Rosenblatt1958.pdf A Logical…

7 months, 1 week назад @ youtube.com
🎧 Podcasts
Lex Fridman AI Podcast Lex Fridman AI Podcast
последний пост 13 часов назад
#496 – FFmpeg: The Incredible Technology Behind Video on the Internet
#496 – FFmpeg: The Incredible Technology Behind Video on the Internet #496 – FFmpeg: The Incredible Technology Behind Video on the Internet

Jean-Baptiste Kempf is lead developer of VLC and president of VideoLAN.

Kieran Kunhya is a longtime FFmpeg contributor, codec engineer, and the person behind the now-infamous FFmpeg account on X.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep496-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://larridin.comBlitzy: AI agent for large enterprise codebases.

Go to https://perplexity.ai/OUTLINE:(00:00) – Introduction(03:00) – Sponsors, Comments, and Reflections(10:48) – Weirdest things VLC opens(15:12) – How video playback works(24:33) – Video codecs and containers(35:20) – FFmpeg explained(56:20)…

13 часов назад @ lexfridman.com
#495 – Vikings, Ragnar, Berserkers, Valhalla & the Warriors of the Viking Age
#495 – Vikings, Ragnar, Berserkers, Valhalla & the Warriors of the Viking Age #495 – Vikings, Ragnar, Berserkers, Valhalla & the Warriors of the Viking Age

Lars Brownworth is a historian, teacher, podcaster, and author specializing in Viking history, medieval Europe, and the Byzantine Empire.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep495-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://larridin.comBetterHelp: Online therapy and counseling.

Go to https://drinkLMNT.com/lexFin: AI agent for customer service.

Go to https://perplexity.ai/OUTLINE:(00:00) – Introduction(01:03) – Sponsors, Comments, and Reflections(08:57) – The start of the Viking Age(18:50) – Viking military strategy, tactics & technology(32:33) – Ragnar Lothbrok(42:00) – The Grea…

3 weeks, 6 days назад @ lexfridman.com
#494 – Jensen Huang: NVIDIA – The $4 Trillion Company & the AI Revolution
#494 – Jensen Huang: NVIDIA – The $4 Trillion Company & the AI Revolution #494 – Jensen Huang: NVIDIA – The $4 Trillion Company & the AI Revolution

Jensen Huang is the co-founder and CEO of NVIDIA, the world’s most valuable company and the engine powering the AI computing revolution.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep494-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://drinkLMNT.com/lexFin: AI agent for customer service.

Go to https://quo.com/lexOUTLINE:(00:00) – Introduction(00:26) – Sponsors, Comments, and Reflections(06:34) – Extreme co-design and rack-scale engineering(09:20) – How Jensen runs NVIDIA(28:41) – AI scaling laws(43:41) – Biggest blockers to AI scaling laws(45:25) – Supply chain(47:20) – Memory(53:25) – Power…

1 month, 2 weeks назад @ lexfridman.com
#493 – Jeff Kaplan: World of Warcraft, Overwatch, Blizzard, and Future of Gaming
#493 – Jeff Kaplan: World of Warcraft, Overwatch, Blizzard, and Future of Gaming #493 – Jeff Kaplan: World of Warcraft, Overwatch, Blizzard, and Future of Gaming

Jeff Kaplan is a legendary Blizzard game designer of World of Warcraft and Overwatch, now preparing to launch a new game, The Legend of California, from his new studio Kintsugiyama – available to wishlist on Steam today, with alpha later in March.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep493-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://fin.ai/lexBlitzy: AI agent for large enterprise codebases.

Go to https://blitzy.com/lexBetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lexShopify: Sell stuff online.

1 month, 3 weeks назад @ lexfridman.com
#492 – Rick Beato: Greatest Guitarists of All Time, History & Future of Music
#492 – Rick Beato: Greatest Guitarists of All Time, History & Future of Music #492 – Rick Beato: Greatest Guitarists of All Time, History & Future of Music

Rick Beato is a music educator, interviewer, producer, songwriter, and a true multi-instrument musician, playing guitar, bass, cello & piano.

His incredible YouTube channel celebrates great musicians & musical ideas, and helps millions of people fall in love with great music all over again.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep492-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lexBetterHelp: Online therapy and counseling.

Go to https://drinkLMNT.com/lexFin: AI agent for customer service.

2 months, 1 week назад @ lexfridman.com
#491 – OpenClaw: The Viral AI Agent that Broke the Internet – Peter Steinberger
#491 – OpenClaw: The Viral AI Agent that Broke the Internet – Peter Steinberger #491 – OpenClaw: The Viral AI Agent that Broke the Internet – Peter Steinberger

Peter Steinberger is the creator of OpenClaw, an open-source AI agent framework that’s the fastest-growing project in GitHub history.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep491-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://coderabbit.ai/lexFin: AI agent for customer service.

Go to https://fin.ai/lexBlitzy: AI agent for large enterprise codebases.

Go to https://drinkLMNT.com/lexOUTLINE:(00:00) – Introduction(03:51) – Sponsors, Comments, and Reflections(15:29) – OpenClaw origin story(18:48) – Mind-blowing moment(28:15) – Why OpenClaw went viral(32:12) – Self-modifying AI agent(36:57)…

2 months, 3 weeks назад @ lexfridman.com
#490 – State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI
#490 – State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI #490 – State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI

Nathan Lambert and Sebastian Raschka are machine learning researchers, engineers, and educators.

Sebastian Raschka is the author of Build a Large Language Model (From Scratch) and Build a Reasoning Model (From Scratch).

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep490-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

(25:11) – ChatGPT vs Claude vs Gemini vs Grok: Who is winning?

(36:11) – Best AI for coding(43:02) – Open Source vs Closed Source LLMs(54:41) – Transformers: Evolution of LLMs since 2019(1:02:38) – AI Scaling Laws: Are they dead or still holding?

3 months назад @ lexfridman.com
#489 – Paul Rosolie: Uncontacted Tribes in the Amazon Jungle
#489 – Paul Rosolie: Uncontacted Tribes in the Amazon Jungle #489 – Paul Rosolie: Uncontacted Tribes in the Amazon Jungle

Paul Rosolie is a naturalist, explorer, author of a new book titled Junglekeeper, and is someone who has dedicated his life to protecting the Amazon rainforest.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep489-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://perplexity.ai/BetterHelp: Online therapy and counseling.

Go to https://fin.ai/lexMiro: Online collaborative whiteboard platform.

Go to https://miro.com/MasterClass: Online classes from world-class experts.

3 months, 3 weeks назад @ lexfridman.com
#488 – Infinity, Paradoxes that Broke Mathematics, Gödel Incompleteness & the Multiverse – Joel David Hamkins
#488 – Infinity, Paradoxes that Broke Mathematics, Gödel Incompleteness & the Multiverse – Joel David Hamkins #488 – Infinity, Paradoxes that Broke Mathematics, Gödel Incompleteness & the Multiverse – Joel David Hamkins

Joel David Hamkins is a mathematician and philosopher specializing in set theory, the foundations of mathematics, and the nature of infinity, and he’s the #1 highest-rated user on MathOverflow.

He is also the author of several books, including Proof and the Art of Mathematics and Lectures on the Philosophy of Mathematics.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep488-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://masterclass.com/lexpodOUTLINE:(00:00) – Introduction(01:58) – Sponsors, Comments, and Reflections(15:40) – Infinity & paradoxes(1:02:50) – Russell’s paradox(1:15:57) – Gödel’s…

4 months назад @ lexfridman.com
#487 – Irving Finkel: Deciphering Secrets of Ancient Civilizations & Flood Myths
#487 – Irving Finkel: Deciphering Secrets of Ancient Civilizations & Flood Myths #487 – Irving Finkel: Deciphering Secrets of Ancient Civilizations & Flood Myths

Irving Finkel is a scholar of ancient languages and a longtime curator at the British Museum, renowned for his expertise in Mesopotamian history and cuneiform writing.

He specializes in reading and interpreting cuneiform inscriptions, including tablets from Sumerian, Akkadian, Babylonian, and Assyrian contexts.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep487-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lexMiro: Online collaborative whiteboard platform.

Go to https://miro.com/Chevron: Reliable energy for data centers.

4 months, 3 weeks назад @ lexfridman.com
#486 – Michael Levin: Hidden Reality of Alien Intelligence & Biological Life
#486 – Michael Levin: Hidden Reality of Alien Intelligence & Biological Life #486 – Michael Levin: Hidden Reality of Alien Intelligence & Biological Life

Michael Levin is a biologist at Tufts University working on novel ways to understand and control complex pattern formation in biological systems.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep486-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lexMiro: Online collaborative whiteboard platform.

Go to https://miro.com/MasterClass: Online classes from world-class experts.

(2:42:41) – Mind uploading(3:01:22) – Alien intelligence(3:16:17) – Advice for young people(3:22:46) – Questions for AGI

5 months, 1 week назад @ lexfridman.com
#485 – David Kirtley: Nuclear Fusion, Plasma Physics, and the Future of Energy
#485 – David Kirtley: Nuclear Fusion, Plasma Physics, and the Future of Energy #485 – David Kirtley: Nuclear Fusion, Plasma Physics, and the Future of Energy

David Kirtley is a nuclear fusion engineer and CEO of Helion Energy, a company working on building the world's first commercial fusion power plant by 2028.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep485-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc. Transcript:

https://lexfridman.com/david-kirtley-transcript CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: https://lexfridman.com/contact EPISODE LINKS:

David's X: htt…

5 months, 2 weeks назад @ lexfridman.com
#484 – Dan Houser: GTA, Red Dead Redemption, Rockstar, Absurd & Future of Gaming
#484 – Dan Houser: GTA, Red Dead Redemption, Rockstar, Absurd & Future of Gaming #484 – Dan Houser: GTA, Red Dead Redemption, Rockstar, Absurd & Future of Gaming

Dan Houser is co-founder of Rockstar Games and is a legendary creative mind behind Grand Theft Auto (GTA) and Red Dead Redemption series of video games.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep484-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://box.com/aiUPLIFT Desk: Standing desks and office ergonomics.

Go to https://drinkLMNT.com/lexOUTLINE:(00:00) – Introduction(01:29) – Sponsors, Comments, and Reflections(11:32) – Greatest films of all time(23:45) – Making video games(26:36) – GTA 3(29:55) – Open world video games(32:42) – Character creation(36:09) – Superintelligent AI in A Bette…

6 months, 1 week назад @ lexfridman.com
#483 – Julia Shaw: Criminal Psychology of Murder, Serial Killers, Memory & Sex
#483 – Julia Shaw: Criminal Psychology of Murder, Serial Killers, Memory & Sex #483 – Julia Shaw: Criminal Psychology of Murder, Serial Killers, Memory & Sex

Julia Shaw is a criminal psychologist and author who in her books explores human nature, including psychopathy, violent crime, the psychology of evil, police interrogation, false memory manipulation, deception detection, and human sexuality.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep483-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lexBetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lexLMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lexAG1: All-in-one daily nutrition drink.

6 months, 3 weeks назад @ lexfridman.com
#482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature
#482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature #482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature

Pavel Durov is the founder and CEO of Telegram.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep482-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Transcript:https://lexfridman.com/pavel-durov-transcriptCONTACT LEX:Feedback – give feedback to Lex: https://lexfridman.com/surveyAMA – submit questions, videos or call-in: https://lexfridman.com/amaHiring – join our team: https://lexfridman.com/hiringOther – other ways to get in touch: https://lexfridman.com/contactEPISODE LINKS:Pavel’s Telegram: https://t.me/durovPavel’s X: https://x.com/durovTelegram: https://telegram.org/Telegram Contests: https://contest.c…

7 months, 1 week назад @ lexfridman.com
Microsoft Research Podcast Microsoft Research Podcast
последний пост 2 weeks, 2 days назад
Can we AI our way to a more sustainable world?
Can we AI our way to a more sustainable world? Can we AI our way to a more sustainable world?

Because I do think there’s a role for AI, a huge role for AI.

BURGER: Right, right.

BURGER: Right, right.

So I think that’s also something quite important here that, you know, AI can help facilitate.

And I think that’s not just applying AI to solve solutions through optimization but also thinking about this in an integrated way.

2 weeks, 2 days назад @ microsoft.com
Ideas: Steering AI toward the work future we want
Ideas: Steering AI toward the work future we want Ideas: Steering AI toward the work future we want

JANSSEN: Yeah, yeah, exactly.

TEEVAN: Yeah, yeah, yeah.

I’m curious what you have found particularly surprising about how people and organizations are leveraging AI right now.

And so I do like to picture a future of work where humans are flourishing with AI and where humans still get to do meaningful work.

And I’m very curious about how we can take advantage of AI and do more without running ourselves into the ground because we’re not AI, right?

3 weeks, 6 days назад @ microsoft.com
Will machines ever be intelligent?
Will machines ever be intelligent? Will machines ever be intelligent?

And the question we’re going to discuss is, are machines intelligent?

No, no, that’s right, that’s right.

I mean, in some sense, you could potentially have a super intelligent system, right, that’s far more intelligent than anything else on the planet.

BURGER: Right, right.

At the same time, I think, you know, transformers are not intelligent in the way that a three-year-old is, right?

1 month, 2 weeks назад @ microsoft.com
Trailer: The Shape of Things to Come
Trailer: The Shape of Things to Come Trailer: The Shape of Things to Come

Join Microsoft’s Doug Burger and guests as they dig into the fundamental truths about AI and how it will reshape the future.

Technical advances are moving at such a rapid pace that it can be challenging to define the tomorrow we’re working toward.

In The Shape of Things to Come, Microsoft research leader Doug Burger and experts from across disciplines tease out the thorniest AI issues facing technologists, policymakers, business decision-makers, and other stakeholders today.

It’s important to understand what the emerging shapes are and how we should respond.” – Doug Burger, Technical Fellow and Corporate Vice President, Microsoft ResearchAbout Doug BurgerDoug Burger is a research leader in …

2 months назад @ microsoft.com
Ideas: Community building, machine learning, and the future of AI
Ideas: Community building, machine learning, and the future of AI Ideas: Community building, machine learning, and the future of AI

This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS.

In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

So around that time when I started my PhD at Penn, I was working in machine learning theory and algorithmic economics.

How had you experienced a lack of community or network of women in machine learning before the founding of WiML?

So particularly when working on topics related to fairness, I’ve ended up focusing a bunch on stuff to do with marginalized groups as part of my responsible AI work.

5 months назад @ microsoft.com
Ideas: More AI-resilient biosecurity with the Paraphrase Project
Ideas: More AI-resilient biosecurity with the Paraphrase Project Ideas: More AI-resilient biosecurity with the Paraphrase Project

Today, I’m excited to talk about the Paraphrase Project, an effort I co-led exploring how advances in AI tools for protein design might impact biosecurity.

These “patches,” akin to those in cybersecurity, have now been shared with organizations globally to strengthen biosecurity screening.

The project highlights that the same AI tools capable of incredible good can also be misused, requiring us to be vigilant, thoughtful, and creative so we continue to get the most benefit out of AI tools while working to ensure that we avoid costly misuses.

So things like, how similar is this to that template, wild-type protein structure that we used as our conditioning information?

But I feel like broadly…

7 months назад @ microsoft.com
Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education
Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education

KOHANE: So I think you’ve “nerd sniped” me because you [LAUGHTER]—which is all too easy—but I think there’s a central issue here.

But I actually think this is dark matter of human organizational technology that is not well understood.

AZEEM AZHAR: We didn’t talk about, you know, AI in its ability to potentially do this, which is to extend the clinician’s presence throughout the week.

And so I think there’s always going to be an opening for either differences of opinion or agreeing with you too much.

And this gets into whether AI is really going to get almost to the ab initio understanding of human biology.

8 months, 2 weeks назад @ microsoft.com
Reimagining healthcare delivery and public health with AI
Reimagining healthcare delivery and public health with AI Reimagining healthcare delivery and public health with AI

We are sorry, the page you requested cannot be found.

The page you are looking for could not be found or is no longer available.

9 months назад @ microsoft.com
Navigating medical education in the era of generative AI
Navigating medical education in the era of generative AI Navigating medical education in the era of generative AI

Prior to med school, Daniel pursued experiences that cultivated his interest in the application of AI in medical practice and education.

Really, really looking forward to this chat.

There’s AI before ChatGPT and before, you know, generative AI really became a big thing, and then afterwards.

And then after we talk about what’s really happening, what do you think should happen in medical education given the reality of generative AI?

And I do agree [that] AI really gives us real hope that we can make it true.

9 months, 2 weeks назад @ microsoft.com
AI Testing and Evaluation: Reflections
AI Testing and Evaluation: Reflections AI Testing and Evaluation: Reflections

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

We have examples, like the pharmaceutical or medical device industry experts with whom you spoke, that’s really, you know, testing … there is a pre-deployment requirement.

And the third is just how rigid versus adaptive these testing and evaluation regimes or frameworks are in these different domains.

I really agree that there has been a lot of emphasis to date on, sort of, testing models upstream, the AI model evaluation.

You know, I think there’s been real progress already in the AI evaluation and testing ecosystem in the public-private partnership context.

9 months, 2 weeks назад @ microsoft.com
AI Testing and Evaluation: Learnings from cybersecurity
AI Testing and Evaluation: Learnings from cybersecurity AI Testing and Evaluation: Learnings from cybersecurity

Absolutely, I really, really was.

As a principal director on the Microsoft AI Red Team, Tori leads all AI security and safety red team operations, as well as dangerous capability testing, to directly inform C-suite decision-makers.

This year, we’ve pulled a lot of those assets and insights into the Azure [AI] Foundry AI Red Teaming Agent (opens in new tab).

So you can get a little taste of what we do day to day in the AI Red Teaming Agent.

WESTERHOFF: I think the most important takeaway from those lessons is that AI security is truly a team sport.

9 months, 3 weeks назад @ microsoft.com
NLP Highlights NLP Highlights
последний пост None
Data Skeptic
последний пост 5 days, 20 hours назад
Student Spotlight: Aaron Payne, Data Analyst
Student Spotlight: Aaron Payne, Data Analyst Student Spotlight: Aaron Payne, Data Analyst

Aaron Payne, an MBA student at Georgia Tech studying business analytics and a Senior Insights Analyst at Chick-fil-A, joins Kyle Polich to talk about turning analytics into decisions that matter. They unpack a real-world forecasting project with Comfama in Colombia, including messy data realities, interpretability tradeoffs, and why "data science for good" starts with the people impacted.

5 days, 20 hours назад @ dataskeptic.com
The Future is Agentic in Recommender Systems
The Future is Agentic in Recommender Systems The Future is Agentic in Recommender Systems

Kyle Polich sits down with Yashar Deldjoo, research scientist and Associate Professor at the Polytechnic University of Bari, to explore how recommender systems have evolved and why trustworthiness matters. They unpack key dimensions of responsible AI, including robustness to adversarial attacks, privacy, explainability, and fairness, and discuss how LLMs introduce new risks like hallucinations. The episode closes with a look at "agentic" recommender systems, where tools and memory shift recommendations from ranked lists to end-to-end task completion.

1 week, 4 days назад @ dataskeptic.com
Book Ratings and Recommendations
Book Ratings and Recommendations Book Ratings and Recommendations

Goodreads star ratings can be misleading as measures of "book quality," and research from Hannes Rosenbusch suggests that for many professionally published books, differences between readers often matter more than differences between books. The episode also explores how to model reader preferences, why reviews often reveal more about the reviewer than the text, and how LLMs can aid computational literary research while still falling short of human editors in creative writing.

1 month, 1 week назад @ dataskeptic.com
Disentanglement and Interpretability in Recommender Systems
Disentanglement and Interpretability in Recommender Systems Disentanglement and Interpretability in Recommender Systems 1 month, 3 weeks назад @ dataskeptic.com
Collective Altruism in Recommender Systems
Collective Altruism in Recommender Systems Collective Altruism in Recommender Systems

Ekaterina (Kat) Filadova from MIT EECS joins us to discuss strategic learning in recommender systems—what happens when users collectively coordinate to game recommendation algorithms. Kat's research reveals surprising findings: algorithmic "protest movements" can paradoxically help platforms by providing clearer preference signals, and the challenge of distinguishing coordinated behavior from bot activity is more complex than it appears. This episode explores the intersection of machine learning and game theory, examining what happens when your training data actively responds to your algorithm.

2 months, 1 week назад @ dataskeptic.com
Niche vs Mainstream
Niche vs Mainstream Niche vs Mainstream

Anas Buhayh discusses multi-stakeholder fairness in recommender systems and the S'mores framework—a simulation allowing users to choose between mainstream and niche algorithms. His research shows specialized recommenders improve utility for niche users while raising questions about filter bubbles and data privacy.

2 months, 2 weeks назад @ dataskeptic.com
Healthy Friction in Job Recommender Systems
Healthy Friction in Job Recommender Systems Healthy Friction in Job Recommender Systems

In this episode, host Kyle Polich speaks with Roan Schellingerhout, a fourth-year PhD student at Maastricht University, about explainable multi-stakeholder recommender systems for job recruitment. Roan discusses his research on creating AI-powered job matching systems that balance the needs of multiple stakeholders—job seekers, recruiters, HR professionals, and companies. The conversation explores different types of explanations for job recommendations, including textual, bar chart, and graph-based formats, with findings showing that lay users strongly prefer simple textual explanations over more technical visualizations. Roan shares insights from his "healthy friction" study, which tested …

3 months назад @ dataskeptic.com
Fairness in PCA-Based Recommenders
Fairness in PCA-Based Recommenders Fairness in PCA-Based Recommenders

In this episode, we explore the fascinating world of recommender systems and algorithmic fairness with David Liu, Assistant Research Professor at Cornell University's Center for Data Science for Enterprise and Society. David shares insights from his research on how machine learning models can inadvertently create unfairness, particularly for minority and niche user groups, even without any malicious intent. We dive deep into his groundbreaking work on Principal Component Analysis (PCA) and collaborative filtering, examining why these fundamental techniques sometimes fail to serve all users equally. David introduces the concept of "power niche users" - highly active users with specialized in…

3 months, 1 week назад @ dataskeptic.com
Video Recommendations in Industry
Video Recommendations in Industry Video Recommendations in Industry

In this episode, Kyle Polich sits down with Cory Zechmann, a content curator working in streaming television with 16 years of experience running the music blog "Silence Nogood." They explore the intersection of human curation and machine learning in content discovery, discussing the concept of "algatorial" curation—where algorithms and editorial expertise work together. Key topics include the cold start problem, why every metric is just a "proxy metric" for what users actually want, the challenge of filter bubbles, and the importance of balancing familiarity with discovery. Cory shares insights on why TikTok's algorithm works so well (clean data and massive interaction volume), the crucial …

4 months, 1 week назад @ dataskeptic.com
Eye Tracking in Recommender Systems
Eye Tracking in Recommender Systems Eye Tracking in Recommender Systems

In this episode, Santiago de Leon takes us deep into the world of eye tracking and its revolutionary applications in recommender systems. As a researcher at the Kempelin Institute and Brno University, Santiago explains the mechanics of eye tracking technology—how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the groundbreaking RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research, which opens new possibilities for understanding how users interact with carousel interfaces like Netflix. Through collaboration between psychologists and AI researchers, Santiago's work demonstrate…

4 months, 2 weeks назад @ dataskeptic.com
Cracking the Cold Start Problem
Cracking the Cold Start Problem Cracking the Cold Start Problem

In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations. Boya shares insigh…

4 months, 4 weeks назад @ dataskeptic.com
Designing Recommender Systems for Digital Humanities
Designing Recommender Systems for Digital Humanities Designing Recommender Systems for Digital Humanities

In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Monasterium.net, Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique research needs and information-seeking behaviors. Florian explains the technical challen…

5 months, 2 weeks назад @ dataskeptic.com
DataRec Library for Reproducible in Recommend Systems
DataRec Library for Reproducible in Recommend Systems DataRec Library for Reproducible in Recommend Systems

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Mario Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon reviews. The conversation covers Alberto's research journey through knowledge graphs, graph-based recommen…

5 months, 3 weeks назад @ dataskeptic.com
Shilling Attacks on Recommender Systems
Shilling Attacks on Recommender Systems Shilling Attacks on Recommender Systems

In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a …

6 months назад @ dataskeptic.com
Music Playlist Recommendations
Music Playlist Recommendations Music Playlist Recommendations

In this episode, Rebecca Salganik, a PhD student at the University of Rochester with a background in vocal performance and composition, discusses her research on fairness in music recommendation systems. She explores three key types of fairness—group, individual, and counterfactual—and examines how algorithms create challenges like popularity bias (favoring mainstream content) and multi-interest bias (underserving users with diverse tastes). Rebecca introduces LARP, her multi-stage multimodal framework for playlist continuation that uses contrastive learning to align text and audio representations, learn song relationships, and create playlist-level embeddings to address the cold start prob…

6 months, 1 week назад @ dataskeptic.com
SuperDataScience SuperDataScience
последний пост 2 days назад
989: Security for Mythos-Era Agentic Risks, with Rubrik’s Anneka Gupta and Cal Al-Dhubaib
989: Security for Mythos-Era Agentic Risks, with Rubrik’s Anneka Gupta and Cal Al-Dhubaib 989: Security for Mythos-Era Agentic Risks, with Rubrik’s Anneka Gupta and Cal Al-Dhubaib

Rubrik’s Anneka Gupta and Cal Al-Dhubaib speak to Jon Krohn about cybersecurity measures, the risks AI in business might pose for malicious attacks, and why AI should be kept “boring.” Find out how Rubrik safeguards client data, what zero trust is in the context of cybersecurity, and why cyber-resilience needs to be a top priority for companies looking to adopt AI. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/989⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (02:25) All about Rubrik (08:51) The announcement of Claude …

2 days назад @ podtrac.com
988: In Case You Missed It in April 2026
988: In Case You Missed It in April 2026 988: In Case You Missed It in April 2026

In this month’s episode of In Case You Missed It, Jon Krohn talks to guests about memory and education, and how artificial intelligence is continuing to help lower the barriers to access. Hear from Matt Glickman, Traci Walker-Griffith, Richmond Alake, and Linda Haviv, discussing the foundations of AI agent memory, how engineers can develop at scale, and why they believe AI could be your child’s perfect tutor in the classroom. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/988⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

6 days назад @ podtrac.com
987: AI Infrastructure, Ray, and Why Nonlinear Careers Win, with Linda Haviv
987: AI Infrastructure, Ray, and Why Nonlinear Careers Win, with Linda Haviv 987: AI Infrastructure, Ray, and Why Nonlinear Careers Win, with Linda Haviv

Linda Haviv talks to Jon Krohn about staying current on AI matters, why open-source technology is narrowing the gap in its race with proprietary models, and how being a content creator in tech is key to career growth and longevity. She emphasizes that non-linear pathways to a career in tech can give applicants an edge, and stresses the importance of continuous upskilling to “stay relevant.” In her view, systems thinking is becoming more important than coding skills. Hear why in this episode. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/987⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascienc…

1 week, 2 days назад @ podtrac.com
986: Building Hardware is Hard but AI Agents Help, with Kishore Subramanian
986: Building Hardware is Hard but AI Agents Help, with Kishore Subramanian 986: Building Hardware is Hard but AI Agents Help, with Kishore Subramanian

CTO of Propel Software Kishore Subramanian talks to Jon Krohn about how product lifecycle management (PLM) software and quality management systems help ensure compliance, record management, and quality assurance. Listen to the episode to hear Kishore Subramanian talk about best practices for getting started with Agentforce 360, his top tips for deploying AI projects, and why yoga and meditation could make you better at building AI products! Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/984⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.⁠⁠⁠ In this episode you will learn: (05:21) …

1 week, 6 days назад @ podtrac.com
985: The Four Types of Memory Every AI Agent Needs, with Richmond Alake
985: The Four Types of Memory Every AI Agent Needs, with Richmond Alake 985: The Four Types of Memory Every AI Agent Needs, with Richmond Alake

Oracle’s Director of AI Developer Experience Richmond Alake returns to the show to talk to Jon Krohn about agent memory; the network of systems, models, databases and LLMs that enable AI agents to learn and adapt over time. Listen to the episode to hear about Richmond’s “100 Days of Agent Memory” initiative, retrieval-augmented generation’s (RAG) limitations with AI agents, the layers of the AI agent stack, and what makes the Oracle AI database so useful to developers. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/985⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship inf…

2 weeks, 2 days назад @ podtrac.com
984: Building AI Agents Where 99.9% Accuracy Isn't Good Enough, with Raju Malhotra
984: Building AI Agents Where 99.9% Accuracy Isn't Good Enough, with Raju Malhotra 984: Building AI Agents Where 99.9% Accuracy Isn't Good Enough, with Raju Malhotra

Raju Malhotra, Chief Product and Technology Officer at Certinia, talks to Jon Krohn about the so-called SaaSpocalypse and how agentic AI is proving the doomsayers wrong. Listen to the episode to hear more about Certinia’s work with Salesforce and building with Agentforce 360, the three elements required for enterprise-grade agents, how AI agents have benefitted Certinia’s customers, and how to keep your work portfolio fresh and interesting to recruiters. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/984⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.⁠⁠⁠ In this episode you will lea…

2 weeks, 6 days назад @ podtrac.com
983: AI in the Classroom: How a Top Elementary School Is Doing It Right, with Principal Traci Walker Griffith
983: AI in the Classroom: How a Top Elementary School Is Doing It Right, with Principal Traci Walker Griffith 983: AI in the Classroom: How a Top Elementary School Is Doing It Right, with Principal Traci Walker Griffith

My guest today took a public school that was about to be shut down and turned it into the number one school in Boston, and AI is her latest secret weapon. In a long-overdue episode on AI for supporting children’s education, hear directly from Principal Traci Walker Griffith how her teachers have been experimenting with AI in classrooms, what works, what doesn’t work, and what’s next for kids as LLMs continue to improve. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/983⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (03:38) Th…

3 weeks, 2 days назад @ podtrac.com
982: In Case You Missed It in March 2026
982: In Case You Missed It in March 2026 982: In Case You Missed It in March 2026

Jon Krohn rounds up March’s interviews in this ICYMI episode. Hear from AI and data science experts across the fields of education and business in this wide-ranging series of clips that take listeners from the Renaissance to the near future. Guests include Lin Quiao (Episode 971), Chris Fregly (Episode 973), Zack Kass (Episode 975), Kyunghyun Cho (Episode 977), and Rohit Choudhary (Episode 979). Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/982⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

3 weeks, 6 days назад @ podtrac.com
981: How Data Engineers Are “10x’ing” Themselves With Agents, feat. Matt Glickman
981: How Data Engineers Are “10x’ing” Themselves With Agents, feat. Matt Glickman 981: How Data Engineers Are “10x’ing” Themselves With Agents, feat. Matt Glickman

Matt Glickman talks to Jon Krohn about co-founding the agentic-platform startup, Genesis Computing, how his experience at Goldman Sachs paved the way for developing AI agents, and where he thinks agentic AI has just as much value as a company’s human employees. This February, Genesis Computing revealed how its platform can offer the guardrails so crucial to businesses, alongside increased capabilities that help execute entire workflows from research to deployment. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/981⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.…

1 month назад @ podtrac.com
980: AI Making Theoretical Physics Breakthroughs
980: AI Making Theoretical Physics Breakthroughs 980: AI Making Theoretical Physics Breakthroughs

A team of theoretical physicists from Harvard, Cambridge, the Institute for Advanced Study, and Vanderbilt used OpenAI’s models not just as a tool, but as a collaborator, cracking a problem in particle physics that had stymied them for months. In this Five-Minute Friday, Jon Krohn walks through how GPT-5.2 Pro simplified a 32-variable mathematical expression into a single line, proposed what it called the “obvious generalization” for any number of gluons, and how a more powerful internal model then produced a formal proof after 12 hours of autonomous reasoning. Find out why this may be a template for AI-assisted scientific discovery and what it means for the future of research. Additional m…

1 month назад @ podtrac.com
979: Agentic Data Management and the Future of Enterprise AI, with Rohit Choudhary
979: Agentic Data Management and the Future of Enterprise AI, with Rohit Choudhary 979: Agentic Data Management and the Future of Enterprise AI, with Rohit Choudhary

For years, Jon has been quoting the stat that the world's data is roughly doubling every year. His guest today says that’s way too conservative, he’s seeing enterprise data soon growing at close to 10x per year. And most organizations are nowhere near ready for what that means. In this episode, Rohit Choudhary, founder and CEO of Acceldata, explains how the agentic data management platform his team has built helps enterprises make their increasingly vast amounts of data self-aware, self-optimizing, and AI-ready. He breaks down why governance needs to be operational and real-time rather than a one-time compliance exercise, and shares his view on why the most valuable professionals in the age…

1 month, 1 week назад @ podtrac.com
A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)
A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%) A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)

A game millions of people solve over morning coffee is exposing a fundamental weakness in today’s most powerful AI models. In this Five-Minute Friday, Jon Krohn breaks down Pathway’s new Sudoku Extreme benchmark, roughly 250,000 of the hardest Sudoku puzzles available and why leading LLMs like o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet scored effectively zero percent, while Pathway’s post-transformer BDH architecture achieved 97.4% accuracy at a fraction of the cost. Listen to the episode to find out what BDH is doing differently, why Sudoku performance matters far beyond puzzles, and what this means for the future of AI reasoning. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatas…

1 month, 1 week назад @ podtrac.com
977: Attention, World Models and the Future of AI, with Prof. Kyunghyun Cho
977: Attention, World Models and the Future of AI, with Prof. Kyunghyun Cho 977: Attention, World Models and the Future of AI, with Prof. Kyunghyun Cho

What’s going to be the next big step function that blasts us forward in AI capabilities? To find out, Jon Krohn sits down with Professor Kyunghyun Cho, whose 200,000 citations and co-authorship of the first paper on attention place him among the most influential AI researchers in the world. In this episode, Kyunghyun explains why today’s models have already captured most correlations in passive data, making the real challenge about actively choosing which data to collect. He also weighs in on the open debate around world models, whether AI needs high-fidelity, step-by-step imagination or whether a high-level latent representation that lets it skip ahead is sufficient and shares the surprisi…

1 month, 2 weeks назад @ podtrac.com
976: NVIDIA’s Nemotron 3 Super: The Perfect LLM for Multi-Agent Systems
976: NVIDIA’s Nemotron 3 Super: The Perfect LLM for Multi-Agent Systems 976: NVIDIA’s Nemotron 3 Super: The Perfect LLM for Multi-Agent Systems

NVIDIA just dropped Nemotron 3 Super, a 120-billion-parameter open-weight model that only activates 12 billion parameters at a time and it’s built for the agentic AI era. In this Five-Minute Friday, Jon Krohn breaks down the model’s hybrid Mamba-Transformer architecture, its million-token context window, and why its combination of frontier-class reasoning with blazing-fast throughput matters for anyone building multi-agent systems. Find out how Nemotron 3 Super claimed the #1 spot on the DeepResearch Bench leaderboards, which companies are already adopting it, and where you can start using it today. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/976⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Intere…

1 month, 2 weeks назад @ podtrac.com
975: Unmetered Intelligence is Heralding the Next Renaissance, with Zack Kass
975: Unmetered Intelligence is Heralding the Next Renaissance, with Zack Kass 975: Unmetered Intelligence is Heralding the Next Renaissance, with Zack Kass

Zack Kass speaks to Jon Krohn about his bestselling, tech-positive book, The Next Renaissance, that charts the rapid progress of humanity and the benefits that artificial intelligence will bring to us, as well as why a future where intelligence is a cheap and abundant resource will give humanity an edge. Elsewhere in the show, Zack discusses why it’s important to hold parents, teachers and students accountable for their education, why it is incumbent on us to build a healthier relationship with technology, and his 4 principles for thriving in the age of AI. This episode is brought to you by the⁠ ⁠⁠Cisco⁠, by ⁠Acceldata⁠ and by ⁠⁠ODSC, the Open Data Science Conference⁠⁠. Additional materials…

1 month, 3 weeks назад @ podtrac.com
Data Science at Home Data Science at Home
последний пост 2 weeks, 1 day назад
Europe, wake up! You Can’t Be a Superpower on Someone Else’s Servers (Ep. 304)
Europe, wake up! You Can’t Be a Superpower on Someone Else’s Servers (Ep. 304) Europe, wake up! You Can’t Be a Superpower on Someone Else’s Servers (Ep. 304)

Tech sovereignty takes 3 years and political will.

Check outshift.comNEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, interviews, and discussions.

Send us mail at: [email protected]’t forget to like, subscribe, and hit the 🔔 for updates on the latest in AI and data science!

2 weeks, 1 day назад @ datascienceathome.com
About Apple’s Privacy (Ep. 302)
About Apple’s Privacy (Ep. 302) About Apple’s Privacy (Ep. 302)

Apple just spent $2B on tech that reads your silent speech.

🐦 Twitter: @DataScienceAtHome📘LinkedIn: https://www.linkedin.com/in/fragadaleta/Instagram: https://www.instagram.com/datascienceathome/Facebook: https://www.facebook.com/datascienceAHLinkedIn: https://www.linkedin.com/company/data-science-at-home-podcastDiscord Channel: https://discord.gg/4UNKGf3NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, interviews, and discussions.

Send us mail at: [email protected]’t forget to like, subscribe, and hi…

2 weeks, 1 day назад @ datascienceathome.com
Productivity is the new data breach (Ep. 301)
Productivity is the new data breach (Ep. 301) Productivity is the new data breach (Ep. 301)

Personal newsletter:https://defragzone.substack.com📩 Newsletter: https://datascienceathome.substack.com🎙 Podcast: Available on Spotify, Apple Podcasts, and more.

🐦 Twitter: @DataScienceAtHome📘LinkedIn: https://www.linkedin.com/in/fragadaleta/Instagram: https://www.instagram.com/datascienceathome/Facebook: https://www.facebook.com/datascienceAHLinkedIn: https://www.linkedin.com/company/data-science-at-home-podcastDiscord Channel: https://discord.gg/4UNKGf3NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, interviews…

2 weeks, 1 day назад @ datascienceathome.com
Programmable Money: The Cage They’ll Call Convenience (Ep. 300)
Programmable Money: The Cage They’ll Call Convenience (Ep. 300) Programmable Money: The Cage They’ll Call Convenience (Ep. 300)

This episode breaks down programmable money, the technology that turns your wallet into a permission system.

Personal newsletter: https://defragzone.substack.com📩 Newsletter: https://datascienceathome.substack.com🎙 Podcast: Available on Spotify, Apple Podcasts, and more.

🐦 Twitter: @DataScienceAtHome📘LinkedIn: https://www.linkedin.com/in/fragadaleta/Instagram: https://www.instagram.com/datascienceathome/Facebook: https://www.facebook.com/datascienceAHLinkedIn: https://www.linkedin.com/company/data-science-at-home-podcastDiscord Channel: https://discord.gg/4UNKGf3NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Send us mail at: …

2 weeks, 1 day назад @ datascienceathome.com
There Is No AI. There’s a Stateless Function on 10,000 GPUs Pretending to Know You (Ep. 299)
There Is No AI. There’s a Stateless Function on 10,000 GPUs Pretending to Know You (Ep. 299) There Is No AI. There’s a Stateless Function on 10,000 GPUs Pretending to Know You (Ep. 299)

Personal newsletter: https://defragzone.substack.com📩 Newsletter: https://datascienceathome.substack.com🎙 Podcast: Available on Spotify, Apple Podcasts, and more.

🐦 Twitter: @DataScienceAtHome📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/ Instagram: https://www.instagram.com/datascienceathome/Facebook: https://www.facebook.com/datascienceAHLinkedIn: https://www.linkedin.com/company/data-science-at-home-podcastDiscord Channel: https://discord.gg/4UNKGf3NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, intervi…

2 months назад @ datascienceathome.com
Bias in the machine (edited)
Bias in the machine (edited) Bias in the machine (edited)

The title of today’s episode is Bias in the machineC: Francesco, today we are starting with an infuriating discussion.

The failure of the medical community as a whole to recognise this obvious bias up to the 21st century is an example of how insidious the problem of bias is.

Three: The bias in your training sample: people put training samples together, and people have culture, experience, and prejudice.

These assumptions inform the way AI systems work—and fail—to this day.

When an algorithm is a black box and you can’t look inside, you have no way of analysing its bias.

2 months назад @ datascienceathome.com
What is wrong with reinforcement learning? (Ep. 82)
What is wrong with reinforcement learning? (Ep. 82) What is wrong with reinforcement learning? (Ep. 82)

Join the discussion on our Discord serverAfter reinforcement learning agents doing great at playing Atari video games, Alpha Go, doing financial trading, dealing with language modeling, let me tell you the real story here.In this episode I want to shine some light on reinforcement learning (RL) and the limitations that every practitioner should consider before taking certain directions.

RL seems to work so well!

What is wrong with it?

Are you a listener of Data Science at Home podcast?

Or did you subscribe to the Artificial Intelligence at your fingertips newsletter?

3 months назад @ datascienceathome.com
How to generate very large images with GANs (Ep. 76)
How to generate very large images with GANs (Ep. 76) How to generate very large images with GANs (Ep. 76)

Join the discussion on our Discord serverIn this episode I explain how a research group from the University of Lubeck dominated the curse of dimensionality for the generation of large medical images with GANs.

The problem is not as trivial as it seems.

Many researchers have failed in generating large images with GANs before.

One interesting application of such approach is in medicine for the generation of CT and X-ray images.Enjoy the show!

ReferencesMulti-scale GANs for Memory-efficient Generation of High Resolution Medical Images https://arxiv.org/abs/1907.01376

3 months назад @ datascienceathome.com
Training neural networks faster without GPU [RB] (Ep. 77)
Training neural networks faster without GPU [RB] (Ep. 77) Training neural networks faster without GPU [RB] (Ep. 77)

Join the discussion on our Discord serverTraining neural networks faster usually involves the usage of powerful GPUs.

In this episode I explain an interesting method from a group of researchers from Google Brain, who can train neural networks faster by squeezing the hardware to their needs and making the training pipeline more dense.

Enjoy the show!

ReferencesFaster Neural Network Training with Data Echoinghttps://arxiv.org/abs/1907.05550

3 months назад @ datascienceathome.com
More powerful deep learning with transformers (Ep. 84) (Rebroadcast)
More powerful deep learning with transformers (Ep. 84) (Rebroadcast) More powerful deep learning with transformers (Ep. 84) (Rebroadcast)

Some of the most powerful NLP models like BERT and GPT-2 have one thing in common: they all use the transformer architecture.

Such architecture is built on top of another important concept already known to the community: self-attention.In this episode I explain what these mechanisms are, how they work and why they are so powerful.

Don’t forget to subscribe to our Newsletter or join the discussion on our Discord serverReferences

3 months назад @ datascienceathome.com
Your Favorite AI Startup is Probably Bullshit (Ep. 298) [RB]
Your Favorite AI Startup is Probably Bullshit (Ep. 298) [RB] Your Favorite AI Startup is Probably Bullshit (Ep. 298) [RB]

The brutal truth about why Silicon Valley is blowing billions on glorified autocomplete while pretending it’s the next iPhone.

We’re diving deep into the AI investment circus where VCs who can’t code are funding companies that barely understand their own technology.

From blockchain déjà vu to the “ChatGPT wrapper” economy—this episode will make you question every AI valuation you’ve ever seen.

Fair warning: We’re naming names and calling out the hype.

Don’t listen if you work at a “revolutionary AI startup” that’s just OpenAI’s API with a pretty interface.

3 months назад @ datascienceathome.com
Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 297) [RB]
Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 297) [RB] Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 297) [RB]

VortexNet uses actual whirlpools to build neural networks.

By borrowing equations from fluid dynamics, this new architecture might solve deep learning’s toughest problems—from vanishing gradients to long-range dependencies.

Today we explain how vortex shedding, the Strouhal number, and turbulent flows might change everything in AI.

SponsorsThis episode is brought to you by Statistical HorizonsAt Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics and AI methods practical and accessible.

Join thousands of researchers and professionals who’ve advanced their careers with Statistical Horizons.

3 months назад @ datascienceathome.com
AGI: The Dream We Should Never Reach (Ep. 296)
AGI: The Dream We Should Never Reach (Ep. 296) AGI: The Dream We Should Never Reach (Ep. 296)

Also on YouTubeTwo AI experts who actually love the technology explain why chasing AGI might be the worst thing for AI’s future—and why the current hype cycle could kill the field we’re trying to save.

Head to datascienceathome.com for detailed show notes, code examples, and exclusive deep-dives into the papers we discuss.

Subscribe to our newsletter for weekly breakdowns of cutting-edge research delivered straight to your inbox—no fluff, just science!

Our Discord community is full of ML engineers, researchers, and AI enthusiasts discussing papers, sharing projects, and helping each other level up.

Whether you’re debugging your first neural net or training your tenth transformer, there’s a …

3 months назад @ datascienceathome.com
When Data Stops Being Code and Starts Being Conversation (Ep. 297)
When Data Stops Being Code and Starts Being Conversation (Ep. 297) When Data Stops Being Code and Starts Being Conversation (Ep. 297)

Mark Brocato built Mockaroo—the tool that taught millions of developers how to fake data.

Now, as Head of Engineering at Tonic.ai, he’s building the AI agent that’s making his own creation obsolete.

From the hidden failures of legacy mocks to the security implications of agent-driven synthesis, Mark reveals what happens when data generation becomes a conversation—not a pipeline.

SponsorsTonic.ai Synthetic data solutions for software and AI development.

Accelerate engineering velocity and ensure compliance with AI-powered data synthesisThis episode is brought to you by Statistical HorizonsAt Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics…

4 months, 2 weeks назад @ datascienceathome.com
Your AI Strategy is Burning Money: Here’s How to Fix It (Ep.295)
Your AI Strategy is Burning Money: Here’s How to Fix It (Ep.295) Your AI Strategy is Burning Money: Here’s How to Fix It (Ep.295)

Most companies don’t have an AI problem.

In this conversation, he breaks down when AI actually makes sense, where AWS costs spiral out of control, and why your “cool demo” keeps dying before launch.

If you’re tired of AI hype and ready for straight answers, hit play.

Our Discord community is full of ML engineers, researchers, and AI enthusiasts discussing papers, sharing projects, and helping each other level up.

Whether you’re debugging your first neural net or training your tenth transformer, there’s a place for you.

5 months, 1 week назад @ datascienceathome.com