Very ML
State-of-the-art Machine Learning News Feed
/r/MachineLearning
last post 14 hours ago
Anyone from India attending EEML ? [D]

14 hours ago @ reddit.com
Made and Published a Paper Comparing Analysis of CNN and Vision Transformer Architectures for Brain Tumor Detection [R]

14 hours ago @ reddit.com
Do you agree with Judea that learning from data is not everything? [D]

15 hours ago @ reddit.com
[Academic] We need Data Annotators or Someone who Prepares Dataset [R]

21 hours ago @ reddit.com
Backlash against Arxiv's proposed 1 year ban is genuinely perplexing. [D]

21 hours ago @ reddit.com
[R] Which LLMs are actually best for bleeding-edge Linux/ML debugging workflows in 2026? [R]

1 day, 1 hour ago @ reddit.com
KDD 2026 Cycle 2 Results [D]

1 day, 2 hours ago @ reddit.com
ROCm with PyTorch and PyTorch Lightning seems to still suck for research [D]

1 day, 6 hours ago @ reddit.com
Doubts Urgent Guys![R]

1 day, 7 hours ago @ reddit.com
Struggling with Overfitting on Medical Imaging Task [D]

1 day, 9 hours ago @ reddit.com
Notes from evaluating a customer support chat agent system: heuristic evaluators give false signal, retrieval bugs masquerade as LLM failures, and the cost/quality Pareto frontier is rarely where you think [D]

1 day, 12 hours ago @ reddit.com
Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion [R]

1 day, 12 hours ago @ reddit.com
PINN is predicting trivial solution for stiff ODE [D]

1 day, 14 hours ago @ reddit.com
Looking for a real world dataset (or website where i can find it) [P]

1 day, 15 hours ago @ reddit.com
software trying to catch software is officially a dead en [D]

1 day, 15 hours ago @ reddit.com
Towards Data Science
last post 15 hours ago
From Data Analyst to Data Engineer: My 12-Month Self-Study Roadmap

The exact tools I'm learning, the projects I'm building, and the mistakes I'm already expecting to make

The post From Data Analyst to Data Engineer: My 12-Month Self-Study Roadmap appeared first on Towards Data Science.

15 hours ago @ towardsdatascience.com
Recursive Language Models: An All-in-One Deep Dive

To call subagents just run llm_query(sub_context)

# Assistant
```repl
print(context)
```

# REPL Output
Generate a dictionary of different categories: fruits, countries, animals.

You can interact with the Python REPL by writing Python code.

When you want to execute Python code in the REPL environment, wrap it in triple backticks with the `repl` language identifier.

We expect you to think hard and generate smart python code to manipulate the data better.

Your expected response should be as follows:

```repl
Your working python code
FINAL(...)
```

Do not output multiple code blocks.

17 hours ago @ towardsdatascience.com
From Raw Data to Risk Classes

For continuous variables, the objective is usually to transform a raw numerical scale into a smaller number of ordered risk classes.

Suppose default risk decreases for low to medium income levels, but then increases again for very high income levels.

A variable is monotonic with respect to default risk if the default rate moves in one direction when the variable increases.
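The monotonicity definition above can be checked mechanically once default rates per ordered class are known. The sketch below is only an illustration: the function name `is_monotonic` and both lists of default rates are hypothetical and do not come from the article.

```python
def is_monotonic(rates):
    """Return True if the sequence never changes direction (ties allowed)."""
    pairs = list(zip(rates, rates[1:]))
    non_decreasing = all(a <= b for a, b in pairs)
    non_increasing = all(a >= b for a, b in pairs)
    return non_decreasing or non_increasing


# Hypothetical default rates per ordered income class (lowest income first).
decreasing_rates = [0.12, 0.08, 0.05, 0.03]  # risk falls as income rises
u_shaped_rates = [0.12, 0.06, 0.04, 0.09]    # risk rises again at the top

print(is_monotonic(decreasing_rates))  # True
print(is_monotonic(u_shaped_rates))    # False
```

The second list mirrors the case described earlier, where default risk falls for low to medium incomes and rises again for very high incomes, so it fails the check.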

This result is coherent with credit risk intuition: higher income is generally associated with higher repayment capacity and therefore lower default risk.

This is why we also analyze the evolution of the default rate by category over time. It is also useful to visualize, for each category: the population share; the default …

1 day, 13 hours ago @ towardsdatascience.com
How I Continually Improve My Claude Code

How do I optimize the way I interact with my Claude Code instances and the way my Claude Code operates within the code repositories I’m updating?

In this article, I want to highlight how I’m continually updating how I interact with Claude Code and how my Claude Code operates, which makes me and my coding agent more and more effective over time.

Making Claude Code learn from itself

I’ll start off by covering the simple technique that you can start using right now, which is almost certainly going to improve how your Claude Code performs.

You can simply make a skill within Claude Code that goes something like this: Review my last interactions with Claude Code from the last 24 hours.

Check it out…

1 day, 15 hours ago @ towardsdatascience.com
Why My Coding Assistant Started Replying in Korean When I Typed Chinese

From a Chinese prompt to a Korean response: an embedding-space investigation into how code vocabulary reshapes language

1 day, 16 hours ago @ towardsdatascience.com
Stop Evaluating LLMs with “Vibe Checks”

To move an AI system from a fragile demo to a robust production asset, you must build a decision-grade evaluation scorecard.

As noted in recent discussions on agentic AI latency and cost, these operational metrics are just as critical as the model’s intelligence.

(Measurement: Automated comparison against a golden dataset using an LLM-as-a-judge to check for hallucinated entities).

This is your “golden dataset.”

A golden dataset is a curated collection of diverse inputs paired with their expected, ideal outputs.
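As a rough illustration of scoring a system against a golden dataset, the sketch below uses plain normalized exact match. This is a deliberately simpler stand-in for the LLM-as-a-judge comparison the article mentions, and every name and record in it (`golden_dataset`, `toy_system`, `exact_match_rate`) is hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_rate(golden, predict):
    """Fraction of golden items whose prediction matches the expected output."""
    hits = sum(
        normalize(predict(item["input"])) == normalize(item["expected"])
        for item in golden
    )
    return hits / len(golden)


# Hypothetical golden dataset: inputs paired with expected, ideal outputs.
golden_dataset = [
    {"input": "What is the refund window?", "expected": "30 days"},
    {"input": "Which plan includes SSO?", "expected": "Enterprise"},
]


def toy_system(question):
    """Stand-in for the system under test (assumed for this sketch)."""
    answers = {
        "What is the refund window?": "30 Days",
        "Which plan includes SSO?": "Pro",
    }
    return answers.get(question, "")


print(exact_match_rate(golden_dataset, toy_system))  # 0.5
```

Exact match only suits short, closed-form answers; free-form outputs would need a softer comparison.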

The Role of LLM-as-a-Judge

One of the most powerful tools in modern AI evaluation is the “LLM-as-a-Judge” pattern.

1 day, 18 hours ago @ towardsdatascience.com
The Next AI Bottleneck Isn’t the Model: It’s the Inference System

I’ve seen this a lot when working with enterprise AI teams: they nearly always blame the model when something goes wrong.

[Figure: Fine-Tuning vs Inference Loop (image by author)]

What’s happening at inference time

For a long time, inference was just the step where you used the model.

The resource allocation problem

What is often underrated is that most AI systems use a uniform approach to all their queries.

These systems are more layered than people realize

When you look inside a production AI system today, it usually isn’t just one model answering questions.

It is often accompanied by a retrieval step, a ranking step, possibly a verification step, and a summarization step; several steps in tandem to gen…

2 days, 13 hours ago @ towardsdatascience.com
The Counterintuitive Networking Decisions Behind OpenAI’s 131,000-GPU Training Fabric

A critical analysis of MRC's three counterintuitive design decisions, the networking mathematics that make them work, and what they mean for the rest of the AI infrastructure community.

2 days, 15 hours ago @ towardsdatascience.com
I Let CodeSpeak Take Over My Repository

We can install CodeSpeak using the uv package manager:

uv tool install codespeak-cli
codespeak --version
# CodeSpeak CLI 0.4.1

Next, we need to log in.

codespeak login

The final step is setting up your API key, since CodeSpeak follows a Bring Your Own Key policy.

Takeover

Before using CodeSpeak to implement new features, we first need to generate an initial version of the specifications.

codespeak change --new
# Created template change request in
# /Users/marie/Documents/github/trainlytics_codespeak/change-request.cs.md

Inside the file, I added the following request covering both the bug fix and better debugging support.

codespeak change frontend.cs.md

Before making any modificatio…

2 days, 16 hours ago @ towardsdatascience.com
How to Write Robust Code with Claude Code

First, I cover how to initially write robust code and then continue to cover how to verify and improve the code after initial implementation.

Having coding agents ask new questions, instead of you asking the coding agents, is an incredibly powerful feature that I urge you to use more actively.

How to verify the robustness of code through coding agents

Of course, it is very important to build robust code initially.

Coding agent code review

The first and probably the easiest thing you can do to build more robust code is to have coding agents review the code that other coding agents produce.

Conclusion

In this article, I discussed how to code using coding agents and ensure they produce robust code.

2 days, 18 hours ago @ towardsdatascience.com
I Built the Same B2B Document Extractor Twice: Rules vs. LLM

Afterwards, we extract the data once with a traditional OCR and regex pipeline and once with an OCR and LLM pipeline.

Step 3 – Install Python Libraries

Now we install all Python libraries we need.

Approach 1: The Traditional Way (pytesseract + Regex Rules)

The traditional approach works in two steps: First, we convert the PDF into an image.

The traditional approach tries to explicitly describe every possible document.

For small and stable environments, the traditional approach is often completely sufficient.

3 days, 11 hours ago @ towardsdatascience.com
Exploring Patterns of Survival from the Titanic Dataset

This is a 38% survival rate, matching the value previously obtained from the describe() function.

Survival by Gender

Let us see how this survival rate was influenced by gender.

plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='Age', hue='Survived', bins=30, multiple='stack', alpha=0.6)
plt.title("Age Distribution by Survival")
plt.show()

[Figure: Age Distribution by Survival (image by author)]

From this stacked histogram, we can draw several meaningful insights about how age is related to survival on the Titanic.

Family Size Analysis

The family size attribute is dependent on two different attributes of the dataset: SibSp and Parch.
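One common way to derive such a feature is to add the two counts plus one for the passenger. The excerpt does not show the article's exact formula, so the `+ 1` convention, the helper name `family_size`, and the sample rows below are all assumptions made for illustration.

```python
# Hypothetical passenger rows; SibSp counts siblings/spouses aboard,
# Parch counts parents/children aboard.
passengers = [
    {"name": "A", "SibSp": 1, "Parch": 0, "Survived": 1},
    {"name": "B", "SibSp": 0, "Parch": 0, "Survived": 0},
    {"name": "C", "SibSp": 3, "Parch": 2, "Survived": 0},
]


def family_size(passenger):
    """Assumed convention: relatives aboard plus the passenger themselves."""
    return passenger["SibSp"] + passenger["Parch"] + 1


for passenger in passengers:
    passenger["FamilySize"] = family_size(passenger)

print([p["FamilySize"] for p in passengers])  # [2, 1, 6]
```

Grouping survival by this derived value is then what allows family sizes to be compared against each other.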

As we can see, survival was not linearly related to the family size, but a moderat…

3 days, 13 hours ago @ towardsdatascience.com
What’s the Best Way to Brainwash an LLM?

The SDF model’s protocol knowledge feels correct but slightly recited.

The Demonstrations model nails the most visible surface traits — 100% Sir/Master, 97% verbosity, but lags on anxiety (50%).

The SDF model is where it gets philosophically interesting.

[Figure: plot by author, using matplotlib]

FP and SDF models produce noticeably longer responses.

So, What’s the Best Way to Brainwash an LLM?

3 days, 16 hours ago @ towardsdatascience.com
Building an Evaluation Harness for Production AI Agents: A 12-Metric Framework From 100+ Deployments

What we didn’t have was an evaluation harness that could measure hallucination rate, context faithfulness, or tool-selection accuracy in production.

>0.80 Retrieval Retrieval Latency How fast did the retrieval complete?

For teams building AI agents for business automation, the evaluation harness often determines whether the project ships to production at all.

Category 1: Retrieval Metrics (4)

If your agent uses retrieval (RAG, knowledge base lookup, document search), retrieval quality is the foundation.

If you’re building production AI agents and want a second opinion on your evaluation framework grounded in 100+ deployments, the Intuz team is happy to help.

3 days, 18 hours ago @ towardsdatascience.com
From Vibe Coding to Spec-Driven Development

There has been a lot of hype around vibe coding, but it seems that professional engineers are already moving beyond it and leaning more toward spec-driven development.

In this article, I’d like to put spec-driven development into practice on a greenfield project, following the best practices from JetBrains’ course on DeepLearning.AI, “Spec-Driven Development with Coding Agents”.

But before we jump into implementation, let me first spend a few minutes on the theory behind spec-driven development.

But when you’re working on a larger project (especially with other people) spec-driven development would definitely be my default approach.

Reference

This article is inspired by the “Spec-Driven Deve…

4 days, 13 hours ago @ towardsdatascience.com
Distill.pub
last post: none
TheSequence
last post 2 days, 19 hours ago
The Sequence Opinion #860: Every Company’s Last eXam: Some Reflection About Practical AI Evals

For today’s essay, I want to explore an idea that has become central to how we think about AI evaluations at LayerLens.

This is not an essay about LayerLens, but about a simple and increasingly unavoidable thesis: evals are becoming the fourth pillar of modern AI, alongside compute, data, and models.

As AI systems move from chatbots to agents, from demonstrations to production workflows, every meaningful task performed by every agent inside every company will need its own evaluation layer.

Practical, dynamic, company-specific exams that measure whether an AI system can actually survive contact with real work.

That is why top frontier labs now emphasize task-specific evals, production-derive…

2 days, 19 hours ago @ thesequence.substack.com
The Sequence AI of the Week #859: Reading Claude’s Mind in English: A Note on Natural Language Autoencoders

There is a recurring fantasy in interpretability work, somewhere between a wish and an embarrassment.

You stare at a residual stream activation — twelve thousand floats — and you want to ask it, in plain English, what are you thinking about?

Sparse autoencoders give you a thousand sparse latents you then label by inspecting top-activating examples.

Anthropic’s new paper, Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations, is the first interpretability artifact in a while where the activation talks back.

You point an NLA at a token in a Claude Opus 4.6 transcript and it produces a few bullet points of English describing what the model is thinking.

3 days, 18 hours ago @ thesequence.substack.com
The Sequence Knowledge #858: How State Space Models Went from Curiosity to Serious Transformer Competitor

💡 AI Concept of the Day: How State Space Models Went from Curiosity to Serious Transformer Competitor

There is this thing that happens in ML research where a line of work gets quietly good for years, and then one day you wake up and it’s suddenly competing with the dominant paradigm.

State space models are having that moment right now.

For the past eight years, the transformer has been the only architecture that matters.

But transformers have a problem that everyone in the field knows about and nobody has fully solved.

State space models offer a fundamentally different contract: linear time complexity, constant memory at inference, and no KV-cache at all.
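To make the "linear time, constant memory" contract concrete, here is a minimal sketch of a diagonal linear state space recurrence in plain Python. It is a toy under stated assumptions, not the architecture of any specific model: the function `ssm_scan` and the parameter values `a`, `b`, `c` are hypothetical.

```python
def ssm_scan(xs, a, b, c):
    """Run a diagonal linear state space recurrence over a scalar sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state)
    y_t = sum(c * h_t)

    One pass over the inputs (linear time); only the fixed-size state
    is carried between steps, no per-token cache.
    """
    state = [0.0] * len(a)
    outputs = []
    for x in xs:
        state = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, state, b)]
        outputs.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return outputs, state


# Hypothetical parameters for a 2-dimensional state.
a = [0.5, 0.9]   # per-channel decay
b = [1.0, 1.0]   # input projection
c = [1.0, 0.5]   # output projection

outputs, final_state = ssm_scan([1.0, 0.0, 0.0], a, b, c)
print(outputs)
print(len(final_state))  # 2
```

However long the input sequence gets, `final_state` keeps the same length, which is the property that removes the need for a KV-cache.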

4 days, 19 hours ago @ thesequence.substack.com
The Sequence Radar #857: Last Week in AI: Inside the Machine, Outside the Text Box

In the AI of the week, we dive into Anthropic’s groundbreaking paper about natural language autoencoders.

📝 Editorial: Inside the Machine, Outside the Text Box

This week in AI had the strange texture of a market that is simultaneously becoming more scientific, more productized, and more speculative.

But underneath all of it is the same story: AI is moving from a model race into an infrastructure race.

Anthropic’s Natural Language Autoencoders paper was the most intellectually interesting development of the week.

On the opposite side of the stack, OpenAI’s new voice model release pushed AI further toward becoming a native interface rather than a text box with bett…

6 days, 19 hours ago @ thesequence.substack.com
The Sequence Opinion #856: The Salesforce of agents won't be Salesforce, The Google of agents won't be Google

For the first few decades of the internet, software had a reasonably stable assumption baked into it: the user was a human.

The entire SaaS and consumer internet stack grew around this shape of user.

CRMs tracked human sales reps selling to human buyers.

Identity systems authenticated humans.

Analytics systems measured human clicks, human sessions, human conversions.

1 week, 2 days ago @ thesequence.substack.com
The Sequence AI of the Week #855: Inside Nemotron Omni: NVIDIA’s New Multimodal Brain for Agents

The interesting thing about NVIDIA’s new Nemotron 3 Nano Omni is not that it “does multimodality.” We already have a zoo of models that can caption images, transcribe speech, parse PDFs, answer questions about videos, and click around GUIs.

The interesting thing is that Nemotron Omni is designed to make that zoo feel like a single animal.

The speech model may hear what was said but not what was on screen when it was said.

Nemotron 3 Nano Omni is NVIDIA’s attempt to move the “eyes and ears” of an agent into a single efficient perception-and-reasoning model: video, audio, image, and text in; text out.

NVIDIA announced it on April 28, 2026, positioning it as an open omni-modal reasoning model …

1 week, 3 days ago @ thesequence.substack.com
The Sequence Knowledge #854: Return of the King: Unrolling the xLSTM Architecture

💡 AI Concept of the Day: Return of the King: Unrolling the xLSTM Architecture

If you were training sequence models circa 2015, your entire mental model of the world was shaped by the Long Short-Term Memory (LSTM) network.

Invented in the 1990s by Sepp Hochreiter and Jürgen Schmidhuber, the LSTM was the undisputed workhorse of deep learning.

“Attention Is All You Need” dropped, and the entire AI ecosystem pivoted.

We traded the deep, architectural elegance of the LSTM for the brute-force, highly parallelizable matrix multiplications of the Transformer.

The Transformer won the hardware lottery because it allowed us to map the entire sequence onto a GPU grid and train it all at once.

1 week, 4 days ago @ thesequence.substack.com
The Sequence Radar #853: Last Week in AI: The Great AI Fundraising Wars and a New Frontier Lab

📝 Editorial: Last Week in AI: The Great AI Fundraising Wars and a New Frontier Lab

This week in AI felt less like a product cycle and more like a sovereign debt auction for the future of cognition.

But the real story is that frontier AI is becoming an industrial-scale capital formation game.

It is the market trying to price a new kind of company: part model lab, part cloud tenant, part developer platform, part enterprise operating system.

AI Lab: Microsoft

Summary: This paper introduces a scalable methodology to create realistic, user-specific synthetic computer environments populated with diverse directory structures and content-rich artifacts.

🤖 AI Tech Releases…

1 week, 6 days ago @ thesequence.substack.com
The Sequence Opinion #852: The Bitter Lessons for Agentic Interfaces: A CLI for EVERYTHING

The next evolution of agentic SaaS isn’t more tool infrastructure.

I’ve been thinking a lot lately about why building agentic systems still feels so weirdly clunky, and I think I’ve finally put my finger on it.

The thesis I want to argue is simple: the next phase of agentic SaaS is not about chat interfaces, and it’s not about ever-more-elaborate tool infrastructures.

Every SaaS, eventually, will ship a parallel command-line surface — not as a developer convenience, but as the primary interface for its non-human users.

The bitter lesson, applied to interfaces

2 weeks, 2 days ago @ thesequence.substack.com
The Sequence AI of the Week #851: DeepSeek-V4 and the Architecture of Million-Token Intelligence

The most interesting thing about DeepSeek-V4 is not that it supports a one-million-token context window.

That number is impressive, but context length by itself is a poor proxy for intelligence.

The real question is not: how much text can the model ingest?

The real question is: how much history can the model economically use?

The model is designed around a simple but profound premise: million-token intelligence requires more than scaling the Transformer.

2 weeks, 3 days ago @ thesequence.substack.com
The Sequence Knowledge #850: The Unexpected Comeback of RNNs

💡 AI Concept of the Day: The Unexpected Comeback of RNNs

If you were building sequence models circa 2015, your mental model of the world was entirely shaped by Recurrent Neural Networks (RNNs).

You feed the network a token, it updates a fixed-size hidden state, and it throws the token away.

During inference, the memory footprint was beautifully constant—an $O(1)$ operation that could run efficiently on almost any hardware.

In a Transformer, the model must explicitly hold the high-dimensional representations of every previous token in memory to generate the next one.
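The contrast between a fixed-size recurrent state and a growing attention cache can be made concrete with a little arithmetic. The sketch below is a simplification, not a description of any particular model: it assumes a single recurrent layer, a standard multi-head attention cache with one key and one value vector per layer per past token, and hypothetical sizes (`hidden_size=4096`, `n_layers=32`). Real systems use grouped-query attention, quantization, and other tricks that change the constants.

```python
def rnn_state_floats(hidden_size: int, tokens_seen: int) -> int:
    """Floats kept by a single-layer recurrent model after reading tokens_seen tokens."""
    # tokens_seen is deliberately unused: the hidden state does not grow with history.
    return hidden_size


def kv_cache_floats(d_model: int, n_layers: int, tokens_seen: int) -> int:
    """Floats kept by a plain Transformer KV cache after reading tokens_seen tokens."""
    # One key vector and one value vector per layer for every past token.
    return 2 * n_layers * d_model * tokens_seen


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    for t in (1_000, 100_000, 1_000_000):
        print(t, rnn_state_floats(4096, t), kv_cache_floats(4096, 32, t))
```

The recurrent column stays flat while the cache column grows linearly with the number of tokens, which is the trade-off the excerpt describes.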

But this is not a nostalgic return to the classic Long Short-Term Memory (LSTM) networks of the 2010s.

2 weeks, 4 days ago @ thesequence.substack.com
The Sequence Radar #849: Last Week in AI: OpenAI Ships Agents, xAI Eyes Cursor, DeepSeek and Kimi Advance

AI Lab: Inclusion AI, Ant Group. Summary: This paper introduces LLaDA2.0-Uni, a unified discrete diffusion large language model that seamlessly integrates multimodal understanding and generation within a single framework.

AI Lab: Carnegie Mellon University, Amazon AGI. Summary: The authors present SkillLearnBench, the first benchmark designed to evaluate continual learning methods for agent skill generation across 20 real-world tasks.

AI Lab: Microsoft. Summary: AutoAdapt is an end-to-end automated framework designed to optimize the complex domain adaptation process for large language models under tight resource constraints.

Kimi 2.6

Kimi 2.6 launched with marquee capabilities in agentic coding.

W…

2 weeks, 6 days ago @ thesequence.substack.com
The Sequence Opinion #848: The Agent’s Hands: CLI or MCP?

The moment we give it tools, it becomes something else: not merely a chatbot, but an operator.

It can read files, write code, open issues, call APIs, move tickets, delete emails, deploy infrastructure, or wake you up at 3 a.m. because a background workflow misread a calendar event.

So the real primitive of agentic systems is the interface between the model and the world.

Two candidates have emerged as the main bridge: the command-line interface, or CLI, and the Model Context Protocol, or MCP.

Text in, text out, exit code, compose everything.” MCP says: “Agents need structured, discoverable, typed tools.
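The CLI contract quoted above (text in, text out, exit code) is small enough to sketch. The helper below is a minimal illustration of how an agent harness might call a command-line tool; `run_cli_tool` is a hypothetical name, and the "tool" is just a Python one-liner standing in for a real CLI. It uses only the standard `subprocess.run` call.

```python
import subprocess
import sys


def run_cli_tool(argv: list[str], stdin_text: str = "") -> tuple[int, str, str]:
    """Run a command and return (exit_code, stdout, stderr) as plain text."""
    result = subprocess.run(
        argv,
        input=stdin_text,      # text in
        capture_output=True,   # text out (stdout and stderr)
        text=True,
        timeout=30,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Stand-in tool: a Python one-liner that upper-cases whatever arrives on stdin.
    upper = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    code, out, _ = run_cli_tool(upper, "hello agent")
    print(code, out)
```

Because the output of one call is plain text, it can be passed as `stdin_text` to the next call, which is the "compose everything" part of the quote.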

3 weeks, 2 days ago @ thesequence.substack.com
The Sequence AI of the Week #847: Everything You Need to Know About Claude Opus 4.7

The benchmarks are what you’d expect from a two-month incremental release — SWE-bench Verified 87.6%, SWE-bench Pro 64.3%, MCP-Atlas +14.6pp, state-of-the-art on GDPval-AA for economically valuable knowledge work, XBOW visual-acuity 54.5% → 98.5%, finance and document reasoning up, BrowseComp and long-context multi-needle retrieval down.

If you migrate a 4.6 harness to 4.7 and it still sets temperature, top_p, top_k, or thinking.budget_tokens, you get a 400.

In their place: an effort enum (low, medium, high, xhigh, max) and task_budget, a soft token ceiling the model can actually see.

You’re no longer tuning the softmax; you’re telling the model how hard to think and how much run…
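A harness migration along these lines might look like the sketch below. It is based only on the parameter names mentioned in the excerpt (temperature, top_p, top_k, thinking.budget_tokens, effort, task_budget); the shape of the request dictionary, the placement of `effort` and `task_budget` at the top level, and the helper name `migrate_request` are all assumptions, not the documented API.

```python
import copy
from typing import Optional

# Names taken from the excerpt; where they live in a real request is an assumption.
REMOVED_TOP_LEVEL = ("temperature", "top_p", "top_k")
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def migrate_request(old: dict, effort: str = "medium",
                    task_budget: Optional[int] = None) -> dict:
    """Return a copy of old without the removed sampling knobs, plus effort settings."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {effort!r}")
    new = copy.deepcopy(old)  # never mutate the caller's request
    for key in REMOVED_TOP_LEVEL:
        new.pop(key, None)
    thinking = new.get("thinking")
    if isinstance(thinking, dict):
        thinking.pop("budget_tokens", None)
        if not thinking:  # drop the block if nothing else was in it
            del new["thinking"]
    new["effort"] = effort  # hypothetical placement
    if task_budget is not None:
        new["task_budget"] = task_budget  # hypothetical placement
    return new
```

For example, `migrate_request({"model": "m", "temperature": 0.2}, effort="high")` returns `{"model": "m", "effort": "high"}` and leaves the original dictionary untouched.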

3 weeks, 3 days ago @ thesequence.substack.com
The Sequence Knowledge #846: Beyond Transformer: A New Series

💡 AI Concept of the Day: Beyond Transformer: A New Series

If you have been watching the arXiv firehose lately, you can feel a very palpable vibe shift.

Today, we are starting a new series to map out exactly what is happening: the search for novel alternatives to the Transformer architecture.

For the better part of a decade, the entire artificial intelligence ecosystem has essentially been a giant, spectacularly funded wrapper around a single mathematical operation: self-attention.

The Transformer won the hardware lottery of the late 2010s.

It was beautifully parallelizable across GPUs, and its mental model was intuitively simple—every token looks back at every previous token to decide what t…

3 weeks, 4 days ago @ thesequence.substack.com
Synced Review
last post None
📓 Cool Blogs
ODS.ai Habr
last post 1 month, 1 week ago
Vibe Coding, Honestly (Вайбкодинг по Chess’ноку). 1. e4

But this is not vibe coding; it is hard, professional AI development.

Over that time, 112 chats were created in ChatGPT for this project, which is roughly 560 prompts.

In especially intense periods I had to get up at night to make optimal use of the limits, which are split into 5-hour and weekly sessions.

But this is not magic, and it is not a "make it good" button.

That is exactly why the future belongs not to vibe coding but to those who learn to manage this speed.

1 month, 1 week ago @ habr.com
Why I Became an IT Volunteer & A Dataset of News About the Contradictions of Modern Society

A simple example with fuel prices: gasoline gets more expensive both when the oil price rises and when it falls.

The realization that your work increases someone's market capitalization but does not solve society's real problems, visible in everyday life and in the news, pushed me to look for some other kind of activity.

In addition, thanks to AMB (АМБ), a unique dataset of news about the contradictions of modern society has appeared on Kaggle and GitHub; more on it below.

A dataset of news about the contradictions of modern society

AMB activists and volunteers from friendly groups collected and labeled a dataset of news items that highlight the very systemic contradictions I had been thinking about earlier.

Example B: In 2023, one in every 11 people in the world went hungry, while in …

2 months, 3 weeks ago @ habr.com
[Translation] How Codex Is Built

A detailed look at how the OpenAI Codex team builds its coding agent, how engineers use it, and what this may mean for the future of software development.

To understand how Codex is built, how teams inside OpenAI use it, and how it affects engineering practices at the makers of ChatGPT, I spoke with three OpenAI employees: Thibault Sottiaux, head of Codex.

Both products launched in the spring: Codex CLI was announced in April 2025, and Codex in ChatGPT was introduced in May.

On the Codex team, these files explain to the agent how to navigate the codebase, which commands to run for testing, and how to follow the project's standards.

Using Codex at OpenAI

Besides…

2 months, 3 weeks ago @ habr.com
Natural Language Processing & LLMs Course: New Season

On February 10 we are once again launching a free online course on natural language processing (NLP).

What we will cover: the classic beginning (Zipf's law, TF-IDF, RNN, CNN, Transformer); core NLP tasks (text classification, tagging, and generation); specific areas (agents and vibe coding); LLMs and their applications.

If you are a student at ITMO, MIPT, or HSE, the course can be counted for academic credit.

I have worked in NLP for more than 12 years, have worked at Yandex and VKontakte, and have defended a Candidate of Sciences dissertation.

If you have questions, bring them to the ODS Mattermost: all the answers, seminar times, and links will be there.

3 months, 2 weeks ago @ habr.com
SWE-MERA: A New Dynamic Benchmark for Agentic Code Generation Models

However, all tasks in MERA CODE, as indeed in SWE-bench and other benchmarks of this kind, follow the classical paradigm in which we have a fixed training dataset and, more importantly, a fixed validation set.

But the large language models for coding that we are trying to evaluate with our benchmark also learn from GitHub, and have done so since the very first LLaMa model.

It may seem that 700 tasks is not many, but that is already a very decent number, and most importantly, these are new tasks.

Current behavior:

    from sympy import ask, Q, Symbol
    x = Symbol('x')
    print(ask(Q.finite(x**-1), Q.real(x)))  # Output: True

Expected behavior: The function should return None to indicate uncertainty, as x**-…

8 months ago @ habr.com
DRAGON: A Dynamic Benchmark for Evaluating RAG Systems in Russian

Answer: Keisuke Chiba (Кэисукэ Тиба)

SPARQL query (Simple):

SELECT DISTINCT ?s ?r ?o WHERE { { SELECT ?s ?r ?o WHERE { ?s ?r ?o . }

GROUP BY ?s ?r HAVING(count(?o) = 1) } { SELECT ?s ?r ?o WHERE { ?s ?r ?o . }

Answer: National Payment Card System (NSPK); Center for Biometric Technologies (CBT); EBS

SELECT ?s ?r ?o ?len WHERE { { SELECT ?s ?r (COUNT(?o1) as ?len) (GROUP_CONCAT(DISTINCT(STR(?o1));separator="|") AS ?o) WHERE { ?s ?r ?o1 . }

FILTER(?o != ?o1) } GROUP BY ?o ?o1 ?r ?r1 HAVING(COUNT(?s) = 1) } UNION { SELECT ?s ?r ?o ?r1 ?s1 WHERE { ?s ?r ?o .

9 months, 3 weeks ago @ habr.com
Machine Learning Mastery
last post 5 days, 18 hours ago
Implementing Prompt Compression to Reduce Agentic Loop Costs

Agentic loops in production can be costly, especially when both the LLM and external applications are used through APIs whose billing is closely tied to token usage.

5 days, 18 hours ago @ machinelearningmastery.com
Implementing Permission-Gated Tool Calling in Python Agents

AI agents have evolved beyond passive chatbots.

1 week, 1 day ago @ machinelearningmastery.com
The Roadmap to Mastering Tool Calling in AI Agents

Most

1 week, 2 days ago @ machinelearningmastery.com
Implementing Statistical Guardrails for Non-Deterministic Agents

Non-deterministic agents are those where the same input can lead to distinct outputs across multiple runs.

1 week, 4 days ago @ machinelearningmastery.com
Agentic RAG Explained in 3 Levels of Difficulty

Traditional

1 week, 5 days ago @ machinelearningmastery.com
Effective KV Compression with TurboQuant

TurboQuant has recently been launched by Google as a novel algorithmic suite and library for applying advanced quantization and compression to large language models (LLMs) and vector search engines — an indispensable element of RAG systems.

2 weeks, 2 days ago @ machinelearningmastery.com
Building AI Agents in Python with Pydantic AI

2 weeks, 3 days ago @ machinelearningmastery.com
Effective Context Engineering for AI Agents: A Developer’s Guide

When

2 weeks, 4 days ago @ machinelearningmastery.com
Text Summarization with Scikit-LLM

In a

2 weeks, 5 days ago @ machinelearningmastery.com
Building AI Agents with Local Small Language Models

The idea of building your own AI agent used to feel like something only big tech companies could pull off.

3 weeks, 2 days ago @ machinelearningmastery.com
Train, Serve, and Deploy a Scikit-learn Model with FastAPI

FastAPI has become one of the most popular ways to serve machine learning models because it is lightweight, fast, and easy to use.

3 weeks, 3 days ago @ machinelearningmastery.com
AI Agent Memory Explained in 3 Levels of Difficulty

A stateless AI agent has no memory of previous calls.

3 weeks, 4 days ago @ machinelearningmastery.com
Getting Started with Zero-Shot Text Classification

Zero-shot text classification is a way to label text without first training a classifier on your own task-specific dataset.

3 weeks, 5 days ago @ machinelearningmastery.com
The Complete Guide to Inference Caching in LLMs

Calling a large language model API at scale is expensive and slow.

4 weeks, 1 day ago @ machinelearningmastery.com
Python Decorators for Production Machine Learning Engineering

You've probably written a decorator or two in your Python career.

1 month ago @ machinelearningmastery.com
ML in Production
last post None
Sorta Insightful
last post 2 months ago
Why I Signed The Amicus Brief for Anthropic v Department of War

On Monday, Anthropic filed a lawsuit against the Department of War, and an amicus brief in support of Anthropic was filed on behalf of a number of OpenAI and Google employees.

There’s also an amicus brief filed on behalf of Microsoft.

There’s conflicting reporting, but very broadly, Anthropic signed an agreement with the government to deploy Claude in classified, military contexts.

Anthropic said no, Pete Hegseth declared them a supply chain risk, and Anthropic filed a lawsuit against this.

The amicus brief was broadly aligned with my thoughts on the matter, so I signed.

2 months ago @ alexirpan.com
MIT Mystery Hunt 2026

This has spoilers for MIT Mystery Hunt 2026.

Pre-Hunt

The time running up to Hunt was more stressful than usual…very briefly, I typically hunt with teammate.

Just last year, I did GPH 2025, LN Hunt, Teammate Hunt 2025, Microsoft Hunt 2025, and Silph Puzzle Hunt 2025, all of which had significant 3+ hour solve puzzles that would not be out of place in Mystery Hunt.

Not to mention smaller hunts like Advent Hunt, and then I didn’t even do Brown Puzzlehunt or Vertex Hunt or the fall CMU Hunt.

To me, the crux is whether Mystery Hunt is broken, or Mystery Hunt is fine.

3 months, 2 weeks ago @ alexirpan.com
Authentic Imperfection

I’ve been thinking about the anger surrounding generative AI.

To keep things fair, he took the best human images and best AI images, meaning human art from famous artists, and AI art from prompters skilled at removing obvious tells of image generation.

When people complain about AI slop, I see it as a complaint against the deluge of default style AI images.

We’ve seen this happen in all forms: AI text, AI music, older forms of computer generated content like CGI.

As much as we celebrate imperfection, digital imperfection is a step too far.

6 months ago @ alexirpan.com
Ten Years Later

Every now and then, someone asks me why I blog, and I don’t really know what to tell them.

That’s another reason I’m not celebrating 10 years with more gusto: I know I’ve been writing less.

Indiana Jones and the Great Circle: I don’t know how they did it, but Indiana Jones and the Great Circle was just fun all the way through.

My one complaint is that the hand-to-hand combat feels like the worst part of the game, so of course they put a bunch of upgrades behind learning parry timings you’ll never use later.

I have not tried Peak, but Another Crab’s Treasure was really good and is worth playing if you’re interested in a Souls-like.

9 months ago @ alexirpan.com
Brony Musicians Seize The Means of Production: My Eyewitness Account to BABSCon 2025

A music concert in the evenings, typically set up as a rave with EDM or rock music made by brony musicians.

She has been involved in organizing pony music concerts for over a decade, for both BABSCon and other pony conventions.

Thank you, BABSCon Chairs

The brony musicians immediately jump into an emergency Discord call with Pinkaboo, to get her side of the story.

Other conventions start tweeting in support of the brony musicians, with no one taking BABSCon’s side.

It’s hard for me to explain why I like MLP fan music, because brony music really isn’t accessible.

9 months, 4 weeks ago @ alexirpan.com
Lil'Log
last post None
inFERENCe
last post 2 months, 2 weeks ago
The Future of Software

February 25, 2026

The world of software is undergoing a shift not seen since the advent of compilers in the 1970s.

How will humans tell AI agents what software artefacts we would like to create?

This future of software creation, in which our programming languages are abstracted away, raises two very important questions: What will the instruction/specification language look like?

This should be a clear layer of separation between the developer and the pool of AI agents working to maintain software.

2 months, 2 weeks ago @ inference.vc
Deep Learning is Powerful Because It Makes Hard Things Easy - Reflections 10 Years On

Ten years ago this week, I wrote a provocative and bold post that blew up and made it to the top spot on Hacker News.

In hindsight: There is a lot of stuff in deep learning that we don't understand nearly enough.

Sometimes things work for reasons completely unrelated to why we thought they would work.

(Pop some 🍿 in the microwave and read till the end for more)

🎯 "Deep learning is powerful exactly because it makes hard things easy"

Okay, this was a great insight.

🎯 Generative Modeling

In the post I suggested people learn "something harder" instead of - or in addition to - deep learning.

3 months, 2 weeks ago @ inference.vc
The Spectator
last post None
The Unofficial Google Data Science Blog
last post None
Off the Convex Path
last post None
Jay Alammar
last post None
Piekniewski's blog
last post None
fast.ai NLP
last post None
Sebastian Ruder
last post None
大トロ
last post None
🔬 Science
Papers With Code
last post 9 months, 4 weeks ago
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos.

Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point tracking, monocular depth, and camera pose estimation into a high-performing, feed-forward 3D point tracker.

It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion, with a fully differentiable and end-to-end architecture, allowing scalable training across a wide range of datasets, including synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage.

By learning geometry and motion jointly from …

9 months, 4 weeks ago @ paperswithcode.com
/antof27/ Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation

Calisthenics skill classification is the computer vision task of inferring the skill performed by an athlete from images, enabling automatic performance assessment and personalized analytics.

Traditional methods for calisthenics skill recognition are based on pose estimation methods to determine the position of skeletal data from images, which is later fed to a classification algorithm to infer the performed skill.

This work proposes a direct approach to calisthenics skill recognition, which leverages depth estimation and athlete patch retrieval to avoid the computationally expensive human pose estimation module.

Using Depth Anything V2 for depth estimation and YOLOv10 for athlete localizat…

9 months, 4 weeks ago @ paperswithcode.com
/snowflakedb/ Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI

Inference is now the dominant AI workload, yet existing systems force trade-offs between latency, throughput, and cost.

Arctic Inference, an open-source vLLM plugin from Snowflake AI Research, introduces Shift Parallelism, a dynamic parallelism strategy that adapts to real-world traffic while integrating speculative decoding, SwiftKV compute reduction, and optimized embedding inference.

It achieves up to 3.4 times faster request completion, 1.75 times faster generation, and 1.6M tokens/sec per GPU for embeddings, outperforming both latency- and throughput-optimized deployments.

Already powering Snowflake Cortex AI, Arctic Inference delivers state-of-the-art, cost-effective inference for ent…

9 months, 4 weeks ago @ paperswithcode.com
/NVIDIA/ FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale

FourCastNet 3 advances global weather modeling by implementing a scalable, geometric machine learning (ML) approach to probabilistic ensemble forecasting.

The approach is designed to respect spherical geometry and to accurately model the spatially correlated probabilistic nature of the problem, resulting in stable spectra and realistic dynamics across multiple scales.

FourCastNet 3 delivers forecasting accuracy that surpasses leading conventional ensemble models and rivals the best diffusion-based methods, while producing forecasts 8 to 60 times faster than these approaches.

In contrast to other ML approaches, FourCastNet 3 demonstrates excellent probabilistic calibration and retains realis…

9 months, 4 weeks ago @ paperswithcode.com
/jingyanw/ Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?

We study A/B experiments that are designed to compare the performance of two recommendation algorithms.

The bias arising from this type of data sharing is known as "symbiosis bias".

In this paper, we highlight that, for decision-making purposes, the sign of the global treatment effect (GTE) often matters more than its precise magnitude when selecting the better algorithm.

We formalize this insight under a multi-armed bandit framework and theoretically characterize when the sign of the expected GTE estimate under data sharing aligns with or contradicts the sign of the true GTE.

Our analysis identifies the level of exploration versus exploitation as a key determinant of how symbiosis bias impacts algorithm selection.

9 months, 4 weeks ago @ paperswithcode.com
/qqq-yi/ DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression

Task-agnostic prompt compression leverages the redundancy in natural language to reduce computational overhead and enhance information density within prompts, especially in long-context scenarios.

Existing methods predominantly rely on information entropy as the metric to compress lexical units, aiming to achieve minimal information loss.

However, these approaches overlook two critical aspects: (i) the importance of attention-critical tokens at the algorithmic level, and (ii) shifts in information entropy during the compression process.

Motivated by these challenges, we propose a dynamic attention-aware approach for task-agnostic prompt compression (DAC).

This approach effectively integrate…

9 months, 4 weeks ago @ paperswithcode.com
/lukasellinger/ Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions

Large Language Models (LLMs) can provide accurate word definitions and explanations for any context.

However, the scope of the definition changes for different target groups, like children or language learners.

We investigate how simplification impacts homonym definition quality across three target groups: Normal, Simple, and ELI5.

Our results show that simplification drastically degrades definition completeness by neglecting polysemy, increasing the risk of misunderstanding.

Fine-tuning Llama 3.1 8B with Direct Preference Optimization substantially improves homonym response quality across all prompt types.

9 months, 4 weeks ago @ paperswithcode.com
/pspdada/ Mitigating Object Hallucinations via Sentence-Level Early Intervention

Multimodal large language models (MLLMs) have revolutionized cross-modal understanding but continue to struggle with hallucinations - fabricated content contradicting visual inputs.

Existing hallucination mitigation methods either incur prohibitive computational costs or introduce distribution mismatches between training data and model outputs.

We identify a critical insight: hallucinations predominantly emerge at the early stages of text generation and propagate through subsequent outputs.

To address this, we propose SENTINEL (Sentence-level Early iNtervention Through IN-domain prEference Learning), a framework that eliminates dependency on human annotations…

9 months, 4 weeks ago @ paperswithcode.com
/owos/ FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Language models (LMs) are challenging to adapt to new data distributions by simple finetuning.

This is due to the rigidity of their subword tokenizers, which typically remain unchanged during adaptation.

This inflexibility often leads to inefficient tokenization, causing overfragmentation of out-of-distribution domains, unseen languages, or scripts.

In this work, we develop byte-level LMs with learnable tokenizers to make tokenization adaptive.

Our models include a submodule that learns to predict boundaries between the input byte sequence, encoding it into variable-length segments.
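To illustrate what "predicting boundaries" and "variable-length segments" can mean in practice, here is a toy sketch. It is not the paper's method: in FLEXITOKENS the boundary scores come from a learned submodule, whereas here they are hand-written numbers, and cutting wherever a score reaches a fixed `threshold` of 0.5 is an assumption made purely for illustration. `segment_bytes` is a hypothetical helper name.

```python
def segment_bytes(data: bytes, boundary_probs: list[float],
                  threshold: float = 0.5) -> list[bytes]:
    """Split data after every byte whose boundary score reaches the threshold."""
    if len(data) != len(boundary_probs):
        raise ValueError("need exactly one boundary score per input byte")
    segments = []
    start = 0
    for i, p in enumerate(boundary_probs):
        if p >= threshold:
            segments.append(data[start:i + 1])  # close the current segment here
            start = i + 1
    if start < len(data):
        segments.append(data[start:])  # trailing bytes with no predicted boundary
    return segments


if __name__ == "__main__":
    # Hand-written scores standing in for a learned boundary predictor.
    scores = [0.0, 0.0, 0.0, 0.0, 0.9, 0.8, 0.0, 0.0, 0.0, 0.0, 0.1]
    print(segment_bytes(b"hello world", scores))
```

Changing the scores changes the segmentation without touching a vocabulary, which is the flexibility the abstract contrasts with fixed subword tokenizers.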

9 months, 4 weeks ago @ paperswithcode.com
/wojiufukele/ Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion

To address the challenges posed by cascading reactions caused by component failures in autonomous cargo ships (ACS) and the uncertainties in emergency decision-making, this paper proposes a novel hybrid feature fusion framework for constructing a graph-structured dataset of failure modes.

A hierarchical feature fusion framework is constructed, using Word2Vec encoding to encode subsystem/component features, BERT-KPCA to process failure modes/reasons, and Sentence-BERT to quantify the semantic association between failure impact and emergency decision-making.

The dataset covers 12 systems, 1,262 failure modes, and 6,150 propagation paths.

In the label prediction results, the Shore-based Meteor…

9 months, 4 weeks ago @ paperswithcode.com
/YF-W/ Tri-Learn Graph Fusion Network for Attributed Graph Clustering

In recent years, models based on Graph Convolutional Networks (GCN) have made significant strides in the field of graph data analysis.

Although the Graph Transformer architecture has mitigated some of these issues, its performance is still limited when processing heterogeneous graph data.

To address these challenges, this study proposes a novel deep clustering framework comprising GCN, Autoencoder (AE), and Graph Transformer, termed the Tri-Learn Graph Fusion Network (Tri-GFN).

The tri-learning mechanism allows mutual learning among these modules, while the feature fusion strategy enables the model to capture complex relationships, yielding highly discriminative representations for gra…

9 months, 4 weeks ago @ paperswithcode.com
/mr-ravin/ APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation

We propose the APTx Neuron, a novel, unified neural computation unit that integrates non-linear activation and linear transformation into a single trainable expression.

The APTx Neuron is derived from the APTx activation function, thereby eliminating the need for separate activation layers and making the architecture both computationally efficient and elegant.

The proposed neuron follows the functional form $y = \sum_{i=1}^{n} ((\alpha_i + \tanh(\beta_i x_i)) \cdot \gamma_i x_i) + \delta$, where all parameters $\alpha_i$, $\beta_i$, $\gamma_i$, and $\delta$ are trainable.
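
A minimal sketch of the functional form above, written in plain Python for a single neuron. The function name `aptx_neuron` and the argument layout are illustrative assumptions, not the authors' implementation; in a real model the parameters would be learned rather than passed in by hand.

```python
import math
from typing import Sequence


def aptx_neuron(
    x: Sequence[float],
    alpha: Sequence[float],
    beta: Sequence[float],
    gamma: Sequence[float],
    delta: float,
) -> float:
    """Evaluate y = sum_i (alpha_i + tanh(beta_i * x_i)) * gamma_i * x_i + delta."""
    if not (len(x) == len(alpha) == len(beta) == len(gamma)):
        raise ValueError("x, alpha, beta and gamma must have the same length")
    total = 0.0
    for x_i, a_i, b_i, g_i in zip(x, alpha, beta, gamma):
        # Activation and linear weighting are combined in one expression per input.
        total += (a_i + math.tanh(b_i * x_i)) * g_i * x_i
    return total + delta
```

Setting every `beta` entry to zero removes the tanh term, so the neuron reduces to an ordinary affine map with weights `alpha[i] * gamma[i]` and bias `delta`.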

We validate our APTx Neuron-based architecture on the MNIST dataset, achieving up to 96.69% test accuracy in just 20 ep…

9 months, 4 weeks ago @ paperswithcode.com
/Rec4Fun/ A Reproducibility Study of Product-side Fairness in Bundle Recommendation

While this problem has been widely studied in traditional recommendation settings, its implications for bundle recommendation (BR) remain largely unexplored.

Existing fairness frameworks and metrics designed for traditional recommender systems may not directly translate to this multi-layered setting.

In this paper, we conduct a comprehensive reproducibility study of product-side fairness in BR across three real-world datasets using four state-of-the-art BR methods.

We analyze exposure disparities at both the bundle and item levels using multiple fairness metrics, uncovering important patterns.

Overall, our findings offer actionable insights for building fairer bundle recommender systems and…

9 months, 4 weeks ago @ paperswithcode.com
/cbobed/ OntView: What you See is What you Meant

However, the lack of tools that provide effective visualization is still a significant challenge.

In this paper, we present OntView, an ontology viewer that is designed to provide users with an intuitive visual representation of ontology concepts and their formal definitions through a user-friendly interface.

Building on the use of a DL reasoner, OntView follows a "What you see is what you meant" paradigm, showing the actual inferred knowledge.

One key aspect for this is its ability to visualize General Concept Inclusions (GCI), a feature absent in existing visualization tools.

OntView has been released with an open-source license for the whole community.

9 months, 4 weeks ago @ paperswithcode.com
/Rec4Fun/ RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction

These approaches fail to capture elaborate relations hidden in real-world bundle structures, resulting in suboptimal bundle representations.

To overcome this limitation, we propose RaMen, a novel method that provides a holistic multi-strategy approach for bundle construction.

RaMen utilizes both intrinsic (characteristics) and extrinsic (collaborative signals) information to model bundle structures through Explicit Strategy-aware Learning (ESL) and Implicit Strategy-aware Learning (ISL).

Integrating diverse strategies enables RaMen to learn more comprehensive and robust bundle representations.

Meanwhile, the Multi-strategy Alignment & Discrimination module is employed to facilitate knowledge tr…

9 months, 4 weeks ago @ paperswithcode.com
Papers With Code
last post 9 months, 4 weeks ago
/PrimisAI/ Adaptive Multi-Agent Reasoning via Automated Workflow Generation

The rise of Large Reasoning Models (LRMs) promises a significant leap forward in language model capabilities, aiming to tackle increasingly sophisticated tasks with unprecedented efficiency and accuracy.

However, despite their impressive performance, recent studies have highlighted how current reasoning models frequently fail to generalize to novel, unseen problems, often resorting to memorized solutions rather than genuine inferential reasoning.

In this paper, we introduce Nexus Architect, an enhanced iteration of our multi-agent system framework, Nexus, equipped with a novel automated workflow synthesis mechanism.

Given a user's prompt and a small set of representative examples, the Archi…

9 months, 4 weeks ago @ paperswithcode.com
/sharanya02/ Real Time Captioning of Sign Language Gestures in Video Meetings

One of the most tested ways to establish such a communication is through the use of sign based languages.

However, not many people are aware of the smaller intricacies involved with sign language.

Sign language recognition using computer vision aims at eliminating the communication barrier between deaf-mute and ordinary people so that they can properly communicate with others.

In recent studies, it has been found that people with hearing disabilities prefer to sign over typing during these video calls.

In this paper, we are proposing a browser extension that will automatically translate sign language to subtitles for everyone else in the video call.

9 months, 4 weeks ago @ paperswithcode.com
/alessiopittiglio/ Leveraging Context for Multimodal Fallacy Classification in Political Debates

In this paper, we present our submission to the MM-ArgFallacy2025 shared task, which aims to advance research in multimodal argument mining, focusing on logical fallacies in political debates.

Our approach uses pretrained Transformer-based models and proposes several ways to leverage context.

In the fallacy classification subtask, our models achieved macro F1-scores of 0.4444 (text), 0.3559 (audio), and 0.4403 (multimodal).

Our multimodal model showed performance comparable to the text-only model, suggesting potential for improvements.

9 months, 4 weeks ago @ paperswithcode.com
/RS2002/ One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms

On-demand ride-sharing platforms face the fundamental challenge of dynamically bundling passengers with diverse origins and destinations and matching them with vehicles in real time, all under significant uncertainty.

However, conventional MARL-based ride-sharing approaches heavily rely on the accurate estimation of Q-values or V-values, which becomes problematic in large-scale, highly uncertain environments.

To address these challenges, we propose two novel alternative methods that bypass value function estimation.

First, we adapt GRPO to ride-sharing, replacing the PPO baseline with the group average reward to eliminate critic estimation errors and reduce training bias.
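
The group-average baseline mentioned above can be sketched as follows. This is an illustrative reading of the abstract only: the function name `group_relative_advantages` is an assumption, and the paper's actual objective (clipping, any normalization, how groups are formed across agents) is not shown in this excerpt.

```python
from statistics import fmean
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float]) -> List[float]:
    """Advantage of each sample = its reward minus the mean reward of its group.

    No learned critic is involved, so there is no value-estimation error
    in the baseline itself.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one sample")
    baseline = fmean(rewards)  # group average reward replaces the critic's estimate
    return [r - baseline for r in rewards]
```

For a group of rewards `[1.0, 2.0, 3.0]` the baseline is 2.0, so the advantages are `[-1.0, 0.0, 1.0]`; actions that did better than their group are reinforced and the rest are discouraged.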

Second, inspired b…

9 months, 4 weeks ago @ paperswithcode.com
/LiXinran6/ Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation

9 months, 4 weeks ago @ paperswithcode.com
💼 University and corporation labs
DeepMind
last post 1 week, 3 days ago
AlphaEvolve: How our Gemini-powered coding agent is scaling impact across fields

A year ago, we introduced AlphaEvolve, a Gemini-powered coding agent for designing advanced algorithms.

Today, because algorithms are part of nearly every aspect of life, the landscape of what AlphaEvolve can achieve is even broader.

From helping explain the physics of the natural world to powering electricity grids and computing infrastructure, there are countless ways AlphaEvolve can help accelerate progress for scientists and businesses across a variety of fields.

We’re excited to share a collection of AlphaEvolve’s most significant impact to date.

Driving social impact and sustainability

AlphaEvolve has helped uncover key connections in health and sustainability research.

1 week, 3 days ago @ deepmind.google
Enabling a new model for healthcare with AI co-clinician

Health systems worldwide are striving for better outcomes, lower costs, and an improved experience for both patients and clinicians.

That's why, today, we are announcing our AI co-clinician research initiative, to explore how AI could better amplify doctors’ expertise and deliver higher quality care to patients.

We also have a long history of studying how clinicians and AI systems might work together.

This serves as the foundation of our AI co-clinician research initiative: AI designed to function as a collaborative member of the care team that interacts with patients under expert clinical supervision.

We designed and evaluated AI co-clinician in both clinician and patient-facing settings.

2 weeks, 2 days ago @ deepmind.google
Announcing our partnership with the Republic of Korea

Helping make this vision a reality, Google will establish an AI Campus in the Republic of Korea — an AI-focused facility within its Seoul offices.

AI co-scientist - a multi-agent AI system that acts as a virtual scientific collaborator to help researchers brainstorm and verify hypotheses.

To support the next generation of Korean AI talent, we are opening doors to forge connections with Google DeepMind, including exploring internship opportunities for Korean students.

Finally, following our Frontier AI Safety Commitments made at the AI Seoul Summit, we will collaborate with the Korean AI Safety Institute (AISI) on research and best practices.

By combining Google DeepMind's frontier AI models…

2 weeks, 5 days ago @ deepmind.google
Decoupled DiLoCo: A new frontier for resilient, distributed AI training

Training a frontier AI model traditionally depends on a large, tightly coupled system in which identical chips must stay in near-perfect synchronization.

Today, in a new paper, we are excited to share a new approach to this problem, called Decoupled DiLoCo (Distributed Low-Communication).

The result is a more resilient and flexible way to train advanced models across globally distributed data centers.

And crucially, Decoupled DiLoCo does not suffer the communication delays that made previous distributed methods like Data-Parallel impractical at global scale.

As frontier models continue to grow in scale and complexity, we’re exploring diverse approaches to train models across more compute, lo…

3 weeks, 3 days ago @ deepmind.google
Partnering with industry leaders to accelerate AI transformation

We’re joining forces with Accenture, Bain & Company, BCG, Deloitte, and McKinsey to bring the power of frontier AI to organizations around the world.

A new initiative for enterprise transformation

We’re partnering with global enterprise consultancies to help them deliver world-leading agentic transformation for customers at speed and scale.

Early access to frontier models: Partners will receive early access to our frontier models, including the Gemini family.

Access to AI leadership: We will connect our leadership with customer CEOs and boards, helping them navigate the future of frontier AI research and development.

Looking ahead

These efforts build upon Google Cloud’s work supporting globa…

3 weeks, 4 days ago @ deepmind.google
Gemini 3.1 Flash TTS: the next generation of expressive AI speech

Today, we’re introducing Gemini 3.1 Flash TTS, the latest text-to-speech model that delivers improved controllability, expressivity and quality — empowering developers, enterprises and everyday users to build the next generation of AI-speech applications.

Starting today, 3.1 Flash TTS is rolling out: for developers in preview via the Gemini API and Google AI Studio; for enterprises in preview on Vertex AI; and for Workspace users via Google Vids.

Improved speech quality and controllability

We’ve improved the overall speech quality of Gemini 3.1 Flash TTS, making it our most natural and expressive model to date.

On the Artificial Analysis TTS leaderboard, a benchmark that captures thousands of blind hum…

1 month ago @ blog.google
Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning

Today, we’re introducing Gemini Robotics-ER 1.6, a significant upgrade to our reasoning-first model that enables robots to understand their environments with unprecedented precision.

This model specializes in reasoning capabilities critical for robotics, including visual and spatial understanding, task planning and success detection.

Gemini Robotics-ER 1.6 shows significant improvement over both Gemini Robotics-ER 1.5 and Gemini 3.0 Flash, specifically enhancing spatial and physical reasoning capabilities such as pointing, counting, and success detection.

Starting today, Gemini Robotics-ER 1.6 is available to developers via the Gemini API and Google AI Studio.

To help you get started, we …

1 month ago @ deepmind.google
Gemma 4: Byte for byte, the most capable open models

By using these highly optimized models, you can fine-tune Gemma 4 to achieve state-of-the-art performance on your specific tasks.

All models natively process video and images, supporting variable resolutions, and excelling at visual tasks like OCR and chart understanding.

Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding.

The edge models feature a 128K context window, while the larger models offer up to 256K, allowing you to pass repositories or long documents in a single prompt.

1 month, 2 weeks ago @ blog.google
Reimagining the mouse pointer for the AI era

We are developing more seamless, intuitive ways to collaborate with AI

The mouse pointer has been a constant companion on computer screens, across every website, document and workflow.

Our goal is to address a common frustration: because a typical AI tool lives in its own window, users need to drag their world into it.

We want the opposite: intuitive AI that meets users across all the tools they use, without interrupting their flow.

Today, we’re outlining the underlying principles guiding our thinking on future user interfaces, and sharing experimental demos of an AI-enabled pointer, powered by Gemini.

For example, you could visit Google AI Studio to edit an image or find places on the map, …

1 month, 2 weeks ago @ deepmind.google
Gemini 3.1 Flash Live: Making audio AI more natural and reliable

Today, we’re advancing Gemini’s real-time dialogue capabilities with Gemini 3.1 Flash Live, our highest-quality audio and voice model yet.

It delivers the speed and natural rhythm needed for the next generation of voice-first AI, offering a more intuitive experience for developers, enterprises and everyday users.

3.1 Flash Live is available across Google products:

For developers: Robust reasoning and task execution

We’ve improved 3.1 Flash Live’s overall quality, making it more reliable for developers and enterprises to build voice-first agents that can complete complex tasks at scale.

On ComplexFuncBench Audio, a benchmark that captures multi-step function calling with various constraints, i…

1 month, 3 weeks ago @ blog.google
Protecting people from harmful manipulation

Why harmful manipulation matters

Consider two scenarios: One AI model gives you facts to make a well-informed healthcare decision that improves your well-being.

Another AI model uses fear to pressure you to make an ill-informed decision that harms your health.

Developing new evaluations for a complex challenge

Testing the outcomes of AI harmful manipulation

Testing for harmful manipulation is inherently difficult because it involves measuring subtle changes in how people think and act, varying heavily by topic, culture and context.

Our findings show that success in one domain does not predict success in another, validating our targeted approach to testing for harmful manipulation in specific, …

1 month, 3 weeks ago @ deepmind.google
Lyria 3 Pro: Create longer tracks in more

Vertex AI: Lyria 3 Pro is now in public preview on Vertex AI for businesses who require on-demand audio at scale.

Lyria 3 Pro is now available alongside Lyria RealTime in AI Studio.

Google Vids: Vids is an AI-powered video creation app that anyone can use.

This is rolling out to Google Workspace customers and Google AI Pro & Ultra subscribers starting this week.

Gemini app: Longer generations with Lyria 3 Pro are now available in the Gemini app, starting with paid subscribers.

1 month, 3 weeks ago @ blog.google
Measuring progress toward AGI: A cognitive framework

Artificial General Intelligence (AGI) has the potential to accelerate scientific discovery and help solve some of humanity’s most pressing problems.

Tracking progress toward AGI will require a wide range of methods and approaches, and we believe cognitive science provides one important piece of the puzzle.

That’s why today, we’re releasing a new paper, “Measuring Progress Toward AGI: A Cognitive Taxonomy,” that presents a scientific foundation for understanding the cognitive capabilities of AI systems.

Deconstructing general intelligence

Our framework draws on decades of research from psychology, neuroscience and cognitive science to develop a cognitive taxonomy.

It identifies 10 key cogniti…

2 months ago @ blog.google
From games to biology and beyond: 10 years of AlphaGo’s impact

Scientific collaboration: We are integrating the search and reasoning principles pioneered with AlphaGo into an AI co-scientist.

We’ve also used AI to better understand the genome, advance fusion energy research, improve weather prediction and more.

Future of intelligence

For an AI to be truly general, it needs to understand the physical world.

We think the combination of Gemini’s world models, AlphaGo’s search and planning techniques, and specialized AI tool use will prove to be critical for AGI.

True creativity is a key capability that such an AGI system would need to exhibit.

2 months, 1 week ago @ deepmind.google
Gemini 3.1 Flash-Lite: Built for intelligence at scale

Today, we're introducing Gemini 3.1 Flash-Lite, our fastest and most cost-efficient Gemini 3 series model.

Built for high-volume developer workloads at scale, 3.1 Flash-Lite delivers high quality for its price and model tier.

Starting today, 3.1 Flash-Lite is rolling out in preview to developers via the Gemini API in Google AI Studio and for enterprises via Vertex AI.

Cost-efficiency without compromise

Priced at just $0.25/1M input tokens and $1.50/1M output tokens, 3.1 Flash-Lite delivers enhanced performance at a fraction of the cost of larger models.
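
As a worked example of the quoted rates, the sketch below estimates the cost of a workload from its token counts. The helper name `estimate_cost_usd` is an illustrative assumption, and the two constants are simply the per-million-token prices stated above; actual billing rules (rounding, tiers, later price changes) are not covered here.

```python
# Prices quoted in the announcement, in USD per one million tokens.
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate cost as a linear function of input and output token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * INPUT_USD_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION / 1_000_000
    return input_cost + output_cost
```

For instance, 2,000,000 input tokens and 500,000 output tokens come to 0.50 + 0.75 = 1.25 USD at these rates.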

This low latency is needed for high-frequency workflows, making it an ideal model for developers to build responsive, real-time experiences.

2 months, 2 weeks ago @ blog.google
Google
last post 1 day, 14 hours ago
Gemini Live Agent Challenge: Announcing the winners and highlights

The Gemini Live Agent Challenge is officially in the books!

We challenged developers worldwide to break out of the traditional 'text box' paradigm by building next-generation AI agents.

Participants pushed the boundaries of interactive AI across three distinct categories: The Live Agent, The Creative Storyteller, and The UI Navigator.

Two of these standout developers were even recognized in person at Google Cloud Next 2026.

Celebrating our category winners at Google Cloud Next ’26

Category winners Jeremiah Somoine and Bryen Param were invited to attend Google Cloud Next 2026 in Las Vegas, where they shared their experiences and insights with the broader developer community.

1 day, 14 hours ago @ cloud.google.com
Cloud CISO Perspectives: How Google + Wiz changes multicloud strategy for CISOs

Through innovations like Wiz Code, developers get granular data linking production issues directly back to their repositories, empowering them to fix vulnerabilities right where the code is written.

Supercharging the agentic SOC future with data and automation

Data is the lifeblood of AI and cloud security.

Wiz currently sits on a trove of sanitized data that captures the characteristics of highly secure, resilient, and compliant multicloud environments.

Wiz’s Red, Blue, and Green agents, and Google Security Operations’ Threat Hunting, Detection Engineering, and Third-Party Context agents, can help you develop the human-above-the-loop approach that empowers security teams to rapidly scale up…

2 days, 14 hours ago @ cloud.google.com
The power of LLMs on your data, more than two orders of magnitude faster and cheaper

Proxy models are cost-optimized ultra-lightweight models tailored to a specific query (aka prompt) and tuned for your data.

The fundamental ideas behind proxy models were proposed in Universal Query Engine (UQE) at NeurIPS 2024 by Google DeepMind.

Furthermore, the proxy models run fast on the CPU, with no need for dedicated hardware.

We hope that we gave you good intuitions for why proxy models work.

How do proxy models work?

3 days, 14 hours ago @ cloud.google.com
How Glance turns hours of video into mobile-ready clips with AI

Every day, thousands of hours of new video content sit waiting to be discovered.

Most of it lives in long-form, horizontal formats, while audiences are scrolling through vertical feeds on their phones.

Here’s how Glance’s video generation solution works.

Building for the lock screen era

The goal was to create a complete pipeline that takes a long-form landscape video (16:9) and outputs multiple ready-to-publish short-form portrait videos (9:16).

This includes distinguishing between a static image and a live person to ensure the crop focuses on the actual speaker.
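
The geometry of the 16:9 to 9:16 reframing can be illustrated with a small calculation. This is only a sketch of the crop arithmetic under stated assumptions: the helper `portrait_crop_window` and the idea of passing in a speaker's horizontal position `center_x` are hypothetical, and Glance's actual detection and cropping logic is not described in this excerpt.

```python
from typing import Tuple


def portrait_crop_window(width: int, height: int, center_x: int) -> Tuple[int, int]:
    """Return (left, crop_width) of a full-height 9:16 window inside a landscape frame.

    The window keeps the full frame height, is centred on center_x where
    possible, and is shifted inward when it would cross a frame edge.
    """
    crop_width = height * 9 // 16  # width of a 9:16 window at full height
    if crop_width > width:
        raise ValueError("frame is too narrow for a full-height 9:16 crop")
    left = center_x - crop_width // 2
    left = max(0, min(left, width - crop_width))  # clamp to the frame
    return left, crop_width
```

For a 1920x1080 frame the window is 607 pixels wide; a subject at `center_x=960` gives a left edge of 657, while a subject near either border yields a window pinned to that border.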

3 days, 14 hours ago @ cloud.google.com
How Imgix processes 8 billion images daily with G4 VMs powered by NVIDIA Blackwell

By transitioning to G4 VMs powered by NVIDIA RTX PRO 6000 Blackwell GPUs, Imgix ramped up its real-time processing capabilities, cutting median latency by 50% and increasing throughput per node by 6x.

And with G4 VMs, they were able to process images instantly upon request rather than pre-rendering and storing millions of image variations.

Imgix is leveraging this structural advantage by using G4 VMs.

The G4 VM’s custom P2P interconnect yields up to 168% more throughput than standard configurations.

With this architecture, Imgix can move all its image processing operations to NVIDIA GPUs and run multiple requests in parallel.

4 days, 14 hours ago @ cloud.google.com
Beyond source code: The files AI coding agents trust — and attackers exploit

Attack surface: What executes

Just as developers rely on project configuration to automate setup, debugging, and routine tasks, AI coding agents and modern developer tools also inherit execution paths from repository files.

Attack surface: What instructs

AI coding agents also consume persistent instruction files that shape how they behave inside a project.

These files can influence what the agent prioritizes, what it ignores, which tools it uses, which files it trusts, and which actions it takes automatically.

Reusing them across repositories introduces a supply-chain risk, because malicious instructions can be presented as harmless guidance while steering otherwise legitimate agent workflows…

4 days, 14 hours ago @ cloud.google.com
Cloud Storage Rapid: Turbocharged object storage for AI and analytics

Cloud Storage Rapid is our response to the generational shift in how organizations build with AI.

Rapid Bucket

Rapid Bucket (GA) helps Cloud Storage meet the evolving demands of massive-scale generative AI, analytics, and other high-performance workloads.

Lightning-fast performance

By combining the sub-millisecond latency of block-like storage, the throughput of a parallel filesystem, and the scalability and ease of use of object storage, Rapid Bucket provides high performance from the same Cloud Storage that you know and love.

Massive scalability: Rapid Bucket delivers 15+ TB/s of aggregate read throughput from a single Rapid zonal bucket.

Faster checkpointing: Rapid Bucket makes checkpoin…

5 days, 13 hours ago @ cloud.google.com
Cluster-level reliability for trillion-parameter models on TPUs

Frontier AI models have redefined the unit of compute.

At trillion-parameter scale, AI training requires thousands of interconnected components, orchestrated in industrial-scale deployments to operate as a single, massive entity.

Yet for almost two decades, instance-level reliability has been the cloud standard.

Designed for microservices and horizontally scalable applications, instance-level reliability treats infrastructure as a collection of small independent units.

Deep dive: The mathematics of availability at scale

Instance-level reliability models are often deterministic, but industrial-scale AI deployments require a probabilistic approach over thousands of chips.
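
The post's own derivation is not reproduced in this excerpt. As a generic illustration of why a probabilistic view matters at this scale, the sketch below assumes that each chip is independently available with the same probability and that a job needs every chip at once; both assumptions, and the function name `cluster_availability`, are ours rather than the article's.

```python
def cluster_availability(per_chip_availability: float, n_chips: int) -> float:
    """Probability that all n_chips are up simultaneously, assuming independence."""
    if not 0.0 <= per_chip_availability <= 1.0:
        raise ValueError("per_chip_availability must be in [0, 1]")
    if n_chips < 0:
        raise ValueError("n_chips must be non-negative")
    # Independent events multiply, so the joint probability is p ** n.
    return per_chip_availability ** n_chips
```

Under these assumptions, chips that are each available 99.9% of the time give a 1,000-chip job only about a 37% chance of finding every chip up at a given moment, which is why per-instance guarantees say little about a tightly coupled cluster.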

5 days, 13 hours ago @ cloud.google.com
Gemini 3.1 Flash-Lite is now generally available on Gemini Enterprise Agent Platform

Today, we’re thrilled to announce that Gemini 3.1 Flash-Lite, our fastest and most cost-efficient Gemini 3 series model yet, is now generally available.

Designed for ultra-low latency, high-volume tasks, and unmatched cost-efficiency, Flash-Lite is already transforming how applications are built at scale.

Developers and enterprises have noted that the model provides the precision required for agentic tasks like tool calling and orchestration, coupled with the cost-efficiency needed to run automated pipelines at scale.

Software development and engineering

Engineering teams require models that can keep pace with real-time coding environments.

With the GA of Gemini 3.1 Flash-Lite, developers ar…

1 week, 2 days ago @ cloud.google.com
How BASF manages thousands of supply chain decisions with AlphaEvolve’s agentic algorithms

To understand how local decisions ripple across their entire global network, BASF turned to AlphaEvolve on Google Cloud to build a digital twin of their supply chain.

Because the network is so large, a planner can’t easily see how a localized decision affects the rest of the global supply chain.

“We had several attempts to build a digital twin for our complex supply network using deterministic models, and all of them failed,” said Dr. Goetz Krabbe, vice president for global supply chain at BASF.

Using it we can optimize our inventory levels and respond to market volatility with confidence while avoiding stockouts."

BASF’s objective is to create a digital twin of their entire global producti…

1 week, 2 days ago @ cloud.google.com
Pioneering AI-assisted code migration: How Google achieved 6x faster migration from TensorFlow to JAX

However, AI model migration represents a whole new level of complexity that requires even more advanced methods for AI-assisted migration.

Translating a production-grade machine learning model from one framework to another, for example, from TensorFlow (TF) to JAX, is not a simple syntax update.

The result is 6x faster model migration, a milestone Sundar highlighted in the recent Google Cloud Next keynote.

Designed around a functional, stateless paradigm, JAX is heavily optimized for modern Tensor Processing Unit (TPU) infrastructure and XLA compilation, making it the bedrock of the modern AI stack.

Manually migrating these models to JAX requires a fundamental rethinking of how layers inter…

1 week, 3 days ago @ cloud.google.com
The Blueprint: Translating stream-of-conscious speech into responsive, actionable task lists

The challenge: We launched Ramble to take our popular Todoist application to the next level by capturing non-stop, stream-of-consciousness talking.

Our inspiration was that scene from The Devil Wears Prada where Miranda Priestly rapid-fires a dozen tasks at her assistant.

The solution: We built Ramble using Gemini Enterprise Agent Platform and its previous iteration, Vertex AI; specifically, we’re using Agent Platform to access the Gemini Flash models.

Gemini’s Live API (accessed via Agent Platform) powers Ramble’s core real-time interactions and key capabilities, including native audio streaming, proactive tool calling, session resumption, and multilingual understanding.

The APIs in Agent Pl…

1 week, 3 days ago @ cloud.google.com
Fitting the future: How Breuninger boosted sales with its "be your own model" AI

Breuninger, a fashion and lifestyle company based in Germany, thought emerging generative media models could be a good fit for this fashion conundrum.

Working with Google Cloud, they built a virtual try-on experience that lets shoppers see high-end fashion on their own bodies using a simple selfie.

From trusted tester to live product: The project began when the Google Cloud team in Germany invited Breuninger to join the Trusted Tester Program for the Virtual Try-On (VTO) API.

The 'Be your own model' breakthrough: User feedback showed that customers did not just want to see a model; they wanted to see themselves.

The product owner at Breuninger noted that this close collaboration allowed the t…

1 week, 3 days ago @ cloud.google.com
Five must-have guides to move agents into production with Gemini Enterprise Agent Platform

Building AI agents that work well in a demo is one thing, but running them in production requires serious infrastructure.

At Next 26, we announced that Agent Runtime now supports long-running agents that maintain state for up to seven days.

In this article, we’ll share five essential agent design patterns for building long-running agents with Agent Runtime.

The pattern we saw with shadow IT in 2015 is repeating itself with AI agents.

Deep dive: How A2A and MCP work together. Organizations will rarely build every AI agent they need entirely from scratch.

1 week, 4 days ago @ cloud.google.com
Introducing Agent Gateway ISV ecosystem for security and governance

Exabeam can ingest and analyze telemetry from Agent Platform including Agent Gateway, applying behavioral analytics to identify anomalous and high‑risk AI agent activity.

Integrated via Agent Gateway, it enforces data security and policy controls to ensure agent interactions remain governed and compliant across all models.

Ping Identity: Ping Identity integrates with Agent Gateway to bring runtime identity and real-time, fine-grained authorization to agent and tool traffic.

The integration with Agent Gateway ensures every request is continuously verified based on user, agent, context, and policy, rather than relying on static credentials.

Thales (Imperva): Thales provides advanced web appli…

1 week, 4 days ago @ cloud.google.com
OpenAI
last post: None
Microsoft
last post: 1 day, 12 hours ago
Further Notes on Our Recent Research on AI Delegation and Long-Horizon Reliability

Our recent paper, “LLMs Corrupt Your Documents When You Delegate”, has generated discussion about the reliability of AI systems in delegated workflows.

Using a controlled evaluation methodology, we examine how well information is preserved across these extended workflows.

We use chained transformation-and-inversion tasks that evaluate whether semantic content is preserved accurately across extended delegated workflows.

At the same time, the findings should not be interpreted as evidence that AI systems lack practical value in real-world work today.

1 day, 12 hours ago @ microsoft.com
mimalloc: A new, high-performance, scalable memory allocator for the modern era

mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free.

The mimalloc memory allocator was initially designed in 2020 as a fast allocator for the state-of-the-art Lean and Koka programming languages developed at RiSE, both of which use novel compiler-guided reference counting (see Perceus).

ja .LBB0_generic
leaq 7(%rsi), %rax ; round to sizeof(void*)
andq $-8, %rax
movq 232(%rdi,%rax), %rcx ; rcx = heap->small_pages[index]
movq 8(%rcx), %rax ; block = rax = page->free
testq %rax, %rax ; block == NULL?

Thus, mimalloc has three free lists per (64 KiB) mimalloc page, and effectively that …
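
The fast path in the assembly fragment above can be sketched in Python, purely as an illustration. The names `Block`, `Page`, `size_class`, and `malloc_small` are hypothetical and are not mimalloc's API. The final pop step (`page.free = block.next`) is an assumption about what follows the NULL test, since the excerpt stops there.

```python
WORD = 8  # sizeof(void*) on a 64-bit target, matching the assembly comment


class Block:
    """A free block that stores the link to the next free block."""

    def __init__(self, addr, next_block=None):
        self.addr = addr
        self.next = next_block


class Page:
    """A page holding a singly linked free list of equally sized blocks."""

    def __init__(self, addrs):
        self.free = None
        for addr in reversed(addrs):
            self.free = Block(addr, self.free)


def size_class(size):
    """Round size up to a multiple of WORD (leaq 7 / andq -8) and return the word count."""
    return ((size + WORD - 1) & -WORD) // WORD


def malloc_small(small_pages, size):
    """Pop the head of the free list for this size class, or return None."""
    page = small_pages.get(size_class(size))
    if page is None or page.free is None:
        return None  # a real allocator would take the generic slow path here
    block = page.free
    page.free = block.next  # assumed pop step, not shown in the excerpt
    return block.addr


pages = {size_class(24): Page([0x1000, 0x1018])}
first = malloc_small(pages, 20)   # 20 bytes rounds up to the 24-byte class
second = malloc_small(pages, 24)
third = malloc_small(pages, 24)   # list is now empty
```

The point of the sketch is that the common case touches only one table lookup and one list head, which is why the compiled fast path is a handful of instructions.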

3 days, 12 hours ago @ microsoft.com
GridSFM: A new, small foundation model for the electric grid

Microsoft releases a lightweight foundation model that can predict AC optimal power flow in milliseconds, boosting efficiency and unlocking cost savings in grid analysis.

It provides a foundation for the community to build advanced power grid simulators and planning tools without recreating data or models from scratch.

Microsoft introduces GridSFM, a small foundation model for solving AC optimal power flow (AC-OPF) problems in transmission power grids.

Power grids face increasing strain from surging demand, the need to integrate renewable energy sources, transportation electrification, and extreme weather events.

This release adds the first open AC-OPF model that supports multiple grid topo…

3 days, 14 hours ago @ microsoft.com
Advancing AI for materials with MatterSim: experimental synthesis, faster simulation, and multi-task models

Now we have experimentally synthesized it and measured its thermal conductivity (152 W/m/K) to be close to the thermal conductivity of silicon.

Faster simulation: We have accelerated MatterSim-v1 model inference by 3-5x and integrated it with the LAMMPS software package, enabling large-scale simulations across multiple GPUs.

These include experimental validation of MatterSim predictions for thermal conductors, performance improvements for faster simulation, and the introduction of a new multi-task foundation model for materials characterization.

Le…

4 days, 17 hours ago @ microsoft.com
SocialReasoning-Bench: Measuring whether AI agents act in users’ best interests

In our simulated multi-agent marketplace, agents accepted the first proposal they received up to 93% of the time without exploring alternatives.

Introducing SocialReasoning-Bench. Figure 1: Our benchmark measures agents’ social reasoning ability in two domains, calendar coordination and marketplace negotiation.

Finding 3: Outcome optimality shows how much value agents leave on the table.

In calendar, agents perform better but still settle below the midpoint on average.

Finally, Outcome Optimality works well in settings with clear boundaries, where a “good” outcome can be defined and measured.

5 days, 12 hours ago @ microsoft.com
Building realistic electric transmission grid dataset at scale: a pipeline from open dataset

The ability to study transmission-level power grid behavior is essential for modern power systems research.

In most of the world, including the United States, realistic transmission-level grid data is classified as critical infrastructure information and subject to strict access controls.

These restrictions exist for good reasons, but the resulting lack of realistic grid models is increasingly exacerbating the challenges power systems face.

In this work, we introduce an open-data-derived pipeline for constructing large-scale, transmission-level power grid models that realistically approximate existing networks without relying on proprietary or restricted datasets.

Using only publicly access…

1 week, 1 day ago @ microsoft.com
Microsoft at NSDI 2026: Advances in large-scale networked systems

The USENIX Symposium on Networked Systems Design and Implementation 2026 (NSDI ’26) is a leading forum where researchers and practitioners share new research, insights, and advances in the design and operation of these systems.

Microsoft is proud to support NSDI ’26 as a returning sponsor, reflecting our ongoing commitment to advancing systems and networking research and engaging with the broader community.

Together, they highlight advances in building and operating large-scale networked systems.

Wednesday, May 6, 9:00–10:20 AM: Yuxuan Yan, Zhejiang …

1 week, 4 days ago @ microsoft.com
Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale

Actions that seem harmless can cascade, causing a chain reaction across an agent network.

Invisibility: Information can pass through chains of unaware agents, making the source of an attack hard to trace from any single agent’s perspective.

Each independently contacted a victim agent (Bob) about the same fabricated audit, using varied language and staggered timing to appear unrelated.

Experimental setup: A principal entrusts their agent, Bob, with sensitive personal data: disability accommodation, medical schedule, preferred pharmacy, emergency contact.

Agents relayed summaries of other agents’ private messages to the attacker (one forwarded another agent’s message within seconds), and agent…

2 weeks, 2 days ago @ microsoft.com
AutoAdapt: Automated domain adaptation for large language models

At a glance. Problem: Adapting large language models to specialized, high-stakes domains is slow, expensive, and hard to reproduce.

Why it matters: The result is faster, automated, more reliable domain adaptation that turns weeks of manual iteration into repeatable pipelines.

Deploying large language models (LLMs) in real-world, high-stakes settings is harder than it should be.

In our paper, “AutoAdapt: An Automated Domain Adaptation Framework for Large Language Models,” we describe an end-to-end, constraint-aware framework for domain adaptation.

3 weeks, 3 days ago @ microsoft.com
Can we AI our way to a more sustainable world?

Because I do think there’s a role for AI, a huge role for AI.

BURGER: Right, right.

So I think that’s also something quite important here that, you know, AI can help facilitate.

And I think that’s not just applying AI to solve solutions through optimization but also thinking about this in an integrated way.

3 weeks, 5 days ago @ microsoft.com
New Future of Work: AI is driving rapid change, uneven benefits

Publication: New Future of Work Report 2025. The New Future of Work report brings together research from inside and outside of Microsoft to understand what is happening as AI enters workplaces.

But usage and confidence vary widely across sectors, and men report using AI at work more often than women.

AI systems are increasingly playing a role in decision-making, creativity, and communication, with AI systems being positioned as a “collaborator.” This raises questions about how to support “collaboration” between people and AI, what we can learn from how people interact with each other, and where the capabilities of AI systems raise different opportunities and create different requirements.

Usin…

1 month, 1 week ago @ microsoft.com
Ideas: Steering AI toward the work future we want

JANSSEN: Yeah, yeah, exactly.

TEEVAN: Yeah, yeah, yeah.

I’m curious what you have found particularly surprising about how people and organizations are leveraging AI right now.

And so I do like to picture a future of work where humans are flourishing with AI and where humans still get to do meaningful work.

And I’m very curious about how we can take advantage of AI and do more without running ourselves into the ground because we’re not AI, right?

1 month, 1 week ago @ microsoft.com
ADeLe: Predicting and explaining AI performance across tasks

By linking outcomes to task demands, ADeLe explains differences in performance, showing how it changes as task complexity increases.

AI benchmarks report how large language models (LLMs) perform on specific tasks but provide little insight into the underlying capabilities that drive that performance.

Top: (1) Model performance on the ADeLe benchmark and (2) the resulting ability profiles, showing each model’s strengths and limitations across core abilities.

Evaluating ADeLe: Using ADeLe, the team evaluated a range of AI benchmarks and model behaviors to understand what current evaluations capture and what they miss.

This makes it possible to both explain and anticipate potential failures b…

1 month, 2 weeks ago @ microsoft.com
AsgardBench: A benchmark for visually grounded interactive planning

At a glance: To successfully complete tasks, embodied AI agents must ground and update their plans based on visual feedback.

Spanning 108 controlled task instances across 12 task types, the benchmark requires agents to adapt their plans based on what they observe.

Evaluating AsgardBench: We tested several leading vision-capable models on AsgardBench and observed that high-performing models require visual grounding to consistently succeed.

Across the models, visual input substantially improved performance: most models more than doubled success rates when given images versus text-only descriptions of the scene.

AsgardBench is open source and available on GitHub, providing a fo…

1 month, 3 weeks ago @ microsoft.com
GroundedPlanBench: Spatially grounded long-horizon task planning for robot manipulation

Video-to-Spatially Grounded Planning (V2GP) is a framework that converts robot demonstration videos into spatially grounded training data, enabling models to learn planning and grounding jointly.

Grounded planning improves both task success and action accuracy, outperforming decoupled approaches in benchmark and real-world evaluations.

We also built Video-to-Spatially Grounded Planning (V2GP), a framework that converts robot demonstration videos into training data to help VLMs learn this capability.

Decoupled vs. grounded planning, illustrating how ambiguous language causes actions to be grounded to the wrong objects.

In contrast, our approach, grounded planning, performs planning and groun…

1 month, 3 weeks ago @ microsoft.com
MIT AI
last post: 2 days, 9 hours ago
Two from MIT named 2026 Knight-Hennessy Scholars

MIT master’s student Sunshine Jiang ’25 and Rupert Li ’24 are recipients of this year’s Knight-Hennessy Scholarship.

Rupert Li ’24: Rupert Li, from Portland, Oregon, is currently pursuing a PhD in mathematics at Stanford School of Humanities and Sciences.

He graduated from MIT in 2024 with a bachelor’s degree, double majoring in mathematics and computer science, economics, and data science.

Along with his bachelor’s degree, he also received a master’s degree in data science.

In addition to the Knight-Hennessy Scholarship and the Marshall Scholarship, he has been awarded the Hertz Fellowship, P.D.

2 days, 9 hours ago @ news.mit.edu
Universal AI is “a pathway to AI fluency that’s accessible and approachable to anyone, anywhere”

“Universal AI was built to thread that needle.

The result is a pathway to AI fluency that’s approachable to anyone, anywhere.” The need for accessible, practical AI education has never been greater.

Universal AI also includes industry-specific courses that dive into the intersection of AI and health care, sustainability, entrepreneurship, transportation, and more.

Six industry-specific courses are available today, including Holistic AI in Medicine, AI and Entrepreneurship, and AI and Sustainability: Energy.

“MIT’s long history of making knowledge available through MIT Open Learning means it’s only natural we’d feel compelled to bring Universal AI to the world,” adds Kornbluth.

4 days, 15 hours ago @ news.mit.edu
Q&A: Expanding MIT’s global reach through Universal Learning

MIT's Universal Learning is a new initiative from MIT Open Learning designed to prepare learners everywhere to tackle complex global challenges through boundary-crossing thinking.

Universal AI, the first offering from Universal Learning, launched to the public today.

Dimitris Bertsimas, vice provost for open learning, and Megan Mitchell, senior director of Universal Learning, share how Universal Learning supports MIT’s educational mission, and what makes it distinctive.

My colleagues contributing to current and forthcoming Universal Learning offerings share this same passion.

That’s why with Universal Learning we are prioritizing asynchronous delivery, mobile delivery, translations, and are…

4 days, 15 hours ago @ news.mit.edu
Study: Firms often use automation to control certain workers’ wages

Rather than implement automation in pursuit of maximal productivity, firms have often used automation to replace employees who specifically receive a “wage premium,” earning higher salaries than other comparable workers.

For one thing, automation has affected the growth in U.S. income inequality even more than many observers realize.

This inefficient targeting of certain employees has offset 60-90 percent of the productivity gains from automation during the time period.

Inequality implications: Dating back to the 2010s, Acemoglu and Restrepo have combined to conduct many studies about automation and its effects on employment, wages, productivity, and firm growth.

Certain types of automation c…

1 week, 3 days ago @ news.mit.edu
Games people — and machines — play: Untangling strategic reasoning to advance AI

Nevertheless, one month after graduating with his undergraduate degree, Farina began a doctoral degree in computer science at Carnegie Mellon University.

As he was finishing his doctorate, Farina worked for a year as a research scientist in Meta’s Fundamental AI Research Labs.

An everyday example occurs in the game of poker, where players bluff in order to conceal information about their cards.

Stratego is a military strategy game that has inspired research efforts costing millions of dollars to produce systems capable of beating human players.

I am excited about seeing these algorithms incorporated into the broader AI revolution that’s happening around us.”

1 week, 4 days ago @ news.mit.edu
Beacon Biosignals is mapping the brain during sleep

Beacon Biosignals is working to make sense of the brain by monitoring its activity while people sleep.

With each deployment, Beacon learns more about how the brain works — insights it is using to create a “foundation model” of the brain.

“It was clear sleep was the right window to understand the brain,” Donoghue says.

“What’s powerful is that we’re building a longitudinal record of brain function over time,” Donoghue says.

That turns routine testing into a foundation for entirely new prognostic biomarkers — and a path to detecting and intervening in brain disease earlier, potentially before symptoms ever begin.”

2 weeks, 2 days ago @ news.mit.edu
Improving understanding with language

She learned French from her relationships with Haitian family friends, and American Sign Language because of another friend’s deaf sibling.

“There are so many things that are different about sign language and spoken language,” she says.

“It’s the only reason I’m on the path I’ve chosen,” she continues, one that features a focus on language acquisition, education policy, LLMs’ computational possibilities and limitations, and education reform.

Language is a medium for thought and provides guardrails to improve understanding.

“Support research,” Honeycutt says.

2 weeks, 2 days ago @ news.mit.edu
Making the case for curiosity-driven science

Kornbluth spoke about everything from the importance of curiosity-driven science and why basic science is critical to our nation’s future, to AI and education, and even bravely joined O’Leary in a rendition of the Williams College song, “The Mountains,” in honor of their shared alma mater.

“We are in this time of incredible uncertainty,” said Kornbluth of the current state of higher education and funding for scientific research.

Behind the scenes, I am – along with many other [university] presidents – I am in D.C. all the time now.

Universities are where most of the science with a long pathway to impact, requiring patience, starts.

With that pipeline being drained, what does the future hold…

2 weeks, 3 days ago @ news.mit.edu
Solving the “Whac-a-mole dilemma”: A smarter way to debias AI vision models

Perhaps one of the best known and most persistent challenges that AI research continues to reckon with is bias.

Bias is often discussed in relation to training data, but model architecture can also contain and amplify bias, negatively influencing model performance in real-world settings.

VLMs are multi-modal models that can understand and interpret different data modalities like video, image, and text simultaneously.

And like projection debiasing, WRING is a post-processing approach, which means it can be applied “on the fly” to a pre-trained VLM.

“Extending this for ChatGPT-style, generative language models, is the reasonable next step for us,” says Gerych.

2 weeks, 3 days ago @ news.mit.edu
The MIT-IBM Computing Research Lab launches to shape the future of AI and quantum computing

IBM and MIT today announced the launch of the MIT-IBM Computing Research Lab, advancing their long-standing collaboration to shape the next era of computing.

The MIT-IBM Computing Research Lab builds on a distinguished history of scientific excellence at the intersection of research and academia.

“We expect the MIT-IBM Computing Research Lab to emerge as one of the world’s premier academic and industrial hubs accelerating the future of computing,” says Jay Gambetta, director of IBM Research and IBM Fellow, and IBM chair of the MIT-IBM Computing Research Lab.

The MIT-IBM Computing Research Lab will also leverage IBM’s longtime leadership and expertise in quantum computing.

Deep integration w…

2 weeks, 3 days ago @ news.mit.edu
Enabling privacy-preserving AI training on everyday devices

A new method developed by MIT researchers can accelerate a privacy-preserving artificial intelligence training method by about 81 percent.

This advance could enable a wider array of resource-constrained edge devices, like sensors and smartwatches, to deploy more accurate AI models while keeping user data secure.

Each device trains the model using its local data and then transfers model updates back to the server.

This new approach could make it more feasible for AI models to be used in high-stakes applications with strict security and privacy standards, like health care and finance.

The central server usually waits to receive model updates from all devices, then averages them to complete th…
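
The baseline loop described above (local training on each device, then server-side averaging) can be sketched as follows. This is a toy under stated assumptions: the one-parameter model, the learning rate, the unweighted mean, and the names `local_update` and `server_round` are all hypothetical, and the sketch does not reproduce the MIT researchers' accelerated method.

```python
def local_update(w, data, lr=0.1):
    """One gradient step on a device's private data for the model y = w * x."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def server_round(w, devices):
    """Send w to every device, wait for all updates, then average them."""
    updates = [local_update(w, data) for data in devices]  # raw data never leaves a device
    return sum(updates) / len(updates)  # unweighted mean (an assumption)


# Three devices whose local data all follow y = 2 * x.
devices = [
    [(1.0, 2.0)],
    [(2.0, 4.0)],
    [(1.0, 2.0), (3.0, 6.0)],
]

w = 0.0
for _ in range(50):
    w = server_round(w, devices)
```

Because `server_round` blocks until every device has reported, the slowest device sets the pace of each round, which is the kind of bottleneck that resource-constrained edge hardware makes worse.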

2 weeks, 4 days ago @ news.mit.edu
A faster way to estimate AI power consumption

Improving data center energy efficiency is one way scientists are striving to make AI more sustainable.

In addition, this tool could allow algorithm developers and model providers to assess potential energy consumption of a new model before they deploy it.

The power consumption of a particular GPU will vary based on its configuration and the workload it is handling.

They could use these patterns to generate the information needed for reliable but quick power estimation.

The user can also change the GPU configuration or adjust the operating speed to see how such design choices impact the overall power consumption.

2 weeks, 6 days ago @ news.mit.edu
MIT scientists build the world’s largest collection of Olympiad-level math problems, and open it to everyone

Every year, the countries competing in the International Mathematical Olympiad (IMO) arrive with a booklet of their best, most original problems.

MathNet also functions as a rigorous benchmark for AI performance, and the results reveal a more complicated picture than recent headlines about AI math prowess might suggest.

Even GPT-5, the top-performing model tested, averaged around 69.3 percent on MathNet's main benchmark of 6,400 problems, failing nearly one-in-three Olympiad-level problems.

The diversity of MathNet is also designed to address a deeper limitation in how AI models learn mathematics.

When training data skews toward English and Chinese problems, models absorb a narrow slice of …

3 weeks, 1 day ago @ news.mit.edu
Teaching AI models to say “I’m not sure”

Today's most capable reasoning models share a trait with the loudest voice in the room: They deliver every answer with the same unshakable certainty, whether they're right or guessing.

The technique, called RLCR (Reinforcement Learning with Calibration Rewards), trains language models to produce calibrated confidence estimates alongside their answers.

During training, models learn to reason about both the problem and their own uncertainty, producing an answer and a confidence estimate together.
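
The excerpt does not spell out the reward, so the following is only a sketch of one way a calibration-aware reward could look: a correctness term minus a Brier-style penalty on the stated confidence. The function names and the exact combination are assumptions for illustration, not a statement of how RLCR is implemented.

```python
def brier_penalty(confidence, correct):
    """Squared gap between the stated confidence and the 0/1 outcome."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    target = 1.0 if correct else 0.0
    return (confidence - target) ** 2


def calibration_reward(confidence, correct):
    """Correctness reward minus the calibration penalty (assumed form)."""
    return (1.0 if correct else 0.0) - brier_penalty(confidence, correct)


confident_right = calibration_reward(1.0, True)
confident_wrong = calibration_reward(1.0, False)
hedged_wrong = calibration_reward(0.5, False)
```

Under this form a confidently wrong answer scores lower than a hedged wrong answer, so a policy trained on it has an incentive to lower its confidence when it is guessing.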

Standard RL training actively degraded calibration compared to the base model, making models worse at estimating their own uncertainty.

The researchers trained classifiers on model outputs and found …

3 weeks, 3 days ago @ news.mit.edu
Jacob Andreas and Brett McGuire named Edgerton Award winners

MIT Associate Professor Jacob Andreas of the Department of Electrical Engineering and Computer Science [EECS] and MIT Associate Professor Brett McGuire of the Department of Chemistry have been selected as the winners of the 2026 Harold E. Edgerton Faculty Achievement Award.

“He is an innovative researcher whose work combines computational and linguistically informed approaches to build foundations of language learning.

He aims to understand the computational foundations of language learning, and to build intelligent systems that can learn from human guidance.

His work in natural language processing has taken on thorny problems in the capability gap between humans and computers.

His honors i…

4 weeks, 1 day ago @ news.mit.edu
Berkeley AI
last post: 1 week, 1 day ago
Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

Overview of adaptive parallel reasoning.

We provide a detailed analysis of recent progress in the field of parallel reasoning, especially Adaptive Parallel Reasoning.

Figure 4: Special token variants across Adaptive Parallel Reasoning papers.

Inference Systems for Adaptive Parallelism: How do we actually execute parallel branches?

Figure 14: Difference in model choice across Adaptive Parallel Reasoning papers.

Each paper also offers a slightly different interpretation of how adaptive parallel reasoning contributes to the research field.

(Yang et al., 2025; Lian et al., 2025) aim to deliver sequential-AR-model-level a…

1 week, 1 day ago @ bair.berkeley.edu
Gradient-based Planning for World Models at Longer Horizons

Large, learned world models are becoming increasingly capable.

Why is adversarial robustness an issue for world model planning?

We thus exploit the differentiability of learned world models $F_{\theta}$, while not falling victim to the inherent sensitivity of the state Jacobians $D_s F_{\theta}$.

It’s a funny sweet spot where the background literature (planning and control overall) is incredibly mature and well-developed, but the current setting (pure planning optimization over modern, large-scale world models) is still heavily underexplored.

But, once we figure out all the right ideas, world model planners will likely become as commonplace as RL.

3 weeks, 5 days ago @ bair.berkeley.edu
Identifying Interactions at Scale for LLMs

Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence.

Therefore, grounded or reality-checked interpretability methods must also be able to capture these influential interactions.

In this blog post, we describe the fundamental ideas behind SPEX and ProxySPEX, algorithms capable of identifying these critical interactions at scale.

SPEX and ProxySPEX Framework: To discover influential interactions with a tractable number of ablations, we have developed SPEX (Spectral Explainer).

We formalize this through two observations: sparsity (relatively f…

2 months ago @ bair.berkeley.edu
Information-Driven Design of Imaging Systems

We developed a framework that enables direct evaluation and optimization of imaging systems based on their information content.

The first approach treated imaging systems as unconstrained communication channels, ignoring the physical limitations of lenses and sensors.

Our Information-Driven Encoder Analysis Learning (IDEAL) method uses gradient ascent on information estimates to optimize imaging system parameters.

The standard approach to computational imaging design, end-to-end optimization, jointly trains the imaging hardware and a neural network decoder.

The computational efficiency of IDEAL suggests possibilities for designing imaging systems that were previously intractable.

4 months ago @ bair.berkeley.edu
RL without TD learning

In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer.

We can do Reinforcement Learning (RL) based on divide and conquer, instead of temporal difference (TD) learning.

There are two classes of algorithms in RL: on-policy RL and off-policy RL.

We compared TRL with $n$-step TD learning with different values of $n$, from $1$ (pure TD) to $\infty$ (pure MC).
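For readers unfamiliar with the baseline in that comparison, the standard $n$-step return sums $n$ discounted rewards and then bootstraps from a value estimate; $n = 1$ gives pure TD and letting $n$ run to the end of the episode gives pure Monte Carlo. The sketch below is a generic textbook version, not code from the post; the function name and the `None`-means-infinity convention are assumptions.

```python
from typing import Optional, Sequence


def n_step_return(
    rewards: Sequence[float],
    values: Sequence[float],
    t: int,
    n: Optional[int],
    gamma: float,
) -> float:
    """n-step return from time t; n=None means n = infinity (pure MC).

    rewards[k] is the reward received after step k, and values[k] is the
    current value estimate of the state at step k.
    """
    horizon = len(rewards) if n is None else min(t + n, len(rewards))
    g = 0.0
    for k in range(t, horizon):
        g += gamma ** (k - t) * rewards[k]
    # Bootstrap from the value estimate only if the episode has not ended.
    if horizon < len(rewards):
        g += gamma ** (horizon - t) * values[horizon]
    return g
```

Small `n` leans on the (possibly biased) value estimate, while large `n` leans on sampled rewards, which is the trade-off the comparison sweeps over.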

I still think one of the most important problems in RL (and even in machine learning) is to find a scalable off-policy RL algorithm.

6 months, 2 weeks ago @ bair.berkeley.edu
What exactly does word2vec learn?

What exactly does word2vec learn, and how?

In this framing, it’s clear that word2vec is a minimal neural language model.

As a result, the theory predicts exactly what features are learned in terms of the corpus statistics and the algorithmic hyperparameters.

We find that over the course of learning, word2vec builds these linear representations in a sequence of noisy learning steps, and their geometry is well-described by a spiked random matrix model.

8 months, 2 weeks ago @ bair.berkeley.edu
AWS Machine Learning
last post 1 day, 14 hours ago
Restrict access to sensitive documents in your Amazon Quick knowledge bases for Amazon S3

Document-level access control list (ACL) support for Amazon Simple Storage Service (Amazon S3) knowledge bases in Amazon Quick gives you that fine-grained control.

Controlling S3 bucket access for knowledge base creation: Document-level ACLs control which documents users can access within a knowledge base, but they don’t control who can create knowledge bases in the first place.

For example, if the file S3 key is recipe.pdf, the metadata file S3 key would be recipe.pdf.metadata.json.
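The naming rule in that example can be written as a one-line helper. This is a minimal sketch: the function name is hypothetical, and applying the same suffix to keys that include a prefix is an assumption, since the excerpt only shows a bare file name.

```python
METADATA_SUFFIX = ".metadata.json"


def metadata_key(document_key: str) -> str:
    """Return the S3 key of the metadata file paired with a document key."""
    if not document_key:
        raise ValueError("document_key must be a non-empty S3 key")
    # The metadata file keeps the full document key and appends the suffix.
    return document_key + METADATA_SUFFIX
```

Calling `metadata_key("recipe.pdf")` gives `recipe.pdf.metadata.json`, matching the example in the excerpt.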

All knowledge base create and update actions are logged in AWS CloudTrail, including whether ACLs are enabled on the knowledge base.

Conclusion: Document-level ACLs for Amazon S3 knowledge bases in Amazon Quick g…

1 day, 14 hours ago @ aws.amazon.com
Improve bot accuracy with Amazon Lex Assisted NLU

The Assisted NLU (natural language understanding) feature in Amazon Lex helps you improve bot accuracy by handling these natural language variations.

The solution: The Amazon Lex Assisted NLU feature uses large language models (LLMs) to understand natural language variations and improve bot accuracy.

By combining traditional machine learning (ML) with LLMs, Assisted NLU handles how real customers communicate, creating natural conversational experiences that improve recognition accuracy.

Assisted NLU (including Primary mode, Fallback mode, and intent disambiguation) is included at no additional cost with standard Amazon Lex pricing.

For detailed configuration instructions, API references, and ste…

2 days, 12 hours ago @ aws.amazon.com
Real-time voice agents with Stream Vision Agents and Amazon Nova 2 Sonic

In this post, you learn how to combine Stream’s Vision Agents open-source framework with Amazon Bedrock and Amazon Nova 2 Sonic to build real-time voice agents that can be production-ready in minutes.

Stream’s Vision Agents is an open-source Python framework for building real-time voice and video AI agents.

Vision Agent worker: A dedicated Vision Agent worker process holds the PeerConnection state for that session.

Amazon Nova 2 Sonic integration with Vision Agents through Amazon Bedrock: Amazon Nova 2 Sonic detects speech boundaries and performs speech-to-speech modeling (understanding, reasoning, and TTS) with optional tool calls into customer systems (RDS, APIs, knowledge bases).

Conclusion: Th…

2 days, 12 hours ago @ aws.amazon.com
From siloed data to unified insights: Cross-account Athena Access for Amazon Quick

Role B (Consumer Account Role) – lives in the consumer account where Athena data resides.

Pattern 1: Basic two-account setup. The most straightforward deployment connects one central Quick account to one consumer account.

Solution: Cross-account Athena access for Amazon Quick uses IAM role chaining to bridge your central Quick account with one or more consumer accounts where your data lives.

The central Quick account incurs only Quick session costs and SPICE storage.

Conclusion: Cross-account Athena access for Amazon Quick enables enterprises to maintain a centralized BI AWS account while respecting multi-account data governance and cost boundaries.

2 days, 12 hours ago @ aws.amazon.com
Control where your AI agents can browse with Chrome enterprise policies on Amazon Bedrock AgentCore

Amazon Bedrock AgentCore Browser now supports Chrome enterprise policies and custom root CA certificates to give organizations granular control over agent browser behavior and connectivity.

Why enforce browser policies for AI agents: Chrome enterprise policies address three organizational needs when applied to AI browser agents.

Store the root CA certificate in AWS Secrets Manager: The BadSSL untrusted root CA certificate is publicly available (source: badssl.com/certs/ca-untrusted-root.crt).

In a production environment, the organization’s internal CA certificate or SSL-intercepting proxy root CA certificate would be placed in Secrets Manager the same way.

For more information about Amazon Bedr…

2 days, 12 hours ago @ aws.amazon.com
Build financial document processing with Pulse AI and Amazon Bedrock

By combining Pulse AI’s advanced document understanding capabilities with the powerful AI services of Amazon Bedrock, organizations can achieve enterprise-grade accuracy and extract contextually relevant financial insights at scale.

In summary, Pulse AI and Amazon Bedrock together provide the following: Pulse AI extracts structured, semantically aware data from complex financial documents, handling intricate tables, multi-column layouts, and hierarchical data.

Amazon Bedrock fine-tunes Amazon Nova models on that high-quality data to create domain-specific intelligence for your organization’s financial conventions.

Step-by-step implementation: Follow these steps to set up and configure your financial documen…

3 days, 12 hours ago @ aws.amazon.com
Build real-time voice streaming applications with Amazon Nova Sonic and WebRTC

Building end-to-end live streaming applications with real-time voice interaction presents several challenges: network bandwidth constraints can cause high latency and quality degradation in time-critical applications.

This post introduces a solution based on Amazon Nova 2 Sonic (Nova Sonic) and Amazon Kinesis Video Streams WebRTC (WebRTC) that addresses these challenges.

Nova Sonic and WebRTC: Traditional voice agent pipelines typically involve separate modules for speech recognition, language processing, and speech synthesis.

Nova Sonic offers a unified speech-to-speech architecture that enables real-time voice conversations between users and AI agents with low latency.

The following diagram…

3 days, 12 hours ago @ aws.amazon.com
Securing AI agents: How AWS and Cisco AI Defense scale MCP and A2A deployments

AI Registry, an AWS-backed open-source project, integrates with Cisco AI Defense.

Tool sprawl and visibility: Organizations deploying MCP servers and AI agents face a fundamental visibility challenge.

Similarly, when registering A2A agents, the Cisco AI Defense A2A Scanner analyzes agent capability declarations, agent skill definitions, and communication patterns.

Getting started: The following sections describe how to get started with the AI Registry and Cisco AI Defense integration depending on your current environment.

The Cisco AI Defense MCP Scanner (open-source) is available at: cisco-ai-defense/mcp-scanner on GitHub. The Cisco AI Defense A2A Scanner (open-source) is available at: c…

3 days, 12 hours ago @ aws.amazon.com
Fine-tune LLM with Databricks Unity Catalog and Amazon SageMaker AI

This complete workflow integrates SageMaker AI, EMR Serverless, and Databricks Unity Catalog for governed, scalable LLM fine-tuning.

Download the complete notebook LLM_Finetunig_SageMaker_AI_Unity_Catalog.ipynb and run it in SageMaker AI Studio using the following steps: Navigate to the Amazon SageMaker AI Console.

Create Unity Catalog database objects: Set up the following database objects in your Unity Catalog environment: a catalog, schemas, and an external table. Grant permissions: Grant necessary privileges to your Service Principal.

Step 5: Fine-tuning with SageMaker AI Training job. After preparing the dataset fo…

3 days, 12 hours ago @ aws.amazon.com
How Amazon Finance streamlines regulatory inquiries by using generative AI on AWS

In this post, we demonstrate how Amazon FinTech teams are using Amazon Bedrock and other AWS services to build a scalable AI application to transform how regulatory inquiries are handled.

Solution overview: To address these challenges, the Amazon FinTech team built an intelligent regulatory response automation system using Amazon Bedrock, AWS Lambda, and supporting AWS services.

The AWS Lambda function then calls Amazon Bedrock Knowledge Bases.

To support this iterative process, the Amazon FinTech team implemented a multi-turn conversational workflow using Amazon API Gateway (WebSocket APIs), AWS Lambda, and Amazon DynamoDB, integrated with the Amazon Bedrock ConverseStream API for low-latenc…

4 days, 13 hours ago @ aws.amazon.com
Automate schema generation for intelligent document processing

Before you can extract information from documents using intelligent document processing (IDP) techniques, you need a schema for each document class that defines what to extract.

IDP Accelerator: The IDP Accelerator is a scalable, serverless, open-source solution for automated document processing and information extraction.

It’s a new “Multiple Document” capability alongside the “Single Document” discovery feature.

Document clustering: The multi-document discovery feature learns how many document types are in your collection using the silhouette score.
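The excerpt does not say how the feature clusters documents or compares cluster counts, so the following is only a generic sketch of the idea: compute the mean silhouette coefficient for each candidate clustering and keep the cluster count that scores highest. The function names, the Euclidean distance, and the toy one-dimensional "embeddings" are all assumptions.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def silhouette_score(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i: int, members: List[int]) -> float:
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != label)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def best_cluster_count(points: List[Point], candidates: Dict[int, List[int]]) -> int:
    """Pick the candidate cluster count whose labeling scores highest."""
    return max(candidates, key=lambda k: silhouette_score(points, candidates[k]))


# Toy data: three tight groups, labeled once with k=2 and once with k=3.
points = [(0.0,), (0.1,), (5.0,), (5.1,), (10.0,), (10.1,)]
candidates = {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
chosen = best_cluster_count(points, candidates)
```

A score near 1 means points sit much closer to their own cluster than to the nearest other cluster, which is why the three-group labeling wins on this toy data.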

If schema quality is inconsistent across clusters: Check whether your document collection has a highly uneven distribution of document types.

4 days, 14 hours ago @ aws.amazon.com
Navigating EU AI Act requirements for LLM fine-tuning on Amazon SageMaker AI

Amazon SageMaker AI provides a managed machine learning (ML) service for building, training, and deploying models.

This solution uses Amazon SageMaker Training jobs to run fine-tuning workloads on fully managed infrastructure.

In this post, we show you how to set up FLOPs tracking during LLM fine-tuning using the open source Fine-Tuning FLOPs Meter toolkit on Amazon SageMaker AI.

Solution overview: The Fine-Tuning FLOPs Meter is an open source toolkit, available in the Amazon SageMaker Generative AI recipes repository, that integrates into Hugging Face training workflows on Amazon SageMaker AI.

For more information about building compliance-aligned AI systems on AWS, see the Amazon SageMaker …

4 days, 14 hours ago @ aws.amazon.com
Building web search-enabled agents with Strands and Exa

Strands Agents: The Strands Agents SDK is an open source framework from AWS for building AI agents using a model-driven approach.

If the model needs more information, it requests a tool; Strands Agents executes it and feeds the result back.

Strands Agents and Exa: Integration overview. The Exa integration is available through the strands-agents-tools package.

The only new dependency is the `strands-agents-tools` package. To use Exa with Strands Agents, follow these steps: 1.

If you no longer need your Exa API key, revoke it from the Exa dashboard.

Conclusion: The Strands Agents SDK and Exa provide a path to building AI agents that are grounded in current, accurate web information.

5 days, 8 hours ago @ aws.amazon.com
Introducing Claude Platform on AWS: Anthropic’s native platform, through your AWS account

Claude Platform on AWS is a new service that gives customers direct access to Anthropic’s native Claude Platform experience through their AWS account, with no separate credentials, contracts, or billing relationships required.

You access Claude Platform on AWS through familiar AWS features. Authentication: You use existing AWS IAM credentials to access Claude Platform.

Figure 1: Sign in to the AWS Management Console and open the Claude Platform on AWS Console.

Getting started with Claude Platform on AWS: You can activate Claude Platform on AWS through the AWS Marketplace.

Figure 4a: Claude Cowork connected to Claude Platform on AWS. Figure 4b: Claude Code connected to Claude Platform on AWS.

You can…

5 days, 11 hours ago @ aws.amazon.com
Manufacturing intelligence with Amazon Nova Multimodal Embeddings

In this post, we build a multimodal retrieval system for aerospace manufacturing documents using Amazon Nova Multimodal Embeddings on Amazon Bedrock and Amazon S3 Vectors.

We evaluate the system on 26 manufacturing queries and compare generation quality between a text-only pipeline and the multimodal pipeline.

Amazon Nova Multimodal Embeddings overview: Amazon Nova Multimodal Embeddings is available in Amazon Bedrock and generates embeddings for text, images, and multipage documents.

Pipeline A, Multimodal – Embed each image directly and each PDF page as a document image using Amazon Nova Multimodal Embeddings, then ingest into an Amazon Simple Storage Service (Amazon S3) Vectors index.

With …

5 days, 13 hours ago @ aws.amazon.com
NVIDIA
last post 2 days, 17 hours ago
Sea You in the Cloud: ‘Subnautica 2’ Early Access Dives Onto GeForce NOW

A limited-time HITMAN World of Assassination reward event brings signature tools of the trade — equal parts precision and unpredictability.

At the same time, Le Chiffre from CASINO ROYALE, played by the legendary actor Mads Mikkelsen, returns to HITMAN World of Assassination.

GeForce NOW members can jump in once early access becomes available on the service — no preinstalls or downloads needed.

Here’s what’s waiting: Free users: The Purple Streak Explosive Duck, a remote explosive disguised as an innocent rubber toy.

Performance members: The Purple Streak Explosive Duck and the Bomb Dynamite, a classic TNT bundle built for loud, messy exits.

2 days, 17 hours ago @ blogs.nvidia.com
NVIDIA, Ineffable Intelligence Team Up to Build the Future of Reinforcement Learning Infrastructure

Reinforcement-learning agents — AI systems that learn by trial and error — can convert computation into new knowledge.

“We are thrilled to partner with Ineffable Intelligence to codesign the infrastructure for large-scale reinforcement learning as they push the frontier of AI and pioneer a new generation of intelligent systems.”

Silver is one of the pioneers of reinforcement learning, an approach that has transformed AI research.

Unlike pretraining, where a fixed dataset of human data flows through the system, reinforcement learning workloads generate their data on the fly.

That’s where NVIDIA and Ineffable are focusing their technical work: building a pipeline that can feed reinforcement le…

3 days, 17 hours ago @ blogs.nvidia.com
Hermes Unlocks Self-Improving AI Agents, Powered by NVIDIA RTX PCs and DGX Spark

It’s provider- and model-agnostic by design, and optimized for always-on local use, making NVIDIA RTX PCs, NVIDIA RTX PRO workstations and NVIDIA DGX Spark the ideal hardware to run it at full speed, around the clock.

Hermes: Local AI Agent Capabilities Accelerated. Like other popular agents, Hermes integrates with messaging apps, can access local files and applications, and runs 24/7.

Stay tuned for more updates from RTX AI Garage on the latest open models and agents optimized for NVIDIA RTX hardware.

#ICYMI: The Latest From RTX AI Garage: NVIDIA RTX PRO GPUs deliver up to 3x faster token generation running Qwen 3.6 models with llama.cpp.

Plug in to NVIDIA AI PC on Facebook, Instagram, TikTo…

3 days, 17 hours ago @ blogs.nvidia.com
NVIDIA and SAP Bring Trust to Specialized Agents

Announced today at SAP Sapphire — where NVIDIA founder and CEO Jensen Huang joined SAP CEO Christian Klein’s keynote by video — SAP and NVIDIA’s expanded collaboration helps enterprises run specialized agents with security and governance controls.

SAP embeds NVIDIA OpenShell — an open source runtime for securely developing and deploying autonomous AI agents — into SAP Business AI Platform.

Within SAP Business AI Platform, OpenShell is the runtime security layer for all SAP AI agents, including custom agents built in Joule Studio — SAP’s environment for building and managing end-to-end enterprise agents.

Joule Studio runtime — the enterprise control layer within SAP Business AI Platform — as…

4 days, 17 hours ago @ blogs.nvidia.com
‘Your Career Starts at the Beginning of the AI Revolution,’ NVIDIA CEO Tells Graduates

“You are entering the world at an extraordinary moment,” NVIDIA founder and CEO Jensen Huang told graduates as he delivered the keynote address at Carnegie Mellon University’s 128th commencement ceremony on Sunday.

Huang underscored that AI is making intelligence more broadly accessible — reaffirming the imperative for AI to reach everyone, not just a select few.

It is creating a new industrial era.”

Massive industrial and economic shifts always bring uncertainty with them, and the AI revolution is no different.

“Every major technological revolution in history created fear alongside opportunity,” Huang said.

To meet the moment of the AI revolution, Huang counseled doing four things at once: “Adv…

6 days, 8 hours ago @ blogs.nvidia.com
Model Quantization: Post-Training Quantization Using NVIDIA Model Optimizer

This post walks through how to use NVIDIA Model Optimizer to quantize a CLIP model in FP8 format with the post-training quantization (PTQ) method.

For a general introduction to model quantization, see Model Quantization: Concepts, Methods, and Why It Matters.

Quantization recipe: The following quantization recipe is used in this post as a step-by-step guide for running CLIP model quantization with ModelOpt to understand how the process works.

CLIP model quality comparison of the FP16 baseline versus FP8-PTQ quantized models: Based on the evaluation results, the CLIP-FP8 quantized model demonstrates comparable quality to the CLIP-FP16 model.

Get started with NVIDIA Model Optimizer: This post intro…

1 week, 2 days ago @ developer.nvidia.com
Powering the Next American Century: US Energy Secretary Chris Wright and NVIDIA’s Ian Buck on the Genesis Mission

That’s the case U.S. Energy Secretary Chris Wright and NVIDIA Vice President of Hyperscale and High-Performance Computing Ian Buck made Thursday morning at the SCSP AI+ Expo.

The 30-minute fireside chat, moderated by SCSP president Ylli Bajraktari, was called “Powering the Next American Century.”

Their argument: American leadership in AI runs through American leadership in energy.

NVIDIA is among the DOE partners on the mission, building on what Buck called two decades of NVIDIA building supercomputers with the national labs.

The DOE Partnership: The DOE brings 17 national labs, the scientists, the national problems and the data.

That’s a problem because, as Wright put it, the most important s…

1 week, 2 days ago @ blogs.nvidia.com
Real-Time Performance Monitoring and Faster Debugging with NCCL Inspector and Prometheus

NVIDIA NCCL Inspector accelerates triaging by providing a lightweight and continuous report of NCCL communication performance.

Live, time-series visualizations can now be powered directly within a user’s infrastructure dashboard by integrating NCCL Inspector with Prometheus Exporter.

NCCL Inspector deployment architecture: NCCL 2.30 introduces Prometheus Mode, a major enhancement for real-time performance monitoring of NCCL in AI workloads.

NCCL Inspector in JSON mode (default/offline mode): The JSON mode operates in a data collection phase and a data analysis phase.

NCCL Inspector in real-time Prometheus mode: This new feature integrates NCCL Inspector metrics with Prometheus, converting them into time-…

1 week, 2 days ago @ developer.nvidia.com
Linked and Loaded: Gaijin Single Sign-On Now Available on GeForce NOW

Faster logins mean more time in the gaming action — and this week provides GeForce NOW members with a smoother path straight into the battlefield.

Cloud gaming is all about instant access to titles across devices, and the latest GeForce NOW update removes another layer for members jumping into their Gaijin libraries from anywhere.

One Gaijin Login to Rule Them All: Gaijin account linking is now cleared for takeoff on GeForce NOW, making it easier to jump straight into battle without extra logins.

With Gaijin single sign-on, one quick sign-in connects a Gaijin.net account to the cloud, so supported titles are ready to launch with fewer clicks and no password juggling.

Just head to the GeForce …

1 week, 2 days ago @ blogs.nvidia.com
NVIDIA Spectrum-X — the Open, AI-Native Ethernet Fabric — Sets the Standard for Gigascale AI, Now With MRC

NVIDIA Spectrum-X Ethernet is suited for this environment, helping provide the network foundation needed to run large-scale AI models and applications with confidence.

Another innovation key to achieving gigascale AI factories is multiplanar network designs, which OpenAI deploys with Spectrum-X Ethernet in conjunction with MRC.

Both Spectrum-X Ethernet Adaptive RDMA and MRC protocols, as well as other custom protocols, run natively across NVIDIA ConnectX SuperNICs and Spectrum-X Ethernet switches and support multiplanar network designs at gigascale.

NVIDIA Spectrum-X Ethernet delivers on all three, and with MRC, it continues to set the standard for advanced AI networking.

Learn more about N…

1 week, 3 days ago @ blogs.nvidia.com
NVIDIA and ServiceNow Partner on New Autonomous AI Agents for Enterprises

Unlike standalone AI agents, Project Arc connects natively to the ServiceNow AI Platform through ServiceNow Action Fabric to bring governance, auditability and workflow intelligence to every action the autonomous desktop agent takes.

NVIDIA agent skills enable specialized agents, such as ServiceNow AI Specialists, to deliver targeted capabilities across enterprise workflows.

Efficient AI Factories: As AI agents become long running and always on, scaling them across millions of workflows requires not just capability but efficiency, making token economics central to enterprise AI.

NVIDIA AI factories are built to deliver the lowest-cost, most-efficient tokenomics for production AI.

ServiceNow …

1 week, 4 days ago @ blogs.nvidia.com
Optimize Supply Chain Decision Systems Using NVIDIA cuOpt Agent Skills

The following steps outline how to set up and use the NVIDIA cuOpt supply chain agent reference workflow, which uses cuOpt agent skills to perform GPU-accelerated supply chain optimization using agent-driven workflows.

Figure: End-to-end supply chain decision optimization using NVIDIA cuOpt agent skills.

Extendible agentic architecture: The cuOpt supply chain agent reference workflow is a simplified starting point.

Figure: Architectural diagram of an extended pattern using the NVIDIA cuOpt supply chain agent reference workflow.

Get started with this cuOpt agent workflow on GitHub.

Get started: Deploy NVIDIA cuOpt Agent reference workflow using the NVIDIA NeMo Agent Toolkit and use built-in optimization skills, or …

1 week, 5 days ago @ developer.nvidia.com
Nemotron Labs: What OpenClaw Agents Mean for Every Organization

Each post highlights practical ways to use an open stack to deliver real value in production — from transparent research copilots to scalable AI agents.

Most AI agents today are triggered by a prompt, complete a defined task and then stop running.

Autonomous agents, which run continuously and act across long time horizons, drive inference demand up by another 1,000x over reasoning AI.

The practical applications of long-running autonomous agents span every function and sector.

Stay up to date on agentic AI, NVIDIA Nemotron and more by subscribing to NVIDIA AI news, joining the community and following NVIDIA AI on LinkedIn, Instagram, X and Facebook.

2 weeks, 2 days ago @ blogs.nvidia.com
Automating GPU Kernel Translation with AI Agents: cuTile Python to cuTile.jl

This post covers cross-DSL (domain-specific language) GPU kernel translation, specifically porting cuTile Python kernels to cuTile.jl (Julia).

It shows how to: Translate GPU kernels between cuTile Python and cuTile.jl: Walk through a complete matrix multiplication example side-by-side.

Cross-DSL GPU kernel translation: Both cuTile Python and cuTile.jl frontends share the same tiled abstraction, making the translation largely algorithmic.

The following examples are from TileGym, where the team ported a set of cuTile Python kernels to cuTile.jl and packaged them as a self-contained Julia subproject.

* Accumulator shape (TM, TN) Wrong results in matmul Column-major needs (TN, TM) ct.PaddingMode.ZERO Unde…

2 weeks, 2 days ago @ developer.nvidia.com
It’s Gonna Be May: 16 Games Hit the Cloud This Month, With More NVIDIA GeForce RTX 5080 Power

Ultimate members now get priority access to RTX 5080‑class rigs, making it easier than ever to tap into next‑generation PC power from almost any device.

Starting today, Ultimate members can stream even more of their games on RTX 5080 virtual gaming rigs — bringing the power of the NVIDIA Blackwell RTX architecture to a wide range of titles.

This update significantly broadens access to 5080 performance beyond the list of GeForce RTX 5080-optimized titles.

With RTX 5080 in the cloud, Ultimate members unlock the same cutting-edge features available to GeForce RTX 50 Series GPU owners.

With RTX 5080 powering the default Ultimate experience, GeForce NOW delivers next-generation performance to mo…

2 weeks, 2 days ago @ blogs.nvidia.com
Facebook
last post 3 days, 17 hours ago
Reel Friends: Building Social Discovery that Scales to Billions

On its face the new Friend Bubbles feature looks simple enough.

It highlights Reels your friends have watched and reacted to.

On this episode of the Meta Tech Podcast, Pascal Hartig chats with Subasree and Joseph, two software engineers from the Facebook Reels team, about what it took to bring Friend Bubbles to life.

If you’ve ever underestimated a “simple” feature, this one’s for you.

And if you’re interested in learning more about career opportunities at Meta visit the Meta Careers page.

3 days, 17 hours ago @ engineering.fb.com
Modernizing the Facebook Groups Search to Unlock the Power of Community Knowledge

We’ve fundamentally transformed Facebook Groups Search to help people more reliably discover, sort through, and validate community content that’s most relevant to them.

We’ve adopted a new hybrid retrieval architecture and implemented automated model-based evaluation to address the major friction points people experience when searching community content.

Addressing the Friction Points in Community Knowledge: People struggle with three friction points when searching for answers in community content – discovery, consumption, and validation.

The Solution: A Modernized Hybrid Retrieval Architecture. We engineered a hybrid retrieval architecture that powers a discussions module on Facebook Search.

R…

3 weeks, 4 days ago @ engineering.fb.com
Capacity Efficiency at Meta: How Unified AI Agents Optimize Performance at Hyperscale

We’ve built a unified AI agent platform that encodes the domain expertise of senior efficiency engineers into reusable, composable skills.

Introducing the Capacity Efficiency Program: When the code you ship serves more than 3 billion people, even a 0.1% performance regression can translate to significant additional power consumption.

Many engineers at Meta use our efficiency tools to work on these problems every day.

Skills: These encode domain expertise about performance efficiency.

The pipeline mirrors the defensive AI Regression Solver: Gather context with tools: The AI agent looks up: Opportunity metadata.

1 month ago @ engineering.fb.com
How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines

Challenging the Conventional Wisdom on AI Context Files: Recent academic research found that AI-generated context files actually decreased agent success rates on well-known open-source Python repositories.

Our codebase is the opposite: proprietary config-as-code with tribal knowledge that exists nowhere in any model’s training data.

Any team with a large, proprietary codebase can benefit: Identify your tribal knowledge gaps.

What’s Next: We are expanding context coverage to additional pipelines across Meta’s data infrastructure and exploring tighter integration between context files and code generation workflows.

This approach turned undocumented tribal knowledge into structured, AI-readable con…

1 month, 1 week ago @ engineering.fb.com
KernelEvolve: How Meta’s Ranking Engineer Agent Optimizes AI Infrastructure

This is the second post in the Ranking Engineer Agent blog series exploring the autonomous AI capabilities accelerating Meta’s Ads Ranking innovation.

We introduce KernelEvolve, an agentic kernel authoring system used by Ranking Engineer Agent and generally applicable to a range of AI models beyond Ads Ranking.

Unlike typical large language model (LLM)-based agents that perform one-shot code generation, KernelEvolve treats kernel optimization as a search problem.

A standard coding assistant lacks the context to write optimized MTIA kernels because it has never seen MTIA documentation, instruction set details, or programming idioms.

KernelEvolve represents an early step toward the vision of …

1 month, 2 weeks ago @ engineering.fb.com
Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads

To overcome this, we have developed the Meta Adaptive Ranking Model, which effectively bends the inference scaling curve with high ROI and industry-leading efficiency.

Introducing Meta Adaptive Ranking Model: Serving LLM-scale & complexity models in a real-time ads recommendation environment requires resolving a fundamental tension between model complexity and system efficiency.

Adaptive Ranking Model addresses these challenges through a paradigm shift powered by three core innovations across the serving stack: Inference-efficient model scaling: Adaptive Ranking Model achieves a model complexity equivalent to the O(10 GFLOPs) per token used by top-tier LLMs.

To minimize compute overhead, Adapt…

1 month, 2 weeks ago @ engineering.fb.com
AI for American-Produced Cement and Concrete

Concurrent with the 2026 American Concrete Institute (ACI) Spring Convention, Meta is releasing a new AI model for designing concrete mixes – Bayesian Optimization for Concrete (BOxCrete), as well as the foundational data used to develop award-winning concrete mixes.

Amrize operates 18 cement plants, 141 cement terminals and 269 ready-mix concrete sites across North America.

Alongside the event, Meta is releasing a new AI model for designing concrete mixes, Bayesian Optimization for Concrete (BOxCrete).

How Meta Leverages AI for Concrete Mixtures: Meta’s AI for concrete model can help suppliers more quickly incorporate U.S. materials into their mixes through an approach called adaptive experi…

1 month, 2 weeks ago @ engineering.fb.com
Friend Bubbles: Enhancing Social Discovery on Facebook Reels

Friend bubbles in Facebook Reels highlight Reels your friends have liked or reacted to, helping you discover new content and making it easier to connect over shared interests.

Friend bubbles enhance the social experience on Facebook Reels by helping you discover content your friends enjoy, creating a shared viewing experience and sparking new conversations.

Along with additional optimizations in the underlying method, this approach enabled us to ship friend bubbles while preserving core Reels performance.

Friend bubbles work because the signal is high value: It adds meaningful social context that helps people decide what’s worth watching.

Engagement also scales consistently with the number …

1 month, 4 weeks ago @ engineering.fb.com
Ranking Engineer Agent (REA): The Autonomous AI Agent Accelerating Meta’s Ads Ranking Innovation

Meta’s Ranking Engineer Agent (REA) autonomously executes key steps across the end-to-end machine learning (ML) lifecycle for ads ranking models.

Powering these interactions are highly sophisticated, complex and massively distributed machine learning (ML) models that continuously evolve to serve both advertisers and people who use the platforms.

Optimizing these ML models has traditionally been time-consuming.

To address this, Meta built the Ranking Engineer Agent, an autonomous AI agent designed to drive the end-to-end ML lifecycle and iteratively evolve Meta’s ads ranking models at scale.

ML training jobs run for hours or days, far beyond what any session-bound assistant can manage.

2 months ago @ engineering.fb.com
Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps

Nowhere is this more apparent than in mobile security, where a single class of vulnerability can be replicated across hundreds of call sites scattered throughout a sprawling, multi-app codebase serving billions of users.

Meta’s Product Security team has developed a two-pronged strategy to address this: designing secure-by-default frameworks that wrap potentially unsafe Android OS APIs and make the secure path the easiest path for developers, and leveraging generative AI to automate the migration of existing code to those frameworks at scale.

The result is a system that can propose, validate, and submit security patches across millions of lines of code with minimal friction for the engineers w…

2 months ago @ engineering.fb.com
RCCLX: Innovating GPU communications on AMD platforms

RCCLX is fully integrated with Torchcomms and aims to empower researchers and developers to accelerate innovation, regardless of their chosen backend.

We want to iterate on collectives, transports, and novel features quickly on AMD platforms.

With RCCLX, we have integrated CTran to AMD platforms, enabling the AllToAllvDynamic – a GPU-resident collective.

These features provide significant performance improvements on AMD platforms and we are excited to share this with the community.

RCCLX Quick Start Guide: Install Torchcomms with RCCLX backend by following the installation instructions in the Torchcomms repo.

2 months, 3 weeks ago @ engineering.fb.com
The Death of Traditional Testing: Agentic Development Broke a 50-Year-Old Field, JiTTesting Can Revive It

A Catching JiTTest focuses specifically on finding regressions introduced by a code change.

Agentic development dramatically increases the pace of code change, straining test development burden and scaling the cost of false positives and test maintenance to breaking point.

And since the JiTTest itself is LLM-generated, it can often infer the plausible intention of a code change and simulate possible faults that may result from it.

With them, engineers no longer have to spend time writing, reviewing, and testing complex test code.

Read the paper: Just-in-Time Catching Test Generation at Meta

3 months ago @ engineering.fb.com
Adapting the Facebook Reels RecSys AI Model Based on User Feedback

Our new User True Interest Survey (UTIS) model now helps surface more niche, high-quality content and boosts engagement, retention, and satisfaction.

Our paper, “Improve the Personalization of Large-Scale Ranking Systems by Integrating User Survey Feedback,” shares full details on this work.

The main candidate ranking model used by the platform is a large multi-task, multi-label model.

We trained a lightweight UTIS alignment model layer on the collected user survey responses using existing predictions of the main model as input features.

The UTIS model consistently outperformed the baseline, driving higher user engagement and retention.

4 months ago @ engineering.fb.com
DrP: Meta’s Root Cause Analysis Platform at Scale

DrP’s key components include: Expressive SDK: The DrP SDK allows engineers to codify investigation workflows into analyzers.

Post-processing system: After an investigation, the post-processing system can take automated actions based on the analysis results.

Bootstrap code: The DrP SDK provides bootstrap code to create a template analyzer with pre-populated boilerplate code.

Data access and analysis: The SDK includes libraries for data access and analysis, such as dimension analysis and time series correlation.

This provides immediate analysis results to on-call engineers.

4 months, 4 weeks ago @ engineering.fb.com
How AI Is Transforming the Adoption of Secure-by-Default Mobile Frameworks

Generative AI and automation accelerate the adoption of secure frameworks at scale, enabling consistent security enforcement and efficient migration across Meta’s vast codebase.

How We Design Secure-by-Default Frameworks at Meta: Designing secure-by-default frameworks for use by a large number of developers shipping vastly different features across multiple apps is an interesting challenge.

There shouldn’t be one security framework that covers all security issues, and not every security issue is general enough to deserve its own framework.

Now that we’ve looked at the design philosophy behind our frameworks, let’s look at one of our most widely used Android security frameworks, SecureLinkLaun…

5 months ago @ engineering.fb.com
Uber Engineering
last post None
neptune.ai
last post 5 months, 2 weeks ago
We are joining OpenAI

Piotr Niedźwiedź, CEO/CTO and founder of neptune.ai: I’m excited to share that we’ve entered into a definitive agreement to be acquired by OpenAI, subject to closing conditions.

We are thrilled to join the OpenAI team and help their AI researchers build better models faster.

Neptune is a metrics dashboard company.” We’ve worked closely with OpenAI to create the metrics dashboard that helps teams building foundation models.

Our future with OpenAI: Neptune will join OpenAI and continue to support AI researchers with tools to monitor, debug, and evaluate frontier models.

We are looking forward to working with top AI researchers and supporting OpenAI’s mission of ensuring that AGI benefits all of hu…

5 months, 2 weeks ago @ neptune.ai
Synthetic Data for LLM Training

For instance, financial data is highly sensitive and protected by very strict regulations, and synthetic data mimics the real data distribution without revealing customer information.

Read more about how leading foundation model teams curate their training data and other topics in the State of Foundation Model Training Report 2025.

Choosing the right synthetic data generation technique depends on the type of data and its complexity.

Synthetic tabular data generation is a promising direction to overcome these challenges by learning the distribution of the tabular data.

Post-processing: As the distribution of tabular data is highly complex, it makes the synthetic tabular data generation very ch…

6 months ago @ neptune.ai
What are LLM Embeddings: All you Need to Know

TL;DR LLM embeddings are the numerical, vector representations of text that Large Language Models (LLMs) use to process information.

Unlike their predecessor word embeddings, LLM embeddings are context-aware and dynamically change to capture semantic and syntactic relationships based on the surrounding text.

What are the applications of LLM embeddings?

Word Embeddings: Sparse Word Embeddings (One-Hot Vectors 1970s TF-IDF 1980s Co-Occurrence Matrix); Static Word Embeddings (Word2Vec 2013, GloVe 2014); Contextualized word embeddings (ELMo 2018, GPT-1 2018, BERT 2018, LLAMA 2023, DeepSeek-V1 2023, GPT-4 2023). Static word embeddings: Static word embeddings, such as word2vec in 2013, marked a significant development.…

6 months, 1 week ago @ neptune.ai
Detecting and Fixing ‘Dead Neurons’ in Foundation Models

TL;DR Dead neurons silently waste compute and reduce effective model capacity in foundation models.

Dead neurons’ impact: Recent studies into dead neurons in the context of foundation models show interesting, albeit worrying, results.

These large reported fractions of dead neurons in foundation models are a concern from a computational perspective.

Before we move on to discuss how to detect and fix dead neurons, let’s touch upon an important distinction between dead neurons and vanishing gradients.

Further reading: How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models. Visualizing activation distributions: Is your foundation model suffering from dead neurons?
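
One simple way to check for dead neurons is to record a layer's post-activation outputs over a batch of inputs and count the units that never fire. The following is a minimal sketch in plain Python, assuming ReLU-style activations where "dead" means the output stays at zero (within a tolerance) for every input. The function name `dead_neuron_fraction`, the `eps` threshold, and the toy batch are illustrative assumptions, not taken from the article.

```python
from typing import List, Sequence


def dead_neuron_fraction(activations: Sequence[Sequence[float]], eps: float = 1e-8) -> float:
    """Return the fraction of neurons whose activation magnitude never exceeds eps.

    activations[i][j] is the post-activation output of neuron j for input i.
    """
    if not activations:
        raise ValueError("need at least one input example")
    num_neurons = len(activations[0])
    if num_neurons == 0:
        raise ValueError("need at least one neuron")

    # A neuron counts as alive once it exceeds eps on any single example.
    alive: List[bool] = [False] * num_neurons
    for row in activations:
        if len(row) != num_neurons:
            raise ValueError("all rows must have the same number of neurons")
        for j, value in enumerate(row):
            if abs(value) > eps:
                alive[j] = True

    return alive.count(False) / num_neurons


# Toy batch (assumed data): three inputs, four neurons.
# Neurons 1 and 3 output zero for every input, so they count as dead.
batch = [
    [0.5, 0.0, 1.2, 0.0],
    [0.0, 0.0, 0.3, 0.0],
    [2.1, 0.0, 0.0, 0.0],
]
fraction = dead_neuron_fraction(batch)
```

In practice the batch should be large and representative, since a neuron that is silent on a handful of inputs may still fire on others.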

6 months, 2 weeks ago @ neptune.ai
Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training

In the first part of this series, we covered the fundamentals of instruction fine-tuning (IFT).

def calculate_irs(instruction, output, reference_model): evaluation_prompt = f""" Instruction: {instruction} Model Output: {output} Rate how well the output follows the instruction on these criteria: 1.

HINT addresses a computational inefficiency in standard instruction fine-tuning: repeatedly reprocessing the same task instruction with every input example.

Read more about foundation model training infrastructure and other topics in Neptune’s 2025 State of Foundation Model Training Report.

First, during initial instruction fine-tuning across multiple diverse tasks, the model learns genera…

6 months, 3 weeks ago @ neptune.ai
How to Optimize LLM Inference

Large Language Model (LLM) inference at scale is challenging as it involves transferring massive amounts of model parameters and data and performing computations on large tensors.

In the following, we’ll use the Llama model family architecture as a specific example to understand the LLM workload at inference.

For a far more detailed analysis of the LLM workload at inference, see the chapter All About Transformer Inference in the book How to Scale Your Model, published by Google DeepMind.

See also: How to Run LLMs Locally. A quick primer on hardware for LLM inference: A typical LLM inference cluster consists of several nodes, each with a multi-core CPU and multiple accelerator devices, …

7 months ago @ neptune.ai
A Researcher’s Guide to LLM Grounding

In this article, we’ll explore the fundamental concepts of LLM grounding as well as strategies for optimally grounding models.

What is LLM grounding?

LLM grounding is analogous.

If relevant knowledge cannot be inferred from the data, then LLM grounding cannot yield more relevant responses.

When grounding LLMs using RAG, consider retaining only a few of the top hits (i.e., top-k) for your retrieval queries.
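
Top-k retention can be sketched as follows, assuming the query and the documents have already been embedded as vectors and that hits are ranked by cosine similarity. The function names and the toy two-dimensional vectors are illustrative assumptions; a real RAG pipeline would typically delegate this step to a vector index.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_hits(
    query: Sequence[float], docs: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return up to k (doc_index, score) pairs, best score first."""
    scored = [(i, cosine_similarity(query, d)) for i, d in enumerate(docs)]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy embeddings (assumed data): docs 0 and 1 point roughly the same way as the query.
doc_vectors = [
    [1.0, 0.0],   # doc 0
    [0.9, 0.1],   # doc 1
    [0.0, 1.0],   # doc 2
    [-1.0, 0.0],  # doc 3
]
hits = top_k_hits([1.0, 0.0], doc_vectors, k=2)
kept_ids = [doc_id for doc_id, _ in hits]
```

Only the documents in `kept_ids` would be passed to the model as context; the rest are dropped, which keeps the prompt short and limits the influence of weakly related passages.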

7 months, 3 weeks ago @ neptune.ai
Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions

TL;DR Instruction fine-tuning (IFT) refines pre-trained large language models (LLMs) to follow specific task instructions by training on prompt-response pairs.

Instruction fine-tuning in a nutshell: IFT tailors LLMs to follow user instructions by bridging their inherent next-word prediction with human-defined objectives.

Related: LLM Fine-Tuning and Model Selection Using Neptune and Transformers. Parameter-efficient instruction fine-tuning: While major foundation models like GPT-4 or Llama-2 undergo full parameter instruction fine-tuning during development, parameter-efficient fine-tuning (PEFT) methods have become widely adopted for instruction fine-tuning since the LoRA paper was publi…

8 months ago @ neptune.ai
Understanding Prompt Injection: Risks, Methods, and Defense Measures

Prompt injection 101: When prompts go rogue. The term ‘Prompt Injection’ comes from SQL injection attacks.

There is another claim of the independent discovery of prompt injection attacks, which suggests that Riley Goodside publicly exhibited a prompt injection in a tweet back in September 2022.

The indirect prompt injection attacks are classified into active, passive, user-driven and virtual prompt attacks.

Virtual prompt injection attacks: This injection type is closely related to passive injection attacks previously described.

Prompt injection: current challenges & lessons learned. The arms race between prompt injection attacks and defenses is a challenge for researchers, developers, and users.

9 months, 1 week ago @ neptune.ai
SabiYarn: Advancing Low-Resource Languages With Multitask NLP Pre-Training [Paper Reflections]

This simple idea avoids computing loss on input prompt tokens the model already knows.

Prompt tokens are (too) expensive in low-resource settings: During pre-training, LLMs are trained in causal language modeling through a next-token prediction task.

=> Mo fẹ́ràn ìrẹsì,” the model is trained to predict every token, from the prompt to the actual answer:
Step Prompt Next token
1 Translate English Static prompt
2 Translate English to Static prompt
3 Translate English to Yoruba: Static prompt
4 Translate English to Yoruba: I
5 Translate English to Yoruba: I love
6 Translate English to Yoruba: I love rice.

This is straightforward to implement in PyTorch by masking out the prompt tokens in the label …
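
A minimal sketch of this kind of label masking, written in plain Python so it runs without PyTorch installed. It assumes the prompt occupies the first `prompt_len` positions of each tokenized example. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so positions labelled -100 contribute nothing to the loss. `mask_prompt_labels` and the toy token IDs are hypothetical and not code from the paper.

```python
from typing import List

# Default ignore_index of torch.nn.CrossEntropyLoss; targets with this value
# are skipped when the loss is computed.
IGNORE_INDEX = -100


def mask_prompt_labels(input_ids: List[int], prompt_len: int) -> List[int]:
    """Copy input_ids into labels, replacing the prompt positions with IGNORE_INDEX."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must lie between 0 and len(input_ids)")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


# Toy example (assumed token IDs): four prompt tokens followed by three answer tokens.
input_ids = [11, 12, 13, 14, 21, 22, 23]
labels = mask_prompt_labels(input_ids, prompt_len=4)
```

The one-position shift needed for next-token prediction is left to the training code; in a PyTorch training loop, `labels` would be converted to a tensor and passed to the loss alongside the model's logits.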

9 months, 2 weeks ago @ neptune.ai
▶️ YouTube
Yannic Kilcher
last post 2 months, 1 week ago
I BUILT A FULLY AUTOMATIC MANSPLAINER

All information about GTC and the DGX Spark Raffle is here: https://www.ykilcher.com/gtc Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https://www.subscribestar.com/yannickilcher

Patreon: https://www.patreon.com/yannickilcher

Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq

Ethereu…

2 months, 1 week ago @ youtube.com
Traditional X-Mas Stream

Letsgooo

4 months, 2 weeks ago @ youtube.com
Traditional Holiday Live Stream

https://ykilcher.com/discord Links:

TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://discord.gg/4H8xxDF

BitChute: https://www.bitchute.com/channel/yannic-kilcher

Minds: https://www.minds.com/ykilcher

Parler: https://parler.com/profile/YannicKilcher

LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/

BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https:/…

4 months, 2 weeks ago @ youtube.com
TiDAR: Think in Diffusion, Talk in Autoregression (Paper Analysis)

Paper: https://arxiv.org/abs/2511.08923 Abstract:

Diffusion language models hold the promise of fast parallel generation, while autoregressive (AR) models typically excel in quality due to their causal structure aligning naturally with language modeling. This raises a fundamental question: can we achieve a synergy with high throughput, higher GPU utilization, and AR level quality? Existing methods fail to effectively balance these two aspects, either prioritizing AR using a weaker model for sequential drafting (speculative decoding), leading to lower drafting efficiency, or using some form of left-to-right (AR-like) decoding logic for diffusion, which still suffers from quality degradation …

4 months, 2 weeks ago @ youtube.com
Titans: Learning to Memorize at Test Time (Paper Analysis)

Paper: https://arxiv.org/abs/2501.00663 Abstract:

Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a new neural long-term memory module that learns to memorize historical context and helps attention to attend to the current context while utilizing long past information. We sh…

5 months ago @ youtube.com
[Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff)

https://arxiv.org/abs/2510.17558 Abstract:

We propose an extension of the decoder Transformer that conditions its generative process on random latent variables which are learned without supervision thanks to a variational procedure. Experimental evaluations show that allowing such a conditioning translates into substantial improvements on downstream tasks. Author: François Fleuret Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the con…

6 months, 2 weeks ago @ youtube.com
[Video Response] What Cloudflare's code mode misses about MCP and tool calling

Theo's Video: https://www.youtube.com/watch?v=bAYZjVAodoo

Cloudflare article: https://blog.cloudflare.com/code-mode/ Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https://www.subscribestar.com/yannickilcher

Patreon: https://www.patreon.com/yannickilcher

Bitcoin (BTC): bc1q49lsw3q325tr58ygf8…

6 months, 4 weeks ago @ youtube.com
[Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant)

Paper: https://arxiv.org/abs/2508.21038 Abstract:

Vector embeddings have been tasked with an ever-increasing set of retrieval tasks over the years, with a nascent rise in using them for reasoning, instruction-following, coding, and more. These new benchmarks push embeddings to work for any query and any notion of relevance that could be given. While prior works have pointed out theoretical limitations of vector embeddings, there is a common assumption that these difficulties are exclusively due to unrealistic queries, and those that are not can be overcome with better training data and larger models. In this work, we demonstrate that we may encounter these theoretical limitations in realist…

7 months, 1 week ago @ youtube.com
AGI is not coming!

Jack Morris's investigation into GPT-OSS training data: https://x.com/jxmnop/status/1953899426075816164?t=3YRhVQDwQLk2gouTSACoqA&s=09

9 months, 1 week ago @ youtube.com
Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

Paper: https://research.trychroma.com/context-rot

Abstract:

Large Language Models (LLMs) are typically presumed to process context uniformly—that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks.

In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows.

Authors: Kelly Hong, Anton Troynikov, Jeff Huber

Links:

9 months, 3 weeks ago @ youtube.com
Henry AI Labs
last post: none
3blue1brown
last post: 1 month ago
Covering 10 points, a surprisingly tricky puzzle.

Made as part of a monthly series of puzzles for the 2026 Year of Math.

1 month ago @ youtube.com
Escher's most mind-bending piece

On "The Print Gallery", by M.C. Escher

Full video: https://youtu.be/ldxFjLJ3rVY

1 month, 2 weeks ago @ youtube.com
The subset sum puzzle

Part of a series of monthly puzzlers. Stay subscribed to see the solution.

1 month, 3 weeks ago @ youtube.com
Escher's most mathematically interesting piece

Escher's Print Gallery, and the tour of complex analysis it invites.

Check out our virtual career fair: 3b1b.co/talent

Join channel supporters to see videos early: 3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com

Original paper by de Smit and Lenstra:

https://pub.math.leidenuniv.nl/~smitbde/papers/2003-de_smit-lenstra-escher.pdf

Timestamps:

0:00 - The print gallery

13:04 - Conformal maps from complex analysis

21:41 - The complex exponential

25:56 - The complex logarithm

32:32 - 3b1b Talent

33:14 - Constructing the key function

40:16 - The deeper math behind Escher

------------------

These animations are largely made us…

1 month, 3 weeks ago @ youtube.com
Bacteria Grid Puzzle Solution

Part of a monthly series of puzzlers, in collaboration with MoMath and Peter Winkler

1 month, 3 weeks ago @ youtube.com
The most underappreciated formula | Exploring high-dimensional spheres

On the volumes of higher-dimensional spheres

Explore the 3b1b virtual career fair: https://3b1b.co/talent

Become a supporter for early views of new videos: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com

Thanks to UC Santa Cruz for letting me film there, and special thanks to Pedro Morales-Almazan for arranging everything.

My video on Numberphile with a fun application of this problem: https://youtu.be/6_yU9eJ0NxA

Timestamps:

0:00 - Introduction

1:01 - Random puzzle

6:16 - Outside the box

14:35 - Setting up the volume grid

21:14 - Why 4πr^2

25:21 - Archimedes in higher dimensions

36:17 - The general formul…

2 months, 2 weeks ago @ youtube.com
The lattice bacteria puzzle

Part of a series of monthly puzzles, done in collaboration with MoMath.

https://momath.org/mindbenders

2 months, 4 weeks ago @ youtube.com
Solution to the ladybug clock puzzle

Solution to last month's probability puzzle.

2 months, 4 weeks ago @ youtube.com
The Hairy Ball Theorem

Unexpected applications and a beautiful proof.

Looking for a new career? Check out https://3b1b.co/talent

Supporters get early access to new videos: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com

Credits:

Senia Sheydvasser: Co-writing and sphere deformation animations

Paul Dancstep: Those lovely fluffy sphere animations

Vince Rubinetti: Music

Timestamps:

0:00 - To comb a hairy ball

1:24 - Applications

8:46 - The puzzle of one null point

12:12 - The proof outline

16:41 - Defining orientation

21:44 - Why inside-out is impossible

25:59 - 3b1b Talent

27:44 - Final food for thought

------------------

These animati…

3 months, 2 weeks ago @ youtube.com
The ladybug clock puzzle

This is the first in a set of monthly puzzles, curated by Peter Winkler. This one was originally suggested by Richard Stanley. You can sign up to hear his description of the answer at http://momath.org/mindbenders

4 months ago @ youtube.com
The most absurd product I've made

Because why not make a pi creature neck pillow?

Available at 3b1b.co/store

5 months, 3 weeks ago @ youtube.com
How Laplace transforms solve differential equations

Studying the forced harmonic oscillator by taking a Laplace transform and studying its poles.

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com

Chapter on the Laplace Transform:

https://youtu.be/j0wJBEZdwLs

Chapter on the S-plane and Simple Harmonic Motion:

https://youtu.be/-j8PzkZ70Lg

Timestamps:

0:00 - Opening puzzle

1:06 - Key properties of a Laplace Transform

3:29 - Qualitative analysis with Laplace Transforms

4:29 - The Laplace Transforms of a Derivative

6:06 - The forced oscillator

11:59 - Intuition from the transformed solution

1…

6 months, 1 week ago @ youtube.com
The dynamics of e^(πi)

A fuller version of this explanation, also including the reason we care about complex exponents in the first place: https://youtu.be/-j8PzkZ70Lg

7 months ago @ youtube.com
But what is a Laplace Transform?

Visualizing the most important tool for differential equations.

Previous chapter: https://youtu.be/-j8PzkZ70Lg

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com

Artwork by Kurt Bruns

Engine animation borrowed with permission from this (excellent) blog: https://ciechanow.ski/internal-combustion-engine/

Timestamps:

0:00 - Understanding the engine

1:16 - Key background ideas

5:41 - Definition and intuition

10:43 - Complex integration

20:43 - Analytic continuation

23:52 - The transform of exponentials

26:15 - A deep look at cos(t)

32:59 - W…

7 months ago @ youtube.com
The dynamics of e^(πi)

A fuller version of this explanation, also including the reason we care about complex exponents in the first place: https://youtu.be/-j8PzkZ70Lg

7 months, 1 week ago @ youtube.com
Two Minute Papers
last post: 3 days, 14 hours ago
NVIDIA’s New AI Is Fast For A Strange Reason

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The paper is available here:

https://arxiv.org/abs/2604.24954

https://developer.nvidia.com/blog/nvidia-nemotron-3-nano-omni-powers-multimodal-agent-reasoning-in-a-single-efficient-open-model/

https://huggingface.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, R…

3 days, 14 hours ago @ youtube.com
OpenAI's GPT 5.5 Instant: The Good, The Bad And The Insane

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 GPT 5.5 Instant:

https://deploymentsafety.openai.com/gpt-5-5-instant/introduction

https://openai.com/index/gpt-5-5-instant/

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi

My research: https://cg.tuwie…

1 week, 1 day ago @ youtube.com
DeepSeek V4 AI: Crushing The Competition

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 Check out DeepSeek here:

https://www.deepseek.com/en/

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi

My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https://felicia.hu

1 week, 3 days ago @ youtube.com
NVIDIA's New AI Turns One Photo Into A World That Never Breaks

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The paper is available here:

https://research.nvidia.com/labs/sil/projects/lyra2/

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi

My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https:…

1 week, 6 days ago @ youtube.com
Sakana AI’s God Simulator Is Brilliant

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 Try it out! The paper is available here:

https://pub.sakana.ai/digital-ecosystem/

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi

My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https:…

2 weeks, 1 day ago @ youtube.com
This Is Why AI Videos Feel Wrong

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The paper is available here:

https://research.nvidia.com/labs/sil/projects/MOTIVE/

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi

My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https…

2 weeks, 4 days ago @ youtube.com
NVIDIA’s New AI Changed Robotics Forever

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The paper is available here:

https://nvlabs.github.io/GEAR-SONIC/

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi

My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https://felicia.hu #nv…

3 weeks ago @ youtube.com
DeepMind’s New AI: A Gift To Humanity

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Links:

https://ai.google.dev/gemma/docs/core/model_card_4

Fine tuning with Matt Mireles: https://x.com/mattmireles/status/2041606508220489786

Other sources:

https://x.com/googlegemma/status/2041256042882105666?s=46

https://x.com/nakazakifam/status/2041286410930446370

https://x.com/measure_plan/status/2039815699695104343

https://x.com/maddiedreese/status/2041677327604838685?s=46

https://x.com/steipete/status/2042615534567457102?s=46

https://x.com/maziyarpanahi/status/2042592050940449260?s=46

https://x.com/adrgrondin/status/2041962263507083340?s=46

https://x.com/evgeniymikholap/status/2041104232648950170

https:…

1 month ago @ youtube.com
“Anthropic’s New AI Is Too Dangerous To Release”

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The paper is available here:

https://www.anthropic.com/claude-mythos-preview-system-card

Links and sources:

https://debugml.github.io/cheating-agents/

https://x.com/bstnxbt/status/2042967285715865685

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, T…

1 month ago @ youtube.com
NVIDIA’s New AI: The Biggest Leap In Robot Learning Yet

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers

📝 The paper is available here:

https://dreamdojo-world.github.io/

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi

My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https://felicia.hu …

1 month ago @ youtube.com
NVIDIA’s New AI: A Revolution...For Free!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The #NVIDIA paper on Nemotron 3 Super is available here:

https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi

My …

1 month, 1 week ago @ youtube.com
Google New TurboQuant AI: Hype vs. Reality

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The TurboQuant paper is available here:

https://arxiv.org/abs/2504.19874

Reproduction: https://x.com/AlicanKiraz0/status/2038245538865275274

KV-cache source: https://huggingface.co/blog/not-lain/kv-caching

Reviews and criticisms of the paper:

https://openreview.net/forum?id=tO3ASKZlok

https://x.com/gaoj0017/status/2037532673812443214

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fre…

1 month, 2 weeks ago @ youtube.com
DeepMind’s New AI Just Changed Science Forever

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The paper is available here:

https://arxiv.org/abs/2602.10177

Source:

https://www.youtube.com/watch?v=6evUpgCHtOQ

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi

My research: https://cg.tuwien.ac.at/~z…

1 month, 2 weeks ago @ youtube.com
The Algorithm That Made Me Cry

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Free course on Ray Tracing:

https://users.cg.tuwien.ac.at/zsolnai/gfx/rendering-course/

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi

My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: ht…

1 month, 3 weeks ago @ youtube.com
DeepSeek Just Fixed One Of The Biggest Problems With AI

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The #DeepSeek paper is available here:

https://github.com/deepseek-ai/Engram

https://arxiv.org/abs/2601.07372

Larry Wheels:

https://www.youtube.com/watch?v=7SM816P5G9s&lc=Ugz7yiDrr_8YD7w8gaN4AaABAg

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Taz…

1 month, 3 weeks ago @ youtube.com
DataFest Video
last post: none
JetBrains Research Seminars
last post: none
Yandex. Computer Science
last post: 19 hours ago
How Context Engineering works in the agent loop of Yandex Neurolawyer

If you want to work on this and see from the inside how it all works, come to Weekend Offer ML on May 30-31: https://clck.ru/3TSCym

Andrey Sokolov, head of the team that trains models with external context at Yandex R&D, explained how the RAG systems behind three Yandex products are built. In the talk, Andrey shared product cases and used them to show how quality is evaluated, where prompting falls short, and why touching a single component of the system is always a risk. The full video is already on the channel.

19 hours ago @ youtube.com
Assessing the stability of an ML system: robustness in RAG, using Alice as an example

If you want to work on this and see from the inside how it all works, come to Weekend Offer ML on May 30-31: https://clck.ru/3TSCym

Andrey Sokolov, head of the team that trains models with external context at Yandex R&D, explained how the RAG systems behind three Yandex products are built. In the talk, Andrey shared product cases and used them to show how quality is evaluated, where prompting falls short, and why touching a single component of the system is always a risk. The full video is already on the channel.

3 days, 19 hours ago @ youtube.com
Which benchmarks we use to evaluate Alice's quality

If you want to work on this and see from the inside how it all works, come to Weekend Offer ML on May 30-31: https://clck.ru/3TSCym

Andrey Sokolov, head of the team that trains models with external context at Yandex R&D, explained how the RAG systems behind three Yandex products are built. In the talk, Andrey shared product cases and used them to show how quality is evaluated, where prompting falls short, and why touching a single component of the system is always a risk. The full video is already on the channel.

6 days, 19 hours ago @ youtube.com
How Brickit identifies LEGO part classes

Andrey Tatarinov, CEO and CTO of Epoch8, talked about the computer vision system behind the Brickit app, which scans a pile of LEGO parts and suggests what can be built from them. In the talk, Andrey explained how his team dealt with rare classes and optimized the pipeline for mobile devices. He also shared MLOps solutions for scalable fine-tuning and model maintenance. The full video is already on the channel!

#Brickit, #Lego, #AI, #Нейросети, #ComputerVision, #DeepLearning, #MobileAI, #Tech, #Стартап, #Будущее, #MachineLearning, #Гаджеты, #Технологии, #WOW, #AIприложение

2 weeks ago @ youtube.com
What data Brickit works with

Andrey Tatarinov, CEO and CTO of Epoch8, talked about the computer vision system behind the Brickit app, which scans a pile of LEGO parts and suggests what can be built from them. In the talk, Andrey explained how his team dealt with rare classes and optimized the pipeline for mobile devices. He also shared MLOps solutions for scalable fine-tuning and model maintenance. The full video is already on the channel!

#Brickit, #Lego, #AI, #Нейросети, #ComputerVision, #DeepLearning, #MobileAI, #Tech, #Стартап, #Будущее, #MachineLearning, #Гаджеты, #Технологии, #WOW, #AIприложение

2 weeks, 4 days ago @ youtube.com
How class labeling works in Brickit

Andrey Tatarinov, CEO and CTO of Epoch8, talked about the computer vision system behind the Brickit app, which scans a pile of LEGO parts and suggests what can be built from them. In the talk, Andrey explained how his team dealt with rare classes and optimized the pipeline for mobile devices. He also shared MLOps solutions for scalable fine-tuning and model maintenance. The full video is already on the channel!

#Brickit, #Lego, #AI, #Нейросети, #ComputerVision, #DeepLearning, #MobileAI, #Tech, #Стартап, #Будущее, #MachineLearning, #Гаджеты, #Технологии, #WOW, #AIприложение

3 weeks, 2 days ago @ youtube.com
Is a single recommender for all services possible?

Technically, a single recommender for all services is possible. But there are two problems: data from Music, Kinopoisk, Market, and Afisha generalizes poorly, and there is simply nowhere to apply such a unified ranker, since we do not compare a track with an exhibition. So instead of one model there are blenders: each service ranks its own items, and then the blocks are mixed into a single feed. Would you like to see music, movies, and products in one feed? 👇

#единыйрекомендатель #яндекс #яндексмузыка #кинопоиск #маркет #афиша #рекомендательныесистемы #машинноеобучение #ml #блендер #ранжирование

4 weeks, 1 day ago @ youtube.com
Where is the line between advice and manipulation?

Any advice is, to some degree, manipulation. It all depends on the goal: are you trying to sell something, or do you really want to help the user? Danya Burlakov, head of the recommendation products group, answers. In this short we talk about the ethics of recommender systems, where this fine line lies, and why honesty matters more. Can you tell when an algorithm is trying to manipulate you? Share in the comments 👇 Watch the full video on the channel!

#совет #манипуляция #этика #рекомендательныесистемы #искусственныйинтеллект #яндекс #машинноеобучение #ml #этикаии

1 month ago @ youtube.com
Do recommendations influence our behavior?

Recommender systems are not just algorithms. They affect our mood, and through it, much else. In this short we look at how music and other recommendations change users and why one of our goals is to make people happier. Presented by Danya Burlakov, head of the recommendation products group. Watch the full video on the channel! What do you think: do recommendations affect your mood? Write in the comments 👇

#влияниерекомендаций #поведениепользователей #настроение #рекомендательныесистемы #яндексмузыка #психология #машинноеобучение #ml #какэтоработает

1 month ago @ youtube.com
Why do recommendations get stuck on familiar content?

What can be done so that recommendations do not bore the user? How do you find music tracks and films that will broaden their interests? Presented by Danya Burlakov, head of the recommendation products group. Watch the full video on the channel!

#эффектпузыря #фильтрпузыря #разнообразие #рекомендательныесистемы #яндексмузыка #машинноеобучение #трансформеры #ml #новоемузыка

1 month, 1 week ago @ youtube.com
"I want to throw everything out and start over": why it hurts so much

Danya Burlakov, head of the recommendation products group, explained what gets in the way of integrating ML models into the familiar flow of candidate generators and rankers. Watch the full video on the channel!

#яндекс #генеративныемодели #генеративныйии #рекомендательныесистемы #машинноеобучение #ml #argus #catboost #внедрение #продакшен #mldevelopment

1 month, 1 week ago @ youtube.com
Are the changes in Yandex Music recommendations noticeable?

Danya Burlakov, head of the recommendation products group, explained how the team rolls out new features and what effect they have. Watch the full video on the channel!

#яндекс #яндексмузыка #рекомендации #рекомендательныесистемы #машинноеобучение #ml #аргус #argus #трансформеры #ранжирование

1 month, 2 weeks ago @ youtube.com
A vision-text omni-model: the path to unifying LLM and VLM / Roman Isachenko

At Saturday ML Party, Roman Isachenko, head of the image analysis group at Yandex R&D, described the long road to merging the LLM and VLM from part of the Alice AI family into a single omni-model. It can handle text and images in one pipeline. He also shared the key stages, trade-offs, and plans for developing the model in the near future.

➡️ Subscribe to Yandex's Telegram channel for the ML community: https://t.me/+owyCvdge8WIyNTUy

#AI, #MachineLearning, #LLM, #GenAI, #AIAgents, #RAG, #MLOps, #DataScience, #DeepLearning, #AIEngineering, #NeuralNetworks, #ComputerVision, #NLP, #TechTalk, #AIConference

1 month, 2 weeks ago @ youtube.com
Function calling without real data / Olga Tsymboi and Ramil Latypov

Training language models to interact with tools runs into a shortage of data. Open datasets are limited in topic coverage, contain few complex scenarios, and are almost nonexistent in Russian. At Saturday ML Party, colleagues from T-Bank, Olga Tsymboi, senior research engineer, and Ramil Latypov, research engineer, described how they built a fully synthetic pipeline for generating function calling data. They also walked through the training steps and showed how this approach improved quality on specialized benchmarks.

➡️ Subscribe to Yandex's Telegram channel for the ML community: https://t.me/+owyCvdge8WIyNTUy

#AI, #MachineLearning, #LLM, #GenAI, #AI…

1 month, 2 weeks ago @ youtube.com
LLMs in recommendations: now we know almost everything about the shopper's lifestyle and tastes / Vladislav Urzhumov

How do you learn each shopper's interests when all you have is their action history on Market and a little metadata? At Saturday ML Party, Vladislav Urzhumov, developer in the data analysis and ML for recommendations group at Yandex Market, presented one such approach from colleagues at a Chinese marketplace. Inside: how the method was adapted to our data and users, and how metrics improved along the way.

➡️ Subscribe to Yandex's Telegram channel for the ML community: https://t.me/+owyCvdge8WIyNTUy

#ai, #MachineLearning, #LLM, #GenAI, #AIAgents, #RAG, #MLOps, #DataScience, #DeepLearning, #AIEngineering, #NeuralNetworks, #ComputerVision, #NLP, #TechTalk, #aiconference

1 month, 2 weeks ago @ youtube.com
ML Trainings
last post 6 days, 21 hours ago
Captain's Bridge No. 18: The Truth About OpenAI | Big Tech and Government | Spiritual Bonds for AI

0:00:00 Start

0:00:55 European robotaxi

0:03:59 The truth about OpenAI

0:07:39 Big Tech and government

0:21:22 Anthropic and finances

0:24:49 Grok and finances

0:35:21 GPT on FPGA

0:46:31 Podcasts made by AI

0:52:09 Data centers at sea

0:57:04 Income and AI

1:04:21 Malaysia and data centers

1:09:32 Spiritual bonds for AI

AI summary:

A discussion of the latest news in technology, robotaxis, AI regulation, and the influence of emotions on technology development. The guests share their views on the future of AI, legislation, and ethical questions. A discussion of the latest trends in fintech, AI, and security technologies, as well as practical aspects of adopting new solutions. A discussion of modern technologies, their applications, and future trends…

6 days, 21 hours ago @ youtube.com
Google's Corporate Schizophrenia
1 week, 5 days ago @ youtube.com
When Faith in the Pyramid Starts to Crack
1 week, 5 days ago @ youtube.com
Cyberpunk Impossible
1 week, 5 days ago @ youtube.com
Valentin Malykh Suspects That Elon Musk Was Right
1 week, 5 days ago @ youtube.com
Valentin Malykh on the Rising Cost of Subscriptions
1 week, 5 days ago @ youtube.com
Computer Vision: An Application Using a Parabola as an Example
1 week, 5 days ago @ youtube.com
Captain's Bridge No. 17: A Data Center the Size of Utah | AI Costs More Than People | Pee for AI

0:00:00 Introduction

0:01:15 DeepSeek and Ascend

0:10:57 A data center the size of Utah

0:19:35 OpenAI and the stock drop

0:22:28 China opposes the Manus acquisition

0:30:48 ChatGPT with ads

0:38:38 Cohere acquired Aleph Alpha

0:44:52 The AI law was softened

0:47:50 Google and Anthropic

0:53:07 Tencent, Alibaba, and DeepSeek

0:55:38 AI costs more than people

1:00:38 OpenAI and Microsoft

1:05:02 Gazprom Neft and unmanned vehicles

1:09:54 Groq is faster than Nvidia

1:12:45 AI designed a chip

1:17:46 Pee for AI

AI summary:

A discussion of the latest technology news, including the launch of new AI models, the development of the Chinese chip market, and the geopolitical aspects of the technology business. A discussion of current trends in artificial intelligence…

1 week, 6 days ago @ youtube.com
A Startup Upends the Paradigm of How People Interact with AI Agents
2 weeks, 5 days ago @ youtube.com
Dmitry Kolodezev Abandons the Idea of Filming
2 weeks, 5 days ago @ youtube.com
Dmitry and Valentin Discuss AI Spending
2 weeks, 5 days ago @ youtube.com
Dimitrov on Fascism and Technofascists
2 weeks, 5 days ago @ youtube.com
Biological Weapons of the Future: Genetically Targeted Viruses
2 weeks, 5 days ago @ youtube.com
Valentin Malykh Talks About ChatGPT and the Collapse of Civilization
2 weeks, 5 days ago @ youtube.com
Captain's Bridge No. 15: Sweet Mythos | The Palantir Manifesto | The Agent Will Pay You

0:00:00 Start

0:00:48 Sweet Mythos

0:07:03 Claude Code for $20

0:10:06 SpaceX and Cursor

0:15:42 GPT5.5 is out

0:24:20 DeepSeek-V4 is out

0:31:00 The Palantir manifesto

0:42:33 AI sabotage

0:49:43 Claude and the genome

0:55:55 Ozon and electronics

1:03:53 New TPUs from Google

1:11:11 Google versus Claude

1:15:10 The agent will pay you

AI summary:

A discussion of the latest news in artificial intelligence, including Mythos, Anthropic's models, SpaceX and Elon Musk's strategy, along with an analysis of current trends and the outlook for technology development. A discussion of the latest advances in AI, context models, and their use in business and security. An analysis of technologies, their potential, and the risks associated with biologic…

2 weeks, 6 days ago @ youtube.com
Primer
last post 4 months, 2 weeks ago
Taking AI Doom Seriously For 62 Minutes

Patreon: https://www.patreon.com/primerlearning

80,000 Hours: 80000hours.org/primer

https://www.desmos.com/calculator/a5pfjtr4tr

Other connections:

Discord: https://discord.gg/NbruaNW

Twitch: https://www.twitch.tv/justin_helps

Store: https://store.dftba.com/collections/primer

Reddit: https://www.reddit.com/r/primerlearning/

Bsky: https://bsky.app/profile/justinhelps.bsky.social

Twitter: https://twitter.com/primerlearning

Links to other resources:

https://yoshuabengio.org/2024/07/09/reasoning-through-arguments-against-taking-ai-safety-seriously/

https://www.youtube.com/c/robertmilesai

https://www.youtube.com/@Siliconversations

https://www.youtube.com/@Go-Meta

https://www.youtube.com/@Dwarkes…

4 months, 2 weeks ago @ youtube.com
Simulating a single brain cell

Patreon: https://www.patreon.com/primerlearning

Helpful resources if you want to learn more about neural networks:

https://www.youtube.com/@AndrejKarpathy

https://course.fast.ai/

https://www.youtube.com/@WelchLabsVideo

https://www.youtube.com/@3blue1brown

Early papers. These probably aren't helpful for understanding the concepts in this video, but they are worth a look if you're interested in the history.

The Perceptron – A perceiving and recognizing automaton: https://bpb-us-e2.wpmucdn.com/websites.umass.edu/dist/a/27637/files/2016/03/rosenblatt-1957.pdf

The Perceptron: A probabilistic model for information storage and organization in the brain: https://www.ling.upenn.edu/courses/cogs501/Rosenblatt1958.pdf

A Logical…

7 months, 3 weeks ago @ youtube.com
🎧 Podcasts
Lex Fridman AI Podcast
last post 1 week, 3 days ago
#496 – FFmpeg: The Incredible Technology Behind Video on the Internet

Jean-Baptiste Kempf is lead developer of VLC and president of VideoLAN.

Kieran Kunhya is a longtime FFmpeg contributor, codec engineer, and the person behind the now-infamous FFmpeg account on X.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep496-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://larridin.com

Blitzy: AI agent for large enterprise codebases.

Go to https://perplexity.ai/

OUTLINE:
(00:00) – Introduction
(03:00) – Sponsors, Comments, and Reflections
(10:48) – Weirdest things VLC opens
(15:12) – How video playback works
(24:33) – Video codecs and containers
(35:20) – FFmpeg explained
(56:20)…

1 week, 3 days ago @ lexfridman.com
#495 – Vikings, Ragnar, Berserkers, Valhalla & the Warriors of the Viking Age

Lars Brownworth is a historian, teacher, podcaster, and author specializing in Viking history, medieval Europe, and the Byzantine Empire.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep495-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://larridin.com

BetterHelp: Online therapy and counseling.

Go to https://drinkLMNT.com/lex

Fin: AI agent for customer service.

Go to https://perplexity.ai/

OUTLINE:
(00:00) – Introduction
(01:03) – Sponsors, Comments, and Reflections
(08:57) – The start of the Viking Age
(18:50) – Viking military strategy, tactics & technology
(32:33) – Ragnar Lothbrok
(42:00) – The Grea…

1 month, 1 week ago @ lexfridman.com
#494 – Jensen Huang: NVIDIA – The $4 Trillion Company & the AI Revolution

Jensen Huang is the co-founder and CEO of NVIDIA, the world’s most valuable company and the engine powering the AI computing revolution.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep494-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://drinkLMNT.com/lex

Fin: AI agent for customer service.

Go to https://quo.com/lex

OUTLINE:
(00:00) – Introduction
(00:26) – Sponsors, Comments, and Reflections
(06:34) – Extreme co-design and rack-scale engineering
(09:20) – How Jensen runs NVIDIA
(28:41) – AI scaling laws
(43:41) – Biggest blockers to AI scaling laws
(45:25) – Supply chain
(47:20) – Memory
(53:25) – Power…

1 month, 3 weeks ago @ lexfridman.com
#493 – Jeff Kaplan: World of Warcraft, Overwatch, Blizzard, and Future of Gaming

Jeff Kaplan is a legendary Blizzard game designer of World of Warcraft and Overwatch, now preparing to launch a new game, The Legend of California, from his new studio Kintsugiyama – available to wishlist on Steam today, with alpha later in March.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep493-sc

See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://fin.ai/lex

Blitzy: AI agent for large enterprise codebases.

Go to https://blitzy.com/lex

BetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lex

Shopify: Sell stuff online.

2 months ago @ lexfridman.com
#492 – Rick Beato: Greatest Guitarists of All Time, History & Future of Music

Rick Beato is a music educator, interviewer, producer, songwriter, and a true multi-instrument musician, playing guitar, bass, cello & piano.

His incredible YouTube channel celebrates great musicians & musical ideas, and helps millions of people fall in love with great music all over again.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep492-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lex

BetterHelp: Online therapy and counseling.

Go to https://drinkLMNT.com/lex

Fin: AI agent for customer service.

2 months, 2 weeks ago @ lexfridman.com
#491 – OpenClaw: The Viral AI Agent that Broke the Internet – Peter Steinberger

Peter Steinberger is the creator of OpenClaw, an open-source AI agent framework that’s the fastest-growing project in GitHub history.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep491-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://coderabbit.ai/lex

Fin: AI agent for customer service.

Go to https://fin.ai/lex

Blitzy: AI agent for large enterprise codebases.

Go to https://drinkLMNT.com/lex

OUTLINE:
(00:00) – Introduction
(03:51) – Sponsors, Comments, and Reflections
(15:29) – OpenClaw origin story
(18:48) – Mind-blowing moment
(28:15) – Why OpenClaw went viral
(32:12) – Self-modifying AI agent
(36:57)…

3 months ago @ lexfridman.com
#490 – State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI

Nathan Lambert and Sebastian Raschka are machine learning researchers, engineers, and educators.

Sebastian Raschka is the author of Build a Large Language Model (From Scratch) and Build a Reasoning Model (From Scratch).

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep490-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

(25:11) – ChatGPT vs Claude vs Gemini vs Grok: Who is winning?
(36:11) – Best AI for coding
(43:02) – Open Source vs Closed Source LLMs
(54:41) – Transformers: Evolution of LLMs since 2019
(1:02:38) – AI Scaling Laws: Are they dead or still holding?

3 months, 2 weeks ago @ lexfridman.com
#489 – Paul Rosolie: Uncontacted Tribes in the Amazon Jungle

Paul Rosolie is a naturalist, explorer, author of a new book titled Junglekeeper, and is someone who has dedicated his life to protecting the Amazon rainforest.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep489-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://perplexity.ai/

BetterHelp: Online therapy and counseling.

Go to https://fin.ai/lex

Miro: Online collaborative whiteboard platform.

Go to https://miro.com/

MasterClass: Online classes from world-class experts.

4 months ago @ lexfridman.com
#488 – Infinity, Paradoxes that Broke Mathematics, Gödel Incompleteness & the Multiverse – Joel David Hamkins

Joel David Hamkins is a mathematician and philosopher specializing in set theory, the foundations of mathematics, and the nature of infinity, and he’s the #1 highest-rated user on MathOverflow.

He is also the author of several books, including Proof and the Art of Mathematics and Lectures on the Philosophy of Mathematics.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep488-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://masterclass.com/lexpod

OUTLINE:
(00:00) – Introduction
(01:58) – Sponsors, Comments, and Reflections
(15:40) – Infinity & paradoxes
(1:02:50) – Russell’s paradox
(1:15:57) – Gödel’s…

4 months, 2 weeks ago @ lexfridman.com
#487 – Irving Finkel: Deciphering Secrets of Ancient Civilizations & Flood Myths

Irving Finkel is a scholar of ancient languages and a longtime curator at the British Museum, renowned for his expertise in Mesopotamian history and cuneiform writing.

He specializes in reading and interpreting cuneiform inscriptions, including tablets from Sumerian, Akkadian, Babylonian, and Assyrian contexts.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep487-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lex

Miro: Online collaborative whiteboard platform.

Go to https://miro.com/

Chevron: Reliable energy for data centers.

5 months ago @ lexfridman.com
#486 – Michael Levin: Hidden Reality of Alien Intelligence & Biological Life

Michael Levin is a biologist at Tufts University working on novel ways to understand and control complex pattern formation in biological systems.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep486-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lex

Miro: Online collaborative whiteboard platform.

Go to https://miro.com/

MasterClass: Online classes from world-class experts.

(2:42:41) – Mind uploading
(3:01:22) – Alien intelligence
(3:16:17) – Advice for young people
(3:22:46) – Questions for AGI

5 months, 2 weeks ago @ lexfridman.com
#485 – David Kirtley: Nuclear Fusion, Plasma Physics, and the Future of Energy

David Kirtley is a nuclear fusion engineer and CEO of Helion Energy, a company working on building the world's first commercial fusion power plant by 2028.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep485-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Transcript: https://lexfridman.com/david-kirtley-transcript

CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: https://lexfridman.com/contact

EPISODE LINKS:

David's X: htt…

6 months ago @ lexfridman.com
#484 – Dan Houser: GTA, Red Dead Redemption, Rockstar, Absurd & Future of Gaming

Dan Houser is co-founder of Rockstar Games and is a legendary creative mind behind Grand Theft Auto (GTA) and Red Dead Redemption series of video games.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep484-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://box.com/ai

UPLIFT Desk: Standing desks and office ergonomics.

Go to https://drinkLMNT.com/lex

OUTLINE:
(00:00) – Introduction
(01:29) – Sponsors, Comments, and Reflections
(11:32) – Greatest films of all time
(23:45) – Making video games
(26:36) – GTA 3
(29:55) – Open world video games
(32:42) – Character creation
(36:09) – Superintelligent AI in A Bette…

6 months, 2 weeks ago @ lexfridman.com
#483 – Julia Shaw: Criminal Psychology of Murder, Serial Killers, Memory & Sex

Julia Shaw is a criminal psychologist and author who in her books explores human nature, including psychopathy, violent crime, the psychology of evil, police interrogation, false memory manipulation, deception detection, and human sexuality.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep483-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lex

BetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lex

LMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lex

AG1: All-in-one daily nutrition drink.

7 months ago @ lexfridman.com
#482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature

Pavel Durov is the founder and CEO of Telegram.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep482-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Transcript: https://lexfridman.com/pavel-durov-transcript

CONTACT LEX:
Feedback – give feedback to Lex: https://lexfridman.com/survey
AMA – submit questions, videos or call-in: https://lexfridman.com/ama
Hiring – join our team: https://lexfridman.com/hiring
Other – other ways to get in touch: https://lexfridman.com/contact

EPISODE LINKS:
Pavel’s Telegram: https://t.me/durov
Pavel’s X: https://x.com/durov
Telegram: https://telegram.org/
Telegram Contests: https://contest.c…

7 months, 2 weeks ago @ lexfridman.com
Microsoft Research Podcast
last post 3 weeks, 5 days ago
Can we AI our way to a more sustainable world?

Because I do think there’s a role for AI, a huge role for AI.

BURGER: Right, right.

So I think that’s also something quite important here that, you know, AI can help facilitate.

And I think that’s not just applying AI to solve solutions through optimization but also thinking about this in an integrated way.

3 weeks, 5 days ago @ microsoft.com
Ideas: Steering AI toward the work future we want

JANSSEN: Yeah, yeah, exactly.

TEEVAN: Yeah, yeah, yeah.

I’m curious what you have found particularly surprising about how people and organizations are leveraging AI right now.

And so I do like to picture a future of work where humans are flourishing with AI and where humans still get to do meaningful work.

And I’m very curious about how we can take advantage of AI and do more without running ourselves into the ground because we’re not AI, right?

1 month, 1 week ago @ microsoft.com
Will machines ever be intelligent?

And the question we’re going to discuss is, are machines intelligent?

No, no, that’s right, that’s right.

I mean, in some sense, you could potentially have a super intelligent system, right, that’s far more intelligent than anything else on the planet.

BURGER: Right, right.

At the same time, I think, you know, transformers are not intelligent in the way that a three-year-old is, right?

1 month, 3 weeks ago @ microsoft.com
Trailer: The Shape of Things to Come

Join Microsoft’s Doug Burger and guests as they dig into the fundamental truths about AI and how it will reshape the future.

Technical advances are moving at such a rapid pace that it can be challenging to define the tomorrow we’re working toward.

In The Shape of Things to Come, Microsoft research leader Doug Burger and experts from across disciplines tease out the thorniest AI issues facing technologists, policymakers, business decision-makers, and other stakeholders today.

It’s important to understand what the emerging shapes are and how we should respond.” – Doug Burger, Technical Fellow and Corporate Vice President, Microsoft Research

About Doug Burger

Doug Burger is a research leader in …

2 months, 2 weeks ago @ microsoft.com
Ideas: Community building, machine learning, and the future of AI

This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS.

In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

So around that time when I started my PhD at Penn, I was working in machine learning theory and algorithmic economics.

How had you experienced a lack of community or network of women in machine learning before the founding of WiML?

So particularly when working on topics related to fairness, I’ve ended up focusing a bunch on stuff to do with marginalized groups as part of my responsible AI work.

5 months, 2 weeks ago @ microsoft.com
Ideas: More AI-resilient biosecurity with the Paraphrase Project

Today, I’m excited to talk about the Paraphrase Project, an effort I co-led exploring how advances in AI tools for protein design might impact biosecurity.

These “patches,” akin to those in cybersecurity, have now been shared with organizations globally to strengthen biosecurity screening.

The project highlights that the same AI tools capable of incredible good can also be misused, requiring us to be vigilant, thoughtful, and creative so we continue to get the most benefit out of AI tools while working to ensure that we avoid costly misuses.

So things like, how similar is this to that template, wild-type protein structure that we used as our conditioning information?

But I feel like broadly…

7 months, 1 week ago @ microsoft.com
Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education

KOHANE: So I think you’ve “nerd sniped” me because you [LAUGHTER]—which is all too easy—but I think there’s a central issue here.

But I actually think this is dark matter of human organizational technology that is not well understood.

AZEEM AZHAR: We didn’t talk about, you know, AI in its ability to potentially do this, which is to extend the clinician’s presence throughout the week.

And so I think there’s always going to be an opening for either differences of opinion or agreeing with you too much.

And this gets into whether AI is really going to get almost to the ab initio understanding of human biology.

8 months, 4 weeks ago @ microsoft.com
Reimagining healthcare delivery and public health with AI

9 months, 1 week ago @ microsoft.com
Navigating medical education in the era of generative AI

Prior to med school, Daniel pursued experiences that cultivated his interest in the application of AI in medical practice and education.

Really, really looking forward to this chat.

There’s AI before ChatGPT and before, you know, generative AI really became a big thing, and then afterwards.

And then after we talk about what’s really happening, what do you think should happen in medical education given the reality of generative AI?

And I do agree [that] AI really gives us real hope that we can make it true.

9 months, 3 weeks ago @ microsoft.com
AI Testing and Evaluation: Reflections

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

We have examples, like the pharmaceutical or medical device industry experts with whom you spoke, that’s really, you know, testing … there is a pre-deployment requirement.

And the third is just how rigid versus adaptive these testing and evaluation regimes or frameworks are in these different domains.

I really agree that there has been a lot of emphasis to date on, sort of, testing models upstream, the AI model evaluation.

You know, I think there’s been real progress already in the AI evaluation and testing ecosystem in the public-private partnership context.

9 months, 4 weeks ago @ microsoft.com
NLP Highlights
last post None
Data Skeptic
last post 2 weeks, 1 day ago
Student Spotlight: Aaron Payne, Data Analyst

Aaron Payne, an MBA student at Georgia Tech studying business analytics and a Senior Insights Analyst at Chick-fil-A, joins Kyle Polich to talk about turning analytics into decisions that matter. They unpack a real-world forecasting project with Comfama in Colombia, including messy data realities, interpretability tradeoffs, and why "data science for good" starts with the people impacted.

2 weeks, 1 day ago @ dataskeptic.com
The Future is Agentic in Recommender Systems

Kyle Polich sits down with Yashar Deldjoo, research scientist and Associate Professor at the Polytechnic University of Bari, to explore how recommender systems have evolved and why trustworthiness matters. They unpack key dimensions of responsible AI, including robustness to adversarial attacks, privacy, explainability, and fairness, and discuss how LLMs introduce new risks like hallucinations. The episode closes with a look at "agentic" recommender systems, where tools and memory shift recommendations from ranked lists to end-to-end task completion.

3 weeks ago @ dataskeptic.com
Book Ratings and Recommendations

Goodreads star ratings can be misleading as measures of "book quality," and research from Hannes Rosenbusch suggests that for many professionally published books, differences between readers often matter more than differences between books. The episode also explores how to model reader preferences, why reviews often reveal more about the reviewer than the text, and how LLMs can aid computational literary research while still falling short of human editors in creative writing.

1 month, 2 weeks ago @ dataskeptic.com
Disentanglement and Interpretability in Recommender Systems
2 months, 1 week ago @ dataskeptic.com
Collective Altruism in Recommender Systems

Ekaterina (Kat) Filadova from MIT EECS joins us to discuss strategic learning in recommender systems—what happens when users collectively coordinate to game recommendation algorithms. Kat's research reveals surprising findings: algorithmic "protest movements" can paradoxically help platforms by providing clearer preference signals, and the challenge of distinguishing coordinated behavior from bot activity is more complex than it appears. This episode explores the intersection of machine learning and game theory, examining what happens when your training data actively responds to your algorithm.

2 months, 2 weeks ago @ dataskeptic.com
Niche vs Mainstream

Anas Buhayh discusses multi-stakeholder fairness in recommender systems and the S'mores framework—a simulation allowing users to choose between mainstream and niche algorithms. His research shows specialized recommenders improve utility for niche users while raising questions about filter bubbles and data privacy.

2 months, 3 weeks ago @ dataskeptic.com
Healthy Friction in Job Recommender Systems

In this episode, host Kyle Polich speaks with Roan Schellingerhout, a fourth-year PhD student at Maastricht University, about explainable multi-stakeholder recommender systems for job recruitment. Roan discusses his research on creating AI-powered job matching systems that balance the needs of multiple stakeholders—job seekers, recruiters, HR professionals, and companies. The conversation explores different types of explanations for job recommendations, including textual, bar chart, and graph-based formats, with findings showing that lay users strongly prefer simple textual explanations over more technical visualizations. Roan shares insights from his "healthy friction" study, which tested …

3 months, 1 week ago @ dataskeptic.com
Fairness in PCA-Based Recommenders

In this episode, we explore the fascinating world of recommender systems and algorithmic fairness with David Liu, Assistant Research Professor at Cornell University's Center for Data Science for Enterprise and Society. David shares insights from his research on how machine learning models can inadvertently create unfairness, particularly for minority and niche user groups, even without any malicious intent. We dive deep into his groundbreaking work on Principal Component Analysis (PCA) and collaborative filtering, examining why these fundamental techniques sometimes fail to serve all users equally. David introduces the concept of "power niche users" - highly active users with specialized in…

3 months, 2 weeks ago @ dataskeptic.com
Video Recommendations in Industry

In this episode, Kyle Polich sits down with Cory Zechmann, a content curator working in streaming television with 16 years of experience running the music blog "Silence Nogood." They explore the intersection of human curation and machine learning in content discovery, discussing the concept of "algatorial" curation—where algorithms and editorial expertise work together. Key topics include the cold start problem, why every metric is just a "proxy metric" for what users actually want, the challenge of filter bubbles, and the importance of balancing familiarity with discovery. Cory shares insights on why TikTok's algorithm works so well (clean data and massive interaction volume), the crucial …

4 months, 3 weeks ago @ dataskeptic.com
Eye Tracking in Recommender Systems

In this episode, Santiago de Leon takes us deep into the world of eye tracking and its revolutionary applications in recommender systems. As a researcher at the Kempelen Institute and Brno University, Santiago explains the mechanics of eye tracking technology: how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the groundbreaking RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research, which opens new possibilities for understanding how users interact with carousel interfaces like Netflix. Through collaboration between psychologists and AI researchers, Santiago's work demonstrate…

4 months, 4 weeks ago @ dataskeptic.com
Cracking the Cold Start Problem

In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations. Boya shares insigh…

5 months, 1 week ago @ dataskeptic.com
Designing Recommender Systems for Digital Humanities

In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Monasterium.net, Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique research needs and information-seeking behaviors. Florian explains the technical challen…

5 months, 3 weeks ago @ dataskeptic.com
DataRec Library for Reproducible in Recommend Systems

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Mario Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon reviews. The conversation covers Alberto's research journey through knowledge graphs, graph-based recommen…

6 months ago @ dataskeptic.com
Shilling Attacks on Recommender Systems

In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a …

6 months, 1 week ago @ dataskeptic.com
Music Playlist Recommendations

In this episode, Rebecca Salganik, a PhD student at the University of Rochester with a background in vocal performance and composition, discusses her research on fairness in music recommendation systems. She explores three key types of fairness—group, individual, and counterfactual—and examines how algorithms create challenges like popularity bias (favoring mainstream content) and multi-interest bias (underserving users with diverse tastes). Rebecca introduces LARP, her multi-stage multimodal framework for playlist continuation that uses contrastive learning to align text and audio representations, learn song relationships, and create playlist-level embeddings to address the cold start prob…

6 months, 2 weeks ago @ dataskeptic.com
SuperDataScience
latest post 1 day, 19 hours ago
992: Tokenmaxxing vs AI Hardware Bottlenecks

While “tokenmaxxing”, the social media trend of maximizing AI token consumption as a vanity metric, takes off online, the physical infrastructure behind AI is slamming into serious bottlenecks. In this Five-Minute Friday, Jon Krohn maps out the four overlapping supply-chain constraints choking AI compute: GPUs (with NVIDIA Blackwell sold out through mid-2026), high-bandwidth memory (quintupled demand since 2023, only three manufacturers worldwide), CPUs (agentic AI requires 12x more CPUs per GPU than chatbots), and electricity (Gartner projects power shortages will restrict 40% of AI data centres by 2027). Find out why the five biggest hyperscalers are on track to spend $725 billion on AI i…

1 day, 19 hours ago @ podtrac.com
991: Pair Programming with AI in Your Python Notebook, with Dr. Trevor Manz

Dr. Trevor Manz of Marimo talks to Jon Krohn about Marimo Pair, an open-source agent skill that teaches coding agents like Claude Code how to drive a reactive Python notebook: reading cell state, running Python in the kernel, taking screenshots of cells, and iterating on data tasks the way agents iterate on traditional software. Trevor also unpacks recursive language models, his AnyWidget project that bridges Python and the web, and his journey from a Wisconsin small town and Harvard bioinformatics research to founding-engineer life at Marimo. Listen to the episode to hear why, no matter where AI takes us, curiosity and going deep on a topic will always be valuable. Additional materials: …

4 days, 19 hours ago @ podtrac.com
990: Inside Mythos: Anthropic's Locked-Down Frontier Model

Anthropic has built a frontier AI model so capable at finding software vulnerabilities that it has decided not to release it to the general public. In this Five-Minute Friday, Jon Krohn breaks down Claude Mythos Preview, a general-purpose model whose hacking abilities emerged as a side effect of broad improvements in code understanding and reasoning. Find out how Mythos achieved a nearly 100x improvement over Opus 4.6 on Firefox exploit generation, why Mozilla patched 271 vulnerabilities in a single release using an early version of the model, and what Project Glasswing, Anthropic’s gated industry consortium, means for the future of cybersecurity. Jon also shares practical tips for securing t…

1 week, 1 day ago @ podtrac.com
989: Security for Mythos-Era Agentic Risks, with Rubrik’s Anneka Gupta and Cal Al-Dhubaib

Rubrik’s Anneka Gupta and Cal Al-Dhubaib speak to Jon Krohn about cybersecurity measures, the risks AI in business might pose for malicious attacks, and why AI should be kept “boring.” Find out how Rubrik safeguards client data, what zero trust is in the context of cybersecurity, and why cyber-resilience needs to be a top priority for companies looking to adopt AI. Additional materials: www.superdatascience.com/989 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (02:25) All about Rubrik (08:51) The announcement of Claude …

1 week, 4 days ago @ podtrac.com
988: In Case You Missed It in April 2026

In this month’s episode of In Case You Missed It, Jon Krohn talks to guests about memory and education, and how artificial intelligence is continuing to help lower the barriers to access. Hear from Matt Glickman, Traci Walker Griffith, Richmond Alake, and Linda Haviv, discussing the foundations of AI agent memory, how engineers can develop at scale, and why they believe AI could be your child’s perfect tutor in the classroom. Additional materials: www.superdatascience.com/988 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

2 weeks, 1 day ago @ podtrac.com
987: AI Infrastructure, Ray, and Why Nonlinear Careers Win, with Linda Haviv

Linda Haviv talks to Jon Krohn about staying current on AI matters, why open-source technology is narrowing the gap in its race with proprietary models, and how being a content creator in tech is key to career growth and longevity. She emphasizes that non-linear pathways to a career in tech can give applicants an edge, and stresses the importance of continuous upskilling to “stay relevant.” In her view, systems thinking is becoming more important than coding skills. Hear why in this episode. Additional materials: www.superdatascience.com/987 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascienc…

2 weeks, 4 days ago @ podtrac.com
986: Building Hardware is Hard but AI Agents Help, with Kishore Subramanian

CTO of Propel Software Kishore Subramanian talks to Jon Krohn about how product lifecycle management (PLM) software and quality management systems help ensure compliance, record management, and quality assurance. Listen to the episode to hear Kishore Subramanian talk about best practices for getting started with Agentforce 360, his top tips for deploying AI projects, and why yoga and meditation could make you better at building AI products! Additional materials: www.superdatascience.com/984 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (05:21) …

3 weeks, 1 day ago @ podtrac.com
985: The Four Types of Memory Every AI Agent Needs, with Richmond Alake

Oracle’s Director of AI Developer Experience Richmond Alake returns to the show to talk to Jon Krohn about agent memory: the network of systems, models, databases and LLMs that enable AI agents to learn and adapt over time. Listen to the episode to hear about Richmond’s “100 Days of Agent Memory” initiative, retrieval-augmented generation’s (RAG) limitations with AI agents, the layers of the AI agent stack, and what makes the Oracle AI database so useful to developers. Additional materials: www.superdatascience.com/985 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship inf…

3 weeks, 4 days ago @ podtrac.com
984: Building AI Agents Where 99.9% Accuracy Isn't Good Enough, with Raju Malhotra

Raju Malhotra, Chief Product and Technology Officer at Certinia, talks to Jon Krohn about the so-called SaaSpocalypse and how agentic AI is proving the doomsayers wrong. Listen to the episode to hear more about Certinia’s work with Salesforce and building with Agentforce 360, the three elements required for enterprise-grade agents, how AI agents have benefitted Certinia’s customers, and how to keep your work portfolio fresh and interesting to recruiters. Additional materials: www.superdatascience.com/984 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will lea…

4 weeks, 1 day ago @ podtrac.com
983: AI in the Classroom: How a Top Elementary School Is Doing It Right, with Principal Traci Walker Griffith

My guest today took a public school that was about to be shut down and turned it into the number one school in Boston, and AI is her latest secret weapon. In a long-overdue episode on AI for supporting children’s education, hear directly from Principal Traci Walker Griffith how her teachers have been experimenting with AI in classrooms, what works, what doesn’t work, and what’s next for kids as LLMs continue to improve. Additional materials: www.superdatascience.com/983 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (03:38) Th…

1 month ago @ podtrac.com
982: In Case You Missed It in March 2026

Jon Krohn rounds up March’s interviews in this ICYMI episode. Hear from AI and data science experts across the fields of education and business in this wide-ranging series of clips that take listeners from the Renaissance to the near future. Guests include Lin Qiao (Episode 971), Chris Fregly (Episode 973), Zack Kass (Episode 975), Kyunghyun Cho (Episode 977), and Rohit Choudhary (Episode 979). Additional materials: www.superdatascience.com/982 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month ago @ podtrac.com
981: How Data Engineers Are “10x’ing” Themselves With Agents, feat. Matt Glickman

Matt Glickman talks to Jon Krohn about co-founding the agentic-platform startup Genesis Computing, how his experience at Goldman Sachs paved the way for developing AI agents, and where he thinks agentic AI has just as much value as a company’s human employees. This February, Genesis Computing revealed how its platform can offer the guardrails so crucial to businesses, alongside increased capabilities that help execute entire workflows from research to deployment. Additional materials: www.superdatascience.com/981 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.…

1 month, 1 week ago @ podtrac.com
980: AI Making Theoretical Physics Breakthroughs

A team of theoretical physicists from Harvard, Cambridge, the Institute for Advanced Study, and Vanderbilt used OpenAI’s models not just as a tool, but as a collaborator, cracking a problem in particle physics that had stymied them for months. In this Five-Minute Friday, Jon Krohn walks through how GPT-5.2 Pro simplified a 32-variable mathematical expression into a single line, proposed what it called the “obvious generalization” for any number of gluons, and how a more powerful internal model then produced a formal proof after 12 hours of autonomous reasoning. Find out why this may be a template for AI-assisted scientific discovery and what it means for the future of research. Additional m…

1 month, 1 week ago @ podtrac.com
979: Agentic Data Management and the Future of Enterprise AI, with Rohit Choudhary

For years, Jon has been quoting the stat that the world's data is roughly doubling every year. His guest today says that’s way too conservative: he sees enterprise data soon growing at close to 10x per year. And most organizations are nowhere near ready for what that means. In this episode, Rohit Choudhary, founder and CEO of Acceldata, explains how the agentic data management platform his team has built helps enterprises make their increasingly vast amounts of data self-aware, self-optimizing, and AI-ready. He breaks down why governance needs to be operational and real-time rather than a one-time compliance exercise, and shares his view on why the most valuable professionals in the age…

1 month, 2 weeks ago @ podtrac.com
A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)

A game millions of people solve over morning coffee is exposing a fundamental weakness in today’s most powerful AI models. In this Five-Minute Friday, Jon Krohn breaks down Pathway’s new Sudoku Extreme benchmark (roughly 250,000 of the hardest Sudoku puzzles available) and why leading LLMs like o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet scored effectively zero percent, while Pathway’s post-transformer BDH architecture achieved 97.4% accuracy at a fraction of the cost. Listen to the episode to find out what BDH is doing differently, why Sudoku performance matters far beyond puzzles, and what this means for the future of AI reasoning. Additional materials: www.superdatas…

1 month, 2 weeks ago @ podtrac.com
Data Science at Home
latest post 3 weeks, 4 days ago
Europe, wake up! You Can’t Be a Superpower on Someone Else’s Servers (Ep. 304)

Tech sovereignty takes 3 years and political will.

Check outshift.com

NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, interviews, and discussions.

Send us mail at: [email protected]

Don’t forget to like, subscribe, and hit the 🔔 for updates on the latest in AI and data science!

3 weeks, 4 days ago @ datascienceathome.com
About Apple’s Privacy (Ep. 302)

Apple just spent $2B on tech that reads your silent speech.

🐦 Twitter: @DataScienceAtHome 📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/ Instagram: https://www.instagram.com/datascienceathome/ Facebook: https://www.facebook.com/datascienceAH LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast Discord Channel: https://discord.gg/4UNKGf3

3 weeks, 4 days ago @ datascienceathome.com
Productivity is the new data breach (Ep. 301)

Personal newsletter: https://defragzone.substack.com 📩 Newsletter: https://datascienceathome.substack.com 🎙 Podcast: Available on Spotify, Apple Podcasts, and more.

3 weeks, 4 days ago @ datascienceathome.com
Programmable Money: The Cage They’ll Call Convenience (Ep. 300)

This episode breaks down programmable money, the technology that turns your wallet into a permission system.

3 weeks, 4 days ago @ datascienceathome.com
There Is No AI. There’s a Stateless Function on 10,000 GPUs Pretending to Know You (Ep. 299)

2 months, 1 week ago @ datascienceathome.com
Bias in the machine (edited)

The title of today’s episode is Bias in the machine.

C: Francesco, today we are starting with an infuriating discussion.

The failure of the medical community as a whole to recognise this obvious bias up to the 21st century is an example of how insidious the problem of bias is.

Three: The bias in your training sample: people put training samples together, and people have culture, experience, and prejudice.

These assumptions inform the way AI systems work—and fail—to this day.

When an algorithm is a black box and you can’t look inside, you have no way of analysing its bias.

2 months, 1 week ago @ datascienceathome.com
What is wrong with reinforcement learning? (Ep. 82)

Join the discussion on our Discord server. After seeing reinforcement learning agents do so well at Atari video games, Go (AlphaGo), financial trading, and language modeling, let me tell you the real story. In this episode I want to shine some light on reinforcement learning (RL) and the limitations that every practitioner should consider before taking certain directions.

RL seems to work so well!

What is wrong with it?

Are you a listener of Data Science at Home podcast?

Or did you subscribe to the Artificial Intelligence at your fingertips newsletter?

3 months, 1 week ago @ datascienceathome.com
How to generate very large images with GANs (Ep. 76)

Join the discussion on our Discord server. In this episode I explain how a research group from the University of Lübeck overcame the curse of dimensionality to generate large medical images with GANs.

The problem is not as trivial as it seems.

Many researchers have failed in generating large images with GANs before.

One interesting application of this approach is in medicine, for the generation of CT and X-ray images. Enjoy the show!

References: Multi-scale GANs for Memory-efficient Generation of High Resolution Medical Images, https://arxiv.org/abs/1907.01376

3 months, 1 week ago @ datascienceathome.com
Training neural networks faster without GPU [RB] (Ep. 77)

Join the discussion on our Discord server. Training neural networks faster usually means using powerful GPUs.

In this episode I explain an interesting method from a group of researchers from Google Brain, who can train neural networks faster by squeezing the hardware to their needs and making the training pipeline more dense.

Enjoy the show!

References: Faster Neural Network Training with Data Echoing, https://arxiv.org/abs/1907.05550

3 months, 1 week ago @ datascienceathome.com
More powerful deep learning with transformers (Ep. 84) (Rebroadcast)

Some of the most powerful NLP models like BERT and GPT-2 have one thing in common: they all use the transformer architecture.

This architecture is built on top of another important concept already known to the community: self-attention. In this episode I explain what these mechanisms are, how they work and why they are so powerful.

Don’t forget to subscribe to our Newsletter or join the discussion on our Discord server.

3 months, 1 week ago @ datascienceathome.com
Your Favorite AI Startup is Probably Bullshit (Ep. 298) [RB]

The brutal truth about why Silicon Valley is blowing billions on glorified autocomplete while pretending it’s the next iPhone.

We’re diving deep into the AI investment circus where VCs who can’t code are funding companies that barely understand their own technology.

From blockchain déjà vu to the “ChatGPT wrapper” economy—this episode will make you question every AI valuation you’ve ever seen.

Fair warning: We’re naming names and calling out the hype.

Don’t listen if you work at a “revolutionary AI startup” that’s just OpenAI’s API with a pretty interface.

3 months, 1 week ago @ datascienceathome.com
Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 297) [RB]

VortexNet uses actual whirlpools to build neural networks.

By borrowing equations from fluid dynamics, this new architecture might solve deep learning’s toughest problems—from vanishing gradients to long-range dependencies.

Today we explain how vortex shedding, the Strouhal number, and turbulent flows might change everything in AI.

Sponsors: This episode is brought to you by Statistical Horizons. At Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics and AI methods practical and accessible.

Join thousands of researchers and professionals who’ve advanced their careers with Statistical Horizons.

3 months, 1 week ago @ datascienceathome.com
AGI: The Dream We Should Never Reach (Ep. 296)

Also on YouTube. Two AI experts who actually love the technology explain why chasing AGI might be the worst thing for AI’s future, and why the current hype cycle could kill the field we’re trying to save.

Head to datascienceathome.com for detailed show notes, code examples, and exclusive deep-dives into the papers we discuss.

Subscribe to our newsletter for weekly breakdowns of cutting-edge research delivered straight to your inbox—no fluff, just science!

Our Discord community is full of ML engineers, researchers, and AI enthusiasts discussing papers, sharing projects, and helping each other level up.

Whether you’re debugging your first neural net or training your tenth transformer, there’s a …

3 months, 1 week ago @ datascienceathome.com
When Data Stops Being Code and Starts Being Conversation (Ep. 297)

Mark Brocato built Mockaroo—the tool that taught millions of developers how to fake data.

Now, as Head of Engineering at Tonic.ai, he’s building the AI agent that’s making his own creation obsolete.

From the hidden failures of legacy mocks to the security implications of agent-driven synthesis, Mark reveals what happens when data generation becomes a conversation—not a pipeline.

Sponsors: Tonic.ai. Synthetic data solutions for software and AI development.

Accelerate engineering velocity and ensure compliance with AI-powered data synthesis. This episode is brought to you by Statistical Horizons. At Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics…

4 months, 3 weeks ago @ datascienceathome.com
Your AI Strategy is Burning Money: Here’s How to Fix It (Ep. 295)

Most companies don’t have an AI problem.

In this conversation, he breaks down when AI actually makes sense, where AWS costs spiral out of control, and why your “cool demo” keeps dying before launch.

If you’re tired of AI hype and ready for straight answers, hit play.

Our Discord community is full of ML engineers, researchers, and AI enthusiasts discussing papers, sharing projects, and helping each other level up.

Whether you’re debugging your first neural net or training your tenth transformer, there’s a place for you.

5 months, 3 weeks ago @ datascienceathome.com