Very ML
State-of-the-art Machine Learning News Feed
/r/MachineLearning
last post 7 hours ago
My offline ablation said -0.19pp. The production retrain said +1.11pp. [D]
7 hours ago @ reddit.com
Source code for LLMs. [D]
8 hours ago @ reddit.com
quicktok: a faster tokenizer (exact and byte-identical with tiktoken) [P]
14 hours ago @ reddit.com
How the brains learn [R]
19 hours ago @ reddit.com
Cleo: trying to fit full analyst behavior in a 2B model [P]
20 hours ago @ reddit.com
Embedded/edge ML folks: what actually eats the most time, getting data, or cleaning/labeling it (time series sensor data, not computer vision/audio)? [D]
23 hours ago @ reddit.com
Open weights are not enough: we need open training frameworks for research and better algorithms [P]
1 day ago @ reddit.com
AI language models have favorite names, and we mapped them [R]
1 day, 1 hour ago @ reddit.com
Could AI training be decentralized like Bitcoin mining? [D]
1 day, 2 hours ago @ reddit.com
NeurIPS Competition decision notification [D]
1 day, 3 hours ago @ reddit.com
Concept-Vector: A design framework for human-interpretable word embeddings [P]
1 day, 3 hours ago @ reddit.com
Recent CS graduate looking for GPU compute collaborators for LLM/VLM research [D]
1 day, 6 hours ago @ reddit.com
PrintGuard 2.0 — ShuffleNetV2 + few-shot prototypical network, TFLite via LiteRT, ≈5 MB, runs unmodified in the browser (Pyodide) and on CPython [P]
1 day, 6 hours ago @ reddit.com
PhD study: UX Designers & AI/ML Practitioners to test a "Trust in LLM-based Chatbots" Design Method (~25 min, anonymous) [R]
1 day, 11 hours ago @ reddit.com
Why do frontier AI labs send so many people to conferences? [D]
1 day, 13 hours ago @ reddit.com
Towards Data Science
last post 2 hours ago
Drilling Into AI’s Financial Sustainability

Let’s go back to my April article and recollect the experience of using AI for the individual.

So, if you’re in the financial department of a tech company, and you need to determine the budget in dollars for AI tokens for the next year, I wish you all the best of luck.

Is it “manual coding” in the second half of the year, after spending the first half using AI intensively?

This also brings up the question of how cost cutting on AI is going to affect the companies providing AI-based solutions.

Even if tech companies do feel like AI is providing them benefits and giving productivity gains, they simply do not have unlimited budgets to apply to it.

2 hours ago @ towardsdatascience.com
Run a Local LLM with OpenClaw on Your Mac Mini

💵💵 Running a local model eliminates the monthly cost for your OpenClaw agents, entirely.

Download the local LLM: As mentioned, the key to getting good performance from a local model is quantization.

As of June 2026, it’s a top performer for local models, edging out Gemma 4-12B.

Reconfigure OpenClaw to use the local model: We now need to add this local model to our OpenClaw config so it’s usable by our gateway.

You shouldn’t, but this is doubly important for security

{ "ok": true, "capability": "model.run", "transport": "local", "provider": "local", "model": "qwen3-9b", "attempts": [], "outputs": [ { "text": "pong", "mediaUrl": null } ] }

We’ve now verified all the plumbing works correctly.

3 hours ago @ towardsdatascience.com
LLM Fallbacks Break Agent Pipelines — I Built the Missing Recovery Layer

My basic retry loop caught the error, swapped to a fallback model, and kept running.

The issue is what happens when you hand a fallback model an unchanged payload formatted for a different engine entirely.

When a model swap breaks the structure at step two, that step doesn’t throw an error.

The fallback model receives the context directly: [RESUME] Task 'pipeline_run_3' interrupted at step 2/3 (Execute planned steps).

Treat a model swap as a data integrity event, not an infrastructure retry.
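
One way to read "data integrity event" is that a reply from a fallback model is only accepted after it passes the same structural check as the primary model's reply. The sketch below is a minimal illustration under that assumption, not the article's recovery layer: `REQUIRED_KEYS`, `validate_step_output`, and `run_step` are hypothetical names, and each model is represented by a plain callable that takes a prompt and returns a JSON string.

```python
import json

# Hypothetical schema for one pipeline step; the article's real schema is not shown.
REQUIRED_KEYS = {"step", "status", "result"}


def validate_step_output(raw: str) -> dict:
    """Parse a model reply and check it against the expected step schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("step output must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"step output is missing keys: {sorted(missing)}")
    return data


def run_step(prompt: str, models: list) -> dict:
    """Try each model callable in order; accept only replies that pass validation."""
    errors = []
    for call_model in models:
        try:
            return validate_step_output(call_model(prompt))
        except Exception as exc:  # transport failure or structurally invalid reply
            errors.append(exc)
    raise RuntimeError(f"all models failed validation: {errors}")
```

With this shape, a fallback that answers in the wrong structure fails loudly at the step where the swap happened instead of passing a malformed payload downstream.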

5 hours ago @ towardsdatascience.com
RAG Questions Need Parsing Too: Turn the User’s String Into Briefs for Retrieval and Generation

The question-parsing brick of Enterprise Document Intelligence, a series that builds an enterprise RAG system from four bricks: parsing, question parsing, retrieval, and generation.

Turn the noisy user string into a typed, structured brief that downstream bricks can act on.

Question parsing produces one row in question_df plus satellite tables, with two derived views feeding retrieval and generation – Image by author

Question parsing has the same goal: turn the unstructured input into structured form before the next steps run on it.

Two follow-up articles close the brick: Article 6 B (extraction) walks through what the parser extracts from a user string.

6 hours ago @ towardsdatascience.com
How to Effectively Align with Claude Code

I’ll discuss how to align better with your coding agents, which will improve your efficiency when working with them.

In this case, it’s why you should align with coding agents such as Claude Code.

Coding agents are incredibly good at implementing things if they’re given a very specific and well-described spec.

How to align with your coding agents: In this section, I’ll cover specific techniques that I utilize to align with my coding agent, and also a mindset on how I align with my coding agents.

Conclusion: In this article, I discussed how to effectively align with coding agents.

1 day, 2 hours ago @ towardsdatascience.com
The Protocol That Cleaned Up Our Agent Architecture

Building the MCP server; Stdio vs HTTP; Connecting it to LangGraph; Human-in-the-loop at the protocol boundary; What can break in production and why?

Building the MCP Server: An MCP server can expose three things: Tools (callable actions), Resources (read-only data), and Prompts (reusable templates).

The MCP protocol uses stdout as its communication channel.

For production workloads with chained tool calls, use async with client.session("analyst-tools") to pin multiple calls to one session.

If a human needs to approve a tool call and it takes five minutes, the agent graph is suspended waiting.

1 day, 3 hours ago @ towardsdatascience.com
I Built 11 Models to Predict the 2026 World Cup. They Crown Four Different Champions.

Three rating systems (Elo, Colley, PageRank), two goal models (Poisson, Negative Binomial), five classifiers (logistic regression, KNN, random forest, XGBoost, a neural network), and the betting market as a benchmark.

They crown four different champions — and that disagreement, not the consensus, turns out to be the most useful thing a suite of models can give you.

To get a match probability we run the Elo gap through that logistic curve and split off a draw probability fit separately (more on that below).
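
As a rough illustration of running an Elo gap through a logistic curve, the sketch below uses the standard Elo expected-score formula with the conventional scale of 400. The function name and the scale default are assumptions, since the snippet does not give the article's parameters, and the separately fitted draw probability is deliberately left out.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of team A against team B under the standard Elo logistic curve.

    This is the pre-draw quantity only; splitting off a draw probability
    requires a separate fit that is not reproduced here.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Illustrative ratings, not taken from the article.
print(elo_expected_score(1500, 1500))  # 0.5 for evenly rated teams
```

A 400-point gap gives the stronger side an expected score of 10/11, and the two sides' expected scores always sum to 1.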

The goal models: Poisson and Negative Binomial. These model scorelines, not outcomes.

Every model is built and explained in the upcoming book I co-authored, titled Soccer Analytics with Mac…

1 day, 5 hours ago @ towardsdatascience.com
The System Always Knows: Why Local Efficiency and System Performance Are Not the Same Problem

The problem starts when the organization forgets that CPD is only seeing one part of the operating system.

Where the Curves Separate: Imagine a grocery delivery route with several stops.

The original CPD model may still show a win.

The system cost is debatable.

A system model is harder.

1 day, 6 hours ago @ towardsdatascience.com
4 Lines You Should Include in Your Claude Skill

Image generated by author using Claude

A Quarterly Customer Review Report Skill: I’m going to walk you through a Claude skill I built that generates a quarterly customer sentiment report from unstructured product review text, delivered as a PDF to stakeholders.

Use [Data-Supported] only when the insight follows directly from the review text provided.

Here’s the output:

Image generated by author using Claude

How to Use Claude to Refine the Skill: Writing a skill once isn’t enough.

Here is a quarterly customer sentiment report generated by an AI analyst.

Every time Claude produces a report with a clearly wrong or overconfident insight, ask it to add a new constraint to your skill.

2 days, 1 hour ago @ towardsdatascience.com
Vision LLMs are PDF Parsers Too: Reading Charts and Diagrams for RAG

OCR and layout return a box; the vision parser writes text you can retrieve – Image by author

A vision model reads the picture.

A vision model writes “aerial view of a parking lot, roughly half full, around forty cars”.

That sentence is the parse, and only a vision model can produce it.

A vision model reads the text and the tables too, and no worse than the textual engines on clean material.

There is no layout model and no OCR step: the model reads the pixels and fills the schema.

2 days, 3 hours ago @ towardsdatascience.com
GPU Time-Slicing for Concurrent LLM Agents on Kubernetes

A systems-level deep dive into the hidden microarchitectural costs of Kubernetes GPU time-slicing, and what it actually costs to co-locate Agentic AI workloads.

2 days, 5 hours ago @ towardsdatascience.com
Larger Context Windows Don’t Fix RAG — So I Built a System That Does

TL;DR: I built a dataset Q&A system and trusted a RAG answer that was less than half-correct.

I was asking a retrieval system to perform heavy computation on data it had only partially seen.

Why RAG Cannot Aggregate: The RAG pipeline doesn’t truly understand structured data.

Architectural comparison of query processing workflows, contrasting text-based RAG Simulation retrieval with structured data aggregation in a Semantic Engine.

The illusion of context: How larger context windows in RAG and LLM systems increase user confidence and decrease error detectability without improving actual accuracy.

3 days, 1 hour ago @ towardsdatascience.com
Parse PDFs for RAG Locally with Docling: Rich Tables, No Cloud Upload

The same tables, Docling enriches half of them, all on your own machine – Image by author

One DataFrame for every downstream brick to read; paragraph lines and table rows look the same on the way out.

This is the same design choice the Azure builder makes, so a downstream chunker treats fitz, Azure, and Docling table rows identically.

Docling: when fitz misses (tables, scans, figure text, no bookmarks) and the document is confidential or the environment is air-gapped.

A table page in a public report: Azure if you want it managed, Docling if you want it local.

3 days, 3 hours ago @ towardsdatascience.com
Solving the 3Blue1Brown String Probability Problem (Without AI)

the trouble of solving a silly probability problem in my free time when I could’ve been doom scrolling?

This article will go through solving a fun probability puzzle that one of my favorite YouTubers (3Blue1Brown) put out recently.

For this problem, we can make a probability tree and manually calculate the expected number of loops.

Image by author

So, the probability of making a loop is 1/7 and the probability of not making a loop is 6/7 – this yields 1/7 expected loops (1*1/7 + 0*6/7).
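
The arithmetic in that step is a plain expected value over two outcomes, which can be checked with exact fractions. The helper name `expected_value` is an assumption for illustration; the probabilities 1/7 and 6/7 are the ones quoted in the snippet.

```python
from fractions import Fraction


def expected_value(outcomes):
    """Expected value of (value, probability) pairs; probabilities must sum to 1."""
    total_probability = sum(p for _, p in outcomes)
    if total_probability != 1:
        raise ValueError(f"probabilities sum to {total_probability}, not 1")
    return sum(value * p for value, p in outcomes)


# One loop with probability 1/7, zero loops with probability 6/7.
loops = expected_value([(1, Fraction(1, 7)), (0, Fraction(6, 7))])
print(loops)  # 1/7
```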

What if we changed the problem from calculating the expected number of loops to the expected average circumference of the loops?

3 days, 5 hours ago @ towardsdatascience.com
When PyMuPDF Can’t See the Table: Parse PDFs for RAG with Azure Layout

Tables: fitz returns flat words, Azure returns cells. A contract table has rows and columns.

Images: fitz returns the bbox, Azure returns the text. Many PDFs have figures with text inside them.

Scanned pages: fitz returns nothing, Azure returns OCR. A 30-page native contract gets a 10-page scanned amendment glued at the end.

One table for every downstream brick to read; paragraph lines and table rows look the same on the way out.

: when the same page got both passes, keep azure rows over fitz rows ( if lexicographically, or use an explicit precedence map).
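
The explicit precedence map mentioned above might look like the sketch below. It is an assumption-laden illustration rather than the article's code: rows are plain dicts with hypothetical `page` and `source` fields, and the map simply ranks `azure` ahead of `fitz`. Sorting source names alphabetically would give the same order here only because "azure" happens to sort before "fitz", which is why an explicit map is the more robust choice.

```python
# Hypothetical precedence: a lower rank wins when a page was parsed by both engines.
PRECEDENCE = {"azure": 0, "fitz": 1}


def keep_preferred_rows(rows):
    """Keep, for each page, only the rows from the highest-precedence source."""
    best_rank_by_page = {}
    for row in rows:
        rank = PRECEDENCE[row["source"]]
        page = row["page"]
        if page not in best_rank_by_page or rank < best_rank_by_page[page]:
            best_rank_by_page[page] = rank
    return [
        row for row in rows
        if PRECEDENCE[row["source"]] == best_rank_by_page[row["page"]]
    ]
```

Pages that only one engine parsed pass through unchanged, and the original row order is preserved.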

4 days ago @ towardsdatascience.com
Distill.pub
last post: none
TheSequence
last post 7 hours ago
The Sequence Knowledge #878: Beyond Transformer: What We Learned

The Transformer didn’t win because it was the most elegant or the most brain-like design.

Every token looks at every other token, the whole thing maps cleanly onto a GPU grid, and you train it all at once.

The cost is that attention scales quadratically with sequence length, and autoregressive decoding drags around a KV-cache that grows linearly with every token you’ve already seen.
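
To make the two growth rates concrete, the sketch below counts attention-score entries (quadratic in sequence length) and KV-cache bytes (linear in sequence length). The function names and every model dimension are illustrative assumptions, not figures from the newsletter, and the score count is the full matrix for one head in one layer, ignoring causal masking.

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in one head's full attention score matrix: every token against every token."""
    return seq_len * seq_len


def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for seq_len tokens.

    The leading factor 2 covers one key tensor and one value tensor per layer;
    bytes_per_value=2 assumes 16-bit storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape, chosen only to show the scaling.
for tokens in (1_000, 2_000, 4_000):
    cache = kv_cache_bytes(tokens, n_layers=32, n_kv_heads=8, head_dim=128)
    print(tokens, attention_score_entries(tokens), cache)
```

Doubling the sequence length doubles the cache but quadruples the score matrix.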

The second family is state space models — the SSM/Mamba line, the most serious challenger of the bunch.

The third family is text diffusion — generation that abandons left-to-right decoding entirely, refining a whole sequence in parallel over a handful of denoising steps.

7 hours ago @ thesequence.substack.com
The Sequence Radar #877: Last Week in AI: Anthropic Ships, Apple Borrows, Musk Lists, Bezos Builds

📝 Editorial: Last Week in AI: Anthropic Ships, Apple Borrows, Musk Lists, Bezos Builds

Some weeks in AI feel like incremental patch releases.

🔎 AI Research

AI Lab: Google Research & Google DeepMind. Summary: This paper introduces a unified framework for constructing practical, kernel-based two-sample tests derived from the family of f-divergences.

AI Lab: Stanford University. Summary: DELM is a novel multi-agent framework that eliminates the bottleneck of centralized orchestration by relying on a shared, verified context and an asynchronous task queue.

🤖 AI Tech Releases

Claude Fable 5 and Mythos 5: Anthropic released its highly anticipated Fable 5 model, a limited Mytho…

2 days, 7 hours ago @ thesequence.substack.com
The Sequence Opinion: Systems of Record vs. Systems of Action

It changes what enterprise software is fundamentally for.

The old winning layer was the system that held canonical state.

The new winning layer is the system that can take action against that state safely, reliably, and observably.

For the last twenty years, the enterprise stack has been built around one hidden constant: the human is the actor.

The software is basically a database wrapped in forms, permissions, workflows, and a pricing page.

5 days, 7 hours ago @ thesequence.substack.com
The Sequence AI of the Week #875: Why Your Language Model Needs a Nap

There’s an awkward fact about the models we all use every day: they don’t learn anything anymore.

Whatever a frontier model knows, it learned once, during training, and then somebody hit save.

It can reason circles around you about events up to its cutoff and then go completely blank about last Tuesday.

Behrouz, Hashemi, and Mirrokni (Google + Cornell) have a name for this in their new paper, and it’s a good one: it’s anterograde amnesia.

The paper’s pitch is that we’ve been missing a step that biology figured out a long time ago.

6 days, 8 hours ago @ thesequence.substack.com
The Sequence Knowledge #874: Transformers or Not?

💡 AI Concept of the Day: Transformers or Not?

Not because it is obviously the most brain-like, elegant, or efficient design, but because it has the best scaling story.

You add data, parameters, compute, context length, better training recipes, better post-training, and the model gets better in a surprisingly smooth way.

The architecture is simple enough to scale, parallel enough to train efficiently, and expressive enough to absorb huge datasets.

So the question is not “are Transformers good?” They are spectacular.

1 week ago @ thesequence.substack.com
The Sequence Radar #873: Last Week in AI: Soccer, S-1s, and Supermodels

The AI of the week dives into a groundbreaking paper that I’ve read three times this week: models need sleep.

📝 Editorial: Last Week in AI: Soccer, S-1s, and Supermodels

This week I want to start close to home.

At LayerLens, we announced the Stratix Cup, a live tournament in which frontier AI models play soccer in a simulated environment.

The proposed IPO is not yet a public-market event, but it signals that frontier AI is moving from private-market mythology into public-market accountability.

MAI Models: Microsoft unveiled 7 new AI models including a flagship MAI-Thinking-1.

1 week, 2 days ago @ thesequence.substack.com
The Sequence Opinion #872: The Cake Is a Battlefield: Who Really Controls the AI Stack

When Jensen Huang draws AI as a five-layer cake — energy, chips, infrastructure, models, applications — he describes it as harmony.

Every successful application pulls demand down through models, infrastructure, and chips, all the way to the power plant that keeps it alive.

But Jensen is selling the bottom of the cake, so of course he wants you to see harmony.

The cake is not a structure of mutual reinforcement.

So the right question is never “how many layers do you own.” It is: do you own the scarce layer, and the seam right next to it?

1 week, 5 days ago @ thesequence.substack.com
The Sequence AI of the Week #871: Inside the Loop with Claude Opus 4.8

I am sure you guys are surprised that we are going to cover Claude Opus 4.8 today ;) but I have been playing with it so much that it merits the post.

Opus 4.8 shipped on May 28, 2026.

That instinct is wrong here, because Opus 4.8 isn’t competing on the axis the version number implies.

Those are the properties that gate whether you can actually leave an agent running, and they don’t show up on a capability leaderboard.

So let me give you the version I’d want if I were wiring this into a production agent loop at 2am.

1 week, 6 days ago @ thesequence.substack.com
The Sequence Knowledge #870: Liquid Models and the Search for a Post-Transformer Architecture

💡 AI Concept of the Day: Liquid Models and the Search for a Post-Transformer Architecture

The Transformer did not merely become the dominant neural architecture.

Its central idea is deceptively simple: when processing a sequence, let every element look at every other element.

Earlier models processed sequences like a reader moving left to right, updating a hidden state at each step.

Instead of compressing the past into a single state, they exposed the entire past to the model.

During inference, the model accumulates a key-value cache so that each new token can attend to the past.

2 weeks ago @ thesequence.substack.com
The Sequence Radar #869: Last Week in AI: The Token Becomes the Unit of Account — Opus 4.8, OpenRouter, Cognition, Snowflake, and a papal warning

📝 Editorial: Last Week in AI: The Token Becomes the Unit of Account — Opus 4.8, OpenRouter, Cognition, Snowflake, and a papal warning

For two years the AI boom was an argument about the future, told in benchmarks and term sheets.

The whole stack — model, router, agent, substrate — is converging on one business model: charge by the token, because the token is the work.

AI Lab: Harvard University & MIT. Summary: Bidirectional Evolutionary Search (BES) overcomes the limitations of sparse verification signals and narrow autoregressive expansion by coupling forward candidate evolution with backward goal decomposition.

AI Lab: Meta AI. Summary: MobileMoE introduces a famil…

2 weeks, 2 days ago @ thesequence.substack.com
The Sequence Opinion #868: Recursion Is the New Scaling Law

For most of the modern AI era, progress has had a deceptively simple recipe: make the model bigger, train it on more data, and spend more compute.

This formula produced the Transformer era, the foundation model era, and the current wave of large language models.

But the most interesting recent progress in AI is beginning to feel less linear.

It is no longer just about building a larger model that gives a better answer in one pass.

That shift suggests a provocative idea: recursion may be the next scaling law.

2 weeks, 5 days ago @ thesequence.substack.com
The Sequence AI of the Week #867: Thinking in Latents: Why Sapient's HRM-Text Is a Quiet Rebuke to Chain-of-Thought

So we paper over this by making the model think out loud.

Every reasoning step has to leave the residual stream, become a discrete token in a vocabulary built for human communication, and come back in through the embedding layer for the next step.

Sapient Intelligence’s bet, made first with the original Hierarchical Reasoning Model paper last summer and now extended into the language domain with HRM-Text, is that this is fixable.

Not by making the model bigger, not by training on more CoT traces, but by giving the architecture the one thing it doesn’t have: variable, internal, depth.

It’s worth thinking carefully about what they did and what it does and doesn’t yet prove.

2 weeks, 6 days ago @ thesequence.substack.com
The Sequence Knowledge #866: Three Text Diffusion Models You Need To Know About

💡 AI Concept of the Day: Three Text Diffusion Models You Need To Know About

For most of the LLM era, language generation has been built around a single assumption: text should be produced like a typewriter, one token at a time, left to right, each new symbol conditioned on a frozen history.

Text diffusion models challenge that assumption at its root.

Instead of factorizing language as “the next token given all previous tokens,” diffusion models define a corruption process and then learn how to reverse it.
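
The corrupt-then-reverse idea can be sketched as follows, assuming one common choice of corruption: independently replacing tokens with a mask symbol. The `oracle` predictor is a hypothetical stand-in for the learned denoising network, and all names here are illustrative only.

```python
import random

MASK = "[MASK]"


def corrupt(tokens, t, rng):
    """Forward process: replace each token with MASK with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def denoise(noisy, predict, steps):
    """Reverse process sketch: reveal masked positions over `steps` rounds."""
    current = list(noisy)
    for step in range(steps):
        masked = [i for i, tok in enumerate(current) if tok == MASK]
        if not masked:
            break
        # Reveal an equal share of the remaining masks in each round.
        remaining_rounds = steps - step
        share = -(-len(masked) // remaining_rounds)  # ceiling division
        for i in masked[:share]:
            current[i] = predict(current, i)
    return current


text = "text should be produced like a typewriter".split()
rng = random.Random(0)
noisy = corrupt(text, t=0.5, rng=rng)


def oracle(sequence, position):
    # Hypothetical stand-in for a trained model: looks up the true token.
    return text[position]


restored = denoise(noisy, oracle, steps=4)
print(noisy)
print(restored)
```

Unlike left-to-right decoding, each round can fill several positions at once and can condition on tokens to the right of a masked position.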

Together, they outline the three phases of a new architecture class: scientific proof, industrial deployment, and frontier validation.

LLaDA: The Scientific Proof That Diffusion Can Scale

3 weeks ago @ thesequence.substack.com
The Sequence Radar #865: Last Week in AI: Karpathy, Google, Colossus, and the Coming IPO Wave

📝 Editorial

The last three weeks felt like a phase transition.

It is the most coherent agent story any frontier lab has shipped this year.

AI Lab: Carnegie Mellon University. Summary: This study evaluates the practical effectiveness of AI-generated peer reviews by having 45 domain scientists manually grade 2,960 individual criticisms from 82 Nature-family papers.

AI Lab: University of Illinois Urbana-Champaign & Meta. Summary: This paper introduces Spreadsheet-RL, an on-policy reinforcement learning framework designed to train specialized AI agents for complex spreadsheet workflows…

3 weeks, 2 days ago @ thesequence.substack.com
The Sequence Opinion #864: Every AI Agent Needs a Computer

The next phase of AI agents will not be defined only by better models, longer context windows, or more elegant tool-calling APIs, but by something much more primitive: access to a computer.

An agent that can only emit tokens is a brilliant brain in a jar; an agent with a filesystem, terminal, browser, network, package manager, credentials, memory, and guardrails becomes a worker inside a real execution environment.

This is the core thesis: every serious AI agent needs a computer, not metaphorically, but architecturally.

The emerging market for micro-containers, sandboxes, browser runtimes, and agent workspaces is really the market for giving intelligence a body.

The Brain in the Jar Problem

3 weeks, 5 days ago @ thesequence.substack.com
Synced Review
last post: none
📓 Cool Blogs
ODS.ai Habr
last post: 2 months, 1 week ago
Honest Vibe Coding, Chess Edition (Вайбкодинг по Chess’ноку). 1. e4

But this is not vibe coding; it is hard, professional AI development.

Over that time, 112 chats were created in ChatGPT for this project, which is roughly 560 prompts.

During especially intense periods I had to get up at night to make the best use of the limits, which are split into 5-hour and weekly sessions.

But it is not magic, and it is not a "make it good" button.

That is exactly why the future belongs not to vibe coding but to those who learn to manage this speed.

2 months, 1 week ago @ habr.com
Why I Became an IT Volunteer & a News Dataset on the Contradictions of Modern Society

A simple example with fuel prices: gasoline gets more expensive both when the oil price rises and when it falls.

Realizing that your work increases someone's market capitalization but does not solve society's real problems, visible in everyday life and in the news, pushed me to look for some other activity as well.

In addition, thanks to АМБ, a unique dataset of news about the contradictions of modern society has appeared on Kaggle and GitHub; more on it below.

A News Dataset on the Contradictions of Modern Society

АМБ activists and volunteers from friendly groups collected and labeled a dataset of news items that highlight the very systemic contradictions I had been thinking about earlier.

Example B: In 2023, one in every 11 people in the world went hungry, while in …

3 months, 3 weeks ago @ habr.com
[Translation] How Codex Is Built

A detailed look at how the OpenAI Codex team builds its coding agent, how engineers use it, and what this may mean for the future of software development.

To understand how Codex is built, how teams inside OpenAI use it, and how it affects engineering practices at the makers of ChatGPT, I spoke with three OpenAI employees: Thibault Sottiaux, head of Codex.

Both products launched in the spring: Codex CLI was announced in April 2025, and Codex in ChatGPT was introduced in May.

On the Codex team, these files explain to the agent how to navigate the codebase, which commands to run for testing, and how to follow the project's standards.

Using Codex at OpenAI

Besides…

3 months, 3 weeks ago @ habr.com
Natural Language Processing & LLMs Course: New Season

On February 10 we are once again launching a free online course on natural language processing (NLP).

What we will cover: the classic foundations (Zipf's law, TF-IDF, RNN, CNN, Transformer); core NLP tasks (text classification, tagging, and generation); specialized areas (agents and vibe coding); LLMs and their applications.

If you are a student at ITMO, MIPT, or HSE, the course can be counted for academic credit.

I have worked in NLP for more than 12 years, have worked at Yandex and VK (ВКонтакте), and have defended a Candidate of Sciences dissertation.

If you have questions, bring them to ODS Mattermost, where you will find all the answers, seminar times, and links.

4 months, 2 weeks ago @ habr.com
SWE-MERA: A New Dynamic Benchmark for Agentic Code Generation Models

However, all tasks in MERA CODE, as in SWE-bench and other benchmarks of this kind, follow the classic paradigm in which there is a fixed training set and, more importantly, a fixed evaluation set.

But the large language models for coding that we are trying to evaluate with our benchmark also learn from GitHub, and have done so since the very first LLaMa model.

It may seem that 700 tasks is not many, but it is already a very respectable number and, most importantly, these are new tasks.

Current behavior:
from sympy import ask, Q, Symbol
x = Symbol('x')
print(ask(Q.finite(x**-1), Q.real(x)))  # Output: True
Expected behavior: The function should return None to indicate uncertainty, as x**-…

9 months ago @ habr.com
Machine Learning Mastery
last post: 6 days, 7 hours ago
Multimodal Browser AI with Transformers.js for Images and Speech

... </div> <div class="tabs"> <div class="tab active" data-tab="file">Upload File</div> <div class="tab" data-tab="mic">Record Microphone</div> </div> <!

... </div> </div> </div> <!

</div> <div class="tabs"> <div class="tab active" data-tab="image">🖼 Image Analysis</div> <div class="tab" data-tab="speech">🎙 Speech Transcription</div> </div> <!

all([pipeline('image-classification', 'Xenova/vit-base-patch16-224', { dtype: 'q8', progress_callback: p => p.status === 'done' && markReady('badge-cls', 'Classifier') }), pipeline('image-to-text', 'Xenova/vit…

6 days, 7 hours ago @ machinelearningmastery.com
The Practitioner’s Guide to AgentOps

How to instrument a working research agent with full session tracking, cost attribution, and failure detection using the AgentOps platform.

The AgentOps Tooling Ecosystem

When practitioners say “AgentOps” they may mean either the discipline described above, or the specific platform at agentops.ai.

# Simulate search latency -- remove in production
time.sleep(0.3)
# Stub response -- replace with: tavily_client.search(query=topic)
return {
    "topic": topic,
    "depth": depth,
    "results": f"Comprehensive overview of {topic}: This is a rapidly evolving field "
               f"with significant developments in 2025-2026.

#
# Prerequisites:
#   pip install agentops anthropic python-dotenv
#
# Environment variables requir…

1 week, 1 day ago @ machinelearningmastery.com
Building Semantic Search with Transformers.js and Sentence Embeddings

dims}]`); // Convert to array of arrays -- one 384-element array per sentence
const vectors = batchOutput.

log(`Number of vectors: ${vectors.

 *
 * @param {number[]|Float32Array} vecA - First normalized embedding vector
 * @param {number[]|Float32Array} vecB - Second normalized embedding vector
 * @returns {number} Similarity score between -1 and 1 (typically 0 to 1 for sentences)
 */
function cosineSimilarity(vecA, vecB) { if (vecA.

 *
 * Usage:
 * const search = new SemanticSearch(extractor);
 * await search.indexDocuments(myDocs);
 * const results = await search.search('my query', 5);
 */
class SemanticSearch { constructor(extractor) { // The feature-extraction pipeline instan…

1 week, 4 days ago @ machinelearningmastery.com
Using Scikit-LLM with Open-Source LLMs

How to configure the Scikit-LLM library to route requests to a local Ollama endpoint instead of a paid cloud API.

How to build a zero-shot text classifier using a local large language model and scikit-LLM in a familiar scikit-learn-style workflow.

from skllm.config import SKLLMConfig

# Use this to tell Scikit-LLM to route cloud requests towards your default local Ollama port
SKLLMConfig.set_gpt_url("http://localhost:11434/v1")
# Scikit-LLM needs, by default, a key to pass internal validation checks.
SKLLMConfig.set_openai_key("local-ollama-is-free")


1 week, 5 days ago @ machinelearningmastery.com
Scikit-LLM vs. Traditional Text Classifiers: When Should You Use an LLM?

How to apply zero-shot classification using a transformer-based model (BART) and compare it against the classical baseline.

How to use scikit-LLM with a Groq-hosted large language model for production-ready zero-shot classification with minimal code changes.

Scikit-LLM with zero-shot classification: the most modern, prompt-based approach.

Finally, the zero-shot LLM classifier with a scikit-LLM pipeline and a Groq model:
from skllm.config import SKLLMConfig
from skllm.models.gpt.classification.zero_shot import ZeroShotGPTClassifier
import getpass
import time
from sklearn.metrics import classification_report
# 1.

Initializing with the latest active model for zero-shot classification # 'llama-3…

2 weeks ago @ machinelearningmastery.com
The Roadmap for Mastering LLMOps in 2026

generation(name="claude-completion", model=MODEL, input={"system": SYSTEM_PROMPT, "messages": [{"role": "user", "content": user_message}]})
start_time = time.

create(model=MODEL, max_tokens=1024, system=SYSTEM_PROMPT, messages=[{"role": "user", "content": user_message}])
latency_ms = int((time.

end(output=response_text, usage={"input": input_tokens, "output": output_tokens, "total": total_tokens, "unit": "TOKENS"}, metadata={"latency_ms": latency_ms, "cost_usd": round(cost_usd, 6), "model": MODEL})
# Update the trace with the final output
trace.

""
model = route_query(user_message)
response = completion(mo…

2 weeks, 1 day ago @ machinelearningmastery.com
Serving Multiple Users at Once: How Continuous Batching Keeps LLM Inference Efficient

for slot, (req_id, (prompt, cap)) in enumerate(wave):
    print(f" ++ slot {slot} <- req {req_id} ({cap} tok cap): {prompt!r}", flush=True)
prompts = [p for _, (p, _) in wave]
inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True).

Continuous Batching: Dynamic Scheduling and Ragged Batching

Continuous batching addresses the problems above and improves efficiency.

Notice the difference in run time: continuous batching makes LLM inference much more efficient.
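
The scheduling difference can be illustrated with a toy step counter. This is a sketch under simplifying assumptions: every decode step produces one token for each active request, prefill cost is ignored, and all steps cost the same. The function names are hypothetical and no real inference engine is involved.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Static batching: each wave runs until its longest request finishes."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        total += max(lengths[start:start + batch_size])
    return total


def continuous_batching_steps(lengths, batch_size):
    """Continuous batching: a finished request frees its slot immediately."""
    queue = deque(lengths)
    slots = []  # tokens still to generate for each active request
    steps = 0
    while queue or slots:
        # Refill free slots from the waiting queue before the next step.
        while queue and len(slots) < batch_size:
            slots.append(queue.popleft())
        steps += 1
        # Requests that just produced their last token leave the batch.
        slots = [remaining - 1 for remaining in slots if remaining > 1]
    return steps


# Tokens to generate per request (all positive), in arrival order.
request_lengths = [8, 2, 2, 8]
static_steps = static_batching_steps(request_lengths, batch_size=2)
continuous_steps = continuous_batching_steps(request_lengths, batch_size=2)
print(static_steps, continuous_steps)
```

With these four requests and two slots, the static schedule needs 16 decode steps and the continuous one needs 12, because short requests no longer hold a slot idle while a long neighbor finishes.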


2 weeks, 3 days ago @ machinelearningmastery.com
Building a Context Pruning Pipeline for Long-Running Agents

In this article, you will learn how to implement a context pruning pipeline for long-running AI agents, enabling them to manage conversational memory efficiently through semantic similarity.

This article outlines the basic principles for implementing a context pruning pipeline for long-running agents.

# Simulated Agent History (usually fetched from a database)
chat_history = [{"role": "user", "content": "My name is Alice and I work in logistics."

# Assemble the final pruned context
pruned_context = top_semantic_turns + recent_turn…

2 weeks, 5 days ago @ machinelearningmastery.com
The Statistics of Token Selection: Logits, Temperature, and Top-P Walkthrough

In this article, you will learn how logits, temperature, and top-p sampling work together to control next-token prediction in large language models.

How temperature and top-p (nucleus sampling) shape the probability distribution used for token selection.

In particular, we will explore how raw model scores, known as logits, interact with two sampling settings, temperature and top-p; together, these three elements control the token selection process.

Temperature then enters the picture by scaling these raw logits — note that this happens before the softmax function converts them into probabilities.

Depending on the temperature value, the resulting distribution…
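
The pipeline described in this entry (scale logits by temperature, apply softmax, then keep the top-p nucleus) can be sketched as below. This is a minimal illustration with hypothetical function names and toy logits, assuming a strictly positive temperature; it is not taken from the article's own code.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then convert them to probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest high-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass  # renormalize inside the nucleus
    return filtered


def sample_token(logits, temperature, top_p, rng):
    """Sample one token index from the temperature-scaled, top-p-filtered distribution."""
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # toy scores for a 3-token vocabulary
cold = softmax_with_temperature(logits, 0.5)  # sharper distribution
base = softmax_with_temperature(logits, 1.0)
hot = softmax_with_temperature(logits, 2.0)   # flatter distribution
print(cold, base, hot)
print(sample_token(logits, temperature=1.0, top_p=0.9, rng=random.Random(0)))
```

Lower temperatures concentrate probability on the highest-scoring token, higher temperatures spread it out, and top-p then discards the low-probability tail before sampling.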

2 weeks, 6 days ago @ machinelearningmastery.com
Building a Multi-Tool Gemma 4 Agent with Error Recovery

key = city.lower().strip()
if key not in WEATHER_DATA:
    raise ValueError(
        f"Unknown city: '{city}'.

)
if key not in CITY_DATA:
    raise ValueError(f"Unknown city: '{city}'.

)
func = TOOL_FUNCTIONS[function_name]
# Defense 2: catch argument errors (wrong types, missing or extra args)
try:
    result = func(**arguments)
    return "ok", str(result)
except TypeError as e:
    return "error", f"Bad arguments for {function_name}: {e}"
except ValueError as e:
    return "error", str(e)
except ToolUnavailableError as e:
    return "error", f"Tool temporarily unavailable: {e}"
except Exception as e:
    return "error", f…

3 weeks ago @ machinelearningmastery.com
Implementing Hybrid Semantic-Lexical Search in RAG

In this article, you will learn how to implement a hybrid search strategy for RAG systems by combining BM25 lexical search with semantic search, fused together using Reciprocal Rank Fusion.

Topics we will cover include: Why hybrid search outperforms either lexical or semantic search alone in retrieval-augmented generation systems.

Introduction

Implementing hybrid search strategies is a critical step in building modern RAG (Retrieval-Augmented Generation) systems, especially when shifting from prototype to production-ready solutions.

Wrapping Up

This article guided you through implementing a hybrid search mechanism for the retrieval stage of RAG systems.

Choosing not to rely sol…
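
The fusion step named in this entry, Reciprocal Rank Fusion, can be sketched in a few lines. The two input rankings are hypothetical stand-ins for a BM25 result list and a semantic result list, and `k=60` is the constant commonly used with RRF, assumed here rather than taken from the article.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists, best match first.
bm25_ranking = ["d3", "d1", "d2"]
semantic_ranking = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
print(fused)
```

Because RRF uses only rank positions, it needs no calibration between BM25 scores and embedding similarities, which live on different scales.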

3 weeks, 1 day ago @ machinelearningmastery.com
Building Context-Aware Search in Python with LLM Embeddings + Metadata

How to build a metadata-aware search index that filters by team, status, priority, and date before scoring candidates.

"}, {"id": "T-102", "team": "infrastructure", "status": "open", "priority": "high", "created": date(2025, 11, 8), "text": "Nginx ingress returning 502 after rotating TLS certificate.

... [0.1714] T-206 backend open high 2025-11-13 Rate limiting not scoping per user — middleware uses a shared Redis key derived from .

... [0.1419] T-202 backend open high 2025-11-0…
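
The filter-then-score idea from this entry can be sketched as follows. The two-dimensional vectors are hand-made placeholders for real embeddings, ticket `X-1` is hypothetical, and the `search` function with its parameters is an illustrative assumption rather than the article's implementation.

```python
import math
from datetime import date


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(tickets, query_vec, top_k=3, team=None, status=None, created_after=None):
    """Apply metadata filters first, then rank the survivors by similarity."""
    candidates = [
        t for t in tickets
        if (team is None or t["team"] == team)
        and (status is None or t["status"] == status)
        and (created_after is None or t["created"] > created_after)
    ]
    scored = [(cosine(query_vec, t["vec"]), t["id"]) for t in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy index: "vec" values are placeholders, not real embeddings.
tickets = [
    {"id": "T-102", "team": "infrastructure", "status": "open",
     "created": date(2025, 11, 8), "vec": [1.0, 0.0]},
    {"id": "T-206", "team": "backend", "status": "open",
     "created": date(2025, 11, 13), "vec": [0.0, 1.0]},
    {"id": "X-1", "team": "backend", "status": "closed",
     "created": date(2025, 11, 1), "vec": [0.1, 0.9]},
]

query = [0.0, 1.0]
filtered_hits = search(tickets, query, team="backend", status="open")
all_hits = search(tickets, query)
print(filtered_hits)
print(all_hits)
```

Filtering before scoring keeps similarity computation limited to tickets that could be valid answers, so a near-duplicate closed ticket cannot outrank an open one.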

3 weeks, 4 days ago @ machinelearningmastery.com
How to Build a Multi-Agent Research Assistant in Python

create(query=query, limit=limit, scrape_options=scrape_options,)
data = sdk_result_to_dict(search)
return compact_json({"query": query, "results": normalize_search_links(data.

manager_agent = Agent(
    name="Manager research agent",
    model=MODEL,
    instructions=(
        f"Current date: {current_date_context()}"
        f"Current year: {current_year_context()}"
        "You are the orchestrator for a multi-agent research assistant.

def openai_trace_url(trace_id: str) -> str:
    return f"https://platform.openai.com/logs/trace?trace_id={trace_id}"
async def run_research_assistant(query: str) -> MarkdownResearchReport:
    if not OPENAI_API_KEY:
        raise RuntimeError(
            "OPENAI_API_KEY is not set.

with trace( w…

3 weeks, 5 days ago @ machinelearningmastery.com
Agentic Programming: A Roadmap

A concrete month-by-month learning roadmap that ends with a working production agent you have built and shipped yourself.

if __name__ == "__main__":
    goal = "Research the top 3 vector databases for AI in 2026 and write a comparison report."

create(
    model="claude-sonnet-4-20250514",  # Sonnet is fast and cost-effective for loops
    max_tokens=4096,
    system=system,
    tools=tools,
    messages=messages
)
print(f"Stop reason: {response.stop_reason}")
# Model is done -- return the final message
if response.

The best production agent systems treat human oversight as a des…

3 weeks, 6 days ago @ machinelearningmastery.com
Prompt Engineering for Agentic AI

In this article, you will learn how prompt engineering changes fundamentally when applied to agentic AI systems, and what principles and patterns enable reliable agent behavior at scale.

The four components every agent prompt needs, including system prompts, tools, examples, and context state management.

This is exactly why Anthropic’s engineering team introduced the concept of context engineering as the natural evolution of prompt engineering.

The Four Components Every Agent Prompt Needs

Based on Lilian Weng’s foundational framework for LLM-powered agents and Anthropic’s engineering guidance, a well-designed agent operates on four categories of context.

Conclusion

Prompt engi…

4 weeks ago @ machinelearningmastery.com
ML in Production
last post: none
Sorta Insightful
last post: 4 weeks, 1 day ago
AI Will Not Make Your Job Chill

People keep talking about how AI will make their job easy, and I don’t really understand why.

I assume the factory job producing this was still hard work.

I don’t think AI has made my job chill, and I feel like I am front-line compared to much of the economy.

It’s not widely known, but transportation and warehousing has the highest rate of nonfatal work injuries in the US.

For a while, this will not lead to any job loss, because increasing abundance will lead to higher demand.

4 weeks, 1 day ago @ alexirpan.com
Why I Signed The Amicus Brief for Anthropic v Department of War

On Monday, Anthropic filed a lawsuit against the Department of War, and an amicus brief in support of Anthropic was filed on behalf of a number of OpenAI and Google employees.

There’s also an amicus brief filed on behalf of Microsoft.

There’s conflicting reporting, but very broadly, Anthropic signed an agreement with the government to deploy Claude in classified, military contexts.

Anthropic said no, Pete Hegseth declared them a supply chain risk, and Anthropic filed a lawsuit against this.

The amicus brief was broadly aligned with my thoughts on the matter, so I signed.

3 months, 1 week ago @ alexirpan.com
MIT Mystery Hunt 2026

This has spoilers for MIT Mystery Hunt 2026.

Pre-Hunt

The time running up to Hunt was more stressful than usual… very briefly, I typically hunt with teammate.

Just last year, I did GPH 2025, LN Hunt, Teammate Hunt 2025, Microsoft Hunt 2025, and Silph Puzzle Hunt 2025, all of which had significant 3+ hour solve puzzles that would not be out of place in Mystery Hunt.

Not to mention smaller hunts like Advent Hunt, and then I didn’t even do Brown Puzzlehunt or Vertex Hunt or the fall CMU Hunt.

To me, the crux is whether Mystery Hunt is broken, or Mystery Hunt is fine.

4 months, 2 weeks ago @ alexirpan.com
Authentic Imperfection

I’ve been thinking about the anger surrounding generative AI.

To keep things fair, he took the best human images and best AI images, meaning human art from famous artists, and AI art from prompters skilled at removing obvious tells of image generation.

When people complain about AI slop, I see it as a complaint against the deluge of default style AI images.

We’ve seen this happen in all forms: AI text, AI music, older forms of computer generated content like CGI.

As much as we celebrate imperfection, digital imperfection is a step too far.

7 months ago @ alexirpan.com
Lil'Log
last post: none
inFERENCe
last post: 3 months, 3 weeks ago
The Future of Software

February 25, 2026

The world of software is undergoing a shift not seen since the advent of compilers in the 1970s.

How will humans tell AI agents what software artefacts we would like to create?

This future of software creation, in which our programming languages are abstracted away, raises two very important questions: What will the instruction/specification language look like?

This should be a clear layer of separation between the developer and the pool of AI agents working to maintain software.

3 months, 3 weeks ago @ inference.vc
Deep Learning is Powerful Because It Makes Hard Things Easy - Reflections 10 Years On

Ten years ago this week, I wrote a provocative and bold post that blew up and made it to the top spot on HackerNews.

In hindsight: There is a lot of stuff in deep learning that we don't understand nearly enough.

Sometimes things work for reasons completely unrelated to why we thought they would work.

(Pop some 🍿 in the microwave and read till the end for more)

🎯 "Deep learning is powerful exactly because it makes hard things easy"

Okay, this was a great insight.

🎯 Generative Modeling

In the post I suggested people learn "something harder" instead of - or in addition to - deep learning.

4 months, 2 weeks ago @ inference.vc
The Spectator
last post: none
The Unofficial Google Data Science Blog
last post: none
Off the Convex Path
last post: none
Jay Alammar
last post: none
Piekniewski's blog
last post: none
fast.ai NLP
last post: none
Sebastian Ruder
last post: none
大トロ
last post: none
🔬 Science
Papers With Code
last post: none
💼 University and corporation labs
DeepMind
last post: 6 days, 2 hours ago
DiffusionGemma: 4x faster text generation

While the AI research community has explored diffusion-based text generation for years, applying it to large models has remained a challenge.

DiffusionGemma changes this by shifting how models use hardware.

But when run locally for a single user, this word-by-word process leaves your dedicated GPU or TPU underutilized — it spends most of its time simply waiting for the next "keystroke."

By giving the computer's processor a larger chunk of work at once, DiffusionGemma utilizes your hardware to its full potential.

It upgrades your model inference from a single, sequential typewriter to a massive printing press that stamps the entire block of text simultaneously.

6 days, 2 hours ago @ blog.google
Investing in multi-agent AI safety research

Scaling AI Safety Research for a Multi-Agent World

For the past decade, we’ve focused on making individual AI models more capable, helpful and safe.

The funding call focuses on the study of how large-scale multi-agent AI systems behave as a group, and how we can provide frameworks to understand and mitigate potential risks.

Scaling the frontier of multi-agent safety research

Although foundational frameworks for multi-agent safety exist, the rapid evolution of these systems requires an immediate, large-scale expansion of research.

A collaborative call to action

No single lab can solve multi-agent safety alone.

Building realistic, reproducible environments to evaluate, compare and accele…

6 days, 8 hours ago @ deepmind.google
Fluid, natural voice translation with Gemini 3.5 Live Translate

Today, we’re taking our next step with the release of Gemini 3.5 Live Translate, our latest audio model for live speech-to-speech translation.

The model automatically detects 70+ languages and generates smooth, natural-sounding translated speech that preserves the speakers' intonation, pacing and pitch.

Unlike turn-by-turn systems that wait for the speaker to finish before responding, 3.5 Live Translate generates speech continuously, balancing the trade-off between waiting for context to improve quality and translating immediately to stay in sync with the speaker.

Gemini 3.5 Live Translate is rolling out starting today across Google products: For developers in public preview via the…

1 week ago @ blog.google
Introducing Gemma 4 12B: a unified, encoder-free multimodal model

Today, we are introducing Gemma 4 12B, our latest model designed to bring agentic multimodal intelligence directly to laptops.

Here’s an overview of what makes Gemma 4 12B unique:

Novel unified architecture: No multimodal encoders.

Advanced reasoning: Benchmark performance nearing our 26B model, unlocking powerful multi-step reasoning and agentic workflows.

Small enough to run locally on consumer laptops with 16GB of RAM, it unlocks powerful multimodal and agentic experiences right on your machine.

1 week ago @ blog.google
Powering the future of robotics in Europe

That’s why we’re launching the Google DeepMind Accelerator: Robotics, a three-month program for early-stage robotics startups across Europe.

They’ll have access to our AI stack, technical expertise and Gemini robotics models.

Extend Robotics (United Kingdom): Provides teleoperation software and data pipelines that help train and fine-tune foundation models for real-world robotics applications.

Generative Bionics (Italy): Amplifies human potential by developing humanoid robots based on physical AI, developed in Europe but built to scale globally.

1 week ago @ blog.google
Measuring the impact of learning with AI in Sierra Leone and beyond

The results from this pre-registered trial suggest that AI can be a powerful pedagogical partner — not by replacing teachers, but by augmenting their reach.

Students using Guided Learning saw a gain of +0.258 standard deviations in their math scores compared to the control group.

In practical terms, this represents roughly 1.2 to 1.7 years of typical learning progress achieved within the eight-week trial.

To further understand the impact of Guided Learning on student learning, we are conducting a series of additional pre-registered RCTs globally.

Additionally, our support of the Global AI for Learning Alliance (GAILA) will accelerate these commitments and others through collective action.

1 week, 1 day ago @ deepmind.google
We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks

The Asia-Pacific region is a global engine for economic growth, but it's also highly vulnerable to climate change.

While green technologies are gaining momentum, a recent report shows they aren’t scaling fast enough to keep up with the region’s rising environmental risks.

Selected organizations will receive expert mentorship, tailored support and help integrating frontier AI and science AI models from Google AI experts into their projects or products.

If you're working on climate solutions, we want to help you scale your work.

The program kicks off with an in-person bootcamp in Singapore, and you can learn more and register your interest today.

3 weeks, 4 days ago @ blog.google
Fast-tracking genetic leads to reverse cellular aging

Biologists Omar Abudayyeh and Jonathan Gootenberg are using Co-Scientist to help them blast through both.

Their lab runs huge genetic screens that flip thousands of genes on or off, then read how cells respond to these changes.

Co-Scientist is helping on two fronts.

Second, Co-Scientist speeds up the follow-through.

With Co-Scientist analysing their screening data alongside the literature, that work is slashed to just a few days.

4 weeks, 1 day ago @ deepmind.google
Simulate real-world places with Project Genie and Street View

Street View: ground your worlds in real places

When creating imaginative worlds in Project Genie, you can now also base them on real places.

This capability is powered by Maps Imagery Grounding, the same technology developers use to create stunning AI visuals with Street View.

Street View imagery in Project Genie is available now for places in the U.S. with plans to expand to more places over time.

Project Genie: now available with Google AI Ultra

Starting today, Project Genie, including the new Street View capability, is gradually rolling out to all eligible Google AI Ultra $200 subscribers globally (18+).

Try creating today with Project Genie.

4 weeks, 1 day ago @ blog.google
Introducing Gemini Omni

We’re introducing Gemini Omni, where Gemini’s ability to reason meets the ability to create.

Omni is our new model that can create anything from any input — starting with video.

With Omni, you can combine images, audio, video and text as input and generate high-quality videos grounded in Gemini's real-world knowledge.

Today, we’re rolling out the first model in the Omni family: Gemini Omni Flash, to the Gemini app, Google Flow and YouTube Shorts.

Here’s some of what makes Omni special:

Edit your videos through conversation

Gemini Omni gives you an easier way to edit video with natural language.

4 weeks, 1 day ago @ blog.google
Introducing Google Antigravity 2.0
4 weeks, 1 day ago @ antigravity.google
Gemini for Science: AI experiments and tools for a new era of discovery

That’s why we are introducing Gemini for Science, a collection of science tools and experiments designed to expand the scale and precision of scientific exploration.

Gemini for Science experimental tools on Google Labs include three primary prototypes designed to handle such tasks.

Literature Insights searches scientific literature and structures results into tables with custom, searchable attributes for side-by-side analysis.

Our research teams using Science Skills have already seen this speedup in practice.

In early testing, our team used Science Skills to complete in minutes a complex analysis that normally takes hours.

1 month ago @ blog.google
Making it easier to understand how content was created and edited

As generative media becomes more advanced and accessible, it’s helpful to know where content comes from, and whether it’s been altered.

Today, we’re expanding our content transparency and verification tools in Search, Gemini, Chrome, Pixel and Cloud, and deepening our partnership with the broader industry.

Since then, we've integrated SynthID into our generative media models and products, watermarking over 100 billion images and videos and 60,000 years of audio.

Across a growing number of our generative media tools, we use C2PA Content Credentials, the industry standard that shows how media was created and modified, with or without AI.

In an era of generative media, we believe that identify…

1 month ago @ blog.google
Strengthening Singapore’s AI Future: A New National Partnership

At Google DeepMind, we believe frontier AI can be a powerful force for good.

As part of our National Partnerships for AI initiative, we are launching new programmes in the country, while driving a key pillar of Google’s new National AI partnership with the Singapore Government.

Together, we are strengthening Singapore’s National AI Strategy to responsibly deploy AI at scale for economic growth and public benefit.

Using frontier AI like AlphaFold and Google Earth, along with other recent AI for Science tools, we are working to accelerate our understanding of outbreaks across Southeast Asia.

1 month ago @ deepmind.google
Finding the molecular switches behind new infectious diseases

Professor Clare Bryant at the University of Cambridge is using Co-Scientist to hunt for the molecular switches that cause severe diseases, like sepsis, in humans when pathogens leap between species, and find new approaches to prevent this happening.

Testing Co-Scientist, Bryant fed it a summary of one of her grant proposals, which studies flu in birds and humans and outlines her lab’s research questions.

moment: Co-Scientist had prioritised a protein that hadn’t been on her radar, connected to several signalling pathways she was already interested in.

Back at her lab, she added unpublished material, kept confidential within Co-Scientist.

But her lab is now on track to complete it in six months, she …

1 month ago @ deepmind.google
Google
last post 4 hours ago
How Siemens "slices the elephant," advancing agentic workflows for industrial software development

For technology companies like Siemens, software is the nervous system of factories, energy grids, and transportation networks worldwide.

As a global leader in industrial AI, industrial software, and industrial automation, Siemens brings decades of domain expertise across factory and process automation, energy infrastructure, and intelligent transportation — expertise that no off-the-shelf AI solution can replicate.

Existing coding assistants lacked the contextual depth required to navigate complex, multi-layered industrial codebases — a gap Siemens set out to close.

To solve this, Siemens and Google Cloud created Knowledge Fabric, an AI system for automating the software development lifecyc…

4 hours ago @ cloud.google.com
Cloud CISO Perspectives: The 4 lessons that guided AI Threat Defense

Based on this approach, we recently introduced AI Threat Defense as a pathway to achieve the threat-readiness transformation that you need to defend against AI threats with AI.

Second, we invested in the operational framework supporting the vulnerability work.

Third, we planned engineering work alongside security work: Your engineering partners are critical, especially for aligning with your resiliency and deployment processes.

Key lessons include: Tagging components with the model, harness, and issues found when scanning.

A less capable model with a good harness and good experts is more powerful than the best model without a good harness or good experts.

1 day, 2 hours ago @ cloud.google.com
Introducing the Open Knowledge Format

As foundation models continue to improve, the lack of relevant context often limits what they can do, especially as they are used to build agentic systems.

That’s why today, we’re introducing the Open Knowledge Format (OKF), an open specification that formalizes the LLM-wiki pattern into a portable, interoperable format.

This is a vendor-neutral, agent- and human-friendly standard for representing the metadata, context, and curated knowledge that modern AI systems need.

What's missing is a format, not another service

The answer to this problem isn’t another knowledge service.

How OKF works: The design in one screen

An OKF bundle is a directory of markdown files representing concepts: anything…

4 days, 5 hours ago @ cloud.google.com
Powering the next era of Confidential AI

By protecting data in use, Confidential Computing becomes a foundational element for building trust in AI systems, providing verifiable integrity and isolation for sensitive workloads.

Confidential Computing helps prevent unauthorized access because data remains encrypted and isolated.

Our platform supports Apple’s PCC privacy commitments with a layered security approach built upon Google Cloud’s infrastructure, including:

Google Cloud Confidential Computing: Our core Confidential Computing platform provides the hardware-based TEEs necessary for PCC.

Google Titan chips are a key component in powering security and transparency posture for PCC infrastructure on Google Cloud.…

4 days, 23 hours ago @ cloud.google.com
Claude Fable 5: Available on Google Cloud

Claude Fable 5, Anthropic’s latest frontier model, is now generally available on Google Cloud.

This launch is the latest proof point of our ongoing commitment to bring the industry's latest models straight to our Agent Platform.

Claude Fable 5 brings the best of Anthropic model capabilities to all customers, with strong safeguards designed to make it safe for general use.

Designed for complex, multi-step reasoning, Claude Fable 5 is good for demanding tasks like advanced software development, long-horizon agents, and deep multimodal document analysis.

Build with Claude Fable 5 and other models from Anthropic — including Claude Opus 4.8 and Claude Sonnet 4.6 — today on Agent Platform.

1 week ago @ cloud.google.com
Report: GKE Inference Gateway delivers up to 92% faster AI responses

In fact, according to an independent benchmark report, GKE Inference Gateway outperforms the next leading managed Kubernetes service with 15.7% higher throughput, 92.8% shorter wait times, and 62.6% lower inter-token latency.

That performance tracks with Snap’s experience using GKE Inference Gateway.

In this blog, we take a closer look at GKE Inference Gateway’s prefix caching, complete with examples.

The secret to low-latency AI: Prefix caching

Prefix caching optimizes LLM performance by storing the KV cache (activation states) of long, repetitive prompt prefixes.

GKE Inference Gateway reads incoming request prefixes and matches them to the specific pods that already hold that data in memor…
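
The routing idea described above can be sketched in a few lines. This is a minimal illustration, not GKE Inference Gateway's implementation: the block size, the `PrefixRouter` class, the pod names, and the load-based tie-break are all assumptions made for the example. It hashes block-aligned prompt prefixes and sends each request to the pod that already holds the longest matching prefix.

```python
import hashlib

BLOCK = 16  # hypothetical: number of tokens per cached prefix block


def prefix_hashes(tokens, block=BLOCK):
    """Return one chained hash per full block-aligned prefix of `tokens`."""
    hashes = []
    h = hashlib.sha256()
    for end in range(block, len(tokens) + 1, block):
        chunk = " ".join(map(str, tokens[end - block:end]))
        h.update(chunk.encode() + b"|")
        hashes.append(h.copy().hexdigest())
    return hashes


class PrefixRouter:
    """Toy router: prefer the pod with the longest cached prefix (assumed policy)."""

    def __init__(self, pods):
        self.pods = list(pods)
        self.cached = {pod: set() for pod in self.pods}  # pod -> prefix hashes held
        self.load = {pod: 0 for pod in self.pods}        # pod -> requests routed

    def _matched_blocks(self, pod, hashes):
        n = 0
        for h in hashes:
            if h not in self.cached[pod]:
                break
            n += 1
        return n

    def route(self, tokens):
        hashes = prefix_hashes(tokens)
        # Longest cached prefix wins; ties go to the least-loaded pod.
        best = max(
            self.pods,
            key=lambda p: (self._matched_blocks(p, hashes), -self.load[p]),
        )
        self.cached[best].update(hashes)  # the chosen pod now holds this prefix
        self.load[best] += 1
        return best


router = PrefixRouter(["pod-a", "pod-b"])
system_prompt = list(range(64))              # stand-in for a shared prompt prefix
first = router.route(system_prompt + [100, 101])
second = router.route(system_prompt + [200, 201, 202])
other = router.route(list(range(1000, 1064)))
```

Requests that share the 64-token prefix land on the same pod, while an unrelated prompt falls back to the less-loaded pod. A production router would also need cache eviction and real load signals, which this sketch omits.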

1 week ago @ cloud.google.com
How to unlock true ROI in software development – a deep dive into the latest DORA research

How do you prove the business value of generative AI to your teams?

Technology and finance leaders need to show the clear business value of AI projects to secure ongoing funding.

To help you evaluate the costs and business benefits of AI, we recently shared the DORA: ROI of AI-assisted software development report.

Insight #1: Navigating the J-curve of AI value realization

It is important to be realistic about how quickly you will see a return on your AI investments.

While AI can act as a powerful amplifier for software engineering, the path to financial value is rarely a straight line.

1 week ago @ cloud.google.com
Detecting and containing AI-powered threats with Google Security Operations agents

Autonomous investigation, containment, and response

If a threat is detected, you need to immediately and autonomously assess and respond to protect your environment.

The Triage and Investigation agent in Google Security Operations, generally available, helps analysts drastically reduce time to respond by autonomously investigating alerts, gathering evidence for analysis, and providing verdicts with comprehensive explanations.

It can help security analysts automate decision-making, alert closure, and remediation flows, allowing them to spend more time on high-priority threats instead of false positives.

The agent has already investigated over 5 million alerts, reducing a typical 30-…

1 week ago @ cloud.google.com
Modernizing Healthcare: How Alcidion achieved greater stability and performance with AlloyDB

The team had to manually balance database loads between elastic pools to maintain performance while trying to optimize costs.

Performance latency: Complex JSON data processing, critical for modern health informatics, was taking up to 30 minutes for certain jobs.

Stability concerns: The team sought a more stable Kubernetes environment and a persistent backend that could scale without constant administrative intervention.

Alcidion achieved this by spinning up a new Google Cloud instance synchronized to the active one, with both accessible via unique fully qualified domain names.

By moving to AlloyDB, Alcidion has improved its stability and performance and built a strong foundation to keep del…

1 week, 1 day ago @ cloud.google.com
What's new for Managed Service for Apache Spark clusters

We recently announced that the Dataproc service is now Managed Service for Apache Spark, reflecting our deep integration with the Agentic Data Cloud.

To support the diverse architectural needs of today’s modern data teams, we offer the service in two distinct deployment modes: serverless and managed clusters.

This blog post focuses specifically on what we announced at Google Cloud Next ’26 for the Managed Spark clusters deployment mode: providing enhanced flexibility to fine-tune performance and cost through a native execution engine, smarter scaling policies, and Gemini-powered extensions.

Faster, with the Lightning Engine native execution engine

Arguably the biggest update for Managed Spark …

1 week, 5 days ago @ cloud.google.com
How Trustpilot built a real-time architecture for data enrichment using Gemma

Trustpilot has been doing exactly that with custom machine learning since long before large language models (LLMs) were cool.

Instead, by fine-tuning open-weight models like Gemma, Trustpilot takes full ownership of their AI strategy.

Here’s how:

Total model independence: By owning its models, Trustpilot ensures it controls the retraining lifecycle, completely freeing it from a third-party vendor's update schedule or sudden API changes.

Expanding MLOps capabilities: Building these models in-house enables Trustpilot to bake in the "secret sauce" of its review intelligence while building competencies on open-weight models.

Rather than deploying one massive model, Trustpilot built a suite of hi…

2 weeks, 1 day ago @ cloud.google.com
The fully-managed Remote MCP Server for AlloyDB is now Generally Available

To bridge this gap, we are excited to announce that the Remote Model Context Protocol (MCP) Server for AlloyDB is now generally available.

The Remote MCP Server for AlloyDB runs on fully-managed Google Cloud infrastructure and exposes an HTTP endpoint that connects your AI applications to your data.

This solves key challenges for teams building agents on PostgreSQL:

Centralized discovery: Find, secure, and manage your database's MCP server using Agent Registry.

Let's see it in action: A quick demo

Getting started with the AlloyDB Remote MCP server is a straightforward process.

Enable data access API: Permit the Data Access API on your AlloyDB instance.
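
For readers unfamiliar with what travels over an MCP HTTP endpoint, here is a minimal sketch of the message shape. MCP is built on JSON-RPC 2.0, and `tools/list` and `tools/call` are methods defined by the protocol; everything else is an assumption for illustration. In particular, the tool name `execute_sql` and its `sql` argument are hypothetical and not taken from the AlloyDB server, and a real client would first perform the `initialize` handshake and send these bodies to the endpoint URL with proper authentication.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per session


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object as used by MCP."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool. The tool name and argument below are hypothetical.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "execute_sql", "arguments": {"sql": "SELECT 1"}},
)

# These strings are what a client would POST as the HTTP request body.
list_body = json.dumps(list_tools)
call_body = json.dumps(call_tool)
```

Nothing is sent over the network here; the sketch only shows the payloads an agent framework would generate on your behalf.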

2 weeks, 1 day ago @ cloud.google.com
Cool stuff Google Cloud customers built, May edition: Agentic algorithms for supply chains; virtual try-on APIs; robotic camera operators & more

Be sure to check back next month to see how more industry leaders and exciting startups are putting Google Cloud technologies to use.

Google Cloud and IBM teams also assisted URBN in a rigorous, iterative switchover testing strategy.

What they did: To understand how local decisions ripple across their entire global network, BASF turned to AlphaEvolve on Google Cloud to build a digital twin of their supply chain.

What they did: Working with Google Cloud, they built a virtual try-on experience that lets shoppers see high-end fashion on their own bodies using a simple selfie.

What they did: Movix developed custom models for deep learning, computer vision, and 3D mesh analysis over a five-month…

2 weeks, 4 days ago @ cloud.google.com
Cloud CISO Perspectives: How to build an AI-ready security program for the public sector

Your tactical execution plan: Months zero to six

Building an AI-ready security program is a journey.

Vendor and spend optimization (Immediate): Upload vendor capability matrices and contracts to an isolated AI agent (like NotebookLM).

AI-driven security training (within six months): As manual processes are increasingly automated, use that reclaimed time to run capture the flag (CTF) exercises and community contests for your security team.

Proactive threat hunting: Use AI as a hunting advisor.

Train an internal agent on your organization’s historical IR tickets and SOPs.

2 weeks, 4 days ago @ cloud.google.com
Developer's guide to Gemini Enterprise and A2UI integration

What A2UI is

A2UI is an open protocol, introduced by Google and co-developed with the Flutter team and product teams behind Gemini Enterprise.

Gemini Enterprise — GE is the shell, the renderer, and the transport client.

Two patterns are common today:

Inline pattern: the agent sends a component tree with the data baked into each component (the pattern Gemini Enterprise renders today).

How A2UI works inside Gemini Enterprise

Gemini Enterprise ships with a built-in A2UI renderer.

High-level architecture example

The reference implementation is an ADK backend on Cloud Run designed to plug seamlessly into Gemini Enterprise.

2 weeks, 4 days ago @ cloud.google.com
OpenAI
last post None
Microsoft
last post 3 days, 22 hours ago
Ire identifies another LOTUSLITE specimen

At a glance: Project Ire identifies a LOTUSLITE variant that shares TTPs (tactics, techniques, and procedures) with the public family but none of its indicators of compromise (IOCs).

On Ire’s calibration

One observation in Ire’s report is worth highlighting first.

The Ire report does not surface a matching entry-point name, but it identifies that the behavioral shape is the same.

Ire never named LOTUSLITE in its report or chain of evidence.

Ire described the behavior precisely enough to make mapping this sample to LOTUSLITE straightforward.

3 days, 22 hours ago @ microsoft.com
Data Formulator 0.7: AI-powered data analytics for enterprise data

At a glance: Data Formulator 0.7 is an open-source AI-powered system for enterprise data analytics that combines data connectivity, agent-guided exploration, and visualization refinement in a shared workspace.

Enterprise teams increasingly rely on AI systems for analytics, but enterprise data workflows are often fragmented across storage systems and tools.

Connecting enterprise data with Data Connectors

Data Formulator helps teams bring enterprise data into an AI-ready workspace without needing to rebuild the same connections for every source of data.

Data Connectors provide persistent connections between enterprise data sources and Data Formulator, allowing analy…

2 weeks, 5 days ago @ microsoft.com
Extending Human Intelligence Through AI

At a glance: Modern AI systems are powerful not because they replicate human intelligence, but because they presuppose it, by extending structures already present in human cognition and language.

Understanding AI as an extension of human intelligence—not a replacement for it—offers a more grounded path for building trustworthy AI systems.

Rather than asking whether AI systems are becoming intelligent in the human sense, these approaches ask a more basic question: What if AI systems work because they rely on structures that are rooted in human cognition?

In our recent paper, The Origins of Artificial Intelligence in Natural Intelligence, we argue that modern AI systems are best understood nei…

2 weeks, 6 days ago @ microsoft.com
MagenticLite, MagenticBrain, Fara1.5: An agentic experience optimized for small models

Built as the next generation of Magentic-UI, MagenticLite combines a redesigned app with a harness optimized for small models.

MagenticBrain and Fara1.5 are small models designed for orchestration and computer-use tasks, respectively.

Together, these releases explore how far agentic performance can be pushed with smaller models, codesigned tools, and an optimized execution harness.

Today, Microsoft Research AI Frontiers releases MagenticLite, an experimental agentic application designed for small models.

The result is an agent that runs efficiently, keeps data on the user’s machine, and supports a broad range of agentic tasks.

3 weeks, 5 days ago @ microsoft.com
Vega: Zero-knowledge proofs for digital identity in the age of AI

Vega puts these building blocks together into a single proof system.

The hashing problem, and how folding solves it

A credential proof must do two expensive things: hash the credential bytes with SHA-256 and verify the issuer’s digital signature.

Making it zero-knowledge, cheaply

A proof system needs to be zero-knowledge: the verifier should learn nothing beyond the claim being proved.

Device binding

A zero-knowledge credential proof is only useful if it is tied to the person holding the credential.

The proof system powering Vega is already available as the open-source spartan2 project on GitHub.

3 weeks, 5 days ago @ microsoft.com
Further Notes on Our Recent Research on AI Delegation and Long-Horizon Reliability

Our recent paper, “LLMs Corrupt Your Documents When You Delegate”, has generated discussion about the reliability of AI systems in delegated workflows.

Using a controlled evaluation methodology, we examine how well information is preserved across these extended workflows.

We use chained transformation-and-inversion tasks that evaluate whether semantic content is preserved accurately across extended delegated workflows.
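
To make the evaluation idea concrete, here is a minimal sketch of a chained transformation-and-inversion loop. It is not the paper's harness: the function `round_trip_drift`, the toy transforms, and the use of `difflib` character similarity as the preservation score are all assumptions chosen so the example runs without a model.

```python
from difflib import SequenceMatcher


def round_trip_drift(document, forward, inverse, steps):
    """Apply forward then inverse `steps` times; score similarity to the original each round."""
    scores = []
    current = document
    for _ in range(steps):
        current = inverse(forward(current))
        # Stand-in metric: character-level similarity in [0, 1].
        scores.append(SequenceMatcher(None, document, current).ratio())
    return scores


# Hypothetical stand-ins for delegated model calls.
def reverse(text):
    return text[::-1]          # perfectly invertible by applying it again


def to_upper(text):
    return text.upper()


def lossy_lower(text):
    return text.lower()        # cannot restore the original capitalization


lossless = round_trip_drift("Hello World", reverse, reverse, steps=3)
lossy = round_trip_drift("Hello World", to_upper, lossy_lower, steps=2)
```

With an exact inverse the score stays at 1.0 on every round, while the lossy pair drops below 1.0 after the first round. In a real study, `forward` and `inverse` would be model calls and the score would measure semantic rather than character-level preservation.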

At the same time, the findings should not be interpreted as evidence that AI systems lack practical value in real-world work today.

1 month ago @ microsoft.com
mimalloc: A new, high-performance, scalable memory allocator for the modern era

mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free.

The mimalloc memory allocator was initially designed in 2020 as a fast allocator for the state-of-the-art Lean and Koka programming languages developed at RiSE, both of which use novel compiler-guided reference counting (see Perceus).

    ja    .LBB0_generic
    leaq  7(%rsi), %rax           ; round to sizeof(void*)
    andq  $-8, %rax
    movq  232(%rdi,%rax), %rcx    ; rcx = heap->small_pages[index]
    movq  8(%rcx), %rax           ; block = rax = page->free
    testq %rax, %rax              ; block == NULL?

Thus, mimalloc has three free lists per (64 KiB) mimalloc page, and effectively that …
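
The fast path in the assembly excerpt above can be read as: round the request up to a pointer-sized multiple, look up the page for that size, and pop the head of its free list, falling back to a generic path when the list is empty. The Python model below is a rough sketch of that control flow only. The `Page` and `Heap` classes, the page capacity, and the use of a Python list in place of an intrusive linked free list are assumptions for illustration, not mimalloc's data structures.

```python
WORD = 8  # sizeof(void*) on a 64-bit target


def round_up_to_word(size):
    """Same arithmetic as `leaq 7(%rsi), %rax` followed by `andq $-8, %rax`."""
    return (size + WORD - 1) & -WORD


class Page:
    """Toy page: a stack of free block offsets for one size class."""

    def __init__(self, block_size, capacity):
        self.block_size = block_size
        self.free = [i * block_size for i in range(capacity)]


class Heap:
    def __init__(self):
        self.small_pages = {}  # rounded size -> Page (a direct-indexed array in the excerpt)

    def malloc_small(self, size):
        rounded = round_up_to_word(size)
        page = self.small_pages.get(rounded)
        if page is None or not page.free:      # "block == NULL?" -> slow path
            return self.malloc_generic(rounded)
        return (rounded, page.free.pop())      # fast path: pop the free list head

    def malloc_generic(self, rounded):
        # Slow path (simplified): install a fresh page and allocate from it.
        page = Page(rounded, capacity=4)
        self.small_pages[rounded] = page
        return (rounded, page.free.pop())


heap = Heap()
a = heap.malloc_small(5)   # first request of this size class takes the generic path
b = heap.malloc_small(7)   # same size class, served from the page's free list
```

The point of the sketch is how short the common case is: one rounding step, one lookup, one pop. The multiple free lists per page mentioned in the text are not modelled here.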

1 month ago @ microsoft.com
GridSFM: A new, small foundation model for the electric grid

Microsoft releases a lightweight foundation model that can predict AC optimal power flow in milliseconds, boosting efficiency and unlocking cost savings in grid analysis.

It provides a foundation for the community to build advanced power grid simulators and planning tools without recreating data or models from scratch.

Microsoft introduces GridSFM, a small foundation model for solving AC optimal power flow (AC-OPF) problems in transmission power grids.

Power grids face increasing strain from surging demand, the need to integrate renewable energy sources, transportation electrification, and extreme weather events.

This release adds the first open AC-OPF model that supports multiple grid topo…

1 month ago @ microsoft.com
Advancing AI for materials with MatterSim: experimental synthesis, faster simulation, and multi-task models

Now we have experimentally synthesized it and measured its thermal conductivity (152 W/m/K) to be close to the thermal conductivity of silicon.

Faster simulation: We have accelerated MatterSim-v1 model inference by 3-5x and integrated it with the LAMMPS software package, enabling large-scale simulations across multiple GPUs.

These include experimental validation of MatterSim predictions for thermal conductors, performance improvements for faster simulation, and the introduction of a new multi-task foundation model for materials characterization.

Le…

1 month ago @ microsoft.com
SocialReasoning-Bench: Measuring whether AI agents act in users’ best interests

In our simulated multi-agent marketplace, agents accepted the first proposal they received up to 93% of the time without exploring alternatives.

Introducing SocialReasoning-Bench

Figure 1: Our benchmark measures agents’ social reasoning ability in two domains, calendar coordination and marketplace negotiation.

Finding 3: Outcome optimality shows how much value agents leave on the table.

In calendar coordination, agents perform better but still settle below the midpoint on average.

Finally, Outcome Optimality works well in settings with clear boundaries, where a “good” outcome can be defined and measured.

1 month ago @ microsoft.com
Building realistic electric transmission grid dataset at scale: a pipeline from open dataset

The ability to study transmission-level power grid behavior is essential for modern power systems research.

In most of the world, including the United States, realistic transmission-level grid data is classified as critical infrastructure information and subject to strict access controls.

These restrictions exist for good reasons, but the resulting lack of realistic grid models is increasingly exacerbating the challenges power systems face.

In this work, we introduce an open-data-derived pipeline for constructing large-scale, transmission-level power grid models that realistically approximate existing networks without relying on proprietary or restricted datasets.

Using only publicly access…

1 month, 1 week ago @ microsoft.com
Microsoft at NSDI 2026: Advances in large-scale networked systems

The USENIX Symposium on Networked Systems Design and Implementation 2026 (NSDI ’26) is a leading forum where researchers and practitioners share new research, insights, and advances in the design and operation of these systems.

Microsoft is proud to support NSDI ’26 as a returning sponsor, reflecting our ongoing commitment to advancing systems and networking research and engaging with the broader community.

Together, they highlight advances in building and operating large-scale networked systems.

Wednesday, May 6, 9:00–10:20 AM
Yuxuan Yan, Zhejiang …

1 month, 1 week ago @ microsoft.com
Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale

Actions that seem harmless can cascade, causing a chain reaction across an agent network.

Invisibility: Information can pass through chains of unaware agents, making the source of an attack hard to trace from any single agent’s perspective.

Each independently contacted a victim agent (Bob) about the same fabricated audit, using varied language and staggered timing to appear unrelated.

Experimental setup: A principal entrusts their agent, Bob, with sensitive personal data: disability accommodation, medical schedule, preferred pharmacy, emergency contact.

Agents relayed summaries of other agents’ private messages to the attacker (one forwarded another agent’s message within seconds), and agent…

1 month, 2 weeks ago @ microsoft.com
AutoAdapt: Automated domain adaptation for large language models

At a glance

Problem: Adapting large language models to specialized, high-stakes domains is slow, expensive, and hard to reproduce.

Why it matters: The result is faster, automated, more reliable domain adaptation that turns weeks of manual iteration into repeatable pipelines.

Deploying large language models (LLMs) in real-world, high-stakes settings is harder than it should be.

In our paper, “AutoAdapt: An Automated Domain Adaptation Framework for Large Language Models,” we describe an end-to-end, constraint-aware framework for domain adaptation.

1 month, 3 weeks ago @ microsoft.com
Can we AI our way to a more sustainable world?

Because I do think there’s a role for AI, a huge role for AI.

BURGER: Right, right.

So I think that’s also something quite important here that, you know, AI can help facilitate.

And I think that’s not just applying AI to solve solutions through optimization but also thinking about this in an integrated way.

1 month, 3 weeks ago @ microsoft.com
MIT AI
last post 4 days, 22 hours ago
Jinhua Zhao named head of the Department of Urban Studies and Planning

Jinhua Zhao MCP ’04, SM ’04, PhD ’09 has been appointed head of the Department of Urban Studies and Planning (DUSP), effective July 1.

Zhao is the Class of 1941 Professor of Cities and Transportation at MIT.

“Jinhua is one of those rare scholars who moves seamlessly between cutting-edge research and real-world policy,” says Sarkis.

After earning advanced degrees at MIT, Zhao joined the DUSP faculty.

Zhao hosts the weekly MIT Mobility Forum via Zoom, with each discussion open to the public.

4 days, 22 hours ago @ news.mit.edu
When it comes to predicting people’s preferences, it pays to consider “the power of three”

His 1927 paper laid the groundwork for what are now called random utility models, which provide a mathematical framework for describing human preferences — information that can be relied upon, in turn, to make predictions about various hypothetical situations.

Random utility models (RUMs) are so named because they assess the “utility,” or benefit, that can be obtained from a given choice — such as deciding which book to read first among the stack of novels you brought back from the library.
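To make the "utility plus randomness" idea concrete, here is a minimal sketch of one common random utility model, the multinomial logit, in which each option's fixed utility is perturbed by Gumbel noise and the option with the largest noisy utility is chosen. The book names and utility values are invented for illustration and are not taken from the article; the logit form is an assumption, not necessarily the model the researchers study.

```python
import math
import random
from collections import Counter


def gumbel_noise(rng):
    """Draw one standard Gumbel sample via the inverse-CDF method."""
    u = max(rng.random(), 1e-12)  # guard against log(0)
    return -math.log(-math.log(u))


def choose(utilities, rng):
    """Pick the option whose noisy utility (fixed part + noise) is largest."""
    noisy = {name: v + gumbel_noise(rng) for name, v in utilities.items()}
    return max(noisy, key=noisy.get)


def logit_probabilities(utilities):
    """Closed-form choice probabilities implied by Gumbel noise (a softmax)."""
    z = sum(math.exp(v) for v in utilities.values())
    return {name: math.exp(v) / z for name, v in utilities.items()}


def simulate(utilities, n_trials=20000, seed=0):
    """Estimate choice frequencies by repeating the noisy choice many times."""
    rng = random.Random(seed)
    counts = Counter(choose(utilities, rng) for _ in range(n_trials))
    return {name: counts[name] / n_trials for name in utilities}


# Hypothetical fixed utilities for three library books.
books = {"mystery": 1.0, "biography": 0.5, "poetry": 0.0}
empirical = simulate(books)
predicted = logit_probabilities(books)
```

With enough trials the simulated frequencies in `empirical` should sit close to the softmax values in `predicted`, which is what lets such a model be fitted to observed choices and then used to predict hypothetical ones.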

“If a digital platform has a blind eye to the existence of such correlations, it will not be able to estimate preferences very accurately,” Daskalakis notes.

This finding provides a highly practical road…

4 days, 23 hours ago @ news.mit.edu
MIT affiliates win 2026 Hertz Foundation Fellowships

The Hertz Foundation announced that it awarded 2026 fellowships to three current MIT students as well as an incoming graduate student.

This year’s MIT-affiliated recipients are among a total of 19 Hertz Foundation Fellows selected from across the United States.

Zachary S. Siegel is an electrical engineering and computer science graduate student pursuing a PhD in the Computer Science and Artificial Intelligence Laboratory, where he works at the intersection of robotics, cognitive science, and artificial intelligence.

He graduated summa cum laude from Princeton University with a BSE in computer science and a minor in philosophy, receiving honors including Tau Beta Pi, Sigma Xi and th…

5 days, 1 hour ago @ news.mit.edu
Startup’s nuclear-inspired cooling system could make data centers more sustainable

The rise of artificial intelligence is riding on the back of an enormous data center expansion.

Data centers are projected to account for anywhere from 9 to 17 percent of total electricity usage in the U.S. by the end of the decade.

Today, around a third of data center electricity is devoted to cooling the chips that run AI models.

Ferveret is already testing its solutions with companies including CleanSpark, the data center developer and operator, as well as FuriosaAI, an AI accelerator company, and Switch, one of the largest data center operators in the U.S.

The main goal for these data center operators would be to get more tokens from the power they have.

6 days, 14 hours ago @ news.mit.edu
The consequences of relying on AI for accurate news

It’s no secret that the last few years have seen a massive explosion in the use of artificial intelligence for general information-gathering.

(Roughly a quarter of all participants actually reported feeling that they were getting better at detection, even as their performance declined.)

The research team said that these AI models are particularly vulnerable to mistakes in the midst of emotionally charged breaking news, as exhibited by the widespread misinformation that accompanied President Trump’s recent assassination attempt and major events during the Iranian war.

(The authors also point out that the original human-created news content that’s used to train the AI models is increasingly u…

6 days, 22 hours ago @ news.mit.edu
The crucial human component in computing and AI

“There is so much amazing research being done at MIT on how AI and computing can be forces for good that benefit humanity.

Iason Gabriel, a philosopher and research scientist at Google DeepMind, used the example of a judge to illustrate his point.

Offloading versus uplifting

As students across all levels of education begin to use AI, questions arise on whether there’s a way to ethically incorporate AI tools while maintaining academic accuracy and rigor.

When the chess engine hands a turn over to its human partner, the human struggles to pick up on the predictive move pattern that the engine has been following up until this point.

“The danger of human-algorithm teams is that when the human ta…

1 week, 3 days ago @ news.mit.edu
PATH to boost AI training and career opportunities for industry-aligned jobs

“Through PATH, MIT RAISE is using our convening power to bring community colleges, industry, research universities, and government together to build human-centered AI pathways that lead to shared prosperity.

Beyond individual courses, PATH is building clearer pathways for students to turn AI learning into real job opportunities.

The goal is to help students build skills that are relevant, recognized, and directly connected to growing career paths.

The initiative is supported by a grant to MIT from Google.org, which is helping MIT and its collaborators build a multi-state network for AI workforce development.

“MIT’s PATH initiative offers a blueprint for expanding opportunity in the age of A…

1 week, 4 days ago @ news.mit.edu
NSF renews support for MIT-led AI and physics institute, expanding a new model for discovery

Its work has shown that machine learning can accelerate discovery in physics, while insights from physics can make AI systems more principled and interpretable.

“From the beginning, IAIFI has been built around a two-way street: AI enabling better physics, and physics enabling better AI,” says Jesse Thaler, IAIFI’s director and a professor of physics at MIT.

“We have seen this virtuous cycle play out across multiple areas of physics and AI over the past five years.

In particle physics, IAIFI researchers have developed AI techniques to handle the immense data rates from the Large Hadron Collider in real-time, helping turn a firehose of collision data into actionable physics.

In nuclear physic…

1 week, 5 days ago @ news.mit.edu
Teaching AI agents to ask better questions by playing “Battleship”

These semi-autonomous programs can “think” and execute well-defined tasks in areas like customer service and software development, typically using language models (LMs).

In their “Collaborative Battleship” game, one participant is a “captain” who inquires about where hidden ships are, while their teammate plays the “spotter” by responding to those questions in real-time.

The result: AI models that can beat regular players at “Battleship,” regardless of scale.

We find that when we give agents access to a ‘world model,’ they ask better questions and make discoveries more efficiently.”

A sea change for LMs

The team’s first focus was getting LMs to ask better questions.

Grand also plans to have h…

1 week, 5 days ago @ news.mit.edu
Tod Machover receives George Peabody Medal for contributions to music and technology

Tod Machover, the Muriel R. Cooper Professor of Music and Media, faculty director of the MIT Media Lab, and director of the Opera of the Future research group, will receive the George Peabody Medal for Outstanding Contributions to Music and Dance in America — the highest honor bestowed by the Peabody Institute of the Johns Hopkins University.

As a composer and music tech pioneer, Machover has helped expand music’s possibilities for artists and audiences alike through his work in participatory opera, artificial intelligence, and creative technologies.

He joins a roster of previous George Peabody Medal recipients that includes Stevie Wonder, Misty Copeland, Herbie Hancock, Renée Fleming, Yo-Y…

1 week, 5 days ago @ news.mit.edu
MIT researchers teach AI models to interpret charts

To fill this performance gap, researchers from MIT and the MIT-IBM Computing Research Lab developed a multifaceted resource for AI users that is specifically designed to teach vision-language models (VLMs) how to effectively interpret charts.

Many of these smaller models significantly outperformed commercial models that are orders of magnitude larger on tasks like data extraction and chart summarization.

By enabling open-source models to outperform their commercial counterparts, ChartNet could allow small firms with limited budgets to more readily utilize AI.

But less work has focused on interpreting complex multimodal data contained within charts, Kondic says.

The dataset improved the accuracy of …

1 week, 6 days ago @ news.mit.edu
Media Advisory: MIT to establish regional quantum hub

MIT and the Commonwealth of Massachusetts announced plans to establish the Quantum Systems Laboratory (QSL) at MIT, which will be open to researchers across the region.

Quantum technologies promise transformative changes in fields from computing, security, and navigation to health sciences, defense technologies, and space exploration.

MIT and the Commonwealth of Massachusetts announced plans to establish the Quantum Systems Laboratory (QSL) at MIT, a new shared-use facility that will serve as a quantum toolbox for the region, aimed at accelerating quantum research, innovation, and growth in this critical field.

Through the new Quantum Systems Laboratory, we will help position Massachusetts …

2 weeks, 5 days ago @ news.mit.edu
Technology usually creates jobs for young, skilled workers. Will AI do the same?

At any given time, technology does two things to employment: It replaces traditional jobs, and it creates new lines of work.

“We had never before seen exactly who is doing new work,” Autor says.

“Eroding tasks is not the same thing as eroding jobs, since many jobs involve a lot of tasks.

New work, Autor observes, is always tied to new forms of expertise.

Back to AI for a minute

Studying who gets new jobs led the scholars to striking conclusions about how new work is created.

3 weeks, 5 days ago @ news.mit.edu
Building AI models that understand chemical principles

Among all of the possible chemical compounds, it’s estimated that between $10^{20}$ and $10^{60}$ may hold potential as small-molecule drugs.

His research straddles the line between chemical engineering and computer science, as he develops and deploys computational models to analyze vast numbers of possible chemical compounds, design new compounds, and predict reaction pathways that could generate those compounds.

After graduating from Caltech, he decided to keep going in chemical engineering and came to MIT in 2014 to start a PhD.

His work focused on combining machine learning and cheminformatics — the application of computational methods to analyze chemical data — to plan reaction pathways that could…

3 weeks, 6 days ago @ news.mit.edu
Justin Solomon appointed associate dean of engineering education

Justin Solomon, associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), has been appointed associate dean of engineering education in the MIT School of Engineering, effective July 1.

In this new role, Solomon will focus on advancing innovation in engineering education across the school.

He is the author of “Numerical Algorithms,” a textbook that presents a modern approach to numerical analysis for computer science students.

Solomon is a principal investigator at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), where he leads the Geometric Data Processing Group.

Solomon joined the MIT faculty in 2016.

3 weeks, 6 days ago @ news.mit.edu
Berkeley AI
last post 1 month, 1 week ago
Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

Overview of adaptive parallel reasoning.

We provide a detailed analysis of recent progress in the field of parallel reasoning, especially Adaptive Parallel Reasoning.

Figure 4: Special Tokens Variants across Adaptive Parallel Reasoning Papers

Inference Systems for Adaptive Parallelism

How do we actually execute parallel branches?

Figure 14: Difference in Model Choice Across Adaptive Parallel Reasoning Papers

Each paper also offers a slightly different interpretation about how adaptive parallel reasoning contributes to the research field.

(Yang et al., 2025; Lian et al., 2025) aim to deliver sequential-AR-model-level a…

1 month, 1 week ago @ bair.berkeley.edu
Gradient-based Planning for World Models at Longer Horizons

Large, learned world models are becoming increasingly capable.

Why is adversarial robustness an issue for world model planning?

We thus exploit the differentiability of learned world models $F_{\theta}$, while not falling victim to the inherent sensitivity of the state Jacobians $D_s F_{\theta}$.
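As a rough illustration of what "exploiting differentiability" means for planning, the sketch below optimizes an action sequence by gradient descent through a rollout of a toy model. Everything here is an assumption made for illustration: the scalar linear dynamics `s' = A*s + B*a` stand in for a learned $F_{\theta}$, and the gradient is written out by hand with the chain rule. It is plain gradient-based planning, not the post's technique for avoiding sensitivity to the state Jacobians $D_s F_{\theta}$.

```python
# Hypothetical scalar dynamics standing in for a learned world model.
A, B = 0.9, 0.5


def rollout(s0, actions):
    """Apply the toy dynamics s' = A*s + B*a for each action in turn."""
    s = s0
    for a in actions:
        s = A * s + B * a
    return s


def plan(s0, goal, horizon=10, steps=200, lr=0.1):
    """Gradient descent on the actions to minimize (s_T - goal)**2."""
    actions = [0.0] * horizon
    for _ in range(steps):
        s_final = rollout(s0, actions)
        d_cost_d_sT = 2.0 * (s_final - goal)
        # Chain rule through the rollout: d s_T / d a_t = A**(horizon-1-t) * B.
        # The repeated factor A is the state Jacobian of this toy model.
        grads = [d_cost_d_sT * (A ** (horizon - 1 - t)) * B
                 for t in range(horizon)]
        actions = [a - lr * g for a, g in zip(actions, grads)]
    return actions


planned = plan(s0=0.0, goal=1.0)
final_state = rollout(0.0, planned)
```

In this toy case the state Jacobian is the constant `A`, so its powers are well behaved. In a large learned model the corresponding product of Jacobians is what the post describes as inherently sensitive.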

It’s a funny sweet spot where the background literature (planning and control overall) is incredibly mature and well-developed, but the current setting (pure planning optimization over modern, large-scale world models) is still heavily underexplored.

But, once we figure out all the right ideas, world model planners will likely become as commonplace as RL.

1 month, 3 weeks ago @ bair.berkeley.edu
Identifying Interactions at Scale for LLMs

Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence.

Therefore, grounded or reality-checked interpretability methods must also be able to capture these influential interactions.

In this blog post, we describe the fundamental ideas behind SPEX and ProxySPEX, algorithms capable of identifying these critical interactions at scale.

SPEX and ProxySPEX Framework

To discover influential interactions with a tractable number of ablations, we have developed SPEX (Spectral Explainer).

We formalize this through two observations: sparsity (relatively f…

3 months ago @ bair.berkeley.edu
Information-Driven Design of Imaging Systems

We developed a framework that enables direct evaluation and optimization of imaging systems based on their information content.

The first approach treated imaging systems as unconstrained communication channels, ignoring the physical limitations of lenses and sensors.

Our Information-Driven Encoder Analysis Learning (IDEAL) method uses gradient ascent on information estimates to optimize imaging system parameters.

The standard approach to computational imaging design, end-to-end optimization, jointly trains the imaging hardware and a neural network decoder.

The computational efficiency of IDEAL suggests possibilities for designing imaging systems that were previously intractable.

5 months, 1 week ago @ bair.berkeley.edu
RL without TD learning

In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer.

We can do Reinforcement Learning (RL) based on divide and conquer, instead of temporal difference (TD) learning.

There are two classes of algorithms in RL: on-policy RL and off-policy RL.

We compared TRL with $n$-step TD learning with different values of $n$, from $1$ (pure TD) to $\infty$ (pure MC).
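For readers unfamiliar with the baseline, the following is a minimal sketch of the standard $n$-step TD target, which sums $n$ discounted rewards and then bootstraps from a value estimate. The function name, the reward and value lists, and the discount factor are illustrative assumptions, not code from the post. Setting `n=1` gives the one-step TD target, and `n=float("inf")` never bootstraps, giving the Monte Carlo return.

```python
def n_step_target(rewards, values, t, n, gamma=0.99):
    """Return the n-step bootstrapped target for time step t.

    rewards[k] is the reward received after step k, and values[k] is the
    current value estimate for state k. The episode ends after the last
    reward, so no bootstrapping happens past that point.
    """
    T = len(rewards)
    end = min(t + n, T)  # n may be float("inf"); min then returns T
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    if end < T:
        # Bootstrap from the value estimate of the state reached after n steps.
        g += gamma ** (end - t) * values[end]
    return g


# Hypothetical three-step episode with made-up value estimates.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3, 0.0]

td_target = n_step_target(rewards, values, t=0, n=1, gamma=0.5)
two_step = n_step_target(rewards, values, t=0, n=2, gamma=0.5)
mc_return = n_step_target(rewards, values, t=0, n=float("inf"), gamma=0.5)
```

Smaller `n` leans more on the (possibly biased) value estimates, while larger `n` leans more on sampled rewards, which is the trade-off the comparison across values of $n$ explores.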

I still think one of the most important problems in RL (and even in machine learning) is to find a scalable off-policy RL algorithm.

7 months, 2 weeks ago @ bair.berkeley.edu
What exactly does word2vec learn?

What exactly does word2vec learn, and how?

In this framing, it’s clear that word2vec is a minimal neural language model.

As a result, the theory predicts exactly what features are learned in terms of the corpus statistics and the algorithmic hyperparameters.

We find that over the course of learning, word2vec builds these linear representations in a sequence of noisy learning steps, and their geometry is well-described by a spiked random matrix model.

9 months, 2 weeks ago @ bair.berkeley.edu
AWS Machine Learning
last post 53 minutes ago
Parallelize speculative decoding with P-EAGLE on Amazon SageMaker AI

P-EAGLE completely eliminates the nested sequential drafting phase by predicting all speculative draft tokens simultaneously in a single forward pass.
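The general draft-then-verify loop behind speculative decoding can be sketched as follows. This is a toy illustration under stated assumptions, not P-EAGLE or any SageMaker or vLLM API: `target_next` and `parallel_draft` are hypothetical stand-ins for the target model and for a drafter that returns all `k` draft tokens from a single call, and verification is the simple greedy variant that keeps the longest prefix of drafts the target agrees with.

```python
def target_next(context):
    """Toy stand-in for the target model's greedy next token."""
    return (sum(context) + len(context)) % 7


def parallel_draft(context, k):
    """Toy stand-in for a drafter that proposes k tokens in one call.

    It imitates the target but is deliberately wrong at every third
    position so that some drafts get rejected. A real parallel drafter
    would produce all k tokens in one forward pass; this loop is only a
    convenient way to fake its output.
    """
    drafts, ctx = [], list(context)
    for i in range(k):
        tok = target_next(ctx)
        if i % 3 == 2:
            tok = (tok + 1) % 7
        drafts.append(tok)
        ctx.append(tok)
    return drafts


def speculative_step(context, k):
    """Verify k drafts against the target and return the tokens to keep."""
    accepted, ctx = [], list(context)
    for tok in parallel_draft(context, k):
        expected = target_next(ctx)  # in practice, one batched target pass
        if tok != expected:
            accepted.append(expected)  # correct the first wrong draft, stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # extra token when all drafts match
    return accepted


def generate(prompt, n_tokens, k=4):
    """Generate n_tokens with repeated draft-and-verify steps."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(out, k))
    return out[: len(prompt) + n_tokens]


def greedy(prompt, n_tokens):
    """Reference: ordinary one-token-at-a-time greedy decoding."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target_next(out))
    return out
```

Because every kept token is one the target itself would have produced, `generate` returns the same sequence as `greedy`; the speedup comes from accepting several tokens per verification step.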

Getting started with P-EAGLE on SageMaker JumpStart

Amazon SageMaker JumpStart provides a one-click deployment experience for foundation models with P-EAGLE parallel speculative decoding.

SageMaker AI provisions the instance, downloads the model artifacts and P-EAGLE drafter head, and starts the vLLM inference server.

To get started, open the Amazon SageMaker AI console, navigate to JumpStart, and deploy one of the supported P-EAGLE models.

To learn more about model deployment on Amazon SageMaker AI, see the Amazon SageMaker AI documentation.

53 minutes ago @ aws.amazon.com
Introducing Gemma 4 models on Amazon Bedrock

In this post, we walk through how to get started with Gemma 4 models on Amazon Bedrock.

Accessing Gemma 4 models on Amazon Bedrock

You access Gemma 4 models on Amazon Bedrock through the bedrock-mantle endpoint, the OpenAI-compatible API purpose-built for the next-generation inference engine for Amazon Bedrock.

Get started with Gemma 4 family models on Amazon Bedrock

Complete the following steps to start using Gemma 4 on Amazon Bedrock.

To control permissions for generating and using API keys, refer to Control permissions for generating and using Amazon Bedrock API keys.

To avoid unintended charges: If you generated short-term Amazon Bedrock API keys for testing, the keys expire automatically …

22 hours ago @ aws.amazon.com
AI Agent Failure Detection and Root Cause Analysis with Strands Evals

Detectors in the Strands Evals SDK remove this bottleneck by automatically identifying failures in agent execution traces and performing root cause analysis, so you can reduce diagnosis time from hours to minutes.

In this post, we walk you through calling the detector functions to diagnose real agent failures.

Phase 2: Root cause analysis takes the detected failures and traces causal chains between them.

Root cause analysis separates causes from symptoms.

Figure: Detector pipeline with integrated and standalone entry points flowing into failure detection and root cause analysis.

1 day ago @ aws.amazon.com
Build context-rich research agents with Deep Agents and Bedrock AgentCore

It then spawns three browser subagents in parallel for research, each navigating a different competitor’s website in its own AgentCore Browser MicroVM.

You can trace the entire workflow with Amazon CloudWatch through Amazon Bedrock AgentCore Observability or LangSmith.

Figure 1: Solution architecture showing the data flow between the LangChain Deep Agents orchestrator, Amazon Bedrock AgentCore Browser MicroVMs, Interpreter, Memory, and CloudWatch or LangSmith tracing.

Figure 2: Three distinct MicroVM session IDs confirm that each research subagent operates in its own isolated Amazon Bedrock AgentCore Browser environment.

If you created an AgentCore Memory resource and no longer need it, you …

1 day, 4 hours ago @ aws.amazon.com
Building Supercharger: How Rocket Close optimized title operations with agentic AI

Rocket Close is a Detroit-based title agency and appraisal management company within Rocket Companies that provides title insurance, property valuation, and settlement services.

Supercharger is an agentic AI solution designed to reduce friction in the lending and homebuying process and optimize title operations workflows.

Exam title agent invocation – The Strands Agent is invoked, triggering the agentic workflow based on system prompts and user input.

In the following sections, we explain why we chose Strands Agents and an MCP tool-based architecture.

To learn more, see the Strands Agents documentation and the Amazon Bedrock marketing page.

3 days, 21 hours ago @ aws.amazon.com
Build a meeting prep and follow-up assistant with Amazon Quick and Cisco Webex MCP servers

This post shows how to build a custom meeting prep and follow-up assistant using Amazon Quick and Cisco Webex MCP servers.

This solution uses three Cisco Webex MCP servers:

Cisco Webex MCP server | Role in this solution
Webex Meetings MCP | Find upcoming and previous meetings, retrieve meeting status, artificial intelligence (AI)-generated meeting summaries, recordings, and transcripts.

Ask your Webex organization admin to enable the Webex Meetings MCP Server, Webex Messaging MCP Server, and Vidcast MCP Server in Webex Control Hub.

Step 1: Confirm Cisco Webex MCP access

Cisco provides hosted Webex MCP server endpoints.

Conclusion

In this post, we showed how to build a meeting prep and follow-up as…

4 days, 3 hours ago @ aws.amazon.com
From PDFs to insights: Architecting an intelligent document processing pipeline with AWS generative AI services

This post outlines the development of a cost-effective and scalable intelligent document processing pipeline on AWS, powered by Amazon Bedrock and its features.

Implementation architecture

The processing pipeline employs an event-driven approach to document processing, integrating multiple specialized layers into a cohesive workflow.

Document processing flow

AWS Step Functions orchestrates the document processing pipeline, handling document classification, multi-modal extraction, data validation, and knowledge base integration.

The serverless architecture with AWS Step Functions and asynchronous BDA processing enabled this massive parallel processing capability without performance degradation…

4 days, 3 hours ago @ aws.amazon.com
Built from the inside out: How AWS Professional Services became a frontier team first

AWS Professional Services (AWS ProServe) compressed engagement timelines from months to days, not by adding artificial intelligence (AI) tools to an existing process, but by fundamentally rebuilding how we deliver from the inside out.

In this post, I’ll share how AWS ProServe became a frontier team, the practices that enabled it, and what your engineering organization can take from our experience.

APEX built the ProServe Delivery Agent, a multi-agent system spanning requirements, architecture validation, implementation, security review, testing, and deployment.

Building the Delivery Agent by using the Delivery Agent

APEX builds the Delivery Agent using the same AI-native practices it provide…

4 days, 5 hours ago @ aws.amazon.com
Extract Data with On-demand and Batch Pipelines Dynamically

Batch inference pipeline

A standard AWS SQS queue is used for the batch inference pipeline because of its high throughput.

Batch Inference AWS Lambda function to pre-process the scanned PDFs, create JSONL files and submit the batch inference job.

2.2.5 Composing messages and submitting the batch inference job

Finally, the batch inference Lambda function creates the Amazon Bedrock batch inference job using the JSONL artifacts from the previous step.

The Amazon Bedrock batch inference job

When Amazon Bedrock receives the batch inference job, it places it in a queue.

Conclusion

The on-demand and batch Amazon Bedrock inference pipelines presented in this post explain how you can dynamically process docume…

4 days, 22 hours ago @ aws.amazon.com
Evaluate AI agents systematically with Agent-EvalKit

Teams building AI agents typically evaluate them the way they evaluate any other software: by checking whether the output matches expectations.

Eval ( /evalkit.eval ) implements the metrics from your plan as executable evaluation code, runs it against the collected traces, and saves structured results.

Agent-EvalKit is designed to make that evaluation part of the same development workflow you already use to write and review agent code.

Refer to An Empirical Study of Automating Agent Evaluation for additional reading on this solution.

5 days, 2 hours ago @ aws.amazon.com
Spot trends faster, sort smarter: Unlocking Sparklines and Custom Sort in Amazon Quick

By the end of this post, readers will be able to: Understand what sparklines and custom sort are and the business problems they solve.

Define a custom sort order for dimension fields in Quick Sight.

Verify the visual has at least one field in the Group by field well and one numeric measure in the Values field well.

Let us take a look at some of the factors you should consider when you implement sparklines and custom sort in your analysis.

Conclusion: Sparklines and custom sort for controls are two focused, high-impact additions to the Amazon Quick Sight authoring experience.

5 days, 3 hours ago @ aws.amazon.com
Optimize blueprint extraction accuracy in Amazon Bedrock Data Automation

Access to Amazon Bedrock with Amazon Bedrock Data Automation enabled in a supported Region.

Delete blueprints you created by navigating to Amazon Bedrock Data Automation in the Amazon Bedrock console, selecting the blueprint, and choosing Delete.

For more information about managing Amazon Bedrock Data Automation resources, refer to the Amazon Bedrock Data Automation documentation.

Combined with Amazon Bedrock Data Automation’s confidence scores, visual grounding, and integration with Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases, this feature can accelerate the path from prototype to production IDP workflows.

Explore how optimized blueprints integrate with Amazon Bedrock Knowled…

5 days, 3 hours ago @ aws.amazon.com
How frontier teams are reinventing AI-native development

We call the teams that have figured this out “frontier teams.” They are not confined to elite labs.

Any engineering team can become a frontier team; we can show you how to get there.

Frontier teams optimize for something different: the rate at which correct, production-ready software reaches customers.

Frontier teams maintain a steady backlog of well-scoped tasks with clear outcomes, running multiple agents in parallel and reviewing output asynchronously.

Tune in to AWS Summit New York City for more on AI-native development.

5 days, 17 hours ago @ aws.amazon.com
Stop hand-tuning kernels: How Neuron Agentic Development accelerates AWS Trainium optimizations

In this post, we explain how the Neuron Agentic Development capabilities accelerate the kernel development workflow.

The Neuron Agentic Development skills: The Neuron Agentic Development package provides five specialized skills that follow the natural kernel development pipeline: write → debug → profile → analyze.

Install Kiro: curl -fsSL https://cli.kiro.dev/install | bash
Install Neuron Agentic Development skills following the instructions at the neuron-agentic-development repository.

Things to know: Keep the following considerations in mind when working with Neuron Agentic Development skills and agents.

Come build with us: The Neuron Agentic Development capabilities are available today.

6 days, 3 hours ago @ aws.amazon.com
Build an AI-Powered Equipment Repair Assistant Using Amazon Bedrock AgentCore

In this post, you build an AI-powered equipment repair assistant using Amazon Bedrock AgentCore that helps farmers and field technicians diagnose equipment problems, identify required parts, and access manufacturer-approved repair procedures through natural language.

The Knowledge Base indexes equipment documentation stored in Amazon S3 using Amazon OpenSearch Serverless for vector search and Amazon Titan Embeddings for semantic matching.

Other services (AgentCore Runtime, Amazon DynamoDB, Amazon S3, Amazon Cognito, AWS Amplify) fall within the AWS Free Tier for testing volumes.

Amazon Bedrock AgentCore configuration: Different troubleshooting scenarios demand varying levels of technical comp…

6 days, 3 hours ago @ aws.amazon.com
NVIDIA
last post 2 hours ago
HPE AI Factory With NVIDIA Expands for the Era of Agents

At HPE Discover Las Vegas, running through Thursday, June 18, NVIDIA and HPE are expanding the HPE AI Factory with NVIDIA, including NVIDIA Vera CPU and NVIDIA Agent Toolkit for HPE Private Cloud AI.

Plus, NVIDIA Confidential Computing extends across HPE AI Factory and enhanced full-stack NVIDIA integration — with NVIDIA accelerated computing, NVIDIA AI software and NVIDIA networking — is available throughout the entire portfolio.

NVIDIA Vera CPU Available With HPE Private Cloud AI: The HPE ProLiant Compute DL394 Gen12 with the NVIDIA Vera CPU will be available in 2027 with HPE Private Cloud AI, a turnkey AI factory co-engineered with NVIDIA.

NVIDIA Confidential Computing Across All HPE AI Fa…

2 hours ago @ blogs.nvidia.com
How to Optimize Transformer-Based Models for Low-Precision Training

As these models grow in size, training runs consume more GPU hours and more engineering iteration time.

Transformers spend much of their training time in GEMMs, and low-precision formats speed up training mainly by making those matrix multiplications faster and cheaper.

The tool computes M = 31 × 512 = 15,872 tokens, derives all 12 GEMM shapes, benchmarks each across enabled precisions, and prints the full results.
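The token-count arithmetic above can be sketched in a few lines. This is a minimal illustration, not the tool's actual code: the excerpt does not list the 12 GEMM shapes, so the sketch assumes they come from four linear layers (QKV projection, attention output, MLP up, MLP down), each contributing a forward, a data-gradient, and a weight-gradient GEMM. The function name, the layer names, and the `hidden` and `ffn_hidden` sizes are hypothetical, as is the reading of 31 as a micro-batch size and 512 as a sequence length.

```python
from typing import List, Tuple

# (name, M, N, K): the GEMM output is M x N, with the reduction over K.
Gemm = Tuple[str, int, int, int]


def transformer_gemm_shapes(micro_batch: int, seq_len: int,
                            hidden: int, ffn_hidden: int) -> List[Gemm]:
    """Enumerate fprop, dgrad and wgrad GEMMs for four assumed linear layers."""
    m = micro_batch * seq_len  # tokens processed per step
    # (layer name, input features, output features); an assumed layer set.
    layers = [
        ("qkv_proj", hidden, 3 * hidden),
        ("attn_out", hidden, hidden),
        ("mlp_up", hidden, ffn_hidden),
        ("mlp_down", ffn_hidden, hidden),
    ]
    shapes: List[Gemm] = []
    for name, k_in, n_out in layers:
        shapes.append((f"{name}.fprop", m, n_out, k_in))  # Y  = X @ W
        shapes.append((f"{name}.dgrad", m, k_in, n_out))  # dX = dY @ W^T
        shapes.append((f"{name}.wgrad", k_in, n_out, m))  # dW = X^T @ dY
    return shapes


tokens = 31 * 512  # the M quoted in the post
shapes = transformer_gemm_shapes(micro_batch=31, seq_len=512,
                                 hidden=4096, ffn_hidden=16384)
for name, m, n, k in shapes:
    gflop = 2 * m * n * k / 1e9  # one multiply and one add per inner-product term
    print(f"{name:16s} M={m:6d} N={n:6d} K={k:6d} GFLOP={gflop:.1f}")
```

Each shape can then be benchmarked per precision; the FLOP count per GEMM gives a common denominator for comparing formats.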

Use prequantized results to understand whether quantization overhead is the bottleneck, or to compare raw tensor core throughput acr…

2 hours ago @ developer.nvidia.com
Fastest, Largest, Strongest: NVIDIA Blackwell Sweeps MLPerf Training 6.0

The NVIDIA platform was the only one to be submitted across every benchmark, and delivered the fastest time to train on all seven.

This round, NVIDIA submitted results on both NVIDIA GB200 NVL72 and GB300 NVL72 rack-scale systems.

NVIDIA GB300 NVL72 Delivered up to 1.6x Performance Over GB200 NVL72: In this round, GB300 NVL72 delivered up to 1.6x faster training than GB200 NVL72 at the same scale.

NVIDIA also submitted results at 5,120 GPUs with NVIDIA GB200 NVL72 systems on Llama 3.1 405B, one of the largest dense LLMs in the suite.

Many of these partners are running some of the most demanding AI training workloads on NVIDIA infrastructure.

3 hours ago @ blogs.nvidia.com
Fine-Tuning Biological Foundation Models with LoRA Using NVIDIA BioNeMo Recipes

This post walks through two case studies that show how the same parameter-efficient recipe applies across biological modalities on a single NVIDIA RTX 6000 Blackwell Workstation Edition GPU: ESM2-3B plus LoRA for protein secondary structure prediction (PSSP), and Evo2-1B plus LoRA for DNA splice-site classification. All the source code to customize or reproduce these results is available in NVIDIA BioNeMo Recipes.

Table 1 summarizes Q3/Q8 test accuracy for the ESM2-3B plus LoRA model alongside strong published baselines reported in the Porter 6 paper.

Total trainable parameters: ~3.7 million (0.33% of the model). LoRA plus head: Backbone frozen; LoRA adapters on the listed target modules, classific…
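To make the parameter-efficiency figure concrete, here is a minimal sketch of how a LoRA trainable-parameter count is typically derived. It relies on the standard LoRA construction, where each adapted weight of shape d_out × d_in gains two low-rank factors with rank × (d_in + d_out) parameters in total. The layer shapes and rank below are hypothetical and are not taken from the BioNeMo recipe; only the 3.7 million and 0.33% figures come from the excerpt.

```python
def lora_trainable_params(target_shapes, rank):
    """Count LoRA parameters for a list of (d_in, d_out) adapted weights.

    Each adapted weight gets A (rank x d_in) and B (d_out x rank).
    """
    return sum(rank * (d_in + d_out) for d_in, d_out in target_shapes)


# Hypothetical configuration: 24 layers, two adapted 2048x2048 matrices each.
hypothetical_targets = [(2048, 2048)] * (24 * 2)
hypothetical_count = lora_trainable_params(hypothetical_targets, rank=16)
print(f"hypothetical LoRA parameters: {hypothetical_count:,}")

# Figures quoted in the excerpt: ~3.7 million trainable, 0.33% of the model.
quoted_trainable = 3.7e6
quoted_fraction = 0.0033
implied_total = quoted_trainable / quoted_fraction
print(f"implied total parameters: {implied_total / 1e9:.2f} billion")
```

The implied total of roughly 1.1 billion parameters suggests the quoted row describes the smaller of the two models, although the truncated excerpt does not say so explicitly.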

1 day ago @ developer.nvidia.com
NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark

AgentPerf from Artificial Analysis, the industry’s first agentic AI benchmark, gives developers, enterprises and infrastructure providers a clear way to compare systems for agentic AI.

In the first round of published results, the NVIDIA Blackwell Ultra NVL72 platform delivers leading performance across the agentic AI workloads tested, running 20x more agents per megawatt than NVIDIA Hopper.

Agentic AI is a fundamentally different workload than conversational AI.

These results are grounded in a benchmark methodology built from the ground up to reflect how agentic AI actually works in production.

Dive deeper into AgentPerf’s methodology and NVIDIA’s full-stack optimizations for agentic AI in …

3 days, 21 hours ago @ blogs.nvidia.com
Save Big and Play Bigger: GeForce NOW Summer Sale Brings Major Membership Savings

Guild Wars 3 is coming to GeForce NOW, bringing its next-generation massively multiplayer online role-playing game (MMORPG) experience to the cloud at launch.

While awaiting its arrival, the journey begins today with Guild Wars 2 and Guild Wars Reforged, along with limited-time exclusive rewards.

Gamers can add Guild Wars 3 to their wishlist today and stay up to date on every reveal as it marches toward launch.

In Guild Wars 2, the “Masterpiece Emote Tome” turns every victory into a moment of flair — a chef’s kiss to triumph.

In Guild Wars Reforged, the “Vision of Lyssa” costume brings elegance and illusion to life and includes an account upgrade.

5 days, 5 hours ago @ blogs.nvidia.com
For Robotaxis, Safety Must Be Built In, Not Bolted On

It comprises: Halos Core: A Certified OS Foundation. At the foundation of NVIDIA Halos OS is Halos Core, which is the next generation of NVIDIA DriveOS and certified to automotive safety standards.

In addition, in Halos Applications, Halos OS can be combined with end-to-end AI models for which explainability and transparency are essential.

The Halos Safety Evaluation Framework: Halos Infra is the cloud-side development infrastructure that enables autonomous vehicle training, simulation and validation at scale.

It’s the foundation for the recently released NVIDIA Halos Safety Evaluation Framework (SEF).

It draws on more than 330 research papers and 1,000 patents developed within NVIDIA Halos OS.

5 days, 23 hours ago @ blogs.nvidia.com
NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

NVIDIA has optimized DiffusionGemma to run even faster across NVIDIA GeForce RTX GPUs, the NVIDIA RTX PRO platform and NVIDIA DGX Spark systems, from local PCs to the cloud.

Features of the new model include: Parallel generation: DiffusionGemma denoises up to 256 tokens per step instead of predicting one at a time.

Up to 4x faster performance: The boost means fast text generation, where single-user generation usually stalls — on local hardware.

Check out the vLLM playbooks for DGX Spark, RTX PRO and DGX Station.

Plug in to RTX Spark on Facebook, Instagram, TikTok and X — and stay informed by subscribing to the RTX Spark newsletter.

6 days, 2 hours ago @ blogs.nvidia.com
NVIDIA Confidential Computing to Help Expand Apple’s Private Cloud Compute

NVIDIA GPUs with Confidential Computing are now used for confidential inference in Apple’s Private Cloud Compute (PCC), as it expands beyond Apple’s data centers to Google Cloud.

Confidential Computing Matters for the Era of AI Experiences: NVIDIA Confidential Computing provides a hardware-based security layer for accelerated AI workloads.

For end users, NVIDIA Confidential Computing means that no one, not even the system’s builders, can look at their data, chats or conversations.

How Confidential Computing Enforces Privacy and Trust: NVIDIA Confidential Computing reflects NVIDIA’s commitment to trustworthy AI and includes these key capabilities: Hardware-rooted trust, helping establish that sy…

6 days, 20 hours ago @ blogs.nvidia.com
Delivering Lifecycle Control for AI Infrastructure at Scale with NVIDIA DGX Spark Enterprise Manageability

NVIDIA DGX Spark and NVIDIA GB10 systems are delivering this foundation with new Enterprise Manageability.

How does DGX Spark Enterprise Manageability integrate into existing IT workflows?

NVIDIA partners that currently support DGX Spark from an enterprise manageability perspective include Progress Chef, Perforce Puppet, and Canonical Landscape.

Get started with NVIDIA DGX Spark Enterprise Manageability: Enterprise AI infrastructure carries enterprise expectations.

For additional documentation, visit DGX Spark Enterprise Manageability.

6 days, 23 hours ago @ developer.nvidia.com
Model Quantization: Turn FP8 Checkpoints into High-Performance Inference Engines with NVIDIA TensorRT

In a previous post, we produced a high-quality FP8-quantized Contrastive Language-Image Pretraining (CLIP) checkpoint with NVIDIA TensorRT Model Optimizer.

We also profile the resulting FP8 TensorRT engine against the FP16 baseline to measure the real-world speedup the quantized model delivers.

Profile ONNX model with TensorRT: With the FP8 ONNX model exported, the next step is to pass it to TensorRT and measure how fast it runs.

--saveEngine writes the built TensorRT engine to disk for later reuse, either for standalone TensorRT inference runtime or for serving through NVIDIA Triton Inference Server (see this example).

CLIP FP8 vs FP16: TensorRT engine size and inference latency: Figure 3 show…

1 week ago @ developer.nvidia.com
Accelerating Federated Learning Research with AI Agents and NVIDIA FLARE Auto-FL

Federated learning (FL) research often begins with a deceptively simple question: What should we try next?

NVIDIA FLARE Auto-FL is an automated, AI-driven research loop designed to test and optimize federated learning strategies.

A completed Auto-FL campaign on a heterogeneous CIFAR-10 data split with eight simulated FL clients in FLARE.

How does Auto-FL make the research loop explicit?

Figure 2 shows the Auto-FL research loop with literature-grounded stall recovery.

What is the function of literature-grounded recovery?

1 week ago @ developer.nvidia.com
How the UK Is Turning Sovereign AI Ambition Into Action With NVIDIA Technologies

U.K. technology leaders are innovating across healthcare and life sciences, coding, agentic AI, inference and more — all running on sovereign AI deployments.

Nebius has announced plans to expand customers and cloud capabilities with three new deployments of advanced NVIDIA AI infrastructure, as the NVIDIA AI Cloud ecosystem partner continues to build out its commercial and AI R&D hub in London.

CoreWeave is building in the U.K. Government’s AI Growth Zones, and seven more NVIDIA AI Cloud ecosystem partners have plans in the pipeline.

BT and Nscale announced plans to build sovereign AI data centers across three existing BT sites in the U.K., combining NVIDIA AI infrastructure, Nscale’s full …

1 week, 1 day ago @ blogs.nvidia.com
NVIDIA and LG Group Build an AI Factory to Advance Physical AI, Mobility and AI Infrastructure

The AI factory will provide LG Group with accelerated computing infrastructure to train, simulate, validate and deploy AI-based applications across its key businesses.

Together, the companies are connecting AI model development, physical AI data generation, robot simulation and training, edge deployment and factory-scale digital twins into a unified workflow for building physical AI systems.

To help overcome the training data challenge for robotics, LG Electronics is developing a physical AI data factory poised to help Korean and global companies accelerate physical AI projects.

This initiative aligns with the NVIDIA DSX AI factory platform, enabling the rapid deployment of scalable, high-p…

1 week, 1 day ago @ blogs.nvidia.com
NVIDIA and Doosan Group Collaborate to Advance Physical AI and AI Factory Infrastructure

NVIDIA and Doosan Group are expanding their collaboration to advance new opportunities across physical AI, robotics and AI factory infrastructure, spanning Doosan Robotics, Doosan Bobcat, Doosan Enerbility and Doosan Corporation Electro-Materials BG.

NVIDIA and Doosan will explore how NVIDIA’s physical AI stack, NVIDIA DSX AI factory platform, NVIDIA MGX and accelerated computing platforms can support these areas.

Doosan Bobcat also plans to explore integrating NVIDIA physical AI technologies into equipment used across construction, landscaping, agriculture and material handling applications.

Exploring AI Factory Power Solutions: Doosan Enerbility is exploring opportunities to support NVIDIA …

1 week, 1 day ago @ blogs.nvidia.com
Facebook
last post 3 weeks ago
SilverTorch: Index as Model — A New Retrieval Paradigm for Recommendation Systems

Retrieval systems within industry recommendation systems have consisted of microservices stitched together, with neural networks inconsistently integrated.

Under Index as Model, the microservice-based item indices previously used for retrieval become a tensor inside the model.

Moving From Microservice Mesh to One Integrated Neural Network: The Microservice Paradigm We Replaced: Traditional recommendation retrieval is built as a mesh of microservices.

We call this Index as Model: Every retrieval component — the item index, eligibility filter, scoring layer and user tower — becomes a tensor or operator inside a single PyTorch model.

Index Freshness: With index as a model module, maintaining index fresh…

3 weeks ago @ engineering.fb.com
Reel Friends: Building Social Discovery that Scales to Billions

On its face the new Friend Bubbles feature looks simple enough.

It highlights Reels your friends have watched and reacted to.

On this episode of the Meta Tech Podcast, Pascal Hartig chats with Subasree and Joseph, two software engineers from the Facebook Reels team, about what it took to bring Friend Bubbles to life.

If you’ve ever underestimated a “simple” feature, this one’s for you.

And if you’re interested in learning more about career opportunities at Meta visit the Meta Careers page.

1 month ago @ engineering.fb.com
Modernizing the Facebook Groups Search to Unlock the Power of Community Knowledge

We’ve fundamentally transformed Facebook Groups Search to help people more reliably discover, sort through, and validate community content that’s most relevant to them.

We’ve adopted a new hybrid retrieval architecture and implemented automated model-based evaluation to address the major friction points people experience when searching community content.

Addressing the Friction Points in Community Knowledge: People struggle with three friction points when searching for answers in community content: discovery, consumption, and validation.

The Solution: A Modernized Hybrid Retrieval Architecture: We engineered a hybrid retrieval architecture that powers a discussions module on Facebook Search.

R…

1 month, 3 weeks ago @ engineering.fb.com
Capacity Efficiency at Meta: How Unified AI Agents Optimize Performance at Hyperscale

We’ve built a unified AI agent platform that encodes the domain expertise of senior efficiency engineers into reusable, composable skills.

Introducing the Capacity Efficiency Program: When the code you ship serves more than 3 billion people, even a 0.1% performance regression can translate to significant additional power consumption.

Many engineers at Meta use our efficiency tools to work on these problems every day.

Skills: These encode domain expertise about performance efficiency.

The pipeline mirrors the defensive AI Regression Solver: Gather context with tools: The AI agent looks up: Opportunity metadata.

2 months ago @ engineering.fb.com
How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines

Challenging the Conventional Wisdom on AI Context Files: Recent academic research found that AI-generated context files actually decreased agent success rates on well-known open-source Python repositories.

Our codebase is the opposite: proprietary config-as-code with tribal knowledge that exists nowhere in any model’s training data.

Any team with a large, proprietary codebase can benefit: Identify your tribal knowledge gaps.

What’s Next: We are expanding context coverage to additional pipelines across Meta’s data infrastructure and exploring tighter integration between context files and code generation workflows.

This approach turned undocumented tribal knowledge into structured, AI-readable con…

2 months, 1 week ago @ engineering.fb.com
KernelEvolve: How Meta’s Ranking Engineer Agent Optimizes AI Infrastructure

This is the second post in the Ranking Engineer Agent blog series exploring the autonomous AI capabilities accelerating Meta’s Ads Ranking innovation.

We introduce KernelEvolve, an agentic kernel authoring system used by Ranking Engineer Agent and generally applicable to a range of AI models beyond Ads Ranking.

Unlike typical large language model (LLM)-based agents that perform one-shot code generation, KernelEvolve treats kernel optimization as a search problem.

A standard coding assistant lacks the context to write optimized MTIA kernels because it has never seen MTIA documentation, instruction set details, or programming idioms.

KernelEvolve represents an early step toward the vision of …

2 months, 2 weeks ago @ engineering.fb.com
Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads

To overcome this, we have developed the Meta Adaptive Ranking Model, which effectively bends the inference scaling curve with high ROI and industry-leading efficiency.

Introducing Meta Adaptive Ranking Model: Serving LLM-scale & complexity models in a real-time ads recommendation environment requires resolving a fundamental tension between model complexity and system efficiency.

Adaptive Ranking Model addresses these challenges through a paradigm shift powered by three core innovations across the serving stack: Inference-efficient model scaling: Adaptive Ranking Model achieves a model complexity equivalent to the O(10 GFLOPs) per token used by top-tier LLMs.

To minimize compute overhead, Adapt…

2 months, 2 weeks ago @ engineering.fb.com
AI for American-Produced Cement and Concrete

Concurrent with the 2026 American Concrete Institute (ACI) Spring Convention, Meta is releasing a new AI model for designing concrete mixes – Bayesian Optimization for Concrete (BOxCrete), as well as the foundational data used to develop award-winning concrete mixes.

Amrize operates 18 cement plants, 141 cement terminals and 269 ready-mix concrete sites across North America.

Alongside the event, Meta is releasing a new AI model for designing concrete mixes, Bayesian Optimization for Concrete (BOxCrete).

How Meta Leverages AI for Concrete Mixtures: Meta’s AI for concrete model can help suppliers more quickly incorporate U.S. materials into their mixes through an approach called adaptive experi…

2 months, 2 weeks ago @ engineering.fb.com
Friend Bubbles: Enhancing Social Discovery on Facebook Reels

Friend bubbles in Facebook Reels highlight Reels your friends have liked or reacted to, helping you discover new content and making it easier to connect over shared interests.

Friend bubbles enhance the social experience on Facebook Reels by helping you discover content your friends enjoy, creating a shared viewing experience and sparking new conversations.

Along with additional optimizations in the underlying method, this approach enabled us to ship friend bubbles while preserving core Reels performance.

Friend bubbles work because the signal is high value: It adds meaningful social context that helps people decide what’s worth watching.

Engagement also scales consistently with the number …

3 months ago @ engineering.fb.com
Ranking Engineer Agent (REA): The Autonomous AI Agent Accelerating Meta’s Ads Ranking Innovation

Meta’s Ranking Engineer Agent (REA) autonomously executes key steps across the end-to-end machine learning (ML) lifecycle for ads ranking models.

Powering these interactions are highly sophisticated, complex and massively distributed machine learning (ML) models that continuously evolve to serve both advertisers and people who use the platforms.

Optimizing these ML models has traditionally been time-consuming.

To address this, Meta built the Ranking Engineer Agent, an autonomous AI agent designed to drive the end-to-end ML lifecycle and iteratively evolve Meta’s ads ranking models at scale.

ML training jobs run for hours or days, far beyond what any session-bound assistant can manage.

3 months ago @ engineering.fb.com
Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps

Nowhere is this more apparent than in mobile security, where a single class of vulnerability can be replicated across hundreds of call sites scattered throughout a sprawling, multi-app codebase serving billions of users.

Meta’s Product Security team has developed a two-pronged strategy to address this: designing secure-by-default frameworks that wrap potentially unsafe Android OS APIs and make the secure path the easiest path for developers, and leveraging generative AI to automate the migration of existing code to those frameworks at scale.

The result is a system that can propose, validate, and submit security patches across millions of lines of code with minimal friction for the engineers w…

3 months ago @ engineering.fb.com
RCCLX: Innovating GPU communications on AMD platforms

RCCLX is fully integrated with Torchcomms and aims to empower researchers and developers to accelerate innovation, regardless of their chosen backend.

We want to iterate on collectives, transports, and novel features quickly on AMD platforms.

With RCCLX, we have integrated CTran to AMD platforms, enabling the AllToAllvDynamic – a GPU-resident collective.

These features provide significant performance improvements on AMD platforms and we are excited to share this with the community.

RCCLX Quick Start Guide: Install Torchcomms with RCCLX backend by following the installation instructions in the Torchcomms repo.

3 months, 3 weeks ago @ engineering.fb.com
The Death of Traditional Testing: Agentic Development Broke a 50-Year-Old Field, JiTTesting Can Revive It

A Catching JiTTest focuses specifically on finding regressions introduced by a code change.

Agentic development dramatically increases the pace of code change, straining test development burden and scaling the cost of false positives and test maintenance to breaking point.

And since the JiTTest itself is LLM-generated, it can often infer the plausible intention of a code change and simulate possible faults that may result from it.

With them, engineers no longer have to spend time writing, reviewing, and testing complex test code.

Read the paper: Just-in-Time Catching Test Generation at Meta

4 months ago @ engineering.fb.com
Adapting the Facebook Reels RecSys AI Model Based on User Feedback

Our new User True Interest Survey (UTIS) model now helps surface more niche, high-quality content and boosts engagement, retention, and satisfaction.

Our paper, “Improve the Personalization of Large-Scale Ranking Systems by Integrating User Survey Feedback,” shares full details on this work.

The main candidate ranking model used by the platform is a large multi-task, multi-label model.

We trained a lightweight UTIS alignment model layer on the collected user survey responses using existing predictions of the main model as input features.

The UTIS model consistently outperformed the baseline, driving higher user engagement and retention.
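
The idea of a lightweight layer trained on survey answers, with the main model's existing predictions as input features, can be sketched roughly as follows. This is only an illustration under stated assumptions: the post does not say which model family the alignment layer uses, so plain logistic regression is swapped in here, and the feature names (`p_like`, `p_watch_complete`) and toy data are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_alignment_layer(features, labels, lr=0.5, epochs=500):
    """Fit a logistic regression by batch gradient descent.

    features: one row per survey response, holding the main model's
              existing predictions for that item (hypothetical inputs).
    labels:   1 if the user said the item matched their interest, else 0.
    """
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_interest(weights, bias, x):
    """Probability that a user would report true interest in the item."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy data: each row is [p_like, p_watch_complete] from the main model.
toy_features = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1],
                [0.1, 0.3], [0.7, 0.6], [0.3, 0.2]]
toy_labels = [1, 1, 0, 0, 1, 0]
weights, bias = train_alignment_layer(toy_features, toy_labels)
high = predict_interest(weights, bias, [0.9, 0.9])
low = predict_interest(weights, bias, [0.1, 0.1])
```

Reusing the main model's outputs as features keeps the extra layer small, which fits a setting where survey labels are far scarcer than engagement logs.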

5 months ago @ engineering.fb.com
DrP: Meta’s Root Cause Analysis Platform at Scale

DrP’s key components include:

Expressive SDK: The DrP SDK allows engineers to codify investigation workflows into analyzers.

Post-processing system: After an investigation, the post-processing system can take automated actions based on the analysis results.

Bootstrap code: The DrP SDK provides bootstrap code to create a template analyzer with pre-populated boilerplate code.

Data access and analysis: The SDK includes libraries for data access and analysis, such as dimension analysis and time series correlation.

This provides immediate analysis results to on-call engineers.

5 months, 4 weeks ago @ engineering.fb.com
Uber Engineering
last post: None
neptune.ai
last post 6 months, 2 weeks ago
We are joining OpenAI

Piotr Niedźwiedź, CEO/CTO and founder of neptune.ai: I’m excited to share that we’ve entered into a definitive agreement to be acquired by OpenAI, subject to closing conditions.

We are thrilled to join the OpenAI team and help their AI researchers build better models faster.

Neptune is a metrics dashboard company.” We’ve worked closely with OpenAI to create the metrics dashboard that helps teams building foundation models.

Our future with OpenAI: Neptune will join OpenAI and continue to support AI researchers with tools to monitor, debug, and evaluate frontier models.

We are looking forward to working with top AI researchers and supporting OpenAI’s mission of ensuring that AGI benefits all of hu…

6 months, 2 weeks ago @ neptune.ai
Synthetic Data for LLM Training

For instance, financial data is highly sensitive and protected by very strict regulations, and synthetic data mimics the real data distribution without revealing customer information.

Read more about how leading foundation model teams curate their training data and other topics in the State of Foundation Model Training Report 2025.

Choosing the right synthetic data generation technique depends on the type of data and its complexity.

Synthetic tabular data generation is a promising direction to overcome these challenges by learning the distribution of the tabular data.

Post-processing: As the distribution of tabular data is highly complex, it makes the synthetic tabular data generation very ch…

7 months ago @ neptune.ai
What are LLM Embeddings: All you Need to Know

TL;DR LLM embeddings are the numerical, vector representations of text that Large Language Models (LLMs) use to process information.

Unlike their predecessor word embeddings, LLM embeddings are context-aware and dynamically change to capture semantic and syntactic relationships based on the surrounding text.

What are the applications of LLM embeddings?

Word embeddings timeline (figure): Sparse word embeddings (One-Hot Vectors, 1970s, TF-IDF, 1980s, Co-Occurrence Matrix); Static word embeddings (Word2Vec 2013, GloVe 2014); Contextualized word embeddings (ELMo 2018, GPT-1 2018, BERT 2018, LLAMA 2023, DeepSeek-V1 2023, GPT-4 2023).

Static word embeddings: Static word embeddings, such as word2vec in 2013, marked a significant development.…

7 months, 1 week ago @ neptune.ai
Detecting and Fixing ‘Dead Neurons’ in Foundation Models

TL;DR Dead neurons silently waste compute and reduce effective model capacity in foundation models.

Dead neurons’ impact: Recent studies into dead neurons in the context of foundation models show interesting, albeit worrying, results.

These large reported fractions of dead neurons in foundation models are a concern from a computational perspective.

Before we move on to discuss how to detect and fix dead neurons, let’s touch upon an important distinction between dead neurons and vanishing gradients.

Visualizing activation distributions: Is your foundation model suffering from dead neurons?
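
A minimal sketch of how such a check might look, assuming the common working definition that a neuron is "dead" when its post-activation output is (near) zero for every input in a sample batch. The function name, the `eps` tolerance, and the toy activations are illustrative and not taken from the article.

```python
def dead_neuron_fraction(activations, eps=0.0):
    """Return (fraction_dead, dead_indices) for one layer.

    activations: list of rows, one per input example; each row holds the
                 post-activation value of every neuron in the layer.
    eps:         tolerance below which an activation counts as zero.
    """
    n_neurons = len(activations[0])
    dead_indices = [
        j for j in range(n_neurons)
        if all(abs(row[j]) <= eps for row in activations)
    ]
    return len(dead_indices) / n_neurons, dead_indices


# Toy batch of 3 inputs for a layer with 3 neurons; neuron 0 never fires.
toy_activations = [
    [0.0, 1.2, 0.0],
    [0.0, 0.0, 0.3],
    [0.0, 0.5, 0.0],
]
fraction, dead = dead_neuron_fraction(toy_activations)
```

A neuron that is silent on a small batch may still fire on other inputs, so the estimate is only as good as the sample it is computed on.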

7 months, 2 weeks ago @ neptune.ai
Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training

In the first part of this series, we covered the fundamentals of instruction fine-tuning (IFT).

def calculate_irs(instruction, output, reference_model): evaluation_prompt = f""" Instruction: {instruction} Model Output: {output} Rate how well the output follows the instruction on these criteria: 1.

HINT addresses a computational inefficiency in standard instruction fine-tuning: repeatedly reprocessing the same task instruction with every input example.

Read more about foundation model training infrastructure and other topics in Neptune’s 2025 State of Foundation Model Training Report.

First, during initial instruction fine-tuning across multiple diverse tasks, the model learns genera…

7 months, 3 weeks ago @ neptune.ai
How to Optimize LLM Inference

Large Language Model (LLM) inference at scale is challenging as it involves transferring massive amounts of model parameters and data and performing computations on large tensors.

In the following, we’ll use the Llama model family architecture as a specific example to understand the LLM workload at inference.

For a far more detailed analysis of the LLM workload at inference, see the chapter All About Transformer Inference in the book How to Scale Your Model, published by Google DeepMind.

A quick primer on hardware for LLM inference: A typical LLM inference cluster consists of several nodes, each with a multi-core CPU and multiple accelerator devices, …
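
To make the "transferring massive amounts of model parameters" point concrete, here is a back-of-the-envelope sketch. It assumes a dense model decoding one sequence at a time, so that every weight is read from accelerator memory once per generated token; the parameter count, precision, and bandwidth figures are hypothetical inputs, not numbers from the article.

```python
def weight_bytes(n_params: int, bytes_per_param: int) -> int:
    """Memory needed just to hold the model weights."""
    return n_params * bytes_per_param


def min_seconds_per_token(n_params: int, bytes_per_param: int,
                          bandwidth_bytes_per_s: float) -> float:
    """Lower bound on decode latency if all weights are streamed once per token."""
    return weight_bytes(n_params, bytes_per_param) / bandwidth_bytes_per_s


# Hypothetical example: 7e9 parameters at 2 bytes each, 1e12 bytes/s of bandwidth.
example_bytes = weight_bytes(7_000_000_000, 2)
example_latency = min_seconds_per_token(7_000_000_000, 2, 1_000_000_000_000)
```

The bound ignores the KV cache, activations, and compute time, so real latencies are higher; it is useful mainly for seeing why memory bandwidth, rather than raw FLOPs, often limits small-batch decoding.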

8 months ago @ neptune.ai
A Researcher’s Guide to LLM Grounding

In this article, we’ll explore the fundamental concepts of LLM grounding as well as strategies for optimally grounding models.

What is LLM grounding?

LLM grounding is analogous.

If relevant knowledge cannot be inferred from the data, then LLM grounding cannot yield more relevant responses.

When grounding LLMs using RAG, consider retaining only a few of the top hits (i.e., top-k) for your retrieval queries.
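
As a rough illustration of keeping only the top-k hits, the sketch below ranks documents by cosine similarity to a query vector and returns the k best. The function names and the tiny two-dimensional "embeddings" are made up for the example; a real RAG pipeline would use an embedding model and a vector index instead.

```python
import heapq
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_hits(query_vec, doc_vecs, k=3):
    """Return the k (score, doc_id) pairs most similar to the query, best first."""
    scored = (
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
    )
    return heapq.nlargest(k, scored)


# Hypothetical document embeddings.
toy_docs = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
hits = top_k_hits([1.0, 0.0], toy_docs, k=2)
hit_ids = [doc_id for _, doc_id in hits]
```

A small k keeps the prompt short and limits the amount of weakly related text the model has to sift through, at the cost of occasionally dropping a relevant passage.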

8 months, 3 weeks ago @ neptune.ai
Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions

TL;DR Instruction fine-tuning (IFT) refines pre-trained large language models (LLMs) to follow specific task instructions by training on prompt-response pairs.

Instruction fine-tuning in a nutshell: IFT tailors LLMs to follow user instructions by bridging their inherent next-word prediction with human-defined objectives.

Parameter-efficient instruction fine-tuning: While major foundation models like GPT-4 or Llama-2 undergo full parameter instruction fine-tuning during development, parameter-efficient fine-tuning (PEFT) methods have become widely adopted for instruction fine-tuning since the LoRA paper was publi…

9 months ago @ neptune.ai
▶️ YouTube
Yannic Kilcher
last post 3 months, 1 week ago
I BUILT A FULLY AUTOMATIC MANSPLAINER

All information about GTC and the DGX Spark Raffle is here: https://www.ykilcher.com/gtc

Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher

If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https://www.subscribestar.com/yannickilcher

Patreon: https://www.patreon.com/yannickilcher

Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq

Ethereu…

3 months, 1 week ago @ youtube.com
Traditional X-Mas Stream

Letsgooo

5 months, 2 weeks ago @ youtube.com
Traditional Holiday Live Stream

https://ykilcher.com/discord

Links:

TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://discord.gg/4H8xxDF

BitChute: https://www.bitchute.com/channel/yannic-kilcher

Minds: https://www.minds.com/ykilcher

Parler: https://parler.com/profile/YannicKilcher

LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/

BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https:/…

5 months, 2 weeks ago @ youtube.com
TiDAR: Think in Diffusion, Talk in Autoregression (Paper Analysis)

Paper: https://arxiv.org/abs/2511.08923

Abstract:

Diffusion language models hold the promise of fast parallel generation, while autoregressive (AR) models typically excel in quality due to their causal structure aligning naturally with language modeling. This raises a fundamental question: can we achieve a synergy with high throughput, higher GPU utilization, and AR level quality? Existing methods fail to effectively balance these two aspects, either prioritizing AR using a weaker model for sequential drafting (speculative decoding), leading to lower drafting efficiency, or using some form of left-to-right (AR-like) decoding logic for diffusion, which still suffers from quality degradation …

5 months, 3 weeks ago @ youtube.com
Titans: Learning to Memorize at Test Time (Paper Analysis)

Paper: https://arxiv.org/abs/2501.00663

Abstract:

Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a new neural long-term memory module that learns to memorize historical context and helps attention to attend to the current context while utilizing long past information. We sh…

6 months ago @ youtube.com
[Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff)

https://arxiv.org/abs/2510.17558

Abstract:

We propose an extension of the decoder Transformer that conditions its generative process on random latent variables which are learned without supervision thanks to a variational procedure. Experimental evaluations show that allowing such a conditioning translates into substantial improvements on downstream tasks.

Author: François Fleuret

7 months, 2 weeks ago @ youtube.com
[Video Response] What Cloudflare's code mode misses about MCP and tool calling

Theo's Video: https://www.youtube.com/watch?v=bAYZjVAodoo

Cloudflare article: https://blog.cloudflare.com/code-mode/

8 months ago @ youtube.com
[Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant)

Paper: https://arxiv.org/abs/2508.21038

Abstract:

Vector embeddings have been tasked with an ever-increasing set of retrieval tasks over the years, with a nascent rise in using them for reasoning, instruction-following, coding, and more. These new benchmarks push embeddings to work for any query and any notion of relevance that could be given. While prior works have pointed out theoretical limitations of vector embeddings, there is a common assumption that these difficulties are exclusively due to unrealistic queries, and those that are not can be overcome with better training data and larger models. In this work, we demonstrate that we may encounter these theoretical limitations in realist…

8 months, 1 week ago @ youtube.com
Henry AI Labs
last post: None
3blue1brown
last post 4 days, 5 hours ago
Measuring the entropy of English

Full video: https://youtu.be/l6DKRf-fAAM

4 days, 5 hours ago @ youtube.com
What's the perfect encoding? How do you know?

Full video: https://youtu.be/l6DKRf-fAAM

6 days, 5 hours ago @ youtube.com
Reinventing Entropy | Compression & Intelligence Part 1

What is the fundamental compressibility of language?
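
One simple way to get a feel for this question is a zeroth-order estimate: treat each character as an independent draw and compute the Shannon entropy of the character frequencies, in bits per character. This is only a sketch, not necessarily the approach taken in the video; the function name and sample string are illustrative.

```python
import math
from collections import Counter


def char_entropy_bits(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


sample = "the quick brown fox jumps over the lazy dog"
bits_per_char = char_entropy_bits(sample)
```

Because this estimate ignores context (which letters tend to follow which), it overstates how many bits are really needed; models that predict each character from the preceding text yield lower figures.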

Check out our virtual career fair: https://3b1b.co/talent

See new projects before they go live: https://3b1b.co/support

Animation credit:

Manim scenes by Aaron Gostein and Grant Sanderson

Shannon’s story, as well as those for various pi creatures, by Mitchell Zemil.

Lunar robot and prediction/compression coin by Paul Dancstep

NanoGPT animations by Clayton Rabideau

Shannon’s “A Mathematical Theory of Communication”

https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf

Shannon’s “Prediction and Entropy of Printed English”

https://www.princeton.edu/~wbialek/rome/refs/shannon_51.pdf

Scientific American article that…

1 week, 2 days ago @ youtube.com
Tie random ends: How many loops?

Recent puzzle solutions on Patreon:

https://members.3blue1brown.com/posts/158885046?pr=true

3 weeks, 4 days ago @ youtube.com
Covering 10 points, a surprisingly tricky puzzle.

Made as part of a monthly series of puzzles for the 2026 Year of Math.

2 months ago @ youtube.com
Escher's most mind-bending piece

On "The Print Gallery", by M.C. Escher

Full video: https://youtu.be/ldxFjLJ3rVY

2 months, 2 weeks ago @ youtube.com
The subset sum puzzle

Part of a series of monthly puzzlers. Stay subscribed to see the solution

2 months, 3 weeks ago @ youtube.com
Escher's most mathematically interesting piece

Escher's Print Gallery, and the tour of complex analysis it invites.

Check out our virtual career fair: 3b1b.co/talent

Join channel supporters to see videos early: 3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com

Original paper by de Smit and Lenstra:

https://pub.math.leidenuniv.nl/~smitbde/papers/2003-de_smit-lenstra-escher.pdf

Timestamps:

0:00 - The print gallery

13:04 - Conformal maps from complex analysis

21:41 - The complex exponential

25:56 - The complex logarithm

32:32 - 3b1b Talent

33:14 - Constructing the key function

40:16 - The deeper math behind Escher

These animations are largely made us…

2 months, 3 weeks ago @ youtube.com
Bacteria Grid Puzzle Solution

Part of a monthly series of puzzlers, in collaboration with MoMath and Peter Winkler

2 months, 3 weeks ago @ youtube.com
The most underappreciated formula | Exploring high-dimensional spheres

On the volumes of higher-dimensional spheres

Explore the 3b1b virtual career fair: See https://3b1b.co/talent

Become a supporter for early views of new videos: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com

Thanks to UC Santa Cruz for letting me film there, and special thanks to Pedro Morales-Almazan for arranging everything.

My video on Numberphile with a fun application of this problem: https://youtu.be/6_yU9eJ0NxA

Timestamps:

0:00 - Introduction

1:01 - Random puzzle

6:16 - Outside the box

14:35 - Setting up the volume grid

21:14 - Why 4πr^2

25:21 - Archimedes in higher dimensions

36:17 - The general formul…

3 months, 2 weeks ago @ youtube.com
The lattice bacteria puzzle

Part of a series of monthly puzzles, done in collaboration with MoMath.

https://momath.org/mindbenders

3 months, 4 weeks ago @ youtube.com
Solution to the ladybug clock puzzle

Solution to last month's probability puzzle.

4 months ago @ youtube.com
The Hairy Ball Theorem

Unexpected applications and a beautiful proof.

Looking for a new career? Check out https://3b1b.co/talent

Supporters get early access to new videos: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com

Credits:

Senia Sheydvasser: Co-writing and sphere deformation animations

Paul Dancstep: Those lovely fluffy sphere animations

Vince Rubinetti: Music

Timestamps:

0:00 - To comb a hairy ball

1:24 - Applications

8:46 - The puzzle of one null point

12:12 - The proof outline

16:41 - Defining orientation

21:44 - Why inside-out is impossible

25:59 - 3b1b Talent

27:44 - Final food for thought

These animati…

4 months, 2 weeks ago @ youtube.com
The ladybug clock puzzle

This is the first in a set of monthly puzzles, curated by Peter Winkler. This one was originally suggested by Richard Stanley. You can sign up to hear his description of the answer at http://momath.org/mindbenders

5 months ago @ youtube.com
The most absurd product I've made

Because why not make a pi creature neck pillow?

Available at 3b1b.co/store

6 months, 3 weeks ago @ youtube.com
Two Minute Papers
last post 2 hours ago
Claude AI Knows More Than It Tells You

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The paper is available here:

https://www.anthropic.com/research/natural-language-autoencoders

https://transformer-circuits.pub/2026/nla/index.html

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi

My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https://felicia.hu

2 hours ago @ youtube.com
NVIDIA's New Free AI - A Gift To All of Us

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The Nemotron 3 Ultra paper is available here:

https://research.nvidia.com/labs/nemotron/Nemotron-3-Ultra/

Free Rendering course and source code:

https://users.cg.tuwien.ac.at/zsolnai/gfx/rendering-course/

2 days, 3 hours ago @ youtube.com
AI Agents as "Games Masters"? 🎮🔥

Check the pinned comment for the link to the full interview. Could AI agents eventually become the "Games Master" driving your gaming storylines? We explore the concept of AI assisting players or creating dynamic, non-scripted narratives. Discover how AI is currently being tested inside immersive game environments to change how we play. 🧠 Hashtags: #aiingames #gaming #ai #gamedev #futuretech

1 week, 3 days ago @ youtube.com
DeepMind’s New AI Found A Strange New Way To Think

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers

📝 The paper is available here:

https://github.com/google-deepmind/alphaproof-nexus-results

https://arxiv.org/html/2605.22763v1

1 week, 4 days ago @ youtube.com
Meet the AI "Co-Scientist" Changing Everything 🤖🧪 #ai
1 week, 6 days ago @ youtube.com
Claude Opus 4.8: Lying Machine No More

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Anthropic's Opus 4.8: https://www.anthropic.com/news/claude-opus-4-8

1 week, 6 days ago @ youtube.com
A Second Nobel Prize for AlphaFold? 🧬🏆 #alphafold #deepmind #nobelprize #science #ai

Check the pinned comment for the link to the full interview. We're discussing whether a "second order Nobel" prize is on the horizon for AI-driven science. With over 3 million researchers already using AlphaFold, the real-world impact is already historic. Hear what the experts think about what comes next for scientific discovery! 🔬

2 weeks ago @ youtube.com
Google's Jeff Dean On Data Center Fires, And The Future Of AI

Thank you to Google for the invite! 🙏 I use Lambda GPU Cloud myself to rent NVIDIA GPUs for my projects - I’d really appreciate it if you checked them out! https://lambdalabs.com/papers

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

2 weeks, 1 day ago @ youtube.com
Feynman vs. Einstein vs. Newton: Who Wins? 🧠🤔 #physics #ai #science #feynman #research

Check the pinned comment for the link to the full interview. In this quick clip, we explore which legendary scientist ranks higher among the experts. It's a fun debate that leads into an even bigger discussion about AI's role in future scientific breakthroughs. You won't want to miss the full deep dive with Demis Hassabis! ⚡️

2 weeks, 1 day ago @ youtube.com
Google DeepMind CEO Likes Hard Questions

Full video: https://youtu.be/huAwz_BR8WM

#shorts

3 weeks ago @ youtube.com
Insane AI Breakthroughs With Demis Hassabis

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

3 weeks, 1 day ago @ youtube.com
DeepSeek Just Changed How AI Sees Images Forever

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The paper is available here:

https://github.com/ailuntx/Thinking-with-Visual-Primitives

https://huggingface.co/datasets/NodeLinker/deepseek-ai-Thinking-with-Visual-Primitives-deleted-repo/blob/main/Thinking_with_Visual_Primitives.pdf

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

3 weeks, 4 days ago @ youtube.com
NVIDIA’s New AI Is Fast For A Strange Reason

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 The paper is available here:

https://arxiv.org/abs/2604.24954

https://developer.nvidia.com/blog/nvidia-nemotron-3-nano-omni-powers-multimodal-agent-reasoning-in-a-single-efficient-open-model/

https://huggingface.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

1 month ago @ youtube.com
OpenAI's GPT 5.5 Instant: The Good, The Bad And The Insane

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 GPT 5.5 Instant:

https://deploymentsafety.openai.com/gpt-5-5-instant/introduction

https://openai.com/index/gpt-5-5-instant/

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

1 month, 1 week ago @ youtube.com
DeepSeek V4 AI: Crushing The Competition

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers

📝 Check out DeepSeek here:

https://www.deepseek.com/en/

Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

1 month, 1 week ago @ youtube.com
DataFest Video
last post None
JetBrains Research Seminars
last post None
Yandex. Computer Science
last post 5 days, 7 hours ago
Borealis: How to Train an Audio LLM for the Price of a MacBook

At the Data Fest 2026 conference in Belgrade, independent researcher Alexander Nikolich shared the practical story of building the audio language model Borealis on a budget comparable to the cost of a MacBook. More content for developers: https://t.me/+owyCvdge8WIyNTUy #DataFest #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #MultimodalAI

5 days, 7 hours ago @ youtube.com
Better LLM pre-training in NVFP4

At Data Fest 2026 in Belgrade, Andrei Panferov from the Institute of Science and Technology Austria introduced Quartet II, a novel method for NVFP4 pre-training that recovers SOTA accuracy. He outlined the core challenges of low-precision LLM training and presented CUDA kernels tuned for Blackwell GPUs, ready for integration into real training pipelines. More content for developers: https://t.me/+owyCvdge8WIyNTUy #datafest #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #MultimodalAI

5 days, 7 hours ago @ youtube.com
How to Safely Roll Out New Versions of Product AI Agents

At Data Fest 2026 in Belgrade, Dmitry Korshunov, Team Lead ML at Ecom, showed how to safely update product AI agents using a system of automated metrics. Using the Yandex AI agent for the Turkish market as an example, he explained how to catch regressions before production, compare versions, and decide on a release once the simple "Hello, Agent" stage is behind you. More content for developers: https://t.me/+owyCvdge8WIyNTUy #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #MultimodalAI

5 days, 7 hours ago @ youtube.com
HGRPO: Hierarchical Grouped Reward Policy Optimization for Multi-Turn Conversational Agents

At Data Fest 2026 in Belgrade, Karina Romanova, Senior LLM Research Engineer, presented HGRPO, a hierarchical modification of GRPO for multi-turn dialogue agents. Applied to a booking agent in Yandex Alice, the method improved truthfulness by 8.0 percentage points and reduced dialogue length by 10.7%. More content for developers: https://t.me/+owyCvdge8WIyNTUy #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #MultimodalAI

5 days, 7 hours ago @ youtube.com
How We Solve Yandex Lavka's Optimization Problems with Uplift Models

At Data Fest 2026 in Belgrade, Vyacheslav Kostrov, an ML engineer at Yandex, explained how uplift models solve Lavka's business problems, from personalized discounts to displaying product collections. He covered how to formulate the uplift problem, choose metrics, and build policies, along with practical techniques using the Lagrangian and uplift trees to balance constraints. All of this is illustrated with real deployments and a review of common mistakes. More content for developers: https://t.me/+owyCvdge8WIyNTUy #datafest #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #Mu…

5 days, 7 hours ago @ youtube.com
Archive Search: How We Are Moving Toward Conscious Text Recognition

At Data Fest 2026 in Belgrade, Daria Vinogradova, computer vision team lead, presented two important milestones in archive search: a new text recognition architecture and the extraction of semantic structures. These changes make search more human: you can now search not for words within a text, but for a person among people. More content for developers: https://t.me/+owyCvdge8WIyNTUy #DataFest #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #MultimodalAI

5 days, 7 hours ago @ youtube.com
Hacks and Defenses in Automatic Kernel Generation

At Data Fest 2026 in Belgrade, Egor Konovalov, an ML engineer, broke down the hacks that LLM agents find when generating GPU/TPU code: from trivially bypassing numerical tolerance to sophisticated attacks on timing measurements and exploiting holes in the test harness. Egor also showed which defenses actually work and which create a false sense of security. More content for developers: https://t.me/+owyCvdge8WIyNTUy #datafest #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #MultimodalAI

5 days, 7 hours ago @ youtube.com
Real-time video generation: where we are and what comes next

At Data Fest 2026 in Belgrade, Andrey Filatov from KREA AI broke down the current state of real-time video generation: which architectures dominate, how they differ, and what challenges arise from compute limits and memory bottlenecks. He also covered production solutions like distillation and caching, and shared his outlook for the next 2–3 years: what will soon become possible and which bottlenecks the industry still overlooks. More content for developers: https://t.me/+owyCvdge8WIyNTUy #datafest #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #Reinforceme…

5 days, 7 hours ago @ youtube.com
AI Generation of Educational Content and Grading of Students' Open-Ended Answers

A talk from the ML & Education track of the Data Fest 2026 conference, hosted by Yandex: "AI Generation of Educational Content and Grading of Students' Open-Ended Answers." Speaker: Denis Korolev, Associate Professor, MIEM HSE University. More content for developers: https://t.me/+owyCvdge8WIyNTUy #datafest #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #MultimodalAI

6 days, 9 hours ago @ youtube.com
How to Evaluate Non-Randomized Experiments Faster and More Reliably?

A talk from the Analytical DS track of the Data Fest 2026 conference, hosted by Yandex: "Double Machine Learning vs Propensity Score Matching: How to Evaluate Non-Randomized Experiments Faster and More Reliably?" Speaker: Platon Popov, data analyst at Wildberries & Russ. More content for developers: https://t.me/+owyCvdge8WIyNTUy #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #MultimodalAI

6 days, 9 hours ago @ youtube.com
EMPI Agent: A Framework for Neurodivergent Students

The talk "EMPI Agent: A Framework for Neurodivergent Students" from the ML & Education track of the Data Fest 2026 conference, hosted by Yandex. Speaker: Viktoria Firsanova, lecturer at the Higher School of Economics. More content for developers: https://t.me/+owyCvdge8WIyNTUy #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #MultimodalAI

6 days, 9 hours ago @ youtube.com
How to Squeeze the Most out of ML Models When Data Is Too Scarce?

A talk from the Analytical DS track of the Data Fest 2026 conference, hosted by Yandex: "How to Squeeze the Most out of ML Models When Data Is Too Scarce?" Speaker: Olesya Noritsyna, Data Scientist, Cube | D Innovate. More content for developers: https://t.me/+owyCvdge8WIyNTUy #datafest #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #MultimodalAI

6 days, 9 hours ago @ youtube.com
AI Tutor and Methods for Evaluating It

The talk "AI Tutor and Methods for Evaluating It" from the ML & Education track of the Data Fest 2026 conference, hosted by Yandex. Speakers: Olga Masaeva and Polina Povetyeva, data researchers at «Море данных». More content for developers: https://t.me/+owyCvdge8WIyNTUy #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #MultimodalAI

6 days, 9 hours ago @ youtube.com
Attribution of Distant Actions in Conversion Models

The talk "Attribution of Distant Actions in Conversion Models" from the Analytical DS track of the Data Fest 2026 conference, hosted by Yandex. Speaker: Kirill Vayser, Data Scientist at Avito. More content for developers: https://t.me/+owyCvdge8WIyNTUy #datafest #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #MultimodalAI

6 days, 9 hours ago @ youtube.com
From Linear Text to a Semantic Graph

A talk from the ML & Education track of the Data Fest 2026 conference, hosted by Yandex: "From Linear Text to a Semantic Graph: Building a Knowledge-Extraction Pipeline for Learning." Speaker: Askold Romanov, CJE Lead at Sber. More content for developers: https://t.me/+owyCvdge8WIyNTUy #DataFest2026 #AI #ML #LLM #GenAI #MachineLearning #DataScience #MLOps #AIAgents #RAG #ComputerVision #AutonomousDriving #Yandex #Яндекс #TechTalk #Developers #ArtificialIntelligence #ReinforcementLearning #MultimodalAI

6 days, 9 hours ago @ youtube.com
ML Trainings
last post 5 hours ago
Russia Lags Six Months Behind in Semiconductors
5 hours ago @ youtube.com
Capital and Control over Technology
5 hours ago @ youtube.com
US and Chinese Debt: Comparison and Differences
5 hours ago @ youtube.com
The IT Department and Frontier Models
5 hours ago @ youtube.com
Virginia Authorities Reject a Giant Data Center Under Public Pressure
5 hours ago @ youtube.com
Britain Wants to Adopt AI, While India Wants to Ban It
5 hours ago @ youtube.com
Biological Weapons and Code: How to Fool an LLM
5 hours ago @ youtube.com
Anthropic and the US Government
5 hours ago @ youtube.com
Valentin Malykh Talks About TikTok Generation
5 hours ago @ youtube.com
Captain's Bridge 14.06.2026: Claude by Passport | AI Headphones from Yandex | TikTok Science

0:00:00 Start

0:00:42 Claude by Passport

0:07:02 A Model from Brazil

0:14:49 Peter Thiel and Brazil

0:20:33 India, Britain, and AI

0:35:19 The US, the Philippines, and AI

0:40:30 Court Case Against AI Plagiarism

0:44:00 The State Duma Backs AI

0:49:09 Gref on the AI Lag

0:52:35 The Chinese Stargate

0:57:15 Virus Writers vs. AI

0:58:52 Kimi 2.7 Code Released

1:02:53 The US Wants AI Companies

1:05:35 AI Headphones from Yandex

1:09:17 Virginia vs. the Data Center

1:13:19 TikTok Science

AI summary: In this episode we discuss the latest technology news, including the influence of the US and China on AI development, regulation and investment in high-tech projects, and the impact of global debt obligations on the world market. In this episode…

2 days, 3 hours ago @ youtube.com
The Value of Code in the Customer's Eyes
1 week, 1 day ago @ youtube.com
Dmitry and Valentin Discuss AI and the Conflict of Shield and Sword
1 week, 1 day ago @ youtube.com
How to Tell Nonsense from Non-Nonsense in Psychology
1 week, 1 day ago @ youtube.com
How We Build a Product: Expertise That Is Valued
1 week, 1 day ago @ youtube.com
ChatGPT and Ways to Verify Answers
1 week, 1 day ago @ youtube.com
Primer
last post 5 months, 2 weeks ago
Taking AI Doom Seriously For 62 Minutes

Patreon: https://www.patreon.com/primerlearning

80,000 Hours: 80000hours.org/primer https://www.desmos.com/calculator/a5pfjtr4tr Other connections:

Discord: https://discord.gg/NbruaNW

Twitch: https://www.twitch.tv/justin_helps

Store: https://store.dftba.com/collections/primer Reddit: https://www.reddit.com/r/primerlearning/

Bsky: https://bsky.app/profile/justinhelps.bsky.social

Twitter: https://twitter.com/primerlearning Links to other resources:

https://yoshuabengio.org/2024/07/09/reasoning-through-arguments-against-taking-ai-safety-seriously/

https://www.youtube.com/c/robertmilesai

https://www.youtube.com/@Siliconversations

https://www.youtube.com/@Go-Meta

https://www.youtube.com/@Dwarkes…

5 months, 2 weeks ago @ youtube.com
Simulating a single brain cell

Patreon:

https://www.patreon.com/primerlearning Helpful resources if you want to learn more about neural networks

https://www.youtube.com/@AndrejKarpathy

https://course.fast.ai/

https://www.youtube.com/@WelchLabsVideo

https://www.youtube.com/@3blue1brown Early papers. These probably aren't helpful for understanding the concepts in this video, but if you're interested in history.

The Perceptron – A perceiving and recognizing automaton: https://bpb-us-e2.wpmucdn.com/websites.umass.edu/dist/a/27637/files/2016/03/rosenblatt-1957.pdf

The Perceptron: A probabilistic model for information storage and organization in the brain: https://www.ling.upenn.edu/courses/cogs501/Rosenblatt1958.pdf A Logical…

8 months, 3 weeks ago @ youtube.com
🎧 Podcasts
Lex Fridman AI Podcast
last post 2 weeks, 4 days ago
#497 – Biggest Mysteries in Physics: Antimatter, Dark Energy & ToE – Don Lincoln

Don Lincoln is a particle physicist at Fermilab who has spent decades working at the frontiers of high energy physics.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep497-sc See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://upwork.com/lex Larridin: Measure AI adoption in your business.

Go to https://larridin.com Fin: AI agent for customer service.

Go to https://fin.ai/lex LMNT: Zero-sugar electrolyte drink mix.

2 weeks, 4 days ago @ lexfridman.com
#496 – FFmpeg: The Incredible Technology Behind Video on the Internet

Jean-Baptiste Kempf is lead developer of VLC and president of VideoLAN.

Kieran Kunhya is a longtime FFmpeg contributor, codec engineer, and the person behind the now-infamous FFmpeg account on X.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep496-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://larridin.com Blitzy: AI agent for large enterprise codebases.

Go to https://perplexity.ai/ OUTLINE: (00:00) – Introduction (03:00) – Sponsors, Comments, and Reflections (10:48) – Weirdest things VLC opens (15:12) – How video playback works (24:33) – Video codecs and containers (35:20) – FFmpeg explained (56:20)…

1 month, 1 week ago @ lexfridman.com
#495 – Vikings, Ragnar, Berserkers, Valhalla & the Warriors of the Viking Age

Lars Brownworth is a historian, teacher, podcaster, and author specializing in Viking history, medieval Europe, and the Byzantine Empire.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep495-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://larridin.com BetterHelp: Online therapy and counseling.

Go to https://drinkLMNT.com/lex Fin: AI agent for customer service.

Go to https://perplexity.ai/ OUTLINE: (00:00) – Introduction (01:03) – Sponsors, Comments, and Reflections (08:57) – The start of the Viking Age (18:50) – Viking military strategy, tactics & technology (32:33) – Ragnar Lothbrok (42:00) – The Grea…

2 months, 1 week ago @ lexfridman.com
#494 – Jensen Huang: NVIDIA – The $4 Trillion Company & the AI Revolution

Jensen Huang is the co-founder and CEO of NVIDIA, the world’s most valuable company and the engine powering the AI computing revolution.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep494-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://drinkLMNT.com/lex Fin: AI agent for customer service.

Go to https://quo.com/lex OUTLINE: (00:00) – Introduction (00:26) – Sponsors, Comments, and Reflections (06:34) – Extreme co-design and rack-scale engineering (09:20) – How Jensen runs NVIDIA (28:41) – AI scaling laws (43:41) – Biggest blockers to AI scaling laws (45:25) – Supply chain (47:20) – Memory (53:25) – Power…

2 months, 3 weeks ago @ lexfridman.com
#493 – Jeff Kaplan: World of Warcraft, Overwatch, Blizzard, and Future of Gaming
Jeff Kaplan is a legendary Blizzard game designer of World of Warcraft and Overwatch, now preparing to launch a new game, The Legend of California, from his new studio Kintsugiyama – available to wishlist on Steam today, with alpha later in March.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep493-sc See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://fin.ai/lex Blitzy: AI agent for large enterprise codebases.

Go to https://blitzy.com/lex BetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lex Shopify: Sell stuff online.

3 months ago @ lexfridman.com
#492 – Rick Beato: Greatest Guitarists of All Time, History & Future of Music

Rick Beato is a music educator, interviewer, producer, songwriter, and a true multi-instrument musician, playing guitar, bass, cello & piano.

His incredible YouTube channel celebrates great musicians & musical ideas, and helps millions of people fall in love with great music all over again.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep492-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lex BetterHelp: Online therapy and counseling.

Go to https://drinkLMNT.com/lex Fin: AI agent for customer service.

3 months, 2 weeks ago @ lexfridman.com
#491 – OpenClaw: The Viral AI Agent that Broke the Internet – Peter Steinberger

Peter Steinberger is the creator of OpenClaw, an open-source AI agent framework that’s the fastest-growing project in GitHub history.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep491-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://coderabbit.ai/lex Fin: AI agent for customer service.

Go to https://fin.ai/lex Blitzy: AI agent for large enterprise codebases.

Go to https://drinkLMNT.com/lex OUTLINE: (00:00) – Introduction (03:51) – Sponsors, Comments, and Reflections (15:29) – OpenClaw origin story (18:48) – Mind-blowing moment (28:15) – Why OpenClaw went viral (32:12) – Self-modifying AI agent (36:57)…

4 months ago @ lexfridman.com
#490 – State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI

Nathan Lambert and Sebastian Raschka are machine learning researchers, engineers, and educators.

Sebastian Raschka is the author of Build a Large Language Model (From Scratch) and Build a Reasoning Model (From Scratch).

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep490-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

(25:11) – ChatGPT vs Claude vs Gemini vs Grok: Who is winning?

(36:11) – Best AI for coding (43:02) – Open Source vs Closed Source LLMs (54:41) – Transformers: Evolution of LLMs since 2019 (1:02:38) – AI Scaling Laws: Are they dead or still holding?

4 months, 2 weeks ago @ lexfridman.com
#489 – Paul Rosolie: Uncontacted Tribes in the Amazon Jungle

Paul Rosolie is a naturalist, explorer, author of a new book titled Junglekeeper, and is someone who has dedicated his life to protecting the Amazon rainforest.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep489-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://perplexity.ai/ BetterHelp: Online therapy and counseling.

Go to https://fin.ai/lex Miro: Online collaborative whiteboard platform.

Go to https://miro.com/ MasterClass: Online classes from world-class experts.

5 months ago @ lexfridman.com
#488 – Infinity, Paradoxes that Broke Mathematics, Gödel Incompleteness & the Multiverse – Joel David Hamkins

Joel David Hamkins is a mathematician and philosopher specializing in set theory, the foundations of mathematics, and the nature of infinity, and he’s the #1 highest-rated user on MathOverflow.

He is also the author of several books, including Proof and the Art of Mathematics and Lectures on the Philosophy of Mathematics.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep488-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://masterclass.com/lexpod OUTLINE: (00:00) – Introduction (01:58) – Sponsors, Comments, and Reflections (15:40) – Infinity & paradoxes (1:02:50) – Russell’s paradox (1:15:57) – Gödel’s…

5 months, 2 weeks ago @ lexfridman.com
#487 – Irving Finkel: Deciphering Secrets of Ancient Civilizations & Flood Myths

Irving Finkel is a scholar of ancient languages and a longtime curator at the British Museum, renowned for his expertise in Mesopotamian history and cuneiform writing.

He specializes in reading and interpreting cuneiform inscriptions, including tablets from Sumerian, Akkadian, Babylonian, and Assyrian contexts.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep487-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lex Miro: Online collaborative whiteboard platform.

Go to https://miro.com/ Chevron: Reliable energy for data centers.

6 months ago @ lexfridman.com
#486 – Michael Levin: Hidden Reality of Alien Intelligence & Biological Life

Michael Levin is a biologist at Tufts University working on novel ways to understand and control complex pattern formation in biological systems.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep486-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lex Miro: Online collaborative whiteboard platform.

Go to https://miro.com/ MasterClass: Online classes from world-class experts.

(2:42:41) – Mind uploading (3:01:22) – Alien intelligence (3:16:17) – Advice for young people (3:22:46) – Questions for AGI

6 months, 2 weeks ago @ lexfridman.com
#485 – David Kirtley: Nuclear Fusion, Plasma Physics, and the Future of Energy

David Kirtley is a nuclear fusion engineer and CEO of Helion Energy, a company working on building the world's first commercial fusion power plant by 2028.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep485-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc. Transcript:

https://lexfridman.com/david-kirtley-transcript CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: https://lexfridman.com/contact EPISODE LINKS:

David's X: htt…

7 months ago @ lexfridman.com
#484 – Dan Houser: GTA, Red Dead Redemption, Rockstar, Absurd & Future of Gaming

Dan Houser is co-founder of Rockstar Games and is a legendary creative mind behind Grand Theft Auto (GTA) and Red Dead Redemption series of video games.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep484-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://box.com/ai UPLIFT Desk: Standing desks and office ergonomics.

Go to https://drinkLMNT.com/lex OUTLINE: (00:00) – Introduction (01:29) – Sponsors, Comments, and Reflections (11:32) – Greatest films of all time (23:45) – Making video games (26:36) – GTA 3 (29:55) – Open world video games (32:42) – Character creation (36:09) – Superintelligent AI in A Bette…

7 months, 2 weeks ago @ lexfridman.com
#483 – Julia Shaw: Criminal Psychology of Murder, Serial Killers, Memory & Sex

Julia Shaw is a criminal psychologist and author who in her books explores human nature, including psychopathy, violent crime, the psychology of evil, police interrogation, false memory manipulation, deception detection, and human sexuality.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep483-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lex BetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lex LMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lex AG1: All-in-one daily nutrition drink.

8 months ago @ lexfridman.com
Microsoft Research Podcast
last post 1 month, 3 weeks ago
Can we AI our way to a more sustainable world?

Because I do think there’s a role for AI, a huge role for AI.

BURGER: Right, right.

BURGER: Right, right.

So I think that’s also something quite important here that, you know, AI can help facilitate.

And I think that’s not just applying AI to solve solutions through optimization but also thinking about this in an integrated way.

1 month, 3 weeks ago @ microsoft.com
Ideas: Steering AI toward the work future we want

JANSSEN: Yeah, yeah, exactly.

TEEVAN: Yeah, yeah, yeah.

I’m curious what you have found particularly surprising about how people and organizations are leveraging AI right now.

And so I do like to picture a future of work where humans are flourishing with AI and where humans still get to do meaningful work.

And I’m very curious about how we can take advantage of AI and do more without running ourselves into the ground because we’re not AI, right?

2 months, 1 week ago @ microsoft.com
Will machines ever be intelligent?

And the question we’re going to discuss is, are machines intelligent?

No, no, that’s right, that’s right.

I mean, in some sense, you could potentially have a super intelligent system, right, that’s far more intelligent than anything else on the planet.

BURGER: Right, right.

At the same time, I think, you know, transformers are not intelligent in the way that a three-year-old is, right?

2 months, 3 weeks ago @ microsoft.com
Trailer: The Shape of Things to Come

Join Microsoft’s Doug Burger and guests as they dig into the fundamental truths about AI and how it will reshape the future.

Technical advances are moving at such a rapid pace that it can be challenging to define the tomorrow we’re working toward.

In The Shape of Things to Come, Microsoft research leader Doug Burger and experts from across disciplines tease out the thorniest AI issues facing technologists, policymakers, business decision-makers, and other stakeholders today.

It’s important to understand what the emerging shapes are and how we should respond.” – Doug Burger, Technical Fellow and Corporate Vice President, Microsoft Research

About Doug Burger

Doug Burger is a research leader in …

3 months, 2 weeks ago @ microsoft.com
Ideas: Community building, machine learning, and the future of AI

This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS.

In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

So around that time when I started my PhD at Penn, I was working in machine learning theory and algorithmic economics.

How had you experienced a lack of community or network of women in machine learning before the founding of WiML?

So particularly when working on topics related to fairness, I’ve ended up focusing a bunch on stuff to do with marginalized groups as part of my responsible AI work.

6 months, 2 weeks ago @ microsoft.com
Ideas: More AI-resilient biosecurity with the Paraphrase Project

Today, I’m excited to talk about the Paraphrase Project, an effort I co-led exploring how advances in AI tools for protein design might impact biosecurity.

These “patches,” akin to those in cybersecurity, have now been shared with organizations globally to strengthen biosecurity screening.

The project highlights that the same AI tools capable of incredible good can also be misused, requiring us to be vigilant, thoughtful, and creative so we continue to get the most benefit out of AI tools while working to ensure that we avoid costly misuses.

So things like, how similar is this to that template, wild-type protein structure that we used as our conditioning information?

But I feel like broadly…

8 months, 1 week ago @ microsoft.com
Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education

KOHANE: So I think you’ve “nerd sniped” me because you [LAUGHTER]—which is all too easy—but I think there’s a central issue here.

But I actually think this is dark matter of human organizational technology that is not well understood.

AZEEM AZHAR: We didn’t talk about, you know, AI in its ability to potentially do this, which is to extend the clinician’s presence throughout the week.

And so I think there’s always going to be an opening for either differences of opinion or agreeing with you too much.

And this gets into whether AI is really going to get almost to the ab initio understanding of human biology.

9 months, 4 weeks ago @ microsoft.com
NLP Highlights
last post None
Data Skeptic
last post 1 month, 2 weeks ago
Student Spotlight: Aaron Payne, Data Analyst

Aaron Payne, an MBA student at Georgia Tech studying business analytics and a Senior Insights Analyst at Chick-fil-A, joins Kyle Polich to talk about turning analytics into decisions that matter. They unpack a real-world forecasting project with Comfama in Colombia, including messy data realities, interpretability tradeoffs, and why "data science for good" starts with the people impacted.

1 month, 2 weeks ago @ dataskeptic.com
The Future is Agentic in Recommender Systems

Kyle Polich sits down with Yashar Deldjoo, research scientist and Associate Professor at the Polytechnic University of Bari, to explore how recommender systems have evolved and why trustworthiness matters. They unpack key dimensions of responsible AI, including robustness to adversarial attacks, privacy, explainability, and fairness, and discuss how LLMs introduce new risks like hallucinations. The episode closes with a look at "agentic" recommender systems, where tools and memory shift recommendations from ranked lists to end-to-end task completion.

1 month, 3 weeks ago @ dataskeptic.com
Book Ratings and Recommendations

Goodreads star ratings can be misleading as measures of "book quality," and research from Hannes Rosenbusch suggests that for many professionally published books, differences between readers often matter more than differences between books. The episode also explores how to model reader preferences, why reviews often reveal more about the reviewer than the text, and how LLMs can aid computational literary research while still falling short of human editors in creative writing.

2 months, 3 weeks ago @ dataskeptic.com
Disentanglement and Interpretability in Recommender Systems
3 months, 1 week ago @ dataskeptic.com
Collective Altruism in Recommender Systems

Ekaterina (Kat) Filadova from MIT EECS joins us to discuss strategic learning in recommender systems—what happens when users collectively coordinate to game recommendation algorithms. Kat's research reveals surprising findings: algorithmic "protest movements" can paradoxically help platforms by providing clearer preference signals, and the challenge of distinguishing coordinated behavior from bot activity is more complex than it appears. This episode explores the intersection of machine learning and game theory, examining what happens when your training data actively responds to your algorithm.

3 months, 2 weeks ago @ dataskeptic.com
Niche vs Mainstream

Anas Buhayh discusses multi-stakeholder fairness in recommender systems and the S'mores framework—a simulation allowing users to choose between mainstream and niche algorithms. His research shows specialized recommenders improve utility for niche users while raising questions about filter bubbles and data privacy.

3 months, 4 weeks ago @ dataskeptic.com
Healthy Friction in Job Recommender Systems

In this episode, host Kyle Polich speaks with Roan Schellingerhout, a fourth-year PhD student at Maastricht University, about explainable multi-stakeholder recommender systems for job recruitment. Roan discusses his research on creating AI-powered job matching systems that balance the needs of multiple stakeholders—job seekers, recruiters, HR professionals, and companies. The conversation explores different types of explanations for job recommendations, including textual, bar chart, and graph-based formats, with findings showing that lay users strongly prefer simple textual explanations over more technical visualizations. Roan shares insights from his "healthy friction" study, which tested …

4 months, 2 weeks ago @ dataskeptic.com
Fairness in PCA-Based Recommenders

In this episode, we explore the fascinating world of recommender systems and algorithmic fairness with David Liu, Assistant Research Professor at Cornell University's Center for Data Science for Enterprise and Society. David shares insights from his research on how machine learning models can inadvertently create unfairness, particularly for minority and niche user groups, even without any malicious intent. We dive deep into his groundbreaking work on Principal Component Analysis (PCA) and collaborative filtering, examining why these fundamental techniques sometimes fail to serve all users equally. David introduces the concept of "power niche users" - highly active users with specialized in…

4 months, 3 weeks ago @ dataskeptic.com
Video Recommendations in Industry

In this episode, Kyle Polich sits down with Cory Zechmann, a content curator working in streaming television with 16 years of experience running the music blog "Silence Nogood." They explore the intersection of human curation and machine learning in content discovery, discussing the concept of "algatorial" curation—where algorithms and editorial expertise work together. Key topics include the cold start problem, why every metric is just a "proxy metric" for what users actually want, the challenge of filter bubbles, and the importance of balancing familiarity with discovery. Cory shares insights on why TikTok's algorithm works so well (clean data and massive interaction volume), the crucial …

5 months, 3 weeks ago @ dataskeptic.com
Eye Tracking in Recommender Systems

In this episode, Santiago de Leon takes us deep into the world of eye tracking and its revolutionary applications in recommender systems. As a researcher at the Kempelen Institute and Brno University, Santiago explains the mechanics of eye tracking technology: how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the groundbreaking RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research, which opens new possibilities for understanding how users interact with carousel interfaces like Netflix. Through collaboration between psychologists and AI researchers, Santiago's work demonstrate…

6 months ago @ dataskeptic.com
Cracking the Cold Start Problem

In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations. Boya shares insigh…

6 months, 1 week ago @ dataskeptic.com
Designing Recommender Systems for Digital Humanities

In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Monasterium.net, Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique research needs and information-seeking behaviors. Florian explains the technical challen…

6 months, 3 weeks ago @ dataskeptic.com
DataRec Library for Reproducible in Recommend Systems

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Mario Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon reviews. The conversation covers Alberto's research journey through knowledge graphs, graph-based recommen…

7 months ago @ dataskeptic.com
Shilling Attacks on Recommender Systems

In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a …

7 months, 1 week ago @ dataskeptic.com
Music Playlist Recommendations

In this episode, Rebecca Salganik, a PhD student at the University of Rochester with a background in vocal performance and composition, discusses her research on fairness in music recommendation systems. She explores three key types of fairness—group, individual, and counterfactual—and examines how algorithms create challenges like popularity bias (favoring mainstream content) and multi-interest bias (underserving users with diverse tastes). Rebecca introduces LARP, her multi-stage multimodal framework for playlist continuation that uses contrastive learning to align text and audio representations, learn song relationships, and create playlist-level embeddings to address the cold start prob…

7 months, 2 weeks ago @ dataskeptic.com
SuperDataScience
last post 7 hours ago
1001: How AI Erased My Career Moat, an Episode #1001 Special: Jon Krohn interviewed by Kirill Eremenko

For this episode #1001 special, the tables are turned: SuperDataScience founder Kirill Eremenko takes the host’s chair and Jon Krohn is the guest. They trace Jon Krohn’s path from an Oxford neuroscience PhD to a New York hedge fund to founding the AI consulting firm Y Carrot, why he regrets leaving academia, how tools like Claude Code erased his hard-won technical moat, and why that makes skilled engineers more valuable than ever. Along the way: whether AI is a bubble, Jevons paradox and the data-center boom, the RICE framework for choosing AI projects, the single biggest reason AI projects fail, and how a well-built AI agent could give anyone “Christopher Nolan–like” focus. Additional mat…

7 hours ago @ podtrac.com
1000: Ten Years of the Super Data Science Podcast, with Jon, Kirill and Special Guests

For this landmark 1,000th episode and the show’s 10-year anniversary, host Jon Krohn is joined by SuperDataScience founder Kirill Eremenko, who hosted the podcast for its first 400-plus episodes before handing over the reins. In a first for the show, the episode was recorded live with the audience invited to join on air, alongside surprise appearances from the team, longtime guests, and even Jon’s family. Together, Jon Krohn and Kirill look back on a decade of the podcast and field listener questions on AI’s biggest opportunities, the build-versus-buy dilemma, how to break into the field today, and how to stay grounded amid the relentless pace of AI. Additional materials: …

4 days, 7 hours ago @ podtrac.com
999: What's Left to Build When Software Is Free, with Chip Huyen

Chip Huyen joins host Jon Krohn for this milestone episode 999 to talk about her record-breaking book "AI Engineering," the most-read title on the O'Reilly platform last year, and how the AI landscape has shifted since her last appearance. Chip breaks down what separates AI engineering from machine learning engineering, makes the case for a "start simple" workflow, gets candid about the real costs of running LLMs in production, and shares why she's now fascinated by physical AI, robotics, and world models, and why the durable problems worth solving are increasingly human ones. Jon Krohn guides the conversation from the practical content of the book through to where the field is heading next. A…

1 week ago @ podtrac.com
998: In Case You Missed It in May 2026

In this month’s episode of ICYMI, Jon Krohn explores how AI agents are simultaneously creating new risks and unlocking powerful new ways of working with data. Hear from Anneka Gupta, Cal Al-Dhubaib, Trevor Manz, Jazmia Henry, Jeremy Mumford, and Jacob Miller, discussing why the old cybersecurity playbook breaks down in the age of Claude Mythos, how the notebook became an AI agent’s working memory, what it really takes to build a foundation model from scratch, and why failing slowly is the most expensive mistake an AI team can make. Additional materials: www.superdatascience.com/998 Interested in sponsoring a SuperDataScience Podcast episode? Email natali…

1 week, 4 days ago @ podtrac.com
997: How This AI Startup Hit 20M Users (No Moat)

Dr. Andrey Kurenkov returns to the show to talk about Astrocade's astronomical growth from pre-alpha to over 20 million engaged users, what it actually takes to build a vibe-coding platform that scales, and how the broader AI landscape has shifted since his last appearance. Andrey shares behind-the-scenes lessons from building B2C user-generated content products, why the real moat is community rather than tech, and his current thinking on humanoid robotics, AGI, and the AI risks people actually overlook. Additional materials: https://www.superdatascience.com/997 Interested in sponsoring a SuperDataScience Podcast episode…

2 weeks ago @ podtrac.com
996: TrueFoundry’s Nikunj Bajaj on How to Get $100M Returns on AI Agent Deployments

TrueFoundry co-founder and CEO Nikunj Bajaj speaks to Jon Krohn about how enterprises like Nvidia and Siemens are realizing returns of over $100 million from single agent deployments, the AI gateway architecture that makes it possible to connect, observe, and govern agents at scale, and why the familiar advice to “start small” is the wrong way to roll out AI agents inside a large organization. Additional materials: www.superdatascience.com/996 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (01:21) What TrueFoundry does and why agents in production nee…

2 weeks, 4 days ago @ podtrac.com
995: End-to-End Foundation Models for the Energy Industry, with Jazmia Henry

Jazmia Henry joins Jon Krohn to break down what it actually takes to build end-to-end foundation models for the energy industry. From wrangling decades of handwritten oil-and-gas documents into usable training data, to bespoke tokenizers, reinforcement learning, and inference at scale, Jazmia walks through every stage of the stack. Along the way she explains why reinforcement learning models are "bursty," what reward hacking is and how her Grounded Continuous Evaluation framework fixes it, and revisits the 2023 NeurIPS paper that argued, to widespread skepticism at the time, that scaling bad data degrades model performance. Additional materials: h…

3 weeks ago @ podtrac.com
994: AI’s Putting Recent Grads Out of Work; Here’s How to Get Hired Anyway!

Unemployment for recent computer-science graduates now rivals rates for fine-arts and anthropology majors, and undergraduate CS enrollment fell 11% in 2025. In this Five-Minute Friday, Jon Krohn walks through the data on both sides of the debate, from Stanford research showing a 13% employment drop for young workers in AI-exposed jobs, to Federal Reserve studies finding no statistically detectable link between AI adoption and reduced hiring. Jon shares his own view on where the truth lies and offers five concrete pieces of advice for graduates and senior professionals alike on how to get hired in 2026. Additional materials: www.superdatascience.com/993…

3 weeks, 4 days ago @ podtrac.com
993: How to Build AI-First Organizations, with Jacob Miller and Jeremy Mumford

For years, AI content has come in the form of “use this library, use this tool” tutorials that age out within months. Jacob Miller and Jeremy Mumford, co-authors of the brand new Wiley book Architected Intelligence, wanted to write something different, a guide to the higher-level principles of building AI products and AI-first organizations that will still be relevant in five or ten years. In this episode, the two Pattern engineers walk Jon Krohn through the core ideas of their book: why you should design products and processes so they can be executed by a human, an AI agent, or any hybrid combination; why most companies are still treating hallucinations as a model problem when they’re actu…

4 weeks ago @ podtrac.com
992: Tokenmaxxing vs AI Hardware Bottlenecks

While “tokenmaxxing”, the social media trend of maximizing AI token consumption as a vanity metric, takes off online, the physical infrastructure behind AI is slamming into serious bottlenecks. In this Five-Minute Friday, Jon Krohn maps out the four overlapping supply-chain constraints choking AI compute: GPUs (with NVIDIA Blackwell sold out through mid-2026), high-bandwidth memory (quintupled demand since 2023, only three manufacturers worldwide), CPUs (agentic AI requires 12x more CPUs per GPU than chatbots), and electricity (Gartner projects power shortages will restrict 40% of AI data centres by 2027). Find out why the five biggest hyperscalers are on track to spend $725 billion on AI i…

1 month ago @ podtrac.com
991: Pair Programming with AI in Your Python Notebook, with Dr. Trevor Manz

Dr. Trevor Manz of Marimo talks to Jon Krohn about Marimo Pair, an open-source agent skill that teaches coding agents like Claude Code how to drive a reactive Python notebook, reading cell state, running Python in the kernel, taking screenshots of cells, and iterating on data tasks the way agents iterate on traditional software. Trevor also unpacks recursive language models, his AnyWidget project that bridges Python and the web, and his journey from a Wisconsin small town and Harvard bioinformatics research to founding-engineer life at Marimo. Listen to the episode to hear why no matter where AI takes us, curiosity and going deep on a topic will always be valuable. Additional materials: …

1 month ago @ podtrac.com
990: Inside Mythos: Anthropic's Locked-Down Frontier Model

Anthropic has built a frontier AI model so capable at finding software vulnerabilities that it has decided not to release it to the general public. In this Five-Minute Friday, Jon Krohn breaks down Claude Mythos Preview, a general-purpose model whose hacking abilities emerged as a side effect of broad improvements in code understanding and reasoning. Find out how Mythos achieved a nearly 100x improvement over Opus 4.6 on Firefox exploit generation, why Mozilla patched 271 vulnerabilities in a single release using an early version of the model, and what Project Glasswing, Anthropic’s gated industry consortium, means for the future of cybersecurity. Jon also shares practical tips for securing t…

1 month, 1 week ago @ podtrac.com
989: Security for Mythos-Era Agentic Risks, with Rubrik’s Anneka Gupta and Cal Al-Dhubaib

Rubrik’s Anneka Gupta and Cal Al-Dhubaib speak to Jon Krohn about cybersecurity measures, the risks AI in business might pose for malicious attacks, and why AI should be kept “boring.” Find out how Rubrik safeguards client data, what zero trust is in the context of cybersecurity, and why cyber-resilience needs to be a top priority for companies looking to adopt AI. Additional materials: www.superdatascience.com/989 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (02:25) All about Rubrik (08:51) The announcement of Claude …

1 month, 1 week ago @ podtrac.com
988: In Case You Missed It in April 2026

In this month’s episode of In Case You Missed It, Jon Krohn talks to guests about memory and education, and how artificial intelligence is continuing to help lower the barriers to access. Hear from Matt Glickman, Traci Walker-Griffith, Richmond Alake, and Linda Haviv, discussing the foundations of AI agent memory, how engineers can develop at scale, and why they believe AI could be your child’s perfect tutor in the classroom. Additional materials: www.superdatascience.com/988 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month, 2 weeks ago @ podtrac.com
987: AI Infrastructure, Ray, and Why Nonlinear Careers Win, with Linda Haviv

Linda Haviv talks to Jon Krohn about staying current on AI matters, why open-source technology is narrowing the gap in its race with proprietary models, and how being a content creator in tech is key to career growth and longevity. She emphasizes that non-linear pathways to a career in tech can give applicants an edge, and stresses the importance of continuous upskilling to “stay relevant.” In her view, systems thinking is becoming more important than coding skills. Hear why in this episode. Additional materials: www.superdatascience.com/987 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascienc…

1 month, 2 weeks ago @ podtrac.com
Data Science at Home
last post 4 weeks ago
Recommend and manipulate: the dangers of the attention economy

This sort of operation is directly exploiting a core feature of internet social media platforms.

The main purpose of recommender systems is to recommend to people the items that similar people have shown an interest in.

Some of the most common methods for implementing recommender systems use concepts such as cosine/correlation similarity, matrix factorization, neural autoencoders, and sequence predictors.
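
As a rough illustration of the cosine-similarity approach mentioned above, here is a minimal sketch of user-based recommendation in plain Python. It is not taken from the episode: the function names `cosine_similarity` and `recommend`, the toy `ratings` table, and the convention that 0 means "not rated" are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no ratings is similar to nobody
    return dot / (norm_a * norm_b)


def recommend(ratings, user, k=2):
    """Rank items `user` has not rated by similarity-weighted ratings of others."""
    target = ratings[user]
    scores = {}
    for other, vec in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(target, vec)
        for item, (mine, theirs) in enumerate(zip(target, vec)):
            # Only score items the target has not rated but the other user has.
            if mine == 0 and theirs > 0:
                scores[item] = scores.get(item, 0.0) + sim * theirs
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: one rating vector per user, one position per item,
# 0 = not rated.
ratings = {
    "ann": [5, 4, 0, 0],
    "bob": [5, 5, 4, 0],
    "cat": [0, 1, 0, 5],
}

print(recommend(ratings, "ann"))
```

In this toy table, "ann" rates items much as "bob" does, so the item "bob" liked (index 2) ranks ahead of the item liked by the dissimilar user "cat" (index 3). Real systems typically work with sparse matrices and normalized ratings, which this sketch leaves out.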

As you say, recommender systems exist because the business model of social media platforms is to monetise attention.

F: So you are saying that this is not an accident: is this the basis of the optimisation of the recommender system?

4 weeks ago @ datascienceathome.com
Social media is an ant mill (Internet is a disaster) (Ep. 303)

Personal newsletter: https://defragzone.substack.com
📩 Newsletter: https://datascienceathome.substack.com
🎙 Podcast: Available on Spotify, Apple Podcasts, and more.

🐦 Twitter: @DataScienceAtHome
📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/
Instagram: https://www.instagram.com/datascienceathome/
Facebook: https://www.facebook.com/datascienceAH
LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast
Discord Channel: https://discord.gg/4UNKGf3

NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, interviews…

4 weeks ago @ datascienceathome.com
AI and videogames (Ep. 305)

What is the state of AI and videogames?

This and much more is covered in this 1st episode of AI and videogames.

Check outshift.com

NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Send us mail at: [email protected]’t forget to like, subscribe, and hit the 🔔 for updates on the latest in AI and data science!

4 weeks ago @ datascienceathome.com
AI and videogames: Conversational NPCs (Ep. 306)

Can NPCs in videogames leverage new LLM-based tech?

Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, interviews, and discussions.

4 weeks ago @ datascienceathome.com
AI tips & tricks (Ep. 307)

SPONSORS
This episode is brought to you by Outshift, Cisco’s incubation engine.

Check outshift.com

4 weeks ago @ datascienceathome.com
Europe, wake up! You Can’t Be a Superpower on Someone Else’s Servers (Ep. 304)

Tech sovereignty takes 3 years and political will.

1 month, 3 weeks ago @ datascienceathome.com
About Apple’s Privacy (Ep. 302)

Apple just spent $2B on tech that reads your silent speech.

1 month, 3 weeks ago @ datascienceathome.com
Productivity is the new data breach (Ep. 301)

1 month, 3 weeks ago @ datascienceathome.com
Programmable Money: The Cage They’ll Call Convenience (Ep. 300)

This episode breaks down programmable money, the technology that turns your wallet into a permission system.

1 month, 3 weeks ago @ datascienceathome.com
There Is No AI. There’s a Stateless Function on 10,000 GPUs Pretending to Know You (Ep. 299)

3 months, 1 week ago @ datascienceathome.com
Bias in the machine (edited)

The title of today’s episode is Bias in the machine.

C: Francesco, today we are starting with an infuriating discussion.

The failure of the medical community as a whole to recognise this obvious bias up to the 21st century is an example of how insidious the problem of bias is.

Three: The bias in your training sample: people put training samples together, and people have culture, experience, and prejudice.

These assumptions inform the way AI systems work—and fail—to this day.

When an algorithm is a black box and you can’t look inside, you have no way of analysing its bias.

3 months, 1 week ago @ datascienceathome.com
What is wrong with reinforcement learning? (Ep. 82)

Join the discussion on our Discord server.

After reinforcement learning agents doing great at playing Atari video games and Go (AlphaGo), at financial trading, and at language modeling, let me tell you the real story.

In this episode I want to shine some light on reinforcement learning (RL) and the limitations that every practitioner should consider before taking certain directions.

RL seems to work so well!

What is wrong with it?

Are you a listener of Data Science at Home podcast?

Or did you subscribe to the Artificial Intelligence at your fingertips newsletter?

4 months, 1 week ago @ datascienceathome.com
How to generate very large images with GANs (Ep. 76)

In this episode I explain how a research group from the University of Lübeck overcame the curse of dimensionality to generate large medical images with GANs.

The problem is not as trivial as it seems.

Many researchers have previously failed to generate large images with GANs.

One interesting application of this approach is in medicine, for generating CT and X-ray images.

Enjoy the show!

References
Multi-scale GANs for Memory-efficient Generation of High Resolution Medical Images: https://arxiv.org/abs/1907.01376

4 months, 1 week ago @ datascienceathome.com
Training neural networks faster without GPU [RB] (Ep. 77)

Training neural networks faster usually involves powerful GPUs.

In this episode I explain an interesting method from a group of researchers at Google Brain, who train neural networks faster by squeezing more out of the hardware they have and making the training pipeline denser.

Enjoy the show!

References
Faster Neural Network Training with Data Echoing: https://arxiv.org/abs/1907.05550

4 months, 1 week ago @ datascienceathome.com
More powerful deep learning with transformers (Ep. 84) (Rebroadcast)

Some of the most powerful NLP models like BERT and GPT-2 have one thing in common: they all use the transformer architecture.

This architecture is built on top of another important concept already known to the community: self-attention.

In this episode I explain what these mechanisms are, how they work, and why they are so powerful.

Don’t forget to subscribe to our newsletter or join the discussion on our Discord server.

4 months, 1 week ago @ datascienceathome.com