Very ML
State-of-the-art Machine Learning News Feed
/r/MachineLearning
последний пост 4 часа назад
What kind of tools do you need to research AI? [D]
What kind of tools do you need to research AI? [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

4 часа назад @ reddit.com
What should happen when you feed impossible moves into a chess-playing language model? [D]
What should happen when you feed impossible moves into a chess-playing language model? [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

6 часов назад @ reddit.com
[P] I built a small tool that makes LLMs respect your project decisions (no agents, no vector DB) [P]
[P] I built a small tool that makes LLMs respect your project decisions (no agents, no vector DB) [P]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

6 часов назад @ reddit.com
ResBM: a new transformer-based architecture for low-bandwidth pipeline-parallel training, achieving 128× activation compression [R]
ResBM: a new transformer-based architecture for low-bandwidth pipeline-parallel training, achieving 128× activation compression [R]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

8 часов назад @ reddit.com
Camera-ready paranoia [D]
Camera-ready paranoia [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

9 часов назад @ reddit.com
Need feedback on my Senior Thesis: An automated MLOps pipeline for AI news classification & summarization [D]
Need feedback on my Senior Thesis: An automated MLOps pipeline for AI news classification & summarization [D] Need feedback on my Senior Thesis: An automated MLOps pipeline for AI news classification & summarization [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

9 часов назад @ reddit.com
Can frontier AI models actually read a painting? [R]
Can frontier AI models actually read a painting? [R]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

12 часов назад @ reddit.com
[ICML 2026] Scores increased and then decreased!! [D]
[ICML 2026] Scores increased and then decreased!! [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

17 часов назад @ reddit.com
My Intrusion Detection ML Model Failed in Real Lab Testing [D]
My Intrusion Detection ML Model Failed in Real Lab Testing [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

18 часов назад @ reddit.com
Why dynamically routing multi-timescale advantages in PPO causes policy collapse (and a simple decoupled fix) [R]
Why dynamically routing multi-timescale advantages in PPO causes policy collapse (and a simple decoupled fix) [R]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

20 часов назад @ reddit.com
Built an political benchmark for LLMs. KIMI K2 can't answer about Taiwan (Obviously). GPT-5.3 refuses 100% of questions when given an opt-out. [P]
Built an political benchmark for LLMs. KIMI K2 can't answer about Taiwan (Obviously). GPT-5.3 refuses 100% of questions when given an opt-out. [P]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

20 часов назад @ reddit.com
AI for Materials Science starter kit [D]
AI for Materials Science starter kit [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

22 часа назад @ reddit.com
Thesis: an agent-native workspace for running and tracking ML experiments [P]
Thesis: an agent-native workspace for running and tracking ML experiments [P] Thesis: an agent-native workspace for running and tracking ML experiments [P]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day назад @ reddit.com
Are gamers being used as free labeling labor? The rise of "Simulators" that look like AI training grounds [D]
Are gamers being used as free labeling labor? The rise of "Simulators" that look like AI training grounds [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day назад @ reddit.com
Failure to Reproduce Modern Paper Claims [D]
Failure to Reproduce Modern Paper Claims [D]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

If you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

1 day назад @ reddit.com
Towards Data Science
последний пост 4 часа назад
What It Actually Takes to Run Code on 200M€ Supercomputer
What It Actually Takes to Run Code on 200M€ Supercomputer What It Actually Takes to Run Code on 200M€ Supercomputer

Login nodes are strictly for lightweight tasks: moving files, compiling code, and submitting job scripts to the scheduler.

You write a bash script detailing exactly what hardware you need, what software environments to load, and what code to execute.

SLURM puts your job in a queue, finds the hardware when it becomes available, executes your code, and releases the nodes.

This is done using #SBATCH directives placed at the top of your submission script.

Their “Development Access” track is specifically designed for projects porting code or benchmarking ML models, making it highly accessible for data scientists.

4 часа назад @ towardsdatascience.com
Your Chunks Failed Your RAG in Production
Your Chunks Failed Your RAG in Production Your Chunks Failed Your RAG in Production

If you haven’t read it yet check it out here: A practical guide to RAG for Enterprise Knowledge BasesA RAG pipeline does not retrieve documents.

For our engineering documents, context precision improved noticeably.

After switching to sentence windows for our narrative documents, context recall moved to 0.88 and faithfulness held at 0.91.

Manually read a random sample of chunks including the ones that produced wrong answers, not just the ones you tested on.

In most production RAG systems, the bottleneck is the decision about where one chunk ends and the next begins.

6 часов назад @ towardsdatascience.com
Building My Own Personal AI Assistant: A Chronicle, Part 2
Building My Own Personal AI Assistant: A Chronicle, Part 2 Building My Own Personal AI Assistant: A Chronicle, Part 2

the first part of my journey building Fernão, my personal AI agent.

Remember the function that fetched the calendar through ICS (the universal calendar format) and extracted my calendar tasks?

We now have a beautiful way to fetch the calendar events via API:def get_events_for_date(target_date=None, use_api=True): """ Fetches events for a specific date from Google Calendar.

use_api: If True, try Google Calendar API first.

if use_api and GCAL_API_AVAILABLE: print("[GCal] Attempting to use Google Calendar API...") try: events = get_events_for_date_api(target_date) if events is not None: print(f"[GCal] Successfully fetched {len(events)} events via API") return events else: print("[GCal] API ret…

8 часов назад @ towardsdatascience.com
memweave: Zero-Infra AI Agent Memory with Markdown and SQLite — No Vector Database Required
memweave: Zero-Infra AI Agent Memory with Markdown and SQLite — No Vector Database Required memweave: Zero-Infra AI Agent Memory with Markdown and SQLite — No Vector Database Required

The deeper issue is that none of these tools were designed for agent memory.

memweave indexes them into a local SQLite database and lets you search across them with hybrid BM25 + semantic vector search.

How memweave Organises Memory: Evergreen Files, Dated Logs, and Agent NamespacesNot all knowledge ages equally.

The 70/30 split reflects the nature of most agent memory queries: conceptual and paraphrased more often than exact-string lookups.

A local SQLite database indexes them for hybrid search — BM25 for exact matches, vector search for semantic retrieval, merged into a single ranked list.

9 часов назад @ towardsdatascience.com
Introduction to Deep Evidential Regression for Uncertainty Quantification
Introduction to Deep Evidential Regression for Uncertainty Quantification Introduction to Deep Evidential Regression for Uncertainty Quantification

to evidential deep learning (EDL), a framework for one-shot quantification of epistemic and aleatoric uncertainty.

Formalizing Uncertainty and Uncertainty Quantification (UQ) ApproachesNow that we have established problems with naively taking softmax as a measure of uncertainty, we should formalize the concept of uncertainty.

We would expect high aleatoric uncertainty where data is noisy but high epistemic uncertainty in out-of-distribution regions.

Actually, EDL is known for sometimes providing unreliable absolute uncertainty estimates — high aleatoric uncertainty usually leads to high epistemic uncertainty so they cannot be fully disentangled (see this paper for more details).

Evidential …

11 часов назад @ towardsdatascience.com
How to Maximize Claude Cowork
How to Maximize Claude Cowork How to Maximize Claude Cowork

Why use Claude CoworkI think there are two main sides to why you should use Claude Cowork.

How to effectively use Claude CoworkNow, let’s move on to how you should be using Claude Cowork.

This makes it incredibly easy to work with visuals inside Claude Cowork, which is one of the major advantages of working in Cowork rather than Claude Code.

These visualizations are one of the major advantages of working in Claude Cowork rather than Claude Code in the terminal.

I urge you to try out Claude Cowork yourself to experience what it’s like, and you can determine for yourselves whether you want to perform some tasks in Claude Cowork or if you want to use Claude Cowork for everything.

1 day, 6 hours назад @ towardsdatascience.com
Prefill Is Compute-Bound. Decode Is Memory-Bound. Why Your GPU Shouldn’t Do Both.
Prefill Is Compute-Bound. Decode Is Memory-Bound. Why Your GPU Shouldn’t Do Both. Prefill Is Compute-Bound. Decode Is Memory-Bound. Why Your GPU Shouldn’t Do Both.

Splitting the inference path in twoDisaggregated inference runs prefill and decode on separate GPU pools connected by a fast network.

With large batch sizes (hundreds of concurrent decode requests), decode utilization rises because the memory reads are amortized across more work.

The KV-cache produced during prefill has to move from the prefill GPU to the decode GPU over the network, and these caches are not small.

Prefill and decode pools are separate deployments that autoscale independently based on queue depth and GPU utilization.

The SPAD paper takes this logic to its extreme: they propose “right-sizing” GPU designs into separate prefill and decode chips.

1 day, 8 hours назад @ towardsdatascience.com
5 Practical Tips for Transforming Your Batch Data Pipeline into Real-Time: Upcoming Webinar
5 Practical Tips for Transforming Your Batch Data Pipeline into Real-Time: Upcoming Webinar 5 Practical Tips for Transforming Your Batch Data Pipeline into Real-Time: Upcoming Webinar

It’s a common scenario: years ago, you and your data team built a data pipeline that “got the job done” with a big overnight batch.

Here are five practical tips to keep your team on track as you modernize your data pipeline from an overnight batch system to one that consistently provides up-to-date information to your entire platform.

If you have a small amount of data that rarely updates or lacks time-sensitivity, you probably don’t need CDC.

Think of data pipeline modernization as steadily turning up a dimmer, not flipping a light switch.

These platforms also integrate well with orchestration tools, making it easier to manage and automate your data pipelines.

1 day, 8 hours назад @ towardsdatascience.com
From Pixels to DNA: Why the Future of Compression Is About Every Kind of Data
From Pixels to DNA: Why the Future of Compression Is About Every Kind of Data From Pixels to DNA: Why the Future of Compression Is About Every Kind of Data

The latest video standard is called VVC (Versatile Video Coding) and was published in 2020.

Time is changing and bitrate-reduction-for-the-same-visual-quality, although important, is not the only motivation behind the creation of a new video codec.

AI in video coding: Hybrid, super‑resolution, end‑to‑endWhat about AI applied to video coding?

Video for Machines: VCM and FCMMost people still think of video compression as something done “for humans” to watch.

Video coding for machines (VCM)VCM reorganizes the classical video coding pipeline around machine task performance rather than human visual quality.

1 day, 9 hours назад @ towardsdatascience.com
From OpenStreetMap to Power BI: Visualizing Wild Swimming Locations
From OpenStreetMap to Power BI: Visualizing Wild Swimming Locations From OpenStreetMap to Power BI: Visualizing Wild Swimming Locations

To query Overpass API we are going to use Overpass Query Language (Overpass QL), a C-syntax-like language.

Figure 1 – Overpass turbo with default query – screenshot by the authorFigure 1 shows us what Overpass Turbo looks like.

Figure 5 – XML Data retrieved by the query – screenshot by the authorAs you can see, the query returns the data in XML.

We can use this in Power BI, but it will need extra work to transform the query in the data we need.

Getting Openstreetmap data in Power BIAt this point, we leave Overpass Turbo and enter Power BI.

1 day, 11 hours назад @ towardsdatascience.com
RAG Isn’t Enough — I Built the Missing Context Layer That Makes LLM Systems Work
RAG Isn’t Enough — I Built the Missing Context Layer That Makes LLM Systems Work RAG Isn’t Enough — I Built the Missing Context Layer That Makes LLM Systems Work

The Breaking Point of RAG SystemsI built a RAG system that worked perfectly — until it didn’t.

Context engineering is the layer in between — the architectural decisions about what information flows into the context window, how much of it, and in what form.

Full Pipeline ArchitectureA complete context engineering pipeline for RAG systems, combining retrieval, memory management, compression, and token budget control to build efficient and scalable LLM applications.

Documents tagged with memory , context , rag , or embedding receive a tag_importance of 1.4; all others receive 1.0.

Context Engineering.

2 days, 5 hours назад @ towardsdatascience.com
Data Modeling for Analytics Engineers: The Complete Primer
Data Modeling for Analytics Engineers: The Complete Primer Data Modeling for Analytics Engineers: The Complete Primer

However, data modeling begins long before your data is stored in a spreadsheet or in a real database.

In the following sections, we will introduce core data modeling concepts — the ones that should be implemented in every single data modeling scenario, regardless of the modeling approach you plan to take or the tool you are going to use for the physical implementation.

Logical model: The blueprintOnce business and data teams align on the conceptual data model, the next step is designing a logical data model.

Building a logical data model can be considered part of the agile data modeling cycle, which ensures more robust, scalable, and future-proof models.

Physical model: The construction pla…

2 days, 6 hours назад @ towardsdatascience.com
A Practical Guide to Choosing the Right Quantum SDK
A Practical Guide to Choosing the Right Quantum SDK A Practical Guide to Choosing the Right Quantum SDK

anyone trying to get into quantum computing or use it to build something is the abundance of SDKs available.

You define circuits, you run them, and you get results in a way that mirrors how most people are taught quantum computing.

Most near-term quantum algorithms look like this:Prepare a parameterized quantum circuit.

For example, let us consider the core idea behind quantum machine learning: You have a parameterized quantum circuit, and you can optimize it using gradients.

It works with continuous-variable quantum systems, often used in photonic quantum computing.

2 days, 8 hours назад @ towardsdatascience.com
A Guide to Understanding GPUs and Maximizing GPU Utilization
A Guide to Understanding GPUs and Maximizing GPU Utilization A Guide to Understanding GPUs and Maximizing GPU Utilization

In this post, we explore the mechanics of this bottleneck and walk through actionable engineering decisions to maximize GPU utilization.

Optimizing the Data PipelineTracking GPU UtilizationBefore we can optimize the data pipeline, we must understand how to monitor GPU utilization and VRAM.

Hitting periodic 100% utilization is not a sign that GPU utilization is maximized.

An example of a sawtooth GPU utilization graph in the same format as Weights and Biases is shown below:Graph of sawtooth GPU utilization commonly seen in unoptimized ML pipelines.

The GPU utilization is now a high, continuous line near 100%, meaning the GPU never has to wait!

2 days, 9 hours назад @ towardsdatascience.com
How To Produce Ultra-Compact Vector Graphic Plots With Orthogonal Distance Fitting
How To Produce Ultra-Compact Vector Graphic Plots With Orthogonal Distance Fitting How To Produce Ultra-Compact Vector Graphic Plots With Orthogonal Distance Fitting

With vector graphics, a function is approximated by segments of connected cubic Bézier curves that are rasterized (i.e.

The basic primitive of vector graphics is the parametric cubic, represented as a cubic Bézier curve.

Given a parametric function f, it first fits a Chebyshev series to approximate f. For analytic functions, interpolation in Chebyshev nodes provides rapid geometric convergence [4].

In the next section, I describe how to fit arbitrary functions with Algorithm F using the Python package bbai (https://github.com/rnburn/bbai).

While some of the formats provide other basic graphics like circles, etc, those are all just wrappers on top of cubic Bézier curve approximations.

2 days, 11 hours назад @ towardsdatascience.com
Distill.pub Distill.pub
последний пост None
TheSequence TheSequence
последний пост 12 часов назад
The Sequence Opinion #844: Harness Engineering: The Operating System for Agentic Software
The Sequence Opinion #844: Harness Engineering: The Operating System for Agentic Software The Sequence Opinion #844: Harness Engineering: The Operating System for Agentic Software

There is a meaningful difference between getting a model to write code and getting a model to reliably build software.

We are talking about harness engineering.

OpenAI recently gave a useful name to a pattern many of us have been discovering the hard way: harness engineering.

The interesting part of harness engineering is not the label itself.

It is the collection of non-obvious truths that appear once you move beyond one-shot demos and start asking agents to do real work over long horizons.

12 часов назад @ thesequence.substack.com
The Sequence AI of the Week #843: The AI We Built But Can't Release: A Practical View Into the Claude Mythos Preview
The Sequence AI of the Week #843: The AI We Built But Can't Release: A Practical View Into the Claude Mythos Preview The Sequence AI of the Week #843: The AI We Built But Can't Release: A Practical View Into the Claude Mythos Preview

Welcome to another edition of The Sequence.

Today, we are diving into what is undoubtedly the most fascinating, illuminating, and slightly unnerving AI document of the year: the system card for Anthropic’s Claude Mythos Preview.

For the last few years, the frontier AI development loop has been highly predictable: scale up the compute, implement some algorithmic breakthroughs, train a new state-of-the-art model, and push it to an API or chat interface for the world to play with.

We benchmark it, we build products around it, and we wait for the next iteration.

Anthropic just broke that loop.

1 day, 12 hours назад @ thesequence.substack.com
The Sequence Knowledge #842: Everything You Need to Know About World Models
The Sequence Knowledge #842: Everything You Need to Know About World Models The Sequence Knowledge #842: Everything You Need to Know About World Models

The Sequence Knowledge #800: Discusses the different types of world models and reviews the first major paper in the space.

The Sequence Knowledge #804: Covers the famous Dreamer models that opened up the space of world models.

The Sequence Knowledge #825: Discusses one of the most innovative world models: World Labs’ Marble.

The Sequence Knowledge #829: Explores the idea of world models and physical AI including NVIDIA’s Cosmos models.

The Sequence Knowledge #833: Dives into the core architecture components and building blocks of world models.

2 days, 13 hours назад @ thesequence.substack.com
The Sequence Radar #841: Three Model Releases, Three Futures
The Sequence Radar #841: Three Model Releases, Three Futures The Sequence Radar #841: Three Model Releases, Three Futures

Subscribe and don’t miss out:📝 Editorial: Last Week in AI: Three Model Releases, Three FuturesThis week’s AI launches were not just new models.

They were three different answers to a deeper question: what is a frontier model for?

The key detail is that Meta is tying the model to distribution it already owns: Meta AI, Instagram, Facebook, Messenger, WhatsApp, and eventually glasses.

AI Lab: Meta AI, KAUST, and CollaboratorsSummary: This paper introduces Neural Computers (NCs), an emerging computing paradigm that unifies computation, memory, and I/O within a single learned model state rather than relying on external execution environments.

Muse SparkMeta Superintelligence Lab released Muse Sp…

4 days, 12 hours назад @ thesequence.substack.com
The Sequence Opinion #840: The Agent-Native Rewrite: Why Every Piece of Software Infrastructure Needs to be Reimagined for AI Agents
The Sequence Opinion #840: The Agent-Native Rewrite: Why Every Piece of Software Infrastructure Needs to be Reimagined for AI Agents The Sequence Opinion #840: The Agent-Native Rewrite: Why Every Piece of Software Infrastructure Needs to be Reimagined for AI Agents

Today, I would like to discuss a thesis that is becoming more and more obvious by the week.

Software infrastructure was built for a world in which intelligence sat outside the machine and needs to be rewritten for AI agents.

A human looked at a screen, read a document, interpreted an exception, decided what mattered, and then clicked a button or wrote some code.

The software itself was mostly a mechanism for executing explicit instructions.

Messaging systems moved well-formed events.

1 week назад @ thesequence.substack.com
The Sequence AI of the Week #839: Gemma 4 and the Compression of Intelligence
The Sequence AI of the Week #839: Gemma 4 and the Compression of Intelligence The Sequence AI of the Week #839: Gemma 4 and the Compression of Intelligence

First, a capability appears at the frontier in a form that is expensive, awkward, and slightly theatrical.

Then, one or two generations later, that same capability gets compressed into something practical.

Gemma 4 feels like one of those moments.

Gemma 4 is less a chatbot and more a compact cognitive runtime.

It feels designed not merely to answer prompts, but to sit inside products, workflows, and devices as an engine for reasoning.

1 week, 1 day назад @ thesequence.substack.com
The Sequence Knowledge #838: Project GENIE: Building Playable Worlds from Pixels
The Sequence Knowledge #838: Project GENIE: Building Playable Worlds from Pixels The Sequence Knowledge #838: Project GENIE: Building Playable Worlds from Pixels

💡 AI Concept of the Day: Project GENIE: Building Playable Worlds from PixelsI’ve spent a lot of time recently thinking about the “Simulator” hypothesis.

But as we scaled from GPT-2 to GPT-4, it became clear that to predict the next token accurately, the model had to build a robust internal representation of the world—a world model.

Project Genie represents a fundamental shift in this direction.

It is not a video generator in the sense of a digital canvas (like Sora); it is a foundation model for agency.

It represents the transition from AI that talks to AI that simulates.

1 week, 2 days назад @ thesequence.substack.com
The Sequence Radar #837: Last Week in AI: From Model Releases to Market Structure
The Sequence Radar #837: Last Week in AI: From Model Releases to Market Structure The Sequence Radar #837: Last Week in AI: From Model Releases to Market Structure

Next Week in The Sequence:We continue our series about world modelsThe AI of the week section will dive into the amazing Gemma4.

Subscribe and don’t miss out:📝 Editorial: Last Week in AI: From Model Releases to Market StructureThis week in AI was not really a product week.

Open models are no longer just the rebel wing of AI.

AI Lab: Ant GroupSummary: To address the scarcity of reliable time series forecasting evaluations, this paper introduces QUITOBENCH, a regime-balanced benchmark derived from a billion-scale corpus of Alipay application traffic.

Trinity Large ThinkingArcee AI open sourced Trinity Large Thinking, its largest open source frontier agent.

1 week, 4 days назад @ thesequence.substack.com
The Sequence Opinion #836: Insurance for AI Agents ? Not as Crazy as You Think
The Sequence Opinion #836: Insurance for AI Agents ? Not as Crazy as You Think The Sequence Opinion #836: Insurance for AI Agents ? Not as Crazy as You Think

By 2026, this capability expanded into “vibe physics,” with models like Claude Opus 4.5 autonomously conducting graduate-level theoretical physics research through 52,000-message agentic loops.

However, for weekend projects and rapid prototyping, this approach demonstrates the unreasonable effectiveness of generative models.

But as these systems transition from conversational assistants to autonomous agents executing multi-step workflows in production, the constraints of software development radically alter.

The output is the product, and when that output causes financial or reputational damage, traditional risk frameworks break down.

This necessitates the emergence of a highly specialized …

1 week, 6 days назад @ thesequence.substack.com
The Sequence Chat #835: Illia Polosukhin on NEAR AI, Authoring the Transformer Paper and Decentralized and Private AI
The Sequence Chat #835: Illia Polosukhin on NEAR AI, Authoring the Transformer Paper and Decentralized and Private AI The Sequence Chat #835: Illia Polosukhin on NEAR AI, Authoring the Transformer Paper and Decentralized and Private AI

BackgroundCan you introduce yourself and tell us about your journey from academy to Google to NEAR AI?

I left Google to found a startup, which became NEAR AI, with Alex Skidanov––we wanted to build apps using natural language.

In 2024, we brought back NEAR AI to ensure that users can control their own assets and data and we’ve now launched several products including NEAR AI Cloud, IronClaw, and confidential compute.

Where does NEAR AI fit into this pipeline?

At NEAR AI we are building confidential computing infrastructure that should work for the full workflow.

2 weeks назад @ thesequence.substack.com
The Sequence AI of the Week #834: Google's AMAZING TurboQuant for Building More Efficient AI
The Sequence AI of the Week #834: Google's AMAZING TurboQuant for Building More Efficient AI The Sequence AI of the Week #834: Google's AMAZING TurboQuant for Building More Efficient AI

In practice, it is a much bigger statement about how efficient AI systems will be built.

That shift matters because vectors are the hidden substrate of modern AI.

Retrieval systems run on inner products.

Vector databases, semantic search engines, recommenders, and increasingly multimodal systems all run on inner products.

If you can compress vectors aggressively while preserving the geometry that those inner products depend on, you are not just saving memory.

2 weeks, 1 day назад @ thesequence.substack.com
The Sequence Knowledge #833: How to Build a World Model
The Sequence Knowledge #833: How to Build a World Model The Sequence Knowledge #833: How to Build a World Model

💡 AI Concept of the Day: How to Build a World ModelWorld models are the workaround.

The last few years have made one thing clear: a world model is not a single model.

It’s a stack of techniques, each one invented to patch a failure mode that kept showing up.

What follows is the practical toolkit—the “how”—behind modern world models.

1) Tokenize reality: compress first, then think

2 weeks, 2 days назад @ thesequence.substack.com
The Sequence Radar #832: Last Week in AI: Compression, Voice, and Why It All Matters
The Sequence Radar #832: Last Week in AI: Compression, Voice, and Why It All Matters The Sequence Radar #832: Last Week in AI: Compression, Voice, and Why It All Matters

Subscribe and don’t miss out:📝 Editorial: Compression, Voice, and Why It All MattersThis week in AI was quite pragmatic.

Just three releases this week that quietly moved the floor on what’s possible, and why that matters more than most people realize.

Voice Week: Two Very Different BetsGoogle shipped Gemini 3.1 Flash Live the same week, and it’s the clearest signal yet that the old voice stack — VAD → STT → LLM → TTS, four sequential hops with four latency budgets — is getting replaced.

AI Lab: University of British Columbia, Vector Institute, University of Edinburgh, New York University, MetaSummary: HYPERAGENTS extends the Darwin Gödel Machine framework to enable self-referential AI agent…

2 weeks, 4 days назад @ thesequence.substack.com
The Sequence Opinion #831: NVIDIA Is Quietly Building the Operating System of AI
The Sequence Opinion #831: NVIDIA Is Quietly Building the Operating System of AI The Sequence Opinion #831: NVIDIA Is Quietly Building the Operating System of AI

A hardware company ships a breakthrough chip.

Then the hardware company realizes the software layer is where the real lock-in lives, and starts building it themselves.

NVIDIA is doing it right now with AI — except they’re doing it at a scale and speed that makes the previous examples look quaint.

He talked about an operating system for AI factories.

Buried under seven new chips, five rack-scale systems, and a software ecosystem so vertically integrated it makes Apple look open.

3 weeks назад @ thesequence.substack.com
The Sequence AI of the Week #830: The Quiet Ambush: Inside the Amazing MiMo-V2-Pro aka Hunter Alpha
The Sequence AI of the Week #830: The Quiet Ambush: Inside the Amazing MiMo-V2-Pro aka Hunter Alpha The Sequence AI of the Week #830: The Quiet Ambush: Inside the Amazing MiMo-V2-Pro aka Hunter Alpha

Every so often, the artificial intelligence community experiences a collective double-take.

We are moving rapidly from the “Chat Era”—where models act as passive oracles answering static trivia and generating boilerplate—to the “Agent Era,” where models are embedded in complex scaffolds, utilizing tools, and driving open-ended, continuous software engineering workflows.

In this high-stakes transition, the most disruptive entry didn’t come with a press tour.

It came disguised as a nameless API endpoint on OpenRouter, executing a strategy of pure, blind telemetry.

The “Hunter Alpha” Anomaly

3 weeks, 1 day назад @ thesequence.substack.com
Synced Review
последний пост None
📓 Cool Blogs
ODS.ai Habr ODS.ai Habr
последний пост 1 week, 4 days назад
Вайбкодинг по Chess’ноку. 1. e4
Вайбкодинг по Chess’ноку. 1. e4 Вайбкодинг по Chess’ноку. 1. e4

Но это не вайбкодинг, а тяжёлая профессиональная ИИ-разработка.

За это время по этому проекту в ChatGPT было создано 112 чатов — это примерно 560 промптов.

И в особо напряжённые периоды приходилось вставать по ночам, чтобы оптимально использовать лимиты, которые делятся на 5-часовые и недельные сессии.

Но это не магия и не кнопка «сделать хорошо».

Именно поэтому будущее не за вайбкодингом, а за теми, кто научится управлять этой скоростью.

1 week, 4 days назад @ habr.com
Почему я стал ИТ-волонтером & Датасет новостей о противоречиях современного общества
Почему я стал ИТ-волонтером & Датасет новостей о противоречиях современного общества Почему я стал ИТ-волонтером & Датасет новостей о противоречиях современного общества

Простой пример с ценами на топливо: бензин дорожает и из-за роста цены на нефть, и из-за ее падения.

Осознание того, что твой труд увеличивает чью-то капитализацию, но не решает реальных проблем общества, видимых в быту и в новостях, подтолкнуло искать еще какую-то деятельность.

Кроме того, благодаря АМБ появился уникальный датасет новостей с противоречиями современного общества на kaggle и github, далее о нем.

Датасет новостей о противоречиях современного обществаАктивисты АМБ и волонтеры дружественных коллективов собрали и разметили датасет новостей, подсвечивающие те самые системные противоречия, о которых я задумывался ранее.

Пример Б В 2023 году в мире голодал каждый 11-й человек, а в …

1 month, 3 weeks назад @ habr.com
[Перевод] Как устроен Codex
[Перевод] Как устроен Codex [Перевод] Как устроен Codex

Подробный разбор того, как команда OpenAI Codex создаёт своего кодового агента, как его используют инженеры и что это может значить для будущего разработки ПО.

Чтобы разобраться, как устроен Codex, как команды внутри OpenAI его используют и как он влияет на инженерные практики у создателей ChatGPT, я поговорил с тремя сотрудниками OpenAI:Тибо Соттио (Thibault Sottiaux) — руководитель Codex.

Оба продукта были запущены весной: Codex CLI анонсировали в апреле 2025 года, а Codex в ChatGPT представили в мае.

В команде Codex эти файлы объясняют агенту, как ориентироваться в кодовой базе, какие команды запускать для тестирования и как следовать стандартам проекта.

Использование Codex в OpenAIПомим…

1 month, 3 weeks назад @ habr.com
Курс Natural Language Processing & LLMs — новый сезон
Курс Natural Language Processing & LLMs — новый сезон Курс Natural Language Processing & LLMs — новый сезон

10 февраля мы в очередной раз запускаем бесплатный онлайн-курс по обработке естественного языка (Natural Language Processing).

Что будем проходить:классическое начало: закон Ципфа, TF-IDF, RNN, CNN, Transformer;основные задачи NLP: классификация текста, тегирование и генерация;специфичные области: агенты и вайб-кодинг;LLM и их применение.

Если вы студент ИТМО, МФТИ или ВШЭ, то курс можно зачесть, как учебный.

Работаю в области NLP более 12 лет, успел поработать в Яндексе и ВКонтакте, защитить кандидатскую диссертацию.

Если есть вопросы, то приходите с ними в ODS Mattermost – там будут все ответы, время семинаров и ссылки.

2 months, 2 weeks назад @ habr.com
SWE-MERA — новый динамический бенчмарк для моделей агентной генерации кода
SWE-MERA — новый динамический бенчмарк для моделей агентной генерации кода SWE-MERA — новый динамический бенчмарк для моделей агентной генерации кода

Однако все задачи в MERA CODE, как впрочем и в SWE-bench и других бенчмарках подобного назначения, следуют классической парадигме, когда у нас есть фиксированный обучающий набор данных и, что более важно, фиксированный проверочный набор.

Но большие языковые модели для кодинга, которые мы и пытаемся оценивать нашим набором, также учатся на GitHub – со времен еще первой модели LLaMa.

Кажется, что 700 задач немного, но это уже очень приличное количество, и что самое важное — это новые задачи.

Current behavior: from sympy import ask, Q, Symbol x = Symbol('x') print(ask(Q.finite(x**-1), Q.real(x))) # Output: True Expected behavior: The function should return None to indicate uncertainty, as x**-…

7 months назад @ habr.com
DRAGON: динамический бенчмарк для оценки RAG-систем на русском языке
DRAGON: динамический бенчмарк для оценки RAG-систем на русском языке DRAGON: динамический бенчмарк для оценки RAG-систем на русском языке

Ответ: Кэисукэ ТибаSPARQL-запрос SimpleSELECT DISTINCT ?s ?r ?o WHERE { { SELECT ?s ?r ?o WHERE { ?s ?r ?o . }

GROUP BY ?s ?r HAVING(count(?o) = 1) } { SELECT ?s ?r ?o WHERE { ?s ?r ?o . }

Ответ: Национальная система платежных карт (НСПК) Центр биометрических технологий (ЦБТ) ЕБСSELECT ?s ?r ?o ?len WHERE { { SELECT ?s ?r (COUNT(?o1) as ?len) (GROUP_CONCAT(DISTINCT(STR(?o1));separator="|") AS ?o) WHERE { ?s ?r ?o1 . }

FILTER(?o != ?o1) } GROUP BY ?o ?o1 ?r ?r1 HAVING(COUNT(?s) = 1) } UNION { SELECT ?s ?r ?o ?r1 ?s1 WHERE { ?s ?r ?o .

FILTER(?o != ?o1) } GROUP BY ?o ?o1 ?r ?r1 HAVING(COUNT(?s) = 1) } UNION { SELECT ?s ?r ?o ?r1 ?s1 WHERE { ?s ?r ?o .

8 months, 3 weeks назад @ habr.com
RKNN Toolkit2: конвертация моделей и симуляция NPU Rockchip
RKNN Toolkit2: конвертация моделей и симуляция NPU Rockchip RKNN Toolkit2: конвертация моделей и симуляция NPU Rockchip

В этой статье я хочу поделиться своим опытом по конвертации нейросети в формат rknn с помощью библиотеки rknn-toolkit2.

Вот как выглядят веса pytorch модели в Netron:веса pytorch модели в NetronВажно!

Конвертация onnx модели в rknnДалее создается объект RKNN , который управляет процессом конвертации и инференса модели на платформе Rockchip.

На этом этапе происходит подготавка модели к конвертации в формат RKNN и последующему запуску на NPU Rockchip.

Создание и экспорт rknn моделиНа этом этапе происходит конвертация ONNX-модели во внутренний формат RKNN, оптимизация графа и подготовка к запуску на NPU Rockchip.

9 months назад @ habr.com
MERA Code: всесторонняя оценка генерации кода в прикладных сценариях
MERA Code: всесторонняя оценка генерации кода в прикладных сценариях MERA Code: всесторонняя оценка генерации кода в прикладных сценариях

🔗MERA Code🔗GitHub с кодом и данными🔗Коллекция на Hugging Face🔗Статья на arxiv🔗Репозиторий проекта на GitVerseЧто такое MERA Code?

Современные кодовые языковые модели и модели общего назначения (ChatGPT, Claude, Qwen, YandexGPT, GigaChat и др.)

Список текущих задач MERA Code и их характеристикКаталог задач MERA Code и их подробное описание представлено на сайте.

В MERA Code промпты строго подобраны под задачу и корректный выбор ответа.

В заключениеMERA Code — это попытка закрыть важный пробел в тестировании LLM: насколько они действительно полезны в реальной, локализованной разработке.

9 months назад @ habr.com
Machine Learning Mastery
последний пост 3 days, 3 hours назад
How to Implement Tool Calling with Gemma 4 and Python
How to Implement Tool Calling with Gemma 4 and Python How to Implement Tool Calling with Gemma 4 and Python

How to implement a local tool calling system using Python and Ollama.

Tool calling, aka function calling, is the foundational architecture shift required to fix this gap.

Tool calling serves as the bridge that can help transform static models into dynamic autonomous agents.

decode ( 'utf-8' ) ) if "results" not in geo_data or not geo_data [ "results" ] : return f "Could not find coordinates for city: {city}."

Gemma 4 certainly appears to be a powerhouse of a small language model reasoning engine with tool calling capabilities.

3 days, 3 hours назад @ machinelearningmastery.com
Structured Outputs vs. Function Calling: Which Should Your Agent Use?
Structured Outputs vs. Function Calling: Which Should Your Agent Use? Structured Outputs vs. Function Calling: Which Should Your Agent Use?

Share Post ShareIn this article, you will learn the architectural differences between structured outputs and function calling in modern language model systems.

Topics we will cover include:How structured outputs and function calling work under the hood.

Function Calling MechanicsFunction calling, on the other hand, relies heavily on instruction tuning.

If structured outputs dictate the shape of the data, function calling dictates the control flow of the application.

The Overlap:It is worth noting that modern function calling actually relies on structured outputs under the hood to ensure the generated arguments match your function signatures.

3 days, 11 hours назад @ machinelearningmastery.com
Beyond Vector Search: Building a Deterministic 3-Tiered Graph-RAG System
Beyond Vector Search: Building a Deterministic 3-Tiered Graph-RAG System Beyond Vector Search: Building a Deterministic 3-Tiered Graph-RAG System

add ( "LeBron James" , "played_for" , "Ottawa Beavers" , "NBA_2023_regular_season" ) facts_qs .

add ( "Ottawa Beavers" , "based_in" , "downtown Ottawa" , "NBA_trivia" ) facts_qs .

) doc2 = ( "Ottawa Beavers" "The Ottawa Beavers star player LeBron James is out for the rest of the 2023 NBA season, " "after his ankle injury has worsened.

) doc2 = ( "Ottawa Beavers" "The Ottawa Beavers star player LeBron James is out for the rest of the 2023 NBA season, " "after his ankle injury has worsened.

“LeBron James” and “Ottawa Beavers”).

6 days, 1 hour назад @ machinelearningmastery.com
The Roadmap to Mastering Agentic AI Design Patterns
The Roadmap to Mastering Agentic AI Design Patterns The Roadmap to Mastering Agentic AI Design Patterns

Share Post ShareIn this article, you will learn how to systematically select and apply agentic AI design patterns to build reliable, scalable agent systems.

Agentic design patterns are reusable approaches for recurring problems in agentic system design.

This article offers a practical roadmap to understanding agentic AI design patterns.

Further learning: AI agent design patterns | Google Cloud and Agentic AI Design Patterns Introduction and walkthrough | Amazon Web Services.

Further reading: Evaluating AI Agents | DeepLearning.AIConclusionAgentic AI design patterns are not a checklist to complete once.

1 week назад @ machinelearningmastery.com
A Hands-On Guide to Testing Agents with RAGAs and G-Eval
A Hands-On Guide to Testing Agents with RAGAs and G-Eval A Hands-On Guide to Testing Agents with RAGAs and G-Eval

This article presents a hands-on guide to understanding how to test large language model and agent-based applications using both RAGAs and frameworks based on G-Eval.

from ragas import evaluate from ragas.metrics import faithfulness # Defining a simple testing dataset for a question-answering scenario data = { "question": ["What is the capital of Japan?

} # Running RAGAs evaluation result = evaluate(data, metrics=[faithfulness]) 1 2 3 4 5 6 7 8 9 10 11 12 from ragas import evaluate from ragas .

environ [ "OPENAI_API_KEY" ] = "YOUR_API_KEY" # Convert list to Hugging Face Dataset (required by RAGAs) dataset = Dataset .

environ [ "OPENAI_API_KEY" ] = openai_api _ key # Convert test cases into …

1 week, 1 day назад @ machinelearningmastery.com
Handling Race Conditions in Multi-Agent Orchestration
Handling Race Conditions in Multi-Agent Orchestration Handling Race Conditions in Multi-Agent Orchestration

Share Post ShareIn this article, you will learn how to identify, understand, and mitigate race conditions in multi-agent orchestration systems.

Why Multi-Agent Pipelines Are Especially VulnerableTraditional concurrent programming has decades of tooling around race conditions: threads, mutexes, semaphores, and atomic operations.

Testing for Race Conditions Before They Test YouThe hard part about race conditions is reproducing them.

acquire ( ) value = counter value = value + 1 counter = value lock .

Closing that window through locks, atomic operations, or conflict detection is the core of handling race conditions in practice.

1 week, 2 days назад @ machinelearningmastery.com
Top 5 Reranking Models to Improve RAG Results
Top 5 Reranking Models to Improve RAG Results Top 5 Reranking Models to Improve RAG Results

Share Post ShareIn this article, you will learn how reranking improves the relevance of results in retrieval-augmented generation (RAG) systems by going beyond what retrievers alone can achieve.

IntroductionIf you have worked with retrieval-augmented generation (RAG) systems, you have probably seen this problem.

Benchmarks like MTEB, BEIR, and MIRACL are commonly used to evaluate these models, and most modern RAG systems rely on rerankers for production-quality results.

There is no single best reranker for every use case.

It shows very strong published reranking results (69.76 on MTEB-R, 75.94 on CMTEB-R, 72.74 on MMTEB-R, 69.97 on MLDR, and 81.20 on MTEB-Code).

1 week, 3 days назад @ machinelearningmastery.com
7 Machine Learning Trends to Watch in 2026
7 Machine Learning Trends to Watch in 2026 7 Machine Learning Trends to Watch in 2026

Here are the 7 trends actually shaping how machine learning is being built and used in 2026.

Trend 4: Machine Learning Moves to the Edge (IoT + Real-Time Intelligence)For years, most machine learning systems lived in the cloud.

The difference between cloud machine learning and edge machine learning comes down to speed and control.

Wrapping UpIn 2026, machine learning is no longer just a set of tools or experimental features.

Together, they represent a new standard: machine learning systems that work, reliably and meaningfully, at the heart of business and daily life.

2 weeks, 1 day назад @ machinelearningmastery.com
Building a ‘Human-in-the-Loop’ Approval Gate for Autonomous Agents
Building a ‘Human-in-the-Loop’ Approval Gate for Autonomous Agents Building a ‘Human-in-the-Loop’ Approval Gate for Autonomous Agents

get ( "approved" ) : print ( "[System]: SENDING EMAIL ->" , state [ "draft" ] ) return { "sent" : True } else : print ( "[System]: Draft was rejected.

Notice below that a thread ID is used so the memory can keep track of the workflow state across executions.

get_state ( config ) print ( f "Next node to execute: {current_state.next}" ) # Should show 'send_message' print ( f "Current Draft: '{current_state.values['draft']}'" ) # Simulating a human reviewing and approving the email draft print ( "[Human]: Reviewing draft... Looks good.

stream ( None , config ) : pass print ( "--- FINAL STATE ---" ) print ( app .

-- - FINAL STATE -- - { 'draft' : 'Hello!

2 weeks, 2 days назад @ machinelearningmastery.com
From Prompt to Prediction: Understanding Prefill, Decode, and the KV Cache in LLMs
From Prompt to Prediction: Understanding Prefill, Decode, and the KV Cache in LLMs From Prompt to Prediction: Understanding Prefill, Decode, and the KV Cache in LLMs

Q = torch .

unsqueeze ( 0 ) , - 1e9 ) # Convert logits to attention weights weights = torch .

zeros_like ( weights ) , weights ) # Compute contexts: (heads, n, n) @ (n, 1) -> (heads, n, 1) contexts = ( weights @ V ) .

float ( ) # [1, 2, 3, 4, 5] print ( "New tokens: " , tokens ) print ( "New Values: " , V )Output:New tokens: ['Today', 'weather', 'is', 'so', 'nice'] New Values: tensor([[10.

zeros_like ( weights_dec ) , weights_dec ) # Context vectors: (4 × 1 × n) @ (n × 1) → (4 × 1 × 1) → squeeze → (4,) contexts_dec = ( weights _ dec @ V ) .

2 weeks, 3 days назад @ machinelearningmastery.com
LlamaAgents Builder: From Prompt to Deployed AI Agent in Minutes
LlamaAgents Builder: From Prompt to Deployed AI Agent in Minutes LlamaAgents Builder: From Prompt to Deployed AI Agent in Minutes

Share Post ShareIn this article, you will learn how to build, deploy, and test a no-code document-processing AI agent with LlamaAgents Builder in LlamaCloud.

IntroductionCreating an AI agent for tasks like analyzing and processing documents autonomously used to require hours of near-endless configuration, code orchestration, and deployment battles.

This article unveils the process of building, deploying, and using an intelligent agent from scratch without writing a single line of code, using LlamaAgents Builder.

Building with LlamaAgents BuilderLlamaAgents Builder is one of the newest features in the LlamaCloud web platform, whose flagship product was originally introduced as LlamaParse.

Th…

2 weeks, 6 days назад @ machinelearningmastery.com
Vector Databases Explained in 3 Levels of Difficulty
Vector Databases Explained in 3 Levels of Difficulty Vector Databases Explained in 3 Levels of Difficulty

How vector databases support nearest neighbor search, metadata filtering, and hybrid retrieval.

How indexing techniques such as HNSW, IVF, and PQ help vector search scale in production.

Vector databases answer a different one: which records are most similar to this?

Comparing a query vector against every stored vector means billions of floating-point operations at production data sizes, and that math makes real-time search impractical.

Production vector databases run ANN algorithms under the hood.

3 weeks назад @ machinelearningmastery.com
5 Practical Techniques to Detect and Mitigate LLM Hallucinations Beyond Prompt Engineering
5 Practical Techniques to Detect and Mitigate LLM Hallucinations Beyond Prompt Engineering 5 Practical Techniques to Detect and Mitigate LLM Hallucinations Beyond Prompt Engineering

search ( query_embedding , k = 1 ) retrieved_doc = documents [ indices [ 0 ] [ 0 ] ] # Step 7: Generate response using retrieved context client = OpenAI ( ) response = client .

create ( model = "gpt-4o-mini" , messages = [ { "role" : "system" , "content" : "Answer using the provided context only."

Instead of relying on a single model response, you introduce additional steps that check, validate, or challenge what was generated before it reaches the user.

Here is a simple implementation:from openai import OpenAI client = OpenAI() def get_answer_with_confidence(question): response = client.chat.completions.create( model="gpt-4o-mini", messages=[ { "role": "system", "content": "Answer the ques…

3 weeks, 1 day назад @ machinelearningmastery.com
Beyond the Vector Store: Building the Full Data Layer for AI Applications
Beyond the Vector Store: Building the Full Data Layer for AI Applications Beyond the Vector Store: Building the Full Data Layer for AI Applications

Share Post ShareIn this article, you will learn why production AI applications need both a vector database for semantic retrieval and a relational database for structured, transactional workloads.

Topics we will cover include:What vector databases do well, and where they fall short in production AI systems.

Production AI applications need two complementary data engines working in lockstep: a vector database for semantic retrieval, and a relational database for everything else.

This is cheaper, faster, and more reliable than relying on vector search alone to return a perfectly scoped result set.

If you are building a production AI application, it would be a mistake to treat these as competin…

3 weeks, 2 days назад @ machinelearningmastery.com
7 Steps to Mastering Memory in Agentic AI Systems
7 Steps to Mastering Memory in Agentic AI Systems 7 Steps to Mastering Memory in Agentic AI Systems

A Guide to Enhancing AI Learning and Recall | MongoDBStep 2: Learning the AI Agent Memory Type TaxonomyCognitive science gives us a vocabulary for the distinct roles memory plays in intelligent systems.

Further reading: Beyond Short-term Memory: The 3 Types of Long-term Memory AI Agents Need and Making Sense of Memory in AI Agents by Leonie MonigattiStep 3: Knowing the Difference Between Retrieval-Augmented Generation and MemoryOne of the most persistent sources of confusion for developers building agentic systems is conflating retrieval-augmented generation (RAG) with agent memory.

Further reading: AI Agent Memory: Build Stateful AI Systems That Remember – Redis and Building Memory-Aware A…

3 weeks, 3 days назад @ machinelearningmastery.com
ML in Production
последний пост None
Sorta Insightful Sorta Insightful
последний пост 1 month назад
Why I Signed The Amicus Brief for Anthropic v Department of War
Why I Signed The Amicus Brief for Anthropic v Department of War Why I Signed The Amicus Brief for Anthropic v Department of War

On Monday, Anthropic filed a lawsuit against the Department of War, and an amicus brief in support of Anthropic was filed on behalf of a number of OpenAI and Google employees.

There’s also an amicus brief filed on behalf of Microsoft.

There’s conflicting reporting, but very broadly, Anthropic signed an agreement with the government to deploy Claude in classified, military contexts.

Anthropic said no, Pete Hegseth declared them a supply chain risk, and Anthropic filed a lawsuit against this.

The amicus brief was broadly aligned with my thoughts on the matter, so I signed.

1 month назад @ alexirpan.com
MIT Mystery Hunt 2026
MIT Mystery Hunt 2026 MIT Mystery Hunt 2026

This has spoilers for MIT Mystery Hunt 2026.

Pre-HuntThe time running up to Hunt was more stressful than usual…very briefly, I typically hunt with teammate.

Just last year, I did GPH 2025, LN Hunt, Teammate Hunt 2025, Microsoft Hunt 2025, and Silph Puzzle Hunt 2025, all of which had significant 3+ hour solve puzzles that would not be out of place in Mystery Hunt.

Not to mention smaller hunts like Advent Hunt, and then I didn’t even do Brown Puzzlehunt or Vertex Hunt or the fall CMU Hunt.

To me, the crux is whether Mystery Hunt is broken, or Mystery Hunt is fine.

2 months, 2 weeks назад @ alexirpan.com
Authentic Imperfection
Authentic Imperfection Authentic Imperfection

* * *I’ve been thinking about the anger surrounding generative AI.

To keep things fair, he took the best human images and best AI images, meaning human art from famous artists, and AI art from prompters skilled at removing obvious tells of image generation.

When people complain about AI slop, I see it as a complaint against the deluge of default style AI images.

We’ve seen this happen in all forms: AI text, AI music, older forms of computer generated content like CGI.

As much as we celebrate imperfection, digital imperfection is a step too far.

5 months назад @ alexirpan.com
Ten Years Later
Ten Years Later Ten Years Later

Every now and then, someone asks me why I blog, and I don’t know really know what to tell them.

That’s another reason I’m not celebrating 10 years with more gusto, I know I’ve been writing less.

Indiana Jones and the Great Circle: I don’t know how they did it, but Indiana Jones and the Great Circle was just fun all the way through.

My one complaint is that the hand-to-hand combat feels like the worst part of the game, so of course they put a bunch of upgrades behind learning parry timings you’ll never use later.

I have not tried Peak, but Another Crab’s Treasure was really good and is worth playing if you’re interested in a Souls-like.

8 months назад @ alexirpan.com
Brony Musicians Seize The Means of Production: My Eyewitness Account to BABSCon 2025
Brony Musicians Seize The Means of Production: My Eyewitness Account to BABSCon 2025 Brony Musicians Seize The Means of Production: My Eyewitness Account to BABSCon 2025

A music concert in the evenings, typically set up as a rave with EDM or rock music made by brony musicians.

She has been involved in organizing pony music concerts for over a decade, for both BABSCon and other pony conventions.

Thank you, BABSCon ChairsThe brony musicians immediately jump into an emergency Discord call with Pinkaboo, to get her side of the story.

Other conventions start tweeting in support of the brony musicians, with no one taking BABSCon’s side.

It’s hard for me to explain why I like MLP fan music, because brony music really isn’t accessible.

8 months, 4 weeks назад @ alexirpan.com
Lil'Log
последний пост None
inFERENCe
последний пост 1 month, 2 weeks назад
The Future of Software
The Future of Software The Future of Software

February 25, 2026The Future of SoftwareThe world of software is undergoing a shift not seen since the advent of compilers in the 1970s.

How will humans tell AI agents what software artefacts we would like to create?

How will humans tell AI agents what software artefacts we would like to create?

This future of software creation, in which our programming languages are abstracted away, raises two very important questions:What will the instruction/specification language look like?

This should be a clear layer of separation between the developer and the pool of AI agents working to maintain software.

1 month, 2 weeks назад @ inference.vc
Deep Learning is Powerful Because It Makes Hard Things Easy - Reflections 10 Years On
Deep Learning is Powerful Because It Makes Hard Things Easy - Reflections 10 Years On Deep Learning is Powerful Because It Makes Hard Things Easy - Reflections 10 Years On

Deep Learning is Powerful Because It Makes Hard Things Easy - Reflections 10 Years OnTen years ago this week, I wrote a provocative and bold post that blew up, made it to top spot on HackerNews.

In hindsight: There is a lot of stuff in deep learning that we don't understand nearly enough.

Sometimes things work for reasons completely unrelated to why we thought they would work.

(Pop some 🍿 in the microwave and read till the end for more)🎯 "Deep learning is powerful exactly because it makes hard things easy"Okay, this was a great insight.

🎯 Generative ModelingIn the post I suggested people learn "something harder" instead of - or in addition to - deep learning.

2 months, 2 weeks назад @ inference.vc
The Spectator
последний пост None
The Unofficial Google Data Science Blog The Unofficial Google Data Science Blog
последний пост None
Off the Convex Path
последний пост None
Jay Alammar
последний пост None
Piekniewski's blog
последний пост None
fast.ai NLP fast.ai NLP
последний пост None
Sebastian Ruder
последний пост None
大トロ 大トロ
последний пост None
🔬 Science
Papers With Code Papers With Code
последний пост 8 months, 3 weeks назад
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy /henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos.

Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point tracking, monocular depth, and camera pose estimation into a high-performing and feedforward 3D point tracker.

It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion, with a fully differentiable and end-to-end architecture, allowing scalable training across a wide range of datasets, including synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage.

By learning geometry and motion jointly from …

8 months, 3 weeks назад @ paperswithcode.com
/antof27/ Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation
/antof27/ Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation /antof27/ Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation

Calisthenics skill classification is the computer vision task of inferring the skill performed by an athlete from images, enabling automatic performance assessment and personalized analytics.

Traditional methods for calisthenics skill recognition are based on pose estimation methods to determine the position of skeletal data from images, which is later fed to a classification algorithm to infer the performed skill.

This work proposes a direct approach to calisthenics skill recognition, which leverages depth estimation and athlete patch retrieval to avoid the computationally expensive human pose estimation module.

Using Depth Anything V2 for depth estimation and YOLOv10 for athlete localizat…

8 months, 3 weeks назад @ paperswithcode.com
/snowflakedb/ Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI
/snowflakedb/ Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI /snowflakedb/ Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI

Inference is now the dominant AI workload, yet existing systems force trade-offs between latency, throughput, and cost.

Arctic Inference, an open-source vLLM plugin from Snowflake AI Research, introduces Shift Parallelism, a dynamic parallelism strategy that adapts to real-world traffic while integrating speculative decoding, SwiftKV compute reduction, and optimized embedding inference.

It achieves up to 3.4 times faster request completion, 1.75 times faster generation, and 1.6M tokens/sec per GPU for embeddings, outperforming both latency- and throughput-optimized deployments.

Already powering Snowflake Cortex AI, Arctic Inference delivers state-of-the-art, cost-effective inference for ent…

8 months, 3 weeks назад @ paperswithcode.com
/NVIDIA/ FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale
/NVIDIA/ FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale /NVIDIA/ FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale

FourCastNet 3 advances global weather modeling by implementing a scalable, geometric machine learning (ML) approach to probabilistic ensemble forecasting.

The approach is designed to respect spherical geometry and to accurately model the spatially correlated probabilistic nature of the problem, resulting in stable spectra and realistic dynamics across multiple scales.

FourCastNet 3 delivers forecasting accuracy that surpasses leading conventional ensemble models and rivals the best diffusion-based methods, while producing forecasts 8 to 60 times faster than these approaches.

In contrast to other ML approaches, FourCastNet 3 demonstrates excellent probabilistic calibration and retains realis…

8 months, 3 weeks назад @ paperswithcode.com
/jingyanw/ Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?
/jingyanw/ Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work? /jingyanw/ Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?

We study A/B experiments that are designed to compare the performance of two recommendation algorithms.

The bias arising from this type of data sharing is known as "symbiosis bias".

In this paper, we highlight that, for decision-making purposes, the sign of the GTE often matters more than its precise magnitude when selecting the better algorithm.

We formalize this insight under a multi-armed bandit framework and theoretically characterize when the sign of the expected GTE estimate under data sharing aligns with or contradicts the sign of the true GTE.

Our analysis identifies the level of exploration versus exploitation as a key determinant of how symbiosis bias impacts algorithm selection.

8 months, 3 weeks назад @ paperswithcode.com
/qqq-yi/ DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression
/qqq-yi/ DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression /qqq-yi/ DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression

Task-agnostic prompt compression leverages the redundancy in natural language to reduce computational overhead and enhance information density within prompts, especially in long-context scenarios.

Existing methods predominantly rely on information entropy as the metric to compress lexical units, aiming to achieve minimal information loss.

However, these approaches overlook two critical aspects: (i) the importance of attention-critical tokens at the algorithmic level, and (ii) shifts in information entropy during the compression process.

Motivated by these challenges, we propose a dynamic attention-aware approach for task-agnostic prompt compression (DAC).

This approach effectively integrate…

8 months, 3 weeks назад @ paperswithcode.com
/lukasellinger/ Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions
/lukasellinger/ Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions /lukasellinger/ Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions

Large Language Models (LLMs) can provide accurate word definitions and explanations for any context.

However, the scope of the definition changes for different target groups, like children or language learners.

We investigate how simplification impacts homonym definition quality across three target groups: Normal, Simple, and ELI5.

Our results show that simplification drastically degrades definition completeness by neglecting polysemy, increasing the risk of misunderstanding.

Fine-tuning Llama 3.1 8B with Direct Preference Optimization substantially improves homonym response quality across all prompt types.

8 months, 3 weeks назад @ paperswithcode.com
/pspdada/ Mitigating Object Hallucinations via Sentence-Level Early Intervention
/pspdada/ Mitigating Object Hallucinations via Sentence-Level Early Intervention /pspdada/ Mitigating Object Hallucinations via Sentence-Level Early Intervention

Multimodal large language models (MLLMs) have revolutionized cross-modal understanding but continue to struggle with hallucinations - fabricated content contradicting visual inputs.

Existing hallucination mitigation methods either incur prohibitive computational costs or introduce distribution mismatches between training data and model outputs.

We identify a critical insight: hallucinations predominantly emerge at the early stages of text generation and propagate through subsequent outputs.

To address this, we propose **SENTINEL** (**S**entence-level **E**arly i**N**tervention **T**hrough **IN**-domain pr**E**ference **L**earning), a framework that eliminates dependency on human annotations…

8 months, 3 weeks назад @ paperswithcode.com
/owos/ FLEXITOKENS: Flexible Tokenization for Evolving Language Models
/owos/ FLEXITOKENS: Flexible Tokenization for Evolving Language Models /owos/ FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Language models (LMs) are challenging to adapt to new data distributions by simple finetuning.

This is due to the rigidity of their subword tokenizers, which typically remain unchanged during adaptation.

This inflexibility often leads to inefficient tokenization, causing overfragmentation of out-of-distribution domains, unseen languages, or scripts.

In this work, we develop byte-level LMs with learnable tokenizers to make tokenization adaptive.

Our models include a submodule that learns to predict boundaries between the input byte sequence, encoding it into variable-length segments.

8 months, 3 weeks назад @ paperswithcode.com
/wojiufukele/ Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion
/wojiufukele/ Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion /wojiufukele/ Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion

To address the challenges posed by cascading reactions caused by component failures in autonomous cargo ships (ACS) and the uncertainties in emergency decision-making, this paper proposes a novel hybrid feature fusion framework for constructing a graph-structured dataset of failure modes.

A hierarchical feature fusion framework is constructed, using Word2Vec encoding to encode subsystem/component features, BERT-KPCA to process failure modes/reasons, and Sentence-BERT to quantify the semantic association between failure impact and emergency decision-making.

The dataset covers 12 systems, 1,262 failure modes, and 6,150 propagation paths.

In the label prediction results, the Shore-based Meteor…

8 months, 3 weeks назад @ paperswithcode.com
/YF-W/ Tri-Learn Graph Fusion Network for Attributed Graph Clustering
/YF-W/ Tri-Learn Graph Fusion Network for Attributed Graph Clustering /YF-W/ Tri-Learn Graph Fusion Network for Attributed Graph Clustering

In recent years, models based on Graph Convolutional Networks (GCN) have made significant strides in the field of graph data analysis.

Although the Graph Transformer architecture has mitigated some of these issues, its performance is still limited when processing heterogeneous graph data.

To address these challenges, this study proposes a novel deep clustering framework that comprising GCN, Autoencoder (AE), and Graph Transformer, termed the Tri-Learn Graph Fusion Network (Tri-GFN).

The tri-learning mechanism allows mutual learning among these modules, while the feature fusion strategy enables the model to capture complex relationships, yielding highly discriminative representations for gra…

8 months, 3 weeks назад @ paperswithcode.com
/mr-ravin/ APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation
/mr-ravin/ APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation /mr-ravin/ APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation

We propose the APTx Neuron, a novel, unified neural computation unit that integrates non-linear activation and linear transformation into a single trainable expression.

The APTx Neuron is derived from the APTx activation function, thereby eliminating the need for separate activation layers and making the architecture both computationally efficient and elegant.

The proposed neuron follows the functional form $y = \sum_{i=1}^{n} ((\alpha_i + \tanh(\beta_i x_i)) \cdot \gamma_i x_i) + \delta$, where all parameters $\alpha_i$, $\beta_i$, $\gamma_i$, and $\delta$ are trainable.

We validate our APTx Neuron-based architecture on the MNIST dataset, achieving up to 96.69\% test accuracy in just 20 ep…

8 months, 3 weeks назад @ paperswithcode.com
/Rec4Fun/ A Reproducibility Study of Product-side Fairness in Bundle Recommendation
/Rec4Fun/ A Reproducibility Study of Product-side Fairness in Bundle Recommendation /Rec4Fun/ A Reproducibility Study of Product-side Fairness in Bundle Recommendation

While this problem has been widely studied in traditional recommendation settings, its implications for bundle recommendation (BR) remain largely unexplored.

Existing fairness frameworks and metrics designed for traditional recommender systems may not directly translate to this multi-layered setting.

In this paper, we conduct a comprehensive reproducibility study of product-side fairness in BR across three real-world datasets using four state-of-the-art BR methods.

We analyze exposure disparities at both the bundle and item levels using multiple fairness metrics, uncovering important patterns.

Overall, our findings offer actionable insights for building fairer bundle recommender systems and…

8 months, 3 weeks назад @ paperswithcode.com
/cbobed/ OntView: What you See is What you Meant
/cbobed/ OntView: What you See is What you Meant /cbobed/ OntView: What you See is What you Meant

However, the lack of tools that provide effective visualization is still a significant challenge.

In this paper, we present OntView, an ontology viewer that is designed to provide users with an intuitive visual representation of ontology concepts and their formal definitions through a user-friendly interface.

Building on the use of a DL reasoner, OntView follows a "What you see is what you meant" paradigm, showing the actual inferred knowledge.

One key aspect for this is its ability to visualize General Concept Inclusions (GCI), a feature absent in existing visualization tools.

OntView has been released with an open-source license for the whole community.

8 months, 3 weeks назад @ paperswithcode.com
/Rec4Fun/ RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction
/Rec4Fun/ RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction /Rec4Fun/ RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction

These approaches fail to capture elaborate relations hidden in real-world bundle structures, resulting in suboptimal bundle representations.

To overcome this limitation, we propose RaMen, a novel method that provides a holistic multi-strategy approach for bundle construction.

RaMen utilizes both intrinsic (characteristics) and extrinsic (collaborative signals) information to model bundle structures through Explicit Strategy-aware Learning (ESL) and Implicit Strategy-aware Learning (ISL).

Integrating diverse strategies enables RaMen to learn more comprehensive and robust bundle representations.

Meanwhile, Multi-strategy Alignment & Discrimination module is employed to facilitate knowledge tr…

8 months, 3 weeks назад @ paperswithcode.com
Papers With Code Papers With Code
последний пост 8 months, 3 weeks назад
/PrimisAI/ Adaptive Multi-Agent Reasoning via Automated Workflow Generation
/PrimisAI/ Adaptive Multi-Agent Reasoning via Automated Workflow Generation /PrimisAI/ Adaptive Multi-Agent Reasoning via Automated Workflow Generation

The rise of Large Reasoning Models (LRMs) promises a significant leap forward in language model capabilities, aiming to tackle increasingly sophisticated tasks with unprecedented efficiency and accuracy.

However, despite their impressive performance, recent studies have highlighted how current reasoning models frequently fail to generalize to novel, unseen problems, often resorting to memorized solutions rather than genuine inferential reasoning.

In this paper, we introduce Nexus Architect, an enhanced iteration of our multi-agent system framework, Nexus, equipped with a novel automated workflow synthesis mechanism.

Given a user's prompt and a small set of representative examples, the Archi…

8 months, 3 weeks назад @ paperswithcode.com
/sharanya02/ Real Time Captioning of Sign Language Gestures in Video Meetings
/sharanya02/ Real Time Captioning of Sign Language Gestures in Video Meetings /sharanya02/ Real Time Captioning of Sign Language Gestures in Video Meetings

One of the most tested ways to establish such a communication is through the use of sign based languages.

However, not many people are aware of the smaller intricacies involved with sign language.

Sign language recognition using computer vision aims at eliminating the communication barrier between deaf-mute and ordinary people so that they can properly communicate with others.

In recent studies, it has been found that people with hearing disabilities prefer to sign over typing during these video calls.

In this paper, we are proposing a browser extension that will automatically translate sign language to subtitles for everyone else in the video call.

8 months, 3 weeks назад @ paperswithcode.com
/alessiopittiglio/ Leveraging Context for Multimodal Fallacy Classification in Political Debates
/alessiopittiglio/ Leveraging Context for Multimodal Fallacy Classification in Political Debates /alessiopittiglio/ Leveraging Context for Multimodal Fallacy Classification in Political Debates

In this paper, we present our submission to the MM-ArgFallacy2025 shared task, which aims to advance research in multimodal argument mining, focusing on logical fallacies in political debates.

Our approach uses pretrained Transformer-based models and proposes several ways to leverage context.

In the fallacy classification subtask, our models achieved macro F1-scores of 0.4444 (text), 0.3559 (audio), and 0.4403 (multimodal).

Our multimodal model showed performance comparable to the text-only model, suggesting potential for improvements.

PDFAbstract

8 months, 3 weeks назад @ paperswithcode.com
/RS2002/ One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms
/RS2002/ One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms /RS2002/ One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms

On-demand ride-sharing platforms face the fundamental challenge of dynamically bundling passengers with diverse origins and destinations and matching them with vehicles in real time, all under significant uncertainty.

However, conventional MARL-based ride-sharing approaches heavily rely on the accurate estimation of Q-values or V-values, which becomes problematic in large-scale, highly uncertain environments.

To address these challenges, we propose two novel alternative methods that bypass value function estimation.

First, we adapt GRPO to ride-sharing, replacing the PPO baseline with the group average reward to eliminate critic estimation errors and reduce training bias.

Second, inspired b…

8 months, 3 weeks назад @ paperswithcode.com
/LiXinran6/ Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation
/LiXinran6/ Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation /LiXinran6/ Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation

Include the markdown at the top of your GitHub README.md file to showcase the performance of the model.

Badges are live and will be dynamically updated with the latest ranking of this paper.

8 months, 3 weeks назад @ paperswithcode.com
/ShimSoonYong/ ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space
/ShimSoonYong/ ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space

We introduce a novel classification framework, ZClassifier, that replaces conventional deterministic logits with diagonal Gaussian-distributed logits. Code: https://github.com/ShimSoonYong/ZClassifier

9 months назад @ paperswithcode.com
/briziorusso/ On Gradual Semantics for Assumption-Based Argumentation
/briziorusso/ On Gradual Semantics for Assumption-Based Argumentation

In this paper, we fill this gap and propose a family of novel gradual semantics for equipping assumptions, which are the core components in ABA frameworks, with dialectical strengths. Code: https://github.com/briziorusso/GradualABA

9 months назад @ paperswithcode.com
/wumingqi/ Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
/wumingqi/ Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

9 months назад @ paperswithcode.com
/IsaacYQH/ WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling
/IsaacYQH/ WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling

Despite rapid progress in end-to-end AI music generation, AI-driven modeling of professional Digital Signal Processing (DSP) workflows remains challenging. Code: https://github.com/IsaacYQH/WildFX

9 months назад @ paperswithcode.com
/summer1278/ Addressing Data Imbalance in Transformer-Based Multi-Label Emotion Detection with Weighted Loss
/summer1278/ Addressing Data Imbalance in Transformer-Based Multi-Label Emotion Detection with Weighted Loss

This paper explores the application of a simple weighted loss function to Transformer-based models for multi-label emotion detection in SemEval-2025 Shared Task 11. Code: https://github.com/summer1278/semeval2025-task11

9 months назад @ paperswithcode.com
/gabrielkmbo/ Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs
/gabrielkmbo/ Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs

We present Step-wise Policy for Rare-tool Knowledge (SPaRK), a novel reinforcement learning framework that teaches large language models to explore diverse tool usage patterns beyond conventional high-temperature sampling. Code: https://github.com/gabrielkmbo/explore-rl

9 months назад @ paperswithcode.com
/Cavendish518/ Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation
/Cavendish518/ Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation

Service robots are increasingly deployed in diverse and dynamic environments, where both physical layouts and social contexts change over time and across locations. Code: https://github.com/Cavendish518/LE-Nav

9 months назад @ paperswithcode.com
/MatteoFasulo/ AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles
/MatteoFasulo/ AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

9 months назад @ paperswithcode.com
/VCA-EPFL/ SystolicAttention: Fusing FlashAttention within a Single Systolic Array
/VCA-EPFL/ SystolicAttention: Fusing FlashAttention within a Single Systolic Array

The frequent data swaps between the systolic array and external vector units result in low systolic array utilization. Code: https://github.com/VCA-EPFL/FSA

9 months назад @ paperswithcode.com
/Buddhi19/ Precision Spatio-Temporal Feature Fusion for Robust Remote Sensing Change Detection
/Buddhi19/ Precision Spatio-Temporal Feature Fusion for Robust Remote Sensing Change Detection

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

9 months назад @ paperswithcode.com
Papers With Code Papers With Code
последний пост 8 months, 3 weeks назад
/fudanvi/ Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning
/fudanvi/ Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

9 months назад @ paperswithcode.com
/benedekrozemberczki/ PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training
/benedekrozemberczki/ PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training

Spatiotemporal graph neural networks (ST-GNNs) are powerful tools for modeling spatial and temporal data dependencies. Code: https://github.com/benedekrozemberczki/pytorch_geometric_temporal

9 months назад @ paperswithcode.com
/chengxuphd/ DCR: Quantifying Data Contamination in LLMs Evaluation
/chengxuphd/ DCR: Quantifying Data Contamination in LLMs Evaluation

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

9 months назад @ paperswithcode.com
/gitter-lab/ Assay2Mol: large language model-based drug design using BioAssay context
/gitter-lab/ Assay2Mol: large language model-based drug design using BioAssay context

Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. Code: https://github.com/gitter-lab/Assay2Mol

9 months назад @ paperswithcode.com
/hayatkhan8660-maker/ DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition
/hayatkhan8660-maker/ DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition

We employ forward Kullback-Leibler (KL) divergence alongside spatio-temporal focal modulation to effectively transfer both local and global context from the Video-FocalNet Base (teacher) to the proposed VFL-Net (student). Code: https://github.com/hayatkhan8660-maker/DVFL-Net

9 months назад @ paperswithcode.com
/JudyJuezhuLong/ Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows
/JudyJuezhuLong/ Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows

Cloudflare is unable to establish an SSL connection to the origin server.

If you're a visitor of this website:Please try again in a few minutes.

If you're the owner of this website:It appears that the SSL configuration used is not compatible with Cloudflare.

This could happen for a several reasons, including no shared cipher suites.

Additional troubleshooting information here.

9 months назад @ paperswithcode.com
/joaojcorreia/ A Fuzzy Approach to Project Success: Measuring What Matters
/joaojcorreia/ A Fuzzy Approach to Project Success: Measuring What Matters

This paper introduces a novel approach to project success evaluation by integrating fuzzy logic into an existing construct. Code: https://github.com/joaojcorreia/FuzzyLogic_ProjectSuccess

9 months назад @ paperswithcode.com
/kunkunlin1221/ InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing
/kunkunlin1221/ InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing

Extensive experiments demonstrate the effectiveness of InstructFLIP by outperforming SOTA models in accuracy and substantially reducing training redundancy across diverse domains in FAS. Code: https://github.com/kunkunlin1221/InstructFLIP

9 months назад @ paperswithcode.com
/Linvyl/ Describe Anything Model for Visual Question Answering on Text-rich Images
/Linvyl/ Describe Anything Model for Visual Question Answering on Text-rich Images

Recent progress has been made in region-aware vision-language modeling, particularly with the emergence of the Describe Anything Model (DAM). Code: https://github.com/Linvyl/DAM-QA

9 months назад @ paperswithcode.com
/abhijeet3922/ Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker
/abhijeet3922/ Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker

We propose multi-step custom implementation utilizing widely adopted hybrid search (metadata & embedding) and state of the art late interaction re-ranker to retrieve best matching pages. Code: https://github.com/abhijeet3922/vision-RAG

9 months назад @ paperswithcode.com
/ziangcao0312/ PhysX: Physical-Grounded 3D Asset Generation
/ziangcao0312/ PhysX: Physical-Grounded 3D Asset Generation

3D modeling is moving from virtual to physical. Code: https://github.com/ziangcao0312/PhysX

9 months назад @ paperswithcode.com
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Code: https://github.com/henry123-boy/SpaTrackerV2

9 months назад @ paperswithcode.com
/cncs-fit/ Emergence of Functionally Differentiated Structures via Mutual Information Optimization in Recurrent Neural Networks
/cncs-fit/ Emergence of Functionally Differentiated Structures via Mutual Information Optimization in Recurrent Neural Networks

Analysis of network performance, correlation patterns, and weight matrices reveals that mutual information minimization yields high task performance alongside clear functional modularity and moderate structural modularity. Code: https://github.com/cncs-fit/mio_rnn

9 months назад @ paperswithcode.com
/coswindywang/ Making Language Model a Hierarchical Classifier and Generator
/coswindywang/ Making Language Model a Hierarchical Classifier and Generator

Language heads of the last layer are copied to different selected intermediate layers, and fine-tuned with different task inputs. Code: https://github.com/coswindywang/HdLM

9 months назад @ paperswithcode.com
/ahmedehabb/ From Roots to Rewards: Dynamic Tree Reasoning with RL
/ahmedehabb/ From Roots to Rewards: Dynamic Tree Reasoning with RL

Modern language models address complex questions through chain-of-thought (CoT) reasoning (Wei et al., 2023) and retrieval augmentation (Lewis et al., 2021), yet struggle with error propagation and knowledge integration. Code: https://github.com/ahmedehabb/From-Roots-to-Rewards-Dynamic-Tree-Reasoning-with-RL

9 months назад @ paperswithcode.com
💼 University and corporation labs
DeepMind DeepMind
последний пост 1 day, 7 hours назад
Gemini 3.1 Flash TTS: the next generation of expressive AI speech
Gemini 3.1 Flash TTS: the next generation of expressive AI speech Gemini 3.1 Flash TTS: the next generation of expressive AI speech

Today, we’re introducing Gemini 3.1 Flash TTS, the latest text-to-speech model that delivers improved controllability, expressivity and quality — empowering developers, enterprises and everyday users to build the next generation of AI-speech applications.

Starting today, 3.1 Flash TTS is rolling out:For developers in preview via the Gemini API and Google AI StudioFor enterprises in preview on Vertex AIFor Workspace users via Google VidsImproved speech quality and controllabilityWe’ve improved the overall speech quality of Gemini 3.1 Flash TTS, making it our most natural and expressive model to date.

On the Artificial Analysis TTS leaderboard, a benchmark that captures thousands of blind hum…

1 day, 7 hours назад @ blog.google
Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning
Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning

Today, we’re introducing Gemini Robotics-ER 1.6, a significant upgrade to our reasoning-first model that enables robots to understand their environments with unprecedented precision.

This model specializes in reasoning capabilities critical for robotics, including visual and spatial understanding, task planning and success detection.

Gemini Robotics-ER 1.6 shows significant improvement over both Gemini Robotics-ER 1.5 and Gemini 3.0 Flash, specifically enhancing spatial and physical reasoning capabilities such as pointing, counting, and success detection.

Starting today, Gemini Robotics-ER 1.6 is available to developers via the Gemini API and Google AI Studio.

To help you get started, we …

3 days, 7 hours назад @ deepmind.google
Gemma 4: Byte for byte, the most capable open models
Gemma 4: Byte for byte, the most capable open models Gemma 4: Byte for byte, the most capable open models

By using these highly optimized models, you can fine-tune Gemma 4 to achieve state-of-the-art performance on your specific tasks.

Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding.

All models natively process video and images, supporting variable resolutions, and excelling at visual tasks like OCR and chart understanding.

Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding.

The edge models feature a 128K context window, while the larger models offer up to 256K, allowing you to pass repositories or long documents in a single prompt.

2 weeks назад @ blog.google
Gemini 3.1 Flash Live: Making audio AI more natural and reliable
Gemini 3.1 Flash Live: Making audio AI more natural and reliable Gemini 3.1 Flash Live: Making audio AI more natural and reliable

Today, we’re advancing Gemini’s real-time dialogue capabilities with Gemini 3.1 Flash Live, our highest-quality audio and voice model yet.

It delivers the speed and natural rhythm needed for the next generation of voice-first AI, offering a more intuitive experience for developers, enterprises and everyday users.

3.1 Flash Live is available across Google products:For developers: Robust reasoning and task executionWe’ve improved 3.1 Flash Live’s overall quality, making it more reliable for developers and enterprises to build voice-first agents that can complete complex tasks at scale.

On ComplexFuncBench Audio, a benchmark that captures multi-step function calling with various constraints, i…

3 weeks назад @ blog.google
Protecting people from harmful manipulation
Protecting people from harmful manipulation Protecting people from harmful manipulation

Why harmful manipulation mattersConsider two scenarios: One AI model gives you facts to make a well-informed healthcare decision that improves your well-being.

Another AI model uses fear to pressure you to make an ill-informed decision that harms your health.

Developing new evaluations for a complex challengeTesting the outcomes of AI harmful manipulationTesting for harmful manipulation is inherently difficult because it involves measuring subtle changes in how people think and act, varying heavily by topic, culture and context.

Our findings show that success in one domain does not predict success in another, validating our targeted approach to testing for harmful manipulation in specific, …

3 weeks, 1 day назад @ deepmind.google
Lyria 3 Pro: Create longer tracks in more
Lyria 3 Pro: Create longer tracks in more Lyria 3 Pro: Create longer tracks in more

Vertex AI: Lyria 3 Pro is now in public preview on Vertex AI for businesses who require on-demand audio at scale.

Lyria 3 Pro is now available alongside Lyria RealTime in AI Studio.

Google Vids: Vids is an AI-powered video creation app that anyone can use.

This is rolling out to Google Workspace customers and Google AI Pro & Ultra subscribers starting this week.

Gemini app: Longer generations with Lyria 3 Pro are now available in the Gemini app, starting with paid subscribers.

3 weeks, 1 day назад @ blog.google
Measuring progress toward AGI: A cognitive framework
Measuring progress toward AGI: A cognitive framework Measuring progress toward AGI: A cognitive framework

Artificial General Intelligence (AGI) has the potential to accelerate scientific discovery and help solve some of humanity’s most pressing problems.

Tracking progress toward AGI will require a wide range of methods and approaches, and we believe cognitive science provides one important piece of the puzzle.

That’s why today, we’re releasing a new paper, “Measuring Progress Toward AGI: A Cognitive Taxonomy,” that presents a scientific foundation for understanding the cognitive capabilities of AI systems.

Deconstructing general intelligenceOur framework draws on decades of research from psychology, neuroscience and cognitive science to develop a cognitive taxonomy.

It identifies 10 key cogniti…

1 month назад @ blog.google
From games to biology and beyond: 10 years of AlphaGo’s impact
From games to biology and beyond: 10 years of AlphaGo’s impact From games to biology and beyond: 10 years of AlphaGo’s impact

Scientific collaboration: We are integrating the search and reasoning principles pioneered with AlphaGo into an AI co-scientist.

We’ve also used AI to better understand the genome, advance fusion energy research, improve weather prediction and more.

Future of intelligenceFor an AI to be truly general, it needs to understand the physical world.

We think the combination of Gemini’s world models, AlphaGo’s search and planning techniques, and specialized AI tool use will prove to be critical for AGI.

True creativity is a key capability that such an AGI system would need to exhibit.

1 month, 1 week назад @ deepmind.google
Gemini 3.1 Flash-Lite: Built for intelligence at scale
Gemini 3.1 Flash-Lite: Built for intelligence at scale Gemini 3.1 Flash-Lite: Built for intelligence at scale

Today, we're introducing Gemini 3.1 Flash-Lite, our fastest and most cost-efficient Gemini 3 series model.

Built for high-volume developer workloads at scale, 3.1 Flash-Lite delivers high quality for its price and model tier.

Starting today, 3.1 Flash-Lite is rolling out in preview to developers via the Gemini API in Google AI Studio and for enterprises via Vertex AI.

Cost-efficiency without compromisePriced at just $0.25/1M input tokens and $1.50/1M output tokens, 3.1 Flash-Lite delivers enhanced performance at a fraction of the cost of larger models.

This low latency is needed for high-frequency workflows, making it an ideal model for developers to build responsive, real-time experiences.

1 month, 2 weeks назад @ blog.google
Nano Banana 2: Combining Pro capabilities with lightning-fast speed
Nano Banana 2: Combining Pro capabilities with lightning-fast speed Nano Banana 2: Combining Pro capabilities with lightning-fast speed

In August of last year, our Gemini Image model, Nano Banana, became a viral sensation, redefining image generation and editing.

Then in November, we released Nano Banana Pro, offering users advanced intelligence and studio-quality creative control.

Today, we’re bringing the best of both worlds to users across Google.

Introducing Nano Banana 2 (Gemini 3.1 Flash Image), our latest state-of-the-art image model.

Now you can get the advanced world knowledge, quality and reasoning you love in Nano Banana Pro, at lightning-fast speed.

1 month, 2 weeks назад @ blog.google
Gemini 3.1 Pro: A smarter model for your most complex tasks
Gemini 3.1 Pro: A smarter model for your most complex tasks Gemini 3.1 Pro: A smarter model for your most complex tasks

Today, we’re releasing the upgraded core intelligence that makes those breakthroughs possible: Gemini 3.1 Pro.

We are shipping 3.1 Pro across our consumer and developer products to bring this progress in intelligence to your everyday applications.

Starting today, 3.1 Pro is rolling out:For developers in preview via the Gemini API in Google AI Studio, Gemini CLI, our agentic development platform Google Antigravity and Android Studioin preview via the Gemini API in Google AI Studio, Gemini CLI, our agentic development platform Google Antigravity and Android Studio For enterprises in Vertex AI and Gemini Enterprisein Vertex AI and Gemini Enterprise For consumers via the Gemini app and Notebook…

1 month, 3 weeks назад @ blog.google
A new way to express yourself: Gemini can now create music
A new way to express yourself: Gemini can now create music A new way to express yourself: Gemini can now create music

New audio verification capabilitiesAll tracks generated in the Gemini app are embedded with SynthID, our imperceptible watermark for identifying Google AI-generated content.

We are also giving you more tools to help identify AI content, broadening our verification capabilities in the Gemini app to include audio, along with image and video.

Simply upload a file and ask if it was generated using Google AI, and Gemini will check for SynthID and use its own reasoning to return a response.

And Google AI Plus, Pro and Ultra subscribers will enjoy higher limits.

Our goal with music generation in the Gemini app is to help you add a fun, custom soundtrack to your daily life.

1 month, 3 weeks назад @ blog.google
Accelerating discovery in India through AI-powered science and education
Accelerating discovery in India through AI-powered science and education Accelerating discovery in India through AI-powered science and education

In the global AI transformation, India is showing exceptional leadership in applying the technology to tackle its own biggest challenges.

But India is going even further, playing a critical international role by convening this week the fourth global AI summit of governments, companies and civil society.

Partnership in India to broaden AI accessOur partnerships are designed to accelerate the pace of progress across India.

Ltd., a K-12 textbook publisher in India, Gemini will be used to transform two million static textbooks into AI-powered interactive journeys across more than 250 titles and 2,000 schools.

Ltd., a K-12 textbook publisher in India, Gemini will be used to transform two million…

1 month, 4 weeks назад @ deepmind.google
Gemini 3 Deep Think: Advancing science, research and engineering
Gemini 3 Deep Think: Advancing science, research and engineering Gemini 3 Deep Think: Advancing science, research and engineering

Our most specialized reasoning mode is now updated to solve modern science, research and engineering challenges.

2 months назад @ deepmind.google
Accelerating Mathematical and Scientific Discovery with Gemini Deep Think
Accelerating Mathematical and Scientific Discovery with Gemini Deep Think Accelerating Mathematical and Scientific Discovery with Gemini Deep Think

Under direction from expert mathematicians and scientists, Gemini Deep Think is solving professional research problems across mathematics, physics, and computer scienceIn the summer of 2025, an advanced version of Gemini Deep Think achieved Gold-medal standard at the International Mathematics Olympiad (IMO) and later, an updated version, obtained similar results at the International Collegiate Programming Contest.

Since then, Gemini Deep Think mode has moved into science, engineering and enterprise workflows to tackle more complex, open-ended challenges.

In the last week, our teams published two papers (1, 2) detailing a cross-disciplinary effort to solve professional research problems usin…

2 months назад @ deepmind.google
Google
последний пост 10 часов назад
Building the agentic future: A spotlight on Google Cloud’s media & entertainment partner ecosystem
Building the agentic future: A spotlight on Google Cloud’s media & entertainment partner ecosystem Building the agentic future: A spotlight on Google Cloud’s media & entertainment partner ecosystem

At Google Cloud, we believe no studio or broadcaster should have to build this future in isolation.

Brahma.ai: Brahma AI, an enterprise AI content platform, is powering high-fidelity digital likenesses across retail, entertainment, and healthcare, making them interactive and intelligence-driven within a secure and governed framework.

Our partners, listed on the Google Cloud Marketplace, are using generative media models to transform massive, static archives into searchable, revenue-generating engines.

By combining real-time observability with AI-driven insights, media teams can proactively optimize engagement and monetization.

Visit the ecosystem in actionThe strength of our ecosystem is it…

10 часов назад @ cloud.google.com
Claude Opus 4.7 on Vertex AI
Claude Opus 4.7 on Vertex AI Claude Opus 4.7 on Vertex AI

Today, we’re announcing the general availability of Claude Opus 4.7 on Vertex AI.

As an upgrade from Claude Opus 4.6, Opus 4.7 works better through ambiguity, is more thorough in its problem solving, and follows instructions more precisely.

How to access: Opus 4.7 is generally available on Vertex AI.

Scale agents: Deploy and govern your Claude agents on serverless infrastructure with Vertex AI Agent Engine .

Govern the full stack with confidence: Protect your Claude on Vertex AI workloads with Google Cloud’s secure-by-design controls.

1 day, 4 hours назад @ cloud.google.com
Multi-region endpoints are available for Claude on Vertex AI
Multi-region endpoints are available for Claude on Vertex AI Multi-region endpoints are available for Claude on Vertex AI

Today, we’re announcing U.S. and EU multi-region endpoints for Claude on Vertex AI are available in public preview.

Please visit our Vertex AI documentation for detailed instructions and to start building today.

When using Claude on Vertex AI, you previously had two choices:Regional endpoints (e.g., us-central1): Keep data and processing within a single specific location.

Pricing: Multi-region endpoint requests follow the standard Claude on Vertex AI pay-as-you-go pricing model.

How to get startedIntegrating multi-region endpoints for Claude models requires a simple change to your configuration.

1 day, 7 hours назад @ cloud.google.com
Guide to prompting Gemini 3.1 Flash TTS (text-to-speech)
Guide to prompting Gemini 3.1 Flash TTS (text-to-speech) Guide to prompting Gemini 3.1 Flash TTS (text-to-speech)

That's an error.

The requested URL /blog/products/ai-machine-learning/gemini-3-1-flash-tts-on-google-cloud/ was not found on this server.

That's all we know.

1 day, 8 hours назад @ cloud.google.com
How to find the sweet spot between cost and performance
How to find the sweet spot between cost and performance How to find the sweet spot between cost and performance

At Google Cloud, we often see customers asking themselves: "How can we manage our generative AI costs effectively without sacrificing the performance and availability our applications demand?"

This guide will walk you through Google Cloud's flexible gen AI infrastructure options, showing you how to find that sweet spot on the efficient frontier between cost and performance.

We'll start with the foundational pay-as-you-go (PayGo) models and then explore how to layer on more specialized options to build a robust and cost-effective gen AI strategy.

Understanding your foundation: Pay-as-You-Go (PayGo) optionsFor many workloads, Google Cloud's standard PayGo offerings provide a powerful and flex…

3 days, 7 hours назад @ cloud.google.com
Near-100% Accurate Data for your Agent with Comprehensive Context Engineering
Near-100% Accurate Data for your Agent with Comprehensive Context Engineering Near-100% Accurate Data for your Agent with Comprehensive Context Engineering

The QueryData LLM has a greater chance to translate the natural language question into the correct query using these instructions.

You can think of schema ontology as a set of “cues” or “hints” – meant to steer the LLM into picking the right tables and columns and synthesizing them correctly into a database query.

Both proximity and school ranking should affect the overall ranking.

Value searches solve the hard problem of correctly associating data values in the database with the “entities” that the question talks about.

Here, value searches will enable QueryData to respond to the agent that this is likely a misspelling of ‘“westwood, which appears as both a real estate brokerage and a city…

6 days, 7 hours назад @ cloud.google.com
QueryData helps agents turn natural language into queries for AlloyDB, Cloud SQL and Spanner
QueryData helps agents turn natural language into queries for AlloyDB, Cloud SQL and Spanner QueryData helps agents turn natural language into queries for AlloyDB, Cloud SQL and Spanner

It is a tool for translating natural language into database queries with near-100% accuracy.

Developers are already seeing the benefits from QueryData, including Hughes Network Systems, a leader in telecommunications, that deployed QueryData in production.

We are excited about the future of agentic systems!"

- Amarender Singh Sardar, Director of AI, Hughes Network SystemsThe opportunity for agentic systems: from intent to actionAgentic systems are evolving from human-advisory roles into active decision-makers.

With requests expressed in natural language, bridging the gap between conversational input and database records is essential.

6 days, 7 hours назад @ cloud.google.com
Behind the Analysis with Google Cloud and Team USA: Architecting AI infrastructure for U.S. Winter Olympians
Behind the Analysis with Google Cloud and Team USA: Architecting AI infrastructure for U.S. Winter Olympians Behind the Analysis with Google Cloud and Team USA: Architecting AI infrastructure for U.S. Winter Olympians

This requires tracking and analyzing a full three-dimensional model of the athlete, frame by frame, in real-time.

In collaboration with Google DeepMind, we built a system to provide this analysis to U.S. Olympians ahead of the Olympic Winter Games.

Our AI pose estimation model transforms a single 2D video into a complete 3D biomechanical analysis, plotting 63 joints in a localized coordinate system.

Snowboarders and skiers move at extreme velocities.

Standard pose estimation models lose tracking the moment this occlusion occurs.

6 days, 7 hours назад @ cloud.google.com
How SAP Concur automates expense reporting with agentic AI
How SAP Concur automates expense reporting with agentic AI How SAP Concur automates expense reporting with agentic AI

For decades, expense automation relied on a simple premise: If the machine can read the text, it can do the work.

While much of the industry was still focused on the design of conversational interfaces, SAP Concur foresaw a bigger shift.

Speed, scale, and ingenuityStandard expense automation is great at seeing what is on receipts but can’t see what is not there.

SAP Concur saw the emergence of AI agents as an opportunity to create systems that could reason, decide, and act.

SAP Concur wanted to create an AI agent that could think like a human assistant: "I see 'Main St.

6 days, 7 hours назад @ cloud.google.com
How to run evals for Conversational Analytics agents
How to run evals for Conversational Analytics agents How to run evals for Conversational Analytics agents

More organizations are using natural language to query data instead of writing manual SQL.

Prism is an open-source evaluation tool for Conversational Analytics in the BigQuery UI and API, as well as the Looker API.

It replaces unpredictable testing methods by letting you create custom sets of questions and answers to reliably measure your agent’s performance.

This means the exact experts building the agents can easily validate their success and catch performance regressions as they iterate.

Understanding the Prism frameworkTo implement Prism effectively, it is important to understand the core architecture governing the evaluation process.

6 days, 7 hours назад @ cloud.google.com
Raising the security baseline: Essential AI and cloud security now on by default
Raising the security baseline: Essential AI and cloud security now on by default Raising the security baseline: Essential AI and cloud security now on by default

At Google Cloud, we believe that modern cloud defense should have AI protection built in and accessible by default, delivering native guardrails and controls that are essential to ensuring that security strengthens your AI rollouts.

To support the next generation of AI innovators, we are making essential AI security and cloud security on by default with a newly enhanced Security Command Center (SCC) Standard tier.

Upgraded security posture checks : The free security baseline for the Standard tier now offers more than 44 misconfiguration checks based on the Google Cloud Security Essentials (GCSE) compliance framework, 21 more than the previous Standard tier version.

Foundational security at …

6 days, 7 hours назад @ cloud.google.com
Guardrails at the gateway: Securing AI inference on GKE with Model Armor
Guardrails at the gateway: Securing AI inference on GKE with Model Armor Guardrails at the gateway: Securing AI inference on GKE with Model Armor

However, as these models handle increasingly sensitive data, they introduce unique AI-driven attack vectors — from prompt injection to sensitive data leakage — that traditional firewalls aren't designed to catch.

We also recommend developers use Model Armor, a guardrail service that integrates directly into the network data path with GKE Service Extensions, to implement a hardened, high-performance inference stack on GKE.

The solution: Decoupled security with Model ArmorModel Armor addresses these gaps by acting as an intelligent gatekeeper that inspects traffic before it reaches your model and after the model responds.

DLP integration: It scans outputs for sensitive data (PII) using Google…

1 week назад @ cloud.google.com
How Estée Lauder Companies uses Cloud Run worker pools for its pull-based agentic workloads
How Estée Lauder Companies uses Cloud Run worker pools for its pull-based agentic workloads How Estée Lauder Companies uses Cloud Run worker pools for its pull-based agentic workloads

You can easily deploy request-driven web applications using Cloud Run services, or execute run-to-completion batch processing with Cloud Run jobs.

Estée Lauder Companies got just that with Cloud Run worker pools, which transform Cloud Run from a platform for web workloads and background tasks, to a platform for pull-based workloads.

Cloud Run worker pools are now generally available.

Estee Lauder Companies’ Rostrum platform is a polymorphic chat service for LLM-powered applications that originally ran as a standalone Cloud Run service.

In just a few weeks, Estee Lauder Companies migrated to a producer-consumer model using Cloud Run worker pools.

1 week назад @ cloud.google.com
New GKE Cloud Storage FUSE Profiles take the guesswork out of configuring AI storage
New GKE Cloud Storage FUSE Profiles take the guesswork out of configuring AI storage New GKE Cloud Storage FUSE Profiles take the guesswork out of configuring AI storage

The trouble with optimizing Cloud Storage FUSEOptimizing Cloud Storage FUSE for high-performance workloads is a multi-dimensional problem.

And as AI/ML has evolved, Cloud Storage FUSE’s capabilities have also increased, with new mount options available to accelerate your workloads.

Introducing Cloud Storage FUSE Profiles for GKEGKE Cloud Storage FUSE Profiles simplify this complexity with pre-defined, dynamically managed StorageClasses tailored for specific AI/ML patterns.

They take the base best practices from Cloud Storage FUSE and add a GKE-specific intelligence layer.

Using GKE Cloud Storage FUSE Profiles delivers several benefits:

1 week, 1 day назад @ cloud.google.com
Claude Mythos Preview: Available in private preview on Vertex AI
Claude Mythos Preview: Available in private preview on Vertex AI Claude Mythos Preview: Available in private preview on Vertex AI

Claude Mythos Preview, Anthropic’s newest and most powerful model, is now available in Private Preview to a select group of Google Cloud customers, as part of Project Glasswing.

The availability of Claude Mythos Preview on Vertex AI underscores our commitment to offer our customers access to models from frontier AI labs.

Combined with the enterprise-grade power of Vertex AI to build, scale, and govern AI applications and agents, this new general-purpose model offers high performance capabilities across a variety of use cases, with new focus on reducing cybersecurity risk.

For more information about this release, visit Anthropic’s blog.

Build with other Claude models — including Claude Opus …

1 week, 2 days назад @ cloud.google.com
OpenAI
последний пост None
Microsoft Microsoft
последний пост 1 week назад
New Future of Work: AI is driving rapid change, uneven benefits
New Future of Work: AI is driving rapid change, uneven benefits New Future of Work: AI is driving rapid change, uneven benefits

Publication New Future of Work Report 2025The New Future of Work report brings together research from inside and outside of Microsoft to understand what is happening as AI enters workplaces.

But usage and confidence vary widely across sectors, and men report using AI at work more often than women.

AI systems are increasingly playing a role in decision-making, creativity, and communication, with AI systems being positioned as a “collaborator.” This raises questions about how to support “collaboration” between people and AI, what we can learn from how people interact with each other, and where the capabilities of AI systems raise different opportunities and create different requirements.

Usin…

1 week назад @ microsoft.com
Ideas: Steering AI toward the work future we want
Ideas: Steering AI toward the work future we want Ideas: Steering AI toward the work future we want

JANSSEN: Yeah, yeah, exactly.

TEEVAN: Yeah, yeah, yeah.

I’m curious what you have found particularly surprising about how people and organizations are leveraging AI right now.

And so I do like to picture a future of work where humans are flourishing with AI and where humans still get to do meaningful work.

And I’m very curious about how we can take advantage of AI and do more without running ourselves into the ground because we’re not AI, right?

1 week назад @ microsoft.com
ADeLe: Predicting and explaining AI performance across tasks
ADeLe: Predicting and explaining AI performance across tasks ADeLe: Predicting and explaining AI performance across tasks

By linking outcomes to task demands, ADeLe explains differences in performance, showing how it changes as task complexity increases.

AI benchmarks report how large language models (LLMs) perform on specific tasks but provide little insight into their underlying capabilities that drive their performance.

Top: (1) Model performance on the ADeLe benchmark and (2) the resulting ability profiles, showing each model’s strengths and limitations across core abilities.

Evaluating ADeLeUsing ADeLe, the team evaluated a range of AI benchmarks and model behaviors to understand what current evaluations capture and what they miss.

This makes it possible to both explain and anticipate potential failures b…

2 weeks, 1 day назад @ microsoft.com
AsgardBench: A benchmark for visually grounded interactive planning
AsgardBench: A benchmark for visually grounded interactive planning AsgardBench: A benchmark for visually grounded interactive planning

At a glance To successfully complete tasks, embodied AI agents must ground and update their plans based on visual feedback.

Spanning 108 controlled task instances across 12 task types, the benchmark requires agents to adapt their plans based on what they observe.

Evaluating AsgardBenchWe tested several leading vision-capable models on AsgardBench and observed that high-performing models require visual grounding to consistently succeed.

Across the models, visual input substantially improved performance: most models more than doubled success rates when given images versus text-only descriptions of the scene.

AsgardBench is open source and available on GitHub (opens in new tab), providing a fo…

3 weeks назад @ microsoft.com
GroundedPlanBench: Spatially grounded long-horizon task planning for robot manipulation
GroundedPlanBench: Spatially grounded long-horizon task planning for robot manipulation GroundedPlanBench: Spatially grounded long-horizon task planning for robot manipulation

Video-to-Spatially Grounded Planning (V2GP) is a framework that converts robot demonstration videos into spatially grounded training data, enabling models to learn planning and grounding jointly.

Grounded planning improves both task success and action accuracy, outperforming decoupled approaches in benchmark and real-world evaluations.

We also built Video-to-Spatially Grounded Planning (V2GP), a framework that converts robot demonstration videos into training data to help VLMs learn this capability.

Decoupled vs. grounded planning, illustrating how ambiguous language causes actions to be grounded to the wrong objects.

In contrast, our approach, grounded planning, performs planning and groun…

3 weeks назад @ microsoft.com
Will machines ever be intelligent?
Will machines ever be intelligent? Will machines ever be intelligent?

And the question we’re going to discuss is, are machines intelligent?

No, no, that’s right, that’s right.

I mean, in some sense, you could potentially have a super intelligent system, right, that’s far more intelligent than anything else on the planet.

BURGER: Right, right.

At the same time, I think, you know, transformers are not intelligent in the way that a three-year-old is, right?

3 weeks, 3 days назад @ microsoft.com
Systematic debugging for AI agents: Introducing the AgentRx framework
Systematic debugging for AI agents: Introducing the AgentRx framework Systematic debugging for AI agents: Introducing the AgentRx framework

Debugging AI agent failures is hard because trajectories are long, stochastic, and often multi-agent, so the true root cause gets buried.

As AI agents transition from simple chatbots to autonomous systems capable of managing cloud incidents, navigating complex web interfaces, and executing multi-step API workflows, a new challenge has emerged: transparency.

The challenge: Why AI agents are hard to debugModern AI agents are often:Long-horizon: They perform dozens of actions over extended periods.

LLM-based judging: Finally, an LLM judge uses the validation log and a grounded failure taxonomy to identify the Critical Failure Step—the first unrecoverable error.

Together, we can build AI agents…

1 month назад @ microsoft.com
From raw interaction to reusable knowledge: Rethinking memory for AI agents
From raw interaction to reusable knowledge: Rethinking memory for AI agents From raw interaction to reusable knowledge: Rethinking memory for AI agents

It seems counterintuitive: giving AI agents more memory can make them less effective.

In our recent paper “PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents,” we introduce a plug-and-play memory system that transforms raw agent interactions into reusable knowledge.

Raw interactions are standardized and transformed into propositional knowledge (facts) and prescriptive knowledge (reusable skills).

One memory, any taskMost AI memory systems are built for one job.

Toward reusable memory for agentsAs AI agents take on longer and more complex tasks, its memory needs to evolve from storing past interactions to actively supplying reusable knowledge.

1 month, 1 week назад @ microsoft.com
Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model
Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model

At a glance Phi-4-reasoning-vision-15B is a compact and smart open‑weight multimodal reasoning model that balances reasoning power, efficiency, and training data needs.

is a compact and smart open‑weight multimodal reasoning model that balances reasoning power, efficiency, and training data needs.

This leads to several possible training pipelines:Non-reasoning LLM → reasoning multimodal training: Reasoning and multimodal capabilities are trained together.

Non-reasoning LLM → non-reasoning multimodal → reasoning multimodal training: Multimodal capabilities are learned first, then reasoning is added.

Reasoning LLM → reasoning multimodal training: A reasoning base is used, but all multimodal d…

1 month, 1 week назад @ microsoft.com
Trailer: The Shape of Things to Come
Trailer: The Shape of Things to Come Trailer: The Shape of Things to Come

This is The Shape of Things to Come.

I manage Microsoft Research’s worldwide labs, and I’m excited to introduce this new Microsoft Research Podcast series.

I called the podcast The Shape of Things to Come because as researchers, the problems that we choose to solve and the technologies that we develop do change the shape of the future.

It’s very hard to say whether we’re in an inflection point because I see the advancement of technology accelerating.

But I don’t know what the inflection point is because all I’ve seen is a curve going up.

1 month, 2 weeks назад @ microsoft.com
CORPGEN advances AI agents for real work
CORPGEN advances AI agents for real work CORPGEN advances AI agents for real work

To determine what a benchmark would need to test, we ran MHTEs at scale on some of today’s leading AI agents, exposing four weaknesses.

CORPGEN’s architectureCORPGEN introduces digital employees: LLM-powered AI agents with persistent identities, role-specific expertise, and realistic work schedules.

At 46 tasks, CORPGEN completed 15.2% of tasks, compared with 4.3% for the baselines, roughly 3.5 times more.

CORPGEN also opens a new lens on how AI agents collaborate.

AcknowledgmentsThis work is a result of a collaboration between the Office of the CTO at Microsoft and the Microsoft AI Development Accelerator Program (MAIDAP).

1 month, 2 weeks назад @ microsoft.com
Media Authenticity Methods in Practice: Capabilities, Limitations, and Directions
Media Authenticity Methods in Practice: Capabilities, Limitations, and Directions Media Authenticity Methods in Practice: Capabilities, Limitations, and Directions

We refer to technologies aimed at helping viewers verify the source and history—that is, the provenance—of digital content as media integrity and authentication (MIA) methods.

Today, we are publishing our findings in the Media Integrity & Authentication: Status, Directions & Futures report.

The report distills lessons learned and outlines practical directions for strengthening media integrity in the years ahead.

Watch on-demand Opens in a new tabFindings and directions forwardOur research recognizes that different media integrity and authenticity methods serve differing purposes and offer distinct levels of protection.

Important directions include in-stream tools that display provenance inf…

1 month, 3 weeks назад @ microsoft.com
Project Silica’s advances in glass storage technology
Project Silica’s advances in glass storage technology Project Silica’s advances in glass storage technology

At a glance Microsoft Research publishes breakthrough in Nature on glass-based data storage that could preserve information for 10,000 years.

Glass is a permanent data storage material that is resistant to water, heat, and dust.

Phase voxels, a new storage method: We invented a new type of data storage in glass called phase voxels, in which the phase change of the glass is modified instead of its polarization, showing that only a single pulse is necessary to make a phase voxel.

We demonstrated that these phase voxels can also be formed in borosilicate glass and devised a technique to read the phase information from phase voxels encoded in this material.

A research-grade Writer used to set t…

1 month, 3 weeks назад @ microsoft.com
Rethinking imitation learning with Predictive Inverse Dynamics Models
Rethinking imitation learning with Predictive Inverse Dynamics Models Rethinking imitation learning with Predictive Inverse Dynamics Models

Predictive Inverse Dynamics Models (PIDMs) predict plausible future states, clarifying the direction of behavior during imitation learning.

Predictive Inverse Dynamics Models (PIDMs) offer a different take on imitation learning by changing how agents interpret human behavior.

By grounding the selection process in a plausible future, PIDMs provide a clearer basis for choosing an action during inference.

They then use an inverse dynamics model to predict the action required to move from the current state towards that future state.

A player (left) and a PIDM agent (right) side by side playing the game Bleeding Edge.

2 months, 1 week назад @ microsoft.com
Paza: Introducing automatic speech recognition benchmarks and models for low resource languages
Paza: Introducing automatic speech recognition benchmarks and models for low resource languages Paza: Introducing automatic speech recognition benchmarks and models for low resource languages

At a glance Microsoft Research releases PazaBench and Paza automatic speech recognition models , advancing speech technology for low resource languages.

First-of-its-kind ASR leaderboard, starting with African languages: Pazabench is the first automatic speech recognition (ASR) leaderboard for low-resource languages.

It launches with initial coverage for 39 African languages and benchmarks 52 state‑of‑the‑art ASR and language models, including newly released Paza ASR models for six Kenyan languages.

Paza ASR Models: Built with and for Kenyan languagesThe Paza ASR models consist of three fine-tuned ASR models built on top of state‑of‑the‑art model architectures.

Here is how Paza models compa…

2 months, 1 week назад @ microsoft.com
MIT AI MIT AI
последний пост 2 days, 10 hours назад
Q&A: MIT SHASS and the future of education in the age of AI
Q&A: MIT SHASS and the future of education in the age of AI Q&A: MIT SHASS and the future of education in the age of AI

A: Artificial intelligence isn’t just changing the way students learn — it’s transforming every aspect of society.

We need students who have a moral compass, and who understand how the world works, in all of its political, economic, and human complexity.

We need students who know how to think critically, and who have excellent communication and leadership skills.

Q: What role do the humanities, arts, and social sciences play in preparing MIT students for that future?

A: They’re essential, and are rightly a core part of an MIT education: MIT has long required its undergraduates take at least eight courses in HASS disciplines to graduate.

2 days, 10 hours назад @ news.mit.edu
Human-machine teaming dives underwater
Human-machine teaming dives underwater Human-machine teaming dives underwater

"Divers and AUVs generally don't team at all underwater," says principal investigator Madeline Miller.

Even ROVs are challenging to work with underwater in very skilled manipulation tasks because the manipulators themselves aren't agile enough."

But what if an autonomous underwater vehicle (AUV) could map the line and pinpoint the location of the fault for a diver to fix?

To combine these strengths, Miller and her team are developing hardware and algorithms for underwater navigation and perception — two key capabilities for effective human-robot teaming.

The historical lack of large, labeled sonar image datasets has hindered training of underwater perception algorithms.

2 days, 10 hours назад @ news.mit.edu
A philosophy of work
A philosophy of work A philosophy of work

Michal Masny, the NC Ethics of Technology Postdoctoral Fellow in the MIT Department of Philosophy, investigates the role work plays in our lives and its impact on our well-being.

“Consider a future in which we shorten the work week, or one in which we eliminate work altogether,” Masny says.

“There can be optimal combinations of work and leisure time.”Masny is completing his two-year term in the NC Ethics of Technology Fellowship at the end of the spring semester.

In addition to advancing his research, Masny has been working to foster dialogue and educate students on issues at the intersection of philosophy and computing.

He works mainly in value theory, ethics of technology, and social and …

1 week назад @ news.mit.edu
New technique makes AI models leaner and faster while they’re still learning
New technique makes AI models leaner and faster while they’re still learning New technique makes AI models leaner and faster while they’re still learning

The technique, called CompreSSM, targets a family of AI architectures known as state-space models, which power applications ranging from language processing to audio generation and robotics.

The key insight is that the relative importance of different components within these models stabilizes surprisingly early during training.

On image classification benchmarks, compressed models maintained nearly the same accuracy as their full-sized counterparts while training up to 1.5 times faster.

Compared to Hankel nuclear norm regularization, a recently proposed spectral technique for encouraging compact state-space models, CompreSSM was more than 40 times faster, while also achieving higher accurac…

1 week назад @ news.mit.edu
Sixteen new START.nano companies are developing hard-tech solutions with the support of MIT.nano
Sixteen new START.nano companies are developing hard-tech solutions with the support of MIT.nano Sixteen new START.nano companies are developing hard-tech solutions with the support of MIT.nano

MIT.nano has announced that 16 startups became active participants in its START.nano program in 2025, more than doubling the number of new companies from the previous year.

The newly engaged startups are developing solutions for some of the world’s greatest challenges in health, climate, energy, semiconductors, novel materials, and quantum computing.

“The unique resources of MIT.nano enable not just the foundational research of academia, but the translation of that research into commercial innovations through startups,” says START.nano Program Manager Joyce Wu SM ’00, PhD ’07.

VioNano Innovations is developing specialty material solutions that reduce variability and improve precision in sem…

1 week, 2 days назад @ news.mit.edu
Helping data centers deliver higher performance with less hardware
Helping data centers deliver higher performance with less hardware Helping data centers deliver higher performance with less hardware

But even with pooling, significant device capacity remains underutilized due to performance variability across the devices.

MIT researchers have now developed a system that boosts the performance of storage devices by handling three major sources of variability simultaneously.

To utilize this untapped SSD performance, the researchers developed Sandook, a software-based system that tackles three major forms of performance-hampering variability simultaneously.

Plan globally, react locallyTo handle all three sources of variability, Sandook utilizes a two-tier structure.

The system enabled SSDs to achieve 95 percent of their theoretical maximum performance, without the need for specialized hard…

1 week, 2 days назад @ news.mit.edu
Working to advance the nuclear renaissance
Working to advance the nuclear renaissance Working to advance the nuclear renaissance

Today, there are 94 nuclear reactors operating in the United States, more than in any other country in the world, and these units collectively provide nearly 20 percent of the nation’s electricity.

He became a nuclear engineer for this very reason — to make sure that nuclear technology is up to the task of delivering in this time of considerable need.

That area of study, called multiphysics modeling, involves looking at various physical processes going on in the core of a nuclear reactor to see how they interact — an alternative to studying these processes one at a time.

One key process, neutronics, concerns how neutrons buzz around in the reactor core causing nuclear fission, which is what…

1 week, 6 days назад @ news.mit.edu
Evaluating the ethics of autonomous systems
Evaluating the ethics of autonomous systems Evaluating the ethics of autonomous systems

These test cases can show situations where autonomous systems align well with human values, as well as scenarios that unexpectedly fall short of ethical criteria.

“We can insert a lot of rules and guardrails into AI systems, but those safeguards can only prevent the things we can imagine happening.

Most testing frameworks rely on pre-collected data, but labeled data on subjective ethical criteria are often hard to come by.

In addition, because ethical values and AI systems are both constantly evolving, static evaluation methods based on written codes or regulatory documents require frequent updates.

To test SEED-SET, the researchers evaluated realistic autonomous systems, like an AI-driven …

2 weeks назад @ news.mit.edu
Preview tool helps makers visualize 3D-printed objects
Preview tool helps makers visualize 3D-printed objects Preview tool helps makers visualize 3D-printed objects

Designers, makers, and others often use 3D printing to rapidly prototype a range of functional objects, from movie props to medical devices.

To help users envision how a fabricated object will look, researchers from MIT and elsewhere developed an easy-to-use preview tool that puts appearance first.

Users upload a screenshot of the object from their 3D-printing software, along with a single image of the print material.

In addition, the VisiPrint preview process took about a minute on average, which was more than twice as fast as any competing method.

They also want to add features that allow users to optimize parts of the printing process beyond color of the material.

2 weeks, 1 day назад @ news.mit.edu
MIT researchers use AI to uncover atomic defects in materials
MIT researchers use AI to uncover atomic defects in materials MIT researchers use AI to uncover atomic defects in materials

But in materials science, defects can be intentionally tuned to give materials useful new properties.

Without knowing what defects are in their materials, engineers risk making products that perform poorly or have unintended properties.

Now, MIT researchers have built an AI model capable of classifying and quantifying certain defects using data from a noninvasive neutron-scattering technique.

“For conventional techniques without machine learning, detecting six different defects is unthinkable.

Detecting defectsManufacturers have gotten good at tuning defects in their materials, but measuring precise quantities of defects in finished products is still largely a guessing game.

2 weeks, 3 days назад @ news.mit.edu
Seeing sounds
Seeing sounds Seeing sounds

Now he’s one of five master’s students in the Music Technology and Computation Graduate Program’s inaugural cohort.

When paired with a stimulus like music, these images can “show” sounds in action.

“This approach enables anyone to create music-driven visuals while leveraging the expressive and sometimes unpredictable dynamics of self-organized systems,” Salcedo says.

“He brings great energy and thoughtfulness to his work, and to supporting others in the [music technology and computation graduate] program,” Egozy notes.

“I want users to feel movement and explore sounds and their impact more fully,” he says.

3 weeks назад @ news.mit.edu
MIT engineers design proteins by their motion, not just their shape
MIT engineers design proteins by their motion, not just their shape MIT engineers design proteins by their motion, not just their shape

“AI must go beyond analyzing static forms to understanding how structure and motion are fundamentally intertwined,” Buehler adds.

Now, MIT engineers have taken a major step toward closing the gap with the development of an AI model known as VibeGen.

VibeGen does something no protein design tool has done before.

By enabling researchers to specify motion as a direct design parameter, VibeGen treats proteins less like static shapes and more like programmable mechanical devices.

They also hope to integrate motion-aware design with other AI tools, building toward systems that can design proteins to be not just dynamic, but multifunctional; machines that sense their environment, respond to signal…

3 weeks назад @ news.mit.edu
AI system learns to keep warehouse robot traffic running smoothly
AI system learns to keep warehouse robot traffic running smoothly AI system learns to keep warehouse robot traffic running smoothly

In simulations inspired by actual e-commerce warehouse layouts, this new approach achieved about a 25 percent gain in throughput over other methods.

Importantly, the system can quickly adapt to new environments with different quantities of robots or varied warehouse layouts.

The planning system needs to be adaptive to these changes as the warehouse operations go on,” Zheng says.

“By interacting with simulations inspired by real warehouse layouts, our system receives feedback that we use to make its decision-making more intelligent.

While their system is still far away from real-world deployment, these demonstrations highlight the feasibility and benefits of using a machine learning-guided a…

3 weeks назад @ news.mit.edu
Augmenting citizen science with computer vision for fish monitoring
Augmenting citizen science with computer vision for fish monitoring Augmenting citizen science with computer vision for fish monitoring

Monitoring fish movement and understanding population dynamics are essential for informing conservation efforts and supporting fisheries management.

A team of researchers from the Woodwell Climate Research Center, MIT Sea Grant, the MIT Computer Science and Artificial Intelligence Lab (CSAIL), MIT Lincoln Laboratory, and Intuit explored a new monitoring method using underwater video and computer vision to supplement citizen science efforts.

The open-access paper, “From snapshots to continuous estimates: Augmenting citizen science with computer vision for fish monitoring,” outlines how recent advancements in computer vision and deep learning, from object detection and tracking to species cla…

3 weeks, 1 day назад @ news.mit.edu
Wristband enables wearers to control a robotic hand with their own movements
Wristband enables wearers to control a robotic hand with their own movements Wristband enables wearers to control a robotic hand with their own movements

In demonstrations, the team has shown that a person wearing the wristband can wirelessly control a robotic hand.

Now, MIT engineers have designed an ultrasound wristband that precisely tracks a wearer’s hand movements in real-time.

Some approaches use cameras to record a person’s hand movements as they manipulate objects or perform tasks.

Others involve having a person wear a glove with sensors, which records the person’s hand movements and transmits the data to a receiving robot.

They tested the algorithm on a new set of ultrasound images and found it correctly predicted the corresponding hand gestures.

3 weeks, 1 day назад @ news.mit.edu
Berkeley AI
последний пост 1 month назад
Identifying Interactions at Scale for LLMs
Identifying Interactions at Scale for LLMs Identifying Interactions at Scale for LLMs

Identifying Interactions at Scale for LLMsUnderstanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence.

Therefore, grounded or reality-checked interpretability methods must also be able to capture these influential interactions.

In this blog post, we describe the fundamental ideas behind SPEX and ProxySPEX, algorithms capable of identifying these critical interactions at scale.

SPEX and ProxySPEX FrameworkTo discover influential interactions with a tractable number of ablations, we have developed SPEX (Spectral Explainer).

We formalize this through two observations: sparsity (relatively f…

1 month назад @ bair.berkeley.edu
Information-Driven Design of Imaging Systems
Information-Driven Design of Imaging Systems Information-Driven Design of Imaging Systems

We developed a framework that enables direct evaluation and optimization of imaging systems based on their information content.

The first approach treated imaging systems as unconstrained communication channels, ignoring the physical limitations of lenses and sensors.

Our Information-Driven Encoder Analysis Learning (IDEAL) method uses gradient ascent on information estimates to optimize imaging system parameters.

The standard approach to computational imaging design, end-to-end optimization, jointly trains the imaging hardware and a neural network decoder.

The computational efficiency of IDEAL suggests possibilities for designing imaging systems that were previously intractable.

3 months назад @ bair.berkeley.edu
RL without TD learning
RL without TD learning RL without TD learning

RL without TD learningIn this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer.

We can do Reinforcement Learning (RL) based on divide and conquer, instead of temporal difference (TD) learning.

There are two classes of algorithms in RL: on-policy RL and off-policy RL.

We compared TRL with $n$-step TD learning with different values of $n$, from $1$ (pure TD) to $\infty$ (pure MC).

I still think one of the most important problems in RL (and even in machine learning) is to find a scalable off-policy RL algorithm.

5 months, 2 weeks назад @ bair.berkeley.edu
What exactly does word2vec learn?
What exactly does word2vec learn? What exactly does word2vec learn?

What exactly does word2vec learn?

What exactly does word2vec learn, and how?

In this framing, it’s clear that word2vec is a minimal neural language model.

As a result, the theory predicts exactly what features are learned in terms of the corpus statistics and the algorithmic hyperparameters.

We find that over the course of learning, word2vec builds these linear representations in a sequence of noisy learning steps, and their geometry is well-described by a spiked random matrix model.

7 months, 2 weeks назад @ bair.berkeley.edu
Whole-Body Conditioned Egocentric Video Prediction
Whole-Body Conditioned Egocentric Video Prediction Whole-Body Conditioned Egocentric Video Prediction

Whole-Body Conditioned Egocentric Video Prediction×Predicting Ego-centric Video from human Actions (PEVA).

We trained a model to Predict Ego-centric Video from human Actions (PEVA) for Whole-Body-Conditioned Egocentric Video Prediction.

We train an autoregressive conditional diffusion transformer on Nymeria, a large-scale dataset pairing real-world egocentric video with body pose capture.

We include some samples here:Body Movement Actions Move Forward Rotate Left Rotate Right Left Hand Actions Move Left Hand Up Move Left Hand Down Move Left Hand Left Move Left Hand Right Right Hand Actions Move Right Hand Up Move Right Hand Down Move Right Hand Left Move Right Hand RightLong RolloutHere you…

9 months, 2 weeks назад @ bair.berkeley.edu
AWS Machine Learning AWS Machine Learning
последний пост 5 часов назад
Cost-efficient custom text-to-SQL using Amazon Nova Micro and Amazon Bedrock on-demand inference
Cost-efficient custom text-to-SQL using Amazon Nova Micro and Amazon Bedrock on-demand inference Cost-efficient custom text-to-SQL using Amazon Nova Micro and Amazon Bedrock on-demand inference

The on-demand inference of Amazon Bedrock with fine-tuned Amazon Nova Micro models offers an alternative.

PrerequisitesTo deploy these solutions, you will need the following:An AWS account with billing enabledStandard IAM permissions and role configured to access: Amazon Bedrock Nova Micro model Amazon SageMaker AI Amazon Bedrock Model customizationQuota for ml.g5.48xl instance for Amazon SageMaker AI training.

The second uses Amazon SageMaker AI training jobs for organizations requiring more granular control over hyperparameters and training infrastructure.

Both implementations share the same data preparation pipeline and deploy to Amazon Bedrock for on-demand inference.

Deploy with on-dem…

5 часов назад @ aws.amazon.com
Transform retail with AWS generative AI services
Transform retail with AWS generative AI services Transform retail with AWS generative AI services

This post demonstrates how to build a virtual try-on and recommendation solution on AWS using Amazon Nova Canvas, Amazon Rekognition and Amazon OpenSearch Serverless.

The architecture uses S3 buckets for secure storage, Amazon OpenSearch Serverless for vector similarity search, and DynamoDB for real-time analytics tracking.

Scalability and deploymentBuilt using AWS Serverless Application Model (AWS SAM), the entire solution deploys with a single command and automatically scales based on demand.

These embeddings are indexed in Amazon OpenSearch Serverless with k-nearest neighbors (kNN) search for sub-second similarity matching.

The solution demonstrates how AI services such as Amazon Bedrock…

5 часов назад @ aws.amazon.com
How Automated Reasoning checks in Amazon Bedrock transform generative AI compliance
How Automated Reasoning checks in Amazon Bedrock transform generative AI compliance How Automated Reasoning checks in Amazon Bedrock transform generative AI compliance

AWS offers Automated Reasoning checks as one of several responsible AI tools to help you safeguard your AI applications.

For a detailed walkthrough of how to configure Automated Reasoning policies and see verification in action, see Minimize generative AI hallucinations with Amazon Bedrock Automated Reasoning checks.

More industries adopting Automated Reasoning checksOrganizations across other regulated industries also adopt Automated Reasoning checks to strengthen compliance:Financial services (EU AI Act): Organizations classifying AI risk under the EU AI Act use Automated Reasoning checks to move from inconsistent manual review to formally verifiable, audit-ready compliance workflows.

Con…

5 часов назад @ aws.amazon.com
Create rich, custom tooltips in Amazon Quick Sight
Create rich, custom tooltips in Amazon Quick Sight Create rich, custom tooltips in Amazon Quick Sight

Amazon Quick Sight, the business intelligence (BI) capability of Amazon Quick, is a unified BI service.

Today, we’re announcing sheet tooltips in Amazon Quick Sight.

Complete the following steps to create a sheet tooltip for your Quick Sight visuals:Step 1: Navigate to the Interactions tabIn the Amazon Quick console, in the left pane, under Quick Sight, choose an analysis.

ConclusionSheet tooltips in Amazon Quick Sight enhance the dashboard authoring experience, giving authors the creative freedom to design rich, multi-visual tooltip layouts that display detailed data on hover.

To learn more about sheet tooltips and other new features, visit the Amazon Quick community What’s New section.

1 day, 8 hours назад @ aws.amazon.com
Accelerating decode-heavy LLM inference with speculative decoding on AWS Trainium and vLLM
Accelerating decode-heavy LLM inference with speculative decoding on AWS Trainium and vLLM Accelerating decode-heavy LLM inference with speculative decoding on AWS Trainium and vLLM

Second, speculative decoding improves hardware utilization during decoding.

NxDI provides native support for speculative decoding on Trainium across four modes:Vanilla speculative decoding — Separate draft and target models compiled independently.

For complete documentation, see the Speculative Decoding guide and the EAGLE Speculative Decoding guide.

Figure 2 System architectureBenchmarking setupWe used LLMPerf to run structured, decode-heavy test cases against both the baseline and speculative decoding deployments.

Next stepsTo get started with speculative decoding on AWS Trainium, explore these resources:About the authors

1 day, 8 hours назад @ aws.amazon.com
Rede Mater Dei de Saúde: Monitoring AI agents in the revenue cycle with Amazon Bedrock AgentCore
Rede Mater Dei de Saúde: Monitoring AI agents in the revenue cycle with Amazon Bedrock AgentCore Rede Mater Dei de Saúde: Monitoring AI agents in the revenue cycle with Amazon Bedrock AgentCore

This post is cowritten by Renata Salvador Grande, Gabriel Bueno and Paulo Laurentys at Rede Mater Dei de Saúde.

About Rede Mater Dei de SaúdeWith 45 years of history, Rede Mater Dei is one of Brazil’s most respected healthcare institutions, operating facilities in Belo Horizonte, Betim-Contagem, Nova Lima, Salvador, Uberlândia, Goiânia, Feira de Santana, and a new project underway in São Paulo.

Like many institutions, Rede Mater Dei faced operational challenges:Manual processes were typically handled by hundreds of operational staff.

The agents are executed on Amazon Bedrock AgentCore Runtime, which provides the secure, serverless hosting environment for deploying, running, and scaling AI a…

1 day, 8 hours назад @ aws.amazon.com
Navigating the generative AI journey: The Path-to-Value framework from AWS
Navigating the generative AI journey: The Path-to-Value framework from AWS Navigating the generative AI journey: The Path-to-Value framework from AWS

The Generative AI Path-to-Value frameworkThe Generative AI Path-to-Value (P2V) framework serves as a shared mental model and roadmap for both technical and non-technical stakeholders.

The Generative AI adoption journeyThe Generative AI Path-to-Value (P2V) framework, as a mental model, simplifies the generative AI adoption journey.

Reimagining how generative AI applications are builtThe P2V framework addresses what organizations need to get right across the generative AI journey, but the speed of that journey depends heavily on how teams build.

To dive deeper, watch the full session from AWS re:Invent Introducing AI-Driven Development Lifecycle (AI-DLC)ConclusionThe Generative AI Path-to-Val…

2 days, 5 hours назад @ aws.amazon.com
Use-case based deployments on SageMaker JumpStart
Use-case based deployments on SageMaker JumpStart Use-case based deployments on SageMaker JumpStart

SageMaker JumpStart offers access to solutions for top use cases that can be deployed to SageMaker AI Managed Inference endpoints or SageMaker HyperPod clusters.

Building on this foundation, we’re excited to announce the launch of SageMaker JumpStart optimized deployments.

SageMaker JumpStart improved deployments address the need for rich and straightforward deployment customization on SageMaker JumpStart by offering pre-defined deployment configurations, designed for specific use cases.

PrerequisitesTo begin using SageMaker JumpStart optimized deployments, customers require at minimum the following:After these features are in place, customers can begin using SageMaker JumpStart optimized d…

2 days, 5 hours назад @ aws.amazon.com
Best practices to run inference on Amazon SageMaker HyperPod
Best practices to run inference on Amazon SageMaker HyperPod Best practices to run inference on Amazon SageMaker HyperPod

Cluster creation – one click deploymentTo create a HyperPod cluster with Amazon Elastic Kubernetes Service (Amazon EKS) orchestration, navigate to the SageMaker HyperPod Clusters page in the Amazon SageMaker AI console.

Node Scaling (Karpenter): Karpenter is a Kubernetes cluster autoscaler that provisions or removes compute nodes based on pending pod requirements.

ObservabilityYou can monitor HyperPod Inference metrics through SageMaker HyperPod observability features.

To enable SageMaker HyperPod observability features, follow the instructions in Accelerate foundation model development with one-click observability in Amazon SageMaker HyperPod.

ConclusionIn this post, we explored how Amazon…

2 days, 5 hours назад @ aws.amazon.com
How Guidesly built AI-generated trip reports for outdoor guides on AWS
How Guidesly built AI-generated trip reports for outdoor guides on AWS How Guidesly built AI-generated trip reports for outdoor guides on AWS

Unlike general-purpose AI tools that require constant prompting and oversight, Jack AI works in the background on its own.

In this post, we walk through how Jack AI is built on AWS to power this end-to-end automation.

Impact on outdoor recreation marketingWith Jack AI operating end-to-end on AWS, the impact extends beyond automation and into how outdoor recreation marketing is executed day-to-day.

ResultsSince launching Jack AI on AWS, Guidesly has seen rapid adoption and measurable impact across its community of outdoor guides.

Jack AI adoption has steadily climbed, growing from just over 100 reports in early 2025 to nearly 340 reports by July 2025.

2 days, 5 hours назад @ aws.amazon.com
Spring AI SDK for Amazon Bedrock AgentCore is now Generally Available
Spring AI SDK for Amazon Bedrock AgentCore is now Generally Available Spring AI SDK for Amazon Bedrock AgentCore is now Generally Available

With the new Spring AI AgentCore SDK, you can build production-ready AI agents and run them on the highly scalable AgentCore Runtime.

The Spring AI AgentCore SDK is an open source library that brings Amazon Bedrock AgentCore capabilities into Spring AI through known patterns: annotations, auto-configuration, and composable advisors.

For a hands-on example, see the Building Java AI agents with Spring AI and Amazon Bedrock AgentCore workshop, which demonstrates MCP integration with AgentCore Gateway.

ConclusionIn this post, we showed you how to build production-ready AI agents in Java using the Spring AI AgentCore SDK.

For a hands-on deep dive, try the Building Java AI agents with Spring AI a…

2 days, 10 hours назад @ aws.amazon.com
How to build effective reward functions with AWS Lambda for Amazon Nova model customization
How to build effective reward functions with AWS Lambda for Amazon Nova model customization How to build effective reward functions with AWS Lambda for Amazon Nova model customization

Building effective reward functions can help you customize Amazon Nova models to your specific needs, with AWS Lambda providing the scalable, cost-effective foundation.

This post demonstrates how Lambda enables scalable, cost-effective reward functions for Amazon Nova customization.

AWS Lambda-based reward functions simplifies this through feedback-based learning.

These responses flow to your Lambda function, which evaluates their quality across dimensions like correctness, safety, formatting, and conciseness.

Optimizing your reward function execution within the training loopOnce your reward function works correctly, optimization helps you train faster while controlling costs.

3 days, 7 hours назад @ aws.amazon.com
Understanding Amazon Bedrock model lifecycle
Understanding Amazon Bedrock model lifecycle Understanding Amazon Bedrock model lifecycle

Amazon Bedrock regularly releases new foundation model (FM) versions with better capabilities, accuracy, and safety.

Understanding the model lifecycle is essential for effective planning and management of AI applications built on Amazon Bedrock.

Amazon Bedrock model lifecycle overviewA model offered on Amazon Bedrock can exist in one of three states: Active, Legacy, or End-of-Life (EOL).

ConclusionThe model lifecycle policy in Amazon Bedrock gives you clear stages for managing FM evolution.

About the authorsSaurabh Trikande is a Senior Product Manager for Amazon Bedrock and Amazon SageMaker Inference.

1 week назад @ aws.amazon.com
The future of managing agents at scale: AWS Agent Registry now in preview
The future of managing agents at scale: AWS Agent Registry now in preview The future of managing agents at scale: AWS Agent Registry now in preview

Now available through Amazon Bedrock AgentCore, use AWS Agent Registry to discover, share, and reuse agents, tools, and agent skills across your organization.

Today, we’re announcing AWS Agent Registry (preview) in AgentCore, a single place to discover, share, and reuse AI agents, tools, and agent skills across your enterprise.

Beyond AWS Agent Registry, we’re building toward connecting with external partner catalogs.

The AWS Agent Registry gives you one place to discover, govern, and reuse every agent across your enterprise.

AWS Agent Registry is available in preview today through AgentCore in five AWS Regions: US East (N. Virginia), US West (Oregon), Asia Pacific (Sydney), Asia Pacific (T…

1 week назад @ aws.amazon.com
Embed a live AI browser agent in your React app with Amazon Bedrock AgentCore
Embed a live AI browser agent in your React app with Amazon Bedrock AgentCore Embed a live AI browser agent in your React app with Amazon Bedrock AgentCore

Why embed Live View in your applicationEmbedding Live View inside your own application unlocks additional value for your users at scale.

AWS Cloud hosts Amazon Bedrock AgentCore Browser and Amazon Bedrock services that provide the underlying browser automation and streaming capabilities.

Arrow 3 (blue, solid): The Application Server runs browser tools against Amazon Bedrock AgentCore Browser using Playwright Chrome DevTools Protocol (CDP).

ConclusionIn this post, you learned how to use the BrowserLiveView component to embed a Live View of an Amazon Bedrock AgentCore Browser session into your React application.

For a deeper look at Amazon Bedrock AgentCore Browser capabilities, refer to the …

1 week назад @ aws.amazon.com
NVIDIA
последний пост 10 часов назад
No Need for Space Gear — Capcom’s ‘PRAGMATA’ Joins GeForce NOW on Launch Day
No Need for Space Gear — Capcom’s ‘PRAGMATA’ Joins GeForce NOW on Launch Day No Need for Space Gear — Capcom’s ‘PRAGMATA’ Joins GeForce NOW on Launch Day

PRAGMATA, Capcom’s long-awaited sci-fi action adventure, touches down on GeForce NOW the same day it launches worldwide.

The futuristic journey through a cold lunar station in the near future can be streamed instantly from the cloud to almost any device, no console or heavy hardware needed.

Step into the boots of Hugh Williams, an investigator navigating a lunar research station gone silent and Diana, a young android.

Stream it on launch day at full fidelity, even without the latest hardware — no need to wait on a large install or worry about hardware specs.

In addition, members can look for the following:REPLACED (New release on Steam and Xbox , available on Game Pass, April 14, GeForce RT…

10 часов назад @ blogs.nvidia.com
Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters
Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters

In the generative and agentic AI era, these facilities have evolved into AI token factories.

Cost per token is an enterprise’s all-in cost to produce each delivered token, usually represented as cost per million tokens.

Understanding how to optimize token cost requires looking at the equation for calculating cost per million tokens.

The real key to reducing token cost, however, lies in the denominator: maximizing the delivered token output.

Minimize token cost : When this increase in token output is reflected through the cost equation, it drives down cost per token, which is what grows the profit margin on every interaction served.

1 day, 8 hours назад @ blogs.nvidia.com
New Adobe Premiere Color Grading Mode Accelerated on NVIDIA GPUs
New Adobe Premiere Color Grading Mode Accelerated on NVIDIA GPUs New Adobe Premiere Color Grading Mode Accelerated on NVIDIA GPUs

Color Meets ComputePremiere’s Color Mode is a new clean, responsive interface within Adobe Premiere that enables editors to do color grading on native videos.

Controls are organized into focused modules, each tailored to a specific aspect of color grading.

🦥 Unsloth and NVIDIA have teamed up to eliminate hidden bottlenecks that slow down fine-tuning on NVIDIA GPUs, improving fine-tuning performance by 15%.

Google and NVIDIA have optimized Gemma 4 for NVIDIA GPUs, enabling efficient performance on NVIDIA RTX-powered PCs and workstations, NVIDIA DGX Spark personal AI supercomputers and NVIDIA Jetson Orin Nano edge AI modules.

Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X — and …

1 day, 10 hours назад @ blogs.nvidia.com
Cut Checkpoint Costs with About 30 Lines of Python and NVIDIA nvCOMP
Cut Checkpoint Costs with About 30 Lines of Python and NVIDIA nvCOMP Cut Checkpoint Costs with About 30 Lines of Python and NVIDIA nvCOMP

By introducing a lossless compression step implemented with about 30 lines of Python, we can reduce storage costs by $56,000 every month.

Checkpoint compression ratios for dense and MoE modelsThe ranges reflect the key finding that compression depends on model architecture, not hardware.

At 5 GB/s shared storage, ZSTD at 16 GB/s is 3× faster than the write, so compression overlaps completely.

Read the docs : nvCOMP documentation: nvCOMP documentation Go deeper: For teams using GPUDirect Storage, nvCOMP can compress directly into GPU buffers that GDS writes to NVMe with zero CPU involvement.

For details, see: https://docs.nvidia.com/cuda/nvcomp/ https://docs.nvidia.com/gpudirect-storage/api-…

1 week назад @ developer.nvidia.com
How to Accelerate Protein Structure Prediction at Proteome-Scale
How to Accelerate Protein Structure Prediction at Proteome-Scale How to Accelerate Protein Structure Prediction at Proteome-Scale

Most biological processes are governed by proteins interacting with other proteins, forming protein complexes whose structures are described in the hierarchy of protein structure as the quaternary representation.

Because predicting protein complexes can become a combinatorial problem, it’s useful to understand what may be most interesting.

These modular libraries and SDKs can be leveraged to accelerate operations common in protein structure AI and general inference AI workloads, respectively.

We intend to refine our approach further and expand the universe of available protein complexes in the AlphaFold Database.

Getting startedProteome-scale quaternary structure prediction requires more th…

1 week назад @ developer.nvidia.com
Strength and Destiny Collide: ‘Samson: A Tyndalston Story’ Arrives in the Cloud
Strength and Destiny Collide: ‘Samson: A Tyndalston Story’ Arrives in the Cloud Strength and Destiny Collide: ‘Samson: A Tyndalston Story’ Arrives in the Cloud

A timeless story of grit, faith and rebellion takes center stage as Samson: A Tyndalston Story joins the GeForce NOW library today.

The highly anticipated release from Liquid Swords can now be streamed on nearly any device with GeForce NOW bringing cinematic intensity and mythic storytelling to the cloud.

Samson: A Tyndalston Story from Liquid Swords follows Samson, a former enforcer pulled back to the streets that made him.

The game takes full advantage of ray-traced global illumination, reflections and shadows, creating a city that feels cinematic and alive.

No waiting around for downloads or worrying about system specs, just dive straight into the grit and glow of Tyndalston.

1 week назад @ blogs.nvidia.com
National Robotics Week — Latest Physical AI Research, Breakthroughs and Resources
National Robotics Week — Latest Physical AI Research, Breakthroughs and Resources National Robotics Week — Latest Physical AI Research, Breakthroughs and Resources

This National Robotics Week, NVIDIA is highlighting the breakthroughs that are bringing AI into the physical world — as well as the growing wave of robots transforming industries, from agricultural and manufacturing to energy and beyond.

Advancements in robot learning, simulation and foundation models are accelerating development, enabling robots to move from training in virtual environments to real-world deployment faster than ever.

With NVIDIA platforms for simulation, synthetic data and AI-powered robot learning, developers now have the tools to build machines that can perceive, reason and act in complex environments.

Check back here all week for coverage on the latest NVIDIA physical AI…

1 week, 5 days назад @ blogs.nvidia.com
From RTX to Spark: NVIDIA Accelerates Gemma 4 for Local Agentic AI
From RTX to Spark: NVIDIA Accelerates Gemma 4 for Local Agentic AI From RTX to Spark: NVIDIA Accelerates Gemma 4 for Local Agentic AI

Google and NVIDIA have collaborated to optimize Gemma 4 for NVIDIA GPUs, enabling efficient performance across a range of systems — from data center deployments to NVIDIA RTX-powered PCs and workstations, the NVIDIA DGX Spark personal AI supercomputer and NVIDIA Jetson Orin Nano edge AI modules.

As local agentic AI continues to gain momentum, applications like OpenClaw are enabling always-on AI assistants on RTX PCs, workstations and DGX Spark.

Learn how to run OpenClaw for free on RTX GPUs and DGX Spark or using the DGX Spark OpenClaw playbook.

#ICYMI: The Latest Updates for RTX AI PCs✨ Catch up on RTX AI Garage blogs for a host of agentic AI announcements from NVIDIA GTC, such as new open…

2 weeks назад @ blogs.nvidia.com
Press Start on April: GeForce NOW Brings 10 Games to the Cloud
Press Start on April: GeForce NOW Brings 10 Games to the Cloud Press Start on April: GeForce NOW Brings 10 Games to the Cloud

April kicks off with ten new titles, bringing fresh adventures to GeForce NOW, including the launch of Capcom’s highly anticipated PRAGMATA.

A dozen new games are available to stream this week, including Arknights: Endfield, which expands the acclaimed series into a full 3D real‑time strategy adventure.

On GeForce NOW, Arknights: Endfield can be played at the highest settings from virtually any device, enabling crisp visuals and high performance without compromise.

GeForce RTX rendering brings the game’s metallic skylines and glowing wastelands to life, while ultralow-latency streaming ensures every tactical command lands with precision.

The collection streams instantly with GeForce NOW, tu…

2 weeks назад @ blogs.nvidia.com
Efficiency at Scale: NVIDIA, Energy Leaders Accelerating Power‑Flexible AI Factories to Fortify the Grid
Efficiency at Scale: NVIDIA, Energy Leaders Accelerating Power‑Flexible AI Factories to Fortify the Grid Efficiency at Scale: NVIDIA, Energy Leaders Accelerating Power‑Flexible AI Factories to Fortify the Grid

AES, Constellation, Invenergy, NextEra Energy, Nscale Energy & Power and Vistra are working to build the energy generation capacity needed to meet rapidly growing power demand.

It’s an important milestone in grid resilience, supported by an ecosystem for advanced AI factories.

Adaptive Construction Solutions announced a national registered apprenticeship initiative, in collaboration with NVIDIA, to help build the skilled workforce required for AI factories and energy infrastructure.

The efforts articulated how AI, digital twins and workforce innovation are converging to deliver faster, more resilient energy infrastructure.

The announcements address the “power‑to‑rack” challenge — designing …

2 weeks, 2 days назад @ blogs.nvidia.com
Into the Omniverse: NVIDIA GTC Showcases Virtual Worlds Powering the Physical AI Era
Into the Omniverse: NVIDIA GTC Showcases Virtual Worlds Powering the Physical AI Era Into the Omniverse: NVIDIA GTC Showcases Virtual Worlds Powering the Physical AI Era

At the center of this shift are new frontier models for physical AI, including NVIDIA Cosmos 3, NVIDIA Isaac GR00T N1.7 and NVIDIA Alpamayo 1.5.

NVIDIA also released the NVIDIA Physical AI Data Factory Blueprint, designed to push the state of the art in world modeling, humanoid skills and autonomous driving, as well as the NVIDIA Omniverse DSX Blueprint for AI factory digital twin simulation.

Compute Is Data: Real-World Data Is No Longer the MoatReal-world data used to function as a moat for physical AI — but it doesn’t scale.

To help address this, NVIDIA introduced at GTC its Physical AI Data Factory Blueprint, an open reference architecture that transforms compute into large-scale, high-q…

3 weeks назад @ blogs.nvidia.com
Game On: Five New Titles Now Streaming on GeForce NOW
Game On: Five New Titles Now Streaming on GeForce NOW Game On: Five New Titles Now Streaming on GeForce NOW

This week, five new titles are ready to play instantly in the cloud gaming platform’s library.

Plus, Honkai: Star Rail Version 4.1, “Unraveled for Daybreak,” touches down.

Let’s Play TodayHonkai: Star Rail Version 4.1, “Unraveled for Daybreak,” is available now, bringing new adventures aboard the Astral Express.

The crew touches down at Star Rail FEST, a grand interstellar celebration packed with new zones, characters and challenges.

Play the latest Honkai: Star Rail update instantly on GeForce NOW — no installs, just starlight and action.

3 weeks назад @ blogs.nvidia.com
The Future of AI Is Open and Proprietary
The Future of AI Is Open and Proprietary The Future of AI Is Open and Proprietary

As NVIDIA founder and CEO Jensen Huang told attendees at a special session on open frontier models at NVIDIA GTC, “Proprietary versus open is not a thing.

It’s proprietary and open.”That’s why the future of AI innovation isn’t about a single massive model.

Several Nemotron Coalition members joined other leaders building and consuming open models for a back-to-back panel session at GTC.

“There’s a flourishing ecosystem of powerful, closed models but equally capable open models that are going to be coming over the next couple years.”This combination of open and proprietary models drives advancements at frontier AI companies as well as in academia.

“I think the shape of AI is going to reflect …

3 weeks, 1 day назад @ blogs.nvidia.com
Blowing Off Steam: How Power-Flexible AI Factories Can Stabilize the Global Energy Grid
Blowing Off Steam: How Power-Flexible AI Factories Can Stabilize the Global Energy Grid Blowing Off Steam: How Power-Flexible AI Factories Can Stabilize the Global Energy Grid

In a recent white paper, Emerald AI — in collaboration with NVIDIA, EPRI, National Grid and Nebius — showcased how “power-flexible” AI factories can autonomously adjust their power usage during peak demand.

For AI factories, this could unlock significantly faster grid connections without waiting for massive, years-long infrastructure upgrades.

“With this technology, AI factories become friendly and helpful grid assets,” said Varun Sivaram, founder and CEO of Emerald AI.

Emerald AI recorded 100% alignment with over 200 power targets that EPRI and National Grid instructed the AI cluster to follow for this experiment.

“We’ve proved the value that this technology brings.”Scaling London’s Grid a…

3 weeks, 1 day назад @ blogs.nvidia.com
Building NVIDIA Nemotron 3 Agents for Reasoning, Multimodal RAG, Voice, and Safety
Building NVIDIA Nemotron 3 Agents for Reasoning, Multimodal RAG, Voice, and Safety Building NVIDIA Nemotron 3 Agents for Reasoning, Multimodal RAG, Voice, and Safety

Keep agents safe with Nemotron 3 Content SafetyAs agents expand from text‑only to multimodal workflows, safety guardrails must evolve across inputs, retrieval, and outputs.

Nemotron 3 Content Safety is a compact 4B‑parameter multimodal safety model that detects unsafe or sensitive content across text and images.

With Nemotron and NVIDIA NeMo, you’re getting the building blocks for trustworthy, repeatable, and scalable digital assistants for your production agentic systems.

Get started today:Stay up-to-date on NVIDIA Nemotron by subscribing to NVIDIA news and following NVIDIA AI on LinkedIn, X, Discord, and YouTube.

Explore open Nemotron models and datasets on Hugging Face and Blueprints on …

3 weeks, 2 days назад @ developer.nvidia.com
Facebook
последний пост 7 часов назад
Capacity Efficiency at Meta: How Unified AI Agents Optimize Performance at Hyperscale
Capacity Efficiency at Meta: How Unified AI Agents Optimize Performance at Hyperscale Capacity Efficiency at Meta: How Unified AI Agents Optimize Performance at Hyperscale

We’ve built a unified AI agent platform that encodes the domain expertise of senior efficiency engineers into reusable, composable skills.

Introducing the Capacity Efficiency ProgramWhen the code you ship serves more than 3 billion people, even a 0.1% performance regression can translate to significant additional power consumption.

Many engineers at Meta use our efficiency tools to work on these problems every day.

Skills : These encode domain expertise about performance efficiency.

The pipeline mirrors the defensive AI Regression Solver:Gather context with tools: The AI agent looks up: Opportunity metadata.

7 часов назад @ engineering.fb.com
How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines
How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines

Challenging the Conventional Wisdom on AI Context FilesRecent academic research found that AI-generated context files actually decreased agent success rates on well-known open-source Python repositories.

Our codebase is the opposite: proprietary config-as-code with tribal knowledge that exists nowhere in any model’s training data.

Any team with a large, proprietary codebase can benefit:Identify your tribal knowledge gaps.

What’s NextWe are expanding context coverage to additional pipelines across Meta’s data infrastructure and exploring tighter integration between context files and code generation workflows.

This approach turned undocumented tribal knowledge into structured, AI-readable con…

1 week, 3 days назад @ engineering.fb.com
KernelEvolve: How Meta’s Ranking Engineer Agent Optimizes AI Infrastructure
KernelEvolve: How Meta’s Ranking Engineer Agent Optimizes AI Infrastructure KernelEvolve: How Meta’s Ranking Engineer Agent Optimizes AI Infrastructure

This is the second post in the Ranking Engineer Agent blog series exploring the autonomous AI capabilities accelerating Meta’s Ads Ranking innovation.

We introduce KernelEvolve, an agentic kernel authoring system used by Ranking Engineer Agent and generally applicable to a range of AI models beyond Ads Ranking.

Unlike typical large language model (LLM)-based agents that perform one-shot code generation, KernelEvolve treats kernel optimization as a search problem.

A standard coding assistant lacks the context to write optimized MTIA kernels because it has never seen MTIA documentation, instruction set details, or programming idioms.

KernelEvolve represents an early step toward the vision of …

2 weeks назад @ engineering.fb.com
Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads
Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads Meta Adaptive Ranking Model: Bending the Inference Scaling Curve to Serve LLM-Scale Models for Ads

To overcome this, we have developed the Meta Adaptive Ranking Model, which effectively bends the inference scaling curve with high ROI and industry-leading efficiency.

Introducing Meta Adaptive Ranking ModelServing LLM-scale & complexity models in a real-time ads recommendation environment requires resolving a fundamental tension between model complexity and system efficiency.

Adaptive Ranking Model addresses these challenges through a paradigm shift powered by three core innovations across the serving stack:Inference-efficient model scaling: Adaptive Ranking Model achieves a model complexity equivalent to the O(10 GFLOPs) per token used by top-tier LLMs.

To minimize compute overhead, Adapt…

2 weeks, 2 days назад @ engineering.fb.com
AI for American-Produced Cement and Concrete
AI for American-Produced Cement and Concrete AI for American-Produced Cement and Concrete

Concurrent with the 2026 American Concrete Institute (ACI) Spring Convention, Meta is releasing a new AI model for designing concrete mixes – Bayesian Optimization for Concrete (BOxCrete), as well as the foundational data used to develop award-winning concrete mixes.

Amrize operates 18 cement plants, 141 cement terminals and 269 ready-mix concrete sites across North America.

Alongside the event, Meta is releasing a new AI model for designing concrete mixes, Bayesian Optimization for Concrete (BOxCrete).

How Meta Leverages AI for Concrete MixturesMeta’s AI for concrete model can help suppliers more quickly incorporate U.S. materials into their mixes through an approach called adaptive experi…

2 weeks, 3 days назад @ engineering.fb.com
Friend Bubbles: Enhancing Social Discovery on Facebook Reels
Friend Bubbles: Enhancing Social Discovery on Facebook Reels Friend Bubbles: Enhancing Social Discovery on Facebook Reels

Friend bubbles in Facebook Reels highlight Reels your friends have liked or reacted to, helping you discover new content and making it easier to connect over shared interests.

Friend bubbles enhance the social experience on Facebook Reels by helping you discover content your friends enjoy, creating a shared viewing experience and sparking new conversations.

Along with additional optimizations in the underlying method, this approach enabled us to ship friend bubbles while preserving core Reels performance.

Friend bubbles work because the signal is high value: It adds meaningful social context that helps people decide what’s worth watching.

Engagement also scales consistently with the number …

4 weeks, 1 day назад @ engineering.fb.com
Ranking Engineer Agent (REA): The Autonomous AI Agent Accelerating Meta’s Ads Ranking Innovation
Ranking Engineer Agent (REA): The Autonomous AI Agent Accelerating Meta’s Ads Ranking Innovation Ranking Engineer Agent (REA): The Autonomous AI Agent Accelerating Meta’s Ads Ranking Innovation

Meta’s Ranking Engineer Agent (REA) autonomously executes key steps across the end-to-end machine learning (ML) lifecycle for ads ranking models.

Powering these interactions are highly sophisticated, complex and massively distributed machine learning (ML) models that continuously evolve to serve both advertisers and people who use the platforms.

Optimizing these ML models has traditionally been time-consuming.

To address this, Meta built the Ranking Engineer Agent, an autonomous AI agent designed to drive the end-to-end ML lifecycle and iteratively evolve Meta’s ads ranking models at scale.

ML training jobs run for hours or days, far beyond what any session-bound assistant can manage.

1 month назад @ engineering.fb.com
Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps
Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps Patch Me If You Can: AI Codemods for Secure-by-Default Android Apps

Nowhere is this more apparent than in mobile security, where a single class of vulnerability can be replicated across hundreds of call sites scattered throughout a sprawling, multi-app codebase serving billions of users.

Meta’s Product Security team has developed a two-pronged strategy to address this:Designing secure-by-default frameworks that wrap potentially unsafe Android OS APIs and make the secure path the easiest path for developers, andLeveraging generative AI to automate the migration of existing code to those frameworks at scale.

The result is a system that can propose, validate, and submit security patches across millions of lines of code with minimal friction for the engineers w…

1 month назад @ engineering.fb.com
RCCLX: Innovating GPU communications on AMD platforms
RCCLX: Innovating GPU communications on AMD platforms RCCLX: Innovating GPU communications on AMD platforms

RCCLX is fully integrated with Torchcomms and aims to empower researchers and developers to accelerate innovation, regardless of their chosen backend.

We want to iterate on collectives, transports, and novel features quickly on AMD platforms.

With RCCLX, we have integrated CTran to AMD platforms, enabling the AllToAllvDynamic – a GPU-resident collective.

These features provide significant performance improvements on AMD platforms and we are excited to share this with the community.

RCCLX Quick Start GuideInstall Torchcomms with RCCLX backend by following the installation instructions in the Torchcomms repo.

1 month, 3 weeks назад @ engineering.fb.com
The Death of Traditional Testing: Agentic Development Broke a 50-Year-Old Field, JiTTesting Can Revive It
The Death of Traditional Testing: Agentic Development Broke a 50-Year-Old Field, JiTTesting Can Revive It The Death of Traditional Testing: Agentic Development Broke a 50-Year-Old Field, JiTTesting Can Revive It

A Catching JiTTest focuses specifically on finding regressions introduced by a code change.

Agentic development dramatically increases the pace of code change, straining test development burden and scaling the cost of false positives and test maintenance to breaking point.

And since the JiTTest itself is LLM-generated, it can often infer the plausible intention of a code change and simulate possible faults that may result from it.

With them engineers no longer have to spend time writing, reviewing, and testing complex test code.

READ THE PAPERJust-in-Time Catching Test Generation at Meta

2 months назад @ engineering.fb.com
Adapting the Facebook Reels RecSys AI Model Based on User Feedback
Adapting the Facebook Reels RecSys AI Model Based on User Feedback Adapting the Facebook Reels RecSys AI Model Based on User Feedback

Our new User True Interest Survey (UTIS) model , now helps surface more niche, high-quality content and boosts engagement, retention, and satisfaction.

Our paper, “ Improve the Personalization of Large-Scale Ranking Systems by Integrating User Survey Feedback ” shares full details on this work.

The main candidate ranking model used by the platform is a large multi-task, multi-label model.

We trained a lightweight UTIS alignment model layer on the collected user survey responses using existing predictions of the main model as input features.

The UTIS model consistently outperformed the baseline, driving higher user engagement and retention .

3 months назад @ engineering.fb.com
DrP: Meta’s Root Cause Analysis Platform at Scale
DrP: Meta’s Root Cause Analysis Platform at Scale DrP: Meta’s Root Cause Analysis Platform at Scale

DrP’s key components include:Expressive SDK : The DrP SDK allows engineers to codify investigation workflows into analyzers.

Post-processing system : After an investigation, the post-processing system can take automated actions based on the analysis results.

Bootstrap code : The DrP SDK provides bootstrap code to create a template analyzer with pre-populated boilerplate code.

Data access and analysis : The SDK includes libraries for data access and analysis, such as dimension analysis and time series correlation.

This provides immediate analysis results to on-call engineers.

3 months, 4 weeks назад @ engineering.fb.com
How AI Is Transforming the Adoption of Secure-by-Default Mobile Frameworks
How AI Is Transforming the Adoption of Secure-by-Default Mobile Frameworks How AI Is Transforming the Adoption of Secure-by-Default Mobile Frameworks

Generative AI and automation accelerate the adoption of secure frameworks at scale, enabling consistent security enforcement and efficient migration across Meta’s vast codebase.

How We Design Secure-by-Default Frameworks at MetaDesigning secure-by-default frameworks for use by a large number of developers shipping vastly different features across multiple apps is an interesting challenge.

There shouldn’t be one security framework that covers all security issues, and not every security issue is general enough to deserve its own framework.

Now that we’ve looked at the design philosophy behind our frameworks, let’s look at one of our most widely used Android security frameworks, SecureLinkLaun…

4 months назад @ engineering.fb.com
Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization
Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization

Zoomer has delivered training time reductions, and significant QPS improvements, making it the de-facto tool for AI performance optimization across Meta’s entire AI infrastructure.

Zoomer is Meta’s automated, one-stop-shop platform for performance profiling, debugging, analysis, and optimization of AI training and inference workloads.

AI Performance Optimization Using ZoomerZoomer is an automated debugging and optimization platform that works across all of our AI model types (ads recommendations, GenAI, computer vision, etc.)

Memory Analysis : Comprehensive analysis of GPU memory usage patterns, allocation tracking, and leak detection.

Realtime Memory Profiling : GPU memory allocation track…

4 months, 3 weeks назад @ engineering.fb.com
Open Source Is Good for the Environment
Open Source Is Good for the Environment Open Source Is Good for the Environment

But have you heard about open hardware?

And did you know open source can have a positive impact on the environment?

On this episode of the Meta Tech Podcast, Pascal Hartig sits down with Dharmesh and Lisa to talk about all things open hardware, and Meta’s biggest announcements from the 2025 Open Compute Project (OCP) Summit – including a new open methodology for leveraging AI to understand Scope 3 emissions.

You’ll also hear how AI and open hardware are helping Meta push to achieve net zero emissions in 2030, including how AI is being used to develop new concrete mixes for data center construction.

And if you’re interested in learning more about career opportunities at Meta visit the Meta C…

5 months назад @ engineering.fb.com
Uber Engineering
последний пост None
neptune.ai neptune.ai
последний пост 4 months, 2 weeks назад
We are joining OpenAI
We are joining OpenAI We are joining OpenAI

Piotr Niedźwiedź, CEO/CTO and founder of neptune.aiI’m excited to share that we’ve entered into a definitive agreement to be acquired by OpenAI, subject to closing conditions.

We are thrilled to join the OpenAI team and help their AI researchers build better models faster.

Neptune is a metrics dashboard company.”We’ve worked closely with OpenAI to create the metrics dashboard that helps teams building foundation models.

Our future with OpenAINeptune will join OpenAI and continue to support AI researchers with tools to monitor, debug, and evaluate frontier models.

We are looking forward to working with top AI researchers and supporting OpenAI’s mission of ensuring that AGI benefits all of hu…

4 months, 2 weeks назад @ neptune.ai
Synthetic Data for LLM Training
Synthetic Data for LLM Training Synthetic Data for LLM Training

For instance, financial data is highly sensitive and protected by very strict regulations, and synthetic data mimics the real data distribution without revealing customer information.

Read more about how leading foundation model teams curate their training data and other topics in the State of Foundation Model Training Report 2025.

Choosing the right synthetic data generation technique depends on the type of data and its complexity.

Synthetic tabular data generation is a promising direction to overcome these challenges by learning the distribution of the tabular data.

Post-processingAs the distribution of tabular data is highly complex, it makes the synthetic tabular data generation very ch…

5 months назад @ neptune.ai
What are LLM Embeddings: All you Need to Know
What are LLM Embeddings: All you Need to Know What are LLM Embeddings: All you Need to Know

TL;DR LLM embeddings are the numerical, vector representations of text that Large Language Models (LLMs) use to process information.

Unlike their predecessor word embeddings, LLM embeddings are context-aware and dynamically change to capture semantic and syntactic relationships based on the surrounding text.

What are the applications of LLM embeddings?

Word EmbeddingsSparse Word Embeddings One-Hot Vectors 1970s TF-IDF1980s Co-Occurrence MatrixStatic Word Embeddings Word2Vec 2013 GloVe 2014Contextualized word embeddings ELMo 2018 GPT-1 2018 BERT 2018 LLAMA 2023 DeepSeek-V1 2023 GPT-4 2023Static word embeddingsStatic word embeddings, such as word2vec in 2013, marked a significant development.…

5 months, 1 week назад @ neptune.ai
Detecting and Fixing ‘Dead Neurons’ in Foundation Models
Detecting and Fixing ‘Dead Neurons’ in Foundation Models Detecting and Fixing ‘Dead Neurons’ in Foundation Models

TL;DR Dead neurons silently waste compute and reduce effective model capacity in foundation models.

Dead neurons’ impactRecent studies into dead neurons in the context of foundation models show interesting, albeit worrying, results.

These large reported fractions of dead neurons in foundation models are a concern from a computational perspective.

Before we move on to discuss how to detect and fix dead neurons, let’s touch upon an important distinction between dead neurons and vanishing gradients.

Further reading How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models Read moreVisualizing activation distributionsIs your foundation model suffering from dead neurons?

5 months, 2 weeks назад @ neptune.ai
Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training
Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training

In the first part of this series, we covered the fundamentals of instruction fine-tuning (IFT).

def calculate_irs(instruction, output, reference_model): evaluation_prompt = f""" Instruction: {instruction} Model Output: {output} Rate how well the output follows the instruction on these criteria: 1.

| SourceHINT addresses a computational inefficiency in standard instruction fine-tuning: repeatedly reprocessing the same task instruction with every input example.

Read more about foundation model training infrastructure and other topics in Neptune’s 2025 State of Foundation Model Training Report.

First, during initial instruction fine-tuning across multiple diverse tasks, the model learns genera…

5 months, 3 weeks назад @ neptune.ai
How to Optimize LLM Inference
How to Optimize LLM Inference How to Optimize LLM Inference

Large Language Model (LLM) inference at scale is challenging as it involves transferring massive amounts of model parameters and data and performing computations on large tensors.

In the following, we’ll use the Llama model family architecture as a specific example to understand the LLM workload at inference.

For a far more detailed analysis of the LLM workload at inference, see the chapter All About Transformer Inference in the book How to Scale Your Model, published by Google DeepMind.

See also How to Run LLMs Locally Read moreA quick primer on hardware for LLM inferenceA typical LLM inference cluster consists of several nodes, each with a multi-core CPU and multiple accelerator devices, …

6 months назад @ neptune.ai
A Researcher’s Guide to LLM Grounding
A Researcher’s Guide to LLM Grounding A Researcher’s Guide to LLM Grounding

In this article, we’ll explore the fundamental concepts of LLM grounding as well as strategies for optimally grounding models.

What is LLM grounding?

LLM grounding is analogous.

If relevant knowledge cannot be inferred from the data, then LLM grounding cannot yield more relevant responses.

When grounding LLMs using RAG, consider retaining only a few of the top hits (i.e., top-k) for your retrieval queries.

6 months, 3 weeks назад @ neptune.ai
Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions
Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions

TL;DR Instruction fine-tuning (IFT) refines pre-trained large language models (LLMs) to follow specific task instructions by training on prompt-response pairs.

Instruction fine-tuning in a nutshellIFT tailors LLMs to follow user instructions by bridging their inherent next-word prediction with human-defined objectives.

Related LLM Fine-Tuning and Model Selection Using Neptune and Transformers Read moreParameter-efficient instruction fine-tuningWhile major foundation models like GPT-4 or Llama-2 undergo full parameter instruction fine-tuning during development, parameter-efficient fine-tuning (PEFT) methods have become widely adopted for instruction fine-tuning since the LoRA paper was publi…

7 months назад @ neptune.ai
Understanding Prompt Injection: Risks, Methods, and Defense Measures
Understanding Prompt Injection: Risks, Methods, and Defense Measures Understanding Prompt Injection: Risks, Methods, and Defense Measures

Prompt injection 101: When prompts go rogueThe term ‘Prompt Injection’ comes from SQL injection attacks.

There is another claim of the independent discovery of prompt injection attacks, which suggests that Riley Goodside publicly exhibited a prompt injection in a tweet back in September 2022.

The indirect prompt injection attacks are classified into active, passive, user-driven and virtual prompt attacks.

Virtual prompt injection attacksThis injection type is closely related to passive injection attacks previously described.

Prompt injection: current challenges & lessons learnedThe arms race between prompt injection attacks and defenses is a challenge for researchers, developers, and users.

8 months, 1 week назад @ neptune.ai
SabiYarn: Advancing Low-Resource Languages With Multitask NLP Pre-Training [Paper Reflections]
SabiYarn: Advancing Low-Resource Languages With Multitask NLP Pre-Training [Paper Reflections] SabiYarn: Advancing Low-Resource Languages With Multitask NLP Pre-Training [Paper Reflections]

This simple idea avoids computing loss on input prompt tokens the model already knows.

Prompt tokens are (too) expensive in low-resource settingsDuring pre-training, LLMs are trained in causal language modeling through a next-token prediction task.

=> Mo fẹ́ràn ìrẹsì,” the model is trained to predict every token, from the prompt to the actual answer:Step Prompt Next token 1 Translate English Static prompt 2 Translate English to Static prompt 3 Translate English to Yoruba: Static prompt 4 Translate English to Yoruba: I 5 Translate English to Yoruba: I love 6 Translate English to Yoruba: I love rice.

This is straightforward to implement in PyTorch by masking out the prompt tokens in the label …

8 months, 2 weeks назад @ neptune.ai
How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models
How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models

What gradient issues occur during foundation model training?

During training, gradient descent updates model parameters by computing the gradients of the loss function via forward and backward passes.

The green line corresponds to a learning rate of 10, while the orange line has a learning rate of 0.1.

The gradient norm for the orange line with LR = 0.1 is very high in the first steps, while the gradient norm of the green line with LR = 10 diverges to NaN after a few steps.

Techniques for gradient stabilizationMonitoring gradient norms and training loss provides insights into the learning dynamics of the foundation models.

9 months, 2 weeks назад @ neptune.ai
▶️ YouTube
Yannic Kilcher Yannic Kilcher
последний пост 1 month, 1 week назад
I BUILT A FULLY AUTOMATIC MANSPLAINER
I BUILT A FULLY AUTOMATIC MANSPLAINER I BUILT A FULLY AUTOMATIC MANSPLAINER

All information about GTC and the DGX Spark Raffle is here: https://www.ykilcher.com/gtc Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https://www.subscribestar.com/yannickilcher

Patreon: https://www.patreon.com/yannickilcher

Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq

Ethereu…

1 month, 1 week назад @ youtube.com
Traditional X-Mas Stream
Traditional X-Mas Stream Traditional X-Mas Stream

Letsgooo

3 months, 2 weeks назад @ youtube.com
Traditional Holiday Live Stream
Traditional Holiday Live Stream Traditional Holiday Live Stream

https://ykilcher.com/discord Links:

TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://discord.gg/4H8xxDF

BitChute: https://www.bitchute.com/channel/yannic-kilcher

Minds: https://www.minds.com/ykilcher

Parler: https://parler.com/profile/YannicKilcher

LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/

BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https:/…

3 months, 2 weeks назад @ youtube.com
TiDAR: Think in Diffusion, Talk in Autoregression (Paper Analysis)
TiDAR: Think in Diffusion, Talk in Autoregression (Paper Analysis) TiDAR: Think in Diffusion, Talk in Autoregression (Paper Analysis)

Paper: https://arxiv.org/abs/2511.08923 Abstract:

Diffusion language models hold the promise of fast parallel generation, while autoregressive (AR) models typically excel in quality due to their causal structure aligning naturally with language modeling. This raises a fundamental question: can we achieve a synergy with high throughput, higher GPU utilization, and AR level quality? Existing methods fail to effectively balance these two aspects, either prioritizing AR using a weaker model for sequential drafting (speculative decoding), leading to lower drafting efficiency, or using some form of left-to-right (AR-like) decoding logic for diffusion, which still suffers from quality degradation …

3 months, 2 weeks назад @ youtube.com
Titans: Learning to Memorize at Test Time (Paper Analysis)
Titans: Learning to Memorize at Test Time (Paper Analysis) Titans: Learning to Memorize at Test Time (Paper Analysis)

Paper: https://arxiv.org/abs/2501.00663 Abstract:

Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a new neural long-term memory module that learns to memorize historical context and helps attention to attend to the current context while utilizing long past information. We sh…

4 months назад @ youtube.com
[Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff)
[Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff) [Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff)

https://arxiv.org/abs/2510.17558 Abstract:

We propose an extension of the decoder Transformer that conditions its generative process on random latent variables which are learned without supervision thanks to a variational procedure. Experimental evaluations show that allowing such a conditioning translates into substantial improvements on downstream tasks. Author: François Fleuret Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the con…

5 months, 2 weeks назад @ youtube.com
[Video Response] What Cloudflare's code mode misses about MCP and tool calling
[Video Response] What Cloudflare's code mode misses about MCP and tool calling [Video Response] What Cloudflare's code mode misses about MCP and tool calling

Theo's Video: https://www.youtube.com/watch?v=bAYZjVAodoo

Cloudflare article: https://blog.cloudflare.com/code-mode/ Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https://www.subscribestar.com/yannickilcher

Patreon: https://www.patreon.com/yannickilcher

Bitcoin (BTC): bc1q49lsw3q325tr58ygf8…

5 months, 4 weeks назад @ youtube.com
[Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant)
[Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant) [Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant)

Paper: https://arxiv.org/abs/2508.21038 Abstract:

Vector embeddings have been tasked with an ever-increasing set of retrieval tasks over the years, with a nascent rise in using them for reasoning, instruction-following, coding, and more. These new benchmarks push embeddings to work for any query and any notion of relevance that could be given. While prior works have pointed out theoretical limitations of vector embeddings, there is a common assumption that these difficulties are exclusively due to unrealistic queries, and those that are not can be overcome with better training data and larger models. In this work, we demonstrate that we may encounter these theoretical limitations in realist…

6 months, 1 week назад @ youtube.com
AGI is not coming!
AGI is not coming! AGI is not coming!

jack Morris's investigation into GPT-OSS training data https://x.com/jxmnop/status/1953899426075816164?t=3YRhVQDwQLk2gouTSACoqA&s=09

8 months, 1 week назад @ youtube.com
Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)
Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis) Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

Paper: https://research.trychroma.com/context-rot Abstract:

Large Language Models (LLMs) are typically presumed to process context uniformly—that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks.

In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows. Authors: Kelly Hong, Anton Troynikov, Jeff Huber Links:

8 months, 3 weeks назад @ youtube.com
Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)
Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review) Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)

Paper: https://arxiv.org/abs/2507.02092

Code: https://github.com/alexiglad/EBT

Website: https://energy-based-transformers.github.io/ Abstract:

Inference-time computation techniques, analogous to human System 2 Thinking, have recently become popular for improving model performances. However, most existing approaches suffer from several limitations: they are modality-specific (e.g., working only in text), problem-specific (e.g., verifiable domains like math and coding), or require additional supervision/training on top of unsupervised pretraining (e.g., verifiers or verifiable rewards). In this paper, we ask the question "Is it possible to generalize these System 2 Thinking approaches, and de…

9 months назад @ youtube.com
Henry AI Labs Henry AI Labs
последний пост None
3blue1brown 3blue1brown
последний пост 7 часов назад
Covering 10 points, a surprisingly tricky puzzle.
Covering 10 points, a surprisingly tricky puzzle. Covering 10 points, a surprisingly tricky puzzle.

Made as part of a monthly series of puzzles for the 2026 Year of Math.

7 часов назад @ youtube.com
Escher's most mind-bending piece
Escher's most mind-bending piece Escher's most mind-bending piece

On "The Print Gallery", by M.C. Escher

Full video: https://youtu.be/ldxFjLJ3rVY

2 weeks, 6 days назад @ youtube.com
The subset sum puzzle
The subset sum puzzle The subset sum puzzle

Part of a series of monthly puzzlers. Stay subscribed to see the solution

3 weeks, 1 day назад @ youtube.com
Escher's most mathematically interesting piece
Escher's most mathematically interesting piece Escher's most mathematically interesting piece

Escher's Print Gallery, and the tour of complex analysis it invites.

Check out our virtual career fair: 3b1b.co/talent

Join channel supporters to see videos early: 3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Original paper by de Smit and Lenstra:

https://pub.math.leidenuniv.nl/~smitbde/papers/2003-de_smit-lenstra-escher.pdf Timestamps: 0:00 - The print gallery

13:04 - Conformal maps from complex analysis

21:41 - The complex exponential

25:56 - The complex logarithm

32:32 - 3b1b Talent

33:14 - Constructing the key function

40:16 - The deeper math behind Escher ------------------ These animations are largely made us…

3 weeks, 4 days назад @ youtube.com
Bacteria Grid Puzzle Solution
Bacteria Grid Puzzle Solution Bacteria Grid Puzzle Solution

Part of a monthly series of puzzlers, in collaboration with MoMath and Peter Winkler

3 weeks, 5 days назад @ youtube.com
The most underappreciated formula | Exploring high-dimensional spheres
The most underappreciated formula | Exploring high-dimensional spheres The most underappreciated formula | Exploring high-dimensional spheres

On the volumes of higher-dimensional spheres

Explore the 3b1b virtual career fair: See https://3b1b.co/talent

Become a supporter for early views of new videos: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Thanks to UC Santa Cruz for letting me film there, and special thanks to Pedro Morales-Almazan for arranging everything. My video on Numberphile with a fun application of this problem: https://youtu.be/6_yU9eJ0NxA Timestamps:

0:00 - Introduction

1:01 - Random puzzle

6:16 - Outside the box

14:35 - Setting up the volume grid

21:14 - Why 4πr^2

25:21 - Archimedes in higher dimensions

36:17 - The general formul…

1 month, 2 weeks назад @ youtube.com
The lattice bacteria puzzle
The lattice bacteria puzzle The lattice bacteria puzzle

Part of a series of monthly puzzles, done in collaboration with MoMath.

https://momath.org/mindbenders

1 month, 3 weeks назад @ youtube.com
Solution to the ladybug clock puzzle
Solution to the ladybug clock puzzle Solution to the ladybug clock puzzle

Solution to last month's probability puzzle.

1 month, 4 weeks назад @ youtube.com
The Hairy Ball Theorem
The Hairy Ball Theorem The Hairy Ball Theorem

Unexpected applications and a beautiful proof.

Looking for a new career? Check out https://3b1b.co/talent

Supporters get early access to new videos: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Credits:

Senia Sheydvasser: Co-writing and sphere deformation animations

Paul Dancstep: Those lovely fluffy sphere animations Vince Rubinetti: Music Timestamps:

0:00 - To comb a hairy ball

1:24 - Applications

8:46 - The puzzle of one null point

12:12 - The proof outline

16:41 - Defining orientation

21:44 - Why inside-out is impossible

25:59 - 3b1b Talent

27:44 - Final food for thought ------------------ These animati…

2 months, 2 weeks назад @ youtube.com
The ladybug clock puzzle
The ladybug clock puzzle The ladybug clock puzzle

This is the first in a set of monthly puzzles, curated by Peter Winkler. This one was originally suggested by Richard Stanley. You can sign up to hear his description of the answer at http://momath.org/mindbenders

3 months назад @ youtube.com
The most absurd product I've made
The most absurd product I've made The most absurd product I've made

Because why not make a pi creature neck pillow?

Available at 3b1b.co/store

4 months, 3 weeks назад @ youtube.com
How Laplace transforms solve differential equations
How Laplace transforms solve differential equations How Laplace transforms solve differential equations

Studying the forced harmonic oscillator by taking a Laplace transform and studying its poles.

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Chapter on the Laplace Transform:

https://youtu.be/j0wJBEZdwLs Chapter on the S-plane and Simple Harmonic Motion:

https://youtu.be/-j8PzkZ70Lg Timestamps:

0:00 - Opening puzzle

1:06 - Key properties of a Laplace Transform

3:29 - Qualitative analysis with Laplace Transforms

4:29 - The Laplace Transforms of a Derivative

6:06 - The forced oscillator

11:59 - Intuition from the transformed solution

1…

5 months, 1 week назад @ youtube.com
The dynamics of e^(πi)
The dynamics of e^(πi) The dynamics of e^(πi)

A fuller version of this explanation, also including the reason we care about complex exponents in the first place: https://youtu.be/-j8PzkZ70Lg

6 months назад @ youtube.com
But what is a Laplace Transform?
But what is a Laplace Transform? But what is a Laplace Transform?

Visualizing the most important tool for differential equations.

Previous chapter: https://youtu.be/-j8PzkZ70Lg

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Artwork by Kurt Bruns Engine animation borrowed with permission from this (excellent) blog: https://ciechanow.ski/internal-combustion-engine/ Timestamps:

0:00 - Understanding the engine

1:16 - Key background ideas

5:41 - Definition and intuition

10:43 - Complex integration

20:43 - Analytic continuation

23:52 - The transform of exponentials

26:15 - A deep look at cos(t)

32:59 - W…

6 months назад @ youtube.com
The dynamics of e^(πi)
The dynamics of e^(πi) The dynamics of e^(πi)

A fuller version of this explanation, also including the reason we care about complex exponents in the first place: https://youtu.be/-j8PzkZ70Lg

6 months, 1 week назад @ youtube.com
Two Minute Papers Two Minute Papers
последний пост 5 часов назад
DeepMind’s New AI: A Gift To Humanity
DeepMind’s New AI: A Gift To Humanity DeepMind’s New AI: A Gift To Humanity

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Links:

https://ai.google.dev/gemma/docs/core/model_card_4 Fine tuning with Matt Mireles: https://x.com/mattmireles/status/2041606508220489786 Other sources:

https://x.com/googlegemma/status/2041256042882105666?s=46

https://x.com/nakazakifam/status/2041286410930446370

https://x.com/measure_plan/status/2039815699695104343

https://x.com/maddiedreese/status/2041677327604838685?s=46

https://x.com/steipete/status/2042615534567457102?s=46

https://x.com/maziyarpanahi/status/2042592050940449260?s=46

https://x.com/adrgrondin/status/2041962263507083340?s=46

https://x.com/evgeniymikholap/status/2041104232648950170

https:…

5 часов назад @ youtube.com
“Anthropic’s New AI Is Too Dangerous To Release”
“Anthropic’s New AI Is Too Dangerous To Release” “Anthropic’s New AI Is Too Dangerous To Release”

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://www.anthropic.com/claude-mythos-preview-system-card Links and sources:

https://debugml.github.io/cheating-agents/

https://x.com/bstnxbt/status/2042967285715865685 Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, T…

2 days, 8 hours назад @ youtube.com
NVIDIA’s New AI: The Biggest Leap In Robot Learning Yet
NVIDIA’s New AI: The Biggest Leap In Robot Learning Yet NVIDIA’s New AI: The Biggest Leap In Robot Learning Yet

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers 📝 The paper is available here:

https://dreamdojo-world.github.io/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: https://felicia.hu …

5 days, 7 hours назад @ youtube.com
NVIDIA’s New AI: A Revolution...For Free!
NVIDIA’s New AI: A Revolution...For Free! NVIDIA’s New AI: A Revolution...For Free!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The #NVIDIA paper on Nemotron 3 Super is available here:

https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My …

1 week, 2 days назад @ youtube.com
Google New TurboQuant AI: Hype vs. Reality
Google New TurboQuant AI: Hype vs. Reality Google New TurboQuant AI: Hype vs. Reality

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The TurboQuant paper is available here:

https://arxiv.org/abs/2504.19874 Reproduction: https://x.com/AlicanKiraz0/status/2038245538865275274

KV-cache source: https://huggingface.co/blog/not-lain/kv-caching Reviews and criticisms of the paper:

https://openreview.net/forum?id=tO3ASKZlok

https://x.com/gaoj0017/status/2037532673812443214 Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fre…

2 weeks, 1 day назад @ youtube.com
DeepMind’s New AI Just Changed Science Forever
DeepMind’s New AI Just Changed Science Forever DeepMind’s New AI Just Changed Science Forever

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://arxiv.org/abs/2602.10177 Source:

https://www.youtube.com/watch?v=6evUpgCHtOQ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~z…

2 weeks, 6 days назад @ youtube.com
The Algorithm That Made Me Cry
The Algorithm That Made Me Cry The Algorithm That Made Me Cry

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Free course on Ray Tracing:

https://users.cg.tuwien.ac.at/zsolnai/gfx/rendering-course/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: ht…

3 weeks назад @ youtube.com
DeepSeek Just Fixed One Of The Biggest Problems With AI
DeepSeek Just Fixed One Of The Biggest Problems With AI DeepSeek Just Fixed One Of The Biggest Problems With AI

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The #DeepSeek paper is available here:

https://github.com/deepseek-ai/Engram

https://arxiv.org/abs/2601.07372 Larry Wheels:

https://www.youtube.com/watch?v=7SM816P5G9s&lc=Ugz7yiDrr_8YD7w8gaN4AaABAg Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Taz…

3 weeks, 2 days назад @ youtube.com
Honey Is Way More Complex Than You Think
Honey Is Way More Complex Than You Think Honey Is Way More Complex Than You Think

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://xuan-li.github.io/pdf/publications/li2024dynamicduo.pdf Sources:

https://www.youtube.com/watch?v=CfEg7fucVYg Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My rese…

1 month назад @ youtube.com
NVIDIA’s New AI Just Cracked The Hardest Part Of Self Driving
NVIDIA’s New AI Just Cracked The Hardest Part Of Self Driving NVIDIA’s New AI Just Cracked The Hardest Part Of Self Driving

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://github.com/NVlabs/alpamayo Research panel I will be at GTC:

https://www.nvidia.com/gtc/session-catalog/sessions/gtc26-s81810/ Sources:

https://www.youtube.com/watch?v=0aq4Wi2rsOk

https://www.youtube.com/watch?v=I0yPzZp6dM0 Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundv…

1 month, 1 week назад @ youtube.com
Most People Miss What Makes This Impossible
Most People Miss What Makes This Impossible Most People Miss What Makes This Impossible

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://www.geometry.caltech.edu/pubs/LD23.pdf Source:

https://www.youtube.com/watch?v=VIV7GYOBTfM Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Charles Ian Norman Venn, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Shawn Becker, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.t…

1 month, 1 week назад @ youtube.com
DeepMind’s New AI Tracks Objects Faster Than Your Brain
DeepMind’s New AI Tracks Objects Faster Than Your Brain DeepMind’s New AI Tracks Objects Faster Than Your Brain

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://d4rt-paper.github.io/ Our Gaussian Material Synthesis paper:

https://users.cg.tuwien.ac.at/zsolnai/gfx/gaussian-material-synthesis/ Tweet link: https://x.com/GoogleDeepMind/status/2014352808426807527 Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Steef, Taras Bobrovytsky, Taz…

1 month, 1 week назад @ youtube.com
Adobe & NVIDIA: 10,000,000 Sparkles At 280 FPS
Adobe & NVIDIA: 10,000,000 Sparkles At 280 FPS Adobe & NVIDIA: 10,000,000 Sparkles At 280 FPS

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://perso.telecom-paristech.fr/boubek/papers/Glinty/ Demo: https://www.shadertoy.com/view/tcdGDl Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/ #nvidia #adobe

1 month, 3 weeks назад @ youtube.com
The Impossible Physics Of Fire
The Impossible Physics Of Fire The Impossible Physics Of Fire

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers 📝 The paper is available here:

https://helgewrede.github.io/firex/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/

1 month, 3 weeks назад @ youtube.com
NVIDIA’s New AI Tells You When Photos Lie
NVIDIA’s New AI Tells You When Photos Lie NVIDIA’s New AI Tells You When Photos Lie

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here: https://research.nvidia.com/labs/sil/projects/ppisp/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/ #nvidia

2 months назад @ youtube.com
DataFest Video DataFest Video
последний пост None
Семинары JetBrains Research Семинары JetBrains Research
последний пост None
Яндекс. Компьютерные науки Яндекс. Компьютерные науки
последний пост 3 days, 12 hours назад
Где грань между советом и манипуляцией?
Где грань между советом и манипуляцией? Где грань между советом и манипуляцией?

Любой совет — это в какой-то степени манипуляция. Всё зависит от цели: вы пытаетесь что-то продать или действительно хотите помочь пользователю? Отвечает Даня Бурлаков, руководитель группы рекомендательных продуктов. В этом шортсе говорим об этике рекомендательных систем, о том, где проходит эта тонкая грань и почему честность важнее. А вы чувствуете, когда алгоритм пытается вами манипулировать? Делитесь в комментариях 👇 Смотрите полное видео на канале! #совет #манипуляция #этика #рекомендательныесистемы #искусственныйинтеллект #яндекс #машинноеобучение #ml #этикаии

3 days, 12 hours назад @ youtube.com
Влияют ли рекомендации на наше поведение?
Влияют ли рекомендации на наше поведение? Влияют ли рекомендации на наше поведение?

Рекомендательные системы — это не просто алгоритмы. Они влияют на наше настроение, а через него — на многое другое. В этом шортсе разбираемся, как музыка и другие рекомендации меняют пользователей и почему одна из наших целей — делать людей счастливее. Рассказывает Даня Бурлаков, руководитель группы рекомендательных продуктов. Смотрите полное видео на канале! А как вы считаете, рекомендации влияют на ваше настроение? Пишите в комментариях 👇 #влияниерекомендаций #поведениепользователей #настроение #рекомендательныесистемы #яндексмузыка #психология #машинноеобучение #ml #какэтоработает

5 days, 12 hours назад @ youtube.com
Почему рекомендации замыкаются на знакомом контенте?
Почему рекомендации замыкаются на знакомом контенте? Почему рекомендации замыкаются на знакомом контенте?

Что сделать, чтобы рекомендации не надоели пользователю? Как находить музыкальные треки и фильмы, которые будут расширять его интересы? Рассказывает Даня Бурлаков, руководитель группы рекомендательных продуктов. Смотрите полное видео на канале! #эффектпузыря #фильтрпузыря #разнообразие #рекомендательныесистемы #яндексмузыка #машинноеобучение #трансформеры #ml #новоемузыка

1 week, 1 day назад @ youtube.com
«Я хочу всё выбросить и сделать заново» — почему это так больно
«Я хочу всё выбросить и сделать заново» — почему это так больно «Я хочу всё выбросить и сделать заново» — почему это так больно

Даня Бурлаков, руководитель группы рекомендательных продуктов, рассказал, что мешает внедрить ML-модели в привычный флоу кандидат-генераций и ранкеров. Смотрите полное видео на канале! #яндекс #генеративныемодели #генеративныйии #рекомендательныесистемы #машинноеобучение #ml #argus #catboost #внедрение #продакшен #mldevelopment

1 week, 4 days назад @ youtube.com
Заметны ли изменения в рекомендациях Яндекс Музыки?
Заметны ли изменения в рекомендациях Яндекс Музыки? Заметны ли изменения в рекомендациях Яндекс Музыки?

Даня Бурлаков, руководитель группы рекомендательных продуктов, рассказал, как команда внедряет новые фичи и какой эффект они приносят. Смотрите полное видео на канале! #яндекс #яндексмузыка #рекомендации #рекомендательныесистемы #машинноеобучение #ml #аргус #argus #трансформеры #ранжирование

2 weeks, 1 day назад @ youtube.com
Визуально-текстовая омни-модель: путь к объединению LLM и VLM / Роман Исаченко
Визуально-текстовая омни-модель: путь к объединению LLM и VLM / Роман Исаченко Визуально-текстовая омни-модель: путь к объединению LLM и VLM / Роман Исаченко

На Saturday ML Party Роман Исаченко, руководитель группы анализа изображений в Яндекс R&D, рассказал, как выглядел долгий путь к сведению LLM и VLM из части семейства Alice AI в единую омни-модель. Она умеет работать с текстом и изображениями в одном контуре. А ещё поделился ключевыми этапами, компромиссами и планами по развитию модели в ближайшем будущем. ➡️ Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/+owyCvdge8WIyNTUy #AI, #MachineLearning, #LLM, #GenAI, #AIAgents, #RAG, #MLOps, #DataScience, #DeepLearning, #AIEngineering, #NeuralNetworks, #ComputerVision, #NLP, #TechTalk, #AIConference

2 weeks, 2 days назад @ youtube.com
Function calling без реальных данных / Ольга Цымбой и Рамиль Латыпов
Function calling без реальных данных / Ольга Цымбой и Рамиль Латыпов Function calling без реальных данных / Ольга Цымбой и Рамиль Латыпов

Обучение языковых моделей взаимодействовать с инструментами упирается в дефицит данных. Открытые датасеты ограничены по тематикам, содержат мало сложных сценариев и практически не встречаются на русском языке. На Saturday ML Party коллеги из Т-Банка Ольга Цымбой, старший исследователь-разработчик, и Рамиль Латыпов, исследователь-разработчик, рассказали, как они построили полностью синтетический пайплайн генерации function calling данных. А также разобрали шаги обучения и показали, как этот подход позволил прирастить качество на специализированных бенчмарках. ➡️ Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/+owyCvdge8WIyNTUy #AI, #MachineLearning, #LLM, #GenAI, #AI…

2 weeks, 2 days назад @ youtube.com
LLM в рекомендациях: теперь мы знаем почти всё о стиле жизни и вкусах покупателя / Владислав Уржумов
LLM в рекомендациях: теперь мы знаем почти всё о стиле жизни и вкусах покупателя / Владислав Уржумов LLM в рекомендациях: теперь мы знаем почти всё о стиле жизни и вкусах покупателя / Владислав Уржумов

Как узнать интересы каждого покупателя, если есть только история действий на Маркете и немного метаинформации? На Saturday ML Party Владислав Уржумов, разработчик группы анализа данных и ML для рекомендаций в Яндекс Маркете, рассказал про вариант такого подхода от коллег из китайского маркетплейса. Внутри: как удалось адаптировать метод под наши данные и пользователей, а заодно прирастить метрики. ➡️ Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/+owyCvdge8WIyNTUy #ai , #MachineLearning, #LLM, #GenAI, #AIAgents, #RAG, #MLOps, #DataScience, #DeepLearning, #AIEngineering, #NeuralNetworks, #ComputerVision, #NLP, #TechTalk, #aiconference

2 weeks, 2 days назад @ youtube.com
RAG-системы сегодня: архитектуры, качество и наши кейсы / Андрей Соколов
RAG-системы сегодня: архитектуры, качество и наши кейсы / Андрей Соколов RAG-системы сегодня: архитектуры, качество и наши кейсы / Андрей Соколов

RAG заметно изменилась — теперь это полноценная система с разными уровнями обработки, оценкой качества и настройкой моделей под задачу, а не простая связка поиска и генерации. На Saturday ML Party Андрей Соколов, руководитель команды обучения моделей с внешним контекстом в Яндекс R&D, поделился продуктовыми кейсами и разобрал на их примере, как такие системы устроены сегодня, какие подходы работают на практике и когда стоит дообучать модель. ➡️ Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/+owyCvdge8WIyNTUy #ai #MachineLearning, #LLM, #GenAI, #AIAgents, #RAG, #MLOps, #DataScience, #DeepLearning, #AIEngineering, #NeuralNetworks, #ComputerVision, #NLP, #TechTalk, #A…

2 weeks, 3 days назад @ youtube.com
Мультиагентные системы банковского сектора / Артём Хусаенов
Мультиагентные системы банковского сектора / Артём Хусаенов Мультиагентные системы банковского сектора / Артём Хусаенов

На Saturday ML Party Артём Хусаенов, CDS платформы цифровых ассистентов в Сбербанке, объяснил, как замерить качество агентов там, где автоматические оценки не дают прозрачного результата, LLM-as-a-judge не работает, а аутсорс-разметка по инструкциям не отображает реальности. А также рассказал, как перестроить продукт без потери пользовательского опыта. ➡️ Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/+owyCvdge8WIyNTUy #ai, #MachineLearning, #LLM, #GenAI, #AIAgents, #RAG, #MLOps, #DataScience, #DeepLearning, #AIEngineering, #NeuralNetworks, #ComputerVision, #NLP, #TechTalk, #AIConference

2 weeks, 3 days назад @ youtube.com
MarketAI для продавцов: с нуля до мультиагентной системы / Владислав Вихров
MarketAI для продавцов: с нуля до мультиагентной системы / Владислав Вихров MarketAI для продавцов: с нуля до мультиагентной системы / Владислав Вихров

В мультиагентных системах мы экипируем LLM мощными инструментами, даём моделям возможность вызывать различные функции и общаться между собой. Но не всё так просто: на практике данных не хватает, запрос пользователя не всегда очевиден, а похожих кейсов мало. На Saturday ML Party Владислав Вихров, ML-разработчик в Яндекс Маркете, разобрал все эти сложности и поделился конкретными решениями, которые позволили их преодолеть. ➡️ Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/+owyCvdge8WIyNTUy #ai, #machinelearning , #LLM, #GenAI, #AIAgents, #RAG, #MLOps, #DataScience, #DeepLearning, #AIEngineering, #NeuralNetworks, #ComputerVision, #NLP, #TechTalk, #AIConference

2 weeks, 3 days назад @ youtube.com
Почему ваши рекомендации всё ещё не идеальны?
Почему ваши рекомендации всё ещё не идеальны? Почему ваши рекомендации всё ещё не идеальны?

Даня Бурлаков, руководитель группы рекомендательных продуктов, рассказал, как инженеры Яндекса исследуют генеративные модели (и при чём тут ARGUS). Смотрите полное видео на канале! #shorts #ml #ai #datascience #рекомендательныесистемы #нейросети

2 weeks, 4 days назад @ youtube.com
Эволюция ML-моделей в рекомендательных системах
Эволюция ML-моделей в рекомендательных системах Эволюция ML-моделей в рекомендательных системах

Как развиваются рекомендательные системы и к чему они должны прийти? Рассказывает Даня Бурлаков, руководитель группы рекомендательных продуктов. Смотрите полное видео на канале! #shorts #ml #ai #datascience #рекомендательныесистемы #нейросети

3 weeks, 1 day назад @ youtube.com
Как ARGUS кратно ускоряет время экспериментов
Как ARGUS кратно ускоряет время экспериментов Как ARGUS кратно ускоряет время экспериментов

ARGUS — новая архитектура для ML-моделей в рекомендательных системах. О ней рассказывает Даня Бурлаков, руководитель группы рекомендательных продуктов. Смотрите полное видео на канале! #shorts #ml #ai #datascience #рекомендательныесистемы #нейросети

3 weeks, 4 days назад @ youtube.com
Какие методы мы используем в рекомендательных системах
Какие методы мы используем в рекомендательных системах Какие методы мы используем в рекомендательных системах

Аналогия с другими пользователями, статистика, матричные факторизации. Об этих и других способах предлагать контент пользователям рассказал Даня Бурлаков, руководитель группы рекомендательных продуктов. Смотрите полное видео на канале! #shorts #ml #ai #datascience #рекомендательныесистемы #нейросети

4 weeks, 1 day назад @ youtube.com
ML Trainings ML Trainings
последний пост 2 days, 13 hours назад
Изменение ожиданий от ии за 4 года
Изменение ожиданий от ии за 4 года Изменение ожиданий от ии за 4 года 2 days, 13 hours назад @ youtube.com
Квантовые компьютеры и дыры в безопасности
Квантовые компьютеры и дыры в безопасности Квантовые компьютеры и дыры в безопасности 2 days, 13 hours назад @ youtube.com
Nvidia создала ии, чтобы чиновники перестали умирать
Nvidia создала ии, чтобы чиновники перестали умирать Nvidia создала ии, чтобы чиновники перестали умирать 2 days, 13 hours назад @ youtube.com
Игрушки с интернетом и страхи родителей
Игрушки с интернетом и страхи родителей Игрушки с интернетом и страхи родителей 2 days, 13 hours назад @ youtube.com
Кибербезопасность в действии как проверяют системы
Кибербезопасность в действии как проверяют системы Кибербезопасность в действии как проверяют системы 2 days, 13 hours назад @ youtube.com
Традиции подношений предкам в Китае
Традиции подношений предкам в Китае Традиции подношений предкам в Китае 2 days, 13 hours назад @ youtube.com
Капитанский мостик №14: OpenAI против Маска | Anthropic проиграл суд | Claude и игрушки
Капитанский мостик №14: OpenAI против Маска | Anthropic проиграл суд | Claude и игрушки Капитанский мостик №14: OpenAI против Маска | Anthropic проиграл суд | Claude и игрушки

0:00:00 Начало

0:01:25 OpenAI против Маска

0:04:32 Цукерберг и открытые модели

0:11:03 Qwen и закрытые модели

0:25:56 Американцы против DeepSeek

0:33:12 Anthropic проиграл суд

0:38:46 ВТБ и китайские GPU

0:46:00 DeepSeek на железе Huawei

0:53:18 Японский ИИ для чиновников

0:58:37 OpenAI и общество

1:11:28 Claude Mythos

1:20:25 Claude и игрушки ИИ-саммари:

Обсуждение последних новостей в сфере искусственного интеллекта, включая конфликты между крупными компаниями, изменения в политике открытости моделей и влияние геополитики на развитие технологий. Обсуждение последних трендов в области искусственного интеллекта, санкций, китайских решений и социальной ответственности в Японии. Эксперты деля…

3 days, 16 hours назад @ youtube.com
Код в Китае: переписывание и независимость
Код в Китае: переписывание и независимость Код в Китае: переписывание и независимость 1 week, 3 days назад @ youtube.com
Дмитрий Колодезев рассказывает о корпоративных use-кейсов
Дмитрий Колодезев рассказывает о корпоративных use-кейсов Дмитрий Колодезев рассказывает о корпоративных use-кейсов 1 week, 3 days назад @ youtube.com
Дмитрий и Валентин обсуждают слабых юристов Perplexity
Дмитрий и Валентин обсуждают слабых юристов Perplexity Дмитрий и Валентин обсуждают слабых юристов Perplexity 1 week, 3 days назад @ youtube.com
Капитанский мостик №13: Claude утёк | SoftBank должен OpenAI | на завод вместо роботов
Капитанский мостик №13: Claude утёк | SoftBank должен OpenAI | на завод вместо роботов Капитанский мостик №13: Claude утёк | SoftBank должен OpenAI | на завод вместо роботов

00:00:00 начало

00:00:47 Claude утёк

00:12:18 NVIDIA и дешевая память

00:15:52 OpenAI и дорогая память

00:20:46 00:00:00 Perplexity оштрафовали

00:29:16 SpaceX продвигает Grok

00:38:33 тепло от дата-центров

00:51:39 SoftBank должен OpenAI

00:58:07 Claude против OpenClaw

01:03:22 миллиард $ в год

01:08:45 китайцы делают GPU

01:13:10 на завод вместо роботов ИИ-саммари:

В этом выпуске мы обсуждаем последние новости в области технологий, включая утечку исходного кода Anthropic, развитие open source, безопасность в AI, а также влияние крупных компаний на рынок памяти и возможные последствия для индустрии. Обсуждение современных технологий, вопросов приватности данных, влияния дата-центров на кли…

1 week, 4 days назад @ youtube.com
Прогноз AI 2027: почему стоит обратить внимание
Прогноз AI 2027: почему стоит обратить внимание Прогноз AI 2027: почему стоит обратить внимание 2 weeks, 3 days назад @ youtube.com
Марк Цукерберг и его агент: как работает CEO организации
Марк Цукерберг и его агент: как работает CEO организации Марк Цукерберг и его агент: как работает CEO организации 2 weeks, 3 days назад @ youtube.com
Дмитрий и Валентин обсуждают усталость общества от боязни терминаторов
Дмитрий и Валентин обсуждают усталость общества от боязни терминаторов Дмитрий и Валентин обсуждают усталость общества от боязни терминаторов 2 weeks, 3 days назад @ youtube.com
Датацентры и экология: мнение эксперта
Датацентры и экология: мнение эксперта Датацентры и экология: мнение эксперта 2 weeks, 3 days назад @ youtube.com
Primer Primer
последний пост 3 months, 2 weeks назад
Taking AI Doom Seriously For 62 Minutes
Taking AI Doom Seriously For 62 Minutes Taking AI Doom Seriously For 62 Minutes

Patreon: https://www.patreon.com/primerlearning

80,000 Hours: 80000hours.org/primer https://www.desmos.com/calculator/a5pfjtr4tr Other connections:

Discord: https://discord.gg/NbruaNW

Twitch: https://www.twitch.tv/justin_helps

Store: https://store.dftba.com/collections/primer Reddit: https://www.reddit.com/r/primerlearning/

Bsky: https://bsky.app/profile/justinhelps.bsky.social

Twitter: https://twitter.com/primerlearning Links to other resources:

https://yoshuabengio.org/2024/07/09/reasoning-through-arguments-against-taking-ai-safety-seriously/

https://www.youtube.com/c/robertmilesai

https://www.youtube.com/@Siliconversations

https://www.youtube.com/@Go-Meta

https://www.youtube.com/@Dwarkes…

3 months, 2 weeks назад @ youtube.com
Simulating a single brain cell
Simulating a single brain cell Simulating a single brain cell

Patreon:

https://www.patreon.com/primerlearning Helpful resources if you want to learn more about neural networks

https://www.youtube.com/@AndrejKarpathy

https://course.fast.ai/

https://www.youtube.com/@WelchLabsVideo

https://www.youtube.com/@3blue1brown Early papers. These probably aren't helpful for understanding the concepts in this video, but if you're interested in history.

The Perceptron – A perceiving and recognizing automaton: https://bpb-us-e2.wpmucdn.com/websites.umass.edu/dist/a/27637/files/2016/03/rosenblatt-1957.pdf

The Perceptron: A probabilistic model for information storage and organization in the brain: https://www.ling.upenn.edu/courses/cogs501/Rosenblatt1958.pdf A Logical…

6 months, 3 weeks назад @ youtube.com
🎧 Podcasts
Lex Fridman AI Podcast Lex Fridman AI Podcast
последний пост 1 week назад
#495 – Vikings, Ragnar, Berserkers, Valhalla & the Warriors of the Viking Age
#495 – Vikings, Ragnar, Berserkers, Valhalla & the Warriors of the Viking Age #495 – Vikings, Ragnar, Berserkers, Valhalla & the Warriors of the Viking Age

Lars Brownworth is a historian, teacher, podcaster, and author specializing in Viking history, medieval Europe, and the Byzantine Empire.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep495-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://larridin.comBetterHelp: Online therapy and counseling.

Go to https://drinkLMNT.com/lexFin: AI agent for customer service.

Go to https://perplexity.ai/OUTLINE:(00:00) – Introduction(01:03) – Sponsors, Comments, and Reflections(08:57) – The start of the Viking Age(18:50) – Viking military strategy, tactics & technology(32:33) – Ragnar Lothbrok(42:00) – The Grea…

1 week назад @ lexfridman.com
#494 – Jensen Huang: NVIDIA – The $4 Trillion Company & the AI Revolution
#494 – Jensen Huang: NVIDIA – The $4 Trillion Company & the AI Revolution #494 – Jensen Huang: NVIDIA – The $4 Trillion Company & the AI Revolution

Jensen Huang is the co-founder and CEO of NVIDIA, the world’s most valuable company and the engine powering the AI computing revolution.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep494-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://drinkLMNT.com/lexFin: AI agent for customer service.

Go to https://quo.com/lexOUTLINE:(00:00) – Introduction(00:26) – Sponsors, Comments, and Reflections(06:34) – Extreme co-design and rack-scale engineering(09:20) – How Jensen runs NVIDIA(28:41) – AI scaling laws(43:41) – Biggest blockers to AI scaling laws(45:25) – Supply chain(47:20) – Memory(53:25) – Power…

3 weeks, 3 days назад @ lexfridman.com
#493 – Jeff Kaplan: World of Warcraft, Overwatch, Blizzard, and Future of Gaming
#493 – Jeff Kaplan: World of Warcraft, Overwatch, Blizzard, and Future of Gaming #493 – Jeff Kaplan: World of Warcraft, Overwatch, Blizzard, and Future of Gaming

Jeff Kaplan is a legendary Blizzard game designer of World of Warcraft and Overwatch, now preparing to launch a new game, The Legend of California, from his new studio Kintsugiyama – available to wishlist on Steam today, with alpha later in March.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep493-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://fin.ai/lexBlitzy: AI agent for large enterprise codebases.

Go to https://blitzy.com/lexBetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lexShopify: Sell stuff online.

1 month назад @ lexfridman.com
#492 – Rick Beato: Greatest Guitarists of All Time, History & Future of Music
#492 – Rick Beato: Greatest Guitarists of All Time, History & Future of Music #492 – Rick Beato: Greatest Guitarists of All Time, History & Future of Music

Rick Beato is a music educator, interviewer, producer, songwriter, and a true multi-instrument musician, playing guitar, bass, cello & piano.

His incredible YouTube channel celebrates great musicians & musical ideas, and helps millions of people fall in love with great music all over again.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep492-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lexBetterHelp: Online therapy and counseling.

Go to https://drinkLMNT.com/lexFin: AI agent for customer service.

1 month, 2 weeks назад @ lexfridman.com
#491 – OpenClaw: The Viral AI Agent that Broke the Internet – Peter Steinberger
#491 – OpenClaw: The Viral AI Agent that Broke the Internet – Peter Steinberger #491 – OpenClaw: The Viral AI Agent that Broke the Internet – Peter Steinberger

Peter Steinberger is the creator of OpenClaw, an open-source AI agent framework that’s the fastest-growing project in GitHub history.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep491-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://coderabbit.ai/lexFin: AI agent for customer service.

Go to https://fin.ai/lexBlitzy: AI agent for large enterprise codebases.

Go to https://drinkLMNT.com/lexOUTLINE:(00:00) – Introduction(03:51) – Sponsors, Comments, and Reflections(15:29) – OpenClaw origin story(18:48) – Mind-blowing moment(28:15) – Why OpenClaw went viral(32:12) – Self-modifying AI agent(36:57)…

2 months назад @ lexfridman.com
#490 – State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI
#490 – State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI #490 – State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI

Nathan Lambert and Sebastian Raschka are machine learning researchers, engineers, and educators.

Sebastian Raschka is the author of Build a Large Language Model (From Scratch) and Build a Reasoning Model (From Scratch).

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep490-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

(25:11) – ChatGPT vs Claude vs Gemini vs Grok: Who is winning?

(36:11) – Best AI for coding(43:02) – Open Source vs Closed Source LLMs(54:41) – Transformers: Evolution of LLMs since 2019(1:02:38) – AI Scaling Laws: Are they dead or still holding?

2 months, 2 weeks назад @ lexfridman.com
#489 – Paul Rosolie: Uncontacted Tribes in the Amazon Jungle
#489 – Paul Rosolie: Uncontacted Tribes in the Amazon Jungle #489 – Paul Rosolie: Uncontacted Tribes in the Amazon Jungle

Paul Rosolie is a naturalist, explorer, author of a new book titled Junglekeeper, and is someone who has dedicated his life to protecting the Amazon rainforest.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep489-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://perplexity.ai/BetterHelp: Online therapy and counseling.

Go to https://fin.ai/lexMiro: Online collaborative whiteboard platform.

Go to https://miro.com/MasterClass: Online classes from world-class experts.

3 months назад @ lexfridman.com
#488 – Infinity, Paradoxes that Broke Mathematics, Gödel Incompleteness & the Multiverse – Joel David Hamkins
#488 – Infinity, Paradoxes that Broke Mathematics, Gödel Incompleteness & the Multiverse – Joel David Hamkins #488 – Infinity, Paradoxes that Broke Mathematics, Gödel Incompleteness & the Multiverse – Joel David Hamkins

Joel David Hamkins is a mathematician and philosopher specializing in set theory, the foundations of mathematics, and the nature of infinity, and he’s the #1 highest-rated user on MathOverflow.

He is also the author of several books, including Proof and the Art of Mathematics and Lectures on the Philosophy of Mathematics.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep488-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://masterclass.com/lexpodOUTLINE:(00:00) – Introduction(01:58) – Sponsors, Comments, and Reflections(15:40) – Infinity & paradoxes(1:02:50) – Russell’s paradox(1:15:57) – Gödel’s…

3 months, 2 weeks назад @ lexfridman.com
#487 – Irving Finkel: Deciphering Secrets of Ancient Civilizations & Flood Myths
#487 – Irving Finkel: Deciphering Secrets of Ancient Civilizations & Flood Myths #487 – Irving Finkel: Deciphering Secrets of Ancient Civilizations & Flood Myths

Irving Finkel is a scholar of ancient languages and a longtime curator at the British Museum, renowned for his expertise in Mesopotamian history and cuneiform writing.

He specializes in reading and interpreting cuneiform inscriptions, including tablets from Sumerian, Akkadian, Babylonian, and Assyrian contexts.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep487-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lexMiro: Online collaborative whiteboard platform.

Go to https://miro.com/Chevron: Reliable energy for data centers.

4 months назад @ lexfridman.com
#486 – Michael Levin: Hidden Reality of Alien Intelligence & Biological Life
#486 – Michael Levin: Hidden Reality of Alien Intelligence & Biological Life #486 – Michael Levin: Hidden Reality of Alien Intelligence & Biological Life

Michael Levin is a biologist at Tufts University working on novel ways to understand and control complex pattern formation in biological systems.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep486-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lexMiro: Online collaborative whiteboard platform.

Go to https://miro.com/MasterClass: Online classes from world-class experts.

(2:42:41) – Mind uploading(3:01:22) – Alien intelligence(3:16:17) – Advice for young people(3:22:46) – Questions for AGI

4 months, 2 weeks назад @ lexfridman.com
#485 – David Kirtley: Nuclear Fusion, Plasma Physics, and the Future of Energy
#485 – David Kirtley: Nuclear Fusion, Plasma Physics, and the Future of Energy #485 – David Kirtley: Nuclear Fusion, Plasma Physics, and the Future of Energy

David Kirtley is a nuclear fusion engineer and CEO of Helion Energy, a company working on building the world's first commercial fusion power plant by 2028.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep485-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc. Transcript:

https://lexfridman.com/david-kirtley-transcript CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: https://lexfridman.com/contact EPISODE LINKS:

David's X: htt…

5 months назад @ lexfridman.com
#484 – Dan Houser: GTA, Red Dead Redemption, Rockstar, Absurd & Future of Gaming
#484 – Dan Houser: GTA, Red Dead Redemption, Rockstar, Absurd & Future of Gaming #484 – Dan Houser: GTA, Red Dead Redemption, Rockstar, Absurd & Future of Gaming

Dan Houser is co-founder of Rockstar Games and is a legendary creative mind behind Grand Theft Auto (GTA) and Red Dead Redemption series of video games.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep484-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://box.com/aiUPLIFT Desk: Standing desks and office ergonomics.

Go to https://drinkLMNT.com/lexOUTLINE:(00:00) – Introduction(01:29) – Sponsors, Comments, and Reflections(11:32) – Greatest films of all time(23:45) – Making video games(26:36) – GTA 3(29:55) – Open world video games(32:42) – Character creation(36:09) – Superintelligent AI in A Bette…

5 months, 2 weeks назад @ lexfridman.com
#483 – Julia Shaw: Criminal Psychology of Murder, Serial Killers, Memory & Sex
#483 – Julia Shaw: Criminal Psychology of Murder, Serial Killers, Memory & Sex #483 – Julia Shaw: Criminal Psychology of Murder, Serial Killers, Memory & Sex

Julia Shaw is a criminal psychologist and author who in her books explores human nature, including psychopathy, violent crime, the psychology of evil, police interrogation, false memory manipulation, deception detection, and human sexuality.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep483-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lexBetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lexLMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lexAG1: All-in-one daily nutrition drink.

6 months назад @ lexfridman.com
#482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature
#482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature #482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature

Pavel Durov is the founder and CEO of Telegram.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep482-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Transcript:https://lexfridman.com/pavel-durov-transcriptCONTACT LEX:Feedback – give feedback to Lex: https://lexfridman.com/surveyAMA – submit questions, videos or call-in: https://lexfridman.com/amaHiring – join our team: https://lexfridman.com/hiringOther – other ways to get in touch: https://lexfridman.com/contactEPISODE LINKS:Pavel’s Telegram: https://t.me/durovPavel’s X: https://x.com/durovTelegram: https://telegram.org/Telegram Contests: https://contest.c…

6 months, 2 weeks назад @ lexfridman.com
#481 – Norman Ohler: Hitler, Nazis, Drugs, WW2, Blitzkrieg, LSD, MKUltra & CIA
#481 – Norman Ohler: Hitler, Nazis, Drugs, WW2, Blitzkrieg, LSD, MKUltra & CIA #481 – Norman Ohler: Hitler, Nazis, Drugs, WW2, Blitzkrieg, LSD, MKUltra & CIA

Norman Ohler is a historian and author of “Blitzed: Drugs in the Third Reich,” a book that investigates the role of psychoactive drugs, particularly stimulants such as methamphetamine, in the military history of World War II.

It is a book that two legendary historians Ian Kershaw and Antony Beevor give very high praise for its depth of research.

Norman also wrote “Tripped: Nazi Germany, the CIA, and the Dawn of the Psychedelic Age”, and he is working on a new book “Stoned Sapiens” looking at the history of human civilization through the lens of drugs.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep481-scSee below for timestamps, transcript, and to give f…

6 months, 4 weeks назад @ lexfridman.com
Microsoft Research Podcast Microsoft Research Podcast
последний пост 1 week назад
Ideas: Steering AI toward the work future we want
Ideas: Steering AI toward the work future we want Ideas: Steering AI toward the work future we want

JANSSEN: Yeah, yeah, exactly.

TEEVAN: Yeah, yeah, yeah.

I’m curious what you have found particularly surprising about how people and organizations are leveraging AI right now.

And so I do like to picture a future of work where humans are flourishing with AI and where humans still get to do meaningful work.

And I’m very curious about how we can take advantage of AI and do more without running ourselves into the ground because we’re not AI, right?

1 week назад @ microsoft.com
Will machines ever be intelligent?
Will machines ever be intelligent? Will machines ever be intelligent?

And the question we’re going to discuss is, are machines intelligent?

No, no, that’s right, that’s right.

I mean, in some sense, you could potentially have a super intelligent system, right, that’s far more intelligent than anything else on the planet.

BURGER: Right, right.

At the same time, I think, you know, transformers are not intelligent in the way that a three-year-old is, right?

3 weeks, 3 days назад @ microsoft.com
Trailer: The Shape of Things to Come
Trailer: The Shape of Things to Come Trailer: The Shape of Things to Come

Join Microsoft’s Doug Burger and guests as they dig into the fundamental truths about AI and how it will reshape the future.

Technical advances are moving at such a rapid pace that it can be challenging to define the tomorrow we’re working toward.

In The Shape of Things to Come, Microsoft research leader Doug Burger and experts from across disciplines tease out the thorniest AI issues facing technologists, policymakers, business decision-makers, and other stakeholders today.

It’s important to understand what the emerging shapes are and how we should respond.” – Doug Burger, Technical Fellow and Corporate Vice President, Microsoft ResearchAbout Doug BurgerDoug Burger is a research leader in …

1 month, 2 weeks назад @ microsoft.com
Ideas: Community building, machine learning, and the future of AI
Ideas: Community building, machine learning, and the future of AI Ideas: Community building, machine learning, and the future of AI

This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS.

In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

So around that time when I started my PhD at Penn, I was working in machine learning theory and algorithmic economics.

How had you experienced a lack of community or network of women in machine learning before the founding of WiML?

So particularly when working on topics related to fairness, I’ve ended up focusing a bunch on stuff to do with marginalized groups as part of my responsible AI work.

4 months, 2 weeks назад @ microsoft.com
Ideas: More AI-resilient biosecurity with the Paraphrase Project
Ideas: More AI-resilient biosecurity with the Paraphrase Project Ideas: More AI-resilient biosecurity with the Paraphrase Project

Today, I’m excited to talk about the Paraphrase Project, an effort I co-led exploring how advances in AI tools for protein design might impact biosecurity.

These “patches,” akin to those in cybersecurity, have now been shared with organizations globally to strengthen biosecurity screening.

The project highlights that the same AI tools capable of incredible good can also be misused, requiring us to be vigilant, thoughtful, and creative so we continue to get the most benefit out of AI tools while working to ensure that we avoid costly misuses.

So things like, how similar is this to that template, wild-type protein structure that we used as our conditioning information?

But I feel like broadly…

6 months, 1 week назад @ microsoft.com
Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education
Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education

KOHANE: So I think you’ve “nerd sniped” me because you [LAUGHTER]—which is all too easy—but I think there’s a central issue here.

But I actually think this is dark matter of human organizational technology that is not well understood.

AZEEM AZHAR: We didn’t talk about, you know, AI in its ability to potentially do this, which is to extend the clinician’s presence throughout the week.

And so I think there’s always going to be an opening for either differences of opinion or agreeing with you too much.

And this gets into whether AI is really going to get almost to the ab initio understanding of human biology.

7 months, 4 weeks назад @ microsoft.com
Reimagining healthcare delivery and public health with AI
Reimagining healthcare delivery and public health with AI Reimagining healthcare delivery and public health with AI

We are sorry, the page you requested cannot be found.

The page you are looking for could not be found or is no longer available.

8 months, 1 week назад @ microsoft.com
Navigating medical education in the era of generative AI
Navigating medical education in the era of generative AI Navigating medical education in the era of generative AI

Prior to med school, Daniel pursued experiences that cultivated his interest in the application of AI in medical practice and education.

Really, really looking forward to this chat.

There’s AI before ChatGPT and before, you know, generative AI really became a big thing, and then afterwards.

And then after we talk about what’s really happening, what do you think should happen in medical education given the reality of generative AI?

And I do agree [that] AI really gives us real hope that we can make it true.

8 months, 3 weeks назад @ microsoft.com
AI Testing and Evaluation: Reflections
AI Testing and Evaluation: Reflections AI Testing and Evaluation: Reflections

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

We have examples, like the pharmaceutical or medical device industry experts with whom you spoke, that’s really, you know, testing … there is a pre-deployment requirement.

And the third is just how rigid versus adaptive these testing and evaluation regimes or frameworks are in these different domains.

I really agree that there has been a lot of emphasis to date on, sort of, testing models upstream, the AI model evaluation.

You know, I think there’s been real progress already in the AI evaluation and testing ecosystem in the public-private partnership context.

8 months, 4 weeks назад @ microsoft.com
AI Testing and Evaluation: Learnings from cybersecurity
AI Testing and Evaluation: Learnings from cybersecurity AI Testing and Evaluation: Learnings from cybersecurity

Absolutely, I really, really was.

As a principal director on the Microsoft AI Red Team, Tori leads all AI security and safety red team operations, as well as dangerous capability testing, to directly inform C-suite decision-makers.

This year, we’ve pulled a lot of those assets and insights into the Azure [AI] Foundry AI Red Teaming Agent (opens in new tab).

So you can get a little taste of what we do day to day in the AI Red Teaming Agent.

WESTERHOFF: I think the most important takeaway from those lessons is that AI security is truly a team sport.

9 months назад @ microsoft.com
How AI will accelerate biomedical research and discovery
How AI will accelerate biomedical research and discovery How AI will accelerate biomedical research and discovery

Dr. Eric Topol is the executive vice president of the biomedical research non-profit Scripps Research, where he founded and now directs the Scripps Research Translational Institute.

Let’s continue our deep dive on AI and biomedical research with this conversation with Noubar Afeyan:LEE: Noubar, thanks so much for joining.

And there’s the origin story of contact with AI, you know, before the emergence of generative AI and afterwards.

What is going on today with respect to AI really being used for something meaningful in the design and development of drugs?

TOPOL: You would read about how, you know, data is the new oil and, you know, gold and whatnot.

9 months, 1 week назад @ microsoft.com
AI Testing and Evaluation: Learnings from pharmaceuticals and medical devices
AI Testing and Evaluation: Learnings from pharmaceuticals and medical devices AI Testing and Evaluation: Learnings from pharmaceuticals and medical devices

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

During the pre-market phase, medical testing establishes baseline safety and effectiveness metrics through bench testing, performance standards, and clinical studies.

SULLIVAN: So medical devices face a pretty prescriptive multi-level testing path before they hit the market.

We are looking into medical devices, as well, obviously, but also other technologies in advanced medical computing.

So we see Phase 3 trials as something that occurs in the medical devices and pharmaceuticals field.

9 months, 1 week назад @ microsoft.com
AI Testing and Evaluation: Learnings from genome editing
AI Testing and Evaluation: Learnings from genome editing AI Testing and Evaluation: Learnings from genome editing

As generative AI continues to advance, Microsoft has gathered a range of experts—from genome editing to cybersecurity—to share how their fields approach evaluation and risk assessment.

CHARO: Well, you know, genome editing is both very old and very new.

Now the earliest forms of genome editing were very inefficient, and so we didn’t worry that much.

But the bottom-line thing to remember, the way to really think about it is, we don’t regulate genome editing; we regulate the things that use genome editing.

And she said, you know, we don’t regulate genome editing; we regulate the things that use genome editing.

9 months, 2 weeks назад @ microsoft.com
AI Testing and Evaluation: Learnings from Science and Industry
AI Testing and Evaluation: Learnings from Science and Industry AI Testing and Evaluation: Learnings from Science and Industry

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

And I think, really, there are two reasons why tech is so, kind of, representative of that kind of challenge that I’ve always found fascinating.

Continues to be a really important topic in the AI policy conversation right now, I think, for really good reason.

Testing is an important component for governance and AI and, of course, in all of these other domains, as well.

I think about almost, like, in the near to mid-term, like three issues that we need to address in the AI, kind of, policy and testing context.

9 months, 3 weeks назад @ microsoft.com
NLP Highlights NLP Highlights
последний пост None
Data Skeptic
последний пост 2 weeks, 6 days назад
Book Ratings and Recommendations
Book Ratings and Recommendations Book Ratings and Recommendations

Goodreads star ratings can be misleading as measures of "book quality," and research from Hannes Rosenbusch suggests that for many professionally published books, differences between readers often matter more than differences between books. The episode also explores how to model reader preferences, why reviews often reveal more about the reviewer than the text, and how LLMs can aid computational literary research while still falling short of human editors in creative writing.

2 weeks, 6 days назад @ dataskeptic.com
Disentanglement and Interpretability in Recommender Systems
Disentanglement and Interpretability in Recommender Systems Disentanglement and Interpretability in Recommender Systems 1 month, 1 week назад @ dataskeptic.com
Collective Altruism in Recommender Systems
Collective Altruism in Recommender Systems Collective Altruism in Recommender Systems

Ekaterina (Kat) Filadova from MIT EECS joins us to discuss strategic learning in recommender systems—what happens when users collectively coordinate to game recommendation algorithms. Kat's research reveals surprising findings: algorithmic "protest movements" can paradoxically help platforms by providing clearer preference signals, and the challenge of distinguishing coordinated behavior from bot activity is more complex than it appears. This episode explores the intersection of machine learning and game theory, examining what happens when your training data actively responds to your algorithm.

1 month, 2 weeks назад @ dataskeptic.com
Niche vs Mainstream
Niche vs Mainstream Niche vs Mainstream

Anas Buhayh discusses multi-stakeholder fairness in recommender systems and the S'mores framework—a simulation allowing users to choose between mainstream and niche algorithms. His research shows specialized recommenders improve utility for niche users while raising questions about filter bubbles and data privacy.

1 month, 3 weeks назад @ dataskeptic.com
Healthy Friction in Job Recommender Systems
Healthy Friction in Job Recommender Systems Healthy Friction in Job Recommender Systems

In this episode, host Kyle Polich speaks with Roan Schellingerhout, a fourth-year PhD student at Maastricht University, about explainable multi-stakeholder recommender systems for job recruitment. Roan discusses his research on creating AI-powered job matching systems that balance the needs of multiple stakeholders—job seekers, recruiters, HR professionals, and companies. The conversation explores different types of explanations for job recommendations, including textual, bar chart, and graph-based formats, with findings showing that lay users strongly prefer simple textual explanations over more technical visualizations. Roan shares insights from his "healthy friction" study, which tested …

2 months, 1 week назад @ dataskeptic.com
Fairness in PCA-Based Recommenders
Fairness in PCA-Based Recommenders Fairness in PCA-Based Recommenders

In this episode, we explore the fascinating world of recommender systems and algorithmic fairness with David Liu, Assistant Research Professor at Cornell University's Center for Data Science for Enterprise and Society. David shares insights from his research on how machine learning models can inadvertently create unfairness, particularly for minority and niche user groups, even without any malicious intent. We dive deep into his groundbreaking work on Principal Component Analysis (PCA) and collaborative filtering, examining why these fundamental techniques sometimes fail to serve all users equally. David introduces the concept of "power niche users" - highly active users with specialized in…

2 months, 2 weeks назад @ dataskeptic.com
Video Recommendations in Industry
Video Recommendations in Industry Video Recommendations in Industry

In this episode, Kyle Polich sits down with Cory Zechmann, a content curator working in streaming television with 16 years of experience running the music blog "Silence Nogood." They explore the intersection of human curation and machine learning in content discovery, discussing the concept of "algatorial" curation—where algorithms and editorial expertise work together. Key topics include the cold start problem, why every metric is just a "proxy metric" for what users actually want, the challenge of filter bubbles, and the importance of balancing familiarity with discovery. Cory shares insights on why TikTok's algorithm works so well (clean data and massive interaction volume), the crucial …

3 months, 3 weeks назад @ dataskeptic.com
Eye Tracking in Recommender Systems
Eye Tracking in Recommender Systems Eye Tracking in Recommender Systems

In this episode, Santiago de Leon takes us deep into the world of eye tracking and its revolutionary applications in recommender systems. As a researcher at the Kempelin Institute and Brno University, Santiago explains the mechanics of eye tracking technology—how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the groundbreaking RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research, which opens new possibilities for understanding how users interact with carousel interfaces like Netflix. Through collaboration between psychologists and AI researchers, Santiago's work demonstrate…

3 months, 4 weeks назад @ dataskeptic.com
Cracking the Cold Start Problem
Cracking the Cold Start Problem Cracking the Cold Start Problem

In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations. Boya shares insigh…

4 months, 1 week назад @ dataskeptic.com
Designing Recommender Systems for Digital Humanities
Designing Recommender Systems for Digital Humanities Designing Recommender Systems for Digital Humanities

In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Monasterium.net, Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique research needs and information-seeking behaviors. Florian explains the technical challen…

4 months, 3 weeks назад @ dataskeptic.com
DataRec Library for Reproducible in Recommend Systems
DataRec Library for Reproducible in Recommend Systems DataRec Library for Reproducible in Recommend Systems

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Mario Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon reviews. The conversation covers Alberto's research journey through knowledge graphs, graph-based recommen…

5 months назад @ dataskeptic.com
Shilling Attacks on Recommender Systems
Shilling Attacks on Recommender Systems Shilling Attacks on Recommender Systems

In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a …

5 months, 1 week назад @ dataskeptic.com
Music Playlist Recommendations
Music Playlist Recommendations Music Playlist Recommendations

In this episode, Rebecca Salganik, a PhD student at the University of Rochester with a background in vocal performance and composition, discusses her research on fairness in music recommendation systems. She explores three key types of fairness—group, individual, and counterfactual—and examines how algorithms create challenges like popularity bias (favoring mainstream content) and multi-interest bias (underserving users with diverse tastes). Rebecca introduces LARP, her multi-stage multimodal framework for playlist continuation that uses contrastive learning to align text and audio representations, learn song relationships, and create playlist-level embeddings to address the cold start prob…

5 months, 2 weeks назад @ dataskeptic.com
Bypassing the Popularity Bias
Bypassing the Popularity Bias Bypassing the Popularity Bias 6 months назад @ dataskeptic.com
Sustainable Recommender Systems for Tourism
Sustainable Recommender Systems for Tourism Sustainable Recommender Systems for Tourism

In this episode, we speak with Ashmi Banerjee, a doctoral candidate at the Technical University of Munich, about her pioneering research on AI-powered recommender systems in tourism. Ashmi illuminates how these systems can address exposure bias while promoting more sustainable tourism practices through innovative approaches to data acquisition and algorithm design. Key highlights include leveraging large language models for synthetic data generation, developing recommendation architectures that balance user satisfaction with environmental concerns, and creating frameworks that distribute tourism more equitably across destinations. Ashmi's insights offer valuable perspectives for both AI res…

6 months, 1 week назад @ dataskeptic.com
SuperDataScience SuperDataScience
последний пост 2 days, 12 hours назад
983: AI in the Classroom: How a Top Elementary School Is Doing It Right, with Principal Traci Walker Griffith
983: AI in the Classroom: How a Top Elementary School Is Doing It Right, with Principal Traci Walker Griffith 983: AI in the Classroom: How a Top Elementary School Is Doing It Right, with Principal Traci Walker Griffith

My guest today took a public school that was about to be shut down and turned it into the number one school in Boston, and AI is her latest secret weapon. In a long-overdue episode on AI for supporting children’s education, hear directly from Principal Traci Walker Griffith how her teachers have been experimenting with AI in classrooms, what works, what doesn’t work, and what’s next for kids as LLMs continue to improve. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/983⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (03:38) Th…

2 days, 12 hours назад @ podtrac.com
982: In Case You Missed It in March 2026
982: In Case You Missed It in March 2026 982: In Case You Missed It in March 2026

Jon Krohn rounds up March’s interviews in this ICYMI episode. Hear from AI and data science experts across the fields of education and business in this wide-ranging series of clips that take listeners from the Renaissance to the near future. Guests include Lin Quiao (Episode 971), Chris Fregly (Episode 973), Zack Kass (Episode 975), Kyunghyun Cho (Episode 977), and Rohit Choudhary (Episode 979). Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/982⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

6 days, 12 hours назад @ podtrac.com
981: How Data Engineers Are “10x’ing” Themselves With Agents, feat. Matt Glickman
981: How Data Engineers Are “10x’ing” Themselves With Agents, feat. Matt Glickman 981: How Data Engineers Are “10x’ing” Themselves With Agents, feat. Matt Glickman

Matt Glickman talks to Jon Krohn about co-founding the agentic-platform startup, Genesis Computing, how his experience at Goldman Sachs paved the way for developing AI agents, and where he thinks agentic AI has just as much value as a company’s human employees. This February, Genesis Computing revealed how its platform can offer the guardrails so crucial to businesses, alongside increased capabilities that help execute entire workflows from research to deployment. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/981⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.…

1 week, 2 days назад @ podtrac.com
980: AI Making Theoretical Physics Breakthroughs
980: AI Making Theoretical Physics Breakthroughs 980: AI Making Theoretical Physics Breakthroughs

A team of theoretical physicists from Harvard, Cambridge, the Institute for Advanced Study, and Vanderbilt used OpenAI’s models not just as a tool, but as a collaborator, cracking a problem in particle physics that had stymied them for months. In this Five-Minute Friday, Jon Krohn walks through how GPT-5.2 Pro simplified a 32-variable mathematical expression into a single line, proposed what it called the “obvious generalization” for any number of gluons, and how a more powerful internal model then produced a formal proof after 12 hours of autonomous reasoning. Find out why this may be a template for AI-assisted scientific discovery and what it means for the future of research. Additional m…

1 week, 6 days назад @ podtrac.com
979: Agentic Data Management and the Future of Enterprise AI, with Rohit Choudhary
979: Agentic Data Management and the Future of Enterprise AI, with Rohit Choudhary 979: Agentic Data Management and the Future of Enterprise AI, with Rohit Choudhary

For years, Jon has been quoting the stat that the world's data is roughly doubling every year. His guest today says that’s way too conservative, he’s seeing enterprise data soon growing at close to 10x per year. And most organizations are nowhere near ready for what that means. In this episode, Rohit Choudhary, founder and CEO of Acceldata, explains how the agentic data management platform his team has built helps enterprises make their increasingly vast amounts of data self-aware, self-optimizing, and AI-ready. He breaks down why governance needs to be operational and real-time rather than a one-time compliance exercise, and shares his view on why the most valuable professionals in the age…

2 weeks, 2 days назад @ podtrac.com
A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)
A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%) A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)

A game millions of people solve over morning coffee is exposing a fundamental weakness in today’s most powerful AI models. In this Five-Minute Friday, Jon Krohn breaks down Pathway’s new Sudoku Extreme benchmark, roughly 250,000 of the hardest Sudoku puzzles available and why leading LLMs like o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet scored effectively zero percent, while Pathway’s post-transformer BDH architecture achieved 97.4% accuracy at a fraction of the cost. Listen to the episode to find out what BDH is doing differently, why Sudoku performance matters far beyond puzzles, and what this means for the future of AI reasoning. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatas…

2 weeks, 6 days назад @ podtrac.com
977: Attention, World Models and the Future of AI, with Prof. Kyunghyun Cho
977: Attention, World Models and the Future of AI, with Prof. Kyunghyun Cho 977: Attention, World Models and the Future of AI, with Prof. Kyunghyun Cho

What’s going to be the next big step function that blasts us forward in AI capabilities? To find out, Jon Krohn sits down with Professor Kyunghyun Cho, whose 200,000 citations and co-authorship of the first paper on attention place him among the most influential AI researchers in the world. In this episode, Kyunghyun explains why today’s models have already captured most correlations in passive data, making the real challenge about actively choosing which data to collect. He also weighs in on the open debate around world models, whether AI needs high-fidelity, step-by-step imagination or whether a high-level latent representation that lets it skip ahead is sufficient and shares the surprisi…

3 weeks, 2 days назад @ podtrac.com
976: NVIDIA’s Nemotron 3 Super: The Perfect LLM for Multi-Agent Systems
976: NVIDIA’s Nemotron 3 Super: The Perfect LLM for Multi-Agent Systems 976: NVIDIA’s Nemotron 3 Super: The Perfect LLM for Multi-Agent Systems

NVIDIA just dropped Nemotron 3 Super, a 120-billion-parameter open-weight model that only activates 12 billion parameters at a time and it’s built for the agentic AI era. In this Five-Minute Friday, Jon Krohn breaks down the model’s hybrid Mamba-Transformer architecture, its million-token context window, and why its combination of frontier-class reasoning with blazing-fast throughput matters for anyone building multi-agent systems. Find out how Nemotron 3 Super claimed the #1 spot on the DeepResearch Bench leaderboards, which companies are already adopting it, and where you can start using it today. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/976⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Intere…

3 weeks, 6 days назад @ podtrac.com
975: Unmetered Intelligence is Heralding the Next Renaissance, with Zack Kass
975: Unmetered Intelligence is Heralding the Next Renaissance, with Zack Kass 975: Unmetered Intelligence is Heralding the Next Renaissance, with Zack Kass

Zack Kass speaks to Jon Krohn about his bestselling, tech-positive book, The Next Renaissance, that charts the rapid progress of humanity and the benefits that artificial intelligence will bring to us, as well as why a future where intelligence is a cheap and abundant resource will give humanity an edge. Elsewhere in the show, Zack discusses why it’s important to hold parents, teachers and students accountable for their education, why it is incumbent on us to build a healthier relationship with technology, and his 4 principles for thriving in the age of AI. This episode is brought to you by the⁠ ⁠⁠Cisco⁠, by ⁠Acceldata⁠ and by ⁠⁠ODSC, the Open Data Science Conference⁠⁠. Additional materials…

1 month назад @ podtrac.com
974: When Will The AI Bubble Burst? How Bad Will It Be?
974: When Will The AI Bubble Burst? How Bad Will It Be? 974: When Will The AI Bubble Burst? How Bad Will It Be?

In this week’s Five-Minute Friday, Jon Krohn holds the AI bubble up to the light. He points to the deep greyzone found in AI startups like Cluely that are established on dubious ideas (Cluely’s tagline was “cheat on everything”) and funding bluster, as well as the staggering spending by companies on infrastructure and researcher salaries. Listen to the episode to hear about the historical precedents to the AI bubble that go all the way back to the invention of the railway, what to make of current investments in AI, and what you can do about these changes as an AI practitioner. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/974⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a Supe…

1 month назад @ podtrac.com
973: AI Systems Performance Engineering, with Chris Fregly
973: AI Systems Performance Engineering, with Chris Fregly 973: AI Systems Performance Engineering, with Chris Fregly

No one should be manually writing code in 2026, thinks Chris Fregly, Jon Krohn’s guest on this week’s episode. In this interview about Chris’ latest book, AI Systems Performance Engineering, he explains why it’s so important to consider memory bandwidth when evaluating GPU performance, that understanding the full hardware software stack is the most valuable skill for anyone working in AI development, and which shortcuts we still shouldn’t ever take when writing code, even though we might be outsourcing a great deal to generative AI. This episode is brought to you by the ⁠⁠Cisco, by Acceldata and by ⁠ODSC, the Open Data Science Conference⁠. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠…

1 month, 1 week назад @ podtrac.com
972: In Case You Missed It in February 2026
972: In Case You Missed It in February 2026 972: In Case You Missed It in February 2026

Jon Krohn recaps the month of February in this episode of In Case You Missed It. Across four interviews with Will Falcon (Episode 965), Tom Griffiths (Episode 969), Antje Barth (Episode 963), and Praveen Murugesan (Episode 967), Jon questions the brains behind some of the AI industry’s most innovative companies about launching a startup, developing a popular product, what artificial intelligence can still learn from human intelligence, and how AI might finally start to think on its own. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/972⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month, 1 week назад @ podtrac.com
971: 90% of The World’s Data is Private; Lin Qiao’s Fireworks AI is Unlocking It
971: 90% of The World’s Data is Private; Lin Qiao’s Fireworks AI is Unlocking It 971: 90% of The World’s Data is Private; Lin Qiao’s Fireworks AI is Unlocking It

Lin Qiao, CEO of Fireworks AI, talks to Jon Krohn about how she builds effective models quickly, why coding agents can perform at the level of a junior engineer, and what she attributes to the success of Fireworks AI: True to its name, the company exploded into the AI industry with over $300 million secured in venture capital, as well as netting a further $250 million Series C funding. For Lin, many enterprises miss out by not being familiar with open models. Open models give a lot of control to the user, offering customizability and at a much lower price point. Listen to hear how Fireworks AI helps companies continue to save money through AI. This episode is brought to you by the⁠ ⁠⁠⁠⁠Dell…

1 month, 2 weeks назад @ podtrac.com
970: The “100x Engineer”: How to Be One, But Should You?
970: The “100x Engineer”: How to Be One, But Should You? 970: The “100x Engineer”: How to Be One, But Should You?

Working with code-gen models and Claude Code: In this Five-Minute Friday, Jon Krohn addresses how AI superstars like Andrej Karpathy are using AI agents in their coding work, the outlook for code-gen in 2026, and how you can get started. Hear about Karpathy’s work as well as the soaring success of Peter Steinberger and how he managed to surpass the GitHub commit rate of teams as an individual working with AI agents. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/970⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month, 2 weeks назад @ podtrac.com
969: The Laws of Thought: The Math of Minds and Machines, with Prof. Tom Griffiths
969: The Laws of Thought: The Math of Minds and Machines, with Prof. Tom Griffiths 969: The Laws of Thought: The Math of Minds and Machines, with Prof. Tom Griffiths

Princeton Professor Tom Griffiths talks to Jon Krohn about his new book, The Laws of Thought, which grapples with the mathematical models behind biological and artificial intelligence, and what makes the human brain so fascinating for psychologists and computer scientists to study. In this episode, he details how the mathematical principles governing the external world can also be used to explore cognitive science, or “the internal world.” This episode is brought to you by the ⁠⁠Dell⁠⁠, by ⁠⁠Intel⁠⁠, by Cisco and by Acceldata. Additional materials: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠www.superdatascience.com/969⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email natal…

1 month, 3 weeks назад @ podtrac.com
Data Science at Home Data Science at Home
последний пост 1 month, 1 week назад
There Is No AI. There’s a Stateless Function on 10,000 GPUs Pretending to Know You (Ep. 299)
There Is No AI. There’s a Stateless Function on 10,000 GPUs Pretending to Know You (Ep. 299) There Is No AI. There’s a Stateless Function on 10,000 GPUs Pretending to Know You (Ep. 299)

Personal newsletter: https://defragzone.substack.com📩 Newsletter: https://datascienceathome.substack.com🎙 Podcast: Available on Spotify, Apple Podcasts, and more.

🐦 Twitter: @DataScienceAtHome📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/ Instagram: https://www.instagram.com/datascienceathome/Facebook: https://www.facebook.com/datascienceAHLinkedIn: https://www.linkedin.com/company/data-science-at-home-podcastDiscord Channel: https://discord.gg/4UNKGf3NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, intervi…

1 month, 1 week назад @ datascienceathome.com
Bias in the machine (edited)
Bias in the machine (edited) Bias in the machine (edited)

The title of today’s episode is Bias in the machineC: Francesco, today we are starting with an infuriating discussion.

The failure of the medical community as a whole to recognise this obvious bias up to the 21st century is an example of how insidious the problem of bias is.

Three: The bias in your training sample: people put training samples together, and people have culture, experience, and prejudice.

These assumptions inform the way AI systems work—and fail—to this day.

When an algorithm is a black box and you can’t look inside, you have no way of analysing its bias.

1 month, 1 week назад @ datascienceathome.com
What is wrong with reinforcement learning? (Ep. 82)
What is wrong with reinforcement learning? (Ep. 82) What is wrong with reinforcement learning? (Ep. 82)

Join the discussion on our Discord serverAfter reinforcement learning agents doing great at playing Atari video games, Alpha Go, doing financial trading, dealing with language modeling, let me tell you the real story here.In this episode I want to shine some light on reinforcement learning (RL) and the limitations that every practitioner should consider before taking certain directions.

RL seems to work so well!

What is wrong with it?

Are you a listener of Data Science at Home podcast?

Or did you subscribe to the Artificial Intelligence at your fingertips newsletter?

2 months, 1 week назад @ datascienceathome.com
How to generate very large images with GANs (Ep. 76)
How to generate very large images with GANs (Ep. 76) How to generate very large images with GANs (Ep. 76)

Join the discussion on our Discord serverIn this episode I explain how a research group from the University of Lubeck dominated the curse of dimensionality for the generation of large medical images with GANs.

The problem is not as trivial as it seems.

Many researchers have failed in generating large images with GANs before.

One interesting application of such approach is in medicine for the generation of CT and X-ray images.Enjoy the show!

ReferencesMulti-scale GANs for Memory-efficient Generation of High Resolution Medical Images https://arxiv.org/abs/1907.01376

2 months, 1 week назад @ datascienceathome.com
Training neural networks faster without GPU [RB] (Ep. 77)
Training neural networks faster without GPU [RB] (Ep. 77) Training neural networks faster without GPU [RB] (Ep. 77)

Join the discussion on our Discord serverTraining neural networks faster usually involves the usage of powerful GPUs.

In this episode I explain an interesting method from a group of researchers from Google Brain, who can train neural networks faster by squeezing the hardware to their needs and making the training pipeline more dense.

Enjoy the show!

ReferencesFaster Neural Network Training with Data Echoinghttps://arxiv.org/abs/1907.05550

2 months, 1 week назад @ datascienceathome.com
More powerful deep learning with transformers (Ep. 84) (Rebroadcast)
More powerful deep learning with transformers (Ep. 84) (Rebroadcast) More powerful deep learning with transformers (Ep. 84) (Rebroadcast)

Some of the most powerful NLP models like BERT and GPT-2 have one thing in common: they all use the transformer architecture.

Such architecture is built on top of another important concept already known to the community: self-attention.In this episode I explain what these mechanisms are, how they work and why they are so powerful.

Don’t forget to subscribe to our Newsletter or join the discussion on our Discord serverReferences

2 months, 1 week назад @ datascienceathome.com
Your Favorite AI Startup is Probably Bullshit (Ep. 298) [RB]
Your Favorite AI Startup is Probably Bullshit (Ep. 298) [RB] Your Favorite AI Startup is Probably Bullshit (Ep. 298) [RB]

The brutal truth about why Silicon Valley is blowing billions on glorified autocomplete while pretending it’s the next iPhone.

We’re diving deep into the AI investment circus where VCs who can’t code are funding companies that barely understand their own technology.

From blockchain déjà vu to the “ChatGPT wrapper” economy—this episode will make you question every AI valuation you’ve ever seen.

Fair warning: We’re naming names and calling out the hype.

Don’t listen if you work at a “revolutionary AI startup” that’s just OpenAI’s API with a pretty interface.

2 months, 1 week назад @ datascienceathome.com
Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 297) [RB]
Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 297) [RB] Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 297) [RB]

VortexNet uses actual whirlpools to build neural networks.

By borrowing equations from fluid dynamics, this new architecture might solve deep learning’s toughest problems—from vanishing gradients to long-range dependencies.

Today we explain how vortex shedding, the Strouhal number, and turbulent flows might change everything in AI.

SponsorsThis episode is brought to you by Statistical HorizonsAt Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics and AI methods practical and accessible.

Join thousands of researchers and professionals who’ve advanced their careers with Statistical Horizons.

2 months, 1 week назад @ datascienceathome.com
AGI: The Dream We Should Never Reach (Ep. 296)
AGI: The Dream We Should Never Reach (Ep. 296) AGI: The Dream We Should Never Reach (Ep. 296)

Also on YouTubeTwo AI experts who actually love the technology explain why chasing AGI might be the worst thing for AI’s future—and why the current hype cycle could kill the field we’re trying to save.

Head to datascienceathome.com for detailed show notes, code examples, and exclusive deep-dives into the papers we discuss.

Subscribe to our newsletter for weekly breakdowns of cutting-edge research delivered straight to your inbox—no fluff, just science!

Our Discord community is full of ML engineers, researchers, and AI enthusiasts discussing papers, sharing projects, and helping each other level up.

Whether you’re debugging your first neural net or training your tenth transformer, there’s a …

2 months, 1 week назад @ datascienceathome.com
When Data Stops Being Code and Starts Being Conversation (Ep. 297)
When Data Stops Being Code and Starts Being Conversation (Ep. 297) When Data Stops Being Code and Starts Being Conversation (Ep. 297)

Mark Brocato built Mockaroo—the tool that taught millions of developers how to fake data.

Now, as Head of Engineering at Tonic.ai, he’s building the AI agent that’s making his own creation obsolete.

From the hidden failures of legacy mocks to the security implications of agent-driven synthesis, Mark reveals what happens when data generation becomes a conversation—not a pipeline.

SponsorsTonic.ai Synthetic data solutions for software and AI development.

Accelerate engineering velocity and ensure compliance with AI-powered data synthesisThis episode is brought to you by Statistical HorizonsAt Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics…

3 months, 3 weeks назад @ datascienceathome.com
Your AI Strategy is Burning Money: Here’s How to Fix It (Ep.295)
Your AI Strategy is Burning Money: Here’s How to Fix It (Ep.295) Your AI Strategy is Burning Money: Here’s How to Fix It (Ep.295)

Most companies don’t have an AI problem.

In this conversation, he breaks down when AI actually makes sense, where AWS costs spiral out of control, and why your “cool demo” keeps dying before launch.

If you’re tired of AI hype and ready for straight answers, hit play.

Our Discord community is full of ML engineers, researchers, and AI enthusiasts discussing papers, sharing projects, and helping each other level up.

Whether you’re debugging your first neural net or training your tenth transformer, there’s a place for you.

4 months, 3 weeks назад @ datascienceathome.com
From Tokens to Vectors: The Efficiency Hack That Could Save AI (Ep. 294)
From Tokens to Vectors: The Efficiency Hack That Could Save AI (Ep. 294) From Tokens to Vectors: The Efficiency Hack That Could Save AI (Ep. 294)

LLMs generate text painfully slow, one low-info token at a time.

Researchers just figured out how to compress 4 tokens into smart vectors & cut costs by 44%—with full code & proofs!

🔥📊SponsorsThis episode is brought to you by Statistical HorizonsAt Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics and AI methods practical and accessible.

Join thousands of researchers and professionals who’ve advanced their careers with Statistical Horizons.

Get $200 off any seminar with code DATA25 at https://statisticalhorizons.com

5 months назад @ datascienceathome.com
Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 293)
Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 293) Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 293)

VortexNet uses actual whirlpools to build neural networks.

By borrowing equations from fluid dynamics, this new architecture might solve deep learning’s toughest problems—from vanishing gradients to long-range dependencies.

Today we explain how vortex shedding, the Strouhal number, and turbulent flows might change everything in AI.

SponsorsThis episode is brought to you by Statistical HorizonsAt Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics and AI methods practical and accessible.

Join thousands of researchers and professionals who’ve advanced their careers with Statistical Horizons.

5 months, 2 weeks назад @ datascienceathome.com
The Scientists Growing Living Computers in Swiss Labs (Ep. 292)
The Scientists Growing Living Computers in Swiss Labs (Ep. 292) The Scientists Growing Living Computers in Swiss Labs (Ep. 292)

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

With a focus on dual-use innovation, Amethix is shaping a future where intelligent machines extend human capability, not replace it.

Discover more at https://amethix.com This episode is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Learn more at intrepid.aiReferencesWebsite: finalspark.comDiscord account: / discordNewsletter: https://finalspark.com/#newsletterTopics: Biological computing • Neural engineering • Energy-effic…

5 months, 3 weeks назад @ datascienceathome.com
When AI Hears Thunder But Misses the Fear (Ep. 291)
When AI Hears Thunder But Misses the Fear (Ep. 291) When AI Hears Thunder But Misses the Fear (Ep. 291)

Sanjoy Chowdhury reveals AI’s hidden weakness: while systems can see objects and hear sounds perfectly, they can’t reason across senses like humans do.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at https://amethix.comThis episode is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Whether it’s in the sky, on the ground, or in orbit—if it’s intelligent and mobile, Intrepid helps you build it.

6 months, 1 week назад @ datascienceathome.com