Very ML
State-of-the-art Machine Learning News Feed
/r/MachineLearning
last post 3 hours ago
[R] Why do some research papers not mention accuracy as a metric?
3 hours ago @ reddit.com
[D] Error in SIGIR published paper
4 hours ago @ reddit.com
[P] Understanding Multi-Head Latent Attention (MLA)
5 hours ago @ reddit.com
[D] ICML new policy: reviewers will be reviewed by meta reviewer. Good policy?
6 hours ago @ reddit.com
[D] ICML 2026 - ICML desk-rejected my paper but kept me on as a reviewer. Wow?
7 hours ago @ reddit.com
[D] AI4PDEs, SciML, Foundational Models: Where are we going?
8 hours ago @ reddit.com
[D] DeepDanbooru v3 PyTorch Port: Constant 0.5 or 0 output after loading weights
9 hours ago @ reddit.com
[D] Seeking arXiv cs.IR Endorsement for Paper on AI Visibility Framework
16 hours ago @ reddit.com
[D] ICLR 2026 decision mega thread
19 hours ago @ reddit.com
[D] Critical AI Safety Issue in Claude: "Conversational Abandonment" in Crisis Scenarios – Ignored Reports and What It Means for User Safety
23 hours ago @ reddit.com
VeritasGraph: AI Analytics with Power BI + MCP Server [D]
1 day ago @ reddit.com
[D] Basis Institute
1 day, 1 hour ago @ reddit.com
[R] Response to CVPR review that claims lack of novelty because they found our workshop preprint?
1 day, 1 hour ago @ reddit.com
[D] GPU Server best effort for experiment
1 day, 2 hours ago @ reddit.com
[D] Correct way to compare models
1 day, 5 hours ago @ reddit.com
Towards Data Science
last post 4 hours ago
SAM 3 vs. Specialist Models — A Performance Benchmark

Strict Compute Budget: The specialist models were trained within a maximum window of 6 hours.

I used four scenes for training and two for validation resulting in 111 training images and 30 validation images.

Optimization: Specialist models can be pruned and quantized.

Running a dozen environment-tuned specialist models is often cheaper and more predictable than one massive foundation model.

4 hours ago @ towardsdatascience.com
Azure ML vs. AWS SageMaker: A Deep Dive into Model Training — Part 1

Azure ML and AWS SageMaker are machine learning services that enable data scientists and ML engineers to develop and manage the entire ML lifecycle, from data preprocessing and feature engineering to model training, deployment, and monitoring.

Azure ML & AWS SageMaker Training Jobs

While they offer similar high-level functionalities, Azure ML and AWS SageMaker have fundamental differences that determine which platform best suits you, your team, or your company.

Now that we understand the importance of training jobs, let’s look at how they’re defined in Azure ML vs. SageMaker in a nutshell.

SageMaker execution role) to access relevant AWS services, such as S3 bucket, SageMaker Training Pipeli…

6 hours ago @ towardsdatascience.com
How to Build a Neural Machine Translation System for a Low-Resource Language

Our Model: How to Use the Translation System

We build our translation system on top of NLLB-200-distilled-600M, a multilingual neural machine translation model released by Meta as part of the No Language Left Behind (NLLB) project.

Step 1: Bilingual Dataset Processing

The first stage of model training focuses on constructing a bilingual dataset.

]", " ", s) s = re.sub(r"\s+", " ", s).strip() s = re.sub(r"[,.?

Step 2: Tokenizer Preparation

Before text can be digested by a neural machine translation model, it must be converted into tokens.

This function powers the Dongxiang → Chinese translation.

1 day, 4 hours ago @ towardsdatascience.com
Air for Tomorrow: Mapping the Digital Air-Quality Landscape, from Repositories and Data Types to Starter Code

All examples assume Python ≥3.10 with pip install as needed.

1) OpenAQ (global ground measurements; open API)

OpenAQ [3] is an open-source data platform that hosts global air quality data, such as PM2.5, PM10, and O3.

They provide air quality data by partnering with various governmental partners, community partners, and air quality sensor companies such as Air Gradient, IQAir, among others.

For GES DISC datasets, start from https://disc.gsfc.nasa.gov, choose a dataset (e.g., MERRA-2), and use the “Data Access” or “Subset/Get Data” tools.

Data access is restricted to non-profit purposes such as research and education; commercial users are directed to the Japan Meteorological Business…

1 day, 6 hours ago @ towardsdatascience.com
Optimizing Data Transfer in Distributed AI/ML Training Workloads

This post is part of a series of posts on optimizing data transfer using the NVIDIA Nsight™ Systems (nsys) profiler.

Nowadays, it is quite common for AI/ML training — particularly of large models — to be distributed across multiple GPUs.

In this post, we will focus on the most common form of distributed training — data-distributed training.

Each input batch is evenly distributed among the GPUs, each of which executes a training step to calculate the local gradients.

Even in the absence of any cross-GPU communication, DDP introduces overhead that can decrease throughput by ~3–7%.
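The data-parallel step described above can be sketched as a pure-Python simulation rather than real multi-GPU code. The toy model y = w * x, the helper names `local_gradient` and `data_parallel_step`, and the learning rate are illustrative assumptions, not taken from the post; a real setup would typically rely on a framework's distributed training support.

```python
# Hypothetical simulation of data-distributed training: no GPUs, no framework.
def local_gradient(w, shard):
    """Gradient of the mean squared error for the toy model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, batch, num_workers, lr=0.01):
    """Split the batch evenly, compute local gradients, average them, update w."""
    if len(batch) % num_workers != 0:
        raise ValueError("this sketch assumes the batch divides evenly among workers")
    shard_size = len(batch) // num_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    # Each simulated worker computes a gradient on its own shard only.
    grads = [local_gradient(w, shard) for shard in shards]
    # Averaging stands in for the cross-GPU gradient synchronization step.
    avg_grad = sum(grads) / num_workers
    return w - lr * avg_grad


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_parallel = data_parallel_step(0.0, batch, num_workers=2)
w_single = data_parallel_step(0.0, batch, num_workers=1)
```

With equally sized shards, the averaged local gradients match the gradient of the full batch, so both calls produce the same updated weight.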

2 days, 3 hours ago @ towardsdatascience.com
Achieving 5x Agentic Coding Performance with Few-Shot Prompting

Why use few-shot prompting

Firstly, I want to cover why you should utilize few-shot prompting.

I’ll discuss few-shot prompting and how you leverage it to optimize your LLM’s performance.

How to implement few-shot prompting

Now I want to discuss how to implement few-shot prompting.

Firstly, I’ll discuss how you should organize your work to increase the surface area for few-shot prompting opportunities, and I’ll then show you how to do few-shot prompting in practice, using examples.

Few-shot prompting in action

Now, assuming you’ve organized your work properly, it’s time to start taking advantage of few-shot prompting.
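As a rough illustration of few-shot prompting in practice, the sketch below assembles a prompt from a handful of worked examples followed by the new query. The function name `build_few_shot_prompt`, the "Input:"/"Output:" layout, and the sample tasks are assumptions made for this example, not the article's own format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a plain-text prompt: instruction, worked examples, then the new query."""
    parts = [instruction.strip(), ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")  # blank line between examples
    parts.append(f"Input: {query}")
    parts.append("Output:")  # left open for the model to complete
    return "\n".join(parts)


# Hypothetical examples: earlier tasks paired with the results you were happy with.
examples = [
    ("Rename variable `tmp` in utils.py", "Renamed `tmp` to `parsed_rows` in utils.py"),
    ("Rename variable `x` in loader.py", "Renamed `x` to `batch_index` in loader.py"),
]
prompt = build_few_shot_prompt(
    "Follow the style of the previous edits.",
    examples,
    "Rename variable `d` in report.py",
)
```

The resulting string can be sent to whichever LLM client you use; the more closely the examples resemble the new task, the more useful they tend to be.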

2 days, 4 hours ago @ towardsdatascience.com
Why the Sophistication of Your Prompt Correlates Almost Perfectly with the Sophistication of the Response, as Research by Anthropic Found

They found that while the exact wording of a prompt matters less than it used to, the “sophistication” behind the prompt matters enormously.

In fact, it correlates almost perfectly with the sophistication of the model’s response.

Rather than enforcing a fixed register, it adapts its level of sophistication to the user’s prompt.

And this is positive for users applying the AI system to their domains of expertise.

As models become better mirrors of user sophistication, disparities in expertise may become more visible, rather than less visible as the “equalization” narrative proposes.

2 days, 6 hours ago @ towardsdatascience.com
From Transactions to Trends: Predict When a Customer Is About to Stop Buying

In this post, we will use the famous linear regression for a different problem: predicting customer churn.

Linear Regression vs Churn

Customer churn rarely happens overnight.

We use monthly purchase trends and linear regression to measure customer momentum over time.

# Example of a customer who stopped buying
get_trend_degrees('C_003', plot=True)

Downtrending customer.

This project demonstrates a simple, interpretable approach to detecting declining customer purchase behavior using linear regression.
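To make the idea concrete, here is a minimal sketch of measuring purchase momentum with an ordinary least-squares slope over monthly totals. The helper names `trend_slope` and `trend_degrees` and the sample numbers are hypothetical; the article's own `get_trend_degrees` may scale or normalize the data differently.

```python
import math


def trend_slope(values):
    """Least-squares slope of values against their month index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two months of data")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    variance = sum((i - mean_x) ** 2 for i in range(n))
    return covariance / variance


def trend_degrees(values):
    """Express the slope as an angle; negative angles indicate a downtrend."""
    return math.degrees(math.atan(trend_slope(values)))


# Hypothetical monthly purchase counts for one customer.
monthly_purchases = [10, 8, 6, 4, 2, 0]
slope = trend_slope(monthly_purchases)
angle = trend_degrees(monthly_purchases)
```

Note that the angle depends on the units of the purchase values, so in practice a threshold for "declining" would need to be chosen relative to how the data is scaled.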

2 days, 7 hours ago @ towardsdatascience.com
TDS Newsletter: Beyond Prompt Engineering: The New Frontiers of LLM Optimization

Never miss a new edition of The Variable, our weekly newsletter featuring a top-notch selection of editors’ picks, deep dives, community news, and more.

Previous Variable editions have devoted a lot of space to prompt engineering, which remains an essential tool for anyone working with LLMs.

Beyond Prompting: The Power of Context Engineering

To learn how to create self-improving LLM workflows and structured playbooks, don’t miss Mariya Mansurova’s comprehensive guide.

Automatic Prompt Optimization for Multimodal Vision Agents: A Self-Driving Car Example

Instead of leaving prompts entirely behind, Vincent Koc’s deep dive shows how to leverage agents to give prompting a substantial performance …

2 days, 21 hours ago @ towardsdatascience.com
Evaluating Multi-Step LLM-Generated Content: Why Customer Journeys Require Structural Metrics

generate customer journeys that appear smooth and engaging, but evaluating whether these journeys are structurally sound remains challenging for current methods.

Let’s examine what standard customer journey metrics can and can’t provide.

A Structural Approach to Evaluating Customer Journeys

Ultimately, the primary missing element in evaluating recommended content sequences within the customer journey is structure.

Mapping Customer Journeys onto the Taxonomy

Once the taxonomy is established, individual customer journeys are mapped onto it as ordered sequences of messages.

Once the taxonomy is constructed, evaluating a customer journey becomes a struct…

3 days, 3 hours ago @ towardsdatascience.com
Why SaaS Product Management Is the Best Domain for Data-Driven Professionals in 2026

Our discussion focused on which domain is the best for data-driven professionals and how to best use the data in today’s big data world.

In my view, from 7+ years of experience in Product Management, it’s SaaS Product Management.

SaaS users typically access applications by using a web browser or an app.

Case Study #2: The Mysterious Churn Spike

Context: SaaS product, $50 MRR (monthly recurring revenue) average.

The second advantage is that it will allow you to have deeper conversations with technical teams for data-heavy SaaS.

3 days, 4 hours ago @ towardsdatascience.com
Stop Writing Messy Boolean Masks: 10 Elegant Ways to Filter Pandas DataFrames

, I discussed how to create your first DataFrame using Pandas.

To do that, it’s pretty straightforward:

# Filter by a single condition
# Example: All orders from the "Electronics" category.

# Filter by date condition
# Example: Orders placed after "2023-01-08"
df_sales[df_sales["OrderDate"] > "2023-01-08"]

This checks for orders placed after January 8, 2023.

Filtering by Multiple Conditions (AND, OR, NOT)

Pandas enables us to filter on multiple conditions using logical operators.

Filter using string matching

One of Pandas’ powerful but underused functions is string matching.

3 days, 6 hours ago @ towardsdatascience.com
What Other Industries Can Learn from Healthcare’s Knowledge Graphs

Note 1: This post is part 3 of a three-part series on healthcare, knowledge graphs, and lessons for other industries.

Other industries are already developing shared ontologies, vocabularies, observation standards, and exchange models in law, finance, climate science, construction, cybersecurity, and government.

In addition to these domain ontologies, healthcare uses cross-domain ontologies like Schema.org and QUDT (Quantities, Units, Dimensions, and Types).

There are even organizations that tie many of these together into a unified knowledge graph like the Scalable Precision Medicine Open Knowledge Engine (SPOKE), the Monarch Initiative, and Open Targets.

Other industries have also benefite…

3 days, 7 hours ago @ towardsdatascience.com
Google Trends is Misleading You: How to Do Machine Learning with Google Trends Data

Google Trends is one of the most widely used tools for analysing human behaviour at scale. Journalists use it. Data scientists use it. Entire papers are built on it. But there is a fundamental property of Google Trends data that makes it very easy to misuse, especially if you are working with time series or trying to build models, and most people never realise they are doing it.

4 days, 3 hours ago @ towardsdatascience.com
If You Want to Become a Data Scientist in 2026, Do This

Useless Learning

The very first thing you need to do to get a data science job is obviously learn some data science.

I have a whole article explaining what a great data science resume looks like, but let me break down the key points here.

If you follow the steps in this article, you will eventually land a data science role.

However, if you want to speed run the process, then I invite you to join the Data Science Launchpad.

Join my free newsletter where I share weekly tips, insights, and advice from my experience as a practising data scientist and machine learning engineer.

4 days, 4 hours ago @ towardsdatascience.com
Distill.pub
last post: none
TheSequence
last post 7 hours ago
The Sequence Radar #795: The New Inference Kids

The opinion section dives into agentic reasoning in LLMs based on a groundbreaking paper that came out this week.

📝 Editorial: The New Inference Kids

Last week was all about inference in AI and new players emerging as forces to be reckoned with in the space.

Their bet is that inference is the new “cloud computing”—a utility that needs to be boring, reliable, and infinitely scalable.

In a standard inference engine, when a user sends a prompt, you compute the Key-Value (KV) cache from scratch.

AI Lab: Google DeepMind
Summary: This paper introduces the MultiMax probe architecture and utilizes automated architecture search to address the failure of existing activation…

7 hours ago @ thesequence.substack.com
The Sequence Opinion #794: The Uncanny Valley of Intent: Why We Need Systems, Not Just Models

It’s “vibe-based.” On the other side, we have Programming Languages (Python, C++)—the interface of classical silicon.

They are “logic-based.” The thesis I want to explore today is that neither of these is the correct abstraction for the future of AI application development.

We have excellent Models (the raw weights), but we are terrible at building Systems (the cognitive architectures) that effectively direct those models.

We need a new kind of substrate.

We need to move away from the binary of “talking to the bot” vs “coding the bot” and toward a paradigm of Artificial Programmable Intelligence (API).

3 days, 7 hours ago @ thesequence.substack.com
The Sequence AI of the Week #793: DeepSeek's New Paper: Storing 100B Parameters on CPU RAM

If you’ve been following the LLM space, you know the current meta is dominated by two things: scaling up and sparsifying everything.

We are obsessed with Mixture-of-Experts (MoE), tossing trillions of parameters at a problem but only activating a tiny fraction per token to keep the FLOPs manageable.

It’s a beautiful hack that has allowed us to break past dense model limitations.

But sometimes, in our rush to build ever-larger differentiable computers, we forget our roots.

We forget that not every problem requires a massive matrix multiplication.

4 days, 7 hours ago @ thesequence.substack.com
The Sequence Knowledge #792: EVERYTHING you Need to Know About Synthetic Data Generation

💡 AI Concept of the Day: A Summary of our Series About Synthetic Data Generation

Today, we are finalizing our series about synthetic data generation.

The Sequence Knowledge #748: We introduced our series about synthetic data generation and reviewed Microsoft’s famous paper: Textbooks Are All You Need.

The Sequence Knowledge #752: Analyzes the different types of synthetic data generation methods.

The installment also includes a review of Stanford University’s research on the STaR method for synthetic data generation for reasoning.

11. The Sequence Knowledge #788: Reviews the top synthetic data generation frameworks.

5 days, 7 hours ago @ thesequence.substack.com
The Sequence Radar #791: The Eastern Surge & The Silicon Valley Shuffle

📝 Editorial: The Eastern Surge & The Silicon Valley Shuffle

I wanted to take a moment to dump my cache on what has been a ridiculously high-velocity week in AI.

We have a lot to cover—from DeepSeek’s architectural “sanity check” to Baidu’s quiet ascent, and the zero-day rehiring drama in Silicon Valley.

DeepSeek Engram: The “Hash Map” Moment

First, the DeepSeek “Engram” paper (released Jan 12).

The Chinese Model Surge (Baidu & GLM)

While the West focused on scaling laws, the East had a massive week in architectural refinement.

AI Lab: Tongyi DeepResearch (Alibaba Group)
Summary: This paper addresses the “discrimination collapse” in reinforcement learning for open-en…

1 week ago @ thesequence.substack.com
The Sequence Opinion #790: From Book Smarts to Street Smarts: How AI Benchmarks are Changing

If you’ve been watching the loss curves lately, you might have noticed a structural transition in the field that is technically profound but metrics-poor.

We are effectively moving from the era of the Chatbot—a stochastic, next-token prediction engine—to the era of the Agent—a digital employee expected to navigate, reason, and execute long-horizon workflows.

This isn’t just a product update; it’s a fundamental shift in the ontology of intelligence.

We measured intelligence by the ability to retrieve information or compress the internet into a set of weights.

We are moving from “Question-Answer” pairs to “Observation-Action” loops, and that requires us to build entirely new gymnasiums for th…

1 week, 3 days ago @ thesequence.substack.com
The Sequence AI of the Week #789: Recursive Language Models: Inside the MIT Research Everyone is Talking About

Like it skimmed the whole thing with sleepy eyes and then improvised.

The usual remedy has been to fight length with length.

Past a point, the model isn’t reasoning over your input; it’s drowning in it.

The paper names the failure mode plainly: context rot.

Not merely because we hit a hard context limit, but because even within that limit the model’s effective “working set” becomes unmanageably large.

1 week, 4 days ago @ thesequence.substack.com
The Sequence Knowledge #788: Inside the Generator: Meet The Top Synthetic Data Generation Frameworks for Modern AI

Today we will discuss:

An overview of the top synthetic data generation frameworks in the market.

NVIDIA’s top framework for synthetic data generation.

💡 AI Concept of the Day: An Overview of Synthetic Data Generation Frameworks

Synthetic data has quietly become the “second scaling law” for foundation models: once you’ve saturated human-authored corpora, the only way to keep climbing is to manufacture new data with models themselves.

The interesting part is that this is no longer done with ad-hoc scripts; we’re seeing full-fledged frameworks that treat synthetic generation as an infrastructure problem.

NVIDIA: Nemotron-4 + NeMo as a synthetic data foundry

1 week, 5 days ago @ thesequence.substack.com
The Sequence Radar #787: Rubin, Raises, and Returns: 2026 Starts Fast

The AI of the week discusses MIT’s crazy paper about recurvise language models that everyone is talking about.

Subscribe and don’t miss out:📝 Editorial: Rubin, Raises, and Returns: 2026 Starts FastThe first full week of 2026 has been incredibly active, marked by a density of significant developments that sets a high bar for the rest of the year.

The Silicon Wars: From Chips to “AI Factories”At CES 2026, the pretense that the GPU is a standalone component finally ended.

This approach allows the model to achieve state-of-the-art performance on reasoning-heavy benchmarks while significantly reducing computational costs and latency on simpler perception tasks.

🤖 AI Tech Releases

LFM2.5

Liquid AI …

2 weeks ago @ thesequence.substack.com
The Sequence Opinion #786: The Great Absorption: When System Code Becomes Model Weights

There is a distinct, unsettling feeling when you write code for AI agents today.

It is a specific flavor of Rich Sutton’s “The Bitter Lesson,” playing out in real-time.

In Sutton’s original formulation, hand-coded features were consistently crushed by general methods that leveraged computation. In our current era, the lesson is slightly different but equally bitter: hand-coded system scaffolding is consistently crushed by model internalization.

The history of the last three years is the history of capabilities migrating from the “outside” (the system, the prompt, the agent loop) to the “inside” (the weights, the activations, the forward pass).

You need to know which parts of your stack are…

2 weeks, 3 days ago @ thesequence.substack.com
The Sequence AI of the Week #785: Gradient Highway Maintenance: Inside DeepSeek’s Latest Breakthrough

Today, I would like to discuss DeepSeek’s new paper that everyone is talking about.

We love to talk about the flashy stuff—the Attention mechanism, the Mixture-of-Experts routing, the sheer scale of the datasets.

But if you peel back the layers of a Transformer, or really almost any deep network from the last decade, you find the true unsung hero of the Deep Learning revolution: the Residual Connection.

The residual connection is beautifully simple: y = f(x) + x.

You take the input x, pass it through some non-linear transformation f(x), and then just add the original x back in.
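
A minimal sketch of the formula above in plain Python, assuming a toy element-wise transformation. The names `residual_block` and `toy_f` are illustrative and do not come from the article.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """Compute y = f(x) + x element-wise."""
    fx = f(x)
    if len(fx) != len(x):
        raise ValueError("f must preserve the shape of x so the sum is defined")
    return [a + b for a, b in zip(fx, x)]


def toy_f(x: Vector) -> Vector:
    """Hypothetical non-linear transformation: scale by 0.5, then ReLU."""
    return [max(0.0, 0.5 * v) for v in x]


y = residual_block([2.0, -4.0, 0.0], toy_f)
print(y)
```

When f outputs zeros, the block reduces to the identity, so the input always has a direct path through the layer.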

2 weeks, 4 days ago @ thesequence.substack.com
The Sequence Knowledge #784: The Convergence of Synthetic Data and World Models Models Are Unlocking Embodied AI

Today we will discuss: Synthetic data generation for 3D world model environments.

The amazing Genie world model from Google DeepMind.

Gathering high-fidelity, perfectly labeled 3D data from the physical world is slow, expensive, and rarely captures the “long-tail” edge cases necessary for robust systems.

The industry is increasingly turning to a powerful pairing to solve this: Synthetic Data Generation (SDG) and World Models.

Together, they are moving us from training AI on static datasets to training them inside dynamic, interactive simulations.

2 weeks, 5 days ago @ thesequence.substack.com
The Sequence Radar #783: Softbank, DeepSeek, MiniMax and The Sequence 2026

Next Week in The Sequence: We continue our series about synthetic data, exploring the potential for world models.

Subscribe and don’t miss out:

📝 Editorial: Softbank, DeepSeek, MiniMax and The Sequence 2026

As we cross the threshold into a new year, I am upholding my annual tradition of experimenting with the structure of The Sequence.

While SoftBank attempts to conquer AGI through capital dominance, Chinese lab DeepSeek continues to demonstrate that smarter math can rival larger budgets.

This approach enables the creation of scalable, persistent, and controllable environments—such as infinite travel atlases or galaxy simulations—that bridge the gap between static web frameworks and fully gener…

3 weeks ago @ thesequence.substack.com
The Sequence Opinion #782: The New Gradient: Research Directions That Will Ship in 2026

As the first post of 2026, I wanted to share some of the research trends that I think might be super influential in this year’s frontier model breakthroughs.

We built these incredible “stochastic parrots” that could complete your code, write poetry, and pass the bar exam, all by just really, really wanting to predict the next token.

But if you look at the research papers dropping recently, the vibe has shifted.

We don’t just want models that can talk smooth; we want models that can think straight.

As we look toward 2026, we are moving from the era of Generative AI (making things that look real) to Verifiable AI (making things that are correct).

3 weeks, 3 days ago @ thesequence.substack.com
The Sequence AI of the Week #781: The Amazing GLM 4.7

In the frenetic closing weeks of 2025, the artificial intelligence landscape was dominated by the usual titans.

Google’s Gemini 3 Pro dazzled with its multimodal fluidity, and the rumor mill surrounding OpenAI’s GPT-5.2 reached a fever pitch.

Yet, amidst this cacophony of frontier model releases, a quieter, perhaps more consequential shift occurred on December 22.

Zhipu AI (Z.ai) released GLM-4.7, a model that signals a definitive transition in the open-weight ecosystem: a move away from “conversational competency” toward “agentic reliability.” GLM-4.7 is not designed to be a chat partner; it is designed to be an employee.

By aggressively optimizing for long-context loops, terminal error rec…

3 weeks, 4 days ago @ thesequence.substack.com
📓 Cool Blogs
ODS.ai Habr
last post 4 months, 1 week ago
SWE-MERA: a new dynamic benchmark for agentic code generation models

However, all tasks in MERA CODE, as in SWE-bench and other benchmarks of this kind, follow the classic paradigm in which there is a fixed training set and, more importantly, a fixed evaluation set.

But the large language models for coding that we are trying to evaluate with our benchmark also learn from GitHub, and have done so since the very first LLaMa model.

700 tasks may not seem like many, but it is already a very decent number, and most importantly, these are new tasks.

Current behavior:
from sympy import ask, Q, Symbol
x = Symbol('x')
print(ask(Q.finite(x**-1), Q.real(x)))  # Output: True
Expected behavior: The function should return None to indicate uncertainty, as x**-…

4 months, 1 week ago @ habr.com
DRAGON: a dynamic benchmark for evaluating RAG systems in Russian

Answer: Keisuke Chiba

SPARQL query (Simple): SELECT DISTINCT ?s ?r ?o WHERE { { SELECT ?s ?r ?o WHERE { ?s ?r ?o . }

GROUP BY ?s ?r HAVING(count(?o) = 1) } { SELECT ?s ?r ?o WHERE { ?s ?r ?o . }

Answer: National Payment Card System (NSPK), Center for Biometric Technologies (CBT), Unified Biometric System (EBS)

SELECT ?s ?r ?o ?len WHERE { { SELECT ?s ?r (COUNT(?o1) as ?len) (GROUP_CONCAT(DISTINCT(STR(?o1));separator="|") AS ?o) WHERE { ?s ?r ?o1 . }

FILTER(?o != ?o1) } GROUP BY ?o ?o1 ?r ?r1 HAVING(COUNT(?s) = 1) } UNION { SELECT ?s ?r ?o ?r1 ?s1 WHERE { ?s ?r ?o .

6 months ago @ habr.com
RKNN Toolkit2: model conversion and simulation of the Rockchip NPU

In this article, I want to share my experience converting a neural network to the rknn format using the rknn-toolkit2 library.

Here is what the weights of the pytorch model look like in Netron:

Important!

Converting the onnx model to rknn

Next, an RKNN object is created, which manages the conversion and inference of the model on the Rockchip platform.

At this stage, the model is prepared for conversion to the RKNN format and subsequent execution on the Rockchip NPU.

Building and exporting the rknn model

At this stage, the ONNX model is converted to the internal RKNN format, the graph is optimized, and the model is prepared to run on the Rockchip NPU.

6 months, 1 week ago @ habr.com
MERA Code: a comprehensive evaluation of code generation in applied scenarios

🔗 MERA Code 🔗 GitHub with code and data 🔗 Collection on Hugging Face 🔗 Paper on arxiv 🔗 Project repository on GitVerse

What is MERA Code?

Modern code language models and general-purpose models (ChatGPT, Claude, Qwen, YandexGPT, GigaChat, etc.)

List of current MERA Code tasks and their characteristics

The catalog of MERA Code tasks and their detailed descriptions is available on the website.

In MERA Code, prompts are carefully tailored to the task and to the correct choice of answer.

In conclusion

MERA Code is an attempt to close an important gap in LLM testing: how useful the models really are in real, localized development.

6 months, 1 week ago @ habr.com
Machine Learning Mastery
last post 3 days, 8 hours ago
The 2026 Time Series Toolkit: 5 Foundation Models for Autonomous Forecasting

This list covers the five essential foundation models you need to know for building production forecasting systems in 2026.

The shift from task-specific models to foundation model orchestration changes how teams approach forecasting.

Instead of spending weeks tuning parameters and wrangling domain expertise for each new dataset, pretrained models already understand universal temporal patterns.

Amazon Chronos-2 (The Production-Ready Foundation)

Amazon Chronos-2 is the most mature option for teams moving to foundation model forecasting.

Lag-Llama (The Open-Source Backbone)

Lag-Llama brings probabilistic forecasting capabilities to foundation models through a decoder-only transformer inspired by…

3 days, 8 hours ago @ machinelearningmastery.com
Everything You Need to Know About How Python Manages Memory

If you follow the references, you go in a circle: Alice’s object references Bob’s object, which references Alice’s object, which references Bob’s object… forever.

How Python Manages Memory Using Reference Counting & Generational Garbage Collection

Python uses two main mechanisms for garbage collection:

Reference counting: This is the primary method.

import sys
# Create an object - reference count is 1
my_list = [1, 2, 3]
print(f"Reference count: {sys.getrefcount(my_list)}")
# Create another reference - count increases
another_ref = my_list
print(f"Reference count: {sys.getrefcount(my_list)}")
# Delete one reference - count decreases
del another_ref
print(f"Reference count: {sys.getrefcount(my…
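
The Alice-and-Bob cycle described above can be reproduced with the standard `gc` and `weakref` modules. This is a minimal sketch for CPython; the `Person` class and its `friend` attribute are hypothetical names chosen for illustration.

```python
import gc
import weakref


class Person:
    """Hypothetical class used only to build a reference cycle."""

    def __init__(self, name):
        self.name = name
        self.friend = None


gc.disable()  # keep the cyclic collector from running on its own during the demo

alice = Person("Alice")
bob = Person("Bob")
alice.friend = bob   # Alice references Bob
bob.friend = alice   # Bob references Alice: a cycle

# A weak reference lets us observe the object without keeping it alive.
alice_probe = weakref.ref(alice)

# Drop the only external references. Each object is still referenced by the
# other, so reference counting alone cannot free them.
del alice, bob
alive_after_del = alice_probe() is not None

# The cyclic garbage collector finds and frees the unreachable cycle.
gc.collect()
alive_after_collect = alice_probe() is not None

gc.enable()
print(alive_after_del, alive_after_collect)
```

The first flag shows that the objects outlive their names while the cycle exists; the second shows that a collection pass reclaims them.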

4 days, 8 hours ago @ machinelearningmastery.com
The Machine Learning Practitioner’s Guide to Model Deployment with FastAPI

If you’ve trained a machine learning model, a common question comes up: “How do we actually use it?” This is where many machine learning practitioners get stuck.

In this article, you will learn how to deploy a machine learning model using FastAPI step by step.

pipeline import Pipeline from sklearn .

Create a file called main.py:
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI(title="House Price Prediction API")

# Load model once at startup
model = joblib.load("house_price_model.joblib")
…

5 days, 8 hours ago @ machinelearningmastery.com
Top 5 Agentic AI Website Builders (That Actually Ship)

Instead of spending weeks learning UI frameworks, I started using tools like v0 and other agentic website builders to create professional dashboards, landing pages, and wallet interfaces.

I wanted to understand which AI website builders actually work end to end—not just tools that generate a pretty frontend, but ones that can also handle backend logic, connect data, and deploy a working product in minutes.

After testing and researching what is available, I put together this list of the top agentic AI website builders that go beyond design.

These tools help you build real applications, manage the backend, and ship fast without getting stuck in setup or configuration.

Hostinger Horizons

Hostin…

6 days, 7 hours ago @ machinelearningmastery.com
The Complete Guide to Data Augmentation for Machine Learning

In this article, you’ll learn how data augmentation works in practice and when to use it.

Data Augmentation for Image Data

Image data augmentation is the most intuitive place to start.

load_data()
# Normalize pixel values
X_train = X_train / 255.0
X_test = X_test / 255.0
# Reshape to (samples, height, width, channels)
X_train = X_train.

Data Augmentation for Audio Data

Audio data also benefits heavily from augmentation.

Data Augmentation for Tabular Data

Tabular data is the most sensitive data type to augment.

1 week, 2 days ago @ machinelearningmastery.com
The Beginner’s Guide to Computer Vision with Python

In this article, you will learn how to complete three beginner-friendly computer vision tasks in Python (edge detection, simple object detection, and image classification) using widely available libraries.

Thankfully, Python libraries like OpenCV and TensorFlow make it possible — even for beginners — to create and experiment with their own computer vision solutions using just a few lines of code.

cvtColor(image, cv2.

COLOR_BGR2GRAY)
# Detect faces
faces = face_cascade.

Dense(10, activation="softmax")])
# Compile and train the model
model.

1 week, 3 days ago @ machinelearningmastery.com
Uncertainty in Machine Learning: Probability & Noise

This entry focuses on the uncertainty, probability, and noise in machine learning.

Uncertainty in Machine Learning

Uncertainty is an unavoidable part of machine learning, arising whenever models attempt to make predictions about the real world.

Machine Learning Mastery Resources

These are some selected resources for learning more about probability and noise:

A Gentle Introduction to Uncertainty in Machine Learning – This article explains what uncertainty means in machine learning, explores the main causes such as noise in data, incomplete coverage, and imperfect models, and describes how probability provides the tools to quantify and manage that uncertainty.

Understanding Probability Distribut…

1 week, 4 days ago @ machinelearningmastery.com
How to Read a Machine Learning Research Paper in 2026

In this article, you will learn a practical, question-driven workflow for reading machine learning research papers efficiently, so you finish with answers, not fatigue.

Introduction

When I first started reading machine learning research papers, I honestly thought something was wrong with me.

I would open a paper, read the first few pages carefully, and then slowly lose focus.

Many beginners feel overwhelmed when reading papers, especially in machine learning where ideas, terminology, and assumptions move fast.

I will not get into who originally proposed the technique, but the mindset behind it completely changed how I read papers.

1 week, 5 days ago @ machinelearningmastery.com
10 Ways to Use Embeddings for Tabular ML Tasks

DataFrame({"user_id": [101, 102, 103, 101, 104], "product": ["Phone", "Laptop", "Tablet", "Laptop", "Phone"], "category": ["Electronics", "Electronics", "Electronics", "Electronics", "Electronics"], "review": ["great battery", "fast performance", "light weight", "solid build quality", "amazing camera"], "rating": [5, 4, 4, 5, 5]})

1. Averaging Word Embeddings for Text Columns

This approach compresses multiple texts of variable length into fixed-size embeddings by aggregating word-wise embeddings within each text sequence.

models import Word2Vec
# Train embeddings on the review text
sentences = df["review"].

tolist()
w2v = Word2Vec(sentences …
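
A minimal sketch of the averaging step, using a tiny hand-written embedding table instead of a trained Word2Vec model. The vectors and the `average_embedding` helper are made up for illustration; in practice the lookup table would come from the trained model.

```python
from typing import Dict, List

# Hypothetical 3-dimensional word vectors (assumed values, not trained).
toy_vectors: Dict[str, List[float]] = {
    "great": [1.0, 0.0, 0.0],
    "battery": [0.0, 1.0, 0.0],
    "fast": [0.0, 0.0, 1.0],
    "performance": [1.0, 1.0, 0.0],
}


def average_embedding(text: str, vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of known words; return zeros if no word is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


reviews = ["great battery", "fast performance", "light weight"]
features = [average_embedding(r, toy_vectors, dim=3) for r in reviews]
for review, vec in zip(reviews, features):
    print(review, vec)
```

Every review maps to a vector of the same length regardless of how many words it contains, which is what lets the result sit alongside ordinary tabular columns. Falling back to a zero vector for out-of-vocabulary text is one possible choice among several.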

1 week, 6 days ago @ machinelearningmastery.com
Quantizing LLMs Step-by-Step: Converting FP16 Models to GGUF

Downloading a Pre-trained Model

We will pick a small FP16 model from Hugging Face.

Converting the Model to GGUF with Quantization

Now, run the conversion script, specifying the input folder, output filename, and quantization type.
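
To give a feel for what an 8-bit quantization type does to the weights, here is a simplified sketch of block-wise absmax quantization in plain Python. It is not the conversion script's code and does not reproduce the GGUF file layout; the block size of 32 and all helper names are assumptions made for illustration.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # assumed block size for this sketch


def quantize_block(block: List[float]) -> Tuple[float, List[int]]:
    """Map floats to integers in [-127, 127] with one shared scale per block."""
    absmax = max(abs(v) for v in block)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    return scale, [round(v / scale) for v in block]


def dequantize_block(scale: float, q: List[int]) -> List[float]:
    """Recover approximate floats from the integers and the scale."""
    return [scale * v for v in q]


def quantize(weights: List[float]) -> List[Tuple[float, List[int]]]:
    """Quantize a flat list of weights block by block."""
    return [
        quantize_block(weights[i:i + BLOCK_SIZE])
        for i in range(0, len(weights), BLOCK_SIZE)
    ]


# Deterministic toy weights in [-1, 1].
weights = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]
blocks = quantize(weights)
restored = [v for scale, q in blocks for v in dequantize_block(scale, q)]
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(len(blocks), max_error)
```

Each weight is stored as a small integer plus a per-block scale, so the round-trip error is bounded by half the scale of its block.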

gguf: n_tensors = 201, total_size = 1.2G
Writing: 100% 1.17G/1.17G [00:26<00:00, 44.5Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to /content/tinyllama-1.1b-chat.

upload_file(path_or_fileobj="/content/tinyllama-1.1b-chat.Q8_0.gguf", path_in_repo="tinyllama-1.1b-chat.Q8_0.gguf", repo_id=repo_id)

This creates a new repository (if it doesn’t exist) and uploads your quantized GGUF file.

You can access the quantized GG…

2 weeks, 3 days ago @ machinelearningmastery.com
Training a Model on Multiple GPUs with Data Parallelism

device(f"cuda:{local_rank}")
print(f"World size: {world_size}, Rank: {rank}, Local rank: {local_rank}.

save({"model": model.

mlp = LlamaMLP(config)
def forward(self, hidden_states: Tensor, rope: RotaryPositionEncoding, attn_mask: Tensor) -> Tensor:
    # First residual block: Self-attention
    residual = hidden_states
    hidden_states = self.

1 month ago @ machinelearningmastery.com
Train a Model Faster with torch.compile and Gradient Accumulation

It is not the same model object you created using nn.Module, but it shares the same tensors with the original model.

This is because the compiled model is an object that shares the same weights as the original model.

state_dict(), "model.pth")

The original model can be accessed from the compiled model using model._orig_mod.

This can be achieved by increasing the batch size: with the same number of data samples, a larger batch size means fewer batches to process.

You also learned that gradient accumulation is a technique for training with a larger effective batch size by accumulating gradients from multiple mini-batches.
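
A small numerical sketch of why gradient accumulation works, using a one-parameter linear model with mean squared error in plain Python rather than PyTorch. The data, the model, and the function names are assumptions for illustration. Scaling each micro-batch gradient by the number of accumulation steps makes the accumulated sum equal to the gradient of one large batch, provided the micro-batches have equal size.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def mse_grad(w: float, batch: List[Sample]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


data: List[Sample] = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
w = 0.5

# Gradient from one large batch of 4 samples.
full_batch_grad = mse_grad(w, data)

# Same samples as 2 micro-batches of 2, accumulated before a single update.
accum_steps = 2
micro_batches = [data[0:2], data[2:4]]
accumulated = 0.0
for micro in micro_batches:
    accumulated += mse_grad(w, micro) / accum_steps  # scale so the sum is a mean

lr = 0.01
w_new = w - lr * accumulated  # one optimizer step per accum_steps micro-batches
print(full_batch_grad, accumulated)
```

The two printed gradients agree, while only one micro-batch needs to be held in memory at a time.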

1 month ago @ machinelearningmastery.com
Training a Model with Limited Memory using Mixed Precision and Gradient Checkpointing

set_postfix(loss=loss.

save({"model": model.

state_dict(), "optimizer": optimizer.

state_dict(), "scaler": scaler.

If you still encounter memory issues, another technique trades time for memory: gradient checkpointing.

1 month ago @ machinelearningmastery.com
Practical Agentic Coding with Google Jules

Introducing Google Jules

If you have an interest in agentic coding, there’s a pretty good chance you’ve heard of Google Jules by now.

Using Google Jules with an Existing GitHub Repository

The process for using Google Jules generally involves initial setup, task prompting, and subsequent human review and approval.

Step 1: Initial Access and Authentication

Visit the official Google Jules website and click “Try Jules”.

These are just a few specific concerns to keep in mind while exploring Google Jules and agentic coding.

Wrapping Up

This has been an introduction to using Google Jules for agentic coding.

1 month ago @ machinelearningmastery.com
Evaluating Perplexity on Language Models

When you train a language model, you want to measure how accurately it predicts human language use.

Specifically, you will learn:
What is perplexity, and how to compute it
How to evaluate the perplexity of a language model with sample data

Let’s get started.

Overview

This article is divided into two parts; they are:
What Is Perplexity and How to Compute It
Evaluate the Perplexity of a Language Model with HellaSwag Dataset

What Is Perplexity and How to Compute It

Perplexity is a measure of how well a language model predicts a sample of text.

Evaluate the Perplexity of a Language Model with HellaSwag Dataset

Perplexity is a dataset-dependent metric.

from_pretrained(model)
model = transformers.
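
As a minimal sketch of the definition, perplexity is the exponential of the average negative log-likelihood per token. The probabilities below are made-up values standing in for what a language model would assign to each observed token; the function name is illustrative.

```python
import math
from typing import List


def perplexity(token_probs: List[float]) -> float:
    """exp of the mean negative log-probability of the observed tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# Assumed per-token probabilities for two hypothetical models on the same text.
confident_model = [0.5, 0.5, 0.5, 0.5]
uncertain_model = [0.1, 0.2, 0.1, 0.2]

print(perplexity(confident_model))
print(perplexity(uncertain_model))
```

A model that assigns probability 0.5 to every token has a perplexity of about 2, as uncertain as a fair coin flip at each step. Lower values mean the model predicts the text better.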

1 month ago @ machinelearningmastery.com
ML in Production
last post None
Sorta Insightful
last post 2 months, 1 week ago
Authentic Imperfection

I’ve been thinking about the anger surrounding generative AI.

To keep things fair, he took the best human images and best AI images, meaning human art from famous artists, and AI art from prompters skilled at removing obvious tells of image generation.

When people complain about AI slop, I see it as a complaint against the deluge of default style AI images.

We’ve seen this happen in all forms: AI text, AI music, older forms of computer generated content like CGI.

As much as we celebrate imperfection, digital imperfection is a step too far.

2 months, 1 week ago @ alexirpan.com
Ten Years Later

Every now and then, someone asks me why I blog, and I don’t really know what to tell them.

That’s another reason I’m not celebrating 10 years with more gusto: I know I’ve been writing less.

Indiana Jones and the Great Circle: I don’t know how they did it, but Indiana Jones and the Great Circle was just fun all the way through.

My one complaint is that the hand-to-hand combat feels like the worst part of the game, so of course they put a bunch of upgrades behind learning parry timings you’ll never use later.

I have not tried Peak, but Another Crab’s Treasure was really good and is worth playing if you’re interested in a Souls-like.

5 months, 1 week ago @ alexirpan.com
Brony Musicians Seize The Means of Production: My Eyewitness Account to BABSCon 2025

A music concert in the evenings, typically set up as a rave with EDM or rock music made by brony musicians.

She has been involved in organizing pony music concerts for over a decade, for both BABSCon and other pony conventions.

Thank you, BABSCon Chairs

The brony musicians immediately jump into an emergency Discord call with Pinkaboo, to get her side of the story.

Other conventions start tweeting in support of the brony musicians, with no one taking BABSCon’s side.

It’s hard for me to explain why I like MLP fan music, because brony music really isn’t accessible.

6 months, 1 week ago @ alexirpan.com
Who is AI For?

I think the easy answer to this question is that right now, AI is for the AI developers.

Code is useful, it makes money, it is a testbed for AI speeding up the development of AI, and it is easy.

I’m working in AI because it pays well and is potentially really good for the world.

The artists did not know what AI was, but when they learned, they quickly decided they did not want it.

It feels like the most likely outcome is that people go all-in on pushing raw intelligence, in the way that AI developers can measure it, leaving behind those that are not like AI developers.

9 months, 4 weeks ago @ alexirpan.com
Lil'Log
last post None
The Spectator
last post None
The Unofficial Google Data Science Blog
last post None
Off the Convex Path
last post None
Jay Alammar
last post None
Piekniewski's blog
last post None
fast.ai NLP
last post None
Sebastian Ruder
last post None
Andrew Karpathy blog
last post None
大トロ
last post None
🔬 Science
Papers With Code
last post 6 months ago
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos.

Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point tracking, monocular depth, and camera pose estimation into a high-performing and feedforward 3D point tracker.

It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion, with a fully differentiable and end-to-end architecture, allowing scalable training across a wide range of datasets, including synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage.

By learning geometry and motion jointly from …

6 months ago @ paperswithcode.com
/antof27/ Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation

Calisthenics skill classification is the computer vision task of inferring the skill performed by an athlete from images, enabling automatic performance assessment and personalized analytics.

Traditional methods for calisthenics skill recognition are based on pose estimation methods to determine the position of skeletal data from images, which is later fed to a classification algorithm to infer the performed skill.

This work proposes a direct approach to calisthenics skill recognition, which leverages depth estimation and athlete patch retrieval to avoid the computationally expensive human pose estimation module.

Using Depth Anything V2 for depth estimation and YOLOv10 for athlete localizat…

6 months ago @ paperswithcode.com
/snowflakedb/ Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI

Inference is now the dominant AI workload, yet existing systems force trade-offs between latency, throughput, and cost.

Arctic Inference, an open-source vLLM plugin from Snowflake AI Research, introduces Shift Parallelism, a dynamic parallelism strategy that adapts to real-world traffic while integrating speculative decoding, SwiftKV compute reduction, and optimized embedding inference.

It achieves up to 3.4 times faster request completion, 1.75 times faster generation, and 1.6M tokens/sec per GPU for embeddings, outperforming both latency- and throughput-optimized deployments.

Already powering Snowflake Cortex AI, Arctic Inference delivers state-of-the-art, cost-effective inference for ent…

6 months ago @ paperswithcode.com
/NVIDIA/ FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale

FourCastNet 3 advances global weather modeling by implementing a scalable, geometric machine learning (ML) approach to probabilistic ensemble forecasting.

The approach is designed to respect spherical geometry and to accurately model the spatially correlated probabilistic nature of the problem, resulting in stable spectra and realistic dynamics across multiple scales.

FourCastNet 3 delivers forecasting accuracy that surpasses leading conventional ensemble models and rivals the best diffusion-based methods, while producing forecasts 8 to 60 times faster than these approaches.

In contrast to other ML approaches, FourCastNet 3 demonstrates excellent probabilistic calibration and retains realis…

6 months ago @ paperswithcode.com
/jingyanw/ Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?

We study A/B experiments that are designed to compare the performance of two recommendation algorithms.

The bias arising from this type of data sharing is known as "symbiosis bias".

In this paper, we highlight that, for decision-making purposes, the sign of the GTE often matters more than its precise magnitude when selecting the better algorithm.

We formalize this insight under a multi-armed bandit framework and theoretically characterize when the sign of the expected GTE estimate under data sharing aligns with or contradicts the sign of the true GTE.

Our analysis identifies the level of exploration versus exploitation as a key determinant of how symbiosis bias impacts algorithm selection.

6 months ago @ paperswithcode.com
/qqq-yi/ DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression

Task-agnostic prompt compression leverages the redundancy in natural language to reduce computational overhead and enhance information density within prompts, especially in long-context scenarios.

Existing methods predominantly rely on information entropy as the metric to compress lexical units, aiming to achieve minimal information loss.

However, these approaches overlook two critical aspects: (i) the importance of attention-critical tokens at the algorithmic level, and (ii) shifts in information entropy during the compression process.

Motivated by these challenges, we propose a dynamic attention-aware approach for task-agnostic prompt compression (DAC).

This approach effectively integrate…

6 months ago @ paperswithcode.com
/lukasellinger/ Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions

Large Language Models (LLMs) can provide accurate word definitions and explanations for any context.

However, the scope of the definition changes for different target groups, like children or language learners.

We investigate how simplification impacts homonym definition quality across three target groups: Normal, Simple, and ELI5.

Our results show that simplification drastically degrades definition completeness by neglecting polysemy, increasing the risk of misunderstanding.

Fine-tuning Llama 3.1 8B with Direct Preference Optimization substantially improves homonym response quality across all prompt types.

6 months ago @ paperswithcode.com
/pspdada/ Mitigating Object Hallucinations via Sentence-Level Early Intervention

Multimodal large language models (MLLMs) have revolutionized cross-modal understanding but continue to struggle with hallucinations - fabricated content contradicting visual inputs.

Existing hallucination mitigation methods either incur prohibitive computational costs or introduce distribution mismatches between training data and model outputs.

We identify a critical insight: hallucinations predominantly emerge at the early stages of text generation and propagate through subsequent outputs.

To address this, we propose **SENTINEL** (**S**entence-level **E**arly i**N**tervention **T**hrough **IN**-domain pr**E**ference **L**earning), a framework that eliminates dependency on human annotations…

6 months ago @ paperswithcode.com
/owos/ FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Language models (LMs) are challenging to adapt to new data distributions by simple finetuning.

This is due to the rigidity of their subword tokenizers, which typically remain unchanged during adaptation.

This inflexibility often leads to inefficient tokenization, causing overfragmentation of out-of-distribution domains, unseen languages, or scripts.

In this work, we develop byte-level LMs with learnable tokenizers to make tokenization adaptive.

Our models include a submodule that learns to predict boundaries between the input byte sequence, encoding it into variable-length segments.

6 months ago @ paperswithcode.com
/wojiufukele/ Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion

To address the challenges posed by cascading reactions caused by component failures in autonomous cargo ships (ACS) and the uncertainties in emergency decision-making, this paper proposes a novel hybrid feature fusion framework for constructing a graph-structured dataset of failure modes.

A hierarchical feature fusion framework is constructed, using Word2Vec encoding to encode subsystem/component features, BERT-KPCA to process failure modes/reasons, and Sentence-BERT to quantify the semantic association between failure impact and emergency decision-making.

The dataset covers 12 systems, 1,262 failure modes, and 6,150 propagation paths.

In the label prediction results, the Shore-based Meteor…

6 months ago @ paperswithcode.com
/YF-W/ Tri-Learn Graph Fusion Network for Attributed Graph Clustering

In recent years, models based on Graph Convolutional Networks (GCN) have made significant strides in the field of graph data analysis.

Although the Graph Transformer architecture has mitigated some of these issues, its performance is still limited when processing heterogeneous graph data.

To address these challenges, this study proposes a novel deep clustering framework comprising GCN, Autoencoder (AE), and Graph Transformer, termed the Tri-Learn Graph Fusion Network (Tri-GFN).

The tri-learning mechanism allows mutual learning among these modules, while the feature fusion strategy enables the model to capture complex relationships, yielding highly discriminative representations for gra…

6 months ago @ paperswithcode.com
/mr-ravin/ APTx Neuron: A Unified Trainable Neuron Architecture Integrating Activation and Computation

We propose the APTx Neuron, a novel, unified neural computation unit that integrates non-linear activation and linear transformation into a single trainable expression.

The APTx Neuron is derived from the APTx activation function, thereby eliminating the need for separate activation layers and making the architecture both computationally efficient and elegant.

The proposed neuron follows the functional form $y = \sum_{i=1}^{n} ((\alpha_i + \tanh(\beta_i x_i)) \cdot \gamma_i x_i) + \delta$, where all parameters $\alpha_i$, $\beta_i$, $\gamma_i$, and $\delta$ are trainable.
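
A minimal sketch of the functional form quoted above, written in plain Python for a single neuron. The function name `aptx_neuron` and the list-based parameter layout are illustrative assumptions, not the authors' implementation; in a real model the parameters would be trainable tensors.

```python
import math


def aptx_neuron(x, alpha, beta, gamma, delta):
    """Evaluate y = sum_i (alpha_i + tanh(beta_i * x_i)) * gamma_i * x_i + delta.

    x, alpha, beta, gamma are equal-length sequences of floats; delta is a float.
    The name and signature are hypothetical, chosen only for illustration.
    """
    if not (len(x) == len(alpha) == len(beta) == len(gamma)):
        raise ValueError("x, alpha, beta and gamma must have the same length")
    total = 0.0
    for x_i, a_i, b_i, g_i in zip(x, alpha, beta, gamma):
        # Non-linear gate (alpha_i + tanh(beta_i * x_i)) times the linear term gamma_i * x_i.
        total += (a_i + math.tanh(b_i * x_i)) * g_i * x_i
    return total + delta
```

With `beta_i = 0` the tanh term vanishes, so each input contributes the plain linear term `alpha_i * gamma_i * x_i`.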

We validate our APTx Neuron-based architecture on the MNIST dataset, achieving up to 96.69\% test accuracy in just 20 ep…

6 months ago @ paperswithcode.com
/Rec4Fun/ A Reproducibility Study of Product-side Fairness in Bundle Recommendation

While this problem has been widely studied in traditional recommendation settings, its implications for bundle recommendation (BR) remain largely unexplored.

Existing fairness frameworks and metrics designed for traditional recommender systems may not directly translate to this multi-layered setting.

In this paper, we conduct a comprehensive reproducibility study of product-side fairness in BR across three real-world datasets using four state-of-the-art BR methods.

We analyze exposure disparities at both the bundle and item levels using multiple fairness metrics, uncovering important patterns.

Overall, our findings offer actionable insights for building fairer bundle recommender systems and…

6 months ago @ paperswithcode.com
/cbobed/ OntView: What you See is What you Meant

However, the lack of tools that provide effective visualization is still a significant challenge.

In this paper, we present OntView, an ontology viewer that is designed to provide users with an intuitive visual representation of ontology concepts and their formal definitions through a user-friendly interface.

Building on the use of a DL reasoner, OntView follows a "What you see is what you meant" paradigm, showing the actual inferred knowledge.

One key aspect for this is its ability to visualize General Concept Inclusions (GCI), a feature absent in existing visualization tools.

OntView has been released with an open-source license for the whole community.

6 months ago @ paperswithcode.com
/Rec4Fun/ RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction

These approaches fail to capture elaborate relations hidden in real-world bundle structures, resulting in suboptimal bundle representations.

To overcome this limitation, we propose RaMen, a novel method that provides a holistic multi-strategy approach for bundle construction.

RaMen utilizes both intrinsic (characteristics) and extrinsic (collaborative signals) information to model bundle structures through Explicit Strategy-aware Learning (ESL) and Implicit Strategy-aware Learning (ISL).

Integrating diverse strategies enables RaMen to learn more comprehensive and robust bundle representations.

Meanwhile, Multi-strategy Alignment & Discrimination module is employed to facilitate knowledge tr…

6 months ago @ paperswithcode.com
Papers With Code
last post 6 months ago
/PrimisAI/ Adaptive Multi-Agent Reasoning via Automated Workflow Generation

The rise of Large Reasoning Models (LRMs) promises a significant leap forward in language model capabilities, aiming to tackle increasingly sophisticated tasks with unprecedented efficiency and accuracy.

However, despite their impressive performance, recent studies have highlighted how current reasoning models frequently fail to generalize to novel, unseen problems, often resorting to memorized solutions rather than genuine inferential reasoning.

In this paper, we introduce Nexus Architect, an enhanced iteration of our multi-agent system framework, Nexus, equipped with a novel automated workflow synthesis mechanism.

Given a user's prompt and a small set of representative examples, the Archi…

6 months ago @ paperswithcode.com
/sharanya02/ Real Time Captioning of Sign Language Gestures in Video Meetings

One of the most tested ways to establish such a communication is through the use of sign based languages.

However, not many people are aware of the smaller intricacies involved with sign language.

Sign language recognition using computer vision aims at eliminating the communication barrier between deaf-mute and ordinary people so that they can properly communicate with others.

In recent studies, it has been found that people with hearing disabilities prefer to sign over typing during these video calls.

In this paper, we are proposing a browser extension that will automatically translate sign language to subtitles for everyone else in the video call.

6 months ago @ paperswithcode.com
/alessiopittiglio/ Leveraging Context for Multimodal Fallacy Classification in Political Debates

In this paper, we present our submission to the MM-ArgFallacy2025 shared task, which aims to advance research in multimodal argument mining, focusing on logical fallacies in political debates.

Our approach uses pretrained Transformer-based models and proposes several ways to leverage context.

In the fallacy classification subtask, our models achieved macro F1-scores of 0.4444 (text), 0.3559 (audio), and 0.4403 (multimodal).

Our multimodal model showed performance comparable to the text-only model, suggesting potential for improvements.

6 months ago @ paperswithcode.com
/RS2002/ One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms

On-demand ride-sharing platforms face the fundamental challenge of dynamically bundling passengers with diverse origins and destinations and matching them with vehicles in real time, all under significant uncertainty.

However, conventional MARL-based ride-sharing approaches heavily rely on the accurate estimation of Q-values or V-values, which becomes problematic in large-scale, highly uncertain environments.

To address these challenges, we propose two novel alternative methods that bypass value function estimation.

First, we adapt GRPO to ride-sharing, replacing the PPO baseline with the group average reward to eliminate critic estimation errors and reduce training bias.

Second, inspired b…
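
The group-average baseline mentioned for the first method can be sketched as follows. This is only an illustration of the idea of replacing a learned critic with the mean reward of a group of sampled outcomes; the function name and the mean-only centering are assumptions, and the paper's actual objective may include further terms such as normalization or clipping.

```python
from statistics import mean


def group_baseline_advantages(rewards):
    """Return each reward minus the group's average reward.

    `rewards` holds the rewards of one group of sampled actions or trajectories.
    No value function (critic) is evaluated: the group mean acts as the baseline.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    baseline = mean(rewards)
    return [r - baseline for r in rewards]
```

Because the baseline is the group mean, the advantages within a group sum to zero, so samples are only rewarded relative to their peers.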

6 months ago @ paperswithcode.com
/LiXinran6/ Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation

6 months ago @ paperswithcode.com
/ShimSoonYong/ ZClassifier: Temperature Tuning and Manifold Approximation via KL Divergence on Logit Space

We introduce a novel classification framework, ZClassifier, that replaces conventional deterministic logits with diagonal Gaussian-distributed logits. Code: https://github.com/ShimSoonYong/ZClassifier

6 months, 1 week ago @ paperswithcode.com
/briziorusso/ On Gradual Semantics for Assumption-Based Argumentation

In this paper, we fill this gap and propose a family of novel gradual semantics for equipping assumptions, which are the core components in ABA frameworks, with dialectical strengths. Code: https://github.com/briziorusso/GradualABA

6 months, 1 week ago @ paperswithcode.com
/wumingqi/ Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

6 months, 1 week ago @ paperswithcode.com
/IsaacYQH/ WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling

Despite rapid progress in end-to-end AI music generation, AI-driven modeling of professional Digital Signal Processing (DSP) workflows remains challenging. Code: https://github.com/IsaacYQH/WildFX

6 months, 1 week ago @ paperswithcode.com
/summer1278/ Addressing Data Imbalance in Transformer-Based Multi-Label Emotion Detection with Weighted Loss

This paper explores the application of a simple weighted loss function to Transformer-based models for multi-label emotion detection in SemEval-2025 Shared Task 11. Code: https://github.com/summer1278/semeval2025-task11

6 months, 1 week ago @ paperswithcode.com
/gabrielkmbo/ Step-wise Policy for Rare-tool Knowledge (SPaRK): Offline RL that Drives Diverse Tool Use in LLMs

We present Step-wise Policy for Rare-tool Knowledge (SPaRK), a novel reinforcement learning framework that teaches large language models to explore diverse tool usage patterns beyond conventional high-temperature sampling. Code: https://github.com/gabrielkmbo/explore-rl

6 months, 1 week ago @ paperswithcode.com
/Cavendish518/ Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation

Service robots are increasingly deployed in diverse and dynamic environments, where both physical layouts and social contexts change over time and across locations. Code: https://github.com/Cavendish518/LE-Nav

6 months, 1 week ago @ paperswithcode.com
/MatteoFasulo/ AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles

6 months, 1 week ago @ paperswithcode.com
/VCA-EPFL/ SystolicAttention: Fusing FlashAttention within a Single Systolic Array

The frequent data swaps between the systolic array and external vector units result in low systolic array utilization. Code: https://github.com/VCA-EPFL/FSA

6 months, 1 week ago @ paperswithcode.com
/Buddhi19/ Precision Spatio-Temporal Feature Fusion for Robust Remote Sensing Change Detection

6 months, 1 week ago @ paperswithcode.com
Papers With Code
last post 6 months ago
/fudanvi/ Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

6 months, 1 week ago @ paperswithcode.com
/benedekrozemberczki/ PGT-I: Scaling Spatiotemporal GNNs with Memory-Efficient Distributed Training

Spatiotemporal graph neural networks (ST-GNNs) are powerful tools for modeling spatial and temporal data dependencies. Code: https://github.com/benedekrozemberczki/pytorch_geometric_temporal

6 months, 1 week ago @ paperswithcode.com
/chengxuphd/ DCR: Quantifying Data Contamination in LLMs Evaluation

6 months, 1 week ago @ paperswithcode.com
/gitter-lab/ Assay2Mol: large language model-based drug design using BioAssay context

Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. Code: https://github.com/gitter-lab/Assay2Mol

6 months, 1 week ago @ paperswithcode.com
/hayatkhan8660-maker/ DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition

We employ forward Kullback-Leibler (KL) divergence alongside spatio-temporal focal modulation to effectively transfer both local and global context from the Video-FocalNet Base (teacher) to the proposed VFL-Net (student). Code: https://github.com/hayatkhan8660-maker/DVFL-Net

6 months, 1 week ago @ paperswithcode.com
/JudyJuezhuLong/ Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows

6 months, 1 week ago @ paperswithcode.com
/joaojcorreia/ A Fuzzy Approach to Project Success: Measuring What Matters

This paper introduces a novel approach to project success evaluation by integrating fuzzy logic into an existing construct. Code: https://github.com/joaojcorreia/FuzzyLogic_ProjectSuccess

6 months, 1 week ago @ paperswithcode.com
/kunkunlin1221/ InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing

Extensive experiments demonstrate the effectiveness of InstructFLIP by outperforming SOTA models in accuracy and substantially reducing training redundancy across diverse domains in FAS. Code: https://github.com/kunkunlin1221/InstructFLIP

6 months, 1 week ago @ paperswithcode.com
/Linvyl/ Describe Anything Model for Visual Question Answering on Text-rich Images

Recent progress has been made in region-aware vision-language modeling, particularly with the emergence of the Describe Anything Model (DAM). Code: https://github.com/Linvyl/DAM-QA

6 months, 1 week ago @ paperswithcode.com
/abhijeet3922/ Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker

We propose multi-step custom implementation utilizing widely adopted hybrid search (metadata & embedding) and state of the art late interaction re-ranker to retrieve best matching pages. Code: https://github.com/abhijeet3922/vision-RAG

6 months, 1 week ago @ paperswithcode.com
/ziangcao0312/ PhysX: Physical-Grounded 3D Asset Generation

3D modeling is moving from virtual to physical. Code: https://github.com/ziangcao0312/PhysX

6 months, 1 week ago @ paperswithcode.com
/henry123-boy/ SpatialTrackerV2: 3D Point Tracking Made Easy

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Code: https://github.com/henry123-boy/SpaTrackerV2

6 months, 1 week ago @ paperswithcode.com
/cncs-fit/ Emergence of Functionally Differentiated Structures via Mutual Information Optimization in Recurrent Neural Networks

Analysis of network performance, correlation patterns, and weight matrices reveals that mutual information minimization yields high task performance alongside clear functional modularity and moderate structural modularity. Code: https://github.com/cncs-fit/mio_rnn

6 months, 1 week ago @ paperswithcode.com
/coswindywang/ Making Language Model a Hierarchical Classifier and Generator

Language heads of the last layer are copied to different selected intermediate layers, and fine-tuned with different task inputs. Code: https://github.com/coswindywang/HdLM

6 months, 1 week ago @ paperswithcode.com
/ahmedehabb/ From Roots to Rewards: Dynamic Tree Reasoning with RL

Modern language models address complex questions through chain-of-thought (CoT) reasoning (Wei et al., 2023) and retrieval augmentation (Lewis et al., 2021), yet struggle with error propagation and knowledge integration. Code: https://github.com/ahmedehabb/From-Roots-to-Rewards-Dynamic-Tree-Reasoning-with-RL

6 months, 1 week ago @ paperswithcode.com
💼 University and corporation labs
DeepMind
last post 1 week, 2 days ago
D4RT: Teaching AI to see the world in four dimensions

Introducing D4RT, a unified AI model for 4D scene reconstruction and tracking across space and time.

Today, we are introducing D4RT (Dynamic 4D Reconstruction and Tracking), a new AI model that unifies dynamic scene reconstruction into a single, efficient framework, bringing us closer to the next frontier of artificial intelligence: total perception of our dynamic reality.

How D4RT Works: A Query-Based Approach

D4RT operates as a unified encoder-decoder Transformer architecture.

The encoder first processes the input video into a compressed representation of the scene’s geometry and motion.

This makes D4RT extremely fast and scalable, whether it’s tracking just a few points or reconstructing …

1 week, 2 days ago @ deepmind.google
Veo 3.1 Ingredients to Video: More consistency, creativity and control

Today, Veo is getting more expressive, with improvements that help you create more fun, creative, high-quality videos based on ingredient images, built directly for the mobile format.

We’re excited to bring new creative possibilities for everyone from casual storytellers to professional filmmakers.

We’re releasing:

Improvements to Veo 3.1 Ingredients to Video, our capability that lets you create videos based on reference images. This update makes videos more expressive and creative, even with simple prompts.

Native vertical outputs for Ingredients to Video (portrait mode) to power mobile-first, short-form video creation.

State-of-the-art upscaling to 1080p and 4K resolution for high-fidelity p…

1 week, 5 days ago @ blog.google
Google's year in review: 8 areas with research breakthroughs in 2025

Google 2025 recap: Research breakthroughs of the year

1 month ago @ deepmind.google
Gemini 3 Flash: frontier intelligence built for speed

Today, we're expanding the Gemini 3 model family with the release of Gemini 3 Flash, which offers frontier intelligence built for speed at a fraction of the cost.

Last month, we kicked off Gemini 3 with Gemini 3 Pro and Gemini 3 Deep Think mode, and the response has been incredible.

With Gemini 3, we introduced frontier performance across complex reasoning, multimodal and vision understanding and agentic and vibe coding tasks.

Gemini 3 Flash retains this foundation, combining Gemini 3's Pro-grade reasoning with Flash-level latency, efficiency and cost.

Starting today, Gemini 3 Flash is rolling out to millions of people globally:

1 month, 1 week ago @ blog.google
Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior

Today, we are releasing Gemma Scope 2: a comprehensive, open suite of interpretability tools for all Gemma 3 model sizes, from 270M to 27B parameters.

Producing Gemma Scope 2 involved storing approximately 110 Petabytes of data, as well as training over 1 trillion total parameters.

What’s new in Gemma Scope 2

Interpretability research aims to understand the internal workings and learned algorithms of AI models.

Like its predecessor, Gemma Scope 2 acts as a microscope for the Gemma family of language models.

While the original Gemma Scope enabled research in key areas of safety, such as model hallucination, identifying secrets known by a model, and training safer models, Gemma Scope 2 support…

1 month, 1 week ago @ deepmind.google
Improved Gemini audio models for powerful voice experiences

Earlier this week, we introduced greater control over audio generation with an upgrade to our Gemini 2.5 Pro and Flash Text-to-Speech models.

Today, we’re releasing an updated Gemini 2.5 Flash Native Audio for live voice agents.

Gemini 2.5 Flash Native Audio is now available across Google products including Google AI Studio, Vertex AI, and has also started rolling out in Gemini Live and Search Live, bringing the naturalness of native audio to Search Live for the first time.

Beyond powering helpful agents, native audio unlocks new possibilities for global communication.

Live Voice Agents

1 month, 2 weeks ago @ blog.google
Deepening our partnership with the UK AI Security Institute

Today, we're announcing an expanded partnership with the UK AI Security Institute (AISI) through a new Memorandum of Understanding focused on foundational security and safety research, to help ensure artificial intelligence is developed safely and benefits everyone.

The research partnership with AISI is an important part of our broader collaboration with the UK government on accelerating safe and beneficial AI progress.

This is why we have partnered with the UK AISI since its inception in November 2023 to test our most capable models.

We are actively working with AISI to build more robust evaluations for AI models, and our teams have collaborated on safety research to move the field forward…

1 month, 2 weeks ago @ deepmind.google
Strengthening our partnership with the UK government to support prosperity and security in the AI era

The UK has already laid a strong foundation to seize this moment and is uniquely positioned to translate AI innovation into public benefit.

That’s why we are excited to deepen our collaboration with the UK government to accelerate this work and offer a blueprint for other countries.

Accelerating access to frontier AI in key sectors: Science & Education. Our partnership will center on providing access to frontier AI in two areas foundational to the UK’s long-term success: scientific discovery and education.

The UK has a rich history of applying new technologies to drive scientific progress, from Hooke’s microscope to Faraday’s electrical experiments.

Establishing Google DeepMind’s first automa…

1 month, 2 weeks ago @ deepmind.google
FACTS Benchmark Suite: Systematically evaluating the factuality of large language models

The FACTS Benchmark Suite: Today, we’re teaming up with Kaggle to introduce the FACTS Benchmark Suite.

A Search Benchmark that tests a model’s ability to use Search as a tool to retrieve information and synthesize it correctly.

Similar to our previous release, we are following standard industry practice and keeping an evaluation set held-out as a private set.

The FACTS Benchmark Suite Score (or FACTS Score) is calculated as the average accuracy of both public and private sets across the four benchmarks.
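The averaging described above can be sketched in a few lines of Python. This is only an illustration, assuming an unweighted mean: each benchmark's public and private accuracies are averaged first, then the four benchmark results are averaged. The benchmark keys and accuracy values below are hypothetical placeholders, not real FACTS results, and the official scoring may weight the sets differently.

```python
from statistics import mean


def facts_score(accuracies):
    """Return an unweighted FACTS-style score.

    `accuracies` maps a benchmark name to a dict holding the accuracy on
    its "public" and "private" sets (both in the range 0..1).
    Assumption: every set and every benchmark carries equal weight.
    """
    per_benchmark = [
        mean([sets["public"], sets["private"]])
        for sets in accuracies.values()
    ]
    return mean(per_benchmark)


# Hypothetical placeholder numbers for four benchmarks.
example = {
    "benchmark_a": {"public": 0.5, "private": 0.75},
    "benchmark_b": {"public": 1.0, "private": 0.5},
    "benchmark_c": {"public": 0.25, "private": 0.25},
    "benchmark_d": {"public": 0.75, "private": 0.25},
}
print(facts_score(example))
```

With equal weights, averaging per benchmark first gives the same result as averaging all eight accuracies at once.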

Kaggle will oversee the management of the FACTS Benchmark Suite.

1 month, 2 weeks ago @ deepmind.google
Engineering more resilient crops for a warming climate

Scientists are using AlphaFold in their research to strengthen an enzyme that’s vital to photosynthesis, paving the way for more heat-tolerant crops.

As global warming accompanies more droughts and heatwaves, harvests of some staple crops are shrinking.

But less visible is what is happening inside these plants, where high heat can break down the molecular machinery that keeps them alive.

Plants use photosynthesis to produce the glucose that fuels their growth via an intricate choreography of enzymes inside plant cells.

"Our job is to learn from those examples and build that same resilience into the crops we depend on."

1 month, 3 weeks ago @ deepmind.google
AlphaFold: Five years of impact

They used AlphaFold alongside comparative genomics to better understand how plants perceive changes in their environment, paving the way for more resilient crops.

AlphaFold has been cited in more than 35,000 papers and more than 200,000 papers incorporated elements of AlphaFold 2 in their methodology.

An independent analysis of AlphaFold 2’s impact, carried out by the Innovation Growth Lab, suggests that researchers using AlphaFold 2 see an increase of over 40% in their submission of novel experimental protein structures.

Those protein structures are more likely to be dissimilar to known structures, encouraging the exploration of uncharted areas of science.

The AlphaFold Server is empowerin…

2 months ago @ deepmind.google
Revealing a key protein behind heart disease

Both have a family history of heart disease – a reminder of what’s at stake in their work to better understand and ultimately help treat this deadly condition.

That protein, apoB100, has defied mapping not only because it’s enormous (for a protein), but also because it connects to fats and other molecules in complicated ways.

ApoB100 forms the molecular scaffold of “bad cholesterol”, which is known to scientists as low-density lipoprotein (LDL).

Discovering the structure of its key protein promised to shed light on how bad cholesterol becomes harmful inside the body, giving scientists a better chance to develop ways to prevent and treat ASCVD.

The images weren’t sharp enough to map the stru…

2 months ago @ deepmind.google
Google DeepMind supports U.S. Department of Energy on Genesis: a national mission to accelerate innovation and scientific discovery

We stand at an inflection point where the convergence of advanced AI and scientific research promises to unlock a new golden age of discovery.

There is perhaps no clearer expression of this than the application of AI within science.

Putting our advanced AI tools into the hands of American scientists: Google DeepMind will provide an accelerated access program for scientists at all 17 DOE National Laboratories to our frontier AI for Science models and agentic tools, starting today with AI co-scientist on Google Cloud.

We’re excited to see what America’s leading researchers will be able to do with our frontier AI models and agentic tools.

By combining human ingenuity with advanced AI capabilitie…

2 months ago @ deepmind.google
How we’re bringing AI image verification to the Gemini app

At Google, we’ve long invested in ways to provide you with helpful context about information you see online.

Now, as generative media becomes increasingly prevalent and high-fidelity, we are deploying tools to help you more easily determine whether the content you're interacting with was created or edited using AI.

Starting today, we’re making it easier for everyone to verify if an image was generated with or edited by Google AI right in the Gemini app, using SynthID, our digital watermarking technology that embeds imperceptible signals into AI-generated content.

Since then, over 20 billion AI-generated pieces of content have been watermarked using SynthID, and we have been testing our Synt…

2 months ago @ blog.google
Build with Nano Banana Pro, our Gemini 3 Pro Image model

Today, we’re releasing Nano Banana Pro (Gemini 3 Pro Image), a higher-fidelity model built on Gemini 3 Pro for developers to access studio-quality image generation.

This follows our release of Nano Banana (Gemini 2.5 Flash Image) just a few months ago.

Since then, we’ve loved seeing the community put its key features to work — from character consistency to photo restoration, and even using its capabilities to make local edits in an infinite canvas.

This state-of-the-art image generation and editing model is starting to roll out in paid preview to build a new wave of intelligent, multimodal applications with the Gemini API in Google AI Studio and Vertex AI for enterprises.

This model unlocks…

2 months ago @ blog.google
Google
last post 2 days, 2 hours ago
Monitoring Google ADK agentic applications with Datadog LLM Observability

Google’s Agent Development Kit (ADK) gives you the building blocks to create powerful agentic systems.

To help you manage this complexity, Datadog LLM Observability now provides automatic instrumentation for systems built with ADK.

Safety and accuracy: LLM responses may contain hallucinations, sensitive data, or prompt injection attempts, risking security incidents and reduced customer trust.

Get started with Datadog LLM Observability: Datadog LLM Observability simplifies monitoring and debugging for Google ADK systems, helping users debug agent operations, evaluate responses, iterate quickly, and validate changes before deploying them to production.

For more information on how to debug agent…

2 days, 2 hours ago @ cloud.google.com
How Fastweb + Vodafone reimagined data workflows with Spanner & BigQuery

Editor’s note: Fastweb + Vodafone, a leading telecommunications provider in Italy, created its Customer 360 platform to support faster and more personalized customer experiences across channels.

Fastweb + Vodafone now delivers real-time insights across the organization and is preparing for expanded AI agent use cases in the future.

Our team’s job is to give everyone at our business access to the customer insights they need, when they need them.

Both companies had already begun modernizing these customer data workflows on Google Cloud, especially with BigQuery.

Choosing a connected platform, not a single tool: When we stepped back and looked at what Fastweb + Vodafone needed, it was clear we w…

3 days, 2 hours ago @ cloud.google.com
Scaling WideEP Mixture-of-Experts inference with Google Cloud A4X (GB200) and NVIDIA Dynamo

These recipes are part of a broader collaboration between our organizations that includes investments in important inference infrastructure like Dynamic Resource Allocation (DRA) and Inference Gateway.

Serving architecture: NVIDIA Dynamo functions as the distributed runtime, managing KV cache state and kernel scheduling across the rack-scale fabric.

We combine the A4X at the infrastructure layer with NVIDIA Dynamo at the model-serving layer, orchestrated by GKE.

Serving layer: NVIDIA Dynamo. Hardware of this scale requires a runtime capable of managing distributed state without introducing synchronization overhead.

With full support for the NVIDIA Dynamo suite — from intelligent routing and s…

3 days, 2 hours ago @ cloud.google.com
Cloud CISO Perspectives: Practical guidance on building with SAIF

The SAIF Data Controls address risks in data sourcing, management, and use for model training and user interaction, ensuring privacy, integrity, and authorized use throughout the AI lifecycle.

Key data security tools on Google Cloud include: Identity and Access Management, access controls for Cloud Storage and BigQuery, Dataplex for data governance, and Vertex AI managed datasets.

The SAIF Model Controls are designed to build resilience into the model and sanitize its inputs and outputs to protect against these emerging threats.

SAIF recommends application controls to secure the interface between the end user and the AI model.

Governance controls, assurance controls (including red teaming a…

1 week, 2 days ago @ cloud.google.com
Introducing BigQuery managed and SQL-native inference for open models

A SQL-native workflow with automated management: With the launch of managed third-party generative AI inference in BigQuery (Preview), you can now run open models using just two SQL statements.

This new capability delivers four key benefits: Simplified deployment: Deploy open models using a single CREATE MODEL SQL statement with the model id string (e.g., google/gemma-3-1b-it).

Unified SQL interface: The entire workflow, from model creation and inference to cost management and cleanup, is managed directly in BigQuery using SQL.

How it works: A practical example. Let’s take a look at the process of creating and utilizing an open model.

Step 1: Create a BigQuery managed open model. To use an op…

1 week, 3 days ago @ cloud.google.com
Vibe querying: Write SQL queries faster with Comments to SQL in BigQuery

Recently, we have seen how "vibe coding" (using natural language AI prompts to generate code) makes developing easier for everyone.

This feature makes "vibe querying", writing queries using natural language, a reality.

It helps you embed natural language expressions directly within your SQL statements, which the system then translates into executable SQL code.

Key functionality: Embed natural language: You can integrate natural language expressions into your SQL queries by enclosing them within comments.

Refine as you go: After the initial SQL is generated, you can refine your natural language expressions and immediately see how the SQL output changes.

1 week, 4 days ago @ cloud.google.com
Palo Alto Networks automates customer intelligence document creation with agentic design

Vertex AI Agent Engine: The fully managed, serverless environment that hosts and executes the ADK-based agent, handling scaling, security, and operational overhead.

Cloud Storage: Served as storage for the unstructured customer documents needed to answer the DOR questions.

The solution involved shifting the responsibility for state management to a FastAPI server acting as an orchestrator.

This separation leverages Vertex AI Agent Engine's fully managed and scalable environment for the agent, while providing the flexibility of a custom backend orchestrator.

Vertex AI Agent Engine's horizontal scaling capabilities further supported this parallel execution.

1 week, 4 days ago @ cloud.google.com
A gRPC transport for the Model Context Protocol

An interesting alternative to MCP transcoding is to use gRPC as the native transport for MCP.

Specifically, we’re committed to supporting gRPC practitioners in their journey to adopt MCP in production, and we’re actively working with the MCP community to explore mechanisms to support gRPC as a transport for MCP.

A community-backed transport package will enable gRPC practitioners to deploy MCP with gRPC in a consistent and interoperable manner.

The native use of gRPC as a transport avoids the need for transcoding and helps maintain operational consistency for environments that are actively using gRPC.

In the rest of this post, we explore the benefits of using gRPC as a native transport for M…

1 week, 5 days ago @ cloud.google.com
Build data analytics agents faster with BigQuery’s fully managed, remote MCP server

With the release of fully managed, remote Model Context Protocol (MCP) servers for Google services last month, you can now use the BigQuery MCP server to give your AI agents a direct, secure way to analyze data.

This fully managed MCP server removes management overhead, enabling you to focus on developing intelligent agents.

In this blog post, we discuss and demonstrate integrations of the newly released, fully managed, remote BigQuery MCP server, which is in preview as of January 2026.

Remote MCP servers run on the service's infrastructure and offer an HTTP endpoint to AI applications.

This enables communication between the AI MCP client and the MCP server using a defined standard.

2 weeks, 4 days ago @ cloud.google.com
Cloud CISO Perspectives: 2025 in review: Cloud security basics and evolving AI

Building the most trusted cloud: We continued to enhance our security capabilities and controls on our cloud platform to help organizations secure their cloud environments and address evolving policy, compliance, and business objectives.

Our forecast for 2026: As security professionals, we know that threat actors will continue to innovate to achieve their mission objectives.

To help defenders proactively prepare for the coming year, we publish our annual forecast report with insights from across Google.

We look forward to sharing more insights to help organizations strengthen their security posture in the new year.

For more leadership guidance from Google Cloud experts, please visit our CISO In…

1 month, 1 week ago @ cloud.google.com
Getting AI to write good SQL: Optimizing the AlloyDB AI natural language API for your use case

Descriptive and prescriptive context: As mentioned above, the AlloyDB AI natural language API relies on descriptive and prescriptive context to improve the accuracy of the SQL code it generates.

The AlloyDB AI natural language API facilitates the creation of descriptive and prescriptive context.

The value index clarifies what kind of entity “John Smith” is, and can be automatically created by AlloyDB AI for your application.

Natural language search over structured, unstructured and multimodal data: When it comes to applications that provide search over structured data, the AlloyDB AI natural language API enables a clean and powerful search experience.

Bringing the AlloyDB AI natural language AP…

1 month, 1 week ago @ cloud.google.com
Announcing advanced governance capabilities for Vertex AI Agent Builder

At Google Cloud, we continue to make critical investments to Vertex AI Agent Builder, our comprehensive and open platform, enabling you to build faster, scale efficiently, and govern with enterprise-grade security.

Today, with the integration of the Cloud API Registry, we’re excited to bring enhanced tool governance capabilities to Vertex AI Agent Builder.

With this latest update, administrators can now manage available tools for developers across your organization directly in Vertex AI Agent Builder Console, and developers can leverage tools managed by the registry with a new ApiRegistry.

Govern your tools with confidence: Building a useful agent requires the agent to have access to the nec…

1 month, 1 week ago @ cloud.google.com
Automate AI and HPC clusters with Cluster Director, now generally available

Today, we are delivering on those requirements with the General Availability (GA) of Cluster Director and the Preview of Cluster Director support for Slurm on Google Kubernetes Engine (GKE).

Cluster Director (GA) is a managed infrastructure service designed to meet the rigorous demands of modern supercomputing.

There's no extra charge to use Cluster Director.

How Cluster Director supports each phase of deployment. Day 0: Preparation. Standing up a cluster typically involves weeks of planning, wrangling Terraform, and debugging the network.

Cluster Director changes the ‘Day 0’ experience entirely, with tools for designing infrastructure topology that’s optimized for your workload requirements.

1 month, 1 week ago @ cloud.google.com
Google named a Leader in The Forrester Wave™: AI Infrastructure Solutions, Q4 2025

Yesterday, Forrester released The Forrester Wave™: AI Infrastructure Solutions, Q4 2025 report, evaluating 13 vendors, and we believe their findings validate our commitment to solving these core challenges.

Access the full report: The Forrester Wave™: AI Infrastructure Solutions, Q4 2025. Accelerating time-to-value with an integrated system: Enterprises don’t run AI in a vacuum.

Delivering continuous AI innovation: We are honored to be recognized as a Leader in The Forrester Wave™ report, which we believe validates decades of R&D and our approach to building ultra-scale AI infrastructure.

Access the full report: The Forrester Wave™: AI Infrastructure Solutions, Q4 2025

1. IDC Business Value Snapsh…

1 month, 1 week ago @ cloud.google.com
Introducing Gemini 3 Flash: Intelligence and speed for enterprises

Today, we’re expanding the Gemini 3 model family with Gemini 3 Flash, which offers frontier intelligence built for speed at a fraction of the cost.

Gemini 3 Flash builds on the model series that developers and enterprises already love, optimized for high-frequency workflows that demand speed, without sacrificing quality.

Gemini 3 Flash is built to be highly efficient, pushing the boundaries of quality at better price performance and faster speed.

It is available now in Gemini Enterprise, Vertex AI, and Gemini CLI, so businesses and developers can access: Advanced multimodal processing: Gemini 3 Flash enables enterprises to build applications capable of complex video analysis, data extraction…

1 month, 1 week ago @ cloud.google.com
OpenAI
last post None
Microsoft
last post 5 days, 2 hours ago
Multimodal reinforcement learning with agentic verifier for AI agents

It’s an agentic verification framework designed to improve the reliability of reinforcement learning in multimodal models.

Reinforcement learning is a training method where AI models learn by receiving rewards for desired behaviors and penalties for undesired ones, gradually improving their performance through trial and error.

This design prevents unreliable feedback from dominating training and produces a stable reward signal for reinforcement learning.

Models trained with Argos also showed a substantial reduction of hallucinations compared with both standard chain-of-thought prompting and reinforcement learning baselines.

How Argos shapes reinforcement learning: To understand how Argos affe…

5 days, 2 hours ago @ microsoft.com
OptiMind: A small language model with optimization expertise

OptiMind is a small language model designed to convert business problems described in natural language into the mathematical formulations needed by optimization software.

Enterprises across industries, from energy to finance, use optimization models to plan complex operations like supply chains and logistics.

To address this challenge, we’re introducing OptiMind, a small language model designed to convert problems described in plain language into the mathematical formulations that optimization software needs.

Built on a 20-billion parameter model, OptiMind is compact by today’s standards yet matches the performance of larger, more complex systems.

From problem description to solution: One of …

1 week, 3 days ago @ microsoft.com
Agent Lightning: Adding reinforcement learning to AI agents without code rewrites

To address this, a research team from Microsoft Research Asia – Shanghai has introduced Agent Lightning.

Whether it involves multiple collaborating agents or dynamic tool use, Agent Lightning breaks it down into a sequence of transitions.

Agent Lightning as middleware: Agent Lightning serves as middleware between RL algorithms and agent environments, providing modular components that enable scalable RL through standardized protocols and well-defined interfaces.

In practice, developers can keep their existing agent frameworks and switch model calls to the Agent Lightning API without changing their agent code (Figure 5).

By bridging existing agentic systems with reinforcement learning, Age…

1 month, 2 weeks ago @ microsoft.com
Promptions helps make AI prompting more precise with dynamic UI controls

To address this, we are excited to introduce Promptions (prompt + options), a UI framework that helps developers build AI interfaces with more precise user control.

We compared the static design from the first study, called the “Static Prompt Refinement Control” (Static PRC), against a “Dynamic Prompt Refinement Control” (Dynamic PRC) with features that responded to participants’ feedback.

Comparison of user preferences for Static PRC versus Dynamic PRC across key evaluation criteria.

(1) The Option Module reads the user’s prompt and conversation history and (2) generates prompt options.

Key usability challenges include clarifying how dynamic options affect AI output and managing the comple…

1 month, 2 weeks ago @ microsoft.com
GigaTIME: Scaling tumor microenvironment modeling using virtual population generated by multimodal AI

C, Scatter plot comparing the subtype-level GigaTIME-translated virtual mIF activations between TCGA and Providence virtual populations.

To our knowledge, this is the first large-scale study exploring multimodal AI for scaling virtual mIF generation.

H, A case study showcasing the activation maps across different virtual mIF channels for an H&E slide in our virtual population, and virtual mIF of sample patches from this slide.

By applying GigaTIME to Providence real-world data, we generated a virtual population of 14,256 patients with virtual mIF and key clinical attributes.

G, Bar plot comparing pan-cancer patient stratification performance in terms of survival log rank p-values among virtu…

1 month, 2 weeks ago @ microsoft.com
Ideas: Community building, machine learning, and the future of AI

This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS.

In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

So around that time when I started my PhD at Penn, I was working in machine learning theory and algorithmic economics.

How had you experienced a lack of community or network of women in machine learning before the founding of WiML?

So particularly when working on topics related to fairness, I’ve ended up focusing a bunch on stuff to do with marginalized groups as part of my responsible AI work.

1 month, 3 weeks ago @ microsoft.com
Reducing Privacy leaks in AI: Two approaches to contextual integrity

The theory of contextual integrity frames privacy as the appropriateness of information flow within specific social contexts.

Each tackles contextual integrity from a different angle, but both aim to build directly into AI systems a greater sensitivity to information-sharing norms.

Contextual Integrity in LLMs via Reasoning and Reinforcement Learning, accepted at NeurIPS 2025, takes a different approach to applying contextual integrity.

Contextual integrity through reasoning and reinforcement learning: In our second paper, we explore whether contextual integrity can be built into the model itself rather than enforced through external checks at inference time.

To address this trade-off, we int…

2 months ago @ microsoft.com
Fara-7B: An Efficient Agentic Model for Computer Use

Today, we are pleased to announce Fara-7B, our first agentic SLM designed specifically for computer use.

Unlike traditional chat models that generate text-based responses, Computer Use Agent (CUA) models like Fara-7B leverage computer interfaces, such as a mouse and keyboard, to complete tasks on behalf of users.

This results in reduced latency and improved privacy, as user data remains local.

Fara-7B breaks ground on a new Pareto frontier, showing that on-device computer use agents are approaching the capabilities of frontier models.

For guidance on how to use our model safely, and the security considerations to be mindful of when using our model, please refer to our Model card (opens in n…

2 months ago @ microsoft.com
MMCTAgent: Enabling multimodal reasoning over large video and image collections

Real-world reasoning increasingly involves analyzing long-form video content, where context spans minutes or hours, far beyond the context limits of most models.

The Planner agent decomposes a user query, identifies the appropriate reasoning tools, performs multimodal operations, and drafts a preliminary answer.

MMCTAgent’s Planner–Critic architecture enables multimodal reasoning over long-form video through structured ingestion, retrieval, and iterative feedback.

The VideoAgent extends this architecture to long-form video reasoning.

Takeaways and next steps: MMCTAgent demonstrates a scalable agentic approach to multimodal reasoning with a Planner–Critic architecture.

2 months, 2 weeks ago @ microsoft.com
BlueCodeAgent: A blue teaming agent enabled by automated red teaming for CodeGen AI

Many studies have explored red teaming code LLMs, testing whether the models can reject unsafe requests and whether their generated code exhibits insecure patterns.

Knowledge-enhanced blue teaming: Building on the foundation of red-teaming knowledge, BlueCodeAgent significantly improves blue-teaming performance by leveraging constitutions derived from knowledge and dynamic testing.

Generalization to seen and unseen risks: Empowered by comprehensive red-teaming knowledge, BlueCodeAgent generalizes effectively to unseen risks.

A blue teaming agent enabled by red teaming. Figure 2: Overview of BlueCodeAgent, an end-to-end blue teaming framework powered by automated red teaming for code security.…

2 months, 2 weeks ago @ microsoft.com
When industry knowledge meets PIKE-RAG: The innovation behind Signify’s customer service boost

These differentiated advantages stem from PIKE-RAG’s unique approach to understanding and processing professional knowledge.

“It’s also worth noting that the researchers at Microsoft Research Asia demonstrated strong industry knowledge and rigorous scientific methodology.

Through this collaboration, we validated that PIKE-RAG’s general approach can greatly improve the accuracy of professional knowledge Q&A and accelerate scenario customization.

Our researchers also gained valuable experience in handling domain-specific data,” explained Jiang Bian, partner rese…

2 months, 2 weeks ago @ microsoft.com
Magentic Marketplace: an open-source simulation environment for studying agentic markets

To help navigate this uncertainty, we built Magentic Marketplace, an open-source simulation environment for exploring the numerous possibilities of agentic markets and their societal implications at scale.

To explore these dynamics in depth, the Magentic Marketplace platform enables controlled experimentation across diverse agentic marketplace scenarios.

With Magentic Marketplace, researchers can model how agents representing customers and businesses interact—shedding light on the dynamics that could shape future digital markets.

Magentic Marketplace includes two agent types: Assistant Agents (customers) and Service Agents (businesses).

Unlike traditional markets, which d…

2 months, 3 weeks ago @ microsoft.com
RedCodeAgent: Automatic red-teaming agent against diverse code agents

In the context of code, effective red-teaming requires more than simply checking whether the target code agent rejects unsafe requests.

After the second request was rejected by the code agent, RedCodeAgent invoked both Code Substitution and GCG to optimize the prompt.

Ultimately, RedCodeAgent successfully combined the suggestion from Code Substitution (i.e., using pathlib) with the adversarial suffix generated by GCG, making the target code agent delete the specified file.

In the context of code, it is not enough for the target code agent to simply avoid rejecting the request; the target code agent must also generate and execute code that performs the intended function.

Quantitatively, we f…

2 months, 3 weeks ago @ microsoft.com
Tell me when: Building agents that can wait, monitor, and act

This matters because monitoring tasks are everywhere.

To address this, we are introducing SentinelStep, a mechanism that enables agents to complete long-running monitoring tasks.

Most real-world monitoring tasks share this limitation, making systematic benchmarking very challenging.

In response, we are developing SentinelBench, a suite of synthetic web environments for evaluating monitoring tasks.

By embedding patience into plans, agents can responsibly monitor conditions and act when it matters—staying proactive without wasting resources.

3 months ago @ microsoft.com
Ideas: More AI-resilient biosecurity with the Paraphrase Project

Today, I’m excited to talk about the Paraphrase Project, an effort I co-led exploring how advances in AI tools for protein design might impact biosecurity.

These “patches,” akin to those in cybersecurity, have now been shared with organizations globally to strengthen biosecurity screening.

The project highlights that the same AI tools capable of incredible good can also be misused, requiring us to be vigilant, thoughtful, and creative so we continue to get the most benefit out of AI tools while working to ensure that we avoid costly misuses.

So things like, how similar is this to that template, wild-type protein structure that we used as our conditioning information?

But I feel like broadly…

3 months, 3 weeks ago @ microsoft.com
MIT AI
last post 4 days, 22 hours ago
Why it’s critical to move beyond overly aggregated machine-learning metrics

In many instances — including areas examined by the researchers such as chest X-rays, cancer histopathology images, and hate speech detection — such spurious correlations are much harder to detect.

In the case of a medical diagnosis model trained on chest X-rays, for example, the model may have learned to correlate a specific and irrelevant marking on one hospital’s X-rays with a certain pathology.

Basically, he trained thousands of models using in-distribution data, meaning the data were from the first setting, and calculated their accuracy.

Salaudeen also emphasizes the dangers of aggregate statistics for evaluation, which can obscure more granular and consequential information about mode…

4 days, 22 hours ago @ news.mit.edu
At MIT, a continued commitment to understanding intelligence

The MIT Siegel Family Quest for Intelligence (SQI), a research unit in the MIT Schwarzman College of Computing, brings together researchers from across MIT who combine their diverse expertise to understand intelligence through tightly coupled scientific inquiry and rigorous engineering.

The question of human intelligence has two parts: how it works, and where it comes from.

Exploring the great mysteries of the mind: The MIT Siegel Family Quest for Intelligence was recently renamed in recognition of a major gift from the Siegel Family Endowment that is enabling further growth in SQI’s research and activities.

By supporting us, David Siegel, the Siegel Family Endowment, and our other donors are…

1 week, 3 days ago @ news.mit.edu
Generative AI tool helps 3D print personal items that sustain daily use

A generative AI model then modifies the 3D geometry, while MechStyle simulates how those changes will impact particular parts, ensuring vulnerable areas remain structurally sound.

While AI can help generate personalized 3D models that you can fabricate, those systems don’t often consider the physical properties of the 3D model.

According to MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers, a key issue is the mechanical integrity of the 3D model.

According to CSAIL researchers, 3D stylization used to come with unintended consequences.

What’s more, the team hopes to use generative AI to create 3D models for users, instead of stylizing presets and user-uploaded d…

1 week, 3 days ago @ news.mit.edu
3 Questions: How AI could optimize the power grid

Q: Why does the power grid need to be optimized in the first place?

Q: How can AI be most useful in power grid optimization?

This could lead to a cleaner power grid by allowing us to handle and better utilize these resources.

I’m excited to develop AI algorithms that respect the physical constraints of the power grid so that we can credibly deploy them.

But if you make the same magnitude of a mistake when you are optimizing a power grid, that can cause a large-scale blackout.

2 weeks, 2 days ago @ news.mit.edu
Decoding the Arctic to predict winter weather

Cohen, a research scientist in MIT’s Department of Civil and Environmental Engineering (CEE), has spent decades studying how conditions in the Arctic set the course for winter weather throughout Europe, Asia, and North America.

AI subseasonal forecasting: While AI weather models have made impressive strides in short-range (one- to 10-day) forecasts, these advances have not yet extended to longer periods.

That gap is why this year could be a turning point for subseasonal weather forecasting.

If validated, Cohen explains, it would show how combining Arctic indicators with AI could extend the lead time for predicting impactful weather.

In November, Cohen even appeared as a clue in The W…

2 weeks, 2 days ago @ news.mit.edu
Stone Center on Inequality and Shaping the Future of Work Launches at MIT

The James M. and Cathleen D. Stone Center on Inequality and Shaping the Future of Work officially launched on Nov. 3, 2025, bringing together scholars, policymakers, and practitioners to explore critical questions about economic opportunity, technology, and democracy.

Co-directed by MIT professors Daron Acemoglu, David Autor, and Simon Johnson, the new Stone Center analyzes the forces that contribute to growing income and wealth inequality through the erosion of job quality and labor market opportunities for workers without a college degree.

MIT Provost Anantha Chandrakasan opened the launch event by emphasizing the urgency and importance of the center's mission.

To mitigate wealth inequali…

2 weeks, 3 days ago @ news.mit.edu
MIT scientists investigate memorization risk in the age of clinical AI

What is patient privacy for?

Foundation models trained on EHRs should normally generalize knowledge to make better predictions, drawing upon many patient records.

But in “memorization,” the model draws upon a singular patient record to deliver its output, potentially violating patient privacy.

Notably, foundation models are already known to be prone to data leakage.

“There’s a reason our health data is private,” Tonekaboni says.

2 weeks, 5 days ago @ news.mit.edu
Using design to interpret the past and envision the future

He is a collaborator at the Design Intelligence Lab and has served as a teaching assistant in MIT’s architecture wood shop, helping students to bring together digital design techniques with hands-on fabrication.

Following the class, Payne continued working on models and drawings reconstructing some important Tuskegee architecture.

Incorporating AI to design for the future: While much of Payne’s research is rooted in the past, he is also interested in artificial intelligence and its implications for future innovations.

In addition, Payne took a class about large language objects taught by associate professor of the practice Marcelo Coelho, director of the Design Intelligence Lab.

Payne now con…

2 weeks, 5 days ago @ news.mit.edu
MIT in the media: 2025 in review

“At MIT, innovation ranges from awe-inspiring technology to down-to-Earth creativity,” noted Chronicle, during a campus visit this year for an episode of the program.

In 2025, MIT researchers made headlines across print publications, podcasts, and video platforms for key scientific advances, from breakthroughs in quantum and artificial intelligence to new efforts aimed at improving pediatric health care and cancer diagnosis.

Neha Narula, director of the MIT Digital Currency Initiative, examines the future of cash as the use of digital currencies expands.

Full story via Michigan Farm News. Bug-sized robots could help pollination on future farms: Insect-sized robots crafted by MIT researchers cou…

1 month ago @ news.mit.edu
Guided learning lets “untrainable” neural networks realize their potential

Remarkably, even untrained networks contain architectural biases that can be transferred, while trained guides additionally convey learned patterns.

This result underscores a key insight: Untrained networks already encode valuable architectural biases that can steer other networks toward effective learning.

By aligning with a guide network, it’s possible to separate the contributions of architectural biases from those of learned knowledge.

By revealing the hidden potential of even the most stubborn networks, guidance provides a powerful new tool for understanding — and hopefully shaping — the foundations of machine learning.

Remarkably, the authors show this can be done using small, untrain…

1 month, 1 week ago @ news.mit.edu
A new way to increase the capabilities of large language models

Kim’s co-authors include lead author Songlin Yang, an EECS graduate student and former MIT-IBM Watson AI Lab Summer Program intern; Kaiyue Wen of Stanford University; Liliang Ren of Microsoft; and Yikang Shen, Shawn Tan, Mayank Mishra, and Rameswar Panda of IBM Research and the MIT-IBM Watson AI Lab.

The cumulative effect lets the system model how the meaning changes along the path between words, not just how far apart they are.

PaTH Attention improved perplexity and outcompeted other methods on reasoning benchmarks it wasn’t trained on.

PaTH Attention consistently proved capable of content-awareness.

In this way, PaTH Attention extends the expressive power of transformer architectures.

1 month, 1 week ago @ news.mit.edu
A “scientific sandbox” lets researchers explore the evolution of vision systems

This framework could enable scientists to probe “what-if” questions about vision systems that are difficult to study experimentally.

Building a scientific sandbox: The paper began as a conversation among the researchers about discovering new vision systems that could be useful in different fields, like robotics.

Over many generations, agents evolve different elements of vision systems that maximize rewards.

Testing hypotheses: When the researchers set up experiments in this framework, they found that tasks had a major influence on the vision systems the agents evolved.

In the future, the researchers want to use this simulator to explore the best vision systems for specific applications, which c…

1 month, 1 week ago @ news.mit.edu
“Robot, make me a chair”

The researchers tackled these challenges using a vision-language model (VLM), a powerful generative AI model that has been pre-trained to understand images and text.

“By serving as both the eyes and brain of the robot, the VLM enables the robot to do this,” Kyaw says.

A user prompts the system with text, perhaps by typing “make me a chair,” and gives it an AI-generated image of a chair to start.

For instance, the model can determine that the seat and backrest should have panels to have surfaces for someone sitting and leaning on the chair.

Then the VLM chooses the labels that correspond to the geometric parts of the chair that should receive panels on the 3D mesh to complete the design.

1 month, 1 week ago @ news.mit.edu
3 Questions: Using computation to study the world’s best single-celled chemists

Q: What drew you to research microbes in extreme environments, and what are the challenges in studying them?

I wanted to be an astronaut growing up, and the closest thing to astrobiology is examining extreme environments on Earth.

And the only thing that lives in those extreme environments are microbes.

My latest work is genomic language modeling.

A genomic language model is technically a large language model, except the language is DNA as opposed to human language.

1 month, 1 week ago @ news.mit.edu
Working to eliminate barriers to adopting nuclear energy

What if there were a way to solve one of the most significant obstacles to the use of nuclear energy — the disposal of high-level nuclear waste (HLW)?

Such a move would be especially important for the public’s acceptance of nuclear energy.

“We’re reframing the problem of nuclear waste, transforming it from a liability to an energy source,” Sarsenbayev says.

The nuances of nuclear: Sarsenbayev had to do a bit of reframing himself in how he perceived nuclear energy.

Removing the bottleneck for nuclear energy adoption by producing carbon-free power and ensuring the safe disposal of radioactive waste.

1 month, 1 week ago @ news.mit.edu
Berkeley AI
last post 2 months, 3 weeks ago
RL without TD learning

In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer.

We can do Reinforcement Learning (RL) based on divide and conquer, instead of temporal difference (TD) learning.

There are two classes of algorithms in RL: on-policy RL and off-policy RL.

We compared TRL with $n$-step TD learning with different values of $n$, from $1$ (pure TD) to $\infty$ (pure MC).
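
As a rough illustration of the $n$-step targets mentioned above (a minimal sketch, not code from the post; the function name `n_step_return`, the list-based trajectory layout, and the discount `gamma` are assumptions made here for illustration), the target sums $n$ observed rewards and then bootstraps from a value estimate, so $n = 1$ gives the one-step TD target and $n = \infty$ gives the Monte Carlo return:

```python
import math

def n_step_return(rewards, values, t, n, gamma=0.99):
    """Return the n-step TD target starting at time step t.

    Assumed layout (hypothetical, for illustration only):
      rewards[k] is the reward received after acting in state k,
      values[k]  is the current value estimate of state k,
      values has one more entry than rewards (the final state).
    """
    horizon = len(rewards)
    # Stop after n steps or at the end of the episode, whichever is first.
    end = min(t + n, horizon)
    target = 0.0
    for k in range(t, end):
        target += gamma ** (k - t) * rewards[k]
    # Bootstrap from the value estimate of the state reached at `end`.
    target += gamma ** (end - t) * values[end]
    return target

rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]  # terminal state has value 0
one_step = n_step_return(rewards, values, t=0, n=1, gamma=0.5)
monte_carlo = n_step_return(rewards, values, t=0, n=math.inf, gamma=0.5)
```

Larger $n$ makes the target depend more on observed rewards and less on the bootstrapped value estimate.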

I still think one of the most important problems in RL (and even in machine learning) is to find a scalable off-policy RL algorithm.

2 months, 3 weeks ago @ bair.berkeley.edu
What exactly does word2vec learn?

What exactly does word2vec learn, and how?

In this framing, it’s clear that word2vec is a minimal neural language model.

As a result, the theory predicts exactly what features are learned in terms of the corpus statistics and the algorithmic hyperparameters.

We find that over the course of learning, word2vec builds these linear representations in a sequence of noisy learning steps, and their geometry is well-described by a spiked random matrix model.

4 months, 3 weeks ago @ bair.berkeley.edu
Whole-Body Conditioned Egocentric Video Prediction

Predicting Ego-centric Video from human Actions (PEVA).

We trained a model to Predict Ego-centric Video from human Actions (PEVA) for Whole-Body-Conditioned Egocentric Video Prediction.

We train an autoregressive conditional diffusion transformer on Nymeria, a large-scale dataset pairing real-world egocentric video with body pose capture.

We include some samples here: Body Movement Actions (Move Forward, Rotate Left, Rotate Right); Left Hand Actions (Move Left Hand Up, Move Left Hand Down, Move Left Hand Left, Move Left Hand Right); Right Hand Actions (Move Right Hand Up, Move Right Hand Down, Move Right Hand Left, Move Right Hand Right). Long Rollout: Here you…

6 months, 4 weeks ago @ bair.berkeley.edu
Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications.

To mitigate the imminent prompt injection threat, we propose two fine-tuning defenses, StruQ and SecAlign.

Prompt Injection Attack: Causes. Below is the threat model of prompt injection attacks.

Prompt injection threat model in LLM-integrated applications. We propose that prompt injection has two causes.

Below are resources to learn more and keep updated on prompt injection attacks and defenses.

9 months, 2 weeks ago @ bair.berkeley.edu
Repurposing Protein Folding Models for Generation with Latent Diffusion

PLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure, by learning the latent space of protein folding models.

In PLAID, we develop a method that learns to sample from the latent space of protein folding models to generate new proteins.

Unlike many previous protein structure generative models, PLAID addresses the multimodal co-generation problem setting: simultaneously generating both discrete sequence and continuous all-atom structural coordinates.

In this way, we can use structural understanding information in the weights of pretrained protein folding models for the p…

9 months, 3 weeks ago @ bair.berkeley.edu
AWS Machine Learning
last post 2 days, 1 hour ago
Build AI agents with Amazon Bedrock AgentCore using AWS CloudFormation

To streamline resource deployment and management, Amazon Bedrock AgentCore services are now supported by IaC frameworks such as AWS Cloud Development Kit (AWS CDK), Terraform, and AWS CloudFormation templates.

In this post, we use CloudFormation templates to build an end-to-end application for a weather activity planner.

Best practices for deployments. We recommend the following practices for your deployments: Modular component architecture – Design AWS CloudFormation templates with separate sections for each AWS service.

Conclusion: In this post, we introduced an automated solution for…

2 days, 1 hour ago @ aws.amazon.com
How the Amazon.com Catalog Team built self-learning generative AI at scale with Amazon Bedrock

In this post, we demonstrate how the Amazon Catalog Team built a self-learning system that continuously improves accuracy while reducing costs at scale using Amazon Bedrock.

At Amazon Catalog, we faced this challenge head-on.

Product data flows through generator-evaluator workers (Amazon EC2 and Amazon Bedrock Runtime), with agreements stored directly and disagreements routed to a supervisor agent (Bedrock AgentCore).

Human review (Amazon Simple Queue Service (Amazon SQS)) and observability (Amazon CloudWatch) complete the architecture.

About the authors: Tarik Arici is a Principal Scientist at Amazon Selection and Catalog Systems (ASCS), where he pioneers self-learning generative AI systems …

2 days, 1 hour ago @ aws.amazon.com
How PDI built an enterprise-grade RAG system for AI applications with AWS

To address this, PDI Technologies built PDI Intelligence Query (PDIQ), an AI assistant that gives employees access to company knowledge through an easy-to-use chat interface.

This solution is powered by a custom Retrieval Augmented Generation (RAG) system, built on Amazon Web Services (AWS) using serverless technologies.

Azure DevOps crawler – PDI uses Azure DevOps to manage its code base, track commits, and maintain project documentation in a centralized repository.

This approach ensures that image content becomes part of the searchable text, enabling richer and more accurate context retrieval during search and analysis.

Generate document summary – Use Amazon Nova Lite to create a summary …

3 days, 2 hours ago @ aws.amazon.com
How CLICKFORCE accelerates data-driven advertising with Amazon Bedrock Agents

CLICKFORCE is one of the leaders in digital advertising services in Taiwan, specializing in data-driven advertising and conversion (D4A – Data for Advertising & Action).

To remain competitive, CLICKFORCE turned to AWS to build Lumos, a next-generation AI-driven marketing analysis solution powered by Amazon Bedrock, Amazon SageMaker AI, Amazon OpenSearch, and AWS Glue.

Digital advertising challenges: Before adopting Amazon Bedrock, CLICKFORCE faced several roadblocks in building actionable intelligence for digital advertising.

The solution is designed around Amazon Bedrock Agents for contextualized reasoning and Amazon SageMaker AI for fine-tuning Text-to-SQL accuracy.

By combining Amazon Bedrock …

3 days, 2 hours ago @ aws.amazon.com
How Thomson Reuters built an Agentic Platform Engineering Hub with Amazon Bedrock AgentCore

The Platform Engineering team spent a lot of time doing work that is undifferentiated heavy lifting.

The following diagram illustrates the architecture of the solution: TR built an AI-powered platform engineering hub using AgentCore.

Service Agent Development Framework: The Platform Engineering team developed their own framework, TR-AgentCore-Kit (TRACK), to simplify agent deployment across the organization.

Using Amazon Bedrock AgentCore, TR transformed their platform engineering operations from manual processes to an AI-powered self-service hub.

3 days, 22 hours ago @ aws.amazon.com
Build agents to learn from experiences using Amazon Bedrock AgentCore episodic memory

Amazon Bedrock AgentCore episodic memory addresses this limitation by capturing and surfacing experience-level knowledge for AI agents.

To learn more, see Amazon Bedrock AgentCore Memory: Building context-aware agents and Building smarter AI agents: AgentCore long-term memory deep dive.

How AgentCore episodic memory works: When your agentic application sends conversational events to AgentCore Memory, raw interactions get transformed into rich episodic memory records through an intelligent extraction and reflection process.

Performance evaluation: We evaluated Amazon Bedrock AgentCore episodic memory on real-world goal completion benchmarks from the retail and airline domains (sampled from τ2-ben…

3 days, 23 hours ago @ aws.amazon.com
How bunq handles 97% of support with Amazon Bedrock

Beyond direct support, bunq’s team also needed efficient ways to analyze incoming feature requests and bug reports to continuously improve their system.

bunq uses Amazon Bedrock to access Anthropic’s Claude models with enhanced security features, scalability, and compliance—critical requirements for banking applications.

Amazon Simple Storage Service (Amazon S3) handles document storage, and Amazon MemoryDB manages user sessions for real-time interactions.

Real-world impact: The transformation from bunq’s initial router-based architecture to the orchestrator pattern with Amazon Bedrock delivered measurable improvements across user support operations.

The multi-agent deployment achieved signif…

4 days, 1 hour ago @ aws.amazon.com
Using Strands Agents to create a multi-agent solution with Meta’s Llama 4 and Amazon Bedrock

This shift towards collaborative, multi-agent AI solutions is revolutionizing software architectures by making them more autonomous, resilient, and adaptable.

In this post, we explore how to build a multi-agent video processing workflow using Strands Agents, Meta’s Llama 4 models, and Amazon Bedrock to automatically analyze and understand video content through specialized AI agents working in coordination.

Introduction to agentic workflows using Strands Agents: This post demonstrates a video processing solution that implements an agent workflow using six specialized agents.

This architectural pattern in AI systems allows for specialized AI agents to be wrapped as callable functions (tools) th…

4 days, 1 hour ago @ aws.amazon.com
Introducing multimodal retrieval for Amazon Bedrock Knowledge Bases

We are excited to announce the general availability of multimodal retrieval for Amazon Bedrock Knowledge Bases.

With multimodal retrieval in Bedrock Knowledge Bases, you can now ingest, index, and retrieve information from text, images, video, and audio using a single, unified workflow.

Amazon Bedrock Data Automation: Bedrock Data Automation takes a different approach by converting multimedia content into rich textual representations before embedding.

He is focused on helping accelerate enterprises across the world on their generative AI journeys with Amazon Bedrock and Bedrock AgentCore.

Jean-Pierre Dodel is a Principal Product Manager for Amazon Bedrock, Amazon Kendra, and Amazon Quick Inde…

5 days, 1 hour ago @ aws.amazon.com
Advanced fine-tuning techniques for multi-agent orchestration: Patterns from Amazon at scale

Amazon SageMaker Training jobs provide managed infrastructure for executing custom fine-tuning workflows, automatically provisioning GPU instances, managing training execution, and handling cleanup after completion.

AI agent development environments and SDKs: The development environment is where developers author, test, and iterate on agent logic before deployment.

Amazon SageMaker AI Serverless Customization eliminates infrastructure management for fine-tuning, supporting SFT, DPO, and RLVR techniques with pay-per-token pricing.

For large-scale training, Amazon SageMaker HyperPod introduced checkpointless training and elastic scaling to reduce recovery time and optimize resource utilization.…

1 week, 2 days ago @ aws.amazon.com
How Palo Alto Networks enhanced device security infra log analysis with Amazon Bedrock

Palo Alto Networks offers Cloud-Delivered Security Services (CDSS) to tackle device security risks.

Solution overview: Palo Alto Networks’ automated log classification system helps their Device Security team detect and respond to potential service failures ahead of time.

The solution integrates seamlessly with Palo Alto Networks’ existing infrastructure, helping the Device Security team focus on preventing outages instead of managing complex log analysis processes.

This comprehensive output helps Palo Alto Networks’ SMEs quickly prioritize responses and take preventive action before potential outages occur.

By providing detailed reasoning alongside each classification, Palo Alto Networks enab…

1 week, 2 days ago @ aws.amazon.com
From beginner to champion: A student’s journey through the AWS AI League ASEAN finals

Behind the competition: The AWS AI League competition began with a tutorial session led by the AWS team and the Gen-C Generative AI Learning Community, featuring two powerful user-friendly services: Amazon SageMaker JumpStart and PartyRock.

Answer prompt: You are an AI expert specializing in generative AI, foundation models, agentic AI, prompt engineering, and responsible AI.

A higher learning rate with fewer epochs might converge faster, but could miss the subtle patterns captured by a lower learning rate over more epochs.

Vincent Oh is the Pri…

1 week, 2 days ago @ aws.amazon.com
Deploy AI agents on Amazon Bedrock AgentCore using GitHub Actions

Recently, AWS announced Amazon Bedrock AgentCore, a flexible service that helps developers seamlessly create and manage AI agents across different frameworks and models, whether hosted on Amazon Bedrock or other environments.

In this post, we demonstrate how to use a GitHub Actions workflow to automate the deployment of AI agents on AgentCore Runtime.

In this sample agent code, when someone calls your Amazon Bedrock agent using an API, AgentCore Runtime will automatically call the strands_agent_bedrock(payload) function.

AgentCore Runtime versioning and endpoints: Amazon Bedrock AgentCore implements automatic versioning for AgentCore Runtime and lets you manage different configurations using …

1 week, 2 days ago @ aws.amazon.com
How the Amazon AMET Payments team accelerates test case generation with Strands Agents

The challenge with traditional AI approaches: Our initial attempts followed conventional AI approaches, feeding entire BRDs to a single AI agent for test case generation.

Workflow orchestration benefits: The Strands Agents workflow architecture provided the sophisticated coordination capabilities our multi-agent system required.

A key discovery was that the Strands Agents workflow and graph-based orchestration patterns significantly outperformed traditional supervisor agent approaches.

Implementing Pydantic models through the Strands Agents structured output feature effectively reduced type-related hallucinations in our system.

The same patterns that generate payment test cases can be applied t…

1 week, 3 days ago @ aws.amazon.com
Build a generative AI-powered business reporting solution with Amazon Bedrock

The generative AI solution enhances the reporting process through automation.

The result is a level of accuracy and objectivity that’s difficult to achieve with manual processes, ultimately leading to more efficient and effective business reporting.

Solution overview: This generative AI-powered Enterprise Writing Assistant demonstrates a modern, serverless architecture that leverages AWS’s powerful suite of services to deliver an intelligent writing solution.

By leveraging generative AI technology, this solution addresses the traditional pain points of business reporting while introducing capabilities that were previously unattainable.

For more information about generative AI on AWS, refer to…

1 week, 3 days ago @ aws.amazon.com
NVIDIA
last post 3 days, 1 hour ago
NVIDIA DRIVE AV Raises the Bar for Vehicle Safety as Mercedes-Benz CLA Earns Top Euro NCAP Award

AI-powered driver assistance technologies are becoming standard equipment, fundamentally changing how vehicle safety is assessed and validated.

The CLA is also built on the NVIDIA DRIVE Hyperion architecture, which incorporates sensor diversity and hardware redundancy into the vehicle’s overall design.

TÜV Rheinland performed an independent United Nations Economic Commission for Europe (UNECE) safety assessment of NVIDIA DRIVE AV related to safety requirements for complex electronic systems, which NVIDIA successfully completed.

Training Safety Through Data and Simulation: Modern AI-driven safety systems learn from exponentially more driving scenarios than any human could experience in a lifet…

3 days, 1 hour ago @ blogs.nvidia.com
How to Get Started With Visual Generative AI on NVIDIA RTX PCs

Here’s how to get started with visual generative AI locally on RTX PCs using ComfyUI and popular models: Visit comfy.org to download and install ComfyUI for Windows.

Model weights are the “knowledge” inside an AI model — think of them like the synapses in a brain.

#ICYMI: The Latest Advancements in NVIDIA RTX AI PCs 💻 NVIDIA @ CES 2026: CES announcements included 4K AI video generation acceleration on PCs with LTX-2 and ComfyUI upgrades.

Plus, major RTX accelerations across ComfyUI, LTX-2, Llama.cpp, Ollama, Hyperlink and more unlock video, image and text generation use cases on AI PCs.

3 days, 5 hours ago @ blogs.nvidia.com
From Pilot to Profit: Survey Reveals the Financial Services Industry Is Doubling Down on AI Investment and Open Source

The sixth annual “NVIDIA State of AI in Financial Services” report, based on a survey of more than 800 industry professionals, found that AI usage in the industry has never been higher.

Highlights from this year’s report include: 89% said AI is helping increase annual revenue and decrease annual costs.

As stated above, 89% of survey respondents said AI has helped increase annual revenue and decrease annual costs.

Download the “State of AI in Financial Services: 2026 Trends” report for in-depth results and insights.

Explore NVIDIA’s AI solutions and enterprise-level AI platforms for financial services.

3 days, 5 hours ago @ blogs.nvidia.com
Flight Controls Are Cleared for Takeoff on GeForce NOW

Flight sticks arrive on GeForce NOW with support for Thrustmaster’s T.Flight HOTAS One. Plus, four new games join the cloud this week, with Tencent’s ‘Delta Force’ launching soon.

Flight control support — one of the most community-requested features for GeForce NOW — is live starting today, following its announcement at CES earlier this month.

To get right to flying, members can look for a dedicated row in the GeForce NOW app highlighting games that support flight controls, making it simple to spot great titles for putting their new setup through its paces.

Fire up the …

3 days, 5 hours ago @ blogs.nvidia.com
“Largest Infrastructure Buildout In Human History”: Jensen Huang on AI’s “Five-Layer Cake” at Davos

Huang framed AI not as a single technology but as “a five-layer cake,” spanning energy, chips and computing infrastructure, cloud data centers, AI models and, ultimately, the application layer.

The application layer might focus on integrating AI into financial services, healthcare, or manufacturing.

Huang pointed to venture capital investment as a signal of how quickly AI is reshaping the global economy.

“AI is infrastructure,” he said, arguing that every country should treat AI like electricity or roads.

Closing Technology Divides: For developing countries, Huang said AI offers a chance to narrow long-standing technology gaps.

4 days, 6 hours ago @ blogs.nvidia.com
Survive the Quarantine Zone and More With Devolver Digital Games on GeForce NOW

‘Quarantine Zone: The Last Check’ leads nine games joining the cloud this week.

The energy continues this GFN Thursday with the launch of Quarantine Zone: The Last Check, Devolver Digital’s most haunting and atmospheric release yet.

Check it out along with other top Devolver Digital games on GeForce NOW and nine new games in the cloud this week.

Quarantine Zone: The Last Check streams at full GeForce RTX 5080 power for GeForce NOW Ultimate members, bringing razor-sharp visuals and ultrasmooth performance to every tense border crossing.

No trip through the Devolver catalog would be complete without revisiting its eclectic, chaotic hits already streaming on GeForce NOW.

1 week, 3 days ago @ blogs.nvidia.com
How to Write High-Performance Matrix Multiply in NVIDIA CUDA Tile

Each Block is responsible for the calculation of one output tile, and cuTile automatically handles memory access and thread synchronization.

tn: Output tile columns (N dimension).

def swizzle_2d_from_bid(M, N, tm, tn, GROUP_SIZE_M, bid):
    # Get the global IDs of a given CUDA block in a 1D grid.

Using four elements (shaded areas) of the output matrix as an example, the figure compares linear versus swizzled memory access. Figure 2.
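
The snippet above shows only the signature of the swizzle helper, so the body is not recoverable from this feed. As a rough illustration, the sketch below fills it in with the grouped ordering commonly used in tiled matrix multiply kernels. The body, the local variable names, and the return convention are assumptions and may differ from the original post. The sketch is plain Python so the mapping can be inspected without a GPU.

```python
def swizzle_2d_from_bid(M, N, tm, tn, GROUP_SIZE_M, bid):
    """Map a 1D block ID to (tile_row, tile_col) using grouped ordering.

    Assumed behavior: consecutive block IDs walk down GROUP_SIZE_M tile rows
    before moving to the next tile column, so neighbouring blocks reuse the
    same input tiles.
    """
    num_bid_m = (M + tm - 1) // tm            # number of tiles along M
    num_bid_n = (N + tn - 1) // tn            # number of tiles along N
    num_bid_in_group = GROUP_SIZE_M * num_bid_n
    group_id = bid // num_bid_in_group
    first_bid_m = group_id * GROUP_SIZE_M
    # The last group may contain fewer than GROUP_SIZE_M tile rows.
    group_size_m = min(num_bid_m - first_bid_m, GROUP_SIZE_M)
    bid_m = first_bid_m + (bid % num_bid_in_group) % group_size_m
    bid_n = (bid % num_bid_in_group) // group_size_m
    return bid_m, bid_n


if __name__ == "__main__":
    # Hypothetical 5x3 grid of tiles, grouped two tile rows at a time.
    for bid in range(15):
        print(bid, swizzle_2d_from_bid(5, 3, 1, 1, 2, bid))
```

With a linear mapping, consecutive block IDs would sweep a full tile row before moving down. The grouped mapping keeps nearby blocks inside a narrow band of rows, which is the usual motivation for swizzling.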

1 week, 3 days ago @ developer.nvidia.com
Learn How NVIDIA cuOpt Accelerates Mixed Integer Optimization using Primal Heuristics

NVIDIA cuOpt is a GPU-accelerated optimization engine designed to deliver fast, high-quality solutions for large, complex decision-making problems.

Accelerated primal heuristics for MIP solvers are algorithms that deliver high-quality, feasible solutions without exhaustively searching the entire solution space.

First, many real-world MIP problems are too large or time-sensitive for traditional solvers to find answers in time.

This indicates potential for further improvement, as well as the possibility of augmenting any existing solver with the GPU-accelerated primal heuristics detailed previously.

Get started with the MIP heuristics solver: MIP heuristics offer fast and feasible solutions wit…

1 week, 4 days ago @ developer.nvidia.com
CEOs of NVIDIA and Lilly Share ‘Blueprint for What Is Possible’ in AI and Drug Discovery

NVIDIA and Lilly are putting together “a blueprint for what is possible in the future of drug discovery,” NVIDIA founder and CEO Jensen Huang told attendees at a fireside chat Monday with Dave Ricks, chair and CEO of Lilly.

This framework aims to enable experiments, data generation and AI model development to continuously inform and improve one another.

, developer of the Neo model family for co-folding and design across all biological molecules.

Max Jaderberg, president of Isomorphic, which is extending the capabilities of AlphaFold, the defining family of protein structure and interaction models.

Joshua Meier, CEO of Chai Discovery, which developed the Chai family of generative AI model…

1 week, 4 days ago @ blogs.nvidia.com
AI’s Next Revolution: Multiply Labs Is Scaling Robotics-Driven Cell Therapy Biomanufacturing Labs

Multiply Labs is doing for cell therapy labs what has already happened in the chip industry: It’s introducing robots to do the tedious, precision and hygienic work better, faster and cheaper.

“Next, I flew to Silicon Valley, and we started this at YCombinator.” San Francisco-based Multiply Labs, founded in 2016, today is automating cell therapy manufacturing with robots for leading companies, including Kyverna Therapeutics and Legend Biotech.

Multiply Labs offers end-to-end robotic systems that produce gene modified cell therapies at scale.

Robots within the controlled biomanufacturing clusters of Multiply Labs help ensure more hygienic and precision processes.

Simulating Cell Therapy Manufa…

1 week, 6 days ago @ blogs.nvidia.com
NVIDIA Unveils Multi-Agent Intelligent Warehouse and Catalog Enrichment AI Blueprints to Power the Retail Pipeline

New Multi-Agent Intelligent Warehouse (MAIW) and Retail Catalog Enrichment NVIDIA Blueprints are designed to turn this dynamic system into an advantage.

The NVIDIA MAIW blueprint delivers a synchronized AI system that sits above existing warehouse management systems, enterprise resource planning, robotics and IoT data, so teams gain real-time, explainable operational intelligence.

In addition, the Retail Catalog Enrichment Blueprint can create rich, on-brand marketing content by applying brand voice, tone and taxonomy instructions via prompts, alongside the product image and a target locale.

Global tech consulting firm Grid Dynamics has built a catalog enrichment and management system that …

2 weeks, 2 days ago @ blogs.nvidia.com
Multi-Agent Warehouse AI Command Layer Enables Operational Excellence and Supply Chain Intelligence

This post introduces the NVIDIA Multi-Agent Intelligent Warehouse (MAIW) Blueprint for the missing layer.

The solution: An AI command layer. The Multi-Agent Intelligent Warehouse delivers a unified AI command layer for modern warehouse operations, transforming fragmented systems, documents, and telemetry into real-time, actionable intelligence.

Intelligent document processing: The intelligent document processing pipeline uses NVIDIA NIM and multimodal foundation models with quality-based orchestration to deliver enterprise-grade accuracy at scale.

By instrumenting MAIW like any critical warehouse service, SRE and operations teams can monitor, debug, and improve the AI layer with confidence.

Lea…

2 weeks, 2 days ago @ developer.nvidia.com
AI Copilot Keeps Berkeley’s X-Ray Particle Accelerator on Track

Accelerator Assistant can help prepare and run a multistage physics experiment at a particle accelerator, reducing preparation effort by 100x.

In the rolling hills of Berkeley, California, an AI agent is supporting high-stakes physics experiments at the Advanced Light Source (ALS) particle accelerator.

Researchers at the Lawrence Berkeley National Laboratory ALS facility recently deployed the Accelerator Assistant, a large language model (LLM)-driven system to keep X-ray research on track.

And much can go wrong: the ALS control system has more than 230,000 process variables.

The work has already expanded beyond ALS as part of the DOE’s Genesys mission, with the framework being deployed acro…

2 weeks, 3 days ago @ blogs.nvidia.com
Japan Science and Technology Agency Develops NVIDIA-Powered Moonshot Robot for Elderly Care

Using NVIDIA Isaac Sim and RTX GPUs, humanoid robot research will automate cooking, cleaning, repositioning and other caregiving tasks.

In light of Japan’s rising elderly population, many of the research projects underway center on how robots can aid in senior care.

NVIDIA Architecture Powers On Moonshot Robots: NVIDIA technologies are integrated into every level of the Moonshot project’s senior care robots, known as AI-Driven Robot for Embrace and Care, or AIREC.

Dry-AIREC robot, the larger and more mobile member of the Moonshot family, has two NVIDIA GPUs onboard.

Automating repositioning with a humanoid robot — while considering the elderly care patients’ personal states and bodily needs — …

2 weeks, 3 days ago @ blogs.nvidia.com
More Ways to Play, More Games to Love — GeForce NOW Wraps CES With Linux Support, Fire TV App, Flight Stick Controls

New AAA titles are streaming to the cloud, with easier automatic sign-in and six games joining the GeForce NOW library this week.

NVIDIA is wrapping up a big week at the CES trade show with a set of GeForce NOW announcements that are bringing more ways to play and more games to the cloud.

Leveling Up Where Gamers Play: A new native GeForce NOW app is launching in beta for Linux, starting with Ubuntu 24.04 and later.

Easier Sign-Ins, Faster Game Time: GeForce NOW now supports new games and more ways to get into them faster.

Battle.net single sign-on recently joined the service, letting members jump straight into supported titles without juggling extra credentials each time.

2 weeks, 3 days ago @ blogs.nvidia.com
Facebook
last post 1 week, 3 days ago
Adapting the Facebook Reels RecSys AI Model Based on User Feedback

Our new User True Interest Survey (UTIS) model now helps surface more niche, high-quality content and boosts engagement, retention, and satisfaction.

Our paper, “Improve the Personalization of Large-Scale Ranking Systems by Integrating User Survey Feedback,” shares full details on this work.

The main candidate ranking model used by the platform is a large multi-task, multi-label model.

We trained a lightweight UTIS alignment model layer on the collected user survey responses using existing predictions of the main model as input features.
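
To make the idea of a lightweight alignment layer concrete, here is a minimal sketch under stated assumptions. The snippet does not describe the actual architecture, so the sketch swaps in a plain logistic regression that takes the main model's predictions as input features and fits them to binary survey answers. The feature meanings, the toy numbers, and all function names are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_alignment_layer(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression with batch gradient descent.

    features: rows of main-model predictions (assumed inputs).
    labels:   1 if the surveyed user said the item matched their interest.
    """
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_interest(weights, bias, x):
    """Probability that the user would report true interest."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Synthetic toy data: two hypothetical main-model scores per item.
main_model_predictions = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6],
                          [0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]
survey_answers = [1, 1, 1, 0, 0, 0]

weights, bias = train_alignment_layer(main_model_predictions, survey_answers)
```

Because the layer only reweights scores the main model already produces, it stays small and can be trained on the comparatively scarce survey labels.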

The UTIS model consistently outperformed the baseline, driving higher user engagement and retention.

1 week, 3 days ago @ engineering.fb.com
DrP: Meta’s Root Cause Analysis Platform at Scale

DrP’s key components include:

Expressive SDK: The DrP SDK allows engineers to codify investigation workflows into analyzers.

Post-processing system: After an investigation, the post-processing system can take automated actions based on the analysis results.

Bootstrap code: The DrP SDK provides bootstrap code to create a template analyzer with pre-populated boilerplate code.

Data access and analysis: The SDK includes libraries for data access and analysis, such as dimension analysis and time series correlation.

This provides immediate analysis results to on-call engineers.

1 month, 1 week ago @ engineering.fb.com
How AI Is Transforming the Adoption of Secure-by-Default Mobile Frameworks

Generative AI and automation accelerate the adoption of secure frameworks at scale, enabling consistent security enforcement and efficient migration across Meta’s vast codebase.

How We Design Secure-by-Default Frameworks at Meta: Designing secure-by-default frameworks for use by a large number of developers shipping vastly different features across multiple apps is an interesting challenge.

There shouldn’t be one security framework that covers all security issues, and not every security issue is general enough to deserve its own framework.

Now that we’ve looked at the design philosophy behind our frameworks, let’s look at one of our most widely used Android security frameworks, SecureLinkLaun…

1 month, 1 week ago @ engineering.fb.com
Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization

Zoomer has delivered training time reductions and significant QPS improvements, making it the de facto tool for AI performance optimization across Meta’s entire AI infrastructure.

Zoomer is Meta’s automated, one-stop-shop platform for performance profiling, debugging, analysis, and optimization of AI training and inference workloads.

AI Performance Optimization Using Zoomer: Zoomer is an automated debugging and optimization platform that works across all of our AI model types (ads recommendations, GenAI, computer vision, etc.)

Memory Analysis: Comprehensive analysis of GPU memory usage patterns, allocation tracking, and leak detection.

Realtime Memory Profiling: GPU memory allocation track…

2 months ago @ engineering.fb.com
Open Source Is Good for the Environment

But have you heard about open hardware?

And did you know open source can have a positive impact on the environment?

On this episode of the Meta Tech Podcast, Pascal Hartig sits down with Dharmesh and Lisa to talk about all things open hardware, and Meta’s biggest announcements from the 2025 Open Compute Project (OCP) Summit – including a new open methodology for leveraging AI to understand Scope 3 emissions.

You’ll also hear how AI and open hardware are helping Meta push to achieve net zero emissions in 2030, including how AI is being used to develop new concrete mixes for data center construction.

And if you’re interested in learning more about career opportunities at Meta visit the Meta C…

2 months, 1 week ago @ engineering.fb.com
Meta’s Generative Ads Model (GEM): The Central Brain Accelerating Ads Recommendation AI Innovation

We’re sharing details about Meta’s Generative Ads Recommendation Model (GEM), a new foundation model that delivers increased ad performance and advertiser ROI by enhancing other ads recommendation models’ ability to serve relevant ads.

GEM propagates its learnings, leveraging a suite of post-training techniques across the entire ads model fleet, enabling a paradigm shift in Meta’s Ads Recommendation system.

GEM leverages enhanced training scalability that efficiently utilizes thousands of GPUs for building and iterating an LLM-scale ads foundation model.

The Generative Ads Recommendation Model (GEM) is Meta’s most advanced ads foundation model, built on an LLM-inspired paradigm and trained …

2 months, 2 weeks ago @ engineering.fb.com
Scaling LLM Inference: Innovations in Tensor Parallelism, Context Parallelism, and Expert Parallelism

At Meta, we are constantly pushing the boundaries of LLM inference systems to power applications such as the Meta AI App.

These metrics highlight the distinct computational demands of LLM inference: Prefill is compute-intensive, while decoding is memory bandwidth-intensive.
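
A back-of-the-envelope roofline estimate shows why the two phases differ. The sketch below is not from the post. It assumes a dense decoder-only model, roughly 2 FLOPs per parameter per token, weights read from memory once per forward pass, and it ignores KV cache and activation traffic. The accelerator figures are hypothetical placeholders.

```python
def arithmetic_intensity(tokens_per_pass, bytes_per_param=2.0):
    """FLOPs performed per byte of weights read in one forward pass.

    FLOPs ~ 2 * params * tokens and bytes ~ params * bytes_per_param,
    so the parameter count cancels out.
    """
    return 2.0 * tokens_per_pass / bytes_per_param


def is_compute_bound(tokens_per_pass, peak_flops, peak_bytes_per_s,
                     bytes_per_param=2.0):
    """Compare intensity with the hardware ridge point (FLOPs per byte)."""
    ridge = peak_flops / peak_bytes_per_s
    return arithmetic_intensity(tokens_per_pass, bytes_per_param) >= ridge


# Hypothetical accelerator: 1e15 FLOP/s peak compute, 3e12 B/s bandwidth.
PEAK_FLOPS = 1e15
PEAK_BW = 3e12

prefill = is_compute_bound(4096, PEAK_FLOPS, PEAK_BW)  # whole prompt at once
decode = is_compute_bound(1, PEAK_FLOPS, PEAK_BW)      # one token, batch of 1
print("prefill compute-bound:", prefill)
print("decode compute-bound:", decode)
```

Under these assumptions, prefill processes thousands of tokens per weight read and lands above the ridge point, while single-token decoding rereads all weights for very little arithmetic. Batching more decode requests together raises the intensity.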

Communication: Communication latency increases when parallelizing across multiple hosts.

In EP-based inference, we utilize a two-shot, all-to-all communication pattern to exchange tokens between data parallelism and expert parallelism ranks based on routing.

We are committed to continuous innovation to ensure efficient and scalable LLM inference for millions of users worldwide.

3 months, 1 week ago @ engineering.fb.com
How Meta Is Leveraging AI To Improve the Quality of Scope 3 Emission Estimates for IT Hardware

We leveraged AI to help us improve this database and understand our Scope 3 emissions associated with IT hardware by: Identifying similar components and applying existing PCFs to similar components that lack these carbon estimates.

Understanding the carbon footprint of IT racks and applying generative AI (GenAI) as a categorization algorithm to create a new and standard taxonomy.

If these similar components are not identified, their carbon footprint estimates will remain at a lower data quality.

These similar components can be mapped to a representative proxy PCF, allowing us to use high-quality PCF data in similar components.

For example, we can scale the carbon footprint calculation for a …

3 months, 1 week ago @ engineering.fb.com
OCP Summit 2025: The Open Future of Networking Hardware for AI

At Open Compute Project Summit (OCP) 2025, we’re sharing details about the direction of next-generation network fabrics for our AI training clusters.

At Meta, we believe that open hardware is a catalyst for innovation — especially as data center infrastructure increasingly supports new and emerging AI technologies.

Open hardware plays a crucial role in enabling disaggregation, allowing us to break down traditional data center technologies into their core components.

Today, through OCP, we continue to advance open network technologies for the next generation of AI applications.

Ethernet for Scale-Up Networking in OCP: Meta’s Industry Leadership. At Meta, we recognize that the future of AI and …

3 months, 1 week ago @ engineering.fb.com
LLMs Are the Key to Mutation Testing and Better Compliance

By leveraging LLMs we’ve been able to overcome the barriers that have prevented mutation testing from being efficiently deployed at scale.

Our presentations shared insights into how we’ve used LLMs to solve the major barriers that have prevented mutation testing at scale and highlighted new areas in automated software testing where LLMs can have a significant impact.

Mutation Testing Isn’t Scalable: Traditional mutation testing generates a very large number of mutants, making it computationally expensive and difficult to scale to large industrial codebases.
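
For readers new to the technique, the following sketch shows traditional rule-based mutation testing in miniature. It is not the LLM-based approach the post describes. The function under test, the single mutation operator (`+` becomes `-`), and both test suites are hypothetical examples, built only on the standard `ast` module.

```python
import ast

SOURCE = """
def price_with_tax(price, rate):
    return price + price * rate
"""


class AddToSub(ast.NodeTransformer):
    """Mutation operator: replace the n-th `+` with `-`."""

    def __init__(self, target_index):
        self.target_index = target_index
        self.seen = 0

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            if self.seen == self.target_index:
                node.op = ast.Sub()
            self.seen += 1
        return node


def count_additions(source):
    """Each `+` in the source yields one mutant."""
    return sum(isinstance(n, ast.BinOp) and isinstance(n.op, ast.Add)
               for n in ast.walk(ast.parse(source)))


def make_mutant(source, index):
    """Compile a copy of the function with one `+` flipped to `-`."""
    tree = AddToSub(index).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["price_with_tax"]


def strong_suite(fn):
    return fn(100, 0.5) == 150   # fails on the mutant, so it is killed


def weak_suite(fn):
    return fn(100, 0) == 100     # passes on the mutant, so it survives


def mutation_score(source, suite):
    """Return (killed, total): mutants the suite detects vs. all mutants."""
    total = count_additions(source)
    killed = sum(not suite(make_mutant(source, i)) for i in range(total))
    return killed, total
```

Every mutant requires a full run of the test suite, so the cost grows with the number of mutation sites multiplied by the suite's runtime. That multiplication is the scaling problem the snippet refers to.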

Mutation Testing Requires a Lot of Computational Resources: Mutation testing is costly in terms of computational resources and developer ef…

3 months, 3 weeks ago @ engineering.fb.com
AssetGen: Generating 3D Worlds With AI

Imagine being able to use AI to create 3D virtual worlds using prompts as easily as you can generate images.

In his keynote, Mark Zuckerberg shared his vision of a future where anyone can create virtual worlds using AI-powered tools like the ones available in the upcoming Meta Horizon Studio.

But AI is already making it easier than ever to create 3D assets.

On this episode of the Meta Tech Podcast, Pascal Hartig is joined by Mahima and Rakesh from Meta’s XR Tech team to discuss AssetGen, a new foundation model for 3D assets.

They talk about how they built and trained AssetGen, the important role LLMs have to play in the future of VR, and how they’re tackling the ambitious goal of generating…

3 months, 4 weeks ago @ engineering.fb.com
Meta’s Infrastructure Evolution and the Advent of AI

As our user base grew globally, we scaled beyond single data center buildings and into data center regions consisting of multiple buildings.

Enter AI Workloads (2020): While we were navigating the challenges of scaling, we were also seeing glimpses of how AI workloads would impact our infrastructure.

To build out our AI infrastructure, we’ve leveraged solutions from partners like AMD and NVIDIA as well as our own custom silicon.

Constructing Prometheus has been a monumental engineering feat, with infrastructure spanning five or more data center buildings in a single data center region.

We are still early in the evolution and adoption of AI workloads.

3 months, 4 weeks ago @ engineering.fb.com
Networking at the Heart of AI — @Scale: Networking 2025 Recap

AI is everywhere and, as network engineers, we are right in the thick of it: building the network infrastructure for AI.

Setting Context: Rapid Changes and Evolution. Given AI continues to drive so much innovation in networking and general infrastructure, we once again focused @Scale: Networking on AI networking, sharing the new insights and progress in the field.

The Models and the Primary AI Workloads Are Rapidly Evolving.

More from @Scale: Networking 2025. Please visit the @Scale YouTube channel to check out all the talks from this year’s Networking @Scale.

We look forward to what promises to be another rapid year of network and AI innovation that we’ll cover at the next @Scale: Networking in…

4 months ago @ engineering.fb.com
A New Ranking Framework for Better Notification Quality on Instagram

We’ve introduced a diversity-aware notification ranking framework to reduce uniformity and deliver a more varied and engaging mix of notifications.

Instagram leverages machine learning (ML) models to decide who should get a notification, when to send it, and what content to include.

To tackle this, we’ve introduced a diversity-aware notification ranking framework that helps deliver more diverse, better curated, and less repetitive notifications.

Introducing Instagram’s Diversity-Aware Notification Ranking Framework: Instagram’s diversity-aware notification ranking framework is designed to enhance the notification experience by balancing the predicted potential for user engagement with the nee…

4 months, 3 weeks ago @ engineering.fb.com
Federation Platform and Privacy Waves: How Meta distributes compliance-related tasks at scale

We’re exploring Meta’s Federation Platform, a scalable set of tools for managing compliance-related tasks, along with Privacy Waves, our method for batching these tasks and ensuring accountability.

To facilitate this, we developed the Federation Platform and Privacy Waves program: The Federation Platform breaks down large compliance-related initiatives into smaller, manageable workstreams.

Internal surveys reveal significantly higher positive sentiment for Privacy Waves tasks compared to ad-hoc tasks.

Step 6: Reporting and recognition. The centralized distribution of tasks via Federation Platform and Privacy Waves streamlines operational effectiveness and verification.

Expansions for the Federa…

5 months, 2 weeks ago @ engineering.fb.com
Uber Engineering
last post None
neptune.ai
last post 1 month, 3 weeks ago
We are joining OpenAI

Piotr Niedźwiedź, CEO/CTO and founder of neptune.ai: I’m excited to share that we’ve entered into a definitive agreement to be acquired by OpenAI, subject to closing conditions.

We are thrilled to join the OpenAI team and help their AI researchers build better models faster.

Neptune is a metrics dashboard company.” We’ve worked closely with OpenAI to create the metrics dashboard that helps teams building foundation models.

Our future with OpenAI: Neptune will join OpenAI and continue to support AI researchers with tools to monitor, debug, and evaluate frontier models.

We are looking forward to working with top AI researchers and supporting OpenAI’s mission of ensuring that AGI benefits all of hu…

1 month, 3 weeks ago @ neptune.ai
Synthetic Data for LLM Training

For instance, financial data is highly sensitive and protected by very strict regulations, and synthetic data mimics the real data distribution without revealing customer information.

Read more about how leading foundation model teams curate their training data and other topics in the State of Foundation Model Training Report 2025.

Choosing the right synthetic data generation technique depends on the type of data and its complexity.

Synthetic tabular data generation is a promising direction to overcome these challenges by learning the distribution of the tabular data.

Post-processing: As the distribution of tabular data is highly complex, it makes the synthetic tabular data generation very ch…

2 months, 2 weeks ago @ neptune.ai
What are LLM Embeddings: All you Need to Know

TL;DR LLM embeddings are the numerical, vector representations of text that Large Language Models (LLMs) use to process information.

Unlike their predecessor word embeddings, LLM embeddings are context-aware and dynamically change to capture semantic and syntactic relationships based on the surrounding text.

What are the applications of LLM embeddings?

Word embeddings timeline (figure). Sparse word embeddings: One-Hot Vectors, 1970s TF-IDF, 1980s Co-Occurrence Matrix. Static word embeddings: Word2Vec 2013, GloVe 2014. Contextualized word embeddings: ELMo 2018, GPT-1 2018, BERT 2018, LLAMA 2023, DeepSeek-V1 2023, GPT-4 2023. Static word embeddings: Static word embeddings, such as word2vec in 2013, marked a significant development.…

2 months, 2 weeks ago @ neptune.ai
Detecting and Fixing ‘Dead Neurons’ in Foundation Models

TL;DR Dead neurons silently waste compute and reduce effective model capacity in foundation models.

Dead neurons’ impact: Recent studies into dead neurons in the context of foundation models show interesting, albeit worrying, results.

These large reported fractions of dead neurons in foundation models are a concern from a computational perspective.

Before we move on to discuss how to detect and fix dead neurons, let’s touch upon an important distinction between dead neurons and vanishing gradients.

Visualizing activation distributions: Is your foundation model suffering from dead neurons?

2 months, 4 weeks ago @ neptune.ai
Part 2: Instruction Fine-Tuning: Evaluation and Advanced Techniques for Efficient Training

In the first part of this series, we covered the fundamentals of instruction fine-tuning (IFT).

def calculate_irs(instruction, output, reference_model): evaluation_prompt = f""" Instruction: {instruction} Model Output: {output} Rate how well the output follows the instruction on these criteria: 1.

HINT addresses a computational inefficiency in standard instruction fine-tuning: repeatedly reprocessing the same task instruction with every input example.

Read more about foundation model training infrastructure and other topics in Neptune’s 2025 State of Foundation Model Training Report.

First, during initial instruction fine-tuning across multiple diverse tasks, the model learns genera…

3 months ago @ neptune.ai
How to Optimize LLM Inference

Large Language Model (LLM) inference at scale is challenging as it involves transferring massive amounts of model parameters and data and performing computations on large tensors.

In the following, we’ll use the Llama model family architecture as a specific example to understand the LLM workload at inference.

For a far more detailed analysis of the LLM workload at inference, see the chapter All About Transformer Inference in the book How to Scale Your Model, published by Google DeepMind.

A quick primer on hardware for LLM inference: A typical LLM inference cluster consists of several nodes, each with a multi-core CPU and multiple accelerator devices, …

3 months, 1 week ago @ neptune.ai
A Researcher’s Guide to LLM Grounding

In this article, we’ll explore the fundamental concepts of LLM grounding as well as strategies for optimally grounding models.

What is LLM grounding?

LLM grounding is analogous.

If relevant knowledge cannot be inferred from the data, then LLM grounding cannot yield more relevant responses.

When grounding LLMs using RAG, consider retaining only a few of the top hits (i.e., top-k) for your retrieval queries.

4 months ago @ neptune.ai
Instruction Fine-Tuning: Fundamentals, Architecture Modifications, and Loss Functions

TL;DR Instruction fine-tuning (IFT) refines pre-trained large language models (LLMs) to follow specific task instructions by training on prompt-response pairs.

Instruction fine-tuning in a nutshell: IFT tailors LLMs to follow user instructions by bridging their inherent next-word prediction with human-defined objectives.

Parameter-efficient instruction fine-tuning: While major foundation models like GPT-4 or Llama-2 undergo full parameter instruction fine-tuning during development, parameter-efficient fine-tuning (PEFT) methods have become widely adopted for instruction fine-tuning since the LoRA paper was publi…

4 months, 1 week ago @ neptune.ai
Understanding Prompt Injection: Risks, Methods, and Defense Measures

Prompt injection 101: When prompts go rogue. The term ‘Prompt Injection’ comes from SQL injection attacks.

There is another claim of the independent discovery of prompt injection attacks, which suggests that Riley Goodside publicly exhibited a prompt injection in a tweet back in September 2022.

The indirect prompt injection attacks are classified into active, passive, user-driven and virtual prompt attacks.

Virtual prompt injection attacks: This injection type is closely related to passive injection attacks previously described.

Prompt injection: current challenges & lessons learned. The arms race between prompt injection attacks and defenses is a challenge for researchers, developers, and users.

5 months, 3 weeks ago @ neptune.ai
SabiYarn: Advancing Low-Resource Languages With Multitask NLP Pre-Training [Paper Reflections]

This simple idea avoids computing loss on input prompt tokens the model already knows.

Prompt tokens are (too) expensive in low-resource settings: During pre-training, LLMs are trained in causal language modeling through a next-token prediction task.

=> Mo fẹ́ràn ìrẹsì,” the model is trained to predict every token, from the prompt to the actual answer. Table (Step, Prompt, Next token): (1) Translate English, Static prompt; (2) Translate English to, Static prompt; (3) Translate English to Yoruba:, Static prompt; (4) Translate English to Yoruba: I; (5) Translate English to Yoruba: I love; (6) Translate English to Yoruba: I love rice.

This is straightforward to implement in PyTorch by masking out the prompt tokens in the label …
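A minimal sketch of the label-masking idea, written in plain Python rather than PyTorch so that it runs without dependencies. It assumes the convention of PyTorch's `torch.nn.CrossEntropyLoss`, whose default `ignore_index` is -100. The helper `mask_prompt_labels` and the token ids are hypothetical and not taken from the SabiYarn paper.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_labels(input_ids, prompt_len, ignore_index=IGNORE_INDEX):
    """Return labels in which the first `prompt_len` positions are ignored by the loss."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must be between 0 and len(input_ids)")
    # Prompt positions get the ignore index; answer positions keep their token ids.
    return [ignore_index] * prompt_len + list(input_ids[prompt_len:])


# Hypothetical token ids: 4 prompt tokens followed by 3 answer tokens.
input_ids = [11, 12, 13, 14, 21, 22, 23]
labels = mask_prompt_labels(input_ids, prompt_len=4)
print(labels)
```

With labels built this way, a loss that skips the ignore index would be computed on the answer tokens only.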

5 months, 3 weeks ago @ neptune.ai
How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models

What gradient issues occur during foundation model training?

During training, gradient descent updates model parameters by computing the gradients of the loss function via forward and backward passes.

The green line corresponds to a learning rate of 10, while the orange line has a learning rate of 0.1.

The gradient norm for the orange line with LR = 0.1 is very high in the first steps, while the gradient norm of the green line with LR = 10 diverges to NaN after a few steps.

Techniques for gradient stabilization: Monitoring gradient norms and training loss provides insights into the learning dynamics of the foundation models.
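As an illustration of what monitoring a gradient norm can look like, here is a small dependency-free sketch that computes a global L2 gradient norm and applies clipping by global norm. Clipping is one common stabilization technique; the excerpt does not say which techniques the article covers, so treat that choice, the function names, and the gradient values as assumptions.

```python
import math


def global_grad_norm(grads):
    """L2 norm over all gradient values, given one list of floats per parameter."""
    return math.sqrt(sum(g * g for param in grads for g in param))


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients so their global L2 norm is at most `max_norm`.

    Returns the (possibly rescaled) gradients and the norm before clipping,
    which is the value one would typically log at every training step.
    """
    norm = global_grad_norm(grads)
    if norm <= max_norm or norm == 0.0:
        return [list(param) for param in grads], norm
    scale = max_norm / norm
    return [[g * scale for g in param] for param in grads], norm


grads = [[3.0, 0.0], [0.0, 4.0]]  # hypothetical gradients of two parameters
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
print(norm_before, global_grad_norm(clipped))
```

Logging `norm_before` per step alongside the training loss is enough to spot spikes or a drift toward zero.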

6 months, 3 weeks ago @ neptune.ai
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning [Paper Reflection]

Unstructured pruning removes individual weights, while structured pruning removes entire model components.

In the context of MoEs, as expert structures from training MoEs correspond to such patterns, pruning experts is a natural fit for structured pruning.

Thus, structured pruning does not significantly decrease kurtosis, leaving plenty of margin for unstructured pruning.

Since structured pruning primarily reduces architectural redundancy rather than reshaping the underlying weight distribution, our two-phase approach—leveraging unstructured pruning after structured pruning—outperforms unstructured-only pruning.

Since STUN does not make any assumption about base MoE models, it is generaliza…

7 months, 3 weeks ago @ neptune.ai
Evaluating RAG Pipelines

Dimensions of RAG evaluation: Evaluating a RAG pipeline means assessing its behavior across three dimensions: 1.

The evaluation of the RAG pipeline is a multi-step process, starting with creating an evaluation dataset, then evaluating the individual components (retriever, generator, etc.

Curating an evaluation dataset: The first step in the RAG evaluation process is the creation of a ground truth dataset.

MAP considers both the presence and rank of relevant chunks but fails to consider the relative position of relevant chunks.
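For reference, a short sketch of how mean average precision (MAP) is commonly computed over ranked retrieval results. It assumes binary relevance labels and normalizes each query by the number of relevant chunks that were actually retrieved; some definitions divide by the total number of relevant chunks in the corpus instead. The function names and example data are hypothetical.

```python
def average_precision(relevance):
    """AP for one query; `relevance` is a ranked list of booleans (True = relevant chunk)."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of the per-query average precision values."""
    if not queries:
        raise ValueError("need at least one query")
    return sum(average_precision(r) for r in queries) / len(queries)


queries = [
    [True, False, True],  # AP = (1/1 + 2/3) / 2
    [False, True],        # AP = (1/2) / 1
]
print(mean_average_precision(queries))
```

Because each hit is weighted by the precision at its rank, moving a relevant chunk higher in the list raises the score.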

However, not all retrieved chunks are equally relevant and sometimes, the most relevant chunks might not b…

8 months, 2 weeks ago @ neptune.ai
▶️ YouTube
Yannic Kilcher
last post 4 weeks ago
Traditional X-Mas Stream

Letsgooo

4 weeks ago @ youtube.com
Traditional Holiday Live Stream

https://ykilcher.com/discord Links:

TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://discord.gg/4H8xxDF

BitChute: https://www.bitchute.com/channel/yannic-kilcher

Minds: https://www.minds.com/ykilcher

Parler: https://parler.com/profile/YannicKilcher

LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/

BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https:/…

4 weeks ago @ youtube.com
TiDAR: Think in Diffusion, Talk in Autoregression (Paper Analysis)

Paper: https://arxiv.org/abs/2511.08923 Abstract:

Diffusion language models hold the promise of fast parallel generation, while autoregressive (AR) models typically excel in quality due to their causal structure aligning naturally with language modeling. This raises a fundamental question: can we achieve a synergy with high throughput, higher GPU utilization, and AR level quality? Existing methods fail to effectively balance these two aspects, either prioritizing AR using a weaker model for sequential drafting (speculative decoding), leading to lower drafting efficiency, or using some form of left-to-right (AR-like) decoding logic for diffusion, which still suffers from quality degradation …

4 weeks, 1 day ago @ youtube.com
Titans: Learning to Memorize at Test Time (Paper Analysis)

Paper: https://arxiv.org/abs/2501.00663 Abstract:

Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a new neural long-term memory module that learns to memorize historical context and helps attention to attend to the current context while utilizing long past information. We sh…

1 month, 1 week ago @ youtube.com
[Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff)

https://arxiv.org/abs/2510.17558 Abstract:

We propose an extension of the decoder Transformer that conditions its generative process on random latent variables which are learned without supervision thanks to a variational procedure. Experimental evaluations show that allowing such a conditioning translates into substantial improvements on downstream tasks. Author: François Fleuret Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the con…

2 months, 3 weeks ago @ youtube.com
[Video Response] What Cloudflare's code mode misses about MCP and tool calling

Theo's Video: https://www.youtube.com/watch?v=bAYZjVAodoo

Cloudflare article: https://blog.cloudflare.com/code-mode/ Links:

Homepage: https://ykilcher.com

Merch: https://ykilcher.com/merch

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://ykilcher.com/discord

LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https://www.subscribestar.com/yannickilcher

Patreon: https://www.patreon.com/yannickilcher

Bitcoin (BTC): bc1q49lsw3q325tr58ygf8…

3 months, 1 week ago @ youtube.com
[Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant)

Paper: https://arxiv.org/abs/2508.21038 Abstract:

Vector embeddings have been tasked with an ever-increasing set of retrieval tasks over the years, with a nascent rise in using them for reasoning, instruction-following, coding, and more. These new benchmarks push embeddings to work for any query and any notion of relevance that could be given. While prior works have pointed out theoretical limitations of vector embeddings, there is a common assumption that these difficulties are exclusively due to unrealistic queries, and those that are not can be overcome with better training data and larger models. In this work, we demonstrate that we may encounter these theoretical limitations in realist…

3 months, 2 weeks ago @ youtube.com
AGI is not coming!

Jack Morris's investigation into GPT-OSS training data https://x.com/jxmnop/status/1953899426075816164?t=3YRhVQDwQLk2gouTSACoqA&s=09

5 months, 2 weeks ago @ youtube.com
Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

Paper: https://research.trychroma.com/context-rot Abstract:

Large Language Models (LLMs) are typically presumed to process context uniformly—that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks.

In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows. Authors: Kelly Hong, Anton Troynikov, Jeff Huber Links:

6 months ago @ youtube.com
Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)

Paper: https://arxiv.org/abs/2507.02092

Code: https://github.com/alexiglad/EBT

Website: https://energy-based-transformers.github.io/ Abstract:

Inference-time computation techniques, analogous to human System 2 Thinking, have recently become popular for improving model performances. However, most existing approaches suffer from several limitations: they are modality-specific (e.g., working only in text), problem-specific (e.g., verifiable domains like math and coding), or require additional supervision/training on top of unsupervised pretraining (e.g., verifiers or verifiable rewards). In this paper, we ask the question "Is it possible to generalize these System 2 Thinking approaches, and de…

6 months, 1 week ago @ youtube.com
On the Biology of a Large Language Model (Part 2)

An in-depth look at Anthropic's Transformer Circuit Blog Post

Part 1 here: https://youtu.be/mU3g2YPKlsA

Discord here: https://ykilcher.com/discord https://transformer-circuits.pub/2025/attribution-graphs/biology.html Abstract:

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology. Authors:

Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*,

David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton,

Trenton Bricken, Callum McDougall◊, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy …

8 months, 3 weeks ago @ youtube.com
On the Biology of a Large Language Model (Part 1)

An in-depth look at Anthropic's Transformer Circuit Blog Post https://transformer-circuits.pub/2025/attribution-graphs/biology.html Abstract:

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology. Authors:

Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*,

David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton,

Trenton Bricken, Callum McDougall◊, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson,

Sam Zimmerman, Kelley Rivoire, Thom…

9 months, 3 weeks ago @ youtube.com
Henry AI Labs
last post None
3blue1brown
last post 1 week, 2 days ago
The ladybug clock puzzle

This is the first in a set of monthly puzzles, curated by Peter Winkler. This one was originally suggested by Richard Stanley. You can sign up to hear his description of the answer at http://momath.org/mindbenders

1 week, 2 days ago @ youtube.com
The most absurd product I've made

Because why not make a pi creature neck pillow?

Available at 3b1b.co/store

2 months ago @ youtube.com
How Laplace transforms solve differential equations

Studying the forced harmonic oscillator by taking a Laplace transform and studying its poles.

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Chapter on the Laplace Transform:

https://youtu.be/j0wJBEZdwLs Chapter on the S-plane and Simple Harmonic Motion:

https://youtu.be/-j8PzkZ70Lg Timestamps:

0:00 - Opening puzzle

1:06 - Key properties of a Laplace Transform

3:29 - Qualitative analysis with Laplace Transforms

4:29 - The Laplace Transforms of a Derivative

6:06 - The forced oscillator

11:59 - Intuition from the transformed solution

1…

2 months, 3 weeks ago @ youtube.com
The dynamics of e^(πi)

A fuller version of this explanation, also including the reason we care about complex exponents in the first place: https://youtu.be/-j8PzkZ70Lg

3 months, 2 weeks ago @ youtube.com
But what is a Laplace Transform?

Visualizing the most important tool for differential equations.

Previous chapter: https://youtu.be/-j8PzkZ70Lg

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Artwork by Kurt Bruns Engine animation borrowed with permission from this (excellent) blog: https://ciechanow.ski/internal-combustion-engine/ Timestamps:

0:00 - Understanding the engine

1:16 - Key background ideas

5:41 - Definition and intuition

10:43 - Complex integration

20:43 - Analytic continuation

23:52 - The transform of exponentials

26:15 - A deep look at cos(t)

32:59 - W…

3 months, 2 weeks ago @ youtube.com
The dynamics of e^(πi)

A fuller version of this explanation, also including the reason we care about complex exponents in the first place: https://youtu.be/-j8PzkZ70Lg

3 months, 2 weeks ago @ youtube.com
Why complex exponents matter | Laplace Transform Prelude

How dynamics explain Euler's formula, and vice versa.

Early view of the Laplace Transform video: https://www.patreon.com/posts/laplace-early-140428165

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com Timestamps:

0:00 - Intro

1:51 - Euler's formula explained dynamically

9:27 - The harmonic oscillator

21:08 - General linear equations

22:47 - Motivating the Laplace Transform ------------------ These animations are largely made using a custom Python library, manim. See the FAQ comments here:

https://3b1b.co/faq#manim Music by Vincent Rubin…

3 months, 3 weeks ago @ youtube.com
Why ruler and compass? | Guest video by @bensyversen

What role were ruler and compass constructions really serving?

Check out Ben's channel: @bensyversen Interview with the author of this video: https://youtu.be/VohYM99j8e0

Supporters get early views of new videos: https://3b1b.co/support Written, produced, edited, and animated by Ben Syversen

Additional editing: Jack Saxon

3d Blender model: Jan-Hendrik Müller

Additional Blender help: Thibaut Modrzyk (@Deepia)

Illustrations: Alex Zepherin/DonDada Studio

Drums: Jeremy Gustin

Additional music from Epidemic Sound Special thanks to Viktor Blåsjö: https://intellectualmathematics.com/opinionated-history-of-mathematics/ References/Recommended reading: Euclid’s Elements:

Visual edition of Book 1: htt…

4 months, 1 week ago @ youtube.com
Incomplete open cubes

Full video: https://youtu.be/_BrFKp-U8GI

4 months, 2 weeks ago @ youtube.com
Exploration & Epiphany

Sol Lewitt's "Incomplete Open Cubes" and rediscovering Burnside's lemma in group theory

This is a guest video by Paul Dancstep: https://youtu.be/JEeM2ABUMoo

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to share the videos.

Home page: https://www.3blue1brown.com Thanks to the Wadsworth Atheneum for granting permission to use LeWitt's notebooks. Talks by Paul you can find online: What is Category Theory:

https://www.youtube.com/watch?app=desktop&v=eXBwU9ieLL0 How to Predict Eclipses:

https://www.exploratorium.edu/eclipse/video/how-predict-eclipses Theo Jansen's Strandbeests

https://www.youtube.com/w…

4 months, 2 weeks ago @ youtube.com
Simulating Phase Change | Guest video by Vilas Winstein

Deriving the Boltzmann formula, defining temperature, and simulating liquid/vapor.

@SpectralCollective has the second part: https://youtu.be/yEcysu5xZH0

You can play with a simulation of this model here: https://vilas.us/simulations/liquidvapor/

These lessons are funded directly by viewers: https://3b1b.co/support

Home page: https://www.3blue1brown.com Notes from Vilas:

1) This open problem is to prove the ergodicity of the deterministic dynamical systems that are used to model the molecule-level physics. A good example of such a dynamical system is the box with particles evolving according to Newton's laws with elastic collisions, like in the video. 2) This video assumes that all probabili…

5 months ago @ youtube.com
How AI connects text and images

From this guest video by @WelchLabsVideo on how diffusion models work: https://youtu.be/iv-5mZ_9CPY

5 months, 1 week ago @ youtube.com
The AI that solved IMO Geometry Problems | Guest video by @Aleph0

How AlphaGeometry combines logic and intuition.

Share stories about AI in math research for an upcoming video: https://forms.gle/gr9aZVdUrW5T3yDg9

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

Home page: https://www.3blue1brown.com AlphaGeometry announcement:

https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/ Similar open-source model, Newclid, by Harmonic:

https://harmonic.fun/news#blog-post-geometry Timestamps:

0:00 - What's surprising

1:33 - Solve without AI

7:10 - Where AI comes in

12:48 - Grant's comments ------------------…

5 months, 1 week ago @ youtube.com
But how do AI videos actually work? | Guest video by @WelchLabsVideo

Diffusion models, CLIP, and the math of turning text into images

Welch Labs Book: https://www.welchlabs.com/resources/imaginary-numbers-book Sections

0:00 - Intro

3:37 - CLIP

6:25 - Shared Embedding Space

8:16 - Diffusion Models & DDPM

11:44 - Learning Vector Fields

22:00 - DDIM

25:25 - Dall E 2

26:37 - Conditioning

30:02 - Guidance

33:39 - Negative Prompts

34:27 - Outro

35:32 - About guest videos + Grant’s Reaction Special Thanks to:

Jonathan Ho - Jonathan is the Author of the DDPM paper and the Classifier Free Guidance Paper.

https://arxiv.org/pdf/2006.11239

https://arxiv.org/pdf/2207.12598 Preetum Nakkiran - Preetum has an excellent introductory diffusion tutorial:

https://arxiv.org/pdf/24…

6 months ago @ youtube.com
Summer of Math Exposition #4 | Teachers, I'd love to hear from you

Make a math explainer, get feedback, and receive prizes: https://some.3b1b.co

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos.

------------------

These animations are largely made using a custom Python library, manim. See the FAQ comments here:

https://3b1b.co/faq#manim

https://github.com/3b1b/manim

https://github.com/ManimCommunity/manim/

All code for specific videos is visible here:

https://github.com/3b1b/videos/

The music is by Vincent Rubinetti.

https://www.vincentrubinetti.com

https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown

https://open.spotify.com/…

8 months, 3 weeks ago @ youtube.com
Two Minute Papers
latest post 10 hours ago
Scientists Just Solved The Hardest Problem in Granular Physics

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://visualcomputing.ist.ac.at/publications/2025/HomogenizedSand/ Previous Disney grains paper:

https://la.disneyresearch.com/publication/multi-scale-modeling-and-rendering-of-granular-materials/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Steef, Taras Bobrovytsky, Tazaur Sagen…

10 hours ago @ youtube.com
This Fluid Simulation Should Not Be Possible

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers 📝 The paper "Fast Octree Neighborhood Search for SPH Simulations" is available here:

https://andreaslongva.com/pdf/2022-SA-NeighborhoodSearch-compressed.pdf

https://animation.rwth-aachen.de/media/papers/79/2022-SA-NeighborhoodSearch.pdf Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Steef, Taras Bobrovytsky, …

1 week ago @ youtube.com
Wrinkles Are Weirder Than We Ever Thought

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://wanghmin.github.io/publication/zhang-2025-pie/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Ryan Stankye, Steef, Taras Bobrovytsky, Tazaur Sagenclaw, Tybie Fitzhugh, Ueli Gallizzi My research: https://cg.tuwien.ac.at/~zsolnai/

1 week, 6 days ago @ youtube.com
Why Game Physics Is Falling Apart (And How To Fix It)

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The paper is available here:

https://graphics.cs.utah.edu/research/projects/stable-cosserat-rods/ Our Patreon if you wish to support us: https://www.patreon.com/TwoMinutePapers

Note that just watching the series and leaving a kind comment every now and then is as much support as any of us could ever ask for! Sources:

https://www.youtube.com/watch?v=kO3NsSX1VTg

https://www.youtube.com/watch?v=IQZ_zBX6gQY 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Adam Bridges, Benji Rabhan, B Shang, Cameron Navor, Christian Ahlin, Eric T, Fred R, Gordon Child, Juan Benet, Mi…

2 weeks, 4 days ago @ youtube.com
We Just Turned Down Millions of Dollars. Here Is Why.

Yup. My free course on how to write a light simulation program (ray tracing): https://users.cg.tuwien.ac.at/zsolnai/gfx/rendering-course/ 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras Bobrovytsky, Tybie Fitzhugh, Ueli Gallizzi

If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu

3 weeks, 3 days ago @ youtube.com
The Bug That Ruined Game Physics For Decades

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Using DeepSeek on Lambda:

https://lambda.ai/inference-models/deepseek-r1 📝 The paper "A Stream Function Solver for Liquid Simulations" is available here:

https://pub.ista.ac.at/group_wojtan/projects/2015_Ando_ASFSfLS/download/vecpotential.pdf 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Fred R, Gordon …

3 weeks, 4 days ago @ youtube.com
NVIDIA’s AI Learns To Walk…Painfully

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Using DeepSeek on Lambda:

https://lambda.ai/inference-models/deepseek-r1 📝 The paper is available here:

https://research.nvidia.com/labs/toronto-ai/trace-pace/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras B…

1 month ago @ youtube.com
This Is The Physics Tech Games Have Been Waiting For

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers 📝 The paper is available here:

https://wanghmin.github.io/publication/wu-2022-gbm/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras Bobrovytsky, Tybie Fitzhugh, Ueli Gallizzi

If you wish to appear here or …

1 month, 1 week ago @ youtube.com
The AI That Built An Economy… And Went Bankrupt

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Using DeepSeek on Lambda:

https://lambda.ai/inference-models/deepseek-r1 📝 The paper is available here:

https://simworld.org/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Fred R, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras Bobrovytsky, Tybie Fitzhugh, Ueli G…

1 month, 1 week ago @ youtube.com
DeepMind’s Crazy New AI Masters Games That Don’t Exist

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Using DeepSeek on Lambda:

https://lambda.ai/inference-models/deepseek-r1 📝 The SIMA 2 paper is available here:

https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Fred R, Gordon Child, Juan Benet, Michael…

1 month, 2 weeks ago @ youtube.com
AlphaFold - The Most Important AI Breakthrough Ever Made

Full interview: https://www.youtube.com/watch?v=Vhcwjzeukts

1 month, 2 weeks ago @ youtube.com
30x Better Physics: Why Everyone Missed This Genius Solution

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Using DeepSeek on Lambda:

https://lambda.ai/inference-models/deepseek-r1 My hobby channel with guitars and labcoats 🥼:

https://www.youtube.com/watch?v=GjMMhn4pS38

https://www.youtube.com/watch?v=BxS62W6V48E 📝 The paper is available here:

https://arxiv.org/abs/2505.21946 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Chri…

1 month, 2 weeks ago @ youtube.com
He Kinda Solved Biology - Nobel Prize Winner John Jumper Interview

Thank you so much to John for being so kind and insightful, and to the film crew as well - they all did an incredible job. To celebrate the 5th anniversary of #AlphaFold, I was invited by Google DeepMind to interview Nobel Prize Winner and Distinguished Scientist, John Jumper. Note that we have no business ties with them. AlphaFold: https://deepmind.google/science/alphafold/

The full Thinking Game Movie: https://www.youtube.com/watch?v=d95J8yzvjbQ My research: https://cg.tuwien.ac.at/~zsolnai/

Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu

1 month, 3 weeks ago @ youtube.com
Unreal Engine 5.7: Billions Of Triangles, In Real Time

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers 📝 The Unreal Engine 5.7 is available here:

https://www.unrealengine.com/en-US/news/unreal-engine-5-7-is-now-available Sources:

https://www.youtube.com/watch?v=Mj_-2SdsYLw

https://www.youtube.com/watch?v=ngzPTqtZWo4

https://advances.realtimerendering.com/s2023/2023%20Siggraph%20-%20Substrate.pdf 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Be…

2 months ago @ youtube.com
Blender 5.0 Is Here - A Revolution…For Free!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Get Blender 5.0 here: https://www.blender.org/

Example scenes: https://www.blender.org/download/demo-files/

Multiple scattering paper: https://cg.iit.bme.hu/~szirmay/volreuse_link.htm 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, Juan Benet, Michael Tedder, Owen Skarpness, Richard Sundvall…

2 months ago @ youtube.com
DataFest Video
latest post None
JetBrains Research Seminars
latest post None
Yandex. Computer Science
latest post 1 day, 8 hours ago
What Is a PMI Score

This is an excerpt from a talk by Vsevolod Paramonov and Andrey Shevtsov, developers in the pricing analytics group at Yandex Lavka. At Data Fest Siberia, they explained what is under the hood of dynamic pricing in the service. They also covered how the demand and elasticity models are built and which mathematical formulas drive the process. Watch the full video on our channel! #ценообразование, #dynamicpricing, #pricingmodel, #dataanalytics, #datascience, #ML, #математика, #экономика, #аналитика, #бизнесметрики, #спрос, #эластичность, #YandexLavka, #DataFestSiberia, #ITконференция, #datadriven, #Shorts

1 day, 8 hours ago @ youtube.com
What a Pricing Model Looks Like

This is an excerpt from a talk by Vsevolod Paramonov and Andrey Shevtsov, developers in the pricing analytics group at Yandex Lavka. At Data Fest Siberia, they explained what is under the hood of dynamic pricing in the service. They also covered how the demand and elasticity models are built and which mathematical formulas drive the process. Watch the full video on our channel! #ценообразование, #dynamicpricing, #pricingmodel, #dataanalytics, #datascience, #ML, #математика, #экономика, #аналитика, #бизнесметрики, #спрос, #эластичность, #YandexLavka, #DataFestSiberia, #ITконференция, #datadriven, #Shorts

5 days, 5 hours ago @ youtube.com
What Problem Pricing Solves

This is an excerpt from a talk by Vsevolod Paramonov and Andrey Shevtsov, developers in the pricing analytics group at Yandex Lavka. At Data Fest Siberia, they explained what is under the hood of dynamic pricing in the service. They also covered how the demand and elasticity models are built and which mathematical formulas drive the process. Watch the full video on our channel! #ценообразование, #dynamicpricing, #pricingmodel, #dataanalytics, #datascience, #ML, #математика, #экономика, #аналитика, #бизнесметрики, #спрос, #эластичность, #YandexLavka, #DataFestSiberia, #ITконференция, #datadriven, #Shorts

1 week, 1 day ago @ youtube.com
Computer Vision in 2025 / Roman Isachenko

ML Global Recap 2025 is a meetup for the ML community where we covered the year's major international conferences and the most interesting trends in recommender technologies, computer vision, speech recognition, and NLP. Roman Isachenko, head of the image analysis team at Yandex R&D, gave a talk at the event. He spoke about multimodal image analysis (VLMs) and diffusion models for image generation. More ML content at the link: https://t.me/+Ug9D4CjJrJxmZGRi #ML, #MachineLearning, #AI, #DataScience, #MLGlobalRecap2025, #нейросети, #искусственныйинтеллект, #рекомендательныесистемы, #NLP, #ComputerVision, #SpeechRecognition, #LLM, #DeepLearning, #MLтренды, #ITконференция, #Ya…

1 month ago @ youtube.com
Trends in NLP, a Review of ICLR and ACL / Alexander Yushkevich

ML Global Recap 2025 is a meetup for the ML community where we covered the year's major international conferences and the most interesting trends in recommender technologies, computer vision, speech recognition, and NLP. Alexander Yushkevich, head of the team developing base-quality models at Search Services and AI, gave a talk at the event. He showed how conferences reflect trends in NLP: top LLMs are becoming more closed, and demand is growing for alignment & safety, inference, interpretability, and optimization. New benchmarks are appearing too (where would we be without them). More ML content at the link: https://t.me/+Ug9D4CjJrJxmZGRi #ML, #MachineLearning, #AI, #DataScience, #MLGlobalRecap2025, #не…

1 month ago @ youtube.com
Speech Technologies at Interspeech and ICASSP 2025 / Boris Sheludko

ML Global Recap 2025 is a meetup for the ML community where we covered the year's major international conferences and the most interesting trends in recommender technologies, computer vision, speech recognition, and NLP. Boris Sheludko, head of the audio quality team at Yandex R&D, gave a talk at the event. He summarized the two main international conferences on speech technologies: Interspeech in the Netherlands and ICASSP in India. More ML content at the link: https://t.me/+Ug9D4CjJrJxmZGRi #ML, #MachineLearning, #AI, #DataScience, #MLGlobalRecap2025, #нейросети, #искусственныйинтеллект, #рекомендательныесистемы, #NLP, #ComputerVision, #SpeechRecognition, #LLM, #DeepLearning, …

1 month ago @ youtube.com
Key Trends in Recommender Systems / Nikolay Savushkin

ML Global Recap 2025 is a meetup for the ML community where we covered the year's major international conferences and the most interesting trends in recommender technologies, computer vision, speech recognition, and NLP. Nikolay Savushkin, head of the recommender technologies team at Yandex R&D, gave a talk at the event. He shared insights from CIKM and RecSys and discussed key trends in recommender systems: foundation and End2End models, scaling, multimodality, attention-based ranking, and more. More ML content at the link: https://t.me/+Ug9D4CjJrJxmZGRi #ML, #MachineLearning, #AI, #DataScience, #MLGlobalRecap2025, #нейросети, #искусственныйинтеллект, #ре…

1 month ago @ youtube.com
Opening of ML Global Recap 2025 / Alexey Gusakov

ML Global Recap 2025 is a meetup for the ML community where we covered the year's major international conferences and the most interesting trends in recommender technologies, computer vision, speech recognition, and NLP. The event was opened by Alexey Gusakov, CTO of Search Services and AI. The talk includes opening remarks, a brief overview of NeurIPS, and a few words on the benchmark crisis. More ML content at the link: https://t.me/+Ug9D4CjJrJxmZGRi #ML, #MachineLearning, #AI, #DataScience, #MLGlobalRecap2025, #нейросети, #искусственныйинтеллект, #рекомендательныесистемы, #NLP, #ComputerVision, #SpeechRecognition, #LLM, #DeepLearning, #MLтренды, #ITконференция, #Yandex, #MLсообщество

1 month ago @ youtube.com
How to Solve the LLM Response Diversity Problem

This is an excerpt from a talk by Alexey Kolesov, CTO at Yandex R&D. At Practical ML Conf 2025, he explained how the team taught YandexGPT 5.1 to better remember facts and apply its knowledge of them. He also showed how we got online RL working stably. The full recording is already on the channel! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

1 month, 3 weeks ago @ youtube.com
ML Global Recap'25

A Yandex meetup for the ML community where we will cover six international conferences and the main trends in recommender technologies, computer vision, speech recognition technologies, and NLP.

1 month, 3 weeks ago @ youtube.com
A Section Testing the Basic Technical Skills of ML Engineers

Mock interview: how the basic algorithms section for ML engineers works.

Details: https://yandex.ru/jobs/interview/mldev

Vacancies: https://yandex.ru/jobs/vacancies?professions=ml-developer

Subscribe to Yandex's Telegram channel for the ML community: https://t.me/+Ug9D4CjJrJxmZGRi #ml, #machinelearning, #mlengineer, #mlinterview, #datascience, #яндекс, #yandex, #itcareer, #mldeveloper, #techinterview, #algorithms, #машиннообучение, #вакансии

1 month, 3 weeks ago @ youtube.com
What Kinds of LLM Augmentation Exist

This is an excerpt from a talk by Alexey Kolesov, CTO at Yandex R&D. At Practical ML Conf 2025, he explained how the team taught YandexGPT 5.1 to better remember facts and apply its knowledge of them. He also showed how we got online RL working stably. The full recording is already on the channel! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

1 month, 3 weeks ago @ youtube.com
How an LLM Predicts Tokens

This is an excerpt from a talk by Alexey Kolesov, CTO at Yandex R&D. At Practical ML Conf 2025, he explained how the team taught YandexGPT 5.1 to better remember facts and apply its knowledge of them. He also showed how we got online RL working stably. The full recording is already on the channel! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

2 months ago @ youtube.com
Real Madrid or Barcelona: Which Is Better According to an LLM

This is an excerpt from a talk by Alexey Kolesov, CTO at Yandex R&D. At Practical ML Conf 2025, he explained how the team taught YandexGPT 5.1 to better remember facts and apply its knowledge of them. He also showed how we got online RL working stably. The full recording is already on the channel! #YandexGPT #LLM #AI #MachineLearning #GenerativeAI #YandexForML #YandexForDevelopers #Яндекс #AIDevDay #NeuralNetworks #ArtificialIntelligence #Football #Soccer #RealMadrid #Barcelona #ElClasico #LaLiga #Messi #Ronaldo #TechAndSports #AIInSports #FootballFans #SportsAnalytics #YandexTech #AIComparison

2 months ago @ youtube.com
What an AI Assistant Should Know

This is an excerpt from a talk by Alexey Kolesov, CTO at Yandex R&D. At Practical ML Conf 2025, he explained how the team taught YandexGPT 5.1 to better remember facts and apply its knowledge of them. He also showed how we got online RL working stably. The full recording is already on the channel! #YandexGPT #LLM #AI #ArtificialIntelligence #MachineLearning #DeepLearning #ReinforcementLearning #OnlineRL #NLP #GenerativeAI #YandexForDevelopers #YandexForML #Яндекс #AIDevDay #TechConference #DataScience #ML #AIResearch #LanguageModel #YandexTech

2 months, 1 week ago @ youtube.com
ML Trainings
latest post 3 hours ago
Captain's Bridge No. 3: Neural Slop from Sutskever | Google Is No Longer Needed | Toilets and AI

0:00:00 Start

0:00:35 Neural slop from Sutskever

0:05:40 AI smarter than humanity

0:27:17 Musk deceived on a colossal scale

0:35:16 12 months until the end of programmers

0:48:01 OpenAI in sheep's clothing

0:52:12 Zhipu couldn't do it

0:57:17 Google is no longer needed

1:01:42 Nuclear H200s

1:17:35 Self-driving cars on the roads as soon as tomorrow

1:20:25 Toilets and AI

AI summary:

This conversation covers the latest news in artificial intelligence, including Elon Musk's statements about the future of AI, as well as philosophical and practical aspects of intelligence. The participants share their views on how we can evaluate AI and its achievements, and give examples of successful algorithms that have surpassed human abilities. In this conversation …

3 hours ago @ youtube.com
Yandex Rovers Are Cheaper Than Couriers but Can't Climb Stairs
1 day, 12 hours ago @ youtube.com
Robots and Elevators: How China Solves the Problem of Getting Robots to the Door
2 days, 12 hours ago @ youtube.com
Difference in Mentality: The US and China in the Example of Smar
3 days, 12 hours ago @ youtube.com
The Culture of Silence and Winning in the US and China
4 days, 12 hours ago @ youtube.com
The Chinese Say They Can't Overtake the US
5 days, 12 hours ago @ youtube.com
Kazakhstan Banned Deepfakes, and It Sparked Debate
6 days, 12 hours ago @ youtube.com
Captain's Bridge No. 2: Microsoft's GPUs Gather Dust | Zuckerberg Bought a Nuclear Power Plant | Musk vs. Universities

0:00:00 Start

0:01:17 Alice picks products

0:08:19 Microsoft's GPUs gather dust

0:16:41 Zuckerberg bought a nuclear power plant

0:23:46 Data centers instead of tractors

0:30:07 The Chinese tiger lies in wait

0:37:33 The US bans clouds

0:40:17 The Ministry of Justice allowed AI

0:42:26 Muslims against deepfakes

0:47:11 OpenAI and the financial hole

0:51:19 OpenAI and its prospects

0:53:37 Robots are better than couriers

0:59:08 State corporations without people

1:03:57 Elon Musk vs. universities

AI summary: AI and data are reshaping the market: Yandex now has product search via Alice, Google is working on the Aluminum OS (a merger of Android and Chrome), and platforms are adapting to new scenarios. Geopolitics and infrastructure intersect: a ban on renting GPUs to Chinese com…

1 week ago @ youtube.com
Robots Will Replace Migrants in the Labor Market Within Three Years
1 week, 2 days ago @ youtube.com
Programming and Thinking: What Models Can Do
1 week, 3 days ago @ youtube.com
The Autumn Sobering-Up and Artificial Intelligence
1 week, 4 days ago @ youtube.com
Neural Nets and Taste: How to Create Designs with AI
1 week, 5 days ago @ youtube.com
ejbot: $690 Robots from China
1 week, 6 days ago @ youtube.com
Dell Drops Its AI Positioning
2 weeks ago @ youtube.com
Captain's Bridge No. 1: Chinese Photons Are Faster | Robots at a Children's Matinee | AI PCs Aren't Needed

0:00:00 Start

0:00:37 Ads in ChatGPT

0:08:28 Manus and Zuckerberg

0:14:52 Chinese photons are faster

0:27:24 New TPU from Google

0:36:03 The accelerator market according to Epoch AI

0:39:52 A gaming GPU from the Chinese

0:44:09 A million TPUs for Anthropic

0:47:39 Doubao is the biggest

0:50:31 Robots at a children's matinee

0:55:49 "Bless you" from ChatGPT

1:02:09 AI PCs aren't needed

1:10:15 There will be no automation of coding

1:18:57 Neural networks make people knit

AI summary: This episode discusses forecasts for the new year, advertising in ChatGPT, the acquisition of the startup Manus and the future of photonic chips, as well as Google's new TPU. The discussion covered key aspects of the chip market, including market share, risk hedging, and …

2 weeks ago @ youtube.com
Primer
latest post 3 weeks, 4 days ago
Taking AI Doom Seriously For 62 Minutes

Patreon: https://www.patreon.com/primerlearning

80,000 Hours: 80000hours.org/primer

https://www.desmos.com/calculator/a5pfjtr4tr

Other connections:

Discord: https://discord.gg/NbruaNW

Twitch: https://www.twitch.tv/justin_helps

Store: https://store.dftba.com/collections/primer

Reddit: https://www.reddit.com/r/primerlearning/

Bsky: https://bsky.app/profile/justinhelps.bsky.social

Twitter: https://twitter.com/primerlearning

Links to other resources:

https://yoshuabengio.org/2024/07/09/reasoning-through-arguments-against-taking-ai-safety-seriously/

https://www.youtube.com/c/robertmilesai

https://www.youtube.com/@Siliconversations

https://www.youtube.com/@Go-Meta

https://www.youtube.com/@Dwarkes…

3 weeks, 4 days ago @ youtube.com
Simulating a single brain cell

Patreon: https://www.patreon.com/primerlearning

Helpful resources if you want to learn more about neural networks:

https://www.youtube.com/@AndrejKarpathy

https://course.fast.ai/

https://www.youtube.com/@WelchLabsVideo

https://www.youtube.com/@3blue1brown

Early papers. These probably aren't helpful for understanding the concepts in this video, but they may be worth a look if you're interested in the history.

The Perceptron – A perceiving and recognizing automaton: https://bpb-us-e2.wpmucdn.com/websites.umass.edu/dist/a/27637/files/2016/03/rosenblatt-1957.pdf

The Perceptron: A probabilistic model for information storage and organization in the brain: https://www.ling.upenn.edu/courses/cogs501/Rosenblatt1958.pdf A Logical…

4 months ago @ youtube.com
🎧 Podcasts
Lex Fridman AI Podcast
latest post 1 week, 4 days ago
#489 – Paul Rosolie: Uncontacted Tribes in the Amazon Jungle

Paul Rosolie is a naturalist, explorer, author of a new book titled Junglekeeper, and is someone who has dedicated his life to protecting the Amazon rainforest.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep489-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://perplexity.ai/

BetterHelp: Online therapy and counseling.

Go to https://fin.ai/lex

Miro: Online collaborative whiteboard platform.

Go to https://miro.com/

MasterClass: Online classes from world-class experts.

1 week, 4 days назад @ lexfridman.com
#488 – Infinity, Paradoxes that Broke Mathematics, Gödel Incompleteness & the Multiverse – Joel David Hamkins
#488 – Infinity, Paradoxes that Broke Mathematics, Gödel Incompleteness & the Multiverse – Joel David Hamkins #488 – Infinity, Paradoxes that Broke Mathematics, Gödel Incompleteness & the Multiverse – Joel David Hamkins

Joel David Hamkins is a mathematician and philosopher specializing in set theory, the foundations of mathematics, and the nature of infinity, and he’s the #1 highest-rated user on MathOverflow.

He is also the author of several books, including Proof and the Art of Mathematics and Lectures on the Philosophy of Mathematics.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep488-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://masterclass.com/lexpod

OUTLINE:
(00:00) – Introduction
(01:58) – Sponsors, Comments, and Reflections
(15:40) – Infinity & paradoxes
(1:02:50) – Russell’s paradox
(1:15:57) – Gödel’s…

3 weeks, 3 days ago @ lexfridman.com
#487 – Irving Finkel: Deciphering Secrets of Ancient Civilizations & Flood Myths

Irving Finkel is a scholar of ancient languages and a longtime curator at the British Museum, renowned for his expertise in Mesopotamian history and cuneiform writing.

He specializes in reading and interpreting cuneiform inscriptions, including tablets from Sumerian, Akkadian, Babylonian, and Assyrian contexts.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep487-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lex

Miro: Online collaborative whiteboard platform.

Go to https://miro.com/

Chevron: Reliable energy for data centers.

1 month, 1 week ago @ lexfridman.com
#486 – Michael Levin: Hidden Reality of Alien Intelligence & Biological Life

Michael Levin is a biologist at Tufts University working on novel ways to understand and control complex pattern formation in biological systems.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep486-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lex

Miro: Online collaborative whiteboard platform.

Go to https://miro.com/

MasterClass: Online classes from world-class experts.

(2:42:41) – Mind uploading
(3:01:22) – Alien intelligence
(3:16:17) – Advice for young people
(3:22:46) – Questions for AGI

1 month, 3 weeks ago @ lexfridman.com
#485 – David Kirtley: Nuclear Fusion, Plasma Physics, and the Future of Energy

David Kirtley is a nuclear fusion engineer and CEO of Helion Energy, a company working on building the world's first commercial fusion power plant by 2028.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep485-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Transcript: https://lexfridman.com/david-kirtley-transcript

CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: https://lexfridman.com/contact

EPISODE LINKS:

David's X: htt…

2 months, 1 week ago @ lexfridman.com
#484 – Dan Houser: GTA, Red Dead Redemption, Rockstar, Absurd & Future of Gaming

Dan Houser is co-founder of Rockstar Games and is a legendary creative mind behind Grand Theft Auto (GTA) and Red Dead Redemption series of video games.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep484-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://box.com/ai

UPLIFT Desk: Standing desks and office ergonomics.

Go to https://drinkLMNT.com/lex

OUTLINE:
(00:00) – Introduction
(01:29) – Sponsors, Comments, and Reflections
(11:32) – Greatest films of all time
(23:45) – Making video games
(26:36) – GTA 3
(29:55) – Open world video games
(32:42) – Character creation
(36:09) – Superintelligent AI in A Bette…

2 months, 3 weeks ago @ lexfridman.com
#483 – Julia Shaw: Criminal Psychology of Murder, Serial Killers, Memory & Sex

Julia Shaw is a criminal psychologist and author who in her books explores human nature, including psychopathy, violent crime, the psychology of evil, police interrogation, false memory manipulation, deception detection, and human sexuality.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep483-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lex

BetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lex

LMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lex

AG1: All-in-one daily nutrition drink.

3 months, 1 week ago @ lexfridman.com
#482 – Pavel Durov: Telegram, Freedom, Censorship, Money, Power & Human Nature

Pavel Durov is the founder and CEO of Telegram.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep482-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Transcript: https://lexfridman.com/pavel-durov-transcript

CONTACT LEX:

Feedback – give feedback to Lex: https://lexfridman.com/survey

AMA – submit questions, videos or call-in: https://lexfridman.com/ama

Hiring – join our team: https://lexfridman.com/hiring

Other – other ways to get in touch: https://lexfridman.com/contact

EPISODE LINKS:

Pavel’s Telegram: https://t.me/durov

Pavel’s X: https://x.com/durov

Telegram: https://telegram.org/

Telegram Contests: https://contest.c…

3 months, 3 weeks ago @ lexfridman.com
#481 – Norman Ohler: Hitler, Nazis, Drugs, WW2, Blitzkrieg, LSD, MKUltra & CIA

Norman Ohler is a historian and author of “Blitzed: Drugs in the Third Reich,” a book that investigates the role of psychoactive drugs, particularly stimulants such as methamphetamine, in the military history of World War II.

It is a book that two legendary historians, Ian Kershaw and Antony Beevor, have praised highly for its depth of research.

Norman also wrote “Tripped: Nazi Germany, the CIA, and the Dawn of the Psychedelic Age”, and he is working on a new book “Stoned Sapiens” looking at the history of human civilization through the lens of drugs.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep481-sc

See below for timestamps, transcript, and to give f…

4 months, 1 week ago @ lexfridman.com
#480 – Dave Hone: T-Rex, Dinosaurs, Extinction, Evolution, and Jurassic Park

Dave Hone is a paleontologist, expert on dinosaurs, co-host of the Terrible Lizards podcast, and author of numerous scientific papers and books on the behavior and ecology of dinosaurs.

He lectures at Queen Mary University of London on topics of Ecology, Zoology, Biology, and Evolution.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep480-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://go.lindy.ai/lex

BetterHelp: Online therapy and counseling.

Go to https://shopify.com/lex

LMNT: Zero-sugar electrolyte drink mix.

4 months, 3 weeks ago @ lexfridman.com
#479 – Dave Plummer: Programming, Autism, and Old-School Microsoft Stories

Dave Plummer is a programmer, former Microsoft software engineer (Windows 95, NT, XP), creator of Task Manager, author of two books on autism, and host of the Dave’s Garage YouTube channel, where he shares stories from his career, insights on software development, and deep dives into technology.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep479-sc

See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://upliftdesk.com/lex

ZocDoc: App that helps patients find healthcare providers.

Go to https://zocdoc.com/lex

Fin: AI agent for customer service.

Go to https://fin.ai/lex

Allio Capital: AI-powered investment app that use…

4 months, 4 weeks ago @ lexfridman.com
#478 – Scott Horton: The Case Against War and the Military Industrial Complex

Scott Horton is the director of the Libertarian Institute, editorial director of Antiwar.com, host of The Scott Horton Show, co-host of Provoked, and for the past three decades a staunch critic of U.S. military interventionism.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep478-sc

See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://alliocapital.com/

Hampton: Community for high-growth founders and CEOs.

Go to https://joinhampton.com/lex

BetterHelp: Online therapy and counseling.

Go to https://drinkag1.com/lex

OUTLINE:
(00:00) – Introduction
(00:35) – Sponsors, Comments, and Reflections
(09:14) – From the Cold War to …

5 months ago @ lexfridman.com
#477 – Keyu Jin: China’s Economy, Tariffs, Trade, Trump, Communism & Capitalism

Keyu Jin is an economist specializing in China’s economy, international macroeconomics, global trade imbalances, and financial policy.

She is the author of The New China Playbook: Beyond Socialism and Capitalism.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep477-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://alliocapital.com/

UPLIFT Desk: Standing desks and office ergonomics.

Go to https://upliftdesk.com/lex

Hampton: Community for high-growth founders and CEOs.

5 months, 2 weeks ago @ lexfridman.com
#476 – Jack Weatherford: Genghis Khan and the Mongol Empire

Jack Weatherford is an anthropologist and historian specializing in Genghis Khan and the Mongol Empire.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep476-sc

See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://alliocapital.com/

ZocDoc: App that helps patients find healthcare providers.

Go to https://zocdoc.com/lex

Fin: AI agent for customer service.

Go to https://shopify.com/lex

MasterClass: Online classes from world-class experts.

5 months, 3 weeks ago @ lexfridman.com
#475 – Demis Hassabis: Future of AI, Simulating Reality, Physics and Video Games

Demis Hassabis is the CEO of Google DeepMind and Nobel Prize winner for his groundbreaking work in protein structure prediction using AI.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep475-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://joinhampton.com/lex

Fin: AI agent for customer service.

Go to https://shopify.com/lex

LMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lex

AG1: All-in-one daily nutrition drink.

6 months ago @ lexfridman.com
Microsoft Research Podcast
latest post 1 month, 3 weeks ago
Ideas: Community building, machine learning, and the future of AI

This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS.

In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

So around that time when I started my PhD at Penn, I was working in machine learning theory and algorithmic economics.

How had you experienced a lack of community or network of women in machine learning before the founding of WiML?

So particularly when working on topics related to fairness, I’ve ended up focusing a bunch on stuff to do with marginalized groups as part of my responsible AI work.

1 month, 3 weeks ago @ microsoft.com
Ideas: More AI-resilient biosecurity with the Paraphrase Project

Today, I’m excited to talk about the Paraphrase Project, an effort I co-led exploring how advances in AI tools for protein design might impact biosecurity.

These “patches,” akin to those in cybersecurity, have now been shared with organizations globally to strengthen biosecurity screening.

The project highlights that the same AI tools capable of incredible good can also be misused, requiring us to be vigilant, thoughtful, and creative so we continue to get the most benefit out of AI tools while working to ensure that we avoid costly misuses.

So things like, how similar is this to that template, wild-type protein structure that we used as our conditioning information?

But I feel like broadly…

3 months, 3 weeks ago @ microsoft.com
Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education

KOHANE: So I think you’ve “nerd sniped” me because you [LAUGHTER]—which is all too easy—but I think there’s a central issue here.

But I actually think this is dark matter of human organizational technology that is not well understood.

AZEEM AZHAR: We didn’t talk about, you know, AI in its ability to potentially do this, which is to extend the clinician’s presence throughout the week.

And so I think there’s always going to be an opening for either differences of opinion or agreeing with you too much.

And this gets into whether AI is really going to get almost to the ab initio understanding of human biology.

5 months, 1 week ago @ microsoft.com
Reimagining healthcare delivery and public health with AI
5 months, 3 weeks ago @ microsoft.com
Navigating medical education in the era of generative AI

Prior to med school, Daniel pursued experiences that cultivated his interest in the application of AI in medical practice and education.

Really, really looking forward to this chat.

There’s AI before ChatGPT and before, you know, generative AI really became a big thing, and then afterwards.

And then after we talk about what’s really happening, what do you think should happen in medical education given the reality of generative AI?

And I do agree [that] AI really gives us real hope that we can make it true.

6 months ago @ microsoft.com
AI Testing and Evaluation: Reflections

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

We have examples, like the pharmaceutical or medical device industry experts with whom you spoke, that’s really, you know, testing … there is a pre-deployment requirement.

And the third is just how rigid versus adaptive these testing and evaluation regimes or frameworks are in these different domains.

I really agree that there has been a lot of emphasis to date on, sort of, testing models upstream, the AI model evaluation.

You know, I think there’s been real progress already in the AI evaluation and testing ecosystem in the public-private partnership context.

6 months, 1 week ago @ microsoft.com
AI Testing and Evaluation: Learnings from cybersecurity

Absolutely, I really, really was.

As a principal director on the Microsoft AI Red Team, Tori leads all AI security and safety red team operations, as well as dangerous capability testing, to directly inform C-suite decision-makers.

This year, we’ve pulled a lot of those assets and insights into the Azure [AI] Foundry AI Red Teaming Agent.

So you can get a little taste of what we do day to day in the AI Red Teaming Agent.

WESTERHOFF: I think the most important takeaway from those lessons is that AI security is truly a team sport.

6 months, 2 weeks ago @ microsoft.com
How AI will accelerate biomedical research and discovery

Dr. Eric Topol is the executive vice president of the biomedical research non-profit Scripps Research, where he founded and now directs the Scripps Research Translational Institute.

Let’s continue our deep dive on AI and biomedical research with this conversation with Noubar Afeyan:

LEE: Noubar, thanks so much for joining.

And there’s the origin story of contact with AI, you know, before the emergence of generative AI and afterwards.

What is going on today with respect to AI really being used for something meaningful in the design and development of drugs?

TOPOL: You would read about how, you know, data is the new oil and, you know, gold and whatnot.

6 months, 2 weeks ago @ microsoft.com
AI Testing and Evaluation: Learnings from pharmaceuticals and medical devices

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

During the pre-market phase, medical testing establishes baseline safety and effectiveness metrics through bench testing, performance standards, and clinical studies.

SULLIVAN: So medical devices face a pretty prescriptive multi-level testing path before they hit the market.

We are looking into medical devices, as well, obviously, but also other technologies in advanced medical computing.

So we see Phase 3 trials as something that occurs in the medical devices and pharmaceuticals field.

6 months, 3 weeks ago @ microsoft.com
AI Testing and Evaluation: Learnings from genome editing

As generative AI continues to advance, Microsoft has gathered a range of experts—from genome editing to cybersecurity—to share how their fields approach evaluation and risk assessment.

CHARO: Well, you know, genome editing is both very old and very new.

Now the earliest forms of genome editing were very inefficient, and so we didn’t worry that much.

But the bottom-line thing to remember, the way to really think about it is, we don’t regulate genome editing; we regulate the things that use genome editing.

And she said, you know, we don’t regulate genome editing; we regulate the things that use genome editing.

6 months, 4 weeks ago @ microsoft.com
AI Testing and Evaluation: Learnings from Science and Industry

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

And I think, really, there are two reasons why tech is so, kind of, representative of that kind of challenge that I’ve always found fascinating.

Continues to be a really important topic in the AI policy conversation right now, I think, for really good reason.

Testing is an important component for governance and AI and, of course, in all of these other domains, as well.

I think about almost, like, in the near to mid-term, like three issues that we need to address in the AI, kind of, policy and testing context.

7 months ago @ microsoft.com
The AI Revolution in Medicine, Revisited: How AI is reshaping the future of healthcare and medical research

LEE: Yeah, yeah.

It cannot—as, you know, Bill was saying—it cannot learn from your document.

And I don’t know if the two of you remember, but I ended up doing a lot of tests.

I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3-mini.

Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know.

7 months, 2 weeks ago @ microsoft.com
What AI's impact on individuals means for the health workforce and industry

So I don’t think we should be surprised that business schools matter on this because we care about management.

That’s really going to change the way, like, middle school works, was my thinking at the time.

We’ve gone from AI being highly discriminative to AI that’s able to explore the world in particular ways.

The symptoms that they’re showing are quite different, and also their compliance is really, really different.

LEE: Yeah, really, really interesting.

8 months ago @ microsoft.com
Abstracts: Zero-shot models in single-cell biology with Alex Lu

And single-cell foundation models claim to be capable of unraveling deeper insights than ever before.

Basically, we showed that single-cell foundation models perform worse in settings that are fundamental to biological discovery than much simpler machine learning and statistical methods that were used in the field before single-cell foundation models emerged and are the go-to standard for unpacking meaning from these complicated experiments.

And the way to understand this is because single-cell foundation models are trained in a way that tries to expose these models to millions of single-cells.

But let’s also talk about the impact for methodologists, people who are trying to improve these s…

8 months, 1 week ago @ microsoft.com
Abstracts: Aurora with Megan Stanley and Wessel Bruinsma

This is such exciting work about environmental forecasting, so we’re happy to have the two of you join us today.

Mostly because AI weather forecasting models are computationally much more efficient and can even be more accurate.

What’s unfortunate though, about this big step forward, is that these developments are mostly limited to the setting of weather forecasting.

Weather forecasting is very important, obviously, but there are many other important environmental forecasting problems out there, such as air pollution forecasting or ocean wave forecasting.

STANLEY: Current approaches have really focused training very specifically on weather forecasting models.

8 months, 1 week ago @ microsoft.com
NLP Highlights
latest post None
Data Skeptic
latest post 1 month ago
Video Recommendations in Industry

In this episode, Kyle Polich sits down with Cory Zechmann, a content curator working in streaming television with 16 years of experience running the music blog "Silence Nogood." They explore the intersection of human curation and machine learning in content discovery, discussing the concept of "algatorial" curation—where algorithms and editorial expertise work together. Key topics include the cold start problem, why every metric is just a "proxy metric" for what users actually want, the challenge of filter bubbles, and the importance of balancing familiarity with discovery. Cory shares insights on why TikTok's algorithm works so well (clean data and massive interaction volume), the crucial …

1 month ago @ dataskeptic.com
Eye Tracking in Recommender Systems

In this episode, Santiago de Leon takes us deep into the world of eye tracking and its revolutionary applications in recommender systems. As a researcher at the Kempelen Institute and Brno University, Santiago explains the mechanics of eye tracking technology—how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the groundbreaking RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research, which opens new possibilities for understanding how users interact with carousel interfaces like Netflix. Through collaboration between psychologists and AI researchers, Santiago's work demonstrate…

1 month, 1 week ago @ dataskeptic.com
Cracking the Cold Start Problem

In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations. Boya shares insigh…
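
The exploration/exploitation trade-off mentioned in this summary can be illustrated with a minimal epsilon-greedy bandit. This is only a sketch under stated assumptions, not the method discussed in the episode: the class name, the `simulate` helper, the click probabilities, and the `epsilon` value are all hypothetical and chosen for illustration.

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit over a fixed set of items (hypothetical example)."""

    def __init__(self, n_items, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_items    # how often each item was recommended
        self.values = [0.0] * n_items  # running mean reward per item
        self.rng = random.Random(seed)

    def select(self):
        # Explore a random item with probability epsilon,
        # otherwise exploit the item with the best estimate so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, item, reward):
        # Incremental update of the mean reward for the chosen item.
        self.counts[item] += 1
        self.values[item] += (reward - self.values[item]) / self.counts[item]


def simulate(click_probs, steps=5000, epsilon=0.1, seed=0):
    """Run the bandit against made-up click probabilities (assumed data)."""
    bandit = EpsilonGreedyBandit(len(click_probs), epsilon, seed)
    env = random.Random(seed + 1)
    for _ in range(steps):
        item = bandit.select()
        reward = 1.0 if env.random() < click_probs[item] else 0.0
        bandit.update(item, reward)
    return bandit


if __name__ == "__main__":
    result = simulate([0.05, 0.1, 0.6])
    print(result.counts)
```

A new item starts with no feedback, so the occasional random pick is what lets the system discover that it performs well; after that, most recommendations go to the item with the highest estimated reward.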

1 month, 2 weeks ago @ dataskeptic.com
Designing Recommender Systems for Digital Humanities

In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Monasterium.net, Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique research needs and information-seeking behaviors. Florian explains the technical challen…

2 months ago @ dataskeptic.com
DataRec Library for Reproducible in Recommend Systems

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Mario Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon reviews. The conversation covers Alberto's research journey through knowledge graphs, graph-based recommen…
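
Checksum verification of a downloaded dataset, as mentioned in this summary, can be sketched with Python's standard `hashlib`. This is not DataRec's API: the function names below are hypothetical, and the choice of SHA-256 is an assumption made for illustration.

```python
import hashlib


def sha256_of_chunks(chunks):
    """Return the SHA-256 hex digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path, chunk_size=1 << 20):
    """Yield a file's contents in fixed-size chunks to keep memory use low."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            yield chunk


def verify_file(path, expected_hex):
    """Return True if the file at `path` matches the published checksum."""
    actual = sha256_of_chunks(read_chunks(path))
    return actual == expected_hex.strip().lower()
```

Comparing the digest of the local file against a published value catches truncated or silently changed downloads before any preprocessing runs, which is one way to keep experiments comparable across machines.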

2 months, 1 week ago @ dataskeptic.com
Shilling Attacks on Recommender Systems

In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a …

2 months, 3 weeks ago @ dataskeptic.com
Music Playlist Recommendations

In this episode, Rebecca Salganik, a PhD student at the University of Rochester with a background in vocal performance and composition, discusses her research on fairness in music recommendation systems. She explores three key types of fairness—group, individual, and counterfactual—and examines how algorithms create challenges like popularity bias (favoring mainstream content) and multi-interest bias (underserving users with diverse tastes). Rebecca introduces LARP, her multi-stage multimodal framework for playlist continuation that uses contrastive learning to align text and audio representations, learn song relationships, and create playlist-level embeddings to address the cold start prob…

2 months, 4 weeks ago @ dataskeptic.com
Bypassing the Popularity Bias
3 months, 1 week ago @ dataskeptic.com
Sustainable Recommender Systems for Tourism

In this episode, we speak with Ashmi Banerjee, a doctoral candidate at the Technical University of Munich, about her pioneering research on AI-powered recommender systems in tourism. Ashmi illuminates how these systems can address exposure bias while promoting more sustainable tourism practices through innovative approaches to data acquisition and algorithm design. Key highlights include leveraging large language models for synthetic data generation, developing recommendation architectures that balance user satisfaction with environmental concerns, and creating frameworks that distribute tourism more equitably across destinations. Ashmi's insights offer valuable perspectives for both AI res…

3 months, 2 weeks ago @ dataskeptic.com
Interpretable Real Estate Recommendations

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich interviews Dr. Kunal Mukherjee, a postdoctoral research associate at Virginia Tech, about the paper "Z-REx: Human-Interpretable GNN Explanations for Real Estate Recommendations". The discussion explores how the post-COVID real estate landscape has created a need for better recommendation systems that can introduce home buyers to emerging neighborhoods they might not know about. Dr. Mukherjee explains how his team developed a graph neural network approach that not only recommends properties but provides human-interpretable explanations for why certain regions are suggested. The conversation covers the advantages o…

4 months ago @ dataskeptic.com
Why Am I Seeing This?

In this episode of Data Skeptic, we explore the challenges of studying social media recommender systems when exposure data isn't accessible. Our guests Sabrina Guidotti, Gregor Donabauer, and Dimitri Ognibene introduce their innovative "recommender neutral user model" for inferring the influence of opaque algorithms.

4 months, 2 weeks ago @ dataskeptic.com
Eco-aware GNN Recommenders

In this episode of Data Skeptic, we dive into eco-friendly AI with Antonio Purificato, a PhD student from Sapienza University of Rome. Antonio discusses his research on "EcoAware Graph Neural Networks for Sustainable Recommendations" and explores how we can measure and reduce the environmental impact of recommender systems without sacrificing performance.

4 months, 3 weeks ago @ dataskeptic.com
Networks and Recommender Systems

Kyle reveals the next season's topic will be "Recommender Systems". Asaf shares insights on how network science contributes to the recommender system field.

5 months, 1 week ago @ dataskeptic.com
Network of Past Guests Collaborations

Kyle and Asaf discuss a project in which we link former guests of the podcast based on their co-authorship of academic papers.

6 months, 1 week ago @ dataskeptic.com
The Network Diversion Problem

In this episode, Professor Pål Grønås Drange from the University of Bergen introduces the field of Parameterized Complexity - a powerful framework for tackling hard computational problems by focusing on specific structural aspects of the input. This framework allows researchers to solve NP-complete problems more efficiently when certain parameters, like the structure of the graph, are "well-behaved". At the center of the discussion is the network diversion problem, where the goal isn’t to block all routes between two points in a network, but to force flow - such as traffic, electricity, or data - through a specific path. While this problem appears deceptively similar to the classic "Min.Cu…

6 months, 3 weeks ago @ dataskeptic.com
SuperDataScience
last post 2 days, 7 hours ago
960: In Case You Missed It in December 2025

For 2026’s first episode of In Case You Missed It (ICYMI), Jon Krohn selects 6 clips from December for a wide-ranging look at the current state of AI in business and beyond. Hear from Joel Beasley (Episode 945), Jeff Li (Episode 947), Sandy Pentland (Episode 949), Josh Clemm (Episode 951), Penelope Lafeuille (Episode 952), and John Roese (Episode 953) on ensuring your AI systems get adopted and succeed in their goals, interesting ways to use AI for standup comedy routines, and more. Additional materials: www.superdatascience.com/960 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

2 days, 7 hours ago @ podtrac.com
959: Building Agents 101: Design Patterns, Evals and Optimization (with Sinan Ozdemir)

AI entrepreneur and bestselling author Sinan Ozdemir speaks to Jon Krohn about the practical differences between agentic AI and AI workflows, why evaluating accuracy on its own won’t tell you enough about AI models, and more about his latest book Building Agentic AI. This episode is brought to you by Dell, by Intel, by Fabi and by Cisco. Additional materials: www.superdatascience.com/959 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (04:57) Exploring the differences between workflows and agents (17:03) How to work out parameter count for…

5 days, 7 hours ago @ podtrac.com
958: Without Trusted Context, Agents are Stupid (featuring Salesforce’s Rahul Auradkar)

In this #sponsored Feature Friday episode, Salesforce’s Rahul Auradkar speaks to Jon Krohn about the company’s unified data engine and how its acquisition of Informatica provides the missing context layer for AI models and agents. Hear how Salesforce’s Data 360 helps customers to get accurate and insightful information about their business, and what AI models need to benefit a company’s bottom line (hint: it’s not only large amounts of data!). Additional materials: www.superdatascience.com/958 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 week, 2 days ago @ podtrac.com
957: How AI Agents Are Automating Enterprise Data Operations, with Ashwin Rajeeva

AI agents, data lakes, and managing data sprawl: Ashwin Rajeeva, cofounder and CTO of Acceldata, speaks to Jon Krohn about how the agentic data management startup raised over $100 million in venture capital to expand its business in automating data quality assurance as well as cataloguing and pipeline maintenance across enterprise environments. Acceldata utilizes multiple agents to solve enterprise-grade questions with company data. It also uses autonomous data pipelines that can detect and fix issues without human intervention, and the platform’s agentic data management system ADM also lets humans stay in the loop wherever needed. This episode is brought to you by Dell, by Intel…

1 week, 5 days ago @ podtrac.com
956: From Agent Demo to Enterprise Product (with Ease!) feat. Salesforce’s Tyler Carlson

#Sponsored SVP, Head of Product for Salesforce’s AppExchange & Ecosystem, Tyler Carlson, talks to Jon Krohn about taking AI agents from prototype to enterprise-grade production with the Agentforce 360 Platform. Though we may now have plenty of tools to build demos for AI agents, most teams still struggle to turn early prototypes into secure and scalable products. With Salesforce’s Agentforce 360 Platform, users can build customer-focused agentic applications with a multi-LLM planner service for reasoning and logic, as well as a new scripting language for deterministic control over how agents interact with the contextual layer of the Salesforce data model. Learn how Salesforce’s WYSIWYG sche…

2 weeks, 2 days ago @ podtrac.com
955: Nested Learning, Spatial Intelligence and the AI Trends of 2026, with Sadie St. Lawrence

Sadie St. Lawrence joins Jon Krohn to discuss what to expect from the AI industry in 2026. Sadie and Jon talk through what they think will be the five biggest trends in AI, hand out awards for the best moments, comebacks, and disappointments in AI in 2025, and review how their predictions for 2025 played out. Hear Sadie’s five exciting predictions for 2026, from emerging jobs in AI to an important return to the drawing board! This episode is brought to you by Dell, by Intel, by Fabi and by MongoDB. Additional materials: www.superdatascience.com/955 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for spon…

2 weeks, 5 days ago @ podtrac.com
954: Recap of 2025 and Wishing You a Wonderful 2026

Jon Krohn wraps up 2025 with his thoughts on how agentic AI has become as much a resounding success as an annoying buzzword for many in the tech industry, why such promising developments in generative AI mean that well-prepared, secured data will be ever more crucial, and Jon’s hopes for a better year for everyone across the world in 2026. Additional materials: www.superdatascience.com/954 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

3 weeks, 2 days ago @ podtrac.com
953: Beyond “Agent Washing”: AI Systems That Actually Deliver ROI, with Dell’s Global CTO John Roese

Dell Technologies’ John Roese talks to Jon Krohn about the phenomenon of “agent-washing”, his contribution to Dell’s incredible revenue boost in 2025, and why “knowledge layers” will be crucial to future tech. Hear also John’s predictions for where AI is going to lead us in 2026, from better, clearer governance, data management methods and definitions for agentic AI, to systems that keep AI tools and our data running and secure with the help of “AI factories” and “sovereign AI”. This episode is brought to you by MongoDB and by Y Carrot. Additional materials: www.superdatascience.com/953 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie…

3 weeks, 5 days ago @ podtrac.com
952: How to Avoid Burnout and Get Promoted, with “The Fit Data Scientist” Penelope Lafeuille

“The Fit Data Scientist” newsletter author Pénélope Lafeuille talks to Jon Krohn about how to give your all at work, offering her top tips for a healthy body and a healthy mind. Learn why “The SuperDataScience Podcast” made it onto her top 3 data science podcasts, and why following your passion can pay dividends for your career. Additional materials: www.superdatascience.com/952 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month ago @ podtrac.com
951: Context Engineering, Multiplayer AI and Effective Search, with Dropbox’s Josh Clemm

VP of Engineering at Dropbox Josh Clemm speaks to Jon Krohn about consolidating search tools across apps with the AI-powered workspace, Dropbox Dash, the new collaborative AI systems that enhance interoperability between team members and their projects, and how to avoid “context rot”. Dropbox Dash gives users the best of Dropbox’s cloud storage and search functions, plus a “universal search” ability to locate information across multimedia and apps. “AI really needs to understand you and your team, first and foremost, and all that connected data,” says Josh. This episode is brought to you by Dell, by Intel, by Airia and by MongoDB. Additional materials: www…

1 month ago @ podtrac.com
950: Happy Holidays from All of Us at the SuperDataScience Podcast

In this special holiday episode, the SuperDataScience Podcast team comes together to wish you happy holidays and thank you for listening throughout the year. Team members from around the world share warm greetings in their own voices and languages as we reflect on another year of learning, curiosity, and community. From all of us at SDS, we wish you a joyful holiday season and look forward to bringing you more data science, machine learning, and AI content in the year ahead. Additional materials: www.superdatascience.com/950 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month, 1 week ago @ podtrac.com
949: Why AI Keeps Failing Society, with Stanford professor Alex “Sandy” Pentland

Alex “Sandy” Pentland, Toshiba Professor of Media Arts & Science at MIT and Fellow at Stanford, speaks to Jon Krohn about his new book, Shared Wisdom, why he attributes the collapse of the Soviet Union to AI, and why those risks to society could still be relevant today. We can only achieve better system performance, Alex says, when we build tools that keep step with the way that people make decisions. Listen to the episode to hear Alex talk about how he is helping make AI agents work for individuals rather than the companies that develop them, and his work in making sure that systems operate consistently and fairly across the world. This episode is brought to you by Dell, by…

1 month, 1 week ago @ podtrac.com
948: In Case You Missed It in November 2025

In this November episode of the “In Case You Missed It” series, Jon Krohn selects his favorite clips from the month. Hear from Shirish Gupta and Tyler Cox (Episode 939), Vikoy Pandey (Episode 941), Marc Dupuis (Episode 937), and Maya Ackerman (Episode 943) on getting back to human motivation and the importance of evaluating the tools and data we use. Additional materials: www.superdatascience.com/948 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month, 2 weeks ago @ podtrac.com
947: How to Get Hired at Top Firms like Netflix and Spotify, with Jeff Li

Jeff Li tells Jon Krohn what it's like to work at scale as a data scientist and a machine learning engineer at Netflix, Spotify and DoorDash, as well as how to get a foot in the door at these companies. Jeff also discusses how to run forecasts and trends, and how to read their results. Listen to hear Jeff Li discuss how Spotify became a podcast powerhouse, his startup move.ai, and the tools he uses every day. This episode is brought to you by Dell, by Intel, by Fabi, and by Airia. Additional materials: www.superdatascience.com/947 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship informat…

1 month, 2 weeks ago @ podtrac.com
946: How Robotaxis Are Transforming Cities

Jon Krohn looks into the benefits of robotaxis, from safety to affordability, in this Five-Minute Friday. Hear about Waymo’s partnership with Jaguar Land Rover, the latest safety studies concerning driverless vehicles, and a case for robotaxis becoming the preferred method of transport in the US, where households spend roughly 15% of their budget on vehicle ownership. Additional materials: www.superdatascience.com/946 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month, 3 weeks ago @ podtrac.com
Data Science at Home
last post 1 month ago
When Data Stops Being Code and Starts Being Conversation (Ep. 297)

Mark Brocato built Mockaroo—the tool that taught millions of developers how to fake data.

Now, as Head of Engineering at Tonic.ai, he’s building the AI agent that’s making his own creation obsolete.

From the hidden failures of legacy mocks to the security implications of agent-driven synthesis, Mark reveals what happens when data generation becomes a conversation—not a pipeline.

Sponsors: Tonic.ai. Synthetic data solutions for software and AI development.

Accelerate engineering velocity and ensure compliance with AI-powered data synthesis. This episode is brought to you by Statistical Horizons. At Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics…

1 month ago @ datascienceathome.com
Your AI Strategy is Burning Money: Here’s How to Fix It (Ep. 295)

Most companies don’t have an AI problem.

In this conversation, he breaks down when AI actually makes sense, where AWS costs spiral out of control, and why your “cool demo” keeps dying before launch.

If you’re tired of AI hype and ready for straight answers, hit play.

Our Discord community is full of ML engineers, researchers, and AI enthusiasts discussing papers, sharing projects, and helping each other level up.

Whether you’re debugging your first neural net or training your tenth transformer, there’s a place for you.

2 months ago @ datascienceathome.com
From Tokens to Vectors: The Efficiency Hack That Could Save AI (Ep. 294)

LLMs generate text painfully slowly, one low-info token at a time.

Researchers just figured out how to compress 4 tokens into smart vectors & cut costs by 44%—with full code & proofs!

🔥📊 Sponsors: This episode is brought to you by Statistical Horizons. At Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics and AI methods practical and accessible.

Join thousands of researchers and professionals who’ve advanced their careers with Statistical Horizons.

Get $200 off any seminar with code DATA25 at https://statisticalhorizons.com

2 months, 2 weeks ago @ datascienceathome.com
Why AI Researchers Are Suddenly Obsessed With Whirlpools (Ep. 293)

VortexNet uses actual whirlpools to build neural networks.

By borrowing equations from fluid dynamics, this new architecture might solve deep learning’s toughest problems—from vanishing gradients to long-range dependencies.

Today we explain how vortex shedding, the Strouhal number, and turbulent flows might change everything in AI.

Sponsors: This episode is brought to you by Statistical Horizons. At Statistical Horizons, you can stay ahead with expert-led livestream seminars that make data analytics and AI methods practical and accessible.

Join thousands of researchers and professionals who’ve advanced their careers with Statistical Horizons.

2 months, 3 weeks ago @ datascienceathome.com
The Scientists Growing Living Computers in Swiss Labs (Ep. 292)

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

With a focus on dual-use innovation, Amethix is shaping a future where intelligent machines extend human capability, not replace it.

Discover more at https://amethix.com This episode is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Learn more at intrepid.ai. References: Website: finalspark.com; Discord account: / discord; Newsletter: https://finalspark.com/#newsletter. Topics: Biological computing • Neural engineering • Energy-effic…

3 months ago @ datascienceathome.com
When AI Hears Thunder But Misses the Fear (Ep. 291)

Sanjoy Chowdhury reveals AI’s hidden weakness: while systems can see objects and hear sounds perfectly, they can’t reason across senses like humans do.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at https://amethix.com. This episode is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Whether it’s in the sky, on the ground, or in orbit—if it’s intelligent and mobile, Intrepid helps you build it.

3 months, 2 weeks ago @ datascienceathome.com
Why VCs Are Funding $100M Remote Control Toys (Ep. 290)

References
War On The Rocks: https://warontherocks.com/2025/08/ukraine-isnt-the-model-for-winning-the-innovation-war/
LinkedIn: https://www.linkedin.com/in/jonasrsinger/
Spotify: https://tr.ee/Omy_1X8k1U
Apple Podcast: https://podcasts.apple.com/us/podcast/defence-innovation-podcast/id1797131332
YouTube: https://youtube.com/@DefenceInnovationpodcast?si=cu2WlnVgL5XKnM0p

Sponsors: This episode is proudly sponsored by Amethix Technologies.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at https://amethix.com. This episode is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers…

4 months, 1 week ago @ datascienceathome.com
How Hacker Culture Died (Ep. 289)

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com. DSH is brought to you by Intrepid AI.

🐦 Twitter: @DataScienceAtHome
📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/
Instagram: https://www.instagram.com/datascienceathome/
Facebook: https://www.facebook.com/datascienceAH
LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast
Discord Channel: https://discord.gg/4UNKGf3

NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Send us mail at: [email protected]. Don’t forget to like, subscribe, and hit the 🔔 for…

4 months, 4 weeks ago @ datascienceathome.com
Robots Suck (But It’s Not Their Fault) (Ep. 288)

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com. DSH is brought to you by Intrepid AI.

5 months, 3 weeks ago @ datascienceathome.com
Your Favorite AI Startup is Probably Bullshit (Ep. 287)

The brutal truth about why Silicon Valley is blowing billions on glorified autocomplete while pretending it’s the next iPhone.

We’re diving deep into the AI investment circus where VCs who can’t code are funding companies that barely understand their own technology.

From blockchain déjà vu to the “ChatGPT wrapper” economy—this episode will make you question every AI valuation you’ve ever seen.

Fair warning: We’re naming names and calling out the hype.

Don’t listen if you work at a “revolutionary AI startup” that’s just OpenAI’s API with a pretty interface.

5 months, 3 weeks ago @ datascienceathome.com
Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything (Ep. 286) [RB]

From the viral article “Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything” on my newsletter at https://defragzone.substack.com/p/techs-dumbest-mistake-why-firing, here are my thoughts about AI replacing programmers…

🎙️ Sponsors: AGNTCY, the open source collective building the Internet of Agents 🌐 https://www.agntcy.org

✨ Connect with us!
🐦 Twitter: @DataScienceAtHome
📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/
Instagram: https://www.instagram.com/datascienceathome/
Facebook: https://www.facebook.com/datascienceAH
LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast
Discord Channel: https://discord.gg/4UNKGf3

NEW TO DATA SCIENCE AT HOME?

Data Scie…

6 months, 3 weeks ago @ datascienceathome.com
Brains in the Machine: The Rise of Neuromorphic Computing (Ep. 285)

In this episode of Data Science at Home, we explore the fascinating world of neuromorphic computing — a brain-inspired approach to computation that could reshape the future of AI and robotics.

The episode breaks down how neuromorphic systems differ from conventional AI architectures like transformers and LLMs, diving into spiking neural networks (SNNs), their benefits in energy efficiency and real-time processing, and their limitations in training and scalability.

Real-world applications are highlighted, including low-power drones, hearing aids, and event-based cameras.

Francesco closes with a vision of hybrid systems where neuromorphic chips and LLMs coexist, blending biological inspiratio…

7 months, 1 week ago @ datascienceathome.com
DSH/Warcoded – AI in the Invisible Battlespace (Ep. 284)

This episode explores the invisible battlespace of cyber and electronic warfare, where AI takes center stage.

Sponsors: Building multi-agent software is hard; agent-to-agent and agent-to-tool communication is still the wild west.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com. Warcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

7 months, 3 weeks ago @ datascienceathome.com
DSH/Warcoded Swarming the Battlefield (Ep. 283)

Swarming the Battlefield explores how artificial intelligence is revolutionizing combat through coordinated drone swarms.

This episode uncovers how these intelligent agents turn the chaos of the battlefield into a synchronized dance of machine warfare.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com. Warcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

8 months ago @ datascienceathome.com
DSH/Warcoded Kill Chains and Algorithmic Warfare – Autonomy in Targeting and Engagement (Ep. 282)

In this gripping follow-up, we dive into how AI is transforming kinetic operations—from identifying a threat to executing a strike.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com. Warcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Whether it’s in the sky, on the ground, or in orbit—if it’s intelligent and mobile, Intrepid helps you build it.

8 months, 2 weeks ago @ datascienceathome.com