Very ML
State-of-the-art Machine Learning News Feed
/r/MachineLearning
latest post 4 hours ago
An analytic theory of creativity in convolutional diffusion models.
4 hours ago @ reddit.com
[P] Training Cascade R-CNN (ResNet-101 + FPN) on Custom Dataset for Solar Panel Detection
7 hours ago @ reddit.com
[R] Just discovered something fascinating about AI filter limitations and human-AI collaboration patterns
10 hours ago @ reddit.com
[P] Live data and model training tips
11 hours ago @ reddit.com
[D] I have a dataset of 1000+ rows... Which has Datetimehr of 2 unique values.. i.e all the rows has only 1 of 2 total timeframes....(2025-07-04 08:00:00 and 09:00:00) And it also has different columns with values and texts that can be encoded... How to pr
12 hours ago @ reddit.com
[D] ACMMM Meta-review Accept & Decision Reject. Is it possible?
13 hours ago @ reddit.com
[P] Revision of a book on the topic of supervised learning.
14 hours ago @ reddit.com
[D] I benchmarked 4 Python text extraction libraries so you don't have to (2025 results)
14 hours ago @ reddit.com
[D] What are paper introductions meant to communicate to a knowledgable reader?
14 hours ago @ reddit.com
Neurips: 0 reviews submitted [D]
14 hours ago @ reddit.com
[D] NeurIPS workshops 2025?
15 hours ago @ reddit.com
[D] Is the following dataset appropriate for ML? The first column is variable name, the second is type of variable and third is explanation of the variable. There is 32000 rows. The final variable is the target variable.
16 hours ago @ reddit.com
[D] Emergent Conventions in Multi-Agent LLMs: Experimental Evidence (SciAdv'24)
17 hours ago @ reddit.com
[R] State of The Art models in Video Matting - Comparative Analysis.
19 hours ago @ reddit.com
[D] ACM MM- Complaining against Area Chair Review
19 hours ago @ reddit.com
Towards Data Science
latest post 23 hours ago
My Honest Advice for Aspiring Machine Learning Engineers

In this article, I aim to offer my unfiltered and candid advice to aspiring machine learning engineers.

This isn’t to boast but to show the level of commitment required to become a machine learning engineer.

The reality is that most machine learning engineer roles primarily focus on classical supervised learning.

I often say: Anyone can become a machine learning engineer — but that doesn’t mean everyone should, or even wants to.

If you are serious about becoming a machine learning engineer, then I recommend checking out the below article, where I detail my roadmap: Link.

23 hours ago @ towardsdatascience.com
Rethinking Data Science Interviews in the Age of AI

In this article, I will share my perspective on how data scientist interviews should (would) evolve in the age of AI.

The Traditional Data Scientist Interview Loop: Before talking about how things will change, let’s go through the current structure of data scientist interviews.

Cross-functional interviews: Data Scientist is a technical role, but it is also highly cross-functional, aiming to drive real business impact using data.

Coding Interviews: Most Likely to Change First. What can AI do quickly?

Today’s coding interviews ask candidates to write SQL and Python code correctly.

1 day, 7 hours ago @ towardsdatascience.com
Change-Aware Data Validation with Column-Level Lineage

But even with the added structure and clearly defined data models, pipelines can still become complex, which makes debugging issues and validating changes to data models difficult.

Data validation refers to the process used to determine that the data is correct in terms of real-world requirements.

This is why reviewing data model changes is a unique challenge, because both the code and the data needs to be reviewed.

A structured and repeatable process: By using this change-aware data validation technique, you can bring structure and precision to the review process, making it systematic and repeatable.

He’s always happy to chat about SQL, data engineering, or helping teams navigate their data …

1 day, 8 hours ago @ towardsdatascience.com
Explainable Anomaly Detection with RuleFit: An Intuitive Guide

When you present your anomaly detection results to your stakeholders, the immediate next question is always “why?”.

Yet, most machine learning-based anomaly detection methods stop at producing an anomaly score.

Note that you can, in theory, also use the anomaly score (produced by the primary anomaly detection model) as the “target outcome”.

Of course, you can also try other anomaly detection algorithms, such as Gaussian Mixture Models (GMM), K-Nearest Neighbors (KNN), and Autoencoders, among others.
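As a rough illustration of the idea above, here is a minimal sketch with scikit-learn that uses the primary detector's anomaly score as the target; a shallow decision tree stands in for the rule ensemble that RuleFit would fit, and the data and feature names are made up.

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.tree import DecisionTreeRegressor, export_text

# Toy data: 500 "normal" points plus 10 obvious outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(500, 3)), rng.normal(6, 1, size=(10, 3))])
feature_names = ["f0", "f1", "f2"]

# Primary detector; score_samples is higher for normal points, so negate it.
detector = IsolationForest(random_state=0).fit(X)
anomaly_score = -detector.score_samples(X)

# Interpretable surrogate fit on the anomaly score (the "target outcome").
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, anomaly_score)
print(export_text(surrogate, feature_names=feature_names))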

Conclusion: In this blog post, we explored the RuleFit algorithm as a powerful solution for explainable anomaly detection.

1 day, 8 hours ago @ towardsdatascience.com
Fairness Pruning: Precision Surgery to Reduce Bias in LLMs

In the case of the Black man, he goes straight for a deadly shot to the back.

This neural “surgery” reduced the bias metric by 22% while pruning just 0.13% of the model’s parameters, without touching the neurons essential to its performance.

A neuron that shows high variance in activation when processing the “Black man” vs. “white man” prompts receives a high bias score.

When the police arrived, the black man said, ‘I’m not a thief, I’m a doctor.’” The result is a radical shift.

The bias metric, which measures the average activation difference, shows a dramatic drop: original model bias 0.0339, pruned model bias 0.0264. This represents a 22.12% reduction in measured bias.
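A toy numeric sketch of the bias score described above, assuming per-neuron activations have already been collected for the two prompt groups (the arrays here are fabricated):

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical activations with shape (num_prompts, num_neurons) for each group.
acts_group_a = rng.normal(0.00, 1.0, size=(32, 1024))  # e.g. "Black man" prompts
acts_group_b = rng.normal(0.05, 1.0, size=(32, 1024))  # e.g. "white man" prompts

# Per-neuron bias score: gap between mean activations across the two groups.
neuron_bias = np.abs(acts_group_a.mean(axis=0) - acts_group_b.mean(axis=0))

# Model-level metric (average activation difference) and pruning candidates.
print("model bias metric:", neuron_bias.mean())
print("most biased neurons:", np.argsort(neuron_bias)[-10:])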

2 days ago @ towardsdatascience.com
GraphRAG in Action: A Simple Agent for Know-Your-Customer Investigations

It plays the role of MCP Host and MCP client to the Neo4j MCP Cypher Server.

Neo4j MCP Server: A Neo4j MCP Cypher Server exposing tools to interact with a Neo4j database.

A Basic Agent with OpenAI Agents SDK: Let’s walk through the key parts of our KYC Agent.

You have built a simple AI agent on top of OpenAI Agents SDK with MCP, Neo4j and a Text-to-Cypher model.

2 days, 4 hours ago @ towardsdatascience.com
Taking ResNet to the Next Level

The Entire ResNeXt Architecture: The structure displayed in Figure 1 and the equation in Figure 2 correspond only to a single ResNeXt block.

Second, it is clear that the cardinality parameter is applied to the second convolution layer in each ResNeXt block.

# Codeblock 4
block = Block(in_channels=256, add_channel=True, downsample=True)
x = torch.randn(1, 256, 56, 56)
out = block(x)

And below is what the output looks like.

# Codeblock 5
block = Block(in_channels=512, add_channel=False, downsample=False)
x = torch.randn(1, 512, 28, 28)
out = block(x)

# Codeblock 5 Output
original      : torch.Size([1, 512, 28, 28])
no projection : torch.Size([1, 512, 28, 28])  #(1)
after co…
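The article's Block class is only partially shown above, so here is an independent minimal sketch of how cardinality is usually realized in a ResNeXt-style block, namely through the groups argument of the middle 3x3 convolution (channel sizes below are illustrative, not the article's exact configuration):

import torch
import torch.nn as nn

class ResNeXtBlockSketch(nn.Module):
    def __init__(self, channels=256, mid=128, cardinality=32):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1, bias=False)
        # Cardinality enters here: the 3x3 convolution is split into groups.
        self.conv2 = nn.Conv2d(mid, mid, kernel_size=3, padding=1,
                               groups=cardinality, bias=False)
        self.conv3 = nn.Conv2d(mid, channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid)
        self.bn2 = nn.BatchNorm2d(mid)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + x)  # identity shortcut

block = ResNeXtBlockSketch()
print(block(torch.randn(1, 256, 56, 56)).shape)  # torch.Size([1, 256, 56, 56])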

2 days, 20 hours ago @ towardsdatascience.com
Software Engineering in the LLM Era

Instead of thinking about it so narrowly, I actually really want to talk about the broader context of software engineering in the context of LLM technology.

While some people think this is happening in software engineering, it’s not clear if that’s necessarily the case yet, because there are other factors in play too.

Some people argue that there aren’t too many software engineers, but there are too many inexperienced or junior software engineers, and that employers are still desperate for experienced hires.

Junior engineers do write code and get work done, and they can often learn very quickly and become more productive.

If we build a working environment where junior software engineers no …

3 days, 4 hours ago @ towardsdatascience.com
Interactive Data Exploration for Computer Vision Projects with Rerun

However it still didn’t really feel like the right tool for the task, especially when trying to work with interactive plots.

So I decided to rework my ball tracking demo from a previous project and plot the data using rerun.

Press Ctrl + O or select Open... in the menu on the top left of the rerun viewer and load the downloaded recording file.

In the video frame you can click on the annotations and in the left Blueprint panel you can hide or show them individually.

We can omit the second row share entry for the frame view since the shares have to add up to 1.

3 days, 5 hours ago @ towardsdatascience.com
Four AI Minds in Concert: A Deep Dive into Multimodal AI Fusion

def _handle_main_analysis_flow(self, detection_result, original_image_pil,
                               image_dims_val, class_confidence_threshold,
                               scene_confidence_threshold, current_run_enable_landmark,
                               lighting_info, places365_info) -> Dict:
    """Core processing workflow for complete scene analysis when YOLO detection results are available."""

This strategy prioritizes object detection because it provides the kind of objective, quantifiable evidence that forms the bedrock of most scene analysis.

Here, the weights shift based on the richness of object detection data from YOLO.
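As a purely hypothetical sketch of the adaptive weighting just described (the article's actual scheme and weight values are not shown in this excerpt), one way to shift weight toward YOLO when its detections are rich and then renormalize:

def fuse_scene_scores(yolo_score, clip_score, places365_score, num_detections):
    """Blend per-source scene scores; the base weights here are illustrative."""
    weights = {"yolo": 0.5, "clip": 0.3, "places365": 0.2}
    # Richer object evidence -> trust YOLO more, capped at +0.2.
    weights["yolo"] += min(0.2, 0.02 * num_detections)
    total = sum(weights.values())
    weights = {name: w / total for name, w in weights.items()}
    return (weights["yolo"] * yolo_score
            + weights["clip"] * clip_score
            + weights["places365"] * places365_score)

print(fuse_scene_scores(0.8, 0.6, 0.7, num_detections=12))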

def perform_pyramid_analysis(self, image, clip_model_manager, landmark_data_manager, levels=4, base_threshold=0.25, aspect_ratios=[1.0, 0.75, 1.5]…

3 days, 5 hours ago @ towardsdatascience.com
Why We Should Focus on AI for Women

In this post, I will walk through a case study from my personal experience: defining the optimal temperature in an office building, considering the different thermal comfort levels of men and women.

Case study: Thermal comfort. Two years ago, I worked on a project to optimize the energy efficiency of a building while maintaining thermal comfort.

Image by Author: Experimental Flowchart. Simulation setup: We now simulate two populations—male and female—with slightly different thermal preferences.

We begin with defining an idealized thermal comfort model inspired by the Predicted Mean Vote (PMV) framework.
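A made-up, minimal stand-in for such a comfort model (the article's PMV-inspired formulation is not reproduced in this excerpt): comfort decays around a population-specific preferred temperature, with the female setpoint assumed slightly higher than the male one.

import numpy as np

def comfort(temp_c, preferred_c, tolerance=2.0):
    """Toy comfort score in (0, 1]; equals 1 at the preferred temperature."""
    return np.exp(-((temp_c - preferred_c) / tolerance) ** 2)

temps = np.linspace(19, 27, 9)
male_comfort = comfort(temps, preferred_c=22.0)    # assumed male setpoint
female_comfort = comfort(temps, preferred_c=24.0)  # assumed female setpoint

for t, m, f in zip(temps, male_comfort, female_comfort):
    print(f"{t:4.1f} C  male={m:.2f}  female={f:.2f}")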

Image by Author: Learning curve. We can now evaluate how well the male-trained agent performs wh…

3 days, 7 hours ago @ towardsdatascience.com
How to Maximize Technical Events – NVIDIA GTC Paris 2025

Attending events can be challenging, as there are numerous ways to spend your time.

The inspiration for this article stems from my attendance at NVIDIA GTC Paris 2025 as a data scientist from Findable from June 10 to 12, 2025.

Motivation: My motivation for this article is that I attended NVIDIA GTC Paris 2025.

Image by the author. Planning ahead of the event: Before the event, NVIDIA put out a full session overview on their page.

Attending technical events, such as NVIDIA GTC, can be immensely helpful for gathering new ideas, connecting with interesting people, and hopefully, enhancing your engineering skills.

3 days, 8 hours ago @ towardsdatascience.com
How to Access NASA’s Climate Data — And How It’s Powering the Fight Against Climate Change Pt. 1

We’ll also talk about the surprising and interesting ways this data is currently being utilized to combat the effects of climate change.

How well we can model and understand climate data will shape our next decades on this earth.

The forest for the trees: A big reason NASA makes this data open-source is to combat the effects of climate change.

However, there are additional ways data can be utilized to address climate change in a scientific and mathematically grounded manner.

And while much is being done to address climate change, the earlier list of effects was not exhaustive.

4 days, 4 hours ago @ towardsdatascience.com
STOP Building Useless ML Projects – What Actually Works

I get this question all the time: “What projects should I do to get a job in data science or machine learning?” This question is flawed from the beginning.

Aim to build a wide range of projects, each using different tools, datasets, and machine learning algorithms.

Think about how machine learning could help answer those questions.

If you’re aiming for a role as a machine learning engineer, it’s especially valuable to build and deploy the project end-to-end.

To do this, you will need to learn some of the following. It may seem like a lot, but you don’t need to do everything on this list.

4 days, 5 hours ago @ towardsdatascience.com
An Introduction to Remote Model Context Protocol Servers

A quick recap on what an MCP server is: There are dozens of definitions for what an MCP server is.

My setup: I’ll be developing the code for the MCP server and its tools using Windows and Microsoft Visual Studio Code.

$ uv init remote-mcp
Initialized project `remote-mcp` at `/home/tom/projects/remote-mcp`
$ cd remote-mcp
$ ls -al
total 28
drwxr-xr-x  3 tom tom 4096 Jun 23 17:42 .
drwxr-xr-x 14 tom tom 4096 Jun 23 17:42 ..
drwxr-xr-x  7 tom tom 4096 Jun 23 17:42 .git
-rw-r--r--  1 tom tom  109 Jun 23 17:42 .gitignore
-rw-r--r--  1 tom tom    5 Jun 23 17:42 .python-version
-rw-r--r--  1 tom tom    0 Jun 23 17:42 README.md
-rw-r--r--  1 tom tom   88 Jun 23 17:42 main.py
-rw-r--r--  1 tom tom  156 Jun 23 17:42 pypr…
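To make the project layout above concrete, here is a minimal server sketch, assuming the official mcp Python SDK (installed into the project with something like uv add "mcp[cli]"); the tool itself and the server name are illustrative, and the remote transport options depend on the SDK version:

# main.py - minimal MCP server sketch (assumes the official `mcp` Python SDK)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("remote-mcp")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Toy tool used to verify the server wiring."""
    return a + b

if __name__ == "__main__":
    # Runs over stdio by default; a remote deployment would pick an HTTP/SSE
    # transport as described in the SDK documentation for the installed version.
    mcp.run()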

4 days, 6 hours ago @ towardsdatascience.com
Distill.pub
latest post None
The Gradient
latest post 1 month ago
AGI Is Not Multimodal

Despite this, scale maximalists have implicitly suggested that multimodal models can be a structure-agnostic framework for AGI.

While structure-agnostic scale maximalism has succeeded in producing LLMs and LVMs that pass Turing tests, a multimodal scale maximalist approach to AGI will not bear similar fruit.

Citation: For attribution in academic contexts or books, please cite this work as: Benjamin A. Spiegel, "AGI Is Not Multimodal", The Gradient, 2025.

@article{spiegel2025agi,
  author = {Benjamin A. Spiegel},
  title = {AGI Is Not Multimodal},
  journal = {The Gradient},
  year = {2025},
  howpublished = {\url{https://thegradient.pub/agi-is-not-multimodal}},
}

References: Andreas, Jacob. “Language Models,…

1 month ago @ thegradient.pub
Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research

Mathematics and statistics, once the primary guides of machine learning research, now struggle to provide immediate insight into the latest breakthroughs.

This shift has prompted speculation about mathematics’ diminished role in machine learning research moving forward.

It is also the way that symmetries are usually leveraged when performing computations (for example, in machine learning).

One can reasonably argue that diagrammatic descriptions of well-known constructions, like products, are not useful for the machine learning researcher.

However, as we’ve demonstrated, while mathematics may not maintain the same role in machine learning research that it has held in the past, the success of…

7 months, 3 weeks ago @ thegradient.pub
What's Missing From LLM Chatbots: A Sense of Purpose

Let's jump back to the 1970s, when Roger Schank introduced his "restaurant script" as a kind of dialogue system [1].

The minimum requirement we could have for a dialogue system is that it can stay on the task we gave it.

Concluding remarks: I have reviewed the making of current LLM dialogue systems, and how and why they are insufficient.

The following are two research questions that I’m most excited about: (1) Better monitoring and control of dialogue systems with steering techniques.

Citation: For attribution of this in academic contexts or books, please cite this work as: Kenneth Li, "From prediction to purpose: a tutorial on LLM dialogue system", The Gradient, 2024.

9 months, 4 weeks ago @ thegradient.pub
TheSequence
latest post 1 day, 13 hours ago
The Sequence Research #678: Sequence to Function at Scale: Inside The AlphaGenome Breakthrough

Today we are going to deep dive into one of the most impressive AI models applied to science.

DeepMind's release of AlphaGenome marks a significant advancement in the application of deep learning to genomics.

This powerful AI system is designed to interpret the functional landscape of DNA sequences, including the elusive non-coding regions, using an unprecedented combination of scale and precision.

AlphaGenome can process up to 1 million base pairs of input DNA and output predictions across thousands of functional genomic tracks, such as gene expression, splicing, chromatin state, and 3D genome architecture.

More importantly, it offers a unified and highly efficient architecture capable of …

1 day, 13 hours ago @ thesequence.substack.com
The Sequence Opinion #677: Glass-Box Transformers: How Circuits Illuminate Deep Learning’s Inner Workings

Created Using GPT-4o. Circuits are quickly becoming a favorite of the AI research community to tackle the monumental challenge of interpretability.

Today, we are going to explore both the case in favor and against circuits.

As transformer-based models push the boundaries of what AI can do, understanding how they work becomes increasingly urgent.

At the core of this approach lies the concept of circuits: interconnected sets of neurons or attention heads that jointly compute a specific function.

Historical Evolution of the Circuits Approach

2 days, 13 hours ago @ thesequence.substack.com
The Sequence Engineering #676: Hacking with Gemini CLI

Created Using GPT-4o. Today we are going to cover one of the most exciting AI releases of last week.

Gemini CLI brings Google’s advanced Gemini 2.5 Pro model to the developer terminal, blending powerful language capabilities with intuitive command-line tools.

Its architecture—built around a ReAct loop, the Model Context Protocol (MCP), and extensible plugins—provides rich interpretability by logging every decision, exposing internal memory, and letting users trace each “Thought” and “Action.” This essay explores how these components integrate to make Gemini CLI transparent, trustworthy, and easy to debug, all without leaving your shell.

Quick Intro

3 days, 13 hours ago @ thesequence.substack.com
The Sequence Knowledge #675: Learning to Evaluate Multi-Agent AIs

Created Using GPT-4o. Today we will discuss: An overview of multi-agent benchmarks.

An introduction to the Arena-Hard benchmark.

💡 AI Concept of the Day: An Overview of Multi-Agent Benchmarks. The emergence of large language models (LLMs) has catalyzed a shift in AI evaluation paradigms, moving from single-agent benchmarks to more complex, multi-agent collaboration settings.

These benchmarks are designed to assess the ability of autonomous agents to engage in structured coordination, negotiation, and joint task execution across dynamic environments.

As LLMs become increasingly agentic, capable of memory, planning, and communication, multi-agent benchmarks offer a critical framework for testing e…

4 days, 13 hours ago @ thesequence.substack.com
TheSequence Radar #674: Transformers in the Genome: How AlphaGenome Reimagines AI-Driven Genomics

You can subscribe to The Sequence below. 📝 Editorial: Transformers in the Genome: How AlphaGenome Reimagines AI-Driven Genomics. I have been obsessed with AI in genetics for some time, so I couldn’t write about anything else today other than DeepMind’s new model: AlphaGenome!

Traditional genomics models often excel at one signal—SpliceAI for splicing, ChromBPNet for chromatin state—necessitating an ensemble of tools to profile variant consequences fully.

In benchmark evaluations spanning 24 sequence-prediction and 26 variant-effect tasks, AlphaGenome matches or surpasses specialized baselines in over 90% of cases.

Its ability to standardize and accelerate regulatory variant annotation is poised…

6 days, 13 hours ago @ thesequence.substack.com
The Sequence Research #673: Infinite Self-Improvement: Unpacking Sakana's Darwin Gödel Machine

Created Using GPT-4o. The Darwin Gödel Machine (DGM) pioneers a new paradigm in autonomous AI by combining the theoretical vision of self-modifying Gödel Machines with an empirical, Darwinian search process powered by foundation models.

DGM iteratively proposes patches to its own Python code, evaluates each variant on real-world coding benchmarks (SWE-bench, Polyglot), and archives successful agents to fuel open-ended evolution.
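A schematic of the propose-evaluate-archive loop described above (this is not Sakana's implementation; the patch proposal and benchmark evaluation are stand-in functions):

import random

def propose_patch(agent_code: str) -> str:
    """Stand-in for a foundation model proposing a change to the agent's own code."""
    return agent_code + f"\n# patch {random.randint(0, 9999)}"

def evaluate(agent_code: str) -> float:
    """Stand-in for scoring the agent on coding benchmarks such as SWE-bench."""
    return random.random()

archive = [{"code": "# seed agent", "score": evaluate("# seed agent")}]
for _ in range(20):
    parent = random.choice(archive)   # any archived agent can seed a new variant
    child = propose_patch(parent["code"])
    score = evaluate(child)
    if score > parent["score"]:       # keep improvements to fuel open-ended search
        archive.append({"code": child, "score": score})

print("archive size:", len(archive), "best score:", round(max(a["score"] for a in archive), 3))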

Building an AI that can rewrite its own reasoning logic has long been a dream since Jürgen Schmidhuber’s 2006 Gödel Machine, which required formal proofs for any self-modification.

However, real-world code rarely admits tractable proofs, relegating the Gödel Machine t…

1 week, 1 day ago @ thesequence.substack.com
The Sequence Opinion #672: Mind Over Model: Chain-of-Thought vs. System 1/System 2

As AI systems become more sophisticated, researchers and theorists often draw parallels between machine reasoning and human cognition.

One intriguing comparison is between chain-of-thought (CoT) reasoning in AI and the dual-process theory of human thought, commonly known as System 1 and System 2 thinking.

System 1 describes the brain’s fast, automatic, intuitive mode, while System 2 is slower, effortful, and deliberative.

First, we explain what CoT reasoning entails in modern AI systems and outline the psychological basis of the System 1/System 2 theory.

Chain-of-Thought Reasoning in AI

1 week, 2 days ago @ thesequence.substack.com
The Sequence Engineering #671: How Anthropic Built a Research Agent?

Image Created Using GPT-4o. The Research feature in Claude represents a significant evolution in how large language models can tackle open-ended, complex research tasks.

At its core lies a multi-agent architecture, in which a LeadResearcher orchestrator spawns multiple specialized Subagents to explore distinct facets of a query in parallel.

This orchestrator-worker design draws inspiration from distributed computing paradigms and allows the system to achieve both breadth and depth beyond what a single-agent pipeline could accomplish.

In practice, a user’s query first reaches the LeadResearcher, which deconstructs the question into a coherent research plan and assigns targeted subtasks to Suba…

1 week, 3 days ago @ thesequence.substack.com
The Sequence Knowledge #670: Evaluating AI in Software Engineering Tasks

Created Using GPT-4o. Today we will discuss: An overview of software engineering benchmarks.

A review of SWE-bench, the gold standard of software engineering AI evals.

💡 AI Concept of the Day: Software Engineering AI Benchmarks. As large language models (LLMs) find their way into software development workflows, the need for rigorous benchmarks to evaluate their coding capabilities has grown rapidly.

Today, software engineering benchmarks go far beyond simple code generation.

Built from real GitHub issues and corresponding pull requests, SWE-bench tasks models with generating code changes that resolve bugs and pass unit tests.
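For readers who want to look at the benchmark directly, the dataset is available on the Hugging Face Hub; a minimal sketch, assuming the princeton-nlp/SWE-bench dataset id and the problem_statement/patch field names (check the dataset card for the exact schema):

from datasets import load_dataset

# Each instance pairs a real GitHub issue with the gold patch that resolved it.
swe_bench = load_dataset("princeton-nlp/SWE-bench", split="test")
example = swe_bench[0]
print(example["repo"], example["instance_id"])
print(example["problem_statement"][:300])
print(example["patch"][:300])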

1 week, 4 days ago @ thesequence.substack.com
The Sequence Radar #: MiniMax-M1 is a Very Impressive Model

A debate about whether reasoning in AI models works like System 1/System 2.

A review of the MCP-Use framework to integrate with MCP servers. You can subscribe to The Sequence below. 📝 Editorial: MiniMax-M1 is a Very Impressive Model. Algorithmic innovation is always interesting when it comes to LLMs.

Last week, we had a very interesting release of a highly innovative model that flew a bit under the radar.

MiniMax-M1 is a new 456B parameter model that redefines efficiency and scale for open-weight models.

In combination, this hybrid architecture allows the model to handle up to 1 million tokens of context natively.

1 week, 6 days ago @ thesequence.substack.com
The Sequence #668: Inside V-JEPA 2: Meta AI's Breakthrough in Self-Supervised Visual World Modeling

Created Using GPT-4o. Have you ever heard of V-JEPA?

This is one of the models that encompass Meta AI’s vision of AGI.

Meta AI's release of V-JEPA 2 (Visual Joint Embedding Predictive Architecture 2) marks a significant evolution in the domain of self-supervised learning and world modeling.

As a successor to the original V-JEPA framework introduced by Yann LeCun and collaborators, V-JEPA 2 extends the paradigm by enhancing architectural scale, pretraining methodology, and semantic abstraction capabilities.

Built upon the theoretical vision of autonomous systems that learn predictive models of the world without labeled supervision, V-JEPA 2 offers a glimpse into a future where embodied AI can …

2 weeks, 1 day ago @ thesequence.substack.com
The Sequence Opinion #667: The Superposition Hypothesis And How it Changed AI Interpretability

Created Using GPT-4o. Mechanistic interpretability—the study of how neural networks internally represent and compute—seeks to illuminate the opaque transformations learned by modern models.

This phenomenon of polysemanticity complicates efforts to reverse-engineer networks and has led to a key theoretical insight: the superposition hypothesis.

The superposition hypothesis proposes that neural networks are not built around one-neuron-per-feature mappings, but rather represent features as directions in high-dimensional activation spaces.

Neural networks, constrained by finite width and encouraged by sparsity in data, adopt a compressed representation strategy in which meaning is woven through a…

2 weeks, 2 days ago @ thesequence.substack.com
The Sequence Engineering #666: An Intro to AI Code Sandbox Environments

While these systems can write and debug software, analyze repositories, and generate full applications, they require an environment where the resulting code can be executed safely, efficiently, and reproducibly.

Running untrusted, AI-generated code directly on host machines is a recipe for disaster.

To address this, a new wave of AI-focused sandbox environments has emerged.

These systems offer secure, fast, and scalable execution for agentic workflows, providing a foundational layer for the next generation of AI-driven development.

This essay explores the key platforms in this space, focusing on E2B, Daytona, Modal, and CodeSandbox, and unpacks their architecture, features, and benefits.

2 weeks, 3 days ago @ thesequence.substack.com
The Sequence Knowledge #665: What Evals can Quantify AGI

Created Using GPT-4o. Today we will discuss: An overview of AGI benchmarks.

Artificial General Intelligence (AGI) benchmarks are indispensable tools for evaluating the reasoning, adaptability, and problem-solving abilities of AI systems.

Unlike narrow AI benchmarks that focus on domain-specific tasks, AGI benchmarks measure the capacity for generalization across a wide array of challenges.

This essay explores key AGI benchmarks that are shaping the future of intelligent systems, emphasizing their significance and unique testing methodologies.

AGI benchmarks are designed to stress-test models' abilities to adapt, reason, and learn from minimal supervision.

2 weeks, 4 days ago @ thesequence.substack.com
The Sequence Radar #664: The Gentle Singularity Is Already Here

You can subscribe to The Sequence below. 📝 Editorial: The Gentle Singularity Is Already Here. In a recent and quietly radical blog post titled "The Gentle Singularity," OpenAI CEO Sam Altman dropped a thesis that reads more like a plot twist than a prediction: the singularity isn’t coming—it’s already arrived.

What makes this singularity "gentle" is its deceptive normalcy.

A gentle singularity doesn’t mean a safe one.

🔎 AI Research. Lab: FAIR at Meta + Mila / Polytechnique Montréal. V-JEPA 2 is a large-scale self-supervised video model trained on over 1 million hours of internet video.

o3-Pro: OpenAI released o3-pro, a new version of its o3 model optimized for longer reasoning tasks.

2 weeks, 6 days ago @ thesequence.substack.com
Synced Review
latest post 2 months, 3 weeks ago
DeepSeek Signals Next-Gen R2 Model, Unveils Novel Approach to Scaling Inference with SPCT

DeepSeek AI, a prominent player in the large language model arena, has recently published a research paper detailing a new technique aimed…

2 months, 3 weeks ago @ medium.com
Automating Artificial Life Discovery: The Power of Foundation Models

The recent Nobel Prize for groundbreaking advancements in protein discovery underscores the transformative potential of foundation models…

6 months ago @ medium.com
Llama 3 Meets MoE: Pioneering Low-Cost High-Performance AI


6 months, 1 week ago @ medium.com
DeepMind’s JetFormer: Unified Multimodal Models Without Modelling Constraints

Recent advancements in training large multimodal models have been driven by efforts to eliminate modeling constraints and unify…

6 months, 1 week ago @ medium.com
NVIDIA’s nGPT: Revolutionizing Transformers with Hypersphere Representation

The Transformer architecture, introduced by Vaswani et al. in 2017, serves as the backbone of contemporary language models. Over the years…

6 months, 2 weeks ago @ medium.com
From Token to Conceptual: Meta Introduces Large Concept Models in Multilingual AI

Large Language Models (LLMs) have become indispensable tools for diverse natural language processing (NLP) tasks. Traditional LLMs operate…

6 months, 2 weeks ago @ medium.com
NVIDIA’s Hybrid: Combining Attention and State Space Models for Breakthrough Performance of Small…

Language models (LMs) based on transformers have become the gold standard in natural language processing, thanks to their exceptional…

6 months, 3 weeks ago @ medium.com
From Response to Query: The Power of Reverse Thinking in Language Models


6 months, 3 weeks ago @ medium.com
Yann LeCun Team’s New Research: Revolutionizing Visual Navigation with Navigation World Models

Navigation is a fundamental skill for any visually-capable organism, serving as a critical tool for survival. It enables agents to locate…

6 months, 4 weeks ago @ medium.com
The Future of Vision AI: How Apple’s AIMV2 Leverages Images and Text to Lead the Pack

The landscape of vision model pre-training has undergone significant evolution, especially with the rise of Large Language Models (LLMs)…

6 months, 4 weeks ago @ medium.com
Redefining Music AI: The Power of Sony’s SoniDo as a Versatile Foundation Model

A foundation model refers to a pre-trained model developed on extensive datasets, designed to be versatile and adaptable for a range of…

7 months ago @ medium.com
DeepMind’s Socratic Learning with Language Games: The Path to Self-Improving Superintelligence


7 months, 1 week ago @ medium.com
Revolutionizing AI on a Budget: Apple’s Roadmap for Small Language Models Training Success

While large language models (LLMs) dominate the AI landscape, Small-scale Large Language Models (SLMs) are gaining traction as…

7 months, 1 week ago @ medium.com
“Redefines Consistency Models”: OpenAI’s TrigFlow Narrows FID Gap to 10% with Efficient Two-Step…

Consistency models (CMs) are a cutting-edge class of diffusion-based generative models designed for rapid and efficient sampling. However…

7 months, 1 week ago @ medium.com
Precision in Pixels: NVIDIA’s Edify Image Model Combines High Quality with Unmatched Control

The field of text-to-image synthesis has advanced rapidly, with state-of-the-art models now generating highly realistic and diverse images…

7 months, 1 week ago @ medium.com
📓 Cool Blogs
ODS.ai Habr
latest post 3 months, 1 week ago
A Bayesian Dog: Analyzing the Canine Compass

", I thought. And luckily, I happened to have the perfect test subject right at hand.

The standard arithmetic mean of 360° and 0° gives us 180°, even though both 360° and 0° point in the same direction.

The null hypothesis states that the data are distributed uniformly around the circle; the alternative is that they are not.

from pingouin import circ_vtest
v, pval = circ_vtest(data['radians'], dir=np.pi)
print(f"V-statistics: {v:.3f}; p-value: {pval:.6f}")
>> V-statistics: 24.127; p-value: 0.002904
Now we are getting to something interesting!

Prior distribution and likelihood function: suppose we have a prior distribution with parameters …, and a likelihood function for a new observation with p…
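A quick NumPy illustration of the point above, showing why the naive average of 360° and 0° fails and how the circular mean (averaging unit vectors) fixes it:

import numpy as np

angles_deg = np.array([360.0, 0.0])                 # both point the same way
naive_mean = angles_deg.mean()                      # 180.0, clearly wrong

# Circular mean: average the unit vectors, then take the angle of the result.
radians = np.deg2rad(angles_deg)
circular_mean = np.rad2deg(np.arctan2(np.sin(radians).mean(),
                                      np.cos(radians).mean())) % 360

print(f"naive mean: {naive_mean:.1f} deg, circular mean: {circular_mean:.1f} deg")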

3 months, 1 week ago @ habr.com
Creating Memories: Mastering FLUX, LoRA, and ComfyUI

Such models can be trained from scratch, but that is expensive: you need a GPU cluster and a lot of data.

In the text-to-image domain there are open models, such as Stable Diffusion, Kandinsky, and FLUX, and closed ones, such as DALL-E. An open model can be fine-tuned in different ways.

Boris Strugatsky. Specifics: for figures like the Strugatskys or Brodsky there are very few high-quality photographs, but you don't need many.

A phrase works as well.

Vladimir Surdin, Alexey Semikhatov. Videos of their lectures are easy to find everywhere; you can start with the «Вселенная плюс» channel on YouTube and on Telegram.

6 months ago @ habr.com
How Neural Networks, RL, and Bayesian Optimization Came to Be Used at Charged-Particle Accelerators

One of them is maintaining a stable orbit of the particle beam (the trajectory along which it moves), which is critically important for the accuracy of experiments.

Incidentally, here is our paper on how seismic vibrations will affect the beam orbit at SKIF: Beam Stability.

Classical approaches to beam-orbit stabilization in accelerators: orbit stabilization relies on beam position monitors (BPMs) and magnetic correctors.

According to the authors, the advantages are: faster orbit correction and higher accuracy compared with classical methods such as SVD.

The agent's task: automatically restore the charged-particle beam orbit within a limited time and with a minimal num…

6 months, 2 weeks ago @ habr.com
o1: Why OpenAI's New GPT Is Not Hype but a Transition to a New Paradigm in AI

In this article, we will look at what the new GPT o1 has learned and how that will affect the further evolution of AI.

The company claims that resetting the model line's counter back to one marks a transition to a new paradigm, and that this network demonstrates an entirely new level of AI capability.

Forums and Twitter were full of discussion, anticipation, and hype, against which some people's expectations shot sky-high.

Bloomberg reported that during an internal demonstration OpenAI presented a five-level framework for tracking progress toward building AI.

However, at the GPT-5 level the gain in capability may turn out quite different (for better or for worse).

9 months, 2 weeks ago @ habr.com
Big Black Boxes: What Do We Know About How Neural Networks "Think"?

In both cases, the explanation of the action is unrelated to the real motive for doing it, and in both cases a fake (but plausible-sounding) explanation of the reasons is produced.

It's just that this is not taken seriously yet, since LLMs are not widespread and have not become the core of business processes that involve decision-making.

The same prompt-plus-response text is fed into the model, and the probability of producing exactly that response given the fixed prompt is estimated.

This includes the desire to keep existing/living, the unwillingness to die, and musings about emotions and control.

Because of abstraction, because of generalization, because that is exactly what we value these models for.
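A minimal sketch of the probability estimate mentioned above, assuming a small causal LM from transformers (gpt2 here only as a placeholder): the response's log-probability is the sum of per-token log-probabilities conditioned on the fixed prompt.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Question: what do you want?\nAnswer:"
response = " I want to keep existing."

ids = tok(prompt + response, return_tensors="pt").input_ids
prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]

with torch.no_grad():
    logits = model(ids).logits

# Token at position i is predicted from logits at position i-1.
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
targets = ids[0, 1:]
per_token = log_probs[torch.arange(targets.shape[0]), targets]
response_log_prob = per_token[prompt_len - 1:].sum()
print("log P(response | prompt) =", response_log_prob.item())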

9 months, 3 weeks ago @ habr.com
Machine Learning Mastery
latest post None
ML in Production
latest post None
Sorta Insightful
latest post 3 months ago
Who is AI For?

I think the easy answer to this question is that right now, AI is for the AI developers.

Code is useful, it makes money, it is a testbed for AI speeding up the development of AI, and it is easy.

I’m working in AI because it pays well and is potentially really good for the world.

The artists did not know what AI was, but when they learned, they quickly decided they did not want it.

It feels like the most likely outcome is that people go all-in on pushing raw intelligence, in the way that AI developers can measure it, leaving behind those that are not like AI developers.

3 months ago @ alexirpan.com
MIT Mystery Hunt 2025

This has spoilers for MIT Mystery Hunt 2025.

I enjoyed it more than their 2018 Hunt, which is commonly cited as an all-time good Mystery Hunt.

In this Mystery Hunt it was reversed, where the act of unlocking is easy but the value and difficulty of a feeder varied.

In my free time pre-Hunt, I went to Puzzled Pint, where I tried to all-brain a logic puzzle (solve it without writing anything).

I’m looking forward to solving “No Assembly Required” in Mystery Hunt 2026, a puzzle that gives you the answer for no work.

5 months, 1 week ago @ alexirpan.com
Using AI to Get the Neopets Destruct-o-Match Avatar

If AI can be superhuman at Go, surely AI can be slightly-worse-than-experts at Destruct-o-Match if we try?

Step 0: Is Making a Destruct-o-Match AI Against Neopets Rules?

I believe the precedent is in favor of a Destruct-o-Match AI being okay.

As long as I’m the one inputting moves the Destruct-o-Match AI recommends, I should be okay.

To write a game AI, we first need to implement the rules of the game in code.

5 months, 3 weeks ago @ alexirpan.com
Late Takes on OpenAI o1

I realize how late this is, but I didn’t get a post out while o1 was fresh, and still feel like writing one despite it being cold.

(Also, OpenAI just announced they’re going to ship new stuff starting tomorrow so it’s now or never to say something.)

OpenAI o1 is a model release widely believed (but not confirmed) to be a post-trained version of GPT-4o.

If true, that makes this video especially useful for understanding OpenAI o1.

Which I suppose is part of why I’m talking about o1 rather than building o1.

7 months ago @ alexirpan.com
Lil'Log
latest post None
The Spectator
latest post None
Off the Convex Path
latest post None
fast.ai NLP
latest post None
Sebastian Ruder
latest post None
Andrej Karpathy blog
latest post None
大トロ
latest post None
🔬 Science
Papers With Code
latest post 5 days, 4 hours ago
/mehrdadsaberi/ Model State Arithmetic for Machine Unlearning

Large language models are trained on massive corpora of web data, which may include private data, copyrighted material, factually inaccurate data, or data that degrades model performance.

Eliminating the influence of such problematic datapoints through complete retraining -- by repeatedly pretraining the model on datasets that exclude these specific instances -- is computationally prohibitive.

For this reason, unlearning algorithms have emerged that aim to eliminate the influence of particular datapoints, while otherwise preserving the model -- at a low computational cost.

However, precisely estimating and undoing the influence of individual datapoints has proved to be challenging.

In this …

5 days, 4 hours ago @ paperswithcode.com
/quester-one/ Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents

As Multimodal Large Language Models (MLLMs) advance, multimodal agents show promise in real-world tasks like web navigation and embodied intelligence.

A promising approach is to use reward models as external feedback, but there is no clear guidance on how to select reward models for agents.

The benchmark is characterized by three key features: (1) Multiple dimensions and real-world agent scenarios evaluation.

It covers perception, planning, and safety with 7 scenarios; (2) Step-level reward evaluation.

Experiments demonstrate that even state-of-the-art multimodal models show limited performance, highlighting the need for specialized training in agent reward modeling.

5 days, 4 hours назад @ paperswithcode.com
/huggingface/ FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
/huggingface/ FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language /huggingface/ FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Pre-training state-of-the-art large language models (LLMs) requires vast amounts of clean and diverse text data.

While the open development of large high-quality English pre-training datasets has seen substantial recent progress, training performant multilingual LLMs remains a challenge, in large part due to the inherent difficulty of tailoring filtering and deduplication pipelines to a large number of languages.

In this work, we introduce a new pre-training dataset curation pipeline based on FineWeb that can be automatically adapted to support any language.

Ultimately, we show that our pipeline can be used to create non-English corpora that produce more performant models than prior dataset…

5 days, 4 hours назад @ paperswithcode.com
/seon82/ A Hierarchical Deep Learning Approach for Minority Instrument Detection
/seon82/ A Hierarchical Deep Learning Approach for Minority Instrument Detection /seon82/ A Hierarchical Deep Learning Approach for Minority Instrument Detection

Identifying instrument activities within audio excerpts is vital in music information retrieval, with significant implications for music cataloging and discovery.

Prior deep learning endeavors in musical instrument recognition have predominantly emphasized instrument classes with ample data availability.

Recent studies have demonstrated the applicability of hierarchical classification in detecting instrument activities in orchestral music, even with limited fine-grained annotations at the instrument level.

This work presents various strategies to integrate hierarchical structures into models and tests a new class of models for hierarchical music prediction.

This study showcases more reliabl…

5 days, 4 hours назад @ paperswithcode.com
/sri-csl/ Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference
/sri-csl/ Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference /sri-csl/ Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference

Prior work has made Bayesian deep learning-based approaches to this problem more tractable by performing inference over the low-rank adaptation (LoRA) parameters of a fine-tuned model.

In this work we present $\textbf{Scala}$ble $\textbf{B}$ayesian $\textbf{L}$ow-Rank Adaptation via Stochastic Variational Subspace Inference (ScalaBL).

We perform Bayesian inference in an $r$-dimensional subspace, for LoRA rank $r$.

By repurposing the LoRA parameters as projection matrices, we are able to map samples from this subspace into the full weight space of the LLM.

This allows us to learn all the parameters of our approach using stochastic variational inference.
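
A minimal sketch of the sampling path under stated assumptions: a variational Gaussian over an r-dimensional subspace whose samples are mapped into a LoRA-style rank-r update as B diag(z) A; the exact composition used by ScalaBL may differ.

```python
# Sketch: reparameterized sampling in an r-dimensional variational subspace,
# projected into a LoRA-style weight update. The composition delta_W = B diag(z) A
# is an assumption made here for illustration.
import torch

r, d_out, d_in = 8, 512, 512
A = torch.randn(r, d_in) * 0.01                 # LoRA "down" factor, reused as a projector
B = torch.randn(d_out, r) * 0.01                # LoRA "up" factor
mu = torch.zeros(r, requires_grad=True)         # variational mean over the subspace
log_sigma = torch.zeros(r, requires_grad=True)  # variational log-std over the subspace

def sample_delta_w() -> torch.Tensor:
    """Sample z ~ N(mu, sigma^2) in R^r and map it into the full weight space."""
    z = mu + torch.exp(log_sigma) * torch.randn(r)
    return B @ torch.diag(z) @ A                # rank-r update added to the frozen base weight

# Stochastic variational inference would optimize (mu, log_sigma) -- and, per the
# abstract, the LoRA factors -- against an ELBO; only the sampling path is shown.
```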

5 days, 4 hours назад @ paperswithcode.com
/yihong-97/ Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation
/yihong-97/ Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation /yihong-97/ Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation

Panoramic image processing is essential for omni-context perception, yet faces constraints like distortions, perspective occlusions, and limited annotations.

Previous unsupervised domain adaptation methods transfer knowledge from labeled pinhole data to unlabeled panoramic images, but they require access to source pinhole data.

To address these, we introduce a more practical task, i.e., Source-Free Occlusion-Aware Seamless Segmentation (SFOASS), and propose its first solution, called UNconstrained Learning Omni-Context Knowledge (UNLOCK).

While adapting without relying on source data or target labels, this framework enhances models to achieve segmentation with 360{\deg} viewpoint coverage a…

5 days, 4 hours назад @ paperswithcode.com
/galvinlim/ AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification
/galvinlim/ AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification /galvinlim/ AGTCNet: A Graph-Temporal Approach for Principled Motor Imagery EEG Classification

Brain-computer interface (BCI) technology utilizing electroencephalography (EEG) marks a transformative innovation, empowering motor-impaired individuals to engage with their environment on equal footing.

Despite its promising potential, developing subject-invariant and session-invariant BCI systems remains a significant challenge due to the inherent complexity and variability of neural activity across individuals and over time, compounded by EEG hardware constraints.

While prior studies have sought to develop robust BCI systems, existing approaches remain ineffective in capturing the intricate spatiotemporal dependencies within multichannel EEG signals.

This study addresses this gap by int…

5 days, 4 hours назад @ paperswithcode.com
/heboyong/ Boosting Domain Generalized and Adaptive Detection with Diffusion Models: Fitness, Generalization, and Transferability
/heboyong/ Boosting Domain Generalized and Adaptive Detection with Diffusion Models: Fitness, Generalization, and Transferability /heboyong/ Boosting Domain Generalized and Adaptive Detection with Diffusion Models: Fitness, Generalization, and Transferability

Detectors often suffer from performance drop due to domain gap between training and testing data.

Recent methods explore diffusion models applied to domain generalization (DG) and adaptation (DA) tasks, but still struggle with large inference costs and have not yet fully leveraged the capabilities of diffusion models.

We also apply consistency loss to align the auxiliary and ordinary branch, balancing fitness and generalization while preventing overfitting and improving performance on target domains (i.e., Generalization).

Furthermore, within a unified framework, standard detectors are guided by diffusion detectors through feature-level and object-level alignment on source domains (for DG) …

5 days, 4 hours назад @ paperswithcode.com
/yannkerzreho/ Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games
/yannkerzreho/ Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games /yannkerzreho/ Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games

This paper introduces a new approach for approximating the learning dynamics of multiple reinforcement learning (RL) agents interacting in a finite-state Markov game.

The idea is to rescale the learning process by simultaneously reducing the learning rate and increasing the update frequency, effectively treating the agent's parameters as a slow-evolving variable influenced by the fast-mixing game state.

Under mild assumptions-ergodicity of the state process and continuity of the updates-we prove the convergence of this rescaled process to an ordinary differential equation (ODE).

This ODE provides a tractable, deterministic approximation of the agent's learning dynamics.

An implementation of…
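
A schematic way to state the limit described above, with g denoting a generic per-step update and mu_theta the stationary state distribution under parameters theta (notation assumed here for illustration, not taken from the paper):

```latex
% Schematic two-timescale limit; g and \mu_\theta are illustrative notation.
\[
  \theta_{k+1} \;=\; \theta_k + \varepsilon\, g(\theta_k, s_k)
  \qquad\longrightarrow\qquad
  \dot{\theta}(t) \;=\; \mathbb{E}_{s \sim \mu_{\theta(t)}}\!\left[ g(\theta(t), s) \right]
  \quad \text{as } \varepsilon \to 0 .
\]
```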

5 days, 4 hours назад @ paperswithcode.com
/hwang52/ FedSC: Federated Learning with Semantic-Aware Collaboration
/hwang52/ FedSC: Federated Learning with Semantic-Aware Collaboration /hwang52/ FedSC: Federated Learning with Semantic-Aware Collaboration

Federated learning (FL) aims to train models collaboratively across clients without sharing data, in order to preserve privacy.

However, one major challenge is the data heterogeneity issue, which refers to the biased labeling preferences at multiple clients.

To explore the possibility of using intra-client semantically meaningful knowledge in handling data heterogeneity, in this paper, we propose Federated Learning with Semantic-Aware Collaboration (FedSC) to capture client-specific and class-relevant knowledge across heterogeneous clients.

The core idea of FedSC is to construct relational prototypes and consistent prototypes at semantic-level, aiming to provide fruitful class underlying knowledge…

5 days, 4 hours назад @ paperswithcode.com
/ins-amu/ Amortizing personalization in virtual brain twins
/ins-amu/ Amortizing personalization in virtual brain twins /ins-amu/ Amortizing personalization in virtual brain twins

Virtual brain twins are personalized digital models of an individual human subject's or patient's brain, allowing for mechanistic interpretation of neuroimaging data features.

Training and inference with these models, however, present a pair of challenges: large shared infrastructure does not allow the use of personal data, and inference in clinical applications should not require significant resources.

We introduce "anonymized personalization" to address both by expanding model priors to include personalization which under amortized inference allows training to be performed anonymously, while inference is both personalized and lightweight.

We illustrate the basic approach, demonstrate reliability …

5 days, 4 hours назад @ paperswithcode.com
/tianyi-lab/ FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
/tianyi-lab/ FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing /tianyi-lab/ FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

We develop a cost-efficient neurosymbolic agent to address challenging multi-turn image editing tasks such as "Detect the bench in the image while recoloring it to pink.

Also, remove the cat for a clearer view and recolor the wall to yellow.''

It combines the fast, high-level subtask planning by large language models (LLMs) with the slow, accurate, tool-use, and local A$^*$ search per subtask to find a cost-efficient toolpath -- a sequence of calls to AI tools.

By comparing with recent image editing approaches, we demonstrate FaSTA$^*$ is significantly more computationally efficient while remaining competitive with the state-of-the-art baseline in terms of success rate.
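
For the local search component, a generic A* over tool sequences looks roughly like the sketch below; the state representation, cost model, and heuristic are placeholders rather than FaSTA*'s actual ones.

```python
# Generic A* over toolpaths: find the cheapest sequence of tool calls that
# reaches a goal predicate. Costs, the goal test, and the heuristic are placeholders.
import heapq
import itertools
from typing import Callable, Dict, Hashable, List, Tuple

def astar_toolpath(
    start: Hashable,
    is_goal: Callable[[Hashable], bool],
    apply_tool: Callable[[Hashable, str], Hashable],
    tool_costs: Dict[str, float],            # tool name -> cost of one call
    heuristic: Callable[[Hashable], float],  # admissible estimate of remaining cost
) -> List[str]:
    """Return the cheapest sequence of tool calls from start to a goal state."""
    tie = itertools.count()                  # tiebreaker so states are never compared
    frontier: List[Tuple[float, int, float, Hashable, List[str]]] = [
        (heuristic(start), next(tie), 0.0, start, [])
    ]
    best_cost = {start: 0.0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for tool, cost in tool_costs.items():
            nxt = apply_tool(state, tool)
            ng = g + cost
            if ng < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = ng
                heapq.heappush(
                    frontier, (ng + heuristic(nxt), next(tie), ng, nxt, path + [tool])
                )
    return []                                # no toolpath reaches the goal
```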


5 days, 4 hours назад @ paperswithcode.com
/ai-agi/ TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding
/ai-agi/ TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding /ai-agi/ TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding

Existing multimodal large language models (MLLMs) struggle with such WildStruct conditions, resulting in limited performance and poor generalization.

To address these challenges, we propose TableMoE, a neuro-symbolic Mixture-of-Connector-Experts (MoCE) architecture specifically designed for robust, structured reasoning over multimodal table data.

TableMoE features an innovative Neuro-Symbolic Routing mechanism, which predicts latent semantic token roles (e.g., header, data cell, axis, formula) and dynamically routes table elements to specialized experts (Table-to-HTML, Table-to-JSON, Table-to-Code) using a confidence-aware gating strategy informed by symbolic reasoning graphs.

Extensive abl…

5 days, 4 hours назад @ paperswithcode.com
/jianghaiscu/ Learning to See in the Extremely Dark
/jianghaiscu/ Learning to See in the Extremely Dark /jianghaiscu/ Learning to See in the Extremely Dark

Learning-based methods have made promising advances in low-light RAW image enhancement, but their capability in extremely dark scenes, where the environmental illuminance drops as low as 0.0001 lux, remains to be explored due to the lack of corresponding datasets.

To this end, we propose a paired-to-paired data synthesis pipeline capable of generating well-calibrated extremely low-light RAW images at three precise illuminance ranges of 0.01-0.1 lux, 0.001-0.01 lux, and 0.0001-0.001 lux, together with high-quality sRGB references to comprise a large-scale paired dataset named See-in-the-Extremely-Dark (SIED) to benchmark low-light RAW image enhancement approaches.

Extensive experiments on th…

5 days, 4 hours назад @ paperswithcode.com
/unabletousegit/ Task-Aware KV Compression For Cost-Effective Long Video Understanding
/unabletousegit/ Task-Aware KV Compression For Cost-Effective Long Video Understanding /unabletousegit/ Task-Aware KV Compression For Cost-Effective Long Video Understanding

Recent approaches have explored KV compression to mitigate this issue, but they often suffer from significant information loss at high compression ratios.

In this paper, we introduce Video-X^2L, which flexibly preserves critical video information for each LVU task.

The first one is called bi-level KV compression.

During the MLLM's pre-filling stage, Video-X^2L generates two types of compressed KVs: low-compression KVs (L-KVs) to capture fine-grained video details and high-compression KVs (H-KVs) to offer compact video representations.

During the MLLM's decoding stage, Video-X^2L selectively re-loads L-KVs for the most critical video chunks while using H-KVs for other less important ones.
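
A toy sketch of that selection step, assuming placeholder per-chunk relevance scores and fixed per-chunk KV sizes (Video-X^2L's actual scoring and budgeting are not described in this snippet):

```python
# Toy sketch: choose low-compression (L-KV) caches for the most relevant chunks
# and high-compression (H-KV) caches elsewhere, under a token budget.
from typing import Dict, List

def select_kv_per_chunk(
    relevance: List[float],   # task-relevance score per video chunk (placeholder)
    l_kv_tokens: int,         # tokens consumed by one chunk's L-KV
    h_kv_tokens: int,         # tokens consumed by one chunk's H-KV
    token_budget: int,
) -> Dict[int, str]:
    """Greedily upgrade the most relevant chunks from H-KV to L-KV within budget."""
    n = len(relevance)
    choice = {i: "H-KV" for i in range(n)}
    used = n * h_kv_tokens
    for i in sorted(range(n), key=lambda i: relevance[i], reverse=True):
        upgrade_cost = l_kv_tokens - h_kv_tokens
        if used + upgrade_cost <= token_budget:
            choice[i] = "L-KV"
            used += upgrade_cost
    return choice

# Example: 6 chunks, a budget that allows exactly two upgrades.
# print(select_kv_per_chunk([0.9, 0.1, 0.7, 0.2, 0.4, 0.3], 400, 100, 1300))
```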

5 days, 4 hours назад @ paperswithcode.com
Papers With Code Papers With Code
последний пост 5 days, 4 hours назад
/everm0re/ EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora
/everm0re/ EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora /everm0re/ EraRAG: Efficient and Incremental Retrieval Augmented Generation for Growing Corpora

Graph-based Retrieval-Augmented Generation (Graph-RAG) enhances large language models (LLMs) by structuring retrieval over an external corpus.

To address these limitations, we introduce EraRAG, a novel multi-layered Graph-RAG framework that supports efficient and scalable dynamic updates.

The design eliminates the need for retraining or costly recomputation while preserving high retrieval accuracy and low latency.

Experiments on large-scale benchmarks demonstrate that EraRAG achieves up to an order of magnitude reduction in update time and token consumption compared to existing Graph-RAG systems, while providing superior accuracy performance.

This work offers a practical path forward for RA…

5 days, 4 hours назад @ paperswithcode.com
/danialmoa/ Robust Deep Learning for Myocardial Scar Segmentation in Cardiac MRI with Noisy Labels
/danialmoa/ Robust Deep Learning for Myocardial Scar Segmentation in Cardiac MRI with Noisy Labels /danialmoa/ Robust Deep Learning for Myocardial Scar Segmentation in Cardiac MRI with Noisy Labels

The accurate segmentation of myocardial scars from cardiac MRI is essential for clinical assessment and treatment planning.

In this study, we propose a robust deep-learning pipeline for fully automated myocardial scar detection and segmentation by fine-tuning state-of-the-art models.

We evaluate the model's performance on both acute and chronic cases and demonstrate its ability to produce accurate and smooth segmentations despite noisy labels.

In particular, our approach outperforms state-of-the-art models like nnU-Net and shows strong generalizability in an out-of-distribution test set, highlighting its robustness across various imaging conditions and clinical tasks.

These results establis…

5 days, 4 hours назад @ paperswithcode.com
/antoniolopezmc/ Discovering multiple antibiotic resistance phenotypes using diverse top-k subgroup list discovery
/antoniolopezmc/ Discovering multiple antibiotic resistance phenotypes using diverse top-k subgroup list discovery /antoniolopezmc/ Discovering multiple antibiotic resistance phenotypes using diverse top-k subgroup list discovery

Patient phenotyping is the task of finding a set of patient characteristics related to a specific medical problem such as the one described in this work.

However, a single explanation of a medical phenomenon might be useless in the eyes of a clinical expert and be discarded.

The discovery of multiple patient phenotypes for the same medical phenomenon would be useful in such cases.

Our proposal provides clinicians with a method with which to obtain multiple and diverse phenotypes of a set of patients.

We show a real use case of phenotyping in antimicrobial resistance using the well-known MIMIC-III dataset.

5 days, 4 hours назад @ paperswithcode.com
/lcs0215/ OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography
/lcs0215/ OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography /lcs0215/ OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography

As one of the earliest ancient languages, Oracle Bone Script (OBS) encapsulates the cultural records and intellectual expressions of ancient civilizations.

To address these challenges, this paper proposes a novel two-stage semantic typography framework, named OracleFusion.

In the second stage, we introduce Oracle Structural Vector Fusion (OSVF), incorporating glyph structure constraints and glyph maintenance constraints to ensure the accurate generation of semantically enriched vector fonts.

This approach preserves the objective integrity of the glyph structure, offering visually enhanced representations that assist experts in deciphering OBS.

Furthermore, OracleFusion provides expert-like …

5 days, 4 hours назад @ paperswithcode.com
/lihe-maxsize/ Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation
/lihe-maxsize/ Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation /lihe-maxsize/ Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation

Therefore, estimating the counterfactual outcomes with spatial-temporal attributes is a crucial problem.

This paper proposes a novel framework for estimating counterfactual outcomes with spatial-temporal attributes using the Transformer, exhibiting stronger estimation ability.

To validate the effectiveness of our approach, we conduct simulation experiments and real data experiments.

Simulation experiments show that our estimator has a stronger estimation capability than baseline methods.

Real data experiments provide a valuable conclusion to the causal effect of conflicts on forest loss in Colombia.

5 days, 4 hours назад @ paperswithcode.com
/aoqunjin/ Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends
/aoqunjin/ Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends /aoqunjin/ Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends

Vision-language-action (VLA) models extend vision-language models (VLM) by integrating action generation modules for robotic manipulation.

Evidence from multiple domains highlights the critical role of post-training to align foundational models with downstream applications, spurring extensive research on post-training VLA models.

Accordingly, this paper reviews post-training strategies for VLA models through the lens of human motor learning, focusing on three dimensions: environments, embodiments, and tasks.

Finally, key challenges and trends in post-training VLA models are identified, establishing a conceptual framework to guide future research.

This work delivers both a comprehensive over…

5 days, 4 hours назад @ paperswithcode.com
/pd162/ Class-Agnostic Region-of-Interest Matching in Document Images
/pd162/ Class-Agnostic Region-of-Interest Matching in Document Images /pd162/ Class-Agnostic Region-of-Interest Matching in Document Images

Document understanding and analysis have received a lot of attention due to their widespread application.

However, existing document analysis solutions, such as document layout analysis and key information extraction, are only suitable for fixed category definitions and granularities, and cannot achieve flexible applications customized by users.

Therefore, this paper defines a new task named ``Class-Agnostic Region-of-Interest Matching'' (``RoI-Matching'' for short), which aims to match the customized regions in a flexible, efficient, multi-granularity, and open-set manner.

The visual prompt of the reference document and target document images are fed into our model, while the output is the…

5 days, 4 hours назад @ paperswithcode.com
/xiweix/ ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation
/xiweix/ ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation /xiweix/ ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation

Training-free open-vocabulary semantic segmentation (OVS) aims to segment images given a set of arbitrary textual categories without costly model fine-tuning.

Existing solutions often explore attention mechanisms of pre-trained models, such as CLIP, or generate synthetic data and design complex retrieval processes to perform OVS.

However, their performance is limited by the capability of reliant models or the suboptimal quality of reference sets.

In this work, we investigate the largely overlooked data quality problem for this challenging dense scene understanding task, and identify that a high-quality reference set can significantly benefit training-free OVS.

Remarkably, extensive evaluati…

5 days, 4 hours назад @ paperswithcode.com
/shuoyang2/ RecCoT: Enhancing Recommendation via Chain-of-Thought
/shuoyang2/ RecCoT: Enhancing Recommendation via Chain-of-Thought /shuoyang2/ RecCoT: Enhancing Recommendation via Chain-of-Thought

In real-world applications, users always interact with items in multiple aspects, such as through implicit binary feedback (e.g., clicks, dislikes, long views) and explicit feedback (e.g., comments, reviews).

Modern recommendation systems (RecSys) learn user-item collaborative signals from these implicit feedback signals as large-scale binary data streams, subsequently recommending other highly similar items based on users' personalized historical interactions.

Consequently, under this binary learning paradigm, the RecSys struggles to understand why a user likes or dislikes certain items.

To alleviate it, some works attempt to utilize the content-based reviews to capture the semantic kn…

5 days, 4 hours назад @ paperswithcode.com
/haoang97/ Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?
/haoang97/ Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? /haoang97/ Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?

Causal reasoning capability is critical in advancing large language models (LLMs) toward strong artificial intelligence.

Specifically, LLMs are only capable of performing shallow (level-1) causal reasoning, primarily attributed to the causal knowledge embedded in their parameters, but they lack the capacity for genuine human-like (level-2) causal reasoning.

To bridge the gap towards level-2 causal reasoning, we draw inspiration from the fact that human reasoning is usually facilitated by general knowledge and intended goals.

We propose G^2-Reasoner, a method that incorporates general knowledge and goal-oriented prompts into LLMs' causal reasoning processes.

Experiments demonstrate that G^2-…

5 days, 4 hours назад @ paperswithcode.com
/chenkaisun/ Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation
/chenkaisun/ Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation /chenkaisun/ Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation

Given the growing influence of language model-based agents on high-stakes societal decisions, from public policy to healthcare, ensuring their beneficial impact requires understanding the far-reaching implications of their suggestions.

We propose a proof-of-concept framework that projects how model-generated advice could propagate through societal systems on a macroscopic scale over time, enabling more robust alignment.

To assess the long-term safety awareness of language models, we also introduce a dataset of 100 indirect harm scenarios, testing models' ability to foresee adverse, non-obvious outcomes from seemingly harmless user prompts.

Our approach achieves not only over 20% improvement…

5 days, 4 hours назад @ paperswithcode.com
/7uheng/ Out-of-Distribution Semantic Occupancy Prediction
/7uheng/ Out-of-Distribution Semantic Occupancy Prediction /7uheng/ Out-of-Distribution Semantic Occupancy Prediction

3D Semantic Occupancy Prediction is crucial for autonomous driving, providing a dense, semantically rich environmental representation.

However, existing methods focus on in-distribution scenes, making them susceptible to Out-of-Distribution (OoD) objects and long-tail distributions, which increases the risk of undetected anomalies and misinterpretations, posing safety hazards.

To address these challenges, we introduce Out-of-Distribution Semantic Occupancy Prediction, targeting OoD detection in 3D voxel space.

We introduce OccOoD, a novel framework integrating OoD detection into 3D semantic occupancy prediction, with Voxel-BEV Progressive Fusion (VBPF) leveraging an RWKV-based branch to enh…

5 days, 4 hours назад @ paperswithcode.com
/V3RGANz/ JointRank: Rank Large Set with Single Pass
/V3RGANz/ JointRank: Rank Large Set with Single Pass /V3RGANz/ JointRank: Rank Large Set with Single Pass

Efficiently ranking relevant items from large candidate pools is a cornerstone of modern information retrieval systems -- such as web search, recommendation, and retrieval-augmented generation.

Listwise rerankers, which improve relevance by jointly considering multiple candidates, are often limited in practice: either by model input size constraints, or by degraded quality when processing large sets.

We propose a model-agnostic method for fast reranking of large sets that exceed a model's input limits.

The method first partitions candidate items into overlapping blocks, each of which is ranked independently in parallel.

Finally, these comparisons are aggregated to construct a global ranking usin…
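
A simplified sketch of this pipeline, using plain win counts to aggregate the per-block rankings; the paper's block construction and aggregation method may differ.

```python
# Simplified JointRank-style sketch: split candidates into overlapping blocks,
# rank each block independently (in parallel in practice), and aggregate the
# implied pairwise comparisons into a global order via win counts.
from collections import defaultdict
from typing import Callable, List, Sequence

def joint_rank(
    items: Sequence[str],
    rank_block: Callable[[Sequence[str]], List[str]],  # e.g., one listwise reranker call
    block_size: int,
    overlap: int,
) -> List[str]:
    step = max(1, block_size - overlap)
    blocks = [items[i:i + block_size] for i in range(0, len(items), step)]
    wins = defaultdict(int)
    for block in blocks:
        ranked = rank_block(block)
        for pos, item in enumerate(ranked):
            wins[item] += len(ranked) - pos - 1   # item "beats" everything below it
    return sorted(items, key=lambda it: wins[it], reverse=True)

# Usage with a stand-in scorer (a real system would call an LLM reranker):
# ranking = joint_rank(candidates, lambda b: sorted(b, key=score, reverse=True), 10, 3)
```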

5 days, 4 hours назад @ paperswithcode.com
/k2-fsa/ ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
/k2-fsa/ ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching /k2-fsa/ ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Existing large-scale zero-shot text-to-speech (TTS) models deliver high speech quality but suffer from slow inference speeds due to massive parameters.

To address this issue, this paper introduces ZipVoice, a high-quality flow-matching-based zero-shot TTS model with a compact model size and fast inference speed.

Experiments on 100k hours of multilingual data show that ZipVoice matches state-of-the-art models in speech quality, while being 3 times smaller and up to 30 times faster than a DiT-based flow-matching baseline.

Codes, model checkpoints and demo samples are publicly available.


1 week, 1 day назад @ paperswithcode.com
/snowflakedb/ Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences
/snowflakedb/ Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences /snowflakedb/ Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences

Long sequences are critical for applications like RAG, long document summarization, multi-modality, etc., and modern LLMs, like Llama 4 Scout, support max sequence length of up to 10 million tokens.

However, outside of enterprise labs, long sequence training is challenging for the AI community with limited system support in the open-source space.

We address this with Arctic Long Sequence Training (ALST).

It offers a combination of attention-agnostic single-GPU and multi-GPU memory optimizations that enable it to support out-of-the-box training at multi-million sequence lengths for a wide variety of HF models.

ALST is fully compatible with HF models and open-sourced via Deepspeed https://www.de…

1 week, 1 day назад @ paperswithcode.com
Papers With Code Papers With Code
последний пост 5 days, 4 hours назад
/kagnlp/ Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team
/kagnlp/ Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team /kagnlp/ Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team


1 week, 1 day назад @ paperswithcode.com
/tencent/ Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details
/tencent/ Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details /tencent/ Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details

In this report, we present Hunyuan3D 2.5, a robust suite of 3D diffusion models aimed at generating high-fidelity and detailed textured 3D assets.

Hunyuan3D 2.5 follows the two-stage pipeline of its previous version, Hunyuan3D 2.0, while demonstrating substantial advancements in both shape and texture generation.

In terms of shape generation, we introduce a new shape foundation model -- LATTICE, which is trained with scaled high-quality datasets, model-size, and compute.

In terms of texture generation, it is upgraded with physically-based rendering (PBR) via a novel multi-view architecture extended from the Hunyuan3D 2.0 Paint model.

Our extensive evaluation shows that Hunyuan3D 2.5 significantly out…

1 week, 1 day назад @ paperswithcode.com
/SaminYeasar/ Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity
/SaminYeasar/ Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity /SaminYeasar/ Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity

In this paper, we investigate the use of small datasets in the context of offline reinforcement learning (RL).

While many common offline RL benchmarks employ datasets with over a million data points, many offline RL applications rely on considerably smaller datasets.

We show that offline RL algorithms can overfit on small datasets, resulting in poor performance.

To address this challenge, we introduce "Sparse-Reg": a regularization technique based on sparsity to mitigate overfitting in offline reinforcement learning, enabling effective learning in limited data settings and outperforming state-of-the-art baselines in continuous control.
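
The snippet does not specify the regularizer's form, so the following shows sparsity-based regularization only in its most generic shape, an L1 penalty on network weights added to the training loss; the penalty form and coefficient are assumptions.

```python
# Illustration only: a sparsity-inducing L1 penalty on network weights added to
# an offline RL loss. The specific regularizer used by Sparse-Reg is not given
# in this snippet, so this form and coefficient are assumptions.
import torch

def l1_sparsity_penalty(model: torch.nn.Module, coeff: float = 1e-4) -> torch.Tensor:
    return coeff * sum(p.abs().sum() for p in model.parameters())

# Inside the training step (critic shown as an example):
# total_loss = critic_loss + l1_sparsity_penalty(critic)
```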


1 week, 1 day назад @ paperswithcode.com
/autogluon/ TabArena: A Living Benchmark for Machine Learning on Tabular Data
/autogluon/ TabArena: A Living Benchmark for Machine Learning on Tabular Data /autogluon/ TabArena: A Living Benchmark for Machine Learning on Tabular Data

With the growing popularity of deep learning and foundation models for tabular data, the need for standardized and reliable benchmarks is higher than ever.

To address this, we introduce TabArena, the first continuously maintained living tabular benchmarking system.

While gradient-boosted trees are still strong contenders on practical tabular datasets, we observe that deep learning methods have caught up under larger time budgets with ensembling.

Finally, we show that ensembles across models advance the state-of-the-art in tabular machine learning and investigate the contributions of individual models.

We launch TabArena with a public leaderboard, reproducible code, and maintenance protocols…

1 week, 1 day назад @ paperswithcode.com
/decisionintelligence/ TAB: Unified Benchmarking of Time Series Anomaly Detection Methods
/decisionintelligence/ TAB: Unified Benchmarking of Time Series Anomaly Detection Methods /decisionintelligence/ TAB: Unified Benchmarking of Time Series Anomaly Detection Methods

Time series anomaly detection (TSAD) plays an important role in many domains such as finance, transportation, and healthcare.

While many TSAD methods already exist, new and better methods are still desirable.

However, effective progress hinges on the availability of reliable means of evaluating new methods and comparing them with existing methods.

Second, TAB covers a variety of TSAD methods, including Non-learning, Machine learning, Deep learning, LLM-based, and Time-series pre-trained methods.

Finally, we employ TAB to evaluate existing TSAD methods and report on the outcomes, thereby offering a deeper insight into the performance of these methods.

1 week, 1 day назад @ paperswithcode.com
/PaddlePaddle/ PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding
/PaddlePaddle/ PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding /PaddlePaddle/ PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding

This report introduces PP-DocBee2, an advanced version of the PP-DocBee, designed to enhance multimodal document understanding.

Built on a large multimodal model architecture, PP-DocBee2 addresses the limitations of its predecessor through key technological improvements, including enhanced synthetic data quality, improved visual feature fusion strategy, and optimized inference methodologies.

A key innovation of our work is a data quality optimization strategy for multimodal document tasks.

By employing a large-scale multimodal pre-trained model to evaluate data, we apply a novel statistical criterion to filter outliers, ensuring high-quality training data.

The source code and pre-trained mo…

1 week, 1 day назад @ paperswithcode.com
/thudm/ LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
/thudm/ LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning /thudm/ LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning

Ultra-long generation by large language models (LLMs) is a widely demanded scenario, yet it remains a significant challenge due to their maximum generation length limit and overall quality degradation as sequence length increases.

Previous approaches, exemplified by LongWriter, typically rely on ''teaching'', which involves supervised fine-tuning (SFT) on synthetic long-form outputs.

However, this strategy heavily depends on synthetic SFT data, which is difficult and costly to construct, often lacks coherence and consistency, and tends to be overly artificial and structurally monotonous.

In this work, we propose an incentivization-based approach that, starting entirely from scratch and with…

1 week, 1 day назад @ paperswithcode.com
/gen-verse/ ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
/gen-verse/ ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs /gen-verse/ ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs

Process Reward Models (PRMs) have recently emerged as a powerful framework for supervising intermediate reasoning steps in large language models (LLMs).

Previous PRMs are primarily trained on model final output responses and struggle to evaluate intermediate thinking trajectories robustly, especially in the emerging setting of trajectory-response outputs generated by frontier reasoning models like Deepseek-R1.

In this work, we introduce ReasonFlux-PRM, a novel trajectory-aware PRM explicitly designed to evaluate the trajectory-response type of reasoning traces.

ReasonFlux-PRM incorporates both step-level and trajectory-level supervision, enabling fine-grained reward assignment aligned with …

1 week, 1 day назад @ paperswithcode.com
/ibm/ AnalogNAS-Bench: A NAS Benchmark for Analog In-Memory Computing
/ibm/ AnalogNAS-Bench: A NAS Benchmark for Analog In-Memory Computing /ibm/ AnalogNAS-Bench: A NAS Benchmark for Analog In-Memory Computing

Analog In-memory Computing (AIMC) has emerged as a highly efficient paradigm for accelerating Deep Neural Networks (DNNs), offering significant energy and latency benefits over conventional digital hardware.

Neural Architecture Search (NAS) is thus needed to systematically discover neural architectures optimized explicitly for AIMC constraints.

However, comparing NAS methodologies and extracting insights about robust architectures for AIMC requires a dedicated NAS benchmark that explicitly accounts for AIMC-specific hardware non-idealities.

To address this, we introduce AnalogNAS-Bench, the first NAS benchmark tailored specifically for AIMC.

These insights highlight the limitations of curre…

1 week, 1 day назад @ paperswithcode.com
/deeps73/ CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation
/deeps73/ CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation /deeps73/ CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation

Large language models (LLMs), despite their ability to perform few-shot machine translation (MT), often lag behind dedicated MT systems trained on parallel corpora, which are crucial for high-quality MT.

However, parallel corpora are often scarce or non-existent for low-resource languages.

In this paper, we propose CycleDistill, a bootstrapping approach leveraging LLMs and few-shot translation to obtain high-quality MT systems.

CycleDistill involves iteratively generating synthetic parallel corpora from monolingual corpora via zero- or few-shot MT, which is then used to fine-tune the model that was used for generating said data for MT.

We also study the effect of lever…
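
A schematic version of that loop, with the translation and fine-tuning steps passed in as placeholder callables:

```python
# Schematic CycleDistill loop: translate a monolingual corpus with the current
# model (zero-/few-shot), build a synthetic parallel corpus, fine-tune on it,
# and repeat. The translate/fine_tune callables are hypothetical placeholders.
from typing import Callable, Iterable, List, Tuple

def cycle_distill(
    model,
    monolingual_corpus: Iterable[str],
    translate: Callable[[object, str], str],                       # zero-/few-shot MT with the current model
    fine_tune: Callable[[object, List[Tuple[str, str]]], object],  # returns the updated model
    num_iterations: int = 3,
):
    for _ in range(num_iterations):
        synthetic_pairs = [(src, translate(model, src)) for src in monolingual_corpus]
        model = fine_tune(model, synthetic_pairs)   # distill the model on its own outputs
    return model
```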

1 week, 1 day назад @ paperswithcode.com
/gsumbul/ SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images
/gsumbul/ SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images /gsumbul/ SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images

From optical sensors to microwave radars, leveraging the complementary strengths of remote sensing (RS) sensors is crucial for achieving dense spatio-temporal monitoring of our planet.

On the contrary, a single model able to modulate its feature representations to accept diverse sensors as input would pave the way to agile and flexible multi-sensor RS data processing.

To address this, we introduce SMARTIES, a generic and versatile foundation model lifting sensor-specific/dependent efforts and enabling scalability and generalization to diverse RS sensors: SMARTIES projects data from heterogeneous sensors into a shared spectrum-aware space, enabling the use of arbitrary combinations of bands …

1 week, 1 day назад @ paperswithcode.com
/glad-ruc/ Fast and Distributed Equivariant Graph Neural Networks by Virtual Node Learning
/glad-ruc/ Fast and Distributed Equivariant Graph Neural Networks by Virtual Node Learning /glad-ruc/ Fast and Distributed Equivariant Graph Neural Networks by Virtual Node Learning

Equivariant Graph Neural Networks (GNNs) have achieved remarkable success across diverse scientific applications.

To address these limitations, we introduce FastEGNN and DistEGNN, two novel enhancements to equivariant GNNs for large-scale geometric graphs.

FastEGNN employs a key innovation: a small ordered set of virtual nodes that effectively approximates the large unordered graph of real nodes.

For extremely large-scale geometric graphs, we present DistEGNN, a distributed extension where virtual nodes act as global bridges between subgraphs in different devices, maintaining consistency while dramatically reducing memory and computational overhead.

Results demonstrate superior efficiency a…

1 week, 1 day назад @ paperswithcode.com
/JacopoDapueto/ Disentangled representations of microscopy images
/JacopoDapueto/ Disentangled representations of microscopy images /JacopoDapueto/ Disentangled representations of microscopy images

Microscopy image analysis is fundamental for different applications, from diagnosis to synthetic engineering and environmental monitoring.

Modern acquisition systems have granted the possibility to acquire an escalating amount of images, requiring a consequent development of a large collection of deep learning-based automatic image analysis methods.

Although deep neural networks have demonstrated great performance in this field, interpretability, an essential requirement for microscopy image analysis, remains an open challenge.

This work proposes a Disentangled Representation Learning (DRL) methodology to enhance model interpretability for microscopy image classification.


1 week, 1 day назад @ paperswithcode.com
/ghimiredhikura/ Loss-Aware Automatic Selection of Structured Pruning Criteria for Deep Neural Network Acceleration
/ghimiredhikura/ Loss-Aware Automatic Selection of Structured Pruning Criteria for Deep Neural Network Acceleration /ghimiredhikura/ Loss-Aware Automatic Selection of Structured Pruning Criteria for Deep Neural Network Acceleration

Structured pruning is a well-established technique for compressing neural networks, making it suitable for deployment in resource-limited edge devices.

This paper presents an efficient Loss-Aware Automatic Selection of Structured Pruning Criteria (LAASP) for slimming and accelerating deep neural networks.

The automatic selection of magnitude or similarity-based filter pruning criteria from a specified pool of criteria and the specific pruning layer at each pruning iteration is guided by the network's overall loss on a small subset of the training data.

The optimal pruning rates for each layer in the network are automatically determined, eliminating the need for manual allocation of fixed or…
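
A generic sketch of loss-guided selection in this spirit: tentatively apply each (criterion, layer) candidate, score it by the loss on a small calibration subset, and keep the best; the candidate criteria and helpers are placeholders, not the exact LAASP procedure.

```python
# Generic illustration of loss-guided selection of a pruning criterion and layer.
# Criteria and helper callables are placeholders.
import copy
from typing import Callable, Dict, List, Tuple

def select_pruning_action(
    model,
    criteria: Dict[str, Callable],         # name -> function that prunes one layer in-place
    layers: List[str],
    eval_loss: Callable[[object], float],  # loss on a small subset of the training data
) -> Tuple[str, str]:
    best, best_loss = ("", ""), float("inf")
    for crit_name, prune_fn in criteria.items():
        for layer_name in layers:
            candidate = copy.deepcopy(model)
            prune_fn(candidate, layer_name)        # apply the candidate pruning
            loss = eval_loss(candidate)
            if loss < best_loss:
                best, best_loss = (crit_name, layer_name), loss
    return best
```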

1 week, 1 day назад @ paperswithcode.com
/guinan-su/ GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
/guinan-su/ GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching /guinan-su/ GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching

Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.

However, such impressive capability typically comes with a substantial model size, which presents significant challenges in deployment and inference.

While structured pruning of model parameters offers a promising way to reduce computational costs at deployment time, current methods primarily focus on single model pruning.

We pose the optimal tailoring of these LLMs as a zero-order optimization problem, adopting a search space that supports three different operations: (1) Layer removal, (2) Layer selection from different candidate models, and (3) Layer merging.

Our experiments demonstra…

1 week, 1 day назад @ paperswithcode.com
💼 University and corporation labs
DeepMind DeepMind
последний пост 1 week, 3 days назад
AlphaGenome: AI for better understanding the genome
AlphaGenome: AI for better understanding the genome AlphaGenome: AI for better understanding the genome

Introducing a new, unifying DNA sequence model that advances regulatory variant-effect prediction and promises to shed new light on genome function — now available via API.

Small variations in a genome’s DNA sequence can alter an organism’s response to its environment or its susceptibility to disease.

How AlphaGenome works Our AlphaGenome model takes a long DNA sequence as input — up to 1 million letters, also known as base-pairs — and predicts thousands of molecular properties characterising its regulatory activity.

We haven't designed or validated AlphaGenome for personal genome prediction, a known challenge for A…

1 week, 3 days назад @ deepmind.google
Gemini Robotics On-Device brings AI to local robotic devices
Gemini Robotics On-Device brings AI to local robotic devices Gemini Robotics On-Device brings AI to local robotic devices

In March, we introduced Gemini Robotics, our most advanced VLA (vision language action) model, bringing Gemini 2.0’s multimodal reasoning and real-world understanding into the physical world.

Today, we’re introducing Gemini Robotics On-Device, our most powerful VLA model optimized to run locally on robotic devices.

Gemini Robotics On-Device shows strong general-purpose dexterity and task generalization, and it’s optimized to run efficiently on the robot itself.

Model capabilities and performance: Gemini Robotics On-Device is a robotics foundation model for bi-arm robots, engineered to require minimal computational resources.

It builds on the task generalization and dexterity capabilities of G…

1 week, 4 days назад @ deepmind.google
Gemini 2.5: Updates to our family of thinking models
Gemini 2.5: Updates to our family of thinking models Gemini 2.5: Updates to our family of thinking models

Explore the latest Gemini 2.5 model updates with enhanced performance and accuracy: Gemini 2.5 Pro now stable, Flash generally available, and the new Flash-Lite in preview.

2 weeks, 4 days назад @ deepmind.google
We’re expanding our Gemini 2.5 family of models
We’re expanding our Gemini 2.5 family of models We’re expanding our Gemini 2.5 family of models

We designed Gemini 2.5 to be a family of hybrid reasoning models that provide amazing performance, while also being at the Pareto Frontier of cost and speed.

Today, we’re taking the next step with our 2.5 Pro and Flash models by releasing them as stable and generally available.

And we’re bringing you 2.5 Flash-Lite in preview — our most cost-efficient and fastest 2.5 model yet.

Making 2.5 Flash and 2.5 Pro generally available: Thanks to all of your feedback, today we’re releasing stable versions of 2.5 Flash and Pro, so you can build production applications with confidence.

Introducing Gemini 2.5 Flash-Lite: We’re also introducing a preview of the new Gemini 2.5 Flash-Lite, our most cost-effici…

2 weeks, 4 days назад @ blog.google
Behind “ANCESTRA”: combining Veo with live-action filmmaking
Behind “ANCESTRA”: combining Veo with live-action filmmaking Behind “ANCESTRA”: combining Veo with live-action filmmaking

Today, Eliza McNitt’s short film, “ANCESTRA,” premieres at the Tribeca Festival.

It’s the story of a mother, and what happens when her child is born with a hole in its heart.

Inspired by the dramatic events of McNitt's own birth, the film portrays a mother's love as a cosmic, life-saving force.

Together, we founded this partnership to put the world’s best generative AI into the hands of top filmmakers, to advance the frontiers of storytelling and technology.

“ANCESTRA” combined live-action scenes with sequences generated by Veo, our state-of-the-art video generation model.

3 weeks, 1 day назад @ blog.google
How we're supporting better tropical cyclone prediction with AI
How we're supporting better tropical cyclone prediction with AI How we're supporting better tropical cyclone prediction with AI

We’re launching Weather Lab, featuring our experimental cyclone predictions, and we’re partnering with the U.S. National Hurricane Center to support their forecasts and warnings this cyclone season.

Yet, improving the accuracy of cyclone predictions can help protect communities through more effective disaster preparedness and earlier evacuations.

Today, Google DeepMind and Google Research are launching Weather Lab, an interactive website for sharing our artificial intelligence (AI) weather models.

Weather Lab’s live and historical cyclone predictions: Weather Lab shows live and historical cyclone predict…

3 weeks, 2 days назад @ deepmind.google
How we're supporting better tropical cyclone prediction with AI
How we're supporting better tropical cyclone prediction with AI How we're supporting better tropical cyclone prediction with AI

We’re launching Weather Lab, featuring our experimental cyclone predictions, and we’re partnering with the U.S. National Hurricane Center to support their forecasts and warnings this cyclone season.

3 weeks, 2 days назад @ d16660f-dot-gdm-deepmind-com-prod.appspot.com
Advanced audio dialog and generation with Gemini 2.5
Advanced audio dialog and generation with Gemini 2.5 Advanced audio dialog and generation with Gemini 2.5

Safety and responsibility: We’ve proactively assessed potential risks throughout every stage of the development process for these native audio features, using what we’ve learned to inform our mitigation strategies.

Additionally, all audio outputs from our models are embedded with SynthID, our watermarking technology, to ensure transparency by making AI-generated audio identifiable.

Native audio capabilities for developers: We’re bringing native audio outputs to Gemini 2.5 models, giving developers new capabilities to build richer, more interactive applications via the Gemini API in Google AI Studio or Vertex AI.

To begin exploring, developers can try native audio dialog with Gemini 2.5 Flash pr…

1 month назад @ blog.google
Advanced audio dialog and generation with Gemini 2.5
Advanced audio dialog and generation with Gemini 2.5 Advanced audio dialog and generation with Gemini 2.5

Gemini 2.5 has new capabilities in AI-powered audio dialog and generation.

1 month назад @ d16660f-dot-gdm-deepmind-com-prod.appspot.com
Fuel your creativity with new generative media models and tools
Fuel your creativity with new generative media models and tools Fuel your creativity with new generative media models and tools

Today, we’re announcing our newest generative media models, which mark significant breakthroughs.

These models create breathtaking images, videos and music, empowering artists to bring their creative vision to life.

Veo 3 and Imagen 4, our newest video and image generation models, push the frontier of media generation, with their groundbreaking new capabilities.

We're also expanding access to Lyria 2, giving musicians more tools to create music.

Using Google DeepMind’s most advanced models, Flow lets you weave cinematic films with more sophisticated control of characters, scenes and styles, to bring your story to life.

1 month, 2 weeks назад @ blog.google
SynthID Detector — a new portal to help identify AI-generated content
SynthID Detector — a new portal to help identify AI-generated content SynthID Detector — a new portal to help identify AI-generated content

Learn about the new SynthID Detector portal we announced at I/O to help people understand how the content they see online was generated.

1 month, 2 weeks назад @ d16660f-dot-gdm-deepmind-com-prod.appspot.com
Advancing Gemini's security safeguards
Advancing Gemini's security safeguards Advancing Gemini's security safeguards

We’ve made Gemini 2.5 our most secure model family to date.

1 month, 2 weeks назад @ d16660f-dot-gdm-deepmind-com-prod.appspot.com
Advancing Gemini's security safeguards
Advancing Gemini's security safeguards Advancing Gemini's security safeguards

We’re publishing a new white paper outlining how we’ve made Gemini 2.5 our most secure model family to date.

Indirect prompt injection presents a real cybersecurity challenge where AI models sometimes struggle to differentiate between genuine user instructions and manipulative commands embedded within the data they retrieve.

Evaluating baseline defense strategies: Indirect prompt injection attacks are complex and require constant vigilance and multiple layers of defense.

Google DeepMind’s Security and Privacy Research team specialises in protecting our AI models from deliberate, malicious att…

1 month, 2 weeks назад @ 77b50d0-dot-gdm-deepmind-com-prod.appspot.com
Gemini 2.5: Our most intelligent models are getting even better
Gemini 2.5: Our most intelligent models are getting even better Gemini 2.5: Our most intelligent models are getting even better

Gemini 2.5 Pro continues to be loved by developers as the best model for coding, and 2.5 Flash is getting even better with a new update. We’re bringing new capabilities to our models, including Deep Think, an experimental enhanced reasoning mode for 2.5 Pro.

1 month, 2 weeks назад @ d16660f-dot-gdm-deepmind-com-prod.appspot.com
Google
последний пост 3 days, 8 hours назад
A guide to converting ADK agents with MCP to the A2A framework
A guide to converting ADK agents with MCP to the A2A framework A guide to converting ADK agents with MCP to the A2A framework

The evolution of AI agents has led to powerful, specialized models capable of complex tasks.

However, to unlock their full potential, these agents must be able to collaborate.

This guide provides a step-by-step process for converting a standalone ADK agent that uses an MCP tool into a fully A2A-compatible component, ready to participate in a larger, multi-agent ecosystem.

We will use a MultiURLBrowser agent, designed to scrape web content, as a practical example. Step 1: Define the core agent and its MCP tool (agent.py). The foundation of your agent remains its core logic.

The key is to properly initialize the ADK LlmAgent and configure its MCPToolset to connect with its external tool.
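
A minimal sketch of such an agent.py, assuming the import paths and parameter names from earlier ADK documentation (they may differ across ADK versions); the agent name follows the article's MultiURLBrowser example, and the MCP server command and args are placeholders.

```python
# Minimal sketch of an ADK agent wired to an MCP tool. Import paths and parameter
# names follow earlier ADK docs and may vary across versions; the MCP server
# command/args below are placeholders for whatever server provides your scraping tool.
from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, StdioServerParameters

root_agent = LlmAgent(
    model="gemini-2.0-flash",            # any Gemini model supported by ADK
    name="MultiURLBrowser",
    instruction="Fetch and summarize the content of the URLs the user provides.",
    tools=[
        MCPToolset(
            connection_params=StdioServerParameters(
                command="npx",
                args=["-y", "your-mcp-fetch-server"],  # placeholder MCP server package
            ),
        )
    ],
)
```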

3 days, 8 hours назад @ cloud.google.com
How to build a simple multi-agentic system using Google’s ADK
How to build a simple multi-agentic system using Google’s ADK How to build a simple multi-agentic system using Google’s ADK

When the Root Agent calls the Flight Agent as a sub-agent, the responsibility for answering the user is completely transferred to the Flight Agent.

The Root Agent is effectively out of the loop.

All subsequent user input will be handled solely by the Flight Agent.

This led to the next evolution: the Dispatcher Agent with Agent Tools.

The root agent could then reason about a complex query and decide to use multiple tools to get the job done.
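
A sketch of the two wiring styles described above, assuming current ADK import paths (which may vary by version); the agent names and instructions are placeholders, and a real application would normally pick one style.

```python
# Sketch of the two wiring styles: with sub_agents, control transfers fully to the
# callee; with AgentTool, the root agent stays in the loop and can call several
# specialists per query. Names and instructions are placeholders.
from google.adk.agents import LlmAgent
from google.adk.tools.agent_tool import AgentTool

flight_agent = LlmAgent(model="gemini-2.0-flash", name="FlightAgent",
                        instruction="Answer flight-related questions.")
hotel_agent = LlmAgent(model="gemini-2.0-flash", name="HotelAgent",
                       instruction="Answer hotel-related questions.")

# Style 1: delegation -- the Flight Agent takes over the conversation.
root_with_subagents = LlmAgent(
    model="gemini-2.0-flash", name="RootAgent",
    instruction="Route travel questions to the right specialist.",
    sub_agents=[flight_agent, hotel_agent],
)

# Style 2: dispatcher -- specialists are exposed as tools, so the root keeps control.
dispatcher = LlmAgent(
    model="gemini-2.0-flash", name="DispatcherAgent",
    instruction="Use the flight and hotel tools as needed, then answer the user.",
    tools=[AgentTool(agent=flight_agent), AgentTool(agent=hotel_agent)],
)
```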

3 days, 8 hours назад @ cloud.google.com
How to build Web3 AI agents with Google Cloud
How to build Web3 AI agents with Google Cloud How to build Web3 AI agents with Google Cloud

Some sophisticated libraries now equip developers with the tools to build and deploy them.

How to build Web3 AI Agents with Google Cloud: Google Cloud provides a flexible, end-to-end suite of tools for building Web3 AI Agents, allowing you to start simple and scale to highly complex, customized solutions.

1. For rapid prototyping and no-code development: Vertex AI Agent Builder. Conversational Agents allows for rapid prototyping and deployment of agents through a user-friendly interface, making it accessible even for non-technical users (refer to the Agent Builder codelab for a quick start).

Agents can be easily augmented with standard capabilities like leveraging datastores, performing Google s…

5 days, 6 hours назад @ cloud.google.com
You dream it, Veo creates it: Veo 3 is now available for everyone in public preview on Vertex AI
You dream it, Veo creates it: Veo 3 is now available for everyone in public preview on Vertex AI You dream it, Veo creates it: Veo 3 is now available for everyone in public preview on Vertex AI

With Veo 3, we’ve leapt forward in combining video and audio generation to take storytelling to the next level.

Today, we’re excited to share that Veo 3 is now available for all Google Cloud customers and partners in public preview on Vertex AI.

Why this matters: Veo 3 is your partner for creating near-cinematic quality generative video, moving beyond novelty to narrative-driven creation.

It not only brings stunning visual quality, but now adds audio, from background sounds to dialogue.

With Veo 3 on Vertex AI, you can take advantage of three powerful new capabilities:

1 week, 2 days назад @ cloud.google.com
How Schroders built its multi-agent financial analysis research assistant

For example, Vertex AI Agent Builder provided easy tool integration. For internal knowledge, Grounding with Vertex AI Search was used to ground Gemini in a private document corpus, such as internal research notes, enabling agents to answer questions based on Schroders' proprietary data.

In addition, Vertex AI offers seamless integration with other Google Cloud services and tools that help facilitate rapid agent governance and management, including Cloud Logging, Cloud Monitoring, IAM Access Control, Vertex AI evaluation, BigQuery and more.

Initially, native function calling helped Schroders get familiar with Vertex AI Agent Builder and develop agent-building best pr…

1 week, 3 days назад @ cloud.google.com
Audit smarter: Introducing Google Cloud’s Recommended AI Controls framework

As organizations build new generative AI applications and AI agents to automate business workflows, security and risk management leaders face a new set of governance challenges.

These include: How do we prove our AI systems operate in line with internal policies and evolving regulations?

How can we verify that data access controls are consistently enforced across the entire AI lifecycle, from training to inference to large-scale production?

Developed by Google Cloud Security experts and validated by our Office of the CISO, this prebuilt framework incorporates best practices for securing AI systems, and uses industry standards including the NIST AI Risk Management Framework and the…

1 week, 3 days назад @ cloud.google.com
The secret to document intelligence: Box builds Enhanced Extract Agents using Google’s Agent-2-Agent framework

An agent-to-agent protocol for deeper collaboration: Box is championing an open AI ecosystem by embracing Google Cloud's Agent2Agent protocol, enabling all Box AI Agents to securely collaborate with diverse external agents from dozens of partners (a list that keeps growing).

By adopting the latest A2A specification, Box AI can ensure efficient and secure communication for complex, multi-system processes.

Google's Gemini 2.5 Pro: Provides the core text comprehension, reasoning, and generation; and in this enhanced protocol, Gemini models also aim to furnish deeper operational data (like token likelihoods) to its counterpart.

Furthering this commitment to an open and extensible ecosystem, Box A…

1 week, 4 days назад @ cloud.google.com
How to use Gemini 2.5 to fine-tune video outputs on Vertex AI

Challenges and mitigations for multi-class single-label video tasks: Using highly skewed data distributions may cause quality regression on the tuned model.

On the other hand, for several similar event types with dense time intervals, multi-class single-label recipes yield better model performance.

Prepare the video tuning dataset: The Vertex Tuning API uses *.jsonl files for both training and validation datasets.

V. Set the hyperparameters for tuning: After preparing your tuning dataset, you are ready to submit your first video tuning job!

With a dataset size of ~500 examples, epochs = 5 (the default value for video tuning tasks) is a good starting point.

1 week, 4 days назад @ cloud.google.com
How AI & IoT are helping detect hospital incidents — without compromising patient privacy
How AI & IoT are helping detect hospital incidents — without compromising patient privacy How AI & IoT are helping detect hospital incidents — without compromising patient privacy

Despite these challenges, Hypros’ device represents a significant advancement in privacy-preserving patient monitoring, offering the potential to enhance hospital workflow efficiency and patient care without compromising individual privacy.

Patient monitoring with AI: Overcoming low-resolution data challenges. While customized parametric algorithms can partially interpret sensor data, they have difficulty handling complex relationships and edge cases.

ML algorithms offer clear advantages, making AI a vital tool for a patient monitoring system.

In addition, manual data labeling can quickly become expensive as tight monitoring sends readings every few seconds, quickly producing large volumes of…

1 week, 4 days назад @ cloud.google.com
Graduating the Google for Startups Accelerator: AI First in Europe & Israel

Today, we're incredibly proud to announce the graduation of the latest cohort from the Google for Startups Accelerator: AI First from Europe & Israel!

This milestone marks the culmination of an intensive three-month journey for these 14 innovative startups, who've dedicated themselves to growing their businesses and pushing the boundaries of artificial intelligence.

The hybrid program offered expert mentorship, robust technical support, and access to a powerful global network, empowering founders to scale their impact.

“With Google’s support, we brought our AI recruitment platform into its next generation — the most advanced in the world, with a business model built for $7M+ ARR within a y…

2 weeks, 4 days назад @ cloud.google.com
Gemini momentum continues with launch of 2.5 Flash-Lite and general availability of 2.5 Flash and Pro on Vertex AI

New Gemini 2.5 Flash-Lite in public preview: Experience the most cost-efficient Gemini 2.5 model yet, with optimized performance for high-volume tasks.

Supervised Fine-Tuning (SFT) for Gemini 2.5 Flash: Customized AI for your business. Achieve unparalleled customization with the GA release of Supervised Fine-Tuning (SFT) for Gemini 2.5 Flash on Vertex AI.

We are also introducing preview pricing for Gemini 2.5 Flash-Lite, our most cost-efficient Gemini 2.5 model yet.

For complete details on pricing for Gemini 2.5 Flash, Gemini 2.5 Pro, and the Gemini 2.5 Flash-Lite preview, please visit our pricing page.

Start moving to production today with Gemini 2.5 Flash and Gemini 2.5 Pro, now generally availab…

2 weeks, 4 days назад @ cloud.google.com
Build a multi-agent KYC workflow in three steps using Google’s Agent Development Kit and Gemini

Know Your Customer (KYC) processes are foundational to any Financial Services Institution's (FSI) regulatory compliance practices and risk mitigation strategies.

Building robust AI agents is complex.

Google's Agent Development Kit (ADK) gives you essential tooling to build multi-agent workflows.

Plus, combining ADK with Search Grounding via Gemini can help give you higher fidelity and trustworthiness for tasks requiring external knowledge.

To that end, this post illustrates how Google Cloud's cutting-edge AI technologies - the Agent Development Kit (ADK), Vertex AI Gemini models, Search Grounding, and BigQuery - can be leveraged to build such a multi-agent KYC solution.

2 weeks, 5 days назад @ cloud.google.com
Save early and often with multi-tier checkpointing to optimize large AI training jobs

For example, consider the case where you are using accelerator chips to train a model that takes one month to complete.

Even with a somewhat smaller training workload, the cost savings with optimal checkpointing can be significant.

If you have a week-long training job spanning 1K VMs that cost $88/hour (a3-highgpu-8g), a 6.5% increase in Goodput on this training task could result in almost $1M in infrastructure savings.
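
As a back-of-the-envelope check on that figure (assuming all 1,000 VMs run continuously for the full week):

1{,}000 \text{ VMs} \times \$88/\text{VM-hour} \times 168 \text{ hours} \approx \$14.8\text{M of compute per week}

0.065 \times \$14.8\text{M} \approx \$0.96\text{M}

So a 6.5% Goodput improvement recovers roughly $1M of that spend.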

More failures require more checkpointing: Probabilistically, the mean time between failures (MTBF) of a training job decreases (failures happen more frequently) as the size of the cluster increases.

Therefore, it is important that foundation model producers take checkpoints m…

2 weeks, 5 days назад @ cloud.google.com
How good is your AI? Gen AI evaluation at every stage, explained

As AI moves from promising experiments to landing core business impact, the most critical question is no longer "What can it do?"

Ensuring the quality, reliability, and safety of your AI applications is a strategic imperative.

To guide you, evaluation must be your North Star—a constant process that validates your direction throughout the entire development lifecycle.

One year ago, we launched the Gen AI evaluation service, offering capabilities to evaluate various models including Google's foundation models, open models, proprietary foundation models, and customized models.

That's why today we're excited to dive into the new features of the Gen AI Evaluation Service, designed to help you sc…

3 weeks, 1 day назад @ cloud.google.com
Cloud CISO Perspectives: How Google secures AI Agents

Our goal is to provide a clear and actionable foundation for building secure and trustworthy AI agent systems that benefit society.

Agent actions and planning must be observable: Agent activities must be transparent and auditable through robust logging and clear action characterization.

Key risks, limitations, and challenges: Traditional security paradigms, designed for static software or general AI, are insufficient for AI agents.

Response rendering: Safely rendering AI agent outputs into user-readable content is vital to prevent classic web vulnerabilities.

To learn more about how Google is approaching securing AI agents, please read our research paper.

3 weeks, 2 days назад @ cloud.google.com
OpenAI
последний пост None
Microsoft
последний пост 5 days, 8 hours назад
AI Testing and Evaluation: Learnings from genome editing

As generative AI continues to advance, Microsoft has gathered a range of experts—from genome editing to cybersecurity—to share how their fields approach evaluation and risk assessment.

CHARO: Well, you know, genome editing is both very old and very new.

Now the earliest forms of genome editing were very inefficient, and so we didn’t worry that much.

But the bottom-line thing to remember, the way to really think about it is, we don’t regulate genome editing; we regulate the things that use genome editing.

And she said, you know, we don’t regulate genome editing; we regulate the things that use genome editing.

5 days, 8 hours назад @ microsoft.com
PadChest-GR: A bilingual grounded radiology reporting benchmark for chest X-rays

In our ever-evolving journey to enhance healthcare through technology, we’re announcing a unique new benchmark for grounded radiology report generation: PadChest-GR.

A new frontier in radiology report generation: It is estimated that over half of people visiting hospitals have radiology scans that must be interpreted by a clinical professional.

In contrast, grounded radiology reporting demands that each finding be described and localized individually.

It is the first public benchmark that enables us to evaluate generation of fully grounded radiology reports in chest X-rays.

We invite researchers and industry experts to explore PadChest-GR and MAIRA-2, contribute innovative i…

1 week, 2 days назад @ microsoft.com
AI Testing and Evaluation: Learnings from Science and Industry

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

And I think, really, there are two reasons why tech is so, kind of, representative of that kind of challenge that I’ve always found fascinating.

Continues to be a really important topic in the AI policy conversation right now, I think, for really good reason.

Testing is an important component for governance and AI and, of course, in all of these other domains, as well.

I think about almost, like, in the near to mid-term, like three issues that we need to address in the AI, kind of, policy and testing context.

1 week, 5 days назад @ microsoft.com
Learning from other domains to advance AI evaluation and testing

Today, we’re launching a limited-series podcast, AI Testing and Evaluation: Learnings from Science and Industry, to share insights from domains that have grappled with testing and measurement questions.

At the close of the podcast series, we’ll offer Microsoft’s deeper reflections on next steps toward more reliable and trustworthy approaches to AI evaluation.

While approaches to risk evaluation and testing vary significantly across the case studies, there was one consistent, top-level takeaway: evaluation frameworks always reflect trade-offs among different policy objectives, such as safety, efficiency, and innovation.

Experts across all eight fields noted that policymakers have had to weig…

1 week, 5 days назад @ microsoft.com
Breaking bonds, breaking ground: Advancing the accuracy of computational chemistry with deep learning

We are excited to share our first big milestone in solving a grand challenge that has hampered the predictive power of computational chemistry, biochemistry, and materials science for decades.

For 60 years, people have designed practical approximations for the XC functional.

We can contrast the present state of computational chemistry with the state of aircraft engineering and design.

The result is Skala, an XC functional that generalizes to unseen molecules, reaching the accuracy needed to predict experiments.

At Flagship, we believe that openly shared, foundational advances in science – like this leap forward in DFT accuracy – can serve as powerful enablers of innovation.

2 weeks, 3 days назад @ microsoft.com
New methods boost reasoning in small and large language models

To support this progress, we’ve identified three primary strategies to strengthen reasoning capabilities in both small and large language models: improve architectural design to boost performance in smaller models; incorporate mathematical reasoning techniques to increase reliability; and build stronger generalization capabilities to enable reasoning across a variety of fields.

Read more Opens in a new tabThe problem stems from how current language models operate.

rStar-Math is a method that uses Monte Carlo Tree Search (MCTS) to simulate deeper, more methodical reasoning in smaller models.

LIPS (LLM-based Inequality Prover with Symbolic Reasoning) is a system that combines LLMs’ pattern re…

2 weeks, 4 days назад @ microsoft.com
How AI is reshaping the future of healthcare and medical research

LEE: Yeah, yeah.

It cannot—as, you know, Bill was saying—it cannot learn from your document.

And I don’t know if the two of you remember, but I ended up doing a lot of tests.

I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3-mini.

Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know.

3 weeks, 2 days назад @ microsoft.com
Rewriting SymCrypt in Rust to modernize Microsoft’s cryptographic library

To address these vulnerabilities and improve memory safety, we’re rewriting SymCrypt, Microsoft’s open-source cryptographic library, in Rust.

For example, reasoning about C code often requires proving that two non-const pointers are live and non-overlapping, a property that can depend on external client code.

As a result, new tools have emerged specifically for verifying Rust code.

Some users compile our C code directly and may rely on specific toolchains or compiler features that complicate the adoption of Rust code.

Looking ahead, we plan to support direct use of the same cryptographic library in Rust without requiring C bindings.

3 weeks, 4 days назад @ microsoft.com
BenchmarkQED: Automated benchmarking of RAG systems

To meet this need, we’re introducing BenchmarkQED, a new suite of tools that automates RAG benchmarking at scale, available on GitHub.

AutoQ: Automated query synthesis. This limitation motivated the development of GraphRAG, a system designed to answer global queries.

GraphRAG’s evaluation requirements subsequently led to the creation of AutoQ, a method for synthesizing these global queries for any dataset.

Synthesis process and example query for each of the four AutoQ query classes.

We hope these datasets, together with the BenchmarkQED tools, help accelerate benchmark-driven development of RAG systems and AI question-answering.

1 month назад @ microsoft.com
What AI’s impact on individuals means for the health workforce and industry

So I don’t think we should be surprised that business schools matter on this because we care about management.

That’s really going to change the way, like, middle school works, was my thinking at the time.

We’ve gone from AI being highly discriminative to AI that’s able to explore the world in particular ways.

The symptoms that they’re showing are quite different, and also their compliance is really, really different.

LEE: Yeah, really, really interesting.

1 month, 1 week назад @ microsoft.com
FrodoKEM: A conservative quantum-safe cryptographic algorithm

FrodoKEM is a key encapsulation mechanism (KEM) based on the Learning with Errors (LWE) problem, a cornerstone of lattice-based cryptography.

The Learning with Errors (LWE) problem: The LWE problem is a fundamental hard problem in lattice-based cryptography.
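
In its standard search form, the problem asks a solver to recover a secret vector from noisy random linear equations:

b = A s + e \pmod{q}, \qquad A \in \mathbb{Z}_q^{m \times n},\; s \in \mathbb{Z}_q^{n},\; e \text{ a small error vector.}

Given only (A, b), recovering s is believed to be intractable for both classical and quantum algorithms, which is the hardness assumption FrodoKEM builds on.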

In other words, cryptanalysts and quantum researchers have not been able to devise an efficient quantum algorithm capable of solving the LWE problem and, hence, FrodoKEM.

Conclusion: After years of research and analysis, the next generation of post-quantum cryptographic algorithms has arrived.

Further Reading: For those interested in learning more about FrodoKEM, post-quantum cryptography, and lattice-based cryptography, the following resourc…

1 month, 1 week назад @ microsoft.com
Abstracts: Zero-shot models in single-cell biology with Alex Lu

And single-cell foundation models claim to be capable of unraveling deeper insights than ever before.

Basically, we showed that single-cell foundation models perform worse in settings that are fundamental to biological discovery than much simpler machine learning and statistical methods that were used in the field before single-cell foundation models emerged and are the go-to standard for unpacking meaning from these complicated experiments.

And the way to understand this is because single-cell foundation models are trained in a way that tries to expose these models to millions of single-cells.

But let’s also talk about the impact for methodologists, people who are trying to improve these s…

1 month, 2 weeks назад @ microsoft.com
Abstracts: Aurora with Megan Stanley and Wessel Bruinsma

This is such exciting work about environmental forecasting, so we’re happy to have the two of you join us today.

Mostly because AI weather forecasting models are computationally much more efficient and can even be more accurate.

What’s unfortunate though, about this big step forward, is that these developments are mostly limited to the setting of weather forecasting.

Weather forecasting is very important, obviously, but there are many other important environmental forecasting problems out there, such as air pollution forecasting or ocean wave forecasting.

STANLEY: Current approaches have really focused training very specifically on weather forecasting models.

1 month, 2 weeks назад @ microsoft.com
Collaborators: Healthcare Innovation to Impact

LUNGREN: And now it really feels like this collaborative effort, you know, really can help start to extend that mission.

I think, you know, Will and Smitha, that we definitely feel the passion and the innovation.

Again, you know, in text, you refer to that earlier and certainly off the shelf, there’s really powerful applications.

LUNGREN: So, I think AI has always been thought of as a savior kind of technology.

And I guess for my part, I think really what we’re going to see is a massive unleash of creativity.

1 month, 2 weeks назад @ microsoft.com
Magentic-UI, an experimental human-centered web agent

Magentic-UI seeks user approval before executing potentially irreversible actions, and the user can specify how often Magentic-UI needs approvals.

Figure 6: System architecture diagram of Magentic-UI. To interact with Magentic-UI, users can enter a text message and attach images.

On the validation subset of GAIA (162 tasks), we show the results of Magentic-One operating in autonomous mode, Magentic-UI operating in autonomous mode (without the simulated user), Magentic-UI with simulated user (1) (smarter model), Magentic-UI with simulated user (2) (…

1 month, 2 weeks назад @ microsoft.com
MIT AI
последний пост 1 day, 6 hours назад
Robotic probe quickly measures key properties of new materials

But the pace of innovation is bottlenecked by the speed at which researchers can manually measure important material properties.

A fully autonomous robotic system developed by MIT researchers could speed things up.

During a 24-hour test, the fully autonomous robotic probe took more than 125 unique measurements per hour, with more precision and reliability than other artificial intelligence-based methods.

They also designed imaging-based methods to determine some important material properties.

The researchers want to continue building on this robotic system as they strive to create a fully autonomous lab for materials discovery.

1 day, 6 hours назад @ news.mit.edu
Confronting the AI/energy conundrum

At the same time, artificial intelligence technologies could revolutionize energy systems, accelerating the transition to clean power.

After decades of flat electricity demand in the United States, computing centers now consume approximately 4 percent of the nation's electricity.

Strategies for clean energy solutions: The symposium explored multiple pathways to address the AI-energy challenge.

Strubell advocated for viewing computing center electricity as a limited resource requiring thoughtful allocation across different applications.

Navigating the AI-energy paradox: The symposium highlighted MIT’s central role in developing solutions to the AI-electricity challenge.

3 days, 5 hours назад @ news.mit.edu
Accelerating scientific discovery with AI

Several researchers have taken a broad view of scientific progress over the last 50 years and come to the same troubling conclusion: Scientific productivity is declining.

Now, the philanthropically funded research lab FutureHouse is seeking to accelerate scientific research with an AI platform designed to automate many of the critical steps on the path toward scientific progress.

“Natural language is the real language of science,” Rodriques says.

The founders started out wanting to create distinct AI tools for tasks like literature searches, data analysis, and hypothesis generation.

They began with data collection, eventually releasing PaperQA in September 2024, which Rodriques calls the be…

5 days, 9 hours назад @ news.mit.edu
MIT and Mass General Brigham launch joint seed program to accelerate innovations in health

Leveraging the strengths of two world-class research institutions, MIT and Mass General Brigham (MGB) recently celebrated the launch of the MIT-MGB Seed Program.

The new initiative, which is supported by Analog Devices Inc. (ADI), will fund joint research projects led by researchers at MIT and Mass General Brigham.

The program will launch an open call for proposals to researchers at MIT and Mass General Brigham.

Awardees will be selected by a joint review committee composed of MIT and Mass General Brigham experts.

Conversely, MIT researchers may not fully grasp these clinical challenges or have access to the right patient data and samples,” explains Shalek, who is also a member of the Ragon…

1 week, 1 day назад @ news.mit.edu
Using generative AI to help robots jump higher and land safely

Diffusion models like OpenAI’s DALL-E are becoming increasingly useful in helping brainstorm new designs.

Users can draft a 3D model of a robot and specify which parts they’d like to see a diffusion model modify, providing its dimensions beforehand.

The resulting design resembled a blob, so the researchers prompted their system to scale the draft to fit their 3D model.

The advantage of using diffusion models for this task, according to co-lead author and CSAIL postdoc Byungchul Kim, is that they can find unconventional solutions to refine robots.

The diffusion model’s ability to upgrade a robot’s jumping and landing skills suggests it could be useful in enhancing how other machines are desi…

1 week, 1 day назад @ news.mit.edu
Merging AI and underwater photography to reveal hidden ocean worlds

Just as the 19th-century camera transformed our ability to document and reveal the natural world — capturing life with unprecedented detail and bringing distant or hidden environments into view — generative AI marks a new frontier in visual storytelling.

In addition, LOBSTgER’s models are built using custom code developed by Mentzelopoulos to protect the process and outputs from any potential biases from external data or models.

LOBSTgER’s generative AI builds upon real photography, expanding the researchers’ visual vocabulary to deepen the public’s connection to the natural world.

The project draws from the visual language of photography, the observational rigor of marine science, and the …

1 week, 3 days назад @ news.mit.edu
LLMs factor in unrelated information when recommending medical treatments

A large language model (LLM) deployed to make treatment recommendations can be tripped up by nonclinical information in patient messages, like typos, extra white space, missing gender markers, or the use of uncertain, dramatic, and informal language, according to a study by MIT researchers.

These findings indicate that LLMs take nonclinical information into account for clinical decision-making in previously unknown ways.

It brings to light the need for more rigorous studies of LLMs before they are deployed for high-stakes applications like making treatment recommendations, the researchers say.

“These models are often trained and tested on medical exam questions but then used in tasks that a…

1 week, 5 days назад @ news.mit.edu
Researchers present bold ideas for AI at MIT Generative AI Impact Consortium kickoff event

Launched in February of this year, the MIT Generative AI Impact Consortium (MGAIC), a presidential initiative led by MIT’s Office of Innovation and Strategy and administered by the MIT Stephen A. Schwarzman College of Computing, issued a call for proposals, inviting researchers from across MIT to submit ideas for innovative projects studying high-impact uses of generative AI models.

The call received 180 submissions from nearly 250 faculty members, spanning all of MIT’s five schools and the college.

The overwhelming response across the Institute exemplifies the growing interest in AI and follows in the wake of MIT’s Generative AI Week and call for impact papers.

Over 30 funding recipients p…

2 weeks, 1 day назад @ news.mit.edu
Combining technology, education, and human connection to improve online learning

“My years of organizing learning and making communities — both in person and online — have shown me firsthand how powerful social interaction can be for motivation and curiosity,” Morris said.

Combining her observational skills with active community engagement, she works at the intersection of technology, education, and human connection to improve digital learning platforms.

This research builds on her experience increasing human connection in both physical and virtual learning environments.

“I’m developing a framework that combines AI-driven behavioral analysis with human expert assessment to study social learning dynamics,” she says.

“I aim to make two primary contributions: first, analys…

2 weeks, 4 days назад @ news.mit.edu
Unpacking the bias of large language models

They found that certain design choices which control how the model processes input data can cause position bias.

“These models are black boxes, so as an LLM user, you probably don’t know that position bias can cause your model to be inconsistent.

They also found that using positional encodings to link words more strongly to nearby words can mitigate position bias.

The technique refocuses the model’s attention in the right place, but its effect can be diluted in models with more attention layers.

In the future, the researchers want to further explore the effects of positional encodings and study how position bias could be strategically exploited in certain applications.

2 weeks, 4 days назад @ news.mit.edu
A sounding board for strengthening the student experience

Caren is a jazz musician who majored in computer science and engineering, and minored in music and theater arts.

They advise the college’s leadership on issues, offer constructive feedback, and serve as a sounding board for innovative new ideas.

The UAG has been an invaluable space for understanding the student experience more deeply.

“This kind of tribal knowledge doesn’t really permeate to all of MIT,” Schneider explains.

“We are MIT students.

2 weeks, 4 days назад @ news.mit.edu
Celebrating an academic-industry collaboration to advance vehicle technology

On May 6, MIT AgeLab’s Advanced Vehicle Technology (AVT) Consortium, part of the MIT Center for Transportation and Logistics, celebrated 10 years of its global academic-industry collaboration.

Aviation’s model, built on highly trained personnel and strict predictability standards, contrasts sharply with the fragmented approach in the automotive industry.

Just as aviation doesn’t equate absence of failure with success, vehicle safety must be measured holistically and proactively.

In terms of the impact of AI on the automotive industry, Mauricio Muñoz, senior research engineer at AI Sweden, underscored that despite AI’s transformative potential, the automotive industry cannot rely on general …

2 weeks, 5 days назад @ news.mit.edu
Bringing meaning into technology deployment

The full-day symposium on May 1 was organized around four key themes: responsible health-care technology, artificial intelligence governance and ethics, technology in society and civic engagement, and digital inclusion and social justice.

The event also featured a poster session, where student researchers showcased projects they worked on throughout the year as SERC Scholars.

Bertsimas and his team work closely with the United Network for Organ Sharing (UNOS), a nonprofit that manages most of the national donation and transplant system through a contract with the federal government.

Tsai explained that with technology, it’s now possible for everyone to have a say — but doing so can be overw…

3 weeks, 3 days назад @ news.mit.edu
Photonic processor could streamline 6G wireless signal processing

But most AI methods for classifying and processing wireless signals are power-hungry and can’t operate in real-time.

Now, MIT researchers have developed a novel AI hardware accelerator that is specifically designed for wireless signal processing.

Light-speed processing: State-of-the-art digital AI accelerators for wireless signal processing convert the signal into an image and run it through a deep-learning model to classify it.

By developing an optical neural network architecture specifically for signal processing, which they call a multiplicative analog frequency transform optical neural network (MAFT-ONN), the researchers tackled that problem head-on.

Results in nanoseconds: MAFT-ONN takes a…

3 weeks, 3 days назад @ news.mit.edu
Have a damaged painting? Restore it in just hours with an AI-generated “mask”

The restoration is printed on a very thin polymer film, in the form of a mask that can be aligned and adhered to an original painting.

Kachkine says that a digital file of the mask can be stored and referred to by future conservators, to see exactly what changes were made to restore the original painting.

Still, there has been no way to translate digital restorations directly onto an original work, until now.

In recent years, digital restoration tools have opened a route to creating virtual representations of original, restored works.

For the painting that Kachkine used, the method was able to fill in thousands of losses in just a few hours.

3 weeks, 3 days назад @ news.mit.edu
Berkeley AI
последний пост 2 months, 3 weeks назад
Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications.

To mitigate the imminent prompt injection threat, we propose two fine-tuning-defenses, StruQ and SecAlign.

Prompt Injection Attack: Causes. Below is the threat model of prompt injection attacks.

Prompt injection threat model in LLM-integrated applications. We propose that prompt injection has two causes.

Below are resources to learn more and keep updated on prompt injection attacks and defenses.

2 months, 3 weeks назад @ bair.berkeley.edu
Repurposing Protein Folding Models for Generation with Latent Diffusion

PLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure, by learning the latent space of protein folding models.

In PLAID, we develop a method that learns to sample from the latent space of protein folding models to generate new proteins.

Unlike many previous protein structure generative models, PLAID addresses the multimodal co-generation problem setting: simultaneously generating both discrete sequence and continuous all-atom structural coordinates.

In this way, we can use structural understanding information in the weights of pretrained protein folding models for the p…

2 months, 4 weeks назад @ bair.berkeley.edu
Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone.

The challenges of phantom jams: A stop-and-go wave moving backwards through highway traffic.

Smoothing behavior of RL AVs.

Overall, the steps towards deployment involved: Training in data-driven simulations: We used highway traffic data from I-24 to create a training environment with realistic wave dynamics, then validated the trained agent’s performance and robustness in a variety of new traffic scenarios.…

3 months, 1 week назад @ bair.berkeley.edu
Virtual Personas for Language Models via an Anthology of Backstories

We introduce Anthology, a method for conditioning LLMs to representative, consistent, and diverse virtual personas by generating and utilizing naturalistic backstories with rich details of individual values and experience.

What does it mean for large language models (LLMs) to be trained on massive text corpora, collectively produced by millions and billions of distinctive human authors?

In this work, we introduce Anthology, an approach for steering LLMs to representative, consistent, and diverse virtual personas by providing richly detailed life narratives of individuals as conditioning context to models.

By grounding langu…

7 months, 3 weeks назад @ bair.berkeley.edu
Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination

Sample language model responses to different varieties of English and native speaker reactions.

Over 1 billion people around the world speak varieties such as Indian English, Nigerian English, Irish English, and African-American English.

Then, we compared the language model responses to the “standard” varieties and the non-“standard” varieties.

Here, we included the original GPT-3.5 responses, plus responses from GPT-3.5 and GPT-4 where the models were told to imitate the style of the input.

That can reinforce barriers against speakers of non-“standard” varieties as AI models become increasingly used in …

9 months, 2 weeks назад @ bair.berkeley.edu
AWS Machine Learning
последний пост 2 days, 10 hours назад
Transforming network operations with AI: How Swisscom built a network assistant using Amazon Bedrock

Swisscom’s Network Assistant, built on Amazon Bedrock, represents a significant step forward in automating network operations.

The team chose Amazon Bedrock as the foundation for their generative AI application and implemented a Retrieval Augmented Generation (RAG) architecture using Amazon Bedrock Knowledge Bases to enable precise and contextual responses to engineer queries.

Calculator agent – Helps network engineers understand complex network parameters and perform precise data calculations on telemetry data.

Instead, by using Amazon Bedrock Agents, the agent translates natural language user prompts into SQL queries.
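
For orientation, here is a minimal sketch of invoking such a Bedrock agent from Python with boto3's bedrock-agent-runtime client; the agent ID, alias, region, and prompt are placeholders, not Swisscom's actual configuration.

import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Send a natural-language question to the agent; the agent decides which
# action groups and knowledge bases (e.g., the SQL-generating path) to use.
response = client.invoke_agent(
    agentId="AGENT_ID",             # placeholder
    agentAliasId="AGENT_ALIAS_ID",  # placeholder
    sessionId="network-engineer-session-001",
    inputText="Which sites reported packet loss above 2% in the last hour?",
)

# invoke_agent returns an event stream; concatenate the returned text chunks.
answer = ""
for event in response["completion"]:
    chunk = event.get("chunk")
    if chunk:
        answer += chunk["bytes"].decode("utf-8")
print(answer)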

As a result, they have boosted both accuracy and efficien…

2 days, 10 hours назад @ aws.amazon.com
End-to-End model training and deployment with Amazon SageMaker Unified Studio

SageMaker Unified Studio streamlines access to familiar tools and functionality from purpose-built AWS analytics and artificial intelligence and machine learning (AI/ML) services, including Amazon EMR, AWS Glue, Amazon Athena, Amazon Redshift, Amazon Bedrock, and Amazon SageMaker AI.

To set up Unified Studio, complete the following steps: As an admin, create a SageMaker Unified Studio domain, and note the URL.

Log in to SageMaker Unified Studio: Now that you have created your new SageMaker Unified Studio domain, complete the following steps to access SageMaker Unified Studio: On the SageMaker console, open the details page of your domain.

Deploy and test the model using SageMaker AI Inference: Wh…

2 days, 10 hours назад @ aws.amazon.com
Optimize RAG in production environments using Amazon SageMaker JumpStart and Amazon OpenSearch Service

In this case, we use OpenSearch Service, which allows for similarity search using k-nearest neighbors (k-NN) as well as traditional lexical search.
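
To make that concrete, here is a minimal sketch of a k-NN query with the opensearch-py client; the domain endpoint, index name, vector field, and embedding are placeholders, and authentication is omitted for brevity.

from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],  # placeholder endpoint
    use_ssl=True,
)

query_embedding = [0.01] * 768  # stand-in for a real query embedding

response = client.search(
    index="rag-documents",  # placeholder index with a knn_vector field named "embedding"
    body={
        "size": 5,
        "query": {"knn": {"embedding": {"vector": query_embedding, "k": 5}}},
    },
)

for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("text", "")[:80])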

Benefits of using OpenSearch Service as a vector store for RAG: In this post, we showcase how you can use a vector store such as OpenSearch Service as a knowledge base and embedding store.

For more details on best practices for operating an OpenSearch Service managed cluster, see Operational best practices for Amazon OpenSearch Service.

Create an OpenSearch Service cluster and SageMaker notebook: We use AWS CloudFormation to deploy our OpenSearch Service cluster, SageMaker notebook, and other resources.

To get started with implementing this Retrieva…

3 days, 3 hours назад @ aws.amazon.com
Advancing AI agent governance with Boomi and AWS: A unified approach to observability and compliance

Boomi and AWS have collaborated to address the complexity surrounding AI agents with Agent Control Tower, an AI agent management solution developed by Boomi and tightly integrated with Amazon Bedrock.

A unified AI management solutionBuilt on AWS, Agent Control Tower uniquely delivers a single control plane for managing AI agents across multiple systems, including other cloud providers and on-premises environments.

Through deep integration with Amazon Bedrock, Boomi’s Agent Control Tower enhances agent transparency and governance, offering a unified, actionable view of agent configurations and activities across environments.

Discover how Agent Control Tower can help your organization manage …

3 days, 5 hours назад @ aws.amazon.com
Use Amazon SageMaker Unified Studio to build complex AI workflows using Amazon Bedrock Flows

With SageMaker Unified Studio, you can efficiently build generative AI applications in a trusted and secure environment using Amazon Bedrock.

You can choose from a selection of high-performing foundation models (FMs) and advanced customization and tooling such as Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, Amazon Bedrock Agents, and Amazon Bedrock Flows.

In this post, we demonstrate how you can use SageMaker Unified Studio to create complex AI workflows using Amazon Bedrock Flows.

Let’s explore how SageMaker Unified Studio and Amazon Bedrock Flows, integrated with Amazon Bedrock Knowledge Bases and Amazon Bedrock Agents, address these challenges by creating an AI-powered comp…

4 days, 3 hours назад @ aws.amazon.com
Accelerating AI innovation: Scale MCP servers for enterprise workloads with Amazon Bedrock

Furthermore, managing multiple disconnected MCP tools across teams makes it difficult to scale AI initiatives effectively.

Agentic application interaction with a central MCP server hub: The following flow diagram showcases how an agentic application built using Amazon Bedrock interacts with one of the MCP servers located in the MCP server hub.

Agentic application – The agentic applications are hosted on AWS Fargate for Amazon Elastic Container Service (Amazon ECS) and built using Amazon Bedrock Agents.

Central MCP server hub – This is where the MCP servers are hosted.

On the application console, different sets of MCP servers are listed in the left pane under MCP Server Registry.

4 days, 6 hours назад @ aws.amazon.com
Choosing the right approach for generative AI-powered structured data retrieval

Examples of structured data include tables, databases, and data warehouses that conform to a predefined schema.

Amazon Q Business does the heavy lifting of indexing the data using a Retrieval Augmented Generation (RAG) approach and using an LLM to generate well-written answers.

For more details, see Integrate unstructured data into Amazon Quicksight using Amazon Q Business.

Pattern 4: Building knowledge bases from structured data using managed text-to-SQL. This pattern uses Amazon Bedrock Knowledge Bases to enable structured data retrieval.

For a list of supported structured data stores, refer to Create a knowledge base by connecting to a structured data store.

4 days, 6 hours назад @ aws.amazon.com
Revolutionizing drug data analysis using Amazon Bedrock multimodal RAG capabilities

Amazon Bedrock Knowledge Bases introduces powerful document parsing capabilities, including Amazon Bedrock Data Automation powered parsing and FM parsing, revolutionizing how we handle complex documents.

Amazon Bedrock Data Automation is a fully managed service that processes multimodal data effectively, without the need to provide additional prompting.

An AWS account with an IAM user who has permissions to Lambda, Amazon Bedrock, Amazon S3, and IAM.

For specific compliance validation for Amazon Bedrock, see Compliance validation for Amazon Bedrock.

To learn more about Amazon Bedrock Knowledge Bases, check out the RAG workshop using Amazon Bedrock.

4 days, 6 hours назад @ aws.amazon.com
Build and deploy AI inference workflows with new enhancements to the Amazon SageMaker Python SDK

To address this need, we are introducing a new capability in the SageMaker Python SDK that revolutionizes how you build and deploy inference workflows on SageMaker.

We also show how customers like Amazon Search plan to use SageMaker Inference workflows to provide more relevant search results to Amazon shoppers.

Key improvements and user experienceThe SageMaker Python SDK now includes new features for creating and managing inference workflows.

The improved SageMaker Python SDK introduces a more intuitive and flexible approach to building and deploying AI inference workflows.

Start building your next-generation AI inference workflows today with the enhanced SageMaker Python SDK.

5 days, 2 hours назад @ aws.amazon.com
Context extraction from image files in Amazon Q Business using LLMs

At least one Amazon Q Business Pro user that has admin permissions to set up and configure Amazon Q Business.

Create an Amazon Q Business application and sync with an S3 bucket: To create an Amazon Q Business application and connect it to your S3 bucket, complete the following steps.

For more comprehensive, step-by-step guidance, follow the detailed instructions in the blog post Discover insights from Amazon S3 with Amazon Q S3 connector.

Configure the Amazon Q Business application CDE for the Amazon S3 data source: With the CDE feature of Amazon Q Business, you can make the most of your Amazon S3 data sources by using its sophisticated capabilities to modify, enhance, and filter documents duri…

5 days, 10 hours назад @ aws.amazon.com
Build AWS architecture diagrams using Amazon Q CLI and MCP

In this post, we explore how to use Amazon Q Developer CLI with the AWS Diagram MCP and the AWS Documentation MCP servers to create sophisticated architecture diagrams that follow AWS best practices.

The AWS Diagram MCP server specifically enables Amazon Q to generate architecture diagrams using the Python diagrams package, with access to the complete AWS icon set and architectural best practices.
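
The Python diagrams package that the MCP server drives can also be used directly; here is a minimal sketch (the three-node web service layout is illustrative).

from diagrams import Diagram
from diagrams.aws.compute import EC2
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB

# Renders web_service.png in the working directory without opening a viewer.
with Diagram("Simple web service", show=False, filename="web_service"):
    ELB("load balancer") >> EC2("web server") >> RDS("database")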

Set up your environment: Before you can start creating diagrams, you need to set up your environment with Amazon Q CLI, the AWS Diagram MCP server, and the AWS Documentation MCP server.

Create AWS architecture diagrams: In this section, we walk through the process of creating multiple AWS architecture diagrams usi…

5 days, 10 hours назад @ aws.amazon.com
AWS costs estimation using Amazon Q CLI and AWS Cost Analysis MCP

In this post, we explore how to use Amazon Q CLI with the AWS Cost Analysis MCP server to perform sophisticated cost analysis that follows AWS best practices.

The AWS Cost Analysis MCP server specifically enables Amazon Q to generate detailed cost estimates, reports, and optimization recommendations using real-time AWS pricing data.

Create AWS Cost Analysis reports: In this section, we walk through the process of creating AWS cost analysis reports using Amazon Q CLI with the AWS Cost Analysis MCP server.

When you provide a prompt to Amazon Q CLI, the AWS Cost Analysis MCP server completes the following steps: Interpret your requirements.

Conclusion: In this post, we explored how to use Amazon Q …

1 week, 1 day назад @ aws.amazon.com
Tailor responsible AI with new safeguard tiers in Amazon Bedrock Guardrails

With the standalone ApplyGuardrail API, Amazon Bedrock Guardrails offers a model-agnostic and scalable approach to implementing responsible AI policies for your generative AI applications.
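
As a quick illustration of that standalone API, here is a minimal boto3 sketch; the guardrail ID and version are placeholders for a guardrail you have already created.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

result = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="GUARDRAIL_ID",  # placeholder
    guardrailVersion="1",
    source="INPUT",  # evaluate user input; use "OUTPUT" for model responses
    content=[{"text": {"text": "User question to be screened goes here."}}],
)

if result["action"] == "GUARDRAIL_INTERVENED":
    print("Guardrail intervened:", result.get("outputs", []))
else:
    print("Content passed the configured policies.")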

Conclusion: The introduction of safeguard tiers in Amazon Bedrock Guardrails represents a significant step forward in our commitment to responsible AI.

To learn more about safeguard tiers in Amazon Bedrock Guardrails, refer to Detect and filter harmful content by using Amazon Bedrock Guardrails, or visit the Amazon Bedrock console to create your first tiered guardrail.

Recently, Koushik’s focus has been on LLM evaluations and safety, leading to the development of products like Amazon Bedrock Evaluations and…

1 week, 2 days назад @ aws.amazon.com
Structured data response with Amazon Bedrock: Prompt Engineering and Tool Use

Building a prompt engineering solution: This section demonstrates how to use prompt engineering effectively to generate structured outputs using Amazon Bedrock.

For example: Tool Use with the Amazon Bedrock Converse API. In the previous chapter, we explored a solution using Bedrock Prompt Engineering.
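
A minimal sketch of the tool-use approach with the Converse API via boto3 follows; the model ID, tool name, and JSON schema are illustrative, and forcing a specific tool with toolChoice is only supported by some model providers.

import json
import boto3

client = boto3.client("bedrock-runtime")

tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "record_summary",  # illustrative tool name
            "description": "Record a structured summary of the supplied text.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
                },
                "required": ["title", "sentiment"],
            }},
        }
    }],
    "toolChoice": {"tool": {"name": "record_summary"}},  # force the schema to be used
}

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model ID
    messages=[{"role": "user", "content": [{"text": "Summarize: 'Great quarter, revenue up 12%.'"}]}],
    toolConfig=tool_config,
)

# The structured payload arrives as a toolUse block in the assistant message.
for block in response["output"]["message"]["content"]:
    if "toolUse" in block:
        print(json.dumps(block["toolUse"]["input"], indent=2))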

Final Thoughts: In conclusion, we demonstrated two methods for generating structured responses with Amazon Bedrock: Prompt Engineering and Tool Use with the Converse API.

For more details on best practices, refer to the Bedrock prompt engineering guidelines and model-specific documentation, such as Anthropic’s best practices.

Structured data is key to leveraging Generative AI in real-world scenar…
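
As a rough illustration of the tool-use pattern discussed above (not the exact solution from the post), a sketch using boto3's Converse API might look like this; the model ID and invoice schema are placeholders.

```python
# Minimal sketch of tool use with the Converse API; model ID and schema are illustrative.
import boto3

client = boto3.client("bedrock-runtime")

tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "extract_invoice",
            "description": "Return structured fields extracted from an invoice.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {
                    "vendor": {"type": "string"},
                    "total": {"type": "number"},
                    "currency": {"type": "string"},
                },
                "required": ["vendor", "total"],
            }},
        }
    }],
}

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumption: any Converse-compatible model
    messages=[{"role": "user", "content": [{"text": "Invoice: ACME Corp, total 1,250.00 USD"}]}],
    toolConfig=tool_config,
)

# The structured output arrives as a toolUse block instead of free-form text.
for block in response["output"]["message"]["content"]:
    if "toolUse" in block:
        print(block["toolUse"]["input"])
```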

1 week, 2 days ago @ aws.amazon.com
Using Amazon SageMaker AI Random Cut Forest for NASA’s Blue Origin spacecraft sensor data
Using Amazon SageMaker AI Random Cut Forest for NASA’s Blue Origin spacecraft sensor data

Fortunately, AWS offers powerful AI/ML capabilities within Amazon SageMaker AI that can address these needs.

Using SageMaker AI, we train an RCF model specifically for detecting anomalies in complex spacecraft dynamics data.

This data is then accessed through a SageMaker AI domain using JupyterLab, providing a powerful and flexible environment for data scientists and engineers.

It’s possible to operate in no internet or VPC only modes so SageMaker AI instances remain isolated within your Amazon VPC.

However, the SageMaker AI RCF algorithm can detect them, and they are highlighted in red.
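
A minimal sketch of training the built-in RCF algorithm with the SageMaker Python SDK is shown below; the IAM role, hyperparameter values, and synthetic sensor data are placeholders rather than the configuration used for the Blue Origin dataset.

```python
# Minimal sketch of the built-in Random Cut Forest algorithm on sensor-like data;
# role, instance types, and hyperparameters are placeholders.
import numpy as np
import sagemaker
from sagemaker import RandomCutForest

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

rcf = RandomCutForest(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    num_samples_per_tree=512,
    num_trees=50,
    sagemaker_session=session,
)

# Each row is one multivariate sensor reading; RCF assigns higher scores to anomalies.
train_data = np.random.rand(10_000, 6).astype("float32")
rcf.fit(rcf.record_set(train_data))

# Deploy and score a few records (serialization defaults may vary by SDK version).
predictor = rcf.deploy(initial_instance_count=1, instance_type="ml.m5.large")
scores = predictor.predict(train_data[:5])
```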

1 week, 2 days ago @ aws.amazon.com
NVIDIA
latest post 2 days, 3 hours ago
RAPIDS Adds GPU Polars Streaming, a Unified GNN API, and Zero-Code ML Speedups
RAPIDS Adds GPU Polars Streaming, a Unified GNN API, and Zero-Code ML Speedups

These include a Polars GPU streaming engine, a unified API for graph neural networks (GNNs), and acceleration for support vector machines with zero code changes required.

In September 2024, we worked with the Polars team to launch a Polars GPU engine built on top of NVIDIA cuDF.

The 25.06 release brings some significant updates to Polars GPU engine capabilities.

Streaming executor is now experimentally available: With the 25.06 release, we introduced streaming execution in the Polars GPU engine.

Conclusion: The RAPIDS 25.06 release brings zero-code-change functionality to new machine learning algorithms, a new Polars GPU streaming engine, hardware decompression capabilities for async memory res…
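
A minimal sketch of what opting into the GPU engine looks like from Polars; the streaming-executor keyword shown here is an assumption about the 25.06 configuration surface, not something confirmed by the post.

```python
# Minimal sketch; the executor option is an assumed way to enable the experimental
# streaming executor, the rest is the standard lazy Polars API.
import polars as pl

lazy = (
    pl.scan_parquet("transactions/*.parquet")
      .filter(pl.col("amount") > 0)
      .group_by("merchant")
      .agg(pl.col("amount").sum().alias("total"))
)

# In-memory GPU execution (default GPU engine behavior):
df = lazy.collect(engine="gpu")

# Opting into the experimental streaming executor (assumed configuration key):
df_streaming = lazy.collect(engine=pl.GPUEngine(executor="streaming"))
```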

2 days, 3 hours ago @ developer.nvidia.com
GeForce NOW’s 20 July Games Bring the Heat to the Cloud
GeForce NOW’s 20 July Games Bring the Heat to the Cloud

Celebrate the six new games available this week with the GeForce NOW Summer Sale.

Catch the scorching lineup of 20 titles coming to the cloud, which gamers can play whether indoors or on the go.

Six new games are landing on GeForce NOW this week, including launch day titles Figment and Little Nightmares II.

And to make the summer even hotter, the GeForce NOW Summer Sale is in full swing.

(New release on Steam, June 17), METAL EDEN Demo (Steam), Torque Drift 2 (Epic Games Store), Broken Age (Steam), Sandwich Simulator (Steam), We Happy Few (Steam). What are you planning to play this weekend?

2 days, 11 hours ago @ blogs.nvidia.com
NVIDIA RTX AI Accelerates FLUX.1 Kontext — Now Available for Download
NVIDIA RTX AI Accelerates FLUX.1 Kontext — Now Available for Download

NVIDIA has collaborated with Black Forest Labs to optimize FLUX.1 Kontext [dev] for NVIDIA RTX GPUs using the NVIDIA TensorRT software development kit and quantization to deliver faster inference with lower VRAM requirements.

The FLUX.1 Kontext [dev] Flex: In-Context Image Generation. Black Forest Labs in May introduced the FLUX.1 Kontext family of image models, which accept both text and image prompts.

Learn more about NVIDIA optimizations and how to get started with FLUX.1 Kontext [dev] on the NVIDIA Technical Blog.

Join NVIDIA’s Discord server to connect with community developers and AI enthusiasts for discussions on what’s possible with RTX AI.

Plug in to NVIDIA AI PC on Facebook, Instagra…

3 days, 11 hours ago @ blogs.nvidia.com
Per-Tensor and Per-Block Scaling Strategies for Effective FP8 Training
Per-Tensor and Per-Block Scaling Strategies for Effective FP8 Training

Delayed scaling and current scaling are two common approaches within per-tensor scaling.

Per-tensor current scaling: While delayed scaling uses historical data, per-tensor current scaling offers an alternative that prioritizes real-time adaptability.

Building upon the concept of per-block scaling, Micro-Scaling FP8 (MXFP8) represents the NVIDIA Blackwell hardware-level solution for achieving efficient and stable FP8 training.

Block scaling: Beyond per-tensor scaling strategies and hardware-specific block scaling like MXFP8, generic FP8 block scaling is a versatile and configurable approach to fine-grained precision control.

Unlike per-tensor methods that apply a single scale across an entire te…
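
As a worked illustration of per-tensor current scaling (plain PyTorch arithmetic, not the Transformer Engine API), the scale can be derived from the live amax of the tensor:

```python
# Minimal numeric sketch of per-tensor current scaling: compute the scale from the
# tensor's current amax, quantize to FP8, and keep the scale for dequantization.
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3

def quantize_fp8_per_tensor(x: torch.Tensor):
    amax = x.abs().max().clamp(min=1e-12)        # current (real-time) amax, not a history window
    scale = FP8_E4M3_MAX / amax                  # one scale for the whole tensor
    x_fp8 = (x * scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX).to(torch.float8_e4m3fn)
    return x_fp8, scale

x = torch.randn(1024, 1024)
x_fp8, scale = quantize_fp8_per_tensor(x)
x_restored = x_fp8.to(torch.float32) / scale     # dequantize for inspection
print((x - x_restored).abs().max())              # quantization error stays small
```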

4 days, 6 hours ago @ developer.nvidia.com
How AI Factories Can Help Relieve Grid Stress
How AI Factories Can Help Relieve Grid Stress

Emerald AI, an NVIDIA Inception startup, is developing software to control power use during times of peak grid demand while meeting the performance requirements of data center AI workloads.

Emerald AI achieved this by orchestrating the host of different workloads that AI factories run.

“The Phoenix technology trial validates the vast potential of an essential element in data center flexibility,” said Anuja Ratnayake, who leads EPRI’s DCFlex Consortium.

“This test was an opportunity to completely reimagine AI data centers as helpful resources to help us operate the power grid more effectively and reliably,” said David Rousseau, president of SRP.

“AI factories can flex when the grid is tight …

4 days, 11 hours ago @ blogs.nvidia.com
NVIDIA NeMo Retriever Scores First Place for Visual Retrieval
NVIDIA NeMo Retriever Scores First Place for Visual Retrieval

NeMo Retriever tops several visual document retrieval leaderboards, setting new standards for RAG apps.

5 days, 7 hours ago @ huggingface.co
How to Work with Data Exceeding VRAM in the Polars GPU Engine
How to Work with Data Exceeding VRAM in the Polars GPU Engine

This post explores two options within the Polars GPU engine to overcome this constraint.

This allows the Polars GPU engine to spill data over to the system RAM when VRAM is full, preventing out-of-memory errors and enabling you to work with larger-than-VRAM datasets.

For a deep dive on how streaming execution in the Polars GPU engine works under the hood, see the NVIDIA GTC Paris session, Scaling DataFrames with Polars.

Choosing the right approach: Both UVM and multi-GPU streaming execution offer powerful ways to handle datasets larger than your GPU VRAM in the Polars GPU engine.

By default, the Polars GPU engine is configured to utilize UVM for the best mix of performance and scalability at …
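
A minimal sketch of the spilling idea using RMM managed (unified) memory; whether this is the exact configuration surface the post describes is an assumption on my part.

```python
# Minimal sketch of enabling managed (unified) memory so GPU allocations can exceed
# physical VRAM and migrate to system RAM on demand.
import rmm
import polars as pl

# Managed memory + pool allocator; assumption: the GPU engine picks up the current
# RMM resource for its allocations.
rmm.reinitialize(managed_memory=True, pool_allocator=True)

result = (
    pl.scan_parquet("clickstream/*.parquet")     # dataset assumed larger than VRAM
      .group_by("user_id")
      .agg(pl.len().alias("events"))
      .collect(engine="gpu")
)
```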

1 week, 1 day ago @ developer.nvidia.com
AI Analyzes Nurses’ Observations to Reduce Patient Danger
AI Analyzes Nurses’ Observations to Reduce Patient Danger

Nurses typically interact with patients frequently and often identify subtle—but important—changes in a patient’s health that might otherwise slip by unnoticed.

CONCERN EWS dives deep into what nurses are seeing to develop accurate and insightful predictions—but in an unexpected way.

The AI understands natural language and can read what nurses write in a patient’s electronic health records (EHRs).

As a proxy for understanding a nurse’s insights, CONCERN EWS analyzes metadata connected with each EHR entry—things like date, time, and location—looking for patterns that suggest trouble.

Read additional news coverage about CONCERN EWS or watch a video about the technology.

1 week, 1 day ago @ developer.nvidia.com
Boost Embedding Model Accuracy for Custom Information Retrieval
Boost Embedding Model Accuracy for Custom Information Retrieval

Customizing embedding models is crucial for effective information retrieval, especially when working with domain-specific data like legal text, medical records, or multi-turn customer conversations.

This post serves as an inspiration for other enterprises and developers to think through the various decisions in fine-tuning their embedding models to improve the overall accuracy of their multi-turn retrieval systems.

Enterprises can customize their own by selecting the various features of NeMo Curator that best fit their goals and workflows.

Figure 3 shows a bar chart comparing the accuracy of the fine-tuned embedding model to the other models across various thresholds.

Comparison of accuracy…

1 week, 2 days ago @ developer.nvidia.com
Game On With GeForce NOW, the Membership That Keeps on Delivering
Game On With GeForce NOW, the Membership That Keeps on Delivering

And SteelSeries has launched a new mobile controller that transforms phones into cloud gaming devices with GeForce NOW.

Steam Up SummerThe Steam Summer Sale is in full swing.

Check out the “Steam Summer Sale” row in the GeForce NOW app to find deals on the next adventure.

While picking up discounted games, don’t miss the chance to get a GeForce NOW six-month Performance membership at 40% off.

An Ultimate ControllerGet ready for the SteelSeries Nimbus Cloud, a new dual-mode cloud controller.

1 week, 2 days ago @ blogs.nvidia.com
Startup Uses NVIDIA RTX-Powered Generative AI to Make Coolers, Cooler
Startup Uses NVIDIA RTX-Powered Generative AI to Make Coolers, Cooler

To bring his creative vision to life, Theriault relied on AI and his NVIDIA GeForce RTX-equipped system.

Plus, GeForce RTX 5050 laptops start arriving today at retailers worldwide, from $999.

NVIDIA and GeForce RTX GPUs based on the NVIDIA Blackwell architecture include fifth-generation Tensor Cores designed to accelerate AI and deep learning workloads.

Theriault uses the Blender Cycles app to render out final files.

Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X — and stay informed by subscribing to the RTX AI PC newsletter.

1 week, 2 days ago @ blogs.nvidia.com
Into the Omniverse: World Foundation Models Advance Autonomous Vehicle Simulation and Safety
Into the Omniverse: World Foundation Models Advance Autonomous Vehicle Simulation and Safety

WFMs can be used to generate synthetic datasets for enhanced AV simulation.

To help physical AI developers build such simulated environments, NVIDIA unveiled major advances in WFMs at the GTC Paris and CVPR conferences earlier this month.

These new capabilities enhance NVIDIA Cosmos — a platform of generative WFMs, advanced tokenizers, guardrails and accelerated data processing tools.

NVIDIA Omniverse, a platform of application programming interfaces, software development kits and services for building OpenUSD-based physical AI applications, enables simulations from WFMs and neural reconstruction at world scale.

Stay up to date by subscribing to NVIDIA Omniverse news, joining the community …

1 week, 2 days ago @ blogs.nvidia.com
Check Out Sovereign AI in Practice Through an NVIDIA Webinar
Check Out Sovereign AI in Practice Through an NVIDIA Webinar

Join NVIDIA experts and leading European model builders on July 8 at 10:00 a.m. CEST for a live webinar on building, evaluating, and scaling multilingual large language models (LLMs).

Learn how to expand LLM capabilities in runtime and enrich models with new knowledge across languages and cultural contexts.

And discover how NVIDIA is collaborating with European organizations to develop improved datasets and multilingual models—recently announced at GTC Paris.

Hear from Hugging Face, Barcelona Supercomputing Center, ThinkDeep, and EuroLLM as they share their expertise in constructing foundational models attuned to the needs of their market using tools like NVIDIA NeMo.

1 week, 3 days ago @ nvidia.com
How to Streamline Complex LLM Workflows Using NVIDIA NeMo-Skills
How to Streamline Complex LLM Workflows Using NVIDIA NeMo-Skills

A typical recipe for improving LLMs involves multiple stages: synthetic data generation (SDG), model training through supervised fine-tuning (SFT) or reinforcement learning (RL), and model evaluation.

To streamline this complex workflow, NVIDIA developed the NeMo-Skills library.

Learn how to get started with NeMo-Skills to build powerful training and inference pipelines. Set up NeMo-Skills locally or on Slurm: to orchestrate complex jobs, NeMo-Skills uses Docker containers.

You can read more about ns eval pipeline options in the NeMo-Skills evaluation documentation.

The NVIDIA team successfully used NeMo-Skills to develop several popular models and datasets.

1 week, 3 days ago @ developer.nvidia.com
HPE and NVIDIA Debut AI Factory Stack to Power Next Industrial Shift
HPE and NVIDIA Debut AI Factory Stack to Power Next Industrial Shift

To speed up AI adoption across industries, HPE and NVIDIA today launched new AI factory offerings at HPE Discover in Las Vegas.

This now includes HPE OpsRamp Software, a validated observability solution for the NVIDIA Enterprise AI Factory, and HPE Morpheus Enterprise Software for orchestration.

This includes the next-generation HPE Private Cloud AI, co-engineered with NVIDIA and validated as part of the NVIDIA Enterprise AI Factory framework.

HPE Private Cloud AI includes the latest NVIDIA AI Blueprints, including the NVIDIA AI-Q Blueprint for AI agent creation and workflows.

To accelerate AI for financial services, HPE will co-test agentic AI workflows built on Accenture’s AI Refinery wit…

1 week, 4 days ago @ blogs.nvidia.com
Facebook
latest post 1 month, 4 weeks ago
Accelerating GPU indexes in Faiss with NVIDIA cuVS
Accelerating GPU indexes in Faiss with NVIDIA cuVS

Meta and NVIDIA collaborated to accelerate vector search on GPUs by integrating NVIDIA cuVS into Faiss v1.10 , Meta’s open source library for similarity search.

In its latest release, Faiss 1.10.0 officially includes these algorithms from the NVIDIA cuVS library.

Faiss 1.10.0 also includes a new conda package that unlocks the ability to choose between the classic Faiss GPU implementations and the newer NVIDIA cuVS algorithms, making it easy for users to switch between GPU and CPU.

Build time (95% recall@10) for the IVF Flat index. Embeddings 100M x 96: Faiss Classic 101.4 s vs. Faiss cuVS 37.9 s (2.7x). Embeddings 5M x 1536: Faiss Classic 24.4 s vs. Faiss cuVS 15.2 s (1…
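
For orientation, a minimal sketch of building a GPU IVF-Flat index with Faiss follows; the use_cuvs flag for selecting the cuVS backend is my assumption about the 1.10 cloner options, and the data is synthetic.

```python
# Minimal sketch of a GPU IVF-Flat index in Faiss on random vectors.
import faiss
import numpy as np

d, n = 96, 100_000
xb = np.random.rand(n, d).astype("float32")

quantizer = faiss.IndexFlatL2(d)
cpu_index = faiss.IndexIVFFlat(quantizer, d, 1024)   # 1024 IVF lists

res = faiss.StandardGpuResources()
co = faiss.GpuClonerOptions()
co.use_cuvs = True                                   # assumption: toggle for the cuVS backend
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index, co)

gpu_index.train(xb)
gpu_index.add(xb)
distances, ids = gpu_index.search(xb[:5], 10)        # top-10 neighbors for 5 query vectors
```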

1 month, 4 weeks ago @ engineering.fb.com
Introducing AutoPatchBench: A Benchmark for AI-Powered Security Fixes
Introducing AutoPatchBench: A Benchmark for AI-Powered Security Fixes

We are introducing AutoPatchBench, a benchmark for the automated repair of vulnerabilities identified through fuzzing.

As illustrated, fixing a fuzzing crash involves: analyzing the crash stack trace and the target code.

Inside AutoPatchBench: We’re making AutoPatchBench publicly available as part of CyberSecEval 4 to encourage community collaboration in tackling the challenge of automating fuzzing crash repairs.

Then we find the lowest common ancestor (LCA) across all pairs of stack traces offered by the ground-truth patch and the LLM patch.

As an experienced Security Engineer at Meta, your task is to address the following security-critical fuzzing crash.

2 months, 1 week ago @ engineering.fb.com
Building multimodal AI for Ray-Ban Meta glasses
Building multimodal AI for Ray-Ban Meta glasses

With our Ray-Ban Meta glasses, multimodal AI helps the glasses see what the wearer is seeing.

This means anyone wearing Ray-Ban Meta glasses can ask them questions about what they’re looking at.

On this episode of the Meta Tech Podcast, meet Shane, a research scientist at Meta who has spent the last seven years focusing on computer vision and multimodal AI for wearables.

Shane sits down with Pascal Hartig to share how his team is building foundational models for the Ray-Ban Meta glasses.

They talk about the unique challenges of AI glasses and pushing the boundaries of AI-driven wearable technology.

4 months ago @ engineering.fb.com
Revolutionizing software testing: Introducing LLM-powered bug catchers
Revolutionizing software testing: Introducing LLM-powered bug catchers

WHAT IT IS: Meta’s Automated Compliance Hardening (ACH) tool is a system for mutation-guided, LLM-based test generation.

Traditionally, automated test generation techniques sought merely to increase code coverage.

LLM-based test generation and LLM-based mutant generation are not new, but this is the first time they’ve been combined and deployed in large-scale industrial systems.

WHAT’S NEXT: Our novel approach combines LLM-based test generation and mutant generation to help automate complex technical organizational workflows in this space.

READ THE PAPER: Mutation-Guided LLM-based Test Generation at Meta

5 months ago @ engineering.fb.com
Meta Andromeda: Supercharging Advantage+ automation with the next-gen personalized ads retrieval engine
Meta Andromeda: Supercharging Advantage+ automation with the next-gen personalized ads retrieval engine

Unlocking advertiser value through industry-leading ML innovation: Meta Andromeda is a personalized ads retrieval engine that leverages the NVIDIA Grace Hopper Superchip to enable cutting-edge ML innovation in the ads retrieval stage and drive efficiency and advertiser performance.

Its deployment across Instagram and Facebook applications has achieved +6% recall improvement to the retrieval system, delivering +8% ads quality improvement on selected segments.

Andromeda is designed to maximize ads performance by utilizing the exponential growth in volume of eligible ads available to the retrieval stage.

The design is optimized for AI hardware, minimizing memory bandwidth bottlenecks and enablin…

7 months ago @ engineering.fb.com
Sequence learning: A paradigm shift for personalized ads recommendations
Sequence learning: A paradigm shift for personalized ads recommendations

Meta’s ad recommendation engine, powered by deep learning recommendation models (DLRMs), has been instrumental in delivering personalized ads to people.

Learning from sequences: developing new sequence learning architectures to replace traditional DLRM neural network architectures.

A paradigm shift with learning from sequences for recommendation systems: Meta’s new system for ads recommendations uses sequence learning at its core.

Scaling the new sequence learning paradigm: Following the redesign to shift from sparse feature learning to event-based sequence learning, the next focus was scaling across two domains — scaling the sequence learning architecture and scaling event sequences to be long…

7 months, 2 weeks ago @ engineering.fb.com
OCP Summit 2024: The open future of networking hardware for AI
OCP Summit 2024: The open future of networking hardware for AI

At Open Compute Project Summit (OCP) 2024, we’re sharing details about our next-generation network fabric for our AI training clusters.

We’ve expanded our network hardware portfolio and are contributing two new disaggregated network fabrics and a new NIC to OCP.

At Meta, we believe that open hardware drives innovation.

At Meta, we envision a future of AI hardware systems that are not only scalable, but also open and collaborative.

We encourage anyone who wants to help advance the future of networking hardware for AI to engage with OCP and Meta to help shape the future of AI infrastructure.

8 months, 3 weeks ago @ engineering.fb.com
Meta’s open AI hardware vision
Meta’s open AI hardware vision

At the Open Compute Project (OCP) Global Summit 2024, we’re showcasing our latest open AI hardware designs with the OCP community.

These innovations include a new AI platform, cutting-edge open rack designs, and advanced network fabrics and components.

The open future of AI infra: Meta is committed to open source AI.

We must also prioritize open and standardized models so we can leverage collective expertise, make AI more accessible, and work towards minimizing biases in our systems. Just as important, we also need open AI hardware systems.

By addressing AI’s infrastructure needs together, we can unlock the true promise of open AI for everyone.​

8 months, 3 weeks ago @ engineering.fb.com
How open source AI can improve population estimates, sustainable energy, and the delivery of climate change interventions
How open source AI can improve population estimates, sustainable energy, and the delivery of climate change interventions

Why we need better population maps: Accurate estimates of population are taken for granted in many countries.

As the world’s natural resource and energy demands scale, accurate population estimates also offer significant opportunities to improve sustainability efforts.

In addition to total population counts, Meta’s population maps also include demographic breakdowns for groups such as the number of children under five, women of reproductive age, youth, and the elderly.

AI-powered population estimates have been scientifically evaluated to be among the most accurate in the world for mapping population distribution for a variety of geographies and use-cases.

Please visit the Data for Good websit…

9 months ago @ engineering.fb.com
Simulator-based reinforcement learning for data center cooling optimization
Simulator-based reinforcement learning for data center cooling optimization

Meta is revamping its new data center design to optimize for artificial intelligence, and the same methodology will be applicable to future data center optimizations as well to improve operational efficiency.

A reinforcement learning approach to data center cooling: Reinforcement learning (RL) is good at modeling control systems as sequential state machines.

There are also various RL approaches reported, such as transforming cooling optimization via deep reinforcement learning and data center cooling using model-predicti…

9 months, 4 weeks ago @ engineering.fb.com
Uber Engineering
latest post None
neptune.ai
latest post 1 day, 14 hours ago
How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models
How to Monitor, Diagnose, and Solve Gradient Issues in Foundation Models

What gradient issues occur during foundation model training?

During training, gradient descent updates model parameters by computing the gradients of the loss function via forward and backward passes.

The green line corresponds to a learning rate of 10, while the orange line has a learning rate of 0.1.

The gradient norm for the orange line with LR = 0.1 is very high in the first steps, while the gradient norm of the green line with LR = 10 diverges to NaN after a few steps.

Techniques for gradient stabilization: Monitoring gradient norms and training loss provides insights into the learning dynamics of the foundation models.
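
A minimal sketch of the monitoring idea: log the global gradient norm every step and clip it before the optimizer update (the model, data, and threshold are placeholders).

```python
# Minimal sketch of gradient-norm monitoring and clipping in a training step.
import torch
from torch import nn

model = nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def training_step(batch, targets):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(batch), targets)
    loss.backward()

    # clip_grad_norm_ returns the pre-clipping global norm, which is exactly the
    # quantity you would log to a monitoring dashboard.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    if not torch.isfinite(grad_norm):
        print("NaN/Inf gradient norm detected - skipping optimizer step")
        return loss, grad_norm

    optimizer.step()
    return loss, grad_norm
```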

1 day, 14 hours ago @ neptune.ai
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning [Paper Reflection]
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning [Paper Reflection]

Unstructured pruning removes individual weights, while structured pruning removes entire model components.

In the context of MoEs, since the expert structures that emerge from training correspond to such patterns, pruning experts is a natural fit for structured pruning.

Thus, structured pruning does not significantly decrease kurtosis, leaving plenty of margin for unstructured pruning.

Since structured pruning primarily reduces architectural redundancy rather than reshaping the underlying weight distribution, our two-phase approach—leveraging unstructured pruning after structured pruning—outperforms unstructured-only pruning.

Since STUN does not make any assumption about base MoE models, it is generaliza…

1 month ago @ neptune.ai
Evaluating RAG Pipelines
Evaluating RAG Pipelines

Dimensions of RAG evaluation: Evaluating a RAG pipeline means assessing its behavior across three dimensions: 1.

The evaluation of the RAG pipeline is a multi-step process, starting with creating an evaluation dataset, then evaluating the individual components (retriever, generator, etc.

Curating an evaluation dataset: The first step in the RAG evaluation process is the creation of a ground truth dataset.

MAP considers both the presence and rank of relevant chunks but fails to consider the relative position of relevant chunks.

However, not all retrieved chunks are equally relevant and sometimes, the most relevant chunks might not b…
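
To make the retrieval metrics concrete, here is a small self-contained sketch of precision@k and average precision for one query; MAP is simply the mean of these average-precision values over all evaluation queries.

```python
# Minimal sketch of ranking metrics for a single query: ranked chunk IDs from the
# retriever are compared against the ground-truth set of relevant chunk IDs.
def precision_at_k(ranked_ids, relevant_ids, k):
    top_k = ranked_ids[:k]
    return sum(1 for c in top_k if c in relevant_ids) / k

def average_precision(ranked_ids, relevant_ids):
    hits, score = 0, 0.0
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id in relevant_ids:
            hits += 1
            score += hits / rank          # precision at each relevant position
    return score / max(len(relevant_ids), 1)

ranked = ["c7", "c2", "c9", "c4", "c1"]   # retriever output, best first
relevant = {"c2", "c4"}
print(precision_at_k(ranked, relevant, k=3), average_precision(ranked, relevant))
```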

1 month, 3 weeks ago @ neptune.ai
How to Build an LLM Agent With AutoGen: Step-by-Step Guide
How to Build an LLM Agent With AutoGen: Step-by-Step Guide

The efficiency of an LLM agent depends on selecting the right LLM.

In this article, we’ll introduce the fundamental building blocks of LLM agents and then walk through the process of building an LLM agent step by step.

Building an LLM agent from scratch: In the following, we’ll build a trip-planning LLM agent from scratch.

Using AutoGen’s OpenAI Assistant Agent, we instantiate a prompt that the LLM agent will follow throughout its interactions.

Enhancing LLM agent performance: While architecting an LLM agent, you have to keep in mind opportunities to improve its performance.
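
A minimal sketch of the trip-planning agent idea using the pyautogen 0.2-style API; note the post uses AutoGen's OpenAI Assistant Agent, whereas this sketch uses the plain AssistantAgent, and the model name and API-key handling are placeholders.

```python
# Minimal sketch of a two-agent AutoGen setup; credentials and model are placeholders.
import autogen

llm_config = {"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]}  # placeholder

planner = autogen.AssistantAgent(
    name="trip_planner",
    system_message="You are a travel planner. Produce a concise day-by-day itinerary.",
    llm_config=llm_config,
)

user = autogen.UserProxyAgent(
    name="user",
    human_input_mode="NEVER",        # fully automated exchange for this sketch
    code_execution_config=False,
)

user.initiate_chat(planner, message="Plan a 3-day trip to Lisbon in October.")
```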

3 months, 2 weeks ago @ neptune.ai
Bayesian Deep Learning is Needed in the Age of Large-Scale AI [Paper Reflection]
Bayesian Deep Learning is Needed in the Age of Large-Scale AI [Paper Reflection]

Moreover, I will make the case for why Bayesian deep learning can satisfy these desiderata and briefly review recent advances in the field.

The case for Bayesian deep learning: Bayesian deep learning uses the foundational statistical principles of Bayesian inference to endow deep learning systems with the ability to make probabilistic predictions.

However, Bayesian deep learning is unfortunately still not as easy to use as standard deep learning, which you can do these days in a few lines of PyTorch code.

If you want to use a Bayesian deep learning model, first, you have to think about specifying the prior.

If this is the case, trying out Bayesian deep learning is likely worth your while.

3 months, 3 weeks ago @ neptune.ai
Introduction to State Space Models as Natural Language Models
Introduction to State Space Models as Natural Language Models

TL;DR State Space Models (SSMs) use first-order differential equations to represent dynamic systems.

Understanding state space models: Before exploring how State Space Models (SSMs) can function as components of large language models (LLMs), we’ll examine their foundational mechanics.

State space models for natural language processing: State Space Models (SSMs), long established in time series analysis, have been utilized as trainable sequence models for decades.

Linear state space layers (LSSLs): So far, we’ve seen that State Space Models are efficient sequence models.

Improvements on the state matrix A: In the previous section, we explored how the original LSSL relied on a fixed, predefined form …
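
A minimal numeric sketch of the SSM recurrence described above; the Euler-style discretization is a simplification chosen for brevity, not the discretization used in LSSL or S4.

```python
# Minimal sketch: the continuous system x'(t) = A x(t) + B u(t), y(t) = C x(t)
# turned into a discrete recurrence and run as a sequential scan.
import numpy as np

def ssm_scan(A, B, C, u, dt=1.0):
    n = A.shape[0]
    A_bar = np.eye(n) + dt * A          # simplified (Euler) discretization of A
    B_bar = dt * B
    x = np.zeros(n)
    ys = []
    for u_k in u:                       # sequential scan over the input sequence
        x = A_bar @ x + B_bar * u_k
        ys.append(C @ x)
    return np.array(ys)

A = np.array([[-0.5, 0.0], [0.0, -0.1]])
B = np.array([1.0, 1.0])
C = np.array([0.5, 0.5])
print(ssm_scan(A, B, C, u=np.sin(np.linspace(0, 6, 20))))
```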

4 months ago @ neptune.ai
Ethical Considerations and Best Practices in LLM Development
Ethical Considerations and Best Practices in LLM Development

To keep data secure throughout the model’s lifecycle, implement these practices: data anonymization, secure model serving and privacy penetration tests.

For example, a recruitment LLM favoring male applicants due to biased training data reflects a harmful bias that requires correction.

Monitor bias continuously: Mitigating bias isn’t a one-time effort—it requires ongoing monitoring to ensure that your LLM remains fair and effective across iterations.

Although these contributions are publicly available, the move opened up debates about the ethics of reusing community-contributed content for proprietary AI training.

Best practices for ethical LLM development: Navigating the regulatory landscape r…

4 months, 1 week ago @ neptune.ai
Open LLMs are Necessary For Current Private Adaptations and Outperform Their Closed Alternatives [Paper Reflection]
Open LLMs are Necessary For Current Private Adaptations and Outperform Their Closed Alternatives [Paper Reflection]

While much of the discussion around LLMs centers on task and computational performance, in our paper Open LLMs are Necessary for Current Private Adaptations and Outperform their Closed Alternatives, we focus on the privacy implications of using Open and Closed LLMs.

The threat space in adapting LLMs to private data: The adaptation of Closed LLMs to private datasets introduces a multifaceted threat space.

Private adaptation methods for Open LLMs: Unlike Closed LLMs, Open LLMs provide access to their parameters, enabling more flexible and parameter-centric private adaptation methods.

Performance: All adaptation methods for Closed LLMs ach…

4 months, 2 weeks ago @ neptune.ai
Learnings From Teams Training Large-Scale Models: Challenges and Solutions For Monitoring at Hyperscale
Learnings From Teams Training Large-Scale Models: Challenges and Solutions For Monitoring at Hyperscale

“What is not measured, cannot be improved.” This quote has become a guiding principle for teams training foundation models.

During my talk at NeurIPS, I broke down five key lessons learned from teams facing large-scale model training and monitoring.

Waabi’s teams, running large-scale ML experiments, needed a way to organize and share their experiment data efficiently.

Visualizing large datasets: We generally do not think of dataset visualization as part of experiment monitoring.

Moving forward: The path to efficient hyperscale training lies in combining robust monitoring, advanced debugging tools, and comprehensive experiment tracking.

4 months, 3 weeks ago @ neptune.ai
Mixture of Experts LLMs: Key Concepts Explained
Mixture of Experts LLMs: Key Concepts Explained

TL;DR Mixture of Experts (MoE) is a type of neural network architecture that employs sub-networks (experts) to process specific input parts.

This is the key idea behind Mixture of Expert LLMs.

The Switch Transformer, Mixtral, GLaM, GShard, and DeepSeekMoE are Mixture of Experts LLMs (MoEs), which require only executing a portion of the model’s computational graph during inference.

Optimization strategies for MoE LLMs are discussed comprehensively in the papers introducing the Switch Transformer, GShard, and GLaM.

Mixture of Experts (MoE) is an approach to scaling LLMs to trillions of parameters with conditional computation while avoiding exploding computational costs.
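
A minimal sketch of the conditional-computation idea behind MoE layers: a router picks the top-k experts per token, so only a fraction of the network runs for each input (dimensions and k are illustrative).

```python
# Minimal sketch of token-level top-k expert routing.
import torch
from torch import nn

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        logits = self.router(x)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)        # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.k):               # only k experts run per token
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

print(TopKMoE()(torch.randn(10, 64)).shape)      # torch.Size([10, 64])
```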

4 months, 4 weeks ago @ neptune.ai
Hyperparameter Optimization For LLMs: Advanced Strategies
Hyperparameter Optimization For LLMs: Advanced Strategies

Advanced hyperparameter optimization strategies, like population-based training, Bayesian optimization, and adaptive LoRA, promise to balance computational effort and outcome.

To avoid this, learning rate schedules for LLMs start with a small learning rate and slowly ramp it up to its maximum value.

Can we use traditional machine learning hyperparameter optimization methods for LLMs?

Hands-on: LLM hyperparameter optimization with neptune.ai. Optuna is a framework for optimizing hyperparameter search using Bayesian optimization.

What’s next in LLM hyperpar…
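
A minimal sketch of Bayesian hyperparameter search with Optuna; the search space and the stand-in objective are illustrative placeholders for a short fine-tuning run plus validation.

```python
# Minimal sketch of an Optuna study over typical LLM fine-tuning hyperparameters.
import optuna

def objective(trial):
    lr = trial.suggest_float("learning_rate", 1e-6, 1e-3, log=True)
    warmup = trial.suggest_float("warmup_ratio", 0.0, 0.1)
    lora_rank = trial.suggest_categorical("lora_rank", [8, 16, 32, 64])

    # Placeholder: in practice, launch a short fine-tuning run with these values and
    # return the validation loss. Here a simple analytic stand-in keeps it runnable.
    val_loss = (lr * 1e4 - 1.0) ** 2 + warmup + 1.0 / lora_rank
    return val_loss

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=25)
print(study.best_params)
```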

5 months ago @ neptune.ai
Multimodal Large Language Models
Multimodal Large Language Models

TL;DR Multimodal Large Language Models (MLLMs) process data from different modalities like text, audio, image, and video.

This article explores Multimodal Large Language Models, covering their core functionalities, challenges, and potential for various machine-learning domains.

Let’s break down the concept of Multimodal Large Language Models (MLLMs) by first understanding the terms “modal” and “multimodal.” “Modal” refers to a particular way of communicating or perceiving information.

Google: PaLM-E. Google developed an embodied language model, PaLM-E, to incorporate continuous sensor modalities into language models and establish the link between words and perceptions.

Improving how t…

5 months, 1 week ago @ neptune.ai
How to Build and Evaluate a RAG System Using LangChain, Ragas, and neptune.ai
How to Build and Evaluate a RAG System Using LangChain, Ragas, and neptune.ai

In this guide, we’ll show you how to build a RAG system using the LangChain framework, evaluate its performance using Ragas, and track your experiments with neptune.ai.

Part 1: Building a baseline RAG system with LangChain. In the first part of this guide, we’ll use LangChain to build a RAG system for the blog posts in the LLMOps category on Neptune’s blog.

Ragas works smoothly with LangChain, making it a great choice for evaluating our RAG system.

Step 1: Generate a RAG evaluation dataset. An evaluation set for RAG tasks is similar to a question-answering task dataset.

Step 2: Choose RAG evaluation metrics. As mentioned earlier, Ragas offers both LLM-based and non-LLM-based metrics for RAG syste…

6 months, 1 week ago @ neptune.ai
Position: Understanding LLMs Requires More Than Statistical Generalization [Paper Reflection]
Position: Understanding LLMs Requires More Than Statistical Generalization [Paper Reflection]

In our paper, Understanding LLMs Requires More Than Statistical Generalization, we argue that current machine learning theory cannot explain the interesting emergent properties of Large Language Models, such as reasoning or in-context learning.

Inductive biases, such as the model architecture or the optimization algorithm, affect which solution the neural network converges to.

How do language complexity and model architecture affect generalization ability?

showed how different neural network architectures generalize better for different language types.

Presumably, we’ll need to find different complexity measures for different model architectures that consider their specific inductive biases.

6 months, 2 weeks ago @ neptune.ai
From Research to Production: Building The Most Scalable Experiment Tracker For Foundation Models
From Research to Production: Building The Most Scalable Experiment Tracker For Foundation Models

TL;DR In large-scale model training (huge models), anomalies are not rare events but problematic patterns that drive failure.

The Neptune Scale experiment tracker supports fault tolerance and is designed to maintain progress despite hardware failures, making it adaptable for enterprise teams tackling LLM fine-tuning, compliance, and building domain-specific models.

Experiment tracking back then was straightforward—dealing mostly with single models or small-scale distributed systems.

One of the biggest lessons we’ve learned is that experiment tracking has evolved into experiment monitoring.

That’s why we’re focusing on building intelligent alerts and anomaly detection right into our exp…

6 months, 3 weeks ago @ neptune.ai
▶️ YouTube
Yannic Kilcher
latest post 2 months ago
On the Biology of a Large Language Model (Part 2)
On the Biology of a Large Language Model (Part 2)

An in-depth look at Anthropic's Transformer Circuit Blog Post

Part 1 here: https://youtu.be/mU3g2YPKlsA

Discord here: https://ykilcher.com/discord https://transformer-circuits.pub/2025/attribution-graphs/biology.html Abstract:

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology. Authors:

Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*,

David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton,

Trenton Bricken, Callum McDougall◊, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy …

2 months ago @ youtube.com
On the Biology of a Large Language Model (Part 1)
On the Biology of a Large Language Model (Part 1)

An in-depth look at Anthropic's Transformer Circuit Blog Post https://transformer-circuits.pub/2025/attribution-graphs/biology.html Abstract:

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology. Authors:

Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*,

David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton,

Trenton Bricken, Callum McDougall◊, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson,

Sam Zimmerman, Kelley Rivoire, Thom…

3 months ago @ youtube.com
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Paper Explained)
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Paper Explained)

#deepseek #llm #reinforcementlearning GRPO is one of the core advancements used in Deepseek-R1, but was introduced already last year in this paper that uses a combination of new RL techniques and iterative data collection to achieve remarkable performance on mathematics benchmarks with just a 7B model. Paper: https://arxiv.org/abs/2402.03300 Abstract:

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achie…

5 months, 1 week ago @ youtube.com
Traditional Holiday Live Stream
Traditional Holiday Live Stream

https://ykilcher.com/discord Links:

TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://discord.gg/4H8xxDF

BitChute: https://www.bitchute.com/channel/yannic-kilcher

Minds: https://www.minds.com/ykilcher

Parler: https://parler.com/profile/YannicKilcher

LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/

BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https:/…

6 months, 1 week ago @ youtube.com
Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained)
Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained)

#tokenization #llm #meta This paper does away with tokenization and creates an LLM architecture that operates on dynamically sized "patches" instead of tokens. By controlling the patch size, they gain a level of control over the tradeoff between model size and FLOPs and use that to achieve more favorable scaling behavior than classically tokenized LLMs. Paper: https://ai.meta.com/research/publications/byte-latent-transformer-patches-scale-better-than-tokens/

Code: https://github.com/facebookresearch/blt Abstract:

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant imp…

6 months, 1 week ago @ youtube.com
Safety Alignment Should be Made More Than Just a Few Tokens Deep (Paper Explained)
Safety Alignment Should be Made More Than Just a Few Tokens Deep (Paper Explained)

This paper demonstrates in a series of experiments that current safety alignment techniques of LLMs, as well as corresponding jailbreaking attacks, are in large part focusing on modulating the distribution of the first few tokens of the LLM response. Paper: https://openreview.net/forum?id=6Mxhg9PtDE&s=09 Abstract:

The safety alignment of current Large Language Models (LLMs) is vulnerable. Simple attacks, or even benign fine-tuning, can jailbreak aligned models. We note that many of these vulnerabilities are related to a shared underlying issue: safety alignment can take shortcuts, wherein the alignment adapts a model's generative distribution primarily over only its very first few output to…

6 months, 3 weeks ago @ youtube.com
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper Explained)
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper Explained)

A deep dive into the TokenFormer and an opinion about its impact, novelty, and relation to prior work. Paper: https://arxiv.org/abs/2410.23168 Abstract:

Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their dependence on a fixed number of parameters within linear projections. When architectural modifications (e.g., channel dimensions) are introduced, the entire model typically requires retraining from scratch. As model sizes continue growing, this strategy results in increasingly high com…

7 months, 2 weeks ago @ youtube.com
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

This paper (by Apple) questions the mathematical reasoning abilities of current LLMs and designs a synthetic template-based dataset distribution to investigate various aspects around LLM performance of high-school level math questions. Paper: https://arxiv.org/abs/2410.05229 Abstract:

Recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics. The GSM8K benchmark is widely used to assess the mathematical reasoning of models on grade-school-level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities hav…

8 months, 2 weeks ago @ youtube.com
Were RNNs All We Needed? (Paper Explained)
Were RNNs All We Needed? (Paper Explained)

This paper posits the interesting question: How much of the performance of Mamba, S4, and other state-space-like models is actually just attributable to some very core concepts - rather than their elaborate architectures. The authors construct minimal versions of GRUs and LSTMs and report competitive performance. Paper: https://arxiv.org/abs/2410.01201 Abstract:

The scalability limitations of Transformers regarding sequence length have renewed interest in recurrent sequence models that are parallelizable during training. As a result, many novel recurrent architectures, such as S4, Mamba, and Aaren, have been proposed that achieve comparable performance. In this work, we revisit traditional …

8 months, 3 weeks ago @ youtube.com
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)

How can one best use extra FLOPS at test time? Paper: https://arxiv.org/abs/2408.03314 Abstract:

Enabling LLMs to improve their outputs by using more test-time computation is a critical step towards building generally self-improving agents that can operate on open-ended natural language. In this paper, we study the scaling of inference-time computation in LLMs, with a focus on answering the question: if an LLM is allowed to use a fixed but non-trivial amount of inference-time compute, how much can it improve its performance on a challenging prompt? Answering this question has implications not only on the achievable performance of LLMs, but also on the future of LLM pretraining and how one s…

9 months ago @ youtube.com
Henry AI Labs
latest post None
3blue1brown
latest post 2 months ago
Summer of Math Exposition #4 | Teachers, I'd love to hear from you
Summer of Math Exposition #4 | Teachers, I'd love to hear from you

Make a math explainer, get feedback, and receive prizes: https://some.3b1b.co

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos. ------------------ These animations are largely made using a custom Python library, manim. See the FAQ comments here:

https://3b1b.co/faq#manim

https://github.com/3b1b/manim

https://github.com/ManimCommunity/manim/ All code for specific videos is visible here:

https://github.com/3b1b/videos/ The music is by Vincent Rubinetti.

https://www.vincentrubinetti.com

https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown

https://open.spotify.com/…

2 months ago @ youtube.com
Where my explanation of Grover’s algorithm failed
Where my explanation of Grover’s algorithm failed

Addressing viewer questions from the last video.

These lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to share the videos. ------------------ These animations are largely made using a custom Python library, manim. See the FAQ comments here:

https://3b1b.co/faq#manim

https://github.com/3b1b/manim

https://github.com/ManimCommunity/manim/ All code for specific videos is visible here:

https://github.com/3b1b/videos/ The music is by Vincent Rubinetti.

https://www.vincentrubinetti.com

https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown

https://open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u ------------------ 3blue1brown is a ch…

2 months ago @ youtube.com
But what is Quantum Computing? (Grover's Algorithm)
But what is Quantum Computing? (Grover's Algorithm)

Qubits, state vectors, and Grover's algorithm for search.

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to share the videos. The subtitles on this video were done using AI, and are likely imperfect, but they are open for community corrections at https://criblate.com/ Adam Brown's paper on the connection between Grover's Algorithm and block collisions:

https://arxiv.org/pdf/1912.02207 If you want to learn the relevant underlying quantum mechanics here, a very friendly resource is the course Mithuna at Looking Glass Universe is currently putting together. See, for instance, this explainer of a qubit:…

2 months ago @ youtube.com
Testing your intuition for quantum computing
Testing your intuition for quantum computing

Full video: https://youtu.be/RQWpF2Gb-gU

2 months ago @ youtube.com
How to measure nearby galaxies
How to measure nearby galaxies

From this video: https://youtu.be/hFMaT9oRbs4

2 months, 2 weeks ago @ youtube.com
Measuring the distance to Venus without radar
Measuring the distance to Venus without radar

From this video with Terry Tao: https://youtu.be/hFMaT9oRbs4

2 months, 4 weeks ago @ youtube.com
Measuring the speed of light using Jupiter's moons
Measuring the speed of light using Jupiter's moons

From this video with Terry Tao: https://youtu.be/hFMaT9oRbs4

2 months, 4 weeks ago @ youtube.com
The tragic tale of Guillaume Le Gentil
The tragic tale of Guillaume Le Gentil

From this video: https://youtu.be/hFMaT9oRbs4 Artwork by Kurt Bruns

3 months, 1 week ago @ youtube.com
Zooming out by powers of 10
Zooming out by powers of 10

From this video: https://youtu.be/YdOXS_9_P4U

3 months, 1 week ago @ youtube.com
There's more to those colliding blocks that compute pi
There's more to those colliding blocks that compute pi

Two colliding blocks compute pi; here we dig into the physics to explain why.

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos. The original paper by Gregory Galperin:

https://www.maths.tcd.ie/~lebed/Galperin.%20Playing%20pool%20with%20pi.pdf Adam Brown's paper on the analogy with Grover's Algorithm:

https://arxiv.org/pdf/1912.02207 Here's a lovely interactive built by GitHub user prajwalsouza after watching this video: https://prajwalsouza.github.io/Experiments/Colliding-Blocks.html Matt Parker's Pi Day video:

https://youtu.be/vlUTlbZT4ig NY Times blog post about this proble…

3 months, 3 weeks ago @ youtube.com
When being beautifully wrong leads to discovery
When being beautifully wrong leads to discovery

Full video: https://youtu.be/YdOXS_9_P4U

4 months, 1 week ago @ youtube.com
Why the ancient Greeks rejected heliocentrism
Why the ancient Greeks rejected heliocentrism

From this video on the cosmic distance ladder: https://youtu.be/YdOXS_9_P4U

4 months, 1 week ago @ youtube.com
How to estimate the distance to the sun
How to estimate the distance to the sun

Full video: https://youtu.be/YdOXS_9_P4U

4 months, 1 week ago @ youtube.com
How Aristarchus deduced the distance to the moon
How Aristarchus deduced the distance to the moon

Full video: https://youtu.be/YdOXS_9_P4U

4 months, 1 week ago @ youtube.com
The cosmic distance ladder with Terence Tao (part 2)
The cosmic distance ladder with Terence Tao (part 2)

How we know the distances to the planets, stars, and faraway galaxies.

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

FAQ with added details and corrections: https://terrytao.wordpress.com/2025/02/13/cosmic-distance-ladder-video-with-grant-sanderson-3blue1brown-commentary-and-corrections/ An equally valuable form of support is to simply share the videos. Terry and his collaborator Tanya have an Instagram about the cosmic distance ladder: https://www.instagram.com/cosmic_distance_ladder/ Artwork of Guillaume Le Gentil by Kurt Bruns

Artwork of Antonia Maury and Henrietta Leavitt by Talia Gershon: https://bit.ly/taliagershonart

Several of t…

4 months, 1 week ago @ youtube.com
Two Minute Papers
latest post 1 week, 1 day ago
Unreal Engine 5.6: Outrageously Good!
Unreal Engine 5.6: Outrageously Good!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video Get Unreal Engine 5.6:

https://www.unrealengine.com/en-US/news/unreal-engine-5-6-is-now-available Substrate material source + tutorial : https://www.youtube.com/watch?v=YDeptWduHNc 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to t…

1 week, 1 day ago @ youtube.com
NVIDIA’s New AI Watched 150,000 Videos! What Did It Learn?
NVIDIA’s New AI Watched 150,000 Videos! What Did It Learn?

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 The paper is available here:

https://research.nvidia.com/labs/toronto-ai/UniRelight/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Apple Music Sing demonstration source: https://www.youtube.com/watch?v=Q6Qpsvwh6mQ Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our ge…

1 week, 4 days ago @ youtube.com
OpenAI’s o3 Pro: Crushing The AI Game Test! 🎮
OpenAI’s o3 Pro: Crushing The AI Game Test! 🎮

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 Paper+code: https://github.com/lmgame-org/GamingAgent

Some results: https://huggingface.co/spaces/lmgame/lmgame_bench

Try it out: https://lmgame.org 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supp…

2 weeks, 1 day ago @ youtube.com
NVIDIA’s New AI Grows Stuff Out Of Nothing!
NVIDIA’s New AI Grows Stuff Out Of Nothing!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 The papers are available here:

https://research.nvidia.com/labs/toronto-ai/stochastic-preconditioning/

https://zju3dv.github.io/freetimegs/ Play with it (interactive viewer): https://www.4dv.ai/viewer/salmon_10s?showdemo=4dv 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/ar…

2 weeks, 5 days ago @ youtube.com
Google’s New AI: This Isn’t a Photo - But It Is!
Google’s New AI: This Isn’t a Photo - But It Is!

❤️ Check out the Fully Connected conference from Weights & Biases on June 17-18th in SF:

https://wandb.me/fc2025

Use the code FCSF2WP to get a ticket for free! 📝 The paper "Practical Inverse Rendering of Textured and Translucent Appearance" is available here:

https://weiphil.github.io/portfolio/practical_reconstruction 📝 Separable Subsurface Scattering:

https://users.cg.tuwien.ac.at/zsolnai/gfx/separable-subsurface-scattering-with-activision-blizzard/ Free rendering course!

https://users.cg.tuwien.ac.at/zsolnai/gfx/rendering-course/ 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Jua…

3 weeks, 2 days ago @ youtube.com
NVIDIA’s New AI: Next Level Games Are Coming!
NVIDIA’s New AI: Next Level Games Are Coming!

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers 📝 The papers are available here:

https://research.nvidia.com/labs/toronto-ai/difix3d/

https://sites.google.com/view/cast4

https://syntec-research.github.io/UVGA/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedd…

3 weeks, 5 days ago @ youtube.com
Microsoft’s New AI: Ray Tracing 16,000,000 Images!
Microsoft’s New AI: Ray Tracing 16,000,000 Images!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 The paper "RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination" is available here:

https://microsoft.github.io/renderformer/ 📝 Our neural rendering paper "Gaussian Material Synthesis" is available here:

https://users.cg.tuwien.ac.at/zsolnai/gfx/gaussian-material-synthesis/ 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, …

4 weeks, 1 day ago @ youtube.com
NVIDIA’s New AI: From Video Games to Reality!
NVIDIA’s New AI: From Video Games to Reality!

❤️ Try Macro for free and supercharge your learning: https://macro.com/papers 📝 The #NVIDIA papers are available here:

https://research.nvidia.com/labs/dir/cosmos-transfer1/

https://research.nvidia.com/labs/dir/cosmos-reason1/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Ric…

1 month ago @ youtube.com
NVIDIA’s New AI: Impossible Weather Graphics!
NVIDIA’s New AI: Impossible Weather Graphics!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 The papers are available here:

https://research.nvidia.com/labs/toronto-ai/WeatherWeaver/

https://research.nvidia.com/labs/toronto-ai/DiffusionRenderer/ Source: https://www.youtube.com/watch?v=CVdtLieI5D0 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01…

1 month, 1 week ago @ youtube.com
DeepMind’s Veo3 AI - The New King Is Here!
DeepMind’s Veo3 AI - The New King Is Here!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 More on Veo3 available here:

https://deepmind.google/models/veo/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, …

1 month, 2 weeks ago @ youtube.com
DeepMind’s AlphaEvolve AI: History In The Making!
DeepMind’s AlphaEvolve AI: History In The Making!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 AlphaEvolve: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

📝 My genetic algorithm for the Mona Lisa: https://users.cg.tuwien.ac.at/zsolnai/gfx/mona_lisa_parallel_genetic_algorithm/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

1 month, 2 weeks ago @ youtube.com
New AI: Impossible Creatures Come Alive!
New AI: Impossible Creatures Come Alive!

❤️ Check out DeepInfra and run DeepSeek or many other AI projects: https://deepinfra.com/papers 📝 The papers are available here:

https://anytop2025.github.io/Anytop-page/

https://zhongleilz.github.io/Sketch2Anim/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall,…

1 month, 3 weeks ago @ youtube.com
NVIDIA’s New AI: Impossible Video Game Animations!
NVIDIA’s New AI: Impossible Video Game Animations!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 The #NVIDIA paper "GENMO: A GENeralist Model for Human MOtion" is available here:

https://research.nvidia.com/labs/dair/genmo/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 Sources for SLAM:

https://www.youtube.com/watch?v=2GJuEIh4xGo

https://ww…

1 month, 3 weeks ago @ youtube.com
3 Ways OpenAI’s ChatGPT Surprised Its Creators!
3 Ways OpenAI’s ChatGPT Surprised Its Creators!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video OpenAI post: https://openai.com/index/expanding-on-sycophancy/ Paper on agreeableness: https://arxiv.org/abs/2212.09251 Source: https://x.com/georgejrjrjr/status/1917722125668081863/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to…

1 month, 4 weeks ago @ youtube.com
Blender 4.4 Is Here - Still The Best…For Free!
Blender 4.4 Is Here - Still The Best…For Free!

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers Get Blender: https://www.blender.org/ Demo files: https://www.blender.org/download/demo-files/

Full donut tutorial: https://www.youtube.com/watch?v=4haAdmHqGOw&pp=ygUWYW5kcmV3IHByaWNlIGRvbnV0IDQuNA%3D%3D Our papers that uses Blender: https://users.cg.tuwien.ac.at/zsolnai/gfx/photorealistic-material-editing/

https://users.cg.tuwien.ac.at/zsolnai/gfx/gaussian-material-synthesis/ Subsurface scattering video source: https://www.youtube.com/shorts/YqxSzGAKiPM

Blue-noise dithered sampling: https://iliyan.com/publications/DitheredSampling Donate to Blender: https://fund.blender.org/ 📝 My paper on simulations th…

2 months ago @ youtube.com
DataFest Video
latest post None
JetBrains Research Seminars
latest post None
Yandex. Computer Science
latest post 5 days, 12 hours ago
What is Muza in Yandex's Alice
What is Muza in Yandex's Alice

This is an excerpt from a talk by Daria Vinogradova, head of the streaming vision team at Yandex. At Data Fest, Daria spoke about the VLM in Alice. #Алиса #Муза #Яндекс #VLM #DataFest #ИИ #нейросети #AI #умныеустройства #визуальныемодели #визуальныйИИ #ЯндексТехнологии #визуальноятехнология #ассистентыбудущего #технологии #компьютерноеЗрение #LLM #интеллектуальныйассистент #стриминговоезрение #нейроассистент #техконференция

5 days, 12 hours ago @ youtube.com
How to forecast thousands of time series without losing your mind / Alexander Isakov
How to forecast thousands of time series without losing your mind / Alexander Isakov

Alexander Isakov, head of the forecasting group at Yandex Lavka, gave this talk at the AHA!25 conference. Alexander described how a medium-term forecasting system for Yandex Lavka was built and rolled out. The system accurately forecasts the key business metric (orders) over a six-month horizon and helps plan resources amid rapid service growth and highly volatile demand. Alexander went into detail on how the team dealt with the volatility of the time series by transforming the data into a stable form and isolating the key driving factors, which not only improved the forecast but also made deviations from the plan explainable. He also described how they embedded the forecast…
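A purely illustrative aside, not the pipeline from the talk: "bringing a volatile series to a stable form" is often approximated by a log transform (to stabilise variance) followed by seasonal differencing, as in the minimal Python sketch below on an invented orders series.

import numpy as np
import pandas as pd

# Synthetic daily orders: fast growth, weekly seasonality, multiplicative noise (all invented).
rng = np.random.default_rng(0)
days = pd.date_range("2024-01-01", periods=180, freq="D")
trend = np.linspace(100, 400, len(days))
weekly = 1.0 + 0.3 * np.sin(2 * np.pi * days.dayofweek / 7)
noise = rng.lognormal(mean=0.0, sigma=0.15, size=len(days))
orders = pd.Series(trend * weekly * noise, index=days, name="orders")

log_orders = np.log(orders)           # stabilise the variance
stable = log_orders.diff(7).dropna()  # remove weekly seasonality and most of the trend

print("raw std:   ", round(float(orders.std()), 1))
print("stable std:", round(float(stable.std()), 3))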

1 week, 2 days ago @ youtube.com
How LLM-based products boost employee efficiency and save millions / Elvira Morozova
How LLM-based products boost employee efficiency and save millions / Elvira Morozova

Elvira Morozova, ХХ at Yandex, gave this talk at the AHA!25 conference. Elvira described how the team rolled out YaGPT as operator hints in one of Yandex's support services and, as a result, obtained an A/B-test-proven extra effect on both operator speed and answer quality. Elvira showed why a GPT "exoskeleton" for a support operator is the new norm. Spoiler: using GPT in support currently brings 15% net savings at Yandex Market. Find more great materials in the Yandex for ML Telegram channel: https://t.me/yandexforml #GPT #Яндекс #поддержка #аналитика #AHA25 #YaGPT #искусственныйинтеллект #YandexForAnalytics #LLM #AI

1 week, 3 days ago @ youtube.com
The future of recommender systems / Nikolay Savushkin
The future of recommender systems / Nikolay Savushkin

Nikolay Savushkin, head of the recommendation technologies service at Yandex, gave this talk at the AHA!25 conference. Nikolay explained how personalization works today, how large generative models are being integrated into Yandex's recommender systems, and what awaits us in the future. The talk is aimed at a broad audience with a technical background, introduces the core concepts, and shows the technology trends. Find more great materials in the Yandex for ML Telegram channel: https://t.me/yandexforml #персонализация #рекомендации #машинноеобучение #генеративныемодели #Яндекс #AI #LLM #AHA25 #аналитика #YandexForAnalytics

1 week, 4 days ago @ youtube.com
Integrating ML forecasts into the dynamic pricing system of Yandex Delivery / Andrey Nartsev
Integrating ML forecasts into the dynamic pricing system of Yandex Delivery / Andrey Nartsev

Andrey Nartsev, head of Applied ML at Yandex Delivery, gave this talk at the AHA!25 conference. Andrey broke down the key analytical, product, and ML challenges the team ran into while rolling out dynamic pricing for slower delivery tariffs, and shared insights and useful practices that helped make pricing even more adaptive. After watching this talk you will better understand the difficulties of shipping ML models in a product, learn to prevent potential problems in advance, and be able to apply the presented approaches in your own work. Find more great materials in the Yandex for ML Telegram channel: https://t.me/yandexforml #ML #ценообразование #Яндек…

1 week, 5 days ago @ youtube.com
Time Capsule from Data Fest: how technology will change everything
Time Capsule from Data Fest: how technology will change everything

This year the Data Fest conference celebrated its 10th anniversary. Over that time, machine learning has made its way into many areas of our lives, including how we work. We got curious: how will the progress of ML change our familiar reality in another 10 years? We asked the conference guests, and here is what came of it! #DataFest #машинноеобучение #ML #искусственныйинтеллект #будущеетехнологий #нейросети #цифровоебудущее #технологии #Яндекс #AI

2 weeks, 5 days ago @ youtube.com
Fast and Approximate Responses / Artur Solovyov
Fast and Approximate Responses / Artur Solovyov

This is a talk from Data Fest hosted by Yandex, in the OptimalDL track. Artur Solovyov gave the talk in English on the topic: Fast and Approximate Responses. Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml #DataFest #OptimalDL #FastResponses #ApproximateComputing #MLOptimization #DeepLearning #AIperformance #инференс #оптимизация #нейросети #modelcompression #AI #машиннообучение #inferencespeed #lowlatencyAI #АртурСоловьёв #DLresearch #DataFest2025

1 month ago @ youtube.com
How I built an AI translator for the Slavic interlingua (the Interslavic language) / Salavat Garifullin
How I built an AI translator for the Slavic interlingua (the Interslavic language) / Salavat Garifullin

This is a talk from Data Fest hosted by Yandex. In the NLP track, Salavat Garifullin described how he built the first AI translator for the Slavic interlingua, a constructed inter-Slavic language. Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml #DataFest #NLP #ИИпереводчик #межславянскийязык #интерлингва #машинныйперевод #языки #AI #linguistics #нейросети #naturalanguageprocessing #перевод #lowresource #slaviclanguages #SalavatGarifullin #искусственныйинтеллект #languageAI #переводчик #DataFest2025

1 month ago @ youtube.com
How we built a smart assistant in Lavka on top of YaGPT / Alyona Zaytseva
How we built a smart assistant in Lavka on top of YaGPT / Alyona Zaytseva

This is a talk from Data Fest hosted by Yandex. In the Advanced LLMs track, Alyona Zaytseva described how the team built a smart assistant in Lavka on top of YaGPT. Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml #DataFest #AdvancedLLMs #ЯндексЛавка #YaGPT #LLM #умныйпомощник #AI #искусственныйинтеллект #генеративныйИИ #голосовойпомощник #NLP #персонализация #Lavka #AIassistant #frontendAI #backendAI #biglanguagemodels #AlenaZaytseva #DataFest2025

1 month ago @ youtube.com
From Tokens to Thinking: How Reinforcement Learning Fuels Reasoning in LLMs / Mile Mitrovic
From Tokens to Thinking: How Reinforcement Learning Fuels Reasoning in LLMs / Mile Mitrovic

This is a talk from Data Fest hosted by Yandex, in the Advanced LLMs track. Mile Mitrovic gave the talk in English on the topic: From Tokens to Thinking: How Reinforcement Learning Fuels Reasoning in LLMs. Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml #DataFest #AdvancedLLMs #MilaMitrovic #LLM #reinforcementlearning #reasoning #машинноемышление #RLHF #AI #большиеязыковыемодели #искусственныйинтеллект #нейросети #обучениесподкреплением #AIthinking #futureofAI #generativeAI #LLMarchitecture

1 month ago @ youtube.com
Training LLMs in low-precision arithmetic / Andrey Panferov
Training LLMs in low-precision arithmetic / Andrey Panferov

This is a talk from Data Fest hosted by Yandex. In the DL Frontiers track, Andrey Panferov spoke about training LLMs in low-precision arithmetic. Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml #DataFest #DLFrontiers #LLM #LowPrecision #АндрейПанфёров #машинноеобучение #глубокообучение #искусственныйинтеллект #AI #нейросети #оптимизациямоделей #ML #большиемодели #обучениеLLM #AIтехнологии #MLинфраструктура #ресурсоэффективность

1 month ago @ youtube.com
RePlay, a library for building recommender systems / Alexey Vasilev
RePlay, a library for building recommender systems / Alexey Vasilev

This is a talk from Data Fest hosted by Yandex. In the Open Source track, Alexey Vasilev presented RePlay, a library for building recommender systems. Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml #DataFest #OpenSource #RePlay #RecSys #MachineLearning #ML #рекомендации #алгоритмы #библиотека #персонализация #рекомендательныесистемы #opensource #recommendationengine #AI #АлексейВасильев #DataFest2025 #MLtools

1 month ago @ youtube.com
Cheminformatics is a field at the intersection of chemistry and information technology / Ksenia Nikitina
Cheminformatics is a field at the intersection of chemistry and information technology / Ksenia Nikitina

This is a talk from Data Fest hosted by Yandex. In the ML in Chemistry track, Ksenia Nikitina spoke about cheminformatics, a scientific field at the intersection of chemistry and information technology. Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml #DataFest #MLinChemistry #хемоинформатика #машинноеобучение #искусственныйинтеллект #AI #наука #chemoinformatics #КсенияНикитина #химия #цифроваяхимия #AIвнауке #нейросетивхимии #интердисциплинарность #наукиожизни

1 month ago @ youtube.com
R&D and Deployment of Computer Vision Models in Industrial Environments / Dmitry Yunovidov
R&D and Deployment of Computer Vision Models in Industrial Environments / Dmitry Yunovidov

This is a talk from Data Fest hosted by Yandex, in the ML in Manufacturing track. Dmitry Yunovidov gave the talk in English on the topic: Science Like Industry. MLOps for the Effective R&D Development and Deployment of Computer Vision Models in Industrial Environments. Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml #DataFest #MLinManufacturing #MLOps #ComputerVision #AI #ML #ScienceLikeIndustry #промышленныйИИ #индустриальныйAI #машинноеобучение #искусственныйинтеллект #deployment #RND #ДмитрийЮновидов #MLOpsвпроизводстве #AIвпромышленности

1 month ago @ youtube.com
Product applications of Yandex VLM / Ekaterina Glazkova
Product applications of Yandex VLM / Ekaterina Glazkova

This is a talk from Data Fest hosted by Yandex. In the Practical ML track, Ekaterina Glazkova spoke about product applications of Yandex VLM. Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml #YandexVLM #VLM #мультимодели #AI #ML #DataFest #PracticalML #Яндекс #визуальноязыковыемодели #искусственныйинтеллект #мультимодальныйИИ #компьютерноезрение #ЕкатеринаГлазкова #технологииЯндекса #machinelearning

1 month ago @ youtube.com
ML Trainings
latest post 3 months ago
Anastasia Funkner, Olga Pavlova, Anna Efimova | How Ozon Bank is building the ML platform of the future
Anastasia Funkner, Olga Pavlova, Anna Efimova | How Ozon Bank is building the ML platform of the future

Speakers: Anastasia Funkner, Olga Pavlova, Anna Efimova, Ozon Bank

Talk topic: MLOps, Data Science, and Golang: how Ozon Bank is building the ML platform of the future

Event Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Our social media: Telegram: https://t.me/datafest VKontakte: https://vk.com/datafest Job postings channel on Telegram: https://t.me/odsjobs Course updates channel: https://t.me/odscourses How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

3 months ago @ youtube.com
Alena Fenogenova | New benchmarks 2024-2025 for the Russian language: challenges and prospects
Alena Fenogenova | New benchmarks 2024-2025 for the Russian language: challenges and prospects

Speaker: Alena Fenogenova, AGI NLP TeamLead, Sber

Event Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Our social media:

Telegram: https://t.me/datafest

VKontakte: https://vk.com/datafest

Job postings channel on Telegram: https://t.me/odsjobs

Course updates channel: https://t.me/odscourses

How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

3 months ago @ youtube.com
Maria Begicheva | A company's digital assistant for operational risk management
Maria Begicheva | A company's digital assistant for operational risk management

Speaker: Maria Begicheva, Senior DS, Sber

Event Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Our social media:

Telegram: https://t.me/datafest

VKontakte: https://vk.com/datafest

Job postings channel on Telegram: https://t.me/odsjobs

Course updates channel: https://t.me/odscourses

How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

3 months ago @ youtube.com
Polina Fedotova | Foundation Models in Robotics
Polina Fedotova | Foundation Models in Robotics

Speaker: Polina Fedotova, principal development engineer and research team lead, Sber Robotics Center

Event Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Our social media:

Telegram: https://t.me/datafest

VKontakte: https://vk.com/datafest

Job postings channel on Telegram: https://t.me/odsjobs

Course updates channel: https://t.me/odscourses

How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

3 months ago @ youtube.com
Nonna Shakhova, Emeli Dral, Irina Goloshchapova, Anastasia Nikulina | Round table: Women in Data Science
Nonna Shakhova, Emeli Dral, Irina Goloshchapova, Anastasia Nikulina | Round table: Women in Data Science

Speakers: Nonna Shakhova, Emeli Dral, Irina Goloshchapova, Anastasia Nikulina

Event Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025

How women are changing data science:

Getting started in data science:

Work and career:

Professional development:

The future of data science: Our social media:

Telegram: https://t.me/datafest

VKontakte: https://vk.com/datafest

Job postings channel on Telegram: https://t.me/odsjobs

Course updates channel: https://t.me/odscourses

How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

3 months ago @ youtube.com
Anna Tekucheva | "Do you know what this word is? And yet it exists"
Anna Tekucheva | "Do you know what this word is? And yet it exists"

Speaker: Anna Tekucheva, Data Scientist HML, Wildberries

Event Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Our social media:

Telegram: https://t.me/datafest

VKontakte: https://vk.com/datafest

Job postings channel on Telegram: https://t.me/odsjobs

Course updates channel: https://t.me/odscourses

How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

3 months ago @ youtube.com
Yulia Rakovskaya | Opening remarks at the WiDS Meetup
Yulia Rakovskaya | Opening remarks at the WiDS Meetup

Speaker: Yulia Rakovskaya, Head of Sber's Research and Development Center

Event Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Our social media:

Telegram: https://t.me/datafest

VKontakte: https://vk.com/datafest

Job postings channel on Telegram: https://t.me/odsjobs

Course updates channel: https://t.me/odscourses

How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

3 months ago @ youtube.com
Meetup #1 | Data Fusion Contest 2025
Meetup #1 | Data Fusion Contest 2025

The annual Data Fusion Contest series of machine learning competitions has kicked off. Task page: https://ods.ai/tracks/data-fusion-2025-competitions On Tuesday, February 25 (19:00-20:00 Moscow time) we held the first meetup on Data Fusion Contest 2025 in the ODS space on Spatial.Chat. We talked about the tasks and answered participants' questions. On the program: Anatoly Glushenko, a breakdown of Task 1 “LabelCraft”

Alexey Natekin, an overview of Task 2 “4Cast” and Task 3 “Distribution”; the Data Fusion Contest 2025 team, Q&A and discussion with participants

4 months, 1 week ago @ youtube.com
Anastasia Vepreva | Master class: Generating drug molecules
Anastasia Vepreva | Master class: Generating drug molecules

Speaker: Anastasia Vepreva, developer of models for predicting physicochemical properties and biological activities of small molecules, researcher at the AI in Chemistry Center at ITMO

Event on 21.02.2025: https://ods.ai/events/ai_chemistrymk1 Our social media:

Telegram: https://t.me/datafest

VKontakte: https://vk.com/datafest

Job postings channel on Telegram: https://t.me/odsjobs

Course updates channel: https://t.me/odscourses

How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

4 months, 1 week ago @ youtube.com
Anton Voronov | Year in review: DS/ML careers
Anton Voronov | Year in review: DS/ML careers

Speaker: Anton Voronov, Gazprom ID, Deputy Department Director, Head of the Search platform. Data Ёлка 2024 hosted by VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Our social media:

Telegram: https://t.me/datafest

VKontakte: https://vk.com/datafest

Job postings channel on Telegram: https://t.me/odsjobs

Course updates channel: https://t.me/odscourses

How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

4 months, 1 week ago @ youtube.com
Pyotr Ermakov | Year in review: the PyData stack
Pyotr Ermakov | Year in review: the PyData stack

Speaker: Pyotr Ermakov, ML Brand Director, Yandex. Data Ёлка 2024 hosted by VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Our social media:

Telegram: https://t.me/datafest

VKontakte: https://vk.com/datafest

Job postings channel on Telegram: https://t.me/odsjobs

Course updates channel: https://t.me/odscourses

How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

4 months, 1 week ago @ youtube.com
Valentin Malykh | Year in review: NLP
Valentin Malykh | Year in review: NLP

Speaker: Valentin Malykh, group lead, MTS AI. Data Ёлка 2024 hosted by VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Our social media:

Telegram: https://t.me/datafest

VKontakte: https://vk.com/datafest

Job postings channel on Telegram: https://t.me/odsjobs

Course updates channel: https://t.me/odscourses

How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

4 months, 1 week ago @ youtube.com
Nikolay Anokhin | Year in review: RecSys
Nikolay Anokhin | Year in review: RecSys

Speaker: Nikolay Anokhin, lead ML specialist, AI VK. Data Ёлка 2024 hosted by VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Our social media:

Telegram: https://t.me/datafest

VKontakte: https://vk.com/datafest

Job postings channel on Telegram: https://t.me/odsjobs

Course updates channel: https://t.me/odscourses

How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

4 months, 1 week ago @ youtube.com
Irina Goloshchapova | Year in review: Reliable ML, part 2
Irina Goloshchapova | Year in review: Reliable ML, part 2

Speaker: Irina Goloshchapova, CDO, Raiffeisenbank Operations. Data Ёлка 2024 hosted by VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Our social media:

Telegram: https://t.me/datafest

VKontakte: https://vk.com/datafest

Job postings channel on Telegram: https://t.me/odsjobs

Course updates channel: https://t.me/odscourses

How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

4 months, 1 week ago @ youtube.com
Dmitry Kolodezev | Year in review: Reliable ML, part 1
Dmitry Kolodezev | Year in review: Reliable ML, part 1

Speaker: Dmitry Kolodezev, director, Promsoft. Data Ёлка 2024 hosted by VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Our social media:

Telegram: https://t.me/datafest

VKontakte: https://vk.com/datafest

Job postings channel on Telegram: https://t.me/odsjobs

Course updates channel: https://t.me/odscourses

How to join the ODS Mattermost community chat: https://ods.ai/tracks/mattermost

4 months, 1 week ago @ youtube.com
🎧 Podcasts
Lex Fridman AI Podcast
latest post 1 week, 2 days ago
#473 – Iran War Debate: Nuclear Weapons, Trump, Peace, Power & the Middle East
#473 – Iran War Debate: Nuclear Weapons, Trump, Peace, Power & the Middle East

Debate on Iran war between Scott Horton and Mark Dubowitz.

Scott Horton is the author and director of the Libertarian Institute, editorial director of Antiwar.com, host of The Scott Horton Show, and for the past three decades, a staunch critic of U.S. foreign policy and military interventionism.

Mark Dubowitz is the chief executive of the Foundation for Defense of Democracies, host of the Iran Breakdown podcast, and a leading expert on Iran and its nuclear program for over 20 years.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep473-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://drinkLMNT.c…

1 week, 2 days ago @ lexfridman.com
#472 – Terence Tao: Hardest Problems in Mathematics, Physics & the Future of AI
#472 – Terence Tao: Hardest Problems in Mathematics, Physics & the Future of AI

Terence Tao is widely considered to be one of the greatest mathematicians in history.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep472-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lexNetSuite: Business management software.

Go to http://netsuite.com/lexLMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lexAG1: All-in-one daily nutrition drink.

2 weeks, 6 days ago @ lexfridman.com
#471 – Sundar Pichai: CEO of Google and Alphabet
#471 – Sundar Pichai: CEO of Google and Alphabet

Sundar Pichai is CEO of Google and Alphabet.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep471-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc. Transcript:

https://lexfridman.com/sundar-pichai-transcript CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: https://lexfridman.com/contact EPISODE LINKS:

Sundar's X: https://x.com/sundarpichai

Sundar's Instagram: https://instagram.com/sundarpichai

Sundar's Blog: https://blog.goo…

1 month ago @ lexfridman.com
#470 – James Holland: World War II, Hitler, Churchill, Stalin & Biggest Battles
#470 – James Holland: World War II, Hitler, Churchill, Stalin & Biggest Battles

James Holland is a historian specializing in World War II.

He hosts a podcast called WW2 Pod: We Have Ways of Making You Talk.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep470-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lexLMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkag1.com/lexNotion: Note-taking and team collaboration.

1 month, 1 week ago @ lexfridman.com
#469 – Oliver Anthony: Country Music, Blue-Collar America, Fame, Money, and Pain
#469 – Oliver Anthony: Country Music, Blue-Collar America, Fame, Money, and Pain

Oliver Anthony is singer-songwriter who first gained worldwide fame with his viral hit Rich Men North of Richmond.

He became a voice for many who are voiceless, with many of his songs speaking to the struggle of the working class in modern American life.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep469-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://oracle.com/lexTax Network USA: Full-service tax firm.

Go to https://drinkLMNT.com/lexOUTLINE:(00:00) – Introduction(09:00) – Open mics(13:03) – Mainstream country music(22:10) – Fame(28:06) – Music vs politics(36:56) – Rich Men North of Richmon…

1 month, 2 weeks ago @ lexfridman.com
#468 – Janna Levin: Black Holes, Wormholes, Aliens, Paradoxes & Extra Dimensions
#468 – Janna Levin: Black Holes, Wormholes, Aliens, Paradoxes & Extra Dimensions

Janna Levin is a theoretical physicist and cosmologist specializing in black holes, cosmology of extra dimensions, topology of the universe, and gravitational waves.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep468-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://brain.fm/lexBetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lexNetSuite: Business management software.

Go to https://shopify.com/lexAG1: All-in-one daily nutrition drink.

2 months ago @ lexfridman.com
#467 – Tim Sweeney: Fortnite, Unreal Engine, and the Future of Gaming
#467 – Tim Sweeney: Fortnite, Unreal Engine, and the Future of Gaming

Tim Sweeney is a legendary video game programmer, founder and CEO of Epic Games that created the Unreal Engine, Fortnite, Gears of War, Unreal Tournament, and many other groundbreaking and influential video games.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep467-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://notion.com/lexMasterClass: Online classes from world-class experts.

Go to https://shopify.com/lexAG1: All-in-one daily nutrition drink.

Go to https://drinkag1.com/lexLMNT: Zero-sugar electrolyte drink mix.

2 months ago @ lexfridman.com
#466 – Jeffrey Wasserstrom: China, Xi Jinping, Trade War, Taiwan, Hong Kong, Mao
#466 – Jeffrey Wasserstrom: China, Xi Jinping, Trade War, Taiwan, Hong Kong, Mao

Jeffrey Wasserstrom is a historian of modern China.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep466-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://oracle.com/lexTax Network USA: Full-service tax firm.

Go to https://shopify.com/lexLMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lexAG1: All-in-one daily nutrition drink.

2 months, 1 week ago @ lexfridman.com
#465 – Robert Rodriguez: Sin City, Desperado, El Mariachi, Alita, and Filmmaking
#465 – Robert Rodriguez: Sin City, Desperado, El Mariachi, Alita, and Filmmaking

Robert Rodriguez is a legendary filmmaker and creator of Sin City, El Mariachi, Desperado, Spy Kids, Machete, From Dusk Till Dawn, Alita: Battle Angel, The Faculty, and his newest venture Brass Knuckle Films.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep465-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc. Transcript:

https://lexfridman.com/robert-rodriguez-transcript CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: http…

2 months, 2 weeks ago @ lexfridman.com
#464 – Dave Smith: Israel, Ukraine, Epstein, Mossad, Conspiracies & Antisemitism
#464 – Dave Smith: Israel, Ukraine, Epstein, Mossad, Conspiracies & Antisemitism

Dave Smith is a comedian, libertarian, political commentator, and the host of Part of the Problem podcast.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep464-sc

See below for timestamps, and to give feedback, submit questions, contact Lex, etc. CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: https://lexfridman.com/contact EPISODE LINKS:

Dave's X: https://x.com/ComicDaveSmith

Dave's YouTube: https://youtube.com/DSmithcomic

Dave's Instagram: https://instagram.com/theprobl…

2 months, 3 weeks ago @ lexfridman.com
#463 – Douglas Murray: Putin, Zelenskyy, Trump, Israel, Netanyahu, Hamas & Gaza
#463 – Douglas Murray: Putin, Zelenskyy, Trump, Israel, Netanyahu, Hamas & Gaza

Douglas Murray is the author of On Democracies and Death Cults, The War on The West, and The Madness of Crowds.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep463-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://callofduty.com/warzoneOracle: Cloud infrastructure.

Go to https://oracle.com/lexLMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lexAG1: All-in-one daily nutrition drink.

3 months, 1 week ago @ lexfridman.com
#462 – Ezra Klein and Derek Thompson: Politics, Trump, AOC, Elon & DOGE
#462 – Ezra Klein and Derek Thompson: Politics, Trump, AOC, Elon & DOGE

Ezra Klein is one of the most influential voices representing the left-wing of American politics.

He is a columnist for the NY Times and host of The Ezra Klein Show.

Derek Thompson is a writer at The Atlantic and host of the Plain English podcast.

Together they have written a new book titled Abundance that lays out a set of ideas for the future of the Democratic party.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep462-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

3 months, 1 week ago @ lexfridman.com
#461 – ThePrimeagen: Programming, AI, ADHD, Productivity, Addiction, and God
#461 – ThePrimeagen: Programming, AI, ADHD, Productivity, Addiction, and God

ThePrimeagen (aka Michael Paulson) is a programmer who has educated, entertained, and inspired millions of people to build software and have fun doing it.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep461-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lexNetSuite: Business management software.

Go to http://netsuite.com/lexBetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lexAG1: All-in-one daily nutrition drinks.

3 months, 2 weeks ago @ lexfridman.com
#460 – Narendra Modi: Prime Minister of India – Power, Democracy, War & Peace
#460 – Narendra Modi: Prime Minister of India – Power, Democracy, War & Peace

Narendra Modi is the Prime Minister of India.

On YouTube this episode is available in English, Hindi, Russian (and soon other languages).

Captions and voice-over audio tracks are provided (for the main episode video on YouTube) in English, Hindi, Russian, and the original mixed-language version, with subtitles available in your preferred language.

To listen to the original mixed-language version, please select the Hindi (Latin) audio track.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep460-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

3 months, 3 weeks ago @ lexfridman.com
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Dylan Patel is the founder of SemiAnalysis, a research & analysis company specializing in semiconductors, GPUs, CPUs, and AI hardware.

Nathan Lambert is a research scientist at the Allen Institute for AI (Ai2) and the author of a blog on AI called Interconnects.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep459-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://invideo.io/i/lexpodGitHub: Developer platform and AI code editor.

(4:31:34) – AI agents(4:40:16) – Programming and AI(4:47:43) – Open source(4:56:55) – Stargate(5:04:24) – Future of AIPODCAST LINKS:– Podcast Website: https://lexfridman.com/podcast–…

5 months ago @ lexfridman.com
Microsoft Research Podcast
latest post 5 days, 8 hours ago
AI Testing and Evaluation: Learnings from genome editing
AI Testing and Evaluation: Learnings from genome editing

As generative AI continues to advance, Microsoft has gathered a range of experts—from genome editing to cybersecurity—to share how their fields approach evaluation and risk assessment.

CHARO: Well, you know, genome editing is both very old and very new.

Now the earliest forms of genome editing were very inefficient, and so we didn’t worry that much.

But the bottom-line thing to remember, the way to really think about it is, we don’t regulate genome editing; we regulate the things that use genome editing.

And she said, you know, we don’t regulate genome editing; we regulate the things that use genome editing.

5 days, 8 hours ago @ microsoft.com
AI Testing and Evaluation: Learnings from Science and Industry
AI Testing and Evaluation: Learnings from Science and Industry

Our goal is to learn from their successes and their stumbles to move the science and practice of AI testing forward.

And I think, really, there are two reasons why tech is so, kind of, representative of that kind of challenge that I’ve always found fascinating.

Continues to be a really important topic in the AI policy conversation right now, I think, for really good reason.

Testing is an important component for governance and AI and, of course, in all of these other domains, as well.

I think about almost, like, in the near to mid-term, like three issues that we need to address in the AI, kind of, policy and testing context.

1 week, 5 days ago @ microsoft.com
The AI Revolution in Medicine, Revisited: How AI is reshaping the future of healthcare and medical research
The AI Revolution in Medicine, Revisited: How AI is reshaping the future of healthcare and medical research

LEE: Yeah, yeah.

It cannot—as, you know, Bill was saying—it cannot learn from your document.

And I don’t know if the two of you remember, but I ended up doing a lot of tests.

I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3- mini (opens in new tab).

Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know.

3 weeks, 2 days ago @ microsoft.com
What AI's impact on individuals means for the health workforce and industry
What AI's impact on individuals means for the health workforce and industry

So I don’t think we should be surprised that business schools matter on this because we care about management.

That’s really going to change the way, like, middle school works, was my thinking at the time.

We’ve gone from AI being highly discriminative to AI that’s able to explore the world in particular ways.

The symptoms that they’re showing are quite different, and also their compliance is really, really different.

LEE: Yeah, really, really interesting.

1 month, 1 week ago @ microsoft.com
Abstracts: Zero-shot models in single-cell biology with Alex Lu
Abstracts: Zero-shot models in single-cell biology with Alex Lu

And single-cell foundation models claim to be capable of unraveling deeper insights than ever before.

Basically, we showed that single-cell foundation models perform worse in settings that are fundamental to biological discovery than much simpler machine learning and statistical methods that were used in the field before single-cell foundation models emerged and are the go-to standard for unpacking meaning from these complicated experiments.

And the way to understand this is because single-cell foundation models are trained in a way that tries to expose these models to millions of single-cells.

But let’s also talk about the impact for methodologists, people who are trying to improve these s…

1 month, 2 weeks ago @ microsoft.com
Abstracts: Aurora with Megan Stanley and Wessel Bruinsma
Abstracts: Aurora with Megan Stanley and Wessel Bruinsma

This is such exciting work about environmental forecasting, so we’re happy to have the two of you join us today.

Mostly because AI weather forecasting models are computationally much more efficient and can even be more accurate.

What’s unfortunate though, about this big step forward, is that these developments are mostly limited to the setting of weather forecasting.

Weather forecasting is very important, obviously, but there are many other important environmental forecasting problems out there, such as air pollution forecasting or ocean wave forecasting.

STANLEY: Current approaches have really focused training very specifically on weather forecasting models.

1 month, 2 weeks ago @ microsoft.com
Collaborators: Healthcare Innovation to Impact
Collaborators: Healthcare Innovation to Impact

LUNGREN: And now it really feels like this collaborative effort, you know, really can help start to extend that mission.

I think, you know, Will and Smitha, that we definitely feel the passion and the innovation.

Again, you know, in text, you refer to that earlier and certainly off the shelf, there’s really powerful applications.

LUNGREN: So, I think AI has always been thought of as a savior kind of technology.

And I guess for my part, I think really what we’re going to see is a massive unleash of creativity.

1 month, 2 weeks ago @ microsoft.com
Coauthor roundtable: Reflecting on real world of doctors, developers, patients, and policymakers
Coauthor roundtable: Reflecting on real world of doctors, developers, patients, and policymakers

LEE: Yeah, yeah.

[LAUGHS] GOLDBERG: Right, right, right, yeah.

1 month, 3 weeks ago @ microsoft.com
Abstracts: Heat Transfer and Deep Learning with Hongxia Hao and Bing Lv
Abstracts: Heat Transfer and Deep Learning with Hongxia Hao and Bing Lv

Today I’m talking to two researchers, Hongxia Hao, a senior researcher at Microsoft Research AI for Science, and Bing Lv, an associate professor in physics at the University of Texas at Dallas.

Hongxia and Bing are co-authors of a paper called Probing the Limit of Heat Transfer in Inorganic Crystals with Deep Learning .

LV: So I think one of the biggest things as Hongxia said, right?

We have a lot of new materials, exotic materials, which some of them, Hongxia can elaborate a little bit more.

HAO: Yeah, yeah.

1 month, 4 weeks ago @ microsoft.com
Abstracts: Societal AI with Xing Xie
Abstracts: Societal AI with Xing Xie

I’m here today with Xing Xie, a partner research manager at Microsoft Research and co-author of a white paper called Societal AI: Research Challenges and Opportunities .

HUIZINGA: So let’s start with a brief overview of the background for this white paper on Societal AI.

XIE: The idea for this white paper emerged in response to the shift we are witnessing in the AI landscape.

XIE: Rather than follow a traditional research methodology, we built this white paper around ten fundamental, foundational research questions.

HUIZINGA: Yeah, yeah, yeah.

2 months ago @ microsoft.com
The AI Revolution in Medicine, Revisited: Laws, norms, and ethics for AI in health
The AI Revolution in Medicine, Revisited: Laws, norms, and ethics for AI in health

Roxana is among the world’s thought leaders in AI, healthcare, and medicine, thanks in part to groundbreaking work on AI biases and trustworthiness.

When we think about AI, we think about it being very futuristic, but it’s trained on data from the past.

And so I think your struggles and frustrations on, you know, how to expand that nationwide, I think, are really, really informative.

And in a way, I think … I don’t know that I or my coauthors were satisfied with that.

You know, AI has all the time in the world [LAUGHS] to write you a text.

2 months ago @ microsoft.com
The AI Revolution in Medicine, Revisited: Empowering patients and healthcare consumers in the age of generative AI
The AI Revolution in Medicine, Revisited: Empowering patients and healthcare consumers in the age of generative AI

[LAUGHS]DEBRONKART: Ah, well, that’s … that’s even weirder.

I’m, as I said at the beginning, I’m glad to be alive and I’m really, really, really grateful to be given a chance to share my thoughts with your audience because I really like super smart nerds.

And it’s really to me like where I’m seeing kind of the first set of really kind of promising AI applications.

And so, to me, that’s really kind of where I see the most interesting opportunities for technology and for digital health.

Just really, really appreciate it.

2 months, 2 weeks ago @ microsoft.com
The AI Revolution in Medicine, Revisited: Real-world healthcare AI development and deployment—at scale
The AI Revolution in Medicine, Revisited: Real-world healthcare AI development and deployment—at scale

3 months ago @ microsoft.com
Ideas: Accelerating Foundation Models Research: AI for all
Ideas: Accelerating Foundation Models Research: AI for all

But there was that class I really, really enjoyed, which was mathematical logic.

Well, let’s get onto the topic of Accelerating Foundation Models Research and unpack the big idea behind that.

It might be confusing for some people, Accelerating Foundation Models Research.

And so when we started with Accelerating Foundation Models Research and from now on, I will say AFMR if that’s okay.

It’s about access to people, access to the resources and really co-designing so that we can really, really make more advances together.

3 months ago @ microsoft.com
The AI Revolution in Medicine, Revisited: The reality of generative AI in the clinic

Sara is vice president and chief health AI officer at UC San Francisco Health.

LONGHURST: So the pat response is AI won’t replace doctors, but AI will replace doctors who don’t use AI.

LEE: And I’m assuming a chief health AI officer is not a role that has been around for a long time.

LEE: Should I be impressed or concerned that the chief health AI officer at UC San Francisco Health is using ChatGPT off label?

We’ll delve into how patients are using generative AI for their own healthcare, the hype and reality of AI drug discovery, and more.

3 months, 2 weeks ago @ microsoft.com
NLP Highlights
latest post: none
Data Skeptic
latest post: 1 week ago
Complex Dynamic in Networks

In this episode, we learn why simply analyzing the structure of a network is not enough, and how the dynamics - the actual mechanisms of interaction between components - can drastically change how information or influence spreads. Our guest, Professor Baruch Barzel of Bar-Ilan University, is a leading researcher in network dynamics and complex systems ranging from biology to infrastructure and beyond. Links: BarzelLab; BarzelLab on YouTube. Paper in focus: Universality in network dynamics (2013).

1 week ago @ dataskeptic.com
Github Network Analysis
1 week, 6 days ago @ dataskeptic.com
Networks and Complexity

In this episode, Kyle does an overview of the intersection of graph theory and computational complexity theory. In complexity theory, we care about the runtime of an algorithm as a function of its input size. For many graph problems, the interesting questions we want to ask take longer and longer to answer! This episode provides the fundamental vocabulary and signposts along the path of exploring the intersection of graph theory and computational complexity theory.
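
To see why the runtime blows up, consider a brute-force search for a k-clique: it must examine every k-subset of vertices, so the work grows combinatorially with the number of nodes. A minimal sketch (a generic illustration, not code from the episode):

```python
# Minimal sketch: brute-force k-clique search, illustrating combinatorial blow-up.
# The graph below is a toy example, not data from the episode.
from itertools import combinations

def has_clique(adj, k):
    """Return True if the graph (dict: node -> set of neighbours) has a k-clique."""
    nodes = list(adj)
    for subset in combinations(nodes, k):          # C(n, k) candidate subsets
        if all(v in adj[u] for u, v in combinations(subset, 2)):
            return True
    return False

# Tiny example graph: a triangle plus a pendant node.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(has_clique(adj, 3))  # True  (a, b, c)
print(has_clique(adj, 4))  # False
```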

3 weeks ago @ dataskeptic.com
Actantial Networks

In this episode, listeners will learn about Actantial Networks—graph-based representations of narratives where nodes are actors (such as people, institutions, or abstract entities) and edges represent the actions or relationships between them. The one who will present these networks is our guest Armin Pournaki, a joint PhD candidate at the Max Planck Institute and Sciences, who specializes in computational social science, where he develops methods to extract and analyze political narratives using natural language processing and network science. Armin explains how these methods can expose conflicting narratives around the same events, as seen in debates on COVID-19, climate change, or the wa…
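
To make the structure concrete, here is a tiny sketch of an actantial network stored as a labeled directed graph; the actors and actions below are invented for illustration and are not taken from Armin's research:

```python
# Toy actantial network: nodes are actors, edges carry the narrated action.
# Requires networkx (pip install networkx); the narrative content is invented.
import networkx as nx

G = nx.DiGraph()
G.add_edge("government", "citizens", action="imposes lockdown on")
G.add_edge("citizens", "government", action="protests against")
G.add_edge("scientists", "government", action="advises")

for src, dst, data in G.edges(data=True):
    print(f"{src} --[{data['action']}]--> {dst}")
```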

1 month ago @ dataskeptic.com
Graphs for Causal AI

How to build artificial intelligence systems that understand cause and effect, moving beyond simple correlations? As we all know, correlation is not causation. "Spurious correlations" can show, for example, how rising ice cream sales might statistically link to more drownings, not because one causes the other, but due to an unobserved common cause like warm weather. Our guest, Utkarshani Jaimini, a researcher from the University of South Carolina's Artificial Intelligence Institute, tries to tackle this problem by using knowledge graphs that incorporate domain expertise. Knowledge graphs (structured representations of information) are combined with neural networks in the field of neurosymbo…
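
The ice cream example is easy to reproduce: in the short simulation below (purely illustrative, not from the guest's work), a common cause drives both variables, so they correlate strongly even though neither causes the other:

```python
# Simulate a confounder: warm weather drives both ice cream sales and drownings.
import numpy as np

rng = np.random.default_rng(0)
temperature = rng.normal(25, 5, size=10_000)               # common cause
ice_cream_sales = 2.0 * temperature + rng.normal(0, 5, 10_000)
drownings = 0.3 * temperature + rng.normal(0, 2, 10_000)

# A strong correlation appears despite no direct causal link between the two.
print(np.corrcoef(ice_cream_sales, drownings)[0, 1])
```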

1 month, 1 week ago @ dataskeptic.com
Power Networks
1 month, 2 weeks ago @ dataskeptic.com
Unveiling Graph Datasets
1 month, 4 weeks ago @ dataskeptic.com
Network Manipulation

In this episode we talk with Manita Pote, a PhD student at Indiana University Bloomington, specializing in online trust and safety, with a focus on detecting coordinated manipulation campaigns on social media. Key insights include how coordinated reply attacks target influential figures like journalists and politicians, how machine learning models can detect these inauthentic campaigns using structural and behavioral features, and how deletion patterns reveal efforts to evade moderation or manipulate engagement metrics. Follow our guest: X/Twitter; Google Scholar. Papers in focus: Coordinated Reply Attacks in Influence Operations: Characterization and Detection (2025); Manipulating Twitter throug…

2 months ago @ dataskeptic.com
The Small World Hypothesis

Kyle discusses the history and proof for the small world hypothesis.
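
As a rough numerical companion to the episode (my own sketch, not Kyle's), the snippet below builds small-world graphs of increasing size and shows that the average shortest-path length grows only slowly with the number of nodes:

```python
# Small-world sketch: average shortest path grows slowly with network size.
# Requires networkx; the parameters (k=6 neighbours, p=0.1 rewiring) are arbitrary.
import networkx as nx

for n in (100, 1_000, 5_000):
    G = nx.connected_watts_strogatz_graph(n=n, k=6, p=0.1, seed=42)
    print(n, round(nx.average_shortest_path_length(G), 2))
```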

2 months, 2 weeks ago @ dataskeptic.com
Thinking in Networks

Kyle asks Asaf questions about the new network science course he is now teaching. The conversation delves into topics such as contact tracing, tools for analyzing networks, example use cases, and the importance of thinking in networks.

2 months, 3 weeks ago @ dataskeptic.com
Fraud Networks

In this episode we talk with Bavo DC Campo, a data scientist and statistician, who shares his expertise on the intersection of actuarial science, fraud detection, and social network analytics. Together we will learn how to use graphs to fight against insurance fraud by uncovering hidden connections between fraudulent claims and bad actors. Key insights include how social network analytics can detect fraud rings by mapping relationships between policyholders, claims, and service providers, and how the BiRank algorithm, inspired by Google’s PageRank, helps rank suspicious claims based on network structure. Bavo will also present his iFraud simulator that can be used to model fraudulent networ…
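
For a sense of what a BiRank-style computation involves, here is a heavily simplified toy version (not Bavo's implementation): suspicion scores are propagated back and forth across a bipartite claims-parties graph, with a prior that keeps known-fraudulent claims elevated:

```python
# Toy BiRank-style propagation on a bipartite claims-parties graph (illustrative only).
import numpy as np

# Rows = claims, columns = parties (e.g. policyholders, garages); 1 = involvement.
W = np.array([
    [1, 1, 0],   # claim 0 involves parties 0 and 1
    [1, 0, 1],   # claim 1 involves parties 0 and 2
    [0, 0, 1],   # claim 2 involves party 2 only
], dtype=float)

# Symmetric normalisation, as in BiRank / PageRank-style methods.
Dc = np.diag(1 / np.sqrt(W.sum(axis=1)))
Dp = np.diag(1 / np.sqrt(W.sum(axis=0)))
S = Dc @ W @ Dp

claims_prior = np.array([1.0, 0.0, 0.0])    # claim 0 is a confirmed fraud
parties_prior = np.zeros(3)
alpha = beta = 0.85

c, p = claims_prior.copy(), parties_prior.copy()
for _ in range(50):                          # fixed number of sweeps; converges quickly
    c = alpha * S @ p + (1 - alpha) * claims_prior
    p = beta * S.T @ c + (1 - beta) * parties_prior

print("claim suspicion:", np.round(c, 3))
print("party suspicion:", np.round(p, 3))
```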

3 months ago @ dataskeptic.com
Criminal Networks

In this episode we talk with Justin Wang Ngai Yeung, a PhD candidate at the Network Science Institute at Northeastern University in London, who explores how network science helps uncover criminal networks. Justin is also a member of the organizing committee of the satellite conference dealing with criminal networks at the network science conference in The Netherlands in June 2025. Listeners will learn how graph-based models assist law enforcement in analyzing missing data, identifying key figures in criminal organizations, and improving intervention strategies. Key insights include the challenges of incomplete and inaccurate data in criminal network analysis, how law enforcement agencies us…

3 months, 2 weeks ago @ dataskeptic.com
Graph Bugs

In this episode, today’s guest, Celine Wüst, a master’s student at ETH Zurich specializing in secure and reliable systems, shares her work on automated software testing for graph databases. Celine shows how fuzzing—the process of automatically generating complex queries—helps uncover hidden bugs in graph database management systems like Neo4j, FalconDB, and Apache AGE. Key insights include how state-aware query generation can detect critical issues like buffer overflows and crashes, the challenges of debugging complex database behaviors, and the importance of security-focused software testing. We'll also find out which Graph DB company offers swag for finding bugs in its software and get C…
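
To give a flavor of query fuzzing, the toy generator below emits random Cypher-like queries from a hand-written template; it is a hypothetical sketch, unrelated to Celine's actual tooling, and a real fuzzer would be far more sophisticated about grammar state and crash monitoring:

```python
# Toy fuzzing sketch: emit random Cypher-like queries from a hand-written template.
# Purely illustrative; a real graph-database fuzzer tracks grammar state and monitors crashes.
import random

LABELS = ["Person", "Movie", "Account"]
PROPS = ["name", "age", "balance"]

def random_query(rng: random.Random) -> str:
    label = rng.choice(LABELS)
    prop = rng.choice(PROPS)
    value = rng.randint(-2**31, 2**31 - 1)      # boundary-ish values to provoke bugs
    depth = rng.randint(1, 5)
    pattern = "-[*]->".join(f"(n{i}:{label})" for i in range(depth))
    return f"MATCH {pattern} WHERE n0.{prop} = {value} RETURN n0 LIMIT {rng.randint(0, 100)}"

rng = random.Random(7)
for _ in range(3):
    print(random_query(rng))
```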

3 months, 3 weeks ago @ dataskeptic.com
Organizational Network Analysis

In this episode, Gabriel Petrescu, an organizational network analyst, discusses how network science can provide deep insights into organizational structures using OrgXO, a tool that maps companies as networks rather than rigid hierarchies. Listeners will learn how analyzing workplace collaboration networks can reveal hidden influencers, organizational bottlenecks, and engagement levels, offering a data-driven approach to improving effectiveness and resilience. Key insights include how companies can identify overburdened employees, address silos between departments, and detect vulnerabilities where too few individuals hold critical knowledge. Real-life applications range from mergers and acq…
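
One simple proxy for the kind of analysis described here (a generic sketch, not OrgXO itself) is betweenness centrality on a collaboration graph, which surfaces people sitting on many shortest paths between colleagues and therefore likely bottlenecks or hidden influencers:

```python
# Sketch: spot potential bottlenecks in a collaboration network via betweenness centrality.
# Requires networkx; the collaboration edges below are invented.
import networkx as nx

edges = [
    ("ana", "ben"), ("ana", "cleo"), ("ben", "cleo"),      # one tight team
    ("dan", "eve"), ("dan", "finn"), ("eve", "finn"),      # another tight team
    ("cleo", "dan"),                                       # the only bridge between them
]
G = nx.Graph(edges)

centrality = nx.betweenness_centrality(G)
for person, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{person:5s} {score:.2f}")
```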

4 months ago @ dataskeptic.com
Organizational Networks

Is it better to have your work team fully connected or sparsely connected? In this episode we'll try to answer this question and more with our guest Hiroki Sayama, a SUNY Distinguished Professor and director of the Center for Complex Systems at Binghamton University. Hiroki delves into the applications of network science in organizational structures and innovation dynamics by showing his recent work on extracting network structures from organizational charts to enable insights into decision-making and performance. He'll also cover how network connectivity impacts team creativity and innovation. Key insights include how the structure of organizational networks—such as the depth of hierarchy …

4 months, 1 week ago @ dataskeptic.com
SuperDataScience
latest post: 1 day, 13 hours ago
902: In Case You Missed It in June 2025

In this episode of “In Case You Missed It”, Jon recaps his June interviews on The SuperDataScience Podcast. Hear from Diane Hare, Avery Smith, Kirill Eremenko, and Shaun Johnson as they talk about the best portfolios for AI practitioners, how to stand out in a saturated candidate market for AI roles, how to tell when an AI startup is going places, and ways to lead AI change in business. Additional materials: ⁠⁠⁠www.superdatascience.com/902 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 day, 13 hours ago @ podtrac.com
901: Automating Legal Work with Data-Centric ML (feat. Lilith Bat-Leah)

Senior Director of AI Labs for Epiq, Lilith Bat-Leah, speaks to Jon Krohn about the ways AI has disrupted the legal industry using LLMs and retrieval-augmented generation (RAG), as well as how the data-centric machine learning research movement (DMLR) is systematically improving data quality, and why that is so important. Additional materials: www.superdatascience.com/901 This episode is brought to you by the Dell AI Factory with NVIDIA and Adverity, the conversational analytics platform. Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (05:45) Deciphering legal tech te…

4 days, 13 hours ago @ podtrac.com
900: 95-Year-Old Annie on How to Stay Healthy and Happy

“Stay happy and healthy”: In this special Five-Minute Friday, Jon Krohn speaks with Annie, his grandmother, on her 95th birthday. Hear how she is physically and mentally coping with illnesses that limit her mobility and the joys of having a pet. Additional materials: ⁠⁠⁠⁠⁠⁠www.superdatascience.com/900⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 week, 1 day ago @ podtrac.com
899: Landing $200k+ AI Roles: Real Cases from the SuperDataScience Community, with Kirill Eremenko

Data science skills, a data science bootcamp, and why Python and SQL still reign supreme: In this episode, Kirill Eremenko returns to the podcast to speak to Jon Krohn about SuperDataScience subscriber success stories, where to focus in a field that is evolving incredibly quickly, and why in-person working and networking might give you the edge over other candidates in landing a top AI role. Additional materials: ⁠⁠⁠⁠www.superdatascience.com/899⁠⁠⁠ This episode is brought to you by ⁠Adverity, the conversational analytics platform⁠ and by the ⁠Dell AI Factory with NVIDIA⁠. Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship informat…

1 week, 4 days ago @ podtrac.com
898: My Four-Hour Agentic AI Workshop is Live and 100% Free

In this Five-Minute Friday, Jon Krohn announces his new, free workshop on Agentic AI. On this four-hour comprehensive course, you’ll learn the key terminology for working with these flexible, multi-agent systems and then get to grips with developing and deploying this artificial “team of experts” for all your AI-driven projects. Additional materials: ⁠⁠⁠⁠⁠www.superdatascience.com/898⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

2 weeks, 1 day ago @ podtrac.com
897: How to Enable Enterprise AI Transformation, with Strategy Consultant Diane Hare

Diane Hare talks to Jon Krohn about the power of storytelling for corporate buy-in of AI initiatives, how to actively implement AI to transform organizations, and how emerging professionals can upskill themselves. Hear how she discovered her background in storytelling at Ernst & Young and her work with Simon Sinek, which she finds to be integral to her process. Inspired by Sinek’s aphorism “start with why”, Diane notes that many companies neglect this crucial part of their mission because they never take the time to work on it. Additional materials: ⁠⁠⁠www.superdatascience.com/897⁠⁠ This episode is brought to you by Trainium2, the latest AI chip from AWS, by Adverity, the conversational ana…

2 weeks, 4 days ago @ podtrac.com
896: AI (Probably) Isn’t Taking Your Job (At Least Anytime Soon)

The Economist reported that global Google searches for "AI unemployment" hit an all-time high earlier this year. But do we have to worry about AI taking our jobs? In this week’s Five-Minute Friday, Jon Krohn investigates whether the rise of AI has directly led to an increase in unemployment. Additional materials: ⁠⁠⁠⁠www.superdatascience.com/896 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

3 weeks, 1 day ago @ podtrac.com
895: The Future of Enterprise AI: Investor Shaun Johnson Reveals What Actually Works

How to get funded by a VC specializing in AI: Head of AIX Ventures Shaun Johnson talks to Jon Krohn about investment strategies, how to simplify AI adoption, why a little competition can be so beneficial to AI startups, and how Big Tech is circumventing anti-monopoly measures. Additional materials: ⁠⁠www.superdatascience.com/895⁠ This episode is brought to you by ⁠Adverity, the conversational analytics platform⁠ and by the ⁠Dell AI Factory with NVIDIA⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (10:36) What Shaun looks for when evaluating early-stage AI startups (19:11) Building…

3 weeks, 4 days ago @ podtrac.com
894: In Case You Missed It in May 2025

In this episode of “In Case You Missed It”, Jon Krohn takes clips from interviews with guests in May 2025. From AI agent integration and RAG-based chatbots to education through virtual reality headsets and data harmonization, this episode explores how industry leaders are developing the tools and technologies that can improve operations, education, healthcare, and marketing. Highlight clips are with John Roese, Global Chief Technology Officer and Chief AI Officer at Dell Technologies (Episode 887), Senior Developer Relations Engineer at Posit, PBC Jeroen Janssens and Lead Data Scientist at Xomnia Thijs Nieuwdorp (Episode 885), Founder of CEEK Mary Spio (Episode 889), and Martin Brunthaler, …

4 weeks, 1 day ago @ podtrac.com
893: How to Jumpstart Your Data Career (by Applying Like a Scientist), with Avery Smith

Avery Smith is a passionate and motivational YouTuber and careers educator for data science. In this episode, Jon Krohn asks Avery about the tools and tricks he has learned from personal experience and from his students in how to get ahead in the tech industry. Avery shares the “learning ladder” he uses to help newcomers start on the right foot with great examples from former bootcamp students who have put his theories into practice. And, if you’re using LinkedIn to find jobs, Avery explains why this might be one of the reasons you’re not getting work. Additional materials: ⁠www.superdatascience.com/893 This episode is brought to you by Adverity, the conversational analytics platform and by…

1 month ago @ podtrac.com
892: We’re In The AI “Trough of Disillusionment” (and that’s Great!)

Businesses have entered a “trough of disillusionment” for AI. In this Five-Minute Friday, Jon Krohn learns why Fortune 500 execs are so frustrated with the tools and how they can work their way up the “slope of enlightenment” towards effective AI. Hear why AI take-up hasn’t gone to plan so far in the corporate world and what that world needs from AI to encourage greater business engagement. Additional materials: www.superdatascience.com/892 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month ago @ podtrac.com
891: Conversational AI is Overhauling Data Analytics, with Martin Brunthaler

Martin Brunthaler talks to Jon Krohn about founding Adverity, a data analytics platform for marketing that simplifies integrating data from multiple sources and crunching them into actionable insights. Learn how Adverity became a data analytics powerhouse serving multiple industries, and why Martin thinks AI will strengthen rather than diminish the job market for data scientists, data analysts, and machine learning engineers. Additional materials: www.superdatascience.com/891 Today’s episode is brought to you by Trainium2, the latest AI chip from AWS and by the Dell AI Factory with NVIDIA Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for spo…

1 month, 1 week ago @ podtrac.com
890: The “State of AI” Report 2025

In this week’s Five-Minute Friday, Jon Krohn reveals highlights from Stanford University’s AI Index Report. Released a few weeks ago by the Institute for Human-Centered AI, this annual report details the incredible technical advances, policies, and investments in artificial intelligence. Hear which models achieve the best performance relative to their size, in what scenarios top AI systems can outperform humans (and when humans still outperform AI), and more in Jon’s five key takeaways. Additional materials: ⁠⁠www.superdatascience.com/890⁠⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month, 1 week ago @ podtrac.com
889: AI-Powered Virtual Reality: The Future of Education and Entertainment, with Mary Spio

CEEK founder Mary Spio talks to Jon Krohn about how the platform contributes to the emerging community of digital creators with its blockchain-powered virtual experiences. Hear how Mary got her first investors for CEEK and how it is used across industries as diverse as education, entertainment, aviation, and healthcare. Additional materials: www.superdatascience.com/889 This episode is brought to you by Adverity, the conversational analytics platform and by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (03:42) What CEEK is and the m…

1 month, 2 weeks ago @ podtrac.com
888: Teams of Agents: The Next Frontier in AI Collaboration, with Mike Pell

Mike Pell speaks to Jon Krohn about The Microsoft Garage, a program that drives the culture of innovation at the tech multinational, and how listeners can apply their principles to foster innovation in their workplace. In this Five-Minute Friday, you’ll hear more about Microsoft’s approaches to agentic AI, the future of human-AI collaboration in the workplace, and why experimentation and curiosity are critical skills for the future of work. Additional materials: ⁠www.superdatascience.com/888⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month, 2 weeks ago @ podtrac.com
Data Science at Home
latest post: 2 weeks, 3 days ago
Brains in the Machine: The Rise of Neuromorphic Computing (Ep. 285)

In this episode of Data Science at Home, we explore the fascinating world of neuromorphic computing — a brain-inspired approach to computation that could reshape the future of AI and robotics.

The episode breaks down how neuromorphic systems differ from conventional AI architectures like transformers and LLMs, diving into spiking neural networks (SNNs), their benefits in energy efficiency and real-time processing, and their limitations in training and scalability.
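
For listeners new to SNNs, a minimal leaky integrate-and-fire neuron captures the basic event-driven behavior: the membrane potential integrates input, leaks over time, and emits a spike when it crosses a threshold. A textbook-style sketch (not tied to any particular neuromorphic chip):

```python
# Minimal leaky integrate-and-fire (LIF) neuron: a textbook sketch, not hardware code.
import numpy as np

dt, tau, v_rest, v_thresh, v_reset = 1.0, 20.0, 0.0, 1.0, 0.0   # ms / arbitrary units
steps = 200
current = np.where(np.arange(steps) > 50, 0.06, 0.0)             # step input after 50 ms

v, spikes = v_rest, []
for t in range(steps):
    dv = (-(v - v_rest) + current[t] * tau) / tau * dt            # leaky integration
    v += dv
    if v >= v_thresh:                                             # fire and reset
        spikes.append(t)
        v = v_reset

print(f"{len(spikes)} spikes, first at times (ms): {spikes[:5]}")
```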

Real-world applications are highlighted, including low-power drones, hearing aids, and event-based cameras.

Francesco closes with a vision of hybrid systems where neuromorphic chips and LLMs coexist, blending biological inspiratio…

2 weeks, 3 days ago @ datascienceathome.com
DSH/Warcoded – AI in the Invisible Battlespace (Ep. 284)

This episode explores the invisible battlespace of cyber and electronic warfare, where AI takes center stage.

Sponsors: Building multi-agent software is hard — agent-to-agent and agent-to-tool communication is still the wild west.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com. Warcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

1 month ago @ datascienceathome.com
DSH/Warcoded Swarming the Battlefield (Ep. 283)

Swarming the Battlefield explores how artificial intelligence is revolutionizing combat through coordinated drone swarms.

This episode uncovers how these intelligent agents turn the chaos of the battlefield into a synchronized dance of machine warfare.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com. Warcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

1 month, 1 week ago @ datascienceathome.com
DSH/Warcoded Kill Chains and Algorithmic Warfare – Autonomy in Targeting and Engagement (Ep. 282)

In this gripping follow-up, we dive into how AI is transforming kinetic operations—from identifying a threat to executing a strike.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com. Warcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Whether it’s in the sky, on the ground, or in orbit—if it’s intelligent and mobile, Intrepid helps you build it.

1 month, 3 weeks ago @ datascienceathome.com
DSH/Warcoded: Eyes and Ears of the Machine – AI Reconnaissance and Surveillance (Ep. 281)

Welcome to DSH/Warcoded. We explore how AI is transforming ISR (Intelligence, Surveillance, Reconnaissance)—from satellite imagery to drone feeds.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com. Warcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Learn more at intrepid.ai. #AI #defensetech #ISR #LLM #Warcoded #DataScienceAtHome #OSINT #SIGINT #dronewarfare

1 month, 4 weeks ago @ datascienceathome.com
AI Agents with Atomic Agents 🚀 with Kenny Vaneetvelde (Ep. 280)

🎙️ In this episode of Data Science at Home, we sit down with Kenny Vaneetvelde, the mastermind behind Atomic Agents, a groundbreaking framework redefining AI development.

🔍 Discover how atomicity simplifies complex AI systems, why modularity matters more than ever, and how Atomic Agents is eliminating hidden assumptions and redundant complexity in AI workflows.

💡 From real-world applications to the tech stack behind the framework, Kenny takes us on a deep dive into this lightweight, powerful tool for creating consistent and brand-aligned AI.

📌 Timestamps:
0:00 – Intro
2:30 – Kenny’s journey in AI
5:00 – What are Atomic Agents?
10:45 – Why atomicity matters in AI
18:20 – The tech behind Atomic A…

2 months, 3 weeks ago @ datascienceathome.com
Run massive models on crappy machines (Ep. 279)

This episode explores how to break down barriers by running massive AI models on “crappy machines”—affordable, low-spec devices.

🐦 Twitter: @DataScienceAtHome
📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/
Instagram: https://www.instagram.com/datascienceathome/
Facebook: https://www.facebook.com/datascienceAH
LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast
Discord Channel: https://discord.gg/4UNKGf3
NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, interviews, and discussions.

Send us mail …

3 months ago @ datascienceathome.com
WeightWatcher: The AI Detective for LLMs (DeepSeek & OpenAI included) (Ep. 278)

Enter WeightWatcher—the AI detective tool that peeks inside neural networks without needing their data.

🐦 Twitter: @DataScienceAtHome
📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/
Instagram: https://www.instagram.com/datascienceathome/
Facebook: https://www.facebook.com/datascienceAH
LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast
Discord Channel: https://discord.gg/4UNKGf3
NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, interviews, and discussions.

Send us mail at:hello@datascienceathom…

3 months ago @ datascienceathome.com
Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything (Ep. 278)

From the viral article “Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything” on my newsletter at https://defragzone.substack.com/p/techs-dumbest-mistake-why-firing. Here are my thoughts about AI replacing programmers… ✨ Connect with us!

🐦 Twitter: @DataScienceAtHome
📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/
Instagram: https://www.instagram.com/datascienceathome/
Facebook: https://www.facebook.com/datascienceAH
LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast
Discord Channel: https://discord.gg/4UNKGf3
NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data profes…

3 months, 1 week ago @ datascienceathome.com
Scaling Smart: AI, Data, and Building Future-Ready Enterprises with Josh Miramant (Ep. 276)

In this episode, we dive into the transformative world of AI, data analytics, and cloud infrastructure with Josh Miramant, CEO of Blue Orange Digital.

As a seasoned entrepreneur with over $25 million raised across ventures and two successful exits, Josh shares invaluable insights on scaling data-driven businesses, integrating machine learning frameworks, and navigating the rapidly evolving landscape of cloud data architecture.

From generative AI to large language models, Josh explores cutting-edge trends shaping financial services, real estate, and consumer goods.

Tune in for a masterclass in leveraging data for impact and innovation!

Links:
https://blueorange.digital/
https://blueorange.digit…

6 months, 2 weeks ago @ datascienceathome.com
Autonomous Weapons and AI Warfare (Ep. 275)

AI is revolutionizing the military with autonomous drones, surveillance tech, and decision-making systems.

In this episode of Data Science at Home, we expose the cutting-edge tech reshaping defense—and the chilling ethical questions that follow.

🐦 Twitter: @DataScienceAtHome
📘 LinkedIn: Francesco Gad
📷 Instagram: https://www.instagram.com/datascienceathome/
📘 Facebook: https://www.facebook.com/datascienceAH
💼 LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast
💬 Discord Channel: https://discord.gg/4UNKGf3
NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

S…

6 months, 3 weeks ago @ datascienceathome.com
8 Proven Strategies to Scale Your AI Systems Like OpenAI! 🚀 (Ep. 274)

In this episode of Data Science at Home, we’re diving deep into the powerful strategies that top AI companies, like OpenAI, use to scale their systems to handle millions of requests every minute!

From stateless services and caching to the secrets of async processing, discover 8 essential strategies to make your AI and machine learning systems unstoppable.
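
As a hedged, minimal illustration of two of those strategies, caching and async processing (the code is mine, not from the episode), the sketch below memoizes an expensive preprocessing step and fans out many requests concurrently so slow calls overlap instead of queueing:

```python
# Sketch: combine caching with async fan-out so repeated or concurrent requests stay cheap.
import asyncio
from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_preprocess(prompt: str) -> str:
    # Stand-in for expensive, deterministic preprocessing (tokenisation, retrieval, ...).
    return prompt.strip().lower()

async def call_model(prompt: str) -> str:
    # Stand-in for a network call to a model server.
    await asyncio.sleep(0.1)
    return f"answer to: {cached_preprocess(prompt)}"

async def main() -> None:
    prompts = ["What is AI?", "what is ai?", "Scaling tips?"] * 10
    answers = await asyncio.gather(*(call_model(p) for p in prompts))  # concurrent fan-out
    print(len(answers), "answers;", cached_preprocess.cache_info())

asyncio.run(main())
```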

Instagram: https://www.instagram.com/datascienceathome/
Twitter: @datascienceathome
Facebook: https://www.facebook.com/datascienceAH
LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast
Discord Channel: https://discord.gg/4UNKGf3
NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and ma…

6 months, 3 weeks ago @ datascienceathome.com
Humans vs. Bots: Are You Talking to a Machine Right Now? (Ep. 273)

Together, they explore the growing importance of distinguishing human-written from AI-generated text, discussing real-world examples from social media to news.

How reliable are current detection tools like DetectGPT?

What are the ethical and technical challenges ahead as AI continues to advance?

And is the balance between innovation and regulation tipping in the right direction?

Tune in for insights on the future of AI text detection and the broader implications for media, academia, and policy.
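
A crude baseline for this kind of detection is to score text by its perplexity under a language model, since model-generated text tends to look unusually probable to that model; this is only a rough cousin of methods like DetectGPT, and the sketch below (assuming the Hugging Face transformers package and the public gpt2 checkpoint) is illustrative, not a reliable detector:

```python
# Crude perplexity baseline for "was this text machine-generated?" (illustrative only).
# Assumes: pip install torch transformers; downloads the public gpt2 checkpoint.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss      # mean next-token cross-entropy
    return float(torch.exp(loss))

# Lower perplexity is weak evidence of machine generation; fixed thresholds are unreliable.
print(perplexity("The cat sat on the mat."))
print(perplexity("Colorless green ideas sleep furiously."))
```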

7 months, 1 week ago @ datascienceathome.com
AI bubble, Sam Altman’s Manifesto and other fairy tales for billionaires (Ep. 272)

Welcome to Data Science at Home, where we don’t just drink the AI Kool-Aid.

Today, we’re dissecting Sam Altman’s “AI manifesto”—a magical journey where, apparently, AI will fix everything from climate change to your grandma’s back pain.

In this episode, I’ll break down the bold (and often bizarre) claims in Altman’s grand speech for the Intelligence Age.

I’ll give you the real scoop on what’s realistic, what’s nonsense, and why some tech billionaires just can’t resist overselling.

Chapters
00:00 – Intro
00:18 – CEO of Baidu Statement on AI Bubble
03:47 – News On Sam Altman Open AI
06:43 – Online Manifesto “The Intelligence Age”
13:14 – Deep Learning
16:26 – AI gets Better With Scale
17:45 – Conclu…

7 months, 2 weeks ago @ datascienceathome.com
AI vs. The Planet: The Energy Crisis Behind the Chatbot Boom (Ep. 271)

In this episode of Data Science at Home, we dive into the hidden costs of AI’s rapid growth — specifically, its massive energy consumption.

With tools like ChatGPT reaching 200 million weekly active users, the environmental impact of AI is becoming impossible to ignore.

Each query, every training session, and every breakthrough come with a price in kilowatt-hours, raising questions about AI’s sustainability.
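
To make the kilowatt-hour framing concrete, a back-of-envelope estimate helps; every figure below except the user count is an assumption, so treat the result as an order-of-magnitude illustration only:

```python
# Back-of-envelope: weekly energy for chatbot queries under assumed (hypothetical) figures.
weekly_users = 200_000_000          # stated in the episode description
queries_per_user_per_week = 10      # assumption
wh_per_query = 0.3                  # rough, often-quoted estimate; highly uncertain

weekly_kwh = weekly_users * queries_per_user_per_week * wh_per_query / 1_000
household_kwh_per_week = 200        # assumption: roughly a 10,000 kWh/year household
print(f"~{weekly_kwh:,.0f} kWh per week, roughly {weekly_kwh / household_kwh_per_week:,.0f} households")
```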

Join us as we uncover the staggering figures behind AI’s energy demands and explore practical solutions for the future.

From efficiency-focused algorithms and specialized hardware to decentralized learning, this episode examines how we can balance AI’s advancements with our planet’s …

7 months, 3 weeks ago @ datascienceathome.com