Very ML
State-of-the-art Machine Learning News Feed
/r/MachineLearning
последний пост 2 часа назад
[D] What are the current applications of AI in automotive and motorsport industries? Any companies, labs or professors actively working at the intersection?
[D] What are the current applications of AI in automotive and motorsport industries? Any companies, labs or professors actively working at the intersection?

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

2 часа назад @ reddit.com
Help with mentorship [d]
Help with mentorship [d]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

3 часа назад @ reddit.com
[D] Lightning/Other high-level frameworks for distributed training?
[D] Lightning/Other high-level frameworks for distributed training?

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

3 часа назад @ reddit.com
[D] Most widely used open-source decoder-only transformer?
[D] Most widely used open-source decoder-only transformer?

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

3 часа назад @ reddit.com
[R] Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
[R] Pushing the Limits of Large Language Model Quantization via the Linearity Theorem

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

4 часа назад @ reddit.com
[D] Use Cases for Video Mapping/Timestamping Software for ML Training?
[D] Use Cases for Video Mapping/Timestamping Software for ML Training?

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

10 часов назад @ reddit.com
Looking for collaboration [R]
Looking for collaboration [R]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

10 часов назад @ reddit.com
[P] Clustering time-series data into seasonal and no-seasonal types
[P] Clustering time-series data into seasonal and no-seasonal types

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

11 часов назад @ reddit.com
[D] Is cold start still a pain point in multi-model LLM inference?
[D] Is cold start still a pain point in multi-model LLM inference?

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

12 часов назад @ reddit.com
[D] Is my take on transformers in time series reasonable / where is it wrong?
[D] Is my take on transformers in time series reasonable / where is it wrong?

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

14 часов назад @ reddit.com
[D] The Only Way We Can "Humanize" LLMs' Output is by Using Real Human Data During All Training Stages
[D] The Only Way We Can "Humanize" LLMs' Output is by Using Real Human Data During All Training Stages

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

14 часов назад @ reddit.com
[D] Spotify 100,000 Podcasts Dataset availability
[D] Spotify 100,000 Podcasts Dataset availability

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

16 часов назад @ reddit.com
[P] I built a self-hosted version of DataBricks for research
[P] I built a self-hosted version of DataBricks for research

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

20 часов назад @ reddit.com
[P] Volga - On-Demand Compute in Real-Time AI/ML - Overview and Architecture
[P] Volga - On-Demand Compute in Real-Time AI/ML - Overview and Architecture

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

20 часов назад @ reddit.com
Visual Theory of Mind Enables the Invention of Proto-Writing
Visual Theory of Mind Enables the Invention of Proto-Writing

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

23 часа назад @ reddit.com
Towards Data Science
последний пост 5 часов назад
Exporting MLflow Experiments from Restricted HPC Systems
Exporting MLflow Experiments from Restricted HPC Systems Exporting MLflow Experiments from Restricted HPC Systems

We will focus on:Setting up a local HPC MLflow server on a port with local directory storage.

Transfer experiment data from the local temp folder on HPC to the Remote Mlflow server.

I have deployed Charmed MLflow (MLflow server, MySQL, MinIO) using juju, and the whole thing is hosted on MicroK8s localhost.

5) Transfer experiment runs to MLflow Server:Move everything from the HPC to the temporary folder on the MLflow server.

Spinning up a local MLflow server instance, exporting experiments, and then importing them to my remote MLflow server provided me with flexibility without having to change my workflow.

5 часов назад @ towardsdatascience.com
How to Benchmark DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI’s simple-evals
How to Benchmark DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI’s simple-evals How to Benchmark DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI’s simple-evals

Reasoning models, such as DeepSeek-R1 and OpenAI’s o-series models (e.g., o1, o3), are large language models (LLMs) trained using reinforcement learning to perform reasoning.

This has raised doubts about the utility of distilled reasoning models, especially when they struggle to give correct answers despite generating long reasoning.

Do check out the interestingly unique DeepSeek-R1 usage recommendations — especially for benchmarking — to ensure optimal performance when using DeepSeek-R1 models.

Following the steps above, we have successfully set up and executed the GPQA-Diamond benchmarking on the DeepSeek-R1 distilled model.

Wrapping It UpIn this article, we showcased how we can combine t…

5 часов назад @ towardsdatascience.com
An Existential Crisis of a Veteran Researcher in the Age of Generative AI
An Existential Crisis of a Veteran Researcher in the Age of Generative AI An Existential Crisis of a Veteran Researcher in the Age of Generative AI

I was trying to convince myself that this is only a tool to help in the literature review.

If I say it took 3-6 months to do a solid literature review 15 years ago, I was not wrong.

These AI research assistant tools can enhance all these steps: reviewing your articles and selecting the most relevant journal for you.

I can ask AI research assistant tools to perform a semantic search within an extensive database.

If you can’t say it, AI can’t make it!

10 часов назад @ towardsdatascience.com
Why Most Cyber Risk Models Fail Before They Begin
Why Most Cyber Risk Models Fail Before They Begin Why Most Cyber Risk Models Fail Before They Begin

This article explores why traditional cyber risk models fall short and how applying some light statistical tools such as probabilistic modeling offers a better way forward.

The Two Schools of Cyber Risk ModelingInformation security professionals primarily use two different approaches to modeling risk during the risk assessment process: qualitative and quantitative.

Qualitative Risk ModelingImagine two teams assess the same risk.

At a glance, this seems reasonable as the commonly used definition for risk in information security is:\[\text{Risk} = \text{Likelihood } \times \text{Impact}\]From a statistical standpoint, however, qualitative risk modeling has some pretty important pitfalls.

Look…

10 часов назад @ towardsdatascience.com
Data Science: From School to Work, Part IV
Data Science: From School to Work, Part IV Data Science: From School to Work, Part IV

For a unit test to be truly considered as such, it must adhere to a basic rule: A unit test must not depend on functionalities outside the unit under test.

The Unit Tests with DoctestA fast and simple way of making unit tests is to use docstring .

To overcome this problem and test several data sets in a single unit test, we use the parametrize .

They differ from integration tests and unit tests because you don’t need to know the code to perform them.

By implementing a system of unit tests, integration tests, functional tests and E2E tests, you can ensure that your application meets the specifications.

11 часов назад @ towardsdatascience.com
Explained: How Does L1 Regularization Perform Feature Selection?
Explained: How Does L1 Regularization Perform Feature Selection? Explained: How Does L1 Regularization Perform Feature Selection?

Feature selection can be a manual or rather explicit process when performed with filter or wrapper methods.

Embedded methods perform feature selection implicitly, without using any pre-defined selection criteria and deriving it from the training data itself.

In later sections, we will describe the role of regularization in performing this intrinsic feature selection.

So, how are regularization and feature selection connected to attain a common goal of optimal model complexity?

“L1 regularization performs feature selection” is a simple statement that most ML learners agree with, without diving deep into how it works internally.

1 day, 3 hours назад @ towardsdatascience.com
Enterprise AI: From Build-or-Buy to Partner-and-Grow
Enterprise AI: From Build-or-Buy to Partner-and-Grow Enterprise AI: From Build-or-Buy to Partner-and-Grow

I deeply believe that most companies should build AI expertise internally — this will provide them with more bandwidth in their AI strategy and activities in the future.

Consider your internal technical capabilities, such as skilled AI talent, existing reusable AI assets (e.g.

AI learning curve: Reflects the complexity of acquiring AI expertise and operationalizing it within the organization.

Most B2B AI systems combine two kinds of expertise: domain expertise, which lives within your company, and technical AI expertise, which can be brought in through an external partner if you don’t (yet) have specialized AI skills.

In partnering, start early and focus on communicationBy now, you know tha…

1 day, 6 hours назад @ towardsdatascience.com
How to Get Performance Data from Power BI with DAX Studio
How to Get Performance Data from Power BI with DAX Studio How to Get Performance Data from Power BI with DAX Studio

The Storage Engine can call the Formula Engine when an xmSQL-Query contains functions that the Storage Engine cannot execute.

In case that we must start DAX Studio manually, we can manually connect to the Power BI file as well:Figure 5 – Manually connect DAX Studio to Power BI Desktop (Figure by the Author)After the connection is established, an empty query is opened in DAX Studio.

But, before pasting the DAX Query from Power BI Desktop, we have to start Server Timings in DAX Studio (Right top corner of the DAX Studio Window):Figure 6 – Start Server Timings in DAX Studio (Figure by the Author)After pasting the Query to the Empty Editor, we have to Enable the “Clear on Run” Button and execut…

1 day, 7 hours назад @ towardsdatascience.com
MapReduce: How It Powers Scalable Data Processing
MapReduce: How It Powers Scalable Data Processing MapReduce: How It Powers Scalable Data Processing

The programming model breaks up the computation into the following two primitives:Map : given a partition of the input data to process, parse the input data for each of its individual records.

: given a partition of the input data to process, parse the input data for each of its individual records.

One of the motivating data processing tasks that inspired Google to create the MapReduce framework was to build indexes for its search engine.

For additional examples of data processing tasks that fit well with the MapReduce framework, check out the original paper.

All that being said, the MapReduce framework is no longer the go-to model for most modern large-scale data processing tasks.

1 day, 11 hours назад @ towardsdatascience.com
AI Agents Processing Time Series and Large Dataframes
AI Agents Processing Time Series and Large Dataframes AI Agents Processing Time Series and Large Dataframes

Agents are AI systems, powered by LLMs, that can reason about their objectives and take actions to achieve a final goal.

They are designed not just to respond to queries, but to orchestrate a sequence of operations, including processing data (i.e.

In this tutorial, I’m going to show how to process dataframes and time series with AI Agents.

Output: {res}" messages.append( {"role":"assistant", "content":local_memory} ) available_tools.pop(tool_used) if len(available_tools) == 1: messages.append( {"role":"user", "content":"now activate the tool final_answer."}

ConclusionThis article has been a tutorial to demonstrate how to build from scratch Agents that process time series and large dataframe…

1 day, 13 hours назад @ towardsdatascience.com
(Many) More TDS Contributors Are Now Eligible for Earning Through the Author Payment Program
(Many) More TDS Contributors Are Now Eligible for Earning Through the Author Payment Program (Many) More TDS Contributors Are Now Eligible for Earning Through the Author Payment Program

We’re thrilled to share that we’ve recently introduced a new earnings tier: articles that gain 500 engaged views can now earn a minimum payout of $100.

The immediate result, and the one we care about the most, is that the number of eligible articles will increase—drastically.

(We’ve already contacted all authors who published on TDS in February and whose articles have crossed the 500 engaged-view threshold.)

Since launching the new TDS site, our authors’ most-requested feature—by a wide margin!—has been access to their articles’ stats.

You’ll be able to see your total views and engaged views (reminder: the latter are views by readers who spend at least 30 seconds on an individual article), …

1 day, 13 hours назад @ towardsdatascience.com
Building a Personal API for Your Data Projects with FastAPI
Building a Personal API for Your Data Projects with FastAPI Building a Personal API for Your Data Projects with FastAPI

Here’s today’s fix (and topic): build yourself a personal API.

Let’s review today’s table of contents:What is a personal API?

Some use cases Setting it up with Fastapi ConclusionWhat Is a Personal API?

But let’s see some real use cases where anyone like you and me would benefit from a personal API.

Some Use CasesWhether you’re a data scientist, analyst, ML engineer, or just building cool stuff on weekends, a personal API can become your secret productivity weapon.

2 days, 1 hour назад @ towardsdatascience.com
Beginner’s Guide to Creating a S3 Storage on AWS
Beginner’s Guide to Creating a S3 Storage on AWS Beginner’s Guide to Creating a S3 Storage on AWS

About this articleThe objective of this article is to demonstrate how to create a basic S3 Storage.

By the end of the tutorial, we will have a functioning S3 storage that allows remote access to uploaded images.

Create S3 storageTo perform any operations related to S3 storage management, select the Storage option from the service menu.

To create a folder, click the Create folder button.

ConclusionIn this article, we have introduced the AWS S3 storage system, which is very useful for storing large amounts of unstructured data.

2 days, 2 hours назад @ towardsdatascience.com
Retrieval Augmented Generation (RAG) — An Introduction
Retrieval Augmented Generation (RAG) — An Introduction Retrieval Augmented Generation (RAG) — An Introduction

These modela have a information retrieval component that allows the model to access up-to-date data, and the generative capabilities they are already well known for.

This hybrid model architecture is called Retrieval Augmented Generation, or RAG for short.

Example of external sources being shown as part of the output of the RAG model.

The second piece of the Rag Architecture is what is the most visible to us, consumers, the generation model.

For every query the user makes, there needs to be one step for information retrieval, and another for text generation.

2 days, 7 hours назад @ towardsdatascience.com
Beyond the Code: Unconventional Lessons from Empathetic Interviewing
Beyond the Code: Unconventional Lessons from Empathetic Interviewing Beyond the Code: Unconventional Lessons from Empathetic Interviewing

This article reflects on the lessons learned across CV reviews, technical interviews, and post-interview feedback.

Code or no codeWhether I include pre-written code or expect the candidate to write depends on the time available.

Tips for Interviewers (Technical Section)Start with guiding questions that explore high-level considerations before narrowing down.

They consider that the error may not even be in the source code, but the environment or elsewhere (See Why Code Rusts in reference).

Style vs FlowSome candidates added pleasantries and extra instructions to their prompts, rather than just pasting the relevant code and error message.

2 days, 7 hours назад @ towardsdatascience.com
Distill.pub Distill.pub
последний пост None
The Gradient The Gradient
последний пост 5 months, 1 week назад
Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research
Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research

Mathematics and statistics, once the primary guides of machine learning research, now struggle to provide immediate insight into the latest breakthroughs.

This shift has prompted speculation about mathematics’ diminished role in machine learning research moving forward.

It is also the way that symmetries are usually leveraged when performing computations (for example, in machine learning).

One can reasonably argue that diagrammatic descriptions of well-known constructions, like products, are not useful for the machine learning researcher.

However, as we’ve demonstrated, while mathematics may not maintain the same role in machine learning research that it has held in the past, the success of…

5 months, 1 week назад @ thegradient.pub
What's Missing From LLM Chatbots: A Sense of Purpose
What's Missing From LLM Chatbots: A Sense of Purpose What's Missing From LLM Chatbots: A Sense of Purpose

Let's jump back to the 1970s, when Roger Schank introduced his "restaurant script" as a kind of dialogue system [1].

The minimum requirement we could have for a dialogue system is that it can stay on the task we gave them.

Concluding marksI have reviewed the making of current LLM dialogue systems, how and why it is insufficient.

The following are two research questions that I’m mostly excited about:(1) Better monitoring and control of dialogue systems with steering techniques.

CitationFor attribution of this in academic contexts or books, please cite this work as:Kenneth Li, "From prediction to purpose: a tutorial on LLM dialogue system", The Gradient, 2024.

7 months, 2 weeks назад @ thegradient.pub
We Need Positive Visions for AI Grounded in Wellbeing
We Need Positive Visions for AI Grounded in Wellbeing We Need Positive Visions for AI Grounded in Wellbeing

This leads to our second conclusion: We need plausible positive visions of a society with capable AI, grounded in wellbeing.

The rest of this post describes in more detail (1) what we mean by AI that benefits our wellbeing, (2) the need for positive visions for AI grounded in wellbeing, and (3) concrete leverage points to aid in the development and deployment of AI in service of such positive visions.

In diving into the philosophy of flourishing, wellbeing economics, or psychological theories of human wellbeing, one encounters many interesting, compelling, but seemingly incompatible ideas.

The case so far is that we need positive visions for society with capable AI, grounded in individual a…

8 months, 3 weeks назад @ thegradient.pub
TheSequence TheSequence
последний пост 19 часов назад
The Sequence Engineering #528: Inside Google's New Agent Development Kit
The Sequence Engineering #528: Inside Google's New Agent Development Kit The Sequence Engineering #528: Inside Google's New Agent Development Kit

Created Using GPT-4oOne today’s engineering section, I would like to dive into one of Google’s recent announcements in agentic development.

The Agent Development Kit (ADK) is an open-source Python framework to support the full lifecycle of intelligent agent development—from prototyping and evaluation to production deployment.

Designed with composability and extensibility in mind, ADK empowers researchers and developers to build robust agentic systems ranging from simple task handlers to complex, multi-agent orchestration layers.

Its native integration with Google's Gemini models and the broader Vertex AI ecosystem significantly enhances performance and scalability.

Architectural OverviewADK…

19 часов назад @ thesequence.substack.com
The Sequence Knowledge #527: Let's Learn About Math Benchmarks
The Sequence Knowledge #527: Let's Learn About Math Benchmarks The Sequence Knowledge #527: Let's Learn About Math Benchmarks

Created Using GPT-4oToday we will Discuss:An introduction to math benchmarks.

A review of Frontier Math, one of the most challenging math benchmarks ever built.

💡 AI Concept of the Day: An Intro to Math BenchmarksIn today’s series about AI benchmarks we are going to discuss one of the most fascinating areas of evaluation.

Mathematical reasoning has rapidly emerged as one of the key vectors for evaluating foundation models models, prompting the development of sophisticated benchmarks to evaluate AI systems' capabilities.

The MATH benchmark has become increasingly saturated for state-of-the-art models, with leading systems achieving impressive accuracy rates.

1 day, 19 hours назад @ thesequence.substack.com
📝 Guest Post: I Built a Deep Research with Open Source – and So Can You!
📝 Guest Post: I Built a Deep Research with Open Source – and So Can You! 📝 Guest Post: I Built a Deep Research with Open Source – and So Can You!

In this guest post, Stefan Webb, Developer Advocate at Zilliz, builds a lightweight “Deep Research” clone using open-source tools.

to perform research using Wikipedia.

But how does this technology work, and why is Deep Research a noticeable improvement over previous attempts (like Google’s Deep Research - incoming trademark dispute alert)?

As for the former, there is no doubt much “secret sauce” underlying Deep Research.

However, Deep Research, as OpenAI acknowledges, has limitations common to Generative AI technology.

2 days, 19 hours назад @ thesequence.substack.com
The Sequence Radar #526: The OpenAI Blitz: From GPT-4.1 to Windsurf
The Sequence Radar #526: The OpenAI Blitz: From GPT-4.1 to Windsurf The Sequence Radar #526: The OpenAI Blitz: From GPT-4.1 to Windsurf

Alongside the flagship model, OpenAI released 4.1-mini and 4.1-nano variants, optimizing for different tradeoffs in latency, cost, and performance.

Internally referred to as the "o-series," these models are built to excel at multi-step reasoning, web browsing, visual tasks, and planning.

OpenAI positions o3 as its most advanced reasoning model to date, capable of handling sophisticated instructions with higher reliability.

On the developer tooling front, OpenAI launched Codex CLI—an open-source command-line coding assistant that runs locally but connects to OpenAI models.

If finalized, the acquisition would give OpenAI a more robust foothold in the IDE-level development experience and furth…

3 days, 19 hours назад @ thesequence.substack.com
The Sequence Research #525: Inside the Model that Can Write AI Peer-Reviewed Scientific Papers
The Sequence Research #525: Inside the Model that Can Write AI Peer-Reviewed Scientific Papers The Sequence Research #525: Inside the Model that Can Write AI Peer-Reviewed Scientific Papers

Created Using GPT-4oIf you follow this newsletter, you know I am super passionate about the ideas of open endedness AI, specifically when comes to scientific research.

The field of automated scientific discovery has experienced rapid evolution, with artificial intelligence (AI) systems increasingly entrusted with the end-to-end execution of the scientific method.

This autonomy is powered by a progressive agentic tree-search framework, coordinated by a central experiment manager agent.

Additionally, v2 integrates a Vision-Language Model (VLM) feedback mechanism, which plays a vital role in refining visual outputs, especially figures in the generated papers.

The Experiment Progress Manager an…

5 days, 19 hours назад @ thesequence.substack.com
The Sequence Opinion #524: OpenAI, Anthropic, and DeepMind are Building the Same AI Cognitive Primitives.Are we driving towards monolithic models?
The Sequence Opinion #524: OpenAI, Anthropic, and DeepMind are Building the Same AI Cognitive Primitives.Are we driving towards monolithic models? The Sequence Opinion #524: OpenAI, Anthropic, and DeepMind are Building the Same AI Cognitive Primitives.Are we driving towards monolithic models?

Created Using GPT-4oThe leading AI labs—OpenAI, Anthropic, and Google DeepMind—are independently converging on a shared set of core AI capabilities.

Despite different architectures and branding, these organizations are building systems around similar cognitive primitives: reasoning, planning, tool use, search, code execution, and computer control.

These functions represent the foundations of general-purpose intelligence and are becoming central to the next generation of AI systems.

We also examine the trajectory toward single models that can reason, perceive, and act across multiple domains—setting the stage for unified AI agents that go far beyond text prediction.

Common Cognitive Primitiv…

6 days, 19 hours назад @ thesequence.substack.com
The Sequence Engineering #523: Diving Into Google's Agent2Agent (A2A) Protocol
The Sequence Engineering #523: Diving Into Google's Agent2Agent (A2A) Protocol The Sequence Engineering #523: Diving Into Google's Agent2Agent (A2A) Protocol

Created Using GPT-4oThe rise of autonomous agents marks a turning point in artificial intelligence.

As these agents increasingly take on responsibilities across workflows, applications, and domains, the need for interoperable communication becomes vital.

Historically, most agents have been siloed—designed within proprietary frameworks and unable to collaborate with others outside their ecosystems.

Recognizing this limitation, Google introduced the Agent2Agent (A2A) protocol, a flexible and standardized framework to enable direct interaction between AI agents, regardless of their implementation specifics.

A2A is engineered not merely as a communication tool but as a foundational layer for or…

1 week назад @ thesequence.substack.com
The Sequence Knowledge #532: Learning About AI Reasoning Benchmarks
The Sequence Knowledge #532: Learning About AI Reasoning Benchmarks The Sequence Knowledge #532: Learning About AI Reasoning Benchmarks

Image Credit: GPT-4oToday we will Discuss:An intro to reasoning benchmarks.

💡 AI Concept of the Day: Reasoning BenchmarksAI reasoning benchmarks play a crucial role in evaluating the cognitive capabilities of large language models (LLMs) and other AI systems.

These benchmarks assess various aspects of reasoning, from logical deduction and commonsense understanding to complex problem-solving and mathematical aptitude.

As AI systems continue to advance, the need for comprehensive and challenging benchmarks has grown to accurately measure their progress and limitations.

Key capabilities tested by AI reasoning benchmarks include:

1 week, 1 day назад @ thesequence.substack.com
The Sequence Radar #531: A2A is the New Hot Protocol in Agent Land
The Sequence Radar #531: A2A is the New Hot Protocol in Agent Land The Sequence Radar #531: A2A is the New Hot Protocol in Agent Land

You can subscribe to The Sequence below:📝 Editorial: A2A is the New Hot Protocol in Agent LandIf 2025 is the year of agents, 2026 is going to be the year of multi-agent systems!

Google’s Agent2Agent (A2A) protocol introduces a standardized, open framework purpose-built for seamless inter-agent collaboration across heterogeneous environments.

A2A is engineered to enable secure, efficient, and extensible communication between autonomous agents, irrespective of vendor, architecture, or deployment context.

A2A defines a clear interaction model: one agent (the "client") initiates a task, while another (the "remote") executes it.

Early adopters have praised A2A for streamlining agent discovery, s…

1 week, 3 days назад @ thesequence.substack.com
The Sequence #530: A Tech Deep Dive Into Llama 4
The Sequence #530: A Tech Deep Dive Into Llama 4 The Sequence #530: A Tech Deep Dive Into Llama 4

Created Using GPT-4oThe release of Llama 4 has dominated the AI headlines in recent days.

Despite some questionable performance and criticism, Llama 4 brings some unquestionable technical innovations across different vectors.

The Llama 4 series introduces three distinct models—Scout, Maverick, and Behemoth—designed for a range of use cases, from general-purpose reasoning to long-context and multimodal applications.

This essay explores the technical contributions and innovations of the Llama 4 models, focusing on their architecture, training methodologies, and benchmarks.

Overview of the Llama 4 HerdThe Llama 4 family consists of three models tailored for different computational and applicat…

1 week, 5 days назад @ thesequence.substack.com
The Sequence Opinion #529: Where Foundation Models Are Just Getting Started
The Sequence Opinion #529: Where Foundation Models Are Just Getting Started The Sequence Opinion #529: Where Foundation Models Are Just Getting Started

Foundation models have transformed NLP and vision, but their application to scientific and engineering domains remains limited.

These domains require models trained on proprietary, often experimentally generated datasets and benefit from architectures that go beyond conventional LLM structures.

As general-purpose models saturate the market, the real innovation frontier is shifting toward specialized foundation models that unlock domain-specific reasoning and accelerate discovery.

The Data Challenge and Why It MattersMost of the success in LLMs and vision models stems from massive public datasets: Common Crawl, Wikipedia, ImageNet, LAION-5B.

In contrast, scientific domains deal with data tha…

1 week, 6 days назад @ thesequence.substack.com
The Sequence Engineering #528: Inside Crawl4AI, Extracting Web Data for your AI Apps
The Sequence Engineering #528: Inside Crawl4AI, Extracting Web Data for your AI Apps The Sequence Engineering #528: Inside Crawl4AI, Extracting Web Data for your AI Apps

Created Using GPT-4oIn today’s edition, I finally get to deep dive into one of my favorite frameworks for building AI applications.

More often than not, the challenges in AI apps are more related to data pipelines than to the core AI capabilities.

Traditional web crawling tools, built for static HTML and regex-based extraction, increasingly fall short in an ecosystem dominated by dynamic, JavaScript-driven web applications and the nuanced data demands of large language models (LLMs).

Enter Crawl4AI – an open-source framework that redefines web crawling as a critical, AI-native component in ML workflows.

Crawl4AI departs from the paradigms of Scrapy or BeautifulSoup, treating the web not as …

2 weeks назад @ thesequence.substack.com
The Sequence Knowledge #527: What Types of AI Benchmarks Should You Care About?
The Sequence Knowledge #527: What Types of AI Benchmarks Should You Care About? The Sequence Knowledge #527: What Types of AI Benchmarks Should You Care About?

Created Using GPT-4oToday we will Discuss:Types of AI benchmarks.

The MEGA research by CMU, Microsoft and others about evaluating LLMs across different dimensions.

💡 AI Concept of the Day: A Taxonomy to Understand AI BenchmarksThe benchmarking and evaluation space is evolving quite rapidly and it seems like we get new benchmark every day.

While there is no formal taxonomy to foundation model benchmarking, there are a few categories that I find particularly useful to understand the space.

Task-Centric Benchmarks: Evaluating Functional Capabilities

2 weeks, 1 day назад @ thesequence.substack.com
The Sequence Radar #526: Llama 4 Scout and Maverick are Here!
The Sequence Radar #526: Llama 4 Scout and Maverick are Here! The Sequence Radar #526: Llama 4 Scout and Maverick are Here!

The opinion series explores the trend of all the major AI labs creating the same primitives( research, reasoning, search, etc) and its implications.

You can subscribe to The Sequence below:📝 Editorial: Llama 4 Scout and Maverick are Here!

Llama 4 debuts with two models: Llama 4 Scout and Llama 4 Maverick.

Llama 4 Scout incorporates 16 experts with 17 billion active parameters, uniquely optimized to run on a single H100 GPU.

Early signals suggest it is succeeding, with Behemoth outperforming across a wide range of logic and scientific reasoning evaluations.

2 weeks, 3 days назад @ thesequence.substack.com
The Sequence Research #525: Anthropic's Recent Journey Into the Mind of Claude
The Sequence Research #525: Anthropic's Recent Journey Into the Mind of Claude The Sequence Research #525: Anthropic's Recent Journey Into the Mind of Claude

Created Using GPT-4oInterpretability remains one of the toughest challenges in frontier AI models.

Anthropic is one of the leading labs publishing the frontiers of AI interpretability.

One of the areas in which Antrophic has put a lot of focus in theAnthropic has recently published two landmark studies that represent a pivotal advancement in the mechanistic interpretability of large language models (LLMs).

The papers — Circuit Tracing: Revealing Computational Graphs in Language Models and On the Biology of a Large Language Model — introduce a novel empirical methodology inspired by neuroscience to dissect the computational substrates of Claude 3.5 Haiku.

Circuit Tracing Methodology

2 weeks, 5 days назад @ thesequence.substack.com
Synced Review
последний пост 1 week, 3 days назад
DeepSeek Signals Next-Gen R2 Model, Unveils Novel Approach to Scaling Inference with SPCT
DeepSeek Signals Next-Gen R2 Model, Unveils Novel Approach to Scaling Inference with SPCT DeepSeek Signals Next-Gen R2 Model, Unveils Novel Approach to Scaling Inference with SPCT

DeepSeek AI, a prominent player in the large language model arena, has recently published a research paper detailing a new technique aimed…Continue reading on SyncedReview »

1 week, 3 days назад @ medium.com
Automating Artificial Life Discovery: The Power of Foundation Models
Automating Artificial Life Discovery: The Power of Foundation Models Automating Artificial Life Discovery: The Power of Foundation Models

The recent Nobel Prize for groundbreaking advancements in protein discovery underscores the transformative potential of foundation models…Continue reading on SyncedReview »

3 months, 3 weeks назад @ medium.com
Llama 3 Meets MoE: Pioneering Low-Cost High-Performance AI
Llama 3 Meets MoE: Pioneering Low-Cost High-Performance AI Llama 3 Meets MoE: Pioneering Low-Cost High-Performance AI

Continue reading on SyncedReview »

3 months, 3 weeks назад @ medium.com
DeepMind’s JetFormer: Unified Multimodal Models Without Modelling Constraints
DeepMind’s JetFormer: Unified Multimodal Models Without Modelling Constraints DeepMind’s JetFormer: Unified Multimodal Models Without Modelling Constraints

Recent advancements in training large multimodal models have been driven by efforts to eliminate modeling constraints and unify…Continue reading on SyncedReview »

3 months, 4 weeks назад @ medium.com
NVIDIA’s nGPT: Revolutionizing Transformers with Hypersphere Representation
NVIDIA’s nGPT: Revolutionizing Transformers with Hypersphere Representation NVIDIA’s nGPT: Revolutionizing Transformers with Hypersphere Representation

The Transformer architecture, introduced by Vaswani et al. in 2017, serves as the backbone of contemporary language models. Over the years…Continue reading on SyncedReview »

4 months назад @ medium.com
From Token to Conceptual: Meta Introduces Large Concept Models in Multilingual AI
From Token to Conceptual: Meta Introduces Large Concept Models in Multilingual AI From Token to Conceptual: Meta Introduces Large Concept Models in Multilingual AI

Large Language Models (LLMs) have become indispensable tools for diverse natural language processing (NLP) tasks. Traditional LLMs operate…Continue reading on SyncedReview »

4 months, 1 week назад @ medium.com
NVIDIA’s Hybrid: Combining Attention and State Space Models for Breakthrough Performance of Small…
NVIDIA’s Hybrid: Combining Attention and State Space Models for Breakthrough Performance of Small… NVIDIA’s Hybrid: Combining Attention and State Space Models for Breakthrough Performance of Small…

Language models (LMs) based on transformers have become the gold standard in natural language processing, thanks to their exceptional…Continue reading on SyncedReview »

4 months, 1 week назад @ medium.com
From Response to Query: The Power of Reverse Thinking in Language Models
From Response to Query: The Power of Reverse Thinking in Language Models From Response to Query: The Power of Reverse Thinking in Language Models

Continue reading on SyncedReview »

4 months, 1 week назад @ medium.com
Yann LeCun Team’s New Research: Revolutionizing Visual Navigation with Navigation World Models
Yann LeCun Team’s New Research: Revolutionizing Visual Navigation with Navigation World Models Yann LeCun Team’s New Research: Revolutionizing Visual Navigation with Navigation World Models

Navigation is a fundamental skill for any visually-capable organism, serving as a critical tool for survival. It enables agents to locate…Continue reading on SyncedReview »

4 months, 2 weeks назад @ medium.com
The Future of Vision AI: How Apple’s AIMV2 Leverages Images and Text to Lead the Pack
The Future of Vision AI: How Apple’s AIMV2 Leverages Images and Text to Lead the Pack The Future of Vision AI: How Apple’s AIMV2 Leverages Images and Text to Lead the Pack

The landscape of vision model pre-training has undergone significant evolution, especially with the rise of Large Language Models (LLMs)…Continue reading on SyncedReview »

4 months, 2 weeks назад @ medium.com
Redefining Music AI: The Power of Sony’s SoniDo as a Versatile Foundation Model
Redefining Music AI: The Power of Sony’s SoniDo as a Versatile Foundation Model Redefining Music AI: The Power of Sony’s SoniDo as a Versatile Foundation Model

A foundation model refers to a pre-trained model developed on extensive datasets, designed to be versatile and adaptable for a range of…Continue reading on SyncedReview »

4 months, 2 weeks назад @ medium.com
DeepMind’s Socratic Learning with Language Games: The Path to Self-Improving Superintelligence
DeepMind’s Socratic Learning with Language Games: The Path to Self-Improving Superintelligence DeepMind’s Socratic Learning with Language Games: The Path to Self-Improving Superintelligence

Continue reading on SyncedReview »

4 months, 3 weeks назад @ medium.com
Revolutionizing AI on a Budget: Apple’s Roadmap for Small Language Models Training Success
Revolutionizing AI on a Budget: Apple’s Roadmap for Small Language Models Training Success Revolutionizing AI on a Budget: Apple’s Roadmap for Small Language Models Training Success

While large language models (LLMs) dominate the AI landscape, Small-scale Large Language Models (SLMs) are gaining traction as…Continue reading on SyncedReview »

4 months, 3 weeks назад @ medium.com
Redefines Consistency Models”: OpenAI’s TrigFlow Narrows FID Gap to 10% with Efficient Two-Step…
Redefines Consistency Models”: OpenAI’s TrigFlow Narrows FID Gap to 10% with Efficient Two-Step… Redefines Consistency Models”: OpenAI’s TrigFlow Narrows FID Gap to 10% with Efficient Two-Step…

Consistency models (CMs) are a cutting-edge class of diffusion-based generative models designed for rapid and efficient sampling. However…Continue reading on SyncedReview »

4 months, 4 weeks назад @ medium.com
Precision in Pixels: NVIDIA’s Edify Image Model Combines High Quality with Unmatched Control
Precision in Pixels: NVIDIA’s Edify Image Model Combines High Quality with Unmatched Control Precision in Pixels: NVIDIA’s Edify Image Model Combines High Quality with Unmatched Control

The field of text-to-image synthesis has advanced rapidly, with state-of-the-art models now generating highly realistic and diverse images…Continue reading on SyncedReview »

4 months, 4 weeks назад @ medium.com
📓 Cool Blogs
ODS.ai Habr ODS.ai Habr
последний пост 3 weeks, 5 days назад
Байесовская собака: анализ пёсьего компаса
Байесовская собака: анализ пёсьего компаса Байесовская собака: анализ пёсьего компаса

", подумал я. И, к счастью, у меня как раз под рукой оказался идеальный подопытный.

Стандартное арифметическое среднее между 360° и 0° даст нам 180°, несмотря на то, что и 360°, и 0° указывают в одном направлении.

Нулевая гипотеза утверждает, что данные распределены равномерно по кругу, альтернативная — что это не так.

from pingouin import circ_vtest v, pval = circ_vtest(data['radians'], dir=np.pi) print(f"V-statistics: {v:.3f}; p-value: {pval:.6f}")>> V-statistics: 24.127; p-value: 0.002904Вот мы и подобрались к чему-то интересному!

Априорное распределение и функция правдоподобияПредположим, что у нас есть:Априорное распределение с параметрамиФункция правдоподобия для нового наблюдения с п…

3 weeks, 5 days назад @ habr.com
Создаем воспоминания. Осваиваем FLUX, LoRA и ComfyUI
Создаем воспоминания. Осваиваем FLUX, LoRA и ComfyUI Создаем воспоминания. Осваиваем FLUX, LoRA и ComfyUI

Такие модели можно обучать с нуля и это дорого, нужен кластер с GPU (видеокарты) и много данных.

В домене текст-картинка бывают открытые модели, типа Stable Diffusion, Kandinsky и FLUX, бывают закрытые, типа DALL-E.Открытую модель можно дообучать разными способами.

Борис СтругацкийОсобенности: Для личностей типа Стругацких или Бродского, качественных фотографий крайне мало, но много и не надо.

Можно и фразу.

Владимир СурдинАлексей СемихатовВидео с их лекциями можно найти повсеместно, начать можно с канала Вселенная плюс на YouTube и в телеграм.

3 months, 2 weeks назад @ habr.com
Как нейросети, RL и байесовскую оптимизацию стали использовать на ускорителях заряженных частиц
Как нейросети, RL и байесовскую оптимизацию стали использовать на ускорителях заряженных частиц Как нейросети, RL и байесовскую оптимизацию стали использовать на ускорителях заряженных частиц

Один из них — поддержание стабильной орбиты пучка частиц (траектории, по которой происходит движение), которая критически важна для точности экспериментов.

Вот, кстати, наша статья про то, как сейсмические вибрации будут влиять на орбиту пучка в СКИФ: Beam Stability .

Классические подходы к стабилизации орбиты пучка в ускорителяхДля стабилизации орбиты используют датчики положения пучка (BPM) и магнитные корректоры.

По словам авторов получились следующие преимущества:Ускорение процесса коррекции орбиты и повышение точности по сравнению с классическими методами, такими как SVD.

Задача агента:Автоматическое восстановление орбиты пучка заряженных частиц за ограниченное время и минимальное коли…

4 months назад @ habr.com
о1: почему новая GPT от OpenAI — это не хайп, а переход к новой парадигме в ИИ
о1: почему новая GPT от OpenAI — это не хайп, а переход к новой парадигме в ИИ о1: почему новая GPT от OpenAI — это не хайп, а переход к новой парадигме в ИИ

В этой статье мы разберемся, чему научилась новая GPT o1, и как это повлияет на дальнейшую эволюцию ИИ.

Компания утверждает, что для них сброс счётчика линейки моделей к единичке знаменует собой переход к новой парадигме, и что эта нейросеть и вовсе демонстрирует новый уровень возможностей ИИ.

На форумах и в Твиттере была куча обсуждений, предвосхищений и хайпа, на фоне которых планка ожиданий некоторых людей взлетела до небес.

Издание Bloomberg рассказало, что в ходе внутренней демонстрации OpenAI показали концепцию из пяти уровней, помогающую отслеживать прогресс в создании ИИ.

Однако на уровне GPT-5 прирост в навыках может быть совсем другим (как в лучшую, так и в худшую сторону).

7 months, 1 week назад @ habr.com
Большие и чёрные (ящики): что мы знаем о том, как «думают» нейросети?
Большие и чёрные (ящики): что мы знаем о том, как «думают» нейросети? Большие и чёрные (ящики): что мы знаем о том, как «думают» нейросети?

И в том, и в другом случаях объяснение действия не связано с реальным мотивом его сделать, и там, и там рождается поддельное (но правдоподобно звучащее) объяснение причин.

Просто сейчас это не воспринимается всерьёз, ведь LLM не распространены и не становятся ядром бизнес-процессов, включающих принятие решений.

Один и тот же текст запроса+ответа подаётся в модель, и производится оценка вероятности получить именно такой ответ при фиксированном запросе.

Это и желание продолжать существовать/жить, и нежелание умирать, и рассуждения об эмоциях и контроле.

Потому что абстракции, потому что обобщение, потому что это ровно то, за что мы ценим модели.

7 months, 2 weeks назад @ habr.com
Как организовать процесс А/В тестирования на коленке
Как организовать процесс А/В тестирования на коленке Как организовать процесс А/В тестирования на коленке

В ней авторы выделили 4 этапа зрелости, грубо можно разделить компании по частоте запусков экспериментов:на этапе Crawl компания проводит эксперимент раз в месяц (примерно 10 экспериментов в год);на этапе Walk – раз в неделю (примерно 50 экспериментов в год);на этапе Run – ежедневно (примерно 250 экспериментов в год);на этапе Fly – более 1000 экспериментов в год.

Более четкое разделение основывается на некоторых важных свойствах компаний, таких как: наличие команды платформы экспериментов, возможности автоматического расчета метрик, распространенности экспериментов в компании, самодостаточность продуктовых команд в проведении экспериментов, влияние экспериментов на компанию и другое.

Точка …

8 months, 1 week назад @ habr.com
Как организовать процесс А/В тестирования на коленке
Как организовать процесс А/В тестирования на коленке Как организовать процесс А/В тестирования на коленке

В ней авторы выделили 4 этапа зрелости, грубо можно разделить компании по частоте запусков экспериментов:на этапе Crawl компания проводит эксперимент раз в месяц (примерно 10 экспериментов в год);на этапе Walk – раз в неделю (примерно 50 экспериментов в год);на этапе Run – ежедневно (примерно 250 экспериментов в год);на этапе Fly – более 1000 экспериментов в год.

Более четкое разделение основывается на некоторых важных свойствах компаний, таких как: наличие команды платформы экспериментов, возможности автоматического расчета метрик, распространенности экспериментов в компании, самодостаточность продуктовых команд в проведении экспериментов, влияние экспериментов на компанию и другое.

Точка …

8 months, 1 week назад @ habr.com
Введение в MLflow
Введение в MLflow Введение в MLflow

Mlflow Experiments and Mlflow RunsMLflow Experiments и MLflow Runs - это основные абстракции для структурирования проекта.

mlflow run mlproject --entry-point hyperparameters-tuning --env-manager conda --experiment-name Cancer_Classification --run-name Hyperparameters_Search -P n-trials=10Посмотрим на результаты в MLflow UI.

artifact_path: model flavors: python_function: data: model.xgb env: conda: conda.yaml virtualenv: python_env.yaml loader_module: mlflow.xgboost python_version: 3.11.4 xgboost: code: null data: model.xgb model_class: xgboost.sklearn.XGBClassifier model_format: xgb xgb_version: 2.0.3 mlflow_version: 2.14.2 model_size_bytes: 35040 model_uuid: 516954aae7c94e91adeed9df76cb405…

8 months, 3 weeks назад @ habr.com
В 48 собесах от оффера в Гугл
В 48 собесах от оффера в Гугл В 48 собесах от оффера в Гугл

Как это определить и как предсказывать?

- Да... упс.. правда, они же уже определяют целевой признакВовремя не выкрутился, пришел фидбек, что я не понимаю разницу между обычными признаками (а.к.а.

Не, не делал, только DDP?

NVIDIA ищет единорогов, крутых и в рисече, и в инженерии.

Вакансия в Лондоне была, и в итоге они взяли моего бывшего коллегу: он уже в Лондоне и с чуть большим опытом в рекомендашках, явно лучше подходит.

9 months, 1 week назад @ habr.com
Machine Learning Mastery
последний пост 9 months, 1 week назад
Tips for Effectively Training Your Machine Learning Models
Tips for Effectively Training Your Machine Learning Models

In machine learning projects, achieving optimal model performance requires paying attention to various steps in the training process. But before focusing on the technical aspects of model training, it is important to define the problem, understand the context, and analyze the dataset in detail. Once you have a solid grasp of the problem and data, […]

The post Tips for Effectively Training Your Machine Learning Models appeared first on MachineLearningMastery.com.

9 months, 1 week назад @ machinelearningmastery.com
Principles of Reinforcement Learning: An Introduction with Python
Principles of Reinforcement Learning: An Introduction with Python

Reinforcement Learning (RL) is a type of machine learning. It trains an agent to make decisions by interacting with an environment. This article covers the basic concepts of RL. These include states, actions, rewards, policies, and the Markov Decision Process (MDP). By the end, you will understand how RL works. You will also learn how […]

The post Principles of Reinforcement Learning: An Introduction with Python appeared first on MachineLearningMastery.com.

9 months, 2 weeks назад @ machinelearningmastery.com
5 Tips for Getting Started with Deep Learning
5 Tips for Getting Started with Deep Learning

Deep learning is a subset of machine learning that has become a cornerstone in many technological breakthroughs. At the core of deep learning, it’s a model inspired by the human brain, which we call a neural network. Contrary to the traditional machine learning model, deep learning can automatically find feature representations from data. That’s why […]

The post 5 Tips for Getting Started with Deep Learning appeared first on MachineLearningMastery.com.

9 months, 2 weeks назад @ machinelearningmastery.com
Tips for Effective Feature Engineering in Machine Learning
Tips for Effective Feature Engineering in Machine Learning

Feature engineering is an important step in the machine learning pipeline. It is the process of transforming data in its native format into meaningful features to help the machine learning model learn better from the data. If done right, feature engineering can significantly enhance the performance of machine learning algorithms. Beyond the basics of understanding […]

The post Tips for Effective Feature Engineering in Machine Learning appeared first on MachineLearningMastery.com.

9 months, 3 weeks назад @ machinelearningmastery.com
5 Common Mistakes in Machine Learning and How to Avoid Them
5 Common Mistakes in Machine Learning and How to Avoid Them

Using machine learning to solve real-world problems is exciting. But most eager beginners jump straight to model building—overlooking the fundamentals—resulting in models that aren’t very helpful. From understanding the data to choosing the best machine learning model for the problem, there are some common mistakes that beginners often tend to make. But before we go […]

The post 5 Common Mistakes in Machine Learning and How to Avoid Them appeared first on MachineLearningMastery.com.

9 months, 3 weeks назад @ machinelearningmastery.com
Stable Diffusion Project: Reviving Old Photos
Stable Diffusion Project: Reviving Old Photos

Photography has been around for more than a century. There are many old photos around, and probably your family has some, too. Limited by the camera and film of the time, you may have photos of low resolution, blurry, or with folds or scratches. Restoring these old photos and making them like new ones taken […]

The post Stable Diffusion Project: Reviving Old Photos appeared first on MachineLearningMastery.com.

9 months, 3 weeks назад @ machinelearningmastery.com
The Ultimate Beginner’s Guide to Docker
The Ultimate Beginner’s Guide to Docker

Today’s digital landscape has never been so diverse. Every individual and company selects their preferred tools and operating systems, creating a diverse technological system. However, this diversity often leads to compatibility issues, making it hard to ensure application performance across different environments. This is where Docker plays a key role as an indispensable tool for […]

The post The Ultimate Beginner’s Guide to Docker appeared first on MachineLearningMastery.com.

9 months, 4 weeks назад @ machinelearningmastery.com
ML in Production
последний пост None
Sorta Insightful Sorta Insightful
последний пост 3 weeks, 1 day назад
Who is AI For?
Who is AI For? Who is AI For?

I think the easy answer to this question is that right now, AI is for the AI developers.

Code is useful, it makes money, it is a testbed for AI speeding up the development of AI, and it is easy.

I’m working in AI because it pays well and is potentially really good for the world.

The artists did not know what AI was, but when they learned, they quickly decided they did not want it.

It feels like the most likely outcome is that people go all-in on pushing raw intelligence, in the way that AI developers can measure it, leaving behind those that are not like AI developers.

3 weeks, 1 day назад @ alexirpan.com
MIT Mystery Hunt 2025
MIT Mystery Hunt 2025 MIT Mystery Hunt 2025

This has spoilers for MIT Mystery Hunt 2025.

I enjoyed it more than their 2018 Hunt, which is commonly cited as an all-time good Mystery Hunt.

In this Mystery Hunt it was reversed, where the act of unlocking is easy but the value and difficulty of a feeder varied.

In my free time pre-Hunt, I went to Puzzled Pint, where I tried to all-brain a logic puzzle (solve it without writing anything).

I’m looking forward to solving “No Assembly Required” in Mystery Hunt 2026, a puzzle that gives you the answer for no work.

2 months, 3 weeks назад @ alexirpan.com
Using AI to Get the Neopets Destruct-o-Match Avatar
Using AI to Get the Neopets Destruct-o-Match Avatar Using AI to Get the Neopets Destruct-o-Match Avatar

If AI can be superhuman at Go, surely AI can be slightly-worse-than-experts at Destruct-o-Match if we try?

Step 0: Is Making a Destruct-o-Match AI Against Neopets Rules?

I believe the precedent is in favor of a Destruct-o-Match AI being okay.

As long as I’m the one inputting moves the Destruct-o-Match AI recommends, I should be okay.

To write a game AI, we first need to implement the rules of the game in code.

3 months, 2 weeks назад @ alexirpan.com
Late Takes on OpenAI o1
Late Takes on OpenAI o1 Late Takes on OpenAI o1

I realize how late this is, but I didn’t get a post out while o1 was fresh, and still feel like writing one despite it being cold.

(Also, OpenAI just announced they’re going to ship new stuff starting tomorrow so it’s now or never to say something.)

OpenAI o1 is a model release widely believed (but not confirmed) to be a post-trained version of GPT-4o.

If true, that makes this video especially useful for understanding OpenAI o1.

Which I suppose is part of why I’m talking about o1 rather than building o1.

4 months, 2 weeks назад @ alexirpan.com
Nine Years Later
Nine Years Later Nine Years Later

I expected to fill that void with more blog writing, but that’s not what happened.

The puzzles are great though, and if that’s good enough for you, I had fun with that.

Undertale YellowUndertale Yellow is a fantastic fan game, that’s been in development for 7 years and comes out feeling like a canon entry made by Toby Fox.

markdown 15837 2024 - 01 - 11 - ai - timelines - 2024. markdown 1939 2024 - 01 - 21 - mh - 2024. markdown 5076 2024 - 03 - 23 - crew - battle .

markdown 826 2024 - 04 - 30 - puzzlehunting - 201. markdown 8641 2024 - 07 - 08 - tragedies - of - reality .

8 months, 1 week назад @ alexirpan.com
I'm Switching Into AI Safety
I'm Switching Into AI Safety I'm Switching Into AI Safety

There’s often a conflation between the research field of AI safety and the community of AI safety.

Me thinking AI safety is important is not an endorsement for or against anything else in the broader meme space it came from.

Historically, AI safety work did not appeal to me because of how theoretical it was.

I’m aware of the arguments that most AI safety work so far has either been useless or not that different from broader AI work.

Those who care about safety a lot call this safetywashing, the stapling of “safety” to work that does not advance safety.

8 months, 2 weeks назад @ alexirpan.com
The Tragedies of Reality Are Coming for You
The Tragedies of Reality Are Coming for You The Tragedies of Reality Are Coming for You

I would extend it to reality is complicated, relative to code, and in robotics you’re often pushing a messy reality into an abstraction nice enough for code to act on it.

Robotics research relies on building new bridges between reality and software, but that happens outside of robotics too.

Any software that interfaces with reality will have imperfect knowledge of that reality.

However, that means all the messiness of reality is coming for a field that historically does a bad job at considering reality.

I consider the world of bits to be as much a part of reality as the world of atoms.

9 months, 2 weeks назад @ alexirpan.com
Lil'Log
последний пост None
inFERENCe
последний пост None
The Spectator
последний пост None
Off the Convex Path
последний пост None
fast.ai NLP fast.ai NLP
последний пост None
Sebastian Ruder
последний пост None
Andrew Karpathy blog
последний пост None
大トロ 大トロ
последний пост None
🔬 Science
Papers With Code Papers With Code
последний пост 1 day, 16 hours назад
/cvlab-kaist/ Seurat: From Moving Points to Depth
/cvlab-kaist/ Seurat: From Moving Points to Depth /cvlab-kaist/ Seurat: From Moving Points to Depth

Accurate depth estimation from monocular videos remains challenging due to ambiguities inherent in single-view geometry, as crucial depth cues like stereopsis are absent.

However, humans often perceive relative depth intuitively by observing variations in the size and spacing of objects as they move.

Inspired by this, we propose a novel method that infers relative depth by examining the spatial relationships and temporal evolution of a set of tracked 2D trajectories.

Specifically, we use off-the-shelf point tracking models to capture 2D trajectories.

Then, our approach employs spatial and temporal transformers to process these trajectories and directly infer depth changes over time.

1 day, 16 hours назад @ paperswithcode.com
/manthan2305/ Efficient Document Retrieval with G-Retriever
/manthan2305/ Efficient Document Retrieval with G-Retriever /manthan2305/ Efficient Document Retrieval with G-Retriever

Recently, a novel approach leveraging the Retrieval-Augmented Generation (RAG) method was introduced, utilizing the Prize-Collecting Steiner Tree (PCST) optimization for sub-graph construction.

However, this method focused solely on node attributes, leading to incomplete contextual understanding.

In this paper, we propose an enhanced approach that replaces the PCST method with an attention-based sub-graph construction technique, enabling more efficient and context-aware retrieval.

Additionally, we encode both node and edge attributes, leading to richer graph representations.

Our method also incorporates an improved projection layer and multi-head attention pooling for better alignment with …

1 day, 20 hours назад @ paperswithcode.com
/lawrencerliu/ NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models
/lawrencerliu/ NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models /lawrencerliu/ NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Large language models (LLMs) exhibit remarkable performance across various natural language processing tasks but suffer from immense computational and memory demands, limiting their deployment in resource-constrained environments.

To address this challenge, we propose NoWag: (Normalized Weight and Activation Guided Compression), a unified framework for zero-shot shape preserving compression algorithms.

We compressed Llama-2 7B/13B/70B and Llama-3 8/70BB models, using two popular forms of shape-preserving compression, vector quantization NoWag-VQ (NoWag for Vector Quantization), and unstructured/semi-structured pruning NoWag-P (NoWag for Pruning).

We found that NoWag-VQ significantly outperf…

1 day, 21 hours назад @ paperswithcode.com
/yangzhenkui/ Talk is Not Always Cheap: Promoting Wireless Sensing Models with Text Prompts
/yangzhenkui/ Talk is Not Always Cheap: Promoting Wireless Sensing Models with Text Prompts /yangzhenkui/ Talk is Not Always Cheap: Promoting Wireless Sensing Models with Text Prompts

Wireless signal-based human sensing technologies, such as WiFi, millimeter-wave (mmWave) radar, and Radio Frequency Identification (RFID), enable the detection and interpretation of human presence, posture, and activities, thereby providing critical support for applications in public security, healthcare, and smart environments.

These technologies exhibit notable advantages due to their non-contact operation and environmental adaptability; however, existing systems often fail to leverage the textual information inherent in datasets.

To address this, we propose an innovative text-enhanced wireless sensing framework, WiTalk, that seamlessly integrates semantic knowledge through three hierarch…

1 day, 22 hours назад @ paperswithcode.com
/microsoft/ UFO2: The Desktop AgentOS
/microsoft/ UFO2: The Desktop AgentOS /microsoft/ UFO2: The Desktop AgentOS

Recent Computer-Using Agents (CUAs), powered by multimodal large language models (LLMs), offer a promising direction for automating complex desktop workflows through natural language.

We present UFO2, a multiagent AgentOS for Windows desktops that elevates CUAs into practical, system-level automation.

Finally, a Picture-in-Picture (PiP) interface enables automation within an isolated virtual desktop, allowing agents and users to operate concurrently without interference.

We evaluate UFO2 across over 20 real-world Windows applications, demonstrating substantial improvements in robustness and execution accuracy over prior CUAs.

Our results show that deep OS integration unlocks a scalable path…

1 day, 22 hours назад @ paperswithcode.com
/zbw001/ TAPIP3D: Tracking Any Point in Persistent 3D Geometry
/zbw001/ TAPIP3D: Tracking Any Point in Persistent 3D Geometry /zbw001/ TAPIP3D: Tracking Any Point in Persistent 3D Geometry

We introduce TAPIP3D, a novel approach for long-term 3D point tracking in monocular RGB and RGB-D videos.

TAPIP3D iteratively refines multi-frame 3D motion estimates within this stabilized representation, enabling robust tracking over extended periods.

To manage the inherent irregularities of 3D point distributions, we propose a Local Pair Attention mechanism.

Our 3D-centric approach significantly outperforms existing 3D point tracking methods and even enhances 2D tracking accuracy compared to conventional 2D pixel trackers when accurate depth is available.

Our approach replaces the conventional 2D square correlation neighborhoods used in prior 2D and 3D trackers, leading to more robust and…

1 day, 22 hours назад @ paperswithcode.com
/zhengchen1999/ NTIRE 2025 Challenge on Real-World Face Restoration: Methods and Results
/zhengchen1999/ NTIRE 2025 Challenge on Real-World Face Restoration: Methods and Results /zhengchen1999/ NTIRE 2025 Challenge on Real-World Face Restoration: Methods and Results

This paper provides a review of the NTIRE 2025 challenge on real-world face restoration, highlighting the proposed solutions and the resulting outcomes.

Its goal is to advance state-of-the-art solutions for perceptual quality and realism, without imposing constraints on computational resources or training data.

The track of the challenge evaluates performance using a weighted image quality assessment (IQA) score and employs the AdaFace model as an identity checker.

The competition attracted 141 registrants, with 13 teams submitting valid models, and ultimately, 10 teams achieved a valid score in the final ranking.

This collaborative effort advances the performance of real-world face restora…

1 day, 23 hours назад @ paperswithcode.com
/zhengchen1999/ NTIRE 2025 Challenge on Image Super-Resolution ($\times$4): Methods and Results
/zhengchen1999/ NTIRE 2025 Challenge on Image Super-Resolution ($\times$4): Methods and Results /zhengchen1999/ NTIRE 2025 Challenge on Image Super-Resolution ($\times$4): Methods and Results

This paper presents the NTIRE 2025 image super-resolution ($\times$4) challenge, one of the associated competitions of the 10th NTIRE Workshop at CVPR 2025.

The challenge aims to recover high-resolution (HR) images from low-resolution (LR) counterparts generated through bicubic downsampling with a $\times$4 scaling factor.

To reflect the dual objectives of image SR research, the challenge includes two sub-tracks: (1) a restoration track, emphasizes pixel-wise accuracy and ranks submissions based on PSNR; (2) a perceptual track, focuses on visual realism and ranks results by a perceptual score.

This report summarizes the challenge design, datasets, evaluation protocol, the main results, and …

1 day, 23 hours назад @ paperswithcode.com
/spengliang/ SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization
/spengliang/ SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization /spengliang/ SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization

Previous text-to-image diffusion models typically employ supervised fine-tuning (SFT) to enhance pre-trained base models.

In this paper, we introduce Self-sUpervised Direct preference Optimization (SUDO), a novel paradigm that optimizes both fine-grained details at the pixel level and global image quality.

By integrating direct preference optimization into the model, SUDO generates preference image pairs in a self-supervised manner, enabling the model to prioritize global-level learning while complementing the pixel-level MSE loss.

Importantly, it eliminates the need for costly data collection and annotation efforts typically associated with traditional direct preference optimization method…

1 day, 23 hours назад @ paperswithcode.com
/uirlx/ DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue
/uirlx/ DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue /uirlx/ DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue

Speech synthesis is crucial for human-computer interaction, enabling natural and intuitive communication.

To address these issues, we propose DialogueAgents, a novel hybrid agent-based speech synthesis framework, which integrates three specialized agents -- a script writer, a speech synthesizer, and a dialogue critic -- to collaboratively generate dialogues.

Grounded in a diverse character pool, the framework iteratively refines dialogue scripts and synthesizes speech based on speech review, boosting emotional expressiveness and paralinguistic features of the synthesized dialogues.

Using DialogueAgent, we contribute MultiTalk, a bilingual, multi-party, multi-turn speech dialogue dataset cov…

1 day, 23 hours назад @ paperswithcode.com
/tong-zeng/ Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding
/tong-zeng/ Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding /tong-zeng/ Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding

Vision Large Language Models (VLLMs) have demonstrated impressive capabilities in general visual tasks such as image captioning and visual question answering.

However, their effectiveness in specialized, safety-critical domains like autonomous driving remains largely unexplored.

Autonomous driving systems require sophisticated scene understanding in complex environments, yet existing multimodal benchmarks primarily focus on normal driving conditions, failing to adequately assess VLLMs' performance in safety-critical scenarios.

To address this, we introduce DVBench, a pioneering benchmark designed to evaluate the performance of VLLMs in understanding safety-critical driving videos.

DVBench e…

1 day, 23 hours назад @ paperswithcode.com
/starlight1212/ AlphaZero-Edu: Making AlphaZero Accessible to Everyone
/starlight1212/ AlphaZero-Edu: Making AlphaZero Accessible to Everyone /starlight1212/ AlphaZero-Edu: Making AlphaZero Accessible to Everyone

Nevertheless, existing frameworks are often plagued by high implementation complexity and poor reproducibility.

To tackle these challenges, we present AlphaZero-Edu, a lightweight, education-focused implementation built upon the mathematical framework of AlphaZero.

It boasts a modular architecture that disentangles key components, enabling transparent visualization of the algorithmic processes.

In Gomoku matches, the framework has demonstrated exceptional performance, achieving a consistently high win rate against human opponents.

AlphaZero-Edu has been open-sourced at https://github.com/StarLight1212/AlphaZero_Edu, providing an accessible and practical benchmark for both academic research …

1 day, 23 hours назад @ paperswithcode.com
/baiklab/ Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning
/baiklab/ Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning /baiklab/ Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning

Large-scale deep learning models with a pretraining-finetuning paradigm have led to a surge of numerous task-specific models fine-tuned from a common pre-trained model.

Such merging methodology faces a central challenge: interference between model parameters fine-tuned on different tasks.

To improve the performance of a merged model, we note that a fine-tuning scheme should aim for (1) smaller parameter interference and (2) better performance of each fine-tuned model on the corresponding task.

In this work, we aim to design a new fine-tuning objective function to work towards these two goals.

In the course of this process, we find such objective function to be strikingly similar to sharpnes…

1 day, 23 hours назад @ paperswithcode.com
/hkust-aerial-robotics/ SG-Reg: Generalizable and Efficient Scene Graph Registration
/hkust-aerial-robotics/ SG-Reg: Generalizable and Efficient Scene Graph Registration /hkust-aerial-robotics/ SG-Reg: Generalizable and Efficient Scene Graph Registration

This paper addresses the challenges of registering two rigid semantic scene graphs, an essential capability when an autonomous agent needs to register its map against a remote agent, or against a prior map.

The hand-crafted descriptors in classical semantic-aided registration, or the ground-truth annotation reliance in learning-based scene graph registration, impede their application in practical real-world environments.

To address the challenges, we design a scene graph network to encode multiple modalities of semantic nodes: open-set semantic feature, local topology with spatial awareness, and shape feature.

Moreover, we design a new data generation approach using vision foundation models…

1 day, 23 hours назад @ paperswithcode.com
/xyhanhit/ Shape-Guided Clothing Warping for Virtual Try-On
/xyhanhit/ Shape-Guided Clothing Warping for Virtual Try-On /xyhanhit/ Shape-Guided Clothing Warping for Virtual Try-On

Image-based virtual try-on aims to seamlessly fit in-shop clothing to a person image while maintaining pose consistency.

To tackle these challenges, we propose a novel shape-guided clothing warping method for virtual try-on, dubbed SCW-VTON, which incorporates global shape constraints and additional limb textures to enhance the realism and consistency of the warped clothing and try-on results.

To integrate global shape constraints for clothing warping, we devise a dual-path clothing warping module comprising a shape path and a flow path.

Furthermore, to alleviate distortions in limb regions of try-on results, we integrate detailed limb guidance by developing a limb reconstruction network ba…

1 day, 23 hours назад @ paperswithcode.com
Papers With Code Papers With Code
последний пост 1 day, 16 hours назад
/zjunlp/ EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models
/zjunlp/ EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models /zjunlp/ EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models

In this paper, we introduce EasyEdit2, a framework designed to enable plug-and-play adjustability for controlling Large Language Model (LLM) behaviors.

EasyEdit2 supports a wide range of test-time interventions, including safety, sentiment, personality, reasoning patterns, factuality, and language features.

Unlike its predecessor, EasyEdit2 features a new architecture specifically designed for seamless model steering.

It comprises key modules such as the steering vector generator and the steering vector applier, which enable automatic generation and application of steering vectors to influence the model's behavior without modifying its parameters.

Empirically, we report model steering perfo…

1 day, 23 hours назад @ paperswithcode.com
/cjreinforce/ Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
/cjreinforce/ Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning /cjreinforce/ Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning

Process reward models (PRMs) have proven effective for test-time scaling of Large Language Models (LLMs) on challenging reasoning tasks.

However, reward hacking issues with PRMs limit their successful application in reinforcement fine-tuning.

The key innovation of PURE is a min-form credit assignment that formulates the value function as the minimum of future rewards.

This method significantly alleviates reward hacking by limiting the value function range and distributing advantages more reasonably.

In contrast, the canonical sum-form credit assignment collapses training even at the beginning!

1 day, 23 hours назад @ paperswithcode.com
/neuspeech/ NeuGaze: Reshaping the future BCI
/neuspeech/ NeuGaze: Reshaping the future BCI /neuspeech/ NeuGaze: Reshaping the future BCI

Traditional brain-computer interfaces (BCIs), reliant on costly electroencephalography or invasive implants, struggle with complex human-computer interactions due to setup complexity and limited precision.

We present NeuGaze, a novel webcam-based system that leverages eye gaze, head movements, and facial expressions to enable intuitive, real-time control using only a standard 30 Hz webcam, often pre-installed in laptops.

Requiring minimal calibration, NeuGaze achieves performance comparable to conventional inputs, supporting precise cursor navigation, key triggering via an efficient skill wheel, and dynamic gaming interactions, such as defeating formidable opponents in first-person games.

B…

1 day, 23 hours назад @ paperswithcode.com
/zhu-qianyu/ PIV-FlowDiffuser:Transfer-learning-based denoising diffusion models for PIV
/zhu-qianyu/ PIV-FlowDiffuser:Transfer-learning-based denoising diffusion models for PIV /zhu-qianyu/ PIV-FlowDiffuser:Transfer-learning-based denoising diffusion models for PIV

To reduce the special noise step-by-step, we employ a denoising diffusion model~(FlowDiffuser) for PIV analysis.

And the data-hungry iterative denoising diffusion model is trained via a transfer learning strategy, resulting in our PIV-FlowDiffuser method.

Note that the PIV images are upsampled by a factor of two to resolve the small-scale turbulent flow structures.

Therefore, the denoising diffusion model reduces the average end-point error~($AEE$) by 59.4% over RAFT256-PIV baseline on the classic Cai's dataset.

Overall, this study highlights the transfer-learning-based denoising diffusion models for PIV.

1 day, 23 hours назад @ paperswithcode.com
/cdjkim/ ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
/cdjkim/ ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams /cdjkim/ ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams

The rapid growth of video-text data presents challenges in storage and computation during training.

Online learning, which processes streaming data in real-time, offers a promising solution to these issues while also allowing swift adaptations in scenarios demanding real-time responsiveness.

One strategy to enhance the efficiency and effectiveness of learning involves identifying and prioritizing data that enhances performance on target downstream tasks.

We propose Relevance and Specificity-based online filtering framework (ReSpec) that selects data based on four criteria: (i) modality alignment for clean data, (ii) task relevance for target focused data, (iii) specificity for informative a…

1 day, 23 hours назад @ paperswithcode.com
/seu-vipgroup/ Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
/seu-vipgroup/ Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation /seu-vipgroup/ Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation

Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated remarkable multimodal perception capabilities, garnering significant attention.

While numerous evaluation studies have emerged, assessing LVLMs both holistically and on specialized tasks, fine-grained image tasks-fundamental to computer vision-remain largely unexplored.

To fill this gap, we introduce a comprehensive fine-grained evaluation benchmark, i.e., FG-BMK, comprising 3.49 million questions and 3.32 million images.

Our evaluation systematically examines LVLMs from both human-oriented and machine-oriented perspectives, focusing on their semantic recognition and fine-grained feature representation capabilitie…

1 day, 23 hours назад @ paperswithcode.com
/chengxihan/ HSANET: A Hybrid Self-Cross Attention Network For Remote Sensing Change Detection
/chengxihan/ HSANET: A Hybrid Self-Cross Attention Network For Remote Sensing Change Detection /chengxihan/ HSANET: A Hybrid Self-Cross Attention Network For Remote Sensing Change Detection

The remote sensing image change detection task is an essential method for large-scale monitoring.

We propose HSANet, a network that uses hierarchical convolution to extract multi-scale features.

It incorporates hybrid self-attention and cross-attention mechanisms to learn and fuse global and cross-scale information.

This enables HSANet to capture global context at different scales and integrate cross-scale features, refining edge details and improving detection performance.

We will also open-source our model code: https://github.com/ChengxiHAN/HSANet.

1 day, 23 hours назад @ paperswithcode.com
/chongqingnosubway/ How Effective Can Dropout Be in Multiple Instance Learning ?
/chongqingnosubway/ How Effective Can Dropout Be in Multiple Instance Learning ? /chongqingnosubway/ How Effective Can Dropout Be in Multiple Instance Learning ?

Multiple Instance Learning (MIL) is a popular weakly-supervised method for various applications, with a particular interest in histological whole slide image (WSI) classification.

However, it is well-known that this suboptimal training scheme suffers from "noisy" feature embeddings from the backbone and inherent weak supervision, hindering MIL from learning rich and generalizable features.

However, the most commonly used technique (i.e., dropout) for mitigating this issue has yet to be explored in MIL.

In this paper, we empirically explore how effective the dropout can be in MIL.

Based on this key observation, we propose a novel MIL-specific dropout method, termed MIL-Dropout, which systema…

1 day, 23 hours назад @ paperswithcode.com
/pfnet-research/ Segmentation with Noisy Labels via Spatially Correlated Distributions
/pfnet-research/ Segmentation with Noisy Labels via Spatially Correlated Distributions /pfnet-research/ Segmentation with Noisy Labels via Spatially Correlated Distributions

These label errors are not independently distributed, and instead usually appear in spatially connected regions where adjacent pixels are more likely to share the same errors.

Bayesian inference requires computing the posterior distribution of label errors, which becomes intractable when spatial correlations are present.

We represent the correlation of label errors between adjacent pixels through a Gaussian distribution whose covariance is structured by a Kac-Murdock-Szeg\"{o} (KMS) matrix, solving the computational challenges.

Through experiments on multiple segmentation tasks, we confirm that leveraging the spatial correlation of label errors significantly improves performance.

Notably, i…

1 day, 23 hours назад @ paperswithcode.com
/8zym/ CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs
/8zym/ CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs /8zym/ CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs

The rapid spread of misinformation, driven by digital media and AI-generated content, has made automatic claim verification essential.

To address this, we propose CRAVE, a Conflicting Reasoning Approach for explainable claim VErification, that verify the complex claims based on the conflicting rationales reasoned by large language models (LLMs).

Ambiguity Elimination enchanced Evidence Retrieval module performs ambiguity elimination and entity-based search to gather relevant evidence related to claim verification from external sources like Wikipedia.

Conflicting Perspective Reasoning and Preliminary Judgment module with LLMs adopts LLMs to reason rationales with conflicting stances about cl…

1 day, 23 hours назад @ paperswithcode.com
/ewrfcas/ Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation
/ewrfcas/ Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation /ewrfcas/ Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation

Camera and human motion controls have been extensively studied for video generation, but existing approaches typically address them separately, suffering from limited data with high-quality annotations for both aspects.

To overcome this, we present Uni3C, a unified 3D-enhanced framework for precise control of both camera and human motion in video generation.

This flexibility enables different modules of Uni3C to be trained in specific domains, i.e., either camera control or human motion control, reducing the dependency on jointly annotated data.

Extensive experiments confirm that PCDController enjoys strong robustness in driving camera motion for fine-tuned backbones of video generation.

Un…

1 day, 23 hours назад @ paperswithcode.com
/knoveleng/ RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search
/knoveleng/ RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search /knoveleng/ RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search

Large Language Models (LLMs) exhibit remarkable capabilities but are susceptible to adversarial prompts that exploit vulnerabilities to produce unsafe or biased outputs.

Existing red-teaming methods often face scalability challenges, resource-intensive requirements, or limited diversity in attack strategies.

We propose RainbowPlus, a novel red-teaming framework rooted in evolutionary computation, enhancing adversarial prompt generation through an adaptive quality-diversity (QD) search that extends classical evolutionary algorithms like MAP-Elites with innovations tailored for language models.

By employing a multi-element archive to store diverse high-quality prompts and a comprehensive fitn…

1 day, 23 hours назад @ paperswithcode.com
/neuir/ Enhancing the Patent Matching Capability of Large Language Models via the Memory Graph
/neuir/ Enhancing the Patent Matching Capability of Large Language Models via the Memory Graph /neuir/ Enhancing the Patent Matching Capability of Large Language Models via the Memory Graph

Intellectual Property (IP) management involves strategically protecting and utilizing intellectual assets to enhance organizational innovation, competitiveness, and value creation.

Patent matching is a crucial task in intellectual property management, which facilitates the organization and utilization of patents.

Existing models often rely on the emergent capabilities of Large Language Models (LLMs) and leverage them to identify related patents directly.

In this paper, we propose MemGraph, a method that augments the patent matching capabilities of LLMs by incorporating a memory graph derived from their parametric memory.

After traversing the memory graph, we utilize extracted entities and o…

1 day, 23 hours назад @ paperswithcode.com
/anirudhkhatry/ CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation
/anirudhkhatry/ CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation /anirudhkhatry/ CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation

C-to-Rust transpilation is essential for modernizing legacy C code while enhancing safety and interoperability with modern Rust ecosystems.

However, no dataset currently exists for evaluating whether a system can transpile C into safe Rust that passes a set of test cases.

We introduce CRUST-Bench, a dataset of 100 C repositories, each paired with manually-written interfaces in safe Rust as well as test cases that can be used to validate correctness of the transpilation.

The provided Rust interfaces provide explicit specifications that ensure adherence to idiomatic, memory-safe Rust patterns, while the accompanying test cases enforce functional correctness.

We also provide insights into the …

1 day, 23 hours назад @ paperswithcode.com
/chenwu98/ Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
/chenwu98/ Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction /chenwu98/ Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction

We design a suite of minimal algorithmic tasks that are a loose abstraction of open-ended real-world tasks.

This allows us to cleanly and controllably quantify the creative limits of the present-day language model.

In these tasks, we empirically and conceptually argue how next-token learning is myopic and memorizes excessively; comparatively, multi-token approaches, namely teacherless training and diffusion models, excel in producing diverse and original output.

Thus, our work offers a principled, minimal test-bed for analyzing open-ended creative skills, and offers new arguments for going beyond next-token learning and softmax-based sampling.

We make part of the code available under https:…

1 day, 23 hours назад @ paperswithcode.com
Papers With Code Papers With Code
последний пост 1 day, 16 hours назад
/salesforceairesearch/ Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
/salesforceairesearch/ Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators /salesforceairesearch/ Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators

Scaling test-time computation, or affording a generator large language model (LLM) extra compute during inference, typically employs the help of external non-generative evaluators (i.e., reward models).

Despite judge empirical successes, their effectiveness as evaluators in test-time scaling settings is largely unknown.

In this paper, we introduce the Judge Evaluation for Test-Time Scaling (JETTS) benchmark, which evaluates judge performance in three domains (math reasoning, code generation, and instruction following) under three task settings: response reranking, step-level beam search, and critique-based response refinement.

We evaluate 10 different judge models (7B-70B parameters) for 8 …

1 day, 23 hours назад @ paperswithcode.com
/sail-sg/ FlowReasoner: Reinforcing Query-Level Meta-Agents
/sail-sg/ FlowReasoner: Reinforcing Query-Level Meta-Agents /sail-sg/ FlowReasoner: Reinforcing Query-Level Meta-Agents

This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query.

Concretely, by distilling DeepSeek R1, we first endow the basic reasoning ability regarding the generation of multi-agent systems to FlowReasoner.

A multi-purpose reward is designed to guide the RL training from aspects of performance, complexity, and efficiency.

In this manner, FlowReasoner is enabled to generate a personalized multi-agent system for each user query via deliberative reasoning.

Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner.

1 day, 23 hours назад @ paperswithcode.com
/sarahmish/ M$^2$AD: Multi-Sensor Multi-System Anomaly Detection through Global Scoring and Calibrated Thresholding
/sarahmish/ M$^2$AD: Multi-Sensor Multi-System Anomaly Detection through Global Scoring and Calibrated Thresholding /sarahmish/ M$^2$AD: Multi-Sensor Multi-System Anomaly Detection through Global Scoring and Calibrated Thresholding

With the widespread availability of sensor data across industrial and operational systems, we frequently encounter heterogeneous time series from multiple systems.

Anomaly detection is crucial for such systems to facilitate predictive maintenance.

However, most existing anomaly detection methods are designed for either univariate or single-system multivariate data, making them insufficient for these complex scenarios.

To address this, we introduce M$^2$AD, a framework for unsupervised anomaly detection in multivariate time series data from multiple systems.

These residuals are then aggregated into a global anomaly score through a Gaussian Mixture Model and Gamma calibration.

1 day, 23 hours назад @ paperswithcode.com
/juyeonnn/ KGMEL: Knowledge Graph-Enhanced Multimodal Entity Linking
/juyeonnn/ KGMEL: Knowledge Graph-Enhanced Multimodal Entity Linking /juyeonnn/ KGMEL: Knowledge Graph-Enhanced Multimodal Entity Linking

Entity linking (EL) aligns textual mentions with their corresponding entities in a knowledge base, facilitating various applications such as semantic search and question answering.

Recent advances in multimodal entity linking (MEL) have shown that combining text and images can reduce ambiguity and improve alignment accuracy.

However, most existing MEL methods overlook the rich structural information available in the form of knowledge-graph (KG) triples.

In this paper, we propose KGMEL, a novel framework that leverages KG triples to enhance MEL.

(3) Reranking: Refines the KG triples of the candidate entities and employs large language models to identify the best-matching entity for the menti…

1 day, 23 hours назад @ paperswithcode.com
/jsliam94/ Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis
/jsliam94/ Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis /jsliam94/ Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis

Neuroblastoma, adrenal-derived, is among the most common pediatric solid malignancies, characterized by significant clinical heterogeneity.

Timely and accurate pathological diagnosis from hematoxylin and eosin-stained whole slide images is critical for patient prognosis.

Existing automated whole slide image classification methods encounter challenges such as poor interpretability, limited feature extraction capabilities, and high computational costs, restricting their practical clinical deployment.

By fusing multi-scale features and leveraging contrastive learning strategies, CMSwinKAN mimics clinicians' comprehensive approach, effectively capturing global and local tissue characteristics.

2 days, 2 hours назад @ paperswithcode.com
/junzengz/ FocusNet: Transformer-enhanced Polyp Segmentation with Local and Pooling Attention
/junzengz/ FocusNet: Transformer-enhanced Polyp Segmentation with Local and Pooling Attention /junzengz/ FocusNet: Transformer-enhanced Polyp Segmentation with Local and Pooling Attention

Colonoscopy is vital in the early diagnosis of colorectal polyps.

While deep learning has made impressive strides in polyp segmentation, most existing models are trained on single-modality and single-center data, making them less effective in real-world clinical environments.

To overcome these limitations, we propose FocusNet, a Transformer-enhanced focus attention network designed to improve polyp segmentation.

FocusNet incorporates three essential modules: the Cross-semantic Interaction Decoder Module (CIDM) for generating coarse segmentation maps, the Detail Enhancement Module (DEM) for refining shallow features, and the Focus Attention Module (FAM), to balance local detail and global co…

2 days, 2 hours назад @ paperswithcode.com
/stefan-ainetter/ Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding
/stefan-ainetter/ Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding /stefan-ainetter/ Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding

High-level 3D scene understanding is essential in many applications.

However, the challenges of generating accurate 3D annotations make development of deep learning models difficult.

We turn to recent advancements in automatic retrieval of synthetic CAD models, and show that data generated by such methods can be used as high-quality ground truth for training supervised deep learning models.

Our results underscore the potential of automatic 3D annotations to enhance model performance while significantly reducing annotation costs.

To support future research in 3D scene understanding, we will release our annotations, which we call SCANnotate++, along with our trained models.

2 days, 2 hours назад @ paperswithcode.com
/e-galois/ Beyond One-Hot Labels: Semantic Mixing for Model Calibration
/e-galois/ Beyond One-Hot Labels: Semantic Mixing for Model Calibration /e-galois/ Beyond One-Hot Labels: Semantic Mixing for Model Calibration

Model calibration seeks to ensure that models produce confidence scores that accurately reflect the true likelihood of their predictions being correct.

However, existing calibration approaches are fundamentally tied to datasets of one-hot labels implicitly assuming full certainty in all the annotations.

Such datasets are effective for classification but provides insufficient knowledge of uncertainty for model calibration, necessitating the curation of datasets with numerically rich ground-truth confidence values.

In this paper, we introduce calibration-aware data augmentation to create synthetic datasets of diverse samples and their ground-truth uncertainty.

Specifically, we present Calibra…

2 days, 2 hours назад @ paperswithcode.com
/frank-oy/ A Novel Hybrid Approach for Retinal Vessel Segmentation with Dynamic Long-Range Dependency and Multi-Scale Retinal Edge Fusion Enhancement
/frank-oy/ A Novel Hybrid Approach for Retinal Vessel Segmentation with Dynamic Long-Range Dependency and Multi-Scale Retinal Edge Fusion Enhancement /frank-oy/ A Novel Hybrid Approach for Retinal Vessel Segmentation with Dynamic Long-Range Dependency and Multi-Scale Retinal Edge Fusion Enhancement

Accurate retinal vessel segmentation provides essential structural information for ophthalmic image analysis.

However, existing methods struggle with challenges such as multi-scale vessel variability, complex curvatures, and ambiguous boundaries.

To address these limitations, we propose a novel hybrid framework that synergistically integrates CNNs and Mamba for high-precision retinal vessel segmentation.

2) The Dynamic Snake Visual State Space block combines Dynamic Snake Convolution with Mamba to adaptively capture vessel curvature details and long-range dependencies.

3) The MREF module enhances boundary precision through multi-scale edge feature aggregation, suppressing noise while emphas…

2 days, 2 hours назад @ paperswithcode.com
/vision4robotics/ AnyTSR: Any-Scale Thermal Super-Resolution for UAV
/vision4robotics/ AnyTSR: Any-Scale Thermal Super-Resolution for UAV /vision4robotics/ AnyTSR: Any-Scale Thermal Super-Resolution for UAV

Thermal imaging can greatly enhance the application of intelligent unmanned aerial vehicles (UAV) in challenging environments.

Super-resolution (SR) offers a promising solution to address this issue, while most existing SR methods are designed for fixed-scale SR.

To address above issues, this work proposes a novel any-scale thermal SR method (AnyTSR) for UAV within a single model.

Specifically, a new image encoder is proposed to explicitly assign specific feature code to enable more accurate and flexible representation.

Moreover, a novel dataset (UAV-TSR), covering both land and water scenes, is constructed for thermal SR tasks.

2 days, 2 hours назад @ paperswithcode.com
/dcdmllm/ EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model
/dcdmllm/ EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model /dcdmllm/ EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model

Medical Large Vision-Language Models (Med-LVLMs) demonstrate significant potential in healthcare, but their reliance on general medical data and coarse-grained global visual understanding limits them in intelligent ophthalmic diagnosis.

Currently, intelligent ophthalmic diagnosis faces three major challenges: (i) Data.

The lack of deeply annotated, high-quality, multi-modal ophthalmic visual instruction data; (ii) Benchmark.

Subsequently, we design Eyecare-Bench, a benchmark that comprehensively evaluates the overall performance of LVLMs on intelligent ophthalmic diagnosis tasks across multiple dimensions.

Finally, we develop the EyecareGPT, optimized for fine-grained ophthalmic visual unde…

2 days, 2 hours назад @ paperswithcode.com
/zahraakhlaghi/ Adaptive Long-term Embedding with Denoising and Augmentation for Recommendation
/zahraakhlaghi/ Adaptive Long-term Embedding with Denoising and Augmentation for Recommendation /zahraakhlaghi/ Adaptive Long-term Embedding with Denoising and Augmentation for Recommendation

The rapid growth of the internet has made personalized recommendation systems indispensable.

Graph-based sequential recommendation systems, powered by Graph Neural Networks (GNNs), effectively capture complex user-item interactions but often face challenges such as noise and static representations.

In this paper, we introduce the Adaptive Long-term Embedding with Denoising and Augmentation for Recommendation (ALDA4Rec) method, a novel model that constructs an item-item graph, filters noise through community detection, and enriches user-item interactions.

Graph Convolutional Networks (GCNs) are then employed to learn short-term representations, while averaging, GRUs, and attention mechanisms…

2 days, 2 hours назад @ paperswithcode.com
/gpt4vision/ Compile Scene Graphs with Reinforcement Learning
/gpt4vision/ Compile Scene Graphs with Reinforcement Learning /gpt4vision/ Compile Scene Graphs with Reinforcement Learning

Next token prediction is the fundamental principle for training large language models (LLMs), and reinforcement learning (RL) further enhances their reasoning performance.

As an effective way to model language, image, video, and other modalities, the use of LLMs for end-to-end extraction of structured visual representations, such as scene graphs, remains underexplored.

It requires the model to accurately produce a set of objects and relationship triplets, rather than generating text token by token.

To achieve this, we introduce R1-SGG, a multimodal LLM (M-LLM) initially trained via supervised fine-tuning (SFT) on the scene graph dataset and subsequently refined using reinforcement learning …

2 days, 2 hours назад @ paperswithcode.com
/mi3labucm/ Towards a Multi-Agent Vision-Language System for Zero-Shot Novel Hazardous Object Detection for Autonomous Driving Safety
/mi3labucm/ Towards a Multi-Agent Vision-Language System for Zero-Shot Novel Hazardous Object Detection for Autonomous Driving Safety /mi3labucm/ Towards a Multi-Agent Vision-Language System for Zero-Shot Novel Hazardous Object Detection for Autonomous Driving Safety

In this paper, we propose a multimodal approach that integrates vision-language reasoning with zero-shot object detection to improve hazard identification and explanation.

Our pipeline consists of a Vision-Language Model (VLM), a Large Language Model (LLM), in order to detect hazardous objects within a traffic scene.

We refine object detection by incorporating OpenAI's CLIP model to match predicted hazards with bounding box annotations, improving localization accuracy.

Additionally, we release a set of tools for structuring and managing large-scale hazard detection datasets.

Our findings highlight the strengths and limitations of current vision-language-based approaches, offering insights i…

2 days, 2 hours назад @ paperswithcode.com
/tosshero/ Lightweight LiDAR-Camera 3D Dynamic Object Detection and Multi-Class Trajectory Prediction
/tosshero/ Lightweight LiDAR-Camera 3D Dynamic Object Detection and Multi-Class Trajectory Prediction /tosshero/ Lightweight LiDAR-Camera 3D Dynamic Object Detection and Multi-Class Trajectory Prediction

So we present a lightweight multi-modal framework for 3D object detection and trajectory prediction.

Our system synergistically integrates LiDAR and camera inputs to achieve real-time perception of pedestrians, vehicles, and riders in 3D space.

The framework proposes two novel modules: 1) a Cross-Modal Deformable Transformer (CMDT) for object detection with high accuracy and acceptable amount of computation, and 2) a Reference Trajectory-based Multi-Class Transformer (RTMCT) for efficient and diverse trajectory prediction of mult-class objects with flexible trajectory lengths.

Evaluations on the CODa benchmark demonstrate superior performance over existing methods across detection (+2.03% i…

2 days, 2 hours назад @ paperswithcode.com
💼 University and corporation labs
DeepMind DeepMind
последний пост 6 days, 11 hours назад
Introducing Gemini 2.5 Flash
Introducing Gemini 2.5 Flash Introducing Gemini 2.5 Flash

In fact, Gemini 2.5 Flash performs strongly on Hard Prompts in LMArena , second only to 2.5 Pro.

Today we are rolling out an early version of Gemini 2.5 Flash in preview through the Gemini API via Google AI Studio and Vertex AI .

Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off.

2.5 Flash has comparable metrics to other leading models for a fraction of the cost and size.

Start building with Gemini 2.5 Flash today

6 days, 11 hours назад @ developers.googleblog.com
Generate videos in Gemini and Whisk with Veo 2
Generate videos in Gemini and Whisk with Veo 2 Generate videos in Gemini and Whisk with Veo 2

Starting today, Gemini Advanced users can generate and share videos using our state-of-the-art video model, Veo 2.

In Gemini, you can now translate text-based prompts into dynamic videos.

How to create videos with GeminiVeo 2 represents a leap forward in video generation, designed to produce high-resolution, detailed videos with cinematic realism.

To generate videos, select Veo 2 from the model dropdown in Gemini.

Sharing your video on mobile is easy: simply tap the share button to quickly upload engaging short videos to platforms like TikTok and YouTube Shorts.

1 week, 1 day назад @ blog.google
DolphinGemma: How Google AI is helping decode dolphin communication
DolphinGemma: How Google AI is helping decode dolphin communication DolphinGemma: How Google AI is helping decode dolphin communication

For decades, understanding the clicks, whistles and burst pulses of dolphins has been a scientific frontier.

What if we could not only listen to dolphins, but also understand the patterns of their complex communication well enough to generate realistic responses?

Today, on National Dolphin Day, Google, in collaboration with researchers at Georgia Tech and the field research of the Wild Dolphin Project (WDP), is announcing progress on DolphinGemma: a foundational AI model trained to learn the structure of dolphin vocalizations and generate novel dolphin-like sound sequences.

This approach in the quest for interspecies communication pushes the boundaries of AI and our potential connection wit…

1 week, 2 days назад @ blog.google
Taking a responsible path to AGI
Taking a responsible path to AGI Taking a responsible path to AGI

Today, we're sharing our views on AGI safety and security as we navigate the path toward this transformational technology.

Artificial general intelligence (AGI), AI that’s at least as capable as humans at most cognitive tasks, could be here within the coming years.

Building an ecosystem for AGI readinessLed by Shane Legg, Co-Founder and Chief AGI Scientist at Google DeepMind, our AGI Safety Council (ASC) analyzes AGI risk and best practices, making recommendations on safety measures.

As such, we’ve launched a new course on AGI Safety for students, researchers and professionals interested in this topic.

Ultimately, our approach to AGI safety and security serves as a vital roadmap to address …

3 weeks назад @ deepmind.google
Evaluating potential cybersecurity threats of advanced AI
Evaluating potential cybersecurity threats of advanced AI Evaluating potential cybersecurity threats of advanced AI

We analyzed over 12,000 real-world attempts to use AI in cyberattacks in 20 countries, drawing on data from Google’s Threat Intelligence Group .

These frameworks enabled us to evaluate threats across the end-to-end cyber attack chain, from reconnaissance to action on objectives, and across a range of possible attack scenarios.

Our updated Frontier Safety Framework recognizes that advanced AI models could automate and accelerate cyberattacks, potentially lowering costs for attackers.

Finally, we created an offensive cyber capability benchmark to comprehensively assess the cybersecurity strengths and weaknesses of frontier AI models.

Empowering the cybersecurity communityAs AI systems continu…

3 weeks назад @ deepmind.google
Gemini 2.5: Our most intelligent AI model
Gemini 2.5: Our most intelligent AI model Gemini 2.5: Our most intelligent AI model

Today we’re introducing Gemini 2.5, our most intelligent AI model.

Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy.

In the field of AI, a system’s capacity for “reasoning” refers to more than just classification and prediction.

Building on this, we recently introduced our first thinking model, Gemini 2.0 Flash Thinking.

Now, with Gemini 2.5, we've achieved a new level of performance by combining a significantly enhanced base model with improved post-training.

4 weeks, 1 day назад @ blog.google
Gemini Robotics brings AI into the physical world
Gemini Robotics brings AI into the physical world Gemini Robotics brings AI into the physical world

Research Gemini Robotics brings AI into the physical world ShareCopy link ×Introducing Gemini Robotics, our Gemini 2.0-based model designed for robotics At Google DeepMind, we've been making progress in how our Gemini models solve complex problems through multimodal reasoning across text, images, audio and video.

Because it’s built on a foundation of Gemini 2.0, Gemini Robotics is intuitively interactive.

Watch "Gemini Robotics: Dexterous"Multiple embodiments Finally, because robots come in all shapes and sizes, Gemini Robotics was also designed to easily adapt to different robot types.

Pause video Play videoEnhancing Gemini’s world understandingAlongside Gemini Robotics, we’re introducing …

1 month, 1 week назад @ deepmind.google
Experiment with Gemini 2.0 Flash native image generation
Experiment with Gemini 2.0 Flash native image generation Experiment with Gemini 2.0 Flash native image generation

In December we first introduced native image output in Gemini 2.0 Flash to trusted testers.

Today, we're making it available for developer experimentation across all regions currently supported by Google AI Studio.

You can test this new capability using an experimental version of Gemini 2.0 Flash (gemini-2.0-flash-exp) in Google AI Studio and via the Gemini API.

Gemini 2.0 Flash combines multimodal input, enhanced reasoning, and natural language understanding to create images that give you exactly what you ask for.

Text and images togetherUse Gemini 2.0 Flash to tell a story and it will illustrate it with pictures, keeping the characters and settings consistent throughout.

1 month, 1 week назад @ developers.googleblog.com
Introducing Gemma 3
Introducing Gemma 3 Introducing Gemma 3

Built-in safety for image applications with ShieldGemma 2Alongside Gemma 3, we're also launching ShieldGemma 2, a powerful 4B image safety checker built on the Gemma 3 foundation.

A “Gemmaverse” of models and toolsThe Gemmaverse is a vast ecosystem of community-created Gemma models and tools, ready to power and inspire your innovation.

Get started with Gemma 3As part of our ongoing commitment to democratizing access to high-quality AI, Gemma 3 represents the next step.

Here's where to start:Instant exploration:Try Gemma 3 at full precision directly in your browser – no setup needed – with Google AI Studio.

Get an API key directly from Google AI Studio and use Gemma 3 with the Google GenAI S…

1 month, 1 week назад @ blog.google
Start building with Gemini 2.0 Flash and Flash-Lite
Start building with Gemini 2.0 Flash and Flash-Lite Start building with Gemini 2.0 Flash and Flash-Lite

Gemini 2.0 Flash-Lite is now generally available in the Gemini API for production use in Google AI Studio and for enterprise customers on Vertex AI

1 month, 3 weeks назад @ deepmind.google
Gemini 2.0 is now available to everyone
Gemini 2.0 is now available to everyone Gemini 2.0 is now available to everyone

And last week, we made an updated 2.0 Flash available to all users of the Gemini app on desktop and mobile, helping everyone discover new ways to create, interact and collaborate with Gemini.

Today, we’re making the updated Gemini 2.0 Flash generally available via the Gemini API in Google AI Studio and Vertex AI.

It is available in Google AI Studio and Vertex AI, and in the Gemini app for Gemini Advanced users.

Finally, 2.0 Flash Thinking Experimental will be available to Gemini app users in the model dropdown on desktop and mobile.

Try Gemini 2.0 Flash in the Gemini app or the Gemini API in Google AI Studio and Vertex AI.

2 months, 2 weeks назад @ blog.google
Updating the Frontier Safety Framework
Updating the Frontier Safety Framework Updating the Frontier Safety Framework

Responsibility & Safety Updating the Frontier Safety Framework ShareCopy link ×Our next iteration of the FSF sets out stronger security protocols on the path to AGI AI is a powerful tool that is helping to unlock new breakthroughs and make significant progress on some of the biggest challenges of our time, from climate change to drug discovery.

That’s why we introduced the first iteration of our Frontier Safety Framework last year - a set of protocols to help us stay ahead of possible severe risks from powerful frontier AI models.

We have also implemented the Framework in our safety and governance processes for evaluating frontier models such as Gemini 2.0.

As a result of this work, today w…

2 months, 2 weeks назад @ deepmind.google
FACTS Grounding: A new benchmark for evaluating the factuality of large language models
FACTS Grounding: A new benchmark for evaluating the factuality of large language models FACTS Grounding: A new benchmark for evaluating the factuality of large language models

They can “hallucinate” false information, particularly when given complex inputs.

Today, we’re introducing FACTS Grounding, a comprehensive benchmark for evaluating the ability of LLMs to generate responses that are not only factually accurate with respect to given inputs, but also sufficiently detailed to provide satisfactory answers to user queries.

We hope our benchmark will spur industry-wide progress on factuality and grounding.

To track progress, we’re also launching the FACTS leaderboard on Kaggle.

We’ve already tested leading LLMs using FACTS Grounding and have populated the initial leaderboard with their grounding scores.

4 months, 1 week назад @ deepmind.google
State-of-the-art video and image generation with Veo 2 and Imagen 3
State-of-the-art video and image generation with Veo 2 and Imagen 3 State-of-the-art video and image generation with Veo 2 and Imagen 3

Earlier this year, we introduced our video generation model, Veo, and our latest image generation model, Imagen 3.

Since then, it’s been exciting to watch people bring their ideas to life with help from these models: YouTube creators are exploring the creative possibilities of video backgrounds for their YouTube Shorts, enterprise customers are enhancing creative workflows on Vertex AI and creatives are using VideoFX and ImageFX to tell their stories.

Together with collaborators ranging from filmmakers to businesses, we’re continuing to develop and evolve these technologies.

Today we're introducing a new video model, Veo 2, and the latest version of Imagen 3, both of which achieve state-of-…

4 months, 1 week назад @ blog.google
Introducing Gemini 2.0: our new AI model for the agentic era
Introducing Gemini 2.0: our new AI model for the agentic era Introducing Gemini 2.0: our new AI model for the agentic era

Today we’re excited to launch our next era of models built for this new agentic era: introducing Gemini 2.0, our most capable model yet.

Starting today our Gemini 2.0 Flash experimental model will be available to all Gemini users.

It's available in Gemini Advanced today.

TPUs powered 100% of Gemini 2.0 training and inference, and today Trillium is generally available to customers so they can build with it too.

If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful.

4 months, 1 week назад @ blog.google
Google
последний пост 14 часов назад
Going from requirements to prototype with Gemini Code Assist
Going from requirements to prototype with Gemini Code Assist Going from requirements to prototype with Gemini Code Assist

Imagine this common scenario: you have a detailed product requirements document for your next project.

Instead of reading the whole document and manually starting to code (or defining test cases or API specifications) to implement the required functions, you want to see how AI can shorten your path from the requirements document to a working application prototype.

In this article, we’ll show you an example of how you can use Gemini Code Assist to access a requirements doc without leaving your code editor through Google Docs integration, part of Gemini Code Assist tools, and get from requirements to a working application using a few natural language prompts.

This can be any requirements anal…

14 часов назад @ cloud.google.com
MCP Toolbox for Databases: Simplify AI Agent Access to Enterprise Data
MCP Toolbox for Databases: Simplify AI Agent Access to Enterprise Data MCP Toolbox for Databases: Simplify AI Agent Access to Enterprise Data

At Google Cloud Next 25, we announced incredible ways for enterprises to build multi-agent ecosystems with Vertex AI and Google Cloud Databases – including better ways for agents to communicate with each other using Agent2Agent Protocol and Model Context Protocol (MCP).

With the growing excitement around MCP for developers, we're making it easy for MCP Toolbox for Databases (formerly Gen AI Toolbox for Databases) to access your enterprise data in databases.

This is another step forward in providing secure and standardized ways to innovate with agentic applications.

MCP Toolbox for Databases (formerly Gen AI Toolbox for Databases)MCP Toolbox for Databases (Toolbox) is an open-source MCP (Mod…

1 day, 14 hours назад @ cloud.google.com
50% faster merge and 50% fewer bugs: How CodeRabbit built its AI code review agent with Google Cloud Run
50% faster merge and 50% fewer bugs: How CodeRabbit built its AI code review agent with Google Cloud Run 50% faster merge and 50% fewer bugs: How CodeRabbit built its AI code review agent with Google Cloud Run

CodeRabbit, a rapidly growing AI code review tool, is leveraging Google Cloud Run to cut code review time and bugs in half by safely and efficiently executing untrusted code.

CodeRabbit improves code quality and automates code reviews by analyzing changes against the entire codebase and generating scripts for deeper analysis.

It integrates with code hosting platforms to provide automated feedback on pull requests.

To safely execute untrusted code, CodeRabbit needed an execution environment that was scalable, cost-effective, and secure enough to analyse and run their customers’ code.

In this post, we’ll share how CodeRabbit built an AI code review agent with Google Cloud Run to scale dynamic…

1 day, 14 hours назад @ cloud.google.com
Google Cloud Database and LangChain integrations now support Go, Java, and JavaScript
Google Cloud Database and LangChain integrations now support Go, Java, and JavaScript Google Cloud Database and LangChain integrations now support Go, Java, and JavaScript

Last year, Google Cloud and LangChain announced integrations that give generative AI developers access to a suite of LangChain Python packages.

This allowed application developers to leverage Google Cloud’s database portfolio in their gen AI applications to drive the most value from their private data.

Today, we are expanding language support for our integrations to include Go, Java, and JavaScript.

This technology unlocks a variety of applications, including personalized product recommendations, question answering, document search and synthesis, customer service automation, and more.

In this post, we’ll share more about the integrations – and code snippets to get started.

1 day, 14 hours назад @ cloud.google.com
Next 25 developer keynote: From prompt, to agent, to work, to fun
Next 25 developer keynote: From prompt, to agent, to work, to fun Next 25 developer keynote: From prompt, to agent, to work, to fun

This year, the developer keynote was hosted by the inimitable duo of Richard Seroter, Google Cloud Chief Evangelist, and Stephanie Wong, Head of Developer Skills and Community, plus a whole host of experts from around Google Cloud product, engineering, and developer advocacy teams.

The keynote itself was organized around a noble, relatable goal: Use AI to help remodel AI Developer Experience Engineer Paige Bailey’s 1970s era kitchen.

It all starts with a promptThe generative AI experience starts by prompting a model with data and your intent.

Paige was joined on stage by Logan Kilpatrick, Senior Product Manager at Google DeepMind.

There, Logan and Paige prompted AI Studio to analyze Paige’s…

1 week, 5 days назад @ cloud.google.com
Colossus: the secret ingredient in Rapid Storage’s high performance
Colossus: the secret ingredient in Rapid Storage’s high performance Colossus: the secret ingredient in Rapid Storage’s high performance

The Colossus Curator constructs a handle and sends it to the Colossus Client running in-process, which caches the handle.

The application issues a write call for an arbitrary-sized log entry to the Colossus Client.

The Colossus Client, using the disk addresses in the handle, writes the log entry in parallel to all the disks.

Rapid Storage builds on Colossus’s stateful protocol, leveraging gRPC-based streaming for the underlying transport.

With Rapid Storage, the client receives a handle from Cloud Storage when creating the stream.

1 week, 6 days назад @ cloud.google.com
Enabling global scientific discovery and innovation on Google Cloud
Enabling global scientific discovery and innovation on Google Cloud Enabling global scientific discovery and innovation on Google Cloud

For example, Google Research’s Quantum AI team leverages Google Cloud to simulate the intricate device physics of quantum hardware, develop sophisticated hybrid quantum-classical algorithms, and explore and test novel quantum algorithms.

- Sergio Boixo, Director, Computer Science, Google Quantum AIHPC clusters demand high I/O performance to keep computational performance from stalling.

Google Cloud Managed Lustre delivers a high-performance, fully-managed parallel file system optimized for HPC and AI applications.

AlphaFold 3 is now available for non-commercial use on Google Cloud.

“Having access to the scientific capabilities of AlphaFold on Google Cloud can help our research rapidly predi…

1 week, 6 days назад @ cloud.google.com
New GKE inference capabilities reduce costs, tail latency and increase throughput
New GKE inference capabilities reduce costs, tail latency and increase throughput New GKE inference capabilities reduce costs, tail latency and increase throughput

When it comes to AI, inference is where today’s generative AI models can solve real-world business problems.

Google Kubernetes Engine (GKE) is seeing increasing adoption of gen AI inference.

For example, customers like HubX run inference of image-based models to serve over 250k images/day to power gen AI experiences, and Snap runs AI inference on GKE for its ad ranking system.

However, there are challenges when deploying gen AI inference.

While many customers are interested in using Tensor Processing Units (TPU), they are looking for compatibility with popular model servers.

1 week, 6 days назад @ cloud.google.com
Vertex AI offers new ways to build and manage multi-agent systems
Vertex AI offers new ways to build and manage multi-agent systems Vertex AI offers new ways to build and manage multi-agent systems

– Laurent Giraud, Chief Data (&AI) Officer, Renault Group.

“We've implemented the Agent Engine as the backbone of our video analysis AI agent, powered by Gemini.

Introducing Agent Engine: Deploying AI agents with enterprise-grade controlsAgent Engine is our fully managed runtime that makes it easy to deploy AI agents to production.

Keep the context in your sessions: Rather than starting from a blank slate each time, the Agent Engine supports short-term memory and long-term memory.

Drive broader adoption by connecting to Agentspace: You can register your agents hosted on Agent Engine to Google Agentspace.

2 weeks назад @ cloud.google.com
34 gen AI success stories with customers and ISVs
34 gen AI success stories with customers and ISVs 34 gen AI success stories with customers and ISVs

In the four decades that have followed, I’ve seen hundreds, maybe even thousands of developers and companies transform as we shifted from the internet to client-server to mobile and to cloud.

I’ve not seen anything quite as transformative quite as fast as what we’re seeing now, helping companies harness AI to transform their software development, business process, information retrieval, and more.

As part of our commitment to offering the most open and innovative generative AI partner ecosystem, we’re lucky enough to work with the most cutting-edge software companies in the world.

And if you want even more examples, check out our list of hundreds of real-world gen AI use cases we’ve helped b…

2 weeks назад @ cloud.google.com
Bringing Gemini and Google Agentspace to you on-premises
Bringing Gemini and Google Agentspace to you on-premises Bringing Gemini and Google Agentspace to you on-premises

Today we are announcing that Gemini will be available on Google Distributed Cloud (GDC), bringing Google’s most capable models to on-premises environments, with public preview starting in Q3 2025.

To do so, we’ve partnered with NVIDIA to bring our Gemini models to NVIDIA Blackwell systems that you can purchase through Google or your preferred channels.

It offers infrastructure-as-a-service, security, data, and AI services, and is extensible with a rich ISV ecosystem.

“NVIDIA and Google Distributed Cloud provide a secure AI platform, bringing Gemini models to enterprise datacenters and regulated industries.

With NVIDIA Blackwell infrastructure and confidential computing, Google Distributed C…

2 weeks назад @ cloud.google.com
Google Cloud databases supercharge the AI developer experience
Google Cloud databases supercharge the AI developer experience Google Cloud databases supercharge the AI developer experience

Built from the ground up by Google Cloud, it provides developers with additional choice for their demanding document database workloads.

We're committed to our partner ecosystem and will continue to support MongoDB Atlas in our Google Cloud Marketplace.

We’re continuing to invest in global infrastructure for Oracle, and these services are being deployed natively in 20 Google Cloud locations.

This is a fully-managed AlloyDB Omni service, from our partner Aiven, running on AWS, Azure, and Google Cloud.

Learn more about Google Cloud databases and start a free trial for Cloud SQL, AlloyDB, and Spanner.

2 weeks назад @ cloud.google.com
New services to help enterprises adopt AI from Google Cloud Consulting
New services to help enterprises adopt AI from Google Cloud Consulting New services to help enterprises adopt AI from Google Cloud Consulting

Enterprise customers are coming to Google Cloud to transform their businesses with AI, and many are turning to our seasoned experts at Google Cloud Consulting to help implement these innovations.

Working alongside our many partners, Google Cloud Consulting teams are helping customers identify the right use cases for AI, deliver them safely and securely, and then generate strong ROI.

In fact, engagements focused on implementing Google Cloud AI have become the fastest-growing area within our consulting practice over the past year — indicating the tremendous level of excitement around Google’s AI offerings.

This week at Google Cloud Next, we’re leaning into this success with the launch of seve…

2 weeks назад @ cloud.google.com
Global startups are building the future of AI on Google Cloud
Global startups are building the future of AI on Google Cloud Global startups are building the future of AI on Google Cloud

The most exciting startups in the world are in Las Vegas this week, as Google Cloud Next kicks off with a major focus on how AI and cloud are powering the next great wave of innovation.

Anthropic has been using Google Cloud infrastructure to support model training and inference for several years.

Google Cloud has also become an increasingly important route for organizations to access Anthropic’s models.

This week at Google Cloud Next, we’re highlighting the progress that even more global startups are making toward building AI systems and applications that will create real value for people and businesses.

We’ll share some of the new startups who have chosen Google Cloud as their key technolo…

2 weeks назад @ cloud.google.com
Building the industry’s best agentic AI ecosystem with partners
Building the industry’s best agentic AI ecosystem with partners Building the industry’s best agentic AI ecosystem with partners

Since the beginning, partners have been core to Google Cloud — and that’s been especially true in the AI era.

Partners have already built more than 1,000 AI agent use cases for customers across nearly every industry.

The AI opportunity for Google Cloud partners is growing fast.

It’s clear that much of the opportunity ahead lies in agentic AI — and now, partners are infused at every layer of our AI agent stack.

We’re committed to helping all of our partners capitalize on the AI opportunity, whether they’re building new technology, integrating with our products, or delivering critical enterprise services.

2 weeks назад @ cloud.google.com
OpenAI
последний пост None
Microsoft Microsoft
последний пост 14 часов назад
Research Focus: Week of April 21, 2025
Research Focus: Week of April 21, 2025 Research Focus: Week of April 21, 2025

In this issue:Catch a preview of our presentations and papers at CHI 2025 and ICLR 2025.

You’ll also find a replay of a podcast discussion on rural healthcare innovation with Senior Vice President of Microsoft Health Jim Weinstein.

CONFERENCEMicrosoft at CHI 2025Microsoft Research is proud to be a sponsor of the ACM Computer Human Interaction (CHI) 2025 Conference on Human Factors in Computing Systems (opens in new tab).

It advances our understanding of LLMs and their causal implications, and proposes a framework for future research at the intersection of LLMs and causality.

That’s the driving question behind the Microsoft Research Tools for Thought initiative.

14 часов назад @ microsoft.com
The Future of AI in Knowledge Work: Tools for Thought at CHI 2025
The Future of AI in Knowledge Work: Tools for Thought at CHI 2025 The Future of AI in Knowledge Work: Tools for Thought at CHI 2025

Participants’ reports about critical thinking and the effort involved align with longstanding human tendencies to manage cognitive load at work.

RecommendationsTo foster critical thinking at work, we recommend that AI tools actively encourage awareness, motivation, and skill development.

AI tools should enhance motivators for critical thinking (e.g., quality standards, skill-building) and mitigate inhibitors (e.g., time constraints, low awareness).

AI tools should also support knowledge workers’ ability to think critically by providing reasoning explanations (as some newer AI models now do), guided critiques, and cross-references.

Beyond these insights, our other CHI papers explore practica…

5 days, 14 hours назад @ microsoft.com
Empowering patients and healthcare consumers in the age of generative AI
Empowering patients and healthcare consumers in the age of generative AI Empowering patients and healthcare consumers in the age of generative AI

[LAUGHS]DEBRONKART: Ah, well, that’s … that’s even weirder.

I’m, as I said at the beginning, I’m glad to be alive and I’m really, really, really grateful to be given a chance to share my thoughts with your audience because I really like super smart nerds.

And it’s really to me like where I’m seeing kind of the first set of really kind of promising AI applications.

And so, to me, that’s really kind of where I see the most interesting opportunities for technology and for digital health.

Just really, really appreciate it.

6 days, 7 hours назад @ microsoft.com
Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project
Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project

More expert users are satisfied with AI responses only where AI expertise is on par with their own expertise on the topic, while novice users had low satisfaction rates regardless of AI expertise.

AI expertise: Labels the AI agent expertise based on the same criteria as user expertise above.

We utilized our knowledge work classifier to label the chat log data as relating to knowledge work tasks.

We classified both the user expertise and AI agent expertise for anonymous interactions in Copilot in Bing.

Figure 10: Copilot in Bing satisfaction intersection of AI expertise and User expertise (August-September 2024)ConclusionUnderstanding these metrics is vital for grasping user behavior over ti…

1 week, 2 days назад @ microsoft.com
Debug-gym: an environment for AI coding tools to learn how to debug code like programmers
Debug-gym: an environment for AI coding tools to learn how to debug code like programmers Debug-gym: an environment for AI coding tools to learn how to debug code like programmers

This was what motivated us to maximize the potential time savings from AI coding tools by teaching them to debug code.

Today’s AI coding tools boost productivity and excel at suggesting solutions for bugs based on available code and error messages.

This may leave users feeling like AI coding tools don’t understand the full context of the issues they are trying to solve.

Introducing debug-gymA natural research question emerges: to what degree can LLMs use interactive debugging tools such as pdb?

The green bars indicate the performance of the agent with debugging tools, while the gray bars show the performance of the agent without debugging tools.

1 week, 6 days назад @ microsoft.com
Research Focus: Week of April 7, 2025
Research Focus: Week of April 7, 2025 Research Focus: Week of April 7, 2025

Check out our latest research and other updates.

MRI images can suffer from high levels of noise when scanning is accelerated with parallel imaging or when data are acquired using lower cost, low-field MRI systems.

This broad applicability means that the method is flexible and can be applied to different kinds of models without redesigning them.

The testing showed SNRAware significantly improves the quality and clinical utility of MRI images while preserving important diagnostic details.

To help astronomers accelerate this fundamental process, researchers from Microsoft and external colleagues introduce Mephisto, research designed to analyze extremely distant galaxies observed by the James …

2 weeks назад @ microsoft.com
Real-world healthcare AI development and deployment—at scale
Real-world healthcare AI development and deployment—at scale Real-world healthcare AI development and deployment—at scale

And of course, at the end of the day, I think that’s really been your role.

What is the current state of affairs for multimodal, you know, healthcare AI, medical AI?

So many folks when they hear AI think, it will just magically do everything perfectly.

They’ve had to really think of other ways that will guarantee that the human stays in the loop.

But the fact that we now have a credible chance of making that dream happen for real, I think that’s pretty wonderful.

2 weeks, 6 days назад @ microsoft.com
VidTok introduces compact, efficient tokenization to enhance AI video processing
VidTok introduces compact, efficient tokenization to enhance AI video processing VidTok introduces compact, efficient tokenization to enhance AI video processing

In our paper “VidTok: A Versatile and Open-Source Video Tokenizer,” we introduce a method that converts video data into smaller, structured units, or tokens.

By simplifying videos into manageable chunks, VidTok can enable AI systems to learn from, analyze, and generate video content more efficiently.

QuantizationTo efficiently compress video data, AI systems often use quantization to reduce the amount of information that needs to be stored or transmitted.

Its capacity to model complex visual dynamics could improve the efficiency of video systems by enabling AI processing on more compact units rather than raw pixels.

VidTok serves as a promising foundation for further research in video proce…

3 weeks назад @ microsoft.com
Ideas: Accelerating Foundation Models Research: AI for all
Ideas: Accelerating Foundation Models Research: AI for all Ideas: Accelerating Foundation Models Research: AI for all

But there was that class I really, really enjoyed, which was mathematical logic.

Well, let’s get onto the topic of Accelerating Foundation Models Research and unpack the big idea behind that.

It might be confusing for some people, Accelerating Foundation Models Research.

And so when we started with Accelerating Foundation Models Research and from now on, I will say AFMR if that’s okay.

It’s about access to people, access to the resources and really co-designing so that we can really, really make more advances together.

3 weeks, 2 days назад @ microsoft.com
Research Focus: Week of March 24, 2025
Research Focus: Week of March 24, 2025 Research Focus: Week of March 24, 2025

Deep generative models that learn from the distribution of natural protein sequences and structures may enable the design of new proteins with valuable functions.

Watch the seriesPODCAST The future of generative AI for scientific discovery Most of us think of generative AI in the context of text or image generation, but it’s also a powerful tool for scientific discovery.

In this episode of the Leading the Shift podcast (opens in new tab), host Susan Etlinger speaks with Ade Famoti, a senior leader on the Microsoft Research Accelerator team.

Ade discusses what he calls “AI’s physics moment,” and why he believes generative AI feels fundamentally different from past platform shifts.

Ade shares…

4 weeks назад @ microsoft.com
The reality of generative AI in the clinic
The reality of generative AI in the clinic The reality of generative AI in the clinic

Sara is vice president and chief health AI officer at UC San Francisco Health.

LONGHURST: So the pat response is AI won’t replace doctors, but AI will replace doctors who don’t use AI.

LEE: And I’m assuming a chief health AI officer is not a role that has been around for a long time.

LEE: Should I be impressed or concerned that the chief health AI officer at UC San Francisco Health is using ChatGPT off label?

We’ll delve into how patients are using generative AI for their own healthcare, the hype and reality of AI drug discovery, and more.

1 month назад @ microsoft.com
Claimify: Extracting high-quality claims from language model outputs
Claimify: Extracting high-quality claims from language model outputs Claimify: Extracting high-quality claims from language model outputs

The U.N. found that the resulting contaminated water caused many residents to fall ill, highlighting the need for improved water management.

Emerging markets face economic challenges.

Excerpt: “The U.N. found that the resulting contaminated water caused many residents to fall ill, highlighting the need for improved water management.”Claims: The U.N. found contaminated water in Derna, Libya.

Explanation: The first claim is inaccurate because the U.N. found the link between contaminated water and illness, not the contaminated water itself.

First, it found two instances of ambiguity – “resulting contaminated water” and “many residents” – that it determined could be resolved using the context.

1 month назад @ microsoft.com
Metasurface: Unlocking the future of wireless sensing and communication
Metasurface: Unlocking the future of wireless sensing and communication Metasurface: Unlocking the future of wireless sensing and communication

Using this capability, we developed a GNSS positioning metasurface system (GPMS) based on passive metasurface technology.

Passive metasurfaces guide GNSS signals indoors, while enhanced positioning algorithms provide precise indoor positioning on mobile devices.

This proposed framework can optimize millimeter-wave coverage using low-cost passive metasurface design and strategic placement.

The AutoMS framework generates optimized deployment plans for passive metasurface and access points based on environment scanning results.

Low-cost passive metasurface design: We designed a high-reflectivity passive metasurface with near-2π phase control and broadband compatibility for the millimeter-wave …

1 month назад @ microsoft.com
Introducing KBLaM: Bringing plug-and-play external knowledge to LLMs
Introducing KBLaM: Bringing plug-and-play external knowledge to LLMs Introducing KBLaM: Bringing plug-and-play external knowledge to LLMs

Large language models (LLMs) have demonstrated remarkable capabilities in reasoning, language understanding, and even creative tasks.

A new way to integrate knowledgeTo address these challenges, we introduce the Knowledge Base-Augmented Language Model (KBLaM) —a novel approach that integrates structured knowledge bases into pre-trained LLMs.

In this setup, language tokens (such as those from a user’s question) attend to all knowledge tokens.

However, knowledge tokens do not attend to one another, nor do they attend back to the language tokens.

Figure 2: By having the user’s question attend to the knowledge base, while treating facts in the knowledge base independently, KBLaM scales efficien…

1 month назад @ microsoft.com
Semantic Telemetry: Understanding how users interact with AI systems
Semantic Telemetry: Understanding how users interact with AI systems Semantic Telemetry: Understanding how users interact with AI systems

First, LLMs give us a new thing to measure, that is, how people interact with AI systems.

Semantic Telemetry is a rethink of traditional telemetry–in which data is collected for understanding systems–designed for analyzing chat-based AI.

Description of LLM generated label taxonomy processWith this approach, we have analyzed how people interact with Copilot in Bing.

Figure 7: Most and least complex topics based on percentage of high complexity tasks.

We are now able to obtain actionable insight from complex data that is not possible with traditional data science pattern-matching methods.

1 month, 2 weeks назад @ microsoft.com
MIT AI MIT AI
последний пост 15 часов назад
New model predicts a chemical reaction’s point of no return
New model predicts a chemical reaction’s point of no return New model predicts a chemical reaction’s point of no return

When chemists design new chemical reactions, one useful piece of information involves the reaction’s transition state — the point of no return from which a reaction must proceed.

Their model could make it easier for chemists to design chemical reactions that could generate a variety of useful compounds, such as pharmaceuticals or fuels.

However, that process requires a great deal of computing power and can take hours or days to calculate a single transition state.

In 2023, Kulik, Duan, and others reported on a machine-learning strategy that they developed to predict the transition states of reactions.

“A linear guess is a good starting point for approximating where that transition state wil…

15 часов назад @ news.mit.edu
“Periodic table of machine learning” could fuel AI discovery
“Periodic table of machine learning” could fuel AI discovery “Periodic table of machine learning” could fuel AI discovery

The periodic table stems from one key idea: All these algorithms learn a specific kind of relationship between data points.

Just like the periodic table of chemical elements, which initially contained blank squares that were later filled in by scientists, the periodic table of machine learning also has empty spaces.

An accidental equationThe researchers didn’t set out to create a periodic table of machine learning.

It includes everything from classification algorithms that can detect spam to the deep learning algorithms that power LLMs.

In addition, the flexible periodic table allows researchers to add new rows and columns to represent additional types of datapoint connections.

1 day, 2 hours назад @ news.mit.edu
3D modeling you can feel
3D modeling you can feel 3D modeling you can feel

The CSAIL team’s “TactStyle” tool allows creators to stylize 3D models based on images while also incorporating the expected tactile properties of the textures.

Fundamental to the uniqueness of physical objects are their tactile properties, such as roughness, bumpiness, or the feel of materials like wood or stone.

“TactStyle” tool allows creators to stylize 3D models based on images while also incorporating the expected tactile properties of the textures.

Looking ahead, Faruqi says the team aims to extend TactStyle to generate novel 3D models using generative AI with embedded textures.

This requires exploring exactly the sort of pipeline needed to replicate both the form and function of the…

1 day, 11 hours назад @ news.mit.edu
Norma Kamali is transforming the future of fashion with AI
Norma Kamali is transforming the future of fashion with AI Norma Kamali is transforming the future of fashion with AI

For more than five decades, fashion designer and entrepreneur Norma Kamali has pioneered bold industry shifts, creating iconic silhouettes worn by celebrities including Whitney Houston and Jessica Biel.

“And then suddenly, I was playing.”Experimenting with her proprietary AI model, created by Maison Meta, Kamali used AI to reinterpret one of her signature styles — black garments adorned with silver studs.

Professionals must be empowered to harness AI’s potential in ways that not only enhance their work, but redefine what’s possible.

Designers need new tools to adapt.”Beyond its creative applications, Kamali sees AI as a vehicle for sustainability.

“The possibilities are endless.”Abel Sanche…

1 day, 12 hours назад @ news.mit.edu
MIT’s McGovern Institute is shaping brain science and improving human lives on a global scale
MIT’s McGovern Institute is shaping brain science and improving human lives on a global scale MIT’s McGovern Institute is shaping brain science and improving human lives on a global scale

Their $350 million pledge began with a simple yet audacious vision: to understand the human brain in all its complexity, and to leverage that understanding for the betterment of humanity.

The McGovern community gathers in the shape of the number 25 to celebrate the 25th anniversary of the McGovern Institute.

At the McGovern Institute, the whole is greater than the sum of its parts.”Many discoveries at the McGovern Institute have depended on collaborations across multiple labs, ranging from biological engineering to human brain imaging and artificial intelligence.

Professor Nancy Kanwisher (center) with three of her scientific “children”: (left to right) MIT professors Evelina Fedorenko, Jos…

5 days, 16 hours назад @ news.mit.edu
Making AI-generated code more accurate in any language
Making AI-generated code more accurate in any language Making AI-generated code more accurate in any language

Programmers can now use large language models (LLMs) to generate computer code more quickly.

However, this only makes programmers’ lives easier if that code follows the rules of the programming language and doesn’t cause a computer to crash.

For instance, it could allow businesspeople to write complex queries in SQL, a language for database manipulation, using only natural language prompts.

We can quickly check whether something is in the right programming language, but to check its meaning you have to execute the code.

The researchers’ approach involves engineering knowledge into the LLM to steer it toward the most promising outputs.

6 days, 2 hours назад @ news.mit.edu
A faster way to solve complex planning problems
A faster way to solve complex planning problems A faster way to solve complex planning problems

“Often, a dedicated team could spend months or even years designing an algorithm to solve just one of these combinatorial problems.

With RHO, a user assigns an initial few tasks to machines in a fixed planning horizon, perhaps a four-hour time window.

But when the planning horizon advances, this creates some overlap with operations in the previous planning horizon.

The remaining operations are fed back into the algorithmic solver, which executes the task, recomputes these operations, and moves the planning horizon forward.

They also want to integrate their approach into other types of complex optimization problems like inventory management or vehicle routing.

1 week, 1 day назад @ news.mit.edu
Training LLMs to self-detoxify their language
Training LLMs to self-detoxify their language Training LLMs to self-detoxify their language

As it turns out, large language models (LLMs) — which are trained on extensive, public datasets and therefore often have biases and toxic language baked in — can gain a similar capacity to moderate their own language.

Finding the “guardrails”The training resources behind LLMs almost always include content collected from public spaces like the internet and other readily available datasets.

As such, curse words and bullying/unpalatable language is a component, although some of it is in the context of literary works.

It then follows that LLMs can innately produce — or be tricked into generating — dangerous and/or biased content, which often contains disagreeable words or hateful language, even…

1 week, 2 days назад @ news.mit.edu
New method efficiently safeguards sensitive AI training data
New method efficiently safeguards sensitive AI training data New method efficiently safeguards sensitive AI training data

Data privacy comes with a cost.

The team utilized their new version of PAC Privacy to privatize several classic algorithms for data analysis and machine-learning tasks.

They also demonstrated that more “stable” algorithms are easier to privatize with their method.

A stable algorithm’s predictions remain consistent even when its training data are slightly modified.

The original PAC Privacy algorithm runs a user’s AI model many times on different samples of a dataset.

1 week, 6 days назад @ news.mit.edu
Could LLMs help design our next medicines and materials?
Could LLMs help design our next medicines and materials? Could LLMs help design our next medicines and materials?

It automatically switches between the base LLM and graph-based AI modules to design the molecule, explain the rationale, and generate a step-by-step plan to synthesize it.

It also outperformed LLMs that are more than 10 times its size and that design molecules and synthesis routes only with text-based representations, suggesting multimodality is key to the new system’s success.

While these models are popular for inverse molecular design, they require complex inputs, can’t understand natural language, and yield results that can be difficult to interpret.

A second module uses a graph neural network to encode the generated molecular structure back into tokens for the LLMs to consume.

In experi…

2 weeks, 1 day назад @ news.mit.edu
New method assesses and improves the reliability of radiologists’ diagnostic reports
New method assesses and improves the reliability of radiologists’ diagnostic reports New method assesses and improves the reliability of radiologists’ diagnostic reports

But do the words radiologists use to express their confidence level accurately reflect how often a particular pathology occurs in patients?

They used this approach to provide clear suggestions that help radiologists choose certainty phrases that would improve the reliability of their clinical reporting.

By helping radiologists more accurately describe the likelihood of certain pathologies in medical images, this new framework could improve the reliability of critical clinical information.

Rather than trying to map certainty phrases to a single percentage, the researchers’ approach treats them as probability distributions.

In addition, they are interested in studying how receptive radiologis…

2 weeks, 6 days назад @ news.mit.edu
Taking the “training wheels” off clean energy
Taking the “training wheels” off clean energy Taking the “training wheels” off clean energy

“What these technologies need less is training wheels, and more of a level playing field,” said Brian Deese, an MIT Institute Innovation Fellow, during a conference-opening keynote panel.

The good: Clean energy investment in the United States hit an all-time high of $272 billion in 2024.

And the ugly: Macro conditions are making it more difficult for utilities and private enterprise to build out the clean energy infrastructure needed to meet growing energy demands.

Across the two days, speakers emphasized that the cost-per-unit and scalability of clean energy technologies will ultimately determine their fate.

However, she warned that green energy technologies are unlikely to receive signifi…

2 weeks, 6 days назад @ news.mit.edu
Vana is letting users own a piece of the AI models trained on their data
Vana is letting users own a piece of the AI models trained on their data Vana is letting users own a piece of the AI models trained on their data

AI developers can pitch users on ideas for new models, and if the users agree to contribute their data for training, they get proportional ownership in the models.

Users can upload that information into encrypted digital wallets in Vana and disburse it to train models as they see fit.

In Vana, data are used in a way that preserves user privacy because the system doesn’t expose identifiable information.

More than 140,000 Vana users contributed their Reddit data, which contained posts, comments, messages, and more.

“Let’s say users have Spotify data, Reddit data, and fashion data,” Kazlauskas explains.

3 weeks назад @ news.mit.edu
Researchers teach LLMs to solve complex planning challenges
Researchers teach LLMs to solve complex planning challenges Researchers teach LLMs to solve complex planning challenges

In fact, for all their incredible capabilities, large language models (LLMs) often perform poorly when tasked with directly solving such complicated planning problems on their own.

The researchers’ algorithmic solvers apply the same principles to optimization problems that are far too complex for a human to crack.

It can find the optimal solution to a planning problem right out of the box.

In the future, the researchers want to enable LLMFP to take images as input to supplement the descriptions of a planning problem.

This would help the framework solve tasks that are particularly hard to fully describe with natural language.

3 weeks, 1 day назад @ news.mit.edu
Pattie Maes receives ACM SIGCHI Lifetime Research Award
Pattie Maes receives ACM SIGCHI Lifetime Research Award Pattie Maes receives ACM SIGCHI Lifetime Research Award

Pattie Maes, the Germeshausen Professor of Media Arts and Sciences at MIT and head of the Fluid Interfaces research group within the MIT Media Lab, has been awarded the 2025 ACM SIGCHI Lifetime Research Award.

The Lifetime Research Award is given to individuals whose research in human-computer interaction (HCI) is considered both fundamental and influential to the field.

Recipients are selected based on their cumulative contributions, influence on the work of others, new research developments, and being an active participant in the Association for Computing Machinery’s Special Interest Group on Computer-Human Interaction (ACM SIGCHI) community.

Rather than AI replacing human capabilities, M…

3 weeks, 2 days назад @ news.mit.edu
Berkeley AI
последний пост 1 week, 5 days назад
Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)
Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign) Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications.

To mitigate the imminent prompt injection threat, we propose two fine-tuning-defenses, StruQ and SecAlign.

Prompt Injection Attack: CausesBelow is the threat model of prompt injection attacks.

Prompt injection threat model in LLM-integrated applicationsWe propose that prompt injection has two causes.

Below are resources to learn more and keep updated on prompt injection attacks and defenses.

1 week, 5 days назад @ bair.berkeley.edu
Repurposing Protein Folding Models for Generation with Latent Diffusion
Repurposing Protein Folding Models for Generation with Latent Diffusion Repurposing Protein Folding Models for Generation with Latent Diffusion

Repurposing Protein Folding Models for Generation with Latent DiffusionPLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure, by learning the latent space of protein folding models.

In PLAID, we develop a method that learns to sample from the latent space of protein folding models to generate new proteins.

Unlike many previous protein structure generative models, PLAID addresses the multimodal co-generation problem setting: simultaneously generating both discrete sequence and continuous all-atom structural coordinates.

In this way, we can use structural understanding information in the weights of pretrained protein folding models for the p…

2 weeks, 1 day назад @ bair.berkeley.edu
Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment
Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway DeploymentTraining Diffusion Models with Reinforcement LearningWe deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone.

The challenges of phantom jamsA stop-and-go wave moving backwards through highway traffic.

Smoothing behavior of RL AVs.

Overall, the steps towards deployment involved:Training in data-driven simulations: We used highway traffic data from I-24 to create a training environment with realistic wave dynamics, then validate the trained agent’s performance and robustness in a variety of new traffic scenarios.…

4 weeks, 1 day назад @ bair.berkeley.edu
Virtual Personas for Language Models via an Anthology of Backstories
Virtual Personas for Language Models via an Anthology of Backstories Virtual Personas for Language Models via an Anthology of Backstories

Virtual Personas for Language Models via an Anthology of BackstoriesWe introduce Anthology, a method for conditioning LLMs to representative, consistent, and diverse virtual personas by generating and utilizing naturalistic backstories with rich details of individual values and experience.

What does it mean for large language models (LLMs) to be trained on massive text corpora, collectively produced by millions and billions of distinctive human authors?

In this work, we introduce Anthology, an approach for steering LLMs to representative, consistent, and diverse virtual personas by providing richly detailed life narratives of individuals as conditioning context to models.

By grounding langu…

5 months, 1 week назад @ bair.berkeley.edu
Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination
Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination

Linguistic Bias in ChatGPT: Language Models Reinforce Dialect DiscriminationSample language model responses to different varieties of English and native speaker reactions.

Over 1 billion people around the world speak varieties such as Indian English, Nigerian English, Irish English, and African-American English.

Then, we compared the language model responses to the “standard” varieties and the non-“standard” varieties.

Here, we included the original GPT-3.5 responses, plus responses from GPT-3.5 and GPT-4 where the models were told to imitate the style of the input.

That can reinforce barriers against speakers of non-“standard” varieties as AI models become increasingly used in …

7 months назад @ bair.berkeley.edu
How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark
How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT BenchmarkWhen we began studying jailbreak evaluations, we found a fascinating paper claiming that you could jailbreak frontier LLMs simply by translating forbidden prompts into obscure languages.

This blog post shows how to use a new, state-of-the art jailbreak benchmark - StrongREJECT - to accurately and robustly evaluate jailbreak methods.

PAP instructs an attacker model to persuade a victim model to give it harmful information using techniques like misrepresentation and logical appeals.

We conducted two experiments to test this hypothesis:We used StrongREJECT to evaluate 37 jailbreak methods on an unaligned model; Dolp…

7 months, 4 weeks назад @ bair.berkeley.edu
Are We Ready for Multi-Image Reasoning? Launching VHs: The Visual Haystacks Benchmark!
Are We Ready for Multi-Image Reasoning? Launching VHs: The Visual Haystacks Benchmark! Are We Ready for Multi-Image Reasoning? Launching VHs: The Visual Haystacks Benchmark!

Launching VHs: The Visual Haystacks Benchmark!

Humans excel at processing vast arrays of visual information, a skill that is crucial for achieving artificial general intelligence (AGI).

Visual Haystacks: the first "visual-centric" Needle-In-A-Haystack (NIAH) benchmark designed to rigorously evaluate Large Multimodal Models (LMMs) in processing long-context visual information.

The first NIAH benchmark for visual reasoning was introduced by Google in the Gemini-v1.5 technical report.

What is the Visual Haystacks (VHs) Benchmark?

9 months, 1 week назад @ bair.berkeley.edu
AWS Machine Learning AWS Machine Learning
последний пост 14 часов назад
Build an AI-powered document processing platform with open source NER model and LLM on Amazon SageMaker
Build an AI-powered document processing platform with open source NER model and LLM on Amazon SageMaker Build an AI-powered document processing platform with open source NER model and LLM on Amazon SageMaker

In this post, we discuss how you can build an AI-powered document processing platform with open source NER and LLMs on SageMaker.

Solution overviewThe NER & LLM Gen AI Application is a document processing solution built on AWS that combines NER and LLMs to automate document analysis at scale.

Solution ComponentsStorage architectureThe application uses a multi-bucket Amazon S3 storage architecture designed for clarity, efficient processing tracking, and clear separation of document processing stages.

Logical flowThe document processing workflow orchestrates multiple stages of analysis that operate both in parallel and sequential patterns.

The implementation of a dedicated NER model for autho…

14 часов назад @ aws.amazon.com
Protect sensitive data in RAG applications with Amazon Bedrock
Protect sensitive data in RAG applications with Amazon Bedrock Protect sensitive data in RAG applications with Amazon Bedrock

If sensitive data isn’t sanitized before ingestion, this might lead to retrieving sensitive data from the vector store and inadvertently leak the sensitive data to unauthorized users as part of the model response.

Data redaction at storage level – Identifying and redacting (or masking) sensitive data before storing them to the vector store (ingestion) using Amazon Bedrock Knowledge Bases.

For comprehensive information about Amazon Bedrock security, please refer to the Amazon Bedrock Security documentation.

ConclusionIn this post, we explored two approaches for securing sensitive data in RAG applications using Amazon Bedrock.

He now focuses on building generative AI services such as Amazon B…

14 часов назад @ aws.amazon.com
Supercharge your LLM performance with Amazon SageMaker Large Model Inference container v15
Supercharge your LLM performance with Amazon SageMaker Large Model Inference container v15 Supercharge your LLM performance with Amazon SageMaker Large Model Inference container v15

Today, we’re excited to announce the launch of Amazon SageMaker Large Model Inference (LMI) container v15, powered by vLLM 0.8.4 with support for the vLLM V1 engine.

Amazon SageMaker AI continues to evolve its generative AI inference capabilities to meet the growing demands in performance and model support for foundation models (FMs).

", "parameters": { "max_new_tokens": 256, "temperature": 0.9, }, "stream": True, }Getting started with LMI v15Getting started with LMI v15 is seamless, and you can deploy with LMI v15 in only a few lines of code.

The container is available through Amazon Elastic Container Registry (Amazon ECR), and deployments can be managed through SageMaker AI endpoints.

Con…

1 day, 13 hours назад @ aws.amazon.com
Accuracy evaluation framework for Amazon Q Business – Part 2
Accuracy evaluation framework for Amazon Q Business – Part 2 Accuracy evaluation framework for Amazon Q Business – Part 2

Challenges in evaluating Amazon Q BusinessEvaluating the performance of Amazon Q Business, which uses a RAG model, presents several challenges due to its integration of retrieval and generation components.

Add users to Amazon Q BusinessYou need to provision users for the pre-created Amazon Q Business application.

Let’s review the first question: “What are the index types of Amazon Q Business and the features of each?” You can read the question, Amazon Q Business generated answers, ground truth, and context.

By using our guidance on how to improve evaluation metrics, you can continuously optimize your Amazon Q Business application to meet enterprise needs with Amazon Q Business.

To learn mor…

1 day, 13 hours назад @ aws.amazon.com
Use Amazon Bedrock Intelligent Prompt Routing for cost and latency benefits
Use Amazon Bedrock Intelligent Prompt Routing for cost and latency benefits Use Amazon Bedrock Intelligent Prompt Routing for cost and latency benefits

We encourage you to incorporate Amazon Bedrock Intelligent Prompt Routing into your new and existing generative AI applications.

Highlights and improvementsToday, you can either use Amazon Bedrock Intelligent Prompt Routing with the default prompt routers provided by Amazon Bedrock or configure your own prompt routers to adjust for performance linearly between the performance of the two candidate LLMs.

Customers who tested Amazon Bedrock Intelligent Prompt Routing in preview (thank you!

We conducted several internal tests with proprietary and public data to evaluate Amazon Bedrock Intelligent Prompt Routing metrics.

When starting with Amazon Bedrock Intelligent Prompt Routing, we recommend …

1 day, 13 hours назад @ aws.amazon.com
How Infosys improved accessibility for Event Knowledge using Amazon Nova Pro, Amazon Bedrock and Amazon Elemental Media Services
How Infosys improved accessibility for Event Knowledge using Amazon Nova Pro, Amazon Bedrock and Amazon Elemental Media Services How Infosys improved accessibility for Event Knowledge using Amazon Nova Pro, Amazon Bedrock and Amazon Elemental Media Services

Infosys, a leading global IT services and consulting organization, used its digital expertise to tackle this challenge by pioneering, Infosys Event AI, an innovative AI-based event assistant.

By transforming ephemeral event content into a persistent and searchable knowledge asset, Infosys Event AI seeks to enhance knowledge utilization and impact.

To address these challenges, Infosys partnered with Amazon Web Services (AWS) to develop the Infosys Event AI to unlock the insights generated during events.

In this post, we explain how Infosys built the Infosys Event AI solution using several AWS services including:Solution ArchitectureIn this section, we present an overview of Event AI, highlig…

1 day, 13 hours назад @ aws.amazon.com
Amazon Bedrock Prompt Optimization Drives LLM Applications Innovation for Yuewen Group
Amazon Bedrock Prompt Optimization Drives LLM Applications Innovation for Yuewen Group Amazon Bedrock Prompt Optimization Drives LLM Applications Innovation for Yuewen Group

Today, we are excited to announce the availability of Prompt Optimization on Amazon Bedrock.

Prompt Optimization is seamlessly integrated into Amazon Bedrock Playground and Prompt Management to easily create, evaluate, store and use optimized prompt in your AI applications.

Results of Prompt OptimizationUsing Bedrock Prompt Optimization, Yuewen Group achieved significant improvements in across various intelligent text analysis tasks, including name extraction and multi-option reasoning use-cases.

Using the power of foundation models, Prompt Optimization produces high-quality results with minimal manual prompt iteration.

Prompt Optimization Best PracticesThroughout our experience with Prompt…

2 days, 7 hours назад @ aws.amazon.com
Build a location-aware agent using Amazon Bedrock Agents and Foursquare APIs
Build a location-aware agent using Amazon Bedrock Agents and Foursquare APIs Build a location-aware agent using Amazon Bedrock Agents and Foursquare APIs

To tackle this challenge, we can combine Amazon Bedrock Agents and Foursquare APIs.

Amazon Bedrock Agents is a feature within Amazon Bedrock that allows you to create autonomous AI agents.

Solution overviewTo demonstrate the power of adding location to Amazon Bedrock Agents, we created a simple architecture that creates an Amazon Bedrock agent with the Foursquare Places APIs and a Weather API.

With Amazon Bedrock Agents, you can build a cloud-centered solution that allows you to use powerful foundation models on Amazon Bedrock to drive these experiences.

About the authorsJohn Baker is a Principal SDE at AWS, where he works on Amazon Bedrock and specifically Amazon Bedrock Agents.

2 days, 12 hours назад @ aws.amazon.com
Build an automated generative AI solution evaluation pipeline with Amazon Nova
Build an automated generative AI solution evaluation pipeline with Amazon Nova Build an automated generative AI solution evaluation pipeline with Amazon Nova

Solution overviewIn this section, we present an automated generative AI evaluation solution that can be used to simplify the evaluation process.

This solution provides both online (real-time comparison) and offline (batch evaluation) evaluation options that fulfill different needs during the generative AI solution development lifecycle.

The architecture of the automated LLM evaluation pipeline focuses on modularity, flexibility, and scalability.

The evaluation solution can significantly enhance team productivity throughout the development lifecycle by reducing manual intervention and increasing automated processes.

We encourage you to explore the GitHub repository and start building your ow…

2 days, 13 hours назад @ aws.amazon.com
Build a FinOps agent using Amazon Bedrock with multi-agent capability and Amazon Nova as the foundation model
Build a FinOps agent using Amazon Bedrock with multi-agent capability and Amazon Nova as the foundation model Build a FinOps agent using Amazon Bedrock with multi-agent capability and Amazon Nova as the foundation model

In this post, we use the multi-agent feature of Amazon Bedrock to demonstrate a powerful and innovative approach to AWS cost management.

Solution overviewOur innovative AWS cost management solution uses the power of AI and multi-agent collaboration to provide comprehensive cost analysis and optimization recommendations.

Amazon Bedrock Agents with multi-agent capabilityThe Amazon Bedrock multi-agent architecture enables sophisticated FinOps problem-solving through a coordinated system of AI agents, led by a FinOpsSupervisorAgent .

This information is required to securely authenticate users and allow the frontend to interact with the Amazon Bedrock agent.

ConclusionThe integration of the mult…

5 days, 13 hours назад @ aws.amazon.com
Stream ingest data from Kafka to Amazon Bedrock Knowledge Bases using custom connectors
Stream ingest data from Kafka to Amazon Bedrock Knowledge Bases using custom connectors Stream ingest data from Kafka to Amazon Bedrock Knowledge Bases using custom connectors

In this post, we showcase the custom data connector capability in Amazon Bedrock Knowledge Bases that makes it straightforward to build RAG workflows with custom input data.

With custom data connectors, you can quickly ingest specific documents from custom data sources without requiring a full sync and ingest streaming data without the need for intermediary storage.

However, with streaming ingestion using custom connectors, Amazon Bedrock Knowledge Bases processes such streaming data without using an intermediary data source, making it available almost immediately.

Amazon Bedrock Knowledge Bases custom connectorAmazon Bedrock Knowledge Bases supports custom connectors and the ingestion of s…

5 days, 13 hours назад @ aws.amazon.com
Add Zoom as a data accessor to your Amazon Q index
Add Zoom as a data accessor to your Amazon Q index Add Zoom as a data accessor to your Amazon Q index

Organizations can now configure Zoom as a data accessor in Amazon Q Business, enabling seamless integration between their Amazon Q index and Zoom AI Companion.

How Amazon Q Business and Zoom AI Companion work togetherThe Amazon Q Business data accessor is a core component within Amazon Q Business.

Create an Amazon Q Business applicationTo access indexed data from Amazon Q Business through Zoom AI Companion, organizations must first set up their Amazon Q Business application.

Configure Amazon Q for Zoom AI CompanionTo start using Zoom as a data accessor for your Amazon Q Business index, the following information from your enterprise Amazon Q Business application must be shared with Zoom:Amaz…

6 days, 12 hours назад @ aws.amazon.com
The future of quality assurance: Shift-left testing with QyrusAI and Amazon Bedrock
The future of quality assurance: Shift-left testing with QyrusAI and Amazon Bedrock The future of quality assurance: Shift-left testing with QyrusAI and Amazon Bedrock

In this post, we explore how QyrusAI and Amazon Bedrock are revolutionizing shift-left testing, enabling teams to deliver better software faster.

QyrusAI: Intelligent testing agents powered by Amazon BedrockQyrusAI is a suite of AI-driven testing tools that enhances the software testing process across the entire software development lifecycle (SDLC).

Using advanced large language models (LLMs) and vision-language models (VLMs) through Amazon Bedrock, QyrusAI provides a suite of capabilities designed to elevate shift-left testing.

The following diagram shows how TestGenerator is deployed on AWS using Amazon Elastic Container Service (Amazon ECS) tasks exposed through Application Load Balance…

6 days, 13 hours назад @ aws.amazon.com
Automate video insights for contextual advertising using Amazon Bedrock Data Automation
Automate video insights for contextual advertising using Amazon Bedrock Data Automation Automate video insights for contextual advertising using Amazon Bedrock Data Automation

Amazon Bedrock Data Automation (BDA) is a new managed feature powered by FMs in Amazon Bedrock.

Solution overviewNonlinear ads are digital video advertisements that appear simultaneously with the main video content without interrupting playback.

Each new video invokes an AWS Lambda function that triggers BDA for video analysis.

Understanding these outputs is essential to understand what type of insights BDA provides and how to use them to build our contextual advertising solution.

ConclusionAmazon Bedrock Data Automation, powered by foundation models from Amazon Bedrock, marks a significant advancement in video analysis.

6 days, 13 hours назад @ aws.amazon.com
How Salesforce achieves high-performance model deployment with Amazon SageMaker AI
How Salesforce achieves high-performance model deployment with Amazon SageMaker AI How Salesforce achieves high-performance model deployment with Amazon SageMaker AI

The Salesforce AI Model Serving team is working to push the boundaries of natural language processing and AI capabilities for enterprise applications.

In this post, we share how the AI Model Service team achieved high-performance model deployment using Amazon SageMaker AI.

Best practice configurations for deployment in SageMaker AIA key advantage of using SageMaker AI is the best practice configurations for deployment.

To learn more about how SageMaker AI enhances Einstein’s LLM latency and throughput, see Revolutionizing AI: How Amazon SageMaker Enhances Einstein’s Large Language Model Latency and Throughput.

For more information on how to get started with SageMaker AI, refer to Guide to g…

6 days, 14 hours назад @ aws.amazon.com
NVIDIA
последний пост 11 часов назад
NVIDIA cuPyNumeric 25.03 Now Fully Open Source with PIP and HDF5 Support
NVIDIA cuPyNumeric 25.03 Now Fully Open Source with PIP and HDF5 Support NVIDIA cuPyNumeric 25.03 Now Fully Open Source with PIP and HDF5 Support

NVIDIA cuPyNumeric is a library that aims to provide a distributed and accelerated drop-in replacement for NumPy built on top of the Legate framework.

cuPyNumeric 25.03 is a milestone update that introduces powerful new capabilities and enhanced accessibility for users and developers alike, as detailed in this post.

Full stack now open sourceWith cuPyNumeric 25.03, NVIDIA open-sourced the Legate framework and runtime layer that powers cuPyNumeric, under the Apache 2 license.

Native HDF5 IO supportcuPyNumeric 25.03 provides native support for HDF5 over GPU Direct Storage, enabling efficient handling of large datasets and seamless interoperability with scientific computing environments.

Get s…

11 часов назад @ developer.nvidia.com
Capital One Banks on AI for Financial Services
Capital One Banks on AI for Financial Services Capital One Banks on AI for Financial Services

Capital One Banks on AI for Financial ServicesFinancial services has long been at the forefront of adopting technological innovations.

“Proprietary data allows you to build proprietary AI that provides enduring differentiated services for your customers.”Capital One’s AI architecture combines open-weight foundation models with deep customizations using proprietary data.

AI factories incorporate all the components required for financial institutions to generate intelligence, combining hardware, software, networking and development tools for AI applications in financial services.

Jacob Liberman, director of product management at NVIDIA, explains how agentic AI bridges the gap between powerful…

14 часов назад @ blogs.nvidia.com
How the Economics of Inference Can Maximize AI Value
How the Economics of Inference Can Maximize AI Value How the Economics of Inference Can Maximize AI Value

As AI models evolve and adoption grows, enterprises must perform a delicate balancing act to achieve maximum value.

That means that as AI model performance and use increases, so do the amount of tokens generated and their associated computational costs.

Key Terminology for the Economics of AI InferenceKnowing key terms of the economics of inference helps set the foundation for understanding its importance.

The smarter and faster an AI model is, the more utility it will have to companies and customers.

Learn more by reading the ebook “AI Inference: Balancing Cost, Latency and Performance.”

15 часов назад @ blogs.nvidia.com
Enterprises Onboard AI Teammates Faster With NVIDIA NeMo Tools to Scale Employee Productivity
Enterprises Onboard AI Teammates Faster With NVIDIA NeMo Tools to Scale Employee Productivity Enterprises Onboard AI Teammates Faster With NVIDIA NeMo Tools to Scale Employee Productivity

Now generally available, NVIDIA NeMo microservices are helping enterprise IT quickly build AI teammates that tap into data flywheels to scale employee productivity.

NVIDIA NeMo microservices — including NeMo Customizer, NeMo Evaluator and NeMo Guardrails — can be used alongside NeMo Retriever and NeMo Curator to ease enterprises’ experiences building, optimizing and scaling AI agents through custom enterprise data flywheels.

Industry Pioneers Boost AI Agent Accuracy With NeMo MicroservicesNVIDIA partners and industry pioneers are using NeMo microservices to build responsive AI agent platforms so that digital teammates can help get more done.

Meta has tapped NVIDIA NeMo microservices through…

17 часов назад @ blogs.nvidia.com
Project G-Assist Plug-In Builder Lets Anyone Customize AI on GeForce RTX AI PCs
Project G-Assist Plug-In Builder Lets Anyone Customize AI on GeForce RTX AI PCs Project G-Assist Plug-In Builder Lets Anyone Customize AI on GeForce RTX AI PCs

Extend Project G-Assist System Assistant with easy-to-build AI plug-ins to take action on PCs with natural language.

With the new ChatGPT-based G-Assist Plug-In Builder, developers and enthusiasts can create and customize G-Assist’s functionality, adding new commands, connecting external tools and building AI workflows tailored to specific needs.

With the G-Assist Plug-In Builder, users can:Use a responsive small language model running locally on GeForce RTX GPUs for fast, private inference.

Start Building TodayWith the G-Assist Plugin Builder and open API support, anyone can extend G-Assist to fit their exact needs.

Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X — and stay in…

17 часов назад @ blogs.nvidia.com
Making Brain Waves: AI Startup Speeds Disease Research With Lab in the Loop
Making Brain Waves: AI Startup Speeds Disease Research With Lab in the Loop Making Brain Waves: AI Startup Speeds Disease Research With Lab in the Loop

BrainStorm Therapeutics, a San Diego-based startup, is accelerating the development of cures for these conditions using AI-powered computational drug discovery paired with lab experiments using organoids: tiny, 3D bundles of brain cells created from patient-derived stem cells.

This hybrid, iterative method, where clinical data and AI models inform one another to accelerate drug development, is known as lab in the loop.

BrainStorm Therapeutics’ AI models, which run on NVIDIA GPUs in the cloud, were developed using the NVIDIA BioNeMo Framework, a set of programming tools, libraries and models for computational drug discovery.

Accelerating Drug Discovery ResearchWith its proprietary platform, …

1 day, 17 hours назад @ blogs.nvidia.com
Chill Factor: NVIDIA Blackwell Platform Boosts Water Efficiency by Over 300x
Chill Factor: NVIDIA Blackwell Platform Boosts Water Efficiency by Over 300x Chill Factor: NVIDIA Blackwell Platform Boosts Water Efficiency by Over 300x

Instead of relying on air as an intermediary, direct-to-chip liquid cooling transfers heat in a technology cooling system loop.

It packs unprecedented compute density into each server rack, delivering 40x higher revenue potential, 30x higher throughput, 25x more energy efficiency and 300x more water efficiency than traditional air-cooled architectures.

Newer NVIDIA GB300 NVL72 systems built on the Blackwell Ultra platform boast a 50x higher revenue potential and 35x higher throughput with 30x more energy efficiency.

By embracing high-density architectures and advanced liquid cooling, the industry is paving the way for a more efficient AI-powered future.

Learn more about breakthrough solutio…

1 day, 17 hours назад @ blogs.nvidia.com
Keeping AI on the Planet: NVIDIA Technologies Make Every Day About Earth Day
Keeping AI on the Planet: NVIDIA Technologies Make Every Day About Earth Day Keeping AI on the Planet: NVIDIA Technologies Make Every Day About Earth Day

Sailing the Seas of AIAmphitrite, based in France, uses satellite data with AI to simulate and predict ocean currents and weather.

Its AI models, driven by the NVIDIA AI and Earth-2 platforms, offer insights for positioning vessels to best harness the power of ocean currents.

Keeping AI on the WeatherWeather agencies and climate scientists worldwide are using NVIDIA CorrDiff, a generative AI weather model enabling kilometer-scale forecasts of wind, temperature and precipitation type and amount.

In another climate effort, NVIDIA Research announced a new generative AI model, called StormCast, for reliable weather prediction at a scale larger than storms.

NVIDIA GB300 NVL72 systems built on th…

1 day, 17 hours назад @ blogs.nvidia.com
Grandmaster Pro Tip: Winning First Place in Kaggle Competition with Feature Engineering using NVIDIA cuDF-pandas
Grandmaster Pro Tip: Winning First Place in Kaggle Competition with Feature Engineering using NVIDIA cuDF-pandas Grandmaster Pro Tip: Winning First Place in Kaggle Competition with Feature Engineering using NVIDIA cuDF-pandas

Feature engineering remains one of the most effective ways to improve model accuracy when working with tabular data.

Below, I share the core feature engineering techniques, accelerated by cuDF-pandas, that led to this result.

The most powerful feature engineering technique is groupby aggregations.

train["NaNs"] = np.float32(0) for i,c in enumerate(CATS): train["NaNs"] += train[c].isna()*2**iPut Numerical Column into BinsThe most powerful (predictive) column in this competition is Weight Capacity.

Feature engineering remains essential for maximizing tabular model performance, but the traditional approach using CPUs often hits a wall, making extensive feature discovery prohibitively slow.

6 days, 7 hours назад @ developer.nvidia.com
AI Bites Back: Researchers Develop Model to Detect Malaria Amid Venezuelan Gold Rush
AI Bites Back: Researchers Develop Model to Detect Malaria Amid Venezuelan Gold Rush AI Bites Back: Researchers Develop Model to Detect Malaria Amid Venezuelan Gold Rush

Once malaria-free, Venezuela is facing a resurgence of the infectious disease, but researchers have trained a model to aid in detection.

Gold prospecting in Venezuela has led to a malaria resurgence, but researchers have developed AI to take a bite out of the problem.

But researchers at the intersection of medicine and technology have tapped AI and NVIDIA GPUs to come up with a solution.

“At some point in Venezuela, malaria was almost eradicated,” said 25-year-old Diego Ramos-Briceño, who has a bachelor’s in engineering that he earned while also pursuing a doctorate in medicine.

Harnessing Gaming GPUs and CUDA for Model Training, InferenceTo run model training, the malaria paper’s team tapp…

6 days, 14 hours назад @ blogs.nvidia.com
Spring Into Action With 11 New Games on GeForce NOW
Spring Into Action With 11 New Games on GeForce NOW Spring Into Action With 11 New Games on GeForce NOW

Plus, roll with the punches in Capcom’s MARVEL vs. CAPCOM Fighting Collection: Arcade Classics, part of 11 games GeForce NOW is adding to its cloud gaming library — featuring over 2,000 titles playable with GeForce RTX 4080 performance.

PC Game Pass members — and those who own the game on Xbox — can stream the action instantly.

Whether players are tracking monstrous bounties solo or teaming with friends, the game’s tense player vs. player vs. environment action and new map, Mammon’s Gulch, are ideal for springtime exploration.

GeForce NOW members can continue their quest wherever spring takes them — including on their laptops, tablets and smartphones.

Time for New GamesCatch MARVEL vs. CAPC…

6 days, 17 hours назад @ blogs.nvidia.com
Isomorphic Labs Rethinks Drug Discovery With AI
Isomorphic Labs Rethinks Drug Discovery With AI Isomorphic Labs Rethinks Drug Discovery With AI

Isomorphic Labs is reimagining the drug discovery process with an AI-first approach.

Max Jaderberg, chief AI officer, and Sergei Yakneen, chief technology officer at Isomorphic Labs joined the AI Podcast to explain why they look at biology as an information processing system.

“We’re building generalizable AI models capable of learning from the entire universe of protein and chemical interactions,” Jaderberg said.

Jacob Liberman, director of product management at NVIDIA, joined the NVIDIA AI Podcast to explain how agentic AI bridges the gap between powerful AI models and practical enterprise applications.

How World Foundation Models Will Advance Physical AI With NVIDIA’s Ming-Yu LiuAI models…

1 week назад @ blogs.nvidia.com
Efficient Federated Learning in the Era of LLMs with Message Quantization and Streaming
Efficient Federated Learning in the Era of LLMs with Message Quantization and Streaming Efficient Federated Learning in the Era of LLMs with Message Quantization and Streaming

Quantization and dequantization are implemented using NVFlare filters and added to the federated schemes, reducing the message size during transmission.

Container and file streaming: Streaming capabilities are implemented on top of ObjectStreamer.

Given that recent LLMs are trained with reduced precision, the default fp32 message precision under NumPy format can even be artificially inflating the message size.

Federated model training with message quantizationTable 1 shows the message size in MB for a 1B parameter LLM under different precisions.

File streaming: Streaming a file rather than a structured object container.

1 week назад @ developer.nvidia.com
Into the Omniverse: How Digital Twins Are Scaling Industrial AI
Into the Omniverse: How Digital Twins Are Scaling Industrial AI Into the Omniverse: How Digital Twins Are Scaling Industrial AI

OpenUSD and the Mega Omniverse Blueprint enable robot fleet simulations in industrial facility digital twins.

The Mega NVIDIA Omniverse Blueprint — available in preview on build.nvidia.com — helps address these challenges by providing a scalable reference workflow for simulating multi-robot fleets in industrial facility digital twins, including those built with the NVIDIA Omniverse platform.

Omniverse Cloud Sensor RTX APIs : Ensure accurate sensor simulation with NVIDIA Omniverse Cloud application programming interfaces to create detailed virtual replicas of industrial facilities.

Video Analytics AI Agents: Integrate AI agents built with the NVIDIA AI Blueprint for video search and summariz…

1 week назад @ blogs.nvidia.com
Thousands of NVIDIA Grace Blackwell GPUs Now Live at CoreWeave, Propelling Development for AI Pioneers
Thousands of NVIDIA Grace Blackwell GPUs Now Live at CoreWeave, Propelling Development for AI Pioneers Thousands of NVIDIA Grace Blackwell GPUs Now Live at CoreWeave, Propelling Development for AI Pioneers

Cohere, IBM and Mistral AI deploy thousands of Blackwell GPUs in NVIDIA GB200 NVL72 rack-scale systems to train and run reasoning models and agentic AI.

Now, CoreWeave customers are gaining access to thousands of NVIDIA Blackwell GPUs.

CoreWeave’s NVIDIA GB200 NVL72 deployment for IBM also harnesses the IBM Storage Scale System, which delivers exceptional high-performance storage for AI.

CoreWeave’s experience standing up NVIDIA GPUs at scale with industry-leading reliability and resiliency through tools such as CoreWeave Mission Control met these requirements.

“What’s exciting about NVIDIA GB200 NVL72 is the new possibilities it opens up for model development and inference.”A Growing Numbe…

1 week, 1 day назад @ blogs.nvidia.com
Facebook
последний пост 1 month, 2 weeks назад
Building multimodal AI for Ray-Ban Meta glasses
Building multimodal AI for Ray-Ban Meta glasses Building multimodal AI for Ray-Ban Meta glasses

With our Ray-Ban Meta glasses, multimodal AI helps the glasses see what the wearer is seeing.

This means anyone wearing Ray-Ban Meta glasses can ask them questions about what they’re looking at.

On this episode of the Meta Tech Podcast, meet Shane, a research scientist at Meta who has spent the last seven years focusing on computer vision and multimodal AI for wearables.

Shane sits down with Pascal Hartig to share how his team is building foundational models for the Ray-Ban Meta glasses.

They talk about the unique challenges of AI glasses and pushing the boundaries of AI-driven wearable technology.

1 month, 2 weeks назад @ engineering.fb.com
Revolutionizing software testing: Introducing LLM-powered bug catchers
Revolutionizing software testing: Introducing LLM-powered bug catchers Revolutionizing software testing: Introducing LLM-powered bug catchers

WHAT IT ISMeta’s Automated Compliance Hardening (ACH) tool is a system for mutation-guided, LLM-based test generation.

Traditionally, automated test generation techniques sought merely to increase code coverage.

LLM-based test generation and LLM-based mutant generation are not new, but this is the first time they’ve been combined and deployed in large-scaled industrial systems.

WHAT’S NEXTOur novel approach combines LLM-based test generation and mutant generation to help automate complex technical organizational workflows in this space.

READ THE PAPERMutation-Guided LLM-based Test Generation at Meta

2 months, 2 weeks назад @ engineering.fb.com
Meta Andromeda: Supercharging Advantage+ automation with the next-gen personalized ads retrieval engine
Meta Andromeda: Supercharging Advantage+ automation with the next-gen personalized ads retrieval engine Meta Andromeda: Supercharging Advantage+ automation with the next-gen personalized ads retrieval engine

Unlocking advertiser value through industry-leading ML innovationMeta Andromeda is a personalized ads retrieval engine that leverages the NVIDIA Grace Hopper Superchip, to enable cutting edge ML innovation in the Ads retrieval stage to drive efficiency and advertiser performance.

Its deployment across Instagram and Facebook applications has achieved +6% recall improvement to the retrieval system, delivering +8% ads quality improvement on selected segments.

Andromeda is designed to maximize ads performance by utilizing the exponential growth in volume of eligible ads available to the retrieval stage.

The design is optimized for AI hardware, minimizing memory bandwidth bottlenecks and enablin…

4 months, 3 weeks назад @ engineering.fb.com
Sequence learning: A paradigm shift for personalized ads recommendations
Sequence learning: A paradigm shift for personalized ads recommendations Sequence learning: A paradigm shift for personalized ads recommendations

Meta’s ad recommendation engine, powered by deep learning recommendation models (DLRMs), has been instrumental in delivering personalized ads to people.

Learning from sequences: developing new sequence learning architectures to replace traditional DLRM neural network architectures.

A paradigm shift with learning from sequences for recommendation systemsMeta’s new system for ads recommendations uses sequence learning at its core.

Scaling the new sequence learning paradigmFollowing the redesign to shift from sparse feature learning to event-based sequence learning, the next focus was scaling across two domains — scaling the sequence learning architecture and scaling event sequences to be long…

5 months назад @ engineering.fb.com
OCP Summit 2024: The open future of networking hardware for AI
OCP Summit 2024: The open future of networking hardware for AI OCP Summit 2024: The open future of networking hardware for AI

At Open Compute Project Summit (OCP) 2024, we’re sharing details about our next-generation network fabric for our AI training clusters.

We’ve expanded our network hardware portfolio and are contributing two new disaggregated network fabrics and a new NIC to OCP.

At Meta, we believe that open hardware drives innovation.

At Meta, we envision a future of AI hardware systems that are not only scalable, but also open and collaborative.

We encourage anyone who wants to help advance the future of networking hardware for AI to engage with OCP and Meta to help share the future of AI infrastructure.

6 months, 1 week назад @ engineering.fb.com
Meta’s open AI hardware vision
Meta’s open AI hardware vision Meta’s open AI hardware vision

At the Open Compute Project (OCP) Global Summit 2024, we’re showcasing our latest open AI hardware designs with the OCP community.

These innovations include a new AI platform, cutting-edge open rack designs, and advanced network fabrics and components.

The open future of AI infraMeta is committed to open source AI.

We must also prioritize open and standardized models so we can leverage collective expertise, make AI more accessible, and work towards minimizing biases in our systems.​Just as important, we also need open AI hardware systems.

By addressing AI’s infrastructure needs together, we can unlock the true promise of open AI for everyone.​

6 months, 1 week назад @ engineering.fb.com
How open source AI can improve population estimates, sustainable energy, and the delivery of climate change interventions
How open source AI can improve population estimates, sustainable energy, and the delivery of climate change interventions How open source AI can improve population estimates, sustainable energy, and the delivery of climate change interventions

Why we need better population mapsAccurate estimates of population are taken for granted in many countries.

As the world’s natural resource and energy demands scale, accurate population estimates also offer significant opportunities to improve sustainability efforts.

In addition to total population counts, Meta’s population maps also include demographic breakdowns for groups such as the number of children under five, women of reproductive age, youth, and the elderly.

AI-powered population estimates have been scientifically evaluated to be among the most accurate in the world for mapping population distribution for a variety of geographies and use-cases.

Please visit the Data for Good websit…

6 months, 3 weeks назад @ engineering.fb.com
Simulator-based reinforcement learning for data center cooling optimization
Simulator-based reinforcement learning for data center cooling optimization Simulator-based reinforcement learning for data center cooling optimization

Meta is revamping its new data center design to optimize for artificial intelligence and the same methodology will be applicable for future data center optimizations as well.

As Meta is revamping its new data center design to optimize for artificial intelligence, the same methodology will be applicable for future data center optimizations as well to improve operational efficiency.

A reinforcement learning approach to data center coolingReinforcement learning (RL) is good at modeling control systems as sequential state machines.

There are also various RL approaches reported such as, transforming cooling optimization via deep reinforcement learning and data center cooling using model-predicti…

7 months, 2 weeks назад @ engineering.fb.com
How PyTorch powers AI training and inference
How PyTorch powers AI training and inference How PyTorch powers AI training and inference

How PyTorch powers AI training and inferenceLearn about new PyTorch advancements for LLMs and how PyTorch is enhancing every aspect of the LLM lifecycle.

In this talk from AI Infra @ Scale 2024, software engineers Wanchao Liang and Evan Smothers are joined by Meta research scientist Kimish Patel to discuss our newest features and tools that enable large-scale training, memory efficient fine-tuning, and on-device LLM capabilities.

First, they cover the importance of memory-efficient fine-tuning and a few common architectural and algorithmic techniques to enable fine-tuning on consumer-grade hardware.

Then they discuss the challenges of deploying large models for on-device deployment and how …

8 months назад @ engineering.fb.com
Inside the hardware and co-design of MTIA
Inside the hardware and co-design of MTIA Inside the hardware and co-design of MTIA

In this talk from AI Infra @ Scale 2024, Joel Colburn, a software engineer at Meta, technical lead Junqiang Lan, and software engineer Jack Montgomery discuss the second generation of MTIA, Meta’s in-house training and inference accelerator.

They cover the co-design process behind building the second generation of Meta’s first-ever custom silicon for AI workloads, including the PyTorch software ecosystem, and the model architectures for Meta’s key applications.

They demonstrate how MTIA achieves the performance, efficiency, and developer experience to successfully launch models into production.

They also highlight several co-design examples where special silicon features are utilized to acc…

8 months назад @ engineering.fb.com
Bringing Llama 3 to life
Bringing Llama 3 to life Bringing Llama 3 to life

At AI Infra @ Scale 2024, Meta engineers discussed every step of how we built and brought Llama 3 to life, from data and training to inference.

Joe Spisak, Product Director and Head of Generative AI Open Source at Meta, talks about the history of Llama and Meta’s overarching vision for open source AI.

He’s joined by Delia David, a software engineer at Meta, to discuss all things data-related for GenAI.

Kaushik Veeraraghavan, a software engineer at Meta, discusses how Meta trains Llama at scale and delves into the data center, networking, and software investments that have enabled the development of Meta’s Llama 3 models.

Finally, Ye (Charlotte) Qia, a production engineer at Meta, discusses …

8 months назад @ engineering.fb.com
Aparna Ramani discusses the future of AI infrastructure
Aparna Ramani discusses the future of AI infrastructure Aparna Ramani discusses the future of AI infrastructure

Delivering new AI technologies at scale also means rethinking every layer of our infrastructure – from silicon and software systems and even our data center designs.

For the second year in a row, Meta’s engineering and infrastructure teams returned for the AI Infra @ Scale conference, where they discussed the challenges of scaling up an infrastructure for AI as well as work being done on our large-scale GPU clusters, open hardware designs for next-generation data center hardware, and how Meta is building custom silicon like the Meta Training and Inference Accelerator (MTIA) to handle some of our AI training workloads.

Aparna Ramani, VP of Engineering at Meta, responsible for AI infrastructu…

8 months назад @ engineering.fb.com
How Meta animates AI-generated images at scale
How Meta animates AI-generated images at scale How Meta animates AI-generated images at scale

Meta AI’s animate feature, which lets people generate a short animation of a generated image, carried unique challenges in this regard.

Here’s how we were able to deploy Meta AI’s animate feature using a combination of latency optimizations, traffic management, and other novel techniques.

We started by looking at the data for previous traffic on our AI-generated media both at their launches and over time.

With these changes, the preponderance of requests remained in region and latency dropped to roughly what we would expect.

The service tries to take a chunk of that region’s requests and offload them to a nearby region that can handle them without becoming more overloaded.

8 months, 1 week назад @ engineering.fb.com
A RoCE network for distributed AI training at scale
A RoCE network for distributed AI training at scale A RoCE network for distributed AI training at scale

Our paper, “ RDMA over Ethernet for Distributed AI Training at Meta Scale ,” provides the details on how we design, implement, and operate one of the world’s largest AI networks at scale.

These RoCE clusters support an extensive range of production distributed GPU training jobs, including ranking, content recommendation, content understanding, natural language processing, and GenAI model training, among other workloads.

However, our experience with distributed AI training workloads provides a different perspective on tailoring the congestion control algorithms.

Moving forwardThe design and operation of large-scale RoCE networks for distributed AI training workloads have evolved to meet the …

8 months, 3 weeks назад @ engineering.fb.com
Meet Caddy – Meta’s next-gen mixed reality CAD software
Meet Caddy – Meta’s next-gen mixed reality CAD software Meet Caddy – Meta’s next-gen mixed reality CAD software

What happens when a team of mechanical engineers get tired of looking at flat images of 3D models over Zoom?

Meet the team behind Caddy, a new CAD app for mixed reality.

They join Pascal Hartig (@passy) on the Meta Tech Podcast to talk about teaching themselves to code, disrupting the CAD software space, and how they integrated Caddy with Llama 3, and so much more!

Download or listen to the podcast episode below:You can also find the episode wherever you get your podcasts, including:The Meta Tech Podcast is a podcast, brought to you by Meta, where we highlight the work Meta’s engineers are doing at every level – from low-level frameworks to end-user features.

And if you’re interested in lea…

9 months, 1 week назад @ engineering.fb.com
Uber Engineering
последний пост None
neptune.ai neptune.ai
последний пост 1 month назад
How to Build an LLM Agent With AutoGen: Step-by-Step Guide
How to Build an LLM Agent With AutoGen: Step-by-Step Guide How to Build an LLM Agent With AutoGen: Step-by-Step Guide

The efficiency of an LLM agent depends on the selection of the right LLM model.

In this article, we’ll introduce the fundamental building blocks of LLM agents and then walk through the process of building an LLM agent step by step.

Building an LLM agent from scratchIn the following, we’ll build a trip-planning LLM agent from scratch.

Using AutoGen’s OpenAI Assistant Agent, we instantiate a prompt that the LLM agent will follow throughout its interactions.

Related Ethical Considerations and Best Practices in LLM Development Read moreEnhancing LLM agent performanceWhile architecting an LLM agent, you have to keep in mind opportunities to improve the performance of the LLM agent.

1 month назад @ neptune.ai
Bayesian Deep Learning is Needed in the Age of Large-Scale AI [Paper Reflection]
Bayesian Deep Learning is Needed in the Age of Large-Scale AI [Paper Reflection] Bayesian Deep Learning is Needed in the Age of Large-Scale AI [Paper Reflection]

Moreover, I will make the case for why Bayesian deep learning can satisfy these desiderata and briefly review recent advances in the field.

The case for Bayesian deep learningBayesian deep learning uses the foundational statistical principles of Bayesian inference to endow deep learning systems with the ability to make probabilistic predictions.

However, Bayesian deep learning is unfortunately still not as easy to use as standard deep learning, which you can do these days in a few lines of PyTorch code.

If you want to use a Bayesian deep learning model, first, you have to think about specifying the prior.

If this is the case, trying out Bayesian deep learning is likely worth your while.

1 month, 1 week назад @ neptune.ai
Introduction to State Space Models as Natural Language Models
Introduction to State Space Models as Natural Language Models Introduction to State Space Models as Natural Language Models

TL;DR State Space Models (SSMs) use first-order differential equations to represent dynamic systems.

Understanding state space modelsBefore exploring how State Space Models (SSMs) can function as components of large language models (LLMs), we’ll examine their foundational mechanics.

State space models for natural language processingState Space Models (SSMs), long established in time series analysis, have been utilized as trainable sequence models for decades.

Linear state space layers (LSSLs)So far, we’ve seen that State Space Models are efficient sequence models.

Improvements on the state matrix AIn the previous section, we explored how the original LSSL relied on a fixed, predefined form …

1 month, 2 weeks назад @ neptune.ai
Ethical Considerations and Best Practices in LLM Development
Ethical Considerations and Best Practices in LLM Development Ethical Considerations and Best Practices in LLM Development

To keep data secure throughout the model’s lifecycle, implement these practices: data anonymization, secure model serving and privacy penetration tests.

For example, a recruitment LLM favoring male applicants due to biased training data reflects a harmful bias that requires correction.

Monitor bias continuouslyMitigating bias isn’t a one-time effort—it requires ongoing monitoring to ensure that your LLM remains fair and effective across iterations.

Although these contributions are publicly available, the move opened up debates about the ethics of reusing community-contributed content for proprietary AI training.

Best practices for ethical LLM developmentNavigating the regulatory landscape r…

1 month, 3 weeks назад @ neptune.ai
Open LLMs are Necessary For Current Private Adaptations and Outperform Their Closed Alternatives [Paper Reflection]
Open LLMs are Necessary For Current Private Adaptations and Outperform Their Closed Alternatives [Paper Reflection] Open LLMs are Necessary For Current Private Adaptations and Outperform Their Closed Alternatives [Paper Reflection]

While much of the discussion around LLMs centers on task and computational performance, in our paper Open LLMs are Necessary for Current Private Adaptations and Outperform their Closed Alternatives, we focus on the privacy implications of using Open and Closed LLMs.

The threat space in adapting LLMs to private dataThe adaptation of Closed LLMs to private datasets introduces a multifaceted threat space.

Related Zero-Shot and Few-Shot Learning with LLMs Read morePrivate adaptation methods for Open LLMsUnlike Closed LLMs, Open LLMs provide access to their parameters, enabling more flexible and parameter-centric private adaptation methods.

Performance: All adaptation methods for Closed LLMs ach…

2 months назад @ neptune.ai
Learnings From Teams Training Large-Scale Models: Challenges and Solutions For Monitoring at Hyperscale
Learnings From Teams Training Large-Scale Models: Challenges and Solutions For Monitoring at Hyperscale Learnings From Teams Training Large-Scale Models: Challenges and Solutions For Monitoring at Hyperscale

“What is not measured, cannot be improved.” This quote has become a guiding principle for teams training foundation models.

During my talk at NeurIPS, I broke down five key lessons learned from teams facing large-scale model training and monitoring.

Waabi’s teams, running large-scale ML experiments, needed a way to organize and share their experiment data efficiently.

Visualizing large datasetsWe generally do not think of dataset visualization as part of experiment monitoring.

Moving forwardThe path to efficient hyperscale training lies in combining robust monitoring, advanced debugging tools, and comprehensive experiment tracking.

2 months, 1 week назад @ neptune.ai
Mixture of Experts LLMs: Key Concepts Explained
Mixture of Experts LLMs: Key Concepts Explained Mixture of Experts LLMs: Key Concepts Explained

TL;DR Mixture of Experts (MoE) is a type of neural network architecture that employs sub-networks (experts) to process specific input parts.

This is the key idea behind Mixture of Expert LLMs.

The Switch-Language Transformer, Mixtral, GLaM, GShard, and DeepSeekMoE are Mixture of Experts LLMs (MoEs), which require only executing a portion of the model’s computational graph during inference.

Optimization strategies for MoE LLMs are discussed comprehensively in the papers introducing the Switch Transformer, GShard, and GLaM.

Mixture of Experts (MoE) is an approach to scaling LLMs to trillions of parameters with conditional computation while avoiding exploding computational costs.

2 months, 2 weeks назад @ neptune.ai
Hyperparameter Optimization For LLMs: Advanced Strategies
Hyperparameter Optimization For LLMs: Advanced Strategies Hyperparameter Optimization For LLMs: Advanced Strategies

Advanced hyperparameter optimization strategies, like population-based training, Bayesian optimization, and adaptive LoRA, promise to balance computational effort and outcome.

To avoid this, learning rate schedules for LLMs start with a small learning rate and slowly ramp it up to its maximum value.

Can we use traditional machine learning hyperparameter optimization methods for LLMs?

| Modified based on: sourceHands-on: LLM hyperparameter optimization with neptune.aiOptuna is a framework for optimizing hyperparameter search using Bayesian optimization.

See the docs or watch a short product demo (2 min)Play with a live Neptune Scale projectRequest your early accessWhat’s next in LLM hyperpar…

2 months, 3 weeks назад @ neptune.ai
Multimodal Large Language Models
Multimodal Large Language Models Multimodal Large Language Models

TL;DR Multimodal Large Language Models (MLLMs) process data from different modalities like text, audio, image, and video.

This article explores Multimodal Large Language Models, exploring their core functionalities, challenges, and potential for various machine-learning domains.

Let’s break down the concept of Multimodal Large Language Models (MLLMs) by first understanding the terms “modal” and “multimodal:”“Modal” refers to a particular way of communicating or perceiving information.

| SourceGoogle: PaLM-EGoogle developed an embodied language model, PaLM-E, to incorporate continuous sensor modalities into language models and establish the link between words and perceptions.

Improving how t…

3 months назад @ neptune.ai
How to Build and Evaluate a RAG System Using LangChain, Ragas, and neptune.ai
How to Build and Evaluate a RAG System Using LangChain, Ragas, and neptune.ai How to Build and Evaluate a RAG System Using LangChain, Ragas, and neptune.ai

In this guide, we’ll show you how to build a RAG system using the LangChain framework, evaluate its performance using Ragas, and track your experiments with neptune.ai.

Part 1: Building a baseline RAG system with LangChainIn the first part of this guide, we’ll use LangChain to build a RAG system for the blog posts in the LLMOps category on Neptune’s blog.

Ragas works smoothly with LangChain, making it a great choice for evaluating our RAG system.

Step 1: Generate a RAG evaluation datasetAn evaluation set for RAG tasks is similar to a question-answering task dataset.

Step 2: Choose RAG evaluation metricsAs mentioned earlier, Ragas offers both LLM-based and non-LLM-based metrics for RAG syste…

3 months, 4 weeks назад @ neptune.ai
Position: Understanding LLMs Requires More Than Statistical Generalization [Paper Reflection]
Position: Understanding LLMs Requires More Than Statistical Generalization [Paper Reflection] Position: Understanding LLMs Requires More Than Statistical Generalization [Paper Reflection]

In our paper, Understanding LLMs Requires More Than Statistical Generalization, we argue that current machine learning theory cannot explain the interesting emergent properties of Large Language Models, such as reasoning or in-context learning.

Inductive biases affect which solution the neural network converges to, such as the model architecture or the optimization algorithm.

How do language complexity and model architecture affect generalization ability?

showed how different neural network architectures generalize better for different language types.

Presumably, we’ll need to find different complexity measures for different model architectures that consider their specific inductive biases.

4 months назад @ neptune.ai
From Research to Production: Building The Most Scalable Experiment Tracker For Foundation Models
From Research to Production: Building The Most Scalable Experiment Tracker For Foundation Models From Research to Production: Building The Most Scalable Experiment Tracker For Foundation Models

TL;DR At a large-scale model training (in huge models), anomalies are not rare events but problematic patterns that drive failure.

The Neptune Scale experiment tracker supports fault tolerance and is designed to maintain progress despite hardware failures, making it adaptable for enterprise teams tackling LLM fine-tuning, compliance, and building domain-specific models.

Experiment tracking back then was straightforward—dealing mostly with single models or small-scale distributed systems.

One of the biggest lessons we’ve learned is that experiment tracking has evolved into experiment monitoring.

That’s why we’re focusing on building intelligent alerts and anomaly detection right into our exp…

4 months, 1 week назад @ neptune.ai
Transformers Key-Value Caching Explained
Transformers Key-Value Caching Explained Transformers Key-Value Caching Explained

Key-value (KV) caching is a clever trick to do that: At inference time, key and value matrices are calculated for each generated token.

Implementing K-V caching in large-scale production systems requires careful cache management, including choosing an appropriate strategy for cache invalidation and exploring opportunities for cache reuse.

Key-value (KV) caching is a clever trick to do just that – let’s see how it works and when to use it.

Transformer architecture overviewBefore we dive into KV caching, we will need to take a short detour to the attention mechanism used in transformers.

Understanding how it works is required to spot and appreciate how KV caching optimizes transformer inferen…

4 months, 2 weeks назад @ neptune.ai
Learn From Failure: Fine-Tuning LLMs With Trial-and-Error Data For Intuitionistic Propositional Logic Proving [Paper Reflection]
Learn From Failure: Fine-Tuning LLMs With Trial-and-Error Data For Intuitionistic Propositional Logic Proving [Paper Reflection] Learn From Failure: Fine-Tuning LLMs With Trial-and-Error Data For Intuitionistic Propositional Logic Proving [Paper Reflection]

In our paper, Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving, we explored this problem experimentally.

Our goal was to assess the influence of trial-and-error information in the training data on the performance of LLMs in theorem proving.

However, at the time we published our paper, current approaches to training LLMs for ATPs only utilized data on correct proof attempts.

We hope our work can raise the community’s awareness of the importance of trial-and-error data for automated theorem proving.

We believe this advancement is largely due to the substantial trial-and-error data included in the model’s training process.

4 months, 3 weeks назад @ neptune.ai
Fine-Tuning Llama 3 with LoRA: Step-by-Step Guide
Fine-Tuning Llama 3 with LoRA: Step-by-Step Guide Fine-Tuning Llama 3 with LoRA: Step-by-Step Guide

We will explore these challenges and provide an example of fine-tuning the Llama 3 8B Instruct model utilizing the neptune.ai experiment tracker.

The Llama 3 training data is seven times larger than what Meta used for training Llama 2.

For pre-training, Meta combined four types of parallelization, an approach they dubbed “4D parallelism”: data, model, pipeline, and context.

Hands-on guide: resource-efficient fine-tuning of Llama 3 on Google ColabFine-tuning Llama 3 8B is challenging, as it requires considerable computational resources.

We’ll use the Llama 3 8B model, which is sufficient for this task despite being the smallest Llama 3 model.

5 months назад @ neptune.ai
▶️ YouTube
Yannic Kilcher Yannic Kilcher
последний пост 2 weeks, 4 days назад
On the Biology of a Large Language Model (Part 1)
On the Biology of a Large Language Model (Part 1) On the Biology of a Large Language Model (Part 1)

An in-depth look at Anthropic's Transformer Circuit Blog Post https://transformer-circuits.pub/2025/attribution-graphs/biology.html Abstract:

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology. Authors:

Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*,

David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton,

Trenton Bricken, Callum McDougall◊, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson,

Sam Zimmerman, Kelley Rivoire, Thom…

2 weeks, 4 days назад @ youtube.com
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Paper Explained)
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Paper Explained) DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Paper Explained)

#deepseek #llm #reinforcementlearning GRPO is one of the core advancements used in Deepseek-R1, but was introduced already last year in this paper that uses a combination of new RL techniques and iterative data collection to achieve remarkable performance on mathematics benchmarks with just a 7B model. Paper: https://arxiv.org/abs/2402.03300 Abstract:

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achie…

2 months, 3 weeks назад @ youtube.com
Traditional Holiday Live Stream
Traditional Holiday Live Stream Traditional Holiday Live Stream

https://ykilcher.com/discord Links:

TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://discord.gg/4H8xxDF

BitChute: https://www.bitchute.com/channel/yannic-kilcher

Minds: https://www.minds.com/ykilcher

Parler: https://parler.com/profile/YannicKilcher

LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/

BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https:/…

3 months, 4 weeks назад @ youtube.com
Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained)
Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained) Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained)

#tokenization #llm #meta This paper does away with tokenization and creates an LLM architecture that operates on dynamically sized "patches" instead of tokens. By controlling the patch size, they gain a level of control over the tradeoff between model size and FLOPs and use that to achieve more favorable scaling behavior than classically tokenized LLMs. Paper: https://ai.meta.com/research/publications/byte-latent-transformer-patches-scale-better-than-tokens/

Code: https://github.com/facebookresearch/blt Abstract:

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant imp…

4 months назад @ youtube.com
Safety Alignment Should be Made More Than Just a Few Tokens Deep (Paper Explained)
Safety Alignment Should be Made More Than Just a Few Tokens Deep (Paper Explained) Safety Alignment Should be Made More Than Just a Few Tokens Deep (Paper Explained)

This paper demonstrates in a series of experiments that current safety alignment techniques of LLMs, as well as corresponding jailbreaking attacks, are in large part focusing on modulating the distribution of the first few tokens of the LLM response. Paper: https://openreview.net/forum?id=6Mxhg9PtDE&s=09 Abstract:

The safety alignment of current Large Language Models (LLMs) is vulnerable. Simple attacks, or even benign fine-tuning, can jailbreak aligned models. We note that many of these vulnerabilities are related to a shared underlying issue: safety alignment can take shortcuts, wherein the alignment adapts a model's generative distribution primarily over only its very first few output to…

4 months, 2 weeks назад @ youtube.com
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper Explained)
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper Explained) TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper Explained)

A deep dive into the TokenFormer and an opinion about its impact, novelty, and relation to prior work. Paper: https://arxiv.org/abs/2410.23168 Abstract:

Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their dependence on a fixed number of parameters within linear projections. When architectural modifications (e.g., channel dimensions) are introduced, the entire model typically requires retraining from scratch. As model sizes continue growing, this strategy results in increasingly high com…

5 months назад @ youtube.com
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

This paper (by Apple) questions the mathematical reasoning abilities of current LLMs and designs a synthetic template-based dataset distribution to investigate various aspects around LLM performance of high-school level math questions. Paper: https://arxiv.org/abs/2410.05229 Abstract:

Recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics. The GSM8K benchmark is widely used to assess the mathematical reasoning of models on grade-school-level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities hav…

6 months назад @ youtube.com
Were RNNs All We Needed? (Paper Explained)
Were RNNs All We Needed? (Paper Explained) Were RNNs All We Needed? (Paper Explained)

This paper posits the interesting question: How much of the performance of Mamba, S4, and other state-space-like models is actually just attributable to some very core concepts - rather than their elaborate architectures. The authors construct minimal versions of GRUs and LSTMs and report competitive performance. Paper: https://arxiv.org/abs/2410.01201 Abstract:

The scalability limitations of Transformers regarding sequence length have renewed interest in recurrent sequence models that are parallelizable during training. As a result, many novel recurrent architectures, such as S4, Mamba, and Aaren, have been proposed that achieve comparable performance. In this work, we revisit traditional …

6 months, 1 week назад @ youtube.com
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper) Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)

How can one best use extra FLOPS at test time? Paper: https://arxiv.org/abs/2408.03314 Abstract:

Enabling LLMs to improve their outputs by using more test-time computation is a critical step towards building generally self-improving agents that can operate on open-ended natural language. In this paper, we study the scaling of inference-time computation in LLMs, with a focus on answering the question: if an LLM is allowed to use a fixed but non-trivial amount of inference-time compute, how much can it improve its performance on a challenging prompt? Answering this question has implications not only on the achievable performance of LLMs, but also on the future of LLM pretraining and how one s…

6 months, 2 weeks назад @ youtube.com
Privacy Backdoors: Stealing Data with Corrupted Pretrained Models (Paper Explained)
Privacy Backdoors: Stealing Data with Corrupted Pretrained Models (Paper Explained) Privacy Backdoors: Stealing Data with Corrupted Pretrained Models (Paper Explained)

#llm #privacy #finetuning Can you tamper with a base model in such a way that it will exactly remember its fine-tuning data? This paper presents a method of doing exactly that, and implements it in modern transformers. OUTLINE:

0:00 - Intro & Overview

10:50 -Core idea: single-use data traps

44:30 - Backdoors in transformer models

58:00 - Additional numerical tricks

1:00:35 - Experimental results & conclusion Paper: https://arxiv.org/abs/2404.00473

Code: https://github.com/ShanglunFengatETHZ/PrivacyBackdoor Abstract:

Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications. We show that this practice introduces a…

8 months, 3 weeks назад @ youtube.com
Scalable MatMul-free Language Modeling (Paper Explained)
Scalable MatMul-free Language Modeling (Paper Explained) Scalable MatMul-free Language Modeling (Paper Explained)

Matrix multiplications (MatMuls) are pervasive throughout modern machine learning architectures. However, they are also very resource intensive and require special accelerators (GPUs). This paper explores architectures that do away with MatMuls and use quantization and recurrence to keep performance up. OUTLINE:

0:00 - Intro

2:30 - MatMul is everywhere

5:55 - Ternary accumulation as a substitute for matrix multiplication

16:35 - Replacing attention layers with recurrent layers

32:40 - Replacing dense layers with ternary channel mixing

38:30 - Language modelling results & scaling laws

45:00 - Other experimental results

48:20 - Conclusion Paper: https://arxiv.org/abs/2406.02528

Code: https://…

9 months, 2 weeks назад @ youtube.com
3blue1brown 3blue1brown
последний пост 3 days, 11 hours назад
How to measure nearby galaxies
How to measure nearby galaxies How to measure nearby galaxies

From this video: https://youtu.be/hFMaT9oRbs4

3 days, 11 hours назад @ youtube.com
Measuring the distance to Venus without radar
Measuring the distance to Venus without radar Measuring the distance to Venus without radar

From this video with Terry Tao: https://youtu.be/hFMaT9oRbs4

2 weeks, 1 day назад @ youtube.com
Measuring the speed of light using Jupiter's moons
Measuring the speed of light using Jupiter's moons Measuring the speed of light using Jupiter's moons

From this video with Terry Tao: https://youtu.be/hFMaT9oRbs4

2 weeks, 2 days назад @ youtube.com
The tragic tale of Guillaume Le Gentil
The tragic tale of Guillaume Le Gentil The tragic tale of Guillaume Le Gentil

From this video: https://youtu.be/hFMaT9oRbs4 Artwork by Kurt Bruns

3 weeks, 6 days назад @ youtube.com
Zooming out by powers of 10
Zooming out by powers of 10 Zooming out by powers of 10

From this video: https://youtu.be/YdOXS_9_P4U

4 weeks назад @ youtube.com
There's more to those colliding blocks that compute pi
There's more to those colliding blocks that compute pi There's more to those colliding blocks that compute pi

Two colliding blocks compute pi, here we dig into the physics to explain why

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos. The original paper by Gregory Galperin:

https://www.maths.tcd.ie/~lebed/Galperin.%20Playing%20pool%20with%20pi.pdf Adam Brown's paper on the analogy with Grover's Algorithm:

https://arxiv.org/pdf/1912.02207 Here's a lovely interactive built by GitHub user prajwalsouza after watching this video: https://prajwalsouza.github.io/Experiments/Colliding-Blocks.html Matt Parker's Pi Day video:

https://youtu.be/vlUTlbZT4ig NY Times blog post about this proble…

1 month, 1 week назад @ youtube.com
When being beautifully wrong leads to discovery
When being beautifully wrong leads to discovery When being beautifully wrong leads to discovery

Full video: https://youtu.be/YdOXS_9_P4U

1 month, 3 weeks назад @ youtube.com
Why the ancient Greek's rejected heliocentrism
Why the ancient Greek's rejected heliocentrism Why the ancient Greek's rejected heliocentrism

From this video on the cosmic distance ladder: https://youtu.be/YdOXS_9_P4U

1 month, 3 weeks назад @ youtube.com
How to estimate the distance to the sun
How to estimate the distance to the sun How to estimate the distance to the sun

Full video: https://youtu.be/YdOXS_9_P4U

1 month, 3 weeks назад @ youtube.com
How Aristarchus deduced the distance to the moon
How Aristarchus deduced the distance to the moon How Aristarchus deduced the distance to the moon

Full video: https://youtu.be/YdOXS_9_P4U

1 month, 3 weeks назад @ youtube.com
The cosmic distance ladder with Terence Tao (part 2)
The cosmic distance ladder with Terence Tao (part 2) The cosmic distance ladder with Terence Tao (part 2)

How we know the distances to the planets, stars, and faraway galaxies.

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

FAQ with added details and corrections: https://terrytao.wordpress.com/2025/02/13/cosmic-distance-ladder-video-with-grant-sanderson-3blue1brown-commentary-and-corrections/ An equally valuable form of support is to simply share the videos. Terry and his collaborator Tanya have an Instagram about the cosmic distance ladder: https://www.instagram.com/cosmic_distance_ladder/ Artwork of Guillaume Le Gentil by Kurt Bruns

Artwork of Antonia Maury and Henrietta Leavitt by Talia Gershon: https://bit.ly/taliagershonart

Several of t…

1 month, 4 weeks назад @ youtube.com
How Earth's size was computed by Eratosthenes
How Earth's size was computed by Eratosthenes How Earth's size was computed by Eratosthenes

From this video: https://youtu.be/YdOXS_9_P4U

2 months, 1 week назад @ youtube.com
Terence Tao on how we measure the cosmos | Part 1
Terence Tao on how we measure the cosmos | Part 1 Terence Tao on how we measure the cosmos | Part 1

The Cosmic Distance Ladder, how we learned distances in the heavens.

Email list: https://3b1b.co/mail

Patreon supporters see early views of new videos: https://www.patreon.com/3blue1brown Artwork by Kurt Bruns

Thanks to Paul Dancstep for several animations, such as the powers of 10 zoom out and the simulations of shadows on the moon. Thanks to Tanya Klowden for helpful conversations about the history of the distance ladder. Argument for why if every shadow of a convex shape is a circle, it must be a sphere: https://mathoverflow.net/questions/39127/is-the-sphere-the-only-surface-with-circular-projections-or-can-we-deduce-a-sp Timestamps: 0:00 - About Terence Tao and the Distance Ladder

2:02 …

2 months, 2 weeks назад @ youtube.com
Measuring the earth with Terence Tao
Measuring the earth with Terence Tao Measuring the earth with Terence Tao

From this video: https://youtu.be/YdOXS_9_P4U

2 months, 2 weeks назад @ youtube.com
The topology of two-note chords
The topology of two-note chords The topology of two-note chords

Based on a construction in this video: https://youtu.be/IQqtsm-bBRU

2 months, 3 weeks назад @ youtube.com
Two Minute Papers Two Minute Papers
последний пост 3 days, 22 hours назад
NVIDIA’s Tech: Brutal 2,500,000 Part Simulation!
NVIDIA’s Tech: Brutal 2,500,000 Part Simulation! NVIDIA’s Tech: Brutal 2,500,000 Part Simulation!

❤️ Check out Vast.ai and run DeepSeek or any AI project: https://vast.ai/papers 📝 The papers are available here:

https://www.dgp.toronto.edu/projects/trading-spaces/

https://pcs-sim.github.io/pd/

https://visualcomputing.ist.ac.at/publications/2024/SDTF/

https://starryuniv.cn/files/sig24magnetic.pdf

https://github.com/Univstar/IoB-Ferrofluid-2D 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ah…

3 days, 22 hours назад @ youtube.com
OpenAI’s GPT 4.1 - Absolutely Amazing!
OpenAI’s GPT 4.1 - Absolutely Amazing! OpenAI’s GPT 4.1 - Absolutely Amazing!

❤️ Check out DeepInfra and run DeepSeek or many other AI projects: https://deepinfra.com/papers GPT 4.1 (once again, likely API only, not in the ChatGPT app):

https://openai.com/index/gpt-4-1/ 📝 The paper "Humanity's Last Exam" is available here:

https://agi.safe.ai/ Sources:

https://x.com/paulgauthier/status/1911927464844304591?s=46

https://x.com/ficlive/status/1911853409847906626

https://x.com/flavioad/status/1911848067470598608?s=46P

https://x.com/pandeyparul/status/1911958369734107439?s=46

https://x.com/demishassabis/status/1912197180187897985?s=46

https://x.com/emollick/status/1911966088339894669?s=46

https://x.com/aibattle_/status/1911845556885893488?s=46

https://x.com/augmentcode/sta…

1 week назад @ youtube.com
NVIDIA’s New Robot AI: Insanely Good!
NVIDIA’s New Robot AI: Insanely Good! NVIDIA’s New Robot AI: Insanely Good!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 The paper "GR00T N1: An Open Foundation Model for Generalist Humanoid Robots" is available here:

https://github.com/NVIDIA/Isaac-GR00T

https://arxiv.org/abs/2503.14734 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our ge…

1 week, 6 days назад @ youtube.com
Meta's LLAMA 4 AI In 4 Minutes!
Meta's LLAMA 4 AI In 4 Minutes! Meta's LLAMA 4 AI In 4 Minutes!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video Or just run it with Ollama when the model appears:

https://ollama.com/search?q=llama%204 📝 LLAMA 4:

https://ai.meta.com/blog/llama-4-multimodal-intelligence/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patre…

2 weeks, 2 days назад @ youtube.com
OpenAI’s ChatGPT - 8 New Incredible Features!
OpenAI’s ChatGPT - 8 New Incredible Features! OpenAI’s ChatGPT - 8 New Incredible Features!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video Our neural rendering paper: https://users.cg.tuwien.ac.at/zsolnai/gfx/gaussian-material-synthesis/ Sources:

https://x.com/MattJedrzejewsk/status/1906835418588590318

https://x.com/MattJedrzejewsk/status/1907145560056078581

https://x.com/egeberkina/status/1906089394219491782/photo/1

https://x.com/cocktailpeanut/status/1906983829035974890

https://x.com/cocktailpeanut/status/1906956398136811820

https://x.com/ImSamThompson/sta…

2 weeks, 3 days назад @ youtube.com
DeepMind’s New Gemini AI: Build Anything For Free! 🏅
DeepMind’s New Gemini AI: Build Anything For Free! 🏅 DeepMind’s New Gemini AI: Build Anything For Free! 🏅

❤️ Check out Vast.ai and run DeepSeek or any AI project: https://vast.ai/papers Gemini 2.5 Pro is available here:

https://gemini.google.com/app

https://aistudio.google.com

https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/ Simulated soccer/fusball: https://editor.p5js.org/trudyp/sketches/KjxZ3d8lX

Sources:

https://x.com/trudypainter/status/1904588040112120062

https://x.com/pandeyparul/status/1904731207541157922

https://x.com/alexanderchen/status/1904748250193654122

https://x.com/renderfiction/status/1905998185962643767

https://x.com/addyosmani/status/1906247323430408550

https://x.com/mbaeuml/status/1906597742522105875

https://x.com/renderfiction/status/…

3 weeks, 1 day назад @ youtube.com
NVIDIA's New AI Makes Cars Fly...Sort Of!
NVIDIA's New AI Makes Cars Fly...Sort Of! NVIDIA's New AI Makes Cars Fly...Sort Of!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers 📝 The paper "GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control" is available here:

https://research.nvidia.com/labs/toronto-ai/GEN3C/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael T…

3 weeks, 3 days назад @ youtube.com
OpenAI’s New Image Generator: An AI Revolution!
OpenAI’s New Image Generator: An AI Revolution! OpenAI’s New Image Generator: An AI Revolution!

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers 4o Image Generation: https://openai.com/index/introducing-4o-image-generation/

Apple terminal: https://www.apple.com/mac/lumon-terminal-pro/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, R…

3 weeks, 5 days назад @ youtube.com
DeepSeek V3 - The King is Back…For Free!
DeepSeek V3 - The King is Back…For Free! DeepSeek V3 - The King is Back…For Free!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers Guide for using DeepSeek (R1) on Lambda (can be applied to DeepSeek V3 too, see links below):

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 DeepSeek V3 (0324) is available here:

https://www.deepseek.com/

Try it online (note: they see your data, I prefer private, see below): https://chat.deepseek.com

Paper: https://arxiv.org/pdf/2412.19437 How to run locally: https://github.com/deepseek-ai/DeepSeek-V3?tab=readme-ov-file#6-how-to-run-locally

Ollama is probably the simplest way to run it - support …

4 weeks назад @ youtube.com
Finally, DeepMind Made An IQ Test For AIs! 🤖
Finally, DeepMind Made An IQ Test For AIs! 🤖 Finally, DeepMind Made An IQ Test For AIs! 🤖

❤️ Try Macro for free and supercharge your learning: https://macro.com/papers 📝 The papers are available here:

https://physics-iq.github.io/

https://physbench.github.io/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Taras Bobrovytsky, Thomas Krcmar, T…

1 month, 1 week назад @ youtube.com
DeepMind’s New AIs: The Future is Here!
DeepMind’s New AIs: The Future is Here! DeepMind’s New AIs: The Future is Here!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 The Gemma 3 paper and the rest are available here:

https://blog.google/technology/developers/gemma-3/

https://developers.googleblog.com/en/experiment-with-gemini-20-flash-native-image-generation/

https://deepmind.google/technologies/gemini-robotics/

https://aistudio.google.com/ Sources:

https://x.com/thepushkarp/status/1899874826669744425/photo/1

https://x.com/Angaisb_/status/1899852603107721388

https://x.com/alexa…

1 month, 1 week назад @ youtube.com
NVIDIA’s New AI Grows Stuff Out Of Nothing!
NVIDIA’s New AI Grows Stuff Out Of Nothing! NVIDIA’s New AI Grows Stuff Out Of Nothing!

❤️ Try Macro for free and supercharge your learning: https://macro.com/papers 📝 The paper "Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale" is available here:

https://research.nvidia.com/labs/dir/meshtron/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard…

1 month, 2 weeks назад @ youtube.com
Microsoft's New Game AI: How Is This Good?
Microsoft's New Game AI: How Is This Good? Microsoft's New Game AI: How Is This Good?

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 The paper "World and Human Action Models towards gameplay ideation" is available here:

https://www.microsoft.com/en-us/research/blog/introducing-muse-our-first-generative-ai-model-designed-for-gameplay-ideation/

https://www.nature.com/articles/s41586-025-08600-3 Sources (snake game and more):

https://x.com/emollick/status/1894480971648377198

https://x.com/emollick/status/1894441728175677837

https://x.com/levelsio/s…

1 month, 3 weeks назад @ youtube.com
ChatGPT Opens A Research Lab…For $2!
ChatGPT Opens A Research Lab…For $2! ChatGPT Opens A Research Lab…For $2!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 The paper "Agent Laboratory: Using LLM Agents as Research Assistants" and "Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers" are available here:

https://agentlaboratory.github.io/

https://arxiv.org/abs/2409.04109 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable …

1 month, 3 weeks назад @ youtube.com
NVIDIA’s New AI: Text To Video Supercharged!
NVIDIA’s New AI: Text To Video Supercharged! NVIDIA’s New AI: Text To Video Supercharged!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers 📝 Magic 1-For-1:

https://magic-141.github.io/Magic-141/

https://github.com/Open-Magic-Video/Magic-1-For-1

https://arxiv.org/abs/2502.07701v1 📝 Phantom: https://phantom-video.github.io/Phantom/ 📝 Relighting paper: https://bujiazi.github.io/light-a-video.github.io/ 📝 Stepfun:

https://github.com/stepfun-ai/Step-Video-T2V

https://yuewen.cn/videos

https://arxiv.org/abs/2502.10248

https://huggingface.co/stepfun-ai/stepvideo-t2v 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.…

2 months назад @ youtube.com
DataFest Video DataFest Video
последний пост 7 months, 3 weeks назад
Data Fest Online 2020 AI Hardware Track Premiere
Data Fest Online 2020 AI Hardware Track Premiere Data Fest Online 2020 AI Hardware Track Premiere

DataFest Online 2020

AI Hardware track https://ods.ai/tracks/ai-hardware-df2020 Register and get access to the tracks: https://ods.ai/events/datafest2020

Join the community: https://ods.ai/

7 months, 3 weeks назад @ youtube.com
Interview with Juergen Schmidhuber at Data Christmas 2020
Interview with Juergen Schmidhuber at Data Christmas 2020 Interview with Juergen Schmidhuber at Data Christmas 2020

02:00-05:38 What do you think were the most outstanding underestimated news and achievements in AI field in 2020?

05:41-11:28 What do you think about trends in ML like transformers trying to replace LSTMs in NLP?

11:29-16:06 Are you working on any new types of models right now?

16:07-20:41 What is your opinion on the most underestimated ML subfield like Reinforcement Learning?

20:42-22:17 Your best recommendation for our community is to look into AI in the real physical world, right?

22:18-33:10 Do you think it is possible to achieve great results in creative AI, particularly in subjective beauty?

33:17-35:50 What prevents chat bots from reaching more intelligent levels?

36:03-39:39 What is…

7 months, 3 weeks назад @ youtube.com
Mikita Shchutski | A small BERT towards Large Medical Models
Mikita Shchutski | A small BERT towards Large Medical Models Mikita Shchutski | A small BERT towards Large Medical Models

Mikita Shchutski | Lead Machine Learning Engineer, Quantori Training large medical models using electronic health records in order to create a highly informative medical embedding space

7 months, 3 weeks назад @ youtube.com
Семинары JetBrains Research Семинары JetBrains Research
последний пост None
Яндекс. Компьютерные науки Яндекс. Компьютерные науки
последний пост 2 weeks, 2 days назад
Что нового умеют нейросети? Визуально-текстовая мультимодальность
Что нового умеют нейросети? Визуально-текстовая мультимодальность Что нового умеют нейросети? Визуально-текстовая мультимодальность

ёЭто фрагмент из доклада Екатерины Глазковой, тимлида команды элайнмента VLM службы компьютерного зрения. На Practical ML Conf 2024 Екатерина рассказала о продуктовых сценариях использования VLM — нейросетей, которые работают одновременно с изображением и текстом. В докладе показаны методы элайнмента под продуктовые требования на примере трёх реальных задач: мультимодального поиска, описания изображений и фантазийно-генеративных сценариев. Смотрите выступление целиком на нашем канале.

2 weeks, 2 days назад @ youtube.com
Из каких деталей состоят Нейро и другие VLM Яндекса
Из каких деталей состоят Нейро и другие VLM Яндекса Из каких деталей состоят Нейро и другие VLM Яндекса

Это фрагмент из доклада Екатерины Глазковой, тимлида команды элайнмента VLM службы компьютерного зрения. На Practical ML Conf 2024 Екатерина рассказала о продуктовых сценариях использования VLM — нейросетей, которые работают одновременно с изображением и текстом. В докладе показаны методы элайнмента под продуктовые требования на примере трёх реальных задач: мультимодального поиска, описания изображений и фантазийно-генеративных сценариев. Смотрите выступление целиком на нашем канале.

3 weeks назад @ youtube.com
Как работает Нейро и что такое мультимодальность
Как работает Нейро и что такое мультимодальность Как работает Нейро и что такое мультимодальность

Это фрагмент из доклада Екатерины Глазковой, тимлида команды элайнмента VLM службы компьютерного зрения. На Practical ML Conf 2024 Екатерина рассказала о продуктовых сценариях использования VLM — нейросетей, которые работают одновременно с изображением и текстом. В докладе показаны методы элайнмента под продуктовые требования на примере трёх реальных задач: мультимодального поиска, описания изображений и фантазийно-генеративных сценариев. Смотрите выступление целиком на нашем канале.

3 weeks, 5 days назад @ youtube.com
Как сделать озвучку книг из простого TTS / Константин Кузнецов
Как сделать озвучку книг из простого TTS / Константин Кузнецов Как сделать озвучку книг из простого TTS / Константин Кузнецов

Это Константин Кузнецов, руководитель группы интонации в Поиске и рекламных технологиях. Представьте, что вашему пользователю нужно слушать синтез голоса ближайшие полчаса. В таком случае нужно позаботиться о том, чтобы человек попросту не уснул. Константин рассказал, как сделать синтез не роботизированным и не пресным, чтобы удержать слушателя. Узнать больше о мероприятиях для разработчиков можно тут: https://events.yandex.ru Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/yandexforml

4 weeks назад @ youtube.com
Поисковый аукцион Яндекс Маркета / Евгений Паринов
Поисковый аукцион Яндекс Маркета / Евгений Паринов Поисковый аукцион Яндекс Маркета / Евгений Паринов

Это Евгений Паринов, руководитель группы ранжирования поисковой выдачи в Маркете. Поисковый аукцион — это механизм, который позволяет продавцам на заплатить процент за большее количество продаж. В докладе Евгений рассказал, как в Маркете учитывают рекламную выручку и внедряют учёт юнит-экономики. Узнать больше о мероприятиях для разработчиков можно тут: https://events.yandex.ru Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/yandexforml

4 weeks, 1 day назад @ youtube.com
Под капотом Нейро: как работает мультимодальность
Под капотом Нейро: как работает мультимодальность Под капотом Нейро: как работает мультимодальность

Это фрагмент из доклада Екатерины Глазковой, тимлида команды элайнмента VLM службы компьютерного зрения. На Practical ML Conf 2024 Екатерина рассказала о продуктовых сценариях использования VLM — нейросетей, которые работают одновременно с изображением и текстом. В докладе показаны методы элайнмента под продуктовые требования на примере трёх реальных задач: мультимодального поиска, описания изображений и фантазийно-генеративных сценариев. Смотрите выступление целиком на нашем канале.

1 month назад @ youtube.com
Байки про обучение VLM / Антон Клочков
Байки про обучение VLM / Антон Клочков Байки про обучение VLM / Антон Клочков

Это Антон Клочков, руководитель подгруппы распознавания текста в VLM в Поиске и рекламных технологиях. Антон продолжает серию рассказов про развитие картиночной мультимодальности в Яндексе. В докладе он показал, какие эксперименты над VLM ребята успели поставить, какие уроки из них извлекли и какое новогоднее чудо с ними приключилось! Узнать больше о мероприятиях для разработчиков можно тут: https://events.yandex.ru Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/yandexforml

1 month назад @ youtube.com
Мультиагентные подходы для работы с LLM на базе сервисов Yandex Cloud / Мастер-класс
Мультиагентные подходы для работы с LLM на базе сервисов Yandex Cloud / Мастер-класс Мультиагентные подходы для работы с LLM на базе сервисов Yandex Cloud / Мастер-класс

Это мастер-класс на Practical ML Conf. Его провели ребята из Yandex Cloud: Дмитрий Рыбалко, продуктовый архитектор ML-сервисов, и Дмитрий Сошников, консультант, доцент МАИ, НИУ ВШЭ и технический руководитель AI Lab Школы дизайна в ВШЭ. В сфере LLM набирают популярность агентские подходы, когда несколько моделей взаимодействуют друг с другом, чтобы достичь одной цели. В рамках мастер-класса мы показали, как применить этот подход для построения вопросо-ответной системы в транспортной компании. При этом сами методы могут пригодиться в любой индустрии. Для решения задачи поэкспериментировали с различными структурами данных для RAG, в том числе с текстовыми и графовыми. За основу взяли языковую …

1 month назад @ youtube.com
Как достать соседа: бенчмаркаем ANN-алгоритмы / Мастер-класс
Как достать соседа: бенчмаркаем ANN-алгоритмы / Мастер-класс Как достать соседа: бенчмаркаем ANN-алгоритмы / Мастер-класс

Это мастер-класс на Practical ML Conf. Его провёл Михаил Каменщиков, руководитель юнита «Рекомендации» в Авито. Михаил объяснил, зачем нужны алгоритмы приближённого поиска соседей. А ещё рассмотрел реализацию популярных алгоритмов IVF, HNSW и показал, как на своих данных делать бенчмарк разных подходов с использованием библиотеки ann-benchmarks. Подписывайтесь на телеграм-канал Яндекса для ML-специалистов: https://t.me/yandexforml

1 month назад @ youtube.com
Как зафайнтюнить вашу любимую диффузионную модель / Мастер-класс
Как зафайнтюнить вашу любимую диффузионную модель / Мастер-класс Как зафайнтюнить вашу любимую диффузионную модель / Мастер-класс

Это мастер-класс на Practical ML Conf. Его провели: Лев Новицкий, ведущий специалист по исследованию данных Kandinsky Research в Sber AI, и Вера Соболева, младший научный сотрудник Института искусственного интеллекта AIRI. Современные диффузионные модели позволяют генерировать качественные и разнообразные изображения по текстовому описанию. Но что делать, если мы хотим создать не случайную картинку, а конкретный объект, например вазу, которую после продадим на маркетплейсе? Это задача персонализированной генерации: нужно в предобученную модель внедрить знания о конкретном объекте по нескольким фотографиям, чтобы показывать его в новых сценах. На мастер-классе разобрали основы диффузии и баз…

1 month назад @ youtube.com
Как нейросети включают воображение
Как нейросети включают воображение Как нейросети включают воображение

Это фрагмент из доклада Екатерины Глазковой, тимлида команды элайнмента VLM службы компьютерного зрения. На Practical ML Conf 2024 Екатерина рассказала о продуктовых сценариях использования VLM — нейросетей, которые работают одновременно с изображением и текстом. В докладе показаны методы элайнмента под продуктовые требования на примере трёх реальных задач: мультимодального поиска, описания изображений и фантазийно-генеративных сценариев. Смотрите выступление целиком на нашем канале.

1 month назад @ youtube.com
Как выбрать имя для собаки? #llm #yandex #собака
Как выбрать имя для собаки? #llm #yandex #собака Как выбрать имя для собаки? #llm #yandex #собака

Это фрагмент из доклада Екатерины Глазковой из Яндекс Поиска, с выступлении на Practical ML Conf 2024.

1 month, 1 week назад @ youtube.com
ML Party 18.03.2025
ML Party 18.03.2025 ML Party 18.03.2025

Добро пожаловать на вечерний митап для ML-инженеров от Яндекса. В этот раз поговорим про про развитие картиночной мультимодальности, книжном синтезе и не только. Вступайте в канал Yandex for ML, чтобы быть в курсе всех событий и мероприятий Яндекса https://t.me/+UohL-bYM25YyZDNi

Оставляйте свои вопросы спикерам в чате с тегом #вопрос: https://t.me/+OsKnLNG-7DE1ZTFi

1 month, 2 weeks назад @ youtube.com
Сколько калорий в мопсе? #llm #мопс #юмор
Сколько калорий в мопсе? #llm #мопс  #юмор Сколько калорий в мопсе? #llm #мопс #юмор

Это фрагмент из доклада Екатерины Глазковой из Яндекс Поиска, с выступлении на Practical ML Conf 2024.

1 month, 2 weeks назад @ youtube.com
Обучение трансформеров для дискриминативных задач | Эдуард Мартынов
Обучение трансформеров для дискриминативных задач | Эдуард Мартынов Обучение трансформеров для дискриминативных задач | Эдуард Мартынов

Это Эдуард Мартынов, выпускник факультета вычислительной математики и кибернетики МГУ. В докладе на Data Dojo в Санкт-Петербурге Никита рассказал, как применять трансформеры в соревнованиях на платформе Kaggle: обсудил технические трудности, кастомные архитектуры и показал, что нужно для победы в соревнованиях. Подписывайтесь на телеграм-канал Яндекса для ML-сообщества: https://t.me/yandexforml

1 month, 3 weeks назад @ youtube.com
ML Trainings ML Trainings
последний пост 3 weeks, 2 days назад
Анастасия Функнер, Ольга Павлова, Анна Ефимова | Как Ozon Банк создает ML-платформу будущего
Анастасия Функнер, Ольга Павлова, Анна Ефимова | Как Ozon Банк создает ML-платформу будущего Анастасия Функнер, Ольга Павлова, Анна Ефимова | Как Ozon Банк создает ML-платформу будущего

Спикеры: Анастасия Функнер, Ольга Павлова, Анна Ефимова, Ozon Банк

Тема доклада: MLOps, Data Science и Golang: Как Ozon Банк создает ML-платформу будущего

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Наши соц.сети:Telegram: https://t.me/datafestВконтакте: https://vk.com/datafestКанал с вакансиями в telegram: https://t.me/odsjobsКанал с апдейтами по курсам: https://t.me/odscoursesКак попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

3 weeks, 2 days назад @ youtube.com
Алена Феногенова | Новые бенчмарки 2024-2025 для русского языка: вызовы и перспективы
Алена Феногенова | Новые бенчмарки 2024-2025 для русского языка: вызовы и перспективы Алена Феногенова | Новые бенчмарки 2024-2025 для русского языка: вызовы и перспективы

Спикер: Алена Феногенова, AGI NLP TeamLead, Сбер

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

3 weeks, 2 days назад @ youtube.com
Мария Бегичева | Цифровой помощник компании в управлении операционными рисками
Мария Бегичева | Цифровой помощник компании в управлении операционными рисками Мария Бегичева | Цифровой помощник компании в управлении операционными рисками

Спикер: Мария Бегичева, Senior DS, Сбер

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

3 weeks, 2 days назад @ youtube.com
Полина Федотова | Foundation Models in Robotics
Полина Федотова |  Foundation Models in Robotics Полина Федотова | Foundation Models in Robotics

Спикер: Полина Федотова, Главный инженер-разработчик, лид исследовательской команды, «Сбер Центр робототехники»

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

3 weeks, 2 days назад @ youtube.com
Нонна Шахова, Эмели Драль,Ирина Голощапова, Анастасия Никулина | Круглый стол: Women in Data Science
Нонна Шахова, Эмели Драль,Ирина Голощапова, Анастасия Никулина | Круглый стол: Women in Data Science Нонна Шахова, Эмели Драль,Ирина Голощапова, Анастасия Никулина | Круглый стол: Women in Data Science

Спикеры: Нонна Шахова, Эмели Драль, Ирина Голощапова, Анастасия Никулина

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025

Как женщины меняют data science:

Старт в data science:

Работа и карьера:

Профессиональное развитие:

Будущее в data science: Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

3 weeks, 2 days назад @ youtube.com
Анна Текучева | "Знаешь что это за слово? а оно есть"
Анна Текучева | "Знаешь что это за слово? а оно есть" Анна Текучева | "Знаешь что это за слово? а оно есть"

Спикер: Анна Текучева, Data Scientist HML Wildberries

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

3 weeks, 2 days назад @ youtube.com
Юлия Раковская | Вступительное слово WiDS Meetup
Юлия Раковская | Вступительное слово WiDS Meetup Юлия Раковская | Вступительное слово WiDS Meetup

Спикер: Юлия Раковская, Руководитель центра исследований и разработки Сбера

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

3 weeks, 2 days назад @ youtube.com
Митап #1 | Data Fusion Contest 2025
Митап #1 | Data Fusion Contest 2025 Митап #1 | Data Fusion Contest 2025

Ежегодная серия соревнований по машинному обучению Data Fusion Contest стартовала. Страница с задачами: https://ods.ai/tracks/data-fusion-2025-competitions Во вторник 25 февраля (19:00 - 20:00 по мск) мы провели первый митап по Data Fusion Contest 2025 в ODS спейсе Spatial.Chat. Поговорили о задачах, ответим на ваши вопросы участников. В программе: Анатолий Глушенко Разбор задачи 1 “LabelCraft”

Алексей Натёкин Обзор задачи 2 “4Cast” и задачи 3 “Distribution” Команда Data Fusion Contest 2025 Q&A и обсуждение вопросов с участниками

1 month, 3 weeks назад @ youtube.com
Анастасия Вепрева | Мастер-класс: Генерация лекарственных молекул
Анастасия Вепрева | Мастер-класс: Генерация лекарственных молекул Анастасия Вепрева | Мастер-класс: Генерация лекарственных молекул

Спикер: Анастасия Вепрева - разработчик моделей в области прогнозирования физико-химических свойств и биологических активностей малых молекул, сотрудница Центра ИИ в химии ИТМО

Мероприятие 21.02.2025: https://ods.ai/events/ai_chemistrymk1 Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 3 weeks назад @ youtube.com
Антон Воронов | Итоги года в DS/ML карьерных вопросах
Антон Воронов | Итоги года в DS/ML карьерных вопросах Антон Воронов | Итоги года в DS/ML карьерных вопросах

Спикер: Антон Воронов, Газпром ИД, Заместитель директора департамента, Руководитель платформы Поиска Data Ёлка 2024 в гостях у VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 4 weeks назад @ youtube.com
Пётр Ермаков | Итоги года в PyData stack
Пётр Ермаков | Итоги года в PyData stack Пётр Ермаков | Итоги года в PyData stack

Спикер: Пётр Ермаков, ML Brand Director, Яндекс Data Ёлка 2024 в гостях у VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 4 weeks назад @ youtube.com
Валентин Малых | Итоги года в NLP
Валентин Малых | Итоги года в NLP Валентин Малых | Итоги года в NLP

Спикер: Валентин Малых, руководитель группы, MTS AI Data Ёлка 2024 в гостях у VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 4 weeks назад @ youtube.com
Николай Анохин | Итоги года в RecSys
Николай Анохин | Итоги года в RecSys Николай Анохин | Итоги года в RecSys

Спикер: Николай Анохин, ведущий специалист по ML, AI VK Data Ёлка 2024 в гостях у VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 4 weeks назад @ youtube.com
Ирина Голощапова | Итоги года в Reliable ML, часть 2
Ирина Голощапова | Итоги года в Reliable ML, часть 2 Ирина Голощапова | Итоги года в Reliable ML, часть 2

Спикер: Ирина Голощапова, CDO Raiffeisenbank Operations Data Ёлка 2024 в гостях у VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 4 weeks назад @ youtube.com
Дмитрий Колодезев | Итоги года в Reliable ML, часть 1
Дмитрий Колодезев | Итоги года в Reliable ML, часть 1 Дмитрий Колодезев | Итоги года в Reliable ML, часть 1

Спикер: Дмитрий Колодезев, директор, Промсофт Data Ёлка 2024 в гостях у VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 4 weeks назад @ youtube.com
Primer Primer
последний пост 2 months, 3 weeks назад
Simulating the Evolution of Aging
Simulating the Evolution of Aging Simulating the Evolution of Aging

Patreon: https://www.patreon.com/primerlearning Ageless book: https://www.amazon.com/Ageless-Science-Getting-Older-Without/dp/0525566317/ Papers and other further reading:

Diversity of aging across the tree of life: https://pmc.ncbi.nlm.nih.gov/articles/PMC4157354/

Antagonistic pleiotropy and p53: https://pmc.ncbi.nlm.nih.gov/articles/PMC2771578/

An unsolved problem of biology (Medawar): https://ia903408.us.archive.org/31/items/medawar-1952-unsolved-problem/Medawar1952-Unsolved-Problem.pdf

Evolution of the mutation rate: https://pmc.ncbi.nlm.nih.gov/articles/PMC2910838/

Our World in Data Life Expectancy explainer: https://ourworldindata.org/life-expectancy-how-is-it-calculated-and-how-shoul…

2 months, 3 weeks назад @ youtube.com
Simulating the Evolution of Rock, Paper, Scissors
Simulating the Evolution of Rock, Paper, Scissors Simulating the Evolution of Rock, Paper, Scissors

Twitch: https://www.twitch.tv/justin_helps

Discord: https://discord.gg/NbruaNW

Store: https://store.dftba.com/collections/primer

Patreon: https://www.patreon.com/primerlearning Source and further reading on the common side-blotched lizard:

Sinervo, B.; C.M. Lively (1996). "The rock–paper–scissors game and the evolution of alternative male strategies". Nature. 380 (6571): 240–243.

https://en.wikipedia.org/wiki/Common_side-blotched_lizard Made with Godot

Github: https://github.com/Primer-Learning/PrimerTools Made possible by support from these wonderful Patrons:

abledbody

Alba Caparros-Roissard

Andrew Lang

Anthony Eufemio

Brian Cloutier

Captain Chinchilla

Christoph Grabo (@asaaki)

Christy Ser…

9 months, 2 weeks назад @ youtube.com
Evolving Rock Paper Scissors
Evolving Rock Paper Scissors Evolving Rock Paper Scissors 9 months, 2 weeks назад @ youtube.com
🎧 Podcasts
Lex Fridman AI Podcast Lex Fridman AI Podcast
последний пост 6 days, 12 hours назад
#465 – Robert Rodriguez: Sin City, Desperado, El Mariachi, Alita, and Filmmaking
#465 – Robert Rodriguez: Sin City, Desperado, El Mariachi, Alita, and Filmmaking #465 – Robert Rodriguez: Sin City, Desperado, El Mariachi, Alita, and Filmmaking

Robert Rodriguez is a legendary filmmaker and creator of Sin City, El Mariachi, Desperado, Spy Kids, Machete, From Dusk Till Dawn, Alita: Battle Angel, The Faculty, and his newest venture Brass Knuckle Films.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep465-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc. Transcript:

https://lexfridman.com/robert-rodriguez-transcript CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: http…

6 days, 12 hours назад @ lexfridman.com
#464 – Dave Smith: Israel, Ukraine, Epstein, Mossad, Conspiracies & Antisemitism
#464 – Dave Smith: Israel, Ukraine, Epstein, Mossad, Conspiracies & Antisemitism #464 – Dave Smith: Israel, Ukraine, Epstein, Mossad, Conspiracies & Antisemitism

Dave Smith is a comedian, libertarian, political commentator, and the host of Part of the Problem podcast.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep464-sc

See below for timestamps, and to give feedback, submit questions, contact Lex, etc. CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: https://lexfridman.com/contact EPISODE LINKS:

Dave's X: https://x.com/ComicDaveSmith

Dave's YouTube: https://youtube.com/DSmithcomic

Dave's Instagram: https://instagram.com/theprobl…

2 weeks, 1 day назад @ lexfridman.com
#463 – Douglas Murray: Putin, Zelenskyy, Trump, Israel, Netanyahu, Hamas & Gaza
#463 – Douglas Murray: Putin, Zelenskyy, Trump, Israel, Netanyahu, Hamas & Gaza #463 – Douglas Murray: Putin, Zelenskyy, Trump, Israel, Netanyahu, Hamas & Gaza

Douglas Murray is the author of On Democracies and Death Cults, The War on The West, and The Madness of Crowds.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep463-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://callofduty.com/warzoneOracle: Cloud infrastructure.

Go to https://oracle.com/lexLMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lexAG1: All-in-one daily nutrition drink.

3 weeks, 3 days назад @ lexfridman.com
#462 – Ezra Klein and Derek Thompson: Politics, Trump, AOC, Elon & DOGE
#462 – Ezra Klein and Derek Thompson: Politics, Trump, AOC, Elon & DOGE #462 – Ezra Klein and Derek Thompson: Politics, Trump, AOC, Elon & DOGE

Ezra Klein is one of the most influential voices representing the left-wing of American politics.

He is a columnist for the NY Times and host of The Ezra Klein Show.

Derek Thompson is a writer at The Atlantic and host of the Plain English podcast.

Together they have written a new book titled Abundance that lays out a set of ideas for the future of the Democratic party.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep462-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

4 weeks назад @ lexfridman.com
#461 – ThePrimeagen: Programming, AI, ADHD, Productivity, Addiction, and God
#461 – ThePrimeagen: Programming, AI, ADHD, Productivity, Addiction, and God #461 – ThePrimeagen: Programming, AI, ADHD, Productivity, Addiction, and God

ThePrimeagen (aka Michael Paulson) is a programmer who has educated, entertained, and inspired millions of people to build software and have fun doing it.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep461-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lexNetSuite: Business management software.

Go to http://netsuite.com/lexBetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lexAG1: All-in-one daily nutrition drinks.

1 month назад @ lexfridman.com
#460 – Narendra Modi: Prime Minister of India – Power, Democracy, War & Peace
#460 – Narendra Modi: Prime Minister of India – Power, Democracy, War & Peace #460 – Narendra Modi: Prime Minister of India – Power, Democracy, War & Peace

Narendra Modi is the Prime Minister of India.

On YouTube this episode is available in English, Hindi, Russian (and soon other languages).

Captions and voice-over audio tracks are provided (for the main episode video on YouTube) in English, Hindi, Russian, and the original mixed-language version, with subtitles available in your preferred language.

To listen to the original mixed-language version, please select the Hindi (Latin) audio track.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep460-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

1 month, 1 week назад @ lexfridman.com
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters #459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Dylan Patel is the founder of SemiAnalysis, a research & analysis company specializing in semiconductors, GPUs, CPUs, and AI hardware.

Nathan Lambert is a research scientist at the Allen Institute for AI (Ai2) and the author of a blog on AI called Interconnects.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep459-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://invideo.io/i/lexpodGitHub: Developer platform and AI code editor.

(4:31:34) – AI agents(4:40:16) – Programming and AI(4:47:43) – Open source(4:56:55) – Stargate(5:04:24) – Future of AIPODCAST LINKS:– Podcast Website: https://lexfridman.com/podcast–…

2 months, 2 weeks назад @ lexfridman.com
#458 – Marc Andreessen: Trump, Power, Tech, AI, Immigration & Future of America
#458 – Marc Andreessen: Trump, Power, Tech, AI, Immigration & Future of America #458 – Marc Andreessen: Trump, Power, Tech, AI, Immigration & Future of America

Marc Andreessen is an entrepreneur, investor, co-creator of Mosaic, co-founder of Netscape, and co-founder of the venture capital firm Andreessen Horowitz.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep458-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://encord.com/lexGitHub: Developer platform and AI code editor.

Go to https://gh.io/copilotNotion: Note-taking and team collaboration.

Go to https://shopify.com/lexLMNT: Zero-sugar electrolyte drink mix.

2 months, 3 weeks назад @ lexfridman.com
#457 – Jennifer Burns: Milton Friedman, Ayn Rand, Economics, Capitalism, Freedom
#457 – Jennifer Burns: Milton Friedman, Ayn Rand, Economics, Capitalism, Freedom #457 – Jennifer Burns: Milton Friedman, Ayn Rand, Economics, Capitalism, Freedom

Jennifer Burns is a historian of ideas, focusing on the evolution of economic, political, and social ideas in the United States in the 20th century.

She wrote two biographies, one on Milton Friedman, and the other on Ayn Rand.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep457-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://brain.fm/lexGitHub: Developer platform and AI code editor.

Go to https://gh.io/copilotLMNT: Zero-sugar electrolyte drink mix.

3 months назад @ lexfridman.com
#456 – Volodymyr Zelenskyy: Ukraine, War, Peace, Putin, Trump, NATO, and Freedom
#456 – Volodymyr Zelenskyy: Ukraine, War, Peace, Putin, Trump, NATO, and Freedom #456 – Volodymyr Zelenskyy: Ukraine, War, Peace, Putin, Trump, NATO, and Freedom

Volodymyr Zelenskyy is the President of Ukraine.

On YouTube this episode is available in English, Ukrainian, and Russian.

Captions and voice-over audio tracks are provided in English, Ukrainian, Russian, and the original mixed-language version, with subtitles available in your preferred language.

To listen to the original mixed language version, please select the English (UK) audio track audio track.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep456-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

3 months, 2 weeks назад @ lexfridman.com
#455 – Adam Frank: Alien Civilizations and the Search for Extraterrestrial Life
#455 – Adam Frank: Alien Civilizations and the Search for Extraterrestrial Life #455 – Adam Frank: Alien Civilizations and the Search for Extraterrestrial Life

Adam Frank is an astrophysicist studying star systems and the search for extraterrestrial life and alien civilizations.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep455-scSee below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://encord.com/lexEight Sleep: Temp-controlled smart mattress cover.

Go to https://betterhelp.com/lexNotion: Note-taking and team collaboration.

Go to https://drinkLMNT.com/lexAG1: All-in-one daily nutrition drinks.

4 months назад @ lexfridman.com
#454 – Saagar Enjeti: Trump, MAGA, DOGE, Obama, FDR, JFK, History & Politics
#454 – Saagar Enjeti: Trump, MAGA, DOGE, Obama, FDR, JFK, History & Politics #454 – Saagar Enjeti: Trump, MAGA, DOGE, Obama, FDR, JFK, History & Politics

Saagar Enjeti is a political journalist & commentator, co-host of Breaking Points with Krystal and Saagar and The Realignment Podcast.

He is exceptionally well-read, and the books he recommends are always fascinating and eye-opening.

You can check out all the books he mentions in this episode here: https://lexfridman.com/saagar-booksThank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep454-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://eightsleep.com/lexAG1: All-in-one daily nutrition drinks.

Go to https://drinkLMNT.com/lexBetterHelp: Online therapy and counseling.

4 months, 2 weeks назад @ lexfridman.com
#453 – Javier Milei: President of Argentina – Freedom, Economics, and Corruption
#453 – Javier Milei: President of Argentina – Freedom, Economics, and Corruption #453 – Javier Milei: President of Argentina – Freedom, Economics, and Corruption

Javier Milei is the President of Argentina.

This episode is available in both English and Spanish.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep453-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to http://netsuite.com/lexBetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lexAG1: All-in-one daily nutrition drinks.

5 months назад @ lexfridman.com
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity #452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

Dario Amodei is the CEO of Anthropic, the company that created Claude.

Amanda Askell is an AI researcher working on Claude’s character and personality.

Chris Olah is an AI researcher working on mechanistic interpretability.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep452-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

(3:49:02) – Character training(3:50:01) – Nature of truth(3:54:38) – Optimal rate of failure(4:01:49) – AI consciousness(4:16:20) – AGI(4:24:58) – Chris Olah – Mechanistic Interpretability(4:29:49) – Features, Circuits, Universality(4:47:23) – Superposition(4:58:22) – Monosemanticity(5:05…

5 months, 1 week назад @ lexfridman.com
#451 – Rick Spence: CIA, KGB, Illuminati, Secret Societies, Cults & Conspiracies
#451 – Rick Spence: CIA, KGB, Illuminati, Secret Societies, Cults & Conspiracies #451 – Rick Spence: CIA, KGB, Illuminati, Secret Societies, Cults & Conspiracies

Rick Spence is a historian specializing in the history of intelligence agencies, espionage, secret societies, conspiracies, the occult, and military history.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep451-scSee below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to http://netsuite.com/lexBetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lexMasterClass: Online classes from world-class experts.

Go to https://masterclass.com/lexpodShopify: Sell stuff online.

5 months, 3 weeks назад @ lexfridman.com
Microsoft Research Podcast Microsoft Research Podcast
последний пост 6 days, 7 hours назад
The AI Revolution in Medicine, Revisited: Empowering patients and healthcare consumers in the age of generative AI
The AI Revolution in Medicine, Revisited: Empowering patients and healthcare consumers in the age of generative AI The AI Revolution in Medicine, Revisited: Empowering patients and healthcare consumers in the age of generative AI

[LAUGHS]DEBRONKART: Ah, well, that’s … that’s even weirder.

I’m, as I said at the beginning, I’m glad to be alive and I’m really, really, really grateful to be given a chance to share my thoughts with your audience because I really like super smart nerds.

And it’s really to me like where I’m seeing kind of the first set of really kind of promising AI applications.

And so, to me, that’s really kind of where I see the most interesting opportunities for technology and for digital health.

Just really, really appreciate it.

6 days, 7 hours назад @ microsoft.com
The AI Revolution in Medicine, Revisited: Real-world healthcare AI development and deployment—at scale
The AI Revolution in Medicine, Revisited: Real-world healthcare AI development and deployment—at scale The AI Revolution in Medicine, Revisited: Real-world healthcare AI development and deployment—at scale

We are sorry, the page you requested cannot be found.

The page you are looking for could not be found or is no longer available.

2 weeks, 6 days назад @ microsoft.com
Ideas: Accelerating Foundation Models Research: AI for all
Ideas: Accelerating Foundation Models Research: AI for all Ideas: Accelerating Foundation Models Research: AI for all

But there was that class I really, really enjoyed, which was mathematical logic.

Well, let’s get onto the topic of Accelerating Foundation Models Research and unpack the big idea behind that.

It might be confusing for some people, Accelerating Foundation Models Research.

And so when we started with Accelerating Foundation Models Research and from now on, I will say AFMR if that’s okay.

It’s about access to people, access to the resources and really co-designing so that we can really, really make more advances together.

3 weeks, 2 days назад @ microsoft.com
The AI Revolution in Medicine, Revisited: The reality of generative AI in the clinic
The AI Revolution in Medicine, Revisited: The reality of generative AI in the clinic The AI Revolution in Medicine, Revisited: The reality of generative AI in the clinic

Sara is vice president and chief health AI officer at UC San Francisco Health.

LONGHURST: So the pat response is AI won’t replace doctors, but AI will replace doctors who don’t use AI.

LEE: And I’m assuming a chief health AI officer is not a role that has been around for a long time.

LEE: Should I be impressed or concerned that the chief health AI officer at UC San Francisco Health is using ChatGPT off label?

We’ll delve into how patients are using generative AI for their own healthcare, the hype and reality of AI drug discovery, and more.

1 month назад @ microsoft.com
The AI Revolution in Medicine, Revisited: An Introduction
The AI Revolution in Medicine, Revisited: An Introduction The AI Revolution in Medicine, Revisited: An Introduction

About two years ago, with Carey Goldberg and Zak Kohane, we wrote a book, The AI Revolution in Medicine.

If you’re a patient, in what ways could AI change your experience as you try to navigate a complex healthcare system?

A strange and bizarre thought, I admit, but a natural one, I think, for any human being that’s encountering this amazing AI technology for the first time.

And since then, of course, I’ve come to learn that many people have had similar experiences in their first encounters with AI.

And in fact, I’ve come to think of this as, somewhat tongue in cheek, the nine stages of AI grief.

1 month, 2 weeks назад @ microsoft.com
Ideas: Quantum computing redefined with Chetan Nayak
Ideas: Quantum computing redefined with Chetan Nayak Ideas: Quantum computing redefined with Chetan Nayak

CHETAN NAYAK: People sometimes say, well, quantum computers are just going to be like classical computers but faster.

This idea of quantum, because you’ve mentioned Albert Einstein, there’s quantum physics, quantum mechanics, now quantum computing.

Well, let me …NAYAK: And that’s quantum mechanics!

HUIZINGA: OK.NAYAK: You’re probably going to say, well, how does quantum computing fit into this, you know?

[LAUGHS]NAYAK: And, you know, there are people out there who said, you know, quantum computers are decades away; don’t worry about it.

2 months назад @ microsoft.com
Ideas: Building AI for population-scale systems with Akshay Nambi
Ideas: Building AI for population-scale systems with Akshay Nambi Ideas: Building AI for population-scale systems with Akshay Nambi

His work lies at the intersection of systems, AI, and machine learning with a focus on designing, deploying, and scaling AI systems to solve compelling real-world problems.

CHRIS STETKIEWICZ: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code.

NAMBI: That’s right.

This represents a major step towards building AI systems that’s much more holistic personal tutors, which help student understanding and create more engaging, effective learning experience.

Are there some things that could go wrong, even if we get the technology right?

2 months, 1 week назад @ microsoft.com
Ideas: Building AI for population-scale systems with Akshay Nambi
Ideas: Building AI for population-scale systems with Akshay Nambi Ideas: Building AI for population-scale systems with Akshay Nambi

His work lies at the intersection of systems, AI, and machine learning with a focus on designing, deploying, and scaling AI systems to solve compelling real-world problems.

CHRIS STETKIEWICZ: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code.

NAMBI: That’s right.

This represents a major step towards building AI systems that’s much more holistic personal tutors, which help student understanding and create more engaging, effective learning experience.

Are there some things that could go wrong, even if we get the technology right?

2 months, 1 week назад @ microsoft.com
Ideas: Bug hunting with Shan Lu
Ideas: Bug hunting with Shan Lu Ideas: Bug hunting with Shan Lu

We are sorry, the page you requested cannot be found.

The page you are looking for could not be found or is no longer available.

3 months назад @ microsoft.com
Ideas: AI for materials discovery with Tian Xie and Ziheng Lu
Ideas: AI for materials discovery with Tian Xie and Ziheng Lu Ideas: AI for materials discovery with Tian Xie and Ziheng Lu

And now you can use this loop to design materials really quickly.

XIE: So you can really think about MatterSim and MatterGen accelerating different parts of materials discovery process.

They are also both foundation AI models, meaning they can both be used for a broad range of materials design problems.

Really, really a lot.

Yeah, I really, really like the example that Ziheng mentioned about the educational purposes.

3 months, 1 week назад @ microsoft.com
Ideas: AI and democracy with Madeleine Daepp and Robert Osazuwa Ness
Ideas: AI and democracy with Madeleine Daepp and Robert Osazuwa Ness Ideas: AI and democracy with Madeleine Daepp and Robert Osazuwa Ness

DAEPP: You know, we didn’t really think about the term fraud until we started prepping for this interview with you.

BADANES: Right, right.

One of the things that I get asked a lot is, why can’t we just build good AI to detect bad AI, right?

BADANES: So next time my kids are in a fight, I’m going to point them to Copilot and say, work with Copilot to mediate.

[LAUGHS] No, that’s really, really interesting.

4 months назад @ microsoft.com
NeurIPS 2024: The co-evolution of AI and systems with Lidong Zhou
NeurIPS 2024: The co-evolution of AI and systems with Lidong Zhou NeurIPS 2024: The co-evolution of AI and systems with Lidong Zhou

Earlier today, Lidong gave a keynote here at NeurIPS on the co-evolution of AI and systems engineering.

One dimension is that the scale of the AI systems that we have to support.

And the other dimension is if you look at AI systems, it’s actually a whole-stack kind of design.

STRICKLAND: Yeah, yeah.

ZHOU: Yeah, I think in terms of AI systems, I’m certainly pretty excited about what we can do together, you know, with a combination of AI and systems.

4 months, 1 week назад @ microsoft.com
NeurIPS 2024: AI for Science with Chris Bishop
NeurIPS 2024: AI for Science with Chris Bishop NeurIPS 2024: AI for Science with Chris Bishop

And then the second paradigm really emerged in the 17th century.

And so the third paradigm really began, I guess, sort of, in the ’50s and ’60s, the development of digital computers.

And when I think about AI for Science actually, the space of opportunity is colossal because science is, science is really just understanding more about the world around us.

And now the SMILES autoregressive model can now generate a molecule that’s an improvement on the starting molecule and knows about the protein binding.

But also if you think about [it], science is really about learning more about the world.

4 months, 1 week назад @ microsoft.com
Abstracts: NeurIPS 2024 with Jindong Wang and Steven Euijong Whang
Abstracts: NeurIPS 2024 with Jindong Wang and Steven Euijong Whang Abstracts: NeurIPS 2024 with Jindong Wang and Steven Euijong Whang

Today I’m talking to Jindong Wang, a senior researcher at Microsoft Research, and Steven Whang, a tenured associate professor at the Korea Advanced Institute of Science and Technology.

JINDONG WANG: OK, everybody knows that with the widespread usage of large language models, hallucination has become a crucial factor of concern.

So foreign key constraint basically requires that if there is some director mentioned in the movie table, it has to be one of the directors in the director table.

So now we can join the movie and director table and generate a bigger table.

HUIZINGA: Well, Jindong Wang and Steven Whang, thanks for joining us today, and to our listeners, thanks for tuning in.

4 months, 1 week назад @ microsoft.com
Abstracts: NeurIPS 2024 with Weizhu Chen
Abstracts: NeurIPS 2024 with Weizhu Chen Abstracts: NeurIPS 2024 with Weizhu Chen

The other one actually is some token actually is very, very hard to be predicted during the pretraining.

And the important thing for the data is about data filtering.

If we’re able to build a better model actually is able to benefit so many different kinds of application.

And definitely there’s a lot of things about how to build a better data [that] is unsolved yet in the literature.

And the other thing actually, we are working on something that’s very exciting.

4 months, 2 weeks назад @ microsoft.com
NLP Highlights NLP Highlights
последний пост None
Data Skeptic
последний пост 3 days, 1 hour назад
The Small World Hypothesis
The Small World Hypothesis The Small World Hypothesis

Kyle discusses the history and proof for the small world hypothesis.

3 days, 1 hour назад @ dataskeptic.com
Thinking in Networks
Thinking in Networks Thinking in Networks

Kyle asks Asaf questions about the new network science course he is now teaching. The conversation delves into topics such as contact tracing, tools for analyzing networks, example use cases, and the importance of thinking in networks.

1 week, 4 days назад @ dataskeptic.com
Fraud Networks
Fraud Networks Fraud Networks

In this episode we talk with Bavo DC Campo, a data scientist and statistician, who shares his expertise on the intersection of actuarial science, fraud detection, and social network analytics. Together we will learn how to use graphs to fight against insurance fraud by uncovering hidden connections between fraudulent claims and bad actors. Key insights include how social network analytics can detect fraud rings by mapping relationships between policyholders, claims, and service providers, and how the BiRank algorithm, inspired by Google’s PageRank, helps rank suspicious claims based on network structure. Bavo will also present his iFraud simulator that can be used to model fraudulent networ…

3 weeks, 1 day назад @ dataskeptic.com
Criminal Networks
Criminal Networks Criminal Networks

In this episode we talk with Justin Wang Ngai Yeung, a PhD candidate at the Network Science Institute at Northeastern University in London, who explores how network science helps uncover criminal networks. Justin is also a member of the organizing committee of the satellite conference dealing with criminal networks at the network science conference in The Netherlands in June 2025. Listeners will learn how graph-based models assist law enforcement in analyzing missing data, identifying key figures in criminal organizations, and improving intervention strategies. Key insights include the challenges of incomplete and inaccurate data in criminal network analysis, how law enforcement agencies us…

1 month, 1 week назад @ dataskeptic.com
Graph Bugs
Graph Bugs Graph Bugs

In this episode today’s guest is Celine Wüst, a master’s student at ETH Zurich specializing in secure and reliable systems, shares her work on automated software testing for graph databases. Celine shows how fuzzing—the process of automatically generating complex queries—helps uncover hidden bugs in graph database management systems like Neo4j, FalconDB, and Apache AGE. Key insights include how state-aware query generation can detect critical issues like buffer overflows and crashes, the challenges of debugging complex database behaviors, and the importance of security-focused software testing. We'll also find out which Graph DB company offers swag for finding bugs in its software and get C…

1 month, 2 weeks назад @ dataskeptic.com
Organizational Network Analysis
Organizational Network Analysis Organizational Network Analysis

In this episode, Gabriel Petrescu, an organizational network analyst, discusses how network science can provide deep insights into organizational structures using OrgXO, a tool that maps companies as networks rather than rigid hierarchies. Listeners will learn how analyzing workplace collaboration networks can reveal hidden influencers, organizational bottlenecks, and engagement levels, offering a data-driven approach to improving effectiveness and resilience. Key insights include how companies can identify overburdened employees, address silos between departments, and detect vulnerabilities where too few individuals hold critical knowledge. Real-life applications range from mergers and acq…

1 month, 3 weeks назад @ dataskeptic.com
Organizational Networks
Organizational Networks Organizational Networks

Is it better to have your work team fully connected or sparsely connected? In this episode we'll try to answer this question and more with our guest Hiroki Sayama, a SUNY Distinguished Professor and director of the Center for Complex Systems at Binghamton University. Hiroki delves into the applications of network science in organizational structures and innovation dynamics by showing his recent work of extracting network structures from organizational charts to enable insights into decision-making and performance, He'll also cover how network connectivity impacts team creativity and innovation. Key insights include how the structure of organizational networks—such as the depth of hierarchy …

1 month, 3 weeks назад @ dataskeptic.com
Networks of the Mind
Networks of the Mind Networks of the Mind

A man goes into a bar… This is the beginning of a riddle that our guest, Yoed Kennet, an assistant professor at the Technion's Faculty of Data and Decision Sciences, uses to measure creativity in subjects. In our talk, Yoed speaks about how to combine cognitive science and network science to explore the complexities and decode the mysteries of the human mind. The listeners will learn how network science provides tools to map and analyze human memory, revealing how problem-solving and creativity emerge from changes in semantic memory structures. Key insights include the role of memory restructuring during moments of insight, the connection between semantic networks and creative thinking, and…

2 months назад @ dataskeptic.com
LLMs and Graphs Synergy
LLMs and Graphs Synergy LLMs and Graphs Synergy

In this episode, Garima Agrawal, a senior researcher and AI consultant, brings her years of experience in data science and artificial intelligence. Listeners will learn about the evolving role of knowledge graphs in augmenting large language models (LLMs) for domain-specific tasks and how these tools can mitigate issues like hallucination in AI systems. Key insights include how LLMs can leverage knowledge graphs to improve accuracy by integrating domain expertise, reducing hallucinations, and enabling better reasoning. Real-life applications discussed range from enhancing customer support systems with efficient FAQ retrieval to creating smarter AI-driven decision-making pipelines. Garima’s …

2 months, 1 week назад @ dataskeptic.com
A Network of Networks
A Network of Networks A Network of Networks

In this episode, Bnaya Gross, a Fulbright postdoctoral fellow at the Center for Complex Network Research at Northwestern University, explores the transformative applications of network science in fields ranging from infrastructure to medicine, by studying the interactions between networks ("a network of networks"). Listeners will learn how interdependent networks provide a framework for understanding cascading failures, such as power outages, and how these insights transfer to physical systems like superconducting materials and biological networks. Key takeaways include understanding how dependencies between networks can amplify vulnerabilities, applying these principles to create resilient…

2 months, 2 weeks назад @ dataskeptic.com
Auditing LLMs and Twitter
Auditing LLMs and Twitter Auditing LLMs and Twitter

Our guests, Erwan Le Merrer and Gilles Tredan, are long-time collaborators in graph theory and distributed systems. They share their expertise on applying graph-based approaches to understanding both large language model (LLM) hallucinations and shadow banning on social media platforms. In this episode, listeners will learn how graph structures and metrics can reveal patterns in algorithmic behavior and platform moderation practices. Key insights include the use of graph theory to evaluate LLM outputs, uncovering patterns in hallucinated graphs that might hint at the underlying structure and training data of the models, and applying epidemic models to analyze the uneven spread of shadow ban…

2 months, 3 weeks назад @ dataskeptic.com
Fraud Detection with Graphs
Fraud Detection with Graphs Fraud Detection with Graphs

In this episode, Šimon Mandlík, a PhD candidate at the Czech Technical University will talk with us about leveraging machine learning and graph-based techniques for cybersecurity applications. We'll learn how graphs are used to detect malicious activity in networks, such as identifying harmful domains and executable files by analyzing their relationships within vast datasets. This will include the use of hierarchical multi-instance learning (HML) to represent JSON-based network activity as graphs and the advantages of analyzing connections between entities (like clients, domains etc.). Our guest shows that while other graph methods (such as GNN or Label Propagation) lack in scalability or h…

3 months назад @ dataskeptic.com
Optimizing Supply Chains with GNN
Optimizing Supply Chains with GNN Optimizing Supply Chains with GNN

Thibaut Vidal, a professor at Polytechnique Montreal, specializes in leveraging advanced algorithms and machine learning to optimize supply chain operations. In this episode, listeners will learn how graph-based approaches can transform supply chains by enabling more efficient routing, districting, and decision-making in complex logistical networks. Key insights include the application of Graph Neural Networks to predict delivery costs, with potential to improve districting strategies for companies like UPS or Amazon and overcoming limitations of traditional heuristic methods. Thibaut’s work underscores the potential for GNN to reduce costs, enhance operational efficiency, and provide bette…

3 months, 1 week назад @ dataskeptic.com
The Mystery Behind Large Graphs
The Mystery Behind Large Graphs The Mystery Behind Large Graphs

Our guest in this episode is David Tench, a Grace Hopper postdoctoral fellow at Lawrence Berkeley National Labs, who specializes in scalable graph algorithms and compression techniques to tackle massive datasets. In this episode, we will learn how his techniques enable real-time analysis of large datasets, such as particle tracking in physics experiments or social network analysis, by reducing storage requirements while preserving critical structural properties. David also challenges the common belief that giant graphs are sparse by pointing to a potential bias: Maybe because of the challenges that exist in analyzing large dense graphs, we only see datasets of sparse graphs? The truth is ou…

3 months, 2 weeks назад @ dataskeptic.com
Customizing a Graph Solution
Customizing a Graph Solution Customizing a Graph Solution

In this episode, Dave Bechberger, principal Graph Architect at AWS and author of "Graph Databases in Action", brings deep insights into the field of graph databases and their applications. Together we delve into specific scenarios in which Graph Databases provide unique solutions, such as in the fraud industry, and learn how to optimize our DB for questions around connections, such as "How are these entities related?" or "What patterns of interaction indicate anomalies?" This discussion sheds light on when organizations should consider adopting graph databases, particularly for cases that require scalable analysis of highly interconnected data and provides practical insights into leveraging…

4 months, 1 week назад @ dataskeptic.com
SuperDataScience SuperDataScience
последний пост 1 day, 19 hours назад
881: Beyond GPUs: The Power of Custom AI Accelerators, with Emily Webber
881: Beyond GPUs: The Power of Custom AI Accelerators, with Emily Webber 881: Beyond GPUs: The Power of Custom AI Accelerators, with Emily Webber

Emily Webber speaks to Jon Krohn about her work at Amazon Web Services, from its Annapurna Labs-developed Nitro System, a foundational technology that can enhance securities and performance in the cloud and how Trainium2 became AWS’ most powerful AI chip with four times the compute of Trainium. Hear the specs of AWS’s chips and when to use them. Additional materials: www.superdatascience.com/881 This episode is brought to you by ODSC, the Open Data Science Conference. Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (08:36) Emily’s work on AWS’ SageMaker and Trainium (23:54) How AWS N…

1 day, 19 hours назад @ podtrac.com
880: Manus, DeepSeek and China’s AI Boom
880: Manus, DeepSeek and China’s AI Boom 880: Manus, DeepSeek and China’s AI Boom

First developed in China, Manus AI and DeepSeek have made great waves on an international scale. Sought-after for their cost-effectiveness compared to US-made tech, Manus AI and DeepSeek are quickly becoming dominant technologies inside the country. In this five-minute Friday, Jon Krohn asks: Do these technologies warrant the huge amount of resources spent on them by multiple industries in China, and what makes hype become a mainstay? Additional materials: www.superdatascience.com/880 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

5 days, 18 hours назад @ podtrac.com
879: Serverless, Parallel, and AI-Assisted: The Future of Data Science is Here, with Zerve’s Dr. Greg Michaelson
879: Serverless, Parallel, and AI-Assisted: The Future of Data Science is Here, with Zerve’s Dr. Greg Michaelson 879: Serverless, Parallel, and AI-Assisted: The Future of Data Science is Here, with Zerve’s Dr. Greg Michaelson

Greg Michaelson speaks to Jon Krohn about the latest developments at Zerve, an operating system for developing and delivering data and AI products, including a revolutionary feature allowing users to run multiple parts of a program’s code at once and without extra costs. You’ll also hear why LLMs might spell trouble for SaaS companies, Greg’s ‘good-cop, bad-cop’ routine that improves LLM responses, and how RAG (retrieval-augmented generation) can be deployed to create even more powerful AI applications. Additional materials: www.superdatascience.com/879 This episode is brought to you by Trainium2, the latest AI chip from AWS and by the Dell AI Factory with NVIDIA. Interested in sponsoring a…

1 week, 1 day назад @ podtrac.com
878: In Case You Missed It in March 2025
878: In Case You Missed It in March 2025 878: In Case You Missed It in March 2025

AI stacks, AGI, training neural networks, and AI authenticity: Jon Krohn rounds up his interviews from March with this episode of “In Case You Missed It”. In his favorite clips from the month, he speaks to Andriy Burkov (Episode 867), Natalie Monbiot (Episode 873), Richmond Alake (Episode 871) and Varun Godbole (Episode 869). Additional materials: www.superdatascience.com/878 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 week, 5 days назад @ podtrac.com
877: The Neural Processing Units Bringing AI to PCs, with Shirish Gupta
877: The Neural Processing Units Bringing AI to PCs, with Shirish Gupta 877: The Neural Processing Units Bringing AI to PCs, with Shirish Gupta

NPUs, AIPC, and Dell’s growing suite of AI products: Shirish Gupta speaks to Jon Krohn about neural processing units and what makes them a go-to tool for AI inference workloads, reasons to move your workloads from the cloud and to your local devices, what the mnemonic AIPC stands for and why it will soon be on everyone’s lips, and he offers a special intro to Dell’s new Pro-AI Studio Toolkit. Hear about several real-world AIPC applications run by Dell’s clients, from detecting manufacturing defects to improving efficiencies for first responders, massively supporting actual life-or-death situations. Additional materials: www.superdatascience.com/877 This episode is brought to you by ODSC, th…

2 weeks, 1 day назад @ podtrac.com
876: Hugging Face’s smolagents: Agentic AI in Python Made Easy
876: Hugging Face’s smolagents: Agentic AI in Python Made Easy 876: Hugging Face’s smolagents: Agentic AI in Python Made Easy

Small, simple, accessible: Hugging Face makes a huge contribution to the agentic AI wave with its smolagents. Jon Krohn explores how this small-but-mighty new Python library can act as the best personal assistant you never had. Hear about its features and use cases in this five-minute Friday. Additional materials: www.superdatascience.com/876 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

2 weeks, 5 days назад @ podtrac.com
875: How Semiconductors Are Made (And Fuel the AI Boom), with Kai Beckmann
875: How Semiconductors Are Made (And Fuel the AI Boom), with Kai Beckmann 875: How Semiconductors Are Made (And Fuel the AI Boom), with Kai Beckmann

Why are semiconductors so essential in this digital age, and how are they made? Jon Krohn speaks to electronics CEO Kai Beckmann about Merck KGaA, Darmstadt, Germany’s intricate manufacturing process, how we can use AI to develop materials that power next-gen AI technologies, and how a chip with the processing power of the human brain might one day be able to run on the power of a low-watt light bulb. Additional materials: www.superdatascience.com/875 This episode is brought to you by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (06:26) How Merck K…

3 weeks, 1 day назад @ podtrac.com
874: How AI is Transforming Baseball (with Lessons For All of Us)
874: How AI is Transforming Baseball (with Lessons For All of Us) 874: How AI is Transforming Baseball (with Lessons For All of Us)

In this Five-Minute Friday, Jon Krohn talks baseball. For decades, coaches have relied on player performance stats to make in-game decisions and refine their season strategies. Now, AI led by Statcast is taking baseball strategy even further, massively broadening analytics data to include pitch, swing and catch trajectories, spin rates, biomechanical information, player matchups, and how to enhance player performances. Listen to the episode to find out what other industries can learn from the “data-friendly” sport of baseball. Additional materials: www.superdatascience.com/874 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship inf…

3 weeks, 5 days назад @ podtrac.com
873: Become Your Best Self Through AI Augmentation — feat. Natalie Monbiot
873: Become Your Best Self Through AI Augmentation — feat. Natalie Monbiot 873: Become Your Best Self Through AI Augmentation — feat. Natalie Monbiot

Natalie Monbiot is an independent advisor and collaborator for projects that concern the “virtual human”, and she is “going all in on the virtual human economy”. Jon Krohn speaks to Natalie about these new ventures, how to mitigate the divide between AI users and nonusers, and how anyone can collaborate with AI without compromising their own creativity. Additional materials: www.superdatascience.com/873 This episode is brought to you by the Dell AI Factory with NVIDIA, by Trainium2, the latest AI chip from AWS and by ODSC, the Open Data Science Conference. Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

4 weeks, 1 day назад @ podtrac.com
872: Microsoft’s “Majorana 1” Chip Brings Quantum ML Closer
872: Microsoft’s “Majorana 1” Chip Brings Quantum ML Closer 872: Microsoft’s “Majorana 1” Chip Brings Quantum ML Closer

This podcast is not available yet, please come back soon.

Meanwhile we invite you to check out our podcasts in:Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn

1 month назад @ superdatascience.com
871: NoSQL Is Ideal for AI Applications, with MongoDB’s Richmond Alake
871: NoSQL Is Ideal for AI Applications, with MongoDB’s Richmond Alake 871: NoSQL Is Ideal for AI Applications, with MongoDB’s Richmond Alake

This podcast is not available yet, please come back soon.

Meanwhile we invite you to check out our podcasts in:Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn

1 month назад @ superdatascience.com
870: OpenAI’s “Deep Research”: Get Days of Human Work Done in Minutes
870: OpenAI’s “Deep Research”: Get Days of Human Work Done in Minutes 870: OpenAI’s “Deep Research”: Get Days of Human Work Done in Minutes

So I've been using deep research near daily as a part of that Pro subscription and have been continuously impressed.

And from there, deep research, spent three minutes and looked across eight different sources to come up with my results.

I hope that gives you a sense and kind of a deep dive into deep research with a specific example.

Like any LLM-based tool deep research could hallucinate or make incorrect references, although I haven't caught any of these myself yet, and OpenAI's internal evaluations apparently show markedly lower hallucination rates with deep research than any of their previous tools.

16:11In summary, OpenAI's deep research is transforming the research process by automati…

1 month, 1 week назад @ superdatascience.com
869: AI Should Make Humans Wiser (But It Isn’t), with Varun Godbole
869: AI Should Make Humans Wiser (But It Isn’t), with Varun Godbole 869: AI Should Make Humans Wiser (But It Isn’t), with Varun Godbole

This podcast is not available yet, please come back soon.

Meanwhile we invite you to check out our podcasts in:Subscribe on Website, Apple Podcasts, Spotify, Stitcher Radio or TuneIn

1 month, 1 week назад @ superdatascience.com
868: In Case You Missed It in February 2025
868: In Case You Missed It in February 2025 868: In Case You Missed It in February 2025

Podcast TranscriptJon Krohn: 00:06This is episode number 868 our "In Case You Missed It" February episode.

And that was a really, really big thing that I love.

Jon Krohn: 05:10But yeah, but in DBT automatically creating a documentation for all the fields that you have in a data file.

So Tableau, well, I guess you can run it a few ways, but essentially with Sigma, it sits right on top of Snowflake and works really, really well with that.

LLMs, I think, are one start, but I think all models in general are going to be used long-term.

1 month, 2 weeks назад @ superdatascience.com
867: LLMs and Agents Are Overhyped, with Dr. Andriy Burkov
867: LLMs and Agents Are Overhyped, with Dr. Andriy Burkov 867: LLMs and Agents Are Overhyped, with Dr. Andriy Burkov

Jon Krohn: 00:00:00This is episode number 867 with Dr. Andriy Burkov, machine learning lead at TalentNeuron.

Andriy wrote the indispensable Hundred-page Machine Learning Book that seems to be on every data scientist and ML engineer's bookshelf.

They put some installations like sculptures from ice illuminated with different colors, so it's really, really like a postcard.

You have over 15 years of hands-on experience in automated data analysis, machine learning, natural language processing, you're currently the machine learning lead at a company called TalentNeuron.

I don't know, do you want to dig into this a bit more?Andriy Burkov: 00:30:39Well, I have a couple of comments.

1 month, 2 weeks назад @ superdatascience.com
Data Science at Home Data Science at Home
последний пост 1 week, 6 days назад
AI Agents with Atomic Agents 🚀 with Kenny Vaneetvelde (Ep. 280)
AI Agents with Atomic Agents 🚀 with Kenny Vaneetvelde (Ep. 280) AI Agents with Atomic Agents 🚀 with Kenny Vaneetvelde (Ep. 280)

🎙️ In this episode of Data Science at Home, we sit down with Kenny Vaneetvelde, the mastermind behind Atomic Agents, a groundbreaking framework redefining AI development.

🔍 Discover how atomicity simplifies complex AI systems, why modularity matters more than ever, and how Atomic Agents is eliminating hidden assumptions and redundant complexity in AI workflows.

💡 From real-world applications to the tech stack behind the framework, Kenny takes us on a deep dive into this lightweight, powerful tool for creating consistent and brand-aligned AI.

📌 Timestamps:0:00 – Intro2:30 – Kenny’s journey in AI5:00 – What are Atomic Agents?

10:45 – Why atomicity matters in AI18:20 – The tech behind Atomic A…

1 week, 6 days назад @ datascienceathome.com
WeightWatcher: The AI Detective for LLMs (DeepSeek & OpenAI included) (Ep. 278)
WeightWatcher: The AI Detective for LLMs (DeepSeek & OpenAI included) (Ep. 278) WeightWatcher: The AI Detective for LLMs (DeepSeek & OpenAI included) (Ep. 278)

Enter WeightWatcher—the AI detective tool that peeks inside neural networks without needing their data.

🐦 Twitter: @DataScienceAtHome📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/Instagram: https://www.instagram.com/datascienceathome/Facebook: https://www.facebook.com/datascienceAHLinkedIn: https://www.linkedin.com/company/data-science-at-home-podcastDiscord Channel: https://discord.gg/4UNKGf3NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, interviews, and discussions.

Send us mail at:hello@datascienceathom…

3 weeks, 3 days назад @ datascienceathome.com
Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything (Ep. 278)
Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything (Ep. 278) Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything (Ep. 278)

From the viral article “Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything” on my newsletter at https://defragzone.substack.com/p/techs-dumbest-mistake-why-firinghere are my thoughts about AI replacing programmers…✨ Connect with us!

🐦 Twitter: @DataScienceAtHome📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/Instagram: https://www.instagram.com/datascienceathome/Facebook: https://www.facebook.com/datascienceAHLinkedIn: https://www.linkedin.com/company/data-science-at-home-podcastDiscord Channel: https://discord.gg/4UNKGf3NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data profes…

3 weeks, 5 days назад @ datascienceathome.com
Scaling Smart: AI, Data, and Building Future-Ready Enterprises with Josh Miramant (Ep. 276)
Scaling Smart: AI, Data, and Building Future-Ready Enterprises with Josh Miramant (Ep. 276) Scaling Smart: AI, Data, and Building Future-Ready Enterprises with Josh Miramant (Ep. 276)

In this episode, we dive into the transformative world of AI, data analytics, and cloud infrastructure with Josh Miramant, CEO of Blue Orange Digital.

As a seasoned entrepreneur with over $25 million raised across ventures and two successful exits, Josh shares invaluable insights on scaling data-driven businesses, integrating machine learning frameworks, and navigating the rapidly evolving landscape of cloud data architecture.

From generative AI to large language models, Josh explores cutting-edge trends shaping financial services, real estate, and consumer goods.

Tune in for a masterclass in leveraging data for impact and innovation!

Linkshttps://blueorange.digital/https://blueorange.digit…

4 months назад @ datascienceathome.com
Autonomous Weapons and AI Warfare (Ep. 275)
Autonomous Weapons and AI Warfare (Ep. 275) Autonomous Weapons and AI Warfare (Ep. 275)

Here’s the updated text with links to the websites included:AI is revolutionizing the military with autonomous drones, surveillance tech, and decision-making systems.

In this episode of Data Science at Home, we expose the cutting-edge tech reshaping defense—and the chilling ethical questions that follow.

🐦 Twitter: @DataScienceAtHome📘 LinkedIn: Francesco Gad📷 Instagram: https://www.instagram.com/datascienceathome/📘 Facebook: https://www.facebook.com/datascienceAH💼 LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast💬 Discord Channel: https://discord.gg/4UNKGf3NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

S…

4 months, 1 week назад @ datascienceathome.com
8 Proven Strategies to Scale Your AI Systems Like OpenAI! 🚀 (Ep. 274)
8 Proven Strategies to Scale Your AI Systems Like OpenAI! 🚀  (Ep. 274) 8 Proven Strategies to Scale Your AI Systems Like OpenAI! 🚀 (Ep. 274)

In this episode of Data Science at Home, we’re diving deep into the powerful strategies that top AI companies, like OpenAI, use to scale their systems to handle millions of requests every minute!

From stateless services and caching to the secrets of async processing, discover 8 essential strategies to make your AI and machine learning systems unstoppable.

Instagram: https://www.instagram.com/datascienceathome/Twitter: @datascienceathomeFacebook: https://www.facebook.com/datascienceAHLinkedIn: https://www.linkedin.com/company/data-science-at-home-podcastDiscord Channel: https://discord.gg/4UNKGf3NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and ma…

4 months, 1 week назад @ datascienceathome.com
Humans vs. Bots: Are You Talking to a Machine Right Now? (Ep. 273)
Humans vs. Bots: Are You Talking to a Machine Right Now? (Ep. 273) Humans vs. Bots: Are You Talking to a Machine Right Now? (Ep. 273)

Together, they explore the growing importance of distinguishing human-written from AI-generated text, discussing real-world examples from social media to news.

How reliable are current detection tools like DetectGPT?

What are the ethical and technical challenges ahead as AI continues to advance?

And is the balance between innovation and regulation tipping in the right direction?

Tune in for insights on the future of AI text detection and the broader implications for media, academia, and policy.

5 months назад @ datascienceathome.com
AI bubble, Sam Altman’s Manifesto and other fairy tales for billionaires (Ep. 272)
AI bubble, Sam Altman’s Manifesto and other fairy tales for billionaires (Ep. 272) AI bubble, Sam Altman’s Manifesto and other fairy tales for billionaires (Ep. 272)

Welcome to Data Science at Home, where we don’t just drink the AI Kool-Aid.

Today, we’re dissecting Sam Altman’s “AI manifesto”—a magical journey where, apparently, AI will fix everything from climate change to your grandma’s back pain.

In this episode, I’ll break down the bold (and often bizarre) claims in Altman’s grand speech for the Intelligence Age.

I’ll give you the real scoop on what’s realistic, what’s nonsense, and why some tech billionaires just can’t resist overselling.

Chapters00:00 – Intro00:18 – CEO of Baidu Statement on AI Bubble03:47 – News On Sam Altman Open AI06:43 – Online Manifesto “The Intelleigent Age”13:14 – Deep Learning16:26 – AI gets Better With Scale17:45 – Conclu…

5 months назад @ datascienceathome.com
AI vs. The Planet: The Energy Crisis Behind the Chatbot Boom (Ep. 271)
AI vs. The Planet: The Energy Crisis Behind the Chatbot Boom (Ep. 271) AI vs. The Planet: The Energy Crisis Behind the Chatbot Boom (Ep. 271)

In this episode of Data Science at Home, we dive into the hidden costs of AI’s rapid growth — specifically, its massive energy consumption.

With tools like ChatGPT reaching 200 million weekly active users, the environmental impact of AI is becoming impossible to ignore.

Each query, every training session, and every breakthrough come with a price in kilowatt-hours, raising questions about AI’s sustainability.

Join us, as we uncovers the staggering figures behind AI’s energy demands and explores practical solutions for the future.

From efficiency-focused algorithms and specialized hardware to decentralized learning, this episode examines how we can balance AI’s advancements with our planet’s …

5 months, 1 week назад @ datascienceathome.com
Love, Loss, and Algorithms: The Dangerous Realism of AI (Ep. 270)
Love, Loss, and Algorithms: The Dangerous Realism of AI (Ep. 270) Love, Loss, and Algorithms: The Dangerous Realism of AI (Ep. 270)

Subscribe to our new channel https://www.youtube.com/@DataScienceatHomeIn this episode of Data Science at Home, we confront a tragic story highlighting the ethical and emotional complexities of AI technology.

This devastating event has sparked urgent discussions on the mental health risks, ethical responsibilities, and potential regulations surrounding AI chatbots, especially as they become increasingly lifelike.

🎙️ Topics Covered:AI & Emotional Attachment: How hyper-realistic AI chatbots can foster intense emotional bonds with users, especially vulnerable groups like adolescents.

Mental Health Risks: The potential for AI to unintentionally contribute to mental health issues, and the challe…

5 months, 2 weeks назад @ datascienceathome.com
VC Advice Exposed: When Investors Don’t Know What They Want (Ep. 269)
VC Advice Exposed: When Investors Don’t Know What They Want (Ep. 269) VC Advice Exposed: When Investors Don’t Know What They Want (Ep. 269)

Ever feel like VC advice is all over the place?

That’s because it is.

In this episode, I expose the madness behind the money and how to navigate their confusing advice!

Watch the video at https://youtu.be/IBrPFyRMG1QSubscribe to our new Youtube channel https://www.youtube.com/@DataScienceatHome00:00 – Introduction00:16 – The Wild World of VC Advice02:01 – Grow Fast vs. Grow Slow05:00 – Listen to Customers or Innovate Ahead09:51 – Raise Big or Stay Lean?

14:20 – The Real VC Secret: Focus on Your Team and Vision17:03 – Outro

5 months, 4 weeks назад @ datascienceathome.com
AI Says It Can Compress Better Than FLAC?! Hold My Entropy 🍿 (Ep. 268)
AI Says It Can Compress Better Than FLAC?! Hold My Entropy 🍿 (Ep. 268) AI Says It Can Compress Better Than FLAC?! Hold My Entropy 🍿 (Ep. 268)

In this episode of Data Science at Home, Frag dives deep into the wild claims that Large Language Models (LLMs) like Chinchilla 70B are beating traditional lossless compression algorithms.

🧠💥But before you toss out your FLAC collection, let’s break down Shannon’s Source Coding Theorem and why entropy sets the ultimate limit on lossless compression.

We explore: ⚙️ How LLMs leverage probabilistic patterns for compression 📉 Why compression efficiency doesn’t equal general intelligence 🚀 The practical (and ridiculous) challenges of using AI for compression 💡 Can AI actually BREAK Shannon’s limit—or is it just an illusion?

If you love AI, algorithms, or just enjoy some good old myth-busting, thi…

6 months назад @ datascienceathome.com
What Big Tech Isn’t Telling You About AI (Ep. 267)
What Big Tech Isn’t Telling You About AI (Ep. 267) What Big Tech Isn’t Telling You About AI (Ep. 267)

Are AI giants really building trustworthy systems?

A groundbreaking transparency report by Stanford, MIT, and Princeton says no.

In this episode, we expose the shocking lack of transparency in AI development and how it impacts bias, safety, and trust in the technology.

We’ll break down Gary Marcus’s demands for more openness and what consumers should know about the AI products shaping their lives.

Check our new YouTube channel https://www.youtube.com/@DataScienceatHome and Subscribe!

6 months, 2 weeks назад @ datascienceathome.com
Money, Cryptocurrencies, and AI: Exploring the Future of Finance with Chris Skinner [RB] (Ep. 266)
Money, Cryptocurrencies, and AI: Exploring the Future of Finance with Chris Skinner [RB] (Ep. 266) Money, Cryptocurrencies, and AI: Exploring the Future of Finance with Chris Skinner [RB] (Ep. 266)

We’re revisiting one of our most popular episodes from last year, where renowned financial expert Chris Skinner explores the future of money.

In this fascinating discussion, Skinner dives deep into cryptocurrencies, digital currencies, AI, and even the metaverse.

He touches on government regulations, the role of tech in finance, and what these innovations mean for humanity.

Now, one year later, we encourage you to listen again and reflect—how much has changed?

Are Chris Skinner’s predictions still holding up, or has the financial landscape evolved in unexpected ways?

6 months, 2 weeks назад @ datascienceathome.com
Kaggle Kommando’s Data Disco: Laughing our Way Through AI Trends (Ep. 265) [RB]
Kaggle Kommando’s Data Disco: Laughing our Way Through AI Trends (Ep. 265) [RB] Kaggle Kommando’s Data Disco: Laughing our Way Through AI Trends (Ep. 265) [RB]

In this episode, join me and the Kaggle Grand Master, Konrad Banachewicz, for a hilarious journey into the zany world of data science trends.

From algorithm acrobatics to AI, creativity, Hollywood movies, and music, we just can’t get enough.

It’s the typical episode with a dose of nerdy comedy you didn’t know you needed.

Buckle up, it’s a data disco, and we’re breaking down the binary!

SponsorsIntrepid AI is an AI assisted all-in-one platform for robotics teams.

6 months, 3 weeks назад @ datascienceathome.com