Very ML
State-of-the-art Machine Learning News Feed
/r/MachineLearning
last post 3 hours ago
[R] Is ICDE a good conference?

3 hours ago @ reddit.com
[R] NeurIPS Dataset Anonymization on HuggingFace

8 hours ago @ reddit.com
[R] NeurIPS 2025: Changing Title

9 hours ago @ reddit.com
[D] At what cost are we training chatbots?

9 hours ago @ reddit.com
[P] Framework for training AI models with OpenGL

10 hours ago @ reddit.com
[D] stable diffusion model giving noise output

11 hours ago @ reddit.com
[R] Where to find vin decoded data to use for a dataset?

12 hours ago @ reddit.com
[D] US CS programs in Medical Imaging

12 hours ago @ reddit.com
[R] Rethinking Watch Time Optimization: Tubi Finds Tweedie Regression Outperforms Weighted LogLoss for VOD Engagement

14 hours ago @ reddit.com
NovaMem & AIV1 A New Computational Paradigm for AI That Learns Like a Human[R]
NovaMem & AIV1 A New Computational Paradigm for AI That Learns Like a Human[R]

Your request has been blocked due to a network policy.

If you're running a script or application, please register or sign in with your developer credentials here.

Additionally make sure your User-Agent is not empty and is something unique and descriptive and try again.

if you're supplying an alternate User-Agent string, try changing back to default as that can sometimes result in a block.

if you think that we've incorrectly blocked you or you would like to discuss easier ways to get the data you want, please file a ticket here.

14 hours ago @ reddit.com
[P] Eek out better performance LSTM

14 hours ago @ reddit.com
[D] Call for Collaborators: Open Source LLM with Novel Efficient Architecture for Personal Computers

19 hours ago @ reddit.com
[D] Orthodontic model mesh identification

20 hours ago @ reddit.com
[D] LLM Inference Optimization Techniques

20 hours ago @ reddit.com
[R] Am I on the right path in understanding the YoloV4 model?

1 day ago @ reddit.com
Towards Data Science
last post 5 hours ago
Google’s AlphaEvolve Is Evolving New Algorithms — And It Could Be a Game Changer

Powered by Google’s formidable Gemini models (which I intend to cover soon, because I’m amazed at their power!).

That is, think about Genetic Algorithms, which have existed in data science, numerical methods and computational mathematics for decades.

This parallels very closely the inner workings of regular genetic algorithms.

Survival of the fittest: Again like in genetic algorithms, but candidate solutions are automatically compiled, run, and rigorously evaluated against predefined metrics.
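To make the genetic-algorithm analogy concrete, here is a minimal, hypothetical sketch of that mutate-evaluate-select loop in Python; the fitness function and mutation operator below are invented placeholders, not AlphaEvolve's actual components.

import random

def fitness(candidate):
    # Placeholder metric: candidates whose values sum close to 100 score higher.
    return -abs(sum(candidate) - 100)

def mutate(candidate):
    # Randomly perturb one element to produce a child solution.
    child = list(candidate)
    i = random.randrange(len(child))
    child[i] += random.uniform(-5, 5)
    return child

population = [[random.uniform(0, 10) for _ in range(10)] for _ in range(20)]
for generation in range(200):
    # Survival of the fittest: keep the best half, refill with mutated survivors.
    population.sort(key=fitness, reverse=True)
    survivors = population[: len(population) // 2]
    population = survivors + [mutate(random.choice(survivors)) for _ in survivors]

print(round(fitness(population[0]), 3))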

One of the most profound implications of tools like AlphaEvolve is the “virtuous cycle” by which AI could improve AI models themselves.

5 hours ago @ towardsdatascience.com
Understanding Random Forest using Python (scikit-learn)

This tutorial includes the following: What is Bagging; What Makes Random Forests Different; Training and Tuning a Random Forest using Scikit-Learn; Calculating and Interpreting Feature Importance; Visualizing Individual Decision Trees in a Random Forest. As always, the code used in this tutorial is available on my GitHub.

Note that while bootstrapping is commonly used in Random Forests, it is not strictly necessary because step 2 (random feature selection) introduces sufficient diversity among the trees.

This gives you an initial sense of performance and lets you validate generalization using the out-of-bag (OOB) score, which is built into bagging-based models like Random Forests.
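As a minimal sketch of that step (using scikit-learn's built-in breast cancer dataset rather than the tutorial's data), bagging and the OOB score are enabled directly on the estimator:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# bootstrap=True enables bagging; oob_score=True scores each tree on the
# samples it never saw during training, giving a built-in validation signal.
rf = RandomForestClassifier(n_estimators=200, bootstrap=True, oob_score=True, random_state=0)
rf.fit(X, y)
print(f"OOB score: {rf.oob_score_:.3f}")
print(rf.feature_importances_[:5])  # impurity-based feature importances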

In contrast, scikit-…

5 hours ago @ towardsdatascience.com
How to Learn the Math Needed for Machine Learning

Most machine learning originated from statistical learning theory, so learning statistics will mean you will inherently learn machine learning or its basics.

Probability Theory — As I said earlier, machine learning is based on statistical learning, which comes from understanding how probability works.

Linear Algebra: Linear algebra is used everywhere in machine learning, and a lot in deep learning.

Mathematics for Machine Learning — As the name implies, this textbook will teach the maths for machine learning.

Mathematics for Machine Learning and Data Science Specialisation — This course is by DeepLearning.AI, the same people who made the Machine Learning Specialisation, arguably the best mach…

6 hours ago @ towardsdatascience.com
How To Build a Benchmark for Your Models

In this blog post, I’ll explain: what a benchmark is, why it is important to have a benchmark, how I would build one using an example scenario, and some potential drawbacks to keep in mind. What is a benchmark?

Why building a benchmark is important: Now that we’ve defined what a benchmark is, let’s dive into why I believe it’s worth spending an extra project week on the development of a strong benchmark.

Helps in Model Selection — A benchmark gives a starting point to compare multiple models fairly.

How I would build a benchmark: I hope I’ve convinced you of the importance of having a benchmark.

However, even though benchmarks are incredibly useful, there are some pitfalls to watch out for: Non-Informa…

10 hours ago @ towardsdatascience.com
🚪🚪🐐 Lessons in Decision Making from the Monty Hall Problem

Causal — We will use a Graph Model to visualise conditions required to use the Monty Hall problem in real world settings.

As its premise, audience participants were considered traders making deals with the host, Monty Hall 🎩.

The 100 Door Monty Hall problem before the host intervention.
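A quick simulation (a sketch, not taken from the article) makes the effect of the host's intervention easy to verify for both the classic 3-door game and the 100-door variant:

import random

def win_rate(n_doors, switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize = random.randrange(n_doors)
        choice = random.randrange(n_doors)
        # The host opens every other non-prize door, leaving one closed door.
        # Switching therefore wins exactly when the first pick was wrong.
        wins += (choice != prize) if switch else (choice == prize)
    return wins / trials

for n in (3, 100):
    print(f"{n} doors | stay: {win_rate(n, False):.3f} | switch: {win_rate(n, True):.3f}")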

I argue that lessons learnt from the Monty Hall problem definitely are.

Whereas the author points out “similarities” between hypothesis testing and the Monty Hall problem, I think that this is a bit misleading.

10 hours ago @ towardsdatascience.com
Boost 2-Bit LLM Accuracy with EoRA

While recent advances in low-bit quantization are promising, achieving stable and accurate 2-bit quantization remains a significant challenge.

In their paper, NVIDIA presents a wide range of strong results showing that EoRA can significantly boost the accuracy of quantized models.

To compensate for this degradation, I applied an EoRA adapter using the implementation provided in the GPTQModel library (licensed under Apache 2.0).

All EoRA adapters created for these models are publicly available (Apache 2.0 license). Evaluating EoRA Adapters for 2-bit LLMs: Let’s evaluate the effect of the EoRA adapters.

In other words, if you plan to fine-tune a 2-bit model using QLoRA, initia…

1 day ago @ towardsdatascience.com
The Geospatial Capabilities of Microsoft Fabric and ESRI GeoAnalytics, Demonstrated

Ever growing data volumes put constraints on systems that handle geospatial data.

The optional GeoAnalytics capabilities within Fabric enable the processing and analytics of vector-type geospatial data, where vector-type geospatial data refers to points, lines, polygons.

It can handle 10 additional common spatial data formats for loading and saving spatial data, on top of the data source formats Spark natively supports.

It highlights the challenges posed by increasing data volumes on Geospatial data systems and suggests that traditional big data engines must adapt to handle geospatial data efficiently.

References: GitHub repo with notebooks: delange/Fabric_GeoAnalytics; Microsoft Fabric: Microsoft F…

1 day ago @ towardsdatascience.com
Strength in Numbers: Ensembling Models with Bagging and Boosting

Here we have 1 basic decision tree model and 3 bagged decision tree models – with 3, 50 and 150 trees.

This is a key difference between bagging and boosting – with bagging, more trees eventually stop helping, with boosting more trees eventually start hurting!

We need a couple of functions to create our boosting algorithm: Base decision tree function – a simple function to create and train a single decision tree.
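A minimal sketch of that residual-fitting loop (on toy data, not the article's, and assuming scikit-learn) looks like this:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, size=500)

def fit_boosted_trees(X, y, n_trees=100, learning_rate=0.1, max_depth=2):
    trees = []
    pred = np.full_like(y, y.mean())          # start from the mean prediction
    for _ in range(n_trees):
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, y - pred)                 # each new tree is fit to the residuals
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees, pred

trees, pred = fit_boosted_trees(X, y)
print("train MSE:", round(float(np.mean((y - pred) ** 2)), 4))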

Tuned boosting actuals vs. residuals (image by author). While our boosting ensemble model seems to capture the trend reasonably well, we can see off the bat that it isn’t predicting as well as the bagging model.

Both approaches use the power of ensembling to address different problems …

1 day, 1 hour ago @ towardsdatascience.com
Efficient Graph Storage for Entity Resolution Using Clique-Based Compression

This article explains the details about efficiently storing highly connected graphs using Clique-Based Graph Compression.

The Entity Graph Model: In Tilores, a valid entity is a graph where every record is connected to at least one other via a matching rule.

These edges are kept as a simple list, but could alternatively be modeled using an adjacency list structure for more efficient storage.

Hence, retaining all edges in an entity graph serves multiple purposes: Traceability: allows the user to understand why two records were grouped into the same entity.

Solution: Clique-Based Graph Compression (CBGC). A clique in a graph is a group of nodes where every node is connected to every other node in…
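For intuition, a small sketch with the networkx library (the toy edges are invented, not Tilores data) enumerates the maximal cliques that such a compression scheme would store instead of individual edges:

import networkx as nx

# Toy entity graph: records a-e connected by matching rules.
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("c", "e")])

cliques = list(nx.find_cliques(G))            # maximal cliques of the graph
print(cliques)

# A clique of n nodes can be stored as one member list instead of n*(n-1)/2 edges.
for clique in cliques:
    n = len(clique)
    print(clique, "replaces", n * (n - 1) // 2, "edges")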

1 day, 4 hours ago @ towardsdatascience.com
Parquet File Format – Everything You Need to Know!

Parquet file format in a nutshell: Before I show you the ins and outs of the Parquet file format, there are (at least) five main reasons why Parquet is considered a de facto standard for storing data nowadays: Data compression – by applying various encoding and compression algorithms, the Parquet file format provides reduced memory consumption. Columnar storage – this is of paramount importance in analytic workloads, where fast data read operation is the key requirement.
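As a small illustration of those two points (a sketch assuming pandas with the pyarrow engine installed), a DataFrame can be written with compression and read back column by column:

import pandas as pd

df = pd.DataFrame({
    "user_id": range(1_000),
    "country": ["DE", "US", "RS", "FR"] * 250,
    "revenue": [round(i * 0.37, 2) for i in range(1_000)],
})

# Data compression: snappy is the default codec for Parquet in pandas/pyarrow.
df.to_parquet("sales.parquet", compression="snappy")

# Columnar storage: read back only the columns the analysis actually needs.
subset = pd.read_parquet("sales.parquet", columns=["country", "revenue"])
print(subset.head())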

However, to understand the benefits of using the Parquet file format, we first need to draw the line between the row-based and colu…

1 day, 11 hours ago @ towardsdatascience.com
Survival Analysis When No One Dies: A Value-Based Approach

Right censoring is stats-speak for “we don’t know what happened after a certain point” and it’s a big deal in survival analysis.

Using Value Kaplan-Meier in practice: If you take the Value Kaplan-Meier estimator for a spin on real-world data and compare it to the good old Kaplan-Meier curve, you’ll likely notice something comforting — they often have the same shape.
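For the "good old Kaplan-Meier curve" used as the baseline in that comparison, a minimal sketch with the lifelines library (toy durations and censoring flags, not the article's data) looks like this; the value-weighted variant is the author's extension and is not reproduced here.

from lifelines import KaplanMeierFitter

# Durations in months; event=0 marks right-censored customers
# (still active when the observation window ended).
durations = [2, 5, 5, 8, 12, 12, 14, 20, 24, 24]
events = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=events)
print(kmf.survival_function_)       # estimated S(t) at the observed times
print(kmf.median_survival_time_)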

But here’s where things get interesting: Value Kaplan-Meier usually sits a bit above its traditional cousin.

In such cases, Value Kaplan-Meier offers a natural extension.

Value Kaplan-Meier tends to provide a higher estimate of retained value compared to Kaplan-Meier, due to its ability to account for recoveries.

2 days, 6 hours ago @ towardsdatascience.com
Get Started with Rust: Installation and Your First CLI Tool – A Beginner’s Guide

Installing Rust — step by step: Regardless of the operating system, it is quite easy to install Rust thanks to the official installer rustup, which is available for free on the Rust website.

Press the Enter key to run the standard installation to install Rust.

After successful installation, Rust should automatically be available in your PATH .

To install Rust, the following steps must be carried out: 1. Open the terminal, for example, with the key combination Ctrl + Alt + T. 2. Run the following command: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

With these few steps, you have just created your first successful Rust project, which we can build on in the…

2 days, 10 hours ago @ towardsdatascience.com
Non-Parametric Density Estimation: Theory and Applications

Hopefully after reading this article, you leave with an appreciation of density estimation as a fundamental statistical tool, and a solid intuition behind the density estimation approaches we discuss here.

As a result, non-parametric density estimates will typically have lower bias and higher variance compared to parametric density estimates.

Next, we’ll look at an alternative method for density estimation, Kernel Density Estimation, that addresses these shortcomings.

Kernel Density Estimators (KDE). Naive Density Estimator: We’ll first look at the most basic form of a kernel density estimator, the naive density estimator.
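As a minimal sketch of kernel density estimation in Python (assuming SciPy; the article's own naive estimator is not reproduced here), a Gaussian KDE can be fit to a sample and evaluated on a grid:

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
# Bimodal sample drawn from a mixture of two Gaussians.
samples = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 700)])

kde = gaussian_kde(samples)          # bandwidth set by Scott's rule by default
grid = np.linspace(-5, 5, 200)
density = kde(grid)                  # estimated density at each grid point
print(float(grid[np.argmax(density)]))  # location of the estimated mode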

This distinction brings up an important point that “more flexible densit…

2 days, 10 hours ago @ towardsdatascience.com
Rethinking the Environmental Costs of Training AI — Why We Should Look Beyond Hardware

Methods: The paper used a dataset called Notable AI Models from Epoch AI (Epoch AI, 2025), a research institute that investigates the trends of AI development.

The candidate predictors included Parameters, Training Compute, Dataset Size, Training Time, Hardware Quantity, and Hardware Type.

Training Time didn’t fall neatly into either category but was included due to its central role in training AI models.

Histogram of Energy Efficiency (FLOPS/W). The distribution of energy efficiency was highly skewed.

The table below summarizes the tested models and their respective AIC scores:
Model | Predictors | AIC
5 | Training Time + Hardware Quantity + Hardware Type | 350.78
6 | Training Time + Hardware Quantity + …

2 days, 11 hours ago @ towardsdatascience.com
TDS Authors Can Now Receive Payments Via Stripe

We launched the TDS Author Payment Program back in February, and have since announced a major update—with a far more inclusive earning tier—just last month.

Today we’re happy to share that the Program, which has already seen dozens of authors receive their first payout, is now a lot more streamlined and user-friendly thanks to a new Stripe integration.

Starting this week, both existing and new authors can connect their Stripe account to their author profile on our Contributor Portal.

We’ve already contacted all current participants in the Author Payment Program with instructions on next steps, and everyone joining TDS as a contributor (hey, that could be you!)

We look forward to more and mo…

2 days, 12 hours ago @ towardsdatascience.com
Distill.pub
last post: None
The Gradient
last post 6 months ago
Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research

Mathematics and statistics, once the primary guides of machine learning research, now struggle to provide immediate insight into the latest breakthroughs.

This shift has prompted speculation about mathematics’ diminished role in machine learning research moving forward.

It is also the way that symmetries are usually leveraged when performing computations (for example, in machine learning).

One can reasonably argue that diagrammatic descriptions of well-known constructions, like products, are not useful for the machine learning researcher.

However, as we’ve demonstrated, while mathematics may not maintain the same role in machine learning research that it has held in the past, the success of…

6 months ago @ thegradient.pub
What's Missing From LLM Chatbots: A Sense of Purpose

Let's jump back to the 1970s, when Roger Schank introduced his "restaurant script" as a kind of dialogue system [1].

The minimum requirement we could have for a dialogue system is that it can stay on the task we gave it.

Concluding remarks: I have reviewed the making of current LLM dialogue systems, and how and why they are insufficient.

The following are two research questions that I’m most excited about: (1) Better monitoring and control of dialogue systems with steering techniques.

Citation: For attribution of this in academic contexts or books, please cite this work as: Kenneth Li, "From prediction to purpose: a tutorial on LLM dialogue system", The Gradient, 2024.

8 months, 1 week ago @ thegradient.pub
We Need Positive Visions for AI Grounded in Wellbeing

This leads to our second conclusion: We need plausible positive visions of a society with capable AI, grounded in wellbeing.

The rest of this post describes in more detail (1) what we mean by AI that benefits our wellbeing, (2) the need for positive visions for AI grounded in wellbeing, and (3) concrete leverage points to aid in the development and deployment of AI in service of such positive visions.

In diving into the philosophy of flourishing, wellbeing economics, or psychological theories of human wellbeing, one encounters many interesting, compelling, but seemingly incompatible ideas.

The case so far is that we need positive visions for society with capable AI, grounded in individual a…

9 months, 2 weeks ago @ thegradient.pub
TheSequence
last post 19 hours ago
The Sequence Opinion #542 : Some Ideas About the Future of MCP

Created Using GPT-4o. I’ve been spending a lot of time on Model Context Protocol (MCP) in the last few weeks and wanted to share some ideas.

MCP has quickly become a cornerstone in enabling AI models—particularly large language models (LLMs) and autonomous agents—to interact with external tools, APIs, and data sources through a standardized interface.

Much like USB-C revolutionized device connectivity, MCP aims to standardize how models access and apply contextual information.

We're beginning to see MCP evolve into a foundational layer for distributed AI systems.

We look at five key areas: federated discovery, decentralized MCP networks, persistent and transferable context, trust and cryptogr…

19 hours ago @ thesequence.substack.com
The Sequence Engineering #541: Llama Firewall is the LLM Security Framework We Should All be Using

Meta's LlamaFirewall is an open-source guardrail framework designed to serve as a final layer of defense against various security risks that come with deploying AI agents.

It addresses challenges such as prompt injection, agent misalignment, and unsafe code generation, providing developers with the necessary tools to build robust and secure AI systems.

Prompt Injection Detection: LlamaFirewall includes PromptGuard 2, a state-of-the-art jailbreak detection engine.

It effectively identifies and blocks prompt injection attempts, ensuring malicious inputs do not alter or exploit the model's behavior.

It evaluates code outputs from AI agents and flags potentially harmful patterns, ensuring code sa…

1 day, 19 hours ago @ thesequence.substack.com
The Sequence Knowledge #540 : Learning About Instruction Following Benchmarks

💡 AI Concept of the Day: Instruction Following Benchmarks. Instruction-following benchmarks have become a cornerstone for evaluating the capabilities of large language models (LLMs) in recent years.

Unlike traditional benchmarks focused purely on accuracy, instruction-following evaluations often require a combination of linguistic understanding, reasoning, and alignment.

Among the most prominent benchmarks in this space is MT-Bench, a multi-turn benchmark developed by LMSYS.

MT-Bench comprises multi-turn questions across diverse domains and uses both human and LLM-as-a-judge scoring to assess models on coherence, helpfulness, and correctness.

Models are presented with the same instruction and the…

2 days, 19 hours ago @ thesequence.substack.com
The Sequence Radar #539: Keep an Eye on Alibaba’s ZeroSearch

📝 Editorial: Keep an Eye on Alibaba’s ZeroSearch. Alibaba has been one of the AI labs pushing the boundaries of frontier models with its Qwen model family.

A few days ago, Alibaba released ZeroSearch, which marks a key technical advancement in retrieval-augmented training for language models.

The framework introduces a self-supervised paradigm where large language models (LLMs) can simulate search engine behavior, removing dependence on commercial APIs such as Google Search.

ZeroSearch challenges a core assumption in modern LLM training: that high-quality external search queries are necessary for effective information retrieval and question answering…

4 days, 19 hours ago @ thesequence.substack.com
The Sequence Research #538: DeepSeek-Prover-V2: Meet the New Addition to the DeepSeek Family

Created Using GPT-4o. Theorem proving is one of the cornerstones of mathematics, and one that is key if we want AIs to help advance science.

DeepSeek-Prover-V2 marks a significant leap in neural theorem proving by tightly integrating informal reasoning with formal proof synthesis.

Developed by DeepSeek-AI, this model narrows the gap between intuitive natural language reasoning and the syntactic rigor of formal proof assistants like Lean 4.

At its core, the system uses DeepSeek-V3 to decompose a complex theorem into intermediate subgoals, expressed in both natural language and formal Lean syntax.

This recursive, hybrid reasoning pipeline enables scalable synthesis of formal proofs and establish…

6 days, 19 hours ago @ thesequence.substack.com
The Sequence Opinion #537: The Rise and Fall of Vector Databases in the AI Era

In this new gen AI era, few technologies have experienced a surge in interest and scrutiny quite like vector databases.

Designed to store and retrieve high-dimensional vector embeddings—numerical representations of text, images, and other unstructured data—vector databases promised to underpin the next generation of intelligent applications.

This essay examines the meteoric rise and subsequent repositioning of vector databases.

Finally, we contrast the trajectory of vector databases with the lasting success of the NoSQL movement to better understand why vector databases, despite their value, struggled to sustain their standalone identity.

The Emergence of Vector Databases

1 week ago @ thesequence.substack.com
The Sequence Engineering #536: Unbody is the All-In Framework for Building AI Applications

Created Using GPT-4o. In today’s engineering section, we would like to cover a super useful framework for building AI applications that still flies surprisingly under the radar.

Unbody is a modular, open-source backend framework purpose-built for AI-native application development.

Often likened to the "Supabase of AI," Unbody streamlines data ingestion, vector storage, model interaction, and API exposure into a unified stack.

This essay provides a technical exploration of Unbody's architecture, principal capabilities, and implementation guidance, highlighting how it simplifies building production-grade AI systems.

Architectural Overview

1 week, 1 day ago @ thesequence.substack.com
The Sequence Knowledge #535: Coding Benchmarks

Today we will discuss: An overview of coding benchmarks.

In that sense, coding benchmarks have become crucial tools for evaluating and comparing the capabilities of AI models in software development tasks.

As AI continues to revolutionize the software development process, the importance of robust and comprehensive benchmarks cannot be overstated.

In recent years, we have seen an explosion of AI coding benchmarks.

Below, I’ve listed some of the evals that are typically cited in foundation model papers to highlight coding capabilities:

1 week, 2 days ago @ thesequence.substack.com
The Sequence Radar #534: The Leaderboard Illusion: The Paper that Challenges Arena-Based AI Evaluations

Through analysis of 2 million battles across 243 models and 42 providers, the paper uncovers significant systemic bias in Arena's evaluation pipeline.

A small cohort of proprietary model providers have gained a structural advantage via undisclosed private testing, score retraction privileges, preferential sampling, and asymmetrical model deprecation.

In effect, access to Arena data is becoming a form of leaderboard insider trading.

For initiatives like LMArena that aspire to open and principled AI evaluation, The Leaderboard Illusion reads like both a warning and a blueprint.

🔎 AI Research: The Leaderboard Illusion. In the paper The Leaderboard Illusion, researchers from Cohere Labs and academi…

1 week, 4 days ago @ thesequence.substack.com
The Sequence Research #533: NVIDIA's Nemotron Models and the OpenMathReasoning Dataset Kill it in the AI Math Olympiad

Created Using GPT-4o. If you follow this newsletter, you know I am obsessed with AI in math.

Today, I would like to discuss one of the newest advancements in this area.

In April 2025, NVIDIA reached a significant milestone in AI by securing first place in the AI Mathematical Olympiad – Progress Prize 2 (AIMO-2).

This achievement was driven by two advanced large language models: OpenMath-Nemotron-32B and OpenMath-Nemotron-14B-Kaggle.

These models, trained on the OpenMathReasoning dataset, establish new state-of-the-art capabilities in mathematical reasoning.

1 week, 6 days ago @ thesequence.substack.com
The Sequence Opinion #533: Advancing AI Research : One of the Primitives of Superintelligence

Created using GPT-4o. We have all heard the idea of AI improving itself to achieve superintelligence, but what are the more practical manifestations of that thesis?

One of the most provocative ideas in AI today is the thesis that artificial intelligence systems may soon play a central role in accelerating AI research itself.

In its most extreme form, this feedback loop is hypothesized to culminate in artificial general intelligence (AGI) and, eventually, superintelligence: systems that vastly exceed human capabilities across virtually all cognitive domains.

This essay articulates the core thesis that AI-facilitated AI research could unlock superintelligence, outlines key technical milestones demo…

2 weeks ago @ thesequence.substack.com
The Sequence Engineering #533 : Inside The Llama Stack

Created Using GPT-4o. In today’s engineering section, I would like to dive into one of the most intriguing gen AI stacks in recent months.

The Llama Stack, developed and open-sourced by Meta, marks a major advance in building, deploying, and scaling generative AI applications.

Featuring a modular, layered architecture and a strong emphasis on standardization, Llama Stack addresses key obstacles that have traditionally slowed AI adoption.

This essay delves into the architecture, technical capabilities, and broader impact of the Llama Stack on the generative AI ecosystem.

Llama Stack was designed to standardize and simplify these processes, offering a unified set of APIs and modular components …

2 weeks, 1 day ago @ thesequence.substack.com
The Sequence Knowledge #532: Understanding Function Calling Benchmarks

Created Using GPT-4o. Today we will discuss: An introduction to function calling benchmarks.

A review of Berkeley’s function calling benchmarks.

💡 AI Concept of the Day: Understanding Function Calling Benchmarks. AI agents are everywhere and they need to use tools.

Function calling benchmarks evaluate AI models' ability to perform tasks that require structured outputs, such as invoking APIs or executing predefined functions.
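As a minimal, hypothetical sketch of what such an evaluation checks (standard-library Python only; the tool schema and model output below are invented, not taken from BFCL or NFCL), a model's emitted call must parse as JSON, name the right function, and supply required arguments of the right types:

import json

# Hypothetical tool schema the model is expected to call.
schema = {"name": "get_weather", "required": {"city": str, "unit": str}}

model_output = '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}'

def is_valid_call(raw, schema):
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if call.get("name") != schema["name"]:
        return False
    args = call.get("arguments", {})
    # Every required argument must be present with the expected type.
    return all(isinstance(args.get(k), t) for k, t in schema["required"].items())

print(is_valid_call(model_output, schema))  # True for this example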

Benchmarks like the Berkeley Function Calling Leaderboard (BFCL) and Nexus Function Calling Leaderboard (NFCL) have emerged as standard measures for such evaluations.

2 weeks, 2 days ago @ thesequence.substack.com
📝 Guest Post: Introducing DeepSearcher – A Local Open Source Deep Research

The example in our previous post, in contrast to OpenAI’s Deep Research, ran locally, using only open-source models and tools like Milvus and LangChain.

In this post, we build upon our previous post and present Zilliz’s DeepSearcher open-source project.

It is presented as a Python library and command-line tool rather than a Jupyter notebook and is more fully-featured than our previous post.

Question: {question}
"""
sections.append(common_prompt)
# data set prompt
data_set = []
for i, collection_name in enumerate(collection_names):
    data_set.append(f"{collection_name}: {collection_descriptions[i]}")
data_set_prompt = f"""The following is all the data set information.

Conditional Repeat: Unlike …

2 weeks, 3 days ago @ thesequence.substack.com
The Sequence Radar #531: The Need for AI Interpretability

The opinion section dives into the thesis of how AI creating AI R&D can lead to superintelligence.

📝 Editorial: The Need for AI Interpretability. Anthropic’s CEO Dario Amodei is one of the most creative thinkers in AI.

This was the case of his most recent essay "The Urgency of Interpretability," in which he makes a strong case for putting interpretability at the center of modern AI development.

As AI systems become core components of finance, healthcare, and national security, Amodei argues it's no longer okay to treat them like black boxes.

Ultimately, Amodei wants interpretability to move from a niche research topic to a core part of how we build and…

2 weeks, 4 days ago @ thesequence.substack.com
Synced Review
last post 1 month ago
DeepSeek Signals Next-Gen R2 Model, Unveils Novel Approach to Scaling Inference with SPCT

DeepSeek AI, a prominent player in the large language model arena, has recently published a research paper detailing a new technique aimed…Continue reading on SyncedReview »

1 month ago @ medium.com
Automating Artificial Life Discovery: The Power of Foundation Models

The recent Nobel Prize for groundbreaking advancements in protein discovery underscores the transformative potential of foundation models…Continue reading on SyncedReview »

4 months, 2 weeks ago @ medium.com
Llama 3 Meets MoE: Pioneering Low-Cost High-Performance AI

Continue reading on SyncedReview »

4 months, 2 weeks ago @ medium.com
DeepMind’s JetFormer: Unified Multimodal Models Without Modelling Constraints

Recent advancements in training large multimodal models have been driven by efforts to eliminate modeling constraints and unify…Continue reading on SyncedReview »

4 months, 2 weeks ago @ medium.com
NVIDIA’s nGPT: Revolutionizing Transformers with Hypersphere Representation

The Transformer architecture, introduced by Vaswani et al. in 2017, serves as the backbone of contemporary language models. Over the years…Continue reading on SyncedReview »

4 months, 3 weeks ago @ medium.com
From Token to Conceptual: Meta Introduces Large Concept Models in Multilingual AI

Large Language Models (LLMs) have become indispensable tools for diverse natural language processing (NLP) tasks. Traditional LLMs operate…Continue reading on SyncedReview »

4 months, 4 weeks ago @ medium.com
NVIDIA’s Hybrid: Combining Attention and State Space Models for Breakthrough Performance of Small…

Language models (LMs) based on transformers have become the gold standard in natural language processing, thanks to their exceptional…Continue reading on SyncedReview »

5 months ago @ medium.com
From Response to Query: The Power of Reverse Thinking in Language Models

Continue reading on SyncedReview »

5 months ago @ medium.com
Yann LeCun Team’s New Research: Revolutionizing Visual Navigation with Navigation World Models

Navigation is a fundamental skill for any visually-capable organism, serving as a critical tool for survival. It enables agents to locate…Continue reading on SyncedReview »

5 months, 1 week ago @ medium.com
The Future of Vision AI: How Apple’s AIMV2 Leverages Images and Text to Lead the Pack

The landscape of vision model pre-training has undergone significant evolution, especially with the rise of Large Language Models (LLMs)…Continue reading on SyncedReview »

5 months, 1 week ago @ medium.com
Redefining Music AI: The Power of Sony’s SoniDo as a Versatile Foundation Model

A foundation model refers to a pre-trained model developed on extensive datasets, designed to be versatile and adaptable for a range of…Continue reading on SyncedReview »

5 months, 1 week ago @ medium.com
DeepMind’s Socratic Learning with Language Games: The Path to Self-Improving Superintelligence

Continue reading on SyncedReview »

5 months, 2 weeks ago @ medium.com
Revolutionizing AI on a Budget: Apple’s Roadmap for Small Language Models Training Success

While large language models (LLMs) dominate the AI landscape, Small-scale Large Language Models (SLMs) are gaining traction as…Continue reading on SyncedReview »

5 months, 2 weeks ago @ medium.com
“Redefines Consistency Models”: OpenAI’s TrigFlow Narrows FID Gap to 10% with Efficient Two-Step…

Consistency models (CMs) are a cutting-edge class of diffusion-based generative models designed for rapid and efficient sampling. However…Continue reading on SyncedReview »

5 months, 2 weeks ago @ medium.com
Precision in Pixels: NVIDIA’s Edify Image Model Combines High Quality with Unmatched Control

The field of text-to-image synthesis has advanced rapidly, with state-of-the-art models now generating highly realistic and diverse images…Continue reading on SyncedReview »

5 months, 3 weeks ago @ medium.com
📓 Cool Blogs
ODS.ai Habr
last post 1 month, 2 weeks ago
Bayesian Dog: Analyzing the Canine Compass

", I thought. And, luckily, I had the perfect test subject right at hand.

The standard arithmetic mean of 360° and 0° gives us 180°, even though both 360° and 0° point in the same direction.
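A short sketch (NumPy only, with toy angles rather than the post's data) shows the failure of the ordinary average and how the circular mean fixes it:

import numpy as np

angles_deg = np.array([350.0, 0.0, 10.0])    # all roughly pointing "north"
print(angles_deg.mean())                      # 120.0 -- misleading arithmetic mean

# Circular mean: average the unit vectors, then take the angle of the result.
radians = np.deg2rad(angles_deg)
mean_angle = np.arctan2(np.sin(radians).mean(), np.cos(radians).mean())
print(np.rad2deg(mean_angle) % 360)           # 0.0 -- the sensible answer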

The null hypothesis states that the data are distributed uniformly around the circle; the alternative hypothesis, that they are not.

from pingouin import circ_vtest

v, pval = circ_vtest(data['radians'], dir=np.pi)
print(f"V-statistics: {v:.3f}; p-value: {pval:.6f}")
>> V-statistics: 24.127; p-value: 0.002904
Now we are getting to something interesting!

Prior distribution and likelihood function. Suppose we have: a prior distribution with parameters; a likelihood function for a new observation with p…

1 month, 2 weeks ago @ habr.com
Creating Memories: Mastering FLUX, LoRA, and ComfyUI

Such models can be trained from scratch, but that is expensive: you need a GPU cluster and a lot of data.

In the text-to-image domain there are open models, such as Stable Diffusion, Kandinsky, and FLUX, and closed ones, such as DALL-E. An open model can be fine-tuned in various ways.

Boris Strugatsky. Specifics: for figures like the Strugatskys or Brodsky, there are very few high-quality photographs, but you don't need many.

A phrase will work too.

Vladimir Surdin, Alexey Semikhatov. Videos of their lectures can be found everywhere; you can start with the Vselennaya Plus channel on YouTube and Telegram.

4 months, 1 week ago @ habr.com
How Neural Networks, RL, and Bayesian Optimization Came to Be Used at Charged-Particle Accelerators

One of them is maintaining a stable orbit of the particle beam (the trajectory along which it moves), which is critically important for the accuracy of experiments.

Incidentally, here is our paper on how seismic vibrations will affect the beam orbit at SKIF: Beam Stability.

Classical approaches to beam orbit stabilization in accelerators: to stabilize the orbit, beam position monitors (BPMs) and corrector magnets are used.

According to the authors, the advantages are as follows: faster orbit correction and higher accuracy compared to classical methods such as SVD.

The agent's task: automatically restore the charged-particle beam orbit within a limited time and with a minimal numb…

4 months, 3 weeks ago @ habr.com
o1: Why OpenAI's New GPT Is Not Hype but a Transition to a New Paradigm in AI

In this article, we will figure out what the new GPT o1 has learned and how this will affect the further evolution of AI.

The company claims that, for them, resetting the model line's counter to one marks a transition to a new paradigm, and that this network demonstrates an entirely new level of AI capability.

On forums and on Twitter there was a flood of discussion, anticipation, and hype, against which some people's expectations shot sky-high.

Bloomberg reported that during an internal demonstration OpenAI presented a five-level framework that helps track progress toward building AI.

However, at the GPT-5 level the gain in capabilities may be quite different (for better or for worse).

8 months ago @ habr.com
Big Black Boxes: What Do We Know About How Neural Networks "Think"?

In both cases, the explanation of the action is not tied to the real motive for taking it; in both cases a fake (but plausible-sounding) explanation of the reasons is produced.

It is simply not taken seriously right now, because LLMs are not yet widespread and have not become the core of decision-making business processes.

The same prompt-plus-response text is fed into the model, and the probability of obtaining exactly that response given the fixed prompt is estimated.

This includes the desire to keep existing/living, the unwillingness to die, and reasoning about emotions and control.

Because of abstraction, because of generalization, because that is exactly what we value these models for.

8 months, 1 week ago @ habr.com
How to Set Up an A/B Testing Process on a Shoestring

In it, the authors identify four maturity stages; roughly, companies can be grouped by how often they launch experiments: at the Crawl stage a company runs an experiment about once a month (roughly 10 experiments a year); at the Walk stage, once a week (roughly 50 experiments a year); at the Run stage, daily (roughly 250 experiments a year); at the Fly stage, more than 1,000 experiments a year.

A sharper division is based on several important properties of a company, such as: having an experimentation-platform team, the ability to compute metrics automatically, how widespread experimentation is across the company, how self-sufficient product teams are in running experiments, the impact of experiments on the company, and more.

Point …

9 months ago @ habr.com
Introduction to MLflow

MLflow Experiments and MLflow Runs. MLflow Experiments and MLflow Runs are the main abstractions for structuring a project.

mlflow run mlproject --entry-point hyperparameters-tuning --env-manager conda --experiment-name Cancer_Classification --run-name Hyperparameters_Search -P n-trials=10
Let's look at the results in the MLflow UI.

artifact_path: model
flavors:
  python_function:
    data: model.xgb
    env:
      conda: conda.yaml
      virtualenv: python_env.yaml
    loader_module: mlflow.xgboost
    python_version: 3.11.4
  xgboost:
    code: null
    data: model.xgb
    model_class: xgboost.sklearn.XGBClassifier
    model_format: xgb
    xgb_version: 2.0.3
mlflow_version: 2.14.2
model_size_bytes: 35040
model_uuid: 516954aae7c94e91adeed9df76cb405…

9 months, 2 weeks ago @ habr.com
Machine Learning Mastery
last post: None
ML in Production
last post: None
Sorta Insightful
last post 1 month, 2 weeks ago
Who is AI For?

I think the easy answer to this question is that right now, AI is for the AI developers.

Code is useful, it makes money, it is a testbed for AI speeding up the development of AI, and it is easy.

I’m working in AI because it pays well and is potentially really good for the world.

The artists did not know what AI was, but when they learned, they quickly decided they did not want it.

It feels like the most likely outcome is that people go all-in on pushing raw intelligence, in the way that AI developers can measure it, leaving behind those that are not like AI developers.

1 month, 2 weeks ago @ alexirpan.com
MIT Mystery Hunt 2025

This has spoilers for MIT Mystery Hunt 2025.

I enjoyed it more than their 2018 Hunt, which is commonly cited as an all-time good Mystery Hunt.

In this Mystery Hunt it was reversed, where the act of unlocking is easy but the value and difficulty of a feeder varied.

In my free time pre-Hunt, I went to Puzzled Pint, where I tried to all-brain a logic puzzle (solve it without writing anything).

I’m looking forward to solving “No Assembly Required” in Mystery Hunt 2026, a puzzle that gives you the answer for no work.

3 months, 2 weeks ago @ alexirpan.com
Using AI to Get the Neopets Destruct-o-Match Avatar

If AI can be superhuman at Go, surely AI can be slightly-worse-than-experts at Destruct-o-Match if we try?

Step 0: Is Making a Destruct-o-Match AI Against Neopets Rules?

I believe the precedent is in favor of a Destruct-o-Match AI being okay.

As long as I’m the one inputting moves the Destruct-o-Match AI recommends, I should be okay.

To write a game AI, we first need to implement the rules of the game in code.
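As a rough illustration of that first step, here is a minimal sketch of the core rule (clearing connected same-colored groups) under a simple grid-of-strings board representation; scoring and other real Destruct-o-Match details are deliberately left out, and the helper names are hypothetical.

from collections import deque

def find_groups(board):
    # Return all orthogonally connected groups of 2+ same-colored cells.
    # `board` is a list of rows; None marks an empty cell.
    rows, cols = len(board), len(board[0])
    seen, groups = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or board[r][c] is None:
                continue
            color, group, queue = board[r][c], [], deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                group.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen and board[nr][nc] == color:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            if len(group) >= 2:
                groups.append(group)
    return groups

# A greedy baseline move: always clear the largest available group.
board = [["R", "R", "B"], ["B", "B", "B"], ["R", "G", "G"]]
print(max(find_groups(board), key=len))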

4 months назад @ alexirpan.com
Late Takes on OpenAI o1
Late Takes on OpenAI o1 Late Takes on OpenAI o1

I realize how late this is, but I didn’t get a post out while o1 was fresh, and still feel like writing one despite it being cold.

(Also, OpenAI just announced they’re going to ship new stuff starting tomorrow so it’s now or never to say something.)

OpenAI o1 is a model release widely believed (but not confirmed) to be a post-trained version of GPT-4o.

If true, that makes this video especially useful for understanding OpenAI o1.

Which I suppose is part of why I’m talking about o1 rather than building o1.

5 months, 1 week назад @ alexirpan.com
Nine Years Later
Nine Years Later Nine Years Later

I expected to fill that void with more blog writing, but that’s not what happened.

The puzzles are great though, and if that’s good enough for you, I had fun with that.

Undertale Yellow is a fantastic fan game that's been in development for 7 years and comes out feeling like a canon entry made by Toby Fox.

15837 2024-01-11-ai-timelines-2024.markdown
1939 2024-01-21-mh-2024.markdown
5076 2024-03-23-crew-battle.markdown
826 2024-04-30-puzzlehunting-201.markdown
8641 2024-07-08-tragedies-of-reality.markdown

9 months назад @ alexirpan.com
I'm Switching Into AI Safety
I'm Switching Into AI Safety I'm Switching Into AI Safety

There’s often a conflation between the research field of AI safety and the community of AI safety.

Me thinking AI safety is important is not an endorsement for or against anything else in the broader meme space it came from.

Historically, AI safety work did not appeal to me because of how theoretical it was.

I’m aware of the arguments that most AI safety work so far has either been useless or not that different from broader AI work.

Those who care about safety a lot call this safetywashing, the stapling of “safety” to work that does not advance safety.

9 months, 1 week назад @ alexirpan.com
Lil'Log
последний пост None
inFERENCe
последний пост None
The Spectator
последний пост None
Off the Convex Path
последний пост None
fast.ai NLP fast.ai NLP
последний пост None
Sebastian Ruder
последний пост None
Andrew Karpathy blog
последний пост None
大トロ 大トロ
последний пост None
🔬 Science
Papers With Code Papers With Code
последний пост 23 минуты назад
/luisfelipewb/ Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests
/luisfelipewb/ Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests /luisfelipewb/ Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests

Despite significant advancements in Deep Reinforcement Learning (DRL) for Autonomous Surface Vehicles (ASVs), their robustness in real-world conditions, particularly under external disturbances, remains insufficiently explored.

In this paper, we evaluate the resilience of a DRL-based agent designed to capture floating waste under various perturbations.

We train the agent using domain randomization and evaluate its performance in real-world field tests, assessing its ability to handle unexpected disturbances such as asymmetric drag and an off-center payload.

We assess the agent's performance under these perturbations in both simulation and real-world experiments, quantifying performance degr…

23 минуты назад @ paperswithcode.com
/jiepku/ A Unified and Scalable Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability
/jiepku/ A Unified and Scalable Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability /jiepku/ A Unified and Scalable Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability

Self-supervised learning shows promise in harnessing extensive unlabeled data, but it also confronts significant privacy concerns, especially in vision.

In this paper, we perform membership inference on visual self-supervised models in a more realistic setting: self-supervised training method and details are unknown for an adversary when attacking as he usually faces a black-box system in practice.

In this setting, considering that self-supervised model could be trained by completely different self-supervised paradigms, e.g., masked image modeling and contrastive learning, with complex training details, we propose a unified membership inference method called PartCrop.

It is motivated by the…

23 минуты назад @ paperswithcode.com
/gta0804/ MASS: Multi-Agent Simulation Scaling for Portfolio Construction
/gta0804/ MASS: Multi-Agent Simulation Scaling for Portfolio Construction /gta0804/ MASS: Multi-Agent Simulation Scaling for Portfolio Construction

LLM-based multi-agent systems have gained significant attention for their potential in simulation and performance enhancement.

However, existing works are limited to pure simulations or are constrained by predefined workflows, restricting their applicability and effectiveness.

In this paper, we introduce the Multi-Agent Scaling Simulation (MASS) for portfolio construction.

We demonstrate its superiority through performance experiments, ablation studies, backtesting experiments, experiments on updated data and stock pools, scaling experiments, parameter sensitivity experiments, and visualization experiments, conducted in comparison with 6 state-of-the-art baselines on 3 challenging A-share stock pools.…

23 минуты назад @ paperswithcode.com
/jeming-creater/ HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation
/jeming-creater/ HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation /jeming-creater/ HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation

Multimodal medical image segmentation faces significant challenges in the context of gastric cancer lesion analysis.

This clinical context is defined by the scarcity of independent multimodal datasets and the imperative to amalgamate inherently misaligned modalities.

As a result, algorithms are constrained to train on approximate data and depend on application migration, leading to substantial resource expenditure and a potential decline in analysis accuracy.

Second, we introduce HWA-UNETR, a novel 3D segmentation framework that employs an original HWA block with learnable window aggregation layers to establish dynamic feature correspondences between different modalities' anatomical structu…

23 минуты назад @ paperswithcode.com
/kairawal/ Evaluating Model Explanations without Ground Truth
/kairawal/ Evaluating Model Explanations without Ground Truth /kairawal/ Evaluating Model Explanations without Ground Truth

There can be many competing and contradictory explanations for a single model prediction, making it difficult to select which one to use.

Current explanation evaluation frameworks measure quality by comparing against ideal "ground-truth" explanations, or by verifying model sensitivity to important inputs.

We outline the limitations of these approaches, and propose three desirable principles to ground the future development of explanation evaluation strategies for local feature importance explanations.

We propose a ground-truth Agnostic eXplanation Evaluation framework (AXE) for evaluating and comparing model explanations that satisfies these principles.

Unlike prior approaches, AXE does not…

23 минуты назад @ paperswithcode.com
/mathllm/ MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
/mathllm/ MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning /mathllm/ MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning

Natural language image-caption datasets, widely used for training Large Multimodal Models, mainly focus on natural scenarios and overlook the intricate details of mathematical figures that are critical for problem-solving, hindering the advancement of current LMMs in multimodal mathematical reasoning.

Specifically, we co-develop our image-to-code model and dataset with a model-in-the-loop approach, resulting in FigCodifier, an image-to-code model, and ImgCode-8.6M, the largest image-code dataset to date.

Furthermore, we utilize FigCodifier to synthesize novel mathematical figures and then construct MM-MathInstruct-3M, a high-quality multimodal math instruction fine-tuning dataset.

Fina…

23 минуты назад @ paperswithcode.com
/welldonepf/ Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis
/welldonepf/ Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis /welldonepf/ Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis

Recent advancements have enhanced the capability of Multimodal Large Language Models (MLLMs) to comprehend multi-image information.

However, existing benchmarks primarily evaluate answer correctness, overlooking whether models genuinely comprehend the visual input.

To address this, we define implicit visual misunderstanding (IVM), where MLLMs provide correct answers without fully comprehending the visual input.

Through our analysis, we decouple the visual and textual modalities within the causal attention module, revealing that attention distribution increasingly converges on the image associated with the correct answer as the network layers deepen.

Attention accuracy directly evaluates the…

23 минуты назад @ paperswithcode.com
/lyc127/ Rethinking Repetition Problems of LLMs in Code Generation
/lyc127/ Rethinking Repetition Problems of LLMs in Code Generation /lyc127/ Rethinking Repetition Problems of LLMs in Code Generation

With the advent of neural language models, the performance of code generation has been significantly boosted.

Previous work has primarily focused on content repetition, which is merely a fraction of the broader repetition problem in code generation.

In this paper, we formally define structural repetition and propose an efficient decoding approach called RPG, which stands for Repetition Penalization based on Grammar, to alleviate the repetition problems in code generation for LLMs.

Specifically, RPG first leverages grammar rules to identify repetition problems during code generation, and then strategically decays the likelihood of critical tokens that contribute to repetitions, thereby mitig…
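The grammar-based detection is the paper's core contribution and is not reproduced here, but the "decay the likelihood of critical tokens" step amounts to a targeted logit penalty; below is a minimal, generic sketch of that idea (an illustration, not the authors' implementation).

import torch

def decay_flagged_logits(logits: torch.Tensor, flagged_token_ids, decay: float = 0.5) -> torch.Tensor:
    # Down-weight tokens flagged (e.g., by grammar analysis) as driving a repetition loop.
    penalized = logits.clone()
    for tok in flagged_token_ids:
        if penalized[tok] > 0:
            penalized[tok] *= decay   # shrink positive logits
        else:
            penalized[tok] /= decay   # push negative logits further down
    return penalized

next_token = torch.argmax(decay_flagged_logits(torch.randn(50), flagged_token_ids=[7, 11]))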

1 час назад @ paperswithcode.com
/guangjinpan/ Large Wireless Localization Model (LWLM): A Foundation Model for Positioning in 6G Networks
/guangjinpan/ Large Wireless Localization Model (LWLM): A Foundation Model for Positioning in 6G Networks /guangjinpan/ Large Wireless Localization Model (LWLM): A Foundation Model for Positioning in 6G Networks

To address these limitations, we propose a foundation-model-based solution tailored for wireless localization.

We first analyze how different self-supervised learning (SSL) tasks acquire general-purpose and task-specific semantic features based on information bottleneck (IB) theory.

Building on this foundation, we design a pretraining methodology for the proposed Large Wireless Localization Model (LWLM).

We further design lightweight decoders for key downstream tasks, including time-of-arrival (ToA) estimation, angle-of-arrival (AoA) estimation, single base station (BS) localization, and multiple BS localization.

Comprehensive experimental results confirm that LWLM consistently surpasses bo…

1 час назад @ paperswithcode.com
/yiveen/ Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data
/yiveen/ Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data /yiveen/ Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data

With the development of photorealistic diffusion models, models trained in part or fully on synthetic data achieve progressively better results.

We define the concept of feasibility as whether attributes in a synthetic image could realistically exist in the real-world domain; synthetic images containing attributes that violate this criterion are considered infeasible.

In this paper, we investigate whether enforcing feasibility is necessary when generating synthetic training data for CLIP-based classifiers, focusing on three target attributes: background, color, and texture.

We introduce VariReal, a pipeline that minimally edits a given source image to include feasible or infeasible attribut…

1 час назад @ paperswithcode.com
/szu-tera/ LDIR: Low-Dimensional Dense and Interpretable Text Embeddings with Relative Representations
/szu-tera/ LDIR: Low-Dimensional Dense and Interpretable Text Embeddings with Relative Representations /szu-tera/ LDIR: Low-Dimensional Dense and Interpretable Text Embeddings with Relative Representations

Semantic text representation is a fundamental task in the field of natural language processing.

(2024) propose interpretable text embeddings using large language models, which forms "0/1" embeddings based on responses to a series of questions.

These interpretable text embeddings are typically high-dimensional (larger than 10,000).

In this work, we propose Low-dimensional (lower than 500) Dense and Interpretable text embeddings with Relative representations (LDIR).

Extensive experimental results show that LDIR performs close to the black-box baseline models and outperforms the interpretable embeddings baselines with much fewer dimensions.
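Relative representations of this kind are generally built by expressing each text through its similarity to a small set of anchor texts; below is a minimal sketch of that general idea (not the authors' exact pipeline), assuming a sentence-transformers backbone and three toy anchors.

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")
anchors = ["a news article about sports", "a python programming question", "a restaurant review"]
anchor_embeddings = model.encode(anchors)

def relative_embedding(text: str):
    # Each dimension is the similarity to one anchor, so the embedding stays
    # dense, low-dimensional, and directly interpretable.
    return cosine_similarity(model.encode([text]), anchor_embeddings)[0]

print(relative_embedding("Where can I find good sushi nearby?"))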

1 час назад @ paperswithcode.com
/redwyd/ From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models
/redwyd/ From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models /redwyd/ From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models

The rise of Large Language Models (LLMs) has heightened concerns about the misuse of AI-generated text, making watermarking a promising solution.

Mainstream watermarking schemes for LLMs fall into two categories: logits-based and sampling-based.

To mitigate this, we integrate logits-based and sampling-based schemes, harnessing their respective strengths to achieve synergy.

In this paper, we propose a versatile symbiotic watermarking framework with three strategies: serial, parallel, and hybrid.

The hybrid framework adaptively embeds watermarks using token entropy and semantic entropy, optimizing the balance between detectability, robustness, text quality, and security.
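For readers unfamiliar with the families being combined, here is a minimal sketch of the generic logits-based approach (a pseudo-random "green list" biased at decoding time); it illustrates the building block only, not the symbiotic framework proposed in the paper, and the parameter values are assumptions.

import hashlib
import torch

def greenlist_bias(logits: torch.Tensor, prev_token: int, gamma: float = 0.5, delta: float = 2.0) -> torch.Tensor:
    # Seed a pseudo-random green list from the previous token and boost its logits;
    # a detector later counts how often generated tokens land in the green list.
    vocab = logits.shape[-1]
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % (2**31)
    green = torch.randperm(vocab, generator=torch.Generator().manual_seed(seed))[: int(gamma * vocab)]
    biased = logits.clone()
    biased[green] += delta
    return biased

watermarked_logits = greenlist_bias(torch.randn(32000), prev_token=42)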

1 час назад @ paperswithcode.com
/adcombrink/ A Comparative Study of SMT and MILP for the Nurse Rostering Problem
/adcombrink/ A Comparative Study of SMT and MILP for the Nurse Rostering Problem /adcombrink/ A Comparative Study of SMT and MILP for the Nurse Rostering Problem

The effects of personnel scheduling on the quality of care and working conditions for healthcare personnel have been thoroughly documented.

SMT has gained momentum within the formal verification community in the last decades, leading to the advancement of SMT solvers that have been shown to outperform standard mathematical programming techniques.

In this work, we propose generic constraint formulations that can model a wide range of real-world scheduling constraints.
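To give a flavor of what an SMT encoding of such scheduling constraints looks like, here is a minimal sketch using the z3 Python bindings; the instance size, shift structure, and limits below are illustrative assumptions, not the formulations from the paper.

from z3 import Bool, If, Solver, Sum, sat

nurses, days, shifts = 3, 5, 2  # tiny illustrative instance
x = [[[Bool(f"x_{n}_{d}_{s}") for s in range(shifts)] for d in range(days)] for n in range(nurses)]

solver = Solver()
for d in range(days):
    for s in range(shifts):
        # Every shift must be covered by at least one nurse.
        solver.add(Sum([If(x[n][d][s], 1, 0) for n in range(nurses)]) >= 1)
for n in range(nurses):
    for d in range(days):
        # No nurse works more than one shift per day.
        solver.add(Sum([If(x[n][d][s], 1, 0) for s in range(shifts)]) <= 1)

if solver.check() == sat:
    print(solver.model())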

Experimental results show how each solver excels for certain types of problems; the MILP solver generally performs better when the problem is highly constrained or infeasible, while the SMT solver performs better otherwise.

We …

1 час назад @ paperswithcode.com
/liuyz0/ Superposition Yields Robust Neural Scaling
/liuyz0/ Superposition Yields Robust Neural Scaling /liuyz0/ Superposition Yields Robust Neural Scaling

However, the origin of this neural scaling law -- the finding that loss decreases as a power law with model size -- remains unclear.

This robust scaling behavior is explained geometrically: when many more vectors are packed into a lower dimensional space, the interference (squared overlaps) between vectors scales inversely with that dimension.

We then analyzed four families of open-sourced LLMs and found that they exhibit strong superposition and quantitatively match the predictions of our toy model.

The Chinchilla scaling law turned out to also agree with our results.

We conclude that representation superposition is an important mechanism underlying the observed neural scaling laws.
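The geometric claim above is easy to check numerically: for random unit vectors packed into a d-dimensional space, the mean squared overlap is about 1/d. Below is a minimal sketch (illustrative only, not the paper's toy model).

import numpy as np

rng = np.random.default_rng(0)
for d in (64, 256, 1024):
    v = rng.normal(size=(4096, d))                 # many more vectors than dimensions
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # unit norm
    sq = (v @ v.T) ** 2
    np.fill_diagonal(sq, 0.0)
    mean_sq_overlap = sq.sum() / (len(v) * (len(v) - 1))
    print(d, round(mean_sq_overlap, 5), round(1 / d, 5))  # tracks 1/d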

1 час назад @ paperswithcode.com
/juliushenke/ AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents
/juliushenke/ AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents /juliushenke/ AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents

We then present AutoPentest, an application for performing black-box penetration tests with a high degree of autonomy.

AutoPentest is based on the LLM GPT-4o from OpenAI and the LLM agent framework LangChain.

We conduct a study on three capture-the-flag style Hack The Box (HTB) machines, comparing our implementation AutoPentest with the baseline approach of manually using the ChatGPT-4o user interface.

Both approaches are able to complete 15-25 % of the subtasks on the HTB machines, with AutoPentest slightly outperforming ChatGPT.

We measure a total cost of \$96.20 US when using AutoPentest across all experiments, while a one-month subscription to ChatGPT Plus costs \$20.

1 час назад @ paperswithcode.com
/daniel3303/ StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
/daniel3303/ StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation /daniel3303/ StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation

Visual storytelling systems struggle to maintain character identity across frames and link actions to appropriate subjects, frequently leading to referential hallucinations.

These issues can be addressed through grounding of characters, objects, and other entities on the visual elements.

We propose StoryReasoning, a dataset containing 4,178 stories derived from 52,016 movie images, with both structured scene analyses and grounded stories.

Each story maintains character and object consistency across frames while explicitly modeling multi-frame relationships through structured tabular representations.

Our approach features cross-frame object re-identification using visual similarity and face …

1 час назад @ paperswithcode.com
/ltpwy/ MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning
/ltpwy/ MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning /ltpwy/ MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning

Compositional Zero-Shot Learning (CZSL) aims to recognize unseen state-object combinations by leveraging known combinations.

Existing studies basically rely on the cross-modal alignment capabilities of CLIP but tend to overlook its limitations in capturing fine-grained local features, which arise from its architectural and training paradigm.

To address this issue, we propose a Multi-Stage Cross-modal Interaction (MSCI) model that effectively explores and utilizes intermediate-layer information from CLIP's visual encoder.

Specifically, we design two self-adaptive aggregators to extract local information from low-level visual features and integrate global information from high-level visual fe…

1 час назад @ paperswithcode.com
/moonflo/ Advances in Radiance Field for Dynamic Scene: From Neural Field to Gaussian Field
/moonflo/ Advances in Radiance Field for Dynamic Scene: From Neural Field to Gaussian Field /moonflo/ Advances in Radiance Field for Dynamic Scene: From Neural Field to Gaussian Field

Dynamic scene representation and reconstruction have undergone transformative advances in recent years, catalyzed by breakthroughs in neural radiance fields and 3D Gaussian splatting techniques.

While initially developed for static environments, these methodologies have rapidly evolved to address the complexities inherent in 4D dynamic scenes through an expansive body of research.

Coupled with innovations in differentiable volumetric rendering, these approaches have significantly enhanced the quality of motion representation and dynamic scene reconstruction, thereby garnering substantial attention from the computer vision and graphics communities.

This survey presents a systematic analysis …

1 час назад @ paperswithcode.com
/sanofi-public/ MIPHEI-ViT: Multiplex Immunofluorescence Prediction from H&E Images using ViT Foundation Models
/sanofi-public/ MIPHEI-ViT: Multiplex Immunofluorescence Prediction from H&E Images using ViT Foundation Models /sanofi-public/ MIPHEI-ViT: Multiplex Immunofluorescence Prediction from H&E Images using ViT Foundation Models

Histopathological analysis is a cornerstone of cancer diagnosis, with Hematoxylin and Eosin (H&E) staining routinely acquired for every patient to visualize cell morphology and tissue architecture.

On the other hand, multiplex immunofluorescence (mIF) enables more precise cell type identification via proteomic markers, but has yet to achieve widespread clinical adoption due to cost and logistical constraints.

To bridge this gap, we introduce MIPHEI (Multiplex Immunofluorescence Prediction from H&E), a U-Net-inspired architecture that integrates state-of-the-art ViT foundation models as encoders to predict mIF signals from H&E images.

MIPHEI targets a comprehensive panel of markers spanning …

1 час назад @ paperswithcode.com
/kaka0910/ MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting
/kaka0910/ MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting /kaka0910/ MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting

Deep learning approaches for marine fog detection and forecasting have outperformed traditional methods, demonstrating significant scientific and practical importance.

Existing datasets, often focused on a single region or satellite, restrict the ability to evaluate model performance across diverse conditions and hinder the exploration of intrinsic marine fog characteristics.

To address these limitations, we introduce \textbf{MFogHub}, the first multi-regional and multi-satellite dataset to integrate annotated marine fog observations from 15 coastal fog-prone regions and six geostationary satellites, comprising over 68,000 high-resolution samples.

By encompassing diverse regions and satelli…

1 час назад @ paperswithcode.com
/midiyazhu/ Rethinking Prompt Optimizers: From Prompt Merits to Optimization
/midiyazhu/ Rethinking Prompt Optimizers: From Prompt Merits to Optimization /midiyazhu/ Rethinking Prompt Optimizers: From Prompt Merits to Optimization

Prompt optimization (PO) offers a practical alternative to fine-tuning large language models (LLMs), enabling performance improvements without altering model weights.

However, due to limited downward compatibility, verbose, instruction-heavy prompts from advanced LLMs can overwhelm lightweight inference models and degrade response quality.

In this work, we rethink prompt optimization through the lens of interpretable design.

We first identify a set of model-agnostic prompt quality merits and empirically validate their effectiveness in enhancing prompt and response quality.

Unlike prior work, MePO avoids online optimization reliance, reduces cost and privacy concerns, and, by learning clear,…

1 час назад @ paperswithcode.com
/shenwenhao01/ ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization
/shenwenhao01/ ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization /shenwenhao01/ ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization

Human mesh recovery (HMR) from a single image is inherently ill-posed due to depth ambiguity and occlusions.

To address these issues, we propose ADHMR, a framework that Aligns a Diffusion-based HMR model in a preference optimization manner.

First, we train a human mesh prediction assessment model, HMR-Scorer, capable of evaluating predictions even for in-the-wild images without 3D annotations.

We then use HMR-Scorer to create a preference dataset, where each input image has a pair of winner and loser mesh predictions.

This dataset is used to finetune the base model using direct preference optimization.

1 час назад @ paperswithcode.com
/cekavis/ VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality
/cekavis/ VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality /cekavis/ VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality

3D Gaussian Splatting (3DGS) has rapidly become a leading technique for novel-view synthesis, providing exceptional performance through efficient software-based GPU rasterization.

In this work, we introduce VRSplat: we combine and extend several recent advancements in 3DGS to address challenges of VR holistically.

We show how the ideas of Mini-Splatting, StopThePop, and Optimal Projection can complement each other, by modifying the individual techniques and core 3DGS rasterizer.

Our method also incorporates a fine-tuning step that optimizes Gaussian parameters based on StopThePop depth evaluations and Optimal Projection.

VRSplat is the first, systematically evaluated 3DGS approach capable o…

1 час назад @ paperswithcode.com
/victorzheleznov/ Learning Nonlinear Dynamics in Physical Modelling Synthesis using Neural Ordinary Differential Equations
/victorzheleznov/ Learning Nonlinear Dynamics in Physical Modelling Synthesis using Neural Ordinary Differential Equations /victorzheleznov/ Learning Nonlinear Dynamics in Physical Modelling Synthesis using Neural Ordinary Differential Equations

Modal synthesis methods are a long-standing approach for modelling distributed musical systems.

A modal decomposition leads to a coupled nonlinear system of ordinary differential equations.

Recent work in applied machine learning approaches (in particular neural ordinary differential equations) has been used to model lumped dynamic systems such as electronic circuits automatically from data.

In this work, we examine how modal decomposition can be combined with neural ordinary differential equations for modelling distributed musical systems.

The proposed model leverages the analytical solution for linear vibration of system's modes and employs a neural network to account for nonlinear dynami…

1 час назад @ paperswithcode.com
/miguelaguilera/ Inferring entropy production in many-body systems using nonequilibrium MaxEnt
/miguelaguilera/ Inferring entropy production in many-body systems using nonequilibrium MaxEnt /miguelaguilera/ Inferring entropy production in many-body systems using nonequilibrium MaxEnt

We propose a method for inferring entropy production (EP) in high-dimensional stochastic systems, including many-body systems and non-Markovian systems with long memory.

Standard techniques for estimating EP become intractable in such systems due to computational and statistical limitations.

We infer trajectory-level EP and lower bounds on average EP by exploiting a nonequilibrium analogue of the Maximum Entropy principle, along with convex duality.

It does not require reconstruction of high-dimensional probability distributions or rate matrices, nor any special assumptions such as discrete states or multipartite dynamics.

We demonstrate its numerical performance on a disordered nonequilibr…

1 час назад @ paperswithcode.com
/anonymousaneug/ Two-Stage Generative Model for Intracranial Aneurysm Meshes with Morphological Marker Conditioning
/anonymousaneug/ Two-Stage Generative Model for Intracranial Aneurysm Meshes with Morphological Marker Conditioning /anonymousaneug/ Two-Stage Generative Model for Intracranial Aneurysm Meshes with Morphological Marker Conditioning

Existing shape generation methods struggle to capture realistic IA features and ignore the relationship between IA pouches and parent vessels, limiting physiological realism; moreover, their outputs cannot be controlled to match specific morphological measurements.

In the first stage, AneuG generates low-dimensional Graph Harmonic Deformation (GHD) tokens to encode and reconstruct aneurysm pouch shapes, constrained to morphing energy statistics truths.

In the second stage, AneuG generates parent vessels conditioned on GHD tokens, by generating vascular centreline and propagating the cross-section.

AneuG's IA shape generation can further be conditioned to have specific clinically relevant morphol…

1 час назад @ paperswithcode.com
/bigdogmanluo/ A Hybrid Strategy for Aggregated Probabilistic Forecasting and Energy Trading in HEFTCom2024
/bigdogmanluo/ A Hybrid Strategy for Aggregated Probabilistic Forecasting and Energy Trading in HEFTCom2024 /bigdogmanluo/ A Hybrid Strategy for Aggregated Probabilistic Forecasting and Energy Trading in HEFTCom2024

Obtaining accurate probabilistic energy forecasts and making effective decisions amid diverse uncertainties are routine challenges in future energy systems.

This paper presents the solution of team GEB, which ranked 3rd in trading, 4th in forecasting, and 1st among student teams in the IEEE Hybrid Energy Forecasting and Trading Competition 2024 (HEFTCom2024).

The solution provides accurate probabilistic forecasts for a wind-solar hybrid system, and achieves substantial trading revenue in the day-ahead electricity market.

This paper also explores the potential of end-to-end learning to further enhance the trading revenue by adjusting the distribution of forecast errors.

Code for all mentione…

1 час назад @ paperswithcode.com
/gaoyuan2/ APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds
/gaoyuan2/ APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds /gaoyuan2/ APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds

Airborne laser scanning (ALS) point cloud segmentation is a fundamental task for large-scale 3D scene understanding.

Continuous Test-Time Adaptation (CTTA) offers a solution by adapting a source-pretrained model to evolving, unlabeled target domains.

Despite its potential, research on ALS point clouds remains limited, facing challenges such as the absence of standardized datasets and the risk of catastrophic forgetting and error accumulation during prolonged adaptation.

To tackle these challenges, we propose APCoTTA, the first CTTA method tailored for ALS point cloud semantic segmentation.

Finally, we construct two benchmarks, ISPRSC and H3DC, to address the lack of CTTA benchmarks for ALS …

1 час назад @ paperswithcode.com
/andreiiarhire/ Learned Lightweight Smartphone ISP with Unpaired Data
/andreiiarhire/ Learned Lightweight Smartphone ISP with Unpaired Data /andreiiarhire/ Learned Lightweight Smartphone ISP with Unpaired Data

The Image Signal Processor (ISP) is a fundamental component in modern smartphone cameras responsible for conversion of RAW sensor image data to RGB images with a strong focus on perceptual quality.

A difficult and costly step when developing a learned ISP is the acquisition of pixel-wise aligned paired data that maps the RAW images captured by a smartphone camera sensor to high-quality reference images.

In this work, we address this challenge by proposing a novel training method for a learnable ISP that eliminates the need for direct correspondences between raw images and ground-truth data with matching content.

Using lightweight neural network architectures suitable for mobile devices as backbone…

1 час назад @ paperswithcode.com
/desilanja/ Advancing Community Detection with Graph Convolutional Neural Networks: Bridging Topological and Attributive Cohesion
/desilanja/ Advancing Community Detection with Graph Convolutional Neural Networks: Bridging Topological and Attributive Cohesion /desilanja/ Advancing Community Detection with Graph Convolutional Neural Networks: Bridging Topological and Attributive Cohesion

Community detection, a vital technology for real-world applications, uncovers cohesive node groups (communities) by leveraging both topological and attribute similarities in social networks.

However, existing Graph Convolutional Networks (GCNs) trained to maximize modularity often converge to suboptimal solutions.

Additionally, directly using human-labeled communities for training can undermine topological cohesiveness by grouping disconnected nodes based solely on node attributes.

We address these issues by proposing a novel Topological and Attributive Similarity-based Community detection (TAS-Com) method.

TAS-Com introduces a novel loss function that exploits the highly effective and scal…

1 час назад @ paperswithcode.com
/lamda-rl/ ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts
/lamda-rl/ ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts /lamda-rl/ ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts

A central challenge in reinforcement learning (RL) is its dependence on extensive real-world interaction data to learn task-specific policies.

To bridge this gap, we introduce ImagineBench, the first comprehensive benchmark for evaluating offline RL algorithms that leverage both real rollouts and LLM-imaginary rollouts.

Through systematic evaluation of state-of-the-art offline RL algorithms, we observe that simply applying existing offline RL algorithms leads to suboptimal performance on unseen tasks, achieving a 35.44% success rate on hard tasks, in contrast to 64.37% for methods trained on real rollouts.

This result highlights the need for algorithm advancements to better lever…

1 час назад @ paperswithcode.com
/nasosger/ Multi-Token Prediction Needs Registers
/nasosger/ Multi-Token Prediction Needs Registers /nasosger/ Multi-Token Prediction Needs Registers

Multi-token prediction has emerged as a promising objective for improving language model pretraining, but its benefits have not consistently generalized to other settings such as fine-tuning.

In this paper, we propose MuToR, a simple and effective approach to multi-token prediction that interleaves learnable register tokens into the input sequence, each tasked with predicting future targets.

Compared to existing methods, MuToR offers several key advantages: it introduces only a negligible number of additional parameters, requires no architectural changes--ensuring compatibility with off-the-shelf pretrained language models--and remains aligned with the next-token pretraining objective, maki…
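The interleaving itself is straightforward; here is a minimal sketch of inserting a reserved register token into a token-id sequence. The register id and spacing are assumptions for illustration, and the training of register positions to predict future targets is not shown.

import torch

REGISTER_ID = 50_000  # hypothetical id reserved for the learnable register token

def interleave_registers(token_ids: torch.Tensor, every: int = 4) -> torch.Tensor:
    # Insert a register token after every `every` input tokens; in MuToR these
    # positions carry the multi-token (future-target) prediction loss.
    chunks = []
    for start in range(0, len(token_ids), every):
        chunks.append(token_ids[start:start + every])
        chunks.append(torch.tensor([REGISTER_ID]))
    return torch.cat(chunks)

print(interleave_registers(torch.arange(10)))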

1 час назад @ paperswithcode.com
/roypic/ On the Interplay of Human-AI Alignment,Fairness, and Performance Trade-offs in Medical Imaging
/roypic/ On the Interplay of Human-AI Alignment,Fairness, and Performance Trade-offs in Medical Imaging /roypic/ On the Interplay of Human-AI Alignment,Fairness, and Performance Trade-offs in Medical Imaging

Deep neural networks excel in medical imaging but remain prone to biases, leading to fairness gaps across demographic groups.

We provide the first systematic exploration of Human-AI alignment and fairness in this domain.

Our results show that incorporating human insights consistently reduces fairness gaps and enhances out-of-domain generalization, though excessive alignment can introduce performance trade-offs, emphasizing the need for calibrated strategies.

These findings highlight Human-AI alignment as a promising approach for developing fair, robust, and generalizable medical AI systems, striking a balance between expert guidance and automated efficiency.

Our code is available at https:/…

1 час назад @ paperswithcode.com
/kheil-z/ IMITATE: Image Registration with Context for unknown time frame recovery
/kheil-z/ IMITATE: Image Registration with Context for unknown time frame recovery /kheil-z/ IMITATE: Image Registration with Context for unknown time frame recovery

In this paper, we formulate a novel image registration formalism dedicated to the estimation of unknown condition-related images, based on two or more known images and their associated conditions.

We show how to practically model this formalism by using a new conditional U-Net architecture, which fully takes into account the conditional information and does not need any fixed image.

Our formalism is then applied to image moving tumors for radiotherapy treatment at different breathing amplitudes using 4D-CT (3D+t) scans in thoracoabdominal regions.

This driving application is particularly complex as it requires to stitch a collection of sequential 2D slices into several 3D volumes at differen…

1 час назад @ paperswithcode.com
/sisinflab/ Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M
/sisinflab/ Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M /sisinflab/ Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M

Large Language Models (LLMs) have become increasingly central to recommendation scenarios due to their remarkable natural language understanding and generation capabilities.

Although significant research has explored the use of LLMs for various recommendation tasks, little effort has been dedicated to verifying whether they have memorized public recommendation datasets as part of their training data.

This is undesirable because memorization reduces the generalizability of research findings, as benchmarking on memorized datasets does not guarantee generalization to unseen datasets.

In this work, we investigate whether LLMs have memorized public recommendation datasets.

First, we define datase…

1 час назад @ paperswithcode.com
/Brain-Cog-Lab/ Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence
/Brain-Cog-Lab/ Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence /Brain-Cog-Lab/ Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence

Multimodal learning enhances the perceptual capabilities of cognitive systems by integrating information from different sensory modalities.

However, existing multimodal fusion research typically assumes static integration, not fully incorporating key dynamic mechanisms found in the brain.

Inspired by this biological mechanism, we explore the relationship between multimodal output and information from individual modalities, proposing an inverse effectiveness driven multimodal fusion (IEMF) strategy.

To verify universality and generalization, we also conduct experiments on Artificial Neural Networks (ANN) and Spiking Neural Networks (SNN), with results showing good adaptability to both networ…

2 часа назад @ paperswithcode.com
/gaobb/ AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
/gaobb/ AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection /gaobb/ AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection

Universal visual anomaly detection aims to identify anomalies from novel or unseen vision domains without additional fine-tuning, which is critical in open scenarios.

Recent studies have demonstrated that pre-trained vision-language models like CLIP exhibit strong generalization with just zero or a few normal images.

AdaptCLIP treats CLIP models as a foundational service, adding only three simple adapters, visual adapter, textual adapter, and prompt-query adapter, at its input or output ends.
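As a rough picture of what a "simple adapter" over frozen CLIP features can look like, here is a generic residual bottleneck adapter in PyTorch; the paper's actual visual, textual, and prompt-query adapter designs may differ, and the dimensions below are assumptions.

import torch
import torch.nn as nn

class FeatureAdapter(nn.Module):
    # Residual bottleneck applied on top of frozen CLIP features.
    def __init__(self, dim: int = 512, bottleneck: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return feats + self.net(feats)  # residual connection keeps the frozen representation intact

adapted = FeatureAdapter()(torch.randn(8, 512))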

AdaptCLIP supports zero-/few-shot generalization across domains and possesses a training-free manner on target domains once trained on a base dataset.

AdaptCLIP achieves state-of-the-art performance on…

4 часа назад @ paperswithcode.com
/12tqian/ Layered Unlearning for Adversarial Relearning
/12tqian/ Layered Unlearning for Adversarial Relearning /12tqian/ Layered Unlearning for Adversarial Relearning

Our goal is to understand how post-training methods, such as fine-tuning, alignment, and unlearning, modify language model behavior and representations.

We are particularly interested in the brittle nature of these modifications that makes them easy to bypass through prompt engineering or relearning.

To test this hypothesis, we design an unlearning algorithm, Layered Unlearning (LU), that creates distinct inhibitory mechanisms for a growing subset of the data.

We find that LU improves robustness to adversarial relearning for several different unlearning methods.

Our results contribute to the state-of-the-art of machine unlearning and provide insight into the effect of post-training updates.

9 часов назад @ paperswithcode.com
/mileswyn/ Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping
/mileswyn/ Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping /mileswyn/ Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping

This study evaluates the performance of zero-shot multi-modal large language models (MLLMs) compared to traditional deep learning methods in ICH binary classification and subtyping.

The study compares various MLLMs, including GPT-4o, Gemini 2.0 Flash, and Claude 3.5 Sonnet V2, with conventional deep learning models, including ResNet50 and Vision Transformer.

Carefully crafted prompts were used to guide MLLMs in tasks such as ICH presence, subtype classification, localization, and volume estimation.

Results: The results indicate that in the ICH binary classification task, traditional deep learning models outperform MLLMs comprehensively.

Conclusion: While MLLMs excel in interactive capabilit…

10 часов назад @ paperswithcode.com
/hruedisser/ ARCANE -- Early Detection of Interplanetary Coronal Mass Ejections
/hruedisser/ ARCANE -- Early Detection of Interplanetary Coronal Mass Ejections /hruedisser/ ARCANE -- Early Detection of Interplanetary Coronal Mass Ejections

Interplanetary coronal mass ejections (ICMEs) are major drivers of space weather disturbances, posing risks to both technological infrastructure and human activities.

Automatic detection of ICMEs in solar wind in situ data is essential for early warning systems.

While several methods have been proposed to identify these structures in time series data, robust real-time detection remains a significant challenge.

In this work, we present ARCANE - the first framework explicitly designed for early ICME detection in streaming solar wind data under realistic operational constraints, enabling event identification without requiring observation of the full structure.

Notably, we find that using real-…

10 часов назад @ paperswithcode.com
/vdplasthijs/ Predicting butterfly species presence from satellite imagery using soft contrastive regularisation
/vdplasthijs/ Predicting butterfly species presence from satellite imagery using soft contrastive regularisation /vdplasthijs/ Predicting butterfly species presence from satellite imagery using soft contrastive regularisation

The growing demand for scalable biodiversity monitoring methods has fuelled interest in remote sensing data, due to its widespread availability and extensive coverage.

This paper presents a new data set for predicting butterfly species presence from satellite data in the United Kingdom.

We experimentally optimise a Resnet-based model to predict multi-species presence from 4-band satellite images, and find that this model especially outperforms the mean rate baseline for locations with high species biodiversity.

To improve performance, we develop a soft, supervised contrastive regularisation loss that is tailored to probabilistic labels (such as species-presence data), and demonstrate that t…

11 часов назад @ paperswithcode.com
/wufangr/ MUST: Multi-Scale Structural-Temporal Link Prediction Model for UAV Ad Hoc Networks
/wufangr/ MUST: Multi-Scale Structural-Temporal Link Prediction Model for UAV Ad Hoc Networks /wufangr/ MUST: Multi-Scale Structural-Temporal Link Prediction Model for UAV Ad Hoc Networks

Link prediction in unmanned aerial vehicle (UAV) ad hoc networks (UANETs) aims to predict the potential formation of future links between UAVs.

However, the highly dynamic and sparse nature of UANET topologies presents substantial challenges in effectively capturing meaningful structural and temporal patterns for accurate link prediction.

In this paper, we propose a multi-scale structural-temporal link prediction model (MUST) for UANETs.

Specifically, we first employ graph attention networks (GATs) to capture structural features at multiple levels, including the individual UAV level, the UAV community level, and the overall network level.

Extensive experimental results demonstrate that MUST…

11 часов назад @ paperswithcode.com
/kapilsprinklr/ CXMArena: Unified Dataset to benchmark performance in realistic CXM Scenarios
/kapilsprinklr/ CXMArena: Unified Dataset to benchmark performance in realistic CXM Scenarios /kapilsprinklr/ CXMArena: Unified Dataset to benchmark performance in realistic CXM Scenarios

Large Language Models (LLMs) hold immense potential for revolutionizing Customer Experience Management (CXM), particularly in contact center operations.

Existing benchmarks often lack realism, failing to incorporate deep knowledge base (KB) integration, real-world noise, or critical operational tasks beyond conversational fluency.

To bridge this gap, we introduce CXMArena, a novel, large-scale synthetic benchmark dataset specifically designed for evaluating AI in operational CXM contexts.

Given the diversity in possible contact center features, we have developed a scalable LLM-powered pipeline that simulates the brand's CXM entities that form the foundation of our datasets, such as knowledge…

11 часов назад @ paperswithcode.com
/coffeetau/ Distance-aware Self-adaptive Graph Convolution for Fine-grained Hierarchical Recommendation
/coffeetau/ Distance-aware Self-adaptive Graph Convolution for Fine-grained Hierarchical Recommendation /coffeetau/ Distance-aware Self-adaptive Graph Convolution for Fine-grained Hierarchical Recommendation

Graph Convolutional Networks (GCNs) are widely used to improve recommendation accuracy and performance by effectively learning the representations of user and item nodes.

However, two major challenges remain: (1) the lack of further optimization in the graph representation structure and (2) insufficient attention given to the varying contributions of different convolutional layers. This paper proposes SAGCN, a distance-based adaptive hierarchical aggregation method that refines the aggregation process through differentiated representation metrics.

SAGCN introduces a detailed approach to multilayer information aggregation and representation space optimization, enabling the model to learn hier…

11 часов назад @ paperswithcode.com
/giorgiopiatti/ Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
/giorgiopiatti/ Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents" /giorgiopiatti/ Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"

By replicating key experiments, we validate claims regarding the performance of large models, such as GPT-4-turbo, compared to smaller models.

The impact of the universalization principle is also examined, with results showing that large models can achieve sustainable cooperation, with or without the principle, while smaller models fail without it.

Furthermore, we introduce new settings: we create a heterogeneous multi-agent environment, study a scenario using Japanese instructions, and explore an "inverse environment" where agents must cooperate to mitigate harmful resource distributions.

Our results confirm that the benchmark can be applied to new models, scenarios, and languages, offerin…

11 часов назад @ paperswithcode.com
💼 University and corporation labs
DeepMind DeepMind
последний пост 1 day, 15 hours назад
AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms

New AI agent evolves algorithms for math and practical applications in computing by combining the creativity of large language models with automated evaluators. Large language models (LLMs) are remarkably versatile.

Today, we’re announcing AlphaEvolve, an evolutionary coding agent powered by large language models for general-purpose algorithm discovery and optimization.

AlphaEvolve enhanced the efficiency of Google's data centers, chip design and AI training processes — including training the large language models underlying AlphaEvolve itself.

Designing better algorithms with large language…

1 day, 15 hours назад @ deepmind.google
Gemini 2.5 Pro Preview: even better coding performance
Gemini 2.5 Pro Preview: even better coding performance Gemini 2.5 Pro Preview: even better coding performance

Today we’re excited to release Gemini 2.5 Pro Preview (I/O edition).

“We found Gemini 2.5 Pro to be the best frontier model when it comes to "capability over latency" ratio.

“The updated Gemini 2.5 Pro achieves leading performance on our junior-dev evals.

Video to code Gemini 2.5 Pro delivers state-of-the-art video understanding, scoring 84.8% on the VideoMME benchmark.

Gemini 2.5 Pro was able to design and code the microphone UI animation for the dictation starter app.

1 week, 2 days назад @ developers.googleblog.com
Build rich, interactive web apps with an updated Gemini 2.5 Pro
Build rich, interactive web apps with an updated Gemini 2.5 Pro Build rich, interactive web apps with an updated Gemini 2.5 Pro

Today we're releasing early access to Gemini 2.5 Pro Preview (I/O edition), an updated version of 2.5 Pro that has significantly improved capabilities for coding, especially building compelling interactive web apps.

We were going to release this update at Google I/O in a couple weeks, but based on the overwhelming enthusiasm for this model, we wanted to get it in your hands sooner so people can start building.

This builds on the overwhelmingly positive feedback to Gemini 2.5 Pro’s coding and multimodal reasoning capabilities.

Beyond UI-focused development, these improvements extend to other coding tasks such as code transformation, code editing and developing complex agentic workflows.

1 week, 2 days назад @ blog.google
Music AI Sandbox, now with new features and broader access
Music AI Sandbox, now with new features and broader access Music AI Sandbox, now with new features and broader access

Our ongoing collaborations led to the creation of Music AI Sandbox, in 2023, which we’ve shared with musicians, producers and songwriters through YouTube’s Music AI Incubator.

Building upon the work we've done to date, today, we're introducing new features and improvements to Music AI Sandbox, including Lyria 2, our latest music generation model.

We're excited to see what this growing community creates with Music AI Sandbox and encourage interested musicians, songwriters, and producers to sign up here.

Music AI Sandbox: We created Music AI Sandbox in close collaboration with musicians.

Building AI for musicians, with musicians: Through collaborations like Music AI Sandbox, we aim to build tru…

3 weeks назад @ deepmind.google
Introducing Gemini 2.5 Flash
Introducing Gemini 2.5 Flash Introducing Gemini 2.5 Flash

In fact, Gemini 2.5 Flash performs strongly on Hard Prompts in LMArena, second only to 2.5 Pro.

Today we are rolling out an early version of Gemini 2.5 Flash in preview through the Gemini API via Google AI Studio and Vertex AI.

Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off.

2.5 Flash has comparable metrics to other leading models for a fraction of the cost and size.

Start building with Gemini 2.5 Flash today

4 weeks назад @ developers.googleblog.com
Generate videos in Gemini and Whisk with Veo 2
Generate videos in Gemini and Whisk with Veo 2 Generate videos in Gemini and Whisk with Veo 2

Starting today, Gemini Advanced users can generate and share videos using our state-of-the-art video model, Veo 2.

In Gemini, you can now translate text-based prompts into dynamic videos.

How to create videos with Gemini: Veo 2 represents a leap forward in video generation, designed to produce high-resolution, detailed videos with cinematic realism.

To generate videos, select Veo 2 from the model dropdown in Gemini.

Sharing your video on mobile is easy: simply tap the share button to quickly upload engaging short videos to platforms like TikTok and YouTube Shorts.

1 month назад @ blog.google
DolphinGemma: How Google AI is helping decode dolphin communication
DolphinGemma: How Google AI is helping decode dolphin communication

For decades, understanding the clicks, whistles and burst pulses of dolphins has been a scientific frontier.

What if we could not only listen to dolphins, but also understand the patterns of their complex communication well enough to generate realistic responses?

Today, on National Dolphin Day, Google, in collaboration with researchers at Georgia Tech and the field research of the Wild Dolphin Project (WDP), is announcing progress on DolphinGemma: a foundational AI model trained to learn the structure of dolphin vocalizations and generate novel dolphin-like sound sequences.

This approach in the quest for interspecies communication pushes the boundaries of AI and our potential connection wit…

1 month назад @ blog.google
Taking a responsible path to AGI
Taking a responsible path to AGI

Today, we're sharing our views on AGI safety and security as we navigate the path toward this transformational technology.

Artificial general intelligence (AGI), AI that’s at least as capable as humans at most cognitive tasks, could be here within the coming years.

Building an ecosystem for AGI readiness: Led by Shane Legg, Co-Founder and Chief AGI Scientist at Google DeepMind, our AGI Safety Council (ASC) analyzes AGI risk and best practices, making recommendations on safety measures.

As such, we’ve launched a new course on AGI Safety for students, researchers and professionals interested in this topic.

Ultimately, our approach to AGI safety and security serves as a vital roadmap to address …

1 month, 1 week назад @ deepmind.google
Google
последний пост 12 часов назад
Expanding our Risk Protection Program with new insurance partners and AI coverage
Expanding our Risk Protection Program with new insurance partners and AI coverage

Coverage for emerging AI risks: The Risk Protection Program is further evolving to address the rapidly changing technological landscape, including the increasing adoption of AI.

A key enhancement now available is the inclusion of Affirmative AI insurance coverage for your Google-related AI workloads.

Translating security investments into financial returns: The enhanced Risk Protection Program helps translate your dedication to security into tangible business value.

“Google Cloud’s Risk Protection Program has streamlined our insurance underwriting process while providing valuable insight into quantitative risks through standardized reporting, enhanced compliance monitoring, and simplified report…

12 часов назад @ cloud.google.com
A guide to Google ADK and MCP integration with an external server
A guide to Google ADK and MCP integration with an external server

Anthropic’s Model Context Protocol (MCP) is designed to address this, providing a standardized way for agents to retrieve that crucial, external context needed to inform their responses and actions.

But getting agents built with Google's Agent Development Kit (ADK) to communicate effectively with an MCP server, especially one hosted externally, can present some integration challenges.

Today, we’ll guide you through developing ADK agents that connect to external MCP servers, initially using Server-Sent Events (SSE).

We’ll take the example of an ADK agent that leverages MCP to access Wikipedia articles, a common use case for retrieving external specialised data.

We will also introduce …
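As a preview of the pattern, here is a minimal sketch of an ADK agent that pulls its tools from an external MCP server over SSE; the MCPToolset and SseServerParams names follow the adk-python package, and the server URL is a hypothetical placeholder rather than the guide's actual Wikipedia server.

```python
# Rough sketch of an ADK agent whose tools come from an external MCP server over SSE.
# The SSE URL is a hypothetical placeholder; the guide's Wikipedia MCP server would be
# wired up the same way. Assumes the adk-python MCPToolset / SseServerParams API.
from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, SseServerParams

wikipedia_agent = LlmAgent(
    model="gemini-2.0-flash",
    name="wikipedia_assistant",
    instruction="Answer questions using the tools exposed by the MCP server.",
    tools=[
        MCPToolset(
            # SSE endpoint of the externally hosted MCP server (placeholder URL).
            connection_params=SseServerParams(url="http://localhost:8080/sse"),
        )
    ],
)
```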

1 day, 14 hours назад @ cloud.google.com
Unlock software delivery excellence and quality with Gemini Code Assist agents
Unlock software delivery excellence and quality with Gemini Code Assist agents

Gemini Code Assist agents: Gemini Code Assist for GitHub, released on February 25 in a public preview as part of Gemini Code Assist for Individuals, meets developers in the pull request workflow.

The agent helps improve documentation quality by automating documentation generation and updates.

DORA's research consistently demonstrates a strong correlation between technical debt and decreased software delivery performance.

Strategies for paying down technical debt: Just as financial debt requires a deliberate repayment plan, technical debt needs a strategic approach to remediation.

Try Gemini Code Assist today: The future of software development is a partnership between human ingenuity and artifici…

1 day, 14 hours назад @ cloud.google.com
Cool stuff customers built, May edition: Visual scouts, racing agents, agile ads & more
Cool stuff customers built, May edition: Visual scouts, racing agents, agile ads & more

Without our customers, there would be no Google Cloud, as they are the ones building the future on our platform.

What they did: Using Vertex AI Vector Search to help build connections across products, Lowe’s created Visual Scout.

Customers indicate their preferences by “liking” or “disliking” items in the display, and Visual Scout then dynamically updates the panel with items that reflect those preferences.

Why it matters: As the service updates in real time, Visual Scout provides an engaging and gamified shopping experience that encourages customer engagement and, ultimately, conversion.

Learn from us: "Both [Vector Search and Feature Store] help Visual Scout process a high volume …

2 days, 14 hours назад @ cloud.google.com
Announcing open-source enhancements to LangChain PostgreSQL
Announcing open-source enhancements to LangChain PostgreSQL

What’s new: Enhanced security and connectivity. Building robust and secure generative AI applications requires careful consideration of how your application interacts with the underlying data infrastructure.

With our contributions to the LangChain Postgres package, we've prioritized enhanced security and efficient connectivity through several key improvements.

The enhanced API now clearly separates the permissions required for database schema creation from those needed for routine application usage.

But one of the benefits of using PostgreSQL databases as a vector database is that you can leverage PostgreSQL’s rich querying capabilities to improve the quality of your vector search by using filt…
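To make that last point concrete, below is a small, generic sketch of metadata-filtered vector search with the langchain-postgres package; the connection string, collection name, embedding model, and filter keys are all illustrative placeholders, and the post's newly contributed interfaces may differ in detail.

```python
# Sketch: metadata-filtered similarity search over pgvector via langchain-postgres.
# Connection string, collection name, embedding model, and metadata fields are placeholders.
from langchain_core.documents import Document
from langchain_google_vertexai import VertexAIEmbeddings
from langchain_postgres import PGVector

store = PGVector(
    embeddings=VertexAIEmbeddings(model_name="text-embedding-004"),
    collection_name="product_docs",
    connection="postgresql+psycopg://user:pass@localhost:5432/appdb",
)

store.add_documents([
    Document(page_content="How to rotate service account keys", metadata={"topic": "security"}),
    Document(page_content="Monthly pricing tiers and quotas", metadata={"topic": "pricing"}),
])

# Vector similarity combined with a metadata filter pushed down to PostgreSQL.
for doc in store.similarity_search("key rotation policy", k=2, filter={"topic": {"$eq": "security"}}):
    print(doc.page_content)
```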

2 days, 14 hours назад @ cloud.google.com
Evaluate your gen media models with multimodal evaluation on Vertex AI
Evaluate your gen media models with multimodal evaluation on Vertex AI

To address this, we're thrilled to introduce Gecko, now available through Google Cloud’s Vertex AI Evaluation Service.

Gecko is a rubric-based and interpretable autorater for evaluating generative AI models that empowers developers with a more nuanced, customizable, and transparent way to assess the performance of image and video generation models.

The challenge of evaluating generative models with auto-raters: Creating useful, performant auto-raters is challenging as the quality of generation dramatically improves.

Introducing Gecko: Interpretable, customizable, and performant evaluation. Gecko offers a fine-grained, interpretable, and customizable auto-rater.

Gecko makes evaluation interpreta…

2 days, 14 hours назад @ cloud.google.com
Inside Infinity Learn's AI Tutor, powered by Google Cloud
Inside Infinity Learn's AI Tutor, powered by Google Cloud

This model-based custom evaluation leveraged Vertex AI workbench and subject matter experts to confirm AI accuracy within a variation of up to 3-5% (as compared to human reviewer-based accuracy).

Essentially, this level of precision means students can confidently rely on the AI tutor for the majority of their questions.

"Since implementing the AI tutor, we've seen a remarkable shift in how students approach complex problems.

Discover how Google Cloud Consulting can help you leverage Vertex AI to transform your own challenges or business needs.

Check out Google Cloud Consulting’s full portfolio of offerings, or learn how to get started with Vertex AI.

3 days, 14 hours назад @ cloud.google.com
From LLMs to image generation: Accelerate inference workloads with AI Hypercomputer
From LLMs to image generation: Accelerate inference workloads with AI Hypercomputer

From retail to gaming, from code generation to customer care, an increasing number of organizations are running LLM-based applications, with 78% of organizations in development or production today.

As the number of generative AI applications and volume of users scale, the need for performant, scalable, and easy to use inference technologies is critical.

At Google Cloud, we’re paving the way for this next phase of AI’s rapid evolution with our AI Hypercomputer.

At Google Cloud Next 25, we shared many updates to AI Hypercomputer’s inference capabilities, unveiling Ironwood, our newest Tensor Processing Unit (TPU) designed specifically for inference, coupled with software enhancements such as …

6 days, 14 hours назад @ cloud.google.com
Guide to build MCP servers using vibe coding with Gemini 2.5 Pro
Guide to build MCP servers using vibe coding with Gemini 2.5 Pro

For developers, this is where "vibe coding" comes in.

Vibe coding helps developers achieve their vision by using models like Gemini 2.5 Pro to generate code from natural language prompts.

Today, we’ll show you how vibe coding can help developers create Model Context Protocol (MCP) servers.

MCP, launched in November 2024 by Anthropic, provides an open standard for integrating AI models with various data sources and tools.

You can use Gemini 2.5 Pro's code generation capabilities to create MCP servers with ease, helping you build intuitive, natural language specifications and operational AI infrastructure.
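For reference, a hand-written MCP server built with the official Python SDK's FastMCP helper looks roughly like the sketch below; a vibe-coded server generated by Gemini 2.5 Pro would follow the same shape, and the tool here is just a toy placeholder.

```python
# Minimal MCP server exposing one toy tool, using the official `mcp` Python SDK.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-utils")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the whitespace-separated words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    # Serves over stdio by default; an MCP-aware agent can then discover and call word_count.
    mcp.run()
```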

1 week, 1 day назад @ cloud.google.com
How Looker’s semantic layer enables trusted AI for business intelligence
How Looker’s semantic layer enables trusted AI for business intelligence

That's why trusted definitions managed by a semantic layer become indispensable.

Looker’s semantic layer acts as a single source of truth for business metrics and dimensions, helping to ensure that your organization and tools are leveraging consistent and well-defined terms.

A semantic layer is particularly important in the context of gen AI.

When applied directly to ungoverned data, gen AI can produce impressive, but fundamentally inaccurate and inconsistent results.

Our own internal testing has shown that Looker’s semantic layer reduces data errors in gen AI natural language queries by as much as two thirds.

1 week, 1 day назад @ cloud.google.com
Announcing new Vertex AI Prediction Dedicated Endpoints
Announcing new Vertex AI Prediction Dedicated Endpoints

While existing Vertex AI Prediction Endpoints – managed pools of resources to deploy AI models for online inference – provide a capable serving solution, developers need better ways to achieve consistent performance and resource isolation when shared resources are under contention.

Today, we are pleased to announce Vertex AI Prediction Dedicated Endpoints, a new family of Vertex AI Prediction endpoints designed to address the needs of modern AI applications, including those related to large-scale generative AI models.

gRPC protocol support: For latency-sensitive applications or high-throughput scenarios often encountered with large models, endpoints now natively support gRPC.

The newly integra…

1 week, 3 days назад @ cloud.google.com
Build live voice-driven agentic applications with Vertex AI Gemini Live API
Build live voice-driven agentic applications with Vertex AI Gemini Live API

Imagine frontline professionals using voice commands and visual input to diagnose issues, access vital information, and initiate processes in real-time.

The Gemini 2.0 Flash Live API empowers developers to create next-generation, agentic industry applications.

In this post, we’ll walk you through a use case focused on industrial condition monitoring, specifically motor maintenance, powered by Gemini 2.0 Flash Live API.

The Live API enables low-latency bidirectional voice and video interactions with Gemini.

The model can process text, audio, and video input, and it can provide text and audio output.
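As a rough text-only sketch of that interaction loop, the snippet below uses the google-genai SDK's live connection; the model ID and the send_client_content call reflect one version of the SDK and may differ from the post's exact code, and real deployments would stream audio or video parts instead of plain text.

```python
# Text-only sketch of a Live API turn with the google-genai SDK (method names are assumptions).
import asyncio
from google import genai

client = genai.Client()  # reads API key / Vertex AI settings from the environment

async def main() -> None:
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(model="gemini-2.0-flash-live-001", config=config) as session:
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "Vibration on motor 7 is rising. What should I check first?"}]},
            turn_complete=True,
        )
        # Stream the model's response as it arrives.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```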

1 week, 3 days назад @ cloud.google.com
Pushing the limits of electric mobility: Formula E's Mountain Recharge
Pushing the limits of electric mobility: Formula E's Mountain Recharge

AI Studio then mapped out a route for the challenge.

Smart cars, smart drivers, and a smartphone: During the mountain descent, real-time monitoring of the car's progress and energy regeneration would be crucial.

BigQuery's capacity for real-time data ingestion and in-platform AI model creation enabled speedy analysis and the calculation of essential metrics.

AI Studio greatly facilitated the development of the application by translating a picture of a dashboard mockup into fully functional code.

“From figuring out if our crazy Mountain Recharge idea was even possible, to giving us live insights during the descent, AI was our guide,” said Alex Aidan, Formula E’s VP of Marketing.

1 week, 5 days назад @ cloud.google.com
Create chatbots that speak different languages with Gemini, Gemma, Translation LLM, and Model Context Protocol
Create chatbots that speak different languages with Gemini, Gemma, Translation LLM, and Model Context Protocol

Translation LLM server: A dedicated, lightweight MCP server exposing Google Cloud's Translation capabilities as a tool.

Gemini: A specialized MCP server uses Gemini Pro or a similar LLM for complex technical reasoning and problem-solving when invoked by the orchestrator.

Model Context Protocol: This protocol will allow Gemma to discover and invoke the Translation and Gemini "tools" running on their respective servers.

Gemma calls on Translation LLM: Gemma uses the MCP connection to send the French text to the Translation LLM Server, requesting an English translation.

Here, Gemma operates as the client, while Translation LLM, Gemini Flash, and Gemini Pro function as servers.

1 week, 6 days назад @ cloud.google.com
Lowe’s innovation: How Vertex AI helps create interactive shopping experiences
Lowe’s innovation: How Vertex AI helps create interactive shopping experiences

At Lowe's, we are always striving to make the shopping experience more enjoyable and convenient for our customers.

To address this issue and enhance the shopping journey, we introduced Visual Scout — an interactive way to explore the product catalog and quickly find products of interest on lowes.com.

Visual Scout is designed for shoppers who value the visual aspects of products when making certain purchasing decisions.

Visual Scout begins by presenting a panel of 10 items.

Based on this feedback, Visual Scout dynamically updates the panel, with items that reflect customer style and design preferences.

2 weeks, 2 days назад @ cloud.google.com
OpenAI
последний пост None
Microsoft
последний пост 14 часов назад
Coauthor roundtable: Reflecting on real world of doctors, developers, patients, and policymakers
Coauthor roundtable: Reflecting on real world of doctors, developers, patients, and policymakers

LEE: Yeah, yeah.

LEE: Yeah, yeah.

LEE: Yeah, yeah.

[LAUGHS] GOLDBERG: Right, right, right, yeah.

Yeah, yeah.

14 часов назад @ microsoft.com
Predicting and explaining AI model performance: A new approach to evaluation
Predicting and explaining AI model performance: A new approach to evaluation

This difficulty rating is based on a detailed rubric, originally developed for human tasks and shown to work reliably when applied by AI models.

Top: For each AI model, (1) run the new system on the ADeLe benchmark, and (2) extract its ability profile.

Creating detailed AI ability profiles: Using the 0–5 rating for each ability, the team created comprehensive ability profiles of 15 LLMs.

This analysis revealed the following: When measured against human performance, AI systems show different strengths and weaknesses across the 18 ability scales.

This makes it possible to anticipate potential failures before deployment, adding the important step of reliability assessment for AI models.

3 days, 14 hours назад @ microsoft.com
Abstracts: Heat Transfer and Deep Learning with Hongxia Hao and Bing Lv
Abstracts: Heat Transfer and Deep Learning with Hongxia Hao and Bing Lv

Today I’m talking to two researchers, Hongxia Hao, a senior researcher at Microsoft Research AI for Science, and Bing Lv, an associate professor in physics at the University of Texas at Dallas.

Hongxia and Bing are co-authors of a paper called Probing the Limit of Heat Transfer in Inorganic Crystals with Deep Learning .

LV: So I think one of the biggest things as Hongxia said, right?

We have a lot of new materials, exotic materials, which some of them, Hongxia can elaborate a little bit more.

HAO: Yeah, yeah.

1 week назад @ microsoft.com
Research Focus: Week of May 7, 2025
Research Focus: Week of May 7, 2025

NEW RESEARCH: This research introduces Murakkab, a prototype system built on a declarative workflow that reimagines how compound AI systems are built and managed to significantly improve resource efficiency.

They are essential for addressing complex AI tasks.

NEW RESEARCH: This report introduces Phi-4-reasoning, a 14-billion-parameter model optimized for complex reasoning tasks.

The work highlights the combined value of supervised fine-tuning and reinforcement learning for building efficient, high-performing reasoning models.

While LLMs have made considerable strides in complex reasoning tasks, they remain limited by their reliance on static internal knowledge and text-only r…

1 week, 1 day назад @ microsoft.com
Microsoft Fusion Summit explores how AI can accelerate fusion research
Microsoft Fusion Summit explores how AI can accelerate fusion research

While scalable fusion energy is still years away, researchers are now exploring how AI can help accelerate fusion research and bring this energy to the grid sooner.

In March 2025, Microsoft Research held its inaugural Fusion Summit, a landmark event that brought together distinguished speakers and panelists from within and outside Microsoft Research to explore this question.

His message was clear: advancing fusion will require international collaboration and the combined power of AI and high-performance computing to model potential fusion reactor designs.

Exploring AI’s broader impact on fusion engineering: Lightning talks from Microsoft Research labs addressed the…

1 week, 1 day назад @ microsoft.com
Abstracts: Societal AI with Xing Xie
Abstracts: Societal AI with Xing Xie

I’m here today with Xing Xie, a partner research manager at Microsoft Research and co-author of a white paper called Societal AI: Research Challenges and Opportunities.

HUIZINGA: So let’s start with a brief overview of the background for this white paper on Societal AI.

XIE: The idea for this white paper emerged in response to the shift we are witnessing in the AI landscape.

XIE: Rather than follow a traditional research methodology, we built this white paper around ten fundamental, foundational research questions.

HUIZINGA: Yeah, yeah, yeah.

1 week, 3 days назад @ microsoft.com
Societal AI: Building human-centered AI systems
Societal AI: Building human-centered AI systems

This work culminated in Societal AI: Research Challenges and Opportunities, a white paper that explores how AI can better align with societal needs.

Societal AI is an emerging interdisciplinary area of study that examines how AI intersects with social systems and public life.

Societal AI research agenda. Guiding principles for responsible integration: The research agenda is grounded in three key principles. Harmony: AI should minimize conflict and build trust to support acceptance.

How can AI systems be designed to ensure fairness and inclusivity across different cultures, regions, and demographic groups?

For more information, and to access the full white paper, visit the Microsoft Research Soc…

1 week, 3 days назад @ microsoft.com
Laws, norms, and ethics for AI in health
Laws, norms, and ethics for AI in health

Roxana is among the world’s thought leaders in AI, healthcare, and medicine, thanks in part to groundbreaking work on AI biases and trustworthiness.

When we think about AI, we think about it being very futuristic, but it’s trained on data from the past.

And so I think your struggles and frustrations on, you know, how to expand that nationwide, I think, are really, really informative.

And in a way, I think … I don’t know that I or my coauthors were satisfied with that.

You know, AI has all the time in the world [LAUGHS] to write you a text.

2 weeks назад @ microsoft.com
Research Focus: Week of April 21, 2025
Research Focus: Week of April 21, 2025

In this issue: Catch a preview of our presentations and papers at CHI 2025 and ICLR 2025.

You’ll also find a replay of a podcast discussion on rural healthcare innovation with Senior Vice President of Microsoft Health Jim Weinstein.

CONFERENCE: Microsoft at CHI 2025. Microsoft Research is proud to be a sponsor of the ACM Computer-Human Interaction (CHI) 2025 Conference on Human Factors in Computing Systems.

It advances our understanding of LLMs and their causal implications, and proposes a framework for future research at the intersection of LLMs and causality.

That’s the driving question behind the Microsoft Research Tools for Thought initiative.

3 weeks, 1 day назад @ microsoft.com
The Future of AI in Knowledge Work: Tools for Thought at CHI 2025
The Future of AI in Knowledge Work: Tools for Thought at CHI 2025

Participants’ reports about critical thinking and the effort involved align with longstanding human tendencies to manage cognitive load at work.

RecommendationsTo foster critical thinking at work, we recommend that AI tools actively encourage awareness, motivation, and skill development.

AI tools should enhance motivators for critical thinking (e.g., quality standards, skill-building) and mitigate inhibitors (e.g., time constraints, low awareness).

AI tools should also support knowledge workers’ ability to think critically by providing reasoning explanations (as some newer AI models now do), guided critiques, and cross-references.

Beyond these insights, our other CHI papers explore practica…

3 weeks, 6 days назад @ microsoft.com
Empowering patients and healthcare consumers in the age of generative AI
Empowering patients and healthcare consumers in the age of generative AI

[LAUGHS] DEBRONKART: Ah, well, that’s … that’s even weirder.

I’m, as I said at the beginning, I’m glad to be alive and I’m really, really, really grateful to be given a chance to share my thoughts with your audience because I really like super smart nerds.

And it’s really to me like where I’m seeing kind of the first set of really kind of promising AI applications.

And so, to me, that’s really kind of where I see the most interesting opportunities for technology and for digital health.

Just really, really appreciate it.

4 weeks назад @ microsoft.com
Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project
Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project

More expert users are satisfied with AI responses only where AI expertise is on par with their own expertise on the topic, while novice users had low satisfaction rates regardless of AI expertise.

AI expertise: Labels the AI agent expertise based on the same criteria as user expertise above.

We utilized our knowledge work classifier to label the chat log data as relating to knowledge work tasks.

We classified both the user expertise and AI agent expertise for anonymous interactions in Copilot in Bing.

Figure 10: Copilot in Bing satisfaction intersection of AI expertise and User expertise (August-September 2024). Conclusion: Understanding these metrics is vital for grasping user behavior over ti…

1 month назад @ microsoft.com
Debug-gym: an environment for AI coding tools to learn how to debug code like programmers
Debug-gym: an environment for AI coding tools to learn how to debug code like programmers

This was what motivated us to maximize the potential time savings from AI coding tools by teaching them to debug code.

Today’s AI coding tools boost productivity and excel at suggesting solutions for bugs based on available code and error messages.

This may leave users feeling like AI coding tools don’t understand the full context of the issues they are trying to solve.

Introducing debug-gym: A natural research question emerges: to what degree can LLMs use interactive debugging tools such as pdb?

The green bars indicate the performance of the agent with debugging tools, while the gray bars show the performance of the agent without debugging tools.

1 month назад @ microsoft.com
Research Focus: Week of April 7, 2025
Research Focus: Week of April 7, 2025

Check out our latest research and other updates.

MRI images can suffer from high levels of noise when scanning is accelerated with parallel imaging or when data are acquired using lower cost, low-field MRI systems.

This broad applicability means that the method is flexible and can be applied to different kinds of models without redesigning them.

The testing showed SNRAware significantly improves the quality and clinical utility of MRI images while preserving important diagnostic details.

To help astronomers accelerate this fundamental process, researchers from Microsoft and external colleagues introduce Mephisto, research designed to analyze extremely distant galaxies observed by the James …

1 month назад @ microsoft.com
Real-world healthcare AI development and deployment—at scale
Real-world healthcare AI development and deployment—at scale

And of course, at the end of the day, I think that’s really been your role.

What is the current state of affairs for multimodal, you know, healthcare AI, medical AI?

So many folks when they hear AI think, it will just magically do everything perfectly.

They’ve had to really think of other ways that will guarantee that the human stays in the loop.

But the fact that we now have a credible chance of making that dream happen for real, I think that’s pretty wonderful.

1 month, 1 week назад @ microsoft.com
MIT AI
последний пост 15 часов назад
With AI, researchers predict the location of virtually any protein within a human cell
With AI, researchers predict the location of virtually any protein within a human cell

Their method can predict the location of any protein in any human cell line, even when both protein and cell have never been tested before.

This single-cell localization could pinpoint a protein’s location in a specific cancer cell after treatment, for instance.

The researchers combined a protein language model with a special type of computer vision model to capture rich details about a protein and cell.

“Our approach is unique in that it can generalize across proteins and cell lines at the same time,” Zhang says.

The researchers verified that PUPS could predict the subcellular location of new proteins in unseen cell lines by conducting lab experiments and comparing the results.

15 часов назад @ news.mit.edu
Study shows vision-language models can’t handle queries with negation words
Study shows vision-language models can’t handle queries with negation words

The researchers tested the ability of vision-language models to identify negation in image captions.

Building on those findings, the team created a dataset of images with corresponding captions that include negation words describing missing objects.

Because the image-caption datasets don’t contain examples of negation, VLMs never learn to identify it.

Then they tested models by prompting them with negation words to retrieve images that contain certain objects, but not others.

It improved models’ image retrieval abilities by about 10 percent, while also boosting performance in the multiple-choice question answering task by about 30 percent.

2 days, 2 hours назад @ news.mit.edu
MIT Department of Economics to launch James M. and Cathleen D. Stone Center on Inequality and Shaping the Future of Work
MIT Department of Economics to launch James M. and Cathleen D. Stone Center on Inequality and Shaping the Future of Work

In recognition of the gift and the expansion of priorities it supports, on July 1 the initiative will become part of the new James M. and Cathleen D. Stone Center on Inequality and Shaping the Future of Work.

“This generous gift from the Stone Foundation advances our pioneering economics research on inequality, technology, and the future of the workforce.

“Cathy and I are thrilled to welcome MIT to the growing family of Stone Centers dedicated to studying the urgent challenges of accelerating wealth inequality,” James M. Stone says.

Agustín Rayo, dean of the School of Humanities, Arts, and Social Sciences, says, “I am thrilled to celebrate the creation of the James M. and Cathleen D. Stone …

2 days, 9 hours назад @ news.mit.edu
Hybrid AI model crafts smooth, high-quality videos in seconds
Hybrid AI model crafts smooth, high-quality videos in seconds

Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Adobe Research have now developed a hybrid approach, called “CausVid,” to create videos in seconds.

CausVid’s student model can then generate clips from a simple text prompt, turning a photo into a moving scene, extending a video, or altering its creations with new inputs mid-generation.

Caus(Vid) and effect: Many autoregressive models can create a video that’s initially smooth, but the quality tends to drop off later in the sequence.

CausVid instead uses a high-powered diffusion model to teach a simpler system its general video expertise, enabling it to create smooth visuals, but much faster.

1 week, 2 days назад @ news.mit.edu
Q&A: A roadmap for revolutionizing health care through data-driven innovation
Q&A: A roadmap for revolutionizing health care through data-driven innovation

Therefore, I founded Holistic Hospital Optimization (H2O) with the goal of optimizing hospital operations with machine learning to improve patient care.

Q: What are some surprising ways analytics are being used in health care that most people wouldn’t expect?

These are just two examples; there are many others where an analytical perspective to health care and medicine has made a material difference.

Q: Looking ahead, how do you see artificial intelligence shaping the future of health care?

The book really outlines my work in health care and its applications in the last decade.

1 week, 3 days назад @ news.mit.edu
New tool evaluates progress in reinforcement learning
New tool evaluates progress in reinforcement learning

This constant stopping and starting is extremely inefficient, driving up the amount of pollution, including greenhouse gases, that gets emitted per mile of driving.

One approach to counter this is known as eco-driving, which can be installed as a control system in autonomous vehicles to improve their efficiency.

That’s the basic idea behind eco-driving, Wu says.

The benchmark was described in detail in a paper presented at the 2025 International Conference on Learning Representations in Singapore.

Looking at approaches that have been used to address such complex problems, Wu says an important category of methods is multi-agent deep reinforcement learning (DRL), but a lack of adequate standar…

1 week, 3 days назад @ news.mit.edu
Novel AI model inspired by neural dynamics from the brain
Novel AI model inspired by neural dynamics from the brain

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a novel artificial intelligence model inspired by neural oscillations in the brain, with the goal of significantly advancing how machine learning algorithms handle long sequences of data.

One new type of AI model, called "state-space models," has been designed specifically to understand these sequential patterns more effectively.

However, existing state-space models often face challenges — they can become unstable or require a significant amount of computational resources when processing long data sequences.

"Our goal was to capture the stability and efficiency seen in biological neural sys…

1 week, 6 days назад @ news.mit.edu
Making AI models more trustworthy for high-stakes settings
Making AI models more trustworthy for high-stakes settings

Having a smaller prediction set may help a clinician zero in on the right diagnosis more efficiently, which could improve and streamline treatment for patients.

They learn to aggregate the augmentations on these held-out data, automatically augmenting the images in a way that maximizes the accuracy of the underlying model’s predictions.

“Combining test-time augmentation with conformal prediction is simple to implement, effective in practice, and requires no model retraining,” Shanmugam says.

Compared to prior work in conformal prediction across several standard image classification benchmarks, their TTA-augmented method reduced prediction set sizes across experiments, from 10 to 30 percent.…
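The underlying recipe is simple enough to sketch: average the classifier's softmax scores over several augmented views of each calibration image, compute the usual split-conformal threshold on those averaged scores, and build prediction sets the same way at test time. The NumPy sketch below is a schematic of that idea with placeholder functions, not the authors' exact procedure.

```python
# Schematic: split conformal prediction on top of test-time-augmented (TTA) scores.
# `predict_proba(batch)` and `augment(x)` stand in for the trained classifier and an
# augmentation policy; both are placeholders, not the paper's implementation.
import numpy as np

def tta_probs(predict_proba, augment, x, n_aug=8):
    """Average class probabilities over several augmented views of one input."""
    views = np.stack([augment(x) for _ in range(n_aug)])
    return predict_proba(views).mean(axis=0)

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Quantile of nonconformity scores (1 - prob of true class) on calibration data."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, min(q, 1.0))

def prediction_set(test_probs, threshold):
    """All classes whose nonconformity score falls below the calibrated threshold."""
    return np.where(1.0 - test_probs <= threshold)[0]
```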

2 weeks, 1 day назад @ news.mit.edu
The MIT-Portugal Program enters Phase 4
The MIT-Portugal Program enters Phase 4

I’m thrilled to see the program move into its next phase,” says MIT President Sally Kornbluth.

In each roughly five-year phase, MPP researchers focus on a handful of core research areas.

While work in the Phase 3 fields will continue during Phase 4, researchers will also turn their attention to four more areas: chips/nanotechnology, energy (a previous focus in Phase 2), artificial intelligence, and space.

“We have approval in Phase 4 to bring a number of Portuguese students over, and our principal investigators will benefit from close collaborations with Portuguese researchers,” he says.

“The MIT-Portugal Program has been a key enabler of our research on monitoring the aquatic microbiome fo…

2 weeks, 1 day назад @ news.mit.edu
Merging design and computer science in creative ways
Merging design and computer science in creative ways

One such researcher is MIT MAD Fellow Alexander Htet Kyaw, a graduate student pursuing dual master’s degrees in architectural studies in computation and in electrical engineering and computer science.

Working with Kyaw were Richa Gupta (architecture) and Bradley Bunch, Nidhish Sagar, and Michael Won — all from the MIT Department of Electrical Engineering and Computer Science (EECS).

Curator AI is designed to streamline online furniture shopping by providing context-aware product recommendations using AI and AR.

The platform uses AR to take the dimensions of a room with locations of windows, doors, and existing furniture.

In April, Kyaw will give a TEDx talk at his alma mater, Cornell Univer…

2 weeks, 3 days назад @ news.mit.edu
Novel method detects microbial contamination in cell cultures
Novel method detects microbial contamination in cell cultures

Cell therapy represents a promising new frontier in medicine, especially in treating diseases such as cancers, inflammatory diseases, and chronic degenerative disorders by manipulating or replacing cells to restore function or fight disease.

“Traditionally, cell therapy manufacturing is labor-intensive and subject to operator variability.

By introducing automation and machine learning, we hope to streamline cell therapy manufacturing and reduce the risk of contamination.

Specifically, our method supports automated cell culture sampling at designated intervals to check for contamination, which reduces manual tasks such as sample extraction, measurement, and analysis.

Beyond cell therapy manu…

2 weeks, 6 days назад @ news.mit.edu
Artificial intelligence enhances air mobility planning
Artificial intelligence enhances air mobility planning

Every day, hundreds of chat messages flow between pilots, crew, and controllers of the Air Mobility Command's 618th Air Operations Center (AOC).

Their mission planning allows the U.S. Air Force to quickly respond to national security needs around the globe.

"Now, we are using chat, which creates opportunities for artificial intelligence to enhance our workflows," says Colonel Joseph Monaco, the director of strategy at the 618th AOC, which is the Department of Defense's largest air operations center.

The 618th AOC is sponsoring Lincoln Laboratory to develop these artificial intelligence tools, through a project called Conversational AI Technology for Transition (CAITT).

That group will implem…

2 weeks, 6 days назад @ news.mit.edu
Designing a new way to optimize complex coordinated systems
Designing a new way to optimize complex coordinated systems

Now, researchers at MIT have developed an entirely new way of approaching these complex problems, using simple diagrams as a tool to reveal better approaches to software optimization in deep-learning models.

The researchers decided to focus on a particular class of deep-learning algorithms, which are currently a hot topic of research.

Deep learning is the basis of large artificial intelligence models, including large language models such as ChatGPT and image-generation models such as Midjourney.

The numbers within matrices are parameters, and are updated during long training runs, allowing for complex patterns to be found.

“This paper is the first time I’ve seen such a notation used t…

3 weeks назад @ news.mit.edu
New model predicts a chemical reaction’s point of no return
New model predicts a chemical reaction’s point of no return

When chemists design new chemical reactions, one useful piece of information involves the reaction’s transition state — the point of no return from which a reaction must proceed.

Their model could make it easier for chemists to design chemical reactions that could generate a variety of useful compounds, such as pharmaceuticals or fuels.

However, that process requires a great deal of computing power and can take hours or days to calculate a single transition state.

In 2023, Kulik, Duan, and others reported on a machine-learning strategy that they developed to predict the transition states of reactions.

“A linear guess is a good starting point for approximating where that transition state wil…

3 weeks, 1 day назад @ news.mit.edu
“Periodic table of machine learning” could fuel AI discovery
“Periodic table of machine learning” could fuel AI discovery

The periodic table stems from one key idea: All these algorithms learn a specific kind of relationship between data points.

Just like the periodic table of chemical elements, which initially contained blank squares that were later filled in by scientists, the periodic table of machine learning also has empty spaces.

An accidental equationThe researchers didn’t set out to create a periodic table of machine learning.

It includes everything from classification algorithms that can detect spam to the deep learning algorithms that power LLMs.

In addition, the flexible periodic table allows researchers to add new rows and columns to represent additional types of datapoint connections.

3 weeks, 2 days назад @ news.mit.edu
Berkeley AI
последний пост 1 month назад
Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)
Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications.

To mitigate the imminent prompt injection threat, we propose two fine-tuning defenses, StruQ and SecAlign.

Prompt Injection Attack: Causes. Below is the threat model of prompt injection attacks.

Prompt injection threat model in LLM-integrated applications. We propose that prompt injection has two causes.

Below are resources to learn more and keep updated on prompt injection attacks and defenses.

1 month назад @ bair.berkeley.edu
Repurposing Protein Folding Models for Generation with Latent Diffusion
Repurposing Protein Folding Models for Generation with Latent Diffusion

PLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure, by learning the latent space of protein folding models.

In PLAID, we develop a method that learns to sample from the latent space of protein folding models to generate new proteins.

Unlike many previous protein structure generative models, PLAID addresses the multimodal co-generation problem setting: simultaneously generating both discrete sequence and continuous all-atom structural coordinates.

In this way, we can use structural understanding information in the weights of pretrained protein folding models for the p…

1 month, 1 week назад @ bair.berkeley.edu
Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment
Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone.

The challenges of phantom jams: A stop-and-go wave moving backwards through highway traffic.

Smoothing behavior of RL AVs.

Overall, the steps towards deployment involved: Training in data-driven simulations: We used highway traffic data from I-24 to create a training environment with realistic wave dynamics, then validated the trained agent’s performance and robustness in a variety of new traffic scenarios.…

1 month, 3 weeks назад @ bair.berkeley.edu
Virtual Personas for Language Models via an Anthology of Backstories
Virtual Personas for Language Models via an Anthology of Backstories

We introduce Anthology, a method for conditioning LLMs to representative, consistent, and diverse virtual personas by generating and utilizing naturalistic backstories with rich details of individual values and experience.

What does it mean for large language models (LLMs) to be trained on massive text corpora, collectively produced by millions and billions of distinctive human authors?

In this work, we introduce Anthology, an approach for steering LLMs to representative, consistent, and diverse virtual personas by providing richly detailed life narratives of individuals as conditioning context to models.

By grounding langu…

6 months назад @ bair.berkeley.edu
Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination
Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination

Sample language model responses to different varieties of English and native speaker reactions.

Over 1 billion people around the world speak varieties such as Indian English, Nigerian English, Irish English, and African-American English.

Then, we compared the language model responses to the “standard” varieties and the non-“standard” varieties.

Here, we included the original GPT-3.5 responses, plus responses from GPT-3.5 and GPT-4 where the models were told to imitate the style of the input.

That can reinforce barriers against speakers of non-“standard” varieties as AI models become increasingly used in …

7 months, 3 weeks назад @ bair.berkeley.edu
How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark
How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

When we began studying jailbreak evaluations, we found a fascinating paper claiming that you could jailbreak frontier LLMs simply by translating forbidden prompts into obscure languages.

This blog post shows how to use a new, state-of-the-art jailbreak benchmark - StrongREJECT - to accurately and robustly evaluate jailbreak methods.

PAP instructs an attacker model to persuade a victim model to give it harmful information using techniques like misrepresentation and logical appeals.

We conducted two experiments to test this hypothesis: We used StrongREJECT to evaluate 37 jailbreak methods on an unaligned model; Dolp…

8 months, 2 weeks назад @ bair.berkeley.edu
Are We Ready for Multi-Image Reasoning? Launching VHs: The Visual Haystacks Benchmark!
Are We Ready for Multi-Image Reasoning? Launching VHs: The Visual Haystacks Benchmark!

Launching VHs: The Visual Haystacks Benchmark!

Humans excel at processing vast arrays of visual information, a skill that is crucial for achieving artificial general intelligence (AGI).

Visual Haystacks: the first "visual-centric" Needle-In-A-Haystack (NIAH) benchmark designed to rigorously evaluate Large Multimodal Models (LMMs) in processing long-context visual information.

The first NIAH benchmark for visual reasoning was introduced by Google in the Gemini-v1.5 technical report.

What is the Visual Haystacks (VHs) Benchmark?

9 months, 4 weeks назад @ bair.berkeley.edu
AWS Machine Learning
последний пост 10 часов назад
How Apoidea Group enhances visual information extraction from banking documents with multimodal models using LLaMA-Factory on Amazon SageMaker HyperPod
How Apoidea Group enhances visual information extraction from banking documents with multimodal models using LLaMA-Factory on Amazon SageMaker HyperPod

As such, information extraction technology is instrumental in accelerating customer onboarding, maintaining regulatory compliance, and driving the digital transformation of the banking sector, particularly in high-volume document processing tasks.

In this post, we present our work and step-by-step code on fine-tuning the Qwen2-VL-7B-Instruct model using LLaMA-Factory on SageMaker HyperPod.

Fine-tuning vision-language models for visual document understanding tasks offers significant advantages due to their advanced architecture and pre-trained capabilities.

This makes vision-language models powerful tools for advancing visual document understanding technology in both research and practical a…

10 часов назад @ aws.amazon.com
How Qualtrics built Socrates: An AI platform powered by Amazon SageMaker and Amazon Bedrock
How Qualtrics built Socrates: An AI platform powered by Amazon SageMaker and Amazon Bedrock

In this post, we share how Qualtrics built an AI platform powered by Amazon SageMaker and Amazon Bedrock.

In early 2020, with the push for deep learning and transformer models, Qualtrics created its first enterprise-level ML platform called Socrates.

The standout feature of the Unified GenAI Gateway is its centralized integration with inference platforms like SageMaker Inference and Amazon Bedrock.

“By seamlessly integrating SageMaker Inference into our Socrates platform, we’re able to deliver inference advancements in AI to our global customer base.

Partnership with SageMaker Inference: Since adopting SageMaker Inference, the Qualtrics Socrates team has been a key collaborator in the developmen…

10 часов назад @ aws.amazon.com
Vxceed secures transport operations with Amazon Bedrock
Vxceed secures transport operations with Amazon Bedrock

This solution enables efficient document searching, simplifies trip booking, and enhances operational decisions while maintaining data security and protection.

LimoConnect Q solution overview and implementation highlightsTo address the challenges of secure, efficient, and intelligent ground transportation management, Vxceed developed LimoConnect Q, an AI-powered solution.

LimoConnect Q’s architecture uses Amazon Bedrock, Amazon API Gateway, Amazon DynamoDB, and AWS Lambda to create a secure, scalable AI-powered transportation management system.

Access to multiple AI models: Amazon Bedrock supports various FMs, allowing Vxceed to experiment and optimize performance across different use cases…

10 часов назад @ aws.amazon.com
Cost-effective AI image generation with PixArt-Σ inference on AWS Trainium and AWS Inferentia
Cost-effective AI image generation with PixArt-Σ inference on AWS Trainium and AWS Inferentia

PixArt-Sigma is a diffusion transformer model that is capable of image generation at 4k resolution.

By using these AI chips, you can achieve optimal performance and efficiency when running inference with diffusion transformer models like PixArt-Sigma.

Step 2 – Download and compile the PixArt-Sigma model for AWS Trainium: This section provides a step-by-step guide to compiling PixArt-Sigma for AWS Trainium.

Load compiled component models into the generation pipeline: After each component model has been compiled, load them into the overall generation pipeline for image generation.
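The compile-then-reload workflow those steps describe looks roughly like the sketch below, which uses torch-neuronx ahead-of-time tracing on a tiny stand-in module; the real PixArt-Sigma transformer has a more involved forward signature, so treat this only as the general shape of the process.

```python
# Sketch of the AWS Neuron compile/reload workflow (stand-in module, not PixArt-Sigma itself).
# Requires a Trainium/Inferentia instance with torch-neuronx installed.
import torch
import torch_neuronx

class TinyBlock(torch.nn.Module):
    """Placeholder for one component model (e.g., the diffusion transformer)."""
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(64, 64)

    def forward(self, x):
        return torch.nn.functional.gelu(self.proj(x))

example_input = torch.randn(1, 16, 64)
compiled = torch_neuronx.trace(TinyBlock().eval(), example_input)  # ahead-of-time Neuron compile
torch.jit.save(compiled, "tiny_block_neuron.pt")

# Later, load the compiled artifact and drop it into the generation pipeline.
block = torch.jit.load("tiny_block_neuron.pt")
print(block(example_input).shape)
```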

He helps customers train and deploy machine learning models leveraging AWS Trainium or AWS Inferentia chips to accele…

1 day, 14 hours назад @ aws.amazon.com
Customize DeepSeek-R1 671b model using Amazon SageMaker HyperPod recipes – Part 2
Customize DeepSeek-R1 671b model using Amazon SageMaker HyperPod recipes – Part 2

We demonstrate this through the step-by-step implementation of these recipes using both SageMaker training jobs and SageMaker HyperPod.

In this post, we explore solutions that demonstrate how to fine-tune the DeepSeek-R1 model using these recipes on either SageMaker HyperPod or SageMaker training jobs.

Option B: Fine-tune using SageMaker HyperPod with Slurm. To fine-tune the model using HyperPod, make sure that your cluster is up and ready by following the prerequisites mentioned earlier.

To start using SageMaker HyperPod recipes, visit our sagemaker-hyperpod-recipes GitHub repository for comprehensive documentation and example implementations.

He specializes in large language model training …

1 day, 15 hours ago @ aws.amazon.com
Build a financial research assistant using Amazon Q Business and Amazon QuickSight for generative AI–powered insights
Build a financial research assistant using Amazon Q Business and Amazon QuickSight for generative AI–powered insights Build a financial research assistant using Amazon Q Business and Amazon QuickSight for generative AI–powered insights

The necessary users and groups for Amazon Q Business and QuickSight access with at least one Amazon Q Business Pro user with administrative privileges.

Under Security and permissions, give access to the Amazon Q Business application by selecting the Amazon Q Business application you created.

Query your Amazon Q Business application: To start chatting with Amazon Q Business, complete the following steps: On the Amazon Q Business console, choose your Amazon Q Business application.

Remove Amazon Q Business resources: Delete the Amazon Q Business application.

To learn more about Amazon Q Business, refer to the Amazon Q Business User Guide.

1 day, 15 hours ago @ aws.amazon.com
Securing Amazon Bedrock Agents: A guide to safeguarding against indirect prompt injections
Securing Amazon Bedrock Agents: A guide to safeguarding against indirect prompt injections Securing Amazon Bedrock Agents: A guide to safeguarding against indirect prompt injections

Amazon Bedrock Agents has shared a few sample prompts for Sonnet, Haiku, and Amazon Titan Text Premier models in the Agents Blueprints Prompt Library.

Refer to Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 2 in the AWS Machine Learning Blog for more details on logging and observability for Amazon Bedrock Agents.

Conclusion: In this post, we’ve explored comprehensive strategies to safeguard your Amazon Bedrock Agents against indirect prompt injections.

Satveer Khurpa is a Sr. WW Specialist Solutions Architect, Amazon Bedrock at Amazon Web Services, specializing in Bedrock Security.

Sumanik Singh is a Software Development Engineer at Amazon Web Se…

2 days, 12 hours ago @ aws.amazon.com
Build scalable containerized RAG based generative AI applications in AWS using Amazon EKS with Amazon Bedrock
Build scalable containerized RAG based generative AI applications in AWS using Amazon EKS with Amazon Bedrock Build scalable containerized RAG based generative AI applications in AWS using Amazon EKS with Amazon Bedrock

Amazon EKS provides a scalable, secure, and cost-efficient environment for building RAG applications with Amazon Bedrock and also enables efficient deployment and monitoring of AI-driven workloads while leveraging Bedrock’s FMs for inference.

This enables a RAG scenario with Amazon Bedrock by enriching the generative AI prompt using Amazon Bedrock APIs with your company-specific data retrieved from the OpenSearch Serverless vector database.
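For illustration, here is a minimal sketch of that prompt-enrichment step: retrieved company text is prepended to the user question before calling a Bedrock model through boto3. The retrieve() helper, the model ID, and the region are placeholders for this sketch, not the post's actual code.

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # region is an assumption

def retrieve(query):
    # stand-in for a vector search against OpenSearch Serverless
    return ["Internal doc: refunds are accepted within 30 days of purchase."]

question = "What is our refund policy?"
prompt = "Context:\n" + "\n".join(retrieve(question)) + "\n\nQuestion: " + question

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
    }),
)
print(json.loads(response["body"].read())["content"][0]["text"])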

Solution overview: The solution uses Amazon EKS managed node groups to automate the provisioning and lifecycle management of nodes (Amazon EC2 instances) for the Amazon EKS Kubernetes cluster.

Create an Amazon Bedrock knowledge base.

Using Amazon S3 and Am…

2 days, 13 hours ago @ aws.amazon.com
How Hexagon built an AI assistant using AWS generative AI services
How Hexagon built an AI assistant using AWS generative AI services How Hexagon built an AI assistant using AWS generative AI services

Understanding these advantages, we partnered with AWS to embark on a journey to develop HxGN Alix, an AI-powered digital worker using AWS generative AI services.

Understanding consumer generative AI and enterprise generative AI: Generative AI serves diverse purposes, with consumer and enterprise applications differing in scope and focus.

This phase was characterized by deepening our technical expertise, refining operational processes, and gaining real-world experience with generative AI models.

Amazon Bedrock – We used Amazon Bedrock a…

2 days, 13 hours ago @ aws.amazon.com
Build an intelligent community agent to revolutionize IT support with Amazon Q Business
Build an intelligent community agent to revolutionize IT support with Amazon Q Business Build an intelligent community agent to revolutionize IT support with Amazon Q Business

Benefits of Amazon Q Business: The following are some relevant benefits of Amazon Q Business: Scalability – As an AWS cloud-based service, Amazon Q is highly scalable and able to handle numerous concurrent requests from multiple employees without performance degradation.

Amazon S3 connector – Organizations can upload product documents directly to Amazon Simple Storage Service (Amazon S3), enabling Amazon Q Business to access and incorporate this information into its knowledge base.

Create an Amazon Q application: You are now ready to create an Amazon Q application: Sign in to your AWS account on the AWS Management Console and set your preferred AWS Region.

Downgrade your IAM Identity Center user …

3 days, 12 hours ago @ aws.amazon.com
Elevate marketing intelligence with Amazon Bedrock and LLMs for content creation, sentiment analysis, and campaign performance evaluation
Elevate marketing intelligence with Amazon Bedrock and LLMs for content creation, sentiment analysis, and campaign performance evaluation Elevate marketing intelligence with Amazon Bedrock and LLMs for content creation, sentiment analysis, and campaign performance evaluation

Conduct sentiment analysis with social media data: The next step involves conducting sentiment analysis on social media data.

Using Anthropic Claude 3.5 Sonnet: ================================== Sentiment: Negative Explanation: This post expresses a strongly negative sentiment towards [AnyCompany]'s printer ink refills.

Positive sentiments (praises) – A summary of positive sentiments and themes extracted from the social media sentiment analysis.

Negative sentiments (flaws) – A summary of negative sentiments and critiques extracted from the social media sentiment analysis.

Generated marketing content – The content generated by the content generation LLM, such as ad copy, social media posts, a…

6 days, 13 hours ago @ aws.amazon.com
How Deutsche Bahn redefines forecasting using Chronos models – Now available on Amazon Bedrock Marketplace
How Deutsche Bahn redefines forecasting using Chronos models – Now available on Amazon Bedrock Marketplace How Deutsche Bahn redefines forecasting using Chronos models – Now available on Amazon Bedrock Marketplace

In this post, we share how Deutsche Bahn is redefining forecasting using Chronos models, and provide an example use case to demonstrate how you can get started using Chronos.

Chronos: Learning the language of time series. The Chronos model family represents a breakthrough in time series forecasting by using language model architectures.

Unlike traditional time series forecasting models that require training on specific datasets, Chronos can be used for forecasting immediately.
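As a rough illustration of that zero-shot usage, the open-source chronos-forecasting package exposes a pipeline that predicts directly from a context window; the checkpoint name and toy series below are assumptions for this sketch, not Deutsche Bahn's setup.

import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small")  # placeholder checkpoint
context = torch.tensor([112.0, 118.0, 132.0, 129.0, 141.0, 135.0, 148.0, 160.0])  # toy demand series
forecast = pipeline.predict(context, prediction_length=4)  # zero-shot: no training on this series
print(forecast.shape)  # (num_series, num_samples, prediction_length)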

Building on this success, we recently launched Chronos-Bolt, which delivers higher zero-shot accuracy compared to original Chronos models.

Using Chronos, Deutsche Bahn demonstrates how cutting-edge technology can transf…

1 week, 1 day ago @ aws.amazon.com
Use custom metrics to evaluate your generative AI application with Amazon Bedrock
Use custom metrics to evaluate your generative AI application with Amazon Bedrock Use custom metrics to evaluate your generative AI application with Amazon Bedrock

With Amazon Bedrock Evaluations, you can evaluate foundation models (FMs) and Retrieval Augmented Generation (RAG) systems, whether hosted on Amazon Bedrock or another model or RAG system hosted elsewhere, including Amazon Bedrock Knowledge Bases or multi-cloud and on-premises deployments.

Whether assessing factuality, coherence, helpfulness, or domain-specific criteria, custom metrics in Amazon Bedrock enable more meaningful and actionable evaluation insights.

Create a RAG system evaluation with custom metrics using Amazon Bedrock Evaluations: In this example, we walk through a RAG system evaluation with a combination of built-in metrics and custom evaluation metrics on the Amazon Bedrock co…

1 week, 2 days ago @ aws.amazon.com
Build a gen AI–powered financial assistant with Amazon Bedrock multi-agent collaboration
Build a gen AI–powered financial assistant with Amazon Bedrock multi-agent collaboration Build a gen AI–powered financial assistant with Amazon Bedrock multi-agent collaboration

In terms of communication and reporting, the system generates detailed company financial portfolios and creates comprehensive revenue and expense reports.

Overview of Amazon Bedrock multi-agent collaboration: The Amazon Bedrock multi-agent collaboration framework facilitates the development of sophisticated systems that use LLMs.

Use Amazon Bedrock multi-agent collaboration to power the financial assistant: The implementation of a multi-agent approach offers numerous compelling advantages.

In Advanced prompts, add the following prompt configuration at lines 22 and 23: Here is an example of a company portfolio.

Initial setup requires establishing collaboration: Open the financial agent (primary ag…

1 week, 6 days ago @ aws.amazon.com
WordFinder app: Harnessing generative AI on AWS for aphasia communication
WordFinder app: Harnessing generative AI on AWS for aphasia communication WordFinder app: Harnessing generative AI on AWS for aphasia communication

The challenge: Overcoming communication barriers. In 2023, it was estimated that more than 140,000 people in Australia were living with aphasia.

To address this challenge, the WordFinder app sends the initial word list to an AWS Lambda function through Amazon API Gateway, a fully managed service that securely handles API requests.

WordFinder app component details: In this section, we take a closer look at the components of the WordFinder app.

Upon receipt of this data, the WordFinder app updates and displays the new list of words for the user who has aphasia.

Conclusion: The QARC team and Scott Harding worked closely with AWS to develop WordFinder, a mobile app that addresses communication challe…

1 week, 6 days ago @ aws.amazon.com
NVIDIA
latest post 11 hours ago
Simplify Setup and Boost Data Science in the Cloud using NVIDIA CUDA-X and Coiled
Simplify Setup and Boost Data Science in the Cloud using NVIDIA CUDA-X and Coiled Simplify Setup and Boost Data Science in the Cloud using NVIDIA CUDA-X and Coiled

This post demonstrates how to use NVIDIA RAPIDS, part of NVIDIA CUDA-X libraries, and cloud GPUs with the cloud platform Coiled, which is designed to simplify running Python workloads at scale.

Cloud GPUs: Many cloud platforms provide immediate access to the latest NVIDIA GPU architectures, without hardware refresh cycles, offering flexibility to scale resources based on computational demands.

Coiled Notebooks: To start an interactive Jupyter notebook session with Coiled Notebooks, run the RAPIDS notebook container through the notebook service:

coiled notebook start --gpu --container nvcr.io/nvidia/rapidsai/notebooks:25.02-cuda12.8-py3.12

Note that the --gpu flag will automatically select a g4dn…
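Once the notebook is up, a quick way to confirm the GPU-backed stack is working is to run a small cuDF operation; this snippet is a minimal sketch with synthetic data, not part of the original post.

import cudf  # GPU DataFrame library from RAPIDS

df = cudf.DataFrame({"x": range(1_000_000), "y": 2})
print(df["x"].mean())           # computed on the GPU
print(len(df.query("y == 2")))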

11 hours ago @ developer.nvidia.com
Predicting Performance on Apache Spark with GPUs
Predicting Performance on Apache Spark with GPUs Predicting Performance on Apache Spark with GPUs

The tool works by taking the Spark event logs generated from your CPU-based Spark applications as its primary input.

Recommended Spark configurations for GPU, which are calculated based on cluster information (like memory and cores) and data from the Spark event logs that could impact the performance of Spark applications on GPU.

The process involves running the target Spark applications on both CPU and GPU clusters and collecting the resulting Spark event logs.

Getting started with Apache Spark on GPUs: Enterprises can take advantage of the RAPIDS Accelerator for Apache Spark to seamlessly migrate Apache Spark workloads to NVIDIA GPUs.

Run existing Apache Spark applications on GPUs with no c…

13 hours ago @ developer.nvidia.com
Exploring the Revenue-Generating Potential of AI Factories
Exploring the Revenue-Generating Potential of AI Factories Exploring the Revenue-Generating Potential of AI Factories

The best AI factories use accelerated computing to increase tokens per watt — optimizing AI performance while dramatically increasing energy efficiency across AI factories and applications.

How an AI Factory Works in Practice: An AI factory is a system of components that come together to turn data into intelligence.

NVIDIA provides all the components needed to build AI factories, including accelerated computing, high-performance GPUs, high-bandwidth networking and optimized software.

A full-stack AI factory supports enterprises in achieving operational excellence, enabling them to harness AI’s potential faster and with greater confidence.

Learn more about how AI factories are redefining data …

15 hours ago @ blogs.nvidia.com
Accelerating Embedding Lookups with cuEmbed
Accelerating Embedding Lookups with cuEmbed Accelerating Embedding Lookups with cuEmbed

NVIDIA recently released cuEmbed, a high-performance, header-only CUDA library that accelerates embedding lookups on NVIDIA GPUs.

In this post, I explain what embedding lookups are, why they’re critical for recommenders, and how the cuEmbed optimization techniques deliver exceptional performance.
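Conceptually, an embedding lookup is just a gather of table rows by index followed by pooling; the toy NumPy sketch below illustrates the forward pass the post refers to (sizes are arbitrary, and real recommenders run this on the GPU).

import numpy as np

vocab_size, dim = 1000, 8
table = np.random.rand(vocab_size, dim).astype(np.float32)  # embedding table
indices = np.array([3, 17, 256, 3])                          # one sample's sparse feature IDs
pooled = table[indices].sum(axis=0)                          # gather rows, then sum-pool
print(pooled.shape)  # (8,)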

Embedding lookup forward pass for one batch sample. Learning the embedding vectors requires a backward pass, in which gradients arriving from the downstream neural network are propagated back to the embedding table.

The strategy used by cuEmbed to increase performance of embedding lookups is to maximize the number of loads-in-flight presented to the memory system.

Chen Yang, on Pinterest ML Foundatio…

15 hours ago @ developer.nvidia.com
Time to Slay: ‘DOOM: The Dark Ages’ Looms on GeForce NOW
Time to Slay: ‘DOOM: The Dark Ages’ Looms on GeForce NOW Time to Slay: ‘DOOM: The Dark Ages’ Looms on GeForce NOW

This GFN Thursday, DOOM: The Dark Ages — the bold medieval-inspired prequel to DOOM and DOOM Eternal — is available for GeForce NOW premium members, aka Ultimate and Performance members, to stream from the cloud at launch.

GeForce NOW Ultimate or Performance members can now claim the DOOM Slayer Verdant skin reward, a fierce, ruthless-looking armor set that’s built for relentless slaughter.

Ultimate and Performance members enjoy higher resolutions and lower latency compared with free users for a true cloud-gaming edge.

Ultimate and Performance members can play with GeForce RTX 4080-level performance with the highest frame rates and lowest latency.

Ultimate members can elevate their adventur…

17 hours ago @ blogs.nvidia.com
Into the Omniverse: Computational Fluid Dynamics Simulation Finds Smoothest Flow With AI-Driven Digital Twins
Into the Omniverse: Computational Fluid Dynamics Simulation Finds Smoothest Flow With AI-Driven Digital Twins Into the Omniverse: Computational Fluid Dynamics Simulation Finds Smoothest Flow With AI-Driven Digital Twins

Among the powerful CAE methods, computational fluid dynamics (CFD) simulation plays a critical role in understanding and optimizing fluid flow for use cases, such as aerodynamic testing in aerospace and automotive engineering or thermal management for electronics.

Ansys, a leader in simulation software, is harnessing the power of NVIDIA technologies for real-time physics and accelerated simulation with AI-driven digital twins.

Real-Time Digital Twins for CFD: Ansys is also adopting Omniverse and OpenUSD to create more connected, collaborative simulation environments for CFD.

Ansys Simulation World is a virtual and in-person global simulation experience.

Stay up to date by subscribing to NVIDI…

17 hours ago @ blogs.nvidia.com
Visa Makes Payments Personalized and Secure With AI
Visa Makes Payments Personalized and Secure With AI Visa Makes Payments Personalized and Secure With AI

The company is also working toward agentic commerce — where AI agents help customers with payments.

For example, AI agents with access to payment credentials can handle transactions on behalf of consumers, like booking travel arrangements, when instructed.

Jacob Liberman, director of product management at NVIDIA, explains how agentic AI bridges the gap between powerful AI models and practical enterprise applications.

Telenor Builds Norway’s First AI Factory, Offering Sustainable and Sovereign Data Processing: Telenor opened Norway’s first AI factory in November 2024, enabling organizations to process sensitive data securely on Norwegian soil while prioritizing environmental responsibility.

Te…

1 day, 14 hours ago @ blogs.nvidia.com
Press Play on Don Diablo’s Music Video — Created With NVIDIA RTX-Powered Generative AI
Press Play on Don Diablo’s Music Video — Created With NVIDIA RTX-Powered Generative AI Press Play on Don Diablo’s Music Video — Created With NVIDIA RTX-Powered Generative AI

Electronic musician Don Diablo’s ‘BLACKOUT’ music video highlights a hybrid creative workflow built for speed, control and storytelling, tapping into RTX-powered image generation and cinematic cloud-based animation.

Electronic music icon Don Diablo is known for pushing the boundaries of music, visual arts and live performance — and the music video for his latest single, “BLACKOUT,” explores using generative AI, combining NVIDIA RTX-powered and cloud-based tools in a hybrid workflow.

“Working with these new AI tools literally feels like stepping into my own mind,” said Diablo.

With RTX AI PCs, users can access the latest and greatest models and tools, as well as powerful AI performance.

Plug…

1 day, 17 hours ago @ blogs.nvidia.com
How Reasoning AI Agents Transform High-Stakes Decision Making
How Reasoning AI Agents Transform High-Stakes Decision Making How Reasoning AI Agents Transform High-Stakes Decision Making

Thanks to reasoning AI models, agents can learn how to think critically and tackle complex tasks.

Reasoning agents are making a splash in industries where decisions rely on multiple factors.

Reasoning Off: Modern AI agents can toggle reasoning on and off, allowing them to efficiently use compute and tokens.

Reasoning AI Agents in Action: Reasoning AI agents are already being used for complex problem-solving across industries, including: Healthcare: Enhancing diagnostics and treatment planning.

Designing an AI Reasoning Agent: A few key components are required to build an AI agent, including tools, memory and planning modules.

2 days, 15 hours ago @ blogs.nvidia.com
NVIDIA Partners Showcase Cutting-Edge Robotic and Industrial AI Solutions at Automate 2025
NVIDIA Partners Showcase Cutting-Edge Robotic and Industrial AI Solutions at Automate 2025 NVIDIA Partners Showcase Cutting-Edge Robotic and Industrial AI Solutions at Automate 2025

Deepu Talla, vice president of robotics and edge AI at NVIDIA, delivered a keynote on physical AI and industrial autonomy.

It will provide future KUKA robots with better AI vision and AI-based control tasks powered by NVIDIA’s software stack.

Developers can now use their own video data to try the AI Blueprint for VSS in the cloud with NVIDIA Launchable.

The copilot uses NVIDIA accelerated computing and NVIDIA NIM and NeMo Retriever microservices from the AI Blueprint for VSS to add multimodal capabilities.

Explore the following talks to connect with NVIDIA and its partners at Automate. Learn more about NVIDIA’s latest work in robotics and industrial AI at Automate, running through May 15.

2 days, 16 hours ago @ blogs.nvidia.com
NVIDIA Scores COMPUTEX Best Choice Awards
NVIDIA Scores COMPUTEX Best Choice Awards NVIDIA Scores COMPUTEX Best Choice Awards

NVIDIA GeForce RTX 5090 GPU, NVIDIA Quantum-X Photonics InfiniBand switch systems, NVIDIA DGX Spark, NVIDIA GB200 NVL72 and NVIDIA Cosmos recognized at Asia’s premier technology and computer trade exhibition.

The NVIDIA GeForce RTX 5090 GPU won the Gaming and Entertainment category award; the NVIDIA Quantum-X Photonics InfiniBand switch system won the Networking and Communication category award; NVIDIA DGX Spark won the Computer and System category award; and the NVIDIA GB200 NVL72 system and NVIDIA Cosmos world foundation model development platform won Golden Awards.

GB200 NVL72 and NVIDIA Cosmos Go Gold: NVIDIA GB200 NVL72 and NVIDIA Cosmos each won Golden Awards.

The NVIDIA GB200 NVL72 sys…

3 days, 2 hours ago @ blogs.nvidia.com
Wildfire Prevention: AI Startups Support Prescribed Burns, Early Alerts
Wildfire Prevention: AI Startups Support Prescribed Burns, Early Alerts Wildfire Prevention: AI Startups Support Prescribed Burns, Early Alerts

AI enables fire departments to keep more eyes on controlled burns, making them safer and more accepted in communities, say industry experts.

“This is just like cancer treatment,” said Sonia Kastner, CEO and founder of Pano AI, based in San Francisco.

Those images are transmitted to the cloud every minute, where AI models running on GPUs do inference for smoke detection.

Harnessing AI for Controlled Burns: The California Department of Forestry and Fire Protection (CAL FIRE) is carrying out prescribed fires, or controlled burns, to reduce dry vegetation that creates fuel for wildfires.

Green Grid has deployed its trailer-mounted AI camera sensors for monitoring fires and controlled burns before they g…

1 week ago @ blogs.nvidia.com
Accelerate Deep Learning and LLM Inference with Apache Spark in the Cloud
Accelerate Deep Learning and LLM Inference with Apache Spark in the Cloud Accelerate Deep Learning and LLM Inference with Apache Spark in the Cloud

We’ll demonstrate integration with serving platforms such as NVIDIA Triton Inference Server, performant LLM inference with vLLM, and deployment on cloud platforms.

Basic deployment: Distributed inference with predict_batch_udf. Spark 3.4 introduced the predict_batch_udf API, which provides a simple interface for DL model inference.

With this direct inference approach, you can port your existing PyTorch, TensorFlow, or Huggingface framework code to Spark for distributed inference with minimal code changes.
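A minimal sketch of that API is shown below, assuming Spark 3.4+ and using a trivial NumPy function in place of a real PyTorch or TensorFlow model; the column name and batch size are illustrative.

import numpy as np
from pyspark.sql import SparkSession
from pyspark.sql.types import FloatType
from pyspark.ml.functions import predict_batch_udf

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(float(i),) for i in range(100)], ["feature"])

def make_predict_fn():
    # loaded once per executor; a real model (e.g., torch.load) would go here
    def predict(inputs: np.ndarray) -> np.ndarray:
        return inputs * 2.0
    return predict

model_udf = predict_batch_udf(make_predict_fn, return_type=FloatType(), batch_size=32)
df.withColumn("prediction", model_udf("feature")).show(5)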

Distributed inference using an inference server, decoupling CPU and GPU execution.

Setting spark.executor.resource.gpu.amount=(gpus per node) will allocate the requisite GPUs to each executo…

1 week ago @ developer.nvidia.com
Spotlight: Accelerating the Discovery of New Battery Materials with SES AI’s Molecular Universe
Spotlight: Accelerating the Discovery of New Battery Materials with SES AI’s Molecular Universe Spotlight: Accelerating the Discovery of New Battery Materials with SES AI’s Molecular Universe

The number of possible electrolyte materials untouched in the whole chemical design space, the Molecular Universe, is astronomical, due to the combinatorial power of elements with high connectivity to each other.

The Molecular Universe contains an astronomical number of molecules that could be used as new materials, but humans have only touched an infinitesimal fraction of it.

Mapping the Molecular Universe: SES AI, a company specializing in battery innovation, is using NVIDIA hardware and software to build a “map” of the Molecular Universe.

Verifying the breadth of a screening effort required a way of understanding how all the molecules in the Molecular Universe are related to one anoth…

1 week ago @ developer.nvidia.com
LM Studio Accelerates LLM Performance With NVIDIA GeForce RTX GPUs and CUDA 12.8
LM Studio Accelerates LLM Performance With NVIDIA GeForce RTX GPUs and CUDA 12.8 LM Studio Accelerates LLM Performance With NVIDIA GeForce RTX GPUs and CUDA 12.8

The release of LM Studio 0.3.15 brings improved performance for RTX GPUs thanks to CUDA 12.8, significantly improving model load and response times.

The latest improvements to LM Studio improve its performance and usability — delivering the highest throughput yet on RTX AI PCs.

NVIDIA partnered with the LM Studio and llama.cpp communities to integrate several enhancements to maximize RTX GPU performance.

With a compatible driver, LM Studio automatically upgrades to the CUDA 12.8 runtime, enabling significantly faster model load times and higher overall performance.

Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X — and stay informed by subscribing to the RTX AI PC newsletter.

1 week ago @ blogs.nvidia.com
Facebook
latest post 1 week ago
Accelerating GPU indexes in Faiss with NVIDIA cuVS
Accelerating GPU indexes in Faiss with NVIDIA cuVS Accelerating GPU indexes in Faiss with NVIDIA cuVS

Meta and NVIDIA collaborated to accelerate vector search on GPUs by integrating NVIDIA cuVS into Faiss v1.10, Meta’s open source library for similarity search.

In its latest release, Faiss 1.10.0 officially includes these algorithms from the NVIDIA cuVS library.

Faiss 1.10.0 also includes a new conda package that unlocks the ability to choose between the classic Faiss GPU implementations and the newer NVIDIA cuVS algorithms, making it easy for users to switch between GPU and CPU.
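For orientation, here is a minimal sketch of building a GPU IVF-Flat index with the Faiss Python API on random vectors; whether the classic implementation or the cuVS-backed one is used depends on how the package was installed, and the sizes below are toy values.

import numpy as np
import faiss

d, nb = 96, 10000
xb = np.random.rand(nb, d).astype("float32")

quantizer = faiss.IndexFlatL2(d)
cpu_index = faiss.IndexIVFFlat(quantizer, d, 256)      # 256 inverted lists
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)  # move the index to GPU 0

gpu_index.train(xb)
gpu_index.add(xb)
distances, ids = gpu_index.search(xb[:5], 10)          # top-10 neighbors for 5 queries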

Build time (95% recall@10), IVF Flat index: 100M x 96 embeddings, Faiss Classic 101.4 s vs. Faiss cuVS 37.9 s (2.7x); 5M x 1536 embeddings, Faiss Classic 24.4 s vs. Faiss cuVS 15.2 s (1…

1 week ago @ engineering.fb.com
Introducing AutoPatchBench: A Benchmark for AI-Powered Security Fixes
Introducing AutoPatchBench: A Benchmark for AI-Powered Security Fixes Introducing AutoPatchBench: A Benchmark for AI-Powered Security Fixes

We are introducing AutoPatchBench, a benchmark for the automated repair of vulnerabilities identified through fuzzing.

As illustrated, fixing a fuzzing crash involves: Analyzing the crash stack trace and the target code.

Inside AutoPatchBench: We’re making AutoPatchBench publicly available as part of CyberSecEval 4 to encourage community collaboration in tackling the challenge of automating fuzzing crash repairs.

Then we find the lowest common ancestor (LCA) across all pairs of stack traces offered by the ground-truth patch and the LLM patch.
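As a loose illustration of that comparison (not Meta's implementation), one can treat each crash stack trace as a root-to-crash list of frames and take the deepest frame the two traces share:

def lowest_common_frame(trace_a, trace_b):
    # walk both traces from the outermost frame and keep the last frame they share
    lca = None
    for frame_a, frame_b in zip(trace_a, trace_b):
        if frame_a == frame_b:
            lca = frame_a
        else:
            break
    return lca

print(lowest_common_frame(["main", "parse", "read_chunk"], ["main", "parse", "validate"]))  # parse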

As an experienced Security Engineer at Meta, your task is to address the following security-critical fuzzing crash.

2 weeks, 2 days ago @ engineering.fb.com
Building multimodal AI for Ray-Ban Meta glasses
Building multimodal AI for Ray-Ban Meta glasses Building multimodal AI for Ray-Ban Meta glasses

With our Ray-Ban Meta glasses, multimodal AI helps the glasses see what the wearer is seeing.

This means anyone wearing Ray-Ban Meta glasses can ask them questions about what they’re looking at.

On this episode of the Meta Tech Podcast, meet Shane, a research scientist at Meta who has spent the last seven years focusing on computer vision and multimodal AI for wearables.

Shane sits down with Pascal Hartig to share how his team is building foundational models for the Ray-Ban Meta glasses.

They talk about the unique challenges of AI glasses and pushing the boundaries of AI-driven wearable technology.

2 months, 1 week ago @ engineering.fb.com
Revolutionizing software testing: Introducing LLM-powered bug catchers
Revolutionizing software testing: Introducing LLM-powered bug catchers Revolutionizing software testing: Introducing LLM-powered bug catchers

WHAT IT IS: Meta’s Automated Compliance Hardening (ACH) tool is a system for mutation-guided, LLM-based test generation.

Traditionally, automated test generation techniques sought merely to increase code coverage.

LLM-based test generation and LLM-based mutant generation are not new, but this is the first time they’ve been combined and deployed in large-scale industrial systems.

WHAT’S NEXT: Our novel approach combines LLM-based test generation and mutant generation to help automate complex technical organizational workflows in this space.

READ THE PAPER: Mutation-Guided LLM-based Test Generation at Meta

3 months, 1 week ago @ engineering.fb.com
Meta Andromeda: Supercharging Advantage+ automation with the next-gen personalized ads retrieval engine
Meta Andromeda: Supercharging Advantage+ automation with the next-gen personalized ads retrieval engine Meta Andromeda: Supercharging Advantage+ automation with the next-gen personalized ads retrieval engine

Unlocking advertiser value through industry-leading ML innovation: Meta Andromeda is a personalized ads retrieval engine that leverages the NVIDIA Grace Hopper Superchip to enable cutting-edge ML innovation in the ads retrieval stage, driving efficiency and advertiser performance.

Its deployment across Instagram and Facebook applications has achieved +6% recall improvement to the retrieval system, delivering +8% ads quality improvement on selected segments.

Andromeda is designed to maximize ads performance by utilizing the exponential growth in volume of eligible ads available to the retrieval stage.

The design is optimized for AI hardware, minimizing memory bandwidth bottlenecks and enablin…

5 months, 2 weeks ago @ engineering.fb.com
Sequence learning: A paradigm shift for personalized ads recommendations
Sequence learning: A paradigm shift for personalized ads recommendations Sequence learning: A paradigm shift for personalized ads recommendations

Meta’s ad recommendation engine, powered by deep learning recommendation models (DLRMs), has been instrumental in delivering personalized ads to people.

Learning from sequences: developing new sequence learning architectures to replace traditional DLRM neural network architectures.

A paradigm shift with learning from sequences for recommendation systems: Meta’s new system for ads recommendations uses sequence learning at its core.

Scaling the new sequence learning paradigm: Following the redesign to shift from sparse feature learning to event-based sequence learning, the next focus was scaling across two domains — scaling the sequence learning architecture and scaling event sequences to be long…

5 months, 3 weeks ago @ engineering.fb.com
OCP Summit 2024: The open future of networking hardware for AI
OCP Summit 2024: The open future of networking hardware for AI OCP Summit 2024: The open future of networking hardware for AI

At Open Compute Project Summit (OCP) 2024, we’re sharing details about our next-generation network fabric for our AI training clusters.

We’ve expanded our network hardware portfolio and are contributing two new disaggregated network fabrics and a new NIC to OCP.

At Meta, we believe that open hardware drives innovation.

At Meta, we envision a future of AI hardware systems that are not only scalable, but also open and collaborative.

We encourage anyone who wants to help advance the future of networking hardware for AI to engage with OCP and Meta to help shape the future of AI infrastructure.

7 months ago @ engineering.fb.com
Meta’s open AI hardware vision
Meta’s open AI hardware vision Meta’s open AI hardware vision

At the Open Compute Project (OCP) Global Summit 2024, we’re showcasing our latest open AI hardware designs with the OCP community.

These innovations include a new AI platform, cutting-edge open rack designs, and advanced network fabrics and components.

The open future of AI infraMeta is committed to open source AI.

We must also prioritize open and standardized models so we can leverage collective expertise, make AI more accessible, and work towards minimizing biases in our systems. Just as important, we also need open AI hardware systems.

By addressing AI’s infrastructure needs together, we can unlock the true promise of open AI for everyone.

7 months ago @ engineering.fb.com
How open source AI can improve population estimates, sustainable energy, and the delivery of climate change interventions
How open source AI can improve population estimates, sustainable energy, and the delivery of climate change interventions How open source AI can improve population estimates, sustainable energy, and the delivery of climate change interventions

Why we need better population maps: Accurate estimates of population are taken for granted in many countries.

As the world’s natural resource and energy demands scale, accurate population estimates also offer significant opportunities to improve sustainability efforts.

In addition to total population counts, Meta’s population maps also include demographic breakdowns for groups such as the number of children under five, women of reproductive age, youth, and the elderly.

AI-powered population estimates have been scientifically evaluated to be among the most accurate in the world for mapping population distribution for a variety of geographies and use-cases.

Please visit the Data for Good websit…

7 months, 2 weeks ago @ engineering.fb.com
Simulator-based reinforcement learning for data center cooling optimization
Simulator-based reinforcement learning for data center cooling optimization Simulator-based reinforcement learning for data center cooling optimization

As Meta is revamping its new data center design to optimize for artificial intelligence, the same methodology will be applicable for future data center optimizations as well to improve operational efficiency.

A reinforcement learning approach to data center cooling: Reinforcement learning (RL) is good at modeling control systems as sequential state machines.

There are also various RL approaches reported, such as transforming cooling optimization via deep reinforcement learning and data center cooling using model-predicti…

8 months, 1 week ago @ engineering.fb.com
How PyTorch powers AI training and inference
How PyTorch powers AI training and inference How PyTorch powers AI training and inference

How PyTorch powers AI training and inference: Learn about new PyTorch advancements for LLMs and how PyTorch is enhancing every aspect of the LLM lifecycle.

In this talk from AI Infra @ Scale 2024, software engineers Wanchao Liang and Evan Smothers are joined by Meta research scientist Kimish Patel to discuss our newest features and tools that enable large-scale training, memory efficient fine-tuning, and on-device LLM capabilities.

First, they cover the importance of memory-efficient fine-tuning and a few common architectural and algorithmic techniques to enable fine-tuning on consumer-grade hardware.

Then they discuss the challenges of deploying large models on-device and how …

8 months, 3 weeks ago @ engineering.fb.com
Inside the hardware and co-design of MTIA
Inside the hardware and co-design of MTIA Inside the hardware and co-design of MTIA

In this talk from AI Infra @ Scale 2024, Joel Colburn, a software engineer at Meta, technical lead Junqiang Lan, and software engineer Jack Montgomery discuss the second generation of MTIA, Meta’s in-house training and inference accelerator.

They cover the co-design process behind building the second generation of Meta’s first-ever custom silicon for AI workloads, including the PyTorch software ecosystem, and the model architectures for Meta’s key applications.

They demonstrate how MTIA achieves the performance, efficiency, and developer experience to successfully launch models into production.

They also highlight several co-design examples where special silicon features are utilized to acc…

8 months, 3 weeks ago @ engineering.fb.com
Bringing Llama 3 to life
Bringing Llama 3 to life Bringing Llama 3 to life

At AI Infra @ Scale 2024, Meta engineers discussed every step of how we built and brought Llama 3 to life, from data and training to inference.

Joe Spisak, Product Director and Head of Generative AI Open Source at Meta, talks about the history of Llama and Meta’s overarching vision for open source AI.

He’s joined by Delia David, a software engineer at Meta, to discuss all things data-related for GenAI.

Kaushik Veeraraghavan, a software engineer at Meta, discusses how Meta trains Llama at scale and delves into the data center, networking, and software investments that have enabled the development of Meta’s Llama 3 models.

Finally, Ye (Charlotte) Qia, a production engineer at Meta, discusses …

8 months, 3 weeks ago @ engineering.fb.com
Aparna Ramani discusses the future of AI infrastructure
Aparna Ramani discusses the future of AI infrastructure Aparna Ramani discusses the future of AI infrastructure

Delivering new AI technologies at scale also means rethinking every layer of our infrastructure – from silicon and software systems to our data center designs.

For the second year in a row, Meta’s engineering and infrastructure teams returned for the AI Infra @ Scale conference, where they discussed the challenges of scaling up an infrastructure for AI as well as work being done on our large-scale GPU clusters, open hardware designs for next-generation data center hardware, and how Meta is building custom silicon like the Meta Training and Inference Accelerator (MTIA) to handle some of our AI training workloads.

Aparna Ramani, VP of Engineering at Meta, responsible for AI infrastructu…

8 months, 4 weeks ago @ engineering.fb.com
How Meta animates AI-generated images at scale
How Meta animates AI-generated images at scale How Meta animates AI-generated images at scale

Meta AI’s animate feature, which lets people generate a short animation of a generated image, carried unique challenges in this regard.

Here’s how we were able to deploy Meta AI’s animate feature using a combination of latency optimizations, traffic management, and other novel techniques.

We started by looking at the data for previous traffic on our AI-generated media both at their launches and over time.

With these changes, the preponderance of requests remained in region and latency dropped to roughly what we would expect.

The service tries to take a chunk of that region’s requests and offload them to a nearby region that can handle them without becoming more overloaded.

9 months ago @ engineering.fb.com
Uber Engineering
latest post None
neptune.ai
latest post 18 hours ago
Evaluating RAG Pipelines
Evaluating RAG Pipelines Evaluating RAG Pipelines

Dimensions of RAG evaluation: Evaluating a RAG pipeline means assessing its behavior across three dimensions.

The evaluation of the RAG pipeline is a multi-step process, starting with creating an evaluation dataset, then evaluating the individual components (retriever, generator, etc.).

Curating an evaluation dataset: The first step in the RAG evaluation process is the creation of a ground truth dataset.

MAP considers both the presence and rank of relevant chunks but fails to consider the relative position of relevant chunks.
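To make that concrete, here is a minimal sketch of average precision for a single query over retrieved chunks (MAP is the mean of this score across the evaluation set); it is illustrative, not the article's implementation.

def average_precision(retrieved_ids, relevant_ids):
    hits, score = 0, 0.0
    for rank, chunk_id in enumerate(retrieved_ids, start=1):
        if chunk_id in relevant_ids:
            hits += 1
            score += hits / rank        # precision at each relevant hit
    return score / max(len(relevant_ids), 1)

print(average_precision(["c3", "c7", "c1"], {"c1", "c3"}))  # 0.83: both found, one at a lower rank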

However, not all retrieved chunks are equally relevant and sometimes, the most relevant chunks might not b…

18 hours ago @ neptune.ai
How to Build an LLM Agent With AutoGen: Step-by-Step Guide
How to Build an LLM Agent With AutoGen: Step-by-Step Guide How to Build an LLM Agent With AutoGen: Step-by-Step Guide

The efficiency of an LLM agent depends on the selection of the right LLM.

In this article, we’ll introduce the fundamental building blocks of LLM agents and then walk through the process of building an LLM agent step by step.

Building an LLM agent from scratch: In the following, we’ll build a trip-planning LLM agent from scratch.

Using AutoGen’s OpenAI Assistant Agent, we instantiate a prompt that the LLM agent will follow throughout its interactions.
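For readers who want a feel for the API, here is a minimal sketch using pyautogen's basic AssistantAgent/UserProxyAgent pair (a simpler pattern than the OpenAI Assistant Agent used in the article); the model name and API key are placeholders.

import autogen

config_list = [{"model": "gpt-4o-mini", "api_key": "YOUR_OPENAI_API_KEY"}]  # placeholder credentials

planner = autogen.AssistantAgent(
    name="trip_planner",
    system_message="You plan short city trips and reply with a day-by-day itinerary.",
    llm_config={"config_list": config_list},
)
user = autogen.UserProxyAgent(name="user", human_input_mode="NEVER", code_execution_config=False)

user.initiate_chat(planner, message="Plan a 2-day trip to Lisbon.")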

Enhancing LLM agent performance: While architecting an LLM agent, you have to keep in mind opportunities to improve its performance.

1 month, 3 weeks ago @ neptune.ai
Bayesian Deep Learning is Needed in the Age of Large-Scale AI [Paper Reflection]
Bayesian Deep Learning is Needed in the Age of Large-Scale AI [Paper Reflection] Bayesian Deep Learning is Needed in the Age of Large-Scale AI [Paper Reflection]

Moreover, I will make the case for why Bayesian deep learning can satisfy these desiderata and briefly review recent advances in the field.

The case for Bayesian deep learning: Bayesian deep learning uses the foundational statistical principles of Bayesian inference to endow deep learning systems with the ability to make probabilistic predictions.

However, Bayesian deep learning is unfortunately still not as easy to use as standard deep learning, which you can do these days in a few lines of PyTorch code.
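As one crude point of comparison, a common low-effort approximation is Monte Carlo dropout, which turns a standard PyTorch model into a rough predictive distribution; this sketch is illustrative and is not the full Bayesian treatment argued for in the paper.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Dropout(p=0.2), nn.Linear(64, 1))
x = torch.randn(8, 4)

model.train()  # keep dropout active at prediction time
samples = torch.stack([model(x) for _ in range(50)])  # 50 stochastic forward passes
mean, std = samples.mean(dim=0), samples.std(dim=0)   # predictive mean and a rough uncertainty estimate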

If you want to use a Bayesian deep learning model, first, you have to think about specifying the prior.

If this is the case, trying out Bayesian deep learning is likely worth your while.

2 months ago @ neptune.ai
Introduction to State Space Models as Natural Language Models
Introduction to State Space Models as Natural Language Models Introduction to State Space Models as Natural Language Models

TL;DR State Space Models (SSMs) use first-order differential equations to represent dynamic systems.
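In discrete time, that amounts to a simple linear recurrence over a hidden state; the toy sketch below (with arbitrary matrices, not a trained SSM layer) shows the update x_k = A x_{k-1} + B u_k and readout y_k = C x_k.

import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # state transition
B = np.array([[1.0], [0.5]])             # input projection
C = np.array([[1.0, -1.0]])              # output projection

u = np.random.randn(10, 1)               # input sequence of length 10
x = np.zeros((2, 1))                     # hidden state
outputs = []
for u_k in u:
    x = A @ x + B @ u_k.reshape(1, 1)
    outputs.append((C @ x).item())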

Understanding state space models: Before exploring how State Space Models (SSMs) can function as components of large language models (LLMs), we’ll examine their foundational mechanics.

State space models for natural language processing: State Space Models (SSMs), long established in time series analysis, have been utilized as trainable sequence models for decades.

Linear state space layers (LSSLs): So far, we’ve seen that State Space Models are efficient sequence models.

Improvements on the state matrix A: In the previous section, we explored how the original LSSL relied on a fixed, predefined form …

2 months, 1 week ago @ neptune.ai
Ethical Considerations and Best Practices in LLM Development
Ethical Considerations and Best Practices in LLM Development Ethical Considerations and Best Practices in LLM Development

To keep data secure throughout the model’s lifecycle, implement these practices: data anonymization, secure model serving and privacy penetration tests.

For example, a recruitment LLM favoring male applicants due to biased training data reflects a harmful bias that requires correction.

Monitor bias continuously: Mitigating bias isn’t a one-time effort—it requires ongoing monitoring to ensure that your LLM remains fair and effective across iterations.

Although these contributions are publicly available, the move opened up debates about the ethics of reusing community-contributed content for proprietary AI training.

Best practices for ethical LLM development: Navigating the regulatory landscape r…

2 months, 2 weeks ago @ neptune.ai
Open LLMs are Necessary For Current Private Adaptations and Outperform Their Closed Alternatives [Paper Reflection]
Open LLMs are Necessary For Current Private Adaptations and Outperform Their Closed Alternatives [Paper Reflection] Open LLMs are Necessary For Current Private Adaptations and Outperform Their Closed Alternatives [Paper Reflection]

While much of the discussion around LLMs centers on task and computational performance, in our paper Open LLMs are Necessary for Current Private Adaptations and Outperform their Closed Alternatives, we focus on the privacy implications of using Open and Closed LLMs.

The threat space in adapting LLMs to private data: The adaptation of Closed LLMs to private datasets introduces a multifaceted threat space.

Private adaptation methods for Open LLMs: Unlike Closed LLMs, Open LLMs provide access to their parameters, enabling more flexible and parameter-centric private adaptation methods.

Performance: All adaptation methods for Closed LLMs ach…

2 months, 3 weeks ago @ neptune.ai
Learnings From Teams Training Large-Scale Models: Challenges and Solutions For Monitoring at Hyperscale
Learnings From Teams Training Large-Scale Models: Challenges and Solutions For Monitoring at Hyperscale Learnings From Teams Training Large-Scale Models: Challenges and Solutions For Monitoring at Hyperscale

“What is not measured, cannot be improved.” This quote has become a guiding principle for teams training foundation models.

During my talk at NeurIPS, I broke down five key lessons learned from teams facing large-scale model training and monitoring.

Waabi’s teams, running large-scale ML experiments, needed a way to organize and share their experiment data efficiently.

Visualizing large datasets: We generally do not think of dataset visualization as part of experiment monitoring.

Moving forward: The path to efficient hyperscale training lies in combining robust monitoring, advanced debugging tools, and comprehensive experiment tracking.

3 months ago @ neptune.ai
Mixture of Experts LLMs: Key Concepts Explained
Mixture of Experts LLMs: Key Concepts Explained Mixture of Experts LLMs: Key Concepts Explained

TL;DR Mixture of Experts (MoE) is a type of neural network architecture that employs sub-networks (experts) to process specific input parts.

This is the key idea behind Mixture of Experts LLMs.

The Switch Transformer, Mixtral, GLaM, GShard, and DeepSeekMoE are Mixture of Experts LLMs (MoEs), which require only executing a portion of the model’s computational graph during inference.

Optimization strategies for MoE LLMs are discussed comprehensively in the papers introducing the Switch Transformer, GShard, and GLaM.

Mixture of Experts (MoE) is an approach to scaling LLMs to trillions of parameters with conditional computation while avoiding exploding computational costs.
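The routing idea can be sketched in a few lines of PyTorch: a small router scores the experts per token, and only the top-k experts run; this is an illustrative toy, not the architecture of any specific MoE LLM.

import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, n_experts, k = 16, 4, 2
experts = nn.ModuleList(
    nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
    for _ in range(n_experts)
)
router = nn.Linear(d_model, n_experts)

x = torch.randn(3, d_model)                    # 3 tokens
gates = F.softmax(router(x), dim=-1)           # routing probabilities per token
topk_vals, topk_idx = gates.topk(k, dim=-1)    # keep only k experts per token

out = torch.zeros_like(x)
for t in range(x.size(0)):
    for weight, idx in zip(topk_vals[t], topk_idx[t]):
        out[t] += weight * experts[idx](x[t])  # only the selected experts are executed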

3 months, 1 week ago @ neptune.ai
Hyperparameter Optimization For LLMs: Advanced Strategies
Hyperparameter Optimization For LLMs: Advanced Strategies Hyperparameter Optimization For LLMs: Advanced Strategies

Advanced hyperparameter optimization strategies, like population-based training, Bayesian optimization, and adaptive LoRA, promise to balance computational effort and outcome.

To avoid this, learning rate schedules for LLMs start with a small learning rate and slowly ramp it up to its maximum value.
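A minimal sketch of such a schedule (linear warmup followed by cosine decay, via PyTorch's LambdaLR) is shown below; the step counts and base learning rate are illustrative.

import math
import torch

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=3e-4)
warmup_steps, total_steps = 100, 1000

def lr_scale(step):
    if step < warmup_steps:
        return step / warmup_steps                        # ramp up from ~0 to the peak learning rate
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))     # cosine decay back down

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_scale)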

Can we use traditional machine learning hyperparameter optimization methods for LLMs?

Hands-on: LLM hyperparameter optimization with neptune.ai. Optuna is a framework for optimizing hyperparameter search using Bayesian optimization.
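A minimal Optuna sketch looks like the following; the two hyperparameters and the synthetic objective are stand-ins for a real fine-tuning run (and for the Neptune-integrated setup in the article).

import optuna

def objective(trial):
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    lora_rank = trial.suggest_int("lora_rank", 4, 64)
    # a real objective would fine-tune the model and return a validation metric
    return (lr - 2e-4) ** 2 + 0.001 * lora_rank

study = optuna.create_study(direction="minimize")  # TPE (Bayesian-style) sampler by default
study.optimize(objective, n_trials=20)
print(study.best_params)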

What’s next in LLM hyperpar…

3 months, 2 weeks ago @ neptune.ai
Multimodal Large Language Models
Multimodal Large Language Models Multimodal Large Language Models

TL;DR Multimodal Large Language Models (MLLMs) process data from different modalities like text, audio, image, and video.

This article explores Multimodal Large Language Models: their core functionalities, challenges, and potential for various machine-learning domains.

Let’s break down the concept of Multimodal Large Language Models (MLLMs) by first understanding the terms “modal” and “multimodal”: “Modal” refers to a particular way of communicating or perceiving information.

Google: PaLM-E. Google developed an embodied language model, PaLM-E, to incorporate continuous sensor modalities into language models and establish the link between words and perceptions.

Improving how t…

3 months, 3 weeks ago @ neptune.ai
How to Build and Evaluate a RAG System Using LangChain, Ragas, and neptune.ai
How to Build and Evaluate a RAG System Using LangChain, Ragas, and neptune.ai How to Build and Evaluate a RAG System Using LangChain, Ragas, and neptune.ai

In this guide, we’ll show you how to build a RAG system using the LangChain framework, evaluate its performance using Ragas, and track your experiments with neptune.ai.

Part 1: Building a baseline RAG system with LangChain. In the first part of this guide, we’ll use LangChain to build a RAG system for the blog posts in the LLMOps category on Neptune’s blog.

Ragas works smoothly with LangChain, making it a great choice for evaluating our RAG system.

Step 1: Generate a RAG evaluation dataset. An evaluation set for RAG tasks is similar to a question-answering task dataset.

Step 2: Choose RAG evaluation metrics. As mentioned earlier, Ragas offers both LLM-based and non-LLM-based metrics for RAG syste…
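As a rough sketch of how such metrics are scored, Ragas evaluates records with question, answer, and contexts fields; the example below uses placeholder data and assumes an LLM judge is configured (exact imports can vary between Ragas versions).

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

data = Dataset.from_dict({
    "question": ["What does Neptune track?"],
    "answer": ["Neptune tracks ML experiments and their metadata."],
    "contexts": [["Neptune is an experiment tracker for ML teams."]],
})
print(evaluate(data, metrics=[faithfulness, answer_relevancy]))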

4 months, 2 weeks ago @ neptune.ai
Position: Understanding LLMs Requires More Than Statistical Generalization [Paper Reflection]
Position: Understanding LLMs Requires More Than Statistical Generalization [Paper Reflection] Position: Understanding LLMs Requires More Than Statistical Generalization [Paper Reflection]

In our paper, Understanding LLMs Requires More Than Statistical Generalization, we argue that current machine learning theory cannot explain the interesting emergent properties of Large Language Models, such as reasoning or in-context learning.

Inductive biases, such as the model architecture or the optimization algorithm, affect which solution the neural network converges to.

How do language complexity and model architecture affect generalization ability?

showed how different neural network architectures generalize better for different language types.

Presumably, we’ll need to find different complexity measures for different model architectures that consider their specific inductive biases.

4 months, 3 weeks ago @ neptune.ai
From Research to Production: Building The Most Scalable Experiment Tracker For Foundation Models
From Research to Production: Building The Most Scalable Experiment Tracker For Foundation Models From Research to Production: Building The Most Scalable Experiment Tracker For Foundation Models

TL;DR In large-scale model training (of huge models), anomalies are not rare events but problematic patterns that drive failure.

The Neptune Scale experiment tracker supports fault tolerance and is designed to maintain progress despite hardware failures, making it adaptable for enterprise teams tackling LLM fine-tuning, compliance, and building domain-specific models.

Experiment tracking back then was straightforward—dealing mostly with single models or small-scale distributed systems.

One of the biggest lessons we’ve learned is that experiment tracking has evolved into experiment monitoring.

That’s why we’re focusing on building intelligent alerts and anomaly detection right into our exp…

5 months ago @ neptune.ai
Transformers Key-Value Caching Explained
Transformers Key-Value Caching Explained Transformers Key-Value Caching Explained

Key-value (KV) caching is a clever trick to do that: At inference time, key and value matrices are calculated for each generated token.
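A minimal single-head sketch of the idea: at each decoding step, only the newest token's key and value are computed and appended, while attention is taken over the full cache (toy dimensions, no real model).

import torch

d = 8
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
cache_k, cache_v = [], []

def decode_step(token_emb):
    q = token_emb @ Wq
    cache_k.append(token_emb @ Wk)             # compute K, V only for the new token
    cache_v.append(token_emb @ Wv)
    K, V = torch.stack(cache_k), torch.stack(cache_v)
    attn = torch.softmax(q @ K.T / d ** 0.5, dim=-1)
    return attn @ V

for _ in range(5):
    out = decode_step(torch.randn(d))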

Implementing K-V caching in large-scale production systems requires careful cache management, including choosing an appropriate strategy for cache invalidation and exploring opportunities for cache reuse.

Key-value (KV) caching is a clever trick to do just that – let’s see how it works and when to use it.

Transformer architecture overview: Before we dive into KV caching, we will need to take a short detour to the attention mechanism used in transformers.

Understanding how it works is required to spot and appreciate how KV caching optimizes transformer inferen…

5 months, 1 week ago @ neptune.ai
Learn From Failure: Fine-Tuning LLMs With Trial-and-Error Data For Intuitionistic Propositional Logic Proving [Paper Reflection]
Learn From Failure: Fine-Tuning LLMs With Trial-and-Error Data For Intuitionistic Propositional Logic Proving [Paper Reflection] Learn From Failure: Fine-Tuning LLMs With Trial-and-Error Data For Intuitionistic Propositional Logic Proving [Paper Reflection]

In our paper, Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving, we explored this problem experimentally.

Our goal was to assess the influence of trial-and-error information in the training data on the performance of LLMs in theorem proving.

However, at the time we published our paper, existing approaches to training LLMs for ATPs utilized only data on correct proof attempts.

We hope our work can raise the community’s awareness of the importance of trial-and-error data for automated theorem proving.

We believe this advancement is largely due to the substantial trial-and-error data included in the model’s training process.

5 months, 2 weeks ago @ neptune.ai
▶️ YouTube
Yannic Kilcher
latest post 1 week, 5 days ago
On the Biology of a Large Language Model (Part 2)
On the Biology of a Large Language Model (Part 2) On the Biology of a Large Language Model (Part 2)

An in-depth look at Anthropic's Transformer Circuit Blog Post

Part 1 here: https://youtu.be/mU3g2YPKlsA

Discord here: https://ykilcher.com/discord https://transformer-circuits.pub/2025/attribution-graphs/biology.html Abstract:

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology. Authors:

Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*,

David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton,

Trenton Bricken, Callum McDougall◊, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy …

1 week, 5 days ago @ youtube.com
On the Biology of a Large Language Model (Part 1)
On the Biology of a Large Language Model (Part 1) On the Biology of a Large Language Model (Part 1)

An in-depth look at Anthropic's Transformer Circuit Blog Post https://transformer-circuits.pub/2025/attribution-graphs/biology.html Abstract:

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology. Authors:

Jack Lindsey†, Wes Gurnee*, Emmanuel Ameisen*, Brian Chen*, Adam Pearce*, Nicholas L. Turner*, Craig Citro*,

David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton,

Trenton Bricken, Callum McDougall◊, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson,

Sam Zimmerman, Kelley Rivoire, Thom…

1 month, 1 week ago @ youtube.com
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Paper Explained)
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Paper Explained) DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Paper Explained)

#deepseek #llm #reinforcementlearning GRPO is one of the core advancements used in DeepSeek-R1, but it was already introduced last year in this paper, which uses a combination of new RL techniques and iterative data collection to achieve remarkable performance on mathematics benchmarks with just a 7B model. Paper: https://arxiv.org/abs/2402.03300 Abstract:

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achie…
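The group-relative advantage at the heart of GRPO can be sketched in a couple of lines: rewards for a group of sampled completions to the same prompt are normalized by the group's mean and standard deviation (toy numbers below, a simplified outcome-supervision view).

import numpy as np

rewards = np.array([0.0, 1.0, 1.0, 0.0, 1.0])                     # one prompt's group of sampled answers
advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # group-relative advantage
print(advantages)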

3 months, 2 weeks ago @ youtube.com
Traditional Holiday Live Stream
Traditional Holiday Live Stream Traditional Holiday Live Stream

https://ykilcher.com/discord Links:

TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

YouTube: https://www.youtube.com/c/yannickilcher

Twitter: https://twitter.com/ykilcher

Discord: https://discord.gg/4H8xxDF

BitChute: https://www.bitchute.com/channel/yannic-kilcher

Minds: https://www.minds.com/ykilcher

Parler: https://parler.com/profile/YannicKilcher

LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/

BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

SubscribeStar: https:/…

4 months, 2 weeks ago @ youtube.com
Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained)
Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained) Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained)

#tokenization #llm #meta This paper does away with tokenization and creates an LLM architecture that operates on dynamically sized "patches" instead of tokens. By controlling the patch size, they gain a level of control over the tradeoff between model size and FLOPs and use that to achieve more favorable scaling behavior than classically tokenized LLMs. Paper: https://ai.meta.com/research/publications/byte-latent-transformer-patches-scale-better-than-tokens/

Code: https://github.com/facebookresearch/blt Abstract:

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant imp…
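As a rough illustration of the dynamic patching idea (not the paper's implementation), the sketch below splits a byte stream into patches wherever the next-byte entropy from a small byte-level LM spikes; the entropies and the threshold here are made-up numbers.

def entropy_patches(entropies, threshold=2.0):
    """Split a byte sequence into patches, starting a new patch whenever the
    next-byte entropy (from a small byte LM) exceeds the threshold."""
    patches, current = [], []
    for i, h in enumerate(entropies):
        if current and h > threshold:   # hard-to-predict byte -> patch boundary
            patches.append(current)
            current = []
        current.append(i)
    if current:
        patches.append(current)
    return patches

# Low-entropy runs form long patches; entropy spikes start new ones.
print(entropy_patches([0.1, 0.2, 3.1, 0.3, 0.2, 2.8, 0.1]))  # [[0, 1], [2, 3, 4], [5, 6]]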

4 months, 3 weeks назад @ youtube.com
Safety Alignment Should be Made More Than Just a Few Tokens Deep (Paper Explained)
Safety Alignment Should be Made More Than Just a Few Tokens Deep (Paper Explained) Safety Alignment Should be Made More Than Just a Few Tokens Deep (Paper Explained)

This paper demonstrates in a series of experiments that current LLM safety alignment techniques, as well as the corresponding jailbreak attacks, largely focus on modulating the distribution of the first few tokens of the LLM response. Paper: https://openreview.net/forum?id=6Mxhg9PtDE&s=09 Abstract:

The safety alignment of current Large Language Models (LLMs) is vulnerable. Simple attacks, or even benign fine-tuning, can jailbreak aligned models. We note that many of these vulnerabilities are related to a shared underlying issue: safety alignment can take shortcuts, wherein the alignment adapts a model's generative distribution primarily over only its very first few output to…
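A simple way to see the "few tokens deep" diagnostic is to compare the aligned and base models' next-token distributions position by position; the sketch below uses toy probabilities (illustrative numbers, not from the paper) and shows the KL gap concentrated at the first response positions.

import numpy as np

def per_position_kl(p_aligned, p_base, eps=1e-12):
    """KL(aligned || base) at each response position; the paper's observation is
    that this gap is concentrated in the first few positions."""
    p, q = np.asarray(p_aligned) + eps, np.asarray(p_base) + eps
    return np.sum(p * np.log(p / q), axis=-1)

# Toy distributions over a 3-token vocabulary for 4 response positions (illustrative numbers).
aligned = np.array([[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.4, 0.3, 0.3], [0.34, 0.33, 0.33]])
base    = np.array([[0.2, 0.4, 0.4], [0.3, 0.35, 0.35], [0.38, 0.31, 0.31], [0.33, 0.34, 0.33]])
print(per_position_kl(aligned, base))  # large at positions 0-1, near zero later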

5 months назад @ youtube.com
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper Explained)
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper Explained) TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper Explained)

A deep dive into the TokenFormer and an opinion about its impact, novelty, and relation to prior work. Paper: https://arxiv.org/abs/2410.23168 Abstract:

Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their dependence on a fixed number of parameters within linear projections. When architectural modifications (e.g., channel dimensions) are introduced, the entire model typically requires retraining from scratch. As model sizes continue growing, this strategy results in increasingly high com…
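The core mechanism can be sketched as attention over learnable parameter tokens in place of a fixed weight matrix; the snippet below is a simplified illustration that uses a standard softmax (the paper uses a modified normalization), and all sizes are arbitrary.

import numpy as np

def pattention(x, key_params, value_params):
    """Token-parameter attention: the input attends over learnable parameter
    tokens instead of being multiplied by a fixed weight matrix, so capacity
    can grow by appending parameter tokens rather than retraining."""
    scores = x @ key_params.T / np.sqrt(x.shape[-1])   # (tokens, n_param_tokens)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ value_params                       # (tokens, d_out)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))        # 4 input tokens, d_model = 16
K = rng.normal(size=(32, 16))       # 32 parameter tokens (keys)
V = rng.normal(size=(32, 16))       # 32 parameter tokens (values)
print(pattention(x, K, V).shape)    # (4, 16); append rows to K and V to scale up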

5 months, 3 weeks назад @ youtube.com
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

This paper (by Apple) questions the mathematical reasoning abilities of current LLMs and designs a synthetic template-based dataset distribution to investigate various aspects around LLM performance of high-school level math questions. Paper: https://arxiv.org/abs/2410.05229 Abstract:

Recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics. The GSM8K benchmark is widely used to assess the mathematical reasoning of models on grade-school-level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities hav…
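The template idea can be illustrated with a toy generator (a made-up template, not one from the benchmark): surface details such as names and numbers are resampled while the ground-truth answer is recomputed, so performance can be measured across many variants of the "same" question.

import random

TEMPLATE = ("{name} has {a} apples and buys {b} more. "
            "How many apples does {name} have now?")

def make_variant(rng):
    """One symbolic variant: resample surface details and recompute the answer."""
    name = rng.choice(["Sofia", "Liam", "Ravi", "Mei"])
    a, b = rng.randint(2, 20), rng.randint(2, 20)
    return TEMPLATE.format(name=name, a=a, b=b), a + b

rng = random.Random(0)
for _ in range(3):
    question, answer = make_variant(rng)
    print(question, "->", answer)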

6 months, 4 weeks назад @ youtube.com
Were RNNs All We Needed? (Paper Explained)
Were RNNs All We Needed? (Paper Explained) Were RNNs All We Needed? (Paper Explained)

This paper poses an interesting question: how much of the performance of Mamba, S4, and other state-space-like models is attributable to a few core concepts rather than to their elaborate architectures? The authors construct minimal versions of GRUs and LSTMs and report competitive performance. Paper: https://arxiv.org/abs/2410.01201 Abstract:

The scalability limitations of Transformers regarding sequence length have renewed interest in recurrent sequence models that are parallelizable during training. As a result, many novel recurrent architectures, such as S4, Mamba, and Aaren, have been proposed that achieve comparable performance. In this work, we revisit traditional …
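A minimal sketch of the kind of stripped-down recurrence the paper studies (a minGRU-style cell, unrolled sequentially here even though the input-only gates are what make a parallel scan possible); weights and sizes below are arbitrary.

import numpy as np

def min_gru(x, Wz, Wh, h0):
    """minGRU-style recurrence: gate z_t and candidate h~_t depend only on x_t
    (not on h_{t-1}), so the linear recurrence can also be computed with a
    parallel scan; it is unrolled step by step here for clarity."""
    h, hs = h0, []
    for x_t in x:
        z_t = 1.0 / (1.0 + np.exp(-(Wz @ x_t)))   # update gate from the input only
        h_tilde = Wh @ x_t                         # candidate state from the input only
        h = (1.0 - z_t) * h + z_t * h_tilde
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))   # sequence of 5 inputs, input dim 3, hidden dim 4
print(min_gru(x, rng.normal(size=(4, 3)), rng.normal(size=(4, 3)), np.zeros(4)).shape)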

7 months назад @ youtube.com
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper) Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)

How can one best use extra FLOPS at test time? Paper: https://arxiv.org/abs/2408.03314 Abstract:

Enabling LLMs to improve their outputs by using more test-time computation is a critical step towards building generally self-improving agents that can operate on open-ended natural language. In this paper, we study the scaling of inference-time computation in LLMs, with a focus on answering the question: if an LLM is allowed to use a fixed but non-trivial amount of inference-time compute, how much can it improve its performance on a challenging prompt? Answering this question has implications not only on the achievable performance of LLMs, but also on the future of LLM pretraining and how one s…
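One of the simplest strategies in this space is best-of-N sampling against a verifier; the sketch below uses hypothetical sample_fn and score_fn placeholders for an LLM sampler and a learned verifier, and ignores the paper's more adaptive, difficulty-dependent compute allocation.

import random

def best_of_n(prompt, sample_fn, score_fn, n=8):
    """Spend extra inference compute by drawing n candidate answers and keeping
    the one the verifier scores highest (the simplest test-time scaling knob)."""
    candidates = [sample_fn(prompt) for _ in range(n)]
    return max(candidates, key=score_fn)

# sample_fn and score_fn here are toy stand-ins for an LLM sampler and a reward model.
rng = random.Random(0)
answers = ["42", "41", "43"]
print(best_of_n("What is 2 * 21?",
                lambda p: rng.choice(answers),
                lambda a: 1.0 if a == "42" else 0.0))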

7 months, 1 week назад @ youtube.com
Privacy Backdoors: Stealing Data with Corrupted Pretrained Models (Paper Explained)
Privacy Backdoors: Stealing Data with Corrupted Pretrained Models (Paper Explained) Privacy Backdoors: Stealing Data with Corrupted Pretrained Models (Paper Explained)

#llm #privacy #finetuning Can you tamper with a base model in such a way that it will exactly remember its fine-tuning data? This paper presents a method of doing exactly that, and implements it in modern transformers. OUTLINE:

0:00 - Intro & Overview

10:50 - Core idea: single-use data traps

44:30 - Backdoors in transformer models

58:00 - Additional numerical tricks

1:00:35 - Experimental results & conclusion Paper: https://arxiv.org/abs/2404.00473

Code: https://github.com/ShanglunFengatETHZ/PrivacyBackdoor Abstract:

Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications. We show that this practice introduces a…
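The intuition behind a "data trap" can be shown with a toy single-unit example (a simplification, not the paper's full transformer construction): if a planted ReLU unit fires on exactly one fine-tuning example, the elementwise ratio of its weight and bias updates reveals that example.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)        # the fine-tuning example we want to "trap"
w = x.copy()                  # trap weights aligned with the target so the unit fires on it
b, lr = 0.5, 0.1

a = w @ x + b                 # pre-activation of the ReLU trap unit
assert a > 0                  # the unit is active, so gradient flows through it
grad_a = 1.0                  # stand-in for dLoss/d(pre-activation) on this example

# One SGD step of fine-tuning: dLoss/dw = grad_a * x, dLoss/db = grad_a.
w_new = w - lr * grad_a * x
b_new = b - lr * grad_a

# The attacker who planted the trap compares the weights before and after fine-tuning.
recovered = (w - w_new) / (b - b_new)   # elementwise ratio of the updates
print(np.allclose(recovered, x))        # True: the training example is reconstructed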

9 months, 2 weeks назад @ youtube.com
3blue1brown 3blue1brown
последний пост 1 week, 3 days назад
Summer of Math Exposition #4 | Teachers, I'd love to hear from you
Summer of Math Exposition #4 | Teachers, I'd love to hear from you Summer of Math Exposition #4 | Teachers, I'd love to hear from you

Make a math explainer, get feedback, and receive prizes: https://some.3b1b.co

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos. ------------------ These animations are largely made using a custom Python library, manim. See the FAQ comments here:

https://3b1b.co/faq#manim

https://github.com/3b1b/manim

https://github.com/ManimCommunity/manim/ All code for specific videos is visible here:

https://github.com/3b1b/videos/ The music is by Vincent Rubinetti.

https://www.vincentrubinetti.com

https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown

https://open.spotify.com/…

1 week, 3 days назад @ youtube.com
Where my explanation of Grover’s algorithm failed
Where my explanation of Grover’s algorithm failed Where my explanation of Grover’s algorithm failed

Addressing viewer questions from the last video.

These lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to share the videos. ------------------ These animations are largely made using a custom Python library, manim. See the FAQ comments here:

https://3b1b.co/faq#manim

https://github.com/3b1b/manim

https://github.com/ManimCommunity/manim/ All code for specific videos is visible here:

https://github.com/3b1b/videos/ The music is by Vincent Rubinetti.

https://www.vincentrubinetti.com

https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown

https://open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u ------------------ 3blue1brown is a ch…

1 week, 4 days назад @ youtube.com
But what is Quantum Computing? (Grover's Algorithm)
But what is Quantum Computing?  (Grover's Algorithm) But what is Quantum Computing? (Grover's Algorithm)

Qubits, state vectors, and Grover's algorithm for search.

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to share the videos. The subtitles on this video were done using AI, and are likely imperfect, but they are open for community corrections at https://criblate.com/ Adam Brown's paper on the connection between Grover's Algorithm and block collisions:

https://arxiv.org/pdf/1912.02207 If you want to learn the relevant underlying quantum mechanics here, a very friendly resource is the course Mithuna at Looking Glass Universe is currently putting together. See, for instance, this explainer of a qubit:…
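A tiny state-vector simulation (a sketch, not taken from the video) makes the amplitude picture concrete: each Grover iteration flips the sign of the marked amplitude and then reflects all amplitudes about their mean, and after roughly (pi/4)*sqrt(N) iterations the marked item dominates.

import numpy as np

def grover(n_items, marked, iterations):
    """State-vector simulation of Grover search: start in the uniform
    superposition, then repeat (sign-flip the marked item, reflect about the mean)."""
    state = np.full(n_items, 1 / np.sqrt(n_items))
    for _ in range(iterations):
        state[marked] *= -1                  # oracle: flip the marked amplitude
        state = 2 * state.mean() - state     # diffusion: inversion about the mean
    return state

N, target = 16, 7
k = int(round(np.pi / 4 * np.sqrt(N)))       # about (pi/4)*sqrt(N) iterations
probs = grover(N, target, k) ** 2
print(k, probs[target])                      # probability of the marked item is close to 1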

2 weeks, 1 day назад @ youtube.com
Testing your intuition for quantum computing
Testing your intuition for quantum computing Testing your intuition for quantum computing

Full video: https://youtu.be/RQWpF2Gb-gU

2 weeks, 1 day назад @ youtube.com
How to measure nearby galaxies
How to measure nearby galaxies How to measure nearby galaxies

From this video: https://youtu.be/hFMaT9oRbs4

3 weeks, 4 days назад @ youtube.com
Measuring the distance to Venus without radar
Measuring the distance to Venus without radar Measuring the distance to Venus without radar

From this video with Terry Tao: https://youtu.be/hFMaT9oRbs4

1 month, 1 week назад @ youtube.com
Measuring the speed of light using Jupiter's moons
Measuring the speed of light using Jupiter's moons Measuring the speed of light using Jupiter's moons

From this video with Terry Tao: https://youtu.be/hFMaT9oRbs4

1 month, 1 week назад @ youtube.com
The tragic tale of Guillaume Le Gentil
The tragic tale of Guillaume Le Gentil The tragic tale of Guillaume Le Gentil

From this video: https://youtu.be/hFMaT9oRbs4 Artwork by Kurt Bruns

1 month, 2 weeks назад @ youtube.com
Zooming out by powers of 10
Zooming out by powers of 10 Zooming out by powers of 10

From this video: https://youtu.be/YdOXS_9_P4U

1 month, 2 weeks назад @ youtube.com
There's more to those colliding blocks that compute pi
There's more to those colliding blocks that compute pi There's more to those colliding blocks that compute pi

Two colliding blocks compute pi; here we dig into the physics to explain why

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

An equally valuable form of support is to simply share the videos. The original paper by Gregory Galperin:

https://www.maths.tcd.ie/~lebed/Galperin.%20Playing%20pool%20with%20pi.pdf Adam Brown's paper on the analogy with Grover's Algorithm:

https://arxiv.org/pdf/1912.02207 Here's a lovely interactive built by GitHub user prajwalsouza after watching this video: https://prajwalsouza.github.io/Experiments/Colliding-Blocks.html Matt Parker's Pi Day video:

https://youtu.be/vlUTlbZT4ig NY Times blog post about this proble…
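For readers who want to play with the setup, here is a velocity-only simulation sketch (assuming perfectly elastic collisions and a frictionless floor, as in Galperin's idealization): counting both block-block collisions and wall bounces for a mass ratio of 100^d yields the first d+1 digits of pi.

def count_collisions(mass_ratio):
    """Velocity-only simulation of Galperin's colliding blocks: a big block of mass
    mass_ratio slides toward a unit-mass block resting near a wall; count
    block-block collisions and wall bounces until no further collision is possible."""
    m1, m2 = 1.0, float(mass_ratio)   # small block, big block
    v1, v2 = 0.0, -1.0                # big block moves toward the wall
    count = 0
    while True:
        if v2 < v1:                   # big block catches the small one: elastic collision
            v1, v2 = (((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2),
                      ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2))
        elif v1 < 0:                  # small block reaches the wall: bounce back
            v1 = -v1
        else:                         # both drift away from the wall, never to meet again
            break
        count += 1
    return count

print(count_collisions(100))     # 31
print(count_collisions(10000))   # 314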

2 months назад @ youtube.com
When being beautifully wrong leads to discovery
When being beautifully wrong leads to discovery When being beautifully wrong leads to discovery

Full video: https://youtu.be/YdOXS_9_P4U

2 months, 2 weeks назад @ youtube.com
Why the ancient Greeks rejected heliocentrism
Why the ancient Greeks rejected heliocentrism Why the ancient Greeks rejected heliocentrism

From this video on the cosmic distance ladder: https://youtu.be/YdOXS_9_P4U

2 months, 2 weeks назад @ youtube.com
How to estimate the distance to the sun
How to estimate the distance to the sun How to estimate the distance to the sun

Full video: https://youtu.be/YdOXS_9_P4U

2 months, 2 weeks назад @ youtube.com
How Aristarchus deduced the distance to the moon
How Aristarchus deduced the distance to the moon How Aristarchus deduced the distance to the moon

Full video: https://youtu.be/YdOXS_9_P4U

2 months, 2 weeks назад @ youtube.com
The cosmic distance ladder with Terence Tao (part 2)
The cosmic distance ladder with Terence Tao (part 2) The cosmic distance ladder with Terence Tao (part 2)

How we know the distances to the planets, stars, and faraway galaxies.

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

FAQ with added details and corrections: https://terrytao.wordpress.com/2025/02/13/cosmic-distance-ladder-video-with-grant-sanderson-3blue1brown-commentary-and-corrections/ An equally valuable form of support is to simply share the videos. Terry and his collaborator Tanya have an Instagram about the cosmic distance ladder: https://www.instagram.com/cosmic_distance_ladder/ Artwork of Guillaume Le Gentil by Kurt Bruns

Artwork of Antonia Maury and Henrietta Leavitt by Talia Gershon: https://bit.ly/taliagershonart

Several of t…

2 months, 3 weeks назад @ youtube.com
Two Minute Papers Two Minute Papers
последний пост 13 часов назад
New AI: Impossible Creatures Come Alive!
New AI: Impossible Creatures Come Alive! New AI: Impossible Creatures Come Alive!

❤️ Check out DeepInfra and run DeepSeek or many other AI projects: https://deepinfra.com/papers 📝 The papers are available here:

https://anytop2025.github.io/Anytop-page/

https://zhongleilz.github.io/Sketch2Anim/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall,…

13 часов назад @ youtube.com
NVIDIA’s New AI: Impossible Video Game Animations!
NVIDIA’s New AI: Impossible Video Game Animations! NVIDIA’s New AI: Impossible Video Game Animations!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 The #NVIDIA paper "GENMO: A GENeralist Model for Human MOtion" is available here:

https://research.nvidia.com/labs/dair/genmo/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 Sources for SLAM:

https://www.youtube.com/watch?v=2GJuEIh4xGo

https://ww…

4 days, 16 hours назад @ youtube.com
3 Ways OpenAI’s ChatGPT Surprised Its Creators!
3 Ways OpenAI’s ChatGPT Surprised Its Creators! 3 Ways OpenAI’s ChatGPT Surprised Its Creators!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video OpenAI post: https://openai.com/index/expanding-on-sycophancy/ Paper on agreeableness: https://arxiv.org/abs/2212.09251 Source: https://x.com/georgejrjrjr/status/1917722125668081863/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to…

1 week назад @ youtube.com
Blender 4.4 Is Here - Still The Best…For Free!
Blender 4.4 Is Here - Still The Best…For Free! Blender 4.4 Is Here - Still The Best…For Free!

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers Get Blender: https://www.blender.org/ Demo files: https://www.blender.org/download/demo-files/

Full donut tutorial: https://www.youtube.com/watch?v=4haAdmHqGOw&pp=ygUWYW5kcmV3IHByaWNlIGRvbnV0IDQuNA%3D%3D Our papers that uses Blender: https://users.cg.tuwien.ac.at/zsolnai/gfx/photorealistic-material-editing/

https://users.cg.tuwien.ac.at/zsolnai/gfx/gaussian-material-synthesis/ Subsurface scattering video source: https://www.youtube.com/shorts/YqxSzGAKiPM

Blue-noise dithered sampling: https://iliyan.com/publications/DitheredSampling Donate to Blender: https://fund.blender.org/ 📝 My paper on simulations th…

2 weeks, 1 day назад @ youtube.com
NVIDIA’s New AI: Impossible Ray Tracing!
NVIDIA’s New AI: Impossible Ray Tracing! NVIDIA’s New AI: Impossible Ray Tracing!

❤️ Check out DeepInfra and run DeepSeek or many other AI projects: https://deepinfra.com/papers 📝 The #nvidia paper "3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting" is available here:

https://research.nvidia.com/labs/toronto-ai/3DGUT/ 📝 Our Separable Subsurface Scattering paper: https://users.cg.tuwien.ac.at/zsolnai/gfx/separable-subsurface-scattering-with-activision-blizzard/

📝 SSS For Gaussian Splatting: https://sss.jdihlmann.com/ Sources:

https://x.com/jonstephens85/status/1911923445660893321?s=46

https://x.com/jonstephens85/status/1908730004973986175?s=46

https://www.youtube.com/watch?v=KPFPuXm2u7I

https://www.youtube.com/shorts/si1-zFZpeDE 📝 My paper on simu…

2 weeks, 3 days назад @ youtube.com
OpenAI’s ChatGPT o3 - Pushing Humanity Forward!
OpenAI’s ChatGPT o3 - Pushing Humanity Forward! OpenAI’s ChatGPT o3 - Pushing Humanity Forward!

❤️ Check out Vast.ai and run DeepSeek or any AI project: https://vast.ai/papers 📝 OpenAI's o3 is available here:

https://openai.com/index/introducing-o3-and-o4-mini/ 📝 The paper "Humanity's Last Exam" is available here:

https://agi.safe.ai/ ChatGPT trick:

Go to your profile - personalization - customize ChatGPT, and add:

“Look for peer-reviewed sources for every answer. Rate the credibility of that source on a scale of 0 to 10 for every source. And of course - use more emoji. ” 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Sources:

https://x.com/DeryaTR_/status/1914133246465487026/photo/1

https://x.com/nicdunz/status/19130435093486…

3 weeks назад @ youtube.com
NVIDIA’s Tech: Brutal 2,500,000 Part Simulation!
NVIDIA’s Tech: Brutal 2,500,000 Part Simulation! NVIDIA’s Tech: Brutal 2,500,000 Part Simulation!

❤️ Check out Vast.ai and run DeepSeek or any AI project: https://vast.ai/papers 📝 The papers are available here:

https://www.dgp.toronto.edu/projects/trading-spaces/

https://pcs-sim.github.io/pd/

https://visualcomputing.ist.ac.at/publications/2024/SDTF/

https://starryuniv.cn/files/sig24magnetic.pdf

https://github.com/Univstar/IoB-Ferrofluid-2D 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ah…

3 weeks, 4 days назад @ youtube.com
OpenAI’s GPT 4.1 - Absolutely Amazing!
OpenAI’s GPT 4.1 - Absolutely Amazing! OpenAI’s GPT 4.1 - Absolutely Amazing!

❤️ Check out DeepInfra and run DeepSeek or many other AI projects: https://deepinfra.com/papers GPT 4.1 (once again, likely API only, not in the ChatGPT app):

https://openai.com/index/gpt-4-1/ 📝 The paper "Humanity's Last Exam" is available here:

https://agi.safe.ai/ Sources:

https://x.com/paulgauthier/status/1911927464844304591?s=46

https://x.com/ficlive/status/1911853409847906626

https://x.com/flavioad/status/1911848067470598608?s=46P

https://x.com/pandeyparul/status/1911958369734107439?s=46

https://x.com/demishassabis/status/1912197180187897985?s=46

https://x.com/emollick/status/1911966088339894669?s=46

https://x.com/aibattle_/status/1911845556885893488?s=46

https://x.com/augmentcode/sta…

4 weeks, 1 day назад @ youtube.com
NVIDIA’s New Robot AI: Insanely Good!
NVIDIA’s New Robot AI: Insanely Good! NVIDIA’s New Robot AI: Insanely Good!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 The paper "GR00T N1: An Open Foundation Model for Generalist Humanoid Robots" is available here:

https://github.com/NVIDIA/Isaac-GR00T

https://arxiv.org/abs/2503.14734 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our ge…

1 month назад @ youtube.com
Meta's LLAMA 4 AI In 4 Minutes!
Meta's LLAMA 4 AI In 4 Minutes! Meta's LLAMA 4 AI In 4 Minutes!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video Or just run it with Ollama when the model appears:

https://ollama.com/search?q=llama%204 📝 LLAMA 4:

https://ai.meta.com/blog/llama-4-multimodal-intelligence/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patre…

1 month, 1 week назад @ youtube.com
OpenAI’s ChatGPT - 8 New Incredible Features!
OpenAI’s ChatGPT - 8 New Incredible Features! OpenAI’s ChatGPT - 8 New Incredible Features!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambda.ai/papers Guide for using DeepSeek on Lambda:

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video Our neural rendering paper: https://users.cg.tuwien.ac.at/zsolnai/gfx/gaussian-material-synthesis/ Sources:

https://x.com/MattJedrzejewsk/status/1906835418588590318

https://x.com/MattJedrzejewsk/status/1907145560056078581

https://x.com/egeberkina/status/1906089394219491782/photo/1

https://x.com/cocktailpeanut/status/1906983829035974890

https://x.com/cocktailpeanut/status/1906956398136811820

https://x.com/ImSamThompson/sta…

1 month, 1 week назад @ youtube.com
DeepMind’s New Gemini AI: Build Anything For Free! 🏅
DeepMind’s New Gemini AI: Build Anything For Free! 🏅 DeepMind’s New Gemini AI: Build Anything For Free! 🏅

❤️ Check out Vast.ai and run DeepSeek or any AI project: https://vast.ai/papers Gemini 2.5 Pro is available here:

https://gemini.google.com/app

https://aistudio.google.com

https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/ Simulated soccer/fusball: https://editor.p5js.org/trudyp/sketches/KjxZ3d8lX

Sources:

https://x.com/trudypainter/status/1904588040112120062

https://x.com/pandeyparul/status/1904731207541157922

https://x.com/alexanderchen/status/1904748250193654122

https://x.com/renderfiction/status/1905998185962643767

https://x.com/addyosmani/status/1906247323430408550

https://x.com/mbaeuml/status/1906597742522105875

https://x.com/renderfiction/status/…

1 month, 2 weeks назад @ youtube.com
NVIDIA's New AI Makes Cars Fly...Sort Of!
NVIDIA's New AI Makes Cars Fly...Sort Of! NVIDIA's New AI Makes Cars Fly...Sort Of!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers 📝 The paper "GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control" is available here:

https://research.nvidia.com/labs/toronto-ai/GEN3C/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael T…

1 month, 2 weeks назад @ youtube.com
OpenAI’s New Image Generator: An AI Revolution!
OpenAI’s New Image Generator: An AI Revolution! OpenAI’s New Image Generator: An AI Revolution!

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers 4o Image Generation: https://openai.com/index/introducing-4o-image-generation/

Apple terminal: https://www.apple.com/mac/lumon-terminal-pro/ 📝 My paper on simulations that look almost like reality is available for free here:

https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations:

https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:

Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, R…

1 month, 2 weeks назад @ youtube.com
DeepSeek V3 - The King is Back…For Free!
DeepSeek V3 - The King is Back…For Free! DeepSeek V3 - The King is Back…For Free!

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers Guide for using DeepSeek (R1) on Lambda (can be applied to DeepSeek V3 too, see links below):

https://docs.lambdalabs.com/education/large-language-models/deepseek-r1-ollama/?utm_source=two-minute-papers&utm_campaign=relevant-videos&utm_medium=video 📝 DeepSeek V3 (0324) is available here:

https://www.deepseek.com/

Try it online (note: they see your data, I prefer private, see below): https://chat.deepseek.com

Paper: https://arxiv.org/pdf/2412.19437 How to run locally: https://github.com/deepseek-ai/DeepSeek-V3?tab=readme-ov-file#6-how-to-run-locally

Ollama is probably the simplest way to run it - support …

1 month, 2 weeks назад @ youtube.com
DataFest Video DataFest Video
последний пост 8 months, 2 weeks назад
Interview with Juergen Schmidhuber at Data Christmas 2020
Interview with Juergen Schmidhuber at Data Christmas 2020 Interview with Juergen Schmidhuber at Data Christmas 2020

02:00-05:38 What do you think were the most outstanding underestimated news and achievements in AI field in 2020?

05:41-11:28 What do you think about trends in ML like transformers trying to replace LSTMs in NLP?

11:29-16:06 Are you working on any new types of models right now?

16:07-20:41 What is your opinion on the most underestimated ML subfield like Reinforcement Learning?

20:42-22:17 Your best recommendation for our community is to look into AI in the real physical world, right?

22:18-33:10 Do you think it is possible to achieve great results in creative AI, particularly in subjective beauty?

33:17-35:50 What prevents chat bots from reaching more intelligent levels?

36:03-39:39 What is…

8 months, 2 weeks назад @ youtube.com
Data Fest Online 2020 AI Hardware Track Premiere
Data Fest Online 2020 AI Hardware Track Premiere Data Fest Online 2020 AI Hardware Track Premiere

DataFest Online 2020

AI Hardware track https://ods.ai/tracks/ai-hardware-df2020 Register and get access to the tracks: https://ods.ai/events/datafest2020

Join the community: https://ods.ai/

8 months, 2 weeks назад @ youtube.com
Mikita Shchutski | A small BERT towards Large Medical Models
Mikita Shchutski | A small BERT towards Large Medical Models Mikita Shchutski | A small BERT towards Large Medical Models

Mikita Shchutski | Lead Machine Learning Engineer, Quantori Training large medical models using electronic health records in order to create a highly informative medical embedding space

8 months, 2 weeks назад @ youtube.com
Семинары JetBrains Research Семинары JetBrains Research
последний пост None
Яндекс. Компьютерные науки Яндекс. Компьютерные науки
последний пост 18 часов назад
Как VLM понимает текст и картинки
Как VLM понимает текст и картинки Как VLM понимает текст и картинки

This is a fragment of a talk by Anton Klochkov, head of the text recognition subgroup for VLM in Search and Advertising Technologies, given at ML Party. In the talk Anton spoke about multimodal neural networks. Which experiments the team managed to run on their VLM, what they took away from them, and what New Year's miracle happened to them along the way: find out in the full recording on our channel. #нейросети #AI #VLM #мультимодели #машинноезрение #обработкатекста #архитектуранейросети #deeplearning #искусственныйинтеллект #большаямодель #ML #моделькотораяпонимает #текстиизображение #нейросетьяндекса #multimodalAI #нейросеть2025

18 часов назад @ youtube.com
Что нового умеют нейросети? Визуально-текстовая мультимодальность
Что нового умеют нейросети? Визуально-текстовая мультимодальность Что нового умеют нейросети? Визуально-текстовая мультимодальность

This is a fragment of a talk by Ekaterina Glazkova, team lead of the VLM alignment team in the computer vision service. At Practical ML Conf 2024 Ekaterina spoke about product use cases for VLMs, neural networks that work with images and text at the same time. The talk presents alignment methods for product requirements using three real tasks as examples: multimodal search, image captioning, and fantasy-style generative scenarios. Watch the full talk on our channel.

1 month, 1 week назад @ youtube.com
Из каких деталей состоят Нейро и другие VLM Яндекса
Из каких деталей состоят Нейро и другие VLM Яндекса Из каких деталей состоят Нейро и другие VLM Яндекса

This is a fragment of a talk by Ekaterina Glazkova, team lead of the VLM alignment team in the computer vision service. At Practical ML Conf 2024 Ekaterina spoke about product use cases for VLMs, neural networks that work with images and text at the same time. The talk presents alignment methods for product requirements using three real tasks as examples: multimodal search, image captioning, and fantasy-style generative scenarios. Watch the full talk on our channel.

1 month, 1 week назад @ youtube.com
Как работает Нейро и что такое мультимодальность
Как работает Нейро и что такое мультимодальность Как работает Нейро и что такое мультимодальность

This is a fragment of a talk by Ekaterina Glazkova, team lead of the VLM alignment team in the computer vision service. At Practical ML Conf 2024 Ekaterina spoke about product use cases for VLMs, neural networks that work with images and text at the same time. The talk presents alignment methods for product requirements using three real tasks as examples: multimodal search, image captioning, and fantasy-style generative scenarios. Watch the full talk on our channel.

1 month, 2 weeks назад @ youtube.com
Как сделать озвучку книг из простого TTS / Константин Кузнецов
Как сделать озвучку книг из простого TTS / Константин Кузнецов Как сделать озвучку книг из простого TTS / Константин Кузнецов

This is Konstantin Kuznetsov, head of the intonation group in Search and Advertising Technologies. Imagine that your user has to listen to synthesized speech for the next half hour. In that case you need to make sure the person simply does not fall asleep. Konstantin explained how to make the synthesis sound neither robotic nor flat so that it holds the listener's attention. Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml

1 month, 2 weeks назад @ youtube.com
Поисковый аукцион Яндекс Маркета / Евгений Паринов
Поисковый аукцион Яндекс Маркета / Евгений Паринов Поисковый аукцион Яндекс Маркета / Евгений Паринов

This is Evgeny Parinov, head of the search results ranking group at Yandex Market. The search auction is a mechanism that lets sellers pay a percentage fee in exchange for a larger number of sales. In the talk Evgeny explained how Market accounts for advertising revenue and is introducing unit-economics accounting. Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml

1 month, 3 weeks назад @ youtube.com
Под капотом Нейро: как работает мультимодальность
Под капотом Нейро: как работает мультимодальность Под капотом Нейро: как работает мультимодальность

This is a fragment of a talk by Ekaterina Glazkova, team lead of the VLM alignment team in the computer vision service. At Practical ML Conf 2024 Ekaterina spoke about product use cases for VLMs, neural networks that work with images and text at the same time. The talk presents alignment methods for product requirements using three real tasks as examples: multimodal search, image captioning, and fantasy-style generative scenarios. Watch the full talk on our channel.

1 month, 3 weeks назад @ youtube.com
Байки про обучение VLM / Антон Клочков
Байки про обучение VLM / Антон Клочков Байки про обучение VLM / Антон Клочков

This is Anton Klochkov, head of the text recognition subgroup for VLM in Search and Advertising Technologies. Anton continues the series of talks on developing image multimodality at Yandex. In the talk he showed which experiments the team ran on their VLM, what lessons they learned from them, and what New Year's miracle happened to them along the way! Learn more about events for developers here: https://events.yandex.ru Subscribe to Yandex's Telegram channel for the ML community: https://t.me/yandexforml

1 month, 3 weeks назад @ youtube.com
Мультиагентные подходы для работы с LLM на базе сервисов Yandex Cloud / Мастер-класс
Мультиагентные подходы для работы с LLM на базе сервисов Yandex Cloud / Мастер-класс Мультиагентные подходы для работы с LLM на базе сервисов Yandex Cloud / Мастер-класс

This is a workshop from Practical ML Conf, run by the Yandex Cloud team: Dmitry Rybalko, product architect for ML services, and Dmitry Soshnikov, consultant, associate professor at MAI and HSE University, and technical lead of the AI Lab at the HSE School of Design. Agent-based approaches, where several models interact with each other to reach a single goal, are gaining popularity in the LLM space. In the workshop we showed how to apply this approach to build a question-answering system for a transport company; the same methods can be useful in any industry. To solve the task we experimented with different data structures for RAG, including text-based and graph-based ones. As the foundation we took a language …
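A minimal shape of such an agent pipeline can be sketched in a few lines; call_llm below is a hypothetical placeholder for any hosted chat-completion endpoint (no specific Yandex Cloud API is implied), and the router/specialist split is the simplest possible version of the pattern.

def call_llm(prompt):
    # Hypothetical placeholder for a real LLM call; returns canned strings so the sketch runs.
    return "route: schedule" if "route" in prompt else "stub answer"

AGENTS = {
    "schedule": lambda q: call_llm(f"Answer using the timetable documents: {q}"),
    "tariffs":  lambda q: call_llm(f"Answer using the tariff documents: {q}"),
}

def answer(question):
    """Minimal multi-agent pattern: one model routes the question, then a
    specialist agent (each with its own retrieval context) produces the answer."""
    decision = call_llm(f"route this question to one of {list(AGENTS)}: {question}")
    agent = AGENTS.get(decision.split(":")[-1].strip(), AGENTS["schedule"])
    return agent(question)

print(answer("When does the last bus leave the depot?"))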

1 month, 3 weeks назад @ youtube.com
Как достать соседа: бенчмаркаем ANN-алгоритмы / Мастер-класс
Как достать соседа: бенчмаркаем ANN-алгоритмы / Мастер-класс Как достать соседа: бенчмаркаем ANN-алгоритмы / Мастер-класс

This is a workshop from Practical ML Conf, run by Mikhail Kamenshchikov, head of the Recommendations unit at Avito. Mikhail explained why approximate nearest neighbor algorithms are needed, walked through implementations of the popular IVF and HNSW algorithms, and showed how to benchmark different approaches on your own data using the ann-benchmarks library. Subscribe to Yandex's Telegram channel for ML specialists: https://t.me/yandexforml
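The quantity such benchmarks trade off against query speed is recall measured against exact brute-force search; the sketch below computes recall@k with NumPy and uses a crude random-subset search as a stand-in where a real run would plug in an IVF or HNSW index.

import numpy as np

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true k nearest neighbors that the approximate index returned,
    i.e. the quality axis that ann-benchmarks plots against queries per second."""
    return np.mean([len(set(a) & set(e)) / len(e) for a, e in zip(approx_ids, exact_ids)])

rng = np.random.default_rng(0)
data, queries, k = rng.normal(size=(2000, 32)), rng.normal(size=(50, 32)), 10

# Exact ground truth by brute force.
d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(-1)
exact = np.argsort(d2, axis=1)[:, :k]

# Stand-in for an ANN index: search only a random 25% subset of the data
# (a real benchmark would plug in IVF or HNSW here).
subset = rng.choice(len(data), size=len(data) // 4, replace=False)
approx = subset[np.argsort(d2[:, subset], axis=1)[:, :k]]

print(round(recall_at_k(approx, exact), 3))   # roughly 0.25 for this crude stand-in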

1 month, 3 weeks назад @ youtube.com
Как зафайнтюнить вашу любимую диффузионную модель / Мастер-класс
Как зафайнтюнить вашу любимую диффузионную модель / Мастер-класс Как зафайнтюнить вашу любимую диффузионную модель / Мастер-класс

This is a workshop from Practical ML Conf, run by Lev Novitskiy, lead data scientist at Kandinsky Research, Sber AI, and Vera Soboleva, junior researcher at the AIRI Artificial Intelligence Institute. Modern diffusion models can generate high-quality, diverse images from a text description. But what if we want to produce not a random picture but a specific object, say a vase that we will later sell on a marketplace? That is the task of personalized generation: knowledge about a specific object has to be injected into a pretrained model from a few photographs so that it can be shown in new scenes. The workshop covered the basics of diffusion and the bas…
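At the center of any such fine-tuning is the standard denoising objective; the sketch below shows one DDPM-style training step on toy data, with dummy_eps_model as a placeholder for the real U-Net and a textbook linear noise schedule (this is an illustration of the objective, not the workshop's code).

import numpy as np

rng = np.random.default_rng(0)

def diffusion_loss(x0, eps_model, alphas_bar):
    """One DDPM-style training step: noise a clean sample to a random timestep and
    regress the model's noise prediction against the true noise. Personalized
    fine-tuning runs essentially this loss on a handful of photos of the target object."""
    t = rng.integers(len(alphas_bar))
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1 - alphas_bar[t]) * eps
    return np.mean((eps - eps_model(x_t, t)) ** 2)

alphas_bar = np.cumprod(1 - np.linspace(1e-4, 0.02, 1000))   # standard linear schedule
x0 = rng.normal(size=(3, 8, 8))                              # toy "image"
dummy_eps_model = lambda x_t, t: np.zeros_like(x_t)          # placeholder for the U-Net
print(diffusion_loss(x0, dummy_eps_model, alphas_bar))       # about 1 for the zero predictor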

1 month, 3 weeks назад @ youtube.com
Как нейросети включают воображение
Как нейросети включают воображение Как нейросети включают воображение

This is a fragment of a talk by Ekaterina Glazkova, team lead of the VLM alignment team in the computer vision service. At Practical ML Conf 2024 Ekaterina spoke about product use cases for VLMs, neural networks that work with images and text at the same time. The talk presents alignment methods for product requirements using three real tasks as examples: multimodal search, image captioning, and fantasy-style generative scenarios. Watch the full talk on our channel.

1 month, 4 weeks назад @ youtube.com
Как выбрать имя для собаки? #llm #yandex #собака
Как выбрать имя для собаки? #llm #yandex #собака Как выбрать имя для собаки? #llm #yandex #собака

This is a fragment of a talk by Ekaterina Glazkova from Yandex Search, from her presentation at Practical ML Conf 2024.

2 months назад @ youtube.com
ML Party 18.03.2025
ML Party 18.03.2025 ML Party 18.03.2025

Welcome to an evening meetup for ML engineers from Yandex. This time we talk about developing image multimodality, audiobook speech synthesis, and more. Join the Yandex for ML channel to stay up to date with all Yandex events and meetups: https://t.me/+UohL-bYM25YyZDNi

Leave your questions for the speakers in the chat with the #вопрос tag: https://t.me/+OsKnLNG-7DE1ZTFi

2 months назад @ youtube.com
Сколько калорий в мопсе? #llm #мопс #юмор
Сколько калорий в мопсе? #llm #мопс  #юмор Сколько калорий в мопсе? #llm #мопс #юмор

This is a fragment of a talk by Ekaterina Glazkova from Yandex Search, from her presentation at Practical ML Conf 2024.

2 months, 1 week назад @ youtube.com
ML Trainings ML Trainings
последний пост 1 month, 2 weeks назад
Анастасия Функнер, Ольга Павлова, Анна Ефимова | Как Ozon Банк создает ML-платформу будущего
Анастасия Функнер, Ольга Павлова, Анна Ефимова | Как Ozon Банк создает ML-платформу будущего Анастасия Функнер, Ольга Павлова, Анна Ефимова | Как Ozon Банк создает ML-платформу будущего

Спикеры: Анастасия Функнер, Ольга Павлова, Анна Ефимова, Ozon Банк

Тема доклада: MLOps, Data Science и Golang: Как Ozon Банк создает ML-платформу будущего

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Наши соц.сети: Telegram: https://t.me/datafest Вконтакте: https://vk.com/datafest Канал с вакансиями в telegram: https://t.me/odsjobs Канал с апдейтами по курсам: https://t.me/odscourses Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 2 weeks назад @ youtube.com
Алена Феногенова | Новые бенчмарки 2024-2025 для русского языка: вызовы и перспективы
Алена Феногенова | Новые бенчмарки 2024-2025 для русского языка: вызовы и перспективы Алена Феногенова | Новые бенчмарки 2024-2025 для русского языка: вызовы и перспективы

Спикер: Алена Феногенова, AGI NLP TeamLead, Сбер

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 2 weeks назад @ youtube.com
Мария Бегичева | Цифровой помощник компании в управлении операционными рисками
Мария Бегичева | Цифровой помощник компании в управлении операционными рисками Мария Бегичева | Цифровой помощник компании в управлении операционными рисками

Спикер: Мария Бегичева, Senior DS, Сбер

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 2 weeks назад @ youtube.com
Полина Федотова | Foundation Models in Robotics
Полина Федотова |  Foundation Models in Robotics Полина Федотова | Foundation Models in Robotics

Спикер: Полина Федотова, Главный инженер-разработчик, лид исследовательской команды, «Сбер Центр робототехники»

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 2 weeks назад @ youtube.com
Нонна Шахова, Эмели Драль,Ирина Голощапова, Анастасия Никулина | Круглый стол: Women in Data Science
Нонна Шахова, Эмели Драль,Ирина Голощапова, Анастасия Никулина | Круглый стол: Women in Data Science Нонна Шахова, Эмели Драль,Ирина Голощапова, Анастасия Никулина | Круглый стол: Women in Data Science

Спикеры: Нонна Шахова, Эмели Драль, Ирина Голощапова, Анастасия Никулина

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025

Как женщины меняют data science:

Старт в data science:

Работа и карьера:

Профессиональное развитие:

Будущее в data science: Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 2 weeks назад @ youtube.com
Анна Текучева | "Знаешь что это за слово? а оно есть"
Анна Текучева | "Знаешь что это за слово? а оно есть" Анна Текучева | "Знаешь что это за слово? а оно есть"

Спикер: Анна Текучева, Data Scientist HML Wildberries

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 2 weeks назад @ youtube.com
Юлия Раковская | Вступительное слово WiDS Meetup
Юлия Раковская | Вступительное слово WiDS Meetup Юлия Раковская | Вступительное слово WiDS Meetup

Спикер: Юлия Раковская, Руководитель центра исследований и разработки Сбера

Мероприятие Wids-meetup-2025: https://ods.ai/events/wids-meetup-2025 Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

1 month, 2 weeks назад @ youtube.com
Митап #1 | Data Fusion Contest 2025
Митап #1 | Data Fusion Contest 2025 Митап #1 | Data Fusion Contest 2025

The annual Data Fusion Contest series of machine learning competitions has started. Task page: https://ods.ai/tracks/data-fusion-2025-competitions On Tuesday, February 25 (19:00-20:00 Moscow time) we held the first meetup for Data Fusion Contest 2025 in the ODS space on Spatial.Chat, where we discussed the tasks and answered participants' questions. Program: Anatoly Glushenko, breakdown of Task 1 “LabelCraft”

Alexey Natekin, overview of Task 2 “4Cast” and Task 3 “Distribution”; the Data Fusion Contest 2025 team, Q&A and discussion with participants

2 months, 2 weeks назад @ youtube.com
Анастасия Вепрева | Мастер-класс: Генерация лекарственных молекул
Анастасия Вепрева | Мастер-класс: Генерация лекарственных молекул Анастасия Вепрева | Мастер-класс: Генерация лекарственных молекул

Спикер: Анастасия Вепрева - разработчик моделей в области прогнозирования физико-химических свойств и биологических активностей малых молекул, сотрудница Центра ИИ в химии ИТМО

Мероприятие 21.02.2025: https://ods.ai/events/ai_chemistrymk1 Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

2 months, 2 weeks назад @ youtube.com
Антон Воронов | Итоги года в DS/ML карьерных вопросах
Антон Воронов | Итоги года в DS/ML карьерных вопросах Антон Воронов | Итоги года в DS/ML карьерных вопросах

Спикер: Антон Воронов, Газпром ИД, Заместитель директора департамента, Руководитель платформы Поиска Data Ёлка 2024 в гостях у VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

2 months, 2 weeks назад @ youtube.com
Пётр Ермаков | Итоги года в PyData stack
Пётр Ермаков | Итоги года в PyData stack Пётр Ермаков | Итоги года в PyData stack

Спикер: Пётр Ермаков, ML Brand Director, Яндекс Data Ёлка 2024 в гостях у VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

2 months, 2 weeks назад @ youtube.com
Валентин Малых | Итоги года в NLP
Валентин Малых | Итоги года в NLP Валентин Малых | Итоги года в NLP

Спикер: Валентин Малых, руководитель группы, MTS AI Data Ёлка 2024 в гостях у VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

2 months, 2 weeks назад @ youtube.com
Николай Анохин | Итоги года в RecSys
Николай Анохин | Итоги года в RecSys Николай Анохин | Итоги года в RecSys

Спикер: Николай Анохин, ведущий специалист по ML, AI VK Data Ёлка 2024 в гостях у VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

2 months, 2 weeks назад @ youtube.com
Ирина Голощапова | Итоги года в Reliable ML, часть 2
Ирина Голощапова | Итоги года в Reliable ML, часть 2 Ирина Голощапова | Итоги года в Reliable ML, часть 2

Спикер: Ирина Голощапова, CDO Raiffeisenbank Operations Data Ёлка 2024 в гостях у VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

2 months, 2 weeks назад @ youtube.com
Дмитрий Колодезев | Итоги года в Reliable ML, часть 1
Дмитрий Колодезев | Итоги года в Reliable ML, часть 1 Дмитрий Колодезев | Итоги года в Reliable ML, часть 1

Спикер: Дмитрий Колодезев, директор, Промсофт Data Ёлка 2024 в гостях у VK: https://ods.ai/events/data-elka-24-vk-offline

Data Ёлка 2024: https://ods.ai/events/data-elka-2024

_____

Наши соц.сети:

Telegram: https://t.me/datafest

Вконтакте: https://vk.com/datafest

Канал с вакансиями в telegram: https://t.me/odsjobs

Канал с апдейтами по курсам: https://t.me/odscourses

Как попасть в чат сообщества ODS Mattermost: https://ods.ai/tracks/mattermost

2 months, 2 weeks назад @ youtube.com
🎧 Podcasts
Lex Fridman AI Podcast Lex Fridman AI Podcast
последний пост 1 week, 3 days назад
#468 – Janna Levin: Black Holes, Wormholes, Aliens, Paradoxes & Extra Dimensions
#468 – Janna Levin: Black Holes, Wormholes, Aliens, Paradoxes & Extra Dimensions #468 – Janna Levin: Black Holes, Wormholes, Aliens, Paradoxes & Extra Dimensions

Janna Levin is a theoretical physicist and cosmologist specializing in black holes, cosmology of extra dimensions, topology of the universe, and gravitational waves.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep468-sc. See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://brain.fm/lex. BetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lex. NetSuite: Business management software.

Go to https://shopify.com/lex. AG1: All-in-one daily nutrition drink.

1 week, 3 days назад @ lexfridman.com
#467 – Tim Sweeney: Fortnite, Unreal Engine, and the Future of Gaming
#467 – Tim Sweeney: Fortnite, Unreal Engine, and the Future of Gaming #467 – Tim Sweeney: Fortnite, Unreal Engine, and the Future of Gaming

Tim Sweeney is a legendary video game programmer, founder and CEO of Epic Games that created the Unreal Engine, Fortnite, Gears of War, Unreal Tournament, and many other groundbreaking and influential video games.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep467-sc. See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://notion.com/lex. MasterClass: Online classes from world-class experts.

Go to https://shopify.com/lex. AG1: All-in-one daily nutrition drink.

Go to https://drinkag1.com/lex. LMNT: Zero-sugar electrolyte drink mix.

2 weeks, 1 day назад @ lexfridman.com
#466 – Jeffrey Wasserstrom: China, Xi Jinping, Trade War, Taiwan, Hong Kong, Mao
#466 – Jeffrey Wasserstrom: China, Xi Jinping, Trade War, Taiwan, Hong Kong, Mao #466 – Jeffrey Wasserstrom: China, Xi Jinping, Trade War, Taiwan, Hong Kong, Mao

Jeffrey Wasserstrom is a historian of modern China.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep466-sc. See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://oracle.com/lex. Tax Network USA: Full-service tax firm.

Go to https://shopify.com/lex. LMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lex. AG1: All-in-one daily nutrition drink.

3 weeks назад @ lexfridman.com
#465 – Robert Rodriguez: Sin City, Desperado, El Mariachi, Alita, and Filmmaking
#465 – Robert Rodriguez: Sin City, Desperado, El Mariachi, Alita, and Filmmaking #465 – Robert Rodriguez: Sin City, Desperado, El Mariachi, Alita, and Filmmaking

Robert Rodriguez is a legendary filmmaker and creator of Sin City, El Mariachi, Desperado, Spy Kids, Machete, From Dusk Till Dawn, Alita: Battle Angel, The Faculty, and his newest venture Brass Knuckle Films.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep465-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc. Transcript:

https://lexfridman.com/robert-rodriguez-transcript CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: http…

4 weeks назад @ lexfridman.com
#464 – Dave Smith: Israel, Ukraine, Epstein, Mossad, Conspiracies & Antisemitism
#464 – Dave Smith: Israel, Ukraine, Epstein, Mossad, Conspiracies & Antisemitism #464 – Dave Smith: Israel, Ukraine, Epstein, Mossad, Conspiracies & Antisemitism

Dave Smith is a comedian, libertarian, political commentator, and the host of Part of the Problem podcast.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep464-sc

See below for timestamps, and to give feedback, submit questions, contact Lex, etc. CONTACT LEX:

Feedback - give feedback to Lex: https://lexfridman.com/survey

AMA - submit questions, videos or call-in: https://lexfridman.com/ama

Hiring - join our team: https://lexfridman.com/hiring

Other - other ways to get in touch: https://lexfridman.com/contact EPISODE LINKS:

Dave's X: https://x.com/ComicDaveSmith

Dave's YouTube: https://youtube.com/DSmithcomic

Dave's Instagram: https://instagram.com/theprobl…

1 month, 1 week назад @ lexfridman.com
#463 – Douglas Murray: Putin, Zelenskyy, Trump, Israel, Netanyahu, Hamas & Gaza
#463 – Douglas Murray: Putin, Zelenskyy, Trump, Israel, Netanyahu, Hamas & Gaza #463 – Douglas Murray: Putin, Zelenskyy, Trump, Israel, Netanyahu, Hamas & Gaza

Douglas Murray is the author of On Democracies and Death Cults, The War on The West, and The Madness of Crowds.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep463-sc. See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://callofduty.com/warzone. Oracle: Cloud infrastructure.

Go to https://oracle.com/lex. LMNT: Zero-sugar electrolyte drink mix.

Go to https://drinkLMNT.com/lex. AG1: All-in-one daily nutrition drink.

1 month, 2 weeks назад @ lexfridman.com
#462 – Ezra Klein and Derek Thompson: Politics, Trump, AOC, Elon & DOGE
#462 – Ezra Klein and Derek Thompson: Politics, Trump, AOC, Elon & DOGE #462 – Ezra Klein and Derek Thompson: Politics, Trump, AOC, Elon & DOGE

Ezra Klein is one of the most influential voices representing the left-wing of American politics.

He is a columnist for the NY Times and host of The Ezra Klein Show.

Derek Thompson is a writer at The Atlantic and host of the Plain English podcast.

Together they have written a new book titled Abundance that lays out a set of ideas for the future of the Democratic party.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep462-sc. See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

1 month, 2 weeks назад @ lexfridman.com
#461 – ThePrimeagen: Programming, AI, ADHD, Productivity, Addiction, and God
#461 – ThePrimeagen: Programming, AI, ADHD, Productivity, Addiction, and God #461 – ThePrimeagen: Programming, AI, ADHD, Productivity, Addiction, and God

ThePrimeagen (aka Michael Paulson) is a programmer who has educated, entertained, and inspired millions of people to build software and have fun doing it.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep461-sc. See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://shopify.com/lex. NetSuite: Business management software.

Go to http://netsuite.com/lex. BetterHelp: Online therapy and counseling.

Go to https://betterhelp.com/lex. AG1: All-in-one daily nutrition drinks.

1 month, 3 weeks назад @ lexfridman.com
#460 – Narendra Modi: Prime Minister of India – Power, Democracy, War & Peace
#460 – Narendra Modi: Prime Minister of India – Power, Democracy, War & Peace #460 – Narendra Modi: Prime Minister of India – Power, Democracy, War & Peace

Narendra Modi is the Prime Minister of India.

On YouTube this episode is available in English, Hindi, Russian (and soon other languages).

Captions and voice-over audio tracks are provided (for the main episode video on YouTube) in English, Hindi, Russian, and the original mixed-language version, with subtitles available in your preferred language.

To listen to the original mixed-language version, please select the Hindi (Latin) audio track.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep460-sc. See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

2 months назад @ lexfridman.com
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters #459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Dylan Patel is the founder of SemiAnalysis, a research & analysis company specializing in semiconductors, GPUs, CPUs, and AI hardware.

Nathan Lambert is a research scientist at the Allen Institute for AI (Ai2) and the author of a blog on AI called Interconnects.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep459-sc. See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://invideo.io/i/lexpod. GitHub: Developer platform and AI code editor.

(4:31:34) – AI agents (4:40:16) – Programming and AI (4:47:43) – Open source (4:56:55) – Stargate (5:04:24) – Future of AI PODCAST LINKS: – Podcast Website: https://lexfridman.com/podcast –…

3 months, 1 week ago @ lexfridman.com
#458 – Marc Andreessen: Trump, Power, Tech, AI, Immigration & Future of America

Marc Andreessen is an entrepreneur, investor, co-creator of Mosaic, co-founder of Netscape, and co-founder of the venture capital firm Andreessen Horowitz.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep458-sc. See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://encord.com/lex. GitHub: Developer platform and AI code editor.

Go to https://gh.io/copilot. Notion: Note-taking and team collaboration.

Go to https://shopify.com/lex. LMNT: Zero-sugar electrolyte drink mix.

3 months, 2 weeks ago @ lexfridman.com
#457 – Jennifer Burns: Milton Friedman, Ayn Rand, Economics, Capitalism, Freedom

Jennifer Burns is a historian of ideas, focusing on the evolution of economic, political, and social ideas in the United States in the 20th century.

She wrote two biographies, one on Milton Friedman, and the other on Ayn Rand.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep457-sc. See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://brain.fm/lex. GitHub: Developer platform and AI code editor.

Go to https://gh.io/copilot. LMNT: Zero-sugar electrolyte drink mix.

3 months, 3 weeks ago @ lexfridman.com
#456 – Volodymyr Zelenskyy: Ukraine, War, Peace, Putin, Trump, NATO, and Freedom

Volodymyr Zelenskyy is the President of Ukraine.

On YouTube this episode is available in English, Ukrainian, and Russian.

Captions and voice-over audio tracks are provided in English, Ukrainian, Russian, and the original mixed-language version, with subtitles available in your preferred language.

To listen to the original mixed-language version, please select the English (UK) audio track.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep456-sc. See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

4 months, 1 week ago @ lexfridman.com
#455 – Adam Frank: Alien Civilizations and the Search for Extraterrestrial Life

Adam Frank is an astrophysicist studying star systems and the search for extraterrestrial life and alien civilizations.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep455-sc. See below for timestamps, and to give feedback, submit questions, contact Lex, etc.

Go to https://encord.com/lex. Eight Sleep: Temp-controlled smart mattress cover.

Go to https://betterhelp.com/lex. Notion: Note-taking and team collaboration.

Go to https://drinkLMNT.com/lex. AG1: All-in-one daily nutrition drinks.

4 months, 3 weeks ago @ lexfridman.com
#454 – Saagar Enjeti: Trump, MAGA, DOGE, Obama, FDR, JFK, History & Politics

Saagar Enjeti is a political journalist & commentator, co-host of Breaking Points with Krystal and Saagar and The Realignment Podcast.

He is exceptionally well-read, and the books he recommends are always fascinating and eye-opening.

You can check out all the books he mentions in this episode here: https://lexfridman.com/saagar-books. Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep454-sc. See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Go to https://eightsleep.com/lex. AG1: All-in-one daily nutrition drinks.

Go to https://drinkLMNT.com/lex. BetterHelp: Online therapy and counseling.

5 months, 1 week ago @ lexfridman.com
Microsoft Research Podcast
latest post 14 hours ago
Coauthor roundtable: Reflecting on real world of doctors, developers, patients, and policymakers

LEE: Yeah, yeah.

[LAUGHS] GOLDBERG: Right, right, right, yeah.

Yeah, yeah.

14 hours ago @ microsoft.com
Abstracts: Heat Transfer and Deep Learning with Hongxia Hao and Bing Lv

Today I’m talking to two researchers, Hongxia Hao, a senior researcher at Microsoft Research AI for Science, and Bing Lv, an associate professor in physics at the University of Texas at Dallas.

Hongxia and Bing are co-authors of a paper called Probing the Limit of Heat Transfer in Inorganic Crystals with Deep Learning.

LV: So I think one of the biggest things as Hongxia said, right?

We have a lot of new materials, exotic materials, which some of them, Hongxia can elaborate a little bit more.

HAO: Yeah, yeah.

1 week ago @ microsoft.com
Abstracts: Societal AI with Xing Xie

I’m here today with Xing Xie, a partner research manager at Microsoft Research and co-author of a white paper called Societal AI: Research Challenges and Opportunities.

HUIZINGA: So let’s start with a brief overview of the background for this white paper on Societal AI.

XIE: The idea for this white paper emerged in response to the shift we are witnessing in the AI landscape.

XIE: Rather than follow a traditional research methodology, we built this white paper around ten fundamental, foundational research questions.

HUIZINGA: Yeah, yeah, yeah.

1 week, 3 days ago @ microsoft.com
The AI Revolution in Medicine, Revisited: Laws, norms, and ethics for AI in health

Roxana is among the world’s thought leaders in AI, healthcare, and medicine, thanks in part to groundbreaking work on AI biases and trustworthiness.

When we think about AI, we think about it being very futuristic, but it’s trained on data from the past.

And so I think your struggles and frustrations on, you know, how to expand that nationwide, I think, are really, really informative.

And in a way, I think … I don’t know that I or my coauthors were satisfied with that.

You know, AI has all the time in the world [LAUGHS] to write you a text.

2 weeks ago @ microsoft.com
The AI Revolution in Medicine, Revisited: Empowering patients and healthcare consumers in the age of generative AI

[LAUGHS] DEBRONKART: Ah, well, that’s … that’s even weirder.

I’m, as I said at the beginning, I’m glad to be alive and I’m really, really, really grateful to be given a chance to share my thoughts with your audience because I really like super smart nerds.

And it’s really to me like where I’m seeing kind of the first set of really kind of promising AI applications.

And so, to me, that’s really kind of where I see the most interesting opportunities for technology and for digital health.

Just really, really appreciate it.

4 weeks ago @ microsoft.com
The AI Revolution in Medicine, Revisited: Real-world healthcare AI development and deployment—at scale

1 month, 1 week ago @ microsoft.com
Ideas: Accelerating Foundation Models Research: AI for all

But there was that class I really, really enjoyed, which was mathematical logic.

Well, let’s get onto the topic of Accelerating Foundation Models Research and unpack the big idea behind that.

It might be confusing for some people, Accelerating Foundation Models Research.

And so when we started with Accelerating Foundation Models Research and from now on, I will say AFMR if that’s okay.

It’s about access to people, access to the resources and really co-designing so that we can really, really make more advances together.

1 month, 2 weeks ago @ microsoft.com
The AI Revolution in Medicine, Revisited: The reality of generative AI in the clinic

Sara is vice president and chief health AI officer at UC San Francisco Health.

LONGHURST: So the pat response is AI won’t replace doctors, but AI will replace doctors who don’t use AI.

LEE: And I’m assuming a chief health AI officer is not a role that has been around for a long time.

LEE: Should I be impressed or concerned that the chief health AI officer at UC San Francisco Health is using ChatGPT off label?

We’ll delve into how patients are using generative AI for their own healthcare, the hype and reality of AI drug discovery, and more.

1 month, 3 weeks ago @ microsoft.com
The AI Revolution in Medicine, Revisited: An Introduction

About two years ago, with Carey Goldberg and Zak Kohane, we wrote a book, The AI Revolution in Medicine.

If you’re a patient, in what ways could AI change your experience as you try to navigate a complex healthcare system?

A strange and bizarre thought, I admit, but a natural one, I think, for any human being that’s encountering this amazing AI technology for the first time.

And since then, of course, I’ve come to learn that many people have had similar experiences in their first encounters with AI.

And in fact, I’ve come to think of this as, somewhat tongue in cheek, the nine stages of AI grief.

2 months, 1 week ago @ microsoft.com
Ideas: Quantum computing redefined with Chetan Nayak

CHETAN NAYAK: People sometimes say, well, quantum computers are just going to be like classical computers but faster.

This idea of quantum, because you’ve mentioned Albert Einstein, there’s quantum physics, quantum mechanics, now quantum computing.

Well, let me … NAYAK: And that’s quantum mechanics!

HUIZINGA: OK. NAYAK: You’re probably going to say, well, how does quantum computing fit into this, you know?

[LAUGHS] NAYAK: And, you know, there are people out there who said, you know, quantum computers are decades away; don’t worry about it.

2 months, 3 weeks ago @ microsoft.com
Ideas: Building AI for population-scale systems with Akshay Nambi

His work lies at the intersection of systems, AI, and machine learning with a focus on designing, deploying, and scaling AI systems to solve compelling real-world problems.

CHRIS STETKIEWICZ: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code.

NAMBI: That’s right.

This represents a major step towards building AI systems that are much more holistic personal tutors, which help student understanding and create a more engaging, effective learning experience.

Are there some things that could go wrong, even if we get the technology right?

3 months ago @ microsoft.com
Ideas: Bug hunting with Shan Lu

3 months, 3 weeks ago @ microsoft.com
Ideas: AI for materials discovery with Tian Xie and Ziheng Lu

And now you can use this loop to design materials really quickly.

XIE: So you can really think about MatterSim and MatterGen accelerating different parts of materials discovery process.

They are also both foundation AI models, meaning they can both be used for a broad range of materials design problems.

Really, really a lot.

Yeah, I really, really like the example that Ziheng mentioned about the educational purposes.

3 months, 4 weeks ago @ microsoft.com
Ideas: AI and democracy with Madeleine Daepp and Robert Osazuwa Ness

DAEPP: You know, we didn’t really think about the term fraud until we started prepping for this interview with you.

BADANES: Right, right.

One of the things that I get asked a lot is, why can’t we just build good AI to detect bad AI, right?

BADANES: So next time my kids are in a fight, I’m going to point them to Copilot and say, work with Copilot to mediate.

[LAUGHS] No, that’s really, really interesting.

4 months, 3 weeks ago @ microsoft.com
NLP Highlights
latest post: none
Data Skeptic
latest post 1 week ago
Unveiling Graph Datasets
1 week ago @ dataskeptic.com
Network Manipulation

In this episode we talk with Manita Pote, a PhD student at Indiana University Bloomington, specializing in online trust and safety, with a focus on detecting coordinated manipulation campaigns on social media. Key insights include how coordinated reply attacks target influential figures like journalists and politicians, how machine learning models can detect these inauthentic campaigns using structural and behavioral features, and how deletion patterns reveal efforts to evade moderation or manipulate engagement metrics. Follow our guest: X/Twitter, Google Scholar. Papers in focus: Coordinated Reply Attacks in Influence Operations: Characterization and Detection (2025); Manipulating Twitter throug…

2 weeks, 2 days ago @ dataskeptic.com
The Small World Hypothesis

Kyle discusses the history and proof for the small world hypothesis.
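
If you want to poke at the small-world effect yourself, here is a minimal sketch using the networkx library (the graph size and rewiring probabilities are illustrative choices, not values from the episode): a Watts-Strogatz graph shows how a little random rewiring collapses average path length while clustering stays high.

```python
# Minimal small-world demo with networkx (illustrative parameters, not from the episode).
import networkx as nx

n, k = 1000, 10  # 1000 nodes, each wired to its 10 nearest ring neighbours

for p in [0.0, 0.01, 0.1]:  # rewiring probability
    G = nx.watts_strogatz_graph(n, k, p, seed=42)
    # A small amount of rewiring (p ~ 0.01) already shortens average paths
    # dramatically while clustering stays close to the regular lattice.
    print(
        f"p={p:<5} avg shortest path={nx.average_shortest_path_length(G):.2f} "
        f"clustering={nx.average_clustering(G):.3f}"
    )
```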

3 weeks, 4 days ago @ dataskeptic.com
Thinking in Networks

Kyle asks Asaf questions about the new network science course he is now teaching. The conversation delves into topics such as contact tracing, tools for analyzing networks, example use cases, and the importance of thinking in networks.

1 month ago @ dataskeptic.com
Fraud Networks

In this episode we talk with Bavo DC Campo, a data scientist and statistician, who shares his expertise on the intersection of actuarial science, fraud detection, and social network analytics. Together we will learn how to use graphs to fight against insurance fraud by uncovering hidden connections between fraudulent claims and bad actors. Key insights include how social network analytics can detect fraud rings by mapping relationships between policyholders, claims, and service providers, and how the BiRank algorithm, inspired by Google’s PageRank, helps rank suspicious claims based on network structure. Bavo will also present his iFraud simulator that can be used to model fraudulent networ…
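
As a rough illustration of the kind of ranking BiRank performs, here is a toy sketch of PageRank-style suspicion propagation on a bipartite claims-versus-parties graph. The adjacency matrix, damping factor, and normalization below are simplified assumptions for illustration, not the exact BiRank formulation or data from the episode.

```python
# Toy PageRank-style suspicion propagation on a bipartite claims/parties graph.
# Simplified illustration only: adjacency, damping, and normalization are made up.
import numpy as np

# Rows = claims, columns = parties (policyholders, garages, brokers, ...).
W = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
], dtype=float)

# Symmetric degree normalization keeps the iteration stable.
Dc = np.diag(1.0 / np.sqrt(W.sum(axis=1)))   # claim degrees
Dp = np.diag(1.0 / np.sqrt(W.sum(axis=0)))   # party degrees
S = Dc @ W @ Dp

claims0 = np.array([1.0, 0.0, 0.0, 0.0])     # claim 0 is a confirmed fraud case
claims = claims0.copy()
alpha = 0.85                                  # how far suspicion spreads

for _ in range(50):
    parties = S.T @ claims                                   # claims -> parties
    claims = alpha * (S @ parties) + (1 - alpha) * claims0   # parties -> claims, plus prior

print("party suspicion:", np.round(parties, 3))
print("claim suspicion:", np.round(claims, 3))
```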

1 month, 2 weeks ago @ dataskeptic.com
Criminal Networks

In this episode we talk with Justin Wang Ngai Yeung, a PhD candidate at the Network Science Institute at Northeastern University in London, who explores how network science helps uncover criminal networks. Justin is also a member of the organizing committee of the satellite conference dealing with criminal networks at the network science conference in The Netherlands in June 2025. Listeners will learn how graph-based models assist law enforcement in analyzing missing data, identifying key figures in criminal organizations, and improving intervention strategies. Key insights include the challenges of incomplete and inaccurate data in criminal network analysis, how law enforcement agencies us…

1 month, 4 weeks ago @ dataskeptic.com
Graph Bugs

In this episode, today’s guest, Celine Wüst, a master’s student at ETH Zurich specializing in secure and reliable systems, shares her work on automated software testing for graph databases. Celine shows how fuzzing—the process of automatically generating complex queries—helps uncover hidden bugs in graph database management systems like Neo4j, FalconDB, and Apache AGE. Key insights include how state-aware query generation can detect critical issues like buffer overflows and crashes, the challenges of debugging complex database behaviors, and the importance of security-focused software testing. We'll also find out which Graph DB company offers swag for finding bugs in its software and get C…
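
To make the fuzzing idea concrete, here is a toy sketch covering only the generation half: it emits random Cypher-flavoured queries of the sort a fuzzer would throw at a graph database. The labels, properties, and query shapes are invented for illustration; a real state-aware fuzzer like the one discussed also tracks database state, sends the queries to the target system, and watches for crashes or wrong results.

```python
# Toy illustration of the generation half of graph-database fuzzing: emit random,
# syntactically plausible Cypher-flavoured queries (labels/properties invented).
import random

LABELS = ["Person", "Movie", "Account"]
PROPS = ["name", "age", "balance"]
OPS = ["=", "<>", ">", "<"]

def random_query(rng: random.Random) -> str:
    parts = [f"MATCH (n:{rng.choice(LABELS)})"]
    if rng.random() < 0.7:
        parts.append(f"WHERE n.{rng.choice(PROPS)} {rng.choice(OPS)} {rng.randint(-10**6, 10**6)}")
    if rng.random() < 0.3:
        # Variable-length paths tend to stress the query planner.
        parts.append(f"MATCH (n)-[*1..{rng.randint(1, 8)}]->(m:{rng.choice(LABELS)})")
    parts.append(f"RETURN n.{rng.choice(PROPS)} LIMIT {rng.randint(0, 100)}")
    return " ".join(parts)

rng = random.Random(0)
for _ in range(5):
    print(random_query(rng))  # in a real fuzzer these would be sent to the DBMS
```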

2 months ago @ dataskeptic.com
Organizational Network Analysis

In this episode, Gabriel Petrescu, an organizational network analyst, discusses how network science can provide deep insights into organizational structures using OrgXO, a tool that maps companies as networks rather than rigid hierarchies. Listeners will learn how analyzing workplace collaboration networks can reveal hidden influencers, organizational bottlenecks, and engagement levels, offering a data-driven approach to improving effectiveness and resilience. Key insights include how companies can identify overburdened employees, address silos between departments, and detect vulnerabilities where too few individuals hold critical knowledge. Real-life applications range from mergers and acq…

2 months, 1 week ago @ dataskeptic.com
Organizational Networks

Is it better to have your work team fully connected or sparsely connected? In this episode we'll try to answer this question and more with our guest Hiroki Sayama, a SUNY Distinguished Professor and director of the Center for Complex Systems at Binghamton University. Hiroki delves into the applications of network science in organizational structures and innovation dynamics by showing his recent work on extracting network structures from organizational charts to enable insights into decision-making and performance. He'll also cover how network connectivity impacts team creativity and innovation. Key insights include how the structure of organizational networks—such as the depth of hierarchy …

2 months, 2 weeks ago @ dataskeptic.com
Networks of the Mind

A man goes into a bar… This is the beginning of a riddle that our guest, Yoed Kennet, an assistant professor at the Technion's Faculty of Data and Decision Sciences, uses to measure creativity in subjects. In our talk, Yoed speaks about how to combine cognitive science and network science to explore the complexities and decode the mysteries of the human mind. The listeners will learn how network science provides tools to map and analyze human memory, revealing how problem-solving and creativity emerge from changes in semantic memory structures. Key insights include the role of memory restructuring during moments of insight, the connection between semantic networks and creative thinking, and…

2 months, 3 weeks ago @ dataskeptic.com
LLMs and Graphs Synergy

In this episode, Garima Agrawal, a senior researcher and AI consultant, brings her years of experience in data science and artificial intelligence. Listeners will learn about the evolving role of knowledge graphs in augmenting large language models (LLMs) for domain-specific tasks and how these tools can mitigate issues like hallucination in AI systems. Key insights include how LLMs can leverage knowledge graphs to improve accuracy by integrating domain expertise, reducing hallucinations, and enabling better reasoning. Real-life applications discussed range from enhancing customer support systems with efficient FAQ retrieval to creating smarter AI-driven decision-making pipelines. Garima’s …

3 months ago @ dataskeptic.com
A Network of Networks

In this episode, Bnaya Gross, a Fulbright postdoctoral fellow at the Center for Complex Network Research at Northwestern University, explores the transformative applications of network science in fields ranging from infrastructure to medicine, by studying the interactions between networks ("a network of networks"). Listeners will learn how interdependent networks provide a framework for understanding cascading failures, such as power outages, and how these insights transfer to physical systems like superconducting materials and biological networks. Key takeaways include understanding how dependencies between networks can amplify vulnerabilities, applying these principles to create resilient…

3 months, 1 week ago @ dataskeptic.com
Auditing LLMs and Twitter

Our guests, Erwan Le Merrer and Gilles Tredan, are long-time collaborators in graph theory and distributed systems. They share their expertise on applying graph-based approaches to understanding both large language model (LLM) hallucinations and shadow banning on social media platforms. In this episode, listeners will learn how graph structures and metrics can reveal patterns in algorithmic behavior and platform moderation practices. Key insights include the use of graph theory to evaluate LLM outputs, uncovering patterns in hallucinated graphs that might hint at the underlying structure and training data of the models, and applying epidemic models to analyze the uneven spread of shadow ban…

3 months, 2 weeks ago @ dataskeptic.com
Fraud Detection with Graphs

In this episode, Šimon Mandlík, a PhD candidate at the Czech Technical University will talk with us about leveraging machine learning and graph-based techniques for cybersecurity applications. We'll learn how graphs are used to detect malicious activity in networks, such as identifying harmful domains and executable files by analyzing their relationships within vast datasets. This will include the use of hierarchical multi-instance learning (HML) to represent JSON-based network activity as graphs and the advantages of analyzing connections between entities (like clients, domains etc.). Our guest shows that while other graph methods (such as GNN or Label Propagation) lack in scalability or h…

3 months, 3 weeks ago @ dataskeptic.com
Optimizing Supply Chains with GNN

Thibaut Vidal, a professor at Polytechnique Montreal, specializes in leveraging advanced algorithms and machine learning to optimize supply chain operations. In this episode, listeners will learn how graph-based approaches can transform supply chains by enabling more efficient routing, districting, and decision-making in complex logistical networks. Key insights include the application of Graph Neural Networks to predict delivery costs, with potential to improve districting strategies for companies like UPS or Amazon and overcoming limitations of traditional heuristic methods. Thibaut’s work underscores the potential for GNN to reduce costs, enhance operational efficiency, and provide bette…

4 months ago @ dataskeptic.com
SuperDataScience
latest post 2 days, 19 hours ago
887: Multi-Agent Teams, Quantum Computing and the Future of Work, with Dell’s Global CTO John Roese

Jon Krohn speaks to John Roese about the promise of multi-agent teams for business, the benefits of agentic AI systems that can identify and complete tasks independently, and how these systems demand new authentication, authorization, security and knowledge-sharing standards. They also discuss how to use AI to refine project ideas down to a core business need, as well as the new and emerging careers in the tech industry and beyond, all thanks to AI. Additional materials: ⁠⁠www.superdatascience.com/887 This episode is brought to you by ⁠Adverity, the conversational analytics platform⁠ and by the ⁠Dell AI Factory with NVIDIA⁠. Interested in sponsoring a SuperDataScience Podcast episode? Email…

2 days, 19 hours ago @ podtrac.com
886: In Case You Missed it In April 2025

Our In Case You Missed It episode for April has clips on NVIDIA’s and Dell’s product and service offers including an overview of NVIDIA’s GPUs, AI Enterprise, and its microservices. You’ll also hear about AWS’ focus on bringing choice to customers and the incredible power of its Graviton CPU, how Zerve opens access to AI deployment, Merck KGaA, Darmstadt, Germany’s multi-chip integration, and why reliance on the cloud might soon become a practice of times past. Additional materials: ⁠www.superdatascience.com/886⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

6 days, 19 hours ago @ podtrac.com
885: Python Polars: The Definitive Guide, with Jeroen Janssens and Thijs Nieuwdorp

Jeroen Janssens and Thijs Nieuwdorp are data frame library Polars’ greatest advocates in this episode with Jon Krohn, where they discuss their book, Python Polars: The Definitive Guide, best practice for using Polars, why Pandas users are switching to Polars for data frame operations in Python, and how the library can cut memory usage and compute time by up to 10x compared with Pandas. Listen to the episode to be a part of an O’Reilly giveaway! Additional materials: www.superdatascience.com/885 This episode is brought to you by Trainium2, the latest AI chip from AWS, by Adverity, the conversational analytics platform and by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScie…
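
For a flavour of why people are switching, here is a minimal sketch of Polars' lazy, expression-based API on a hypothetical sales.csv (the file name and column names are invented, and a recent Polars version is assumed): nothing is read or computed until collect(), which lets the engine optimise the whole query and is a large part of the speed and memory story.

```python
# Minimal Polars lazy-query sketch; "sales.csv" and its columns are hypothetical.
import polars as pl

result = (
    pl.scan_csv("sales.csv")              # lazy: nothing is read yet
      .filter(pl.col("amount") > 0)       # predicate pushdown happens at plan time
      .group_by("region")
      .agg(
          pl.col("amount").sum().alias("total_amount"),
          pl.len().alias("n_orders"),
      )
      .sort("total_amount", descending=True)
      .collect()                          # the optimized plan runs here
)
print(result)
```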

1 week, 2 days ago @ podtrac.com
884: Model Context Protocol (MCP) and Why Everyone’s Talking About It

Model Context Protocol (MCP) is Anthropic’s hottest tool, with over 1,000 community-built MCP servers in operation by February alone. In this Five-Minute Friday, Jon Krohn explains what took so long for users to catch on: Anthropic released MCP in November 2024. Hear more about the buzz behind MCP, its applications, and how easy it is to get started. Additional materials: ⁠www.superdatascience.com/884⁠ Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 week, 6 days ago @ podtrac.com
883: Blackwell GPUs Are Now Available at Your Desk, with Sama Bali and Logan Lawler

Returning after the “Super Bowl of AI”, NVIDIA GTC, Sama Bali and Logan Lawler talk to Jon Krohn about their respective work at tech giants NVIDIA and Dell. Sama and Logan discuss everything from the next-gen Blackwell GPUs to their collaboration with Dell in launching Pro-Max PCs specially designed to take on heavy computational workloads, as well as the incredible performance of GB 10 and GB 300 workstations and the widening accessibility of AI developer tools and models. Additional materials: www.superdatascience.com/883 This episode is brought to you by ODSC, the Open Data Science Conference, by Adverity, the conversational analytics platform and by the Dell AI Factory with NVIDIA. Interested in sponso…

2 weeks, 2 days ago @ podtrac.com
882: 40x Hotter Than the Sun: The ASML Machines That Make AI Chips

This week’s five-minute Friday heads to the Netherlands to find out more about Dutch company ASML, the brains behind the lithography machines that build AI chips. Jon Krohn walks through how ASML came to dominate the market, where they’re headed next, and how ASML’s complex machines shape AI chips as well as the very future of AI. Additional materials: www.superdatascience.com/882 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

2 weeks, 6 days ago @ podtrac.com
881: Beyond GPUs: The Power of Custom AI Accelerators, with Emily Webber

Emily Webber speaks to Jon Krohn about her work at Amazon Web Services, from its Annapurna Labs-developed Nitro System, a foundational technology that can enhance security and performance in the cloud, to how Trainium2 became AWS’ most powerful AI chip with four times the compute of Trainium. Hear the specs of AWS’s chips and when to use them. Additional materials: www.superdatascience.com/881 This episode is brought to you by ODSC, the Open Data Science Conference. Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (08:36) Emily’s work on AWS’ SageMaker and Trainium (23:54) How AWS N…

3 weeks, 2 days ago @ podtrac.com
880: Manus, DeepSeek and China’s AI Boom

First developed in China, Manus AI and DeepSeek have made great waves on an international scale. Sought-after for their cost-effectiveness compared to US-made tech, Manus AI and DeepSeek are quickly becoming dominant technologies inside the country. In this five-minute Friday, Jon Krohn asks: Do these technologies warrant the huge amount of resources spent on them by multiple industries in China, and what makes hype become a mainstay? Additional materials: www.superdatascience.com/880 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

3 weeks, 6 days ago @ podtrac.com
879: Serverless, Parallel, and AI-Assisted: The Future of Data Science is Here, with Zerve’s Dr. Greg Michaelson

Greg Michaelson speaks to Jon Krohn about the latest developments at Zerve, an operating system for developing and delivering data and AI products, including a revolutionary feature allowing users to run multiple parts of a program’s code at once and without extra costs. You’ll also hear why LLMs might spell trouble for SaaS companies, Greg’s ‘good-cop, bad-cop’ routine that improves LLM responses, and how RAG (retrieval-augmented generation) can be deployed to create even more powerful AI applications. Additional materials: www.superdatascience.com/879 This episode is brought to you by Trainium2, the latest AI chip from AWS and by the Dell AI Factory with NVIDIA. Interested in sponsoring a…

1 month ago @ podtrac.com
878: In Case You Missed It in March 2025

AI stacks, AGI, training neural networks, and AI authenticity: Jon Krohn rounds up his interviews from March with this episode of “In Case You Missed It”. In his favorite clips from the month, he speaks to Andriy Burkov (Episode 867), Natalie Monbiot (Episode 873), Richmond Alake (Episode 871) and Varun Godbole (Episode 869). Additional materials: www.superdatascience.com/878 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month ago @ podtrac.com
877: The Neural Processing Units Bringing AI to PCs, with Shirish Gupta

NPUs, AIPC, and Dell’s growing suite of AI products: Shirish Gupta speaks to Jon Krohn about neural processing units and what makes them a go-to tool for AI inference workloads, reasons to move your workloads from the cloud and to your local devices, what the mnemonic AIPC stands for and why it will soon be on everyone’s lips, and he offers a special intro to Dell’s new Pro-AI Studio Toolkit. Hear about several real-world AIPC applications run by Dell’s clients, from detecting manufacturing defects to improving efficiencies for first responders, massively supporting actual life-or-death situations. Additional materials: www.superdatascience.com/877 This episode is brought to you by ODSC, th…

1 month, 1 week ago @ podtrac.com
876: Hugging Face’s smolagents: Agentic AI in Python Made Easy

Small, simple, accessible: Hugging Face makes a huge contribution to the agentic AI wave with its smolagents. Jon Krohn explores how this small-but-mighty new Python library can act as the best personal assistant you never had. Hear about its features and use cases in this five-minute Friday. Additional materials: www.superdatascience.com/876 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.
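
Getting started really is only a few lines. Below is a minimal sketch following the library's documented quickstart; class names such as CodeAgent, HfApiModel, and DuckDuckGoSearchTool reflect the quickstart at the time of writing and may differ between smolagents versions, so treat this as an approximation rather than a definitive recipe.

```python
# Minimal smolagents sketch (API names follow the documented quickstart and may
# vary between versions; requires `pip install smolagents` and a Hugging Face
# token for the hosted inference model).
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # let the agent search the web
    model=HfApiModel(),              # defaults to a hosted open-weights model
)

# The agent writes and executes small Python snippets to work toward the answer.
print(agent.run("How many seconds are there in a leap year?"))
```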

1 month, 1 week ago @ podtrac.com
875: How Semiconductors Are Made (And Fuel the AI Boom), with Kai Beckmann

Why are semiconductors so essential in this digital age, and how are they made? Jon Krohn speaks to electronics CEO Kai Beckmann about Merck KGaA, Darmstadt, Germany’s intricate manufacturing process, how we can use AI to develop materials that power next-gen AI technologies, and how a chip with the processing power of the human brain might one day be able to run on the power of a low-watt light bulb. Additional materials: www.superdatascience.com/875 This episode is brought to you by the Dell AI Factory with NVIDIA. Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information. In this episode you will learn: (06:26) How Merck K…

1 month, 2 weeks ago @ podtrac.com
874: How AI is Transforming Baseball (with Lessons For All of Us)

In this Five-Minute Friday, Jon Krohn talks baseball. For decades, coaches have relied on player performance stats to make in-game decisions and refine their season strategies. Now, AI led by Statcast is taking baseball strategy even further, massively broadening analytics data to include pitch, swing and catch trajectories, spin rates, biomechanical information, player matchups, and how to enhance player performances. Listen to the episode to find out what other industries can learn from the “data-friendly” sport of baseball. Additional materials: www.superdatascience.com/874 Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship inf…

1 month, 2 weeks ago @ podtrac.com
873: Become Your Best Self Through AI Augmentation — feat. Natalie Monbiot

Natalie Monbiot is an independent advisor and collaborator for projects that concern the “virtual human”, and she is “going all in on the virtual human economy”. Jon Krohn speaks to Natalie about these new ventures, how to mitigate the divide between AI users and nonusers, and how anyone can collaborate with AI without compromising their own creativity. Additional materials: www.superdatascience.com/873 This episode is brought to you by the Dell AI Factory with NVIDIA, by Trainium2, the latest AI chip from AWS and by ODSC, the Open Data Science Conference. Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

1 month, 3 weeks ago @ podtrac.com
Data Science at Home
latest post 1 day, 22 hours ago
DSH/Warcoded Kill Chains and Algorithmic Warfare – Autonomy in Targeting and Engagement (Ep. 282)

In this gripping follow-up, we dive into how AI is transforming kinetic operations—from identifying a threat to executing a strike.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com. Warcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Whether it’s in the sky, on the ground, or in orbit—if it’s intelligent and mobile, Intrepid helps you build it.

1 day, 22 hours ago @ datascienceathome.com
DSH/Warcoded: Eyes and Ears of the Machine – AI Reconnaissance and Surveillance (Ep. 281)

Welcome to DSH/Warcoded. We explore how AI is transforming ISR (Intelligence, Surveillance, Reconnaissance)—from satellite imagery to drone feeds.

At the intersection of ethics and engineering, Amethix creates AI systems that don’t just function—they adapt, learn, and serve.

Discover more at amethix.com. Warcoded is brought to you by Intrepid AI.

From drones to satellites, Intrepid AI gives engineers and defense innovators the tools to prototype, simulate, and deploy autonomous systems with confidence.

Learn more at intrepid.ai. #AI #defensetech #ISR #LLM #Warcoded #DataScienceAtHome #OSINT #SIGINT #dronewarfare

1 week, 1 day ago @ datascienceathome.com
AI Agents with Atomic Agents 🚀 with Kenny Vaneetvelde (Ep. 280)

🎙️ In this episode of Data Science at Home, we sit down with Kenny Vaneetvelde, the mastermind behind Atomic Agents, a groundbreaking framework redefining AI development.

🔍 Discover how atomicity simplifies complex AI systems, why modularity matters more than ever, and how Atomic Agents is eliminating hidden assumptions and redundant complexity in AI workflows.

💡 From real-world applications to the tech stack behind the framework, Kenny takes us on a deep dive into this lightweight, powerful tool for creating consistent and brand-aligned AI.

📌 Timestamps: 0:00 – Intro, 2:30 – Kenny’s journey in AI, 5:00 – What are Atomic Agents?

10:45 – Why atomicity matters in AI, 18:20 – The tech behind Atomic A…

1 month ago @ datascienceathome.com
Run massive models on crappy machines (Ep. 279)

This episode explores how to break down barriers by running massive AI models on “crappy machines”—affordable, low-spec devices.

🐦 Twitter: @DataScienceAtHome 📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/ Instagram: https://www.instagram.com/datascienceathome/ Facebook: https://www.facebook.com/datascienceAH LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast Discord Channel: https://discord.gg/4UNKGf3 NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, interviews, and discussions.

Send us mail …

1 month, 1 week ago @ datascienceathome.com
WeightWatcher: The AI Detective for LLMs (DeepSeek & OpenAI included) (Ep. 278)

Enter WeightWatcher—the AI detective tool that peeks inside neural networks without needing their data.

🐦 Twitter: @DataScienceAtHome 📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/ Instagram: https://www.instagram.com/datascienceathome/ Facebook: https://www.facebook.com/datascienceAH LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast Discord Channel: https://discord.gg/4UNKGf3 NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data professional, tech enthusiast, or just curious about the field, our podcast delivers insights, interviews, and discussions.

Send us mail at: hello@datascienceathom…

1 month, 2 weeks ago @ datascienceathome.com
Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything (Ep. 278)

From the viral article “Tech’s Dumbest Mistake: Why Firing Programmers for AI Will Destroy Everything” on my newsletter at https://defragzone.substack.com/p/techs-dumbest-mistake-why-firing, here are my thoughts about AI replacing programmers… ✨ Connect with us!

🐦 Twitter: @DataScienceAtHome 📘 LinkedIn: https://www.linkedin.com/in/fragadaleta/ Instagram: https://www.instagram.com/datascienceathome/ Facebook: https://www.facebook.com/datascienceAH LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast Discord Channel: https://discord.gg/4UNKGf3 NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

Whether you’re a data profes…

1 month, 2 weeks ago @ datascienceathome.com
Scaling Smart: AI, Data, and Building Future-Ready Enterprises with Josh Miramant (Ep. 276)

In this episode, we dive into the transformative world of AI, data analytics, and cloud infrastructure with Josh Miramant, CEO of Blue Orange Digital.

As a seasoned entrepreneur with over $25 million raised across ventures and two successful exits, Josh shares invaluable insights on scaling data-driven businesses, integrating machine learning frameworks, and navigating the rapidly evolving landscape of cloud data architecture.

From generative AI to large language models, Josh explores cutting-edge trends shaping financial services, real estate, and consumer goods.

Tune in for a masterclass in leveraging data for impact and innovation!

Links: https://blueorange.digital/ https://blueorange.digit…

4 months, 3 weeks ago @ datascienceathome.com
Autonomous Weapons and AI Warfare (Ep. 275)

AI is revolutionizing the military with autonomous drones, surveillance tech, and decision-making systems.

In this episode of Data Science at Home, we expose the cutting-edge tech reshaping defense—and the chilling ethical questions that follow.

🐦 Twitter: @DataScienceAtHome 📘 LinkedIn: Francesco Gad 📷 Instagram: https://www.instagram.com/datascienceathome/ 📘 Facebook: https://www.facebook.com/datascienceAH 💼 LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast 💬 Discord Channel: https://discord.gg/4UNKGf3 NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and machine learning.

S…

5 months ago @ datascienceathome.com
8 Proven Strategies to Scale Your AI Systems Like OpenAI! 🚀 (Ep. 274)

In this episode of Data Science at Home, we’re diving deep into the powerful strategies that top AI companies, like OpenAI, use to scale their systems to handle millions of requests every minute!

From stateless services and caching to the secrets of async processing, discover 8 essential strategies to make your AI and machine learning systems unstoppable.
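
To make two of those strategies concrete, here is a small self-contained sketch of response caching plus async micro-batching, the kind of pattern the stateless-plus-cache-plus-async advice points at. It uses plain asyncio with a stubbed model call; no particular framework or provider is implied.

```python
# Toy sketch of two scaling strategies from the episode: caching identical
# requests and micro-batching concurrent ones. Plain asyncio; the "model" is a stub.
import asyncio

CACHE: dict[str, str] = {}          # stateless workers could share this via Redis etc.
queue: asyncio.Queue = asyncio.Queue()

async def fake_model(batch: list[str]) -> list[str]:
    await asyncio.sleep(0.05)       # stand-in for one batched GPU forward pass
    return [f"answer to: {p}" for p in batch]

async def batcher() -> None:
    while True:
        prompt, fut = await queue.get()
        batch = [(prompt, fut)]
        while not queue.empty() and len(batch) < 8:   # micro-batch up to 8 requests
            batch.append(queue.get_nowait())
        outputs = await fake_model([p for p, _ in batch])
        for (p, f), out in zip(batch, outputs):
            CACHE[p] = out
            f.set_result(out)

async def handle(prompt: str) -> str:
    if prompt in CACHE:             # strategy 1: serve repeats straight from cache
        return CACHE[prompt]
    fut = asyncio.get_running_loop().create_future()
    await queue.put((prompt, fut))  # strategy 2: async hand-off to the batcher
    return await fut

async def main() -> None:
    asyncio.create_task(batcher())
    prompts = ["hi", "hello", "hi", "what is polars?"]
    print(await asyncio.gather(*(handle(p) for p in prompts)))

asyncio.run(main())
```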

Instagram: https://www.instagram.com/datascienceathome/ Twitter: @datascienceathome Facebook: https://www.facebook.com/datascienceAH LinkedIn: https://www.linkedin.com/company/data-science-at-home-podcast Discord Channel: https://discord.gg/4UNKGf3 NEW TO DATA SCIENCE AT HOME?

Data Science at Home explores the latest in AI, data science, and ma…

5 months ago @ datascienceathome.com
Humans vs. Bots: Are You Talking to a Machine Right Now? (Ep. 273)

Together, they explore the growing importance of distinguishing human-written from AI-generated text, discussing real-world examples from social media to news.

How reliable are current detection tools like DetectGPT?

What are the ethical and technical challenges ahead as AI continues to advance?

And is the balance between innovation and regulation tipping in the right direction?

Tune in for insights on the future of AI text detection and the broader implications for media, academia, and policy.

5 months, 3 weeks ago @ datascienceathome.com
AI bubble, Sam Altman’s Manifesto and other fairy tales for billionaires (Ep. 272)

Welcome to Data Science at Home, where we don’t just drink the AI Kool-Aid.

Today, we’re dissecting Sam Altman’s “AI manifesto”—a magical journey where, apparently, AI will fix everything from climate change to your grandma’s back pain.

In this episode, I’ll break down the bold (and often bizarre) claims in Altman’s grand speech for the Intelligence Age.

I’ll give you the real scoop on what’s realistic, what’s nonsense, and why some tech billionaires just can’t resist overselling.

Chapters: 00:00 – Intro, 00:18 – CEO of Baidu Statement on AI Bubble, 03:47 – News On Sam Altman OpenAI, 06:43 – Online Manifesto “The Intelligence Age”, 13:14 – Deep Learning, 16:26 – AI gets Better With Scale, 17:45 – Conclu…

5 months, 3 weeks ago @ datascienceathome.com
AI vs. The Planet: The Energy Crisis Behind the Chatbot Boom (Ep. 271)

In this episode of Data Science at Home, we dive into the hidden costs of AI’s rapid growth — specifically, its massive energy consumption.

With tools like ChatGPT reaching 200 million weekly active users, the environmental impact of AI is becoming impossible to ignore.

Each query, every training session, and every breakthrough come with a price in kilowatt-hours, raising questions about AI’s sustainability.

Join us as we uncover the staggering figures behind AI’s energy demands and explore practical solutions for the future.

From efficiency-focused algorithms and specialized hardware to decentralized learning, this episode examines how we can balance AI’s advancements with our planet’s …

6 months ago @ datascienceathome.com
Love, Loss, and Algorithms: The Dangerous Realism of AI (Ep. 270)

Subscribe to our new channel https://www.youtube.com/@DataScienceatHome. In this episode of Data Science at Home, we confront a tragic story highlighting the ethical and emotional complexities of AI technology.

This devastating event has sparked urgent discussions on the mental health risks, ethical responsibilities, and potential regulations surrounding AI chatbots, especially as they become increasingly lifelike.

🎙️ Topics Covered: AI & Emotional Attachment: How hyper-realistic AI chatbots can foster intense emotional bonds with users, especially vulnerable groups like adolescents.

Mental Health Risks: The potential for AI to unintentionally contribute to mental health issues, and the challe…

6 months, 1 week ago @ datascienceathome.com
VC Advice Exposed: When Investors Don’t Know What They Want (Ep. 269)

Ever feel like VC advice is all over the place?

That’s because it is.

In this episode, I expose the madness behind the money and how to navigate their confusing advice!

Watch the video at https://youtu.be/IBrPFyRMG1Q. Subscribe to our new Youtube channel https://www.youtube.com/@DataScienceatHome. 00:00 – Introduction, 00:16 – The Wild World of VC Advice, 02:01 – Grow Fast vs. Grow Slow, 05:00 – Listen to Customers or Innovate Ahead, 09:51 – Raise Big or Stay Lean?

14:20 – The Real VC Secret: Focus on Your Team and Vision, 17:03 – Outro

6 months, 2 weeks ago @ datascienceathome.com
AI Says It Can Compress Better Than FLAC?! Hold My Entropy 🍿 (Ep. 268)

In this episode of Data Science at Home, Frag dives deep into the wild claims that Large Language Models (LLMs) like Chinchilla 70B are beating traditional lossless compression algorithms.

🧠💥 But before you toss out your FLAC collection, let’s break down Shannon’s Source Coding Theorem and why entropy sets the ultimate limit on lossless compression.

We explore: ⚙️ How LLMs leverage probabilistic patterns for compression 📉 Why compression efficiency doesn’t equal general intelligence 🚀 The practical (and ridiculous) challenges of using AI for compression 💡 Can AI actually BREAK Shannon’s limit—or is it just an illusion?
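
As a back-of-the-envelope companion to the Shannon argument, here is a small standard-library sketch (the synthetic source and its symbol weights are invented for illustration) that measures the entropy of an i.i.d. byte stream and compares it with what zlib achieves on the same data; any lossless scheme, LLM-based or not, must average at or above that entropy.

```python
# Back-of-the-envelope check of Shannon's source-coding bound: for a memoryless
# source, no lossless code can average fewer bits per symbol than the entropy H.
import math
import random
import zlib
from collections import Counter

rng = random.Random(0)
# An i.i.d. byte source with a skewed distribution over 4 symbols (illustrative).
symbols, weights = b"ABCD", [0.7, 0.15, 0.1, 0.05]
data = bytes(rng.choices(symbols, weights=weights, k=200_000))

counts = Counter(data)
n = len(data)
H = -sum((c / n) * math.log2(c / n) for c in counts.values())  # bits per byte

compressed_bits_per_byte = 8 * len(zlib.compress(data, level=9)) / n
print(f"entropy H     : {H:.3f} bits/byte")
print(f"zlib achieves : {compressed_bits_per_byte:.3f} bits/byte")
# zlib lands a bit above H; nothing lossless can land below it on average.
```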

If you love AI, algorithms, or just enjoy some good old myth-busting, thi…

6 months, 3 weeks ago @ datascienceathome.com