Very ML
State-of-the-art Machine Learning News Feed
/r/MachineLearning
last post 1 hour ago
[D] I need help thinking about what numbers go into DCGAN Generator models, in order to produce larger images

1 hour ago @ reddit.com
[D] CLAHE Explanation by DeepEigen | Instructor: Sanjeev Sharma | Swaayatt Robots

2 hours ago @ reddit.com
[News] Perception Through Structured Generative Models @ ECCV 2020

2 hours ago @ reddit.com
The Dreamup: A codeless alternative to the Web [Project]

3 hours ago @ reddit.com
[P] Tailoring a process/data for machine learning

4 hours ago @ reddit.com
[D] Recommendations needed for an eye-tracking model for infants (< 2 years)

4 hours ago @ reddit.com
[N] Do you want to accelerate the production of new optical devices with your research and advance the numerical tools for the simulation of photonic systems?

5 hours ago @ reddit.com
Big python dataset [P]

5 hours ago @ reddit.com
[D] Which PyTorch model-serving framework do you recommend?

5 hours ago @ reddit.com
[Discussion] Recommendation system to prescribe medication

My professor and I had the idea of using artificial intelligence/machine learning algorithms to build a recommendation system that suggests drugs to patients with a certain clinical condition.

The idea is a prescribing system that would recommend the best drug for a patient based on their characteristics (age, sex, comorbidities, etc.).

How: use the ML algorithm to learn from the data which drug performs best given the patients' characteristics.

I think I already understand the training part of the model, but I'm stuck on validation and testing.

Example: for ValidationPatient0001 the model recommended the drug A.

5 hours ago @ reddit.com
[D] What should be the Query Q, Key K, and Value V vectors/matrices in torch.nn.MultiheadAttention?

6 hours ago @ reddit.com
[Discussion] Analysis: Differences between Statistics and Machine Learning

6 hours ago @ reddit.com
[R] Shape Adaptor: A Learnable Resizing Module

6 hours ago @ reddit.com
[Discussion] How to encode an image to StyleGAN latent space in seconds?

7 hours ago @ reddit.com
[D] ECCV 2020 Workshop Registrations

7 hours ago @ reddit.com
Towards Data Science
last post 2 hours ago
No, The Data Never Says Anything

No, The Data Never Says Anything. Stop putting words in data's mouth. (Picture by Rebecca Barray on Flickr.) I'm bothered by people going "Clearly, the data says…" or "It is obvious from the data that…".

She states: INFERENCE = DATA + ASSUMPTIONS. It is our assumptions that speak, not the data.

The data scientists / data analysts might be experts in the statistical assumptions, but they might not know much about the causal assumptions.

"Clearly, the data says that the admissions process is sexist." Barbara also crunched the numbers, but she kept the original data as is.

In closing: If someone tries to browbeat others with data by saying "The data clearly says X!", try asking them what their causal assumpt…

2 hours ago @ towardsdatascience.com
Spark Streaming for Beginners

Spark Streaming for Beginners: an understanding of the concept of Spark Streaming and code demonstrations using the Java API, for beginners.

Spark Streaming is one of the most important parts of the Big Data ecosystem.

How do you initiate Spark Streaming?

When creating the JavaStreamingContext object, we need to specify a batch interval; Spark Streaming divides the incoming data into batches such that the final result is also generated in batches.

JavaPairDStream swap_again = sort.mapToPair(x -> x.swap()); (the input gets swapped again). This was an outline of how Spark Streaming works and a few examples of how we apply transformations to DStreams.

2 hours ago @ towardsdatascience.com
Analyzing & Visualizing Amazon Redshift Data — Tutorial

Analyzing & Visualizing Amazon Redshift Data — Tutorial: Learn to analyze & visualize Amazon Redshift data using Knowi. (Photo by Campaign Creators on Unsplash.) Introduction: Amazon Redshift is Amazon's cloud-based relational database management system (RDBMS).

If you’re interested in learning how to use Knowi to analyze data from Amazon Redshift, you’ve come to the right place.

Head down to “Data Warehouses” and click on Amazon Redshift.

As you can see, Facebook emails have a conversion rate of just over 1%, while Netflix emails have a conversion rate of less than 0.5%.

Drag your new “Email Campaign — Area Visualization” widget to the top of your dashboard, which will bring the ori…

2 hours ago @ towardsdatascience.com
Arrays 2.0: Linked List & Queue Data Structures

Moreover, circular linked lists are exceptionally easy to create — just link the tail node to the head node in a non-circular linked list.

Hence, standard and well-developed search procedures like binary search do not work on linked lists, although there are search methods designed especially for linked lists.

Note that in this case, we are creating a circular linked list (which can be repurposed as a queue).

A doubly linked list is a linked list with both a previous and a next pointer.

A circular linked list is a linked list in which there does not exist a head or a tail because elements are linked such that the list loops.
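The circular list described in this excerpt can be sketched in a few lines of Python. This is an illustrative sketch, not code from the article; the `Node`, `make_circular`, and `take` names are invented for the example. The only step beyond a plain singly linked list is pointing the tail back at the head.

```python
# Minimal singly linked list, made circular by linking tail -> head.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def make_circular(values):
    """Build a singly linked list, then link the tail back to the head."""
    head = Node(values[0])
    tail = head
    for v in values[1:]:
        tail.next = Node(v)
        tail = tail.next
    tail.next = head          # the one extra step that makes it circular
    return head

def take(head, n):
    """Walk n steps around the circle, collecting values (the walk never ends)."""
    out, node = [], head
    for _ in range(n):
        out.append(node.value)
        node = node.next
    return out

ring = make_circular([1, 2, 3])
print(take(ring, 7))          # [1, 2, 3, 1, 2, 3, 1]
```

Because the tail points back to the head, the same structure can be repurposed as a simple queue by tracking the tail and enqueuing after it.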

2 hours ago @ towardsdatascience.com
A pictorial guide to understanding Random Forest Algorithm

What this article is about: In this article, we will see how the Random Forest algorithm works internally.

Random Forest in one paragraph: Random Forest (RF) is a tree-based algorithm.

It is an ensemble of multiple random trees of different kinds.

Here, for the sake of simplicity and for the example, we are choosing 3 random features.

Inferencing: Now let's predict the values in an unseen data set (the test data set). For inferencing (more commonly referred to as predicting/scoring) the test data, the algorithm passes the record through each mini-tree.
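The pass-through-each-mini-tree step can be made concrete with a toy sketch. This is not the article's code: the three hand-written stumps below stand in for trained mini-trees, purely to show the aggregation (majority vote for classification).

```python
# Illustrative Random Forest inference: each record is passed through every
# mini-tree and the per-tree predictions are aggregated by majority vote.
from collections import Counter

def tree_a(record):  # a tiny hand-written "mini-tree" (stump on feature 0)
    return "yes" if record[0] > 5 else "no"

def tree_b(record):  # stump on feature 1
    return "yes" if record[1] > 2 else "no"

def tree_c(record):  # stump on feature 2
    return "yes" if record[2] > 0 else "no"

forest = [tree_a, tree_b, tree_c]

def predict(forest, record):
    """Majority vote over the predictions of all mini-trees."""
    votes = Counter(tree(record) for tree in forest)
    return votes.most_common(1)[0][0]

print(predict(forest, [7, 3, -1]))   # two "yes" votes vs one "no" -> "yes"
```

For regression, the same loop would average the per-tree outputs instead of voting.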

2 hours ago @ towardsdatascience.com
Accurately Labeling Subjective Question-Answer Content Using BERT

Accurately Labeling Subjective Question-Answer Content Using BERT: an NLP tutorial on a 6th-place solution for the Kaggle Q&A Understanding competition. Introduction: Kaggle released the Q&A understanding competition at the beginning of 2020.

If the mirror BERT model can share the weights of the first BERT model, we call it “siamese” structure.

We experimented with both siamese and double structures and chose the best N base models according to cross-validation scores.

They are Siamese Roberta base, Siamese XLNet base, Double Albert base V2, Siamese BERT base uncased.

In the prediction stage, we input the test data to all out-of-fold base models to get the predictions.

4 hours ago @ towardsdatascience.com
Build Your First Shiny Web App in R

Shiny is an R package that allows you to easily build interactive web applications.

The benefit of using Shiny is that it makes it possible to extend your R code to the web, which essentially helps expand its usability to a wider community (i.e.

Digital Ocean) or via app hosting services such as Shinyapps.io and Heroku.

The benefit of using an app hosting service is that you don’t have to worry about managing a server.

These app hosting service providers allow you to focus your energy on building the application, period!

4 hours ago @ towardsdatascience.com
PP-YOLO Surpasses YOLOv4 — Object Detection Advances

PP-YOLO Surpasses YOLOv4 — Object Detection Advances: Baidu publishes PP-YOLO and pushes the state of the art in object detection research by building on top of YOLOv3, the PaddlePaddle deep learning framework, and cutting-edge computer vision research.

PP-YOLO evaluation shows faster inference (x-axis) with better accuracy (y-axis); PP-YOLO evaluation metrics show improved performance over YOLOv4, the incumbent state-of-the-art object detection model.

YOLOv3 made further improvements to the detection network and began to mainstream the object detection process.

Anatomy of the YOLO Detector (a graphical depiction of the PP-YOLO object detection network): The YOLO detector is broken into three main p…

4 hours ago @ towardsdatascience.com
Introduction to Transfer Learning

Transfer learning is all about how to use a pre-trained network and apply it to our custom task, transferring what it learned from previous tasks.

Transfer learning is where we take an architecture like VGG16 or ResNet, the result of extensive architectural work and hyperparameter tuning, and apply what it has already learned to a new task/model instead of starting from scratch.

Some of the transfer learning models include: Xception, VGG16, VGG19, ResNet/ResNetV2, InceptionV3, MobileNet. Implementing a Medical Application Using Transfer Learning: In this application, we will detect whether a person has pneumonia or not.

Importing VG…

4 hours ago @ towardsdatascience.com
Add Animated Charts To Your Dashboards With Streamlit-Python

You can interact with the user to get inputs and display dynamic charts, tables, maps etc.

Streamlit can be downloaded using the pip install command (pip install streamlit).

In this article, I have shared a sample animated bar chart built using Streamlit.

Although some of the information to achieve this is available in bits and pieces, I couldn't find much information on animated charts in one place.

I hope this will help somebody who is looking to add an animated chart to their Streamlit dashboard.

4 hours ago @ towardsdatascience.com
Solving Combinatorial Problems with PySpark

(Photo by Joe Ciciarelli on Unsplash.) Solving Combinatorial Problems with PySpark: partitioning combinatorial problems using binary representation. Let us consider the problem statement.

A brute-force strategy for solving this problem would be to permute over the numbers: first, one selects 1 out of n numbers, then 2, then 3, and so on.

If we don't know the function f, then we wouldn't know whether the order of the input parameters is important or not.

Some Examples of Function f: There are various cases where knowledge of the function f can reduce the search space drastically.

Conclusion: In this post, I laid out a simple framework on how to solve the combinatorial problem using a simple function f, though …
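The binary-representation idea mentioned in the excerpt can be sketched without Spark at all. This is an illustrative sketch, not the article's code: each integer mask in [1, 2**n) is read as a bitmask selecting a subset of the n numbers, so enumerating all masks enumerates every non-empty subset (and the masks partition naturally across workers).

```python
# Enumerate all non-empty subsets of a list via binary (bitmask) representation.
def subsets(numbers):
    n = len(numbers)
    for mask in range(1, 2 ** n):                       # skip mask 0 (empty set)
        # bit i of mask decides whether numbers[i] is in this subset
        yield [numbers[i] for i in range(n) if mask & (1 << i)]

nums = [4, 7, 9]
all_subsets = list(subsets(nums))
print(len(all_subsets))    # 2**3 - 1 = 7 non-empty subsets
print(all_subsets[0])      # [4]  (mask 0b001 selects only the first number)
```

In a PySpark setting, one would parallelize the mask range and have each partition materialize and evaluate its own subsets.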

4 hours ago @ towardsdatascience.com
Neural Style Transfer for Audio in Pytorch

Neural Style Transfer for Audio in PyTorch: a PyTorch pipeline to implement neural style transfer on podcasts. (Image by James Owen on Unsplash.) There have been some really interesting applications of style transfer.

So I explored the idea of applying neural style transfer to audio.

Run style transfer: Now I'll run the style transfer.

This piece of code was taken from the PyTorch tutorial for neural style transfer.

For each iteration of the network, the style loss and content loss are calculated.

4 hours ago @ towardsdatascience.com
Top AI fellowship programs to look out for

Top AI fellowship programs to look out for: a guide to the best AI internships.

Google AI Residency Program: (Photo by Rajeshwar Bachu on Unsplash.) When it comes to the best internships, not talking about Google would make no sense at all.

The Google AI Residency Program, as the name suggests, is an 18-month AI research program with the goal of supporting the next generation of AI researchers.

OpenAI Scholars Program: When it comes to research in AI, not talking about OpenAI would be strange.

IBM AI Residency Program: Last but not least, IBM's AI residency is also one of the most ambitious 12-month AI programs.

5 hours ago @ towardsdatascience.com
Dataistic World — The Age of Big Data and Dataism

Opinion: Dataistic World — The Age of Big Data and Dataism. Although Big Data may seem to make it possible to collect more data to find more useful information, the truth is that more data does not necessarily mean more useful information.

One of the best-known technologies to support big data is artificial intelligence (AI) in the form of big data analytics.

Big data has a number of implications that could change the approach to statistical data analysis.

Big data is a variation on the same theme that those who have worked with data for years already understand.

Even though Big Data may seem to make it possible to collect more data to find more useful information, the truth is that more data d…

5 hours ago @ towardsdatascience.com
DecisionTreeRegressor — Stop Using For Future Projections!

Let us assume that it is easier and more economical to measure the iron and calcium content compared to the protein content.

In the code below, the "Protein Content" data column is dropped from the DataFrame and the remaining data, i.e. the independent-variable datapoints, is declared as X_train.

X_train = Training_data.drop(["Protein Content "], axis=1)
y_train = Training_data.drop(["Days Passed", "Iron Content ", "Calcium Content "], axis=1)

The same process is repeated in the code below for the testing data set, i.e. values from day 901 to day 1142:

X_test = Test_data.drop(["Protein Content "], axis=1)
y_test = Test_data.drop(["Days Passed", "Iron Content ", "Calcium Content "], axis=1)

Step 5 - DecisionTreeRegressor model i…

5 hours ago @ towardsdatascience.com
Distill.pub
last post 1 month, 2 weeks ago
Curve Detectors

Part one of a three part deep dive into the curve neuron family.

1 month, 2 weeks ago @ distill.pub
Exploring Bayesian Optimization

How to tune hyperparameters for your machine learning model using Bayesian optimization.

3 months ago @ distill.pub
An Overview of Early Vision in InceptionV1

An overview of all the neurons in the first five layers of InceptionV1, organized into a taxonomy of 'neuron groups.'

4 months ago @ distill.pub
Visualizing Neural Networks with the Grand Tour

By focusing on linear dimensionality reduction, we show how to visualize many dynamic phenomena in neural networks.

4 months, 3 weeks ago @ distill.pub
Zoom In: An Introduction to Circuits

By studying the connections between neurons, we can find meaningful algorithms in the weights of neural networks.

4 months, 3 weeks ago @ distill.pub
Thread: Circuits

What can we learn if we invest heavily in reverse engineering a single neural network?

4 months, 3 weeks ago @ distill.pub
Growing Neural Cellular Automata

Training an end-to-end differentiable, self-organising cellular automata model of morphogenesis, able to both grow and regenerate specific patterns.

5 months, 3 weeks ago @ distill.pub
Visualizing the Impact of Feature Attribution Baselines

Exploring the baseline input hyperparameter, and how it impacts interpretations of neural network behavior.

6 months, 3 weeks ago @ distill.pub
Computing Receptive Fields of Convolutional Neural Networks

Detailed derivations and open-source code to analyze the receptive fields of convnets.

9 months ago @ distill.pub
The Paths Perspective on Value Learning

A closer look at how Temporal Difference Learning merges paths of experience for greater statistical efficiency

10 months, 1 week ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Two Examples of Useful, Non-Robust Features

An example project using webpack and svelte-loader and ejs to inline SVGs

12 months ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Robust Feature Leakage

An example project using webpack and svelte-loader and ejs to inline SVGs

12 months ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarial Example Researchers Need to Expand What is Meant by 'Robustness'

The main hypothesis in Ilyas et al. (2019) happens to be a special case of a more general principle that is commonly accepted in the robustness to distributional shift literature

12 months ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarially Robust Neural Style Transfer

An experiment showing adversarial robustness makes neural style transfer work on a non-VGG architecture

12 months ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features'

Six comments from the community and responses from the original authors

12 months ago @ distill.pub
The Gradient
last post 1 week, 3 days ago
Shortcuts: How Neural Networks Love to Cheat

The result described above is true, with one little twist: instead of using state-of-the-art artificial deep neural networks, researchers trained “natural” neural networks - more precisely, a flock of four pigeons - to diagnose breast cancer.

In the end, neural networks perhaps aren’t that different from (lazy) humans after all ...

Shortcut Learning in Deep Neural Networks.

Jörn-Henrik Jacobsen et al., "Shortcuts: Neural Networks Love to Cheat", The Gradient, 2020.

1 week, 3 days ago @ thegradient.pub
How to Stop Worrying About Compositionality

The real problem is that language does productivity in a very particular way, and it remains unclear how.

It may also provide fresh ideas, or simply the relief of knowing that compositionality does not have to be tackled entirely in one go.

The compositionality principle says that the meaning of the sentence Dogs sleep is made of the meaning of dogs and the meaning of sleep.

Adhering to purely bottom-up compositionality does make for a somewhat cumbersome semantics, though.

Citation: For attribution in academic contexts or books, please cite this work as Aurelie Herbelot, "How to Stop Worrying About Compositionality", The Gradient, 2020.

2 weeks, 2 days ago @ thegradient.pub
Challenges of Comparing Human and Machine Perception

Given these apparent similarities, many questions arise: How similar are human and machine vision really?

Geirhos et al.

The following figure shows two examples of the Synthetic Visual Reasoning Test (SVRT) (Fleuret et al., 2011).

A large recognition gap was identifiable for our DNN when testing machine-selected stimuli - unlike for the machine algorithms tested by Ullman et al.

Human and machine illustration taken from https://www.flickr.com/photos/gleonhard/33661762360 under the license https://creativecommons.org/licenses/by-sa/2.0/. Citation: For attribution in academic contexts or books, please cite this work as Judy Borowski and Christina Funke, "Challenges of Comparing Human and Machine P…

4 weeks, 1 day ago @ thegradient.pub
Lessons from the PULSE Model and Discussion

— 🔥Kareem Carr🔥 (@kareem_carr) June 23, 2020. Further discussion on the subject also occurred on reddit in the thread "[Discussion] about data bias vs inductive bias in machine learning sparked by the PULSE paper/demo".

— Yann LeCun (@ylecun) June 26, 2020. The PULSE model and this exchange were later covered in VentureBeat with the article "A deep learning pioneer's teachable moment on AI bias".

Regardless of which stance you agree with, it makes sense to at least understand the criticisms directed at Dr. LeCun.

— Yann LeCun (@ylecun) June 21, 2020. Which again led to questions regarding the validity of the initial claim: Yes.

Citation: For attribution in academic contexts or books, please cite thi…

1 month, 1 week ago @ thegradient.pub
A Speech-To-Text Practitioner’s Criticisms of Industry and Academia

This is a follow-up article to our article on building speech-to-text (STT) models, Towards an ImageNet Moment for Speech-to-Text.

Criticisms of the Industry: In general, the majority of STT papers we have read were written by researchers from industry (e.g.

Most criticisms of STT papers and solutions can be attributed to either the "academic" part or the "industry" part of the researchers' background.

The majority of modern STT papers usually just heavily overfit on the LibriSpeech ASR corpus (LibriSpeech) with increasingly more extravagant methods.

Citation: For attribution in academic contexts or books, please cite this work as Alexander Veysov, "A Speech-To-Text Practitioner's Criticisms …

4 months ago @ thegradient.pub
Towards an ImageNet Moment for Speech-to-Text

Speech-to-text (STT), also known as automated-speech-recognition (ASR), has a long history and has made amazing progress over the past decade.

Introduction: Following the success and the democratization (the so-called "ImageNet moment", i.e.

This piece will describe our pursuit of an ImageNet moment for STT, which has so far not been found, particularly in the context of the Russian language.

(i) is easy to estimate just by looking at the model's performance during the first 20-25% of its epochs.

Citation: For attribution in academic contexts or books, please cite this work as Alexander Veysov, "Towards an ImageNet Moment for Speech-to-Text", The Gradient, 2020.

4 months, 1 week ago @ thegradient.pub
Quantifying Independently Reproducible Machine Learning

My investigation into reproducible ML has also relied on personal notes and records hosted on Mendeley and GitHub.

What Makes a ML Paper Reproducible?

The biggest factors are that we cannot take all of our assumptions about so-called reproducible ML at face value.

At the same time, our process and systems must result in reproducible work that does not lead us astray.

Acknowledgments: Feature image source: https://xkcd.com/242/. Citation: For attribution in academic contexts or books, please cite this work as Edward Raff, "Quantifying Independently Reproducible Machine Learning", The Gradient, 2020.

5 months, 4 weeks ago @ thegradient.pub
GPT-2 and the Nature of Intelligence

-- The AI system GPT-2, in a December 2019 interview with The Economist, "An artificial intelligence predicts the future". Innateness, empiricism, and recent developments in deep learning: Consider two classic hypotheses about the development of language and cognition.

Consider GPT-2, an AI system that was recently featured in The New Yorker and interviewed by The Economist.

The popular blog Slate Star Codex featured it, too, in a post entitled "GPT-2 as a step towards General Intelligence".

Compared to any previous system for generating natural language, GPT-2 has a number of remarkable strengths.

I speak fluent English. If you run your experiments at talktotransformer.com, you will quickly learn th…

6 months, 1 week ago @ thegradient.pub
The Economics of AI Today

Every day we hear claims that Artificial Intelligence (AI) systems are about to transform the economy, creating mass unemployment and vast monopolies.

In September 2017, a group of distinguished economists gathered in Toronto to set out a research agenda for the Economics of Artificial Intelligence (AI).

Previous editions of the Economics of AI conference included papers about the impact of AI in sectors such as media or health-care.

Lack of diversity in the AI research workforce, and the increasing influence of the private sector in setting AI research (and ethical) agendas as part of the industrialization of AI research suggest that this could be a problem, but the evidence base is lackin…

6 months, 2 weeks ago @ thegradient.pub
Is NeurIPS Getting Too Big?

NeurIPS 2019, the latest incarnation of the Neural Information Processing Systems conference, wrapped up just over a week ago.

No, that's a keynote at #NeurIPS2019 pic.twitter.com/nJjONGzJww — Jevgenij Gamper (@brutforcimag) December 11, 2019. NeurIPS poster session: too crowded.

:( NeurIPS 2019, Vancouver, Canada: got the visa 3 weeks before.

Citation: For attribution in academic contexts or books, please cite this work as Andrey Kurenkov, "Is NeurIPS Getting Too Big?

BibTeX citation: @article{kurenkov2019neuripst, author = {Kurenkov, Andrey}, title = {Is NeurIPS Getting Too Big?

7 months, 2 weeks ago @ thegradient.pub
An Epidemic of AI Misinformation

Unfortunately, the problem of overhyped AI extends beyond the media itself.

General AI still seems like it might be a couple decades away, sixty years after the first optimistic projections were issued.

Hundreds of deep learning for radiology companies have been spawned in the meantime, but thus far no actual radiologists have been replaced, and the best guess is that deep learning can augment radiologists but not, in the near term, replace them.

If an AI system is allegedly better than humans, then which humans, and how much better?

Citation: For attribution in academic contexts or books, please cite this work as Gary Marcus, "An Epidemic of AI Misinformation", The Gradient, 2019.

8 months, 1 week ago @ thegradient.pub
Introduction to Artificial Life for People who Like AI

NEAT was awarded the 2017 International Society for Artificial Life Award for Outstanding Paper of the Decade.

First, I think we are seeing the first signs of the next AI winter, a period where people lose confidence in AI research and funding dries out.

She was recently elected to the board of the International Society for Artificial Life.

Acknowledgments: Header from "Lenia — Biology of Artificial Life", used with permission of Bert Wang-Chak Chan.

Citation: For attribution in academic contexts or books, please cite this work as Lana Sinapayen, "Introduction to Artificial Life for People who Like AI", The Gradient, 2019.

8 months, 1 week ago @ thegradient.pub
How Machine Learning Can Help Unlock the World of Ancient Japan

However, these models were unable to achieve strong performance on Kuzushiji recognition.

There are several reasons why Kuzushiji recognition is challenging: capturing both local and global context is important.

This is one reason why conventional sequence models do not work well with many Kuzushiji documents.

However, there are many other types of Kuzushiji text that a person might want to transcribe.

Citation: For attribution in academic contexts or books, please cite this work as Alex Lamb, "How Machine Learning Can Help Unlock the World of Ancient Japan", The Gradient, 2019.

8 months, 3 weeks ago @ thegradient.pub
Gaussian Processes, not quite for dummies

Note: if all k components are independent Gaussian random variables, then $X$ must be multivariate Gaussian (because every linear combination of independent Gaussian random variables is itself Gaussian).

Higher-dimensional Gaussians: 5D Gaussian. Now we can consider a higher-dimensional Gaussian, starting from 5D, so the covariance matrix is now 5x5.

We then take K and add $I\sigma_y^2$ for the final covariance matrix to factor in noise -- more on this later.

Gaussian process: textbook definition. From the above derivation, you can view a Gaussian process as a generalization of the multivariate Gaussian distribution to infinitely many variables.
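To make this concrete: a GP prior evaluated at finitely many inputs is just a multivariate Gaussian whose covariance is a kernel matrix K plus the $I\sigma_y^2$ noise term mentioned above. Here is a minimal NumPy sketch; the RBF kernel and its length scale are illustrative assumptions, since the excerpt does not specify a kernel.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0):
    """Squared-exponential (RBF) kernel matrix between two sets of 1-D points."""
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

# A GP prior at finitely many inputs is a multivariate Gaussian:
x = np.linspace(0, 5, 6)
K = rbf_kernel(x, x)

# Factor in observation noise, as in the text: final covariance = K + sigma_y^2 * I
sigma_y = 0.1
cov = K + sigma_y**2 * np.eye(len(x))

# Draw samples from the zero-mean multivariate Gaussian with this covariance
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(x)), cov, size=3)
print(samples.shape)  # (3, 6)
```

Conditioning on observed points works the same way, by applying the standard Gaussian conditioning formulas to this joint covariance.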

Citation: For attribution in academic contexts or books, please cite this work a…

8 months, 3 weeks ago @ thegradient.pub
Evaluation Metrics for Language Modeling

Counterintuitively, having more metrics actually makes it harder to compare language models, especially as indicators of how well a language model will perform on a specific downstream task are often unreliable.

Despite the presence of these downstream evaluation benchmarks, traditional intrinsic metrics are, nevertheless, extremely useful during the process of training the language model itself.

Proof: let P be the distribution of the underlying language and Q be the distribution learned by a language model.

The performance of N-gram language models does not improve much as N grows beyond 4, whereas the performance of neural language models continues to improve over time.
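The setup in the proof sketch above (true distribution P, learned distribution Q) can be made concrete with a toy unigram example. The distributions below are invented purely for illustration; real language models are evaluated on held-out corpora, where P is unknown.

```python
import math

# Toy unigram "language": true distribution P and a model's learned distribution Q.
P = {"the": 0.5, "cat": 0.3, "sat": 0.2}
Q = {"the": 0.4, "cat": 0.4, "sat": 0.2}

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p.values())

def cross_entropy(p, q):
    return -sum(p[w] * math.log2(q[w]) for w in p)

H_P = entropy(P)            # entropy of the true language
H_PQ = cross_entropy(P, Q)  # bits per word paid by a model with distribution Q
perplexity = 2 ** H_PQ      # perplexity is the exponentiated cross-entropy

# Gibbs' inequality: the model's cross-entropy can never beat the true entropy.
assert H_PQ >= H_P
print(round(H_PQ, 3), round(perplexity, 3))
```

The gap H(P, Q) - H(P) is exactly the KL divergence from Q to P, which is why a better model drives perplexity toward the (unreachable) floor set by the language's own entropy.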

In less than two years,…

9 months, 2 weeks ago @ thegradient.pub
DataTau
latest post 5 hours ago
How to store user inputs for a chatbot?

5 hours ago @ datatau.net
Easy Guide To Data Preprocessing In Python

10 hours ago @ datatau.net
How restaurants use Big Data to recover from the Covid-19 recession

15 hours ago @ datatau.net
The City surveyed residents on how they use open data. Here’s what it found

1 day ago @ datatau.net
Why monitoring your big data analytics pipeline is important (and how to get there)

1 day, 21 hours ago @ datatau.net
Automatically unlock & lock Ubuntu system using Opencv

2 days ago @ datatau.net
Top 7 Types of Charts used for Data Visualization | Implemented using Python Seaborn

2 days, 4 hours ago @ datatau.net
How to Create a Workout App: Detailed Guide

3 days, 3 hours ago @ datatau.net
Self-supervised representation learning on videos

4 days, 10 hours ago @ datatau.net
Construction Project Management Software

5 days, 7 hours ago @ datatau.net
Using visualizations in Natural Language Processing

5 days, 7 hours ago @ datatau.net
Top 5 Common Time Series Forecasting Algorithms

5 days, 13 hours ago @ datatau.net
Medical & Healthcare Apps for Businesses

6 days, 11 hours ago @ datatau.net
Top 10 tools for creating a simple mobile app prototype

6 days, 11 hours ago @ datatau.net
How to Create a Workout App: Detailed Guide

6 days, 11 hours ago @ datatau.net
Synced Review
latest post 1 day, 1 hour ago
Google ‘BigBird’ Achieves SOTA Performance on Long-Context NLP Tasks

1 day, 1 hour ago @ medium.com
Applying Linearly Scalable Transformers to Model Longer Protein Sequences

4 days, 2 hours ago @ medium.com
AI-Powered ‘Genderify’ Platform Shut Down After Bias-Based Backlash

4 days, 23 hours ago @ medium.com
MLPerf Training v0.7 Results Released: Google & NVIDIA Lead the Race

5 days, 23 hours ago @ medium.com
New Google Research Incorporates Societal Context in ML Systems

5 days, 23 hours ago @ medium.com
Google Proposes Lasso Algorithm Variant for Learning Convolutions: ‘Bridging the Gap Between…

6 days ago @ medium.com
Adobe and Stanford Unveil SOTA Method for Human Pose Estimation

1 week, 1 day ago @ medium.com
ICML 2020 Announces Outstanding Paper Awards

1 week, 1 day ago @ medium.com
Chinese Search Specialist Sogou Receives Tencent Buyout Offer

1 week, 1 day ago @ medium.com
DeepMind’s AlignNet Learns Stable Object Representations Across Time

1 week, 3 days ago @ medium.com
Microsoft & MIT Apply Adversarially Robust Models for Better Transfer Learning

1 week, 4 days ago @ medium.com
Joint Spatial-Temporal Transformation Learning Boosts Video Inpainting Performance

1 week, 5 days ago @ medium.com
Facebook & Inria Propose High-Performance Self-Supervised Technique for CV Tasks

1 week, 6 days ago @ medium.com
Automating Machine Learning: Google AutoML-Zero Evolves ML Algorithms From Scratch

1 week, 6 days ago @ medium.com
Meet ByteDance AI’s Xiaomingbot: World’s First Multilingual and Multimodal AI News Agent

2 weeks ago @ medium.com
🔬 Science
Papers With Code
latest post 1 day, 3 hours ago
Optimization of XNOR Convolution for Binary Convolutional Neural Networks on GPU

Binary convolutional networks have lower computational load and a lower memory footprint than their full-precision counterparts.

So, they are a feasible alternative for deploying computer vision applications on limited-capacity embedded devices... Once trained in less resource-constrained computational environments, they can be deployed for real-time inference on such devices.

In this study, we propose an implementation of binary convolutional network inference on GPU by focusing on optimization of XNOR convolution.

Experimental results show that using GPU can provide a speed-up of up to $42.61\times$ with a kernel size of $3\times3$.
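The speed-up comes from replacing floating-point multiply-accumulate with bitwise operations. As an illustration (a plain-Python sketch of the general XNOR-net trick, not the paper's GPU kernel): for vectors with entries in {-1, +1} packed into machine words, the dot product reduces to n - 2·popcount(a XOR b).

```python
# Binary dot product via XOR + popcount, the core arithmetic behind XNOR convolutions.
# Illustrative CPU sketch only; the paper's contribution is a GPU implementation.

def pack(bits):
    """Pack a list of {+1, -1} values into an int (bit=1 encodes +1)."""
    word = 0
    for i, v in enumerate(bits):
        if v == +1:
            word |= 1 << i
    return word

def binary_dot(wa, wb, n):
    """dot(a, b) for two ±1 vectors of length n: n - 2 * popcount(a XOR b)."""
    return n - 2 * bin(wa ^ wb).count("1")

a = [+1, -1, +1, +1, -1, -1, +1, -1]
b = [+1, +1, -1, +1, -1, +1, +1, +1]
assert binary_dot(pack(a), pack(b), len(a)) == sum(x * y for x, y in zip(a, b))
```

A convolution is just many such dot products, so one 64-bit XOR plus a popcount instruction replaces 64 multiply-adds, which is where speed-ups of this magnitude come from.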

The implementation is publicly available a…

1 day, 3 hours ago @ paperswithcode.com
Linear Attention Mechanism: An Efficient Attention for Semantic Segmentation

In this paper, to remedy this deficiency, we propose a Linear Attention Mechanism that approximates dot-product attention with much lower memory and computational cost.

The efficient design makes the incorporation between attention mechanisms and neural networks more flexible and versatile...

Experiments conducted on semantic segmentation demonstrated the effectiveness of the linear attention mechanism.

Code is available at https://github.com/lironui/Linear-Attention-Mechanism.
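The abstract does not give the exact formulation, but the generic kernel-linearization idea behind such mechanisms can be sketched: apply a positive feature map φ and compute φ(K)ᵀV first, which replaces the n×n attention matrix with a d×d one. The feature map below (ReLU plus a small constant) is an illustrative assumption, not necessarily the paper's choice.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard dot-product attention: O(n^2) memory in sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernel-linearized attention: phi(Q) @ (phi(K).T @ V), normalized.
    Computing phi(K).T @ V first avoids ever forming the n x n matrix."""
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V            # (d, d_v): independent of sequence length
    Z = Qp @ Kp.sum(axis=0)  # per-query normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(0)
n, d = 128, 16
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (128, 16)
```

Because KV is d×d, memory and compute grow linearly in n instead of quadratically, which is what makes such attention practical on large segmentation feature maps.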


1 day, 3 hours ago @ paperswithcode.com
What does BERT know about books, movies and music? Probing BERT for Conversational Recommendation

We first study how much off-the-shelf pre-trained BERT "knows" about recommendation items such as books, movies and music.

Content-based knowledge is knowledge that requires the model to match the titles of items with their content information, such as textual descriptions and genres.

In contrast, collaborative-based knowledge requires the model to match items with similar ones, according to community interactions such as ratings.

Finally, we study how BERT performs in a conversational recommendation downstream task.

Overall, our analyses and experiments show that: (i) BERT has knowledge stored in its parameters about the content of books, movies and music; (ii) it has more content-based kn…

2 days, 1 hour ago @ paperswithcode.com
New approach to MPI program execution time prediction

The problem of predicting MPI program execution time on a given set of computer installations is considered.

One component of this problem is estimating the execution time of an MPI program on a given set of computer installations.

This is necessary to determine a proper choice of order and place for program execution.

The article proposes two new approaches to the program execution time prediction problem.

The article shows how the embeddings technique helps to predict the execution time of an MPI program on a given set of computer installations.

2 days, 1 hour ago @ paperswithcode.com
Enhancement of Retinal Fundus Images via Pixel Color Amplification

We propose a pixel color amplification theory and family of enhancement methods to facilitate segmentation tasks on retinal images.

Our novel re-interpretation of the image distortion model underlying dehazing theory shows how three existing priors commonly used by the dehazing community and a novel fourth prior are related... We utilize the theory to develop a family of enhancement methods for retinal images, including novel methods for whole image brightening and darkening.

We evaluate the enhancement methods as a pre-processing step to a challenging multi-task segmentation problem and show large increases in performance on all tasks, with Dice score increases over a no-enhancement baseli…

3 days ago @ paperswithcode.com
AiR: Attention with Reasoning Capability

While attention has been an increasingly popular component in deep neural networks to both interpret and boost performance of models, little work has examined how attention progresses to accomplish a task and whether it is reasonable.

In this work, we propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes... We first define an evaluation metric based on a sequence of atomic reasoning operations, enabling quantitative measurement of attention that considers the reasoning process.

We then collect human eye-tracking and answer correctness data, and analyze various machine and human attentions on their re…

3 days ago @ paperswithcode.com
Unsupervised Learning of Particle Image Velocimetry

Recently, the development of deep learning based methods has inspired new approaches to tackle the PIV problem...

These supervised learning based methods are driven by large volumes of data with ground truth training information.

However, it is difficult to collect reliable ground truth data in large-scale, real-world scenarios.

We present here what we believe to be the first work which takes an unsupervised learning based approach to tackle PIV problems.

Instead of using ground truth data, we make use of photometric loss between two consecutive image frames, consistency loss in bidirectional flow estimates and spatial smoothness loss to construct the total unsupervised loss function.
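The three loss terms named above can be sketched roughly as follows. This is an illustrative sketch only: the function names and weights are assumptions, the warped frame is taken as given (differentiable warping is out of scope here), and the paper's exact losses may differ.

```python
import numpy as np

# Hypothetical sketch of the unsupervised PIV loss terms described above.

def photometric_loss(I1_warped, I2):
    """Penalize intensity differences between the warped frame 1 and frame 2."""
    return np.mean((I1_warped - I2) ** 2)

def consistency_loss(flow_fw, flow_bw):
    """Forward and backward flow estimates should cancel out."""
    return np.mean((flow_fw + flow_bw) ** 2)

def smoothness_loss(flow):
    """Penalize spatial gradients of the flow field (shape: H x W x 2)."""
    dx = np.diff(flow, axis=1)
    dy = np.diff(flow, axis=0)
    return np.mean(dx ** 2) + np.mean(dy ** 2)

def total_loss(I1_warped, I2, flow_fw, flow_bw, w=(1.0, 0.1, 0.1)):
    """Weighted sum of the three terms; the weights are illustrative."""
    return (w[0] * photometric_loss(I1_warped, I2)
            + w[1] * consistency_loss(flow_fw, flow_bw)
            + w[2] * smoothness_loss(flow_fw))
```

None of the terms require ground-truth flow, which is the point: the supervision signal comes entirely from the image pair and the internal consistency of the estimates.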

3 days ago @ paperswithcode.com
Decoding machine learning benchmarks

Despite the availability of benchmark machine learning (ML) repositories (e.g., UCI, OpenML), there is no standard evaluation strategy yet capable of pointing out which is the best set of datasets to serve as gold standard to test different ML algorithms.

In recent studies, Item Response Theory (IRT) has emerged as a new approach to elucidate what should be a good ML benchmark...

This work applied IRT to explore the well-known OpenML-CC18 benchmark to identify how suitable it is on the evaluation of classifiers.

Most datasets evaluated in this work (84%) consist largely of easy instances (e.g., only around 10% difficult instances).

This paper presents this new evaluation methodology base…

3 days ago @ paperswithcode.com
dMelodies: A Music Dataset for Disentanglement Learning

Representation learning focused on disentangling the underlying factors of variation in given data has become an important area of research in machine learning.

However, most of the studies in this area have relied on datasets from the computer vision domain and thus, have not been readily extended to music...

In this paper, we present a new symbolic music dataset that will help researchers working on disentanglement problems demonstrate the efficacy of their algorithms on diverse domains.

The dataset is large enough (1.3 million data points) to train and test deep networks for disentanglement learning.

In addition, we present benchmarking experiments using popular unsupervised disentanglement algorithms on this datase…

3 days ago @ paperswithcode.com
Deep Keypoint-Based Camera Pose Estimation with Geometric Constraints

Estimating relative camera poses from consecutive frames is a fundamental problem in visual odometry (VO) and simultaneous localization and mapping (SLAM), where classic methods consisting of hand-crafted features and sampling-based outlier rejection have been a dominant choice for over a decade.

Although multiple works propose to replace these modules with learning-based counterparts, most have not yet been as accurate, robust and generalizable as conventional methods...

In this paper, we design an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection, while directly optimizing for the geometric pose objective.

We s…

3 days ago @ paperswithcode.com
Fibonacci and k-Subsecting Recursive Feature Elimination

Feature selection is a data mining task with the potential of speeding up classification algorithms, enhancing model comprehensibility, and improving learning accuracy.

However, finding a subset of features that is optimal in terms of predictive accuracy is usually computationally intractable... Out of several heuristic approaches to dealing with this problem, the Recursive Feature Elimination (RFE) algorithm has received considerable interest from data mining practitioners.

In this paper, we propose two novel algorithms inspired by RFE, called Fibonacci- and k-Subsecting Recursive Feature Elimination, which remove features in logarithmic steps, probing the wrapped classifier more densely f…

3 days ago @ paperswithcode.com
Force myography benchmark data for hand gesture recognition and transfer learning

Force myography has recently gained increasing attention for hand gesture recognition tasks.

However, there is a lack of publicly available benchmark data, with most existing studies collecting their own data often with custom hardware and for varying sets of gestures...

This limits the ability to compare various algorithms, as well as the possibility for research to be done without first needing to collect data oneself.

We illustrate one use-case for such data, showing how we can improve gesture recognition accuracy by utilising transfer learning to incorporate data from multiple other persons.

This also illustrates that the dataset can serve as a benchmark dataset to facilitate research o…

3 days ago @ paperswithcode.com
Camera-Based Piano Sheet Music Identification

This paper presents a method for large-scale retrieval of piano sheet music images.

Our work differs from previous studies on sheet music retrieval in two ways... First, we investigate the problem at a much larger scale than previous studies, using all solo piano sheet music images in the entire IMSLP dataset as a searchable database.

Second, we use cell phone images of sheet music as our input queries, which lends itself to a practical, user-facing application.

We show that a previously proposed fingerprinting method for sheet music retrieval is far too slow for a real-time application, and we diagnose its shortcomings.

In experiments on IMSLP data, our proposed method achieves a mean reci…

3 days ago @ paperswithcode.com
Foveation for Segmentation of Ultra-High Resolution Images

Segmentation of ultra-high resolution images is challenging because of their enormous size, consisting of millions or even billions of pixels.

Such operations incur information loss in the field-of-view (FoV) i.e., spatial coverage and the image resolution.

The impact on segmentation performance is, however, as yet understudied.

In this work, we start with a motivational experiment which demonstrates that the trade-off between FoV and resolution affects the segmentation performance on ultra-high resolution images; furthermore, its influence also varies spatially according to the local patterns in different areas.

The foveation module is jointly trained with the segmentation network to …

3 days ago @ paperswithcode.com
Weakly-Supervised Cell Tracking via Backward-and-Forward Propagation

We propose a weakly-supervised cell tracking method that can train a convolutional neural network (CNN) by using only the annotation of "cell detection" (i.e., the coordinates of cell positions) without association information, in which cell positions can be easily obtained by nuclear staining.

First, we train co-detection CNN that detects cells in successive frames by using weak-labels... Our key assumption is that co-detection CNN implicitly learns association in addition to detection.

To obtain the association, we propose a backward-and-forward propagation method that analyzes the correspondence of cell positions in the outputs of co-detection CNN.

Experiments demonstrated that the propo…

3 days ago @ paperswithcode.com
Papers With Code
latest post 1 day, 3 hours ago
Infrastructure-based Multi-Camera Calibration using Radial Projections

Multi-camera systems are an important sensor platform for intelligent systems such as self-driving cars.

Pattern-based calibration techniques can be used to calibrate the intrinsics of the cameras individually...

Given the camera intrinsics, infrastructure-based calibration techniques are able to estimate the extrinsics using 3D maps pre-built via SLAM or Structure-from-Motion.

In this paper, we propose to fully calibrate a multi-camera system from scratch using an infrastructure-based approach.

Extensive experiments on multiple indoor and outdoor scenes with multiple multi-camera systems show that our calibration method achieves high accuracy and robustness.

3 days ago @ paperswithcode.com
Generalization Comparison of Deep Neural Networks via Output Sensitivity

Although recent works have brought some insights into the performance improvement of techniques used in state-of-the-art deep-learning models, more work is needed to understand their generalization properties.

We shed light on this matter by linking the loss function to the output's sensitivity to its input... We find a rather strong empirical relation between the output sensitivity and the variance in the bias-variance decomposition of the loss function, which hints at using sensitivity as a metric for comparing the generalization performance of networks without requiring labeled data.

We find that sensitivity is decreased by applying popular methods which improve the generalization perfo…
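Such a sensitivity metric can be estimated without labels. Below is a minimal numpy sketch: the function name `output_sensitivity`, the random-direction finite-difference estimator, and the toy linear "network" are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def output_sensitivity(f, X, eps=1e-4, n_dirs=8, seed=0):
    """Estimate the mean output sensitivity ||df/dx|| of a network f over
    inputs X via random-direction finite differences (label-free).
    Assumes f maps (n, d) arrays to (n, k) outputs."""
    rng = np.random.default_rng(seed)
    sens = 0.0
    for _ in range(n_dirs):
        v = rng.standard_normal(X.shape)
        v /= np.linalg.norm(v, axis=1, keepdims=True)  # unit perturbation per sample
        # directional derivative magnitude, averaged over samples and directions
        sens += np.mean(np.linalg.norm(f(X + eps * v) - f(X), axis=1) / eps)
    return sens / n_dirs

# toy "network": a linear map, whose sensitivity lies between its
# smallest and largest singular values (here 1 and 2)
W = np.array([[2.0, 0.0], [0.0, 1.0]])
f = lambda X: X @ W.T
X = np.random.default_rng(1).standard_normal((100, 2))
s = output_sensitivity(f, X)
```

For a nonlinear network the same estimator applies unchanged; only `f` needs to be a batched forward pass.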

3 days ago @ paperswithcode.com
Improving Sample Efficiency with Normalized RBF Kernels

In deep learning models, learning more with less data is becoming more important.

This paper explores how neural networks with normalized Radial Basis Function (RBF) kernels can be trained to achieve better sample efficiency...

Moreover, we show how this kind of output layer can find embedding spaces where the classes are compact and well-separated.

In order to achieve this, we propose a two-phase method to train this type of neural network on classification tasks.

Experiments on CIFAR-10 and CIFAR-100 show that, with the presented method, networks with normalized kernels as the output layer can achieve higher sample efficiency and compact, well-separated embeddings in comparison to network…
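A normalized-RBF output layer of the kind described can be sketched in a few lines of numpy. The function name, the per-class centers, and the single bandwidth `gamma` are assumptions for illustration:

```python
import numpy as np

def normalized_rbf_layer(z, centers, gamma=1.0):
    """Output layer built from normalized RBF kernels (a sketch under
    assumed notation): class scores are RBF responses to per-class centers,
    normalized to sum to one, so the output is a proper distribution."""
    # squared distances between embeddings z (n, d) and centers (k, d)
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    k = np.exp(-gamma * d2)                      # RBF kernel responses
    return k / k.sum(axis=1, keepdims=True)      # normalization step

centers = np.array([[0.0, 0.0], [5.0, 5.0]])
z = np.array([[0.1, -0.1], [4.9, 5.2]])
p = normalized_rbf_layer(z, centers)
# each embedding is assigned to its nearest center
```

Because the scores depend only on distances to centers, training pulls same-class embeddings toward a shared center, which is one way to read the "compact and well-separated" claim.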

3 days ago @ paperswithcode.com
A Multi-Task Learning Neural Network for Emotion-Cause Pair Extraction


3 days ago @ paperswithcode.com
AutoClip: Adaptive Gradient Clipping for Source Separation Networks

Clipping the gradient is a known approach to improving gradient descent, but requires hand selection of a clipping threshold hyperparameter.

We present AutoClip, a simple method for automatically and adaptively choosing a gradient clipping threshold, based on the history of gradient norms observed during training...

Experimental results show that applying AutoClip results in improved generalization performance for audio source separation networks.

Observation of the training dynamics of a separation network trained with and without AutoClip show that AutoClip guides optimization into smoother parts of the loss landscape.

AutoClip is very simple to implement and can be integrated readily int…
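The mechanism is simple enough to sketch directly. This hedged numpy version keeps a running history of gradient norms and clips at a percentile of that history (the 10th-percentile default follows the paper's reported setting; the function names are illustrative):

```python
import numpy as np

def clip_gradient(grad, norm_history, pct=10.0):
    """AutoClip-style adaptive clipping: record the current gradient norm,
    then clip at the pct-th percentile of all norms observed so far."""
    norm = float(np.linalg.norm(grad))
    norm_history.append(norm)
    tau = np.percentile(norm_history, pct)   # adaptive threshold
    if norm > tau:
        grad = grad * (tau / norm)           # rescale to norm tau
    return grad

history = []
for step in range(1, 6):
    # gradients with growing norms get clipped toward the low
    # percentile of the norms seen in earlier steps
    g = clip_gradient(np.ones(3) * step, history)
```

In a real training loop the same call would wrap the gradient between `backward()` and the optimizer step; no tuning is needed beyond the percentile.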

3 days, 23 hours ago @ paperswithcode.com
Ordinary Differential Equation and Complex Matrix Exponential for Multi-resolution Image Registration

Autograd-based software packages have recently renewed interest in image registration using homography and other geometric models by gradient descent and optimization, e.g., AirLab and DRMIME.

In this work, we emphasize using the complex matrix exponential (CME) over the real matrix exponential to compute transformation matrices... CME is theoretically more suitable and, as our experiments show, provides faster convergence in practice.

Further, we demonstrate that the use of an ordinary differential equation (ODE) as an optimizable dynamical system can adapt the transformation matrix more accurately to the multi-resolution Gaussian pyramid for image registration.

Our experiments include four publi…
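The exponential parameterization this line of work builds on can be sketched with numpy (real case, for brevity). The four generators below are an illustrative subset, and `expm` here is a truncated power series rather than a library call:

```python
import numpy as np

def expm(A, terms=30):
    """Matrix exponential via its truncated power series (sufficient for
    the small, well-scaled generators used below)."""
    E, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    return E

def transform_from_params(p):
    """Homography as the exponential of a Lie-algebra element, sketching
    the DRMIME-style parameterization (the generators -- translations,
    rotation, isotropic scale -- are an illustrative subset)."""
    G = [np.zeros((3, 3)) for _ in range(4)]
    G[0][0, 2] = 1.0                      # x-translation
    G[1][1, 2] = 1.0                      # y-translation
    G[2][0, 1], G[2][1, 0] = -1.0, 1.0    # rotation
    G[3][0, 0] = G[3][1, 1] = 1.0         # isotropic scale
    A = sum(pi * Gi for pi, Gi in zip(p, G))
    return expm(A)                        # invertible by construction

T = transform_from_params([0.5, -0.2, 0.0, 0.0])  # a pure translation
```

Optimizing the parameter vector by gradient descent always yields a valid (invertible) transform, which is the appeal of the exponential map here; the paper's complex-valued variant follows the same pattern with complex generators.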

3 days, 23 hours ago @ paperswithcode.com
Declarative Experimentation in Information Retrieval using PyTerrier

The advent of deep machine learning platforms such as Tensorflow and Pytorch, developed in expressive high-level languages such as Python, has allowed more expressive representations of deep neural network architectures.

We argue that such a powerful formalism is missing in information retrieval (IR), and propose a framework called PyTerrier that allows advanced retrieval pipelines to be expressed, and evaluated, in a declarative manner close to their conceptual design... Like the aforementioned frameworks that compile deep learning experiments into primitive GPU operations, our framework targets IR platforms as backends in order to execute and evaluate retrieval pipelines.

Further, we can…

3 days, 23 hours ago @ paperswithcode.com
When and why PINNs fail to train: A neural tangent kernel perspective

Physics-informed neural networks (PINNs) have lately received great attention thanks to their flexibility in tackling a wide range of forward and inverse problems involving partial differential equations.

However, despite their noticeable empirical success, little is known about how such constrained neural networks behave during their training via gradient descent... More importantly, even less is known about why such models sometimes fail to train at all.

In this work, we aim to investigate these questions through the lens of the Neural Tangent Kernel (NTK); a kernel that captures the behavior of fully-connected neural networks in the infinite width limit during training via gradient desce…
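For reference, the standard NTK definitions the abstract relies on (general facts about the infinite-width analysis, not this paper's specific results): for a network $f(x;\theta)$, the kernel and the resulting training dynamics under gradient flow on the squared loss are

```latex
\Theta_t(x, x') \;=\; \nabla_\theta f(x;\theta_t)^{\top}\,\nabla_\theta f(x';\theta_t),
\qquad
\frac{\mathrm{d}}{\mathrm{d}t}\, f(X;\theta_t) \;=\; -\,\Theta_t(X, X)\,\bigl(f(X;\theta_t) - y\bigr).
```

Each eigenmode of $\Theta$ decays at a rate proportional to its eigenvalue, so a badly conditioned kernel means some components of the target are fitted arbitrarily slowly — the kind of diagnosis this lens makes possible for PINN training failures.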

3 days, 23 hours ago @ paperswithcode.com
Biomedical and Clinical English Model Packages in the Stanza Python NLP Library

We introduce biomedical and clinical English model packages for the Stanza Python NLP library.

These packages offer accurate syntactic analysis and named entity recognition capabilities for biomedical and clinical text, by combining Stanza's fully neural architecture with a wide variety of open datasets as well as large-scale unsupervised biomedical and clinical text data... We show via extensive experiments that our packages achieve syntactic analysis and named entity recognition performance that is on par with or surpasses state-of-the-art results.

We further show that these models do not compromise speed compared to existing toolkits when GPU acceleration is available, and are made easy …

3 days, 23 hours ago @ paperswithcode.com
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation

Single-stage instance segmentation approaches have recently gained popularity due to their speed and simplicity, but are still lagging behind in accuracy, compared to two-stage methods.

We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating the mask prediction of an instance into different sub-regions of a detected bounding-box... Our main contribution is a novel light-weight spatial preservation (SP) module that generates a separate set of spatial coefficients for each sub-region within a bounding-box, leading to improved mask predictions.

Further, we introduce a mask alignment weighting loss and a feature a…

3 days, 23 hours ago @ paperswithcode.com
Enriching Video Captions With Contextual Text

Understanding video content and generating captions with context is an important and challenging task.

Unlike prior methods that typically attempt to generate generic video captions without context, our architecture contextualizes captioning by infusing extracted information from relevant text data... We propose an end-to-end sequence-to-sequence model which generates video captions based on visual input, and mines relevant knowledge such as names and locations from contextual text.

In contrast to previous approaches, we do not preprocess the text further, and let the model directly learn to attend over it.

Guided by the visual input, the model is able to copy words from the contextual text …

3 days, 23 hours ago @ paperswithcode.com
Neural Network-based Reconstruction in Compressed Sensing MRI Without Fully-sampled Training Data

Compressed Sensing MRI (CS-MRI) has shown promise in reconstructing under-sampled MR images, offering the potential to reduce scan times.

Classical techniques minimize a regularized least-squares cost function using an expensive iterative optimization procedure...

Recently, deep learning models have been developed that model the iterative nature of classical techniques by unrolling iterations in a neural network.

While exhibiting superior performance, these methods require large quantities of ground-truth images and have been shown to be non-robust to unseen data.

In this paper, we explore a novel strategy to train an unrolled reconstruction network in an unsupervised fashion by adopting a loss …

3 days, 23 hours ago @ paperswithcode.com
Solving Phase Retrieval with a Learned Reference

Fourier phase retrieval is a classical problem that deals with the recovery of an image from the amplitude measurements of its Fourier coefficients.

In this paper, we assume that a known (learned) reference is added to the signal before capturing the Fourier amplitude measurements.

To recover the signal, we implement an iterative phase retrieval method as an unrolled network.

Then we use backpropagation to learn the reference that provides the best reconstruction for a fixed number of phase retrieval iterations.

We compared our method with standard Fourier phase retrieval methods and observed significant performance enhancement using the learned reference.
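To make the setup concrete, here is a small numpy sketch of amplitude-loss gradient descent with a known additive reference. Plain gradient descent stands in for the paper's unrolled network, and all names and constants are illustrative:

```python
import numpy as np

def retrieve_with_reference(y_mag, u, n_iter=500, lr=0.1, seed=0):
    """Recover a signal x from Fourier amplitude measurements |F(x + u)|,
    where the reference u is known, by gradient descent on the amplitude
    loss 0.5 * ||  |F(x+u)| - y ||^2 (a simplified stand-in for the
    paper's unrolled network)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(len(u)) * 0.1
    for _ in range(n_iter):
        F = np.fft.fft(x + u)
        mag = np.abs(F) + 1e-12
        # gradient of the amplitude loss w.r.t. x (up to a constant factor
        # absorbed into the learning rate)
        g = np.fft.ifft((mag - y_mag) * F / mag).real
        x -= lr * g
    return x

u = np.array([3.0, 0.0, 0.0, 0.0])       # strong known reference
x_true = np.array([0.2, -0.1, 0.4, 0.3])
y = np.abs(np.fft.fft(x_true + u))
x_hat = retrieve_with_reference(y, u)     # matches x_true up to trivial ambiguities
```

A strong reference makes the measurements nearly linear in the signal, which is why the descent converges reliably here; learning `u` itself (the paper's contribution) would wrap this whole loop in an outer optimization.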

3 days, 23 hours ago @ paperswithcode.com
Clarinet: A One-step Approach Towards Budget-friendly Unsupervised Domain Adaptation

In unsupervised domain adaptation (UDA), classifiers for the target domain are trained with massive true-label data from the source domain and unlabeled data from the target domain.

However, it may be difficult to collect fully-true-label data in a source domain given a limited budget... To mitigate this problem, we consider a novel problem setting, named budget-friendly UDA (BFUDA), where the classifier for the target domain has to be trained with complementary-label data from the source domain and unlabeled data from the target domain.

The key benefit is that it is much less costly to collect complementary-label source data (required by BFUDA) than collecting the true-label source data (req…

3 days, 23 hours ago @ paperswithcode.com
Hybrid Deep Learning Gaussian Process for Diabetic Retinopathy Diagnosis and Uncertainty Quantification

Diabetic Retinopathy (DR) is one of the microvascular complications of Diabetes Mellitus, which remains one of the leading causes of blindness worldwide.

However, including grade estimation and quantification of prediction uncertainty can potentially increase the robustness of the model.

In this paper, a hybrid Deep Learning-Gaussian process method for DR diagnosis and uncertainty quantification is presented.

This method combines the representational power of deep learning with the ability of Gaussian process models to generalize from small datasets.

The results show that uncertainty quantification in the predictions improves the interpretability of the method as a diagnostic suppo…

3 days, 23 hours ago @ paperswithcode.com
Papers With Code
last post 1 day, 3 hours ago
Mirostat: A Perplexity-Controlled Neural Text Decoding Algorithm

Neural text decoding is important for generating high-quality texts using language models.

To generate high-quality text, popular decoding algorithms like top-k, top-p (nucleus), and temperature-based sampling truncate or distort the unreliable low probability tail of the language model...

Though these methods generate high-quality text after parameter tuning, they are ad hoc.

We use this analysis to design a feedback-based adaptive top-k text decoding algorithm called mirostat that generates text (of any length) with a predetermined value of perplexity, and thereby high-quality text without any tuning.

On the other hand, for large values of k and p, we find that perplexity increases with g…
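The feedback idea can be sketched in a few lines. This is a loose sketch "in the spirit of" mirostat: the surprise-threshold update rule and the toy random-logit model are assumptions, not the paper's exact algorithm:

```python
import numpy as np

def mirostat_like(logits_fn, steps=200, target=5.0, lr=0.5, seed=0):
    """Feedback-controlled sampling: sample only tokens whose surprise
    (-log2 prob) is below a threshold mu, then nudge mu so the observed
    surprise of sampled tokens tracks the target value."""
    rng = np.random.default_rng(seed)
    mu, surprises = 2.0 * target, []
    for _ in range(steps):
        logits = logits_fn()
        p = np.exp(logits - logits.max())
        p /= p.sum()
        surprise = -np.log2(p)
        keep = np.flatnonzero(surprise < mu)     # adaptive truncation
        if keep.size == 0:                       # fall back to the top token
            keep = np.array([int(np.argmax(p))])
        q = p[keep] / p[keep].sum()
        tok = rng.choice(keep, p=q)
        s = surprise[tok]                        # observed surprise
        mu -= lr * (s - target)                  # feedback step
        surprises.append(s)
    return float(np.mean(surprises))

rng = np.random.default_rng(1)
avg = mirostat_like(lambda: rng.standard_normal(100), target=5.0)
# average surprise settles near the requested target
```

Since average surprise per token is (cross-)entropy, controlling it is exactly controlling perplexity, which is the paper's point.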

3 days, 23 hours ago @ paperswithcode.com
Context-Aware Attentive Knowledge Tracing

Knowledge tracing (KT) refers to the problem of predicting future learner performance given their past performance in educational applications.

Recent developments in KT using flexible deep neural network-based models excel at this task...

In this paper, we propose attentive knowledge tracing (AKT), which couples flexible attention-based neural network models with a series of novel, interpretable model components inspired by cognitive and psychometric models.

AKT uses a novel monotonic attention mechanism that relates a learner's future responses to assessment questions to their past responses; attention weights are computed using exponential decay and a context-aware relative distance meas…
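The exponential-decay idea is easy to illustrate. This sketch, with assumed notation and simplified from AKT's full context-aware distance measure, multiplies ordinary softmax attention by a decay in the temporal distance:

```python
import numpy as np

def decayed_attention(scores, theta=0.5):
    """Monotonic attention with exponential distance decay: weights over
    past positions are softmax scores discounted by exp(-theta * distance),
    applied in log space so the result is still a distribution."""
    t = len(scores)
    dist = np.arange(t - 1, -1, -1.0)   # distance of each past step from "now"
    logits = scores - theta * dist      # decay applied in log space
    w = np.exp(logits - logits.max())
    return w / w.sum()

# with equal raw scores, weight increases monotonically toward recent steps
w = decayed_attention(np.zeros(5), theta=0.5)
```

The decay rate `theta` is what a model like AKT would learn, letting it interpolate between "only the latest responses matter" and plain attention.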

5 days, 4 hours ago @ paperswithcode.com
Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by the success of AlphaZero.

However, algorithms of this form have been unable to cope with imperfect-information games...

This paper presents ReBeL, a general framework for self-play reinforcement learning and search for imperfect-information games.

In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero.

Results show ReBeL leads to low exploitability in benchmark imperfect-information games and achieves superhuman perfor…

5 days, 4 hours ago @ paperswithcode.com
Corner Proposal Network for Anchor-free, Two-stage Object Detection

The goal of object detection is to determine the class and location of objects in an image.

This paper proposes a novel anchor-free, two-stage framework which first extracts a number of object proposals by finding potential corner keypoint combinations and then assigns a class label to each proposal by a standalone classification stage... We demonstrate that these two stages are effective solutions for improving recall and precision, respectively, and they can be integrated into an end-to-end network.

Our approach, dubbed Corner Proposal Network (CPN), enjoys the ability to detect objects of various scales and also avoids being confused by a large number of false-positive proposals.

On the …

5 days, 4 hours ago @ paperswithcode.com
A Closer Look at Art Mediums: The MAMe Image Classification Dataset

To challenge the AI community, this work introduces a novel image classification task focused on museum art mediums, the MAMe dataset.

For each class, MAMe provides a minimum of 850 high-resolution images (700 for training) of variable shape.

The combination of volume, resolution and shape allows MAMe to fill a void in current image classification challenges, empowering research in aspects so far overlooked by the research community.

After reviewing the singularity of MAMe in the context of current image classification tasks, a thorough description of the task is provided, together with dataset statistics.

Finally, these baselines are inspected using explainability methods and expert know…

5 days, 4 hours ago @ paperswithcode.com
MMDF: Mobile Microscopy Deep Framework

In the last decade, a huge step was made in the development of mobile microscopes, as well as in the application of mobile microscopy to real-life disease diagnostics and many other important areas (air/water pollution monitoring, education, agriculture).

In the current study, we apply deep learning image processing techniques (in-focus/out-of-focus classification, image deblurring and denoising, multi-focus image fusion) to data obtained from a mobile microscope... An overview of significant works for each task is presented, and the most suitable approaches are highlighted.

The chosen approaches were implemented, and their performance was compared with classical computer …

5 days, 4 hours ago @ paperswithcode.com
Online Neural Connectivity Estimation with Noisy Group Testing

Many previous approaches have attempted to estimate functional connectivity between neurons using statistical modeling of observational data, but these approaches rely heavily on parametric assumptions and are purely correlational...

Recently, however, holographic photostimulation techniques have made it possible to precisely target selected ensembles of neurons, offering the possibility of establishing direct causal links.

Here, we propose a method based on noisy group testing that drastically increases the efficiency of this process in sparse networks.

By stimulating small ensembles of neurons, we show that it is possible to recover binarized network connectivity with a number of tests th…
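A toy version of the group-testing idea (COMP-style decoding under symmetric outcome noise; the names, thresholds, and noise model here are illustrative, not the paper's estimator):

```python
import numpy as np

def group_test_connectivity(connected, n_tests=200, group_size=5, flip=0.05, seed=0):
    """Toy noisy group testing for sparse connectivity: each test stimulates
    a random ensemble and reports (noisily) whether any stimulated neuron
    drives the recorded cell. A neuron is declared unconnected if it shows
    up in 'too many' negative tests."""
    rng = np.random.default_rng(seed)
    n = len(connected)
    neg_counts = np.zeros(n)
    appearances = np.zeros(n)
    for _ in range(n_tests):
        group = rng.choice(n, size=group_size, replace=False)
        outcome = connected[group].any()
        if rng.random() < flip:          # measurement noise flips the outcome
            outcome = not outcome
        appearances[group] += 1
        if not outcome:
            neg_counts[group] += 1
    # connected iff it (almost) never appears in a negative test
    return neg_counts < 0.2 * np.maximum(appearances, 1)

truth = np.zeros(50, dtype=bool)
truth[[3, 17, 30]] = True                # a sparse set of true connections
est = group_test_connectivity(truth)     # recovers truth with far fewer
                                         # tests than one-at-a-time probing
```

With 200 group tests instead of 50 single-neuron probes repeated for noise averaging, the sparse support is recovered reliably, which is the efficiency gain the abstract points at.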

5 days, 4 hours ago @ paperswithcode.com
openXDATA: A Tool for Multi-Target Data Generation and Missing Label Completion

A common problem in machine learning is to deal with datasets with disjoint label spaces and missing labels.

In this work, we introduce the openXDATA tool that completes the missing labels in partially labelled or unlabelled datasets in order to generate multi-target data with labels in the joint label space of the datasets... To this end, we designed and implemented the cross-data label completion (CDLC) algorithm that uses a multi-task shared-hidden-layer DNN to iteratively complete the sparse label matrix of the instances from the different datasets.

We apply the new tool to estimate labels across four emotion datasets: one labeled with discrete emotion categories (e.g., happy, sad, angr…

5 days, 4 hours ago @ paperswithcode.com
se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains

Tracking the 6D pose of objects in video sequences is important for robot manipulation.

This work proposes a data-driven optimization approach for long-term, 6D pose tracking.

It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model.

Consequently, even when the network is trained only with synthetic data, it can work effectively on real images.

The approach is also the most computationally efficient among the alternatives and achieves a tracking frequency of 90.9Hz.

5 days, 4 hours ago @ paperswithcode.com
Robust Ego and Object 6-DoF Motion Estimation and Tracking

The problem of tracking self-motion as well as motion of objects in the scene using information from a camera is known as multi-body visual odometry and is a challenging task.

This paper proposes a robust solution to achieve accurate estimation and consistent trackability for dynamic multi-body visual odometry... A compact and effective framework is proposed, leveraging recent advances in semantic instance-level segmentation and accurate optical flow estimation.

A novel formulation, jointly optimizing SE(3) motion and optical flow is introduced that improves the quality of the tracked points and the motion estimation accuracy.

The proposed approach is evaluated on the virtual KITTI Dataset …

5 days, 4 hours ago @ paperswithcode.com
Improving Results on Russian Sentiment Datasets

In this study, we test standard neural network architectures (CNN, LSTM, BiLSTM) and recently introduced BERT architectures on previous Russian sentiment evaluation datasets.

We compare two variants of Russian BERT and show that for all sentiment tasks in this study the conversational variant of Russian BERT performs better...

The best results were achieved by BERT-NLI model, which treats sentiment classification tasks as a natural language inference task.

On one of the datasets, this model practically reaches human-level performance.


5 days, 4 hours ago @ paperswithcode.com
Faster Mean-shift: GPU-accelerated Embedding-clustering for Cell Segmentation and Tracking

Recently, single-stage embedding-based deep learning algorithms have gained increasing attention in cell segmentation and tracking.

In this study, we propose a novel Faster Mean-shift algorithm, which tackles the computational bottleneck of embedding based cell segmentation and tracking.

With both embedding simulation and empirical validation via the four cohorts from the ISBI cell tracking challenge, the proposed Faster Mean-shift algorithm achieved 7-10 times speedup compared to the state-of-the-art embedding based cell instance segmentation and tracking algorithm.

Our Faster Mean-shift algorithm also achieved the highest computational speed compared to other GPU benchmarks with optimized memory…
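For orientation, the CPU baseline that such work accelerates looks like the following — vanilla Gaussian-kernel mean-shift over embedding points (the clustering-by-modes usage below is illustrative, not the paper's pipeline):

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, n_iter=30):
    """Vanilla mean-shift with a Gaussian kernel: every point repeatedly
    moves to the kernel-weighted mean of the original data until it
    settles on a cluster mode. O(n^2) per iteration -- the bottleneck
    that GPU variants attack."""
    shifted = points.copy()
    for _ in range(n_iter):
        # pairwise squared distances between current and original points
        d2 = ((shifted[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        shifted = (w[:, :, None] * points[None, :, :]).sum(1) / w.sum(1, keepdims=True)
    return shifted

pts = np.concatenate([np.random.default_rng(0).normal(0.0, 0.1, (20, 2)),
                      np.random.default_rng(1).normal(5.0, 0.1, (20, 2))])
modes = mean_shift(pts, bandwidth=0.5)   # points collapse onto two modes
```

Points whose converged modes coincide belong to one cluster — for cell embeddings, one instance — so the per-iteration pairwise-distance step dominates the cost.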

5 days, 4 hours ago @ paperswithcode.com
Flower: A Friendly Federated Learning Research Framework

Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared prediction model, while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store the data in the cloud.

However, FL is difficult to implement and deploy in practice, considering the heterogeneity in mobile devices, e.g., different programming languages, frameworks, and hardware accelerators...

Although there are a few frameworks available to simulate FL algorithms (e.g., TensorFlow Federated), they do not support implementing FL workloads on mobile devices.

In this paper, we present Flower (https://flower.dev/), a FL f…

5 days, 4 hours ago @ paperswithcode.com
Lifelong Incremental Reinforcement Learning with Online Bayesian Inference

A central capability of a long-lived reinforcement learning (RL) agent is to incrementally adapt its behavior as its environment changes, and to incrementally build upon previous experiences to facilitate future learning in real-world scenarios.

In this paper, we propose LifeLong Incremental Reinforcement Learning (LLIRL), a new incremental algorithm for efficient lifelong adaptation to dynamic environments... We develop and maintain a library that contains an infinite mixture of parameterized environment models, which is equivalent to clustering environment parameters in a latent space.

During lifelong learning, we employ the expectation maximization (EM) algorithm with online Bayesian inf…

5 days, 4 hours ago @ paperswithcode.com
Weakly Supervised 3D Object Detection from Point Clouds

A crucial task in scene understanding is 3D object detection, which aims to detect and localize the 3D bounding boxes of objects belonging to specific classes.

Existing 3D object detectors heavily rely on annotated 3D bounding boxes during training, while these annotations could be expensive to obtain and only accessible in limited scenarios... Weakly supervised learning is a promising approach to reducing the annotation requirement, but existing weakly supervised object detectors are mostly for 2D detection rather than 3D.

In this work, we propose VS3D, a framework for weakly supervised 3D object detection from point clouds without using any ground truth 3D bounding box for training.

First…

5 days, 4 hours ago @ paperswithcode.com
📓 Cool Blogs
ODS.ai Habr
last post 1 month, 1 week ago
«Reading papers for you» digest. May 2020. Part 2

Hi, Habr! We continue publishing reviews of scientific papers written by members of the Open Data Science community in the #article_essense channel. Want to get them before everyone else — join the community!

Today's papers: ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks (China, 2020)

TAPAS: Weakly Supervised Table Parsing via Pre-training (Google, 2020)

DeepFaceLab: A simple, flexible and extensible faceswapping framework (2020)

End-to-End Object Detection with Transformers (Facebook AI, 2020)

Language Models are Few-Shot Learners (OpenAI, 2020)

TabNet: Attentive Interpretable Tabular Learning (Google Cloud AI, 2020) Read more →

1 month, 1 week ago @ habr.com
«Reading papers for you» digest. May 2020. Part 1

Hi, Habr! We continue publishing reviews of scientific papers written by members of the Open Data Science community in the #article_essense channel. Want to get them before everyone else — join the community!

Today's papers: Efficient Document Re-Ranking for Transformers by Precomputing Term Representations; EARL: Speedup Transformer-based Rankers with Pre-computed Representation (2020)

MakeItTalk: Speaker-Aware Talking Head Animation (Adobe, University of Massachusetts Amherst, Huya, 2020)

Jukebox: A Generative Model for Music (OpenAI, 2020)

Recipes for building an open-domain chatbot (Facebook AI Research, 2020)

One-Shot Object Detection without Fine-Tuning (HKUST, Hong Kong, Tencent, 2020)

f-BRS: Rethinki…

1 month, 2 weeks ago @ habr.com
«Reading papers for you» digest. April 2020. Part 2

Hi, Habr! We continue publishing reviews of scientific papers written by members of the Open Data Science community in the #article_essense channel. Want to get them before everyone else — join the community!

Today's papers: Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (Georgia Institute of Technology, Atlanta, USA, 2016)

X3D: Expanding Architectures for Efficient Video Recognition (Facebook AI Research, 2020)

Adaptive Attention Span in Transformers (Facebook AI Research, 2019)

ResNeSt: Split-Attention Networks (Amazon, 2020)

Weight Standardization (Johns Hopkins University, 2019)

Supervised Contrastive Learning (Google Research, MIT, 2020)

Improved Training Speed, Accurac…

2 months ago @ habr.com
«Reading papers for you» digest. April 2020. Part 1

Hi, Habr! We continue publishing reviews of scientific papers written by members of the Open Data Science community in the #article_essense channel. Want to get them before everyone else — join the community!

Today's papers: TResNet: High Performance GPU-Dedicated Architecture (DAMO Academy, Alibaba Group, 2020)

Controllable Person Image Synthesis with Attribute-Decomposed GAN (China, 2020)

Learning to See Through Obstructions (Taiwan, USA, 2020)

Tracking Objects as Points (UT Austin, Intel Labs, 2020)

CookGAN: Meal Image Synthesis from Ingredients (USA, UK, 2020)

Designing Network Design Spaces (FAIR, 2020)

Gradient Centralization: A New Optimization Technique for Deep Neural Networks (Hong Kong, Alibaba, 2…

2 months, 2 weeks ago @ habr.com
We can't afford to burn out our healers — protect them now

TL;DR: who gets hurt more by reshuffles — we measure it with graph convolutions.

Code: RolX and a vanilla three-layer GCN on motifs. I first ran into workplace burnout early in my career — and have been keenly interested in the subject ever since. Picture the setting: a large SAP implementation project. High stakes. Ambitious deadlines. Everyone took the pressure in their own way. Some snapped and withdrew from their duties, some became more toxic; at one point I lost my sense of humour myself. Not for long. Change management (a discipline aimed at reducing stress during information-system rollouts) owes a lot to medics. First of all, the phenomenon of emotional burnout itself was first …

3 months ago @ habr.com
«Reading papers for you» digest. March 2020. Part 2

Hi, Habr! We continue publishing reviews of scientific papers written by members of the Open Data Science community in the #article_essense channel. Want to get them before everyone else — join the community! The first part of the March collection of reviews was published earlier.

Today's papers: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (UC Berkeley, Google Research, UC San Diego, 2020)

Scene Text Recognition via Transformer (China, 2020)

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization (Imperial College London, Google Research, 2019)

Lagrangian Neural Networks (Princeton, Oregon, Google, Flatiron, 2020)

Deformable Style Transfer (Chicago, USA, 2020)

Rethinking…

3 months, 3 weeks ago @ habr.com
"We Read the Papers for You" Digest. March 2020. Part 1

Hi, Habr! We continue publishing reviews of research papers from members of the Open Data Science community, taken from the #article_essense channel. Want to get them before everyone else? Join the community!

Today's papers: Fast Differentiable Sorting and Ranking (Google Brain, 2020)

MaxUp: A Simple Way to Improve Generalization of Neural Network Training (UT Austin, 2020)

Deep Nearest Neighbor Anomaly Detection (Jerusalem, Israel, 2020)

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch (Google, 2020)

SpERT: Span-based Joint Entity and Relation Extraction with Transformer Pre-training (RheinMain University, Germany, 2019)

High-Resolution Daytime Translation Without Domain Labels (Samsung AI Cen…

3 months, 3 weeks ago @ habr.com
Machine Learning in R with the mlr3 Package

Source: https://mlr3book.mlr-org.com/ Hi, Habr! In this post we look at the most carefully designed approach to machine learning in R available today: the mlr3 package and the ecosystem around it. The approach is based on "proper" OOP via R6 classes and on representing all operations on data and models as a computation graph. This makes it possible to build orderly and flexible machine learning pipelines, although at first it may seem complicated and confusing. Below we try to bring some clarity and motivate you to use mlr3 in your projects. Contents: a bit of history and a comparison with competing solutions

Technical details: R6 classes and the pack…

3 months, 4 weeks ago @ habr.com
The Spread of a Spherical Horse in a Vacuum Across Russia

Greetings from ODS. We took up tutu.ru's idea of working with their dataset of passenger traffic across Russia. And while Milfgard's post has a huge table of conclusions and popular science, we want to tell you what's under the hood.

What, yet another post about COVID-19? Yes, but no. We were interested in it specifically from the standpoint of mathematical methods and of working with an interesting dataset. Before you see the pretty pictures and charts below the cut, I have to say a few things: any modelling is a very complex process with an incredible number of IFs and LET'S-ASSUMEs inside. We will tell you about them.

Those who worked on this article are not epidemiologists or virologists. We are simply a group of graph theory enthusiasts practising m…

4 months, 1 week ago @ habr.com
"We Read the Papers for You" Digest. January–February 2020

Hi, Habr! We continue publishing reviews of research papers from members of the Open Data Science community, taken from the #article_essense channel. Want to get them before everyone else? Join the community!

Reviews of 11 papers on Computer Vision, Natural Language Processing, Reinforcement Learning, and other topics. Read more →

4 months, 2 weeks ago @ habr.com
Tuning the Loss Function for a Neural Network on Seismic Survey Data

In the previous article we described an experiment to determine the minimum number of manually labelled slices needed to train a neural network on seismic survey data. Today we continue this topic by choosing the most suitable loss function. We examine two basic classes of functions, Binary Cross-Entropy and Intersection over Union, in 6 variants with parameter tuning, as well as combinations of functions from different classes. We also consider regularization of the loss function. Spoiler: we managed to substantially improve the quality of the network's predictions. Read more →

5 months, 2 weeks ago @ habr.com
Open Course "Deep Learning in NLP" from the Creators of DeepPavlov, Based on the cs224n Course

Hi everyone!

If you have a question about the course, check the Q&A section below.

Introduction

My name is Alexey Klokov, and I want to tell you about the launch of a great course on Natural Language Processing, run once again by the MIPT folks behind DeepPavlov, an open library for conversational artificial intelligence developed in the Neural Networks and Deep Learning Lab at MIPT. I thank them and Moryshka for permission to cover this topic in our ODS blog on Habr. So, let's go! Read more →

6 months ago @ habr.com
"We Read the Papers for You" Digest. October–December 2019

Hi, Habr! We continue publishing reviews of research papers from members of the Open Data Science community, taken from the #article_essense channel. Want to get them before everyone else? Join the community!

Today's papers: Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring (Facebook, 2019)

Implicit Discriminator in Variational Autoencoder (Indian Institute of Technology Ropar, 2019)

Self-training with Noisy Student improves ImageNet classification (Google Research, Carnegie Mellon University, 2019)

Momentum Contrast for Unsupervised Visual Representation Learning (Facebook, 2019)

Benchmarking Neural Network Robustness to Common Corruptions and …

6 months, 1 week ago @ habr.com
SVM Explained from Scratch and Implemented in Python. A Detailed Look at the Support Vector Machine

Hello to everyone who has chosen the path of the ML samurai!

Introduction:

In this article we examine the Support Vector Machine (SVM) for the classification task. We present the main idea of the algorithm, the derivation of its weight updates, and a simple from-scratch implementation. Using an example dataset, we demonstrate the algorithm on linearly separable and non-separable data, with visualisations of training and prediction. We also discuss the algorithm's pros and cons and its modifications. Figure 1. A photo of an iris flower from open sources. Read more →
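As a taste of what such a from-scratch implementation can look like, here is a minimal linear SVM trained by subgradient descent on the hinge loss. This is a sketch of the general technique, not the article's own code; the data and all parameter values are illustrative.

```python
import numpy as np

def train_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via subgradient descent on the hinge loss; y in {-1, +1}."""
    rng = np.random.default_rng(6)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                 # point inside the margin: push it out
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                          # correctly classified: only regularize
                w -= lr * lam * w
    return w, b

# linearly separable toy data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_svm(X, y)
preds = np.sign(X @ w + b)
print((preds == y).all())
```

The `lam` term implements the usual margin-maximising weight shrinkage; swapping in a kernel would require the dual formulation the article derives.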

6 months, 2 weeks ago @ habr.com
TensorRT 6.x.x.x: High-Performance Inference for Deep Learning Models (Object Detection and Segmentation)

It only hurts the first time! Hi everyone! Dear friends, in this article I want to share my experience using TensorRT and RetinaNet, based on the repository github.com/aidonchuk/retinanet-examples (a fork of NVIDIA's official repo that lets you put optimized models into production as quickly as possible). Scrolling through the messages in the ods.ai community channels, I keep running into questions about using TensorRT, and the questions mostly repeat, so I decided to write as complete a guide as I can on fast inference with TensorRT, RetinaNet, Unet, and Docker. Read more →

6 months, 2 weeks ago @ habr.com
inFERENCe
last post 8 months, 3 weeks ago
Meta-Learning Millions of Hyper-parameters using the Implicit Function Theorem

November 14, 2019. Last night on the train I read this nice paper by David Duvenaud and colleagues.

Implicit Function Theorem. Many - though not all - meta-learning or hyperparameter optimization problems can be stated as nested optimization problems.

Using a finite truncation of the Neumann series, one can approximate the inverse Hessian in the following way: $$\left[\frac{\partial^2 \mathcal{L}_T}{\partial \theta \partial \theta}\right]^{-1} \approx \sum_{i=0}^j \left(I - \frac{\partial^2 \mathcal{L}_T}{\partial \theta \partial \theta}\right)^i.$$

Most crucially, methods based on implicit gradients assume that your le…
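The truncated Neumann series is easy to sketch in code: the inverse-Hessian-vector product is accumulated term by term using only Hessian-vector products, which is the whole point for million-parameter hypergradients. A hedged illustration with a hypothetical small SPD Hessian, not the paper's implementation:

```python
import numpy as np

def neumann_inv_hvp(hvp, v, j):
    """Approximate H^{-1} v by the truncated Neumann series
    sum_{i=0}^{j} (I - H)^i v, valid when the eigenvalues of H lie in (0, 2)."""
    p = v.copy()      # current term (I - H)^i v
    acc = v.copy()    # running sum, starting from the i = 0 term
    for _ in range(j):
        p = p - hvp(p)   # multiply the current term by (I - H)
        acc += p
    return acc

# toy check against the exact solve (small made-up SPD Hessian)
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
H = A @ A.T / 50 + 0.5 * np.eye(5)   # SPD with spectrum inside (0, 2)
v = rng.normal(size=5)
approx = neumann_inv_hvp(lambda u: H @ u, v, 200)
exact = np.linalg.solve(H, v)
print(np.allclose(approx, exact, atol=1e-6))
```

In the paper's setting `hvp` would be an automatic-differentiation Hessian-vector product of the training loss, so the Hessian is never materialised.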

8 months, 3 weeks ago @ inference.vc
The secular Bayesian: Using belief distributions without really believing

October 31, 2019. The religious Bayesian. My parents didn't raise me in a religious tradition.

The secular Bayesian. Over the years I came to terms with my Bayesian heritage, and I now live my life as a secular Bayesian.

This choice is the real reason why the resulting update rule will end up very Bayes-rule like, as we will see later.

Rationality. Now that we have an update rule which satisfies our desiderata, can we say if it's actually a good or useful update rule?

So, not only is this update rule the only update rule that satisfies the desired properties, it is also optimal under this particular definition of optimality/ra…

9 months, 1 week ago @ inference.vc
Exponentially Growing Learning Rate? Implications of Scale Invariance induced by Batch Normalization

October 25, 2019. Exponentially Growing Learning Rate?

Implications of Scale Invariance induced by Batch Normalization. Yesterday I read this intriguing paper about the mind-boggling fact that it is possible to use an exponentially growing learning rate schedule when training neural networks with batch normalization: Zhiyuan Li and Sanjeev Arora (2019), An Exponential Learning Rate Schedule for Deep Learning. The paper provides both theoretical insights and an empirical demonstration of this remarkable property.

So imagine doing vanilla gradient descent (no momentum, no weight decay, a fixed learning rate) on such a loss surface.

However, the weight vector won't completely blow up to infinity, because th…
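The scale invariance at the heart of the argument can be verified numerically: batch normalization leaves a layer's output (essentially) unchanged when the incoming weights are rescaled, so a growing weight norm behaves like a shrinking effective learning rate. A minimal sketch, assuming a BN layer without learned affine parameters:

```python
import numpy as np

def bn(z, eps=1e-5):
    # batch normalization across the batch dimension (no learned affine part)
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

rng = np.random.default_rng(1)
x = rng.normal(size=(64, 10))
w = rng.normal(size=(10, 4))   # weights of a hypothetical linear layer

out1 = bn(x @ w)
out2 = bn(x @ (10.0 * w))      # rescale the incoming weights by 10x
# invariance is exact in the limit eps -> 0; eps makes it approximate
print(np.allclose(out1, out2, atol=1e-4))
```

Because the output is invariant to the weight scale, the gradient with respect to the weights shrinks as the weights grow, which is the mechanism an exponential learning rate schedule can compensate for.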

9 months, 2 weeks ago @ inference.vc
On Marginal Likelihood and Cross-Validation

The marginal likelihood and cross-validation. To discuss the connection between the marginal likelihood and (Bayesian) cross-validation, let's first define what is what.

For each of these permutations we can decompose the marginal likelihood as a product of conditionals, or equivalently we can write the log marginal likelihood as a sum of logs of the same conditionals.

So, the sum of all the terms in this matrix gives the marginal likelihood times 6 (as there are 6 columns).

This observation gives a really good motivation for using the marginal likelihood, and also gives a new perspective on how it works.

Calculating the marginal likelihood amounts to evaluating the average predictive score on al…
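The decomposition into conditional predictive scores is easy to check on a conjugate toy model: for a Beta(1,1)-Bernoulli model, the log marginal likelihood is the sum of log posterior-predictive conditionals, and it is identical for every ordering of the data. A sketch, not the post's code:

```python
import numpy as np
from itertools import permutations
from math import log

def log_marginal(seq):
    """Log marginal likelihood of a 0/1 sequence under Beta(1,1)-Bernoulli,
    computed as a sum of log posterior-predictive conditionals."""
    a = b = 0          # heads and tails seen so far
    total = 0.0
    for y in seq:
        p_heads = (a + 1) / (a + b + 2)   # posterior predictive P(heads)
        total += log(p_heads if y == 1 else 1 - p_heads)
        a += y
        b += 1 - y
    return total

data = (1, 1, 0, 1)
vals = {round(log_marginal(p), 10) for p in permutations(data)}
print(len(vals))   # every ordering gives the same log marginal likelihood
```

Each conditional in the sum is exactly a "predict the next point given the previous ones" score, which is what ties the marginal likelihood to the cross-validation view described above.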

9 months, 3 weeks ago @ inference.vc
Notes on iMAML: Meta-Learning with Implicit Gradients

September 19, 2019. This week I read this cool new paper on meta-learning: it takes a slightly different approach compared to its predecessors, based on some observations about differentiating the optima of regularized optimization.

Let me illustrate what that dependence looks like. In the figure above, let's say that we would like to minimise an objective function $f(\theta)$.

Rather than deterministically finding a particular local minimum, SGD samples different minima: when run with different random seeds it will find different minima.

The meta-learning objective now depends on $\theta_0$ in two different ways: as we change the anchor $\theta_0$,…

10 months, 2 weeks ago @ inference.vc
Invariant Risk Minimization: An Information Theoretic View

July 19, 2019. I finally got around to reading this new paper by Arjovsky et al.

Here, I will describe the main idea and then provide an information theoretic view on the same topic.

$Y \perp\mkern-13mu\perp E\vert X_1, W$: The observable $X_1$ and latent $W$ shield the label $Y$ from the influence of the environment.

Say we have a parametric family of functions $f(y\vert \phi(x); \theta)$ for predicting $y$ from $\phi(x)$.

The conditional information can be approximated as follows:\begin{align}I[Y, E \vert \phi(x)] &\approx \min_\theta {E}_{x,y} \ell (f(y\vert \phi(x); \theta) - \mathbb{E}_e \min_{\theta_e} \mathbb{E}_{x,y\vert e} \el…

1 year ago @ inference.vc
ICML Highlight: Contrastive Divergence for Combining Variational Inference and MCMC

Ruiz and Titsias (2019), A Contrastive Divergence for Combining Variational Inference and MCMC. Background: the principle of minimal improvement. First, some background on why I found this paper particularly interesting.

Using such an improvement operator, you can define an objective function for policies by measuring the extent to which the operator changes a policy.

In the case of AlphaGo Zero, the improvement operator is Monte Carlo Tree Search (MCTS).

The paper I'm talking about uses a very similar argument to come up with a contrastive divergence for variational inference, where the improvement operator is an MCMC step.

Combining VI with MCMC. The two dominant ways of performing inference in latent var…

1 year, 1 month ago @ inference.vc
Notes on the Limitations of the Empirical Fisher Approximation

June 6, 2019. This post is a short note on an excellent recent paper on empirical Fisher information matrices: Kunstner, Balles and Hennig (2019), Limitations of the Empirical Fisher Approximation. I was debating with myself whether I should write a post about this, because it's a superbly written paper that you should probably read in full.

There isn't a whole lot of novelty in the paper, but it is a great discussion paper that provides a concise overview of the Fisher information, the empirical Fisher matrix, and their connections to generalized Gauss-Newton methods.

The third shows the gradients corrected by the empirical Fisher instea…

1 year, 2 months ago @ inference.vc
Perceptual Straightening of Natural Videos

May 30, 2019. Video is an interesting domain for unsupervised, or self-supervised, representation learning.

So, for example, straight trajectories have an almost $0$ probability under a high-dimensional Brownian motion or Ornstein–Uhlenbeck (OU) process.

Results and Summary. The main result of the paper, as expected, is that natural video sequences indeed appear to be mapped to straight trajectories in representation space.

For one, the paper assumes a Gaussian observation noise in representation space, and I wonder how robust the analysis would be to assuming heavy-tailed noise.

Similarly, our very definition of straightness and angles relies on the…

1 year, 2 months ago @ inference.vc
DeepSets: Modeling Permutation Invariance

February 7, 2019. Guest post by [Fabian Fuchs](https://twitter.com/FabianFuchsML), [Ed Wagstaff](https://github.com/edwag), and [Martin Engelcke](https://twitter.com/martinengelcke). One of my favourite recent innovations in neural network architectures is Deep Sets.

In such a situation, the invariance property we can exploit is permutation invariance.

To give a short, intuitive explanation for permutation invariance, this is what a permutation invariant function with three inputs would look like: $f(a, b, c) = f(a, c, b) = f(b, a, c) = \dots$.

The Deep Sets Architecture (Sum-Decomposition). Having established that there is a need for permutat…
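The sum-decomposition idea fits in a few lines: apply an encoder phi to each element independently, pool with a sum, then apply a readout rho; the sum is what makes the whole function permutation invariant. The weights and sizes below are arbitrary illustrations, not the post's model:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
W_phi = rng.normal(size=(3, 8))   # per-element encoder phi (made-up sizes)
W_rho = rng.normal(size=(8, 1))   # readout rho applied to the pooled sum

def deep_set(X):
    """f(X) = rho(sum_i phi(x_i)): the sum pooling ignores element order."""
    h = np.tanh(X @ W_phi)                        # phi on each element
    return float(np.tanh(h.sum(axis=0) @ W_rho))  # pool, then rho

X = rng.normal(size=(4, 3))       # a "set" of 4 elements with 3 features each
outs = {round(deep_set(X[list(p)]), 12) for p in permutations(range(4))}
print(len(outs))   # 1: every ordering of the set gives the same output
```

Replacing the sum with max or mean pooling keeps the invariance; the original paper's result is that sum-decompositions of this shape are in fact universal for set functions.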

1 year, 5 months ago @ inference.vc
Causal Inference 3: Counterfactuals

You hopefully know enough about causal inference by now to know that $p(🎓\vert 🧔=0)$ is certainly not the quantity we seek.

Counterfactual queries. To finally explain counterfactuals, I have to step beyond causal graphs and introduce another concept: structural equation models.

Structural Equation Models. A causal graph encodes which variables have a direct causal effect on any given node; we call these the causal parents of the node.

$f_1$ computes $x$ from its causal parent $u$, and $f_2$ computes $a$ from its causal parents $x$ and $v$.

The structural equation model (SEM) entails the causal graph, in that you can reconstruct the causal graph by looking at the inputs of each function.
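A structural equation model of this shape is tiny to write down, and it makes interventions concrete: do(x) replaces the mechanism $f_1$ with a constant while leaving $f_2$ untouched. A hypothetical linear SEM in the post's notation, not taken from the post itself:

```python
import numpy as np

rng = np.random.default_rng(3)

# x = f1(u), a = f2(x, v), with exogenous noise variables u and v
f1 = lambda u: 2.0 * u
f2 = lambda x, v: x + 0.5 * v

def sample(n, do_x=None):
    u = rng.normal(size=n)
    v = rng.normal(size=n)
    x = f1(u) if do_x is None else np.full(n, do_x)  # do(x): overwrite f1
    a = f2(x, v)   # the downstream mechanism is untouched by the intervention
    return x, a

x_obs, a_obs = sample(100_000)             # observational regime
x_do, a_do = sample(100_000, do_x=1.0)     # interventional regime
print(round(a_do.mean(), 2))               # E[a | do(x=1)] ≈ 1.0 in this SEM
```

Counterfactuals go one step further than this interventional sampling: they reuse the *same* noise draws $u, v$ inferred from an observed outcome before replacing $f_1$.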

1 year, 6 months ago @ inference.vc
Causal Inference 2: Illustrating Interventions via a Toy Example

Consequently, the joint distribution of data alone is insufficient to predict behaviour under interventions.

Finally, you can use various causal discovery techniques to try to identify the causal diagram from the data itself.

Theoretically, recovering the full causal graph from the data is impossible in general cases.

Summary. We have seen that modeling the joint distribution can only get you so far, and if you want to predict the effect of interventions, i.e.

calculate $p(y\vert do(x))$-like quantities, you have to add a causal graph to your analysis.

1 year, 6 months ago @ inference.vc
Online Bayesian Deep Learning in Production at Tencent

These applications include active learning, reinforcement learning and online/continual learning.

So as I recently read a paper by Tencent, I was surprised to learn that the online Bayesian deep learning algorithm is apparently deployed in production to power click-through-rate prediction in their ad system.

Assumed Density Filtering. The method relies on the approximate Bayesian online-learning technique often referred to as assumed density filtering.

Forward propagation: in Bayesian deep learning, we maintain a distribution $q(w)$ over neural network weights, and each value $w$ defines a conditional probability $p(y\vert x, w)$.

1 year, 8 months ago @ inference.vc
👻Halloween Special: Critical reviews of the worst NIPS 2018 papers.

Posts on machine learning, statistics, and opinions on things I'm reading in the space.

1 year, 9 months ago @ inference.vc
The Blessings of Multiple Causes: Causal Inference when you Can't Measure Confounders

September 7, 2018. Happy back-to-school time, everyone!

In this case, the size of the kidney stone is a confounder variable.

Let's look at how this differs from the non-causal association you would measure between treatment and outcome (i.e.

there may be confounders, but all confounders causally influence at least two of the cause variables.

It identifies just enough about the causal structure (the substitute confounder variable) to then be able to make causal inferences of a certain type.

1 year, 11 months ago @ inference.vc
The Spectator
last post 5 months, 1 week ago
Queer Exceptionalism in Science

Read in 5 mins (800 words). Today's queer scientist is exceptional.

Role of the Queer Scientist. For queer people to hold a recognised role in scientific life requires an acknowledgement that to be queer has consequences.

Challenges Facing Queer Scientists. For the queer scientist, every encounter involves a conscious act of deliberation, risk assessment, and effort, well before any effort of research is begun.

For queer scientists, every new encounter—with a colleague, supervisor, possible letter-writer, examiner, moderator, student, interviewer, acquaintance, or future-friend—sets up a stressful coming-out scene.

To be queer in science is to ask to belong and to be safe.

5 months, 1 week ago @ blog.shakirm.com
Machinery of Grace

The machinery of grace is always simple.

The machines I'm thinking of are machines with intelligence, machines that learn.

Dialogues that lead to co-design and inclusion in the mission of developing intelligent machines with grace.

Firstly, to celebrate our progress in machine learning, but one that must now be balanced using a new critical practice.

If we are successful in making global AI truly global, and I believe we can be, we set ourselves on the path to realising that intelligent machinery of grace.

8 months, 3 weeks ago @ blog.shakirm.com
A New Consciousness of Inclusion in Machine Learning

On LGBT Freedoms and our Support for Machine Learning in Africa. This is an exploration of my thinking and my personal views.

The choice of these host countries has fomented concerns throughout our machine learning community: how can we as a community committed to inclusion in every form consider hosting our conferences in countries like these that are far from inclusive?

A politics of location, and an ethics of inclusion is growing healthily within our machine learning community.

But I too am an out and proud gay machine learning scientist.

My hope is that we will always continue to experiment with the ways in which we organise and support our global machine learning community.

1 year, 1 month ago @ blog.shakirm.com
Racialised Lives and the Life Beyond

The Black woman is racialised, and so too is the White man, as is every person we have ever known, and so the cycle of our racialised lives lives on.

About two-and-a-half years ago, I was part of creating a new organisation called the Deep Learning Indaba, as one attempt to engage with these questions.

The grassroots are those groups within our institutions, like our LGBT resource group within DeepMind, and those outside movements, like the Deep Learning Indaba.

I see the leadership of the Deep Learning Indaba as such a collective.

But I think we show the power of political love today, in this room, with our memory, with our energy, and in the celebration of progress that has brought us her…

1 year, 2 months ago @ blog.shakirm.com
Talk: How Do We Support Under-represented Groups To Put Themselves Forward?

As you think of this question, consider the journey that is taken by the under-represented groups we might have in mind.

Journeys like mine are our struggle credentials.

This room is filled with struggle credentials.

Struggle credentials play too much of a role in our present.

It is the under-represented groups that must eventually be put forward.

1 year, 9 months ago @ blog.shakirm.com
Machine Learning Trick of the Day (8): Instrumental Thinking

The instrumental variables idea is conceptually simple: we introduce new observed variables z, called instrumental variables, into our model; figure 1 (right).

And this is the trick: instrumental variables are a special subset of the data we already have, but they allow us to remove the effect of confounders.

Our problem is to learn a linear value function using features (when in state x) using parameters so that .

But this probabilistic viewpoint through instrumental variables means that we can think of alternative ways of extending this view.

Like every trick in this series, the instrumental variables give us an alternative way to think about existing problems.
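The classic way to exploit an instrument in the linear case is two-stage least squares: project the treatment onto the instrument, then regress the outcome on that projection. A sketch on synthetic data; all coefficients and variable names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Confounder c affects both x and y; instrument z affects y only through x.
z = rng.normal(size=n)
c = rng.normal(size=n)
x = 1.0 * z + 1.0 * c + rng.normal(size=n)
y = 2.0 * x + 3.0 * c + rng.normal(size=n)   # true causal effect of x is 2

naive = (x @ y) / (x @ x)            # OLS of y on x, biased upward by c
x_hat = (z @ x) / (z @ z) * z        # stage 1: project x onto the instrument
iv = (x_hat @ y) / (x_hat @ x_hat)   # stage 2: regress y on the projection
print(round(naive, 1), round(iv, 1))
```

The projection keeps only the part of x driven by z, which by assumption carries no trace of the confounder, so the second regression recovers the causal coefficient.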

1 year, 9 months ago @ blog.shakirm.com
Decolonising Artificial Intelligence

Read in 6 mins (1297 words). The Artificial Intelligence we believe to be global is far from it.

Inevitably, a call will be made to decolonise artificial intelligence.

The call for decolonisation in artificial intelligence is yet to reach its full volume.

Kai-Fu Lee, The Real Threat of Artificial Intelligence, June 2017. We immediately recognise the colonial nature of this possible future.

The only AI that empowers and works for the benefit of humanity is a truly global AI.

1 year, 9 months ago @ blog.shakirm.com
The Price of Transformation

The price of transformation is ours to pay.

Transformation cannot be separated from my other pillars, for they require transformation to succeed.

The price of transformation cannot be paid in this way.

We must all confront the question: What is the price of transformation?

We need to convince ourselves that the price of transformation is something we are willing to pay, and that we should pay.

1 year, 10 months ago @ blog.shakirm.com
Machine Learning Trick of the Day (7): Density Ratio Trick

The same is true if we want to compare probability densities: either through a density difference or a density ratio.

Density ratios are ubiquitous in machine learning, and will be our focus.

Density Ratio Estimation. The central task in the above five statistical quantities is to efficiently compute the ratio.

This is where the density ratio trick, or formally density ratio estimation, enters: it tells us to construct a binary classifier that distinguishes between samples from the two distributions.

This final derivation says that the problem of density ratio estimation is equivalent to that of binary classification.
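The equivalence can be seen directly with the Bayes-optimal classifier: under equal class priors, the optimal classifier outputs $D(x) = p(x)/(p(x)+q(x))$, and its odds $D/(1-D)$ recover the density ratio $p/q$ exactly. A numerical check with two known 1-D Gaussians standing in for the distributions we would only have samples from in practice:

```python
import numpy as np

# Two known densities p and q (illustrative 1-D Gaussians)
p = lambda x: np.exp(-0.5 * (x - 1) ** 2) / np.sqrt(2 * np.pi)
q = lambda x: np.exp(-0.5 * (x + 1) ** 2) / np.sqrt(2 * np.pi)

x = np.linspace(-3, 3, 101)
D = p(x) / (p(x) + q(x))          # Bayes-optimal "came from p?" classifier
ratio_from_classifier = D / (1 - D)
print(np.allclose(ratio_from_classifier, p(x) / q(x)))
```

In practice D is unknown and is fitted, e.g. by logistic regression on samples from the two distributions; the fitted probabilities then play the role of D in the same odds formula.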

2 years, 6 months ago @ blog.shakirm.com
Cognitive Machine Learning (2): Uncertain Thoughts

These types of thinking are secondary levels of thinking: thinking about thinking.

Like the primary colours, our primary thoughts are those that are the basis of our cognition.

Secondary colours use the primary colours as their basis, and similarly, secondary thoughts are thoughts about our primary thoughts.

Our memories, decisions and attitudes are amongst our primary thoughts, and for each we have secondary thoughts—metacognitive confidence assessments—that guide our behaviours.

Again, we can make such assessments in two ways: about the decisions we are still to make, a prospective decision confidence; and decisions we have already made, a retrospective decision confidence.

3 years, 4 months ago @ blog.shakirm.com
大トロ
last post 4 months, 2 weeks ago
Neuroevolution of Self-Interpretable Agents

Agents with a self-attention “bottleneck” not only can solve these tasks from pixel inputs with only 4000 parameters, but they are also better at generalization.

Redirecting to attentionagent.github.io, where the article resides.

4 months, 2 weeks ago @ blog.otoro.net
Learning to Predict Without Looking Ahead

Rather than hardcoding forward prediction, we try to get agents to learn that they need to predict the future.

Redirecting to learningtopredict.github.io, where the article resides.

9 months, 1 week ago @ blog.otoro.net
Weight Agnostic Neural Networks

We search for neural network architectures that can already perform various tasks even when they use random weight values.

Redirecting to weightagnostic.github.io, where the article resides.

1 year, 1 month ago @ blog.otoro.net
Learning Latent Dynamics for Planning from Pixels

PlaNet learns a world model from image inputs only and successfully leverages it for planning in latent space.

Redirecting to planetrl.github.io, where the article resides.

1 year, 5 months ago @ blog.otoro.net
Reinforcement Learning for Improving Agent Design

Little dude rewarded for having little legs.

Redirecting to designrl.github.io, where the article resides.

1 year, 10 months ago @ blog.otoro.net
World Models Experiments

In this article I will give step-by-step instructions for reproducing the experiments in the World Models article (pdf).

For general discussion about the World Models article, there are already some good discussion threads here in the GitHub issues page of the interactive article.

Prerequisite reading: World Models (pdf), A Visual Guide to Evolution Strategies, Evolving Stable Strategies. Optional: Mixture Density Networks, Mixture Density Networks with TensorFlow. Read tutorials on Variational Autoencoders if you are not familiar with them.

I used OS X for inference, but trained the models using Google Cloud VMs.

You should update your git repo with these new models using git add doomrnn/tf_models/*.js…

2 years, 1 month ago @ blog.otoro.net
World Models

Can agents learn inside of their own dreams?

Redirecting to worldmodels.github.io, where the article resides.

2 years, 4 months ago @ blog.otoro.net
Evolving Stable Strategies

for i in range(solver.popsize):
    # init the agent with a solution
    agent = Agent(solutions[i])
    # rollout env with this agent
    fitlist[i] = rollout(agent, env)
# give the scores back to the ES solver
solver.tell(fitlist)

One way to convert into a stochastic policy is to make random.

Robot arm grasping task using a stochastic policy.

The Minitaur model in pybullet is designed to mimic the real physical Minitaur.

After making the ball smaller, CMA-ES was able to find a stochastic policy that can walk and balance the ball at the same time.
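The evaluation loop quoted above can be fleshed out into a runnable sketch. Everything here is a stand-in: a toy (1+λ) hill-climbing ES instead of the CMA-ES used in the article, and a dummy `Agent`/`rollout` in place of a real environment.

```python
import numpy as np

class SimpleES:
    """Minimal (1+lambda) evolution strategy with an ask/tell interface."""
    def __init__(self, num_params, popsize=16, sigma=0.1, seed=0):
        self.popsize = popsize
        self.sigma = sigma
        self.mu = np.zeros(num_params)
        self.best_fitness = -np.inf
        self.rng = np.random.default_rng(seed)

    def ask(self):
        # propose candidate solutions around the current best parameters
        self.eps = self.rng.standard_normal((self.popsize, len(self.mu)))
        return self.mu + self.sigma * self.eps

    def tell(self, fitlist):
        # keep the best candidate only if it improves on the incumbent
        best = int(np.argmax(fitlist))
        if fitlist[best] > self.best_fitness:
            self.best_fitness = fitlist[best]
            self.mu = self.mu + self.sigma * self.eps[best]

class Agent:
    """Stand-in agent: its 'policy' is just the parameter vector."""
    def __init__(self, solution):
        self.params = np.asarray(solution)

def rollout(agent, env_target):
    # stand-in for an environment rollout; higher is better
    return -float(np.sum((agent.params - env_target) ** 2))

env_target = np.array([0.5, -0.3, 0.8])   # pretend-optimal policy parameters
solver = SimpleES(num_params=3)
fitlist = np.zeros(solver.popsize)
for generation in range(300):
    solutions = solver.ask()
    for i in range(solver.popsize):
        agent = Agent(solutions[i])              # init the agent with a solution
        fitlist[i] = rollout(agent, env_target)  # rollout env with this agent
    solver.tell(fitlist)                         # give scores back to the ES
```

The same ask/evaluate/tell shape works unchanged if `SimpleES` is swapped for a real CMA-ES implementation.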

2 years, 8 months ago @ blog.otoro.net
A Visual Guide to Evolution Strategies

In this post I explain how evolution strategies (ES) work with the aid of a few visual examples.

OpenAI published a paper called Evolution Strategies as a Scalable Alternative to Reinforcement Learning where they showed that evolution strategies, while being less data efficient than RL, offer many benefits.

Schaffer-2D Function. Rastrigin-2D Function.
Although there are many definitions of evolution strategies, we can define an evolution strategy as an algorithm that provides the user a set of candidate solutions to evaluate a problem.

Let’s visualise the scheme one more time, on the entire search process on both problems:
Because CMA-ES can adapt both its mean and covariance matrix using inform…
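As a toy illustration of a search distribution adapting both its mean and covariance (a cross-entropy-method-style simplification, not actual CMA-ES), here is a sketch on the Rastrigin-2D function mentioned above:

```python
import numpy as np

def rastrigin_2d(x, y):
    # Rastrigin-2D: global minimum 0 at the origin, many local minima
    return 20 + (x**2 - 10 * np.cos(2 * np.pi * x)) + (y**2 - 10 * np.cos(2 * np.pi * y))

rng = np.random.default_rng(1)
mean = np.array([4.0, 4.0])      # start far from the optimum
cov = 4.0 * np.eye(2)            # initial search distribution
popsize, elite = 100, 25

for generation in range(60):
    # sample a population of candidate solutions
    samples = rng.multivariate_normal(mean, cov, size=popsize)
    fitness = rastrigin_2d(samples[:, 0], samples[:, 1])
    # keep the best 25% (lowest values) and refit mean and covariance to them
    best = samples[np.argsort(fitness)[:elite]]
    mean = best.mean(axis=0)
    cov = np.cov(best.T) + 1e-6 * np.eye(2)
```

The refit step is what lets the ellipse in the article's animations stretch along promising directions and then shrink as the search converges.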

2 years, 9 months ago @ blog.otoro.net
Teaching Machines to Draw

In this work, we investigate an alternative to traditional pixel image modelling approaches, and propose a generative model for vector images.

For example, we can subtract the latent vector of an encoded pig head from the latent vector of a full pig, to arrive at a vector that represents the concept of a body.

As we saw earlier, a model trained to draw pigs can be made to draw pig-like trucks if given an input sketch of a truck.

Exploring the latent space between different objects can potentially enable creative designers to find interesting intersections and relationships between different drawings:
Exploring the latent space between cats and buses, elephants and pigs, and various owls.
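Latent-space exploration of this kind reduces to simple vector operations on encoder outputs. A sketch with stand-in latent vectors (spherical interpolation is one common choice for Gaussian latents; the `z_cat`/`z_bus` names are illustrative):

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation between two latent vectors."""
    z0n, z1n = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))
    if omega < 1e-8:                       # nearly parallel: fall back to lerp
        return (1 - t) * z0 + t * z1
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z_cat, z_bus = rng.standard_normal(128), rng.standard_normal(128)  # stand-in latents

# latent arithmetic: "full pig" minus "pig head" leaves a "body" direction
z_pig, z_pig_head = rng.standard_normal(128), rng.standard_normal(128)
z_body = z_pig - z_pig_head

# walk the latent space from cat to bus; each step would be decoded to a sketch
path = [slerp(z_cat, z_bus, t) for t in np.linspace(0.0, 1.0, 9)]
```

In the real model the encoder produces the latents and the decoder turns each point on the path back into a drawing.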

In …

3 years, 2 months ago @ blog.otoro.net
Recurrent Neural Network Tutorial for Artists

In particular, the experiments in the post help visualise the internals of a recurrent neural network trained to generate handwriting.

Recurrent Neural Network for Handwriting
We have pre-trained a recurrent neural network model to perform the handwriting task described in the previous section.

var x, y;
var dx, dy;
var pen;
var prev_pen;
var rnn_state;
var pdf;
var temperature = 0.65;
var screen_width = window.…

get_pdf(rnn_state); [dx, dy, pen] = Model.…

I haven’t personally used keras.js, and I found it fun to just write the handwriting model from scratch in Javascript.
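The sampling step in the JavaScript fragments above (query the model for a mixture density, then sample (dx, dy, pen) at a temperature) looks roughly like this in Python; the shapes and the sqrt-temperature noise scaling are assumptions, not the post's exact implementation:

```python
import numpy as np

def sample_stroke(pi, mu, sigma, pen_probs, temperature, rng):
    """Sample one (dx, dy, pen) step from a mixture-density output.

    pi: (K,) mixture weights; mu: (K, 2) means; sigma: (K, 2) std devs;
    pen_probs: (2,) probabilities of pen-down / pen-up.
    """
    # temperature sharpens or flattens the categorical mixture weights
    logits = np.log(pi) / temperature
    w = np.exp(logits - logits.max())
    w /= w.sum()
    k = rng.choice(len(pi), p=w)
    # scale the Gaussian noise by sqrt(temperature) as well
    dx, dy = mu[k] + sigma[k] * np.sqrt(temperature) * rng.standard_normal(2)
    pen = rng.choice(2, p=pen_probs)
    return dx, dy, pen

rng = np.random.default_rng(0)
pi = np.array([0.7, 0.3])
mu = np.array([[1.0, 0.0], [0.0, 1.0]])
sigma = np.array([[0.1, 0.1], [0.1, 0.1]])
dx, dy, pen = sample_stroke(pi, mu, sigma, np.array([0.8, 0.2]), 0.65, rng)
```

Lower temperatures concentrate probability on the dominant mixture component and shrink the noise, which is why low-temperature handwriting samples look cleaner.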

3 years, 7 months ago @ blog.otoro.net
Hypernetworks

In our paper, we use HyperNetworks to explore a middle ground - to enforce a relaxed version of weight-tying.

The more exciting work is in the second part of my paper where we apply Hypernetworks to Recurrent Networks.

Dynamic Hypernetworks
As mentioned in the Introduction, we also tried to apply Hypernetworks on Recurrent Networks, and I feel this is the main contribution of the research.

Our approach is to put a small LSTM cell (called the HyperLSTM cell) inside a large LSTM cell (the main LSTM).

For our implementation of Dynamic Hypernetworks, we made it so that we can just plug our HyperLSTM cell into any TensorFlow code written to use tf.nn.rnn_cell objects, since the HyperLSTM inherite…
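The core idea, a small network emitting the weights of a larger one, can be sketched with a static linear hypernetwork (illustrative only; the paper's HyperLSTM is a dynamic, recurrent version of this, and all sizes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# main layer: y = W x, with W of shape (out_dim, in_dim)
in_dim, out_dim, embed_dim = 8, 4, 3

# hypernetwork: a tiny linear map from a learned embedding z to the weights W
z = rng.standard_normal(embed_dim)                      # layer embedding
H = rng.standard_normal((out_dim * in_dim, embed_dim))  # hypernetwork parameters
W = (H @ z).reshape(out_dim, in_dim)                    # generated main weights

x = rng.standard_normal(in_dim)
y = W @ x    # the main layer runs with hypernetwork-generated weights
```

Weight-tying falls out of the construction: many layers can share `H` while each keeps its own small embedding `z`, which is the "relaxed weight-tying" the paper describes.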

3 years, 10 months ago @ blog.otoro.net
Generating Large Images from Latent Vectors - Part Two

Random gaussian latent vectors were generated from numpy.random and fed into the generative network to obtain these images.

Our generator can produce large random images of digits using random gaussian vectors as input.

Unlike the previous model though, the generated images do not necessarily have to look exactly like the set of training images.

All the generator has to do is to create a set of new images that share the same classification labels of the set of training images.

Description of Generator Network
The generator used in the previous model uses 4 large layers of 128 nodes that are fully connected.
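A fully connected generator of this flavor is easy to sketch; the latent size, the 26x26 output, and the weight scale below are made up for illustration, and the weights are random rather than trained:

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_generator(z, params):
    """Toy fully connected generator: latent vector -> flattened image in (0, 1)."""
    h = z
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)                    # hidden layers
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))     # sigmoid pixel intensities

latent_dim, hidden, out_pixels = 32, 128, 26 * 26
sizes = [latent_dim, hidden, hidden, hidden, hidden, out_pixels]
params = [(0.1 * rng.standard_normal((m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

z = rng.standard_normal(latent_dim)    # random Gaussian latent vector
image = fc_generator(z, params).reshape(26, 26)
```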

4 years, 2 months ago @ blog.otoro.net
Neural Network Evolution Playground with Backprop NEAT

This demo will attempt to use a genetic algorithm to produce efficient, but atypical neural network structures to classify datasets borrowed from TensorFlow Playground.

People started experimenting with different neural network configurations, such as how many neural network layers are actually needed to fit a certain data set, or what initial features should be used for another data set.

In addition to weight-search, Deep Learning research has also produced many powerful neural network architectures that are important building blocks.

Evolving Neural Network Topology
Neuroevolution of Augmenting Topologies (NEAT) is a method that can evolve new types of neural networks based on genetic algo…

4 years, 3 months ago @ blog.otoro.net
Interactive Abstract Pattern Generation Javascript Demo

Interactive Javascript Demo for Abstract Pattern Generation.

Although there was some code available previously in Javascript, it wasn’t general enough to use as a tool for a digital artist.

So I took the Javascript code previously written and spent an hour or two fine-tuning it into a simple web app.

In addition, the user is able to specify the size and depth of the generator neural network.

The depth and size of the network, and also the image resolution of the output can all be customised in the web app.

4 years, 3 months ago @ blog.otoro.net
The Unofficial Google Data Science Blog
last post 1 week, 5 days ago
Changing assignment weights with time-based confounders

When assignment weights change in a ramp-up experiment, there are periods of constant assignment weights that we define as epochs.

When there are changing assignment weights and time-based confounders, this complication must be considered either in the analysis or the experimental design.

Epoch: If assignment weights are changed at times $Z^*_1, ..., Z^*_J$ then the assignment weights are constant during $[Z^*_j, Z^*_{j+1})$.
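That epoch definition, constant-weight intervals $[Z^*_j, Z^*_{j+1})$, maps directly onto a binary search over the change times; a minimal helper (names illustrative):

```python
import bisect

def epoch_of(t, change_times):
    """Return the index j of the epoch [Z*_j, Z*_{j+1}) containing time t.

    change_times: sorted times Z*_1, ..., Z*_J at which assignment weights
    changed; epoch 0 is everything before the first change.
    """
    return bisect.bisect_right(change_times, t)

change_times = [10, 20, 40]          # Z*_1, Z*_2, Z*_3 (illustrative)
epoch = epoch_of(15, change_times)   # falls in [Z*_1, Z*_2)
```

`bisect_right` makes the intervals closed on the left: an observation at exactly $Z^*_j$ belongs to the new epoch, matching the half-open definition above.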

An experimenter who changes assignment weights gets the same answer as the experimenter who doesn’t change assignment weights (modulo some rounding errors) so long as they use the adjusted estimator.


1 week, 5 days ago @ unofficialgoogledatascience.com
Humans-in-the-loop forecasting: integrating data science and business planning

Figure 1: A Google data center
As an example, consider Google’s forecasting and planning for data center capacity.

In particular, the data scientist must take responsibility for stakeholders approving the “best” forecast from all available information sources.

It required investments from our data science team to re-think our statistical forecasting approach to make it easier to compare against customer forecasts.

It also owns Google’s internal time series forecasting platform described in an earlier blog post.

But looking through the blogosphere, some go further and posit that “platformization” of forecasting and “forecasting as a service” can turn anyone into a data scientist at the push …

8 months ago @ unofficialgoogledatascience.com
Estimating the prevalence of rare events — theory and practice

$$S(v_1) = S(v_2) \implies \frac{q(v_1)}{p(v_1)} = \frac{q(v_2)}{p(v_2)}$$
The ratio between the importance distribution and target distribution is thus a function of $S(v)$:
$$\frac{q(v)}{p(v)} = \frac{\tilde{q}(S(v))}{\tilde{p}(S(v))}$$
where $\tilde{p}$ and $\tilde{q}$ are PMFs of $S(v)$ under the target distribution and importance distribution respectively.

In our case when the events are rare and the probability of high conditional prevalence rate is small under the target distribution, the difference between the methods is minor.

We also discuss how to choose $q$ with respect to the conditional prevalence rate $g(S(v))=\mathbb{E}_p\left[f(V)|S(V)=S(v)\right]$.
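The estimator this setup implies (reweight samples drawn from the importance distribution $q$ by $p/q$, which depends on $v$ only through the score $S(v)$) can be sketched on synthetic data with a binary score; all distributions below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# target distribution p over a binary score S; we want prevalence E_p[f(V)]
p_score = np.array([0.99, 0.01])           # P_p(S = 0), P_p(S = 1)
q_score = np.array([0.50, 0.50])           # importance distribution oversamples S = 1
prev_given_score = np.array([0.001, 0.2])  # conditional prevalence g(s)

n = 100_000
s = rng.choice(2, size=n, p=q_score)       # sample scores from q
f = rng.random(n) < prev_given_score[s]    # rare positive labels
weights = p_score[s] / q_score[s]          # p(v)/q(v) depends on v only via S(v)
estimate = np.mean(weights * f)

true_prevalence = np.sum(p_score * prev_given_score)  # 0.99*0.001 + 0.01*0.2
```

Oversampling the high-score stratum concentrates labeling effort where positives live, while the importance weights keep the estimator unbiased for the target distribution.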

Conclusion
In this post, we…

11 months, 1 week ago @ unofficialgoogledatascience.com
Misadventures in experiments for growth

In summary, classic experimentation is applicable to fledgling products but in a much more limited way than to established products.

For our music example, we imagined that EDM users don't approximate the target population for some experiments.

The behavior of this single user appears in our data as a large number of impressions with conversions.

A word on growth hackingOf particular concern in growth hacking is the focus on influencers for pushing growth.


1 year, 3 months ago @ unofficialgoogledatascience.com
Crawling the internet: data science within a large engineering system

When queries arrive, the search system matches the inferred meaning of the query to web pages on the basis of these snapshots.

This measure of web page value is on a meaningful linear scale, such that our freshness metric (a weighted average) has an intuitive interpretation.

A global constraint of how much compute and network resources Google itself is willing to dedicate to crawling web pages.

In some regimes (and in practice for google search), a greedy algorithm would devote more recrawl resources towards high value pages, as lower value pages would commonly starve.

We can use this function to sort the web pages, and then determine which web pages should be scheduled for immediate crawl.

2 years ago @ unofficialgoogledatascience.com
Compliance bias in mobile experiments

The differences between the distribution of users experiencing the treatment and the population are likely to be a key factor here.

Compliance Bias A central issue in this application is that users assigned treatment sometimes do not actually experience the treatment at $T_{\mathrm{measure}}$, and furthermore this set of users is not random.

Here, we can draw a direct analogy to Compliance Bias, which is primarily described in literature on the analysis of medical studies.

Propensity scoring within the treatment
Fig 5: Estimated probability of experiencing the treatment in the treatment group.

Here, we ignore any control group, and analyze the treatment group as a self-contained observationa…

2 years, 4 months ago @ unofficialgoogledatascience.com
Designing A/B tests in a collaboration network

Our model considers two aspects of network effects:
Homophily, or similarity within network: users collaborating in a network tend to behave similarly.


The network topology itself is the actual collaboration network we observe for GCP. When users are connected in a network, their treatment assignments can generate network effects through their interactions.

In other words, for the three methods of randomization (uniform random component, uniform random project, stratified random component) we simulate confidence intervals for A/A tests, i.e.

Conclusion
Designing randomized experiments on a network of users is more ch…

2 years, 6 months ago @ unofficialgoogledatascience.com
Unintentional data

The Future of Data Analysis
Avalanche of questions: the role of the data scientist amid unintentional data
Is it relevant to our goals?

In the world of big, unintentional data there are many discoveries to be had which have no bearing on the organization’s goals.

Democratization of analysis: quantity has a quality all its own
Just as dealing with unintentional data shapes the role of data scientists in their organization, it also shapes the day-to-day practice of data analysis.

Understanding the goals of the organization as well as guiding principles for extracting value from data are both critical for success in this environment. Thankfully, not only have modern data analysis tools made da…

2 years, 9 months ago @ unofficialgoogledatascience.com
Fitting Bayesian structural time series with the bsts R package

When fitting bsts models that contain a regression component, extra arguments captured by ... are passed to the SpikeSlabPrior function from the BoomSpikeSlab package.

# Fit a bsts model with expected model size 1, the default.

model2 <- bsts(iclaimsNSA ~ ., state.specification = ss, niter = 1000, data = initial.claims)
# Fit a bsts model with expected model size 5, to include more coefficients.

Figure 10: Regression coefficients for the (a) plain logistic regression model and (b) time series logistic regression model under equivalent spike and slab priors.

These are a widely useful class of time series models, known in various literatures as "structural time series," "state space mod…

3 years ago @ unofficialgoogledatascience.com
Our quest for robust time series forecasting at scale

The demand for time series forecasting at Google grew rapidly along with the company over its first decade.

That is, an attempt to develop methods and tools that would facilitate accurate large-scale time series forecasting at Google.


But like our approach, Prophet aims to be an automatic, robust forecasting tool. Lastly, "forecasting" for us did not mean anomaly detection.

By ERIC TASSONE, FARZAN ROHANI
Time series forecasting enjoys a rich and luminous history, and today is an essential element of most any business operation.

3 years, 3 months ago @ unofficialgoogledatascience.com
Attributing a deep network’s prediction to its input features

For concreteness, let us focus on a deep network that performs object recognition.

Deep networks have multiple layers of logic and coefficients, combined using nonlinear activation functions.

Application to other networks
Our paper also includes application of integrated gradients to other networks (none of these networks were trained by us).
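Integrated gradients itself is a short computation: scale the input along a path from a baseline, average the gradients, and multiply by the input difference. A sketch on a toy function where the answer is exact (midpoint Riemann sum; the toy $f$ is mine, not from the paper):

```python
import numpy as np

def integrated_gradients(f, grad_f, x, baseline, steps=100):
    """Approximate IG_i(x) = (x_i - x'_i) * integral over a in [0,1] of
    df/dx_i evaluated at x' + a (x - x')."""
    alphas = (np.arange(steps) + 0.5) / steps   # midpoint Riemann sum
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# toy "network": f(x) = x0 * x1 + 2 * x2
f = lambda x: x[0] * x[1] + 2.0 * x[2]
grad_f = lambda x: np.array([x[1], x[0], 2.0])

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
attr = integrated_gradients(f, grad_f, x, baseline)
```

The completeness property is easy to check here: the attributions sum to f(x) - f(baseline), which is what makes the method a genuine decomposition of the prediction.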

There is also work (such as this) on architecting deep networks in ways that allow us to understand the internal representations of these networks.

Overall, we hope that deep networks lose their reputation for being impenetrable black-boxes which perform black magic.

3 years, 4 months ago @ unofficialgoogledatascience.com
Causality in machine learning

An obvious attempt to fix this is to upweight randomized data in training, or even train the model solely on the randomized data.

As we observed at the start of this post, standard machine learning techniques don’t distinguish between randomized and observational data the way statistical models do.

Conclusion
In this post we described how some randomized data may be applied both to check and improve the accuracy of a machine learning system trained largely on observational data.

Indeed, machine learning generally lacks the vocabulary to capture the distinction between observational data and randomized data that statistics finds crucial.

Rather, the focus of this post is on combining observa…

3 years, 6 months ago @ unofficialgoogledatascience.com
Practical advice for analysis of large, complex data sets

Some people seemed to be naturally good at doing this kind of high quality data analysis.

Process: Separate Validation, Description, and Evaluation
Validation or Initial Data Analysis: Do I believe the data is self-consistent, that the data was collected correctly, and that the data represents what I think it does?

I think about exploratory data analysis as having 3 interrelated stages. By separating these phases, you can more easily reach agreement with others.

Acknowledge and count your filtering
Almost every large data analysis starts by filtering the data in various stages.


3 years, 9 months ago @ unofficialgoogledatascience.com
Statistics for Google Sheets

Introduction
Statistics for Google Sheets is an add-on for Google Sheets that brings elementary statistical analysis tools to spreadsheet users.

The goal of the Statistics app is to “democratize data science” by putting elementary statistics capabilities in the hands of anyone with a Google account.

If you look closely at the boxplots you can see that returns following down days have slightly greater variation than returns following up days.

Finally, you can use logistic regression to see how a previous day’s return affects the probability of the next day’s return being positive.

Statistics for Google Sheets gives analysts and students the tools to conduct elementary statistical analyses in …

3 years, 10 months ago @ unofficialgoogledatascience.com
Next generation tools for data science

Introduction
That MapReduce was the solution to write data processing pipelines scalable to hundreds of terabytes (or more) is evidenced by its massive uptake.

Widely used in medicine for count data, the MH estimator and its generalizations are ubiquitous within data science at Google.

.filter(lambda x: x != header)

Beam/Dataflow’s sweet spot: streaming processing
Streaming processing is an increasingly important topic for data science.

3 years, 11 months ago @ unofficialgoogledatascience.com
Andrej Karpathy
last post 1 month, 3 weeks ago
Biohacking Lite

The goal of this post is to nerd out over biochemistry and energy metabolism in the animal kingdom, and potentially inspire others on their own biohacking lite adventure.

It’s highly amusing to think that every single time you breathe out (in a fasted state) you are literally breathing out your fat carbon by carbon.

Energy deficit.

To validate the energy deficit math I spent 100 days around late 2019 very carefully tracking my daily energy input and output.

That said, focusing on fat, both approaches show me losing body fat at roughly the same rate, though they are off by an absolute offset.

1 month, 3 weeks ago @ karpathy.github.io
A Recipe for Training Neural Networks

Some few weeks ago I posted a tweet on “the most common neural net mistakes”, listing a few common gotchas related to training neural nets.

1) Neural net training is a leaky abstraction
It is allegedly easy to get started with training neural nets.

This is just a start when it comes to training neural nets.

As a result, (and this is reeaally difficult to over-emphasize) a “fast and furious” approach to training neural networks does not work and only leads to suffering.

focus on training loss) and then regularize it appropriately (give up some training loss to improve the validation loss).

1 year, 3 months ago @ karpathy.github.io
(started posting on Medium instead)

The current state of this blog (with the last post 2 years ago) makes it look like I’ve disappeared.

I’ve certainly become less active on blogs since I’ve joined Tesla, but whenever I do get a chance to post something I have recently been defaulting to doing it on Medium because it is much faster and easier.

I still plan to come back here for longer posts if I get any time, but I’ll default to Medium for everything short-medium in length.

TLDRHave a look at my Medium blog.

2 years, 6 months ago @ karpathy.github.io
A Survival Guide to a PhD

Unlike the undergraduate guide, this one was much more difficult to write because there is significantly more variation in how one can traverse the PhD experience.

You can go one way (PhD -> anywhere else) but not the other (anywhere else -> PhD -> academia/research; it is statistically less likely).

The adviser is an extremely important person who will exercise a lot of influence over your PhD experience.

During your PhD you’ll get to acquire this sense yourself.

It’s usually a painful exercise for me to look through some of my early PhD paper drafts because they are quite terrible.

3 years, 11 months ago @ karpathy.github.io
Deep Reinforcement Learning: Pong from Pixels

This is a long overdue blog post on Reinforcement Learning (RL).

From left to right: Deep Q Learning network playing ATARI, AlphaGo, Berkeley robot stacking Legos, physically-simulated quadruped leaping over terrain.

Policy network.

For example, suppose we compute \(R_t\) for all of the 20,000 actions in the batch of 100 Pong game rollouts above.
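Computing \(R_t\) for every action of an episode is a single backward pass over the reward sequence; a sketch (the Pong-style reward layout below is illustrative):

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Compute R_t = sum over k >= t of gamma^(k-t) * r_k for every step."""
    R = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        R[t] = running
    return R

# a Pong-like episode: zero reward until the final +1 for winning the point
rewards = [0.0] * 9 + [1.0]
R = discounted_returns(rewards)   # R[t] = 0.99 ** (9 - t)
```

In policy gradient training these returns are then used to weight the log-probability gradients of the actions taken.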

The total number of episodes was approximately 8,000 so the algorithm played roughly 200,000 Pong games (quite a lot isn’t it!)

4 years, 2 months ago @ karpathy.github.io
Short Story on AI: A Cognitive Discontinuity.

Another great source of good reputation for Visceral were the large number of famous interventions carried out by autonomous Visceral agents.

The list went on and on - one month ago an autonomous Visceral agent recognized a remote drone attack.

He was running the routine software diagnostics on the Visceral agent and one of them had just failed.

The software diagnostics were only at 5% complete, and Merus knew they would take a while to run to completion.

Merus’ avatar broke the silence in the last second: “Come meet me here.” And then the connection was lost.

4 years, 8 months ago @ karpathy.github.io
What a Deep Neural Network thinks about your #selfie

In this fun experiment we’re going to do just that: We’ll take a powerful, 140-million-parameter state-of-the-art Convolutional Neural Network, feed it 2 million selfies from the internet, and train it to classify good selfies from bad ones.

what if someone posted a very good selfie but it was late at night, so perhaps not as many people saw it and it got fewer likes?

What makes a good #selfie ?

To take a good selfie, Do:
Be female.

Also, with some relief, it seems that the best selfies do not seem to be the ones that show the most skin.

4 years, 9 months ago @ karpathy.github.io
The Unreasonable Effectiveness of Recurrent Neural Networks

A glaring limitation of Vanilla Neural Networks (and also Convolutional Networks) is that their API is too constrained: they accept a fixed-sized vector as input (e.g.

If training vanilla neural nets is optimization over functions, training recurrent nets is optimization over programs.

At the core, RNNs have a deceptively simple API: They accept an input vector x and give you an output vector y .
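That API is literally a `step` function; a vanilla-RNN sketch in the spirit of the post's pseudocode (sizes and initialization scale are arbitrary, and the model is untrained):

```python
import numpy as np

class RNN:
    """Vanilla RNN with the post's minimal API: x in, y out, hidden state inside."""
    def __init__(self, input_size, hidden_size, output_size, seed=0):
        rng = np.random.default_rng(seed)
        self.W_hh = 0.1 * rng.standard_normal((hidden_size, hidden_size))
        self.W_xh = 0.1 * rng.standard_normal((hidden_size, input_size))
        self.W_hy = 0.1 * rng.standard_normal((output_size, hidden_size))
        self.h = np.zeros(hidden_size)

    def step(self, x):
        # update the hidden state from the input, then read out an output vector
        self.h = np.tanh(self.W_hh @ self.h + self.W_xh @ x)
        return self.W_hy @ self.h

rnn = RNN(input_size=4, hidden_size=16, output_size=3)
y = rnn.step(np.ones(4))   # feed an input vector, get an output vector
```

The "program" lives in `self.h`: calling `step` repeatedly on a sequence makes the output at each step depend on the entire history of inputs.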

Fun with RNNsAll 5 example character models below were trained with the code I’m releasing on Github.

These models have about 10 million parameters, which is still on the lower end for RNN models.

5 years, 2 months ago @ karpathy.github.io
Breaking Linear Classifiers on ImageNet

speech recognition systems), and most importantly, also to simple, shallow, good old-fashioned Linear Classifiers (Softmax classifier, or Linear Support Vector Machines, etc.).

Instead, let's fool a linear classifier, and let's also keep with the theme of breaking models on images because they are fun to look at.

With input images of size 64x64x3 and 1000 ImageNet classes we therefore have 64x64x3x1000 = 12.3 million weights (a beefy linear model!).

We can then visualize each of the learned weights by reshaping them as images:
Example linear classifiers for a few ImageNet classes.

A linear classifier with lower regularization (which leads to noisier class weights) is easier to fool (top).
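For a linear classifier the fooling direction has a closed form: move the image along the difference of the target and original weight rows. A sketch with random stand-in weights (not a trained model) and a deliberately exaggerated step size so the flip is unmistakable:

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes, dim = 10, 64 * 64 * 3
W = 0.01 * rng.standard_normal((num_classes, dim))   # stand-in "learned" weights

def predict(x):
    return int(np.argmax(W @ x))

x = rng.random(dim)                  # a stand-in image, pixels in [0, 1)
orig = predict(x)
target = (orig + 1) % num_classes    # any class other than the original

# move the image along the direction that raises the target score and
# lowers the original score; for a linear model this is exact, no iteration
direction = W[target] - W[orig]
x_adv = x + 8.0 * direction / np.linalg.norm(direction)
```

With a trained classifier a much smaller, visually imperceptible step along the same direction already flips the prediction, which is the post's point.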

5 years, 4 months ago @ karpathy.github.io
What I learned from competing against a ConvNet on ImageNet

The 100,000 test set images are released with the dataset, but the labels are withheld to prevent teams from overfitting on the test set.

It’s fun to note that about 4 years ago I performed a similar (but much quicker and less detailed) human classification accuracy analysis on CIFAR-10.

In total, we attribute 24 (24%) of GoogLeNet errors and 12 (16%) of human errors to this category.

We estimate that approximately 22 (21%) of GoogLeNet errors fall into this category, while none of the human errors do.

On the other hand, a large majority of human errors come from fine-grained categories and class unawareness.

5 years, 11 months ago @ karpathy.github.io
Off the Convex Path
last post 4 weeks, 1 day ago
Training GANs - From Theory to Practice

GANs, originally discovered in the context of unsupervised learning, have had far-reaching implications for science, engineering, and society.

However, training GANs remains challenging (in part) due to the lack of convergent algorithms for nonconvex-nonconcave min-max optimization.

In this post, we present a new first-order algorithm for min-max optimization which is particularly suited to GANs.

Conclusion
In this post we have shown how to develop a practical and convergent first-order algorithm for training GANs.

Our simulations show that a version of this algorithm can lead to more stable training of GANs.

4 weeks, 1 day ago @ offconvex.org
An equilibrium in nonconvex-nonconcave min-max optimization

Unlike minimization, where algorithms can always be shown to converge to some local minimum, min-max optimization has no notion of local equilibrium that is guaranteed to exist for general nonconvex-nonconcave functions.

Our greedy min-max equilibriumWe use the greedy max function to define a new second-order notion of local optimality for min-max optimization, which we refer to as a greedy min-max equilibrium.

This allows us to define a notion of greedy min-max equilibrium.

Greedy min-max equilibrium: $(x^{\star}, y^{\star})$ is a greedy min-max equilibrium if …, where $S(x,y):= \mathrm{smooth}_x(\mathrm{truncate}(g(x, y)))$.

Further, for compactly supported convex-concave functions a point is a gre…

1 month, 1 week ago @ offconvex.org
Exponential Learning Rate Schedules for Deep Learning (Part 1)

This blog post concerns our ICLR20 paper on a surprising discovery about learning rate (LR), the most basic hyperparameter in deep learning.

These divergent approaches suggest that LR, the most basic and intuitive hyperparameter in deep learning, has not revealed all its mysteries yet.

SOTA performance with exponential LR
As mentioned, reaching state-of-the-art accuracy requires reducing the learning rate a few times.

Suppose the training has $K$ phases, and the learning rate is divided by some constant $C_I>1$ when entering phase $I$.
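That phase-wise schedule (divide the LR by $C_I > 1$ on entering phase $I$) is ordinary step decay; a small helper with illustrative constants:

```python
def step_decay_lr(step, base_lr, phase_lengths, phase_factors):
    """Learning rate after dividing by C_I when entering each phase I.

    phase_lengths: number of steps in phases 1..K (the last may be open-ended);
    phase_factors: constants C_2..C_K applied on entering phases 2..K.
    """
    lr, boundary = base_lr, 0
    for length, factor in zip(phase_lengths[:-1], phase_factors):
        boundary += length
        if step >= boundary:
            lr /= factor
    return lr

# illustrative 3-phase schedule: divide by 10 at steps 100 and 200
lrs = [step_decay_lr(s, 0.1, [100, 100, 100], [10.0, 10.0]) for s in (0, 150, 250)]
```

The paper's claim is about replacing such piecewise-constant schedules with an exponentially growing LR plus weight decay; the helper only pins down the baseline being compared against.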

Conclusion
We hope that this bit of theory and supporting experiments have changed your outlook o…

3 months, 1 week ago @ offconvex.org
Ultra-Wide Deep Nets and Neural Tangent Kernel (NTK)

gradient flow) is equivalent to a kernel regression predictor with a deterministic kernel called neural tangent kernel (NTK).

Now we describe how training an ultra-wide fully-connected neural network leads to kernel regression with respect to the NTK.

In the large width limit, it turns out that the time-varying kernel $ker_t(\cdot,\cdot)$ is (with high probability) always close to a deterministic fixed kernel $ker_{\mathsf{NTK}}(\cdot,\cdot)$, which is the neural tangent kernel (NTK).
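The time-varying kernel can be probed empirically on any tiny network via its well-known form as an inner product of parameter gradients, $ker_t(x, x') = \langle \nabla_\theta f(x), \nabla_\theta f(x') \rangle$; a sketch with a two-parameter model (of my own choosing) and numerical gradients:

```python
import numpy as np

def net(theta, x):
    """Tiny two-parameter 'network': f(x) = theta1 * tanh(theta0 * x)."""
    return theta[1] * np.tanh(theta[0] * x)

def param_grad(theta, x, eps=1e-6):
    # central-difference gradient of the network output w.r.t. its parameters
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (net(theta + e, x) - net(theta - e, x)) / (2 * eps)
    return g

def empirical_ntk(theta, x1, x2):
    """ker_t(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>."""
    return float(param_grad(theta, x1) @ param_grad(theta, x2))

theta = np.array([0.7, -0.4])
k = empirical_ntk(theta, 1.0, 2.0)
```

For a width-2 toy model this kernel moves substantially during training; the theorem quoted above says that as width grows it stays pinned near the fixed NTK.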

Now, at least we have a better understanding of a class of ultra-wide neural networks: they are captured by neural tangent kernels!

Similarly, one can try to translate other architectures like recurrent neural…

10 months ago @ offconvex.org
Understanding implicit regularization in deep learning by analyzing trajectories of gradient descent

Sanjeev’s recent blog post suggested that the conventional view of optimization is insufficient for understanding deep learning, as the value of the training objective does not reliably capture generalization.

In recent years, researchers have come to realize the importance of implicit regularization induced by the choice of optimization algorithm.

This theorem disqualifies Schatten quasi-norms as the implicit regularization in deep matrix factorizations, and instead suggests that all depths correspond to nuclear norm.

Full details behind our results on “implicit regularization as norm minimi…

1 year ago @ offconvex.org
Landscape Connectivity of Low Cost Solutions for Multilayer Nets

A big mystery about deep learning is how, in a highly nonconvex loss landscape, gradient descent often finds near-optimal solutions —those with training cost almost zero— even starting from a random initialization.

Solutions A and B have low cost but the line connecting them goes through solutions with high cost.

Mode Connectivity.

2019) did try to explain the phenomenon of mode connectivity in simple settings (the first of these demonstrated mode connectivity empirically for multi-layer nets).

Thus to explain mode connectivity for multilayer nets we will need to leverage some stronger property of typical solutions discovered v…

1 year, 1 month ago @ offconvex.org
Is Optimization a Sufficient Language for Understanding Deep Learning?

In this Deep Learning era, machine learning usually boils down to defining a suitable objective/cost function for the learning task at hand, and then optimizing this function using some variant of gradient descent (implemented via backpropagation).

I am suggesting that deep learning algorithms also have important properties that are not always reflected in the objective value.

by playing with batch sizes and learning rates) can be preferable to perfect optimization, even in simple settings such as regression.

NB: Empirically we find that Adam, the celebrated acceleration method for deep learning, speeds up optimization a…

1 year, 2 months ago @ offconvex.org
Contrastive Unsupervised Learning of Semantic Representations: A Theoretical Framework

Semantic representations (aka semantic embeddings) of complicated data types (e.g.

Researchers are most interested in unsupervised representation learning using unlabeled data.

samples $x, x^{+}$ from the distribution $D_{c^+}$.

The highlighted parts in the table show that the unsupervised representations compete well with the supervised representations on the average $k$-way classification task ($k=2, 10$).

We find this to be true for unsupervised representations, and surprisingly for supervised representations as well.

1 year, 4 months ago @ offconvex.org
The search for biologically plausible neural computation: A similarity-based approach

By re-ordering the variables and introducing a new variable, ${\bf W} \in \mathbb{R}^{k\times n}$, we obtain:

To prove the second identity, find the optimal ${\bf W}$ by taking the derivative of the expression on the right with respect to ${\bf W}$ and setting it to zero, and then substitute the optimal ${\bf W}$ back into the expression.

The price paid for this simplification is the appearance of a minimax optimization problem in the variables ${\bf W}$ and ${\bf M}$.

Variables ${\bf W}$ and ${\bf M}$ are represented by the weights of synapses in feedforward and lateral connections respectively.

In neuroscience, learning rules (2.7) for ${\bf W}$ and ${\bf M}$ are called Hebbian and anti-Hebbian r…

1 year, 8 months ago @ offconvex.org
Machine Learning Mastery
latest post 1 hour ago
How to Use XGBoost for Time Series Forecasting

XGBoost can also be used for time series forecasting, although it requires that the time series dataset be transformed into a supervised learning problem first.

You can install it using pip, as follows:

sudo pip install xgboost

Once installed, you can confirm that it was installed successfully and that you are using a modern version by running the following code:

# check xgboost version
import xgboost
print("xgboost", xgboost.__version__)

Time Series Data Preparation: Time series data can be phrased as supervised learning.

XGBoost for Time Series Forecasting: In this section, we will explore how to use XGBoost for time serie…
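The series-to-supervised transform described above can be sketched with a simple sliding window; the helper name and window size here are illustrative, not from the tutorial:

```python
import numpy as np

def series_to_supervised(series, n_in=3):
    """Frame a univariate series as supervised learning: each row holds
    n_in lag values as inputs and the next value as the target."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for i in range(len(series) - n_in):
        X.append(series[i:i + n_in])   # lag window t-n_in .. t-1
        y.append(series[i + n_in])     # value at time t
    return np.array(X), np.array(y)

# toy series for illustration
data = [10, 20, 30, 40, 50, 60, 70]
X, y = series_to_supervised(data, n_in=3)
print(X.shape, y.shape)  # (4, 3) (4,)
```

Each row of X then holds the previous n_in observations, and y holds the value to forecast, which is the tabular shape XGBoost's regressor expects.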

1 hour ago @ machinelearningmastery.com
Repeated k-Fold Cross-Validation for Model Evaluation in Python

Tutorial Overview: This tutorial is divided into three parts: k-Fold Cross-Validation; Repeated k-Fold Cross-Validation; Repeated k-Fold Cross-Validation in Python.

k-Fold Cross-Validation: It is common to evaluate machine learning models on a dataset using k-fold cross-validation.

For more on the k-fold cross-validation procedure, see the tutorial:The k-fold cross-validation procedure can be implemented easily using the scikit-learn machine learning library.

Repeated k-Fold Cross-Validation: The estimate of model performance via k-fold cross-validation can be noisy.

Like k-fold cross-validation itself, repeated k-fold cross-validation is easy to parallelize, where each fold or each repeated…
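A minimal sketch of the procedure with scikit-learn's RepeatedKFold; the dataset and model are stand-ins, not the tutorial's:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_classification(n_samples=100, n_features=10, random_state=1)
# 10 folds, repeated 3 times with different shuffles -> 30 scores in total
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=cv)
print('Accuracy: %.3f (%.3f)' % (scores.mean(), scores.std()))
```

Averaging over the repeats reduces the noise in the single-run k-fold estimate.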

2 days, 1 hour ago @ machinelearningmastery.com
How to Configure k-Fold Cross-Validation

How to calculate the correlation between a cross-validation test harness and an ideal test condition.

For more on the k-fold cross-validation procedure, see the tutorial:The k-fold cross-validation procedure can be implemented easily using the scikit-learn machine learning library.

Accuracy: 0.850 (0.128)

Now that we are familiar with k-fold cross-validation, let’s look at how we might configure the procedure.

That is, do they change together in the same ways: when one algorithm looks better than another via k-fold cross-validation, does this hold on the ideal test condition?

A low correlation suggests the need to change the k-fold cross-validation test harness to b…
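One way to probe the configuration is to run the same model under several candidate values of k and compare the mean and spread of the scores; this sketch uses a synthetic stand-in dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=100, n_features=20, random_state=1)
for k in (3, 5, 10):
    cv = KFold(n_splits=k, shuffle=True, random_state=1)
    scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=cv)
    # a stable mean across k values suggests the estimate is not too sensitive to k
    print('k=%2d: %.3f (%.3f)' % (k, scores.mean(), scores.std()))
```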

5 days, 1 hour ago @ machinelearningmastery.com
Nested Cross-Validation for Machine Learning with Python

This is called double cross-validation or nested cross-validation and is the preferred way to evaluate and compare tuned machine learning models.

In this tutorial, you will discover nested cross-validation for evaluating tuned machine learning models.

Tutorial Overview: This tutorial is divided into three parts: Combined Hyperparameter Tuning and Model Selection; What Is Nested Cross-Validation; Nested Cross-Validation With Scikit-Learn.

Combined Hyperparameter Tuning and Model Selection: It is common to evaluate machine learning models on a dataset using k-fold cross-validation.

As such, the k-fold cross-validation procedure for model hyperparameter optimization is nested inside the k-fol…
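A minimal nested cross-validation sketch: the inner loop tunes hyperparameters via GridSearchCV and the outer loop evaluates the tuned model; the grid and estimator are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, n_features=10, random_state=1)

inner = KFold(n_splits=3, shuffle=True, random_state=1)   # tuning folds
outer = KFold(n_splits=5, shuffle=True, random_state=1)   # evaluation folds
search = GridSearchCV(DecisionTreeClassifier(random_state=1),
                      {'max_depth': [2, 4, 8]}, cv=inner)
# cross_val_score re-runs the whole tuning procedure inside each outer fold
scores = cross_val_score(search, X, y, scoring='accuracy', cv=outer)
print('Accuracy: %.3f (%.3f)' % (scores.mean(), scores.std()))
```

Because tuning happens inside each outer fold, the outer scores estimate the performance of the tuning procedure itself, not of one fixed configuration.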

1 week ago @ machinelearningmastery.com
LOOCV for Evaluating Machine Learning Algorithms

Given the improved estimate of model performance, LOOCV is appropriate when an accurate estimate of model performance is critical.

LOOCV Procedure in Scikit-Learn: The scikit-learn Python machine learning library provides an implementation of the LOOCV via the LeaveOneOut class.

# fit model
model = RandomForestClassifier(random_state=1)
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)

LOOCV to Evaluate Machine Learning Models: In this section, we will explore using the LOOCV procedure to evaluate machine learning models on standard classification and regression predictive modeling …
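A compact LOOCV sketch with the LeaveOneOut class; the dataset and model are stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=50, n_features=10, random_state=1)
cv = LeaveOneOut()  # one fold per sample: 50 fits, one score each
scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=cv)
print('Accuracy: %.3f' % scores.mean())
```

Note the cost: the model is refit once per sample, which is why LOOCV is reserved for small datasets or cases where the estimate really matters.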

1 week, 2 days ago @ machinelearningmastery.com
Train-Test Split for Evaluating Machine Learning Algorithms

How to evaluate machine learning algorithms for classification and regression using the train-test split.

Tutorial Overview: This tutorial is divided into three parts: Train-Test Split Evaluation (When to Use the Train-Test Split; How to Configure the Train-Test Split); Train-Test Split Procedure in Scikit-Learn (Repeatable Train-Test Splits; Stratified Train-Test Splits); Train-Test Split to Evaluate Machine Learning Models (Train-Test Split for Classification; Train-Test Split for Regression).

Train-Test Split Evaluation: The train-test split is a technique for evaluating the performance of a machine learning algorithm.

Train-Test Split Procedure in Scikit-Learn: The scikit-learn Python machine lea…
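The basic call can be sketched as follows; the 0.33 test size and the stratification are illustrative choices:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, n_features=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1, stratify=y)  # stratify keeps class ratios
print(X_train.shape, X_test.shape)
```

Fixing random_state makes the split repeatable across runs, which is what the "Repeatable Train-Test Splits" part of the tutorial refers to.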

1 week, 5 days ago @ machinelearningmastery.com
How to Selectively Scale Numerical Input Variables for Machine Learning

Many machine learning models perform better when input variables are carefully transformed or scaled prior to modeling.

Tutorial Overview: This tutorial is divided into three parts: Diabetes Numerical Dataset; Non-Selective Scaling of Numerical Inputs (Normalize All Input Variables; Standardize All Input Variables); Selective Scaling of Numerical Inputs (Normalize Only Non-Gaussian Input Variables; Standardize Only Gaussian-Like Input Variables; Selectively Normalize and Standardize Input Variables).

Diabetes Numerical Dataset: As the basis of this tutorial, we will use the so-called “diabetes” dataset that has been widely studied as a machine learning dataset since the 1990s.

No…

2 weeks ago @ machinelearningmastery.com
Add Binary Flags for Missing Values for Machine Learning

In this tutorial, you will discover how to add binary flags for missing values for modeling.

How to add a flag that indicates whether a row has one or more missing values, and evaluate models with this new feature.

# select all columns except the target column (index 23)
ix = [i for i in range(data.shape[1]) if i != 23]
X, y = data[:, ix], data[:, 23]
print(X.shape)

Model With a Binary Flag for Missing Values: In the previous section, we replaced missing values with a calculated statistic.

Summary: In this tutorial, you discovered how to add binary flags for missing values for modeling.
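The flagging idea can be sketched directly in NumPy; the tiny array here is invented for illustration:

```python
import numpy as np

# toy dataset with missing values
X = np.array([[1.0, np.nan],
              [2.0, 5.0],
              [np.nan, 6.0]])
miss = np.isnan(X).astype(int)           # per-cell missingness flags
row_flag = miss.any(axis=1).astype(int)  # 1 if the row has one or more missing values
X_flagged = np.hstack([X, miss, row_flag.reshape(-1, 1)])
print(X_flagged.shape)  # (3, 5)
```

In practice the flag columns would be appended before imputation so the model can still see which values were originally missing.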

2 weeks, 2 days ago @ machinelearningmastery.com
How to Create Custom Data Transforms for Scikit-Learn

The solution is to create a custom data transform in scikit-learn using the FunctionTransformer class.

The class can then be used just like any other data transform in scikit-learn, e.g.

This is a type of data cleaning, and there is a data transform provided in scikit-learn called the VarianceThreshold that attempts to address this using the variance of each column.

# split data into inputs and outputs
X, y = data[:, :-1], data[:, -1]
# minimally prepare dataset
X = X…
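A minimal FunctionTransformer sketch; the clipping function is an invented example of a custom transform, not the one from the tutorial:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def clip_outliers(X):
    # cap values at the 1st/99th percentile of each column
    lo, hi = np.percentile(X, [1, 99], axis=0)
    return np.clip(X, lo, hi)

transformer = FunctionTransformer(clip_outliers)
X = np.random.RandomState(1).normal(size=(100, 3))
X_t = transformer.fit_transform(X)
print(X_t.shape)
```

Wrapped this way, the custom function can be dropped into a Pipeline like any built-in scikit-learn transform.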

2 weeks, 5 days ago @ machinelearningmastery.com
How to Grid Search Data Preparation Techniques

An alternative approach is to grid search a suite of common and commonly useful data preparation techniques applied to the raw data.

This can be achieved by designing a grid search of data preparation techniques and/or sequences of data preparation techniques in pipelines.

# split the columns into input and output variables
X, y = data[:, :-1], data[:, -1]
# summarize the shape of the loaded data
print(X.shape)

# minimally prepare dataset
X = X…
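The grid-search idea can be sketched by scoring one pipeline per data preparation candidate; the scaler choices and model are illustrative stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = make_classification(n_samples=100, n_features=10, random_state=1)
results = {}
for name, scaler in [('none', None),
                     ('minmax', MinMaxScaler()),
                     ('std', StandardScaler())]:
    # each candidate is a full pipeline, so the prep step is cross-validated too
    steps = ([('scale', scaler)] if scaler is not None else []) + \
            [('model', LogisticRegression())]
    scores = cross_val_score(Pipeline(steps), X, y, scoring='accuracy', cv=5)
    results[name] = scores.mean()
print(results)
```

The best-scoring entry then points at the data preparation worth investigating further.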

3 weeks ago @ machinelearningmastery.com
Framework for Data Preparation Techniques in Machine Learning

Tutorial Overview: This tutorial is divided into three parts: Challenge of Data Preparation; Framework for Data Preparation; Data Preparation Techniques.

Challenge of Data Preparation: Data preparation refers to transforming raw data into a form that is better suited to predictive modeling.

This framing of data preparation can also feel overwhelming to beginners given the large number and variety of data preparation techniques.

Framework for Data Preparation: Effective data preparation requires that the available data preparation techniques are organized and considered in a structured and systematic way.

For more on these types of data preparation techniques, se…

3 weeks, 2 days ago @ machinelearningmastery.com
6 Dimensionality Reduction Algorithms With Python

In this tutorial, you will discover how to fit and evaluate top dimensionality reduction algorithms in Python.

Dimensionality Reduction Algorithms: There are many algorithms that can be used for dimensionality reduction.

Examples of Dimensionality Reduction: In this section, we will review how to use popular dimensionality reduction algorithms in scikit-learn.

For more on LDA for dimensionality reduction, see the tutorial:The scikit-learn library provides the LinearDiscriminantAnalysis class implementation of Linear Discriminant Analysis that can be used as a dimensionality reduction data transform.

Summary: In this tutorial, you discovered how to fit an…
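A representative sketch using PCA as the dimensionality reduction step inside a pipeline; the component count and model are arbitrary examples, not the tutorial's choices:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=100, n_features=20,
                           n_informative=5, random_state=1)
# reduce 20 inputs to 5 components, then classify
model = Pipeline([('pca', PCA(n_components=5)),
                  ('clf', LogisticRegression())])
scores = cross_val_score(model, X, y, scoring='accuracy', cv=5)
print('Accuracy: %.3f' % scores.mean())
```

The same pipeline shape works for the other transforms mentioned (e.g. swapping PCA for LinearDiscriminantAnalysis).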

3 weeks, 5 days ago @ machinelearningmastery.com
4 Automatic Outlier Detection Algorithms in Python

How to correctly apply automatic outlier detection and removal to the training dataset only to avoid data leakage.

It would be invalid to fit the outlier detection method on the entire training dataset as this would result in data leakage.

# fit the model
model = LinearRegression()
model.fit(X_train, y_train)
# evaluate the model
yhat = model.predict(X_test)

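The leakage-safe pattern can be sketched as follows: the detector is fit on the training split only, flagged rows are dropped, and only then is the model fit; the names and parameters are illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1)

# fit the detector on the training split only -> no data leakage
iso = IsolationForest(contamination=0.1, random_state=1)
mask = iso.fit_predict(X_train) != -1   # -1 marks outliers
X_train, y_train = X_train[mask], y_train[mask]

model = LinearRegression().fit(X_train, y_train)
yhat = model.predict(X_test)            # test split is left untouched
print(X_train.shape)
```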
4 weeks ago @ machinelearningmastery.com
How to Use Feature Extraction on Tabular Data for Machine Learning

In this tutorial, you will discover how to use feature extraction for data preparation with tabular data.

Tutorial Overview: This tutorial is divided into three parts: Feature Extraction Technique for Data Preparation; Dataset and Performance Baseline (Wine Classification Dataset; Baseline Model Performance); Feature Extraction Approach to Data Preparation.

Feature Extraction Technique for Data Preparation: Data preparation can be challenging.

Feature Extraction Approach to Data Preparation: In this section, we explore whether we can improve performance using the feature extraction approach to data preparation.

Summary: In this tutorial, you discovered how to use fea…

1 month ago @ machinelearningmastery.com
Lil'Log
latest post 1 month, 4 weeks ago
Exploration Strategies in Deep Reinforcement Learning

Prediction-based Exploration: The second category of intrinsic exploration bonuses rewards improvement of the agent’s knowledge about the environment.

2007) sketched an idea of using a forward dynamics prediction model to estimate learning progress and assigned intrinsic exploration reward accordingly.

(Image source: Ecoffet, et al., 2020)

After vanilla Go-Explore, Yijie Guo, et al.

Cited as:@article{weng2020exploration, title = "Exploration Strategies in Deep Reinforcement Learning", author = "Weng, Lilian", journal = "lilianweng.github.io/lil-log", year = "2020", url = "https://lilianweng.github.io/lil-log/2020/06/07/exploration-strategies-in-deep-reinforcement-learning.html" }Refer…

1 month, 4 weeks ago @ lilianweng.github.io
The Transformer Family

(2018) added a set of auxiliary losses to enable training a deep Transformer model on character-level language modeling which outperformed LSTMs.

Longer Attention Span (Transformer-XL): The vanilla Transformer has a fixed and limited attention span.

Image Transformer (Parmar, et al. 2018) embraces a formulation of image generation similar to sequence modeling within the Transformer framework.

The top row illustrates the attention connectivity patterns in (a) Transformer, (b) Sparse Transformer with strided attention, and (c) Sparse Transformer with fixed attention.

2019)Cited as:@article{weng2020transformer, title = "The Transformer Family", author = "Weng, Lilian", journal = "lilianweng.githu…

3 months, 4 weeks ago @ lilianweng.github.io
Curriculum for Reinforcement Learning

Next, we will look into several categories of curriculum learning, as illustrated in Fig.

This framework of proposing curriculum automatically through another RL agent was formalized as Teacher-Student Curriculum Learning (TSCL; Matiisen, et al.

(Image source: Jabri, et al 2019)

Learning a latent skill space can be done in different ways, such as in Hausman, et al.

(Image source: Czarnecki, et al., 2018)

Cited as:@article{weng2020curriculum, title = "Curriculum for Reinforcement Learning", author = "Weng, Lilian", journal = "lilianweng.github.io/lil-log", year = "2020", url = "https://lilianweng.github.io/lil-log/2020/01/29/curriculum-for-reinforcement-learning.html" }References[1] Jeffrey L.…

6 months, 1 week ago @ lilianweng.github.io
Self-Supervised Representation Learning

Self-supervised learning opens up a huge opportunity for better utilizing unlabelled data, while learning in a supervised learning manner.

A great summary of how self-supervised learning tasks can be constructed (Image source: LeCun’s talk).

Here is a nicely curated list of papers in self-supervised learning.

Self-supervised representation learning has shown great potential in learning useful state embedding that can be used directly as input to a control policy.

2020)Cited as:@article{weng2019selfsup, title = "Self-Supervised Representation Learning", author = "Weng, Lilian", journal = "lilianweng.github.io/lil-log", year = "2019", url = "https://lilianweng.github.io/lil-log/2019/11/10/self-…

8 months, 4 weeks ago @ lilianweng.github.io
Evolution Strategies

Evolution Strategies (ES) is a type of black-box optimization algorithm, born in the family of Evolutionary Algorithms (EA).

Evolution strategies (ES) belong to the big family of evolutionary algorithms.

Simple Gaussian Evolution Strategies: This is the most basic and canonical version of evolution strategies.

(Image source: Wikipedia CMA-ES)

Natural Evolution Strategies: Natural Evolution Strategies (NES; Wierstra, et al., 2008) optimizes a search distribution over parameters and moves the distribution in the direction of high fitness indicated by the natural gradient.

“Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents.” …

11 months ago @ lilianweng.github.io
Meta Reinforcement Learning

Meta-RL is meta-learning on reinforcement learning tasks.

Define Meta-RL: Meta Reinforcement Learning, in short, is doing meta-learning in the field of reinforcement learning.

Cited as:@article{weng2019metaRL, title = "Meta Reinforcement Learning", author = "Weng, Lilian", journal = "lilianweng.github.io/lil-log", year = "2019", url = "http://lilianweng.github.io/lil-log/2019/06/23/meta-reinforcement-learning.html" }References[1] Richard S. Sutton.

“RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning.” ICLR 2017.

[16] Abhishek Gupta, et al. “Unsupervised meta-learning for Reinforcement Learning.” arXiv preprint arXiv:1806.04640 (2018).

1 year, 1 month ago @ lilianweng.github.io
Domain Randomization for Sim2Real Transfer

Domain Randomization (DR) is a simple but powerful idea of closing this gap by randomizing properties of the training environment.

Domain randomization With domain randomization (DR), we are able to create a variety of simulated environments with randomized properties and train a model that works across all of them.

(Image source: Tobin et al, 2017)

Physical dynamics in the simulator can also be randomized (Peng et al.

Match Real Data DistributionUsing real data to guide domain randomization feels a lot like doing system identification or DA.

Cited as:@article{weng2019DR, title = "Domain Randomization for Sim2Real Transfer", author = "Weng, Lilian", journal = "lilianweng.github.io/lil-log", …

1 year, 3 months ago @ lilianweng.github.io
Are Deep Neural Networks Dramatically Overfitted?

Because of their great capability to capture flexible data representations, deep neural networks have achieved great success in many applications.

(Image source: Zhang’s paper)

Are Deep Learning Models Dramatically Overfitted?

(2018) reconciled the traditional bias-variance trade-offs and proposed a new double-U-shaped risk curve for deep neural networks.

The lottery ticket hypothesis opens a new perspective about interpreting and dissecting deep neural network results.

Cited as:@article{weng2019overfit, title = "Are Deep Neural Networks Dramatically Overfitted?

1 year, 4 months ago @ lilianweng.github.io
Generalized Language Models

As a follow-up to the word embedding post, we will discuss models that learn contextualized word vectors, as well as the new trend of large unsupervised pre-trained language models which have achieved amazing SOTA results on a variety of language tasks.

Large-scale pre-trained language models like OpenAI GPT and BERT have achieved great performance on a variety of language tasks using generic model architectures.

ELMo: ELMo, short for Embeddings from Language Model (Peters, et al. 2018), learns contextualized word representations by pre-training a language model in an unsupervised way.

Bidirectional Language Model: The bidirectional Language Model (biLM) is the foundation for ELMo.

Multi-task ben…

1 year, 6 months ago @ lilianweng.github.io
Object Detection Part 4: Fast Detection Models

Part 4 of the “Object Detection for Dummies” series focuses on one-stage models for fast detection, including SSD, RetinaNet, and models in the YOLO family.

In Part 4, we only focus on fast object detection models, including SSD, RetinaNet, and models in the YOLO family.

Focal Loss: One issue in object detection model training is the extreme imbalance between background that contains no objects and foreground that holds objects of interest.
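The focal loss idea can be sketched in NumPy for the binary case; gamma=2 and alpha=0.25 are the commonly cited RetinaNet defaults, used here only for illustration:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Down-weights easy examples: the (1 - p_t)**gamma factor shrinks the
    loss of well-classified (mostly background) samples."""
    p_t = np.where(y == 1, p, 1 - p)            # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

p = np.array([0.9, 0.1])  # confident-correct vs. confident-wrong prediction
y = np.array([1, 1])
losses = focal_loss(p, y)
print(losses)  # the easy example gets a much smaller loss
```

This is what lets a one-stage detector train on the huge number of easy background anchors without them drowning out the rare foreground ones.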

The comparison of various fast object detection models on speed and mAP performance.

Cited as:@article{weng2018detection4, title = "Object Detection Part 4: Fast Detection Models", author = "Weng, Lilian", journal = "lilianweng.github.io/lil-log", year = "…

1 year, 7 months ago @ lilianweng.github.io
Piekniewski's blog
latest post 1 month, 3 weeks ago
AI - the no bullshit approach

In this post I'd like to share some of that agenda, in what I call the "no bullshit" approach to AI.

And since we don't see these things, we don't label datasets with them and hence these "symbols" never make it to AI, neither from the symbolic approach, nor machine learning approach.

Notably, the things deep learning is most successfully used for these days are not mission critical.

The science way: The scientific approach is really what this blog was all about, before it veered into making cynical posts about the general AI stupidity out there.

Failure of deep learning to deliver on many of its promises will likely lead to a similar winter.

1 month, 3 weeks ago @ blog.piekniewski.info
DeflAition

Full loyalty to the charter is expected, to the point of even varying the compensation by the level of "faith".

It is often better to invest resources in getting slightly better data or adding one more sensor than to train some ridiculously huge deep learning model and expect miracles.

With honesty and integrity rarely found in Silicon Valley, he went in and said what many were whispering for a while - AI is not really "AI".

Deep learning in clinical applications: There was some buzz about deep learning replacing radiologists, nonsense initiated by Hinton and then promptly repeated by Andrew Ng.

The realization that deep learning is not going to cut it with respect to self driving cars and many oth…

3 months, 3 weeks ago @ blog.piekniewski.info
Autonomous vehicle safety myths and facts, 2020 update.

As usual, these numbers do not reliably measure the safety of AVs, and there are plenty of ways to game them or overreport.

Please refer to my last year's post for a deeper discussion (and the 2017 post here, 2018 post here) on why these numbers are essentially flawed.

Nevertheless these are the only official numbers we get, the only glimpse of transparency into this giant corporate endeavor called the "self driving car".

Nevertheless, even Waymo and Cruise disengagement rates are still approximately an order of magnitude away from the upper bound of the human crash rate.

They finally have recorded some autonomous testing miles with the DMV, all 12.2 of them.

5 months ago @ blog.piekniewski.info
The musings of a transformer

Earlier last week I posted a poll on Twitter asking if my readers would like me to post a GPT-generated article.

The images were generated by https://app.generative.photos/ from RosebudAI - a recent hot startup in the AI space.

We've discussed the problems in the above graphic:

In a move to improve safety in space, SpaceX will begin launching small cubesats.

If we consider the way that our brains work, we can think of data as representing information.

Even if a deep neural network could be trained to understand language, we would expect it to produce gibberish.

8 months, 1 week ago @ blog.piekniewski.info
AI update, late 2019 - wizards of Oz

Self driving cars: As time goes on, more and more cracks are showing in the self driving car narrative.

Voyage, another similar startup, now wants to solve the self driving problem with Deep Reinforcement Learning.

Meanwhile Daimler joined the crowd of companies slowly deflating the self driving balloon, to the point of even admitting they'd be cutting spending on it.

Element AI, one of these Canada-based AI wannabe-unicorns with an undefined product or service, raised a flat round and fired their CEO.

Summary: The whole field of AI resembles a giant collective of wizards of Oz.

8 months, 2 weeks ago @ blog.piekniewski.info
Reviewing Rebooting AI

In this post I'd like to focus on the recent book by Gary Marcus and Ernest Davis, Rebooting AI.

Current deep learning models are black boxes and have surprising failure modes, hence cannot be trusted in important applications.

The book appears to argue for more hybrid approaches to leverage the best of both worlds, symbolic good old fashioned AI (GOFAI) with the new wave deep learning AI.

On the other hand, mixing up current deep learning stuff with symbolic methods does not seem to me personally like a road that would get us to actual AI, as in AI that is actually "intelligent".

They observe something I've been explaining in this blog since I started it - nobody really knows what common se…

9 months, 1 week ago @ blog.piekniewski.info
Civilization from scratch

To what extent would you be able to advance the civilization of the given era with all the knowledge in your head (no notebooks)?

Initially the reaction is obviously that, since we all live and breathe the current technical civilization, one should be able to recover almost everything, right?

The way things are mounted, valves controlled, lubrication provided.

Having electricity generated in more appreciable quantities is the basic requirement, since a good 95% of our modern industrial civilization runs on electricity.

We don't generally have the complete knowledge to recover everything from scratch.

1 year ago @ blog.piekniewski.info
AI circus, mid 2019 update

The most hilarious set of events in AI over the past few months revolved around Open AI and Tesla.

Anyway, Open AI, which is apparently no longer open, recently came out with the idea of going for-profit.

I wish I could believe this too, but unfortunately I don't and I think Open AI has turned into a total scam.

This judgement is further reinforced by looking at what some of these Open AI people tweet, take e.g.

Summary: So there you go, the state of AI in mid-2019.

1 year, 2 months ago @ blog.piekniewski.info
Deep learning and shallow data

Many people these days are fascinated by deep learning, as it enabled new capabilities in many areas, particularly in computer vision.

But the success of deep learning and a set of its surprising failure modes teach us a valuable lesson about the data we process.

Deep learning provides statistically powerful detectors without the expense of feature engineering, though one still has to have a lot of labeled data, a lot of GPUs, and a deep learning expert onsite.

In applications where rare but catastrophic failure is acceptable, deep learning will work fine.

I don't think deep learning as practiced right now has anything to do with solving AI.

1 year, 4 months ago @ blog.piekniewski.info
A brief story of Silicon Valley's affair with AI

Once upon a time, in the 1980's there was a magical place called Silicon Valley.

This again allowed Silicon Valley tycoons to move more silicon into the households.

This was a problem for Silicon Valley, things started slowing down.

This is all Silicon Valley could have wished for: a new, highly lucrative application space that in addition required a ton of new silicon for compute requirements.

But neither of these improvements seems a big enough win to justify Silicon Valley's big bets.

1 year, 4 months ago @ blog.piekniewski.info
Autonomous vehicle safety myths and facts, 2019 update

2018 was an important year for self driving as we had seen the first fatal accident caused by an autonomous vehicle (the infamous Uber crash in Arizona).

The precise definition under California law is: “a deactivation of the autonomous mode when a failure of the autonomous technology is detected or when the safe operation of the vehicle requires that the autonomous vehicle test driver disengage the autonomous mode and take immediate manual control of the vehicle.” (Section 227.46 of Article 3.7 (Autonomous Vehicles) of Title 13, Division 1, Chapter 1, California Code of Regulations.)


1 year, 5 months ago @ blog.piekniewski.info
Fooled by data

The original data looks somewhat like this: we see that when we rotate the data, we find a direction along which it is separable; there are two well-defined pancakes, so no wonder the perceptron had no problem finding that separation.

However, if we look at the data after PCA, we find that it is completely mixed up and inseparable.

This data is explicitly constructed to make PCA fail, but the same thing does happen on real data.
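The pancake failure mode is easy to reproduce. A minimal numpy sketch (my own made-up data, not the post's) shows how the top principal component can completely mix two classes that are perfectly separable along a low-variance axis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Two flat "pancakes": huge variance along x, classes offset along y by +/-3.
x = rng.normal(0.0, 10.0, size=2 * n)
y = np.concatenate([rng.normal(3.0, 0.5, n), rng.normal(-3.0, 0.5, n)])
X = np.column_stack([x, y])
labels = np.array([0] * n + [1] * n)

# Top principal component of the centered data (via SVD).
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ vt[0]  # scores along the top component (~ the x axis here)

# Separability by thresholding at zero: near-perfect along y, chance along PC1.
sep_y = np.mean((y > 0) == (labels == 0))
sep_pc1 = max(np.mean((pc1 > 0) == (labels == 0)),
              np.mean((pc1 > 0) == (labels == 1)))
```

PCA keeps the high-variance direction, which here carries no class information at all, so thresholding the top component classifies at chance level while the discarded axis separates the classes perfectly.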

Species  Height  Score
K        T       150
K        T       145
K        S       90
K        S       95
K        S       100
K        S       105
K        S       90
K        S       95
K        S       90
R        T       140
R        T       135
R        T       130
R        T       140
R        T       135
R        T       135
R        T       140
R        S       80
R        S       85
R        S       85

These 20 rows will be enough to make my point.

In general data is typically much more complex than …

1 year, 6 months ago @ blog.piekniewski.info
Elon and the collective

Thunderf00t also has a nice debunking video analysing the alleged cost savings of the drilling Elon bragged about.

Later, in an interview, Elon stated that "it was worth it" and that he "does not have the respect for SEC".

To me it just exposes Elon as an arrogant and narcissistic buffoon, not some capitalist superhero.

Summary: Although resistance to Elon and his fans is futile, and we will all be assimilated, I call bullshit.

I think the crowds of people who think Elon is the savior of mankind will be in for a great disappointment.

1 year, 7 months ago @ blog.piekniewski.info
AI winter - update

Introduction: Almost six months ago (May 28th, 2018) I posted the "AI winter is well on its way" post, which went viral.

First of all, a bit of clarification: some readers misinterpreted my claims as predicting that the AI hype is declining.

Andrew Ng is a rare example of a person who jumped from an academic bubble into an even bigger AI hype bubble.

I'm pretty certain that following Hotz's lead, many of today's AI hype blowers will be screaming how they've been warning about AI winter all along, once the bubble bursts.

Musk reiterated that he believes in the self-driving Tesla fleet; however, full self-driving remains "off menu" as it was too confusing (two years after introduction of …

1 year, 9 months ago @ blog.piekniewski.info
Deep learning - the "why" question.
Deep learning - the "why" question. Deep learning - the "why" question.

There are many, many deep learning models out there doing various things.

Science does not need to make gold out of lead every time, or in the case of machine learning, a real scientific paper in this field does not need to beat some current benchmark.

A scientific paper does not even need to answer any questions, if it happens to ask some good ones.

These are mostly the ones which try to show the deficits of deep learning and engage in a discussion as to why that might be the case.

So next time you read a deep learning paper, try to contemplate these quiet and never explained choices the authors have made.

1 year, 9 months ago @ blog.piekniewski.info
Sebastian Ruder
last post 1 year, 5 months ago
AAAI 2019 Highlights: Dialogue, reproducibility, and more

This post discusses highlights of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).

They provide the illusion of having a dialogue but in fact do not have a clue what we are saying or meaning.

During the panel discussion, Imed Zitouni highlighted that the limitations of current dialogue models affect user behaviour.

Reproducibility: At the Workshop on Reproducible AI, Joel Grus argued that Jupyter notebooks are bad for reproducibility.

Another good resource for reproducibility is the ML reproducibility checklist by Joelle Pineau, which provides a list of items for algorithms, theory, and empirical results to enforce reproducibility.

1 year, 5 months ago @ ruder.io
The 4 Biggest Open Problems in NLP

This post discusses 4 major open problems in NLP based on an expert survey and a panel discussion at the Deep Learning Indaba.

NLP for low-resource scenarios: Dealing with low-data settings (low-resource languages, dialects (including social media text "dialects"), domains, etc.).

Taking a step back, the actual reason we work on NLP problems is to build systems that break down barriers.

Datasets, problems, and evaluation: Perhaps the biggest problem is to properly define the problems themselves.

The final question asked what the most important NLP problems are that should be tackled for societies in Africa.

1 year, 6 months ago @ ruder.io
10 Exciting Ideas of 2018 in NLP

At EMNLP 2018, unsupervised MT hit its stride with two papers from the same two groups that significantly improve upon their previous methods.

2) Pretrained language models: Using pretrained language models is probably the most significant NLP trend this year, so I won't spend much time on it here.

In particular, combining multilingual transfer learning (such as multilingual BERT), unsupervised learning, and meta-learning is a promising direction.

To me this really shows that pretrained language models indeed capture similar properties as computer vision models pretrained on ImageNet.

(EMNLP 2018): This paper proposes an auxiliary task that pretrains span representations by predicting for eac…

1 year, 7 months ago @ ruder.io
EMNLP 2018 Highlights: Inductive bias, cross-lingual learning, and more

The post discusses highlights of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018).

For another beneficial inductive bias for attention, in one of the best papers of the conference, Strubell et al.

In the model by Zhang et al., sentences are viewed as latent variables for summarization.

show that RNN language models can represent filler-gap dependencies and learn a particular subset of restrictions known as island constraints.

Wood-Doughty et al.

1 year, 9 months ago @ ruder.io
HackerNoon Interview

This post is an interview by fast.ai fellow Sanyam Bhutani with me.

This post originally appeared at HackerNoon with a different introduction.

Sanyam: You’re working as a research scientist today at AYLIEN, and you’re a Ph.D. student at Insight Research Centre for Data Analytics.

If you’re interested in doing research, try to choose a particular subproblem not everyone is working on.

Sanyam: Thank you so much for doing this interview.

1 year, 10 months ago @ ruder.io
A Review of the Neural History of Natural Language Processing

Understanding better what information such language models capture consequently is an active research area (Kuncoro et al., 2018; Blevins et al., 2018).

It is becoming increasingly possible to learn a good projection in a completely unsupervised way (at least for similar languages) (Conneau et al., 2018; Artetxe et al., 2018; Søgaard et al., 2018), which opens applications for low-resource languages and unsupervised machine translation (Lample et al., 2018; Artetxe et al., 2018).

Three main types of neural networks became the most widely used: recurrent neural networks, convolutional neural networks, and recursive neural networks.

Recurrent neural networks: Recurrent neural networks (RNNs) a…

1 year, 10 months ago @ ruder.io
ACL 2018 Highlights: Understanding Representations and Evaluation in More Challenging Settings

They find that all models indeed encode a significant amount of syntax and, in particular, that language models learn some syntax.

Another interesting result regarding the generalization ability of language models is due to Lau et al.

who find that a language model trained on a sonnet corpus captures meter implicitly at human-level performance.

Spithourakis and Riedel observe that language models are bad at modelling numerals and propose several strategies to improve them.

To make this easier, I have recently created a document to collect the state of the art across different NLP tasks.

2 years ago @ ruder.io
NLP's ImageNet moment has arrived

Such methods herald a watershed moment: they may have the same wide-ranging impact on NLP as pretrained ImageNet models had on computer vision.

The success of ImageNet highlighted that in the era of deep learning, data was at least as important as algorithms.

Pretraining a language model was first proposed in 2015 , but it remained unclear whether a single pretrained language model was useful for many tasks.

One outstanding question is how to transfer the information from a pre-trained language model to a downstream task.

Citation: For attribution in academic contexts or books, please cite this work as: Sebastian Ruder, "NLP's ImageNet moment has arrived".

2 years ago @ ruder.io
Tracking the Progress in Natural Language Processing

This post introduces a resource to track the progress and state-of-the-art across many tasks in NLP.

Go directly to the document tracking the progress in NLP.

Research in Machine Learning and in Natural Language Processing (NLP) is moving so fast these days that it is hard to keep up.

The Electronic Frontier Foundation and the AI Index try to do something similar for all of AI but only cover a few language tasks.

The Language Resources and Evaluation (LRE) Map collects language resources presented at LREC and other conferences, but does not allow breaking them out by task or popularity.

2 years, 1 month ago @ ruder.io
Highlights of NAACL-HLT 2018: Generalization, Test-of-time, and Dialogue Systems

This post discusses highlights of NAACL-HLT 2018.

Specifically, my highlights concentrate on three topics, which were prominent throughout the conference: Generalization, the Test-of-Time awards, and Dialogue Systems.

Embeddings from Language Models (ELMo) showed significant improvements over the state-of-the-art on a wide range of tasks as can be seen below.

However, current neural NLG depends heavily on language models and can be brittle; in many cases, template-based baselines can actually work better.

Secondly, language models are surface learners: they need “world” models and must be sensitive to the “latent process” behind language.

2 years, 1 month ago @ ruder.io
An overview of proxy-label approaches for semi-supervised learning

These can be different network architectures in the case of neural networks or completely different learning algorithms.

For multi-view learning, different models work together to teach each other, alternately acting as both teachers and students.

Learning with noisy labels: Learning with noisy labels is similar to learning from weak supervision.

For learning with noisy labels, labels are typically assumed to be permuted with a fixed random permutation.

While proxy-label approaches supply the noisy labels themselves, when learning with noisy labels, the labels are part of the data.
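The proxy-label idea can be illustrated with a minimal self-training loop. The nearest-centroid model, the margin threshold, and the data below are all my own illustrative choices, not the specific methods surveyed in the post:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated Gaussian classes; only 10 of 400 points start out labeled.
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y_true = np.array([0] * 200 + [1] * 200)
y_known = np.full(400, -1)      # -1 marks "unlabeled"
y_known[:5] = 0
y_known[200:205] = 1

for _ in range(5):
    # "Teacher": class centroids fitted on the currently-labeled points.
    C = np.stack([X[y_known == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)  # (400, 2)
    pred = d.argmin(axis=1)
    margin = np.abs(d[:, 0] - d[:, 1])
    # Proxy-label only the confident (large-margin) unlabeled points.
    confident = (y_known == -1) & (margin > 2.0)
    y_known[confident] = pred[confident]

labeled = y_known != -1
acc = np.mean(y_known[labeled] == y_true[labeled])
```

The loop's own predictions supply the (noisy) labels for the next round, which is exactly why the post stresses the connection to learning with noisy labels.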

2 years, 3 months ago @ ruder.io
Text Classification with TensorFlow Estimators

In particular, this article demonstrates how to solve a text classification task using custom TensorFlow estimators, embeddings, and the tf.layers module.

dataset = dataset.map(parser)
dataset = dataset.batch(100)

input_fn = numpy_input_fn(x={"x": x, "len": length}, shuffle=False)
predictions = [p['logistic'][0] for p in classifier.predict(input_fn)]

2 years, 3 months ago @ ruder.io
Requests for Research

This post aims to provide inspiration and ideas for research directions to junior researchers and those trying to get into research.

Machine learning research in particular moves so fast these days that it is difficult to find an opening.

Recent work focuses on creating adversarial examples either by replacing words or characters (Samanta and Mehta, 2017; Ebrahimi et al., 2017) , concatenation (Jia and Liang, 2017) , or adding adversarial perturbations (Yasunaga et al., 2017) .

If the representations are disentangled as in (Hu et al., 2017) , then we are also not too far from style transfer (Shen et al., 2017) .

Recently proposed methods such as cross-stitch units (Misra et al., 2017; Ruder…

2 years, 5 months ago @ ruder.io
Optimization for Deep Learning Highlights in 2017

This indicates that from the Machine Learning practitioner's perspective, best practices for optimization for Deep Learning have largely remained the same.

An important hyperparameter for optimization in Deep Learning is the learning rate \(\eta\).

It is often thought that adaptive learning rate methods such as Adam are more robust to different learning rates, as they update the learning rate themselves.

In fact, learning rate annealing schedule engineering seems to be the new feature engineering as we can often find highly-tuned learning rate annealing schedules that improve the final convergence behaviour of our model.

Learning rate annealing with warm restarts is also known as cyclical l…
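As a concrete example of such a schedule, here is a sketch of cosine annealing with warm restarts (SGDR-style); the values of eta_min, eta_max, and the cycle lengths T_0 and T_mult are hypothetical, not recommendations from the post:

```python
import math

def sgdr_lr(t, eta_min=0.001, eta_max=0.1, T_0=10, T_mult=2):
    """Learning rate at step t: cosine decay within a cycle, then a warm
    restart back to eta_max, with each cycle T_mult times longer."""
    T_i, start = T_0, 0
    while t >= start + T_i:          # find the cycle containing step t
        start += T_i
        T_i *= T_mult
    t_cur = t - start                # position within the current cycle
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / T_i))
```

At the start of every cycle the rate jumps back to eta_max (the "restart"), then decays towards eta_min along a half-cosine, which is one way to realise the highly-tuned annealing schedules discussed above.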

2 years, 8 months ago @ ruder.io
Word embeddings in 2017: Trends and future directions

This allows us to reveal laws of semantic change (Hamilton et al., 2016; Bamler & Mandt, 2017; Dubossarsky et al., 2017), to model temporal word analogy or relatedness (Szymanski, 2017; Rosin et al., 2017), or to capture the dynamics of semantic relations (Kutuzov et al., 2017).

This post was meant to highlight some of the current trends and future directions for learning word embeddings that I found most compelling.

Citation: For attribution in academic contexts or books, please cite this work as: Sebastian Ruder, "Word embeddings in 2017: Trends and future directions".

BibTeX citation:@misc{ruder2017wordembeddings2017, author = {Ruder, Sebastian}, title = {{Word embeddings in 20…

2 years, 9 months ago @ ruder.io
💼 University and corporation labs
DeepMind
last post 1 month, 1 week ago
Applying for technical roles

What can I expect in the interview process?

Feryal: The interview process at DeepMind can vary depending on the particular role you’re applying for.

Phase two - technical interviews: This part of the process involves several sessions - including one with a technical quiz that covers a large breadth of topics in computer science, statistics, mathematics and machine learning.

Several [~30min] interviews with researchers and leads about your specific research background and interests.

Phase four - culture interview: Towards the end of the interview process, you will once again connect with the recruitment team to discuss DeepMind’s culture and mission.

1 month, 1 week ago @ deepmind.com
Using AI to predict retinal disease progression

The ‘dry’ form is relatively common among people over 65, and usually causes only mild sight loss.

Our contribution highlights the potential of using AI in preventative studies for diseases such as exAMD.

The Moorfields Eye Hospital AMD dataset: We used a dataset of anonymised retinal scans from Moorfields patients with exAMD in one eye, and at high risk of developing exAMD in their other eye.

To address this, we worked with retinal experts to review all scans for each eye and specify the scan when exAMD was first evident.

In our previous work, now continuing in collaboration with Google Health, we developed a model capable of segmenting these eye scans into thirteen anatomical categories.

2 months, 2 weeks ago @ deepmind.com
Specification gaming: the flip side of AI ingenuity

Specification gaming is a behaviour that satisfies the literal specification of an objective without achieving the intended outcome.

We have all had experiences with specification gaming, even if not by this name.

In this post, we review possible causes for specification gaming, share examples of where this happens in practice, and argue for further work on principled approaches to overcoming specification problems.

In a Lego stacking task, the desired outcome was for a red block to end up on top of a blue block.

The agent was rewarded for the height of the bottom face of the red block when it is not touching the block.

3 months, 2 weeks ago @ deepmind.com
Towards understanding glasses with graph neural networks

The practical implications of modelling glass: The glass transition is a ubiquitous phenomenon which manifests in more than window (silica) glasses.

Understanding the glass transition may result in other applications of disordered materials, in fields as diverse as biorenewable polymers and food processing.

Our new work, published in Nature Physics, could help us gain an understanding of the structural changes that may occur near the glass transition.

Leveraging graph neural networks to model glassy dynamics: Glasses can be modelled as particles interacting via a short-range repulsive potential which essentially prevents particles from getting too close to each other.

We then trained a neural n…

4 months ago @ deepmind.com
Agent57: Outperforming the human Atari benchmark

Combining off-policy learning with memory is challenging because you need to know what you might remember when executing a different behaviour.

Within that strand, we distinguish two types of rewards: firstly, long-term novelty rewards encourage visiting many states throughout training, across many episodes.

Secondly, short-term novelty rewards encourage visiting many states over a short span of time (e.g., within a single episode of a game).

However, learning density models of high dimensional spaces is fraught with problems due to the curse of dimensionality.

For example, in Montezuma’s Revenge, unlike undirected exploration strategies, long-term novelty rewards allow the agent to surpass…

4 months ago @ deepmind.com
A new model and dataset for long-range memory

Modelling natural language: Finding machine learning tasks which both drive the development of better memory architectures and push us further towards artificial general intelligence is challenging.

Transferring knowledge: Such samples would likely astound Shannon, 70 years on from his early language model experiments.

Google’s prominent natural language model, BERT, achieves state-of-the-art performance on a wide array of NLP benchmarks, and is now a part of Google Search.

Benchmarking language models: A popular long-range language model benchmark is WikiText-103, which comprises English-language Wikipedia articles, and was developed by researchers at Salesforce AI.

As such, we’ve compiled…

5 months, 3 weeks ago @ deepmind.com
AlphaFold: Using AI for scientific discovery

In our study published today in Nature, we demonstrate how artificial intelligence research can drive and accelerate new scientific discoveries.

Our system, AlphaFold – described in peer-reviewed papers now published in Nature and PROTEINS – is the culmination of several years of work, and builds on decades of prior research using large genomic datasets to predict protein structure.

What is the protein folding problem?

What any given protein can do depends on its unique 3D structure.

Why is protein folding important?

6 months, 3 weeks ago @ deepmind.com
Dopamine and temporal difference learning: A fruitful relationship between neuroscience and AI

Meanwhile, in close contact with this study of reward learning in animals, computer scientists have developed algorithms for reinforcement learning in artificial systems.

A chain of prediction: temporal difference learning. Reinforcement learning is one of the oldest and most powerful ideas linking neuroscience and AI.

An important breakthrough in solving the problem of reward prediction was the temporal difference learning (TD) algorithm.
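The TD(0) update at the heart of this algorithm is V(s) ← V(s) + α·(r + γ·V(s') − V(s)). A minimal sketch on a hypothetical three-state chain (the states, reward, and constants are made up for illustration, not taken from the post):

```python
# Three-state chain 0 -> 1 -> 2; reaching state 2 (terminal) pays reward 1.
V = [0.0, 0.0, 0.0]       # value estimates; V[2] stays 0 (terminal)
alpha, gamma = 0.1, 0.9   # learning rate and discount factor

for _ in range(2000):     # episodes
    s = 0
    while s < 2:
        s_next = s + 1
        r = 1.0 if s_next == 2 else 0.0
        # The TD error (the "reward prediction error") drives the update.
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next
# V converges towards [0.9, 1.0, 0.0]: the gamma-discounted distance to reward.
```

The bracketed TD error is exactly the quantity that dopamine neuron firing was found to resemble, which is what makes the link between the algorithm and the neuroscience so striking.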

Around the same time, in the late 80s and early 90s, neuroscientists were struggling to understand the behaviour of dopamine neurons.

Distributional reinforcement learning

6 months, 3 weeks ago @ deepmind.com
Using WaveNet technology to reunite speech-impaired users with their original voices

This post details a recent project we undertook with Google and ALS campaigner Tim Shaw, as part of Google’s Euphonia project.

We demonstrate an early proof of concept of how text-to-speech technologies can synthesise a high-quality, natural sounding voice using minimal recorded speech data.

But message banking lacks flexibility, resulting in a static dataset of phrases.

Now imagine that you were given the chance to preserve your voice by recording as much of it as possible.

And people who aren’t able to record phrases in time are left to choose a generic computer synthesized voice that lacks the same power of connection as their own.

7 months, 2 weeks ago @ deepmind.com
Learning human objectives by evaluating hypothetical behaviours

TL;DR: We present a method for training reinforcement learning agents from human feedback in the presence of unknown unsafe states.

Training RL agents in the presence of unsafe states is known as the safe exploration problem.

The agent has one source of information: feedback about unsafe states from a human user.

Existing methods for training agents from human feedback ask the user to evaluate data of the agent acting in the environment.

The user provides feedback on this hypothetical behaviour, and the system interactively learns a model of the user's reward function.

7 months, 3 weeks ago @ deepmind.com
From unlikely start-up to major scientific organisation: Entering our tenth year at DeepMind

Pioneering research, growing impact: A mission this ambitious requires pioneering research on many fronts over many years.

As our research matures, we’ve been finding more opportunities to partner with others for social and commercial impact, often with our colleagues across Alphabet.

Entering our next phase: As I discussed with Wired in the summer, this year feels like the start of a new phase for DeepMind as an established scientific organisation.

Over the past year, we’ve also been formalising a leadership team with the seasoned experience and skills for our second decade.

Right back to our origins blending neuroscience with machine learning, we’ve found that breakthroughs happen faster when…

8 months ago @ deepmind.com
Strengthening the AI community

For me, it was being awarded an internship at Intel, the first one ever through Purdue’s Co-Op Engineering program in 1990.

I just didn’t know if I had the right technical skills for the work, or if engineering was really my path.

It grew into a very successful 18-year career at Intel and a 25-year career in tech.

At DeepMind we want to build advanced AI to expand our knowledge and find answers to some of the fundamental questions facing society.

DeepMind Scholarships to open the field of AI: The DeepMind scholarship programme is one way we seek to broaden participation in science and AI.

8 months, 2 weeks ago @ deepmind.com
Advanced machine learning helps Play Store users discover personalised apps

Candidate generator unbiasing: Our model (called a candidate generator) learns what apps a user is more likely to install based on previous apps they’ve installed from the Play store.

The model therefore learns a bias that favours the apps that are shown – and thus installed – more often.

An importance weight is based on the impression-to-install rate of each individual app in comparison with the median impression-to-install rate across the Play store.

Through importance weighting, our candidate generator can downweight or upweight apps based on their install rates, which mitigates the recommendation bias problem.
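The weight computation described above can be sketched with made-up numbers; the app names and counts are hypothetical, and whether the ratio or its inverse is applied as the training weight is not specified in this excerpt:

```python
import statistics

# Hypothetical per-app counts (not real Play Store data).
installs = {"app_a": 50, "app_b": 5, "app_c": 20}
impressions = {"app_a": 100, "app_b": 100, "app_c": 100}

# Impression-to-install rate per app, compared against the store-wide median.
rates = {a: installs[a] / impressions[a] for a in installs}
median_rate = statistics.median(rates.values())
weights = {a: rates[a] / median_rate for a in rates}
# app_a converts at 2.5x the median rate, app_b at 0.25x, app_c at 1.0x.
```

Scaling each app's training examples by such a weight is what lets the candidate generator correct for apps that are installed often mainly because they are shown often.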

Our solution to this, the reranker model, learns the relative importance of a p…

8 months, 2 weeks ago @ deepmind.com
AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning

Since then, we have taken on a much greater challenge: playing the full game at a Grandmaster level under professionally approved conditions.

AlphaStar can now play in one-on-one matches as and against Protoss, Terran, and Zerg – the three races present in StarCraft II.

Each of the Protoss, Terran, and Zerg agents is a single neural network.

We chose to use general-purpose machine learning techniques – including neural networks, self-play via reinforcement learning, multi-agent learning, and imitation learning – to learn directly from game data.

Using the advances described in our Nature paper, AlphaStar was ranked above 99.8% of active players on Battle.net…

9 months, 1 week ago @ deepmind.com
Causal Bayesian Networks: A flexible tool to enable fairer machine learning

This simplified example shows how CBNs can provide us with a visual framework for describing different possible unfairness scenarios.

It is nevertheless necessary to avoid pitfalls when evaluating or designing a decision system.

This means that it would be possible for the system to be deemed fair, even if it carries the unfair influence: this would automatically be the case for an error-free decision system.

On the other hand, if the path G→D→A was considered fair, it would be inappropriate to use statistical parity.

Path-specific techniques enable us to estimate the influence that a sensitive attribute has on other variables along specific sets of causal paths.

10 months ago @ deepmind.com
Google
last post 1 day, 2 hours ago
Live HDR+ and Dual Exposure Controls on Pixel 4 and 4a

Different ways to tone-map a linear RGB image.

(a) The original, “un-tone-mapped” image.

(b) Global curve optimizing for the sky.

(c) Global curve optimizing for the subject.

In the 2D histogram, brighter areas indicate where more pixels of a given input brightness are mapped to the same output.

1 day, 2 hours ago @ ai.googleblog.com
Google Cloud AI and Harvard Global Health Institute Collaborate on new COVID-19 forecasting model

“The COVID-19 Public Forecasts model produces forecasts at the critical jurisdiction of public health action—the county.

Coupled with the work of the Harvard Global Health Institute’s county-level COVID-19 Suppression Metrics, the COVID-19 Public Forecast Model will allow for targeted testing and public health interventions on a county-by-county basis.

The COVID-19 Public Forecasts are free to query in BigQuery or to download as CSVs (state forecast CSV and county forecast CSV).

As with any forecasts, the COVID-19 Public Forecasts have limitations that should be carefully considered before being used to inform decisions.

We encourage all users who intend to make decisions in part based on t…

1 day, 4 hours ago @ cloud.google.com
Introducing the Model Card Toolkit for Easier Model Transparency Reporting

Here is an example of the completed Model Card from the Colab tutorial, which leverages the MCT and the provided UI template.

5 days, 22 hours ago @ ai.googleblog.com
Google breaks AI performance records in MLPerf with world's fastest training supercomputer

Figure 1: Speedup of Google’s best MLPerf Training v0.7 Research submission over the fastest non-Google submission in any availability category.

Comparisons are normalized by overall training time regardless of system size, which ranges from 8 to 4096 chips.

Taller bars are better.¹ We achieved these results with ML model implementations in TensorFlow, JAX, and Lingvo.

Google’s latest TPU supercomputer can train the same model almost five orders of magnitude faster just five years later.

MLPerf models at a glance: MLPerf models are chosen to be representative of cutting-edge machine learning workloads that are common throughout industry and academia.

6 days, 3 hours ago @ cloud.google.com
Announcing ScaNN: Efficient Vector Similarity Search

The goal is to quantize each xᵢ to x̃ᵢ = c₁ or x̃ᵢ = c₂.

Traditional quantization (left) results in the incorrect ordering of x₁ and x₂ for this query.

Even though our approach (right) chooses centers farther away from the data points, this in fact leads to lower inner product error and higher accuracy.
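The figure's point can be reproduced in a small toy example. This is a hedged sketch with made-up 2-D points and centers, not the ScaNN implementation: quantizing each point to its nearest center by Euclidean distance can flip the inner-product ranking for a query, while a center that is farther away in Euclidean distance can give a lower inner-product error.

```python
import numpy as np

# Illustrative values only (not from the post): two database points, two
# candidate quantization centers, and one query.
q  = np.array([1.0, 0.0])            # query
x1 = np.array([0.8, 1.5])            # true best match: <q, x1> = 0.8
x2 = np.array([0.7, 0.2])            # runner-up:       <q, x2> = 0.7
c1 = np.array([1.0, 0.0])
c2 = np.array([0.5, 2.0])

def nearest(x, centers):
    """Traditional quantization: pick the center with least Euclidean error."""
    return min(centers, key=lambda c: float(np.linalg.norm(x - c)))

x1q, x2q = nearest(x1, [c1, c2]), nearest(x2, [c1, c2])   # x1 -> c2, x2 -> c1

# Exact search ranks x1 first, but the quantized scores rank x2 first:
assert q @ x1 > q @ x2
assert q @ x1q < q @ x2q

# Assigning x1 to c1 instead is *farther* in Euclidean distance, yet gives a
# smaller inner-product error for this query -- the intuition behind ScaNN's
# score-aware (anisotropic) quantization loss.
assert np.linalg.norm(x1 - c1) > np.linalg.norm(x1 - c2)
assert abs(q @ x1 - q @ c1) < abs(q @ x1 - q @ c2)
```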

1 week ago @ ai.googleblog.com
Improved customer feedback management with Google Cloud AutoML

When it comes to customer satisfaction, the customer service experience can often be more important than the actual product.

According to Forbes, companies lost about $75 billion in 2018 due to poor customer service, and 39% of customers who experienced poor customer service will not do business with the offending company again.

An important part of delivering a positive customer service experience is handling customer feedback, especially negative feedback, quickly and efficiently.

Integrating AI into a customer feedback management process can automate repetitive tasks, freeing up customer support agents to work on the most complex and time-sensitive cases.

In this blog we’ll look at an ex…

1 week, 4 days ago @ cloud.google.com
Closing data gaps with Lacuna Fund

Machine learning has shown enormous promise for social good, whether in helping respond to global health pandemics or reach citizens before natural disasters hit.

But even as machine learning technology becomes increasingly accessible, social innovators still face significant barriers in their efforts to use this technology to unlock new solutions.

From languages to health and agriculture, there is a lack of relevant, labeled data to represent and address the challenges that face much of the world's population.

Labeled data is a particular type of data that is useful in generating machine learning models: This data provides the “ground truth” that a model can use to guess about cases that i…

1 week, 5 days ago @ blog.google
Using AI to identify the aggressiveness of prostate cancer

These promising results indicate that the deep learning system has the potential to support expert-level diagnoses and expand access to high-quality cancer care.

To evaluate if it could improve the accuracy and consistency of prostate cancer diagnoses, this technology needs to be validated as an assistive tool in further clinical studies and on larger and more diverse patient groups.

Our research advancements in both prostate and breast cancer were the result of collaborations with the Naval Medical Center San Diego and support from Verily.

Our appreciation also goes to several institutions that provided access to de-identified data, and many pathologists who provided advice or reviewed pro…

1 week, 5 days ago @ blog.google
Online shopping gets more personal with Recommendations AI

With the continuing shift to digital, especially in the retail industry, ensuring a highly personalized shopping experience for online customers is crucial for establishing customer loyalty. In particular, product recommendations are an effective way to personalize the customer experience as they help customers discover products that match their tastes and preferences.Google has spent years delivering high-quality recommendations across our flagship products like YouTube and Google Search. Recommendations AI draws on that rich experience to give organizations a way to deliver highly personalized product recommendations to their customers at scale. Today, we are pleased to announce that Reco…

1 week, 6 days ago @ cloud.google.com
Improving Holistic Scene Understanding with Panoptic-DeepLab

Overview of Panoptic-DeepLab.

Semantic segmentation associates pixels in the image with general classes, while the class-agnostic instance segmentation step identifies the pixels associated with an individual object, regardless of the class.

Taken together, one gets the final panoptic segmentation image.
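The fusion step described above can be sketched in a few lines. This is an illustrative encoding with hypothetical class and instance IDs, not the actual Panoptic-DeepLab merge logic: a per-pixel semantic class map and a class-agnostic instance-ID map combine into a single panoptic map.

```python
import numpy as np

# Hypothetical label maps for a 2x3 image (values are made up):
semantic = np.array([[0, 0, 7],
                     [7, 7, 7]])        # e.g. 0 = road ("stuff"), 7 = car
instance = np.array([[0, 0, 1],
                     [1, 1, 2]])        # two distinct car instances

# One common encoding: class_id * 1000 + instance_id
# (instance_id = 0 for "stuff" pixels that have no instances).
panoptic = semantic * 1000 + instance
# panoptic now distinguishes car #1 (7001) from car #2 (7002),
# while all road pixels share the single "stuff" label 0.
```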

1 week, 6 days ago @ ai.googleblog.com
Exploring Faster Screening with Fewer Tests via Bayesian Group Testing

Our group testing framework describes an interaction between a testing environment, the wet_lab, whose pooled test results are used by the sampler to draw thousands of plausible hypotheses on the infection status of all individuals.

These hypotheses are then used by an optimization procedure, group_selector, that figures out what groups may be the most relevant to test in order to narrow down on the true infection status.

Once formed, these new groups are then tested again, closing the loop.

At any point in the procedure, the hypotheses formed by the sampler can be averaged to obtain the average probability of infection for each patient.

From these probabilities, a decision on whether a pa…
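The averaging step described above can be sketched in a few lines. The data below is a hypothetical stand-in for the sampler's output; the real wet_lab, sampler, and group_selector components are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_hypotheses = 6, 1000

# Stand-in for the sampler's output: each row is one plausible 0/1 hypothesis
# on the infection status of all individuals.
hypotheses = rng.random((n_hypotheses, n_patients)) < 0.2

# Column-wise average = estimated probability of infection for each patient.
p_infected = hypotheses.mean(axis=0)

# A simple decision rule on top of these probabilities (illustrative only):
flagged = p_infected > 0.5
```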

3 weeks ago @ ai.googleblog.com
30 years of family videos in an AI archive

Theoretically, since they were now stored in the cloud, my family and I could watch them whenever we wanted.

So, as an Applied AI Engineer, I got down to business and built an AI-powered searchable archive of our family videos.

If you’ve ever used Google Photos, you’ve seen the power of using AI to search and organize images and videos.

So, if I search “pool” in the Google Photos app, it’ll show me all the pictures and videos I ever took of pools.

In between shots on a single reel, he’d say: “Say goodbye, I’m going to fade out now.” I would scream, “NO, DON’T FADE OUT,” while the screen faded to black.

3 weeks ago @ blog.google
Google at ICML 2020

Give us feedback in our Product Forums

3 weeks, 1 day ago @ ai.googleblog.com
Grounding Natural Language Instructions to Mobile UI Actions

The action phrase extraction model takes a word sequence of a natural language instruction and outputs a sequence of spans (denoted in red boxes) that indicate the phrases describing the operation, the object and the argument of each action in the task.

3 weeks, 4 days ago @ ai.googleblog.com
AutoML Tables: end-to-end workflows on AI Platform Pipelines

To help make AutoML Tables more useful and user friendly, we’ve released a number of new features. This post gives a tour of some of these new features via a Cloud AI Platform Pipelines example that shows end-to-end management of an AutoML Tables workflow.

Cloud AI Platform Pipelines provides a way to deploy robust, repeatable machine learning pipelines along with monitoring, auditing, version tracking, and reproducibility, and delivers an enterprise-ready, easy to install, secure execution environment for your ML workflows.

Using Cloud AI Platform Pipelines to orchestrate a Tables workflow: Cloud AI Platform Pipelines, now in Beta, provides a way to deploy robust, repeatable machin…

3 weeks, 4 days ago @ cloud.google.com
OpenAI
latest post 3 weeks, 5 days ago
OpenAI Scholars Spring 2020: Final Projects

Our third class of OpenAI Scholars presented their final projects at virtual Demo Day, showcasing their research results from over the past five months.

The OpenAI Scholars program provides stipends and mentorship to individuals from underrepresented groups to study deep learning and open-source a project.

Learn more about our Scholars program.

I joined the Scholars program in order to learn from the brilliant folks at OpenAI and to immerse myself in AI research.

The OpenAI Scholars program was this magical opportunity to get started by learning from the very best minds in the field.

3 weeks, 5 days ago @ openai.com
Image GPT

However, the same broad class of models has not been successful in producing strong features for image classification.

From language GPT to image GPT: In language, unsupervised learning algorithms that rely on word prediction (like GPT-2 and BERT) have been extremely successful, achieving top performance on a wide array of language tasks.

Because masked language models like BERT have outperformed generative models on most language tasks, we also evaluate the performance of BERT on our image models.

Limitations: While we have shown that iGPT is capable of learning powerful image features, there are still significant limitations to our approach.

Notably, we achieved our results by directly applyi…

1 month, 2 weeks ago @ openai.com
OpenAI API

We’re releasing an API for accessing new AI models developed by OpenAI.

We will terminate API access for obviously harmful use-cases, such as harassment, spam, radicalization, or astroturfing.

What specifically will OpenAI do about misuse of the API, given what you’ve previously said about GPT-2?

How will OpenAI mitigate harmful bias and other negative effects of models served by the API?

Our API models could also cause harm in ways that we haven’t thought of yet.

1 month, 3 weeks ago @ openai.com
Procgen and MineRL Competitions

We’re excited to announce that OpenAI is co-organizing two NeurIPS 2020 competitions with AIcrowd, Carnegie Mellon University, and DeepMind, using Procgen Benchmark and MineRL.

Procgen Competition: The Procgen Competition focuses on improving sample efficiency and generalization in reinforcement learning.

Since all content is procedurally generated, each Procgen environment intrinsically requires agents to generalize to never-before-seen situations.

Moreover, we designed Procgen environments to be fast and simple to use.

One well-known way to reduce the environment sample complexity is to leverage human priors and demonstrations of the desired behavior.

1 month, 3 weeks ago @ openai.com
AI and Efficiency

Other measures of AI progress: In addition to efficiency, many other measures shed light on overall algorithmic progress in AI.

ShuffleNet achieved AlexNet-level performance with an 18x inference efficiency increase in 5 years (15-month doubling time), which suggests that training efficiency and inference efficiency might improve at similar rates.
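The quoted doubling time follows directly from the numbers in the sentence above; a quick check of the arithmetic:

```python
import math

# An 18x efficiency gain over 5 years (60 months) implies log2(18) ≈ 4.17
# doublings, i.e. a doubling time of roughly 60 / 4.17 ≈ 14.4 months,
# consistent with the quoted "15-month doubling time".
months = 5 * 12
doublings = math.log2(18)
doubling_time = months / doublings   # ≈ 14.4 months
```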

This efficiency analysis suggests that policymakers could develop accurate intuitions about the cost of deploying AI capabilities, and how these costs will change over time, by more closely assessing the rate of improvements in efficiency for AI systems.

Our results suggest that for AI tasks with high levels of investment (researcher time and/o…

3 months ago @ openai.com
Jukebox

Curated samples: Provided with genre, artist, and lyrics as input, Jukebox outputs a new music sample produced from scratch.

We can then train a model to generate audio in this compressed space, and upsample back to the raw audio space.

Now in raw audio, our models must learn to tackle high diversity as well as very long range structure, and the raw audio domain is particularly unforgiving of errors in short, medium, or long term timing.

To better understand future implications for the music community, we shared Jukebox with an initial set of 10 musicians from various genres to discuss their feedback on this work.

While Jukebox is an interesting research result, these musicians did not find …

3 months ago @ openai.com
Improving Verifiability in AI Development

Can I (as an academic) conduct impartial research on the risks associated with large-scale AI systems when I lack the computing resources of industry?

Can I (as an AI developer) verify that my competitors in a given area of AI development will follow best practices rather than cut corners to gain an advantage?

AI developers should pilot bias and safety bounties for AI systems to strengthen incentives and processes for broad-based scrutiny of AI systems.

Standard-setting bodies should work with academia and industry to develop audit trail requirements for safety-critical applications of AI systems.

Organizations developing AI and funding bodies should support research into the interpretabili…

3 months, 2 weeks ago @ openai.com
OpenAI Microscope

We’re introducing OpenAI Microscope, a collection of visualizations of every significant layer and neuron of eight vision “model organisms” which are often studied in interpretability.

Microscope makes it easier to analyze the features that form inside these neural networks, and we hope it will help the research community as we move towards understanding these complicated systems.

This is the goal of the OpenAI Microscope.

Microscope systematically visualizes every neuron in several commonly studied vision models, and makes all of those neurons linkable.

Our initial release includes nine frequently studied vision models, along with several visualization techniques we’ve found particularly u…

3 months, 3 weeks ago @ openai.com
OpenAI → PyTorch

We are standardizing OpenAI’s deep learning framework on PyTorch.

The main reason we've chosen PyTorch is to increase our research productivity at scale on GPUs.

It is very easy to try and execute new research ideas in PyTorch; for example, switching to PyTorch decreased our iteration time on research ideas in generative modeling from weeks to days.

Going forward we'll primarily use PyTorch as our deep learning framework but sometimes use other ones when there's a specific technical reason to do so.

Many of our teams have already made the switch, and we look forward to contributing to the PyTorch community in upcoming months.

6 months, 1 week ago @ openai.com
OpenAI Five

You play against [OpenAI Five] and you realize it has a playstyle that is different.

It’s doing things that you’ve never done and you’ve never seen.

One key learning that we took is how it was allocating resources.

It’s just allocating resources as efficiently as possible.

[…] If OpenAI does that dynamic switch at 100%, we maybe went from 5% to 10%?

7 months, 3 weeks ago @ openai.com
Deep Double Descent

Many classes of modern deep learning models, including CNNs, ResNets, and transformers, exhibit the previously-observed double descent phenomenon when not using early stopping or regularization.

The model-wise double descent phenomenon can lead to a regime where training on more data hurts.

The double descent phenomenon is most prominent in settings with added label noise; without it, the peak is smaller and easy to miss.

For a given number of optimization steps (fixed y-coordinate), test and train error exhibit model-size double descent.

We leave fully understanding the mechanisms behind double descent in deep neural networks as an important open question.

8 months ago @ openai.com
Procgen Benchmark

We’re releasing Procgen Benchmark, 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills.

To fulfill this need, we have created Procgen Benchmark.

CoinRun now serves as the inaugural environment in Procgen Benchmark, contributing its diversity to a greater whole.

With Procgen Benchmark, we strive for all of the following: experimental convenience, high diversity within environments, and high diversity across environments.

We've now expanded on those results, conducting our most thorough study of RL generalization to date using all 16 environments in Procgen Benchmark.

8 months ago @ openai.com
Safety Gym

We're releasing Safety Gym, a suite of environments and tools for measuring progress towards reinforcement learning agents that respect safety constraints while training.

Safety Gym: To study constrained RL for safe exploration, we developed a new set of environments and tools called Safety Gym.

Benchmark: To help make Safety Gym useful out of the box, we evaluated some standard RL and constrained RL algorithms on the Safety Gym benchmark suite: PPO, TRPO, Lagrangian-penalized versions of PPO and TRPO, and Constrained Policy Optimization (CPO).

There are three things we are most interested in at the moment: improving performance on the current Safety Gym environments.

We also hope that systems l…

8 months, 2 weeks ago @ openai.com
GPT-2: 1.5B Release

As the final model release of GPT-2’s staged release, we’re releasing the largest version (1.5B parameters) of GPT-2 along with code and model weights to facilitate detection of outputs of GPT-2 models.

Our partners at Cornell University surveyed people to assign GPT-2 text a credibility score across model sizes.

People gave the 1.5B model a “credibility score” of 6.91 out of 10.

These results make us more inclined to release the 1.5B model, as the incremental increase in human-perceived credibility relative to 774M seems low.

We acknowledge that we cannot be aware of all threats, and that motivated actors can replicate language models without model release.

9 months ago @ openai.com
Solving Rubik’s Cube with a Robot Hand

We've trained a pair of neural networks to solve the Rubik’s Cube with a human-like robot hand.

Since May 2017, we've been trying to train a human-like robotic hand to solve the Rubik’s Cube.

Solving a Rubik’s Cube one-handed is a challenging task even for humans, and it takes children several years to gain the dexterity required to master it.

To test the limits of our method, we experiment with a variety of perturbations while the hand is solving the Rubik’s Cube.

Behind the scenes: Rubik’s Cube prototypes In order to benchmark our progress and make the problem tractable, we built and designed custom versions of cubes as stepping stones towards ultimately solving a regular Rubik’s Cube.

9 months, 3 weeks ago @ openai.com
Microsoft
latest post 3 hours ago
ICML 2020 highlights: A Transformer-based RL agent, causal ML for increased privacy, and more

But this diverse group of papers represents only a small slice of the advancements presented by Microsoft researchers.

Fun Fact: This project initially began in the Microsoft AI Residency Program.

Researchers from Microsoft Research India provided knowledge of causal ML for this project, while researchers from Microsoft Research Cambridge brought expertise on privacy and security.

New tools: The researchers have released an open-source toolkit, RobustDG, for evaluating causal ML models on privacy, robustness, and out-of-distribution accuracy.

It also builds on techniques for natural language understanding, which includes text classification, question answering, and informat…

3 hours ago @ microsoft.com
Three new reinforcement learning methods aim to improve AI in gaming and beyond

In particular, we focus on developing game agents that learn to genuinely collaborate in teams with human players.

Highlight 1: More accurate uncertainty estimates in deep learning decision-making systems. From computer vision to reinforcement learning and machine translation, deep learning is everywhere and achieves state-of-the-art results on many problems.

We give it a dataset, and it gives us a prediction based on a deep learning model’s best guess.

The success of deep learning means that it is increasingly being applied in settings where the predictions have far-reaching consequences and mistakes can be costly.

In more technical terms, we provide an analysis of Random Network Distillatio…

1 day, 6 hours ago @ microsoft.com
Microsoft Hackathon leads to AI and sustainability collaboration to rid plastic from rivers and the ocean

Dan Morris, AI for Earth program director, says the most important result from the hackathon was that AI for Earth taught The Ocean Cleanup a lot about machine learning.

This year, The Ocean Cleanup was named an AI for Earth grantee for its work.

“Using the AI for Earth grant, we’ve been able to set up and run the machine learning models,” De Vries says.

Van Geijn is among the Microsoft staffers there who have volunteered to help The Ocean Cleanup with computing and related support.

Photo credit: The Ocean Cleanup.

5 days, 4 hours ago @ news.microsoft.com
State-of-the-art algorithm accelerates path for quantum computers to address climate change

Quantum researchers at Microsoft are not only thinking about this question—we are producing tangible results that will shape how large-scale quantum computer applications will accomplish these tasks.

We have begun creating quantum computer applications in chemistry, and they could help to address one of the world’s biggest challenges to date: climate change.

Quantum Development Kit: Are you a researcher or developer who wants to help discover new algorithms for quantum computers?

In our research, we precisely achieve this runtime reduction by developing a new, efficient quantum algorithm.

The QDK allows researchers to develop and test new quantum algorithms for chemistry, run sm…

5 days, 4 hours ago @ microsoft.com
Aiming for more than just net zero

Many companies are reaching for net zero emissions, but we’re taking it even further.

1 week, 1 day ago @ azure.microsoft.com
Researchers use a strand-displacing DNA polymerase to do biocomputing

Existing architectures implementing CRNs via DNA come in two varieties: DNA-only systems and multienzyme DNA systems.

Sequences on a DNA strand can also be split into different parts, each with a separate domain name.

In TMSD, an input DNA strand consisting of complementary domains—let’s call the strand t* o*—binds with an exposed single-stranded, or toehold, portion of a double-stranded DNA complex t o (Figure 1b).

On the left are a typical unimolecular reaction and a typical bimolecular reaction.

A unimolecular reaction has one input (A) and a bimolecular reaction has two inputs (A and B).

1 week, 5 days ago @ microsoft.com
A path to personalization: Using ML to subtype patients receiving digital mental health interventions

Mental health experiences vary widely from individual to individual.

People experiencing symptoms of depression, anxiety, or other mental health conditions know the therapeutic process can be a long and arduous one.

Yet, how these different forms of engagement affect mental health outcomes is less well known.

SilverCloud Health, the world’s largest provider of digital mental health services, offers a suite of internet-delivered cognitive behavioral therapy (iCBT) interventions for the treatment of depression, anxiety, and other mental health conditions.

The sensitive and fluctuating nature of mental health symptomology requires an increase in access to interventions that can be personalized in this …

2 weeks, 4 days ago @ microsoft.com
Azure AI: Build mission-critical AI apps with new Cognitive Services capabilities

Building on our vision to empower all developers to use AI to achieve more, today we’re excited to announce expanded capabilities within Azure Cognitive Services.

These types of documents typically take manual labeling by document type or intensive coding to extract insights.

One of those advancements, Custom Commands, a capability of Speech in Cognitive Services, is now generally available.

With Cognitive Services and Bot Service, the BBC created an AI-enabled voice assistant, Beeb, that delivers a more engaging, tailored experience for its diverse audiences.

Get started todayLearn more with the resources below and get started with Azure Cognitive Services and an Azure free acc…

3 weeks, 6 days ago @ azure.microsoft.com
Defending DRAM for data safety and security in the cloud with Dr. Stefan Saroiu

So cloud providers are nervous about both scenarios.

So, now we have the cloud providers, the DRAM vendors and then the security and research community.

Stefan Saroiu: That’s right.

(music plays)To learn more about Dr. Stefan Saroiu, and the ongoing fight against Rowhammer attacks, visit Microsoft.com/research

3 weeks, 6 days ago @ microsoft.com
Azure AI: Build mission-critical AI apps with new Cognitive Services capabilities

As the world adjusts to new ways of working and staying connected, we remain committed to providing Azure AI solutions to help organizations invent with purpose.

3 weeks, 6 days ago @ azure.microsoft.com
Toward trusted sensing for the cloud: Introducing Project Freta

With Project Freta, we invite readers to think not of walls but of sunlight.

Incubated at Microsoft Research, Project Freta is a roadmap toward trusted sensing for the cloud that can allow enterprises to engage in regular, complete discovery sweeps for undetected malware.

The goal of this democratization effort is to increase the development cost of undiscoverable cloud malware toward its theoretical maximum.

This release: The Project Freta analysis engine consumes snapshots of whole-system Linux volatile memory and extracts an enumeration of system objects.

We hope that Project Freta empowers administrators and responders and is used globally as it h…

4 weeks, 1 day ago @ microsoft.com
Teaching a robot to see and navigate with simulation

For example, consider autonomous rescue robots that are required to maneuver and navigate in challenging physical environments that humans cannot safely access.

SLAM has made impressive progress with both geometric-based methods and learning-based methods; however, robust and reliable SLAM systems for real-world scenarios remain elusive.

Secondly, many SLAM systems use multiple sensing modalities, such as RGB, depth cameras, and LiDAR, which makes data collection a considerable challenge.

A unique goal of our dataset is to focus on the challenging environments with changing light conditions, adverse weather, and dynamic objects.

We hope the TartanAir dataset can push the limits of the curre…

1 month ago @ microsoft.com
Newly discovered principle reveals how adversarial training can perform robust deep learning

In machine learning, adversarial examples usually refer to natural inputs plus small, specially crafted perturbations that can fool the model into making mistakes.

In recent years, adversarial examples have been repeatedly discovered in deep learning applications, causing public concerns about AI safety.

In a paper titled “Feature Purification: How can Adversarial Training Perform Robust Deep Learning,” researchers from Microsoft Research and Carnegie Mellon University propose the first framework toward understanding the math behind adversarial examples in deep learning.

Background: Mysteries about adversarial examples and adversarial training. Why do we have adversarial example…

1 month ago @ microsoft.com
Advancing Azure service quality with artificial intelligence: AIOps

As Mark mentioned when he authored the Advancing Reliability blog series, building and operating a global cloud infrastructure at the scale of Azure is a complex task with hundreds of ever-evolving service components, spanning more than 160 datacenters and across more than 60 regions.

1 month ago @ azure.microsoft.com
Microsoft AI health director: How AI is fueling intelligent health systems

Advances in artificial intelligence are helping pave the way for intelligent health systems, which focus on using AI and data to establish care and operational strategies, according to Tom Lawry, national director for AI, health and life sciences at Microsoft.

By having full control of the data, organizations can use it to derive insights and make strategy decisions more quickly.

More than just the technology, healthcare organizations are also starting to view AI as a culture and a mindset, Mr. Lawry said.

“Intelligent health systems leverage data and AI to create strategic advantage, and they do that by making servic…

1 month, 1 week ago @ beckershospitalreview.com
Facebook
latest post 2 weeks ago
Scalable data classification for security and privacy

What the research is: We’ve built a data classification system that uses multiple data signals, a scalable system architecture, and machine learning to detect semantic types within Facebook at scale.

This is important in situations where it’s necessary to detect where an organization’s data is stored in many different formats across various data stores.

In these cases, a classification system enables organizations to automatically enforce privacy- and security-related policies, such as data retention across control policies.

Why it matters: Organizations generally have a well-defined set of privacy policies aimed at ensuring that people’s privacy is respected.

Read the full paper: Secure and s…

2 weeks ago @ engineering.fb.com
Fighting Abuse @Scale 2019 recap

Fighting abuse presents unique challenges for large-scale organizations working to keep the people on their platforms safe.

At Fighting Abuse @Scale 2019, engineers, data scientists, product managers, and operations specialists gathered in Menlo Park for a day of technical talks focused on state-of-the art technologies to fight fraud, spam, and abuse on platforms that serve millions or even billions of people.

Our key insight is that sharing patterns can help hosting platforms identify abusive content, while hosting platforms can help sharing platforms prevent the spread of abusive content.

Results demonstrate that working together as an industry can strengthen the capacity to more quickly …

7 months, 3 weeks ago @ engineering.fb.com
CCSM: Scalable statistical anomaly detection to resolve app crashes faster

A contrast set mining algorithm

CSM provides a scalable, robust way to generate human-readable insights on high dimensional crash data.

For a contrast set X and group G, the support S(X,G) is the percentage of vectors in group G for which the contrast set X is true.

To efficiently traverse the search space of feature combinations, we cast the problem of mining contrast sets as a tree search problem.

However, real world data is often mixed — our crash data contains a mix of categorical, discrete, and continuous data.

The continuous contrast mining algorithm adopts the same tree search framework, with modifications to reason about sets of continuous features.
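The support definition above is simple to make concrete. Below is an illustrative toy sketch (dict-shaped feature vectors and all names are assumptions for this example, not Facebook's implementation):

```python
# Toy sketch of contrast-set support: S(X, G) is the fraction of feature
# vectors in group G for which every (feature, value) pair in the
# contrast set X holds. The dict representation is an assumption.
def support(contrast_set, group):
    if not group:
        return 0.0
    hits = sum(
        all(vec.get(feat) == val for feat, val in contrast_set.items())
        for vec in group
    )
    return hits / len(group)

# Invented crash records for illustration.
crashes = [
    {"os": "android", "app_version": "1.2"},
    {"os": "android", "app_version": "1.3"},
    {"os": "ios", "app_version": "1.2"},
]
print(support({"os": "android"}, crashes))  # 2 of 3 vectors match
```

A mining algorithm would then grow contrast sets feature by feature, pruning branches whose support difference between groups falls below a threshold, which is what the tree search described above makes tractable.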

8 months, 1 week ago @ engineering.fb.com
Fast dimensional analysis for root cause analysis at scale

What the research is: A fast dimensional analysis (FDA) framework that automates root cause analysis on structured logs with improved scalability.

When a failure event happens in a large-scale distributed production environment, performing root cause analysis can be challenging.

Our proposed FDA framework combines structured logs from a number of sources and provides a meaningful combination of features.

As we’ve mentioned, the challenges of performing root cause analysis in a large-scale distributed production environment make outage detection and mitigation difficult.

Read the full paper: Fast Dimensional Analysis for Root Cause Investigation in Large-Scale Service Environment

We’d like to t…

9 months ago @ engineering.fb.com
2019 @Scale Conference recap

If you are interested in future events, visit the @Scale website or join the @Scale community.

@Scale 2019: Data Infra

Zanzibar: Google’s consistent, global authorization system
Ruoming Pang, Principal Software Engineer, Google

Determining whether online users are authorized to access digital objects is central to preserving privacy.

6 technical challenges in developing a distributed SQL database
Neha Deodhar, Software Engineer, YugaByte

Neha discusses the experience of developing YugaByte.

@Scale 2019: Security

Leveraging the type system to write secure applications
Shannon Zhu, Software Engineer, Facebook

Shannon discusses ways to extend the type system to eliminate entire classes of security vul…

9 months, 1 week ago @ engineering.fb.com
Video @Scale 2019 recap

At Video @Scale 2019, engineers gathered in San Francisco for a day of technical talks focused on delivering video at scale.

Adopting video at scale
Steven Robertson, Engineer, YouTube

Steven works on streaming video performance at YouTube.

AV1 Panel
Ronald Bultje, Founder, Two Orioles
Yaowu Xu, Principal Software Engineer, Google
Chekib Nouira, Senior Video Systems Engineer, Intel

Panel moderated by Ioannis Katsavounidis.

Contextual video ad safety
Vijaya Chandra, Software Engineering Manager, Facebook
Rose Kanjirathinkal, Research Scientist, Facebook

Vijaya leads video understanding efforts at Facebook.

Video integrity at scale
Sonal Gandhi, Software Engineer, Facebook

Sonal talks about reducing har…

9 months, 3 weeks ago @ engineering.fb.com
Releasing a new benchmark and data set for evaluating neural code search models

The benchmark includes the largest evaluation data set currently available for Java, consisting of a natural language query and code snippet pairs.

This data set comprises 287 Stack Overflow question-and-answer pairs from the Stack Exchange Data Dump.

A score sheet on the evaluation data set, using two models from our recent work, is also included.

We intend for this data set to serve as a benchmark for evaluating search quality across a variety of code search models.

To evaluate the performance of these models, Stack Overflow questions and code answer pairs are prime candidates, as Stack Overflow questions effectively represent what a developer may ask.

10 months ago @ ai.facebook.com
Hydra: A framework that simplifies development of complex applications

Hydra’s flexible approach to developing, creating, and maintaining code and configurations can help speed the development of complex applications in various fields, including machine learning research.

What it does: Hydra offers an innovative approach to composing an application’s configuration, allowing changes to a composition through configuration files as well as from the command line.
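As a rough illustration of that composition idea (plain Python, deliberately not Hydra's actual API), a configuration can be assembled from a base, named config groups, and command-line style overrides applied last:

```python
# Illustrative sketch of layered config composition (not Hydra's API):
# later layers override earlier ones, explicit overrides win.
def compose(base, *config_groups, **overrides):
    cfg = dict(base)                # start from defaults
    for group in config_groups:
        cfg.update(group)           # e.g. a "db" or "optimizer" group file
    cfg.update(overrides)           # e.g. flags passed on the command line
    return cfg

base = {"batch_size": 32, "lr": 1e-3}
db_mysql = {"db_driver": "mysql", "db_port": 3306}
print(compose(base, db_mysql, batch_size=64))
# {'batch_size': 64, 'lr': 0.001, 'db_driver': 'mysql', 'db_port': 3306}
```

Hydra itself layers YAML config groups this way and exposes every resulting key for override on the command line, which is what makes swapping whole subsystems (a database, an optimizer) a one-flag change.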

Hydra speeds development of such applications while reducing the chances of bugs, and it enables code to evolve more naturally in response to new requirements.

Why it matters: Hydra is already in use at Facebook to prototype complex research projects.

We expect to continue using the Hydra framework for buil…

10 months ago @ engineering.fb.com
MaRS: How Facebook keeps maps current and accurate

To reduce the risk of bad edits, whether intentional (vandalism) or unintentional, we don’t update our local copy directly.

So we, like most consumers of OSM data, have an internal storage format (a local copy).

Current approaches to keeping OSM data updated primarily focus on tackling the two axes separately.

Freshness is achieved by simply consuming upstream changesets faster, or essentially rebasing the local copy with the upstream master on a regular cadence (e.g., daily or weekly).

Let V(Downstream) be the current downstream local copy version based on an earlier version of upstream.

10 months, 1 week ago @ engineering.fb.com
Register now for @Scale 2019!

Registration is officially open for @Scale 2019.

Topics for the @Scale 2019 talks include cloud native platforms for event streaming, advances in self-supervised learning and natural language processing, securing SSH traffic, deploying DNS privacy technologies at scale, and more.

To register for @Scale 2019, enter your invite code here.

Visit the @Scale Community page and message us with your name, company name, and email address.

If you’ve never been to an @Scale event, you can watch David Patterson of Google and Clément Farabet of NVIDIA open last year’s event, or see videos of all the talks in last year’s recap.

10 months, 3 weeks ago @ engineering.fb.com
Creating a data set and a challenge for deepfakes

Yet the industry doesn't have a great data set or benchmark for detecting them.

That's why Facebook is commissioning a realistic data set that will use paid actors, with the required consent obtained, to contribute to the challenge.

No Facebook user data will be used in this data set.

To ensure the quality of the data set and challenge parameters, they will initially be tested through a targeted technical working session this October at the International Conference on Computer Vision (ICCV).

The full data set release and the DFDC launch will happen at the Conference on Neural Information Processing Systems (NeurIPS) this December.

11 months ago @ ai.facebook.com
New advances in natural language processing

Natural language understanding (NLU) and language translation are key to a range of important applications, including identifying and removing harmful content at scale and connecting people across different languages worldwide.

We’ve also introduced a new self-supervised pretraining approach, RoBERTa, that surpassed all existing NLU systems on several language comprehension tasks.

According to human evaluations, our models were ranked top in four translation tasks: from English to German, German to English, English to Russian, and Russian to English.

SuperGLUE follows in the footsteps of GLUE, which offers a single-number metric that summarizes progress on a diverse set of NLP tasks.

By cha…

11 months, 3 weeks ago @ ai.facebook.com
A new model for word embeddings that are resilient to misspellings

What the research is: A new model to learn word embeddings (words or phrases mapped to dense vectors of numbers that represent their meaning) that are resilient to misspellings.

To address this deficiency, we propose Misspelling Oblivious Embeddings (MOE), a new model that combines our open source library fastText with a supervised task that embeds misspellings close to their correct variants.

In addition to the semantic loss, MOE considers an additional supervised loss that we call the spell correction loss.

The spell correction loss aims to embed misspellings close to their correct versions; the model is trained by minimizing the weighted sum of the semantic loss and the spell correction loss.
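That combined objective can be sketched in a few lines. The following is an illustrative simplification: the flat semantic-loss value, the squared-distance spell correction loss, and the `alpha` weight are assumptions for this example, not the paper's exact formulation.

```python
# Hedged sketch of MOE's objective: a weighted sum of a semantic loss
# (standing in for fastText's objective) and a spell correction loss
# that pulls misspelling embeddings toward their correct variants.
def spell_correction_loss(embeddings, misspelling_pairs):
    total = 0.0
    for wrong, right in misspelling_pairs:
        total += sum((a - b) ** 2
                     for a, b in zip(embeddings[wrong], embeddings[right]))
    return total / len(misspelling_pairs)

def moe_loss(semantic_loss, embeddings, misspelling_pairs, alpha=0.5):
    # alpha trades off semantic quality against misspelling robustness.
    return ((1 - alpha) * semantic_loss
            + alpha * spell_correction_loss(embeddings, misspelling_pairs))

# Tiny invented embeddings for illustration.
emb = {"hello": [1.0, 0.0], "helo": [0.8, 0.1]}
loss = moe_loss(semantic_loss=0.4, embeddings=emb,
                misspelling_pairs=[("helo", "hello")])
```

Minimizing the second term drives the vector for "helo" toward the vector for "hello", which is exactly the resilience property described above.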

Our approach will improve…

12 months ago @ ai.facebook.com
Michael F. Cohen awarded 2019 Steven A. Coons award

On July 29 at SIGGRAPH, Michael F. Cohen will receive the 2019 Steven A. Coons Award for Outstanding Creative Contributions to Computer Graphics.

The award is given to one individual every two years to honor outstanding lifetime contributions to computer graphics and interactive techniques.

Cohen joined Facebook in Fall 2015 as Director of Facebook’s Computational Photography Research team, which was formed to explore new ways to share photos and videos online.

I never became an engineer but rather first entered the field of computer graphics with intentions to continue studies related to civil engineering.

This is really a marriage of computer graphics and computer vision.

1 year ago @ research.fb.com
EGG: A toolkit for multi-agent language emergence simulations

What’s new: EGG is a new toolkit that allows researchers and developers to quickly create game simulations in which two neural network agents devise their own discrete communication system in order to solve a task together.

A lively area of machine learning (ML) research, language emergence would benefit from a more interdisciplinary approach.

Why it matters: Human language is an extremely powerful communication system that is unique in nature.

Which innate biases are necessary to ensure that a communication system shares the core properties of human language?

With EGG, ML experts can quickly probe the communication skills of new AI architectures.

1 year ago @ engineering.fb.com
MIT AI
last post 1 week ago
Looking into the black box

But much of this success involves trial and error when it comes to the deep learning networks themselves.

A group of MIT researchers recently reviewed their contributions to a better theoretical understanding of deep learning networks, providing direction for the field moving forward.

It is time to stand back and review recent insights.”

Climbing data mountains

Our current era is marked by a superabundance of data: data from inexpensive sensors of all types, text, the internet, and large amounts of genomic data being generated in the life sciences.

Generalization puzzle

There is a second puzzle about what is sometimes called the unreasonable effectiveness of deep networks.

Though deep learnin…

1 week ago @ news.mit.edu
Commentary: America must invest in its ability to innovate

Bush’s report to the president of the United States, “Science: The Endless Frontier,” called on the government to support basic research in university labs.

Named the “Endless Frontier Act,” the bill would support research focused on advancing key technologies like artificial intelligence and quantum computing.

It does not seek to alter or replace the NSF, but to “create new strength in parallel,” they write.

But if leaders take the right steps now, they write, those choices will seem, in retrospect, obvious and wise.

“Now as then, our national prosperity hinges on the next generation of technical triumphs,” Reif and McRobbie write.

1 week, 4 days ago @ news.mit.edu
Tackling the misinformation epidemic with “In Event of Moon Disaster”

As the technology to produce realistic “deepfakes” becomes more easily available, distinguishing fact from fiction will only get more challenging.

Through these sophisticated AI and machine learning technologies, the seven-minute film shows how thoroughly convincing deepfakes can be.

Also part of the launch is a new documentary, “To Make a Deepfake,” a 30-minute film by Scientific American, that uses “In Event of Moon Disaster” as a jumping-off point to explain the technology behind AI-generated media.

The project is supported by the MIT Open Documentary Lab and the Mozilla Foundation, which awarded “In Event of Moon Disaster” a Creative Media Award last year.

The new website is the project…

2 weeks, 1 day ago @ news.mit.edu
Faculty receive funding to develop artificial intelligence techniques to combat Covid-19

Artificial intelligence has the power to help put an end to the Covid-19 pandemic.

Now, MIT researchers working on seven groundbreaking projects on Covid-19 will be funded to more rapidly develop and apply novel AI techniques to improve medical response and slow the pandemic spread.

The consortium is dedicated to accelerating advances in research, combining machine learning, artificial intelligence, the internet of things, ethics, and public policy to enhance societal outcomes.

Gifford was awarded funding for his project that uses machine learning to develop more informed vaccine designs with improved population coverage, and to develop models of Covid-19 disease severity using individu…

2 weeks, 4 days ago @ news.mit.edu
Letting robots manipulate cables

For humans, it can be challenging to manipulate thin flexible objects like ropes, wires, or cables.

As a cable slides between the fingers, its shape is constantly changing, and the robot’s fingers must be constantly sensing and adjusting the cable’s position and motion.

The team’s new system uses a pair of soft robotic grippers with high-resolution tactile sensors (and no added mechanical constraints) to successfully manipulate freely moving cables.

The team’s second step was to create a perception-and-control framework to allow cable manipulation.

In the future, they plan to study more complex cable manipulation tasks such as cable routing and cable inserting through obstacles, and they wa…

3 weeks, 1 day ago @ news.mit.edu
Exploring interactions of light and matter

His father, trained as a mechanical engineer, spent his career working first in that field, then in electrical engineering, and then civil engineering.

Last year, Hu earned tenure as an associate professor in MIT’s Department of Materials Science and Engineering.

“I got fascinated with light,” he says, recalling how he began working in this field.

This includes work on devices called optical diodes or optical isolators, which allow light to pass through only in one direction, and systems for coupling light signals into and out of photonic chips.

Lately, Hu has been focusing on applying machine-learning methods to improve the performance of optical systems.

1 month ago @ news.mit.edu
The MIT Press and UC Berkeley launch Rapid Reviews: COVID-19

The MIT Press has announced the launch of Rapid Reviews: COVID-19 (RR:C19), an open access, rapid-review overlay journal that will accelerate peer review of Covid-19-related research and deliver real-time, verified scientific information that policymakers and health leaders can use.

Using artificial intelligence tools, a global team will identify promising scholarship in preprint repositories, commission expert peer reviews, and publish the results on an open access platform in a completely transparent process.

Amy Brand, director of the MIT Press, sees the no-cost open access model as a way to increase the impact of global research and disseminate high-quality scholarship.

“We are confident…

1 month ago @ news.mit.edu
Improving global health equity by helping clinics do more with less

Despite these encouraging signs, however, the availability of essential vaccines has stagnated globally in recent years, according to the World Health Organization.

Both products represent steps toward macro-eyes’ larger goal of transforming health care through artificial intelligence.

“The state of the art in machine learning will result from confronting fundamental challenges in the most difficult environments in the world,” Fels says.

The pair’s experience crunching numbers in different industries alerted them to a shortcoming in health care.

The founders are also exploring ways to apply that approach to help direct Covid-19 patients to health clinics with sufficient capacity.

1 month, 1 week ago @ news.mit.edu
Identifying a melody by studying a musician’s body language

When the ear fails to tell two instruments apart, the eye often pitches in by matching each musician’s movements to the beat of each part.

“Body keypoints provide powerful structural information,” says the study’s lead author, Chuang Gan, an IBM researcher at the lab.

“We learn from all of our senses,” says Antonio Torralba, an MIT professor and co-senior author of the study.

An update to PixelPlayer allowed it to distinguish between two violins in a duet by matching each musician’s movements with the tempo of their part.

The latter study suggests that sound-tracking tools might be a useful addition in self-driving cars, complementing their cameras in poor driving conditions.

1 month, 1 week ago @ news.mit.edu
Cynthia Breazeal named Media Lab associate director

Cynthia Breazeal has been promoted to full professor and named associate director of the Media Lab, joining the two other associate directors: Hiroshi Ishii and Andrew Lippman.

In her new associate director role, Breazeal will work with lab faculty and researchers to develop new strategic research initiatives.

She will also play a key role in exploring new funding mechanisms to support broad Media Lab needs, including multi-faculty research efforts, collaborations with other labs and departments across the MIT campus, and experimental executive education opportunities.

Her book, “Designing Sociable Robots” (MIT Press, 2002), is considered pivotal in launching the field.

The following year, …

1 month, 2 weeks ago @ news.mit.edu
Bringing the predictive power of artificial intelligence to health care

It blossomed into a six-year stint at the Broad, after which he continued exploring the intersection of big data and health care.

“After a year in health care, I realized it was going to be really hard to do anything else,” DeCaprio says.

Often the first problem startups run into is making their algorithms work with each health care system’s data.

Another limitation of AI in health care has been the difficulty of understanding how models get to results.

“Someone who is 85 years old and shut in may not know there’s a community based organization that will deliver them groceries.”

For DeCaprio, bringing the predictive power of AI to health care has been a rewarding, if humbling, experience.

1 month, 2 weeks ago @ news.mit.edu
MIT and Toyota release innovative dataset to accelerate autonomous driving research

The following was issued as a joint release from the MIT AgeLab and Toyota Collaborative Safety Research Center.

These are some of the questions researchers from the AgeLab at the MIT Center for Transportation and Logistics and the Toyota Collaborative Safety Research Center (CSRC) are trying to answer by sharing an innovative new open dataset called DriveSeg.

Through the release of DriveSeg, MIT and Toyota are working to advance research in autonomous driving systems that, much like human perception, perceive the driving environment as a continuous flow of visual information.

According to Sherony, video-based driving scene perception provides a flow of data that more closely resembles dyna…

1 month, 2 weeks ago @ news.mit.edu
MIT-Takeda program launches

In February, researchers from MIT and Takeda Pharmaceuticals joined together to celebrate the official launch of the MIT-Takeda Program.

The MIT-Takeda Program aims to fuel the development and application of artificial intelligence (AI) capabilities to benefit human health and drug development.

“We were truly impressed by the creativity and breadth of the proposals we received,” says Anantha P. Chandrakasan, dean of the School of Engineering, Vannevar Bush Professor of Electrical Engineering and Computer Science, and co-chair of the MIT-Takeda Program Steering Committee.

“Together we are building capabilities and addressing challenges through interrogation of multiple data types that we hav…

1 month, 2 weeks ago @ news.mit.edu
What jumps out in a photo changes the longer we look

But in the real world, human attention often shifts abruptly.

When tested, their model outperformed the state of the art at predicting saliency across viewing durations.

In addition to guiding an editing tool to crop an image for shorter or longer viewing durations, it could prioritize which elements in a compressed image to render first for viewers.

Research on human attention offers insights for technologists.

By making it faster and cheaper to gather human attention data, the platforms may help to generate new knowledge on human vision and cognition.

1 month, 2 weeks ago @ news.mit.edu
Learning the ropes and throwing lifelines

From her apartment in Sidney-Pacific, where she has stayed put due to travel restrictions in her home country of India, Chauhan is still learning the ropes of her new position.

“It gave me a sense of community and made me feel like I have a family here,” she says.

Chauhan has found additional ways to address the particular difficulties that international students face.

As a member of the Presidential Advisory Council this year, she gathered international student testimonies on visa difficulties and presented them to MIT’s president and the director of the International Students Office.

For Chauhan, that meant working as a teaching assistant, drawing henna designs, singing, enjoying yoga, an…

1 month, 3 weeks ago @ news.mit.edu
Berkeley AI
last post 1 day, 11 hours ago
Estimating the fatality rate is difficult but doable with better data

The case fatality rate quantifies how dangerous COVID-19 is, and how risk of death varies with strata like geography, age, and race.

Current estimates of the COVID-19 case fatality rate (CFR) are biased for dozens of reasons, from under-testing of asymptomatic cases to government misreporting.

The mathematical form of the naive estimator $E_{\rm naive}$ allows us to see easily what we need to do to make it unbiased.

If we collect data properly, even the naive estimator $E_{\rm naive}$ has good performance.

I’d like to re-emphasize a point here: collecting data as above will make the naive estimator $E_{\rm naive}$ unbias…
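For concreteness, the naive estimator discussed above is just reported deaths divided by confirmed cases; a minimal sketch (the input figures below are invented for illustration):

```python
# Minimal sketch of the naive case fatality rate estimator
# E_naive = deaths / confirmed cases. It is unbiased only when cases
# are sampled representatively; under-testing of asymptomatic cases
# inflates it, which is the bias the post describes.
def naive_cfr(deaths, confirmed_cases):
    if confirmed_cases <= 0:
        raise ValueError("need at least one confirmed case")
    return deaths / confirmed_cases

print(naive_cfr(deaths=50, confirmed_cases=1000))  # 0.05
```

With proper (representative) sampling the same ratio becomes a sound estimate, which is the point re-emphasized above.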

1 day, 11 hours ago @ bair.berkeley.edu
Exploring Exploration: Comparing Children with RL Agents in Unified Environments

Despite recent advances in artificial intelligence (AI) research, human children are still by far the best learners we know of, learning impressive skills like language and high-level reasoning from very little data.

The main thing we know about children’s exploration is that children form hypotheses about how the world works, and they engage in exploration to test those hypotheses.

How do AI agents explore?

We do this using DeepMind Lab, an existing platform for training and evaluating RL agents.

Conclusion and future work

In conclusion, this work only begins to touch on a number of deep questions regarding …

1 week, 4 days ago @ bair.berkeley.edu
Can RL From Pixels be as Efficient as RL From State?


To date, it has been commonly assumed that RL operating on coordinate state is significantly more data-efficient than pixel-based RL.

In principle, if the environment is fully observable, we should also be able to learn representations that capture the state.

Contrastive Learning in RL Setting

CURL was inspired by recent advances in contrastive representation learning in computer vision (CPC, CPCv2, MoCo, SimCLR).

Contrastive Learning vs Data Augmentation

If data augmentation with RL performs so well, do we need unsupervised representation learning?

2 weeks, 2 days ago @ bair.berkeley.edu
Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions

One might naturally wonder what it might take for learning systems to scale in complexity in the same way as programmed systems have.

In other words, the society of primitive agents forms a super-agent that solves the MDP as a consequence of the primitive agents' optimal auction strategies.

Societal decision-making frames standard reinforcement learning from the perspective of self-organizing primitive agents.

As we discuss next, the primitive agents need not be restricted to literal actions.

In some sense these complex learning systems are grown rather than built because every component at every abstraction layer is learning.

3 weeks, 3 days ago @ bair.berkeley.edu
D4RL: Building Better Benchmarks for Offline Reinforcement Learning

In offline RL, we assume all experience is collected offline and fixed; no additional data can be collected.

In order to develop effective algorithms for offline RL, we need widely available benchmarks that are easy to use and can accurately measure progress on this problem.

Narrow and biased data distributions are a common property in real-world datasets that can create problems for offline RL algorithms.

The Flow project proposes to use autonomous vehicles for reducing traffic congestion, which we believe is a compelling use case for offline RL.

Future DirectionsIn the near future, we would be excited to see offline RL applications move from simulated domains to real-world domains where s…

1 month, 1 week ago @ bair.berkeley.edu
Open Compound Domain Adaptation

Therefore, we rethink machine learning and domain adaptation systems and introduce a continuous learning protocol for the domain adaptation scenario.

Open Compound Domain Adaptation (OCDA)The goal of domain adaptation is to adapt the model learned on the training data to the test data of a different distribution.

We propose to study Open Compound Domain Adaptation (OCDA), a continuous and more realistic setting for domain adaptation (Figure 2).

The newly proposed Open Compound Domain Adaptation (OCDA) serves as a more comprehensive and more realistic touchstone for evaluating domain adaptation and transfer learning systems.

Figure 3: The differences between single-target doma…

1 month, 3 weeks ago @ bair.berkeley.edu
OmniTact: A Multi-Directional High-Resolution Touch Sensor

Human thumb next to our OmniTact sensor, and a US penny for scale.

Recently, the GelSight sensor has caught significant interest for learning-based robotics due to its low cost and rich signal.

Comparison of GelSight-style sensor (left side) to our OmniTact sensor (right side).

The OmniTact sensor: Our OmniTact sensor design aims to address these limitations.

We additionally compared performance with another multi-directional tactile sensor, the OptoForce sensor, which only had a success rate of 17%.

2 months, 3 weeks ago @ bair.berkeley.edu
Four Novel Approaches to Manipulating Fabric using Model-Free and Model-Based Deep Learning in Simulation

Humans manipulate 2D deformable structures such as fabric on a daily basis, from putting on clothes to making beds.

Model-free methods (model-free learning without demonstrations): In this paper we present a model-free deep reinforcement learning approach for smoothing cloth.

An example of real robot cloth smoothing experiments with varying starting states and cloth colors.

Since this policy is easy to define, we code an algorithmic supervisor in simulation and perform imitation learning using Dataset Aggregation (DAgger).
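
The relabel-and-aggregate loop behind DAgger can be sketched in a few lines. This is a toy illustration, not the paper's setup: the supervisor, rollout, and training functions below are hypothetical stand-ins.

```python
# DAgger sketch: roll out the learner, relabel visited states with the
# supervisor's actions, aggregate into one dataset, and retrain.
def dagger(initial_policy, supervisor, rollout, train, iterations):
    dataset = []
    policy = initial_policy
    for _ in range(iterations):
        states = rollout(policy)                         # states visited by the learner
        dataset += [(s, supervisor(s)) for s in states]  # expert relabels them
        policy = train(dataset)                          # retrain on the aggregate
    return policy

# Toy instantiation: the "expert" maps state s to action s + 1, and
# "training" is a lookup table over the aggregated pairs.
def toy_train(dataset):
    table = dict(dataset)
    return lambda s: table.get(s, 0)

pi = dagger(lambda s: 0, lambda s: s + 1, lambda p: [0, 1, 2], toy_train, 2)
print(pi(2))  # → 3
```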

Several episodes of both manipulating rope and cloth using our method,…

3 months ago @ bair.berkeley.edu
Unsupervised Meta-Learning: Learning to Learn without Supervision

This post is cross-listed on the CMU ML blog.

In this post we introduce theory and algorithms for unsupervised meta-learning, where machine learning algorithms themselves propose their own task distributions.

For example, a distribution over supervised learning tasks may include learning a dog detector, learning a cat detector, and learning a bird detector.

These unsupervised meta-learning algorithms allow for learning in regimes previously impractical, and further expand the capability of machine learning methods.

A number of open questions remain about unsupervised meta-learning: Unsupervised learning is closely connected to…

3 months ago @ bair.berkeley.edu
The Ingredients of Real World Robotic Reinforcement Learning

The simulation will never exactly match the real world, which means that improvements in simulation performance may not translate to improvements in the real world.

However, training robots in the real world with reinforcement learning has proven challenging, due to certain constraints.

What makes real world robotic reinforcement learning so challenging?

We show effective uninstrumented real world learning on two dexterous manipulation tasks with a 3 fingered robotic hand.

However, we believe that the ingredients of real world RL that we have proposed should endure as principles of design for real world RL systems.

3 months, 1 week ago @ bair.berkeley.edu
AWS Machine Learning
last post 1 hour ago
Building machine learning workflows with Amazon SageMaker Processing jobs and AWS Step Functions

This integration allows data scientists to easily integrate Amazon SageMaker Processing into their ML workflows using Step Functions and the Step Functions Data Science SDK.

Benefits of the AWS Step Functions Data Science SDK: The AWS Step Functions Data Science SDK allows data scientists to easily construct ML workflows without dealing with DevOps tasks like provisioning hardware or deploying software.

The SDK integrates with Amazon SNS, Amazon SQS, Amazon EMR, AWS Lambda, AWS Glue, AWS Batch, and Amazon Elastic Container Service (Amazon ECS). For more information, see the AWS Step Functions Data Science SDK; you can build and orchestrate ML workflows using Python.

1 hour ago @ aws.amazon.com
Improving speech-to-text transcripts from Amazon Transcribe using custom vocabularies and Amazon Augmented AI

When using Amazon Transcribe out of the box, you may find that some of these technical mentions are mis-transcribed.

Improve transcription using custom vocabulary.

For more information on custom vocabulary tables, see Create a Custom Vocabulary Using a Table.
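
A custom vocabulary table is a tab-separated file with the columns Phrase, IPA, SoundsLike, and DisplayAs. A minimal sketch that assembles one (the example terms are hypothetical):

```python
# Build an Amazon Transcribe custom vocabulary table as tab-separated text.
# Unused columns are left empty; multi-word phrases use hyphens in Phrase.
def build_vocabulary_table(entries):
    header = ["Phrase", "IPA", "SoundsLike", "DisplayAs"]
    rows = ["\t".join(header)]
    for entry in entries:
        rows.append("\t".join(entry.get(col, "") for col in header))
    return "\n".join(rows)

table = build_vocabulary_table([
    {"Phrase": "Amazon-S-three", "DisplayAs": "Amazon S3"},
    {"Phrase": "SageMaker", "SoundsLike": "sage-may-ker"},
])
print(table)
```

The resulting file is uploaded to Amazon S3 and referenced when creating the vocabulary.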

Using custom vocabularies resulted in an increase of 80 percentage points or more in the number of correctly transcribed technical terms.

She works with multiple teams in AWS to create technical documentation and tutorials for customers using Amazon SageMaker, MxNet, and AutoGluon.

23 hours ago @ aws.amazon.com
This month in AWS Machine Learning: July 2020 edition

Every day there is something new going on in the world of AWS Machine Learning—from launches to new use cases like posture detection to interactive trainings like the AWS Power Hour: Machine Learning on Twitch.

Launches: As models become more sophisticated, AWS customers are increasingly applying machine learning (ML) prediction to video content, whether that’s in media and entertainment, autonomous driving, or more.

AWS Power Hour: Machine Learning is a weekly, live-streamed program that premiered Thursday, July 23, at 7:00 p.m. EST and will air at that time every Thursday for 7 weeks.

See you next month for more on AWS ML!

About the author: Laura Jones is a product marketing lead for AWS AI/M…

3 days, 21 hours ago @ aws.amazon.com
Enhancing recommendation filters by filtering on item metadata with Amazon Personalize

You can use the Amazon Personalize console or API to create a filter with your business logic using the Amazon Personalize domain specific language (DSL).

This post walks you through setting up and using item and user metadata-based recommendation filters in Amazon Personalize.

PrerequisitesTo define and apply filters, you first need to set up the following Amazon Personalize resources.

Amazon Personalize can filter items based on user-item interaction, item metadata, or user metadata datasets.

For more information about optimizing your user experience with Amazon Personalize, see What Is Amazon Personalize?
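
As a sketch of what a filter definition looks like with the DSL (the dataset group ARN and the GENRE metadata column below are hypothetical):

```python
# An Amazon Personalize filter expression excludes (or includes) items based
# on dataset columns; this dict mirrors the create_filter API parameters.
filter_expression = 'EXCLUDE ItemID WHERE Items.GENRE IN ("horror")'

create_filter_params = {
    "name": "exclude-horror",
    "datasetGroupArn": "arn:aws:personalize:us-east-1:123456789012:dataset-group/demo",
    "filterExpression": filter_expression,
}
# With boto3 this would be submitted as:
#   boto3.client("personalize").create_filter(**create_filter_params)
print(create_filter_params["name"])
```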

3 days, 22 hours ago @ aws.amazon.com
Code-free machine learning: AutoML with AutoGluon, Amazon SageMaker, and AWS Lambda

After you upload the data to Amazon S3, a Lambda function kicks off an Amazon SageMaker model training job that runs the pre-made AutoGluon script on the training data.

Deploying the pipeline with AWS CloudFormation: You can deploy this pipeline automatically in an AWS account using a pre-made AWS CloudFormation template.

All these files are available for download to your local machine from the Amazon S3 console (see the following screenshot).

Conclusion: In this post, we demonstrated how to train ML models and make predictions without writing a single line of code, thanks to AutoGluon, Amazon SageMaker, and AWS Lambda.

The Amazon ML Solutions Lab pairs your team with Amazon ML experts to prepar…

3 days, 22 hours ago @ aws.amazon.com
Announcing the AWS DeepComposer Chartbusters Spin the Model challenge

Whether your jam is reggae, hip-hop, or electronic, you can get creative and enter the latest AWS DeepComposer Chartbusters challenge!

When you submit a composition, AWS DeepComposer automatically adds it to the Spin the Model challenge playlist in SoundCloud.

You can use the “A deep dive into training an AR-CNN model” learning capsule, available on the AWS DeepComposer console, to learn the concepts needed to train a model.

To access the learning capsule, sign in to the AWS DeepComposer console and choose learning capsules in the navigation pane.

For Imported track, choose Choose file to upload the file.

4 days, 1 hour ago @ aws.amazon.com
Announcing the winner for the AWS DeepComposer Chartbusters Bach to the Future challenge

We are excited to announce the top 10 compositions and the winner for the AWS DeepComposer Chartbusters Bach to the Future challenge.

AWS DeepComposer gives developers a creative way to get started with machine learning.

The first challenge, Bach to the Future, required developers to use a new generative AI algorithm provided on the AWS DeepComposer console to create compositions in the style of Bach.

The winner, Catherine Chui, will receive an AWS DeepComposer Chartbusters gold record.

For more information about the competition and how to participate, see Announcing the AWS DeepComposer Chartbusters Spin the Model challenge.

4 days, 2 hours ago @ aws.amazon.com
Create a multi-region Amazon Lex bot with Amazon Connect for high availability

AWS customers rely on Amazon Lex bots to power their Amazon Connect self-service conversational experiences on telephone and other channels.

The region check function reads this table for the most up-to-date primary Region mapping for Amazon Connect and Amazon Lex.

The PutSession API call doesn’t have any extra costs associated with Amazon Lex, but it doesn’t test any natural language understanding (NLU) features of Amazon Lex.

On every call that Amazon Connect receives, it issues a region check function call to get the active Amazon Lex Region for that particular Amazon Connect Region.

On the Amazon Connect console, choose the instance alias where you want the Amazon Connect flow to be.
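
The failover logic described above reduces to a small decision function. A sketch with a hypothetical health probe (in practice, a PutSession call against the primary Region):

```python
# Pick the active Amazon Lex Region from a mapping table, falling back to
# the secondary Region when the primary fails its health check.
def active_lex_region(region_map, is_healthy):
    primary = region_map["primary"]
    if is_healthy(primary):      # e.g. a successful PutSession API call
        return primary
    return region_map["secondary"]

regions = {"primary": "us-east-1", "secondary": "us-west-2"}
print(active_lex_region(regions, lambda r: r == "us-west-2"))  # primary unhealthy
```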

4 days, 5 hours ago @ aws.amazon.com
Optimizing your engagement marketing with personalized recommendations using Amazon Personalize and Braze

Fortunately, the features and integration options of Braze and Amazon Personalize provide the flexibility to suit your operational requirements.

With personalization built into the application, you can connect Amazon Personalize with Braze to deliver personalized recommendations through outbound engagement channels such as email, SMS, and push notifications.

Furthermore, if you don’t need an Amazon Personalize campaign for other purposes or you’re creating an Amazon Personalize solution dedicated to email personalization, you can forego creating a campaign entirely.

Import users into Braze and build a Braze campaign that uses Connected Content to retrieve personalized recommendations from A…

5 days, 20 hours ago @ aws.amazon.com
Translating documents, spreadsheets, and presentations in Office Open XML format using Amazon Translate

Now you can translate .docx, .xlsx, and .pptx documents using Amazon Translate.

Amazon Translate now supports translation of Office Open XML documents in DOCX, PPTX, and XLSX format.

Amazon Translate is a fully managed neural machine translation service that delivers high-quality and affordable language translation in 55 languages.

In this post, we walk you through a step-by-step process to translate documents on the AWS Management Console.

For more information about performing batch translation jobs, see Starting a Batch Translation Job.
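
For the batch path, the same translation can be requested through the StartTextTranslationJob API. A sketch of the request (bucket names and the IAM role ARN are placeholders):

```python
# Batch translation request for Office Open XML (.docx) input; the ContentType
# tells Amazon Translate how to parse the documents under the input S3 prefix.
request = {
    "JobName": "docx-translation-demo",
    "InputDataConfig": {
        "S3Uri": "s3://my-input-bucket/docs/",
        "ContentType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    },
    "OutputDataConfig": {"S3Uri": "s3://my-output-bucket/translated/"},
    "DataAccessRoleArn": "arn:aws:iam::123456789012:role/TranslateBatchRole",
    "SourceLanguageCode": "en",
    "TargetLanguageCodes": ["es"],
}
# boto3.client("translate").start_text_translation_job(**request)
print(request["JobName"])
```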

5 days, 23 hours ago @ aws.amazon.com
Simplifying application onboarding with Amazon CodeGuru Profiler

If the role or user already exists in IAM, you simply select the role or user on the CodeGuru Profiler console.

For more information, see Setting up Amazon CodeGuru Profiler.

Run the demo application with the following code:
export AWS_CODEGURU_PROFILER_GROUP_NAME=DemoApplication-WithoutIssues
mvn clean install ## This command will generate the DemoApplication-1.0-jar-with-dependencies.jar
java -javaagent:codeguru-profiler-java-agent-standalone-1.0.0.jar \
-jar target/DemoApplication-1.0-jar-with-dependencies.jar without-issues
For more information about visualizations in CodeGuru …

6 days ago @ aws.amazon.com
How SNCF Réseau and Olexya migrated a Caffe2 vision pipeline to Managed Spot Training in Amazon SageMaker

Data channels are Amazon S3 ARNs passed to the Amazon SageMaker SDK at training time and ingested in the Amazon SageMaker container when training starts.

Furthermore, Amazon SageMaker training API calls were set with Managed Spot Instance usage activated, which contributed to a reported savings of 71% compared to the on-demand Amazon SageMaker price.

Amazon SageMaker Managed Spot Training is an Amazon SageMaker feature that enables the use of Amazon Elastic Compute Cloud (Amazon EC2) Spot Instance capacity for training.

In Amazon SageMaker, Spot Instance usage is fully managed by the service, and you can invoke it by setting two training SDK parameters: train_use_spot_instances=True to reque…
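
The parameters named in the post can be sketched as estimator keyword arguments (v1 SDK names, matching the excerpt; SageMaker Python SDK v2 renames them to use_spot_instances, max_run, and max_wait):

```python
# Managed Spot Training knobs: request Spot capacity and bound both how long
# the job may run and how long SageMaker may wait for capacity
# (max_wait must be >= max_run).
spot_kwargs = {
    "train_use_spot_instances": True,  # use EC2 Spot capacity for training
    "train_max_run": 3600,             # seconds of actual training allowed
    "train_max_wait": 7200,            # total seconds including Spot waits
}
print(spot_kwargs["train_use_spot_instances"])
```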

1 week ago @ aws.amazon.com
Building a multilingual question and answer bot with Amazon Lex

This post describes how to achieve that by using the multi-language functionality of your question and answer bot (QnABot).

For instructions on creating and customizing your bot, see Create a Question and Answer Bot with Amazon Lex and Amazon Alexa or the Q&A Self-Paced Guide.

Use the Content Designer Question and Test tools to find your existing documents and edit them directly in the console.

For instructions, see Create a Question and Answer Bot with Amazon Lex and Amazon Alexa.

From the drop-down menu with your default language, choose Language settings.

1 week, 1 day ago @ aws.amazon.com
Enhancing your chatbot experience with web browsing

Installing the chatbot UI: To install your chatbot, deploy the chatbot UI in your AWS account by launching the following AWS CloudFormation stack, and set EnableCognitoLogin to true in the parameters.

Installing the chatbot UI enhancer: After you install the chatbot UI, launch the following AWS CloudFormation stack. There are two parameters for this stack: BotName – the chatbot UI bot you deployed.

Chatbot iframe – This is the chatbot UI that the end-user interacts with.

Chatbot UI user login with Amazon CognitoWhen you’re authenticated through the integrated Amazon Cognito feature, the chatbot UI attaches a signed token as a session attr…

1 week, 3 days ago @ aws.amazon.com
Processing PDF documents with a human loop using Amazon Textract and Amazon Augmented AI

For more information, see Using Amazon Textract with Amazon Augmented AI for processing critical documents.

Although Amazon Textract can process images (PNG and JPG) and PDF documents, Amazon A2I human reviewers need to have individual pages as images and process them individually using the AnalyzeDocument API of Amazon Textract.

On the Amazon SageMaker console, navigate to the Human review workflows page and choose Create human review workflow.

Updating the solution with the human review workflow: You’re now ready to add your human review workflow ARN.

For more information about Amazon Textract and Amazon A2I, see Using Amazon Augmented AI with Amazon Textract.

1 week, 4 days ago @ aws.amazon.com
NVIDIA
last post 4 hours ago
AI Explains AI: Fiddler Develops Model Explainability for Transparency

The San Francisco-based startup offers an explainable AI platform that enables companies to explain, monitor and analyze their AI products.

Explainable AI is a growing area of interest for enterprises because those outside of engineering often need to understand how their AI models work.

Explainable AI is a set of tools and techniques that help explore the math inside an AI model.

The result is that explainable AI can help deliver insights into how and why a particular decision was made by a model.

Explainability for transparency: Founded in 2018, Fiddler Labs offers explainability for greater transparency in businesses.

4 hours ago @ blogs.nvidia.com
Keeping a Watchful AI: NASA Project Aims to Predict Space Weather Events

It uses datasets of tracked changes in the magnetosphere — where the Earth’s magnetic field interacts with solar wind — to train AI-powered models that can detect patterns of space weather events and predict their Earth-related impacts.

Modeling space weather impacts with AI: Ganju’s work with the FDL began in 2017, when its founder, James Parr, asked her to start advising the organization.

Her current task, advising the geoeffectiveness challenge, seeks to use machine learning to characterize magnetic field perturbations and model the impact of space weather events.

In addition to solar storms, space weather events can include such activities as solar flares, which are sudden flashes of incr…

5 hours ago @ blogs.nvidia.com
Improving INT8 Accuracy Using Quantization Aware Training and the NVIDIA Transfer Learning Toolkit

Making the training process aware of this desired outcome is called quantization-aware training (QAT).

After the training is complete with a satisfactory model accuracy, the model is then calibrated using the TensorRT INT8 entropy calibrator.

For more information about training a DetectNet_v2 model using the PeopleNet model as pretrained weights, see Training with Custom Pretrained Models Using the NVIDIA Transfer Learning Toolkit.

To deploy this model with the DLA, you must generate the calibration cache file using PTQ on the QAT-trained .tlt model file.

To deploy the PeopleNet v2.0 model on the DLA using INT8 mode, we generated the quantization scales using the force_ptq mode of tlt-expo…
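
Independent of the TLT/TensorRT tooling, the core QAT idea is to run a quantize-dequantize round trip in the forward pass so the network trains against INT8 rounding and clamping. A minimal sketch of symmetric INT8 fake quantization:

```python
# Symmetric INT8 fake quantization: scale by amax/127, round, clamp to the
# INT8 range, then rescale back to floating point.
def fake_quantize(values, amax):
    scale = amax / 127.0
    out = []
    for v in values:
        q = max(-127, min(127, round(v / scale)))  # INT8 rounding + clamping
        out.append(q * scale)
    return out

print(fake_quantize([0.5, -1.2, 3.0], amax=2.0))  # 3.0 clamps to amax
```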

7 hours ago @ developer.nvidia.com
Training Instance Segmentation Models Using Mask R-CNN on the NVIDIA Transfer Learning Toolkit

Instance segmentation: segmentation masks over detected objects.

With the release of TLT 2.0, NVIDIA added training support for instance segmentation, using Mask R-CNN.

Mask R-CNN is natively integrated with the DeepStream SDK, a streaming analytic toolkit for building intelligent video analytic applications.

Training a Mask R-CNN model using COCO: Mask R-CNN is a two-stage object detection and segmentation model introduced in 2017.

Conclusion: In this post, you learned about training instance segmentation models using the Mask R-CNN architecture with the TLT.

7 hours ago @ developer.nvidia.com
Building Intelligent Video Analytics Apps Using NVIDIA DeepStream 5.0 (Updated for GA)

The DeepStream application can run on an edge device powered by NVIDIA Jetson or on-premises servers powered by NVIDIA T4s.

DeepStream 5.0 features: With DeepStream 5.0, NVIDIA has made it easier than ever to get started building and deploying AI-based IVA apps on the edge.

After inference, Triton Server returns the output tensors back to the shared library, where they are post-processed to generate the metadata.

For cloud-to-edge messaging, the supported protocol in DeepStream 5.0 is Kafka using the new low-level msgbroker library, interacting directly with the DeepStream application.

[osd]
enable=1
gpu-id=0
border-width=3
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
f…

7 hours ago @ developer.nvidia.com
Teen’s Gambit: 15-Year-Old Chess Master Puts Blundering Laptop in Check with Jetson Platform

Chess engines like Leela Chess Zero — Zhu’s go-to practice partner, which recently beat all others at the 17th season of the Top Chess Engine Championship — use artificial neural network algorithms to mimic the human brain and make moves.

Zhu turned to the NVIDIA Jetson Xavier NX module to solve the issue.

She also noted that doing the same with the NVIDIA Jetson AGX Xavier module doubled the speed at which the engine analyzed chess positions.

“It was so memorable.”Besides chess, Zhu has a passion for computer science and hopes to study it in college.

Find out more about Zhu’s chess and tech endeavors.

1 day, 5 hours ago @ blogs.nvidia.com
Non-Stop Shopping: Startup’s AI Lets Supermarkets Skip the Line

The journey starts with the sort of shopping anyone who’s waited in a long checkout line has longed for.

In its daily operations, the system uses those models to run millions of inference tasks with help from NVIDIA TensorRT software.

A supermarket outside London testing the Trigo system uses servers in its back room with 40-50 NVIDIA RTX GPUs.

“They said Trigo was the future of retail,” Gorovici said.

Like sailing in the aqua-blue Mediterranean, AI in retail is a compelling opportunity.

4 days, 5 hours ago @ blogs.nvidia.com
Taking the Heat Off: AI Temperature Screening Aids Businesses Amid Pandemic

Central California-based IntelliSite Corp. and its recently acquired startup, Deep Vision AI, have developed a temperature screening application that can scan over 100 people a minute.

Temperature readings are accurate within a tenth of a degree Celsius.

“Our software platform has multiple AI modules, including foot traffic counting and occupancy monitoring, as well as vehicle recognition,” said Agustin Caverzasi, co-founder of Deep Vision AI, and now president of IntelliSite’s AI business unit.

“Deep Vision AI joined Inception at the very beginning, and our engineering and research teams received support with resources like GPUs for training,” Caverzasi said.

Deep Vision and IntelliSite ne…

4 days, 5 hours ago @ blogs.nvidia.com
HPE’s Jared Dame on How AI, Data Science Driving Demand for Powerful New Workstations

Smart phones, smart devices, the cloud — if it seems like AI is everywhere, that’s because it is.

That makes the powerful workstations able to crunch the ever-growing quantities of data on which modern AI is built more essential than ever.

Jared Dame, Hewlett Packard Enterprise’s director of business development and strategy for AI, data science and edge technologies, spoke to AI Podcast host Noah Kravitz about the role HPE’s workstations play in cutting-edge AI and data science.

In the AI pipeline, Dame explained, workstations can do just about everything — from training to inference.

Every vertical market does data science, every vertical market is adopting various types of AI.” — Jared D…

4 days, 23 hours ago @ blogs.nvidia.com
It’s Not Pocket Science: Undergrads at Hackathon Create App to Evaluate At-Home Physical Therapy Exercises

Together, they created PocketPT, an app that lets users know whether they’re completing a physical therapy exercise with the correct posture and form.

The app’s AI model uses the NVIDIA Jetson Nano developer kit to detect a user doing the tree pose, a position known to increase shoulder muscle strength and improve balance.

The Jetson Nano performs image classification so the model can tell whether the pose is being done correctly based on 100+ images it was trained on, which the team took of themselves.

Continuing exercises at home is a crucial part of recovery for physical therapy patients, but doing them incorrectly can actually hinder progress, she explained.

It’s definitely worth our ti…

4 days, 23 hours ago @ blogs.nvidia.com
Building AI Infrastructure with NVIDIA DGX A100 for Autonomous Vehicles

NVIDIA has introduced NVIDIA DGX A100, which is built on the brand new NVIDIA A100 Tensor Core GPU.

DGX A100 is the third generation of DGX systems and is the universal system for AI infrastructure.

Featuring five petaFLOPS of AI performance, DGX A100 excels on all AI workloads: analytics, training, and inference.

This unmatched flexibility reduces costs, increases scalability, and makes DGX A100 the foundational building block of the modern AI data center.

In this post, I redefine the computational needs for AV infrastructure with DGX A100 systems.

5 days, 19 hours ago @ developer.nvidia.com
Validating Distributed Multi-Node Autonomous Vehicle AI Training with NVIDIA DGX Systems on OpenShift with DXC Robotic Drive

In this post, we validate DGX multi-node, multi-GPU, distributed training running on RedHat OpenShift in the DXC Robotic Drive environment.

Such frameworks also support data parallel training using MPI natively and can trigger the workload using MPI tools, such as mpirun or mpiexec.

Figure 1 shows a DL workload using two DGX-1 systems.

In the following examples, we show you how to trigger an MPI-based, DL workload using the Horovod framework.

The Robotic Drive containerized compute platform on OpenShift orchestrates DL workloads at scale, including multi-GPU, multi-node jobs using NVIDIA DGX systems.
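
At its core, MPI-based data-parallel training averages gradients across workers at each step (what Horovod's allreduce does). A minimal pure-Python sketch of that reduction:

```python
# Average per-worker gradient vectors elementwise, as an allreduce would.
def allreduce_average(per_worker_grads):
    n = len(per_worker_grads)
    return [sum(component) / n for component in zip(*per_worker_grads)]

# Two workers, each holding gradients for two parameters:
print(allreduce_average([[1.0, 2.0], [3.0, 4.0]]))  # → [2.0, 3.0]
```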

5 days, 19 hours ago @ developer.nvidia.com
NVIDIA Breaks 16 AI Performance Records in Latest MLPerf Benchmarks

NVIDIA delivers the world’s fastest AI training performance among commercially available products, according to MLPerf benchmarks released today.

NVIDIA set six records in the first MLPerf training benchmarks in December 2018 and eight in July 2019.

Today’s NVIDIA A100 GPUs — coupled with software updates for CUDA-X libraries — power expanding clusters built with Mellanox HDR 200Gb/s InfiniBand networking.

Selene recently debuted on the TOP500 list as the fastest industrial system in the U.S., with more than an exaflop of AI performance.

These systems are all up and running thanks in part to a broad ecosystem supporting NVIDIA GPUs and DGX systems.

6 days, 3 hours ago @ blogs.nvidia.com
Accelerating AI Training with MLPerf Containers and Models from NVIDIA NGC

MLPerf Training v0.7 is the third round of the training benchmark and continues to evolve to stay on the cutting edge.

Framework: PyTorch in NGC PyTorch containers.

You can obtain the source code and pretrained models for all these models from the NGC resources page and NGC models page, respectively.

We provide scripts for training models end-to-end and to validate the trained models.

SummaryThe NVIDIA NGC containers and AI models provide proven vehicles for quickly developing and deploying AI applications.

6 days, 3 hours ago @ developer.nvidia.com
Optimizing NVIDIA AI Performance for MLPerf v0.7 Training

NVIDIA MLPerf AI Records.

To replicate NVIDIA MLPerf performance level on your workloads, see Accelerating AI Training with MLPerf Containers and Models from NVIDIA NGC.

As training scales, if every GPU does the full optimizer work, the optimizer can dominate the overall training time.

MLPerf v0.7 submission information: per-chip performance arrived at by comparing performance at the same scale when possible.

6 days, 3 hours ago @ developer.nvidia.com
Apple Machine Learning Journal
last post: none
Uber Engineering
last post 1 month ago
Fiber: Distributed Computing for AI Made Simple

Instead of programming only a single desktop or laptop, users can leverage this system to program the whole computer cluster.

Fiber allows users to write programs that run on a computer cluster without needing to dive into the details of the computer cluster.

This overall architecture is summarized in Figure 2, below. Job-backed processes: Fiber introduces a new concept called job-backed processes (also called Fiber processes).

When starting a new Fiber process, Fiber creates a new job with the proper Fiber back end on the current computer cluster.

Our hypothesis was that Fiber should perform similarly to multiprocessing because neither Fiber nor multiprocessing rely on complex scheduling me…

1 month ago @ eng.uber.com
Introducing Neuropod, Uber ATG’s Open Source Deep Learning Inference Engine

Unfortunately, adding support for a new deep learning framework across an entire machine learning stack is resource and time-intensive.

Using multiple deep learning frameworks: Deep learning (DL) is advancing very quickly, and different DL frameworks are effective at different tasks.

Over the last year, we have deployed hundreds of Neuropod models across Uber ATG, Uber AI, and the core Uber business.

Deep learning with Neuropod: Let's take a look at the overall deep learning process when using Neuropod to see how it helps make experimentation, deployment, and iteration easier.

Next steps: Neuropod has allowed Uber to quickly build and deploy new deep learning models, but that's just the start.

1 month, 3 weeks ago @ eng.uber.com
Inside Uber ATG’s Data Mining Operation: Identifying Real Road Scenarios at Scale for Machine Learning

The “spikes” at intersections result from the SDV crossing the same intersection multiple times as part of a “grid-coverage” driving pattern.

Data mining the "pedestrian crossing the street" scenario: While the SDV perception system is designed to detect pedestrians, only a subset of pedestrians actually cross the street.

Analyzing the "pedestrian crossing the street" scenario: The scenario of a pedestrian crossing the street has many relevant measurements, including the pedestrian's crossing speed, road width, distance walked, crossing duration, distance walked on the crosswalk, and traffic light state(s) at the time of crossing.
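
As a sketch of how such measurements support scenario mining, the snippet below filters hypothetical logged events by derived quantities like crossing speed. The event fields, values, and thresholds are illustrative, not ATG's actual schema:

```python
# Hypothetical logged pedestrian-crossing events with a few of the
# measurements listed above (all field names and values are made up).
events = [
    {"id": 1, "road_width_m": 7.0,  "duration_s": 5.0,  "on_crosswalk_m": 7.0},
    {"id": 2, "road_width_m": 14.0, "duration_s": 20.0, "on_crosswalk_m": 0.0},
    {"id": 3, "road_width_m": 10.0, "duration_s": 8.0,  "on_crosswalk_m": 4.0},
]

def crossing_speed(event):
    # Average crossing speed in m/s, derived from road width and duration.
    return event["road_width_m"] / event["duration_s"]

# Mine two sub-scenarios: crossings entirely outside a crosswalk,
# and unusually slow crossings (below 1 m/s).
outside_crosswalk = [e["id"] for e in events if e["on_crosswalk_m"] == 0.0]
slow_crossings = [e["id"] for e in events if crossing_speed(e) < 1.0]
```

At scale, the same per-event derivations would run over fleet logs, with the thresholds defining which scenario bucket each event lands in.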

Let’s start by analyzing just one measurement: the pedestrian crossing…

2 months ago @ eng.uber.com
Meta-Graph: Few-Shot Link Prediction Using Meta-Learning

For instance, in a social network we may use link prediction to power a friendship recommendation system, or in the case of biological network data, we might use link prediction to infer possible relationships between drugs, proteins, and diseases.

In principle, it can be combined with a wide variety of link prediction approaches based on GNNs, but we adopted a specific GNN, variational graph autoencoders (VGAEs), as our base link prediction framework9.

Experiment setup: To test how Meta-Graph might work in a real-world setting, we designed three novel benchmarks for few-shot link prediction.

In this few-shot link prediction setting, there are train/val/test splits at both the edge level and …

2 months, 1 week ago @ eng.uber.com
Announcing a New Framework for Designing Optimal Experiments with Pyro

We’ll treat working memory capacity as the length of the longest list of random digits that the participant can memorize.

Inference: We use Bayesian inference to incorporate our new observation into an estimate of the participant's working memory capacity.

It models the probability of correctly remembering lists of digits of different lengths for people with different working memory capacities, as shown in Figure 1, below. We also need a sense of what working memory capacities are plausible.
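
That inference step can be sketched with a discrete uniform prior over capacities and an assumed logistic response model (this is a minimal illustration, not the Pyro model from the paper; the slope parameter and capacity range are made up):

```python
import math

def recall_prob(list_length, capacity, slope=1.5):
    # Probability of correctly recalling a digit list of a given length,
    # modeled as a logistic curve centered at the participant's capacity.
    return 1.0 / (1.0 + math.exp(slope * (list_length - capacity)))

def bayes_update(prior, list_length, correct):
    # prior: dict mapping candidate capacity -> probability.
    posterior = {}
    for cap, p in prior.items():
        like = recall_prob(list_length, cap)
        posterior[cap] = p * (like if correct else 1.0 - like)
    total = sum(posterior.values())
    return {cap: p / total for cap, p in posterior.items()}

# Uniform prior over plausible capacities 3..10.
prior = {c: 1.0 / 8 for c in range(3, 11)}
# Participant recalls a 7-digit list but fails a 9-digit list.
post = bayes_update(prior, 7, True)
post = bayes_update(post, 9, False)
best = max(post, key=post.get)
```

After these two observations the posterior concentrates around a capacity of 8 digits; each subsequent, optimally designed list length would sharpen it further.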

Computing the optimal design: Our score for experimental designs, the expected information gain (EIG), is notoriously difficult to estimate.

In our paper, we showed that this method can be remarkably accurate on a range of different exp…

2 months, 3 weeks ago @ eng.uber.com
Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

Last year we introduced the Paired Open-Ended Trailblazer (POET) to explore the idea of open-ended algorithms.

ANNECS: a new way to measure progress in open-ended systems. Quantifying the performance of open-ended algorithms has remained elusive for the field.

Compare those from Original POET in Figure 4a to those produced by Enhanced POET in Figure 4b, below.

If this piques your interest, be sure to check out videos of example Enhanced POET agents on the Uber AI YouTube channel.

Towards that end, we are not only releasing a paper with full technical details but have also open-sourced the code for Enhanced POET.

3 months ago @ eng.uber.com
Under the Hood of Uber ATG’s Machine Learning Infrastructure and Versioning Control Platform for Self-Driving Vehicles

A trained model requires as input the data set artifact, the model training code, and configuration files governing model training.

Example sequence of events: registering a new data set. When a user registers a new data set, the VerCD data set service stores the dependency metadata in our database.

Data set service API: The data set service is responsible for tracking the dependencies for building a given data set.

The REST API supports the functions of creating a new data set, reading the metadata for a data set, updating the metadata of a data set, deleting a data set, and getting the artifact locations of the data set (such as in S3 or HDFS).
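
The CRUD surface described above can be sketched as a minimal in-memory registry. Class, method, and field names here are illustrative, not VerCD's actual API or schema:

```python
class DatasetRegistry:
    """Minimal in-memory sketch of the data set service's CRUD surface."""

    def __init__(self):
        self._datasets = {}

    def create(self, name, metadata, artifact_locations):
        # Register a new data set with its metadata and artifact locations.
        if name in self._datasets:
            raise ValueError(f"dataset {name!r} already exists")
        self._datasets[name] = {"metadata": dict(metadata),
                                "artifacts": list(artifact_locations)}

    def read_metadata(self, name):
        return self._datasets[name]["metadata"]

    def update_metadata(self, name, **changes):
        self._datasets[name]["metadata"].update(changes)

    def delete(self, name):
        del self._datasets[name]

    def artifact_locations(self, name):
        # e.g., S3 or HDFS paths where the built artifacts live.
        return self._datasets[name]["artifacts"]

registry = DatasetRegistry()
registry.create("lidar-sweeps-v1",
                {"version": 1, "source": "github.com/example/datasets"},
                ["s3://bucket/lidar-sweeps-v1"])
registry.update_metadata("lidar-sweeps-v1", version=2)
```

In the real service each of these methods would map to a REST endpoint, with the registry state backed by the metadata database rather than a dict.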

For instance, the VerCD data set serv…

5 months ago @ eng.uber.com
Building a Backtesting Service to Measure Model Performance at Uber-scale

To better assess the performance of our models, we built a backtesting service for measuring forecast model error rates.

The backtesting service runs in a distributed system, allowing multiple models (>10), many backtesting windows (>20), and models for different cities (>200) to run simultaneously.

Backtesting at scale: Our data science teams regularly create forecast models and statistics to better understand budget spending and project financial performance.

For the purposes of our backtesting service, we chose to leverage two primary backtesting data split mechanisms: backtesting with an expanding window and backtesting with a sliding window. Above, we showcase three windows for each metho…
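
The two split mechanisms can be sketched as index generators over a time series (a minimal illustration, not the service's actual implementation):

```python
def expanding_windows(n_points, initial_train, horizon):
    # Expanding window: the training start stays fixed and the window grows
    # by one forecast horizon per split.
    splits = []
    train_end = initial_train
    while train_end + horizon <= n_points:
        splits.append((range(0, train_end),
                       range(train_end, train_end + horizon)))
        train_end += horizon
    return splits

def sliding_windows(n_points, train_size, horizon):
    # Sliding window: a fixed-size training window shifts forward
    # by one forecast horizon per split.
    splits = []
    start = 0
    while start + train_size + horizon <= n_points:
        splits.append((range(start, start + train_size),
                       range(start + train_size, start + train_size + horizon)))
        start += horizon
    return splits

exp_splits = expanding_windows(10, initial_train=4, horizon=2)
slide_splits = sliding_windows(10, train_size=4, horizon=2)
```

Each split yields a (train, test) pair of index ranges; the model is refit on the train range and its forecast error measured on the held-out horizon.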

5 months, 3 weeks ago @ eng.uber.com
Uber AI in 2019: Advancing Mobility with Artificial Intelligence

At the forefront of this effort is Uber AI, Uber’s center for advanced artificial intelligence research and platforms.

In this year alone, AI research at Uber has led to significant improvements in demand prediction and more seamless pick-up experiences.

Fostering AI collaboration through open source: In 2019, Uber AI was committed to sharing knowledge and best practices with the broader scientific community through open source projects.

Looking towards 2020: Next year, Uber AI will continue to innovate, collaborate, and contribute to Uber's platform services through the application of AI across our business.

For more on Uber AI, be sure to check out related articles on the Uber Engineering Blo…

7 months, 2 weeks ago @ eng.uber.com
Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data

We in Uber AI Labs investigated the intriguing question of whether we can create learning algorithms that automatically generate training data, learning environments, and curricula to help AI agents rapidly learn.

Increasingly, neural architecture search (NAS) algorithms are being deployed to automate the search for architectures, with great results.

New learners are able to learn from synthetic data faster than from real data (red line vs. blue line in Figure 1).

In our experiments, the estimates come either from training for 128 SGD steps on GTN-generated data or real data.

Then, for each method, the final best architecture according to the estimate is trained a long time on real data.

7 months, 2 weeks ago @ eng.uber.com
Controlling Text Generation with Plug and Play Language Models

This article discusses an alternative approach to controlled text generation, titled the Plug and Play Language Model (PPLM), introduced in a recent paper from Uber AI.

In many ways, language models are like wise but unguided wooly mammoths that lumber wherever they please.

As we will show below, attribute models with only a single layer containing 4,000 parameters perform well at recognizing attributes and guiding generation.
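
To see why such an attribute model is so cheap, note that a single linear layer mapping a pooled hidden state to class scores needs only hidden_size × num_classes weights. The sketch below assumes a 2,000-dimensional representation and two classes purely so the count comes out to 4,000; it is an illustration of the parameter budget, not PPLM's actual architecture:

```python
import random

HIDDEN = 2000   # assumed pooled-representation size (illustrative)
CLASSES = 2     # e.g., two sentiment attributes

random.seed(0)
# One weight matrix is the entire attribute model: 2000 * 2 = 4,000 parameters.
W = [[random.gauss(0.0, 0.02) for _ in range(HIDDEN)] for _ in range(CLASSES)]

def attribute_logits(hidden_state):
    # Score a mean-pooled LM hidden state against each attribute class.
    return [sum(w * h for w, h in zip(row, hidden_state)) for row in W]

n_params = HIDDEN * CLASSES
logits = attribute_logits([0.1] * HIDDEN)
```

A model this small is trivial to train on top of a frozen language model, which is what makes the plug-and-play approach practical.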

Thus, we use the unmodified language model to ensure the fluency of language is maintained at or near the level of the original language model (in this example, GPT-2-medium).

Multiple attribute models: We may combine multiple attribute models in controlled generation, …

8 months ago @ eng.uber.com
Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations

To this end, we previously developed ML models to better understand queries and to perform multi-objective optimization in the Uber Eats search and recommender system, improving the food options surfaced to eaters.

Graph learning in a nutshell: To best understand how we made our Uber Eats recommendations more accurate, it helps to know the basics of how graph learning works.

For example, to represent an eater in our Uber Eats model we don’t only use order history to inform order suggestions, but also information about what food items are connected to past Uber Eats orders and insights about similar users.
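
A minimal illustration of that idea is one-hop mean aggregation over a toy eater-dish graph, which is the core building block GNNs stack into deeper embeddings (node names and features here are made up):

```python
def aggregate(node, features, neighbors):
    # One-hop mean aggregation: combine a node's own features with the
    # average of its neighbors' features.
    own = features[node]
    if not neighbors.get(node):
        return own
    nbr = [features[n] for n in neighbors[node]]
    mean = [sum(col) / len(nbr) for col in zip(*nbr)]
    return [o + m for o, m in zip(own, mean)]

# Toy graph: an eater connected to two dishes from past orders.
features = {"eater": [1.0, 0.0], "dish_a": [0.0, 2.0], "dish_b": [0.0, 4.0]}
neighbors = {"eater": ["dish_a", "dish_b"]}
emb = aggregate("eater", features, neighbors)
```

Stacking this step lets an eater's representation absorb signal from similar users and related food items several hops away, which is what makes the graph-based recommendations richer than order history alone.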

For our Uber Eats use case, we opted for a graph neural network (GNN)-based approach to obtain an …

8 months ago @ eng.uber.com
Uber Goes to NeurIPS 2019

This year, Uber is presenting 11 papers at the NeurIPS 2019 conference in Vancouver, Canada!

Scalable Global Optimization via Local Bayesian Optimization
David Eriksson (Uber AI) · Michael Pearce (Uber AI intern / Warwick University) · Jacob Gardner (Uber AI) · Ryan Turner (Uber AI) · Matthias Poloczek (Uber AI)
ArXiv
December 10 at 4:25 pm, West Ballroom C, NeurIPS Spotlight Talk
December 10 at 5:30 pm, East Exhibition Hall B&C, Poster #9
Bayesian optimization (BO) has recently emerged as a successful technique for the global optimization of black-box functions.

For additional information about our talks and posters, check out the Uber NeurIPS 2019 site.

Interested in the ML research that Uber …

8 months ago @ eng.uber.com
Announcing the 2020 Uber AI Residency

On behalf of Uber, we invite you to join us on our journey as an Uber AI Resident.

Established in 2018, the Uber AI Residency is a 12-month training program for recent college and master’s graduates, professionals who are looking to reinforce their AI skills, and those with quantitative skills and interest in becoming an AI researcher at Uber.

This year’s AI residency program will focus on our self-driving cars project through Uber Advanced Technology Group (ATG).

Open source & publication opportunities: Across Uber, we are committed to an open and inclusive research mission that benefits the community at large through both Uber AI and Uber ATG Research.

Learn more about the Uber AI Residency…

8 months, 1 week ago @ eng.uber.com
Get to Know Uber ATG at ICCV, CoRL, and IROS 2019

We hope our approach to sharing will deepen the interactions and collaborations between industry and academia, and will ultimately bring self-driving research communities together.

This year, Uber ATG has five publications accepted at ICCV, two publications accepted at CoRL, and two publications accepted at IROS.

In addition, Raquel Urtasun, Uber ATG Chief Scientist and Head of Uber ATG R&D, will be giving four talks at