Very ML
State-of-the-art Machine Learning News Feed
/r/MachineLearning
last post 4 hours ago
[D] Best practices and platforms to train an image classifier?

4 hours ago @ reddit.com
[P] This Hockey Player Does Not Exist

5 hours ago @ reddit.com
[D] Recommended approach or algorithm for a fill-in-the-blanks algorithm for noisy text

6 hours ago @ reddit.com
[N] TAO, a new large-scale multi-object tracking benchmark now released! ECCV '20 TAO Challenge announcement.

7 hours ago @ reddit.com
[R] DeepMind Explores Deep RL for Brain and Behaviour Research

8 hours ago @ reddit.com
[D] How does the L1 loss help in pix2pix?

9 hours ago @ reddit.com
[D] Does anyone have any literature on cropping objects in VOC images?

10 hours ago @ reddit.com
[D] Linear Layers vs Global Average Pooling for Regression

10 hours ago @ reddit.com
[P] Tensorflow to PyTorch - model.predict equivalent

11 hours ago @ reddit.com
[R] Problem with Wordnet Hypernyms

11 hours ago @ reddit.com
[N][D] Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy

12 hours ago @ reddit.com
[P] tf-sprinkles 1.1 released. Fast and efficient sprinkles augmentation for image classification tasks.

13 hours ago @ reddit.com
[P] Interactive visual tour of probability distributions

14 hours ago @ reddit.com
[D] Machine Learning Toolbox

14 hours ago @ reddit.com
[P] Summarizing pre-recorded BioNLP talks from ACL2020.

14 hours ago @ reddit.com
Towards Data Science
last post 2 hours ago
Removing ‘The Wall’ in ML Ops

More and more teams are seeking to adopt an approach called ML Ops.

Getting its name from the DevOps movement that emerged a decade ago, ML Ops attempts to provide a continual improvement process to Machine Learning.

The case for ML Ops: In the field of Machine Learning, where currently only about 10% of projects actually deliver business value, there is clearly a need for a similar revolution.

More importantly, we would like Machine Learning to be able to drive business value, by adapting to the needs quickly, and to catch issues quickly.

Data: Data is by far the most important asset whic…

2 hours ago @ towardsdatascience.com
Interesting AI/ML Articles You Should Read This Week (July 11)

His path details his two-year journey in which he learnt Python, Deep Learning and many more related subject areas.

Miguel includes a reference of what deep learning courses proved the most effective for him.

He also covers how, in retrospect, he would have approached his Deep Learning studies, which you, the reader, can adopt now and probably reach a level of deep learning proficiency in a shorter timeframe.

The “Competing in Kaggle” section of Miguel’s article is probably the reason why most people would read the article.

The article ends with a nudge to the benefit of attaining knowledge in Deep Learning, even if you don’t choose it as a career choice.

2 hours ago @ towardsdatascience.com
ANOVA + Tukey Test In Python

Alpha Value: The alpha value is the probability of rejecting the null hypothesis when the null hypothesis is true.

So a Tukey Test allows us to interpret the statistical significance of our ANOVA test and find out which specific groups’ means (compared with each other) are different.

One-Way ANOVA + Tukey Test. Hypothesis Test 1: Keywords. Question: Are there any differences between athleisure-related keywords when considering search volume?

Tukey Test Result: No need to run the Tukey multiple comparisons test since we failed to reject the null hypothesis here.

Code: Below are code snippet examples for how to perform the One-Way ANOVA and Tukey Test in Python.
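
The article's own snippets are not reproduced in this feed, so here is a minimal sketch of the two steps with SciPy; the group names and search-volume numbers below are illustrative stand-ins, not the article's data.

```python
# Hypothetical search-volume samples for three keyword groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(100, 10, 30)  # e.g. "athleisure"
group_b = rng.normal(100, 10, 30)  # e.g. "leggings"
group_c = rng.normal(130, 10, 30)  # e.g. "yoga pants" (shifted mean)

# One-Way ANOVA: is at least one group mean different?
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Only if we reject H0 does a Tukey test tell us *which* pairs differ
# (statsmodels' pairwise_tukeyhsd is a common choice for that step).
alpha = 0.05
if p_value < alpha:
    print("Reject H0: run Tukey's multiple comparisons test.")
else:
    print("Fail to reject H0: no Tukey test needed.")
```

With one group mean shifted well away from the others, the ANOVA rejects and the Tukey follow-up is warranted.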

2 hours ago @ towardsdatascience.com
Graphical Functions made from an effortless sketch

Purpose. An existing application: MyCurveFit. There are existing applications that take in X and Y coordinates to return the exact equation.

Pipeline of my applicationWhen the user draws on the canvas provided, the X and Y coordinates of points on the curve are extracted.

(Width x Height x Channels.) From the array representation of the image, the X and Y coordinates are calculated and stored in X and Y arrays respectively.

# Function: y = 1 + 2*x + 0*x^2
As we can see, this can be easily extended to other functions such as PolyLogarithmic.

# Function: y = 1*sin(1*x + 0) + 0
This works similarly for other functions such as Exponential and Hyperbolic Sine.
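
The coefficient-recovery step described above can be sketched with NumPy's least-squares polynomial fit; the X and Y arrays below stand in for points extracted from a drawn curve (the data and names are illustrative, not the app's actual code).

```python
import numpy as np

# Stand-in for points extracted from the user's sketch:
# sampled from the line y = 1 + 2*x.
X = np.linspace(-5, 5, 50)
Y = 1 + 2 * X

# Fit a degree-2 polynomial; coefficients come back highest power first,
# so a drawn straight line should recover c2 ~ 0, c1 ~ 2, c0 ~ 1.
c2, c1, c0 = np.polyfit(X, Y, deg=2)
print(f"y = {c0:.2f} + {c1:.2f}*x + {c2:.2f}*x^2")
```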

2 hours ago @ towardsdatascience.com
Supercharging customer touchpoints with uplift modeling

Then I’ll show a simple way to build an uplift model and demonstrate a few uplift model evaluation metrics, using synthetic data in Python.

Data for uplift modeling: experiments are key. Now that we know the goal of uplift modeling, how do we get there?

In the context of uplift modeling, one could use the uplift model evaluation metrics introduced below on a validation set as a way to select features, for example by recursive feature elimination.

The gain curve calculation is available as part of the CausalML package, where it is called the uplift curve (CausalML).

With this goal, a variety of modeling techniques have been developed; uplift modeling continues to receive active research intere…
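
As a concrete, simplified illustration of the approach described above, here is a "two-model" uplift sketch on synthetic data; this is not the article's code, and CausalML provides richer estimators plus the gain/uplift curve mentioned.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 4000
x = rng.normal(size=(n, 3))
treated = rng.integers(0, 2, n)  # randomized treatment assignment

# Synthetic outcome: baseline driven by x[:, 0], plus a treatment
# effect that only helps users with x[:, 1] > 0.
logit = x[:, 0] + treated * (x[:, 1] > 0)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Two-model approach: fit one response model per arm, then score
# uplift as the difference between treated and control predictions.
m_t = LogisticRegression().fit(x[treated == 1], y[treated == 1])
m_c = LogisticRegression().fit(x[treated == 0], y[treated == 0])
uplift = m_t.predict_proba(x)[:, 1] - m_c.predict_proba(x)[:, 1]

# Users who actually benefit (x[:, 1] > 0) should score higher on average.
print(uplift[x[:, 1] > 0].mean(), uplift[x[:, 1] <= 0].mean())
```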

2 hours ago @ towardsdatascience.com
More than just pretty graphs

The importance of great data visualization: All too often, outsiders think that the job of data analysts and scientists is just number-crunching.

Even seasoned data scientists often relegate data visualization to the exploratory stages of data analysis.

In this post, I highlight a few specific instances where data visualization is crucial for helping data scientists and data consumers better understand datasets.

These examples are by no means comprehensive but they can help structure your thinking and understanding about the value of data visualization.

Conversely, as data communicators (data scientists, analysts, journalists), we must strive to make data visualiza…

2 hours ago @ towardsdatascience.com
Annotate data & Train AI for COVID-19 detection in chest CT using NVIDIA Clara on TrainingData.io

Automatic detection of COVID-19 infection in chest CT using NVIDIA Clara on TrainingData.io. By Gaurav, Jul 11 · 4 min read. In March 2020, to help data scientists working on COVID-19 diagnostic tools, TrainingData.io provided a free collaborative workspace preloaded with the open-source dataset including chest X-ray and chest CT images.

COVID-19 Infection visualized in the Chest CT of a patient.

Chest X-ray and chest CT imaging are being used to observe the progression of the disease through the lungs of a patient diagnosed with COVID-19.

Visualizing GGOs/consolidation patterns in CT imaging plays an important role in helping medical staff to make proper decisions.

Semi-automatic AI-assiste…

2 hours ago @ towardsdatascience.com
Tips, Tricks, & Techniques to Take Your Data Wrangling Skills to the Next Level

< Writing Parser Functions /> An essential part of data cleaning is writing functions that parse a certain buggy column.

The best solution in this case is to simply select the first character from the data type.

We will need to first create a pandas DataFrame with columns and indices based on the unique values in our list-form data.

Within each iteration of the looping mechanism, create a logical mapping of information between that value and the template.

Many data manipulation/cleaning problems can be solved in a simple fashion by utilizing data projection.

3 hours ago @ towardsdatascience.com
What is correlation?

For example, “when X is higher, Y tends to be higher” (this is called positive correlation) or “when X is higher, Y tends to be lower” (this is called negative correlation).

When most people hear the word correlation, they tend to think of perfect linear correlation: taking a horizontal step (X) to the right on the hill above gets you the same change in altitude (Y) everywhere on the same slope.

As long as you’re going up from left to right (positive correlation), there are no surprise jagged/curved bits.

X <- seq(-1, 1, 0.01) # Go from -1 to 1 in increments of 0.01
Y <- -X^2 # Secret formula for the ideal hill
plot(X, Y, main = "The linear correlation is zero")
print(cor(X, Y)) # Check the co…

6 hours ago @ towardsdatascience.com
Drug Discovery with Deep Learning Under 10 Lines of Codes.

The input drug and target arrays should be paired, i.e.

Encoder specification: After we obtain the required data format, we need to first specify the encoder to use for drug and protein.

Model initialization: Next, we initialize a model using the above configuration. Model training: Now it is ready to train by simply calling the model.train function!

A ranked list of drug candidates is automatically generated and printed out.

You can now train a state-of-the-art deep learning model for the drug-target interaction prediction task 👏!

8 hours ago @ towardsdatascience.com
Build A Complete Neural Network From Scratch in Python

Ideas and formulas to build a neural network, with a complete step-by-step implementation in Python. The neural network was developed to mimic the human brain.

Ideas of Neural Network: In a simple neural network, neurons are the basic computation units.

You have to design your neural network based on your dataset and accuracy requirements.

The shape of theta1: size of layer1 x size of layer2. The shape of theta2: size of layer2 x size of layer3. From step 2, the shape of ‘df’ is 5000 x 400.
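
The shape bookkeeping above can be checked with a short NumPy sketch; the 5000 x 400 input comes from the excerpt, while the hidden size of 25 and output size of 10 are assumed here purely for illustration.

```python
import numpy as np

df = np.zeros((5000, 400))         # 5000 examples x 400 features (from the article)
theta1 = np.random.randn(400, 25)  # size of layer1 x size of layer2 (25 assumed)
theta2 = np.random.randn(25, 10)   # size of layer2 x size of layer3 (10 assumed)

sigmoid = lambda z: 1 / (1 + np.exp(-z))
hidden = sigmoid(df @ theta1)      # (5000, 25)
output = sigmoid(hidden @ theta2)  # (5000, 10): one score per class per example
print(output.shape)
```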

You just developed a complete neural network!

8 hours ago @ towardsdatascience.com
Analysis: Winning at @ratemyskyperoom

Instead, I’ve analyzed 1,321 Tweets to answer a question many of us pandemic-bound remote-workers have wondered since Zoom became part of our daily lives: Do people like my room?!

Instead, here in the real world, the closest thing we’ve got is Room Rater (@ratemyskyperoom).

As more and more (famous) people are revealing their homes via the laptop lens, Room Rater has stepped up to judge them, publicly and quantitatively.

To find out, I pulled down all of @ratemyskyperoom’s 1,321 room rating tweets from May 2020 to July 2020, parsed out the ratings, then looked at the content of both the images and text for each of their tweets.

(The interactive version of this post is here: Room Rating St…

8 hours ago @ towardsdatascience.com
A Guide to Text Annotation — the Key to Understanding Language

Named Entity Recognition, Sentiment Analysis, and More: In an increasingly data-driven digital world, businesses must utilize the massive quantities of data users provide on their platforms to distinguish themselves from their competitors.

For instance, consider named entity tagging, a common task in text annotation, in which entities (nouns) in unstructured text are identified and assigned a class.

Often, named entity tagging is extremely complicated, due to competing interests of scope and definition.

Intelligent models that can perform NER successfully — named entity recognition, automated named entity tagging — rely on initiall…

9 hours ago @ towardsdatascience.com
How to Create a Control Chart in Power BI

In this article, I will show you how to build a dynamic control chart in Power BI that helps your team pinpoint any outliers or processes out of control.

Control Chart (image from r-bar.net). First things first, for those who don’t know what exactly a control chart is.

This is a control chart I created in Power BI (image by author). 🚩 Let’s get started by building that Control Chart in Power BI. In order to achieve the main features of a control chart, we start by creating some DAX measures. For the Average calculation, you will simply be using the AVERAGE function.

Lower Control Limit (LCL) = AVERAGE CASES - [STDEV]*3
Create a line chart with the …
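
Outside Power BI, the same control limits those DAX measures produce can be sketched in a few lines of Python; the case counts below are made up for illustration.

```python
import numpy as np

cases = np.array([12, 14, 11, 13, 15, 12, 13, 14, 12, 13], dtype=float)
avg = cases.mean()
std = cases.std(ddof=1)  # sample standard deviation
ucl = avg + 3 * std      # Upper Control Limit
lcl = avg - 3 * std      # Lower Control Limit
print(f"avg={avg:.2f}, LCL={lcl:.2f}, UCL={ucl:.2f}")
# Any new observation falling outside [LCL, UCL] flags a process out of control.
```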

9 hours ago @ towardsdatascience.com
The Alternative to Web Scraping

This makes it a prime target for web scraping by finance enthusiasts.

There are nearly daily questions on StackOverflow that reference some sort of data retrieval (oftentimes through web scraping) from Yahoo Finance.

Web Scraping Problem #1: The OP is trying to find the current price for a specific stock, Facebook.

Their code is below, and it produced the following output: "the current price: 216.08". It’s a pretty simple problem with an equally simple web scraping solution.

Let’s check:
web scraping #1 min time is 0.5678426799999997
lazy #1 min time is 0.11238783999999953
web scraping #2 min time is 0.3731000199999997
lazy #2 min time is 0.0864451399999993
The lazy alternatives are 4x to 5x faster …

9 hours ago @ towardsdatascience.com
Distill.pub
last post 3 weeks, 2 days ago
Curve Detectors

Part one in a three-part deep exploration into a neuron family.

3 weeks, 2 days ago @ distill.pub
Exploring Bayesian Optimization

How to tune hyperparameters for your machine learning model using Bayesian optimization.

2 months ago @ distill.pub
An Overview of Early Vision in InceptionV1

An overview of all the neurons in the first five layers of InceptionV1, organized into a taxonomy of 'neuron groups.'

3 months, 1 week ago @ distill.pub
Visualizing Neural Networks with the Grand Tour

By focusing on linear dimensionality reduction, we show how to visualize many dynamic phenomena in neural networks.

3 months, 3 weeks ago @ distill.pub
Thread: Circuits

What can we learn if we invest heavily in reverse engineering a single neural network?

4 months ago @ distill.pub
Zoom In: An Introduction to Circuits

By studying the connections between neurons, we can find meaningful algorithms in the weights of neural networks.

4 months ago @ distill.pub
Growing Neural Cellular Automata

Training an end-to-end differentiable, self-organising cellular automata model of morphogenesis, able to both grow and regenerate specific patterns.

5 months ago @ distill.pub
Visualizing the Impact of Feature Attribution Baselines

Exploring the baseline input hyperparameter, and how it impacts interpretations of neural network behavior.

6 months ago @ distill.pub
Computing Receptive Fields of Convolutional Neural Networks

Detailed derivations and open-source code to analyze the receptive fields of convnets.

8 months, 1 week ago @ distill.pub
The Paths Perspective on Value Learning

A closer look at how Temporal Difference Learning merges paths of experience for greater statistical efficiency.

9 months, 2 weeks ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarially Robust Neural Style Transfer

An experiment showing adversarial robustness makes neural style transfer work on a non-VGG architecture.

11 months, 1 week ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Two Examples of Useful, Non-Robust Features

An example project using webpack and svelte-loader and ejs to inline SVGs

11 months, 1 week ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Robust Feature Leakage

An example project using webpack and svelte-loader and ejs to inline SVGs

11 months, 1 week ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarial Examples are Just Bugs, Too

Refining the source of adversarial examples.

11 months, 1 week ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarial Example Researchers Need to Expand What is Meant by 'Robustness'

The main hypothesis in Ilyas et al. (2019) happens to be a special case of a more general principle that is commonly accepted in the robustness to distributional shift literature.

11 months, 1 week ago @ distill.pub
The Gradient
last post 4 days, 10 hours ago
Challenges of Comparing Human and Machine Perception

Given these apparent similarities, many questions arise: How similar are human and machine vision really?

Geirhos et al.

The following figure shows two examples of the Synthetic Visual Reasoning Test (SVRT) (Fleuret et al., 2011).

A large recognition gap was identifiable for our DNN when testing machine-selected stimuli, unlike for the machine algorithms tested by Ullman et al.

Human and machine illustration taken from https://www.flickr.com/photos/gleonhard/33661762360 under the license https://creativecommons.org/licenses/by-sa/2.0/. Citation: For attribution in academic contexts or books, please cite this work as Judy Borowski and Christina Funke, "Challenges of Comparing Human and Machine P…

4 days, 10 hours ago @ thegradient.pub
Lessons from the PULSE Model and Discussion

— Yann LeCun (@ylecun) June 22, 2020. Further discussion on the subject also occurred on Reddit in the thread "[Discussion] about data bias vs inductive bias in machine learning sparked by the PULSE paper/demo".

On the Responsibilities of AI Researchers: Another aspect of the discussion arose in response to this exchange: "Not so much ML researchers but ML engineers."

"ML engineers get their methods from ML researchers, so ML researchers have the ethical responsibility of showing at least how biased they are."

— Yann LeCun (@ylecun) June 21, 2020. Which again led to questions regarding the validity of the initial claim: "Yes."

Citation: For attribution in academic contexts or books, please cite this work as Andre…

2 weeks, 2 days ago @ thegradient.pub
A Speech-To-Text Practitioner’s Criticisms of Industry and Academia

This is a follow-up article to our article on building speech-to-text (STT) models, Towards an ImageNet Moment for Speech-to-Text.

Criticisms of the Industry: In general, the majority of STT papers we have read were written by researchers from the industry (e.g.

Most criticisms of STT papers and solutions can be attributed to either the "academic" part or the "industry" part of the researchers’ background.

The majority of modern STT papers usually just heavily overfit on the LibriSpeech ASR corpus (LibriSpeech) with increasingly more extravagant methods.

Citation: For attribution in academic contexts or books, please cite this work as Alexander Veysov, "A Speech-To-Text Practitioner’s Criticisms …

3 months, 1 week ago @ thegradient.pub
Towards an ImageNet Moment for Speech-to-Text

Speech-to-text (STT), also known as automatic speech recognition (ASR), has a long history and has made amazing progress over the past decade.

IntroductionFollowing the success and the democratization (the so-called "ImageNet moment", i.e.

This piece will describe our pursuit of an ImageNet moment for STT, which has so far not been found, and particularly in the context of Russian language.

(i) is easy to estimate just by looking at the model's performance during the first 20-25% of its epochs.

Citation: For attribution in academic contexts or books, please cite this work as Alexander Veysov, "Towards an ImageNet Moment for Speech-to-Text", The Gradient, 2020.

3 months, 2 weeks ago @ thegradient.pub
Quantifying Independently Reproducible Machine Learning

My investigation into reproducible ML has also relied on personal notes and records hosted on Mendeley and GitHub.

What Makes an ML Paper Reproducible?

The biggest factor is that we cannot take all of our assumptions about so-called reproducible ML at face value.

At the same time, our process and systems must result in reproducible work that does not lead us astray.

Acknowledgments: Feature image source: https://xkcd.com/242/. Citation: For attribution in academic contexts or books, please cite this work as Edward Raff, "Quantifying Independently Reproducible Machine Learning", The Gradient, 2020.

5 months ago @ thegradient.pub
GPT-2 and the Nature of Intelligence

--The AI system GPT-2, in a December 2019 interview with The Economist, "An artificial intelligence predicts the future". Innateness, empiricism, and recent developments in deep learning: Consider two classic hypotheses about the development of language and cognition.

Consider GPT-2, an AI system that was recently featured in The New Yorker and interviewed by The Economist.

The popular blog Slate Star Codex featured it, too, in a podcast entitled "GPT-2 as a step towards General Intelligence".

Compared to any previous system for generating natural language, GPT-2 has a number of remarkable strengths.

"I speak fluent English." If you run your experiments at talktotransformer.com, you will quickly learn th…

5 months, 2 weeks ago @ thegradient.pub
The Economics of AI Today

Every day we hear claims that Artificial Intelligence (AI) systems are about to transform the economy, creating mass unemployment and vast monopolies.

In September 2017, a group of distinguished economists gathered in Toronto to set out a research agenda for the Economics of Artificial Intelligence (AI).

Previous editions of the Economics of AI conference included papers about the impact of AI in sectors such as media or health-care.

Lack of diversity in the AI research workforce, and the increasing influence of the private sector in setting AI research (and ethical) agendas as part of the industrialization of AI research suggest that this could be a problem, but the evidence base is lackin…

5 months, 3 weeks ago @ thegradient.pub
Is NeurIPS Getting Too Big?

NeurIPS 2019, the latest incarnation of the Neural Information Processing Systems conference, wrapped up just over a week ago.

"No, that's a keynote at #NeurIPS2019" pic.twitter.com/nJjONGzJww — Jevgenij Gamper (@brutforcimag) December 11, 2019. NeurIPS poster session: too crowded.

:( NeurIPS 2019, Vancouver, Canada: got the visa 3 weeks before.

Citation: For attribution in academic contexts or books, please cite this work as Andrey Kurenkov, "Is NeurIPS Getting Too Big?

BibTeX citation: @article{kurenkov2019neuripst, author = {Kurenkov, Andrey}, title = {Is NeurIPS Getting Too Big?

6 months, 2 weeks ago @ thegradient.pub
An Epidemic of AI Misinformation

Unfortunately, the problem of overhyped AI extends beyond the media itself.

General AI still seems like it might be a couple decades away, sixty years after the first optimistic projections were issued.

Hundreds of deep learning for radiology companies have been spawned in the meantime, but thus far no actual radiologists have been replaced, and the best guess is that deep learning can augment radiologists but not, in the near term, replace them.

If an AI system is allegedly better than humans, then which humans, and how much better?

Citation: For attribution in academic contexts or books, please cite this work as Gary Marcus, "An Epidemic of AI Misinformation", The Gradient, 2019.

7 months, 1 week ago @ thegradient.pub
Introduction to Artificial Life for People who Like AI

NEAT was awarded the 2017 International Society for Artificial Life Award for Outstanding Paper of the Decade.

First, I think we are seeing the first signs of the next AI winter, a period when people lose confidence in AI research and funding dries up.

She was recently elected to the board of the International Society for Artificial Life.

Acknowledgments: Header from "Lenia — Biology of Artificial Life", used with permission of Bert Wang-Chak Chan.

Citation: For attribution in academic contexts or books, please cite this work as Lana Sinapayen, "Introduction to Artificial Life for People who Like AI", The Gradient, 2019.

7 months, 2 weeks ago @ thegradient.pub
How Machine Learning Can Help Unlock the World of Ancient Japan

However, these models were unable to achieve strong performance on Kuzushiji recognition.

There are several reasons why Kuzushiji recognition is challenging: capturing both local and global context is important.

This is one reason why conventional sequence models do not have the capability to work well with many Kuzushiji documents.

However there are many other types of Kuzushiji text that a person might want to transcribe.

Citation: For attribution in academic contexts or books, please cite this work as Alex Lamb, "How Machine Learning Can Help Unlock the World of Ancient Japan", The Gradient, 2019.

7 months, 3 weeks ago @ thegradient.pub
Gaussian Processes, not quite for dummies

Note: if all k components are independent Gaussian random variables, then $X$ must be multivariate Gaussian (because the sum of independent Gaussian random variables is always Gaussian).

Higher dimensional Gaussian, 5D Gaussian: Now we can consider a higher-dimensional Gaussian, starting from 5D, so the covariance matrix is now 5x5.

We then take K and add $I\sigma_y^2$ for the final covariance matrix to factor in noise -- more on this later.

Gaussian process: textbook definition. From the above derivation, you can view a Gaussian process as a generalization of the multivariate Gaussian distribution to infinitely many variables.
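The finite-dimensional view above can be made concrete in a few lines of NumPy: build a covariance matrix over a handful of inputs, add the $I\sigma_y^2$ noise term, and sample from the resulting multivariate Gaussian. This is a minimal sketch; the RBF kernel, length scale, and noise level are illustrative assumptions, not values from the article.

```python
import numpy as np

def rbf_kernel(xs, ys, length_scale=1.0):
    """Squared-exponential (RBF) covariance between two sets of 1-D inputs."""
    d = xs[:, None] - ys[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 5)        # 5 inputs -> a 5x5 covariance, as in the text
sigma_y = 0.1                    # assumed observation-noise standard deviation

K = rbf_kernel(x, x)                              # prior covariance
K_noisy = K + (sigma_y ** 2) * np.eye(len(x))     # K + I * sigma_y^2, as above

# Draw three functions from the 5-D Gaussian prior N(0, K_noisy).
samples = rng.multivariate_normal(np.zeros(len(x)), K_noisy, size=3)
print(samples.shape)             # (3, 5): three draws, five inputs each
```

Adding the noise term also makes the covariance strictly positive definite, which keeps the sampling numerically stable.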

Citation: For attribution in academic contexts or books, please cite this work a…

8 months ago @ thegradient.pub
Evaluation Metrics for Language Modeling

Counterintuitively, having more metrics actually makes it harder to compare language models, especially as indicators of how well a language model will perform on a specific downstream task are often unreliable.

Despite the presence of these downstream evaluation benchmarks, traditional intrinsic metrics remain extremely useful during the process of training the language model itself.

Proof: let P be the distribution of the underlying language and Q be the distribution learned by a language model.

The performance of N-gram language models does not improve much as N goes above 4, whereas the performance of neural language models continues to improve over time.
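Perplexity, the most common of these intrinsic metrics, is just the exponentiated cross-entropy between the true distribution P and the model distribution Q. A toy NumPy check (the distributions below are made up for illustration) shows that a model matching P achieves the lowest possible perplexity:

```python
import numpy as np

def perplexity(p, q):
    """exp of the cross-entropy H(P, Q) = -sum_x P(x) log Q(x), in nats."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.exp(-np.sum(p * np.log(q))))

p = np.array([0.5, 0.25, 0.25])     # "true" next-token distribution (illustrative)
q_good = p                           # a model that matches P exactly
q_bad = np.array([0.8, 0.1, 0.1])   # a confidently mismatched model

print(perplexity(p, q_good))         # exp(H(P)) = 2^1.5, the best achievable value
print(perplexity(p, q_bad))          # strictly larger, by Gibbs' inequality
```

Because cross-entropy is minimized when Q = P, any mismatched model pays a perplexity penalty; this is what makes the metric usable for tracking training progress.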

In less than two years,…

8 months, 3 weeks ago @ thegradient.pub
The State of Machine Learning Frameworks in 2019

Since deep learning regained prominence in 2012, many machine learning frameworks have clamored to become the new favorite among researchers and industry practitioners.

Machine learning research itself is also in a massive state of flux.

We work in machine learning because we care -- about advancing machine learning research, about democratizing AI, or maybe just about building cool stuff.

Citation: For attribution in academic contexts or books, please cite this work as: Horace He, "The State of Machine Learning Frameworks in 2019", The Gradient, 2019.

BibTeX citation:
@article{he2019mlframeworks,
  author  = {He, Horace},
  title   = {The State of Machine Learning Frameworks in 2019},
  journal = {The Gradi…

9 months ago @ thegradient.pub
The #BenderRule: On Naming the Languages We Study and Why It Matters

This has led to a digital divide in the field of NLP between high-resource and low-resource languages.

And yet, the field of NLP is caught in a negative feedback loop that hinders the expansion of the languages we work on.

Work on languages other than English is often considered “language specific” and thus reviewed as less important than equivalent work on English.

Many NLP systems for Chinese, Japanese, Thai and other languages have to start with the problem of word tokenization.

Citation: For attribution in academic contexts or books, please cite this work as: Emily M. Bender, "The #BenderRule: On Naming the Languages We Study and Why It Matters", The Gradient, 2019.

10 months ago @ thegradient.pub
DataTau
last post 17 hours ago
Promising Application of BERT and human psychology to derive emotions from Marketing Copy


17 hours ago @ datatau.net
Learning by Forgetting: Deep Neural Networks and the Jennifer Aniston Neuron


19 hours ago @ datatau.net
Making Data FAIR


1 day, 6 hours ago @ datatau.net
APPLYING AI & MACHINE LEARNING TO MEDIA, ADVERTISING & ENTERTAINMENT


1 day, 13 hours ago @ datatau.net
What is algorithmic bias?


1 day, 16 hours ago @ datatau.net
Hire a SharePoint Development Company for Customized Web Development Components


1 day, 17 hours ago @ datatau.net
Easy Speech-to-Text with Python


1 day, 19 hours ago @ datatau.net
Beginners guide to Tensor operations in PyTorch


1 day, 20 hours ago @ datatau.net
Cartoonize image using neural networks


1 day, 21 hours ago @ datatau.net
AI in Practice: Identify defective components with AutoML in the Google Cloud Platform


2 days, 16 hours ago @ datatau.net
Mobile App Development Company


2 days, 17 hours ago @ datatau.net
How Much Math do you need in Data Science?


2 days, 19 hours ago @ datatau.net
Getting Started with TensorFlow 2


2 days, 19 hours ago @ datatau.net
An App Development Trends to Watch Out in 2020 and Beyond


2 days, 21 hours ago @ datatau.net
100+ Ready Made Doors


2 days, 23 hours ago @ datatau.net
Synced Review
last post 9 hours ago
DeepMind Explores Deep RL for Brain and Behaviour Research


9 hours ago @ medium.com
Viral Post Highlights ‘Toxicity Problems’ in the Machine Learning Community


1 day, 8 hours ago @ medium.com
Grand Theft Auto Scene Context Data Boosts Human Motion Prediction


1 day, 11 hours ago @ medium.com
Researchers Propose ‘Neuro-Symbolic’ Approach for Generative Art


2 days, 8 hours ago @ medium.com
ACL 2020 Announces Best Paper & Test-Of-Time Awards


2 days, 12 hours ago @ medium.com
‘Beyond the ConvNet’ -Stanford & MIT Neural Network Learns Physical Graph Representations from…


3 days, 8 hours ago @ medium.com
Facebook Introduces Integrated Eye & Face Model for 3D Immersion in Remote Communication


3 days, 11 hours ago @ medium.com
Nature Paper Puts An Eye on China’s New Generation of AI


4 days, 10 hours ago @ medium.com
Will Artificial Brain Synapses & Neuromorphic Computing Open the Next AI Hardware Frontier?


6 days, 13 hours ago @ medium.com
ICML 2020 Announces Test of Time Award


1 week ago @ medium.com
Semantic Segmentation Boosts Kiwifruit-Harvesting Robot Performance


1 week, 1 day ago @ medium.com
Top US AI Research Institutes and Tech Companies Support National AI Research Cloud


1 week, 1 day ago @ medium.com
NetHack: Fast & Complex Learning Environment For Testing RL Agent Robustness & Generalization


1 week, 2 days ago @ medium.com
🔬 Science
Papers With Code
last post 1 day, 2 hours ago
Gradient Origin Networks

This paper proposes a new type of implicit generative model that is able to quickly learn a latent representation without an explicit encoder.

This is achieved with an implicit neural network that takes as inputs points in the coordinate space alongside a latent vector initialised with zeros...

The gradients of the data fitting loss with respect to this zero vector are jointly optimised to act as latent points that capture the data manifold.

The results show similar characteristics to autoencoders, but with fewer parameters and the advantages of implicit representation networks.
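The core trick — using the gradient of the data-fitting loss at a zero latent vector in place of an encoder — can be illustrated with a toy linear decoder in NumPy. The real model is an implicit coordinate network; this sketch only shows the gradient-as-latent step, and all shapes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 2))      # toy linear "decoder": x_hat = W @ z (assumed)
x = rng.normal(size=8)           # one data point

z0 = np.zeros(2)                 # latent vector initialised with zeros
# Reconstruction loss L(z) = ||W z - x||^2; its gradient at z0 = 0 is
# dL/dz|_{z=z0} = 2 W^T (W z0 - x) = -2 W^T x.
grad = 2.0 * W.T @ (W @ z0 - x)

z = -grad                        # the negative gradient acts as the latent point
x_hat = W @ z                    # decode the gradient-derived latent

# The gradient direction carries information about x: here it equals 2 W^T x.
assert np.allclose(z, 2.0 * W.T @ x)
```

In the linear case this reduces to projecting the data onto the decoder's weights, which is why the gradient at the zero vector can stand in for an explicit encoder.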


1 day, 2 hours ago @ paperswithcode.com
Metric-Guided Prototype Learning

This is especially true for many key machine learning applications...

In the case of classification tasks, the hierarchy of errors can be summarized under the form of a cost matrix, which assesses the gravity of confusing each pair of classes.

Our method relies on conjointly learning a feature-extracting network and a set of class representations, or prototypes, which incorporate the error metric into their relative arrangement.

Our approach allows for consistent improvement of the network's prediction with regard to the cost matrix.

Experiments on three different tasks and public datasets -- from agricultural time series classification to depth image semantic segmentation -- validate our a…
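One simple way to make prototype arrangements reflect a cost matrix — an illustration of the idea, not necessarily the paper's exact objective — is to penalise the mismatch between pairwise prototype distances and the corresponding confusion costs:

```python
import numpy as np

def metric_guidance_loss(prototypes, cost):
    """Sum of squared mismatches between prototype distances and class costs.

    prototypes: (k, d) array, one embedding per class.
    cost: (k, k) symmetric matrix; cost[i, j] is the gravity of confusing i and j.
    """
    d = np.linalg.norm(prototypes[:, None, :] - prototypes[None, :, :], axis=-1)
    return float(np.sum((d - cost) ** 2) / 2.0)  # each pair counted once

# Three classes; confusing class 0 with class 2 is the most costly (illustrative).
cost = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 3.0],
                 [4.0, 3.0, 0.0]])

good = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])  # distances match the costs
bad = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]])   # all prototypes bunched up

print(metric_guidance_loss(good, cost))   # 0.0 -- arrangement mirrors the metric
print(metric_guidance_loss(bad, cost))    # large -- costly classes sit too close
```

Jointly minimising such a term with a classification loss pushes classes that are expensive to confuse far apart in feature space.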

1 day, 2 hours ago @ paperswithcode.com
Carbontracker: Tracking and Predicting the Carbon Footprint of Training Deep Learning Models

Deep learning (DL) can achieve impressive results across a wide variety of tasks, but this often comes at the cost of training models for extensive periods on specialized hardware accelerators.

This energy-intensive workload has seen immense growth in recent years... Machine learning (ML) may become a significant contributor to climate change if this exponential trend continues.

If practitioners are aware of their energy and carbon footprint, then they may actively take steps to reduce it whenever possible.

In this work, we present Carbontracker, a tool for tracking and predicting the energy and carbon footprint of training DL models.

We propose that energy and carbon footprint of model dev…
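The back-of-the-envelope accounting behind such tools is straightforward: energy is average power times time, and the carbon footprint is energy times the grid's carbon intensity. A sketch with illustrative numbers (all values below are assumptions, not measurements or Carbontracker outputs):

```python
# Rough energy/carbon estimate for a training run. All numbers are
# illustrative assumptions, not measurements.
avg_power_watts = 250.0     # assumed average draw of one accelerator
hours = 48.0                # assumed training duration
carbon_intensity = 0.4      # assumed grid intensity, kg CO2-eq per kWh

energy_kwh = avg_power_watts * hours / 1000.0    # W * h -> kWh
co2_kg = energy_kwh * carbon_intensity

print(f"{energy_kwh:.1f} kWh, {co2_kg:.1f} kg CO2-eq")  # 12.0 kWh, 4.8 kg CO2-eq
```

The point of tracking tools is to replace the assumed constants with live power measurements and region-specific, time-varying carbon intensity.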

1 day, 2 hours ago @ paperswithcode.com
DART: Open-Domain Structured Data Record to Text Generation

We introduce DART, a large dataset for open-domain structured data record to text generation.

We consider the structured data record input as a set of RDF entity-relation triples, a format widely used for knowledge representation and semantics description... DART consists of 82,191 examples across different domains with each input being a semantic RDF triple set derived from data records in tables and the tree ontology of the schema, annotated with sentence descriptions that cover all facts in the triple set.

This hierarchical, structured format with its open-domain nature differentiates DART from other existing table-to-text corpora.

We conduct an analysis of DART on several state-of-the-a…
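An input in this format is simply a set of entity-relation triples paired with a sentence that covers all of the facts. A hypothetical example of the shape of one record (the entities and sentence are invented for illustration, not taken from DART):

```python
# A hypothetical data-record-to-text example in the triple-set format the
# abstract describes; entities and the reference sentence are invented.
triple_set = [
    ("Alan Turing", "birthPlace", "London"),
    ("Alan Turing", "field", "computer science"),
]
reference = "Alan Turing, born in London, worked in computer science."

# Every subject and object in the triples should be covered by the reference.
entities = {s for s, _, o in triple_set} | {o for _, _, o in triple_set}
assert all(e in reference for e in entities)
print(len(triple_set), "triples")
```

A generation model consumes the triple set and is scored against references like this one.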

1 day, 2 hours ago @ paperswithcode.com
Natural Emergence of Heterogeneous Strategies in Artificially Intelligent Competitive Teams

Multi-agent strategies in mixed cooperative-competitive environments can be hard to craft by hand because each agent needs to coordinate with its teammates while competing with its opponents.

Learning-based algorithms are appealing, but many scenarios require heterogeneous agent behavior for the team's success, and this increases the complexity of the learning algorithm...

In this work, we develop a competitive multi agent environment called FortAttack in which two teams compete against each other.

We observe a natural emergence of heterogeneous behavior amongst homogeneous agents when such behavior can lead to the team's success.

Such heterogeneous behavior from homogeneous agents is appeali…

1 day, 2 hours ago @ paperswithcode.com
VPN: Learning Video-Pose Embedding for Activities of Daily Living

In this paper, we focus on the spatio-temporal aspect of recognizing Activities of Daily Living (ADL).

ADL have two specific properties (i) subtle spatio-temporal patterns and (ii) similar visual patterns varying with time...

Therefore, ADL may look very similar, and it is often necessary to look at their fine-grained details to distinguish them.

Because the recent spatio-temporal 3D ConvNets are too rigid to capture the subtle visual patterns across an action, we propose a novel Video-Pose Network: VPN.

The two key components of this VPN are a spatial embedding and an attention network.

1 day, 2 hours ago @ paperswithcode.com
Fast Adaptation via Policy-Dynamics Value Functions

Standard RL algorithms assume fixed environment dynamics and require a significant amount of interaction to adapt to new environments.

We introduce Policy-Dynamics Value Functions (PD-VF), a novel approach for rapidly adapting to dynamics different from those previously seen in training... PD-VF explicitly estimates the cumulative reward in a space of policies and environments.

An ensemble of conventional RL policies is used to gather experience on training environments, from which embeddings of both policies and environments can be learned.

At test time, a few actions are sufficient to infer the environment embedding, enabling a policy to be selected by maximizing the learned value functio…

1 day, 2 hours ago @ paperswithcode.com
Relaxed Conformal Prediction Cascades for Efficient Inference Over Many Labels

Providing a small set of promising candidates in place of a single prediction is well-suited for many open-ended classification tasks.

Conformal Prediction (CP) is a technique for creating classifiers that produce a valid set of predictions that contains the true answer with arbitrarily high probability...

This is particularly pervasive in the considered setting, where the correct answer is not unique and the number of total possible answers is high.

First, we relax CP validity to arbitrary criteria of success---allowing our framework to make more efficient predictions while remaining "equivalently correct."

Second, we amortize cost by conformalizing prediction cascades, in which we aggre…
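Standard split conformal prediction — the baseline being relaxed here — is only a few lines: compute nonconformity scores on a calibration set, take a quantile, and return every candidate label whose score falls below it. This is a generic sketch of plain split CP, not the paper's cascaded variant, and the score values are illustrative:

```python
import numpy as np

def conformal_prediction_set(cal_scores, test_scores, alpha=0.1):
    """Indices of labels whose nonconformity score is below the calibration
    quantile, so the set covers the true label with probability >= 1 - alpha
    (generic split conformal prediction)."""
    n = len(cal_scores)
    # Finite-sample-corrected quantile level for split CP.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    qhat = np.quantile(cal_scores, level)
    return [k for k, s in enumerate(test_scores) if s <= qhat]

rng = np.random.default_rng(0)
cal = rng.uniform(size=100)               # true-label scores on calibration data
test = np.array([0.05, 0.5, 0.99, 0.2])   # one score per candidate label

pred_set = conformal_prediction_set(cal, test, alpha=0.1)
print(pred_set)                           # labels kept at the 90% level
```

Note that a smaller alpha (higher required coverage) can only enlarge the prediction set, which is exactly the efficiency problem the cascade approach targets when the label space is huge.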

1 day, 2 hours ago @ paperswithcode.com
Long-term Human Motion Prediction with Scene Context

Human movement is goal-directed and influenced by the spatial layout of the objects in the scene.

To plan future human motion, it is crucial to perceive the environment -- imagine how hard it is to navigate a new room with lights off...

Existing works on predicting human motion do not pay attention to the scene context and thus struggle in long-term prediction.

In this work, we propose a novel three-stage framework that exploits scene context to tackle this task.

Given a single scene image and 2D pose histories, our method first samples multiple human motion goals, then plans 3D human paths towards each goal, and finally predicts 3D human pose sequences following each path.

1 day, 2 hours ago @ paperswithcode.com
Efficient Learning of Generative Models via Finite-Difference Score Matching

Several machine learning applications involve the optimization of higher-order derivatives (e.g., gradients of gradients) during training, which can be expensive with respect to memory and computation even with automatic differentiation.

As a typical example in generative modeling, score matching (SM) involves the optimization of the trace of a Hessian... To improve computing efficiency, we rewrite the SM objective and its variants in terms of directional derivatives, and present a generic strategy to efficiently approximate any-order directional derivative with finite difference (FD).

Our approximation only involves function evaluations, which can be executed in parallel, and no gradient com…
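The central-difference approximation underlying this strategy needs only two function evaluations per directional derivative: df/dv(x) ≈ (f(x + εv) − f(x − εv)) / 2ε, with no gradient computation. A NumPy check on a function with a known gradient (the quadratic test function is an illustrative assumption, not from the paper):

```python
import numpy as np

def directional_derivative_fd(f, x, v, eps=1e-5):
    """Central finite-difference estimate of the derivative of f at x along v.
    Uses only two function evaluations -- no automatic differentiation."""
    return (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)

f = lambda x: float(np.sum(x ** 2))    # gradient is 2x, so df/dv = 2 x . v
x = np.array([1.0, -2.0, 3.0])
v = np.array([0.0, 1.0, 0.0])

fd = directional_derivative_fd(f, x, v)
exact = float(2.0 * x @ v)             # analytic directional derivative = -4.0
print(fd, exact)
```

The two evaluations are independent, which is what makes the scheme easy to parallelise across many directions.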

1 day, 2 hours ago @ paperswithcode.com
SpinalNet: Deep Neural Network with Gradual Input

However, DNNs need high computation times, and people always expect better performance with lower computation...

Therefore, we study the human somatosensory system and design a neural network (SpinalNet) to achieve higher accuracy with lower computation time.

Hidden layers of the proposed SpinalNet consist of three parts: 1) input row, 2) intermediate row, and 3) output row.

Input segmentation enables each hidden layer to receive a part of the input and outputs of the previous layer.

We integrate the SpinalNet as the fully-connected layer of the convolutional neural network (CNN), residual neural network (ResNet), and Dense Convolutional Network (DenseNet), Visual Geometry Group (VGG) netwo…
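The "gradual input" idea — each hidden layer receiving only a slice of the input plus the previous layer's output — can be sketched in a few lines of NumPy. Layer sizes, the input split, and the tanh activation are assumptions for illustration, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def spinal_forward(x, weights, n_parts=3):
    """Each 'spinal' layer sees one slice of the input concatenated with the
    previous layer's output, instead of the whole input at once."""
    parts = np.array_split(x, n_parts)   # gradual input: one slice per layer
    h = np.zeros(0)                      # no previous output for the first layer
    outputs = []
    for part, W in zip(parts, weights):
        h = np.tanh(W @ np.concatenate([part, h]))
        outputs.append(h)
    return np.concatenate(outputs)       # all row outputs feed the classifier

x = rng.normal(size=12)                  # toy input, split into 3 slices of 4
hidden = 5
fan_ins = [4, 4 + hidden, 4 + hidden]    # each layer: input slice (+ previous h)
weights = [rng.normal(size=(hidden, s)) * 0.1 for s in fan_ins]

y = spinal_forward(x, weights)
print(y.shape)                           # (15,): three layers x five units
```

Because each layer's fan-in is a slice plus a small hidden state rather than the full input, the per-layer weight matrices stay small, which is the source of the claimed computation savings.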

1 day, 2 hours ago @ paperswithcode.com
TripMD: Driving patterns investigation via Motif Analysis

Processing driving data and investigating driving behavior has received increasing interest in recent decades, with applications ranging from car insurance pricing to policy making.

A common strategy for analyzing driving behavior is to study the maneuvers being performed by the driver...

In this paper, we propose TripMD, a system that extracts the most relevant driving patterns from sensor recordings (such as acceleration) and provides a visualization that allows for an easy investigation.

Additionally, we test our system using the UAH-DriveSet dataset, a publicly available naturalistic driving dataset.

We show that (1) our system can extract a rich number of driving pat…

1 day, 2 hours ago @ paperswithcode.com
Single Shot Video Object Detector

Single shot detectors that are potentially faster and simpler than two-stage detectors tend to be more applicable to object detection in videos.

Nevertheless, the extension of such object detectors from image to video is not trivial, especially when appearance deterioration exists in videos, e.g., motion blur or occlusion...

A valid question is how to explore temporal coherence across frames for boosting detection.

In this paper, we propose to address the problem by enhancing per-frame features through aggregation of neighboring frames.

Specifically, we present Single Shot Video Object Detector (SSVD) -- a new architecture that novelly integrates feature aggregation into a one-stage …

1 day, 2 hours ago @ paperswithcode.com
Lossless CNN Channel Pruning via Gradient Resetting and Convolutional Re-parameterization

Channel pruning (a.k.a. filter pruning) aims to slim down a convolutional neural network (CNN) by reducing the width (i.e., the number of output channels) of convolutional layers...

However, as a CNN's representational capacity depends on the width, doing so tends to degrade the performance.

A traditional learning-based channel pruning paradigm applies a penalty on parameters to improve the robustness to pruning, but such a penalty may degrade the performance even before pruning.

Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain the per…

1 day, 2 hours ago @ paperswithcode.com
Divide-and-Rule: Self-Supervised Learning for Survival Analysis in Colorectal Cancer

With the long-term rapid increase in the incidence of colorectal cancer (CRC), there is an urgent clinical need to improve risk stratification.

In this work, we aim to learn histopathological patterns within cancerous tissue regions that can be used to improve prognostic stratification for colorectal cancer.

To do so, we propose a self-supervised learning method that jointly learns a representation of tissue regions as well as a metric of the clustering to obtain their underlying patterns.

We furthermore show that the proposed approach can benefit from linear predictors to avoid overfitting in patient outcome prediction.

The experimental results demonstrate statistically significant patient …

1 day, 2 hours ago @ paperswithcode.com
Papers With Code
last post 1 day, 2 hours ago
Evaluating German Transformer Language Models with Syntactic Agreement Tests

Pre-trained transformer language models (TLMs) have recently refashioned natural language processing (NLP): Most state-of-the-art NLP models now operate on top of TLMs to benefit from contextualization and knowledge induction.

Besides other methods, syntactic agreement tests were utilized to analyse TLMs.

In this work, we analyse German TLMs.

To this end, we design numerous agreement tasks, some of which consider peculiarities of the German language.

Our experimental results show that state-of-the-art German TLMs generally perform well on agreement tasks, but we also identify and discuss syntactic structures that push them to their limits.

1 day, 2 hours ago @ paperswithcode.com
Curriculum learning for multilevel budgeted combinatorial problems

Learning heuristics for combinatorial optimization problems through graph neural networks has recently shown promising results on some classic NP-hard problems.

These are single-level optimization problems with only one player... Multilevel combinatorial optimization problems are their generalization, encompassing situations with multiple players taking decisions sequentially.

By framing them in a multi-agent reinforcement learning setting, we devise a value-based method to learn to solve multilevel budgeted combinatorial problems involving two players in a zero-sum game over a graph.

Thus, in a bottom-up approach, we generate datasets of heuristically solved instances with increasingly la…

1 day, 2 hours ago @ paperswithcode.com
DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue

The Visual Dialogue task requires an agent to engage in a conversation with a human about an image.

The ability to generate detailed and non-repetitive responses is crucial for the agent to achieve human-like conversation...

In this architecture, word generation is decomposed into a series of attention-based information selection steps, performed by the novel recurrent Deliberation, Abandon and Memory (DAM) module.

Each DAM module performs an adaptive combination of the response-level semantics captured from the encoder and the word-level semantics specifically selected for generating each word.

Furthermore, DAM is flexible to cooperate with existing visual dialogue encoders and adaptive to…

1 day, 2 hours ago @ paperswithcode.com
MAMO: Memory-Augmented Meta-Optimization for Cold-start Recommendation

Recently, some works have introduced the meta-optimization idea into recommendation scenarios.

The core idea is to learn a globally shared initialization parameter for all users and then learn local parameters for each user separately.

However, most meta-learning based recommendation approaches adopt model-agnostic meta-learning for parameter initialization, where the global sharing parameter may lead the model into local optima for some users.

In this paper, we design two memory matrices that can store task-specific memories and feature-specific memories.

We test the model on two widely used recommendation datasets and consider four cold-start situations.

1 day, 2 hours ago @ paperswithcode.com
Spatial Semantic Embedding Network: Fast 3D Instance Segmentation with Deep Metric Learning

We propose spatial semantic embedding network (SSEN), a simple, yet efficient algorithm for 3D instance segmentation using deep metric learning.

The raw 3D reconstruction of an indoor environment suffers from occlusions and noise, and is produced without any meaningful distinction between individual entities... For high-level intelligent tasks on a large-scale scene, 3D instance segmentation recognizes individual instances of objects.

We approach the instance segmentation by simply learning the correct embedding space that maps individual instances of objects into distinct clusters that reflect both spatial and semantic information.

Unlike previous approaches that require complex pre-process…

1 day, 2 hours ago @ paperswithcode.com
ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction

Molecular property prediction (e.g., energy) is an essential problem in chemistry and biology.

However, learning semi-supervised representations for large numbers of molecules is challenging, including the joint representation of both molecular essence and structure, and the conflict between representation and property learning.

Here we propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules.

In the teacher model, we propose a novel semi-supervised learning method to learn general representation that jointly exploits information from molecular structure and molecular distribution.

Finally, we propose a novel …

1 day, 2 hours ago @ paperswithcode.com
Policy learning with partial observation and mechanical constraints for multi-person modeling

Extracting the rules of real-world biological multi-agent behaviors is a current challenge in various scientific and engineering fields.

Biological agents generally have limited observation and mechanical constraints; however, most conventional data-driven models ignore such assumptions, resulting in a lack of biological plausibility and model interpretability for behavioral analyses in biological and cognitive science...

Here we propose sequential generative models with partial observation and mechanical constraints, which can visualize whose information the agents utilize and can generate biologically plausible actions.

We formulate this as a decentralized multi-agent imitation learn…

1 day, 2 hours ago @ paperswithcode.com
Discretization-Aware Architecture Search

The search cost of neural architecture search (NAS) has been largely reduced by weight-sharing methods.

These methods optimize a super-network with all possible edges and operations, and determine the optimal sub-network by discretization.

The discretization process, performed on either operations or edges, incurs significant inaccuracy and thus the quality of the final architecture is not guaranteed.

This paper presents discretization-aware architecture search (DA²S), whose core idea is to add a loss term that pushes the super-network towards the desired topology, so that the accuracy loss brought by discretization is largely alleviated.

Ex…

1 day, 2 hours ago @ paperswithcode.com
Self-organizing Democratized Learning: Towards Large-scale Distributed Learning Systems

Emerging cross-device artificial intelligence (AI) applications require a transition from conventional centralized learning systems towards large-scale distributed AI systems that can collaboratively perform complex learning tasks.

A recent work (2020) lays out a holistic philosophy with underlying principles for building large-scale distributed and democratized machine learning systems...

The outlined principles are meant to provide a generalization of distributed learning that goes beyond existing mechanisms such as federated learning.

The approach consists of a self-organizing hierarchical structuring mechanism based on agglomerative clustering, hierarchical generalization, and corresponding learning m…

1 day, 2 hours ago @ paperswithcode.com
SegFix: Model-Agnostic Boundary Refinement for Segmentation

We present a model-agnostic post-processing scheme to improve the boundary quality for the segmentation result that is generated by any existing segmentation model.

Motivated by the empirical observation that the label predictions of interior pixels are more reliable, we propose to replace the originally unreliable predictions of boundary pixels by the predictions of interior pixels... Our approach processes only the input image through two steps: (i) localize the boundary pixels and (ii) identify the corresponding interior pixel for each boundary pixel.

We build the correspondence by learning a direction away from the boundary pixel to an interior pixel.

Our method requires no prior inform…
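The relabeling step described above can be sketched on a toy 1-D "image". This is not the paper's implementation: the boundary mask and offsets below are hand-made stand-ins for the direction field SegFix would predict.

```python
# Toy SegFix-style refinement: replace each unreliable boundary prediction
# with the prediction of the interior pixel a (hand-made) offset points to.
def refine(labels, is_boundary, offsets):
    refined = list(labels)
    for i, b in enumerate(is_boundary):
        if b:  # copy the label from the trusted interior pixel
            refined[i] = labels[i + offsets[i]]
    return refined

labels      = [0, 0, 1, 1, 1]   # noisy prediction; pixel 2 sits on a boundary
is_boundary = [0, 0, 1, 0, 0]
offsets     = [0, 0, -1, 0, 0]  # boundary pixel 2 trusts interior pixel 1
print(refine(labels, is_boundary, offsets))  # [0, 0, 0, 1, 1]
```

In 2-D the offsets would be (dy, dx) vectors produced by the learned direction branch, but the replace-by-interior logic is the same.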

1 day, 2 hours ago @ paperswithcode.com
Robust Re-Identification by Multiple Views Knowledge Distillation


1 day, 2 hours ago @ paperswithcode.com
A Multi-Level Approach to Waste Object Segmentation

We address the problem of localizing waste objects from a color image and an optional depth image, which is a key perception component for robotic interaction with such objects.

Specifically, our method integrates the intensity and depth information at multiple levels of spatial granularity... Firstly, a scene-level deep network produces an initial coarse segmentation, based on which we select a few potential object regions to zoom in and perform fine segmentation.

The results of the above steps are further integrated into a densely connected conditional random field that learns to respect the appearance, depth, and spatial affinities with pixel-level accuracy.

In addition, we create a new …

1 day, 2 hours ago @ paperswithcode.com
Self-Supervised Policy Adaptation during Deployment

In most real world scenarios, a policy trained by reinforcement learning in one environment needs to be deployed in another, potentially quite different environment.

However, generalization across different environments is known to be hard... A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal.

Our work explores the use of self-supervision to allow the policy to continue training after deployment without using any rewards.

While previous methods explicitly anticipate changes in the new environment, we assume no prior knowledge of those changes yet still obtain significant improvements.

O…

1 day, 2 hours ago @ paperswithcode.com
Generalizing Tensor Decomposition for N-ary Relational Knowledge Bases

With the rapid development of knowledge bases (KBs), the link prediction task, which completes KBs with missing facts, has been broadly studied, especially in binary relational KBs (a.k.a. knowledge graphs) with powerful tensor-decomposition-related methods.

However, the ubiquitous n-ary relational KBs with higher-arity relational facts have received less attention; for them, existing translation-based and neural-network-based approaches have weak expressiveness and high complexity in modeling various relations... Tensor decomposition has not been considered for n-ary relational KBs, while directly extending tensor-decomposition methods from binary relational KBs to the n-ary case does not yield s…
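For intuition, a CP/DistMult-style scoring function extends naturally from triples to n-ary facts: multiply the relation embedding elementwise with all entity embeddings and sum. The embeddings below are made up for illustration, not taken from the paper.

```python
# Hypothetical CP-style score for an n-ary fact. For arity 2 this reduces
# to the familiar DistMult/CP scoring of (head, relation, tail) triples.
def nary_score(rel, entities):
    score = 0.0
    for d in range(len(rel)):          # loop over embedding dimensions
        prod = rel[d]
        for e in entities:
            prod *= e[d]               # elementwise product across all entities
        score += prod
    return score

rel = [1.0, 0.5]
e1, e2, e3 = [1.0, 2.0], [0.5, 1.0], [2.0, 2.0]
print(nary_score(rel, [e1, e2]))       # binary fact
print(nary_score(rel, [e1, e2, e3]))   # ternary fact, same scoring rule
```

The point of the sketch is only that the same decomposition applies uniformly to any arity; the paper's actual generalization may differ.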

1 day, 2 hours ago @ paperswithcode.com
Binary Stochastic Filtering: feature selection and beyond

Feature selection is one of the most decisive tools in understanding data and machine learning models.

Among other methods, sparsity induced by $L^{1}$ penalty is one of the simplest and best studied approaches to this problem...

Although such regularization is frequently used in neural networks to achieve sparsity of weights or unit activations, it is unclear how it can be employed in the feature selection problem.

This work aims at extending neural networks with the ability to automatically select features by rethinking how sparsity regularization can be used, namely by stochastically penalizing feature involvement instead of the layer weights.

Furthermore, the method is easily genera…
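The idea of penalizing feature involvement rather than layer weights can be sketched as follows. This is a toy illustration only: the keep probabilities here are fixed by hand, whereas the method would learn them.

```python
import random

# Each feature is kept with some probability; an L1-style penalty on the
# keep probabilities (not on weights) pushes unneeded features toward 0.
def stochastic_filter(x, keep_probs, rng):
    return [xi if rng.random() < p else 0.0 for xi, p in zip(x, keep_probs)]

def sparsity_penalty(keep_probs, lam=0.01):
    return lam * sum(keep_probs)  # L1 penalty on expected feature usage

rng = random.Random(0)
x = [1.0, 2.0, 3.0]
keep = [1.0, 0.0, 0.5]          # hand-set: feature 1 already filtered out
print(stochastic_filter(x, keep, rng))
print(sparsity_penalty(keep))
```

In training, the gate samples would be relaxed or straight-through estimated so the keep probabilities receive gradients; this sketch only shows the forward pass and the penalty.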

1 day, 2 hours ago @ paperswithcode.com
Papers With Code
latest post 1 day, 2 hours ago
Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion

Conversational recommender systems (CRS) aim to recommend high-quality items to users through interactive conversations.

Although several efforts have been made for CRS, two major issues still remain to be solved... First, the conversation data itself lacks sufficient contextual information for accurately understanding users' preferences.

Second, there is a semantic gap between natural language expression and item-level user preference.

To address these issues, we incorporate both word-oriented and entity-oriented knowledge graphs (KG) to enhance the data representations in CRSs, and adopt Mutual Information Maximization to align the word-level and entity-level semantic spaces.

Based on t…

1 day, 2 hours ago @ paperswithcode.com
Dynamic Group Convolution for Accelerating Convolutional Neural Networks

Replacing normal convolutions with group convolutions can significantly increase the computational efficiency of modern deep convolution networks, which has been widely adopted in compact network architecture designs.

However, existing group convolutions undermine the original network structure by cutting off some connections permanently, resulting in significant accuracy degradation...

In this paper, we propose dynamic group convolution (DGC) that adaptively selects which part of input channels to be connected within each group for individual samples on the fly.

The DGC preserves the original network structure and has similar computational efficiency as the conventional group convolutions …
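The efficiency baseline DGC competes with is easy to quantify: splitting channels into G groups divides a convolution's multiply count by G. The sketch below is a standard back-of-the-envelope FLOP formula, not code from the paper.

```python
# Rough multiply count of a conv layer: H * W * k * k * Cin * Cout,
# with grouping applied per group (bias and stride ignored).
def conv_flops(h, w, k, c_in, c_out, groups=1):
    return h * w * k * k * (c_in // groups) * (c_out // groups) * groups

dense   = conv_flops(32, 32, 3, 64, 64)            # ordinary convolution
grouped = conv_flops(32, 32, 3, 64, 64, groups=4)  # 4-group convolution
print(dense // grouped)  # 4: grouping into G groups cuts cost by G
```

DGC keeps this cost profile but chooses *which* input channels feed each group per sample, instead of fixing the partition permanently.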

1 day, 2 hours ago @ paperswithcode.com
Multiple Expert Brainstorming for Domain Adaptive Person Re-identification

Often the best-performing deep neural models are ensembles of multiple base-level networks; nevertheless, ensemble learning for domain adaptive person re-ID remains unexplored.

In this paper, we propose a multiple expert brainstorming network (MEB-Net) for domain adaptive person re-ID, opening up a promising direction for the model ensemble problem under unsupervised conditions... MEB-Net adopts a mutual learning strategy, where multiple networks with different architectures are pre-trained within a source domain as expert models equipped with specific features and knowledge, and adaptation is then accomplished through brainstorming (mutual learning) among the expert models.

MEB…

2 days, 6 hours ago @ paperswithcode.com
Variational Autoencoders for Anomalous Jet Tagging

We present a detailed study on Variational Autoencoders (VAEs) for anomalous jet tagging.

When using VAE as an anomaly detector, we present two approaches to detect anomalies: directly comparing in the input space or, instead, working in the latent space.

Results of the tagging performance for different jet types and over a large kinematic range are shown.

Confronted with the problem of mis-assigning lower likelihood to out-of-distribution samples, we explore one potential solution -- Outlier Exposure (OE).

OE, in the context of jet tagging, is employed to facilitate two goals: increasing sensitivity of outlier detection and decorrelating jet mass.
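The two detection approaches mentioned above (input space vs. latent space) can be contrasted on a toy example. The "encoder" and "decoder" here are trivial stand-ins, not a trained VAE; the point is only that the two scores can disagree.

```python
# Input-space score: reconstruction error after encode/decode.
def recon_score(x, decode, encode):
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

# Latent-space score: squared distance of the code to the N(0, I) prior mean.
def latent_score(x, encode):
    return sum(zi ** 2 for zi in encode(x))

encode = lambda x: [x[0] - x[1]]   # stand-in 1-D encoder
decode = lambda z: [z[0], 0.0]     # stand-in decoder

typical, outlier = [1.0, 1.0], [9.0, 1.0]
print(recon_score(typical, decode, encode), recon_score(outlier, decode, encode))
print(latent_score(typical, encode), latent_score(outlier, encode))
```

With these stand-ins the reconstruction errors coincide while the latent distances differ sharply, which is why the choice of space matters for tagging.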

2 days, 6 hours ago @ paperswithcode.com
CICLAD: A Fast and Memory-efficient Closed Itemset Miner for Streams

Mining association rules from data streams is a challenging task due to the (typically) limited resources available vs. the large size of the result.

Frequent closed itemsets (FCI) enable an efficient first step, yet current FCI stream miners are not optimal on resource consumption, e.g. they store a large number of extra itemsets at an additional cost...

In a search for a better storage-efficiency trade-off, we designed Ciclad, an intersection-based sliding-window FCI miner.

Experimental results indicate that Ciclad's memory footprint is much lower and its performance globally better than that of competitor methods.
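To make "frequent closed itemset" concrete: an itemset is closed if no strict superset has the same support. The brute-force sketch below only illustrates the definition on a tiny window; Ciclad itself uses an incremental intersection-based algorithm.

```python
from itertools import combinations

# Brute-force frequent closed itemsets over a small transaction window.
def closed_frequent(window, minsup):
    items = sorted({i for t in window for i in t})
    freq = {}
    for r in range(1, len(items) + 1):
        for cand in combinations(items, r):
            sup = sum(set(cand) <= t for t in window)  # support count
            if sup >= minsup:
                freq[frozenset(cand)] = sup
    # closed = no strict superset with identical support
    return {s: sup for s, sup in freq.items()
            if not any(s < t and sup == freq[t] for t in freq)}

window = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
print(sorted((sorted(s), sup) for s, sup in closed_frequent(window, 2).items()))
```

Here {b} and {c} are frequent but not closed (absorbed by {a,b} and {a,c} with equal support), so the closed family {a}, {a,b}, {a,c} summarizes all frequent itemsets losslessly.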

2 days, 6 hours ago @ paperswithcode.com
Confidence-Aware Learning for Deep Neural Networks

Despite the power of deep neural networks for a wide range of tasks, an overconfident prediction issue has limited their practical use in many safety-critical applications.

In this paper, we propose a method of training deep neural networks with a novel loss function, named Correctness Ranking Loss, which regularizes class probabilities explicitly to be better confidence estimates in terms of ordinal ranking according to confidence.

The proposed method is easy to implement and can be applied to the existing architectures without any modification.

Extensive experimental results on classification benchmark datasets indicate that the proposed method helps networks to produce well-ranked confid…
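The ordinal-ranking idea can be illustrated with a toy pairwise hinge penalty: a correctly classified sample should be more confident than a misclassified one by some margin. This is only a sketch of the ranking intuition, not the paper's Correctness Ranking Loss.

```python
# Toy pairwise ranking penalty on confidence estimates.
def ranking_penalty(confidences, correct, margin=0.1):
    loss = 0.0
    for ci, yi in zip(confidences, correct):
        for cj, yj in zip(confidences, correct):
            if yi and not yj:  # correct sample should out-rank incorrect one
                loss += max(0.0, margin - (ci - cj))
    return loss

# Well-ranked confidences incur no penalty:
print(ranking_penalty([0.9, 0.95, 0.3], [True, True, False]))  # 0.0
# A confident mistake (or diffident correct answer) is penalized:
print(ranking_penalty([0.2, 0.9], [True, False]))
```

Minimizing such a term pushes the softmax confidences toward being useful ordinal estimates of correctness, which is the property the abstract targets.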

2 days, 6 hours ago @ paperswithcode.com
Meta-SAC: Auto-tune the Entropy Temperature of Soft Actor-Critic via Metagradient

Exploration-exploitation dilemma has long been a crucial issue in reinforcement learning.

In this paper, we propose a new approach to automatically balance between these two... Our method is built upon the Soft Actor-Critic (SAC) algorithm, which uses an "entropy temperature" that balances the original task reward and the policy entropy, and hence controls the trade-off between exploitation and exploration.

It is empirically shown that SAC is very sensitive to this hyperparameter, and the follow-up work (SAC-v2), which uses constrained optimization for automatic adjustment, has some limitations.

The core of our method, namely Meta-SAC, is to use metagradient along with a novel meta objecti…
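The role of the temperature is easy to see in the entropy-regularized objective reward + α·entropy. The numbers below are made up to show how α flips the preference between a greedy and an exploratory policy; this is the quantity SAC's α trades off, not Meta-SAC's metagradient update.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# Soft (entropy-regularized) objective: reward + alpha * policy entropy.
def soft_objective(reward, probs, alpha):
    return reward + alpha * entropy(probs)

greedy  = (1.00, [0.99, 0.01])  # high reward, near-deterministic policy
explore = (0.90, [0.50, 0.50])  # slightly lower reward, max-entropy policy

for alpha in (0.0, 0.5):
    best = max((greedy, explore), key=lambda ra: soft_objective(*ra, alpha))
    print(alpha, "prefers", "greedy" if best is greedy else "explore")
```

At α = 0 the greedy policy wins on raw reward; at α = 0.5 the entropy bonus tips the objective toward the exploratory policy, which is why tuning α well matters so much.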

2 days, 6 hours ago @ paperswithcode.com
Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation

This paper studies the problem of learning semantic segmentation from image-level supervision only.

To achieve this, two neural co-attentions are incorporated into the classifier to complementarily capture cross-image semantic similarities and differences.

This helps the classifier discover more object patterns and better ground semantics in image regions.

In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference, hence eventually benefiting semantic segmentation learning.

Moreover, our approach ranked 1st place in the Weakly-Supervised Semantic Segmentation Track of CVPR2020 Learning from Imperfect…

2 days, 6 hours ago @ paperswithcode.com
Disentangled Graph Collaborative Filtering

Learning informative representations of users and items from the interaction data is of crucial importance to collaborative filtering (CF).

Present embedding functions exploit user-item relationships to enrich the representations, evolving from a single user-item instance to the holistic interaction graph...

In this work, we pay special attention to user-item relationships at the finer granularity of user intents.

We hence devise a new model, Disentangled Graph Collaborative Filtering (DGCF), to disentangle these factors and yield disentangled representations.

This leads to disentangled representations, effectively distilling information pertinent to each intent.

2 days, 6 hours ago @ paperswithcode.com
Segment as Points for Efficient Online Multi-Object Tracking and Segmentation

Current multi-object tracking and segmentation (MOTS) methods follow the tracking-by-detection paradigm and adopt convolutions for feature extraction.

However, as affected by the inherent receptive field, convolution based feature extraction inevitably mixes up the foreground features and the background features, resulting in ambiguities in the subsequent instance association...

Our method generates a new tracking-by-points paradigm where discriminative instance embeddings are learned from randomly selected points rather than images.

Furthermore, multiple informative data modalities are converted into point-wise representations to enrich point-wise features.

The resulting online MOTS framew…

2 days, 6 hours ago @ paperswithcode.com
PointTrack++ for Effective Online Multi-Object Tracking and Segmentation

Multiple-object tracking and segmentation (MOTS) is a novel computer vision task that aims to jointly perform multiple object tracking (MOT) and instance segmentation.

In this work, we present PointTrack++, an effective on-line framework for MOTS, which remarkably extends our recently proposed PointTrack framework... To begin with, PointTrack adopts an efficient one-stage framework for instance segmentation, and learns instance embeddings by converting compact image representations into unordered 2D point clouds.

Compared with PointTrack, our proposed PointTrack++ offers three major improvements.

Firstly, in the instance segmentation stage, we adopt a semantic segmentation decoder trained wit…

2 days, 6 hours ago @ paperswithcode.com
Interpretation of Disease Evidence for Medical Images Using Adversarial Deformation Fields

The high complexity of deep learning models is associated with the difficulty of explaining what evidence they recognize as correlating with specific disease labels.

We propose a novel method for formulating and presenting spatial explanations of disease evidence, called deformation field interpretation with generative adversarial networks (DeFI-GAN).

An adversarially trained generator produces deformation fields that modify images of diseased patients to resemble images of healthy patients.

We validate the method studying chronic obstructive pulmonary disease (COPD) evidence in chest x-rays (CXRs) and Alzheimer's disease (AD) evidence in brain MRIs.

When extracting disease evidence in long…

2 days, 6 hours ago @ paperswithcode.com
Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation

This strategy first utilizes a base tracker to coarsely locate the target and then exploits a refinement module to obtain more accurate results...

However, existing refinement modules suffer from limited transferability and precision.

In this work, we propose a novel, flexible and accurate refinement module called Alpha-Refine, which exploits a precise pixel-wise correlation layer together with a spatial-aware non-local layer to fuse features and can predict three complementary outputs: bounding box, corners and mask.

We apply the proposed Alpha-Refine module to five well-known state-of-the-art base trackers: DiMP, ATOM, SiamRPN++, RTMDNet and ECO.

The comprehensive experiments on Track…

2 days, 6 hours ago @ paperswithcode.com
Shape-aware Meta-learning for Generalizing Prostate MRI Segmentation to Unseen Domains

Model generalization capacity under domain shift (e.g., various imaging protocols and scanners) is crucial for deep learning methods in real-world clinical deployment.

This paper tackles the challenging problem of domain generalization, i.e., learning a model from multi-domain source data such that it can directly generalize to an unseen target domain... We present a novel shape-aware meta-learning scheme to improve the model generalization in prostate MRI segmentation.

Our learning scheme is rooted in gradient-based meta-learning, explicitly simulating domain shift with virtual meta-train and meta-test sets during training.

We evaluate our method on prostate MRI data from six different institut…

2 days, 6 hours ago @ paperswithcode.com
Playing Chess with Limited Look Ahead

However, one common element in these works is the necessity of a finely optimized look ahead algorithm...

The particular interest of this research lies with creating a chess engine that is highly capable, but restricted in its look ahead depth.

We train a deep neural network to serve as a static evaluation function, which is accompanied by a relatively simple look ahead algorithm.

We show that our static evaluation function has encoded some semblance of look ahead knowledge, and is comparable to classical evaluation functions.

The strength of our chess engine is assessed by comparing its proposed moves against those proposed by Stockfish.
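The pairing of a static evaluation with a shallow look-ahead can be sketched as depth-limited negamax. The tiny game tree and leaf values below are made up; in the paper's setting the evaluation function would be the trained network and the tree would come from a chess move generator.

```python
# Depth-limited negamax: with little look ahead, playing strength hinges
# on how much knowledge the static evaluation itself encodes.
def negamax(node, depth, evaluate, children):
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)  # static evaluation at the horizon
    return max(-negamax(k, depth - 1, evaluate, children) for k in kids)

tree = {"root": ["a", "b"], "a": [], "b": []}
leaf_value = {"root": 0, "a": -3, "b": 5}  # from the side to move at each node

best = negamax("root", 1, leaf_value.__getitem__, tree.__getitem__)
print(best)  # 3
```

With depth 1 the engine sees only one ply ahead, so any deeper tactical knowledge must already be baked into `evaluate` — exactly the trade-off the abstract studies.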

2 days, 6 hours ago @ paperswithcode.com
📓 Cool Blogs
ODS.ai Habr
latest post 2 weeks, 1 day ago
The "We read papers for you" column. May 2020. Part 2

Hi, Habr! We continue publishing reviews of research papers written by members of the Open Data Science community in the #article_essense channel. Want to get them before everyone else — join the community!

Today's papers: ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks (China, 2020)

TAPAS: Weakly Supervised Table Parsing via Pre-training (Google, 2020)

DeepFaceLab: A simple, flexible and extensible faceswapping framework (2020)

End-to-End Object Detection with Transformers (Facebook AI, 2020)

Language Models are Few-Shot Learners (OpenAI, 2020)

TabNet: Attentive Interpretable Tabular Learning (Google Cloud AI, 2020) Read more →

2 weeks, 1 day ago @ habr.com
The "We read papers for you" column. May 2020. Part 1

Hi, Habr! We continue publishing reviews of research papers written by members of the Open Data Science community in the #article_essense channel. Want to get them before everyone else — join the community!

Today's papers: Efficient Document Re-Ranking for Transformers by Precomputing Term Representations; EARL: Speedup Transformer-based Rankers with Pre-computed Representation (2020)

MakeItTalk: Speaker-Aware Talking Head Animation (Adobe, University of Massachusetts Amherst, Huya, 2020)

Jukebox: A Generative Model for Music (OpenAI, 2020)

Recipes for building an open-domain chatbot (Facebook AI Research, 2020)

One-Shot Object Detection without Fine-Tuning (HKUST, Hong Kong, Tencent, 2020)

f-BRS: Rethinki…

3 weeks, 4 days ago @ habr.com
The "We read papers for you" column. April 2020. Part 2

Hi, Habr! We continue publishing reviews of research papers written by members of the Open Data Science community in the #article_essense channel. Want to get them before everyone else — join the community!

Today's papers: Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (Georgia Institute of Technology, Atlanta, USA, 2016)

X3D: Expanding Architectures for Efficient Video Recognition (Facebook AI Research, 2020)

Adaptive Attention Span in Transformers (Facebook AI Research, 2019)

ResNeSt: Split-Attention Networks (Amazon, 2020)

Weight Standardization (Johns Hopkins University, 2019)

Supervised Contrastive Learning (Google Research, MIT, 2020)

Improved Training Speed, Accurac…

1 month, 1 week ago @ habr.com
The "We read papers for you" column. April 2020. Part 1

Hi, Habr! We continue publishing reviews of research papers written by members of the Open Data Science community in the #article_essense channel. Want to get them before everyone else — join the community!

Today's papers: TResNet: High Performance GPU-Dedicated Architecture (DAMO Academy, Alibaba Group, 2020)

Controllable Person Image Synthesis with Attribute-Decomposed GAN (China, 2020)

Learning to See Through Obstructions (Taiwan, USA, 2020)

Tracking Objects as Points (UT Austin, Intel Labs, 2020)

CookGAN: Meal Image Synthesis from Ingredients (USA, UK, 2020)

Designing Network Design Spaces (FAIR, 2020)

Gradient Centralization: A New Optimization Technique for Deep Neural Networks (Hong Kong, Alibaba, 2…

1 month, 2 weeks ago @ habr.com
We can't let the healers burn out — protect them now

TL;DR: who gets hurt most by reshuffles — we measure it with graph convolutions.

Code: RolX and a vanilla three-layer GCN on motifs. I first ran into workplace burnout early in my career — and I have been keenly interested in the topic ever since. Picture the setting. A large SAP implementation project. High stakes. Ambitious deadlines. Everyone handled the pressure in their own way. One person snapped and withdrew from his duties, another became more toxic; at one point I lost my sense of humor myself. Not for long. Change management (a discipline aimed at reducing tension during information-system rollouts) owes a great deal to medicine. First, the phenomenon of emotional burnout itself was first…

2 months, 1 week ago @ habr.com
The "We read papers for you" column. March 2020. Part 2

Hi, Habr! We continue publishing reviews of research papers written by members of the Open Data Science community in the #article_essense channel. Want to get them before everyone else — join the community! The first part of the March collection of reviews was published earlier.

Today's papers: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (UC Berkeley, Google Research, UC San Diego, 2020)

Scene Text Recognition via Transformer (China, 2020)

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization (Imperial College London, Google Research, 2019)

Lagrangian Neural Networks (Princeton, Oregon, Google, Flatiron, 2020)

Deformable Style Transfer (Chicago, USA, 2020)

Rethinking…

2 months, 3 weeks ago @ habr.com
The "We read papers for you" column. March 2020. Part 1

Hi, Habr! We continue publishing reviews of research papers written by members of the Open Data Science community in the #article_essense channel. Want to get them before everyone else — join the community!

Today's papers: Fast Differentiable Sorting and Ranking (Google Brain, 2020)

MaxUp: A Simple Way to Improve Generalization of Neural Network Training (UT Austin, 2020)

Deep Nearest Neighbor Anomaly Detection (Jerusalem, Israel, 2020)

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch (Google, 2020)

SpERT: Span-based Joint Entity and Relation Extraction with Transformer Pre-training (RheinMain University, Germany, 2019)

High-Resolution Daytime Translation Without Domain Labels (Samsung AI Cen…

3 months ago @ habr.com
Machine learning in R with the mlr3 package

Source: https://mlr3book.mlr-org.com/ Hi, Habr! In this post we look at what is currently the most carefully thought-out approach to machine learning in R: the mlr3 package and the ecosystem around it. The approach is built on "proper" OOP with R6 classes and on representing all operations on data and models as a computation graph. This makes it possible to build orderly and flexible machine-learning pipelines, although at first it may seem complicated and confusing. Below we will try to bring some clarity and motivate you to use mlr3 in your projects. Contents: A bit of history and a comparison with competing solutions

Technical details: R6 classes and the pack…

3 months ago @ habr.com
Spreading a spherical horse in a vacuum across Russia

Hello from ODS. We took up tutu.ru's idea of working with their dataset of passenger traffic across Russia. And while Milfgard's post has a huge table of conclusions and popular science, we want to talk about what is under the hood.

What, yet another post about COVID-19? Yes, but no. We were interested in it specifically from the standpoint of mathematical methods and of working with an interesting dataset. Before you see the pretty pictures and plots below the cut, I have to say a few things: any modelling is a very complex process, with an incredible number of IFs and SUPPOSEs inside. We will tell you about them.

those who worked on this article are not epidemiologists or virologists. We are just a group of graph-theory enthusiasts practising m…

3 months, 1 week ago @ habr.com
"Reading papers for you" digest. January–February 2020

Hi, Habr! We continue publishing reviews of research papers written by members of the Open Data Science community in the #article_essense channel. Want to get them before anyone else? Join the community!

Reviews of 11 papers on Computer Vision, Natural Language Processing, Reinforcement Learning and other topics. Read more →

3 months, 3 weeks ago @ habr.com
Tuning the loss function for a neural network on seismic survey data

In the previous article we described an experiment to determine the minimum number of manually labelled slices needed to train a neural network on seismic survey data. Today we continue this topic by choosing the most suitable loss function. We consider two basic classes of functions, binary cross-entropy and intersection over union, in six variants with parameter tuning, as well as combinations of functions from the two classes. We also look at regularising the loss function. Spoiler: the quality of the network's predictions improved substantially. Read more →
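As an illustration of the two loss classes mentioned above, here is a minimal NumPy sketch (my own toy code, not the article's) of binary cross-entropy, a differentiable "soft" IoU loss, and a weighted combination of the two; the function names and the blending weight `alpha` are hypothetical:

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    # binary cross-entropy over per-pixel probabilities
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def soft_iou_loss(pred, target, eps=1e-7):
    # 1 - soft Jaccard index; differentiable surrogate for IoU
    inter = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - inter
    return 1.0 - (inter + eps) / (union + eps)

def combined_loss(pred, target, alpha=0.5):
    # weighted blend of the two loss classes
    return alpha * bce_loss(pred, target) + (1 - alpha) * soft_iou_loss(pred, target)

pred = np.array([0.9, 0.1, 0.8])
target = np.array([1.0, 0.0, 1.0])
loss = combined_loss(pred, target)
```

The article tunes variants and combinations of exactly these two families; the sketch only shows the structural shape of such a combination.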

4 months, 3 weeks ago @ habr.com
An open course "Deep Learning in NLP" from the creators of DeepPavlov, based on cs224n

Hi everyone!

If you have a question about the course, see the Q&A section below.

Introduction

My name is Alexey Klokov, and I want to tell you about the launch of a great course on Natural Language Processing, run once again by the Phystech team behind DeepPavlov, an open library for conversational artificial intelligence developed in the Neural Networks and Deep Learning Lab at MIPT. My thanks to them and to Moryshka for permission to cover this topic in our ODS blog on Habr. So, let's go! Read more →

5 months ago @ habr.com
"Reading papers for you" digest. October–December 2019

Hi, Habr! We continue publishing reviews of research papers written by members of the Open Data Science community in the #article_essense channel. Want to get them before anyone else? Join the community!

Today's papers: Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring (Facebook, 2019)

Implicit Discriminator in Variational Autoencoder (Indian Institute of Technology Ropar, 2019)

Self-training with Noisy Student improves ImageNet classification (Google Research, Carnegie Mellon University, 2019)

Momentum Contrast for Unsupervised Visual Representation Learning (Facebook, 2019)

Benchmarking Neural Network Robustness to Common Corruptions and …

5 months, 1 week ago @ habr.com
SVM explained from scratch and implemented in Python. A detailed look at the support vector machine

Hi to everyone who has chosen the path of the ML samurai!

Introduction:

In this article we look at the support vector machine (SVM) for classification problems. We present the main idea of the algorithm, derive the update of its weights, and walk through a simple from-scratch implementation. Using an example dataset, we demonstrate the algorithm on linearly separable and non-separable data, with visualisations of training and prediction. We also discuss the strengths and weaknesses of the algorithm and its modifications. Figure 1. A photo of an iris flower from open sources. Read more →
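To give a flavour of such a from-scratch implementation, here is a hedged sketch (not the article's code) of a primal linear SVM trained by stochastic subgradient descent on the regularised hinge loss; the hyperparameters `lam`, `lr` and `epochs` are illustrative:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    # Minimise (lam/2)*||w||^2 + mean(max(0, 1 - y*(w.x + b)))
    # by per-sample subgradient steps; labels must be in {-1, +1}.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:        # margin violated: hinge term active
                w -= lr * (lam * w - yi * xi)
                b += lr * yi
            else:                            # only the regulariser acts
                w -= lr * lam * w
    return w, b

# Toy linearly separable data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
preds = np.sign(X @ w + b)
```

This is the simplest primal formulation; kernelised and soft-margin variants discussed in the article build on the same hinge-loss objective.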

5 months, 2 weeks ago @ habr.com
TensorRT 6.x.x.x: high-performance inference for deep learning models (Object Detection and Segmentation)

It only hurts the first time! Hi everyone! Dear friends, in this article I want to share my experience of using TensorRT and RetinaNet based on the repository github.com/aidonchuk/retinanet-examples (a fork of NVIDIA's official repo that lets you start using optimised models in production as quickly as possible). Scrolling through the ods.ai community channels, I keep running into questions about using TensorRT, and since the questions mostly repeat, I decided to write as complete a guide as I can to fast inference based on TensorRT, RetinaNet, Unet and Docker. Read more →

5 months, 3 weeks ago @ habr.com
inFERENCe
latest post 7 months, 4 weeks ago
Meta-Learning Millions of Hyper-parameters using the Implicit Function Theorem

November 14, 2019. Meta-Learning Millions of Hyper-parameters using the Implicit Function Theorem. Last night on the train I read this nice paper by David Duvenaud and colleagues.

Implicit Function Theorem. Many, though not all, meta-learning or hyperparameter optimization problems can be stated as nested optimization problems.

Using a finite truncation of the Neumann series, one can approximate the inverse Hessian in the following way: $$\left[\frac{\partial^2 \mathcal{L}_T}{\partial \theta \partial \theta}\right]^{-1} \approx \sum_{i=1}^j \left(I - \frac{\partial^2 \mathcal{L}_T}{\partial \theta \partial \theta}\right)^i.$$

Most crucially, methods based on implicit gradients assume that your le…
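The truncated Neumann series above can be sketched numerically as an inverse-Hessian-vector product. This toy NumPy version is my own illustration, not the paper's method: it uses an explicit matrix `H` and a hypothetical damping factor `alpha` to make the series converge, whereas the paper works with Hessian-vector products and never materialises the Hessian.

```python
import numpy as np

def neumann_inv_hvp(H, v, num_terms=200, alpha=0.1):
    # Approximate H^{-1} v via the damped Neumann series
    #   H^{-1} v ~= alpha * sum_{i=0}^{j} (I - alpha*H)^i v,
    # valid when the eigenvalues of alpha*H lie in (0, 2).
    p = v.copy()
    term = v.copy()
    for _ in range(num_terms):
        term = term - alpha * (H @ term)   # next power of (I - alpha*H) applied to v
        p = p + term
    return alpha * p

H = np.array([[2.0, 0.0], [0.0, 4.0]])
v = np.array([1.0, 1.0])
approx = neumann_inv_hvp(H, v)
# exact H^{-1} v is [0.5, 0.25]
```

In the hyperparameter-optimization setting each `H @ term` would be replaced by a Hessian-vector product obtained by automatic differentiation.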

7 months, 4 weeks ago @ inference.vc
The secular Bayesian: Using belief distributions without really believing

October 31, 2019. The secular Bayesian: Using belief distributions without really believing. The religious Bayesian. My parents didn't raise me in a religious tradition.

The secular BayesianOver the years I came to terms with my Bayesian heritage, and I now live my life as a secular Bayesian.

This choice is the real reason why the resulting update rule will end up very Bayes-rule like, as we will see later.

RationalityNow that we have an update rule which satisfies our desiderata, can we say if it's actually a good or useful update rule?

So, not only is this update rule the only update rule that satisfies the desired properties, it is also optimal under this particular definition of optimality/ra…

8 months, 1 week ago @ inference.vc
Exponentially Growing Learning Rate? Implications of Scale Invariance induced by Batch Normalization

October 25, 2019. Exponentially Growing Learning Rate?

Implications of Scale Invariance induced by Batch Normalization. Yesterday I read this intriguing paper about the mind-boggling fact that it is possible to use an exponentially growing learning rate schedule when training neural networks with batch normalization: Zhiyuan Li and Sanjeev Arora (2019) An Exponential Learning Rate Schedule for Deep Learning. The paper provides both theoretical insights and an empirical demonstration of this remarkable property.

So imagine doing vanilla gradient descent (no momentum, no weight decay, a fixed learning rate) on such a loss surface.

However, the weight vector won't completely blow up to infinity, because th…

8 months, 2 weeks ago @ inference.vc
On Marginal Likelihood and Cross-Validation

The marginal likelihood and cross-validation. To discuss the connection between marginal likelihoods and (Bayesian) cross-validation, let's first define what is what.

For each of these permutations we can decompose the marginal likelihood as a product of conditionals, or equivalently we can write the log marginal likelihood as a sum of logs of the same conditionals.

So, the sum of all the terms in this matrix gives the marginal likelihood times 6 (as there are 6 columns).

This observation gives a really good motivation for using the marginal likelihood, and also gives a new perspective on how it works.

Calculating the marginal likelihood amounts to evaluating the average predictive score on al…
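The decomposition described above (the log marginal likelihood as a sum of one-step-ahead log predictive probabilities) can be checked numerically in a conjugate Beta-Bernoulli model; this toy example is mine, not from the post:

```python
from math import lgamma, log

def log_marginal_closed_form(x, a=1.0, b=1.0):
    # Beta-Bernoulli evidence in closed form: B(a+k, b+n-k) / B(a, b)
    k, n = sum(x), len(x)
    log_beta = lambda p, q: lgamma(p) + lgamma(q) - lgamma(p + q)
    return log_beta(a + k, b + n - k) - log_beta(a, b)

def log_marginal_sequential(x, a=1.0, b=1.0):
    # The same quantity as a sum of one-step-ahead log predictives
    total = 0.0
    for xi in x:
        p_one = a / (a + b)              # posterior predictive P(x_i = 1)
        total += log(p_one if xi == 1 else 1.0 - p_one)
        a, b = a + xi, b + (1 - xi)      # conjugate posterior update
    return total

x = [1, 0, 1, 1, 0]
# both functions return log(1/60) for this data under a Beta(1,1) prior
```

The two computations agree exactly, for any ordering of the data, which is the permutation argument the post averages over.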

8 months, 3 weeks ago @ inference.vc
Notes on iMAML: Meta-Learning with Implicit Gradients

September 19, 2019. Notes on iMAML: Meta-Learning with Implicit Gradients. This week I read this cool new paper on meta-learning: it takes a slightly different approach compared to its predecessors, based on some observations about differentiating the optima of regularized optimization.

Let me illustrate what that dependence looks like: in the figure above, let's say that we would like to minimise an objective function $f(\theta)$.

Rather than deterministically finding a particular local minimum, SGD samples different minima: when run with different random seeds it will find different minima.

The meta-learning objective now depends on $\theta_0$ in two different ways:as we change the anchor $\theta_0$,…

9 months, 3 weeks ago @ inference.vc
Invariant Risk Minimization: An Information Theoretic View

July 19, 2019. Invariant Risk Minimization: An Information Theoretic View. I finally got around to reading this new paper by Arjovsky et al.

Here, I will describe the main idea and then provide an information theoretic view on the same topic.

$Y \perp\mkern-13mu\perp E\vert X_1, W$: The observable $X_1$ and latent $W$ shield the label $Y$ from the influence of the environment.

Say we have a parametric family of functions $f(y\vert \phi(x); \theta)$ for predicting $y$ from $\phi(x)$.

The conditional information can be approximated as follows: \begin{align}I[Y, E \vert \phi(x)] &\approx \min_\theta \mathbb{E}_{x,y} \ell (f(y\vert \phi(x); \theta)) - \mathbb{E}_e \min_{\theta_e} \mathbb{E}_{x,y\vert e} \el…

11 months, 3 weeks ago @ inference.vc
ICML Highlight: Contrastive Divergence for Combining Variational Inference and MCMC

Ruiz and Titsias (2019) A Contrastive Divergence for Combining Variational Inference and MCMC. Background: principle of minimal improvement. First, some background on why I found this paper particularly interesting.

Using such an improvement operator, you can define an objective function for policies by measuring the extent to which the operator changes a policy.

In the case of AlphaGo Zero, the improvement operator is Monte Carlo Tree Search (MCTS).

The paper I'm talking about uses a very similar argument to come up with a contrastive divergence for variational inference, where the improvement operator is an MCMC step.

Combining VI with MCMC. The two dominant ways of performing inference in latent var…

1 year ago @ inference.vc
Notes on the Limitations of the Empirical Fisher Approximation

June 6, 2019. Notes on the Limitations of the Empirical Fisher Approximation. This post is a short note on an excellent recent paper on empirical Fisher information matrices: Kunstner, Balles and Hennig (2019) Limitations of the Empirical Fisher Approximation. I was debating with myself whether I should write a post about this, because it's a superbly written paper that you should probably read in full.

There isn't a whole lot of novelty in the paper, but it is a great discussion paper that provides a concise overview of the Fisher information, the empirical Fisher matrix and their connections to generalized Gauss-Newton methods.

The third shows the gradients corrected by the empirical Fisher instea…

1 year, 1 month ago @ inference.vc
Perceptual Straightening of Natural Videos

May 30, 2019. Perceptual Straightening of Natural Videos. Video is an interesting domain for unsupervised, or self-supervised, representation learning.

So, for example, straight trajectories have an almost $0$ probability under a high-dimensional Brownian motion or Ornstein–Uhlenbeck (OU) process.

Results and SummaryThe main results of the paper - as expected - is that natural video sequences indeed appear to be mapped to straight trajectories in representation space.

For one, the paper assumes a Gaussian observation noise in representation space, and I wonder how robust the analysis would be to assuming heavy-tailed noise.

Similarly, our very definition of straightness and angles relies on the…

1 year, 1 month ago @ inference.vc
DeepSets: Modeling Permutation Invariance

February 7, 2019. DeepSets: Modeling Permutation Invariance. Guest post by Fabian Fuchs (https://twitter.com/FabianFuchsML), Ed Wagstaff (https://github.com/edwag), and Martin Engelcke (https://twitter.com/martinengelcke). One of my favourite recent innovations in neural network architectures is Deep Sets.

In such a situation, the invariance property we can exploit is permutation invariance.

To give a short, intuitive explanation for permutation invariance, this is what a permutation invariant function with three inputs would look like: $f(a, b, c) = f(a, c, b) = f(b, a, c) = \dots$.

The Deep Sets Architecture (Sum-Decomposition)Having established that there is a need for permutat…
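The sum-decomposition idea, $f(X) = \rho(\sum_i \phi(x_i))$, can be sketched in a few lines. In this toy version of mine, `phi` and `rho` are tiny random maps (the weights are arbitrary and untrained, purely to show the structure); invariance holds by construction, because summation ignores order:

```python
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(1, 8))    # arbitrary weights, untrained
W_rho = rng.normal(size=(8, 1))

def phi(x):
    # per-element embedding into a latent space
    return np.tanh(np.array([[x]]) @ W_phi)

def rho(z):
    # readout applied to the pooled (summed) embedding
    return (z @ W_rho).item()

def f(xs):
    # f(X) = rho(sum_i phi(x_i)): permutation invariant by construction
    return rho(sum(phi(x) for x in xs))

# f([a, b, c]) == f([c, a, b]) for any ordering of the inputs
```

Any permutation-invariant pooling (sum, mean, max) in place of the sum preserves the invariance; the Deep Sets result is about which functions the sum form can represent.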

1 year, 5 months ago @ inference.vc
Causal Inference 3: Counterfactuals

You hopefully know enough about causal inference by now to know that $p(🎓\vert 🧔=0)$ is certainly not the quantity we seek.

Counterfactual queries. To finally explain counterfactuals, I have to step beyond causal graphs and introduce another concept: structural equation models.

Structural Equation Models. A causal graph encodes which variables have a direct causal effect on any given node; we call these the causal parents of the node.

$f_1$ computes $x$ from its causal parent $u$, and $f_2$ computes $a$ from its causal parents $x$ and $v$.

The structural equation model (SEM) entails the causal graph, in that you can reconstruct the causal graph by looking at the inputs of each function.

1 year, 5 months ago @ inference.vc
Causal Inference 2: Illustrating Interventions via a Toy Example

Consequently, the joint distribution of the data alone is insufficient to predict behaviour under interventions.

Finally, you can use various causal discovery techniques to try to identify the causal diagram from the data itself.

Theoretically, recovering the full causal graph from the data alone is impossible in the general case.

Summary. We have seen that modeling the joint distribution can only get you so far, and if you want to predict the effect of interventions, i.e. calculate $p(y\vert do(x))$-like quantities, you have to add a causal graph to your analysis.

1 year, 5 months ago @ inference.vc
Online Bayesian Deep Learning in Production at Tencent

These applications include active learning, reinforcement learning and online/continual learning.

So when I recently read a paper by Tencent, I was surprised to learn that an online Bayesian deep learning algorithm is apparently deployed in production to power click-through-rate prediction in their ad system.

Assumed Density Filtering. The method relies on the approximate Bayesian online-learning technique often referred to as assumed density filtering.

forward propagation: In Bayesian deep learning, we maintain a distribution $q(w)$ over neural network weights, and each value $w$ defines a conditional probability $p(y\vert x, w)$.

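The forward-propagation step can be illustrated for a single linear layer: with a fully factorised Gaussian $q(w)$, the mean and variance of the pre-activation are available in closed form. This is a sketch of the general idea only, not Tencent's implementation; the function name and shapes are hypothetical:

```python
import numpy as np

def linear_forward_moments(x, w_mean, w_var):
    # For independent Gaussian weights w_i ~ N(w_mean_i, w_var_i),
    # the pre-activation w.x has
    #   mean E[w.x]   = sum_i x_i * w_mean_i
    #   var  Var[w.x] = sum_i x_i^2 * w_var_i
    m = x @ w_mean
    v = (x ** 2) @ w_var
    return m, v

x = np.array([1.0, 2.0])
w_mean = np.array([0.5, -0.25])
w_var = np.array([0.1, 0.05])
m, v = linear_forward_moments(x, w_mean, w_var)
# m = 0.0, v = 0.1*1 + 0.05*4 = 0.3
```

Propagating these moments layer by layer (with a Gaussian assumption after each nonlinearity) is what makes the assumed-density-filtering update tractable.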

1 year, 7 months ago @ inference.vc
👻Halloween Special: Critical reviews of the worst NIPS 2018 papers.

posts on machine learning, statistics, opinions on things I'm reading in the space

1 year, 8 months ago @ inference.vc
The Blessings of Multiple Causes: Causal Inference when you Can't Measure Confounders

September 7, 2018. The Blessings of Multiple Causes: Causal Inference when you Can't Measure Confounders. Happy back-to-school time, everyone!

In this case, the size of the kidney stone is a confounder variable.

Let's look at how this differs from the non-causal association you would measure between treatment and outcome (i.e.

there may be confounders, but all confounders causally influence at least two of the cause variables.

It identifies just enough about the causal structure (the substitute confounder variable) to then be able to make causal inferences of a certain type.

1 year, 10 months ago @ inference.vc
The Spectator
latest post 4 months, 2 weeks ago
Queer Exceptionalism in Science

Read in 5 mins (800 words). Today's queer scientist is exceptional.

Role of the Queer Scientist. For queer people to hold a recognised role in scientific life requires an acknowledgement that to be queer has consequences.

Challenges Facing Queer Scientists. For the queer scientist, every encounter involves a conscious act of deliberation, risk assessment, and effort, well before any effort of research is begun.

For queer scientists, every new encounter—with a colleague, supervisor, possible letter-writer, examiner, moderator, student, interviewer, acquaintance, or future-friend—sets up a stressful coming-out scene.

To be queer in science is to ask to belong and to be safe.

4 months, 2 weeks ago @ blog.shakirm.com
Machinery of Grace

The machinery of grace is always simple.

The machines I'm thinking of are machines with intelligence, machines that learn.

Dialogues that lead to co-design and inclusion in the mission of developing intelligent machines with grace.

Firstly, to celebrate our progress in machine learning, but one that must now be balanced using a new critical practice.

If we are successful in making global AI truly global, and I believe we can be, we set ourselves on the path to realising that intelligent machinery of grace.

7 months, 3 weeks ago @ blog.shakirm.com
A New Consciousness of Inclusion in Machine Learning

On LGBT Freedoms and our Support for Machine Learning in Africa. This is an exploration of my thinking and my personal views.

The choice of these host countries has fomented concerns throughout our machine learning community: how can we as a community committed to inclusion in every form consider hosting our conferences in countries like these that are far from inclusive?

A politics of location, and an ethics of inclusion is growing healthily within our machine learning community.

But I too am an out and proud gay machine learning scientist.

My hope is that we will always continue to experiment with the ways in which we organise and support our global machine learning community.

1 year ago @ blog.shakirm.com
Racialised Lives and the Life Beyond

The Black woman is racialised, and so too is the White man, as is every person we have ever known, and so the cycle of our racialised lives lives on.

About two-and-a-half years ago, I was part of creating a new organisation called the Deep Learning Indaba, as one attempt to engage with these questions.

The grassroots are those groups within our institutions, like our LGBT resource group within DeepMind, and those outside movements, like the Deep Learning Indaba.

I see the leadership of the Deep Learning Indaba as such a collective.

But I think we show the power of political love today, in this room, with our memory, with our energy, and in the celebration of progress that has brought us her…

1 year, 1 month ago @ blog.shakirm.com
Talk: How Do We Support Under-represented Groups To Put Themselves Forward?

As you think of this question, consider the journey that is taken by the under-represented groups we might have in mind.

Journeys like mine are our struggle credentials.

This room is filled with struggle credentials.

Struggle credentials play too much of a role in our present.

It is the under-represented groups that must eventually be put forward.

1 year, 8 months ago @ blog.shakirm.com
Machine Learning Trick of the Day (8): Instrumental Thinking

The instrumental variables idea is conceptually simple: we introduce new observed variables z, called instrumental variables, into our model; figure 1 (right).

And this is the trick: instrumental variables are special subset of the data we already have, but they allow us to remove the effect of confounders.

Our problem is to learn a linear value function using features (when in state x) using parameters so that .

But this probabilistic viewpoint through instrumental variables means that we can think of alternative ways of extending this view.

Like every trick in this series, the instrumental variables give us an alternative way to think about existing problems.

1 year, 8 months ago @ blog.shakirm.com
Decolonising Artificial Intelligence

Read in 6 mins · 1297 words. The Artificial Intelligence we believe to be global is far from it.

Inevitably, a call will be made to decolonise artificial intelligence.

The call for decolonisation in artificial intelligence is yet to reach its full volume.

Kai Fu Lee, The Real Threat of Artificial Intelligence, June 2017We immediately recognise the colonial nature of this possible future.

The only AI that empowers and works for the benefit of humanity is a truly global AI.

1 year, 9 months ago @ blog.shakirm.com
The Price of Transformation

The price of transformation is ours to pay.

Transformation cannot be separated from my other pillars, for they require transformation to succeed.

The price of transformation cannot be paid in this way.

We must all confront the question: What is the price of transformation?

We need to convince ourselves that the price of transformation is something we are willing to pay, and that we should pay.

1 year, 9 months ago @ blog.shakirm.com
Machine Learning Trick of the Day (7): Density Ratio Trick

The same is true if we want to compare probability densities: either through a density difference or a density ratio.

Density ratios are ubiquitous in machine learning, and will be our focus.

Density Ratio Estimation. The central task in the above five statistical quantities is to efficiently compute the ratio .

This is where the density ratio trick or formally, density ratio estimation, enters: it tells us to construct a binary classifier that distinguishes between samples from the two distributions.

This final derivation says that the problem of density ratio estimation is equivalent to that of binary classification.
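A small self-contained illustration of the trick (my own toy example, not from the post): fit a logistic-regression classifier to distinguish samples from $p = N(1,1)$ and $q = N(0,1)$, then read off the ratio estimate as $D/(1-D)$; with equal class priors, the true ratio at $x = 0.5$ is exactly 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Samples from p (label 1) and q (label 0)
xp = rng.normal(1.0, 1.0, 2000)
xq = rng.normal(0.0, 1.0, 2000)
x = np.concatenate([xp, xq])
y = np.concatenate([np.ones(2000), np.zeros(2000)])

# Logistic regression by full-batch gradient descent; for two unit-variance
# Gaussians the Bayes-optimal classifier is exactly of this form.
w, b = 0.0, 0.0
for _ in range(2000):
    d = 1.0 / (1.0 + np.exp(-(w * x + b)))   # classifier output D(x)
    g = d - y                                 # gradient of the logistic loss
    w -= 0.1 * np.mean(g * x)
    b -= 0.1 * np.mean(g)

def ratio(x0):
    # density ratio estimate p(x0)/q(x0) = D(x0) / (1 - D(x0))
    d = 1.0 / (1.0 + np.exp(-(w * x0 + b)))
    return d / (1.0 - d)
```

The learned parameters approach the analytic optimum $w = 1$, $b = -0.5$, so `ratio(0.5)` lands near the true value 1; unequal sample sizes would require a prior-correction factor.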

2 years, 5 months ago @ blog.shakirm.com
Cognitive Machine Learning (2): Uncertain Thoughts

These types of thinking are secondary-levels of thinking: a thinking about thinking.

Like the primary colours, our primary thoughts are those that are the basis of our cognition.

Secondary colours use the primary colours as their basis, and similarly, secondary thoughts are thoughts about our primary thoughts.

Our memories, decisions and attitudes are amongst our primary thoughts, and for each we have secondary thoughts—metacognitive confidence assessments—that guide our behaviours.

Again, we can make such assessments in two ways: about the decisions we are still to make, a prospective decision confidence; and decisions we have already made, a retrospective decision confidence.

3 years, 4 months ago @ blog.shakirm.com
大トロ
latest post 3 months, 3 weeks ago
Neuroevolution of Self-Interpretable Agents

Agents with a self-attention “bottleneck” can not only solve these tasks from pixel inputs with only 4000 parameters, they are also better at generalization.

Redirecting to attentionagent.github.io, where the article resides.

3 months, 3 weeks ago @ blog.otoro.net
Learning to Predict Without Looking Ahead

Rather than hardcoding forward prediction, we try to get agents to learn that they need to predict the future.

Redirecting to learningtopredict.github.io, where the article resides.

8 months, 2 weeks ago @ blog.otoro.net
Weight Agnostic Neural Networks

We search for neural network architectures that can already perform various tasks even when they use random weight values.

Redirecting to weightagnostic.github.io, where the article resides.

1 year, 1 month ago @ blog.otoro.net
Learning Latent Dynamics for Planning from Pixels

PlaNet learns a world model from image inputs only and successfully leverages it for planning in latent space.

Redirecting to planetrl.github.io, where the article resides.

1 year, 4 months ago @ blog.otoro.net
Reinforcement Learning for Improving Agent Design

Little dude rewarded for having little legs.

Redirecting to designrl.github.io, where the article resides.

1 year, 9 months ago @ blog.otoro.net
World Models Experiments

In this article I will give step-by-step instructions for reproducing the experiments in the World Models article (pdf).

For general discussion about the World Models article, there are already some good discussion threads here in the GitHub issues page of the interactive article.

World Models (pdf)
A Visual Guide to Evolution Strategies
Evolving Stable Strategies
Below is optional:
Mixture Density Networks
Mixture Density Networks with TensorFlow
Read tutorials on Variational Autoencoders if you are not familiar with them.

I use OS X for inference, but trained models using Google Cloud VMs.

You should update your git repo with these new models using git add doomrnn/tf_models/*.js…

2 years, 1 month ago @ blog.otoro.net
World Models

Can agents learn inside of their own dreams?

Redirecting to worldmodels.github.io, where the article resides.

2 years, 3 months ago @ blog.otoro.net
Evolving Stable Strategies

for i in range(solver.popsize):
    # init the agent with a solution
    agent = Agent(solutions[i])
    # rollout env with this agent
    fitlist[i] = rollout(agent, env)
# give scores results back to ES solver
solver.tell(fitlist)

One way to convert into a stochastic policy is to make random.

Robot arm grasping task using a stochastic policy.

The Minitaur model in pybullet is designed to mimic the real physical Minitaur.

After making the ball smaller, CMA-ES was able to find a stochastic policy that can walk and balance the ball at the same time.

2 years, 8 months ago @ blog.otoro.net
A Visual Guide to Evolution Strategies

In this post I explain how evolution strategies (ES) work with the aid of a few visual examples.

OpenAI published a paper called Evolution Strategies as a Scalable Alternative to Reinforcement Learning where they showed that evolution strategies, while being less data efficient than RL, offer many benefits.

Schaffer-2D FunctionRastrigin-2D FunctionAlthough there are many definitions of evolution strategies, we can define an evolution strategy as an algorithm that provides the user a set of candidate solutions to evaluate a problem.

Let’s visualise the scheme one more time, on the entire search process on both problems: Because CMA-ES can adapt both its mean and covariance matrix using inform…
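Following the definition above (an algorithm that proposes candidate solutions, has them evaluated, and uses the scores to propose better candidates), here is a bare-bones ES sketch in the same ask/evaluate/tell spirit; the truncation-selection rule, the annealing schedule and all hyperparameters are my own simplifications, not CMA-ES:

```python
import numpy as np

rng = np.random.default_rng(0)

def es_minimise(f, x0, sigma=0.5, popsize=32, iters=300):
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        # "ask": sample a population around the current mean
        pop = x + sigma * rng.normal(size=(popsize, x.size))
        # evaluate every candidate
        fit = np.array([f(p) for p in pop])
        # select the best quarter and move the mean to their average
        elite = pop[np.argsort(fit)[: popsize // 4]]
        x = elite.mean(axis=0)
        sigma *= 0.99          # slowly shrink the search radius
    return x

# Minimise a shifted quadratic; the optimum is at (1, -2).
best = es_minimise(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [0.0, 0.0])
```

CMA-ES improves on this by also adapting a full covariance matrix from the elite samples, which is what the visualisations in the post show.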

2 years, 8 months ago @ blog.otoro.net
Teaching Machines to Draw

In this work, we investigate an alternative to traditional pixel image modelling approaches, and propose a generative model for vector images.

For example, we can subtract the latent vector of an encoded pig head from the latent vector of a full pig, to arrive at a vector that represents the concept of a body.

As we saw earlier, a model trained to draw pigs can be made to draw pig-like trucks if given an input sketch of a truck.

Exploring the latent space between different objects can potentially enable creative designers to find interesting intersections and relationships between different drawings: Exploring the latent space between cats and buses, elephants and pigs, and various owls.
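This kind of latent-space exploration reduces to simple vector arithmetic on the encoded sketches. A minimal sketch follows; the 4-d latent vectors and the pig/pig-head values are made up for illustration, since a real model would produce them with its encoder:

```python
import numpy as np

def interpolate_latents(z_a, z_b, n_steps=8):
    # Linear interpolation between two latent codes, e.g. a cat sketch and a
    # bus sketch; decoding each step would render the in-between drawings.
    alphas = np.linspace(0.0, 1.0, n_steps)
    return [(1 - a) * z_a + a * z_b for a in alphas]

# Latent arithmetic: subtracting the "pig head" code from the "full pig" code
# leaves a vector loosely representing the concept of a body (toy values).
z_full_pig = np.array([0.9, 0.1, 0.5, 0.3])
z_pig_head = np.array([0.4, 0.1, 0.2, 0.3])
z_body = z_full_pig - z_pig_head

steps = interpolate_latents(z_full_pig, z_pig_head, n_steps=5)
print(z_body, len(steps))
```

Decoding each interpolated latent with the trained decoder would produce the morphing sketches shown in the post.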

In …

3 years, 1 month ago @ blog.otoro.net
Recurrent Neural Network Tutorial for Artists

In particular, the experiments in the post help visualise the internals of a recurrent neural network trained to generate handwriting.

Recurrent Neural Network for Handwriting. We have pre-trained a recurrent neural network model to perform the handwriting task described in the previous section.

var x, y;
var dx, dy;
var pen;
var prev_pen;
var rnn_state;
var pdf;
var temperature = 0.65;
var screen_width = window.

get_pdf(rnn_state); [dx, dy, pen] = Model.

I haven’t personally used keras.js, and I found it fun to just write the handwriting model from scratch in Javascript.

3 years, 6 months ago @ blog.otoro.net
Hypernetworks

In our paper, we use HyperNetworks to explore a middle ground - to enforce a relaxed version of weight-tying.

The more exciting work is in the second part of my paper where we apply Hypernetworks to Recurrent Networks.

Dynamic Hypernetworks. As mentioned in the Introduction, we also tried to apply Hypernetworks to Recurrent Networks, and I feel this is the main contribution of the research.

Our approach is to put a small LSTM cell (called the HyperLSTM cell) inside a large LSTM cell (the main LSTM).
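The underlying weight-generation idea can be sketched in a few lines: a small network emits the weight matrix of a larger layer from a low-dimensional embedding. This is a simplified static hypernetwork for a feedforward layer, not the paper's HyperLSTM; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny "hypernetwork": a small linear map that generates the weight matrix
# of a larger main layer from a low-dimensional embedding vector z.
embed_dim, in_dim, out_dim = 4, 16, 8
z = rng.standard_normal(embed_dim)                          # layer embedding
W_hyper = rng.standard_normal((in_dim * out_dim, embed_dim)) * 0.1

def generate_weights(z):
    # hypernetwork output, reshaped into the main layer's weight matrix
    return (W_hyper @ z).reshape(in_dim, out_dim)

def main_layer(x, z):
    # the main layer's weights are not free parameters; they come from z
    return np.tanh(x @ generate_weights(z))

x = rng.standard_normal((2, in_dim))
y = main_layer(x, z)
print(y.shape)
```

In the dynamic version described above, the embedding z would itself be produced at every timestep by the small HyperLSTM cell, so the main cell's effective weights change as the sequence is processed.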

For our implementation of Dynamic Hypernetworks, we made it so that we can just plug our HyperLSTM cell into any TensorFlow code written to use tf.nn.rnn_cell objects, since the HyperLSTM inherite…

3 years, 9 months ago @ blog.otoro.net
Generating Large Images from Latent Vectors - Part Two

Random gaussian latent vectors were generated from numpy.random and fed into the generative network to obtain these images.

Our generator can produce large random images of digits using random gaussian vectors as input.

Unlike the previous model though, the generated images do not necessarily have to look exactly like the set of training images.

All the generator has to do is to create a set of new images that share the same classification labels of the set of training images.

Description of the Generator Network. The generator used in the previous model uses 4 large layers of 128 nodes that are fully connected.
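A generator of this shape is easy to sketch. The four hidden layers of 128 nodes follow the description above, while the latent size, the 26x26 output resolution, the initialization, and the activations are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp_generator(latent_dim=32, hidden=128, n_hidden=4, out_pixels=26 * 26):
    """Untrained fully connected generator: four hidden layers of 128 nodes
    mapping a random gaussian latent vector to an image."""
    sizes = [latent_dim] + [hidden] * n_hidden + [out_pixels]
    weights = [rng.standard_normal((a, b)) / np.sqrt(a)
               for a, b in zip(sizes[:-1], sizes[1:])]

    def generator(z):
        h = z
        for W in weights[:-1]:
            h = np.tanh(h @ W)                       # hidden activations
        return 1.0 / (1.0 + np.exp(-(h @ weights[-1])))  # pixels in (0, 1)

    return generator

g = make_mlp_generator()
z = rng.standard_normal(32)       # random gaussian latent vector, as in the post
img = g(z).reshape(26, 26)
print(img.shape)
```

Feeding different gaussian vectors z produces different images, which is exactly the sampling procedure the excerpt describes.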

4 years, 1 month ago @ blog.otoro.net
Neural Network Evolution Playground with Backprop NEAT

This demo will attempt to use a genetic algorithm to produce efficient, but atypical neural network structures to classify datasets borrowed from TensorFlow Playground.

People started experimenting with different neural network configurations, such as how many neural network layers are actually needed to fit a certain data set, or what initial features should be used for another data set.

In addition to weight-search, Deep Learning research has also produced many powerful neural network architectures that are important building blocks.

Evolving Neural Network Topology. Neuroevolution of Augmenting Topologies (NEAT) is a method that can evolve new types of neural networks based on genetic algo…

4 years, 2 months ago @ blog.otoro.net
Interactive Abstract Pattern Generation Javascript Demo

Interactive Javascript Demo for Abstract Pattern Generation.

Although some code was previously available in Javascript, it wasn’t general enough to use as a tool for a digital artist.

So I took the Javascript code previously written and spent an hour or two fine-tuning it into a simple web app.

In addition, the user is able to specify the size and depth of the generator neural network.

The depth and size of the network, and also the image resolution of the output can all be customised in the web app.

4 years, 2 months ago @ blog.otoro.net
The Unofficial Google Data Science Blog
last post 7 months, 1 week ago
Humans-in-the-loop forecasting: integrating data science and business planning

Figure 1: A Google data center. As an example, consider Google’s forecasting and planning for data center capacity.

In particular, the data scientist must take responsibility for stakeholders approving the “best” forecast from all available information sources.

It required investments from our data science team to re-think our statistical forecasting approach to make it easier to compare against customer forecasts.

It also owns Google’s internal time series forecasting platform described in an earlier blog post.

But looking through the blogosphere, some go further and posit that “platformization” of forecasting and “forecasting as a service” can turn anyone into a data scientist at the push …

7 months, 1 week ago @ unofficialgoogledatascience.com
Estimating the prevalence of rare events — theory and practice

$$S(v_1) = S(v_2) \implies \frac{q(v_1)}{p(v_1)} = \frac{q(v_2)}{p(v_2)}$$ The ratio between the importance distribution and target distribution is thus a function of $S(v)$: $$\frac{q(v)}{p(v)} = \frac{\tilde{q}(S(v))}{\tilde{p}(S(v))}$$ where $\tilde{p}$ and $\tilde{q}$ are the PMFs of $S(v)$ under the target and importance distributions, respectively.

In our case when the events are rare and the probability of high conditional prevalence rate is small under the target distribution, the difference between the methods is minor.

We also discuss how to choose $q$ with respect to the conditional prevalence rate $g(S(v))=\mathbb{E}_p\left[f(V)|S(V)=S(v)\right]$.
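The estimator behind this machinery is ordinary importance sampling: draw items from $q$, then reweight each by $p/q$. A toy sketch with a made-up three-level score $S(v)$ (all numbers here are illustrative, not the post's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: items have a score s in {0, 1, 2}; higher scores are rarer under
# the target distribution p but have higher conditional prevalence g(s).
# We oversample the rare strata with importance distribution q and reweight.
p = np.array([0.90, 0.09, 0.01])   # target PMF over scores
q = np.array([0.40, 0.30, 0.30])   # importance PMF (oversamples rare strata)
g = np.array([0.001, 0.05, 0.50])  # conditional prevalence E[f | s]

n = 200_000
s = rng.choice(3, size=n, p=q)     # sample scores from q
f = rng.random(n) < g[s]           # label each sampled item
w = p[s] / q[s]                    # importance weights
estimate = np.mean(w * f)          # unbiased estimate of E_p[f]

truth = float(np.sum(p * g))       # true prevalence under the target p
print(estimate, truth)
```

Oversampling the rare, high-prevalence strata sharply reduces the variance of the estimate relative to sampling directly from $p$, which is the whole point when events are rare.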

Conclusion. In this post, we…

10 months, 2 weeks ago @ unofficialgoogledatascience.com
Misadventures in experiments for growth

In summary, classic experimentation is applicable to fledgling products but in a much more limited way than to established products.

For our music example, we imagined that EDM users don't approximate the target population for some experiments.

The behavior of this single user appears in our data as a large number of impressions with conversions.

A word on growth hacking. Of particular concern in growth hacking is the focus on influencers for pushing growth.


1 year, 2 months ago @ unofficialgoogledatascience.com
Crawling the internet: data science within a large engineering system

When queries arrive, the search system matches the inferred meaning of the query to web pages on the basis of these snapshots.

This measure of web page value is on a meaningful linear scale, such that our freshness metric (a weighted average) has an intuitive interpretation.

A global constraint of how much compute and network resources Google itself is willing to dedicate to crawling web pages.

In some regimes (and in practice for Google Search), a greedy algorithm would devote more recrawl resources towards high value pages, as lower value pages would commonly starve.

We can use this function to sort the web pages, and then determine which web pages should be scheduled for immediate crawl.

1 year, 11 months ago @ unofficialgoogledatascience.com
Compliance bias in mobile experiments

The differences between the distribution of users experiencing the treatment and the population are likely to be a key factor here.

Compliance Bias A central issue in this application is that users assigned treatment sometimes do not actually experience the treatment at $T_{\mathrm{measure}}$, and furthermore this set of users is not random.

Here, we can draw a direct analogy to Compliance Bias, which is primarily described in literature on the analysis of medical studies.

Propensity scoring within the treatment. Fig 5: Estimated probability of experiencing the treatment in the treatment group.

Here, we ignore any control group, and analyze the treatment group as a self-contained observationa…

2 years, 3 months ago @ unofficialgoogledatascience.com
Designing A/B tests in a collaboration network

Our model considers two aspects of network effects. Homophily, or similarity within the network: users collaborating in a network tend to behave similarly.

The network topology itself is the actual collaboration network we observe for GCP. When users are connected in a network, their treatment assignments can generate network effects through their interactions.

In other words, for the three methods of randomization (uniform random component, uniform random project, stratified random component) we simulate confidence intervals for A/A tests, i.e.

Conclusion Designing randomized experiments on a network of users is more ch…

2 years, 5 months ago @ unofficialgoogledatascience.com
Unintentional data

The Future of Data Analysis. Avalanche of questions: the role of the data scientist amid unintentional data. Is it relevant to our goals?

In the world of big, unintentional data there are many discoveries to be had which have no bearing on the organization’s goals.

Democratization of analysis: quantity has a quality all its own. Just as dealing with unintentional data shapes the role of the data scientist in their organization, it also shapes the day-to-day practice of data analysis.

Understanding the goals of the organization as well as guiding principles for extracting value from data are both critical for success in this environment. Thankfully, not only have modern data analysis tools made da…

2 years, 9 months ago @ unofficialgoogledatascience.com
Fitting Bayesian structural time series with the bsts R package

When fitting bsts models that contain a regression component, extra arguments captured by ... are passed to the SpikeSlabPrior function from the BoomSpikeSlab package.

# Fit a bsts model with expected model size 1, the default.

model2 <- bsts(iclaimsNSA ~ ., state.specification = ss, niter = 1000, data = initial.claims)
# Fit a bsts model with expected model size 5, to include more coefficients.

(a) (b)Figure 10: Regression coefficients for the (a) plain logistic regression model and (b) time series logistic regression model under equivalent spike and slab priors.

These are a widely useful class of time series models, known in various literatures as "structural time series," "state space mod…

3 years ago @ unofficialgoogledatascience.com
Our quest for robust time series forecasting at scale

The demand for time series forecasting at Google grew rapidly along with the company over its first decade.

That is, the demand was for methods and tools that would facilitate accurate large-scale time series forecasting at Google.

But like our approach, Prophet aims to be an automatic, robust forecasting tool. And lastly, "forecasting" for us did not mean anomaly detection.

By Eric Tassone and Farzan Rohani. Time series forecasting enjoys a rich and luminous history, and today is an essential element of most any business operation.

3 years, 2 months ago @ unofficialgoogledatascience.com
Attributing a deep network’s prediction to its input features

For concreteness, let us focus on a deep network that performs object recognition.

Deep networks have multiple layers of logic and coefficients, combined using nonlinear activation functions.

Application to other networks Our paper also includes application of integrated gradients to other networks (none of these networks were trained by us).

There is also work (such as this) on architecting deep networks in ways that allow us to understand the internal representations of these networks.

Overall, we hope that deep networks lose their reputation for being impenetrable black-boxes which perform black magic.

3 years, 4 months ago @ unofficialgoogledatascience.com
Causality in machine learning

An obvious attempt to fix this is to upweight randomized data in training, or even train the model solely on the randomized data.

As we observed at the start of this post, standard machine learning techniques don’t distinguish between randomized and observational data the way statistical models do.

Conclusion. In this post we described how some randomized data may be applied both to check and improve the accuracy of a machine learning system trained largely on observational data.

Indeed, machine learning generally lacks the vocabulary to capture the distinction between observational data and randomized data that statistics finds crucial.

Rather, the focus of this post is on combining observa…

3 years, 5 months ago @ unofficialgoogledatascience.com
Practical advice for analysis of large, complex data sets

Some people seemed to be naturally good at doing this kind of high quality data analysis.

Process: Separate Validation, Description, and Evaluation. Validation or Initial Data Analysis: Do I believe the data is self-consistent, that the data was collected correctly, and that the data represents what I think it does?

I think about exploratory data analysis as having 3 interrelated stages. By separating these phases, you can more easily reach agreement with others.

Acknowledge and count your filtering. Almost every large data analysis starts by filtering the data in various stages.

3 years, 8 months ago @ unofficialgoogledatascience.com
Statistics for Google Sheets

Introduction. Statistics for Google Sheets is an add-on for Google Sheets that brings elementary statistical analysis tools to spreadsheet users.

The goal of the Statistics app is to “democratize data science” by putting elementary statistics capabilities in the hands of anyone with a Google account.

If you look closely at the boxplots you can see that returns following down days have slightly greater variation than returns following up days.

Finally, you can use logistic regression to see how a previous day’s return affects the probability of the next day’s return being positive.

Statistics for Google Sheets gives analysts and students the tools to conduct elementary statistical analyses in …

3 years, 9 months ago @ unofficialgoogledatascience.com
Next generation tools for data science

Introduction. That MapReduce was the solution for writing data processing pipelines scalable to hundreds of terabytes (or more) is evidenced by its massive uptake.

Widely used in medicine for count data, the MH estimator and its generalizations are ubiquitous within data science at Google.

.filter(lambda x: x != header)

Beam/Dataflow’s sweet spot: streaming processing Streaming processing is an ever-increasingly important topic for data science.

3 years, 10 months ago @ unofficialgoogledatascience.com
Mind Your Units

The perils of incorrect units Is the idea of 'minding our units' just some esoteric issue, or can this actually hurt us in practice?

How do we mind our units in analyses at Google?

The above simulation already hints at one of our approaches to incorporating the group structure in some analyses at Google.

Regardless of how you do it, do remember to mind your units.


3 years, 11 months ago @ unofficialgoogledatascience.com
Andrej Karpathy
last post 4 weeks, 1 day ago
Biohacking Lite

The goal of this post is to nerd out over biochemistry and energy metabolism in the animal kingdom, and potentially inspire others on their own biohacking lite adventure.

It’s highly amusing to think that every single time you breathe out (in a fasted state) you are literally breathing out your fat carbon by carbon.

Energy deficit.

To validate the energy deficit math I spent 100 days around late 2019 very carefully tracking my daily energy input and output.

That said, focusing on fat, both approaches show me losing body fat at roughly the same rate, though they are off by an absolute offset.

4 weeks, 1 day ago @ karpathy.github.io
A Recipe for Training Neural Networks

Some few weeks ago I posted a tweet on “the most common neural net mistakes”, listing a few common gotchas related to training neural nets.

1) Neural net training is a leaky abstraction. It is allegedly easy to get started with training neural nets.

This is just a start when it comes to training neural nets.

As a result, (and this is reeaally difficult to over-emphasize) a “fast and furious” approach to training neural networks does not work and only leads to suffering.

focus on training loss) and then regularize it appropriately (give up some training loss to improve the validation loss).

1 year, 2 months ago @ karpathy.github.io
(started posting on Medium instead)

The current state of this blog (with the last post 2 years ago) makes it look like I’ve disappeared.

I’ve certainly become less active on blogs since I’ve joined Tesla, but whenever I do get a chance to post something I have recently been defaulting to doing it on Medium because it is much faster and easier.

I still plan to come back here for longer posts if I get any time, but I’ll default to Medium for everything short-medium in length.

TLDRHave a look at my Medium blog.

2 years, 5 months ago @ karpathy.github.io
A Survival Guide to a PhD

Unlike the undergraduate guide, this one was much more difficult to write because there is significantly more variation in how one can traverse the PhD experience.

You can go one way (PhD -> anywhere else) but not the other (anywhere else -> PhD -> academia/research; it is statistically less likely).

The adviser is an extremely important person who will exercise a lot of influence over your PhD experience.

During your PhD you’ll get to acquire this sense yourself.

It’s usually a painful exercise for me to look through some of my early PhD paper drafts because they are quite terrible.

3 years, 10 months ago @ karpathy.github.io
Deep Reinforcement Learning: Pong from Pixels

This is a long overdue blog post on Reinforcement Learning (RL).

From left to right: Deep Q Learning network playing ATARI, AlphaGo, Berkeley robot stacking Legos, physically-simulated quadruped leaping over terrain.

Policy network.

For example, suppose we compute \(R_t\) for all of the 20,000 actions in the batch of 100 Pong game rollouts above.

The total number of episodes was approximately 8,000 so the algorithm played roughly 200,000 Pong games (quite a lot isn’t it!)
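The return \(R_t\) mentioned above is typically computed by scanning the reward sequence backwards with a discount factor; the sketch below mirrors the usual Pong setup, where any nonzero reward marks a game boundary (the function name and \(\gamma\) value are conventional choices, not quoted from the post).

```python
import numpy as np

def discount_rewards(r, gamma=0.99):
    """Compute discounted returns R_t for one episode, resetting the running
    sum at Pong game boundaries (a nonzero reward ends a game)."""
    discounted = np.zeros_like(r, dtype=float)
    running = 0.0
    for t in reversed(range(len(r))):
        if r[t] != 0:
            running = 0.0  # reset at game boundary (Pong-specific)
        running = running * gamma + r[t]
        discounted[t] = running
    return discounted

# three steps of nothing, then a win: earlier actions get discounted credit
r = np.array([0.0, 0.0, 0.0, 1.0])
print(discount_rewards(r))
```

Each action in the 20,000-step batch is then weighted by its \(R_t\) (usually standardized) when forming the policy gradient.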

4 years, 1 month ago @ karpathy.github.io
Short Story on AI: A Cognitive Discontinuity.

Another great source of good reputation for Visceral were the large number of famous interventions carried out by autonomous Visceral agents.

The list went on and on - one month ago an autonomous Visceral agent recognized a remote drone attack.

He was running the routine software diagnostics on the Visceral agent and one of them had just failed.

The software diagnostics were only at 5% complete, and Merus knew they would take a while to run to completion.

Merus’ avatar broke the silence in the last second: “Come meet me here.” And then the connection was lost.

4 years, 7 months ago @ karpathy.github.io
What a Deep Neural Network thinks about your #selfie

In this fun experiment we’re going to do just that: We’ll take a powerful, 140-million-parameter state-of-the-art Convolutional Neural Network, feed it 2 million selfies from the internet, and train it to classify good selfies from bad ones.

what if someone posted a very good selfie but it was late at night, so perhaps not as many people saw it and it got less likes?

What makes a good #selfie ?

To take a good selfie, do: Be female.

Also, with some relief, it seems that the best selfies do not seem to be the ones that show the most skin.

4 years, 8 months ago @ karpathy.github.io
The Unreasonable Effectiveness of Recurrent Neural Networks

A glaring limitation of Vanilla Neural Networks (and also Convolutional Networks) is that their API is too constrained: they accept a fixed-sized vector as input (e.g.

If training vanilla neural nets is optimization over functions, training recurrent nets is optimization over programs.

At the core, RNNs have a deceptively simple API: They accept an input vector x and give you an output vector y.
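That API can be written down in a few lines of numpy; the dimensions and the vanilla tanh update below are illustrative, not the post's char-rnn code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal vanilla RNN step: accept an input vector x, update the hidden
# state h, and emit an output vector y (all sizes are illustrative).
x_dim, h_dim, y_dim = 10, 20, 5
Wxh = rng.standard_normal((h_dim, x_dim)) * 0.1
Whh = rng.standard_normal((h_dim, h_dim)) * 0.1
Why = rng.standard_normal((y_dim, h_dim)) * 0.1

def rnn_step(x, h):
    h = np.tanh(Wxh @ x + Whh @ h)  # new hidden state mixes input and history
    y = Why @ h                     # output read off the hidden state
    return y, h

h = np.zeros(h_dim)
for _ in range(3):                  # the same weights are reused at every step
    y, h = rnn_step(rng.standard_normal(x_dim), h)
print(y.shape, h.shape)
```

Because the hidden state persists across calls, the output depends on the entire history of inputs, which is what makes the API more like a program than a fixed function.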

Fun with RNNsAll 5 example character models below were trained with the code I’m releasing on Github.

These models have about 10 million parameters, which is still on the lower end for RNN models.

5 years, 1 month ago @ karpathy.github.io
Breaking Linear Classifiers on ImageNet

speech recognition systems), and most importantly, also to simple, shallow, good old-fashioned Linear Classifiers (Softmax classifier, or Linear Support Vector Machines, etc.).

Instead, lets fool a linear classifier and lets also keep with the theme of breaking models on images because they are fun to look at.

With input images of size 64x64x3 and 1000 ImageNet classes we therefore have 64x64x3x1000 = 12.3 million weights (beefy linear model!).

We can then visualize each of the learned weights by reshaping them as images: Example linear classifiers for a few ImageNet classes.

Linear classifier with lower regularization (which leads to more noisy class weights) is easier to fool (top).
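For a linear classifier, the fooling direction is just the difference of two weight rows: a small signed step along it raises the target class score at every pixel. A toy 2-class sketch with random weights (sizes match the 64x64x3 images above; everything else is made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class linear classifier on flattened 64x64x3 "images".
d = 64 * 64 * 3
W = rng.standard_normal((2, d)) * 0.01  # one weight row per class

x = rng.random(d)                        # pixel values in [0, 1]
source = int(np.argmax(W @ x))           # currently predicted class
target = 1 - source                      # class we want to fool it into

# For a linear model, the gradient of (score_target - score_source) w.r.t. x
# is simply W[target] - W[source]; a small signed step along it flips scores.
eps = 0.05
x_adv = x + eps * np.sign(W[target] - W[source])

print(source, int(np.argmax(W @ x_adv)), float(np.abs(x_adv - x).max()))
```

The per-pixel change is bounded by eps, yet the score gap moves by eps times the L1 norm of the weight difference, which is huge in high dimensions; that is exactly why imperceptible perturbations suffice.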

5 years, 3 months ago @ karpathy.github.io
What I learned from competing against a ConvNet on ImageNet

The 100,000 test set images are released with the dataset, but the labels are withheld to prevent teams from overfitting on the test set.

It’s fun to note that about 4 years ago I performed a similar (but much quicker and less detailed) human classification accuracy analysis on CIFAR-10.

In total, we attribute 24 (24%) of GoogLeNet errors and 12 (16%) of human errors to this category.

We estimate that approximately 22 (21%) of GoogLeNet errors fall into this category, while none of the human errors do.

On the other hand, a large majority of human errors come from fine-grained categories and class unawareness.

5 years, 10 months ago @ karpathy.github.io
Off the Convex Path
last post 4 days, 20 hours ago
Training GANs - From Theory to Practice

Training GANs - From Theory to Practice. GANs, originally discovered in the context of unsupervised learning, have had far-reaching implications for science, engineering, and society.

However, training GANs remains challenging (in part) due to the lack of convergent algorithms for nonconvex-nonconcave min-max optimization.

In this post, we present a new first-order algorithm for min-max optimization which is particularly suited to GANs.

Conclusion. In this post we have shown how to develop a practical and convergent first-order algorithm for training GANs.

Our simulations show that a version of this algorithm can lead to more stable training of GANs.

4 days, 20 hours ago @ offconvex.org
An equilibrium in nonconvex-nonconcave min-max optimization

An equilibrium in nonconvex-nonconcave min-max optimization. While there has been incredible progress in convex and nonconvex minimization, a multitude of problems in ML today are in need of efficient algorithms to solve min-max optimization problems.

Unlike minimization, where algorithms can always be shown to converge to some local minimum, there is no notion of a local equilibrium in min-max optimization that exists for general nonconvex-nonconcave functions.

Our greedy min-max equilibrium. We use the greedy max function to define a new second-order notion of local optimality for min-max optimization, which we refer to as a greedy min-max equilibrium.

This allows us to define a notion of gre…

2 weeks, 2 days ago @ offconvex.org
Exponential Learning Rate Schedules for Deep Learning (Part 1)

Exponential Learning Rate Schedules for Deep Learning (Part 1). This blog post concerns our ICLR20 paper on a surprising discovery about learning rate (LR), the most basic hyperparameter in deep learning.

These divergent approaches suggest that LR, the most basic and intuitive hyperparameter in deep learning, has not revealed all its mysteries yet.

SOTA performance with exponential LR. As mentioned, reaching state-of-the-art accuracy requires reducing the learning rate a few times.

Suppose the training has $K$ phases, and the learning rate is divided by some constant $C_I>1$ when entering phase $I$.
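Such a phase-wise schedule (divide the LR by a constant when entering each phase) can be sketched as follows; the phase lengths, base LR, and constant are illustrative, not the paper's settings:

```python
def phase_lr(step, base_lr=0.1, phase_lengths=(30, 30, 40), c=10.0):
    """Step-decay schedule: training has K phases, and the learning rate is
    divided by a constant c > 1 on entering each new phase."""
    lr, boundary = base_lr, 0
    for length in phase_lengths:
        boundary += length
        if step < boundary:
            return lr
        lr /= c  # entering the next phase: divide the LR by c
    return lr

print(phase_lr(0), phase_lr(30), phase_lr(60), phase_lr(99))
```

The paper's observation is that, with normalization and weight decay, such step-decay schedules can be matched by a single exponentially growing LR.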

Conclusion. We hope that this bit of theory and supporting experiments have changed your outlook o…

2 months, 2 weeks ago @ offconvex.org
Ultra-Wide Deep Nets and Neural Tangent Kernel (NTK)

gradient flow) is equivalent to a kernel regression predictor with a deterministic kernel called neural tangent kernel (NTK).

Now we describe how training an ultra-wide fully-connected neural network leads to kernel regression with respect to the NTK.

In the large width limit, it turns out that the time-varying kernel $ker_t(\cdot,\cdot)$ is (with high probability) always close to a deterministic fixed kernel $ker_{\mathsf{NTK}}(\cdot,\cdot)$, which is the neural tangent kernel (NTK).

Now, at least we have a better understanding of a class of ultra-wide neural networks: they are captured by neural tangent kernels!

Similarly, one can try to translate other architectures like recurrent neural…

9 months, 1 week ago @ offconvex.org
Understanding implicit regularization in deep learning by analyzing trajectories of gradient descent

Understanding implicit regularization in deep learning by analyzing trajectories of gradient descent. Sanjeev’s recent blog post suggested that the conventional view of optimization is insufficient for understanding deep learning, as the value of the training objective does not reliably capture generalization.

In recent years, researchers have come to realize the importance of implicit regularization induced by the choice of optimization algorithm.

This theorem disqualifies Schatten quasi-norms as the implicit regularization in deep matrix factorizations, and instead suggests that all depths correspond to nuclear norm.

Full details behind our results on “implicit regularization as norm minimi…

1 year ago @ offconvex.org
Landscape Connectivity of Low Cost Solutions for Multilayer Nets

Landscape Connectivity of Low Cost Solutions for Multilayer Nets. A big mystery about deep learning is how, in a highly nonconvex loss landscape, gradient descent often finds near-optimal solutions (those with training cost almost zero) even starting from a random initialization.

Solutions A and B have low cost but the line connecting them goes through solutions with high cost.

Mode Connectivity.

2019) did try to explain the phenomenon of mode connectivity in simple settings (the first of these demonstrated mode connectivity empirically for multi-layer nets).

Thus to explain mode connectivity for multilayer nets we will need to leverage some stronger property of typical solutions discovered v…

1 year ago @ offconvex.org
Is Optimization a Sufficient Language for Understanding Deep Learning?

In this Deep Learning era, machine learning usually boils down to defining a suitable objective/cost function for the learning task at hand, and then optimizing this function using some variant of gradient descent (implemented via backpropagation).

I am suggesting that deep learning algorithms also have important properties that are not always reflected in the objective value.

by playing with batch sizes and learning rates) can be preferable to perfect optimization, even in simple settings such as regression.

NB: Empirically we find that Adam, the celebrated acceleration method for deep learning, speeds up optimization a…

1 year, 1 month ago @ offconvex.org
Contrastive Unsupervised Learning of Semantic Representations: A Theoretical Framework

Semantic representations (aka semantic embeddings) of complicated data types (e.g.

Researchers are most interested in unsupervised representation learning using unlabeled data.

samples $x, x^{+}$ from the distribution $D_{c^+}$.

The highlighted parts in the table show that the unsupervised representations compete well with the supervised representations on the average $k$-way classification task ($k=2, 10$).

We find this to be true for unsupervised representations, and surprisingly for supervised representations as well.
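The contrastive objective over an anchor $x$, a positive $x^+$ drawn from $D_{c^+}$, and negative samples can be sketched as a generic softmax-style loss (a common instantiation, not necessarily the paper's exact form):

```python
import numpy as np

def contrastive_loss(f_x, f_pos, f_negs):
    """-log softmax of the positive similarity against the negatives:
    pulls f(x) toward f(x+) and pushes it away from negative samples."""
    pos = float(f_x @ f_pos)
    scores = np.concatenate(([pos], [float(f_x @ f_n) for f_n in f_negs]))
    return float(-pos + np.log(np.exp(scores).sum()))
```

With a representation that aligns the anchor with its positive, this loss is lower than with a mismatched one, which is what training on unlabeled pairs exploits.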

1 year, 3 months ago @ offconvex.org
The search for biologically plausible neural computation: A similarity-based approach

By re-ordering the variables and introducing a new variable, ${\bf W} \in \mathbb{R}^{k\times n}$, we obtain:
To prove the second identity, find the optimal ${\bf W}$ by taking a derivative of the expression on the right with respect to ${\bf W}$ and setting it to zero, and then substitute the optimal ${\bf W}$ back into the expression.

The price paid for this simplification is the appearance of a minimax optimization problem in the variables ${\bf W}$ and ${\bf M}$.

Variables ${\bf W}$ and ${\bf M}$ are represented by the weights of synapses in feedforward and lateral connections respectively.

In neuroscience, learning rules (2.7) for ${\bf W}$ and ${\bf M}$ are called Hebbian and anti-Hebbian r…

1 year, 7 months ago @ offconvex.org
Machine Learning Mastery
last post 1 day, 10 hours ago
6 Dimensionality Reduction Algorithms With Python

In this tutorial, you will discover how to fit and evaluate top dimensionality reduction algorithms in Python.

Dimensionality Reduction Algorithms
There are many algorithms that can be used for dimensionality reduction.

Examples of Dimensionality Reduction
In this section, we will review how to use popular dimensionality reduction algorithms in scikit-learn.

For more on LDA for dimensionality reduction, see the tutorial:
The scikit-learn library provides the LinearDiscriminantAnalysis class implementation of Linear Discriminant Analysis that can be used as a dimensionality reduction data transform.

Summary
In this tutorial, you discovered how to fit an…
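The pattern described above, fitting a reduction transform and a model together, can be sketched with a scikit-learn Pipeline (the synthetic data and component counts are placeholders, not the tutorial's exact code):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# synthetic stand-in for a real tabular dataset
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=1)

# reduce to 5 components with PCA, then classify
pca_model = Pipeline([('reduce', PCA(n_components=5)),
                      ('clf', LogisticRegression())])
pca_score = cross_val_score(pca_model, X, y, cv=5).mean()

# LDA allows at most n_classes - 1 components (here: 1)
lda_model = Pipeline([('reduce', LinearDiscriminantAnalysis(n_components=1)),
                      ('clf', LogisticRegression())])
lda_score = cross_val_score(lda_model, X, y, cv=5).mean()
```

Wrapping the transform in a Pipeline ensures it is refit on each cross-validation training fold rather than on all of the data.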

1 day, 10 hours ago @ machinelearningmastery.com
4 Automatic Outlier Detection Algorithms in Python

How to correctly apply automatic outlier detection and removal to the training dataset only to avoid data leakage.

It would be invalid to fit the outlier detection method on the entire dataset, as this would result in data leakage.

# fit the model
model = LinearRegression()
model.fit(X_train, y_train)
# evaluate the model
yhat = model.predict(X_test)

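A minimal sketch of the train-only pattern, using IsolationForest as the detector (the synthetic data, detector choice, and contamination level are assumptions for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=4, noise=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# identify and drop outliers on the training set only, never on the test set
iso = IsolationForest(contamination=0.1, random_state=1)
mask = iso.fit_predict(X_train) != -1          # -1 marks outliers
X_train, y_train = X_train[mask], y_train[mask]

model = LinearRegression()
model.fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
```

The test set stays untouched, so the reported error reflects how the model would behave on genuinely unseen data, outliers included.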

3 days, 10 hours ago @ machinelearningmastery.com
How to Use Feature Extraction on Tabular Data for Machine Learning

In this tutorial, you will discover how to use feature extraction for data preparation with tabular data.

Tutorial Overview
This tutorial is divided into three parts; they are:
- Feature Extraction Technique for Data Preparation
- Dataset and Performance Baseline
  - Wine Classification Dataset
  - Baseline Model Performance
- Feature Extraction Approach to Data Preparation

Feature Extraction Technique for Data Preparation
Data preparation can be challenging.

Feature Extraction Approach to Data Preparation
In this section, we can explore whether we can improve performance using the feature extraction approach to data preparation.

Summary
In this tutorial, you discovered how to use fea…
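The feature extraction approach can be sketched with scikit-learn's FeatureUnion, which applies several transforms in parallel and concatenates their outputs (the particular transforms chosen here are illustrative, not the tutorial's full suite):

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_wine(return_X_y=True)

# apply several transforms in parallel and concatenate their outputs,
# letting the downstream model pick out what is useful
union = FeatureUnion([('scale', MinMaxScaler()),
                      ('pca', PCA(n_components=5)),
                      ('svd', TruncatedSVD(n_components=5))])
model = Pipeline([('features', union),
                  ('clf', LogisticRegression(max_iter=1000))])
score = cross_val_score(model, X, y, cv=5).mean()
```

The idea is to defer the choice of representation to the model instead of committing to a single transform up front.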

5 days, 10 hours ago @ machinelearningmastery.com
How to Choose Data Preparation Methods for Machine Learning

Correct application of data preparation will transform raw data into a representation that allows learning algorithms to get the most out of the data and make skillful predictions.

Tutorial Overview
This tutorial is divided into four parts; they are:
- Strategies for Choosing Data Preparation Techniques
- Approach 1: Manually Specify Data Preparation
- Approach 2: Grid Search Data Preparation Methods
- Approach 3: Apply Data Preparation Methods in Parallel

Strategies for Choosing Data Preparation Techniques
The performance of a machine learning model is only as good as the data used to train it.

Data preparation refers to the techniques used to transform raw data into a form that best meets the expecta…
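Approach 2 above treats the data preparation step itself as a hyperparameter to be searched. A minimal sketch (synthetic data; the candidate scalers are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

X, y = make_classification(n_samples=200, n_features=10, random_state=1)

# the 'prep' step of the pipeline is itself a searchable parameter
pipe = Pipeline([('prep', StandardScaler()), ('clf', LogisticRegression())])
grid = {'prep': [StandardScaler(), MinMaxScaler(), RobustScaler()]}
search = GridSearchCV(pipe, grid, cv=5)
search.fit(X, y)
best_prep = type(search.best_params_['prep']).__name__
```

GridSearchCV evaluates each candidate preparation with cross-validation, so the comparison is fair and leakage-free.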

1 week, 1 day ago @ machinelearningmastery.com
8 Top Books on Data Cleaning and Feature Engineering

It is a huge field of study and goes by many names, such as “data cleaning,” “data wrangling,” “data preprocessing,” “feature engineering,” and more.

Some of these are distinct data preparation tasks, and some of the terms are used to describe the entire data preparation process.

In this post, you will discover the top books on data cleaning, data preparation, feature engineering, and related topics.

“Feature Engineering and Selection”The book “Feature Engineering and Selection: A Practical Approach for Predictive Models” was written by Max Kuhn and Kjell Johnson and was published in 2019.

For textbooks, needed for their references by most researchers, I’d probably recommend:
Summary
In this …

1 week, 3 days ago @ machinelearningmastery.com
Data Preparation for Machine Learning (7-Day Mini-Course)

Lesson 01: Importance of Data Preparation
In this lesson, you will discover the importance of data preparation in predictive modeling with machine learning.

Handling missing data is important as many machine learning algorithms do not support data with missing values.

The scikit-learn Python machine learning library provides an implementation of RFE for machine learning.

Lesson 04: Scale Data With Normalization
In this lesson, you will discover how to scale numerical data for machine learning.

from sklearn.preprocessing import OneHotEncoder
# define the location of the dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/breast-cancer.csv"
# load the dataset
dataset = read_csv(url …
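Lesson 04's normalization can be sketched as follows (a small contrived array with columns on very different scales; MinMaxScaler rescales each column to [0, 1]):

```python
from numpy import asarray
from sklearn.preprocessing import MinMaxScaler

# contrived sample with columns on very different scales
data = asarray([[100, 0.001],
                [8, 0.05],
                [50, 0.005],
                [88, 0.07],
                [4, 0.1]])

# normalization rescales each column to the range [0, 1]
scaler = MinMaxScaler()
scaled = scaler.fit_transform(data)
```

After the transform, both columns span the same range, so distance-based and gradient-based algorithms treat them comparably.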

1 week, 5 days ago @ machinelearningmastery.com
Feature Engineering and Selection (Book Review)

In this post, you will discover my review and breakdown of the book “Feature Engineering and Selection” on the topic of data preparation for machine learning.

Given the importance of data preparation in order to achieve good performance on a dataset, the book is focused on highlighting specific data preparation techniques and how to use them.

Feature Selection Overview
This chapter motivates the need for feature selection as the selection of the most relevant inputs, not the target variable that is being predicted.

A framework of three methods is used to organize feature selection methods, including:
- Intrinsic/Implicit Feature Selection.

Summary
In this post, you discovered my review and break…

2 weeks, 1 day ago @ machinelearningmastery.com
kNN Imputation for Missing Values in Machine Learning

A popular approach to missing data imputation is to use a model to predict the missing values.

How to load a CSV file with missing values and mark the missing values with NaN values and report the number and percentage of missing values for each column.

...
# summarize the number of rows with missing values for each column
for i in range(dataframe.shape[1]):
	# count number of rows with missing values
	n_miss = dataframe[[i]].isnull().sum()
	perc = n_miss / dataframe.shape[0] * 100
	print('> %d, Missing: %d (%.1f%%)' % (i, n_miss, perc))

> 0, Missing: 1 (0.3%)
> 1, Missing: 0 (0.0%)
> 2, Missing: 0 (0.0%)
> 3, Missing: 60 (20.0%)
> 4, Missing: 24 (8.0%)
> 5, Missing: 58 (19.3%) …
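The imputation step itself, replacing each NaN with a value derived from the k nearest complete rows, can be sketched with scikit-learn's KNNImputer (toy array for illustration):

```python
import numpy as np
from sklearn.impute import KNNImputer

# toy data with missing entries marked as NaN
X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])

# each missing value is replaced by the mean of that feature
# across the k nearest rows, measured with NaN-aware distances
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
```

As with any preparation step, the imputer should be fit on the training data only when used inside an evaluation loop.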

2 weeks, 3 days ago @ machinelearningmastery.com
How to Avoid Data Leakage When Performing Data Preparation

In this tutorial, you will discover how to avoid data leakage during data preparation when evaluating machine learning models.

Tutorial Overview
This tutorial is divided into three parts; they are:
- Problem With Naive Data Preparation
- Data Preparation With Train and Test Sets
  - Train-Test Evaluation With Naive Data Preparation
  - Train-Test Evaluation With Correct Data Preparation
- Data Preparation With k-fold Cross-Validation
  - Cross-Validation Evaluation With Naive Data Preparation
  - Cross-Validation Evaluation With Correct Data Preparation

Problem With Naive Data Preparation
The manner in which data preparation techniques are applied to data matters.

Now that we are familiar with how to apply data prep…
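The correct pattern, wrapping preparation and model in a Pipeline so the transform is fit on each training fold only, can be sketched as (synthetic data for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=200, n_features=10, random_state=7)

# naive (leaky): scaler.fit_transform(X) before cross-validation lets the
# scaler see the held-out folds.
# correct: put the scaler inside the pipeline, so it is refit on each
# training fold and only applied to the corresponding validation fold
model = Pipeline([('scale', MinMaxScaler()), ('clf', LogisticRegression())])
scores = cross_val_score(model, X, y, cv=10)
```

The leaky version tends to report slightly optimistic scores, which is exactly the bias the tutorial warns about.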

2 weeks, 5 days ago @ machinelearningmastery.com
Tour of Data Preparation Techniques for Machine Learning

Predictive modeling machine learning projects, such as classification and regression, always involve some form of data preparation.

In this tutorial, you will discover the common data preparation tasks performed in a predictive modeling machine learning task.

Tutorial Overview
This tutorial is divided into six parts; they are:
- Common Data Preparation Tasks
- Data Cleaning
- Feature Selection
- Data Transforms
- Feature Engineering
- Dimensionality Reduction

Common Data Preparation Tasks
We can define data preparation as the transformation of raw data into a form that is more suitable for modeling.

Nevertheless, there are steps in a predictive modeling project before and after the data pr…

3 weeks, 1 day ago @ machinelearningmastery.com
What Is Data Preparation in a Machine Learning Project

This process provides a context in which we can consider the data preparation required for the project, informed both by the definition of the project performed before data preparation and the evaluation of machine learning algorithms performed after.

The steps before and after data preparation in a project can inform what data preparation methods to apply, or at least explore.

Tutorial Overview
This tutorial is divided into three parts; they are:
- Applied Machine Learning Process
- What Is Data Preparation
- How to Choose Data Preparation Techniques

Applied Machine Learning Process
Each machine learning project is different because the specific data at the core of the project is different.

How to C…

3 weeks, 3 days ago @ machinelearningmastery.com
Why Data Preparation Is So Important in Machine Learning

We cannot fit and evaluate machine learning algorithms on raw data; instead, we must transform the data to meet the requirements of individual machine learning algorithms.

Tutorial Overview
This tutorial is divided into three parts; they are:
- What Is Data in Machine Learning
- Raw Data Must Be Prepared
  - Machine Learning Algorithms Expect Numbers
  - Machine Learning Algorithms Have Requirements
  - Model Performance Depends on Data
- Predictive Modeling Is Mostly Data Preparation

What Is Data in Machine Learning
Predictive modeling projects involve learning from data.

Complex Data: Raw data contains compressed complex nonlinear relationships that may need to be exposed…

3 weeks, 5 days ago @ machinelearningmastery.com
Ordinal and One-Hot Encodings for Categorical Data

from sklearn.preprocessing import OrdinalEncoder
# define data
data = asarray([['red'], ['green'], ['blue']])
print(data)
# define ordinal encoding
encoder = OrdinalEncoder()
# transform data
result = encoder.fit_transform(data)

from sklearn.preprocessing import OneHotEncoder
# define data
data = asarray([['red'], ['green'], ['blue']])
print(data)
# define one hot encoding
encoder = OneHotEncoder(sparse=False)
# transform data
onehot = encoder.fit_transform(data)

...
# load the dataset
dataset = read_csv(url, header=None)
# retrieve the array of data
data = dataset.values

# ordinal encode input …
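Putting the two encodings side by side in a runnable form (the `.toarray()` call on the sparse output is used here instead of the tutorial's `sparse=False` argument, which newer scikit-learn versions have renamed):

```python
from numpy import asarray
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

data = asarray([['red'], ['green'], ['blue']])

# ordinal: one integer per category, assigned in sorted category order
ordinal = OrdinalEncoder().fit_transform(data)
# one-hot: one binary column per category
onehot = OneHotEncoder().fit_transform(data).toarray()
```

Categories sort alphabetically, so blue maps to 0, green to 1, and red to 2 in the ordinal encoding, and each one-hot row has exactly one 1.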

4 weeks, 1 day ago @ machinelearningmastery.com
Lil'Log
last post 1 month ago
Exploration Strategies in Deep Reinforcement Learning

Prediction-based Exploration
The second category of intrinsic exploration bonuses rewards improvement of the agent’s knowledge about the environment.

2007) sketched an idea of using a forward dynamics prediction model to estimate learning progress and assigned intrinsic exploration reward accordingly.

(Image source: Ecoffet, et al., 2020)
After vanilla Go-Explore, Yijie Guo, et al.

Cited as:@article{weng2020exploration, title = "Exploration Strategies in Deep Reinforcement Learning", author = "Weng, Lilian", journal = "lilianweng.github.io/lil-log", year = "2020", url = "https://lilianweng.github.io/lil-log/2020/06/07/exploration-strategies-in-deep-reinforcement-learning.html" }Refer…
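The prediction-based idea, rewarding the agent where a learned forward model is still surprised, can be sketched with a linear least-squares "dynamics model" (a deliberately simple stand-in for the neural models used in the literature):

```python
import numpy as np

class ForwardModelBonus:
    """Intrinsic reward = squared prediction error of a forward dynamics model."""
    def __init__(self, s_dim, a_dim):
        self.W = np.zeros((s_dim + a_dim, s_dim))
        self.X, self.Y = [], []

    def bonus(self, s, a, s_next):
        x = np.concatenate([s, a])
        err = float(np.sum((x @ self.W - s_next) ** 2))  # "surprise"
        # store the transition and refit; surprise shrinks as the
        # dynamics become predictable
        self.X.append(x)
        self.Y.append(s_next)
        self.W, *_ = np.linalg.lstsq(np.array(self.X), np.array(self.Y),
                                     rcond=None)
        return err

# deterministic linear dynamics: the bonus should decay toward zero
rng = np.random.default_rng(0)
A, B = rng.normal(size=(3, 3)), rng.normal(size=(2, 3))
explorer = ForwardModelBonus(s_dim=3, a_dim=2)
bonuses, s = [], rng.normal(size=3)
for _ in range(30):
    a = rng.normal(size=2)
    s_next = s @ A + a @ B
    bonuses.append(explorer.bonus(s, a, s_next))
    s = s_next / (1.0 + np.linalg.norm(s_next))  # keep states bounded
```

Once the model predicts well everywhere, the bonus vanishes and the agent falls back to the extrinsic reward, which is the intended behavior.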

1 month ago @ lilianweng.github.io
The Transformer Family

(2018) added a set of auxiliary losses to enable training a deep Transformer model on character-level language modeling which outperformed LSTMs.

Longer Attention Span (Transformer-XL)
The vanilla Transformer has a fixed and limited attention span.

Image Transformer (Parmar et al., 2018) embraces a formulation of image generation similar to sequence modeling within the Transformer framework.

The top row illustrates the attention connectivity patterns in (a) Transformer, (b) Sparse Transformer with strided attention, and (c) Sparse Transformer with fixed attention.

2019)Cited as:@article{weng2020transformer, title = "The Transformer Family", author = "Weng, Lilian", journal = "lilianweng.githu…
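At the core of every Transformer variant above sits scaled dot-product attention; a minimal numpy sketch:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, d_k = 8
K = rng.normal(size=(6, 8))   # 6 key positions
V = rng.normal(size=(6, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

Sparse Transformer's strided and fixed patterns simply mask most of the `scores` matrix before the softmax so each query attends to a subset of keys.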

3 months ago @ lilianweng.github.io
Curriculum for Reinforcement Learning

Next, we will look into several categories of curriculum learning, as illustrated in Fig.

This framework of proposing curriculum automatically through another RL agent was formalized as Teacher-Student Curriculum Learning (TSCL; Matiisen, et al.

(Image source: Jabri, et al 2019)
Learning a latent skill space can be done in different ways, such as in Hausman, et al.

(Image source: Czarnecki, et al., 2018)Cited as:@article{weng2020curriculum, title = "Curriculum for Reinforcement Learning", author = "Weng, Lilian", journal = "lilianweng.github.io/lil-log", year = "2020", url = "https://lilianweng.github.io/lil-log/2020/01/29/curriculum-for-reinforcement-learning.html" }References[1] Jeffrey L.…

5 months, 1 week ago @ lilianweng.github.io
Self-Supervised Representation Learning

Self-supervised learning opens up a huge opportunity for better utilizing unlabelled data, while learning in a supervised learning manner.

A great summary of how self-supervised learning tasks can be constructed (Image source: LeCun’s talk)
Here is a nicely curated list of papers in self-supervised learning.

Self-supervised representation learning has shown great potential in learning useful state embedding that can be used directly as input to a control policy.

2020)Cited as:@article{weng2019selfsup, title = "Self-Supervised Representation Learning", author = "Weng, Lilian", journal = "lilianweng.github.io/lil-log", year = "2019", url = "https://lilianweng.github.io/lil-log/2019/11/10/self-…

8 months ago @ lilianweng.github.io
Evolution Strategies

Evolution Strategies (ES) is one type of black-box optimization algorithm, born in the family of Evolutionary Algorithms (EA).

Evolution strategies (ES) belong to the big family of evolutionary algorithms.

Simple Gaussian Evolution Strategies
This is the most basic and canonical version of evolution strategies.

(Image source: Wikipedia CMA-ES)
Natural Evolution Strategies
Natural Evolution Strategies (NES; Wierstra, et al, 2008) optimizes in a search distribution of parameters and moves the distribution in the direction of high fitness indicated by the natural gradient.

“Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents.” …
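The basic Gaussian ES loop, sampling perturbations around the current mean and nudging the mean toward higher fitness, can be sketched as follows (hyperparameters are illustrative; this uses the standardized-fitness gradient estimator rather than any specific variant above):

```python
import numpy as np

def evolution_strategies(fitness, theta, sigma=0.1, alpha=0.02,
                         population=50, iterations=300, seed=0):
    """Basic Gaussian ES: sample perturbations around the mean, weight
    them by (standardized) fitness, and move the mean in the estimated
    ascent direction."""
    rng = np.random.default_rng(seed)
    for _ in range(iterations):
        eps = rng.normal(size=(population, theta.size))
        rewards = np.array([fitness(theta + sigma * e) for e in eps])
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        theta = theta + alpha / (population * sigma) * eps.T @ rewards
    return theta

# maximize f(x) = -||x||^2, so the optimum is the origin
theta = evolution_strategies(lambda x: -np.sum(x ** 2),
                             np.array([3.0, -2.0]))
```

No gradients of `fitness` are needed, which is why ES applies to non-differentiable and simulator-in-the-loop objectives.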

10 months, 1 week ago @ lilianweng.github.io
Meta Reinforcement Learning

Meta-RL is meta-learning on reinforcement learning tasks.

Define Meta-RL
Meta Reinforcement Learning, in short, means doing meta-learning in the field of reinforcement learning.

Cited as:@article{weng2019metaRL, title = "Meta Reinforcement Learning", author = "Weng, Lilian", journal = "lilianweng.github.io/lil-log", year = "2019", url = "http://lilianweng.github.io/lil-log/2019/06/23/meta-reinforcement-learning.html" }References[1] Richard S. Sutton.

“RL $^ 2$: Fast Reinforcement Learning via Slow Reinforcement Learning.” ICLR 2017.

[16] Abhishek Gupta, et al. “Unsupervised meta-learning for Reinforcement Learning.” arXiv preprint arXiv:1806.04640 (2018).

1 year ago @ lilianweng.github.io
Domain Randomization for Sim2Real Transfer

Domain Randomization (DR) is a simple but powerful idea of closing this gap by randomizing properties of the training environment.

Domain randomization
With domain randomization (DR), we are able to create a variety of simulated environments with randomized properties and train a model that works across all of them.

(Image source: Tobin et al, 2017)
Physical dynamics in the simulator can also be randomized (Peng et al.

Match Real Data Distribution
Using real data to guide domain randomization feels a lot like doing system identification or DA.

Cited as:@article{weng2019DR, title = "Domain Randomization for Sim2Real Transfer", author = "Weng, Lilian", journal = "lilianweng.github.io/lil-log", …
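The core recipe is just per-episode sampling of simulator properties; a toy sketch (the parameter names and ranges are invented for illustration):

```python
import random

def randomized_env_params(rng):
    """Sample a fresh simulator configuration for each training episode."""
    return {
        'mass': rng.uniform(0.5, 2.0),       # body mass (kg)
        'friction': rng.uniform(0.2, 1.2),   # ground friction coefficient
        'light': rng.uniform(0.1, 1.0),      # rendering light intensity
    }

rng = random.Random(0)
# a policy trained across many such configurations is encouraged to
# treat the real world as just one more variation
configs = [randomized_env_params(rng) for _ in range(100)]
```

Guided variants replace the uniform ranges above with distributions fit to real-world measurements.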

1 year, 2 months ago @ lilianweng.github.io
Are Deep Neural Networks Dramatically Overfitted?

Because of their great capability to capture flexible data representations, deep neural networks have achieved great success in many applications.

(Image source: Zhang’s paper)
Are Deep Learning Models Dramatically Overfitted?

(2018) reconciled the traditional bias-variance trade-offs and proposed a new double-U-shaped risk curve for deep neural networks.

The lottery ticket hypothesis opens a new perspective about interpreting and dissecting deep neural network results.

Cited as:@article{weng2019overfit, title = "Are Deep Neural Networks Dramatically Overfitted?

1 year, 3 months ago @ lilianweng.github.io
Generalized Language Models

As a follow-up to the word embedding post, we will discuss models for learning contextualized word vectors, as well as the new trend of large unsupervised pre-trained language models that have achieved amazing SOTA results on a variety of language tasks.

Large-scale pre-trained language models like OpenAI GPT and BERT have achieved great performance on a variety of language tasks using generic model architectures.

ELMo
ELMo, short for Embeddings from Language Model (Peters, et al, 2018), learns contextualized word representation by pre-training a language model in an unsupervised way.

Bidirectional Language Model
The bidirectional Language Model (biLM) is the foundation for ELMo.

Multi-task ben…

1 year, 5 months ago @ lilianweng.github.io
Object Detection Part 4: Fast Detection Models

Part 4 of the “Object Detection for Dummies” series focuses on one-stage models for fast detection, including SSD, RetinaNet, and models in the YOLO family.

In Part 4, we only focus on fast object detection models, including SSD, RetinaNet, and models in the YOLO family.

Focal Loss
One issue for object detection model training is an extreme imbalance between background that contains no objects and foreground that holds objects of interest.

The comparison of various fast object detection models on speed and mAP performance.

Cited as:@article{weng2018detection4, title = "Object Detection Part 4: Fast Detection Models", author = "Weng, Lilian", journal = "lilianweng.github.io/lil-log", year = "…

1 year, 6 months ago @ lilianweng.github.io
Piekniewski's blog
last post 1 month ago
AI - the no bullshit approach

In this post I'd like to share some of that agenda, in what I call the "no bullshit" approach to AI.

And since we don't see these things, we don't label datasets with them and hence these "symbols" never make it to AI, neither from the symbolic approach, nor machine learning approach.

Notably the stuff deep learning is mostly successfully used for these days is not mission critical.

The science way
The scientific approach is really what this blog was all about, before it veered into making cynical posts about the general AI stupidity out there.

The failure of deep learning to deliver on many of its promises will likely lead to a similar winter.

1 month ago @ blog.piekniewski.info
DeflAition

Full loyalty to the charter is expected, to the point of even varying compensation by the level of "faith".

It is often better to invest resources in getting slightly better data or adding one more sensor than to train some ridiculously huge deep learning model and expect miracles.

With honesty and integrity rarely found in Silicon Valley, he went in and said what many were whispering for a while - AI is not really "AI".

Deep learning in clinical applications
There was some buzz about deep learning replacing radiologists, nonsense initiated by Hinton and then promptly repeated by Andrew Ng.

The realization that deep learning is not going to cut it with respect to self driving cars and many oth…

2 months, 4 weeks ago @ blog.piekniewski.info
Autonomous vehicle safety myths and facts, 2020 update.

As usual, these numbers do not really measure the safety of AVs reliably, and there are plenty of ways to game them or overreport.

Please refer to last year's post for a deeper discussion (and the 2017 post here, 2018 post here) of why these numbers are essentially flawed.

Nevertheless these are the only official numbers we get, the only glimpse of transparency into this giant corporate endeavor called the "self driving car".

Nevertheless, even Waymo and Cruise disengagements are still approximately an order of magnitude away from the upper bound of the human crash rate.

They finally have recorded some autonomous testing miles with the DMV, all 12.2 of them.

4 months, 1 week ago @ blog.piekniewski.info
The musings of a transformer

Earlier last week I posted a poll on twitter asking if my readers would like me to post a GPT-generated article.

The images were generated by https://app.generative.photos/ from RosebudAI - a recent hot startup in the AI space.

We've discussed the problems in the above graphic:
In a move to improve safety in space, SpaceX will begin launching small cubesats.

If we consider the way that our brains work, we can think of data as representing information.

Even if a deep neural network could be trained to understand language, we would expect it to produce gibberish.

7 months, 2 weeks ago @ blog.piekniewski.info
AI update, late 2019 - wizards of Oz

Self driving cars
As time goes on, more and more cracks are showing in the self driving car narrative.

Voyage, another similar startup now wants to solve the self driving problem with Deep Reinforcement Learning.

Meanwhile Daimler joined the crowd of companies slowly deflating the self driving balloon, to the point of even admitting they'd be cutting spending on it.

Element AI, one of these AI wannabe-unicorns with undefined product or service based in Canada raised a flat round and fired their CEO.

Summary
The whole field of AI resembles a giant collective of wizards of Oz.

7 months, 3 weeks ago @ blog.piekniewski.info
Reviewing Rebooting AI

In this post I'd like to focus on the recent book by Gary Marcus and Ernest Davis, Rebooting AI.

Current deep learning models are black boxes and have surprising failure modes, hence cannot be trusted in important applications.

The book appears to argue for more hybrid approaches to leverage the best of both worlds, symbolic good old fashioned AI (GOFAI) with the new wave deep learning AI.

On the other hand, mixing up current deep learning stuff with symbolic method does not seem to me personally like a road that would get us to actual AI, as in AI that is actually "intelligent".

They observe something I've been explaining in this blog since I started it - nobody really knows what common se…

8 months, 2 weeks ago @ blog.piekniewski.info
Civilization from scratch

To what extent would you be able to advance the civilization of a given era with all the knowledge in your head (no notebooks)?

Initially the reaction is obviously that, since we all live and breathe the current technical civilization, one should be able to recover almost everything, right?

The way things are mounted, valves controlled, lubrication provided.

Having electricity generated in more appreciable quantities is the basic requirement, since a good 95% of our modern industrial civilization runs on electricity.

We don't generally have the complete knowledge to recover everything from scratch.

11 months, 2 weeks ago @ blog.piekniewski.info
AI circus, mid 2019 update

The most hilarious set of events over the past few months in AI circulated around Open AI and Tesla.

Anyway, recently Open AI, which is apparently no longer open, came out with an idea of going for profit.

I wish I could believe this too, but unfortunately I don't and I think Open AI has turned into a total scam.

This judgement is further reinforced by looking at what some of these Open AI people tweet, take e.g.

Summary
So there you go, the state of AI in mid 2019.

1 year, 1 month ago @ blog.piekniewski.info
Deep learning and shallow data

Many people these days are fascinated by deep learning, as it enabled new capabilities in many areas, particularly in computer vision.

But the success of deep learning and a set of its surprising failure modes teach us a valuable lesson about the data we process.

Deep learning provides statistically powerful detectors without the expense of feature engineering, though one still has to have a lot of labeled data, a lot of GPUs, and a deep learning expert onsite.

In applications where rare but catastrophic failure is acceptable, deep learning will work fine.

I don't think deep learning as practiced right now has anything to do with solving AI.

1 year, 3 months ago @ blog.piekniewski.info
A brief story of Silicon Valley's affair with AI

Once upon a time, in the 1980's there was a magical place called Silicon Valley.

This again allowed Silicon Valley tycoons to move more silicon into the households.

This was a problem for Silicon Valley, things started slowing down.

This is all Silicon Valley could have wished for: a new, highly lucrative application space that in addition required a ton of new silicon for compute requirements.

But neither of these improvements seems a big enough win to justify Silicon Valley's big bets.

1 year, 4 months ago @ blog.piekniewski.info
Autonomous vehicle safety myths and facts, 2019 update

2018 was an important year for self driving as we had seen the first fatal accident caused by an autonomous vehicle (the infamous Uber crash in Arizona).

The precise definition under California law is:“a deactivation of the autonomous mode when a failure of the autonomous technology is detected or when the safe operation of the vehicle requires that the autonomous vehicle test driver disengage the autonomous mode and take immediate manual control of the vehicle.” Section 227.46 of Article 3.7 (Autonomous Vehicles) of Title 13, Division 1, Chapter 1, California Code of Regulations.


1 year, 4 months ago @ blog.piekniewski.info
Fooled by data

The original data looks somewhat like this:
We see that when we rotate the data we find a direction along which the data is separable: there are two well-defined pancakes, so no wonder the perceptron had no problem finding that separation.

However, if we look at the data after PCA, we find that the data is completely mixed up and inseparable.

This data is explicitly constructed to make the PCA fail but this does happen on real data.

Species  Height  Score
K        T       150
K        T       145
K        S       90
K        S       95
K        S       100
K        S       105
K        S       90
K        S       95
K        S       90
R        T       140
R        T       135
R        T       130
R        T       140
R        T       135
R        T       135
R        T       140
R        S       80
R        S       85
R        S       85

These 20 rows will be enough to make my point.

In general data is typically much more complex than …

1 year, 5 months ago @ blog.piekniewski.info
Elon and the collective

Also Thunderf00t has a nice debunking video with the analysis of the alleged cost savings of drilling Elon bragged about.

Later in an interview Elon stated that "it was worth it", and that he "does not have the respect for SEC".

To me it just exposes Elon as an arrogant and narcissistic buffoon, not some capitalist superhero.

SummaryAlthough resistance to Elon and his fans is futile, and we will all be assimilated, I call bullshit.

I think the crowds of people who think Elon is the savior of mankind will be in for a great disappointment.

1 year, 6 months ago @ blog.piekniewski.info
AI winter - update

Introduction: Almost six months ago (May 28th, 2018) I posted the "AI winter is well on its way" post, which went viral.

First of all, a bit of clarification: some readers have misinterpreted my claims as predicting that the AI hype is declining.

Andrew Ng is a rare example of a person who jumped from an academic bubble into an even bigger AI hype bubble.

I'm pretty certain that following Hotz's lead, many of today's AI hype blowers will be screaming how they've been warning about AI winter all along, once the bubble bursts.

Musk reiterated that he believes in the self-driving Tesla fleet; however, full self-driving remains "off menu" as it was too confusing (two years after introduction of …

1 year, 8 months ago @ blog.piekniewski.info
Deep learning - the "why" question.

There are many many deep learning models out there doing various things.

Science does not need to make gold out of lead every time, or in the case of machine learning, a real scientific paper in this field does not need to beat some current benchmark.

A scientific paper does not even need to answer any questions, if it happens to ask some good ones.

These are mostly the ones which try to show the deficits of deep learning and engage in a discussion as to why that might be the case.

So next time you read a deep learning paper, try to contemplate these quiet and never explained choices the authors have made.

1 year, 9 months ago @ blog.piekniewski.info
Sebastian Ruder
latest post 1 year, 5 months ago
AAAI 2019 Highlights: Dialogue, reproducibility, and more

This post discusses highlights of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).

They provide the illusion of having a dialogue but in fact do not have a clue what we are saying or meaning.

During the panel discussion, Imed Zitouni highlighted that the limitations of current dialogue models affect user behaviour.

ReproducibilityAt the Workshop on Reproducible AI, Joel Grus argued that Jupyter notebooks are bad for reproducibility.

Another good resource for reproducibility is the ML reproducibility checklist by Joelle Pineau, which provides a list of items for algorithms, theory, and empirical results to enforce reproducibility.

1 year, 5 months ago @ ruder.io
The 4 Biggest Open Problems in NLP

This post discusses 4 major open problems in NLP based on an expert survey and a panel discussion at the Deep Learning Indaba.

NLP for low-resource scenarios: Dealing with low-data settings (low-resource languages, dialects (including social media text "dialects"), domains, etc.).

Taking a step back, the actual reason we work on NLP problems is to build systems that break down barriers.

Datasets, problems, and evaluation: Perhaps the biggest problem is to properly define the problems themselves.

The final question asked what the most important NLP problems are that should be tackled for societies in Africa.

1 year, 5 months ago @ ruder.io
10 Exciting Ideas of 2018 in NLP

At EMNLP 2018, unsupervised MT hit its stride with two papers from the same two groups that significantly improve upon their previous methods.

2) Pretrained language models: Using pretrained language models is probably the most significant NLP trend this year, so I won't spend much time on it here.

In particular, combining multilingual transfer learning (such as multilingual BERT), unsupervised learning, and meta-learning is a promising direction.

To me this really shows that pretrained language models indeed capture similar properties as computer vision models pretrained on ImageNet.

(EMNLP 2018): This paper proposes an auxiliary task that pretrains span representations by predicting for eac…

1 year, 6 months ago @ ruder.io
EMNLP 2018 Highlights: Inductive bias, cross-lingual learning, and more

The post discusses highlights of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018).

For another beneficial inductive bias for attention, in one of the best papers of the conference, Strubell et al.

In the model by Zhang et al., sentences are viewed as latent variables for summarization.

show that RNN language models can represent filler-gap dependencies and learn a particular subset of restrictions known as island constraints.

Wood-Doughty et al.

1 year, 8 months ago @ ruder.io
HackerNoon Interview

This post is an interview by fast.ai fellow Sanyam Bhutani with me.

This post originally appeared at HackerNoon with a different introduction.

Sanyam: You’re working as a research scientist today at AYLIEN, and you’re a Ph.D. student at Insight Research Centre for Data Analytics.

If you’re interested in doing research, try to choose a particular subproblem not everyone is working on.

Sanyam: Thank you so much for doing this interview.

1 year, 9 months ago @ ruder.io
A Review of the Neural History of Natural Language Processing

Understanding better what information such language models capture consequently is an active research area (Kuncoro et al., 2018; Blevins et al., 2018).

It is becoming increasingly possible to learn a good projection in a completely unsupervised way (at least for similar languages) (Conneau et al., 2018; Artetxe et al., 2018; Søgaard et al., 2018), which opens applications for low-resource languages and unsupervised machine translation (Lample et al., 2018; Artetxe et al., 2018).

Three main types of neural networks became the most widely used: recurrent neural networks, convolutional neural networks, and recursive neural networks.

Recurrent neural networks: Recurrent neural networks (RNNs) a…

1 year, 9 months ago @ ruder.io
ACL 2018 Highlights: Understanding Representations and Evaluation in More Challenging Settings

They find that all models indeed encode a significant amount of syntax and, in particular, that language models learn some syntax.

Another interesting result regarding the generalization ability of language models is due to Lau et al.

who find that a language model trained on a sonnet corpus captures meter implicitly at human-level performance.

Spithourakis and Riedel observe that language models are bad at modelling numerals and propose several strategies to improve them.

To make this easier, I have recently created a document to collect the state of the art across different NLP tasks.

1 year, 11 months ago @ ruder.io
NLP's ImageNet moment has arrived

Such methods herald a watershed moment: they may have the same wide-ranging impact on NLP as pretrained ImageNet models had on computer vision.

The success of ImageNet highlighted that in the era of deep learning, data was at least as important as algorithms.

Pretraining a language model was first proposed in 2015, but it remained unclear whether a single pretrained language model was useful for many tasks.

One outstanding question is how to transfer the information from a pre-trained language model to a downstream task.

While NLP models are typically more shallow and thus require different fine-tuning techniques than their vision counterparts, recent pretrained models are getting deeper.

1 year, 12 months ago @ ruder.io
Tracking the Progress in Natural Language Processing

This post introduces a resource to track the progress and state-of-the-art across many tasks in NLP.

Go directly to the document tracking the progress in NLP.

Research in Machine Learning and in Natural Language Processing (NLP) is moving so fast these days that it is hard to keep up.

The Electronic Frontier Foundation and the AI Index try to do something similar for all of AI but only cover a few language tasks.

The Language Resources and Evaluation (LRE) Map collects language resources presented at LREC and other conferences, but does not allow breaking them out by task or popularity.

2 years ago @ ruder.io
Highlights of NAACL-HLT 2018: Generalization, Test-of-time, and Dialogue Systems

This post discusses highlights of NAACL-HLT 2018.

Specifically, my highlights concentrate on three topics, which were prominent throughout the conference: Generalization, the Test-of-Time awards, and Dialogue Systems.

Embeddings from Language Models (ELMo) showed significant improvements over the state-of-the-art on a wide range of tasks as can be seen below.

However, current neural NLG heavily depends on language models and neural NLG can be brittle; in many cases, baselines based on templates can actually work better.

Secondly, language models are surface learners: they need “world” models and must be sensitive to the “latent process” behind language.

2 years ago @ ruder.io
An overview of proxy-label approaches for semi-supervised learning

These can be different network architectures in the case of neural networks or completely different learning algorithms.

For multi-view learning, different models work together to teach each other, alternately acting as both teachers and students.

Learning with noisy labels Learning with noisy labels is similar to learning from weak supervision.

For learning with noisy labels, labels are typically assumed to be permuted with a fixed random permutation.

While proxy-label approaches supply the noisy labels themselves, when learning with noisy labels, the labels are part of the data.
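
The "fixed random permutation" noise model mentioned above can be sketched concretely (an illustrative toy, with hypothetical class count and noise rate): every corrupted example of class c is relabelled as perm[c], the same way across the whole dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

num_classes = 4
clean = rng.integers(0, num_classes, size=20)

# A fixed permutation of the label set: every corrupted example of class c
# is relabelled as perm[c], consistently throughout the dataset.
perm = np.array([1, 2, 3, 0])          # c -> (c + 1) mod 4

# Corrupt roughly 30% of the labels with that fixed permutation.
noise_mask = rng.random(20) < 0.3
noisy = np.where(noise_mask, perm[clean], clean)

# Unaffected labels are unchanged; affected ones follow the permutation exactly.
print(np.array_equal(noisy[~noise_mask], clean[~noise_mask]))
print(np.array_equal(noisy[noise_mask], perm[clean[noise_mask]]))
```

This structure is what distinguishes permutation noise from arbitrary label corruption: the corruption is systematic, so it can in principle be estimated and inverted.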

2 years, 2 months ago @ ruder.io
Text Classification with TensorFlow Estimators

In particular, this article demonstrates how to solve a text classification task using custom TensorFlow estimators, embeddings, and the tf.layers module.

dataset = dataset.map(parser)
dataset = dataset.batch(100)

predict_input_fn = tf.estimator.inputs.numpy_input_fn(x={"x": x, "len": length}, shuffle=False)
predictions = [p['logistic'][0] for p in classifier.predict(input_fn=predict_input_fn)]
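
The map/batch pipeline excerpted above can be illustrated without TensorFlow at all (a plain-Python sketch of the semantics only; the real article uses tf.data and tf.estimator, and the names map_fn, batch, and parser here are illustrative):

```python
# Plain-Python sketch of map -> batch dataset semantics.
def map_fn(dataset, fn):
    """Apply fn to every example, lazily, like Dataset.map."""
    for example in dataset:
        yield fn(example)

def batch(dataset, size):
    """Group consecutive examples into lists of `size`, like Dataset.batch."""
    buf = []
    for example in dataset:
        buf.append(example)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf          # final partial batch

# A "parser" that turns a raw record into (tokens, length).
parser = lambda text: (text.split(), len(text.split()))

records = ["good movie", "terrible plot twist", "fine"]
batches = list(batch(map_fn(records, parser), 2))
print(len(batches), batches[0][0])
```

The real tf.data pipeline works the same way conceptually, but builds a lazy graph of these transformations instead of Python generators.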

2 years, 2 months ago @ ruder.io
Requests for Research

This post aims to provide inspiration and ideas for research directions to junior researchers and those trying to get into research.

Machine learning research in particular moves so fast these days that it is difficult to find an opening.

Recent work focuses on creating adversarial examples either by replacing words or characters (Samanta and Mehta, 2017; Ebrahimi et al., 2017), concatenation (Jia and Liang, 2017), or adding adversarial perturbations (Yasunaga et al., 2017).
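
A character-level perturbation of the kind surveyed above can be sketched very simply (an illustrative toy, not any of the cited attacks; swap_chars is a hypothetical helper): swap two adjacent characters inside a word, a "typo" that stays readable to humans but can flip a brittle classifier.

```python
# Minimal character-swap perturbation for text.
def swap_chars(text, word_index, char_index):
    """Swap the characters at char_index and char_index + 1 in one word."""
    words = text.split()
    w = list(words[word_index])
    w[char_index], w[char_index + 1] = w[char_index + 1], w[char_index]
    words[word_index] = "".join(w)
    return " ".join(words)

original = "the film was absolutely wonderful"
adversarial = swap_chars(original, word_index=4, char_index=1)
print(adversarial)   # "the film was absolutely wnoderful"
```

A real attack would search over positions (often guided by gradients) for the swap that most changes the model's prediction; this shows only the perturbation itself.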

If the representations are disentangled as in (Hu et al., 2017), then we are also not too far from style transfer (Shen et al., 2017).

Recently proposed methods such as cross-stitch units (Misra et al., 2017; Ruder…

2 years, 4 months ago @ ruder.io
Optimization for Deep Learning Highlights in 2017

This indicates that from the Machine Learning practitioner's perspective, best practices for optimization for Deep Learning have largely remained the same.

An important hyperparameter for optimization in Deep Learning is the learning rate \(\eta\).

It is often thought that adaptive learning rate methods such as Adam are more robust to different learning rates, as they update the learning rate themselves.

In fact, learning rate annealing schedule engineering seems to be the new feature engineering as we can often find highly-tuned learning rate annealing schedules that improve the final convergence behaviour of our model.

Learning rate annealing with warm restarts is also known as cyclical l…
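
The warm-restart schedule described above can be sketched with the standard SGDR cosine formula (a minimal illustration with hypothetical hyperparameter values and a fixed cycle length):

```python
import math

def sgdr_lr(step, eta_max=0.1, eta_min=0.001, cycle_len=100):
    """Cosine annealing with warm restarts (SGDR-style, fixed cycle length).

    Within each cycle the rate decays from eta_max to eta_min along a cosine,
    then "restarts" back at eta_max.
    """
    t_cur = step % cycle_len
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / cycle_len))

print(round(sgdr_lr(0), 4))     # start of a cycle: eta_max
print(round(sgdr_lr(99), 4))    # near the end of a cycle: close to eta_min
print(round(sgdr_lr(100), 4))   # warm restart: back to eta_max
```

The original SGDR paper additionally lengthens each successive cycle; the fixed cycle length here keeps the sketch short.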

2 years, 7 months ago @ ruder.io
Word embeddings in 2017: Trends and future directions

Another interesting approach for generating OOV word embeddings is to train a character-based model to explicitly re-create pre-trained embeddings (Pinter et al., 2017).

(2017) are one of the first to show results on topic categorization as a downstream task; while multi-sense embeddings outperform randomly initialized word embeddings in their experiments, they are outperformed by pre-trained word embeddings.

This allows us to reveal laws of semantic change (Hamilton et al., 2016; Bamler & Mandt, 2017; Dubossarsky et al., 2017), to model temporal word analogy or relatedness (Szymanski, 2017; Rosin et al., 2017), or to capture the dynamics of semantic relations (Kutuzov et al., …

2 years, 8 months ago @ ruder.io
💼 University and corporation labs
DeepMind
latest post 2 weeks, 4 days ago
Applying for technical roles

What can I expect in the interview process?

Feryal: The interview process at DeepMind can vary depending on the particular role you’re applying for.

Phase two - technical interviews: This part of the process involves several sessions - including one with a technical quiz that covers a large breadth of topics in computer science, statistics, mathematics and machine learning.

[~30min] interviews with researchers and leads about your specific research background and interests.

Phase four - culture interview: Towards the end of the interview process, you will once again connect with the recruitment team to discuss DeepMind’s culture and mission.

2 weeks, 4 days ago @ deepmind.com
Using AI to predict retinal disease progression

The ‘dry’ form is relatively common among people over 65, and usually causes only mild sight loss.

Our contribution highlights the potential of using AI in preventative studies for diseases such as exAMD.

The Moorfields Eye Hospital AMD dataset: We used a dataset of anonymised retinal scans from Moorfields patients with exAMD in one eye, and at high risk of developing exAMD in their other eye.

To address this, we worked with retinal experts to review all scans for each eye and specify the scan when exAMD was first evident.

In our previous work, now continuing in collaboration with Google Health, we developed a model capable of segmenting these eye scans into thirteen anatomical categories.

1 month, 3 weeks ago @ deepmind.com
Specification gaming: the flip side of AI ingenuity

Specification gaming is a behaviour that satisfies the literal specification of an objective without achieving the intended outcome.

We have all had experiences with specification gaming, even if not by this name.

In this post, we review possible causes for specification gaming, share examples of where this happens in practice, and argue for further work on principled approaches to overcoming specification problems.

In a Lego stacking task, the desired outcome was for a red block to end up on top of a blue block.

The agent was rewarded for the height of the bottom face of the red block when it is not touching the block.
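
The Lego example above can be captured in a toy sketch (illustrative numbers and a hypothetical reward function, not DeepMind's actual environment): because the reward only checks the height of the red block's bottom face, flipping the block upside down on the floor scores exactly as well as stacking it.

```python
BLOCK_SIZE = 1.0   # height of one block

def reward(red_bottom_height, touching_blue):
    """Mis-specified reward: height of the red block's bottom face,
    paid out only when the red block is not touching the blue block."""
    return red_bottom_height if not touching_blue else 0.0

# Intended behaviour: place red on top of blue, so its bottom face
# sits one block height above the floor.
stacking = reward(red_bottom_height=BLOCK_SIZE, touching_blue=False)

# Specification gaming: flip the red block over on the floor, so the face
# labelled "bottom" now points up at one block height -- no stacking at all.
flipping = reward(red_bottom_height=BLOCK_SIZE, touching_blue=False)

# The reward cannot distinguish the two behaviours.
print(stacking == flipping)
```

The literal specification is satisfied either way, which is exactly the gap between the objective as written and the outcome the designers intended.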

2 months, 3 weeks ago @ deepmind.com
Towards understanding glasses with graph neural networks

The practical implications of modelling glass: The glass transition is a ubiquitous phenomenon which manifests in more than window (silica) glasses.

Understanding the glass transition may result in other applications of disordered materials, in fields as diverse as biorenewable polymers and food processing.

Our new work, published in Nature Physics, could help us gain an understanding of the structural changes that may occur near the glass transition.

Leveraging graph neural networks to model glassy dynamics: Glasses can be modelled as particles interacting via a short-range repulsive potential which essentially prevents particles from getting too close to each other.

We then trained a neural n…

3 months ago @ deepmind.com
Agent57: Outperforming the human Atari benchmark

Combining off-policy learning with memory is challenging because you need to know what you might remember when executing a different behaviour.

Within that strand, we distinguish two types of rewards: firstly, long-term novelty rewards encourage visiting many states throughout training, across many episodes.

Secondly, short-term novelty rewards encourage visiting many states over a short span of time (e.g., within a single episode of a game).
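
The two time scales of novelty described above can be sketched with simple count-based bonuses (hypothetical formulas for illustration only; Agent57's actual bonuses use learned embeddings and density models): a long-term bonus that decays with lifetime visit counts, and a short-term bonus that resets every episode.

```python
import math
from collections import defaultdict

lifetime_counts = defaultdict(int)

def long_term_bonus(state):
    """Decays with how often the state was visited across all of training."""
    lifetime_counts[state] += 1
    return 1.0 / math.sqrt(lifetime_counts[state])

def short_term_bonus(state, episode_visited):
    """Rewards the first visit to a state within the current episode."""
    novel = state not in episode_visited
    episode_visited.add(state)
    return 1.0 if novel else 0.0

# Episode 1: state "A" is novel on both time scales.
ep1 = set()
print(long_term_bonus("A"), short_term_bonus("A", ep1))

# Episode 2: "A" is novel again *within* the episode, but its
# long-term bonus has already decayed.
ep2 = set()
print(round(long_term_bonus("A"), 3), short_term_bonus("A", ep2))
```

The contrast in episode 2 is the point: short-term novelty keeps encouraging broad coverage within each episode, while long-term novelty gradually loses interest in states seen across training.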

However, learning density models of high dimensional spaces is fraught with problems due to the curse of dimensionality.

For example, in Montezuma’s Revenge, unlike undirected exploration strategies, long-term novelty rewards allow the agent to surpass…

3 months, 1 week ago @ deepmind.com
A new model and dataset for long-range memory

Modelling natural language: Finding machine learning tasks which both drive the development of better memory architectures and push us further towards artificial general intelligence is challenging.

Transferring knowledge: Such samples would likely astound Shannon, 70 years on from his early language model experiments.

Google’s prominent natural language model, BERT, achieves state-of-the-art performance on a wide array of NLP benchmarks, and is now a part of Google Search.

Benchmarking language models: A popular long-range language model benchmark is WikiText-103, which comprises English-language Wikipedia articles and was developed by researchers at Salesforce AI.

As such, we’ve compiled…

5 months ago @ deepmind.com
Dopamine and temporal difference learning: A fruitful relationship between neuroscience and AI

Meanwhile, in close contact with this study of reward learning in animals, computer scientists have developed algorithms for reinforcement learning in artificial systems.

A chain of prediction: temporal difference learning. Reinforcement learning is one of the oldest and most powerful ideas linking neuroscience and AI.

An important breakthrough in solving the problem of reward prediction was the temporal difference (TD) learning algorithm.
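
The TD idea can be shown in a few lines (a minimal TD(0) sketch on a hypothetical three-state chain, not the paper's setup): each state's value estimate is nudged toward the reward plus the value of the next state, so reward predictions propagate backwards along the chain.

```python
# Minimal TD(0) prediction on a fixed chain: s0 -> s1 -> s2 -> terminal,
# with reward 1.0 delivered only at the end.
def td0(episodes, alpha=0.1, gamma=1.0, n_states=3):
    V = [0.0] * n_states
    for _ in range(episodes):
        transitions = [(0, 1, 0.0), (1, 2, 0.0), (2, None, 1.0)]
        for s, s_next, r in transitions:
            target = r + (gamma * V[s_next] if s_next is not None else 0.0)
            V[s] += alpha * (target - V[s])   # TD error drives the update
    return V

V = td0(episodes=500)
print([round(v, 2) for v in V])   # every state's value approaches 1.0
```

The TD error in the update, target minus current estimate, is the quantity that dopamine neuron firing was later found to resemble.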

Around the same time, in the late 80s and early 90s, neuroscientists were struggling to understand the behaviour of dopamine neurons.

Distributional reinforcement learning

5 months, 4 weeks ago @ deepmind.com
AlphaFold: Using AI for scientific discovery

In our study published today in Nature, we demonstrate how artificial intelligence research can drive and accelerate new scientific discoveries.

Our system, AlphaFold – described in peer-reviewed papers now published in Nature and PROTEINS – is the culmination of several years of work, and builds on decades of prior research using large genomic datasets to predict protein structure.

What is the protein folding problem?

What any given protein can do depends on its unique 3D structure.

Why is protein folding important?

5 months, 4 weeks ago @ deepmind.com
Using WaveNet technology to reunite speech-impaired users with their original voices

This post details a recent project we undertook with Google and ALS campaigner Tim Shaw, as part of Google’s Euphonia project.

We demonstrate an early proof of concept of how text-to-speech technologies can synthesise a high-quality, natural sounding voice using minimal recorded speech data.

But message banking lacks flexibility, resulting in a static dataset of phrases.

Now imagine that you were given the chance to preserve your voice by recording as much of it as possible.

And people who aren’t able to record phrases in time are left to choose a generic computer synthesized voice that lacks the same power of connection as their own.

6 months, 3 weeks ago @ deepmind.com
Learning human objectives by evaluating hypothetical behaviours

TL;DR: We present a method for training reinforcement learning agents from human feedback in the presence of unknown unsafe states.

Training RL agents in the presence of unsafe states is known as the safe exploration problem.

The agent has one source of information: feedback about unsafe states from a human user.

Existing methods for training agents from human feedback ask the user to evaluate data of the agent acting in the environment.

The user provides feedback on this hypothetical behaviour, and the system interactively learns a model of the user's reward function.

7 months ago @ deepmind.com
From unlikely start-up to major scientific organisation: Entering our tenth year at DeepMind

Pioneering research, growing impact: A mission this ambitious requires pioneering research on many fronts over many years.

As our research matures, we’ve been finding more opportunities to partner with others for social and commercial impact, often with our colleagues across Alphabet.

Entering our next phase: As I discussed with Wired in the summer, this year feels like the start of a new phase for DeepMind as an established scientific organisation.

Over the past year, we’ve also been formalising a leadership team with the seasoned experience and skills for our second decade.

Right back to our origins blending neuroscience with machine learning, we’ve found that breakthroughs happen faster when…

7 months, 1 week ago @ deepmind.com
Strengthening the AI community

For me, it was being awarded an internship at Intel, the first one ever through Purdue’s Co-Op Engineering program in 1990.

I just didn’t know if I had the right technical skills for the work, or if engineering was really my path.

It grew into a very successful 18-year career at Intel and a 25-year career in tech.

At DeepMind we want to build advanced AI to expand our knowledge and find answers to some of the fundamental questions facing society.

DeepMind Scholarships to open the field of AI: The DeepMind scholarship programme is one way we seek to broaden participation in science and AI.

7 months, 3 weeks ago @ deepmind.com
Advanced machine learning helps Play Store users discover personalised apps

Candidate generator unbiasing: Our model (called a candidate generator) learns what apps a user is more likely to install based on previous apps they’ve installed from the Play store.

The model therefore learns a bias that favours the apps that are shown – and thus installed – more often.

An importance weight is based on the impression-to-install rate of each individual app in comparison with the median impression-to-install rate across the Play store.

Through importance weighting, our candidate generator can downweight or upweight apps based on their install rates, which mitigates the recommendation bias problem.
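
The importance-weighting idea described above can be sketched with hypothetical numbers (illustrative only; the real system computes these statistics at Play-store scale): weight each app by its impression-to-install rate relative to the median rate.

```python
import statistics

# Hypothetical impression-to-install rates for four apps.
install_rates = {"app_a": 0.20, "app_b": 0.05, "app_c": 0.10, "app_d": 0.40}

# Compare each app against the median rate across the catalogue.
median_rate = statistics.median(install_rates.values())   # 0.15 here

# Importance weight: rate relative to the median.
weights = {app: rate / median_rate for app, rate in install_rates.items()}

# Apps converting above the median are upweighted, below it downweighted.
print(weights["app_d"] > 1.0, weights["app_b"] < 1.0)
```

Weighting training examples this way counteracts the feedback loop where frequently shown apps are installed, and therefore recommended, more often.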

Our solution to this, the reranker model, learns the relative importance of a p…

7 months, 3 weeks ago @ deepmind.com
AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning

Since then, we have taken on a much greater challenge: playing the full game at a Grandmaster level under professionally approved conditions.

AlphaStar can now play in one-on-one matches as and against Protoss, Terran, and Zerg – the three races present in StarCraft II.

Each of the Protoss, Terran, and Zerg agents is a single neural network.

We chose to use general-purpose machine learning techniques – including neural networks, self-play via reinforcement learning, multi-agent learning, and imitation learning – to learn directly from game data.

Using the advances described in our Nature paper, AlphaStar was ranked above 99.8% of active players on Battle.net…

8 months, 2 weeks ago @ deepmind.com
Causal Bayesian Networks: A flexible tool to enable fairer machine learning

This simplified example shows how CBNs can provide us with a visual framework for describing different possible unfairness scenarios.

It is nevertheless necessary to avoid pitfalls when evaluating or designing a decision system.

This means that it would be possible for the system to be deemed fair, even if it carries the unfair influence: this would automatically be the case for an error-free decision system.

On the other hand, if the path G→D→A was considered fair, it would be inappropriate to use statistical parity.

Path-specific techniques enable us to estimate the influence that a sensitive attribute has on other variables along specific sets of causal paths.
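
The statistical parity criterion mentioned above can be made concrete with a tiny sketch (hypothetical records, not a real dataset): compare acceptance rates across the two values of a sensitive attribute A.

```python
# Toy statistical-parity check on hypothetical decision records.
records = [
    {"A": 0, "decision": 1}, {"A": 0, "decision": 1}, {"A": 0, "decision": 0},
    {"A": 1, "decision": 1}, {"A": 1, "decision": 0}, {"A": 1, "decision": 0},
]

def acceptance_rate(records, group):
    """P(decision = 1 | A = group), estimated from the records."""
    group_rows = [r for r in records if r["A"] == group]
    return sum(r["decision"] for r in group_rows) / len(group_rows)

# Statistical parity holds when this gap is zero.
parity_gap = acceptance_rate(records, 0) - acceptance_rate(records, 1)
print(round(parity_gap, 3))   # 2/3 - 1/3 = 0.333
```

As the post argues, a nonzero (or zero) gap alone says nothing about which causal paths produced it, which is why path-specific analyses are needed.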

9 months, 1 week ago @ deepmind.com
Google
latest post 12 hours ago
Grounding Natural Language Instructions to Mobile UI Actions

The action phrase extraction model takes a word sequence of a natural language instruction and outputs a sequence of spans (denoted in red boxes) that indicate the phrases describing the operation, the object and the argument of each action in the task.

12 hours ago @ ai.googleblog.com
AutoML Tables: end-to-end workflows on AI Platform Pipelines

To help make AutoML Tables more useful and user-friendly, we’ve released a number of new features. This post gives a tour of some of them via a Cloud AI Platform Pipelines example that shows end-to-end management of an AutoML Tables workflow.

Cloud AI Platform Pipelines provides a way to deploy robust, repeatable machine learning pipelines along with monitoring, auditing, version tracking, and reproducibility, and delivers an enterprise-ready, easy to install, secure execution environment for your ML workflows.

Using Cloud AI Platform Pipelines to orchestrate a Tables workflow: Cloud AI Platform Pipelines, now in Beta, provides a way to deploy robust, repeatable machin…

12 hours ago @ cloud.google.com
Tools for language access during COVID-19

How machine translation can help: Machine translation is an automated way to translate text or speech from one language to another.

It can take volumes of data and provide translations into a large number of supported languages.

To translate web content in Chrome, all you have to do is go to a webpage in another language, then click “Translate” at the top.

Use a website translation widget: If you are a webmaster of a government, non-profit, and/or non-commercial website (e.g. academic institutions), you may be eligible to sign up for the Google Translate Website Translator widget.

13 hours ago @ blog.google
An update on our work on AI and responsible innovation

As a leader in AI, we’ve always prioritized the importance of understanding its societal implications and developing it in a way that gets it right for everyone.

That’s why we first published our AI Principles two years ago and why we continue to provide regular updates on our work.

As our CEO Sundar Pichai said in January, developing AI responsibly and with social benefit in mind can help avoid significant challenges and increase the potential to improve billions of lives.

As we develop AI we are committed to testing safety, measuring social benefits, and building strong privacy protections into products.

Building on previous AI Principles updates we shared here on the Keyword in 2018 and …

1 day, 11 hours ago @ blog.google
AutoML-Zero: Evolving Code that Learns

A population is initialized with empty programs.

Many generations later, we see a more evolved population and two of its algorithms compete.

The most accurate wins to produce a child.

After many such events, the final population contains highly accurate classifiers.
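
The evolutionary loop summarised above can be sketched in miniature (an illustrative toy, not AutoML-Zero's actual search space: each "program" here is just a slope a in y = a·x, evaluated against a true slope of 3):

```python
import random

random.seed(0)

def accuracy(a):
    """Higher is better: negative squared error against the true slope 3."""
    xs = [1.0, 2.0, 3.0]
    return -sum((a * x - 3.0 * x) ** 2 for x in xs)

population = [0.0] * 10          # population initialized with "empty" programs

for _ in range(500):
    # Two random individuals compete; the most accurate wins...
    i, j = random.sample(range(len(population)), 2)
    parent = max(population[i], population[j], key=accuracy)
    # ...and produces a mutated child that replaces the oldest individual.
    child = parent + random.gauss(0.0, 0.1)
    population.pop(0)
    population.append(child)

best = max(population, key=accuracy)
print(round(best, 1))   # the best slope drifts toward 3.0
```

AutoML-Zero mutates actual instruction sequences rather than a single parameter, but the compete-mutate-replace cycle is the same.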

1 day, 12 hours ago @ ai.googleblog.com
Ask a Techspert: How do machine learning models explain themselves?

Since machine learning is used in our everyday lives, it’s also important for everyone to understand how it impacts us.

It’s also important for developers and decision-makers to be able to explain or present a machine learning model to people.

This is what we call “interpretability.” How do you make machine learning models easier to understand and interpret?

What do you need to be careful of when you’re making conclusions based on machine learning models?

Machine learning technology is a powerful tool that will transform society as we know it, and helping others to use it safely is very rewarding.

1 day, 14 hours ago @ blog.google
Duality — A New Approach to Reinforcement Learning

A plot of the average reward achieved by an agent using the duality-based approach (blue) compared to an agent using standard actor-critic (orange).

In addition to being more mathematically principled, our approach also yields better practical results.

2 days, 12 hours ago @ ai.googleblog.com
New Compute Engine A2 VMs—first NVIDIA Ampere A100 GPUs in the cloud

Today, we’re excited to introduce the Accelerator-Optimized VM (A2) family on Google Compute Engine, based on the NVIDIA Ampere A100 Tensor Core GPU.

With up to 16 GPUs in a single VM, A2 VMs are the first A100-based offering in the public cloud, and are available now via our private alpha program, with public availability later this year.

Accelerator-Optimized VMs with NVIDIA Ampere A100 GPUs: The A2 VM family was designed to meet today’s most demanding applications—workloads like CUDA-enabled machine learning (ML) training and inference, and high performance computing (HPC).

Each A100 GPU offers up to 20x the compute performance compared to the previous generation GPU and comes with 40 GB o…

3 days, 13 hours ago @ cloud.google.com
Google at ACL 2020


4 days, 13 hours ago @ ai.googleblog.com
SmartReply for YouTube Creators

A 2D projection of the model encodings when presented with a hypothetical comment and a small list of potential replies.

The neighborhood surrounding English comments (black color) consists of appropriate replies in English and their counterparts in Spanish and Arabic.

Note that the network learned to align English replies with their translations without access to any parallel corpus.

1 week, 2 days ago @ ai.googleblog.com
SpineNet: A Novel Architecture for Object Detection Discovered with Neural Architecture Search

A scale-decreased backbone is shown on the left and a scale-permuted backbone is shown on the right.

We define the search space of scale permutations by rearranging intermediate and output blocks, respectively.

Cross-scale connections: We define two input connections for each block in the search space.

The architecture search process from a scale-decreased backbone to a scale-permuted backbone.

1 week, 3 days ago @ ai.googleblog.com
Google Cloud’s AI Adoption Framework: Helping you build a transformative AI capability

We believe that enterprises that invest in building AI solutions are better positioned to be the industry leaders of tomorrow.

When building an AI capability, executives often ask us: “Which skills should we hire and how should we structure our teams?” “What ML projects should we prioritise?

This whitepaper aims to provide a guiding framework for technology leaders who want to leverage the power of AI to transform their business.

The AI Adoption Framework builds a structure on four areas: people, process, technology, and data.

The interplay between these areas highlights six themes that are critical for success: Lead, Learn, Access, Scale, Automate, and Secure.

1 week, 4 days ago @ cloud.google.com
Reimagining government social services in the COVID-19 era

The coronavirus has shown what can happen when IT support systems get pushed to their limits.

States provide support to citizens in all aspects of life, but the coronavirus has made that very difficult, given that shelter-at-home orders have increased states’ need to tap into more modern communication channels to connect with their citizens.

Providing better service over the phone and web: This pandemic has left millions of people in need of state support, leading to unprecedented call volume and web traffic that many states’ legacy technology simply can’t support.

Our modern, cloud-based architecture can help provide flexible user experiences and make changes on the fly, for the web or mobil…

1 week, 4 days ago @ cloud.google.com
How the Google AI Community Used Cloud to Help Biomedical Researchers

The goal—to further our understanding about coronaviruses and other diseases—caught the attention of many in the health policy, research and medical community.

Enter the Google artificial intelligence (AI) community.

With the support of Google Cloud credits and credits from the TensorFlow Research Cloud (TFRC), the ML GDEs began to tackle the problem of understanding the research literature.

The team came together in April under the audacious name of ‘AI versus COVID-19’ (aiscovid19.org) and established the objective of using state-of-the-art machine learning and cloud technologies to help biomedical researchers discover new insights, faster, from research literature.

Introducing BREATHE: The…

2 weeks ago @ cloud.google.com
Leveraging Temporal Context for Object Detection

Here, we can see how additional examples from the same scene help experts determine that the object is an animal and not background.

Context such as the shape & size of the object, its attachment to a herd, and habitual grazing at certain times of day help determine that the species is a wildebeest.

Useful examples occur throughout the month.

2 weeks ago @ ai.googleblog.com
OpenAI
latest post 1 day, 14 hours ago
OpenAI Scholars Spring 2020: Final Projects

Our third class of OpenAI Scholars presented their final projects at a virtual Demo Day, showcasing their research results from the past five months.

The OpenAI Scholars program provides stipends and mentorship to individuals from underrepresented groups to study deep learning and open-source a project.

Learn more about our Scholars program.

I joined the Scholars program in order to learn from the brilliant folks at OpenAI and to immerse myself in AI research.

The OpenAI Scholars program was this magical opportunity to get started by learning from the very best minds in the field.

1 day, 14 hours ago @ openai.com
Image GPT

However, the same broad class of models has not been successful in producing strong features for image classification.

From language GPT to image GPT: In language, unsupervised learning algorithms that rely on word prediction (like GPT-2 and BERT) have been extremely successful, achieving top performance on a wide array of language tasks.

Because masked language models like BERT have outperformed generative models on most language tasks, we also evaluate the performance of BERT on our image models.

LimitationsWhile we have shown that iGPT is capable of learning powerful image features, there are still significant limitations to our approach.

Notably, we achieved our results by directly applyi…

3 weeks, 2 days ago @ openai.com
OpenAI API

We’re releasing an API for accessing new AI models developed by OpenAI.

We will terminate API access for obviously harmful use-cases, such as harassment, spam, radicalization, or astroturfing.

What specifically will OpenAI do about misuse of the API, given what you’ve previously said about GPT-2?

How will OpenAI mitigate harmful bias and other negative effects of models served by the API?

Our API models could also cause harm in ways that we haven’t thought of yet.

4 weeks, 1 day ago @ openai.com
Procgen and MineRL Competitions

We’re excited to announce that OpenAI is co-organizing two NeurIPS 2020 competitions with AIcrowd, Carnegie Mellon University, and DeepMind, using Procgen Benchmark and MineRL.

Procgen Competition: The Procgen Competition focuses on improving sample efficiency and generalization in reinforcement learning.

Since all content is procedurally generated, each Procgen environment intrinsically requires agents to generalize to never-before-seen situations.

Moreover, we designed Procgen environments to be fast and simple to use.

One well-known way to reduce the environment sample complexity is to leverage human priors and demonstrations of the desired behavior.

1 month ago @ openai.com
AI and Efficiency

Other measures of AI progressIn addition to efficiency, many other measures shed light on overall algorithmic progress in AI.

ShuffleNet achieved AlexNet-level performance with an 18x inference efficiency increase in 5 years (a 15-month doubling time), which suggests that training efficiency and inference efficiency might improve at similar rates.
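The 15-month doubling time quoted above follows from the 18x-in-5-years figure by simple arithmetic, since an 18x gain over 60 months means 18 = 2^(60/d):

```python
import math

# An 18x efficiency gain over 5 years implies a doubling time of
# 60 / log2(18) months, because 18 = 2 ** (60 / doubling_time).
months = 5 * 12
doubling_time = months / math.log2(18.0)
print(round(doubling_time, 1))  # → 14.4 months, consistent with the stated 15-month figure
```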

This efficiency analysis suggests that policymakers could develop accurate intuitions about the cost of deploying AI capabilities—and how these costs will change over time—by more closely assessing the rate of improvements in efficiency for AI systems.

Our results suggest that for AI tasks with high levels of investment (researcher time and/o…

2 months ago @ openai.com
Jukebox

Curated samples: Provided with genre, artist, and lyrics as input, Jukebox outputs a new music sample produced from scratch.

We can then train a model to generate audio in this compressed space, and upsample back to the raw audio space.

Now in raw audio, our models must learn to tackle high diversity as well as very long range structure, and the raw audio domain is particularly unforgiving of errors in short, medium, or long term timing.

To better understand future implications for the music community, we shared Jukebox with an initial set of 10 musicians from various genres to discuss their feedback on this work.

While Jukebox is an interesting research result, these musicians did not find …

2 months, 1 week ago @ openai.com
Improving Verifiability in AI Development

Can I (as an academic) conduct impartial research on the risks associated with large-scale AI systems when I lack the computing resources of industry?

Can I (as an AI developer) verify that my competitors in a given area of AI development will follow best practices rather than cut corners to gain an advantage?

AI developers should pilot bias and safety bounties for AI systems to strengthen incentives and processes for broad-based scrutiny of AI systems.

Standard setting bodies should work with academia and industry to develop audit trail requirements for safety-critical applications of AI systems.

Organizations developing AI and funding bodies should support research into the interpretabili…

2 months, 3 weeks ago @ openai.com
OpenAI Microscope

We’re introducing OpenAI Microscope, a collection of visualizations of every significant layer and neuron of eight vision “model organisms” which are often studied in interpretability.

Microscope makes it easier to analyze the features that form inside these neural networks, and we hope it will help the research community as we move towards understanding these complicated systems.

This is the goal of the OpenAI Microscope.

Microscope systematically visualizes every neuron in several commonly studied vision models, and makes all of those neurons linkable.

Our initial release includes nine frequently studied vision models, along with several visualization techniques we’ve found particularly u…

2 months, 3 weeks ago @ openai.com
OpenAI → PyTorch

We are standardizing OpenAI’s deep learning framework on PyTorch.

The main reason we've chosen PyTorch is to increase our research productivity at scale on GPUs.

It is very easy to try and execute new research ideas in PyTorch; for example, switching to PyTorch decreased our iteration time on research ideas in generative modeling from weeks to days.

Going forward we'll primarily use PyTorch as our deep learning framework but sometimes use other ones when there's a specific technical reason to do so.

Many of our teams have already made the switch, and we look forward to contributing to the PyTorch community in upcoming months.

5 months, 1 week ago @ openai.com
OpenAI Five

You play against [OpenAI Five] and you realize it has a playstyle that is different.

It’s doing things that you’ve never done and you’ve never seen.

One key learning that we took is how it was allocating resources.

It’s just allocating resources as efficiently as possible.

[…] If OpenAI does that dynamic switch at 100%, we maybe went from 5% to 10%?

7 months ago @ openai.com
Deep Double Descent

Many classes of modern deep learning models, including CNNs, ResNets, and transformers, exhibit the previously-observed double descent phenomenon when not using early stopping or regularization.

The model-wise double descent phenomenon can lead to a regime where training on more data hurts.

The double descent phenomenon is most prominent in settings with added label noise; without it, the peak is smaller and easy to miss.

For a given number of optimization steps (fixed y-coordinate), test and train error exhibit model-size double descent.

We leave fully understanding the mechanisms behind double descent in deep neural networks as an important open question.

7 months, 1 week ago @ openai.com
Procgen Benchmark

We’re releasing Procgen Benchmark, 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills.

To fulfill this need, we have created Procgen Benchmark.

CoinRun now serves as the inaugural environment in Procgen Benchmark, contributing its diversity to a greater whole.

With Procgen Benchmark, we strive for all of the following: experimental convenience, high diversity within environments, and high diversity across environments.

We've now expanded on those results, conducting our most thorough study of RL generalization to date using all 16 environments in Procgen Benchmark.

7 months, 1 week ago @ openai.com
Safety Gym

We're releasing Safety Gym, a suite of environments and tools for measuring progress towards reinforcement learning agents that respect safety constraints while training.

Safety Gym: To study constrained RL for safe exploration, we developed a new set of environments and tools called Safety Gym.

Benchmark: To help make Safety Gym useful out-of-the-box, we evaluated some standard RL and constrained RL algorithms on the Safety Gym benchmark suite: PPO, TRPO, Lagrangian penalized versions of PPO and TRPO, and Constrained Policy Optimization (CPO).

There are three things we are most interested in at the moment: improving performance on the current Safety Gym environments.

We also hope that systems l…

7 months, 3 weeks ago @ openai.com
GPT-2: 1.5B Release

As the final model release of GPT-2’s staged release, we’re releasing the largest version (1.5B parameters) of GPT-2 along with code and model weights to facilitate detection of outputs of GPT-2 models.

Our partners at Cornell University surveyed people to assign GPT-2 text a credibility score across model sizes.

People gave the 1.5B model a “credibility score” of 6.91 out of 10.

These results make us more inclined to release the 1.5B model, as the incremental increase in human-perceived credibility relative to 774M seems low.

We acknowledge that we cannot be aware of all threats, and that motivated actors can replicate language models without model release.

8 months, 1 week ago @ openai.com
Solving Rubik’s Cube with a Robot Hand

We've trained a pair of neural networks to solve the Rubik’s Cube with a human-like robot hand.

Since May 2017, we've been trying to train a human-like robotic hand to solve the Rubik’s Cube.

Solving a Rubik’s Cube one-handed is a challenging task even for humans, and it takes children several years to gain the dexterity required to master it.

To test the limits of our method, we experiment with a variety of perturbations while the hand is solving the Rubik’s Cube.

Behind the scenes: Rubik’s Cube prototypes In order to benchmark our progress and make the problem tractable, we built and designed custom versions of cubes as stepping stones towards ultimately solving a regular Rubik’s Cube.

8 months, 4 weeks ago @ openai.com
Microsoft
latest post 2 days, 12 hours ago
Azure AI: Build mission-critical AI apps with new Cognitive Services capabilities

Building on our vision to empower all developers to use AI to achieve more, today we’re excited to announce expanded capabilities within Azure Cognitive Services.

These types of documents typically require manual labeling by document type or intensive coding to extract insights.

One of those advancements, Custom Commands, a capability of Speech in Cognitive Services, is now generally available.

With Cognitive Services and Bot Service, the BBC created an AI-enabled voice assistant, Beeb, that delivers a more engaging, tailored experience for its diverse audiences.

Get started todayLearn more with the resources below and get started with Azure Cognitive Services and an Azure free acc…

2 days, 12 hours ago @ azure.microsoft.com
Azure AI: Build mission-critical AI apps with new Cognitive Services capabilities

As the world adjusts to new ways of working and staying connected, we remain committed to providing Azure AI solutions to help organizations invent with purpose.

2 days, 21 hours ago @ azure.microsoft.com
Toward trusted sensing for the cloud: Introducing Project Freta

With Project Freta, we invite readers to think not of walls but of sunlight.

Incubated at Microsoft Research, Project Freta is a roadmap toward trusted sensing for the cloud that can allow enterprises to engage in regular, complete discovery sweeps for undetected malware (contact: project-freta@microsoft.com).

The goal of this democratization effort is to increase the development cost of undiscoverable cloud malware toward its theoretical maximum.

This release: The Project Freta analysis engine consumes snapshots of whole-system Linux volatile memory and extracts an enumeration of system objects.

We hope that Project Freta empowers administrators and responders and is used globally as it h…

4 days, 11 hours ago @ microsoft.com
Teaching a robot to see and navigate with simulation

The ability to see and navigate is a critical operational requirement for robots and autonomous systems.

For example, consider autonomous rescue robots that are required to maneuver and navigate in challenging physical environments that humans cannot safely access.

However, building a real-world autonomous system that can operate safely at scale is a very difficult task.

SLAM has made impressive progress with both geometric-based methods and learning-based methods; however, robust and reliable SLAM systems for real-world scenarios remain elusive.

Recent advances in deep reinforcement learning, data-driven control, and deep perception models are fundamentally changing how we build and engine…

1 week, 2 days ago @ microsoft.com
Hammering Rowhammer with Dr. Stefan Saroiu

Host: Well, let’s talk about memory and computer memory specifically…
Stefan Saroiu: Yeah.

Stefan Saroiu: So that’s where sort of the concern lies.

Stefan Saroiu: That’s right.

Stefan Saroiu: That’s right.

(music plays) To learn more about Dr. Stefan Saroiu, and the ongoing fight against Rowhammer attacks, visit Microsoft.com/research

1 week, 2 days ago @ microsoft.com
Newly discovered principle reveals how adversarial training can perform robust deep learning

In machine learning, adversarial examples usually refer to natural inputs plus small, specially crafted perturbations that can fool the model into making mistakes.

In recent years, adversarial examples have been repeatedly discovered in deep learning applications, causing public concerns about AI safety.

In a paper titled “Feature Purification: How can Adversarial Training Perform Robust Deep Learning,” researchers from Microsoft Research and Carnegie Mellon University propose the first framework toward understanding the math behind adversarial examples in deep learning.

Background: Mysteries about adversarial examples and adversarial training. Why do we have adversarial examples?

On the othe…

1 week, 3 days ago @ microsoft.com
Advancing Azure service quality with artificial intelligence: AIOps

As Mark mentioned when he authored the Advancing Reliability blog series, building and operating a global cloud infrastructure at the scale of Azure is a complex task with hundreds of ever-evolving service components, spanning more than 160 datacenters and across more than 60 regions.

1 week, 4 days ago @ azure.microsoft.com
Microsoft AI health director: How AI is fueling intelligent health systems

Advances in artificial intelligence are helping pave the way for intelligent health systems, which focus on using AI and data to establish care and operational strategies, according to Tom Lawry, national director for AI, health and life sciences at Microsoft.

By having full control of the data, organizations can use it to derive insights and make strategy decisions more quickly.

More than just the technology, healthcare organizations are also starting to view AI as a culture and a mindset, Mr. Lawry said.

“Intelligent health systems leverage data and AI to create strategic advantage, and they do that by making servic…

2 weeks, 1 day ago @ beckershospitalreview.com
Microsoft Summit Addresses AI in a Time of Upheaval

Microsoft U.S. Chief Digital Officer Jacky Wright started the event by talking about why AI is becoming so important for large organizations.

She polled an audience of industry officials about the top barriers to their AI adoption strategies, and the No.

As one speaker put it: “How do we make sure this isn’t a forced march toward something they don’t understand?” Tom Lawry, Microsoft’s national director for AI, health and life sciences, described at length the potential of AI in the health-care space.

“Intelligent health systems leverage data and AI to create strategic advantage, and they do that by making services more efficient.”

Azizirad had mentioned earlier in the talk that since March, …

2 weeks, 2 days ago @ govtech.com
Microsoft Research EMEA and Latin America PhD Awards springboard new ideas across intercontinental research

In their inaugural year with reach across five continents, the Microsoft Research PhD Awards support outstanding doctoral students in computing-related fields in Europe, the Middle East, Africa, and Latin America with funding to support their final year of research.

This is echoed by Latin America PhD Award recipient Segun Taofeek Aroyehun from the Instituto Politécnico Nacional, Mexico.

Each of the six PhD Award recipients is doing important research at the forefront of technology.

Find out more about these two awards from Microsoft Research by going to the EMEA PhD Award homepage and the Latin America PhD Award homepage.

We encourage you to explore the PhD students’ work further by visiti…

2 weeks, 2 days ago @ microsoft.com
Enhancing your photos through artificial intelligence

Bringing old photos back to life: People can be nostalgic and often cherish revisiting the happy moments of old times.

Researchers at Microsoft recently proposed a technique to automate this restoration process, which revives old photos with compelling quality using AI.

Rather than simply cascading multiple processing operators, we propose a solution designed specifically for old photos and thus achieve far better results.

This is motivated by the observation that artifacts existing in old photos can be categorized into two types.

Please refer to our paper, “Bringing Old Photos Back to Life,” for more technical details.

2 weeks, 3 days ago @ microsoft.com
Expert panel discusses combining human ingenuity with AI to help reboot business

At the heart is data: prediction, optimization, learning, very practical core technologies that can help any business.

It is a combination of AI technology and the skilling, which is important, to have people ready with the capability to leverage that.

Jonathan: What we see is that companies embracing AI have more knowledge, more digitalization, and this brings a lot of advantages.

We should not just exploit past data; we should, in fact, stop saying that we need data for AI, because it is just not true.

They made simple sorts of AI investments, they did not know they were using AI, but they are leading the way.

3 weeks ago @ blogs.microsoft.com
Microsoft CTO Kevin Scott Believes Artificial Intelligence Will Help Reprogram The American Dream

3 weeks, 2 days ago @ forbes.com
High-Resolution Network: A universal neural architecture for visual recognition

Design and its four stages: The HRNet maintains high-resolution representations throughout the whole process.

The HRNet has become a standard for human pose estimation since the paper was published in CVPR 2019.

In human pose estimation, HRNet achieves superior estimation scores with much lower training and inference memory costs, at slightly higher training and inference time costs.

In semantic segmentation, HRNet outperforms PSPNet and DeepLabV3 on all metrics, and its inference-time cost is less than half that of PSPNet and DeepLabV3.

ConclusionsThe high-resolution network (HRNet) is a universal architecture for visual recognition.

3 weeks, 2 days ago @ microsoft.com
2020 Microsoft Research Dissertation Grant supports students’ cutting-edge work

This year marks the fourth year of the Microsoft Research Dissertation Grant, which offers grants of up to $25,000 to support the research of students nearing the completion of doctoral degrees at North American universities who are underrepresented in the field of computing.

This was the most competitive year yet for the grant program; about 230 students submitted proposals.

This year’s grant recipients, along with their respective academic institutions and dissertations, are: … Furthering their research agendas: Our recipients plan to use the grant monies to further various aspects of their research programs; in addition to supporting their tuition, students described the myriad ways in which …

3 weeks, 2 days ago @ microsoft.com
Facebook
latest post 7 months ago
Fighting Abuse @Scale 2019 recap

Fighting abuse presents unique challenges for large-scale organizations working to keep the people on their platforms safe.

At Fighting Abuse @Scale 2019, engineers, data scientists, product managers, and operations specialists gathered in Menlo Park for a day of technical talks focused on state-of-the-art technologies to fight fraud, spam, and abuse on platforms that serve millions or even billions of people.

Our key insight is that sharing patterns can help hosting platforms identify abusive content, while hosting platforms can help sharing platforms prevent the spread of abusive content.

Results demonstrate that working together as an industry can strengthen the capacity to more quickly …

7 months ago @ engineering.fb.com
CCSM: Scalable statistical anomaly detection to resolve app crashes faster

A contrast set mining algorithm: CSM provides a scalable, robust way to generate human-readable insights on high-dimensional crash data.

For a contrast set X and group G, the support S(X,G) is the percentage of vectors in group G for which the contrast set X is true.
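In the excerpt's notation, the support S(X, G) can be computed directly. The sketch below assumes crash records are flat dicts of categorical features (the record fields and values are hypothetical); it follows the definition rather than the CCSM implementation:

```python
# Support S(X, G): the percentage of feature vectors in group G for which
# contrast set X (a conjunction of feature=value pairs) is true.

def support(contrast_set, group):
    """Return S(X, G) in percent for contrast set X over group G."""
    if not group:
        return 0.0
    hits = sum(
        all(vec.get(feat) == val for feat, val in contrast_set.items())
        for vec in group
    )
    return 100.0 * hits / len(group)

# Hypothetical crash records for illustration.
crash_group = [
    {"app_version": "2.1", "os": "android"},
    {"app_version": "2.1", "os": "ios"},
    {"app_version": "2.0", "os": "android"},
    {"app_version": "2.1", "os": "android"},
]
print(support({"app_version": "2.1"}, crash_group))               # → 75.0
print(support({"app_version": "2.1", "os": "ios"}, crash_group))  # → 25.0
```

A contrast set is then interesting when its support differs significantly between groups (for example, crashing vs. non-crashing sessions).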

To efficiently traverse the search space of feature combinations, we cast the problem of mining contrast sets as a tree search problem.

However, real-world data is often mixed — our crash data contains a mix of categorical, discrete, and continuous data.

The continuous contrast mining algorithm adopts the same tree search framework, with modifications to reason about sets of continuous features.

7 months, 2 weeks ago @ engineering.fb.com
Fast dimensional analysis for root cause analysis at scale

What the research is: A fast dimensional analysis (FDA) framework that automates root cause analysis on structured logs with improved scalability.

When a failure event happens in a large-scale distributed production environment, performing root cause analysis can be challenging.

Our proposed FDA framework combines structured logs from a number of sources and provides a meaningful combination of features.

As we’ve mentioned, the challenges of performing root cause analysis in a large-scale distributed production environment make outage detection and mitigation difficult.

Read the full paper: Fast Dimensional Analysis for Root Cause Investigation in Large-Scale Service Environment. We’d like to t…

8 months ago @ engineering.fb.com
2019 @Scale Conference recap

If you are interested in future events, visit the @Scale website or join the @Scale community.

@Scale 2019: Data Infra. Zanzibar: Google’s consistent, global authorization system. Ruoming Pang, Principal Software Engineer, Google. Determining whether online users are authorized to access digital objects is central to preserving privacy.

6 technical challenges in developing a distributed SQL database. Neha Deodhar, Software Engineer, YugaByte. Neha discusses the experience of developing YugaByte.

@Scale 2019: Security. Leveraging the type system to write secure applications. Shannon Zhu, Software Engineer, Facebook. Shannon discusses ways to extend the type system to eliminate entire classes of security vul…

8 months, 2 weeks ago @ engineering.fb.com
Video @Scale 2019 recap

At Video @Scale 2019, engineers gathered in San Francisco for a day of technical talks focused on delivering video at scale.

Adopting video at scale. Steven Robertson, Engineer, YouTube. Steven works on streaming video performance at YouTube.

AV1 Panel. Ronald Bultje, Founder, Two Orioles; Yaowu Xu, Principal Software Engineer, Google; Chekib Nouira, Senior Video Systems Engineer, Intel. Panel moderated by Ioannis Katsavounidis.

Contextual video ad safety. Vijaya Chandra, Software Engineering Manager, Facebook; Rose Kanjirathinkal, Research Scientist, Facebook. Vijaya leads video understanding efforts at Facebook.

Video integrity at scale. Sonal Gandhi, Software Engineer, Facebook. Sonal talks about reducing har…

8 months, 3 weeks ago @ engineering.fb.com
Releasing a new benchmark and data set for evaluating neural code search models

The benchmark includes the largest evaluation data set currently available for Java, consisting of a natural language query and code snippet pairs.

This data set comprises 287 Stack Overflow question-and-answer pairs from the Stack Exchange Data Dump.

A score sheet on the evaluation data set, using two models from our recent work, is also included.

We intend for this data set to serve as a benchmark for evaluating search quality across a variety of code search models.

To evaluate the performance of these models, Stack Overflow questions and code answer pairs are prime candidates, as Stack Overflow questions effectively represent what a developer may ask.

9 months, 1 week ago @ ai.facebook.com
Hydra: A framework that simplifies development of complex applications

Hydra’s flexible approach to developing, creating, and maintaining code and configurations can help speed the development of complex applications in various fields, including machine learning research.

What it does: Hydra offers an innovative approach to composing an application’s configuration, allowing changes to a composition through configuration files as well as from the command line.

Hydra speeds development of such applications while reducing the chances of bugs, and it enables code to evolve more naturally in response to new requirements.

Why it matters: Hydra is already in use at Facebook to prototype complex research projects.

We expect to continue using the Hydra framework for buil…

9 months, 1 week ago @ engineering.fb.com
MaRS: How Facebook keeps maps current and accurate

To reduce the risk of bad edits, whether intentional (vandalism) or unintentional, we don’t update our local copy directly.

So we, like most consumers of OSM data, have an internal storage format (a local copy).

Current approaches to keeping OSM data updated primarily focus on tackling the two axes separately.

Freshness is achieved by simply consuming upstream changesets faster, or essentially rebasing the local copy with the upstream master on a regular cadence (e.g., daily or weekly).

Let V(Downstream) be the current downstream local copy version based on an earlier version of upstream.
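The rebase idea can be illustrated with a toy model in which the map is a dict from object IDs to tags, and the downstream's own edits are replayed onto a fresh upstream snapshot (the IDs, values, and merge policy here are hypothetical simplifications, not the MaRS algorithm):

```python
def rebase(upstream_new, upstream_old, local):
    """Recompute a local copy: start from the new upstream snapshot,
    then replay the downstream's own edits (the diff between the old
    upstream version and the local copy) on top of it."""
    local_edits = {k: v for k, v in local.items() if upstream_old.get(k) != v}
    deletions = set(upstream_old) - set(local)  # objects removed downstream
    rebased = dict(upstream_new)
    rebased.update(local_edits)       # downstream edits win over upstream
    for k in deletions:
        rebased.pop(k, None)          # downstream deletions stay deleted
    return rebased

# V(Upstream) moved from v1 to v2 while the downstream copy,
# based on v1, accumulated one local fix.
upstream_v1 = {"way/1": "residential", "way/2": "footpath"}
upstream_v2 = {"way/1": "primary", "way/2": "footpath", "way/3": "cycleway"}
local_copy = {"way/1": "residential", "way/2": "footway"}  # local fix to way/2

print(rebase(upstream_v2, upstream_v1, local_copy))
```

Note the policy choice baked in here: where both sides changed the same object, the downstream edit wins; a production system would surface such conflicts instead.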

9 months, 2 weeks ago @ engineering.fb.com
Integrating autoconversion: Facebook’s path from Zawgyi to Unicode

Each of the requirements for the autoconversion — content encoding detection, device encoding detection, and conversion — had its own challenges.

Content encoding detection: To perform autoconversion, we first need to know the content encoding, that is, the encoding used when the text was first input.

We train a machine learning (ML) model on public Facebook content samples for which we already know the content encoding.
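A content-encoding detector of this kind can be approximated by a character-level classifier fit on labeled samples. The sketch below uses a tiny naive Bayes model over codepoints, with made-up training strings standing in for real Zawgyi and Unicode text; it is not Facebook's model:

```python
import math
from collections import Counter

def train(samples):
    """samples: list of (text, label). Returns per-label codepoint counts
    and per-label total character counts."""
    counts, totals = {}, Counter()
    for text, label in samples:
        counts.setdefault(label, Counter()).update(text)
        totals[label] += len(text)
    return counts, totals

def classify(text, counts, totals):
    """Naive Bayes over codepoints with add-one smoothing."""
    vocab = len(set().union(*counts.values()))
    best, best_lp = None, float("-inf")
    for label, c in counts.items():
        lp = sum(math.log((c[ch] + 1) / (totals[label] + vocab)) for ch in text)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical labeled snippets; the real model was trained on public
# posts whose encoding was already known.
data = [("abcabcab", "unicode"), ("xyzxyzxy", "zawgyi")]
counts, totals = train(data)
print(classify("abab", counts, totals))
```

Since both encodings reuse the same Myanmar codepoint range, the discriminating signal in practice comes from which codepoint sequences occur, which is why a statistical model is needed rather than a simple range check.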

Device encoding detection: Next, we need to know which encoding was used by a person’s phone (i.e., the device encoding) to understand whether we need to perform a font encoding conversion.

There’s no single pipeline through which all possible Facebook content passes, which mak…

9 months, 2 weeks ago @ engineering.fb.com
Register now for @Scale 2019!

Registration is officially open for @Scale 2019.

Topics for the @Scale 2019 talks include cloud native platforms for event streaming, advances in self-supervised learning and natural language processing, securing SSH traffic, deploying DNS privacy technologies at scale, and more.

To register for @Scale 2019, enter your invite code here.

Visit the @Scale Community page and message us with your name, company name, and email address.

If you’ve never been to an @Scale event, you can watch David Patterson of Google and Clément Farabet of NVIDIA open last year’s event, or see videos of all the talks in last year’s recap.

9 months, 3 weeks ago @ engineering.fb.com
Creating a data set and a challenge for deepfakes

Yet the industry doesn't have a great data set or benchmark for detecting them.

That's why Facebook is commissioning a realistic data set that will use paid actors, with the required consent obtained, to contribute to the challenge.

No Facebook user data will be used in this data set.

To ensure the quality of the data set and challenge parameters, they will initially be tested through a targeted technical working session this October at the International Conference on Computer Vision (ICCV).

The full data set release and the DFDC launch will happen at the Conference on Neural Information Processing Systems (NeurIPS) this December.

10 months, 1 week ago @ ai.facebook.com
New advances in natural language processing

Natural language understanding (NLU) and language translation are key to a range of important applications, including identifying and removing harmful content at scale and connecting people across different languages worldwide.

We’ve also introduced a new self-supervised pretraining approach, RoBERTa, that surpassed all existing NLU systems on several language comprehension tasks.

According to human evaluations, our models were ranked top in four translation tasks: from English to German, German to English, English to Russian, and Russian to English.

SuperGLUE follows in the footsteps of GLUE, which offers a single-number metric that summarizes progress on a diverse set of NLP tasks.

By cha…

11 months ago @ ai.facebook.com
A new model for word embeddings that are resilient to misspellings

What the research is: A new model to learn word embeddings (words or phrases mapped to dense vectors of numbers that represent their meaning) that are resilient to misspellings.

To address this deficiency, we propose Misspelling Oblivious Embeddings (MOE), a new model that combines our open source library fastText with a supervised task that embeds misspellings close to their correct variants.

In addition to the semantic loss, MOE also considers an additional supervised loss that we call the spell correction loss.

The overall objective embeds misspellings close to their correct versions by minimizing the weighted sum of the semantic loss and the spell correction loss.
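As a formula, the objective is a weighted combination of the two losses. In the sketch below, the weight `alpha` and the squared-distance spell correction term are illustrative stand-ins for the paper's exact definitions:

```python
def spell_correction_loss(mis_vec, correct_vec):
    """One illustrative choice: squared Euclidean distance between a
    misspelling's embedding and its correct variant's embedding."""
    return sum((m - c) ** 2 for m, c in zip(mis_vec, correct_vec))

def moe_loss(semantic_loss, spell_losses, alpha=0.5):
    """MOE-style objective: weighted sum of fastText's semantic loss and
    the average spell correction loss over (misspelling, correct) pairs."""
    spell_loss = sum(spell_losses) / len(spell_losses)
    return alpha * semantic_loss + (1 - alpha) * spell_loss

# Toy 3-d embeddings for pairs like ("teh" -> "the"), ("recieve" -> "receive").
pairs = [([0.9, 0.1, 0.0], [1.0, 0.0, 0.0]),
         ([0.0, 0.8, 0.2], [0.0, 1.0, 0.0])]
spell = [spell_correction_loss(m, c) for m, c in pairs]
print(moe_loss(semantic_loss=1.25, spell_losses=spell, alpha=0.5))
```

Lowering `alpha` pushes misspellings harder toward their correct variants at the cost of the ordinary fastText objective.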

Our approach will improve…

11 months ago @ ai.facebook.com
Michael F. Cohen awarded 2019 Steven A. Coons award

On July 29 at SIGGRAPH, Michael F. Cohen will receive the 2019 Steven A. Coons Award for Outstanding Creative Contributions to Computer Graphics.

The award is given to one individual every two years to honor outstanding lifetime contributions to computer graphics and interactive techniques.

Cohen joined Facebook in Fall 2015 as Director of Facebook’s Computational Photography Research team, which was formed to explore new ways to share photos and videos online.

I never became an engineer but rather first entered the field of computer graphics with intentions to continue studies related to civil engineering.

This is really a marriage of computer graphics and computer vision.

11 months, 2 weeks ago @ research.fb.com
EGG: A toolkit for multi-agent language emergence simulations

What’s new: EGG is a new toolkit that allows researchers and developers to quickly create game simulations in which two neural network agents devise their own discrete communication system in order to solve a task together.

A lively area of machine learning (ML) research, language emergence would benefit from a more interdisciplinary approach.

Why it matters:Human language is an extremely powerful communication system that is unique in nature.

Which innate biases are necessary to ensure that a communication system shares the core properties of human language?

With EGG, ML experts can quickly probe the communication skills of new AI architectures.

11 months, 2 weeks ago @ engineering.fb.com
MIT AI
last post 1 week, 3 days ago
Exploring interactions of light and matter

His father, trained as a mechanical engineer, spent his career working first in that field, then in electrical engineering, and then civil engineering.

Last year, Hu earned tenure as an associate professor in MIT’s Department of Materials Science and Engineering.

“I got fascinated with light,” he says, recalling how he began working in this field.

This includes work on devices called optical diodes or optical isolators, which allow light to pass through only in one direction, and systems for coupling light signals into and out of photonic chips.

Lately, Hu has been focusing on applying machine-learning methods to improve the performance of optical systems.

1 week, 3 days ago @ news.mit.edu
The MIT Press and UC Berkeley launch Rapid Reviews: COVID-19

The MIT Press has announced the launch of Rapid Reviews: COVID-19 (RR:C19), an open access, rapid-review overlay journal that will accelerate peer review of Covid-19-related research and deliver real-time, verified scientific information that policymakers and health leaders can use.

Using artificial intelligence tools, a global team will identify promising scholarship in preprint repositories, commission expert peer reviews, and publish the results on an open access platform in a completely transparent process.

Amy Brand, director of the MIT Press, sees the no-cost open access model as a way to increase the impact of global research and disseminate high-quality scholarship.

“We are confident…

1 week, 4 days ago @ news.mit.edu
Improving global health equity by helping clinics do more with less

Despite these encouraging signs, however, the availability of essential vaccines has stagnated globally in recent years, according to the World Health Organization.

Both products represent steps toward macro-eyes’ larger goal of transforming health care through artificial intelligence.

“The state of the art in machine learning will result from confronting fundamental challenges in the most difficult environments in the world,” Fels says.

The pair’s experience crunching numbers in different industries alerted them to a shortcoming in health care.

The founders are also exploring ways to apply that approach to help direct Covid-19 patients to health clinics with sufficient capacity.

2 weeks, 1 day ago @ news.mit.edu
Identifying a melody by studying a musician’s body language

When the ear fails to tell two instruments apart, the eye often pitches in by matching each musician’s movements to the beat of each part.

“Body keypoints provide powerful structural information,” says the study’s lead author, Chuang Gan, an IBM researcher at the lab.

“We learn from all of our senses,” says Antonio Torralba, an MIT professor and co-senior author of the study.

An update to PixelPlayer allowed the system to distinguish between two violins in a duet by matching each musician’s movements with the tempo of their part.

The latter study suggests that sound-tracking tools might be a useful addition in self-driving cars, complementing their cameras in poor driving conditions.

2 weeks, 1 day ago @ news.mit.edu
Cynthia Breazeal named Media Lab associate director

Cynthia Breazeal has been promoted to full professor and named associate director of the Media Lab, joining the two other associate directors: Hiroshi Ishii and Andrew Lippman.

In her new associate director role, Breazeal will work with lab faculty and researchers to develop new strategic research initiatives.

She will also play a key role in exploring new funding mechanisms to support broad Media Lab needs, including multi-faculty research efforts, collaborations with other labs and departments across the MIT campus, and experimental executive education opportunities.

Her book, “Designing Sociable Robots” (MIT Press, 2002), is considered pivotal in launching the field.

The following year, …

3 weeks ago @ news.mit.edu
Bringing the predictive power of artificial intelligence to health care

It blossomed into a six-year stint at the Broad, after which he continued exploring the intersection of big data and health care.

“After a year in health care, I realized it was going to be really hard to do anything else,” DeCaprio says.

Often the first problem startups run into is making their algorithms work with each health care system’s data.

Another limitation of AI in health care has been the difficulty of understanding how models get to results.

“Someone who is 85 years old and shut in may not know there’s a community-based organization that will deliver them groceries.” For DeCaprio, bringing the predictive power of AI to health care has been a rewarding, if humbling, experience.

3 weeks, 1 day ago @ news.mit.edu
MIT and Toyota release innovative dataset to accelerate autonomous driving research

The following was issued as a joint release from the MIT AgeLab and Toyota Collaborative Safety Research Center.

These are some of the questions researchers from the AgeLab at the MIT Center for Transportation and Logistics and the Toyota Collaborative Safety Research Center (CSRC) are trying to answer by sharing an innovative new open dataset called DriveSeg.

Through the release of DriveSeg, MIT and Toyota are working to advance research in autonomous driving systems that, much like human perception, perceive the driving environment as a continuous flow of visual information.

According to Sherony, video-based driving scene perception provides a flow of data that more closely resembles dyna…

3 weeks, 1 day ago @ news.mit.edu
MIT-Takeda program launches

In February, researchers from MIT and Takeda Pharmaceuticals joined together to celebrate the official launch of the MIT-Takeda Program.

The MIT-Takeda Program aims to fuel the development and application of artificial intelligence (AI) capabilities to benefit human health and drug development.

“We were truly impressed by the creativity and breadth of the proposals we received,” says Anantha P. Chandrakasan, dean of the School of Engineering, Vannevar Bush Professor of Electrical Engineering and Computer Science, and co-chair of the MIT-Takeda Program Steering Committee.

“Together we are building capabilities and addressing challenges through interrogation of multiple data types that we hav…

3 weeks, 1 day ago @ news.mit.edu
What jumps out in a photo changes the longer we look

But in the real world, human attention often shifts abruptly.

When tested, their model outperformed the state of the art at predicting saliency across viewing durations.

In addition to guiding an editing tool to crop an image for shorter or longer viewing durations, it could prioritize which elements in a compressed image to render first for viewers.

Research on human attention offers insights for technologists.

By making it faster and cheaper to gather human attention data, the platforms may help to generate new knowledge on human vision and cognition.

3 weeks, 2 days ago @ news.mit.edu
Learning the ropes and throwing lifelines

From her apartment in Sidney-Pacific, where she has stayed put due to travel restrictions in her home country of India, Chauhan is still learning the ropes of her new position.

“It gave me a sense of community and made me feel like I have a family here,” she says.

Chauhan has found additional ways to address the particular difficulties that international students face.

As a member of the Presidential Advisory Council this year, she gathered international student testimonies on visa difficulties and presented them to MIT’s president and the director of the International Students Office.

For Chauhan, that meant working as a teaching assistant, drawing henna designs, singing, enjoying yoga, an…

1 month ago @ news.mit.edu
Engineers put tens of thousands of artificial brain synapses on a single chip

MIT engineers have designed a “brain-on-a-chip,” smaller than a piece of confetti, that is made from tens of thousands of artificial brain synapses known as memristors — silicon-based components that mimic the information-transmitting synapses in the human brain.

The researchers borrowed from principles of metallurgy to fabricate each memristor from alloys of silver and copper, along with silicon.

When a voltage is applied to one electrode, ions from that electrode flow through the medium, forming a “conduction channel” to the other electrode.

In this way, they patterned a millimeter-square silicon chip with tens of thousands of memristors.

“We would like to develop this technology further …

1 month ago @ news.mit.edu
Giving soft robots feeling

One of the hottest topics in robotics is the field of soft robots, which utilizes squishy and flexible materials rather than traditional rigid materials.

But soft robots have been limited due to their lack of good sensing.

“Unlike many other soft tactile sensors, ours can be rapidly fabricated, retrofitted into grippers, and show sensitivity and reliability,” says MIT postdoc Josie Hughes, the lead author on a new paper about the sensors.

“By constraining soft fingers with a flexible exoskeleton, and performing high-resolution sensing with embedded cameras, we open up a large range of capabilities for soft manipulators.” Magic ball senses: The magic ball gripper is made from a soft origami str…

1 month, 1 week ago @ news.mit.edu
Undergraduates develop next-generation intelligence tools

One even carried on his experiments from his bedroom, after schlepping his Sphero Bolt robots home in a backpack.

“I’ve been so impressed by their resilience and dedication,” says Katherine Gallagher, one of three artificial intelligence engineers at MIT Quest for Intelligence who works with students each semester on intelligence-related applications.

The project involves training a deep neural network to pick out globules of fat on liver tissue slides to estimate the liver’s overall fat content.

One challenge, says Huang, has been figuring out how to handle variations in how various pathologists classify fat globules.

The final output will be a fat content estimate with pictures of highlig…

1 month, 2 weeks ago @ news.mit.edu
Fireflies helps companies get more out of meetings

The startup Fireflies.ai is helping people get the most out of their meetings with a note-taking, information-organizing virtual assistant named Fred.

Fred transcribes every word of meetings and then uses artificial intelligence to help people sort and share that information later on.

After each meeting, Fireflies can automatically sync all this meeting data into apps from companies like Slack, Salesforce, and Hubspot.

“Fireflies is like a personal assistant that helps connect your systems of communication with your systems of record,” Udotong says.

The same thing is true today of audio and meeting data.

1 month, 2 weeks ago @ news.mit.edu
Machine-learning tool could help develop tougher materials

For engineers developing new materials or protective coatings, there are billions of different possibilities to sort through.

The focus of this work was on predicting the way a material would break or fracture, by analyzing the propagation of cracks through the material’s molecular structure.

“One of the specialties of my lab is to use what we call molecular dynamics simulations, or basically atom-by-atom simulations” of such processes, Buehler says.

In this case, they were looking at a variety of composite, layered coatings made of crystalline materials.

So, this is a whole new way of simulating how materials fail.” How materials fail is crucial information for any engineering project, Bueh…

1 month, 3 weeks ago @ news.mit.edu
Berkeley AI
last post 2 weeks, 1 day ago
D4RL: Building Better Benchmarks for Offline Reinforcement Learning

In offline RL, we assume all experience is collected offline and fixed; no additional data can be collected.
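That fixed-dataset assumption can be made concrete with a minimal fitted Q-iteration sketch over a tabular batch (the toy MDP and learning rate below are illustrative, not a D4RL task):

```python
# Fixed batch of (state, action, reward, next_state) transitions collected
# offline; the learner never interacts with the environment again.
batch = [
    (0, "right", 0.0, 1),
    (1, "right", 1.0, 2),   # state 2 is terminal
    (1, "left", 0.0, 0),
    (0, "left", 0.0, 0),
]
actions, gamma, terminal = ("left", "right"), 0.9, 2

Q = {(s, a): 0.0 for s in (0, 1) for a in actions}
for _ in range(50):  # fitted Q-iteration: sweep the same batch repeatedly
    for s, a, r, s2 in batch:
        target = r if s2 == terminal else r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += 0.5 * (target - Q[(s, a)])  # move the estimate toward the target

# The greedy policy is read off the learned Q-values; note that any
# (state, action) pair absent from the batch would never be corrected,
# which is exactly the narrow-distribution problem the post describes.
policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in (0, 1)}
print(policy)
```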

In order to develop effective algorithms for offline RL, we need widely available benchmarks that are easy to use and can accurately measure progress on this problem.

Narrow and biased data distributions are a common property in real-world datasets that can create problems for offline RL algorithms.

The Flow project proposes to use autonomous vehicles for reducing traffic congestion, which we believe is a compelling use case for offline RL.

Future Directions: In the near future, we would be excited to see offline RL applications move from simulated domains to real-world domains where s…

2 weeks, 1 day ago @ bair.berkeley.edu
Open Compound Domain Adaptation

Therefore, we rethink machine learning and domain adaptation systems, and introduce a continuous learning protocol under the domain adaptation scenario.

Open Compound Domain Adaptation (OCDA): The goal of domain adaptation is to adapt the model learned on the training data to the test data of a different distribution.

We propose to study Open Compound Domain Adaptation (OCDA), a continuous and more realistic setting for domain adaptation (Figure 2).

The newly proposed Open Compound Domain Adaptation (OCDA) serves as a more comprehensive and more realistic touchstone for evaluating domain adaptation and transfer learning systems.

Figure 3: The differences between single-target doma…

3 weeks, 5 days ago @ bair.berkeley.edu
OmniTact: A Multi-Directional High-Resolution Touch Sensor

Human thumb next to our OmniTact sensor, and a US penny for scale.

Recently, the GelSight sensor has caught significant interest for learning-based robotics due to its low cost and rich signal.

Comparison of GelSight-style sensor (left side) to our OmniTact sensor (right side).

The OmniTact Sensor: Our OmniTact sensor design aims to address these limitations.

We additionally compared performance with another multi-directional tactile sensor, the OptoForce sensor, which only had a success rate of 17%.

1 month, 3 weeks ago @ bair.berkeley.edu
Four Novel Approaches to Manipulating Fabric using Model-Free and Model-Based Deep Learning in Simulation

Humans manipulate 2D deformable structures such as fabric on a daily basis, from putting on clothes to making beds.

Model-Free Methods. Model-Free Learning without Demonstrations: In this paper we present a model-free deep reinforcement learning approach for smoothing cloth.

An example of real robot cloth smoothing experiments with varying starting states and cloth colors.

Since this policy is easy to define, we code an algorithmic supervisor in simulation and perform imitation learning using Dataset Aggregation (DAgger).

Several episodes of both manipulating rope and cloth using our method,…

2 months ago @ bair.berkeley.edu
Unsupervised Meta-Learning: Learning to Learn without Supervision

This post is cross-listed on the CMU ML blog.

In this post we introduce theory and algorithms for unsupervised meta-learning, where machine learning algorithms themselves propose their own task distributions.

For example, a distribution over supervised learning tasks may include learning a dog detector, learning a cat detector, and learning a bird detector.

These unsupervised meta-learning algorithms allow for learning in regimes previously impractical, and further expand the capability of machine learning methods.

A number of open questions remain about unsupervised meta-learning: Unsupervised learning is closely connected to…

2 months, 1 week ago @ bair.berkeley.edu
The Ingredients of Real World Robotic Reinforcement Learning

The simulation will never exactly match the real world, which means that improvements in simulation performance may not translate to improvements in the real world.

However, training robots in the real world with reinforcement learning has proven challenging, due to certain constraints.

What makes real world robotic reinforcement learning so challenging?

We show effective uninstrumented real-world learning on two dexterous manipulation tasks with a three-fingered robotic hand.

However, we believe that the ingredients of real world RL that we have proposed should endure as principles of design for real world RL systems.

2 months, 2 weeks ago @ bair.berkeley.edu
Making Decision Trees Accurate Again: Explaining What Explainable AI Did Not

The interpretability of neural networks is becoming increasingly necessary, as deep learning is being adopted in settings where accurate and justifiable predictions are required.

In a neural-backed decision tree, predictions are made via a decision tree, preserving high-level interpretability.

Naive Decision Tree: We construct a basic decision tree with one root node and a leaf for each class.

The direct equivalence between a fully-connected layer and a naive decision tree motivates our particular inference method, using an inner-product decision tree.
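That equivalence is easy to check numerically: taking the argmax over a fully connected layer's class logits w_k·x gives the same prediction as a depth-1 tree whose per-class leaves are scored by inner products (the weights below are toy values, not the paper's):

```python
def fc_argmax(W, x):
    """Fully connected layer followed by argmax over class logits."""
    logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in W]
    return max(range(len(W)), key=lambda k: logits[k])

def naive_tree_predict(W, x):
    """Depth-1 'naive' decision tree: one root with one leaf per class,
    each leaf scored by the inner product with its class weight vector.
    Routing to the best-scoring leaf reproduces the layer's argmax."""
    leaves = [(k, sum(wi * xi for wi, xi in zip(row, x))) for k, row in enumerate(W)]
    return max(leaves, key=lambda kv: kv[1])[0]

W = [[1.0, -0.5], [0.2, 0.8], [-1.0, 0.3]]  # 3 classes, 2 features
for x in ([2.0, 1.0], [-1.0, 2.0], [0.0, -3.0]):
    assert fc_argmax(W, x) == naive_tree_predict(W, x)
print("argmax over inner products matches for all test inputs")
```

The deeper neural-backed trees in the post generalize this by arranging intermediate nodes hierarchically, but each decision is still an inner product.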

Decision trees address this, but unfortunately, images are krypt…

2 months, 2 weeks ago @ bair.berkeley.edu
Robots Learning to Move like Animals

Quadruped robot learning locomotion skills by imitating a dog.

The superior agility seen in animals, as compared to robots, might lead one to wonder: can we create more agile robotic controllers with less effort by directly imitating animals?

Given a reference motion (e.g., from a dog), our framework uses reinforcement learning to train a control policy that enables a robot to imitate the motion in the real world.

1) First, given a reference motion, the motion retargeting stage maps the motion from the original animal’s morphology to the robot’s morphology.

2) Next, the motion imitation stage uses the retargeted reference motion to train a policy for imitating the motion in simulation.

3 months, 1 week ago @ bair.berkeley.edu
Physically Realistic Attacks on Deep Reinforcement Learning

Deep reinforcement learning (RL) has achieved superhuman performance in problems ranging from data center cooling to video games.

Consequently, it is critical that RL policies are robust: both to naturally occurring distribution shift, and to malicious attacks by adversaries.

We find it is still possible to attack victim policies in this more realistic multi-agent threat model.

To better understand how the adversarial policies exploit their victims, we created “masked” versions of victim policies.
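A minimal sketch of what such masking might look like (illustrative only; the observation layout, dimension indices, and fill values here are hypothetical): the victim is re-evaluated with the dimensions of its observation that correspond to the opponent replaced by fixed values, so it cannot react to the adversary's behavior.

```python
def mask_observation(obs, opponent_dims, fill):
    """Replace the victim's view of the opponent with fixed values."""
    return [fill[d] if d in opponent_dims else v for d, v in enumerate(obs)]

# Hypothetical 4-dim observation: dims 1 and 2 encode the opponent's position.
obs = [0.3, -1.2, 0.8, 0.05]
masked = mask_observation(obs, opponent_dims={1, 2}, fill={1: 0.0, 2: 0.0})
assert masked == [0.3, 0.0, 0.0, 0.05]
```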

The existence of adversarial policies has significant implications for the training, understanding and evaluation of RL policies.

3 months, 2 weeks ago @ bair.berkeley.edu
Does On-Policy Data Collection Fix Errors in Off-Policy Reinforcement Learning?

Corrective Feedback and Why it is Absent in ADP

What is corrective feedback, formally?

This enjoys corrective feedback, and we then contrast it with ADP methods, which do not.

One way to prevent this problem is to compute an “optimal” data distribution that provides maximal corrective feedback, and train Q-functions using this distribution.
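One simplified reading of such a distribution (a sketch, not the paper's method) is to sample stored transitions in proportion to their current Bellman error, so Q-function updates concentrate where the estimates are most wrong:

```python
def bellman_errors(q, transitions, actions=(0, 1), gamma=0.99):
    """|r + gamma * max_a' Q(s', a') - Q(s, a)| for each stored transition."""
    errs = []
    for s, a, r, s2 in transitions:
        target = r + gamma * max(q.get((s2, b), 0.0) for b in actions)
        errs.append(abs(target - q.get((s, a), 0.0)))
    return errs

def corrective_distribution(errs, eps=1e-8):
    """Sampling weights proportional to Bellman error (eps keeps full support)."""
    total = sum(errs) + len(errs) * eps
    return [(e + eps) / total for e in errs]

q = {("s0", 0): 0.0, ("s0", 1): 0.5, ("s1", 0): 0.2}
transitions = [("s0", 0, 1.0, "s1"), ("s0", 1, 0.0, "s1")]
p = corrective_distribution(bellman_errors(q, transitions))
assert abs(sum(p) - 1.0) < 1e-9
assert p[0] > p[1]  # the higher-error transition gets more probability mass
```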

More generally, we would like to make a case for analyzing the effects of data distribution more deeply in the context of deep RL algorithms.

3 months, 3 weeks ago @ bair.berkeley.edu
AWS Machine Learning AWS Machine Learning
latest post 1 day, 8 hours ago
Detecting and analyzing incorrect model predictions with Amazon SageMaker Model Monitor and Debugger

When Model Monitor detects an issue, we use Amazon SageMaker Debugger to obtain visual explanations of the deployed model.

To reproduce the different steps and results listed in this post, clone the repository amazon-sagemaker-analyze-model-predictions into your Amazon SageMaker notebook instance or from within your Amazon SageMaker Studio and run the notebook.

Before you can deploy the model to Amazon SageMaker, you need to archive and upload its weights to Amazon Simple Storage Service (Amazon S3).

Creating a Model Monitor schedule

Next, we demonstrate how to set up a monitoring schedule using Model Monitor.

Summary

This post demonstrated how to use Amazon SageMaker Model Monitor and Amazon…

1 day, 8 hours ago @ aws.amazon.com
Announcing the launch of Amazon Comprehend custom entity recognition real-time endpoints

In this post, we cover how to build an Amazon Comprehend custom entity recognition model and set up an Amazon Comprehend custom entity recognition real-time endpoint for synchronous inference.

Create a real-time Amazon Comprehend custom entity recognizer endpoint to analyze chat messages and detect SERVICE or VERSION entities.

Creating a custom entity recognizer

To create your recognizer, complete the following steps: On the Amazon Comprehend console, create a custom entity recognizer.

Custom entity recognition extends the capability of Amazon Comprehend by enabling you to identify new entity types not supported as one of the preset generic entity types.

With Amazon Comprehend cu…

1 day, 8 hours ago @ aws.amazon.com
Optimizing I/O for GPU performance tuning of deep learning training in Amazon SageMaker

You can typically see performance improvements up to 10-fold in overall GPU training by just optimizing I/O processing routines.

Data transfer into GPU memory – Copy the processed data from the CPU memory into the GPU memory.

Options include downloading data from Amazon S3 and using file systems such as Amazon EBS and Amazon Elastic File System (Amazon EFS).

If you already have your training data on Amazon Elastic File System (Amazon EFS), you can also use Amazon EFS with Amazon SageMaker.

Amazon SageMaker instances with local NVMe-based SSD storage

Some of the Amazon SageMaker GPU instances, such as the ml.p3dn.24xlarge and ml.g4dn, provide local NVMe-based SSD storage instead of EBS volumes.
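Whatever the storage choice, the core I/O optimization is overlapping data loading with compute so the GPU is never starved. A framework-agnostic sketch of that prefetching pattern (illustrative only, not SageMaker-specific code):

```python
import queue
import threading
import time

def prefetcher(load_batch, num_batches, depth=2):
    """Yield batches while a background thread keeps `depth` batches ready."""
    q = queue.Queue(maxsize=depth)

    def worker():
        for i in range(num_batches):
            q.put(load_batch(i))  # disk read + CPU preprocessing happen here
        q.put(None)               # sentinel: no more batches

    threading.Thread(target=worker, daemon=True).start()
    while (batch := q.get()) is not None:
        yield batch

def load_batch(i):
    time.sleep(0.001)  # stand-in for reading, decoding, and augmenting a batch
    return [i] * 4

# The consumer (the "GPU step") processes one batch while the next ones load.
batches = list(prefetcher(load_batch, num_batches=3))
assert batches == [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
```

This is the same pattern that DataLoader-style prefetching in the major frameworks implements with multiple worker processes and pinned memory.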

1 day, 10 hours ago @ aws.amazon.com
Giving your content a voice with the Newscaster speaking style from Amazon Polly

This post discusses how the Newscaster voice was built and how you can use the Newscaster voice with your content in a few simple steps.

Building the Newscaster style voice

Until recently, Amazon Polly voices were built such that the speaking style of the voice remained the same, no matter the use case.

To make voices as lifelike as possible, Amazon Polly has built two speaking style voices: Conversational and Newscaster.

To learn more about using the Newscaster style in Amazon Polly, see Using the Newscaster Style.

For the full list of voices that Amazon Polly offers, see Voices in Amazon Polly.

2 days, 9 hours ago @ aws.amazon.com
Accelerating innovation: How serverless machine learning on AWS powers F1 Insights

Technology has always played a central role in F1, where the evolution of the rules and tools is built into the DNA of the sport.

F1 partnered with AWS to build new F1 insights, working backwards to build ML models to track pit battles and improve the viewing experience.

We used serverless products offered by AWS, such as Lambda, API Gateway, DynamoDB, Amazon CloudWatch, and S3.

We used AWS CloudFormation to implement an approach called infrastructure as code (IaC) to provision environments and have predictable deployments.

We carefully analyzed racing data and model predictions to extract features that are available in the race data.

2 days, 13 hours ago @ aws.amazon.com
Building a custom Angular application for labeling jobs with Amazon SageMaker Ground Truth

This post walks you through using Angular and Angular Elements to create fully customizable solutions that work nicely with Ground Truth.

This walkthrough assumes that you’re familiar with running a custom labeling job with Ground Truth and Crowd HTML Elements.

The application is built using Angular Elements, which creates Angular components packaged as custom elements (also called web components), a web standard for defining new HTML elements in a framework-agnostic way.

Angular Elements inputs and outputs

In this use case, your Angular component expects two inputs: an invoice description and an invoice translation.

For more information about hierarchical taxonomies in Ground Truth, see Cre…

3 days, 14 hours ago @ aws.amazon.com
2019 Q4 recipients of AWS Machine Learning Research Awards

The AWS Machine Learning Research Awards (MLRA) aims to advance machine learning (ML) by funding innovative research and open-source projects, training students, and providing researchers with access to the latest technology.

We’re now pleased to announce 28 new recipients of MLRA’s 2019 Q4 call-for-proposal cycle.

The funded projects aim to develop open-source tools and research that benefit the ML community at large, or create impactful research using AWS ML solutions, such as Amazon SageMaker, AWS AI Services, and Apache MXNet on AWS.

For more information about MLRA, see AWS Machine Learning Research Awards or send an email to aws-ml-research-awards@amazon.com.

About the Author

Seo Yeon S…

1 week, 2 days ago @ aws.amazon.com
Cisco uses Amazon SageMaker and Kubeflow to create a hybrid machine learning workflow

The created and trained model is uploaded to Amazon Simple Storage Service (Amazon S3) and uses Amazon SageMaker endpoints for serving.

If you have an existing UCS machine, the Cisco Kubeflow starter pack offers a quick Kubeflow setup on your Kubernetes cluster (v1.15.x or later).

Preparing the hybrid pipeline

For a seamless ML workflow between Cisco UCS and AWS, we created a hybrid pipeline using the Kubeflow Pipelines component and Amazon SageMaker Kubeflow components.

When the trained model artifacts are uploaded to Amazon S3, Amazon SageMaker uses the model stored in Amazon S3 to deploy the model to a hosting endpoint.

The complete set of blogs and tutorials for Amazon SageMaker makes it e…

1 week, 2 days ago @ aws.amazon.com
Deriving conversational insights from invoices with Amazon Textract, Amazon Comprehend, and Amazon Lex

With AWS AI services such as Amazon Textract, Amazon Comprehend and Amazon Lex, you can set up an automated serverless solution to address this requirement.

Interact with these insights in natural language using Amazon Lex.

Amazon Simple Storage Service (Amazon S3) – Serves as an object store for your documents and allows for central management with fine-tuned access controls.

Deploying the architecture with AWS CloudFormation

You deploy a CloudFormation template to provision the necessary AWS Identity and Access Management (IAM) roles, services, and components of the solution, including Amazon S3, Lambda, Amazon Textract, Amazon Comprehend, and the Amazon Lex chatbot.

Publishing your chatbo…

1 week, 3 days ago @ aws.amazon.com
How Euler Hermes detects typo squatting with Amazon SageMaker

The idea was to train an ML model to recognize domains related to Euler Hermes.

On a daily basis, we need to unearth domains related to Euler Hermes from a large dataset of approximately 150,000 publicly registered domains.

Amazon SageMaker automatically saves the inferences in an S3 bucket that you specify when creating the batch transform job.

With flexibility and inherent programmability, Amazon SageMaker helped us tackle our main pain point of industrializing ML models at scale.

Within the IT innovation team, he contributed to the deployment of Watson on Amazon SageMaker.

1 week, 4 days ago @ aws.amazon.com
Building a visual search application with Amazon SageMaker and Amazon ES

In this post, you build a visual image search application from scratch in under an hour, including a full-stack web application for serving the visual search results.

Submitting a new image to the Amazon SageMaker endpoint and Amazon ES to return similar images.
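The retrieval step can be sketched in miniature (toy code with made-up embeddings; the real system computes embeddings with a SageMaker-hosted model and searches them with the Amazon ES KNN index at scale):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def knn_search(index, query, k=2):
    """Return the ids of the k images whose embeddings are most similar."""
    ranked = sorted(index, key=lambda img: cosine(index[img], query), reverse=True)
    return ranked[:k]

# Toy index: image id -> 2-d embedding (real embeddings are much wider).
index = {"shoe_1": [1.0, 0.1], "shoe_2": [0.9, 0.2], "hat_1": [0.0, 1.0]}
assert knn_search(index, [1.0, 0.0], k=2) == ["shoe_1", "shoe_2"]
```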

AWS SAM – AWS Serverless Application Model (AWS SAM) is an open-source framework for building serverless applications.


Conclusion

In this post, we showed you how to create an ML-based visual search application using Amazon SageMaker and the Amazon ES KNN index.

1 week, 4 days ago @ aws.amazon.com
Introducing the open-source Amazon SageMaker XGBoost algorithm container

As of this writing, you can take advantage of the open-source Amazon SageMaker XGBoost container, which has improved flexibility, scalability, extensibility, and Managed Spot Training.

Benefits of the open-source SageMaker XGBoost container

The new XGBoost container has the following benefits:

Latest version

The open-source XGBoost container supports the latest XGBoost 1.0 release and all improvements, including better performance scaling on multi-core instances and improved stability for distributed training.

Managed Spot Training

You can save up to 90% on your Amazon SageMaker XGBoost training jobs with Managed Spot Training support.

For more information, see Managed Spot Training in Amazon SageM…

1 week, 4 days ago @ aws.amazon.com
The tech behind the Bundesliga Match Facts xGoals: How machine learning is driving data-driven insights in soccer

xGoals and other Bundesliga Match Facts are setting new standards by providing data-driven insights in the world of soccer.

Quantifying goal-scoring chances

The xGoals Match Facts debuted on May 26, 2020, during the Borussia Dortmund vs. FC Bayern Munich match, which was broadcast in over 200 countries worldwide.

It all starts with data

To bring Match Facts to life, several checks and processes happen before, during, and after a match.

For data quality evaluations and initial experimentation, we need to perform exploratory data analysis, data visualization, data transformation, and data validation.

He was the lead developer of the Bundesliga Match Facts xGoals.

2 weeks ago @ aws.amazon.com
Developing NER models with Amazon SageMaker Ground Truth and Amazon Comprehend

This post explores an end-to-end pipeline to build a custom NER model using Amazon SageMaker Ground Truth and Amazon Comprehend.

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text.

The end-to-end process is as follows:Upload a set of text files to Amazon Simple Storage Service (Amazon S3).

To track progress on the Amazon S3 console, complete the following steps: On the Amazon S3 console, navigate to the output location and select output.manifest.

For IAM role, if this is the first time you’re using Amazon Comprehend, select Create an IAM role.

2 weeks ago @ aws.amazon.com
Generating compositions in the style of Bach using the AR-CNN algorithm in AWS DeepComposer

AWS DeepComposer recently launched a new generative AI algorithm called autoregressive convolutional neural network (AR-CNN), which allows you to generate music in the style of Bach.

Listen to a few examples from original Bach compositions to familiarize yourself with his music. The AR-CNN algorithm enhances the original input melody by adding or removing notes from the input melody.

AR-CNN parameters in AWS DeepComposer

In the previous section, you heard an example of a composition created in the style of Bach using the AR-CNN algorithm.

At 0%, the algorithm preserves your original input melody, but is limited in its ability to enhance…

2 weeks, 1 day ago @ aws.amazon.com
NVIDIA
latest post 16 hours ago
Not So Taxing: Intuit Uses AI to Make Tax Day Easier

Software company Intuit has decided that it’s a job for AI.

To help small businesses, Intuit has a range of programs such as the Intuit Aid Assist Program, which helps business owners figure out if they’re eligible for loans from the government.

And in the long term, Intuit is working on a machine learning program capable of using photos of financial documents to automatically extract necessary information and fill in tax documents.

Tune in to the AI PodcastGet the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Pod…

16 hours ago @ blogs.nvidia.com
Running Python UDFs in Native NVIDIA CUDA Kernels with the RAPIDS cuDF

An essential part of the framework is a parser that parses a CUDA PTX function, which is compiled from the Python UDF, into an equivalent CUDA C++ device function that can be inlined into native CUDA C++ kernels.

A natural way to solve the problem is to backward compile the CUDA PTX device functions into CUDA C++ device functions, which can be inlined into CUDA C++ kernels by the nvcc compiler directly.

Backward compiler from CUDA PTX to CUDA C++

CUDA supports the inline PTX syntax that allows you to write the CUDA PTX assembly in CUDA C++ code.

A general picture

The inline PTX syntax does most of the work in the workflow from CUDA PTX to CUDA C++.

Function parameter loading instructions

The f…

1 day, 6 hours ago @ developer.nvidia.com
Keeping Its Cool: Lenovo Expands Portfolio in Red Hot HPC and AI Market

NVIDIA and Mellanox have been long-time collaborators with Lenovo and now this relationship is expanding in a big way.

This fall, Lenovo will begin providing NVIDIA Mellanox Spectrum Ethernet switches to its customers in selected integrated solutions, joining the NVIDIA Quantum InfiniBand switches already offered by the company.

The Spectrum Ethernet switches are the most advanced available in the market and are optimized for high-performance, AI, cloud and other enterprise-class systems.

The Spectrum switches from Lenovo will include Cumulus Linux, the leading Linux-based network operating system, recently acquired by NVIDIA.

Additionally, Lenovo ThinkAgile products will qualify NVIDIA Spe…

1 day, 14 hours ago @ blogs.nvidia.com
Accelerating Deep Learning Research in Medical Imaging Using MONAI

The Medical Open Network for AI (MONAI) is a freely available, community-supported, PyTorch-based framework for deep learning in healthcare imaging.

The goal is to showcase the implementation of research prototypes and demonstrations from recent publications in medical imaging with deep learning.

Along with the flexibility and usability of MONAI, we envision MONAI research as a suitable venue to release the research code, increase the research impact, and promote open and reproducible research.

LAMP can be a useful tool for medical image analysis tasks, such as large image registration, detection, and neural architecture search.

Summary

This post highlights how deep learning research for me…

2 days, 14 hours ago @ developer.nvidia.com
Screening for COVID-19: Japanese Startup Uses AI for Drug Discovery

To support COVID-19 research, the team is using AI to find drugs that are FDA-approved or in clinical trials that could be repurposed to treat the coronavirus.

Yuki spoke about the company’s work in AI for drug discovery in the Inception Startup Showcase at GTC Digital, NVIDIA’s digital conference for developers and AI researchers.

Since molecular data is sensitive intellectual property for the pharma industry, most choose to run the AI models on their own on-prem servers.

Beyond drug discovery, Elix also uses AI for molecular design for material informatics, working with companies like tire- and rubber-manufacturer Bridgestone and RIKEN, Japan’s largest research institution.

Visit our COVI…

2 days, 18 hours ago @ blogs.nvidia.com
NVIDIA Ampere GPUs Come to Google Cloud at Speed of Light

The NVIDIA A100 Tensor Core GPU has landed on Google Cloud.

Today’s introduction of the Accelerator-Optimized VM (A2) instance family featuring A100 makes Google the first cloud service provider to offer the new NVIDIA GPU.

“With our new A2 VM family, we are proud to be the first major cloud provider to market NVIDIA A100 GPUs, just as we were with NVIDIA T4 GPUs.

Google Cloud announced that additional NVIDIA A100 support is coming soon to Google Kubernetes Engine, Cloud AI Platform and other Google Cloud services.

For more information, including technical details on the new A2 VM family and how to sign up for access, visit the Google Cloud blog.

3 days, 13 hours ago @ blogs.nvidia.com
Hardhats and AI: Startup Navigates 3D Aerial Images for Inspections

Their drone startup, based in San Francisco, is picking up interest worldwide and has landed $35 million in Series D funding.

They founded DroneDeploy there, enabling contractors to capture photos, maps, videos and high-fidelity panoramic images for remote inspections of job sites.

DroneDeploy’s AI software platform — it’s the navigational brains and eyes — is operating in more than 200 countries and handling more than 1 million flights a year.

DroneDeploy was one of three startups that recently presented at an NVIDIA Inception Connect event held by Japanese insurer Sompo Holdings.

Customers of the DroneDeploy platform can follow a quickly created map to carry out a sequence of inspections …

4 days, 10 hours ago @ blogs.nvidia.com
You Can’t Touch This: Deep Clean System Flags Potentially Contaminated Surfaces

To spotlight potentially contaminated surfaces, hobbyist Nick Bild has come up with Deep Clean, a stereo camera system that flags objects that have been touched in a room.

Deep Clean uses an NVIDIA Jetson AGX Xavier developer kit as the main processing unit to map out a room, detecting where different objects lie within it.

Then, the coordinates are used to automatically annotate an image of the unoccupied room, displaying what has been touched and thus potentially contaminated.

Technology to Help the Community

Deep Clean isn’t Bild’s first instance of helping the community through his technological pursuits.

Bild calls himself a “prototyper,” as he creates a variety of smart, useful devices…

4 days, 14 hours ago @ blogs.nvidia.com
Heads Up, Down Under: Sydney Suburb Enhances Livability with Traffic Analytics

With a new university campus nearby and an airport under construction, the city of Liverpool, Australia, 27 kilometers southwest of Sydney, is growing fast.

Part of Wollongong’s SMART Infrastructure Facility, the DLL has developed what it calls the Versatile Intelligent Video Analytics platform.

Synthetic data allows the project to learn from numerous scenarios that might not otherwise be present at any given time, like rainstorms or masses of cyclists.

“This synthetic data generation allowed us to generate 35,000-plus images per scenario of interest under different weather, time of day and lighting conditions,” said Barthelemy.

“The synthetic data generation uses ray tracing to improve the…

1 week, 1 day ago @ blogs.nvidia.com
Accelerating Apache Spark 3.0 with GPUs and RAPIDS

NVIDIA has worked with the Apache Spark community to implement GPU acceleration through the release of Spark 3.0 and the open source RAPIDS Accelerator for Spark.

In this post, we dive into how the RAPIDS Accelerator for Apache Spark uses GPUs to:Accelerate end-to-end data preparation and model training on the same Spark cluster.

Apache Spark 3.0 represents a key milestone, as Spark can now schedule GPU-accelerated ML and DL applications on Spark clusters with GPUs, removing bottlenecks, increasing performance, and simplifying clusters.

For Apache Spark 3.0, new RAPIDS API actions are used by Spark SQL and DataFrames for GPU-accelerated memory-efficient columnar data processing and query pl…

1 week, 1 day ago @ developer.nvidia.com
Detecting Rotated Objects Using the NVIDIA Object Detection Toolkit

The rotated boxes detected by ODTK (b) address this issue and better fit the outline of the objects.

Figure 5 shows an example of how rotated box intersections can be much more complex than axis-aligned box intersections.

All the features (axis-aligned and rotated bounding box detection) are available in the NVIDIA Object Detection Toolkit (ODTK).

Table 2. Instance-level precision, recall, and F1 scores for an axis-aligned model compared to a rotated model on the ISPRS Potsdam dataset:

Model         Precision  Recall  F1 Score
Axis-aligned  0.37       0.55    0.44
Rotated       0.77       0.76    0.76
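The extra complexity of rotated-box overlap can be seen in a short sketch (illustrative pure Python, not ODTK's GPU implementation): axis-aligned IoU is a few min/max operations, whereas rotated IoU requires clipping one box polygon against the other, here via Sutherland-Hodgman.

```python
def clip_polygon(subject, clip):
    """Intersect two convex polygons given as counter-clockwise vertex lists."""
    def inside(p, a, b):  # p on the left of (or on) directed edge a->b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p1, p2, a, b):  # intersection of segment p1-p2 with line a-b
        d1 = (p2[0] - p1[0], p2[1] - p1[1])
        d2 = (b[0] - a[0], b[1] - a[1])
        denom = d1[0] * d2[1] - d1[1] * d2[0]
        t = ((a[0] - p1[0]) * d2[1] - (a[1] - p1[1]) * d2[0]) / denom
        return (p1[0] + t * d1[0], p1[1] + t * d1[1])

    output = subject
    for a, b in zip(clip, clip[1:] + clip[:1]):
        inputs, output = output, []
        for p1, p2 in zip(inputs, inputs[1:] + inputs[:1]):
            if inside(p2, a, b):
                if not inside(p1, a, b):
                    output.append(intersect(p1, p2, a, b))
                output.append(p2)
            elif inside(p1, a, b):
                output.append(intersect(p1, p2, a, b))
        if not output:
            return []
    return output

def area(poly):
    """Shoelace formula."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))) / 2

def rotated_iou(box_a, box_b):
    inter = clip_polygon(box_a, box_b)
    ia = area(inter) if inter else 0.0
    return ia / (area(box_a) + area(box_b) - ia)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]           # CCW unit square
shifted = [(0.5, 0), (1.5, 0), (1.5, 1), (0.5, 1)]  # same square shifted by 0.5
assert rotated_iou(square, square) == 1.0
assert abs(rotated_iou(square, shifted) - 1 / 3) < 1e-9
```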

1 week, 1 day ago @ developer.nvidia.com
Sand Safety: Startup’s Lifeguard AI Hits the Beach to Save Lives

The two aspiring entrepreneurs — recent MBA graduates of Ben-Gurion University, in the country’s south — decided this was their problem to solve with AI.

They founded Sightbit in 2018 with BGU classmates Gadi Kovler and Minna Shezaf to help lifeguards see dangerous conditions and prevent drownings.

COVID-19 CallsPalmachim Beach lifeguards have a lot to watch, especially now as people get out of their homes for fresh air after the region begins reopening from COVID-19-related closures.

As part of Sightbit’s beach safety developments, the company had been training its network to spot how far apart people were to help gauge child safety.

Sightbit is a member of NVIDIA Inception, a virtual acce…

1 week, 1 day ago @ blogs.nvidia.com
Floating on Creativity: SuperBlimp Speeds Rendering Workflows with NVIDIA RTX GPUs

They’re leaving CPU rendering behind and moving to NVIDIA RTX GPUs, bringing significant acceleration to the rendering workflows for their unique productions.

SuperBlimp had been using NVIDIA GPUs for the past few years, so they were already familiar with the power and performance of GPU acceleration.

But they always had one foot in the CPU camp and needed to constantly switch between CPU and GPU rendering.

“With NVIDIA GPUs, we saw render times reduce from 3 hours to 15 minutes.”

Learn how NVIDIA GPUs are powering the future of creativity.

1 week, 2 days ago @ blogs.nvidia.com
Heart of the Matter: AI Helps Doctors Navigate Pandemic

Caption Health develops software for ultrasound systems, called Caption AI.

Without the images from Caption AI, it would have been difficult to clinch the diagnosis, said a doctor on the scene.

Heart Test Becomes Standard Procedure

Caption AI helped doctors in North Carolina determine that a 62-year-old man had COVID-19-related heart damage.

Then they automatically choose the highest-quality heart images and interpret them to help doctors make informed decisions.

Now that its tool has been tested in a pandemic, Caption Health looks forward to opportunities to help save lives across many ailments.

1 week, 3 days ago @ blogs.nvidia.com
NVIDIA Puts More Tools in Hands of Artists, Designers and Data Scientists Working Remotely

NVIDIA is giving these millions of professionals around the world a boost with a new version of our virtual GPU software, vGPU July 2020.

The software adds support for more workloads and is loaded with features that improve operational efficiencies for IT administrators.

Initial offerings will be supported with NVIDIA vComputeServer software, enabling GPU virtualization for AI and data science workloads.

“To ensure the needs of business leaders are met, SUSE and NVIDIA have worked to simplify the use of NVIDIA virtual GPUs in SUSE Linux Enterprise Server.

This includes cross-branch support, where the host and guest vGPU software can be on different versions, easing upgrades and large deploy…

1 week, 3 days ago @ blogs.nvidia.com
Apple Machine Learning Journal
latest post: none
Uber Engineering Uber Engineering
latest post 1 week, 3 days ago
Fiber: Distributed Computing for AI Made Simple

Instead of programming only a single desktop or laptop, users can leverage this system to program the whole computer cluster.

Fiber allows users to write programs that run on a computer cluster without needing to dive into the details of the computer cluster.

This overall architecture is summarized in Figure 2, below.

Job-backed processes

Fiber introduces a new concept called job-backed processes (also called Fiber processes).

When starting a new Fiber process, Fiber creates a new job with the proper Fiber back end on the current computer cluster.

Our hypothesis was that Fiber should perform similarly to multiprocessing because neither Fiber nor multiprocessing rely on complex scheduling me…
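Because Fiber mirrors the standard multiprocessing API, a program like the following can, per the post, move from one machine to a cluster largely by swapping the import (e.g., `from fiber import Pool`); the version below uses the stdlib so it is self-contained.

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    # On a cluster, Fiber would schedule these workers as jobs on the
    # cluster's backend; locally, the stdlib runs them as OS processes.
    with Pool(processes=2) as pool:
        results = pool.map(square, range(5))
    assert results == [0, 1, 4, 9, 16]
```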

1 week, 3 days ago @ eng.uber.com
Introducing Neuropod, Uber ATG’s Open Source Deep Learning Inference Engine

Unfortunately, adding support for a new deep learning framework across an entire machine learning stack is resource and time-intensive.

Using multiple deep learning frameworks

Deep learning (DL) is advancing very quickly and different DL frameworks are effective at different tasks.

Over the last year, we have deployed hundreds of Neuropod models across Uber ATG, Uber AI, and the core Uber business.

Deep learning with Neuropod

Let’s take a look at the overall deep learning process when using Neuropod to see how it helps make experimentation, deployment, and iteration easier.

Next steps

Neuropod has allowed Uber to quickly build and deploy new deep learning models, but that’s just the start.

1 month ago @ eng.uber.com
Inside Uber ATG’s Data Mining Operation: Identifying Real Road Scenarios at Scale for Machine Learning

The “spikes” at intersections result from the SDV crossing the same intersection multiple times as part of a “grid-coverage” driving pattern.

Data mining the scenario "pedestrian crossing the street"

While the SDV perception system is designed to detect pedestrians, only a subset of pedestrians actually cross the street.

Analyzing the "pedestrian crossing the street" scenario

The scenario of a pedestrian crossing the street has many relevant measurements, including the pedestrian crossing speed, road width, distance walked, crossing duration, distance walked on crosswalk, and traffic light state(s) at the time of crossing.

Let’s start by analyzing just one measurement: the pedestrian crossing…

1 month, 1 week ago @ eng.uber.com
Meta-Graph: Few-Shot Link Prediction Using Meta-Learning

For instance, in a social network we may use link prediction to power a friendship recommendation system, or in the case of biological network data, we might use link prediction to infer possible relationships between drugs, proteins, and diseases.

In principle, it can be combined with a wide variety of link prediction approaches based on GNNs, but we adopted a specific GNN, variational graph autoencoders (VGAEs), as our base link prediction framework.
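VGAE scores a candidate edge with an inner-product decoder over node embeddings. The sketch below illustrates that decoder in plain Python with toy, hypothetical embedding values; it is a minimal illustration of the decoder, not the Meta-Graph code:

```python
import math

def edge_probability(z_u, z_v):
    """VGAE-style inner-product decoder: the probability of an
    edge (u, v) is the sigmoid of the dot product of the two
    node embeddings."""
    score = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-score))

# Toy 4-dimensional embeddings for two nodes (hypothetical values).
z_u = [0.9, -0.1, 0.4, 0.2]
z_v = [0.8, 0.0, 0.5, 0.1]
p = edge_probability(z_u, z_v)  # > 0.5, so (u, v) is predicted as an edge
```

Because the decoder is symmetric in its arguments, the score of (u, v) equals that of (v, u), which suits undirected graphs like social networks.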

Experiment setup

To test how Meta-Graph might work in a real-world setting, we designed three novel benchmarks for few-shot link prediction.

In this few-shot link prediction setting, there are train/val/test splits at both the edge level and …

1 month, 1 week ago @ eng.uber.com
Announcing a New Framework for Designing Optimal Experiments with Pyro

We’ll treat working memory capacity as the length of the longest list of random digits that the participant can memorize.

Inference

We use Bayesian inference to incorporate our new observation into an estimate of the participant's working memory capacity.

It models the probability of correctly remembering the list of digits of different lengths for people with different working memory capacities, as shown in Figure 1, below. We also need a sense of what working memory capacities are plausible.
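The update described above can be sketched as a single discrete Bayes step. The logistic recall curve and the flat prior below are hypothetical stand-ins for illustration, not the paper's actual model:

```python
import math

# Hypothetical discrete model: capacities of 3 to 10 digits.
capacities = list(range(3, 11))

def p_correct(length, capacity, slope=1.0):
    """Probability of recalling a list of `length` digits for a
    participant with the given capacity: a logistic curve that
    drops as the list grows past the capacity (illustrative)."""
    return 1.0 / (1.0 + math.exp(slope * (length - capacity)))

# Flat prior over capacities.
prior = {c: 1.0 / len(capacities) for c in capacities}

def update(prior, length, correct):
    """One Bayesian update after observing a single trial outcome."""
    likelihood = {
        c: p_correct(length, c) if correct else 1.0 - p_correct(length, c)
        for c in capacities
    }
    unnorm = {c: prior[c] * likelihood[c] for c in capacities}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

# The participant correctly recalls a 7-digit list: posterior mass
# shifts toward higher capacities.
posterior = update(prior, length=7, correct=True)
```

Each observed trial sharpens the posterior, and the next list length can then be chosen to be maximally informative about the remaining uncertainty.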

Computing the optimal design

Our score for experimental designs, the expected information gain (EIG), is notoriously difficult to estimate.

In our paper, we showed that this method can be remarkably accurate on a range of different exp…

1 month, 4 weeks ago @ eng.uber.com
Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

Last year we introduced the Paired Open-Ended Trailblazer (POET) to explore the idea of open-ended algorithms.

ANNECS: A new way to measure progress in open-ended systems

Quantifying the performance of open-ended algorithms has remained elusive for the field.

Compare those from Original POET in Figure 4a to those produced by Enhanced POET in Figure 4b, below.

If this piques your interest, be sure to check out videos of example Enhanced POET agents on the Uber AI YouTube channel.

Towards that end, we are not only releasing a paper with full technical details, but also have open sourced the code for Enhanced POET.

2 months ago @ eng.uber.com
Under the Hood of Uber ATG’s Machine Learning Infrastructure and Versioning Control Platform for Self-Driving Vehicles

A trained model requires as input the data set artifact, the model training code, and configuration files governing model training.

Example sequence of events: registering a new data set

When a user registers a new data set, the VerCD Data set Service stores the dependency metadata in our database.

Data set service API

The data set service is responsible for tracking the dependencies for building a given data set.

The REST API supports the functions of creating a new data set, reading the metadata for a data set, updating the metadata of a data set, deleting a data set, and getting the artifact locations of the data set (such as in S3 or HDFS).
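As a concrete picture of those five operations, here is an in-memory stand-in with the REST route each method would map to noted in comments. The route paths, method names, and field names are hypothetical illustrations, not VerCD's actual interface:

```python
class DatasetServiceClient:
    """Illustrative in-memory stand-in for the five data set
    operations the REST API exposes (hypothetical, not VerCD)."""

    def __init__(self):
        self._datasets = {}

    def create(self, name, metadata, artifact_uri):
        # POST /datasets
        if name in self._datasets:
            raise ValueError(f"dataset {name!r} already exists")
        self._datasets[name] = {"metadata": metadata,
                                "artifact_uri": artifact_uri}

    def read(self, name):
        # GET /datasets/<name>
        return self._datasets[name]["metadata"]

    def update(self, name, metadata):
        # PUT /datasets/<name>
        self._datasets[name]["metadata"] = metadata

    def delete(self, name):
        # DELETE /datasets/<name>
        del self._datasets[name]

    def artifact_location(self, name):
        # GET /datasets/<name>/artifact (e.g. an s3:// or hdfs:// URI)
        return self._datasets[name]["artifact_uri"]

svc = DatasetServiceClient()
svc.create("lidar-2020-q1", {"version": 1}, "s3://bucket/lidar-2020-q1")
```

Keeping artifact locations behind the service, rather than hard-coding S3 or HDFS paths in training code, is what lets a versioned data set be rebuilt or relocated without touching its consumers.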

For instance, the VerCD data set serv…

4 months, 1 week ago @ eng.uber.com
Building a Backtesting Service to Measure Model Performance at Uber-scale

To better assess the performance of our models, we built a backtesting service for measuring forecast model error rates.

The backtesting service runs in a distributed system, allowing multiple models (>10), many backtesting windows (>20), and models for different cities (>200) to run simultaneously.

Backtesting at scale

Our data science teams regularly create forecast models and statistics to better understand budget spending and project financial performance.

For the purposes of our backtesting service, we chose to leverage two primary backtesting data split mechanisms, backtesting with an expanding window and backtesting with a sliding window. Above, we showcase three windows for each metho…
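The two split mechanisms can be sketched as simple index generators over a time series; this is a minimal illustration under assumed window sizes, not Uber's implementation:

```python
def expanding_windows(n_points, initial_train, horizon):
    """Expanding-window backtest splits: the training set grows
    with each window while the forecast horizon stays fixed."""
    windows = []
    start = initial_train
    while start + horizon <= n_points:
        windows.append((range(0, start), range(start, start + horizon)))
        start += horizon
    return windows

def sliding_windows(n_points, train_size, horizon):
    """Sliding-window backtest splits: the training set keeps a
    fixed size and slides forward with each window."""
    windows = []
    start = train_size
    while start + horizon <= n_points:
        windows.append((range(start - train_size, start),
                        range(start, start + horizon)))
        start += horizon
    return windows

exp = expanding_windows(n_points=12, initial_train=6, horizon=2)
sli = sliding_windows(n_points=12, train_size=6, horizon=2)
```

With 12 time points, a starting training size of 6, and a forecast horizon of 2, each method yields three windows: the expanding method's training sets grow to 6, 8, and 10 points, while the sliding method's stay at 6.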

4 months, 4 weeks ago @ eng.uber.com
Uber AI in 2019: Advancing Mobility with Artificial Intelligence

At the forefront of this effort is Uber AI, Uber’s center for advanced artificial intelligence research and platforms.

In this year alone, AI research at Uber has led to significant improvements in demand prediction and more seamless pick-up experiences.

Fostering AI collaboration through open source

In 2019, Uber AI was committed to sharing knowledge and best practices with the broader scientific community through open source projects.

Looking towards 2020

Next year, Uber AI will continue to innovate, collaborate, and contribute to Uber's platform services through the application of AI across our business.

For more on Uber AI, be sure to check out related articles on the Uber Engineering Blo…

6 months, 3 weeks ago @ eng.uber.com
Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data

We in Uber AI Labs investigated the intriguing question of whether we can create learning algorithms that automatically generate training data, learning environments, and curricula to help AI agents rapidly learn.

Increasingly, neural architecture search (NAS) algorithms are being deployed to automate the search for architectures, with great results.

New learners are able to learn on synthetic data faster than on real data (red line vs. blue line in Figure 1).

In our experiments, the estimates come either from training for 128 SGD steps on GTN-generated data or real data.

Then, for each method, the final best architecture according to the estimate is trained for a long time on real data.

6 months, 3 weeks ago @ eng.uber.com
Controlling Text Generation with Plug and Play Language Models

This article discusses an alternative approach to controlled text generation, titled the Plug and Play Language Model (PPLM), introduced in a recent paper from Uber AI.

In many ways, language models are like wise but unguided wooly mammoths that lumber wherever they please.

As we will show below, attribute models with only a single layer containing 4,000 parameters perform well at recognizing attributes and guiding generation.

Thus, we use the unmodified language model to ensure the fluency of language is maintained at or near the level of the original language model (in this example, GPT-2-medium).

Multiple attribute models

We may combine multiple attribute models in controlled generation, …

7 months, 1 week ago @ eng.uber.com
Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations

To this end, we previously developed ML models to better understand queries and to perform multi-objective optimization in the Uber Eats search and recommender system, which surfaced food options.

Graph learning in a nutshell

To best understand how we made our Uber Eats recommendations more accurate, it helps to know the basics of how graph learning works.

For example, to represent an eater in our Uber Eats model, we use not only order history to inform order suggestions, but also information about which food items are connected to past Uber Eats orders and insights about similar users.

For our Uber Eats use case, we opted for a graph neural network (GNN)-based approach to obtain an …

7 months, 1 week ago @ eng.uber.com
Uber Goes to NeurIPS 2019

This year, Uber is presenting 11 papers at the NeurIPS 2019 conference in Vancouver, Canada!

Scalable Global Optimization via Local Bayesian Optimization

David Eriksson (Uber AI) · Michael Pearce (Uber AI intern / Warwick University) · Jacob Gardner (Uber AI) · Ryan Turner (Uber AI) · Matthias Poloczek (Uber AI)

ArXiv

December 10 at 4:25 pm, West Ballroom C, NeurIPS Spotlight Talk

December 10 at 5:30 pm, East Exhibition Hall B&C, Poster #9

Bayesian optimization (BO) has recently emerged as a successful technique for the global optimization of black-box functions.

For additional information about our talks and posters, check out the Uber NeurIPS 2019 site.

Interested in the ML research that Uber …

7 months, 1 week ago @ eng.uber.com
Announcing the 2020 Uber AI Residency

On behalf of Uber, we invite you to join us on our journey as an Uber AI Resident.

Established in 2018, the Uber AI Residency is a 12-month training program for recent college and master’s graduates, professionals who are looking to reinforce their AI skills, and those with quantitative skills and interest in becoming an AI researcher at Uber.

This year’s AI residency program will focus on our self-driving cars project through Uber Advanced Technology Group (ATG).

Open source & publication opportunities

Across Uber, we are committed to an open and inclusive research mission that benefits the community at large through both Uber AI and Uber ATG Research.

Learn more about the Uber AI Residency…

7 months, 2 weeks ago @ eng.uber.com
Get to Know Uber ATG at ICCV, CoRL, and IROS 2019

We hope our approach to sharing will deepen the interactions and collaborations between industry and academia, and will ultimately bring self-driving research communities together.

This year, Uber ATG has five publications accepted at ICCV, two publications accepted at CoRL, and two publications accepted at IROS.

In addition, Raquel Urtasun, Uber ATG Chief Scientist and Head of Uber ATG R&D, will be giving four talks at ICCV.

Please come visit us at ICCV (booth #D-7), IROS, and CoRL to learn more about our lab's research, discuss the work with our researchers, and hear about career opportunities with Uber ATG.

Learn more about research opportunities with Uber ATG by visiting our careers page.

8 months, 2 weeks ago @ eng.uber.com
neptune.ai
last post 2 days, 20 hours ago