Very ML
Our SOTA digest of ML news. LeCun reads some of it.
DataTau
latest post: 2 hours ago
Mobile App Development Company | ArStudioz

https://www.arstudioz.com/mobile-app-development

2 hours ago @ datatau.net
Web App Development Company | ArStudioz

https://www.arstudioz.com/web-development

2 hours ago @ datatau.net
MonoLayout | Bird’s-Eye Layout Estimation from A Single Image

https://medium.com/syncedreview/monolayout-birds-eye-layout-estimation-from-a-single-image-3c4cd4e89268

10 hours ago @ datatau.net
Generate end-to-end source code for any programming language in one minute

https://medium.com/javascript-in-plain-english/angular-api-client-with-spring-boot-api-in-one-minute-32d3337483b6

12 hours ago @ datatau.net
Vipassana and Machine Learning

https://medium.com/@mr5iff/vipassana-and-machine-learning-a89058165088

16 hours ago @ datatau.net
Recommended books for learning R

http://theautomatic.net/2020/02/25/3-recommended-books-on-learning-r/

18 hours ago @ datatau.net
Being a Data Scientist at a Start-Up

https://scienceofdata.org/2020/02/25/being-a-data-scientist-at-a-start-up/

21 hours ago @ datatau.net
Why Did I Reject a Data Scientist Job?

https://www.kdnuggets.com/2020/02/why-reject-data-scientist-job.html

21 hours ago @ datatau.net
Generate Diverse Counterfactual Explanations for any machine learning model

https://github.com/microsoft/dice

1 day, 7 hours ago @ datatau.net
AI Weekly Update (Feb 24, 2020)

https://youtu.be/hZz95G8sJAs

1 day, 10 hours ago @ datatau.net
Bengio and Mila Researchers Use GAN Images to Illustrate Impact of Climate Change

https://medium.com/syncedreview/bengio-and-mila-researchers-use-gan-images-to-illustrate-impact-of-climate-change-4cac3a78ef47

1 day, 12 hours ago @ datatau.net
YOLO Creator Joseph Redmon Stopped CV Research Due to Ethical Concerns

https://medium.com/@Synced/yolo-creator-says-he-stopped-cv-research-due-to-ethical-concerns-b55a291ebb29

1 day, 14 hours ago @ datatau.net
Subclass Distillation Explained

https://youtu.be/cZcO7xFiY84

1 day, 16 hours ago @ datatau.net
Explore Best Programming Courses & Tutorials

https://www.courseya.com/

1 day, 17 hours ago @ datatau.net
Hadoop Interview Questions

https://www.onlineinterviewquestions.com/hadoop-interview-questions/

1 day, 20 hours ago @ datatau.net
Medium
latest post: 2 hours ago
Artificial intelligence is here. Now what?

Computers have infiltrated our lives. Next, they’ll be indistinguishable from us. What do we do about it? Continue reading on Medium »

2 hours ago @ medium.com
Emerging Technologies

Emerging technologies are technologies whose development, practical applications, or both are still largely unrealized, such that they are… Continue reading on Medium »

2 hours ago @ medium.com
Millennials: More Narcissistic than Other Generations?

Brief history lesson in Greek mythology: Continue reading on Medium »

2 hours ago @ medium.com
Advantages and Disadvantages of Sampling

Sampling: Continue reading on Medium »

2 hours ago @ medium.com
Reinforcement Learning formulation for Markov Decision Process and Multi Armed Bandit

I explored the basics of Reinforcement Learning in the previous post and will now go to a more advanced level. Reinforcement comes… Continue reading on Data Driven Investor »

2 hours ago @ medium.com
Recommendation System in Python: LightFM

A step-by-step guide to building a recommender system in Python using LightFM. Continue reading on Towards Data Science »

2 hours ago @ towardsdatascience.com
A simple introduction to transitioning into cybersecurity using open-source tools

How do I transition into a cybersecurity career? This guide covers open-source tools you can use to gain the necessary skills. Continue reading on Towards Data Science »

2 hours ago @ towardsdatascience.com
Pandas’ high efficiency at managing data

What is Pandas? Continue reading on Medium »

3 hours ago @ medium.com
The 4 most common data science interviews

Anyone trying to enter the field of data science knows that the data science interview process is not as well-defined as the software… Continue reading on Medium »

3 hours ago @ medium.com
Data types in statistics

First and foremost, before trying to model a set of variables, it is essential to understand the types of the variables. Having this understanding… Continue reading on Medium »

3 hours ago @ medium.com
BI MATRIX? Top 7 data reporting tools you shouldn’t miss in 2020

Reporting tools play a key role in visualizing data as graphs and charts, presenting it in an intuitive way. A great reporting tool should help users gather information conveniently and get a comprehensive view of their business. Continue reading on Medium »

4 hours ago @ medium.com
Using Data Science to Rethink Farm Locations

I want to transform the way we view cities. Everyone eats, but most people have little to do with how their food gets to their plate… Continue reading on Medium »

4 hours ago @ medium.com
Travel to or host a place in Seattle: things to know

Insights from an Airbnb dataset. Continue reading on Medium »

4 hours ago @ medium.com
Search Intent For SEO

The word on almost everybody’s lips in the tech industry is artificial intelligence. It is set to be the most disruptive type of… Continue reading on visualmodo »

4 hours ago @ medium.com
New AI Bots In the Making: Exciting Opportunities for Partners!

OLPORTAL, the world’s first decentralized messenger on hybrid neural networks, is releasing new neurobots after the humongous reception of… Continue reading on Medium »

4 hours ago @ medium.com
/r/MachineLearning
latest post: 59 minutes ago
[P] Follow-up work on the baikal library

Hello everyone, this is a follow-up post to share some advancements and improvements on baikal, a project on a graph-based, functional API for building complex machine learning pipelines that I shared a few months ago. Thank you everyone for your comments. I released a new version with some API improvements to handle some common, important cases, following the feedback I received from some users here on this sub and on GitHub Issues. I also added prettier documentation hosted on readthedocs (thanks to this very sleek Sphinx theme). The main improvements are: making non-naive stacking easier. As some users pointed out, the basic stacking example was a naive protocol prone to over-fitting. T…

59 minutes ago @ reddit.com
Batch norm with entropic regularization turns deterministic autoencoders into generative models

Submitted by /u/occamexmachina.

1 hour ago @ reddit.com
[D] A US Court says web scraping is legal. Are there any other legal barriers to making machine learning models on content from websites?

Reference: https://thenextweb.com/security/2019/09/10/us-court-says-scraping-a-site-without-permission-isnt-illegal/ Submitted by /u/RelevantMarketing.

5 hours ago @ reddit.com
[P] Fine-tuning GPT-2 1.5B on podcast transcripts

Hi folks, after becoming obsessed with AI Dungeon, I wanted to try my hand at fine-tuning the GPT-2 1.5B model. Looking for a sizable text corpus, I used Otter.ai to transcribe >350 episodes of the podcast Accidental Tech Podcast. I used a TPU on Google Cloud for fine-tuning, and ran it long enough to exhaust my $300 new-account credit (~2000 steps). TPUs seem to be the easiest free option for fine-tuning the 1.5B model. I go into more detail in this blog post: https://familiarcycle.net/2020/how-to-finetune-gpt2-on-podcast-transcripts.html I’m pretty happy with the results and overall feel like a whole new world has opened up for me. I’m a software engineer by trade but haven’t dipped my toe…

5 hours ago @ reddit.com
[D] Graph visualization?

Hello, I am working on a graph term project. I am supposed to visualize the clusters I got after clustering the similarity graph. I created my similarity graph G using networkx, and for visualization I have to work with Dash Cytoscape or graph-tool. I have no idea which one is easier, or how I can change colors according to clusters to visualize the colored graph in my notebook. I appreciate any help possible, thanks in advance. Submitted by /u/ellinor12345.

5 hours ago @ reddit.com
[R] Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning. By introducing some noise in the feature space rather than in the input space as is typically done for visual inputs, their agents can generalize better to uns

Submitted by /u/hardmaru.

6 hours ago @ reddit.com
[R] Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation. They show that NMT models with manually engineered, fixed (i.e. position-based) attention patterns perform as well as models that learn how to attend.

Submitted by /u/hardmaru.

6 hours ago @ reddit.com
[P] Just released a Keras-based Sum-Product Network (SPN) library!

I recently released the libspn-keras library, which enables the ML community to easily train and experiment with Sum-Product Networks (SPNs). SPNs are versatile probabilistic models that come with a natural probabilistic interpretation. Even though SPNs have been around for a while, they’ve remained a niche within ML research. This is partly because they weren’t easy to implement in a tensor-based fashion, nor was it trivial to scale them. This library aims to overcome those burdens, ultimately making these tractable, interpretable models available to a wider audience by offering a TensorFlow 2.x + Keras compatible implementation. You can find the library here! A short feature …

10 hours ago @ reddit.com
[D] Next episode with Gary Marcus: what questions are you interested in asking?

Hello! The next episode of IEEE RAS Soft Robotics is with Gary Marcus. What questions are you interested in asking? Gary is a professor in the Department of Psychology at New York University and was founder and CEO of Geometric Intelligence, a machine learning company later acquired by Uber. We think it would be interesting to integrate classical AI, deep learning, and symbolic reasoning from the perspective of soft robotics/robotics. Please let us know what you think about the questions that should be addressed. Submitted by /u/meldiwin.

10 hours ago @ reddit.com
[D] Use of Markets in Distributed Knowledge Systems

Hey all, I recently read DeepMind’s work on smooth markets as a legible way to frame RL problems. Markets have multiple uses in economics, but one of the most important is coming to a consensus on the price of some asset, including knowledge assets. This is why prediction markets such as soybean futures are good at aggregating knowledge. However, I have not seen any papers that utilize markets in the federated learning setting, which is entirely a function of aggregating disparate but useful knowledge. Is there any work on this? My literature search found nothing, but it was cursory. Submitted by /u/BrahmaTheCreator.

10 hours ago @ reddit.com
Learning to Continually Learn - J. Clune et al.

Submitted by /u/Utopia2.

11 hours ago @ reddit.com
[D] Uncertainty estimation experiments with Monte Carlo Dropout in CNNs

There is a blog post about uncertainty estimation experiments with the MC-dropout method, using structured dropout for convolutional networks. The experiments use only the toy MNIST dataset, but the post covers the basic concepts behind NN epistemic uncertainty estimation. I wrote this post because I currently see a lot of ML applications being created where no one cares about the concept of uncertainty, or they just use the softmax output and call it “confidence”, which drives me nuts. With these animated experiments I hope some of the people mentioned above start to think about NN reliability in applications. Read the blog post here. Submitted by /u/gabegabe6.
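For readers unfamiliar with the technique behind this post, here is a minimal pure-NumPy sketch of MC-dropout uncertainty estimation (toy random weights, not the author’s code): keep dropout active at inference, run many stochastic forward passes, and use the spread of the outputs as an epistemic-uncertainty proxy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weights for a one-hidden-layer net (hypothetical, for illustration only)
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 3))

def forward(x, p_drop=0.5):
    """One stochastic forward pass with dropout kept ON at inference time."""
    h = np.maximum(x @ W1, 0.0)                       # ReLU
    mask = rng.random(h.shape) > p_drop               # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)                     # inverted-dropout scaling
    logits = h @ W2
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)          # softmax probabilities

x = rng.normal(size=(1, 4))
samples = np.stack([forward(x) for _ in range(100)])  # T = 100 MC samples

mean_probs = samples.mean(axis=0)   # predictive mean
std_probs = samples.std(axis=0)     # spread across passes ~ epistemic uncertainty
```

The key point the post makes: `std_probs` (the disagreement between stochastic passes) carries information that a single softmax “confidence” does not.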

11 hours ago @ reddit.com
[P] Deadline extended for the FDA Open Data Adverse Event Anomalies Challenge

In response to a flurry of new interest in the Gaining New Insights by Detecting Adverse Event Anomalies Using FDA Open Data Challenge, we have decided to extend the submission period to March 13th. New guidance is available on the challenge site, including updated evaluation criteria and anomaly examples. Please also note that providing code is no longer required for a valid final submission. Selected contributors will be invited to participate in a panel at the Modernizing FDA’s Data Strategy public meeting. We are also pleased to announce that the Journal of the American Medical Informatics Association (JAMIA) supports the submission of a paper describing the challenge and the insights tha…

12 hours ago @ reddit.com
[D] Is the latent space in GANs regularized?

Hello, I’m working on clustering some high-dimensional data, and a new approach for me is to use a Variational Auto-Encoder (VAE) to generate the data itself and then look at the latent space of the VAE. I know that because of the way VAEs work, the latent space is regularized, and therefore I would expect to see some topographic organization in the latent space (though no clusters). My question is: do GANs also have a regularized latent space? Are they necessarily regularized? Does visualizing the latent space of GANs make sense? I have looked at this paper: ClusterGAN: Latent Space Clustering in Generative Adversarial Networks, but it’s a bit dense for me.

12 hours ago @ reddit.com
[D] Kaggle Days Dubai

Anyone going to Kaggle Days Dubai? I have received a call for acceptance of my application but couldn’t decide if I should go for it. I’m from India. Thanks. Submitted by /u/anuragiitr.

12 hours ago @ reddit.com
Towards Data Science
latest post: 2 hours ago
How to Recruit (and Keep!) Individuals from Under-Represented Groups

Eighteen strategies to improve your company’s diversity. Through freelancing and full-time employment, I have worked on a lot of teams.

Most of the time the teams were majority straight, cisgender White or Asian males.

I do not go to interviews with all-White or Asian-male interview panels.

I have had recruiters thank me for making this request and add impressive women to my interview list.

I wrote these ways to bring on women from my perspective as a woman; however, I think they are useful techniques for bringing on candidates from any Under-Represented Group (URG):

2 hours ago @ towardsdatascience.com
ROC Curve Transforms the Way We Look at a Classification Problem

There is no machine learning algorithm that works best for all problems. The Receiver Operating Characteristic (ROC) curve is a probability curve that illustrates how good our binary classifier is at separating classes, based on true-positive and false-positive rates.

AUC is the area under the (ROC) curve.

An example of an ROC curve and AUC. Source: Huy Bui.

Motivation: Why is understanding the ROC curve and AUC important for a data scientist?

Note: The precision-recall approach is very similar to the ROC curve; you can find it documented here.
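The ROC/AUC machinery the post describes is available in scikit-learn; a small, hand-checkable sketch with made-up labels and scores (not the article’s data):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical binary labels and classifier scores
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

# Points of the ROC curve: false-positive rate vs true-positive rate
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# AUC = area under that curve; equivalently, the probability that a random
# positive is scored above a random negative (14 of 16 pairs here -> 0.875)
auc = roc_auc_score(y_true, y_score)   # -> 0.875
```

Because AUC is threshold-free, it lets you compare classifiers without committing to a single operating point on the curve.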

2 hours ago @ towardsdatascience.com
Recommendation System in Python: LightFM

There are three groups of datasets: book metadata, user-book interactions, and users’ detailed book reviews. These datasets can be merged together by matching book/user/review ids.

To develop a reliable and robust ML model, it is essential to get a thorough understanding of the available data.

As a first step, let’s take a look at all the available fields and some sample data: books_metadata.sample(2). While all the available information is useful for extracting the contextual information needed to train a better recommendation system, for this example we’ll only focus on selected fields that require minimal manipulation.

interactions_selected.groupby(['rating', 'is_read']).s…
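The truncated snippet above groups interactions by rating and read status; here is a hedged sketch with stand-in tables (not the post’s Goodreads data) showing both the id-based merge and that groupby:

```python
import pandas as pd

# Hypothetical stand-ins for the book-metadata and interaction tables above
books_metadata = pd.DataFrame({
    "book_id": [1, 2],
    "title": ["Book A", "Book B"],
})
interactions = pd.DataFrame({
    "user_id": [10, 10, 11, 12],
    "book_id": [1, 2, 1, 2],
    "rating":  [5, 3, 4, 3],
    "is_read": [True, True, False, True],
})

# Merge the tables on their shared id, as the post describes
merged = interactions.merge(books_metadata, on="book_id")

# Count interactions per (rating, is_read) combination
counts = interactions.groupby(["rating", "is_read"]).size()
```

The resulting `counts` Series has a (rating, is_read) MultiIndex, which is a quick way to spot sparse or imbalanced interaction patterns before training.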

2 hours ago @ towardsdatascience.com
A simple introduction to transitioning into cybersecurity using open-source tools

But the reality is, most industries are moving towards tech, and with tech come cyber threats.

So traditional IT departments are having to adapt and start putting cybersecurity at the forefront.

There are so many roles in cybersecurity; knowing where your interests lie is a good start.

If you are coming from a data science background and love data, well, cybersecurity 🤝 data science = two peas in a pod.

In cybersecurity, an appetite for knowledge is needed; this can be what separates candidates who graduated from the same institute with the same grades.

2 hours ago @ towardsdatascience.com
The ultimate guide to A/B testing. Part 4: non-parametric tests

In part three we talked about two types of tests (parametric and non-parametric) and when and how to use parametric tests.

Just a quick reminder of the A/B test setup: we’ve been developing this amazing arcade game for the last couple of years, and things seem to be going pretty well.

After some discussions, the game team decided to develop a new mode and run an A/B test to check how it affects the metrics.

The most popular non-parametric tests are Pearson’s chi-squared test, Fisher’s exact test, and the Mann–Whitney U-test.

For our example, let’s use chi-squared to check retention and Fisher’s exact test for conversion (because the data for this metric is highly imbalanced).
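The two tests named above are one-liners in SciPy; a sketch with hypothetical retention and conversion counts (not the article’s data):

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical retention table: rows = variants A/B, cols = retained/churned
retention = np.array([[120, 380],
                      [150, 350]])
chi2, p_chi2, dof, expected = chi2_contingency(retention)

# Hypothetical conversion table with small, imbalanced counts,
# where Fisher's exact test is the safer choice
conversion = np.array([[3, 497],
                       [9, 491]])
odds_ratio, p_fisher = fisher_exact(conversion)
```

Chi-squared relies on expected counts being large enough; with cells as small as the conversion table’s, Fisher’s exact test avoids that approximation entirely.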

2 hours ago @ towardsdatascience.com
How to Encode Categorical Data

Twelve different encoding techniques, from basic to advanced. In this notebook, I will introduce different approaches to encoding categorical data.

By the end of this post, I hope you will have a better idea of how to deal with categorical data.

There are three main routes to encode the string data type: Classic Encoders (well known and widely used), Contrast Encoders (an innovative way to encode data by looking at different levels of features), …

Let’s create a random pokemon data set!

I created this sample data with only 10 observations to better illustrate the different encoding techniques and bring some joy to myself!
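As a taste of the classic encoders the post covers, here is a short sketch on a tiny hypothetical pokemon-style frame (not the author’s 10-row data set), showing one-hot and ordinal encoding:

```python
import pandas as pd

# Tiny hypothetical data set with two categorical columns
df = pd.DataFrame({
    "type": ["fire", "water", "grass", "water"],
    "size": ["small", "large", "medium", "small"],
})

# Classic one-hot encoding: one indicator column per category
one_hot = pd.get_dummies(df["type"], prefix="type")

# Classic ordinal encoding via an explicit mapping (order matters here)
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].map(size_order)
```

One-hot suits nominal features like `type`; an explicit mapping suits ordered features like `size`, where the numeric order carries meaning.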

4 hours ago @ towardsdatascience.com
On Trolleys, Self-Driving Cars, and Missing the Forest for the Trees

When weighing the potential benefits of self-driving cars against staying hung up on various philosophical riddles, like the original autonomous-vehicle trolley problem or the question of the moral agency of a self-driving car, the choice was obvious.

It is conceivable that such large-scale unemployment will lead to social strife so bad that it cancels out the benefits of self-driving vehicles in the first place.

A word of warning: you will have noticed by this point that I use the word ‘potential’ a lot when referring to the benefits of self-driving cars.

This is not because I think the future of L5 autonomous vehicles is in some ways uncertain.

Conclusion: In this post, I explained how I wen…

7 hours ago @ towardsdatascience.com
Deep Learning in the Cosmos: Ranking 3 Machine Learning (ML) Applications

Deep learning has helped advance the state of the art in multiple fields over the last decade, with scientific research being no exception.

Deep learning has likewise found applications in scientific research at the opposite end of the scale spectrum.

In this post, we’ll discuss some recent applications of deep learning used to study cosmology, i.e. the study of the universe.

Since co-expertise in deep learning and n-body orbital mechanics is going to be rare-squared, it’s easy to get bogged down in unfamiliar details and miss the point.

The use of open-source software and building on previously developed architectures underline the fact that deep learning is in an advanced readiness phase.

7 hours ago @ towardsdatascience.com
Time Series Land Cover Challenge: a Deep Learning Perspective

The subsampling of the pixels done by the TiSeLaC organizers had, as a first intent, the goal of balancing the class distribution.

However, we can see the problem from the point of view of signal processing, and more specifically of time series classification with extra information concerning the localisation of the said time series in space, thanks to the pixel coordinate data.

TiSeLaC classification task: once we take the direction of time series classification, we can compare different models that have been popular in the recent deep learning for time series literature.

3.1 Multiple unimodal networks: Multi-Channel Deep Convolutional Neural Network. One of the most popular models being the Multi-C…

7 hours ago @ towardsdatascience.com
Improving SaaS with Data Science

How our practicum team is helping a SaaS company improve their customer expansion and retention strategies using data. As a UC Davis MSBA student, I am part of a practicum team working on a data science project for a SaaS company.

Who: the SaaS company and our UC Davis MSBA team. How: our plans for using their data to improve their business. What is SaaS?

They have been super helpful in providing lots of interesting data that combines customer usage data (from the company’s platform) as well as Customer Relationship Management (CRM) information.

We will perform customer segmentation analysis as well as customer churn a…

8 hours ago @ towardsdatascience.com
YOLOv2 Object Detection from ONNX Model in MATLAB

How I imported a Tiny YOLOv2 ONNX model into MATLAB and re-trained the network to detect objects on a custom data set. Image courtesy Embry-Riddle’s RoboSub Team.

Data set: RoboSub is a competition run by RoboNation where students build an autonomous underwater vehicle to perform simulated search & rescue tasks.

The basic task is for the Autonomous Underwater Vehicle (AUV) to identify and avoid the objects under water.

Data preparation: all the images are resized to (416 x 416 x 3) and divided into three folders: train, test, and validation.

Images are then labelled using custom automation algorithms in the Ground Truth Labeler app in MATLAB.

Data…

8 hours ago @ towardsdatascience.com
Mastering String Methods in Pandas

In this post, we will walk through some of the most important string manipulation methods provided by pandas.

Next, let’s look at some specific string methods.

If we specify dtype='string' and print the series, s4 = pd.Series([' python', 'java', 'ruby', 'fortan'], dtype='string'); print(s4), we see that the values have been interpreted as the string dtype.

To summarize, we discussed some basic Pandas methods for string manipulation.

There are many more Pandas string methods I did not go over in this post.
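As a taste of the `.str` accessor the post walks through, here is a short sketch reusing the same toy series (the method choices are my own, not necessarily the article’s):

```python
import pandas as pd

# Same toy series as in the post, with the dedicated string dtype
s = pd.Series([' python', 'java', 'ruby', 'fortan'], dtype='string')

# Vectorized string methods live under the .str accessor
trimmed = s.str.strip()            # ' python' -> 'python'
upper = trimmed.str.upper()        # 'java' -> 'JAVA'
has_a = trimmed.str.contains('a')  # elementwise substring test
```

Each call returns a new Series, so these methods chain naturally, e.g. `s.str.strip().str.upper()`.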

8 hours ago @ towardsdatascience.com
Using null samples to shape decision spaces and defend against adversarial attacks

Figure 3: Learned decision spaces for a null-trained CNN model. After training this null model, we visualize the decision spaces as we did for the conventional model.

We then trained a null model using three types of null samples: uniform noise, mixed-digit images, and tiled-and-shuffled images.

Left: null model trained on mixed-digit null samples.

Middle: null model trained on shuffled-digit null samples.

When trained on both types of null samples (right panel), null models rarely make misclassifications, regardless of the magnitude of the perturbation.

9 hours ago @ towardsdatascience.com
Make a Deep Learning iOS App in Five Minutes

Making a deep learning app can seem daunting, but it’s not so bad in practice. We’re going to build a deep learning iOS app which classifies CIFAR10 in five minutes.

Deep learning and iOS development can seem scary, but they’re not so bad once you get started.

The first argument is the model we want to convert, which should be a Keras model.

For CIFAR10 that list is: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.

Build an iOS app: while our model is simmering, let’s build an iOS app for classifying images.

9 hours ago @ towardsdatascience.com
Effective Data Filtering in Pandas Using .loc[]

Data filtering using labels: there are a couple of things to highlight from the above code snippet.

Data filtering using ranges; using booleans: one effective method to filter data is to use a list of boolean values that matches the length of the axis we’re working on.

Data filtering using booleans (multiple conditions); using lambdas or custom functions: sometimes we need a more advanced criterion for data filtering, in which case we can use lambdas, as in the example shown.

In other words, this method is built on the boolean data filtering method, although they’re conceptually different.

Data filtering using lambdas: if we want to set more criteria, we can even write a fun…
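A compact sketch of the four `.loc[]` patterns the post names (labels, booleans, multiple conditions, callables), on a hypothetical frame of my own:

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [10, 25, 40], "qty": [3, 1, 5]},
    index=["apple", "banana", "cherry"],      # labels for .loc
)

by_label = df.loc["apple":"banana"]           # label-based slice (end-inclusive)
by_bool = df.loc[df["price"] > 20]            # boolean mask
multi = df.loc[(df["price"] > 20) & (df["qty"] > 1)]  # multiple conditions
by_lambda = df.loc[lambda d: d["qty"] >= 3]   # callable criterion
```

Note that label slices with `.loc` include both endpoints, unlike positional slicing, and multiple boolean conditions need parentheses around each comparison because `&` binds tighter than `>`.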

11 hours ago @ towardsdatascience.com
Becoming Human
latest post: 16 hours ago
The road to Pill AI

Looking back, it was the perfect storm that led me to begin PillSync 10 years ago and, unintentionally, paved the road towards an Automatic Pill Recognition (APR) platform.

Un-learning: in 2009, to appease my father, I enrolled in pharmacy school.

So much so that in my first year of pharmacy school, I would learn how to code after class, at night, and whenever I had the time.

I had just learned about drugs, picked up coding, and was now on the hunt for a first iPhone app idea to launch.

This trend started to hit me as users began to email me pill photos for identification.

16 hours ago @ becominghuman.ai
How can I start with AI in agriculture?

To start with AI in agriculture, you have to develop an AI model, such as an application, system, robot, or machine, that can leverage the power of this advanced technology and boost productivity.

AI models for agriculture: to start with AI in agriculture, you can develop AI-based models like autonomous tractors, robots, drones, and weed-controlling machines, or other similar devices.

How to develop an AI model for agriculture?

To get the training data sets for AI in agriculture, you can get in touch with companies like Anolytics, which provide high-quality data sets for developing such models.

Using the right quality of agriculture training data, you can start to use AI in this field and improve the produc…

16 hours ago @ becominghuman.ai
How Food and Beverage Companies Are Implementing AI for Supply Chain Management
How Food and Beverage Companies Are Implementing AI for Supply Chain Management How Food and Beverage Companies Are Implementing AI for Supply Chain Management

PreScouter · Feb 24 · 3 min read. Supply Chain Management is a constant struggle for food and beverage (F&B) companies.

That’s where Artificial Intelligence (AI) can provide F&B companies with new supply chain insights to stay ahead of the curve.

This emerging technology helps F&B companies with Supply Chain Management through logistics, predictive analytics, and transparency.

AI, Food Safety, and Quality Assurance: Having robots milk cows isn’t new, but testing milk for quality and safety using AI is.

The Ithaca-based plant will help monitor and test throughout the entire dairy supply chain — from farm to proc…

1 day, 13 hours ago @ becominghuman.ai
What Cutting Edge Technology Can We Expect in 2020?
What Cutting Edge Technology Can We Expect in 2020?

Technology and Business: Most of the organizations and enterprises out there have resisted technological intervention more than once.

Be it machine learning, artificial intelligence, cloud or blockchain, the adoption of cutting edge technologies can bring all the difference to your business.

It does require initial investment and effort, but once you adopt these technologies and transform your core business processes, there is no going back.

When the core technology is freely available to anyone, companies can’t use it as a competitive advantage.

These algorithms have huge potential to impact market forces and land the organizations utilizing them a competitive …

1 day, 13 hours ago @ becominghuman.ai
How Much Training Data is required for Chatbot Development?
How Much Training Data is required for Chatbot Development?

Natural language processing (NLP) and natural language understanding (NLU) are the two important aspects used to create training data sets for a chatbot.

Multilanguage Training Data: In chatbot training, data in multiple languages is also very important, as people feel more comfortable in their own language.

Analyze the Amount & Types of Queries: In chatbot training, the most crucial point when choosing a training data set is what types of queries, and how many, your customers can generate in a certain field.

Along with the quantity of chatbot training data, quality is also very important, so you need to find the right chatbot traini…

1 day, 14 hours ago @ becominghuman.ai
Attacking Machine Learning Models
Attacking Machine Learning Models

That adversarial examples fool models basically implies that models are not robust to carefully calculated noise.

Ref: Machine Learning at Berkeley. How to generate Adversarial Examples?

X = Clip_epsilon( x - sign( grad_x J( x, y_target ) ) ). We now “descend” the gradient, as indicated by the negative sign, so as to get closer to y_target.
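A minimal sketch of this targeted update, assuming the gradient of J(x, y_target) with respect to x has already been computed elsewhere (e.g. by a framework's autodiff); the input and gradient values below are hypothetical:

```python
import numpy as np

def targeted_fgsm_step(x, grad_x_J, epsilon=0.1):
    """One targeted fast-gradient-sign step, following the update above:
    X = Clip_epsilon(x - sign(grad_x J(x, y_target))).
    grad_x_J is the gradient of J(x, y_target) w.r.t. x."""
    x_adv = x - np.sign(grad_x_J)                    # step toward y_target
    return np.clip(x_adv, x - epsilon, x + epsilon)  # stay in the epsilon-ball

x = np.array([0.5, 0.2])
grad = np.array([1.0, -2.0])   # hypothetical gradient values for illustration
x_adv = targeted_fgsm_step(x, grad, epsilon=0.1)
```

The clip keeps the perturbation imperceptible: each coordinate moves at most epsilon away from the original input.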

Machine Learning researchers got really creative and proposed a lot more methods to generate adversarial examples and to defend against such examples.

Whether you develop machine learning systems or do research on machine learning algorithms, I hope you are convinced that adversarial attacks are worth considering in your work.

4 days, 9 hours ago @ becominghuman.ai
AI offers new hope for depression sufferers
AI offers new hope for depression sufferers

Algorithm can predict which antidepressant will be more effective for patients. EEG can predict depression treatment response. The path to an effective treatment for people with depression is slow, and often involves trial and error with several antidepressants before symptom relief happens, if at all.

About 1 in 8 of the 242 million adults are currently prescribed an antidepressant medication, yet psychiatrists have no way of accurately predicting whether a patient will benefit from a particular antidepressant medication.

About two thirds of patients diagnosed with depression do not respond to the first antidepressant medication their doctor prescribes.

Artificial Intelligence Conference: Typically, ab…

4 days, 9 hours ago @ becominghuman.ai
Reviews on top AI free courses that I’ve taken
Reviews on top AI free courses that I’ve taken

Last year I decided to get past the artificial intelligence buzzwords in media articles and really get a clue about the subject.

The more research I did, the more intrigued and interested in AI I became.

It amazed me how much AI will impact our lives, and I realised this is the field I want to be in.

Courses I’ve taken: Intro to Artificial Intelligence. About the course: It’s a classic on AI, and it happened to be the first course I ever took on the subject.

Peter Norvig: a director of research at Google and co-author of the leading college text in the field, Artificial Intelligence: A Modern Approach. Conclusion: I can’t recommend it enough.

5 days, 11 hours ago @ becominghuman.ai
Implementing an Autoencoder in PyTorch
Implementing an Autoencoder in PyTorch

The encoder and the decoder are the neural networks that make up the autoencoder model, as depicted in the following figure (illustrated using NN-SVG).

To simplify the implementation, we write the encoder and decoder layers in one class, as follows: the autoencoder model written as a custom torch.nn.Module.

Finally, we can train our model for a specified number of epochs, as follows: the training loop for the autoencoder model.

To see how our training is going, we accumulate the training loss for each epoch ( loss += training_loss.item() ), and compute the average training loss across an epoch ( loss = loss / len(train_loader) ).
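The structure described here can be sketched end to end. This is a minimal, self-contained version assuming random stand-in data instead of a real DataLoader; the layer sizes are illustrative, not the article's exact ones:

```python
import torch
from torch import nn

class AE(nn.Module):
    """Encoder and decoder written in one class, as the post describes.
    Layer sizes here are illustrative assumptions."""
    def __init__(self, in_dim=784, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, in_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
batches = [torch.rand(8, 784) for _ in range(2)]  # stand-in for train_loader

for epoch in range(2):
    loss = 0.0
    for features in batches:
        optimizer.zero_grad()
        recon = model(features)
        training_loss = criterion(recon, features)
        training_loss.backward()
        optimizer.step()
        loss += training_loss.item()   # accumulate per-batch loss
    loss = loss / len(batches)         # average loss across the epoch
```

The last two lines mirror the accumulation and averaging the excerpt describes, with `len(batches)` standing in for `len(train_loader)`.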

Closing Remarks: I hope this has been a clear tutorial on implementing an autoe…

5 days, 11 hours ago @ becominghuman.ai
How AI can impact Mobile App Development and User Experience
How AI can impact Mobile App Development and User Experience

AI is now being used in many sectors and areas to improve the user experience and enhance the customer journey.

How AI can improve user experience. Improving the app UX: Artificial Intelligence helps to engage the user by analyzing their behavior and purchasing patterns.

Logical reasoning for automation: Using automation to perform several mobile app tasks helps developers deliver a comfortable user experience.

As technology progresses, the use of artificial intelligence in mobile app development is driving the growth of many businesses.

Mobile app developers can use AI-based algorithms to customize the user journey through an app and to engage the customer.

6 days, 16 hours ago @ becominghuman.ai
A Case for a Futurist Philosophy of Interaction
A Case for a Futurist Philosophy of Interaction

However, surely that is because artificial intelligence is still in its infancy and flailing around in restrictive code.

The same comment could be made about the approach of artificial intelligence in conjunction with the present loss of jobs to automation.

Artificial intelligence and automation seem to be the perfect case of a domain where the left and the right need to come together in the centre.

Inevitably, the market will continue to build its case for Artificial Superintelligence until something passes our test of consciousness.

6 days, 16 hours ago @ becominghuman.ai
How to use Artificial Intelligence for Driving Business?
How to use Artificial Intelligence for Driving Business?

Well, if you are one of those struggling with business challenges like these, then leveraging Artificial Intelligence to boost business profits is a good option.

Let’s see how AI in business can improve customer experience and accelerate the overall growth of your business.

Artificial Intelligence helped MongoDB in increasing the new net leads by 70% and the total messaging response by 100%.

The impact of Artificial Intelligence on the business industry is huge and significant.

Lately, Spotify, a popular on-demand music service application, has increased its user base by using Artificial Intelligence.

6 days, 16 hours ago @ becominghuman.ai
Convolutional Neural Network — A brief introduction
Convolutional Neural Network — A brief introduction

A convolutional neural network (CNN) is a particular implementation of a neural network used in machine learning that exclusively processes array data such as images, and is thus frequently used in machine learning applications targeted at medical images.

Feature extraction: The feature extraction component of a convolutional neural network is what distinguishes CNNs from other multilayered neural networks.

This is what gives the neural network the ability to approximate almost any function.

If NxN is the dimension of the kernel matrix, we need to add floor(N/2) zero layers to the edges.
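The floor(N/2) rule above can be checked in a few lines. A minimal sketch, assuming "same" padding of a 2-D image with numpy (the image and kernel sizes below are arbitrary examples):

```python
import numpy as np

def pad_for_same_conv(image, kernel_size):
    """Zero-pad an image so that convolving with an NxN kernel preserves
    the spatial size: add floor(N/2) layers of zeros to every edge."""
    p = kernel_size // 2              # floor(N/2)
    return np.pad(image, p, mode="constant")

img = np.ones((5, 5))
padded = pad_for_same_conv(img, 3)    # 3x3 kernel -> one layer of zeros
```

With a 3x3 kernel the 5x5 image becomes 7x7, so a valid convolution returns a 5x5 output, matching the input size.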

Training: Most frequently, convolutional neural networks in radiology undergo supervised learning.

1 week ago @ becominghuman.ai
What is the Main Purpose of Video Annotation in Machine Learning and AI?
What is the Main Purpose of Video Annotation in Machine Learning and AI?

Just like image annotation, video annotation also helps machines to recognize the objects through computer vision.

So, right here, apart from object detection, we will explain what the main purpose of video annotation is.

Localize the Objects: Another purpose of video annotation is localizing the objects in the video.

Tracking the Activities: Another significant purpose of video annotation is to train a computer-vision-based AI or machine learning model to track human activities and estimate poses.

There are many video annotation companies providing data labeling services for AI and machine learning.

1 week ago @ becominghuman.ai
Chatbot Conference Flash Sale
Chatbot Conference Flash Sale

Chatbot Conference Flash Sale: Register & Save $100 on Tickets this Week. Great news: today is the start of our Flash Sale.

Register and save 20% on all Chatbot Conference Tickets.

>>> Click Here to Register & Save 20% <<<

1 week, 1 day ago @ becominghuman.ai
Distill.pub
last post 2 weeks ago
Growing Cellular Automata
Growing Cellular Automata

Differentiable Self-Organisation: Cellular Automata model of Morphogenesis.

2 weeks ago @ distill.pub
Visualizing the Impact of Feature Attribution Baselines
Visualizing the Impact of Feature Attribution Baselines

Exploring the baseline input hyperparameter, and how it impacts interpretations of neural network behavior.

1 month, 2 weeks ago @ distill.pub
Computing Receptive Fields of Convolutional Neural Networks
Computing Receptive Fields of Convolutional Neural Networks

Detailed derivations and open-source code to analyze the receptive fields of convnets.

3 months, 3 weeks ago @ distill.pub
The Paths Perspective on Value Learning
The Paths Perspective on Value Learning

A closer look at how Temporal Difference Learning merges paths of experience for greater statistical efficiency

4 months, 4 weeks ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarial Example Researchers Need to Expand What is Meant by 'Robustness'
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarial Example Researchers Need to Expand What is Meant by 'Robustness'

The main hypothesis in Ilyas et al. (2019) happens to be a special case of a more general principle that is commonly accepted in the robustness to distributional shift literature

6 months, 3 weeks ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Robust Feature Leakage
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Robust Feature Leakage

An example project using webpack and svelte-loader and ejs to inline SVGs

6 months, 3 weeks ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Two Examples of Useful, Non-Robust Features
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Two Examples of Useful, Non-Robust Features

An example project using webpack and svelte-loader and ejs to inline SVGs

6 months, 3 weeks ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarially Robust Neural Style Transfer
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarially Robust Neural Style Transfer

An experiment showing adversarial robustness makes neural style transfer work on a non-VGG architecture

6 months, 3 weeks ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarial Examples are Just Bugs, Too
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Adversarial Examples are Just Bugs, Too

Refining the source of adversarial examples

6 months, 3 weeks ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Learning from Incorrectly Labeled Data
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Learning from Incorrectly Labeled Data

Section 3.2 of Ilyas et al. (2019) shows that training a model on only adversarial errors leads to non-trivial generalization on the original test set. We show that these experiments are a specific case of learning from errors.

6 months, 3 weeks ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Discussion and Author Responses
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features': Discussion and Author Responses

6 months, 3 weeks ago @ distill.pub
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features'
A Discussion of 'Adversarial Examples Are Not Bugs, They Are Features'

Six comments from the community and responses from the original authors

6 months, 3 weeks ago @ distill.pub
Open Questions about Generative Adversarial Networks
Open Questions about Generative Adversarial Networks

What we'd like to find out about GANs that we don't know yet.

10 months, 3 weeks ago @ distill.pub
A Visual Exploration of Gaussian Processes
A Visual Exploration of Gaussian Processes

How to turn a collection of small building blocks into a versatile tool for solving regression problems.

10 months, 4 weeks ago @ distill.pub
Visualizing memorization in RNNs
Visualizing memorization in RNNs

Inspecting gradient magnitudes in context can be a powerful tool to see when recurrent units use short-term or long-term contextual understanding.

11 months, 1 week ago @ distill.pub
The Gradient
last post 2 weeks, 5 days ago
Quantifying Independently Reproducible Machine Learning
Quantifying Independently Reproducible Machine Learning

My investigation into reproducible ML has also relied on personal notes and records hosted on Mendeley and Github.

http://phdcomics.com/comics/archive.php?comicid=1689

Our aversion to using or asking for the authors’ code is more than fear of working with undocumented research-grade code.

What Makes a ML Paper Reproducible?

The biggest takeaway is that we cannot take all of our assumptions about so-called reproducible ML at face value.

At the same time, our process and systems must result in reproducible work that does not lead us astray.

2 weeks, 5 days ago @ thegradient.pub
GPT-2 and the Nature of Intelligence
GPT-2 and the Nature of Intelligence

-- The AI system GPT-2, in a December 2019 interview with The Economist, "An artificial intelligence predicts the future". Innateness, empiricism, and recent developments in deep learning: Consider two classic hypotheses about the development of language and cognition.

Consider GPT-2, an AI system that was recently featured in The New Yorker and interviewed by The Economist.

The popular blog Slate Star Codex featured it, too, in a podcast entitled "GPT-2 as a step towards General Intelligence".

Compared to any previous system for generating natural language, GPT-2 has a number of remarkable strengths.

"I speak fluent English." If you run your experiments at talktotransformer.com, you will quickly learn th…

1 month ago @ thegradient.pub
The Economics of AI Today
The Economics of AI Today

Every day we hear claims that Artificial Intelligence (AI) systems are about to transform the economy, creating mass unemployment and vast monopolies.

In September 2017, a group of distinguished economists gathered in Toronto to set out a research agenda for the Economics of Artificial Intelligence (AI).

Previous editions of the Economics of AI conference included papers about the impact of AI in sectors such as media or health-care.

Lack of diversity in the AI research workforce, and the increasing influence of the private sector in setting AI research (and ethical) agendas as part of the industrialization of AI research suggest that this could be a problem, but the evidence base is lackin…

1 month, 1 week ago @ thegradient.pub
Is NeurIPS Getting Too Big?
Is NeurIPS Getting Too Big?

NeurIPS 2019, the latest incarnation of the Neural Information Processing Systems conference, wrapped up just over a week ago.

"No, that's a keynote at #NeurIPS2019" pic.twitter.com/nJjONGzJww (Jevgenij Gamper, @brutforcimag, December 11, 2019). NeurIPS poster session: too crowded.

Lots of Posters/Talks/Topics: The other primary purpose of any research conference is to inform attendees of new research and inspire new ideas.

NeurIPS 2019, Vancouver, Canada: got the visa 3 weeks before. :(

2019 NeurIPS was last week in Vancouver.

2 months ago @ thegradient.pub
An Epidemic of AI Misinformation
An Epidemic of AI Misinformation

Unfortunately, the problem of overhyped AI extends beyond the media itself.

General AI still seems like it might be a couple decades away, sixty years after the first optimistic projections were issued.

Hundreds of deep-learning-for-radiology companies have been spawned in the meantime, but thus far no actual radiologists have been replaced, and the best guess is that deep learning can augment radiologists but not, in the near term, replace them.

The net consequences could, in the end, debilitate the field, paradoxically inducing an AI winter after initially helping stimulate public interest.

If an AI system is allegedly better than humans, then which humans, and how much better?

2 months, 3 weeks ago @ thegradient.pub
Introduction to Artificial Life for People who Like AI
Introduction to Artificial Life for People who Like AI

Artificial Life is often shortened to ALife.

NEAT was awarded the 2017 International Society for Artificial Life Award for Outstanding Paper of the Decade.

First, I think we are seeing the first signs of the next AI winter, a period where people lose confidence in AI research and funding dries up.

Art ALife: “Edge of Chaos: Artificial Life based interactive art installation” by Vasilija Abramovic and Ruairi Glynn. Have you heard about the edge of chaos?

She was recently elected to the board of the International Society for Artificial Life.

3 months ago @ thegradient.pub
How Machine Learning Can Help Unlock the World of Ancient Japan
How Machine Learning Can Help Unlock the World of Ancient Japan

However, these models were unable to achieve strong performance on Kuzushiji recognition.

This was due to inadequate understanding of Japanese historical literature in the optical character recognition (OCR) community and the lack of high quality standardized datasets.

There are several reasons why Kuzushiji recognition is challenging:Capturing both local and global context is important.

This is one reason why conventional sequence models do not have the capability to work well with many Kuzushiji documents.

However there are many other types of Kuzushiji text that a person might want to transcribe.

3 months, 1 week ago @ thegradient.pub
Gaussian Processes, not quite for dummies
Gaussian Processes, not quite for dummies

Note: if all k components are independent Gaussian random variables, then $X$ must be multivariate Gaussian (because the sum of independent Gaussian random variables is always Gaussian).

Higher dimensional Gaussian. 5D Gaussian: Now we can consider a higher-dimensional Gaussian, starting from 5D, so the covariance matrix is now 5x5.

We then take K and add $I\sigma_y^2$ for the final covariance matrix to factor in noise -- more on this later.

This means in principle, we can calculate this covariance matrix for any real-valued $x_1$ and $x_2$ by simply plugging them in.
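The construction described here can be sketched numerically. A minimal example assuming a squared-exponential kernel (the specific kernel and the input locations below are assumptions; the post only requires that the covariance can be computed for any pair of real-valued inputs):

```python
import numpy as np

def rbf_kernel(xa, xb, length=1.0):
    """Squared-exponential kernel k(x1, x2) for 1-D inputs."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / length**2)

x = np.linspace(0, 1, 5)                # five input locations -> 5x5 covariance
sigma_y = 0.1                           # observation noise scale
K = rbf_kernel(x, x)
cov = K + np.eye(len(x)) * sigma_y**2   # K + I*sigma_y^2, the noise term above

# cov is symmetric positive definite, so it is a valid covariance for a
# 5-D Gaussian; a single draw gives a smooth, correlated function sample
sample = np.random.default_rng(0).multivariate_normal(np.zeros(len(x)), cov)
```

Adding the `I*sigma_y^2` term both models observation noise and keeps the matrix well conditioned for sampling and inversion.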

Gaussian Process, textbook definition: From the above derivation, you can view a Gaussian process as a generalization of multivariate G…

3 months, 2 weeks ago @ thegradient.pub
Evaluation Metrics for Language Modeling
Evaluation Metrics for Language Modeling

Counterintuitively, having more metrics actually makes it harder to compare language models, especially as indicators of how well a language model will perform on a specific downstream task are often unreliable.

Despite the presence of these downstream evaluation benchmarks, traditional intrinsic metrics are, nevertheless, extremely useful during the process of training the language model itself.

Proof: let P be the distribution of the underlying language and Q be the distribution learned by a language model.

The performance of N-gram language models does not improve much as N goes above 4, whereas the performance of neural language models continues improving over time.
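The cross-entropy relationship between P and Q mentioned here can be made concrete with a toy example. The three-word "language" and the model's estimate below are illustrative assumptions:

```python
import math

def cross_entropy_bits(P, Q):
    """H(P, Q) = -sum_x P(x) * log2 Q(x), with P the true language
    distribution and Q the distribution learned by a language model."""
    return -sum(p * math.log2(Q[w]) for w, p in P.items() if p > 0)

# Toy three-word "language" and a model's estimate of it (both made up)
P = {"the": 0.5, "cat": 0.3, "sat": 0.2}
Q = {"the": 0.4, "cat": 0.4, "sat": 0.2}

ppl_model = 2 ** cross_entropy_bits(P, Q)  # perplexity of the model
ppl_true = 2 ** cross_entropy_bits(P, P)   # 2^H(P): the best achievable
```

By Gibbs' inequality H(P, Q) >= H(P, P), so a model's perplexity is bounded below by the entropy of the language itself, which is why intrinsic metrics remain informative during training.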

In less than two years,…

4 months, 1 week ago @ thegradient.pub
The State of Machine Learning Frameworks in 2019
The State of Machine Learning Frameworks in 2019

Since deep learning regained prominence in 2012, many machine learning frameworks have clamored to become the new favorite among researchers and industry practitioners.

It is perhaps underappreciated how much machine learning frameworks shape ML research.

Machine learning research itself is also in a massive state of flux.

Most of us don't work on machine learning software for the money or to assist in our company's strategic plans.

We work in machine learning because we care - about advancing machine learning research, about democratizing AI, or maybe just about building cool stuff.

4 months, 2 weeks ago @ thegradient.pub
The #BenderRule: On Naming the Languages We Study and Why It Matters
The #BenderRule: On Naming the Languages We Study and Why It Matters

This has led to a digital divide in the field of NLP between high resource and low resource languages.

High resource languages constitute a short list starting with English, (Mandarin) Chinese, Arabic, and French.

And yet, the field of NLP is caught in a negative feedback loop that hinders the expansion of the languages we work on.

Work on languages other than English is often considered “language specific” and thus reviewed as less important than equivalent work on English.

Many NLP systems for Chinese, Japanese, Thai and other languages have to start with the problem of word tokenization.

5 months, 2 weeks ago @ thegradient.pub
NLP's Clever Hans Moment has Arrived
NLP's Clever Hans Moment has Arrived

However, the model doesn't care about this impossibility and identifies the correct warrant with 71 percent accuracy.

Coming back to the paper, the authors point to a (again, depressingly) large amount of recent work reporting Clever Hans effects in NLP datasets.

For a broader view on this topic, also see Ana Marasović's article on NLP's Generalization Problem.

The growing number of papers finding cases of the Clever Hans effect raises important questions for NLP research, the most obvious one being how the effect can be prevented.

If not much, the dataset may provide unintended non-content cues, such as sentence length or distribution of function words.

6 months ago @ thegradient.pub
Introducing Retrospectives: 'Real Talk' for your Past Papers
Introducing Retrospectives: 'Real Talk' for your Past Papers

What the community still lacks, though, are incentives for publicly documenting our real thoughts and feelings about our past papers.

Today, we’re launching ML Retrospectives, a website for hosting reflections and critiques of researchers’ own past papers that we’re calling retrospectives.

With the clearing of this emotional weight it became easier to look at my past papers.

ML Retrospectives is a platform for hosting retrospectives: documents where researchers write honestly about their past papers.

While a venue for critiquing other people’s papers might also be valuable, we wanted to focus on normalizing sharing drawbacks of your own past papers.

6 months, 1 week ago @ thegradient.pub
Leveraging Learning in Robotics: RSS 2019 Highlights
Leveraging Learning in Robotics: RSS 2019 Highlights

The Robotics: Science and Systems (RSS) conference, along with ICRA and IROS, is among the top three conferences in robotics, with a good proportion of papers published at the intersection of learning and robotics.

On the second day, I attended the Robust autonomy: Safe robot learning and control in uncertain real-world environments Workshop.

More on Soft Robotics: Professor Koichi Suzumori’s informative, comprehensive, and engaging keynote at RSS on the past, present, and future of Soft Robotics is a good place to start learning more about Soft Robotics.

I will discuss my opinions on the significance of soft robotics with the help of a Japanese word “E-kagen”, which has two contrasting mea…

6 months, 1 week ago @ thegradient.pub
Is Deep Learning the Future of Medical Decision Making?
Is Deep Learning the Future of Medical Decision Making?

tissue from biopsies) for reference when making medical decisions with new patients is a promising avenue where the state-of-the-art deep learning visual models can be highly applicable.

Cai’s research showcases how the refinement tools they developed on their medical image retrieval system increase the diagnostic utility of images and, most importantly, increase a user’s trust in the machine learning algorithm for medical decision making.

point out two crucial challenges in CBIR systems, which they call the intention gap and the semantic gap.

The role of deep learning: The underlying details of the CBIR system analysed by Carrie J. Cai et al.

claims that with their CBIR system users were a…

6 months, 3 weeks ago @ thegradient.pub
🔬 Science
arXiv.org
last post 1 hour ago
Variational Hyper RNN for Sequence Modeling. (arXiv:2002.10501v1 [cs.LG])
Variational Hyper RNN for Sequence Modeling. (arXiv:2002.10501v1 [cs.LG])

In this work, we propose a novel probabilistic sequence model that excels at capturing high variability in time series data, both across sequences and within an individual sequence. Our method uses temporal latent variables to capture information about the underlying data pattern and dynamically decodes the latent information into modifications of weights of the base decoder and recurrent model. The efficacy of the proposed method is demonstrated on a range of synthetic and real-world sequential data that exhibit large scale variations, regime shifts, and complex dynamics.

1 hour ago @ arxiv.org
Interpolating Between Gradient Descent and Exponentiated Gradient Using Reparameterized Gradient Descent. (arXiv:2002.10487v1 [cs.LG])
Interpolating Between Gradient Descent and Exponentiated Gradient Using Reparameterized Gradient Descent. (arXiv:2002.10487v1 [cs.LG])

Continuous-time mirror descent (CMD) can be seen as the limit case of the discrete-time MD update when the step-size is infinitesimally small. In this paper, we focus on the geometry of the primal and dual CMD updates and introduce a general framework for reparameterizing one CMD update as another. Specifically, the reparameterized update also corresponds to a CMD, but on the composite loss w.r.t. the new variables, and the original variables are obtained via the reparameterization map. We employ these results to introduce a new family of reparameterizations that interpolate between the two commonly used updates, namely the continuous-time gradient descent (GD) and unnormalized exponentiate…

1 hour ago @ arxiv.org
Precise Tradeoffs in Adversarial Training for Linear Regression. (arXiv:2002.10477v1 [cs.LG])
Precise Tradeoffs in Adversarial Training for Linear Regression. (arXiv:2002.10477v1 [cs.LG])

Despite breakthrough performance, modern learning models are known to be highly vulnerable to small adversarial perturbations in their inputs. While a wide variety of recent \emph{adversarial training} methods have been effective at improving robustness to perturbed inputs (robust accuracy), often this benefit is accompanied by a decrease in accuracy on benign inputs (standard accuracy), leading to a tradeoff between often competing objectives. Complicating matters further, recent empirical evidence suggests that a variety of other factors (size and quality of training data, model size, etc.) affect this tradeoff in somewhat surprising ways. In this paper we provide a precise and comprehensi…

1 hour ago @ arxiv.org
Dynamic Systems Simulation and Control Using Consecutive Recurrent Neural Networks. (arXiv:2002.10228v2 [cs.LG] UPDATED)
Dynamic Systems Simulation and Control Using Consecutive Recurrent Neural Networks. (arXiv:2002.10228v2 [cs.LG] UPDATED)

In this paper, we introduce a novel architecture to connect adaptive learning and neural networks into an arbitrary machine's control system paradigm. Two consecutive Recurrent Neural Networks (RNNs) are used together to accurately model the dynamic characteristics of electromechanical systems that include controllers, actuators and motors. The age-old method of achieving control with the use of the Proportional, Integral and Derivative constants is well understood as a simplified method that does not capture the complexities of the inherent nonlinearities of complex control systems. In the context of controlling and simulating electromechanical systems, we propose an alternative to PID…

1 hour ago @ arxiv.org
Bifurcation Spiking Neural Network. (arXiv:1909.08341v2 [cs.NE] UPDATED)
Bifurcation Spiking Neural Network. (arXiv:1909.08341v2 [cs.NE] UPDATED)

Spiking neural networks (SNNs) have attracted much attention due to their great potential for modeling time-dependent signals. The firing rate of spiking neurons is decided by a control rate that is fixed manually in advance, and thus whether the firing rate is adequate for modeling actual time series relies on fortune. Though it is desirable to have an adaptive control rate, this is a non-trivial task because the control rate and the connection weights learned during the training process are usually entangled. In this paper, we show that the firing rate is related to the eigenvalue of the spike generation function. Inspired by this insight, by enabling the spike generation function to have adapta…

1 hour ago @ arxiv.org
Tailoring Artificial Neural Networks for Optimal Learning. (arXiv:1707.02469v4 [cs.LG] UPDATED)

As one of the most important paradigms of recurrent neural networks, the echo

state network (ESN) has been applied to a wide range of fields, from robotics

to medicine, finance, and language processing. A key feature of the ESN

paradigm is its reservoir --- a directed and weighted network of neurons that

projects the input time series into a high dimensional space where linear

regression or classification can be applied. Despite extensive studies, the

impact of the reservoir network on the ESN performance remains unclear.

Combining tools from physics, dynamical systems and network science, we attempt

to open the black box of ESN and offer insights to understand the behavior of

general artif…

1 hour ago @ arxiv.org
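The reservoir idea described above can be sketched in a few lines: a fixed random network projects the input series into a high-dimensional state space, and only a linear readout is trained. All sizes, scalings, and the sine task are illustrative.

```python
import numpy as np

# Minimal echo state network with a trained linear readout (ridge regression).
rng = np.random.default_rng(0)
n_res = 100

W_in = rng.uniform(-0.5, 0.5, n_res)
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # spectral radius < 1 (echo state property)

def run_reservoir(u):
    x, states = np.zeros(n_res), []
    for u_t in u:
        x = np.tanh(W_in * u_t + W @ x)     # reservoir state update
        states.append(x)
    return np.array(states)

# Task: one-step-ahead prediction of a sine wave.
t = np.linspace(0, 20 * np.pi, 2000)
u, y = np.sin(t[:-1]), np.sin(t[1:])
X, y = run_reservoir(u)[100:], y[100:]      # drop the initial transient
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
mse = np.mean((X @ W_out - y) ** 2)
```

The paper's question is how the structure of `W` (the reservoir) affects performance; here it is simply random.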
An Assignment Problem Formulation for Dominance Move Indicator. (arXiv:2002.10842v1 [cs.NE])

Dominance move (DoM) is a binary quality indicator to compare solution sets

in multiobjective optimization. The indicator allows a more natural and

intuitive relation when comparing solution sets. It is Pareto compliant and

does not demand any parameters or reference sets. In spite of its advantages,

its combinatorial computation is a limitation. The original formulation

presents an efficient method to calculate it in the biobjective case only. This

work presents an assignment formulation to calculate DoM in problems with three

objectives or more. Some initial experiments, in the biobjective space, were

done to demonstrate the model's correctness. Next, other experiments, using three

dimensi…

1 hour ago @ arxiv.org
Separating the Effects of Batch Normalization on CNN Training Speed and Stability Using Classical Adaptive Filter Theory. (arXiv:2002.10674v1 [cs.NE])

Batch Normalization (BatchNorm) is commonly used in Convolutional Neural

Networks (CNNs) to improve training speed and stability. However, there is

still limited consensus on why this technique is effective. This paper uses

concepts from the traditional adaptive filter domain to provide insight into

the dynamics and inner workings of BatchNorm. First, we show that the

convolution weight updates have natural modes whose stability and convergence

speed are tied to the eigenvalues of the input autocorrelation matrices, which

are controlled by BatchNorm through the convolution layers' channel-wise

structure. Furthermore, our experiments demonstrate that the speed and

stability benefits are dist…

1 hour ago @ arxiv.org
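The classical adaptive-filter fact the paper leans on can be checked directly: convergence of a linear (LMS-style) filter is governed by the eigenvalue spread of the input autocorrelation matrix, and per-channel normalization shrinks that spread. The data and channel scales below are illustrative.

```python
import numpy as np

# Eigenvalue spread of the input autocorrelation matrix R = E[x x^T],
# before and after per-channel normalization (the BatchNorm analogy).
rng = np.random.default_rng(0)
x = rng.normal(size=(10000, 4)) * np.array([10.0, 1.0, 0.1, 5.0])  # unequal scales

def eig_spread(x):
    R = x.T @ x / len(x)          # sample autocorrelation matrix
    w = np.linalg.eigvalsh(R)
    return w.max() / w.min()      # condition number: large spread = slow modes

x_norm = (x - x.mean(0)) / x.std(0)
# eig_spread(x) is huge; eig_spread(x_norm) is close to 1.
```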
Non-Volatile Memory Array Based Quantization- and Noise-Resilient LSTM Neural Networks. (arXiv:2002.10636v1 [cs.NE])

In cloud and edge computing models, it is important that compute devices at

the edge be as power efficient as possible. Long short-term memory (LSTM)

neural networks have been widely used for natural language processing, time

series prediction and many other sequential data tasks. Thus, for these

applications there is increasing need for low-power accelerators for LSTM model

inference at the edge. In order to reduce power dissipation due to data

transfers within inference devices, there has been significant interest in

accelerating vector-matrix multiplication (VMM) operations using non-volatile

memory (NVM) weight arrays. In NVM array-based hardware, reduced bit-widths

also significantly i…

1 hour ago @ arxiv.org
Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity. (arXiv:2002.10585v1 [cs.NE])

The impressive lifelong learning in animal brains is primarily enabled by

plastic changes in synaptic connectivity. Importantly, these changes are not

passive, but are actively controlled by neuromodulation, which is itself under

the control of the brain. The resulting self-modifying abilities of the brain

play an important role in learning and adaptation, and are a major basis for

biological reinforcement learning. Here we show for the first time that

artificial neural networks with such neuromodulated plasticity can be trained

with gradient descent. Extending previous work on differentiable Hebbian

plasticity, we propose a differentiable formulation for the neuromodulation of

plasticity. …

1 hour ago @ arxiv.org
Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent. (arXiv:2002.10583v1 [cs.LG])

Stochastic gradient descent (SGD) with constant momentum and its variants

such as Adam are the optimization algorithms of choice for training deep neural

networks (DNNs). Since DNN training is incredibly computationally expensive,

there is great interest in speeding up convergence. Nesterov accelerated

gradient (NAG) improves the convergence rate of gradient descent (GD) for

convex optimization using a specially designed momentum; however, it

accumulates error when an inexact gradient is used (such as in SGD), slowing

convergence at best and diverging at worst. In this paper, we propose Scheduled

Restart SGD (SRSGD), a new NAG-style scheme for training DNNs. SRSGD replaces

the constant mome…

1 hour ago @ arxiv.org
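The scheduled-restart idea can be sketched on a toy quadratic: NAG-style momentum grows along a schedule and is periodically reset. The schedule, restart interval, and problem below are illustrative, not the paper's settings.

```python
import numpy as np

# NAG with a growing momentum schedule that is restarted every `restart`
# iterations, minimizing f(w) = 0.5 * w^T A w (illustrative constants).
A = np.diag([1.0, 10.0, 100.0])
w, v = np.ones(3), np.zeros(3)
lr, restart = 0.005, 40

for t in range(400):
    k = t % restart              # scheduled restart: reset the momentum schedule
    mu = k / (k + 3)             # Nesterov-style increasing momentum coefficient
    grad = A @ (w + mu * v)      # gradient evaluated at the lookahead point
    v = mu * v - lr * grad
    w = w + v
# w converges to the minimizer at the origin
```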
Neural Lyapunov Model Predictive Control. (arXiv:2002.10451v1 [cs.AI])

This paper presents Neural Lyapunov MPC, an algorithm to alternately train a

Lyapunov neural network and a stabilising constrained Model Predictive

Controller (MPC), given a neural network model of the system dynamics. This

extends recent works on Lyapunov networks to be able to train solely from

expert demonstrations of one-step transitions. The learned Lyapunov network is

used as the value function for the MPC in order to guarantee stability and

extend the stable region. Formal results are presented on the existence of a set

of MPC parameters, such as discount factors, that guarantees stability with a

horizon as short as one. Robustness margins are also discussed and existing

performance bounds on …

1 hour ago @ arxiv.org
Anatomy-aware 3D Human Pose Estimation in Videos. (arXiv:2002.10322v2 [cs.CV] UPDATED)

In this work, we propose a new solution for 3D human pose estimation in

videos. Instead of directly regressing the 3D joint locations, we draw

inspiration from the human skeleton anatomy and decompose the task into bone

direction prediction and bone length prediction, from which the 3D joint

locations can be completely derived. Our motivation is the fact that the bone

lengths of a human skeleton remain consistent across time. This prompts us to

develop effective techniques to utilize global information across {\it all} the

frames in a video for high-accuracy bone length prediction. Moreover, for the

bone direction prediction network, we propose a fully-convolutional propagating

architectur…

1 hour ago @ arxiv.org
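The decomposition described above reduces pose reconstruction to a walk over the kinematic tree: each joint is its parent plus bone length times a unit direction. The toy skeleton below is illustrative.

```python
import numpy as np

# Recover 3D joint positions from per-bone unit directions and bone lengths
# (the lengths stay constant across frames). Toy 4-joint chain.
parents = {1: 0, 2: 1, 3: 2}                  # child joint -> parent joint
lengths = {1: 0.5, 2: 0.4, 3: 0.3}            # bone lengths
directions = {                                # unit vectors parent -> child
    1: np.array([0.0, 0.0, 1.0]),
    2: np.array([0.0, 1.0, 0.0]),
    3: np.array([1.0, 0.0, 0.0]),
}

joints = {0: np.zeros(3)}                     # root at the origin
for j in sorted(parents):                     # parents are visited first
    joints[j] = joints[parents[j]] + lengths[j] * directions[j]
# joints[3] ends up at [0.3, 0.4, 0.5]
```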
A Class of Linear Programs Solvable by Coordinate-wise Minimization. (arXiv:2001.10467v4 [math.OC] UPDATED)

Coordinate-wise minimization is a simple popular method for large-scale

optimization. Unfortunately, for general (non-differentiable) convex problems

it may not find global minima. We present a class of linear programs that

coordinate-wise minimization solves exactly. We show that dual LP relaxations

of several well-known combinatorial optimization problems are in this class and

the method finds a global minimum with sufficient accuracy in reasonable

runtimes. Moreover, for extensions of these problems that are no longer in this

class, the method yields reasonably good suboptima. Though the presented LP

relaxations can be solved by more efficient methods (such as max-flow), our

results are t…

1 hour ago @ arxiv.org
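Coordinate-wise minimization itself is simple to state: exactly minimize over one variable at a time, holding the rest fixed. On a positive-definite quadratic (illustrative; the paper's subject is linear programs) it reaches the global minimum.

```python
import numpy as np

# Coordinate-wise (exact) minimization of 0.5 x^T A x - b^T x.
# Each inner step solves for one coordinate in closed form (Gauss-Seidel).
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite (illustrative)
b = np.array([1.0, 1.0])
x = np.zeros(2)

for _ in range(50):                      # sweeps over the coordinates
    for i in range(len(x)):
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]

x_star = np.linalg.solve(A, b)           # x converges to the global minimizer
```

For non-differentiable convex problems (general LPs included) such sweeps can stall at non-optimal points, which is exactly why the class identified by the paper is interesting.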
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences. (arXiv:2001.06891v2 [cs.CV] UPDATED)

In this paper, we consider a novel task, Spatio-Temporal Video Grounding for

Multi-Form Sentences (STVG). Given an untrimmed video and a

declarative/interrogative sentence depicting an object, STVG aims to localize

the spatiotemporal tube of the queried object. STVG has two challenging

settings: (1) We need to localize spatio-temporal object tubes from untrimmed

videos, where the object may only exist in a very small segment of the video;

(2) We deal with multi-form sentences, including declarative sentences with

explicit objects and interrogative sentences with unknown objects. Existing

methods cannot tackle the STVG task due to the ineffective tube pre-generation

and the lack of objec…

1 hour ago @ arxiv.org
arXiv.org
last post 1 hour ago
PDANet: Pyramid Density-aware Attention Net for Accurate Crowd Counting. (arXiv:2001.05643v6 [cs.CV] UPDATED)

Crowd counting, i.e., estimating the number of people in a crowded area, has

attracted much interest in the research community. Although many attempts have

been reported, crowd counting remains an open real-world problem due to the

vast scale variations in crowd density within the area of interest, and severe

occlusion among the crowd. In this paper, we propose a novel Pyramid

Density-Aware Attention-based network, abbreviated as PDANet, that leverages

the attention, pyramid scale feature and two branch decoder modules for

density-aware crowd counting. The PDANet utilizes these modules to extract

different scale features, focus on the relevant information, and suppress the

misleading ones. W…

1 hour ago @ arxiv.org
Multi-Graph Transformer for Free-Hand Sketch Recognition. (arXiv:1912.11258v2 [cs.CV] UPDATED)

Learning meaningful representations of free-hand sketches remains a

challenging task given the signal sparsity and the high-level abstraction of

sketches. Existing techniques have focused on exploiting either the static

nature of sketches with Convolutional Neural Networks (CNNs) or the temporal

sequential property with Recurrent Neural Networks (RNNs). In this work, we

propose a new representation of sketches as multiple sparsely connected graphs.

We design a novel Graph Neural Network (GNN), the Multi-Graph Transformer

(MGT), for learning representations of sketches from multiple graphs which

simultaneously capture global and local geometric stroke structures, as well as

temporal informat…

1 hour ago @ arxiv.org
Connecting Vision and Language with Localized Narratives. (arXiv:1912.03098v2 [cs.CV] UPDATED)

We propose Localized Narratives, an efficient way to collect image captions

with dense visual grounding. We ask annotators to describe an image with their

voice while simultaneously hovering their mouse over the region they are

describing. Since the voice and the mouse pointer are synchronized, we can

localize every single word in the description. This dense visual grounding

takes the form of a mouse trace segment per word and is unique to our data. We

annotate 628k images with Localized Narratives: the whole COCO dataset and 504k

images of the Open Images dataset, which we make publicly available. We provide

an extensive analysis of these annotations and demonstrate their utility on two

ap…

1 hour ago @ arxiv.org
Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks. (arXiv:1910.06259v3 [cs.LG] UPDATED)

Adversarial training yields robust models against a specific threat model,

e.g., $L_\infty$ adversarial examples. Typically robustness does not generalize

to previously unseen threat models, e.g., other $L_p$ norms, or larger

perturbations. Our confidence-calibrated adversarial training (CCAT) tackles

this problem by biasing the model towards low confidence predictions on

adversarial examples. By allowing low-confidence examples to be rejected,

robustness generalizes beyond the threat model employed during training. CCAT,

trained only on $L_\infty$ adversarial examples, increases robustness against

larger $L_\infty$, $L_2$, $L_1$ and $L_0$ attacks, adversarial frames, distal

adversarial exa…

1 hour ago @ arxiv.org
Real-Time Semantic Stereo Matching. (arXiv:1910.00541v2 [cs.CV] UPDATED)

Scene understanding is paramount in robotics, self-navigation, augmented

reality, and many other fields. To fully accomplish this task, an autonomous

agent has to infer the 3D structure of the sensed scene (to know where it is

looking) and its content (to know what it sees). To tackle the two tasks, deep

neural networks trained to infer semantic segmentation and depth from stereo

images are often the preferred choices. Specifically, Semantic Stereo Matching

can be tackled by either standalone models trained for the two tasks

independently or joint end-to-end architectures. Nonetheless, as proposed so

far, both solutions are inefficient because they require two forward passes in the

former case o…

1 hour ago @ arxiv.org
Stochastic Conditional Generative Networks with Basis Decomposition. (arXiv:1909.11286v2 [cs.CV] UPDATED)

While generative adversarial networks (GANs) have revolutionized machine

learning, a number of open questions remain to fully understand them and

exploit their power. One of these questions is how to efficiently achieve

proper diversity and sampling of the multi-mode data space. To address this, we

introduce BasisGAN, a stochastic conditional multi-mode image generator. By

exploiting the observation that a convolutional filter can be well approximated

as a linear combination of a small set of basis elements, we learn a

plug-and-play basis generator to stochastically generate basis elements, with

just a few hundred parameters, to fully embed stochasticity into

convolutional filters. By …

1 hour ago @ arxiv.org
Underwater Image Super-Resolution using Deep Residual Multipliers. (arXiv:1909.09437v3 [eess.IV] UPDATED)

We present a deep residual network-based generative model for single image

super-resolution (SISR) of underwater imagery for use by autonomous underwater

robots. We also provide an adversarial training pipeline for learning SISR from

paired data. In order to supervise the training, we formulate an objective

function that evaluates the \textit{perceptual quality} of an image based on

its global content, color, and local style information. Additionally, we

present USR-248, a large-scale dataset of three sets of underwater images of

'high' (640x480) and 'low' (80x60, 160x120, and 320x240) spatial resolution.

USR-248 contains paired instances for supervised training of 2x, 4x, or 8x SISR

models…

1 hour ago @ arxiv.org
Multiview-Consistent Semi-Supervised Learning for 3D Human Pose Estimation. (arXiv:1908.05293v3 [cs.CV] UPDATED)

The best performing methods for 3D human pose estimation from monocular

images require large amounts of in-the-wild 2D and controlled 3D pose annotated

datasets which are costly and require sophisticated systems to acquire. To

reduce this annotation dependency, we propose the Multiview-Consistent

Semi-Supervised Learning (MCSS) framework that utilizes similarity in pose

information from unannotated, uncalibrated but synchronized multi-view videos

of human motions as an additional weak supervision signal to guide 3D human pose

regression. Our framework applies hard-negative mining based on temporal

relations in multi-view videos to arrive at a multi-view consistent pose

embedding. When jointly trai…

1 hour ago @ arxiv.org
Learning Variations in Human Motion via Mix-and-Match Perturbation. (arXiv:1908.00733v2 [cs.LG] UPDATED)

Human motion prediction is a stochastic process: Given an observed sequence

of poses, multiple future motions are plausible. Existing approaches to

modeling this stochasticity typically combine a random noise vector with

information about the previous poses. This combination, however, is done in a

deterministic manner, which gives the network the flexibility to learn to

ignore the random noise. In this paper, we introduce an approach to

stochastically combine the root of variations with previous pose information,

which forces the model to take the noise into account. We exploit this idea for

motion prediction by incorporating it into a recurrent encoder-decoder network

with a conditional va…

1 hour ago @ arxiv.org
Context-Integrated and Feature-Refined Network for Lightweight Object Parsing. (arXiv:1907.11474v3 [cs.CV] UPDATED)

Semantic segmentation for lightweight object parsing is a very challenging

task, because both accuracy and efficiency (e.g., execution speed, memory

footprint or computational complexity) should all be taken into account.

However, most previous works pay too much attention to a one-sided perspective,

either accuracy or speed, and ignore the other, which poses a great limitation to

actual demands of intelligent devices. To tackle this dilemma, we propose a

novel lightweight architecture named Context-Integrated and Feature-Refined

Network (CIFReNet). The core components of CIFReNet are the Long-skip

Refinement Module (LRM) and the Multi-scale Context Integration Module (MCIM).

The LRM is designed…

1 hour ago @ arxiv.org
Progressive Gradient Pruning for Classification, Detection and Domain Adaptation. (arXiv:1906.08746v4 [cs.LG] UPDATED)

Although deep neural networks (NNs) have achieved state-of-the-art accuracy in

many visual recognition tasks, the growing computational complexity and energy

consumption of networks remains an issue, especially for applications on

platforms with limited resources and requiring real-time processing. Filter

pruning techniques have recently shown promising results for the compression

and acceleration of convolutional NNs (CNNs). However, these techniques involve

numerous steps and complex optimisations because some only prune after

training CNNs, while others prune from scratch during training by integrating

sparsity constraints or modifying the loss function. In this paper we

propose a new Progre…

1 hour ago @ arxiv.org
NAS-FCOS: Fast Neural Architecture Search for Object Detection. (arXiv:1906.04423v4 [cs.CV] UPDATED)

The success of deep neural networks relies on significant architecture

engineering. Recently, neural architecture search (NAS) has emerged as a promising

way to greatly reduce manual effort in network design by automatically searching

for optimal architectures, although typically such algorithms need an excessive

amount of computational resources, e.g., a few thousand GPU-days. To date, on

challenging vision tasks such as object detection, NAS, especially fast

versions of NAS, is less studied. Here we propose to search for the decoder

structure of object detectors with search efficiency being taken into

consideration. To be more specific, we aim to efficiently search for the

feature pyramid networ…

1 hour ago @ arxiv.org
Defending Against Universal Attacks Through Selective Feature Regeneration. (arXiv:1906.03444v3 [cs.CV] UPDATED)

Deep neural network (DNN) predictions have been shown to be vulnerable to

carefully crafted adversarial perturbations. Specifically, image-agnostic

(universal adversarial) perturbations added to any image can fool a target

network into making erroneous predictions. Departing from existing defense

strategies that work mostly in the image domain, we present a novel defense

which operates in the DNN feature domain and effectively defends against such

universal perturbations. Our approach identifies pre-trained convolutional

features that are most vulnerable to adversarial noise and deploys trainable

feature regeneration units which transform these DNN filter activations into

resilient features…

1 hour ago @ arxiv.org
Implicit Pairs for Boosting Unpaired Image-to-Image Translation. (arXiv:1904.06913v2 [cs.CV] UPDATED)

In image-to-image translation the goal is to learn a mapping from one image

domain to another. In the case of supervised approaches the mapping is learned

from paired samples. However, collecting large sets of image pairs is often

either prohibitively expensive or not possible. As a result, in recent years

more attention has been given to techniques that learn the mapping from

unpaired sets. In our work, we show that injecting implicit pairs into unpaired sets

strengthens the mapping between the two domains, improves the compatibility of

their distributions, and leads to performance boosting of unsupervised

techniques by over 14% across several measurements. The competence of the implicit p…

1 hour ago @ arxiv.org
Model-blind Video Denoising Via Frame-to-frame Training. (arXiv:1811.12766v3 [cs.CV] UPDATED)

Modeling the processing chain that has produced a video is a difficult

reverse engineering task, even when the camera is available. This makes model-

based video processing an even more complex task. In this paper we propose a

fully blind video denoising method, with two versions: off-line and on-line.

This is achieved by fine-tuning a pre-trained AWGN denoising network to the

video with a novel frame-to-frame training strategy. Our denoiser can be used

without knowledge of the origin of the video or burst and the post processing

steps applied from the camera sensor. The on-line process only requires a

couple of frames before achieving visually-pleasing results for a wide range of

perturbatio…

1 hour ago @ arxiv.org
Papers With Code
last post 6 hours ago
Localized Flow-Based Clustering in Hypergraphs

Local graph clustering algorithms are designed to efficiently detect small clusters of nodes that are biased to a localized region of a large graph.

Although many techniques have been developed for local clustering in graphs, very few algorithms have been designed to detect local clusters in hypergraphs, which better model complex systems involving multiway relationships between data objects...

In this paper we present a framework for local clustering in hypergraphs based on minimum cuts and maximum flows.

Our approach extends previous research on flow-based local graph clustering, but has been generalized in a number of key ways.

This allows us to accommodate a wide range of different hype…

6 hours ago @ paperswithcode.com
Crowdsourced Collective Entity Resolution with Relational Match Propagation

Entity resolution (ER) aims to identify entities in knowledge bases (KBs) which refer to the same real-world object...

In this paper, we propose a novel approach called crowdsourced collective ER, which leverages the relationships between entities to infer matches jointly rather than independently.

Specifically, it iteratively asks human workers to label picked entity pairs and propagates the labeling information to their neighbors in distance.

During this process, we address the problems of candidate entity pruning, probabilistic propagation, optimal question selection and error-tolerant truth inference.

Our experiments on real-world datasets demonstrate that, compared with state-of-the-art methods, our app…

6 hours ago @ paperswithcode.com
Sparse principal component regression via singular value decomposition approach

Principal component regression (PCR) is a two-stage procedure: the first stage performs principal component analysis (PCA) and the second stage constructs a regression model whose explanatory variables are replaced by principal components obtained by the first stage.

Since PCA is performed by using only explanatory variables, the principal components have no information about the response variable... To address the problem, we propose a one-stage procedure for PCR based on a singular value decomposition approach.

Our approach is based upon two loss functions, a regression loss and a PCA loss, with sparse regularization.

The proposed method enables us to obtain principal component loadings …

6 hours ago @ paperswithcode.com
Adaptive Covariate Acquisition for Minimizing Total Cost of Classification

Assuming that the cost of each covariate and the cost of misclassification can be specified by the user, our goal is to minimize the (expected) total cost of classification, i.e.

the cost of misclassification plus the cost of the acquired covariates.

Furthermore, on several medical datasets, we show that the proposed method achieves in most situations the lowest total costs when compared to various previous methods.

Finally, we weaken the requirement on the user to specify all misclassification costs by allowing the user to specify the minimally acceptable recall (target recall).

Our experiments confirm that the proposed method achieves the target recall while minimizing the false discover…

6 hours ago @ paperswithcode.com
Bidirectional Generative Modeling Using Adversarial Gradient Estimation

This paper considers the general $f$-divergence formulation of bidirectional generative modeling, which includes VAE and BiGAN as special cases.

We present a new optimization method for this formulation, where the gradient is computed using an adversarially learned discriminator...

In our framework, we show that different divergences induce similar algorithms in terms of gradient evaluation, except with different scaling.

Therefore this paper gives a general recipe for a class of principled $f$-divergence based generative modeling methods.

Theoretical justifications and extensive empirical studies are provided to demonstrate the advantage of our approach over existing methods.

6 hours ago @ paperswithcode.com
Unsupervised Enhancement of Soft-biometric Privacy with Negative Face Recognition

Current research on soft-biometrics showed that privacy-sensitive information can be deduced from biometric templates of an individual.

Since for many applications, these templates are expected to be used for recognition purposes only, this raises major privacy issues...

In this work, we present Negative Face Recognition (NFR), a novel face recognition approach that enhances the soft-biometric privacy on the template-level by representing face templates in a complementary (negative) domain.

While ordinary templates characterize facial properties of an individual, negative templates describe facial properties that do not exist for this individual.

Experiments are conducted on two publicly …

6 hours ago @ paperswithcode.com
GenDICE: Generalized Offline Estimation of Stationary Values

An important problem that arises in reinforcement learning and Monte Carlo methods is estimating quantities defined by the stationary distribution of a Markov chain.

In many real-world applications, access to the underlying transition operator is limited to a fixed set of data that has already been collected, without additional interaction with the environment being available... We show that consistent estimation remains possible in this challenging scenario, and that effective estimation can still be achieved in important applications.

Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions, derived from fundamental p…

6 hours ago @ paperswithcode.com
Efficient Sentence Embedding via Semantic Subspace Analysis

A novel sentence embedding method built upon semantic subspace analysis, called semantic subspace sentence embedding (S3E), is proposed in this work.

Given the fact that word embeddings can capture semantic relationships, and that semantically similar words tend to form semantic groups in a high-dimensional embedding space, we develop a sentence representation scheme by analyzing the semantic subspaces of a sentence's constituent words...

Second, we characterize the interaction between multiple semantic groups with the inter-group descriptor.

The proposed S3E method is evaluated on both textual similarity tasks and supervised tasks.

The complexity of our S3E method is also much lower than other parameterized…

6 hours ago @ paperswithcode.com
Differentially Private Set Union

We study the basic operation of set union in the global model of differential privacy.

We want an ($\epsilon$,$\delta$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible.

For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem.

We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$.

We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm.
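The weighted-histogram baseline that such policies improve upon can be sketched as follows. This is an illustrative reconstruction, not the paper's algorithm; the names `max_contrib`, `sigma`, and `threshold` are my own. Each user splits one unit of weight among a bounded number of items, and only items whose noised weight clears a threshold are released:

```python
import random

random.seed(0)

def dp_set_union(user_sets, max_contrib, sigma, threshold):
    # Baseline weighted-histogram approach: each user contributes at most
    # `max_contrib` items, splitting a total weight of 1 among them; items
    # whose noisy weight clears the threshold are released.
    counts = {}
    for items in user_sets:
        contrib = sorted(items)[:max_contrib]
        for it in contrib:
            counts[it] = counts.get(it, 0.0) + 1.0 / len(contrib)
    released = set()
    for it, weight in counts.items():
        if weight + random.gauss(0.0, sigma) > threshold:
            released.add(it)
    return released

common = [{"the", "a"} for _ in range(100)]   # items shared by many users
rare = [{"secret"}]                            # an item held by a single user
out = dp_set_union(common + rare, max_contrib=2, sigma=1.0, threshold=10.0)
# frequent items are released; the single-user item almost surely is not
```

The point of the threshold is exactly the privacy guarantee: an item contributed by only one user has weight at most 1, so it essentially never survives a threshold placed well above the noise scale.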

6 hours ago @ paperswithcode.com
Learning Certified Individually Fair Representations

To effectively enforce fairness constraints, one needs to define an appropriate notion of fairness and employ representation learning to impose this notion without compromising downstream utility for the data consumer.

A desirable notion is individual fairness as it guarantees similar treatment for similar individuals...

In this work, we introduce the first method which generalizes individual fairness to rich similarity notions via logical constraints while also enabling data consumers to obtain fairness certificates for their models.

The key idea is to learn a representation that provably maps similar individuals to latent representations at most $\epsilon$ apart in $\ell_{\infty}$…

6 hours ago @ paperswithcode.com
Guessing State Tracking for Visual Dialogue

This paper introduces a guessing state for the guesser and regards guessing as a process in which the guessing state changes over the course of a dialogue.

A guess model based on guessing state tracking is therefore proposed.

UoVR updates the representation of the image according to the current guessing state, QAEncoder encodes the question-answer pairs, and UoGS updates the guessing state by combining information from both the image and the dialogue history.

With the guessing state in hand, two loss functions are defined as supervisions for model training.

Early supervision provides supervision to the guesser in early rounds, and incremental supervision brings monotonicity to the guessing state.

6 hours ago @ paperswithcode.com
Suppressing Uncertainties for Large-Scale Facial Expression Recognition

Annotating a high-quality large-scale facial expression dataset is extremely difficult due to the uncertainties caused by ambiguous facial expressions, low-quality facial images, and the subjectivity of annotators.

These uncertainties lead to a key challenge of large-scale Facial Expression Recognition (FER) in the deep learning era... To address this problem, this paper proposes a simple yet efficient Self-Cure Network (SCN) which suppresses the uncertainties efficiently and prevents deep networks from over-fitting uncertain facial images.

Specifically, SCN suppresses the uncertainty from two different aspects: 1) a self-attention mechanism over mini-batch to weight each training sample with …

6 hours ago @ paperswithcode.com
Exploring the Connection Between Binary and Spiking Neural Networks

On-chip edge intelligence has necessitated the exploration of algorithmic techniques to reduce the compute requirements of current machine learning frameworks.

This work aims to bridge the recent algorithmic progress in training Binary Neural Networks and Spiking Neural Networks - both of which are driven by the same motivation and yet synergies between the two have not been fully explored... We show that training Spiking Neural Networks in the extreme quantization regime results in near full precision accuracies on large-scale datasets like CIFAR-$100$ and ImageNet.

An important implication of this work is that Binary Spiking Neural Networks can be enabled by "In-Memory" hardware accelerat…

6 hours ago @ paperswithcode.com
Learning from Positive and Unlabeled Data with Arbitrary Positive Shift

Positive-unlabeled (PU) learning trains a binary classifier using only positive and unlabeled data.

A common simplifying assumption is that the positive data is representative of the target positive class...

This paper shows that PU learning is possible even with arbitrarily non-representative positive data when provided unlabeled datasets from the source and target distributions.

Our first method couples negative-unlabeled (NU) learning with unlabeled-unlabeled (UU) learning, while the other uses a novel recursive risk estimator robust to positive shift.

Experimental results demonstrate our methods' effectiveness across numerous real-world datasets and forms of positive data bias, including disjoi…

6 hours ago @ paperswithcode.com
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

Scene text detection and recognition have received increasing research attention.

Here we address the problem by proposing the Adaptive Bezier-Curve Network (ABCNet).

2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods.

Experiments on arbitrarily-shaped benchmark datasets, namely Total-Text and CTW1500, demonstrate that ABCNet achieves state-of-the-art accuracy while significantly improving speed.

In particular, on Total-Text, our real-time version is over 10 times faster than recent state-of-the-art methods with a competitive recognition accu…

6 hours ago @ paperswithcode.com
📝 Cool Blogs
ODS.ai Habr
latest post 1 week ago
Tuning the loss function for a neural network on seismic survey data

In the previous article we described an experiment to determine the minimum number of manually labeled slices needed to train a neural network on seismic survey data. Today we continue that work by choosing the most suitable loss function. We examine two basic families of functions, binary cross-entropy and intersection over union, in six variants with parameter tuning, as well as combinations of functions from the two families. We additionally consider regularizing the loss function. Spoiler: we managed to substantially improve the quality of the network's predictions. Read more →
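As an illustration of the kind of loss-function mix discussed above, here is a minimal NumPy sketch that blends binary cross-entropy with a soft IoU loss. The function names and the weighting parameter `alpha` are my assumptions, not the article's code:

```python
import numpy as np

def bce(p, y, eps=1e-7):
    # Binary cross-entropy over a predicted-probability map p and binary mask y.
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def soft_iou_loss(p, y, eps=1e-7):
    # 1 - soft intersection-over-union, differentiable in p.
    inter = np.sum(p * y)
    union = np.sum(p) + np.sum(y) - inter
    return float(1.0 - (inter + eps) / (union + eps))

def combined_loss(p, y, alpha=0.5):
    # Weighted mix of the two loss families compared in the article.
    return alpha * bce(p, y) + (1.0 - alpha) * soft_iou_loss(p, y)

p_hat = np.array([0.99, 0.01, 0.98])   # predicted probabilities
mask = np.array([1.0, 0.0, 1.0])       # ground-truth mask
# combined_loss(p_hat, mask) is close to 0 for a good prediction
```

The `alpha` knob is exactly the kind of parameter the article tunes when searching over combinations of the two families.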

1 week ago @ habr.com
An open course "Deep Learning in NLP" from the creators of DeepPavlov, based on cs224n

Hi everyone!

Introduction

My name is Aleksey Klokov, and I want to tell you about the launch of a great course on Natural Language Processing that is once again being run by the MIPT folks behind DeepPavlov, an open library for conversational artificial intelligence developed in the Neural Networks and Deep Learning Lab at MIPT. I thank them and Moryshka for permission to cover this topic on Habr in our ODS blog. So, let's go! Read more →

2 weeks, 5 days ago @ habr.com
"We Read the Papers for You" digest. October – December 2019

Hi, Habr! We continue publishing reviews of research papers written by members of the Open Data Science community in the #article_essense channel. Want to receive them before everyone else? Join the community!

Today's papers: Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring (Facebook, 2019)

Implicit Discriminator in Variational Autoencoder (Indian Institute of Technology Ropar, 2019)

Self-training with Noisy Student improves ImageNet classification (Google Research, Carnegie Mellon University, 2019)

Momentum Contrast for Unsupervised Visual Representation Learning (Facebook, 2019)

Benchmarking Neural Network Robustness to Common Corruptions and …

3 weeks, 6 days ago @ habr.com
SVM. An explanation from scratch and an implementation in Python. A detailed walkthrough of the support vector machine

Greetings to everyone who has chosen the path of the ML samurai!

Introduction:

In this article we look at the support vector machine (SVM) for the classification task. We present the core idea of the algorithm, derive the update rule for its weights, and walk through a simple from-scratch implementation. Using an example dataset, we demonstrate how the written algorithm handles linearly separable and non-separable data, and visualize training and prediction. We also cover the algorithm's pros and cons and its modifications. Figure 1. A photo of an iris flower from open sources. Read more →

1 month ago @ habr.com
TensorRT 6.x.x.x: high-performance inference for deep learning models (object detection and segmentation)

It only hurts the first time! Hi everyone! Dear friends, in this article I want to share my experience with TensorRT and RetinaNet based on the repository github.com/aidonchuk/retinanet-examples (a fork of NVIDIA's official repo that lets you put optimized models into production in the shortest possible time). Scrolling through the messages in the ods.ai community channels, I keep running into questions about using TensorRT, and the questions mostly repeat themselves, so I decided to write as complete a guide as I could on fast inference with TensorRT, RetinaNet, Unet, and Docker. Read more →

1 month ago @ habr.com
The Lacmus project: how computer vision helps rescue lost people

Hi everyone! You may already know about the Machine Learning for Social Good (#ml4sg) initiative of the Open Data Science community. Within it, enthusiasts apply machine learning methods, free of charge, to solve socially significant problems. We, the team of the Lacmus project (#proj_rescuer_la), are deploying modern deep learning solutions to search for people lost away from populated areas: in forests, fields, and so on. Read more →

1 month, 1 week ago @ habr.com
Experiments with neural networks on seismic survey data

Interpreting seismic survey data is difficult because every such dataset is unique and each task demands an individual approach. Manual processing requires considerable labor, and the result often contains errors caused by the human factor. Using neural networks for interpretation can greatly reduce manual work, but the uniqueness of the data limits how much of it can be automated. This article describes an experiment assessing the applicability of neural networks for automatically delineating geological layers in 2D images, using fully labeled data from the North Sea as an example. Figure 1. Conducting …

1 month, 2 weeks ago @ habr.com
Making PyTorch and C++ friends. Using TorchScript

About a year ago the PyTorch developers presented TorchScript, a tool that, with a couple of lines of code and a few mouse clicks, turns a Python pipeline into a standalone artifact that can be embedded into a C++ system. Below I share my experience using it and try to describe the pitfalls encountered along the way. I pay special attention to building the project on Windows: although ML research is usually done on Ubuntu, the final solution is often (surprise!) required to run under Windows. Sample code for exporting a model, and a C++ project that uses the model, can be found in the repository on GitHub. Read more →

2 months, 1 week ago @ habr.com
On Structural Modeling of Organizational Change

75%, or 3 out of 4: that is how Boston Consulting Group estimates the share of IT projects that died for non-technical reasons. For two consecutive editions now, the Project Management Body of Knowledge (PMBOK) has set aside stakeholder management as a separate knowledge area under the lucky number 13 and strongly recommends accounting for: 1. the connections between stakeholders,

2. the centers of influence, and 3. the culture of communication, in order to improve the chances of success.

One question remains: how long will engineers keep judging stakeholders by guesswork? PHOTO: Sharif Hamza for Dazed & Confused, model Lupita Nyong'o. In light of the recent unconditional victory of Russian mathematics over the question of chromatic numbers, let us consider a scenario for applying the rapidly gaining pop…

2 months, 3 weeks ago @ habr.com
How I tackled the data-like machine learning competition

Hi, Habr. A competition from Tinkoff and McKinsey finished recently. The contest ran in two stages: the first was a qualifier in Kaggle format, i.e. you submit predictions and receive a quality score, and whoever has the best score wins. The second was an onsite hackathon in Moscow for the top 20 teams from the first stage. In this article I cover the qualifying stage, where I managed to take first place and win a MacBook. My team on the leaderboard was called "дети Лёши". The competition ran from September 19 to October 12. I started exactly one week before the end and worked on it almost full-time.

A brief description of the competition:

In the summer, stories (like in Instagram) appeared in the Tinkoff banking app. On s…

2 months, 4 weeks ago @ habr.com
"We Read the Papers for You" digest. July – September 2019

Hi, Habr! We continue publishing reviews of research papers written by members of the Open Data Science community in the #article_essense channel. Want to receive them before everyone else? Join the community!

Today's papers: Layer rotation: a surprisingly powerful indicator of generalization in deep networks? (Université catholique de Louvain, Belgium, 2018)

Parameter-Efficient Transfer Learning for NLP (Google Research, Jagiellonian University, 2019) RoBERTa: A Robustly Optimized BERT Pretraining Approach (University of Washington, Facebook AI, 2019)

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (Google Research, 2019)

How the Brain Transitions from Conscious to Subliminal Percept…

4 months ago @ habr.com
"We Read the Papers for You" digest. January – June 2019

Hi, Habr! We continue publishing reviews of research papers written by members of the Open Data Science community in the #article_essense channel. Want to receive them before everyone else? Join the community!

Today's papers: Neural Ordinary Differential Equations (University of Toronto, 2018)

Semi-Unsupervised Learning with Deep Generative Models: Clustering and Classifying using Ultra-Sparse Labels (University of Oxford, The Alan Turing Institute, London, 2019)

Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure (Massachusetts Institute of Technology, Harvard University, 2019)

Deep reinforcement learning from human preferences (OpenAI, DeepMind, 2017)

Exploring Randomly Wired Neural …

4 months, 1 week ago @ habr.com
Building a dataset for meter recognition with Yandex.Toloka

About two years ago, having randomly switched on the TV, I caught an interesting segment on the "Vesti" news program. It reported that Moscow's Department of Information Technology was building a neural network to read water meter values from photographs. The anchor asked city residents to help the project and send photos of their meters to the mos.ru portal so the neural network could be trained on them. If you are a Moscow city department, airing a spot on a federal channel and asking people to send in meter images is not much of a problem. But what if you are a small startup that cannot advertise on a TV channel? How do you get 50,000 meter images then? Read …

4 months, 2 weeks ago @ habr.com
From physics to Data Science (from engines of science to office plankton). Part three

This picture, by Arthur Kuzin (n01z3), sums up the content of the blog post fairly accurately. Accordingly, what follows should be read as a Friday story rather than as something highly useful and technical. It is also worth noting that the text is full of English words. Some of them I do not know how to translate properly, and some I simply do not want to.

Part one.

Part two.

How the transition from academia to industry went is covered in the first two installments. This one is about what happened next.

It was January 2017. At that point I had a little over a year of work experience, and I was working in San Francisco at a com…

5 months ago @ habr.com
Visualizing large graphs for beginners

What do you do when you need to draw a graph, but the tools at hand render a hairball or simply devour all your RAM and hang the system? Over the past couple of years of working with large graphs (hundreds of millions of vertices and edges) I have tried many tools and approaches, and found almost no decent overviews of them. So now I am writing such an overview myself. Read more →

5 months, 3 weeks ago @ habr.com
inFERENCe
latest post 3 months, 1 week ago
Meta-Learning Millions of Hyper-parameters using the Implicit Function Theorem

November 14, 2019. Last night on the train I read this nice paper by David Duvenaud and colleagues.

Implicit Function Theorem. Many, though not all, meta-learning or hyperparameter optimization problems can be stated as nested optimization problems.

Using a finite truncation of the Neumann series, one can approximate the inverse Hessian in the following way:

$$\left[\frac{\partial^2 \mathcal{L}_T}{\partial \theta \partial \theta}\right]^{-1} \approx \sum_{i=0}^{j} \left(I - \frac{\partial^2 \mathcal{L}_T}{\partial \theta \partial \theta}\right)^i.$$
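A minimal numerical sketch of the truncated Neumann series, assuming a Hessian whose eigenvalues lie in (0, 2) so the series converges, and including the identity (i = 0) term; the helper name is mine, not from the post:

```python
import numpy as np

def neumann_inverse_hvp(hess, v, n_terms=60):
    # Approximate hess^{-1} @ v with the truncated Neumann series
    #   sum_{i=0}^{j} (I - hess)^i @ v,
    # which converges when the eigenvalues of hess lie in (0, 2).
    term = v.astype(float)
    total = term.copy()
    for _ in range(n_terms - 1):
        term = term - hess @ term     # term <- (I - hess) @ term
        total += term
    return total

hess = np.array([[0.5, 0.0], [0.0, 1.2]])
v = np.array([1.0, 1.0])
approx = neumann_inverse_hvp(hess, v)
exact = np.linalg.solve(hess, v)      # the two agree to high precision
```

Note that only Hessian-vector products are needed, never the Hessian inverse itself, which is what makes the trick usable at the scale of millions of hyperparameters.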

Most crucially, methods based on implicit gradients assume that your le…

3 months, 1 week ago @ inference.vc
The secular Bayesian: Using belief distributions without really believing

October 31, 2019. The religious Bayesian. My parents didn't raise me in a religious tradition.

The secular BayesianOver the years I came to terms with my Bayesian heritage, and I now live my life as a secular Bayesian.

This choice is the real reason why the resulting update rule will end up very Bayes-rule like, as we will see later.

RationalityNow that we have an update rule which satisfies our desiderata, can we say if it's actually a good or useful update rule?

So, not only is this update rule the only update rule that satisfies the desired properties, it is also optimal under this particular definition of optimality/ra…

3 months, 3 weeks ago @ inference.vc
Exponentially Growing Learning Rate? Implications of Scale Invariance induced by Batch Normalization

October 25, 2019. Exponentially Growing Learning Rate?

Implications of Scale Invariance induced by Batch Normalization. Yesterday I read this intriguing paper about the mind-boggling fact that it is possible to use an exponentially growing learning-rate schedule when training neural networks with batch normalization: Zhiyuan Li and Sanjeev Arora (2019), An Exponential Learning Rate Schedule for Deep Learning. The paper provides both theoretical insights and an empirical demonstration of this remarkable property.

So imagine doing vanilla gradient descent (no momentum, no weight decay, a fixed learning rate) on such a loss surface.

However, the weight vector won't completely blow up to infinity, because th…

4 months ago @ inference.vc
On Marginal Likelihood and Cross-Validation

The marginal likelihood and cross-validation. To discuss the connection between marginal likelihoods and (Bayesian) cross-validation, let's first define what is what.

For each of these permutations we can decompose the marginal likelihood as a product of conditionals, or equivalently we can write the log marginal likelihood as a sum of logs of the same conditionals.

So, the sum of all the terms in this matrix gives the marginal likelihood times 6 (as there are 6 columns).
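The decomposition is easy to check numerically in a conjugate model. Below is a small Beta-Bernoulli sketch (my own illustration, not the post's code): the log marginal likelihood is computed as a sum of log posterior-predictive conditionals, and by exchangeability every permutation of the data yields the same value:

```python
import itertools
import math

def log_predictive(heads, tails, x, a=1.0, b=1.0):
    # Log posterior-predictive probability of outcome x under a
    # Beta(a, b) prior after observing `heads` ones and `tails` zeros.
    p_head = (a + heads) / (a + b + heads + tails)
    return math.log(p_head if x == 1 else 1.0 - p_head)

def log_marginal(data, a=1.0, b=1.0):
    # Chain rule: log p(x_1..x_n) = sum_i log p(x_i | x_1..x_{i-1}).
    heads = tails = 0
    total = 0.0
    for x in data:
        total += log_predictive(heads, tails, x, a, b)
        heads += x
        tails += 1 - x
    return total

data = (1, 0, 1)
vals = [log_marginal(p) for p in itertools.permutations(data)]
# every ordering yields the same value; here exp(v) = 1/12 for all 6 orderings
```

This is exactly the "each column of the matrix sums to the same log marginal likelihood" observation: the marginal likelihood is an average of sequential predictive scores over orderings of the data.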

This observation gives a really good motivation for using the marginal likelihood, and also gives a new perspective on how it works.

Calculating the marginal likelihood amounts to evaluating the average predictive score on al…

4 months, 1 week ago @ inference.vc
Notes on iMAML: Meta-Learning with Implicit Gradients

September 19, 2019. This week I read this cool new paper on meta-learning: it takes a slightly different approach compared to its predecessors, based on some observations about differentiating the optima of regularized optimization.

Let me illustrate what that dependence looks like. In the figure above, let's say that we would like to minimise an objective function $f(\theta)$.

Rather than deterministically finding a particular local minimum, SGD samples different minima: when run with different random seeds it will find different minima.

The meta-learning objective now depends on $\theta_0$ in two different ways: as we change the anchor $\theta_0$, …

5 months, 1 week ago @ inference.vc
Invariant Risk Minimization: An Information Theoretic View

July 19, 2019. I finally got around to reading this new paper by Arjovsky et al.

Here, I will describe the main idea and then provide an information theoretic view on the same topic.

$Y \perp\mkern-13mu\perp E\vert X_1, W$: The observable $X_1$ and latent $W$ shield the label $Y$ from the influence of the environment.

Say we have a parametric family of functions $f(y\vert \phi(x); \theta)$ for predicting $y$ from $\phi(x)$.

The conditional information can be approximated as follows: \begin{align}I[Y, E \vert \phi(x)] &\approx \min_\theta \mathbb{E}_{x,y} \ell (f(y\vert \phi(x); \theta)) - \mathbb{E}_e \min_{\theta_e} \mathbb{E}_{x,y\vert e} \el…

7 months, 1 week ago @ inference.vc
ICML Highlight: Contrastive Divergence for Combining Variational Inference and MCMC

Ruiz and Titsias (2019), A Contrastive Divergence for Combining Variational Inference and MCMC. Background: the principle of minimal improvement. First, some background on why I found this paper particularly interesting.

Using such an improvement operator, you can define an objective function for policies by measuring the extent to which the operator changes a policy.

In the case of AlphaGo Zero, the improvement operator is Monte Carlo Tree Search (MCTS).

The paper I'm talking about uses a very similar argument to come up with a contrastive divergence for variational inference, where the improvement operator is an MCMC step.

Combining VI with MCMC. The two dominant ways of performing inference in latent var…

8 months, 2 weeks ago @ inference.vc
Notes on the Limitations of the Empirical Fisher Approximation

June 6, 2019. This post is a short note on an excellent recent paper on empirical Fisher information matrices: Kunstner, Balles and Hennig (2019), Limitations of the Empirical Fisher Approximation. I was debating with myself whether I should write a post about this, because it's a superbly written paper that you should probably read in full.

There isn't a whole lot of novelty in the paper, but it is a great discussion piece that provides a concise overview of the Fisher information, the empirical Fisher matrix, and their connections to generalized Gauss-Newton methods.

The third shows the gradients corrected by the empirical Fisher instea…

8 months, 3 weeks ago @ inference.vc
Perceptual Straightening of Natural Videos

May 30, 2019. Video is an interesting domain for unsupervised, or self-supervised, representation learning.

So, for example, straight trajectories have an almost $0$ probability under a high-dimensional Brownian motion or Ornstein–Uhlenbeck (OU) process.

Results and summary. The main result of the paper, as expected, is that natural video sequences indeed appear to be mapped to straight trajectories in representation space.

For one, the paper assumes a Gaussian observation noise in representation space, and I wonder how robust the analysis would be to assuming heavy-tailed noise.

Similarly, our very definition of straightness and angles relies on the…

9 months ago @ inference.vc
DeepSets: Modeling Permutation Invariance

February 7, 2019. Guest post by [Fabian Fuchs](https://twitter.com/FabianFuchsML), [Ed Wagstaff](https://github.com/edwag), and [Martin Engelcke](https://twitter.com/martinengelcke). One of my favourite recent innovations in neural network architectures is Deep Sets.

In such a situation, the invariance property we can exploit is permutation invariance.

To give a short, intuitive explanation for permutation invariance, this is what a permutation invariant function with three inputs would look like: $f(a, b, c) = f(a, c, b) = f(b, a, c) = \dots$.
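A tiny sum-decomposition sketch makes the invariance concrete; the weights and shapes here are arbitrary toy choices of mine, not the Deep Sets reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(1, 8))    # toy per-element encoder weights
W_rho = rng.normal(size=(8, 1))    # toy readout weights

def phi(x):
    # Embed each set element independently of the others.
    return np.tanh(x * W_phi)

def f(xs):
    # Sum-decomposition: rho(sum_i phi(x_i)) is permutation invariant
    # by construction, because summation ignores input order.
    pooled = sum(phi(x) for x in xs)
    return (pooled @ W_rho).item()

# f([a, b, c]) == f([c, a, b]) for any reordering of the same elements
```

The invariance comes entirely from the pooling operation; any learnable `phi` and `rho` preserve it.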

The Deep Sets Architecture (Sum-Decomposition). Having established that there is a need for permutat…

1 year ago @ inference.vc
Causal Inference 3: Counterfactuals

You hopefully know enough about causal inference by now to know that $p(🎓\vert 🧔=0)$ is certainly not the quantity we seek.

Counterfactual queries. To finally explain counterfactuals, I have to step beyond causal graphs and introduce another concept: structural equation models.

Structural Equation Models. A causal graph encodes which variables have a direct causal effect on any given node; we call these the causal parents of the node.

$f_1$ computes $x$ from its causal parent $u$, and $f_2$ computes $a$ from its causal parents $x$ and $v$.

The structural equation model (SEM) entails the causal graph, in that you can reconstruct the causal graph by looking at the inputs of each function.
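The abduction-action-prediction recipe for answering counterfactual queries can be sketched on a toy SEM in the same notation; the concrete mechanisms below are invented for illustration and chosen to be invertible in the noise, which makes the abduction step trivial:

```python
# Toy SEM in the post's notation: f1 computes x from u, f2 computes a from x and v.
def f1(u):
    return 2.0 * u

def f2(x, v):
    return x + v

def counterfactual_a(x_obs, a_obs, x_new):
    # Abduction: recover the exogenous variable v consistent with the observation.
    v = a_obs - x_obs
    # Action + prediction: rerun the mechanism with x forced to x_new.
    return f2(x_new, v)

# Having observed (x=2, a=5), the exogenous v must have been 3, so the
# counterfactual "what would a have been had x been 10?" evaluates to 13.
```

The key point is that the SEM, unlike the bare causal graph, pins down the functional mechanisms, and that is what lets us hold the exogenous noise fixed while intervening.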

1 year, 1 month ago @ inference.vc
Causal Inference 2: Illustrating Interventions via a Toy Example

Consequently, the joint distribution of data alone is insufficient to predict behaviour under interventions.

Finally, you can use various causal discovery techniques to try to identify the causal diagram from the data itself.

Theoretically, recovering the full causal graph from the data is impossible in the general case.

Summary. We have seen that modeling the joint distribution can only get you so far: if you want to predict the effect of interventions, i.e. calculate $p(y\vert do(x))$-like quantities, you have to add a causal graph to your analysis.
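A quick Monte Carlo sketch of the gap between $p(y\vert x)$ and $p(y\vert do(x))$, using an invented toy SEM with a confounder $z$ (all numbers here are my own illustration):

```python
import random

random.seed(1)

def sample(do_x=None):
    # Toy SEM: a confounder z causes both x and y; x has no effect on y.
    z = random.random() < 0.5
    x = (random.random() < (0.8 if z else 0.2)) if do_x is None else do_x
    y = random.random() < (0.9 if z else 0.1)
    return x, y

n = 100_000
obs = [y for x, y in (sample() for _ in range(n)) if x]
intv = [y for _, y in (sample(do_x=True) for _ in range(n))]

p_y_given_x = sum(obs) / len(obs)        # ~0.74: inflated by the confounder
p_y_given_do_x = sum(intv) / len(intv)   # ~0.50: the true causal effect
```

Conditioning on $x{=}1$ selects samples where $z$ is likely true, so the observational estimate overstates the (here nonexistent) effect of $x$ on $y$; intervening breaks the $z \to x$ arrow.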

1 year, 1 month ago @ inference.vc
Online Bayesian Deep Learning in Production at Tencent

These applications include active learning, reinforcement learning and online/continual learning.

So when I recently read a paper by Tencent, I was surprised to learn that an online Bayesian deep learning algorithm is apparently deployed in production to power click-through-rate prediction in their ad system.

Assumed Density Filtering. The method relies on the approximate Bayesian online-learning technique often referred to as assumed density filtering.

Forward propagation: in Bayesian deep learning, we maintain a distribution $q(w)$ over neural network weights, and each value $w$ defines a conditional probability $p(y\vert x, w)$.
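To make the idea concrete, here is a one-weight toy sketch of assumed density filtering, where the Gaussian update happens to be exact; in a real Bayesian deep network the post-update posterior must be projected back to a Gaussian. All names and numbers are illustrative:

```python
# Toy assumed density filtering: keep q(w) Gaussian while observing a stream
# of data y = w * x + noise under likelihood N(y | w * x, noise_var).
def adf_update(mu, var, x, y, noise_var=0.25):
    # Conjugate Gaussian update of the posterior over the single weight w.
    precision = 1.0 / var + x * x / noise_var
    new_var = 1.0 / precision
    new_mu = new_var * (mu / var + x * y / noise_var)
    return new_mu, new_var

mu, var = 0.0, 10.0          # prior q(w)
for x, y in [(1.0, 2.1), (2.0, 3.9), (0.5, 1.0)]:
    mu, var = adf_update(mu, var, x, y)
# mu moves toward the data-generating slope (~2) and var shrinks with each update
```

The appeal for online systems such as ad click-through-rate prediction is that each data point is processed once and then discarded, with all accumulated information living in the compact $q(w)$.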


1 year, 3 months ago @ inference.vc
👻Halloween Special: Critical reviews of the worst NIPS 2018 papers.

posts on machine learning, statistics, opinions on things I'm reading in the space

1 year, 3 months ago @ inference.vc
The Blessings of Multiple Causes: Causal Inference when you Can't Measure Confounders

September 7, 2018. Happy back-to-school time, everyone!

In this case, the size of the kidney stone is a confounder variable.

Let's look at how this differs from the non-causal association you would measure between treatment and outcome (i.e.

there may be confounders, but all confounders causally influence at least two of the cause variables.

It identifies just enough about the causal structure (the substitute confounder variable) to then be able to make causal inferences of a certain type.

1 year, 5 months ago @ inference.vc
The Spectator
last post 3 months, 1 week ago
Machinery of Grace

The machinery of grace is always simple.

The machines I'm thinking of are machines with intelligence, machines that learn.

Dialogues that lead to co-design and inclusion in the mission of developing intelligent machines with grace.

Firstly, to celebrate our progress in machine learning, but a celebration that must now be balanced by a new critical practice.

If we are successful in making global AI truly global, and I believe we can be, we set ourselves on the path to realising that intelligent machinery of grace.

3 months, 1 week ago @ blog.shakirm.com
A New Consciousness of Inclusion in Machine Learning

On LGBT Freedoms and our Support for Machine Learning in Africa
This is an exploration of my thinking and my personal views.

The choice of these host countries has fomented concerns throughout our machine learning community: how can we as a community committed to inclusion in every form consider hosting our conferences in countries like these that are far from inclusive?

A politics of location, and an ethics of inclusion is growing healthily within our machine learning community.

But I too am an out and proud gay machine learning scientist.

My hope is that we will always continue to experiment with the ways in which we organise and support our global machine learning community.

8 months, 2 weeks ago @ blog.shakirm.com
Racialised Lives and the Life Beyond

The Black woman is racialised, and so too is the White man, as is every person we have ever known, and so the cycle of our racialised lives lives on.

About two-and-a-half years ago, I was part of creating a new organisation called the Deep Learning Indaba, as one attempt to engage with these questions.

The grassroots are those groups within our institutions, like our LGBT resource group within DeepMind, and those outside movements, like the Deep Learning Indaba.

I see the leadership of the Deep Learning Indaba as such a collective.

But I think we show the power of political love today, in this room, with our memory, with our energy, and in the celebration of progress that has brought us her…

8 months, 3 weeks ago @ blog.shakirm.com
Talk: How Do We Support Under-represented Groups To Put Themselves Forward?

As you think of this question, consider the journey that is taken by the under-represented groups we might have in mind.

Journeys like mine are our struggle credentials.

This room is filled with struggle credentials.

Struggle credentials play too much of a role in our present.

It is the under-represented groups that must eventually be put forward.

1 year, 3 months ago @ blog.shakirm.com
Machine Learning Trick of the Day (8): Instrumental Thinking

The instrumental variables idea is conceptually simple: we introduce new observed variables z, called instrumental variables, into our model; figure 1 (right).

And this is the trick: instrumental variables are a special subset of the data we already have, but they allow us to remove the effect of confounders.
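To see why an instrument helps, here is a toy two-stage least squares sketch on simulated data. The data-generating process, coefficients, and variable names are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Unobserved confounder u affects both x and y;
# instrument z affects x but has no direct path to y.
u = rng.normal(size=n)
z = rng.normal(size=n)
x = 1.0 * z + u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # true causal effect of x is 2.0

# Naive OLS is biased upward by the confounder.
ols = np.sum(x * y) / np.sum(x * x)

# Two-stage least squares: regress x on z, then y on the fitted x.
x_hat = z * (np.sum(z * x) / np.sum(z * z))
tsls = np.sum(x_hat * y) / np.sum(x_hat * x_hat)

print(round(ols, 2), round(tsls, 2))  # OLS overshoots 2.0; 2SLS is close to 2.0
```

The first stage keeps only the part of $x$ driven by the instrument, which is independent of the confounder by construction.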

Our problem is to learn a linear value function of features $\phi(x)$ (when in state $x$), with parameters $\theta$, so that $V(x) = \theta^\top \phi(x)$.

But this probabilistic viewpoint through instrumental variables means that we can think of alternative ways of extending this view.

Like every trick in this series, the instrumental variables give us an alternative way to think about existing problems.

1 year, 4 months ago @ blog.shakirm.com
Decolonising Artificial Intelligence

The Artificial Intelligence we believe to be global is far from it.

Inevitably, a call will be made to decolonise artificial intelligence.

The call for decolonisation in artificial intelligence is yet to reach its full volume.

Kai Fu Lee, The Real Threat of Artificial Intelligence, June 2017.
We immediately recognise the colonial nature of this possible future.

The only AI that empowers and works for the benefit of humanity is a truly global AI.

1 year, 4 months ago @ blog.shakirm.com
The Price of Transformation

The price of transformation is ours to pay.

Transformation cannot be separated from my other pillars, for they require transformation to succeed.

The price of transformation cannot be paid in this way.

We must all confront the question: What is the price of transformation?

We need to convince ourselves that the price of transformation is something we are willing to pay, and that we should pay.

1 year, 5 months ago @ blog.shakirm.com
Machine Learning Trick of the Day (7): Density Ratio Trick

The same is true if we want to compare probability densities: either through a density difference or a density ratio.

Density ratios are ubiquitous in machine learning, and will be our focus.

Density Ratio Estimation
The central task in the above five statistical quantities is to efficiently compute the ratio $r(x) = \frac{p(x)}{q(x)}$.

This is where the density ratio trick or formally, density ratio estimation, enters: it tells us to construct a binary classifier that distinguishes between samples from the two distributions.

This final derivation says that the problem of density ratio estimation is equivalent to that of binary classification.
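A minimal sketch of the trick, with invented toy distributions: train a logistic classifier to distinguish samples of $p$ (label 1) from samples of $q$ (label 0); with balanced classes the classifier's odds $D(x)/(1-D(x))$ estimate $p(x)/q(x)$. For the two Gaussians below the true log-ratio is $-x + 0.5$, so the learned logit should recover it:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two 1-D Gaussians: p = N(0,1), q = N(1,1). True log p(x)/q(x) = -x + 0.5.
xp = rng.normal(0.0, 1.0, 2000)   # samples from p (label 1)
xq = rng.normal(1.0, 1.0, 2000)   # samples from q (label 0)
X = np.concatenate([xp, xq])
y = np.concatenate([np.ones(2000), np.zeros(2000)])

# Logistic classifier D(x) = sigmoid(a*x + b), fit by gradient descent.
a, b = 0.0, 0.0
for _ in range(2000):
    s = 1.0 / (1.0 + np.exp(-(a * X + b)))
    g = s - y                     # gradient of the cross-entropy w.r.t. logit
    a -= 0.1 * np.mean(g * X)
    b -= 0.1 * np.mean(g)

# With equal class sizes, p(x)/q(x) = D(x)/(1-D(x)) = exp(a*x + b),
# so the learned logit a*x + b should approximate the true log-ratio -x + 0.5.
print(round(a, 1), round(b, 1))
```

The same odds-as-ratio identity underlies the GAN discriminator interpretation mentioned in the post.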

2 years, 1 month ago @ blog.shakirm.com
Cognitive Machine Learning (2): Uncertain Thoughts

These types of thinking are secondary-levels of thinking: a thinking about thinking.

Like the primary colours, our primary thoughts are those that are the basis of our cognition.

Secondary colours use the primary colours as their basis, and similarly, secondary thoughts are thoughts about our primary thoughts.

Our memories, decisions and attitudes are amongst our primary thoughts, and for each we have secondary thoughts—metacognitive confidence assessments—that guide our behaviours.

Again, we can make such assessments in two ways: about the decisions we are still to make, a prospective decision confidence; and decisions we have already made, a retrospective decision confidence.

2 years, 11 months ago @ blog.shakirm.com
Cognitive Machine Learning (1): Learning to Explain

In what ways can machine learning systems be saved from such a fate?

Learning to ExplainExplanation has always been a core topic in machine learning and artificial intelligence.

This is a problem of relational learning, a highly active topic in machine learning.

By taking inspiration from cognitive science, and the many other computational sciences, and combining them into our machine learning efforts, we take the positive steps on our own path to the hopefully less elusive world of machine learning with explanations.

3 years ago @ blog.shakirm.com
大トロ
last post 8 months, 2 weeks ago
Weight Agnostic Neural Networks

We search for neural network architectures that can already perform various tasks even when they use random weight values.

Redirecting to weightagnostic.github.io, where the article resides.

8 months, 2 weeks ago @ blog.otoro.net
Learning Latent Dynamics for Planning from Pixels

PlaNet learns a world model from image inputs only and successfully leverages it for planning in latent space.

Redirecting to planetrl.github.io, where the article resides.

1 year ago @ blog.otoro.net
Reinforcement Learning for Improving Agent Design

Little dude rewarded for having little legs.

Redirecting to designrl.github.io, where the article resides.

1 year, 4 months ago @ blog.otoro.net
World Models Experiments

In this article I will give step-by-step instructions for reproducing the experiments in the World Models article (pdf).

For general discussion about the World Models article, there are already some good discussion threads here in the GitHub issues page of the interactive article.

World Models (pdf)
A Visual Guide to Evolution Strategies
Evolving Stable Strategies
Below is optional:
Mixture Density Networks
Mixture Density Networks with TensorFlow
Read tutorials on Variational Autoencoders if you are not familiar with them.

I use OS X for inference, but trained models using Google Cloud VMs.

You should update your git repo with these new models using git add doomrnn/tf_models/*.js…

1 year, 8 months ago @ blog.otoro.net
World Models

Can agents learn inside of their own dreams?

Redirecting to worldmodels.github.io, where the article resides.

1 year, 11 months ago @ blog.otoro.net
Evolving Stable Strategies

for i in range(solver.popsize):
    # init the agent with a solution
    agent = Agent(solutions[i])
    # rollout env with this agent
    fitlist[i] = rollout(agent, env)
# give scores results back to ES solver.

One way to convert a deterministic policy into a stochastic policy is to make part of it random.

Robot arm grasping task using a stochastic policy.

The Minitaur model in pybullet is designed to mimic the real physical Minitaur.

After making the ball smaller, CMA-ES was able to find a stochastic policy that can walk and balance the ball at the same time.
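The ask/tell loop above can be made concrete with a toy solver. Below is a minimal evolution-strategy sketch; the SimpleES class, its update rule, and the stand-in rollout function are all invented for illustration and are not the library the post uses:

```python
import numpy as np

rng = np.random.default_rng(3)

class SimpleES:
    """Toy evolution strategy exposing the ask/tell interface."""
    def __init__(self, dim, popsize=64, sigma=0.5, lr=0.3):
        self.mean = np.zeros(dim)
        self.popsize, self.sigma, self.lr = popsize, sigma, lr

    def ask(self):
        # Sample a population of candidate solutions around the mean.
        self.eps = rng.normal(size=(self.popsize, len(self.mean)))
        return self.mean + self.sigma * self.eps

    def tell(self, fitness):
        # Standardise fitness, then move the mean toward perturbations
        # that scored well (a simple NES-style update).
        f = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
        self.mean += self.lr * (f @ self.eps) / self.popsize

def rollout(solution):
    # Stand-in for an environment rollout: reward peaks at all-ones.
    return -np.sum((solution - 1.0) ** 2)

solver = SimpleES(dim=5)
for _ in range(300):
    solutions = solver.ask()
    fitlist = np.array([rollout(s) for s in solutions])
    solver.tell(fitlist)

print(np.round(solver.mean, 2))
```

After a few hundred generations the mean should sit near the optimum at all-ones; CMA-ES follows the same ask/tell pattern while also adapting a full covariance matrix.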

2 years, 3 months ago @ blog.otoro.net
A Visual Guide to Evolution Strategies

In this post I explain how evolution strategies (ES) work with the aid of a few visual examples.

OpenAI published a paper called Evolution Strategies as a Scalable Alternative to Reinforcement Learning where they showed that evolution strategies, while being less data efficient than RL, offer many benefits.

Schaffer-2D Function, Rastrigin-2D Function.
Although there are many definitions of evolution strategies, we can define an evolution strategy as an algorithm that provides the user a set of candidate solutions to evaluate a problem.

Let's visualise the scheme one more time, on the entire search process on both problems. Because CMA-ES can adapt both its mean and covariance matrix using info…

2 years, 4 months ago @ blog.otoro.net
Teaching Machines to Draw

In this work, we investigate an alternative to traditional pixel image modelling approaches, and propose a generative model for vector images.

For example, we can subtract the latent vector of an encoded pig head from the latent vector of a full pig, to arrive at a vector that represents the concept of a body.

As we saw earlier, a model trained to draw pigs can be made to draw pig-like trucks if given an input sketch of a truck.

Exploring the latent space between different objects can potentially enable creative designers to find interesting intersections and relationships between different drawings:
Exploring the latent space between cats and buses, elephants and pigs, and various owls.
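The latent-space exploration can be sketched as simple vector interpolation. The codes below are random placeholders standing in for a trained encoder's output; decoding each interpolated code would produce the in-between sketches:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical 128-dim latent codes for two encoded sketches.
z_cat = rng.normal(size=128)
z_bus = rng.normal(size=128)

def lerp(z0, z1, t):
    """Linear interpolation between two latent vectors."""
    return (1.0 - t) * z0 + t * z1

# Ten intermediate codes; decoding each would morph a cat into a bus.
path = np.stack([lerp(z_cat, z_bus, t) for t in np.linspace(0, 1, 10)])
print(path.shape)  # (10, 128)
```

The "pig body" arithmetic in the excerpt is the same idea with subtraction instead of interpolation: z_body = z_full_pig - z_pig_head.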

In …

2 years, 9 months ago @ blog.otoro.net
Recurrent Neural Network Tutorial for Artists

In particular, the experiments in the post help visualise the internals of a recurrent neural network trained to generate handwriting.

Recurrent Neural Network for Handwriting
We have pre-trained a recurrent neural network model to perform the handwriting task described in the previous section.

var x, y;
var dx, dy;
var pen;
var prev_pen;
var rnn_state;
var pdf;
var temperature = 0.65;
var screen_width = window.…

get_pdf(rnn_state); [dx, dy, pen] = Model.…

I haven’t personally used keras.js, and I found it fun to just write the handwriting model from scratch in Javascript.

3 years, 1 month ago @ blog.otoro.net
Hyper Networks

Recurrent networks can be viewed as a really deep feedforward network with identical weights at each layer (this is called weight-tying).

The more exciting work is in the second part of my paper where we apply Hypernetworks to Recurrent Networks.

Our approach is to put a small LSTM cell (called the HyperLSTM cell) inside a large LSTM cell (the main LSTM).

Unlike the Static Hypernetwork, the weight-generating embedding vectors are not kept constant, but will be dynamically generated by the HyperLSTM cell.
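A static hypernetwork can be sketched in a few lines: a small learned projection maps a weight embedding to the main layer's full weight matrix. In the HyperLSTM, the embedding is instead produced afresh at each timestep by the small LSTM. All names and shapes below are illustrative, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(5)

embed_dim, in_dim, out_dim = 4, 16, 16

z = rng.normal(size=embed_dim)                         # weight embedding
W_hyper = rng.normal(size=(embed_dim, in_dim * out_dim)) * 0.1

def generate_weights(z):
    """Hypernetwork output reshaped into the main layer's weight matrix."""
    return (z @ W_hyper).reshape(in_dim, out_dim)

W_main = generate_weights(z)         # weights are generated, not stored
x = rng.normal(size=in_dim)
h = np.tanh(x @ W_main)              # main-layer forward pass
print(W_main.shape, h.shape)  # (16, 16) (16,)
```

Note the parameter saving: the hypernetwork holds embed_dim × (in_dim × out_dim) numbers plus one embedding, and different embeddings yield different main-layer weights.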

This makes it easy to plug my research code into existing code that was designed to use the vanilla LSTM cell.

3 years, 5 months ago @ blog.otoro.net
Generating Large Images from Latent Vectors - Part Two

Random Gaussian latent vectors were generated from numpy.random and fed into the generative network to obtain these images.

Our generator can produce large random images of digits using random Gaussian vectors as input.

Unlike the previous model though, the generated images do not necessarily have to look exactly like the set of training images.

All the generator has to do is to create a set of new images that share the same classification labels of the set of training images.

Description of Generator NetworkThe generator used in the previous model uses 4 large layers of 128 nodes that are fully connected.

3 years, 8 months ago @ blog.otoro.net
Neural Network Evolution Playground with Backprop NEAT

This demo will attempt to use a genetic algorithm to produce efficient, but atypical neural network structures to classify datasets borrowed from TensorFlow Playground.

People started experimenting with different neural network configurations, such as how many neural network layers are actually needed to fit a certain data set, or what initial features should be used for another data set.

In addition to weight-search, Deep Learning research has also produced many powerful neural network architectures that are important building blocks.

Evolving Neural Network TopologyNeuroevolution of Augmenting Topologies (NEAT) is a method that can evolve new types of neural networks based on genetic algo…

3 years, 9 months ago @ blog.otoro.net
Interactive Abstract Pattern Generation Javascript Demo

Interactive Javascript Demo for Abstract Pattern Generation.

Although some code was available previously in Javascript, it wasn't general enough to use as a tool for a digital artist.

Karpathy’s recurrent.js library makes it really easy to implement highly customised neural networks in JS, and adopts a computational graph type of method similar to modern libraries.

In addition, the user is able to specify the size and depth of the generator neural network.

The depth and size of the network, and also the image resolution of the output can all be customised in the web app.

3 years, 10 months ago @ blog.otoro.net
The Frog of CIFAR 10

For example, we can set every layer in the generator to be tanh: a pure tanh generative network, built by applying H = tf.tanh(...) repeatedly for net_depth_g layers.

The picture below was generated from 8 layers of pure tanh layers, and trained on the frog class of CIFAR-10.

We subsample a slightly smaller image inside each training image, so that our training image is actually 30x30 pixels.

3 years, 10 months ago @ blog.otoro.net
Generating Large Images from Latent Vectors

In this post, we will use CPPN to generate large images from smaller images from the MNIST training set.

The concept behind GANs is to introduce a Discriminator (D) network to complement the Generator (G) network above.

We will lack the ability to generate random images, because we lack the ability to draw Z from a random distribution.

sampler = Sampler()
z = sampler.…

That is, take a random MNIST image, encode the image to a latent vector Z, and then generate back the image.
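The encode/decode round trip can be sketched with a linear stand-in for the model (PCA directions as a toy encoder/decoder; this is not the post's actual network, and the data here are random placeholders for flattened images):

```python
import numpy as np

rng = np.random.default_rng(6)

X = rng.normal(size=(200, 64))          # pretend these are flattened images
U, S, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
W = Vt[:8].T                            # 8-dim latent basis (top PCA directions)

def encode(x):
    return (x - X.mean(0)) @ W          # image -> latent vector z

def decode(z):
    return z @ W.T + X.mean(0)          # latent vector z -> reconstructed image

z = encode(X[0])
x_rec = decode(z)
print(z.shape, x_rec.shape)  # (8,) (64,)
```

The round trip projects each input onto the latent basis and back, which is exactly the encode-then-generate loop the excerpt describes.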

3 years, 11 months ago @ blog.otoro.net
The Unofficial Google Data Science Blog
last post 2 months, 3 weeks ago
Humans-in-the-loop forecasting: integrating data science and business planning

Figure 1: A Google data centerAs an example, consider Google’s forecasting and planning for data center capacity.

In particular, the data scientist must take responsibility for stakeholders approving the “best” forecast from all available information sources.

It required investments from our data science team to re-think our statistical forecasting approach to make it easier to compare against customer forecasts.

It also owns Google's internal time series forecasting platform described in an earlier blog post.

But looking through the blogosphere, some go further and posit that “platformization” of forecasting and “forecasting as a service” can turn anyone into a data scientist at the push …

2 months, 3 weeks ago @ unofficialgoogledatascience.com
Estimating the prevalence of rare events — theory and practice

$$S(v_1) = S(v_2) \implies \frac{q(v_1)}{p(v_1)} = \frac{q(v_2)}{p(v_2)}$$
The ratio between the importance distribution and target distribution is thus a function of $S(v)$:
$$\frac{q(v)}{p(v)} = \frac{\tilde{q}(S(v))}{\tilde{p}(S(v))}$$
where $\tilde{p}$ and $\tilde{q}$ are PMFs of $S(v)$ under the target distribution and importance distribution respectively.

In our case when the events are rare and the probability of high conditional prevalence rate is small under the target distribution, the difference between the methods is minor.

We also discuss how to choose $q$ with respect to the conditional prevalence rate $g(S(v))=\mathbb{E}_p\left[f(V)|S(V)=S(v)\right]$.
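A toy version of importance sampling for a rare event (the distributions here are invented 1-D stand-ins, not the paper's setup): sample from a shifted importance distribution $q$ under which the event is common, and reweight by $p/q$:

```python
import numpy as np

rng = np.random.default_rng(7)

# Rare event under the target p = N(0,1): V > 3 (true prevalence ~ 0.00135).
# Importance distribution q = N(3,1) makes the event common.
n = 20000
v = rng.normal(3.0, 1.0, n)                 # samples from q

# Importance weights p(v)/q(v); the normalising constants cancel.
log_w = (-0.5 * v**2) - (-0.5 * (v - 3.0)**2)
w = np.exp(log_w)

f = (v > 3.0).astype(float)                 # event indicator
estimate = np.mean(f * w)                   # importance-sampling estimate
print(round(estimate, 5))
```

Roughly half the samples now hit the event, so the estimator's variance is far below that of naive Monte Carlo from $p$ at the same sample size.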

Conclusion In this post, we…

6 months ago @ unofficialgoogledatascience.com
Misadventures in experiments for growth

In summary, classic experimentation is applicable to fledgling products but in a much more limited way than to established products.

For our music example, we imagined that EDM users don't approximate the target population for some experiments.

The behavior of this single user appears in our data as a large number of impressions with conversions.

A word on growth hackingOf particular concern in growth hacking is the focus on influencers for pushing growth.

10 months, 2 weeks ago @ unofficialgoogledatascience.com
Crawling the internet: data science within a large engineering system

When queries arrive, the search system matches the inferred meaning of the query to web pages on the basis of these snapshots.

This measure of web page value is on a meaningful linear scale, such that our freshness metric (a weighted average) has an intuitive interpretation.

A global constraint of how much compute and network resources Google itself is willing to dedicate to crawling web pages.

In some regimes (and in practice for Google Search), a greedy algorithm would devote more recrawl resources towards high-value pages, as lower-value pages would commonly starve.

We can use this function to sort the web pages, and then determine which web pages should be scheduled for immediate crawl.
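The sorting step can be sketched in a few lines; the page names, value scores, and budget below are invented placeholders:

```python
# Greedy crawl scheduling sketch: rank pages by a value function and
# crawl the top pages allowed by a global resource budget.
pages = [("home", 9.0), ("news", 7.5), ("archive", 1.2), ("blog", 4.0)]
budget = 2  # global constraint on crawl/network resources

ranked = sorted(pages, key=lambda p: p[1], reverse=True)
to_crawl = [name for name, value in ranked[:budget]]
print(to_crawl)  # ['home', 'news']
```

Under such a greedy rule the low-value pages at the bottom of the ranking can starve, which is the behaviour the excerpt warns about.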

1 year, 7 months ago @ unofficialgoogledatascience.com
Compliance bias in mobile experiments

The differences between the distribution of users experiencing the treatment and the population are likely to be a key factor here.

Compliance Bias A central issue in this application is that users assigned treatment sometimes do not actually experience the treatment at $T_{\mathrm{measure}}$, and furthermore this set of users is not random.

Here, we can draw a direct analogy to Compliance Bias, which is primarily described in literature on the analysis of medical studies.

Propensity scoring within the treatment
Fig 5: Estimated probability of experiencing the treatment in the treatment group.

Here, we ignore any control group, and analyze the treatment group as a self-contained observationa…

1 year, 11 months ago @ unofficialgoogledatascience.com
Designing A/B tests in a collaboration network

Our model considers two aspects of network effects:
Homophily, or similarity within network: users collaborating in a network tend to behave similarly.

The network topology itself is the actual collaboration network we observe for GCP. When users are connected in a network, their treatment assignments can generate network effects through their interactions.

In other words, for the three methods of randomization (uniform random component, uniform random project, stratified random component) we simulate confidence intervals for A/A tests, i.e.

Conclusion Designing randomized experiments on a network of users is more ch…

2 years, 1 month ago @ unofficialgoogledatascience.com
Unintentional data

The Future of Data Analysis
Avalanche of questions: the role of the data scientist amid unintentional data
Is it relevant to our goals?

In the world of big, unintentional data there are many discoveries to be had which have no bearing on the organization’s goals.

Democratization of analysis: quantity has a quality all its own
Just as dealing with unintentional data shapes the role of the data scientists in their organization, it also shapes the day-to-day practice of data analysis.

Understanding the goals of the organization as well as guiding principles for extracting value from data are both critical for success in this environment. Thankfully not only have modern data analysis tools made da…

2 years, 4 months ago @ unofficialgoogledatascience.com
Fitting Bayesian structural time series with the bsts R package

When fitting bsts models that contain a regression component, extra arguments captured by ... are passed to the SpikeSlabPrior function from the BoomSpikeSlab package.

# Fit a bsts model with expected model size 1, the default.

model2 <- bsts(iclaimsNSA ~ ., state.specification = ss, niter = 1000, data = initial.claims)
# Fit a bsts model with expected model size 5, to include more coefficients.

Figure 10: Regression coefficients for the (a) plain logistic regression model and (b) time series logistic regression model under equivalent spike and slab priors.

These are a widely useful class of time series models, known in various literatures as "structural time series," "state space mod…

2 years, 7 months ago @ unofficialgoogledatascience.com
Our quest for robust time series forecasting at scale

The demand for time series forecasting at Google grew rapidly along with the company over its first decade.

That is, it called for an attempt to develop methods and tools that would facilitate accurate large-scale time series forecasting at Google.

But like our approach, Prophet aims to be an automatic, robust forecasting tool. And lastly, "forecasting" for us did not mean anomaly detection.

by ERIC TASSONE, FARZAN ROHANI
Time series forecasting enjoys a rich and luminous history, and today is an essential element of most any business operation.

2 years, 10 months ago @ unofficialgoogledatascience.com
Attributing a deep network’s prediction to its input features

For concreteness, let us consider a deep network that performs object recognition.

Deep networks have multiple layers of logic and coefficients, combined using nonlinear activation functions.

Application to other networks Our paper also includes application of integrated gradients to other networks (none of these networks were trained by us).
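For intuition, integrated gradients can be sketched on a tiny analytic function whose gradient is written by hand (a real network would use automatic differentiation; the function and inputs are invented for illustration):

```python
import numpy as np

def f(x):
    # Toy "network": f(x) = sum of squares.
    return np.sum(x ** 2)

def grad_f(x):
    # Its gradient, written analytically.
    return 2.0 * x

def integrated_gradients(x, baseline, steps=200):
    """Riemann-sum approximation of (x - x') * integral of grad f along
    the straight path from the baseline x' to the input x."""
    alphas = (np.arange(steps) + 0.5) / steps   # midpoint rule
    grads = np.mean([grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * grads

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros(3)
attr = integrated_gradients(x, baseline)
print(attr)                               # per-feature attributions
print(attr.sum(), f(x) - f(baseline))     # completeness: sums to f(x) - f(x')
```

The completeness check in the last line is the property that makes the attributions add up to the network's output change from the baseline.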

There is also work (such as this) on architecting deep networks in ways that allow us to understand the internal representations of these networks.

Overall, we hope that deep networks lose their reputation for being impenetrable black-boxes which perform black magic.

2 years, 11 months ago @ unofficialgoogledatascience.com
Causality in machine learning

An obvious attempt to fix this is to upweight randomized data in training, or even train the model solely on the randomized data.

As we observed at the start of this post, standard machine learning techniques don’t distinguish between randomized and observational data the way statistical models do.

Conclusion In this post we described how some randomized data may be applied both to check and improve the accuracy of a machine learning system trained largely on observational data.

Indeed, machine learning generally lacks the vocabulary to capture the distinction between observational data and randomized data that statistics finds crucial.

Rather, the focus of this post is on combining observa…

3 years ago @ unofficialgoogledatascience.com
Practical advice for analysis of large, complex data sets

Some people seemed to be naturally good at doing this kind of high-quality data analysis.

Process: Separate Validation, Description, and Evaluation
Validation or Initial Data Analysis: Do I believe the data is self-consistent, that the data was collected correctly, and that it represents what I think it does?

I think about exploratory data analysis as having 3 interrelated stages. By separating these phases, you can more easily reach agreement with others.

Acknowledge and count your filtering
Almost every large data analysis starts by filtering the data in various stages.

3 years, 3 months ago @ unofficialgoogledatascience.com
Statistics for Google Sheets

Introduction
Statistics for Google Sheets is an add-on for Google Sheets that brings elementary statistical analysis tools to spreadsheet users.

The goal of the Statistics app is to “democratize data science” by putting elementary statistics capabilities in the hands of anyone with a Google account.

If you look closely at the boxplots you can see that returns following down days have slightly greater variation than returns following up days.

Finally, you can use logistic regression to see how a previous day’s return affects the probability of the next day’s return being positive.

Statistics for Google Sheets gives analysts and students the tools to conduct elementary statistical analyses in …

3 years, 4 months ago @ unofficialgoogledatascience.com
Next generation tools for data science

Introduction
That MapReduce was the solution for writing data processing pipelines scalable to hundreds of terabytes (or more) is evidenced by the massive uptake.

Widely used in medicine for count data, the MH estimator and its generalizations are ubiquitous within data science at Google.

filter(lambda x: x != header)

Beam/Dataflow's sweet spot: streaming processing
Streaming processing is an increasingly important topic for data science.

3 years, 5 months ago @ unofficialgoogledatascience.com
Mind Your Units

The perils of incorrect units
Is the idea of 'minding our units' just some esoteric issue, or can this actually hurt us in practice?

How do we mind our units in analyses at Google?

The above simulation already hints at one of our approaches to incorporating the group structure in some analyses at Google.

Regardless of how you do it, do remember to mind your units.

3 years, 6 months ago @ unofficialgoogledatascience.com
Andrej Karpathy
last post 10 months ago
A Recipe for Training Neural Networks

A few weeks ago I posted a tweet on "the most common neural net mistakes", listing a few common gotchas related to training neural nets.

1) Neural net training is a leaky abstraction
It is allegedly easy to get started with training neural nets.

This is just a start when it comes to training neural nets.

As a result, (and this is reeaally difficult to over-emphasize) a “fast and furious” approach to training neural networks does not work and only leads to suffering.

focus on training loss) and then regularize it appropriately (give up some training loss to improve the validation loss).

10 months ago @ karpathy.github.io
(started posting on Medium instead)

The current state of this blog (with the last post 2 years ago) makes it look like I’ve disappeared.

I’ve certainly become less active on blogs since I’ve joined Tesla, but whenever I do get a chance to post something I have recently been defaulting to doing it on Medium because it is much faster and easier.

I still plan to come back here for longer posts if I get any time, but I’ll default to Medium for everything short-medium in length.

TLDRHave a look at my Medium blog.

2 years, 1 month ago @ karpathy.github.io
A Survival Guide to a PhD

Unlike the undergraduate guide, this one was much more difficult to write because there is significantly more variation in how one can traverse the PhD experience.

You can go one way (PhD -> anywhere else) but not the other (anywhere else -> PhD -> academia/research; it is statistically less likely).

The adviser is an extremely important person who will exercise a lot of influence over your PhD experience.

During your PhD you’ll get to acquire this sense yourself.

It’s usually a painful exercise for me to look through some of my early PhD paper drafts because they are quite terrible.

3 years, 5 months ago @ karpathy.github.io
Deep Reinforcement Learning: Pong from Pixels

This is a long overdue blog post on Reinforcement Learning (RL).

From left to right: Deep Q Learning network playing ATARI, AlphaGo, Berkeley robot stacking Legos, physically-simulated quadruped leaping over terrain.

Policy network.

For example, suppose we compute \(R_t\) for all of the 20,000 actions in the batch of 100 Pong game rollouts above.

The total number of episodes was approximately 8,000 so the algorithm played roughly 200,000 Pong games (quite a lot isn’t it!)
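The return computation the excerpt refers to (computing \(R_t\) for every action in a batch of rollouts) reduces to a backward pass over each episode's rewards; a plain-Python sketch, with the function name my own:

```python
# Discounted return: R_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
# computed in a single backward pass over one episode's rewards.
def discounted_returns(rewards, gamma=0.99):
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# In Pong the reward is 0 until a point ends, then +1 or -1; with gamma=0.5:
print(discounted_returns([0, 0, 1], gamma=0.5))  # [0.25, 0.5, 1.0]
```

Every action in a rollout is then credited (or blamed) in proportion to the discounted outcome that followed it.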

3 years, 9 months ago @ karpathy.github.io
Short Story on AI: A Cognitive Discontinuity.

Another great source of good reputation for Visceral were the large number of famous interventions carried out by autonomous Visceral agents.

The list went on and on - one month ago an autonomous Visceral agent recognized a remote drone attack.

He was running the routine software diagnostics on the Visceral agent and one of them had just failed.

The software diagnostics were only at 5% complete, and Merus knew they would take a while to run to completion.

Merus’ avatar broke the silence in the last second: “Come meet me here.” And then the connection was lost.

4 years, 3 months ago @ karpathy.github.io
What a Deep Neural Network thinks about your #selfie

In this fun experiment we’re going to do just that: We’ll take a powerful, 140-million-parameter state-of-the-art Convolutional Neural Network, feed it 2 million selfies from the internet, and train it to classify good selfies from bad ones.

what if someone posted a very good selfie but it was late at night, so perhaps not as many people saw it and it got less likes?

What makes a good #selfie ?

To take a good selfie, Do: Be female.

Also, with some relief, it seems that the best selfies do not seem to be the ones that show the most skin.

4 years, 4 months ago @ karpathy.github.io
The Unreasonable Effectiveness of Recurrent Neural Networks

A glaring limitation of Vanilla Neural Networks (and also Convolutional Networks) is that their API is too constrained: they accept a fixed-sized vector as input (e.g.

If training vanilla neural nets is optimization over functions, training recurrent nets is optimization over programs.

At the core, RNNs have a deceptively simple API: They accept an input vector x and give you an output vector y .
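That deceptively simple API (state in the object, vector in, vector out) can be sketched in a few lines of NumPy; this is a toy illustration with made-up sizes, not the post's released code:

```python
import numpy as np

class VanillaRNN:
    """Minimal recurrent step: the hidden state carries information across calls."""
    def __init__(self, input_size, hidden_size, output_size, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(0, 0.01, (hidden_size, input_size))
        self.W_hh = rng.normal(0, 0.01, (hidden_size, hidden_size))
        self.W_hy = rng.normal(0, 0.01, (output_size, hidden_size))
        self.h = np.zeros(hidden_size)

    def step(self, x):
        # update the hidden state, then read out an output vector y
        self.h = np.tanh(self.W_hh @ self.h + self.W_xh @ x)
        return self.W_hy @ self.h

rnn = VanillaRNN(input_size=4, hidden_size=8, output_size=4)
y = rnn.step(np.ones(4))  # input vector x in, output vector y out
```

Because `self.h` persists between calls, the output depends on the entire history of inputs, which is what makes the API a program rather than a fixed function.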

Fun with RNNs. All 5 example character models below were trained with the code I’m releasing on Github.

These models have about 10 million parameters, which is still on the lower end for RNN models.

4 years, 9 months ago @ karpathy.github.io
Breaking Linear Classifiers on ImageNet

speech recognition systems), and most importantly, also to simple, shallow, good old-fashioned Linear Classifiers (Softmax classifier, or Linear Support Vector Machines, etc.).

Instead, let’s fool a linear classifier, and let’s also keep with the theme of breaking models on images because they are fun to look at.

With input images of size 64x64x3 and 1000 ImageNet classes we therefore have 64x64x3x1000 = 12.3 million weights (beefy linear model!)

We can then visualize each of the learned weights by reshaping them as images: Example linear classifiers for a few ImageNet classes.

Linear classifier with lower regularization (which leads to more noisy class weights) is easier to fool (top).
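For a linear classifier, the fooling trick in this post reduces to nudging the input along the sign of a wrong class's weight vector; a toy NumPy sketch with invented sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 12))   # toy linear classifier: 3 classes, 12-dim inputs
x = rng.normal(size=12)
target = 2                     # the class we want the model to (wrongly) predict

eps = 0.5
x_adv = x + eps * np.sign(W[target])   # small step that maximally raises the target score

scores, adv_scores = W @ x, W @ x_adv
# the target score rises by exactly eps * sum(|W[target]|)
print(adv_scores[target] - scores[target] > 0)  # True
```

The increase is guaranteed because the perturbation aligns every coordinate with the target weight vector; noisier (less regularized) weights give the attacker a larger budget per pixel.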

4 years, 11 months ago @ karpathy.github.io
What I learned from competing against a ConvNet on ImageNet

The 100,000 test set images are released with the dataset, but the labels are withheld to prevent teams from overfitting on the test set.

It’s fun to note that about 4 years ago I performed a similar (but much quicker and less detailed) human classification accuracy analysis on CIFAR-10.

In total, we attribute 24 (24%) of GoogLeNet errors and 12 (16%) of human errors to this category.

We estimate that approximately 22 (21%) of GoogLeNet errors fall into this category, while none of the human errors do.

On the other hand, a large majority of human errors come from fine-grained categories and class unawareness.

5 years, 5 months ago @ karpathy.github.io
Quantifying Productivity

The tracking script currently records active window titles (at frequency of once every 2 seconds) and keystroke typing frequency.

Now, remember that we record keystrokes and window titles throughout.

Hacking Streak is a nifty feature that tries to identify contiguous hacking activity and correlates reasonably with my productivity.

In the end, ulogme shows the final breakdown of titles that occupied me on this day: The final breakdown of active window titles.

The holy grail here is still not implemented: What are the correlates of my productivity?

5 years, 6 months ago @ karpathy.github.io
Off the Convex Path
last post 4 months, 3 weeks ago
Ultra-Wide Deep Nets and Neural Tangent Kernel (NTK)

gradient flow) is equivalent to a kernel regression predictor with a deterministic kernel called neural tangent kernel (NTK).

Now we describe how training an ultra-wide fully-connected neural network leads to kernel regression with respect to the NTK.

In the large width limit, it turns out that the time-varying kernel $ker_t(\cdot,\cdot)$ is (with high probability) always close to a deterministic fixed kernel $ker_{\mathsf{NTK}}(\cdot,\cdot)$, which is the neural tangent kernel (NTK).

Now, at least we have a better understanding of a class of ultra-wide neural networks: they are captured by neural tangent kernels!

Similarly, one can try to translate other architectures like recurrent neural…

4 months, 3 weeks ago @ offconvex.org
Understanding implicit regularization in deep learning by analyzing trajectories of gradient descent

Sanjeev’s recent blog post suggested that the conventional view of optimization is insufficient for understanding deep learning, as the value of the training objective does not reliably capture generalization.

In recent years, researchers have come to realize the importance of implicit regularization induced by the choice of optimization algorithm.

This theorem disqualifies Schatten quasi-norms as the implicit regularization in deep matrix factorizations, and instead suggests that all depths correspond to nuclear norm.

Full details behind our results on “implicit regularization as norm minimi…

7 months, 2 weeks ago @ offconvex.org
Landscape Connectivity of Low Cost Solutions for Multilayer Nets

A big mystery about deep learning is how, in a highly nonconvex loss landscape, gradient descent often finds near-optimal solutions (those with training cost almost zero), even starting from a random initialization.

Solutions A and B have low cost but the line connecting them goes through solutions with high cost.

Mode Connectivity.

2019) did try to explain the phenomenon of mode connectivity in simple settings (the first of these demonstrated mode connectivity empirically for multi-layer nets).

Thus to explain mode connectivity for multilayer nets we will need to leverage some stronger property of typical solutions discovered v…

8 months, 2 weeks ago @ offconvex.org
Is Optimization a Sufficient Language for Understanding Deep Learning?

In this Deep Learning era, machine learning usually boils down to defining a suitable objective/cost function for the learning task at hand, and then optimizing this function using some variant of gradient descent (implemented via backpropagation).

I am suggesting that deep learning algorithms also have important properties that are not always reflected in the objective value.

by playing with batch sizes and learning rates) can be preferable to perfect optimization, even in simple settings such as regression.

NB: Empirically we find that Adam, the celebrated acceleration method for deep learning, speeds up optimization a…

8 months, 3 weeks ago @ offconvex.org
Contrastive Unsupervised Learning of Semantic Representations: A Theoretical Framework

Semantic representations (aka semantic embeddings) of complicated data types (e.g.

Researchers are most interested in unsupervised representation learning using unlabeled data.

samples $x, x^{+}$ from the distribution $D_{c^+}$.

The highlighted parts in the table show that the unsupervised representations compete well with the supervised representations on the average $k$-way classification task ($k=2, 10$).

We find this to be true for unsupervised representations, and surprisingly for supervised representations as well.

11 months, 1 week ago @ offconvex.org
The search for biologically plausible neural computation: A similarity-based approach

By re-ordering the variables and introducing a new variable, ${\bf W} \in \mathbb{R}^{k\times n}$, we obtain a second identity. To prove it, find the optimal ${\bf W}$ by taking a derivative of the expression on the right with respect to ${\bf W}$, setting it to zero, and then substituting the optimal ${\bf W}$ back into the expression.

The price paid for this simplification is the appearance of the minimax optimization problem in variables, ${\bf W}$ and ${\bf M}$.

Variables ${\bf W}$ and ${\bf M}$ are represented by the weights of synapses in feedforward and lateral connections respectively.

In neuroscience, learning rules (2.7) for ${\bf W}$ and ${\bf M}$ are called Hebbian and anti-Hebbian r…

1 year, 2 months ago @ offconvex.org
Understanding optimization in deep learning by analyzing trajectories of gradient descent

Neural network optimization is fundamentally non-convex, and yet simple gradient-based algorithms seem to consistently solve such problems.

Trajectory-Based Analyses for Deep Linear Neural Networks
Linear neural networks are fully-connected neural networks with linear (no) activation.

2014 were the first to carry out a trajectory-based analysis for deep (three or more layer) linear networks, treating gradient flow (gradient descent with infinitesimally small learning rate) minimizing $\ell_2$ loss over whitened data.

Specifically, we analyze trajectories of gradient descent for any linear neural network …

1 year, 3 months ago @ offconvex.org
Simple and efficient semantic embeddings for rare words, n-grams, and language features

Distributional methods for capturing meaning, such as word embeddings, often require observing many examples of words in context.

Here we describe a simple but principled approach called à la carte embeddings, described in our ACL’18 paper with Yingyu Liang, Tengyu Ma, and Brandon Stewart.

For convenience, we will let $u_w^c$ denote the average of the word embeddings of words in $c$.
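The averaging step (the context embedding $u_w^c$ as the mean of the context's word vectors) is just this; the two-dimensional toy vectors below are my own, standing in for e.g. GloVe embeddings:

```python
import numpy as np

# toy word vectors; real ones would come from a pretrained embedding table
emb = {'deep': np.array([1.0, 0.0]),
       'learning': np.array([0.0, 1.0]),
       'rocks': np.array([1.0, 1.0])}

context = ['deep', 'learning', 'rocks']
u_c = np.mean([emb[w] for w in context], axis=0)  # average context embedding
print(u_c)  # [0.66666667 0.66666667]
```

The à la carte method then applies a learned linear transform to such averages, but the average itself is the whole preprocessing step.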

We test this hypothesis by inducing embeddings for $n$-grams by using contexts from a large text corpus and word embeddings trained on the same corpus.

The à la carte code is available here, allowing you to re-create the resu…

1 year, 5 months ago @ offconvex.org
When Recurrent Models Don't Need to be Recurrent

In the last few years, deep learning practitioners have proposed a litany of different sequence models.

Feed-forward models can offer improvements in training stability and speed, while recurrent models are strictly more expressive.

At the outset, recurrent models appear to be a strictly more flexible and expressive model class than feed-forward models.

Feed-forward models make translations using only $k$ words of the sentence, whereas recurrent models can leverage the entire sentence.

Feed-forward models are limited to the past $k$ samples, whereas recurrent models can use the entire history.

1 year, 7 months ago @ offconvex.org
Deep-learning-free Text and Sentence Embedding, Part 2

This post continues Sanjeev’s post and describes further attempts to construct elementary and interpretable text embeddings.

Even better, it is much faster to compute, since it uses pretrained (GloVe) word vectors and simple linear algebra.

Note that DisC embeddings leverage classic Bag-of-n-Gram information as well as the power of word embeddings.

The new theorem follows from considering an LSTM that uses random vectors as word embeddings and computes the DisC embedding in one pass over the text.

Sample code for constructing and evaluating DisC embeddings is available, as well as solvers for recreating the sparse recovery results for wo…

1 year, 8 months ago @ offconvex.org
Machine Learning Mastery
last post 13 hours ago
How to Calibrate Probabilities for Imbalanced Classification

In this tutorial, you will discover how to calibrate predicted probabilities for imbalanced classification.

After completing this tutorial, you will know: Calibrated probabilities are required to get the most out of models for imbalanced classification problems.

Tutorial Overview
This tutorial is divided into five parts; they are: Problem of Uncalibrated Probabilities; How to Calibrate Probabilities; SVM With Calibrated Probabilities; Decision Tree With Calibrated Probabilities; Grid Search Probability Calibration with KNN.

Problem of Uncalibrated Probabilities
Many machine learning algorithms can predict a probability or a probability-like score that indicates class membership.

SVM With Calibrated P…
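A minimal sketch of probability calibration with scikit-learn, in the spirit of the tutorial; the toy dataset and parameter choices here are my own, not necessarily the tutorial's:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# imbalanced toy problem: roughly 95% majority class
X, y = make_classification(n_samples=1000, n_classes=2,
                           weights=[0.95, 0.05], random_state=4)

# SVM scores are not probabilities; wrapping the model calibrates them via CV
model = CalibratedClassifierCV(SVC(gamma='scale'), method='sigmoid', cv=3)
model.fit(X, y)
probs = model.predict_proba(X)[:, 1]  # calibrated positive-class probabilities
```

`method='sigmoid'` is Platt scaling; `method='isotonic'` is the non-parametric alternative, usually preferred only when plenty of data is available.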

13 hours ago @ machinelearningmastery.com
A Gentle Introduction to the Fbeta-Measure for Machine Learning

F-Measure = (2 * Precision * Recall) / (Precision + Recall)This is the harmonic mean of the two fractions.

Perfect Precision and Recall: p=1.000, r=1.000, f=1.000

50% Precision, Perfect Recall
It is not possible to have perfect precision and no recall, or no precision and perfect recall.

Specifically, F-measure and F1-measure calculate the same thing; for example:

F-Measure = ((1 + 1^2) * Precision * Recall) / (1^2 * Precision + Recall)
F-Measure = (2 * Precision * Recall) / (Precision + Recall)

Consider the case where we have 50 percent precision and perfect recall.

The F0.5-Measure is calculated as follows:

F0.5-Measure = ((1 + 0.5^2) * …
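The formulas above are easy to check numerically; a small sketch (the function name is mine):

```python
def fbeta(precision, recall, beta=1.0):
    """F-beta: combines precision and recall, weighting recall beta times as much."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return ((1.0 + b2) * precision * recall) / (b2 * precision + recall)

# 50% precision, perfect recall:
print(round(fbeta(0.5, 1.0, beta=1.0), 3))  # 0.667  (F1)
print(round(fbeta(0.5, 1.0, beta=0.5), 3))  # 0.556  (F0.5 penalises low precision more)
```

With beta below 1 the score leans toward precision; with beta above 1 (e.g. F2) it leans toward recall.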

2 days, 13 hours ago @ machinelearningmastery.com
How to Develop an Imbalanced Classification Model to Detect Oil Spills

# define the reference model
model = DummyClassifier(strategy='uniform')

Once the model is evaluated, we can report the mean and standard deviation of the G-mean scores directly.
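The G-mean referred to here is the geometric mean of sensitivity and specificity; a quick sketch with invented confusion-matrix counts:

```python
from math import sqrt

def g_mean(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)  # recall on the minority (positive) class
    specificity = tn / (tn + fp)  # recall on the majority (negative) class
    return sqrt(sensitivity * specificity)

print(round(g_mean(tp=8, fn=2, tn=90, fp=10), 3))  # 0.849
```

Because it is a geometric mean, a model that ignores the minority class entirely (sensitivity 0) scores 0, no matter how accurate it is on the majority class.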

# define models
models, names, results = list(), list(), list()
# LR
models.

# evaluate each model
for i in range(len(models)):
    # evaluate the model and store results
    scores = evaluate_model(X, y, models[i])
    results.

# define models
models, names, results = list(), list(), list()
# LR Balanced
models.

... # define the model
smoteenn = SMOTEENN(enn=EditedNearestNeighbours(sampling_strategy='majority'))
model = LogisticRegression(solver='liblinear')
pipeline = Pipeline(steps=…

5 days, 13 hours ago @ machinelearningmastery.com
How to Develop a Probabilistic Model of Breast Cancer Patient Survival

Tutorial Overview
This tutorial is divided into five parts; they are: Haberman Breast Cancer Survival Dataset; Explore the Dataset; Model Test and Baseline Result; Evaluate Probabilistic Models (Probabilistic Algorithm Evaluation, Model Evaluation With Scaled Inputs, Model Evaluation With Power Transforms); Make Prediction on New Data.

Haberman Breast Cancer Survival Dataset
In this project, we will use a small breast cancer survival dataset, referred to generally as the “Haberman Dataset.” The dataset describes breast cancer patient data and the outcome is patient survival.

... # evaluate the model
scores = evaluate_model(X, y, model)
# summarize performance
print('Mean BSS: %.3f (%.3f)' % (mean(scores)…
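The BSS being printed above is the Brier skill score: a model's Brier score compared against a naive reference that always predicts the base rate. A pure-Python sketch (names and numbers mine):

```python
def brier_score(probs, outcomes):
    # mean squared error between predicted probabilities and 0/1 outcomes
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes, ref_prob):
    # BSS = 1 - BS / BS_ref; positive means better than the naive reference
    ref = [ref_prob] * len(outcomes)
    return 1.0 - brier_score(probs, outcomes) / brier_score(ref, outcomes)

probs, outcomes = [0.9, 0.1, 0.8, 0.3], [1, 0, 1, 0]
print(round(brier_skill_score(probs, outcomes, ref_prob=0.5), 3))  # 0.85
```

Unlike the raw Brier score (lower is better), BSS is oriented so that higher is better, which makes it convenient for model comparison tables.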

1 week ago @ machinelearningmastery.com
Why Is Imbalanced Classification Difficult?

Tutorial Overview
This tutorial is divided into four parts; they are: Why Imbalanced Classification Is Hard; Compounding Effect of Dataset Size; Compounding Effect of Label Noise; Compounding Effect of Data Distribution.

Why Imbalanced Classification Is Hard
Imbalanced classification is defined by a dataset with a skewed class distribution.

This is referred to as cost sensitivity of misclassification errors and is a second foundational challenge of imbalanced classification.

These are general characteristics of classification predictive modeling that magnify the difficulty of the imbalanced classification task.

As such, the size of the dataset dramatically impacts the imbalanced classification task…

1 week, 2 days ago @ machinelearningmastery.com
One-Class Classification Algorithms for Imbalanced Datasets

How to adapt one-class classification algorithms for imbalanced classification with a severely skewed class distribution.

To be clear, this adaptation of one-class classification algorithms for imbalanced classification is unusual but can be effective on some problems.

# fit on majority class
trainX = trainX[trainy == 0]
model.fit(trainX)

... # calculate score
score = f1_score(testy, yhat, pos_label=-1)
print('F1 Score: %.3f' % score)

1 week, 5 days ago @ machinelearningmastery.com
Bagging and Random Forest for Imbalanced Classification

Tutorial Overview
This tutorial is divided into three parts; they are: Bagging for Imbalanced Classification (Standard Bagging, Bagging With Random Undersampling); Random Forest for Imbalanced Classification (Standard Random Forest, Random Forest With Class Weighting, Random Forest With Bootstrap Class Weighting, Random Forest With Random Undersampling); Easy Ensemble for Imbalanced Classification (Easy Ensemble).

Bagging for Imbalanced Classification
Bootstrap Aggregation, or Bagging for short, is an ensemble machine learning algorithm.

Mean ROC AUC: 0.871

Random Forest for Imbalanced Classification
Random forest is another ensemble of …
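A minimal sketch of the class-weighting variant with scikit-learn; the toy dataset and parameter values are my own choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# roughly 99:1 imbalanced toy dataset
X, y = make_classification(n_samples=1000, weights=[0.99, 0.01], random_state=3)

# class_weight='balanced' reweights split criteria by inverse class frequency
model = RandomForestClassifier(n_estimators=10, class_weight='balanced',
                               random_state=3)
model.fit(X, y)
preds = model.predict(X)
```

`class_weight='balanced_subsample'` applies the same idea per bootstrap sample, which is often a closer match to how bagging works.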

2 weeks ago @ machinelearningmastery.com
A Gentle Introduction to Threshold-Moving for Imbalanced Classification

# fit a model
model = LogisticRegression(solver='lbfgs')
model.

# predict probabilities
yhat = model.

... # define thresholds
thresholds = arange(0, 1, 0.001)

# apply threshold to positive probabilities to create labels
def to_labels(pos_probs, threshold):
    return (pos_probs >= threshold).astype('int')

fit(trainX, trainy)
# predict probabilities
yhat = model.
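The whole threshold sweep can be put together in self-contained plain Python; the toy probabilities and the small F1 helper below are mine, standing in for scikit-learn's `f1_score`:

```python
def to_labels(pos_probs, threshold):
    # apply threshold to positive-class probabilities to create crisp labels
    return [1 if p >= threshold else 0 for p in pos_probs]

def f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

y_true = [0, 0, 0, 1, 1]
pos_probs = [0.1, 0.4, 0.35, 0.8, 0.45]

# sweep candidate thresholds and keep the best-scoring one
thresholds = [i / 100 for i in range(100)]
scores = [f1(y_true, to_labels(pos_probs, t)) for t in thresholds]
best = thresholds[scores.index(max(scores))]
print(best)  # 0.41
```

The point of threshold-moving is exactly this: the default cutoff of 0.5 is rarely optimal on imbalanced data, and the best threshold is found by search, not assumed.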

2 weeks, 2 days ago @ machinelearningmastery.com
Cost-Sensitive Learning for Imbalanced Classification

Cost-sensitive learning is a subfield of machine learning that involves explicitly defining and using costs when training machine learning algorithms.

Tutorial Overview
This tutorial is divided into four parts; they are: Not All Classification Errors Are Equal; Cost-Sensitive Learning; Cost-Sensitive Imbalanced Classification; Cost-Sensitive Methods.

Not All Classification Errors Are Equal
Classification is a predictive modeling problem that involves predicting the class label for an observation.

This field is generally referred to as Cost-Sensitive Machine Learning, or more simply Cost-Sensitive Learning.

This means that although some methods from cost-sensitive learning can be helpful on imbalanc…
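The core idea, unequal misclassification costs, can be made concrete with a cost matrix; the costs below are invented purely for illustration:

```python
# cost[true_class][predicted_class]; here a false negative (missing a positive)
# is five times more costly than a false positive
cost = {0: {0: 0.0, 1: 1.0},
        1: {0: 5.0, 1: 0.0}}

def total_cost(y_true, y_pred):
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))

y_true = [0, 0, 1, 1]
print(total_cost(y_true, [0, 1, 0, 1]))  # 1 FP + 1 FN = 1.0 + 5.0 = 6.0
print(total_cost(y_true, [0, 0, 1, 1]))  # perfect predictions cost 0.0
```

Cost-sensitive methods then minimise expected cost rather than error count, which is why they pair naturally with imbalanced problems where the rare class is the expensive one.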

2 weeks, 5 days ago @ machinelearningmastery.com
How to Configure XGBoost for Imbalanced Classification

This modified version of XGBoost is referred to as Class Weighted XGBoost or Cost-Sensitive XGBoost and can offer better performance on binary classification problems with a severe class imbalance.

Tutorial Overview
This tutorial is divided into four parts; they are: Imbalanced Classification Dataset; XGBoost Model for Classification; Weighted XGBoost for Class Imbalance; Tune the Class Weighting Hyperparameter.

Imbalanced Classification Dataset
Before we dive into XGBoost for imbalanced classification, let’s first define an imbalanced classification dataset.

Before any modification or tuning is made to the XGBoost algorithm for imbalanced classification, it is important to test the default XGBoost…
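A common heuristic for XGBoost's `scale_pos_weight` parameter is the ratio of negative to positive examples; the computation can be sketched without the library itself, and the candidate grid below is my own illustration:

```python
from collections import Counter

y = [0] * 990 + [1] * 10  # 99:1 imbalanced labels
counts = Counter(y)

# heuristic starting point: total negatives / total positives
scale_pos_weight = counts[0] / counts[1]
print(scale_pos_weight)  # 99.0

# values one might then grid-search around the heuristic
candidates = [1, 10, 25, 50, 75, 99, 100, 1000]
```

The resulting value scales the gradient contribution of positive examples, which is the cost-sensitive modification the tutorial tunes.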

3 weeks ago @ machinelearningmastery.com
How to Develop a Cost-Sensitive Neural Network for Imbalanced Classification

After completing this tutorial, you will know: How the standard neural network algorithm does not support imbalanced classification.

Tutorial Overview
This tutorial is divided into four parts; they are: Imbalanced Classification Dataset; Neural Network Model in Keras; Deep Learning for Imbalanced Classification; Weighted Neural Network With Keras.

Imbalanced Classification Dataset
Before we dive into the modification of neural networks for imbalanced classification, let’s first define an imbalanced classification dataset.

Neural Network Model in Keras
Next, we can fit a standard neural network model on the dataset.

# define the neural network model
def define_model(n_inp…
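The cost-sensitive modification amounts to passing per-class weights to training (for a Keras model, via `fit(..., class_weight=...)`). The inverse-frequency weighting can be computed in plain Python; the toy labels below are mine:

```python
from collections import Counter

y = [0] * 990 + [1] * 10
n, counts = len(y), Counter(y)

# weight each class by n_samples / (n_classes * n_samples_in_class),
# so errors on the rare class contribute more to the loss
class_weight = {c: n / (2 * counts[c]) for c in counts}
print(class_weight[1])  # 50.0
```

The resulting dictionary gives the minority class a weight roughly 99 times larger than the majority class, mirroring the 99:1 imbalance.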

3 weeks, 3 days ago @ machinelearningmastery.com
Cost-Sensitive SVM for Imbalanced Classification

Tutorial Overview
This tutorial is divided into four parts; they are: Imbalanced Classification Dataset; SVM for Imbalanced Classification; Weighted SVM With Scikit-Learn; Grid Search Weighted SVM.

Imbalanced Classification Dataset
Before we dive into the modification of SVM for imbalanced classification, let’s first define an imbalanced classification dataset.

# define model
model = SVC(gamma='scale')

We will use repeated cross-validation to evaluate the model, with three repeats of 10-fold cross-validation.

SVMs are effective models for binary classification tasks, although by default, they are not effective at imbalanced classification.

This modification of SVM may be referred to as Weighted…
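A minimal weighted-SVM sketch with scikit-learn; the toy data is mine, and the tutorial may use explicit weight dictionaries rather than `'balanced'`:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, weights=[0.95, 0.05], random_state=2)

# class_weight scales the penalty C per class; 'balanced' uses
# inverse class frequency, penalising minority-class mistakes more
model = SVC(gamma='scale', class_weight='balanced')
model.fit(X, y)
```

The effect is to push the decision boundary away from the minority class, trading some majority-class accuracy for minority-class recall.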

3 weeks, 5 days ago @ machinelearningmastery.com
Cost-Sensitive Decision Trees for Imbalanced Classification

Tutorial Overview
This tutorial is divided into four parts; they are: Imbalanced Classification Dataset; Decision Trees for Imbalanced Classification; Weighted Decision Trees With Scikit-Learn; Grid Search Weighted Decision Trees.

Imbalanced Classification Dataset
Before we dive into the modification of decision trees for imbalanced classification, let’s first define an imbalanced classification dataset.

# define model
model = DecisionTreeClassifier()

We will use repeated cross-validation to evaluate the model, with three repeats of 10-fold cross-validation.

As such, this modification of the decision tree algorithm is referred to as a weighted decision tree, a class-weighted decision tree, or a cost-sen…

4 weeks ago @ machinelearningmastery.com
💼 University and corporation labs
DeepMind
last post 2 weeks, 2 days ago
A new model and dataset for long-range memory

Modelling natural language
Finding machine learning tasks which both drive the development of better memory architectures and push us further towards artificial general intelligence is challenging.

Transferring knowledge
Such samples would likely astound Shannon, 70 years on from his early language model experiments.

Google’s prominent natural language model, BERT, achieves state-of-the-art performance on a wide array of NLP benchmarks, and is now a part of Google Search.

Benchmarking language models
A popular long-range language model benchmark is WikiText-103, which comprises English-language Wikipedia articles and was developed by researchers at Salesforce AI.

As such, we’ve compiled…

2 weeks, 2 days ago @ deepmind.com
AlphaFold: Using AI for scientific discovery

In our study published today in Nature, we demonstrate how artificial intelligence research can drive and accelerate new scientific discoveries.

Our system, AlphaFold – described in peer-reviewed papers now published in Nature and PROTEINS – is the culmination of several years of work, and builds on decades of prior research using large genomic datasets to predict protein structure.

What is the protein folding problem?

What any given protein can do depends on its unique 3D structure.

Why is protein folding important?

1 month, 1 week ago @ deepmind.com
Dopamine and temporal difference learning: A fruitful relationship between neuroscience and AI

Meanwhile, in close contact with this study of reward learning in animals, computer scientists have developed algorithms for reinforcement learning in artificial systems.

A chain of prediction: temporal difference learning
Reinforcement learning is one of the oldest and most powerful ideas linking neuroscience and AI.

An important breakthrough in solving the problem of reward prediction was the temporal difference learning (TD) algorithm.
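Temporal difference learning updates a value estimate toward a bootstrapped target, and the update's error term is the "reward prediction error" later linked to dopamine. A tabular TD(0) sketch, with states and numbers invented:

```python
# TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    td_error = r + gamma * V[s_next] - V[s]   # the reward prediction error
    V[s] += alpha * td_error
    return td_error

V = {'cue': 0.0, 'reward_soon': 1.0}
err = td0_update(V, 'cue', r=0.0, s_next='reward_soon')
print(round(err, 3), round(V['cue'], 3))  # 0.9 0.09
```

Note the key property: the cue state's value rises even though no reward arrived yet, because the prediction at the next state stands in for the eventual reward.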

Around the same time, in the late 80s and early 90s, neuroscientists were struggling to understand the behaviour of dopamine neurons.

Distributional reinforcement learning

1 month, 1 week ago @ deepmind.com
Using WaveNet technology to reunite speech-impaired users with their original voices

This post details a recent project we undertook with Google and ALS campaigner Tim Shaw, as part of Google’s Euphonia project.

We demonstrate an early proof of concept of how text-to-speech technologies can synthesise a high-quality, natural sounding voice using minimal recorded speech data.

But message banking lacks flexibility, resulting in a static dataset of phrases.

Now imagine that you were given the chance to preserve your voice by recording as much of it as possible.

And people who aren’t able to record phrases in time are left to choose a generic computer synthesized voice that lacks the same power of connection as their own.

2 months, 1 week ago @ deepmind.com
Learning human objectives by evaluating hypothetical behaviours

TL;DR: We present a method for training reinforcement learning agents from human feedback in the presence of unknown unsafe states.

Training RL agents in the presence of unsafe states is known as the safe exploration problem.

The agent has one source of information: feedback about unsafe states from a human user.

Existing methods for training agents from human feedback ask the user to evaluate data of the agent acting in the environment.

The user provides feedback on this hypothetical behaviour, and the system interactively learns a model of the user's reward function.

2 months, 2 weeks ago @ deepmind.com
From unlikely start-up to major scientific organisation: Entering our tenth year at DeepMind

Pioneering research, growing impact
A mission this ambitious requires pioneering research on many fronts over many years.

As our research matures, we’ve been finding more opportunities to partner with others for social and commercial impact, often with our colleagues across Alphabet.

Entering our next phase: As I discussed with Wired in the summer, this year feels like the start of a new phase for DeepMind as an established scientific organisation.

Over the past year, we’ve also been formalising a leadership team with the seasoned experience and skills for our second decade.

Right back to our origins blending neuroscience with machine learning, we’ve found that breakthroughs happen faster when…

2 months, 3 weeks ago @ deepmind.com
Strengthening the AI community

For me, it was being awarded an internship at Intel, the first one ever through Purdue’s Co-Op Engineering program in 1990.

I just didn’t know if I had the right technical skills for the work, or if engineering was really my path.

It grew into a very successful 18-year career at Intel and a 25-year career in tech.

At DeepMind we want to build advanced AI to expand our knowledge and find answers to some of the fundamental questions facing society.

DeepMind Scholarships to open the field of AI: The DeepMind scholarship programme is one way we seek to broaden participation in science and AI.

3 months, 1 week ago @ deepmind.com
Advanced machine learning helps Play Store users discover personalised apps

Candidate generator unbiasing: Our model (called a candidate generator) learns what apps a user is more likely to install based on previous apps they’ve installed from the Play store.

The model therefore learns a bias that favours the apps that are shown – and thus installed – more often.

An importance weight is based on the impression-to-install rate of each individual app in comparison with the median impression-to-install rate across the Play store.

Through importance weighting, our candidate generator can downweight or upweight apps based on their install rates, which mitigates the recommendation bias problem.
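
The importance-weighting step described here can be sketched in a few lines. This is a hypothetical illustration, not Play-store code: the function name, app names, and counts are invented, and the weight is simply each app’s impression-to-install rate divided by the store-wide median rate.

```python
# Hypothetical sketch of importance weighting for a candidate generator:
# each app's weight is its impression-to-install rate relative to the
# median rate, downweighting or upweighting apps accordingly.
import statistics

def importance_weights(install_counts, impression_counts):
    """Return per-app weight = app install rate / median install rate."""
    rates = {app: install_counts[app] / impression_counts[app]
             for app in install_counts}
    median_rate = statistics.median(rates.values())
    return {app: rate / median_rate for app, rate in rates.items()}

# Invented example: three apps, each shown 100 times.
weights = importance_weights(
    install_counts={"app_a": 90, "app_b": 30, "app_c": 10},
    impression_counts={"app_a": 100, "app_b": 100, "app_c": 100},
)
# app_a (installed often) gets weight 3.0; app_c (rarely) gets ~0.33.
```

A loss weighted this way counteracts the exposure bias: apps that convert well despite few impressions are no longer drowned out by frequently shown ones.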

Our solution to this, the reranker model, learns the relative importance of a p…

3 months, 1 week ago @ deepmind.com
AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning

Since then, we have taken on a much greater challenge: playing the full game at a Grandmaster level under professionally approved conditions.

AlphaStar can now play in one-on-one matches as and against Protoss, Terran, and Zerg – the three races present in StarCraft II.

Each of the Protoss, Terran, and Zerg agents is a single neural network.

We chose to use general-purpose machine learning techniques – including neural networks, self-play via reinforcement learning, multi-agent learning, and imitation learning – to learn directly from game data.

Using the advances described in our Nature paper, AlphaStar was ranked above 99.8% of active players on Battle.net…

3 months, 4 weeks ago @ deepmind.com
Causal Bayesian Networks: A flexible tool to enable fairer machine learning

This simplified example shows how CBNs can provide us with a visual framework for describing different possible unfairness scenarios.

It is nevertheless necessary to avoid pitfalls when evaluating or designing a decision system.

This means that it would be possible for the system to be deemed fair, even if it carries the unfair influence: this would automatically be the case for an error-free decision system.

On the other hand, if the path G→D→A was considered fair, it would be inappropriate to use statistical parity.

Path-specific techniques enable us to estimate the influence that a sensitive attribute has on other variables along specific sets of causal paths.

4 months, 3 weeks ago @ deepmind.com
DeepMind’s health team joins Google Health

Today, with our healthcare partners, the team is excited to officially join the Google Health family.

It’s remarkable that many frontline clinicians, even in the world’s most advanced hospitals, are still reliant on clunky desktop systems and pagers that make delivering fast and safe patient care challenging.

That’s why I joined DeepMind, and why I will continue this work with Google Health.

We’ve already seen how our mobile medical assistant for clinicians is helping patients and the clinicians looking after them, and we are looking forward to continuing our partnerships with The Royal Free London NHS Foundation Trust, Imperial College Healthcare NHS Trust and Taunton and Somerset NHS…

5 months, 1 week ago @ deepmind.com
Episode 8: Demis Hassabis - The interview

Find out more about the themes in this episode: If you know of other resources we should link to, please help other listeners by either replying to us on Twitter (#DMpodcast) or emailing us at podcast@deepmind.com.

You can also use that address to send us questions or feedback on the series.

Credits: Presenter: Hannah Fry. Editor: David Prest. Senior Producer: Louisa Field. Producers: Amy Racs, Dan Hardoon. Binaural Sound: Lucinda Mason-Brown. Music composition: Eleni Shaw (with help from Sander Dieleman and WaveNet).

5 months, 1 week ago @ deepmind.com
Episode 7: Towards the future

AI researchers around the world are trying to create a general purpose learning system that can learn to solve a broad range of problems without being taught how.

Koray Kavukcuoglu, DeepMind’s Director of Research, describes the journey to get there, and takes Hannah on a whistle-stop tour of DeepMind’s HQ and its research.

Interviewees: Koray Kavukcuoglu, Director of Research; Trevor Back, Product Manager for DeepMind’s science research; research scientists Raia Hadsell and Murray Shanahan; and DeepMind CEO and co-founder, Demis Hassabis.

5 months, 2 weeks ago @ deepmind.com
Replay in biological and artificial neural networks

The imagination theory makes a different prediction about how replay will look: when you rest on the couch, your brain should replay the sequence "dog, vase, water".

As in previous experiments, fast replay sequences of the objects were evident in the brain recordings.

However, the sequences did not play out in the experienced order (i.e., the scrambled order: spilled water → vase → dog).

And to our surprise, during rest they played out in fast sequences that were precisely coordinated with the spontaneous replay sequences mentioned above.

For example, during a dog, vase, water replay sequence, the representation of "water" was preceded by the codes for "home sequence" and "spilled liquid".

5 months, 3 weeks ago @ deepmind.com
Episode 6: AI for everyone

While there is a lot of excitement about AI research, there are also concerns about the way it might be implemented, used and abused.

In this episode Hannah investigates the more human side of the technology, some ethical issues around how it is developed and used, and the efforts to create a future of AI that works for everyone.

Interviewees: Verity Harding, Co-Lead of DeepMind Ethics and Society; DeepMind’s COO Lila Ibrahim, and research scientists William Isaac and Silvia Chiappa.

5 months, 3 weeks ago @ deepmind.com
Facebook
last post 1 day, 14 hours ago
Facebook supports research on misinformation and polarization with $2 million commitment

Our partnerships with outside experts are critical in addressing and understanding social challenges on communication platforms.

“To advance our understanding of how technology impacts people and society, we’ve strengthened our commitment to conducting social science research in partnership with academics globally,” says Umer Farooq, who leads the community integrity research team at Facebook.

With this in mind, we are launching the 2020 Foundational Integrity Research: Misinformation and Polarization request for proposals.

We will award $2 million in unrestricted gifts to support independent social science research on misinformation and polarization related to social communication technol…

1 day, 14 hours ago @ research.fb.com
Summaries of the Content Policy Research Initiative workshops in Dar es Salaam and Rome

The Content Policy Research Initiative (CPRI) was launched last year with the goal of enhancing engagement with the research community around how we develop and enforce our Community Standards.

The two most recent workshops took place in December 2019, in Dar es Salaam, Tanzania, and Rome, Italy.

CPRI workshop in Dar es SalaamAt the CPRI workshop in Dar es Salaam, Facebook hosted 18 external researchers from around East Africa.

This came up with respect to hate speech, polarization, and the likelihood that offline harm could result from online content.

5 days, 13 hours ago @ research.fb.com
Announcing the winners of the Content Governance request for proposals

It has also sparked a wider conversation on a variety of issues, including online speech, law and technology, digital constitutionalism, multi-stakeholderism, content moderation, content governance, journalism, applied ethics, free expression, digital rights, human rights, tech policy, and other related fields of study.

This ongoing exchange of ideas will influence how Facebook develops its strategies and plans for content governance going forward.

Recognizing the importance of this ongoing conversation, Facebook launched a request for proposals aimed at funding research and advocacy work in the area of online content.

Each proposal was reviewed by a multidisciplinary team at Facebook that …

1 week, 1 day ago @ research.fb.com
New privacy-protected Facebook data for independent research on social media’s impact on democracy

In 2018, Facebook began an initiative to support independent academic research on social media’s role in elections and democracy.

That 2019 data set consisted of links that had been shared publicly on Facebook by at least 100 unique Facebook users.

With this data, researchers will be able to understand important aspects of how social media shapes our world.

This new data set, like the data we released before it, is protected by a method known as differential privacy.

This white paper summarizes our learnings on implementing differential privacy and serves as a roadmap for other organizations seeking to implement similar privacy protections.
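
As a rough illustration of the kind of guarantee differential privacy provides, here is the textbook Laplace mechanism for a counting query. This is a generic sketch, not the mechanism described in the white paper; `dp_count` and its parameters are invented for illustration.

```python
# Laplace mechanism sketch: a counting query has sensitivity 1, so
# releasing count + Laplace(0, 1/epsilon) noise gives epsilon-DP.
import math
import random

def dp_count(true_count, epsilon, rng=random):
    """Release a count with epsilon-differential privacy."""
    scale = 1.0 / epsilon
    u = rng.random() - 0.5
    # Inverse-CDF sample from Laplace(0, scale).
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

released = dp_count(100, epsilon=1.0)
```

Smaller `epsilon` (stronger privacy) widens the noise distribution, so each individual's presence in the data perturbs the released statistic by less than the noise already does.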

1 week, 5 days ago @ research.fb.com
Promoting further innovation in data science with new call for research in applied statistics

The RFP is a joint effort led by the Facebook Infrastructure Data Science and Core Data Science (CDS) teams and builds upon the 2019 research awards in this area.

Infra Data Science is a team of applied quantitative and computational experts that use math, statistics, and machine learning to measure and optimize performance, reliability and efficiency of Facebook’s infrastructure and global telecom systems.

“We are interested in leveraging recent advancements in statistics and machine learning to improve the performance, reliability, and efficiency of Facebook’s infrastructure,” says Rajiv Krishnamurthy, Research Data Science Director.

“Engineers and decision-makers routinely utilize experi…

1 week, 6 days ago @ research.fb.com
Announcing the winners of the Systems for ML research awards

In September 2019 at the annual AI Systems Faculty Summit, Facebook launched the Systems for Machine Learning request for proposals with the goal of funding impactful solutions in the areas of developer toolkits, compilers/code generation, system architecture, memory technologies, and ML accelerator support.

“Due to the great successes of collaborations that have come from our previous RFPs in systems and machine learning, we’re very excited to continue with another round of investments in academic research in this important domain,” says Kim Hazelwood, Senior Engineering Manager.

Previous RFPs have included the 2017 Hardware and Software Systems RFP and the 2019 AI System Hardware/Softwa…

2 weeks, 4 days ago @ research.fb.com
Announcing the recipients of the 2020 Facebook Fellowship awards

Every year, the Facebook Fellowship program awards PhD candidates conducting research in important topics across computer science and engineering such as computer vision, programming languages, computational social science, and more.

Recipients of the Facebook Fellowship award receive tuition and fees paid for up to two academic years and a stipend of $42,000, which includes conference travel support.

The program is now in its ninth year and has supported over 108 PhD candidates from a broad range of universities.

This year, we’ve selected 36 outstanding Fellows from 16 universities in the U.S. and six universities outside of the U.S.

We added more research areas for students to apply for, …

4 weeks ago @ research.fb.com
New research award opportunities announced at POPL 2020

Experts in programming languages and programming systems are meeting in New Orleans from Sunday, January 19, to Saturday, January 25, for the Symposium on Principles of Programming Languages (POPL).

Among the attendees are several Facebook researchers and engineers looking to engage with the academic community and discuss topics in programming languages.

At the conference social hour today, Facebook Software Engineering Manager Satish Chandra is announcing a new research award opportunity in probability and programming.

With this event, we hope to foster a community among women in programming languages, including faculty, postdocs, and students.

New research award opportunityAt POPL 2019, w…

1 month ago @ research.fb.com
Supporting independent research to better measure the impact of the digital economy

The digital economy is redefining the global economy, both by creating new economic spaces and by altering the way analog activities are undertaken.

Even though an exact definition of the digital economy is elusive, it is clear that the leading digital platforms are having an impact on individuals, markets, and society.

Building upon the 2019 Economic Opportunity and Digital Platforms research awards, we are contributing $1 million in funding to support research on economic impact and opportunity in 2020.

New request for proposals: In our first initiative in this space for 2020, we launched a request for research proposals on measuring economic impact in the digital economy.

Example topic ar…

1 month, 1 week ago @ research.fb.com
Facebook releases improved Displacement Maps for crisis response

Today, we’re announcing the launch of a new version of Facebook Displacement Maps as part of the Disaster Maps product suite.

Displacement Maps background: Over a year ago, we launched an initial version of Displacement Maps, which aimed to understand the number of people displaced long-term after a natural disaster.

Our improved Displacement Maps use aggregated and de-identified data from people using Facebook on their devices who have opted into location history.

New Displacement Maps examples: Cyclone Fani and Typhoon Hagibis. Cyclone Fani hit India and Bangladesh in May 2019, causing at least 89 fatalities and more than $8 billion in damages.

In the figure below, our new displac…

1 month, 1 week ago @ research.fb.com
Fighting Abuse @Scale 2019 recap

Fighting abuse presents unique challenges for large-scale organizations working to keep the people on their platforms safe.

At Fighting Abuse @Scale 2019, engineers, data scientists, product managers, and operations specialists gathered in Menlo Park for a day of technical talks focused on state-of-the-art technologies to fight fraud, spam, and abuse on platforms that serve millions or even billions of people.

Our key insight is that sharing patterns can help hosting platforms identify abusive content, while hosting platforms can help sharing platforms prevent the spread of abusive content.

Results demonstrate that working together as an industry can strengthen the capacity to more quickly …

2 months, 2 weeks ago @ engineering.fb.com
CCSM: Scalable statistical anomaly detection to resolve app crashes faster

A contrast set mining algorithm: CSM provides a scalable, robust way to generate human-readable insights on high-dimensional crash data.

For a contrast set X and group G, the support S(X,G) is the percentage of vectors in group G for which the contrast set X is true.

To efficiently traverse the search space of feature combinations, we cast the problem of mining contrast sets as a tree search problem.

However, real-world data is often mixed — our crash data contains a mix of categorical, discrete, and continuous data.

The continuous contrast mining algorithm adopts the same tree search framework, with modifications to reason about sets of continuous features.
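
The support statistic defined above is straightforward to compute. A minimal sketch with invented feature names and crash vectors (not the CCSM implementation):

```python
# Support of a contrast set: for a contrast set X (a conjunction of
# feature=value conditions) and a group G of crash-report vectors,
# S(X, G) is the percentage of vectors in G for which X is true.

def support(contrast_set, group):
    """S(X, G): percent of vectors in `group` matching every condition."""
    matches = sum(
        all(vec.get(feat) == val for feat, val in contrast_set.items())
        for vec in group
    )
    return 100.0 * matches / len(group)

# Invented crash reports with categorical features.
crashes = [
    {"os": "android_10", "app_version": "v2", "country": "US"},
    {"os": "android_10", "app_version": "v2", "country": "BR"},
    {"os": "android_9",  "app_version": "v1", "country": "US"},
    {"os": "android_10", "app_version": "v1", "country": "US"},
]
s = support({"os": "android_10", "app_version": "v2"}, crashes)  # 50.0
```

Contrast set mining then searches for sets X whose support differs significantly between groups (e.g., crashing vs. non-crashing sessions), which is where the tree search over feature combinations comes in.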

3 months ago @ engineering.fb.com
Fast dimensional analysis for root cause analysis at scale

Nikolay Pavlovich Laptev, Fred Lin, Keyur Muzumdar, Mihai-Valentin Curelea. What the research is: A fast dimensional analysis (FDA) framework that automates root cause analysis on structured logs with improved scalability.

When a failure event happens in a large-scale distributed production environment, performing root cause analysis can be challenging.

Our proposed FDA framework combines structured logs from a number of sources and provides a meaningful combination of features.

As we’ve mentioned, the challenges of performing root cause analysis in a large-scale distributed production environment make outage detection and mitigation difficult.

Read the full paper: Fast Dimensional Analysis for Ro…

3 months, 2 weeks ago @ engineering.fb.com
2019 @Scale Conference recap

If you are interested in future events, visit the @Scale website or join the @Scale community.

@Scale 2019: Data Infra. Zanzibar: Google’s consistent, global authorization system. Ruoming Pang, Principal Software Engineer, Google. Determining whether online users are authorized to access digital objects is central to preserving privacy.

6 technical challenges in developing a distributed SQL database. Neha Deodhar, Software Engineer, YugaByte. Neha discusses the experience of developing YugaByte.

@Scale 2019: Security. Leveraging the type system to write secure applications. Shannon Zhu, Software Engineer, Facebook. Shannon discusses ways to extend the type system to eliminate entire classes of security vul…

4 months ago @ engineering.fb.com
Video @Scale 2019 recap

At Video @Scale 2019, engineers gathered in San Francisco for a day of technical talks focused on delivering video at scale.

Adopting video at scale. Steven Robertson, Engineer, YouTube. Steven works on streaming video performance at YouTube.

AV1 Panel. Ronald Bultje, Founder, Two Orioles; Yaowu Xu, Principal Software Engineer, Google; Chekib Nouira, Senior Video Systems Engineer, Intel. Panel moderated by Ioannis Katsavounidis.

Contextual video ad safety. Vijaya Chandra, Software Engineering Manager, Facebook; Rose Kanjirathinkal, Research Scientist, Facebook. Vijaya leads video understanding efforts at Facebook.

Video integrity at scale. Sonal Gandhi, Software Engineer, Facebook. Sonal talks about reducing har…

4 months, 1 week ago @ engineering.fb.com
Google
last post 12 hours ago
Enhancing the Research Community’s Access to Street View Panoramas for Language Grounding Tasks

Touchdown instruction: “Two parked bicycles, and a discarded couch, all on the left.

Walk just past this couch, and stop before you pass another parked bicycle.

This bike will be white and red, with a white seat.

Touchdown is sitting on top of the bike seat.” Other panoramas from the same location taken at other times would be highly unlikely to contain these exact items in the exact same positions.

For a concrete example, see the current imagery available for this location in Street View, which contains very different transient objects.

12 hours ago @ ai.googleblog.com
The new tool helping Asian newsrooms detect fake images

Viral images and memes flood our feeds and chats, and often they’re out-of-context or fake.

That’s a barrier for fact-checkers and journalists in countries where most people connect to the internet on their mobile.

For the past two years, the Google News Initiative has worked with journalists to identify manipulated images using technology.

[EK] At Storyful, we see old, inaccurate or modified images being reshared to push a misleading narrative in news cycles big and small.

Source helps detect and translate text in images too, which is especially useful for journalists cataloguing or analyzing memes online.

16 hours ago @ blog.google
Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer

Peanut butter and bananas were the original sandwich spread (also known as PB&J or Peanut Butter and Jelly), so they are probably my favorite.

When I was a kid, I knew what peanut butter and bananas tasted like, but I didn't really think of them as one flavor.

I did recognize PB & J's as just a sandwich spread, and not really two separate flavours.

Using a kitchen timer, or using a microwave, heat butter in a saucepan and melt over low heat.

Assemble peanut butter and banana sandwich spread by spreading the peanut butter mixture on each slice of bread.

1 day, 10 hours ago @ ai.googleblog.com
Announcing the 2019 Google Faculty Research Award Recipients


1 day, 12 hours ago @ ai.googleblog.com
Setting Fairness Goals with the TensorFlow Constrained Optimization Library

Illustration of a binary classification dataset with two protected groups: blue and orange.

For ease of visualization, rather than plotting each individual data point, the densities are represented as ovals.

The positive and negative signs denote the labels.

The decision boundary, drawn as a black dashed line, separates positive predictions (regions above the line) from negative predictions (regions below the line) and is chosen to maximize accuracy.

4 days, 13 hours ago @ ai.googleblog.com
Your ML workloads cheaper and faster with the latest GPUs

In fact, it has been supported as a storage format for many years on NVIDIA GPUs: High performance FP16 is supported at full speed on NVIDIA T4, NVIDIA V100, and P100 GPUs.

Automatic mixed precision mode in TensorFlowMixed precision uses both FP16 and FP32 data types when training a model.

Mixed-precision training usually achieves the same accuracy as single-precision training using the same hyper-parameters.

NVIDIA T4 and NVIDIA V100 GPUs incorporate Tensor Cores, which accelerate certain types of FP16 matrix math, enabling faster and easier mixed-precision computation.

Performing arithmetic operations in FP16 takes advantage of the performance gains of using lower-precision hardware (such…
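
The benefit of pairing FP16 storage with FP32 accumulation can be seen in a small NumPy experiment. This is an illustrative sketch of the numerical issue mixed precision addresses, not TensorFlow's implementation:

```python
# Mixed-precision intuition: keep values in FP16 where it is cheap, but
# accumulate in FP32 so small contributions are not lost to rounding.
import numpy as np

grads = np.full(10000, 1e-4, dtype=np.float16)   # FP16 storage

# Accumulating in FP16 stalls: once the running sum reaches ~0.25, the
# FP16 spacing exceeds 2e-4, so adding 1e-4 rounds to no change at all.
fp16_sum = np.float16(0.0)
for g in grads:
    fp16_sum = np.float16(fp16_sum + g)

# Accumulating the same FP16 values in FP32 stays accurate (~1.0).
fp32_sum = np.float32(0.0)
for g in grads:
    fp32_sum = np.float32(fp32_sum + np.float32(g))
```

This is why mixed-precision training keeps FP32 master weights and accumulators while running the bulk of the matrix math in FP16 on Tensor Cores.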

4 days, 14 hours ago @ cloud.google.com
Explaining model predictions on structured data

With that in mind, over the next few months, we’ll share a series of blog posts that covers how to use AI Explanations with different data modalities, like tabular, image, and text data.

In today’s post, we’ll take a detailed look at how you can use Explainable AI with tabular data, both with AutoML Tables and on Cloud AI Platform.

AI Explanations offers two approximation methods: Integrated Gradients and Sampled Shapley.
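
Of the two methods, Integrated Gradients attributes a prediction by scaling each feature's deviation from a baseline by the average gradient along the straight path between baseline and input. A toy sketch under the assumption of a hand-written linear model (names and data invented; the real service differentiates your deployed model):

```python
# Integrated Gradients sketch: attribution_i =
#   (x_i - baseline_i) * mean gradient_i along the baseline->x path.

def model(x):
    # Toy differentiable model: f(x) = 3*x0 + 2*x1.
    return 3.0 * x[0] + 2.0 * x[1]

def grad(x):
    # Gradient of the toy model is constant: [3, 2].
    return [3.0, 2.0]

def integrated_gradients(x, baseline, steps=50):
    acc = [0.0, 0.0]
    for k in range(1, steps + 1):
        point = [b + (k / steps) * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(point)
        for i in range(2):
            acc[i] += g[i]
    return [(xi - b) * (a / steps)
            for xi, b, a in zip(x, baseline, acc)]

attr = integrated_gradients([1.0, 2.0], [0.0, 0.0])
# For a linear model this recovers each term exactly: [3.0, 4.0],
# and attributions sum to f(x) - f(baseline) (completeness).
```

Sampled Shapley instead estimates each feature's average marginal contribution over sampled feature orderings; both approximate the same question of how much each input moved the prediction.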

AI Explanations for AutoML TablesAutoML Tables lets you automatically build, analyze, and deploy state-of-the-art machine learning models using your own structured data.

This post walks through the data ingestion—which is made easy by AutoML—and training process using that…

5 days, 14 hours ago @ cloud.google.com
Generating Diverse Synthetic Medical Image Data for Training Machine Learning Models

An example of a particularly interesting out-of-focus pattern across a biological tissue slice.

Areas in blue were recognized by the model to be in-focus, whereas areas highlighted in yellow, orange, or red were more out of focus.

The gradation in focus here (represented by concentric circles: a red/orange out-of-focus center surrounded by green/cyan mildly out-of-focus, and then a blue in-focus ring) was caused by a hard “stone” in the center that lifted the surrounding biological tissue.

6 days, 13 hours ago @ ai.googleblog.com
New Dialogflow Mega Agent for Contact Center AI increases intents by 10 times to 20,000

A regular Dialogflow agent comes with a limit of 2,000 intents—the highest intent limit available in the market, based on public information.

With Dialogflow Mega Agent, now in beta, you can combine multiple Dialogflow agents into a single agent, and expand your intent limit by 10 times to 20,000 intents.

With increased intents, customers can have more natural, seamless conversations, pivot intents and questions when they want, and get their questions covered.

Yet, an internal study showed that 80% of Dialogflow agents had easy-to-fix quality issues.

Reducing errors leads to faster bot deployment, and ultimately, higher quality Dialogflow agents in production.

6 days, 15 hours ago @ cloud.google.com
AutoFlip: An Open Source Framework for Intelligent Video Reframing

Top: Camera paths resulting from following the bounding boxes from frame-to-frame.

Bottom: Final smoothed camera paths generated using Euclidean-norm path formation.

Left: Scene in which objects are moving around, requiring a tracking camera path.

Right: Scene where objects stay close to the same position; a stationary camera covers the content for the full duration of the scene.

1 week, 5 days ago @ ai.googleblog.com
Learning to See Transparent Objects

Surface Normal estimation on real images when trained on a) Matterport3D and ScanNet only (MP+SN), b) our synthetic dataset only, and c) MP+SN as well as our synthetic dataset.

Note how the model trained on MP+SN fails to detect the transparent objects.

The model trained on only synthetic data picks up the real plastic bottles remarkably well, but fails for other objects and surfaces.

When trained on both, our model gets the best of both worlds.

1 week, 6 days ago @ ai.googleblog.com
TyDi QA: A Multilingual Question Answering Benchmark


2 weeks, 5 days ago @ ai.googleblog.com
ML-fairness-gym: A Tool for Exploring Long-Term Impacts of Machine Learning Systems

An example of Simpson's paradox.

TP denotes true positive classifications, FN denotes false negative classifications, and TPR is the true positive rate.

In years 1 and 2, the lender applies a policy that achieves equal TPR between the two groups.

The aggregation over both years does not have equal TPR.
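
The effect described here can be reproduced with small invented numbers (these are not figures from the post): per-year TPRs are equal across the two groups, yet the pooled TPRs diverge because the groups' sizes differ across years.

```python
# Simpson's paradox with TPR = TP / (TP + FN): equal within each year,
# unequal after pooling. All counts below are illustrative.

def tpr(tp, fn):
    return tp / (tp + fn)

# (TP, FN) per group and year; group sizes flip between years.
data = {
    "blue":   {"year1": (9, 1),   "year2": (40, 60)},
    "orange": {"year1": (90, 10), "year2": (4, 6)},
}

for year in ("year1", "year2"):
    # Equal TPR across groups in each individual year (0.9, then 0.4).
    assert tpr(*data["blue"][year]) == tpr(*data["orange"][year])

pooled = {g: tpr(sum(tp for tp, _ in d.values()),
                 sum(fn for _, fn in d.values()))
          for g, d in data.items()}
# pooled["blue"] = 49/110 ≈ 0.445, pooled["orange"] = 94/110 ≈ 0.855.
```

This is exactly why the gym simulates policies over time: a fairness criterion that holds in each period can still fail in aggregate.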

2 weeks, 6 days ago @ ai.googleblog.com
Encode, Tag and Realize: A Controllable and Efficient Approach for Text Generation

When training the models on the full dataset of 1 million examples, both LaserTagger and a BERT-based seq2seq baseline model perform comparably, but when training on a subsample of 10,000 examples or less, LaserTagger clearly outperforms the baseline model (the higher the SARI score the better).

3 weeks, 4 days ago @ ai.googleblog.com
Announcing the Third Workshop and Challenge on Learned Image Compression


3 weeks, 5 days ago @ ai.googleblog.com
OpenAI
latest post 3 weeks, 5 days ago
OpenAI → PyTorch

We are standardizing OpenAI’s deep learning framework on PyTorch.

The main reason we chose PyTorch is to increase our research productivity at scale on GPUs.

It is very easy to try and execute new research ideas in PyTorch; for example, switching to PyTorch decreased our iteration time on research ideas in generative modeling from weeks to days.

Going forward we'll primarily use PyTorch as our deep learning framework but sometimes use other ones when there's a specific technical reason to do so.

Many of our teams have already made the switch, and we look forward to contributing to the PyTorch community in upcoming months.

3 weeks, 5 days ago @ openai.com
OpenAI Five

You play against [OpenAI Five] and you realize it has a playstyle that is different.

It’s doing things that you’ve never done and you’ve never seen.

One key learning that we took is how it was allocating resources.

It’s just allocating resources as efficiently as possible.

[…] If OpenAI does that dynamic switch at 100%, we maybe went from 5% to 10%?

2 months, 2 weeks ago @ openai.com
Deep Double Descent

Many classes of modern deep learning models, including CNNs, ResNets, and transformers, exhibit the previously-observed double descent phenomenon when not using early stopping or regularization.

The model-wise double descent phenomenon can lead to a regime where training on more data hurts.

The double descent phenomenon is most prominent in settings with added label noise; without it, the peak is smaller and easy to miss.

For a given number of optimization steps (fixed y-coordinate), test and train error exhibit model-size double descent.

We leave fully understanding the mechanisms behind double descent in deep neural networks as an important open question.

2 months, 3 weeks ago @ openai.com
Procgen Benchmark

We’re releasing Procgen Benchmark, 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills.

To fulfill this need, we have created Procgen Benchmark.

CoinRun now serves as the inaugural environment in Procgen Benchmark, contributing its diversity to a greater whole.

With Procgen Benchmark, we strive for all of the following: experimental convenience, high diversity within environments, and high diversity across environments.

We've now expanded on those results, conducting our most thorough study of RL generalization to date using all 16 environments in Procgen Benchmark.

2 months, 3 weeks ago @ openai.com
Safety Gym

We're releasing Safety Gym, a suite of environments and tools for measuring progress towards reinforcement learning agents that respect safety constraints while training.

To study constrained RL for safe exploration, we developed a new set of environments and tools called Safety Gym.

Benchmark: To help make Safety Gym useful out-of-the-box, we evaluated some standard RL and constrained RL algorithms on the Safety Gym benchmark suite: PPO, TRPO, Lagrangian-penalized versions of PPO and TRPO, and Constrained Policy Optimization (CPO).
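Lagrangian-penalized methods of the kind listed above turn the safety constraint into an adaptive penalty; a schematic of the dual-variable update (illustrative only, not the actual Safety Gym baseline code, with made-up cost numbers):

```python
# Schematic Lagrangian multiplier update for constrained RL. The policy step
# maximizes E[reward] - lam * E[cost], while lam rises when measured episode
# cost exceeds the limit d and decays toward 0 when the agent is within budget.

def update_multiplier(lam, episode_cost, cost_limit, lr=0.05):
    # Projected gradient ascent on the dual variable, keeping lam >= 0.
    return max(0.0, lam + lr * (episode_cost - cost_limit))

lam, cost_limit = 0.0, 25.0
for episode_cost in [40, 38, 30, 26, 24, 20, 18]:  # fake measured costs
    lam = update_multiplier(lam, episode_cost, cost_limit)
    # The penalized objective the policy update would then optimize:
    # J(pi) = E[reward] - lam * E[cost]
print(round(lam, 3))  # 1.05
```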

There are three things we are most interested in at the moment: improving performance on the current Safety Gym environments.

We also hope that systems l…

3 months ago @ openai.com
GPT-2: 1.5B Release

As the final model release of GPT-2’s staged release, we’re releasing the largest version (1.5B parameters) of GPT-2 along with code and model weights to facilitate detection of outputs of GPT-2 models.

Our partners at Cornell University surveyed people to assign GPT-2 text a credibility score across model sizes.

People gave the 1.5B model a “credibility score” of 6.91 out of 10.

These results make us more inclined to release the 1.5B model, as the incremental increase in human-perceived credibility relative to 774M seems low.

We acknowledge that we cannot be aware of all threats, and that motivated actors can replicate language models without model release.

3 months, 3 weeks ago @ openai.com
Solving Rubik’s Cube with a Robot Hand

We've trained a pair of neural networks to solve the Rubik’s Cube with a human-like robot hand.

Since May 2017, we've been trying to train a human-like robotic hand to solve the Rubik’s Cube.

Solving a Rubik’s Cube one-handed is a challenging task even for humans, and it takes children several years to gain the dexterity required to master it.

To test the limits of our method, we experiment with a variety of perturbations while the hand is solving the Rubik’s Cube.

Behind the scenes: Rubik’s Cube prototypes In order to benchmark our progress and make the problem tractable, we built and designed custom versions of cubes as stepping stones towards ultimately solving a regular Rubik’s Cube.

4 months, 1 week ago @ openai.com
OpenAI Scholars Spring 2020

The second class of Scholars recently released their projects and presented their work at the 2019 Scholars Demo Day.

While we hope that some of the scholars will join OpenAI, we want this program to improve diversity in the field at large.

For Bay Area participants, we offer an optional desk at the OpenAI office (which our past Scholars have found very valuable).

We look for people who are comfortable writing software (2+ years in software engineering), but no previous machine learning experience is required.

We ask all Scholars to document their experiences studying deep learning to hopefully inspire others to join the field too.

4 months, 2 weeks ago @ openai.com
Fine-Tuning GPT-2 from Human Preferences

We’ve fine-tuned the 774M parameter GPT-2 language model using human feedback for various tasks, successfully matching the preferences of the external human labelers, though those preferences did not always match our own.

Fine-tuning for the stylistic continuation tasks is sample efficient: 5,000 human samples suffice for strong performance according to humans.

However, when combining supervised fine-tuning with human fine-tuning, our models outperform lead-3 on ROUGE scores.

The cost of human data means that volume will always be low, so it is easy to retrain from scratch (or rather, from the GPT-2 starting point) each time.

Looking forward: We’ve demonstrated reward learning from human pref…

5 months, 1 week ago @ openai.com
Emergent Tool Use from Multi-Agent Interaction

Through training in our new simulated hide-and-seek environment, agents build a series of six distinct strategies and counterstrategies, some of which we did not know our environment supported.

The self-supervised emergent complexity in this simple environment further suggests that multi-agent co-adaptation may one day produce extremely complex and intelligent behavior.

In this full environment, agents go through two more phases of emergent strategy than in the previous simple environment.

Multi-agent competition vs. intrinsic motivation: In this work we show evidence that agents learn complex strategies and counterstrategies through a self-supervised autocurriculum in hide-and-seek.

Though t…

5 months, 1 week ago @ openai.com
Testing Robustness Against Unforeseen Adversaries

Our method yields a new metric, UAR (Unforeseen Attack Robustness), which evaluates the robustness of a single model against an unanticipated attack, and highlights the need to measure performance across a more diverse range of unforeseen attacks.

The field has made progress in hardening models against such attacks; however, robustness against one type of distortion often does not transfer to robustness against attacks unforeseen by designers of the model.

It also yields a new metric, UAR, which assesses the adversarial robustness of models against unforeseen distortion types.

A UAR score near 100 against an unforeseen adversarial attack implies performance comparable to a defense with prio…

6 months, 1 week ago @ openai.com
GPT-2: 6-Month Follow-Up

Our research suggests that current ML-based methods only achieve low to mid–90s accuracy, and that fine-tuning the language models decreases accuracy further.

PartnershipsWe’ve partnered with four leading research organizations to analyze both the newly-released 774M parameter GPT-2 model and the unreleased full-size GPT-2 model.

is studying human susceptibility to digital disinformation generated by language models.

Center on Terrorism, Extremism, and Counterterrorism (CTEC) is exploring how GPT-2 could be misused by terrorists and extremists online.

The University of Oregon is developing a series of “bias probes” to analyze bias within GPT-2.

6 months, 1 week ago @ openai.com
Learning Day

Before Learning Day, we very rarely saw people grow cross-functionally—for example, employees coming from a software background rarely picked up machine learning (something equally rare in other organizations except academia).

What we learn on Learning Day: The following are examples of what people learn on a single Learning Day.

Learning Day could turn into a normal working day because people may want to accomplish their main project faster (due to internal or external pressure).

We prevent this by having Learning Day on the same day for every team.

Learning Day beyond RoboticsWe’ve recently expanded Learning Day from a subset of our technical teams to the entire company.

6 months, 4 weeks ago @ openai.com
Microsoft Invests In and Partners with OpenAI to Support Us Building Beneficial AGI

Microsoft is investing $1 billion in OpenAI to support us building artificial general intelligence (AGI) with widely distributed economic benefits.

We're partnering to develop a hardware and software platform within Microsoft Azure which will scale to AGI.

An AGI working on a problem would be able to see connections across disciplines that no human could.

OpenAI is producing a sequence of increasingly powerful AI technologies, which requires a lot of capital for computational power.

Instead, we intend to license some of our pre-AGI technologies, with Microsoft becoming our preferred partner for commercializing them.

7 months, 1 week ago @ openai.com
Why Responsible AI Development Needs Cooperation on Safety

Our analysis shows that industry cooperation on safety will be instrumental in ensuring that AI systems are safe and beneficial, but competitive pressures could lead to a collective action problem, potentially causing AI companies to under-invest in safety.

We hope these strategies will encourage greater cooperation on the safe development of AI and lead to better global outcomes of AI.

Cooperation strategies We've found four strategies that can be used today to improve the likelihood of cooperation on safety norms and standards in AI.

Incentivize adherence to high standards of safety: Commend those that adhere to safety standards, reproach failures to ensure that systems are developed safel…

7 months, 2 weeks ago @ openai.com
Microsoft
latest post 13 hours ago
EMEA scholarship program continues to foster collaboration with academia while new research award targets rising talent

The Principal Researcher landed on the personal mission years before joining Microsoft Research.

Advancing research like Morrison’s through collaboration and strong relationships between Microsoft Research Cambridge researchers and academia is at the core of the Microsoft Research PhD Scholarship Programme in EMEA (Europe, the Middle East, and Africa).

Proposals are reviewed and selected by Microsoft researchers in a two-stage review process.

EMEA PhD Award: As a complement to the EMEA PhD Scholarship Programme, Microsoft Research is excited to announce the Microsoft Research EMEA PhD Award, a new research grant for PhD students in computing-related fields who are in their third year or beyon…

13 hours ago @ microsoft.com
The Chief Information Security Officer of Microsoft explains how it’s helping companies protect themselves against a threat almost as big as hackers: Their own employees

Microsoft Chief Information Security Officer Bret Arsenault said the new technology, called Microsoft Insider Risk Management, uses AI and machine learning to allow businesses to monitor the way information flows within a network.

"Many of us were focused on external adversaries and I don't think it's a mistake," Arsenault told Business Insider.

"I know that's a super secret project that's supposed to be protected," he told Business Insider.

But Arsenault said the Microsoft tool isn't just about stopping intellectual property theft and other malicious acts.

It can also help them come up with better policies related to the way information is stored and managed within the organization.

3 days, 15 hours ago @ businessinsider.com
Delivering on the promise of security AI to help defenders protect today’s hybrid environments

The AI capabilities built into Microsoft Security solutions are trained on 8 trillion daily threat signals and the insights of 3,500 security experts.

As a result, Microsoft Security solutions help identify and respond to threats 50% faster than was possible just 12 months ago.

Today, Microsoft Security solutions are able to automate 97% of the routine tasks that occupied defenders’ valuable time just two years ago.

Microsoft Threat Protection breaks down security silos so security professionals can automatically detect, investigate and stop coordinated multi-point attacks.

More details on the Microsoft Threat Protection announcement can be found on the Microsoft Security Blog.

5 days, 9 hours ago @ blogs.microsoft.com
How Microsoft 365’s new solution uses machine learning to stop data leaks and insider attacks

That’s why Microsoft is offering a new Insider Risk Management solution within Microsoft 365 that uses machine learning to intelligently detect potentially risky behavior within a company.

Because mistakes are a larger source of actual risk than insider attacks, the solution was designed to help employees make the right choices and avoid common security lapses.

“Fundamentally, a company’s employees are usually trying to do the right thing,” said Bret Arsenault, Microsoft’s chief information security officer and corporate vice president.

In a recent survey of cybersecurity professionals, 90 percent of organizations indicated that they felt vulnerable to insider risk, and two-thirds considere…

5 days, 17 hours ago @ blogs.microsoft.com
Five ways your academic research skills transfer to industry

The piece that was missing for me, though, was how exactly to translate my extensive academic experience into what industry finds valuable.

We academics gain a wealth of knowledge and skills in the process, and those skills are often showcased in an extensive academic CV.

Industry job descriptions can seem daunting to an academic, though, as the language used to describe qualifications and required skills is more vague than the troves of information we’re used to providing in academia.

Be patient with the process: All told, the process of converting my CV into a document I could submit for an industry job took a little over two months.

But that is a conversation for another post …Alaina Talbo…

6 days, 13 hours ago @ microsoft.com
Artificial intelligence makes a splash in efforts to protect Alaska’s ice seals and beluga whales

She and her fellow National Oceanic and Atmospheric Administration scientists now will use artificial intelligence this spring to help monitor endangered beluga whales, threatened ice seals, polar bears and more, shaving years off the time it takes to get data into the right hands to protect the animals.

Of the four types of ice seals in the Bering Sea — bearded, ringed, spotted and ribbon — the first two are classified as threatened, meaning they are likely to become in danger of extinction within the foreseeable future.

“Remote equipment lets us collect all kinds of data, but scientists have to figure out how to use that data.

Ice seals live largely solitary lives, making them harder to s…

6 days, 15 hours ago @ news.microsoft.com
Democratizing data, thinking backwards and setting North Star goals with Dr. Donald Kossmann

And the third is, I would call it cost or performance, is making sure that you don’t overpay for the data, right?

Donald Kossmann: …right?

We don’t have product deadlines, shipping deadlines and so we have time to really think things through.

Donald Kossmann: Yeah, so of course now I have to think big and I have to think beyond to be inspiring.

We have all the ingredients here that you need to address these really, really big dreams.

6 days, 20 hours ago @ microsoft.com
How AI is helping map the world’s most vulnerable places

Rapid-onset cases such as this put vulnerable communities at risk because aid agencies and authorities don’t have the data they require.

Collaborating with Microsoft’s Bing Maps, HOT set out first to map Uganda and Tanzania.

[READ MORE: How AI and satellites are used to combat illegal fishing] During an emergency, that level of detail is crucial.

The combination of on-the-ground knowledge and committed volunteers with powerful AI and huge datasets has enormous potential.

For more on how technology can save lives, visit AI for Humanitarian Action.

1 week, 5 days ago @ news.microsoft.com
Microsoft Scheduler and dawn of Intelligent PDAs with Dr. Pamela Bhattacharya

Pamela Bhattacharya: Yeah, yeah!

Pamela Bhattacharya: Yeah, yeah.

Host: Right, right, right.

Host: And you have a wonderful explanation for levels of automation and where Scheduler lives now…

Pamela Bhattacharya: Yeah, yeah.

Pamela Bhattacharya: Yeah, yeah!

1 week, 6 days ago @ microsoft.com
Microsoft Connected Vehicle Platform: trends and investment areas

At Microsoft, we’ve expanded our partnerships, including Volkswagen, LG Electronics, Faurecia, TomTom, and more, and taken the wraps off new thinking such as at CES, where we recently demonstrated our approach to in-vehicle compute and software architecture.

2 weeks ago @ azure.microsoft.com
Data Visualization: Bridging the Gap Between Users and Information Webinar

Microsoft Research Webinar Series. Data Visualization: Bridging the Gap Between Users and Information. With the explosion of data available today, finding effective ways for humans to interact with that data represents an enormous opportunity for researchers.

At present, the ability to understand the nuances of visualizing data extends beyond simply interpreting data—we need to examine data from multiple perspectives to inform how we act on that data effectively.

As tools to create data visualizations become more advanced, critical thinking about how to make and interpret these has greater implications for how messaging using data can impact our society.

In this webinar led by Microsoft researc…

2 weeks, 1 day ago @ note.microsoft.com
Turing-NLG: A 17-billion-parameter language model by Microsoft

Turing Natural Language Generation (T-NLG) is a 17 billion parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks.

– This summary was generated by the Turing-NLG language model itself.

T-NLG: Benefits of a large generative language model. T-NLG is a Transformer-based generative language model, which means it can generate words to complete open-ended textual tasks.

The resulting T-NLG model has 78 Transformer layers with a hidden size of 4256 and 28 attention heads.
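As a sanity check on that configuration, the standard back-of-envelope count of roughly 12·L·h² parameters per Transformer stack (attention projections plus a 4×-wide MLP, embeddings ignored; an approximation, not Microsoft's published accounting) lands close to the advertised 17 billion:

```python
# Back-of-envelope Transformer parameter count for the reported T-NLG config
# (78 layers, hidden size 4256). Per block: ~4*h^2 for the attention
# projections plus ~8*h^2 for the 4x-wide MLP, i.e. ~12*h^2 total.
layers, hidden = 78, 4256
per_block = 12 * hidden ** 2
total = layers * per_block
print(f"{total / 1e9:.1f}B parameters")  # ~17.0B, matching the headline figure
```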

T-NLG summary of “Microsoft will be carbon negative by 2030” (Brad Smith, Official Microsoft Blog): Microsoft is committed to being carbon negative by 2030.

2 weeks, 1 day ago @ microsoft.com
ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters

It also presents a clear path to training models with trillions of parameters, demonstrating an unprecedented leap in deep learning system technology.

Overcoming limitations of data parallelism and model parallelism with ZeRO: We developed ZeRO to conquer the limitations of data parallelism and model parallelism while achieving the merits of both.

We call this ZeRO-powered data parallelism, which allows per-device memory usage to scale linearly with the degree of data parallelism and incurs similar communication volume as data parallelism.

ZeRO stage one in DeepSpeed provides system support to run models up to 100 billion parameters, 10 times bigger.

For example, to train a model with 20 bill…
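The memory arithmetic can be sketched with the accounting commonly used for ZeRO (an assumption here, not stated in this post): mixed-precision Adam keeps roughly 16 bytes per parameter, of which the 12 bytes of fp32 optimizer state are what stage one shards across the data-parallel group.

```python
# Per-GPU memory for model states under plain data parallelism vs. ZeRO
# stage 1, assuming fp16 Adam training: 2 bytes weights + 2 bytes grads
# (replicated on every GPU) + 12 bytes of optimizer state per parameter.

def per_gpu_gb(params, n_gpus, stage1=False):
    replicated = 4 * params                  # fp16 weights + grads, everywhere
    opt_states = 12 * params
    if stage1:
        opt_states /= n_gpus                 # optimizer states are sharded
    return (replicated + opt_states) / 2**30

params, n_gpus = 7.5e9, 64                   # illustrative model/cluster size
print(per_gpu_gb(params, n_gpus))            # ~112 GB: exceeds a single GPU
print(per_gpu_gb(params, n_gpus, True))      # ~29 GB with ZeRO stage 1
```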

2 weeks, 1 day ago @ microsoft.com
How AI is helping reinvent the world of manufacturing

In The Future Computed: AI and Manufacturing, Microsoft Senior Director Greg Shaw explores how AI, automation and the internet of things (IoT) present new challenges and opportunities.

[READ MORE: How AI is helping unlock information about melting glaciers] The auto parts giant ZF Group is using AI and algorithms to make production more reliable and more sustainable.

The energy management firm Schneider Electric is using AI to tame maintenance issues.

AI and data analysis are helping to make drilling more precise.

For more on AI, visit AI Empowering Innovation.

2 weeks, 4 days ago @ news.microsoft.com
MIT AI
latest post 10 hours ago
MIT Solve announces 2020 global challenges

Solve seeks tech-based solutions from social entrepreneurs around the world that address these four challenges.

Finalists will be invited to attend Solve Challenge Finals on Sept. 20 in New York City during U.N. General Assembly week.

At the event, they will pitch their solutions to Solve’s Challenge Leadership Groups, judging panels comprised of industry leaders and MIT faculty.

Solve’s challenge design process collects insights and ideas from industry leaders, MIT faculty, and local community voices alike.

As a marketplace for social impact innovation, Solve’s mission is to solve world challenges.

10 hours ago @ news.mit.edu
Bringing deep learning to life

Gaby Ecanow loves listening to music, but never considered writing her own until taking 6.S191 (Introduction to Deep Learning).

The course covers the technical foundations of deep learning and its societal implications through lectures and software labs focused on real-world applications.

A branch of machine learning, deep learning harnesses massive data and algorithms modeled loosely on how the brain processes information to make predictions.

Predicting protein behavior is key to designing drug targets, among other clinical applications, and Sledzieski wondered if deep learning could speed up the search for viable protein pairs.

After finishing their undergraduate degrees, they decided to …

1 day, 12 hours ago @ news.mit.edu
A human-machine collaboration to defend against cyberattacks

Being a cybersecurity analyst at a large company today is a bit like looking for a needle in a haystack — if that haystack were hurtling toward you at fiber optic speed.

“Most machine learning systems in cybersecurity have been doing anomaly detection,” says Kalyan Veeramachaneni, a co-founder of PatternEx and a principal research scientist at MIT.

Veeramachaneni and Arnaldo knew from their time building tools for machine-learning researchers at MIT that a successful solution would need to seamlessly integrate machine learning with human expertise.

The platform uses machine learning models to go through more than 50 streams of data and identify suspicious behavior.

We do that very efficient…

4 days, 12 hours ago @ news.mit.edu
A road map for artificial intelligence policy

The rapid development of artificial intelligence technologies around the globe has led to increasing calls for robust AI policy: laws that let innovation flourish while protecting people from privacy violations, exploitive surveillance, biased algorithms, and more.

“This is a very complex problem,” Luis Videgaray PhD ’98, director of MIT’s AI Policy for the World Project, said in a lecture on Wednesday afternoon.

“We can go to the next phase … principles are a necessary but not sufficient condition for AI policy.

Because policy is about making hard choices in uncertain conditions.”Indeed, he emphasized, more progress can be made by having many AI policy decisions be particular to specific i…

5 days, 12 hours ago @ news.mit.edu
Artificial intelligence yields new antibiotic

Using a machine-learning algorithm, MIT researchers have identified a powerful new antibiotic compound.

The model picked out one molecule that was predicted to have strong antibacterial activity and had a chemical structure different from any existing antibiotics.

This molecule, which the researchers decided to call halicin, after the fictional artificial intelligence system from “2001: A Space Odyssey,” has previously been investigated as a possible diabetes drug.

This screen, which took only three days, identified 23 candidates that were structurally dissimilar from existing antibiotics and predicted to be nontoxic to human cells.

The researchers also plan to use their model to design new a…

5 days, 15 hours ago @ news.mit.edu
Benjamin Chang: Might technology tip the global scales?

The nuclear balance: In the domain of military power, one question Chang has been pursuing is whether the use of AI in nuclear strategy offers a battlefield advantage.

In subsequent research, Chang will examine the impacts of AI on cybersecurity and on autonomous weaponry such as drones.

A start in policy debate: Pondering international and security issues began early for Chang.

By graduation, Chang knew he wanted to aim for a career in national security and policy by way of a graduate school education.

At LTSG, Chang facilitated wargames simulating Asia-Pacific conflicts, and wrote monographs on Chinese foreign policy, nuclear signaling, and island warfare doctrine.

6 days, 12 hours ago @ news.mit.edu
SENSE.nano awards seed grants in optoelectronics, interactive manufacturing

SENSE.nano has announced the recipients of the third annual SENSE.nano seed grants.

This year’s grants serve to advance innovations in sensing technologies for augmented and virtual realities (AR/VR) and advanced manufacturing systems.

A center of excellence powered by MIT.nano, SENSE.nano received substantial interest in its 2019 call for proposals, making for stiff competition.

“SENSE.nano strives to convey the breadth and depth of sensing research at MIT," says Brian Anthony, co-leader of SENSE.nano, associate director of MIT.nano, and a principal research scientist in the Department of Mechanical Engineering.

The mission of SENSE.nano is to foster the development and use of novel sensor…

1 week, 5 days ago @ news.mit.edu
Maintaining the equipment that powers our world

Power equipment maintenance and failure is such a far-reaching problem that it’s difficult to attach a dollar figure to it.

The cost of equipment failure, both in terms of business interruption and equipment breakdown, must be enormous to justify the high average fixed cost."

“The problem with using AI for industrial applications is the lack of high-quality data,” Vega-Brown explains.

Making power more reliable: Tagup’s platform combines all of a customer’s equipment data into one sortable master list that displays the likelihood of each asset causing a disruption.

Tagup’s first deployment was in August of 2016 with a power plant that faces the Charles River close to MIT’s campus.

1 week, 6 days ago @ news.mit.edu
Bridging the gap between human and machine vision

On the other hand, we know that state-of-the-art classifiers, such as vanilla deep networks, will fail this simple test.

Thus, understanding how human vision can pull off this remarkable feat is relevant for engineers aiming to improve their existing classifiers.

Next, Han and her colleagues performed a comparable experiment in deep neural networks designed to reproduce this human performance.

In addition, limited position-invariance of human vision is better replicated in the network by having the model neurons’ receptive fields increase as they are further from the center of the visual field.

It also has implications for AI, as the results provide new insights into what is a good architec…

2 weeks ago @ news.mit.edu
Brainstorming energy-saving hacks on Satori, MIT’s new supercomputer

Mohammad Haft-Javaherian planned to spend an hour at the Green AI Hackathon — just long enough to get acquainted with MIT’s new supercomputer, Satori.

With an architecture designed to minimize the transfer of data, among other energy-saving features, Satori recently earned fourth place on the Green500 list of supercomputers.

A postdoc at MIT and Harvard Medical School, Haft-Javaherian came to the hackathon to learn more about Satori.

One way to green AI, and tame the exponential growth in demand for training AI, is to build smaller models.

“In the end, we improved the machine, the documentation, and the tools around it.” Going forward, Satori will be joined in Holyoke by TX-Gaia, Lincoln Lab…

2 weeks ago @ news.mit.edu
Hey Alexa! Sorry I fooled you ...

One could imagine using TextFooler for many applications related to internet safety, such as email spam filtering, hate speech flagging, or “sensitive” political speech text detection — which are all based on text classification models.

The system first identifies the most important words that will influence the target model’s prediction, and then selects the synonyms that fit contextually.
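As a rough illustration of that two-step procedure (score each word's importance, then substitute synonyms), here is a toy, stdlib-only sketch against an invented keyword-based sentiment scorer — not TextFooler's actual implementation:

```python
def word_importance(words, score):
    """Importance of each word = drop in the model's score when that
    word is deleted from the input."""
    base = score(words)
    return [base - score(words[:i] + words[i + 1:]) for i in range(len(words))]

def attack(words, score, synonyms):
    """Greedy TextFooler-style attack: visit words in order of importance
    and swap in the synonym that lowers the model's score the most,
    stopping once the prediction flips (score drops below 0.5)."""
    importance = word_importance(words, score)
    order = sorted(range(len(words)), key=lambda i: importance[i], reverse=True)
    words = list(words)
    for i in order:
        candidates = synonyms.get(words[i], []) + [words[i]]
        words[i] = min(candidates, key=lambda w: score(words[:i] + [w] + words[i + 1:]))
        if score(words) < 0.5:
            break
    return words

# Toy "sentiment model": positive score driven by a few keywords.
POSITIVE = {"great": 0.4, "love": 0.3, "fine": 0.1}
def score(words):
    return min(1.0, 0.3 + sum(POSITIVE.get(w, 0.0) for w in words))

adv = attack(["i", "love", "this", "great", "movie"], score,
             synonyms={"great": ["good", "decent"], "love": ["like", "enjoy"]})
```

The real system also checks that substitutions preserve grammaticality and semantic similarity; this sketch only captures the importance-then-substitute loop.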

In total, TextFooler successfully attacked three target models, including “BERT,” the popular open-source NLP model.

It fooled the target models, driving their accuracy from over 90 percent to under 20 percent, by changing only 10 percent of the words in a given text.

“The system can be used or extended to attack…

2 weeks, 4 days ago @ news.mit.edu
Researchers develop a roadmap for growth of new solar cells

Materials called perovskites show strong potential for a new generation of solar cells, but they’ve had trouble gaining traction in a market dominated by silicon-based solar cells.

Solar cells based on perovskites — a broad category of compounds characterized by a certain arrangement of their molecular structure — could provide dramatic improvements in solar installations.

Describing the literature on perovskite-based solar cells being developed in various labs, he says, “They’re claiming very low costs.

“Probably the company that’s raised the most money is a company called Oxford PV, and they’re looking at tandem cells,” which incorporate both silicon and perovskite cells to improve overal…

2 weeks, 5 days ago @ news.mit.edu
A college for the computing age

Since starting his position in August 2019, Daniel Huttenlocher, the inaugural dean of the MIT Schwarzman College of Computing, has been working with many stakeholders in designing the initial organizational structure of the college.

“The MIT Schwarzman College of Computing is both bringing together existing MIT programs in computing and developing much-needed new cross-cutting educational and research programs,” says Huttenlocher.

Cross-cutting collaborations in computing: Building on the history of strong faculty participation in interdepartmental labs, centers, and initiatives, the MIT Schwarzman College of Computing provides several forms of membership in the college based on cross-cuttin…

3 weeks ago @ news.mit.edu
Demystifying artificial intelligence

Natalie Lao was set on becoming an electrical engineer, like her parents, until she stumbled on course 6.S192 (Making Mobile Apps), taught by Professor Hal Abelson.

App Inventor set Lao on her path to making it easy for anyone, from farmers to factory workers, to understand AI, and use it to improve their lives.

Then, in 2016, the surprise election of Donald Trump to U.S. president forced her to think more critically about technology.

“Fake news doesn’t have any impact in a vacuum — real people have to read it and share it,” says Lao.

As HINTS was getting off the ground, Lao co-founded a second startup, ML Tidbits, with EECS graduate student Harini Suresh.

3 weeks, 6 days ago @ news.mit.edu
Gift will allow MIT researchers to use artificial intelligence in a biomedical device

Researchers in the MIT Department of Civil and Environmental Engineering (CEE) have received a gift to advance their work on a device designed to position living cells for growing human organs using acoustic waves.

The Acoustofluidic Device Design with Deep Learning is being supported by Natick, Massachusetts-based MathWorks, a leading developer of mathematical computing software.

The pressure waves generated by acoustics in a fluid gently move and position the cells without damaging them.

The engineers developed a computer simulator to create a variety of device designs, which were then fed to an AI platform to understand the relationship between device design and cell positions.

Raymond’s…

3 weeks, 6 days ago @ news.mit.edu
Berkeley AI
last post 1 month, 1 week ago
Large Scale Training at BAIR with Ray Tune

In this blog post, we share our experiences in developing two critical software libraries that many BAIR researchers use to execute large-scale AI experiments: Ray Tune and the Ray Cluster Launcher, both of which now back many popular open-source AI libraries.

Ray Cluster Launcher: a utility for managing resource provisioning and cluster configurations across AWS, GCP, and Kubernetes.

Actor-based Training: Many techniques for hyperparameter optimization require a framework that monitors the metrics of all concurrent training jobs and controls the training execution.

The Ray Tune documentation page for distributed experiments shows you how you can do t…
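Ray Tune itself schedules trials as distributed actors; as a purely illustrative, stdlib-only sketch of the search pattern it automates, here is a hypothetical random-search tuner (the objective function and search space are invented for the example):

```python
import random

def tune(trainable, space, num_samples=8, seed=0):
    """A minimal random-search tuner in the spirit of tune.run():
    sample configurations from the search space, evaluate each trial,
    and return the best score and config. (Ray Tune additionally runs
    trials in parallel as actors and supports early-stopping schedulers.)"""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(num_samples):
        config = {k: rng.choice(v) for k, v in space.items()}
        s = trainable(config)
        if s > best_score:
            best_score, best_config = s, config
    return best_score, best_config

# Hypothetical objective standing in for a training run's final accuracy.
def trainable(config):
    return 1.0 - (config["lr"] - 0.1) ** 2 - 0.01 * (config["layers"] - 2) ** 2

best_score, best_config = tune(
    trainable, {"lr": [0.001, 0.01, 0.1, 1.0], "layers": [1, 2, 4]}
)
```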

1 month, 1 week ago @ bair.berkeley.edu
Emergent Behavior by Minimizing Chaos

All living organisms carve out environmental niches within which they can maintain relative predictability amidst the ever-increasing entropy around them (1), (2).

In simulated worlds, such as video games, novelty-seeking intrinsic motivation can lead to interesting and meaningful behavior.

In entropic and dynamic environments with undesirable forms of novelty, minimizing surprise (i.e., minimizing novelty) causes agents to naturally seek an equilibrium that can be stably maintained.
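The surprise-minimization idea can be sketched with a toy intrinsic reward: the log-probability of each state under the agent's running estimate of its own state distribution. This is a simplification of the SMiRL objective, with invented state names:

```python
import math

def smirl_reward(state, counts, total, n_states):
    """SMiRL-style intrinsic reward: log-probability of a state under
    the agent's running (Laplace-smoothed) estimate of its own state
    distribution, so familiar states score higher than novel ones."""
    p = (counts.get(state, 0) + 1) / (total + n_states)
    return math.log(p)

def observe(counts, state):
    """Update the running state distribution after each step."""
    counts[state] = counts.get(state, 0) + 1
    return counts

# An agent that has mostly stayed sheltered finds shelter less
# surprising (higher reward) than a storm state it rarely visits.
counts = {"sheltered": 8, "storm": 1}
r_shelter = smirl_reward("sheltered", counts, total=9, n_states=2)
r_storm = smirl_reward("storm", counts, total=9, n_states=2)
```

Maximizing this reward pushes the agent toward states it can keep predictable, which is the equilibrium-seeking behavior described above.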

Emergent behavior: The SMiRL agent demonstrates meaningful emergent behaviors in a number of different environments.

The agent also learns emergent game playing behavior in th…

2 months, 1 week ago @ bair.berkeley.edu
What is My Data Worth?

In the worst-case scenario, adversarial data sources may even degrade model performance via data poisoning attacks.

Hence, the data value should reflect the efficacy of data by assigning high values to data which can notably improve the model’s performance.

With the desiderata above, we now discuss a principled notion of data value and computationally efficient algorithms for data valuation.

Relating these game theoretic concepts to the problem of data valuation, one can think of the players as training data sources, and accordingly, the utility function $U(S)$ as a performance measure of the model trained on the subset $S$ of training data.
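Under that framing, a data source's Shapley value is its marginal contribution to $U$ averaged over all orderings of the sources. A toy exhaustive computation, with an invented utility function (real data-valuation methods approximate this, since exact Shapley computation is exponential in the number of sources):

```python
import itertools
import math

def shapley_values(players, utility):
    """Exact Shapley values: average each player's marginal contribution
    to the utility over all orderings (tractable only for tiny games)."""
    values = {p: 0.0 for p in players}
    for order in itertools.permutations(players):
        coalition = set()
        before = utility(coalition)
        for p in order:
            coalition = coalition | {p}
            after = utility(coalition)
            values[p] += after - before
            before = after
    n_fact = math.factorial(len(players))
    return {p: v / n_fact for p, v in values.items()}

# Invented utility: source "b" always adds 0.3 to model performance;
# "a" and "c" are redundant sources that together add only 0.5.
def utility(subset):
    redundant = 0.5 if ("a" in subset or "c" in subset) else 0.0
    return redundant + (0.3 if "b" in subset else 0.0)

vals = shapley_values(["a", "b", "c"], utility)
```

Note how the redundant sources split their joint contribution (0.25 each) while the independent source keeps its full 0.3 — the "high value to data which notably improves the model" property described above.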

Conclusion: We hope that our approaches for data va…

2 months, 1 week ago @ bair.berkeley.edu
Learning to Imitate Human Demonstrations via CycleGAN

This work presents AVID, a method that allows a robot to learn a task, such as making coffee, directly by watching a human perform the task.

Providing rewards via human videos handles the task definition; however, there is still human cost during the actual learning process.

Thus, we train a CycleGAN where the domains are human and robot images: for training data, we collect demonstrations from the human and random movements from both the human and robot.

Through this, we obtain a CycleGAN that is capable of generating fake robot demonstrations from human demonstrations, as depicted above.

We used a total of 30 human demonstrations for thi…

2 months, 2 weeks ago @ bair.berkeley.edu
Model-Based Reinforcement Learning: Theory and Practice

The natural question to ask after making this distinction is whether to use such a predictive model.

The latter half of this post is based on our recent paper on model-based policy optimization, for which code is available here.

Model-based techniques: Below, model-based algorithms are grouped into four categories to highlight the range of uses of predictive models.

Sampling-based planning: In the fully general case of nonlinear dynamics models, we lose guarantees of local optimality and must resort to sampling action sequences.

The original proposal of such a combination comes from the Dyna algorithm by Sutton, which alternates between model learning, data generation under a model, and policy …
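As a concrete sketch of that Dyna pattern — each real step both updates the value function and records the transition in a learned model, and extra "planning" updates replay model transitions — here is a toy tabular Dyna-Q on an invented chain environment (not the paper's algorithm):

```python
import random

def dyna_q(n_states=5, episodes=20, planning_steps=10,
           alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular Dyna-Q on a deterministic chain with reward 1 at the
    right end. Real steps do model learning; planning steps generate
    extra updates from the learned model."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    model = {}                                  # (s, a) -> (reward, next state)
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            model[(s, a)] = (r, s2)                      # model learning
            for _ in range(planning_steps):              # planning with the model
                (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
                q[ps][pa] += alpha * (pr + gamma * max(q[ps2]) - q[ps][pa])
            s = s2
    return q

q = dyna_q()
greedy = [max((0, 1), key=lambda a: q[s][a]) for s in range(4)]
```

The planning loop is what makes the method model-based: most value updates come from replayed model transitions rather than new environment interaction.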

2 months, 2 weeks ago @ bair.berkeley.edu
Data-Driven Deep Reinforcement Learning

Deep reinforcement learning methods, however, require active online data collection, where the model actively interacts with its environment.

A data-driven paradigm for reinforcement learning will enable us to pre-train and deploy agents capable of sample-efficient learning in the real-world.

We refer to this problem statement as fully off-policy RL, previously also called batch RL in the literature.

A class of deep RL algorithms, known as off-policy RL algorithms can, in principle, learn from previously collected data.

However, the Q-function estimator is only reliable on action-inputs from the behavior policy, which is the training distribution.

2 months, 3 weeks ago @ bair.berkeley.edu
RoboNet: A Dataset for Large-Scale Multi-Robot Learning

This post is cross-listed at the SAIL Blog and the CMU ML blog.

Motivated by the success of large-scale data-driven learning, we created RoboNet, an extensible and diverse dataset of robot interaction collected across four different research labs.

Our goal is to pre-train reinforcement learning models on a sufficiently diverse dataset and then transfer knowledge (either zero-shot or with fine-tuning) to a different test environment.

A sample of RoboNet along with data statistics is shown below: a sample of data from RoboNet alongside a summary of the current dataset.

We’ve also open sourced our code-base and the entire RoboNet dataset.

3 months ago @ bair.berkeley.edu
Prof. Anca Dragan Talks About Human-Robot Interaction for WIRED

Prof. Anca Dragan gave a talk as part of the WIRED25 summit, explaining some of the challenges robots face when interacting with people.

First, robots that share space with people, from autonomous cars to quadrotors to indoor mobile robots, need to anticipate what people plan on doing and make sure they can stay out of the way.

This is already hard, because robots are not mind readers, and yet they need access to a rough simulator of us, humans, that they can use to help them decide how to act.

And what if the person decides to accelerate instead?

Find out about the ways in which robots can negotiate these situations in the vide…

3 months ago @ bair.berkeley.edu
Can We Learn the Language of Proteins?


The incredible success of BERT in Natural Language Processing (NLP) showed that large models trained on unlabeled data are able to learn powerful representations of language.

Protein structure at varying scales: primary structure (sequence), secondary structure (local topology), and tertiary structure (global topology).

Multiple groups had this idea somewhat concurrently, resulting in a flurry of work applying language modeling pretraining to proteins.

Yes, pretraining helps: Our models learn a positional embedding for each amino acid in the protein sequence.

3 months, 3 weeks ago @ bair.berkeley.edu
Look then Listen: Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following
Look then Listen: Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following Look then Listen: Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following

We’ll start with the environment learning phase, where we will learn abstractions by observing an agent, such as a human, acting in the environment.

After environment learning pre-training, we are ready to move on to learning language.

We’ll compare our method to an end-to-end neural model, which has an identical neural architecture to our ultimate language learning model but without any environment learning pre-training of the decoder.

A baseline neural model gets an accuracy of 18% on the task, but with our environment learning pre-training, the model reaches 28%, an improvement of ten absolute percentage points.

As shown above, using our pre-training method leads to much more data-ef…

4 months ago @ bair.berkeley.edu
AWS Machine Learning
last post 6 days, 10 hours ago
Automating your Amazon Forecast workflow with Lambda, Step Functions, and CloudWatch Events rule

This post discusses the system architecture through which Amazon Redshift uses Forecast to manage hardware and help customers spin up Amazon Redshift clusters quickly.

To learn more about how to use Amazon Forecast, see Amazon Forecast – Now Generally Available and Amazon Forecast now supports the generation of forecasts at a quantile of your choice.

Whenever a customer requests a new cluster, Amazon Redshift grabs the required number of nodes from the cache pools.

This section lays out the steps to automate your forecast model training and forecast generation workflows using Forecast.

About the authors: Zhixing Ma is a senior software development engineer on the Amazon Redshift team where he led…

6 days, 10 hours ago @ aws.amazon.com
Winners of AWS Machine Learning Research Awards announced

The AWS Machine Learning Research Awards (MLRA) program provides unrestricted cash funds and AWS Promotional Credits to academics to advance the frontiers of machine learning (ML) and its applications.

About MLRA: MLRA aims to advance ML by funding innovative research and open-source projects, training students, and providing researchers with access to the latest technology.

Since 2017, MLRA has supported over 150 research projects from more than 60 schools and research institutes in over 10 countries, on topics such as ML algorithms, computer vision, natural language processing, medical research, neuroscience, social science, physics, and robotics.

An spent many years applying machine learning to bi…

1 week ago @ aws.amazon.com
175+ Customers Achieve Machine Learning Success with AWS’s Machine Learning Solutions Lab

AWS introduced the Machine Learning (ML) Solutions Lab a little over two years ago to connect our machine learning experts and data scientists with AWS customers.

Together with the ML Solutions Lab, Formosa Plastics created and deployed a model using Amazon SageMaker to automatically detect defects.

We started the ML Solutions Lab because we believe machine learning has the ability to transform every industry, process, and business, but the path to machine learning success is not always straightforward.

To learn more about the AWS Machine Learning Solutions Lab, contact your account manager or visit us at https://aws.amazon.com/ml-solutions-lab/.

About the Author: Michelle K. Lee, Vice Preside…

1 week ago @ aws.amazon.com
Simplify Machine Learning Inference on Kubernetes with Amazon SageMaker Operators

Amazon SageMaker Operators for Kubernetes allows you to augment your existing Kubernetes cluster with SageMaker hosted endpoints.

This post demonstrates how to set up Amazon SageMaker Operators for Kubernetes to create and update endpoints for a pre-trained XGBoost model completely from kubectl.

Creating an Amazon SageMaker execution role: Amazon SageMaker needs an IAM role that it can assume to serve your model.

See the following code:

$ kubectl delete -f hosting.yaml
hostingdeployment.sagemaker.aws.amazon.com "hosting-deployment" deleted

Conclusion: This post demonstrated how Amazon SageMaker Operators for Kubernetes supports real-time inference.

You can share how you’re using Amazon SageMaker…

1 week, 4 days ago @ aws.amazon.com
Automating model retraining and deployment using the AWS Step Functions Data Science SDK for Amazon SageMaker

In November of 2019, AWS released the AWS Step Functions Data Science SDK for Amazon SageMaker, an open-source SDK that allows developers to create Step Functions-based machine learning workflows in Python.

This post uses the following AWS services:AWS Step Functions allows you to coordinate several AWS services into a serverless workflow.

This post creates a step that branches based on the results of your Amazon SageMaker training step.

Summary: This post provided an overview of the AWS Step Functions Data Science SDK for Amazon SageMaker.

For additional technical documentation and example notebooks related to the SDK, please see the AWS Step Functions Data Science SDK for Amazon SageMaker a…

1 week, 6 days ago @ aws.amazon.com
Lowering total cost of ownership for machine learning and increasing productivity with Amazon SageMaker

The Total Cost of Ownership (TCO) is often the financial metric that you use to estimate and compare ML costs.

For the full TCO analysis, see The total cost of ownership of Amazon SageMaker.

The TCO for Amazon SageMaker continues to remain significantly lower over time because Amazon SageMaker optimizes infrastructure usage automatically and doesn’t require upkeep of security and compliance features.


Her goal is to make it easy for customers to build, train, and deploy machine learning models using Amazon SageMaker.

2 weeks ago @ aws.amazon.com
Flagging suspicious healthcare claims with Amazon SageMaker

This post demonstrates how to train an Amazon SageMaker model to flag anomalous post-payment Medicare inpatient claims and target them for further investigation on suspicion of fraud.

I use Amazon SageMaker PCA to reduce the number of variables and to make sure that the variables are independent of one another.

import io
import sagemaker.amazon.common as smac

matrx_train = X_stndrd_train.as_matrix().astype('float32')
buf_train = io.BytesIO()
smac.write_numpy_to_dense_tensor(buf_train, matrx_train)
buf_train.seek(0)

Calling an Amazon SageMaker fit function to start the training job: The next step is to call an Amazon SageMaker fit function to start the training job.

See the following code exam…

2 weeks, 1 day ago @ aws.amazon.com
Amazon Personalize can now use 10X more item attributes to improve relevance of recommendations

Amazon Personalize is a machine learning service that enables you to personalize your website, app, ads, emails, and more, using custom machine learning models that you can create in Amazon Personalize with no prior machine learning experience.

AWS is pleased to announce that Amazon Personalize now supports ten times more item attributes for modeling in Personalize.

Previously, you could use up to five item attributes while building an ML model in Amazon Personalize.

Additionally, to conform with the keywords in Amazon Personalize, this post renames customer_id to USER_ID, product_id to ITEM_ID, and review_date to TIMESTAMP.

Performing real-time recommendations with Amazon Personalize …

2 weeks, 4 days ago @ aws.amazon.com
Capturing and validating alphanumeric identifiers in Amazon Lex

As of this writing, you can use the AMAZON.AlphaNumeric slot type in Amazon Lex to capture such inputs in your bot.

You can extend this slot type by applying a validation check to create a custom slot type.

This post demonstrates how to use the AMAZON.AlphaNumeric slot type to capture alphanumeric information and restrict such input to a specific pattern.
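A minimal sketch of such a validation check, assuming a hypothetical code format of three uppercase letters followed by three digits (in practice this logic would run in the bot's Lambda validation code hook):

```python
import re

# Hypothetical pattern for this bot's confirmation codes, e.g. "ABC123".
CODE_PATTERN = re.compile(r"^[A-Z]{3}[0-9]{3}$")

def validate_confirmation_code(slot_value):
    """Return True only when the AMAZON.AlphaNumeric slot value
    matches the expected confirmation-code pattern."""
    return bool(CODE_PATTERN.match(slot_value or ""))
```

When validation fails, the code hook would re-elicit the slot rather than accept the value.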

To capture the confirmation code and enforce the necessary validation checks, complete the following steps: On the Amazon Lex console, choose FlightReservationBot.

To add the ConfirmationCode slot type to the ModifyReservation intent, complete the following steps: On the Amazon Lex console, choose FlightReservationBot.

2 weeks, 5 days ago @ aws.amazon.com
Registration for Amazon re:MARS is Now Open

Amazon re:MARS 2020 is June 16–19 in Las Vegas, Nevada.

Developer Day tracks include AWS Machine Learning, AWS RoboMaker, Alexa Skills, and Alexa for Device Makers.

AWS RoboMaker: Hear from AWS robotics leaders about AWS RoboMaker and how customers are using it to advance their robotics initiatives, and learn about the latest features.

Special guests: Attend Amazon re:MARS 2020 to hear from leaders at Amazon and other organizations on how they use AI to innovate every day.

Register for Amazon re:MARS 2020. All systems go?

2 weeks, 6 days ago @ aws.amazon.com
Build a unique Brand Voice with Amazon Polly

AWS is pleased to announce a new feature in Amazon Polly called Brand Voice, a capability in which you can work with the Amazon Polly team of AI research scientists and linguists to build an exclusive, high-quality, Neural Text-to-Speech (NTTS) voice that represents your brand’s persona.

Brand Voice allows you to differentiate your brand by incorporating a unique vocal identity into your products and services.

The Amazon Polly team also built a friendly Australian English NTTS voice for NAB, whose persona was carefully crafted to be unique and consistent with their brand persona.

This customer service-oriented Brand Voice was launched as part of a broader NAB contact center migration to Ama…

3 weeks ago @ aws.amazon.com
Identifying worker labeling efficiency using Amazon SageMaker Ground Truth

Setting up an Amazon SageMaker notebook instance: An Amazon SageMaker notebook instance is a fully managed Amazon EC2 compute instance that runs the Jupyter Notebook app.

Analyzing worker labeling efficiency: After the labeling job has completed, you can review the labeling results by using the MTurk workforce.

– This is the sum of all the objects the worker annotated (non-golden standard objects and golden standard objects).

Number of Golden Standard Objects Annotated – This is the number of golden standard objects the worker annotated.

Average Golden Standard Accuracy – When a worker annotates a golden standard object, the Python logic tracks the following: The number of golden standard objec…
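The metrics above can be recomputed in a few lines; this is an illustrative sketch with invented worker and label names, not the post's actual Python logic:

```python
def worker_efficiency(annotations, golden):
    """Per-worker labeling statistics of the kind described above.
    `annotations` maps worker -> {object_id: label};
    `golden` maps golden-standard object_id -> known correct label."""
    stats = {}
    for worker, labels in annotations.items():
        golden_done = {o: lab for o, lab in labels.items() if o in golden}
        correct = sum(1 for o, lab in golden_done.items() if lab == golden[o])
        stats[worker] = {
            "objects_annotated": len(labels),          # golden + non-golden
            "golden_annotated": len(golden_done),
            "golden_accuracy": correct / len(golden_done) if golden_done else None,
        }
    return stats

stats = worker_efficiency(
    {"worker-1": {"img1": "cat", "img2": "dog", "img3": "cat"}},
    {"img1": "cat", "img3": "dog"},  # img1 and img3 are golden standard
)
```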

3 weeks, 1 day ago @ aws.amazon.com
Millennium Management: Secure machine learning using Amazon SageMaker

The presentation walks through some of the controls AWS offers to provide secure Amazon SageMaker deployments.

Overview of solution: As a fully managed service, Amazon SageMaker deploys resources on your behalf in AWS-managed accounts.

This may impact installations you need to perform interactively when using the notebook, but if you know the required installations beforehand, you can mitigate this by using Amazon SageMaker notebook lifecycle configurations.

Enforcing compliance using SCPs: SCPs are a component of AWS Organizations; you can also use them to enforce compliance when using Amazon SageMaker.

Whichever method you choose, Amazon SageMaker makes it easy to prepare, train, and deploy ML mode…

3 weeks, 1 day ago @ aws.amazon.com
Amazon Comprehend now supports multi-label custom classification

Amazon Comprehend supports custom classification and enables you to build custom classifiers that are specific to your requirements, without the need for any ML expertise.

Starting January 6, custom classification also supports multi-label classification.

You can also create an endpoint with your custom multi-label classifier to enable real-time applications.

Amazon Comprehend multi-label classification is now available in all AWS regions where Amazon Comprehend is available.

To try the new feature, log in to the Amazon Comprehend console for a code-free experience, or download the AWS SDK.

3 weeks, 6 days ago @ aws.amazon.com
Building a business intelligence dashboard for your Amazon Lex bots

With conversation logs, all bot interactions can be stored in Amazon CloudWatch Logs log groups.

Solution architecture: In this business intelligence dashboard solution, you use Amazon Kinesis Data Firehose to continuously stream conversation log data from Amazon CloudWatch Logs to an Amazon S3 bucket.

This solution allows you to use your Amazon Lex conversation logs data to create live visualizations in Amazon QuickSight.

For CloudWatch Log Group for Lex Conversation Logs, enter the name of the CloudWatch Logs log group where your conversation logs are configured.

Amazon QuickSight in conjunction with Amazon Lex conversation logs makes it easy to create dashboards by streaming the co…

4 weeks ago @ aws.amazon.com
NVIDIA
last post 8 hours ago
New Breakthrough in Coronavirus Research Uses GPU-Accelerated Software to Support Vaccine Design

This is a 3D, atomic-scale map, or molecular structure, of the 2019-nCoV spike protein.

The application uses CUDA and CUDA libraries with NVIDIA V100 and T4 GPUs, both on-premises and at cloud service providers.

2019-nCoV S trimer viewed from the side, along the viral membrane.

The software can help academic researchers and pharmaceutical companies rapidly recover 3D protein structures, taking some of the guesswork out of the drug discovery process.

“The structure of 2019-nCoV S should enable rapid development and evaluation of medical countermeasures to address the ongoing public health crisis,” the researchers explained.

8 hours ago @ news.developer.nvidia.com
A New Frontier for PC Gaming: How GeForce NOW’s Game Library Continues to Evolve

Giving what we love about PC gaming to more people.

That’s the vision that inspired our journey to develop GeForce NOW into an open platform that welcomed gamers, developers and publishers.

And we have an additional 1,500 games in our onboarding queue, from publishers that share a vision of expanding PC gaming to more people.

The world’s largest gaming platform — PC gaming — continues to evolve.

We’re looking forward to giving this power to even more players, and ushering in a new generation of PC gamers.

5 days, 7 hours ago @ blogs.nvidia.com
Putting AI on Trials: Deep 6 Speeds Search for Clinical-Study Recruits

But when recruiting patients to test promising treatments in clinical trials, the faster the better.

“For that, they need the clinical trial process.” Over the past decade, the number of cancer clinical trials has grown 17 percent a year, on average.

“In the age of precision medicine, clinical trial criteria are getting more challenging,” Brusselaers said.

They’ve matched more than 100,000 patients to clinical trials so far.

“It’s just a long slog to find patients for clinical trials,” said Bill McKeon, CEO of Texas Medical Center.

5 days, 15 hours ago @ blogs.nvidia.com
Developer Blog: Learning to Rank with XGBoost and GPUs

Building a ranking model that can surface pertinent documents based on a user query from an indexed document-set is one of its core imperatives.

To accomplish this, documents are grouped on user query relevance, domains, subdomains, and so on, and ranking is performed within each group.

The initial ranking is based on the relevance judgement of an associated document based on a query.

XGBoost uses the LambdaMART ranking algorithm (for boosted trees), which uses the pairwise-ranking approach to minimize pairwise loss by sampling many pairs.
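The pairwise idea above can be shown in a few lines. This is an illustrative pairwise logistic loss over one query group, not XGBoost's actual LambdaMART implementation; the function name is hypothetical:

```python
import math

def pairwise_logistic_loss(scores, labels):
    """Pairwise ranking loss over one query group.

    For every pair (i, j) where document i is more relevant than j,
    penalize the model when score_i does not exceed score_j:
    loss = log(1 + exp(-(s_i - s_j))), averaged over all such pairs.
    """
    loss, pairs = 0.0, 0
    for si, yi in zip(scores, labels):
        for sj, yj in zip(scores, labels):
            if yi > yj:  # document with score si should rank above sj
                loss += math.log1p(math.exp(-(si - sj)))
                pairs += 1
    return loss / max(pairs, 1)

# A group of 3 documents for one query: higher label = more relevant.
good = pairwise_logistic_loss(scores=[2.0, 1.0, 0.0], labels=[2, 1, 0])
bad = pairwise_logistic_loss(scores=[0.0, 1.0, 2.0], labels=[2, 1, 0])
print(good < bad)  # → True: correctly ordered scores incur lower loss
```

Grouping matters because only within-group pairs are comparable; scores for documents answering different queries are never paired.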

Read the post, Learning to Rank with XGBoost and GPU, in its entirety on the NVIDIA Developer Blog.

6 days, 6 hours ago @ news.developer.nvidia.com
Developer Blog: Building a Real-time Redaction App Using NVIDIA DeepStream

In part 1, you train an accurate, deep learning model using a large public dataset and PyTorch.

We are redacting four copies of the video simultaneously on a Jetson AGX Xavier edge device.

Read the blog, Building a Real-time Redaction App Using NVIDIA DeepStream, Part 1: Training in its entirety here.

This model is deployed on an NVIDIA Jetson powered AGX Xavier edge device using DeepStream SDK to redact faces on multiple video streams in real time.

Read the blog, Building a Real-time Redaction App Using NVIDIA DeepStream, Part 2: Deployment in its entirety here.

6 days, 6 hours ago @ news.developer.nvidia.com
Isaac Sim 2020.1 Preview

Today we release a preview of our next-gen Isaac Sim based on NVIDIA Omniverse.

About a year ago NVIDIA released the first iteration of an application called Isaac Sim, a 3D simulation environment designed for use in robot development with the NVIDIA Isaac SDK.

The year 2019 was a busy one for the Isaac team, with multiple releases of both Isaac SDK and Isaac Sim.

Isaac Sim, now on Omniverse Kit: Isaac Sim version 2020.1 leverages the powerful Omniverse Kit to build the next generation of robotics simulation.

The Isaac Sim 2020.1 Preview provides examples from the Leonardo project to explore the major features of this next-generation simulation platform.

6 days, 7 hours ago @ news.developer.nvidia.com
NVIDIA at GDC 2020

Come join NVIDIA at GDC 2020 in San Francisco from Monday, March 16 to Friday, March 20 to see all the latest breakthroughs in gaming—and beyond.

See how NVIDIA RTX technology fuses AI and real-time ray tracing to deliver powerful performance and stunning visuals that redefine gaming.

Experience ray tracing firsthand with our latest interactive demos.

Get guidance from teams who have been leading the ray tracing revolution, including Machine Games, Microsoft, KoekeN Interactive, Infinity Ward, and Saber Interactive.

Back by popular demand after thousands watched the SIGGRAPH course, the improved “Introduction to DirectX Ray Tracing” tutorial includes recent experiences integrating DirectX i…

6 days, 7 hours ago @ news.developer.nvidia.com
Hail Yeah! How Robotaxis Will Change the Way We Move

With the advent of autonomous vehicles, ride-hailing promises to raise the bar on safety and efficiency.

Known as robotaxis, these shared vehicles are purpose-built for transporting groups of people along optimized routes, without a human driver at the wheel.

The potential for a shared autonomous mobility industry is enormous.

On top of the economic and efficiency benefits, autonomous vehicles are never distracted or drowsy.

Across the pond, autonomous vehicle solution provider AutoX and Swedish electric vehicle manufacturer NEVS are working to deploy robotaxis in Europe by the end of this year.

6 days, 7 hours ago @ blogs.nvidia.com
NVIDIA Healthcare to Host Clara Developer Day at GTC 2020

GTC offers developers a unique opportunity to learn more about NVIDIA’s Healthcare and Life Sciences solutions.

At GTC 2020 in Silicon Valley, NVIDIA’s team of healthcare software experts will walk you through tools for federated learning, medical imaging, genomics, and natural language processing (NLP).

New to GTC this year is the Clara Developer Day.

Held on Wednesday, March 26, this full-day workshop will take you through hands-on deep dives into the Clara Train and Clara Deploy SDKs for medical imaging.

Register now for NVIDIA’s Clara Developer Day. On Tuesday, March 25, the team is also hosting the Healthcare and Life Sciences Meetup.

6 days, 8 hours ago @ news.developer.nvidia.com
On-Demand Webinar: JetPack SDK: Overview, Installation and Key features

JetPack SDK is the most comprehensive solution for building Jetson-based AI applications for autonomous machines.

It includes the latest OS image, along with libraries and APIs, samples, developer tools, and documentation — all that is needed to accelerate your AI application development.

This webinar will provide you with a deep understanding of JetPack including live demonstration of key new features in JetPack 4.3 which is the latest production software release for all Jetson modules.

NVIDIA JetPack SDK is the most comprehensive solution for building AI applications.

Topics covered in the webinar: JetPack overview, including software stack and key components; JetPack installation using NVIDI…

6 days, 8 hours ago @ news.developer.nvidia.com
Using AI-Based Emulators to Speed Up Simulations by Billions of Times

To simulate how subatomic particles interact, or how haze affects climate, scientists from Stanford University and the University of Oxford developed a deep learning-based method that can speed up simulations by billions of times.

“This is a big deal,” Donald Lucas stated in a Science article about the work, From models of galaxies to atoms, simple AI shortcuts speed up simulations by billions of times.

Specifically, for a global aerosol-climate modeling simulation that used a general circulation model (GCM), the team first tested the simulation using CPU-only.

When using an NVIDIA TITAN X GPU, the simulation was accelerated by over two billion times.

Emulators speed up simulations, such as…

6 days, 8 hours ago @ news.developer.nvidia.com
Head Injury Study Validates NASA Safety Testing

Designing safety restraints for automobiles is a challenging endeavor, now imagine doing the same for a NASA spacecraft.

Injuries that are minor after a car crash could prevent an astronaut from exiting a capsule that has just landed on water.

Now a team of Wake Forest School of Medicine researchers has carried out a supercomputer simulation to help validate NASA’s crash testing practices.

Using the Pittsburgh Supercomputing Center Bridges Supercomputer, which comprises an NVIDIA DGX-2 system, NVIDIA V100 GPU nodes, and 48 previous-generation NVIDIA GPUs, the scientists carried out extensive simulations on crash-test dummies.

The scientists published their results in the January 2020 …

6 days, 11 hours ago @ news.developer.nvidia.com
An AI for Detail: Nanotronics Brings Deep Learning to Precision Manufacturing

Matthew Putman, this week’s guest on the AI Podcast, knows that the devil is in the details.

That’s why he’s the co-founder and CEO of Nanotronics, a Brooklyn-based company providing precision manufacturing enhanced by AI, automation and 3D imaging.

Key points from this episode: Nanotronics develops universal AI models that can be customized depending on individual customers’ processes and deployments.

When the new Nanotronics factory is finished (pictured, above), they’ll use their own deep learning models to ensure precision manufacturing as they construct their equipment.

UC Berkeley’s Pieter Abbeel on How Deep Learning Will Help Robots Learn: Pieter Abbeel, director of the Berkele…

6 days, 17 hours ago @ blogs.nvidia.com
Meet Six Smart Robots at GTC 2020

The ANYmal C from ANYbotics AG (pictured above), based in Zurich, is among the svelte navigators, detecting obstacles and finding its own shortest path forward thanks to its Jetson AGX Xavier GPU.

It runs computer vision algorithms on a Jetson TX2 GPU to identify and follow its owner’s legs on any hard surfaces.

Mobile Industrial Robots A/S, based in Odense, Denmark, will give a talk at GTC about how it’s adding AI with Jetson Xavier to its pallet-toting robots to expand their work repertoire.

Rounding out the cast, the Serve delivery robot from Postmates will make a return engagement at GTC.

To get in on the action, register here for GTC 2020.

1 week ago @ blogs.nvidia.com
NVIDIA Awards $50,000 Fellowships to Ph.D. Students for GPU Computing Research

Our NVIDIA Graduate Fellowship Program recently awarded up to $50,000 each to five Ph.D. students involved in GPU computing research.

Now in its 19th year, the fellowship program supports graduate students doing GPU-based work.

The fellows’ work puts them at the forefront of GPU computing, including projects in deep learning, graphics, high performance computing and autonomous machines.

“Our fellowship recipients are among the most talented graduate students in the world,” said NVIDIA Chief Scientist Bill Dally.

“They’re working on some of the most important problems in computer science, and we’re delighted to support their research.” The NVIDIA Graduate Fellowship Program is open to applica…

1 week, 1 day ago @ blogs.nvidia.com
Intel AI
latest post 1 day, 7 hours ago
Protecting Privacy in a Data-Driven World: Privacy-Preserving Machine Learning

Privacy for machine learning (ML) and other data-intensive applications is increasingly threatened by sophisticated methods of re-identifying anonymized data.

Privacy-Preserving Machine Learning (PPML), including rapid advances in cryptography, statistics, and other building block technologies, provides powerful new ways to maintain anonymity and safeguard privacy.

The session, Protecting Privacy in a Data-Driven World: Privacy-Preserving Machine Learning, takes place Feb. 25 at 1 pm in Moscone West.

Differential privacy adds mathematical noise to personal data, protecting individual privacy but enabling insights…
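The "mathematical noise" mentioned above is often drawn from a Laplace distribution. A minimal, illustrative sketch of the Laplace mechanism for a counting query follows (not Intel's implementation; all names are hypothetical):

```python
import random

def laplace_count(true_count, sensitivity=1.0, epsilon=1.0, rng=None):
    """Release a count with Laplace noise of scale sensitivity/epsilon.

    A Laplace(0, b) sample is the difference of two Exponential(1/b)
    samples. A counting query has sensitivity 1 (one person changes the
    count by at most 1); smaller epsilon means more noise and stronger
    privacy.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise

rng = random.Random(0)
# Average many noisy releases: the noise is zero-mean, so the average
# stays near the true count even though each single release is private.
releases = [laplace_count(100, epsilon=0.5, rng=rng) for _ in range(5000)]
print(round(sum(releases) / len(releases)))  # → 100
```

This is the trade-off the session describes: individual records are masked, while aggregate insights survive.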

1 day, 7 hours ago @ intel.ai
Streamline your Intel® Distribution of OpenVINO™ Toolkit development with Deep Learning Workbench

Back in 2018, Intel launched the Intel® Distribution of OpenVINO™ toolkit.

Let’s walk though some of the key tools and capabilities equipped in the Deep Learning Workbench.

The Deep Learning Workbench provides detailed timing per layer, information on fusions that happened, and a complete runtime graph.

For example, 2nd Gen Intel® Xeon® Scalable processors offer improved int8 performance via the Intel Deep Learning Boost feature.

With the latest release of the Intel Distribution of OpenVINO toolkit, the Post-training Optimization Tool is now available in the Deep Learning Workbench fo…

6 days, 13 hours ago @ intel.ai
Enhanced low-precision pipeline to accelerate inference with OpenVINO toolkit

Overview: neural network quantization and execution in low precision have been widely adopted as an optimization method that can achieve significant acceleration while maintaining accuracy.

Quantization-Aware Training (QAT) is a set of third-party components that can produce quantized models compatible with the Intel Distribution of OpenVINO Toolkit.

The Default Quantization pipeline is designed to do a fast, accurate 8-bit quantization of neural networks.

OpenVINO represents models quantized through frameworks and via post-training quantization with the FakeQuantize primitive.
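Conceptually, a FakeQuantize-style operation simulates low-precision rounding while staying in floating point. The sketch below is illustrative of the general 8-bit affine quantize/dequantize idea, not OpenVINO's actual primitive:

```python
def fake_quantize(x, lo, hi, levels=256):
    """Simulate 8-bit quantization in float: clamp to [lo, hi], snap to
    one of `levels` evenly spaced values, then map back to float.
    Training or calibrating against this op exposes the model to the
    rounding error it will see during real int8 execution.
    """
    step = (hi - lo) / (levels - 1)
    x = min(max(x, lo), hi)        # clamp to the quantization range
    q = round((x - lo) / step)     # integer code in [0, levels - 1]
    return lo + q * step           # dequantize back to float

vals = [-1.2, -0.4, 0.013, 0.75, 3.0]
print([round(fake_quantize(v, -1.0, 1.0), 4) for v in vals])
```

Values outside the range saturate to the endpoints; values inside land on the nearest of the 256 grid points.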

Accelerate deep …

2 weeks, 5 days ago @ intel.ai
Intel® Nervana™ NNP-I Shows Best-in-Class Throughput on BERT NLP Model

Natural Language Processing (NLP) will be a 43-billion-dollar business by 2025.

Since the Intel Nervana NNP-I has 12 ICE cores, six parallel, asynchronous batch-2 inferences can run on the machine simultaneously.

Figure 3: BERT performance on Intel Nervana NNP-I.

Intel Nervana NNP-I System Configuration:PCIe card measurements are based on projections of single chip pre-production NNP-I.

Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.

3 weeks ago @ intel.ai
2nd Generation Intel® Xeon® Scalable CPUs Outperform NVIDIA GPUs on NCF Deep Learning Inference

Recommendation systems are some of the most complex and prevalent commercial AI applications deployed by internet companies today.

Thanks to the Intel Deep Learning Boost (Intel DL Boost) feature found in 2nd generation Intel Xeon Scalable processors , we demonstrated leadership NCF model inference performance of 64 million sentences per second under 1.22 milliseconds (msec) on a dual socket Intel Xeon Platinum 9282 Processor-based system, outperforming the GPU performance on NCF published by NVIDIA on Jan 16th, 2020.

[1] Model | Platform | Performance | Precision | Dataset: NCF | Intel Xeon Platinum 9282 CPU…

3 weeks, 5 days ago @ intel.ai
Redefining the Customer Experience with AI

The National Retail Federation’s annual conference is where the industry comes together to see the future of retail.

To capture mindshare, wallet share and loyalty from Gen Z, retailers are focusing intently on the customer experience, and specifically the physical store.

It’s all in the service of creating the optimum customer experience at every encounter—the better to convert each encounter into a sale.

AI is central: success in retail depends more than ever on data and its value in enhancing and curating the customer experience.

Analyzing data collected from every touchpoint, AI enables exciting new ways to deliver a seamless and immersive custome…

4 weeks ago @ intel.ai
Intel AI Innovation on Display at AI Week in Israel

Last November, Intel teamed up with the University of Tel Aviv and the Israel Innovation Authority to host AI Week, where 3,000 technologists, researchers, and data scientists from 29 countries gathered to discuss AI innovation across a multitude of industries.

From Mobileye to IT to the AI Platforms Group, Intel experts delivered a diverse set of talks on AI research, algorithmic development, industries like healthcare and automotive, hardware, computer vision, and more.

Using AI platforms to accelerate enterprise AI deployment: what happens after an AI model is deployed?

Follow us on @IntelAIResearch for the latest in AI research from Inte…

1 month ago @ intel.ai
Addressing the Memory Bottleneck in AI Model Training

Healthcare workloads, particularly in medical imaging, may use more memory than other AI workloads because they often use higher-resolution 3D images.

In fact, many researchers are surprised to find how much memory these models use during both training and inference.

Training speed increased by 3.4x when comparing standard, unoptimized TensorFlow 1.11 to Intel-optimized TensorFlow 1.11 for the 3D U-Net model (Figure 2).

In fact, Intel demonstrated training a similar 3D U-Net model using a data-parallel method at the 2018 Supercomputing Conference (SC18).

Notices and Disclaimers:Software and workloads used in performance tests may have been …

1 month ago @ intel.ai
Exploring Deep Learning Face Recognition with Thermal Images

Interactions with computer systems frequently benefit from the ability to detect information about the user or use environment.

Earlier research has applied facial recognition methods known to work on visible light images to thermal images acquired via a variety of methods.

Findings: we found that a FaceNet model trained on visible-light image data generalizes well to thermal images, demonstrating the feasibility of applying such DNNs to thermal image recognition tasks.

Conclusions and next steps: our preliminary results show that DNNs for visible-light images may be applicable to other image types, such as thermal images.

Resolution degradati…

1 month, 2 weeks ago @ intel.ai
Shedding Light on Undermapped Areas With AI and the American Red Cross

In many areas of the world, like the United States and Europe, easy access to maps is a given.

The Missing Maps Project, founded in part by the American Red Cross, seeks to fill in the blank areas of under-mapped regions and provide better information to aid workers and local governments, which is particularly important in the aftermath of disasters.

Most of the current work done through the Missing Maps Project relies on volunteers manually updating OpenStreetMap.

And while we expand the use of our AI models, the Missing Maps Project could use your help to fill in the maps of the world.

Intel, th…

1 month, 2 weeks ago @ intel.ai
Apple Machine Learning Journal
latest post 2 months, 3 weeks ago
Apple at NeurIPS 2019

The conference, of which Apple is a Diamond Sponsor, will take place in Vancouver, Canada from December 8th to 14th.

If you’re interested in opportunities to make an impact on Apple products through machine learning research and development, check out our teams at Jobs at Apple.

We propose to evaluate both the generator and the discriminator by deriving corresponding Fisher Score and Fisher Information from the EBM.

In this work, we address this problem by introducing data parameters.

During training, at each iteration, as we update the model parameters, we also update the data parameters.

2 months, 3 weeks ago @ machinelearning.apple.com
Apple at Interspeech 2019

Apple is attending Interspeech 2019, the world’s largest conference on the science and technology of spoken language processing.

For Interspeech attendees, join the authors of our accepted papers at our booth to learn more about the great speech research happening at Apple.

If you’re interested in opportunities to make an impact on Apple products through machine learning research and development, check out our teams at Jobs at Apple.

The model can be used to check that text-to-speech (TTS) training speech follows the script and words are pronounced as expected.

Adding more annotated training data for any ML system typically improves accuracy, but only if it provides examples not alrea…

5 months, 2 weeks ago @ machinelearning.apple.com
Language Identification from Very Short Strings

For example, this capability is needed to load the right autocorrection lexicon and the right language model for predictive and multilingual typing.

Neural LID architecture: we model LID as a character-level sequence labeling problem.
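Apple's production system is a bi-LSTM sequence labeler, but the character-level framing can be illustrated with a much cruder model. This toy character-bigram scorer (all names hypothetical, purely illustrative) shows why characters carry enough signal for short strings:

```python
import math
from collections import Counter

def bigrams(text):
    return [text[i:i + 2] for i in range(len(text) - 1)]

def train_lid(samples):
    """Count character bigrams per language: a crude stand-in for the
    bi-LSTM character model described in the post."""
    models = {}
    for lang, text in samples:
        models.setdefault(lang, Counter()).update(bigrams(text.lower()))
    return models

def identify(models, text, alpha=1.0):
    """Pick the language whose bigram distribution best explains `text`;
    additive smoothing keeps unseen bigrams from zeroing a score."""
    best_lang, best_score = None, float("-inf")
    for lang, counts in models.items():
        total = sum(counts.values())
        vocab = len(counts) + 1
        score = sum(
            math.log((counts[bg] + alpha) / (total + alpha * vocab))
            for bg in bigrams(text.lower())
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang

models = train_lid([
    ("en", "the quick brown fox jumps over the lazy dog"),
    ("es", "el veloz zorro marrón salta sobre el perro perezoso"),
])
print(identify(models, "the dog"), identify(models, "el perro"))  # → en es
```

Even a few characters ("el", "rr" vs. "th", "og") discriminate strongly, which is why character-level models work on very short strings.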

LSTM model sizes, on the other hand, are simply a function of the network parameters.

At Apple, bi-LSTM LID is now used for most tasks that require language identification, such as text tagging and other public APIs that are part of the Natural Language framework.

Language Identification from Short Strings.

7 months, 1 week ago @ machinelearning.apple.com
Bridging the Domain Gap for Neural Models

This task is called the covariate shift problem, for the case where we have access to labeled data from one domain (source) and unlabeled data from another domain (target).

Unsupervised domain adaptation is an especially attractive alternative when the ground truth labels cannot be obtained easily for the task of interest.

An adversarial learning-based method for domain adaptation at pixel-level would try to translate/synthesize input images from one domain to the other, bringing the input distributions closer.

We can see clear improvements from models trained on source domain only to models trained with the proposed SWD method.

Conclusion: this method of unsupervised domain adaptation helps …

8 months, 2 weeks ago @ machinelearning.apple.com
Optimizing Siri on HomePod in Far‑Field Settings

Unlike Siri on iPhone, which operates close to the user’s mouth, Siri on HomePod must work well in a far-field setting.

Block diagram of the online multichannel signal processing chain on HomePod for Siri.

The RES is designed to suppress nonlinear components of the echo signal that aren’t being modeled by the linear MCEC.

The optimal integration of our speech processing technologies substantially improves the overall WERs across conditions.

A survey of convolutive blind source separation methods, Multichannel Speech Processing Handbook, 2007.

1 year, 2 months ago @ machinelearning.apple.com
Apple at NeurIPS 2018

This December we’ll be in Montreal, Canada, attending the 32nd Conference on Neural Information Processing Systems (NeurIPS).

We’ll have a booth staffed with Machine Learning experts from across Apple who would love to chat with you.

Please drop by if you’re attending the conference.

Apple is dedicated to advancing state-of-the-art machine learning technologies.

If you are interested in applying to specific machine learning positions, please explore opportunities at Machine Learning Jobs At Apple.

1 year, 2 months ago @ machinelearning.apple.com
Can Global Semantic Context Improve Neural Language Models?

In this article, we explore whether we can improve word predictions for the QuickType keyboard using global semantic context.

Can this global semantic context result in better language models?

All neural network solutions to date predict either a word in context or the local context itself, which doesn’t adequately reflect global semantic information.

Conclusion: we set out to assess the potential benefits of incorporating global semantic information into neural language models.

In summary, using bi-LSTM RNNs to train global semantic word embeddings can indeed lead to improved accuracy in neural language modeling.

1 year, 5 months ago @ machinelearning.apple.com
Finding Local Destinations with Siri’s Regionally Specific Language Models for Speech Recognition

The accuracy of automatic speech recognition (ASR) systems has improved phenomenally over recent years, due to the widespread adoption of deep learning techniques.

We decided to improve Siri’s ability to recognize names of local POIs by incorporating knowledge of the user’s location into our speech recognition system.

We’ve been able to significantly improve the accuracy of local POI recognition and understanding by incorporating users’ geolocation information into Siri’s ASR system.

Incremental Language Models for Speech Recognition Using Finite-state Transducers.

Convolutional Neural Networks for Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, …

1 year, 6 months ago @ machinelearning.apple.com
Personalized Hey Siri

When a user says, “Hey Siri, how is the weather today?” the phone wakes up upon hearing “Hey Siri” and processes the rest of the utterance as a Siri request.

The application of a speaker recognition system involves a two-step process: enrollment and recognition.
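The two-step flow can be sketched with toy vectors. This is an illustrative minimal model of enrollment (average a few utterance embeddings into a profile) and recognition (cosine-similarity threshold), not Apple's actual pipeline; all names and numbers are hypothetical:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def enroll(embeddings):
    """Enrollment: average the speaker embeddings of a few utterances
    into a single speaker profile."""
    n = len(embeddings)
    return [sum(e[i] for e in embeddings) / n for i in range(len(embeddings[0]))]

def accepts(profile, embedding, threshold=0.8):
    """Recognition: accept the utterance if its embedding is close
    enough to the enrolled profile."""
    return cosine(profile, embedding) >= threshold

# Toy 3-dim "speaker embeddings"; a real system derives these with a
# learned speaker transform from much higher-dimensional features.
profile = enroll([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.95, 0.05, 0.05]])
print(accepts(profile, [0.92, 0.08, 0.02]))  # → True  (same speaker)
print(accepts(profile, [0.0, 1.0, 0.3]))     # → False (different speaker)
```

The quality of the embedding space, i.e. the speaker transform discussed below, determines how well this simple comparison separates speakers.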

User enrollment: the main design discussion for personalized “Hey Siri” (PHS) revolves around two methods for user enrollment: explicit and implicit.

Improving the speaker transform: the speaker transform is the most important part of any speaker recognition system.

At its core, the purpose of the “Hey Siri” feature is to enable users to make Siri requests.

1 year, 10 months ago @ machinelearning.apple.com
Learning with Privacy at Scale

We develop a system architecture that enables learning at scale by leveraging local differential privacy, combined with existing privacy best practices.

In this article, we give an overview of a system architecture that combines differential privacy and privacy best practices to learn from a user population.

Differential privacy [2] provides a mathematically rigorous definition of privacy and is one of the strongest guarantees of privacy available.

In our system, we choose not to collect raw data on the server which is required for central differential privacy; hence, we adopt local differential privacy, which is a superior form of privacy [3].
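The defining property of local differential privacy is that noise is added on-device, before any value reaches the server. A classic local-DP primitive is randomized response; the sketch below is illustrative (hypothetical names), not Apple's actual mechanism:

```python
import random

def randomized_response(true_bit, p=0.75, rng=None):
    """Local DP for one bit: report the truth with probability p,
    otherwise report a fair coin flip. The server only ever sees the
    noisy report, never the raw value."""
    rng = rng or random.Random()
    if rng.random() < p:
        return true_bit
    return rng.random() < 0.5

def estimate_rate(reports, p=0.75):
    """Debias the aggregate: E[report] = p * rate + (1 - p) * 0.5,
    so invert that relation to recover the population rate."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p) * 0.5) / p

rng = random.Random(42)
true_bits = [i < 300 for i in range(1000)]  # true population rate: 0.30
reports = [randomized_response(b, rng=rng) for b in true_bits]
print(round(estimate_rate(reports), 2))     # close to 0.30
```

Any single report is deniable, yet the debiased average converges to the true population rate as the number of users grows.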

Conclusion: in this article, we have presented a…

2 years, 2 months ago @ machinelearning.apple.com
Uber Engineering
latest post 1 week, 5 days ago
Building a Backtesting Service to Measure Model Performance at Uber-scale

To better assess the performance of our models, we built a backtesting service for measuring forecast model error rates.

The backtesting service runs in a distributed system, allowing multiple models (>10), many backtesting windows (>20), and models for different cities (>200) to run simultaneously.

Backtesting at scale: our data science teams regularly create forecast models and statistics to better understand budget spending and project financial performance.

For the purposes of our backtesting service, we chose to leverage two primary backtesting data split mechanisms: backtesting with an expanding window and backtesting with a sliding window. Above, we showcase three windows for each metho…
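The two split mechanisms can be sketched as a small index-window generator (an illustrative helper, not Uber's service code):

```python
def backtest_windows(n_points, n_windows, initial_train, horizon, mode):
    """Generate (train_range, test_range) index pairs for backtesting.

    'expanding': each window trains on all data from the series start.
    'sliding':   each window trains on a fixed-size recent slice.
    Ranges are half-open (start, end) over a series of n_points steps.
    """
    windows = []
    for k in range(n_windows):
        train_end = initial_train + k * horizon
        if train_end + horizon > n_points:
            break  # not enough data left for a test window
        train_start = 0 if mode == "expanding" else train_end - initial_train
        windows.append(((train_start, train_end), (train_end, train_end + horizon)))
    return windows

print(backtest_windows(12, 3, initial_train=6, horizon=2, mode="expanding"))
# → [((0, 6), (6, 8)), ((0, 8), (8, 10)), ((0, 10), (10, 12))]
print(backtest_windows(12, 3, initial_train=6, horizon=2, mode="sliding"))
# → [((0, 6), (6, 8)), ((2, 8), (8, 10)), ((4, 10), (10, 12))]
```

The expanding window keeps the training start fixed at zero, so each successive model sees more history; the sliding window keeps the training size fixed, so each model sees only the most recent slice.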

1 week, 5 days ago @ eng.uber.com
Uber AI in 2019: Advancing Mobility with Artificial Intelligence

At the forefront of this effort is Uber AI, Uber’s center for advanced artificial intelligence research and platforms.

In this year alone, AI research at Uber has led to significant improvements in demand prediction and more seamless pick-up experiences.

Fostering AI collaboration through open source: in 2019, Uber AI was committed to sharing knowledge and best practices with the broader scientific community through open source projects.

Looking towards 2020: next year, Uber AI will continue to innovate, collaborate, and contribute to Uber’s platform services through the application of AI across our business.

For more on Uber AI, be sure to check out related articles on the Uber Engineering Blo…

2 months, 1 week ago @ eng.uber.com
Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data

We in Uber AI Labs investigated the intriguing question of whether we can create learning algorithms that automatically generate training data, learning environments, and curricula to help AI agents rapidly learn.

Increasingly, neural architecture search (NAS) algorithms are being deployed to automate the search for architectures, with great results.

New learners are able to learn on synthetic data faster than on real data (red line vs. blue line in Figure 1).

In our experiments, the estimates come either from training for 128 SGD steps on GTN-generated data or real data.

Then, for each method, the final best architecture according to the estimate is trained a long time on real data.

2 months, 1 week ago @ eng.uber.com
Controlling Text Generation with Plug and Play Language Models

This article discusses an alternative approach to controlled text generation, titled the Plug and Play Language Model (PPLM), introduced in a recent paper from Uber AI.

In many ways, language models are like wise but unguided wooly mammoths that lumber wherever they please.

As we will show below, attribute models with only a single layer containing 4,000 parameters perform well at recognizing attributes and guiding generation.

Thus, we use the unmodified language model to ensure the fluency of language is maintained at or near the level of the original language model (in this example, GPT-2-medium).

Multiple attribute models
We may combine multiple attribute models in controlled generation, …

2 months, 3 weeks ago @ eng.uber.com
Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations

To this end, we previously developed ML models to better understand queries and to perform multi-objective optimization in the Uber Eats search and recommender system, surfacing relevant food options.

Graph learning in a nutshell
To best understand how we made our Uber Eats recommendations more accurate, it helps to know the basics of how graph learning works.

For example, to represent an eater in our Uber Eats model we don’t only use order history to inform order suggestions, but also information about what food items are connected to past Uber Eats orders and insights about similar users.

For our Uber Eats use case, we opted for a graph neural network (GNN)-based approach to obtain an …
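One aggregation step of such a graph neural network can be sketched as follows. This is a GraphSAGE-style mean aggregator shown purely for illustration; the weight matrices, feature layout, and function names are assumptions, not the production model.

```python
import numpy as np

def aggregate(node, neighbors, embeddings, W_self, W_neigh):
    """Update a node's embedding from its own features and the mean
    of its neighbors' embeddings (one GNN message-passing step)."""
    h_self = embeddings[node]
    h_neigh = np.mean([embeddings[n] for n in neighbors], axis=0)
    h = W_self @ h_self + W_neigh @ h_neigh  # combine self and neighborhood
    return np.maximum(h, 0.0)                # ReLU nonlinearity
```

Stacking several such steps lets an eater's representation absorb information from past orders, the food items connected to them, and similar users, which matches the intuition described above.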

2 months, 3 weeks ago @ eng.uber.com
Uber Goes to NeurIPS 2019

This year, Uber is presenting 11 papers at the NeurIPS 2019 conference in Vancouver, Canada!

Scalable Global Optimization via Local Bayesian Optimization
David Eriksson (Uber AI) · Michael Pearce (Uber AI intern / Warwick University) · Jacob Gardner (Uber AI) · Ryan Turner (Uber AI) · Matthias Poloczek (Uber AI)
ArXiv
December 10 at 4:25 pm, West Ballroom C, NeurIPS Spotlight Talk
December 10 at 5:30 pm, East Exhibition Hall B&C, Poster #9
Bayesian optimization (BO) has recently emerged as a successful technique for the global optimization of black-box functions.

For additional information about our talks and posters, check out the Uber NeurIPS 2019 site.

Interested in the ML research that Uber …

2 months, 3 weeks ago @ eng.uber.com
Announcing the 2020 Uber AI Residency

On behalf of Uber, we invite you to join us on our journey as an Uber AI Resident.

Established in 2018, the Uber AI Residency is a 12-month training program for recent college and master’s graduates, professionals who are looking to reinforce their AI skills, and those with quantitative skills and interest in becoming an AI researcher at Uber.

This year’s AI residency program will focus on our self-driving cars project through Uber Advanced Technology Group (ATG).

Open source & publication opportunitiesAcross Uber, we are committed to an open and inclusive research mission that benefits the community at large through both Uber AI and Uber ATG Research.

Learn more about the Uber AI Residency…

3 months ago @ eng.uber.com
Get to Know Uber ATG at ICCV, CoRL, and IROS 2019

Uber ATG is committed to publishing research advancements with the goal of bringing self-driving cars to the world safely and scalably.

This year, Uber ATG has five publications accepted at ICCV, two publications accepted at CoRL, and two publications accepted at IROS.

In addition, Raquel Urtasun, Uber ATG Chief Scientist and Head of Uber ATG R&D, will be giving four talks at ICCV.

Please come visit us at ICCV (booth #D-7), IROS, and CoRL to learn more about our lab’s research, discuss the work with our researchers, and hear about career opportunities with Uber ATG.

Learn more about research opportunities with Uber ATG by visiting our careers page.

4 months ago @ eng.uber.com
Evolving Michelangelo Model Representation for Flexibility at Scale

To address these issues, we evolved Michelangelo’s use of Spark MLlib, particularly in the areas of model representation, persistence, and online serving.

Its end-to-end support for scheduled Spark-based data ingestion, model training, and evaluation, along with deployment for batch and online model serving, has gained wide acceptance across Uber.

More recently, Michelangelo has evolved to handle more use cases, including serving models trained outside of core Michelangelo.

Michelangelo had specific pipeline model definitions for each supported model type, with an in-house custom protobuf representation of trained models for serving.

It is important to note that Michelangelo online serving …

4 months, 1 week ago @ eng.uber.com
Searchable Ground Truth: Querying Uncommon Scenarios in Self-Driving Car Development

We use these traffic scenarios to develop machine learning models that help our self-driving cars safely react to common, and not so common, scenarios that come up in a given operational domain.

These specific scenarios can then be used to train our self-driving cars to safely navigate a traffic situation with bicyclists.

Modeled tables are crucial in making our data useful for training self-driving cars to operate safely.

The ability to query data that replicates traffic scenarios ranging from the everyday to the very rare will help prepare our self-driving cars for any situation.

There is no shortage of work to be done in making the future of self-driving cars a reality.

4 months, 3 weeks ago @ eng.uber.com
Science at Uber: Improving Transportation with Artificial Intelligence

In our Science at Uber video series, Uber employees talk about how we apply data science, artificial intelligence, machine learning, and other innovative technologies in our daily work.

Zoubin Ghahramani, Chief Scientist at Uber, spent many years in academia researching artificial intelligence.

Applied to the huge amount of data around transportation, artificial intelligence has the capability to make travel easier and more seamless.

At Uber, deep learning, an area of artificial intelligence research, finds use in multiple applications, including improving our understanding of cities and traffic, helping compute ETAs, and in developing self-driving cars.

Beyond deep learning, however, we al…

5 months, 1 week ago @ eng.uber.com
Three Approaches to Scaling Machine Learning with Uber Seattle Engineering

In an effort to constantly optimize our operations, serve our customers, and train our systems to perform better and better, we leverage machine learning (ML).

In addition, we make many of our ML tools open source, sharing them with the community to advance the state of the art.

In this spirit, members of our Seattle Engineering team shared their work at an April 2019 meetup on ML and AI at Uber.

Below, we highlight three different approaches Uber Seattle Engineering is currently working on to improve our ML ecosystem and that of the tech community at large.

Horovod: Distributed Deep Learning on Apache SparkDuring his talk, senior software engineer Travis Addair, from the ML Platform team, …

5 months, 2 weeks ago @ eng.uber.com
Science at Uber: Powering Machine Learning at Uber

In our Science at Uber video series, Uber employees talk about how we apply data science, artificial intelligence, machine learning, and other innovative technologies in our daily work.

Machine learning helps Uber make data-driven decisions that enable not only services such as ridesharing, but also financial planning and other core business needs.

Our machine learning platform, Michelangelo, lets teams across the company train, evaluate, and deploy models that help us forecast a wide range of business metrics.

The platform enables our teams to simply, flexibly, and intelligently prototype and productionize machine learning solutions at scale with tools such as Horovod, PyML, and Manifold.…

5 months, 2 weeks ago @ eng.uber.com
Introducing LCA: Loss Change Allocation for Neural Network Training

In our paper, LCA: Loss Change Allocation for Neural Network Training, to be presented at NeurIPS 2019, we propose a method called Loss Change Allocation (LCA) that provides a rich window into the neural network training process.

Our methodsOne way of revealing detailed insights into the neural network training process is to measure how much each trainable parameter of the neural network “learns” at any point in time.

Suppose we are training a network, and during a single training iteration the parameter vector moves from θ_t to θ_{t+1}.

We call this measure Loss Change Allocation (LCA): how much a parameter’s movement at an iteration caused the loss to go up or down.
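To first order, this per-parameter allocation of the loss change over one iteration can be written as follows (a sketch in standard notation; the paper computes a more accurate path integral of the gradient along the parameter trajectory, which this simplifies):

```latex
\Delta\mathcal{L}
= \mathcal{L}(\theta_{t+1}) - \mathcal{L}(\theta_t)
\approx \sum_i
\underbrace{\left.\frac{\partial \mathcal{L}}{\partial \theta_i}\right|_{\theta_t}
\left(\theta_{t+1,i} - \theta_{t,i}\right)}_{\mathrm{LCA}_i}
```

Each term LCA_i is negative when parameter i's movement helped (reduced the loss) and positive when it hurt, and the terms sum to the total loss change.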

If we track validation LCA as wel…

5 months, 2 weeks ago @ eng.uber.com
Advancing AI: A Conversation with Jeff Clune, Senior Research Manager at Uber

The past few months have been a whirlwind for Jeff Clune, Senior Research Manager at Uber and a founding member of Uber AI Labs.

Now, he brings this same spirit of curiosity and passion for discovery to his artificial intelligence research at Uber.

Artificial intelligence research thus sheds light on both of the twin questions that I have been on a quest to answer my whole life.

Uber acquired Geometric Intelligence, an AI startup I was working at, to create Uber AI Labs.

In what ways is Uber AI working on projects that are directly applicable to Uber?

6 months, 1 week ago @ eng.uber.com
✈️ Telegram
DL in NLP
latest post 14 hours ago
Since there is again no time for a more detailed write-up, here is a selection of papers/posts/news that caught my attention over the last couple of weeks

Blogs/news:

1. - neural networks can speed up physics computations

1. - on compressing neural networks

Papers:

1. Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks (, ) - a study of the inductive biases of different networks with respect to syntax and hierarchy

1. Neural Machine Translation with Joint Representation (, ) - a new clever architecture, an alternative to the Transformer (+1 BLEU on NIST12)

14 hours ago @ t.me
New Ruder Newsletter

newsletter.ruder.io/issues/accelerating-science-memorizing-vs-learning-to-look-things-up-schmidhuber-s-2010s-greek-bert-arc-illustrated-reformer-annotated-gpt-2-olmpics-223195

16 hours ago @ t.me
If you are a student interested in contributing to open source and would like to earn a bit of money doing it ($4K / summer), apply to Google Summer of Code. There are many interesting projects, including TensorFlow.

But start now, because among other things you need to put together a project proposal and agree on it with potential mentors. summerofcode.withgoogle.com

1 day, 10 hours ago @ t.me
The Annotated GPT-2

amaarora.github.io/2020/02/18/annotatedGPT2.html I'm not sure what GPT-2 has to do with it, but the post gives a decent and fairly detailed walkthrough of the Transformer, with code. And the more Transformer explainers there are, the better.

5 days, 14 hours ago @ t.me
Multi-Sample Dropout for Accelerated Training and Better Generalization

Inoue [IBM Research]

arxiv.org/abs/1905.09788

Main idea: instead of one dropout mask, use several.

1. Can be easily implemented
1. Significantly accelerates training by reducing the number of iterations

1. Does not significantly increase computation cost per iteration

1. Lower error rates and losses for both the training set and validation set
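The idea can be sketched in a few lines of NumPy: apply several independent dropout masks to the same features, run the shared classifier head on each, and average the resulting losses. The toy linear head and all names here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def multi_sample_dropout_loss(features, labels, W, num_masks=4, p=0.5, rng=None):
    """Average softmax cross-entropy over several independent dropout masks."""
    if rng is None:
        rng = np.random.default_rng(0)
    losses = []
    for _ in range(num_masks):
        mask = rng.random(features.shape) >= p   # fresh, independent mask
        dropped = features * mask / (1.0 - p)    # inverted dropout scaling
        logits = dropped @ W                     # shared classifier head
        # numerically stable softmax cross-entropy
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        losses.append(-log_probs[np.arange(len(labels)), labels].mean())
    return float(np.mean(losses))                # average loss over masks
```

Since only the final dropout and head are replicated, the extra masks add little compute per iteration while providing a lower-variance loss estimate, which is where the claimed speedup comes from.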

6 days, 5 hours ago @ t.me
From English To Foreign Languages: Transferring Pre-trained Language Models

Tran [Amazon Alexa AI]

arxiv.org/abs/2002.07306 When you see a paper with a single author, it's either complete garbage or something genuinely interesting. In the case of this paper:

"With a single GPU, our approach can obtain a foreign BERT-base model within a day and a foreign BERT-large within two days."


Main idea:

1. Initialize the embeddings of the new language (L2) using the embeddings of the old language (L1). Each L2 embedding is a weighted sum of some L1 embeddings. The weights are found either via word translation probabilities (see statistical MT) or via unsupervised embedding alignment (see Artetxe …
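This initialization step boils down to a single matrix product. A toy illustration, assuming the weights have already been obtained from translation probabilities or unsupervised alignment (the tiny vocabularies and names are made up):

```python
import numpy as np

def init_l2_embeddings(l1_embeddings, weights):
    """Initialize L2 embeddings as weighted sums of L1 embeddings.

    l1_embeddings: (V1, d) matrix of source-language embeddings.
    weights: (V2, V1) row-stochastic matrix mapping each L2 word
             to a probability distribution over L1 words.
    """
    return weights @ l1_embeddings  # each L2 row is a convex mix of L1 rows
```

With this warm start, only the new embeddings (and later the full model) need fine-tuning, which is what makes the one-to-two-day transfer on a single GPU plausible.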

6 days, 13 hours ago @ t.me
https://twitter.com/tscholak/status/1178648609417580544

6 days, 21 hours ago @ t.me
https://github.com/hasktorch/hasktorch

6 days, 21 hours ago @ t.me
A new competition has arrived!

This year, for the first time, there will be a shared task on automatically extracting mentions of drug side effects from Russian-language tweets. The SMM4H workshop will be held jointly with COLING 2020 in Barcelona on September 13. Details: healthlanguageprocessing.org/smm4h-sharedtask-2020

6 days, 21 hours ago @ t.me
pandas 1.0 released

Personally, I'm interested not so much in the new features as in the fact itself. How many years has pandas been the de facto standard? twitter.com/pandas_dev/status/1222856129774018560

6 days, 21 hours ago @ t.me
The RuREBus competition officially kicked off today at the "Dialogue 2020" conference.

The contest includes three tasks:

1. NER

2. Relation extraction with pre-annotated entities

3. End-to-end relation extraction on plain text.

[Repository](https://github.com/dialogue-evaluation/RuREBus) [Official page](http://www.dialog-21.ru/evaluation/)

6 days, 21 hours ago @ t.me
Another issue of the NLP Newsletter

https://twitter.com/omarsar0/status/1223945187388424192?s=19

6 days, 21 hours ago @ t.me
The other day another competition from the "Dialogue 2020" conference kicked off: GramEval2020.

This is a competition on full grammatical annotation of Russian, namely: morphology, syntax, and lemmatization.

[Competition page](https://competitions.codalab.org/competitions/22902) [Github](https://github.com/dialogue-evaluation/GramEval2020