
Deeper Connections between Neural Networks and Gaussian Processes
Speed-up Active Learning

Evgenii Tsymbalov, Sergei Makarychev, Alexander Shapeev, Maxim Panov

Skoltech, Russia

 

skoltech_logo.png

Online poster

ICML UDL Workshop
June 14, 2019

 


Active learning

  • In many applications labeled (annotated) data is very limited.

  • Unlabeled data is usually widely available.

  • Labeling (annotation) is often expensive.

  • Thus, a clever choice of points to annotate is needed.

  • Active learning: use a machine learning model to select the points to annotate.

al_cycle.PNG
  • Applications: industrial design, chemoinformatics, material design, human annotation (NLP, images), . . .

  • We focus on high-dimensional regression problems (a minimal sketch of the loop is below).
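
A minimal sketch of the pool-based loop (the cycle in the figure above); here train, uncertainty and annotate are hypothetical stand-ins for model fitting, the acquisition score and the costly labeling step:

    import numpy as np

    def active_learning_loop(train, uncertainty, annotate,
                             X_labeled, y_labeled, X_pool,
                             n_rounds=10, batch_size=20):
        """Pool-based active learning: retrain, score, annotate, repeat."""
        for _ in range(n_rounds):
            model = train(X_labeled, y_labeled)
            scores = uncertainty(model, X_pool)      # per-point uncertainty
            idx = np.argsort(scores)[-batch_size:]   # most uncertain batch
            y_new = annotate(X_pool[idx])            # the expensive step
            X_labeled = np.concatenate([X_labeled, X_pool[idx]])
            y_labeled = np.concatenate([y_labeled, y_new])
            X_pool = np.delete(X_pool, idx, axis=0)  # shrink the pool
        return X_labeled, y_labeled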

Problem statement

eq1.PNG
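
The exact statement is the equation above; as a hedged paraphrase (our assumption of the standard pool-based setting, not necessarily the poster's exact formulation):

    % assumption: generic pool-based active learning for regression
    X^* = \arg\max_{X \subset \mathcal{U},\; |X| = k} \sum_{x \in X} \hat{\sigma}^2(x)

where \mathcal{U} is the unlabeled pool and \hat{\sigma}^2(x) is the model's uncertainty estimate at x; the chosen batch X^* is sent for annotation.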

UE for NNs

Types of uncertainty estimates for NNs:

  • ensembling (accurate but costly);

  • Bayesian NNs (natural for UE, but it might be complicated to achieve state-of-the-art results);

  • dropout-based (stochastic output via MC-Dropout; a minimal sketch is below).

mcdue_pic.png
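
A minimal MC-Dropout sketch in PyTorch (layer sizes, dropout rate and T are illustrative assumptions, not the poster's settings): keep dropout active at prediction time and use the spread of T stochastic passes as the uncertainty.

    import torch
    import torch.nn as nn

    net = nn.Sequential(                      # toy FC net with dropout
        nn.Linear(8, 50), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(50, 50), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(50, 1),
    )

    def mc_dropout_std(net, x, T=100):
        """Std over T stochastic forward passes = MC-Dropout uncertainty."""
        net.train()                           # keep dropout ON at inference
        with torch.no_grad():
            preds = torch.stack([net(x) for _ in range(T)])  # (T, n, 1)
        return preds.std(dim=0).squeeze(-1)   # per-point score

    x_pool = torch.randn(1000, 8)
    query = mc_dropout_std(net, x_pool).topk(10).indices  # points to label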

Yet MC-Dropout has some problems:

  • it is hard to sample more than one point efficiently;

multiple_points_selected.png
  • overconfident predictions for out-of-sample points.

dropout_overconfident.PNG

From NN to GP

  • Connections between neural networks and Gaussian processes have recently gained significant attention [Matthews et al., 2018], [Lee et al., 2017] (both ICLR'18).

  • These works show that NNs with purely random weights can be approximated by a GP in the infinite network width limit.

Here we consider a simple fully-connected NN with dropout between the hidden layers. We focus on the output values at different input points under different realizations of the dropout mask.

  • When two points x1 and x2 are close to each other in the feature space, the MC-Dropout realizations of the trained NN are correlated. The distributions are also Gaussian-like.

hist1.png
  • When two points x1 and x3 are far from each other in the feature space, the correlation is lost, yet the distributions are still close to Gaussian (a small check is sketched below).

hist2.png
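
A small check of this behaviour (a sketch only: random untrained net and random points, whereas the poster's x1, x2, x3 come from real data and a trained model):

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(8, 50), nn.ReLU(), nn.Dropout(0.5),
                        nn.Linear(50, 1))
    net.train()                                    # dropout stays stochastic

    x1 = torch.randn(8)
    x2 = x1 + 0.01 * torch.randn(8)                # a nearby point
    xs = torch.stack([x1, x2])                     # (2, 8)
    with torch.no_grad():
        z = torch.stack([net(xs) for _ in range(1000)]).squeeze(-1)  # (1000, 2)
    corr = torch.corrcoef(z.T)[0, 1]   # high for nearby points, ~0 for far ones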

Algorithm


Based on the observed near-Gaussian behaviour, we propose the following algorithm:

algo.png
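
The exact procedure is in the figure above; below is a hedged NumPy sketch of the idea as stated: treat the MC-Dropout realizations as samples of a GP, estimate their covariance empirically, and score pool points by the GP posterior variance given the training points (function names and the jitter value are ours):

    import numpy as np

    def nngp_scores(Z_train, Z_pool, jitter=1e-6):
        """Z_*: (T, n_*) arrays of stochastic NN outputs under T dropout masks."""
        Z = np.concatenate([Z_train, Z_pool], axis=1)
        K = np.cov(Z, rowvar=False)              # empirical joint covariance
        n = Z_train.shape[1]
        K_tt = K[:n, :n] + jitter * np.eye(n)    # train-train block
        K_pt = K[n:, :n]                         # pool-train block
        k_pp = np.diag(K)[n:]                    # prior pool variances
        # GP posterior variance: k_pp - diag(K_pt K_tt^{-1} K_pt^T)
        sol = np.linalg.solve(K_tt, K_pt.T)      # (n, n_pool)
        return k_pp - np.einsum('ij,ji->i', K_pt, sol)

    # pick the k pool points with the largest posterior variance:
    # idx = np.argsort(nngp_scores(Z_train, Z_pool))[-k:]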

Schematic representation

Scheme_v6.png

Benefits of the proposed approach:

  • GPs allow sampling points sequentially by recomputing uncertainty estimates (see the sketch after this list).

  • GP uncertainty estimates are high for out-of-sample points.
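
A sketch of the first benefit: the GP posterior variance does not depend on the labels, so after picking a point we can condition on it immediately and re-score the pool, with no retraining and no annotation (a naive recompute-from-scratch version for clarity; names are ours):

    import numpy as np

    def greedy_gp_batch(K, train_idx, pool_idx, k, jitter=1e-6):
        """K: joint covariance over all points; returns k pool indices."""
        chosen, cond, pool = [], list(train_idx), list(pool_idx)
        for _ in range(k):
            K_cc = K[np.ix_(cond, cond)] + jitter * np.eye(len(cond))
            K_pc = K[np.ix_(pool, cond)]
            sol = np.linalg.solve(K_cc, K_pc.T)
            var = K[pool, pool] - np.einsum('ij,ji->i', K_pc, sol)
            best = pool[int(np.argmax(var))]
            chosen.append(best)
            cond.append(best)                 # condition on the new point
            pool.remove(best)                 # no label needed for this
        return chosen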

Experiments


Airline delays

  • Comparison on an airline delays dataset [Hensman et al., 2017] with Bayes-by-Backprop and the Noise Contrastive Prior (NCP) from [Hafner et al., 2018].

  • Small 50 × 2 NN with an NCP-based loss function.

  • 50K train set, 100K test set (shifted in time).

  • Here and below, MCDUE refers to the Monte-Carlo Dropout Uncertainty Estimate, NNGP to the proposed approach, and the NCP suffix marks the NCP-based loss function. Other labels follow [Hafner et al., 2018].

flights_test_upd.png

UCI datasets

We compare the proposed approach with MCDUE and random sampling on a variety of UCI regression datasets:

uci_datasets.png

The comparison is made by means of Dolan-More curves: for each method, the curve shows the fraction of experiments in which its error is within a factor τ of the best error achieved by any method (a minimal sketch is below).

dolan.png
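
A minimal performance-profile sketch (assumed input: errors[m, p] = final error of method m in experiment p, lower is better):

    import numpy as np

    def dolan_more(errors, taus):
        """rho[m, t]: fraction of experiments where method m is within
        a factor taus[t] of the best method's error."""
        ratios = errors / errors.min(axis=0, keepdims=True)
        return np.stack([(ratios <= t).mean(axis=1) for t in taus], axis=1)

    taus = np.linspace(1.0, 3.0, 50)
    # curves = dolan_more(errors, taus)   # one curve per method vs. tau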

These 6 datasets × 25 runs result in the following curve:

dolan_uci.png

SchNet

  • SchNet is a state-of-the-art deep learning architecture for molecules and materials.

  • For UE, we used dropout placed between the fully-connected layers.

schnet_exper2_corrected_2.png
  • For energy prediction on the QM9 dataset, the experiment showed a 25% decrease in RMSE.

  • Another view: to reach an error of 2 kcal/mol, we need half as much additional data, which is very welcome given the cost of quantum-mechanical calculations!

Summary

  • A novel method for UE in deep neural networks;

  • State-of-the-art results in active learning for different problems and architectures;

  • To be presented at IJCAI 2019.

    Growth areas:

  • applying modern GP speed-up methods to improve scalability and accuracy;

  • CNN applications for images, RNN applications for text;

  • Calibrated UE;

  • Cost-sensitive AL.

Acknowledgements

E.T. and A.S. were supported by the Skoltech NGP Program No. 2016-7/NGP (a Skoltech-MIT joint project).
