
Deeper Connections between Neural Networks and Gaussian Processes
Speed-up Active Learning

Evgenii Tsymbalov, Sergei Makarychev, Alexander Shapeev, Maxim Panov

Skoltech, Russia

 

skoltech_logo.png

Online poster

ICML UDL Workshop
June 14, 2019

 


Active learning

  • In many applications labeled (annotated) data is very limited.

  • Unlabeled data is usually widely available.

  • Labeling (annotation) is often expensive.

  • Thus, a clever choice of points to annotate is needed.

  • Active learning: use a machine learning model to select the points to annotate.

al_cycle.PNG
  • Applications: industrial design, chemoinformatics, material design, human annotation (NLP, images), . . .

  • We focus on high-dimensional regression problems (a minimal sketch of the loop is below).
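
A minimal sketch of the pool-based loop (the cycle in the figure above); here train, uncertainty and annotate are hypothetical stand-ins for model fitting, the acquisition score and the costly labeling step:

    import numpy as np

    def active_learning_loop(train, uncertainty, annotate,
                             X_labeled, y_labeled, X_pool,
                             n_rounds=10, batch_size=20):
        """Pool-based active learning: retrain, score, annotate, repeat."""
        for _ in range(n_rounds):
            model = train(X_labeled, y_labeled)
            scores = uncertainty(model, X_pool)      # per-point uncertainty
            idx = np.argsort(scores)[-batch_size:]   # most uncertain batch
            y_new = annotate(X_pool[idx])            # the expensive step
            X_labeled = np.concatenate([X_labeled, X_pool[idx]])
            y_labeled = np.concatenate([y_labeled, y_new])
            X_pool = np.delete(X_pool, idx, axis=0)  # shrink the pool
        return X_labeled, y_labeled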

Problem statement

eq1.PNG
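
The exact statement is the equation above; as a hedged paraphrase (our assumption of the standard pool-based setting, not necessarily the poster's exact formulation):

    % assumption: generic pool-based active learning for regression
    X^* = \arg\max_{X \subset \mathcal{U},\; |X| = k} \sum_{x \in X} \hat{\sigma}^2(x)

where \mathcal{U} is the unlabeled pool and \hat{\sigma}^2(x) is the model's uncertainty estimate at x; the chosen batch X^* is sent for annotation.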

UE for NNs

Types of uncertainty estimates for NNs:

  • ensembling (accurate but costly);

  • Bayesian NNs (natural for UE, but it might be complicated to achieve state-of-the-art results);

  • dropout-based (stochastic output via MC-Dropout; a minimal sketch is below).

mcdue_pic.png
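
A minimal MC-Dropout sketch in PyTorch (layer sizes, dropout rate and T are illustrative assumptions, not the poster's settings): keep dropout active at prediction time and use the spread of T stochastic passes as the uncertainty.

    import torch
    import torch.nn as nn

    net = nn.Sequential(                      # toy FC net with dropout
        nn.Linear(8, 50), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(50, 50), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(50, 1),
    )

    def mc_dropout_std(net, x, T=100):
        """Std over T stochastic forward passes = MC-Dropout uncertainty."""
        net.train()                           # keep dropout ON at inference
        with torch.no_grad():
            preds = torch.stack([net(x) for _ in range(T)])  # (T, n, 1)
        return preds.std(dim=0).squeeze(-1)   # per-point score

    x_pool = torch.randn(1000, 8)
    query = mc_dropout_std(net, x_pool).topk(10).indices  # points to label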

Yet MC-Dropout has some problems:

  • it is hard to sample more than one point efficiently;

multiple_points_selected.png
  • overconfident predictions for out-of-sample points.

dropout_overconfident.PNG

From NN to GP

  • Connections between neural networks and Gaussian processes have recently gained significant attention [Matthews et al., 2018], [Lee et al., 2017] (both ICLR'18).

  • These works show that NNs with purely random weights can be approximated by a GP in the infinite network width limit.

Here we consider a simple fully-connected NN with dropout between the hidden layers. We focus on the output values at different input points under different realizations of the dropout mask.

  • When two points x1 and x2 are close to each other in the feature space, the MC-Dropout realizations of the trained NN are correlated. The distributions are also Gaussian-like.

hist1.png
  • When two points x1 and x3 are far from each other in the feature space, the correlation is lost, yet the distributions are still close to Gaussian (a small check is sketched below).

hist2.png
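
A small check of this behaviour (a sketch only: random untrained net and random points, whereas the poster's x1, x2, x3 come from real data and a trained model):

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(8, 50), nn.ReLU(), nn.Dropout(0.5),
                        nn.Linear(50, 1))
    net.train()                                    # dropout stays stochastic

    x1 = torch.randn(8)
    x2 = x1 + 0.01 * torch.randn(8)                # a nearby point
    xs = torch.stack([x1, x2])                     # (2, 8)
    with torch.no_grad():
        z = torch.stack([net(xs) for _ in range(1000)]).squeeze(-1)  # (1000, 2)
    corr = torch.corrcoef(z.T)[0, 1]   # high for nearby points, ~0 for far ones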

Algorithm


Based on the observed near-Gaussian behaviour, we propose the following algorithm:

algo.png
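
The exact procedure is in the figure above; below is a hedged NumPy sketch of the idea as stated: treat the MC-Dropout realizations as samples of a GP, estimate their covariance empirically, and score pool points by the GP posterior variance given the training points (function names and the jitter value are ours):

    import numpy as np

    def nngp_scores(Z_train, Z_pool, jitter=1e-6):
        """Z_*: (T, n_*) arrays of stochastic NN outputs under T dropout masks."""
        Z = np.concatenate([Z_train, Z_pool], axis=1)
        K = np.cov(Z, rowvar=False)              # empirical joint covariance
        n = Z_train.shape[1]
        K_tt = K[:n, :n] + jitter * np.eye(n)    # train-train block
        K_pt = K[n:, :n]                         # pool-train block
        k_pp = np.diag(K)[n:]                    # prior pool variances
        # GP posterior variance: k_pp - diag(K_pt K_tt^{-1} K_pt^T)
        sol = np.linalg.solve(K_tt, K_pt.T)      # (n, n_pool)
        return k_pp - np.einsum('ij,ji->i', K_pt, sol)

    # pick the k pool points with the largest posterior variance:
    # idx = np.argsort(nngp_scores(Z_train, Z_pool))[-k:]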

Schematic representation

Scheme_v6.png

Benefits of the proposed approach:

  • GPs allow sampling points sequentially by recomputing uncertainty estimates (see the sketch after this list).

  • GP uncertainty estimates are high for out-of-sample points.
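
A sketch of the first benefit: the GP posterior variance does not depend on the labels, so after picking a point we can condition on it immediately and re-score the pool, with no retraining and no annotation (a naive recompute-from-scratch version for clarity; names are ours):

    import numpy as np

    def greedy_gp_batch(K, train_idx, pool_idx, k, jitter=1e-6):
        """K: joint covariance over all points; returns k pool indices."""
        chosen, cond, pool = [], list(train_idx), list(pool_idx)
        for _ in range(k):
            K_cc = K[np.ix_(cond, cond)] + jitter * np.eye(len(cond))
            K_pc = K[np.ix_(pool, cond)]
            sol = np.linalg.solve(K_cc, K_pc.T)
            var = K[pool, pool] - np.einsum('ij,ji->i', K_pc, sol)
            best = pool[int(np.argmax(var))]
            chosen.append(best)
            cond.append(best)                 # condition on the new point
            pool.remove(best)                 # no label needed for this
        return chosen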

Experiments


Airline delays

  • Comparison on an airline delays dataset [Hensman et al., 2017] with Bayes-by-Backprop and the Noise Contrastive Prior (NCP) from [Hafner et al., 2018].

  • Small 50 × 2 NN with an NCP-based loss function.

  • 50K train set, 100K test set (shifted in time).

  • Here and below, MCDUE refers to the Monte-Carlo Dropout Uncertainty Estimate, NNGP to the proposed approach, and the NCP suffix marks the NCP-based loss function. Other labels follow [Hafner et al., 2018].

flights_test_upd.png

UCI datasets

We compare the proposed approach with MCDUE and random sampling on a variety of UCI regression datasets:

uci_datasets.png

The comparison is made by means of Dolan-More curves: for each method, the curve shows the fraction of experiments in which its error is within a factor τ of the best error achieved by any method (a minimal sketch is below).

dolan.png
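
A minimal performance-profile sketch (assumed input: errors[m, p] = final error of method m in experiment p, lower is better):

    import numpy as np

    def dolan_more(errors, taus):
        """rho[m, t]: fraction of experiments where method m is within
        a factor taus[t] of the best method's error."""
        ratios = errors / errors.min(axis=0, keepdims=True)
        return np.stack([(ratios <= t).mean(axis=1) for t in taus], axis=1)

    taus = np.linspace(1.0, 3.0, 50)
    # curves = dolan_more(errors, taus)   # one curve per method vs. tau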

These 6 datasets × 25 runs result in the following curve:

dolan_uci.png

SchNet

  • SchNet is a state-of-the-art deep learning architecture for molecules and materials.

  • For UE, we used dropout placed between the fully-connected layers.

schnet_exper2_corrected_2.png
  • For energy prediction on the QM9 dataset, the experiment showed a 25% decrease in RMSE.

  • Another view: to reach an error of 2 kcal/mol, we need half as much additional data, which is very welcome given the cost of quantum-mechanical calculations!

Summary

  • A novel method for UE in deep neural networks;

  • State-of-the-art results in active learning for different problems and architectures;

  • To be presented at IJCAI 2019.

    Growth areas:

  • applying modern GP speed-up methods to improve scalability and accuracy;

  • CNN applications for images, RNN applications for text;

  • Calibrated UE;

  • Cost-sensitive AL.

Acknowledgements

E.T. and A.S. were supported by the Skoltech NGP Program No. 2016-7/NGP (a Skoltech-MIT joint project).
