ICLR 2017, Toulon

Disclaimer: This is a highly subjective selection of contributions to the ICLR 2017 conference. There were many more very interesting papers. Still, maybe this helps you find relevant work. Enjoy, Georg Martius

Representation Learning / Inference

Deep Variational Information Bottleneck

A cool way to implement the information bottleneck (Tishby et al., 1999) in a variational network. Here it is not an autoencoder but a classifier, with an explicit latent state. The latent state is optimized to maximize mutual information (MI) with the targets while minimizing MI with the inputs. This forces it to extract only the relevant information from the inputs (the bottleneck). They show that this also helps against adversarial attacks.
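In rough notation (my paraphrase, check the paper for the exact form): the ideal objective trades off the two MI terms, and the variational bound that is actually minimized looks like

```latex
\max_\theta \; I(Z;Y) - \beta\, I(Z;X)
\;\;\leadsto\;\;
\mathcal{L} = \mathbb{E}_{x,y}\,\mathbb{E}_{z \sim p_\theta(z|x)}\big[-\log q(y|z)\big]
            + \beta\, \mathrm{KL}\big(p_\theta(z|x)\,\|\,r(z)\big)
```

where $q(y|z)$ is a variational decoder and $r(z)$ a fixed prior approximating the marginal over $z$.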

poster

Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data

Like a VAE, but specifically for non-Markovian state transition models.


There was a paper on inspecting/visualizing the internals of CNNs:

@Alex K.

I found it interesting how they performed the conditional sampling to find out the effect of a single pixel. I just thought that might also be relevant for finding a segmentation in the weakly supervised case.


Reinforcement Learning

Learning to Act by Predicting the Future

Uses the generic task of prediction to structure the policy network (playing Doom).


Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks

They use Bayesian neural networks (an NN where the parameters are integrated out using MCMC) to model the system and then perform policy search on the model (if I understand correctly). They also show how a GP fails here. Another interesting point: they use the alpha-divergence instead of the KL-divergence for matching the approximate posterior.

Downside: sampling the posterior is computationally expensive (in contrast to a GP).
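For reference, one common parameterization of the alpha-divergence they use (Amari's; my addition, not from the paper):

```latex
D_\alpha(p\,\|\,q) = \frac{1}{\alpha(1-\alpha)}
\left(1 - \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx\right)
```

which recovers $\mathrm{KL}(p\,\|\,q)$ as $\alpha \to 1$ and $\mathrm{KL}(q\,\|\,p)$ as $\alpha \to 0$.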


PGQ: Combining Policy Gradient and Q-Learning

!!! Continue here!

Variational Intrinsic Control

A way to implement empowerment maximization in an RL framework with options.
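As a reminder (my shorthand, not the paper's exact notation), empowerment at a state is the channel capacity from actions (or options) to resulting states:

```latex
\mathcal{E}(s) = \max_{\omega(a \mid s)} I(A; S' \mid s)
```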


Generative Models

Maximum Entropy Flow Networks

A method to find/model and sample from maximum-entropy distributions (the distribution with maximum entropy under the constraint of matching the expected sufficient statistics of the given data). They use it, for instance, for texture generation.
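The underlying maximum-entropy problem, for reference (standard formulation, not the paper's flow-specific construction): with sufficient statistics $T(x)$ estimated from data,

```latex
p^* = \arg\max_p \, H(p)
\quad \text{s.t.} \quad \mathbb{E}_p[T(x)] = \frac{1}{N}\sum_{i=1}^{N} T(x_i)
\quad\Rightarrow\quad p^*(x) \propto \exp\big(\eta^\top T(x)\big)
```

i.e. the solution is an exponential family with natural parameters $\eta$ set by the constraints.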


Transfer Learning

Central Moment Discrepancy (CMD) for Domain-Invariant Representation Learning

A demonstration that domain adaptation by matching central moments (up to order 5) is quite effective, and so simple :-)
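The moment-matching can be sketched in a few lines. This is a simplified, hypothetical implementation of the discrepancy (function name and normalization are my choices, not the authors' reference code), assuming features bounded in [a, b]:

```python
import numpy as np

def cmd(x, y, k_max=5, a=0.0, b=1.0):
    """Central Moment Discrepancy between two samples x, y of shape (n, d).

    Sums the distance between the means plus the distances between the
    elementwise central moments of orders 2..k_max, each rescaled by the
    feature range |b - a| raised to the moment order.
    """
    scale = abs(b - a)
    mx, my = x.mean(axis=0), y.mean(axis=0)
    d = np.linalg.norm(mx - my) / scale
    for k in range(2, k_max + 1):
        cx = ((x - mx) ** k).mean(axis=0)  # k-th central moment, per feature
        cy = ((y - my) ** k).mean(axis=0)
        d += np.linalg.norm(cx - cy) / scale ** k
    return d
```

In the paper this term is added to the training loss to pull the source- and target-domain activations of a hidden layer together.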


Dialog/Text understanding

Query-Reduction Networks for Question Answering

Like a memory network, but designed to recursively modify the question (with the idea that it learns to ask simpler questions to figure out the actual answer). Very effective on bAbI, for instance.


Learning to Query, Reason, and Answer Questions On Ambiguous Texts

Like the bAbI tasks, but with some facts containing placeholders. They also tried reinforcement learning, but it was worse than supervised learning.


Neural Turing machines etc

Lie-Access Neural Turing Machines

A fancy name for something rather simple, but the idea is nice: in the Neural Turing Machine (NTM), memory is addressed by a 1D address. To make access differentiable, it is "smoothed out" into a weighted access. By using a higher-dimensional manifold as the address space (a hypersphere, say), we can define operations on addresses via Lie actions. Lie groups are groups with differentiable operations, so backpropagation works nicely. In practice this means simple vector operations (e.g. rotations in the sphere case). Writing to memory places an element at the current location; reading takes a weighted sum of the memory entries nearest to a given address. Disadvantages:


potential improvement with our aggregation method
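The addressing idea can be sketched in a toy version on the unit circle (all names and the softmax-over-distances read are my simplification, not the paper's exact scheme): the group action moves the head, and reads are distance-weighted sums over memory.

```python
import numpy as np

def rotate(addr, theta):
    # Lie action on the circle: rotate a 2-D address by angle theta.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ addr

def read(memory, addresses, query, temp=0.1):
    # Differentiable read: weights from a softmax over negative squared
    # distances between the query address and each stored address.
    d2 = ((addresses - query) ** 2).sum(axis=1)
    w = np.exp(-d2 / temp)
    w /= w.sum()
    return w @ memory  # weighted sum of memory entries
```

A write would simply append (value, current address) to the two arrays; the head moves by emitting a rotation angle, which is differentiable end to end.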

Deep Learning with Sets and Point Clouds


It is essentially trivial: the permutation-equivariant layer can only do an identity operation and a uniform (pooling) operation. Our DAC aggregation stuff might be more useful here?
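To see how little the layer does, here is a minimal sketch of that "identity plus uniform" structure (my hypothetical reduction with scalar weights and mean pooling, before any nonlinearity):

```python
import numpy as np

def perm_equivariant_layer(x, lam, gamma):
    # x: (n, d) set of n elements. Each output element mixes the element
    # itself (identity term) with a uniform summary of the whole set
    # (pooled term), so permuting the inputs permutes the outputs.
    return lam * x + gamma * x.mean(axis=0, keepdims=True)
```

Because the pooled term is identical for every element, any reordering of the rows of `x` reorders the output rows in exactly the same way.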


Deep Learning with Dynamic Computation Graphs (aka the TensorFlow Fold library) @Michal

Makes it possible to have much faster "dynamic" computation graphs.