NeurIPS 2018: Conference Summary & Reading List

The goal of this blog is threefold:

  1. For those who didn’t attend, give a quick impression of the conference.
  2. Offer a curated list of papers to read (I have links to ~60 papers).
  • The conference accepted ~1000 papers, so it is impossible to go through all of them.

3. Finally, this blog is also for my own reference. Otherwise I will lose these thoughts as life’s more urgent trifles overcome the important academic pursuits.

P.S.: (Dec 12, 2018) The blog is in a rough form. The graphics do not translate well from Word to Medium. I will clean up the blog, but if I wait too long, I will lose the motivation to publish. An ugly blog will motivate me to clean it up! Also, I am leaving Montreal tomorrow and a stack of work awaits me on the other end, so I need to publish before I take off.

Down below, I have included links to all papers by category, Medium links to selected papers and, finally, Facebook videos of the sessions. Again, this is a single reference point for me and others.

There are two ways of looking at a massive, intellectually packed conference like NeurIPS.

  • If one has a research (or an implementation) focus, it is relatively easy to parse through all the relevant papers.
  • But if one tries to absorb the conference from a broader perspective, it is a herculean task. The breadth and depth will overwhelm you; at least that is what I experienced.

Suggestions on how to get the most out of NeurIPS or a similarly packed conference:

  1. Go in a group — I really missed my colleagues !!
  2. Read and discuss papers before going in
  3. Have people with different focuses; one person can’t cover them all

Initial impressions

Everybody in Montreal knew about the “AI Conference”. From the airport to the ice sculptures, everyone welcomed us! Of course, they knew the nerds were not going to conquer Canada! And Montreal is an AI startup powerhouse!

The conference is Yuuuge! 1011 papers, ~8000 attendees, and posters and workshops covering a spectrum of algorithms, concepts, practices, experiments and ideas, all well researched with a due amount of theory.

To match the Yuuuge-ness of the conference, the blog is also Yuuuge! Read Part 1 and you will get a good idea. Part 2 has links to papers that you can go through at your convenience. I plan to do the same.

When looking for papers and posters, one main question is the proverbial exploit vs. explore trade-off, i.e., exploit already known domains and algorithms vs. look for new and interesting ideas. Moreover, with new ideas, we won’t know their influence until later, sometimes only 1–2 years from now. The algorithms and ideas need to be studied, applied, tweaked, combined and finally deployed to solve problems. And, of course, implemented in frameworks like TensorFlow and PyTorch.

NeurIPS’18 Conference Theme:

From what I saw, the broader themes of the conference were Accountability and Algorithmic Bias, Trust, Robustness and, most importantly, Diversity. You can see this in the keynotes and workshops.

These questions resonated well with me:

- Is ML truly ready for real-world deployment?

- Can We Truly Rely on ML?

- ML Predictions Are (Mostly) Accurate but Brittle !

- Need to understand the “failure modes” of ML

My focus:

I was looking for work in the following areas:

  1. Deep reinforcement learning (DRL)
  2. Conversational AI
  3. Machine Reasoning
  4. Structured Memory & Long term memory
  5. Capsule Networks
  6. Smart mobility (orchestration and fleet management using AI/RL)
  7. Object detection (always my favorite)
  8. Embedded ML (to try out on my new iPad Pro neural chip and the Swift language)
  9. Integration of knowledge and knowledge graphs

NeurIPS’18 Topics Summary:

I did find all of my topics in some measure, though not equally represented. The actual topics were varied:

  1. Embeddings, DRL and GANs are all represented very well, probably accounting for the majority of papers
  • Many papers on other adversarial topics, including Wasserstein methods
  • Graphical GANs to represent structured data

2. Question and Answer algorithms were covered somewhat well

3. VAEs (Variational Autoencoders)

4. Another important topic stream was training, optimization, scalability (i.e., training on massive, high-dimensional data) and speedup, a sign of maturing domains

  • Learning from noisy data, ability to learn noise in data, learning from smaller and more diverse data set — this is a topic that can make models robust to changing data distributions

5. Lots of papers on Bayesian methods at multiple levels

6. Causal Models: many papers and one very good tutorial

7. Interesting topics — a few papers on each

  • Relational reasoning and relational networks: interesting, moving towards knowledge graphs?
  • SACNNs (Structure-aware CNNs): interesting concept
  • Even some time series papers !
  • Attention mechanisms — even a paper on nagging DRL networks (kind of)!
  • Spiking Neural Networks — a few papers on this interesting concept
  • Langevin Dynamics and associated methods
  • Even a paper on Runge-Kutta Discretization !

8. Many papers on optimizing, clustering algorithms, quantifying uncertainty in clustering and so forth

9. Only 3 papers on CapsuleNet (from the titles). I was expecting more

For the detail-oriented, I have good links to follow (in addition to my Top 52 at the end of this blog, … yes there is an end …)

NeurIPS Workshops:

There were multiple excellent workshops and I wanted to attend them all ! I attended two.

DRL Workshop — Yuuuge — Room Capacity ~3000. ~2000 attendees in & out

NeurIPS Keynotes:

Some interesting keynotes and talks.

The videos are published on Facebook; the link is down below. I have captured the time and date of the events so that you can match them with the Facebook videos. It is not ideal, but at least there is a way.

  • This talk was well received. A view from the biological world

Tutorial on Adversarial was interesting

Counterfactual Inference — definitely requires a second look

Missed this one on Financial Services. Plan to ask the organizers for materials

Interesting Invited talks

Finally, the Montreal Declaration by Prof. Yoshua Bengio

NeurIPS’18 — Awards

  1. Best Paper Awards:
  • Non-delusional Q-learning and Value-iteration
    By: Tyler Lu · Dale Schuurmans · Craig Boutilier
  • Optimal Algorithms for Non-Smooth Distributed Optimization in Networks
    By: Kevin Scaman · Francis Bach · Sebastien Bubeck · Laurent Massoulié · Yin Tat Lee
  • Nearly Tight Sample Complexity Bounds for Learning Mixtures of Gaussians via Sample Compression Schemes
    By: Hassan Ashtiani · Shai Ben-David · Nick Harvey · Christopher Liaw · Abbas Mehrabian · Yaniv Plan
  • Neural Ordinary Differential Equations
    By: Tian Qi Chen · Yulia Rubanova · Jesse Bettencourt · David Duvenaud
    Press coverage: https://www.technologyreview.com/s/612561/a-radical-new-neural-network-design-could-overcome-big-challenges-in-ai/

2. Test of time award paper

Part 2 : NeurIPS’18 — The Gory details & Reading List

Now let me dig into the core of the conference — the papers, posters, discussions and the rest. I have listed the links for all the 1011 papers and then my reading list of ~52 papers.

My Reading List

  • Finally, the reading list from my explorations of the 1011 papers !

Ran out of time to do it by last week, unfortunately. Have to pack, sleep and catch a plane. I have the list in Word, but it doesn’t translate well to Medium, so I have to do it one by one. I will finish it on Tuesday. [11/16/18 : Finished !]

Also, the materials are often scattered, i.e., the talk session shows spotlight slides while the poster session has links to the paper and video. I have tried to hunt down all the materials and keep them together in one place along with the paper.

  1. Dendritic cortical microcircuits approximate the backpropagation algorithm by João Sacramento · Rui Ponte Costa · Yoshua Bengio · Walter Senn
  2. Spectral Filtering for General Linear Dynamical Systems by Elad Hazan · Holden Lee · Karan Singh · Cyril Zhang · Yi Zhang
  3. DVAE#: Discrete Variational Autoencoders with Relaxed Boltzmann Priors by Arash Vahdat · Evgeny Andriyash · William Macready
  4. GumBolt: Extending Gumbel trick to Boltzmann priors by Amir H Khoshaman · Mohammad Amin
  5. Banach Wasserstein GAN by Jonas Adler · Sebastian Lunz (P.S.: I have an interest in Wasserstein Generative Adversarial Networks (WGANs))
  6. Are GANs Created Equal? A Large-Scale Study by Mario Lucic · Karol Kurach · Marcin Michalski · Sylvain Gelly · Olivier Bousquet
  7. Fast and Effective Robustness Certification by Gagandeep Singh · Timon Gehr · Matthew Mirman · Markus Püschel · Martin Vechev (Interesting idea - certifying neural network robustness based on abstract interpretation !)
  8. FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction by Shuyang Sun · Jiangmiao Pang · Jianping Shi · Shuai Yi · Wanli Ouyang
  9. Pelee: A Real-Time Object Detection System on Mobile Devices by Jun Wang · Tanner Bohn · Charles Ling (Plan to try it on Apple’s neural chip. Work for my new iPad Pro!)
  10. Kalman Normalization: Normalizing Internal Representations Across Network Layers by Guangrun Wang · Jiefeng Peng · Ping Luo · Xinjiang Wang · Liang Lin
  11. CapProNet: Deep Feature Learning via Orthogonal Projections onto Capsule Subspaces By Liheng Zhang · Marzieh Edraki · Guo-Jun Qi (One of the three papers on Capsule Networks — I was expecting more)
  12. HitNet: Hybrid Ternary Recurrent Neural Network by Peiqi Wang · Xinfeng Xie · Lei Deng · Guoqi Li · Dongsheng Wang · Yuan Xie (Interesting approach to balance accuracy and quantization)
  13. The Importance of Sampling in Meta-Reinforcement Learning by Bradly Stadie · Ge Yang · Rein Houthooft · Peter Chen · Yan Duan · Yuhuai Wu · Pieter Abbeel · Ilya Sutskever
  14. Variational Memory Encoder-Decoder by Hung Le · Truyen Tran · Thin Nguyen · Svetha Venkatesh (Conversational)
  15. On the Dimensionality of Word Embedding by Zi Yin · Yuanyuan Shen
  16. Mesh-TensorFlow: Deep Learning for Supercomputers by Noam Shazeer · Youlong Cheng · Niki Parmar · Dustin Tran · Ashish Vaswani · Penporn Koanantakool · Peter Hawkins · HyoukJoong Lee · Mingsheng Hong · Cliff Young · Ryan Sepassi · Blake Hechtman
  17. Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias by Abhinav Gupta · Adithyavairavan Murali · Dhiraj Prakashchand Gandhi · Lerrel Pinto
  18. Evolved Policy Gradients by Rein Houthooft · Yuhua Chen · Phillip Isola · Bradly Stadie · Filip Wolski · Jonathan Ho · Pieter Abbeel
  19. Bias and Generalization in Deep Generative Models: An Empirical Study by Shengjia Zhao · Hongyu Ren · Arianna Yuan · Jiaming Song · Noah Goodman · Stefano Ermon
  20. How Does Batch Normalization Help Optimization? by Shibani Santurkar · Dimitris Tsipras · Andrew Ilyas · Aleksander Madry (A Medium post on this topic)
  21. Step Size Matters in Deep Learning by Kamil Nar · Shankar Sastry (Slides)
  22. Precision and Recall for Time Series By Nesime Tatbul · Tae Jun Lee · Stan Zdonik · Mejbah Alam · Justin Gottschlich (Spotlight Slides)
  23. Scalable End-to-End Autonomous Vehicle Testing via Rare-event Simulation by Matthew O’Kelly · Aman Sinha · Hongseok Namkoong · Russ Tedrake · John Duchi
  24. Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding By Nan Rosemary Ke · Anirudh Goyal ALIAS PARTH GOYAL · Olexa Bilaniuk · Jonathan Binas · Michael Mozer · Chris Pal · Yoshua Bengio (Spotlight Slides)
  25. Chain of Reasoning for Visual Question Answering by Chenfei Wu · Jinlai Liu · Xiaojie Wang · Xuan Dong
  26. Distilled Wasserstein Learning for Word Embedding and Topic Modeling By Hongteng Xu · Wenlin Wang · Wei Liu · Lawrence Carin
  27. Exploration in Structured Reinforcement Learning By Jungseul Ok · Alexandre Proutiere · Damianos Tranos
  28. Recurrent Transformer Networks for Semantic Correspondence by Seungryong Kim · Stephen Lin · Sang Ryul Jeon · Dongbo Min · Kwanghoon Sohn (Spotlight Slides)
  29. Hamiltonian Variational Auto-Encoder By Anthony L Caterini · Arnaud Doucet · Dino Sejdinovic
  30. How to Start Training: The Effect of Initialization and Architecture By Boris Hanin · David Rolnick
  31. Revisiting (ϵ,γ,τ)-similarity learning for domain adaptation By Sofiane Dhouib · Ievgen Redko (Spotlight Slides)
  32. Is Q-Learning Provably Efficient? By Chi Jin · Zeyuan Allen-Zhu · Sebastien Bubeck · Michael Jordan
  33. Monte-Carlo Tree Search for Constrained POMDPs By Jongmin Lee · Geon-hyeong Kim · Pascal Poupart · Kee-Eung Kim
  34. Policy Optimization via Importance Sampling By Alberto Maria Metelli · Matteo Papini · Francesco Faccio · Marcello Restelli
  35. Reducing Network Agnostophobia By Akshay Raj Dhamija · Manuel Günther · Terrance Boult (A very real problem when deploying models: Agnostophobia, the fear of the unknown, can be experienced by deep learning engineers while applying their networks to real-world applications. Unfortunately, network behavior is not well defined for inputs far from a network’s training set)
  36. Are ResNets Provably Better than Linear Predictors? By Ohad Shamir
  37. Reinforcement Learning for Solving the Vehicle Routing Problem By MohammadReza Nazari · Afshin Oroojlooy · Lawrence Snyder · Martin Takac
  38. Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning By Tom Zahavy · Matan Haroush · Nadav Merlis · Daniel J Mankowitz · Shie Mannor
  39. Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents By Edoardo Conti · Vashisht Madhavan · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune (The concept of novelty seeking agents somehow seems wrong ;o))
  40. Dual Policy Iteration By Wen Sun · Geoffrey Gordon · Byron Boots · J. Bagnell (Dual Policy Iteration looks very interesting. Might be able to solve a class of problems that a single policy layer can’t solve)
  41. Online Robust Policy Learning in the Presence of Unknown Adversaries By Aaron Havens · Zhanhong Jiang · Soumik Sarkar
  42. Learning to Navigate in Cities Without a Map By Piotr Mirowski · Matt Grimes · Mateusz Malinowski · Karl Moritz Hermann · Keith Anderson · Denis Teplyashin · Karen Simonyan · Koray Kavukcuoglu · Andrew Zisserman · Raia Hadsell
  43. Fighting Boredom in Recommender Systems with Linear Reinforcement Learning By Romain Warlop · Alessandro Lazaric · Jérémie Mary (Interesting concept. I am always partial to adding serendipity to recommendations)
  44. Actor-Critic Policy Optimization in Partially Observable Multiagent Environments By Sriram Srinivasan · Marc Lanctot · Vinicius Zambaldi · Julien Perolat · Karl Tuyls · Remi Munos · Michael Bowling
  45. Learning to Share and Hide Intentions using Information Regularization By Daniel Strouse · Max Kleiman-Weiner · Josh Tenenbaum · Matt Botvinick · David Schwab
  46. Teaching Inverse Reinforcement Learners via Features and Demonstrations By Luis Haug · Sebastian Tschiatschek · Adish Singla (Different world view between a teacher and student is an interesting problem)
  47. Why Is My Classifier Discriminatory? By Irene Chen · Fredrik Johansson · David Sontag (Spotlight Slides)
  48. Wasserstein Variational Inference By Luca Ambrogioni · Umut Güçlü · Yağmur Güçlütürk · Max Hinne · Marcel A. J. van Gerven · Eric Maris
  49. FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network By Aditya Kusupati · Manish Singh · Kush Bhatia · Ashish Kumar · Prateek Jain · Manik Varma
  50. Understanding Batch Normalization By Nils Bjorck · Carla P Gomes · Bart Selman · Kilian Weinberger
  51. Towards Deep Conversational Recommendations By Raymond Li · Samira Ebrahimi Kahou · Hannes Schulz · Vincent Michalski · Laurent Charlin · Chris Pal
  52. Non-delusional Q-learning and value-iteration By Tyler Lu · Dale Schuurmans · Craig Boutilier (Google’s paper, one of the best paper awards)
  53. Fully Understanding The Hashing Trick by Lior Kamma · Casper B. Freksen · Kasper Green Larsen (Spotlight Slides)
  54. When do random forests fail? By Cheng Tang · Damien Garreau · Ulrike von Luxburg
  55. Norm matters: efficient and accurate normalization schemes in deep networks By Elad Hoffer · Ron Banner · Itay Golan · Daniel Soudry (Spotlight Slides)
  56. Out-of-Distribution Detection using Multiple Semantic Label Representations by Gabi Shalev · Yossi Adi · Joseph Keshet
  57. A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks By Kimin Lee · Kibok Lee · Honglak Lee · Jinwoo Shin (Spotlight Slides)
  58. Recurrent World Models Facilitate Policy Evolution by David Ha · Jürgen Schmidhuber