Conversational AI, Transformers, Attention & Linguistics

Krishna Sankar
4 min read · Apr 25, 2020


The Nvidia GPU Conference GTC2020 was fully digital this year. Last year I led a 2-part training session on Reinforcement Learning; this year I ventured into Conversational AI and Transformers. This is a short blog capturing the top 7 points discussed in the session.

The slides and the notebooks are available in my GitHub [Slides, Code]. The slides are easier to view if you download them; the online preview gets cut off after a few slides, and I have ~200 slides! I also have a cool short URL for the GitHub: http://bit.ly/gtc-transformers

Conversational AI is a Linguistic problem, but with a Computational solution. So my emphasis was twofold: understand what is possible from a Computational perspective, but keep in mind what is being achieved from the Linguistic perspective, and, more importantly, where we are in that continuum.

The agenda was very straightforward: Transformers, BERT, GPT, current state-of-the-art bots like the Meena bot, and hands-on labs in between.

1. Scale of Transformers

Throughout the session, this architectural diagram was the background.
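For readers who want to ground that diagram in code, here is a minimal PyTorch sketch of one Transformer encoder block (multi-head self-attention plus feed-forward, each wrapped in a residual connection and layer norm). This is my own illustration for the blog, not code from the session notebooks, and the dimensions are just the usual base-model defaults.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One Transformer encoder block: self-attention + feed-forward,
    each followed by a residual connection and layer norm."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        # x has shape (seq_len, batch, d_model), the default layout
        # expected by nn.MultiheadAttention
        attn_out, _ = self.attn(x, x, x)            # self-attention: Q = K = V = x
        x = self.norm1(x + self.drop(attn_out))     # add & norm
        x = self.norm2(x + self.drop(self.ff(x)))   # feed-forward, add & norm
        return x

x = torch.randn(16, 2, 512)        # 16 tokens, batch of 2
print(EncoderBlock()(x).shape)     # torch.Size([16, 2, 512])
```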

2. BERT-ology

Another dimension is the evolution of the architectures, fondly known as BERT-ology !
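One easy way to get a feel for the BERT family is to pull a few checkpoints from the Hugging Face transformers library and compare their sizes. The sketch below does exactly that; the model names are standard public hub checkpoints, not anything specific to the talk.

```python
# Compare a few members of the BERT family by parameter count.
# Requires: pip install transformers torch
from transformers import AutoModel

for name in ["bert-base-uncased", "distilbert-base-uncased",
             "roberta-base", "albert-base-v2"]:
    model = AutoModel.from_pretrained(name)              # downloads the checkpoint
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name:25s} ~{n_params / 1e6:.0f}M parameters")
```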

3. Google’s use of BERT is one of the fastest research-to-production transitions

Until now, Google treated queries more or less like “a bag of words”. The use of BERT brought in TPUs as well as a better understanding of ambiguous and nuanced queries. The reason BERT affects only ~10% of queries is not that Google has throttled it, but that only ~10% of queries are nuanced enough to warrant BERT.

4. Of course BERT is not a panacea !

I like the statement by Paul Michel: it is very good that researchers are finding that BERT has excess capacity; we shouldn’t make BERT smaller but find ways to utilize that capacity, say by adding Knowledge Graphs!

5. Linguistics !

I have covered Linguistics elsewhere (here & here), so I am skipping it for this blog. But linguistics is an important part of this discussion, and I leave you with two slides to emphasize the point.

Sophisticated & Robust Common Sense Understanding of the world won’t come from pattern matching on examples … System should learn the default stuff from outside the conversation

6. Annotated Attention !

My version of explaining the attention mechanisms. I will expand this into a separate blog, with diagram build-outs.
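Until then, the core computation those slides annotate is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. Here is a minimal NumPy sketch of it (my illustration, not the session notebook code).

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query with each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights                      # weighted sum of the values

# 3 query tokens attending over 4 key/value tokens, dimension 8
Q, K, V = np.random.rand(3, 8), np.random.rand(4, 8), np.random.rand(4, 8)
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.sum(axis=-1))   # (3, 8), and each row of weights sums to 1
```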

7. And finally, who knew ?
