From Recurrent Models to the Advent of Attention

Abstract

All recent language technologies, such as BERT, GPT-3, or ChatGPT, rely on complex neural networks called Transformers. But what brought us here? What problems did Transformers fix? In this talk, we will introduce Recurrent Neural Networks and discuss their main applications to language modeling, such as machine translation, image captioning, and multimodal text generation. Next, we will go through RNNs' weaknesses and motivate what led to the birth of the Attention mechanism, the core innovation behind Transformers. We will conclude with a brief introduction to the Transformer architecture.

Date
Feb 15, 2023 6:30 PM
Location
eDreams ODIGEO Tech Hub
Milano
Giuseppe Attanasio
Postdoctoral Researcher