All recent language technologies, such as BERT, GPT-3, or ChatGPT, rely on complex neural networks called Transformers. But what brought us here? What problems did Transformers solve? In this talk, we will introduce Recurrent Neural Networks and discuss their main applications to language tasks, such as machine translation, image captioning, and multi-modal text generation. Next, we will examine RNNs' weaknesses and explain what led to the birth of the Attention mechanism, the core innovation behind Transformers. We will conclude with a brief introduction to the Transformer architecture.