
Sequence-to-sequence models


Encoder-decoder models (also called sequence-to-sequence models) use both parts of the Transformer architecture. At each stage, the attention layers of the encoder can access all the words in the initial sentence, whereas the attention layers of the decoder can only access the words positioned before a given word in the sequence it is generating (while also attending to the encoder's output through cross-attention).
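As a rough illustration of this split, here is a minimal sketch assuming the 🤗 Transformers library and the public t5-small checkpoint: the encoder reads the whole source sentence in one pass, while the decoder produces the output one token at a time.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The encoder sees the full input sentence at once (bidirectional attention)
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
encoder_outputs = model.get_encoder()(**inputs)
print(encoder_outputs.last_hidden_state.shape)  # one hidden state per input token

# The decoder generates the output token by token, attending only to the tokens
# it has already produced plus the encoder outputs (via cross-attention)
generated = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```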

The pretraining of these models can be done using the objectives of encoder or decoder models, but usually involves something a bit more complex. For instance, T5 is pretrained by replacing random spans of text (which can contain several words) with a single special mask token, and the objective is then to predict the text that each mask token replaces.
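To make this concrete, here is a small sketch of what that span-corruption format looks like, again assuming the public t5-small checkpoint; in T5 the mask tokens are sentinel tokens such as `<extra_id_0>`, `<extra_id_1>`, and so on.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Random spans of the original text are replaced by sentinel tokens in the input
corrupted = "The <extra_id_0> walks in <extra_id_1> park"
inputs = tokenizer(corrupted, return_tensors="pt")

# The model is trained to output the missing spans, each one introduced by the
# sentinel token that replaced it in the input
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```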

Sequence-to-sequence models are best suited for tasks revolving around generating new sentences conditioned on a given input, such as summarization, translation, or generative question answering.
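With 🤗 Transformers, such tasks are typically one `pipeline` call away. The sketch below uses translation as an example; the task name and checkpoint are only illustrative choices, and any sequence-to-sequence model fine-tuned for the task would work.

```python
from transformers import pipeline

# t5-small is used purely as a small, publicly available example checkpoint
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Sequence-to-sequence models are well suited for translation."))
```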

Representatives of this family of models include: