Encoder models

Encoder models use only the encoder of a Transformer model. At each stage, the attention layers can access all the words in the initial sentence. These models are often characterized as having “bi-directional” attention, and are often called auto-encoding models.
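As a small sketch of what this means in practice, assuming the 🤗 Transformers library and bert-base-uncased as an illustrative checkpoint, the encoder turns a sentence into one contextual vector per token, each computed by attending to the whole sentence:

```python
from transformers import AutoTokenizer, AutoModel

checkpoint = "bert-base-uncased"  # illustrative encoder checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

inputs = tokenizer("Encoder models read the whole sentence at once.", return_tensors="pt")
outputs = model(**inputs)

# One contextual vector per input token; each vector is computed with attention
# over every token in the sentence (bi-directional attention).
print(outputs.last_hidden_state.shape)  # torch.Size([1, sequence_length, 768])
```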

The pretraining of these models usually revolves around somehow corrupting a given sentence (for instance, by masking random words in it) and tasking the model with finding or reconstructing the initial sentence.
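You can try this objective directly with the fill-mask pipeline, for example with bert-base-uncased as an illustrative checkpoint: the model is asked to recover the word hidden behind the [MASK] placeholder, just as during pretraining.

```python
from transformers import pipeline

# The fill-mask pipeline mirrors the masked-language-modeling pretraining objective:
# the model predicts the token hidden behind the [MASK] placeholder.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prediction in unmasker("Paris is the [MASK] of France."):
    print(prediction["token_str"], round(prediction["score"], 3))
```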

Encoder models are best suited for tasks requiring an understanding of the full sentence, such as sentence classification, named entity recognition (and more generally word classification), and extractive question answering.
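As a quick sketch, assuming 🤗 Transformers pipelines with their default checkpoints (downloaded on first use), named entity recognition and extractive question answering look like this:

```python
from transformers import pipeline

# Named entity recognition (word classification): label each word or group of words
ner = pipeline("ner", grouped_entities=True)
print(ner("My name is Sylvain and I work at Hugging Face in Brooklyn."))

# Extractive question answering: the answer is a span copied directly from the context
qa = pipeline("question-answering")
print(
    qa(
        question="Where do I work?",
        context="My name is Sylvain and I work at Hugging Face in Brooklyn.",
    )
)
```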

Representatives of this family of models include: