NLP with LLMs: Working with Tokenizers in Hugging Face
Hugging Face, a leading company in the field of artificial intelligence (AI), offers a comprehensive platform that enables developers and researchers to build, train, and deploy state-of-the-art machine learning (ML) models, with a strong emphasis on open collaboration and community-driven development.
In this course, you will discover the extensive libraries and tools Hugging Face offers, including the Transformers library, which provides access to a vast array of pre-trained models and datasets.
Next, you will set up your working environment in Google Colab. You will also explore the critical components of the text preprocessing pipeline: normalizers and pre-tokenizers.
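To give a sense of these two pipeline stages, here is a minimal sketch using the Hugging Face `tokenizers` library (the input strings are purely illustrative): a normalizer cleans raw text, and a pre-tokenizer splits the normalized text into word-level pieces before the tokenization model runs.

```python
from tokenizers import normalizers
from tokenizers.normalizers import NFD, Lowercase, StripAccents
from tokenizers.pre_tokenizers import Whitespace

# Normalizers clean raw text: Unicode NFD decomposition,
# lowercasing, and accent stripping, applied in sequence.
normalizer = normalizers.Sequence([NFD(), Lowercase(), StripAccents()])
print(normalizer.normalize_str("Héllo, Wörld!"))  # hello, world!

# Pre-tokenizers split normalized text into pieces, each with
# its character offsets in the original string.
pre_tokenizer = Whitespace()
print(pre_tokenizer.pre_tokenize_str("Hello, world!"))
```

The `Whitespace` pre-tokenizer splits on whitespace and punctuation, yielding pairs of (piece, offsets) such as `("Hello", (0, 5))`.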
Finally, you will master various tokenization techniques, including byte pair encoding (BPE), WordPiece, and Unigram tokenization, which are essential for working with transformer models. Through hands-on exercises, you will build and train BPE and WordPiece tokenizers, configuring normalizers and pre-tokenizers to fine-tune these tokenization methods.
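As a preview of the hands-on exercises, the sketch below trains a BPE tokenizer from scratch with the `tokenizers` library; the tiny in-memory corpus and the `[UNK]` token choice are illustrative assumptions, not course-specified values.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Build a tokenizer around a BPE model with an unknown-token fallback.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Train on a tiny illustrative corpus; BPE iteratively merges the
# most frequent symbol pairs until the vocabulary size is reached.
trainer = BpeTrainer(special_tokens=["[UNK]"], vocab_size=200)
corpus = [
    "hugging face builds open source tools",
    "tokenizers split text into subword units",
    "byte pair encoding merges frequent pairs",
]
tokenizer.train_from_iterator(corpus, trainer)

# Encode a sentence into subword tokens learned from the corpus.
encoding = tokenizer.encode("tokenizers split text")
print(encoding.tokens)
```

A WordPiece tokenizer is trained the same way by swapping in `models.WordPiece` and `trainers.WordPieceTrainer`.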