
Explore data augmentation in NLP by using word embeddings to replace keywords with similar words, expanding datasets while maintaining relevance through supervised filtering.
Explore data augmentation in NLP using BERT, building a masked word prediction pipeline with the transformers library and bert-base-uncased, including multilingual BERT for multiple languages.
Fine-tune a T5 model for paraphrase generation using the Simple Transformers library on the PAWS dataset. Learn to save, load, and generate domain-specific paraphrases with top-k and top-p sampling.
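The masked word prediction idea from the BERT lesson can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's own code: `mask_each_word`, `augment`, and `stub_fill_mask` are hypothetical names, and the `fill_mask` argument is assumed to behave like the `fill-mask` pipeline from the transformers library, which for an input with a single mask token returns a list of dicts containing `token_str` and `score`. A stub stands in for the model so the sketch runs without downloading anything.

```python
from typing import Callable, Dict, List

MASK = "[MASK]"  # mask token used by bert-base-uncased


def mask_each_word(sentence: str) -> List[str]:
    """Return one copy of the sentence per word, with that word masked."""
    words = sentence.split()
    return [" ".join(words[:i] + [MASK] + words[i + 1:]) for i in range(len(words))]


def augment(
    sentence: str,
    fill_mask: Callable[[str], List[Dict]],
    min_score: float = 0.1,  # hypothetical confidence cutoff
) -> List[str]:
    """Build variants by swapping each word for confident mask predictions."""
    words = sentence.split()
    variants = []
    for i, masked in enumerate(mask_each_word(sentence)):
        for pred in fill_mask(masked):
            token = pred["token_str"].strip()
            # Skip low-confidence predictions and the original word itself.
            if pred["score"] < min_score or token.lower() == words[i].lower():
                continue
            variants.append(" ".join(words[:i] + [token] + words[i + 1:]))
    return variants


def stub_fill_mask(text: str) -> List[Dict]:
    """Stand-in for a real model: fixed, made-up predictions for the last word."""
    if text.endswith(MASK):
        return [
            {"token_str": "great", "score": 0.6},
            {"token_str": "good", "score": 0.3},
            {"token_str": "bad", "score": 0.02},
        ]
    return []


if __name__ == "__main__":
    print(augment("The movie was good", stub_fill_mask))
```

With the transformers library installed, the object returned by `pipeline("fill-mask", model="bert-base-uncased")` could be passed in place of the stub. For other languages, a multilingual checkpoint such as `bert-base-multilingual-cased` is one option, though which checkpoint the course uses is not stated here.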
You might have the optimal machine learning algorithm for your problem. But once you apply it in the real world, you will soon realize that you need to train it on more data. Lacking a large dataset, you will try to further optimize the algorithm, tune hyperparameters, or look for some low-tech approach. Most state-of-the-art machine learning models are trained on large datasets, and the real-world performance of machine learning solutions often improves substantially with more data.
Through this course, you will learn multiple techniques for augmenting text data. These techniques can be used to generate data for any NLP task. The augmented dataset can help you bridge the data gap and quickly improve the accuracy of your machine learning solutions.
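As one illustration of such a technique, embedding-based keyword replacement can be sketched in a few lines. This is a toy example under stated assumptions: the three-dimensional vectors in `EMBEDDINGS` are made up for demonstration (real embeddings such as word2vec or GloVe would be loaded from a pretrained file), and the function names and the `min_sim` threshold are hypothetical. The similarity cutoff is only a simple stand-in for filtering; a supervised filter, for instance a classifier that checks whether the label is preserved, is not shown.

```python
import math
from typing import Dict, List, Set

# Toy vectors, invented for this sketch; not taken from any real model.
EMBEDDINGS: Dict[str, List[float]] = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.85, 0.15, 0.05],
    "fine": [0.7, 0.3, 0.1],
    "bad": [-0.8, 0.2, 0.1],
    "movie": [0.0, 0.9, 0.4],
    "film": [0.05, 0.85, 0.45],
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_words(word: str, top_n: int = 2, min_sim: float = 0.9) -> List[str]:
    """Return up to top_n nearest neighbours whose similarity is at least min_sim."""
    if word not in EMBEDDINGS:
        return []
    scored = [
        (cosine(EMBEDDINGS[word], vec), other)
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    scored.sort(reverse=True)  # most similar first
    return [other for sim, other in scored[:top_n] if sim >= min_sim]


def augment_by_embedding(sentence: str, keywords: Set[str]) -> List[str]:
    """Create one new sentence per (keyword, similar word) pair."""
    words = sentence.split()
    variants = []
    for i, word in enumerate(words):
        if word in keywords:
            for replacement in similar_words(word):
                variants.append(" ".join(words[:i] + [replacement] + words[i + 1:]))
    return variants


if __name__ == "__main__":
    for variant in augment_by_embedding("a good movie", {"good", "movie"}):
        print(variant)
```

Restricting replacement to a chosen set of keywords keeps function words untouched, and the threshold drops distant neighbours that would likely change the meaning of the sentence.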