Question Generation using Natural Language Processing
What you'll learn
- Generate assessments like MCQs, True/False questions, etc. from any content using state-of-the-art natural language processing techniques.
- Apply recent advancements like BERT, OpenAI GPT-2, and T5 transformers to solve real-world problems in edtech.
- Use NLP libraries like spaCy, NLTK, AllenNLP, HuggingFace Transformers, etc.
- Deploy transformer models like T5 to production in a serverless fashion by quantizing them with ONNX and packaging them in Docker containers using FastAPI.
- Use the Google Colab environment to run all these algorithms.
Requirements
- Python, data structures, deep learning, and basic familiarity with PyTorch.
This course focuses on using state-of-the-art natural language processing techniques to solve the problem of question generation in edtech.
If we pick up any middle school textbook, at the end of every chapter we see assessment questions like MCQs, True/False questions, Fill-in-the-blanks, Match the following, etc. In this course, we will see how we can take any text content and generate these assessment questions using NLP techniques.
This course is a very practical use case of NLP, where we put everything from basic algorithms like word vectors (word2vec, GloVe, etc.) to recent advancements like BERT, OpenAI GPT-2, and T5 transformers to real-world use.
We will use NLP libraries like spaCy, NLTK, AllenNLP, HuggingFace Transformers, etc.
All the sections are accompanied by easy-to-use Google Colab notebooks. You can run Google Colab notebooks for free on the cloud and also train models using free GPUs provided by Google.
This course focuses on the practical use cases of algorithms. A high-level introduction to the algorithms used is provided, but the focus is not on the mathematics behind them.
A high-level understanding of deep learning concepts like the forward pass, backpropagation, optimizers, and loss functions is expected.
Strong Python programming skills and basic knowledge of natural language processing and PyTorch are assumed.
The course outline:
➤ Generate distractors (wrong choices) for MCQ options
Students will use several approaches like WordNet, ConceptNet, and Sense2vec to generate distractors for MCQ options.
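One common way to pick distractors from a lexical hierarchy is to take "siblings" of the correct answer: words that share the same parent category. The sketch below illustrates that idea only. It uses a tiny hand-written hypernym map as a stand-in for WordNet, and the names `HYPERNYMS` and `get_distractors` are hypothetical, not part of the course material or of any library.

```python
import random

# Toy hypernym map standing in for WordNet (hypothetical data, illustration only).
HYPERNYMS = {
    "lion": "big cat",
    "tiger": "big cat",
    "leopard": "big cat",
    "jaguar": "big cat",
    "sparrow": "bird",
    "eagle": "bird",
}


def get_distractors(answer, hypernyms, k=3, seed=None):
    """Return up to k words that share a parent category with `answer`."""
    parent = hypernyms.get(answer)
    if parent is None:
        return []  # unknown word: no distractors available
    # Siblings are all other words mapped to the same parent category.
    siblings = sorted(w for w, p in hypernyms.items() if p == parent and w != answer)
    rng = random.Random(seed)  # seeded for reproducible shuffling
    rng.shuffle(siblings)
    return siblings[:k]


distractors = get_distractors("lion", HYPERNYMS, k=3, seed=0)
print(distractors)
```

With a real lexical database, the map would be replaced by hypernym and hyponym lookups, and the candidates would usually need filtering for difficulty and plausibility.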
➤ Generate True or False questions using pre-trained models like Sentence BERT, a constituency parser, and OpenAI GPT-2
Students will learn to use the constituency parser from AllenNLP to split any sentence. They will learn to use GPT-2 to generate sentences with alternate endings and filter them with Sentence BERT.
➤ Generate MCQs from any content by training a T5 transformer model using the HuggingFace library.
Students will understand the T5 transformer algorithm and use the SQuAD dataset to train a question generation model using the HuggingFace Transformers library and PyTorch Lightning.
➤ Generate Fill in the blanks questions
Students will learn to use a Python keyword extraction library to extract keywords, use the flashtext library to do fast keyword matching, and visualize fill-in-the-blanks using HTML ElementTree in Colab.
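The core step of a fill-in-the-blanks generator can be sketched in a few lines: given a sentence and a set of already extracted keywords, replace each keyword occurrence with a blank and record the answers. The sketch below uses the standard library's `re` module as a simple substitute for flashtext, and `make_fill_in_the_blank` is a hypothetical helper name; keyword extraction itself is assumed to have happened beforehand.

```python
import re


def make_fill_in_the_blank(sentence, keywords, blank="_____"):
    """Blank out each keyword found in `sentence`; return (question, answers)."""
    if not keywords:
        return sentence, []
    # Try longer keywords first so a multi-word phrase wins over a word inside it.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    answers = []

    def replace(match):
        answers.append(match.group(0))  # keep the answer as it appeared in the text
        return blank

    question = pattern.sub(replace, sentence)
    return question, answers


question, answers = make_fill_in_the_blank(
    "The mitochondria is the powerhouse of the cell.",
    ["mitochondria", "cell"],
)
print(question)
print(answers)
```

A regular expression is fine for a handful of keywords; a dedicated keyword matcher becomes more attractive as the keyword list grows into the thousands.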
➤ Generate Match the following questions.
Students will learn to use a Python keyword extraction library to extract keywords, use the flashtext library to do fast keyword matching, and use BERT to do word sense disambiguation (WSD).
➤ Deploy question generation models to production.
Deploy transformer models like T5 to production in a serverless fashion by converting them to ONNX format and performing quantization. Create lightweight Docker containers using FastAPI for the transformer model and deploy them on Google Cloud Run.
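Quantization shrinks a model by storing weights as 8-bit integers plus a scale factor instead of 32-bit floats. The sketch below shows only the arithmetic of a simplified symmetric, per-tensor scheme in plain Python. It is an illustration under stated assumptions, not how any particular toolchain implements it: real tools may use a zero point, per-channel scales, or other refinements, and `quantize_int8` and `dequantize` are hypothetical names.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest magnitude onto 127; fall back to 1.0 for all zeros.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized, scale)
print(restored)
```

Each restored weight differs from the original by at most half the scale, which is the rounding error traded for a roughly fourfold reduction in weight storage.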
Who this course is for:
- Data science students with an intermediate skill set in Python and deep learning.
Ramsri is a lead data scientist with 8+ years of work experience at startups and large corporations across Silicon Valley, Singapore, and India.
Most recently he has been a co-founder and CTO of an AI-assisted assessments startup.
Ramsri is very keen on mapping cutting-edge NLP and computer vision research to practical real-world use.