
Develop a state-of-the-art deep learning NLP question-answering system that searches over 200,000 research papers in real time, guided by Sam Bidden and Sambit Mohapatra.
Explore a real-time deep learning NLP system that answers COVID questions by searching papers indexed in Elasticsearch, ranking them with BM25, and surfacing exact answers via the RoBERTa reader.
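The BM25 ranking step mentioned above can be illustrated with a minimal sketch. This is not the course's code and not Elasticsearch's implementation; it is a plain-Python version of the Okapi BM25 formula, assuming whitespace tokenization and the commonly used parameters k1 = 1.2 and b = 0.75. The function name `bm25_scores` and the sample papers are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            tf = term_freq[term]
            # Length normalization: long documents are penalized via b.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical mini-corpus standing in for paper abstracts.
papers = [
    "incubation period of the virus is about five days",
    "masks reduce transmission in hospitals",
    "vaccine trials report strong antibody response",
]
scores = bm25_scores("incubation period", papers)
best = max(range(len(papers)), key=scores.__getitem__)
print(papers[best])
```

In a retriever-reader setup, the top-scoring documents from a step like this are what gets handed to the reader model for answer extraction.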
Explore the COVID Q&A course folder structure, including Google Drive and GitHub repositories, a data dictionary, notebooks with step-by-step code, and datasets for building a knowledge-base query engine.
Read and assemble the dataset by extracting files, reading PDFs, and combining PDF and PMC content to create per-paper data across five locations.
Access and prepare the sample dataset for the BERT COVID Q&A system by downloading the subset, reading pickle, PDF, and PMC samples, and merging them into a unified DataFrame.
Learn how to clean and merge metadata with PDF data, select relevant abstracts by length, handle missing values, unify column naming, and prepare a unified text dataset for downstream processing.
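The cleaning and merging steps above might look roughly like the following sketch. It uses plain dictionaries rather than a DataFrame so that it runs without extra packages. The field names (`paper_id`, `title`, `abstract`, `body_text`), the function name, and the minimum-length rule are all assumptions for illustration; the course may select abstracts differently.

```python
def merge_paper_records(metadata_rows, pdf_rows, min_abstract_chars=50):
    """Join metadata with parsed PDF text on a shared paper id (assumed key)."""
    pdf_by_id = {row["paper_id"]: row for row in pdf_rows}
    merged = []
    for meta in metadata_rows:
        pdf = pdf_by_id.get(meta["paper_id"], {})
        # Missing values: prefer the metadata abstract, fall back to the
        # PDF abstract, and treat None as an empty string.
        abstract = (meta.get("abstract") or pdf.get("abstract") or "").strip()
        # Assumed selection rule: drop abstracts that are too short.
        if len(abstract) < min_abstract_chars:
            continue
        # Unified column names for every downstream step.
        merged.append(
            {
                "paper_id": meta["paper_id"],
                "title": (meta.get("title") or "").strip(),
                "abstract": abstract,
                "body_text": (pdf.get("body_text") or "").strip(),
            }
        )
    return merged


# Hypothetical sample rows.
metadata = [
    {"paper_id": "p1", "title": " Viral incubation ", "abstract": None},
    {"paper_id": "p2", "title": "Masks", "abstract": "Too short"},
]
pdf_data = [
    {
        "paper_id": "p1",
        "abstract": "A sufficiently long abstract about incubation periods.",
        "body_text": "Full text of the paper.",
    }
]
merged = merge_paper_records(metadata, pdf_data, min_abstract_chars=20)
print(merged)
```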
Pre-process a cross-source dataset by aligning body text with politics content, then remove unused columns and prepare a 22-sample data subset for Elasticsearch-based Q&A.
Download and run an Elasticsearch Docker container, configure a single-node cluster, and prepare a script to load sample data into Elasticsearch, enabling a knowledge base for a Q&A system.
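As a rough illustration of the single-node setup described above, the sketch below builds the `docker run` command from Python. The `discovery.type=single-node` setting and port 9200 are standard for Elasticsearch; the container name `covid-kb`, the helper names, and the image tag are assumptions, so substitute whichever version the course specifies.

```python
import subprocess


def elasticsearch_run_command(image, name="covid-kb", host_port=9200):
    """Build a `docker run` command for a single-node Elasticsearch container."""
    return [
        "docker", "run", "-d",
        "--name", name,                     # hypothetical container name
        "-p", f"{host_port}:9200",          # expose the REST API on the host
        "-e", "discovery.type=single-node", # skip multi-node cluster discovery
        image,
    ]


def start_elasticsearch(image):
    """Start the container; requires Docker to be installed and running."""
    subprocess.run(elasticsearch_run_command(image), check=True)


# The image tag below is an assumption, not taken from the course.
command = elasticsearch_run_command(
    "docker.elastic.co/elasticsearch/elasticsearch:7.9.2"
)
print(" ".join(command))
```

Calling `start_elasticsearch(...)` with the same image would launch the container, after which the knowledge base is reachable on port 9200 of the host.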
Read and format a dataset, convert it to dictionaries, and write documents with metadata to an Elasticsearch index using Haystack for a BERT NLP Q&A system.
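The conversion to dictionaries described above can be sketched as follows. The helper name, the field names, and the sample rows are hypothetical. The key that holds the document body depends on the Haystack version (early releases used `text`, later ones `content`), so it is left as a parameter here and should be checked against the installed version.

```python
def rows_to_documents(rows, text_field="body_text", content_key="text"):
    """Convert row dictionaries into document dictionaries with metadata."""
    documents = []
    for row in rows:
        text = (row.get(text_field) or "").strip()
        if not text:
            continue  # nothing to index for this paper
        # Everything except the body text travels along as metadata.
        meta = {
            key: value
            for key, value in row.items()
            if key != text_field and value is not None
        }
        documents.append({content_key: text, "meta": meta})
    return documents


# Hypothetical sample rows.
rows = [
    {"paper_id": "p1", "title": "Viral incubation", "body_text": "Full text."},
    {"paper_id": "p2", "title": "Empty paper", "body_text": None},
]
documents = rows_to_documents(rows)
print(documents)

# With Haystack installed and Elasticsearch running, a list like this is
# what gets passed to the document store's write_documents(...) method.
```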
Build the Streamlit UI, part 3: send user input to a backend handler, passing three data points (question, no answers, none of) and receiving a three-element answer list with scores.
Build a Docker image for the knowledge base (KB) handler by configuring Elasticsearch, setting up a data cluster, enabling migration across environments, and accessing it through port 9200.
Build a Docker image for the UI app, containerize three independent components, and deploy a scalable, host-agnostic UI across environments.
If you are interested in learning NLP and want to work on one of the biggest real-world NLP projects, this course is for you.
This course has been designed by professional data scientists, and our sole passion is to enable our students to develop and deploy state-of-the-art deep learning NLP models.
Our students will learn new skills at every step and section of our course and will be able to develop and deploy an NLP Q&A system that can retrieve answers from a database consisting of over 250,000 COVID research papers.
In this course we code everything, so the program is strongly project-based. It is comprehensive enough that you can easily extend the algorithms and tools to deploy an NLP question-answering system on another topic of interest.
We have worked hard to organize this course so that you can apply it to any NLP project in your work or studies. It is structured in the following way:
· Section 1: All instruments you need to complete the course
· Section 2: Accessing and saving COVID dataset
· Section 3: Data pre-processing
· Section 4: Exploratory data analysis
· Section 5: Creating Knowledge base in Elasticsearch
· Section 6: Create BERT QA Engine
· Section 7: Frontend with Streamlit
· Section 8: Dockerizing and deploying
All sections include bonus and resource materials for diving deeper into the concepts. You can test your understanding with fun yet challenging quizzes.
We look forward to welcoming you on this journey!