Mastering AI Models with Hands-On Google Colab Projects

Name: Mastering AI Models with Hands-On Google Colab Projects
Rating: 4.8 (2 reviews)

Idea Implementation with HF, ReActAgent, DeepSeek, GPT-4o, GPT2, Llama3, Mistral-7B, NLLB, diffusers, HuBERT and Bark

Created byYu Li

Last updated 3/2025

English

What you'll learn

Create Question and Answer Chatbot Using Colab, Llama 3, Mistral-7B and GPT2 Models
Generate Images from Text Using Colab and Stable Diffusion Model
Image Recognition Using Colab and GPT-4o API
Generate Voice Using Colab and Bark Model
Generate Video Using Colab and text-to-video-ms-1.7b Model
Generate description of Image Using Colab and Deepseek Janus 1.3B Model
Using AI Agent ReActAgent to answer questions based on different PDF Files

12 sections • 37 lectures • 1h 19m total length

Introduction1:10
Using Llama 3 Model for QnA. Please remember to go to the hugging face webpage of the Llama3 Model and accept Use Policy. https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
Preparation2:38
Using Hugging Face LlaMa 3 Model for QnA. Please accept the user policy on the HF webpage of this model to get the access to this model. https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
Get the Permission to access the Llama 3 Model3:14
Run the Llama 3 Model4:22

Introduction0:52
Preparation1:40
Using Hugging Face Mistral Model for QnA. Please accept the user policy on the HF webpage of this model to get the access to this model. You might need to use a newer version (e.g. Mistral-7B-Instruct-v0.3) of this model due to update of HF Website.
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
Get the Permission to access the Mistral-7B Model0:33
Run the Mistral-7B Model4:04

Table of Contents

How to use Google Colab
Translation text use NLLB Model
QnA with Llama 3 Model and Mistral-7B Model based on FaQ
QnA with GPT-2 Model based on a JSON File
Image Generation with Stable Diffusion Model
Image Recognition with GPT-4o Model
Voice Generation with Bark Model
Text to Video with text-to-video-ms-1.7b Model
Bonus
- Deepseek: Describe Image with Text Using the Janus-1.3B Model
- AI Multi-Agent: Building an AI Multi-Agent for Q&A on Two PDFs using ReAct, LlammaIndex, and OpenAI

Description

Translation text use NLLB Model
Based on a text in german, using NLLB Model we can translate it to english. NLLB Model supports for more than 200 languages as input and output.

QnA with Llama 3 Model and Mistral-7B Model based on FaQ
An FaQ text file was prepared, by using Llama 3 Model or Mistral-7B Model, questions about this FaQ file can be raised and the Chatbot will give answer based on the FaQ file.

QnA with GPT-2 Model based on a JSON File
Information about a dummy company profile was prepared in a json file, by using GPT-2 Model, the Chatbot will give answer based on the company profile, if related question is raised.
Image Generation with Stable Diffusion Model
Based on text, an image will be generated by using stable diffusion model.

Image Recognition with GPT-4o Model
Given a image of a Sculpture, using GPT-4o model, it will tell you what is inside the image and give you information about the sculpture.

Voice Generation with Bark Model
Based on the voice you provide, the Bark model tries to generate a voice file for text snippets.
Text to Video with text-to-video-ms-1.7b Model
Given a text description about a scenario, the text-to-video-ms-1.7b will generate a video for you.

Bonus
- Deepseek: Describe Image with Text Using the Janus-1.3B Model
  Given an image, the Janus-1.3B Model will generate a description of this image for you.
- AI Multi-Agent: Building an AI Multi-Agent for Q&A on Two PDFs using ReAct, LlammaIndex, and OpenAI
  Given two PDFs, ReActAgent AI Agent will use the correct one depending on the prompt to answer questions