Benchmarking, Improving AI Model - BLEU, TER, GLUE and more

Rating: 4.3 (6 reviews)

Master the art of benchmarking Machine learning models for any usage from Generative AI to narrow ai as computer vision

Created by Dan Andrei Bucureanu

Last updated 6/2025

English

What you'll learn

What machine learning benchmarking is and how it works
Standard metrics used in AI (Reliability, F1 Score, Recall)
Run a test through an API
How to run a benchmark against GLUE
How to evaluate a model with the BLEU metric
MMLU (Massive Multitask Language Understanding) Benchmarking
TruthfulQA: Evaluation of Truthfulness in Language Models
Run Benchmark against SQuAD (Stanford Question Answering Dataset)
Understand the AI Model Lifecycle
Perplexity and Bias Benchmarking
Benchmark against AI fairness: Bias in Bios
Using Hugging Face models for benchmarking and training
Computer Vision benchmark with the CIFAR-10 dataset
Benchmark RAG with RAGAs
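BLEU, named in the course title, scores a candidate translation by how many of its n-grams also appear in one or more reference translations (each n-gram's count clipped to its maximum count in any single reference), combines the 1- to 4-gram precisions with a geometric mean, and multiplies by a brevity penalty for candidates shorter than the closest reference. The following is a minimal sketch, assuming whitespace-tokenized input, uniform weights and no smoothing; `ngrams` and `sentence_bleu` are hypothetical names written here for illustration, and established tools such as NLTK's `sentence_bleu` or sacreBLEU may report different numbers because of their own tokenization, smoothing and corpus-level aggregation.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n-grams (illustrative sketch)."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram by the largest count it has in any single reference.
        max_ref_counts: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # log(0): without smoothing the whole score collapses to zero
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    candidate = "the cat is on the mat".split()
    print(sentence_bleu(candidate, [reference]))           # 4-gram BLEU
    print(sentence_bleu(candidate, [reference], max_n=2))  # bigram-only variant
```

For this pair, the unsmoothed 4-gram score is 0.0 because the candidate shares no 4-gram with the reference, while the bigram-only variant gives about 0.71 (the geometric mean of the unigram precision 5/6 and the bigram precision 3/5). That gap is why sentence-level BLEU is usually reported with smoothing, or computed at corpus level instead.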

Course content

16 sections • 83 lectures • 7h 7m total length

Introduction (7:40)
About your Instructor (2:00)
5-Minute AI Benchmark Challenge (5:04)

What makes up AI (3:45)
Natural Language Processing - NLP (4:56)
Types of Machine Learning (4:03)
Machine Learning - Supervised ML (3:57)
Machine Learning - Unsupervised ML (4:59)
Machine Learning - Reinforcement Learning (3:16)
Importance of Training Data (4:05)
What is a token in LLMs (2:38)
Weak AI vs Gen AI vs AGI - Know the difference (5:31)
Understand the differences between narrow-task AI, general-purpose AI, and artificial general intelligence (AGI).

Requirements

Some Python programming experience (helpful but not required)
Basic understanding of AI principles
Desire to learn the hottest skill on the market
$5 in OpenAI API credits (optional; free models can be used instead)
VS Code, Postman, Python, Node.js

Description

This comprehensive course delves into the essential practices, tools, and datasets for AI model benchmarking. Designed for AI practitioners, researchers, and developers, this course provides hands-on experience and practical insights into evaluating and comparing model performance across tasks like Natural Language Processing (NLP) and Computer Vision (CV).

What You’ll Learn:

Fundamentals of Benchmarking:
- Understanding AI benchmarking and its significance.
- Differences between NLP and CV benchmarks.
- Key metrics for effective evaluation.
Setting Up Your Environment:
- Installing tools and frameworks such as Python and the Hugging Face libraries, and downloading datasets such as CIFAR-10.
- Building reusable benchmarking pipelines.
Working with Datasets:
- Utilizing popular datasets like CIFAR-10 for Computer Vision.
- Preprocessing and preparing data for NLP tasks.
Model Performance Evaluation:
- Comparing performance of various AI models.
- Fine-tuning and evaluating results across benchmarks.
- Interpreting scores for actionable insights.
Tooling for Benchmarking:
- Leveraging Hugging Face and OpenAI GPT tools.
- Python-based approaches to automate benchmarking tasks.
- Utilizing real-world platforms to track performance.
Advanced Benchmarking Techniques:
- Multi-modal benchmarks for NLP and CV tasks.
- Hands-on tutorials for improving model generalization and accuracy.
Optimization and Deployment:
- Translating benchmarking results into practical AI solutions.
- Ensuring robustness, scalability, and fairness in AI models.
Benchmarking RAG Implementations:
1. RAGAS
2. Coherence
3. Confident AI DeepEval

Hands-On Modules:

Implementing end-to-end benchmarking pipelines.
Exploring CIFAR-10 for image recognition tasks.
Comparing supervised, unsupervised, and fine-tuned model performance.
Leveraging industry tools for state-of-the-art benchmarking.
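An end-to-end benchmarking pipeline boils down to one loop: send each test input to the model, collect the outputs, then score them against the expected answers with one or more metrics. The following is a minimal sketch, assuming the model under test can be wrapped as a plain Python function from prompt to answer (for example around an API call or a Hugging Face pipeline). `run_benchmark`, `exact_match` and the lookup-table `toy_model` are hypothetical names for illustration. Exact match is used only because it is the simplest metric, and latency is measured with `time.perf_counter`.

```python
import time
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# A metric takes parallel sequences of predictions and gold answers and returns one score.
Metric = Callable[[Sequence[str], Sequence[str]], float]


def exact_match(preds: Sequence[str], golds: Sequence[str]) -> float:
    """Fraction of predictions equal to the gold answer, ignoring case and outer whitespace."""
    if not golds:
        raise ValueError("empty evaluation set")
    hits = sum(p.strip().lower() == g.strip().lower() for p, g in zip(preds, golds))
    return hits / len(golds)


def run_benchmark(
    model_fn: Callable[[str], str],
    dataset: List[Tuple[str, str]],
    metrics: Dict[str, Metric],
) -> Dict[str, float]:
    """Run model_fn over (prompt, gold) pairs; return every metric plus mean latency in seconds."""
    preds: List[str] = []
    golds: List[str] = []
    latencies: List[float] = []
    for prompt, gold in dataset:
        start = time.perf_counter()
        preds.append(model_fn(prompt))
        latencies.append(time.perf_counter() - start)
        golds.append(gold)
    results = {name: metric(preds, golds) for name, metric in metrics.items()}
    results["mean_latency_s"] = mean(latencies)
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a real model or API call.
    answers = {"Capital of France?": "Paris", "2 + 2?": "5"}

    def toy_model(prompt: str) -> str:
        return answers.get(prompt, "")

    data = [("Capital of France?", "paris"), ("2 + 2?", "4")]
    print(run_benchmark(toy_model, data, {"exact_match": exact_match}))
```

Because every metric is just a function of predictions and references, other scores such as BLEU or F1 can be added as extra entries in the `metrics` dictionary. Comparing several models then means calling `run_benchmark` once per model on the same dataset.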

Who this course is for:

AI Engineers
AI Project Managers
ML Testers
AI Testers
Product Owners who work with AI

Course sections

Introduction (3 lectures • 15min)
What is benchmarking and how does it work (5 lectures • 26min)
Introduction to AI - Optional if you know the basics of AI (9 lectures • 37min)
Setting up the Environment (8 lectures • 21min)
Hugging Face Platform - AI Engineer repo (4 lectures • 26min)
Common Traditional Metrics for LLMs and ML Models, and How to Calculate Them (7 lectures • 25min)
Data Splitting into Folds - K-Fold Techniques (5 lectures • 40min)
GLUE - Benchmark against NLP (5 lectures • 29min)
Benchmark RAG Performance Pipeline - TREC-RAG (6 lectures • 48min)
Performance Characteristics for AI Models (5 lectures • 19min)
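The metrics and K-fold sections listed above fit together naturally. K-fold cross-validation splits the data into k folds, trains on k-1 of them, scores on the held-out fold, and repeats until every example has been tested exactly once. The spread of the per-fold scores shows how stable a metric such as F1 really is. The following is a minimal pure-Python sketch, assuming a binary 0/1 task. `k_fold_indices`, `f1_score`, `cross_validate_f1` and the threshold "model" `fit_threshold` are hypothetical helpers written for illustration. In practice, scikit-learn's `KFold` and `f1_score` cover the same ground.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Predictor = Callable[[float], int]


def k_fold_indices(n: int, k: int, seed: Optional[int] = None) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs; each index is in exactly one test fold."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(order)  # optional, reproducible shuffle
    splits, start = [], 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)  # spread the remainder over the first folds
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        splits.append((train, test))
        start += size
    return splits


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1, the harmonic mean of precision and recall; 0.0 when undefined."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def fit_threshold(xs: Sequence[float], ys: Sequence[int]) -> Predictor:
    """Hypothetical toy model: predict 1 at or above the midpoint of the two class means.
    Assumes both classes appear in the training split."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x >= threshold)


def cross_validate_f1(
    xs: Sequence[float],
    ys: Sequence[int],
    fit: Callable[[Sequence[float], Sequence[int]], Predictor],
    k: int = 5,
    seed: Optional[int] = None,
) -> List[float]:
    """Train on k-1 folds, compute F1 on the held-out fold, and return one score per fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k, seed):
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        y_pred = [predict(xs[i]) for i in test_idx]
        scores.append(f1_score([ys[i] for i in test_idx], y_pred))
    return scores


if __name__ == "__main__":
    xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.15, 0.85]
    ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
    scores = cross_validate_f1(xs, ys, fit_threshold, k=5)
    print(scores, sum(scores) / len(scores))
```

Real datasets are usually shuffled (the `seed` argument) or stratified so that each fold keeps the overall class balance. Without that, a fold containing only one class makes F1 undefined or breaks the training step.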
