
Automate machine learning on your data with Databricks AutoML: prepare the data, define the prediction target, and train model trials for regression, classification, and forecasting, with the best model presented in Python notebooks.
Print experiment summaries, view the best trial's path and MLflow run ID, then generate predictions on the test set using the Databricks Python API and evaluate them with R2 and RMSE.
Explore AutoML classification workflows from both Python and the UI, compare models by F1 score and by hyperparameters such as n_estimators, generate predictions with MLflow, pandas, and Spark, and inspect the results with confusion matrices.
Explore building and updating a Databricks feature store, training a wine quality model with real-time measurement features, and performing batch scoring with versioned model deployment.
Explore the MLflow logging API for managing runs and experiments, logging parameters, metrics, artifacts, and models in notebooks, and creating different experiments with multiple runs.
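The R2 and RMSE evaluation mentioned in the lectures above can be sketched in plain Python. This is a minimal illustration of the two formulas only, not the Databricks or MLflow API; the function names `rmse` and `r2_score` and the sample values are assumptions made for this example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    squared_errors = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared_errors / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical test-set labels and model predictions (illustrative values only).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

print(f"RMSE: {rmse(y_true, y_pred):.4f}")
print(f"R2:   {r2_score(y_true, y_pred):.4f}")
```

A lower RMSE and an R2 closer to 1 both indicate a better fit. In practice you would most likely compute these with a library rather than by hand.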
Welcome to our comprehensive course on the Databricks Certified Machine Learning Engineer Associate certification. This course is designed to help you master the skills required to become a certified Databricks ML engineer associate.
Databricks is a cloud-based data analytics platform that offers a unified approach to data processing, machine learning, and analytics. With the growing demand for data engineers, Databricks expertise has become one of the most sought-after skills in the industry.
The minimally qualified candidate should be able to:
Use Databricks Machine Learning and its capabilities within machine learning workflows, including:
Databricks Machine Learning (clusters, Repos, Jobs)
Databricks Runtime for Machine Learning (basics, libraries)
AutoML (classification, regression, forecasting)
Feature Store (basics)
MLflow (Tracking, Models, Model Registry)
Implement correct decisions in machine learning workflows, including:
Exploratory data analysis (summary statistics, outlier removal)
Feature engineering (missing value imputation, one-hot encoding)
Tuning (hyperparameter basics, hyperparameter parallelization)
Evaluation and selection (cross-validation, evaluation metrics)
Implement machine learning solutions at scale using Spark ML and other tools, including:
Distributed ML Concepts
Spark ML Modeling APIs (data splitting, training, evaluation, estimators vs. transformers, pipelines)
Hyperopt
Pandas API on Spark
Pandas UDFs and Pandas Function APIs
Understand advanced scaling characteristics of classical machine learning models, including:
Distributed Linear Regression
Distributed Decision Trees
Ensembling Methods (bagging, boosting)
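As a small illustration of the cross-validation item in the outline above, here is a sketch of how k-fold splitting partitions a dataset into training and validation indices. It is plain Python written under stated assumptions, not the Spark ML or scikit-learn implementation; the helper name `k_fold_indices` is hypothetical, and the folds are contiguous and unshuffled, whereas real workflows typically shuffle first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first `extra` folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# Example: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
for i, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {i}: train={train_idx} val={val_idx}")
```

Each sample appears in exactly one validation fold, so averaging a metric over the folds uses every row for validation once.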