Snowflake Data Scientist Certification DSA-C02 Exam!!
Rating: 4.4 out of 5(7 ratings)
94 students

Take the next advanced step in your Snowflake Data Cloud journey!
Created by Reshma Chhabra
Last updated 7/2024
English

What you'll learn

  • Content curated from the most up-to-date SnowPro® Advanced: Data Scientist exam study guide.
  • 100% correct explanations and Snowflake documentation links for the exam questions.
  • Comprehensive, practical questions based on user scenarios, with a detailed solution guide.
  • Question sets drawn from experienced Snowflake data scientists.

Included in This Course

130 questions
  • Snowflake Advanced Data Scientist DSA-C02 Set A: 65 questions
  • Snowflake Advanced Data Scientist DSA-C02 Set B: 65 questions

Description

The Snowflake SnowPro Advanced: Data Scientist exam tests the advanced knowledge and skills used to apply comprehensive data science principles, tools, and methodologies using Snowflake. The exam assesses these skills through scenario-based questions and real-world examples.

This certification will test the ability to:

● Outline data science concepts

● Implement Snowflake data science best practices

● Prepare data and perform feature engineering in Snowflake

● Train and use machine learning models

● Use data visualization to present a business case

Domain                                             Exam Weighting
1.0 Data Science Concepts                          15%
2.0 Data Pipelining                                19%
3.0 Data Preparation and Feature Engineering       30%
4.0 Model Development                              20%
5.0 Model Deployment                               16%


Domain 1.0: Data Science Concepts

1.1 Define machine learning concepts for data science workloads.

● Machine Learning

○ Supervised learning

○ Unsupervised learning

1.2 Outline machine learning problem types.

● Supervised Learning

○ Structured Data

■ Linear regression

■ Binary classification

■ Multi-class classification

■ Time-series forecasting

○ Unstructured Data

■ Image classification

■ Segmentation

● Unsupervised Learning

○ Clustering

○ Association models

1.3 Summarize the machine learning lifecycle.

● Data collection

● Data visualization and exploration

● Feature engineering

● Training models

● Model deployment

● Model monitoring and evaluation (e.g., model explainability, precision, recall, accuracy, confusion matrix)

● Model versioning

1.4 Define statistical concepts for data science.

● Normal versus skewed distributions (e.g., mean, outliers)

● Central limit theorem

● Z and T tests

● Bootstrapping

● Confidence intervals
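
Bootstrapping and confidence intervals from the list above can be illustrated together. The following is a minimal sketch of a percentile bootstrap for the mean, using only the Python standard library. The function name `bootstrap_mean_ci`, the sample values, and the defaults (2,000 resamples, a fixed seed) are assumptions made for illustration; they are not taken from the exam guide.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical, right-skewed sample (one large outlier).
spend = [12.0, 15.5, 9.8, 22.1, 14.3, 18.7, 11.2, 16.4, 13.9, 40.0]
low, high = bootstrap_mean_ci(spend)
print(f"95% bootstrap CI for the mean: [{low:.2f}, {high:.2f}]")
```

The percentile method makes no normality assumption, which is why it is often used on skewed samples such as this one.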


Domain 2.0: Data Pipelining

2.1 Enrich data by consuming data sharing sources.

● Snowflake Marketplace

● Direct Sharing

● Shared database considerations

2.2 Build a data science pipeline.

● Automation of data transformation with streams and tasks

● Python User-Defined Functions (UDFs)

● Python User-Defined Table Functions (UDTFs)

● Python stored procedures

● Integration with machine learning platforms (e.g., connectors, ML partners, etc.)


Domain 3.0: Data Preparation and Feature Engineering

3.1 Prepare and clean data in Snowflake.

● Use Snowpark for Python and SQL

○ Aggregate

○ Joins

○ Identify critical data

○ Remove duplicates

○ Remove irrelevant fields

○ Handle missing values

○ Data type casting

○ Sampling data

3.2 Perform exploratory data analysis in Snowflake.

● Snowpark and SQL

○ Identify initial patterns (i.e., data profiling)

○ Connect external machine learning platforms and/or notebooks (e.g., Jupyter)

● Use Snowflake native statistical functions to analyze and calculate descriptive data statistics.

○ Window Functions

○ MIN/MAX/AVG/STDEV

○ VARIANCE

○ TOPn

○ Approximation/high-performing functions

● Linear Regression

○ Find the slope and intercept

○ Verify the relationship between the dependent and independent variables
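
To make the "slope and intercept" item concrete, here is a minimal sketch of ordinary least squares for one independent variable in plain Python. Snowflake SQL offers the aggregate functions `REGR_SLOPE` and `REGR_INTERCEPT` for the same quantities; the helper name `fit_line` and the data below are hypothetical and only illustrate the arithmetic.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope, intercept, and Pearson r for y against x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # strength of the linear relationship
    return slope, intercept, r


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept, r = fit_line(xs, ys)
print(slope, intercept, r)
```

A correlation coefficient near 1 or -1 suggests a strong linear dependence between the two variables; a value near 0 suggests little linear dependence.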

3.3 Perform feature engineering on Snowflake data.

● Preprocessing

○ Scaling data

○ Encoding

○ Normalization

● Data Transformations

○ DataFrames (i.e., pandas, Snowpark)

○ Derived features (e.g., average spend)

● Binarizing data

○ Binning continuous data into intervals

○ Label encoding

○ One hot encoding
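
The preprocessing items above (scaling, binning, label encoding, one-hot encoding) can be sketched in a few lines of plain Python. This is an illustration of the transformations themselves, not of any Snowflake or Snowpark API; all function names, bin edges, and sample values are assumptions.

```python
import bisect


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def bin_value(value, upper_edges):
    """Return the index of the interval containing value.

    Intervals are (-inf, e0], (e0, e1], ..., (e_last, +inf).
    """
    return bisect.bisect_left(upper_edges, value)


def label_encode(categories):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def one_hot_encode(categories):
    """Turn each category into a 0/1 vector with a single 1."""
    labels, mapping = label_encode(categories)
    return [[1 if i == lab else 0 for i in range(len(mapping))] for lab in labels]


ages = [18, 25, 40, 70]           # hypothetical continuous feature
colors = ["red", "blue", "red", "green"]  # hypothetical categorical feature

age_bins = [bin_value(a, [18, 35, 60]) for a in ages]
color_labels, color_mapping = label_encode(colors)
color_one_hot = one_hot_encode(colors)
scaled = min_max_scale([10, 20, 30])
```

Label encoding imposes an arbitrary order on the categories, so one-hot encoding is usually the safer choice for nominal features fed to linear models.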

3.4 Visualize and interpret the data to present a business case.

● Statistical summaries

○ Snowsight with SQL

○ Streamlit

○ Interpret open-source graph libraries

○ Identify data outliers

● Common types of visualization formats

○ Bar charts

○ Scatterplots

○ Heat maps


Domain 4.0: Model Development

4.1 Connect data science tools directly to data in Snowflake.

● Connecting Python to Snowflake

○ Snowpark

○ Python connector with Pandas support

○ Spark connector

● Snowflake Best Practices

○ One platform, one copy of data, many workloads

○ Enrich datasets using the Snowflake Marketplace

○ External tables

○ External functions

○ Zero-copy cloning for training snapshots

○ Data governance

4.2 Train a data science model.

● Hyperparameter tuning

● Optimization metric selection (e.g., log loss, AUC, RMSE)

● Partitioning

○ Cross validation

○ Train validation hold-out

● Down/Up-sampling

● Training with Python stored procedures

● Training outside Snowflake through external functions

● Training with Python User-Defined Table Functions (UDTFs)

4.3 Validate a data science model.

● ROC curve/confusion matrix

○ Calculate the expected payout of the model

● Regression problems

● Residuals plot

○ Interpret graphics with context

● Model metrics
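
The confusion matrix and expected payout items can be illustrated with a small sketch. The payoff values per outcome are hypothetical business assumptions, and "expected payout" is interpreted here as the average payoff per prediction; the exam guide does not prescribe a formula.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["tn" if actual == 0 else "fn"] += 1
    return counts


def expected_payout(counts, payoffs):
    """Average payoff per prediction, given a payoff for each outcome type."""
    total = sum(counts.values())
    return sum(counts[k] * payoffs[k] for k in counts) / total


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)

precision = counts["tp"] / (counts["tp"] + counts["fp"])
recall = counts["tp"] / (counts["tp"] + counts["fn"])
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)

# Hypothetical payoffs: a caught positive earns 100, a false alarm costs 20,
# a missed positive costs 50, and a correct negative is neutral.
payoffs = {"tp": 100, "fp": -20, "fn": -50, "tn": 0}
payout = expected_payout(counts, payoffs)
```

Weighting each cell of the confusion matrix by its business value lets two models with the same accuracy be compared on what actually matters to the business case.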

4.4 Interpret a model.

● Feature impact

● Partial dependence plots

● Confidence intervals


Domain 5.0: Model Deployment

5.1 Move a data science model into production.

● Use an external hosted model

○ External functions

○ Pre-built models

● Deploy a model in Snowflake

○ Vectorized/Scalar Python User Defined Functions (UDFs)

○ Pre-built models

○ Storing predictions

○ Stage commands

5.2 Determine the effectiveness of a model and retrain if necessary.

● Metrics for model evaluation

○ Data drift/model decay

■ Data distribution comparisons

● Does the data used for predictions look similar to the training data?

● Do the same data points give the same predictions once a model is deployed?

● Area under the curve

● Accuracy, precision, recall

● User defined functions (UDFs)
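
One possible way to compare the distribution of training data with the data seen at prediction time is the population stability index (PSI). PSI is not named in the exam guide; it is used here only as an example of a data distribution comparison. The function name, bin edges, and sample values are assumptions.

```python
import bisect
import math


def population_stability_index(expected, actual, edges, eps=1e-6):
    """PSI between two samples, using shared bin edges.

    A value of 0 means identical binned distributions; larger values mean
    the distributions differ more.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor empty bins at eps to avoid log(0) and division by zero.
        return [max(c / len(values), eps) for c in counts]

    e_shares = shares(expected)
    a_shares = shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))


training = list(range(1, 11))          # hypothetical training feature values
scoring = [v + 3 for v in training]    # same feature, shifted upward in production
edges = [3, 6, 9]

psi_same = population_stability_index(training, training, edges)
psi_shifted = population_stability_index(training, scoring, edges)
```

A check like this could be scheduled to run on each new batch of scoring data, with a rising value used as a signal to investigate or retrain.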

5.3 Outline model lifecycle and validation tools.

● Streams and tasks

● Metadata tagging

● Model versioning with partner tools

● Automation of model retraining

Who this course is for:

  • Data Scientists, ML Engineers, AI Engineers, Data Engineers