
Explore software integration using Python, SQL, and Tableau through a case-study workflow: from data connectivity and preprocessing to logistic regression, database integration, and Tableau visualization.
Explore core concepts behind data, servers, clients, requests, and responses, including how databases store information, how web and database servers interact in the client-server model, and how browsers request data.
Explore how data connectivity links clients and servers through APIs and endpoints, enabling real-time data transfer and access to data assets via apps and developers.
Explore how APIs act as gateways between clients and servers, with endpoint-driven access to data, enabling multiple services and faster, targeted information retrieval across apps.
Explore how text files enable cross-language data exchange via APIs, using json as the common format to connect apps and servers across languages.
Define integration in programming as both cross-system communication and multi-tool unification, then show how SQL, Python, and Tableau drive data analysis and visualization.
Install Anaconda, Python, and Jupyter Notebook to set up your data science environment and learn how we will install packages and code in Python.
Python proves ideal for data science: open-source, cross-platform, general-purpose, and high-level with easy syntax and rich packages; Jupyter links Python to a browser-based notebook for integrated code, text, and output.
Install Anaconda to get Python, Jupyter notebook, and data science packages; learn to download, install, and launch Anaconda Navigator and Jupyter dashboard across Windows, Mac, and Linux.
Explore the Jupyter Notebook dashboard, the app’s starting point, noting occasional updates, and manage folders, files, and notebooks by renaming, deleting, and uploading or creating new notebooks.
Explore the Jupyter notebook shell, learn about cells, and switch between command mode and edit mode to write, run, and view code outputs.
Master Jupyter shortcuts to speed up coding by running cells and navigating in command mode. Collapse or expand code, show line numbers with Shift+L, and convert between code and markdown.
Learn to read and interpret Python error messages in Jupyter, identify name errors, fix typos, and Google solutions by adjusting variables and code accordingly.
Learn how to restart the Jupyter kernel and clear outputs. Run all cells from the first to the last and handle errors as you verify code execution.
Install scikit-learn with pip in the Anaconda prompt, leveraging Anaconda’s bundled NumPy and Pandas, and get ready to run machine learning experiments for the course.
Explore integrating SQL, Python, and Tableau to manage data like a business analyst, build predictive insights with logistic regression, and present findings through clear visualizations.
Predict work absenteeism from data on factors like distance to work, family and education, to estimate hours away and guide productivity improvements.
Explore a real-world dataset on predicting absenteeism at work, distinguish primary and secondary data, and practice data pre-processing to transform raw data into analysis-ready information.
Import the pandas library as pd, load a csv into a pandas data frame with read_csv, and inspect raw_csv_data, noting quote usage and file paths.
Inspect column names and zero-based indices to preview data, copy the initial dataset to df for safe pre-processing, and use display options plus info to confirm no missing values.
Drop the id column as it is a nominal identifier that does not explain absenteeism time, and assign the result to df to make the change permanent.
Extract the reason for absence from the data frame and use pandas unique, min, max, and len to reveal distinct values and detect a missing number.
Convert the categorical nominal reason for absence column into 28 dummy variables with pandas get_dummies, validate that rows have a single reason, and drop the check column.
Learn how to drop the first dummy variable in Python with get_dummies(drop_first=True) to avoid multicollinearity, preparing data for grouping in upcoming analysis.
Group dummy variables into categorical classes to reduce dimensionality and avoid multi-collinearity, drop the original reason column, and use loc and max to create four reason-type groups.
Concatenate the reason type columns to the main dataframe with pandas pd.concat using axis=1, rename the columns for clarity, and preview the updated df with head.
Learn how to reorder columns in a pandas data frame by creating a column_names_reordered list, moving the last four columns to the front, and applying the new order.
Learn how to create checkpoints in Python and Jupyter by copying the current df state, using a named variable like df_reason_mod to safely test data preprocessing steps.
Create a day of the week column from a date column using date_to_weekday and apply to the data frame, illustrating zero to six weekday values and prep for SQL transfer.
Analyze the next five dataframe columns—transportation expense, distance to work, age, daily workload average, and body mass index—highlighting data types, rounding, and their role in regression analysis of absenteeism.
Transform the education column into a dummy variable with map and a dictionary after computing its unique values and counts, while leaving children and pets unchanged.
Master preprocessing in Python for absenteeism time in hours data. Create a df_preprocessed checkpoint from df_reason_date_mod and prepare the frame for statistical analysis before transferring to MySQL workbench.
Explore framing a machine learning problem with a clean notebook, compare logistic regression and random forest, and predict absenteeism from features like reason for absence, workload, and distance to work.
Use logistic regression to classify absenteeism into two classes by using the median hours as the cutoff, creating a zeros-and-ones targets array.
Choose the regression inputs with pandas iloc, selecting all rows and the first fourteen columns while excluding the target column; store the result as unscaled_inputs for later scaling.
Standardize data with a StandardScaler by fitting on unscaled inputs and transforming data to subtract the mean and divide by the standard deviation, producing scaled_inputs 700 observations and 14 features.
Learn to prevent overfitting by splitting data into train and test sets using train_test_split. Control train_size and shuffle, and fix randomness with random_state for reproducible results.
Train a logistic regression with sklearn, assess accuracy on training data using both score and a manual comparison method showing 80%.
Extract the intercept and coefficients from the logistic regression, align them with feature names from unscaled_inputs, and build a summary table for use in Tableau.
Interpret logistic regression coefficients as weights or log odds, convert to odds ratios, and assess feature importance using standardized coefficients and the base model with reason zero as baseline.
Omit dummy variables from standardization with a CustomScaler, preserving interpretability while selectively scaling features; dummies stay untouched, with a minor accuracy drop and clearer coverage of absenteeism reasons.
Identify the key predictors of excessive absence, including poisoning, diseases, pregnancy, transportation expense, and pets, and compare standardized vs. unstandardized models for accuracy and interpretability.
Apply backward elimination to simplify a regression model, removing near-zero contributors like day of the week, daily workload average, and distance to work, then verify the simpler model remains accurate.
Test the model on unseen data to reach about 74% accuracy, compare with 77% training accuracy, and use predict_proba for class probabilities; save the model for SQL and Tableau use.
Save and deploy machine learning models by pickling the trained logistic regression and the pre-processing scaler, then load them in a new notebook for consistent predictions.
Deploy a machine learning model by saving it and building a reusable Python module to load and clean data and predict probabilities and categories with an absenteeism_model class.
Install MySQL Workbench and MySQL Server via the community installer, select custom features, and set a root password for startup; the video notes cross-platform steps and version considerations.
Set up and manage a MySQL Workbench connection between the GUI and the MySQL server, test the connection, and navigate multiple connections including the default one.
Explore the MySQL Workbench interface to write and run SQL queries, view results in the result grid, and monitor script output while managing schemas and SQL scripts.
Learn to run the absenteeism module in Python using a local custom module in Jupyter, with five required files: notebook, csv, the module file, and the supplementary model and scaler.
Import the absenteeism module, load and clean the new data with Absenteeism_new_data, and run predicted_outputs to generate absenteeism probabilities.
Create and configure a MySQL database named predicted_outputs using drop if exists and use commands, then connect from Python to enable integration with SQL, Python, and Tableau.
Install PyMySQL with pip, import PyMySQL in Jupyter, and establish a bridge between MySQL Workbench and Python to connect to the predicted outputs database.
Open a MySQL connection from a Jupyter notebook with PyMySQL.connect, creating a conn variable for the database, then instantiate a cursor to run SQL in Python.
Drop any existing predicted_outputs table in MySQL, create it with not null constraints, and assign bit, int, and float data types for Python and Tableau compatibility.
Learn how to run a select statement from Python using a cursor, execute the query, and handle empty predicted_outputs_table while guarding against risky variable queries.
Move data from Python to MySQL using a single multi-row insert, storing new predictions in df_new_obs and inserting into the predicted_outputs table in Workbench for efficient data transfer.
Create insert query by looping over df_new_obs rows and columns, extracting values and converting them to strings. Remove trailing commas and append a semicolon for MySQL.
Apply the execute method to insert data, commit the connection, and verify 40 new records in Workbench, then export the dataset as a CSV for Tableau.
Analyze the absenteeism model in Tableau, using a csv dataset of 40 observations to visualize age versus probability and compare logistic regression predictions with other models. Display probabilities as percentages.
Explore how to use Tableau to relate absence reasons to the probability of excessive absence. Convert measures to dimensions, set continuous variables, and interpret four reasons to gain actionable insights.
In Tableau, analyze how transportation expense relates to the probability of excessive absence using a scatterplot, percent formatting, and size/color filters by number of children to reveal trends.
Python, SQL, and Tableau are three of the most widely used tools in the world of data science.
Python is the leading programming language;
SQL is the most widely used means for communication with database systems;
Tableau is the preferred solution for data visualization;
To put it simply – SQL helps us store and manipulate the data we are working with, Python allows us to write code and perform calculations, and then Tableau enables beautiful data visualization. A well-thought-out integration stepping on these three pillars could save a business millions of dollars annually in terms of reporting personnel.
Therefore, it goes without saying that employers are looking for Python, SQL, and Tableau when posting Data Scientist and Business Intelligence Analyst job descriptions. Not only that, but they would want to find a candidate who knows how to use these three tools simultaneously. This is how recurring data analysis tasks can be automated.
So, in this course we will to teach you how to integrate Python, SQL, and Tableau. An essential skill that would give you an edge over other candidates. In fact, the best way to differentiate your job resume and get called for interviews is to acquire relevant skills other candidates lack. And because, we have prepared a topic that hasn’t been addressed elsewhere, you will be picking up a skill that truly has the potential to differentiate your profile.
Many people know how to write some code in Python.
Others use SQL and Tableau to a certain extent.
Very few, however, are able to see the full picture and integrate Python, SQL, and Tableau providing a holistic solution. In the near future, most businesses will automate their reporting and business analysis tasks by implementing the techniques you will see in this course. It would be invaluable for your future career at a corporation or as a consultant, if you end up being the person automating such tasks.
Our experience in one of the large global companies showed us that a consultant with these skills could charge a four-figure amount per hour. And the company was happy to pay that money because the end-product led to significant efficiencies in the long run.
The course starts off by introducing software integration as a concept. We will discuss some important terms such as servers, clients, requests, and responses. Moreover, you will learn about data connectivity, APIs, and endpoints.
Then, we will continue by introducing the real-life example exercise the course is centered around – the ‘Absenteeism at Work’ dataset. The preprocessing part that follows will give you a taste of how BI and data science look like in real-life on the job situations. This is extremely important because a significant amount of a data scientist’s work consists in preprocessing, but many learning materials omit that
Then we would continue by applying some Machine Learning on our data. You will learn how to explore the problem at hand from a machine learning perspective, how to create targets, what kind of statistical preprocessing is necessary for this part of the exercise, how to train a Machine Learning model, and how to test it. A truly comprehensive ML exercise.
Connecting Python and SQL is not immediate. We have shown how that’s done in an entire section of the course. By the end of that section, you will be able to transfer data from Jupyter to Workbench.
And finally, as promised, Tableau will allow us to visualize the data we have been working with. We will prepare several insightful charts and will interpret the results together.
As you can see, this is a truly comprehensive data science exercise. There is no need to think twice. If you take this course now, you will acquire invaluable skills that will help you stand out from the rest of the candidates competing for a job.
Also, we are happy to offer a 30-day unconditional no-questions-asked-money-back-in-full guarantee that you will enjoy the course.
So, let’s do this! The only regret you will have is that you didn’t find this course sooner!