
Explore the World Development Indicators (WDI) dataset, with country data in country.csv (region and income group) and indicator data in indicator.csv, enabling Spark analytics and time-series dashboards.
Explore the World Development Indicators categories (economic, social, demographic, health, and environmental) and use more than 1,400 indicators to build Spark visualizations and comparisons.
Understand the WDI data structure, including attributes like country name, region, ISO codes, indicator names and codes, year, and value. Metadata adds context for time-series analysis in Spark.
Set up the Java environment by editing /etc/profile to configure the Java home, Java path, and JRE settings, then verify with echo $JAVA_HOME.
Install and run Apache Zeppelin on Ubuntu by downloading Zeppelin 0.12.0, extracting the tarball, starting the daemon, and accessing the notebook interface on localhost:8080.
Explore Apache Zeppelin's features and benefits for big data analytics, including multi-language support, real-time execution, and native integration with Spark, Hive, HDFS, and Hadoop for collaboration and sharing across teams.
Explore how Apache Zeppelin converts data into visual insights with tables and charts like bar, line, pie, and scatter plots, and learn to label axes for dashboards.
Explore the five default charts in Zeppelin and create bar, pie, area, line, and scatter charts from an employee DataFrame, highlighting interactive visuals for business decisions.
Run Spark SQL queries on DataFrames within Apache Zeppelin and visualize results with built-in charts. Cache DataFrames to speed up repeated queries and enable interactive analytics.
Visualize Spark outputs in Apache Zeppelin by turning Spark DataFrames into interactive tables and charts using temp views and the %sql interpreter, with bar, line, and pie charts.
Load CSV data into Spark DataFrames for the country and indicator datasets from the World Development Indicators project, setting the header and inferSchema options to true, and show the first 20 rows.
Analyze the youth literacy rate using Spark SQL on the World Development Indicators dataset, ranking countries by literacy in 1990 and 2010, and visualizing results for policy insights.
Analyze urban population growth in India and China using Spark SQL and the World Development Indicators dataset to compare urbanization trends and their impact on infrastructure, housing, and SDG 11.
Analyze income trends in four poor countries from 1960 to 2014 using Spark SQL and the World Development Indicators dataset, measuring gross national income per capita in US dollars.
Compare 2014 average income across Malawi, China, Luxembourg, and the United States using gross national income (GNI) per capita, ranked by income with Spark SQL, highlighting global inequality and China's rise.
Learn how to create a free Databricks account by navigating the sign-up flow, entering your work email, confirming your registration, and signing in to access the platform.
Explore the World Development Indicators analytics project using Apache Spark to load World Bank data, analyze global indicators, and visualize the results in a Spark notebook.
Explore Spark notebook basics by connecting to a cluster, executing code on the cluster, and using notebook cells to insert, edit, and document workflows with magic commands.
Learn to load data into a Spark DataFrame for World Development Indicators analytics by uploading files, enabling the header and inferred-schema options, and previewing the schema and data for analysis.
Learn to extract and plot the youth literacy rate from the World Development Indicators by joining the country and indicator tables, filtering by indicator code, and querying for 1990 and 2010.
Examine the ten countries with the lowest average income in 1962 and 2014, comparing income values across years using World Development Indicators data.
Explore life expectancy in France from 1960 to 2000 using a line chart to visualize indicators and plot the x values for the selected country data.
Explore how to view world per capita income in 2013 by selecting indicators and a country, then display the data on a world map with the appropriate plotting options.
Apache Spark Project: World Development Indicators Analytics
Are you ready to take your Apache Spark and Big Data skills to the next level by working on a real-world analytics project?
In this hands-on course, we’ll use Apache Spark, Spark SQL, and Apache Zeppelin to analyze one of the most important and widely used datasets in the world — the World Bank’s World Development Indicators (WDI). Covering over 200 countries, 50+ years of data, and hundreds of economic, social, demographic, health, and environmental indicators, this project is the perfect way to apply your Spark skills to real-world problems.
You’ll learn step by step how to:
Set up Spark and Zeppelin on your system (Windows, Ubuntu, or Docker)
Load and explore massive datasets with Spark DataFrames
Write Spark SQL queries to analyze GDP, literacy, poverty, trade, population, life expectancy, urbanization, and more
Build interactive visualizations and dashboards in Zeppelin
Compare economic and social development patterns across countries, regions, and decades
Deliver a resume-ready Spark project that you can showcase in interviews
What makes this course different?
Practical, project-based approach: Learn Spark by solving real-world questions.
Step-by-step guidance: Easy to follow, even if you’re new to Spark.
Comprehensive coverage: From environment setup to data exploration to insights.
Portfolio-ready project: By the end, you’ll have a complete Spark + Zeppelin project to demonstrate your skills.
Who is this course for?
Beginners who want to break into Big Data and Analytics with a hands-on project.
Data engineers & data analysts looking to strengthen their Spark SQL and Zeppelin skills.
Job seekers & interview candidates who need a portfolio project to stand out.
Anyone interested in exploring global development trends through the power of big data.
Real-World Case Studies Covered
Gini Index (Income Inequality)
Youth Literacy Rates
GDP per Capita (PPP) for India & China
Trade, Imports & Exports Analysis
Poverty Alleviation Trends
Life Expectancy in India, China & France
Urbanization & Infant Mortality Studies
Richest vs Poorest Countries (1962 vs 2014)
Birth Rates in G7 Countries
Global Per Capita Income in 2013
By the end of this course, you will be able to:
Confidently work with Apache Spark, Spark SQL, and Zeppelin.
Perform advanced data analysis on large, real-world datasets.
Build interactive notebooks and dashboards for visualization.
Showcase your Spark project in interviews and on your resume.
This is not just another Spark course — it’s a career-boosting project that prepares you for the real-world challenges of data engineering and analytics.