
Explore open source tools to aggregate, store, process, and analyze data across batch and streaming workloads. Build practical skills to select the right tools and strategies for real-world data challenges.
Identify data movement strategies and tools for migrating data between stores and formats, considering location, formats (binary, CSV, Avro), transformation, and encryption. Distinguish bounded from unbounded data and select suitable Kafka connectors.
Choose the right data store for the job by evaluating relational databases, wide-column stores, key-value stores, document stores, graph stores, and object stores.
Explore batch and real-time data processing with open source tools, comparing bounded versus unbounded datasets. Learn how Spark, Beam, Flink, and Kafka enable streaming, aggregation, and real-time analytics.
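The bounded versus unbounded distinction can be illustrated without any framework. The following is a minimal sketch in plain Python, not Spark, Beam, Flink, or Kafka code; the function names and the order amounts are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator


def batch_total(orders: list[float]) -> float:
    """Bounded data: the full dataset is available, so aggregate once."""
    return sum(orders)


def running_totals(orders: Iterable[float]) -> Iterator[float]:
    """Unbounded data: emit an updated aggregate as each record arrives."""
    total = 0.0
    for amount in orders:
        total += amount
        yield total


# Hypothetical order amounts.
amounts = [10.0, 5.5, 4.5]
final = batch_total(amounts)            # one result after all data is read
partials = list(running_totals(amounts))  # one result per arriving record
```

A batch job produces a single answer once the input ends, while a streaming job has no end to wait for and must keep publishing intermediate results.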
Introduce open source solutions for massive data workloads, balancing batch and real-time processing for an online grocery store while selecting the right tools for storage, analysis, and visualization.
Choose the right data store to enforce relationships, normalize data, and support caching, flexible documents, and graph networks for a grocery data ecosystem.
Consume stored data and share it with customers and partners in Avro or other formats, then visualize data from Elasticsearch and the API using Kibana dashboards.
Create a free Azure account to receive a $200 credit for 30 days, then switch to pay-as-you-go, add a debit card, and scale down VMs to save costs.
Learn to set up web development tools with WebStorm, choose a browser (Chrome or Edge), and install the CLI tools and package manager for Angular projects.
Download the Git repository, log in, and prepare your local development environment to set up the Kubernetes cluster using the Azure Resource Manager template.
Provision a two-tier Kubernetes cluster using a template and parameters, with one system node and three agent nodes, then execute the deploy script to create resources.
Validate the newly created Kubernetes cluster, review resource groups and pool setup, retrieve cluster credentials, and use kubectl to inspect the nodes and scale the cluster for cost efficiency.
Publish cluster resources with Helm as a single unit, using install, upgrade, and uninstall, and explore namespaces, services, and persistent volumes.
Set up and explore a debugger container on Kubernetes with Helm charts, inspect deployments in the debugger namespace, and deploy MySQL 5.6, Postgres, Cassandra, and MongoDB.
Set up the MySQL infrastructure using data definition language and data manipulation language files, install and verify the database, and confirm its IP address before proceeding to Cassandra.
Set up Redis, verify it is running, and inspect the container status and internal versus external IP addresses for application communication; then proceed to the Elasticsearch and Kibana cluster.
Learn to set up Elasticsearch and Kibana, monitor readiness across namespaces, manage their dependency, and use a persistent volume with the watch command before moving to the next setup.
Set up Neo4j in standalone mode, accept the enterprise license, configure the login password, retrieve the password if needed, log into the database, and verify ports and external IP.
Learn to provision a Kafka ecosystem by installing Zookeeper, broker, Schema Registry, Kafka Connect, and KSQL in dependency order, verifying readiness before deployment.
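Installing components in dependency order amounts to a topological sort. The sketch below uses Python's standard `graphlib` module; the dependency map is an assumption made for illustration (a simple chain), not the course's actual chart configuration, and the component names are hypothetical labels.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each component maps to what must be ready first.
DEPENDS_ON = {
    "zookeeper": set(),
    "broker": {"zookeeper"},
    "schema-registry": {"broker"},
    "kafka-connect": {"schema-registry"},
    "ksql": {"kafka-connect"},
}


def install_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return components so that every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


order = install_order(DEPENDS_ON)
```

In practice each step would also wait for the previous component to report ready before the next install begins.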
Validate the setup and implement a local DNS mapping that maps service IP addresses to domain names, using a script to generate host entries for both local and cluster environments.
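Generating host entries from a service-to-IP mapping can be sketched as follows. This is an illustrative stand-in, not the course's script: the function name, the `example.local` domain, and the IP addresses (taken from the reserved documentation range) are all hypothetical.

```python
def host_entries(services: dict[str, str], domain: str = "example.local") -> list[str]:
    """Build hosts-file lines of the form '<ip><TAB><name>.<domain>'."""
    return [f"{ip}\t{name}.{domain}" for name, ip in sorted(services.items())]


# Hypothetical service IPs; real values would come from the cluster.
services = {"redis": "192.0.2.11", "mysql": "192.0.2.10"}
lines = host_entries(services)
```

The resulting lines could then be appended to a hosts file so that services are reachable by name instead of by IP address.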
Set up a MySQL-based e-commerce and inventory database, including users, privileges, schemas, and sample data. Use a data generator app to simulate orders, shipments, and replenishments via Kafka streams.
Explore implementing a key-value cache with Redis to store and retrieve backend results, checking the cache before querying the database and managing keys with the Redis CLI.
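The check-the-cache-first pattern can be sketched as below. To keep the example runnable without a server, an in-memory `DictCache` stands in for Redis; it is a hypothetical class, as are `get_product`, the key format, and the fake database function.

```python
from typing import Callable, Optional


class DictCache:
    """In-memory stand-in for a key-value cache such as Redis."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def get_product(product_id: int, cache: DictCache,
                query_db: Callable[[int], str]) -> str:
    """Return the cached value if present; otherwise query and cache it."""
    key = f"product:{product_id}"  # hypothetical key naming scheme
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = query_db(product_id)
    cache.set(key, value)
    return value


db_calls = []  # records how often the (fake) database is queried


def fake_db(product_id: int) -> str:
    db_calls.append(product_id)
    return f"product-{product_id}"


cache = DictCache()
first = get_product(7, cache, fake_db)   # cache miss: queries the database
second = get_product(7, cache, fake_db)  # cache hit: database is skipped
```

Only the first lookup reaches the database; a production version would usually also set an expiry on each key so stale results age out.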
Learn how to perform batch analysis with Apache Spark by connecting to MongoDB, joining the product and product details collections, and creating an enriched dataset loaded into MongoDB.
Learn real-time analysis of unbounded data streams with the Kafka ecosystem, including Zookeeper, brokers, producers, consumers, schema registry, connect, and streams to join and enrich data.
Visualize data with Elasticsearch and Kibana dashboards using Spring Boot APIs and D3.js for interactive reports. Run sample reports against a database and explore REST API querying, Elasticsearch, Kafka, and dashboards.
Access course resources, contact via email, and explore the GitHub repository for all materials; follow Efrem tutorials on Twitter, Instagram, and YouTube for upcoming topics and continued learning.
Selecting the right tools, technologies, and strategies for aggregating, processing, and making sense of high-velocity, high-volume application log data from tens, hundreds, or sometimes thousands of sources can be overwhelming, expensive, and frustrating. This course offers complete, hands-on instruction on how to aggregate, process, search, and visualize massive log data using the open source tools, frameworks, and platforms available today.