Taming Big Data using Spark & Python
What you'll learn
- Big Data and its ecosystem: Hadoop, Sqoop, Hive, Flume, Kafka, and Spark with Python, Spark SQL & Spark Streaming
- Both the concepts (theory & architecture) and the practicals
- Assignments & project scenarios drawn from real projects
- Practice questions for CCA 175 Certification
- Process continual streams of data with Spark Streaming
- Build, deploy, and run Spark scripts on Hadoop clusters
- Transform structured data using Spark SQL and DataFrames (a short sketch follows this list)
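To give a flavour of the Spark SQL / DataFrame work listed above, here is a minimal, hedged sketch. The file name ("orders.csv") and the column name ("order_status") are made up for illustration; the course uses its own datasets.

```python
# Minimal PySpark sketch: the same aggregation via the DataFrame API and Spark SQL.
# "orders.csv" and the "order_status" column are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataframe-sketch").getOrCreate()

# Read a structured file into a DataFrame, inferring column types from the header row.
orders = spark.read.csv("orders.csv", header=True, inferSchema=True)

# DataFrame API: count orders per status.
orders.groupBy("order_status").agg(F.count("*").alias("cnt")).show()

# Spark SQL: register the DataFrame as a temporary view and query it with SQL.
orders.createOrReplaceTempView("orders")
spark.sql("SELECT order_status, COUNT(*) AS cnt FROM orders GROUP BY order_status").show()
```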
Requirements
- Basic programming skills
- Cloudera Quickstart VM or your own Hadoop setup; either works with the course without any issues
- A laptop with a minimum of 6 GB RAM to run the VM (if using the VM provided in the course); you can also do your own local installation by following the course
- Having SQL skills would be advantageous
Description
The course is for those who do not know even the ABCs of Big Data tools, want to learn them, and want to be comfortable implementing them in projects. It is also for those who already have some knowledge of Big Data tools but want to deepen it and work confidently on projects. Thanks to the extensive scenario implementation, the course also suits people preparing for Big Data certifications such as CCA 175, and it includes a practice test for CCA 175.
The course comes with fully functional Big Data labs on Cloudera & Windows VMs, so you do not need to rent a cluster to practice the tools. Hence, the course is a ONE TIME INVESTMENT in a secure future.
In the course, we will learn how to use Big Data tools like Hadoop, Flume, Kafka, Spark, and Scala (some of the most valuable tech skills on the market today).
In this course I will show you how to:
1. Use Python and Spark to analyze Big Data.
2. Prepare for the CCA 175 exam with the practice test provided at the end of the course.
3. Work through extensive, real-time project scenarios with solutions, written the way you would write them in REAL PROJECTS.
4. Use Sqoop to import data from traditional relational databases into HDFS & Hive.
5. Use Flume and Kafka to process streaming data.
6. Use Hive to store and query data, and partition tables.
7. Use Spark Streaming to consume streaming data from Kafka & Flume (a short sketch follows this list).
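As a taste of item 7, here is a hedged sketch of a Spark job consuming a Kafka topic. It uses the Structured Streaming API rather than the older DStream API the course may also cover; the broker address ("localhost:9092") and topic name ("orders") are assumptions, and running it requires the spark-sql-kafka connector package on the classpath.

```python
# Hedged sketch: read a Kafka topic with Spark Structured Streaming and print it.
# Broker address and topic name are assumptions; the spark-sql-kafka-0-10 package
# must be available on the Spark classpath for the "kafka" source to resolve.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Subscribe to the topic; each record arrives with binary key/value columns.
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "orders")
          .load())

# Cast the payload to text and write each micro-batch to the console.
query = (stream.selectExpr("CAST(value AS STRING) AS message")
         .writeStream
         .format("console")
         .outputMode("append")
         .start())

query.awaitTermination()
```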
Big Data is among the most in-demand skill sets right now, and with this course you can learn it quickly and easily! You will also learn about the components configured in basic setup files like "hdfs-site.xml" and "core-site.xml"; they are good to know when working on a project.
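For example, the fs.defaultFS property in core-site.xml tells clients which NameNode to talk to, and that address is what appears in HDFS paths. A hedged sketch, assuming the Cloudera Quickstart VM defaults (hostname quickstart.cloudera, port 8020) and a hypothetical file path:

```python
# Hedged sketch: reading a file straight from HDFS with PySpark.
# The NameNode address below mirrors fs.defaultFS from core-site.xml on the
# Cloudera Quickstart VM; the file path is a hypothetical example.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-read-sketch").getOrCreate()

df = spark.read.text("hdfs://quickstart.cloudera:8020/user/cloudera/sample.txt")
df.show(5, truncate=False)
```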
The course focuses on upskilling people who do not yet know the Big Data tools, with the goal of bringing them up to the mark so they can work on Big Data projects seamlessly.
This course comes with project scenarios and multiple datasets to work with.
After completing this course you will feel comfortable putting Big Data, Python, and Spark on your resume, and you will be able to apply them in real projects with ease!
Thanks and I will see you inside the course!
Who this course is for:
- Anyone who wants to learn Big Data technologies and move into the field
- Those who want a real feel for project-like scenarios along with learning the concepts
- A ONE-stop shop for the required Big Data tools, with theory, concepts, practicals, practice scenarios & project scenarios using the Python programming language
Instructor
I am an experienced Machine Learning Engineer with expertise in Big Data technologies & BI tools (IBM DataStage). I have experience implementing Machine Learning using Spark, Scala & Python. I am also an experienced DataStage, Python, Spark, Scala, R & Machine Learning trainer, enrolled with many consultancies, and I work as a freelancer in my free time.