The real power and value proposition of Apache Spark lie in its speed and in its platform for executing data processing and data science tasks. Sound interesting? Let’s see how easy it is!
Packt’s Video Learning Paths are a series of individual video products put together in a logical and stepwise manner such that each video builds on the skills learned in the video before it.
Spark is one of the most widely used large-scale data processing engines and runs extremely fast. It is a framework whose tools are equally useful to application developers and data scientists. What sets Spark apart is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations, allowing data scientists to tackle the complexities that come with raw unstructured datasets.
This Learning Path starts with an introductory tour of Apache Spark 2. We will look at the basics of Spark, introduce SparkR, then look at the charting and plotting features of Python in conjunction with Spark data processing, and finally take a thorough look at Spark's data processing libraries. We then develop a real-world Spark application. Next, we will help you become comfortable and confident working with Spark for data science by exploring Spark’s data science libraries on a dataset of tweets.
The goal of this course is to introduce you to Apache Spark 2 and teach you its data processing and data science libraries so that you are equipped with the skills required of modern data scientists.
This Learning Path is authored by some of the best in their fields.
Rajanarayanan Thottuvaikkatumana, or Raj, is a seasoned technologist with more than 23 years of software development experience at various multinational companies. His experience includes architecting, designing, and developing software applications. He has worked with a range of technologies, including major databases, application development platforms, web technologies, and big data technologies. Currently, he is building a next-generation Hadoop YARN-based data processing platform and an application suite built with Spark using Scala.
Eric Charles has 10 years’ experience in the field of data science and is the founder of Datalayer, a social network for data scientists. His typical day includes building efficient data processing with advanced machine learning algorithms, easy SQL, streaming, and graph analytics. He also focuses a lot on visualization and result sharing. He is passionate about open source and is an active Apache Member. He regularly gives talks to corporate clients and at open source events.