Installing Scala and Spark on Linux (Ubuntu)

A free video tutorial from Jose Portilla
Head of Data Science at Pierian Training

Lecture description

A full guide to installing Spark and Scala on a Linux (Ubuntu) platform.

Learn more from the full course

Scala and Spark for Big Data and Machine Learning

Learn the latest Big Data technology - Spark and Scala, including Spark 2.0 DataFrames!

10:06:42 of on-demand video • Updated September 2019

Use Scala for Programming
Use Spark 2.0 DataFrames to read and manipulate data
Use Spark to Process Large Datasets
Understand how to use Spark on AWS and Databricks
Hello everyone, and welcome to the Scala and Spark Ubuntu installation lecture. In this lecture we'll walk you through installing Scala and its dependencies, such as Java, as well as Spark and the Atom text editor. Let's get started.
All right, here I am at my Ubuntu desktop. The first thing you need to do is open a terminal. If you don't know how, search your computer for "terminal" and it should come up. In this case I already have one open, so I'll close that search. At the terminal we need to install Java first. You can check whether Java is already installed by typing java -version and pressing Enter. If you get a message like this one, you need to install Java and the JDK so that Scala installs correctly.
The first thing to do is type sudo apt-get update. Keep in mind that when you use sudo, you'll need to enter your password. If you don't have a password for your Ubuntu account, you'll need to set one up first. Because this is a sudo command, it will run once you enter your password and update the package index. That makes sure we grab the latest versions of everything we need. Another quick note: the resources section for this lecture has a written instruction file, or a link to one. If you'd rather read than watch this video, you can follow the commands there. Once you've run sudo apt-get update, the next step is to install the default JDK. Type sudo apt-get install default-jdk and press Enter. It will ask whether you want to continue. Type Y for yes, and it will install the JDK. I'm going to jump ahead in time to the finished installation.
All right, Java has finished downloading and installing. Now we need to download and install Scala. In your terminal, type sudo apt-get install scala and press Enter. When you get the prompt "Do you want to continue?", type Y for yes, and it will download and install Scala for us. Let's jump ahead to the finished installation.
All right, Scala is installed. To make sure everything installed correctly, type scala into your terminal. You should see a Scala REPL, or read-evaluate-print loop. To really make sure everything is working, type println("Hello") and press Enter. You should see Hello printed back. If everything is running like this so far, you're good. Type :q to quit the Scala interpreter, and type clear to clear your terminal.
So far we've installed Java and Scala. Now it's time to install Spark. To do that, we need git installed. Type sudo apt-get install git, press Enter, say yes to continue, and let it install. I'm going to jump ahead again to the finished installation.
To install Spark, we need to download the entire package, so open a browser. In this case I'm using Firefox. Go to spark.apache.org, and once the page has loaded, click Download. Make sure you pick a Spark version greater than 2.0, such as 2.0.1 or 2.0.2. It has to be greater than 2.0 for everything we do in this course to work. Choose the package pre-built for Hadoop 2.7 and later. You can use the direct download, or select an Apache mirror if the direct download is slow.
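The Scala, git, and download steps above can be sketched as terminal commands under a few assumptions. The -y flag is added as before. scala -version is a non-interactive check, and the video's interactive REPL check appears in the comments. The commented-out wget line is an optional alternative to the browser download. Its archive URL and file name are assumptions based on the Spark 2.0.1 / Hadoop 2.7 package used in the video.

```shell
# Install Scala from the Ubuntu repositories.
sudo apt-get install -y scala

# Non-interactive check that the Scala runner is on your PATH.
scala -version

# Interactive check from the video: run `scala`, type println("Hello") at the
# scala> prompt, press Enter, then type :q to quit.

# Install git.
sudo apt-get install -y git
git --version

# Optional alternative to the browser download (assumed URL and file name for
# Spark 2.0.1 pre-built for Hadoop 2.7); uncomment to save it in your home folder.
# cd ~ && wget https://archive.apache.org/dist/spark/spark-2.0.1/spark-2.0.1-bin-hadoop2.7.tgz
```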
The direct download should be fine for us, so click the link to download the .tgz file, click OK, and let it download. I'm going to jump ahead to when the download has finished.
All right, my download has completed. I have Spark 2.0.1, pre-built for Hadoop 2.7. Note that it's a .tgz file, so we need to extract it. It's in my Downloads folder. I'll move it to my home folder so it's easier to find later, and now I can see it there. If it's not in Downloads for you, you can also check your tmp (temporary) folder. In Firefox, the button that displays the progress of ongoing downloads should also help you find it. Chrome has a very similar bar at the bottom that shows where downloaded files are saved.
Now switch to your terminal and go to your home directory by typing cd (change directory) on its own. You should see that the Spark .tgz file is there. Type tar xvf and then start typing spark. You can press Tab to autocomplete the file name. This extracts everything we need, so let it run while I jump forward in time.
All right, now that the Spark folder has been extracted, I can type ls to list the directory and see it. Change into the Spark folder with cd (again, Tab autocompletes the name) and list its contents. You should see a folder called bin. Type cd bin and list the contents again. You should see programs such as spark-submit, spark-sql, and spark-shell. We'll show you how to work with those later on. For now, we'll keep things simple and just open the Spark shell. It's basically a read-evaluate-print loop, like the Scala one, but for Spark instead of plain Scala.
To do this, type ./spark-shell and press Enter. This should eventually bring up the Spark shell. As it starts, you may see some warnings, but don't worry. Later on we'll show you how to set logging levels so you don't see a bunch of warnings every time you start the Spark shell. For now, we'll keep things simple. You should see that it's starting a Spark session and a Spark context. It also gives you a web link. If you scroll up, you can grab that http URL. It opens a web interface you can check out, which is actually really cool. To make sure everything is working in Spark, type println("Hello world"), with Hello world in double quotes, and press Enter. You should see Hello world printed back, which means everything is working correctly. You now have Spark, Scala, and Java all set up on your Ubuntu computer. Type :q to quit.
Next, let's install the Atom text editor, which is the first editor we'll be working with. When we start learning Scala, we won't actually use an editor at all. Instead we'll go line by line at the Scala or Spark shell prompt. Later on we'll expand to working with a text editor, and after that we'll show you how to work with IntelliJ. For now, to keep things simple, search the web for "Atom text editor", press Enter, and click the first link, which should be atom.io. This will take you to a downloads page. Once you're on the site, click Download .deb to download the Debian package for Atom. Choose Save File and let it download. I'm going to jump ahead in time until it finishes.
All right, now that it has finished downloading, let's find the file. Here it is: Atom, as a .deb file. We can open its file location. In this case, it's in Downloads.
Once you've located the .deb file, right-click it and you should see Open With Software Install. Click that, and Ubuntu Software should load. Then click Install to install the Atom text editor. You may need to enter your password during installation, so put it in and authenticate.
Now that Atom is installed and open, there's one last thing to do: install some packages that will help us program with Scala and Spark. Under Install a Package, click Open Installer. Search for scala and press Enter, and Atom will search for Scala packages. We want Scala language support for Atom, which should be the package called language-scala. Click Install. This adds Scala language support to the editor, such as code completion and syntax highlighting. Once that has finished installing, we want one more package. Search for terminal to find a package that lets us open a terminal right here in Atom. There are lots of options; I prefer platformio-ide-terminal, so let's install that one.
All right, now that the terminal package is installed, let's create an example Scala script. Down here there's a plus button that opens a new terminal. To create the script, close any open tabs and choose File > New File. Then type println("My first Scala script"), with the text in double quotes. This is just a print-line command; we'll learn a lot more Scala later on. Next, click Plain Text at the bottom of the window and type Scala. This sets the file's language mode to Scala, and you should then see syntax highlighting.
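You can also create the same one-line script from a terminal, such as the one the terminal package opens inside Atom. A heredoc writes the file directly into your home directory, which is where the video saves it. This is a minimal sketch. The file name my_first_script.scala is an assumption based on the name spoken in the video.

```shell
# Write a one-line Scala script to your home directory (assumed file name).
# The quoted 'EOF' stops the shell from expanding anything inside the heredoc.
cat > ~/my_first_script.scala <<'EOF'
println("My first Scala script")
EOF

# Show the file to confirm its contents.
cat ~/my_first_script.scala
```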
We haven't saved the file yet, so press Ctrl+S (or Cmd+S) and save it as my_first_script.scala. To keep things simple, I'm putting it right in my home directory. Save it, and let's run the file. I'll call the Spark shell, but this time I need to give the whole folder path: the Spark folder, wherever you extracted it, then bin, then spark-shell. Depending on where you are in your terminal, you may need to type the full path, starting from /home/ and your user name. Press Enter. As long as your folder locations are the same as the ones I showed you earlier, this will load the Spark shell. Once it has loaded, type :load followed by the name of your script, in this case my_first_script.scala, and press Enter. You should see My first Scala script as the output. Perfect.
If you have any questions about this download and installation process, feel free to post them to the Q&A forums. Before you do, search the forums and check the written guide. As long as you followed everything exactly as I showed you, you should have been able to follow along with the same steps. All right, thanks, everyone, and I'll see you at the next lecture.