Importing Time Series Data from csv-files

A free video tutorial from Alexander Hagmann
Data Scientist | Finance Professional | Entrepreneur

Learn more from the full course

The Complete Pandas Bootcamp 2024: Data Science with Python

Now with ChatGPT for Pandas, Online Exercises, Seaborn, Machine Learning. Fully Updated (Pandas 2.1) as of Nov 23

36:24:29 of on-demand video • Updated April 2024

In this video, we are going to import our very first time series data. In our files we have a CSV file called Temperature or Temp, and it contains temperature information for New York and Los Angeles for the years 2013 to 2016. In the very first step, there is actually no difference between importing non-time-series data and time series data from CSV files. So first of all we import pandas, and then we use the read_csv function. We import our temperature CSV file and store the new DataFrame in the variable temperature.

Let's have a first look at our DataFrame. Here we can see the first five rows, and we have three columns: the datetime column, LA for Los Angeles, and the New York column. The values of our DataFrame are the temperatures at the specific time in the specific city, in degrees Celsius. So, for example, here in January we have 11.7°C in LA and -1.1°C in New York.

Let's get some meta information with the info method. On the left-hand side we have a RangeIndex with 35,064 rows, so we have over 35,000 different timestamps. And we have three columns. The datetime column has the data type object, which is a clear indication that we have strings in this column. The Los Angeles column has a float data type; these are degrees Celsius values, so a float data type makes sense. The New York column has a float data type as well.

So the first problem is that the datetime column does not have a dedicated datetime data type. We have a string data type, and this is not really helpful. Therefore, in a first step, we can try to transform the datetime information that is stored in strings into a datetime data type.
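The plain import described so far can be sketched as follows. This is a minimal illustration, not the course notebook: the column labels (datetime, LA, NY) and every value except the first row are assumptions, and a small in-memory string stands in for the actual CSV file so that the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for the temperature CSV file. The column labels and all values
# except the first row (11.7 / -1.1) are assumptions for illustration only.
csv_text = """datetime,LA,NY
2013-01-01 00:00:00,11.7,-1.1
2013-01-01 01:00:00,10.7,-1.7
2013-01-01 02:00:00,9.9,-2.0
"""

# Plain import, exactly as for non-time-series data.
temperature = pd.read_csv(io.StringIO(csv_text))

print(temperature.head())  # first rows of the DataFrame
temperature.info()         # index type, column dtypes, number of rows

# Without any date parsing, each datetime entry is an ordinary Python string.
first_value = temperature["datetime"].iloc[0]
print(type(first_value))
```

With a real file, the `io.StringIO(csv_text)` argument would simply be replaced by the path to the CSV file.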
And we can do this immediately when we import the DataFrame. The read_csv function provides some parameters that help to import datetime information. Looking at its parameters, we can see parse_dates, which by default is set to False. With parse_dates we can pass the column labels of the columns that we want to transform from a string data type to a datetime data type, and we have to pass them as a list. The column that we want to transform is the datetime column, so we pass datetime here.

We can also have a look at the documentation for the parse_dates parameter. For example, we can pass the Boolean value True, and then pandas tries to parse the index into a datetime format. But here we have a RangeIndex, so we do not have any datetime information in the index. Alternatively, we can pass a list of integers or names, that is, either a list with the index positions of the columns or a list with the column labels. And this is exactly what we do here: we pass the column label datetime.

Let's see what we get. We are overwriting our variable temperature, and we call the info method again. Now we can see that the data type changed from object to datetime64, and in square brackets we have ns, so the precision is nanoseconds. We can also update the view on the first five rows, and the datetime column now has the data type datetime64.

Now we can also select an element in our datetime column, so let's select the very first element with the iloc indexer. Here we have a so-called Timestamp. A Timestamp is a single point in time. For example, here we have the year 2013, the month January, and the day, January the 1st. We also have information on the time, which here is midnight.
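As a rough sketch of the parse_dates step and the first-element lookup just described: the column labels and the sample rows are assumptions, and an in-memory string again stands in for the CSV file.

```python
import io

import pandas as pd

# Stand-in for the temperature CSV file. The column labels and all values
# except the first row are assumptions for illustration only.
csv_text = """datetime,LA,NY
2013-01-01 00:00:00,11.7,-1.1
2013-01-01 01:00:00,10.7,-1.7
2013-01-01 02:00:00,9.9,-2.0
"""

# parse_dates takes a list of column labels (or positions) to convert.
temperature = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

print(temperature.dtypes)  # the datetime column now has a datetime64 dtype

# Select the very first element of the datetime column by position.
first = temperature["datetime"].iloc[0]
print(first)
print(first.year, first.month, first.day)      # date components
print(first.hour, first.minute, first.second)  # time components
```

The attributes `year`, `month`, `day`, `hour`, `minute`, and `second` expose the components of the single point in time that the element represents.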
So these are the hours, then we have the minutes, and then the seconds. And as you can see above, our DataFrame provides hourly temperature information. The increment is one hour, and we start with January the 1st at midnight. The next timestamp is still January the 1st, at 1:00 AM, then we have 2:00 AM, 3:00 AM, and so on. We can also check the data type of the timestamp with the type function, and this is a pandas Timestamp. So the Timestamp is a specific pandas data type to store datetime information.

Whenever we have datetime information like this, it makes sense to have the datetime as the index of our DataFrame, because this gives us a lot of additional functionality. We can already do this when we import our data with read_csv. We have already seen that there is the parameter index_col, where we can pass the column label of the column that we want to have as the index, and in this case it is datetime. This is a very common workflow with time series: when we import the data, we first pass the columns that contain datetime information to the parameter parse_dates, and then we set these columns as the index with index_col.

So let's do this and overwrite our variable temperature, and let's have a look at the first five rows again. Now we can see that our datetime column is the index. We can also check this with the info method. Here we have a so-called DatetimeIndex with 35,064 entries, ranging from January the 1st, 2013, midnight, to December 31st, 2016, 11:00 PM. So we have the hourly temperature information for New York and Los Angeles for the years 2013 to 2016.

We can also access the index attribute to further analyze our new index. Here we have our DatetimeIndex with all of the timestamps. So a DatetimeIndex is a collection of many timestamps.
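The common workflow mentioned above, parsing the datetime column and setting it as the index in one read_csv call, might look like this. It is a sketch under assumptions: the column labels and sample rows are made up for illustration, and an in-memory string stands in for the CSV file.

```python
import io

import pandas as pd

# Stand-in for the temperature CSV file. The column labels and all values
# except the first row are assumptions for illustration only.
csv_text = """datetime,LA,NY
2013-01-01 00:00:00,11.7,-1.1
2013-01-01 01:00:00,10.7,-1.7
2013-01-01 02:00:00,9.9,-2.0
"""

# Parse the datetime column and use it as the index in a single step.
temperature = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["datetime"],
    index_col="datetime",
)

print(temperature.head())
temperature.info()  # reports a DatetimeIndex instead of a RangeIndex

index = temperature.index
print(type(index))  # the index is a pandas DatetimeIndex
print(index[0])     # first timestamp in the index
print(index[-1])    # last timestamp in the index
```

Each element of the index is itself a Timestamp, so `index[0]` returns the same kind of object as selecting the first element of the parsed column.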
At the bottom we can see that we have the data type datetime64, the index label datetime, and 35,064 entries. We can also select a single element of our index. So let's select the very first element, and this is the very first timestamp: January the 1st, 2013, at midnight. All right. These were the first steps with time series, and I hope to see you in the next video. Bye.