Summary

In this chapter, we discussed time series data and the steps you can take to process and manipulate it. A date column can be set as the index of a Series or DataFrame and then used to subset the data. Time series data can be resampled to either decrease or increase its frequency. For example, data generated every millisecond can be downsampled to keep one value per second, or averaged over the 1,000 milliseconds in each second. Similarly, data generated every minute can be upsampled to one value per second by forward filling or backfilling (filling every second in a minute with the last or the next minute's value, respectively).
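As a quick reminder of how index-based subsetting and resampling look in practice, here is a minimal sketch. The data, the dates, and the variable names (readings, minute, and so on) are made up for illustration and do not come from the chapter's datasets.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one reading per millisecond for 3 seconds.
idx = pd.date_range("2019-07-22 00:00:00", periods=3000, freq="ms")
readings = pd.DataFrame({"value": np.arange(3000, dtype=float)}, index=idx)

# Subsetting on the datetime index: label slices include both endpoints.
first_second = readings.loc[
    pd.Timestamp("2019-07-22 00:00:00"):pd.Timestamp("2019-07-22 00:00:00.999")
]

# Downsampling: keep the last reading of each second, or average each second.
per_second_last = readings["value"].resample("1s").last()
per_second_mean = readings["value"].resample("1s").mean()

# Hypothetical data: one reading per minute, upsampled to one per second.
minute = pd.Series(
    [10.0, 20.0],
    index=pd.date_range("2019-07-22 00:00", periods=2, freq="min"),
)
ffilled = minute.resample("1s").ffill()  # carries the last known value forward
bfilled = minute.resample("1s").bfill()  # pulls the next known value backward

print(per_second_mean.head())
print(ffilled.loc["2019-07-22 00:00:30"], bfilled.loc["2019-07-22 00:00:30"])
```

Whether to forward fill or backfill depends on the data: forward filling never uses information from the future, which usually matters when the resampled series feeds a forecasting model.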

String to datetime conversion can be done with the datetime module: strptime parses a string into a datetime, and strftime formats a datetime back into a string. Each style of date entry (for example, 22nd July, 7/22/2019, and so on) needs its own format string based on the convention it follows. pandas works with several kinds of time series objects, including Python's datetime.datetime and its own Timestamp, DatetimeIndex, Period, PeriodIndex, and Timedelta. Certain algorithms for time series classification, such as shapelets and LSTM, require time series components (one separable data entity containing multiple entries of time series data) to be of the same length. This can be done either by truncating all the components to the smallest length or by expanding them to the longest length and imputing with zeros or some other value. Matplotlib can be used to plot basic time series data. Shifting, lagging, and rolling functions are used to calculate moving averages and to detect behavioral changes at time series component change points.
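The following sketch pulls three of these ideas together: parsing date strings, bringing components to a common length, and using shift and rolling. All data is invented for illustration, and pad_or_truncate is a hypothetical helper written for this example, not a pandas or NumPy function.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Each date convention needs its own format string.
d1 = datetime.strptime("7/22/2019", "%m/%d/%Y")   # month/day/year
d2 = datetime.strptime("22-07-2019", "%d-%m-%Y")  # day-month-year
label = d1.strftime("%Y-%m-%d")                   # datetime back to string
ts = pd.to_datetime("7/22/2019", format="%m/%d/%Y")  # pandas Timestamp


def pad_or_truncate(components, length, fill_value=0.0):
    """Hypothetical helper: force every component to `length` entries."""
    out = np.full((len(components), length), fill_value, dtype=float)
    for i, comp in enumerate(components):
        comp = np.asarray(comp, dtype=float)[:length]  # truncate if too long
        out[i, : len(comp)] = comp                     # the rest stays padded
    return out


components = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
truncated = pad_or_truncate(components, min(len(c) for c in components))
padded = pad_or_truncate(components, max(len(c) for c in components))

# Shifting and rolling on a small daily series with a jump on the fourth day.
s = pd.Series(
    [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
    index=pd.date_range("2019-07-22", periods=6, freq="D"),
)
lagged = s.shift(1)                      # previous day's value
moving_avg = s.rolling(window=3).mean()  # 3-day moving average
step = s - lagged                        # day-over-day change
print(step.idxmax())                     # date of the largest upward change
```

Padding with zeros is only one choice. If zero is a meaningful value in the data, a different fill value (or a mask passed to the model) may be a safer option.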

In the next chapter, we will learn how to use pandas in Jupyter Notebooks to build powerful, interactive reports.
