lambda and apply

In order to see how the lambda keyword can be used, we need to create some data. We'll create data containing date columns. Handling date columns is a topic in itself, but we'll get a brief glimpse of this process here.

In the following code, we are creating two date columns:

  • Start Date: A sequence of 300 consecutive days starting from 2016-01-15
  • End Date: A sequence of 300 dates whose year (2010 to 2025), day (1 to 30), and month (March to December) are chosen at random

Some date/time methods have been used to create these dates in the following code block. Please take note of them and ensure that you understand them:

### Importing required libraries
import datetime
import pandas as pd
from random import randint

### Creating a sequence of 300 (periods=300) consecutive days (freq='D') starting from 2016-01-15
D1 = pd.date_range('2016-01-15', periods=300, freq='D')

### Creating a sequence of 300 date strings in year-day-month format, with the year (b/w 2010-2025),
### day (b/w 1-30) and month (b/w 3-12) chosen at random; February is never drawn, so day 30 is always valid
date_str = []
for i in range(300):
    date_str1 = str(randint(2010, 2025)) + '-' + str(randint(1, 30)) + '-' + str(randint(3, 12))
    date_str.append(date_str1)
D2 = date_str

### Creating a DataFrame with the two date sequences, named Start Date and End Date,
### then parsing the End Date strings as dates (format: year-day-month)
Date_frame = pd.DataFrame({'Start Date': D1, 'End Date': D2})
Date_frame['End Date'] = pd.to_datetime(Date_frame['End Date'], format='%Y-%d-%m')

The output DataFrame has two columns, as shown here:

Output DataFrame with Start Date and End Date
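As a quick check on the `'%Y-%d-%m'` format used above, the following minimal sketch parses a single hand-written string in the same year-day-month layout. The sample string is an assumption chosen for illustration and is not taken from `Date_frame`.

```python
import pandas as pd

# One hypothetical string in the same year-day-month layout the generator produces
sample = '2015-28-3'   # year 2015, day 28, month 3

# '%Y-%d-%m' tells pandas that the day comes before the month
parsed = pd.to_datetime(sample, format='%Y-%d-%m')
print(parsed)  # 2015-03-28 00:00:00
```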

Using this data, we will create some lambda functions to find the following:

  • Number of days between today and start date or end date
  • Number of days between the start date and end date
  • Whether each start date or end date comes before a given date

In the following code block, we have written the lambda functions to carry out these tasks:

f1=lambda x:x-datetime.datetime.today()
f2=lambda x,y:x-y
f3=lambda x:pd.to_datetime('2017-28-01', format='%Y-%d-%m')>x

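Note how x and y are used as placeholder arguments, that is, the parameters of the functions. When these functions are applied to a column of data, the placeholders are bound to actual values: with apply, x receives each element of the column in turn, and when a function is called directly on columns, x and y receive the whole columns (Series).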
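To make the placeholder idea concrete, here is a small sketch using two short hypothetical date Series (`start`, `end`) and a hypothetical named function `f2_named` that does the same thing as the `f2` lambda. Calling `f2` on two Series binds `x` and `y` to whole columns; calling it on two `Timestamp` values binds them to single dates.

```python
import pandas as pd

# The lambda from the text ...
f2 = lambda x, y: x - y

# ... behaves exactly like this (hypothetical) named function
def f2_named(x, y):
    return x - y

# Hypothetical sample columns
start = pd.Series(pd.to_datetime(['2016-01-15', '2016-01-16']))
end = pd.Series(pd.to_datetime(['2016-01-10', '2016-02-16']))

# x and y are bound to whole Series: element-wise, vectorized subtraction
diff_lambda = f2(start, end)
diff_named = f2_named(start, end)
print(diff_lambda)

# x and y are bound to single Timestamps
print(f2(pd.Timestamp('2016-01-15'), pd.Timestamp('2016-01-10')))  # 5 days 00:00:00
```

A lambda is simply an anonymous, single-expression form of a `def` function, so both versions produce identical results.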

A lambda expression only defines a function; to execute it, we have to call it with actual arguments. For example, to execute the functions we defined previously, we can do the following:

Date_frame['diff1'] = Date_frame['End Date'].apply(f1)
Date_frame['diff2'] = f2(Date_frame['Start Date'], Date_frame['End Date'])
Date_frame['Before 28-01-17'] = Date_frame['End Date'].apply(f3)

The following will be the output:

 Output DataFrame with calculated fields on the date columns
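The subtraction results are timedelta values rather than plain numbers. If an integer number of days is needed, as in the task list above, the `.dt.days` accessor can extract it; for example, `Date_frame['diff2'].dt.days`. The sketch below uses a few hand-picked dates (an assumption) rather than the random `Date_frame` data.

```python
import pandas as pd

# Hypothetical sample columns
start = pd.Series(pd.date_range('2016-01-15', periods=3, freq='D'))
end = pd.Series(pd.to_datetime(['2016-01-10', '2017-01-15', '2016-01-17']))

# Subtracting two datetime columns gives timedelta64 values ...
delta = start - end
# ... and .dt.days extracts the day component of each as an integer
days = delta.dt.days
print(days.tolist())  # [5, -365, 0]
```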

These functions can be called in two ways:

  • Like ordinary functions: with the function name and the required arguments, as in f2(Date_frame['Start Date'], Date_frame['End Date'])
  • With the apply method: the DataFrame column to operate on, followed by apply, which takes the function name as its argument, as in Date_frame['End Date'].apply(f1)
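The sketch below calls the `f3` lambda from above in both ways on a small hypothetical Series `end`. Called directly, `x` is the whole Series and the comparison is vectorized; through `apply`, `x` is each `Timestamp` in turn. Both give the same Boolean values.

```python
import pandas as pd

# f3 as defined in the text: is 28 January 2017 later than x?
f3 = lambda x: pd.to_datetime('2017-28-01', format='%Y-%d-%m') > x

# Hypothetical sample column
end = pd.Series(pd.to_datetime(['2016-06-01', '2018-03-05', '2017-01-27']))

# Style 1: call the lambda like any function; x is the whole Series
direct = f3(end)

# Style 2: pass the lambda to apply; x is each Timestamp in turn
applied = end.apply(f3)

print(direct.tolist())  # [True, False, True]
```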

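In this case, map would also work instead of apply. Try the following and compare the results of diff1 and diff3. They should match, apart from differences of a fraction of a second, because datetime.today() is evaluated afresh on every call: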

Date_frame['diff3']=Date_frame['End Date'].map(f1)
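Because `datetime.today()` moves between calls, an exact comparison of `diff1` and `diff3` is fiddly. The following sketch swaps in a fixed reference date (`ref`, an assumption) through a hypothetical `f1_fixed` lambda, so that `apply` and `map` can be compared exactly on a Series.

```python
import pandas as pd

# Fixed reference date (hypothetical) instead of datetime.today(),
# so repeated calls return identical results
ref = pd.Timestamp('2020-01-01')
f1_fixed = lambda x: x - ref

# Hypothetical sample column
end = pd.Series(pd.to_datetime(['2019-12-25', '2020-01-11']))

via_apply = end.apply(f1_fixed)
via_map = end.map(f1_fixed)

# Whole days between each date and the reference
print([td.days for td in via_map])  # [-7, 10]
```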

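There are three related methods that perform similar kinds of work with subtle differences:

| Name     | What does it do? |
|----------|------------------|
| map      | A Series method: applies a function to each element of a single column. |
| apply    | Applies a function to each element of a Series or, on a DataFrame, to each column (axis=0) or each row (axis=1). |
| applymap | A DataFrame method: applies a function to every individual cell of the DataFrame. Works if the function can be executed on a single value. Since pandas 2.1 it is deprecated in favor of DataFrame.map. |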
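The following minimal sketch, on a hypothetical three-row DataFrame, contrasts the three methods. It assumes that pandas 2.1 or later provides `DataFrame.map` and falls back to `applymap` on older versions.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Series.map: element-wise over a single column
a_doubled = df['a'].map(lambda v: v * 2)

# DataFrame.apply: the function receives a whole column (axis=0) or row (axis=1)
col_ranges = df.apply(lambda col: col.max() - col.min(), axis=0)
row_sums = df.apply(lambda row: row.sum(), axis=1)

# Element-wise over every cell: DataFrame.map in pandas >= 2.1,
# applymap on older versions
elementwise = df.map if hasattr(df, 'map') else df.applymap
squared = elementwise(lambda v: v ** 2)
print(squared)
```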

 

Some use cases where these methods are very useful are as follows.

Suppose each row in a dataset represents one day's sales for a retail company over a year, and each column represents an SKU. We'll call this data sku_sales. Let's get started:

  1. To find the annual sales of each SKU, we will use the following code:
sku_sales.apply(sum, axis=0)  # axis=0: sum is applied to each column, adding up the rows of each SKU
  2. To find the total sales across all SKUs for each day, we will use the following code:
sku_sales.apply(sum, axis=1)  # axis=1: sum is applied to each row, adding up across the SKU columns
  3. To find the mean daily sales for SKU1 and SKU2, we will use the following code:
sku_sales[['SKU1', 'SKU2']].apply(lambda col: col.mean())
  4. To find the mean and standard deviation of daily sales for all SKUs, we will use the following code:
sku_sales.apply(lambda col: col.mean())
sku_sales.apply(lambda col: col.std())
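The four use cases can be tried on a small stand-in for `sku_sales`. The numbers below are hypothetical toy values (four days, three SKUs), not real sales data; the sketch only illustrates the shape of each result.

```python
import pandas as pd

# Hypothetical toy data: 4 days (rows) x 3 SKUs (columns)
sku_sales = pd.DataFrame(
    {'SKU1': [5, 3, 4, 8], 'SKU2': [2, 2, 6, 2], 'SKU3': [0, 1, 1, 2]},
    index=pd.date_range('2016-01-01', periods=4, freq='D'),
)

total_per_sku = sku_sales.apply(sum, axis=0)   # one total per SKU (column)
total_per_day = sku_sales.apply(sum, axis=1)   # one total per day (row)
mean_two = sku_sales[['SKU1', 'SKU2']].apply(lambda col: col.mean())

# Returning a Series from the lambda yields one row per statistic
stats = sku_sales.apply(lambda col: pd.Series({'mean': col.mean(), 'std': col.std()}))
print(stats)
```

With `axis=0` the result has one value per SKU; with `axis=1`, one value per day. Note that `col.std()` uses the sample standard deviation (`ddof=1`) by default.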

You can now write one-line custom lambda functions and apply them to your data. Next, we'll look into how missing values can be handled.
