In order to see how the lambda keyword can be used, we need to create some data. We'll create data containing date columns. Handling date columns is a topic in itself, but we'll get a brief glimpse of this process here.
In the following code, we are creating two date columns:
- Start Date: A sequence of 300 consecutive days starting from 2016-01-15
- End Date: A sequence of 300 days taken randomly from any day between 2010 and 2025
Some date/time methods have been used to create these dates in the following code block. Please take note of them and ensure that you understand them:
### Importing required libraries
import datetime
import pandas as pd
from random import randint
### Creating date sequence of 300 (periods=300) consecutive days (freq='D') starting from 2016-01-15
D1=pd.date_range('2016-01-15',periods=300,freq='D')
### Creating a sequence of 300 date strings with the year (2010-2025), day (1-30) and month (3-12) chosen at random
### With months restricted to 3-12, every day from 1 to 30 is a valid date (February is excluded)
date_str=[]
for i in range(300):
    date_str1=str(randint(2010,2025))+'-'+str(randint(1,30))+'-'+str(randint(3,12))
    date_str.append(date_str1)
D2=date_str
### Creating a DataFrame from the two date sequences, named Start Date and End Date
Date_frame=pd.DataFrame({'Start Date':D1,'End Date':D2})
Date_frame['End Date'] = pd.to_datetime(Date_frame['End Date'], format='%Y-%d-%m')
The output DataFrame has two columns, as shown here:
Using this data, we will create some lambda functions to find the following:
- Number of days between today and start date or end date
- Number of days between the start date and end date
- Days in the start date or end date that come before a given date
In the following code block, we have written the lambda functions to carry out these tasks:
f1=lambda x:x-datetime.datetime.today()
f2=lambda x,y:x-y
f3=lambda x:pd.to_datetime('2017-28-01', format='%Y-%d-%m')>x
Note how x and y are used as placeholder arguments, that is, the parameters of the functions. When a function is applied to a column with apply, x takes each value of that column in turn; when a function is called directly, as f2 will be, x and y are the whole columns passed in.
The lambda keyword only defines a function. To execute it, we need to call it with actual arguments. For example, the functions defined previously can be executed as follows:
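As a minimal sketch of what a lambda is, the block below writes the same check once as a lambda and once as a named function and applies both to a small column. The names is_before, is_before_named, and sample are hypothetical and used only for illustration.

```python
import pandas as pd

# A lambda is an anonymous function whose body is a single expression
is_before = lambda x: pd.Timestamp('2017-01-28') > x

# The equivalent named function
def is_before_named(x):
    return pd.Timestamp('2017-01-28') > x

# Hypothetical sample column of dates
sample = pd.Series(pd.to_datetime(['2016-05-01', '2018-03-15']))

# apply passes each value of the column to the function as x
flags_lambda = sample.apply(is_before)
flags_named = sample.apply(is_before_named)

print(flags_lambda.tolist())  # [True, False]
```

Both versions return the same result; the lambda simply saves the def block when the function is a one-liner.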
Date_frame['diff1']=Date_frame['End Date'].apply(f1)
Date_frame['diff2']=f2(Date_frame['Start Date'],Date_frame['End Date'])
Date_frame['Before 28-01-17']=Date_frame['End Date'].apply(f3)
The following will be the output:
It should be noted that these functions can be called like so:
- Like simple functions: With a function name and required argument
- With the apply method: Called on a DataFrame column, with the function passed as an argument
Instead of apply, map would also work in this case. Try the following and compare diff1 and diff3. They should agree up to a tiny fraction of a second, because f1 calls datetime.datetime.today() afresh for every element:
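The two calling styles can be sketched on a tiny frame as follows. The frame dates and the variable names are assumptions made for this illustration. The sketch also converts the resulting time differences to whole days, since the stated goal is a number of days.

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as Date_frame
dates = pd.DataFrame({
    'Start Date': pd.to_datetime(['2016-01-15', '2016-01-16']),
    'End Date': pd.to_datetime(['2016-01-10', '2016-02-15']),
})

f2 = lambda x, y: x - y

# Style 1: called like a simple function, with whole columns as arguments
gap = f2(dates['Start Date'], dates['End Date'])

# Style 2: passed to apply, which calls the function once per value
days_via_apply = gap.apply(lambda td: td.days)

# The vectorized .dt accessor gives the same whole-day counts
days_via_dt = gap.dt.days

print(days_via_dt.tolist())  # [5, -30]
```

A negative value means that the start date falls before the end date, because f2 subtracts its second argument from its first.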
Date_frame['diff3']=Date_frame['End Date'].map(f1)
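To compare apply and map without the clock moving between calls, the sketch below subtracts a fixed reference date instead of today's date. The reference date and the names since_reference and end_dates are assumptions for this example.

```python
import pandas as pd

# Hypothetical fixed reference date, used in place of datetime.datetime.today()
reference = pd.Timestamp('2020-01-01')
since_reference = lambda x: x - reference

end_dates = pd.Series(pd.to_datetime(['2019-12-31', '2020-01-11']))

# On a single column, apply and map both call the function once per value
via_apply = end_dates.apply(since_reference)
via_map = end_dates.map(since_reference)

print([td.days for td in via_map])  # [-1, 10]
```

With a fixed reference, the two results match exactly.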
There are three related methods that perform similar kinds of work with subtle differences:
Name | What does it do?
map | Applies a function to each value of a single column (a Series).
apply | Applies a function to each value of a column, or to each column (axis=0) or each row (axis=1) of a DataFrame.
applymap | Applies a function to every cell of a DataFrame. It works only if the function can be executed on each individual cell. In pandas 2.1 and later, it is deprecated in favor of DataFrame.map.
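The differences between the three methods can be sketched on a small frame. The frame df and the variable names are hypothetical. Because applymap is deprecated in recent pandas versions, the sketch picks DataFrame.map when it exists and falls back to applymap otherwise.

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# map: one column, value by value
doubled_a = df['a'].map(lambda v: v * 2)

# apply on a DataFrame: the function receives a whole column (axis=0)
col_totals = df.apply(lambda col: col.sum(), axis=0)

# apply on a DataFrame: the function receives a whole row (axis=1)
row_totals = df.apply(lambda row: row.sum(), axis=1)

# Cell-by-cell over the whole frame: DataFrame.map in newer pandas, applymap in older versions
elementwise = df.map if hasattr(df, 'map') else df.applymap
squared = elementwise(lambda v: v ** 2)

print(doubled_a.tolist())    # [2, 4, 6]
print(col_totals.to_dict())  # {'a': 6, 'b': 60}
print(row_totals.tolist())   # [11, 22, 33]
```

The key distinction is what the function receives: a single value for map and applymap, but a whole column or row for apply on a DataFrame.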
Some use cases where these methods are very useful are as follows.
Suppose each row in a dataset represents one day of sales for a retail company over a year, and each column represents an SKU. We'll call this data sku_sales. Let's get started:
- To find the annual sales of each SKU, we will use the following code:
sku_sales.apply(sum,axis=0) # axis=0 applies sum to each column, that is, it adds up the rows
- To find the total daily sales across all SKUs for each day, we will use the following code:
sku_sales.apply(sum,axis=1) # axis=1 applies sum to each row, that is, it adds up the columns
- To find the mean daily sales for SKU1 and SKU2, we will use the following code:
sku_sales[['SKU1','SKU2']].apply(lambda col: col.mean()) # a column-wise aggregate needs apply; map works value by value
- To find the mean and standard deviation of daily sales for all SKUs, we will use the following code:
sku_sales.apply(lambda col: col.mean()) # applymap works cell by cell, so it cannot compute a column mean
sku_sales.apply(lambda col: col.std())
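The use cases above can be tried on a small stand-in for sku_sales. The three-day, three-SKU frame below is made-up illustrative data, not real sales figures, and the column-wise aggregates are computed with apply and lambda functions.

```python
import pandas as pd

# Hypothetical stand-in for sku_sales: rows are days, columns are SKUs
sku_sales = pd.DataFrame(
    {'SKU1': [10, 12, 14], 'SKU2': [5, 7, 9], 'SKU3': [1, 1, 4]},
    index=pd.date_range('2016-01-01', periods=3, freq='D'),
)

# Total sales of each SKU over the period (one value per column)
total_per_sku = sku_sales.apply(lambda col: col.sum(), axis=0)

# Total sales across all SKUs for each day (one value per row)
total_per_day = sku_sales.apply(lambda row: row.sum(), axis=1)

# Mean daily sales for two selected SKUs
mean_two = sku_sales[['SKU1', 'SKU2']].apply(lambda col: col.mean())

# Standard deviation of daily sales for every SKU (pandas uses the sample formula, ddof=1)
std_all = sku_sales.apply(lambda col: col.std())

print(total_per_sku.to_dict())  # {'SKU1': 36, 'SKU2': 21, 'SKU3': 6}
print(total_per_day.tolist())   # [16, 20, 27]
print(mean_two.to_dict())       # {'SKU1': 12.0, 'SKU2': 7.0}
```

For standard aggregates such as these, the built-in DataFrame methods (for example, sku_sales.mean()) give the same results; apply becomes more useful when the per-column logic is custom.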
You should now be able to write and apply one-line custom lambda functions. Next, we'll look at how missing values can be handled.