Basic indexing

If you have worked with lists in Python, you will know that a pair of square brackets ([]) is used to index and slice a list. The same operator slices NumPy arrays, and it is the basic indexing operator in pandas as well.

Let's create a Series, DataFrame, and panel to understand how the square bracket operator is used in pandas:

import numpy as np
import pandas as pd

# Creating a Series with 6 rows and a user-defined index
ser = pd.Series(["Numpy", "Pandas", "Sklearn", "Tensorflow", "Scrapy", "Keras"],
                index=["A", "B", "C", "D", "E", "F"])

# Creating a 6x3 DataFrame with defined row and column labels
df = pd.DataFrame(np.random.randn(6, 3),
                  columns=["colA", "colB", "colC"],
                  index=["R1", "R2", "R3", "R4", "R5", "R6"])

# Creating a panel with 3 items
# (pd.Panel was removed in pandas 0.25; this line needs an older release)
pan = pd.Panel({"Item1": df + 1, "Item2": df, "Item3": df * 2})

For a Series, the square bracket operator selects an element by either its label or its positional index. Both use cases are shown in the following code block:

# Subset using the row-label
In: ser["D"]
Out: 'Tensorflow'

# Subset using positional index
In: ser[1]
Out: 'Pandas'
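
The positional form ser[1] works here only because the index holds strings, so an integer key cannot be mistaken for a label; newer pandas releases may warn about this fallback. The following is a minimal sketch, not part of the original example, that keeps the label lookup as it is and swaps in the .iloc accessor (an assumption on our part) to make the positional lookup explicit:

```python
import pandas as pd

ser = pd.Series(
    ["Numpy", "Pandas", "Sklearn", "Tensorflow", "Scrapy", "Keras"],
    index=["A", "B", "C", "D", "E", "F"],
)

# Label-based lookup, exactly as in the text
by_label = ser["D"]

# Explicit positional lookup; .iloc is swapped in here in place of ser[1]
by_position = ser.iloc[1]

print(by_label, by_position)
```

Both lookups return a single scalar value rather than a Series.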

On a DataFrame, the square bracket operator is more restrictive. A single key must be a column label; a positional index or a row label is not accepted. Passing any string that is not a column name raises a KeyError:

# Subset a single column by column name
df["colB"]

This results in the following output:

Subset of a single column by column name
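
To see the restriction in action, the sketch below rebuilds the same DataFrame and passes a row label to the square bracket operator. The flag name row_label_accepted is ours, introduced only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randn(6, 3),
    columns=["colA", "colB", "colC"],
    index=["R1", "R2", "R3", "R4", "R5", "R6"],
)

# A column label returns that column as a Series
col = df["colB"]

# A row label is not a column name, so the lookup fails
try:
    df["R1"]
    row_label_accepted = True
except KeyError:
    row_label_accepted = False

print(type(col).__name__, row_label_accepted)
```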

Square bracket operators can be chained: the first selects the column, which is returned as a Series, and the second then selects a row from that Series by label or position:

# Accessing single elements in a DataFrame:
# row label "R3" of colB, then position 1 of colB (row "R2")
df["colB"]["R3"], df["colB"][1]

This results in the following output:

Slicing a single element using the square bracket operator
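
The chained lookup is easier to follow when it is split into its two steps. This sketch uses fixed values from np.arange instead of random numbers (our choice, so that the result is predictable) and uses .iloc for the positional step, which is an assumption beyond the original example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    columns=["colA", "colB", "colC"],
    index=["R1", "R2", "R3", "R4", "R5", "R6"],
)

# Step 1: the first bracket returns the column as a Series
col = df["colB"]

# Step 2: the second bracket indexes that Series by row label
by_row_label = col["R3"]

# The same row reached by position (third row, position 2)
by_position = col.iloc[2]

print(by_row_label, by_position)
```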

The rules that apply to a DataFrame apply to a panel as well: each item can be selected from the panel by specifying the item name. The square bracket operator accepts only a valid item name:

# Subset a panel
pan["Item1"]

This results in the following output:

Subset of a panel

To subset multiple values, pass a list of the labels to be selected into the square bracket operator. Let's examine this using the DataFrame; the same approach works for Series and Panels:

df[["colA", "colB"]]

This results in the following output:

Slicing multiple columns from a DataFrame
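
One detail worth checking is the type of the result: a single label gives back a Series, while a list of labels gives back an object of the original kind, even when the list has only one entry. A short sketch, with variable names chosen for illustration:

```python
import numpy as np
import pandas as pd

ser = pd.Series(
    ["Numpy", "Pandas", "Sklearn", "Tensorflow", "Scrapy", "Keras"],
    index=["A", "B", "C", "D", "E", "F"],
)
df = pd.DataFrame(
    np.random.randn(6, 3),
    columns=["colA", "colB", "colC"],
    index=["R1", "R2", "R3", "R4", "R5", "R6"],
)

single = df["colA"]              # one label: a Series
as_frame = df[["colA"]]          # one-item list: a one-column DataFrame
two_cols = df[["colA", "colB"]]  # two-item list: a two-column DataFrame
ser_subset = ser[["A", "C"]]     # list of labels on a Series: a Series

print(as_frame.shape, two_cols.shape, list(ser_subset))
```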

When a string that is not a column name is passed in, the square bracket operator raises a KeyError. The get() method avoids the exception by returning a default value instead:

In: df.get("columnA", "NA")
Out: 'NA'
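
The sketch below covers the three cases of get(): an existing column, a missing column with a default, and a missing column with no default, in which case None is returned. The variable names are ours:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randn(6, 3),
    columns=["colA", "colB", "colC"],
    index=["R1", "R2", "R3", "R4", "R5", "R6"],
)

existing = df.get("colA", "NA")       # column exists: the column is returned
missing = df.get("columnA", "NA")     # no such column: the default is returned
missing_no_default = df.get("columnA")  # no default given: None is returned

print(type(existing).__name__, missing, missing_no_default)
```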

The square bracket operator is also useful for inserting a new column in a DataFrame, as shown in the following code block:

# Add new column "colD"
df["colD"] = list(range(len(df)))
df

This results in the following output:

Adding a new column to a DataFrame
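
The assigned values have to line up with the rows of the DataFrame. As a hedged illustration, the sketch below adds the same colD column, then assigns a scalar (which is repeated for every row) and a list of the wrong length (which is rejected). The column names colE and colF and the flag mismatch_rejected are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randn(6, 3),
    columns=["colA", "colB", "colC"],
    index=["R1", "R2", "R3", "R4", "R5", "R6"],
)

# A list with one value per row, as in the text
df["colD"] = list(range(len(df)))

# A scalar is repeated for every row
df["colE"] = 0

# A list whose length differs from the number of rows is rejected
try:
    df["colF"] = [1, 2, 3]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(list(df.columns[:5]), mismatch_rejected)
```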

New values can be added to a Series or a Panel in the same way, by assigning to a new label or item name.
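
For a Series, this means assigning to a label that does not exist yet. A minimal sketch with a shortened Series (the two-row data is our own, chosen to keep the example small):

```python
import pandas as pd

ser = pd.Series(["Numpy", "Pandas"], index=["A", "B"])

# Assigning to a new label appends a new row to the Series
ser["C"] = "Sklearn"

print(ser)
```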
