Pandas with R

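R has a class of objects called data.frame, which is similar in concept to a pandas DataFrame: both hold tabular data in named, typed columns. Base R data.frame operations are often slower than their pandas counterparts on large data, although the gap depends on the operation and the size of the data. Because the two structures are so alike, learning pandas also helps with data manipulation problems in R. For very large tables in R, however, the data.table package is usually a better choice than the base data.frame.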

The reticulate package lets you access and use Python packages from R. For example, you can run the following R code, which calls pandas through reticulate:

library(reticulate)

# Installing a Python package from R
py_install("pandas")

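# Importing pandas; convert = FALSE keeps results as Python objects
# instead of converting them to R objects automatically
pd <- import("pandas", convert = FALSE)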

# Some basic pandas operations in R
pd_df <- pd$read_csv("train.csv")
pd_head <- pd_df$head()
pd_dtypes <- pd_df$dtypes
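
For comparison, the reticulate calls above correspond closely to plain pandas code. The following is a minimal sketch rather than the book's own example: the inline CSV text and its column names are invented for illustration and stand in for train.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv; the columns are invented for illustration.
csv_text = io.StringIO(
    "id,age,fare\n"
    "1,22,7.25\n"
    "2,38,71.28\n"
    "3,26,7.92\n"
)

pd_df = pd.read_csv(csv_text)  # pd$read_csv(...) in R
pd_head = pd_df.head()         # pd_df$head() in R
pd_dtypes = pd_df.dtypes       # pd_df$dtypes in R

print(pd_head)
print(pd_dtypes)
```

In R, the `$` operator plays the role of Python's `.` when accessing attributes and methods of a Python object.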

Other Python packages, such as NumPy, can be imported and used in the same way:

numpy <- import("numpy")

y <- array(1:4, c(2, 2))
x <- numpy$array(y)
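
R fills arrays column by column, so the matrix y above has 1 and 2 in its first column. The sketch below builds what is assumed to be the equivalent NumPy array directly in Python, using Fortran (column-major) order to mirror R's fill order. It illustrates the layout only and does not go through reticulate.

```python
import numpy as np

# R's array(1:4, c(2, 2)) fills values column by column.
# order="F" (Fortran order) reproduces that fill order in NumPy.
x = np.array([1, 2, 3, 4]).reshape((2, 2), order="F")

print(x)
# [[1 3]
#  [2 4]]
```

Without order="F", NumPy would fill row by row and produce [[1, 2], [3, 4]] instead.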

If you already have a pandas function written in Python, you can call it from R through the reticulate package.

Consider the following Python code snippet:

import pandas

def get_data_head(file):
    # Read the CSV file and return its first five rows
    data = pandas.read_csv(file)
    data_head = data.head()
    return data_head

Suppose the preceding script is saved as titanic.py. It can then be used in R as follows:

library(reticulate)
# source_python() runs the script and makes get_data_head() callable from R
source_python("titanic.py")
titanic_in_r <- get_data_head("titanic.csv")
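
Before sourcing a script from R, it can help to check the function in plain Python first. The following sketch repeats the function and runs it against a small throwaway CSV file. The file name sample.csv and its columns are hypothetical and used only for this check.

```python
import os
import tempfile

import pandas


def get_data_head(file):
    # Same logic as in titanic.py: read a CSV and return its first five rows.
    data = pandas.read_csv(file)
    return data.head()


# Write a small throwaway CSV; the columns are invented for illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample.csv")
    with open(csv_path, "w") as handle:
        handle.write("id,survived\n")
        for i in range(8):
            handle.write(f"{i},{i % 2}\n")

    head = get_data_head(csv_path)

print(head.shape)  # (5, 2)
```

The file has eight data rows, but head() returns only the first five by default.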

An interactive Python session can be started from R using repl_python().

For example, you can write something like the following:

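library(reticulate)
repl_python()
# The following lines are typed at the Python prompt (>>>)
import pandas as pd
[i*i for i in range(10)]
# Type exit to return to the R prompt
exit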

The results are printed in the R console itself, as though it were a Python shell.

Python objects (lists, dictionaries, DataFrames, and arrays) created in a Python session can be accessed from R through the py object. Suppose df is a pandas DataFrame created in the Python session, and you want to summarize it in R. This can be done as follows:

summary(py$df)
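
The py object exposes variables from the Python main module, so df has to be defined at the top level of the Python session for py$df to find it. The sketch below shows what the Python side might look like. The column names and values are hypothetical, and describe() is shown as the closest pandas counterpart to R's summary(), not as what summary() calls internally.

```python
import pandas as pd

# Hypothetical DataFrame; in the text, df is whatever the Python session created.
df = pd.DataFrame(
    {"age": [22, 38, 26, 35], "fare": [7.25, 71.28, 7.92, 53.10]}
)

# describe() reports count, mean, std, min, quartiles, and max
# for each numeric column, much like summary() does in R.
stats = df.describe()
print(stats)
```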
