Pandas with R

R has a class of objects called data.frame, which is conceptually similar to a pandas DataFrame. For many operations on large data, base R data frames can be noticeably slower than pandas, so learning pandas can also help with data manipulation problems in R. Within R itself, the data.table package is a widely used option for handling very large data frames.

The reticulate package lets you access and use Python packages from R. For example, the following R code calls pandas:

library(reticulate)

# Installing a Python package from R
py_install("pandas")

# Importing pandas; convert = FALSE keeps results as Python objects
pd <- import("pandas", convert = FALSE)

# Some basic pandas operations in R
pd_df <- pd$read_csv("train.csv")
pd_head <- pd_df$head()
pd_dtypes <- pd_df$dtypes

The same approach works for other packages such as NumPy:

numpy <- import("numpy")

y <- array(1:4, c(2, 2))
x <- numpy$array(y)
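
R fills arrays column by column, so it helps to know what the NumPy side sees. The following Python sketch builds the array that corresponds to array(1:4, c(2, 2)), assuming the R values are laid out in column-major (Fortran) order; the variable name x simply mirrors the R example.

```python
import numpy as np

# R's array(1:4, c(2, 2)) fills column by column, giving the matrix
#   1 3
#   2 4
# order="F" reproduces that column-major fill in NumPy.
x = np.arange(1, 5).reshape((2, 2), order="F")

print(x.shape)     # (2, 2)
print(x.tolist())  # [[1, 3], [2, 4]]
```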

If you already have a pandas function written in Python, you can use it in R through the reticulate package.

Consider the following Python code snippet:

import pandas


def get_data_head(file):
    # Read the CSV file and return its first five rows
    data = pandas.read_csv(file)
    return data.head()

Suppose the preceding script is saved as titanic.py. It can then be used in R as shown:

library(reticulate)

# Load the functions defined in titanic.py into the R session
source_python("titanic.py")
titanic_in_r <- get_data_head("titanic.csv")
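
Before sourcing the script from R, the function can be checked in plain Python. The sketch below repeats get_data_head and runs it on a small in-memory CSV; the sample rows and column names are made up for illustration and are not the Titanic data.

```python
import io

import pandas


def get_data_head(file):
    # Same function as in titanic.py: read a CSV, return the first five rows
    data = pandas.read_csv(file)
    return data.head()


# Hypothetical sample data standing in for titanic.csv
sample_csv = io.StringIO(
    "id,age\n"
    "1,22\n2,38\n3,26\n4,35\n5,35\n6,54\n"
)

head = get_data_head(sample_csv)
print(head.shape)          # (5, 2)
print(list(head.columns))  # ['id', 'age']
```

Because pandas.read_csv accepts file-like objects as well as paths, the same function works unchanged when R passes it a file name.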

An interactive Python session from R can be created using repl_python().

For example, you can write something like the following:

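library(reticulate)

# Start an interactive Python prompt inside the R session
repl_python()
# The following lines are typed at the Python prompt
import pandas as pd
[i*i for i in range(10)]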

The results are printed in the R console itself, as though it were a Python interpreter. Typing exit returns to the R prompt.

Python objects (lists, dictionaries, DataFrames, and arrays) created in a Python session can be accessed from R through the py object. Suppose df is a pandas DataFrame created in the Python session and its summary is needed in R. It can be obtained as follows:

summary(py$df)
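
For comparison, the Python side of this exchange might look like the sketch below. The DataFrame contents are invented for illustration; describe() is the pandas method closest in spirit to R's summary(), though the two do not report identical statistics.

```python
import pandas as pd

# Hypothetical data; in practice df would come from a file or query
df = pd.DataFrame({
    "age": [22, 38, 26, 35],
    "fare": [7.25, 71.28, 7.93, 53.10],
})

# Summary statistics computed in Python, analogous to summary(py$df) in R
stats = df.describe()
print(stats.loc["count", "age"])  # 4.0
print(stats.loc["mean", "age"])   # 30.25
```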
