Serialization/deserialization

Serialization is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection) and reconstructed later, possibly in a different computer environment.[1] Deserialization is the reverse step: when the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object.
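As a minimal sketch of this round trip, the snippet below uses Python's standard pickle module. The record being serialized is a made-up example, not data from this chapter. It shows that the restored object is equal to the original but is a separate object in memory:

```python
import pickle

# Hypothetical record standing in for "object state".
original = {"name": "Mike", "scores": [0.5, 0.75], "active": True}

# Serialize: turn the object into bytes that could be written to a file
# or sent over a network connection.
payload = pickle.dumps(original)

# Deserialize: rebuild an object from those bytes.
clone = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(clone == original)       # True: same content
print(clone is original)       # False: a distinct copy
```

Note that pickle data should only be loaded from sources you trust, because unpickling can execute arbitrary code.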

Data structures such as JSON documents, arrays, DataFrames, and Series sometimes need to be stored as physical files or transmitted over a network. Serializing them can be understood as dumping the data into some format (text, CSV, and so on) from which all the important data points can be recreated by loading, that is, deserializing, the dump.
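The following sketch, which uses the standard json module and an invented record, illustrates what "all the important data points can be recreated" means for a text format. The values survive the round trip, but a Python-specific detail (the tuple type) does not:

```python
import json

# Hypothetical record; the tuple is a Python-only type with no JSON equivalent.
record = {"name": "George", "scores": (0.1, 0.9)}

text = json.dumps(record)    # serialize to a plain-text string
restored = json.loads(text)  # deserialize back into Python objects

print(text)                  # {"name": "George", "scores": [0.1, 0.9]}
print(restored == record)    # False: the tuple came back as a list
print(restored["scores"] == list(record["scores"]))  # True: the values are intact
```

Whether such a loss matters depends on the application; binary formats such as pickle keep more type information at the cost of portability.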

A common example is storing the parameters of a trained statistical model. The serialized file containing the trained parameters can be loaded later, and test data can be passed through the restored model for prediction. This is a popular way to put statistical models to use.
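To make the idea concrete, here is a small sketch under simplifying assumptions: the "model" is a one-feature least-squares line, and the helper names fit_line and predict are hypothetical, not part of any library. The fitted parameters are serialized, restored, and then used on new inputs:

```python
import pickle

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def predict(params, xs):
    """Apply stored parameters to new inputs."""
    return [params["slope"] * x + params["intercept"] for x in xs]

# Training phase: fit on toy data that lies exactly on y = 2x + 1.
params = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
blob = pickle.dumps(params)  # in practice this would be written to a file

# Prediction phase, possibly in another process: restore and predict.
restored_params = pickle.loads(blob)
predictions = predict(restored_params, [4.0, 5.0])
print(predictions)
```

Real modeling libraries store far richer state than two numbers, but the workflow of train, serialize, load, and predict is the same.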

Other uses of serialized data formats include transferring data over a wire, storing objects in databases or on hard disk drives, making remote procedure calls, and detecting changes in time-varying data.
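The last of these uses, change detection, can be sketched as follows. This is one possible approach, not a method prescribed by the text: each snapshot is serialized to a canonical string and hashed, so two snapshots can be compared by their short fingerprints. The fingerprint helper and the sample records are hypothetical:

```python
import hashlib
import json

def fingerprint(record):
    """Hash a canonical serialized form of the record."""
    # sort_keys makes the text independent of dictionary key order.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

before = {"name": "Val", "score": 0.42}
reordered = {"score": 0.42, "name": "Val"}  # same data, different key order
after = {"name": "Val", "score": 0.43}      # one value has changed

print(fingerprint(before) == fingerprint(reordered))  # True
print(fingerprint(before) == fingerprint(after))      # False
```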

Let's create a sample DataFrame to understand serialization to the various file formats supported by pandas:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "First_Name": ["Mike", "Val", "George", "Chris", "Benjamin"],
    "Last_name": ["K.", "K.", "C.", "B.", "A."],
    "Entry_date": pd.to_datetime(["June 23,1989", "June 16,1995", "June 20,1997", "June 25,2005", "March 25,2016"], format="%B %d,%Y"),
    "Score": np.random.random(5),  # five random scores in [0, 1), so values differ per run
})
df

Take a look at the following output:

 DataFrame for serialization
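As a hedged preview of what serializing such a frame looks like, the sketch below round-trips a cut-down, fixed-value version of the sample DataFrame through two formats pandas supports: pickle (binary) and CSV (text). The variable names and the temporary file name df.pkl are illustrative choices only:

```python
import io
import os
import tempfile

import pandas as pd

# Cut-down version of the sample frame, with fixed scores for repeatability.
sample = pd.DataFrame({
    "First_Name": ["Mike", "Val"],
    "Entry_date": pd.to_datetime(["June 23,1989", "June 16,1995"], format="%B %d,%Y"),
    "Score": [0.25, 0.75],
})

# Binary round trip: pickle preserves dtypes, including the datetime column.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "df.pkl")
    sample.to_pickle(path)
    from_pickle = pd.read_pickle(path)

# Text round trip: CSV stores everything as text, so the date column
# has to be parsed again when the file is loaded.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)
from_csv = pd.read_csv(buffer, parse_dates=["Entry_date"])

print(from_pickle.equals(sample))
print(from_csv.dtypes)
```

The trade-off is typical: the CSV output is human-readable and portable, while the pickle output restores the frame without any extra parsing hints.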