How it works...

The mvad dataset contains data about the transition from school to work, recorded monthly from July 1993 to June 1999. The states are EM, FE, HE, JL, SC, and TR, which stand for employment, further education, higher education, joblessness, school, and training. The dataset also includes covariates such as male, catholic, and Grammar (grammar school). Belfast indicates the location of the school, funemp indicates the father's employment status, fmpr describes the father's current or most recent job, livboth indicates whether the respondent lived with both parents, and gcse5eq describes the qualifications gained. These covariates are followed by the monthly status columns. The data is stored in STS (state-sequence) format, in which each sequence is given as a row of consecutive states. We can see this by using the head command to print the first six sequences in STS format.
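As a minimal sketch of that inspection step, the following loads mvad from the TraMineR package and prints the covariate names and the start of the first six sequences. The column positions (1 to 14 for the identifier, weight, and covariates, 15 onward for the monthly states) are an assumption based on the dataset's layout as shipped with TraMineR; check them with names(mvad) before relying on them.

```r
library(TraMineR)

# Load the school-to-work transition data shipped with TraMineR
data(mvad)

# Identifier, weight, and covariates (assumed to occupy the first 14 columns)
names(mvad)[1:14]

# First six sequences in STS format: one row per individual,
# one column per month (only the first six months are shown here)
head(mvad[, 15:20])
```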

Using the seqstatl function, we find all the states that occur in the data. We store the short labels and the long labels in two separate vectors. Using the seqdef function, we create a state sequence object. Once the state sequence object is created, we can display the sequences in different formats, that is, STS (the default) or SPS (State-Permanence-Sequence).
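The steps above can be sketched as follows. The variable names (mvad.alphabet, mvad.scodes, mvad.labels, mvad.seq) and the column range 15:86 are illustrative assumptions, not necessarily the ones used in the recipe. The alphabet is written out explicitly so that the short and long labels are guaranteed to line up with it.

```r
library(TraMineR)
data(mvad)

# List the distinct states found in the monthly columns
# (columns 15:86 are assumed to hold July 1993 to June 1999)
seqstatl(mvad, 15:86)

# State values as they appear in the data, with matching short and long labels
mvad.alphabet <- c("employment", "FE", "HE", "joblessness", "school", "training")
mvad.scodes   <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad.labels   <- c("employment", "further education", "higher education",
                   "joblessness", "school", "training")

# Build the state sequence object
mvad.seq <- seqdef(mvad, 15:86, alphabet = mvad.alphabet,
                   states = mvad.scodes, labels = mvad.labels)

# Same sequences, two display formats
print(mvad.seq[1:5, ], format = "STS")  # one state per time position
print(mvad.seq[1:5, ], format = "SPS")  # (state, duration) spells
```

The SPS display collapses each run of identical consecutive states into a single state and duration pair, which is usually easier to read for long sequences.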

Before mining frequent sequential patterns, you need to create transactions that carry temporal information. In this recipe, we introduce two methods to obtain such transactions. In the first method, we create a list of transactions and assign a transaction ID to each one. We use the as function to transform the list into a transaction dataset. We then add eventID and sequenceID as temporal information: sequenceID identifies the sequence that the event belongs to, and eventID indicates when the event occurred. After generating transactions with temporal information, you can use this dataset for frequent sequential pattern mining.
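A minimal sketch of the first method is shown here. The item sets, the Tr1 to Tr5 names, and the sequenceID and eventID values are made-up toy data for illustration only.

```r
library(arulesSequences)  # also attaches arules

# Hypothetical toy data: each list element is one event (a set of items),
# and the list names serve as transaction IDs
events <- list(c("A", "B"), c("C"), c("A", "D"), c("B"), c("A", "C"))
names(events) <- paste0("Tr", seq_along(events))

# Coerce the list into a transactions object
trans <- as(events, "transactions")

# Attach temporal information:
#   sequenceID = which sequence the event belongs to
#   eventID    = when the event occurred within that sequence
# Rows are kept ordered by sequenceID, then eventID
transactionInfo(trans)$sequenceID <- c(1L, 1L, 1L, 2L, 2L)
transactionInfo(trans)$eventID    <- c(10L, 20L, 30L, 10L, 20L)

inspect(trans)
```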

In addition to creating your own transactions with temporal information, if you already have data stored in a text file, you can use the read_baskets function from arulesSequences to read transaction data in basket format. The transaction dataset read this way can then be used for further frequent sequential pattern mining:

> help("timedsequences-class")  
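As a sketch of the second method, the following reads the small zaki.txt example file that ships with arulesSequences. The choice of this file and the variable names are assumptions for illustration; the recipe's own input file may differ. The info argument names the leading columns of each line that hold the temporal information rather than items.

```r
library(arulesSequences)

# Locate the example basket file bundled with the package
zaki_file <- system.file("misc", "zaki.txt", package = "arulesSequences")

# The first three fields of each line are sequenceID, eventID, and SIZE;
# the remaining fields are the items of that event
zaki <- read_baskets(con = zaki_file,
                     info = c("sequenceID", "eventID", "SIZE"))

# View the items alongside their temporal information
as(zaki, "data.frame")
```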