Creating your own structured data

If the primary source of your data is unstructured or nonexistent, you will need to start from the very beginning. Here is my personal workflow for creating a single structured dataset:

  1. Consider a question that you would like to answer through data, along with the data needed to answer it.
  2. Create a metadata document of your desired dataset columns and types.
  3. Gather the data related to the problem in one or more unstructured datasets.
  4. Convert each unstructured dataset into a machine-readable format.
  5. Seek inconsistencies in each dataset and fix them.
  6. Align the types in each record with the types defined by your metadata document.
  7. Filter columns in each dataset to only the columns defined by your metadata document.
  8. Merge your datasets into a single dataset.
  9. Identify duplicate records and consolidate them.
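
To make steps 2 and 5 through 9 concrete, here is a minimal sketch in Haskell. It assumes step 4 has already produced rows as lists of column-name and text pairs. The Reading record, its columns, the RawRow type, the sample surveys, and the rule of keeping the first of two duplicates are all hypothetical choices made for illustration, not a prescribed method.

```haskell
import Data.Char (isSpace, toLower)
import Data.Function (on)
import Data.List (dropWhileEnd, nubBy)
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Step 2: the metadata document, expressed as a record type.
-- The columns here are hypothetical.
data Reading = Reading
  { city  :: String
  , year  :: Int
  , tempC :: Double
  } deriving (Show, Eq)

-- A machine-readable but still untyped row (the output of step 4):
-- each cell is a column name paired with raw text.
type RawRow = [(String, String)]

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Steps 5 to 7: normalise the text, parse each field into the type the
-- metadata declares, and keep only the declared columns. A row with a
-- missing or unparseable field yields Nothing.
toReading :: RawRow -> Maybe Reading
toReading row = do
  c <- trim <$> lookup "city" row
  y <- lookup "year" row >>= readMaybe . trim
  t <- lookup "temp_c" row >>= readMaybe . trim
  pure (Reading (map toLower c) y t)

-- Steps 8 and 9: merge the cleaned datasets, then treat records that
-- share a city and year as duplicates and keep the first one seen.
-- (Keeping the first is an assumption; averaging is another option.)
mergeDatasets :: [[RawRow]] -> [Reading]
mergeDatasets = nubBy ((==) `on` key) . mapMaybe toReading . concat
  where
    key r = (city r, year r)

-- Two small, made-up datasets with typical inconsistencies.
surveyA, surveyB :: [RawRow]
surveyA =
  [ [("city", " Oslo "), ("year", "2015"), ("temp_c", "6.3"), ("notes", "rooftop")]
  , [("city", "Lima"), ("year", "2015"), ("temp_c", "n/a")]
  ]
surveyB =
  [ [("city", "oslo"), ("year", "2015"), ("temp_c", "6.4")]
  , [("city", "Lima"), ("year", "2016"), ("temp_c", "19.1")]
  ]

main :: IO ()
main = mapM_ print (mergeDatasets [surveyA, surveyB])
```

Running main prints two records. The Lima row for 2015 is dropped because its temperature does not parse, the extra notes column is ignored, and the second Oslo row for 2015 is discarded as a duplicate of the first. Silently dropping bad rows keeps the sketch short; in practice you may prefer to collect them for manual review.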

If this seems like grunt work, you are correct. Haskell is here to help.
