Chapter 7. What the Analyst Wants

Data-driven developers need to understand that the data in their systems might have a lifecycle that extends far beyond the flow of transactions for which their specific application is immediately responsible. Whether analysts are a part of your team or separated by several organizational divisions (Figure 7-1), the data they work with is the same data that powers the rest of the organization. Implementation artifacts can distort this, but the data characteristics important to analysts can and should be considered (see Figure 7-2).

Figure 7-1. Software developer and analyst
Figure 7-2. Data loses context

Knowledge management is even broader in scope. Introduction of this topic can lead many organizations to consider triple stores for the storage and retrieval of semantic data. Semantic technology is transformational in that it allows an organization to ask and answer questions effectively at a higher level. Ontological inquiries (both philosophical and information-science based) such as Who are our real customers? can create much more effective analytic answers to business questions such as How do we increase our perceived value in the eyes of our customers?

Once again, this describes the conditions needed for small, nimble groups to disrupt entire industries by asking a few of the right probing questions and by implementing solutions based on a better understanding of the real data. Developers who have not encountered requirements in this area yet might well encounter them soon.1

Extracting, Transforming, and Loading (ETL)

With RDBMS-centric applications, OLAP might well involve extracting, transforming, and loading (ETL) records from the applications’ data stores into other stores specialized for the sort of operations that analysts need to perform.

ETL itself can be a formidable undertaking, but even with this effort, analytic results are delayed by at least a few hours. Small and nimble companies are disrupting markets by reacting faster than the established incumbent companies. Analytical flows that include ETL processes should be carefully considered. Also, ETL typically strips data from its source context and from its security protections. This might well complicate compliance efforts.
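To make the loss of context concrete, here is a minimal sketch of an ETL job, assuming an in-memory SQLite database stands in for both the OLTP store and the analytic store. The table names (`orders`, `customer_totals`) and the columns `entered_by` and `access_label` are hypothetical, invented for illustration only.

```python
import sqlite3


def run_etl(source, target):
    """Copy per-customer order totals from the OLTP store to the analytic store."""
    # Extract: pull only the columns the analysts asked for.
    rows = source.execute(
        "SELECT customer_id, amount_cents FROM orders"
    ).fetchall()

    # Transform: aggregate order amounts per customer.
    totals = {}
    for customer_id, amount_cents in rows:
        totals[customer_id] = totals.get(customer_id, 0) + amount_cents

    # Load: write the aggregates into the analytic store.
    target.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer_id TEXT PRIMARY KEY, total_cents INTEGER)"
    )
    target.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)",
        sorted(totals.items()),
    )
    target.commit()
    return len(totals)


# Hypothetical OLTP table that carries provenance and an access label.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders "
    "(customer_id TEXT, amount_cents INTEGER, entered_by TEXT, access_label TEXT)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("c1", 1500, "clerk_a", "restricted"),
        ("c1", 2000, "clerk_b", "restricted"),
        ("c2", 500, "clerk_a", "public"),
    ],
)
source.commit()

target = sqlite3.connect(":memory:")
loaded = run_etl(source, target)

result = target.execute(
    "SELECT customer_id, total_cents FROM customer_totals ORDER BY customer_id"
).fetchall()
# PRAGMA table_info returns one row per column; index 1 is the column name.
target_columns = [row[1] for row in target.execute("PRAGMA table_info(customer_totals)")]
print(result)
print(target_columns)
```

In this sketch the analytic table ends up with totals only. Who entered each order and which access label applied to it never leave the source system, which is the kind of stripped context that can complicate compliance work.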

Complete and Integrated

Analysts want complete information about what is happening over a wide scope, which might go well beyond one siloed part of an organization. The less constrained the view, the better the analysis that can be done. What challenges data-driven developers, who must support analysts, is that data is spread out in most organizations across a myriad of siloed systems that do not have synchronized data models.

Collecting data with ETL is fraught with challenges. If our data management system can handle data “as is,” then analysts can do their work more effectively.

Accurate

If accuracy is absolutely essential, as it is in financial institutions, then our OLTP systems’ adherence to strict rules for updates needs to be reinforced. ACID compliance has historically been a well-known feature of OLTP data systems. OLAP systems were optimized for breadth of scope and flexibility over accuracy. If our analysis results require high accuracy, we might find that the classic ETL processes are not up to the task. Architectural approaches like the construction of an Operational Data Hub could be called for.

Flexible and Timely

Flexible analysis support is necessary because the future is impossible to predict. OLAP systems are designed to support “index everything” approaches. The RDBMSs that support many OLTP data systems do not provide comprehensive indexes, but instead offer limited indexes that are purposefully built to speed the anticipated transactions. Unanticipated queries against such systems are not performant.
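The difference between an anticipated and an unanticipated query can be observed directly. The following is a minimal sketch, assuming SQLite stands in for the RDBMS behind an OLTP system; the `orders` table, its columns, and the index name `idx_orders_customer` are hypothetical. It asks the database for its query plan instead of timing anything, so the exact wording of the plan might vary between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the descriptive text.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(id INTEGER PRIMARY KEY, customer_id TEXT, region TEXT, amount_cents INTEGER)"
)
# The only secondary index is the one built for the anticipated transaction.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Anticipated by the application: look up orders for one customer.
anticipated_plan = query_plan(
    conn, "SELECT * FROM orders WHERE customer_id = ?", ("c1",)
)

# Unanticipated, analyst-style question: filter on a column nobody indexed.
unanticipated_plan = query_plan(
    conn, "SELECT * FROM orders WHERE region = ?", ("EMEA",)
)

print(anticipated_plan)
print(unanticipated_plan)
```

The first plan should report a search that uses the index, while the second should report a scan of the whole table. On a small table the difference is invisible, but on a production-sized table the full scan is what makes the unanticipated query slow.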

When all of the previous criteria are met, analysts have timely data with which to make decisions. Whether those decisions benefit the organization itself, or whether these outputs are sold to others, the value of timely analysis is far greater than analysis in retrospect. Analysts will want to extract data from our systems in either batch or real-time modes. We should consider whether we want to prioritize analytics by adopting a data management system that can work with data “as is,” without requiring uniform schemas across datasets.
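What working with data “as is” might look like can be sketched in a few lines. The example below is an illustration under stated assumptions, not a description of any particular data management product: the three records, their `source` labels, and the helper names `iter_fields` and `find_by_field` are all hypothetical. Each record keeps the shape its originating system gave it, and the query looks for a field by name wherever it happens to appear.

```python
def iter_fields(value, path=()):
    """Yield (path, leaf) pairs for every leaf value in a nested record."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from iter_fields(child, path + (key,))
    elif isinstance(value, list):
        for child in value:
            yield from iter_fields(child, path)
    else:
        yield path, value


def find_by_field(records, field_name, wanted):
    """Return records holding field_name == wanted at any depth."""
    return [
        record
        for record in records
        if any(
            path and path[-1] == field_name and leaf == wanted
            for path, leaf in iter_fields(record)
        )
    ]


# Three siloed systems, three different shapes, no shared schema.
records = [
    {"source": "crm", "customer": {"email": "ada@example.com", "name": "Ada"}},
    {"source": "billing", "email": "ada@example.com",
     "invoices": [{"total_cents": 1200}]},
    {"source": "support", "contact": {"email": "bob@example.com"}},
]

matches = find_by_field(records, "email", "ada@example.com")
sources = [record["source"] for record in matches]
print(sources)
```

No record was reshaped before it was queried, so the analyst sees CRM and billing data for the same customer without waiting for an ETL job to reconcile the two models. A real system would back this kind of lookup with indexes instead of a linear walk over every record.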

1 Abraham Bernstein, James Hendler, Natalya Noy, “A New Look at the Semantic Web”, Communications of the ACM.
