APPENDIX B
How to Measure Digital Transformation Efforts in Annual Reports with Dictionary‐Based Automated Textual Analysis

Because concrete KPIs for measuring digital transformation outcomes are largely lacking, several authors (Beutel 2018; Chen and Srinivasan 2019; Hossnofsky and Junge 2019) have already applied basic automated textual analysis. They mostly counted occurrences of terms from some form of digital dictionary in financial reports and/or earnings calls, using the counts as proxies for digital transformation outcomes. The original research underlying this book followed the same idea by using the previously described “replicable references.” It went substantially further, however, not only by integrating the developed digital transformation transversal framework for clustering dictionary terms (which was discarded in this book for the sake of simplicity) but, more importantly, by adding natural language processing (NLP) built on Python, a commonly used, easy‐to‐learn programming language with a broad range of machine learning libraries (Python 2020). The analysis relied mostly on NLTK and spaCy (NLTK 2020; spaCy 2020) for textual analysis beyond mere counting of occurrences/frequencies.
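The basic counting approach described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the original research: the mini-dictionary, the sample sentence, and the function names are hypothetical, and the normalization per 1,000 words is an assumption added here to make reports of different length comparable.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary; the dictionary used in the research is proprietary.
DIGITAL_TERMS = {"cloud", "automation", "machine learning", "blockchain"}


def count_dictionary_terms(text, terms):
    """Count case-insensitive whole-word occurrences of each dictionary term."""
    counts = Counter()
    lowered = text.lower()
    for term in terms:
        # Word boundaries keep "cloud" from matching inside "thundercloud".
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, lowered))
    return counts


def hits_per_thousand_words(text, terms):
    """Dictionary hits per 1,000 words (an assumed normalization, see lead-in)."""
    total_words = len(re.findall(r"\b\w+\b", text))
    hits = sum(count_dictionary_terms(text, terms).values())
    return 1000.0 * hits / total_words if total_words else 0.0


sample = ("Our cloud platform uses machine learning. "
          "Automation and cloud services grew, despite a thundercloud of risk.")
counts = count_dictionary_terms(sample, DIGITAL_TERMS)
score = hits_per_thousand_words(sample, DIGITAL_TERMS)
```

In a real setting, `sample` would be replaced by the full text of an annual report or earnings call transcript, and the resulting score would serve as the proxy for digital transformation effort.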

Based on these design decisions, a unique digital transformation language dictionary, spanning all major framework categories (catalysts, reactants, reaction mechanisms, and products), formed the foundation of all subsequent analysis. (See Figure B.1.) To limit the explosion in data size and the risk of an insufficient number of observations, the decision was made to stay at the element level (catalysts, reactants, reaction mechanisms, and products) and not to go further down (for example, to supply and demand or even below). The initial version of this proprietary dictionary was compiled and clustered by framework category manually, with a purposely very broad scope of terms. It covered around 400 digital (technology) related words and combined applicable literature research findings (Beutel 2018; Briggs et al. 2019; Hossnofsky and Junge 2019) with my practical experience. Because the catalyst category spans such a wide area, this initial dictionary was already strongly dominated by catalyst terms.

In a second step, recent natural language processing (NLP) advancements were leveraged in the form of three different, state‐of‐the‐art word-embedding/vectorization algorithms, accessed through one prepackaged Python module called “Magnitude” (Patel et al. 2018). Two FastText algorithms (Bojanowski et al. 2017) and one ELMo algorithm (Peters et al. 2018) were selected. This allowed each word in the dictionary to be expanded with similar terms (15 for each algorithm, leading to a total longlist of 9,346 words after removing duplicates).

The third step then aimed to manually cleanse the resulting longlist of all irrelevant words, combining my own expertise in the field with a supplementary crosscheck by a second digital transformation executive. This joint cleansing process applied several principles. It removed “typo similarities” surfaced by the algorithms (“virutal,” “softvare”), as no such typos are expected in SEC filings. It deleted obvious company names (for example, Apple, Microsoft, and Salesforce), as there are substantial overlaps between the chosen portfolio companies and the digital transformation related terms derived by the above algorithms. This prevented mentions of these companies, in their own and other companies' reports, from being counted as a representation of digital transformation and thereby distorting the results. Next, it removed obviously wrong associations (IT “cloud” versus “thundercloud”).
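The expansion and cleansing steps can be illustrated with a minimal nearest-neighbor sketch. It does not use Magnitude, FastText, or ELMo: the three-dimensional vectors below are invented toy values, the function names are hypothetical, and the `manual_removals` set merely stands in for the outcome of the human review described above.

```python
import math

# Toy vectors invented for illustration; the research used pretrained
# FastText and ELMo embeddings accessed through the Magnitude package.
TOY_VECTORS = {
    "cloud": (0.9, 0.1, 0.0),
    "saas": (0.8, 0.2, 0.1),
    "hosting": (0.7, 0.3, 0.0),
    "thundercloud": (0.6, 0.0, 0.6),
    "automation": (0.2, 0.9, 0.1),
    "robotics": (0.1, 0.8, 0.2),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn):
    """Return the topn words whose vectors are closest to the given word."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:topn]]


def expand_dictionary(seeds, vectors, topn, manual_removals=frozenset()):
    """Add each seed's nearest neighbors, dedupe, then drop reviewed-out words."""
    longlist = set(seeds)
    for seed in seeds:
        if seed in vectors:
            longlist.update(most_similar(seed, vectors, topn))
    return sorted(longlist - set(manual_removals))


raw_longlist = expand_dictionary(["cloud", "automation"], TOY_VECTORS, topn=3)
cleaned = expand_dictionary(["cloud", "automation"], TOY_VECTORS, topn=3,
                            manual_removals={"thundercloud"})
```

In this toy example, “thundercloud” enters the raw longlist as a neighbor of “cloud” and disappears only after the manual removal step, which mirrors why a human cleansing pass followed the algorithmic expansion.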

FIGURE B.1 Digital transformation (language) dictionary categories.

Finally, the chosen approach eliminated, as far as possible (with custom Python code), all words sharing the same lemma (“automate” as the base form versus “automated” as a variation with the same lemma “automate”). This later avoided double counting of lemmatized tokens/words. The final dictionary covers 1,008 terms in total across all framework categories. The described dominance of catalyst terms also holds for the final version. This does not matter for this book, as all categories were ultimately aggregated into one single measurement for the sake of simplicity.
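The lemma-based deduplication can be sketched as follows. The custom code from the original research is not reproduced here. The lookup table is a hypothetical stand-in for a real lemmatizer such as those provided by spaCy or NLTK, and the function names are likewise assumptions.

```python
# Hypothetical lookup standing in for a real lemmatizer (e.g., spaCy or NLTK).
TOY_LEMMAS = {
    "automated": "automate",
    "automating": "automate",
    "clouds": "cloud",
    "platforms": "platform",
}


def toy_lemmatize(word):
    """Return the base form of a word, or the word itself if unknown."""
    return TOY_LEMMAS.get(word, word)


def dedupe_by_lemma(terms, lemmatize=toy_lemmatize):
    """Keep one entry per lemma, preserving the order of first appearance."""
    seen = set()
    unique = []
    for term in terms:
        lemma = lemmatize(term.lower())
        if lemma not in seen:
            seen.add(lemma)
            unique.append(lemma)
    return unique


longlist = ["automate", "automated", "Cloud", "clouds", "platforms", "automating"]
final_terms = dedupe_by_lemma(longlist)
```

Passing a different `lemmatize` callable would allow a full lemmatizer to be swapped in without changing the deduplication logic. Because report text is lemmatized before matching, keeping one entry per lemma ensures that each occurrence is counted only once.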
