Simplifying Data Engineering and Analytics with Delta

BIRMINGHAM—MUMBAI

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Dhruv Jagdish Kataria

Senior Editor: Tazeen Shaikh

Content Development Editor: Sean Lobo, Priyanka Soam

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Manju Arasan

Production Designer: Roshan Kawale

Marketing Coordinator: Nivedita Singh

First published: July 2022

Production reference: 1290622

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80181-486-7

www.packt.com

This book is dedicated to my parents for their unconditional love and support.

While there are too many to name here, I would like to thank the mentors and colleagues who have encouraged and aided me in this journey. Last but not least, I would like to thank the team at Packt for all their help and guidance throughout the process.

Foreword

My father was one of the first chief information officers (CIOs) back in the mid-1980s. He led all of IT for the largest commercial property insurer in the world. He reported to the CEO, which, at that time, was uncommon as most IT functions reported to the CFO because they were cost centers. Every weekend he would bring home some type of new technology: an Apple 2E, an IBM PC, even a "portable" computer that weighed 40 lbs. My sisters and I would play with them for hours on end, creating spreadsheets and writing basic programs. At the time, I viewed him as being on the bleeding edge of technology, a real "techie."

When I graduated college and went to work at IBM in 1991, I came home and tried to talk about technology with my father using all the speeds and feeds of the mid-range and Unix systems that I had just been trained on. Each time I mentioned a particular technical specification, he would ask me "What does that do?" or "Why is that important?" His questions frustrated me. When I explained why the SPEC-INT metric was important, he would look confused. I began to think my father wasn't the techie I once believed him to be. And I was right. Part of me was disappointed with this realization. But, over time, I came to see that his expertise was not the technology itself, but understanding the business strategy deeply and translating how specific capabilities provided by technology could be applied to make the business strategy succeed.

Fast forward 30+ years and I'm now the vice president of Global Value Acceleration at Databricks, one of the fastest-growing software companies in history. I lead a global team of consultants, or translators, who help prospects and customers connect the technical power of our data and AI platform to the meaningful business value its capabilities will deliver as they pursue their business strategy.

Looking back, I realize that I've been doing value translation my entire career. I found that when the business strategy meets the technical strategy and they are well aligned, magic happens. Executives who hold budgets and decision-making authority accelerate and approve initiatives and their associated spending. Likewise, when the translation work isn't done or isn't done well, they deny those requests. Over my career, I've learned that when those requests fail, it's generally not the fault of the technology. It comes down to the quality of the translation and the underlying story.

The need for translators in data is significant and increasing. According to a recent McKinsey article, "(data) translators play a critical role in bridging the technical expertise of data engineers and data scientists with the operational expertise of marketing, supply chain, manufacturing, risk, and other frontline managers. In their role, translators help ensure that the deep insights generated through sophisticated analytics translate into impact at scale in an organization. By 2026, the McKinsey Global Institute estimates that demand for translators in the United States alone may reach two to four million."

Through thousands of translation engagements with global enterprises over the last decade, my team, with our business value assessment (BVA) methodology, has proven to be a critical ingredient to the success of large initiatives. The recipe that translates complex technology to the C-suite for investment consideration follows a simple framework comprising a story. It draws executives in, making it easy for them to say "yes":

  1. Key strategic priorities
  2. Use cases aligned with those priorities
  3. Technical barriers in the way of success
  4. Capabilities required to succeed
  5. Value to be realized when successful
  6. Return on investment
  7. Success plan

According to International Data Corporation (IDC), 95% of technology investments require financial justification. This framework provides the financial justification that is needed, but it also reinforces the urgency to act by connecting the project to the most important priorities or business problems that the C-suite and board have their eyes on, and it specifies the capabilities required for success. When you put these together, you have a CFO-ready business case that qualifies and quantifies the value, setting your project apart from all others.

This is why I've been so excited about this book. The opportunity to apply powerful technology such as Delta and deliver impact all the way to the boardroom of your employer is real and required for success in today's market. When I first worked with Anindita at Databricks, it was clear to me that she has a special talent that few technical people have. She is a translator. She can speak succinctly about very complex technical topics, make them easy to understand at any level, and connect the technology to why it matters to the business. Her ability to do this for our customers and for other Databricks employees has helped her, and Databricks, succeed in many ways.

As you read on from here, note how everything from data modeling to operationalizing Delta pipelines is made easy to understand and translatable to the business. Anindita, in her special way, will guide you to become a better data engineer while infusing you with specific skills to become a data translator, whose future value may just be priceless.

Doug May

VP, Global Value Acceleration

Databricks Inc.

Contributors

About the author

Anindita Mahapatra is a lead solutions architect at Databricks in the data and AI space helping clients across all industry verticals reap value from their data infrastructure investments. She teaches a data engineering and analytics course at Harvard University as part of their extension school program. She has extensive big data and Hadoop consulting experience from Think Big/Teradata, prior to which she was managing the development of algorithmic app discovery and promotion for both Nokia and Microsoft stores. She holds a master's degree in liberal arts and management from Harvard Extension School, a master's in computer science from Boston University, and a bachelor's in computer science from BITS Pilani, India.

About the reviewer

Oleksandra Bovkun is a solutions architect for data and AI platforms and systems. She works with customers and engineering teams to develop architectures and solutions for data platforms and guide them through the implementation. She has extensive experience and expertise in open source technologies such as Apache Spark, MLflow, Delta Lake, Kubernetes, and Helm, and programming languages such as Python and Scala. Furthermore, she specializes in data platform architecture, especially in DevOps and MLOps processes. Oleksandra has more than 10 years of experience in the field of software development and data engineering. She likes to discover new technologies and tools, architecture patterns, and open source projects.
