Introduction

A tweet from Dr. Brandeis Marshall.

Source: https://twitter.com/csdoctorsister/status/1536343596336418821

My journey to data conscientiousness started when I was a kid as I rolled coins with my mom and helped my dad organize his job's employee resource group's annual membership rosters. My mom would bring out the big Welch's jar, about half full of loose change. Sitting on the living room floor, she'd dump all the coins on the carpet and we'd start separating them by denominations. Mom would bring out all the coin wrapper rolls she'd gotten from the bank. We'd stack the pennies, nickels, dimes, and quarters—and talk about whatever moms and daughters talk about. She taught me how many of each denomination goes into each coin wrapper roll: 50 pennies gives us 50 cents, 40 nickels gives us $2, 40 quarters gives us $10, and 50 dimes gives us $5. When we'd filled as many of the rolls as we could, we'd count up our earnings. Sometimes it would be $30, and other times it would be closer to $100.

At first, I simply liked the counting, the talking, and stuffing the coins in those small paper wrappings. As I grew up, I started to see these coins as a resource to get what I wanted. That 50 cents could be put to excellent use to get some SweeTARTS. Two dollars in nickels would keep my candy stash stocked for a week. Five dollars would pay for my favorite order at Swenson's and I'd have change left over. In college, $10 in quarters was gold because no laundry would have been done otherwise. Looking back, I realize rolling coins was her way of having me practice my counting and learn the many ways to make a dollar using coins and the value of saving.

My dad, for more than a few years, had this annual huge task of verifying each chapter's membership rosters for his job's nationwide employee resource group. The first year or two or three, Mom and I watched him. Somewhere along the way, I started to help out when asked at first and even volunteered, maybe once. My recollection is fuzzy. What I remember vividly was that the amount of mail he received took down a few forests. There were endless printouts on standard perforated paper from those old dot-matrix printers. The basement became overrun with boxes of unopened envelopes of various sizes, from letter-sized to overstuffed legal-sized.

While my dad was figuring out which pile to tackle, opening each envelope started my supply chain of organization and sorting: record the chapter location and region, clip the membership roster to the envelope, highlight the number of chapter members, leave the chapter's membership dues checks in the envelope, and add this new envelope to the other envelopes in that region pile. And yes, each chapter printed and snail-mailed their membership rosters. The cross-checking of the mailed rosters year after year was dizzying. Some chapters used a great printer and had access to plenty of printer ink. Other chapters weren't so fortunate. My young eyes were called upon to read the smeared and faded letters.

Reconciling these membership rosters took weeks of shifting from one pile to the next. Some people changed chapters due to job relocations but didn't update their membership affiliation. Other people decided to not be part of the employee resource group anymore and didn't follow the member pause/deactivate process. The employee resource group finally created an online database—my dad had something to do with this, I'm sure.

Rolling coins introduced me to data as numbers, math, and financial literacy without being intimidating. Organizing chapter membership rosters introduced me to data as people, context, and guideposts to decisions. Armed with this understanding, I found that school was just one cool place where reading, math, and general exploration of data things happened.

But while roaming the computer science hallways at the University of Rochester, I came to recognize how data was viewed by the world. The Year 2000 problem, the Y2K bug, dominated the headlines my junior year. Everyone seemed so concerned about the computing infrastructure and whether systems would “hold up” after the clock struck 12 a.m. on January 1st, 2000. Businesses were desperately trying to back up their data on file servers, zip drives, and 3.5-inch disks. My classmates began signing big-money employment contracts with signing bonuses by that fall. They were focused on refining their computer systems and networking skills. That's what employers wanted. I predicted that this dotcom boom was about to bust, so I elected to pursue graduate studies.

I also saw how much businesses feared not having their data. Everything I could think of had a critical connection to data, particularly why, how, and what we digitally house in systems. And society was singularly focused on the systems themselves. I believed then, as I do now, that data runs the world. I decided to go all in on data in graduate school and my career.

Data gets a bad reputation as a pseudo demon spirit creature because all the numbers and math are deemed complicated, confusing, and not relatable. Data is not a tangible concept to many people—those in computing, tech, and data spaces and those who are not in those spaces. We all, to some degree, are in digital spaces where data lives. Critiquing data uses involves all of us, but for those of us in the data trenches, there's a bigger pressure to suss out the issues and course-correct before the tech product goes public.

This book is for the rebel tech talent, those who acknowledge and are ready to address the limitations of software development. They recognize that tech's philosophy and practice of “move fast, break things” is inherently problematic and needs to be changed, and they want to pinpoint the ways discrimination exists in this digital data space. The primary reader for this book, however, is the entry-level software developer or data analyst. But frankly, it should be considered a reference guide to making more responsible and equitable data connections.

Data Conscience translates theory to practice. The gaps in our current data infrastructure are spotlighted so that data practitioners know more precisely where issues exist. And I'm centering the most vulnerable, ethical issues and resolutions to address social, political, and economic implications and not just computational ones like optimization, load balancing, and latency.

What you will read in this book is a blend of social sciences, humanities, and data management with tangible, real-world examples. Consider it a modern antemortem describing specific instances of where ethical flags are raised and how data structures help or hinder ethics resolutions. I focus on being preemptive in handling data operations for inclusion rather than conducting conversational (generic) autopsies of case studies and algorithmic audits.

The book is divided into three parts. Part I, “Transparency” (Chapters 1–4), takes you on the rollercoaster of how outcomes and impacts of data, code, algorithms, and systems are revealed to all of us by companies, organizations, and groups. Part II, “Accountability” (Chapters 5–8), covers ways in which data and software teams can critique and explore interventions to make responsible data connections during the tech building phase. And lastly, Part III, “Governance” (Chapters 9–11), reviews the action steps taken thus far and ends as a public accountability manifesto on what all of us can do to humanize our relationship to data.

Here's a brief chapter-by-chapter overview:

  • Chapter 1 explores the role data has played in our society, particularly in the United States—how we've handled it and our relationship to handling it well. Oppression tactics, in the law and in the sciences, are mere social controls to enforce a hierarchy positioning that doesn't exist.
  • Chapter 2 describes for those of us on the “inside” of tech how we're torn by this realization that the code we write is likely contributing to a cycle of harm that we don't know how to curtail, stop, or dislodge ourselves from. Reconciling—and more to the point accepting—imperfection in data and tech needs a place in tech. The choice between error or no error doesn't exist anymore. There's a third choice: nontech-solvable.
  • Chapter 3 tackles the term “bias” and its multitude of interpretations head-on. I describe how bias shows up and ways to shift our mindset on how we recognize and handle it, even before we write a single line of code. Getting overwhelmed and disengaging from efforts to combat bias is no longer an option.
  • Chapter 4 stretches our minds about what we've accepted as computational thinking and standard discussion points to fold in a more intentional socio-ethical tech understanding. Coding requires a 360-degree panoramic view that requires more than coders in order to see, understand, capture, and partially address the social, technical, and ethical considerations.
  • Chapter 5 focuses on asking the “why” questions, especially as part of data collection and reformat practices. Tech does a poor job of handling data collection and reformat. Learning to ask questions early, often, and with real people in mind streamlines how we manage data operations as a data, computing, and larger tech community.
  • Chapter 6 focuses on asking the “what” questions, especially as part of data storage and management procedures. We must come to grips with the fact that the data storage landscape is a culmination of intentional, yet sometimes harmful, decision-making exercises with social, computational, and morality implications.
  • Chapter 7 focuses on asking the “how” questions, especially as part of data analytics. The algorithms, systems, and platforms are taken from the same playbook, by the same homogeneous people. There's much we can learn about data from our comrades in the humanities and social sciences.
  • Chapter 8 focuses on asking the “when” questions, especially as part of data visualization. The allure of data visualizations is enticing, so proceed with caution with every chart, graph, and dashboard you encounter.
  • Chapter 9 snaps us back to reality as tech moves fast while the law moves slow. Juggling the usefulness versus persnickety hindrances of the law dominates this chapter. Focusing on the fundamental building blocks of tech, like data and the algorithms that use it, has traction and momentum rather than constructing legislation that's fixated on applications of technologies.
  • Chapter 10 discusses tech's dominance in our society and, in particular, as a culture that excludes other solution options. The rockstar tech workers, or algorithmic influencers, guide every industry one software update, code version release, or code library at a time. But clearly, we've hit a ceiling in what tech should do.
  • Chapter 11 is my public accountability manifesto. We're each battling for our dignity in digital spaces, and we must do so with the same indignant veracity as we do in physical spaces. The tech industry's ability to operate, for the most part, with impunity puts more onus on us, as global digital citizens, to maintain intense pressure for transparency, accountability, and governance in all spaces where data resides. Algorithmic processes, systems, platforms, and institutions won't become responsible or equitable without us making it a requirement, rather than a choice.