Chapter 3. Moons of Saturn—The Crucial Role of Context

In this chapter, we introduce three case studies that we’ll use throughout the book to illustrate the main concepts of technical debt and the strategies for managing it. All long-lived software-intensive systems have to deal with technical debt within their context. The interactions and specifics of the many factors of context help development organizations understand the systems and navigate the causes and consequences of the debt.

“It Depends…”

When asking questions about software development practices, how often have you heard the reply, “it depends”? This is not just a way to dismiss the question. There are no all-inclusive answers, universally applicable techniques, or standard recipes. The answer really does depend on a number of factors that describe the context of the system. Eight of these factors are shown in Figure 3.1.

Figure 3.1 “It depends”: The many factors of context

The factors that define the context of the system include Size, Architecture, Business model, Team distribution, Rate of change, Age of system, Criticality, and Governance.

Size: The size of the system is by far the greatest factor because it drives the size of the team, the number of teams, the need for communication and coordination between teams, the impact of change, and more. The number of person-months, the size of the code, and the development budget are all possible proxies for size. Size is often related to complexity. The larger the system, the more technical debt it can accumulate.
Architecture: Is there a de facto architecture in place at the start of the project? Most projects are not novel enough to require a lot of architectural effort. They follow commonly accepted patterns in their domains. Many key architectural decisions are made in the first few days of development, such as choices related to middleware, operating systems, and programming languages. These choices may be based on what the developers are familiar with and their gut feelings rather than a careful analysis of long-term system consequences. We will show in Chapter 6, “Technical Debt and Architecture,” that technical debt at the architectural level is difficult to identify and very costly to repay.
Business model: What is the money flow? How is the project funded? Are you developing an internal system, a commercial product, a bespoke system on contract for a customer, or a component of a large system involving many different parties? Is it free/libre open-source software (FLOSS)? Financial considerations are a key factor in incurring technical debt or deciding to remediate technical debt.
Team distribution: Team distribution is often linked to the size of a project. How many teams are involved and collocated? Distributed teams increase the need for explicit communication and coordination of decisions as well as stable interfaces between the software components that they are responsible for. Communication issues and organizational silos contribute to the accumulation of technical debt, especially at the architectural level.
Rate of change: Though agile methods are all for embracing change, not all systems experience a rapid pace of change in their environment. Many projects have very stable requirements definitions. How stable is your business environment, and how many risks and unknowns are you facing? The volatility of the requirements will increase the propensity of the team to incur technical debt.
Age of system: Technical debt has more opportunities to accrue on large and long-lived systems. These legacy systems carry hidden assumptions about their architecture, and evolving them can reveal technical debt. Constraints accrue in legacy systems, often causing another source of technical debt. Alternatively, creating a new system, with fewer constraints, can proceed without taking on a lot of debt.
Criticality: How many people die or are hurt if the system fails? For safety-critical and mission-critical systems, documentation needs increase dramatically to satisfy external agencies that want to assure the safety of the public. More formal verification and validation techniques may be essential to ensure that a system behaves the way it should. Such systems often struggle with how to modernize hardware or software that can be a major source of debt—whether it is legacy hardware or some arcane software that implements a critical algorithm.
Governance: How are critical decisions made? How are projects steered? How do projects begin and end? Who decides what to do when things go wrong? How is success or failure defined? Who manages the software project managers? Tension or lack of communication between a project and the management structure may cause technical debt accumulation, as discussed in Chapter 10, “What Causes Technical Debt?”

Other factors can change the context of the software development process, but they have more indirect effects on it. They mostly shape the eight factors just described. Some of these other factors are domain, process maturity, corporate culture, degree of innovation, and economic imperatives.

These factors combine in many different ways to create the context in which development organizations must plan their approach to technical debt. An old and large company might have mostly large projects, a significant level of governance, proprietary code, a stable architecture, large globally distributed teams, and a medium rate of change. A small startup might have a small codebase, an unstable or still fluid architecture, low criticality, a high rate of change, and a collocated team.
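One way to make the eight factors concrete is to record them as data. The following is only a minimal sketch: the class name `ProjectContext`, the field names, and the helper `unspecified_factors` are illustrative assumptions, not part of any model defined in this chapter. The two instances paraphrase the old, large company and the small startup just described, leaving unset any factor the description does not mention.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ProjectContext:
    """Hypothetical record of the eight context factors (names are assumptions)."""
    size: Optional[str] = None
    architecture: Optional[str] = None
    business_model: Optional[str] = None
    team_distribution: Optional[str] = None
    rate_of_change: Optional[str] = None
    age_of_system: Optional[str] = None
    criticality: Optional[str] = None
    governance: Optional[str] = None


def unspecified_factors(ctx: ProjectContext) -> list:
    """Return the names of the factors that have not been described yet."""
    return [f.name for f in fields(ctx) if getattr(ctx, f.name) is None]


# Values paraphrase the two organizations sketched in the text above.
established = ProjectContext(
    size="mostly large projects",
    architecture="stable",
    business_model="proprietary code",
    team_distribution="large, globally distributed",
    rate_of_change="medium",
    governance="significant",
)
startup = ProjectContext(
    size="small codebase",
    architecture="unstable or still fluid",
    team_distribution="collocated",
    rate_of_change="high",
    criticality="low",
)

print(unspecified_factors(established))
print(unspecified_factors(startup))
```

Listing the factors that remain unset is a simple prompt for the questions a team still has to answer about its own context.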

Three Case Studies: Moons of Saturn

We now introduce three example projects, laden with different types of technical debt and facing different kinds of tactical choices. We will use the context factors to describe these projects and the systems in development so you can quickly understand the environment, system characteristics, and whether they are similar to your own. We derived these examples from actual companies that we authors have interacted with, but we abstracted many characteristics and details for confidentiality reasons, and in some cases we combined characteristics from two similar organizations into a single example.

These examples feature three different companies, developing different kinds of software-intensive products in three different domains. We named the three projects after three moons of the planet Saturn. Their size variation represents the sizes of the three companies:

Atlas (diameter: 30 km)
Phoebe (diameter: 213 km)
Tethys (diameter: 1,062 km)

An easy way to differentiate the projects is to remember that the sizes of the moons grow in alphabetical order: Atlas is smaller than Phoebe, which is smaller than Tethys.

Table 3.1 summarizes the key differences among the three software products and the respective companies in terms of the eight main factors and two others, describing domain and process.

Table 3.1 Contrasting the three case studies

Factor	Atlas: Small startup	Phoebe: Agile shop	Tethys: Global giant
Domain	E-commerce	Healthcare IT	Transportation
Size	400 KSLOC*	2 MSLOC	4 MSLOC
Architecture	Data analytics, usability, evolvability, cloud, MEAN stack (MongoDB, Express.js, Angular.js, Node.js), big data	Security, privacy, scalability, service-oriented architecture (SOA), cloud, large databases	Safety (reliability, high availability, fault tolerance), performance, multiple designs, hardware dependent, real-time embedded
Business model	Market-driven pivots in service to online user base	Open-source software of the partner organizations for business growth	Main contractor for an external customer
Team distribution	Single collocated team, fluid organization	Core team and a few dispersed teams in a single country	Multiple teams (>10), strictly defined roles, globally dispersed
Rate of change	Days to weeks	Months	Years
Age of system	Starting, active development	5 years, modifications for new markets	Over 15 years, in maintenance
Criticality	No	Moderate	High
Governance	Minimal: internal	Moderate: external regulatory compliance	High: multiple external standards, regulatory compliance, certifications
Process	Ad hoc agile with DevOps, rush to customers, multiple betas	Agile using Scrum, involved product owner	Hybrid, iterative, formal documentation and quality assurance

Case Study 1: Atlas—The Small Startup

Atlas is a small startup company, barely three years old, whose original founders act as the senior management. Atlas has a single product in the e-commerce space.

The Atlas development team is collocated and has grown from 4 developers (the founders) to about 15 within two and a half years. They use an ad hoc agile process, neither formalized nor rigorously followed, but they do speak to each other daily, and all use a very well-defined tool set that allows them to quickly deploy new features to customers. They are very focused on their market and tactically “pivot,” a term used to denote a change in product direction that drives a corresponding change in the software product specification. There is no clear role specialization in the team, and everyone contributes to all aspects of development, including requirements, design, coding, and testing.

The Atlas design has no deliberate or explicit architecture. It has no formal documentation: The developers say that “the code is the doc.” Atlas uses an almost continuous delivery for its installed base, but for the wider audience using the open-source part of the system, it has a slower rhythm for releases of about three weeks. However, it has limited regression-testing capabilities. The codebase in Java and JavaScript, with some C, is now about 400,000 source lines of code (400 KSLOC).

The key business driver for Atlas is finding its niche and carving out its piece of the market. The development team added some features to the product in the open-source version to help Atlas attract new business for the full-blown product and develop a friendlier image. The company is in a domain with no external regulation or governance pressure.

As a result of constant pivoting, Atlas has accrued a moderate amount of technical debt, mostly under pressure to deliver the next prototype to the next key reference customer. The product suffers from scalability and evolvability issues, but the codebase has remained relatively clean. The development team has only limited regression-testing capability, and team members are wary of major refactorings.

The current level of technical debt in the codebase is becoming a source of tension between team members. Some developers are pushing to rebuild the product from scratch, which is a huge risk, as it would not allow any externally visible progress for six to eight weeks, and the senior management team is pushing back.

Case Study 2: Phoebe—Agile Shop with a Viable Product

The Phoebe team is developing an open-source software solution that supports health information exchange at the national level. The product has grown from meeting an initial small-scale need to attracting many organizations that would like to set up health information exchanges. The product has been in development and use for about six years, and it has been evolving with participation from both government and private-sector users as well as contributions from developers. Phoebe derives its revenue from selling services, not application or source code.

The core Phoebe development team is collocated, but a small number of developers in partner organizations also develop functionality and contribute to the backlog for their most pressing user stories. The core team size has fluctuated from 35 to 8, decreasing over the years. In addition, at times multiple subcontractor teams have developed different features of Phoebe. The core team has consistently used Scrum to manage iterations and followed agile software development practices.

The Phoebe design has evolved over the years to get ahead in a competitive domain dominated by critical quality concerns such as security and privacy. In addition, the development team must ensure that the product complies with a number of IT standards related to privacy and healthcare data. Phoebe is developed with a service-oriented software paradigm, and now the organization is investigating migrating some of its services to the cloud. To foster open contribution and enable new organizations to adopt the product, the development team has accumulated a substantial amount of online documentation on the architecture, design, open issues, and codebase as well as user documentation for deployment, installation, and use. These documents are open access and at times get out of sync due to different priorities of the core team.

The key business driver for the Phoebe product is to provide a reliable, safe, and efficient infrastructure for addressing the challenges of the growing health information exchange. There are many competitors from the private sector, but by embracing an open-source model, the product owner aims to increase contribution to development as well as product quality and use.

In a domain that is not only competitive but also watched by many eyes in the nation, Team Phoebe struggles to manage multiple stakeholders with diverse requirements, get ahead of changing technology, and sustain a viable product. As a result, technical debt accrues, in most cases intentionally. While Team Phoebe has been trying to repay that debt by prioritizing technical debt reduction in major releases, technology lock-in has become a major hindrance to meeting this goal. The development team keeps track of technical debt items, which are managed with other items of the backlog, tagged as “techdebt.” However, members of the core team do not have a consistent process for identifying and managing technical debt. For example, the team tried using some tools to look into code quality, but it did not sustain their use. Major refactoring releases have eliminated some of the existing technical debt or made it obsolete, but Phoebe has not communicated this broadly to its stakeholders, and it is not clear how the team determines which issues are most important.
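The practice of tagging technical debt items in a shared backlog can be sketched in a few lines. This is a hedged illustration only: the `BacklogItem` class, the helper `techdebt_items`, and every item title are hypothetical and do not come from the Phoebe project. Only the “techdebt” tag itself is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogItem:
    """Hypothetical backlog entry; real trackers carry many more fields."""
    title: str
    tags: tuple = ()


def techdebt_items(backlog):
    """Select the items tagged "techdebt", preserving backlog order."""
    return [item for item in backlog if "techdebt" in item.tags]


# Invented example items, for illustration only.
backlog = [
    BacklogItem("Add onboarding flow for a new partner", ("feature",)),
    BacklogItem("Replace locked-in messaging layer", ("techdebt", "architecture")),
    BacklogItem("Fix login timeout", ("bug",)),
    BacklogItem("Bring online architecture docs back in sync", ("techdebt", "docs")),
]

debt = techdebt_items(backlog)
share = len(debt) / len(backlog)
print([item.title for item in debt], share)
```

A tag makes debt items easy to list and count, but as the Phoebe example shows, it does not by itself give the team a consistent way to identify or prioritize them.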

Case Study 3: Tethys—The Global Giant

Tethys is a large, global, multi-business organization. The Tethys product is 15 years old. It is safety-critical embedded avionics software, developed as a product line. The product team needs to balance many concerns of an evolving legacy product-line system that has been in existence for over a decade: large customer-installed base, new markets to open, changes in underlying technology, and the like. There is constant pressure to stay on top of competitive innovation with increasing demand from customers to include features. As a result, Team Tethys must, on one hand, define a new rhythm of agility in a complicated context and, on the other hand, pay due diligence to tough quality attribute requirements such as safety criticality, reliability, and security.

The Tethys product is developed by multiple development teams, and at times there are more than 100 developers on task. Project management must coordinate across system engineers, quality assurance teams, and compliance teams, both internal and external to the organization. Team Tethys also works with contractors extensively, which introduces another level of complexity to development.

As is typical with such systems, Tethys evolves through major planned upgrade releases to meet business goals. The longevity of the product and the different families of products in the product line are sources of major revenue for the organization. As a result, the upgrades often prioritize new features over needed re-architecting. The complexity of the deployment makes it impossible to have more than one major release per year and some minor releases for emergency bug fixes.

Such a long history comes with a lot of technical debt, which includes both architectural issues and code quality concerns as a result of developer turnover and inconsistent subcontractor practices. While code quality issues are not ideal, they do not block day-to-day development. Tethys suffers the most technical debt due to its architecture. Needed re-architecting efforts have not occurred in a timely manner, technology has changed but the product has not, each contractor has introduced his or her own interpretation of the structure, and the list goes on. Everyone on the team, from the most junior developer to the most senior manager, is aware of this debt, although not everyone understands the gory details or the extent of it. Yet it is hard to motivate the team to allocate the time and funding to tackle the debt because no one knows how to gracefully reduce it while keeping the business rolling.

Case Study Comparison

Table 3.2 summarizes the technical debt issues the three projects are facing and how they are managed, if at all.

Table 3.2 Technical debt issues addressed by the three case studies

	Atlas: Small startup	Phoebe: Agile shop	Tethys: Global giant
Technical debt issues	Lack of scalability, lack of regression testing, using code as documentation	Locked-in architectural choices that have proved limiting	Mismatched assumptions between teams, high turnover, internal code quality, aging system lagging in technology
Technical debt awareness and management	Awareness of technical debt late in the timeline, conflicting priorities in addressing it	Identification of technical debt, regular focused debt reduction, incomplete consideration of all aspects	Technical debt as the elephant in the room

There is not one universal prescription for managing technical debt that would work for all three projects. The contextual factors color not only the specifics of each organization’s technical debt but also the way it can be managed.

Technical Debt in Context

The specific context factors and their interactions will help you understand your system and navigate the causes and consequences of its debt. The bottom line is that all organizations with long-lived software-intensive systems have to deal with technical debt within their context. We cannot emphasize enough the importance of understanding this as it is a critical first step in successfully managing technical debt.

A figure shows the systems that have technical debt.

As we progress through the book, we will look at how the three different organizations described here use various techniques to improve how they handle their technical debt.

Consider the decision made by countless designers and programmers in the 1970s to handle dates by storing a year value in a two-character string. Why would they have done that when a year is four digits? Memory at the time was at a premium, and every opportunity for memory conservation was important, especially for something as ubiquitous as the value of year. It worked. It was awesome…until 20-plus years later, when the glut of these systems taken together created a major problem with potentially disastrous consequences and global, vast technical debt. In the years leading up to 2000, what I just described was dubbed the “Y2K problem.” This is personal. I designed and coded some of those systems. Even worse, I programmed some in PL/I, in which it was possible to overlay different kinds of storage, and I did, on the year field! Why did I do this? It was a great opportunity to save storage, and the probability of the risk I took becoming problematic was minuscule. I just never imagined anyone would be using these systems 10 years later, let alone 20. I was little concerned that my systems had technical debt that had to be repaid before January 1, 2000. Thank goodness, it was.
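The mechanics of the two-character year can be shown with a small sketch. It is written in Python rather than the languages of the period, and the function names, the sample years, and the pivot value of 50 are all assumptions chosen for illustration. The `expand_year` helper shows date windowing, one commonly used style of remediation, not a reconstruction of any particular system's fix.

```python
def years_elapsed_two_digit(start_yy: str, end_yy: str) -> int:
    """Elapsed years computed directly from two-character year fields."""
    return int(end_yy) - int(start_yy)


def expand_year(yy: str, pivot: int = 50) -> int:
    """Windowing: treat values at or above the pivot as 19xx, the rest as 20xx.

    The pivot of 50 is an arbitrary assumption for this sketch.
    """
    value = int(yy)
    return 1900 + value if value >= pivot else 2000 + value


def years_elapsed_windowed(start_yy: str, end_yy: str) -> int:
    """Elapsed years after expanding both fields to four digits."""
    return expand_year(end_yy) - expand_year(start_yy)


# A record created in 1975, evaluated in 1999 and then in 2000.
print(years_elapsed_two_digit("75", "99"))  # 24
print(years_elapsed_two_digit("75", "00"))  # -75
print(years_elapsed_windowed("75", "00"))   # 25
```

The arithmetic is correct for every year the original designers expected to see. It fails only when the century changes, which is why the debt stayed invisible for so long.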

Here is another, somewhat more recent example. Beginning in 1994, a U.S. Army tactical command-and-control system, called Force XXI Battle Command Brigade and Below (FBCB2), was designed as a hardware/software prototype demonstration system for on-the-move operations (think tanks, Humvees, helicopters, forward operating bases) that would revolutionize situational awareness capabilities. For those of you without military background, situational awareness means knowing the answers to these basic questions: Where am I? Where are my buddies? Where is the enemy? What is the environment?

FBCB2 was to pioneer (among other innovations) the use of GPS receivers, a tactical Internet, and local computer displays with human interaction (Bergey et al. 2005). In doing so, the designers and developers had an opportunity to provide unprecedented, sophisticated capability to the warfighters (who were still relying on physical maps): They made software decisions that prioritized functionality—proving this new capability. That strategy was successful. FBCB2 was used by U.S. forces in the Balkans, Afghanistan, and Iraq. It was the U.S. Army’s most successful entrée into battlefield digitalization and, most importantly, it saved lives.

Not surprisingly, there were also risks in the architectural decisions that prioritized functionality. Modifiability, scalability, interoperability, and extensibility were poorly supported. As FBCB2 enjoyed widespread acceptance and accolades, the technical debt associated with the architectural decisions became problematic. Modifications, new configurations, and maintenance proved difficult and costly. The system needed to be re-architected, and it was. In my opinion, the opportunity to field a less robust system that saved lives and yet risked downstream evolution and sustainment problems was worth the risk and the technical debt. Again, thank goodness, the debt was repaid.

More recently, a colleague shared that his software development organization chose AngularJS, seeing a great opportunity to take advantage of a powerful front-end web application framework that was widely used, supported, and interoperable. There was a proprietary framework layered on top of AngularJS and hundreds of internal applications using this stack. AngularJS provided not only functionality but also standardization across the underlying applications. There was little risk as far as anyone could see…until Angular 2 (now called simply Angular) was released to replace AngularJS. Angular is considerably different from its predecessor in language (now TypeScript) and features. The result was considerable technical debt to migrate both the proprietary framework and associated applications to Angular. The changes to upgrade just the underlying proprietary framework were estimated to take one year, and until it was ready, the applications were to continue writing to the older AngularJS. Some of the applications chose a more expeditious route, going rogue and redeveloping to use Angular directly. The standardization across applications is now lost. Still, the opportunity afforded by AngularJS and Angular (at least in my opinion) is worth the risk. The coupling at the root of the technical debt might have been reduced from the outset, perhaps making it possible to preserve the standardization along with the functional advantage.

There are many other examples I could share. Although I have no scientific evidence to substantiate my view, I have been at both life and software development for more decades than I would like to claim. What I do claim (and I don’t think I am unique) is that it is wise to seize opportunities in life and in software development, mindful that there will always be risks and, in software, technical debt. This book is not about avoiding opportunities. Rather, it is about being cognizant of technical risks (as much as possible) and smartly managing the fallout should they become problems. All three of my examples could have benefited from these insights and approaches.

What Can You Do Today?

Identify the factors of context in your project that can create conditions for technical debt buildup. It is also important to use your knowledge of the context to gain insight into how specific practices for managing technical debt apply in your particular situation.

For Further Reading

The context of software development explained in this chapter is based on previously published work (Kruchten 2013). It is similar to the “agility at scale” model of Scott Ambler (2011).

The Atlas, Phoebe, and Tethys projects that we use as examples throughout this book are based on our experiences. There are other case study examples in the literature that may match your software context. Guo and colleagues (2016) describe a Brazilian software company that provides enterprise-level software development, consulting, and training services. They explain the impact of technical debt on a Java-based, database-driven web application for water vessel management. Ampatzoglou’s team (2016) explores technical debt in seven embedded software systems. Klotins’ team (2018) reports on how technical debt accumulates in a startup context using studies from 86 startups. And Sculley and colleagues (2015) reflect on their experiences developing industry-scale machine-learning systems and summarize the seven different categories of debt that they observe.
