In this chapter, we introduce three case studies that we’ll use throughout the book to illustrate the main concepts of technical debt and the strategies for managing it. All long-lived software-intensive systems have to deal with technical debt within their context. The interactions and specifics of the many factors of context help development organizations understand the systems and navigate the causes and consequences of the debt.
When asking questions about software development practices, how often have you heard the reply, “it depends”? This is not just a way to dismiss the question. There are no all-inclusive answers, universally applicable techniques, or standard recipes. The answer really does depend on a number of factors that describe the context of the system. Eight of these factors are shown in Figure 3.1.
Size: The size of the system is by far the greatest factor because it drives the size of the team, the number of teams, the need for communication and coordination between teams, the impact of change, and more. The number of person-months, the size of the code, and the development budget are all possible proxies for size. Size is often related to complexity. The larger the system, the more technical debt it can accumulate.
Architecture: Is there a de facto architecture in place at the start of the project? Most projects are not novel enough to require a lot of architectural effort. They follow commonly accepted patterns in their domains. Many key architectural decisions are made in the first few days of development, such as choices related to middleware, operating systems, and programming languages. These choices may be based on what the developers are familiar with and their gut feelings rather than a careful analysis of long-term system consequences. We will show in Chapter 6, “Technical Debt and Architecture,” that technical debt at the architectural level is difficult to identify and very costly to repay.
Business model: What is the money flow? How is the project funded? Are you developing an internal system, a commercial product, a bespoke system on contract for a customer, or a component of a large system involving many different parties? Is it free/libre open-source software (FLOSS)? Financial considerations are a key factor in incurring technical debt or deciding to remediate technical debt.
Team distribution: Team distribution is often linked to the size of a project. How many teams are involved and collocated? Distributed teams increase the need for explicit communication and coordination of decisions as well as stable interfaces between the software components that they are responsible for. Communication issues and organizational silos contribute to the accumulation of technical debt, especially at the architectural level.
Rate of change: Though agile methods are all for embracing change, not all systems experience a rapid pace of change in their environment. Many projects have very stable requirements definitions. How stable is your business environment, and how many risks and unknowns are you facing? The volatility of the requirements will increase the propensity of the team to incur technical debt.
Age of system: Technical debt has more opportunities to accrue on large and long-lived systems. These legacy systems carry hidden assumptions about their architecture, and evolving them can reveal technical debt. Constraints also accumulate in legacy systems and often become another source of technical debt. By contrast, a new system, with fewer constraints, can proceed without taking on much debt.
Criticality: How many people die or are hurt if the system fails? For safety-critical and mission-critical systems, documentation needs increase dramatically to satisfy external agencies that want to assure the safety of the public. More formal verification and validation techniques may be essential to ensure that a system behaves the way it should. Such systems often struggle with how to modernize hardware or software that can be a major source of debt—whether it is legacy hardware or some arcane software that implements a critical algorithm.
Governance: How are critical decisions made? How are projects steered? How do projects begin and end? Who decides what to do when things go wrong? How is success or failure defined? Who manages the software project managers? Tension or lack of communication between a project and the management structure may cause technical debt accumulation, as discussed in Chapter 10, “What Causes Technical Debt?”
Other factors can change the context of the software development process, but they have more indirect effects on it. They mostly shape the eight factors just described. Some of these other factors are domain, process maturity, corporate culture, degree of innovation, and economic imperatives.
These factors combine in many different ways to create the context in which development organizations must plan their approach to technical debt. An old and large company might have mostly large projects, a significant level of governance, proprietary code, a stable architecture, large globally distributed teams, and a medium rate of change. A small startup might have a small codebase, an unstable or still fluid architecture, low criticality, a high rate of change, and a collocated team.
We now introduce three example projects, laden with different types of technical debt and facing different kinds of tactical choices. We will use the context factors to describe these projects and the systems in development so you can quickly understand the environment, system characteristics, and whether they are similar to your own. We derived these examples from actual companies that we authors have interacted with, but we abstracted many characteristics and details for confidentiality reasons, and in some cases we combined characteristics from two similar organizations into a single example.
These examples feature three different companies, developing different kinds of software-intensive products in three different domains. We named the three projects after three moons of the planet Saturn. Their size variation represents the sizes of the three companies:
Atlas (diameter: 30 km)
Phoebe (diameter: 213 km)
Tethys (diameter: 1,062 km)
An easy way to differentiate the projects is to remember that the sizes of the moons grow in alphabetical order: Atlas is smaller than Phoebe, which is smaller than Tethys.
Table 3.1 summarizes the key differences among the three software products and the respective companies in terms of the eight main factors and two others, describing domain and process.
| Factor | Atlas: Small startup | Phoebe: Agile shop | Tethys: Global giant |
|---|---|---|---|
| Domain | E-commerce | Healthcare IT | Transportation |
| Size | 400 KSLOC* | 2 MSLOC | 4 MSLOC |
| Architecture | Data analytics, usability, evolvability, cloud, MEAN stack (MongoDB, Express.js, Angular.js, Node.js), big data | Security, privacy, scalability, service-oriented architecture (SOA), cloud, large databases | Safety (reliability, high availability, fault tolerance), performance, multiple designs, hardware dependent, real-time embedded |
| Business model | Market-driven pivots in service to online user base | Open-source software of the partner organizations for business growth | Main contractor for an external customer |
| Team distribution | Single collocated team, fluid organization | Core team and a few dispersed teams in a single country | Multiple teams (>10), strictly defined roles, globally dispersed |
| Rate of change | Days to weeks | Months | Years |
| Age of system | Starting, active development | 5 years, modifications for new markets | Over 15 years, in maintenance |
| Criticality | No | Moderate | High |
| Governance | Minimal: internal | Moderate: external regulatory compliance | High: multiple external standards, regulatory compliance, certifications |
| Process | Ad hoc agile with DevOps, rush to customers, multiple betas | Agile using Scrum, involved product owner | Hybrid, iterative, formal documentation and quality assurance |

*KSLOC = thousands of source lines of code; MSLOC = millions of source lines of code.
Atlas is a small startup company, barely three years old, whose original founders act as the senior management. Atlas has a single product in the e-commerce space.
The Atlas development team is collocated and has grown from 4 developers (the founders) to about 15 within two and a half years. They use an ad hoc agile process, neither formalized nor rigorously followed, but they do speak to each other daily, and all use a very well-defined tool set that allows them to quickly deploy new features to customers. They are very focused on their market and tactically “pivot,” a term used to denote a change in product direction that drives a corresponding change in the software product specification. There is no clear role specialization in the team, and everyone contributes to all aspects of development, including requirements, design, coding, and testing.
The Atlas design has no deliberate or explicit architecture. It has no formal documentation: The developers say that “the code is the doc.” Atlas delivers almost continuously to its installed base, but for the wider audience using the open-source part of the system, it releases on a slower rhythm of about every three weeks. However, it has limited regression-testing capabilities. The codebase, written in Java and JavaScript with some C, is now about 400,000 source lines of code (400 KSLOC).
The key business driver for Atlas is finding its niche and carving out its piece of the market. The development team added some features to the product in the open-source version to help Atlas attract new business for the full-blown product and develop a friendlier image. The company is in a domain with no external regulation or governance pressure.
As a result of constant pivoting, Atlas has accrued a moderate amount of technical debt, mostly under pressure to deliver the next prototype to the next key reference customer. The product suffers from scalability and evolvability issues, but the codebase has remained relatively clean. The development team has only limited regression-testing capability, and team members are wary of major refactorings.
The current level of technical debt in the codebase is becoming a source of tension among team members. Some developers are pushing to rebuild the product from scratch. That would be a huge risk, because it would allow no externally visible progress for six to eight weeks, and the senior management team is pushing back.
The Phoebe team is developing an open-source software solution that supports health information exchange at the national level. The product has grown from meeting an initial small-scale need to attracting many organizations that would like to set up health information exchanges. The product has been in development and use for about six years, and it has been evolving with participation from both government and private-sector users as well as contributions from developers. Phoebe derives its revenue from selling services, not from selling the application or its source code.
The core Phoebe development team is collocated, but a small number of developers in partner organizations also develop functionality and contribute to the backlog for their most pressing user stories. The core team has shrunk over the years, with its size fluctuating between 35 and 8 members. In addition, at times multiple subcontractor teams have developed different features of Phoebe. The core team has consistently used Scrum to manage iterations and followed agile software development practices.
The Phoebe design has evolved over the years to get ahead in a competitive domain dominated by critical quality concerns such as security and privacy. In addition, the development team must ensure that the product complies with a number of IT standards related to privacy and healthcare data. Phoebe is developed with a service-oriented software paradigm, and now the organization is investigating migrating some of its services to the cloud. To foster open contribution and enable new organizations to adopt the product, the development team has accumulated a substantial amount of online documentation on the architecture, design, open issues, and codebase as well as user documentation for deployment, installation, and use. These documents are open access and at times get out of sync due to different priorities of the core team.
The key business driver for the Phoebe product is to provide a reliable, safe, and efficient infrastructure for addressing the challenges of the growing health information exchange. There are many competitors from the private sector, but by embracing an open-source model, the product owner aims to increase contribution to development as well as product quality and use.
In a domain that is not only competitive but also watched by many eyes in the nation, Team Phoebe struggles to manage multiple stakeholders with diverse requirements, get ahead of changing technology, and sustain a viable product. As a result, technical debt accrues, in most cases intentionally. While Team Phoebe has been trying to repay that debt by prioritizing technical debt reduction in major releases, technology lock-in has become a major hindrance to meeting this goal. The development team keeps track of technical debt items, which are managed with other items of the backlog, tagged as “techdebt.” However, members of the core team do not have a consistent process for identifying and managing technical debt. For example, the team tried using some tools to look into code quality, but it did not sustain their use. Major refactoring releases have eliminated some of the existing technical debt or made it obsolete, but Phoebe has not communicated this broadly to its stakeholders, and it is not clear how the team determines which issues are most important.
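To make the idea of keeping debt items in the ordinary backlog concrete, here is a minimal sketch, assuming a backlog is simply a list of items that carry free-form tags. The `BacklogItem` class, the `techdebt_items` helper, and the sample items are all hypothetical illustrations, not Phoebe's actual tooling or data; only the `techdebt` tag comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class BacklogItem:
    """Hypothetical backlog entry; the fields are illustrative only."""
    key: str
    title: str
    tags: set[str] = field(default_factory=set)


def techdebt_items(backlog: list[BacklogItem]) -> list[BacklogItem]:
    """Return the items tagged 'techdebt', keeping their backlog order."""
    return [item for item in backlog if "techdebt" in item.tags]


# Hypothetical backlog mixing features, bugs, and technical debt items.
backlog = [
    BacklogItem("PH-101", "Add consent audit report", {"feature"}),
    BacklogItem("PH-102", "Replace locked-in message broker", {"techdebt", "architecture"}),
    BacklogItem("PH-103", "Fix date parsing in importer", {"bug"}),
    BacklogItem("PH-104", "Split oversized patient service", {"techdebt"}),
]

for item in techdebt_items(backlog):
    print(item.key, item.title)
```

Filtering by tag makes the debt items visible alongside the rest of the work, but, as the Phoebe example shows, a tag by itself says nothing about which items matter most.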
Tethys is a large, global, multi-business organization. The Tethys product is 15 years old. It is safety-critical embedded avionics software, developed as a product line. The product team needs to balance many concerns of an evolving legacy product-line system that has been in existence for over a decade: large customer-installed base, new markets to open, changes in underlying technology, and the like. There is constant pressure to stay on top of competitive innovation, with customers increasingly demanding new features. As a result, Team Tethys must, on one hand, define a new rhythm of agility in a complicated context and, on the other hand, pay due diligence to tough quality attribute requirements such as safety criticality, reliability, and security.
The Tethys product is developed by multiple development teams, and at times there are more than 100 developers on task. Project management must coordinate across system engineers, quality assurance teams, and compliance teams, both internal and external to the organization. Team Tethys also works with contractors extensively, which introduces another level of complexity to development.
As is typical with such systems, Tethys evolves through major planned upgrade releases to meet business goals. The longevity of the product and the different families of products in the product line are sources of major revenue for the organization. As a result, the upgrades often prioritize new features over needed re-architecting. The complexity of the deployment limits the team to one major release per year, plus a few minor releases for emergency bug fixes.
Such a long history comes with a lot of technical debt, including both architectural issues and code quality concerns that stem from developer turnover and inconsistent subcontractor practices. While the code quality issues are not ideal, they do not block day-to-day development. Most of the technical debt that Tethys suffers from lies in its architecture: Needed re-architecting efforts have not occurred in a timely manner, the technology has changed but the product has not, each contractor has introduced his or her own interpretation of the structure, and the list goes on. Everyone on the team, from the most junior developer to the most senior manager, is aware of this debt, although not everyone understands the gory details or the extent of it. Yet it is hard to motivate the team to allocate the time and funding to tackle the debt because no one knows how to gracefully reduce it while keeping the business rolling.
Table 3.2 summarizes the technical debt issues the three projects are facing and how they are managed, if at all.
| | Atlas: Small startup | Phoebe: Agile shop | Tethys: Global giant |
|---|---|---|---|
| Technical debt issues | Lack of scalability, lack of regression testing, using code as documentation | Locked-in architectural choices that have proved limiting | Mismatched assumptions between teams, high turnover, internal code quality, aging system lagging in technology |
| Technical debt awareness and management | Awareness of technical debt late in the timeline, conflicting priorities in addressing it | Identification of technical debt, regular focused debt reduction, incomplete consideration of all aspects | Technical debt as the elephant in the room |
There is not one universal prescription for managing technical debt that would work for all three projects. The contextual factors color not only the specifics of each organization’s technical debt but also the way it can be managed.
The specific context factors and their interactions will help you understand your system and navigate the causes and consequences of its debt. The bottom line is that all organizations with long-lived software-intensive systems have to deal with technical debt within their context. We cannot emphasize enough the importance of understanding this, as it is a critical first step in successfully managing technical debt.
As we progress through the book, we will look at how the three different organizations described here use various techniques to improve how they handle their technical debt.
Identify the factors of context in your project that can create conditions for technical debt buildup. It is also important to use your knowledge of the context to gain insight into how specific practices for managing technical debt apply in your particular situation.
The context of software development explained in this chapter is based on previously published work (Kruchten 2013). It is similar to the “agility at scale” model of Scott Ambler (2011).
The Atlas, Phoebe, and Tethys projects that we use as examples throughout this book are based on our experiences. There are other case study examples in the literature that may match your software context. Guo and colleagues (2016) describe a Brazilian software company that provides enterprise-level software development, consulting, and training services. They explain the impact of technical debt on a Java-based, database-driven web application for water vessel management. Ampatzoglou’s team (2016) explores technical debt in seven embedded software systems. Klotins’ team (2018) reports on how technical debt accumulates in a startup context using studies from 86 startups. And Sculley and colleagues (2015) reflect on their experiences developing industry-scale machine-learning systems and summarize the seven different categories of debt that they observe.