5. System Design Example

The previous three chapters presented the universal design principles for system design. However, most people learn best by example. Therefore, this chapter demonstrates the application of concepts from prior chapters with a comprehensive example: a case study. The case study describes the design of a new system called TradeMe, a replacement for a legacy system. The case study is derived directly from an actual system that IDesign designed for one of its customers, albeit with the specific business details scrubbed and obfuscated. The essence of the system remains unchanged, from the business case to the decomposition: I have not glossed over issues or tried to beautify the situation. As mentioned in Chapter 1, design should not be time-consuming. In this case, the design was completed in less than a week by a two-person design team consisting of a seasoned IDesign architect and an apprentice.

The goal of this case study is to show the thought process and the deductions used to produce the design. These are often difficult to learn on your own but are more easily understood by watching somebody else do it while reasoning about what is taking place. This chapter starts with an overview of the customer and the system, then presents the requirements in the form of several use cases. The identification of the areas of volatility and the resulting architecture follow the structure of The Method.

Caution

You should not use this example dogmatically as a template. Every system is different, having its own constraints and requiring its own design considerations and tradeoffs. As an architect, you add value when you devise the correct design for the system at hand. This requires practice and critical thinking. In this chapter, you should focus on the rationale for the design decisions, and use this example to start off your practice, as discussed in Chapter 2.

System Overview

TradeMe is a system for matching tradesmen to contractors and projects. Tradesmen may be plumbers, electricians, carpenters, welders, surveyors, painters, telephone network technicians, gardeners, and solar panel installers, among others. They all work independently and are self-employed. Each tradesman has a skill level, and some, such as electricians, are certified by regulators to do certain tasks. The payment rate for the tradesman depends on factors such as discipline (welders are paid more than carpenters), skill level, years of experience, project type, location, and even weather. Other factors affecting their work include regulatory compliance issues (such as minimum wage or employment taxes), risk premium (such as exterior work on skyscrapers or with high voltage), certification of tradesmen’s qualifications for certain kinds of tasks (such as welding girders or power grid tie-in), reporting requirements, and more.

The contractors are general contractors, and they need tradesmen on an ad hoc basis, from as little as a day to as long as a few weeks. Contractors often have a base crew of generalists whom they employ outside the system on a full-time basis, using TradeMe for the specialized work. On the same project, different tradesmen are needed for different periods of time (one for a day, another for a week) at different times. Tradesmen can come and go on a single project.

The TradeMe system allows tradesmen to sign up and list their skills, their general geographic area of availability, and the rate they expect. It also allows contractors to sign up and list their projects, the required trades and skills, the location of the projects, the rates they are willing to pay, the duration of engagement, and other attributes of the project. Contractors can even request (but not insist upon) specific tradesmen with whom they would like to work.

Other than the factors already mentioned, the rate the contractor is willing to pay depends on supply and demand. When a project is idle, the contractor will increase the price. When the tradesman is idle, the tradesman will lower the price. Similar consideration is given to the duration or requested commitment. The ideal project for a tradesman often pays a high rate and has a short duration. Once the tradesmen have committed to a project, they have to stay for the amount of time to which they committed. Contractors may offer more pay with longer commitments. In general, the system lets market forces set the rate and find equilibrium.

The projects are construction projects for buildings. The system may also be useful in newly emerging markets, such as oil fields or marine yards.

TradeMe allows the tradesmen and the contractors to find one another. The system processes the requests and dispatches the required tradesmen to the work sites. It also keeps track of hours and wages and handles the rest of the reporting to the authorities, saving both contractors and tradesmen the hassle of handling these tasks themselves.

The system isolates tradesmen from contractors. It collects funds from the contractors and pays the tradesmen. Contractors cannot bypass the system and hire the tradesmen directly because the tradesmen have exclusivity with the system.

The TradeMe system aims to find the best rate for the tradesmen and the most availability for the contractors. It makes money on the small spread between the ask rate and the bid rate. Another source of income is the membership fee that both tradesmen and contractors pay. The fee is collected annually, but that could change. Consequently, both tradesmen and contractors are called members in the system.

Presently, nine call centers handle the majority of the assignments. Each call center is specific to a particular locale, regulations, building codes, standards, and labor laws. The call centers are staffed with account representatives called reps. The reps today rely on experience to optimize the scheduling across all projects and available tradesmen. Some call centers operate as their own business, whereas others are operated by the same business.

There is also at least one competing application geared more toward finding the cheapest tradesmen, and some contractors prefer that system. Contractors opting for tradesmen based on price as opposed to availability could be a growing trend.

Legacy System

The legacy system, which is deployed in European call centers, has full-time users who rely on a two-tier desktop application connected to a database. Both tradesmen and contractors call in, with the reps entering the details and even performing the matching in real time. Some rudimentary web portals for managing membership bypass the legacy system and work with the database directly. The various subsystems are isolated and very inefficient, requiring a lot of human intervention at almost every step. Users are required to employ as many as five different applications to accomplish their tasks. These applications are independent, and the integration is done by hand. The client applications are chock-full of business logic, and the lack of separation between UI and business logic prevents updating the applications to a modern user experience.

Each subsystem even has its own repository, and the users have to reconcile them to make sense of it all. This process is error prone and imposes expensive training and onboarding time for new users.

The legacy system is vulnerable, and its haphazard approach to security exposes it to many possible attack vectors. The legacy system was never designed with security in mind. For that matter, it was never designed at all, but rather grew organically.

The legacy system simply cannot accommodate several new features and desirable capabilities:

  • Mobile device support

  • Higher degree of automation of the workflow

  • Some connectivity to other systems

  • Migration to the cloud

  • Fraud detection

  • Quality of work surveys, including incorporating the tradesman’s safety record in the rate and skill level

  • Entering new markets (such as deployment at marine yards)

Both the business and users are frustrated with the legacy system’s inability to keep up with the times, and there is a never-ending stream of desired value-added features. One such feature, continuing education, turned out to be a must-have, so it was cobbled on top of the legacy system. The legacy system assigns tradesmen to certification classes and government-required tests and tracks the tradesmen’s progress. Although external education centers provide training and register the certifications, the users have to manually connect them with the legacy system. While unrelated to the core system aspects, tradesmen are really keen on this feature, as is the business, because the certification feature helps prevent tradesmen from moving to the competitors.

The legacy system is having trouble complying with new legislation across locales. Dealing with any change is very difficult, and the system is highly specific to its current business context. Since the company cannot afford to support a unique version of the system per locale, this created an incentive to dumb down the system to the lowest common denominator across locales. This further increases the burden on the users in terms of their manual workflows, which decreases efficiency, increases training time and costs, and causes loss of business opportunities.

Overall, the system has some 220 reps across all locations. Neither scalability nor throughput poses a problem. However, responsiveness is an issue, although this may just be a side effect of the legacy system.

New System

Given the issues of the poorly designed legacy system, the company’s management is interested in designing a new system correctly. The new system should automate the work as much as possible. Ideally, the company would like to have a single, small call center that is used as a backup for an automated process. This call center would use a single system across all locales. While the system is deployed in Europe, there are requests to deploy the system in the United Kingdom¹ and even Canada (i.e., outside the European Union). Another driver for investing in the new system is that the competitors have much more flexible, efficient systems, with superior user experience.

1. While this design effort took place prior to Brexit (the departure of the United Kingdom from the EU), Brexit is a classic example of a massive change that was unanticipated at the time, yet the new system accommodated it seamlessly.

While contractors could staff a project using multiple sources of tradesmen, including competing products, integration with competing products and project optimizations in general are beyond the scope of the system: The company is not in the business of optimization or integration. Expanding the marketplace to include other trades such as IT or nursing is also out of scope. Adding these markets would redefine the nature of the business, and the company’s forte is matching tradesmen to construction projects, not general staffing.

The Company

The company views itself as a tradesmen broker, not as a software organization. Software is not its business. In the past the company did not acknowledge what it would really take to develop great software. The company did not devote adequate effort to process or development practices. The company’s attempts to build a replacement system in the past all failed. What the company does have is plenty of financial resources—the legacy application is very profitable. The bitter lessons of the past have convinced management to turn a new page and adopt a sound approach for software development.

Use Cases

There were no existing requirements documents for either the old system or the new system. The customer was able to provide Figures 5-1 to 5-8, depicting some use cases. These may or may not be core use cases; they are simply the required behaviors of the system. To a large extent, the use cases reflected what the legacy system was supposedly doing. Since the design team was looking for core use cases, they ignored low-level use cases such as entering financial details, collecting fees from contractors, and distributing payments to tradesmen. Some use cases, such as continuing education, were not even specified. Moreover, there was clearly room for additional use cases complementing the use cases provided by the company.

Figure 5-1 Add Tradesman or Contractor use case

Figure 5-2 Request Tradesman or Contractor use case

Figure 5-3 Match Tradesman use case

Figure 5-4 Assign Tradesman use case

Figure 5-5 Terminate Tradesman use case

Figure 5-6 Pay Tradesman use case

Figure 5-7 Create Project use case

Figure 5-8 Close Project use case

Core Use Case

Most of the company-provided use cases did not look like core use cases, but rather appeared to be just a list of simple functionalities. Recall that a core use case represents the essence of the business. The essence of the system is not to add a tradesman or contractor, to create a project, or to pay a tradesman. All of these tasks may be done in any number of ways; they add little business value and do not differentiate the system from the competition. Instead, the system’s raison d’être is given in the opening one-sentence definition: “TradeMe is a system for matching tradesmen to contractors and projects.” The only use case with any semblance to that point was the Match Tradesman use case (Figure 5-3).

Simplifying the Use Cases

Customers rarely present the requirements in a useful format, let alone in a way that is conducive to good design. You must always transform, clarify, and consolidate the raw data. Early in the design process you may even recognize areas of interaction that will later make mapping the areas to subsystems or layers much more natural. For example, with TradeMe, there were at least three types of roles across all the use cases: the users, the market, and the members. The users can be back-office data entry reps or system administrators. Perhaps only an administrator can terminate a tradesman, but that information was absent from Figure 5-5.

It is useful to show the flow of control between roles, organizations, and other responsible entities, using “swim lanes” in your activity diagrams. For example, Figure 5-9 provides an alternative way of expressing the Terminate Tradesman use case from Figure 5-5.

Figure 5-9 Subdividing the activity diagram with swim lanes

You transform the raw use case by subdividing the activity diagram into areas of interactions. This also helps clarify the required behavior of the system by adding decision boxes or synchronization bars as required. You will see how to use the swim lanes technique later on in this chapter to both initiate and validate the design.

The Anti-Design Effort

Chapter 2 mentioned an anti-design effort as an effective technique to sway people from functional decomposition by deliberately trying to design the worst possible system. While a good anti-design effort produces a valid design because it supports the use cases, it offers no encapsulation and demonstrates tight coupling. Such a design often feels natural to others (i.e., they would have produced something similar). Odds are that the anti-design will be some flavor of functional decomposition.

The Monolith

One simple anti-design example is a god service—an ugly dumping ground of all the functionalities in the requirements, all implemented in one place. While this design is so common it even has a name (the Monolith), by now most people have learned the hard way not to design this way.

Granular Building Blocks

Figure 5-10 shows another take on the anti-design: a massive set of building blocks. Literally every activity in the use cases has a corresponding component in the architecture. There is no encapsulation of either the database access or the database itself.

Figure 5-10 Services explosion anti-design

With so many fine-grained blocks, the Clients become responsible for implementing the business logic of the use cases, as shown in Figure 5-11. Contaminating the Client’s code with business logic results in a bloated Client into which the entire system migrates, as shown in Figure 2-1.

Figure 5-11 Polluted and inflated Client
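The following deliberately bad Python sketch makes the bloated-Client problem concrete. The service classes, the ContractorClient, and the “cheapest candidate wins” rule are all hypothetical illustrations, not part of TradeMe: the point is only that the Client, not the system, sequences the fine-grained services and owns the business rule of matching.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tradesman:
    name: str
    skill: str
    rate: float  # hypothetical hourly rate
    available: bool


# Hypothetical fine-grained services, one per activity in the use cases.
class TradesmanLookupService:
    def __init__(self, tradesmen: List[Tradesman]) -> None:
        self._tradesmen = tradesmen

    def find_by_skill(self, skill: str) -> List[Tradesman]:
        return [t for t in self._tradesmen if t.skill == skill]


class AvailabilityService:
    def is_available(self, tradesman: Tradesman) -> bool:
        return tradesman.available


class RateService:
    def within_budget(self, tradesman: Tradesman, max_rate: float) -> bool:
        return tradesman.rate <= max_rate


class ContractorClient:
    """Anti-design: the Client sequences the services and owns the business
    rules of the Match Tradesman use case."""

    def __init__(self, lookup, availability, rates) -> None:
        self.lookup = lookup
        self.availability = availability
        self.rates = rates

    def match_tradesman(self, skill: str, max_rate: float) -> Optional[Tradesman]:
        candidates = self.lookup.find_by_skill(skill)
        candidates = [t for t in candidates if self.availability.is_available(t)]
        candidates = [t for t in candidates if self.rates.within_budget(t, max_rate)]
        # Hypothetical business rule living in client code: pick the cheapest.
        # Changing it means changing every Client that copied this logic.
        return min(candidates, key=lambda t: t.rate, default=None)


if __name__ == "__main__":
    pool = [
        Tradesman("Ana", "welder", 60.0, True),
        Tradesman("Ben", "welder", 45.0, False),
        Tradesman("Cai", "welder", 50.0, True),
    ]
    client = ContractorClient(TradesmanLookupService(pool), AvailabilityService(), RateService())
    match = client.match_tradesman("welder", max_rate=55.0)
    print(match.name if match else "no match")  # Cai
```

Every Client that needs matching (a rep’s desktop, a mobile app, a web portal) would have to repeat this sequence, so any change to the matching rule ripples into all of them.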

Alternatively, you can have the services call each other, as shown in Figure 5-12. However, chaining the highly functional services together in this way creates coupling between them, as depicted in Figure 2-5. Note also in Figure 5-12 the open architecture issues of calling up and sideways.

Figure 5-12 Chaining the services anti-design

Domain Decomposition

Another classic anti-design is to decompose along the domain lines, as shown in Figure 5-13. Here the system is decomposed along the domain lines of Tradesman, Contractor, and Project.

Figure 5-13 Domain decomposition anti-design

Even with a relatively simple system such as TradeMe, there are nearly limitless additional possibilities for domain decomposition, such as Accounts, Administration, Analytics, Approval, Assignment, Certificates, Contracts, Currency, Disputes, Finance, Fulfillment, Legislation, Payroll, Reports, Requisition, Staffing, Subscription, and so on. Who is to say that Project is a better candidate for a domain than Accounts? And which criteria should be used to make that judgment?

Besides having the many drawbacks discussed in Chapter 2, domain decomposition makes it nearly impossible to validate the design by demonstrating support of the use cases. For example, a request for a tradesman will appear on both the Project and Tradesman domain services. Due to the duplication of functionalities across domain lines, it is ambiguous who is doing what and when.

Business Alignment

It is of the utmost importance to recognize that the architecture does not exist for its own sake. The architecture (and the system) must serve the business. Serving the business is the guiding light for any design effort. As such, you must ensure that the architecture is aligned with the vision that the business has for its future and with the business objectives. Moreover, you must have complete bidirectional traceability from the business objectives to the architecture. You must be able to easily point out how each objective is supported in some way by the architecture, and how each aspect of the architecture is derived from some objectives of the business. The alternatives are pointless designs and orphaned business needs.
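Bidirectional traceability can be checked mechanically once the mapping is written down. The Python sketch below, whose function name and sample data are hypothetical, treats the mapping as a set of (objective, component) pairs and reports both failure modes named above: orphaned objectives and components that serve no objective.

```python
from typing import Iterable, List, Set, Tuple


def check_traceability(
    objectives: Iterable[str],
    components: Iterable[str],
    supports: Set[Tuple[str, str]],
) -> Tuple[List[str], List[str]]:
    """Check both directions of objective <-> component traceability.

    `supports` holds (objective, component) pairs meaning "this component
    helps realize this objective". Returns (orphaned objectives,
    components that trace to no objective).
    """
    objective_set, component_set = set(objectives), set(components)
    for objective, component in supports:
        if objective not in objective_set or component not in component_set:
            raise ValueError(f"unknown pair: {(objective, component)!r}")
    traced_objectives = {o for o, _ in supports}
    traced_components = {c for _, c in supports}
    return (
        sorted(objective_set - traced_objectives),
        sorted(component_set - traced_components),
    )


if __name__ == "__main__":
    # Hypothetical, abbreviated matrix; not the TradeMe design team's actual one.
    objectives = ["Unify repositories", "Quick turnaround", "Streamline security"]
    components = ["Membership Manager", "Regulation Engine", "Legacy Report Exporter"]
    supports = {
        ("Unify repositories", "Membership Manager"),
        ("Quick turnaround", "Regulation Engine"),
    }
    orphaned, pointless = check_traceability(objectives, components, supports)
    print(orphaned)   # ['Streamline security']
    print(pointless)  # ['Legacy Report Exporter']
```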

As discussed in the previous chapters, the architect producing the design has to first recognize the areas of volatility and then encapsulate these areas in system components, operational concepts, and infrastructure. The integration of the components is what supports the required behaviors, and the way the integration takes place is what realizes the business objectives. For example, if a key objective is extensibility and flexibility, then integrating the components over a message bus is a good solution (more on this point later). Conversely, if the key objective is performance and simplicity, introducing a message bus contributes too much complexity.
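To see why a bus favors extensibility, consider the minimal in-process publish/subscribe sketch below. The MessageBus class and the topic name are hypothetical; a production system would use a durable broker, which brings exactly the extra complexity the paragraph warns about when simplicity is the goal.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class MessageBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver `message` to every subscriber of `topic`; return the count."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


if __name__ == "__main__":
    bus = MessageBus()
    payroll, audit = [], []
    bus.subscribe("tradesman.assigned", payroll.append)
    bus.publish("tradesman.assigned", {"tradesman": "T-17", "project": "P-3"})
    # Extending the system: a new subscriber, no change to the publisher.
    bus.subscribe("tradesman.assigned", audit.append)
    bus.publish("tradesman.assigned", {"tradesman": "T-18", "project": "P-3"})
    print(len(payroll), len(audit))  # 2 1
```

The publisher never changes when the audit subscriber is added; that decoupling is the extensibility the paragraph refers to.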

The rest of this chapter provides a detailed walkthrough of the steps that transform the business needs into a design for TradeMe. These steps start with capturing the system vision and the business objectives, which then drive the design decisions.

The Vision

Seldom will everyone in any environment share the same vision as to what the system should do. Some may have no vision at all. Others may have a different vision than the rest or a vision that serves only their narrow interests. Some may misinterpret the business goals. The company behind TradeMe was stymied by a myriad of additional issues resulting from its failure to keep up with the changing market. These issues were reflected in the existing systems, in the company’s structure, and in the way software development was set up. The new system had to tackle all the issues head-on rather than in a piecemeal fashion, because solving just some of them was insufficient for success.

The first order of business is to get all stakeholders to agree on a common vision. The vision must drive everything, from architecture to commitments. Everything that you do later has to serve that vision and be justified by it. Of course, this cuts both ways—which is why it is a good idea to start with the vision. If something does not serve the vision, then it often has to do with politics and other secondary or tertiary concerns. This provides you with an excellent way of repelling irrelevant demands that do not support the agreed-upon vision. In the case of TradeMe, the design team distilled the vision to a single sentence:

A platform for building applications to support the TradeMe marketplace.

A good vision is both terse and explicit. You should read it like a legal statement. Note that the vision for TradeMe was to build a platform on which to build the applications. This kind of platform mindset addressed the diversity and extensibility the business craved and may be applicable in systems you design.

The Business Objectives

After agreeing on the vision (and only then), you can itemize the vision to specific objectives. You should reject all objectives that do not serve the vision; you should include all objectives that are essential to support the vision. These two types are usually easy to pick out. When you list objectives, you should adopt a business perspective. You must not allow the engineering or marketing people to own the conversation, or to include technology objectives or specific requirements. The design team extracted the following objectives from the TradeMe system overview:

  1. Unify the repositories and applications. The legacy system had entirely too many inefficiencies, requiring a lot of human intervention to keep the system up to date and running.

  2. Quick turnaround for new requirements. The legacy turnaround time for features was abysmal. The new platform had to allow very fast, frequent customization, often tailored just for a specific skill, time of the week, project type, and any combination of these. Ideally, much of this quick turnaround should be automated, from coding to deployment.

  3. Support a high degree of customization across countries and markets. Localization was an incredible pain point because of differences in regulations, legislations, cultures, and languages.

  4. Support full business visibility and accountability. Fraud detection, audit trails, and monitoring were nonexistent in the legacy system.

  5. Forward looking on technology and regulations. Instead of being in perpetual reactive mode, the system must anticipate change. The company envisioned that this was how TradeMe would defeat the competitors.

  6. Integrate well with external systems. Although somewhat related to the previous objective, the objective here is to enable a high degree of automation over previously laborious manual processes.

  7. Streamline security. The system must be properly secured, and literally every component must be designed with security in mind. To meet the security objective, the development team must introduce security activities such as security audits into the software life cycle and support it in the architecture.

Mission Statement

It may come as a surprise, but articulating the vision (what the business will receive) and the objectives (why the business wants the vision) is often insufficient. People are usually too mired in the details and cannot connect the dots. Thus, you should also specify a mission statement (how you will do it). The TradeMe Mission Statement was:

Design and build a collection of software components that the development team can assemble into applications and features.

This mission statement deliberately does not identify developing features as the mission. The mission is not to build features—the mission is to build components. It now becomes much easier to justify volatility-based decomposition that serves the mission statement because all the dots are connected:

Vision → Objectives → Mission Statement → Architecture

In fact, you have just compelled the business to instruct you to design the right architecture. This is a reversal of the typical dynamics, in which the architect pleads with management to avoid functional decomposition. It is a lot easier to drive the correct architecture through the business by aligning the architecture with the business’s vision, its objectives, and the mission statement. Once you have them agree on the vision, the objectives, and then the mission statement, you have them on your side. If you want the business people to support your architecture effort, you must demonstrate how the architecture serves the business.

The Architecture

Misunderstanding and confusion are endemic to software development and often lead to conflict or unmet expectations. Marketing may use different terms than engineering for the same thing or, even worse, may use the same term but mean a different thing. Such ambiguities may go undetected for years. Before you dive into the act of system design, ensure everyone is on the same page by compiling a short glossary of domain terminology.

TradeMe Glossary

A good way of starting a glossary is to answer the four classic questions of “who,” “what,” “how,” and “where.” You answer the questions by examining the system overview, the use cases, and customer interview notes, if you have any. For TradeMe, the answers to the four questions were as follows:

  • Who

    –  Tradesmen

    –  Contractors

    –  TradeMe reps

    –  Education centers

    –  Background processes (e.g., scheduler for payment)

  • What

    –  Membership of tradesmen and contractors

    –  Marketplace of construction projects

    –  Certificates and training for continuing education

  • How

    –  Searching

    –  Complying with regulations

    –  Accessing resources

  • Where

    –  Local database

    –  Cloud

    –  Other systems

Recall from Chapter 3 that you often can map the answers to the four questions to layers, if not to components of the architecture itself.

The list of the “what” is of particular interest because it hints strongly at possible subsystems or the swim lanes mentioned previously. You can use the swim lanes and the answers to seed and initiate your decomposition effort as you look for areas of volatility. This does not preclude having additional subsystems or imply that these will necessarily be all the subsystems needed: you always decompose based on volatility, and if a “what” is not volatile, then it will not merit a component in the architecture. At this point all it provides is a nice starting point to reason about your design.
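As a sketch of how the answers can seed candidate layers, the Python below assumes the mapping from Chapter 3 (who → Clients, what → Managers, how → Engines and ResourceAccess, where → Resources) and uses abbreviated glossary answers. The is_volatile predicate is a hypothetical placeholder for the design team’s judgment; an answer that fails it gets no component.

```python
from typing import Callable, Dict, List

# Glossary answers from the TradeMe lists above, abbreviated.
GLOSSARY: Dict[str, List[str]] = {
    "who": ["Tradesmen", "Contractors", "TradeMe reps", "Education centers",
            "Background processes"],
    "what": ["Membership", "Marketplace", "Continuing education"],
    "how": ["Searching", "Complying with regulations", "Accessing resources"],
    "where": ["Local database", "Cloud", "Other systems"],
}

# Assumed mapping of the four questions to layers (after Chapter 3).
LAYER_FOR_QUESTION: Dict[str, str] = {
    "who": "Clients",
    "what": "Managers",
    "how": "Engines / ResourceAccess",
    "where": "Resources",
}


def seed_candidates(glossary: Dict[str, List[str]],
                    is_volatile: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Group glossary answers by layer, keeping only answers judged volatile."""
    candidates: Dict[str, List[str]] = {}
    for question, answers in glossary.items():
        kept = [answer for answer in answers if is_volatile(answer)]
        if kept:
            layer = LAYER_FOR_QUESTION[question]
            candidates.setdefault(layer, []).extend(kept)
    return candidates


if __name__ == "__main__":
    everything = seed_candidates(GLOSSARY, lambda answer: True)
    print(everything["Managers"])
    # ['Membership', 'Marketplace', 'Continuing education']
```

With every answer accepted, the Managers layer holds Membership, Marketplace, and Continuing education, which lines up with the Membership, Market, and Education Managers that the design team eventually identified.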

TradeMe Areas of Volatility

The essence of the decomposition is in identifying the areas of volatility as outlined in the previous chapters. The following list highlights a few of the candidate volatilities for TradeMe and the factors the design team considered:

  • Tradesman. Is this an area of volatility in the system? It is hard to claim that the architecture, even a purely functional one, would suffer to a large extent if you need to add attributes to the tradesman. In other words, tradesman is variable but not volatile. This is also true for any subset of attributes of the tradesman (e.g., skill sets). Maybe the tradesman is not volatile in isolation. Perhaps there exists a more generic volatility, such as membership management or regulations, that has affinity with the tradesman. It is important to discuss the volatility candidates this way and even challenge them. If you cannot clearly state what the volatility is, why it is volatile, and what risk the volatility poses in terms of likelihood and effect, then you need to look further. Identifying tradesman as an area of volatility signals decomposition along domain lines (see Figure 5-13).

  • Education certificates. Is the certification process volatile? If so, what exactly is the true volatility from the point of view of the business and the system? In this case, the volatility arises in the workflow of matching the regulations governing required certifications for projects with appropriately certified tradesmen. The certification itself is just an attribute of the tradesman. From the business’s perspective, certification management will forever be secondary to the core added value of being a tradesmen brokerage.

  • Projects. Is project volatility deserving of its own Manager? A Project Manager implies a project context. A Market Manager is better because some activities that the system needs to manage may not require the context of a running project to execute. For example, you can ask the market to propose a match without having a specific project in mind, or a match may involve multiple projects. Perhaps to maintain a valuable tradesman you wish to pay the tradesman a retainer, irrespective of any project. Identifying projects as a volatility manifests as domain decomposition. The core volatility is the marketplace, not the projects.
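The questions in the Tradesman bullet (what is volatile, why, and with what likelihood and effect) can be captured as a simple checklist. The Python sketch below is hypothetical throughout: the numeric 0-to-1 scores, the multiplication of likelihood by effect, and the 0.2 threshold are illustrative assumptions, not part of The Method or of the TradeMe design.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VolatilityCandidate:
    name: str
    what_changes: str             # what exactly is volatile
    why_volatile: str             # why it changes (locale, time, business, ...)
    likelihood: Optional[float]   # hypothetical 0..1 estimate
    effect: Optional[float]       # hypothetical 0..1 estimate of architectural impact


def vet(candidate: VolatilityCandidate, threshold: float = 0.2) -> Tuple[bool, str]:
    """Apply the three questions from the text to one candidate."""
    if not candidate.what_changes or not candidate.why_volatile:
        return False, "cannot state what is volatile and why: look further"
    if candidate.likelihood is None or candidate.effect is None:
        return False, "risk not assessed: look further"
    risk = candidate.likelihood * candidate.effect
    if risk < threshold:
        return False, "variable but not volatile"
    return True, f"volatile (risk {risk:.2f})"


if __name__ == "__main__":
    # Illustrative scores only; not the design team's assessment.
    tradesman = VolatilityCandidate(
        "Tradesman", "attributes such as skill sets", "new skills get added", 0.9, 0.1)
    market = VolatilityCandidate(
        "Marketplace", "how matches are proposed", "market forces and locales", 0.8, 0.8)
    print(vet(tradesman))  # (False, 'variable but not volatile')
    print(vet(market))     # (True, 'volatile (risk 0.64)')
```

A checklist like this does not replace the discussion the text calls for; it only forces each candidate to come with a stated reason before it earns a component.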

There is nothing wrong with suggesting certain areas of volatility, and then examining the resultant architecture. If the result produces a spiderweb of interactions or is asymmetric, then the design is unlikely to be good. You will probably sense whether the design is correct or not.

Sometimes an area of volatility may reside outside the system. For example, while payments may very well be a volatile area due to the various ways in which you could issue payments, TradeMe as a software project was not about implementing a payment system. The payments are ancillary to the core value of the system. The system will likely use a number of external payment systems as Resources. Resources may be whole systems, each with its own volatilities, but these are outside the scope of this system.

The design team produced the following list of areas volatile enough to affect the architecture. The list also identifies the corresponding components of the architecture that encapsulate the areas of volatility:

  • Client applications. The system should allow several distinct client environments to evolve separately at their own pace. The clients cater to different users (tradesmen, contractors, marketplace reps, or education centers) or to background processes, such as a timer that periodically interacts with the system. These client applications may use different UI technologies, devices, or APIs (perhaps the education portal is a mere API); they may be accessed locally or across the Internet (tradesmen versus reps); they may be connected or disconnected; and so on. As expected, the clients are associated with a lot of volatility. Each one of these volatile client environments is encapsulated in its own Client application.

  • Managing membership. There is volatility in the activities of adding or removing tradesmen and contractors, and even the benefits or discounts they get. Membership management changes across locales and over time. These volatilities are encapsulated in the Membership Manager.

  • Fees. All the possible ways TradeMe can make money, combining volume and spread, are encapsulated in the Market Manager.

  • Projects. Project requirements and size not only change but also are volatile and affect the required behavior. Small projects may require different workflows from large projects. The system encapsulates projects in the Market Manager.

  • Disputes. When dealing with people, at best misunderstandings will arise; at worst outright fraud happens. The volatility in handling dispute resolution is encapsulated by the Membership Manager.

  • Matching and approvals. Two volatilities come into play here. The volatility of how to find a tradesman that matches the project needs is encapsulated in the Search Engine. The volatility of search criteria and the definition thereof is encapsulated in the Market Manager.

  • Education. There is volatility in matching a training class to a tradesman and in searching for an available class or a required class. Managing the education workflow volatility is encapsulated in the Education Manager. Searching for classes and certifications is encapsulated in the Search Engine. Compliance with regulatory certification is encapsulated in the Regulation Engine.

  • Regulations. Regulations are likely to change in any given country as time goes by. In addition, the regulations can be internal to the company. This volatility is encapsulated in the Regulation Engine.

  • Reports. All the requirements of reporting and auditing with which the system needs to comply are encapsulated in the Regulation Engine.

  • Localization. Two distinct volatilities relate to localization. UI elements of the Clients encapsulate the volatility in language and culture. For TradeMe, the stakeholders considered this a good enough solution. In other cases, localization could be a strong enough volatility that it would merit its own subsystem (e.g., Manager, Resources). Localization may even affect the design of the Resources. The volatility in regulations between countries is captured in the Regulation Engine.

  • Resources. The Resources may be portals to external systems (such as payment) or store various elements such as lists of tradesmen and projects. The exact nature of the store is volatile, potentially ranging from a cloud-based database to a local store to a whole other system.

  • Resource access. ResourceAccess components encapsulate the volatility of accessing the Resources such as the location of the storage, its type, and access technology. The ResourceAccess components convert atomic business verbs such as “pay” (e.g., paying a tradesman) into accessing the relevant Resources such as storage and payment systems.

  • Deployment model. The deployment model is volatile. Sometimes data cannot leave a geographic area, or the company may wish to deploy parts of the system, or the whole system, in the cloud. These volatilities are encapsulated in the composition of the subsystems and the Message Bus Utility. The advantages of this modular, composable interaction pattern are described later, in the discussion of the system’s operational concepts.

  • Authentication and authorization. The system can authenticate the Clients in a number of ways, whether they are users or even other systems, and there are multiple options for representing credentials and identities. Authorization is nearly open-ended, with many ways of storing roles or representing claims. These volatilities are encapsulated in the Security Utility component.
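To make the ResourceAccess bullet concrete, here is a minimal sketch of a ResourceAccess component exposing the atomic business verb “pay.” The names `PaymentAccess`, `PaymentsStore`, and `ExternalPaymentSystem` are hypothetical, and both Resources are in-memory stand-ins; a real implementation would talk to an actual store and payment system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PaymentsStore:
    """Hypothetical stand-in for the Payments store Resource."""
    records: List[Tuple[str, float, str]] = field(default_factory=list)

    def insert(self, tradesman_id: str, amount: float, reference: str) -> None:
        self.records.append((tradesman_id, amount, reference))


class ExternalPaymentSystem:
    """Hypothetical stand-in for the external payment system Resource."""

    def __init__(self) -> None:
        self._next_id = 1

    def transfer(self, account: str, amount: float) -> str:
        # A real system would move money over the network; this one only issues a reference.
        reference = f"TX-{self._next_id}"
        self._next_id += 1
        return reference


class PaymentAccess:
    """ResourceAccess exposing the atomic business verb 'pay'."""

    def __init__(self, store: PaymentsStore, gateway: ExternalPaymentSystem) -> None:
        self._store = store
        self._gateway = gateway

    def pay(self, tradesman_id: str, amount: float) -> str:
        if amount <= 0:
            raise ValueError("amount must be positive")
        # Callers see one verb; which Resources are touched, and how, stays hidden here.
        reference = self._gateway.transfer(account=tradesman_id, amount=amount)
        self._store.insert(tradesman_id, amount, reference)
        return reference


store = PaymentsStore()
access = PaymentAccess(store, ExternalPaymentSystem())
receipt = access.pay("tradesman-42", 350.0)
```

Swapping the storage technology or the payment provider changes only `PaymentAccess` and the Resources behind it; the Managers and Engines that call `pay` are unaffected.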

Note that the mapping of areas of volatility to components of the architecture is not 1:1. For example, the preceding list maps three areas of volatility to the Market Manager. Recall from Chapter 3 that a Manager encapsulates the volatility of a family of logically related use cases, not just a single use case. In the case of the Market Manager, these market use cases are managing projects, matching tradesmen to projects, and charging the fees for the match.

Weak Volatiles

Two additional, weaker areas of volatility are not reflected in the architecture:

  • Notification. How the Clients communicate with the system and how the system communicates with the outside world could be volatile. The use of the Message Bus Utility encapsulates that volatility. If the company had a strong need for open-ended forms of transport such as email or fax, then perhaps a Notification Manager would have been necessary.

  • Analysis. TradeMe could analyze the requirements of projects and verify the requested tradesmen or even propose them in the first place. In this way, TradeMe could optimize the tradesmen assignment to projects. The system could analyze projects in various ways, with such analysis clearly being a volatile area. However, the design team rejected analysis as an area of volatility in the design because, as stated, the company is not in the business of optimizing projects. Providing optimizations, therefore, falls into speculative design. Any analysis activity required is folded into the Market Manager.

Static Architecture

Figure 5-14 shows the static view of the architecture.

Figure 5-14 Static view of the TradeMe architecture

The Clients

The client tier contains a portal for each type of member, the tradesmen and the contractors. There is also a portal for the education center to issue or validate tradesman credentials and a marketplace application for the back-end users to administer the marketplace. In addition, the client tier contains external processes such as a scheduler or a timer that periodically initiates some behavior with the system. These are included in the architecture for reference, but are not part of the system.

Business Logic Services

In the business logic tier are the Membership Manager and Market Manager, encapsulating the respective volatilities discussed previously. In short, the Membership Manager manages the volatility in the execution of the membership use cases, while the Market Manager is in charge of the use cases pertaining to the marketplace. Note that the use cases of membership (such as adding or removing a tradesman) are both logically related to each other and distinct from those related to the marketplace such as matching a tradesman to a project. The Education Manager encapsulates the volatility in the execution of use cases related to continuing education such as coordinating training and reviewing the education certificates.

There are only two Engines, which encapsulate some of the acute volatilities listed previously. The Regulation Engine encapsulates the regulation and compliance volatility between different countries and even in the same country over time. The Search Engine encapsulates the volatility in producing a match, something that can be done in an open-ended number of ways, ranging from a simple rate lookup, to safety and quality record considerations, to AI and machine learning techniques for the assignments.
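One way to picture how the Search Engine encapsulates the open-ended ways of producing a match is the strategy pattern, sketched below under stated assumptions. The `Tradesman` and `Request` fields, the eligibility filter, and the two strategies (`by_rate`, `by_safety_then_rate`) are hypothetical; they only echo the “simple rate lookup” and “safety and quality record” examples above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Tradesman:
    name: str
    trade: str
    hourly_rate: float
    safety_incidents: int


@dataclass(frozen=True)
class Request:
    trade: str
    max_rate: float


# A matching strategy ranks the eligible candidates for a request.
Strategy = Callable[[Request, List[Tradesman]], List[Tradesman]]


def by_rate(request: Request, candidates: List[Tradesman]) -> List[Tradesman]:
    """Simple rate lookup: cheapest first."""
    return sorted(candidates, key=lambda t: t.hourly_rate)


def by_safety_then_rate(request: Request, candidates: List[Tradesman]) -> List[Tradesman]:
    """Safety record first, with rate as the tiebreaker."""
    return sorted(candidates, key=lambda t: (t.safety_incidents, t.hourly_rate))


class SearchEngine:
    def __init__(self, strategy: Strategy) -> None:
        self._strategy = strategy

    def match(self, request: Request, pool: Iterable[Tradesman]) -> List[Tradesman]:
        # Eligibility filtering is shared; only the ranking strategy varies.
        eligible = [t for t in pool
                    if t.trade == request.trade and t.hourly_rate <= request.max_rate]
        return self._strategy(request, eligible)


pool = [
    Tradesman("Ann", "welder", 60.0, 2),
    Tradesman("Bob", "welder", 55.0, 3),
    Tradesman("Cy", "welder", 80.0, 0),
    Tradesman("Di", "carpenter", 40.0, 0),
]
request = Request("welder", 70.0)
cheapest = [t.name for t in SearchEngine(by_rate).match(request, pool)]
safest = [t.name for t in SearchEngine(by_safety_then_rate).match(request, pool)]
```

Here `cheapest` is `["Bob", "Ann"]` and `safest` is `["Ann", "Bob"]`. A machine-learning ranker would be one more strategy, leaving the Market Manager that calls `match` unchanged.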

ResourceAccess and Resources

The entities required when managing a marketplace, such as payments, members, and projects, all have some storage and corresponding ResourceAccess components. There is also a workflow store, as discussed later.

Utilities

The system requires three Utilities: Security, Message Bus, and Logging. Any future Utilities (e.g., instrumentation) would go in the Utility bar as well.

Message Bus

A message bus is merely a queued Pub/Sub (Figure 5-15). Any message posted to the bus is broadcast to any number of subscribing parties. As such, a message bus provides for general-purpose, queued, N:M communication, where N and M can be any non-negative integers. If the message bus is down, or if the posting party becomes disconnected, messages are queued in front of the message bus, and then processed when connectivity is restored. This provides for availability and robustness. If the subscribing party is down or disconnected (such as a mobile device), the messages are posted to a private queue per subscriber, and are processed when the subscriber becomes available. If both the posting publisher and the subscriber are connected and available, then the messages are asynchronous.

Figure 5-15 The message bus
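The queuing behavior just described can be illustrated with a toy, in-process queued Pub/Sub. This is only a sketch of the semantics (an outbox queued in front of the bus, plus a private queue per subscriber), not a substitute for a real message bus product; the class, method, topic, and subscriber names are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Message = Dict[str, object]
Handler = Callable[[Message], None]


class MessageBus:
    """Toy in-process queued Pub/Sub; illustrates the queuing behavior only."""

    def __init__(self) -> None:
        self._online = True
        self._outbox: Deque[Tuple[str, Message]] = deque()  # queued in front of the bus
        self._handlers: Dict[str, Handler] = {}
        self._topics: Dict[str, List[str]] = {}
        self._private: Dict[str, Deque[Message]] = {}  # one private queue per subscriber
        self._connected: Dict[str, bool] = {}

    def subscribe(self, name: str, topic: str, handler: Handler) -> None:
        self._handlers[name] = handler
        self._topics.setdefault(topic, []).append(name)
        self._private.setdefault(name, deque())
        self._connected[name] = True

    def set_online(self, online: bool) -> None:
        self._online = online
        self._flush()

    def set_connected(self, name: str, connected: bool) -> None:
        self._connected[name] = connected
        self._drain(name)

    def post(self, topic: str, message: Message) -> None:
        self._outbox.append((topic, message))
        self._flush()

    def _flush(self) -> None:
        # While the bus is down, posted messages simply wait in the outbox.
        while self._online and self._outbox:
            topic, message = self._outbox.popleft()
            for name in self._topics.get(topic, []):
                self._private[name].append(dict(message))  # each subscriber gets a copy
                self._drain(name)

    def _drain(self, name: str) -> None:
        # Deliver only while the subscriber is connected; otherwise messages stay queued.
        queue = self._private.get(name)
        while queue and self._connected.get(name):
            self._handlers[name](queue.popleft())


market_inbox: List[Message] = []
portal_inbox: List[Message] = []
bus = MessageBus()
bus.subscribe("MarketManager", "tradesman.assigned", market_inbox.append)
bus.subscribe("TradesmanPortal", "tradesman.assigned", portal_inbox.append)

bus.set_connected("TradesmanPortal", False)  # e.g., a mobile device goes offline
bus.post("tradesman.assigned", {"project": "P1", "tradesman": "T7"})
bus.set_connected("TradesmanPortal", True)   # its queued copy is delivered now
```

One post reaches both subscribers (N:M broadcast), and the disconnected portal receives its copy only once it reconnects.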

The choice of technology for the message bus has little to do with architecture and, therefore, is outside the scope of this book. However, specific features provided by a particular message bus may greatly affect ease of implementation, so choosing the right one requires careful consideration. Not all message buses are created equal, including those from brand-name vendors. At the very least, the message bus must support queuing, duplicating messages and multicast broadcasting, headers and context propagation, securing both posting and retrieving of messages, off-line work, disconnected work, delivery failure handling, processing failure handling, poison message handling, transactional processing, high throughput, a service-layer API, multiple-protocol support (especially non-HTTP-based protocols), and reliable messaging. Optional features that may be relevant include message filtering, message inspection, custom interception, instrumentation, diagnostics, automated deployment, easy integration with credentials stores, and remote configuration. No single message bus product will provide all these features. To mitigate the risk of choosing poorly, you should start with a plain, easy-to-use, free message bus, and implement the architecture initially with that message bus. This tactic allows you to better understand the desired qualities and attributes, and to prioritize them. Only then can you choose the best of breed that truly meets your needs.

Adding a message bus to your architecture does not eliminate the need to impose architectural constraints on the communications patterns. For example, you should disallow Client-to-Client communication across the bus.
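A constraint such as “no Client-to-Client communication” can be enforced mechanically when routes are registered. The sketch below assumes a hypothetical registry (`TIERS`) that maps each bus participant to its tier, and a hypothetical `check_route` hook that a bus wrapper would call before wiring a publisher to a subscriber.

```python
from enum import Enum
from typing import Dict


class Tier(Enum):
    CLIENT = "client"
    MANAGER = "manager"


# Hypothetical registry mapping each bus participant to its architectural tier.
TIERS: Dict[str, Tier] = {
    "TradesmanPortal": Tier.CLIENT,
    "ContractorsPortal": Tier.CLIENT,
    "MarketplaceApp": Tier.CLIENT,
    "Scheduler": Tier.CLIENT,
    "MembershipManager": Tier.MANAGER,
    "MarketManager": Tier.MANAGER,
    "EducationManager": Tier.MANAGER,
}


def check_route(publisher: str, subscriber: str) -> None:
    """Raise if a route would let one Client receive another Client's posts."""
    if TIERS[publisher] is Tier.CLIENT and TIERS[subscriber] is Tier.CLIENT:
        raise PermissionError(
            f"Client-to-Client traffic is not allowed: {publisher} -> {subscriber}")
```

Client-to-Manager, Manager-to-Manager, and Manager-to-Client routes pass; only routes between two Clients are rejected.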

Operational Concepts

With TradeMe, all communication between all Clients and all Managers takes place over the Message Bus Utility. Figure 5-16 illustrates this operational concept.

Figure 5-16 The abstract system interaction pattern

In this interaction pattern, the Clients and the business logic in the subsystems are decoupled from each other by the Message Bus. Use of the Message Bus in general supports the following operational concepts:

  • All communication utilizes a common medium (the Message Bus). This encapsulates the nature of the messages, the location of the parties, and the communication protocol.

  • No use case initiator (such as Clients) and use case executioner (such as Managers) ever interact directly. If they are unaware of each other, they can evolve separately, which fosters extensibility.

  • A multiplicity of concurrent Clients can interact in the same use case, with each performing its part of the use case. There is no lock-step execution across Clients and system. This, in turn, leads to timeline separation and decoupling of the components along the timeline.

  • High throughput is possible because the queues underneath the Message Bus can accept a very large number of messages per second.

The Message Is the Application

The operational concepts that a message bus supports are certainly nice to have, but by themselves may not justify the increased complexity. The main reason for choosing a message bus is that it supports the most important operational concept of TradeMe: the Message Is the Application design pattern.

When using this design pattern, the “application” is nowhere to be found. There is no collection of components or services that you can point to and identify as the application. Instead, the system comprises a loose collection of services that post messages to and receive messages from one another (over a message bus, although that is a secondary consideration). These messages are related to each other. Each service processing a message does some unit of work, and then posts a message back to the bus. Other services will subsequently examine the message, and some of them (or one of them, or none of them) will decide to do something. In effect, the message posted by one service triggers another service to do something unbeknownst to the posting service. This stretches decoupling almost to the limit.

Often the same logical message may traverse all the services. Likely the services will add additional contextual information to the message (such as in the headers), modify previous context, pass context from the old message to a new one, and so on. In this way, the services act as transformation functions on the messages. The paramount aspect of the Message Is the Application pattern is that the required behavior of the application is the aggregate of those transformations plus the local work done by the individual services. Any required behavior changes induce changes in the way your services respond to the messages, rather than the architecture or the services.
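A small sketch can show what “the required behavior is the aggregate of the transformations” means. Each hypothetical service below (`membership`, `market`, `billing`) inspects a message, carries the existing context forward, and may post a new message; the `run` loop stands in for the bus by broadcasting every posted message to every service. The message types and fields are assumptions for illustration.

```python
from collections import deque
from typing import Callable, Dict, List, Optional

Message = Dict[str, object]
# A service examines a message, does its local work, and may post a new message.
Service = Callable[[Message], Optional[Message]]


def run(services: List[Service], first: Message, max_messages: int = 100) -> List[Message]:
    """Broadcast every posted message to every service and return the trail of messages."""
    trail = [first]
    pending = deque([first])
    while pending:
        message = pending.popleft()
        for service in services:
            reply = service(message)
            if reply is not None:
                trail.append(reply)
                pending.append(reply)
                if len(trail) > max_messages:
                    raise RuntimeError("runaway message chain")
    return trail


def membership(message: Message) -> Optional[Message]:
    if message.get("type") == "assignment.requested":
        # Carry the existing context forward and add this service's own.
        return {**message, "type": "tradesman.assigned", "member_verified": True}
    return None


def market(message: Message) -> Optional[Message]:
    if message.get("type") == "tradesman.assigned":
        return {**message, "type": "project.updated"}
    return None


def billing(message: Message) -> Optional[Message]:
    # A later addition: neither membership nor market had to change.
    if message.get("type") == "project.updated":
        return {**message, "type": "contractor.billed"}
    return None


request = {"type": "assignment.requested", "project": "P1", "tradesman": "T7"}
before = [m["type"] for m in run([membership, market], request)]
after = [m["type"] for m in run([membership, market, billing], request)]
```

Adding `billing` extends the behavior by one more transformation; no existing service is modified, and no single function is “the application.”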

The business objectives for TradeMe justified the use of this pattern because of the required extensibility. The company can extend the system by adding message processing services, thereby avoiding modification of existing services and risk to a working implementation. This correctly supports the directive from Chapter 3 that you should always build systems incrementally, not iteratively. The objective of forward-looking design is also well served here because nothing in this pattern ties the system to the present requirements. This pattern is also an elegant way of integrating external systems—yet another business objective.

Future-Looking Design

The use of granular services integrated over a message bus with the Message Is the Application design pattern is one of the best ways of preparing the system for the future. By “preparing the system for the future,” I specifically refer to the next epoch in software engineering, the use of the actor model. Over the next decade the software industry will likely adopt a very granular use of services called actors. While actors are services, they are very simple services. Actors reside in a graph or grid of actors, and they only interact with each other using messages. The resulting network of actors can perform calculations or store data. The program is not the aggregate of the code of the actors; instead, the program or the required behavior consists of the progression of messages through the network. To change the program, you change the network of actors, not the actors themselves.

Building systems this way offers fundamental benefits such as better affinity to real-life business models, high concurrency without locking, and the ability to build systems that are presently out of reach, such as smart power grids, command and control systems, and generic AI. Using current-day technology and platforms along with the Message Is the Application pattern is very much aligned with the actor model, if not most of the way there. For example, in TradeMe, tradesmen and contractors are actors. Projects are networks of these actors, and other actors (such as the Market Manager) compose the network. Adopting the TradeMe architecture today prepared the company for the future without compromising on the present.a

a For more on the actor model, see Juval Lowy, Actors: The Past and Future of Software Engineering (YouTube/IDesignIncTV, 2017).

As with everything in life, implementing this pattern comes with a cost. Not every organization can justify using the pattern or even having a message bus. The cost will almost always take the form of additional system complexity and moving parts, new APIs to learn, deployment and security issues, intricate failure scenarios, and more. The upside is an inherently decoupled system geared toward requirements churn, extensibility, and reuse. In general, you should use this pattern when you can invest in a platform and have the backing of your organization both top-down and bottom-up. In many cases, a simpler design in which the Clients just queue up calls to the Managers would be a better fit for the development team. Always calibrate the architecture to the capability and maturity of the developers and management. After all, it is a lot easier to morph the architecture than it is to bend the organization. Once the organizational capabilities have matured, you can incorporate a full Message Is the Application pattern.

Workflow Manager

With The Method, the Managers encapsulate the volatility in the business workflows. Nothing prevents you from simply coding the workflows in the Managers and then, when the workflows change, changing the code in the Managers. The problem with this approach is that the volatility in the workflows may exceed the developers’ ability, as measured in time and effort, to catch up using just code.

The next operational concept in TradeMe is the use of workflow Managers. I hinted at this concept in Chapter 2 in the discussion of the stock trading system, but this chapter codifies it as another operational pattern. All Managers in TradeMe are workflow Managers. A workflow Manager is a service that enables you to create, store, retrieve, and execute workflows. In theory, it is just another Manager. In practice, however, such Managers almost always utilize some sort of third-party workflow execution tool and workflow storage. For each Client call, the workflow Manager loads not just the correct workflow type but also a specific instance of it, with a particular state and context; executes the workflow; and persists it back to the workflow store. Loading and saving the workflow instance each time supports long-running workflows. The Manager also does not have to maintain any kind of a session with the Client while remaining state-aware. Each call from the same user in the same workflow execution can come from a different device on a different connection and carries with it the unique ID of the instance of the workflow that the Manager should load and execute, as well as information about the client such as its address (e.g., URI).
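The load-execute-persist cycle of a workflow Manager can be sketched without any third-party workflow tool. In this simplified, hypothetical version, a workflow type is an ordered list of Python steps, each call executes exactly one step, and `WorkflowStore` is an in-memory stand-in for durable workflow storage; a real tool would run the workflow until it next has to wait for input.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Request = Dict[str, Any]
# One step of a workflow type: it reads the request and updates the instance context.
Step = Callable[[Context, Request], None]


@dataclass
class WorkflowInstance:
    instance_id: str
    workflow_type: str
    step: int = 0
    context: Context = field(default_factory=dict)


class WorkflowStore:
    """Hypothetical stand-in for durable workflow storage."""

    def __init__(self) -> None:
        self._instances: Dict[str, WorkflowInstance] = {}

    def load(self, instance_id: str) -> WorkflowInstance:
        return self._instances[instance_id]

    def save(self, instance: WorkflowInstance) -> None:
        self._instances[instance.instance_id] = instance


class WorkflowManager:
    def __init__(self, definitions: Dict[str, List[Step]], store: WorkflowStore) -> None:
        self._definitions = definitions
        self._store = store

    def handle(self, request: Request) -> Request:
        if "instance_id" not in request:
            # No instance ID: kick off a new instance of the requested workflow type.
            instance = WorkflowInstance(str(uuid.uuid4()), request["workflow"])
        else:
            # Rehydrate the existing instance, whichever device or connection the call came from.
            instance = self._store.load(request["instance_id"])
        steps = self._definitions[instance.workflow_type]
        if instance.step >= len(steps):
            raise ValueError("workflow instance already completed")
        steps[instance.step](instance.context, request)
        instance.step += 1
        self._store.save(instance)  # persisting state removes the need for a Client session
        return {
            "instance_id": instance.instance_id,
            "done": instance.step == len(steps),
            "reply_to": request.get("reply_to"),  # e.g., the Client's URI
        }


def collect_application(context: Context, request: Request) -> None:
    context["name"] = request["name"]


def record_approval(context: Context, request: Request) -> None:
    context["approved"] = request["approved"]


store = WorkflowStore()
manager = WorkflowManager({"add_member": [collect_application, record_approval]}, store)
first = manager.handle({"workflow": "add_member", "name": "Ann", "reply_to": "phone://ann"})
second = manager.handle({"instance_id": first["instance_id"], "approved": True,
                         "reply_to": "web://ann"})
```

The second call arrives from a different (hypothetical) Client address, yet it continues the same instance because it carries the instance ID.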

To add or change a feature, you simply add or change the workflows of the Managers involved, but not necessarily the implementation of the individual participating services. This is a clean way to provide features as aspects of integration (as discussed in Chapter 4) and is a tangible aspect of the mission statement for the system, allowing you to illustrate how the architecture supports the business.

The real necessity for using a workflow Manager arises when the system must handle high volatility. With a workflow Manager, you merely edit the required behavior and deploy the newly generated code. The nature of this editing is specific to the workflow tool you choose. For example, some tools use script editors, whereas others use visual workflows that look like activity diagrams and generate or even deploy the workflow code.

You can even (with the right safeguards) have the product owners or the end users edit the required behavior. This drastically reduces the cycle time for delivering features and allows the software development team to focus on the core services as opposed to chasing changes in the requirements.

The business needs for TradeMe justified the use of this pattern because the objective of a quick turnaround for features is impossible to meet using hand-crafted coding by a small, thinly spread team. The use of workflow Managers enables a high degree of customization across markets, satisfying another objective for the system.

Again, evaluate carefully whether this concept is applicable to your particular case. Make sure the level of workflow volatility justifies the additional complexity, learning curves, and changes to the development process.

Choosing a Workflow Tool

The choice of technology for the workflow tool has little to do with architecture, so it is outside the scope of this book. However, if the architecture calls for it, you had better choose the right workflow tool. Literally dozens of workflow solutions exist, with the various tools offering a very wide set of features. At the very least, the workflow tool should support visual editing of workflows, persisting and rehydrating workflow instances, calling services from within the workflow across multiple protocols, posting messages to the message bus, exposing workflows as services across multiple protocols, nesting workflows, creating libraries of workflows, defining common templates of recurring workflow patterns that can be customized later on, and debugging workflows. It would also be nice to have the ability to play back, instrument, and profile the workflows and to integrate them with a diagnostics system.

Design Validation

You must know before work commences whether the design can support the required behaviors. As Chapter 4 explains, to validate your design, you need to show that the design can support the core use cases by integrating the various areas of volatility encapsulated in your services. You validate the design by showing the respective call chain or sequence diagram for each use case. You may require more than one diagram to complete a use case.

It is important to demonstrate that your design is valid not just to yourself, but also to others. If you cannot validate your architecture, or if the validation is too ambiguous, you need to go back to the drawing board.

As mentioned previously, the few company-provided use cases for TradeMe included just a single candidate for a core use case: Match Tradesman. The architecture of TradeMe was modular and decoupled from all the use cases to such an extent that the design team could demonstrate that it supported all the provided use cases, not just the core Match Tradesman use case. The next section illustrates the validation of the TradeMe use cases and the operational concept of the new system.

Add Tradesman/Contractor Use Case

The Add Tradesman/Contractor use case involves several areas of volatility: the tradesman (or contractor) Client applications, the workflow of adding a member, compliance with regulations, and the payment system used. You can rearrange and simplify the use case from Figure 5-1 by adding swim lanes to the diagram, as shown in Figure 5-17.

Figure 5-17 The Add Tradesman/Contractor use case with swim lanes

Figure 5-17 shows that the execution of the use case requires interaction between a Client application and the membership subsystem. This is evident in the actual call chains of Figure 5-18 (the Adding Contractor use case is identical but with the contractor’s application, the Contractors Portal). Following the operational concepts of TradeMe, in Figure 5-18 the Client application (in this case, either the Tradesman Portal when the member is applying directly or the Marketplace App when the back-end rep is adding the member) posts the request to the Message Bus.

Figure 5-18 The Add Tradesman/Contractor call chain

Upon receiving the message, the Membership Manager (which is a workflow Manager) loads the appropriate workflow from the workflow storage. This either kicks off a new workflow or rehydrates an existing one to carry on with the workflow execution. Once the workflow has finished executing the request, the Membership Manager posts a message back into the Message Bus indicating the new state of the workflow, such as its completion, or perhaps indicating that some other Manager can start its processing now that the workflow is in a new state. Clients can monitor the Message Bus as well and update the users about their requests. Within the workflow, the Membership Manager consults the Regulation Engine to verify the tradesman or contractor, adds the tradesman or contractor to the Members store, and updates the Clients via the Message Bus.
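The body of this call chain (verify, store, report over the bus) might look like the sketch below. Workflow loading is omitted for brevity, a plain list stands in for the Message Bus, and `RegulationEngine`, `MembersAccess`, and the certification rule are hypothetical.

```python
from typing import Any, Dict, List, Set

Message = Dict[str, Any]


class RegulationEngine:
    """Hypothetical rule set: some trades require a certification before joining."""

    CERTIFIED_TRADES = {"electrician"}

    def verify(self, applicant: Message) -> bool:
        if applicant["trade"] in self.CERTIFIED_TRADES:
            return bool(applicant.get("certified", False))
        return True


class MembersAccess:
    """Hypothetical ResourceAccess in front of the Members store."""

    def __init__(self) -> None:
        self.member_ids: Set[str] = set()

    def add(self, member_id: str) -> None:
        self.member_ids.add(member_id)


class MembershipManager:
    def __init__(self, regulations: RegulationEngine, members: MembersAccess,
                 bus: List[Message]) -> None:
        self._regulations = regulations
        self._members = members
        self._bus = bus  # appending to this list stands in for posting to the Message Bus

    def on_add_member(self, message: Message) -> None:
        applicant = message["applicant"]
        if not self._regulations.verify(applicant):
            # Off the happy path, the outcome still goes back over the bus, not to a caller.
            self._bus.append({"type": "member.rejected", "id": applicant["id"]})
            return
        self._members.add(applicant["id"])
        self._bus.append({"type": "member.added", "id": applicant["id"]})


posted: List[Message] = []
members = MembersAccess()
manager = MembershipManager(RegulationEngine(), members, posted)
manager.on_add_member({"applicant": {"id": "T1", "trade": "carpenter"}})
manager.on_add_member({"applicant": {"id": "T2", "trade": "electrician", "certified": False}})
```

Both outcomes are posted as messages, so whichever Client is monitoring the bus (the Tradesman Portal or the Marketplace App) can update its user.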

Request Tradesman Use Case

The Request Tradesman use case includes two areas of interest: the contractor and the market (Figure 5-19). After initial verification of the request, this use case triggers another use case, Match Tradesman.

Figure 5-19 The Request Tradesman use case with swim lanes

The call chains are depicted in Figure 5-20. Clients such as the Contractors Portal or the internal user of the Marketplace App post a message to the bus requesting a tradesman. The Market Manager receives that message. The Market Manager loads the workflow corresponding to this request, and performs actions such as consulting with the Regulation Engine about what may be valid for this request or updating the project with the request for a tradesman. The Market Manager can then post back to the Message Bus that someone is requesting a tradesman. This will trigger the matching and assignment workflows, all separated on the timeline.

Figure 5-20 Request Tradesman call chains (until matching)

Match Tradesman Use Case

The Match Tradesman core use case involves multiple areas of interest. The first is who initiated the tradesman request that has triggered the match use case. That initiator could be a Client (a contractor or the marketplace reps), as in Figure 5-20, but it could also be a timer or any other subsystem that kicks off the match workflow. The other areas of interest are the market, regulations, search, and ultimately membership, as shown in Figure 5-21.

Figure 5-21 The Match Tradesman use case with swim lanes

Once you realize that regulations and search are both elements of the market, you can refactor the activity diagram to that shown in Figure 5-22. This enables easy mapping to your subsystems design.

Figure 5-22 Refactored swim lanes for the Match Tradesman use case

Figure 5-23 depicts the corresponding call chain. Again, this call chain is symmetrical with other call chains, in the sense that the first action is to load the appropriate workflow and execute it. The last call of the call chain to the Message Bus and to the Membership Manager triggers the Assign Tradesman use case.

Figure 5-23 Call chains for the Match Tradesman use case

Notice the composability of this design. For example, suppose the company really does need to handle acute volatility in analyzing the project’s needs. The call chain for finding a match allows for separating search from analysis. You would add an Analysis Engine to encapsulate the separate set of analysis algorithms. The business can even leverage TradeMe for some business intelligence to answer questions like “Could we have done things better?” For example, a call chain similar to Figure 5-23 could be used for the much more involved scenario of “Analyze all projects between 2016 and 2019” and the design of the components would not have to change at all. The number of these use cases is likely open-ended, and that is the whole point: You have an open-ended design that can be extended to implement any of these future scenarios, a true composable design.

Assign Tradesman Use Case

The Assign Tradesman use case involves four areas of interest (Figure 5-24): client, membership, regulations, and market. Note that the use case is independent of who triggered it, whether an actual internal user or just a request message off the bus from another subsystem. For example, the Match Tradesman use case could trigger the assignment use case as a direct continuation of the workflow in the case of automatic match and assignment.

Figure 5-24 The Assign Tradesman use case swim lanes

Again, after refactoring the activity diagram, it is easy to map to subsystems (Figure 5-25).

Figure 5-25 Unified swim lanes of the Assign Tradesman use case

As with all previous call chains, Figure 5-26 shows how the Membership Manager executes the workflow that ultimately leads to assigning the tradesman to the project. This is a collaboration between the Membership Manager and the Market Manager, with each managing its respective subsystem. Note that the Membership Manager is unaware of the Market Manager: it just posts a message to the bus. The Market Manager receives that message and updates the project according to its internal workflow. The Market Manager may, in turn, post another message to the Message Bus to trigger another use case, such as issuing a report on the project, or billing for the contractor, or pretty much anything. This is what the Message Is the Application design pattern is all about: The logical “assignment” message weaves its way between the services, triggering local behaviors as it goes. The Client can also monitor the Message Bus and may advise the user that the assignment is in progress.

Figure 5-26 Call chains for the Assign Tradesman use case

Terminate Tradesman Use Case

In the previous use cases, the initial diagram swim lanes included the regulations area, which was subsequently consolidated into the membership subsystem. Since this was such a recurring pattern, Figure 5-9 shows the refactored diagram for the Terminate Tradesman use case. This diagram still provides enough differentiation to allow for clear mapping to the design.

Figure 5-27 shows the call chain for terminating a tradesman. The Market Manager initiates the termination workflow and notifies the Membership Manager of the termination.

Figure 5-27 Call chains for the Terminate Tradesman use case

Any error condition or deviation from the “happy path” would add a dashed gray arrow from the Membership Manager back to the Message Bus and ultimately back to the client. Figure 5-28 is a sequence diagram demonstrating this interaction, without the calls between the ResourceAccess services and the Resources.

Figure 5-28 Sequence diagram for the Terminate Tradesman use case

Finally, the call chain diagram in Figure 5-27 (or the sequence diagram of Figure 5-28) assumes the termination use case is triggered when a project is completed, and the contractor terminates the assigned tradesmen. But it can also be triggered by the tradesman posting a message from the Tradesman Portal to the Membership Manager, which would cause the call chain to flow in the opposite direction (Membership Manager to Market Manager and on to the Client apps). Again, this is a testimony to the versatility of the design.

Pay Tradesman Use Case

The rest of the use cases closely follow the interactions and design pattern of the use cases described thus far, so only brief descriptions of them appear here. Also note the high degree of self-similarity or symmetry in the call chains. Figure 5-6 showed the Pay Tradesman use case, and its validating call chain is in Figure 5-29.

Figure 5-29 Call chains for the Pay Tradesman use case

Unlike the previous call chains, the payment is triggered by a scheduler or timer that the customer already has in service. The scheduler is decoupled from the actual components and has no knowledge of the system internals: All it does is post a message to the bus. The actual payment is made by PaymentAccess when updating the Payments store and accessing an external payment system, a Resource to TradeMe.
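A scheduler that knows nothing about the system internals can be as small as the sketch below: it holds only a topic name and a function for posting to the bus. The weekly Friday pay day, the topic name `tradesmen.pay.due`, and the `PaymentScheduler` class are assumptions for illustration.

```python
from datetime import date
from typing import Callable, Dict, List, Optional, Tuple

Message = Dict[str, object]
Post = Callable[[str, Message], None]


class PaymentScheduler:
    """Knows only a topic name and a post function, nothing about Managers or stores."""

    def __init__(self, post: Post, pay_weekday: int = 4) -> None:  # 4 = Friday
        self._post = post
        self._pay_weekday = pay_weekday
        self._last_run: Optional[date] = None

    def tick(self, today: date) -> bool:
        """Called periodically; posts at most one pay message per pay day."""
        if today.weekday() != self._pay_weekday or today == self._last_run:
            return False
        self._post("tradesmen.pay.due", {"period_end": today.isoformat()})
        self._last_run = today
        return True


posted: List[Tuple[str, Message]] = []
scheduler = PaymentScheduler(lambda topic, message: posted.append((topic, message)))
results = [
    scheduler.tick(date(2024, 2, 29)),  # a Thursday: nothing is posted
    scheduler.tick(date(2024, 3, 1)),   # a Friday: one message is posted
    scheduler.tick(date(2024, 3, 1)),   # same day again: no duplicate
]
```

Whatever subscribes to the topic (here, presumably the workflow that ends in PaymentAccess) decides what paying means; the scheduler never changes when that workflow does.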

Create Project Use Case

In another simple use case, the Market Manager responds to the request to create a project by executing a corresponding workflow (see Figure 5-30 and the use case diagram in Figure 5-7). Regardless of how many steps this takes, or how many errors are involved, the nature of the workflow Manager pattern allows for as many permutations as needed.

Figure 5-30 Call chains for the Create Project use case

Close Project Use Case

The Close Project use case involves both the Market Manager and the Membership Manager (see Figure 5-31 and the use case in Figure 5-8). Again, TradeMe accomplishes this task with the interplay between these two major abstractions; the interaction is identical to that shown in Figure 5-27.

Figure 5-31 Call chains for the Close Project use case

What’s Next?

This lengthy system design case study concludes the first part of this book. Having the system design in hand is just the first ingredient of success. Next comes project design. You should strike while the iron is hot: Always follow system design with project design, ideally back-to-back, as a continuous design effort.
