Application development methodologies are principles for organizing and managing software development projects. Methodologies provide a set of practices that developers and stakeholders follow in order to produce operational software.
When a problem is well understood and the deliverable functionality is known in detail, you may find that a process that starts with requirements and analysis and then moves to design and coding in a linear fashion works well. For example, if you were trying to implement a tic-tac-toe game using only the ASCII characters available on a Linux terminal, this methodology may work. If you are planning to develop a tic-tac-toe game that will be played on mobile devices, support multiple mobile platforms, and include additional features such as leaderboards and competitions, then it is unlikely that you will be able to use a strictly linear development methodology and succeed.
Software developers and other stakeholders have developed a variety of application development methodologies. Many of these methodologies are specific implementations of one of three paradigms or models of software development.
Waterfall methodologies are the oldest of the three. The practices advocated in spiral and agile methodologies are designed to avoid some of the drawbacks of waterfall methodologies.
The waterfall model of application development is aptly named because with waterfall methodologies, once you complete a phase, there is no going back, much like going over a waterfall. The typical phases in a waterfall methodology are as follows:

- Requirements gathering and analysis
- Design
- Implementation
- Testing and verification
- Maintenance
Advocates of the waterfall methodology argue that spending time in early stages of application development will result in lower overall costs. For example, if all functional requirements are known before design starts, then all requirements can be taken into account when designing.
Similarly, if a design is detailed and comprehensive, then developers will save time coding because they will know exactly what needs to be implemented. In theory, waterfall methodologies should reduce the risk of investing time in developing code that will not be used or having to redesign a component because a requirement was missed.
Waterfall methodologies do not work well in situations where requirements cannot be completely known in the early stages or when requirements may change. Requirements may change for business reasons.
Gathering requirements for a new user interface, for example, can be difficult without understanding how users want to interact with a system. User interface designers could get some requirements by interviewing users about their needs and expectations, but users do not necessarily know what they will want until they interact with the system.
Business requirements can change over the course of a project. A sales team may ask developers to create a customer management application with specific functionality, such as assigning each salesperson to a geographic territory. Several weeks later, the sales team reorganizes and now assigns salespeople to products, not geographic territories. In this scenario, the developers cannot proceed with the original specifications without risking building an application that does not meet business requirements.
One way to allow for changes in requirements and design phases is to revisit these stages multiple times over the course of a development project.
Spiral methodologies drop the strict requirement of not returning to an earlier stage in the process. Spiral methodologies use similar phases to waterfall methodologies, but instead of trying to complete each stage for the entire application, spiral approaches work on only a limited set of functionalities at a time. After all the stages have been completed for one set of functionalities, stakeholders determine what to work on next, and the process begins again.
Spiral models are designed to reduce risk in application development. They do this by specifying what should occur in each cycle, including the following:

- Determining objectives, alternatives, and constraints
- Identifying and evaluating risks
- Developing and testing the current set of functionality
- Planning the next iteration
An advantage of spiral approaches is that you can learn things in each iteration that can be applied in later iterations. Spiral approaches are adaptive, too. For example, if business requirements change after a component has been developed and deployed, it can be changed in a later iteration without disrupting the normal flow of the development process.
Agile methodologies are increasingly being used for software development. These methodologies are distinguished by their focus on close collaboration between developers and stakeholders and on frequent code deployments. Early advocates for agile methods summarized the principles of agile as follows:

- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan
See the Agile Manifesto (agilemanifesto.org) for more on the motivations for this methodology.
Like spiral methodologies, agile methodologies are iterative. However, they typically have shorter cycles and focus on smaller deliverables. Each iteration includes planning, design, development, testing, and deployment.
There is a focus on quality in agile methodologies. This includes meeting business requirements and producing functional, maintainable code. Testing is part of the development stage in agile and not limited to the post-development test phase found in waterfall methodologies.
Agile processes are transparent. There is close collaboration between developers and business stakeholders. This collaboration helps keep a focus on business value and allows developers to learn about changes in requirements quickly. These practices make agile more adaptive than either the waterfall or spiral methodology.
Architects can help application developers decide on the most appropriate methodology for their development efforts. In many cases, agile methods work well because of close collaboration and transparency. This reduces the risk that some critical functionality will be missed or that stakeholders are left uninformed about the status of a project. Agile is well suited to projects that must adapt to changing business and technical requirements.
When developing applications to support business processes that change slowly and have complex requirements, a spiral methodology may be appropriate. There may be too many stakeholders and domain experts involved for all of them to collaborate closely with developers, and detailed analysis and documentation may be required so that all stakeholders understand the objectives and risks and agree to them. The iterative nature of a spiral methodology still provides opportunities to adapt to changing requirements.
A waterfall methodology may be appropriate for safety-critical software, such as an application used with a medical device. In such a case, the requirements may be narrow and fixed, and extensive testing and verification would be required, so it is appropriate to have a separate testing phase in addition to the testing done during development. Other devices may interface with the medical device, so detailed technical documentation would also be needed.
Another aspect of application development that architects should understand is the accumulation of technical debt.
Application development involves trade-offs. To get a product to market fast enough to beat a competitor, developers may have to choose a design or coding approach that can be implemented quickly but is not the option they would have chosen given more time. When this happens, an application has code or design features that should be changed in the future. If they are not changed, the application will continue to function with substandard code, and more of it may be added over time, leading to an accumulation of substandard code in the application.
This situation has been compared to incurring monetary debt. Ward Cunningham, one of the authors of the Agile Manifesto, coined the term technical debt to describe the practice of making expedient choices to meet an objective, such as releasing code by a particular date. Technical debt incurs something analogous to interest: a loss of future productivity. Ideally, technical debt is paid down by refactoring code and implementing a better solution.
Projects incur technical debt for many reasons, including schedule pressure and changing requirements.
Incurring technical debt is not necessarily a negative factor. Like monetary debt, technical debt can enable a project to move forward and realize more benefit than if the team had not incurred the technical debt. For example, a team may have a deadline to deliver a functioning module within 30 days. If the module is delivered on time and passes a suite of verification tests, then the next larger phase of the project will be funded.
To meet the 30-day deadline, the developers could decide to implement minimal error handling and perform only cursory code reviews. This allows the team to make the deadline and continue developing the larger application. One of the first things they should do in the next phase of the project is to revise the code to improve error handling and perform more thorough code reviews. If the team had not cut corners and had missed the deadline, then there would have been no follow-on development, and the project would have been terminated.
While incurring technical debt is not necessarily a negative factor, not paying it down is. In the previous example, minimal error handling may lead to a less reliable application that simply throws errors up the stack instead of responding to the error in a way that allows the application to continue to operate. Multiple bugs may have been missed because of cursory code reviews, and this could lead to problems in production that adversely impact users of the application.
Technical debt can come in several forms, including code technical debt, architecture design debt, and environment debt. The previous example illustrates code technical debt.
Architecture design debt is incurred when an architecture design choice is made for expedience but will require rework later. For example, an application may be designed to run on a single machine instance. If the application needs to scale up, it will have to run on a larger instance. This is known as vertical scaling. Once the application reaches the limits of vertical scaling, it would have to be rearchitected to work in a distributed environment. This could require changes at multiple levels, such as adding a load balancer and implementing a data partitioning scheme.
Environment debt occurs when expedient choices are made around tooling. For example, instead of implementing a CI/CD platform, a team may decide to build their application and run tests manually. This would save the time required to set up a CI/CD platform, but it leaves developers to perform manual deployments and test executions repeatedly.
Architects should be aware of the level of technical debt in a project. Paying down technical debt of all kinds is important and should be planned for accordingly.
APIs provide programmatic access to services. APIs are often REST APIs or RPC APIs. REST APIs are resource oriented and use HTTP, while RPC APIs tend to be oriented around functions implemented using sockets and designed for high efficiency. For further details on API recommendations, see the Google Cloud API Design Guide (cloud.google.com/apis/design), which specifies design principles for both REST APIs and RPC APIs.
The following are some Google-recommended API design practices; they apply to both types of APIs.
APIs should be designed around resources and operations that can be performed on those resources. Resources have a resource name and a set of methods. The following are the four most commonly used HTTP methods for REST APIs:

- GET, to retrieve a resource or list a collection of resources
- POST, to create a resource
- PUT, to update a resource
- DELETE, to remove a resource
Custom methods are used to implement functionality that is not available in the standard methods. Standard methods are preferred over custom methods.
Resources may be simple resources or collections. Simple resources consist of a single entity. Collections are lists of resources of the same type. List resources often support pagination, sort ordering, and filtering.
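Pagination of the sort described above can be sketched in a few lines. The function name `list_contacts` and the `page_size`/`page_token` parameters below are illustrative, though they mirror common REST conventions such as Google's `pageToken`/`nextPageToken` fields:

```python
import base64

def list_contacts(contacts, page_size=2, page_token=None):
    """Return one page of results plus an opaque token for the next page.

    `contacts` is a list of dicts; `page_token` is an opaque cursor
    encoding the offset of the next item to return.
    """
    start = 0
    if page_token:
        start = int(base64.urlsafe_b64decode(page_token).decode())
    page = contacts[start:start + page_size]
    next_start = start + page_size
    next_token = None
    if next_start < len(contacts):
        next_token = base64.urlsafe_b64encode(str(next_start).encode()).decode()
    return {"contacts": page, "nextPageToken": next_token}

contacts = [{"id": i} for i in range(5)]
first = list_contacts(contacts)
second = list_contacts(contacts, page_token=first["nextPageToken"])
```

Encoding the cursor keeps it opaque to clients, so the server is free to change its internal representation without breaking callers.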
Resources should be named using a hierarchical model. For example, consider an application that maintains customer contacts. Each customer is identified by their email address. A contact may be an outgoing email or an incoming email. Each email has a unique identifier. The following is an example of a message resource name:
customers.example.com/contacts/[email protected]/outgoing/message1
Note that a resource name is not the same as the REST URL. A REST URL should include an API version number. The following is the REST URL for the preceding example:
customers.example.com/v2/contacts/[email protected]/outgoing/message1
When an API call results in an error, a standard HTTP error should be returned. Additional detail about the error can be provided in the message payload. HTTP 200 is the standard status code for a successful call. The following are example HTTP error codes:

- 400 Bad Request
- 401 Unauthorized
- 403 Forbidden
- 404 Not Found
- 500 Internal Server Error
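A minimal sketch of this pattern, using only the Python standard library; the shape of the JSON error body is illustrative, loosely modeled on common JSON error envelopes:

```python
from http import HTTPStatus
import json

def error_response(status: HTTPStatus, detail: str) -> tuple[int, str]:
    """Build an (HTTP status, JSON body) pair for an API error.

    The body carries machine-readable detail beyond the standard
    status line, as recommended for REST error handling.
    """
    body = json.dumps({
        "error": {
            "code": status.value,
            "status": status.phrase,
            "message": detail,
        }
    })
    return status.value, body

code, body = error_response(HTTPStatus.NOT_FOUND,
                            "contact 'message1' does not exist")
```

Keeping the status line standard while placing specifics in the payload lets generic clients react to the code and diagnostic tools read the detail.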
In addition to conventions around naming and error messages, there are recommended best practices for securing APIs. For example, restrict the text of an error message to the standard text for its status code; providing more detail could be a security risk.
APIs should be versioned to improve their stability and reliability. By specifying versions for APIs, it is possible to add new functionality while maintaining support for existing APIs. If API versions are to be deprecated, consider how you will communicate the deprecation and how you will support migration to newer API versions.
API Security

APIs should enforce controls to protect the confidentiality and integrity of data and the availability of services. Confidentiality and integrity are protected in part by HTTPS-provided encryption. This protects data in transit between a client and an API endpoint. Persistently stored data is encrypted by default in all Google Cloud storage systems. Application designers are responsible for protecting the confidentiality and integrity of data when it is in use by an application.
API functions execute operations on behalf of an entity with an identity. In the Google Cloud Platform, an identity may be a user or a service account. Identities should be managed by a centralized system, such as Cloud Identity and IAM. Identities should be assigned roles, which are collections of permissions. Predefined roles in IAM are designed to accommodate common requirements for different types of users of services.
One way to authenticate users of API functions is to require an API key. API keys are strings of alphanumeric characters that uniquely identify an app or device to a service.
JSON Web Tokens (JWTs) are commonly used for authorization when making API calls. When a user logs into a service, the service can issue a JWT, which the user then passes with subsequent API calls. The JWT contains claims about the subject of the token and what the subject is allowed to do. JWTs are digitally signed and may be encrypted. A JWT is a JSON structure with three parts: a header, a payload, and a signature.
The header contains a type attribute indicating that the token is a JWT and an attribute identifying the algorithm used to sign the token.
The payload is a set of claims. Claims make statements about the issuer, subject, or token. They may include commonly used claims such as an expiration time or the name of the subject. They may also include private claims that are known to the parties that agree to use them. These might include application-specific claims, such as a permission to query a specific type of data.
The signature is the output of the signature algorithm generated using the header, the payload, and a secret. Before using the claims in the payload, a service should validate the signature. If the signature is valid, it proves that the signer knows the secret and that the JWT has not been altered since it was signed.
The JWT is encoded in three Base64-encoded strings separated by periods.
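The three-part structure can be illustrated with the standard library alone. This is a sketch of HS256-style signing and verification, not a substitute for a vetted JWT library, and it omits claim validation such as expiration checks:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe Base64
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_jwt(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = b64url(json.dumps(header).encode()) + "." + \
                    b64url(json.dumps(payload).encode())
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)

def verify_jwt(token: str, secret: bytes) -> bool:
    """Recompute the signature over header.payload and compare."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), signature)

token = make_jwt({"sub": "alice@example.com"}, b"shared-secret")
assert verify_jwt(token, b"shared-secret")            # valid signature
assert not verify_jwt(token + "x", b"shared-secret")  # tampered token fails
```

Because the signature covers both the header and the payload, any alteration to the claims invalidates the token, which is exactly the property a service relies on before trusting the payload.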
Maintaining the availability of a service is another aspect of security. If a service were to try to always respond to all function calls from all users, there would be a risk of overloading the system.
Users could intentionally or unintentionally send large volumes of function calls to a service. Eventually, resources could be exhausted, and API requests would fail with connection errors or other error messages. To prevent excessive use of system resources, APIs should include resource-limiting mechanisms.
One way to limit resource consumption is to set a maximum threshold for using a service for a given period of time. For example, a user may be limited to 100 API calls a minute. Once a user has made 100 requests, no other requests from that user will be executed until the start of the next minute.
Another way to control resource usage is by rate limiting. In this case, you set a maximum rate, such as 100 API requests a minute, which would be an average of one request every 0.6 seconds. If a user invokes API functions at a rate faster than one every 0.6 seconds, some requests can be dropped until the rate falls below the rate limit.
Sometimes limits are set on the overall number of requests without regard to individual users. These limits are higher than the limits that apply to individual users.
When one of these limits is exceeded, the response should have status code 429 (Too Many Requests). Responding to excessive requests in this way is called throttling.
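A fixed-window limiter along the lines described above can be sketched as follows; the class and method names are illustrative:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per user per `window` seconds.

    Requests beyond the threshold are throttled with HTTP 429
    (Too Many Requests) until the next window starts.
    """
    def __init__(self, limit=100, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.counts = defaultdict(int)  # (user, window index) -> count

    def check(self, user: str) -> int:
        """Return 200 if the request may proceed, 429 if throttled."""
        window_index = int(self.clock() // self.window)
        key = (user, window_index)
        if self.counts[key] >= self.limit:
            return 429
        self.counts[key] += 1
        return 200

limiter = FixedWindowLimiter(limit=3, window=60.0)
statuses = [limiter.check("alice") for _ in range(4)]
# first three calls return 200, the fourth is throttled with 429
```

In practice this logic usually lives in an API gateway rather than the service itself, and rate limiting (smoothing the request rate) can be layered on top of per-window quotas.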
Testing is an important activity of software development, and it is part of all software development methodologies. Automated testing enables efficient CI/CD. Testing tools can employ a variety of approaches or models.
Testing tools that enable automation may employ a number of different testing frameworks, including the following:

- Data-driven testing
- Modularity-driven testing
- Keyword-driven testing
- Model-based testing
- Test-driven development
- Hybrid testing
Data-driven testing uses structured data sets to drive testing. Tests are defined using a set of conditions or input values and expected output values. A test is executed by reading the test data source; then, for each condition or set of inputs, the tested function is executed, and the output is compared to the expected value. Data-driven testing is appropriate for testing APIs or functions executed from a command line.
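The pattern can be made concrete with a plain-Python sketch; with pytest you would typically express the same idea using pytest.mark.parametrize. The function under test here (fizzbuzz) is an arbitrary example:

```python
def fizzbuzz(n: int) -> str:
    """Function under test (an arbitrary example)."""
    if n % 15 == 0:
        return "fizzbuzz"
    if n % 3 == 0:
        return "fizz"
    if n % 5 == 0:
        return "buzz"
    return str(n)

# The test data source: each row is (input value, expected output).
TEST_CASES = [
    (3, "fizz"),
    (5, "buzz"),
    (15, "fizzbuzz"),
    (7, "7"),
]

def run_data_driven_tests(func, cases):
    """Execute `func` for each case and collect any failures."""
    failures = []
    for value, expected in cases:
        actual = func(value)
        if actual != expected:
            failures.append((value, expected, actual))
    return failures

assert run_data_driven_tests(fizzbuzz, TEST_CASES) == []
```

Because the test data is plain structured data, new cases can be added without touching the test logic, which is what makes the approach a good fit for testing APIs and command-line functions.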
Modularity-driven testing uses small scripts designed to test a limited set of functionalities. These scripts are combined to test higher-order abstractions. For example, a developer may create test scripts for creating, reading, updating, and deleting a customer record. Those four scripts may be combined into a customer management test script. Another script designed to search and sort customers may be combined with the customer management test script into a higher-level script that tests all customer data operations.
Keyword-driven testing separates test data from instructions for running a test. Each test is identified using a keyword or key term. A test is defined as a sequence of steps to execute. For example, the steps to enter a new customer into a database might start with the following:
In addition to these instructions, data for each test is stored in another document or data source. For example, the test data may be in a spreadsheet. Each row in the spreadsheet contains example names, addresses, phone numbers, and email addresses.
The set of instructions can change as the software changes without needing to change the test data. For example, if the application is changed so that a new window is not opened, then this set of instructions can be updated without requiring any changes to the test data. This framework is well suited to manual testing, especially for testing graphical user interfaces. Keyword test frameworks can also be automated.
In model-based testing, instead of having a person generate test data, a simulation program is used to generate it. Typically, when model-based testing is used, the simulator is built in parallel with the system under test. Model-based testing uses several methods to simulate the system being tested, including describing the expected system behavior in a finite state machine model or defining logical predicates that describe the system.
Test-driven development incorporates testing into the development process. In this framework, requirements are mapped to tests. The tests are usually specific and narrowly scoped. This encourages developing small amounts of code and frequent testing. Once a piece of code passes its tests, it can be integrated into the baseline of code.
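A small sketch of the cycle, with an invented requirement (normalizing email addresses): the test is written first, and the minimal passing implementation follows.

```python
# Requirement (hypothetical): normalize customer email addresses by
# trimming whitespace and lowercasing the domain part only.

# Step 1: write the narrowly scoped test first; it fails until the
# code exists.
def test_normalize_email():
    assert normalize_email("  Alice@Example.COM ") == "Alice@example.com"
    assert normalize_email("bob@test.org") == "bob@test.org"

# Step 2: write just enough code to make the test pass.
def normalize_email(address: str) -> str:
    local, _, domain = address.strip().partition("@")
    return local + "@" + domain.lower()

test_normalize_email()  # passes; the code can now be integrated
```

Mapping each requirement to a small test like this keeps the code increments small and gives an objective signal for when a piece of code is ready to merge into the baseline.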
Hybrid testing is a testing framework that incorporates two or more distinct frameworks.
Developers have a choice of testing tools that range from functionally limited, language-specific tools to general-purpose testing platforms. Here are some examples of automated testing tools.
Developing unit tests can be done with language-specific tools. For example, pytest (docs.pytest.org/en/latest) is a Python testing framework that makes it easy to write and execute unit tests for Python programs. JUnit (junit.org/junit5) is a comparable framework for developers testing Java code.
Selenium (www.seleniumhq.org) is a widely used open source browser automation tool that can be used as part of testing. The Selenium WebDriver API enables tests to function as if a user were interacting with a browser. Selenium scripts can be written in a programming language or by using the Selenium IDE.
Katalon Studio (www.katalon.com) is an open source, interactive testing platform that builds on Selenium. It can be used to test web-based and mobile applications and APIs.
Another type of automated testing is fuzzing, a method of subjecting a program to semi-random inputs for an extended period of time. Fuzzing can be used to find bugs and security vulnerabilities that would otherwise turn up only at runtime. Tools that perform fuzzing are called fuzzers. You can read more about fuzzing at owasp.org/www-community/Fuzzing.
Data and system migration tools support the transition from on-premises or other clouds to GCP cloud-based infrastructure. For the purposes of the Google Cloud Professional Architect exam, it helps to understand the types of migrations that organizations can implement and the tools and services that can help with the migration.
Cloud migration projects typically fall into one of three categories: lift and shift, in which workloads are moved with minimal modification; move and improve, in which workloads are modified as they are migrated; and rip and replace, in which applications are rebuilt in the cloud.
When implementing a lift-and-shift migration, you should perform an inventory of all applications, data sources, and infrastructure. Identify dependencies between applications because that will influence the order in which you migrate applications to the cloud. You should also review software license agreements. Some licenses may need to be revised to move applications to the cloud. For example, if an enterprise application is run under a site license for one data center and you plan to run that application in both the cloud and on-premises for some period, additional licensing would be required.
Variations on these migration strategies include replatforming, repurchasing, retirement, and retaining.
When migrating and changing applications and infrastructure, you will need a detailed plan identifying what systems will change, how those changes will impact other systems, and the order in which systems will be migrated and modified. If the migration will have any impact on the user experience, training should be included in the plan.
In addition to thinking about migration in terms of applications, it is important to think of how data will migrate to the cloud.
Migrations typically require the transfer of large volumes of data. How you go about transferring that data is determined by a number of factors, including the volume of data to be transferred and the available network bandwidth.
The time required to transfer data is a function of the volume of data and the network bandwidth. For example, transferring 1 GB of data over a 100 Gbps network will take about 0.1 seconds; on a 1 Mbps network, that same data transfer will take about three hours. Transferring one petabyte of data will require 30 hours over a 100 Gbps network and more than 120 days over a 1 Gbps network.
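These estimates follow from dividing the data volume in bits by the bandwidth in bits per second, as the following sketch shows; real transfers add protocol overhead, so actual times run longer than the ideal figures computed here.

```python
def transfer_time_seconds(data_bytes: float, bandwidth_bps: float) -> float:
    """Ideal transfer time: volume in bits divided by link speed in bits/s."""
    return (data_bytes * 8) / bandwidth_bps

GB = 10**9   # bytes
PB = 10**15  # bytes

# 1 GB over 100 Gbps: 8e9 bits / 1e11 bps = 0.08 s ("about 0.1 seconds")
t_gb_fast = transfer_time_seconds(1 * GB, 100 * 10**9)

# 1 GB over 1 Mbps: 8e9 / 1e6 = 8,000 s, a bit over two hours ideal;
# protocol overhead pushes this toward the three hours cited in the text
t_gb_slow_hours = transfer_time_seconds(1 * GB, 10**6) / 3600

# 1 PB over 100 Gbps: 8e15 / 1e11 = 80,000 s, about 22 hours ideal;
# the ~30-hour figure in the text allows for real-world overhead
t_pb_fast_hours = transfer_time_seconds(1 * PB, 100 * 10**9) / 3600

# 1 PB over 1 Gbps: 8e15 / 1e9 = 8,000,000 s, roughly 93 days ideal;
# sustained real-world throughput is lower, hence "more than 120 days"
t_pb_slow_days = transfer_time_seconds(1 * PB, 10**9) / 86400
```

Running this arithmetic before a migration makes it clear when a network transfer is feasible and when a physical appliance is the better choice.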
You have several options for transferring data into Google Cloud, including the following:

- The Storage Transfer Service
- The gsutil command-line utility
- The Google Transfer Appliance
- Third-party transfer services

The Storage Transfer Service allows for the transfer of data from an HTTP/S location, an AWS S3 bucket, or a Cloud Storage bucket. The data is always transferred to a Cloud Storage bucket. Transfer operations are defined using transfer jobs that run in Google Cloud. The Storage Transfer Service is the recommended way of transferring data from AWS or other cloud providers to Google Cloud.

The gsutil command-line utility is the recommended way to transfer data from on-premises systems to Google Cloud. Consider compressing and de-duplicating data before transferring it to save time on the transfer operation. Compressing data is CPU intensive, so there is a trade-off between reducing transfer time and incurring additional CPU load.
gsutil is multithreaded, which improves performance when transferring a large number of files. gsutil also supports parallel loading of chunks or subsets of data in large files; the chunks are reassembled at the destination. gsutil also supports restarts after failures. You can tune gsutil transfers with command-line parameters specifying the number of processes, the number of threads per process, and other options.
If large volumes of data will be transferred and a transfer over the network would take too long, then it is recommended that you use the Google Transfer Appliance, which is a high-capacity storage device that is shipped to your site. Currently, 40 TB (TA40) and 300 TB (TA400) appliances are available. Those configurations may change in the future. The appliance is installed on your network, and data is transferred to the storage unit, which is then shipped back to Google. After Google receives the storage unit, it will make the unit accessible to you so that you can log in to the console and transfer the data to a Cloud Storage bucket.
Another option is to use a third-party service, such as those offered by Zadara, Iron Mountain, and Prime Focus Technologies.
Google Cloud's Database Migration Service is used to migrate MySQL and PostgreSQL databases from on-premises environments, Compute Engine, or other clouds to Cloud SQL. Support for SQL Server migrations is expected soon. The service supports continuous change data capture, so migrations require minimal downtime.
The GCP SDK is a set of command-line tools for managing Google Cloud resources. These commands allow you to manage infrastructure and perform operations from the command line instead of the console. The GCP SDK components are especially useful for automating routine tasks and for viewing information about the state of your infrastructure.
Developers, engineers, and others who work with Google Cloud can use the Cloud Console for interactive work, and they have three options for interacting with Google Cloud programmatically: the Google Cloud SDK, Google Cloud Shell, and emulators.
SDK
The Cloud SDK includes the following:
gcloud, gsutil, and bq are installed by default when installing the GCP SDK. Additional components can be installed as needed using the gcloud components install command. The additional components include the following:
- cbt, a command-line tool for Cloud Bigtable (current releases of gcloud also include a gcloud bigtable component)
- kubectl, the command-line tool for managing Kubernetes clusters

The gcloud components list command generates a list of available components. The list contains the name of each component, an ID, the size of the component, and the status of the component on the local device, which is one of these: not installed, installed (and up-to-date), and update available (installed but not up-to-date).
Some components are in alpha or beta release. These are run using the gcloud alpha and gcloud beta commands, respectively.
In addition to accessing the SDK from the command line, you can also use client libraries developed for several languages, including Java, Python, Ruby, PHP, C#, Node.js, and Go.
GCP SDK supports both user account and service account authorization. User account authorization is enabled using the gcloud init command. Service account authorization is enabled using the gcloud auth activate-service-account command.
Google Cloud Shell is a managed service that provides an online development environment with the features of a Linux shell along with pre-installed tools such as the Google Cloud SDK and kubectl. The service also includes a Cloud Shell Editor.
Cloud Shell can be accessed from a web browser and provides 5 GB of persistent storage.
Google Cloud provides emulators for several services that allow you to develop locally before running your code in the cloud. This can help reduce cloud charges. Emulators are available for the following:

- Cloud Bigtable
- Cloud Datastore
- Cloud Firestore
- Cloud Pub/Sub
- Cloud Spanner
Emulators are installed using gcloud commands.
Architects support application development and operations. For example, architects can help teams and organizations choose an application development methodology suitable for their needs. Options include waterfall, spiral, and agile methodologies. Agile methodologies work well in many cases, in part because of their focus on collaboration and rapid, incremental development. In addition to planning new feature work, application developers should invest time and resources to pay down technical debt.
Follow established recommended practices when designing APIs, such as orienting the API around entities, not functions performed on those entities. Include security considerations, such as authorizations and rate limiting, when designing APIs.
Testing should be automated. Developers can choose from a variety of testing frameworks to find one or more that fits well with their development processes.
When migrating applications and data to the cloud, consider data volumes and bandwidth when choosing a method to migrate data.
The GCP SDK is a set of command-line tools and language-specific libraries for managing GCP resources. Most GCP services can be managed using gcloud commands, but Cloud Storage and BigQuery have their own command-line utilities, called gsutil and bq, respectively.
Resource limiting is often implemented by API gateways that are separate and distinct from the API itself. GCP has two different API gateway offerings: the API Gateway, a basic service; and Apigee, which is more feature rich.