Appendix G

Software Testing Techniques

G1: Basis Path Testing

Basis path testing is a white-box technique that identifies test cases on the basis of the flows, or logical paths, that can be taken through a program. A basis path is a unique path through the program in which no iterations are allowed. Basis paths are atomic-level paths, and all possible paths through the system are linear combinations of them. Basis path testing also produces the cyclomatic complexity metric, which measures the complexity of a source code module by examining its control structures.

To illustrate the technique, consider the following small program, which reads records from a file and tallies the numerical ranges of a field on each record.

PROGRAM: FIELD-COUNT

Node   Statement
 1.    Dowhile not EOF
           read record
 2.        if FIELD_COUNTER > 7 then
 3.            increment COUNTER_7 by 1
           else
 4.            if FIELD_COUNTER > 3 then
 5.                increment COUNTER_3 by 1
               else
 6.                increment COUNTER_1 by 1
 7.            endif
 8.        endif
 9.    End_While
10.    End

In theory, if the loop were iterated 100 times, approximately 1.5 × 10^48 test cases would be required to perform exhaustive path testing, which is not achievable. With basis path testing, on the other hand, only four basis test cases are required to test the program:

       1 → 10
       1 → 2 → 3 → 8 → 9 → 1 → 10
       1 → 2 → 4 → 5 → 7 → 8 → 9 → 1 → 10
       1 → 2 → 4 → 6 → 7 → 8 → 9 → 1 → 10

Mathematically, all possible paths in the program can be generated as linear combinations of the four basis paths. Experience shows that most potential defects will be discovered by executing the four basis path test cases, which demonstrates the power of the technique. The number of basis paths is also the cyclomatic complexity metric. It is recommended that the cyclomatic complexity of a program module not exceed 10. Because the calculations are labor intensive, testing tools are available to automate the process. See Section 6, “Modern Software Testing Tools,” for more details.
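
The cyclomatic complexity can also be computed directly from the control-flow graph with the formula V(G) = E - N + 2, where E is the number of edges and N the number of nodes. The following Python sketch is purely illustrative; the edge list is an assumed reading of the flow graph implied by the node numbering above.

# Control-flow graph of FIELD-COUNT, using the node numbers from the listing above.
# Each pair (a, b) is a directed edge from node a to node b (an assumed reading
# of the program's control flow).
edges = [
    (1, 2),   # loop body entered (not EOF)
    (1, 10),  # loop exit (EOF)
    (2, 3),   # FIELD_COUNTER > 7
    (2, 4),   # FIELD_COUNTER <= 7
    (3, 8),
    (4, 5),   # FIELD_COUNTER > 3
    (4, 6),   # FIELD_COUNTER <= 3
    (5, 7),
    (6, 7),
    (7, 8),
    (8, 9),
    (9, 1),   # back edge to the loop test
]

nodes = {n for edge in edges for n in edge}

# Cyclomatic complexity: V(G) = E - N + 2 for a single connected flow graph.
v_of_g = len(edges) - len(nodes) + 2
print(v_of_g)   # prints 4, matching the four basis paths listed above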

Basis path testing can also be applied to integration testing when program modules are integrated. The use of the technique quantifies the integration effort involved as well as the design-level complexity.

G2: Black-Box Testing

Black-box, or functional, testing is a technique in which test conditions are developed on the basis of the program or system’s functionality; that is, the tester requires information about the input data and the observed output, but does not know how the program or system works. Just as one does not have to know how a car works internally to drive it, it is not necessary to know the internal structure of a program to execute it. The technique focuses on testing the program’s functionality against the specification. With black-box testing, the tester views the program as a black box and is completely unconcerned with its internal structure. Some examples in this category include decision tables, equivalence partitioning, range testing, boundary value testing, database integrity testing, cause-effect graphing, orthogonal array testing, array and table testing, exception testing, limit testing, and random testing.

A major advantage of black-box testing is that the tests are geared to what the program or system is supposed to do; this is natural and understood by everyone, and it should be verified with techniques such as structured walkthroughs, inspections, and JADs. A limitation is that exhaustive input testing is not achievable, because it would require every possible input condition or combination to be tested. In addition, because there is no knowledge of the internal structure or logic, errors or deliberate mischief on the part of a programmer may go undetected with black-box testing. For example, suppose a disgruntled payroll programmer wanted to insert some job security into a payroll application he is developing. He could insert the following extra code so that if he were ever terminated, that is, if his employee ID no longer existed in the system, justice would sooner or later prevail.

Extra Program Logic

if my employee ID exists
       deposit regular pay check into my bank account
else
       deposit an enormous amount of money into my bank account
       erase any possible financial audit trails
       erase this code

G3: Bottom-Up Testing

The bottom-up testing technique is an incremental approach in which the lowest-level modules or system components are integrated and tested first. Testing then proceeds hierarchically to the top level. A driver, or temporary test program that invokes the module or system component under test, is often required. Bottom-up testing starts with the lowest-level modules or system components, using drivers to invoke them. After these components have been tested, the next logical level in the program or system component hierarchy is added and testing proceeds upward.

Bottom-up testing is common for large, complex systems, but it takes a relatively long time for the system as a whole to become visible. Because the menus and external user interfaces are tested last, users cannot review these interfaces and functions early. A further drawback is that considerable effort is required to create the drivers, which can themselves introduce additional errors.
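
As a minimal sketch of what a driver looks like, assume a low-level module compute_tax that has no user interface of its own (the function and its test values are hypothetical, not taken from the text). The driver below invokes the module directly and checks the results, standing in for the higher-level modules that have not yet been integrated.

# Hypothetical low-level module under test (normally imported from the real code base).
def compute_tax(amount, rate):
    if amount < 0 or rate < 0:
        raise ValueError("amount and rate must be non-negative")
    return round(amount * rate, 2)

# Driver: a temporary test program that invokes the module directly,
# playing the role of the not-yet-integrated higher-level modules.
def driver():
    cases = [
        ((100.00, 0.07), 7.00),
        ((0.00, 0.07), 0.00),
        ((19.99, 0.00), 0.00),
    ]
    for (amount, rate), expected in cases:
        actual = compute_tax(amount, rate)
        status = "PASS" if actual == expected else "FAIL"
        print(f"{status}: compute_tax({amount}, {rate}) = {actual} (expected {expected})")

if __name__ == "__main__":
    driver()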

G4: Boundary Value Testing

The boundary-value-testing technique is a black-box technique that focuses on the boundaries of the input and output equivalence classes (see Equivalence Class Partitioning Testing). Errors tend to congregate at the boundaries. Focusing testing in these areas increases the probability of detecting errors.

Boundary value testing is a variation of the equivalence class partitioning technique, which focuses on the bounds of each equivalence class, for example, on, above, and below each class. Rather than select an arbitrary test point within an equivalence class, boundary value analysis selects one or more test cases to challenge each edge. Focus is on the input space (input equivalence classes) and output space (output equivalence classes). It is more difficult to define output equivalence classes and, therefore, boundary value tests.

Boundary value testing can require a large number of test cases to be created because of the large number of input and output variations. It is recommended that at least nine test cases be created for each input variable. The inputs need to be thoroughly understood, and the behavior must be consistent for the equivalence class. One limitation is that it may be very difficult to define the range of the equivalence class if it involves complex calculations. It is, therefore, imperative that the requirements be as detailed as possible. The following are some examples of how to apply the technique.

Numeric Input Data

Field Ranges

Example: “Input can range from integers 0 to 100,” test cases include –1, 0, 100, 101.

Example: “Input can range from real numbers 0 to 100.0,” test cases include –0.00001, 0.0, 100.0, 100.00001.
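
For ranges such as the two examples above, the boundary values can be generated mechanically. The following Python sketch is purely illustrative; the step (or epsilon) used for the real-valued range is an assumption.

def boundary_values(low, high, step=1):
    """Return just-below, on-boundary, and just-above values for both edges of a range."""
    return [low - step, low, high, high + step]

# "Input can range from integers 0 to 100"
print(boundary_values(0, 100))               # [-1, 0, 100, 101]

# "Input can range from real numbers 0 to 100.0" (epsilon of 0.00001 assumed)
print(boundary_values(0.0, 100.0, 0.00001))  # [-1e-05, 0.0, 100.0, 100.00001]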

Numeric Output Data

Output Range of Values

Example: “Numerical range outputs of actuarial tables can be from $0.00 to $100,000.00”; for example, attempt to create conditions that produce a negative amount, $0.00, $100,000.00, and $100,000.01.

Nonnumeric Input Data

Tables or Arrays

Example: Focus on the first and last rows, for example, read, update, write, delete.

Example: Try to access a nonexistent table or array.

Number of Items

Example: “Number of products associated with a model is up to 10”; for example, enter 0, 10, 11 items.

Nonnumeric Output Data

Tables or Arrays

Example: Focus on the first and last rows, for example, update, delete, insert operations.

Number of Outputs

Example: “Up to 10 stocks can be displayed”; for example, attempt to display 0, 10, and 11 stocks.

GUI

  1. Vertically and horizontally scroll to the end of scroll bars.

  2. Upper and lower limits of color selection.

  3. Upper and lower limits of sound selection.

  4. Boundary gizmos, for example, controls that bound the available sets of input values.

  5. Spinners, for example, a small edit field with two half-height buttons.

  6. Flip-flop menu items.

  7. List box bounds.

G5: Branch Coverage Testing

Branch coverage, or decision coverage, is a white-box testing technique in which test cases are written to ensure that every decision has a true and a false outcome at least once; that is, each branch is traversed at least once. Branch coverage generally satisfies statement coverage (see G31, “Statement Coverage Testing”), because every statement lies on a subpath originating from one branch outcome or the other.

To illustrate the technique, consider the following small program, which reads records from a file and tallies the numerical ranges of a field on each record.

PROGRAM: FIELD-COUNT

Dowhile not EOF
       read record
       if FIELD_COUNTER > 7 then
              increment COUNTER_7 by 1
       else
              if FIELD_COUNTER > 3 then
                     increment COUNTER_3 by 1
              else
                     increment COUNTER_1 by 1
              endif
       endif
End_While
End

The test cases to satisfy branch coverage are as follows:

Test Case     Value (FIELD_COUNTER)
    1         > 7 (e.g., 8)
    2         <= 7 (e.g., 7)
    3         > 3 (e.g., 4)
    4         <= 3 (e.g., 3)

For this particular example, Test Case 2 is redundant and can be eliminated.
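
The following Python rendering of FIELD-COUNT (an illustrative translation, not the original source) shows that the three remaining values exercise every branch: the loop is entered and exited, and both decisions take their true and false outcomes.

def field_count(records):
    # records stands in for the input file; each element is a FIELD_COUNTER value.
    counter_7 = counter_3 = counter_1 = 0
    for field_counter in records:          # Dowhile not EOF / read record
        if field_counter > 7:              # outer decision
            counter_7 += 1
        else:
            if field_counter > 3:          # inner decision
                counter_3 += 1
            else:
                counter_1 += 1
    return counter_7, counter_3, counter_1

# Values 8, 4, and 3 cover every true and false branch outcome.
assert field_count([8, 4, 3]) == (1, 1, 1)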

G6: Branch/Condition Coverage Testing

Branch/condition coverage is a white-box testing technique in which test cases are written to ensure that each decision and the conditions within a decision take on all possible values at least once. It is a stronger logic-coverage technique than decision or condition coverage because it covers all the conditions that may not be tested with decision coverage alone. It also satisfies statement coverage.

One method of creating test cases using this technique is to build a truth table and write down all conditions and their complements. Duplicate test cases, if they exist, are then eliminated. To illustrate the technique, consider the following small program, which reads records from a file and tallies the numerical ranges of a field on each record.

PROGRAM: FIELD-COUNT

Dowhile not EOF
       read record
       if FIELD_COUNTER > 7 then
              increment COUNTER_7 by 1
       else
              if FIELD_COUNTER > 3 then
                     increment COUNTER_3 by 1
              else
                     increment COUNTER_1 by 1
              endif
       endif
End_While
End

The test cases to satisfy branch/condition coverage are as follows:

Test Case     Value (FIELD_COUNTER)
    1         > 7 (e.g., 8)
    2         <= 7 (e.g., 7)
    3         > 3 (e.g., 4)
    4         <= 3 (e.g., 3)

For this particular example there is only one condition for each decision. If there were more, each condition and its complement would be tested. Again, Test Case 2 is redundant and can be eliminated.
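
To see how the technique differs when a decision contains more than one condition, consider a hypothetical decision that tests two conditions, FIELD_COUNTER > 7 and RECORD_TYPE equal to A (this compound condition is an assumption added for illustration; it does not appear in FIELD-COUNT). The sketch below chooses test cases so that each condition and its complement, as well as both decision outcomes, occur at least once.

def classify(field_counter, record_type):
    # Hypothetical decision with two conditions.
    if field_counter > 7 and record_type == "A":
        return "high-A"
    return "other"

# Test cases chosen so that each condition is true and false at least once,
# and the decision as a whole is both true and false at least once.
cases = [
    ((8, "A"), "high-A"),  # condition 1 true,  condition 2 true,  decision true
    ((7, "A"), "other"),   # condition 1 false, condition 2 true,  decision false
    ((8, "B"), "other"),   # condition 1 true,  condition 2 false, decision false
]
for (fc, rt), expected in cases:
    assert classify(fc, rt) == expected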

G7: Cause-Effect Graphing

Cause-effect diagrams (also known as Ishikawa or Fishbone diagrams) are useful tools to analyze the causes of an unsatisfactory condition. They have several advantages. One is that they provide a visual display of the relationship of one cause to another. This has proved to be an effective way to stimulate ideas during the initial search. Another benefit is that they provide a way to keep searching for root causes by asking why, what, where, who, and how. Yet another benefit is that they are graphical representations in which the cause-and-effect relationships are easily discernible.

One application of cause-effect graphs was undertaken to understand the inspection process. It discovered that (1) excessive size of materials to be inspected leads to a preparation rate that is too high, (2) a preparation rate that is too high contributes to an excessive rate of inspection, and (3) an excessive rate of inspection causes fewer defects to be found. This analysis using cause-effect graphing provided insights for optimizing the inspection process by limiting the size of the materials to be inspected and the preparation rate.

Proper preparation for construction of cause-effect diagrams is essential. Visibility is a key requirement. It is advisable to leave a good deal of space between the causes as they are listed, so there can be room for additional notation as the work continues.

Several stages of construction should be expected before a “finished” product is developed. This often consists of enlarging a smaller section of the cause-effect diagram by taking one significant cause and making it the “effect” to be analyzed on another cause-effect diagram.

Cause-effect graphing can also be applied to test case design, particularly function testing. It is used to systematically select a set of test cases that have a high probability of detecting program errors. The technique explores the inputs and combinations of input conditions of a program to develop test cases, but does not examine the internal behavior of the program. For each test case derived, the technique also identifies the expected output. The inputs and outputs are determined through analysis of the requirement specifications (see Section 6, “Modern Software Testing Tools,” for tools that automate the process).

The following is a brief overview of the methodology to convert requirements to test cases using cause-effect diagrams. It is followed by an example of how to apply the methodology.

Cause-Effect Methodology

  1. Identify all the requirements.

  2. Analyze the requirements, and identify all the causes and effects.

  3. Assign each cause and effect a unique number.

  4. Analyze the requirements, and translate them into a Boolean graph linking the causes and effects.

  5. Convert the graph into a decision table.

  6. Convert the columns in the decision table into test cases.

Example: A database management system requires that each file in the database have its name listed in a master index identifying the location of each file. The index is divided into ten sections. A small system is being developed that allows the user to interactively enter a command to display any section of the index at the terminal. Cause-effect graphing is used to develop a set of test cases for the system. The specification for this system is explained in the following paragraphs.

Specification

To display one of the ten possible index sections, a command must be entered consisting of a letter and a digit. The first character entered must be a D (for display) or an L (for list), and it must be in column 1. The second character entered must be a digit (0 through 9) in column 2. If a valid command is entered, the index section identified by the digit is displayed on the terminal. If the first character is incorrect, the error message “Invalid Command” is printed. If the second character is incorrect, the error message “Invalid Index Number” is printed.

The causes and effects are identified as follows.

Causes
  1. Character in column 1 is D.

  2. Character in column 1 is L.

  3. Character in column 2 is a digit.

Effects
  1. Index section is displayed.

  2. Error message “Invalid Command” is displayed.

  3. Error message “Invalid Index Number” is displayed.

A Boolean graph (see Exhibit G.1) is constructed through analysis of the specification. This is accomplished by (1) representing each cause and effect by a node labeled with its unique number; (2) listing all the cause nodes vertically on the left side of a sheet of paper and the effect nodes on the right side; (3) interconnecting the cause and effect nodes by analyzing the specification (each cause and effect can be in one of two states, true or false; using Boolean logic, set the possible states of the causes and determine under what conditions each effect is present); and (4) annotating the graph with constraints describing combinations of causes and effects that are impossible because of syntactic or environmental constraints.

Node 20 is an intermediate node representing the Boolean state of node 1 or node 2. The state of node 50 is true if the states of nodes 20 and 3 are both true. The state of node 20 is true if the state of node 1 or node 2 is true. The state of node 51 is true if the state of node 20 is not true. The state of node 52 is true if the state of node 3 is not true. Nodes 1 and 2 are also annotated with a constraint that states that causes 1 and 2 cannot be true simultaneously.

Exhibit G.2 shows Exhibit G.1 converted into a decision table. This is accomplished by (1) tracing back through the graph to find all combinations of causes that make the effect true for each effect, (2) representing each combination as a column in the decision table, and (3) determining the state of all other effects for each such combination. After completing this, each column in Exhibit G.2 represents a test case.

Images

Exhibit G.1   Cause-Effect Graph

Images

Exhibit G.2   Decision Table

For each test case, the bottom of Exhibit G.2 indicates which effect is present (indicated by a “1”). For each effect, all combinations of causes that result in the effect are represented by the entries in the columns of the table. Blanks in the table mean that the state of the cause is irrelevant.

The columns in the decision table are converted into the four test cases shown in the following table.

Test Case Number     Input     Expected Results
       1             D5        Index Section 5 is displayed
       2             L4        Index Section 4 is displayed
       3             B2        “Invalid Command”
       4             DA        “Invalid Index Number”

Cause-effect graphing can produce a useful set of test cases and can point out incompleteness and ambiguities in the requirement specification. It can be applied to generate test cases in any type of computing application when the specification is clearly stated and combinations of input conditions can be identified. Although manual application of this technique is tedious, long, and moderately complex, there are automated testing tools that will automatically help convert the requirements to a graph, decision table, and test cases. See Section 6, “Modern Software Testing Tools,” for more details.
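
The Boolean graph for the display-command example can also be expressed directly in code. The following Python sketch is illustrative only; the node numbers follow the description of Exhibit G.1 above, and the function reproduces the four test cases in the table.

def effects(command):
    # Causes, per the specification above.
    c1 = len(command) >= 1 and command[0] == "D"     # cause 1: column 1 is D
    c2 = len(command) >= 1 and command[0] == "L"     # cause 2: column 1 is L
    c3 = len(command) >= 2 and command[1].isdigit()  # cause 3: column 2 is a digit

    node_20 = c1 or c2              # intermediate node 20
    if node_20 and c3:              # effect node 50: index section displayed
        return f"Index Section {command[1]} is displayed"
    if not node_20:                 # effect node 51: invalid command letter
        return "Invalid Command"
    return "Invalid Index Number"   # effect node 52: valid letter, bad digit

for cmd in ("D5", "L4", "B2", "DA"):
    print(cmd, "->", effects(cmd))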

G8: Condition Coverage

Condition coverage is a white-box testing technique in which test cases are written to ensure that each condition in a decision takes on all possible outcomes at least once. With this technique it is not necessary to consider how the conditions combine into decision branches. Condition coverage guarantees that every condition within a decision is covered; however, it does not necessarily traverse the true and false outcomes of each decision.

One method of creating test cases using this technique is to build a truth table and write down all conditions and their complements. Duplicate test cases, if they exist, are then eliminated.

To illustrate the technique, consider the following small program, which reads records from a file and tallies the numerical ranges of a field on each record.

PROGRAM: FIELD-COUNT

Dowhile not EOF
       read record
       if FIELD_COUNTER > 7 then
              increment COUNTER_7 by 1
       else
              if FIELD_COUNTER > 3 then
                     increment COUNTER_3 by 1
              else
                     increment COUNTER_1 by 1
              endif
       endif
End_While
End

The initial test cases to satisfy condition coverage are as follows:

Test Case     Values (FIELD_COUNTER)
    1         > 7 (e.g., 8)
    2         <= 7 (e.g., 7)
    3         > 3 (e.g., 6)
    4         <= 3 (e.g., 3)

Notice that test cases 2 and 3 are redundant and one of them can be eliminated, resulting in three test cases.

G9: CRUD Testing

A CRUD matrix, or process/data matrix, which links the data and process models, is optionally developed during the analysis phase of application development. It helps ensure that the data and processes are discovered and assessed, identifies and resolves omissions and conflicts in the matrix, and helps refine the data and process models as necessary. It maps processes against entities, showing which processes create, read, update, or delete the instances of each entity.

The CRUD matrix in Exhibit G.3 is developed during analysis, before the physical system or GUI (physical screens, menus, etc.) has been designed and developed. As the GUI evolves, a CRUD test matrix can be built, as shown in Exhibit G.3; it is a testing technique that verifies the life cycle of all business objects. In Exhibit G.3, each CRUD cell object is tested. When an object does not have full life-cycle operations, a “—” can be placed in the cell.

A variation of this is to also make unit performance measurements for each operation during system fragment testing.
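
Because Exhibit G.3 is not reproduced here, the following Python sketch uses hypothetical screen and object names to show the idea: the CRUD test matrix is kept in a simple data structure, and a small check flags any business object whose full create, read, update, and delete life cycle is never exercised.

# Rows are GUI functions/screens, columns are business objects; cell values are
# the subset of C, R, U, D operations that the screen performs (hypothetical data).
crud_matrix = {
    "Order Entry Screen":   {"Order": "CRU", "Customer": "R"},
    "Order Inquiry Screen": {"Order": "R",   "Customer": "R"},
    "Customer Maintenance": {"Customer": "CRUD"},
}

def life_cycle_coverage(matrix):
    coverage = {}
    for operations_by_object in matrix.values():
        for obj, ops in operations_by_object.items():
            coverage.setdefault(obj, set()).update(ops)
    return coverage

for obj, ops in life_cycle_coverage(crud_matrix).items():
    missing = set("CRUD") - ops
    print(obj, "missing operations:", "".join(sorted(missing)) or "-")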

G10: Database Testing

The following subsections describe how to test databases. An overview of relational database concepts is also presented, which will serve as a reference for the tester.

Database Integrity Testing

Database integrity testing verifies the structure and format of the database; compliance with integrity constraints, business rules, and relationships; edit controls on updates that refresh the database; and database normalization (or denormalization, where performance constraints require it). There are at least six types of integrity tests that need to be performed to verify the integrity of the database.

Entity Integrity

Entity integrity states that each row must always have a primary key value. For example, if team ID is the primary key of the team table, no team can lack a team ID. This can be tested and verified with database integrity reports or queries.

Images

Exhibit G.3   CRUD Testing

Primary Key Integrity

The value of each primary key must be unique and valid. For example, two teams cannot have the same team ID, and a team ID of “ABC” is invalid when numeric values are required. Another rule is that the primary key must not contain a null value (be empty). This can be tested and verified with database integrity reports or queries.

Column Key Integrity

The values in a column have column-specific rules. For example, the values in a column for the number of members on a team must always be a positive number and not exceed 7. This can be tested and verified with database integrity reports or queries. It can also be verified with the following testing techniques: range testing, boundary value testing, field integrity testing, and positive and negative testing.
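
Assuming a relational database with a TEAM table (the table and column names are illustrative), the three integrity checks above can be expressed as simple queries that should return no rows when the rules hold. The sketch below uses Python's built-in sqlite3 module.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (team_id INTEGER, member_count INTEGER)")
conn.executemany("INSERT INTO team VALUES (?, ?)",
                 [(1, 5), (2, 9), (None, 3), (1, 4)])

integrity_queries = {
    # Entity integrity: every row must have a primary key value.
    "missing team_id":   "SELECT rowid FROM team WHERE team_id IS NULL",
    # Primary key integrity: team_id values must be unique.
    "duplicate team_id": "SELECT team_id FROM team WHERE team_id IS NOT NULL "
                         "GROUP BY team_id HAVING COUNT(*) > 1",
    # Column key integrity: member_count must be positive and not exceed 7.
    "bad member_count":  "SELECT rowid FROM team "
                         "WHERE member_count < 1 OR member_count > 7",
}

for name, sql in integrity_queries.items():
    violations = conn.execute(sql).fetchall()
    print(name, "->", violations or "OK")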

Domain Integrity

A domain is an object that is a set of data and characteristics that describe those values. For example, “date” could be defined as a basic data type that has a field length, format, and validation rules. Columns can be defined on the basis of domains; in this case a column might be defined as an order date. This can be tested and verified with database queries. It can also be verified with the following testing techniques: range testing, boundary value testing, field integrity testing, and positive and negative testing.

User-Defined Integrity

User-defined integrity checks are specialized validation rules that go beyond the standard row and column checks. User-defined rules for particular data items often must be written manually, using a procedural language.

Another option instead of writing procedures is the use of assertions, if available. Assertions are stand-alone validation checks that are not linked to a particular row or column, but that are automatically applied.

Referential Integrity

The primary key is a candidate key that uniquely identifies a particular entity. With a table of teams, the primary key could be the team number. A foreign key is a key that refers to a primary key in another entity, as a cross-reference. For example, part of the key to a member name (from a member entity) may be a team ID, which is the primary key to the team entity.

A table has business rules that govern the relationships among entities. For example, a member must be related to a team, and only one team. A team, on the other hand, may at any given time have no members, only one member, or many members. This is referred to as the cardinality of the entity relationship. Any member “floating around” in the system without being associated with a team is an invalid record. A record such as this is referred to as an orphan.

As an example, assume that a team can have zero, one, or more members, but a member cannot exist without a team. The test cases shown in Exhibit G.4 should be created to verify referential integrity.
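
A minimal sketch of a referential integrity test, again using sqlite3 and illustrative table names: a MEMBER row must reference an existing TEAM row, and an orphan query should return no rows.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enforce foreign keys in SQLite
conn.execute("CREATE TABLE team (team_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE member ("
             "member_id INTEGER PRIMARY KEY, "
             "team_id INTEGER NOT NULL REFERENCES team(team_id))")

conn.execute("INSERT INTO team VALUES (1)")
conn.execute("INSERT INTO member VALUES (10, 1)")       # valid: team 1 exists

try:
    conn.execute("INSERT INTO member VALUES (11, 99)")  # invalid: team 99 does not exist
except sqlite3.IntegrityError as exc:
    print("rejected as expected:", exc)

# Orphan query: members whose team does not exist (should return no rows).
orphans = conn.execute(
    "SELECT m.member_id FROM member m "
    "LEFT JOIN team t ON m.team_id = t.team_id WHERE t.team_id IS NULL").fetchall()
print("orphans:", orphans)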

Other database testing approaches include the following:

  1. Control testing — Includes a variety of control issues that need to be tested.

    1. –   Security testing — Protects the database from unauthorized access.

    2. –   Backup testing — Verifies the ability to back up the system.

    3. –   Recovery testing — Verifies the restoration of a database to a state known to be correct after a failure has rendered it unreliable.

      Images

      Exhibit G.4   Referential Integrity Test Cases

    4. –   Concurrency testing — Ensures that parallel processes such as queries and updates do not interfere with each other.

    5. –   Deadlock control — Ensures that two concurrent processes do not form a “gridlock” and mutually exclude each other from adequate completion.

  2. Data content verification — Provides periodic audits and comparisons with known reference sources.

  3. Refresh verification — Verifies external systems that refresh the database and data conversions.

  4. Data usage — Includes verifying database editing and updating. Often, the developer defines too few, or too many, characters for the columns of an entity. The tester should compare the number of characters allowed in each GUI field to the corresponding entity field lengths to verify that they match. Tip: One way to make sure the database column lengths are large enough is to copy a very large block of text to the Windows clipboard and paste it into each GUI field. Some of the testing techniques that can be employed to generate data include range testing, boundary value testing, field integrity testing, and positive and negative testing. Most databases have query facilities that enable the tester to verify that the data is updated and edited correctly in the database.

  5. Stored procedures — These are stored in the database and invoked when specific triggers or calls from the application occur.

Data Modeling Essentials

The purpose of this section is to familiarize the reader with data modeling concepts and terminology involved in performing database and GUI field testing against a relational design (see G10, “Database Testing,” “Database Integrity Testing”; G26, “Range Testing”; G22, “Positive and Negative Testing”; G36, “Table Testing”; and G20, “Orthogonal Array Testing”). It will also serve as a useful reference to relational database design in the context of testing.

What Is a Model?

A model is a simplified description of a real-world system that assists the user in making calculations and predictions. Only those aspects of the system that are of interest to the user are included in the model; all others are omitted.

Three different materials are used in creating models:

  1. Metal

  2. Wood

  3. Clay

The most appropriate material is used for the model, even though it may differ from the material used for the system being modeled. The written specifications of a system may be used by themselves as a model of the real world.

A model may be considered to have two features:

  1. Shape or structure

  2. Content

The structure of the model reflects the invariant aspects of the system, and the content reflects the dynamic aspects. For example, the structure of a predictive meteorological model consists of formulas, whereas the content consists of data (temperature, humidity, wind speed, and atmospheric pressure) gathered from many points over a period of time.

Why Do We Create Models?

We must be able to measure real-world systems to be able to understand them, use them effectively, monitor their performance, and predict their future performance. Often, it is impossible to measure the actual system. It may be too expensive or too dangerous. Before an aircraft manufacturer sends a pilot up in a new plane, there must be some assurance that the plane will fly. An automobile manufacturer wants to know what a car will look like before tooling up an assembly line to make it.

We have a requirement to understand, measure, and control a real-world system: the user’s business. The easiest way to make timely, cost-effective measurements and predictions about the business system is to create a model of the business. Data is the most appropriate material for our model; hence the name “data model.” The structure of our data model should represent the aspects of the user’s business that change very little over time. The content of the model (the values stored in the model) represents the information that changes with time. The result is a data model whose structure is stable and, therefore, easily maintained.

Applications that we create will be responsible for adding, changing, and deleting the content of the model and for reporting on the content.

The use of this technique results in the following benefits:

  1. ■ The relatively stable nature of the data model will allow us to be more responsive to changing business needs. Business changes usually result in changes in how the content is maintained and reported. Changes to the structure of the model occur less frequently and are usually minor.

  2. ■ The technique we will use will create a data model that is independent of both current business processes (but not business policy) and current data processing technology.

  3. ■ An additional benefit of this technique is that it can be used when current process-oriented techniques do not work. For example, there are no clearly identifiable processes involved in a management information application. The users cannot specify exactly how data will be used. By creating a data model whose structure reflects the structure of the business, we can support any reasonable inquiry against the data.

  4. ■ Data analysis starts with the development of the data model.

Tables: A Definition

A table is a list of facts, numbers, and the like, systematically arranged in columns.

Tables are used whenever we need to order information for storage or presentation. They are relatively easy to create and maintain, and present information in a clear, unambiguous, simple format. Examples of tables that we may encounter are the following:

  1. ■ Table of Contents

  2. ■ Metric Conversion Table

  3. ■ Table of Weights and Measures

  4. ■ Tax Table

Exhibit G.5 illustrates the features of a table.

Table Names

A table is identified by its name. Therefore, its name must be unique within the scope of the business.

Images

Exhibit G.5   Sample Table

Columns

A table is divided vertically into columns. All entries in a given column are of the same type and have the same format. A column contains a single piece of data about all rows in the table. Each column must have a name unique within the table. The combination of table name and column name is unique within the business. Examples might be CUSTOMER.NAME, CUSTOMER.NUMBER, EMPLOYEE.NAME, and EMPLOYEE.NUMBER.

Rows

A table is divided horizontally into rows. Each row must be uniquely identifiable. Each row has the same number of cells, and each cell contains a piece of data whose type and format are determined by its column.

Order

The order of rows and columns in a table is arbitrary. That is, the order in which rows and columns are presented does not affect the meaning of the data. In fact, each user of a table may have unique requirements for ordering rows and columns. For this reason, there must be no special significance attached to the ordering of rows and columns.

From the foregoing definition, it is clear that tables are useful for documenting data requirements. They can be easily understood by both development and user personnel.

We define a table to represent each object in our model. The table columns provide descriptive information about the object, and the rows provide examples of occurrences of the object.

Entities: A Definition

An entity is a uniquely identifiable person, place, thing, or event of interest to the user, about which the application is to maintain and report data.

When we create a data model, we must first decide which real-world objects are to be included. We include only those objects that are of interest to the users. Furthermore, we include only those objects required by computer applications.

We organize the objects (entities) to be included into groups called entity types. For example, a clothing store might identify customers, products sold, and suppliers of those products as objects to be included in a data model. This grouping, however, is not adequate for a useful model of the real world. Depending on the type of clothing sold by the store, the user may wish to group products by style, type, size, color, and so on. The identification of objects is made difficult by the fuzzy definitions used in the real world. In our model, we must be specific; therefore, we will define as entity types only groups of objects in which each occurrence can be uniquely identified.

Each entity type is given a unique name. Examples are CUSTOMER, SUPPLIER, and EMPLOYEE.

Identification: Primary Key

Every entity must have a primary key.

To allow us to uniquely identify each occurrence of an entity type, we must define a key called the primary key. Its value may be assigned by the user or by the application. There may be more than one choice for the primary key. For the entity type EMPLOYEE, we might choose SOCIAL INSURANCE NUMBER or invent an EMPLOYEE NUMBER. The major requirement is that each value be unique. It is also important that the primary key be one by which the user would naturally identify an occurrence of the entity. You should also choose a key that is not likely to change. It should be as short as possible. This is why serial numbers are popular keys; they are assigned once, they do not change, and they are unique. (Be careful. In the real world, duplicate serial numbers may be inadvertently assigned.)

Note: A key is not an access path. It is only a unique identifier.

Compound Primary Keys

A primary key may be composed of more than one column. For example, an automobile can be uniquely identified only by the combination MAKE + MODEL + VEHICLE IDENTIFICATION NUMBER. A key composed of more than one column is a compound key.

Null Values

In any descriptive information about an entity, it is possible to have a situation where a piece of data for a particular occurrence is not known. For example, when an employee description is added to a personnel application for the first time, the employee’s department number or phone number might not be known. The correct value is not zero or blank; it is unknown. We refer to an unknown value as a null value. We might use blanks or zero or some special indicator to reflect this in a computer application. However, because null means unknown, you cannot compare null values (e.g., for equal). You also cannot use them in numeric computations, because the result would also be unknown. In our data model, we indicate which columns may contain null values.

We bring this point up here because of the following rule:

  1. A primary key may not be null.

It is important to remember this. A null value means we do not know what the correct value is, but primary key values must be known to uniquely identify each occurrence of the entity type to which they refer. In a compound key, it is possible for the key to contain null values in some, but not all, columns.

Identifying Entities

Consider the following list:

  1. ■ Which is an entity type?

  2. ■ Which is an entity occurrence?

  3. ■ Which is neither?

  4. ■ What would be a suitable key?

    1. –   Automobile

    2. –   Ford

    3. –   Superman

    4. –   Nietzsche

    5. –   Telephone

    6. –   Telephone number

    7. –   House

    8. –   Postal code

    9. –   Aquamarine

    10. –   Seven

    11. –   Marriage

One thing you will discover when trying to identify the entity types and occurrences in the above list is that the user context is important. Consider Automobile. If the user is an automobile dealer, then automobile could be an entity type. However, if the user is attempting to keep track of types of transportation, automobile could be an entity occurrence. Ford might be a make of automobile, a U.S. president, or a way to cross a river.

Telephone number is often treated as if it were an entity type. You might instead think of it as the key that identifies a telephone. It cannot identify a specific physical phone, however, because you can replace the phone with a new one without changing the telephone number. It does not identify a specific telephone line, because you can often take the phone number with you when you move to a new location. In fact, the telephone number really identifies a telephone company account.

Aquamarine might be an entity occurrence. What would be the entity type? If your user is a jeweler, the entity type might be Precious Stone; if a paint manufacturer, Color.

Entity Classes

Entities may be grouped for convenience into various classes. Consider the following:

  1. Major entity: An entity that can exist without reference to other entities (e.g., CUSTOMER, ORDER). These entity types are typically identified early in the data analysis process. In most cases, the primary key of a major entity will consist of a single column.

  2. Dependent entity: An entity that depends on and further defines another entity (e.g., ORDER LINE ITEM). These entity types will often be identified during the process of defining relationships or normalizing and refining the model. The primary key of a dependent entity is always a compound key. These topics are covered later.

  3. Minor entity: An entity that is used primarily to define valid values within the model (e.g., EMPLOYEE TYPE, CREDIT CODE). These may be ignored in some cases (e.g., if the only valid values are Y and N). The primary key of a minor entity is almost always a single column.

Relationships: A Definition

Each entity in a data model does not exist in solitary splendor. Entities are linked by relationships. A relationship is an association between two or more entities, of interest to the user, about which the application is to maintain and report data.

This is similar to the definition of an entity, and we show that a relationship can be considered a special type of entity.

Relationship Types

There are three types of relationships:

  1. One-to-one

  2. One-to-many

  3. Many-to-many

We now examine each type and see how we document them.

Images

Exhibit G.6   Employee Table

One-to-One

One-to-one relationships are the simplest and, unfortunately, the least common.

A one-to-one relationship links a single occurrence of an entity to zero or one occurrence of an entity. The related entity occurrences are usually of different types, but there is no rule prohibiting them from being of the same type. When the related entities are of the same type, the relationship is called a recursive relationship.

Let us consider a hypothetical example. An enlightened company, which shall remain nameless, has determined that employees work best when they are not forced to share desks or workstations. As a result, each desk is assigned to only one employee and each employee is assigned to one desk.

We document this happy relationship by placing the primary key of either entity into the description of the other entity as a foreign key.

Either Exhibit G.6 or Exhibit G.7 can be used to illustrate the relationship. Consider Exhibit G.6 first, the EMPLOYEE table.

Images

Exhibit G.7   Desk Table

The PK in the column headed EMPLOYEE NUMBER indicates that this is the primary key. The FK in the column headed DESK NUMBER indicates that this is a foreign key (i.e., it is a primary key in some other table). The ND in this column enforces the one-to-one relationship by indicating that there can be no duplicate values (the same desk cannot be assigned to two different employees).

Exhibit G.7 illustrates the same relationship. The ND indicates that an employee may not be assigned to two different desks. Note, however, that there is an NL indication in the EMPLOYEE NUMBER column in this table. This indicates that a desk may be unassigned.

Although the relationship may be documented either way, there are some guidelines:

  1. ■ Do not document the relationship both ways. Choose one.

  2. ■ Choose the way that reduces or eliminates the need to record nulls. Note that this typically means placing the foreign key in the entity with the fewest occurrences.

On the basis of the aforementioned guidelines, the relationship in our example is best represented, as in Exhibit G.6, by recording the desk number as a foreign key of the employee (although Exhibit G.7 is not wrong).

One-to-Many

One-to-many relationships are the most common, and the documentation technique is straightforward. A one-to-many relationship links one occurrence of an entity type to zero or more occurrences of an entity type.

As an example, let us look again at the company described earlier. When it comes to the assignment of telephones to employees, the company is not so enlightened. Each employee must share a single telephone number and line with other employees. Exhibits G.8 and G.9 illustrate the relationship between telephone numbers and employees.

Images

Exhibit G.8   Employee Table

Images

Exhibit G.9   Telephone Line Table

The documentation of this relationship appears to be the same as for a one-to-one relationship. However, there is only one way to represent a one-to-many relationship. We record the one in the many. In Exhibits G.8 and G.9, we record the telephone number as a foreign key of the EMPLOYEE. To record the relationship the other way would require an array of employee numbers of indeterminate size for each telephone number. There is another important difference. We did not place ND (no duplicates) in the foreign key column. This is because duplicates are allowed; the same telephone number can be assigned to more than one employee.

So the rule here is easy to remember. There is only one correct way:

  1. Record the one in the many.

Many-to-Many

Many-to-many relationships are the most difficult to handle. They also occur frequently enough to make data analysis interesting. A many-to-many relationship links many occurrences of an entity type to many occurrences of an entity type. For an example of this type of relationship, let us again examine the nameless company.

Management believes that the more people are assigned to a given project, the sooner it will be completed. Also, because they become nervous at the sight of idle employees, they give each employee several assignments to work on simultaneously.

We cannot document a many-to-many relationship directly, so we create a new entity (see Exhibits G.10, G.11, and G.12) and link it to each of the entities involved, by a one-to-many relationship (we already know how to do that).

The EMPLOYEE/PROJECT entity has been created to support the relationship between EMPLOYEE and PROJECT. It has a primary key consisting of the primary keys of the entity types it is relating. They are identified as foreign keys. This is an example of a compound key. Any entity may have a compound key that may be completely or partly made up of foreign keys from other entities. This commonly occurs with dependent entities. The EMPLOYEE/PROJECT entity we have created is dependent on EMPLOYEE and PROJECT; it would not exist but for the relationship between them.

Images

Exhibit G.10   Employee Table

Images

Exhibit G.11   Project Table

Images

Exhibit G.12   Employee/Project Table

Note that the foreign keys that make up the primary key in this entity support one-to-many relationships between EMPLOYEE and EMPLOYEE/PROJECT and between PROJECT and EMPLOYEE/PROJECT. We must now demonstrate that this is equivalent to a many-to-many relationship between EMPLOYEE and PROJECT. An example will best illustrate the approach.

Given two employees and two projects, as in Exhibits G.13 and G.14, we can show that both employees work on both projects by creating occurrences of the EMPLOYEE/PROJECT entity, as in Exhibit G.15.

Images

Exhibit G.13   Employee Table

Images

Exhibit G.14   Project Table

Images

Exhibit G.15   Employee/Project Table

We can see that EMPLOYEE 11111 is related to two EMPLOYEE/PROJECT occurrences (11111ABCD and 11111WXYZ). Each of these EMPLOYEE/PROJECT entities is in turn related to one PROJECT entity. The result is that each EMPLOYEE occurrence may be related to many PROJECT occurrences through the EMPLOYEE/PROJECT entity. By the same technique, each PROJECT occurrence may be related to many EMPLOYEE occurrences.
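
In SQL terms, the EMPLOYEE/PROJECT entity becomes a junction table whose compound primary key is made up of the two foreign keys. The following sketch is illustrative only (the DDL and the sample names such as Smith and Payroll are assumptions), executed here through Python's sqlite3 module.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (employee_number TEXT PRIMARY KEY, name TEXT);
CREATE TABLE project  (project_number  TEXT PRIMARY KEY, title TEXT);

-- Junction entity supporting the many-to-many relationship:
-- its compound primary key consists of the two foreign keys.
CREATE TABLE employee_project (
    employee_number TEXT REFERENCES employee(employee_number),
    project_number  TEXT REFERENCES project(project_number),
    PRIMARY KEY (employee_number, project_number)
);

INSERT INTO employee VALUES ('11111', 'Smith'), ('22222', 'Jones');
INSERT INTO project  VALUES ('ABCD', 'Payroll'), ('WXYZ', 'Billing');
INSERT INTO employee_project VALUES
    ('11111', 'ABCD'), ('11111', 'WXYZ'),
    ('22222', 'ABCD'), ('22222', 'WXYZ');
""")

# Each employee is related to many projects, and each project to many employees.
print(conn.execute(
    "SELECT project_number FROM employee_project "
    "WHERE employee_number = '11111'").fetchall())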

Multiple Relationships

There will sometimes be more than one type of relationship between occurrences of the same entity types. When you encounter this situation, identify and document each relationship independently of any others. For instance, in the last example, there might have been a requirement to record the project leader of each project independently of any other employees assigned to the project. This relationship might have been a one-to-many relationship with PROJECT LEADER EMPLOYEE NUMBER a foreign key in the PROJECT table.

Entities versus Relationships

The distinction between entities and relationships is not always clear. Consider the following example.

A customer buys an automobile from a dealer. The sale is negotiated by a salesperson employed by the dealer. The customer may have purchased automobiles from this dealer before, but may have dealt with a different salesperson.

Is the purchase a relationship between customer and salesperson? Is it an entity that is related to customer, salesperson, and automobile? How to treat such a real-world situation is often an arbitrary decision. There is no formal rule that can be used as a guide. Fortunately, the technique we use to document entities and relationships can reduce or eliminate the problem.

If we consider a purchase agreement to be an entity, we select a primary key, such as AGREEMENT NUMBER, and define relationships to other entities. There is a one-to-many relationship between SALESPERSON and PURCHASE AGREEMENT and, if we have satisfied customers, between CUSTOMER and PURCHASE AGREEMENT. We document these relationships in Exhibit G.16 by placing CUSTOMER NUMBER and EMPLOYEE NUMBER as foreign keys in PURCHASE AGREEMENT.

If we do not consider the purchase agreement to be an entity, we must document the relationship between CUSTOMER and SALESPERSON (see Exhibit G.17). Because, in the general case, there is a many-to-many relationship between customers and salespeople, we must create a new entity, CUSTOMER/SALESPERSON, with a compound key of CUSTOMER NUMBER + EMPLOYEE NUMBER. We will probably have to add VEHICLE MAKE and IDENTIFICATION NUMBER to the primary key to ensure uniqueness.

To change this relationship to an entity, we need only rename it and change the primary key. The columns already in the table will probably still be required.

Images

Exhibit G.16   Purchase Agreement Table

Images

Exhibit G.17   Customer/Salesperson Table

Attributes: A Definition

An attribute is a characteristic quality of an entity or relationship, of interest to the user, about which the application is to maintain and report data.

Attributes are the data elements or fields that describe entities and relationships. An attribute is represented by a column in a table.

  1. ■ Primary keys are attributes or sets of attributes that uniquely identify entities.

  2. ■ Foreign keys are attributes that define relationships between entities.

  3. ■ Nonkey attributes provide additional information about entities (e.g., EMPLOYEE NAME) and relationships (e.g., QUANTITY ORDERED on an order line).

The information in this section applies to all types of attributes. All attributes base their values on domains.

Domain

A domain is a set of possible values of an attribute.

To determine which values are valid for a given attribute, we need to know the rules for assigning values. The set of values that may be assigned to a given attribute is the domain of that attribute.

All attributes of the same type must come from the same domain. For example, the following attributes could describe different entities or relationships:

  1. ■ Department Number

  2. ■ Sales Branch Number

  3. ■ Service Branch Number

They are all based on the domain of possible department numbers. The domain is not a list of the assigned department numbers but a set of the possible department numbers from which values may be selected.

The definition of domains is somewhat arbitrary, and there may be a temptation to create general domains that allow too much freedom. Consider CUSTOMER NUMBER and DEPARTMENT NUMBER. If these attributes are both defined on the basis of a domain of any numbers, we could end up with the following:

Customer  –12345

Department 12.34

By restricting the domain to positive integers, we can avoid negative numbers and decimal fractions. However, with a definition that is this general, we can still combine customers and departments. For example, someone might decide that, whenever an internal order is processed, the CUSTOMER NUMBER field on the order will be set to the ordering department’s DEPARTMENT NUMBER. To satisfy processing requirements, we would have to place department numbers in the CUSTOMER table, because all valid customers appear there. Now, whenever we reorganize the business, we must update the customer data.

The safest approach in our example is to define the domains of CUSTOMER NUMBERS and DEPARTMENT NUMBERS separately.

Note: Be careful when defining the domain of fields such as customer number, employee number, or part number. It is natural to think of such fields as numeric. It may even be true that, currently, all assigned values are numeric. Alphabetic characters, however, have a nasty habit of showing up in these identifiers sooner or later.

Domain Names

Each domain should have a name that is unique within the organization. The name and the rules for defining values within the domain should be documented. A single column primary key based on a domain will usually have the same name as the domain (e.g., customer number). If a key (primary or foreign) is compound, each column will usually have the same name as the domain. Where the same domain is referenced more than once by attributes of an entity (e.g., date born, date hired for an employee), the domain name should be part of the attribute column name.

Domains, and the attributes based on them, must be nondecomposable.

This statement does not mean that attributes should not decay or fall apart from old age. As an example of a decomposable domain and attribute, consider the following.

Whenever an order is recorded, it is assigned an order number. The order number is created according to the following rules:

  1. The customer number makes up the first (high order) part of the order number.

  2. The order entry date, in the form YYMMDD, is the next part of the order number.

  3. The last two positions of the order number hold a sequence number to ensure uniqueness if a customer submits several orders in one day.

Because we use the term order number, there is a temptation to treat this as a single column. Resist the temptation. The primary key in this example is a compound key made up of customer number, order entry date, and a sequence number. Each attribute making up the compound key is based on a different domain. It is now possible to document the fact that there is an attribute in the order that is based on the domain of customer numbers. Any changes in the rules for that domain can be checked for their impact on the order entity.

Having said that all domains must be nondecomposable, we now state two exceptions:

  1. Date

  2. Time

Date is usually in the form month/day/year. There is usually no need to record this as three separate attributes. Similarly, time may be left as hours/minutes.

Attributes versus Relationships

Just as there is a somewhat arbitrary choice between entities and relationships, there is a similar choice between attributes and relationships. You could consider an attribute as a foreign key from a table of valid values for the attribute. If, for example, you were required to record eye color as an attribute of EMPLOYEE, you might set up an entity called COLOR with a primary key of COLOR NAME. You could then create EYE COLOR as a foreign key in EMPLOYEE. This would probably not provide much advantage over a simple attribute. You might even get into trouble. If you chose to add HAIR COLOR as a foreign key related to the same primary key, you could end up with an employee with blue hair and red eyes.

Although this example may seem trivial, real-world choices are often more subtle. You might choose a foreign key over a simple attribute if you wished to have a table for edit checking or if you needed a long description on reports. The description could be an attribute in the table in which the foreign key was a primary key.

Normalization: What Is It?

Normalization is the process of refining an initial set of entities into an optimum model. The purpose is to eliminate data redundancy and to ensure that the data structures are as flexible, understandable, and maintainable as possible.

Normalization is achieved by ensuring that an entity contains only those attributes that depend on the key of the entity. By “depend on,” we mean that each value of the key determines only one value for each attribute of the entity. If the concept is unclear at this point, do not be discouraged; it is explained later in this section. Said another way, normalization means ensuring that:

Each attribute depends on

  1. The Key, The Whole Key, and Nothing But the Key.

Problems with Unnormalized Entities

Exhibit G.18 illustrates the problems that will occur in attempting to maintain an unnormalized entity. The example in Exhibit G.18 is unnormalized because the department name is dependent on the department number, not on the employee number, which is the key of the entity. Consider the effects of the design on the application.

  1. Modification anomaly: Suppose a corporate reorganization makes it necessary to change the name of department 354 to Advertising and Promotion. A special-purpose program will be required to modify this information accurately and completely everywhere that it appears in the database.

  2. Insertion anomaly: A new employee is hired for department 220. The clerk maintaining the data may not have all the relevant information. Either he will have to scan the data looking for existing names for department 220 or, probably, he will guess and assign our new employee to department 220, with DEPTNAME SHALLOW THOUGHT. What is the correct name of the department now?

  3. Deletion anomaly: Employee number 00215 has retired. Her replacement starts work next week. However, by deleting the entry for employee 00215, we have lost the information that tells us that the publishing department exists.

  4. Redundancy: It is possible to reduce the impact of these anomalies by designing programs that take their existence into account. Typically, this results in code that is more complex than it needs to be, and in additional code to resolve inconsistencies. These increase the cost of development and maintenance without eliminating the problems. In addition, the duplication of data will increase file or database sizes and will result in increased operating costs for the application.

Images

Exhibit G.18   Employee Table

Images

Exhibit G.19   Customer Table

All of the foregoing problems are collectively known as the update anomaly.

Steps in Normalization

We explain normalization by discussing a series of examples that illustrate the three basic steps to be followed in reducing unnormalized data to third normal form.

First Normal Form (1NF)

Each attribute depends on the key.

Each attribute can only have a single value for each value of a key. The first step in normalization is to remove attributes that can have multiple values for a given value of the key and form them into a new entity.

For example, consider the following entity (CUSTOMER) whose key attribute is CUSTNO (Customer Number) shown in Exhibit G.19.

In this case, the multivalued attribute consists of the three “attributes” ADDR_LINE_1, ADDR_LINE_2, ADDR_LINE_3. In fact, these are really three elements of an array. The first normal form of this entity is shown in Exhibits G.20 and G.21.

We have created a new entity (CUSTOMER ADDRESS), with a compound key of customer number and line number (to identify each line of a customer’s address). This new entity is dependent on the CUSTOMER entity and allows an address to have a variable number of lines (0 to 99).

Exhibit G.20   Customer Table

Exhibit G.21   Customer Address Table

Multivalued attributes can usually be identified because they are recorded as arrays (ADDR(1), ADDR(2)), including arrays of structures, where each element of the array is, in fact, a different value of the attribute. In some cases, as in the previous example, the fact that an attribute is multivalued has been disguised by the use of unique column names. The giveaway is in the similarity of names. Additional examples of giveaways are names such as:

  1. ■ CURRENT_SALESMAN, PREVIOUS_SALESMAN and

  2. ■ FIRST_BRANCH_OFFICE, SECOND_BRANCH_OFFICE,...

Exhibit G.22   Product Model Table

Second Normal Form (2NF)

Each attribute depends on the whole key.

The second step in normalization is to remove attributes that depend on only a part of the key and form them into a new entity.

Let us examine Exhibit G.22, in which the entity (PRODUCT MODEL) is an entity consisting of all the products and their models. The key is PRODNO + MODNO. Let us assume that each product has a single SOURCE OF SUPPLY and that it is necessary to know the QTY ON HAND of each model.

PRODDESCRIPT and SOURCE_OF_SUPPLY in Exhibit G.23 depend only on PRODNO and are removed to form a new PRODUCT entity in Exhibit G.24.

The old PRODUCT MODEL entity is now dependent on the new PRODUCT entity. New models can be added without maintaining product descriptions and source of supply information. Models can be deleted while still retaining information about the product itself.

Exhibit G.23   Product Table

Exhibit G.24   Product Model Table

Dependence of attributes on part of a key is particularly evident in cases where a compound key identifies occurrences of an entity type.

What would be the effect on the above entities if a product could have multiple sources of supply?

Third Normal Form (3NF)

Each attribute depends on nothing but the key.

The third step in normalization is to remove attributes that depend on other nonkey attributes of the entity.

Exhibit G.25   Order Table

At this point it should be noted that a nonkey attribute is an attribute that is neither the primary key nor a candidate key. A candidate key is an attribute other than the primary key that also uniquely identifies each occurrence of an entity. (For example, a personnel file is keyed on employee serial number and also contains a social security number, either of which uniquely identifies the employee. The employee serial number might function as the primary key, and the social security number would be a candidate key.)

Consider the entity ORDER in Exhibit G.25, each occurrence of which represents an order for a product. As a given, assume that the UNIT PRICE varies from machine to machine and contract to contract.

Here we see a number of attributes that are not dependent on the key. UNIT PRICE is dependent on PRODNO, MODNO, and CONTRACT TYPE, and EXTENDED PRICE is dependent on both QTY and UNIT PRICE.

Reduction to third normal form requires us to create a new entity, PRODUCT/MODEL/CONTRACT, whose key is PRODNO + MODNO + CONTRACT TYPE, with UNIT PRICE an attribute of the entity. EXTENDED PRICE is calculated from the values of the other attributes and can be dropped from the table and computed as required. This is known as a derived attribute.

The third normal form should look similar to those displayed in Exhibits G.26 and G.27.

In this form, prices and quantities may be changed. Data from both entities is joined together to calculate an EXTENDED PRICE. What changes to the model might be required to protect the customer against price changes? What would be the effect on the application if it were decided to maintain the EXTENDED PRICE as an attribute of the ORDER entity?
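
As an illustration, the following Python sketch shows EXTENDED PRICE being derived on demand by joining an ORDER row to its PRODUCT/MODEL/CONTRACT row rather than being stored. The entity and attribute names follow the exhibits; the data values and the helper function are assumptions made for the example.

# PRODUCT/MODEL/CONTRACT: key is (PRODNO, MODNO, CONTRACT_TYPE);
# UNIT_PRICE is the only non-key attribute.
product_model_contract = {
    ("P100", "M01", "RETAIL"): {"UNIT_PRICE": 25.00},
    ("P100", "M01", "GOVT"):   {"UNIT_PRICE": 22.50},
}

# ORDER: EXTENDED_PRICE is not stored; it is derived when needed.
orders = [
    {"ORDERNO": 1, "PRODNO": "P100", "MODNO": "M01",
     "CONTRACT_TYPE": "GOVT", "QTY": 10},
]

def extended_price(order):
    """Derive EXTENDED_PRICE = QTY * UNIT_PRICE via the join key."""
    key = (order["PRODNO"], order["MODNO"], order["CONTRACT_TYPE"])
    return order["QTY"] * product_model_contract[key]["UNIT_PRICE"]

for order in orders:
    print(order["ORDERNO"], extended_price(order))   # 1 225.0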

Exhibit G.26   Order Table

Exhibit G.27   Product/Model/Contract Table

Model Refinement

This section discusses additional refinements that can be (and, in a real situation, usually must be) incorporated into a data model.

What is important about these refinements is that they introduce constraints in the model, which must be documented in the design and incorporated into the application.

Entity Subtypes

Frequently it is necessary to decompose (break down) a defined entity into subtypes.

A Definition

Entity subtypes:

  1. ■ Have attributes peculiar to the subtype

  2. ■ Participate in relationships peculiar to the subtype

  3. ■ Are identified by a subset of the key of the entity

An entity subtype is not the same as a dependent entity. A dependent entity is identified by a compound key, consisting of the key of the major entity plus additional qualifying attributes.

This need not be so in the case of an entity subtype, which has the same key as the major entity and is, in fact, merely a subclassification of that entity.

For example, all of us are employees and hence are occurrences of the EMPLOYEE entity. Some employees, however, are also marketing reps with attributes (marketing unit, team, territory, quota, etc.) that are unique to their occupation. The MARKETING REP entity is a subtype of the EMPLOYEE entity.

Additional subtypes of the EMPLOYEE entity might be:

  1. ■ Employee as manager

  2. ■ Employee as stockholder

  3. ■ Employee as beneficiary

The existence of entity subtypes raises issues of referential integrity, which we discuss in the next section.

Referential Integrity

Integrity Rule:

  1. For any foreign key value in a table, there must be a corresponding primary key value in the same or another table.

As stated, the rule is very simple. To enforce this rule may require a great deal of complicated application code, preceded (of course) by a significant design effort. Fortunately, some database management systems have built-in features that make the provision of referential integrity much simpler (e.g., the logical insert, replace and delete rules in IMS/VS).

Exhibits G.28, G.29, and G.30 illustrate the problem by means of three entities (customer, product, and order).

Exhibit G.28   Customer Table

Exhibit G.29   Product Table

Exhibit G.30   Order Table

In practical terms, adherence to the referential integrity rule means the following:

  1. ■ A customer can be inserted without integrity checks.

  2. ■ A product can be inserted without integrity checks.

  3. ■ An order can be inserted, but the customer number foreign key must exist in the CUSTOMER entity, and the product code foreign key must exist in the PRODUCT entity.

  4. ■ A customer may not be deleted if its primary key exists in the order entity as a foreign key.

  5. ■ A product may not be deleted if its primary key exists in the order entity as a foreign key.

  6. ■ An order can be updated, but the customer number foreign key must exist in the CUSTOMER entity and the product code foreign key must exist in the PRODUCT entity if the values of those attributes are being altered.

Sometimes, adherence to the integrity rules can be more complicated. For example, we might want to permit the creation of a CUSTOMER at the time the order is entered, in which case the application must be coded to enforce a modified rule:

  1. An order can be inserted, but the customer number foreign key must exist in the CUSTOMER entity or must be inserted along with its attributes during order insertion. The product code foreign key must exist in the PRODUCT entity.

If these restrictions seem unduly harsh, ask yourself if you would want a salesman to enter orders for customers and products that do not exist.
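
Where the DBMS does not enforce these rules automatically, the application must. The following is a minimal Python sketch of rules 3 and 4 above; the table structures and function names are hypothetical and stand in for whatever data access layer the application actually uses.

customers = {"C001": {"NAME": "Acme Ltd."}}       # primary key -> row
products  = {"P100": {"DESCRIPTION": "Widget"}}

orders = []

def insert_order(orderno, custno, prodcode, qty):
    """Enforce referential integrity before inserting an ORDER row."""
    if custno not in customers:
        raise ValueError(f"customer {custno} does not exist")
    if prodcode not in products:
        raise ValueError(f"product {prodcode} does not exist")
    orders.append({"ORDERNO": orderno, "CUSTNO": custno,
                   "PRODCODE": prodcode, "QTY": qty})

def delete_customer(custno):
    """A customer may not be deleted while orders reference it."""
    if any(order["CUSTNO"] == custno for order in orders):
        raise ValueError(f"customer {custno} is referenced by an order")
    del customers[custno]

insert_order(1, "C001", "P100", 5)      # succeeds
try:
    insert_order(2, "C999", "P100", 5)  # rejected: unknown customer
except ValueError as err:
    print(err)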

Integrity constraints apply to entity subtypes as well. In a sense, a subtype simply has a special (1:1) relationship with another entity in which the primary key of the subtype is also a foreign key into the other entity type. In other words, we cannot appoint Joe Bloggs as a marketing representative unless he is already an employee.

  1. Referential integrity rules must be documented as part of the data model.

The rules, based on the dependency constraints, in this case (an ORDER with its dependent LINE-ITEMs) would be as follows:

  1. ■ An order can be inserted without dependency checks (although to do so without inserting at least one order line might be meaningless).

  2. ■ An order line item can be inserted, but the order number foreign key must exist in the ORDER entity or must be inserted along with its attributes during order line insertion.

  3. ■ An order line can be deleted without dependency checks.

  4. ■ An order cannot be deleted unless all its dependent order lines have been previously deleted.

  5. ■ Deletion of the order must trigger the guaranteed deletion of all dependent order lines.

  1. The dependency constraints must be documented as part of the data model.

Dependency Constraints

Constraint Rule

A dependent entity cannot exist unless the entity on which it depends also exists.

The dependency constraint rule is a special form of referential integrity constraint applicable to dependent entities. Some database management systems automatically enforce most dependency constraints. Others do not.

Exhibits G.31 and G.32 illustrate dependency with an ORDER that has multiple LINE-ITEMs.

Exhibit G.31   Order Table

Exhibit G.32   Line-Item Table

Exhibit G.33   Employee Table

Recursion

A recursive relationship is a relationship between two entities of the same type.

Recursive relationships are found more frequently than one might think. Two of the most common recursive relationships are:

  1. Bill-of-Materials Explosion/Implosion

  2. Organizational Hierarchies

Recursive relationships are a special case among the common relationships (i.e., 1:1, 1:M, M:M) and are modeled in exactly the same way. We can start out by making an EMPLOYEE entity represent the organizational structure of a company, as in Exhibit G.33.

The relationship between a manager and his employee (the organizational structure) is a one-to-many relationship. The manager’s employee number as a foreign key is shown in Exhibit G.34.

Recursive relationships impose additional integrity constraints. In this case:

  1. ■ A manager cannot work for himself. This implies that the topmost level of the hierarchy must contain a null value in the MGR_EMP_NUMBER column.

  2. ■ A manager cannot work for one of his employees; nor can he work for anyone who works for one of his employees … and so on ad infinitum.

Exhibit G.34   Employee Table
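
The second of these constraints amounts to checking that a proposed manager does not already appear anywhere in the employee's own reporting chain. The following Python sketch illustrates one way to code the check; the column names follow Exhibit G.34, while the sample data and the helper function are assumptions made for the example.

# EMP_NUMBER -> MGR_EMP_NUMBER (None marks the top of the hierarchy)
manager_of = {"E1": None, "E2": "E1", "E3": "E2", "E4": "E2"}

def may_assign_manager(emp, proposed_mgr):
    """Reject self-management and cycles through the reporting chain."""
    if emp == proposed_mgr:
        return False
    current = proposed_mgr
    while current is not None:
        if current == emp:          # emp is above the proposed manager
            return False
        current = manager_of[current]
    return True

print(may_assign_manager("E4", "E1"))  # True: E1 is not below E4
print(may_assign_manager("E1", "E3"))  # False: E3 is in E1's chain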

A bill of materials processing model is an example of a many-to-many recursive relationship in which each component is used in many subassemblies and finished products and in which each product contains many components.

As an exercise:

  1. ■ What would such a model look like?

  2. ■ What constraints should be placed on the model? Would they differ from the constraints placed on the previous model?

Using the Model in Database Design

All the work of modeling is of no use unless it directly contributes to the database design. In converting the model to a physical database design, some compromises with normalization may be necessary in order to obtain satisfactory performance. The compromises will be:

  1. ■ Least in implementing a relational design

  2. ■ Moderate in implementing a hierarchical design

  3. ■ Greatest in implementing a flat file design

It is not the intent of this section to give complete guidance for implementing the model using a specific database management system (DBMS). This material is covered in IMS database design and relational database design courses.

Relational Design

The first cut at implementing the model using a relational DBMS is to implement the model as it stands:

  1. ■ Each entity and relationship becomes a table.

  2. ■ Logically related entities are grouped into databases.

  3. ■ Each attribute becomes a column in the table.

  4. ■ A unique index is defined for each primary key (to ensure row uniqueness).

  5. ■ Additional indices are created to support known access paths.

  6. ■ For each table, an index is chosen by which the data will be clustered, to support the most frequently used access sequence.

  7. ■ Space calculations are performed.

Subsequent modifications may be required to achieve acceptable performance.

G11: Decision Tables

Decision tables are a technique for representing combinations of conditions and the actions they trigger, and are an alternative to flowchart analysis. Each column of the table, therefore, comprises a test case, or path through a flowchart.

Consider the following small program, which reads records from a file and tallies the numerical ranges of a field on each record to illustrate the technique.

PROGRAM: FIELD-COUNT

Dowhile not EOF
       read record
       if FIELD_COUNTER > 7 then
              increment COUNTER_7 by 1
       else
              if FIELD_COUNTER > 3 then
                     increment COUNTER_3 by 1
              else
                     increment COUNTER_1 by 1
              endif
       endif
End_While
End

The corresponding decision table is displayed in Exhibit G.35, and there are four test cases to test the program using decision tables.

Exhibit G.35   Decision Table
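
One way to make a decision table executable is to encode each column (rule) as a combination of conditions and the expected action, and then treat every rule as a test case. The Python sketch below is illustrative only; the rule set mirrors the logic of FIELD-COUNT rather than the exact layout of Exhibit G.35.

# Each rule pairs a condition combination with the expected action.
decision_table = [
    {"eof": True,  "field": None, "action": "end"},
    {"eof": False, "field": 8,    "action": "increment COUNTER_7"},
    {"eof": False, "field": 4,    "action": "increment COUNTER_3"},
    {"eof": False, "field": 2,    "action": "increment COUNTER_1"},
]

def expected_action(eof, field):
    if eof:
        return "end"
    if field > 7:
        return "increment COUNTER_7"
    if field > 3:
        return "increment COUNTER_3"
    return "increment COUNTER_1"

# Each column of the decision table becomes one test case.
for rule in decision_table:
    actual = expected_action(rule["eof"], rule["field"])
    assert actual == rule["action"], rule
print("all decision-table test cases passed")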

G12: Desk Checking

Desk checking is a human error-detection process in the form of a one-person walkthrough. The typical application is where an individual reads a program, verifies it with a checklist, and manually walks test data through it. It can also be applied to requirements and design as a check on the work. This technique provides an evaluation of the quality of the program after it has been written or after the design has been completed.

G13: Equivalence Partitioning

Equivalence partitioning is a black-box testing technique that partitions the input domain into a set of input classes, where all the values in a class are expected to cause the same behavior.

From the requirements, each input is divided into partitions. Using this technique, one representative value from each partition is selected and tested. It is assumed that the results predict the results for other values in the partition, which demonstrates the power and economy of this technique.

It is more complicated than a simple range test, because a range may need to be divided into a series of one or more subranges to reflect the different behaviors that can occur. Consider the following application: the income needs to be broken up into three equivalence classes, because the behavior (or tax) varies according to the income value.

Exhibit G.36   Income versus Tax Percentage

Exhibit G.37   Income/Tax Test Cases

An IRS program computes the amount of state income tax on the basis of the income, as is displayed in Exhibits G.36 and G.37. The following are some guidelines for defining equivalence classes.

Sets of Values

Example: “Input colors can be red, blue, white or black”; for example, the tests would be red, blue, white, black.

Numeric Input Data

Field Ranges

Example: “Input can range from integers 0 to 100”; for example, a test case could be 45 (any arbitrary number between 0 and 100).

Example: “Input can range from real numbers 0.0 to 100.0”; for example, a test case could be 75.0 (any arbitrary number between 0.0 and 100.0).

Numeric Output Data

Output Range of Values

Example: “Numerical range outputs of actuarial tables can be from $0.00 to $100,000.00”; for example, a test case could be $15,000.00 (any arbitrary number between $0.00 and $100,000.00).

Nonnumeric Input Data

Tables or Arrays

Example: A test case could be to input from any table row with alphabetic content.

Number of Items

Example: “Number of products associated with a model is up to 10”; for example, a test case could be 5 products (any arbitrary number of products between 0 and 10).

Nonnumeric Output Data

Tables or Arrays

Example: Update, write, delete any table row.

Number of Outputs

Example: “Up to 10 customers can be displayed”; for example, a test case could be 7 customers displayed (any arbitrary number of customers between 0 and 10).

Steps to Create the Test Cases Using Equivalence Class Partitioning

  1. Define the equivalence classes.

  2. Write the first test case to cover as many of the valid equivalence classes from the rule set as possible (although they may be mutually exclusive).

  3. Continue writing test cases until all of the valid equivalence classes from the rules have been covered.

  4. Write one test case for each invalid class.

The following example illustrates this process:

Suppose the requirement states that the cost of a car shall be between $25,000 and $38,000 with 4 doors. The car types shall be a Ford, Chevy, Jeep, or Honda. The monthly payments shall be less than $500.

Step 1. Define the Equivalence Classes

Step 2–4. Create Valid and Invalid Test Cases

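The book's own tables for these steps appear in the exhibits. As a supplement, the following Python sketch applies the steps to the car requirement above; the class boundaries come from the stated requirement, while the representative values, invalid values, and helper name are assumptions made for the example.

# Valid equivalence classes drawn from the stated requirement.
def is_valid(cost, doors, car_type, payment):
    return (25_000 <= cost <= 38_000 and doors == 4 and
            car_type in {"Ford", "Chevy", "Jeep", "Honda"} and
            payment < 500)

# One test case covering all valid classes at once (step 2).
valid_case = {"cost": 30_000, "doors": 4, "car_type": "Jeep", "payment": 450}

# One test case per invalid class (step 4); each violates a single rule.
invalid_cases = [
    {"cost": 20_000, "doors": 4, "car_type": "Ford",   "payment": 450},
    {"cost": 40_000, "doors": 4, "car_type": "Ford",   "payment": 450},
    {"cost": 30_000, "doors": 2, "car_type": "Ford",   "payment": 450},
    {"cost": 30_000, "doors": 4, "car_type": "Toyota", "payment": 450},
    {"cost": 30_000, "doors": 4, "car_type": "Ford",   "payment": 600},
]

assert is_valid(**valid_case)
assert not any(is_valid(**case) for case in invalid_cases)
print("equivalence class test cases behave as expected")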

G14: Exception Testing

With exception testing, all the error messages and exception-handling processes are identified, including the conditions that trigger them. A test case is written for each error condition. A test case/error exception test matrix (Exhibit G.38) can be helpful for documenting the error conditions and exceptions.

Exhibit G.38   Test Case/Error Exception Test Matrix

G15: Free-Form Testing

Free-form testing, often called error guessing, ad hoc testing, or brainstorming, is a “blue-sky” intuition of where and how errors are likely to occur and is an add-on technique to other testing techniques.

Some testers are naturally adept at this form of testing, which does not use any particular testing technique. It involves intuition and experience to “smell out” defects. There is no particular methodology for applying this technique, but the basic approach is to enumerate a list of potential errors or error-prone situations and write test cases on the basis of the list.

G16: Gray-Box Testing

Black-box testing focuses on the program’s functionality against the specification. White-box testing focuses on the paths of logic. Gray-box testing is a combination of black- and white-box testing. The tester studies the requirements specifications and communicates with the developer to understand the internal structure of the system. The motivation is to clear up ambiguous specifications and “read between the lines” to design implied tests. One example of the use of gray-box testing is when it appears to the tester that a certain functionality seems to be reused throughout an application. If the tester communicates with the developers and understands the internal design and architecture, a lot of tests will be eliminated, because it might be possible to test the functionality only once. Another example is when the syntax of a command consists of 7 possible parameters that can be entered in any order as follows:

Command parm1, parm2, parm3, parm4, parm5, parm6, parm7 (enter)

In theory, a tester would have to create 7! or 5040 tests. The problem is compounded even more if some of the parameters are optional. If the tester uses gray-box testing, by talking with the developer and understanding the parser algorithm, if each parameter is independent, only 7 tests may be required to test each parameter.
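
The arithmetic can be sketched as follows. The command name and the per-parameter test structure are hypothetical; the assumption that the parameters are independent is exactly the knowledge gained from talking with the developer.

from math import factorial

PARAMETER_COUNT = 7

# Black-box view: every ordering of the parameters is a distinct test.
print(factorial(PARAMETER_COUNT))          # 5040

# Gray-box view: the parser treats parameters independently,
# so one test per parameter may be enough.
parameters = [f"parm{i}" for i in range(1, PARAMETER_COUNT + 1)]
tests = [{"focus": parm} for parm in parameters]
print(len(tests))                          # 7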

G17: Histograms

A histogram is a graphical description of measured values organized according to the frequency or relative frequency of occurrence. In Exhibit G.39, the table consists of a sample of 100 client/server terminal response times (measured from pressing the Enter key until the server responds) for an application. These were measured with a performance testing tool.

Exhibit G.39   Response Time of 100 Samples (seconds)

Exhibit G.40   Response Time Histogram

The histogram in Exhibit G.40 illustrates how the raw performance data from the preceding table is displayed graphically. It should be noted that the design specification calls for response times of less than 3 seconds. It is obvious from the data that this requirement is not being satisfied and that there is a performance problem.
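
A minimal Python sketch of building such a histogram from raw response times follows; the sample values, bin width, and variable names are assumptions, since the actual 100 measurements appear only in the exhibit.

from collections import Counter

# Hypothetical sample of measured response times, in seconds.
response_times = [1.2, 2.8, 3.4, 4.1, 2.2, 5.6, 3.1, 2.9, 4.8, 1.7,
                  3.9, 2.4, 4.4, 3.3, 2.1, 5.1, 3.7, 2.6, 4.0, 3.2]

BIN_WIDTH = 1.0          # one-second buckets
REQUIREMENT = 3.0        # design specification: response < 3 seconds

bins = Counter(int(t // BIN_WIDTH) for t in response_times)
for start in sorted(bins):
    label = f"{start * BIN_WIDTH:.0f}-{(start + 1) * BIN_WIDTH:.0f}s"
    print(f"{label:>6} | {'*' * bins[start]}")

violations = sum(t >= REQUIREMENT for t in response_times)
print(f"{violations} of {len(response_times)} samples exceed the requirement")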

G18: Inspections

Inspections are the most formal, commonly used form of peer review. The key feature of an inspection is the use of checklists to facilitate error detection. These checklists are updated as statistics indicate that certain types of errors are occurring more or less frequently than in the past. The most common types of inspections are conducted on the product design and code, although inspections may be used during any life-cycle phase.

Inspections should be short because they are often intensive; therefore, the product component to be reviewed must be small. Specifications or designs that result in 50 to 100 lines of code are usually manageable. This translates into an inspection of 15 minutes to 1 hour, although complex components may require as much as 2 hours. In any event, inspections of more than 2 hours are generally less effective and should be avoided.

Two or three days before the inspection, the producer assembles the input to the inspection and gives it to the coordinator for distribution. Participants are expected to study and make comments on the materials before the review.

The review is led by a participant other than the producer. Generally, the individual who has the greatest involvement in the next phase of the life cycle is designated as reader. For example, a requirements inspection would likely be led by a designer, a design review by an implementer, and so forth. The exception to this is the code inspection, which is led by the designer. The inspection is organized and coordinated by an individual designated as the group leader or coordinator.

The reader goes through the product component, using the checklist as a means to identify common types of errors as well as standards violations. A primary goal of an inspection is to identify items that can be modified to make the component more understandable, maintainable, or usable. Participants discuss any issues that they identified in the preinspection study.

At the end of the inspection, an accept or reject decision is made by the group, and the coordinator summarizes all the errors and problems detected and gives this list to all participants. The individual whose work was under review (e.g., designer, implementer, tester) uses the list to make revisions to the component. When revisions are implemented, the coordinator and producer go through a minireview, using the problem list as a checklist. The coordinator then completes management and summary reports. The summary report is used to update checklists for subsequent inspections.

G19: JADs

A JAD is a technique that brings users and development together to design systems in facilitated group sessions. Studies show that JADs increase productivity over traditional design techniques. JADs go beyond one-on-one interviews to collect information. They promote communication, cooperation, and teamwork among the participants by placing the user in the driver’s seat.

JADs are logically divided into phases: customization, session, and wrap-up. Regardless of what activity one is pursuing in development, these components will always exist. Each phase has its own objectives.

  1. Customization — This phase is key to a JAD and largely consists of preparation for the next phase. Participants include the session leader and JAD analysts. The tasks include organizing the team, defining the JAD tasks and deliverables, and preparing the materials for the next JAD session.

  2. Session — This phase consists of facilitated sessions in which the analysts and users jointly define the requirements and the system design. The session leader facilitates the session, and the analyst documents the results.

  3. Wrap-Up — In this final phase, formal JAD outputs are produced. The facilitated session leader summarizes the visual and other documentation into a JAD document. The design results are fed back to the executive sponsor.

A given development effort may consist of a series of the three phases until the final requirements and design have been completed. When a project has multiple design activities (e.g., different portions of the overall design), a final wrap-up occurs at the completion of the design, at which point the design is reviewed as a whole.

G20: Orthogonal Array Testing

Orthogonal array testing, a statistical technique pioneered by Dr. Genichi Taguchi in manufacturing, helps in the selection of test cases to get a handle on the potentially enormous number of combination factors. It calculates the ideal number of tests required and identifies variations of input values and conditions; for example, it helps in the test selection process to provide maximum coverage with a minimum number of test cases.

Taguchi methods refer to techniques of quality engineering that embody both statistical process control (SPC) and new quality-related management techniques. Most of the attention and discussion of Taguchi methods have been focused on the statistical aspects of the procedure; it is the conceptual framework of a methodology for quality improvement and process robustness that needs to be emphasized.

An example is when the syntax of a command consists of three possible parameters in which there can be three possible values as follows.

       Command PARM1, PARM2, PARM3 (enter)
       PARMx = 1,2,3

In theory, a tester would have to create 3³, or 27, test combinations, as shown in Exhibit G.41.

Applying orthogonal array testing (OATS), the technique selects test cases so as to test the interactions between independent measures called factors. Each factor also has a finite set of possible values called levels. In Exhibit G.41, there are three factors (PARM1, PARM2, and PARM3). Each has three levels (1, 2, and 3). The technique calls for the tester to locate the best fit of the number of factors and levels to the possible orthogonal arrays (found in most statistical texts). In Exhibit G.42, the orthogonal array with three factors and three levels is chosen. Each column in the array corresponds to a factor and each row corresponds to a test case. The rows represent all possible pairwise combinations of possible levels for the factors. Thus, only nine test cases are required, which demonstrates the power of the technique.
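
The following Python sketch lists one such nine-row arrangement for three factors at three levels and verifies the pairwise-coverage property. The rows are a standard Latin-square construction and may differ from the exact layout of Exhibit G.42; the verification code is illustrative only.

from itertools import combinations, product

# Nine rows, three factors (PARM1, PARM2, PARM3), levels 1-3.
# Each row is one test case.
L9 = [
    (1, 1, 1), (1, 2, 2), (1, 3, 3),
    (2, 1, 2), (2, 2, 3), (2, 3, 1),
    (3, 1, 3), (3, 2, 1), (3, 3, 2),
]

# Every pair of factors covers all 9 level combinations exactly once.
for col_a, col_b in combinations(range(3), 2):
    pairs = {(row[col_a], row[col_b]) for row in L9}
    assert pairs == set(product((1, 2, 3), repeat=2))

print(f"{len(L9)} test cases cover all pairwise combinations")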

Exhibit G.41   Parameter Combinations (with Total Enumeration)

Exhibit G.42   Parameter Combinations (OATS)

G21: Pareto Analysis

Pareto diagrams are a special form of a graph that points to where efforts should be concentrated. By depicting events or facts in order of decreasing frequency (or cost or failure rate, etc.), it allows for a quick separation of the “vital few” from the trivial many. The Pareto chart is more commonly known to information systems personnel as the 80-20 rule: that is, 20 percent of the causes make up 80 percent of the frequencies. A Pareto chart is a histogram showing values in descending order, which helps identify the high-frequency causes of problems so that appropriate corrective action can be taken. It is an organized ranking of causes of a problem by type of cause. The objective is to select the most frequent cause or causes of a problem to direct action to eliminate those causes.

The four steps in using a Pareto chart include the following:

  1. Identify a problem area. One problem example is an excessive number of defects discovered during software testing.

  2. Identify and name the causes of the problem. This is the most time-consuming step because it requires the collection of information from various causes. Causes of defects include the following: architectural, database integrity, documentation, functionality, GUI, installation, performance, and usability. For most problems, there is little need to identify more than 12 causes. When more than 12 causes can be identified, one approach is to select 11 causes and the 12th cause can be classified as “Other.” If the “Other” category becomes significant, then it may need to be broken down into specific causes.

    Exhibit G.43   Pareto Chart

  3. Document the occurrence of the causes of the problem. The occurrences of the causes need to be documented. Samples from the defect-tracking database can be used to obtain these frequencies.

  4. Rank the causes by frequency, using the Pareto chart. This involves two tasks: to count the problem occurrences by type, and to build a bar chart (or Pareto chart), with the major causes listed on the left-hand side and the other causes listed in descending order of occurrence.

In Exhibit G.43 there are eight defect causes. Approximately 1050 defects have been recorded. Of those, 750 are caused by functionality and database integrity. Thus, roughly 20 percent of the causes account for about 71 percent of the defects, which approximates the 80-20 rule. In our example, functionality is the major cause, and database integrity is the second. Emphasis should be placed on reducing the number of functional and database problems; one approach might be increased unit testing and reviews.
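
Steps 3 and 4 can be sketched in a few lines of Python. The defect counts below are illustrative stand-ins chosen to match the totals quoted above; in practice they would come from the defect-tracking database.

# Hypothetical defect counts per cause (step 3).
defects_by_cause = {
    "Functionality": 450, "Database integrity": 300, "GUI": 90,
    "Performance": 70, "Documentation": 60, "Installation": 40,
    "Architectural": 25, "Usability": 15,
}

total = sum(defects_by_cause.values())
ranked = sorted(defects_by_cause.items(), key=lambda item: item[1],
                reverse=True)

# Step 4: rank causes and report the cumulative percentage.
cumulative = 0
for cause, count in ranked:
    cumulative += count
    print(f"{cause:<20} {count:>4}  {100 * cumulative / total:5.1f}%")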

G22: Positive and Negative Testing

Positive and negative testing is an input-based testing technique that requires that a proper balance of positive and negative tests be performed. A positive test is one with a valid input, and a negative test is one with an invalid input. Because there typically are many more negative than positive tests, a suggested balance is 80 percent negative and 20 percent positive tests.

For example, suppose an application accepts stock market or mutual fund five-character symbols and then displays the respective stock or mutual fund name. An example of a positive test is “PHSTX,” which is the mutual fund symbol associated with a health science fund. If this symbol displayed some other fund, it would entail a positive test that failed.

Values that are not valid stock or mutual fund symbols are negative tests. Typically, a negative test should produce an error message indicating invalid input. For example, if “ABCDE” is entered and an invalid-symbol error message is displayed, this is a negative test that passed.

Some considerations of negative testing are how much negative testing is enough and how to anticipate unexpected conditions. Testing the editing of a single alphabetic character field can be complex. One negative test would be “(”, which should be rejected by the system. Should “)” be tested? How many other nonalphabetic characters should be tested? Unanticipated conditions are also sometimes difficult to detect. For example, “&” and “'” have special meaning in SQL. Should both of these be tested in every field?

G23: Prior Defect History Testing

With prior defect history testing, a test case is created or rerun for every defect found in prior tests of the system. The motivation for this is that defects tend to cluster and regress back to the original problem. Some causes include poor software configuration management procedures, poor coding and unit testing during defect repair, the tendency for bugs to cluster, and so on.

A defect matrix is an excellent tool that relates test cases to functions (or program units). A check entry in the defect matrix indicates that the test case is to be retested because a defect was previously discovered while running this test case. The absence of an entry indicates that the test does not need to be retested.

If this approach is not economical because a large number of defects have been discovered, a test case should be retested on or above a certain defined severity level.

G24: Prototyping

Prototyping is an iterative approach often used to build systems that users are initially unable to describe precisely. The concept is made possible largely through the power of fourth-generation languages and application generators. Prototyping is, however, as prone to defects as any other development effort, maybe more so if not performed in a systematic manner. Prototypes need to be tested as thoroughly as any other system. Testing can be difficult unless a systematic process has been established for developing prototypes.

The following sections describe several prototyping methodologies. They are presented to show the diversity of concepts used in defining software life cycles and to illustrate the effects of prototyping on the life cycle in general.

Cyclic Models

This concept of software development with prototyping consists of two separate but interrelated cyclic models: one consisting of a classical software development cycle and the other of a prototyping cycle that interacts with the classical model during the phases of analysis and design. The major operations are the following:

  1. ■ Classical cycle:

    1. –   User request

    2. –   Feasibility

    3. –   Investigation

    4. –   Consideration of prototyping

    5. –   Analysis

    6. –   Design

    7. –   Final proposed design

    8. –   Programming

    9. –   Testing

    10. –   Implementation

    11. –   Operation

    12. –   Evaluation

    13. –   Maintenance

    14. –   (The cycle is repeated.)

  2. ■ Prototyping cycle:

    1. –   Prototype is designed.

    2. –   Prototype is used.

    3. –   Investigation is conducted using the prototype.

    4. –   Analysis is performed on the investigation.

    5. –   Refinements are made, or a new prototype is built.

    6. –   (This cycle is also repeated.)

The interaction of the two cycles occurs when investigation in the classical cycle uncovers the need to prototype, at which time the prototyping cycle is entered. Prototyping is terminated when analysis, design, or the final proposed design of the classical cycle can be completed on the basis of information discovered or verified in the prototyping cycle.

Fourth-Generation Languages and Prototyping

This method proposes the following life-cycle steps:

  1. A prototyping team of one analyst/programmer and one end user is formed.

  2. User needs are identified by interviewing several end users to define the problem and elicit sample user expectations.

  3. A prototype is developed quickly to address most of the issues of the problem and user expectations.

  4. The prototype is demonstrated to the end user. The user experiments with it and performs work within a specified time period. If the prototype is not acceptable, it is scrapped.

  5. The prototype is refined by including changes identified through use. This step and the previous one are iterated until the system fully achieves the requirements.

  6. An end-user test group is formed to provide more feedback on the prototype within a specified period of time.

  7. A determination is made as to whether the prototype will be implemented or the system will be rewritten in a conventional language. This decision is based on maintenance considerations, hardware and software efficiency, flexibility, and other system requirements.

Iterative Development Accounting

This model is based on the view that a system is a sequence of specification levels with an increasing amount of detail at each level. These levels are:

  1. ■ Informal requirements

  2. ■ Formal requirements

  3. ■ Design

  4. ■ Implementation

  5. ■ Configuration

  6. ■ Operation

Each level contains more detail than the one preceding it. In addition, each level must be balanced with upper-level specifications. Iterative development imposes development accounting on each level (i.e., a change in one specification level can be made only if the next higher level has been modified to accommodate the change).

A complete history of development is maintained by this accounting technique to ensure that consistency remains throughout all levels. A prototype is developed at each level to show that the specifications are consistent. Each prototype concentrates on the functions to be evaluated at that level. The final prototype becomes the implemented system once testing, installation, and training have been completed.

Evolutionary and Throwaway

Two models are presented here. In the first, the prototype is built and gradually enhanced to form the implemented system. The other is known as the throwaway model.

End users are integral parts of the prototype development in both models and should be trained in the use of a prototyping tool (e.g., a simulation language or 4GL). The two models are described briefly as follows:

  1. ■ Method 1:

    1. –   The user experiments with and uses a prototype built to respond to the end user’s earliest and most tentative needs to perform work.

    2. –   The analyst watches the user to see where prototype refining needs to take place. A series of prototypes, or modifications to the initial prototype, evolve into the final product.

  2. ■ Method 2:

    1. –   A prototype is implemented. The initial design is developed from this and the end user’s feedback. Another prototype is produced to implement the initial design. The final system is implemented in a conventional language.

Application Prototyping

This method involves the following steps:

  1. Identification of basic needs — Concentrate on identifying fundamental goals, objectives, and major business problems to be solved and defining data elements, data relations, and functions.

  2. Development of a working model — Build a working prototype quickly to address the key needs.

  3. Demonstration of prototype — Present the prototype to all interested users and obtain additional requirements through user feedback.

  4. Completion of prototype — Iterate between demonstration and enhancement of the prototype until users are satisfied that the organization could provide the service needed from the prototype. Once users agree that the prototype fits the concept of the service needed, it can be enhanced into the final system or rewritten in a more efficient language.

Prototype Systems Development

The stages for this approach are as follows:

  1. Management states the organization’s objectives. These are described in terms of information requirements and the scope of the system boundaries and capabilities. Prototype screens and reports are developed.

  2. End users and management review and approve the prototype. Full system design, equipment selection, programming, and documentation are completed.

  3. Management reviews and commits to implementing the system. System tests of the prototype are run in parallel with the old system. Work begins on the next release, which causes an iteration of all three stages.

Data-Driven Prototyping

Prototyping is a great communication tool for fleshing out design ideas, testing assumptions, and gathering real-time feedback from users.

This methodology consists of the following steps:

  1. Operational review — Define the project scope and evaluate the environment, current organization, and information structures.

  2. Conceptual design — Define proposed metadata (i.e., the structure of data and relationships between individual structures), the scenarios needed to describe service functions that change data states, and types of retrievals.

  3. Data design — Normalize the metadata.

  4. Heuristic analysis — Check consistency of requirements against metadata through the use of real data values; this step is iterated with the data design step.

  5. Environment test — Build programs to support data entry and retrieval (prototype).

Replacement of the Traditional Life Cycle

In this model, the steps include the following:

  1. Rapid analysis — Results in an incomplete paper model that shows the system context, critical functions, an entity–relationship model of the database, and conceptual tables, screens, attributes, reports, and menus.

  2. Database development — Uses a relational architecture to create a working database for the use of the prototype.

  3. Menu development — Expands on the initial concepts defined in rapid analysis and fixes the hierarchical structure of the application.

  4. Function development — Groups functions by type into modules.

  5. Prototype demonstration — Iterates by redoing parts as necessary and tuning if possible.

  6. Design, coding, and testing — Completes the detailed design specifications.

  7. Implementation — Is based on the evolution of the prototype and completion of all programs, tests, and documentation.

Early-Stage Prototyping

This model can assist in specifying user requirements, verifying the feasibility of system design, and translating the prototype into the final system. The procedure includes the following:

  1. A preliminary analysis and requirements specification establish a baseline for future reference.

  2. A prototype is defined and implemented, emphasizing the user interface. The prototype is developed by a small development team using prototype development language and tools to assist in rapid development.

  3. The prototype is tested in the user’s workplace.

  4. The prototype is refined by incorporating user comments as quickly as possible.

  5. Baseline requirements are refined by incorporating lessons learned from the prototype.

  6. The production system is developed through the use of a traditional life cycle with requirements derived from the prototype.

User Software Engineering

This is based on a model of software development that is part formal and part informal and includes the following steps:

  1. Requirements analysis — Activity and data modeling and identification of user characteristics.

  2. External design — Develop transactions and user–program interfaces.

  3. Facade development — Used as a prototype of the user–program interface and revised as needed.

  4. Narrative text — Used to informally specify the system operations.

  5. Preliminary relational database — Designed as the basis for a functional prototype of the system.

  6. Functional prototype — Developed to provide at least some, and perhaps all, of the functions of the proposed system.

  7. Formal specification of the system operations — May be optionally developed at this point.

  8. System architecture and modules — Conceptual design that defines the overall structure and the associated software modules.

  9. System implementation — In a procedural language.

  10. Testing and verification — Performed on the system before the system is released into the production environment.

G25: Random Testing

Random testing is a technique in which a program or system is tested by selecting at random some subset of all possible input values. It is not an optimal testing technique, because it has a low probability of detecting many defects. It does, however, sometimes uncover defects that standardized testing techniques might not. It should, therefore, be considered an add-on testing technique.

G26: Range Testing

Range testing is a technique that assumes that the behavior of any input variable within a predefined range will be the same. The range over which the system behavior should be the same is first selected. Then an arbitrary representative from the range is selected and tested. If it passes, it is assumed that the rest of the values do not have to be tested.

For example, consider the following piece of code, which calculates the results Z from two input values X and Y:

Z = √(X² − Y²)

If X and Y are integers ranging from 0 to 5 and X is greater than or equal to Y, there are 21 possible test cases, as depicted in Exhibit G.44.

Applying this technique has the potential of saving a great deal of test generation time, but it rests on the assumption that an arbitrary input sample from a range will produce the same system behavior as the rest of the inputs in that range. Additional conditions, such as X and Y being positive integers with Y greater than X, also need to be tested, as does verification of the square root results; for example, we need to determine whether the Z variable will accept fractional values or whether the result is truncated (see also G4, “Boundary Value Testing”).
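
The 21 cases can be enumerated mechanically. The Python sketch below assumes the formula Z = √(X² − Y²) shown above; the choice of representative pair is arbitrary, which is precisely the point of the technique.

from math import sqrt

# All (X, Y) pairs with X, Y in 0..5 and X >= Y: 21 candidate test cases.
cases = [(x, y) for x in range(6) for y in range(6) if x >= y]
print(len(cases))                       # 21

# Range testing picks one representative, e.g. (4, 2), and checks the
# result, including whether fractional values are handled.
x, y = 4, 2
z = sqrt(x ** 2 - y ** 2)
print(round(z, 4))                      # 3.4641 -- a fractional result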

G27: Regression Testing

Regression testing checks the application in light of changes made during a development spiral, debugging, maintenance, or the development of a new release. This test must be performed after functional improvements or repairs have been made to a system to confirm that the changes have introduced no unintended side effects. Corrections of errors relating to logic and control flow, computational errors, and interface errors are examples of conditions that necessitate regression testing. Cosmetic errors generally do not affect other capabilities and do not require that regression testing be performed.

Exhibit G.44   Range Testing Test Cases

It would be ideal if all the tests in the test suite were rerun for each new spiral, but due to time constraints, this is probably not realistic. A good regression strategy during spiral development is for some regression testing to be performed during each spiral to ensure that previously demonstrated capabilities are not adversely affected by later development spirals or error corrections. During system testing after the system is stable and the functionality has been verified, regression testing should consist of a subset of the system tests. Policies need to be created to decide which tests to include.

In theory, the reliability of a system that has been modified cannot be guaranteed without a full regression test of all tests. However, there are many practical considerations:

  1. ■ When defects are uncovered, additional regression tests should be created.

  2. ■ A regression test library should be available and maintained as it evolves.

  3. ■ There should be a methodology of isolating regression tests that focus on certain areas (see retest and defect matrices).

  4. ■ If the overall architecture of a system is changed, full regression testing should be performed.

  5. ■ Automated testing with capture/playback features should be strongly considered (see Section 6, “Modern Software Testing Tools”).

G28: Risk-Based Testing

The purpose of risk management testing is to measure the degree of business risk in an application system to improve testing. This is accomplished in two ways: high-risk applications can be identified and subjected to more extensive testing, and risk analysis can help identify the error-prone components of an individual application so that testing can be directed at those components.

Risk analysis is a formal method for identifying vulnerabilities (i.e., areas of potential loss). Any area that could be misused, intentionally or accidentally, and result in a loss to the organization is a vulnerability. Identification of risks allows the testing process to measure the potential effect of those vulnerabilities (e.g., the maximum loss that could occur if the risk or vulnerability were exploited).

Risk-based testing is a technique in which test cases are created for every major risk factor that has been previously identified. Each condition is tested to verify that the risk has been averted.

G29: Run Charts

A run chart is a graphical representation of how a quality characteristic varies with time. It is usually a line graph that shows the variability in a measurement or in a count of items. For example, in Exhibit G.45, a run chart can show the variability in the number of defects detected over time. It can show results from a sample of a population or from 100 percent.

Exhibit G.45   Sample Run Chart

A control chart, a special form of run chart, places lines on the chart to represent the limits of permissible variability. These limits could be determined by a design specification or an agreed-upon standard. The control limits are frequently set to show the statistical limit of variabilities that could be due to a chance occurrence. This is calculated by using the averages and range of measurement from each sample of data. Control charts are not only used as an alarm when going outside the limits, but also to examine trends occurring within the limits. For example, if the sequence of ten measurements in Exhibit G.45 is shown to fall above the expected average, it can be assumed that this is not due to mere chance and, therefore, an investigation is in order.

G30: Sandwich Testing

Sandwich testing uses top-down and bottom-up techniques simultaneously and is a compromise between the two. The approach integrates from the top and bottom at the same time, meeting somewhere in the middle of the hierarchical control structure. The meeting point in the middle is defined by the program structure.

It is typically used on large programs but is difficult to justify on small programs. The top level of the hierarchy usually includes the user interfaces to the system, which requires stubs to mimic business functions. The bottom level includes primitive-level modules that require drivers to simulate lower-level modules.

G31: Statement Coverage Testing

Statement coverage is a white-box technique that ensures that every statement or line of code (LOC) is executed at least once. Although it guarantees that every statement is executed, it is a very weak coverage criterion and not as comprehensive as other techniques, such as branch coverage, in which each branch from a decision statement is executed.

Consider the following small program, which reads records from a file and tallies the numerical ranges of a field on each record to illustrate the technique.

PROGRAM: FIELD-COUNT

Dowhile not EOF
       read record
       if FIELD_COUNTER > 7 then
              increment COUNTER_7 by 1
       else
              if FIELD_COUNTER > 3 then
                     increment COUNTER_3 by 1
              else
                     increment COUNTER_1 by 1
              endif
       endif
End_While
End

The test cases to satisfy statement coverage are as follows.

Test Case      Values (FIELD_COUNTER)
1              >7, ex. 8
2              >3, ex. 4
3              <= 3, ex. 3
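
The following is a Python transcription of FIELD-COUNT for illustration, driven by the three values in the table; a list of FIELD_COUNTER values stands in for the input file.

def field_count(records):
    """Python transcription of the FIELD-COUNT pseudocode."""
    counter_7 = counter_3 = counter_1 = 0
    for field_counter in records:        # Dowhile not EOF / read record
        if field_counter > 7:
            counter_7 += 1
        elif field_counter > 3:
            counter_3 += 1
        else:
            counter_1 += 1
    return counter_7, counter_3, counter_1

# The three test-case values together execute every statement once.
print(field_count([8, 4, 3]))            # (1, 1, 1)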

G32: State Transition Testing

State transition testing is a technique in which the states of a system are first identified. A test case is then written to exercise the trigger or stimulus that causes a transition from one state to another. The tests can be designed using a finite-state diagram or an equivalent table.

Consider the following small program, which reads records from a file and tallies the numerical ranges of a field on each record to illustrate the technique.

PROGRAM: FIELD-COUNT

Dowhile not EOF
       read record
       if FIELD_COUNTER > 7 then
              increment COUNTER_7 by 1
       else
              if FIELD_COUNTER > 3 then
                     increment COUNTER_3 by 1
              else
                     increment COUNTER_1 by 1
              endif
       endif
End_While
End

Exhibit G.46 illustrates the use of the testing technique to derive test cases. The states are defined as the current values of COUNTER_7, COUNTER_3, and COUNTER_1. Then the possible transitions are considered: the end-of-file condition or the value of FIELD_COUNTER for each successive record input. For each transition, the way the state is transformed is defined. Each transition becomes a test case, and the final state is the expected result.
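
The transitions can also be expressed directly as code and replayed. The following Python sketch is illustrative only; the state is the tuple (COUNTER_7, COUNTER_3, COUNTER_1), and each input record (or EOF) is a trigger.

def next_state(state, record):
    """Apply one trigger: a FIELD_COUNTER value, or None for EOF."""
    counter_7, counter_3, counter_1 = state
    if record is None:                   # EOF: no further transitions
        return state
    if record > 7:
        return (counter_7 + 1, counter_3, counter_1)
    if record > 3:
        return (counter_7, counter_3 + 1, counter_1)
    return (counter_7, counter_3, counter_1 + 1)

# Each transition is a test case; the final state is the expected result.
state = (0, 0, 0)
for trigger, expected in [(8, (1, 0, 0)), (4, (1, 1, 0)),
                          (2, (1, 1, 1)), (None, (1, 1, 1))]:
    state = next_state(state, trigger)
    assert state == expected
print("all state transition test cases passed")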

G33: Statistical Profile Testing

With statistical profile testing, statistical techniques are used to develop a usage profile of the system. Based on the expected frequency of use, the tester determines the transaction paths, conditions, functional areas, and data tables that merit focus in testing. The tests are, therefore, geared to the most frequently used part of the system.

G34: Structured Walkthroughs

Structured walkthroughs are more formal than code-reading reviews. Distinct roles and responsibilities are assigned before the review. Pre-review preparation is greater, and a more formal approach to problem documentation is stressed. Another key feature of this review is that it is presented by the producer. The most common walkthroughs are those held during design and coding; however, they have recently been applied to specifications documentation and test results.

The producer schedules the review and assembles and distributes the input. In most cases, the producer selects the walkthrough participants (although this is sometimes done by management) and notifies them of their roles and responsibilities. The walkthrough is usually conducted with fewer than seven participants and lasts no more than 2 hours. If more time is needed, there should be a break, or the product should be reduced in size. Roles usually included in a walkthrough are producer, coordinator, recorder, and representatives of the user, maintenance, and standards organizations.

Exhibit G.46   State Transition Table

Although the review is opened by the coordinator, the producer is responsible for leading the group through the product. In the case of design and code walkthroughs, the producer simulates the operation of the component, allowing each participant to comment, depending on that individual’s area of specialization. A list of problems is kept, and at the end of the review, each participant signs the list, or other walkthrough form, indicating whether the product is accepted as is, accepted with recommended changes, or rejected. Suggested changes are made at the discretion of the producer. There are no formal means of follow-up on the review comments. If the walkthrough review is used for products throughout the life cycle, however, comments from past reviews can be discussed at the start of the next review.

G35: Syntax Testing

Syntax testing is a technique in which a syntax command generator generates test cases based on the syntax rules of the system. Both valid and invalid values are created. It is a data-driven black-box testing technique for testing input data to language processors, such as string processors and compilers. Test cases are developed based on rigid data definitions. The valid inputs are described in Backus–Naur Form (BNF) notation.

The main advantage of syntax testing is that misunderstandings about valid and invalid data, as well as specification problems, become apparent when employing this technique.
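
As an illustration, the sketch below generates valid strings from a tiny BNF-style grammar and then mutates them to obtain invalid cases. The grammar, mutation rule, and function names are hypothetical and are not taken from the book.

import random

# Toy grammar in BNF spirit: <command> ::= <verb> " " <number>
grammar = {
    "<command>": [["<verb>", " ", "<number>"]],
    "<verb>":    [["ADD"], ["DELETE"]],
    "<number>":  [["<digit>"], ["<digit>", "<digit>"]],
    "<digit>":   [[d] for d in "0123456789"],
}

def generate(symbol="<command>"):
    """Expand a nonterminal into a random valid string."""
    if symbol not in grammar:
        return symbol
    production = random.choice(grammar[symbol])
    return "".join(generate(part) for part in production)

random.seed(1)
valid_cases = [generate() for _ in range(3)]
invalid_cases = [case.replace(" ", "@", 1) for case in valid_cases]
print("valid:  ", valid_cases)
print("invalid:", invalid_cases)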

G36: Table Testing

Table testing is a technique that tests tables, which are usually associated with a relational database (the same approaches can be applied to arrays, queues, and heaps). Tables usually come in two forms: sequential and indexed. The following are general tests that need to be performed against tables:

1. Indexed Tables:

a.   Delete the first record in the table.

b.   Delete a middle record in the table.

c.   Delete the last record in the table.

d.   Add a new first record in the table.

e.   Add a new middle record in the table.

f.   Add a new last record in the table.

g.   Attempt to add a duplicate record.

h.   Add a record with an invalid key, for example, garbage in the key field.

i.   Change the key fields on an existing record; for example, change an order number.

j.   Delete a nonexistent record; for example, enter a delete key that does not match any table entry.

k.   Update and rewrite an existing record.

2. Sequential Tables:

a.   Attempt to delete a record from an empty table.

b.   Read a record from an empty table.

c.   Add a record to a full table.

d.   Delete one record from a one-record table.

e.   Read the last record.

f.   Read the next record after the last record.

g.   Scroll sequentially through the table.

h.   Insert an out-of-sequence record.

i.   Attempt to insert a duplicate record.

G37: Thread Testing

Thread testing is a software testing technique that demonstrates key functional capabilities by testing a string of program units that accomplishes a specific business function in the application.

A thread is basically a business transaction consisting of a set of functions. It is a single discrete process that threads through the whole system. Each function is tested separately, then added one at a time to the thread. The business transaction thread is then tested. Threads are in turn integrated and incrementally tested as subsystems, and then the whole system is tested. This approach facilitates early systems and acceptance testing.

G38: Top-Down Testing

The top-down testing technique is an incremental approach in which the high-level modules or system components are integrated and tested first. Testing then proceeds hierarchically to the bottom level. This technique requires the creation of stubs. When a module or system component is tested, the modules or components it invokes are represented by stubs, which return control back to the calling module or system component with a simulated result. As testing progresses down the program structure, each stub is replaced by the actual code it represents. There is no rule that specifies which module to test next; the only requirement is that at least one of the modules or system components that call it must have been tested previously.

Top-down testing allows early discovery of major design flaws occurring at the top of the program, because high-level functions and decisions are tested early, and they are generally located at the top of the control structure. This verifies the program design early. An early prototype or initial design facilitates early demonstrations. Because the menus are often at the top of the control structure, the external interfaces can be displayed early to the user. Stubs need to be created, but are generally easier to create than drivers. On the other hand, critical low-level modules or system components are not tested until late in the process. In rare cases, problems with these critical modules or system components may force a redesign.

G39: White-Box Testing

White-box testing, or structural testing, is one in which test conditions are designed by examining paths of logic. The tester examines the internal structure of the program or system. Test data are driven by examining the logic of the program or system, without concern for the program or system requirements. The tester has knowledge of the internal program structure and logic, just as a mechanic knows the inner workings of an automobile. Specific examples in this category include basis path analysis, statement coverage, branch coverage, condition coverage, and branch/condition coverage.

An advantage of white-box testing is that it is thorough and focuses on the produced code. Because there is knowledge of the internal structure or logic, errors or deliberate mischief on the part of a programmer has a higher probability of being detected.

One disadvantage of white-box testing is that it does not verify that the specifications are correct; that is, it focuses only on the internal logic and does not verify the logic to the specification. Another disadvantage is that there is no way to detect missing paths and data-sensitive errors. For example, if the statement in a program should be coded “if |a–b| < 10” but is coded “if (a–b) < 1,” this would not be detectable without specification details. A final disadvantage is that white-box testing cannot execute all possible logic paths through a program, because this would entail an astronomically large number of tests.
