Chapter 3. Terminology

 

"When I use a word," Humpty Dumpty        said in rather a scornful tone, "it means           just what I choose it to mean—                neither more nor less."

 
 --Lewis Carroll, "Through the Looking Glass"

Topics Covered in This Chapter

  • Why This Terminology Is Important

  • Value-related Terms

  • Structure-related Terms

  • Relationship-related Terms

  • Integrity-related Terms

  • Summary

Why This Terminology Is Important

Relational database design has its own unique set of terms just as any other profession, trade, or discipline does. Learning this terminology is important because

  • it is used to express and define the special ideas and concepts of the relational database model. . Much of the terminology is derived from the mathematical branches of set theory and first order predicate logic, which form the basis of the relational database model.

  • it is used to express and define the database design process itself. . A number of terms are used to help define certain steps within the design process, and knowing them makes this process clearer and easier to understand.

  • it is used anywhere a relational database or RDBMS is discussed. . This terminology is used in such publications as educational course materials, trade magazine articles, RDBMS software manuals, and commercial RDBMS software books, as well as in conversations between practitioners.

Each term in this chapter is defined and then discussed in some detail. Pertinent details or further discussion of a given term will be added later, at the point where the term is expressly used within a specific technique in the design process.

A majority of the terms used to define the ideas and concepts of the design process are included here. But there are a few terms that are introduced and discussed later in the book, because they are more easily understood within the context of the specific idea or concept to which they are related.

Four categories of terms are defined in this chapter: value-related, structure-related, relationship-related, and integrity-related.

Value-related Terms

Data

Data are the values that are stored in the database. Data are static in the sense that they remain in the same state until they are modified by some manual or automated process. For example, consider the data shown in Figure 3-1.

An example of basic data.

Figure 3-1. An example of basic data.

On the surface, this data is meaningless. We cannot learn, simply by inspection, what "92883" represents. Is it a zip code? Is it a part number? Even if we know it represents a service code, is it a code associated with George Edleman? There's just no way of knowing until the data is processed.

Information

Information is data that has been processed in a way that makes it meaningful and therefore useful to the person working with or viewing it. Information is dynamic in the sense that it constantly changes relative to the data stored in the database, and also in the sense that it can be processed in an unlimited number of ways.

Figure 3-2 shows the data from the previous example but processed and transformed into information. The data has been manipulated in such a way—in this case as part of a Patient Invoice report—that it is now meaningful to anyone who views it.

Information can be presented in a variety of forms. It can be shown as the result of a query, displayed on an on-screen form, or printed on a variety of reports. But the point to remember is that data must be processed in some manner in order to become meaningful.

It is very important to understand the difference between data and information. A database is designed to provide meaningful information to someone within a business or organization. This information can be provided only if the appropriate data exists in the database and the database is structured in such a way to support that information. When this concept is fully understood, the logic behind the database design process becomes crystal clear.

An example of data transformed into information.

Figure 3-2. An example of data transformed into information.

Note

Unfortunately, data and information are two terms that are frequently used interchangeably throughout the database industry. This error is found in numerous trade magazines and commercial database books; the terms are even used erroneously by authors who should know better.

Null

Whenever a value is missing or unknown, it is said to be Null. A null value represents neither zero (in the case of numeric data) nor blank (represented by one or more spaces in the case of textual data). Zero and blank are actual values and can be meaningful in some way under certain circumstances. For example, a zero can represent the current state of an Account Balance; a blank in a Middle Initial field can represent the fact that an employee has no middle initial in his or her name. In Figure 3-3, a blank represents the fact that Washington, D.C., is not located in any county whatsoever.

An example of null values.

Figure 3-3. An example of null values.

A null value is typically used to represent an unknown value in a field. In Figure 3-3, for example, there are null values in the County field. Shannon McLain did not know what county she lived in at the time her data was entered into the database, so no entry was made into the County field. As a result, the County field contains a null value. This value can be changed, however, once Shannon finds out what county she lives in.

A null value is also used to represent a missing value in a field. If the person who entered the data for Shannon McLain failed to ask her for the name of the county she lives in, the data is considered missing since no entry was made into the County field due to operator error. Once the error is recognized, it can be easily corrected by obtaining the appropriate value from Ms. McLain.

Nulls used in a mathematical expression.

Figure 3-4. Nulls used in a mathematical expression.

A drawback to null values is that they cannot be evaluated by mathematical expressions or aggregate functions. If a null value is used in a mathematical expression, that expression will evaluate to Null. In Figure 3-4, Total Value is derived from the mathematical expression "[SRP] * [Qty On Hand]." Note, however, that the value for the Total Value field is missing where the Qty On Hand value is Null, resulting in a null value for the Total Value field as well. This is logically reasonable—if the number is unknown, the value will necessarily be unknown. Also there is a serious undetected error that occurs if all the values in the Total Value field are then added together: an inaccurate total. The only way to obtain an accurate total is to provide a value for the entries in the Qty On Hand field that are currently Null.

The result of an aggregate function, such as "Count()," will be Null if it is based on a field that contains null values. For example, Figure 3-5 shows the results of a summary query that counts the total number of occurrences of each category in the PRODUCTS table shown above. The value of Total Occurrences in the summary query is the result of the function expression "Count([Total Occurrences])." Notice that the summary query shows 0 occurrences of an unspecified Category, implying that each product has been assigned a category. This information is clearly inaccurate because there are two products in the PRODUCTS table that have not been assigned a category.

Nulls used in an aggregate function.

Figure 3-5. Nulls used in an aggregate function.

The issues of missing values, unknown values, and whether a value will be used in a mathematical expression or aggregate function are all taken into consideration in the database design process, and we will revisit and further discuss these issues in later chapters.

Structure-related Terms

Table

A table (known as a relation in relational database theory) is the chief structure in a relational database. It is composed of fields and records, the order of which is completely unimportant. A table always represents a single, specific subject, which can be either an object or an event. A typical table structure is shown in Figure 3-6.

When a table represents an object, it represents something that is tangible, such as a person, place, or thing. An object has a set of characteristics that can be stored as data, which can then be processed as information in a variety of ways. Examples of objects include vendors, customers, products, patients, materials, components, properties, and locations. The table in Figure 3-6 is an example of a table that represents an object (client).

A typical table structure.

Figure 3-6. A typical table structure.

When the subject of a table is an event, it represents something that occurs at a specific point in time. Facts about the event, like the characteristics of an object, can be stored as data and processed later as information. Examples of events include appointments, transactions, inquiries, sales, visits, and transfers. Figure 3-7 shows a table that represents an event.

A table representing an event.

Figure 3-7. A table representing an event.

An example of a validation table.

Figure 3-8. An example of a validation table.

A table that stores data used to supply information is called a data table; it is the most common type of table in a relational database. Data in this type of table is dynamic because it is manipulated (modified, deleted, and so forth) and processed into information in some form or fashion. Users constantly interact with these types of tables while performing their work on a day-to-day basis.

When a table holds data specifically used to implement data integrity, it is known as a validation table. A validation table usually represents subjects such as city names, skill categories, product codes, and project identification numbers. Data in this type of table is static because it will very rarely change at all. Although there is very little direct user interaction with these tables, these tables are frequently used indirectly to validate values that a user customarily enters into a data table. Figure 3-8 shows an example of a validation table.

Validation tables are discussed in more detail in Chapter 11, "Business Rules."

Field

A field (known as an attribute in relational database theory) is the smallest structure in a relational database. A field is used to store data in the database and it represents a characteristic of the subject of the table in which the field resides. If a field contains more than one type of distinct value, it is considered a multipart field; a field that contains multiple instances of the same type of value is considered a multivalued field. A field that contains a concatenated string value or the result of a mathematical expression is called a calculated field. Figure 3-9 shows a table with an example of each of these types of fields.

A table containing regular, calculated, multipart, and multivalued fields.

Figure 3-9. A table containing regular, calculated, multipart, and multivalued fields.

Calculated, multipart, and multivalued fields will be discussed in greater detail in Chapter 7, "Establishing Table Structures."

Record

A record (known as a tuple in relational database theory) is a structure within a table that represents a unique instance of the subject of the table. It is composed of the entire set of fields in a table, regardless of whether or not the fields contain any values. In Figure 3-9, for example, "Timothy Ennis" represents a unique instance of the subject "Clients." Ennis's record, then, is the total collection of fields, treated as a unit. The values of those fields represent relevant facts about Ennis that are important to someone in the organization.

View

A View is a virtual table that is composed of the fields of one or more data or validation tables. A View is considered "virtual" because it doesn't store any data on its own; instead it draws its data from the tables on which it is based. Views are commonly implemented as saved queries in a majority of RDBMS programs. Figure 3-10 shows an example of a View.

An example of a View.

Figure 3-10. An example of a View.

In this example, the STUDENT INSTRUMENTS View is composed of fields taken from both the STUDENTS table and the INSTRUMENTS table. Data displayed in the View is drawn from both tables simultaneously, based on matching values between the Student ID field in the STUDENTS table and the Student ID field in the INSTRUMENTS table.

Views are important for at least three reasons. First, they allow you to draw data from multiple tables. (In order for a View to draw data from multiple tables, it is necessary for the tables to have connections, or relationships, to each other.) Second, Views are important because they provide a means of preventing specified fields within a table (or group of tables) from being manipulated or seen by certain users. This capability can be very advantageous in terms of security. Third, views can be used to implement data integrity; this type of View is known as a Validation View. Designing and using Views is discussed in greater detail in Chapter 12, "Views."

Keys

Keys are special fields that serve specific purposes within a table, and the type of key determines its use within the table. For example, a Primary key is a field that uniquely identifies a record within a table. Another type is a Foreign key, which is a field that is used to establish a relationship between a pair of tables. Figure 3-11 shows an example of each of these types of key fields.

In Figure 3-11, Agent ID is the Primary key of AGENTS because it uniquely identifies each record in that table. Likewise, Client ID is the Primary key of CLIENTS because it uniquely identifies each of the table's records. In contrast, the Agent ID field in the CLIENTS table is a Foreign key because it is used to establish a relationship between the CLIENTS table and the AGENTS table.

An example of Primary and foreign key fields.

Figure 3-11. An example of Primary and foreign key fields.

Key fields are a very important part of a relational database because they help to establish various levels of data integrity, and they are used to establish relationships between tables. Primary keys are discussed in greater detail in Chapter 8, "Keys," and Foreign keys are discussed in Chapter 10, "Table Relationships."

Index

An index is a structure within an RDBMS that is used to improve data processing. The way an index is used and how it works is strictly dependent on the RDBMS program. However, an index has absolutely nothing to do with the logical database structure! The only reason that the term index is mentioned here is because it is very often confused with the term key.

Unfortunately, key and index are two terms that are frequently used interchangeably throughout the database industry, yet another major error that is found in numerous trade magazines and commercial database books. (Remember data and information?) The best way to remember the difference between the two is that keys are logical structures used to identify records within a table, and indexes are physical structures used to optimize data processing.

Relationship-related Terms

Relationships

A connection established between a pair of tables is known as a relationship. A relationship exists when a pair of tables is connected by a Primary key and a Foreign key (as shown in Figure 3-11) or is linked together by a third table, known as a linking table. Figure 3-12 shows a relationship established between two tables through a linking table.

Relationships are very important to data integrity because they help reduce redundant data and duplicate data. They also provide the means to define views.

Every relationship can be characterized by the type of relationship that exists between the tables, the type of participation each table has within the relationship, and the degree of participation each table has within the relationship.

Types of Relationships

When two tables are related, there is always a specific type of relationship (traditionally known as cardinality) that exists between them. There are three possible types of relationships: one-to-one, one-to-many, and many-to-many.

A relationship established between two tablers with the help of a linking table.

Figure 3-12. A relationship established between two tablers with the help of a linking table.

One-to-One Relationships

A one-to-one relationship exists between a pair of tables if a single record in the first table is related to only one record in the second table, and a single record in the second table is related to only one record in the first table. Figure 3-13 shows an example of a one-to-one relationship involving an EMPLOYEES table and a COMPENSATION table. In this example, a single record in the EMPLOYEES table is related to only one record in the COMPENSATION table; likewise, a single record in the COMPENSATION table is related to only one record in the EMPLOYEES table.

An example of a one-to-one relationship.

Figure 3-13. An example of a one-to-one relationship.

One-to-Many Relationships

A one-to-many relationship exists between a pair of tables if a single record in the first table can be related to one or more records in the second table, but a single record in the second table can be related to only one record in the first table. A one-to-many relationship involving a STUDENTS table and an INSTRUMENTS table is shown in Figure 3-14. In this case, a single record in the STUDENTS table can be related to one or more records in the INSTRUMENTS table, but a single record in the INSTRUMENTS table is related to only one record in the STUDENTS table.

This is by far the most common relationship that exists between a pair of tables in a database. A one-to-many relationship is very important because it helps to eliminate duplicate data and to keep redundant data to an absolute minimum.

An example of a one-to-many relationship.

Figure 3-14. An example of a one-to-many relationship.

Many-to-Many Relationships

A many-to-many relationship exists between a pair of tables if a single record in the first table can be related to one or more records in the second table, and a single record in the second table can be related to one or more records in the first table. Figure 3-15 shows a classic many-to-many relationship. In this example, a single record in the STUDENTS table can be related to one or more records in the CLASSES table; likewise, a single record in the CLASSES table can be related to one or more records in the STUDENTS table.

Establishing a direct connection between these two tables is difficult because it will produce a large amount of redundant data in one of the tables. There is also a problem with inserting, updating, and deleting data in this type of relationship. However, these problems can be surmounted, and the solution is provided in Chapter 10, "Table Relationships."

An example of a many-to-many relationship.

Figure 3-15. An example of a many-to-many relationship.

It's important to know what type of relationship exists between a pair of tables, because the type of relationship determines how the tables are related, whether or not records between the tables are interdependent, and the minimum and maximum number of related records that can exist within the relationship.

Types of Participation

There are two types of participation that a table can have within a relationship: mandatory and optional. Say there is a relationship between two tables called TABLE A and TABLE B. If records in TABLE A must exist before any new records can be entered into TABLE B, TABLE A's participation within the relationship is mandatory. However, if it isn't necessary for records in TABLE A to exist in order to enter any new records into TABLE B, TABLE A's participation within the relationship is optional. Each table within a relationship can participate in either manner. The type of participation each table has within a relationship is typically determined by the way the data in each table is related and how the data is being used.

Consider the relationship between the AGENTS and CLIENTS tables in Figure 3-16. The AGENTS table has a mandatory participation within the relationship if agents must exist before a new client can be entered into the CLIENTS table. But the AGENTS table's participation is optional if it isn't necessary to have agents in the AGENTS table before a new client can be entered into the CLIENTS table. The type of participation established for the AGENTS table is determined by the way its data is being used in relation to the data in the CLIENTS table. For example, if it is necessary to ensure that each client is assigned an available agent, then the participation of the AGENTS table within the relationship should be mandatory.

The AGENTS and CLIENTS tables.

Figure 3-16. The AGENTS and CLIENTS tables.

Degree of Participation

Each table in a relationship has a degree of participation, which is the minimum and maximum number of records in one table that can be related to a single record in the other table. Consider, once again, a relationship between two tables called TABLE A and TABLE B. The degree of participation for TABLE B is established by indicating a minimum and maximum number of records in TABLE B that can be related to a single record in TABLE A. If a single record in TABLE A can be related to a minimum of one record up to a maximum of ten records in TABLE B, then the degree of participation for TABLE B is 1,10. (The degree of participation is notated with the minimum number on the left, the maximum number on the right, separated by a comma.) The degree of participation for TABLE A is established in the same manner. The degree of participation for each table in a relationship is typically determined by the way the data in each table is related and how the data is being used.

Consider the AGENTS and CLIENTS table in Figure 3-16 once more. If, for example, we say that an agent should handle at least one client but certainly no more than eight, then the degree of participation for the CLIENTS table is 1,8. If we further say that each client can only be assigned to one agent, then the degree of participation for the AGENTS table is 1,1.

Integrity-related Terms

Field Specification

A Field Specification (traditionally known as a domain) represents all the elements of a field. Each Field Specification has three types of elements: general, physical, and logical.

A field's general elements comprise the most fundamental information about a field in a database and include items such as field name, description, and source table. If a field is used in more than one table, these items will remain the same. Physical elements determine how a field is built and how it is represented to the person using it; they include items such as data type, length, and display format. Logical elements describe the values stored in a field; they include items such as Required Value, Range of Values, and Default Value. All elements associated with a Field Specification, including those mentioned here, are covered in greater detail in Chapter 9, "Field Specifications."

Data Integrity

Data integrity refers to the validity, consistency, and accuracy of the data in a database. It cannot be overstated that the level of accuracy of the information retrieved from the database is in direct proportion to the level of data integrity imposed within the database. Data integrity is one of the most important aspects of the database design process, and it should not be underestimated, overlooked, or even partially neglected. To make any of these mistakes would result in a high risk of undetectable errors. This, in turn, would result in users making poor strategic decisions based on inaccurate information.

Four types of data integrity are implemented during the database design process. Three types of data integrity are based on various aspects of the database structure and are labeled according to the area (level) in which they operate. The fourth type of data integrity is based on the way the data is viewed and used by an organization. Here are the types of data integrity and their descriptions:

  • Table-level integrity ensures that the field that identifies each record within the table, is unique and is never missing its value.

  • Field-level integrity ensures that the structure of every field is sound, that the values in each field are valid, consistent, and accurate, and that fields of the same type (such as City fields) are consistently defined throughout the database.

  • Relationship-level integrity (traditionally known as referential integrity) ensures that the relationship between a pair of tables is sound and that there is synchronization between the two tables whenever data is entered, updated, or deleted.

  • Business rules impose restrictions or limitations on certain aspects of a database based on the ways an organization perceives and uses its data. These restrictions can affect aspects of database design such as the range and types of values stored in a field, the type of participation and the degree of participation of each table within a relationship, and the type of synchronization used for relationship-level integrity in certain relationships. All of these restrictions are discussed in more detail in Chapter 11, "Business Rules." Since business rules affect integrity, they must be considered along with the other three types of data integrity during the design process.

Summary

We opened this chapter by explaining why terminology is important for defining, discussing, or reading about the relational database model and the database design process.

The discussion about value-related terms showed you that there is a distinct difference between "data" and "information," and that understanding this difference is crucial to understanding the database design process. You now know about Nulls and their affect on information retrieved from the database.

Next we discussed structure-related terms. You learned that the core structures of every relational database are fields, records, and tables. You now know that Views are "virtual" tables that are used, in part, to draw data from two or more tables simultaneously. We then looked at key fields, which are used to uniquely identify records within a table and to establish a relationship between a pair of tables. Finally, you learned the difference between a key field and an index. Now you know that an index is strictly a software device used to optimize data processing.

In our coverage of relationship-related terms we established that a connection between a pair of tables is a relationship. A relationship is used to help ensure various aspects of data integrity, and it is the mechanism used by a View to draw data from multiple tables. We then discussed the three characteristics of table relationships. The first characteristic is the type of relationship (one-to-one, one-to-many, many-to-many); the second characteristic is the type of participation (optional or mandatory); and the third characteristic is the degree of participation (minimum/maximum number of related records).

The chapter ended with a discussion of integrity-related terms. We saw that a Field Specification establishes the general, physical, and logical characteristics of a field—characteristics that are an integral part of every field in the database. We then learned that data integrity is one of the most important aspects of the database design process because of its positive effect on the data in the database. Also, you now know that there are four types of data integrity—three based on database structure and one that is based on the way the organization perceives and uses its data. These levels of integrity ensure the quality of your database design and the accuracy of the information you retrieve from it.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset