The most effective approach to the design of a database (or collection of databases) for an environment as diverse as that presented by Mighty-Mite Motors usually involves breaking the design into components indicated by the organization of the company. As the design evolves, the designer can examine the entities and the relationships to determine where parts of the organization will need to share data. Working on one portion of the design at a time also simplifies dealing with what might at first seem to be an overwhelmingly large database environment. Paying special attention to the need to share data helps ensure that shared data are consistent and suitable for all required uses.
A systems analysis indicates that the MMM database environment falls into the following areas:
Examining the Data Flows
During the systems analysis, a data flow diagram can be a great help in identifying where data are shared by various parts of an organization. The top-level DFD (the
context diagram in
Figure 11-11) actually tells us very little. It indicates that three sources outside the company provide data: customers (the stores to which the company sells), purchasers (the individuals who purchase products from the stores), and raw materials suppliers. Somewhere, all those data are used by a general process called Manufacture and Sell Products to keep the company in business.
However, the level 1 DFD (
Figure 11-12) is much more telling. As the data handling processes are broken down, five data stores emerge:
▪ Raw materials: This data store holds both the raw materials inventory and the orders for raw materials.
▪ Product data: The product data store contains data about the products being manufactured, product testing results, and the finished-goods inventory.
▪ Customer orders: This data store contains customer information as well as order data.
▪
Purchaser data: The purchaser data store contains information about the individuals who purchase products and the products they have purchased.
▪ Problem data: This final data store contains problem reports.
As you examine the processes that interact with these five data stores, you will find a number of processes that manipulate data in more than one data store, as well as data stores that are used by more than one process:
▪ The raw materials data store is used by the raw materials ordering and the manufacturing processes.
▪ Product data are used by manufacturing, sales, shipping, and product registration.
▪ Customer order data are used by sales and shipping.
▪ The purchases data store is used by purchaser registration and problem handling.
▪ The problem data store, used only by problem handling, is the only data store that is not shared by multiple processes.
The raw materials ordering process is the only process that uses only a single data store. Nonetheless, the level 1 DFD makes it very clear that there is no instance in which a single process uses a single data store without interaction with other data stores and processes. Given that each process in the DFD probably represents all or part of an application program, this suggests that the database designer should consider either a single database or a set of small databases along with software to facilitate the interchange of data.
The DFD makes it very clear that the need for the integration of the various data stores is very strong. In addition, Mighty-Mite Motors is a relatively small business, and therefore a single database that manages all needed aspects of the company will not grow unreasonably large. It will also be more cost effective and perform better than multiple databases that use some type of middleware to exchange data. Ultimately, the database designer may decide to distribute the database onto multiple servers, placing portions of it that are used most frequently in the division where that use occurs. The database design, however, will be the same regardless of whether the final implementation is centralized or distributed. The essential decision is to create a single database rather than several smaller, interrelated databases that must exchange data.
The ER Diagram
The systems analyst preparing the requirements document for the Mighty-Mite Motors reengineering project has had two very good sources of information about exactly what needs to be stored in the database: the employees of the company and the paper documents that the company has been using. The document that is given to the database design is therefore quite complete.
The design needs to capture all the information on the paper documents. Some documents are used only for input (for example, the product registration form or the order form). Others represent reports that an application program must be able to generate (for example, the line schedule report).
Although the current documents do not necessarily represent all of the outputs that the application programs running against the database will eventually prepare, they do provide a good starting place for the design. Whenever the designer has questions, he or she can then turn to Might-Mite's employees for clarification.
Working from the requirements document prepared by the systems analyst, along with the paper input and output documents, the database designer puts together the ER diagram. Because there are so many entities, all of which interconnect, the diagram is very wide and has been split into three pieces to make it easier to understand. As you look at each of the pieces, keep in mind that entities that appear on more than one piece represent the connection between the three illustrations.
The first part (
Figure 11-13) contains the entities for raw materials and manufacturing. This portion of the data model is dealing with three many-to-many relationships:
▪ material_order to raw_material (resolved by the composite entity material_order_line)
▪ raw_material to model (resolved by the composite entity material needed)
▪ manufacturing_line to model (resolved by the composite entity line_schedule)
The second portion of the ERD (
Figure 11-14) contains entities for product testing and sales. (Remember that in this instance, the customers are toy stores rather than individual purchasers.) There are two many-to-many relationships:
▪ test_type to model (resolved by the test entity)
▪ order to model (resolved by the order_line composite entity)
The test entity is somewhat unusual for a composite entity. It is an activity that someone performs and as such has an existence outside the database. It is not an entity created just to resolve a many-to-many relationship.
At this point, the diagrams become a bit unusual because of the need to keep track of individual products rather than simply groups of products of the same model. The model entity in
Figure 11-13 represents a type of vehicle manufactured by Mighty-Mite Motors. However, the product entity in
Figure 11-14 represents a single vehicle that is uniquely identified by a serial number. This means that the relationships among an order, the line items on an order, and the models and products are more complex than for most other sales database designs.
The order and line item entities are fairly typical. They indicate how many of a given model are required to fill a given order. The shipment entity then indicates how many of a specific model are shipped on a specific date. However, the database must also track the order in which individual products are shipped. As a result, there is a direct relationship between the product entity and the order entity, in addition to the relationship between order_line and model. In this way, Mighty-Mite Motors will know exactly where each product has gone. At the same time, the company will be able to track the status of orders (in particular, how many units of each model have yet to ship).
The final portion of the ERD (
Figure 11-15) deals with the purchasers and problem reports. There are two many-to-many relationships:
▪ problem_type to product (resolved with the entity problem_report)
▪ purchase to feature (resolved with the composite entity purchase_feature)
Like the test entity you saw earlier, the problem_report entity acts like a composite entity to resolve many-to-many relationships, but it is really a simple entity. It is an entity that has an existence outside of the database and that was not created simply to take care of the M:N relationship.
Note: Calling an entity a “problem_report” can be a bit misleading. In this case, the word “report” does not refer to a piece of paper but to the action of reporting a problem. A “problem_report” is therefore an activity rather than a document. In fact, the printed documentation of a problem report will probably include data from several entities, including the product, problem_report, purchase, and owner entities.
If you look closely at
Figure 11-15, you’ll notice that there is a one-to-one relationship between the product and purchase entities. The handling of the data supplied by a purchaser on the product registration card presents an interesting dilemma for a database designer. Each product will be registered by only one purchaser. (Even if the product is later sold or given to someone else, the new owner will not have a registration card to send in.) There will be only one set of registration data for each product, which first suggests that all of the registration data should be part of the product entity.
However, there is a lot of registration data—including one repeating group (the features for which the purchaser chose the product, represented by the feature and purchase_feature entities)—and the product is involved in a number of relationships that have nothing to do with product registration. If the DBMS has to retrieve the registration data along with the rest of the product data, database performance will suffer. It therefore makes sense in this case to keep the purchase data separate and to retrieve it only when absolutely necessary.
Note: One common mistake made by novice database designers is to create an entity called “registration card.” It is important to remember that the card itself is merely an input document. What is crucial is the data the card contains and the entity that the data describe, rather than the medium on which the data are supplied.