Choosing hardware and software to maintain a database and then designing and implementing the database itself was once enough to establish a database environment. Today, however, security concerns loom large, coupled with government regulations on the privacy of data. In addition, a new database is unlikely to be the first database in an organization that has been in business for a while; the new database may need to interact with an existing database that cannot be merged into the new database. In this section, we’ll briefly consider how those factors influence database planning.
Security
Before the Internet, database management was fairly simple in that we were rarely concerned about security. A user name and password were enough to secure access to a centralized database. The most significant security threats were internal—from employees who either corrupted data by accident or purposely exceeded their authorized access.
Most DBMSs provide some type of internal security mechanism. However, that layer of security is not enough today. Adding a database server to a network that has a full-time connection to the Internet means that database planning must also involve network design.
Authentication servers, firewalls, and other security measures therefore need to be included in the plans for a database system.
The need for added security brings little direct benefit of its own. The planning time and additional hardware and software increase the cost of implementing the database. The cost of maintaining the database also increases, because network traffic must be monitored far more closely than it was with classic centralized architectures. Unfortunately, there is no alternative. Data is the lifeblood of almost every modern organization, and it must be protected.
The cost of a database security breach can be devastating to a business. Trade secrets can be lost and confidential customer information released. Even if the unauthorized disclosure of data doesn't cause any direct harm, security breaches can be a public relations nightmare, causing customers to lose confidence in the organization and convincing them to take their business elsewhere.
Note: Because database security is so vitally important, Chapter 16 is devoted entirely to this topic.
Government Regulations and Privacy
Until the past 10 years or so, decisions about what data must be secured to maintain privacy were left up to the organization storing the data. In the United States, however, that is no longer the case for many types of data. Government regulations determine who can access the data and what they may access. The following are some of the U.S. laws that may affect owners of databases.
▪ Health Insurance Portability and Accountability Act (HIPAA): HIPAA is intended to safeguard the privacy of medical records. It restricts the release of medical records to the patient alone (or the parent/guardian in the case of those under 18) or to those the patient has authorized in writing to retrieve records. It also requires the standardization of the formats of patient records so they can be transferred easily among insurance companies and the use of unique identifiers for patients. (The Social Security number may not be used.) Most importantly for database administrators, the law requires that security measures be in place to protect the privacy of medical records.
▪ Family Educational Rights and Privacy Act (FERPA): FERPA is designed to safeguard the privacy of educational records. Although the U.S. federal government has no direct authority over private schools, it does wield considerable power over funds that are allocated to schools. Therefore, FERPA denies federal funds to those schools that don't meet the requirements of the law. It states that parents have a right to view the records of children under 18 and that the records of older students (those 18 and over) cannot be released to anyone but the student without the written permission of the student. Schools therefore have the responsibility to ensure that student records are not disclosed to unauthorized people, increasing the need for secure information systems that store student information.
▪ Children's Online Privacy Protection Act: Provisions of this law govern which data can be requested from children (those under 13) and which of those data can be stored by a site operator. It applies to Web sites, “pen pal services,” e-mail, message boards, and chat rooms. In general, the law aims to restrict the soliciting and disclosure of any information that can be used to identify a child—beyond information required for interacting with the Web site—without approval of a parent or guardian. Covered information includes first and last name, any part of a home address, e-mail address, telephone number, Social Security number, or any combination of the preceding. If covered information is necessary for interaction with a Web site—for example, registering a user—the Web site must collect only the minimally required amount of information, ensure the security of that information, and not disclose it unless required to do so by law.
Legacy Databases
Many businesses keep their data “forever.” They never throw anything out, nor do they delete electronically stored data. For a business that has been using computing since the 1960s or 1970s, this typically means that old database applications are still in use. We refer to such databases that use pre-relational data models as legacy databases. The presence of legacy databases presents several challenges to an organization, depending on the need to access and integrate the older data.
If legacy data are needed primarily as an archive (either for occasional access or retention required by law), then a company may choose to leave the database and its applications as they stand. The challenge in this situation occurs when the hardware on which the DBMS and application programs run breaks down and cannot be repaired. The only alternative may be to recover as much of the data as possible and convert it to be compatible with newer software.
Businesses that need legacy data integrated with more recent data must answer the question “Should the data be converted for storage in the current database, or should intermediate software be used to move data between the old and the new as needed?” Because we are typically talking about large databases running on mainframes, neither solution is inexpensive.
The seemingly most logical alternative is to convert legacy data for storage in the current database. The data must be taken from the legacy database and reformatted for loading into the new database. An organization can hire one of a number of companies that specialize in data conversion, or it can perform the transfer itself. In both cases, a major component of the transfer process is a program that reads data from the legacy database, reformats them as necessary so that they match the requirements of the new database, and then loads them into the new database. Because the structure of legacy databases varies so much among organizations, the transfer program is usually custom-written for the business using it.
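The read, reformat, and load steps of such a transfer program can be sketched in a few lines. This is only an illustration under stated assumptions: the fixed-width record layout, the `customer` table, and the function names `extract`, `transform`, and `load` are hypothetical, and an in-memory SQLite database stands in for the new database. A real transfer program would be written against the actual legacy file formats and target schema.

```python
import sqlite3

# Hypothetical legacy export: fixed-width records with a 6-character
# customer number, a 20-character name, and a date stored as YYYYMMDD.
LEGACY_RECORDS = [
    "000042" + "Jane Smith".ljust(20) + "19870315",
    "000043" + "Raj Patel".ljust(20) + "19911102",
]


def extract(line):
    """Read one legacy record by slicing its fixed-width fields."""
    return {"cust_no": line[0:6], "name": line[6:26], "joined": line[26:34]}


def transform(raw):
    """Reformat the legacy fields to match the (assumed) new schema."""
    d = raw["joined"]
    iso_date = f"{d[0:4]}-{d[4:6]}-{d[6:8]}"
    return (int(raw["cust_no"]), raw["name"].strip(), iso_date)


def load(conn, rows):
    """Load the reformatted rows into the new database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        "cust_no INTEGER PRIMARY KEY, name TEXT NOT NULL, joined TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the new database
load(conn, [transform(extract(line)) for line in LEGACY_RECORDS])
rows = conn.execute(
    "SELECT cust_no, name, joined FROM customer ORDER BY cust_no"
).fetchall()
```

Keeping the three steps in separate functions means that only `extract` depends on the legacy layout and only `load` depends on the new database, which is one reason such programs tend to be custom-written around a common skeleton.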
Just reading the procedure makes it seem fairly simple, but keep in mind that because legacy databases are old, they often contain “bad data” (data that are incorrect in some way). Once bad data get into a database, it is very difficult to get rid of them. Somehow, the problem data must be located and corrected. If there is a pattern to the bad data, that pattern must be identified to prevent any more bad data from getting into the database. The process of cleaning the data can therefore be the most time-consuming part of data conversion. Nonetheless, it is still far better to spend the time cleaning the data as they come out of the legacy database than attempting to find and correct the data once they get into the new database.
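One way to clean data as they come out of the legacy database is to run every record through a set of checks, set aside the records that fail, and tally the reasons so that patterns become visible. The sketch below assumes hypothetical records with a `name` field and a `joined` date in YYYYMMDD form. The two checks shown are examples only, not a complete set of validation rules.

```python
from collections import Counter
from datetime import datetime


def find_problems(record):
    """Return a list of reasons this record counts as bad data."""
    problems = []
    if not record.get("name", "").strip():
        problems.append("empty name")
    try:
        # strptime rejects impossible dates such as February 31.
        datetime.strptime(record.get("joined", ""), "%Y%m%d")
    except ValueError:
        problems.append("invalid date")
    return problems


def partition(records):
    """Split records into clean ones and rejected ones with reasons."""
    clean, rejected = [], []
    for record in records:
        problems = find_problems(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)
    return clean, rejected


SAMPLE = [
    {"name": "Jane Smith", "joined": "19870315"},
    {"name": "Raj Patel", "joined": "19910231"},  # February 31 does not exist
    {"name": "   ", "joined": "00000000"},        # blank name, zero-filled date
]

clean, rejected = partition(SAMPLE)

# Counting the reasons helps reveal whether the bad data follow a pattern.
pattern_counts = Counter(p for _, problems in rejected for p in problems)
```

Only the records in `clean` would go on to the load step. The rejected records, together with their reasons, would be handed to whoever is responsible for correcting them.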
The bad data problem can be compounded by missing mandatory data. If the new database requires that data be present (for example, requiring a zip code for every order placed in the United States) and some of the legacy data are missing the required values, there must be some way to “fill in the blanks” and provide acceptable values. Supplying values for missing data can be handled by conversion software, but application programs that use the data must then be modified to identify and handle the instances of missing data.
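The following sketch shows one possible way to "fill in the blanks" for the zip code example. It assumes a hypothetical placeholder value (`"00000"`) and a hypothetical flag field (`zip_is_placeholder`) that the conversion software adds so that application programs can recognize supplied values. An organization might instead choose a different placeholder or record the flag in a separate table.

```python
MISSING_ZIP = "00000"  # hypothetical placeholder agreed on for the conversion


def fill_missing_zip(order):
    """Conversion step: supply a placeholder when the zip code is missing."""
    zip_code = (order.get("zip") or "").strip()
    if zip_code:
        return {**order, "zip": zip_code, "zip_is_placeholder": False}
    # Flag the record so applications can tell a real value from a filler.
    return {**order, "zip": MISSING_ZIP, "zip_is_placeholder": True}


def zip_for_label(order):
    """Application step: never print the placeholder on a mailing label."""
    return "" if order["zip_is_placeholder"] else order["zip"]


legacy_orders = [
    {"order_no": 1, "zip": "10001"},
    {"order_no": 2, "zip": None},
    {"order_no": 3, "zip": "  "},
]

converted = [fill_missing_zip(o) for o in legacy_orders]
labels = [zip_for_label(o) for o in converted]
```

The second function illustrates why application programs must be modified: without a check of this kind, the placeholder would be treated as a genuine zip code.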
Data migration projects also include the modification of application programs that ran solely using the legacy data. In particular, it is likely that the data manipulation language used by the legacy database is not the same as that used by the new database.
Some very large organizations have determined that it is not cost effective to convert data from a legacy database. Instead, they choose to use some type of middleware that moves data to and from the legacy database in real time as needed. An organization that has a widely used legacy database can usually find middleware. For example, IBM markets software that translates and transfers data between IMS (the legacy product) and DB2 (the current, relational product). When such an application does not exist, it will need to be custom-written for the organization.
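The idea behind such middleware can be illustrated with a small sketch. Everything here is hypothetical: `LegacyStore` is a stand-in for a pre-relational (hierarchical) database, `LegacyGateway` is an invented translation layer, and the segment and field names are made up. The sketch shows only the read direction and does not reflect how any commercial product works.

```python
class LegacyStore:
    """Stand-in for a hierarchical legacy database (hypothetical)."""

    def __init__(self):
        # Each customer segment owns its order segments, as in a hierarchy.
        self._segments = {
            "C042": {
                "NAME": "JANE SMITH",
                "ORDERS": [
                    {"ONUM": "A1", "AMT": "0001999"},
                    {"ONUM": "A2", "AMT": "0000500"},
                ],
            }
        }

    def get_segment(self, key):
        return self._segments.get(key)


class LegacyGateway:
    """Translates legacy segments into flat, relational-style rows on demand."""

    def __init__(self, store):
        self._store = store

    def orders_for(self, customer_key):
        segment = self._store.get_segment(customer_key)
        if segment is None:
            return []
        # Flatten the hierarchy: one row per order, with parent data repeated.
        return [
            {
                "customer": segment["NAME"].title(),
                "order_no": order["ONUM"],
                "amount_cents": int(order["AMT"]),
            }
            for order in segment["ORDERS"]
        ]


gateway = LegacyGateway(LegacyStore())
order_rows = gateway.orders_for("C042")
```

Because the translation happens at the moment of the request, the legacy data stay where they are and no bulk conversion is needed. The trade-off is that every request pays the translation cost.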
Note: One commonly used format for transferring data from one database to another is XML, which you will read more about in Chapter 18.