Roll Film: Back to CyberCinema

In the previous chapter, we built a sample requirements document for a proposed CyberCinema movie review Web site. In the case of CyberCinema, the answer to the question “What is the data?” seems quite simple—movie reviews. Application designers have to be suspicious by nature, and this answer is suspiciously simple. Delving into this answer a little deeper, we can ask, “What are movie reviews?” Movie reviews essentially are articles, and articles consist of paragraphs of written text. The subject matter is movies. Movie reviews are written by someone, so there's author information as well. A data model begins to take shape, as shown in Figure 4-3.

Figure 4-3. First cut of CyberCinema data model diagram


Great, we're finished. Just to be sure, though, let's go back to our requirements document and double-check that we've covered all the use cases in our requirements.

  1. Reviewers must be able to write reviews.

  2. Reviews can contain normal text features such as bold, italic, and underline.

  3. Reviews can include headings to delineate one section of a review from another.

  4. In reviews, actor names, movie names, and director names must be links to a search facility of some kind.

  5. Reviews can contain links to other reviews.

  6. Reviews can contain links to outside URLs.

  7. Reviews must be searchable by movie.

  8. Reviews must be searchable by director.

  9. Reviews must be searchable by actor.

  10. Reviews must be searchable by reviewer.

  11. Movies must be searchable by director.

  12. Movies must be searchable by actor.

We won't be able to meet these requirements with our conceived data model. For example, we're not addressing directors, actors, or reviewers in the data model, so we won't be able to satisfy requirements 8–12. If we're going to be able to look up movie reviews reliably by actor, director, and reviewer, and movies by actor and director, we must include these elements in our model. Once we create separate elements in our data model (and, subsequently, in our database schema, as we'll see in Chapter 6), the database will be able to look up information based on these items.

Normalization Equals Power: Defining Relationships

If you rely on the title of a movie to uniquely identify it in your data model, how are you going to differentiate The Parent Trap with Hayley Mills from The Parent Trap with Dennis Quaid? If you rely on names to uniquely identify actors, how is your system going to know that Robin Wright is the same person as Robin Wright Penn? The only way to reliably track all these different types of data is to separate them into different logical entities (such as directors, actors, and reviewers).
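
To make the identity problem concrete, here is a minimal Python sketch. The Movie and Person classes, the numeric identifiers, and the find_person helper are illustrative assumptions, not part of the CyberCinema design; the only idea taken from the discussion above is that an entity should be identified by something other than its title or name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Movie:
    movie_id: int  # hypothetical identifier, independent of the title
    title: str


@dataclass
class Person:
    person_id: int  # hypothetical identifier, independent of the name
    names: set[str] = field(default_factory=set)  # every name the person is credited under


# Two different movies that share a title remain distinct entities.
parent_trap_mills = Movie(movie_id=1, title="The Parent Trap")
parent_trap_quaid = Movie(movie_id=2, title="The Parent Trap")

# One person credited under two names remains a single entity.
people = [Person(person_id=10, names={"Robin Wright", "Robin Wright Penn"})]


def find_person(name: str) -> Optional[Person]:
    """Return the person credited under the given name, or None."""
    for person in people:
        if name in person.names:
            return person
    return None
```

With this arrangement, the two movies compare as different even though their titles match, and looking up either "Robin Wright" or "Robin Wright Penn" leads to the same Person.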

Thus, we now have several “entities” that we wish to manage in our fledgling system: movie reviews, movies, review authors, actors, and directors. Let's redraw our data model (see Figure 4-4) and take a look at what we're working with.

Figure 4-4. Redrawn CyberCinema data model diagram, including actors, directors, and reviewers


Keep It Simple: No Really, I Mean It

When looking at the data model in Figure 4-4, if you're thinking “Something looks very similar between actor, director, and reviewer,” great! Now you're starting to think like a data architect. You've noticed that the only difference between the actor, director, and reviewer entities is the name of the entity. By collapsing these three entities into one all-encompassing “person” entity, we can greatly simplify this model, while allowing it to retain its power.

If we were doing an object-oriented design at this stage, we might create a “person” superclass and make “actor,” “director,” and “reviewer” subclasses of this “person” class. When you get to the stage where you actually start writing application code, you may, in fact, want to do this. Remember that this is data-oriented application design. The idea is to think through the data model first, without considering implementation specifics. Thinking in terms of superclasses and subclasses is an important part of application design, but we want to keep such application-centric thinking out of our abstract data model. Notice we're also not talking explicitly about the use of XML at this point; that also comes later, after the abstract design is complete.

So, keeping the abstract mantra in mind, we'll reduce our actor, director, and reviewer entities into one “person” entity (see Figure 4-5).

Figure 4-5. Simplified CyberCinema data model


Even though the data model in Figure 4-5 has one person entity where before we had three entities (actor, director, and reviewer), we've kept three relationships in our diagram, represented by the lines bridging the three entities (Movie, Review, and Person).

We know that people act in and direct movies and that people also author reviews (using author as a verb).
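
One way to picture the collapsed model is the rough Python sketch below. It is an illustration only, not an implementation commitment: the class names, field names, and sample data are all assumptions, and the roles_of helper is hypothetical. The point it tries to show is that "actor," "director," and "reviewer" need not be separate entities, because the role falls out of which relationship a person takes part in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: int
    name: str


@dataclass(frozen=True)
class Movie:
    movie_id: int
    title: str


@dataclass(frozen=True)
class Review:
    review_id: int
    headline: str


# Hypothetical sample data.
alice = Person(1, "Alice Example")
bob = Person(2, "Bob Example")
movie = Movie(100, "An Example Movie")
review = Review(500, "An Example Review")

# The relationships a person can take part in, each a set of pairs.
acts_in = {(alice, movie)}
directs = {(alice, movie)}  # the same person may also direct
authors = {(bob, review)}


def roles_of(person: Person) -> set[str]:
    """Derive a person's roles from the relationships they appear in."""
    roles = set()
    if any(p == person for p, _ in acts_in):
        roles.add("actor")
    if any(p == person for p, _ in directs):
        roles.add("director")
    if any(p == person for p, _ in authors):
        roles.add("reviewer")
    return roles
```

Here a single Person who both acts in and directs a movie is still one entity; nothing about the person changes, only the relationships.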

Getting Complex: Many-to-One and Many-to-Many Relationships

In order for our reviewers to be able to write more than one review, we must redefine the relationship between review and reviewer as a many-to-one relationship. Luckily, UML provides a simple notation for doing so (see Figure 4-6).

Figure 4-6. A many-to-one relationship


In Figure 4-6, the n and 1 represent the nature of the relationship between reviewer and review. A single reviewer (1) can write any number of reviews (n). Of course, multiple reviewers might also collaborate on the same review, but faced with this realization, the data modelers are tempted to say, “Enough is enough! No! One reviewer per review, and that's it!”

They resist because these rather callous modelers no doubt have built such systems before, and they are envisioning a relational table structure that could accommodate multiple authors per review, and they're cringing. They're cringing with good reason. This is when the requirements for a system start to balloon, sending even the most mild-mannered engineers into apoplectic fits. Calm down. Go to your happy place. Remember, we're not dealing with specific implementation issues when modeling data.

Returning to our requirements document, we may see that, in fact, multiple reviewers can collaborate on a review. It pays to focus on the data model itself here, leaving aside all thoughts related to implementation. We can worry about implementation later.
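
Setting implementation aside, the difference between the two kinds of relationship comes down to what each one is able to express. The short Python sketch below is only an illustration under assumed, made-up identifiers (reviews 500 and 501, reviewers 1 and 2); it is not a proposed storage structure.

```python
# Many-to-one: each review maps to exactly one reviewer.
review_to_reviewer: dict[int, int] = {}
review_to_reviewer[500] = 1  # review 500 is written by reviewer 1
review_to_reviewer[501] = 1  # the same reviewer can write another review
review_to_reviewer[500] = 2  # a co-author cannot be added; this replaces reviewer 1

# Many-to-many: the relationship is a set of (review, reviewer) pairs.
review_reviewer_pairs: set[tuple[int, int]] = set()
review_reviewer_pairs.add((500, 1))
review_reviewer_pairs.add((501, 1))
review_reviewer_pairs.add((500, 2))  # a second reviewer is simply another pair


def reviewers_of(review_id: int) -> set[int]:
    """Return every reviewer paired with the given review."""
    return {reviewer for review, reviewer in review_reviewer_pairs if review == review_id}
```

In the many-to-one form, recording a second reviewer silently loses the first; in the many-to-many form, both collaborators are kept.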

For each relationship we've defined, we now must ask the question: Is this a one-to-one relationship, a one-to-many relationship, or a many-to-many relationship? Let's step through each relationship.

  • Can a review be written by more than one reviewer? Yes.

  • Can a reviewer write more than one review? Yes.

  • Thus reviewer to review is a many-to-many relationship.

  • Can a movie have more than one actor? Unless the actor is Spalding Gray, yes, and the same goes for director.

  • Can an actor be in more than one movie? Unfortunately, in some cases, yes.

  • Thus actor to movie is a many-to-many relationship.

  • Can a review be about more than one movie? Yes.

  • Can a movie be reviewed more than once? Undoubtedly.

  • Thus movie to review is a many-to-many relationship.
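
The answers above can be sketched in Python as sets of pairs, one set per many-to-many relationship. Everything here is an assumption made for illustration: the string identifiers, the relationship names, and in particular the reading of "reviews searchable by actor" as "reviews about movies in which that person acts." The sketch only suggests how the search requirements (7 through 12) could be answered by following relationships.

```python
# Hypothetical identifiers; each relationship is a set of pairs.
acts_in = {("p1", "m1"), ("p2", "m1"), ("p1", "m2")}    # (person, movie)
directs = {("p3", "m1"), ("p3", "m2")}                  # (person, movie)
authors = {("p4", "r1"), ("p5", "r1"), ("p4", "r2")}    # (person, review)
is_about = {("r1", "m1"), ("r2", "m1"), ("r2", "m2")}   # (review, movie)


def movies_by_actor(person):
    return {m for p, m in acts_in if p == person}


def movies_by_director(person):
    return {m for p, m in directs if p == person}


def reviews_by_movie(movie):
    return {r for r, m in is_about if m == movie}


def reviews_by_reviewer(person):
    return {r for p, r in authors if p == person}


def reviews_by_actor(person):
    # Two hops: person -> movies acted in -> reviews about those movies.
    movies = movies_by_actor(person)
    return {r for r, m in is_about if m in movies}


def reviews_by_director(person):
    # Two hops: person -> movies directed -> reviews about those movies.
    movies = movies_by_director(person)
    return {r for r, m in is_about if m in movies}
```

Note that none of these lookups would be possible if actors, directors, and reviewers existed only as text inside the review.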

It's beginning to look like our data model consists mostly of many-to-many relationships between entities. Our new diagram is shown in Figure 4-7.

Figure 4-7. CyberCinema data modeling diagram with relationships


Another Layer of Complexity: Adding Media

If you have been paying attention, you may have noticed that we didn't correctly list all the requirements for our CyberCinema Web site. We left out one requirement: that reviews can include photographs or other media (such as video or sound clips). One approach to modeling this layer of complexity might be to leave media out of your data model entirely. The reasoning might go something like this:

Well, this is for a Web site, so we can embed the images or media into the articles using HTML tags while the actual files can sit on disk somewhere. According to the requirements document, we're never going to search for or look up the reviews based on media, so we're good to go with the previously conceived data model.

Unfortunately, using this reasoning, you've just lost the ability to manage your media. For example, consider that the publishers of the site, after CyberCinema is launched, could decide they want the search results screens to display icons next to review titles, indicating whether each review contains images, a video clip, or a sound clip. If you base your data model on the previous reasoning, you're in serious trouble.

The next instinct of a novice data modeler is usually to allow graphics to be part of the data model, but allow only one piece of media per review. That will keep everyone happy and the data model simple. Using this reasoning, you're imposing an artificial constraint on your review authors, based only on your prior experience that a one-to-one relationship is easier to implement than a many-to-many relationship. Although it's a valid reason and follows the “keep it simple” mantra, such reasoning is another case of falling into the trap of implementation-specific thinking.

For the purposes of our abstract data model, all we need to know is that media entities (for simplicity's sake, we'll wrap them into one entity) can be embedded in reviews and have a many-to-many relationship with reviews. A new data model diagram incorporating these changes is shown in Figure 4-8.

Figure 4-8. Revised CyberCinema data model diagram, incorporating new “media” entity
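
To see why keeping media in the model pays off, consider the icon example from earlier in this section. The Python sketch below is an illustration under assumptions: the Media class, its kind values ("image," "video," "sound"), the review identifiers, and the media_kinds helper are all hypothetical. It shows only that, once media is an entity related to reviews, the question "what kinds of media does this review contain?" has a direct answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Media:
    media_id: int
    kind: str  # assumed values: "image", "video", "sound"


photo = Media(1, "image")
clip = Media(2, "video")
theme = Media(3, "sound")

# Many-to-many: (review_id, media) pairs. One review can embed several
# media items, and one media item can appear in several reviews.
embeds = {(500, photo), (500, clip), (501, photo), (502, theme)}


def media_kinds(review_id: int) -> set[str]:
    """Kinds of media embedded in a review, e.g. to choose search-result icons."""
    return {media.kind for rid, media in embeds if rid == review_id}
```

Had the media been buried in HTML tags inside the review text, answering the same question would mean parsing every review.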

