Chapter 35. Herb's Text Processing Tangle

Herb was trying to rationalize a series of programs that processed text. Each program operated on a stream of text and marked it in some way: names, phrases, places, organizations, and so on. The trouble was that each program assumed that certain other programs had already operated on the text stream. This made it difficult to add to or change any of the programs in any significant way.

He went to Sandy, who had just finished using Patterns in her project with Arthur. "I hear you're now an expert on design patterns," Herb said. "Is there a pattern or two in that list that can help me?"

"There might be," Sandy said. "But you should recognize that they aren't magic bullets, and you don't find them everywhere. They're just reusable software methodologies, but let's look; one might apply here."

After Herb described his problem, Sandy thumbed through Design Patterns for a moment. "It looks to me like this series of programs could be described as a Chain of Responsibility pattern," she said, "but you need to revise or subclass them so that they all have the same interface. Then you can send your text data down the chain and link the chain members more easily."
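Sandy's suggestion might be sketched roughly as follows. This is only an illustration under stated assumptions: the names TextStage, AbstractStage, NameMarker, and PlaceMarker are hypothetical, the sample words are made up, and each stage here does its own marking and then passes the text on, which is one common way to apply the pattern to a stream of text.

```java
// Hypothetical common interface: every stage looks the same to its neighbors.
interface TextStage {
    void setNext(TextStage next);
    String process(String text);
}

// Shared chaining logic: do this stage's work, then hand the text down the chain.
abstract class AbstractStage implements TextStage {
    private TextStage next;

    public void setNext(TextStage next) {
        this.next = next;
    }

    public String process(String text) {
        String marked = mark(text);
        return next == null ? marked : next.process(marked);
    }

    // Each concrete stage only supplies its own marking step.
    protected abstract String mark(String text);
}

// Illustrative stage: marks one hard-coded name (a real stage would use a lexicon).
class NameMarker extends AbstractStage {
    protected String mark(String text) {
        return text.replace("Herb", "<name>Herb</name>");
    }
}

// Illustrative stage: marks one hard-coded place.
class PlaceMarker extends AbstractStage {
    protected String mark(String text) {
        return text.replace("Boston", "<place>Boston</place>");
    }
}

class ChainDemo {
    // Links the stages in the order given and returns the head of the chain.
    static TextStage link(TextStage... stages) {
        for (int i = 0; i < stages.length - 1; i++) {
            stages[i].setNext(stages[i + 1]);
        }
        return stages[0];
    }

    public static void main(String[] args) {
        TextStage chain = link(new NameMarker(), new PlaceMarker());
        System.out.println(chain.process("Herb flew to Boston."));
    }
}
```

Because every stage shares one interface, a stage can be added, removed, or reordered by changing only the call to link.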

"That's easy enough," Herb said, "but what about the problem that some of these modules do some processing and others don't? This object recognizes some HTML tags so it can find new paragraphs, but these others don't. Wouldn't it be better if this weren't scattered across several classes?"

"It would be," Sandy agreed. "So early in the chain why don't you create a new object that parses out the HTML and just sends the rearranged text on to the remaining modules? You won't have to change the later modules because they will no longer see any HTML. This is still a Chain of Responsibility but with one more class in the chain."

"Well, this is a great theory," Herb agreed, "but now I have to write my own HTML parser. Thanks a lot."

"Actually," Sandy pointed out, "you only have to check for a few items and skip over and eliminate the rest. For example, you need to add a blank line wherever you find a <p> tag. The same applies to list tags and header tags. There isn't too much more to this. You might want to pull out the document titles as you go along, but that's about all."
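The handful of rules Sandy lists could be approximated with a few regular expressions. The sketch below is an assumption-laden illustration, not a real HTML parser: the class name HtmlStripper and its methods are hypothetical, and the patterns ignore comments, scripts, and character entities.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical early stage in the chain: removes HTML so later stages never see it.
class HtmlStripper {
    // Captures the document title, if there is one.
    private static final Pattern TITLE =
            Pattern.compile("(?is)<title[^>]*>(.*?)</title>");
    // Opening paragraph, list, and header tags become blank lines.
    private static final Pattern BREAKING =
            Pattern.compile("(?i)<(p|li|ul|ol|h[1-6])(\\s[^>]*)?>");
    // Everything else that looks like a tag is skipped and eliminated.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    private String title = "";

    String strip(String html) {
        Matcher m = TITLE.matcher(html);
        if (m.find()) {
            title = m.group(1).trim();   // pull out the title as we go along
            html = m.replaceFirst("");   // and keep it out of the body text
        }
        String text = BREAKING.matcher(html).replaceAll("\n\n");
        return ANY_TAG.matcher(text).replaceAll("");
    }

    String getTitle() {
        return title;
    }
}
```

For example, calling strip on a page containing a title element and two paragraph tags returns the paragraph text separated by blank lines, and getTitle then returns the title text.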

"And how do I go about this? It still seems like a lot of work," Herb protested.

"I'd use the Interpreter pattern," Sandy answered. "Its advantage is that every item that you need to interpret is handled in a separate class, and none of them is very complicated."
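In the loose sense Sandy describes, with one small class per item to be interpreted, the idea might look something like the sketch below. All of the names (TagRule, ParagraphRule, HeaderRule, TagInterpreter) are hypothetical, and the scanner is deliberately naive: it assumes every tag is closed with a ">" and does not handle comments or quoted ">" characters.

```java
import java.util.List;

// Hypothetical interface: one small class per kind of tag worth interpreting.
interface TagRule {
    boolean matches(String tagName);    // lower-case name, no brackets or attributes
    void interpret(StringBuilder out);  // what this tag contributes to the output
}

class ParagraphRule implements TagRule {
    public boolean matches(String tagName) { return tagName.equals("p"); }
    public void interpret(StringBuilder out) { out.append("\n\n"); }
}

class HeaderRule implements TagRule {
    public boolean matches(String tagName) { return tagName.matches("h[1-6]"); }
    public void interpret(StringBuilder out) { out.append("\n\n"); }
}

class TagInterpreter {
    private final List<TagRule> rules;

    TagInterpreter(List<TagRule> rules) {
        this.rules = rules;
    }

    String interpret(String html) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < html.length()) {
            int open = html.indexOf('<', pos);
            int close = open < 0 ? -1 : html.indexOf('>', open);
            if (close < 0) {                    // no further complete tag
                out.append(html, pos, html.length());
                break;
            }
            out.append(html, pos, open);        // plain text before the tag
            String name = tagName(html.substring(open + 1, close));
            for (TagRule rule : rules) {        // first rule that recognizes the tag wins
                if (rule.matches(name)) {
                    rule.interpret(out);
                    break;
                }
            }                                   // unrecognized tags are simply dropped
            pos = close + 1;
        }
        return out.toString();
    }

    // Extracts the tag name: the text up to the first whitespace, lower-cased.
    private static String tagName(String inside) {
        String s = inside.trim().toLowerCase();
        int end = 0;
        while (end < s.length() && !Character.isWhitespace(s.charAt(end))) {
            end++;
        }
        return s.substring(0, end);
    }
}
```

Supporting list tags or titles would then mean adding one more small rule class rather than touching the scanner.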

"And I'll bet I can guess how you connect these little Interpreter objects," Herb replied. "Another Chain of Responsibility. This really simplifies my work." Then his face fell. "I just remembered. Some of the documents are going to be in XML. That's just enough to throw the whole design for a loop."

"Not really," Sandy pointed out. "You could use the Strategy pattern to select an HTML or XML parser as the first step in the chain. That's not hard at all, and there are lots of Java-based XML parsers available."
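A minimal sketch of that selection step follows. The names MarkupParser, HtmlParser, XmlParser, and ParserSelector are hypothetical, and choosing by a leading "<?xml" declaration is only an assumed heuristic; a real project might go by file extension or content type instead. The XML side leans on the DOM parser that ships with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical strategy interface: both parsers turn markup into plain text.
interface MarkupParser {
    String toPlainText(String document);
}

// Simplified HTML strategy: blank line at each <p>, all other tags dropped.
class HtmlParser implements MarkupParser {
    public String toPlainText(String document) {
        return document
                .replaceAll("(?i)<p(\\s[^>]*)?>", "\n\n")
                .replaceAll("<[^>]*>", "");
    }
}

// XML strategy: delegates to the JDK's DOM parser and keeps only the text content.
class XmlParser implements MarkupParser {
    public String toPlainText(String document) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(document)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("document is not well-formed XML", e);
        }
    }
}

class ParserSelector {
    // Assumed heuristic: documents that start with an XML declaration are XML.
    static MarkupParser forDocument(String document) {
        return document.trim().startsWith("<?xml") ? new XmlParser() : new HtmlParser();
    }
}
```

The rest of the chain would call only toPlainText, so it would not need to know which parser was chosen.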

"Terrific!" said Herb. "I hope the rest of the project is as easy as this is turning out to be."

"Here," said Sandy, "why don't you borrow my copy of Design Patterns until the copy you're going to order comes in."
