Chapter 9

The PACK and UNPACK Operators I: The Single-Attribute Case

This chapter and the next build on the notions introduced in the previous chapter—viz., expanded form and collapsed form, which apply to sets of intervals as such—to define two further canonical forms, viz., unpacked and packed form, which apply to relations with zero or more attributes of some interval type. The present chapter considers what’s probably the most important special case, viz., the case of unpacking or packing a relation on the basis of exactly one interval attribute. In particular, the usefulness of the two canonical forms, and the corresponding UNPACK and PACK operators, in dealing with the sample queries from Chapter 5 is clearly demonstrated.

Keywords

UNPACK; PACK; unpacked form; packed form

Relations are simply a tedious pack.

—Oscar Wilde:

A Woman of No Importance (1893)

As indicated at the end of Chapter 8, the purpose of the present chapter is to introduce and describe certain relational operators that build on the operators COLLAPSE and EXPAND discussed in that previous chapter. The operators in question are called PACK and UNPACK, respectively;1 PACK builds on COLLAPSE and UNPACK builds on EXPAND.

Preliminary Examples

We begin with a couple of preliminary examples that should help you understand the detailed discussions to appear in subsequent sections. Consider the following relation (let’s call it r):2

image

“Packing” relation r on its interval attribute DURING—which we express formally as PACK r ON (DURING)—gives the following packed form of the relation:

image

Informally, this result consists of a restructured version of the original relation r; it represents the same information as r—in fact, it’s formally equivalent to r, in a sense to be explained in Chapter 10—but the representation has been restructured in such a way that no two DURING intervals for a given supplier either meet or overlap. The effect of that restructuring, or packing, is thus to let us view the information content of r in a clumped form, without having to worry about the possibility that distinct clumps might meet or overlap. The relevance of COLLAPSE to such restructuring should be obvious.3

Analogously, unpacking that same original relation r on DURING, which we express formally as UNPACK r ON (DURING), gives the following unpacked form of the relation:

image

Informally, this result also consists of a restructured version of the original relation r; it represents the same information as r—again, it’s formally equivalent to r—but the representation has been restructured in such a way that every DURING value is a unit interval specifically. The effect of that restructuring, or unpacking, is thus to let us view the information content of r at an atomic level, without having to worry about the many different ways in which that information might be bundled together into clumps. The relevance of EXPAND to such restructuring should be obvious.4

In the next two sections, we explain the PACK and UNPACK operators in detail. Our explanations are based on examples that are based in turn on Queries A and B from the very end of Chapter 5. Just to remind you, here first are slightly simplified restatements of those queries:

Queries (fully temporal database with FROM-TO pairs, Fig. 5.3):

■ Query A: Get SNO-FROM-TO triples such that FROM and TO together designate a maximal interval during which supplier SNO was able to supply at least one part. Note: We remind you that the term maximal interval, in the case at hand, means that supplier SNO was unable to supply any part at all on the day immediately before FROM or immediately after TO. We remind you also that the result of the query might contain several tuples for the same supplier (but with different intervals, of course; moreover, those different intervals will neither meet nor overlap).

■ Query B: Get SNO-FROM-TO triples such that FROM and TO together designate a maximal interval during which supplier SNO was unable to supply any parts at all. Note: Again the result might contain several tuples for the same supplier.

Of course, we need to restate these queries in terms of the database of Fig. 6.1 (i.e., the version of the database containing intervals as such, instead of explicit FROM-TO pairs. For convenience, however, we first repeat in Fig. 9.1 the sample values from Fig. 6.1:

image
Fig. 9.1 Simplified suppliers-and-shipments database (fully temporal version using intervals)–sample values

Now we can restate the queries:

Queries (fully temporal database with DURING intervals, Fig. 9.1):

■ Query A: Get SNO-DURING pairs such that DURING designates a maximal interval during which supplier SNO was able to supply at least one part.

■ Query B: Get SNO-DURING pairs such that DURING designates a maximal interval during which supplier SNO was unable to supply any parts at all.

It might help to show the results of these queries, given the sample values from Fig. 9.1. Here they are—Query A on the left, Query B on the right:

image

In what follows, we’ll refer to these two result relations as ResultA and ResultB, respectively. Observe in particular that the DURING values in these relations are all maximal intervals.

Packed Form

We now focus on Query A specifically. First of all, it’s intuitively obvious that Query A can be answered from SP_DURING alone—we don’t need to inspect S_DURING at all. Now, you might recall that the very first version of the query, which applied to the nontemporal version of the database (Fig. 5.1) and was discussed in Chapter 5, involved a simple projection operation on SP. So you might not be surprised to learn that the restated version of the query for the database of Fig. 9.1 is going to involve an operator on SP_DURING that some writers like to call “temporal projection”—though we hasten to add that we don’t much care for this nomenclature ourselves, because the operator doesn’t apply only to temporal intervals as such.

Be that as it may, we’ll build up our formulation of the query one step at a time, using Tutorial D’s WITH construct. Here’s the first step:

WITH ( t1 := SP_DURING { SNO , DURING } ) :

This step simply projects away part numbers, which are irrelevant to the query under consideration, and introduces the name t1 for the result of that projection. So here’s t1 (observe in particular that certain tuples in this result derive from more than one tuple in SP_DURING—in other words, the projection has “eliminated duplicates,” to use the common parlance):

image

Observe next that t1 contains redundant information; for example, it tells us three times that supplier S1 was able to supply something on day 6. By contrast, the desired result, ResultA, contains no such redundancies. Now, as you’ve probably realized, that desired result is in fact the packed form of t1 on DURING—and please note very carefully that a DURING value for a given supplier in that packed form doesn’t necessarily exist as a DURING value for that supplier in the relation t1 from which that packed form is derived. In our example, this remark applies to supplier S4 in particular (but to supplier S4 only, as it happens).

Now, we’ll eventually reach a point where we can obtain that desired result by means of a simple expression of the form:

PACK t1 ON ( DURING )

As already indicated, however, we want to build up to that point gradually. The next step is:

WITH ( t2 := t1 GROUP { DURING } AS X ) :

Aside: The source expression on the right of the “:=” symbol in the assignment here could equally well have been the following EXTEND expression:

EXTEND t1 { SNO } : { X := !!t1 }

In other words, the X value corresponding to any given tuple t of t1 is, precisely, the image relation within t1 itself of the SNO value in that tuple t. End of aside.

So here’s t2 (note in particular that attribute X is relation valued):

image

Next, we replace each of those X values by its collapsed form, using the “what if” form of EXTEND and the COLLAPSE operator from the previous chapter:

WITH ( t3 := EXTEND t2: { X := COLLAPSE ( X ) } )

Here’s t3:

image

Finally, we ungroup:

t3 UNGROUP X

And this ungrouping yields the relation we earlier called ResultA. Thus, bringing all of the steps together and simplifying slightly, ResultA is the result of evaluating this expression:

WITH ( tl := SP_DURING { SNO , DURING } ,

   t2 := tl GROUP { DURING } AS X ,

   t3 := EXTEND t2 : { X := COLLAPSE ( X ) } ) :

t3 UNGROUP X

Now, it would obviously be desirable to be able to get from t1 to ResultA in a single operation. To that end, we introduce (at last!) our new PACK operator. Here’s the definition:

Definition (single-attribute PACK): Let relation r have an interval attribute A. Then the expression PACK r ON (A) denotes the packing of r on A, and it’s defined to be shorthand for the following:

WITH ( rl := r GROUP { A } AS X, r2 := EXTEND rl : { X := COLLAPSE ( X ) } ) :

r2 UNGROUP X

And it’s clear from this definition that, as claimed earlier, PACK is really just shorthand—but of course it’s extremely useful shorthand. It should also be clear (but it might help to spell the point out explicitly) that packing a relation on some attribute A involves partitioning that relation on all of its attributes apart from A. For instance, consider relation t1 from the example we used above to explain the packing process. That relation had attributes SNO and DURING, and when we packed it on DURING, we effectively formed a partition consisting of the tuples for supplier S1, another consisting of the tuples for supplier S2, and so on. (Note, however, that there’s no suggestion that each such partition will contribute just one tuple to the final packed result. That is, the expression PACK r ON (A) might return a result with several tuples for any given value of B, where B is all of the attributes of r apart from A. By way of example, see relation ResultA above, which has two tuples for supplier S2.)

To get back to Query A, we can now offer the following as a reasonably concise formulation of that query:

PACK SP_DURING { SNO , DURING } ON ( DURING )

As indicated earlier, the overall operation denoted by this expression is an example of what’s sometimes called temporal projection (see, e.g., reference [62]). To be specific, it’s the “temporal projection” of SP_DURING on SNO and DURING. We’ll have quite a lot more to say about such “temporal” operators in general in Chapter 11.

Unpacked Form

We turn now to Query B (“Get SNO-DURING pairs such that DURING designates a maximal interval during which supplier SNO was unable to supply any parts at all”). Intuitively, it should be clear that, unlike Query A, this one can’t be answered from SP_DURING alone—we need to inspect S_DURING as well. In fact, as you might recall, the very first version of this query, for the nontemporal version of the database, involved taking the difference between … well, not exactly S and SP as such, but, rather, certain projections of S and SP. But, ignoring this point of detail for the moment, if now you’re expecting to see something that might be called a “temporal difference,” then of course you’re right. What’s more, as you might also be expecting, while “temporal projection” involved the PACK operator, “temporal difference” is going to involve the UNPACK operator. (Actually it’s going to involve PACK as well, as we’ll see.)

In essence, then, what we need to do for Query B is look for SNO-DURING pairs that (a) are either contained in or implied by S_DURING and (b) are neither contained in nor implied by SP_DURING, because (speaking very loosely):

■ SNO-DURING pairs that are contained in or implied by S_DURING show when suppliers were under contract.

■ SNO-DURING pairs that are contained in or implied by SP_DURING show when suppliers were able to supply something.

■ The difference between the first of these two sets of SNO-DURING pairs and the second (in that order) shows when suppliers were under contract but unable to supply anything.

This brief outline should be sufficient to suggest, correctly, that (again in essence) what we need to do is perform a couple of unpack operations and then take the difference between the results. So first let’s introduce the UNPACK operator:

Definition (single-attribute UNPACK): Let relation r have an interval attribute A. Then the expression UNPACK r ON (A) denotes the unpacking of r on A, and it’s defined to be shorthand for the following:

WITH ( r1 := r GROUP { A } AS X , r2 := EXTEND r1 : { X := EXPAND ( X ) } ) :

r2 UNGROUP X

As you can see, this definition is identical to that for PACK, except for the appearance of EXPAND rather than COLLAPSE in the second line of the expansion.

Returning to Query B, we can obtain the left operand we need (i.e., SNO-DURING pairs that are contained in or implied by S_DURING) as follows:

UNPACK S_DURING { SNO , DURING } ON ( DURING )

(Of course, the subexpression S_DURING {SNO,DURING} here could be simplified to just S_DURING, since that projection is in fact an identity projection. We show the projection explicitly for reasons of clarity.) Here’s the expanded form of this UNPACK:

WITH ( r1 := S_DURING { SNO , DURING } GROUP { DURING } AS X , r2 := EXTEND r1 : { X := EXPAND ( X ) } ) :
r2 UNGROUP X

Given the sample data of Fig. 9.1, the result of this expression—let’s call it u1—looks like this:

image

Relation u1 is the unpacked form of S_DURING on DURING, and it’s the left operand for the difference operation we’re gradually building up to. Of course, the right operand (i.e., SNO-DURING pairs that are contained in or implied by SP_DURING) is obtained in like fashion:

UNPACK SP_DURING { SNO, DURING } ON ( DURING )

The result of this expression—let’s call it u2—looks like this:

image

Now we can form the difference:5

u1 MINUS u2

The result of this expression, u3 say, looks like this:

image

Finally, we pack u3 on DURING to obtain the desired overall result:

PACK u3 ON ( DURING )

And that final result is, of course, exactly ResultB as shown previously:

image

Here then is a formulation of Query B as a single expression:6

PACK
  ( ( UNPACK S_DURING { SNO , DURING } ON ( DURING ) )
    MINUS
    ( UNPACK SP_DURING { SNO , DURING } ON ( DURING ) ) )
ON ( DURING )

As already noted, the overall operation denoted by this expression is an example of what’s sometimes called temporal difference. More precisely, it’s a “temporal difference” between (a) the projection of S_DURING on SNO and DURING and (b) the projection of SP_DURING on SNO and DURING (in that order). Again, we’ll have much more to say on such matters in Chapter 11; here we content ourselves with a few fairly obvious remarks regarding the semantics of the PACK and UNPACK operators as such. Here first is a repeat of the formal definition of UNPACK:

UNPACK r ON ( A ) def¯¯image

  WITH ( r1 := r GROUP { A } AS X , r2 := EXTEND r1 : { X := EXPAND ( X ) } ) :
  r2 UNGROUP X

Observe now that:

■ Unpacking r on A (just like packing r on A) involves partitioning r on all of its attributes apart from A.

■ Like PACK, UNPACK is really just shorthand; in particular, it’s defined in terms of the EXPAND operator from Chapter 8.

■ Like the operators COLLAPSE and EXPAND on which they’re based, PACK and UNPACK are not inverses of each other. That is, neither of the expressions
UNPACK ( PACK r ON ( A ) ) ON ( A )
and
PACK ( UNPACK r ON ( A ) ) ON ( A )
is guaranteed to return r, in general. In fact, the following identities clearly hold:
UNPACK ( PACK r ON ( A ) ) ON ( A ) ≡ UNPACK r ON ( A )
PACK ( UNPACK r ON ( A ) ) ON ( A ) ≡ PACK r ON ( A )
It follows that the first operation in a PACK-then-UNPACK or UNPACK-then-PACK sequence on some given relation can simply be ignored, a fact that could be useful for optimization purposes (especially when that first operation is UNPACK).7

■ The following identities clearly hold as well:
UNPACK ( UNPACK ( r ) ON ( A ) ) ON ( A ) ≡ UNPACK ( r ) ON ( A )
PACK ( PACK ( r ) ON ( A ) ) ON ( A ) ≡ PACK ( r ) ON ( A )

Further Queries

In this section, we give some further examples of the use of PACK and UNPACK in formulating queries. We assume the result is required in packed form in each case.

Our first example is deliberately not a temporal one. We’re given a relvar NHW, with attributes NAME, HEIGHT, and WEIGHT, giving the height and weight of certain persons. (You might like to try writing out a sample value for this relvar and using that sample value as a basis for understanding what’s being asked for in the following query, also for illustrating the steps in the subsequent explanation.) The query is “For each weight represented in NHW, get every range of heights such that for each such range hr and for each height in hr there’s at least one person represented in NHW who’s of that height and weight.” Possible formulation:

PACK
  ( ( EXTEND NHW { HEIGHT , WEIGHT } : { HR := INTERVAL_HEIGHT ( [ HEIGHT : HEIGHT ] ) } ) { WEIGHT , HR } )
ON ( HR )

Explanation: We begin by projecting NHW over HEIGHT and WEIGHT, thereby obtaining all height : weight pairs in the original relation (i.e., all height : weight pairs such that there’s at least one person of that height and weight). We then extend that projection by introducing another attribute, HR, whose value in any given tuple is a unit interval of the form [h:h], where h is the HEIGHT value in that same tuple (note the invocation of the interval selector INTERVAL_HEIGHT). We then project away the HEIGHT attribute and pack the result on HR. The final result is a relation with two attributes, WEIGHT and HR, and predicate as follows:

HR denotes a maximal interval of heights such that, for all heights h in HR, there exists at least one person p such that p has height h and weight WEIGHT.

Note carefully that this example is indeed, as stated, not a temporal one—the intervals involved represent ranges of heights, not ranges of temporal values. To be specific, they’re of type INTERVAL_HEIGHT, where HEIGHT is the applicable point type.

By way of a second example, consider relvar SP_DURING once again (Fig. 9.1). At any given time, if there are any shipments at all at that time, then there’s some part number pmax such that, at that time, (a) at least one supplier is able to supply part pmax, but (b) no supplier is able to supply any part with a part number greater than pmax. (Obviously we’re assuming here that the operator “>” is defined for values of type PNO.) So consider the query “For each part number that has ever been such a pmax value, get that part number together with the interval(s) during which it actually was that pmax value.” Here’s a possible formulation:

WITH ( t1 := UNPACK SP_DURING ON ( DURING ) ,
   t2 := EXTEND t1 { DURING } : { PMAX := MAX ( !!t1 , PNO ) } ) :
PACK t2 ON ( DURING )

Our third and last example is based on relvar S_PARTS_DURING from the section “A More Searching Example” in Chapter 6. For convenience, we show a sample value for that relvar in Fig. 9.2 (a repeat of Fig. 6.2):8

image
Fig. 9.2 A relvar with two interval attributes (S_PARTS_DURING)–sample value

Consider the query “For each part that has ever been capable of being supplied by supplier S3, get the part number and the applicable intervals of time.” Given the values shown in Fig. 9.2, the desired result looks like this:

image

Here’s a possible formulation of this query (note the reliance in line 4 on the fact that, thanks to the UNPACK in line 3, PARTS values in t3 are unit intervals):

WITH ( t1 := S_PARTS_DURING WHERE SNO = SNO ( 'S3' ) ,
    t2 := t1 { PARTS , DURING } ,
    t3 := UNPACK t2 ON ( PARTS ) ,
    t4 := EXTEND t3 : { PNO := POINT FROM PARTS } ,
    t5 := t4 { PNO , DURING } ) :
PACK t5 ON ( DURING )

Exercises

9.1 Give a formulation of the query just discussed—“For each part that has ever been capable of being supplied by supplier S3, get the part number and the applicable intervals of time”—using our usual relvar SP_DURING instead of S_PARTS_DURING.

9.2 Give formulations of the query “For each day on which some part has been capable of being supplied by supplier S3, get that day and the applicable ranges of parts,” using (a) relvar SP_DURING; (b) relvar S_PARTS_DURING.

9.3 Can you find a pair of relations r1 and r2 such that the unpacked form of r1 is a proper subset of the unpacked form of r2 but the packed form of r1 is a proper superset of the packed form of r2 (where the packing and unpacking operations are done on the basis of the same attribute in every case, of course)?

Answers

9.1 WITH ( t1 := SP_DURING WHERE SNO = SNO ( 'S3' ) ,
    t2 := t1 { PNO , DURING } ) :
PACK t2 ON ( DURING )
Note: If we can rely on relvar SP_DURING being kept in packed form—see Chapter 13—the final PACK step here will be unnecessary (why, exactly?).

9.2 

a. WITH ( t1 := SP_DURING WHERE SNO = SNO ( 'S3' ) ,
    t2 := t1 { PNO , DURING } ,
    t3 := UNPACK t2 ON ( DURING ) ,
    t4 := EXTEND t3 : { DAY := POINT FROM DURING } ,
    t5 := EXTEND t4 : { PARTS := INTERVAL_DATE ( [ DAY : DAY ] ) } ) :
PACK t5 { DAY , PARTS } ON ( PARTS )

b. One solution here (perhaps not the most efficient) is just to unpack S_PARTS_DURING on PARTS; then, if t0 is the result of that UNPACK, the solution to Exercise 9.2a applies directly, with t0 playing the role of SP_DURING. Alternatively:
WITH ( t1 := S_PARTS_DURING WHERE SNO = SNO ( 'S3' ) ,
    t2 :=t1 { PARTS , DURING } ,
    t3 :=UNPACK t2 ON ( DURING ) ,
    t4 :=EXTEND t3 : { DAY := POINT FROM DURING } ,
PACK t4 { DAY , PARTS } ON ( PARTS )

9.3 A simple example is as follows:

image

1As with COLLAPSE and EXPAND, other names appear in the literature, earlier writings by the present authors included.

2You might be thinking r is a possible value for relvar S_DURING from Chapter 6, but actually it can’t be, because it’s not consistent with the predicate for that relvar. Indeed, this chapter and the next five, as well as Chapter 16, are all concerned in large part with what’s involved in avoiding such inconsistencies.

3It’s worth noting that the DURING values are now maximal intervals, and hence that this restructured version of the relation (unlike the original version) does satisfy the predicate for—and is therefore a possible value for—relvar S_DURING.

4The predicate satisfied by this unpacked form of the relation is Supplier SNO was under contract on day d where day d is the sole day contained in the interval DURING (i.e., DURING is a unit interval).

5We could have used NOT MATCHING here in place of MINUS, thus: u1 NOT MATCHING u2.

6Again we could have used NOT MATCHING in place of MINUS. In fact, if we did, then the two projections on SNO and DURING wouldn’t be necessary.

7Actually, when we get to the slightly more formal and extended treatment of these matters in the next chapter, we’re going to define PACK to do a preliminary UNPACK anyway. But in the case at hand, where the packing and unpacking are done on the basis of just a single attribute, it’s easy to see the preliminary UNPACK can simply be ignored.

8Of course, relvar S_PARTS_DURING has two interval attributes, not just one. In keeping with the subject matter and title of this chapter, however, the only packing and unpacking we’ll be doing here will be on the basis of just a single attribute. Chapter 10 will examine the question of packing and unpacking on two attributes or more. Note: We remind you also from Chapter 6 that the sample value shown is actually incomplete, but this fact doesn’t materially affect the discussion in any way.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset