Appendix 2

Two Measures of Code Complexity


A2.1. Cyclomatic complexity

Among the myriad code complexity metrics available, cyclomatic complexity is probably the best known. Most code analysis tools can measure it.

Cyclomatic complexity aims to measure the amount of decision logic encapsulated in a software module or a function.

For simplicity, we shall assume henceforth that the module whose complexity we wish to evaluate has a single entry point and a single exit point. Cyclomatic complexity is defined by first associating a control-flow graph with the code under consideration. Nodes in this graph represent elementary statements, while edges, which are directed, represent transfers of control between these statements. This control-flow graph represents all possible execution paths through the module.

Figure A2.1. The flow graph associated with a piece of code containing one decision and one loop

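To make this concrete, here is a minimal sketch of such a piece of code, in Python. The function clip_and_sum and its names are our own hypothetical example, not taken from the figure:

    # Hypothetical example with a single entry point (the call) and a single
    # exit point (the return), containing one loop and one decision.
    def clip_and_sum(values, ceiling):
        total = 0
        for v in values:         # the loop: adds a back edge to the flow graph
            if v > ceiling:      # the decision: splits control flow into two branches
                total += ceiling
            else:
                total += v
        return total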

The earlier definition of Shannon entropy could perhaps encourage us to define the complexity of the code simply as the total number of such execution paths (or its logarithm). This, however, would not account for the over-counting due to the many paths that are not truly independent. In other words, we should be aware that many paths can be obtained by combining others. Cyclomatic complexity Ccycl takes precisely this into account: assuming an entry point and an exit point are defined, Ccycl is the maximal number of linearly independent paths through the control-flow graph1.

This rather abstract-looking definition of Ccycl fortunately turns out to be very easy to compute. Let E and N denote, respectively, the number of edges and the number of nodes in the control-flow graph. Then

\[ C_{\mathrm{cycl}} = E - N + 2 \]
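As an illustration of the formula, here is a sketch (plain Python; the node names are our own labels for the statements of the hypothetical clip_and_sum above):

    # Control-flow graph of clip_and_sum as an adjacency list:
    # nodes are elementary statements, directed edges are transfers of control.
    cfg = {
        "entry":       ["loop_test"],
        "loop_test":   ["if_test", "return"],     # iterate or leave the loop
        "if_test":     ["add_ceiling", "add_v"],  # the one decision
        "add_ceiling": ["loop_test"],
        "add_v":       ["loop_test"],
        "return":      [],
    }

    N = len(cfg)                                  # 6 nodes
    E = sum(len(s) for s in cfg.values())         # 7 edges

    print(E - N + 2)                              # 3

The value 3 matches the two predicates of the function (the loop test and the IF) plus one, a fact we return to below.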

The formula is so simple that many books take it as the definition. But this really obscures the true combinatorial significance of cyclomatic complexity in terms of independent paths. Further insight into this quantity is gained by considering two more facts. Let Ntest denote the number of test cases necessary to cover all execution branches of the module and Nif the number of “IF” statements in the module. Naturally, we could expect the number of test cases to coincide with the number of independent paths. However, the logic of the program might forbid some combinations, thus

\[ N_{\mathrm{test}} \leq C_{\mathrm{cycl}} \]

In other words, the number of test cases is never larger than the cyclomatic complexity. Thus, Ccycl can be seen as a measure of the amount of effort needed to test the program. There is another statement which substantiates the idea that Ccycl measures something like the amount of logical complexity contained in a software module and, hence, the amount of effort needed to understand it, namely

\[ C_{\mathrm{cycl}} = N_{\mathrm{if}} + 1 \]

This second statement is less obvious, but it is nevertheless true.

Practical wisdom has suggested that a module should not have a cyclomatic complexity much larger than 10. There is substantial evidence that beyond that limit the code becomes just too complicated to be maintained reliably.

Thus, Ccycl bears some very loose relation to something like Shannon entropy, provided we are ready to accept that the various paths are followed randomly (which, strictly speaking, is of course wrong). It is also loosely related to the “Learn” facet of simplicity, as we just noticed. On a more fundamental and quantitative level, however, there is no relation with the concepts from information theory.

Finally, let us mention that the etymology of “cyclomatic” is related to the fact that Ccycl, when defined as E − N + 1, is the number of independent cycles in a connected undirected graph.

A2.2. An example of a scale-invariant complexity measure

Intuitively, scale-invariance for a complexity metric refers to the idea that complexity should remain independent of the scale of description. To define a scale-invariant measure more formally, the first step is to define a mathematically practical abstraction of a multiscale architecture. Recursive graphs do just this2. Rather than giving a formal definition, we simply describe them using Figure A2.2:

Figure A2.2. An example of a recursive graph with three levels


A recursive graph G is nothing but a finite sequence of graphs G = (G1, …, Gn), where G1 is the graph at the coarsest scale (with the least detail) and Gn is the graph at the finest scale (with the most detail); Gn is, in fact, the one that contains all the information about the multiscale structure. A graph Gk is made up of a collection Vk of vertices connected by a set Lk of links. The graph Gk−1 describes a coarser view of Gk in the sense that it contains less information. More precisely, a node in Vk−1 is a non-empty collection of vertices from Vk, and a link in Lk−1 is also a link in Lk (but not necessarily the other way around). Going from G1 to G2 and on up to Gn is like zooming in across a multiscale structure, each step revealing further details.
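As a minimal sketch (our own toy Python encoding, not from [CAS 07]), a recursive graph can be represented as a list of levels together with parent maps recording which coarser node each vertex belongs to:

    # A recursive graph G = (G1, G2, G3) as a list of (vertices, links) pairs;
    # links are sets of frozensets {u, v}. parent[k][v] is the node of V_{k-1}
    # into which the vertex v of V_k has been merged.
    G_levels = [
        ({"A"},           set()),                                   # G1: coarsest
        ({"a1", "a2"},    {frozenset({"a1", "a2"})}),               # G2
        ({"x", "y", "z"}, {frozenset({"x", "y"}), frozenset({"x", "z"}),
                           frozenset({"y", "z"})}),                 # G3: finest
    ]
    parent = [
        None,                                   # G1 has no coarser level
        {"a1": "A", "a2": "A"},                 # V2 -> V1
        {"x": "a1", "y": "a1", "z": "a2"},      # V3 -> V2
    ]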

With these definitions in mind, we can now formulate more precisely what we mean by scale-invariance. To do this, we define two operations on recursive graphs, a zoom-out operation Z and a zoom-in operation X. The first is defined in the following way:

\[ Z[(G_1, \ldots, G_n)] = (G_1, \ldots, G_{n-1}) \]

Note that the zoom-out mapping Z just erases the information contained in the finest-grained graph Gn of the recursive graph G. It really yields a view of the same system at a larger scale. It is thus a non-invertible mapping and, strictly speaking, a zoom-in operation cannot be defined. A partial inverse can, however, be defined in the following way. First, consider all recursive graphs G such that Z[G] = (G1, …, Gn−1). Second, from this (actually infinite) set, keep only those graphs (G1, …, Gn−1, Gn) which satisfy the following two conditions (a small sketch implementing the test follows the list):

– Given a node ν in Vn−1, the set of vertices in Gn that map to this ν under Z should form a complete graph.

– If vertices u1 and u2 in Gn map under Z to two vertices ν1 and ν2 in Vn−1 which are connected by a link in Ln−1, then u1 and u2 should be connected by a link in Ln. Thus, the finer-scale nodes are supposed to be maximally connected. This is surely a strong restriction on the structure of the graphs, but it is indeed necessary to make things work.
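These two conditions translate directly into a membership test in our toy encoding; admissible is our own helper name:

    # Test whether a candidate finest level (V_n, L_n) with parent map parent_n
    # may extend (G1, ..., G_{n-1}); L_coarse is the link set of G_{n-1}.
    def admissible(V_n, L_n, parent_n, L_coarse):
        def linked(u1, u2):
            return frozenset({u1, u2}) in L_n

        # Condition 1: vertices merged into the same coarse node form a complete graph.
        for v in set(parent_n.values()):
            cluster = [u for u in V_n if parent_n[u] == v]
            for i, u1 in enumerate(cluster):
                for u2 in cluster[i + 1:]:
                    if not linked(u1, u2):
                        return False

        # Condition 2: a link between two coarse nodes forces links between
        # all pairs of their finer-scale vertices.
        for u1 in V_n:
            for u2 in V_n:
                if u1 != u2 and frozenset({parent_n[u1], parent_n[u2]}) in L_coarse:
                    if not linked(u1, u2):
                        return False
        return True

    V3, L3 = G_levels[2]
    print(admissible(V3, L3, parent[2], G_levels[1][1]))   # True for the toy graph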

Figure A2.3. The nodes in Gn should be maximally connected


Let us then denote by X[(G1, …, Gn−1)] this restricted set of recursive graphs. Now, obviously, for any H = (G1, …, Gn−1, Hn) belonging to the set X[(G1, …, Gn−1)] we have, by definition of the mapping X, Z[H] = (G1, …, Gn−1). Thus, X is a sort of inverse mapping for the zoom-out mapping Z.

One simple and useful extension of recursive graphs, which we shall need, is that of weighted recursive graphs. It allows us to take into account a unit cost associated with each node of the recursive graph. We assume that there is one weight function wi for each scale i = 1, …, n, which associates a number wi(ν) to each node ν in Vi.

One crucial assumption regarding the set of weight functions wi is their additivity.

Figure A2.4. An example of a path in P(Gn) with repetitions at vertices ν2 and ν3


More specifically, the weight wi(ν) of a node ν in Vi is supposed to be the sum of the weights wi+1(u) of the vertices u in Vi+1 that have been identified within ν; explicitly:

\[ w_i(\nu) = \sum_{u \in \nu} w_{i+1}(u) \]

This assumption puts strong limits on the semantics of the weight system (wi)i=1…n. In particular, it prevents taking into account more complicated aggregations of complexity that would not follow from the mere addition of weights. The true reason for this restriction on the weights (wi)i=1…n is simply that it allows us to define a scale-invariant complexity measure; other, more general weight functions would not. It is thus important to check that a system of weight functions taken from real IT life satisfies this assumption, at least approximately.
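In the toy encoding above, additivity simply means that coarse weights can be recomputed from fine ones by summation, as in this sketch (coarsen_weights is our own helper name):

    # w_{i-1}(v) is the sum of w_i(u) over all u merged into v.
    def coarsen_weights(w_fine, parent_fine):
        w_coarse = {}
        for u, wu in w_fine.items():
            v = parent_fine[u]
            w_coarse[v] = w_coarse.get(v, 0.0) + wu
        return w_coarse

    w3 = {"x": 1.0, "y": 2.0, "z": 4.0}        # weights on the finest level
    w2 = coarsen_weights(w3, parent[2])        # {'a1': 3.0, 'a2': 4.0}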

Scale-invariant measures of complexity C(p), where p is an arbitrary positive integer, can now be defined. For a recursive graph G = (G1, …, Gn) endowed with a weight system w = (wi)i=1…n, the complexity is defined as follows:

\[ C^{(p)}[G, w] = \left( \sum_{(\nu_1, \ldots, \nu_p) \in \mathcal{P}(G_n)} w_n(\nu_1)\, w_n(\nu_2) \cdots w_n(\nu_p) \right)^{1/p} \]

The sum here is over all paths (ν1, …, νp) in the set P(Gn) of paths with repetitions in the finest-scale graph Gn of the recursive graph G. That repetitions are allowed means that a path can connect a node with itself.
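For p = 2, a path with repetitions is just an ordered pair (ν1, ν2) with ν1 = ν2 or {ν1, ν2} a link, so C(2) can be computed directly. A sketch in the same toy encoding (complexity_p2 is our own helper name):

    from math import sqrt

    # C^(2)[G, w]: sum w(v1)*w(v2) over ordered pairs that form a path with
    # repetitions in the finest graph (v1 = v2 allowed), then take the root.
    def complexity_p2(vertices, links, w):
        total = 0.0
        for v1 in vertices:
            for v2 in vertices:
                if v1 == v2 or frozenset({v1, v2}) in links:
                    total += w[v1] * w[v2]
        return sqrt(total)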

The complexity C(p)[G] has the following remarkable properties:

1) C(p)[G] is linear in the weight w in the sense that C(p)[G, λw] = λC(p)[G, w]. This property is trivial, since the pth root compensates for the factor λ^p arising from the p factors wn inside the sum. In other words:

Multiplying all the weights by some factor simply multiplies the complexity by the same factor.

2) C(p)[G] is scale-invariant under zoom-out transformations in the sense that C(p)[Z[G],w] = C(p)[G,w].

The complexity of a recursive graph G and that of its zoomed-out version Z[G] are the same.

This follows immediately by first substituting wn−1(ν) = Σu∈ν wn(u) in the sum defining C(p)[Z[G], w] and then by grouping the paths at level n according to the paths at level n−1.
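This can be checked numerically on the toy recursive graph above (whose finest level was deliberately chosen maximally connected):

    V2, L2 = G_levels[1]
    V3, L3 = G_levels[2]
    print(complexity_p2(V3, L3, w3))   # 7.0 at the fine scale
    print(complexity_p2(V2, L2, w2))   # 7.0 again after zooming out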

3) C(p)[G] is scale-invariant under zoom-in transformations, as described by the zoom-in operation X defined earlier. Recall that for any H in X[(G1, …, Gn−1)], we have Z[H] = (G1, …, Gn−1). For such an H, we have C(p)[H, w′] = C(p)[G, w], provided the original weight system w is extended to a weight system w′ that now includes weights wn on the finest scale Gn. In order for scale-invariance to hold, the weight wn−1(u) of each node u ∈ Vn−1 should be distributed evenly across all ν ∈ u where, remember, ν ∈ Vn. More explicitly, if |u| is the number of Vn vertices inside u, then wn(ν) = wn−1(u)/|u| for any ν ∈ u.

The complexity of a recursive graph G and that of any of its zoomed-in versions in X[G] are the same, provided the weights are distributed evenly on the smallest scale.

This again follows from simple combinatorial analysis.
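The even distribution of weights is equally easy to sketch in the toy encoding (refine_weights is our own helper name):

    # Extend the weights one level down: w_n(v) = w_{n-1}(u) / |u| for v in u.
    def refine_weights(w_coarse, parent_fine):
        sizes = {}
        for u in parent_fine.values():
            sizes[u] = sizes.get(u, 0) + 1
        return {v: w_coarse[u] / sizes[u] for v, u in parent_fine.items()}

    w3_even = refine_weights(w2, parent[2])      # {'x': 1.5, 'y': 1.5, 'z': 4.0}
    print(complexity_p2(*G_levels[2], w3_even))  # 7.0 once more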

Properties 2 and 3 really justify the wording scale-invariant complexity for the metric C(p)[G, w] of a weighted recursive graph. The case p = 1 is trivial, as it does not take the structure of the graph into account. The first non-trivial case is therefore p = 2, and this is the most common choice for p, as it is also quite easy to compute in practice.

Examples

To gain some intuition for this complexity measure, let us look at the following two examples.

Figure A2.5. The spaghetti graph and the hierarchical graph


First, let us look at the spaghetti architecture Gspaghetti with n nodes. Fix any p and give equal weights wi = 1 to all nodes. Since the spaghetti graph is nothing but the complete graph and since we sum over paths with repeated vertices, each node νj in the path (ν1, …, νp) can be chosen independently. There are n^p such paths, from which we conclude that

\[ C^{(p)}[G_{\mathrm{spaghetti}}, w] = (n^p)^{1/p} = n \]

Second, let us look at a hierarchical architecture Ghierarchical with n nodes. Once a starting node ν1 has been chosen, each subsequent node νj in the path (ν1, …, νp) can be chosen in at most four different ways; thus, this time we have

\[ C^{(p)}[G_{\mathrm{hierarchical}}, w] \leq (n \cdot 4^{p-1})^{1/p} \]

Provided that n is large enough and p ≥ 2, we see that C(p)[Gspaghetti, w] ≫ C(p)[Ghierarchical, w], which accounts for the intuitive fact that spaghetti architectures are messier than hierarchical architectures!
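A quick numerical check of this gap for p = 2 and unit weights, reusing the complexity_p2 sketch above (the tree here is a binary hierarchy, one of many possible choices):

    # Complete ("spaghetti") graph versus a binary-tree hierarchy on n nodes.
    def complete_links(vertices):
        vs = list(vertices)
        return {frozenset({a, b}) for i, a in enumerate(vs) for b in vs[i + 1:]}

    def binary_tree_links(n):
        return {frozenset({i, (i - 1) // 2}) for i in range(1, n)}

    n = 63
    V = set(range(n))
    w = {v: 1.0 for v in V}
    print(complexity_p2(V, complete_links(V), w))     # 63.0, i.e. exactly n
    print(complexity_p2(V, binary_tree_links(n), w))  # about 13.7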

A2.3. Conclusion

In this section, we investigated the possibility of defining a scale-invariant measure of complexity. It is indeed possible to define such a quantity, even though this imposes some rather drastic limitations: first, on how the weights of individual nodes at different scales should be related; second, on how the descriptions of the system at various scales are related when going from large to small scales.

For the special case p = 2 this scale-invariant complexity measure is quite easy to compute in practice.

This measure of complexity has no relation to the deep concepts from information theory discussed in section 2.1.1. The metric C(p) combines two ingredients to provide a measure of complexity, namely the weights on the nodes of the recursive graph (through the factors wn(νj)) and the combinatorial analysis of the links between those nodes (through the sum over paths in P(Gn)). The price to pay for computability, as we announced, is some arbitrariness and even some artificiality in the definition.

Let us emphasize that the requirement of scale-invariance that led us to define C(p) is in no way essential. Neither is the additivity of weights a natural condition. We believe this is a good illustration of the high level of arbitrariness implied by computable complexity measures. In this book, we shall instead consider the various scales present in an information system architecture as being independent. Complexity or simplicity should be analyzed within each scale. Nothing is assumed in general about the possibility of adding or combining complexities associated with different scales.

 

 

1 A rigorous definition of the concept of path independence would be outside the scope of this book. The mathematically inclined reader might want to review the concept of the first homology group of a graph. Cyclomatic complexity is actually nothing but the rank of this group.

2 We follow here the treatment proposed by Yves Caseau [CAS 07].
