Appendix 2

Two Measures of Code Complexity


A2.1. Cyclomatic complexity

Among the myriad code complexity metrics available, cyclomatic complexity is probably the best known. Most code analysis tools can measure it.

Cyclomatic complexity aims to measure the amount of decision logic encapsulated in a software module or a function.

For simplicity, we shall assume henceforth that the module whose complexity we wish to evaluate has a single entry point and a single exit point. Cyclomatic complexity is defined by first associating a control-flow graph with the code under consideration. Nodes in this graph represent elementary statements, while edges, which are directed, represent transfers of control between these statements. This control-flow graph represents all possible execution paths through the module.

Figure A2.1. The flow graph associated with a piece of code containing one decision and one loop

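To make this concrete, here is a minimal sketch of such a piece of code, in Python. The function clip_and_sum and its names are our own hypothetical example, not taken from the figure:

    # Hypothetical example with a single entry point (the call) and a single
    # exit point (the return), containing one loop and one decision.
    def clip_and_sum(values, ceiling):
        total = 0
        for v in values:         # the loop: adds a back edge to the flow graph
            if v > ceiling:      # the decision: splits control flow into two branches
                total += ceiling
            else:
                total += v
        return total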

The earlier definition of Shannon entropy could perhaps encourage us to define the complexity of the code simply as the total number of such execution paths (or its logarithm). This, however, would not account for the over-counting due to the many paths that are not truly independent. In other words, we should be aware that many paths can be obtained by combining others. Cyclomatic complexity Ccycl takes precisely this into account: assuming an entry point and an exit point are defined, Ccycl is the maximal number of linearly independent paths through the control-flow graph1.

This rather abstract-looking definition of Ccycl fortunately turns out to be very easy to compute. Let E and N denote, respectively, the number of edges and the number of nodes in the control-flow graph. Then

\[ C_{\mathrm{cycl}} = E - N + 2 \]
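As an illustration of the formula, here is a sketch (plain Python; the node names are our own labels for the statements of the hypothetical clip_and_sum above):

    # Control-flow graph of clip_and_sum as an adjacency list:
    # nodes are elementary statements, directed edges are transfers of control.
    cfg = {
        "entry":       ["loop_test"],
        "loop_test":   ["if_test", "return"],     # iterate or leave the loop
        "if_test":     ["add_ceiling", "add_v"],  # the one decision
        "add_ceiling": ["loop_test"],
        "add_v":       ["loop_test"],
        "return":      [],
    }

    N = len(cfg)                                  # 6 nodes
    E = sum(len(s) for s in cfg.values())         # 7 edges

    print(E - N + 2)                              # 3

The value 3 matches the two predicates of the function (the loop test and the IF) plus one, a fact we return to below.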

The formula is so simple that many books take it as the definition. But this really obscures the true combinatorial significance of cyclomatic complexity in terms of independent paths. Further insight into this quantity is gained by considering two more facts. Let Ntest denote the number of test cases necessary to cover all execution branches of the module and Nif the number of “IF” statements in the module. Naturally, we could expect the number of test cases to coincide with the number of independent paths. However, the logic of the program might forbid some combinations, thus

\[ N_{\mathrm{test}} \leq C_{\mathrm{cycl}} \]

In other words, the number of test cases is never larger than the cyclomatic complexity. Thus, Ccycl can be seen as a measure of the amount of effort needed to test the program. There is another statement which substantiates the idea that Ccycl measures something like the amount of logical complexity contained in a software module and, hence, the amount of effort needed to understand it, namely

\[ C_{\mathrm{cycl}} = N_{\mathrm{if}} + 1 \]

This second statement is less obvious, but it is nevertheless true.

Practical wisdom has suggested that a module should not have a cyclomatic complexity much larger than 10. There is substantial evidence that beyond that limit the code becomes just too complicated to be maintained reliably.

Thus, Ccycl bears some very loose relation to something like Shannon entropy, provided we are ready to accept that the various paths are followed randomly (which, strictly speaking, is of course wrong). It is also loosely related to the “Learn” facet of simplicity, as we just noticed. On a more fundamental and quantitative level, however, there is no relation with the concepts from information theory.

Finally, let us mention that the etymology of “cyclomatic” is related to the fact that Ccycl, when defined as E − N + 1, is the number of independent cycles in a connected undirected graph.

A2.2. An example of a scale-invariant complexity measure

Intuitively, scale-invariance for a complexity metric refers to the idea that complexity should remain independent of the scale of description. To define a scale-invariant measure more formally, the first step is to define a mathematically practical abstraction of a multiscale architecture. Recursive graphs do just this2. Rather than giving a formal definition, we simply describe them using Figure A2.2:

Figure A2.2. An example of a recursive graph with three levels


A recursive graph G is nothing but a finite sequence of graphs G = (G1, …, Gn), where G1 is the graph at the coarsest scale (with the least detail) and Gn is the graph at the finest scale (with the most detail); Gn is, in fact, the one that contains all the information about the multiscale structure. A graph Gk is made up of a collection Vk of vertices connected by a set Lk of links. The graph Gk−1 describes a coarser view of Gk in the sense that it contains less information. More precisely, a node in Vk−1 is a non-empty collection of vertices from Vk, and a link in Lk−1 is also a link in Lk (but not necessarily the other way around). Going from G1 to G2 and on up to Gn is like zooming in across a multiscale structure, each step revealing further details.
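As a minimal sketch (our own toy Python encoding, not from [CAS 07]), a recursive graph can be represented as a list of levels together with parent maps recording which coarser node each vertex belongs to:

    # A recursive graph G = (G1, G2, G3) as a list of (vertices, links) pairs;
    # links are sets of frozensets {u, v}. parent[k][v] is the node of V_{k-1}
    # into which the vertex v of V_k has been merged.
    G_levels = [
        ({"A"},           set()),                                   # G1: coarsest
        ({"a1", "a2"},    {frozenset({"a1", "a2"})}),               # G2
        ({"x", "y", "z"}, {frozenset({"x", "y"}), frozenset({"x", "z"}),
                           frozenset({"y", "z"})}),                 # G3: finest
    ]
    parent = [
        None,                                   # G1 has no coarser level
        {"a1": "A", "a2": "A"},                 # V2 -> V1
        {"x": "a1", "y": "a1", "z": "a2"},      # V3 -> V2
    ]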

With these definitions in mind, we can now formulate more precisely what we mean by scale-invariance. To do this, we define two operations on recursive graphs, a zoom-out operation Z and a zoom-in operation X. The first is defined in the following way:

\[ Z[(G_1, \ldots, G_n)] = (G_1, \ldots, G_{n-1}) \]

Note that the zoom-out mapping Z just erases the information contained in the finest-grained graph Gn of the recursive graph G. It really yields a view of the same system at a larger scale. It is thus a non-invertible mapping and, strictly speaking, a zoom-in operation cannot be defined. A partial inverse can, however, be defined in the following way. First, consider all recursive graphs G such that Z[G] = (G1, …, Gn−1). Second, from this (actually infinite) set, keep only those graphs (G1, …, Gn−1, Gn) which satisfy the following two conditions (a small sketch implementing the test follows the list):

– Given a node ν in Vn−1, the set of vertices in Gn that map to this ν under Z should form a complete graph.

– If vertices u1 and u2 in Gn map under Z to two vertices ν1 and ν2 in Vn−1 which are connected by a link in Ln−1, then u1 and u2 should be connected by a link in Ln. Thus, the finer-scale nodes are supposed to be maximally connected. This is surely a strong restriction on the structure of the graphs, but it is indeed necessary to make things work.
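These two conditions translate directly into a membership test in our toy encoding; admissible is our own helper name:

    # Test whether a candidate finest level (V_n, L_n) with parent map parent_n
    # may extend (G1, ..., G_{n-1}); L_coarse is the link set of G_{n-1}.
    def admissible(V_n, L_n, parent_n, L_coarse):
        def linked(u1, u2):
            return frozenset({u1, u2}) in L_n

        # Condition 1: vertices merged into the same coarse node form a complete graph.
        for v in set(parent_n.values()):
            cluster = [u for u in V_n if parent_n[u] == v]
            for i, u1 in enumerate(cluster):
                for u2 in cluster[i + 1:]:
                    if not linked(u1, u2):
                        return False

        # Condition 2: a link between two coarse nodes forces links between
        # all pairs of their finer-scale vertices.
        for u1 in V_n:
            for u2 in V_n:
                if u1 != u2 and frozenset({parent_n[u1], parent_n[u2]}) in L_coarse:
                    if not linked(u1, u2):
                        return False
        return True

    V3, L3 = G_levels[2]
    print(admissible(V3, L3, parent[2], G_levels[1][1]))   # True for the toy graph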

Figure A2.3. The nodes in Gn should be maximally connected


Let us then denote by X[(G1, …, Gn−1)] this restricted set of recursive graphs. Now, obviously, for any H = (G1, …, Gn−1, Hn) belonging to the set X[(G1, …, Gn−1)] we have, by definition of the mapping X, Z[H] = (G1, …, Gn−1). Thus, X is a sort of inverse mapping for the zoom-out mapping Z.

One simple and useful extension of recursive graphs, which we shall need, is that of weighted recursive graphs. It allows us to take into account a unit cost associated with each node of the recursive graph. We assume that there is one weight function wi for each scale i = 1, …, n, which associates a number wi(ν) to each node ν in Vi.

One crucial assumption regarding the set of weight functions wi is their additivity.

Figure A2.4. An example of a path in P(Gn) with repetitions at vertices ν2 and ν3


More specifically, the weight wi(ν) of a node ν in Vi is supposed to be the sum of the weights wi+1(u) of the vertices u in Vi+1 that have been identified within ν; explicitly:

\[ w_i(\nu) = \sum_{u \in \nu} w_{i+1}(u) \]

This assumption puts strong limits on the semantics of the weight system (wi)i=1…n. In particular, it prevents taking into account more complicated aggregations of complexity that would not follow from the mere addition of weights. The true reason for this restriction on the weights (wi)i=1…n is simply that it allows us to define a scale-invariant complexity measure; other, more general weight functions would not. It is thus important to check that a system of weight functions taken from real IT life satisfies this assumption, at least approximately.
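In the toy encoding above, additivity simply means that coarse weights can be recomputed from fine ones by summation, as in this sketch (coarsen_weights is our own helper name):

    # w_{i-1}(v) is the sum of w_i(u) over all u merged into v.
    def coarsen_weights(w_fine, parent_fine):
        w_coarse = {}
        for u, wu in w_fine.items():
            v = parent_fine[u]
            w_coarse[v] = w_coarse.get(v, 0.0) + wu
        return w_coarse

    w3 = {"x": 1.0, "y": 2.0, "z": 4.0}        # weights on the finest level
    w2 = coarsen_weights(w3, parent[2])        # {'a1': 3.0, 'a2': 4.0}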

Scale-invariant measures of complexity C(p), where p is an arbitrary positive integer, can now be defined. For a recursive graph G = (G1, …, Gn) endowed with a weight system w = (wi)i=1…n, the complexity is defined as follows:

\[ C^{(p)}[G, w] = \left( \sum_{(\nu_1, \ldots, \nu_p) \in \mathcal{P}(G_n)} w_n(\nu_1)\, w_n(\nu_2) \cdots w_n(\nu_p) \right)^{1/p} \]

The sum here is over all paths (ν1, …, νp) in the set P(Gn) of paths with repetitions in the finest-scale graph Gn of the recursive graph G. That repetitions are allowed means that a path can connect a node with itself.
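For p = 2, a path with repetitions is just an ordered pair (ν1, ν2) with ν1 = ν2 or {ν1, ν2} a link, so C(2) can be computed directly. A sketch in the same toy encoding (complexity_p2 is our own helper name):

    from math import sqrt

    # C^(2)[G, w]: sum w(v1)*w(v2) over ordered pairs that form a path with
    # repetitions in the finest graph (v1 = v2 allowed), then take the root.
    def complexity_p2(vertices, links, w):
        total = 0.0
        for v1 in vertices:
            for v2 in vertices:
                if v1 == v2 or frozenset({v1, v2}) in links:
                    total += w[v1] * w[v2]
        return sqrt(total)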

The complexity C(p)[G] has the following remarkable properties:

1) C(p)[G] is linear in the weight w in the sense that C(p)[G, λw] = λC(p)[G, w]. This property is trivial, since the pth root compensates for the factor λ^p arising from the p factors wn inside the sum. In other words:

Multiplying all the weights by some factor simply multiplies the complexity by the same factor.

2) C(p)[G] is scale-invariant under zoom-out transformations in the sense that C(p)[Z[G],w] = C(p)[G,w].

The complexity of a recursive graph G and that of its zoomed-out version Z[G] are the same.

This follows immediately by first substituting wn−1(ν) = Σu∈ν wn(u) in the sum defining C(p)[Z[G], w] and then by grouping the paths at level n according to the paths at level n−1.
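This can be checked numerically on the toy recursive graph above (whose finest level was deliberately chosen maximally connected):

    V2, L2 = G_levels[1]
    V3, L3 = G_levels[2]
    print(complexity_p2(V3, L3, w3))   # 7.0 at the fine scale
    print(complexity_p2(V2, L2, w2))   # 7.0 again after zooming out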

3) C(p)[G] is scale-invariant under zoom-in transformations, as described by the zoom-in operation X defined earlier. Recall that for any H in X[(G1, …, Gn−1)], we have Z[H] = (G1, …, Gn−1). For such an H, we have C(p)[H, w′] = C(p)[G, w], provided the original weight system w is extended to a weight system w′ that now includes weights wn on the finest scale Gn. In order for scale-invariance to hold, the weight wn−1(u) of each node u ∈ Vn−1 should be distributed evenly across all ν ∈ u where, remember, ν ∈ Vn. More explicitly, if |u| is the number of Vn vertices inside u, then wn(ν) = wn−1(u)/|u| for any ν ∈ u.

The complexity of a recursive graph G and that of any of its zoomed-in versions in X[G] are the same, provided the weights are distributed evenly on the smallest scale.

This again follows from simple combinatorial analysis.
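The even distribution of weights is equally easy to sketch in the toy encoding (refine_weights is our own helper name):

    # Extend the weights one level down: w_n(v) = w_{n-1}(u) / |u| for v in u.
    def refine_weights(w_coarse, parent_fine):
        sizes = {}
        for u in parent_fine.values():
            sizes[u] = sizes.get(u, 0) + 1
        return {v: w_coarse[u] / sizes[u] for v, u in parent_fine.items()}

    w3_even = refine_weights(w2, parent[2])      # {'x': 1.5, 'y': 1.5, 'z': 4.0}
    print(complexity_p2(*G_levels[2], w3_even))  # 7.0 once more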

Properties 2 and 3 really justify the wording scale-invariant complexity for the metric C(p)[G, w] of a weighted recursive graph. The case p = 1 is trivial, as it does not take the structure of the graph into account. The first non-trivial case is therefore p = 2, and this is the most common choice for p, as it is also quite easy to compute in practice.

Examples

To gain some intuition for this complexity measure, let us look at the following two examples.

Figure A2.5. The spaghetti graph and the hierarchical graph


First, let us look at the spaghetti architecture Gspaghetti with n nodes. Fix any p and give equal weights wi = 1 to all nodes. Since the spaghetti graph is nothing but the complete graph and since we sum over paths with repeated vertices, each node νj in the path (ν1, …, νp) can be chosen independently. There are n^p such paths, from which we conclude that

\[ C^{(p)}[G_{\mathrm{spaghetti}}, w] = (n^p)^{1/p} = n \]

Second, let us look at a hierarchical architecture Ghierarchical with n nodes. Once a starting node ν1 has been chosen, each subsequent node νj in the path (ν1, …, νp) can be chosen in at most four different ways; thus, this time we have

\[ C^{(p)}[G_{\mathrm{hierarchical}}, w] \leq (n \cdot 4^{p-1})^{1/p} \]

Provided that n is large enough and p ≥ 2, we see that C(p)[Gspaghetti, w] ≫ C(p)[Ghierarchical, w], which accounts for the intuitive fact that spaghetti architectures are messier than hierarchical architectures!
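A quick numerical check of this gap for p = 2 and unit weights, reusing the complexity_p2 sketch above (the tree here is a binary hierarchy, one of many possible choices):

    # Complete ("spaghetti") graph versus a binary-tree hierarchy on n nodes.
    def complete_links(vertices):
        vs = list(vertices)
        return {frozenset({a, b}) for i, a in enumerate(vs) for b in vs[i + 1:]}

    def binary_tree_links(n):
        return {frozenset({i, (i - 1) // 2}) for i in range(1, n)}

    n = 63
    V = set(range(n))
    w = {v: 1.0 for v in V}
    print(complexity_p2(V, complete_links(V), w))     # 63.0, i.e. exactly n
    print(complexity_p2(V, binary_tree_links(n), w))  # about 13.7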

A2.3. Conclusion

In this section, we investigated the possibility of defining a scale-invariant measure of complexity. It is indeed possible to define such a quantity, even though this imposes some rather drastic limitations: first, on how the weights of individual nodes at different scales should be related; second, on how the descriptions of the system at various scales are related when going from large to small scales.

For the special case p = 2 this scale-invariant complexity measure is quite easy to compute in practice.

This measure of complexity has no relation to the deep concepts from information theory discussed in section 2.1.1. The metric C(p) combines two ingredients to provide a measure of complexity, namely the weights on the nodes of the recursive graph (through the factors wn(νj)) and the combinatorial analysis of the links between those nodes (through the sum over paths in P(Gn)). The price to pay for computability, as we announced, is some arbitrariness and even some artificiality in the definition.

Let us emphasize that the requirement of scale-invariance that led us to define C(p) is in no way essential. Neither is the additivity of weights a natural condition. We believe this is a good illustration of the high level of arbitrariness implied by computable complexity measures. In this book, we shall instead consider the various scales present in an information system architecture as being independent. Complexity or simplicity should be analyzed within each scale. Nothing is assumed in general about the possibility of adding or combining complexities associated with different scales.

 

 

1 A rigorous definition of the concept of path independence would be outside the scope of this book. The mathematically inclined reader might want to review the concept of the first homology group of a graph. Cyclomatic complexity is actually nothing but the rank of this group.

2 We follow here the treatment proposed by Yves Caseau [CAS 07].
