Chapter 1

Engineering Large-scale Complex Systems

1.1. Introduction

The terms “systems science”, “systems of systems” and “systems engineering” have, for decades, been excluded from use in the field of “hard” sciences due to their “engineering” connotations. These gaps have been filled by the domains of control engineering and the theory of dynamical systems, apparently more “noble” due to their use of equations and theorems derived from applied mathematics. These terms have recently resurfaced, attracting a great deal of media attention, in light of recent events: the 2008 economic crisis and the subsequent attempts to escape from it, the attempts to achieve stability in Iraq and Afghanistan, and the crisis provoked by the Icelandic volcanic ash cloud.

It is, moreover, interesting – even entertaining – to see how pseudo-specialist media publications, in the form of specialist editions produced by wide-distribution media or successful books by amateur economists, have made the notion of systems more fashionable in the context of the economic crisis. They insist on the heterogeneity of components, their relationships and interactions, and the complexity of these interactions in both temporal and spatial terms. Moving beyond this essential notion, the whole approach of systemics has become fashionable, with general favor accorded to a holistic approach, moving simultaneously from global to local and from specific to general aspects, to take account of all feedback loops at different levels in the system, etc. All of this comes from the same experts who previously spoke of microeconomic parameters and zealously promoted reductionism.

Systemics is once again (for the time being – we should not count on permanence in this age of consumption of icons, whether talking about sports stars, pop stars, TV stars or temporary disciples of a stream of thought) on the agenda in an attempt to provide explanations where previous analyses have failed. By considering the object of study from this angle of multiples, links and complexity (in the etymological sense of the term, “multi-stranded, braided”), we demonstrate the need for multiple perspectives, different approaches, and to avoid becoming trapped in a monolithic vision backed up solely by the knowledge inherent in a single given domain.

This is abundantly clear in a number of studies on the crisis in the Middle East, where it seems evident that, in order to escape the inevitable impasses created by difficult stabilization, a purely military, political or economic response is insufficient. It is clear that military intervention has been unsuccessful in establishing alternatives following the removal of old regimes; donor conferences have not succeeded in establishing bases for permanent economic and industrial reconstruction within the states in question, nor have the creation of constitutions and the establishment of elections been enough to create political stability and guarantee the creation of a viable state. It is, in fact, a conjunction of these actions, and many others, which is currently used in the secret hope that a suitable combination of these ingredients might be found rapidly using the resources already involved. Still, we should note that this “magic recipe” will not remain the same over time; military, political and economic approaches must be “dosed” appropriately to create and exploit margins for maneuver, allowing us to envisage progress in the stabilization process.

Once again, systemics provides the keys to explaining and modeling, from which it becomes possible to create action plans supporting trajectories towards desired objectives. However, this system analysis must be carried out without prejudice as to the importance of specific viewpoints: systemics is born of the richness and multiplicity of approaches to a problem, but is destroyed by overly hasty and excessively simplistic conclusions.

Let us return to the example of the stabilization problem in Afghanistan and Iraq. Insurgent action is directed towards neutralization of the conditions necessary for the establishment of a political-economic-judicial system or, in other words, a state that would guarantee the security, prosperity and well-being of the population that created it. This is an example of actions undertaken by insurgents to avoid the establishment of a “system”, “an integrated set of connected and interlinked elements (personnel, products and processes), which aim to satisfy one or more defined objectives” (ISO/IEC 15288). The strategies of this mode of combat have been used by T.E. Lawrence against the Turks, Mao Tse-Tung (China), Vo Nguyen Giap and Ho Chi Minh (Vietnam), the Sandinistas (Nicaragua), the Intifada and the al-Aqsa Intifada (Israel/Palestine), and finally al-Qaeda. All use systemic reflections in their writings calling for insurrection, as discussed by [HAM 06], providing a posteriori justification for the use of systems science as an analytical tool to combat this type of situation.

The adoption of a system representation then allows us to identify the different parts of a puzzle, the causes and consequences involved, the strengths and weaknesses of dependencies, the nature of interactions, to understand how all this information contributes to a common goal, and what precedence certain elements may take over others at specific moments in attaining objectives. Based on this analysis, it becomes possible to imagine certain consequences that would arise from working on specific components, and it becomes possible to see, for example through simulation, the possible effects of specific counter-actions on a previous disturbance. This phase of synthesizing a set of actions with the aim of recreating an acceptable level of goal fulfillment when faced with non-mastered disturbances is the main aim when applying a systemic approach to a problem.

Let us now look at the recent crisis generated by the cloud of volcanic ash emitted from a volcano in Iceland, a major event in the second half of April 2010 that led to the complete closure of the airspace of a number of European countries over several days. This disrupted international flights, which were forced to take detours around Europe, and prevented travel for tens of thousands of people, a problem aggravated by the fact that the incident occurred during the school holiday period in a number of countries. Beyond the problems of individual travelers, who were obliged to delay or cancel their vacations or bear the unexpected expense of several days’ accommodation while waiting for return flights, these repatriation problems rapidly took on a political aspect.

In France, for example, airlines and travel companies turned to the government to repatriate travelers and have military air bases opened, in the strongest tradition of the all-powerful Welfare State. To illustrate the fact that this was not as straightforward as it may seem, consider how responsibility would have been attributed if an airplane had failed to land safely. Without immediately assuming the worst, how would insurance companies deal with the damage to luggage created by an accident of this kind?

The political dimension is accompanied by a social aspect: the French government requested that certain social groups within the rail workers’ network suspend strike action in order to transport passengers diverted to an airport other than their planned destination. Here, we gain a transparent vision of the interconnections between various transport systems, including the air and rail networks, when dealing with a traveler who did not choose the combination of these options.

On top of this interaction between transport systems, we should also note the links between reservation systems: those travelers directly affected by air travel redirections were added to the “normal” passenger load. This produced a problem with two distinct aspects, covering both the reestablishment and maintenance of traffic.

Another dimension to consider, in addition to the evident economic considerations resulting from a suspension of air traffic over several days in a heavily-used zone, is the diplomatic dimension: this air traffic crisis prevented a certain number of heads of state from attending the funeral of the Polish president, who was killed just a few days before… in an air accident.

Additionally, we must not forget the technical aspect: at the outset, the crisis was caused by the interaction between microscopic dust particles and an aircraft engine. Volcanic residue is particularly hard and can damage blades, leading to temperature increases on a scale that causes serious damage to an aircraft engine. This, at least, is the situation predicted by one of the simulation codes used in aeronautics.

In summary, a digital model with a pre-defined domain of validity that sets out the level of trustworthiness of its predictions led to a flight ban in independently managed airspaces that are spatially correlated by necessity, with economic, political, diplomatic and social consequences. This demonstrated the relationships and interdependencies of several systems, including air transport, the rail network, tour transport, travel reservation systems, air traffic control, weather forecasting, political systems, insurance systems, etc. In short, we are faced with the obvious existence of a system of systems.

Faced with these complex systems and systems of systems, or at the very least with these representations generated by a systemic vision, it is useful and often necessary to have access to potentially multidisciplinary methods and tools to design, create, produce, maintain and develop the systems under study. This is the domain of systems engineering (formalized in a number of standards, from MIL-STD-499B to the more recent ISO/IEC 15288, the latest standard (issued in 2008), via EIA/IS-632, ISO-12207, SE-CMM and ISO 9000:2000).

In what follows, we shall go into detail on a number of points that are key to the success and mastery of the complexity of large systems encountered in the domains of banking, healthcare, transportation, space travel, aeronautics and defense, for which it is no longer conceivable to create an ad hoc system each time new needs are expressed or new technologies become available. We must, therefore, move from a focus on separate, stove-piped or compartmentalized systems towards a capability approach oriented towards operational needs, expressed in a more or less formal manner, most often as desirable performance characteristics of a service, and evolving to adapt to the environment and context of use. This situation is reinforced by the fact that the growing maturity of new information and communications technologies encourages the creation of networked systems, with physical and operational interconnections, allowing us to generate new services by coupling the functionalities of individual systems.

In this section, we shall not go into detail on a certain number of points already covered in [LUZ 08a], [LUZ 08b] and [CAN 09]; the current work builds on the contents of these previous publications, and readers interested in these ideas will find full details in the bibliography. We shall concentrate on several questions that have become important in relation to mastering complexity in large systems in recent times. Thus, we shall begin by considering the notion of service, which is becoming increasingly dominant in systems of systems and large sociotechnical systems, and which is applicable to a number of interesting themes that still lack necessary responses in terms of tools: the problem of architecture in systems of this kind and the issue of resilience. We shall then look at the development of relationships between the various parties involved directly or indirectly with these systems, and the contractual setups we might encounter and that should enter into general practice. Finally, we shall return to the problem of complexity in systems and the ways in which systems engineering can, and should, account for this factor.

1.2. The notion of service in large complex systems

Over the past three decades, the word “service” has been somewhat overused, and is always presented as a main motor for change; over this period, we have frequently heard that the tertiary sector, and service activities in particular, create value beyond the traditional sector of wealth production. After IT consulting businesses, where the notion of service is mixed in with the development of computing at business level, we saw the emergence of service offers and chèques emploi service (service employment vouchers) in France, with a focus on the final user as the central recipient of the service. This plurality of semantic notions is interesting, and demonstrates the complexity of the concept and the need to account for the whole value chain, from the service creator to the user via necessary intermediaries – if, in fact, the creator is not also the user, which may be the case. This creates new, potentially circular value chains. The final user may, in fact, play an important part in the creation of value associated with the service: think of applications available online, destined to create content (music, films, etc.), with the possibility of breaking these applications down into micro-applications following the desires and competences of the user. In this case, the user is as much involved in forging his or her tool as in using it to create something. He or she thus becomes the service provider (or “blacksmith”) for others, who then use the tool to create value (the “creator” of the content); this content may then be consumed by the same initial user (in the role of final consumer of the results).

Beyond the definitions of the concept, then, it appears necessary to understand to what extent mastery is possible within the large socio-technical systems that surround us.

What, then, is a service? A first definition opposes it to goods, which are, by their nature, material and are destroyed or transformed by consumption: a service is a “composable immaterial provision, manifested in a perceptible manner and which, in predefined usage conditions, is a source of value for the consumer and the provider” (ITIL/ISO20000). This definition highlights that a service is not attached to a material resource on a one-to-one basis, and does not disappear when it is used; “composability” even tends to suggest that consumption may continue as long as desired. It also points out that a service only exists through the “value” that may be attached to it, and in specific conditions. Here, we encounter a microeconomic constant: the value of something is relative and, moreover, this relativity also depends on the global chain involved in using the service in question. That which has a certain value today may have no value or, conversely, greater value tomorrow. This realization is essential as, if the value is associated with the price attached to a service, it implies that the price is not absolute. This is nothing new, but, essentially, value depends on the consumer, as it is not the production of the service that determines the price, but its use. Thus, we enter into different commercial perspectives: today when we take out a telephone contract we do not pay for the telephone itself or the infrastructure, but the capacity to place a call anytime, or at certain moments, in given conditions, to receive data at a certain speed, etc. Of course, when we go into the details of the physical implementation of the service, we encounter material resources (the cell phone, telecommunications masts, relay servers, etc.). The nature of the service and its price, however, and consequently its value from the viewpoint of the final consumer, take little account of these material details and concentrate on conditions of service quality, and thus of a performance contractualized between the consumer and provider.

This quality of service and the notion of contractualization are essential: as the service is a priori immaterial (or at least contains an important non-material dimension, unlike goods that exist only by their physical presence and their contributions at the moment of consumption), it is linked to a thing (a tangible or intangible object) or to a person. We thus find services offered to individuals or for objects; in the first case, we might cite the example of catering services, and in the second, merchandise transport or repair services. We also find services that do not involve tangible objects in the same way as in our previous examples, but where the person/object dichotomy still exists: training or healthcare services, in the first case, or insurance services and financial activities in the second case.

The notion of contractualization corresponds to the formalization of the expectations of parties involved in a service exchange. This is done using either results-based obligations or means-based obligations, but in both cases we define levels of service that allow us to associate precise values with a qualitative notion of performance acceptable to the user and attainable by the provider. As with any form of contractualization, all parties are involved in the process and the outcome of negotiations is a formalization of the level of obligation; however, in this case, the means of verifying the adequacy of the service during provision are also fixed and defined in the contract. Means of correction or compensation offered to the consumer (penalties, subsequent obligations) in case of discrepancy are also defined, along with conditions for the suspension of service by one or both parties where applicable. Contractualization is the act that gives meaning to the execution of the service, defining the parties concerned, the nature of the service and the conditions in which it will be provided, and in particular the accompanying interactions and exchanges.
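To make this more concrete, the following sketch shows one possible way of recording such a contract as a small data structure, with service levels, a means of verification and a compensation clause. It is purely illustrative: every name used (ServiceLevel, ServiceContract, penalty_per_breach, the airline example) is a hypothetical choice, not part of any standard or of a real contract formalism.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ServiceLevel:
    """One contractualized level of service: a measurable indicator and its agreed target."""
    indicator: str                  # e.g. "availability", "response_time_ms"
    target: float                   # value negotiated between consumer and provider
    check: Callable[[float], bool]  # means of verifying adequacy during provision

@dataclass
class ServiceContract:
    consumer: str
    provider: str
    levels: List[ServiceLevel] = field(default_factory=list)
    penalty_per_breach: float = 0.0  # correction/compensation in case of discrepancy

    def breaches(self, measurements: dict) -> List[str]:
        """Return the indicators whose measured value violates the agreed level."""
        return [lvl.indicator for lvl in self.levels
                if lvl.indicator in measurements and not lvl.check(measurements[lvl.indicator])]

# Illustrative use: availability must stay above 99.5%, response time below 200 ms.
contract = ServiceContract(
    consumer="traveler", provider="airline",
    levels=[ServiceLevel("availability", 99.5, lambda v: v >= 99.5),
            ServiceLevel("response_time_ms", 200.0, lambda v: v <= 200.0)],
    penalty_per_breach=100.0)
print(contract.breaches({"availability": 99.7, "response_time_ms": 350.0}))  # ['response_time_ms']
```

In a real system of systems the contract is of course a negotiated document rather than code; the point of the sketch is only that levels, verification means and compensation can be given an explicit, checkable form.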

In a complex system, we may find all of the ingredients mentioned above, all potential service types, associated with products, etc. To illustrate the fact that a service, while immaterial, must not systematically be dissociated from physical resources, we may simply consider the example of roaming telephony (with the capacity for the user to be reached on the same number, whether the receiver of the call is using a mobile or fixed-line telephone). We could also consider air transport services. The traveler pays for a seat, i.e. the capability to be transported at a given moment, but this seat is itself a physical entity in a given means of transport. Ticket prices change depending on conditions of purchase, but the physical result of the transaction remains constant: the seat itself remains the same. We thus see the necessity, in a complex system, of precisely defining the different services and goods or products, together with the set of data relating to each, i.e. the parties involved, contracts, etc., particularly in relation to services. This must be done at the level of the system architecture, and it is crucial in this case to consider the system in its context of use and not just as a static abstraction defined outside all constraints or real-time retroaction connected with use. This produces a dynamic view concerned with services, where the user (consumer) is a key parameter of this dynamic.

While services seem (from the preceding paragraphs) to introduce a significant degree of complexity into a system, we should also highlight the flexibility they allow. In fact, their immaterial and composable character, which allows them to overcome the limitations of physical matter to a certain extent, permits the use of new means in fulfilling the objectives assigned to the system.

In this way, services provide agility, particularly in authorizing the reuse of capabilities, leading to increased efficiency in terms of resource and cost management. Typically, we might create a functionality similar to another that already exists. The designer (or, more precisely, the provider of the desired capability) constructs a service reusing the existing functionality as far as possible. In the case of a capability with a dominant informational aspect, this is especially easy as the system architecture is particularly well suited to this use (without going into implementation details, we create a new service, for example at the level of an enterprise service bus, by updating service registers, directories, etc.). If this is not the case, it is important, among other things, to make a list of all parties concerned, the desired interactions and those that can be deduced from existing factors. This allows us to establish the different mechanisms for sharing responsibility – in short, all elements liable to be used if a service is considered following the definition proposed by [OAS 09]: “a mechanism to allow access to one or more capabilities, where access is provided using a prescribed interface and authorized in coherence with constraints and policies specified by the service description”.
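The register-updating step mentioned above can be pictured with a minimal sketch, assuming a deliberately simplified reading of the [OAS 09] definition (a capability exposed through a prescribed interface under explicit policies). The ServiceDescription fields and the registry operations below are illustrative assumptions, not the API of any actual SOA product or standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ServiceDescription:
    """Describes access to a capability: prescribed interface plus usage policies (cf. [OAS 09])."""
    name: str
    capability: str            # the functionality being exposed or reused
    interface: str             # e.g. an operation signature or endpoint
    policies: List[str] = field(default_factory=list)  # constraints on authorized access
    parties: List[str] = field(default_factory=list)   # stakeholders sharing responsibility

class ServiceRegistry:
    """Directory updated when a new service is published, e.g. on an enterprise service bus."""
    def __init__(self) -> None:
        self._entries: Dict[str, ServiceDescription] = {}

    def publish(self, desc: ServiceDescription) -> None:
        self._entries[desc.name] = desc

    def lookup(self, capability: str) -> List[ServiceDescription]:
        """Find services already offering a capability, so as to favor reuse over redevelopment."""
        return [d for d in self._entries.values() if d.capability == capability]

registry = ServiceRegistry()
registry.publish(ServiceDescription(
    name="seat-booking", capability="reservation",
    interface="book(passenger, flight) -> ticket",
    policies=["authenticated clients only"], parties=["airline", "travel agency"]))
print([d.name for d in registry.lookup("reservation")])  # ['seat-booking']
```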

Another important point in the design and later use of a complex system integrating services is the organization of the resources necessary to implement and manage the service in question. We thus encounter the front-office and back-office, the first directed towards the client and the second “hidden” in a behind-the-scenes role. Satisfaction, as expressed by the final user, depends on the efficiency of this “hidden” section, but the back-office also has effects on cost, another element of satisfaction. We must therefore find a compromise (in economic and organizational, as much as technical, terms) that clearly has direct effects on the way different components of the global system are set up, i.e. the definition of the system architecture, an aspect we shall discuss in the next section.

As a conclusion to this section focusing on the notion of service within large-scale complex systems (for example transport systems, energy supply systems, healthcare systems, social systems, aeronautical and aerospace systems, defense systems, etc.), we should highlight the fact that the issue is not restricted to the technical dimension. Systems engineering must account for these aspects as early as possible, initially at the strategic level, including the set of organizations and parties that may be concerned throughout the creation, supply and use of the service in the corresponding vision. Only by this global level of consideration may we truly evaluate the global cost of the service (and its possible price or pricing policy) and the set of responsibility and flow chains necessary for successful engineering of the service. This is particularly critical, as we would do well to remember, because the value of a service from the point of view of the consumer most often proceeds from a co-creation approach.

1.3. Architecture: a key concept

Marcus Vitruvius Pollio, during the reigns of Julius Caesar and the emperor Augustus, produced a 10-volume work, De Architectura, on the architecture and construction of civil and military machines. In this work, the author gives a list of desirable virtues in architecture: firmitas (solidity, robustness), utilitas (utility, satisfaction of needs) and venustas (beauty, esthetics). [LAN 08] considers that, in a modern context, this work contains important reflections on user needs, durability, reliability and environmental constraints that must be taken into consideration during the design process.

Beyond architectural considerations, which are not part of our focus here, it is useful to consider what this work puts forward in relation to a response to a client request in terms of desired qualities. The triptych set out above applies not only to intrinsic qualities that should be sought after (firmitas implies robustness, but also resilience, or even invariance in relation to the evolution of demands) but also to the importance of involving all concerned parties (utilitas insists on the finality and place of the user in the judgment of adequacy in relation to the attainment of final objectives). Neither do we neglect the esthetic dimension (venustas does not apply to useless luxury, but to beauty, purity and balance). This last aspect bears a resemblance to the search, in mathematics, for brief and elegant demonstrations, as opposed to fastidious calculations that we may not be proud of even if they respond to the initial question. While retaining a sense of proportion, we should aim to provide these characteristics in large complex systems. For us, architecture is, on the one hand, the invariable representation of the object to design and develop, notwithstanding inevitable changes in the context or those made by those involved; on the other hand, it is the means of exchange and sharing between these parties throughout the life of the system. Architecture is a representation, as it is a set of static and dynamical descriptions (it would be an error to exclude time factors or transformation functions that may operate on a given architecture to produce a different architecture), a set not strictly reduced to a juxtaposition, but more a composition or interweaving. It also constitutes a means for exchange and sharing as the formalized architecture transmits codified information, comprehensible to all who adhere to the standard of expression of information used; we thus have a representation that overcomes spatial, temporal, cultural and generational limits and a priori has the same meaning for all. It may then be used by each individual to particular ends, but based on a general acceptation.

These aspects are clearly critical in considering our issues and, while we might present the objection that an architecture is a response produced by compromise and need analysis, it remains the place in which we find the basic formalization of needs and expectations, in that it is not limited (for certain of the views it offers) to concentrating on detailed aspects of installation.

Clearly, this presumes that we do not subscribe to a narrow vision of architecture, strictly limited to technical or operational viewpoints, but take account of the strategic processes that surround the realization of the project itself at a given moment and a given point in its lifecycle. Once again, the distance taken does not mean that our viewpoint is decorrelated from the rest; we still aim to establish links of dependency and overlap, which are potentially dynamic, with all views making up the representation of the group.

Note that following ANSI/IEEE Std 1471-2000, architecture is seen as the fundamental organization of a system, defined by its components, their relationships with other components and with the environment, and the principles governing its design and development.

The activity of architecting aims to establish an agreement between all parties involved and to lay the foundations of the solution to the problem by a list of demands covering technical, organizational and financial considerations. Depending on the domain, the nature of this activity depends on the degree of maturity, which is variable. As we have already mentioned in relation to works on the subject in Antiquity, the fields of construction and civil engineering as a whole have reached a high degree of maturity. This is not necessarily the case for interconnected information systems or large complex systems, as feedback in these cases only covers decades instead of centuries or millennia. Architecture, in the sense used here and in terms of practice in systems engineering, is expressed by a set of viewpoints, the coherence and completeness of which must be mastered over the whole lifespan of the solution under consideration. To do this, we must have access to standards, or at least to reference points, methods, formalisms and notations that allow us to carry out comparative evaluation of various possible architectures. This is offered by architecture frameworks [MIN 08], which may be broken down into two broad families: methodological frameworks, on the one hand (TOGAF, FEAF, ISO-15704/GERAM, etc.), and formal and denotational frameworks on the other (DoDAF, MODAF, NAF, etc.). The first family allows us to develop concepts, principles and processes applicable to the products or services which, in the long run, will make up the planned system. The second family defines reference structures that may be instantiated to represent a system and thus obtain a set of views.

These views or perspectives allow us to consider the solution on several levels: technological, functional, system and strategic. The technological level describes the different choices of physical implementation for the solution. The functional level describes the functions carried out and their possible hierarchy (functions and sub-functions). At the system level, we find a description of the organization of the system and of the information, energy and matter flows between its different blocks. The strategic level describes the business and organizational processes that govern non-technical aspects, and situates the system and its environment in the global context of use, taking account of objectives and policies for use. The search for the added value to be produced by the system during use concentrates on this level. In “traditional” systems, use is mainly made of the first three levels, but for the information systems and complex systems that interest us here, the strategic level is increasingly widely accepted as being just as important, particularly in cases where the system is reused in different contexts.
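Purely as an illustration (no architecture framework mandates this representation), the views of an architecture description could be tagged with the four levels just listed and checked for coverage; all names below are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set

class Level(Enum):
    TECHNOLOGICAL = "technological"  # physical implementation choices
    FUNCTIONAL = "functional"        # functions and sub-functions
    SYSTEM = "system"                # blocks and information/energy/matter flows
    STRATEGIC = "strategic"          # business processes, context and policies of use

@dataclass
class View:
    name: str
    level: Level
    content: str                     # in practice, a model expressed in the chosen framework

def missing_levels(views: List[View]) -> Set[Level]:
    """Levels for which the architecture description provides no view at all."""
    return set(Level) - {v.level for v in views}

views = [View("deployment", Level.TECHNOLOGICAL, "..."),
         View("functional breakdown", Level.FUNCTIONAL, "..."),
         View("flows", Level.SYSTEM, "...")]
print(missing_levels(views))  # reports that the strategic level is not yet covered
```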

Clearly, these different perspectives offered by various architectural frameworks are only one solution for representing a system; just as, when representing a three-dimensional object, there is no set group of privileged viewpoints, but simply a requirement that the representation be constituted in a sufficiently complete manner and contain potential overlaps. In the same way, we cannot define a single definitive architectural framework or list of viewpoints to use. However, it is useful to have shared practices and a shared reference in terms of viewpoints, or the means of transforming certain viewpoints into others, if we wish to reuse a system architecture when considering a larger system that includes the first system in part or in its entirety.

1.4. Towards resilient systems

The systems under consideration have finite resources, may potentially include multiple and possibly conflicting goals for their different components, integrate individuals and organizations, and evolve within non-fixed contexts that are not completely predictable. Faced with disturbances, variations, changes, disruptive phenomena, surprises and unpredicted events, it is essential that these systems should remain operational, if not at a nominal then at an acceptable level. To do this, human organizations generally make use of various measures (regulations, procedures, policies, automation rules, etc.) to create a secure environment for operation and adopt a reactive posture through emergency procedures. Experience has shown that failures do occur due to the misapplication of certain rules in cases that had been planned for initially, but also that high levels of success are due to the fact that individuals are able to adapt to changing conditions and attempt to recreate new secure conditions. This operates via an evaluation of current conditions and anticipation of changes, or at least of broad trends, in order to work in a proactive rather than a reactive manner.

Our aim in this section is to see how these properties, perceived at the level of human organizations, can act as an inspiration when applying these “recipes” to the engineering of large complex systems. They can be used in the design phase – so principally at architectural level, to create resilience – or during use – in which case we are concerned with procedures and user organizations, considering the evaluation of changing conditions of use, anticipation of subsequent modifications and the installation of mechanisms based on these previous reflections.

1.4.1. Resilience: definitions

Let us begin by considering intuitive interpretations of the notion of resilience. A simple example of a resilient system is an electric toothbrush, which may still be used as a toothbrush even when the electric motor ceases to function. In the context of company management, resilience is the capacity to survive, adapt and grow when faced with major or turbulent changes, typically those which arise in periods of crisis. Examples of companies with high resilience include those focused on the client, i.e. where the organization is not centered on product lines or the environment (for example, where departments are oriented towards competitors or suppliers) but on clients who essentially represent the invariable nucleus in a changing world. However, although this type of organization seems to be an obvious choice from the perspective of resilience, this is not necessarily the case when optimizing supply chains, for example, or profiting from aspects shared between products and services in the company.

The following example, taken from [HOL 06], is interesting when considering a system’s resilience outside of its technical aspects. The Concorde, the “flagship of French civil aviation”, was designed in the 1960s in full accordance with the regulations in force at the time. Due to the reduced scale of production of the aircraft and the exorbitant cost of updates, however, Concorde progressively ceased to conform to developments in civil aviation regulations (for example the requirement for a speed of 250 knots below flight level 100, which would have made it impossible for the aircraft to cross the Atlantic due to fuel requirements). It was also unable to transport heavy luggage (due to the necessary fuel/seats/luggage compromise), something which may appear to contradict the notion of a luxury voyage, the very commercial model used by Concorde. In both cases, the solutions found allowed operations to continue, in spite of apparent contradictions and the fact that the totality of regulations were not strictly respected. Even after the terrible accident of July 25, 2000, which caused the deaths of 113 people, Concorde continued to fly for another two years, despite the fact that this event could have served as a pretext for a flight ban. In other terms, this example illustrates the resilience of the European supersonic air transport model (although this resilience was not, unfortunately, applicable to the aircraft involved in the accident itself). Resilience can therefore be characterized by management of the borders of the domain of application (the borders, in this example, being characterized by regulatory demands and the predetermined model of commercial exploitation). The challenges linked to resilience include the management of that which is uncertain or unplanned, accidents (in the etymological sense of the term, such as the appearance of threatening events), the transition between more or less catastrophic circumstances while avoiding a true catastrophe, and the return to a more normal operational status.

Two particular characteristics are important in this case: the ability to recover following change (so a certain elasticity) and the capacity for adaptation (or learning). This second characteristic is not, in fact, an end in and of itself, but a possible solution to satisfy the first characteristic; however, it is sufficiently widespread to merit separate consideration. Various characteristics may contribute to the capacity for resilience in a system, including diversity, efficiency, adaptability and cohesion. These correspond directly to characteristics required for dependability: redundancy, efficacy, reactivity and feedback. Without going into semantic developments which, in any case, are of little interest due to the ambiguity of natural language and the difficulty in providing strict formal definitions of the terms used, we observe a qualitative gap between the two groups of notions. Criteria relating to dependability appear to be more easily quantifiable, while their counterparts in resilience take a more qualitative approach. This is partly due to the relative newness of the domain, but also to the nature of the property of resilience itself.

Let us, then, attempt to pin down the meaning of these terms as far as possible, while remembering that each individual has his or her own understanding. As we shall see in the following section, the domain of resilience has connections to the fields of dependability and risk analysis and management, while remaining distinct, mainly through the fact that it concerns adaptation to unplanned circumstances rather than to predictable disturbances or events.

The concept of resilience is encountered in various disciplines [FIK 07] including psychology, where it takes on particular meanings (see the works of Boris Cyrulnik, showing the capacity of individuals to manage stress or catastrophes of an emotional nature), but also economics, sociology, risk management, ecology, network theory, etc. It is defined in a general manner as the capacity of a system to tolerate internal or external disturbances, while maintaining its structure and function and while continuing to function in an acceptable manner.

In economics, resilience is the capacity of a local economy to preserve jobs and prosperity levels when faced with disturbances caused by the loss of a local industry or a significant crisis.

In ecology, resilience is the ability of a system to return to a previous state following a disturbance or transitory regime; this may correlate to the distance within or in relation to the attractor, in cases using a dynamical system model. It may also be a measurement of the amount of change necessary for a system to move from a given state to another state during reorganization of the system.

In network theory, resilience is the capacity of a network to supply and maintain an acceptable level of service when faced with faults and failings, the capacity to maintain access to information and communications across the network.

By referring to different domains where the notion of resilience may be applied, we find definitions that allow us to calculate the resilience value of a system numerically:

– [ATT 09] gives a state of the art of a certain number of definitions of resilience for urban infrastructures. Here, resilience is measured by the capacity of the infrastructure, after an external shock, to return to the level of performance experienced before the shock, and by the time taken to do so: if t1 and t2 are two instants respectively before and after the shock event, and if Q(t) is the quality of the infrastructure (to be defined as a function of the property being analyzed), the resilience index may be calculated as follows:

R = [ ∫ from t1 to t2 of Q(t) dt ] / [ Q(t1) × (t2 − t1) ]

if we want to account for the development in performance between t1 and t2. This index is 1 if the performance has not changed, and otherwise lies between 0 and 1; it measures, in fact, the ratio between the area under the graph of Q(t) between instants t1 and t2 and the area Q(t1) × (t2 − t1) that would have been obtained had performance remained at its pre-shock level (a numerical sketch of this index is given after this list).

– In [WAN 09], the system is represented by a network whose nodes correspond to resource supply and consumption, respectively. Each edge has a certain reliability; the resilience of the network is calculated as a sum of node resiliences, weighted by the connectivity of each node, where the resilience of a node is a function of the reliability and relative rate of the resource flows passing through it (see the sketch after this list).

– In [OME 09], describing transatlantic communications networks, the base resilience of the network is the ratio between the value delivered by the network after a disturbance and that delivered before it. This relationship may be calculated at the level of a node or of the network as a whole by combining individual values.
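As an aside, here is a minimal numerical sketch of the [ATT 09] index above, assuming Q(t) is only available as sampled measurements and using simple trapezoidal integration; the sample trajectory is invented purely for illustration.

```python
from typing import Sequence

def resilience_index(times: Sequence[float], quality: Sequence[float]) -> float:
    """Resilience index in the spirit of [ATT 09]: area under Q(t) between t1 and t2,
    normalized by the area Q(t1) * (t2 - t1) that unchanged performance would give."""
    area = 0.0
    for i in range(1, len(times)):  # trapezoidal integration of the sampled Q(t)
        area += 0.5 * (quality[i] + quality[i - 1]) * (times[i] - times[i - 1])
    return area / (quality[0] * (times[-1] - times[0]))

# Infrastructure losing 40% of its quality at t = 2, then recovering linearly by t = 10.
t = [x / 10.0 for x in range(0, 101)]
q = [1.0 if x < 2.0 else min(1.0, 0.6 + 0.05 * (x - 2.0)) for x in t]
print(round(resilience_index(t, q), 3))  # close to 1 means little overall loss of performance
```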
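Similarly, the structure of the [WAN 09] calculation (node resilience from reliability and relative flow, then a connectivity-weighted sum) can be sketched as follows; the exact expressions of the original paper are not reproduced here, and the node attributes and example network are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Node:
    reliability: float  # probability that the node operates correctly
    flow: float         # resource rate passing through the node

def network_resilience(nodes: Dict[str, Node], edges: List[Tuple[str, str]]) -> float:
    """Connectivity-weighted sum of node resiliences, in the spirit of [WAN 09]."""
    degree = {name: 0 for name in nodes}
    for a, b in edges:                       # count each node's connectivity
        degree[a] += 1
        degree[b] += 1
    total_flow = sum(n.flow for n in nodes.values()) or 1.0
    total_degree = sum(degree.values()) or 1
    return sum((degree[name] / total_degree)                 # connectivity-based weight
               * node.reliability * (node.flow / total_flow)  # reliability x relative flow
               for name, node in nodes.items())

nodes = {"supply": Node(reliability=0.99, flow=10.0),
         "relay": Node(reliability=0.95, flow=10.0),
         "consumer": Node(reliability=0.90, flow=10.0)}
print(round(network_resilience(nodes, [("supply", "relay"), ("relay", "consumer")]), 3))
```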

If we analyze these different formulae closely, we see that they do not really present new notions, and are simply ways of rewriting calculations of reliability or quality. This illustrates the fact that we still lack true metrics to describe the intuitive concept. In what follows, we shall tackle the problem from a different angle, and rather than attempting to directly quantify the resilience of a system – with the aim of deciding whether one system is more or less resilient than another – we shall propose systems engineering methods that give consideration to the qualitative criteria discussed previously to attempt to circumscribe the concept of resilience.

1.4.2. Resilience versus dependability

Dependability and, a fortiori, resilience are emergent properties (and, moreover, behavioral rather than structural ones). This statement does little to further our reflection, but highlights a posteriori the fact that we cannot necessarily define these properties in their entirety using precise elements of the system specification.

In relation to reliability, defined as the probability of failure-free operation under given conditions, or dependability, which attempts to define zones of dependability to avoid risk, resilience is based on the management of risk. It involves attempting to identify weak signals, which are usually easier to detect a posteriori, through possible indications of behavioral change in system–environment interactions. It also relies on mastery of the “crossing” of, and “exit” from, a risk zone, which should be marked out as clearly as possible by rules, good practice, procedures, laws or even successful improvisation!

It is foolish to attempt to define an explicit model of risk to insert as is into the model of the dynamics of a system in use in order to modify the control strategy. We will not go here into the eternal debates concerning the epistemology of socio-technical systems, where an approach of this kind requires us to have a mechanistic model of the human operator, nor discuss the social or human organization in charge of the use of the system or systems. It is clear that the delay required for integration of the recommendations of a risk model in a closed loop with the system in use is a priori incompatible with the reactivity needed to confront a risk situation. At most, this is conceivable in situations such as that of Apollo 13, where a risk is not immediate, although clearly potentially lethal, and leaves time for the simulation and analysis of alternative scenarios in order to master this risk.

Improving the resilience of a large complex system implies the presence of a certain degree of flexibility within the organization responsible for system operations at a given moment; this is an important criterion for conditions of system use. Excessively fixed or “locked” practices may be useful in avoiding operational errors (errors that may, themselves, lead to disaster) but diminish the capacity for proactiveness, anticipation and improvisation, factors that may be essential to the resilience of a system.

If we wish to be proactive, we should highlight the prevention of loss of control of a risk rather than the capacity of the system to recover when there is a loss of control. We thus require a high capacity for evaluation of the position of the system (or, depending on the case considered, the company or organization) in relation to a danger zone, and an efficient response to signals of dangerous situations, known or otherwise. Resilience should therefore be seen as a dynamical process of “visual piloting” and not as a static system state.

The difference between resilience and dependability is therefore clear: a system may be secure, but not resilient. This occurs notably in cases where there is no procedure to allow operation of the system outside the “safe” framework. Reciprocally, a system may be resilient but not secure. This is the case, for example, of a country defending itself against military aggression or confronted with a major catastrophe (such as the Haitian earthquake of January 2010, or the tsunami in Thailand of 2004), where resilience is seen in the maintenance of government structures and order despite significant human and material losses. These considerations should be put into perspective in the case of a system of systems: the dependability of a system of systems will depend, at the very least, on the dependability of its component systems, but this is not necessarily the case for resilience. Thus, a society made up of individuals (such as a hive, a termite mound or a banking group) may be resilient, whereas its component parts are not (major damage to a hive or termite mound following a storm, for example, or the closure of local branches in the case of a banking group).

1.4.3. Engineering resilience

We have seen that resilience is a combination of functions: avoidance (capacity for anticipation), resistance (capacity for absorption), adaptation (capacity for reconfiguration) and recovery (capacity for restoration). These functions should be executed in a certain environment.

Let us, then, set out the kinds of events or disturbances that a system should be able to face:

– habitual or predictable events: for example, earthquakes in regions such as California or Japan, or explosions in a chemical factory;

– rare or occasional events, which cannot all be described or predicted as there are too many of these similar events to be able to prepare for all eventualities. These events represent problems that are, however, a priori solvable. This is the case, for example, when an explosive device is detected in a subway system such as that of Paris or London (unfortunately, threats of this kind in Israel, for example, fall into the previous category due to their frequency);

– events that are a priori impossible to predict, requiring more improvisation than in the previous cases in addition to a radical change in mental approach. The attacks of September 11, 2001 fall into this category. Unlike in the previous situations, which may be included in ready-made or easily adaptable response plans, the level of resilience intrinsic to the system is clearly demonstrated in such cases.

In this context, the resilience of a complex system is clearly improved by the capacity to detect and analyze weak signals, avoiding excessive reliance on preconceived ideas. [HOL 06] gives the example of Sydney Camm, chief designer at the Hawker aircraft company just before WWII, who – realizing that Britain would need fighter aircraft in the event of a German air campaign against England – began building Hurricane fighters even before obtaining a government contract. Camm’s foresight proved essential during the Battle of Britain, where the majority of aircraft used were Hurricanes. To conclude this example of perception of weak signals, note that Camm received a knighthood for his efforts.

We are faced with the challenge of designing systems “for uncertainty”, which may seem paradoxical, as that which is uncertain or unpredictable is a priori difficult to specify in terms of precise requirements! We must therefore define the required, desired or acceptable performance envelope and insist that the system be able to recognize situations where it may be outside of this envelope; this is known as a viability requirement, as defined in system control theory (see the works of Jean-Pierre Aubin on this subject). Concerning disturbances and their “uncertain” or “unpredictable” tag, note that this may arise from the fact that knowledge of the viability envelope of the system is incomplete or erroneous, or that the environment of use places particularly high demands on the system in terms of pressures, particular demands, premature wear and tear, etc.

Seen in this way, resilience is obtained via the capacity to monitor conditions at the edges of the performance envelope, determining their value and the usual distance from the edge, and the ability to adapt the operational behavior of the system to potential developments in this envelope (as an aside, from the theoretical viewpoint of system control, this is a twofold problem, concerning both the control of error in relation to the limits of the domain of viability and the dynamical control of these limits, or at least of the model of observation of the limits). The required capabilities therefore include the capacity of the system to absorb changes without endangering its performance or structure, the flexibility of the architecture, in particular the ability of the system to modify part of its structure if necessary, and the control of margins and tolerances, i.e. the capacity of a system to evaluate its own dynamics, in order to exploit this if necessary in the vicinity of the border.
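A minimal sketch of this kind of boundary monitoring, under the simplifying assumption that the viability envelope can be written as interval bounds on a few observed variables; the variables, bounds and alert threshold are invented for illustration and carry no particular meaning.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class ViabilityEnvelope:
    """Acceptable performance envelope: interval bounds per monitored variable."""
    bounds: Dict[str, Tuple[float, float]]

    def margins(self, state: Dict[str, float]) -> Dict[str, float]:
        """Normalized distance of each variable to the edge of its interval
        (1.0 at the center, 0.0 on the boundary, negative outside the envelope)."""
        result = {}
        for name, (lo, hi) in self.bounds.items():
            center, half_width = (lo + hi) / 2.0, (hi - lo) / 2.0
            result[name] = 1.0 - abs(state[name] - center) / half_width
        return result

    def alerts(self, state: Dict[str, float], threshold: float = 0.2) -> Dict[str, float]:
        """Variables drifting too close to (or beyond) the edge of the envelope."""
        return {n: m for n, m in self.margins(state).items() if m < threshold}

envelope = ViabilityEnvelope({"load_factor": (0.0, 0.9), "response_time_s": (0.0, 2.0)})
print(envelope.alerts({"load_factor": 0.85, "response_time_s": 0.5}))  # only 'load_factor' is flagged
```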

The difficulty, particularly in cases where human operators are concerned, or where human organizations intervene at usage level (as in ICT systems in the military or banking sectors, in a context of network-centric warfare or a network-based business, where the modes of operation of organizations are codified but leave room for local initiative – or error!) is the capacity, as much as the risk, for individuals to take initiatives. In an ideal world, systems would be designed and created with a level of quality in which the usage procedures are the best procedures for all situations that may arise, with a corresponding strict obligation to follow these procedures; however, first, the world is not ideal, and second, the human operator or organization is not a machine! Sometimes, a procedure may seem counter-intuitive, which does not facilitate its application and may raise problems at the limits of the mastered domain of operation…

A failure rarely results from a single cause; in such cases it would be simple to travel back to the root of the problem and correct it a posteriori. More often, a failure results from a conjunction of events, which taken individually are not particularly problematic, but which when taken together lead to catastrophe. The challenge in engineering resilience is therefore to “detach” such potential causes, then to seek models for possible conjunctions of events, rather than seeing conjunctions as exceptional situations, and finally to develop the dynamical stability models already mentioned in order to be able to evaluate and master situations. This understanding of the causes of incidents or accidents and the capacity for development of the current state will lead to the ability to master and reduce risks, providing increased dependability.

The capacity for evolution of a system can be evaluated by observing the way in which it responds to catastrophic events (in the mathematical sense of the term, i.e. sudden changes in dynamics). These events provide information on the localization of conditions at the edges or limits of the system and on the behavior of the system in the immediate neighborhood and beyond these limits. In an informal manner, this seems reasonable and relatively easy to understand; however, if we look more closely, we note certain ambiguities. If the system has a capacity for adaptation, it will react to the trigger event by compensating the dynamics it possessed up to this point in a certain way. Is this a sign of adaptation or of imminent failure? In order to evaluate the true nature of these disturbances, we need to be able to analyze the development of system dynamics in the presence of a set of disturbances, in other words, we need to carry out a “second order” analysis. Our aim is to quantify the “slide” of the system towards a state of failure before a major breakdown occurs. This “slide” is tricky to define, although in the case of a socio-technical system with a major human component, we can talk of “sliding” from the point where modes of intervention change in relation to those habitually predefined by procedures, or, in short, when we encounter a difference between what happens in reality and what was imagined by decision-makers or regulators.

Based on these observations, it is useful to update risk models after the event, so that the next time a situation of this kind occurs the actions taken will not constitute a change to pre-established rules, but rather a new rule. One solution for determining the resilience of a system in a more quantitative manner, making use of the previous remarks, is to compare behaviors produced by the model of the system as it was designed with those produced by a model of the system in operation.
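Such a comparison can be sketched under the (strong) assumption that the as-designed model and a model identified from the system in operation can both be exercised on the same inputs and yield comparable numerical behaviors; the response models and the deliberately naive drift measure below are illustrative only, not a proposal for the metrics discussed next.

```python
from typing import Callable, Sequence

def drift_score(designed: Callable[[float], float],
                observed: Callable[[float], float],
                inputs: Sequence[float]) -> float:
    """Mean relative deviation between the behavior predicted by the as-designed model
    and the behavior reproduced by a model identified on the system in operation."""
    deviations = []
    for x in inputs:
        expected, actual = designed(x), observed(x)
        deviations.append(abs(actual - expected) / (abs(expected) or 1.0))
    return sum(deviations) / len(deviations)

# Illustrative models: the operational system responds more slowly than designed.
designed_response = lambda load: 0.1 + 0.5 * load
observed_response = lambda load: 0.1 + 0.8 * load
score = drift_score(designed_response, observed_response, [0.2, 0.5, 0.8])
print(f"drift = {score:.2f}")  # a rising score over time would signal a "slide" towards failure
```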

How, then, can we define appropriate metrics to make use of this comparison and allow us to detect changes and risks, in order to trigger the necessary measures before entering into a state of failure?

To illustrate this problem, we may use the analogy of the canary used by miners to detect the presence of toxic gases within galleries in a mine. We shall return to this problem later, placing it in the broader context of system architecture.

The integration of resilience engineering into systems engineering can then be carried out at two key points in the lifecycle of the system: during use, evidently, but also during design (the order in which we cite these points is relevant, and corresponds to the lack of maturity of the second case in relation to the growing consideration given to the issue in the first case).

1.4.3.1. Engineering resilience during system use

Dependability models do not include cultural and organizational aspects linked to conditions of use, but these aspects may constitute a key factor for proactive anticipation and flexibility in the case of danger. It is therefore useful to integrate risk evaluation and dependability capacities as early as possible in order to evaluate potential dangers at the current point of operation.

The system, in its different dimensions, should have the capacity to adapt in order to be able to ride out a crisis, if not with success, then at least avoiding patent failures. This is made possible by a capacity to take feedback into consideration on different technical and organizational levels, and by the ability to listen to individual and collective operators, analyze feedback from sensors tracking potential early signs of the crisis, and determine new practices that may as yet be unknown, in order to improve resilience. All in all, this represents a major challenge.

We therefore need to create a loop made up of performance monitoring, learning and management of change, in parallel with (or sometimes to replace) the standard control loop.

This supposes that effective models of communication and transmission of information exist or may be established to give meaning to this capacity for “learning”.
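Such a loop, running alongside the standard control loop, might be sketched as follows; every element here (the monitoring function, the change-detection rule, the learning and adaptation steps) is a placeholder standing in for the organizational and technical mechanisms discussed above, not a prescription.

```python
import time
from typing import Callable, List

def supervisory_loop(monitor: Callable[[], dict],
                     detect_change: Callable[[dict], bool],
                     learn: Callable[[dict], None],
                     adapt: Callable[[dict], None],
                     period_s: float = 1.0,
                     max_cycles: int = 3) -> None:
    """Performance monitoring -> learning -> change management, run in parallel
    with (not instead of) the standard control loop."""
    for _ in range(max_cycles):
        observations = monitor()          # current state of operations and deviations
        if detect_change(observations):   # weak signals, drift from planned operations
            learn(observations)           # update the feedback base / risk models
            adapt(observations)           # trigger organizational or technical change
        time.sleep(period_s)

# Illustrative wiring with trivial stand-ins for each capability.
history: List[dict] = []
supervisory_loop(
    monitor=lambda: {"delay_min": 12},
    detect_change=lambda obs: obs["delay_min"] > 10,
    learn=history.append,
    adapt=lambda obs: print("escalate to operations management", obs),
    period_s=0.01)
```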

To summarize the characteristics of a potentially resilient usage process, the process must be able to:

– provide an independent viewpoint with the ability to question current modes of organization. This allows us to overcome certain operational constraints and power struggles;

– access information on the current state of operations and evolutions of this state, notably differences in relation to planned normal operations; and

– be aware of margins for maneuver and the weak points of an organization, and the potential differences between prescribed modes of operation and what happens in reality. The aim is not necessarily to remove these differences, but to exploit the margin for maneuver that they may generate.

1.4.3.2. Engineering resilience in the system design phase

We shall now propose an architectural framework that allows us to take the capacity for resilience into account during the system design process. To do this, we take inspiration from our work in the domain of robotics [LUZ 02].

In effect, autonomous robotized systems must, by definition, be able to adapt to disturbances, change their mode of interaction with the environment if necessary and adapt their own operations as required. In order to do this, the system must be able to integrate knowledge of its own dynamics and the representation of the exterior to construct explicit representations of its internal and external worlds. A number of different architectures have been proposed for artificial robotized systems in order to provide increased capacities of autonomy, and several decades of feedback allow us to measure certain intrinsic limits and to analyze possible directions for improvement.

First, we note that a robot is a complex system combining sensors, actuators, electronic and mechanical organs and processing resources. It must have access to means of organizing these different heterogeneous components in order to fulfill a predefined mission, which may, moreover, evolve over time. Additional constraints, such as considerations of real time and cost, are present, and these must be taken into account in operational systems. Architectures must therefore provide the elements of a response regarding the best way to construct a system based on different basic components. They must produce a coherent system using these components and find the way to organize components to fulfill a mission that changes over time.

Historically, the first robot architectures were derived from the perception–planning–action paradigm found in artificial intelligence: this is a top-down approach based on a recursive functional decomposition of the problem into sub-problems, down to a level of granularity at which an explicit solution may be found. However, architectures of this kind are subject to problems of symbol grounding (the attachment of a symbol to real data), completeness (the impossibility of imagining all possible situations in a real-world context) and brittleness (the difficulty of specifying the different situations that may arise, with the risk of associating a particular mode with each situation, making the system excessively sensitive to any misjudgment). In other terms, the corresponding systems manipulate symbols that cannot be constructively linked to environmental traits; they must be based on an environmental model that needs to be complete and hierarchically redefined in order to correspond to a top-down decomposition. All of this is within the realms of possibility in static environments (as is usually the case in certain flexible industrial workshops), but any unplanned situation can have a dramatic impact on the system.

In reaction to this approach, bottom-up architectures have been proposed, inspired by work in the fields of biology and ethology. These are not based on explicit environment models, but on input–output behaviors, which are integrated in order to be able to solve more complex tasks. Braitenberg vehicles, constructed several decades ago, illustrate this idea: by combining phototaxis and avoidance behaviors, we create behaviors that might be interpreted as coming from a higher level. One of the best-known architectures in this family is Brooks’ subsumption architecture.
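To make this idea concrete, the following minimal sketch – our own illustration in Python, not taken from Braitenberg’s original vehicles or from the cited architectures – shows how two hard-wired sensor–motor couplings, with no environment model at all, already produce behavior that an observer might read as purposeful. The sensor names and gains are hypothetical.

```python
# Minimal, illustrative Braitenberg-style vehicle combining phototaxis and
# obstacle avoidance. No environment model is used: the behavior comes
# entirely from two fixed sensor-to-wheel couplings.

def braitenberg_step(light_left, light_right, prox_left, prox_right,
                     base=0.2, k_light=1.0, k_prox=0.8):
    """Map raw sensor readings to (left_wheel, right_wheel) speeds."""
    # Phototaxis: crossed excitatory connections (the left light sensor
    # drives the right wheel and vice versa), so the vehicle turns towards
    # the light source.
    # Avoidance: same-side excitatory connections from proximity sensors,
    # so the wheel nearest an obstacle speeds up and the vehicle turns away.
    left = base + k_light * light_right + k_prox * prox_left
    right = base + k_light * light_left + k_prox * prox_right
    return left, right

# Example: light ahead-left, obstacle close on the right.
# The right wheel spins faster than the left: the vehicle turns left,
# towards the light and away from the obstacle on its right.
print(braitenberg_step(light_left=0.6, light_right=0.2,
                       prox_left=0.0, prox_right=0.9))
```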

Like the previous approach, this one, pushed to its extreme, is doomed to failure: it leads to systems that are unable to solve complex tasks. It does allow systems to produce behaviors that resemble those seen in certain insects or animals, but this level cannot, a priori, be exceeded.

In order to profit from the advantages of both these types of architecture (and, if possible, avoid their shortcomings!), a third way has been developed in recent years.

“Hybrid” architectures contain a reactive component (inherited from the bottom-up approach) and a decision or planning model taken from the top-down approach.

We support this third area of research, and have proposed a hybrid architecture for autonomous robotized systems. After experimentation on mobile robots, we were able to contribute to the extension of this architecture for other artificial systems for reasoning and decision making [AFS 07, MAR 08, SAL 07].

Our proposed architecture consists of four blocks – perception processes, an attention manager, a behavior selector and action processes – organized around a fifth, central unit: the representations.

Figure 1.1. Proposed architecture

Sensors return data that are exploited by the perception processes to create representations of the environment. These representations are therefore instances of specialized perception models. For example, in the domain of robotics, if the sensor is a video camera, then the representation of a wall to follow may be restricted to the coordinates of the edge detected in the image. Each representation comes with references to the process involved in its creation: the date and various other pieces of data linked to the sensor (position, zoom, etc.). Representations are stored in a memory bank of fixed length, with a discard mechanism (the information acquired first is deleted first) used to manage the size of the memory bank. Thus, representations are images of recent points of interest detected in the environment, with spatial and temporal details.
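By way of illustration, the following Python sketch shows one possible implementation of such a store: time-stamped, scored representations held in a fixed-capacity memory that discards the oldest entries first. The class and field names are our own and are not those of the architecture described here.

```python
# A minimal sketch of the representation store described above: each
# representation keeps a reference to the process and sensor that produced
# it, a timestamp and an evaluation score; the store has a fixed capacity
# and discards the oldest entries first.
from collections import deque
from dataclasses import dataclass, field
import time

@dataclass
class Representation:
    kind: str                 # e.g. "wall_edge"
    data: object              # e.g. coordinates of the detected edge
    source_process: str       # perception process that built it
    sensor_state: dict        # position, zoom, etc. at acquisition time
    score: float = 1.0        # trustworthiness assigned by the process
    timestamp: float = field(default_factory=time.time)

class RepresentationMemory:
    """Fixed-size FIFO memory of recent, situated representations."""
    def __init__(self, capacity: int = 64):
        self._items = deque(maxlen=capacity)   # oldest entries dropped first

    def store(self, rep: Representation) -> None:
        self._items.append(rep)

    def recent(self, kind: str, max_age: float = 1.0):
        """Return representations of a given kind acquired recently."""
        now = time.time()
        return [r for r in self._items
                if r.kind == kind and now - r.timestamp <= max_age]
```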

Perception processes are activated or inhibited by the attention manager, and also receive information on the currently active behavior. This information is used to plan and check the coherence of a representation. The attention manager has three basic functions: it updates representations (on a periodic or exceptional basis); it monitors the environment (detecting new events, using prediction and feedback loops); and it guarantees the efficient use of processing resources. The behavior selection module chooses a behavior for the robot based on a predefined goal or goals, the representations and their estimated reliability.

Finally, behaviors control the actuators in a closed loop with the associated perception processes, and each behavior has a corresponding action selection mechanism that depends on the current situation.

The key ideas involved in this generic architecture are as follows:

– use of sensory-motor behaviors linking perceptions and low-level actions: this coupling allows us to compare the prediction of the next perception (estimated from the previous perception and the current command) with the actual perception obtained after application of the command, in order to decide whether the current behavior is progressing correctly or should be modified;

– use of perception processes, with the aim of creating local and situated representations (i.e. representations with a spatio-temporal position) of the environment. No global model of the environment is used; however, from these local and instantaneous representations, it is possible to create less local or higher-level representations;

– evaluation of the quality of each representation: each processing algorithm includes evaluation metrics that give each constructed representation a value expressing its trustworthiness. This is important in that each processing algorithm has a domain of validity, with parameters ideally adapted to certain situations; there is no “perfect” process that always provides correct results;

– use of an attention manager: this supervises the execution of processing actions on data produced by perception, independently of current actions. It takes into account the processing time required for each perception process, and the cost in terms of processing power requirements. It examines all new events due to environmental dynamics, which might signal a new danger or an opportunity for behavioral change. It may also trigger processes used to check whether sensors are functioning properly, and is able to receive error signals produced by current perception processes. In practice, for a visual sensor, for example, attention is focused on lighting conditions, the coherency of the robot’s movement, the temporal coherency of representations, and on error signals sent by the perception processes. With this information, it is then possible to invalidate representations involving malfunctioning sensors or badly used processes;

– the behavior selection module, which chooses the sensory-motor behaviors that must be activated or inhibited, based on a predefined objective, the available representations and the events created by the attention manager. This module is found at the highest level of the architecture. Note that the quantitative evaluation of representations plays a key role in the decision process of the behavior selector (a minimal scoring sketch follows this list). First, a representation may be more or less suitable for the current situation, depending on the sensor used or on the conditions in which the perception information was acquired. For example, in mobile robotics, a daytime camera used at night will provide representations that are not particularly reliable. Second, certain representations may be more or less interesting depending on the behavior, or may provide additional assistance in choosing between behaviors. For example, a wall-following behavior needs information on contours more than velocity vectors, whereas a target-following behavior has the opposite priorities. Thus, each representation is weighted for each behavior according to its potential usefulness, and this weighting is combined with the intrinsic evaluation of the representation;

– the action selection module, which brings together the low-level controllers that act on the actuators; it uses valid representations to calculate control laws.
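The following sketch illustrates, with hypothetical names and figures, how the evaluation and weighting ideas above might be combined in practice: the intrinsic score of each available representation is modulated by a per-behavior usefulness weight, and the behavior with the strongest weighted evidence is selected.

```python
from collections import namedtuple

# Each representation is assumed to carry a kind and an intrinsic quality
# score, as in the memory sketched earlier; names and weights are invented.
Rep = namedtuple("Rep", "kind score")

BEHAVIOR_WEIGHTS = {
    # behavior: {representation kind: usefulness weight for that behavior}
    "follow_wall":    {"wall_edge": 1.0, "velocity_field": 0.2},
    "follow_target":  {"wall_edge": 0.2, "velocity_field": 1.0},
    "avoid_obstacle": {"obstacle": 1.0, "wall_edge": 0.3},
}

def select_behavior(representations, goal_bias=None, weights=BEHAVIOR_WEIGHTS):
    """Choose the behavior whose weighted evidence is strongest."""
    scores = {}
    for behavior, usefulness in weights.items():
        # Combine the intrinsic evaluation of each representation with its
        # usefulness weight for this particular behavior.
        scores[behavior] = sum(usefulness.get(r.kind, 0.0) * r.score
                               for r in representations)
        # The predefined objective is expressed here as a simple bias.
        if goal_bias:
            scores[behavior] += goal_bias.get(behavior, 0.0)
    return max(scores, key=scores.get)

# Example: a reliable wall edge and a dubious obstacle detection.
reps = [Rep("wall_edge", 0.9), Rep("obstacle", 0.3)]
print(select_behavior(reps, goal_bias={"follow_wall": 0.2}))  # -> 'follow_wall'
```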

Let us now look at the way in which this architecture operates. Behaviors follow on from one another based on events detected by the perception processes. For example, an obstacle avoidance behavior would naturally follow a behavior focused on following the edge of a road in cases where an obstacle is detected (this perception mechanism forms an integral part of the sensory-motor behavior of obstacle avoidance). Based on this specific case, we can make certain generalizations: each active behavior corresponds to a sub-group from within the full set of perception processes, and their activation should allow the detection of an important event leading to the modification of the current behavior. Events may be split into three categories: those that are important for system dependability; those that only have an immediate influence on the current behavior; and those that have no direct impact or a very low probability of occurring.

In addition to organizing perception processes by order of utility, the attention manager must guarantee the reactivity of the system. In order to achieve this, we might consider the processing cost of each perception process and allocate a processing time quota to the manager, which is shared between essential perception processes (those with a direct effect on dependability) and other useful processes, with regular activation of the first set of processes. This organization of perception processes evidently depends on prior knowledge of the system and its conditions of evolution in the environment, but adaptation or learning mechanisms may also be included to generate evolution.
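A minimal sketch of this quota mechanism – again with hypothetical names, costs and budget – might look as follows: essential perception processes are activated at every cycle, and the remaining time budget is spent on the most useful optional processes.

```python
# Quota-based scheduling of perception processes (illustrative only).
# Essential processes (those affecting dependability) are always scheduled;
# the remaining budget goes to optional processes in order of utility.

def schedule_perception(processes, time_budget):
    """processes: list of dicts with 'name', 'cost' (estimated run time),
    'essential' (bool) and 'utility' (float). Returns the names scheduled
    for this cycle without exceeding time_budget."""
    scheduled, spent = [], 0.0
    # 1. Essential processes are activated at every cycle.
    for p in (p for p in processes if p["essential"]):
        scheduled.append(p["name"])
        spent += p["cost"]
    # 2. Remaining budget goes to the most useful optional processes.
    optional = sorted((p for p in processes if not p["essential"]),
                      key=lambda p: p["utility"], reverse=True)
    for p in optional:
        if spent + p["cost"] <= time_budget:
            scheduled.append(p["name"])
            spent += p["cost"]
    return scheduled

# Example cycle with a 20 ms budget.
procs = [
    {"name": "obstacle_detection", "cost": 6.0, "essential": True,  "utility": 1.0},
    {"name": "sensor_self_test",   "cost": 3.0, "essential": True,  "utility": 1.0},
    {"name": "wall_edge_tracking", "cost": 8.0, "essential": False, "utility": 0.8},
    {"name": "long_range_mapping", "cost": 9.0, "essential": False, "utility": 0.4},
]
print(schedule_perception(procs, time_budget=20.0))
# -> ['obstacle_detection', 'sensor_self_test', 'wall_edge_tracking']
```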

We shall now move on to the mechanisms used for the quantitative evaluation of the various representations. Looking at the operation of the architecture, we see that the action selector executes behaviors made up of action and perception processes, whereas the attention manager only selects perception processes. Temporal evaluation is carried out at the level of the action processes, taking account of previous actions and perceptions; instantaneous evaluation is carried out on the perception processes. The attention manager carries out a comparative evaluation of all representations, made possible by the fact that it is not constrained by the immediacy of the low-level sensory-motor loop. The action selection module carries out long-term evaluations of the perception processes with regard to the execution of the tasks assigned to the system; this allows deliberative mechanisms and planning to be implemented at this level.

These different evaluations are carried out using metrics that, depending on the case, are based on intrinsic quality indices found within the processes or on effectiveness in relation to the task or mission underway.

The distribution of evaluations in this way provides the architecture with an important capacity for autonomy, where different time scales are able to coexist, with a synchronous character at the low level (the sensory-motor loop) and an asynchronous character both in the perception–behavior selection–action–perception loop (the deliberative loop) and, especially, in the perception–attention manager–perception loop (the attention loop). There is no centralized, omniscient evaluation and decision mechanism, and it is the existence of these three loops, with different time scales and coordination with developments in the environment, that renders the architecture effective. As there is no centralized evaluation, there is no global metric, but rather a set of metrics for each local process, and the architecture as a whole enables the use of all of these metrics. Here, we find a good illustration of the way in which we have reformulated the problem of metric definition as a design problem for a multi-level architecture.

1.4.3.3. Conclusion

Through reading the description of the presented architecture and the functionalities it provides, we see that this architecture is relatively generic and can be applied to decision making for complex systems in real-world situations with changing environments and goals.

The main interest of this architecture is that it simultaneously offers a low-level loop, a deliberative loop and an attention loop. The low-level loop relates to the real-time evolution of the system within its environment (with the possibility of implementing redundancy mechanisms at this level to provide reactivity when faced with certain situations, particularly system component failures). The deliberative loop is the same as that found in classic decision systems. It allows open-loop planning, based on representations and, if necessary, on simulating behavior over a period of time, as when a sportsperson concentrates on a race and rehearses his or her trajectory before the start. The attention loop allows us to manage processing resources and perception processes during their implementation, in order to account for events that may indicate unforeseen situations to which the system must react. It is this attention mechanism that allows permanent evaluation of the environment and of the development of system behavior, including any deviations from predicted behavior, and it thus corresponds to the qualities discussed above as guarantees of resilience.

We shall now return to the concepts involved in the loops cited above and reinterpret them from the perspective of a negotiation between participants in a debate, with four main components: action, perception, supervision and diagnostics [AFS 07, MAR 08, SAL 07]. Action and perception retain their meaning, the selection of behaviors is reinterpreted as supervision, and the attention mechanisms are relabeled as diagnostics. The specific links that do not directly relate to the two loops are, in this case, notifications between perception and supervision and the regulations between diagnostics and action. This allows us to consider the action/perception and diagnostics/supervision pairings as part of a set of paradigmatic oppositions, resolved by specific links. Thus, we can see how far the debate provides each participant with new knowledge to improve his or her own actions. Using this different interpretation, it is even possible to tackle collective modes of reasoning, as practiced within scientific communities (the influence of Thomas Kuhn is clear in this vision), thus enlarging the field of application of work initially carried out in robotics. The domain of robotics is, in fact, only one specific field of application. It is emblematic due to the presence of the dual issues of real-time operations and the requirement for an evolved capacity for decision in order to carry out complex tasks, and due to its demonstrative character. These reflections can, however, be transferred immediately to any complex system involving demands of direct participation in the real world and a capacity for reflection – a level of adaptation to context above and beyond simple immediate reaction.

If we refer to the previous sections, it seems that the mechanisms implemented by the different loops presented and the associated evaluation issues respond to the demands of resilience we initially discussed. Therefore, we feel that the architecture presented is a pragmatic response for engineering resilient complex systems.

1.5. Development of relationships between participants

With the increasing complexity of systems and the high cost of the infrastructures often required for subsequent service provision, the “usual” project management and contracting organizations must evolve in order to manage risks, but also to guarantee the financial feasibility of these ambitious projects. The 1990s thus saw the development of new modes of contracting and financing in public service management, particularly in major cities and in transport infrastructure projects. We thus have access to a fairly large amount of feedback [CAM 09, PER 00], with projects spanning periods of up to 20 years, although this is mainly limited to infrastructure or building projects; the services sold are directly dependent on the infrastructures that make them possible. We find practically all of the schemes that are a priori possible in terms of the distribution of engagements and the delegation of responsibility between public and private sector actors, from simple operating contracts to outright privatization, via various forms of delegated management, leasing, concessions, etc. Feedback allows us to identify traps to be avoided and recipes that generally lead to success, but the fundamental question remains of whether it is reasonable to apply these techniques to large complex systems that are less directly connected to transport or collective urban services.

The development of definition and design activities, of relationships with clients and users or recipients (of the final product or intermediary services) and directors, managers, etc., leads – at the very least – to the use of adaptation mechanisms for these professional environments or to the emergence of new preoccupations. The latter take into account new demands emanating from the context of activity, the development of the economic, political or regulatory situation, the evolution of social demand, etc.

As an example, we might cite the increasingly assertive way in which the limitations of sustainable development are taken into account, including ecological aspects and mastery of the energy component, and the social aspects linked to this sustainable development with regard to child workers, for example, or the exploitation of certain groups (aspects seen in the fair trade movement from the early 2000s onwards). This is fundamentally due to the intensification of social interactions within production processes. This interleaving of interactions concerns all actors involved at one point or another (project managers, economic clients, final users, financial, institutional or political partners), which modifies the organizational structures of project realization, with the appearance of new issues, activities and professions.

In addition to the increased complexity of organizations and of chains of responsibility, we should also note that this leads to an increase in the distance between actors, particularly between designers and clients, increasing the risk of poor understanding of real issues or deformation of certain constraints. In fact, the complexity inherent in the envisaged systems implies a less affirmative definition of needs, with a necessity for mutual understanding of the needs, expectations and intentions of all those involved. This raises a need for new modes of dialog and shared working practices.

Given these financial, economic and even societal issues, the contractor is usually a public entity: state service, territorial collectivity or plenipotentiary agency. We shall therefore study how this relationship must evolve in relation to various issues, adding to the first elements of discussion on this subject contained in [LUZ 08a]. The examples we may use to understand this requirement for evolution include contracting in the domain of transport (motorways and crossings, including viaducts across estuaries or large valleys, public transport systems, ports, airports), delegated management of collective urban services (urban user services, energy distribution, public lighting, water and sanitation, waste management), etc.

First, we should note that this issue is not, by any means, new. From the 16th Century onwards, we find different types of contracting for infrastructures, particularly involving private companies. From the end of the 19th Century, we observe delegation in the management of certain urban services (urban transport systems in particular). This should not come as a surprise, as the notions of contracting and public management date from the 13th Century, as seen in the writings of Villard de Honnecourt, with the existence of contracts between actors made necessary by the flourishing cathedral construction industry. During this period of the Middle Ages, we also see the emergence of specialized trades, something that was not common in the 12th Century and is characteristic of a distribution of responsibilities. Given that the workforce was no longer entirely versatile, it became necessary to organize it. There was thus a need to manage relations between, on the one hand, the entity requiring a service, which defined the aims, calendar and budget, and, on the other hand, the competences of the workforce involved in designing, explaining, planning and providing quotations. While the terms used and trades involved were not the same as those found today (for example, the chapter of canons would choose a person responsible for directing work, procuring materials, keeping accounts and engaging qualified workers), we find a distribution of responsibilities and a hierarchical organization not dissimilar to current working practices on complex projects.

Let us stay with this example of the construction of cathedrals. With the aim of connecting Heaven to Earth, while highlighting the gulf between ordinary mortals and those in power through daring and imposing architecture, the full set of present-day logistical and even sustainable development issues already existed. Particular efforts were made to save resources (construction materials, such as wood, stone, lime and iron, but also materials used for planning, such as parchment!) due to supply issues and to avoid overexploitation, given an insufficient industrial infrastructure. Efforts were also made to overcome transport difficulties: resource management, logistics and the acquisition of raw materials close to good supply routes, along with the construction site itself, were major and permanent concerns for the teams responsible for carrying out the work.

While major changes took place due to wholesale industrialization in the 19th and 20th Centuries, when mankind first succeeded in simplifying problems through the automation of both work and methods, this historical perspective remains interesting as the complexity of modern day systems in relation to our present capabilities is comparable to that seen in earlier systems in relation to what could be practically envisaged. Nowadays, who would launch themselves, heart and soul, into the construction of edifices over a predicted time-span of a few decades in relatively unstable geopolitical contexts with no financial guarantees?

The new relationships between different participants must provide higher quality services at a lower cost to society. Thus, we have recourse to the private sector to finance investment projects based on projected revenues generated by future use. This alleviates public budgetary constraints (by avoiding contributing to the public debt, and possibly through profit sharing between the operator and the public authorities). It also enables the public sector to profit from particular skills in terms of technical ability or asset management that are not necessarily available internally.

Why, then, do we make use of the private sector in providing public goods and services? This is not done for ideological reasons, but due to concerns of economic efficiency, for four main reasons:

– the private sector benefits from economies of scale, in that it is able to provide the same type of service to several clients at national or international level, allowing investment costs to be recuperated and the risk linked to development and production to be distributed across a wider base;

– the private sector benefits from a system of incentives and sanctions (profits versus bankruptcy), giving it a dynamic unseen in the public sector, where even in the case of cost centers, failure does not have the same, irreparable, results;

– the private sector has greater flexibility in terms of use of finance, investment policies and resource management, enabling greater reactivity and increased momentary efficiency;

– the private sector is better able to make use of advanced technologies, mainly due to the capacity for short-term investment and flexibility in the management of resources and competences.

These relationships operate outside the strict sequential framework of client–supplier relations, where demands and responses are expressed in a highly formalized framework of exchange, defined in terms of time and expected results: the complexity of these systems requires better integration and synergy in different phases of design, construction and operation. Thus, when we consider major investment projects taken on in whole or part by the owner, it is important to give him or her a high degree of freedom in the creation of the usage contract, in order to give maximum stability and equity. Stability is necessary as a guarantee to maintain the relationship over time and ensure the success of the project. Equity is necessary in that the risk taken in design or at the beginning of exploitation should be covered as far as possible by future remuneration generated throughout the phase of use.

Risks can vary greatly in nature, and their identification and mastery are key factors for success (or failure). During the design phase, we encounter:

– technical risks;

– cost overrun risks related to delays in delivery;

– risks in the successful execution of the project;

– risks linked to interfacing and sub-contracting;

– economic and financial risks (evolution of cost and resource indices, exchange rates in the case of international purchasing or sub-contracting, etc.); and

– risks linked to refinancing in complex operations where major liquidity may be required during different phases of the project.

Next, during the usage phase, we encounter:

– risks linked to immediate exploitation;

– revenue risks linked to external income;

– risks of increase in the cost of use;

– financial risks;

– environmental risks – whether due to natural disasters or to macroeconomic causes (financial crisis);

– legal risks (particularly developments in legislation);

– political risks (particularly regime changes bringing changes in priorities, which may pose problems for stability over the duration needed to guarantee profitability on initial investment);

– risks of social and cultural acceptability;

– etc.

Risk management, in the form of estimation, distribution and valorization, aims to allocate these risks to those best able to master them, based on the cost-efficiency relationship. By looking to allocate risks in an optimal manner and partially transferring risks, whether technical construction risks (which are easy to deal with) or commercial risks linked to subsequent use, public authorities are better able to concentrate on monitoring quality of service rather than on the detailed definition or effective provision of this service. The user is the client, and the operator must optimize the quality of service offered based on an economic model with a framework that is fixed in advance, but evolves and adapts based on the context through the actions of the operator. This flexibility in operations can be used to gauge mastery of complexity; it is impossible to fix or predict everything in advance.

We are thus confronted with a redefinition of roles: the proprietary and operating entity on one side, and the regulatory and monitoring body on the other. For this latter body to successfully carry out its work, systematic performance indicators are needed to make it possible to follow the execution of service provision. Otherwise, we run risks of non-performance and potential interruption of work and services (a few years ago, this almost happened in the context of use of the Channel Tunnel). For this reason, we must develop technical expertise in contracting to limit risks during different stages (without falling into the trap of abandoning technical specification abilities just because production is externalized). We must also develop legal and financial expertise for participation in risk reduction, where only those risks that may be covered in the framework of a durable contractual relationship are transferred.

In this way, a contract must be established based on a risk and profit structure, following the principle of seeking financial and economic balance. In this new distribution of roles, the public authority usually retains the role of mission definition and regulation. The process may (and should) be envisaged as a reversible process, however, notwithstanding the problem of skills that may disappear if they are not used, rendering this reversibility even more difficult to attain. This same problem is encountered in the nationalization or privatization of companies or sectors: movement is possible in both directions (privatization is never definitive, even if renationalization is often the fruit of strong political alternation, as in the case of the crude oil sector in various countries in the Middle East and South America over the past few decades).

Relationships between participants are obviously contractual, with both parties engaged in obligations to the other. However, this contractual relationship should be seen as a relationship that profits both parties. In the case of points of divergence (but also of strong convergence) it is important to implement specific discussions in order to refocus on common goals and the distribution of potential profits.

During the Middle Ages, a cathedral construction project had to be accompanied by a set of alliances providing durable guarantees of the feasibility of the project, notably by avoiding local armed conflict that might put an end to the project. In the same way, the financial sustainability of large complex projects requires financial constructions that are themselves complex, with access to and development of financial markets. The previous remark also highlights the role played by politics, however, as certain general policy decisions can have an effect on public spending, reduce subscriptions and, generally, have an impact on operating conditions in the course of a project, which may or may not generate significant cost increases.

This all serves to underline the key role of partnership: partnership built, perhaps, around specific and unrelated objectives, but also built locally around shared goals. In this, the key values are respect and confidence, required for the creation of a special and durable relationship. In fact, any failure in relations between participants, particularly in contracting, leads to a reduction in the quality of service provided, with an associated risk of cost increases and, for the operator, lower profits or even the loss of some or all of the capital invested.

Having highlighted contractual flexibility as a factor for success (in this context, “flexible” does not mean “fuzzy”, but the capacity to adapt based on particular contexts without needing to rethink the entire contractual structure), it is clear that overly-complex contracts must be avoided. This is because they become difficult to maintain, particularly in cases that attempt to predict all eventualities and thus define all obligations. In reality, it is necessary to adapt to new environmental norms, investment priorities, possible financial devaluations or economic or social crises. Contractual adaptation should not itself be seen as a “crisis”, but as an adjustment mechanism linked to the evolution of initial conditions. Evidently, this should always take place in a climate of trust without either actor seeking to profit abusively from the situation.

This mechanism is not easy to implement, and for this reason, in highly complex projects, it is useful to make use of a regulator (the contract between participants is not always sufficient). The regulator can mediate concerning different local problems, ensure the respect of major contractual engagements, make impartial statistical analyses, assist in contracting when necessary and provide support in the adaptation of rules. Thus, the construction of new rules is the result of cooperation between actors, and this rather pragmatic approach becomes, in a way, the product of a “learning” process linked to the relationship between different parties. In addition to the creation of basic rules for sharing, we need to have the means of applying and ensuring respect of these rules, which requires competences in monitoring, measurement and evaluation; where necessary, contract regulation may be followed by renegotiation.

Here, we have a clear illustration of the evolution of the competences of contractors: from prescription to regulation, project management (in the production and usage phases) with a focus on the allocation of risks, contractual engineering, and the full-cost approach (this last aspect should guarantee mastery of the various unitary choices made in the project as a whole, across all phases of design, construction, maintenance and operations).

Financial setups also have their own specific features, in that they must be suited to the risk structure, but also to the legal context. Classic financial responses are not necessarily suited to certain risk levels, particularly given the duration of the projects considered here.

Finally, it seems that in the context of large complex systems, operating within one or more projects, the plurality of approaches (industrial, service, financial and contractual) presents a real challenge. This challenge can only be met through the use of real expertise in legal (including the management of industrial and intellectual property rights), financial, accounting and technical matters, alongside contractual engineering. These competences must be used in the definition of the quality of the goods and services concerned.

1.6. Complexity: plurality of viewpoints for systems engineering

Complexity is a recurring theme and one that is difficult to pin down: it is hard to provide a precise definition of the concept. We often encounter an opposition between “complication” and “complexity”, although this distinction is not always evident in everyday usage. Etymologically, the “plect” aspect of complexity refers to interlacing or braiding; whereas “ple” or “plic” in “complication” refers to folding and the juxtaposition of components. It is, then, the properties of interaction and interweaving that constitute the difference between that which is complex and that which is complicated. This interweaving is temporal as well as spatial, highlighting dynamical and local interaction loops; it may be observed at the level of system components, but also in exchanges with the environment of the system that is not considered to be part of the system itself.

This raises another issue: the definition of this difference between “interior” and “exterior”, the exact location of the place in which they are separated – the border, which should be explicit, as a place (spatial) and in links (interconnection) for exchange between the system and another system – the environment – from which we may wish to free ourselves if necessary, as it is even more difficult to master. It is from this lack of mastery – which is not only voluntary but is caused by the impossibility of imposing a desired dynamic on the environment – that one part of complexity emerges. Complexity reappears on the inside of the demarcation line, via the establishment of local mechanisms for mastering exchanges of resources, energy and information. As it is presented by the proponents of autopoiesis, the border is born of and embodies/represents the difference between what it then defines geometrically and dynamically as the endosystem and the exosystem: if, for a cell, the membrane is the natural place of the border, does this mean that, for a human being, the body constitutes this place? Are my spectacles part of the “me, human being in the world” system? And, in our current digital society, does my telephone, giving me access to daily services that are almost essential for my place in society (or for my very survival, to go by the example of certain young people), delimit the border with the environment in which I live and work, or is this frontier further away? If we consider the telephone as an integral part of the “human system within society”, does it determine the border as a physical place at the level of its strict material extent, or should we integrate other software, material and information components? In an attempt to obtain a minimalist definition of the concept, at the most we could hope that interactions with the environment would be less important than internal interactions, providing a basic dynamical division, with the separation incarnated in the border.

After addressing the question of the border, we are faced with the question of opening and closing the system. This is a key argument involved in the complexity of the system, as it deals with the existence of unplanned and a priori unpredictable exchange flows that are unquantifiable in time and space. This trait in particular motivates adaptation mechanisms, which move from reactivity to proactivity, anticipation and learning, aspects that are characteristic of complexity.

In an attempt to make progress in the debate regarding the definition of a complex system, and with a view to creating strategies to master this complexity, we shall now list some characteristics of a complex system:

– its elements respond to solicitations from nearby elements, to which they are linked by exchanges that vary over time;

– a large number of constituent elements, often heterogeneous;

– the interweaving of components with other components, but also with nearby elements with which they may connect, is such that global structure and behaviors cannot easily be deduced from exhaustive knowledge of individual structures and behaviors.

The first characteristic shows the importance of a dynamical vision, and particularly of retroaction, which is essential as it provides the means of counteracting the second law of thermodynamics: that of the increase in entropy, by producing “pockets of order” [JOH 07]. To skeptics who see this as an invalidation of the second law, we would respond that retroaction implies the operation of a regulator, and so an external energy source, meaning that the regulated system is no longer isolated. This explains the fact that it may be ordered via regulation. Complexity then comes from the distance between this level of order and the disorder that would be generated by the simple application of the second law, and specifically from the link between this distance and exchanges that take place to implement it (in control theory, this is known as the synthesis of the control law).
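To recall what the synthesis of a control law means in this classical setting – a standard formulation, not specific to this chapter – consider a linear system regulated by state feedback:

\[
\dot{x} = A\,x + B\,u, \qquad u = -K\,\bigl(x - x_{\mathrm{ref}}\bigr) \quad\Longrightarrow\quad \dot{x} = (A - BK)\,x + BK\,x_{\mathrm{ref}} .
\]

The gain \(K\) is chosen so that the eigenvalues of \(A - BK\) have negative real parts; the regulator maintains this local “pocket of order” only by injecting energy and information from outside the regulated system, which is precisely why the second law is not violated.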

The second characteristic, if it occurred alone, could be attributed to simple complication. In quantity and quality, it may be observed in the cases under study: diversity of product and service components, diversity of organizations and relationships, a number of elements of the kinds already cited that increases through interconnection, etc. This characteristic is a structural marker, but remains purely static; time is not explicitly taken into consideration.

The last characteristic is often the mark of what is known as “emergence”, which is in and of itself a “complex” term, liable to give rise to debates that are difficult to resolve! Emergence is often observed mathematically due to:

– boundary conditions (harmonics on a resonating violin string or waves in a closed container, for example);

– the presence of forces outside the system (such as gravity, which sculpts certain snow crystals in addition to the phenomenon of accretion around impurities);

– the result of static interactions leading to equilibrium (Turing patterns in a chemical reaction–diffusion equation, for example); or

– dynamic interactions potentially maintained by an external source, in which case they characterize situations that are far from equilibrium (such as traffic jams in the road network).

In what follows, we shall look at the fact that these characteristics occur in natural systems as well as in artificial systems, and we shall focus on the bridges we might build between these two domains. This is for the simple reason that the scientific and technical communities interested in these domains are generally different and communicate very little with each other about their centers of interest and the results and methods they obtain and develop.

We could indeed admit that there is no similarity between a termite or ant hill and urban collective services, or between a school of barracudas hunting sardines and a naval carrier group. However, the temperature regulation mechanism of a termite mound and the capacity to recover this regulation ability in a matter of hours following a major cataclysm (such as the destruction of part of the edifice), the removal of the remains of dead members of the colony, the search for food and its distribution within the insect community – all present analogous problems. Such models thus offer ideas for solutions, for example for regulating traffic in a city center with the accompanying need to master resource supply (consumption, energy, etc.) and the evacuation of waste. In the same manner, the hunting and evasion mechanisms of schools of fish, both predators and prey, provide inspiration for models useful for missions involving the occupation of a maritime zone and the submarine operations of a military carrier group.

The fundamental difference is that natural systems, unlike artificial systems, were not designed by human beings, although both are often used by mankind seeking to profit as much as possible from this exploitation, whether by gaining maximum immediate profit or by making it last as long as possible. As an aside, note that we are not going to go into the debate of whether these natural systems are or are not the fruit of intelligent design… in any case, even staunch supporters of this idea would admit that these systems are not the product of a limited rationality, such as that used in engineering artificial systems.

How, then, does this engineering process operate, and what practices lead us to suppose that the problems involved are so different as to produce an almost total absence of exchange over the past few decades?

First, systems engineering, as it is traditionally practiced, is based on a double principle of decomposition and integration. It follows the purest Cartesian tradition of breaking a problem down into sub-problems, whose context is simplified with the aim of achieving mastery, in order to propose a solution for each elementary question and finally reassemble the pieces of the puzzle and their solutions, thus constructing a global solution element by element. This way of working is based on hypotheses of reversibility and repeatability: decomposition may be followed, as needed, by reintegration without any loss of information, as the principle of superposition of elementary solutions is implicit. This non-loss of information is linked to acceptance of the fact that high-level demands may be broken down into lower-level demands. There is still, however, a need to check that elementary tasks have been successfully executed (the verification process) and correspond to expectations (the validation process), and that assembly is carried out successfully. (The integration process is based as much on summary checking that the material, physical and informational interfaces are correct as on the fear of unexpected “leaps” when moving from local to global level. The typical example of this is the assembly of parts of aircraft, satellites or rockets in different geographic locations, where the final assembly in one place is linked to subsequent transport constraints and is the object of exhaustive verifications, but a priori no questions are raised as to the feasibility of the assembly.)

The processes of decomposition and recomposition carried out in this way necessarily presume the existence of a capacity to make predictions and (clearly) the hope, or expectation, that these predictions will prove right. These are the same principles that govern the theoretical realm of linear systems, which are characterized by the principle of superposition (the solution to a problem broken down into two sub-problems is identical to the composition of the solutions of these two sub-problems). There is a particular implication that the passage from local to global level does not create problems. Knowledge of a linear system in a portion of its configuration space (the open neighborhood of a regular point of the state space, to use more formal language) is equivalent to knowledge of the system over the whole space. This linearity – in its multiple senses – can be found at all levels of “traditional” systems engineering:

– the output and behaviors of the system are known beforehand (these are expressed as performance or system requirements in other products of this engineering activity that we will look at later, such as that of a training system for future development);

– the project organization is centralized, with a project manager who makes decisions concerning the use of all necessary resources, breaks activities down into tasks and sub-tasks as necessary, with different dedicated resources. The project manager plans subsequent reassemblies in a manner that is entirely linear, in that only the attainment or non-attainment of a milestone is important; the route taken to reach this milestone has no impact on the global feasibility of the project;

– management of change may be centralized in the same way, with an authority operating along the same lines described above.

All of this works when the hypothesis of linearity, concerning both the system and its interactions with its environment, is acceptable. This requires that the environment be predictable and that possible interactions be completely accessible, both in terms of quality and quantity. This may appear to be something of a caricature, but think of factories, metropolitan transport stations, highways, etc. Limits are imposed on the environment to avoid unplanned situations, for example by tracing lines to delimit authorized passages, allowing systems to be automated, as in factories, with total normalization of curves and straight lines. There is also possibly the installation of sensors (counters to estimate traffic in road infrastructures, magnetic sensors in subway platforms to permit the automated movement of cleaning machines) in order to facilitate or limit possible behaviors and minimize or prevent situations of non-conformity. In effect, we attempt to create a closed world, whose full dynamical description is mastered (or supposed constant) and which can, in theory, be reduced locally to a particular observation.

Linear systems, as interesting as they may be, are only one particular subset of systems, however, and from a mathematical point of view this subset is itself negligible, despite the fact that almost every dynamical system (on the condition that a dynamical system is formally defined as the action of a semi-group on a set of piecewise continuous functions) may locally be approximated by a linear system. In today’s world, where systems are increasingly interconnected and the input into a system may be services provided by other systems whose organization is unknown to us, this hypothesis ceases to be valid. We are no longer dealing with objects (products, services) that are designed, created and used alone, and that may thus be managed and controlled simply, with no consideration for external factors. Systematic and exhaustive planning is no longer possible, and current complex systems require that we look to the future rather than to the past. We lose the notions of global observability – which allows the complete mathematical reconstruction of the state of a system based on past observations – and global controllability – which allows us to drive the system to a desired state based on knowledge of all past states and actions (in passing, note the mathematical equivalence of these two notions for linear systems, an equivalence that is immediately lost when considering non-linear systems).
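For reference, these global notions have exact algebraic characterizations in the linear case – the classical Kalman criteria, recalled here as standard background rather than as part of the original argument. For

\[
\dot{x} = A\,x + B\,u, \qquad y = C\,x, \qquad x \in \mathbb{R}^{n},
\]

the system is controllable if and only if \(\operatorname{rank}\,[\,B \;\; AB \;\; \cdots \;\; A^{n-1}B\,] = n\), and observable if and only if the matrix obtained by stacking \(C, CA, \ldots, CA^{n-1}\) has rank \(n\). The two notions are dual – \((A, C)\) is observable exactly when \((A^{\top}, C^{\top})\) is controllable – an equivalence which, as noted above, has no general counterpart for non-linear systems.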

In addition to the impossibility of exhaustive planning, we should highlight the difficulty of planning as a whole. There are several reasons for this difficulty, including:

– imperfect or even incorrect knowledge of the current situation;

– the deformation of a user’s view of the system due to his or her own partial role as an actor;

– a non-fixed conception of the system and its interactions with other systems (access to new services, establishment of interconnections with new systems during the life of the system, etc.); and

– the fact that monitoring aims are not always explicitly defined, making planning based on these aims even more difficult!

At times, we aim more towards global coherence in a complex situation, with an acceptable quality of service, rather than towards achieving particular results. In this way, it is more important for a mobile office to have the capacity to interconnect in almost any geographic situation following the movements of the user (with, for example, the ability to interoperate with unknown digital networks in a transparent manner) than to provide perfect service (in terms of reliability, economic efficiency, sustainability, availability, integrity, safety, etc.) within a rigid framework. This latter option would remove the “mobile” aspect of the office, whereby the system aims to have no geographic limits on the Earth’s surface.

The limitations of linearity can also be seen when working with multiple scales – a situation encountered in complex systems, where decomposition does not follow a simply definable hierarchy. A holiday reservation system providing an end-to-end service to a client, including transport, hotel accommodation and the reservation of particular services linked to leisure activities, must be able to adapt in the case of failure of one of the links in the chain (we think, here, of the episode of the Icelandic ash cloud, mentioned earlier in this chapter). It can be broken down into transport systems, hotel reservation systems, etc. These can in turn be broken down into reservation systems, security systems (information to provide to customs or bank information to be supplied in advance, for example), which themselves break down into information systems, computer systems, computer security systems, human organizations, etc. This last group can be broken down still further into materials, software, groups, individuals, etc. However, we would clearly be unable to define the higher level – the holiday reservation and organization service – in an immediate manner based on the set of computers, programs and individuals involved at all levels. Moreover, from one moment to another (i.e. for a service ordered a few minutes later), different computers, programs and individuals will be involved.

The definition of scale levels is thus essential to describe, use and attempt to master the long-term use of complex systems. This, moreover, is the approach we followed above in explaining our example. It is clearly impossible to design a system of this kind without making use of the notion of scale; it is also clear, intuitively, that the notion of performance should be described differently depending on the level involved. Here, the discussion becomes somewhat informal; this precise difficulty in grasping these informal notions highlights the complexity of the subject and the need to review certain practices. This also raises questions linked to some of our objectives, such as global performance optimization or safety. This is because in practice (and also in theory, if we consider certain classes of formal systems liable to model the systems encountered in the circumstances we have described) it is now impossible to master all components exhaustively and optimize each one, or to deduce a global safety property from knowledge of individual safety levels. The same difficulty arises in attempting to define the logistical footprint of a complex system based on the logistical footprints of its component parts. At best, we might obtain global notions of stability, robustness and resilience, focusing more on the capacity to pursue global operations via the definition of global regulation policies than on the attainment of precise quantitative goals, derived or derivable from, or set at the level of, all component parts.

This changes the task of the systems engineer. In trying to understand the operation of a termite mound or of turbulent flow in an aircraft engine nozzle, it is neither necessary nor sufficient to be able to describe the individual behavior of each termite or of each electron making up the atoms and molecules of the fluid in question. Ecological principles of competition and evolution are useful means of explanation and provide handles for mastering natural complex systems. In the same way, it is not the individual specification of each component that will give up the secrets of the resilience properties or capabilities of large complex systems, such as a multimodal urban transport management system or an inter-bank exchange network. Moreover, nature seems to have privileged co-evolution (symbioses, stabilized predator–prey food chains); in the same way, we find opportunistic aggregation far more frequently than planned design in artificial complex systems. The Internet and related network-based applications are an example of this, having flourished around certain highly successful applications (for example, the multiple reincarnations of Google or the iPhone).

We might go further, using the analogy between natural complex systems resulting from the adaptation of a particular species to a particular ecological niche and the artificial complex system resulting from acceptance by a growing group of users, whose personal or collective interest has nothing to do with the potential group of developers, architects or creators of the system. Here, the complex system is generated by the successful integration – a source of value for users – of other systems, generating revenues that allow us to consolidate and extend this integration (following a schema analogous to the development of a plant or animal species in certain conditions). The traditional schema of “simple” systems engineering, however, would have involved design and integration financed in the period preceding use.

This break from the traditional approach, where we begin with a formalized expression of a need, broken down and refined into solutions that are then assembled before being tested, with the requirement of mastery of the decomposition and recomposition processes, presents a real challenge. Current systems, which as we have seen can be put together to obtain new products and services, are not a priori composable. They do not have a shared conceptual basis, they were not created with the same aims, and they are often designed and produced to function autonomously with their own economic exploitation models and rhythm within their lifecycle and their own program management. The current tendency, however, is to try and interconnect and generate interactions among all elements that the designers, but also (and especially) users, may imagine.

Let us summarize the key differences between “traditional” and complex systems from the angle of the linear/non-linear dichotomy.

First, there is superposition – the counterpart of the myth of decomposition and recomposition – which would allow us to easily find a solution to a complicated problem by assembling individual solutions. Unfortunately, mechanisms of this kind do not generally exist in non-linear situations, obliging us to take a holistic view of the problem that cannot be reduced to a study of sub-problems. We should therefore be wary of breaking down tasks and allocating resources accordingly: real problems cannot necessarily be resolved by reducing them to these textbook cases!

A logical consequence of the previous observation is that "local" is not equivalent to "global". While the dynamics of a linear system allow us to deduce behaviors in the whole state space based on simple local knowledge around an operating point, this does not apply to non-linear cases. In non-linear cases a regular operating point (i.e. one where the dynamic behaves "nicely", in that it is easily describable, for example using a linear approximation) may be found next to irregular operating points with divergent dynamics. Such systems may also demonstrate horrendously complicated, or even complex, changes in dynamic regime. They could, for example, produce very different dynamics in an arbitrarily small neighborhood (going from one periodic mode to another periodic mode, or to a chaotic mode, as in the dynamics generated by the simple logistic equation we shall discuss later, which shows the first signs of confirmed non-linearity).

This fundamental difference between local and global is essential, in that it contradicts a presumption that "locally, every system behaves like a linear system", i.e. that the dynamic, characterized by a parameter, is such that the variation of this parameter leads to an observable variation of another parameter in a proportional manner, so that the variations of the two parameters are linked. While this model is attractive in its simplicity and seems to draw on a profound physiological predisposition, it has the drawback of not being robust from a mathematical point of view. (It has not been proven that certain neuronal connections privilege linear relations, but this is highly plausible given the results of certain anatomical studies and biological considerations linked to our heritage and the ancestral survival of our species.) Aside from the factor of chance – or possibly intelligent design – nothing, in theory, guarantees such a prevalence of linear relationships, especially as dynamic equations tend to prove that the opposite is true. However, if a divergent point can be found next to a regular point, or, worse, if a regular point is drowned in a sea of irregular points, it is difficult to imagine under what circumstances system dynamics could be predictable! It is easy to come up with examples of families of systems with these terrible properties, and families of this type are far from being isolated examples within the set of dynamical systems. In other words, the illusion of the local–global passage must give way to manifestations of unpredictability in terms of dynamics.

To move beyond the strictly theoretical textbook case, note that systems such as the global financial exchange system are not a priori linear, and their points of equilibrium may be found next to points denoting very different dynamic regimes, for example local divergences, or in other terms financial crises. The global financial situation of the period 2008–2009 might be a good example of this.

Having looked at the "myth" of the possibility of explanation, let us now move on to the myth of knowledge and determinism. In this case, we shall refer to a particular phenomenon – sensitivity to initial conditions – made popular by the meteorologist Edward Lorenz, who discovered it in 1961 and later asked whether the beating of a butterfly's wings in South America could lead to a cyclone in the United States. This phenomenon is well known today to anyone who works with numerical calculation: up to a certain level of precision, two values may appear identical even when this is far from being the case, and the accumulation of errors can produce huge discrepancies. (In case of doubt, take a calculator, attempt to calculate π – 1/(1/π), then multiply the result by one million or one billion, depending on the capacity of the instrument. Instead of 0, the calculator gives a result linked to calculation errors, illustrating this issue. Later, we shall see a better illustration of this phenomenon as, in fact, it is simply an effect derived from other limitations.)
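
To make this concrete, the calculator experiment can be reproduced in a few lines of code. The following sketch, in Python and therefore in IEEE double-precision arithmetic, is purely illustrative: the exact values printed depend on the arithmetic used, and the second part of the example shows a more systematic case of accumulated rounding error.

```python
# A minimal sketch of accumulated rounding error in floating-point arithmetic.
import math

# pi - 1/(1/pi) is exactly 0 in real arithmetic; in floating point the two
# reciprocals may not round back to pi, so the amplified difference can be a
# small non-zero number (the exact value depends on the arithmetic used and
# may even be 0 on some machines).
print((math.pi - 1 / (1 / math.pi)) * 1e9)

# A more systematic illustration: adding 0.1 ten times does not give exactly
# 1.0, because 0.1 has no exact binary representation.
total = 0.0
for _ in range(10):
    total += 0.1
print(total, total == 1.0)   # 0.9999999999999999 False
```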

From the viewpoint of the systems considered, this sensitivity to initial conditions in dynamical systems means that a small disturbance may produce a large effect (in financial trading, this effect is sometimes sought after, for example to obtain huge momentary profits following a small movement in the market. The opposite effect may also occur, however, leading to the explosion of a financial bubble or even to a major crisis if the effects are more durable). The opposite case is also possible (and again may be beneficial to the resilience of the systems considered). This is impossible in a linear situation, where a small disturbance produces small effects and major disturbances produce major effects. This situation is reassuring, but it is often useful to be able to move beyond this limit, for example with the extreme maneuverability of certain aeronautical vehicles (aircraft, missiles) where a small correction can produce major changes in dynamics, a necessary factor for certain vital actions (avoidance of obstacles or approaching entities, recovery of the standard flight envelope, etc.). The interconnection of systems only reinforces this property, and it must be taken into account if we wish to master the complexity of modern systems. An example of this is the electrical blackout that left almost 60 million inhabitants of the eastern United States and Canada in darkness in August 2003 (and caused the suspension of activity in several power stations) due to a cascading security mechanism in elements protecting the network against surges. The triggering of one local security mechanism prompted the activation of a cascade of safety measures, leading to a gigantic (global) breakdown in the system.

Sensitivity to initial conditions puts an end to attempts to completely master a system, for example by trying to limit behaviors to precise desired results. Only potentially infinite knowledge would allow us to reach a prescriptive objective of this kind. This property is linked to another principle connecting the complexity of the regulator to the complexity of the system: the law of requisite variety6 introduced by W. Ross Ashby in the field of cybernetics. This law states that the regulation of a system is only effective if it is based on a controlling system with the same order of complexity as the system itself. It is important to bear this remark in mind, as it shows that if we wish to master a complex system formed, for example, by the interconnection of other systems, its regulation is not straightforward (and must itself be sufficiently complex). It requires us, in particular, to identify the different degrees of freedom with which to work to modify global behaviors: the interactions present, the elements involved, whether loops are stabilizing or destabilizing, their qualitative nature and their modes of action.

To summarize, in order to take complexity into account, we must abandon the hypothesis of linearity. This involves a review of current principles, but also allows new behaviors. This applies to both natural and artificial systems. As an illustration, we shall take an example discussed in [CAN 08], which we shall approach as an engineering problem for an industrial system.

Our example involves a factory manufacturing products, which we shall identify using the numbers 0 to 9 for simplicity. We are only interested in the type of product being produced, and products leave the factory one by one. Factory production can therefore be characterized naturally using a series of numbers, indicating the type of the first product manufactured, the second product, then the third etc. This series of numbers is arbitrarily long, depending only on the production interval considered and the speed of production. One question we might then ask would be “is there any particular periodicity in the type of products created?” The question is important from a logistical point of view, as different raw materials may be needed for different products, and we need to know how best to organize supply. The same question can be tackled from the viewpoint of distribution of manufactured products, as each type may correspond to a different distribution network or require different transport arrangements (in terms of volume, refrigeration, special protective measures, etc.). In this case, we must define rotations of means of transport so as to lose as little time as possible and make the best use of transport capacities. Elements of response to all of these questions can be found by analyzing the properties of the number sequence mentioned above.

Let us consider the number sequence d0, d1, … as a decimal expansion 0.d0d1… This means, for instance, that the sequence 2,3,2,3 is understood as the number 0.2323. Each sequence then corresponds in a unique manner to a number between 0 and 1, and the enumeration of manufactured products can naturally be associated with the trajectory of a discrete dynamical system, with an evolution equation xk+1 = 10 xk – [10 xk], where the function [.] is the integer part (i.e. the equal or lower natural integer), and with the initial condition x0 = 0.d0d1… Clearly, then, dk = [10 xk]. A dynamical system of this kind is also known as a Bernoulli shift; if we observe its action on the sequence of figures d0, d1, d2, etc., we see that it "shifts" it to yield the sequence d1, d2, …, simply losing the first element. Control engineers will notice that the system may also be written xk+1 = 10 xk – uk and yk = [10 xk], where uk = [10 xk], and we recognize the state and observation equations for a system with state xk, output yk and command input uk. The command u is defined as state feedback (in fact, it is even output feedback since uk = yk). We therefore see that the system is linear, but that the command is non-linear in the current state and is, moreover, simple in that it is a sample-and-hold relay.

Our industrial problem is therefore formalized as a system in which the only difference with a linear system is the non-linearity of the command (and this is only a relay – in mechanics, this would be a simple cog in a continuous distribution chain). This is what leads to complexity in the system. We can easily demonstrate that the set of numbers in the unit interval [0,1] whose decimal expansion is periodic has measure zero: in other words, a number picked out at random has every chance of not being periodic. In these conditions, it is difficult to plan optimal rotations of logistical transport vehicles! We can also show that almost every number in the unit interval is non-computable [LUZ 95, LUZ 97]. This means that we are unable to find an algorithm to predict which product will be created, or even which products will be manufactured over a given production interval – once again, logistical calculations are extremely difficult! The dynamical system under consideration is, in fact, ergodic and mixing [LUZ 93], making it very difficult to predict (at least from one instant to the next; from a statistical point of view we are able to calculate a certain number of characteristics). It is a simple example of sensitivity to initial conditions, making it an immediate model of a chaotic system: two numbers whose expansions first differ at the Nth digit will produce identical trajectories for the first N iterations, then differ. As this N may be arbitrarily large, this means that two arbitrarily close initial conditions may generate divergent trajectories, and knowledge of the initial conditions, however precise, is not sufficient for precise knowledge of the system beyond a finite horizon.
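
As a hedged illustration of this sensitivity, the shift map can be iterated exactly (using rational arithmetic, so that the divergence observed comes from the dynamics rather than from rounding errors) for two hypothetical initial conditions that differ only at the eighth decimal digit:

```python
# A minimal sketch of the Bernoulli shift x_{k+1} = 10*x_k - [10*x_k],
# iterated with exact rational arithmetic so that the observed divergence
# comes from the dynamics, not from floating-point rounding.
from fractions import Fraction

def products(x0, steps):
    """Return the sequence of product types d_k = [10*x_k]."""
    x, out = x0, []
    for _ in range(steps):
        d = int(10 * x)        # observed output y_k = [10*x_k]
        x = 10 * x - d         # shift: drop the first decimal digit
        out.append(d)
    return out

# Two illustrative initial conditions differing only at the 8th decimal digit.
a = Fraction(123456789, 10**9)   # 0.123456789
b = Fraction(123456799, 10**9)   # 0.123456799

print(products(a, 12))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0]
print(products(b, 12))   # [1, 2, 3, 4, 5, 6, 7, 9, 9, 0, 0, 0]
# Identical for the first seven steps, then the trajectories separate.
```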

This consequence of non-linearity – the absence of mastery of system behavior beyond a fixed horizon – differs fundamentally from the type of properties likely to appear in a linear system. Returning to the domain of engineering artificial complex systems, this puts a definitive stop to attempts to master the global behavior of the system and is a potential obstacle to the possibility of, if not defining, then guaranteeing long-term reliability for a given system. We must therefore adapt our concepts, and accept that we will not be able to master everything at design level. We must implement mechanisms for monitoring and adaptation during use in order to satisfy certain properties.

This situation is, in fact, no different to that which occurs in natural complex systems, where capacities for adaptation and learning allow individuals or communities to survive and develop.

We shall now look at another example that shows the diversity of possible behavior types in a non-linear artificial system. Once again, let us consider a factory. This time, we shall look at the workload: as a first approximation, we may consider that the increase in workload operates in a linear manner (production is proportional to the material or human resources available). It is clear that from a certain point, however, this is no longer the case, as the means of production will become saturated due to a lack of space, machines, personnel or supply and distribution networks.

We therefore need to add a corrective aspect to the linear term in the model, and the simplest hypothesis for the modeler will be to take a second-order term.

This gives us an equation of the following type, where we have supposed that all multiplication coefficients are identical: x′(t) = r x(t) – r x²(t), where x(t) is the workload in relation to time and x′(t) is its temporal derivative. The same equation could be given for a networked system handling information, in which case x(t) would be an instantaneous measurement of the quantity of information. Once again we can suppose, for a very simple model, that the quantity of information in the network grows in a proportional manner (supposing that, for example, each node has access to a constant number of other nodes to which it can distribute information). As before, this model comes up against problems of saturation linked to the fact that network topology cannot be extended infinitely in a homothetic manner. This is then modeled using a second-order correction. Thus, this model can be found in fairly basic situations in artificial systems. This equation has been well known (in its discrete form: xk+1 = r xk(1 – xk), given that it is an iteration) for several decades in the world of mathematical complexity. It has been the object of widespread study from the 19th Century onwards, but results concerning the route towards chaos date from 1978, particularly with the work of Feigenbaum. Depending on the value of r, the number of values between which xk may oscillate is highly variable. As r increases, this number doubles progressively, tending towards infinity, then chaos is established, then it returns to a finite state, begins to double towards infinity again, etc. This shows the huge sensitivity of the model in relation to the coupling constant, which seemed so harmless. Finally, the ratio of the lengths of successive intervals of r between changes in dynamics (i.e. between doublings of the number of values between which xk oscillates) tends to a constant. The same phenomena occur with all unimodal curves of this kind, not just x(1 – x), and the limiting ratio mentioned above has the same value. This property, often known as the universality of the Feigenbaum constant, highlights the extreme complexity of chaos, which in fact exhibits strong regularities in its extreme diversity.
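
The qualitative changes described above are easy to observe numerically. The following sketch iterates the discrete logistic map for a few illustrative values of r, discards a transient, and prints the values between which xk ends up oscillating:

```python
# A minimal sketch of the logistic map x_{k+1} = r*x_k*(1 - x_k),
# showing the long-run behaviour for a few illustrative values of r.
def attractor(r, x0=0.2, transient=1000, sample=8):
    x = x0
    for _ in range(transient):        # discard the transient
        x = r * x * (1 - x)
    values = []
    for _ in range(sample):           # then record a few iterates
        x = r * x * (1 - x)
        values.append(round(x, 4))
    return values

for r in (2.8, 3.2, 3.5, 3.9):
    print(r, attractor(r))
# r = 2.8 -> a single fixed point (~0.6429)
# r = 3.2 -> oscillation between two values (period 2)
# r = 3.5 -> oscillation between four values (period 4)
# r = 3.9 -> no visible periodicity (chaotic regime)
```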

Thus, we see clearly that complexity is born of non-linearity, and there is no need to seek sophisticated models and systems to see it in operation. However, the previous discussion also highlights the potential synergy between work carried out on natural complex systems and the artificial complex systems under study here that are useful to us in our engineering activities.

The models displayed above are archetypes of that which is studied by the mathematical or physical complexity community, and we have seen that they also work as case studies for artificial systems engineering. This observation leads us to think that this may also be the case for various other subjects of study, and it would therefore be useful to build bridges between mutually ignorant communities7. In an attempt to demonstrate the utility of this idea, in the next few paragraphs we shall create connections between models of systems engineering problems and models of natural complex phenomena, widely studied by those interested in the science of complexity since the work of Poincaré at the turn of the 20th Century. We hope that readers will gain encouragement in moving in this direction, going further than the first observations made in [SHE 06] and [LUZ 08c].

In most books on complexity, we find the following list of mathematical models, which is described briefly in [CAN 08]: equational models with reaction-diffusion and convection equations, computational models with various automata, various networks and graphs, and models based on fractals and power laws.

We might see here the traditional opposition between top-down and bottom-up approaches. Equational models begin with general principles. There is the conservation of certain quantities such as energy or the quantity of information. There is also symmetry, i.e. invariance under the action of groups of transformations such as translations or rotations, or more complex applications as characterized in gauge theory, or the extremization of certain quantities, for example an action defined as a function of energy. These principles are used to deduce synthetic formulae, the resolution of which yields the required trajectories of the variables under study. However, these equations often require us to use complex theoretical tools to solve them, and it is highly possible that an explicit and global solution (not only in the vicinity of a particular point or within a fixed time-span) will be difficult, if not impossible, to obtain.

Computational models, on the other hand, do not have this synthetic character, and start from mechanisms for local construction of a future state based on knowledge of the current state and possibly its immediate vicinity. The whole of the trajectory is then accessible through the successive step-by-step application of the computational principle. As different as these two approaches are – the first being based on a mathematical corpus of almost two centuries of resolution techniques and the second having experienced its moment of glory and rapidly drawn media interest with the appearance of computers and intensive calculation – it is interesting to note that they are not necessarily independent. Thus, we can show that certain discrete-event systems or cellular automata, mentioned above in relation to calculatory approaches, in fact produce solutions to partial differential equations. These equations may be written by applying a principle of optimization of energy. Such solutions can also be based on local exchanges of information between spatially neighboring states, allowing constructive calculation of the next future state. It is often easy to write a master equation – giving probability densities for these exchanges – that we may then attempt to solve and which leads us, under certain hypotheses, to the examples given below.

Each of these approaches has its advantages and drawbacks. Equational methods benefit from their formal character and accompanying theoretical baggage, allowing treatment of stability, sensitivity and robustness in particular. At the level of the system being modeled, this provides benefits in terms of reliability and dependability. The major drawback, besides possible mathematical difficulties, is that we must not forget that we are dealing with a model, and so the property obtained in the solution to equations, sometimes after considerable effort, is only a property of the model and not necessarily of the system itself. Calculatory methods (generally) present the advantage of simplicity and their illustrative capacity is attractive, but they sometimes require considerable processing capacity and suffer from intrinsic limitations of algorithmic complexity, and even computability, in the phenomena they attempt to model. Moreover, as with equational models, we should remember that the results obtained are only valid for a partial representation of the system.

The choice of an approach to take in a given context is therefore based more on the availability of tools for modeling and resolution than on the general evaluation of given approaches.

Equational approaches can be split into two classes depending on the systems being considered: conservative systems on the one hand and dissipative systems on the other. The fundamental difference between the two is that the first can be considered to be isolated in terms of the total energy of the system, whereas the second cannot. Typical examples include the propagation of waves without damping (resonance of an ideal violin string) for the first group; and diffusion (propagation of heat in a metal plate heated at one particular point) for the second group. From a strictly mathematical point of view, we pass from conservative system equations to dissipative system equations by the simple addition of non-linear terms to account for non-linearities potentially created by system couplings. Among this continuum of systems, we find, for example, reaction–diffusion and convection mechanisms, which are particular examples of so-called transport equations.

For information purposes and to show the wide variation in behaviors produced by different non-linearities, let us look at a family of equations of this type: ∂u/∂t = D ∂²u/∂x² + R(u). Depending on the form of the non-linear term R(u), we observe different phenomena, modeling a variety of situations. If R(u) = 0, we obtain pure diffusion (the "heat equation"), as in the classic example of the metal plate that heats up globally if it is heated locally. If R(u) = u(1 – u), we obtain a simple modeling of population development with migratory phenomena. If R(u) = u(1 – u²), we obtain Rayleigh-Bénard convection. This convection phenomenon is that observed in a pan of boiling water heated under certain conditions: "rolls" of water circulation appear beneath the surface. The same model is applied to magma circulation when modeling plate tectonics; it also explains certain atmospheric circulations. If R(u) = u(1 – u)(u – a) with 0 < a < 1, we obtain Zeldovich's combustion model.
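
To give an idea of how such models are used in practice, the following sketch discretizes the population case, ∂u/∂t = D ∂²u/∂x² + u(1 – u), with an explicit finite-difference scheme. The domain size, diffusion coefficient and initial condition are arbitrary illustrative choices, and the time step is kept below the usual stability limit of the scheme:

```python
# A minimal sketch: explicit finite differences for the reaction-diffusion
# equation du/dt = D d2u/dx2 + u(1 - u) (the population case above).
# Grid sizes, D and the initial condition are illustrative choices only.
D, L, nx = 0.1, 10.0, 101
dx = L / (nx - 1)
dt = 0.4 * dx * dx / D          # below the usual stability limit dx^2/(2D)

u = [0.0] * nx
u[0] = 1.0                      # population initially present at the left edge

for _ in range(200):
    new = u[:]
    for i in range(1, nx - 1):
        diffusion = D * (u[i - 1] - 2 * u[i] + u[i + 1]) / (dx * dx)
        reaction = u[i] * (1 - u[i])
        new[i] = u[i] + dt * (diffusion + reaction)
    new[0], new[-1] = 1.0, new[-2]   # fixed left boundary, free right boundary
    u = new

# The profile is a front travelling to the right: u is close to 1 behind
# the front and close to 0 ahead of it.
print([round(v, 2) for v in u[::10]])
```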

These different equations show three different mechanisms: diffusion, reaction and convection. The diffusion mechanism has a tendency to homogenize concentrations of the elements concerned. It is characterized by an irreversible transport phenomenon due to the migration of elements (hence its use in modeling situations of this kind). Moreover, boundary conditions, i.e. the shape of the system boundary, are propagated "instantly", in that, at a given moment, each point is influenced by all other points. The reaction mechanism, on the other hand, shows a local influence, a function of the current state. The convection mechanism demonstrates an influence depending on the velocity field, proportional to the local velocity, producing large-scale circulation phenomena.

It is the combination of these mechanisms that creates complex behaviors, implementing spatio-temporal correlations on a large scale. An example of this is the modeling of the appearance of Turing patterns, such as stripes or spots (zebra, leopard, etc.). We also find this in Belousov-Zhabotinsky chemical reactions with oscillation dynamics, in nature with the appearance of rings of vegetation (truffle lovers will understand!) or the propagation of epidemics (it is the measurement of diffusion speed that allows us to identify the high point of an epidemic, often used when discussing seasonal flu or gastroenteritis outbreaks).

Models of this kind can be used (in competition with the computational models discussed above) for modeling crowd behaviors, making them useful in designing infrastructures enabling rapid evacuation in a crisis-management situation (fire, accidents or terrorist attacks – see [ZOL 10]). They are also useful for modeling the propagation of information or beliefs within a population: we might wish to disturb this propagation, for example for counter-insurgency purposes, by introducing supplementary terms that have the effect of creating other solutions without diffusion. A certain number of systems engineering problems can also be modeled in this way, including logistical problems or problems of system interconnection with exchanges of information flows: in this case, we use flow models that consist of producing systems of ordinary differential equations coupled with a hypothesis of flow conservation [DAP 10].

For example, if we consider a queue (the basic mechanism of any resource-management interface and so at the root of complexity in artificial systems), the rate of change of the queue at the entry of any system indexed by i (machine, distribution network, reservation system, etc.) is given by the difference between input (λi) and output (μi) flows: dqi/dt = λi – μi, where μi = λi+1. We can demonstrate that a queue system of this kind is stable, with queues of finite length, if and only if the associated fluid model has a stable stationary solution. Without going into detail, this fluid model is obtained by considering the density of elements entering the system using the famous flow conservation equation8. We introduce a new variable x (in partial differential equations, this is often the "spatial" dimension, as opposed to the "temporal" dimension, t) that describes the degree of completion of the production or treatment process inherent to the system being considered, and the state variable which solves the equations will be ρ(x,t), which gives the density of product at stage x and time t. The temporal development of this variable must satisfy the principle of flow conservation, giving local equations of the type ∂ρ/∂t + ∂(vρ)/∂x = 0, with vρ|x=0 = λ(t) and ρ(x,0) = f(x), where the boundary condition λ(t) represents the external flow and f(x) an arbitrary initial condition.
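
In its simplest single-queue form, the stability condition mentioned above can be illustrated by a short simulation: the queue remains bounded when the arrival rate is below the service rate and grows without bound otherwise. The sketch below uses arbitrary illustrative rates:

```python
# A minimal sketch of the single-queue case: the queue stays bounded when
# the arrival rate lambda is below the service rate mu, and grows without
# bound when it is above. Rates and horizon are illustrative values only.
import random

def simulate(lam, mu, horizon=100000, seed=1):
    random.seed(seed)
    q = 0
    for _ in range(horizon):
        if random.random() < lam:                # an arrival in this time slot
            q += 1
        if q > 0 and random.random() < mu:       # a service completion
            q -= 1
    return q

print("lambda < mu :", simulate(lam=0.3, mu=0.5))   # stays small
print("lambda > mu :", simulate(lam=0.7, mu=0.5))   # of the order of (0.7 - 0.5)*horizon
```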

Equations of this kind have been the subject of major publications for more than a century and are used today in models for traffic flow and mechanisms in financial markets, among other things. Their solutions can develop shock waves (corresponding to traffic jams in the case of road traffic circulation). The model sketched above is purely convective. To model a more realistic situation, we might add diffusion terms to give an equation of the type ∂ρ/∂t + ∂(vρ)/∂x = D(ρ,t) ∂²ρ/∂x², to account for higher-order effects on the state variable and include the fact that, during production, products (or information traveling through the system) do not strictly follow the planned temporal sequence (delays occur for a variety of reasons, for example the failure of a machine or a server or the breaking of a connection). Here, we see a direct parallel with the equational models described above in the case of complex natural systems. Readers are invited to consult the published literature on the subject to see how these partial differential equations contribute to the modeling of logistical and other systems.

We shall end this summary introduction by indicating that another interesting aspect of these models is the ability to integrate constraints on the state variable, modeling, for example, finite processing capacities, and which – from a mathematical point of view – bring constrained optimization mechanisms into the solution of the partial differential equations.

Having looked at equational models, let us now review calculatory models. The first, and simplest, computational models that spring to mind are finite state automata: these are made up of a finite number of objects, known as states, and of transitions between these states, each observed by reading the tag associated with it. Certain specific categories exist within these states: initial states and final states. The path of an automaton begins with an initial state and moves toward a final state, and the list of tags obtained when passing through state-to-state transitions determines the word recognized by this path. These models are naturally used to describe, as a first approximation, systems with discrete events, i.e. systems where the passage from one state to another occurs via the arrival or occurrence of a particular event. Examples of this include logistical systems [SIM 05], where events correspond to the availability of certain flows, and reservation systems, where the event is a transaction between particular parts of the system. It is interesting to enrich the initial model of the finite automaton. We might, for example, add a memory and a means of accessing this memory at the level of each transition. Here we obtain increasingly expressive models, in terms of the algorithmic complexity of the languages formed by the recognized words.
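
A finite state automaton of this kind reduces to a transition table and a set of final states. The following sketch is a hypothetical example (an automaton recognizing words over the alphabet {a, b} that end in "ab"), intended only to show the mechanism of reading tags transition by transition:

```python
# A minimal sketch of a finite state automaton: a set of states, a transition
# table read tag by tag, an initial state and final states. The example
# (words over {a, b} ending in "ab") is purely illustrative.
transitions = {
    ("s0", "a"): "s1", ("s0", "b"): "s0",
    ("s1", "a"): "s1", ("s1", "b"): "s2",
    ("s2", "a"): "s1", ("s2", "b"): "s0",
}
initial, finals = "s0", {"s2"}

def recognizes(word):
    state = initial
    for tag in word:
        state = transitions[(state, tag)]   # follow the tagged transition
    return state in finals

print(recognizes("abab"))   # True  (ends in "ab")
print(recognizes("abba"))   # False
```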

Models using finite state automata and their generalizations are widespread due to their ease of use. Other extensions of finite state automata, such as Petri nets, are also popular in simulation. In Petri nets, transitions occur when particular events arise, adding a temporal notion to the interpretation of the pathway. In addition, there is a token mechanism, which means that certain transitions are only authorized if we have the necessary token to "validate" the transition; in this case, the token in question is also transmitted to the next state. This mechanism gives Petri nets greater expressive power than finite state automata, and makes them particularly useful for modeling usage scenarios for complex systems.
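
The token mechanism can be sketched in a few lines. The net below (a part and a shared tool passing through a machining step) is purely illustrative; the point is that a transition fires only when all of its input places hold tokens, and that firing moves tokens to the output places:

```python
# A minimal sketch of the Petri-net token mechanism: a transition is enabled
# only if every input place holds a token; firing consumes those tokens and
# produces tokens in the output places. The example net is purely illustrative.
marking = {"part_waiting": 1, "tool_free": 1, "machining": 0, "part_done": 0}

transitions = {
    "start": {"in": ["part_waiting", "tool_free"], "out": ["machining"]},
    "finish": {"in": ["machining"], "out": ["part_done", "tool_free"]},
}

def fire(name):
    t = transitions[name]
    if all(marking[p] > 0 for p in t["in"]):   # transition enabled?
        for p in t["in"]:
            marking[p] -= 1                    # consume input tokens
        for p in t["out"]:
            marking[p] += 1                    # produce output tokens
        return True
    return False

print(fire("finish"), marking)   # False: "machining" holds no token yet
print(fire("start"), marking)    # True: the part enters machining
print(fire("finish"), marking)   # True: the part is done, the tool is freed
```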

Automata form the basis of computational models. Following the same basic idea of state transitions, but allowing multi-dimensional states or non-deterministic transitions (several possible successor states, or spontaneous transitions), it is possible to model a wide variety of situations, manipulating, among other things, structured data. This is the case in cellular automata, which are widely used as a model to simulate urban or agricultural development (applied to urban planning) or the propagation of pollution, fire or epidemics (with the aim of offering realistic projections of current crisis scenarios for crisis-management systems).

The starting idea is very simple: a cell encodes a state that may take a finite number of values (often represented graphically by a color). From one moment to the next, each cell modifies its state depending on the values of neighboring states, considering only a finite neighborhood. This modification is assumed to be expressible in the form of a simple rule, of the type "if neighbor A has value x and neighbor B has value y, etc., then the new value of the current state is z".

To see how we move from this example to a model of epidemiology, for example, we may simply interpret state 1 as the fact that a given person is healthy and state 0 as the fact that the individual is infected. The rule then indicates how the infection is propagated locally based on the state of a person and their immediate entourage. Evidently, to produce a useful model, the algorithm needs to be more complicated. It needs to consider, for example, certain constraints on spatial neighborhoods (an epidemic does not spread in the same way in all geographic conditions) and temporal constraints (a previously infected individual may become immune, or a person who is not infected in a certain environment may possess natural immunity, etc.).
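
A minimal version of this epidemic rule, with all the simplifications noted above (a single infection probability, a four-cell neighborhood, no immunity), can be written as follows; the grid size, probability and number of steps are arbitrary illustrative values:

```python
# A minimal sketch of the epidemic rule described above: state 1 = healthy,
# state 0 = infected; a healthy cell becomes infected with probability p when
# at least one of its four neighbours is infected. Parameters are illustrative.
import random

random.seed(0)
n, p, steps = 20, 0.3, 15
grid = [[1] * n for _ in range(n)]
grid[n // 2][n // 2] = 0                        # a single infected individual

def infected_neighbour(g, i, j):
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= i + di < n and 0 <= j + dj < n and g[i + di][j + dj] == 0:
            return True
    return False

for _ in range(steps):
    new = [row[:] for row in grid]
    for i in range(n):
        for j in range(n):
            if grid[i][j] == 1 and infected_neighbour(grid, i, j):
                if random.random() < p:
                    new[i][j] = 0               # local propagation of the infection
    grid = new

print("infected:", sum(row.count(0) for row in grid), "of", n * n)
```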

The interest of these models lies in the ease with which we may model a complex situation where we must principally be able to specify how each individual agent evolves from one moment to the other based on local interactions with a certain number of neighbors. Step-by-step execution then demonstrates the descriptive capacities of the model and shows global behaviors where applicable.

Clearly – as human ingenuity is apparently limitless – models exist that combine the properties of the two main families mentioned above, producing hybrid models. To provide a rapid characterization of these models, remember that in equational models the state variable is continuous and evolves according to equations involving its derivatives with respect to different parameters, whereas in computational models the state variable and its variations are discrete. A hybrid model uses a state variable that is discontinuous at certain moments in time. To illustrate this, in a logistical system, discontinuities occur when the logistical chain jumps from one configuration to another, representing, for example, certain intermediary transport processes if the state variable corresponds to a stock level. We thus obtain automata that describe the succession of different regimes, where each regime is described by a continuous model.

These different models are useful as they also allow us to pass from one to the other depending on the desired use, and also depending on the available theoretical or practical tools for resolution or calculation. In general, the passage from a discrete computational model to a continuous equational model is carried out by making the number of elements considered tend towards infinity. This means that instead of exhaustively considering all elements that are exchanged, we consider the density function of such elements and its level of variation, or we can link each element to probability distributions that account for certain traits linked to uncertainty. The passage from a continuous model to a discrete model works by highlighting certain regime changes and leaving aside system development within a regime.

Let us now consider another family of models: networks, also known as graphs in mathematics. Their definition is simple and reuses the definition of automata given above: a finite set of nodes (the vertices of the graph) and a finite set of connections between nodes (the edges of the graph). The former correspond to the entities under consideration, and the latter to the existence (or otherwise) of interactions between two entities. Graphs may also be directed, in that edges may be defined as linking two vertices with a notion of a point of departure and a point of arrival. This notion is useful in showing positive or negative dependencies between the vertices in question, where the influence differs according to its sign: in the first case, the quantities associated with each vertex vary in the same direction, whereas in the second case, one decreases as the other increases. We find this type of model in both artificial and natural systems, including food chains, river networks (in this particular case, loops are rare, unless we consider more or less stagnant oxbow lakes), road and rail networks, telephone networks, electric networks, internet networks, social networks, etc. The existence of interactions, and more generally of closed-loop chains of interactions, renders these structures particularly interesting. Thus in cancer research, for example, graphs are used where the vertices are genes or proteins and the edges are regulation interactions between these vertices. The general behavior is thus a combination of all interactions, whether direct or closed-loop.

From the viewpoint of graph theory, the dynamics of graph assembly – preferential attachment to strongly connected nodes, for example – and disassembly – targeted deletion of certain edges – are interesting for our purposes. So is the influence of certain operations applied to graphs on different measurements of graph performance. We can identify three main families of graphs, which present different properties and which we may wish to obtain: random graphs, scale-free graphs and small-world networks. In random graphs, the distribution of degrees of connection (the degree of connection of a node is the number of nodes to which it is connected) is concentrated around its mean (binomial, approximately Poisson for large graphs). Graphs of this kind are therefore robust when faced with the random deletion of edges; however, they are not particularly efficient in terms of information transmission between two nodes. In scale-free graphs, the distribution of degrees of connection follows a power law (x ↦ x^(–λ)). Certain nodes will therefore have a large number of edges, while the majority will have few edges. This is the principle found in air traffic networks, with hubs on the one hand and regional airports on the other; in the latter case, several changes are needed to travel from one regional airport to another, while direct flights exist between major airports. Social and biological networks are often of this type. These graphs are robust when faced with random deletion (the cancellation of a flight from a regional airport does not disturb traffic in a hub). They are, however, vulnerable to targeted attacks on strongly-connected nodes (all traffic travelling via London Heathrow, for example, was disturbed by terrorist threats to the airport, immobilizing a large number of passengers, whether they were traveling directly to a destination or at the airport for a connecting flight). At system design level, this means that particular measures should be put in place to protect nodes of this type: this is one immediate conclusion obtained through topological analysis of the network. Finally, in small-world networks any pair of nodes may be connected by a relatively short path: these networks fall, in some ways, between a random graph and a regular graph. Food chains and certain social networks are of this type.
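
This contrast between random failures and targeted attacks can be checked numerically. The following sketch, which relies on the networkx library and uses arbitrary illustrative sizes, compares the largest connected component of a scale-free graph after removing its most connected nodes with the result of removing the same number of nodes at random:

```python
# A minimal sketch (using the networkx library) of the vulnerability of a
# scale-free graph to targeted attacks: removing the most connected nodes
# shrinks the largest connected component far more than removing the same
# number of nodes at random. Sizes and parameters are illustrative only.
import random
import networkx as nx

def largest_component(g):
    return max(len(c) for c in nx.connected_components(g))

def without(g, nodes):
    h = g.copy()
    h.remove_nodes_from(nodes)
    return h

random.seed(1)
g = nx.barabasi_albert_graph(1000, 2, seed=1)   # a scale-free graph
k = 50                                          # remove 5% of the nodes

hubs = sorted(g.nodes, key=lambda v: g.degree(v), reverse=True)[:k]
randoms = random.sample(list(g.nodes), k)

print("targeted attack on hubs:", largest_component(without(g, hubs)))
print("random failures        :", largest_component(without(g, randoms)))
# The targeted attack typically leaves a much smaller largest component.
```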

Among the various examples of complex systems that spring to mind, let us take the example of systems of systems, within which the socio-technical component (i.e. the human and, in particular, the organizational dimension) is a key factor in success or failure. Certain views of this type of system may be modeled using networks, and the analysis of these networks is then an effective means of identifying levers for action in terms of maximizing the value chain within complex organizations. We may identify ways to improve the circulation of information between different strata of the hierarchy, optimize competences within the network, identify critical individuals or functions and obtain scale effects in the product or service offer. A particularly important factor is that the chains of causality between final users, direct suppliers and third parties have a major effect on the performance of an organization, its flexibility and its adaptability in the case, for example, of the disappearance of a third party or of one of the suppliers involved.

By studying different networks from the point of view of their topology (degree of connectivity, study of transformation of the topology when edges are removed, etc.), we may deduce the strong and weak points of organizations, and propose means of improvement in order to optimize certain properties. Thus, in networks modeling epidemiological situations, we may define unbiased immunization policies (i.e. without choosing particular individuals, or certain network nodes) on the condition that these networks have a certain degree of connectivity. Other policies, with bias, could include the choice of random sub-populations and the immunization of certain neighbors of the elements of these sub-populations (favoring nodes with strong connectivity, as they have more chance of being chosen as potential neighbors). These principles may also be applied to networks modeling the propagation of beliefs, where immunization is replaced by ideological conversion. The reasons this field of study has become important in the context of anti-terrorist efforts in recent years are therefore evident.

Note, however, that the topological study of graphs does not provide quantitative information on the systems being modeled, unlike other models presented above. It does, however, provide a structural complement that, moreover, gives us information concerning certain measurements of theoretical complexity.

The models presented in the following paragraphs are different from those described above as they lack the same predictive power, but they do offer original approaches to the explanation, comparison and classification of systems, a capacity that is useful in outlining, if not mastering, the complexity of these systems. They are based on the notion of scale.

Scale is a property of both space and time. It either refers to spatiotemporal resolution (a field of ripe wheat is a yellow polygon when seen from an airplane, or a more or less ordered group of stems when seen from the edge of a field) or to the extent covered by the analysis (we might look at a space of a few meters in front of our feet, or decide to look at several areas and consider all points of view). Clearly, we will not be interested in the same aspects of distribution of centers of interest depending on the scale used, nor in the same level of interactions between elements. This leads to the creation of a hierarchy of elements and interactions depending on scale and a quest to find aspects that are invariable on any scale. It also leads to the search for interactions between different scale levels, with either negative (homeostatic loops leading to local equilibrium, often sought after in system regulation) or positive effects (e.g. the self-amplification process, for instance the greenhouse effect in climatology).

Isometric and allometric analyses can be used to identify certain scale laws based on similar relationships between parameters whatever the level considered or, on the contrary, on symmetry breaking (i.e. discontinuities9) as these may demonstrate the existence of a particular physical phenomenon that should be exploited.

One example of a scale law of this kind was provided by the Swiss chemist Max Kleiber in the 1930s. Kleiber studied the relationship between the mass of an animal and its metabolism, and proposed a universal formula: the quantity of energy burnt per unit of time (the metabolic rate) is proportional to the mass of the animal raised to the power of three-fourths. George Kingsley Zipf, a linguist at Harvard University, formulated another power law, which carries his name, in 1932. For example, the proportion of towns with a certain number of inhabitants can be expressed approximately as the inverse of the square of the number of inhabitants (explaining the fact that there are few very big cities and a large number of small towns; this distribution appears as a straight line on a logarithmic scale). This also applies to the presence of words of a certain length in the lexicon of a language, or to deaths in the course of a conflict. Moreover, [BAL 04] and [JOH 07] cite works inspired by that of Richardson, who plotted the logarithm of the number N of casualties in conflicts between 1820 and 1945 against the logarithm of the number of conflicts generating N casualties: the points line up in a straight line, showing the presence of a power law. The coefficient was estimated at 1.8 by Mark Newman, of the University of Michigan. Carrying out the same exercise for terrorist attacks, in G7 countries on one hand and other countries on the other, Aaron Clauset and Maxwell Young of the University of New Mexico identified two power laws with coefficients of 1.7 and 2.5 respectively. The first coefficient is similar to that given for "regular" conflicts (remember that these conflicts occurred in the zone that now corresponds to the G20 member states). Going further, Mike Spagat and Jorge Restrepo analyzed losses for different battles during these wars, and identified power laws with coefficients tending towards 2.5; this would show a certain "universality" of this coefficient, and thus of the model of deaths in conflicts, whatever the nature of these conflicts.
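
The procedure behind these estimates – plotting frequencies against magnitudes on logarithmic scales and fitting a straight line – can be sketched on synthetic data (not on the historical casualty figures themselves), for example as follows:

```python
# A minimal sketch of the log-log procedure described above, on synthetic
# data: samples drawn from a power law, binned logarithmically, and fitted
# with a straight line on logarithmic scales.
import numpy as np

rng = np.random.default_rng(0)
alpha, xmin, n = 2.5, 1.0, 100000
# Inverse-transform sampling of a power law p(x) ~ x^(-alpha), x >= xmin.
samples = xmin * (1 - rng.random(n)) ** (-1 / (alpha - 1))

# Histogram on logarithmically spaced bins, then a linear fit in log-log.
bins = np.logspace(0, 3, 30)
counts, edges = np.histogram(samples, bins=bins, density=True)
centers = np.sqrt(edges[:-1] * edges[1:])
mask = counts > 0
slope, intercept = np.polyfit(np.log10(centers[mask]), np.log10(counts[mask]), 1)

print("estimated exponent:", -slope)   # close to alpha = 2.5
```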

The scale laws cited above characterize the appearance of a phenomenon depending on the scale at which it is observed or appears; other situations exist where the phenomenon is reproduced identically on different scales or, in a way, recursively. This is known as a fractal phenomenon, a term introduced by Benoît Mandelbrot in the mid-1970s [MAN 74]. Taking examples such as the fern or the cauliflower, which present an analogous structure when we zoom in on a detail, we find these phenomena in both artificial and natural systems. Thus, the Internet may be modeled as a fractal when we look at the connection topology. The same applies to certain models of development of urban settlements and to the evolution of front lines in combat [MOF 02].
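
Self-similar structures of this kind are easy to generate with an iterated function system. The sketch below uses the standard published coefficients of the Barnsley fern; each step applies one of four affine maps chosen at random, and the resulting cloud of points reproduces the same structure at every level of zoom:

```python
# A minimal sketch of self-similarity generated by an iterated function
# system, using the standard published coefficients of the Barnsley fern.
import random

maps = [
    # (a, b, c, d, e, f, probability): (x, y) -> (a*x + b*y + e, c*x + d*y + f)
    (0.00,  0.00,  0.00, 0.16, 0.0, 0.00, 0.01),
    (0.85,  0.04, -0.04, 0.85, 0.0, 1.60, 0.85),
    (0.20, -0.26,  0.23, 0.22, 0.0, 1.60, 0.07),
    (-0.15, 0.28,  0.26, 0.24, 0.0, 0.44, 0.07),
]

random.seed(0)
x, y, points = 0.0, 0.0, []
for _ in range(50000):
    a, b, c, d, e, f, _p = random.choices(maps, weights=[m[6] for m in maps])[0]
    x, y = a * x + b * y + e, c * x + d * y + f
    points.append((x, y))

# The points can now be plotted (e.g. with matplotlib) to reveal the fern,
# each frond of which is a reduced copy of the whole.
print(len(points), points[-1])
```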

Understanding the links between different scale levels and thus obtaining a representation of the internal dynamics of a system is one of the keys to mastering complexity; however, we must still dissociate temporal and spatial loops and fast and slow dynamics. If, for example, we consider the economies of “blocs”, such as Europe or the United States, we are confronted with systems (the “local” economies of states) that are integrated in different ways within a system of systems: a union on the one hand, and a federation on the other. Without going into practical detail, the differences in modes of management and interconnections mean that the global systems in the two cases are very different, both in terms of architecture and the range of behaviors they allow.

What, then, should we retain from this examination of complexity and the formal models developed in order to understand or even master it?

First, we should take direct inspiration from all work carried out on complexity theory within the field of application of natural systems, and therefore revise a certain number of presuppositions and habits inherited from traditional systems engineering, intimately linked to the hypothesis of linearity, such as:

– the belief that, in a cause-and-effect relationship, mastery of the cause will lead to mastery of the effect;

– the belief in a mechanism based on the decomposition–integration sequence involving no loss of information;

– the belief in the reduction of complexity in the hope of isolating a system from its environment, neglecting dynamic interaction loops between scale levels and ignoring local links that may have global consequences;

– the belief in mastered planning of tasks given to geographically distributed teams with simple reviews and milestones to track progress; and

– the belief in mastery of system reliability throughout interconnections by the simple addition of individual properties of reliability and the reliability properties of interfaces.

It is widely accepted today that emergence, dynamic feedback and couplings between scale levels play an important part in natural systems10 and that these factors must be taken into consideration. In practice, however, we find shameless and almost systematic application of centralized management, collaborative teamwork, interconnection of systems and their logistical chains, evolutions of specifications based on changes in rules and the reinforcement of the place of human users in the system, with no revision whatsoever of practices and processes that have been around for several decades!

Second, we have learnt to be cautious of hasty conclusions, whether obtained intuitively or deduced by analogy. The diversity of dynamic behaviors observed in non-linear systems should lead us to be careful when faced with a large complex system with multiple interactions at all levels, particularly where humans are involved; human behavior clearly cannot be reduced to a totally predictable linear system! Insofar as non-linearities most often imply irreversibility (this is the case in systems involving the dissipation of energy or information), we need to establish preventative mechanisms for identification, evaluation and reaction. This is in order, if not to totally control the behavior of systems, at least to master these behaviors and keep them within the envelope of acceptable behaviors in relation to the objective to be satisfied by the system.

1.7. The maintenance and logistics of systems of systems

For any given system, the majority of costs are associated with the usage phase (we find ratios of the order of 30:70 or even 20:80 between design costs and usage costs, and this is often without taking all indirect usage costs into account): this also applies to large complex systems. Moreover, this stage of the lifecycle carries even greater weight (or risk) in this case, if only because the user (as much as the designer) will clearly be confronted with a certain number of systems that have been re-used in system-of-systems assemblies or with which interconnections will be established.

We must therefore consider two major issues: maintenance and logistics. Maintenance includes corrective, preventive and evolutionary approaches, distinguishing, in the latter case, between minor and major developments. This vocabulary, now used in the context of systems, has been inherited for the most part from the domain of information systems. In order to put a stop to the debate and avoid entering into semantic quarrels, note that for our purposes the border between minor and major evolutions is a question of evaluation, the basic criterion being whether or not a critical sub-system or component needs to be redesigned. Logistics includes all problems linked to the transportation of parts or components for maintenance and repair and to the flows (of matter, energy and information) necessary for the correct operation of the system.

If we look at the logistics chain for a large complex system in its most limited sense, we naturally encounter questions concerning resource management, whether in terms of:

– materials: storage conditions, storage locations, transport, and management of inventories, including knowledge of whether any system components require particular transport conditions;

– people: necessary suppliers, identification of critical suppliers (whose disappearance would have an impact on the continued operation of the global system), manufacturers involved in the supply chain and technological cycles involved in their chain of production. (The latter includes problems of obsolescence, where technologies may have life-spans that are several orders of magnitude lower than that of the meta-system of which they are component parts. The issue is applicable in both electronics and material sciences, particularly with the increased importance of sustainable development directives and regulations that affect certain manufacturing processes and certain materials, due to their potential risks or products involved in their manufacture);

– information: long-term management of information linked to the system, establishment of knowledge management structures, choice of information systems to use in order to guarantee the durability and possibility of data reuse.

In addition to resource management considerations, if we concentrate simply on the material question of storing the parts necessary for specific components and subsystems of the global system to operate, we might think about optimizing stock management. Clearly, the local optima associated with each system do not correspond to the optimal response when considering the complex system as a whole. If we take an initial problem concerning the choice of locations for production and consumption, as a simple example, we are presented with a Cornelian dilemma when dealing with several systems at once. This raises an issue of flow analysis, if only to plan and establish trajectories and the associated means of transport, depending on predicted and planned demand, but also on the complexity of certain couplings between systems. In addition, we encounter difficulties with individual inventory planning, definition of levels of stock replenishment or, in short, propagation throughout the global logistics chain of various elements that are generally involved in an isolated logistics chain.

The issue is complicated still further by the fact that from a contractual perspective, various providers and contractual frameworks may be used, with different rights and responsibilities and even different rules. We should thus aim to establish collaboration between the different parties involved in order to obtain joint optimization of multiple contracts, seeking dependencies between supply chains in order to optimize production, storage and transport as far as possible and to avoid intermediary storage and inventories that consume resources (i.e. create cost) and do not provide immediate value to the final user.

For systems of systems, the combination of the logistics and maintenance aspects associated with each system creates a need to look for shared elements (processes and resources) in order to analyze dependencies and constraints. This bottom-up approach appears in contrast to a top-down approach, where we would aim to master usage in terms of complexity and cost. This starts from the level of service expected by the final user (the consumer), deducing constraints in terms of system maintenance and the management of evolutions at component level (systems, subsystems or critical components) to attain the expected level of performance, possibly leading to design constraints at system architecture level. Readers may notice that this is an extension to the approach used for integrated logistical support, applied at the highest level of the system structure. This is attractive on paper, but should be examined in relation to contractual engineering associated with component reuse, accounting for possible questions of industrial and intellectual property, and certain commercial strategies. Remember that complexity comes in particular from the combination of blocks, and that different dimensions (technical, economic, organizational, political, etc.) must be taken into consideration.

Finally, these aspects of logistics and maintenance once more demonstrate the critical nature of knowledge (or mastery) of interdependences and the fact that the challenge is contractual as much as technical. Indeed, evolutionary, corrective – and particularly preventative – maintenance would be very difficult in a case where we are unable to explicitly identify the component likely to be responsible for a fault.

1.8. Perspectives and lines of enquiry

The following sections build on and adapt the results of a collective project to suit the issues covered in this work. The project in question was carried out within the Association Française de l’Ingénierie Système11 (French Association for Systems Engineering) in 2010 [DEV 10] and presented in [LUZ 10], among other works, with the aim of defining a medium-term vision of systems engineering.

1.8.1. Contextual elements

Let us recall those contextual elements that will direct lines of enquiry for research in engineering large-scale complex systems.

Let us return to the basic elements involved in moving from the idea of a system to an industrial solution, i.e. climbing the ladder of technological maturity (for example, the technology readiness level scale developed by NASA), from TRL 1, involving basic technological research, to TRL 9, which indicates system testing and implementation. Clearly, a multi-disciplinary methodological approach is required for design, development, creation and production, with major aims including the mastery of risk and the management of costs occurring throughout this transformation. Of course, the response to these issues is to be found in systems engineering. Readers may wish to consult one of a number of surveys on the effectiveness of systems engineering or software engineering for a qualitative demonstration of the value of these efforts. Nowadays, whatever the budget involved in the project, this effort is necessary as it correlates with improvements in performance (better mastery of schedules, reduced cost and time overruns, and reduced time to market). Moreover, the value chain of systems engineering, particularly in the presence of ever more complex systems, includes all technological advances and the different actors involved in task execution (including recruiters and those responsible for training experts in systems engineering, consultants and associated strategists, etc.). This encompasses clients and users, and the environment with its judicial, economic and other constraints. Faced with this host of actors, the domain of systems engineering has developed a frame of reference that may be considered to be mature, having existed for several decades. (Certain concepts – such as the distribution of tasks between project managers and contractors, contracting between parties, the concept of architecture, the principles of logistical support, etc. – have been around for centuries or even millennia, if we take risk analysis back to the first maritime insurance arrangements established under the rule of Tiberius.) Systems engineering has been put to the test in a wide range of projects on very different scales in varying contexts. That said, note that there is still plenty of room for improvement, initially in the systematic application of principles, methods and tools, then in accounting for several crucial characteristics, such as the human factor and the complexity of the object system or systems. (The adaptation of architectures to the presence of human actors and/or decision makers has implications at the system level in terms of reliability, dependability and repeatability. Reference frameworks have mostly concentrated on workstation ergonomics, focusing on essentially physical constraints.)

Other orientations have also been neglected by engineering reference frameworks, even though they are increasingly present in the field of practical applications. These include adaptation to the world of business, with projects involving very small companies that are unable to create the same organizational structures found in larger companies. As a large complex system has a high chance (or risk) of involving small component systems that may be operated by these small companies, the methodological shortcoming that excludes them from the field of interest could generate potentially dramatic blind spots in the final system. Another of these orientations involves particularities of the domain of application, which should be included in the engineering process of the corresponding systems. For example, constraints and traditions are clearly not the same in the fields of defense, civil mobile telephony or the clothing industry. We therefore see the importance of using trade processes that are suitable for the context of application, even if the framework used is identical at a higher level. That said, when we need to create collaborations between such different domains within a complex system, we must be aware of the different frameworks involved and their points of compatibility or incompatibility in order to have a chance of mastering the final complex object12. We may also explicitly raise the question of how to account for constraints linked to the organization of engineering teams in systems engineering. In this case, organizational aspects cover the direct management of personnel and other resources involved, not forgetting their potential cultural diversity, a result of the internationalization of businesses and geographically dispersed sites. These aspects also cover contractual complexity and the management of the rights and responsibilities of those involved when dealing with innovative partnerships. Here, industrial actors come together temporarily to work on engineering a complex system that they would not have been able to tackle independently and which cannot necessarily be broken down systematically into individual parts to be dealt with by individual actors.

In addition to these first questions that spring to mind, and despite considerable progress in the mastery of certain well-defined activities, no existing methods or tools integrate these aspects for the benefit of large systems. We also lack an integrated vision in terms of the management of models and their global link to systems engineering as a whole and over the complete lifespan of the system. We are led to wonder whether certain standardization initiatives, with descriptive languages at component or system level and protocols for the construction of data exchange structures, may have had a negative effect: by promoting the belief that the problem as a whole has been solved or that low-level tools can act as a substitute for method, or by creating cliques of experts or users greedy for new and fashionable technologies, to the detriment of a general approach applicable to the wider community.

This question of tools, and also of optimal strategy in terms of tool vendors and distributors (in a context of risk sharing in complex projects, while preserving individual knowledge and competences over the long term to remain competitive in a difficult market, and to suit necessary or opportunistic momentary alliances), is applicable both in economic and commercial terms and to methodology. To illustrate the importance of tools, we have only to compare the revenues linked to computer-aided design tools with those linked to software packages for systems engineering: the former largely outweigh the latter.

1.8.2. Factors of influence

Businesses focus on their primary activity to ensure their survival in the short and medium term, seeking to preserve their competences in a durable manner. They concentrate on those activities with the highest added value and are permanently on the lookout for ways to create value, taking account of the expressed or deduced desires of users. In many cases, the focus is on service provision rather than product manufacture. Companies tend to abandon activities not linked to their primary focus in order to remain up-to-date on key technologies or sectors linked to their strategy. Moreover, as we have already highlighted, businesses enter into new relationships with suppliers, and particularly with partners with whom they share risks. This has led to the creation of new rules of management linked to project execution and the distribution of engineering activities, both in the architectural and production phases and even in maintenance or development. This raises questions concerning the attribution of responsibility in terms of system reliability and security: how is this responsibility attributed – or, in short, who is responsible?

The globalization of exchanges has also led to the appearance of new actors who are likely to experience considerable development over the coming years, notably in developing countries and including the sectors of large complex systems such as the aerospace, nuclear and aeronautics industries. This tendency is reinforced by the fact that large contracts are almost always accompanied by technological exchanges. Moreover, externalization in the form of outsourcing and off-shoring has contributed considerably to these developments and migrations of competences. In addition to the appearance of these new actors, this also implies the development of multicultural teams and a need to manage this diversity within project teams, while at the same time dealing with major variations in different national policies. This complexity in managing projects, personnel and competences is an important issue in determining the success of these developments.

Another factor to take into consideration is technological developments, particularly in the domains of computing, materials and convergences between several branches of science. In computing, the technological advances that have a direct impact on the systems that interest us apply to basic components (increasingly fast microprocessors, increases in the size and rapidity of access to memory, graphics cards, etc.). They also apply to processing capacities (parallelism, resource distribution, grid computing), sophisticated software programs and high-level programming languages with a high capacity for expression and, ideally, simple and intuitive syntax. Where materials are concerned, we should highlight the contribution of nanotechnologies, among other domains [LUZ 07], that allow us to envisage new applications, particularly intelligent materials with integrated means of signaling wear and tear, offering new potential concepts for maintenance in “push” rather than the traditional “pull” mode. The convergence of different branches of sciences principally refers to bioinformatics, the symbiosis between materials and processing and/or information transmission capacities, new forms of computing that may have a radical effect on system performances and even the field of calculability (quantum processing, DNA processing, etc.). In short, we are faced with a wide range of possibilities for development with the potential to radically change the architecture of systems.

In addition to these technological developments, there is a clear demand for new and sophisticated products and services: the “fashion” for innovation is not lacking supporters and, particularly, consumers. This development also allows us to improve the productivity and quality of constructed systems, allowing the creation of new products and services via the use of innovative basic components or new forms of infrastructures. All of these developments, which may be characterized by the shock expression “innovate or die”, are accompanied by significant pressure to reduce time-to-market – a clear result of the “fashion” effect!

Beyond this “race for innovation”, whether it be tolerated or desired, current society is oriented towards a search for processes offering increased economic efficiency within a sustainable development approach. This is due, in part, to the scarcity of raw materials (or to an awareness of the finite nature of resources that will eventually lead to scarcity with the generalized expansion of demand). It is also due to increases in production costs. This is because of the sophistication of components or systems, requiring complex transformation operations on raw materials and, often, considerable investment in research on these processes and in production infrastructures. We are thus in a situation where new markets are emerging, for example the organic and fair trade markets, accompanied by the creation of new professions, including eco-design and the “green economy”. (“Green” seems to be becoming a compulsory attribute, with green fuels, green chemistry, green armaments, etc.13) In short, new markets involve eco-efficiency, where we aim to create value while reducing ecological impact. This is done through reduction in material use, energy consumption, waste and particularly toxic waste and by increasing recycling, product life-spans and the intensity of services (in order to reduce the quantity of resources needed for their provision). This opens up new horizons at all stages of the engineering process: design, creation, production, maintenance, development and dismantling.

To conclude this review of major factors of influence, note the impact of the population pyramid, with the retirement of a number of qualified personnel and the arrival of smaller numbers (at least in the domains of competence needed for the engineering of large complex systems) from newer generations on the national market. The possible arrival of people from developing countries in the global market does not compensate for this imbalance, due to the fact that the education systems present in these countries do not tend to promote the development of the required competences. Without making any predictions for the future, this may create a dual problem: in the short term, we must deal with the reduction in resources by finding new ways of working, and we must also develop training and knowledge-management capacities to provide satisfactory perspectives for long-term development.

1.8.3. Trends, issues and challenges in systems engineering

Systems, as we have already stated, are becoming increasingly complex, with emerging, adaptive and highly dynamic behaviors at different scales of observation. Moreover, large complex systems cannot, by their nature, be designed or acquired from scratch, essentially for cost reasons. Consequently, they must involve “inherited” systems that are not directly adapted to need as envisaged for the global system. Design teams may be geographically dispersed, leading to a diversity of cultures, either between different companies or within different geographical zones. Moreover, the ubiquity of new information technologies, particularly the Internet, means that we live in an era where time is “compressed”, creating new demands in terms of performance and availability, but also increased vulnerability as systems become susceptible to malicious attacks.

According to the latest work published by Kevin Kelly [KEL 10], founder of Wired magazine, large complex systems form part of an attempt to increase the following 13 characteristics: efficiency, opportunity, emergence, complexity, diversity, specialization, ubiquity, liberty, mutualism, beauty, life, structure and evolutiveness. Readers are invited to consult the work in question for explanations and justifications; although we would use different terms in some cases, we agree with this analysis for the most part, including where esthetic characteristics are concerned. Given these general developments, the major subjects for study or issues encountered in complex systems engineering may be set out as in the following list, which we will discuss briefly with reference to the challenges associated with each item:

– issue 1: very large heterogeneous systems;

– issue 2: very large autonomous systems;

– issue 3: modeling and simulation around the whole system perimeter;

– issue 4: virtual prototyping of very large systems;

– issue 5: verification, validation and qualification of systems;

– issue 6: knowledge management throughout the system’s lifecycle;

– issue 7: human-centered agile design.

1.8.3.1. Issue 1: very large heterogeneous systems

The appearance of high bandwidth networks has generated high expectations in terms of connectivity and data availability. This has clearly led to massive interconnection between systems, as we have already noted. Often these interconnections were not planned at the outset and have been created in the hope of producing new services or pooling certain pre-existing functionalities of various components. Systems of systems [LUZ 08a, LUZ 08b] fall into this category. Note that a connection is not always created by the designers in charge of finding solutions to new demands; it may also come from users, and this way of connecting systems based on use is a recent trend, particularly where multimedia components are concerned.

The challenges that naturally arise thus include the management of legacy systems:

– How can we take pre-existing components into account in the architecture of the new system?

– How might these components operate in environments and conditions for which they were not necessarily designed?

– How may we master the complexity created by connections, trying, for example, to reduce or hide this complexity through the use of improved interfaces?

We are also faced with questions of security, operational safety and resilience, and those of evaluation and qualification of systems, particularly in changes of scale created by the use of communication and computing technologies that multiply capacities.
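As an illustration of hiding complexity behind improved interfaces, the classical adapter pattern wraps a legacy component behind the interface expected by the new system. The Python sketch below is hypothetical: the legacy API, the data format and all names are invented for the example.

from typing import Protocol

class PositionSource(Protocol):
    """Interface expected by the new system (hypothetical)."""
    def position(self) -> tuple:
        ...

class LegacyTracker:
    """Pre-existing component with a vendor-specific API (hypothetical)."""
    def get_pos_ddmm(self):
        return "4836.5,00225.1"   # degrees-and-minutes string, as the old equipment emits it

class LegacyTrackerAdapter:
    """Wraps the legacy component so that it satisfies the new interface unchanged."""
    def __init__(self, legacy):
        self._legacy = legacy

    def position(self):
        lat, lon = self._legacy.get_pos_ddmm().split(",")
        return self._to_degrees(lat), self._to_degrees(lon)

    @staticmethod
    def _to_degrees(ddmm):
        value = float(ddmm)
        degrees = int(value // 100)
        return degrees + (value - degrees * 100) / 60.0

def display(source: PositionSource):
    print(source.position())

display(LegacyTrackerAdapter(LegacyTracker()))

The conversion itself is incidental; the point is that the rest of the architecture only ever sees the PositionSource interface, so the legacy component can later be replaced without propagating changes.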

1.8.3.2. Issue 2: very large autonomous systems

With the development of decision and automation techniques, the past few decades have seen the emergence of autonomous systems such as planetary exploration robots, aerial reconnaissance drones able to undertake military missions, robot receptionists, etc. In short, in numerous situations that are too dangerous to risk a human presence, or for reasons of economic efficiency, we use machines instead of or in addition to human operators for tasks or for partial or complete missions.

The challenges involved include decision making in a limited and incompletely known environment, simultaneously with real-time control, such as taking the environment and neighboring systems into account in order to establish cooperative working where necessary. In addition, we must also consider reconfiguration in case of failures or breakdown. This is essential in order to guarantee a sufficient level of autonomy. Finally, we may expect and require properties of reliability, trust, security and safety in order to give authority to a large system of this kind in any situation. The definition of architectures allowing and guaranteeing these characteristics is the key to mastery of these systems.
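The following minimal sketch illustrates failure-triggered reconfiguration toward safer modes; the failure types, modes and policy are invented for the example, and a real architecture would add health monitoring, redundancy management and recovery procedures.

from enum import Enum

class Mode(Enum):
    NOMINAL = 1
    DEGRADED = 2      # reduced capability, the mission continues
    SAFE_HOLD = 3     # autonomy suspended, await an external decision

# Hypothetical reconfiguration policy: which mode to adopt for a given failure.
POLICY = {
    "sensor_loss": Mode.DEGRADED,
    "actuator_loss": Mode.SAFE_HOLD,
    "link_loss": Mode.DEGRADED,
}

class AutonomousSystem:
    def __init__(self):
        self.mode = Mode.NOMINAL
        self.log = []

    def report_failure(self, failure):
        target = POLICY.get(failure, Mode.SAFE_HOLD)   # unknown failures: be conservative
        # Reconfiguration only ever moves toward safer (higher-numbered) modes.
        if target.value > self.mode.value:
            self.log.append("%s: %s -> %s" % (failure, self.mode.name, target.name))
            self.mode = target

system = AutonomousSystem()
system.report_failure("sensor_loss")
system.report_failure("actuator_loss")
print(system.mode, system.log)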

1.8.3.3. Issue 3: modeling and simulation around the system perimeter

As set out in [CAN 08], modeling and simulation are used to support all the processes of design, architecture, comparative evaluation of architectures, evaluation and qualification, training and maintenance. They provide us with valuable performance information without us having to carry out tests using real materials, leading to considerable gains in terms of cost and time, and may allow us to make choices that could not otherwise have been validated. Note that in this case we are interested in modeling and simulation of all system components and over the whole lifecycle of the system. The ideal is to obtain a vision that is as integrated as possible, with the aim of mastering the global complexity of the object of study while avoiding falling victim to the complexity of the tool itself!

Challenges appear in modeling, particularly the mastery of granularity, i.e. the ability to pass from a general model to a specific model and vice versa. This back-and-forth between abstraction and aggregation is essential for the construction of different models and models with different levels of granularity in particular, which is the case for the applications envisaged above, for example heterogeneous systems managed by different organizations and where the available data are necessarily different in nature. Moreover, in terms of technology, we encounter the question of exploiting computing resources, a key factor in implementing models with a very fine level of granularity and in complex simulation assemblies. Modeling of human behavior, and notably of human behavior in stress situations, is also a crucial point in developing future systems. Finally, we are faced with the question of inversion of the model, i.e. the problem of defining a model and thus a simulation based on a set of expected behaviors.
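The granularity question can be illustrated with a toy example: the same population of components is described at fine granularity (each component simulated individually, hour by hour) and at coarse granularity (only the expected surviving fraction is kept). All parameters below are invented.

import random

def micro_model(n_components=1000, p_fail=0.002, hours=24):
    """Fine granularity: simulate each component's up/down state hour by hour."""
    failed = [False] * n_components
    failures_per_hour = []
    for _ in range(hours):
        new_failures = 0
        for i in range(n_components):
            if not failed[i] and random.random() < p_fail:
                failed[i] = True
                new_failures += 1
        failures_per_hour.append(new_failures)
    return failures_per_hour

def macro_model(n_components=1000, p_fail=0.002, hours=24):
    """Coarse granularity: keep only the expected number of surviving components."""
    return n_components * (1.0 - p_fail) ** hours

random.seed(0)
print("micro: components lost =", sum(micro_model()))
print("macro: expected survivors =", round(macro_model(), 1))

Passing from the first description to the second is aggregation; the inverse passage, reconstructing a plausible fine-grained model from coarse observations, is precisely the model-inversion problem evoked above.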

1.8.3.4. Issue 4: virtual prototyping of very large systems

Above and beyond the simulation of a system, however complex, we encounter the question of enabling the coexistence of real systems and simulations, which we find, for example, when attempting to see whether a potential assembly can or cannot provide new responses in cases where certain component systems are absent or otherwise unavailable.

The coexistence of systems, products, services and executable models creates technological challenges (interoperability, integration, etc.). We may find responses to these challenges in terms of standards and infrastructures [CAN 08], but we must also define particular processes for the exploitation of these technological resources once the resources themselves have been defined. If we are able to move from a system back to its model in terms of demands and technical performance (via retro-engineering techniques, which are not necessarily easy or feasible in all situations: we are, after all, trying to move from a single sausage to the pig that produced the meat!), it becomes possible to include models of various existing systems. Such models determine a virtual prototype of the system, and can even be used to develop certain characteristics of the system by modifying one of the models and then observing the behavior of the global complex system integrating the aforementioned modifications in a specific situation.
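The substitution principle behind virtual prototyping can be sketched very simply: an executable model stands in for a missing real system behind a common interface, and the rest of the assembly runs unchanged. The names and data below are hypothetical.

from typing import Protocol

class RadarFeed(Protocol):
    def detect(self) -> list:
        ...

class RealRadar:
    """Would wrap the physical equipment, unavailable in the virtual prototype."""
    def detect(self):
        raise RuntimeError("hardware not connected")

class SimulatedRadar:
    """Executable model standing in for the missing system."""
    def __init__(self, scripted_tracks):
        self._tracks = iter(scripted_tracks)

    def detect(self):
        return next(self._tracks, [])

def fusion_center(feed: RadarFeed, cycles=3):
    """The rest of the assembly runs unchanged, whichever feed it is given."""
    for cycle in range(cycles):
        print("cycle %d: tracks = %s" % (cycle, feed.detect()))

fusion_center(SimulatedRadar([["T1"], ["T1", "T2"], ["T2"]]))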

1.8.3.5. Issue 5: verification, validation and qualification of systems

Three major steps in the systems engineering process may be found at all levels: verification (do we have what we asked for?), validation (do we have what we expected?) and certification or qualification (where we declare officially that we have obtained what we ordered).

Specific challenges encountered in the case of large complex systems include accounting for the human factor, the complexity generated by interconnections – particularly during use – and the fact that organizations are increasingly international and that we therefore benefit from using processes defined by other authorities that we cannot or do not wish to redefine. The impact of legacy systems is also important for the verification of certain properties and for global reliability in particular. With an increase in complexity, cause and effect are often increasingly distant from each other in both spatial and temporal terms, making verification and validation increasingly difficult. Furthermore, in the case of reconfigurable systems (such as large autonomous systems), how may we characterize these operations and certify certain properties? After all, what we seek to obtain in such cases is the appearance of new properties throughout the lifecycle of the system! Finally, we may also wish to take the most integrated approach possible involving design and verification, validation and qualification operations, particularly in modeling and simulation of the system, in order to create an engineering process with the greatest possible mastery of its own efficiency.

1.8.3.6. Issue 6: knowledge management throughout the system’s lifecycle

Systems have a life span longer than that of project teams, or at least of the personnel in charge of one or other of their key steps, such as design, creation or renovation. This problem increases in importance with increases in complexity, both at the level of individual systems and in the assembly of various systems, potentially with differing degrees of maturity. This is due to the multiplication of families of systems and the multiplication of systems created during use that raise issues linked to their development and maintenance (particularly responsibility for repairs in cases of dysfunction). (The multiplication of families of systems involves local adaptation to user needs. An example of this involving products can be found in the automobile sector, with specific options available for vehicles. It can also be found in services, where we may find personalized offers based on client loyalty or particular limitations of access to a service.)

One key issue involved in the mastery of systems of this kind is the capacity to store and retrieve information linked to specific steps of specific configurations of the system or its components. This leads to preoccupations concerning both methods and tools: we must define the manner of organizing heterogeneous data in systems and sub-systems throughout certain stages of the lifecycle, with the ability to trace the path leading to certain key decisions. This preoccupation relates to the definition of an ontology and a technical structure for data management, with adequate tools for searching the preceding structures, potentially in large distributed databases, and with non-uniform access conditions (for reasons of heterogeneity in subjacent technical infrastructures, or judicial infrastructures, linked to intellectual and industrial property rights).
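As a very small sketch of what tracing the path leading to a key decision might look like in such a structure, consider a decision record linked to the configuration items it affects; the field names and identifiers below are invented.

from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class DecisionRecord:
    identifier: str
    summary: str
    rationale: str
    configuration_items: list          # components or baselines affected
    superseded_by: Optional[str] = None
    decided_on: date = field(default_factory=date.today)

records = [
    DecisionRecord("D-041", "adopt a dual-redundant bus", "single-point failure found in design review",
                   ["subsystem/power", "subsystem/avionics"]),
    DecisionRecord("D-042", "defer the encryption upgrade", "supplier component not yet qualified",
                   ["subsystem/comms"]),
]

def decisions_touching(item):
    """Retrieve every recorded decision affecting a given configuration item."""
    return [r for r in records if item in r.configuration_items]

print([r.identifier for r in decisions_touching("subsystem/comms")])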

The challenges we encounter involve the ability to use this data management in association with management of key competences (without falling into the trap seen with the Millennium Bug, note the existence of problems linked to the loss and management of competences associated with technologies that are obsolete, but still in service). Moreover, configuration management is complicated by the differing lifecycles of components and the need to take account of innovations at different levels of the system.

In addition to managing data storage and retrieval as described above and involving various actors, we encounter challenges linked to the representation of these data in order to facilitate use in terms of diagnostics and decision making, and reuse in alternative contexts (e.g. reuse of one sub-system for another, different, system).

The subjacent challenge, tackled briefly from the angle of intellectual property rights, is therefore that of contractual engineering, or the way in which we may reuse, a priori or a posteriori, the fruits of actions carried out previously within given contractual frameworks.

1.8.3.7. Issue 7: human-centered agile design

Humans constitute a key part of systems, either as designers, operators or an obligatory point of passage when making certain decisions. The human actor cannot be treated as a “simple” architectural element. The role of the human actor as judge and as component, rather than his or her inherent complexity, gives the actor this particular place (we would not attempt to imprison the human actor in a mechanistic vision by affirming that it is not the intrinsic complexity of individuals that gives them their extra-architectural status). Human actors are complex, particularly given the current developments of our society where, first, products and services are put in place by and for human users and second, systems have begun to “appear” through use, with the creation of interactions between various systems that were initially independent, or by appropriation by users who may themselves personalize a system during use. We are able to observe this in e-commerce (an offer may be directly targeted towards a user based on their consumption behaviors) and in multimedia applications for personal creation of content or even applications and services. In short, we need to operate in a context of co-creation processes involving project teams and users, thus moving away from the dichotomy of traditional systems where there is a gap between the phases of construction and use, materialized by tokens of qualification, and the launch of services.

Large complex system engineering should therefore be oriented more towards the consideration of user experiences.

Our main challenge is to include this in information collection processes concerning needs and constraints and in architectural prescriptions. Given the widespread “technophile” mentality, this also involves new modes for the insertion of innovations into systems already in use, or co-creations of value through use by partnerships with the user, of which we must define the form (and possible incentives).

Another challenge linked to the human component is modeling, for example in situations of stress (fatigue, incomplete environmental information, a requirement to make rapid decisions and important stakes involved in decisions) insofar as the human cannot a priori be modeled by something as simple as a deterministic finite automaton or a threshold modification in technical performance.

1.8.4. Development of the engineering process

Given these trends, issues and challenges in systems, the engineering process needs to evolve. We do not wish to imply that the process is incorrect. In its current state, the majority of concepts are included. (Although some would beg to differ – those who always wish to reinvent methods while ignoring what already exists – the process is widely used and has been proven to work in a number of applications. Such applications include complex projects, such as ballistic missiles or the moon landings, which were at the origin of the methods described.) The use of such an engineering process requires certain arrangements and modifications, however, in order to respond to various challenges.

The useful transformation may be described in terms of six characteristics: a process should be standard, integrated, extensible, implementable, agile and “lean”. We shall discuss each of these characteristics in turn in the following paragraphs.

Standard: the process should be based on established practices shared by several actors, in order to avoid excessively frequent reviews. It should be applicable within an extended company: what, indeed, would be the use of a process if it were strictly dependent on a particular team, and could not be used for any project involving multiple teams? This means that standards must be suited to applications in complex projects and potentially established for the duration of the project, or at least be able to account for the differing stages of maturity and progress through the system’s lifecycle. Standards must also be suited to component mastery, as there is no guarantee that the same granularity of operational units will apply to all actors. It also seems to us that the process should not stop at setting out the general methodology structuring the engineering activity of the complex system, but must also give, or demand, indications on the content of steps describing the activity. Typically, for large complex systems the pivotal element of information is the architecture, with its different views, the traceability linked to local choices and developments through various configurations and renovations. Another information element is the structure of technical data: whether data are associated with certain architectural views, and how they were validated at the system level. In these cases, the process should set out clear rules for organization (both in terms of ontology and of syntax) and management (of both time and exchanges). This enunciation must itself be based upon recognized and shared standards (whether de facto or de jure), in order to profit from the experience of the community and the state of the art at all levels, without becoming bogged down in economic impasses that can be particularly dramatic given the scale of the perimeters envisaged and the financial stakes involved in the long term. Explicit references are provided in [LUZ 08b, Ch. 12].

Integrated: the process should be integrated with other business and project processes. We wish to avoid situations where systems engineering is seen as a supplementary task or as an additional cost: it is an essential condition for mastery of the final result and should not constitute a cost center, but a systematic component of operations. Moreover, by “integrated process”, we also wish to highlight the fact that the various aspects involved (technical, economic, judicial, contractual, strategic, etc.) must not be treated as a juxtaposition of independent interests, but as a global consideration of the complexity of mastering a system that is itself complex. This dual articulation is inevitable, and its resolution requires that the process be integrated.

Extensible: the process must be able to adapt to large, potentially distributed companies on the one hand and to small businesses on the other hand, in aspects of organization and resource and project management. This capacity of the process to adapt to different scales and durations is not simply a question of technology (although technology does, naturally, contribute to the technical feasibility of certain aspects of extensibility), but concerns all aspects of the process. We encounter the question of, if not “lightening” the process in certain contexts, then at least making sure it does not generate constraints that are too costly to respect in relation to the cost of other efforts involved.

Implementable: a proposed methodology must not be simply theoretical but should be applicable in practice, with associated tools for its execution. Moreover, in order to be useful in the framework of complex projects, potentially distributed over time and between different teams, this suite of tools must be based on software and material standards that are sufficiently widespread and relatively durable. It must also be usable without requiring excessive training or complicated modes of operation, which lead to the use of roundabout methods to facilitate tasks and thus act as an obstacle to the capitalization of information between teams or over extended periods of time.

Agile: agility is an essential characteristic that has become rather fashionable. The risk of having an incomplete or evolutive set of demands is clear, and it has been noted many times that this is a major cause of cost overruns, time overruns and even failure. Unfortunately, such incompleteness is almost inevitable in large complex systems, such as systems of systems, the interest of which lies in the assembly (almost always unplanned) of systems that have already been designed and created or that will be developed to increase value within an assembly. By “agility”, we thus refer to means of:

– mastering the risks that we know we must take;

– tracing choices and decisions in order to observe the consequences for individual components and for the global system, with a view to immediate use and to potential future uses;

– managing configurations and planning maintenance operations, once again with mastery of the differential in relation to that which preceded the new assembly.

The introduction of the property of agility at process level brings a methodological arsenal into play, with various data and representation management tools, but also modes of organization that must be defined and piloted with the participation of all parties potentially concerned.

Lean: a process must follow innovations in terms of methods, techniques and tools. This idea was developed from the 1980s onwards by Toyota, following the adage “make cars for customers, not for engineers”. The main aim of this methodological arsenal is to increase customer-oriented value: we must orient all processes towards the final customer, while at the same time eliminating all nonproductive tasks that lead to waste of time and therefore of value. We aim to reduce the effort required in designing, creating and providing products, reduce the investment necessary to reach given levels of productivity, reduce the production of defective products, reduce the number of suppliers used, and ensure that key processes in the company take place more rapidly and with less effort, less inventory cost and less stress for employees. All of this is made possible by the elimination of waste (actions without added value in terms of the final aims of the company as perceived by the client, whether or not they are necessary to the company):

– superfluous use of raw materials;

– excessive time taken to create products or services;

– design or manufacturing errors;

– overproduction or the production of useless and unproductive inventories as far as the target product or service is concerned;

– poor use of resources (unnecessary resources, involvement of personnel in tasks with little added value, pointless movements of products or materials between transformations and poor use of work or production spaces).

The approach may be applied to production tasks, but also in the general administration of the company, personnel management etc., or, in short, to all processes and sub-processes. This approach is clearly becoming increasingly present in companies, in a context of evolution of modes of organization where just-in-time production, immediate solutions and cost reduction are dominant. It offers an approach to efficiency in terms of flows between the supplier and the client, which must be optimized at each stage of the process by avoiding all potential obstacles or things that may slow down the flow, including the use of unformatted tools and ad hoc actions, engaging in short-term objectives of performance improvement. The approach is based on a detailed knowledge of the demands and expectations of clients and of users in particular. It involves the systematic use of standards, reuse of existing resources in terms of design or production processes, mastery of the time and cost involved in prototyping, refusal of unnecessary complexity in final products and processes for the mastery of these products, avoidance of last-minute design changes and large-scale evaluation aimed at mastering production feedback. This list essentially summarizes best practice in systems engineering, in a vision that is totally and systematically oriented towards the final user.

Beyond these characteristics, it is tempting to wonder whether or not the traditional V-model principle needs amending. The response to this, in fact, depends on our interpretation of this model. If we see it as a top-down, sequential waterfall that then re-ascends – i.e. an analysis of higher level needs followed by a specification that serves as an expression of need for the next level down, and so on until we reach the final level of decomposition, then integration and validation at the lowest level before climbing to the next level – it is clear that this approach requires modification. This modification has, in fact, already been made, in that we know that for each level, plans for integration and validation must be laid at the same time as the analysis and specification of the level in question.

In fact, the V principle is more of a general philosophy of decomposition and aggregation, but with a loop initiated at each level when descending the V, which is completed on the way back up. That said, even if loops exist at every level of the decomposition hierarchy (compensating for the linearity of the process induced by the decomposition–recomposition principle, as we have already highlighted), the schema still seems to place all steps of the process at the same level of complexity with working methods that are a priori similar. It is this point that makes the model unsuitable for use with large complex systems, insofar as the first step in particular – which moves from an expectation or need expressed in the form of a global capacity to specifications of higher level systems, which are the object of our interest – presents major difficulties in terms of the formalization of expectations in order to render them exploitable for engineering individual systems. At this level, it would be useful to have a certain number of iterations, using a trial-and-error mechanism to consider the possibility of launching several process initializations at the level of component systems before beginning detailed engineering of all of these components. To put it another way, particular importance must be accorded to the first loop at global system level, accompanied by several attempts to descend and re-ascend a process using hypotheses, before truly engaging in a more complete process. Moreover, as we stated when discussing system maintenance, this management of evolutions may itself relaunch design–production–integration cycles at different levels.
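Schematically, this recommendation amounts to iterating cheaply on the first loop before committing. The sketch below only illustrates that control flow; the scoring function is a placeholder standing in for any real evaluation of an architectural hypothesis, and all values are invented.

import random

def quick_descent_and_ascent(hypothesis):
    """Placeholder for a coarse evaluation of one candidate decomposition
    (rough sizing, reuse of legacy systems, early integration risks)."""
    random.seed(hypothesis["id"])
    return random.uniform(0.0, 1.0)

def first_loop(candidates, acceptable=0.7, max_iterations=5):
    """Iterate at global-system level until one hypothesis looks solid enough
    to justify detailed engineering of the component systems."""
    for iteration in range(max_iterations):
        scored = [(quick_descent_and_ascent(c), c) for c in candidates]
        best_score, best = max(scored, key=lambda pair: pair[0])
        print("iteration %d: best hypothesis %s scores %.2f" % (iteration, best["id"], best_score))
        if best_score >= acceptable:
            return best
        candidates = [{"id": c["id"] * 7 + 1} for c in candidates]   # refine the hypotheses
    return None

print("commit to detailed engineering of:", first_loop([{"id": 1}, {"id": 2}, {"id": 3}]))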

Finally, the traditional process should be seen as a guide, the main interest of which lies first in the fact that it is suited to our Cartesian manner of thinking, and second in the fact that its use over several decades means we have access to a range of tools. We must, however, be aware of its theoretical limits (see our discussion of non-linearity) in the context of our particular problem in order to overcome these limitations at the level of interpretation and when making decisions at various stages. We therefore recommend following this process, knowing how to use and modify it intelligently where required, but with thorough documentation and tracing of all these modifications in order to master them in the long run and in spite of the replacement of various actors involved throughout the lifespan of the system in question.

1.8.5. Themes of research

In connection with the challenges listed above, we shall now present a number of research themes expressed in the language of the “hard” sciences (mathematics, physics and theoretical computing), highlighting their relative contribution to these issues. This exercise merits further attention. We present it here essentially in order to open the way for debate between communities and to sketch out possible ideas for shared research involving scientific and technical domains that have an unfortunate tendency to ignore each other.

1.8.5.1. Modeling: development, analysis and model inversion

This theme presents strong links to issues 1, 2 and 3 and also contributes to issue 5. Some examples of scientific problems concerned include:

– discrete event systems, hybrid automata and graph and network theory (small-world, scale-free, etc.): beyond problems strictly linked to modeling, we need to look at the characterization of properties of robustness, resilience and fragility, notably by explicit analysis of the links between topology and threat types (an illustrative robustness sketch follows this list). Another interesting subject is the behavioral analysis and synthesis of models with specific properties;

– game theory in cases involving multiple actors and retroactions in local decisions (strategies with norms, etc.): as a caricature, we might say that the last century saw a move from the Pareto optimum to the Nash equilibrium, and we now need to make the same qualitative leap in highly dynamic environments, particularly where the influence of choices correlates to the topology (and connections in particular) subjacent to the actors present;

– understanding scalability: this understanding, both quantitative and qualitative, is based on the study of behaviors of differential equations, partial differential equations (reaction–diffusion, convection, flow equations, etc.), the analysis of their bifurcations, shock-wave solutions or, in short, all specific regimes with a perceived influence on our knowledge of dynamics outside of equilibrium situations;

– mastery of the passage from the discrete to the continuous, from stochastic to deterministic, from microscopic to mesoscopic and macroscopic scales, and vice versa. We are thus concerned with mastering the passage to the limit in the presence of a significant number of components, and the ability to design and analyze behaviors of dynamic solutions at various different resolutions, with the possibility of breaking down or aggregating them as required. Various possibilities already exist, which should be developed for application to real-world cases as seen in engineering. Micro-local analysis, for example, allows us to move from local equations (at quantum level) to equations modeling globally averaged behaviors (at statistical set level) and vice versa. The master equation allows us (as long as we possess the capacity to solve it) to go from the stochastic modeling of local exchanges to global modeling in the form of flow equations. We do, however, still need to generalize these approaches, and especially master (for example) the “probabilization” of a model to account for particular hazards, risks etc., in order to succeed in achieving the passage from global to local;

– the application of systemics to modeling: development of algebraic/logical structural models (based, for example, on categories and topoi) in order to be able to model the large complex system via a composition approach and to analyze the system via the properties we might impose upon it in the form of logical constraints (the idea is to generalize what happens, for example, when specifying blockage situations such as deadlock and livelock in networks);

– the understanding, or even specification, of the emergence of forms and behaviors in equations: how to move beyond the analysis of bifurcations between models and preliminary work in mathematical morphogenesis linked to the understanding of certain thermodynamic phenomena far from equilibrium;

– the passage from the parametric structure of a model to the set of its trajectories, and vice versa, with the aim of controlling these trajectories: this would constitute no more or less than the ability to choose a future model based on an acceptable set of trajectories.
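To make the link between topology and robustness concrete, the sketch below (using the networkx library) removes the most connected nodes from a scale-free graph and from a random graph of comparable average degree, and reports the surviving giant component. The parameters are arbitrary and the exercise is purely illustrative.

import networkx as nx

def largest_component_fraction(graph):
    if graph.number_of_nodes() == 0:
        return 0.0
    return max(len(c) for c in nx.connected_components(graph)) / graph.number_of_nodes()

def targeted_attack(graph, fraction_removed=0.05):
    """Remove the most connected nodes and measure the surviving giant component."""
    g = graph.copy()
    k = int(fraction_removed * g.number_of_nodes())
    hubs = sorted(g.degree, key=lambda pair: pair[1], reverse=True)[:k]
    g.remove_nodes_from(node for node, _ in hubs)
    return largest_component_fraction(g)

n = 2000
scale_free = nx.barabasi_albert_graph(n, 2, seed=1)   # hub-dominated topology
random_net = nx.gnp_random_graph(n, 4 / n, seed=1)    # comparable average degree

print("scale-free after targeted attack:", round(targeted_attack(scale_free), 2))
print("random graph after targeted attack:", round(targeted_attack(random_net), 2))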

1.8.5.2. Automatic proof (for decision-making in evolutive multi-system contexts)

This theme is strongly linked to issues 5 and 6. The scientific problems concerned include:

– the analysis of theories with large numbers of axioms: in addition to combinatorial aspects, we need to resolve, theoretically and in a constructive manner, the simplification of these theories (notably by exhibiting non-independent axioms; a minimal redundancy-detection sketch follows this list). We also need to develop certain frameworks allowing us to account for possible contradictions (either within the same logical framework, as in certain non-monotone or paraconsistent logics, or by the ability to manage the passage between theories);

– parallel management of several provers: this problem is not simply a question of algorithms, but also raises theoretical issues as it generalizes the passage between theories. Note the interest of this approach, both for behavioral analysis of models and for model synthesis, i.e. in systems engineering, specification under constraints;

– accounting for the stochastic factor at the level of axioms in a theory: this stochastic dimension is then found in proofs, and can potentially be brought back to parallel management of several theories (each step being weighted in a probabilistic manner). Seen in this way, the combinatorial explosion is immediate;

– the analysis and reduction of the algorithmic complexity of automatic provers: this theme is evidently already the subject of work, but its importance is reinforced in light of the previous points.
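One concrete reading of “exhibiting non-independent axioms” is redundancy detection: an axiom is redundant if the remaining axioms already entail it, which a prover can check by testing whether the remaining axioms together with its negation are unsatisfiable. The sketch below uses the z3 solver on a toy propositional theory invented for the example.

from z3 import Bools, Implies, Not, Solver, unsat

a, b, c = Bools("a b c")

# Toy axiom set: the third axiom is a consequence of the first two.
axioms = [Implies(a, b), Implies(b, c), Implies(a, c)]

def is_redundant(index):
    """Axiom i is redundant if (other axioms) AND NOT(axiom i) is unsatisfiable."""
    solver = Solver()
    for j, axiom in enumerate(axioms):
        if j != index:
            solver.add(axiom)
    solver.add(Not(axioms[index]))
    return solver.check() == unsat

for i, axiom in enumerate(axioms):
    print(axiom, "-> redundant" if is_redundant(i) else "-> independent")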

1.8.5.3. Design of software-intensive complex systems with consideration of the human factor

This theme is strongly linked to issues 4, 6 and 7 and also contributes to issues 1 and 2. The scientific problems concerned include:

– archiving and search algorithms in large databases: the cases of “flat” (large data set) and relational databases require study;

– tools for the visualization and manipulation of heterogeneous databases: the aim is to establish easy contact between the designer and user and the problem in question. This involves smoothing out complexity, where possible, by the designer and user using their own cognitive resources to comprehend this complexity, facilitating mainly preliminary design work and the difficult passage between the formalization of expectations and specification of need;

– the mastery of links between high-level and low-level semantics: the issue of mastery arises in connection with specification languages, languages for the description of architectures and languages for the description of simulations. This links back to the problem of mastery of scalability (micro/meso/macro) mentioned above, but this time in relation to descriptive languages;

– interfacing and the integration of design tools with simulation tools: this, in particular, poses the problem of distributing software and material resources;

– automatic generation of lowest-level code: how to use MDA (model-driven architecture, with architectural transformation) and design patterns at different levels – system, subsystem and component (a toy generation sketch follows this list). (The definition of reference patterns, at application domain level and not exclusively for technical components, is required. For example, we might easily imagine that the security services of an information system derive a priori from a similar model whether for banking information systems or for operational information systems in defense, yet each domain redefines everything for itself, etc.);

– modeling the human factor: this concerns, first and foremost, models of human behavior, particularly under stress, but also models of interaction with humans, where we consider both person-to-person interactions and interactions between people and automated systems. This last point concerns aspects of man-machine interfaces (in terms of ergonomics and ease of use) and aspects linked to decision making or information exchanges on different levels (representations guided by syntax for machines and by semantics for human actors). The interface issue concerns operator workstations linked to system operations, but also design and retro-engineering tools upstream;

– the security of information systems under the constraint of distributed software: new collaborative tools used by geographically dispersed teams and which potentially belong to different companies that find themselves in temporary alliance pose sizeable issues of data confidentiality, authentication and the protection of data against retro-engineering (a supplier may require architectural elements or technical data from a client, corresponding to specific competences of the client, and the exchange should not allow suppliers to acquire these competences themselves).
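As a deliberately tiny illustration of model-driven generation, the sketch below turns a platform-independent description of an interface into executable target code. Real MDA chains operate on far richer metamodels and transformation rules; the model content here is invented.

# Platform-independent "model" of a component interface (hypothetical content).
model = {
    "name": "TrackStore",
    "attributes": [("identifier", "str"), ("position", "tuple"), ("timestamp", "float")],
}

TEMPLATE = '''\
class {name}:
    """Generated from the platform-independent model; do not edit by hand."""
    def __init__(self, {args}):
{assignments}
'''

def generate(spec):
    args = ", ".join("%s: %s" % (attr, type_) for attr, type_ in spec["attributes"])
    assignments = "\n".join("        self.%s = %s" % (attr, attr) for attr, _ in spec["attributes"])
    return TEMPLATE.format(name=spec["name"], args=args, assignments=assignments)

source = generate(model)
print(source)
exec(source)                                   # the generated class is immediately usable
store = eval(model["name"])("T1", (48.6, 2.4), 0.0)
print(store.identifier, store.position)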

1.8.5.4. Co-design of materials and software

This theme is strongly linked to issues 2 and 4 and also contributes to issues 1 and 3. Examples of the scientific problems concerned include:

– optimal adaptive resource distribution: to guarantee sufficient longevity to systems, optimization must not impose unnecessary constraints on architecture, holding back future technological and functional evolutions (a toy assignment sketch follows this list). This has been a known issue for at least 20 years in relation to circuit boards, but has more recently emerged as an issue in components and systems, with the growing place accorded to information and communication technologies. (It is also an issue for so-called “electric organs”, which replace mechanical transmission mechanisms. This still raises questions linked to electronic components which, from one generation to the next, will not necessarily be replaced one by one. Any optimization too firmly linked to the temporary physical architecture will slow down development);

– the rules and means of definition of reconfigurable, modulable, flexible architectures: this is a generalization of the previous issue to cover the whole system;

– intensive computing: this applies to domains such as simulation and the management of large amounts of data. The issue concerns algorithms as much as means of calculation (concepts currently receiving a lot of attention include grid computing and cloud computing).
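As an illustration of the resource-distribution question in its simplest static form, the classical assignment problem maps functions to processing nodes at minimal cost; the sketch below uses scipy's implementation of the Hungarian algorithm with an invented cost matrix. The “adaptive” part of the problem then amounts to being able to recompute such an allocation when nodes or functions change, without the architecture having frozen the previous mapping.

import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost (e.g. latency) of hosting each function on each processing node.
functions = ["guidance", "fusion", "telemetry", "logging"]
nodes = ["node_A", "node_B", "node_C", "node_D"]
cost = np.array([
    [4, 9, 7, 8],
    [6, 3, 8, 5],
    [9, 7, 2, 6],
    [5, 8, 6, 3],
])

row_ind, col_ind = linear_sum_assignment(cost)   # Hungarian algorithm
for f, node in zip(row_ind, col_ind):
    print("%-10s -> %s" % (functions[f], nodes[node]))
print("total cost:", cost[row_ind, col_ind].sum())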

1.9. Conclusion

We have seen that the systems we are called on to operate as users, or which we must design or develop as engineers, are increasingly complex, interconnected and distributed. They contain, within their building blocks, elements as varied as products and services, with different approaches to design, use, maintenance and retirement. We have highlighted the way in which this complexity takes multiple forms and have sought to establish analogies between natural and artificial systems with the aim of sharing approaches and solutions for the mastery of this complexity between domains that are currently disconnected as study disciplines.

Multiple attitudes exist towards this assessment and the challenges presented by complexity. One of these attitudes involves attempts to reduce complexity. The intention here is praiseworthy, but without the use of semantic tricks it is doomed to failure, if indeed reduction means a “jump” following a predefined scale of complexity.

Indeed, if we always had the ability to reduce the complexity of a system, this would mean in particular that the initial system S1 could be reduced to S2. (This could occur via regulation, an architectural model or any other functional representation, i.e. one including sufficient information in order for the system to remain useable. If this condition was not included, a representation containing no information would automatically constitute the fullest reduction of all systems!) S2 could also be reduced to yield S3, and so on. This is certainly possible from a strictly formal mathematical viewpoint, but is unimaginable for an engineer. If complexity has any meaning in engineering, a scale used to measure it must make use of discrete measurements (in the sense that they cannot be infinitely close, but must be separated by a minimum limit) and there must be a minimum measurement of complexity. After renormalization, this means that the complexity scale is graded using natural integers: 0, 1, 2, etc. The reduction mechanism moves down the scale from the initial measurement of complexity, so at the end of a finite number of reductions we arrive at 0, meaning that if every system could be reduced in terms of complexity it would be reduced to a trivial system of null information. Such a result is clearly absurd in terms of systems engineering.
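The same reasoning can be stated compactly in symbols (this is only a restatement of the argument above): let $c(S) \in \mathbb{N}$ denote the complexity of a system $S$ on a discrete scale with minimum value $0$, and suppose that every system with $c(S) > 0$ admitted a reduction $S \mapsto S'$ with $c(S') < c(S)$. Starting from any $S_1$, the chain
\[
c(S_1) > c(S_2) > c(S_3) > \dots
\]
would be a strictly decreasing sequence of natural numbers, hence finite: after at most $c(S_1)$ steps it would reach a system $S_k$ with $c(S_k) = 0$, i.e. a trivial system carrying no information.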

Why, then, do we so often talk of reducing complexity? Why, too, is the adage KISS (keep it simple, stupid) so popular and used as a guiding principle in so many approaches? First, because in all intellectual undertakings, the human tendency is to aim to achieve complete mastery of a problem; the challenge of complexity being no exception! Second, because it is useful to return to what we referred to earlier as “semantic tricks”. We can, in fact, attempt to define a representation of a system that, in its expression, appears simpler than it actually is, but for which the rules and notations defining expression may be complex (the rules must counterbalance the increase in simplicity suggested by the expression, as in computer programs using a high-level language that seems very explicit, but is based on a significant layer of translation which must be gone through in order to reach instructions at the basic level of electronic resources). The difference is that these rules and notations are predefined in connection with the use to which they are put, and are provided in libraries of meta-object handling tools. Reduction is, then, possible, but must be carried out through the choice of suitable “meta-tools”. The means of reducing complexity is then, in fact, made up of standards, methods and tools.

From our perspective, we feel that these standards, methods and tools must be used at the level of the architecture to provide the greatest added value for the duration. Moreover, in order to avoid having to repeat work when a system is used in another context, or in order to be able to reuse part of an existing system when considering, for example, potential assemblies of various systems, it is useful to have access to architecture transformation mechanisms, allowing the application of reasoning through analogy at practical level. An orientation based around the development of methodologies, including tools, which respond to these preoccupations seems to us to be an important route to take in achieving mastery of large complex systems.

Throughout this chapter, we have considered users as consumers, but also as producers. Every large complex system owes its existence – or its interest, to avoid philosophical connotations – to the simple reason that it contributes to a given objective in which man takes a place, at the very least as a user. The current tendency is to go beyond this strict limitation of the user to the role of passive consumer. In almost all real-life situations, we see that the user must play the role of integrator, mostly working on his or her own initiative, particularly in cases where the user must move from one system to another to fulfill his or her needs. For example, tonight I must travel to a town to give a lecture on system-of-systems engineering. As part of a mission established by the organization responsible for the lecture, my plane ticket has been booked, I know what cab firm I must use in order to be reimbursed, and my accommodation for the night will also be paid for. However, the air and road transport systems are unconnected in terms of managing the user of these transport services, even if, geographically, efforts have been made to facilitate the passage from one mode of transportation to another. It is down to me, the user, to find a cab run by the right company. In the same way, I am responsible for reserving my hotel accommodation, although the expense will be processed in full by a specific service at the moment of reimbursement. I, the user, must take on certain tasks allowing me to fully profit from a global service of “mission support”. In the case of multi-modal transport, I will not be an actor in the same way, as the same ticket (and the associated signage, which removes the tasks where the user creates a small amount of value – if choosing a taxi operated by the right company can truly be considered to represent value creation) covers my passage from the airplane to the taxi. In the case of systems providing all-inclusive holidays (not really the case in the example given above!), the hotel reservation would also be included, and the transport ticket would cover the plane, taxi, arrival at the hotel, reception etc., without any action on the part of the user besides following the precise indications given by the various actors involved in implementing the system. However, even if excursions and meals are pre-planned, I still need to choose presents and souvenirs. To summarize, whatever the complexity level of a system considered, the user is always an actor who creates value at one point or another. Consequently, the user must be considered in this way, and systems should be designed to allow the user to participate correctly as a co-creator of value. This generates a clear need for openness at architecture level.

Moreover, as highlighted in [NOR 88], we should keep in mind that the user should be able to use the system intuitively, particularly in the case of systems linked to current use. This is not contradictory to the increasing complexity of the systems we encounter, but shows that complexity should not be transferred to, and thus inconvenience, the user. It should, instead, be transparent if we want the usage value of the system to correspond to the level of investment involved in creating the system. Thus, effort needs to be made in designing these systems and their architecture as, again, specific users will be involved (even if we may presume that they will have received training). Once again, we must avoid complicating the task unnecessarily, as users might then implement personal derogation procedures that differ from the procedures we would like them to follow. We must aim for simplicity at all levels of the value chain in direct interaction with human actors (whether they are users, operators, architects, etc.). This means, as we mentioned above, that there must be a local increase in complexity at other points in the value chain, creating a risk factor in terms of mastery of the security and safety of the system.

1.10. Bibliography

[AFS 07] AFSHAR M., DARTNELL C., LUZEAUX D., SALLANTIN J., TOGNETTI Y., “Aristotle’s square revisited to frame discovery science”, Journal of Computers, vol. 2, no. 5, pp. 54-66, July 2007.

[ATT 09] ATTOH-OKINE N.O., COOPER A.T., MENSAH S.A., “Formulation of resilience index of urban infrastructure using belief functions”, IEEE Systems Journal, vol. 3, no. 3, pp. 147-153, 2009.

[BAL 04] BALL P., Critical Mass: How One Thing Leads to Another, Arrow Books, London, 2004.

[BOM 07] BOMSEL O., Gratuit!: le Déploiement de l’Économie Numérique, Gallimard, Paris, France, 2007.

[CAM 09] CAMPAGNAC E., Evaluer les Partenariats Public-privé en Europe, Presses de l’École Nationale des Ponts et Chaussées, Paris, France, 2009.

[CAN 08] CANTOT P., LUZEAUX D., Modélisation et Simulation de Systèmes de Systèmes: vers la Maîtrise de la Complexité, Hermès Lavoisier, Paris, France, 2009.

[DAP 10] D’APICE C., GÖTTLICH S., HERTY M., PICCOLI B., Modeling, Simulation and Optimization of Supply Chains, Society for Industrial and Applied Mathematics, Philadelphia, USA, 2010.

[DEV 10] DEVIC C., LUZEAUX D., ROUSEL J.-C., FAISANDIER A., MOREL G., DE CHAZELLES P., GALINIER M., AFIS Vision 2020, working document created by the AFIS, 2010.

[FIK 07] FIKSEL J., “Sustainability and resilience: toward a systems approach”, IEEE Engineering Management Review, vol. 35, no. 3, pp. 5-15, 2007.

[HAM 06] HAMMES T.X., The Sling and the Stone: on War in the 21st Century, Manas Publications, New Delhi, India, 2006.

[HOL 06] HOLLNAGEL E., WOODS D.D., LEVESON N., Resilience Engineering: Concepts and Precepts, Ashgate Publishing, Hampshire, 2006.

[JOH 07] JOHNSON N., Simply Complexity, A Oneworld Book, Oxford, 2007.

[KEL 10] KELLY K., What Technology Wants, Viking, New York, USA, 2010.

[LAN 08] LANNOY A., Maîtrise des Risques et Sûreté de Fonctionnement: Repères Historiques et Méthodologiques, Lavoisier, Paris, France, 2008.

[LUZ 93] LUZEAUX D., “Intelligent control, ergodicity and chaos”, International Simulation Technology Multiconference, Simtech-Wnn-Fnn93, San Francisco, USA, November 1993.

[LUZ 95] LUZEAUX D., MARTIN E., “Intelligent control: a theoretical framework”, 3rd SIAM Conference on Control and its Applications, St. Louis, MO, USA, April 1995.

[LUZ 97] LUZEAUX D., ANTONIOTTI J.-F., “Mathematical system theory and digital control”, Southeastern Symposium on System Theory, Cookeville, TN, USA, March 1997.

[LUZ 02] LUZEAUX D., DALGALARRONDO A., “Harpic, an architecture based on representations, perception and intelligent control: a way to provide autonomy to robots”, Chapter 2 in Innovation in Intelligent Systems, in: BRAHAM A., LAKHMI J., KACPRZYK J. (eds), Studies in Fuzziness and Soft Computing Series, Springer Verlag, New York, USA, pp. 37-56, 2002.

[LUZ 07] LUZEAUX D., PUIG T., A la Conquête du Nanomonde, Editions du Félin, Paris, France, 2007.

[LUZ 08a] LUZEAUX D., RUAULT J.-R., Systèmes de Systèmes: Concepts et Illustrations Pratiques, Hermès Lavoisier, Paris, France, 2008.

[LUZ 08b] LUZEAUX D., RUAULT J.-R., Ingénierie des Systèmes de Systèmes: Méthodes et Outils, Hermès Lavoisier, Paris, France, 2008.

[LUZ 08c] LUZEAUX D., “Des systèmes à l’ingénierie système: concepts de base, interprétations et pièges”, Conférence Invitée aux Journées Académie-Industrie Organisées par l’AFIS, Nîmes, France, December 2008.

[LUZ 10] LUZEAUX D., “System engineering: AFIS Vision 2020”, Conférence Invitée à CSDM (1st International Conference on System Design and Management), Paris, France, October 2010.

[MAN 74] MANDELBROT B., “Intermittent turbulence in self-similar cascades: divergence of high moments and dimension of the carrier”, Journal of Fluid Mechanics, no. 62, pp. 331-358, 1974.

[MAR 08] MARTIN E., SALLANTIN J., LUZEAUX D., “Towards intelligent machines grounded on a formal phenomenology”, European Computing and Philosophy Conference (ECAP’08), Computational Formalisms and Phenomenology, Montpellier, France, June 2008.

[MIN 08] MINOLI D., Enterprise Architecture A to Z: Frameworks, Business Process Modeling, SOA, and Infrastructure Technology, CRC Press, Auerbach Publications, Boca Raton, FL, USA, 2008.

[MOF 02] MOFFAT J., Command and Control in the Information Age, TSO, Norwich, 2002.

[NOR 88] NORMAN D.A., The Design of Everyday Things, Basic Books, New York, USA, 1988.

[OAS 09] OASIS, Reference Architecture Framework for Service Oriented Architecture, version 1.0, Committee Draft 02, (docs.oasis-open.org/soa-rm/soa-ra/v1.0/soa-ra-cd-02.pdf), October 2009.

[OME 09] OMER M., NILCHIANI R., MOSTASHARI A., “Measuring the resilience of the transoceanic telecommunication cable system”, IEEE Systems Journal, vol. 3, no. 3, pp. 295-303, 2009.

[PER 00] PERROT J.-Y., CHATELUS G., Financement des Infrastructures et des Services Collectifs: le Recours au Partenariat Public-privé, Presses de l’École Nationale des Ponts et Chaussées, Paris, France, 2000.

[RUA 09] RUAULT J.-R., LUZEAUX D., COLAS C., SARRON J.-C., “Ingénierie système et résilience des systèmes sociotechniques”, 5e Conférence Annuelle de l’AFIS, Paris, France, September 2009.

[SAL 07] SALLANTIN J., LUZEAUX D., SZCECINIARZ J.-J., “An epistemology of computer science based on didactical interaction”, European Computing and Philosophy Conference (ECAP’07), Philosophy Science Special Track, Twente University, Enschede, The Netherlands, June 2007.

[SHE 06] SHEARD S., “Definition of the science of complex systems”, INSIGHT Journal, vol. 9, no. 1, pp. 25-26, October 2006.

[SIM 05] SIMCHI-LEVI D., CHEN X., BRAMEL J., The Logic of Logistics, Springer Series in Operations Research, New York, USA, 2005.

[WAN 09] WANG D., IP W.H., “Evaluation and analysis of logistics network resilience with application to aircraft servicing”, IEEE Systems Journal, vol. 3, no. 3, pp. 166-173, 2009.

[ZOL 10] ZOLESIO J.-R., “Critical infrastructure protection”, in: LUZEAUX D. and RUAULT J.-R. (eds), Systems of Systems, ISTE, London, John Wiley and Sons, New York, January 2010.

1 Chapter written by Dominique LUZEAUX.

1 Translator’s note: means of payment and contractualization of personal services used by private individuals.

2 The phrase “socio-technical” is used intentionally here, not as a substitute for “complex” or for literary reasons, but to highlight the dual sociological or organizational and technical or technological motivations involved, as these two dimensions constitute both the main motors and the raison d’être of the service logic.

3 TOGAF – The Open Group Architecture Framework

FEAF – Federal Enterprise Architecture Framework

GERAM – Generalised Enterprise Reference Architecture and Methodology.

4 DoDAF – US Department of Defense Architecture Framework

MODAF – UK Ministry of Defence Architecture Framework

NAF – NATO Architecture Framework

5 Note that in the Babylonian period, around 1700 BC, reliability was not defined in such a probabilistic manner. If a house collapsed and killed the owner, the architect responsible for the construction was put to death; if the son of the proprietor was killed, then the son of the architect was executed!

6 Variety is the logarithm of the total number of possible system states; it may be linked to a certain information measure in the system. In reality, this law is close to the second law of thermodynamics in physics, via the principle of internal models introduced by Wonham in control theory and certain theorems proposed by Shannon in information theory.
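As a minimal formal sketch of this definition (our notation, assuming base-2 logarithms and a system S whose distinct possible states can be counted):

V(S) = \log_2 N(S), \qquad N(S) = \text{number of possible states of } S,

and, for a system composed of k independent subsystems S_1, \dots, S_k,

N(S) = \prod_{i=1}^{k} N(S_i) \quad\Longrightarrow\quad V(S) = \sum_{i=1}^{k} V(S_i),

so that variety behaves as an additive, bit-valued quantity over the system, which is one way of reading the link to the information-theoretic results mentioned above.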

7 We strongly recommend [BAL 04], which takes a comparable approach in presenting accessible versions of various models of complex systems produced in the physical sciences, placing them in a historical context in the philosophy of science before highlighting their interest in relation to economic and social problems.

8 We talk of “fluid” models, as the initial domain of application was the flow of fluids in hydrodynamics and in aerodynamics.

9 One example of symmetry breaking can be found in social insect societies, for example in the division of work between worker and soldier ants, where nothing initially (at egg level) leads us to suspect this division will occur; it is the global state of the society that determines the treatment, and therefore the future, of an egg at a given moment.

10 With the notorious exception of the climate, where we immediately fall into this trap, claiming that the reduction of industrial emissions of carbon gas would have a positive effect on the climate (but does “positive” refer to the plus sign, or to morale? If we wish to reduce a temperature – i.e. warming effects due to a greenhouse effect caused by these emissions – the reduction would actually imply a negative influence), in the purest tradition of global extrapolation of local observations in terms of space and time. We also quantify this necessary reduction (in the purest tradition of “proportionality” of the cause and effect relationship, with the addition of the myth of carbon exchanges, which are a clear negation of all spatial dependence of the phenomenon!) while simultaneously supporting the idea of complexity in the climate system. However, the whole issue is itself eminently complex from an epistemological point of view, and the various thought disciplines concerned are themselves interwoven and not independent!

11 The AFIS is a non-profit association created in 1999. It brings together around 30 members (industrial and public establishments involved in the defense, transportation, energy, automobile and banking industries), academics (from universities and engineering schools) and small and medium businesses, involving around 500 individual members, with an exclusive agreement in France with the INCOSE (International Council on Systems Engineering). See www.afis.fr for more information.

12 This question is not purely rhetorical: the FELIN combat system is an example of a system involving the fields of defense and clothing manufacture. It is linked to the production of combat clothing and has capacities for insertion in the digital battlefield, offering ballistic protection and tactical mobility to infantry.

13 The raison d’être of armaments has not changed – the aim is still to kill – but, where possible, with minimal pain in order to respect human dignity, and especially without polluting the battlefield environment, for example with heavy metals or other materials that leave residues lasting beyond the duration of the conflict. The criterion of eco-compatibility seems to have become more relevant than the loss of life during conflict, in the name of the current belief that the world belongs to future generations and that we are not proprietors but simply tenants. We might be tempted to point out that lost lives do not produce future generations, or happy generations…
