Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 22

DNA Double-Strand Break–Based Nonmonotonic Logic

Andrei Doncescu¹; Pierre Siegel² ¹ University Paul Sabatier – INSERM – CNRS, CRCT, France
² Aix Marseille University, CNRS, LIF, Marseille, France

Abstract

Biological systems are complex systems with a dynamic evolution in terms of structure and biological causality. In order to model and control these systems, we must observe and simulate their behavior using relevant models. This chapter introduces a logical model based on default logic for DNA double-strand break (DSB) modeling. This new approach handles the uncertainties of knowledge and missing information concerning the conditions of activation/inhibition of some proteins. This model can be applied to any signaling pathway, even those with more than 6000 genes and proteins.

Keywords

Knowledge representation

default logic

tractable algorithms

signaling pathways

main pathway

1 Introduction

E-cell simulation is a very useful tool in the drug discovery process. The simulation programs can be divided into two areas: dynamic simulation and knowledge-based discovery programs (KBDPs). In the first, the model is based on differential equations and has the capacity to predict biological states. In the other, the consistency of the model is checked. The most efficient KBDP is based on first-order logic (FOL).

A particularly and difficult field is cancer. The causal graph representation used in this area requires an appropriate evaluation of sure knowledge, corroborated with the available experimental data in order to represent existing knowledge and discover new knowledge. The key mechanism of cancer is DNA double-strand breaks (DSBs), which could be viewed as a complex system constituting protein-protein interactions, protein-DNA interactions, and gene regulation, where the input is the DNA DSBs and the output is apoptosis or cancer.

Cancer can be seen as a pathological alteration of the signaling networks of the cell. The study of signaling events appears to be one important key of biological, pharmacological, and medical research. For over a decade, signaling networks have been studied using analytical methods based on the recognition of proteins by specific antibodies. Parallel DNA chips (microarrays) are widely used to study the coexpression of candidate genes to explain the etymology of certain diseases, including cancer. This huge amount of data allows the modeling of gene interactions. Biologists look for evidence of interactions between metabolites or genes.

From the standpoint of artificial intelligence, cells are sources of information that include a large amount of intracellular and extracellular inputs and outputs. Therefore, representation by graphs is the best way to understand biological systems. The graph representation has mathematical properties and for this application the presence of positive and negative loops are the representation of genetic regulatory function. Biochemical reactions are often a series of events instead of one elementary action. We suppose even the biological cycles could be discretized according some events. Therefore, one direction of research in system biology is to capture or describe the series of steps called pathways by metabolic engineering. Any reaction that allows the transformation of one initial molecule to a final one constitutes a metabolic pathway. Any compound that participates in different metabolic pathways is grouped under the term metabolite.

The challenges of analysis and modeling of causal graphs have been well identified and studied in artificial intelligence over the last 30 years. Indeed, the description of signaling pathway is never complete: biological experiments prove a number of biological interactions, but certainly not all of them. On the other hand, the complexity and the burden of some experiments can lead to uncertain and imprecise information. Some data may be erroneous and must be corrected or revised. Finally the information from different sources and experiences can be contradictory. It is the goal of different nonclassical logics, and particularly nonmonotonic logics, to handle these kinds of situations. Afterward, these graphs of interactions must be validated by biological experiments. Of course, these experiments are time consuming and expensive, but less so than some exhaustive experiments.

This chapter focuses on three main problems: conflicts that can occur in gene representation, completing the signaling pathway using new interactions, and handling the complexity of the algorithms. The solutions to these problems allow a great number of inferences be made with the aim of discovering gene mechanisms.

Our approach uses default logic, which allows us to deal with incomplete information. Abductive reasoning is used to complete the missing information from the signaling pathway. The last part is dedicated to a new language of representation, which seems to be the key to handling algorithm complexity.

2 DNA DSBs

DNA double- and single-strand breaks are not formed as a consequence of direct perturbation of DNA. Rather, they form due to attempts to repair DNA damage. It is a common event in eukaryotic cells, and there are two major pathways for repairing them: homologous recombination and nonhomologous DNA end joining.

The regulation of the cellular cycle is the process deciding the cell’s survival, containing the detection and repairing of genetic damage. The cell cycle is involved in uncontrollable cell division. Two classes of molecules, cyclins and kinase-dependent cyclins (CDKs), determine cellular dynamics. It was showed that CDK, CDK-activating kinase generate dynamical behaviors including limit cycles. By the term cellular dynamics, we mean the molecule concentrations in a specific signaling pathway according the time when different stimuli are applied. In a general way, the cell can receive information by protein interactions that will transduce signals. First, the information is discovered by sensor proteins, which will recruit some other mediator proteins whose function is to participate to all interactions between the sensors and the transducers. These transducers are proteins that will amplify the signal by biochemical methods such as phosphorylation. In the end, the signal will be sent to effectors that will engage important cell processes.

Cell responses to DNA DSBs of DNA have been studied for some years, but the ATM-dependent signaling pathway has only been clarified since the discovery of H2AX (Bassing and Alt, 2004), the phosphorylated form of histone H2AX. All the protein interactions of this pathway have been reported by Pommier et al. (2005, 2006). Figures 22.1 and 22.2 (Pommier et al., 2005, 2006), including the signaling of the DSB (involving important proteins such as H2AX, MDC1, BRCA1, and the MRN complex) but also for the checkpoint mechanisms (involving p53, the Cdc25s, and Chk2). In this pathway, the DSB is recognized by the Mre11-Rad50-Nbs1 (MRN) complex is involved in detection and signaling of DSBs. Moreover, it recruits ATM in its inactive dimer form. The protein kinase ataxia-telangiectasia mutated (ATM) is well known for its role as activator of the DNA damage response to DNA double-strand breaks (DSBs). ATM will phosphorylate itself and dissociate to become an active monomer. This active form of ATM will phosphorylate many mediators, such as γ − H2AX, MDC1, BRCA1, or 53BP1 (Bartkova et al., 1970). Then the signal is transduced by proteins such as Chk2, p53 (the most studied cancer protein), or Cdc25s. The effectors can vary according to context: p21 and Gadd45 will induce cycle arrest, whereas Box, Nas, Puma, and Fas will induce cell apoptosis.

f22-01-9780128025086 — Figure 22.1 Signaling pathway of DNA DSBs.

f22-02-9780128025086 — Figure 22.2 Legend TK

3 Logical model for system biology

3.1 Declarative representation of signaling pathway

Figure 22.3 presents a classical simplified pathway of interactions in a cell (Inoue et al., 2013). It shows the interactions between cancer and ultraviolets and the genes:

f22-03-9780128025086 — Figure 22.3 The simplified model of a DSB.

• The arrow → shows the direction of the activation. For example, UV → Cancer means that the ultraviolet causes cancer, but → p53 means that p53 is activated.

• The inhibition is represented by the symbol . For example, A Cancer means that A blocks cancer.

• The binding means that two genes bind together to create a compound, such as p53 and Mdm2. This compound activates B.

From a biological point of view, Mdm2 is a cellular antagonist of p53 that maintains a balanced cellular level of p53. The two proteins are linked by a negative regulatory feedback loop and physically bind to each other via a putative helix formed by residues 18–26 of p53 transactivation domain (TAD).

Intuitively, the causal graph represented in Figure 22.3 shows through different mechanisms (not shown) that an ultraviolet (UV) drives the cell to develop cancer. This is represented by an arrow: UV → cancer. On the other hand, the UV activates the protein P53: UV → P53. Protein B activates protein A: P53 → A, and protein A blocks cancer: A cancer. In addition, Mdm2 binds protein P53, the complex activates B, and B blocks A.

Obviously, this explication lacks coherence. If it is known that UV is an active input and mdm2 is not, then the cancer is triggered. But, if p53 is activated, then A is activated and cancer is inhibited or blocked. Therefore, the cancer is both activated and blocked, which is inconsistent.

This example is very simple, but it is helpful to explain our representation for gene networks and the algorithms used for biological knowledge discovery. To analyze real biological systems, which contain many pathways with thousands of genes, it is obvious that computational complexity has to be handled.

3.2 Logic representation

A biological function is not the result of only one macro-molecule. Indeed, a group of macromolecules including genes and proteins acts simultaneously and results in the activation of one or more signaling pathways. In this logical model, genes and proteins are considered i the same object, even though they are different biologically (Tran and Baral, 2007).

In this chapter, biological events are often described using propositional logic. However, to design a new plan of experiments, the study of interactions is based on queries. Queries are used to ask whether something is a logical consequence of a knowledge base or not. With propositional queries, a user can ask yes-or-no questions. Queries with variables allow the system to return the values of the variables that make the query a logical consequence of the knowledge base. Therefore, the representation of concentration changes. It is possible, for example, using predicates such as increased or decreased, and the query is whether the concentration of protein A increases or decreases. Some of these queries fall outside the scope of propositional logic, but the basic problems are the same: the protein concentration is rarely precise, and often in practice, the experiments encounter some difficulty in defining thresholds.

The dynamic aspect of protein interactions is based on first-order logic and propositional logic. For example, stimulation(UV) means that the DNA is subjected to ultraviolet, and GlassScreen → ¬stimulation(UV) means a glass screen protects against ultraviolet light.

The advantage of such a logical framework is the possibility of representing almost everything in a very natural way that is close to natural language. The tradeoff of this first-order language representation is the combinatorial complexity algorithms. Therefore, it is essential to reduce the expressive power of language.

3.3 Causality and classical inference

Interactions between genes can be seen as a very simple form of causality (Kayser and Levy, 2004). To express basic interactions in logic, it is common to use two binary relations, trigger(A, B) and block(A, B), as well as the ternary relation bind(A, B, C). The first relation means that the protein A triggers or stimulates or activates the production of protein B. The second relation is the inhibition. These two relations are graphically represented by A → B and A B. The third relation means that A and B bind to produce C (in Figure 22.3, for instance, p53 and Mdm2 bind to produce B).

These relations are a basic form of causality. The main purpose of causality is to distinguish between what is caused and what is true. That means that not all inferences obtained are true in the case of biological systems.

In this chapter, we describe and use the simplest logical form of causality, and show that cell modeling can be validated by either biological knowledge or biological experiments. This approach could be improved or replaced by Bayesian approaches or much better probabilistic logics (Sato and Kameya, 2003), and another option is modal logics.

The inferences of classical logic A → B or A B are fully described formally, with all the so-called good logic properties (tautology, noncontradiction, transitivity, contraposition, etc.). But the causality cannot be seen as a classical logic relation. A basic example is “If it rains, the grass is wet.” In classical logic, the formula Rain → LawnWet means that if it rains, the grass is always wet. But this formula is too strong. Indeed, there may be exceptions to this rule (say, if the lawn is under a shed).

In the first approach, the biological events can be expressed naturally as follows:

1. If A trigger B and A is true, then B is true.

2. If A blocks B and A is true, then B is false.

Depending on the context, true can mean what is known, certain, believed, or proved. The first idea is to express these laws in classical logic by two formulas:

$\begin{array}{l} f_{1} = trigger (A, B) \land A \to B \\ f_{2} = block (A, B) \land A \to \neg B \end{array}$ $\begin{array}{l} f_{1} = trigger (A, B) \land A \to B \\ f_{2} = block (A, B) \land A \to \neg B \end{array}$

si1_e

But this formulation is problematic when there is conflict. Unfortunately, this is often ignored. For example, if we add the three pieces of information: A trigger B, A blocks B and A is true, there is a conflict (inconsistency in logic). These three information are represented by a set of three propositional formulas:

$F = \{A, trigger (A, B), blocks (A, B)\},$ $F = \{A, trigger (A, B), blocks (A, B)\},$

From F, implies B and ¬B. This is inconsistent. To solve such incongruencies, we can try to use methods inspired by constraint programming, such as the use of negation by failure of the Prolog programming language. It is also possible to use nonmonotonic logic.

The first method, negation by failure, poses many theoretical and technical problems beyond the simplest examples. These problems are often solved by adding properties to the formal system in keeping with the domino principle. We find here all the classic problems that arise when one wants to try to formalize and use of negation by failure in programming languages such as Prolog or Solar (Nabeshima et al., 2010).

Therefore, we will use here a classical nonmonotonic formalism: the default logic of Reiter (Reiter 1980).

3.4 Causality and nonmonotonic logics

To solve the conflict problem action/reaction simultaneously, the intuitive idea is to weaken the previous formulation as follows:

1. If A triggers B, if A is true, and it is possible that B, then B is true.

2. If A blocks B, if A is true, and it is possible that B is false, then B is false.

The problem is to formally describe what is meant by “possible”. This question began to arise in the study of artificial intelligence 30 years ago, and led to investigations of new types of reasoning. In this type of reasoning, one has to deal with incomplete, uncertain information that is subject to revision and sometimes is false. On the other hand, we must often choose between several possible contradictor conclusions.

One basic example is:

Penguins are birds.

Birds fly.

Penguins do not fly.

In classical logic, we have a contradiction if we know that Tweety flies, and the system is inconsistent. This inconsistency can be accounted for if we can handle the exception by replacing “Birds fly” with “Typically, birds fly”. The nonmonotonic logics formally describe the modes of reasoning that take into account these phenomena.

Classical logic, as first-order logic or modals logic, are monotonic: if it adds information or a formula F to a formula E, everything which was deduced from E will be deducted from E ∪ F. In other words, for monotonic logic, anything that is deduced from information will always be true if we add new information. This property of monotonicity generates inconsistent or revisable information. Indeed, in real life, it will be common to invalidate previously established conclusions when new information is added or changed. Nonmonotonic logic allows us to eliminate the monotony property of the classical logic: conclusions can be revised with the addition of new knowledge.

3.5 Default logic

In this section, we use default logic (Reiter, 1980), which is the nonmonotonic logic most frequently used. This logic can be seen as an improvement and a generalization of the negation by failure in Prolog. It is also a generalization of answer set programming (ASP) formalisms, which appear years later.

Default logic is defined by = (D, W). The set W is a set of facts that are always true information represented by formulas of propositional logic or first-order logic. The set D is a set of defaults that are inference rules with specific content, which expresses the uncertainty. A default is an expression of the form

$d = \frac{A (X) : B (X)}{C (X)},$ $d = \frac{A (X) : B (X)}{C (X)},$

si3_e

where A(X), B(X), and C(X) are formulas and X is a set of variables. A(X) is the prerequisite, B(X) is the justification, and C(X) is the consequent. The set X is the set of free variables in d.

Intuitively, the default d means that if A(X) is true, and if it is possible that B(X) is true, then C(X) is true. If B(X) = C(X), the default is normal. Normal default means that normally, A implies B. In this chapter, a default is seminormal if its justification entails its conclusion. The default logic used transforms the previous rules, which are expressed as follows:

1. If A trigger B, if A is true and if B is not contradictory, then B is true.

2. If A blocks B, if A is true and if ¬B is not contradictory, then ¬B is true.

And these rules can be represented by normal defaults, without variables:

$d 1 = \frac{trigger (A, B) \land A : \neg B}{B}$ $d 1 = \frac{trigger (A, B) \land A : \neg B}{B}$

$d 2 = \frac{blocks (A, B) \land A : \neg B}{\neg B}$ $d 2 = \frac{blocks (A, B) \land A : \neg B}{\neg B}$

Therefore, a signaling pathway is represented here using a default theory = {W, D}, where W is a set of classical logic formula and D is the set of defaults.

3.6 Extensions and choice of extensions

The goal of default logic is to find extensions of a default theory = {W, D}. An extension E is a consistent set of formulas obtained by adding to W, under specific conditions, a maximal set of consequents of D. An extension can for example, represent a subgraph of the gene pathways constituted in network without conflict. The definition of extension is based on the utilization of W and a subset of defaults D.

Formally, E is an extension of if and only if

$E = ⋃_{i = 0}^{\infty} E_{i},$ $E = ⋃_{i = 0}^{\infty} E_{i},$

With

$1) E_{0} = W$ $1) E_{0} = W$

$2) E_{i + 1} = T h (E_{i}) \cup C / \{d = \frac{A : B}{C} D, A \in E_{i}, \neg B \notin E\} .$ $2) E_{i + 1} = T h (E_{i}) \cup C / \{d = \frac{A : B}{C} D, A \in E_{i}, \neg B \notin E\} .$

si8_e

Here, Th(E_i) is the set of formulas derived from E_i in classical first-order logic.

The previous definition is difficult to use in practice. Indeed, in the latter condition, ¬B ∉ E supposes that E is known, while E is not yet calculated. In this case, an extension is a fixed point. But if all defaults are normal, the last condition can be replaced by ¬B ∉ Th(E_i). The verification of the condition is possible and extension to build. In practical terms, an extension is built starting with W and adds a maximal consistent set of consequents of D. The condition to use a normal default $d = \frac{A : B}{B}$ $d = \frac{A : B}{B}$ starts by checking if, in the current state, the prerequisite A is verified as B, and B does not lead to contradiction. In a simple manner, B does not lead to contradiction, meaning ¬B (the negation of B) is not verified. If this request is True, we add the consequent B to W, and the algorithm is restarted until all defaults have been used or rejected.

For example, if we consider = {W, D} with

$W = \{A, trigger (A, B), block (A, B)\}$ $W = \{A, trigger (A, B), block (A, B)\}$

and

$D = \{d 1, d 2\} given above,$ $D = \{d 1, d 2\} given above,$

it is impossible to use d1 and d2 in the same extension. Indeed, if d1 is used, B is added to the extension, and it is impossible to use d2 because ¬ ¬B = B is verified. Similarly, if d2 is used, it is impossible to use d1. However, it is possible to use a single default. Therefore, we obtain two extensions: E1, which contains B, and E2, which contains ¬B:

$\begin{array}{l} E 1 = \{A, trigger (A, B), B\} if d 1 is used . \\ E 2 = \{A, block (A, B), \neg B\} if d 2 is used . \end{array}$ $\begin{array}{l} E 1 = \{A, trigger (A, B), B\} if d 1 is used . \\ E 2 = \{A, block (A, B), \neg B\} if d 2 is used . \end{array}$

si12_e

By using default logic, the conflict is resolved, and we obtain two contradictory solutions.

But it is not possible to rank the extensions: Is B true or false? In fact, this will depend on the context. For biologists, sometimes the negative interactions are preferred to the positive ones (or vice versa). Another possibility is to use probabilistic or statistical methods to weigh each extension based on the evaluation of the knowledge. From an algorithmic viewpoint, the ranking of extensions also can be evaluated through the calculation of these extensions; and even offline ranking is preferred.

4 Completing the signaling pathways by default abduction

In Figure 22.3, earlier in this chapter, we showed a simple example that sums up the question “How could we block cancer by preventing B?” One analytical solution is to activate protein C, which blocks B, and to consider that protein X is for binding with p53 or mdm2. This simple reasoning reflects the biologist’s approach (Inoue et al., 2013). Therefore, in this case, it is possible to set up some experiments that prove this hypothesis. It is well known that experiments are time consuming and expensive. It is obvious that in the case of Big Data, it is necessary to automate the process, taking into account all available information. In so doing, we can consider how it is possible to find, in silico, a molecule (i.e., a future drug) that might act effectively by blocking some pathways. In terms of logic, drug discovery is a problem of abduction.

Classical logic uses the deduction F ⊢ R, which says that result R is inferred from an information (formula) F. Abduction generalizes deduction. The information F is incomplete, and abduction amounts to adding to F a set of hypotheses H such that F ∧ H ⊢ R and F ∧ H is consistent.

Abduction is an important topic in artificial intelligence. The difficulty lies with implementation of the algorithms. Abduction algorithms are far too high in terms of computational complexity. Even limited to propositional calculus, the theoretical complexity revolves around $\sum_{2}^{p}$ $\sum_{2}^{p}$ which is totally unacceptable beyond very small examples. Many theoretical studies have been done on the complexity of abduction and research the sublanguage of propositional calculus where complexity is reduced. These sublanguages most often cover the Horn clauses and renaming. But even here, the complexity is too great and can be considered NP-complete (all known algorithms are exponential in the worst case). Conversely, existing polynomial classes provide only a low power of expression on issues to be addressed. On the other hand, for many real applications, experience shows that it is not necessary to use the full expressive power of logic.

In the case of signaling pathways, abduction is used mainly to search missing interactions. The predicate trigger(α, X) is used inside the default without prerequisite:

$d = \frac{: trigger (α, X)}{trigger (α, X)} .$ $d = \frac{: trigger (α, X)}{trigger (α, X)} .$

si14_e

Here, α is a variable, which could be instantiated by any protein. This can be generalized to all predicates.

For the example given by Figure 22.3, by calculating the extensions that contains the predicate block(cancer), we obtained eight extensions, but only two contains p53 and mdm2. These two extensions are represented by Figure 22.4:

f22-04a-9780128025086 — Figure 22.4 (a) The explanation using Default Logic (extension 1): the protein X joints mdm2. (b) The explanation using Default Logic (extension 2): the protein X joints p53.

f22-04b-9780128025086 — Figure 22.4 (a) The explanation using Default Logic (extension 1): the protein X joints mdm2. (b) The explanation using Default Logic (extension 2): the protein X joints p53.

• In Figure 22.4, all the original rules (black arrows) are given as defaults rules.

• The red arrows show links automatically found by abduction.

• The right portions of the figures, list the default rules used for the extensions that block cancer. Here, the sign - is the negation ¬.

• As UV and Mdm2 are activated, is indeed obtained that cancer is blocked.

• Note that for the 2 cases, it is impossible to use a additional default rule: we get an inconsistent set of formulas.

5 Logic representation of a signaling pathway with the goal of reducing computational complexity

Today, no programming language exists that allows the abduction reasoning under the incomplete and uncertain information. We present in this section the outlines of a language dedicated to the discovery of biological interactions responding to these requests. This formalism uses the default logic and also has a dynamic approach that consists of considering the time as a succession of events. The syntax is inspired from Prolog, and it is described next.

5.1 Clauses and Horn clauses

The logic plays with atoms building relation R(A1, .., An). If the atoms are signed, they are called literals: positives R(A1, .An) or negatives ¬R(A1, ..An).

In our representation, product(P53) is an atom. It is also a positive literal, meaning that the protein p53 increases in concentration. The ¬product(P53) is a negative literal, which means that it is not possible to determine if the p53 concentration increasing. In this model, activation is related to the increase in concentration. The dynamic of the system, for example, can be specified by concentration (p53,100,T), which means the concentration of p53 at the time T equals 100 a.u., and ¬concentration(P53,sup(200),T + 3) says that at the time T + 3, the concentration of p53 is not greater than 200 a.u.

Clauses are the simplest type of formula. Formally, a clause is a disjunction of literals $1 \lor .. \lor l n$ $1 \lor .. \lor l n$ . If the connectors are forgetters, a clause is a set or a list of literals. For example {a, ¬b, ¬c, d} or a ¬b ¬c d represents $a \lor \neg b \lor \neg c \lor d$ $a \lor \neg b \lor \neg c \lor d$ . A Horn clause is a clause with a maximum of one positive literal. The clauses a and ¬b ∨ ¬c ∨ d and ¬b ∨ ¬c are Horn clauses, and a ∨ b is not. For the rest, we use Horn clauses, which are interesting for two reasons.

First, using Horn clauses is a natural way to represent knowledge. In fact, the formula a ∧ b ∧ c ∧ d → d is equivalent to the Horn clause ¬a ∨ ¬b ∨ ¬c ∨ d. Similarly, the formula ¬(a ∧ b) (a and be cannot be true simultaneously) is equivalent to the negative Horn clause ¬a ∨ ¬b.

The second advantage of Horn clauses, which is fundamental in this discussion, is that their use drastically reduces the computational complexity. Indeed, any logical formula can be rewritten as a set of clauses, so complexity problems may arise in terms of clauses. In propositional calculus, the basic problem is whether a set of clauses is consistent. This is the Satisfiability problem, which is NP-complete. Otherwise, all known algorithms are exponential in the worst case. On the other hand, if all clauses are Horn causes, algorithms can be linear proportional to the size of the data. For gene pathways, the use of Horn clauses provides practically usable algorithms.

Clearly, Horn clauses cannot represent all formulas. In particular, a ∨ b is not a Horn clause. But this type of positive disjunctive information is quite rare in practice. To be more explicit, in the case of DNA DSBs, all biological reactions can be represented using Horn clauses. Even in the case of non-Horn clauses, it is possible to solve the problem by using efficient techniques, which come from constraint programming and from practical algorithms for solving NP-complete problems. These techniques may be, for example, using cardinalities, symmetries (Benhamou and Siegel, 2008) (Benhamou et al., 2010), and strong backdoors (Ostrowski et al., 2010).

In fact, the use of Horn clauses and default logic has been studied for an application with the DCNS group (DCNS is the main French group that designs, builds, and supports surface combatants, submarines, systems, and equipment). The problem was to simulate the decisions of a commander aboard a submarine in wartime. This is a problem of incomplete information because a submarine is “blind” and almost “deaf,” but it must make quick decisions, so to speak. Using Horn clauses, default logic, mutual exclusion, and simple management of the temporal aspect, it was possible to simulate several hours of fighting in seconds (Toulgoat et al., 2010a, 2010b; Toulgoat, 2011).

5.2 Language syntax

A signaling pathway is described by a set of rules. A rule can be either a hard rule, a default. Pratically a rule is a triplet (<type>, <corps>, <weight>) for which:

• < type > can take two values: hard and def. If the value is hard, the rule is a hard rule and represents a Horn clause, which is sure and nonrevisable. If the value is def, the rule represents a normal default.

• < weight > weighs the rule, which makes it possible to choose between the different extensions proposed by the algorithm.

• < corps > is a couple (L, R). The left element L is a set of literals (l1, ..ln) eventually empty. This set is identified to l1 ∧ .. ∧ ln. The right element R is either a single literal or empty. If the rule is hard, the couple (L, R) represents the formula L → R. If the rule is a default, the couple represents a normal default $d = \frac{L : R}{R}$ $d = \frac{L : R}{R}$ Increased attention is made given these two cases.

5.3 Hard rules and default rules

5.3.1 Hard rules

A hard rule (L, R) represents the formula L → R, where L is a conjunction of literals and R a literal. As we decided to restrict our algorithm to Horn clauses, all literals of L are positive. The literal R can be positive or negative. Here, we describe two special cases:

1. L is empty. Therefore, the rule represents a positive or negative unary clause. Unary clauses are elementary sources of information. They do not contain variables, they are ground clauses. This allows the decidability of the algorithm. However the other clauses can contain variables, making possible to leave the pure propositional calculus.

2. R is empty. For this empty consequence, the rule L → ∅ is equivalent to ¬L, which in turn is equivalent to a negative clause. For example, we can use such a clause to represent a mutual exclusion: “It is impossible to trigger and to block a protein at the same time.”

5.3.2 Default rules

If (L, R) is a default rule, then it represents a normal default, the prerequisite is L, and R is the justification and the consequent. If the prerequisite is empty, the default is without justification. According to the definition of the defaults, it is impossible to have an empty consequence. Unlike hard rules, the prerequisite R can contain negative literals.

5.4 Cell signaling pathway representation

We have based our work on the bibliographic data of the response to DSBs represented on a signaling pathway (see Figure 22.2 earlier in this chapter), given by Pommier et al. (2005). The drawback of this representation is that it is very difficult to add a new interaction or protein without full reassessment. In particular, the management of conflicts (such as simultaneous trigger and block interactions) is very difficult. We worked on the translation of this interaction map in our logic language. Initial results have translated this signaling pathway, and some algorithms have been tested (Doncescu and Siegel, 2012; Doncescu et al., 2012a, 2013, 2014a, 2014b).

Today, the signaling pathway is translated by 206 rules in a very natural way, without having to tweak the predicates or the rules. The rules are expressed in the syntax given previously in this chapter. These rules can be hard rules or defaults. The syntax used is very simple, and it is possible to change the nature of these rules to test different configurations. We can calculate the extensions in a very short time. As it turns out, the use of non-Horn clauses was not necessary. This reinforces our opinion that it is possible to use a nonmonotonic logic and abduction and time on real applications.

In the case of signaling pathways, we consider a predicate to be a biological reaction or the production and activation of proteins. For example:

• product(P), binding(P, Q, R), block-binding(P, Q), stimulation(P),

• phosphorylation(P), dissociation, transcription-activating,

• increase(P), decrease(P)…

• concentration(P, > 1000), where 1000 is a threshold if it is possible to measure it.

The logical model of DNA DSB contains two types of rules: facts (hard) and defaults (def). For example:

• hard: stimulate(dsb, dna) that is an elementary fact (a ground unary clause) says that DSB stimulates ADN.

• def: stimulate(dsb, dna) → product(altered-dna). This default rule means, “(Generally when dsb (double-strand breaks) stimulates DNA, an altered DNA is produced). For example, during replication, a thymine nucleotide might be inserted in place of a guanine nucleotide. With base substitution mutations, only a single nucleotide within a gene sequence is changed, so only one codon is affected.”

• hard: product(p-atm-atm-bound) → ¬product(atm-atm) that is a negative clause. def: product(p-s15-p53-mdm2) ∧ product(p-chk1). → phosphorylation(p-chk1, →p-s15-p53-mdm2).

Using a simple logic formalism can express much of what biologists need to represent.

6 Algorithm and implementation

The algorithm is written with SWI-Prolog, which is an implementation of Prolog. A rule:

(< type >, < corps >, < weight >)

is represented by a unary Prolog clause:

rule(< type >, < corps >, < weight >).

Thus, the rules and the algorithm are in the same Prolog program, which is very practical. Another advantage to using Prolog is that the unification, the backtracking, and the list management are well optimized. Of course, Prolog is interpreted, so it is slower than compiled languages (but not significantly). On the other hand, Prolog programs are short and simple, which saves a lot of time to testing programs and heuristics.

Our algorithm calculates the extensions. As the clauses are Horn clauses, and as the defaults are normal, the research tree is optimized. It is easy to calculate extensions without duplicating (i.e., we do not calculate the same extension several times). For algorithms, we can also use a weak form of negation as failure (Benhamou and Siegel, 2012).

For initial tests, given by the map of the entire signaling pathway of Pommier et al. (2005), we calculate all extensions in a short time. For example, with most of the rules, there are two extensions by default. The calculation time takes 500,000 logicals inference per second (LIPS) and 0.4 seconds of central processing unit (CPU) time on MacBook. The temporal aspect of gene networks has been tested for small examples, but the scaling has not yet been done. Our experiments on submarines application (Toulgoat et al., 2010a, 2010b; Toulgoat, 2011) suggest that it should work. For the abduction, it is similar. The algorithm has been tested on small examples, and scaling up, with the greatest exemples, remains to be done, but again, that should be possible. There are not theoretical problems.

7 Results

It has been well established that loss of function of the p53 tumor suppressor protein is a major step in the development of cancer. This can be obtained by direct mutation of the p53 gene, but also by alterations in p53 regulators, downstream targets, or both. Moreover, it has been established that the Mdm2 protein is an essential regulator of p53 during embryonic development. The p53 protein family, which consists of p53, p63, and p73, plays a pivotal role in the regulation of multiple cellular mechanisms, including apoptosis, cell cycle control, and DNA damage repair. The role of p53 in human studies is relatively well studied, but less is known about p73 and p63. Using our e-cancer model, we found several hypotheses about the activity of p73. Introducing time in defaults (the prerequisite considered at time t and the conclusion at time t + 1), we obtained a simplified map of Pommier et al. (2005). The DNA DSB generated automatically showed that the p73 protein is not directly implicated to trigger apoptosis, which is one of the key points of our knowledge discovery (Figure 22.5a).

f22-05a-9780128025086 — Figure 22.5 (a) DNA DSB map generated automatically.(b) DNA DSB map generated by Pommier (Pommier et al., 2006).

f22-05b-9780128025086 — Figure 22.5 (a) DNA DSB map generated automatically.(b) DNA DSB map generated by Pommier (Pommier et al., 2006).

Therefore, our main result is a new DNA DSB map containing only the main reactions. This result, obtained automatically using default logic (Figure 22.5a), has been compared with the simplified biological model of Pommier (Figure 22.5b). This model was obtained manually using a reasoning based on biological deductions. The difference is a few proteins.

At first, we used Hypothesis-Logic (Siegel 1990; Siegel and Schwind, 1991, 1993; Schwind and Siegel, 1994), which is a modal generalization of default logic but the computational complexity was too high. We also used and tested the concepts of production fields (Siegel, 1987) of characteristic clauses (Bossu and Siegel, 1985) and a consequence-finding algorithm (Siegel, 1987; Boi et al., 1992), based on production fields. Again, the computational complexity is not of interest here.

Our algorithm has been compared with SOLAR (Nabeshima et al., 2010). SOLAR is based on production fields, and the consequence-finding algorithm is used to produce new clauses (Doncescu et al., 2002, 2007a, 2007b). The great complexity of these algorithms does not allow real examples to be set up (Inoue, 2004).

We compared this with the ASP formalism. Again, the impression is mixed. Indeed, ASP deals mainly with normal defaults, without prerequisites. Obtaining all the power representations of defaults with prerequisites is possible by rewriting techniques. But the drawback of this approach is a loss of clarity and efficiency. For ASP, a new semantic (Benhamou and Siegel, 2012), seems to work reasonably well in practice.

8 Conclusions

Most researchers attempt to complete the signaling pathway. In our approach, the map is simplified as the main reactions, which is very useful for biological experiments. In this chapter, an algorithm based on default logic is proposed to check the consistency of DNA DSB maps. First, we define the representation, which is concise and adequate. Our representation keeps the flow of information represented by gene expression and receptor and protein structures from DNA DSBs to apoptosis and all main proteins involved in the cell cycle. Our result concerning p73 opens new avenues of investigation. Furthermore, it was demonstrated that p73 and p63 family interact with each other, and these interactions play a critical role in human tumorigenesis. Our experimental data also suggest that these proteins are strongly involved in a response to anticancer curative treatment.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 22: DNA Double-Strand Break–Based Nonmonotonic Logic

Create new playlist

Sign In

Sign Up