CADBIOM reference manual

Michel LE BORGNE
Geoffroy ANDRIEUX
Nathalie THÉRET

Contents

1 Introduction
2 Installation
 2.1 Availability
 2.2 Installation
 2.3 Running CADBIOM
3 Background
 3.1 Basic concepts: informal introduction
  3.1.1 State variables
  3.1.2 Chronometrical time versus logical time
  3.1.3 Events and time in Cadbiom.
4 CADBIOM concepts
 4.1 Model components
 4.2 Model frontier:
 4.3 Special places
 4.4 Discrete event simulation
5 CADBIOM graphical interface
 5.1 File menu
 5.2 Layouts menu
 5.3 Help menu
 5.4 Model design and model handling
 5.5 Graphical view
 5.6 Node list
 5.7 Model global view
 5.8 Informations
 5.9 CADBIOM simulator
  5.9.1 Simulator window
  5.9.2 Initialization window
  5.9.3 Chronogram
 5.10 CADBIOM checker
  5.10.1 Purpose of the checker
  5.10.2 Checker window
 5.11 CADBIOM static analysis
  5.11.1 Isolated SCC
  5.11.2 Basal activated genes
  5.11.3 Dependancy graph
  5.11.4 Statistics
6 CADBIOM textual language
7 Pathway Interaction Database interpretation
 7.1 Elementary translations
 7.2 Composition
8 CADBIOM API
 8.1 Building the model
  8.1.1 From a chart model
  8.1.2 Loading from files
 8.2 Modifying a model
 8.3 Analysis
  8.3.1 Static analysis
  8.3.2 Dynamic analysis- Basic methods
  8.3.3 Dynamic analysis - Higher level methods for signaling models
  8.3.4 Extracting results
 8.4 Simulation

Chapter 1
Introduction

This manual describes the usage of CADBIOM, a software package implementing the interpretation of biological events into guarded transitions. Guarded transitions are a kind of rule based modeling which proved to be useful in the study of cell signaling combinatorics. CADBIOM software is able to analyze large models which can be built either manually with the help of a graphical interface or a specification language, or can be obtained from compilation of different frameworks for representing biological knowledge. For example, CADBIOM is able to compile the entire Pathway Interaction Database into a model which integrate 137 different pathways. Developments are underway to provide interpretations, as guarded transition based models, of other biological knowledge representations.

Models based on guarded transitions are oriented to signal propagation dynamics. This point of view contrasts with many modeling approaches of cell signaling which focus on equilibrium changes in the concentration of species involved in cell signaling. In the actual implementation, the objectives of the models are to give conditions on cell configuration and cell environment, that imply some physiological effect through signaling.

The CADBIOM package include a Python package and a graphical interface. The graphical interface is the easiest way to use CADBIOM. The interface provides a graphical editor to build models, different means to explore it and a simple query interface to analyze the dynamics of the model. For more involved analysis of models, it is advisable to use the CADBIOM package in Python scripts. A hight level description of the CADBIOM API is provided in this manual. For a more technical manual, have a look to the technical documentation which is provided on the site. This documentation was automatiquely generated from the Python source by the epidoc([]) documentation tool.

Chapter 2
Installation

2.1 Availability

CADBIOM is freely available on cadbiom.genouest.org, distributed under the terms of the GNU General Public License. The website also provides a broad range of informations, model library and documentation.

2.2 Installation

CADBIOM is mainly written in Python 2.7 language under a Linux operating system. However it uses a Python binding to the CryptoMinisat SAT solver which must be recompiled for porting to other architectures. The SAT solver and its Python bindings are contained in the two files pyCryptoMS.py and _pyCryptoMS.so. For convenience, we distribute 32-bits and 64-bit versions for Intel architectures.

C++ and Swig sources for building the solver and it’s Python binding are available and should be easy to compile on other architectures, provide a cmake program is available together with some fairly standard libraries. Details can be found in the source distribution of pyCryptoMS.py and _pyCryptoMS.so on CADBIOM site.

The pyCryptoMS package can be installed in the current Python 2.7 distribution if you can get the root privileges. Otherwise, it can be put in a user dir and the PYTHONPATH environment variable must be set accordingly.

CADBIOM was developed on a Linux Debian distribution but has also been tested on a Ubuntu distribution running in a VirtualBox supported by a Windows system. Before running CADBIOM, the following Python packages have to be installed : python-pygraphviz, python-lxml, python-networkx and python-gtksourceview2. CADBIOM uses more Python packages which are parts of the Debian or Ubuntu distributions.

On other distributions you may have to find the needed packages under different denominations. In any case, don’t hesitate to launch CADBIOM. If somethings wrong, the Python interpreter will complain sooner or later for missing libraries.

We do not plan to port CADBIOM on Windows systems. It should not be a difficult task since there is a Windows version of the GTK graphical library. For a light to medium use of CADBIOM, the virtual machine solution is very comfortable. If you need to perform computationally intensive searches, you will probably move to a cluster running, most of the time, under a Linux operating system.

2.3 Running CADBIOM

Launch CADBIOM with the command: python cadbiom.py in a terminal window in CADBIOM main directory.

Chapter 3
Background

Deciphering biological signaling networks represents a huge challenge. Advances in this domain may help in the understanding of many illnesses including cancers. The complexity of cell signalling comes from its dependence on many protein components including the protein involved in the standard signaling pathways but also on the type of the cell of interest and the state of the extracellular matrix. Information processing by the cell is a very complex mechanism which depends on numerous external and internal parameters. It is then difficult to formulate a question concerning signaling process since the context of the question (the external and internal parameters) may be partially unknown. For example, the effect of cytokine TGFβ on the cell cycle is dramatically modified during an epithelial-mesenchymal transition. In the epithelial stage TGFβ represses proliferation while in the mesenchymal state it induces proliferation. The great quantity of components involved in cell signaling defy reasoning. We need to glue the numerous protein interactions in a formal model and perform complex computations to try to answer a question such that: what are the conditions under which the TGFβ induces cell proliferation instead of cell homeostasy?

The first approaches to build models of signaling pathways were based on biochemical descriptions of the pathways. Differential equations has been used to model the variations of concentrations of proteins involved in the pathway. For example a fairly complex differential model of EGF signalling pathway was made and simulated to give quantitative results in accordance with experimental measures ([]). Models exist also for TGFβ, although with less details([]). However, these differential models describe only one signaling pathway at a time and even partially. It is very difficult to couple two models and consider cross-talks between pathways. The obstacles to this coupling are of different nature. Among the most important we find differences in the degree of details and concentration measurement which can be relative or not or even in arbitrary units. Even combining models describing various aspects of a unique pathway is not an easy task. Added to the well known fact that building a new differential model is a huge task, these kind of models cannot be used at present for understanding complex signaling events. Moreover in the case when differential models might exist, it is not clear that simulating concentration variations might give a deep insight on information processing in cells.

Although the primary literature on cell signaling scarcely contains quantitative information on biochemical reactions, it still contains lot of informations of qualitative nature. This kind of information is based on biological concepts. Many efforts have been made in the past years through specification of controlled vocabulary and ontologies to precisely define and classify concepts. These result in the availability of some kind of grids to extract the information spread in the literature. Several databases were build under a rigorous curation scheme based on controlled vocabulary and ontologies. However, the content of each database reflect the interest of its creators. For example, several databases are dedicated to protein interactions (HDDR, MINT, ..). Protein interaction is indeed a fundamental concept in signaling. However, this concept is too general and too static to analyze the dynamic of a signaling network. What we really want is the effect of the interaction, its role in a transformation or a reaction which is a component of a biological process. The BIOCARTA and REACTOME databases have adopted this point of view and have been included in the PID database which focuses on biological events. According to PID documentation, a biological event (transcription, binding, translocation, reaction) has molecular input, output, positive and negative regulators which give a role to molecular interactions. Such dynamical description of biological event is amenable to dynamical model building.

Efforts for representing biological knowledge in a model building perspective are underway. Let us notice the interesting initiative of the System Biology Graphical Notation group ([]). This group introduced the view concept: in short, it is necessary to adopt different views to have a comprehensible description of biological events. In the Kappa approach, biochemical knowledge is described by mean of rewriting rules. These descriptions are subject to different interpretations which have some similarities with different view, but at modeling steps. CADBIOM propose its own view of biological events. This view focus on information processing and not on concentration variations. Of cause, this point of view captures only one aspect of cell live but we thing that this aspect is truly important.

So, CADBIOMhas been designed to give some help in modeling complex cell signalling networks. The standard computer science concept of guarded transition formalism is closed enough to reaction description in the literature and in some databases. This allows for automatic model building from data extracted from the whole PID database.

3.1 Basic concepts: informal introduction

3.1.1 State variables

Numerous papers in the molecular biology literature describe the behavior of some biomolecules. Biomolecules are mostly (but not uniquely) proteins with post translational modifications and cellular or extracellular localizations. Complexes play also an important role and can be considered as biomolecules. Biomolecules are subject to transformations such that phosphorilations, translocations, complexification and decomplexifications. Variations in concentration, discretized on a finite number of levels, may also be considered as transformations although it is not the point of view of CADBIOM.

In many biological reactions, some biomolecules are subject to modifications (chemical modifications, bindings, translocations ...) while others play a role without being transformed. The concepts of activator and inhibitor of a biological reaction is very common at this abstraction level. Of cause, the status of a biomolecule depends on the abstraction level: an enzyme may be considered as an actor of a biochemical reaction without being transformed while, at a finer grain description, it is transiently bounded to reactants.

In Cadbiom language, biomolecules can be represented as state variables represented by places in the graphical description. A place can represent either a biomolecule or a concentration level of a biomolecule. More formally, it is a boolean variable. If the value is true we will say that the place is activated. It is unactivated for the value false. The meanings of “activated” and “unactivated” are linked to the modeling context. Roughly, “activated” means present and ready to play its role in the model.

A biological reaction is naturally represented by guarded transitions. The transition stands for the transformation of a biomolecule into another one or an evolution in the concentration. The guard is useful to model the activation and inhibitions which doesn’t involve any transformation of the actors.

Specifying a discrete model of a biological system in a guarded transitions language is very comfortable since the formal language is closed to the way molecular events are described in biology. The language ensure a natural compositionality by connecting transitions that share state variables and through condition guards. However, as in many distributed system, time is an important concern. As far as transitions are connected, the sequenciality of reactions is ensured. Difficulty arise when reaction have several inputs and several reactions are responsible for a molecule transformation. In the later case it is in general difficult to have information on which reaction occurs since the biological knowledge comes from different experiments and cannot be linked by a common time reference. The evolution of the whole model will depend on the synchronizations between the different parts, loosely connected by conditions only.

Before describing the way Cadbiom address this problem, we introduce the concept of logical time.

3.1.2 Chronometrical time versus logical time

Time is an important concept in every day life. Time is in the digital clock that wakes you up every morning but it is also the rhythm of the lunar phases or the sequence of cell cycle phases. Time is a highly complex notion that conveys several ideas, such as periods with a cyclic structure, instants with a temporal precedence order, or durations. [?]. Physical time observation is associated with an event such that the full moon with several (presumably an infinite number) occurrences. A clock is a counter of occurrences of an event. Events are the basis of time measure.

Time duration is usually measured by counting the number of occurrences (ticks) of an event called a chronometer, between two occurrences of other events such that the start and the end of a race. A chronometer may be realized from astronomical observation, small oscillations of a pendulum or the vibrations of a quartz or a cesium atom. So a chronometer is a particular event. Time measurement imply that it is possible to compare event occurrences with occurrences of a reference event.

When it is impossible to measure durations as in cell chemical reactions or when it is impossible to compare time measures as in computer science, we use logical time. Logical time is particularly adequate to describe discrete event systems. Logical time focuses on the ordering of occurrences of different events and their interlacing. It ignores the physical duration between the occurrences of events.

When we deal with logical time, no event plays a special role as it is the case for a chronometer when dealing with physical time. Moreover, no regularity hypothesis is made on the occurrences of events as it is the case for the ticks of a chronometer. This time framework is well adapted to discrete modeling of biological events since fact are gathered without any common time reference.

CADBIOM events The CADBIOM events are special signal. A (discrete time) signal is realized by a sequence of values taken in its domain. Of cause, a signal with name X has, in general several realizations. An event is a signal h with only one value . h is the name of the event. A realization of an event is a sequence of . An isolated event has no interest since in logical time, we consider partial orders between event occurrences.

Logical time appears when we consider several events1. Given two events h1 and h2, in a realization of both events, h2 may be absent at an occurrence of h1. In order to represent this situation, we introduce the symbol which means absent. For instance, a realization of a pair (h1,h2) of events is:

h :  ⊤  ⊤   ⊥  ⊥   ⊤  ⊤   ⊤  ...
 1
h2 : ⊥  ⊤   ⊤  ⊤   ⊥  ⊥   ⊤  ...
So a realization of a finite set of events (hi)i∈I is a sequence of multiples in {⊤,⊥}|I|-{()i∈I}. The multiple with all components equal to is excluded since there is no time if there is no event. If nothing happens, there is no observation. The non-occurrence of an event is always defined relatively to another event. The indexes of the sequence are called steps. If for a step, the component of the multiple representing an event h is , we will say that the event is present at this step. Otherwise, the event h is said to be absent at the step.

It is very convenient, for writing simple mathematical formula, to assimilate the value to the boolean value True or 1. With this assimilation, realization of a model with events and state variables is a sequence (finite or infinite) of multiples from {0,1,⊥}n, one component for each event or state. Since states have always a value, a component corresponding to a state never takes value. A realization is also called an interpretation or a trajectory.

Temporal operators. We define 2 operators, default and when, that act more specifically on event occurrences. The default operator merges the 2 events h1 and h2.

          h1 :  1  1  ⊥   ⊥  1   1  1  ...
          h2 :  ⊥  1  1   1  ⊥   ⊥  1  ...
h1 default h2 : 1  1  1   1  1   1  1  ...
This operator is commutative on events.

If B is a boolean expression on state variables, the expression h1 when B defines a new event. It occurs when h1 occurs and B is true. Otherwise, this event is absent:

        h1 :  1  1  ⊥   ⊥  1   1  1  ...
        B  :  0  1  0   1  0   1  0  ...

h1 when B  :  ⊥  1  ⊥   ⊥  ⊥   1  ⊥  ...
The when operand realizes an under-sampling of the event. Notice that the condition B is defined at any time since it is a boolean expression on state variables.

Event constraints. Event constraints are specified from pairwise event constraints. There are only 3 binary event relations of interest. The first one is event inclusion specified by the instruction  included(h1, h2) which means that in any realization, if h1 is present then h2 is present. The second binary relation is synchro(h1,h2) which specifies that two or more events have the same occurrences in any realization. The last relation is exclus(h1,h2). Its semantic implies that the two events h1 and h2 cannot be present simultaneously in any realization.

3.1.3 Events and time in Cadbiom.

Time and synchronizations are important components of a dynamic system. In models oriented to concentration variations dynamic, a uniform time model is applied. In general the so called asynchronous scheme is used: one variation at each step.

In the guarded transition context, a natural synchronization constraint is: at each step, every fireable transition is fired. At first sight, this synchronization constraint is appealing since it says that a reaction occurs as soon as the inputs are activated and the guard condition, representing activators and inhibitors, is satisfied. Moreover this interpretation correspond to the data-flow computational model, a well known asynchronous computational model. Unfortunately the data-flow interpretation gives rise to many difficulties mainly originated in the building of models by composition of pieces from various experiments. We need to delay some transitions in order to integrate several independent observations. This is obtained by introducing free events.

In CADBIOMlanguage, each transition is guarded not only by a condition but also by a event expression. The transition is fired only if the origin is activated, the condition is satisfied and the event is present. This kind of guarded transition is similar to the ones used in UML specification language. The UML formalism is enriched, in CADBIOM, by an algebra on events and states for expressing sophisticated events. Many free events are introduced and there is no uniform time model. For that reason, we say that CADBIOMis polychronous.

Chapter 4
CADBIOM concepts

In this chapter we present the graphical language of CADBIOM based on guarded transitions.

4.1 Model components

CADBIOM uses two type of variables: the state variables represented by place names and the events variables which appears only in event expressions.

Places and transitions The basic building brick in our formal language is the transition. Graphically, a transition is composed of a source place and a target place represented by rectangles and linked by an arrow (fig4.1). The transition graph is a graph with places as vertices and transition arrows as directed edges.


PIC

Figure 4.1: Two places and a transition. A is the source place and B is the target place. The condition is [C or D]. In this example there is no event, the guard is reduced to a condition.


A transition is labelled with an expression describing the transition guard:

h[C or D ]
The expression between brackets is a boolean expression on simple place variables. No events are allowed. In the example, C and D must be place names. The identifier h represents an event expression. The same event identifier can be used in different transition guards. Events are expressions using the default and when operators. The two operands of a default must be of event type. The left operand of a when operator must be of event type and the right operand must be a boolean formula with place names as variables as for conditions between brackets.


PIC

Figure 4.2: Right transition fired if AA activated and after left transition is fired


The semantic of a transition is similar to UML transition semantic1. A transition is fired at a step of a realization if and only if:

Firing a transition has two effects:

From a temporal point of view, the firing of a transition is atomic. That is to say: all actions (deactivation of the source, activation of the target) must be considered as simultaneous and occurring within a step. The condition of a transition expression is optional. If omitted, the transition is assumed to have a True condition. Similarly, the event expression is optional. If omitted, the event of the transition is assumed to be present any time or, equivalently for the dynamic behavior, when the source place is activated.

If two transitions t1 and t2 are such that target(t1) = source(t2) and source(t2) is not activated, they cannot be fired simultaneously (fig 4.2).


PIC

Figure 4.3: The two transitions cannot be fired simultaneously if A is actived. The firing of at least a transition implies a new step for the trajectory of the system.


Transitions satisfying the firing conditions are fired simultaneously. If several transitions, sharing the same place, are fired simultaneously, their effects are activation of the place for incoming transitions and unactivation of the place for outgoing transitions. In this case, any activation overcome any unactivation. So, if a place is both activated by an incoming transition and unactivated by an outgoing transition at the same moment, it remains active, even if there are several unactivations for one activation. In CADBIOM there is no weight on transitions.

4.2 Model frontier:

The frontier is the list of simple places which cannot be activated from inside the model. This is the case for places without incoming transition. A more subtle case is when there is a cycle in the transition graph. No place in this cycle can be activated from the model if there is no incoming transition for any place of the cycle. This situation is illustrated in figure 4.4 where the places of cycleA,B,C cannot be activated by an incoming transition. Even if incoming transitions to a place in the cycle exists, no place of the cycle can be activated if these transitions cannot be fired when all places of the cycle are unactivated. The most general case is obtained when the cycle is replaced by a strongly connected component of the transition graph. If no incoming transition to a strongly connected component can be fired when all the places of the component are unactivated, the component is an isolated component.

The start places can be used to add places into the frontier. This can be useful when there is a cycle without fireable transitions. The user can choose one or more places of the cycle to be possibly activated at initialization.

The frontier of a model is an important concept. It formalize the interface of the modelled biological system with the external world. The checker (section 5.10), provides solution as a list of activated places on the frontier. The frontier of a model and isolated strongly connected components can be computed by static analysis of the model (see 5.11).

4.3 Special places

Special places are introduced either as markers in order to ease model specification or to improve the look of graphical representations.

Special places : Start places. A start place is represented by a black dot (figure 4.4). The place targeted by the outgoing transition of a start node is added to the frontier. A start place cannot have any incoming transition and its outgoing transition does not allow any condition. Start places are marks and a transition from a start place to another place is a pseudo transition.

In signal propagation models, places which are not at the frontier are initialized as unactivated. This is done in order to rule out trivial solutions for most of the queries. Marking a place with a start node put it in the frontier. As a consequence it is not initialized to unactivated but considered as a free variable which can be affected by the SAT solver for the initial step.


PIC

Figure 4.4: An example of the usefulness of start place. Without start place, the model has no frontier (because all places have an incoming transition). There is no natural initialization of places. With the start place, the place A is in the frontier and an initialization can be computed.


Special places : Permanent places. Very often we have to model some kind of “production process” which introduces, under some condition, some product as an actor in a system. The production process may be protein synthesis (transduction + translation), protein post-traductional transformation, transport action…From a modeling point of view we don’t want to give details in the description of this process but only say if it is active or not. Of cause, the production of the protein under some condition (transition fired) doesn’t imply the destruction of the producing mechanism. So the firing of a transition originated in such a place doesn’t result in the unactivation of the place.

For that reason we have introduced permanent places. A permanent place has two special properties:

Permanent places are represented as orange rectangles (figure 4.5).

Permanent places do not give more power to the formalism. The properties of permanent places could be simulated by adding a transition from the place to itself. However, the introduction of permanent places, as syntactic sugar, improve the readability of the graphical representation. They also help for a clear definition of the frontier. In most of the cases they are used to model gene activities.


PIC

Figure 4.5: Use case of permanent place : transcriptional activation of p15 protein.


Special places : Input places. Ordinary places cannot be activated from the outside world except at the initialization step. A place which can be activated or unactivated from outside is an input place. The action of the external world is the only one which can modify the status of an input place. Transitions from an input place to another place cannot have condition nor event. Input places have no incoming transitions.

Special places : Trap places. Trap places are simple places with no outgoing transitions. This is allowed for simple places so trap places were introduced for readability purpose. They are very convenient to abstract a degradation process. Trap places are represented by a black circle.

4.4 Discrete event simulation

Discrete event system simulation has been studied for a long time. A standard simulation scheme consist in repeating the following actions until some halting condition is satisfied:

In our case, the first action, finding the current events consists in reading the current input and events from a file or from memory and compute the transition events. Then the transitions to be fired are found. A transition is armed if its source place is activated. An armed transition is fireable if its condition is True and its event is present. If a transition has no event and no condition, it is fireable as soon as it is armed.


PIC

Figure 4.6: Place evolutions. A: step1, The start place actives (red) the place A at the initialization. B: step2, the transition is fired, the place A is deactivated and the place B is activated


Performing the state evolutions reduce, in our context to fire the fireable transitions, implementing the semantic described above: source unactivation and target activation. Enforcing the priority of activation on unactivation is realized by performing all unactivations first and then applying all activations.

Chapter 5
CADBIOM graphical interface

CADBIOMcomes with a graphical interface which is useful to study moderate size models. It provides means for model exploration, simulation and checking. Models with more than a thousand places are not displayed as the graphical representation is unreadable and time consuming. On such models it is still possible to extract selected places (see 5.6). For big models and complex studies, it is advisable to consider the CADBIOM API (8). The main window of the graphical interface (Figure 5.1) is divided in five parts:


PIC

Figure 5.1: CADBIOM window


5.1 File menu


PIC

Figure 5.2: File menu


The File menu (Figure 5.2) incorporates :

The extension file name has to be manually provided. The “.bcx” or “.cal” suffixes are only indicative and not mandatory. However, the file chooser use them to select displayed files. If you use other prefixes or no prefixes you must turn the selected files to “all files” in the file chooser.

The menu item for importing a PID XML file opens a widget for selecting the desired interpretation of PID data. It is possible to build a model with or without free events. A model without free events is interpreted in the data-flow semantic. It is also possible to choose how to combine activators and inhibitors either in a “or” formula or an “and” formula.

5.2 Layouts menu


PIC

Figure 5.3: Layouts menu


The layouts menu (Figure 5.3) changes the coordinates according to different layouts :

These layouts come from PyGraphviz module. The most useful are the hierarchical ones.

5.3 Help menu


PIC

Figure 5.4: Help menu


5.4 Model design and model handling


PIC

Figure 5.5: Model design and model handling buttons


Model design These buttons are used to graphically modified the model. With the exception of the first one, each button activates an editing function.

A new node (simple or permanent) name is entered in the informations frame (5.8) which automatically display informations on the new node when it is created. In the same way transition specifications can be completed with event expression and condition expression. This frame is also available when selecting an object on the graphical representation. Any information may be modified at any time.

Model handling

Static analysis Different static analysis can be performed. In particular isolated strongly connect component can be compute which places are candidates for joining the frontier of the model.

5.5 Graphical view

The graphical representation of the current model is represented here. It is possible to modify or create a model, using the graphic editor. The figure 5.6 and 5.7 show modifications of a toy example model.


PIC

Figure 5.6: Creation of toy example model. A transition is being drawn between SMAD4 and SMAD_complex.



PIC

Figure 5.7: Final look of the toy example model.


5.6 Node list

Display the list of all model nodes or frontier nodes. The selected nodes can be highlighted using Show button

The text-entry widget, at the bottom of the view, can be use to search for a node. It is possible to use Python regular expressions to search for a family of nodes with similar names.


PIC

Figure 5.8: Node list


5.7 Model global view

This window display an overview of the complete model in a simplify representation. Each modification in the graphical view is automatically update in the global view. The red rectangle allows to move in the graphical view and shows the area of the model displayed in the graphic window. The Narrow and Enlarge buttons adjust the size of the displayed area of the model in the main view.


PIC

Figure 5.9: Model global view


5.8 Informations

The information frame contain three kinds of informations depending on the item selected in the main graphic view.


PIC

Figure 5.10: Informations about transition. Show Conds colors the places involved in the transition condition. Note button opens an editor to write complementary informations. These information are commentaries with no effect on the model.


5.9 CADBIOM simulator

5.9.1 Simulator window

The simulation window (Figure 5.11) is made up of four parts :


PIC

Figure 5.11: simulator window


The protocol is:

  1. Load the input file if necessary.
  2. Initialize the dynamical system by selecting activated places in frontier or whole model.
  3. Choose simulation parameters
  4. Launch simulation or step by step simulation

5.9.2 Initialization window

Before a simulation, a model must be initialized. Some places are activated. The remaining places are set to unactivated. This is done in the initialization window.


PIC

Figure 5.12: Selected window. Choose the nodes to activate before the simulation


The list of places can be manually activated (or un-activated) and different features are proposed :

5.9.3 Chronogram

The chronogram (figure 5.13) is a representation of the evolution of places during a simulation. A place can be activated (high level in the chronogram) or unactivated (basal level in the chronogram). At each step an horizontal line is drawn. The change of activation state is represented by a vertical line. A "Selected chronogram" display only places that change their activated state during the last simulation.


PIC

Figure 5.13: Chronogram window of figure 5.7 model after a simulation of 10 steps.


5.10 CADBIOM checker

5.10.1 Purpose of the checker

The checker is aimed at finding answers to some queries. It is oriented to signal propagation problems. It allows for two type of properties: reachability - the property P will happen - and invariance - P will never happen. The last formula can be reformulated as ¬P is invariant. A property is always checked on a finite horizon. This limitation comes from the unfolding technique used for answering queries.

An answer to a query, also called a solution, is a list of frontier places that must be activated to get the property during a scenario. Places that are not in the solution are unactivated at initialization. Most of time, a solution contains essential places but also noisy places. These places correspond to trajectory fragments without relation with the property. Minimal Activation conditions (MAC) focus on more interesting places. Indeed a MAC is a solution such that no sublist is solution and so each place in a MAC is essential.

5.10.2 Checker window


PIC

Figure 5.14: Checker window


Query form

The upper part of the checker windows is the query form. The property must be expressed as a boolean formula with place names as variables. The logical operators are or, and, not. Two radio buttons select a reachability property (happens) or an invarience one (doesn’t happen). The horizon is specified as a number of steps allowed in the unfolding.

Optional initialisation property and invariant property can be added to the query. The last item in the query form put a limit on the number of answers allowed. very often, a query has a lot of answers and it is usefull to limit the computations to a small number of them.

Control buttons

5.11 CADBIOM static analysis

Static analysis provide various information which can be computed without unfolding of the model.

5.11.1 Isolated SCC

Recall that isolated SCC are strongly connected componants such that every entering transition cannot be fired if all the places in the SCC are unactivated. Without initialization of some of the places as activated, places in such a part of the model will remain unactivated in any scenario. The isolated SCC are enumerated in a solution window and it is possible to save the results in a file. Marquing some places as activated at initialisation is completely manual and must be done with the help of the graphic editor or by editing the textual language representation of the model.

5.11.2 Basal activated genes

In some models, it may happen that some gene transcriptions and traductions can only be inhibited. So, if the gene is present, then it always produce its protein as far as an inhibitor is not activated (and its event is present). This situation can be detected by an analysis of logical formulas in events and conditions. This analysis detects basal activated genes and for each such gene provide informations on inhibitors and activators.

An inhibitor is called an essential inhibitor if the transition is fireable as soon as the inhibitor is absent. In other word, it must be present to get an inhibition.

An inhibitor is dominant if the transition is inhibited as soon as this inhibitor is activated.

An activator is essential if the transition cannot be fired if it is absent. This activator is necessary to unable the transition.

An activator is dominant if the transition is firable as soon as the activator is activated. This kind of activator is sufficient to unable the transition.

5.11.3 Dependancy graph

Dependancy graph window exports the current CADBIOM model into different kind of dependency graph and different format. For more complete information about the different graph type, see the static analysis section of [?].

Choose the type of dependency graph between :

Then click on Dot or Graphml button to save the graph into one of these graph description languages.

5.11.4 Statistics

Gives some statistics on the model. Some of these statistics are dependant on the naming conventions used in PID data interpretation into guarded transition models.

Chapter 6
CADBIOM textual language

The CADBIOM graphical interface is useful for studying models of moderate size (a few hundred places). However it becomes very uncomfortable when one wants to modify a large model. The graphical models can be saved as xml text files which mix model graphical informations with conceptual informations. Although it is possible to use this textual representation for modifying a model, it is not very convenient and error prone. For that reason, we developed a textual representation of the models. The name of the language described in this section, is CADLANG.

The CADLANG language is very close to the CADBIOM graphical language. A transition graphically represented as:


PIC


is textually represented as:

  A --> B; [C or D];

Events are represented by a clock expression:

  A --> B; h1 default (h2 when X)[C or D];

If a transition is annotated (note button of the graphical interface in transition information), this annotation is simply reproduced between curly brackets:

  A --> B; h1[C or D] {my favorite transition - PMID 9774674};

Note that there is no semicolon between the condition expression and the annotation which is optional. The two semicolons are mandatory. The condition may be empty but the two brackets are mandatory. A transition without condition is written as:

  A --> B; h1[] {no condition};  
  C --> D; []   {no event, no condition};

The event is also optional. If there is not event in a transition guard, the event is considered to be always present.

The left hand side of a transition can be a permanent place. This is indicated by adding the modifier /p after the name of the place. The transition:


PIC


is denoted as (with the beginning of the condition only):

  p15INK4b_gene/p --> p15INK4b; [SMAD23_SP1 ...];

Input places are specified in a similar fashion with the help of the modifier /i:

  Input/i --> B; h1[] {Input is an input place};

A start place is simply indicated by the absence of name in the left hand side of a transition and there is no condition (recall that a start place is simply a marker for frontier definition):

   --> A; {notice: no condition};  
   --> B;

If there is no note, the second semicolon is absent. On a similar way, a transition to a trap place has no identifier on the right hand side:

  A -->; [ C or D]{notice: mandatory condition};  
  B -->; [];

Conditions contain boolean expressions between the two brackets. Boolean expression are built from place identifiers and boolean operators. The usual boolean operators: or, and and not are allowed as are parenthesis. Boolean operators must be in lower case characters.

An identifier must begin with a letter or an underscore. Following characters must be letters, underscore or numbers. The appearance of an identifier on a left or right hand side of a transition arrow implicitly declares it. Conditions must contain identifiers which are declared in preceding transitions. In order to deal with cross references and isolated places (without transition), place declaration is possible as in the examples:

  p15INK4b_gene/p ;  
  Smad7;

CADLANG has no name space so the scope of a name is the entire file.

Clock expressions uses the default and when operators. Operands of a default operator must be of clock type. The left operand of the when operator must be of clock type while the right one (the condition) must be a boolean expression similar to the ones allowed in conditions.

To complete the description of the CADLANG language, let us mention that it is possible to give a name to the model specified:

  /name my_model;

and use comments in the text. The /name directive must be at the beginning of the file and appears at most one time. Every line starting by // is a comment and is ignored by the compiler as usual. The compiler is called from the import menu of the graphical interface. The export menu can be used to produce a CADLANG file from a graphical representation. Positions of places will be lost in the process and if the CADLANG file is imported later, a default left to right hierarchal layout will be applied. In order to give a flavor of a CADLANG language we put here an extract of a file produced by the graphical interface:

/name  tgfb_ref  
 
// =============== isolated nodes ==================  
ERalpha;  
NKX2_5;  
// =============== transitions =====================  
 
SHC --> SHC_active_p_p;  
    [(TGFB_TGFBR2_TGFBR1_integraltomembrane_active)]  
    {PMID [1767390]};  
SMAD7_gene/p --> SMAD7;  
    [(SMAD2_3_SMAD4_SP1_active_nucleus or AP1_active_nucleus)]  
    {PMID [10843994][10973944]};  
 
IL10_gene/p --> IL10;  
    [(SMAD3_SMAD4_GATA3_active_nucleus)]  
    {PMID [11790301]};  
SnoN --> ;  
    [(SMAD3_SMAD4_active_nucleus)]  
    {PMID [10531062]};  
_14_3_3E --> EIF2A_14_3_3E_active;  
    [(EIF2A_p)]  
    {PMID [11781097]};  
--> SHC;  
    {Start};  
 
--> PPP1CA;

Chapter 7
Pathway Interaction Database interpretation

In CADBIOM, it is possible to import models created with the graphical editor in CADBIOM XML format, or in CADLANGlanguage (Files menu). It is also possible to load a file coming from the PID database: click on "Files" then on "Import from PID xml file". The file must be in XML PID format.

As PID is is a collection of data, it collects biological reactions as established from experimentations. No particular interpretation of the data is given. In CADBIOM we interpret the data in a guarded transition based model, emphasing the information propagation aspect of signal networks. This modelling point of view is not universal and might be debated. However, we found it useful for deciphering the combinatorial of cell signal propagation.

We give here the translation scheme of the different biological reactions considered in PID database.

7.1 Elementary translations

Every biological reaction described in PID has, in general, several activators and several inhibitors. The way activators and inhibitors influence the reaction is not clear. Is it necessary to have at least one activator and no inhibitor? In this case, activation condition is obtained by a logical disjunction of activator variables and inhibitor condition follow the same law. In other cases, we may need all the activator to activate and all the inhibitors to inhibit. In this case, the activators and inhibitors variables are combined by a conjunction. The two cases are presented as options in the loading window. If act is the formula combining activators and inh the formula combining inhibitors, the basic condition for a transition generated from a biological reaction is act   ¬inh. More components are added in the following elementary cases.

Transcription representation The CADBIOM representation of a transcription reaction uses a place which represents a gene. We chose a permanent place which is colored in salmon. Below (fig 7.1), the graphical representation of p21 expression in PID database is represented on the top picture. The bottom picture is the graphical representation of the CADBIOM translation.


PIC

Figure 7.1: PID transcription


The transition condition:
(SMAD23_SMAD4_FOXO1-3a-4_nucleus and not (SMAD23_SMAD4_FOXO1-3a-4_FOXG1_nucleus)
is the translation of the regulation of this transcription (red and green arrows in the PID representation).

Since permanent nodes are activated at initialization, it is not possible, in the present state of CADBIOM, to reason on gene epigenetic regulation. This point is under study for the next version of CADBIOM.

Translocation representation A translocation is simply represented as a transition between two places representing bio-molecules which differs only by the location name (fig 7.2).


PIC


Figure 7.2: PID translocation


Reaction representation A reaction gives as many transitions as there are pairs of input, output in the reaction. The reaction represented here is a complex formation. The two transitions are synchronized by the use of the complementary component of the dimer in the transition condition with origin a component (fig 7.3).


PIC


Figure 7.3: PID reaction


With this interpretation, we ensure that a reaction can only occurs when all the inputs are present.

Name generation

CADBIOM identifier generation: Informations on biomolecules as contained in PID include the name of the molecule and state and localization informations. In XML representation, these informations are scattered on different places. Since we consider that the identification of a biomolecule must contain post translational modifications and its localization, these informations are coded in the biomolecule identifier. With this coding, each biomolecule has a unique representation in our models. The coding conventions are part of the translation scheme and will be described now. The kernel off each biomolecule name is the preferred symbol we can find in the PID XML. This name is also used for the PID graphical interface. Then we add to this name the available informations about its location, activity state…In some cases, input and output name are the same in PID graphical interface because the biomolecules differ only in their post translational modification. Fortunately the PID XML is extremely well documented and differentiates those modifications. In CADBIOM we distinguish the post translational modifications too. For example, the biomolecule SMAD2+[cy], which means that SMAD2 is activated and localized in the cytoplasm, is translated into a place called SMAD2_active_cytoplasm in our CADBIOM model.

7.2 Composition

CADBIOM models contain at most a transition between two places A and B. However, any reaction A  +     B  +   induces a transformation A  h[]
-→ B. We can assume that the condition is void since the transition A h1[C1]
- → B is equivalent to the transition A h1 when C1[]
   -→ B. In the following we will drop the condition and denote transition as A -h→ B, where h is an event expression.

So two reactions R1 and R2 of the form A  +     B  +   induce two guarded transitions A h1
-→ B and A  h2
-→ B which are combined into one guarded transition: A h1 default h2
    -→ B. With this interpretation, the transformation of A into B may result from reaction R1 or reaction R2, or both.

The building of a guarded transition dynamic system share some similarities with the building of a differential equation based model. In a differential model one writes down concentration variations by summing up all the reaction influences on concentrations. In the interpretation of PID database, we similarly sum up (default up?) all the information propagations induced by reactions.

Interpreting PID database extracts give rise to very complex models. The guarded transition brick is very simple but the complexity lies in the combinations of events. Real transition event expressions can be quite large and difficult to analyze.

Chapter 8
CADBIOM API

Large models (beyond a thousand places) are difficult to handle using the graphical interface. If a model is too large, it is no longer displayed as a transition graph. However, it is still possible, with the graphical interface, to extract places with their environment. More important, only a subset of available model analysis are implemented in the graphical interface. This is due to the complexity of expressing questions in a graphical mode and also because implementing sophisticated treatment in a graphical interface is time consuming. So for handling complex questions on large models, it is preferable to write Python scripts using the CADBIOM API. In this section we give a schematic description of the API. The reader must read the full technical documentation for a more precise and up to date information.

Working with a discrete model of a biological system involve essentially three steps:

  1. Load the model.
  2. Perform analysis
  3. Extract the results of analysis.

The two main Python modules of the API are models.clause_constraints.mcl.MCLAnalyser and models.guard_transitions.analyser.static_analysis. The MCLAnalyser object is devoted to dynamical analysis while the StaticAnalyzer object give access to static analysis. So, depending on the kind of analysis one wants to perform, the first step is to create a XXXAnalyser, where XXX stands for MCL or Static. By the way, MCL means multi clock and refers to the polychronous orientation of CADBIOM.

An XXXAnalyser important component is the error reporter. An error reporter is an object which is in charge of error messages. It can print error messages as far as they appear, visualize them by any graphical mean or collect them until the end of a treatment. It must provide a display(mess) method with a string parameter and a error public attribute which is set to true by the display method. The SimpleErrorReporter class in gt_gui.utils.reporter module provide error reporters which can be used within a command line application using either MCLAnalyser or StaticAnalyser.

A StaticAnalyser is then created by a call to the constructor StaticAnalyser(reporter) where reporter is an error reporter. Similarly, a MCLAnalyser is created by a call to the constructor MCLAnalyser(reporter).

An important component of analysers is the representation of the guarded transition system as a CharModel object. All the objects, method for guarded transition models are in the package models.guard_transitions.

8.1 Building the model

8.1.1 From a chart model

If chart_model is a guarded transition model represented by a ChartModel, the method build_from_chart_model(chart_model) sets the chart_model for a StaticAnalyser or a MCLAnalyzer.

8.1.2 Loading from files

A model can be described in an XML based language containing graphical informations (with possibly a .bcx suffix) produced by the graphical interface. An other possibility is to have a textual description of the model using the cadlang language. A third source of model description is PID database XML based language. Both MCLAnalyser and StaticAnalyser object have methods for loading the three type of model description.

build_from_chart_file(file_name) This method loads a model from a graphical description generated by the graphical interface. As expected, the parameter file_name is the name of the file. Since a graphical model contains screen coordinates, the graphical interface displays such a model as it.

build_from_cadlang(file_name) Similar to the preceding method but assume model description is in CADLANG. In the graphical interface, a layout method must be applied before displaying the model graphically.

build_from_pid_file(file_name, has_clock=True, ai_interpretation=0) This method loads a model by interpreting a PID file. Again file_name is the name of the file. The two parameters has_clock and ai_interpretation are used to chose the kind of interpretation. has_clock is a boolean with default value true. When it is true, clocks are generated following the translation scheme describe in 7. The second optional parameter, ai_interpretation is for set of activators and inhibitors translation. With the default value 0, one activator and no inhibitor is the condition of activation of the transition (“or” interpretation). If ai_interpretation=1: all activators are necessary to fire the transition while the presence of all inhibitors is necessary to prevent the firing (“and” interpretation).

8.2 Modifying a model

A model is represented as an object from the ChartModel class. This class together with auxiliary classes is defined in the module models.guard_transitions.chart_model. Different methods can be used to modify the model.

get_simple_node(name) allows for retrieving a node, as a CNode object, from its name

mark_as_frontier(node_name) marks a node as a frontier node. A start node is created and a pseudo-transition is set from this start node to the node

turn_into_input(node_name) turns a simple node into an input node. The node must not have any incoming transition. If it has incoming transitions, an exception is raised.

turn_into_perm(self, node_name) turns a simple node into a permanent one. Validity conditions are the same than in the preceding method.

Many methods are available to deal with guarded transition model representations. The reader should look at the technical representation for more details.

8.3 Analysis

8.3.1 Static analysis

Static analysis provides methods for exploration, frontier computation, statistics, and dependence graph computation.

get_frontier_node_names() get a list of frontier places: either places without predecessor in the transition graph or places mark as frontier places with a start place in their predecessors.

get_gene_places() returns a list of places representing genes. A place representing a gene must contain the string “gene” in its name. This is ensure for models built from PID data base files.

get_frontier_scc() computes strongly connected components (SCC) of the transition graph that cannot be activated from inside the model. A SCC is activated if one of its places is activated. The criteria used in this computation ensure that no incoming transition of any place in the SCC can be fired when all the places of the SCC are unactivated. If nothing is done, this part of the model will remain unactivated in any trajectory. If it is an unwanted behavior, at least one place of the SCC must be marked as a frontier place.

get_basal_activated_genes() assume that places representing genes are permanent places and contain the “gene” string in their name. Basal activated genes are genes which are activated even if all the places present in activation condition (condition of the transition from gene place to protein place), are unactivated. We assume that the gene itself is activated. This exclude any epigenetic regulation model.

get_why_basal_genes() similar to the preceding method but providing a more complete information. Returns a list of pairs (g,lpv) where g is a basal activated gene and lpv are the principal variables (essential inhibitor, dominant inhibitor, essential activator, dominant activator). An inhibitor is essential if the guard condition is true when the inhibitor is absent whatever the values of the other state variables in the condition. An inhibitor is dominant if the guard condition is false when the inhibitor is present. An activator is essential if the guard condition is false when this activator is absent. An activator is dominant when the guard condition is true when it is present. All these properties hold whatever the values of the other state variables in the condition. Remark that it is enough to analyze the condition of the transition because the event of a gene transcription/translation is always a free event. For more general transitions, it is necessary to analyze the transition event.

get_tr_principal_variables(tr) computes the essential inhibitors, dominant inhibitors, essential activators and dominant activators (if any) of a transition tr.

The next methods are used to create and analyze dependency graph from loaded model. A dependency graph is a directed graph where edges represent dependencies between nodes. An edge from a source to a target node means that the target depends on the source. Places in CADBIOM are represented by node in dependency graph, and we can distinguish different kind of dependency. First, we can draw an edge from vertex A to B in dependency graph when it exists a transition from place A to B in CADBIOM. But in CADBIOM, transition condition influences the activity state of input and output places. This is why we can also consider dependency from places in condition to the output place, and even input place.

Every kind of dependency graph in the following are DiGraph object of Networkx python package. User can used DiGraph methods to perform other analysis, see Networkx documentation for more informations.

make_transition_dg() returns a dependency graph of current model using only transition as influence. The conditions of transitions are not used. This kind of dependency graph represents the paths of chart-model.

make_dependence_dg(weight = False) returns a dependency graph of current model using transition and condition as dependence. Given a CADBIOM transition, the output place depends on the input place and places used in guards or transition events. In dependency graph, there will be an edge from input to output, and from each place in guards to output. If weight, edges have a weight of 2 for dependency coming from CADBIOM transition, and a weight of 1 for dependency coming from CADBIOM guards. Else, all edges have a weight of 1.

make_full_dependence_dg(weight = False) returns a dependency graph of current model using transition and guards as influence. Transition output depends on the input and places used in guards. Input is also influenced by conditions. If weight, edges have a weight of 2 for dependency coming from CADBIOM transition, and a weight of 1 for dependency coming from CADBIOM guards. Else, all edges have a weight of 1.

Many methods are available for exploring dependence graphs and can be founded in the technical documentation.

8.3.2 Dynamic analysis- Basic methods

Before using the methods implementing dynamic analysis, we need to explain the unfolding technique and some technical constraints linked to the use of SAT solver. We can represent a dynamic system as an evolution equation X= f(X,H,I) where X represent a boolean vector encoding the current state (activated or unactivated) of the places, Xis the future state of the places. The boolean vector H represents the current free events (present or not) and the vector I stands for the current inputs. Due to the unfolding technique, the temporal properties can be checked only on a finite horizon in the future or in the past. So we will consider limited length trajectories defined by a maximal number of steps n.

Temporal properties of dynamic systems could be described using some kind of temporal logic. However, in practice, only simple queries can be mastered by a human being. So we don’t need the full power of temporal logics and we restrict ourselves to a family of simple temporal queries. Extensions to more complex dynamic properties may be implemented in CADBIOM future versions.

Simple temporal properties are specified by mean of three properties:

The variant property is specified by a list of properties (V Pk(X))k=1,,n that must be enforced at each step k. Each property is a state property and consequently is represented propositional logic formulas with place names as propositional variables. These variables take value True for activated places and False for unactivated ones.

Checking the dynamic property on a finite horizon n consists in generating the logical constraints:

The set of constraints is solved by a solver SAT and a solution is a list of vectors (xi,hi,ii) of boolean values.

MCLQuery: An object of the MCLQuery class must be used to register the different properties involved in a query. The constructor call MCLQuery(start_prop, inv_prop, final_prop) build a query initialized with the three basic properties: initial, invariant and final. Each parameter is either None or a logical formula in literal form (string). For example MCLQuery(None, None, "C3 and C4") build a query to test reachability of the property "C3 and C4" without initial condition, nor invariant condition. To set the variant property use the method:

set_variant_property(var_prop) where var\_prop is a list of logical formulas in literal form. The length of the list must be in accordance with the chosen horizon. More operations on MCLQueries are available and are referenced in the technical documentation.

The CLUnfolder component of the MCLAnalyser is in charge of the unfolding and solving steps. However, the actual implementation is much more complex than the process described above. The reason is: SAT solver can solve logical constraints in clause form only and SAT solvers use an encoding of clauses known as DIMACS encoding. In this encoding, a logical variable is encoded as a positive integer different from zero. If i is the code of the logical variable v then i is the code of the literal v while -i is the code of the literal not v. In solutions, a positive value represents true and a negative one represents false, both representing the assignment of the corresponding variable. Worse, the compilation process which translates the logical constraints into clause constraints introduces a lot of auxiliary variables which are mixed, in solutions, with the original variables representing places, events and inputs. So a raw solution obtained by the SAT solver is not directly readable and must be used with the help of extraction and decoding methods.

When loading a model in a MCLAnalyser object, a CLUnfolder is generated which implies the compilation of the dynamical system into a clause constraint dynamical system and the DIMACS coding of all the variables. This coding cannot be changed later. The unfolding of constraints is performed efficiently on the numeric form of the clause constraints. The CLUnfolder provide various method to convert variable names to DIMACS code and back, to extract the frontier of a model and to extract informations from raw solutions.

The two methods implementing the general scheme describe above are:

sq_is_satisfiable(query, max_step) with parameters:

.

sq_solutions(query, max_step, max_sol, vvars) Parameters are the same than in sq_is_satisfiable except for vvars parameter which deserves some explanations. The solver doesn’t provide all solutions of the set of logical constraints encoding the temporal property. It gives only a sample of these solutions limited in size by the max_sol parameter. Since many logical variables are auxiliary variables with no interest other than technical, we want different solutions to differ by their values on some set of significant variables. That is the meaning of the vvars parameter. Remark that the variables of interest must be specified in DIMACS code. Most of the time, this method is used internally, so DIMACS code is transparent to the user.

The sq_solutions method returns a list of RawSolution objects. The class RawSolution provides a representation of solutions which are raw results from the SAT solver and consequently not easily readable. In addition to the solution in DIMACS form (including values for auxiliary variables), a RawSolution object registers data which permit information extraction from these raw solutions. The most important methods are:

extract_frontier_values() which extracts the frontier values from the solution. These values are in DIMACS code

extract_act_frontier_values() extracts frontier variables which are active in solution.

extract_act_input_clock_seq() Extract the sequence of activated inputs and events in the solution.

In models of signal propagation, we are more interested in frontier solutions. The following method is then well adapted.

sq_frontier_solutions(query, max_step, max_sol) With parameters similar to the preceding method. The set of frontier place names which must be activated for implying query satisfaction is returned as a list of FrontierSolution.

The class FrontierSolution provides objects wrapping a symbolic representation of frontier values and timing from a raw solution. The main attributes are activated_frontier which is a list of (string) name of frontier places which are activated in the solution. The second important attribute is ic_sequence, a list of strings which describes the successive activated inputs and free events in the solution. If events h2, h5 and input in3 are activated at step k in the solution, then the kth element of the list is “% h2 h5 in3”. This format is understood by the CADBIOM simulator.

8.3.3 Dynamic analysis - Higher level methods for signaling models

Basic Dynamic analysis method implements the trajectory unfolding method. The solutions retrieve at this level are not satisfactory from a biologist point of view. In signal propagation models with many cross talks, the first natural question is: what are the conditions (frontier initializations and free event sequence) necessary to get some effect such that activating a gene for example. In signal propagation models, the frontier is considered as the interface with the external world and signals propagate from an initial frontier activation state.

If we try to answer this question with basic methods, most of the time the answers will be unsatisfactory. The problem comes from the SAT solver which gives somewhat arbitrary (although correct) answers. To be more precise, the frontier initialization got from the solver will be sufficient to obtain the desired effect but it will give a fuzzy answer since some frontier places will be unnecessary activated. For that reason, the concept of minimal activation condition is more satisfactory by removing a lot of noise.

A minimal activation condition (MAC) for a property P is a set of frontier places A such that if the model is initialized with all the places in A activated and all the other places set unactivated, then the property P is reachable and the property is lost if one place is removed from A. As usual in our approach, the reachability is checked on a finite horizon. Given a property P, there are in general several minimal activated conditions for P. Minimal activation solutions are also timing dependant, so a complete answer in term of minimal activation condition should include the set A of activated frontier places and a sequence IC of input and free events specifying the timing necessary to reach the property P.

The MCAnalyser methods implementing MAC computations are:

mac_search(query, nb_step) where query is a MCLQuery object. nb_step represents the horizon on which the property must be satisfied. The method returns a list of FrontierSolution objects.

Given a FrontierSolution which is an activation solution for a property P, it may happen that the property is not reached with the same activation condition but a different timing. If there exists no timing for which the property is not reached (on the given horizon), then the activation condition is a strong activation condition. The following method check this property:

When an activation solution is known, it is interesting to compute inhibitors. This is obtained by applying the MCLAnalyser method:

mac_inhibitor_search(mac, query, max_sol) where mac is the (possibly minimal) activated condition represented as a FrontierSolution and query is the query which was used for computing act_sol and is represented by a MCLQuery object.max_sol is an integer limiting the search. The method returns a list of FrontierSolution objects. The original activation condition is always included in the activation conditions of each object of the list.

As for activation conditions, the inhibitor effect of an activation condition may depend on timing. The following method check if it is the case:

8.3.4 Extracting results

To represent a set of solutions obtained with mac_search for instance, we propose an analysis to cluster solutions, order places and display the results with graph representation. This analysis is provided through a python script: sol2dag.py. Mainly, 3 high level functions are used in the analysis according to the wanted representation of the results.

flat_mac_tree(model_file, property_list, mac_file, graphml_file, freq_bool, far_bool) where model_file is a CADBIOMfile at .bcx format, property_list is a python list of places of the property used to get the mac_file. These both parameters are necessary to compute the distances between model places and property places. The mac_file contains the solution at tabular form, without event schedule. freq_bool and far_bool are parameters used to order places in the DAG. If freq_bool is True the places will be first order from the most to the lowest frequent. If not, places will be first order according to the distance to the property. If far_bool is True, places will be order from farthest to closest, if not from closest to farthest. The obtained DAG representation of solutions is exported into a graphml file at graphml_file location. The function flat_mac_tree computes a flat graph representation where redundancies are not eliminated.

mac_dag(model_file, property_list, mac_file, graphml_file, freq_bool,
far_bool)
with the same parameters as flat_mac_tree. This function exports a DAG into graphml file and returns the list of DAG (to be used with mac_dag_minimization). The function mac_dag computes a graph without redundancy, and ensures that each sub-graph is unique.

mac_dag_minimization(list_of_dag, graphml_file, l_red, h_red, loop) Based on topological analysis, this function translates linear path in the graph into and node. All nodes of a linear path are retrieved into a node labelled with all the names separated by "and". In the same way, horizontal diversity (nodes with same parents and same children) can be translated into or node. The parameter list_of_dag is a list of dag object (obtained with mac_dag), graphml_file is the location of output graphml file, l_red is a boolean to perform or not linear reduction, h_red is a boolean to perform or not horizontal reduction and loop is a boolean to perform linear and horizontal reduction until no reduction was possible (loop could be True only if h_red and l_red are True).

8.4 Simulation

Simulations can be performed with the graphical user interface. However in some occasions, a stand alone simulator is useful for example for automatic validation or model extraction from simulation.

A simulator object is an object from the class ChartSimulator defined in the package: models.guard_transitions.simulator.chart_simul. It is created by a call to the constructor ChartSimulator() without parameter. The object obtained is just an empty shell which can be turn into a real model by the method build(self, chart_model, sim_option, reporter. The parameter chart_model is an object representing a guarded transition model, sim_option must be set to True and reporter is an error reporter object similar to the ones used in constructors of XXXAnalyser classes.

Some models have no free events neither inputs and can be simulated as it. Their evolution relation is a deterministic function. When inputs and/or events are parts of the model, providing an input/event sequence is mandatory to perform a simulation. Input management is performed by an object from the InputStream class define in models.guard_transitions.simulator.input_stream. The objects from this class provide correct input to the simulator at each step of the simulation. The description of input and event sequences can use different formats. When the format is the one used for results from a MCLAnalyser method, the method set_act_input_stream(input_clocks) of the ChartSimulator class, provides a simulator with the correct InputStream object. The parameter input_clocks is a list of strings specifying the input and event sequence as in [’%’, ’% h2’, ’% i4 % h3’, ’% h4’].

The main methods of the simulator are:

simul_init_places(init_places): for initializations. The parameter init_places is the list of names of places activated at the start of the simulation.

simul_step(): performs a simulation step. When a transition is fired it is marked (activated = True), the origin is marked unactivated (activated = False) and the target is marked as activated.

Activation mark of transitions and places in a model can be reset with the clean() method of the ChartModel object.

A simulation can be supervised for invariant properties and reachable properties. Such properties can be registered in the simulator object by the two methods:

set_invariant_prop(prop): the parameter prop is a string specifying the property as a textual boolean expression on place variables. These expressions are submitted to the same restrictions than transition conditions.

set_reachable_prop(prop): similar to the invariant property

The invariant property is checked at the end of each simulation step. If it is violated, an exception is raised. The reachable property is checked by a method:

check_reachable_prop(): which returns a boolean. This method can be used after each simulation step. For invariant property we can also use a similar method: check_inv_conditions().