Command line package¶
Tools¶
Models¶
This module groups functions directly related to the management and the extraction of data of a Cadbiom model.
Here we find high-level functions to manage the logical formulas of the events and conditions that define the transitions, as well as helper functions to manage the entities, for example to obtain their metadata or the frontier places of the model.
-
cadbiom_cmd.tools.models.
decompile_condition
(tree, inhibitors_nodes)[source]¶ Recursive function to decompile conditions
Parameters: - tree (<expression>) –
Example of tree argument: tree = ('H', 'v', ( ('F', 'v', 'G'), '^', ( ('A', 'v', 'B'), '^', ('C', 'v', ('D', '^', 'E')) ) ))
- inhibitors_nodes (<set>) – Set of inhibitors
Returns: List of valid paths composed of entities (except inhibitors). Inhibitors are added to inhibitors_nodes.
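The recursion over such a tree can be illustrated with a minimal sketch. It assumes that 'v' denotes a logical OR and '^' a logical AND, and it ignores inhibitors entirely; `enumerate_paths` is a hypothetical helper, not the actual implementation of decompile_condition.

```python
from itertools import product


def enumerate_paths(tree):
    """Return the list of paths (tuples of entity names) that satisfy the tree.

    Assumption: 'v' is a logical OR and '^' a logical AND.
    Inhibitors are not handled in this sketch.
    """
    if isinstance(tree, str):
        # A leaf is a single entity: one path made of one node
        return [(tree,)]
    left, operator, right = tree
    left_paths = enumerate_paths(left)
    right_paths = enumerate_paths(right)
    if operator == 'v':
        # OR: any path of either branch is valid
        return left_paths + right_paths
    if operator == '^':
        # AND: every path of the left branch combined with every path of the right
        return [l + r for l, r in product(left_paths, right_paths)]
    raise ValueError("Unknown operator: {!r}".format(operator))


tree = ('H', 'v', (
    ('F', 'v', 'G'), '^', (
        ('A', 'v', 'B'), '^', ('C', 'v', ('D', '^', 'E'))
    )
))
paths = enumerate_paths(tree)
```

With these assumptions, an OR concatenates the alternatives of both branches, while an AND takes their Cartesian product, so the number of paths can grow quickly with nested conjunctions.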
-
cadbiom_cmd.tools.models.
get_frontier_places
(transitions, all_places)[source]¶ Return frontier places of a model (deduced from its transitions and from all places of the model).
Note
Why do we use all_places from the model instead of (input_places - output_places) to get frontier places? Because some nodes appear only in conditions and not in transitions. Without all_places, these nodes would be missing when we compute valid paths from conditions.
Parameters: - transitions (<dict> keys: names of events; values: list of transitions as tuples (with in/output, and label)) – Model’s transitions. Example: {u'h00': [('Ax', 'n1', {u'label': u'h00[]'})]}
Returns: Set of frontier places.
Return type: <set>
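The reasoning in the note can be sketched as follows. This is a minimal illustration, assuming that a frontier place is a place that is never the output of a transition; `frontier_places_sketch` is a hypothetical name and the real function may apply further rules.

```python
def frontier_places_sketch(transitions, all_places):
    """Return places that are never the target of a transition.

    Assumption: each transition is a tuple (origin, target, attributes).
    """
    output_places = {
        target
        for event_transitions in transitions.values()
        for (_origin, target, _attributes) in event_transitions
    }
    # Starting from all_places keeps nodes that only appear in conditions
    return set(all_places) - output_places


transitions = {
    'h00': [('Ax', 'n1', {'label': 'h00[Cx]'})],
    'h2': [('Bx', 'n3', {'label': 'h2[]'})],
}
# 'Cx' is only used in a condition, never as input or output of a transition
all_places = {'Ax', 'Bx', 'Cx', 'n1', 'n3'}
frontiers = frontier_places_sketch(transitions, all_places)
```

Computing (input_places - output_places) on the same data would give only 'Ax' and 'Bx', which is why the set of all places is used as the starting point.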
-
cadbiom_cmd.tools.models.
get_model_identifier_mapping
(model_file, external_identifiers)[source]¶ Get Cadbiom names corresponding to the given external identifiers (xrefs)
Note
This function works only on v2 formatted models with JSON additional data
Parameters: - model_file (<str>) – Model file.
- external_identifiers (<set>) – Set of external identifiers to be mapped.
Returns: Mapping dictionary with external identifiers as keys and cadbiom names as values.
Return type: <dict <str>:<list>>
-
cadbiom_cmd.tools.models.
get_places_data
(places, model)[source]¶ Get a list of JSON data parsed from each of the given places in the model.
This function is used by cadbiom_cmd.models.low_model_info().
Note
v1 models return a dict with only 1 key: ‘cadbiomName’
Note
Start nodes (with a name like __start__x) are handled even with no JSON data. They are counted in the other_types and other_locations fields.
Example of JSON data that can be found in the model:
{
    "uri": entity.uri,
    "entityType": entity.entityType,
    "names": list(entity.synonyms | set([entity.name])),
    "entityRef": entity.entityRef,
    "location": entity.location.name if entity.location else None,
    "modificationFeatures": dict(entity.modificationFeatures),
    "members": list(entity.members),
    "reactions": [reaction.uri for reaction in entity.reactions],
    "xrefs": entity.xrefs,
}
Parameters: - places (<set>) – Iterable of place names.
- model (<MakeModelFromXmlFile>) – Model from handler.
Returns: List of data parsed from each given place.
Note
Here is the list of fields retrieved for v2 models:
- cadbiomName
- uri
- entityType
- entityRef
- location
- names
- xrefs
Return type: <list <dict>>
-
cadbiom_cmd.tools.models.
get_places_from_condition
(condition)[source]¶ Parse condition string and return all places, regardless of operators.
Note
This function is only used to get all nodes in a condition when we know they are all inhibitors nodes.
Todo
See the workaround in the code, without using very time consuming and badly coded functions.
Param: Condition string.
Type: <str>
Returns: Set of places.
Return type: <set>
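Extracting places regardless of operators can be sketched with a regular expression. This is only an illustration: it assumes that the operators are the keywords and, or, not, and that place names are made of letters, digits and underscores; `places_in_condition` is a hypothetical helper.

```python
import re

# Assumption: these keywords are the only logical operators in a condition
OPERATORS = {'and', 'or', 'not'}


def places_in_condition(condition):
    """Return the set of identifiers found in the condition, operators excluded."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", condition)
    return {token for token in tokens if token not in OPERATORS}


places = places_in_condition("((A and B) or not C_1)")
```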
-
cadbiom_cmd.tools.models.
get_transitions
(parser)[source]¶ Get all transitions in the given parser.
There are two methods to access the transitions of a model.
Example:
>>> print(dir(parser))
['handler', 'model', 'parser']
>>> # Direct access
>>> events = list()
>>> for transition in parser.model.transition_list:
...     events.append(transition.event)
>>>
>>> # Indirect access via a handler
>>> events = list()
>>> for transitions in parser.handler.top_pile.transitions:
...     # transitions is a list of CTransition objects
...     for transition in transitions:
...         events.append(transition.event)
Todo
This function could be improved. Although it is useful and required to build networkx graphs based on solutions or models, its structure is rather heavy: it dates from the time when the Cadbiom API (of transition objects) was unknown and undocumented.
Param: Parser opened on a bcx file.
Type: <MakeModelFromXmlFile>
Returns: A dictionary of events as keys, and transitions as values. Since many transitions can define an event, values are lists. Each transition is a tuple with: origin node, final node, attributes like label and condition. Example: {'h00': [('Ax', 'n1', {'label': 'h00[]'})]}
Return type: <dict <list <tuple <str>, <str>, <dict <str>: <str>>>>
-
cadbiom_cmd.tools.models.
get_transitions_from_model_file
(model_file)[source]¶ Get all transitions and parser from a model file (bcx format).
Param: bcx file.
Type: <str>
Returns: Transitions (see get_transitions()) and the Parser for the model.
Return type: <dict>, <MakeModelFromXmlFile>
-
cadbiom_cmd.tools.models.
parse_condition
(condition, all_nodes, inhibitors_nodes)[source]¶ Return valid paths according to the given logical formula and nodes, and set inhibitors_nodes
Note
inhibitors_nodes is modified (set) by this function.
Raises: AssertionError – If no valid path was found.
Parameters: - condition (<str>) – Condition string of a transition.
- all_nodes (<set>) – Nodes involved in transitions + frontier places.
- inhibitors_nodes (<set>) – Inactivated nodes in paths of conditions. Modified by the function.
Returns: Set of paths. Each path is a tuple of nodes.
Return type: <set>
Graphs¶
This module groups functions directly related to the creation and the management of the graph based on a Cadbiom model.
Here we find high-level functions to create a Networkx graph, and convert it to JSON or GraphML formats.
-
cadbiom_cmd.tools.graphs.
build_graph
(solution, steps, transitions)[source]¶ Build a graph for the given solution.
- Get & make all needed edges
- Build graph
Note
Legend:
- Default nodes: grey
- Frontier places: red
- Transition nodes: blue
- Inhibitors nodes: white
- Default transition: grey
- Inhibition edge: red
- Activation edge: green
Parameters: - solution (<str> or <set> or <list>) – Frontier places. String data will be split on spaces.
- steps (<list <list>>) – List of steps (with events in each step).
- transitions (<dict <list <tuple <str>, <str>, <dict <str>: <str>>>>) – A dictionary of events as keys, and transitions as values (see get_transitions()).
Returns: - Networkx graph object.
- Nodes corresponding to transitions with conditions.
- All nodes in the model
- Edges between transition node and nodes in condition
- Normal transitions without condition
Return type: <networkx.classes.digraph.DiGraph>, <list>, <list>, <list>, <list>
-
cadbiom_cmd.tools.graphs.
draw_graph
(output_dir, frontier_places, solution_index, G, transition_nodes, all_nodes, edges_in_cond, edges)[source]¶ Draw the graph with colors and export it in SVG file format.
This function is no longer used but can still be useful.
Note
Legend:
- red: frontier places (in frontier_places variable),
- white: middle edges,
- blue: transition edges
Parameters: - output_dir (<str>) – Output directory for GraphML files.
- frontier_places (<set>) – Solution: a set of frontier places.
- solution_index (<int> or <str>) – Index of the solution in the Cadbiom result file (used to distinguish exported filenames).
- G (<networkx.classes.digraph.DiGraph>) – Networkx graph object.
- transition_nodes (<list>) – Nodes corresponding to transitions with conditions. List of tuples: event, node
- all_nodes (<list>) – All nodes in the model.
- edges_in_cond (<list>) – Edges between transition node and nodes in condition
- edges (<list>) – Normal transitions without condition.
-
cadbiom_cmd.tools.graphs.
export_graph
(output_dir, frontier_places, solution_index, G, *args)[source]¶ Export a networkx graph to GraphML format.
Note
Legend: See build_graph().
Parameters: - output_dir (<str>) – Output directory for GraphML files.
- frontier_places (<set>) – Solution: a set of frontier places. This argument is used to build the filename.
- solution_index (<int> or <str>) – Index of the solution in the Cadbiom result file (used to distinguish exported filenames).
- G (<networkx.classes.digraph.DiGraph>) – Networkx graph object.
-
cadbiom_cmd.tools.graphs.
get_json_graph
(G)[source]¶ Translate Networkx graph into a dictionary ready to be dumped in a JSON file.
Note
In classical JSON graph, ids of nodes are their names; also, their position in the array of nodes gives their numerical id, which is used as source or target in edges definitions. Here, for readability and debugging purpose, we use distinct attributes id and label for nodes.
Parameters: graph (<networkx.classes.digraph.DiGraph>) – Networkx graph.
Returns: Serialized graph ready to be dumped in a JSON file.
Return type: <dict>
-
cadbiom_cmd.tools.graphs.
get_solutions_graph_data
(G, info, centralities)[source]¶ Complete the given dictionary with information specific to the graph considered
Doc:
- https://networkx.github.io/documentation/networkx-1.10/reference/algorithms.component.html
- https://networkx.github.io/documentation/stable/reference/algorithms/shortest_paths.html
- average_shortest_path_length: https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.shortest_paths.generic.average_shortest_path_length.html#networkx.algorithms.shortest_paths.generic.average_shortest_path_length
- weakly_connected_component_subgraphs: https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.algorithms.components.weakly_connected.weakly_connected_component_subgraphs.html#networkx.algorithms.components.weakly_connected.weakly_connected_component_subgraphs
- Measures: https://networkx.github.io/documentation/stable/reference/algorithms/index.html
By default the following information is added:
- graph_nodes: Number of nodes
- graph_edges: Number of edges
- graph_nodes_places: Number of biological places/entities. The graph is a false bipartite graph; we remove the subset of transitions in order to have the real count of biological places/entities.
If centralities is True, the following information is added under a new key named “centralities”:
- strongly_connected
- weakly_connected
- max_degree
- min_degree
- average_degree
- degree
- connected_components_number
- connected_components
- average_shortest_paths
Parameters: - G (<networkx.classes.digraph.DiGraph>) – NetworkX directed graph
- info (<dict>) – Dictionary of data to be completed
- centralities (<boolean>) – Flag to activate the computation of centralities.
-
cadbiom_cmd.tools.graphs.
merge_graphs
(graphs)[source]¶ Merge graphs in the given iterable; count and add the weights to the edges of the final graph
Parameters: graphs (<generator <networkx.classes.digraph.DiGraph>>) – Networkx graph objects.
Returns: Networkx graph object.
Return type: <networkx.classes.digraph.DiGraph>
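The idea of merging graphs while counting edge weights can be sketched without networkx, using plain edge lists. This sketch assumes that the weight of an edge is the number of graphs in which it appears; `merge_edge_lists` is a hypothetical helper and does not reproduce the actual merge_graphs implementation.

```python
from collections import Counter


def merge_edge_lists(edge_lists):
    """Merge several edge lists and count in how many of them each edge occurs."""
    weights = Counter()
    for edges in edge_lists:
        # set(): an edge is counted at most once per graph (assumption)
        weights.update(set(edges))
    return dict(weights)


graph_1 = [("Ax", "h00"), ("h00", "n1")]
graph_2 = [("Ax", "h00"), ("Bx", "h2")]
merged = merge_edge_lists([graph_1, graph_2])
```

Because the input is consumed one graph at a time, the same approach also works when the graphs come from a generator.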
Solutions¶
This module groups functions directly related to the parsing and the management of the files generated by the solver of Cadbiom.
Here we find high-level functions to parse or clean mac files, and extract all their data to JSON, a human-readable data interchange format that is convenient to use in programs.
Generic functions¶
Handle *mac_complete.txt files¶
Handle *mac* files¶
-
cadbiom_cmd.tools.solutions.
convert_solutions_to_json
(sol_steps, transitions, conditions=True)[source]¶ Convert all events for all solutions in a complete MAC file and write them in a separate file in the JSON format.
This is a function to quickly search all transition attributes involved in a solution.
Example:
>>> from tools.models import get_transitions_from_model_file
>>> # Get transitions from the model
>>> model_transitions, _ = get_transitions_from_model_file('model.bcx')
>>> decomp_solutions = convert_solutions_to_json(
...     load_solutions('./solution_mac_complete.txt'),
...     model_transitions,
...     conditions=True,
... )
>>> print(decomp_solutions)
[{
    "solution": "Ax Bx",
    "steps": [
        [{
            "event": "_h_2",
            "transitions": [{
                "ext": "n3",
                "ori": "Bx"
            }]
        }],
    ]
}]
Parameters: - sol_steps (<list>) – List of steps involved in a solution. See load_solutions().
A tuple of “frontier places” and a list of events in each step.
("Bx Ax", [['h2', 'h00'], ['h3'], ['h0', 'h1'], ['hlast']])
- transitions (<dict <list <tuple <str>, <str>, <dict <str>: <str>>>>) – A dictionary of events as keys, and transitions as values.
Since many transitions can define an event, values are lists.
Each transition is a tuple with: origin node, final node, attributes
like label and condition.
{'h00': [('Ax', 'n1', {'label': 'h00[]'})]}
See get_transitions().
- conditions (<bool>) – (Optional) Include the conditions of each transition in the final file.
Returns: Return the JSON data for the given steps.
Example:
[{
    "solution": "Ax Bx",
    "steps": [
        [{
            "event": "_h_2",
            "transitions": [{
                "ext": "n3",
                "ori": "Bx"
            }]
        }],
    ]
}]
Return type: <list>
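The conversion of one solution into the JSON structure shown in the example can be sketched as follows. This is a minimal illustration built only from the documented input and output shapes; `solution_to_json` is a hypothetical helper, and the "condition" key used for conditions is an assumption.

```python
import json


def solution_to_json(solution, transitions, conditions=True):
    """Decompile one (frontier places, steps) tuple into a JSON-ready dict."""
    frontier_places, steps = solution
    decompiled_steps = []
    for step in steps:
        decompiled_events = []
        for event in step:
            event_transitions = []
            for ori, ext, attributes in transitions[event]:
                transition = {"ori": ori, "ext": ext}
                if conditions:
                    # Assumption: the condition is stored under this key
                    transition["condition"] = attributes.get("condition", "")
                event_transitions.append(transition)
            decompiled_events.append(
                {"event": event, "transitions": event_transitions}
            )
        decompiled_steps.append(decompiled_events)
    return {"solution": frontier_places, "steps": decompiled_steps}


transitions = {"h2": [("Bx", "n3", {"label": "h2[]", "condition": ""})]}
data = solution_to_json(("Ax Bx", [["h2"]]), transitions, conditions=False)
text = json.dumps(data, sort_keys=True)
```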
-
cadbiom_cmd.tools.solutions.
get_all_macs
(path)[source]¶ Return a set of all MAC LINES from a directory or from a file.
This function is based on get_solutions() that returns mac lines and stripped mac lines, and get_mac_lines() that returns only mac lines from a file.
Note
Additionally, we do some verifications here:
- Detection of duplicated MACS (AssertionError raised)
- Print number of MACS per file
- Print duplicated MACS
- Print number of MACS
Param: Filepath to be opened and in which solutions will be returned.
Type: <str>
Returns: Set of MAC/CAM from the given path.
Return type: <frozenset <str>>
-
cadbiom_cmd.tools.solutions.
get_mac_lines
(filepath)[source]¶ Return only a set of MAC LINES from a file.
This function is based on get_solutions() that returns mac lines and stripped mac lines.
Note
You should prefer get_all_macs(), which:
- Can handle a directory path and return all macs in it,
- Can handle a simple file,
- Does some verifications on all parsed macs.
Note
We assume that at this point, all MAC lines are sorted in alphabetical order.
Note
We return LINES not a set of places.
Example: {'Cx Dx', 'Ax Bx'}
Param: Filepath to be opened and in which solutions will be returned.
Type: <str>
Returns: Set of MAC/CAM from the given file.
Return type: <set <str>>
-
cadbiom_cmd.tools.solutions.
get_query_from_filename
(model_file, solution_file)[source]¶ Return the query string according to the given model and solution filenames
Example:
>>> get_query_from_filename(
...     "/path/model.bcx",
...     "/another_path/model_ENTITY_and_not_ENTITY_mac_complete.txt"
... )
"ENTITY_and_not_ENTITY"
Parameters: - model_file (<str>) – Path of a bcx model.
- solution_file (<str>) – Path of a solution file (*mac* file).
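One way to obtain this result is to strip the model name and the mac suffix from the solution filename. The sketch below is an illustration only; `query_from_filename` is a hypothetical helper, and the naming pattern (model name, underscore, query, "_mac", optional suffix, ".txt") is an assumption based on the example above.

```python
import os
import re


def query_from_filename(model_file, solution_file):
    """Return the query embedded in the solution filename, or None."""
    model_name = os.path.splitext(os.path.basename(model_file))[0]
    solution_name = os.path.basename(solution_file)
    # Assumed pattern: <model>_<query>_mac[...].txt
    pattern = re.escape(model_name) + r"_(.*)_mac.*\.txt$"
    match = re.match(pattern, solution_name)
    return match.group(1) if match else None


query = query_from_filename(
    "/path/model.bcx",
    "/another_path/model_ENTITY_and_not_ENTITY_mac_complete.txt",
)
```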
-
cadbiom_cmd.tools.solutions.
get_solutions
(file_descriptor)[source]¶ Generator of solution lines and corresponding stripped lines for *mac* file.
Note
This function does not return events! It only yields original lines and cleaned lines containing solutions (i.e. sets of frontier places/boundaries).
We remove the last '\n' and '\t'. Tabs in the middle are replaced by one space ' '.
Param: Opened file.
Type: <file>
Returns: A generator of tuples; each tuple contains the original line, and the cleaned line.
Example: For an original line: 'Z\tY\tX\n'
('Z\tY\tX', 'X Y Z')
Return type: <tuple <str>, <str>>
-
cadbiom_cmd.tools.solutions.
load_solutions
(file)[source]¶ Open a file with many solution/MACs (*mac_complete.txt files) and yield them.
Example:
>>> solutions = load_solutions('./solution_mac_complete.txt')
>>> print([solution for solution in solutions])
("Ax Bx", [['h2', 'h00'], ['h3'], ['h0', 'h1'], ['hlast']])
Param: File name
Type: <str>
Returns: A generator of tuples of “frontier places” and a list of events in each step.
Example: ("Ax Bx", [['h2', 'h00'], ['h3'], ['h0', 'h1'], ['hlast']])
Return type: <tuple <str>, <list>>
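Parsing a complete solution file can be sketched as a small generator. This is a hedged illustration: it assumes that a line starting with '%' lists the events of one step, that any other non-empty line starts a new solution, and that empty '%' lines are skipped; `iter_complete_solutions` is a hypothetical name, and the frontier places are returned as written in the file.

```python
import io


def iter_complete_solutions(lines):
    """Yield (frontier places, steps) tuples from the lines of a complete file."""
    frontier_places, steps = None, []
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith('%'):
            events = line[1:].split()
            if events:
                # Assumption: steps without events are ignored
                steps.append(events)
        else:
            # A new line of frontier places closes the previous solution
            if frontier_places is not None:
                yield frontier_places, steps
            frontier_places, steps = line, []
    if frontier_places is not None:
        yield frontier_places, steps


sample = io.StringIO(
    "Bx Ax\n% h2 h00\n% h3\n% h0 h1\n% hlast\n"
    "Bx Ax\n% h2\n% h3 h00\n% h0 h1\n%\n% hlast\n"
)
solutions = list(iter_complete_solutions(sample))
```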
Display, compare, and query a model¶
-
cadbiom_cmd.models.
graph_isomorph_test
(model_file_1, model_file_2, output_dir=u'graphs/', make_graphs=False, make_json=False)[source]¶ Entry point for model consistency checking.
This function checks if the graphs based on the two given models have the same topology, nodes & edges attributes/roles.
Todo
This function should not write any file, and should be exported to the module tools.
Note
Cf graphmatcher https://networkx.github.io/documentation/development/reference/generated/networkx.algorithms.isomorphism.categorical_edge_match.html
Use in scripts:
>>> from cadbiom_cmd.models import graph_isomorph_test
>>> print(graph_isomorph_test('model_1.bcx', 'model_2.bcx'))
INFO: 3 transitions loaded
INFO: 3 transitions loaded
INFO: Build graph for the solution: Connexin_32_0 Connexin_26_0
INFO: Build graph for the solution: Connexin_32_0 Connexin_26_0
INFO: Topology checking: True
INFO: Nodes checking: True
INFO: Edges checking: True
{'nodes': True, 'edges': True, 'topology': True}
Parameters: - model_file_1 (<str>) – Filepath of the first model.
- model_file_2 (<str>) – Filepath of the second model.
Key output_dir: Output path.
Key make_graphs: If True, make a GraphML file in output path.
Key make_json: If True, make a JSON dump of results in output path.
Returns: Dictionary with the results of tests. keys: ‘topology’, ‘nodes’, ‘edges’; values: booleans
Return type: <dict <str>: <boolean>>
-
cadbiom_cmd.models.
low_graph_info
(model_file, graph_data=False, centralities=False)[source]¶ Low level function for model_graph().
Get JSON data with information about the graph based on the model.
See also
tools.graphs.get_solutions_graph_data().
Parameters: model_file (<str>) – File for the model.
Key graph_data: Also return a dictionary with the results of measures on the given graph. keys: measure’s name; values: measure’s value
Example:
{
    'modelFile': 'string',
    'modelName': 'string',
    'events': int,
    'entities': int,
    'transitions': int,
    'graph_nodes': int,
    'graph_edges': int,
    'centralities': {
        'degree': {
            'entity_1': float,
            'entity_2': float
        },
        'strongly_connected': boolean,
        'weakly_connected': boolean,
        'max_degree': int,
        'min_degree': int,
        'average_degree': float,
        'connected_components_number': int,
        'connected_components': list,
        'average_shortest_paths': int,
    }
}
Key centralities: If True, compute centralities (degree, closeness, betweenness).
Returns: Tuple of tuples from tools.graphs.build_graph(), set of frontier places, and dictionary with the results of measures on the given graph if requested.
Return type: <tuple>, <str>, <dict>
-
cadbiom_cmd.models.
low_model_info
(model_file, all_entities=False, boundaries=False, genes=False, smallmolecules=False)[source]¶ Low level function for model_info().
Get JSON data with information about the model and its entities.
Todo
- add dump of transitions (option)
- See get_transitions remark about its deprecation for the current use case
- Dump roles of boundaries, computed here or in ChartModel… Already implemented for queries_2_common_graph and for pie charts.
See also
Output format of:
tools.solutions.convert_solutions_to_json()
Parameters: model_file (<str>) – File for the model.
Key all_entities: If True, data for all places of the model are returned (optional).
Key boundaries: If True, only data for the frontier places of the model are returned (optional).
Key genes: If True, only data for the genes of the model are returned (optional).
Key smallmolecules: If True, only data for the smallmolecules of the model are returned (optional).
Returns: Dictionary with information about the model and the queried nodes.
Example:
{
    'modelFile': 'string',
    'modelName': 'string',
    'events': int,
    'entities': int,
    'boundaries': int,
    'transitions': int,
    'entitiesLocations': {
        'cellular_compartment_a': int,
        'cellular_compartment_b': int,
        ...
    },
    'entitiesTypes': {
        'biological_type_a': int,
        'biological_type_b': int,
        ...
    },
    'entitiesData': {
        [{
            'cadbiomName': 'string',
            'immediateSuccessors': ['string', ...],
            'uri': 'string',
            'entityType': 'string',
            'entityRef': 'string',
            'location': 'string',
            'names': ['string', ...],
            'xrefs': {
                'external_database_a': ['string', ...],
                'external_database_b': ['string', ...],
                ...
            }
        }],
        ...
    }
}
Return type: <dict>
-
cadbiom_cmd.models.
model_graph
(model_file, output_dir=u'./graphs/', centralities=False, **kwargs)[source]¶ Get quick information and make a graph based on the model.
Parameters: - model_file (<str>) – File for the ‘.bcx’ model.
- output_dir (<str>) – Output directory.
- centralities (<boolean>) – If True with --json, compute centralities (degree, in_degree, out_degree, closeness, betweenness).
- graph (<boolean>) – If True, make a GraphML file based on the graph made from the model (optional).
- json (<boolean>) – If True, make a JSON dump of results in output path (optional).
-
cadbiom_cmd.models.
model_identifier_mapping
(model_file, *args, **kwargs)[source]¶ Entry point for the mapping of identifiers from external databases
Parameters: model_file (<str>) – File for the model.
Key external_file: File with 1 external identifier per line.
Key external_identifiers: List of external identifiers to be mapped.
-
cadbiom_cmd.models.
model_info
(model_file, output_dir=u'./', all_entities=False, boundaries=False, genes=False, smallmolecules=False, default=True, **kwargs)[source]¶ Get quick and full information about the model structure and places.
Parameters: model_file (<str>) – File for the ‘.bcx’ model.
Key output_dir: Output directory.
Key all_entities: If True, data for all places of the model are returned (optional).
Key boundaries: If True, only data for the frontier places of the model are returned (optional).
Key genes: If True, only data for the genes of the model are returned (optional).
Key smallmolecules: If True, only data for the smallmolecules of the model are returned (optional).
Key default: Display quick description of the model (Number of places, transitions, entities types, entities locations).
Key json: If True, make a JSON dump of results in output path (optional).
Key csv: If True, make a csv dump of information about filtered places.
Merge Minimal Accessibility Conditions¶
Handle generated files¶
This module provides functions to analyze the output files of Cadbiom.
Entry points:
Example of the content of a complete solution file:
Bx Ax
% h2 h00
% h3
% h0 h1
% hlast
Bx Ax
% h2
% h3 h00
% h0 h1
%
% hlast
Bx Ax
% h2
% h3 h00
% h0 h1
% hlast
%
%
Bx Ax
% h2 h00
% h3
% h0 h1
% hlast
%
%
%
-
cadbiom_cmd.solution_sort.
get_solution_graphs
(sol_steps, transitions)[source]¶ Generator that yields the graphs of the given solutions.
Note
See the doc of a similar function, save_solutions_to_graphs().
-
cadbiom_cmd.solution_sort.
occurrence_matrix
(output_dir, model_file, path, matrix_filename=u'occurrence_matrix.csv')[source]¶ Make a matrix of occurrences for the solutions in the given path.
- Compute occurrences of each place in all mac.txt files.
- Save the matrix in csv format with the following columns:
- Fieldnames: “patterns (number)/places (number);mac_number;frontier places” Each request (pattern) is accompanied by the number of solutions found.
Todo
Split the creation and writing of the matrix in 2 functions.
Parameters: - output_dir (<str>) – Output path.
- model_file (<str>) – Filepath of the model.
- path (<str>) – Directory of many complete solutions files.
- matrix_filename (<str>) – (Optional) Filename of the matrix file.
Returns: A dictionary with the matrix object. keys: queries, values: occurrences of frontier places
Return type: <dict>
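The counting step behind such a matrix can be sketched with collections.Counter. This illustration assumes that the solutions of each query are already loaded as lines of space-separated frontier places; `count_place_occurrences` is a hypothetical helper, and the CSV export is left out.

```python
from collections import Counter


def count_place_occurrences(macs_by_query):
    """Map each query to a Counter of the frontier places found in its MACs."""
    return {
        query: Counter(place for mac in macs for place in mac.split())
        for query, macs in macs_by_query.items()
    }


# Hypothetical input: query names as keys, MAC lines as values
macs_by_query = {"TGFB1": ["Ax Bx", "Ax Cx"], "COL1A1": ["Bx"]}
matrix = count_place_occurrences(macs_by_query)
```

A Counter returns 0 for places that never occur in a query, which is convenient when the rows are written with a common set of columns.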
-
cadbiom_cmd.solution_sort.
queries_2_common_graph
(output_dir, model_file, path, make_graphs=True, make_csv=False, make_json=False, *args, **kwargs)[source]¶ Entry point for queries_2_common_graph
Create a GraphML formatted file containing a unique representation of all trajectories corresponding to all solutions in each complete MAC file (*mac_complete files).
This is a function to visualize paths taken by the solver from the boundaries to the entities of interest.
CSV fields:
- query: Query giving the solutions
- solutions: nb trajectories/solutions
- boundaries: Number of boundary places
- events: Number of events in all solutions
- genes: Number of genes involved in solutions
- Protein: Number of boundaries with the type Protein (genes are not counted)
- Complex: Number of boundaries with the type Complex (genes are not counted)
- input_boundaries: Boundaries found only as input places
- guard_boundaries: Boundaries found only in guards
- mixed_boundaries: Boundaries found in guards AND in inputs of reactions
- graph_nodes: Total number of nodes in the graph
- graph_nodes_places: Nodes that are biomolecules (do not count reaction nodes)
- graph_edges: Number of edges
- strongly_connected: Is the graph strongly connected?
- max_degree
- min_degree
- average_degree
This function tests if the given path is a directory or a file.
Parameters: - output_dir (<str>) – Output path.
- model_file (<str>) – Filepath of the model.
- path (<str>) – Filepath/directory of a/many complete solutions files.
Key make_graphs: (optional) Make a GraphML for each query results in path. default: True
Key make_csv: (optional) Make a global CSV for all query results in path. default: False
Key make_json: (optional) Make a JSON dump of each query results in path. default: False
-
cadbiom_cmd.solution_sort.
queries_2_json
(output_dir, model_file, path, conditions=True)[source]¶ Entry point for queries_2_json
Create a JSON formatted file containing all data from complete MAC files (*mac_complete files). The file will contain frontier places/boundaries and decompiled steps with their respective events for each solution.
This is a function to quickly search all transition attributes involved in a solution.
This function tests if the given path is a directory or a file.
Parameters: - output_dir (<str>) – Output path.
- model_file (<str>) – Filepath of the model.
- path (<str>) – Filepath/directory of a complete solution file.
- conditions (<boolean>) – (Optional) If False, conditions of transitions will not be present in the JSON file. This keeps only the places/entities used inside trajectories; thus, inhibitors are excluded.
-
cadbiom_cmd.solution_sort.
queries_2_occcurrence_matrix
(output_dir, model_file, path, transposed=False, normalized=False)[source]¶ Entry point for queries_2_occcurrence_matrix
See occurrence_matrix().
Parameters: - output_dir (<str>) – Output path.
- model_file (<str>) – Filepath of the model.
- path (<str>) – Directory of many complete solutions files.
- transposed (<boolean>) – (Optional) Transpose the final matrix (switch columns and rows).
-
cadbiom_cmd.solution_sort.
save_solutions_to_graphs
(output_dir, sol_steps, transitions)[source]¶ Build and export graphs based on the given solutions
Each solution is composed of a set of frontier places and steps, themselves composed of events. We construct a graph based on the transitions that occur in the composition of the events of the given solution.
Parameters: - output_dir (<str>) – Output path.
- sol_steps (<tuple <str>, <list>>) –
A generator of tuples of “frontier places” and a list of events in each step.
Example: ("Bx Ax", [['h2', 'h00'], ['h3'], ['h0', 'h1'], ['hlast']])
- transitions (<dict <list <tuple <str>, <str>, <dict <str>: <str>>>>) –
A dictionary of events as keys, and transitions as values. Since many transitions can define an event, values are lists. Each transition is a tuple with: origin node, final node, attributes like label and condition.
Example: {'h00': [('Ax', 'n1', {'label': 'h00[]'})]}
-
cadbiom_cmd.solution_sort.
solutions_2_graphs
(output_dir, model_file, path)[source]¶ Entry point for solutions_2_graphs
Create GraphML formatted files containing a representation of the trajectories for each solution in complete MAC files (*mac_complete files).
This is a function to visualize paths taken by the solver from the boundaries to the entities of interest.
This function tests if the given path is a directory or a file.
Parameters: - output_dir (<str>) – Output path.
- model_file (<str>) – Filepath of the model.
- path (<str>) – Filepath/directory of a/many complete solutions files.
-
cadbiom_cmd.solution_sort.
solutions_sort
(path)[source]¶ Entry point for sorting solutions.
Read one or more solution files (*mac* files) and sort all frontier places/boundaries in alphabetical order.
This function tests if the given path is a directory or a file.
Warning
The files will be modified in place.
Param: Filepath or directory path containing Cadbiom solutions.
Type: <str>
-
cadbiom_cmd.solution_sort.
sort_solutions_in_file
(filepath)[source]¶ Sort all solutions in the given file in alphabetical order.
Warning
The file is modified in place.
Param: Filepath to be opened and in which solutions will be sorted.
Type: <str>
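The sorting applied to each solution can be sketched on a list of lines. This is an illustration under the assumption that lines starting with '%' (events of a step) must be left untouched; `sort_solution_line` is a hypothetical helper, and rewriting the file in place is not shown.

```python
def sort_solution_line(line):
    """Sort the frontier places of a solution line in alphabetical order."""
    if line.startswith("%"):
        # Assumption: step lines are kept as they are
        return line
    return " ".join(sorted(line.split()))


lines = ["Bx Ax", "% h2 h00", "% h3", "Zx Cx Ax"]
sorted_lines = [sort_solution_line(line) for line in lines]
```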
-
cadbiom_cmd.solution_sort.
transpose_csv
(input_file=u'occurrence_matrix.csv', output_file=u'occurrence_matrix_t.csv')[source]¶ Useful function to transpose a csv file x,y => y,x
Note
The csv file must be semicolon ‘;’ separated.
Parameters: - input_file (<str>) – Input file.
- output_file (<str>) – Output file transposed.
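Transposing a semicolon-separated CSV can be sketched with the standard csv module and zip(). The example works on in-memory text instead of files; `transpose_csv_text` is a hypothetical helper, not the function documented above.

```python
import csv
import io


def transpose_csv_text(text, delimiter=";"):
    """Return the given CSV text with rows and columns swapped."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    output = io.StringIO()
    writer = csv.writer(output, delimiter=delimiter, lineterminator="\n")
    # zip(*rows) turns columns into rows; it stops at the shortest row
    writer.writerows(zip(*rows))
    return output.getvalue()


matrix = "patterns;Ax;Bx\nquery_1;2;0\nquery_2;1;3\n"
transposed = transpose_csv_text(matrix)
```

Note that zip() silently truncates to the shortest row, so this sketch expects every row to have the same number of fields.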
-
cadbiom_cmd.solution_sort.
write_json
(output_dir, file_path, file_suffix, data)[source]¶ Write decompiled solutions to a JSON formatted file
Called by queries_2_json() (TODO) and queries_2_common_graph()
Parameters: - output_dir (<str>) – Output directory
- file_path (<str>) – Filepath of the original solution file. We extract the basename in order to name the JSON file.
- file_suffix (<str>) – String added to the solution filename. Ex: filename + file_suffix + “.json”
- data (<list> or <dict> or <whatever>) – Data to be serialized in JSON
Search Minimal Accessibility Conditions¶
Simulation of the system until some halting condition (given with the final property) is satisfied.
-
class
cadbiom_cmd.solution_search.
ErrorReporter
[source]¶ Cf class CompilReporter(object): gt_gui/utils/reporter.py
-
cadbiom_cmd.solution_search.
compute_combinations
(final_properties)[source]¶ Return all combinations of final properties.
Note
(in case of input_file and combinations set).
Param: List of final properties.
Type: <list>
Returns: List of str. Each str is a combination of final_properties linked by a logical ‘and’.
Example: ('TGFB1', 'COL1A1'), ('TGFB1', 'decorin') gives: ['TGFB1 and COL1A1', 'TGFB1 and decorin']
Return type: <list <str>>
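The last step of the example, turning tuples of properties into query strings, can be sketched as follows. How the tuples themselves are produced from the input list is not specified here, so `join_final_properties` (a hypothetical helper) takes them as given.

```python
def join_final_properties(combinations):
    """Link the properties of each combination with a logical 'and'."""
    return [" and ".join(combination) for combination in combinations]


queries = join_final_properties([("TGFB1", "COL1A1"), ("TGFB1", "decorin")])
```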
-
cadbiom_cmd.solution_search.
compute_macs
(params)[source]¶ Launch Cadbiom search of MACs (Minimal Activation Conditions).
This function is called once or several times, depending on whether multiprocessing is needed (Cf launch_researchs()).
Note
Previous result files will be deleted.
-
cadbiom_cmd.solution_search.
detect_model_type
(mclanalyser, filepath)[source]¶ Return the function to use to load the model.
The detection is based on the file extension.
- bcx file: Build an MCLAnalyser from a .bcx file:
- build_from_chart_file()
- cal file: Build an MCLAnalyser from a .cal file of PID database
- build_from_cadlang()
- xml file: Build an MCLAnalyser from a .xml file of PID database:
- build_from_pid_file()
Parameters: - mclanalyser (<MCLAnalyser>) – MCLAnalyser.
- filepath (<str>) – File that contains the model.
Returns: The function to use to read the given file.
Return type: <func>
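The extension-based dispatch listed above can be sketched with a lookup table. The three method names come from the list above; `FakeAnalyser` is a hypothetical stand-in for MCLAnalyser used only to keep the example self-contained, and raising ValueError for unknown extensions is an assumption.

```python
import os

# Extension of the model file -> name of the loading method
LOADER_NAMES = {
    ".bcx": "build_from_chart_file",
    ".cal": "build_from_cadlang",
    ".xml": "build_from_pid_file",
}


class FakeAnalyser(object):
    """Hypothetical stand-in for MCLAnalyser, for illustration only."""

    def build_from_chart_file(self, filepath):
        return ("chart", filepath)

    def build_from_cadlang(self, filepath):
        return ("cadlang", filepath)

    def build_from_pid_file(self, filepath):
        return ("pid", filepath)


def select_loader(analyser, filepath):
    """Return the bound method to use to load the given model file."""
    extension = os.path.splitext(filepath)[1].lower()
    if extension not in LOADER_NAMES:
        # Assumption: unknown extensions are rejected
        raise ValueError("Unsupported model file: {}".format(filepath))
    return getattr(analyser, LOADER_NAMES[extension])


loader = select_loader(FakeAnalyser(), "model.bcx")
result = loader("model.bcx")
```

Returning the bound method rather than calling it lets the caller decide when the model is actually loaded.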
-
cadbiom_cmd.solution_search.
find_mac
(mcla, mac_file, mac_step_file, mac_complete_file, steps, final_prop, start_prop, inv_prop, previous_frontier_places)[source]¶ Search for 1 solution, save timings, save frontiers, and return it with the current step (deprecated, see find_macs()).
For every new solution, the system is reinitialized, and a satisfiability test is made on a new query to evaluate the minimal number of steps for reachability.
The side effect is that this process is generally expensive, and that parsing the properties in text format (for example, the logical formulas of the frontier places of the previous solutions) is very expensive because it is done by the ANTLR grammar.
Parameters: previous_frontier_places (<set <tuple <str>>>) – Set of frontier places tuples from previous solutions. These tuples will be banned from the future solutions.
Returns: A tuple of activated frontiers and the current step. None if there is no new solution or if the problem is not satisfiable.
-
cadbiom_cmd.solution_search.
find_macs
(mcla, mac_file, mac_step_file, mac_complete_file, steps, final_prop, start_prop, inv_prop, limit, current_nb_sols, previous_frontier_places)[source]¶ Search for many solutions, save timings, and save frontiers.
For every new solution, the system is NOT reinitialized, and a satisfiability test is made ONLY when there is no solution for the current step. This test is made to evaluate the minimal number of steps for reachability.
Unlike find_mac(), this function is autonomous and takes into account the limitation of the number of solutions.
Todo
Handle the all_macs flag like the old method with find_mac(). It is not used very often, but it can be useful sometimes…
Parameters: - limit (<int>) – Limit the number of solutions.
- current_nb_sols (<int>) – The current number of solutions already found. This number is used to limit the number of searched solutions.
- previous_frontier_places (<set <tuple <str>>>) – Set of frontier places tuples from previous solutions. These tuples will be banned from the future solutions.
Returns: None
-
cadbiom_cmd.solution_search.
get_dimacs_start_properties
(mcla, previous_frontier_places)[source]¶ Translate frontier places to their numerical values thanks to the current unfolder.
It’s much more efficient than using the ANTLR grammar to parse formulas for each new query.
Returns: List of previous solutions (list of negative values of frontier places). Ex: [[-1, -2], [-2, -3], …]
Return type: <list <list <int>>>
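The translation described above can be sketched with a plain dictionary standing in for the unfolder. The mapping `place_codes` and the helper name `to_negative_clauses` are assumptions made for this example; the real numerical values come from the current unfolder.

```python
def to_negative_clauses(previous_frontier_places, place_codes):
    """Turn each previous solution into a list of negated numeric codes."""
    return [
        [-place_codes[place] for place in solution]
        for solution in previous_frontier_places
    ]


# Hypothetical numbering of the frontier places.
place_codes = {"A": 1, "B": 2, "C": 3}
# A list is used here (instead of a set) to keep the output order predictable.
clauses = to_negative_clauses([("A", "B"), ("B", "C")], place_codes)
print(clauses)
```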
-
cadbiom_cmd.solution_search.
logical_operator
(elements, operator)[source]¶ Join elements with the given logical operator.
Parameters: - arg1 (<list>) – Iterable of elements to join with a logical operator
- arg2 (<str>) – Logical operator to use ‘and’ or ‘or’
Returns: logical_formula: str - AND/OR of the input list
Return type: <str>
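A minimal sketch of such a join is shown below. The name `join_with_operator` is hypothetical, and whether the real function validates the operator or adds parentheses is not stated here, so both choices are assumptions of this example.

```python
def join_with_operator(elements, operator):
    """Join elements with ' and ' or ' or '."""
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    return " {} ".format(operator).join(elements)


print(join_with_operator(["A", "B", "C"], "and"))
```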
-
cadbiom_cmd.solution_search.
make_logical_formula
(previous_frontier_places, start_prop)[source]¶ Make a logical formula based on previous results of MAC.
The aim is to exclude the previous solutions.
- 1 line:
"A B" => (A and B)
- another line:
"B C" => (B and C)
- merge all lines:
(A and B) or (B and C)
- forbid all combinations:
not((A and B) or (B and C))
Parameters: - arg1 (<set>) – Set of previous frontier places (previous solutions).
- arg2 (<str>) – Original property (constraint) for the solver.
Returns: A logical formula which excludes all the previous solutions.
Return type: <str>
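The construction listed above can be sketched as follows. The helper name `exclude_previous_solutions` is hypothetical, and the way the original property is combined with the exclusion (a simple 'and') is an assumption of this example, not a description of the actual implementation.

```python
def exclude_previous_solutions(previous_frontier_places, start_prop=""):
    """Build a formula that forbids every previous solution."""
    if not previous_frontier_places:
        return start_prop
    # One conjunction per previous solution; sorted for a stable output.
    conjunctions = [
        "(" + " and ".join(solution) + ")"
        for solution in sorted(previous_frontier_places)
    ]
    exclusion = "not(" + " or ".join(conjunctions) + ")"
    if start_prop:
        # Assumption: the original constraint is kept with a logical 'and'.
        return "({}) and {}".format(start_prop, exclusion)
    return exclusion


formula = exclude_previous_solutions({("A", "B"), ("B", "C")})
print(formula)
```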
-
cadbiom_cmd.solution_search.
read_mac_file
(file)[source]¶ Return the frontier places already found in a mac file.
Note
Use make_logical_formula() to get the new start_prop of the run.
Parameters: file (<str>) – Mac file of a previous run.
Returns: A set of frontier places.
Return type: <set>
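Reading such a file could look like the sketch below. It assumes one solution per line with frontier places separated by whitespace (as in the "A B" lines used by make_logical_formula()); the helper name `read_solutions` and the temporary file are only there for the example.

```python
import os
import tempfile


def read_solutions(filepath):
    """Return a set of tuples, one tuple of frontier places per line."""
    with open(filepath) as handle:
        return {tuple(line.split()) for line in handle if line.strip()}


# Write a small throwaway file to demonstrate the parsing.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("A B\nB C\n\n")
    path = handle.name

solutions = read_solutions(path)
os.remove(path)
print(sorted(solutions))
```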
-
cadbiom_cmd.solution_search.
search_entry_point
(model_file, mac_file, mac_step_file, mac_complete_file, mac_strong_file, steps, final_prop, start_prop, inv_prop, all_macs, continue_run, limit)[source]¶ Search solutions
Parameters: - model_file (<str>) – Model file (bcx, xml, cal).
- mac_file (<str>) – File used to store Minimal Activation Condition (MAC/CAM).
- mac_step_file (<str>) – File used to store Minimal step numbers for each solution.
- mac_complete_file (<str>) – File used to store MAC & trajectories.
- mac_strong_file (<str>) –
???
- steps (<int>) – Maximal steps to reach the solutions.
- final_prop (<str>) – Formula: Property that the solver looks for.
- start_prop (<str>) – Formula: Property that will be part of the initial state of the model. In concrete terms, some entities can be activated by this mechanism without modifying the model.
- inv_prop (<str>) – Formula: Invariant property that will always occur during the simulation. The given logical formula will be checked at each step of the simulation.
- all_macs (<boolean>) –
If set to True (not default), search all macs with a number of steps less than or equal to the maximum defined with the argument steps. If set to False: the solver will search all solutions with the minimum number of steps found in the first returned solution.
Example:
- all_macs = False, steps = 10: the first solution is found with 4 steps; the next solution will be searched with a maximum of 4 steps.
- all_macs = True, steps = 10: the first solution is found with 4 steps; the next solution is not reachable with 4 steps but with 5 steps (which is still less than 10 steps); the solution for 5 steps is returned.
- continue_run (<boolean>) – If set to True (not default), macs from a previous run will be reloaded.
- limit (<int>) – Limit the number of solutions.
-
cadbiom_cmd.solution_search.
solutions_search
(params)[source]¶ Launch the search for Minimum Activation Conditions (MAC) for entities of interest.
- If there is no input file, there will be only one process.
- If an input file is given, there will be 1 process per line (per logical formula on each line).
Make an interaction graph based on molecules of interest¶
This module groups functions directly related to the design of an interaction weighted graph based on the search of molecules of interest.
Entry point: json_2_interaction_graph().
-
cadbiom_cmd.interaction_graph.
build_graph
(output_dir, all_genes, all_stimuli, genes_interactions, stimulis_interactions, genes_stimuli_interactions, molecule_stimuli_interactions)[source]¶ Make an interaction weighted graph based on the search of molecules of interest
Edges: - gene - gene: Two genes present simultaneously in a solution
- stimulus - stimulus: Two stimuli present simultaneously in a solution
- gene - stimulus: One gene and one stimulus present simultaneously in a solution (deprecated)
- molecule of interest - stimulus: A molecule of interest in a trajectory related to a solution that contains a stimulus.
Legend of the edges: - gene - gene: red
- stimulus - stimulus: blue (deprecated)
- gene - stimulus: red
- molecule of interest - stimulus: yellow
Legend of the nodes: - genes: red
- stimuli: blue
- molecules of interest: yellow
Parameters: - output_dir (<str>) – Output path.
- all_genes (<set>) – All genes in all the solutions
- all_stimuli (<set>) – All stimuli in all the solutions
- genes_interactions (<Counter>) – Interactions between genes in the same solution
- stimulis_interactions (<Counter>) – Interactions between stimuli in the same solution
- genes_stimuli_interactions (<Counter>) – Interactions between genes and stimuli in the same solution
- molecule_stimuli_interactions (<Counter>) – Counter of interactions between molecules of interest and frontier places that are not genes (stimuli) in trajectories (i.e.: (molecule, stimulus)).
-
cadbiom_cmd.interaction_graph.
build_interactions
(filtered_macs, binary_interactions)[source]¶ Make binary interactions used by the graph as edges
PS: genes and stimuli are frontier places.
Parameters: - filtered_macs (<tuple <tuple <str>>>) –
All solutions related to the molecules of interest.
(("frontier_1", "frontier_2", "frontier_3"),)
- binary_interactions (<dict <str>: <Counter <str>: <int>>>) –
A dictionary of related frontier places.
# For molecules of interest "A" and "B" {"A": { "frontier_1": 1, "frontier_2": 1, }, "B": { "frontier_3": 1, }, }
Returns: Various Counters of binary interactions:
- all_genes: All genes in all the solutions
- all_stimuli: All stimuli in all the solutions
- genes_interactions: Interactions between genes in the same solution
- stimulis_interactions: Interactions between stimuli in the same solution
- genes_stimuli_interactions: Interactions between genes and stimuli in the same solution
- molecule_stimuli_interactions: Counter of interactions between molecules of interest and frontier places that are not genes (stimuli) in trajectories (i.e.: (molecule, stimulus)).
Return type: <set>, <set>, <Counter>, <Counter>, <Counter>, <Counter>
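Counting pairs of places that appear in the same solution can be sketched with `collections.Counter` and `itertools.combinations`. The `is_gene` predicate (a "_gene" suffix) and the helper name `count_pairs` are assumptions made for this example; how the module actually separates genes from stimuli is not described here.

```python
from collections import Counter
from itertools import combinations


def is_gene(place):
    # Assumption for this sketch only: genes carry a "_gene" suffix.
    return place.endswith("_gene")


def count_pairs(filtered_macs):
    """Count gene-gene and stimulus-stimulus pairs seen in the same solution."""
    genes_interactions = Counter()
    stimuli_interactions = Counter()
    for solution in filtered_macs:
        # Sorting makes ("a", "b") and ("b", "a") count as the same pair.
        genes = sorted(place for place in solution if is_gene(place))
        stimuli = sorted(place for place in solution if not is_gene(place))
        genes_interactions.update(combinations(genes, 2))
        stimuli_interactions.update(combinations(stimuli, 2))
    return genes_interactions, stimuli_interactions


macs = (("a_gene", "b_gene", "s1"), ("a_gene", "b_gene", "s1", "s2"))
genes_pairs, stimuli_pairs = count_pairs(macs)
print(genes_pairs, stimuli_pairs)
```

The counts can then serve as edge weights in the graph.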
-
cadbiom_cmd.interaction_graph.
filter_trajectories
(trajectories, molecules_of_interest)[source]¶ Get solutions and count frontier places related to the given molecules of interest.
Parameters: - trajectories (<generator <tuple <tuple>, <set>>>) –
A generator of tuples with tuple of frontier places as keys and set of places involved in transitions as values.
(("Ax", "Bx"), {"n3", "Bx"})
- molecules_of_interest (<tuple>) – Iterable of molecules of interest.
Returns: A tuple of all solutions related to the molecules of interest, and a dictionary of related frontier places and their occurrences for each molecule of interest.
# For molecules of interest "A" and "B" ((("frontier_1", "frontier_2", "frontier_3"),), {"A": { "frontier_1": 1, "frontier_2": 1, }, "B": { "frontier_3": 1, }, })
Return type: <tuple <tuple <tuple <str>>>, <dict <str>: <Counter <str>: <int>>>>
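The filtering can be sketched as below. This example assumes that a solution is "related" to a molecule when the molecule appears among the places of its trajectory; that criterion and the helper name `select_solutions` are assumptions, not a description of the actual implementation.

```python
from collections import Counter, defaultdict


def select_solutions(trajectories, molecules_of_interest):
    """Keep solutions whose trajectory contains a molecule of interest."""
    selected = set()
    related = defaultdict(Counter)
    for frontier_places, places in trajectories:
        for molecule in molecules_of_interest:
            if molecule in places:
                selected.add(frontier_places)
                # Count each frontier place of the solution for this molecule.
                related[molecule].update(frontier_places)
    return tuple(sorted(selected)), dict(related)


trajectories = [
    (("Ax", "Bx"), {"n3", "Bx"}),
    (("Cx",), {"n4"}),
]
solutions, frontiers = select_solutions(iter(trajectories), ("n3",))
print(solutions, frontiers)
```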
Read decompiled solutions files (*.json* files)
This function tests whether the given path is a directory or a file.
Parameters: path (<str>) – Filepath/directory of a decompiled JSON file.
Returns: A generator of tuples with tuple of frontier places as keys and set of places involved in transitions as values. (("Ax", "Bx"), {"n3", "Bx"})
Return type: <generator <tuple <tuple>, <set>>>
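The file-or-directory test described above can be sketched as follows. Only the path handling is shown; the helper name `list_json_files` and the "*.json" pattern used for directories are assumptions of this example.

```python
import glob
import os
import shutil
import tempfile


def list_json_files(path):
    """Return the JSON files designated by a file path or a directory."""
    if os.path.isfile(path):
        return [path]
    if os.path.isdir(path):
        return sorted(glob.glob(os.path.join(path, "*.json")))
    raise ValueError("Not a file or a directory: {}".format(path))


# Throwaway directory with two JSON files and one unrelated file.
directory = tempfile.mkdtemp()
for name in ("a.json", "b.json", "notes.txt"):
    with open(os.path.join(directory, name), "w") as handle:
        handle.write("[]")

found = [os.path.basename(p) for p in list_json_files(directory)]
single = list_json_files(os.path.join(directory, "a.json"))
shutil.rmtree(directory)
print(found)
```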
Get frontier places and other places involved in transitions.
Parameters: file_path (<str>) – Path of a JSON file; this file is generated by convert_solutions_to_json(). A solution is composed of steps with events, composed of transitions: [{ "solution": "Ax Bx", "steps": [ [ { "event": "_h_2", "transitions": [{ "ext": "n3", "ori": "Bx" }] }, ], ] }]
Returns: A generator of tuples with tuple of frontier places as keys and set of places involved in transitions as values.
(("Ax", "Bx"), {"n3", "Bx"})
Return type: <generator <tuple <tuple>, <set>>>
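Extracting these tuples from the JSON structure shown above can be sketched with the standard `json` module. The key names come from the documented example; the helper name `iter_solutions` is hypothetical, and the sketch reads from a string rather than a file to stay self-contained.

```python
import json


def iter_solutions(json_text):
    """Yield (frontier places, places involved in transitions) per solution."""
    for solution in json.loads(json_text):
        frontier_places = tuple(solution["solution"].split())
        places = set()
        for step in solution["steps"]:
            for event in step:
                for transition in event["transitions"]:
                    places.add(transition["ori"])
                    places.add(transition["ext"])
        yield frontier_places, places


document = """
[{
    "solution": "Ax Bx",
    "steps": [
        [
            {"event": "_h_2", "transitions": [{"ext": "n3", "ori": "Bx"}]}
        ]
    ]
}]
"""
results = list(iter_solutions(document))
print(results)
```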
-
cadbiom_cmd.interaction_graph.
json_2_interaction_graph
(output_dir, molecules_of_interest, path)[source]¶ Entry point for json_2_interaction_graph
Read decompiled solutions files (*.json* files produced by the directive queries_2_json) and make a graph of the relationships between one or more molecules of interest, the genes and other frontier places/boundaries found among all the solutions.
More information about the graph and its legend: build_graph().
Parameters: - output_dir (<str>) – Output path.
- molecules_of_interest (<tuple>) – Iterable of molecules of interest.
- path (<str>) – Filepath/directory of a JSON solution file.
Make heatmaps¶
Module used to create a hierarchically-clustered heatmap of boundaries.
-
cadbiom_cmd.queries_2_clustermap.
draw_matrix_heatmap
(df, filepath)[source]¶ Draw and save clustermap from the given dataframe
Parameters: - df (<pandas.core.frame.DataFrame>) – Pandas dataframe
- filepath (<str>) – Filepath of the matrix. Used to build the SVG file.
-
cadbiom_cmd.queries_2_clustermap.
open_dataframe
(filepath)[source]¶ Get Pandas dataframe from CSV file
Because yes, pandas knows to open a CSV file (not like R). It’s awesome. Don’t teach this in bio-info please. You should always prefer complex and legacy technologies it makes you smart (especially for the first ones ><).
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
Returns: Pandas dataframe Return type: <pandas.core.frame.DataFrame>
-
cadbiom_cmd.queries_2_clustermap.
payload
(output_dir, filepath)[source]¶ Make a clustermap based on an occurrence matrix for the given solution file
Parameters: - output_dir (<str>) – Output path.
- filepath (<str>) – Solution filepath.
-
cadbiom_cmd.queries_2_clustermap.
queries_2_clustermap
(output_dir, path, *args, **kwargs)[source]¶ Entry point for queries_2_clustermap
Create a hierarchically-clustered heatmap of boundaries in mac files.
Parameters: - output_dir (<str>) – Output path.
- path (<str>) – Filepath/directory of a/many complete solutions files.
-
cadbiom_cmd.queries_2_clustermap.
write_matrix
(filepath, output_dir)[source]¶ Make an occurrence matrix of boundaries found in the given solution file
Example of CSV produced:
- Columns: Frontier places
- Lines: Solution with a ‘1’ in columns corresponding to an occurrence of the frontier place.
solution_number;boundary_1;boundary_2;...
1;0;1;...
2;1;0;...
Parameters: - filepath (<str>) – Solution filepath.
- output_dir (<str>) – Output path.
Returns: Filepath of the CSV file produced. Filename is of the form <solution_file>_sol_matrix.csv
Return type: <str>
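Building such an occurrence matrix can be sketched with the standard `csv` module. The helper names `occurrence_rows` and `to_csv` are hypothetical, solutions are given directly as sets instead of being read from a solution file, and the output goes to a string rather than to the `_sol_matrix.csv` file.

```python
import csv
import io


def occurrence_rows(solutions):
    """Return a header plus one 0/1 row per solution."""
    boundaries = sorted(set().union(*solutions))
    rows = [["solution_number"] + boundaries]
    for number, solution in enumerate(solutions, 1):
        rows.append(
            [number] + [1 if boundary in solution else 0 for boundary in boundaries]
        )
    return rows


def to_csv(rows):
    """Serialize rows with ';' as separator, as in the example above."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(occurrence_rows([{"boundary_2"}, {"boundary_1"}]))
print(csv_text)
```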