Documentation of the package for developers
biopax_converter
This module is used to translate BioPAX data to CADBIOM models.
-
biopax2cadbiom.biopax_converter.
add_cadbiom_names_to_entities
(dictPhysicalEntity)[source]¶ Fill ‘cadbiom_names’ attribute of entities.
The aim is to have the list of elements contained in each entity, along with their names.
We process essentially entities with subunits: components or members (complexes or classes).
The attribute ‘cadbiom_names’ corresponds to a list of unique cadbiom IDs for the entity (Complex, Class). Each member of the list is the unique cadbiom ID of each subcomponent present in the attribute ‘flat_components’.
Warning
To fill ‘cadbiom_names’, we first handle complexes that can be classes; BUT classes are not necessarily complexes (without ‘flat_components’), so a recursive decomposition is made. For that, see
get_cadbiom_names()
Note
Because complexes are already developed in developComplexEntity(), entities of this type do not have to be decompiled recursively here.
Parameters: dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
-
biopax2cadbiom.biopax_converter.
add_conditions_to_reactions
(dictReaction, dictPhysicalEntity, dictControl)[source]¶ Elaborate condition for each event attached to a reaction.
Note
Condition: i.e. guard of transition in Cadbiom formalism.
Parameters: - dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
- dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
-
biopax2cadbiom.biopax_converter.
add_controllers_to_reactions
(dictReaction, dictControl)[source]¶ Fill the attribute controllers of Reaction objects with Controls objects.
Note
Thanks to filter_controls(), we have only entities; each controller (control.controllers) is an entity (not a pathway), so only entities control reactions.
Note
The controllers attribute of a Reaction corresponds to a set of controller entities involved in it.
Parameters: - dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
- dictControl (<dict <str>: <Control>> keys: uris; values control objects) – Dictionary of biopax controls, created by the function query.get_biopax_controls()
-
biopax2cadbiom.biopax_converter.
add_locations_to_entities
(dictPhysicalEntity, dictLocation)[source]¶ Add Location objects to PhysicalEntities
-
biopax2cadbiom.biopax_converter.
add_modifications_features_to_entities
(dictPhysicalEntity, dictModificationFeatures)[source]¶ Add modifications and their number to the entity name
-
biopax2cadbiom.biopax_converter.
add_reactions_and_controllers_to_entities
(dictReaction, dictControl, dictPhysicalEntity)[source]¶ Fill the attribute reactions of PhysicalEntity objects with Reactions and Controls objects.
Note
The reactions attribute corresponds to a set of reactions in which the entity is involved (as controller or participant). We use this attribute in order to know if complexes have to be deconstructed (only if a subentity is used elsewhere in a reaction).
Note
Supported roles in reactions are:
- productComponent
- participantComponent
- leftComponents
- rightComponents
- controller of
Thanks to filter_controls(), we have only entities; each controller (control.controllers) is an entity (not a pathway), so only entities control reactions. Empty controllers of Control objects shouldn’t happen since this attribute is not optional in the SPARQL query.
Some controlled elements can also be controls (cf. the Modulation class in some databases); this has nothing to do with this function. See add_controllers_to_reactions() and get_control_group_condition() instead.
=> We just remove controllers that aren’t in dictPhysicalEntity, and Controls that have no controlled reaction (but another control).
Parameters: - dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
- dictControl (<dict <str>: <Control>> keys: uris; values control objects) – Dictionary of biopax controls, created by the function query.get_biopax_controls()
- dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
-
biopax2cadbiom.biopax_converter.
add_unique_cadbiom_name_to_entities
(dictPhysicalEntity)[source]¶ Add cadbiom_name attribute to entities in dictPhysicalEntity.
Note
The attribute cadbiom_name corresponds to a unique cadbiom ID for the entity (Protein, Complex, Class, etc.).
Parameters: dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
-
biopax2cadbiom.biopax_converter.
add_xrefs_to_entities
(dictPhysicalEntity, dictEntities_db_refs)[source]¶ Add xrefs to entities
Parameters: dictEntities_db_refs (<dict <str>: <dict <str>: <list>>>) – Dictionary of entityRefs. keys: uris; values: dict of databases keys: database names; values: ids
-
biopax2cadbiom.biopax_converter.
assign_missing_names
(dictPhysicalEntity)[source]¶ Assign an arbitrary name to entities without displayName
The chosen name is the first among the sorted synonyms.
Some files do not declare the displayName attribute of their entities. This has important consequences on the merge of similar entities because this process uses this attribute to detect them.
cf. CellDesigner, Curie files.
-
biopax2cadbiom.biopax_converter.
build_cadbiom_name
(entity, synonym=None)[source]¶ Get entity name formatted for Cadbiom.
Parameters: - entity (<PhysicalEntity>) – PhysicalEntity for which the name will be encoded.
- synonym (<str>) – (Optional) Synonym that will be used instead of the name of the given entity.
Returns: Encoded name with location if it exists.
Return type: <str>
-
biopax2cadbiom.biopax_converter.
compute_locations_names
(dictLocation, numeric_compartments_names=False)[source]¶ Create a cadbiom ID for each location.
Warning
It updates the key ‘cadbiom_name’ of entities in dictLocation[location].
Parameters: - dictLocation (<dict>) – Dictionary of biopax locations created by query.get_biopax_locations(). keys: CellularLocationVocabulary uri; values: Location object
- numeric_compartments_names (<bool>) – (optional) If True, names of compartments will be based on numeric values instead of their real names.
Returns: Dict of encoded locations. keys: numeric value or real location name; values: Location object
Return type: <dict <str>:<Location>>
-
biopax2cadbiom.biopax_converter.
createControlFromEntityOnBothSides
(dictReaction, dictControl)[source]¶ Remove entities on both sides of reactions and create a control instead.
We believe that entities present in both the reagents and the products are in fact catalysts without which the reaction cannot take place.
We remove this entity from the reaction and add an ACTIVATION controller to the list of BioPAX Controls.
Note
This function must be called before adding reactions to entities.
Parameters: - dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
- dictControl (<dict <str>: <Control>> keys: uris; values control objects) – Dictionary of biopax controls, created by the function query.get_biopax_controls()
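The idea described above can be sketched as follows. This is only an illustration under stated assumptions: reactions and controls are represented by plain dicts, and the names entities_on_both_sides and promote_to_controllers are hypothetical, not part of biopax2cadbiom.

```python
def entities_on_both_sides(left_components, right_components):
    """Return the entities present among both the reagents and the products."""
    return set(left_components) & set(right_components)


def promote_to_controllers(reaction):
    """Remove shared entities from both sides and return ACTIVATION controls.

    `reaction` is a plain dict standing in for a Reaction object (assumption).
    """
    shared = entities_on_both_sides(reaction["left"], reaction["right"])
    reaction["left"] = set(reaction["left"]) - shared
    reaction["right"] = set(reaction["right"]) - shared
    # One hypothetical control record per entity found on both sides.
    return [
        {"controlType": "ACTIVATION", "controlled": reaction["uri"], "controller": uri}
        for uri in sorted(shared)
    ]


reaction = {"uri": "reaction_1", "left": {"A", "E"}, "right": {"B", "E"}}
controls = promote_to_controllers(reaction)
```

Here E appears on both sides of reaction_1, so it is removed from the participants and reported as an activator of the reaction.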
-
biopax2cadbiom.biopax_converter.
detect_members_used
(dictPhysicalEntity, full_graph=False, keepEmptyClasses=False)[source]¶ Set the attribute ‘membersUsed’ of generic entities (classes).
Set of members involved in at least one reaction. Empty set if the entity does not have members.
Warning
A generic entity can be any of the subclasses of PhysicalEntity. Note that complexes are the only entities whose ‘flat_components’ value is ALWAYS != None.
A complex can also be a class, and we check that these entities have no flat_components in develop_complexes().
Parameters: - dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
- full_graph (<bool>) – (optional) Convert all entities to cadbiom node, even the entities that are not used elsewhere.
- keepEmptyClasses – (optional) (deprecated) If some members are not used, we add the entity to the membersUsed attribute with the aim to represent all the members not used. => This will break some conversions and unit tests because the translation implies the removal of genericity.
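A minimal sketch of the selection of used members, assuming that the reactions of each entity are available in a plain dict keyed by URI. The function name members_used and the data layout are hypothetical; the way full_graph is handled here is one possible reading of the parameter description, not the package's code.

```python
def members_used(members, reactions_by_uri, full_graph=False):
    """Return the members that take part in at least one reaction.

    With full_graph=True every member is kept, used or not (assumption).
    """
    if full_graph:
        return set(members)
    return {uri for uri in members if reactions_by_uri.get(uri)}


reactions_by_uri = {"Y": {"reaction_1"}, "Z": set()}
used = members_used({"Y", "Z"}, reactions_by_uri)
everything = members_used({"Y", "Z"}, reactions_by_uri, full_graph=True)
no_members = members_used(set(), reactions_by_uri)
```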
-
biopax2cadbiom.biopax_converter.
developComplexEntity
(complex_entity, dictPhysicalEntity, new_physical_entities)[source]¶ Fill flat_components attribute of the given complex.
Called by develop_complexes().
Search recursively all components of the given complex.
Some complexes have sub-complexes, as in Reactome 56 from PC8. Example:
- Complex_c33f6c2be7551100a54e716b3bf8ec8a:
- Complex_0088fc0fe989a0b0abc3635b20df8d90
- Complex_b87d9cb2e60df79cdde88a9f8f45e80d
Here we handle ONLY COMPLEXES! Even if some complexes have sub-entities that are classes, flat_components contains ONLY uris of atomic entities. ONLY flat_components_primitives contains generic entities. In flat_components_primitives, we just want items (including generic ones) in the same order as any flat_component in flat_components.
—
Some complexes are classes, and some of these classes may have components. In this case, we produce a new Complex (a copy of the class) with only the components of the class, and we erase these components from the class.
- The new complex is added to new_physical_entities and must be added later to dictPhysicalEntity. Its uri is completed with the suffix “_not_class”.
- The class is left in dictPhysicalEntity.
Warning
Complexes are the only entities that have a flat_components attribute set. However, Complexes that are also classes should have an empty flat_components.
Note
Empty complexes (without component) are processed like any basic entity. Cf VirtualCase19: ‘B_bottom’
Todo
When a class occurs multiple times through components of complexes we should remove it and make a set of primitives. This will avoid cartesian product of members, duplication of complexes on useless flat_components. Cf VirtualCase19: ‘B’ class in C_top and C_bottom.
—
Full explanations:
developed_components is a list of tuples that contain combinations of all recursively searched sub-entities in the given complex.
developed_classes is a list of primitives sub-entities in the given complex. Classes are not replaced by their members. Entities are in the same order as in a flat_component. The aim is to dynamically rebuild the flat_component of a complex when we remove genericity in replace_and_build().
Example:
A: complex composed with components:
    B: complex with components:
        W: protein
        X: generic smallmolecule with members:
            Y: smallmolecule (used elsewhere)
            Z: smallmolecule (not used elsewhere)
    C: protein

X is a class that represents 2 smallmolecules: Z and Y

For X: developed_components = [X, Y] (edit: just [Y] now)
For W: developed_components = [W]
So for B:
    developed_components = [[X, Y], [W]] (edit: [[Y], [W]])
    and flat_components = [(X, W), (Y, W)] (edit: [(Y, W)])
For A:
    developed_components = [[C], [(X, W), (Y, W)]] (edit: [[C], [(Y, W)]])
    and flat_components = [(C, X, W), (C, Y, W)]
    (edit: now X is removed, and the final result is [(C, Y, W)])

If Z has been used elsewhere, we would have had the following final result
for developed_components of A: [[C], [(Y, W), (Z, W)]]
and flat_components: [(C, Y, W), (C, Z, W)]

developed_classes = [C, X, W]
flat_components_primitives = [C, X, W]

PS: 'A' can be Complex_6e3d8ef563cbcc0c9e2a4afb2a920c38 (Reactome v56 in PC8);
In this complex, Z is also used, so X is totally removed.
Parameters: - complex_entity (<PhysicalEntity>) – Complex entity
- dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
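The worked example above can be reproduced with a short recursive sketch based on a cartesian product. It is an illustration only: the COMPONENTS and MEMBERS dicts and the develop function are hypothetical stand-ins for the entity objects, and the handling of classes that are themselves complexes is left out.

```python
from itertools import product

# Hypothetical toy model of the example: A = (C, B), B = (X, W), X = {Y, Z}.
COMPONENTS = {"A": ["C", "B"], "B": ["X", "W"]}  # complexes -> components
MEMBERS = {"X": ["Y", "Z"]}                      # classes -> members


def develop(uri, used_elsewhere):
    """Return the flat components (tuples of atomic URIs) of `uri`."""
    if uri in COMPONENTS:
        # One list of alternatives per component, then their cartesian product.
        alternatives = [develop(c, used_elsewhere) for c in COMPONENTS[uri]]
        return [
            tuple(atom for part in combination for atom in part)
            for combination in product(*alternatives)
        ]
    if uri in MEMBERS:
        # A class is replaced by its members that are used elsewhere.
        return [(m,) for m in MEMBERS[uri] if m in used_elsewhere]
    # Atomic entity: a single alternative made of itself.
    return [(uri,)]


only_y = develop("A", used_elsewhere={"Y"})
y_and_z = develop("A", used_elsewhere={"Y", "Z"})
```

With only Y used elsewhere the sketch yields the single flat component (C, Y, W); when Z is also used, a second tuple (C, Z, W) appears, which matches the two situations described in the example.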
-
biopax2cadbiom.biopax_converter.
develop_complexes
(dictPhysicalEntity)[source]¶ Set the attribute ‘flat_components’ of complexes entities.
‘flat_components’ is a list of tuples of component URIs.
This function depends on detect_members_used().
Parameters: dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
-
biopax2cadbiom.biopax_converter.
filter_controls
(controls, pathways_names, blacklisted_entities)[source]¶ Remove pathways and cofactors from controls and keep other entities.
Note
Entities that control pathways are also removed.
Note
We want ONLY entities and by default, there are pathways + entities.
Parameters: - controls (<dict>) – Dict of Controllers. keys: URIs; values: <Control>
- pathways_names (<dict>) – Dict of pathways URIs and names. keys: URIs; values: names (or uri if no name)
- blacklisted_entities (<set>) – set of entity uris blacklisted
Returns: Filtered controllers dict.
Return type: <dict>
-
biopax2cadbiom.biopax_converter.
filter_entities
(dictPhysicalEntity, blacklisted_entities)[source]¶ Remove blacklisted entities from BioPAX entities.
Note
Blacklisted entities are removed from dictPhysicalEntity, from components and from members.
Parameters: - dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
- blacklisted_entities (<set>) – set of entity uris blacklisted
Returns: Dictionary of biopax physicalEntities without blacklisted entities
Return type: <dict <str>: <PhysicalEntity>>
-
biopax2cadbiom.biopax_converter.
filter_reactions
(dictReaction, blacklisted_entities)[source]¶ Remove blacklisted entities from reactions.
Note
Effects:
- productComponent and participantComponent can be set to None
- blacklisted entities are removed from leftComponents and rightComponents
Parameters: - dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
- blacklisted_entities (<set>) – set of entity uris blacklisted
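The two effects listed in the note can be sketched as follows, assuming a reaction is a plain dict whose keys mirror the attribute names above. The function filter_reaction is hypothetical and only illustrates the behaviour.

```python
def filter_reaction(reaction, blacklisted_entities):
    """Strip blacklisted URIs from a reaction stored as a plain dict."""
    # Single-valued roles are reset to None when blacklisted.
    for key in ("productComponent", "participantComponent"):
        if reaction.get(key) in blacklisted_entities:
            reaction[key] = None
    # Multi-valued roles lose their blacklisted URIs.
    for key in ("leftComponents", "rightComponents"):
        reaction[key] = set(reaction.get(key, ())) - blacklisted_entities
    return reaction


reaction = {
    "productComponent": "ATP",
    "participantComponent": "P53",
    "leftComponents": {"A", "ATP"},
    "rightComponents": {"B"},
}
filtered = filter_reaction(reaction, {"ATP"})
```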
-
biopax2cadbiom.biopax_converter.
find_unique_synonyms
(cadbiom_name, entity_uris, unique_cadbiom_names, dictPhysicalEntity)[source]¶ Build unique names for the given uris, having the same cadbiom name.
Note
First, we use synonyms from the BioPAX database to find a unique name. When there are no more usable synonyms to build a unique name, we add a version number based on the given cadbiom name for all the remaining entities.
Note
The merging procedure for similar entities greatly reduces the number of entity groups proposed to this function.
Parameters: - cadbiom_name (<str>) – The redundant cadbiom name
- entity_uris (<set>) – Set of uris of entities having the same name
- unique_cadbiom_names (<set>) – Set of unique cadbiom names already used
- dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
Returns: Dictionary of uris as keys and unique names as values.
Return type: <dict>
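The naming strategy from the first note can be sketched like this. The function unique_names, the synonyms_by_uri layout and the "_v<number>" suffix are all assumptions made for the illustration; the real naming scheme may differ.

```python
def unique_names(cadbiom_name, synonyms_by_uri, used_names):
    """Map each URI to a unique name: synonyms first, then version numbers."""
    names = {}
    version = 1
    for uri in sorted(synonyms_by_uri):
        candidates = [s for s in sorted(synonyms_by_uri[uri]) if s not in used_names]
        if candidates:
            name = candidates[0]
        else:
            # No usable synonym left: fall back to a versioned name.
            name = "{}_v{}".format(cadbiom_name, version)
            while name in used_names:
                version += 1
                name = "{}_v{}".format(cadbiom_name, version)
        used_names.add(name)
        names[uri] = name
    return names


synonyms_by_uri = {"uri_1": {"TP53"}, "uri_2": {"TP53"}, "uri_3": set()}
names = unique_names("p53", synonyms_by_uri, used_names={"p53"})
```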
-
biopax2cadbiom.biopax_converter.
get_cadbiom_names
(entity, dictPhysicalEntity)[source]¶ To be called recursively or by
add_cadbiom_names_to_entities()
Note
See add_cadbiom_names_to_entities() for more information.
Note
The attribute ‘cadbiom_names’ corresponds to a list of unique cadbiom IDs for the entity (Complex, Class). Each member of the list is the unique cadbiom ID of each subcomponent present in the attribute ‘flat_components’.
Parameters: - entity (<PhysicalEntity>) – A PhysicalEntity.
- dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
Returns: Set of cadbiom names for the given entity.
Return type: <set>
-
biopax2cadbiom.biopax_converter.
get_control_group_condition
(controls, dictPhysicalEntity, controlled_controls)[source]¶ Get condition for a group of controllers.
- Activators are linked together by a logical ‘OR’,
- inhibitors are linked together by a logical ‘OR’,
- but sets of activators and inhibitors are linked together by a logical ‘AND’.
Cascades of controls are supported here; Each regulation from a nested condition is linked by an ‘AND’ operator.
Unsupported controlTypes lead to a None condition; i.e. a cascade of controls can be broken if a control has an unknown controlType.
Warning
controlType can be as follows (* are currently supported because they are general terms; others are from EcoCyc and will be logged as errors):
- ACTIVATION*
- INHIBITION*
- INHIBITION-ALLOSTERIC
- INHIBITION-COMPETITIVE
- INHIBITION-IRREVERSIBLE
- INHIBITION-NONCOMPETITIVE
- INHIBITION-OTHER
- INHIBITION-UNCOMPETITIVE
- ACTIVATION-NONALLOSTERIC
- ACTIVATION-ALLOSTERIC
Note
Controllers/classes are processed in get_cadbiom_names(). Here we just use cadbiom_names to distinguish entities.
Parameters: - controls (<set <Control>>) – Set of Control objects for a reaction.
- dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
Returns: Sympy condition or None
Return type: <sympy.core.symbol.Symbol> or <None>
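The combination rules above can be illustrated with a sketch that builds a textual condition instead of a sympy expression, to stay dependency-free. The function control_group_condition and the (controlType, name) pairs are hypothetical, cascades of controls are not handled, and negating the group of inhibitors is an assumption about how inhibition is expressed in a guard.

```python
def control_group_condition(controls):
    """Build a textual condition from (controlType, controller name) pairs.

    Returns None if any control has an unsupported controlType.
    """
    activators, inhibitors = [], []
    for control_type, name in controls:
        if control_type == "ACTIVATION":
            activators.append(name)
        elif control_type == "INHIBITION":
            inhibitors.append(name)
        else:
            return None
    parts = []
    if activators:
        # Activators are linked together by 'or'.
        parts.append("(" + " or ".join(sorted(activators)) + ")")
    if inhibitors:
        # Inhibitors are linked together by 'or'; the group is negated (assumption).
        parts.append("not (" + " or ".join(sorted(inhibitors)) + ")")
    # Both groups are linked together by 'and'.
    return " and ".join(parts) if parts else None


condition = control_group_condition(
    [("ACTIVATION", "A"), ("ACTIVATION", "B"), ("INHIBITION", "C")]
)
unsupported = control_group_condition([("INHIBITION-ALLOSTERIC", "D")])
```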
-
biopax2cadbiom.biopax_converter.
get_pathways_entities
(dictReaction, dictControl, dictPhysicalEntity)[source]¶ This function creates the Dictionary pathwayToPhysicalEntities.
Parameters: - dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
- dictControl (<dict <str>: <Control>> keys: uris; values control objects) – Dictionary of biopax controls, created by the function query.get_biopax_controls()
- dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
Returns: pathwayToPhysicalEntities keys: pathway uris; values: set of entities involved in the pathway.
Return type: <dict <str>: <set>>
-
biopax2cadbiom.biopax_converter.
get_transitions
(dictReaction, dictPhysicalEntity)[source]¶ Return transitions with (ori/ext nodes) and their respective events.
- Types considered as reactions:
- Conversion
- BiochemicalReaction
- ComplexAssembly
- Transport
- TransportWithBiochemicalReaction
- Types considered as regulators:
- Catalysis
- Control
- TemplateReactionRegulation
- Types not supported:
- MolecularInteraction
- Degradation
Warning
dictPhysicalEntity is modified in place. We add “virtual nodes” for genes that are not in BioPAX format.
Todo
handle Degradation types and TRASH nodes => will crash cadbiom writer because they are not entities…
Parameters: - dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
- dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
Returns: Dictionary of transitions and their respective set of events.
- Example:
subDictTransition[(cadbiomL, right)].append({
    'event': transition['event'],
    'reaction': reaction,
    'sympyCond': transitionSympyCond
})
Return type: <dict <tuple <str>, <str>>: <list <dict>>>
-
biopax2cadbiom.biopax_converter.
load_blacklisted_entities
(blacklist_file)[source]¶ Get all URIs of blacklisted elements in the given file.
Note
The csv can be written with the following delimiters: ‘,;’. In the first column we expect the URI; in the second column users can put the corresponding cadbiom name (currently not used).
Parameters: blacklist_file (<str>) – Filename.
Returns: Set of uris.
Return type: <set>
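Reading such a file could look like the sketch below. It assumes one entry per line with no header row, and the function name load_blacklist is hypothetical; only the first column is kept, as described in the note.

```python
import re


def load_blacklist(path):
    """Return the set of URIs found in the first column of the file."""
    uris = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Split on the first ',' or ';' only; the URI is the first field.
            first_field = re.split(r"[,;]", line.strip(), maxsplit=1)[0].strip()
            if first_field:
                uris.add(first_field)
    return uris
```

Splitting with a regular expression rather than a single csv delimiter lets both ‘,’ and ‘;’ be accepted within the same file.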
-
biopax2cadbiom.biopax_converter.
main
(params)[source]¶ Entry point
Here we detect the presence of the pickle backup and its settings. If there is no backup or if the user doesn’t want to use this functionality, queries are made against the triplestore.
Then, we construct a Cadbiom model with all the retrieved data.
-
biopax2cadbiom.biopax_converter.
merge_duplicated_entities
(dictPhysicalEntity, model_path, log_files=True)[source]¶ Merge multiple occurrences of the same entity in the model
The duplicates can come from the BioPAX database, as well as from the process of transferring post-translational modifications of classes to their daughter entities in transfer_class_attributes_on_child_entities().
In order to group the entities, they are ordered according to some of their attributes:
- entityType
- entityRef
- name
- components_uris
- location_uri
- modificationFeatures
3 files are created in this function:
- sort_dumped.csv: Dump of all entities (sorted but not grouped)
- sort_grouped.csv: Dump proposed groups
- sort_grouped_after_merge.csv: Dump definitive groups
Note
About reactions attached to duplicate entities: Reactions from all duplicates are merged together.
Warning
If classes with similar attributes are merged, then we consider that their members are similar. These members are not merged together.
Todo
During the merge of entities, prefer existing uris in the BioPAX model rather than those formed by duplication.
Parameters: - dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
- model_path (<str>) – Filepath of the final model.
Key log_files: (optional) If True, csv files are created. Default: True.
Returns: Dictionary of canonical uris as keys, and lists of non-canonical linked uris as values.
Return type: <dict <list>>
-
biopax2cadbiom.biopax_converter.
shortening_modifications
(modificationFeatures, length=1)[source]¶ Return a short version of all given modification names and occurrences.
Note
Some terms can be corrected before shortening:
- residue modification, inactive: inactive
- residue modification, active: active
Parameters: - modificationFeatures (<Counter>) – Counter of modificationFeatures
- length (<int>) – Length of the shortening; put None for entire strings.
Returns: Short and merged version of the given modificationFeatures.
Return type: <str>
-
biopax2cadbiom.biopax_converter.
sort_callback
(elem)[source]¶ Order of the sort of PhysicalEntities on their attributes
The sort of all entities must respect lexicographic order of all attributes.
=> if the component URIs are not cast into a sorted list, the order is modified, and itertools.groupby will then be fooled:
- [‘W’, ‘X’] < [‘X’, ‘Y’] => True
- {‘X’, ‘W’} < {‘X’, ‘Y’} => False
['W', 'X'] is < to ['X', 'Y']
    Ater;A;['W', 'X'];http://simulated/test#anywhere;
    A;A;['X', 'Y'];http://simulated/test#anywhere;
    Abis;A;['X', 'Y'];http://simulated/test#anywhere;

If we do not cast set into list:
    A;A;['Y', 'X'];http://simulated/test#anywhere;
    Abis;A;['Y', 'X'];http://simulated/test#anywhere;
    Ater;A;['X', 'W'];http://simulated/test#anywhere;
Parameters: elem (<PhysicalEntity>) – PhysicalEntity
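The ordering issue can be checked with a small sketch. Entities are reduced here to (uri, name, components) tuples and sort_key is a hypothetical simplification of the callback; the point is only that sets must be turned into sorted lists before sorting and grouping.

```python
from itertools import groupby

# Lists compare lexicographically, while '<' on sets tests for a proper subset.
lists_ordered = ["W", "X"] < ["X", "Y"]
sets_ordered = {"X", "W"} < {"X", "Y"}

entities = [
    ("A", "A", {"Y", "X"}),
    ("Abis", "A", {"X", "Y"}),
    ("Ater", "A", {"X", "W"}),
]


def sort_key(entity):
    """Sort on (name, sorted component URIs); sets are cast to sorted lists."""
    _, name, components = entity
    return (name, sorted(components))


# groupby only merges adjacent items, so the input must be sorted with the same key.
groups = [
    [uri for uri, _, _ in group]
    for _, group in groupby(sorted(entities, key=sort_key), key=sort_key)
]
```

A and Abis share the same name and components, so they end up in the same group, while Ater is sorted first and kept apart.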
-
biopax2cadbiom.biopax_converter.
transfer_class_attributes_on_child_entities
(entities, dictPhysicalEntity)[source]¶ Transfer modificationFeatures and location of classes on child entities
If a child entity does not have the same attributes as its class, it is inserted in the list of BioPAX entities under a specific (new) URI, with its new inherited attributes. It is possible that an entity describing this state is already in the BioPAX ontology. In this case, the two entities will then be grouped by the function merge_duplicated_entities().
Todo
If the duplicated entity is already in the dictionary:
- it is already used elsewhere in a class. => It must be decompiled even if it does not take part in any reaction.
- Otherwise, remove the reactions. => There is no point in creating entities that are not used in the model.
Note
In a general way, sub-entities are not duplicated if the class doesn’t provide information that is not already in the sub-entity.
We try not to overwrite modifications or location if they are the same. The transfer of similar post-translational modifications AND location is useless. => Avoid the duplication of entities.
However, we cannot exclude that inconsistent/conflicting modifications are applied to the sub-entities, such as ‘residue modification, active’ and ‘residue modification, inactive’.
Note
About reactions attached to duplicate entities: We CAN reset all reactions (the attributes reactions of sub-entities) involving the entity in its old context (without the transfer of attributes that we operate here). This prevents entities that are not reused anywhere else from appearing in the model. If the entity must be present in the model, it will be decided during the merge by the function merge_duplicated_entities(), which also merges the reactions of the duplicates.
BUT we choose to keep the reactions of the parent entity in order to solve the VirtualCase14 bug. We prefer to have more entities than false transitions in the final model.
There are four cases to consider about this problem:
none of the duplicates contains a reaction. => the merged entity will be absent from the model
the duplicate entity has no reaction but the duplicate already in the model contains one. => the merged entity will be in the model
the duplicate entity has a reaction but the duplicate already in the model does not contain one.
- if the attribute reactions is not reset, the merged entity will be wrongly in the model because it will be flagged as being reused elsewhere by detect_members_used().
- if the attribute reactions is reset, a side effect described in testCase 14 will appear: the decompilation of classes participating in reactions causes the formation of incorrect relations between the entities of these classes.
Warning
dictPhysicalEntity is modified here.
Parameters: - entities (<dict <str>: <PhysicalEntity>>) – Dictionary of entities to be processed. keys: uris; values entity objects
- dictPhysicalEntity (<dict <str>: <PhysicalEntity>>) – Dictionary of all entities in the model. keys: uris; values entity objects
cadbiom_writer
This module groups functions used to export BioPAX-processed data to a Cadbiom model file.
-
biopax2cadbiom.cadbiom_writer.
build_json_data
(entity)[source]¶ Build JSON data about the given entities.
Note
We can handle reactions from dictReaction, or entities from dictPhysicalEntity
Note
Return these attributes if they exist:
- PhysicalEntity:
- uri
- entityType
- name + synonyms
- entityRef
- location
- modificationFeatures
- members
- reactions
- Reaction:
- uri
- interactionType
Parameters: entity (<str>) – URI of an entity or a list of reactions.
Returns: JSON formatted str.
Return type: <str>
-
biopax2cadbiom.cadbiom_writer.
create_cadbiom_model
(dictTransition, dictPhysicalEntity, dictReaction, model_name, file_path)[source]¶ Export data into a Cadbiom file format.
Parameters: - dictTransition (<dict <tuple <str>, <str>>: <list <dict>>>) –
Dictionary of transitions and their respective set of events.
- Example:
subDictTransition[(cadbiomL, right)].append({
    'event': transition['event'],
    'reaction': reaction,
    'sympyCond': transitionSympyCond
})
- dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
- dictReaction (<dict>) –
- model_name (<str>) – Name of the model.
- file_path (<str>) – File path.
-
biopax2cadbiom.cadbiom_writer.
format_condition
(condition)[source]¶ Build the string representation of the given sympy expression
We just replace logical operators by their textual version in the string representation of the given condition. To see the previous version with tree parsing go to commits <= 9702ea8; this last method is cleaner but recursive and costly for BIG and COMPLEX conditions.
Parameters: condition (<sympy.Or>, <sympy.And>, <sympy.Not>) – Sympy expression
Return type: <str>
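The string substitution described above can be sketched as follows. The sketch works on the string form of a condition, where sympy prints And, Or and Not as ‘&’, ‘|’ and ‘~’; the function name to_textual_operators is hypothetical, and entity names are assumed not to contain these three characters.

```python
def to_textual_operators(expression):
    """Replace '&', '|' and '~' by 'and', 'or' and 'not' in a condition string."""
    return (
        expression.replace("&", "and")
        .replace("|", "or")
        .replace("~", "not ")
    )


text = to_textual_operators("A & (B | C) & ~D")
```

This avoids walking the expression tree, which is the cost the description attributes to the previous recursive version.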
-
biopax2cadbiom.cadbiom_writer.
format_events_and_conditions
(events_conditions)[source]¶ Build the condition of a transition based on the given set of events and conditions
Parameters: events_conditions (<set <tuple <str>, <str>>>) – Set of tuples (event name, sympy condition)
Return type: <str>
-
biopax2cadbiom.cadbiom_writer.
get_names_of_missing_physical_entities
(dictPhysicalEntity)[source]¶ Get URI and cadbiom name for each entity in the model.
Parameters: dictPhysicalEntity (<dict>) – Dictionary of uris as keys and PhysicalEntities as values.
Returns: Dictionary of names as keys and uris as values.
Return type: <dict>
classes
This module describes the classes that wrap the BioPAX formalism.
-
class
biopax2cadbiom.classes.
Control
(uri, interactionType, controlType, reaction_uri, controller)[source]¶ Bases: biopax2cadbiom.classes.GenericBioPAXEntity, biopax2cadbiom.classes.GenericBioPAXInteraction
Class for Control
Attributes:
Parameters: - interactionType (<str>) – Subclass of Control
- controlType (<str>) – type of control (ACTIVATION or INHIBITION). See warning below.
- controlled (<str>) – Control/Reaction that is controlled. (supposed to be a subclass of Interaction; can be None)
- controller (<str>) – Entity that controls the reaction
- controllers (<set>) – Entities that control the reaction. It happens in some borderline cases (KEGG). Todo: Not currently supported!
Optional:
Parameters: evidences (<set>) – set of evidences uris (identify controllers of the same reaction)
-
controlType
¶ Get control type
-
class
biopax2cadbiom.classes.
GenericBioPAXEntity
[source]¶ Bases:
object
Generic class for BioPAX entities which brings basic common functions
-
short_uri
¶ Return the URI without the prefix of the host.
- Example:
- uri: http://pathwaycommons.org/pc2/#Protein_XXX
- short_uri: Protein_XXX
Note
This attribute short_uri is read-only.
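A read-only attribute of this kind is typically written as a property without a setter. The class Entity below is a hypothetical stand-in, and taking the text after the last ‘#’ is an assumption based on the example URI above, not necessarily the rule used by the package.

```python
class Entity:
    """Hypothetical stand-in for a BioPAX entity."""

    def __init__(self, uri):
        self.uri = uri

    @property
    def short_uri(self):
        """URI without the host prefix (text after the last '#')."""
        return self.uri.rsplit("#", 1)[-1]


entity = Entity("http://pathwaycommons.org/pc2/#Protein_XXX")
short = entity.short_uri
```

Because no setter is defined, assigning to entity.short_uri raises an AttributeError.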
-
-
class
biopax2cadbiom.classes.
GenericBioPAXInteraction
[source]¶ Bases:
object
Generic class for BioPAX interactions which brings basic common functions
-
interactionType
¶ Get interaction type
-
-
class
biopax2cadbiom.classes.
Location
(uri, locationTerm)[source]¶ Bases:
biopax2cadbiom.classes.GenericBioPAXEntity
Class for Location
Attributes:
Parameters: - uri (<str>) – Uri of the location.
- locationTerm (<str>) –
Optional:
Parameters: - xrefs (<dict <str>:<set>>) – UnificationXref with dbnames as keys and sets of terms as values
- cadbiom_name (<str>) –
-
class
biopax2cadbiom.classes.
PhysicalEntity
(uri, name, location_uri, entityType, entityRef)[source]¶ Bases:
biopax2cadbiom.classes.GenericBioPAXEntity
Class for PhysicalEntity
Attributes:
Parameters: - uri (<str>) – Uri of the entity.
- name (<str>) – Name of the entity (displayName biopax attribute).
- location_uri (<str>) – Uri of the location of the entity.
- entityType – Type of the entity (Protein, Complex, etc.).
- entityRef (<str>) – Uri of the entity reference.
Optional:
Parameters: - synonyms (<set>) – All alternative names of an entity (name biopax attribute).
- components_uris (<set>) – Uris of the components of a Complex.
- members (<set>) – Uris of the members of a generic entity (class).
- location (<Location>) – Location object related to the entity. This attribute is set by biopax2cadbiom.biopax_converter.add_locations_to_entities().
- xrefs (<dict <str>:<set>>) – UnificationXref with dbnames as keys and sets of terms as values
- reactions (<set>) – Set of reactions where the entity is involved. This attribute is set by biopax2cadbiom.biopax_converter.add_reactions_and_controllers_to_entities().
- membersUsed (<set>) – Set of members of a class involved in at least one reaction. Empty if the entity does not have members. This attribute is set by biopax2cadbiom.biopax_converter.detect_members_used().
- cadbiom_name (<set>) – Unique name of the entity in a Cadbiom model. This attribute is set by add_unique_cadbiom_name_to_entities().
- modificationFeatures (<Counter <str>:<int>>) – Dictionary of modification features; modifications as keys; number of modifications as values. This attribute is set by add_modifications_features_to_entities().
- flat_components (<list>) – Possible components of a complex. If classes are in components_uris, the length of flat_components is > 1. The unique Cadbiom name of a flat component is available at the same index in cadbiom_names. This attribute is set by develop_complexes().
- flat_components_primitives (<list>) – All primitive objects in flat_components. Only intermediate complexes are replaced by their components. Classes and nested classes are not decomposed and are kept as they are. This attribute is used to rebuild a flat_component when the genericity is removed during the duplication of the reactions. This attribute is set by develop_complexes().
- cadbiom_names (<list>) – Possible names of the entity. Based on the flat components for complexes, members for classes and cadbiom_name for simple entities. This attribute is set by add_cadbiom_names_to_entities().
-
add_xref
(dbref, idref)[source]¶ Add xref to the existing xrefs
Parameters: - dbref (<str>) – Name of external database.
- idref (<str>) – Identifier in the given database.
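The xrefs attribute is documented as a dictionary with database names as keys and sets of terms as values, so adding a reference amounts to inserting an identifier into the right set. A minimal sketch under that assumption follows; the class EntityXrefsSketch is hypothetical and the identifiers are illustrative values only, not taken from the package.

```python
from collections import defaultdict


class EntityXrefsSketch(object):
    """Hypothetical container mirroring the documented xrefs layout."""

    def __init__(self):
        # Database names as keys; sets of identifiers as values.
        self.xrefs = defaultdict(set)

    def add_xref(self, dbref, idref):
        # A set silently ignores an identifier that is already present.
        self.xrefs[dbref].add(idref)


entity = EntityXrefsSketch()
entity.add_xref("uniprot knowledgebase", "P04637")
entity.add_xref("uniprot knowledgebase", "P04637")  # duplicate, ignored
entity.add_xref("chebi", "CHEBI:15422")
print(dict(entity.xrefs))
```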
-
entityType
¶ Get entity type
-
is_class
¶ Return True if the object is a class, False otherwise
-
is_complex
¶ Return True if the object is a complex, False otherwise
-
modificationFeatures
¶ Get modification features
Returns: Dictionary of modification features; modifications as keys; number of modifications as values
Return type: <Counter <str>:<int>>
-
class
biopax2cadbiom.classes.
Reaction
(uri, name, interactionType, productComponent, participantComponent)[source]¶ Bases:
biopax2cadbiom.classes.GenericBioPAXEntity
,biopax2cadbiom.classes.GenericBioPAXInteraction
Class for reaction
Attributes:
Parameters: - uri (<str>) – Uri of the reaction.
- name (<str>) – Name of the reaction.
- interactionType (<str>) – Type of the interaction. Subclass of the BioPAX Interaction class.
- productComponent (<str>) –
- participantComponent (<str>) –
Optional:
Parameters: - pathways (<set>) – Set of pathways containing the reaction.
- leftComponents (<set>) – Set of uris of reagents.
- rightComponents (<set>) – Set of uris of products.
- controllers (<set>) – Control entities that control the reaction
- cadbiomSympyCond (<sympy.core.symbol.Symbol>) – Sympy condition.
- event (<str>) – Name of the event (Cadbiom notation)
- complexes (<dict>) – Used during duplication of reactions to replace generic complexes by their unique flat_component.
TODO: Handle conversionDirection attr.
sparql_biopaxQueries¶
This module contains a list of functions to query any SPARQL endpoint with BioPAX data.
-
biopax2cadbiom.sparql_biopaxQueries.
get_biopax_controls
(graph_uris, provenance_uri)[source]¶ Get objects of the Control class and its subclasses in the given graphs
Note
controlType is one of (ACTIVATION, INHIBITION). Please note that only Catalysis is allowed to have a default (unspecified) controlType; because of this, the attribute is optional.
In the near future, a Modulation or any other class created with a controlType of None will not be considered.
Parameters: - graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
- provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns: Dictionary of controls. uris as keys; Control objects as values
Return type: <dict <str>:<Control>>
-
biopax2cadbiom.sparql_biopaxQueries.
get_biopax_locations
(graph_uris)[source]¶ Get Location objects in the given graphs
Parameters: - graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
- provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns: Dictionary of locations. uris as keys; Location objects as values
Return type: <dict <str>:<Location>>
-
biopax2cadbiom.sparql_biopaxQueries.
get_biopax_modificationfeatures
(graph_uris, provenance_uri)[source]¶ Get ModificationFeatures that occur on PhysicalEntities, grouped by entity, modification type and number of modifications per type.
Parameters: - graph_uris – List of RDF graphs that will be queried on the triplestore.
- provenance_uri – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns: A dict of dicts (not Counters)! Each dict contains the modifications as keys and their number as values.
Return type: <dict <dict>>
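Since the function returns plain dicts while PhysicalEntity.modificationFeatures is documented as a Counter, a caller may want to convert the inner dicts. The sketch below assumes that the outer keys are entity uris (the return type above does not state this) and uses made-up sample data.

```python
from collections import Counter

# Hypothetical return value: entity uris as keys, plain dicts as values.
modifications = {
    "http://example.org/#Protein_1": {"phosphorylation": 2, "acetylation": 1},
    "http://example.org/#Protein_2": {"ubiquitination": 1},
}

# Wrap each inner dict in a Counter so that absent modifications count as 0
# instead of raising a KeyError.
modification_counters = {
    uri: Counter(features) for uri, features in modifications.items()
}

print(modification_counters["http://example.org/#Protein_1"]["methylation"])
```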
-
biopax2cadbiom.sparql_biopaxQueries.
get_biopax_parent_pathways
(graph_uris, provenance_uri)[source]¶ Get sets of direct parent pathways for every pathway in the given graphs
Parameters: - graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
- provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns: Dict of parent pathways. Pathways as keys; sets of parent pathways as values
Return type: <dict <str>:<set>>
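The returned dictionary only lists direct parents. If all ancestors of a pathway are needed, the dictionary can be walked iteratively, as in the following sketch. The helper get_all_ancestors is hypothetical and not part of the package, and the pathway identifiers are placeholders.

```python
def get_all_ancestors(pathway, parents):
    """Return every direct or indirect parent of `pathway`.

    `parents` has the documented layout: pathways as keys, sets of direct
    parent pathways as values.
    """
    ancestors = set()
    to_visit = list(parents.get(pathway, ()))
    while to_visit:
        current = to_visit.pop()
        if current in ancestors:
            continue  # already handled; this also protects against cycles
        ancestors.add(current)
        to_visit.extend(parents.get(current, ()))
    return ancestors


parents = {"p:C": {"p:B"}, "p:B": {"p:A"}, "p:A": set()}
print(get_all_ancestors("p:C", parents))
```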
-
biopax2cadbiom.sparql_biopaxQueries.
get_biopax_pathways
(graph_uris, provenance_uri)[source]¶ Extract pathways from the given graphs
Parameters: - graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
- provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns: Dict of pathways URIs and names. keys: URIs; values: names (or uri if no name)
Return type: <dict>
-
biopax2cadbiom.sparql_biopaxQueries.
get_biopax_physicalentities
(graph_uris, provenance_uri)[source]¶ Get objects of the PhysicalEntity class and its subclasses in the given graphs
Note
From the BioPAX documentation, about the use of memberPhysicalEntity:
Using this property is not recommended. memberPhysicalEntity is only defined to support legacy data in certain databases. It is used to define a generic physical entity that is a collection of other physical entities. In general, EntityReference class should be used to create generic groups of physical entities, however, there are some cases where this is not possible, and the property has to be used. For instance, when an entity reference is used to define a generic physical entity with generic features, the generic features of the same type must be grouped. If you do not have grouping information for features of generic physical entities, you cannot use entity reference to define generic physical entities and must use the memberPhysicalEntity property. Another example for using this property is to create generic complexes, which are currently not supported with the EntityReference scheme (there is no “ComplexReference” class).
Parameters: - graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
- provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns: Dictionary of PhysicalEntity. uris as keys; PhysicalEntity objects as values
Return type: <dict <str>:<PhysicalEntity>>
-
biopax2cadbiom.sparql_biopaxQueries.
get_biopax_reactions
(graph_uris, provenance_uri)[source]¶ Query all Interactions of the database, minus Control objects.
Warning
We also get ‘Control’ objects if we query ‘rdfs:subClassOf* biopax3:Interaction’, but these must be retrieved by get_biopax_controls().
Therefore, the controls are removed from the results via MINUS {}.
Note
Control class contains (Catalysis, TemplateReactionRegulation, …)
Note
We correct the BioPAX hierarchy generated by some tools like BiNOM. This tool defines the entire hierarchy of parent classes for each BioPAX object instead of letting users rely on the RDFS reasoner and the rdfs:subClassOf property. As a result, objects are returned as many times as they have parent classes. Fortunately, we remove Control derivatives from Interaction objects. However, Interaction objects are far too generic to be interpreted/used in the program, so we must ensure that objects created here have the most accurate interactionType attribute possible. In practice, Virtuoso returns the most accurate rdf:type property first, then the parent classes (Ex: BiochemicalReaction then Conversion in the case of an object that has both properties). In theory, nothing seems to guarantee that this happens all the time.
Note
conversionDirection and catalysisDirection apply to the Conversion and Catalysis subclasses respectively. Do not forget that the Catalysis direction overrides the Conversion direction. Currently we assume that Conversions are LEFT_TO_RIGHT (although this is not recommended in the standard). Order of priority for directions: catalysisDirection > conversionDirection > spontaneous > thermodynamic constants and FBA analysis
Parameters: - graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
- provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns: Dictionary of reactions. uris as keys; Reaction objects as values
Return type: <dict <str>:<Reaction>>
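The order of priority given in the note on directions can be illustrated with a small helper. This is only a sketch of the first two levels of that order, not the package's behaviour: according to the note, conversions are currently assumed to be LEFT_TO_RIGHT. The function resolve_direction and its arguments are hypothetical, and direction values are treated as opaque strings.

```python
# Default taken from the note: conversions are assumed to be LEFT_TO_RIGHT.
DEFAULT_DIRECTION = "LEFT_TO_RIGHT"


def resolve_direction(catalysis_direction=None, conversion_direction=None):
    """Pick a direction following catalysisDirection > conversionDirection."""
    if catalysis_direction is not None:
        # The Catalysis direction overrides the Conversion direction.
        return catalysis_direction
    if conversion_direction is not None:
        return conversion_direction
    # Lower-priority criteria (spontaneous, thermodynamic constants, FBA)
    # are not modelled here; fall back to the documented assumption.
    return DEFAULT_DIRECTION


print(resolve_direction("RIGHT_TO_LEFT", "LEFT_TO_RIGHT"))
```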
-
biopax2cadbiom.sparql_biopaxQueries.
get_biopax_xrefs
(graph_uris, provenance_uri, database_name=None)[source]¶ Get xrefs of all entities in the given database (if specified)
An Xref is a reference from an instance of a class in the current ontology to an object in an external resource.
An xref can be an instance of PublicationXref, RelationshipXref or UnificationXref.
Warning
WE DO NOT filter the references according to the relation of identity or similarity that they define. i.e, UnificationXref relationships have the same weight as RelationshipXref relationships, and the relationshipType attributes of RelationshipXref objects are not used to show the degree of similarity between the current object and the object in the external database (see the note below).
Note
Classes inherit xref from their members.
Note
Each ontology may name its databases differently. Ex: ‘UniProt’ vs ‘uniprot knowledgebase’, ‘ChEBI’ vs ‘chebi’
Note
Some objects (RelationshipXref, ?) have relationshipType attributes pointing to RelationshipTypeVocabulary objects. These objects use the PSI Molecular Interaction ontology (MI).
Parameters: - graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
- provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns: Dictionary of entityRefs. Keys: uris; values: dicts of databases (keys: database names; values: ids)
Return type: <dict <str>: <dict <str>: <list>>>
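Because database names differ between ontologies (‘UniProt’ vs ‘uniprot knowledgebase’), a caller comparing xrefs across sources may want to normalize the names first. The sketch below is one possible approach and not something the package is known to do; the alias table and both helper functions are hypothetical.

```python
# Hypothetical alias table; extend it with the spellings found in your data.
DATABASE_ALIASES = {
    "uniprot knowledgebase": "uniprot",
    "uniprot": "uniprot",
    "chebi": "chebi",
}


def normalize_database_name(name):
    """Lower-case the name and map known aliases to one canonical form."""
    key = name.strip().lower()
    return DATABASE_ALIASES.get(key, key)


def merge_xrefs(xrefs):
    """Merge the ids of databases that share the same canonical name."""
    merged = {}
    for database, identifiers in xrefs.items():
        canonical = normalize_database_name(database)
        merged.setdefault(canonical, set()).update(identifiers)
    return merged


print(merge_xrefs({"UniProt": ["P1"], "uniprot knowledgebase": ["P2"]}))
```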
-
biopax2cadbiom.sparql_biopaxQueries.
get_graphs_from_triplestore
()[source]¶ Get the list of graphs URIs in the triplestore
Note
The queried graphs are named graphs.
Returns: Iterable of tuples (1 graph URI per tuple)
Return type: <generator>
-
biopax2cadbiom.sparql_biopaxQueries.
get_info_from_triplestore
(graph_uris=[])[source]¶ List graphs and subgraphs from the triplestore and retrieve some metadata
Parameters: graph_uris (<list>) – List of graphs uris (optional)
Returns: Generator of tuples: (graph_uri, provenance_uri, name, dname, comment)
Return type: <generator>
-
biopax2cadbiom.sparql_biopaxQueries.
get_subgraphs_from_triplestore
(graph_uris)[source]¶ Get URIs of BioPAX graphs in the configured triplestore
Note
We assume that graphs are in full BioPAX format, i.e that dataSource attribute is set on entities. That’s the only way to extract a database from another in a merged graph (Cf PathwayCommons).
Note
In practice, name is more precise than displayName.
Note
SPARQL query:
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
SELECT ?graph ?provenance ?name ?dname ?comment
WHERE {
    GRAPH ?graph {
        ?provenance a bp:Provenance.
        OPTIONAL { ?provenance bp:standardName ?name. }
        OPTIONAL { ?provenance bp:displayName ?dname. }
        OPTIONAL { ?provenance bp:comment ?comment. }
    }
}
ORDER BY ?graph ?name
Parameters: graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
Returns: Iterable of tuples. (graph_uri, provenance_uri, name, display_name, comment)
Note
If you get an encoding error in name or comment, please put ‘from __future__ import unicode_literals’ at the beginning of your Python script.
Return type: <generator>
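The three OPTIONAL clauses in the query mean that name, dname and comment may be missing from a result row. The sketch below shows how rows in the standard SPARQL JSON results format could be turned into the documented tuples, with None for unbound variables. It is an illustration only: the helper bindings_to_tuples and the sample data are hypothetical, and the package may process results differently.

```python
# Variable names used in the SELECT clause of the query.
FIELDS = ("graph", "provenance", "name", "dname", "comment")


def bindings_to_tuples(json_results):
    """Yield one tuple per result row; unbound OPTIONAL variables give None."""
    for binding in json_results["results"]["bindings"]:
        yield tuple(
            binding[field]["value"] if field in binding else None
            for field in FIELDS
        )


# Made-up response in the SPARQL JSON results layout.
sample = {
    "head": {"vars": list(FIELDS)},
    "results": {"bindings": [
        {
            "graph": {"type": "uri", "value": "http://example.org/graph"},
            "provenance": {"type": "uri",
                           "value": "http://example.org/#Provenance_1"},
            "name": {"type": "literal", "value": "Example DB"},
        },
    ]},
}

rows = list(bindings_to_tuples(sample))
print(rows)
```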
sparql_wrapper¶
Module used to query SPARQL endpoint.
-
biopax2cadbiom.sparql_wrapper.
auto_add_prefixes
(func)[source]¶ Decorator: Add all prefixes to the SPARQL query passed as the first argument of sparql_query()
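A decorator of this kind can be sketched as follows. This is an assumption about the general technique, not the package's implementation: the prefix table, the name auto_add_prefixes_sketch and the stand-in sparql_query function (which returns the final query text instead of contacting an endpoint) are all hypothetical.

```python
import functools

# Hypothetical prefix table; the package may declare other prefixes.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "bp": "http://www.biopax.org/release/biopax-level3.owl#",
}


def auto_add_prefixes_sketch(func):
    """Prepend PREFIX declarations to the query given as first argument."""
    @functools.wraps(func)
    def wrapper(query, *args, **kwargs):
        header = "\n".join(
            "PREFIX {}: <{}>".format(name, uri)
            for name, uri in sorted(PREFIXES.items())
        )
        return func(header + "\n" + query, *args, **kwargs)
    return wrapper


@auto_add_prefixes_sketch
def sparql_query(query):
    # Stand-in: return the final query instead of sending it to an endpoint.
    return query


final_query = sparql_query("SELECT ?s WHERE { ?s a bp:Pathway . }")
print(final_query)
```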
-
biopax2cadbiom.sparql_wrapper.
load_sparql_endpoint
()[source]¶ Make a connection to SPARQL endpoint & retrieve a cursor.
Returns: SPARQL cursor in version 1. We do not use the SPARQLWrapper2 cursor, which provides the SPARQLWrapper.SmartWrapper.Bindings class to convert JSON from the server.
Return type: <SPARQLWrapper>
-
biopax2cadbiom.sparql_wrapper.
order_results
(query, orderby='?uri', limit=4000)[source]¶ Build a nested query for endpoints with restrictions.
Build the nested query by wrapping the original query in the same SELECT command (minus the useless DISTINCT clause), with the OFFSET and LIMIT clauses at the end. Do not forget to add the ORDER BY at the end of the original query.
http://vos.openlinksw.com/owiki/wiki/VOS/VirtTipsAndTricksHowToHandleBandwidthLimitExceed
https://etl.linkedpipes.com/components/e-sparqlendpointselectscrollablecursor
Warning
WE ASSUME THAT THE SECOND LINE OF THE QUERY CONTAINS THE FULL SELECT COMMAND !!!
Parameters: - query (<str>) – Original normal SPARQL query.
- orderby (<str>) – Order queries by this variable.
- limit (<int>) – Max items queried for 1 block.
Returns: A generator of lines of results.
Return type: <dict>
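The nesting described for order_results can be sketched as a pure function that builds the text of one page of results. This is an assumption-based illustration, not the package's code: the name paginate_query and the offset parameter are hypothetical, the sketch appends the ORDER BY itself, and it relies on the same assumption as the warning above (the second line of the query holds the full SELECT command).

```python
def paginate_query(query, orderby="?uri", limit=4000, offset=0):
    """Wrap `query` in an outer SELECT restricted by OFFSET and LIMIT."""
    lines = query.splitlines()
    # Assumption: line 1 precedes the SELECT (e.g. a PREFIX declaration)
    # and line 2 contains the full SELECT command.
    first_line, select_line = lines[0], lines[1]
    # The outer SELECT repeats the inner one without the DISTINCT keyword.
    outer_select = select_line.replace("DISTINCT ", "")
    inner_query = "\n".join(lines[1:])
    return "\n".join([
        first_line,
        outer_select,
        "WHERE {",
        "{",
        inner_query,
        "ORDER BY " + orderby,  # a stable order keeps the pages consistent
        "}",
        "}",
        "OFFSET {}".format(offset),
        "LIMIT {}".format(limit),
    ])


query = "\n".join([
    "PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>",
    "SELECT DISTINCT ?uri ?name",
    "WHERE {",
    "    ?uri bp:displayName ?name .",
    "}",
])

page = paginate_query(query, limit=2, offset=4)
print(page)
```

Calling the function repeatedly with offset increased by limit each time, until a page comes back empty, would retrieve the full result set block by block.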