Documentation of the package for developers

biopax_converter

This module is used to translate BioPAX data to CADBIOM models.

biopax2cadbiom.biopax_converter.add_cadbiom_names_to_entities(dictPhysicalEntity)[source]

Fill ‘cadbiom_names’ attribute of entities.

The aim is to have the list of elements contained in each entity, together with their names.

We process essentially entities with subunits: components or members (complexes or classes).

The attribute ‘cadbiom_names’ corresponds to a list of unique cadbiom IDs for the entity (Complex, Class). Each member of the list is the unique cadbiom ID of each subcomponent present in the attribute ‘flat_components’.

Warning

To fill ‘cadbiom_names’, we first handle complexes that can be classes; BUT classes are not necessarily complexes (without ‘flat_components’), so a recursive decomposition is made. For that, see get_cadbiom_names()

Note

Because complexes are already developed in developComplexEntity(), entities of this type do not have to be decompiled recursively here.

Parameters:dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
biopax2cadbiom.biopax_converter.add_conditions_to_reactions(dictReaction, dictPhysicalEntity, dictControl)[source]

Build the condition for each event attached to a reaction.

Note

Condition: i.e. guard of transition in Cadbiom formalism.

Parameters:
  • dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
  • dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
biopax2cadbiom.biopax_converter.add_controllers_to_reactions(dictReaction, dictControl)[source]

Fill the attribute controllers of Reaction objects with Controls objects.

Note

Thanks to filter_controls(), we have only entities: each controller (control.controllers) is an entity (not a pathway), so only entities control reactions.

Note

The controllers attribute of a Reaction corresponds to a set of controller entities involved in it.

Parameters:
  • dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
  • dictControl (<dict <str>: <Control>> keys: uris; values control objects) – Dictionary of biopax controls, created by the function query.get_biopax_controls()
biopax2cadbiom.biopax_converter.add_locations_to_entities(dictPhysicalEntity, dictLocation)[source]

Add Location objects to PhysicalEntities

biopax2cadbiom.biopax_converter.add_modifications_features_to_entities(dictPhysicalEntity, dictModificationFeatures)[source]

Add modifications and their number to the entity name

biopax2cadbiom.biopax_converter.add_reactions_and_controllers_to_entities(dictReaction, dictControl, dictPhysicalEntity)[source]

Fill the attribute reactions of PhysicalEntity objects with Reactions and Controls objects.

Note

The reactions attribute corresponds to a set of reactions in which the entity is involved (as controller or participant). We use this attribute in order to know if complexes have to be deconstructed (only if a subentity is used elsewhere in a reaction).

Note

Supported roles in reactions are:

  • productComponent
  • participantComponent
  • leftComponents
  • rightComponents
  • controller of

Thanks to filter_controls(), we have only entities: each controller (control.controllers) is an entity (not a pathway), so only entities control reactions. Empty controllers of Control objects shouldn’t happen since this attribute is not optional in the SPARQL query.

Some controlled elements can also be controls (Cf Modulation class in some databases); This has nothing to do with this function. See add_controllers_to_reactions() and get_control_group_condition() instead.

=> We just remove controllers that aren’t in dictPhysicalEntity, and Controls that do not control a reaction (but another control).

Parameters:
  • dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
  • dictControl (<dict <str>: <Control>> keys: uris; values control objects) – Dictionary of biopax controls, created by the function query.get_biopax_controls()
  • dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
biopax2cadbiom.biopax_converter.add_unique_cadbiom_name_to_entities(dictPhysicalEntity)[source]

Add cadbiom_name attribute to entities in dictPhysicalEntity.

Note

The attribute cadbiom_name corresponds to a unique cadbiom ID for the entity (Protein, Complex, Class, etc.).

Parameters:dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
biopax2cadbiom.biopax_converter.add_xrefs_to_entities(dictPhysicalEntity, dictEntities_db_refs)[source]

Add xrefs to entities

Parameters:dictEntities_db_refs (<dict <str>: <dict <str>: <list>>>) – Dictionary of entityRefs. keys: uris; values: dict of databases keys: database names; values: ids
biopax2cadbiom.biopax_converter.assign_missing_names(dictPhysicalEntity)[source]

Assign an arbitrary name to entities without displayName

The chosen name is the first among the sorted synonyms.

Some files do not declare the displayName attribute of their entities. This has important consequences on the merge of similar entities because this process uses this attribute to detect them.

cf. CellDesigner, Curie files.
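
The naming rule described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the Entity class and the function name are hypothetical stand-ins, and only the name and synonyms attributes are modelled.

```python
class Entity:
    """Hypothetical stand-in for PhysicalEntity (illustration only)."""

    def __init__(self, name=None, synonyms=()):
        self.name = name
        self.synonyms = set(synonyms)


def assign_missing_names_sketch(entities):
    """Give each unnamed entity the first of its sorted synonyms."""
    for entity in entities.values():
        if not entity.name and entity.synonyms:
            # Sorting makes the choice deterministic across runs.
            entity.name = sorted(entity.synonyms)[0]


entities = {
    "uri#1": Entity(synonyms={"TP53", "p53"}),
    "uri#2": Entity(name="MDM2", synonyms={"HDM2"}),
}
assign_missing_names_sketch(entities)
```

Entities that already have a name are left untouched, so the later merge of similar entities sees a stable displayName for every entity that has at least one synonym.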

biopax2cadbiom.biopax_converter.build_cadbiom_name(entity, synonym=None)[source]

Get entity name formatted for Cadbiom.

Parameters:
  • entity (<PhysicalEntity>) – PhysicalEntity for which the name will be encoded.
  • synonym (<str>) – (Optional) Synonym that will be used instead of the name of the given entity.
Returns:

Encoded name with location if it exists.

Return type:

<str>

biopax2cadbiom.biopax_converter.clean_name(name)[source]

Clean name for correct cadbiom parsing.

biopax2cadbiom.biopax_converter.compute_locations_names(dictLocation, numeric_compartments_names=False)[source]

Create a cadbiom ID for each location.

Warning

It updates the key ‘cadbiom_name’ of entities in dictLocation[location].

Parameters:
  • dictLocation (<dict>) – Dictionary of biopax locations created by query.get_biopax_locations(). keys: CellularLocationVocabulary uri; values: Location object
  • numeric_compartments_names (<bool>) – (optional) If True, names of compartments will be based on numeric values instead of their real names.
Returns:

Dict of encoded locations. keys: numeric value or real location name; values: Location object

Return type:

<dict <str>:<Location>>

biopax2cadbiom.biopax_converter.createControlFromEntityOnBothSides(dictReaction, dictControl)[source]

Remove entities on both sides of reactions and create a control instead.

We assume that entities present in both the reactants and the products are in fact catalysts without which the reaction cannot take place.

We remove this entity from the reaction and add an ACTIVATION controller to the list of BioPAX Controls.

Note

This function must be called before adding reactions to entities.

Parameters:
  • dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
  • dictControl (<dict <str>: <Control>> keys: uris; values control objects) – Dictionary of biopax controls, created by the function query.get_biopax_controls()
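
The idea can be sketched with plain sets. This is only an illustration under simplifying assumptions: reactions are reduced to two sets of entity URIs, and the dictionaries standing in for Control objects (and their keys) are hypothetical.

```python
def split_catalysts(left, right):
    """Return (left, right, catalysts) with shared entities moved out.

    Entities found on both sides are treated as catalysts; each one
    would become the controller of an ACTIVATION control.
    """
    catalysts = left & right
    return left - catalysts, right - catalysts, catalysts


left, right, catalysts = split_catalysts({"A", "E"}, {"B", "E"})

# Hypothetical control records; the real code builds Control objects.
controls = [
    {"controlType": "ACTIVATION", "controller": uri, "controlled": "reaction#1"}
    for uri in sorted(catalysts)
]
```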
biopax2cadbiom.biopax_converter.detect_members_used(dictPhysicalEntity, full_graph=False, keepEmptyClasses=False)[source]

Set the attribute ‘membersUsed’ of generic entities (classes).

Set of members involved in at least one reaction. Empty set if the entity does not have members.

Warning

A generic entity can be any of the subclasses of PhysicalEntity. Note that complexes are the only entities whose ‘flat_components’ value is ALWAYS != None.

A complex can also be a class and we check that these entities have no flat_components in develop_complexes().

Parameters:
  • dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
  • full_graph (<bool>) – (optional) Convert all entities to cadbiom node, even the entities that are not used elsewhere.
  • keepEmptyClasses – (optional) (deprecated) If some members are not used, we add the entity to the membersUsed attribute with the aim to represent all the members not used. => This will break some conversions and unit tests because the translation implies the removal of genericity.
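
A minimal sketch of the default behaviour, assuming a hypothetical mapping from member URIs to the set of reactions in which each member is involved. The full_graph and keepEmptyClasses options are deliberately left out.

```python
def members_used_sketch(members, reactions_by_uri):
    """Return the members that are involved in at least one reaction."""
    return {uri for uri in members if reactions_by_uri.get(uri)}


# Class X has two members: Y is used in a reaction, Z is not.
reactions_by_uri = {"Y": {"reaction#1"}, "Z": set()}
used = members_used_sketch({"Y", "Z"}, reactions_by_uri)
```

An entity without members yields an empty set, which matches the description of the ‘membersUsed’ attribute.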
biopax2cadbiom.biopax_converter.developComplexEntity(complex_entity, dictPhysicalEntity, new_physical_entities)[source]

Fill flat_components attribute of the given complex.

Called by develop_complexes().

Search recursively all components of the given complex.

Some complexes have subcomplexes, as in Reactome 56 from PC8. Example:

  • Complex_c33f6c2be7551100a54e716b3bf8ec8a:
  • Complex_0088fc0fe989a0b0abc3635b20df8d90
  • Complex_b87d9cb2e60df79cdde88a9f8f45e80d

Here we handle ONLY COMPLEXES! Even if some complexes have sub-entities that are classes, flat_components contains ONLY uris of atomic entities. ONLY flat_components_primitives contains generic entities. In flat_components_primitives, we just want items (including generic ones) in the same order as any flat_component in flat_components.

Some complexes are classes, and some of these classes may have components. In this case, we produce a new Complex (copy of the class) with only the components of the class, and we erase these components from the class.

  • The new complex is added to new_physical_entities and must be added later to dictPhysicalEntity. Its uri is completed with the suffix “_not_class”.
  • The class is left in dictPhysicalEntity.

Warning

Complexes are the only entities that have a flat_components attribute set. However, Complexes that are also classes should have an empty flat_components.

Note

Empty complexes (without component) are processed like any basic entity. Cf VirtualCase19: ‘B_bottom’

Todo

When a class occurs multiple times through components of complexes we should remove it and make a set of primitives. This will avoid cartesian product of members, duplication of complexes on useless flat_components. Cf VirtualCase19: ‘B’ class in C_top and C_bottom.

Full explanations:

developed_components is a list of tuples that contain combinations of all recursively searched sub-entities in the given complex.

developed_classes is a list of primitive sub-entities in the given complex. Classes are not replaced by their members. Entities are in the same order as in a flat_component. The aim is to dynamically rebuild the flat_component of a complex when we remove genericity in replace_and_build().

Example:

A: complex composed with components:
    B: complex with components:
        W: protein
        X: generic smallmolecule with members:
            Y: smallmolecule (used elsewhere)
            Z: smallmolecule (not used elsewhere)
    C: protein

X is a class that represents 2 smallmolecules: Z and Y

For X: developed_components = [X, Y] (edit: just [Y] now)
For W: developed_components = [W]
So for B: developed_components = [[X, Y], [W]] (edit: [[Y], [W]])
and flat_components = [(X, W), (Y, W)] (edit: [(Y, W)])
For A: developed_components = [[C], [(X, W), (Y, W)]] (edit: [[C], [(Y, W)]])
and flat_components = [(C, X, W), (C, Y, W)]
(edit: now X is removed, and the final result is [(C, Y, W)])

If Z had been used elsewhere, we would have had the following
final result for developed_components of A:
[[C], [(Y, W), (Z, W)]]
and flat_components:
[(C, Y, W), (C, Z, W)]

developed_classes = [C, X, W]
flat_components_primitives = [C, X, W]

PS:
'A' can be Complex_6e3d8ef563cbcc0c9e2a4afb2a920c38
(Reactome v56 in PC8); In this complex, Z is also used,
so X is totally removed.
Parameters:
  • complex_entity (<PhysicalEntity>) – Complex entity
  • dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
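
The step from developed_components to flat_components in the example above behaves like a cartesian product over the alternatives of each sub-entity. The sketch below reproduces the values of the example with itertools.product. The function names are hypothetical and entities are reduced to plain strings, so this is an illustration of the combination step rather than the actual code.

```python
from itertools import product


def flatten(item):
    """Yield the atomic names contained in a possibly nested tuple."""
    if isinstance(item, tuple):
        for sub_item in item:
            for name in flatten(sub_item):
                yield name
    else:
        yield item


def build_flat_components(developed_components):
    """Combine one alternative per sub-entity into flat tuples."""
    return [
        tuple(flatten(combination))
        for combination in product(*developed_components)
    ]


# Complex B: class X is replaced by its used member Y; W is a protein.
flat_b = build_flat_components([["Y"], ["W"]])
# Complex A in the case where Z is also used elsewhere.
flat_a = build_flat_components([["C"], [("Y", "W"), ("Z", "W")]])
```

Each additional class with several used members multiplies the number of tuples, which is the cost mentioned in the Todo note above.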
biopax2cadbiom.biopax_converter.develop_complexes(dictPhysicalEntity)[source]

Set the attribute ‘flat_components’ of complexes entities.

‘flat_components’ is a list of tuples of component URIs.

This function depends on detect_members_used().

Parameters:dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
biopax2cadbiom.biopax_converter.filter_controls(controls, pathways_names, blacklisted_entities)[source]

Remove pathways and cofactors from controls and keep other entities.

Note

Entities that control pathways are also removed.

Note

We want ONLY entities and by default, there are pathways + entities.

Parameters:
  • controls (<dict>) – Dict of Controllers. keys: URIs; values: <Control>
  • pathways_names (<dict>) – Dict of pathways URIs and names. keys: URIs; values: names (or uri if no name)
  • blacklisted_entities (<set>) – set of entity uris blacklisted
Returns:

Filtered controllers dict.

Return type:

<dict>

biopax2cadbiom.biopax_converter.filter_entities(dictPhysicalEntity, blacklisted_entities)[source]

Remove blacklisted entities from BioPAX entities.

Note

Blacklisted entities are removed from dictPhysicalEntity, from components and from members.

Parameters:
  • dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
  • blacklisted_entities (<set>) – set of entity uris blacklisted
Returns:

Dictionary of biopax physicalEntities without blacklisted entities

Return type:

<dict <str>: <PhysicalEntity>>

biopax2cadbiom.biopax_converter.filter_reactions(dictReaction, blacklisted_entities)[source]

Remove blacklisted entities from reactions.

Note

Effects:

  • productComponent and participantComponent can be set to None
  • blacklisted entities are removed from leftComponents and rightComponents

Parameters:
  • dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
  • blacklisted_entities (<set>) – set of entity uris blacklisted
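
The effects listed above can be sketched on a single reaction. Here a reaction is represented by a plain dictionary, which is an assumption made for the example; the real function works on Reaction objects.

```python
def filter_reaction_sketch(reaction, blacklisted_entities):
    """Remove blacklisted URIs from one reaction (dict stand-in)."""
    for key in ("productComponent", "participantComponent"):
        if reaction.get(key) in blacklisted_entities:
            reaction[key] = None
    for key in ("leftComponents", "rightComponents"):
        reaction[key] = reaction.get(key, set()) - blacklisted_entities
    return reaction


reaction = {
    "productComponent": "uri#ATP",
    "participantComponent": "uri#P1",
    "leftComponents": {"uri#A", "uri#ATP"},
    "rightComponents": {"uri#B"},
}
filtered = filter_reaction_sketch(reaction, {"uri#ATP"})
```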
biopax2cadbiom.biopax_converter.find_unique_synonyms(cadbiom_name, entity_uris, unique_cadbiom_names, dictPhysicalEntity)[source]

Build unique names for the given uris, having the same cadbiom name.

Note

First, we use synonyms from the BioPAX database to find a unique name. When there are no more usable synonyms to build a unique name, we add a version number based on the given cadbiom name for all the remaining entities.

Note

The merging procedure for similar entities greatly reduces the number of entity groups proposed to this function.

Parameters:
  • cadbiom_name (<str>) – The redundant cadbiom name
  • entity_uris (<set>) – Set of uris of entities having the same name
  • unique_cadbiom_names (<set>) – Set of unique cadbiom names already used
  • dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
Returns:

Dictionary of uris as keys and unique names as values.

Return type:

<dict>
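
The two-step strategy described in the note (synonyms first, then version numbers) can be sketched as follows. The function name, the mapping of URIs to synonyms, and the "_v<number>" suffix are assumptions made for the example; the format actually produced by the package may differ.

```python
def find_unique_names_sketch(cadbiom_name, synonyms_by_uri, used_names):
    """Return {uri: unique name}; used_names is updated in place."""
    unique_names = {}
    remaining = []
    for uri in sorted(synonyms_by_uri):
        for synonym in sorted(synonyms_by_uri[uri]):
            if synonym not in used_names:
                used_names.add(synonym)
                unique_names[uri] = synonym
                break
        else:
            # No usable synonym: fall back to a version number later.
            remaining.append(uri)

    version = 1
    for uri in remaining:
        candidate = "{}_v{}".format(cadbiom_name, version)
        while candidate in used_names:
            version += 1
            candidate = "{}_v{}".format(cadbiom_name, version)
        used_names.add(candidate)
        unique_names[uri] = candidate
    return unique_names


used_names = {"A"}
names = find_unique_names_sketch(
    "A",
    {"uri#1": {"A_alpha"}, "uri#2": {"A"}, "uri#3": set()},
    used_names,
)
```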

biopax2cadbiom.biopax_converter.get_cadbiom_names(entity, dictPhysicalEntity)[source]

To be called recursively or by add_cadbiom_names_to_entities()

Note

See add_cadbiom_names_to_entities() for more information.

Note

The attribute ‘cadbiom_names’ corresponds to a list of unique cadbiom IDs for the entity (Complex, Class). Each member of the list is the unique cadbiom ID of each subcomponent present in the attribute ‘flat_components’.

Parameters:
  • entity (<PhysicalEntity>) – A PhysicalEntity.
  • dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
Returns:

Set of cadbiom names for the given entity.

Return type:

<set>

biopax2cadbiom.biopax_converter.get_control_group_condition(controls, dictPhysicalEntity, controlled_controls)[source]

Get condition for a group of controllers.

  • Activators are linked together by a logical ‘OR’,
  • inhibitors are linked together by a logical ‘OR’,
  • but sets of activators and inhibitors are linked together by a logical ‘AND’.

Cascades of controls are supported here; Each regulation from a nested condition is linked by an ‘AND’ operator.

Unsupported controlTypes lead to a None condition; i.e. a cascade of controls can be broken if a control has an unknown controlType.

Warning

controlType can be as follows (* are currently supported because they are general terms; others are from EcoCyc and will be logged as errors):

  • ACTIVATION*
  • INHIBITION*
  • INHIBITION-ALLOSTERIC
  • INHIBITION-COMPETITIVE
  • INHIBITION-IRREVERSIBLE
  • INHIBITION-NONCOMPETITIVE
  • INHIBITION-OTHER
  • INHIBITION-UNCOMPETITIVE
  • ACTIVATION-NONALLOSTERIC
  • ACTIVATION-ALLOSTERIC

Note

Controllers/classes are processed in get_cadbiom_names(). Here we just use cadbiom_names to distinguish entities.

Parameters:
  • controls (<set <Control>>) – Set of Control objects for a reaction.
  • dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
Returns:

Sympy condition or None

Return type:

<sympy.core.symbol.Symbol> or <None>
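
The combination rule for activators and inhibitors can be illustrated with plain strings. The real function returns a sympy expression; the sketch below only builds a textual equivalent. It also assumes that the group of inhibitors is negated before being joined with the activators, which the description above implies but does not state explicitly. Cascades of controls are not covered.

```python
def control_group_condition_text(activators, inhibitors):
    """Build a textual guard from names of activators and inhibitors."""
    parts = []
    if activators:
        parts.append("(" + " or ".join(sorted(activators)) + ")")
    if inhibitors:
        # Assumption: the reaction fires only if no inhibitor is present.
        parts.append("not (" + " or ".join(sorted(inhibitors)) + ")")
    return " and ".join(parts) if parts else None


guard = control_group_condition_text({"A", "B"}, {"C", "D"})
```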

biopax2cadbiom.biopax_converter.get_pathways_entities(dictReaction, dictControl, dictPhysicalEntity)[source]

This function creates the Dictionary pathwayToPhysicalEntities.

Parameters:
  • dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
  • dictControl (<dict <str>: <Control>> keys: uris; values control objects) – Dictionary of biopax controls, created by the function query.get_biopax_controls()
  • dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
Returns:

pathwayToPhysicalEntities keys: pathway uris; values: set of entities involved in the pathway.

Return type:

<dict <str>: <set>>

biopax2cadbiom.biopax_converter.get_transitions(dictReaction, dictPhysicalEntity)[source]

Return transitions with (ori/ext nodes) and their respective events.

Types considered as reactions:
  • Conversion
  • BiochemicalReaction
  • ComplexAssembly
  • Transport
  • TransportWithBiochemicalReaction
Types considered as regulators:
  • Catalysis
  • Control
  • TemplateReactionRegulation
Types not supported:
  • MolecularInteraction
  • Degradation

Warning

dictPhysicalEntity is modified in place. We add “virtual nodes” for genes that are not in BioPAX format.

Todo

handle Degradation types and TRASH nodes => will crash cadbiom writer because they are not entities…

Parameters:
  • dictReaction (<dict <str>: <Reaction>> keys: uris; values reaction objects) – Dictionary of biopax reactions, created by the function query.get_biopax_reactions()
  • dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
Returns:

Dictionary of transitions and their respective set of events.

Example:
subDictTransition[(cadbiomL, right)].append({
    'event': transition['event'],
    'reaction': reaction,
    'sympyCond': transitionSympyCond
})

Return type:

<dict <tuple <str>, <str>>: <list <dict>>>

biopax2cadbiom.biopax_converter.load_blacklisted_entities(blacklist_file)[source]

Get all URIs of blacklisted elements in the given file.

Note

The CSV file can use either of the following delimiters: ‘,;’. The first column must contain the URI; in the second column, users can put the corresponding cadbiom name (currently not used).

Parameters:blacklist_file (<str>) – Filename.
Returns:Set of uris.
Return type:<set>
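
A minimal sketch of the parsing described in the note, working on an iterable of lines instead of a file so that it stays self-contained. The function name and the example URIs are hypothetical.

```python
import re


def parse_blacklist_lines(lines):
    """Return the set of URIs found in the first column of each line."""
    uris = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # First column: URI. The second column (cadbiom name) is ignored.
        uris.add(re.split(r"[,;]", line)[0].strip())
    return uris


rows = [
    "http://example.org/bp#SmallMolecule_1;ATP_cytosol",
    "http://example.org/bp#SmallMolecule_2,H2O",
    "",
]
blacklist = parse_blacklist_lines(rows)
```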
biopax2cadbiom.biopax_converter.main(params)[source]

Entry point

Here we detect the presence of the pickle backup and its settings. If there is no backup or if the user doesn’t want to use this functionality, queries are made against the triplestore.

Then, we construct a Cadbiom model with all the retrieved data.

biopax2cadbiom.biopax_converter.merge_duplicated_entities(dictPhysicalEntity, model_path, log_files=True)[source]

Merge multiple occurrences of the same entity in the model

The duplicates can come from the BioPAX database, as well as from the process of transferring post-translational modifications of classes to their daughter entities in transfer_class_attributes_on_child_entities()

In order to group the entities, they are ordered according to some of their attributes:

  • entityType
  • entityRef
  • name
  • components_uris
  • location_uri
  • modificationFeatures

3 files are created in this function:

  • sort_dumped.csv: Dump of all entities (sorted but not grouped)
  • sort_grouped.csv: Dump proposed groups
  • sort_grouped_after_merge.csv: Dump definitive groups

Note

About reactions attached to duplicate entities: Reactions from all duplicates are merged together.

Warning

If classes with similar attributes are merged, then we consider that their members are similar. These members are not merged together.

Todo

During the merge of entities, prefer existing uris in the BioPAX model rather than those formed by duplication.

Parameters:
  • dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
  • model_path (<str>) – Filepath of the final model.
  • log_files (<bool>) – (optional) If True, csv files are created. Default: True.

Returns:

Dictionary of canonical uris as keys, and lists of non-canonical linked uris as values.

Return type:

<dict <list>>

biopax2cadbiom.biopax_converter.shortening_modifications(modificationFeatures, length=1)[source]

Return a short version of all given modification names and occurrences.

Note

Some terms can be corrected before shortening:

  • residue modification, inactive: inactive
  • residue modification, active: active

Parameters:
  • modificationFeatures (<Counter>) – Counter of modificationFeatures
  • length (<int>) – Length of the shortening; put None for entire strings.
Returns:

Short and merged version of the given modificationFeatures.

Return type:

<str>

biopax2cadbiom.biopax_converter.sort_callback(elem)[source]

Order of the sort of PhysicalEntities on their attributes

The sort of all entities must respect lexicographic order of all attributes.

=> If component URIs are not cast into a sorted list, the order is modified, and itertools.groupby will be fooled:

  • ['W', 'X'] < ['X', 'Y'] => True
  • {'X', 'W'} < {'X', 'Y'} => False
['W', 'X'] < ['X', 'Y'], so the sorted order is:
Ater;A;['W', 'X'];http://simulated/test#anywhere;
A;A;['X', 'Y'];http://simulated/test#anywhere;
Abis;A;['X', 'Y'];http://simulated/test#anywhere;

If we do not cast set into list:
A;A;['Y', 'X'];http://simulated/test#anywhere;
Abis;A;['Y', 'X'];http://simulated/test#anywhere;
Ater;A;['X', 'W'];http://simulated/test#anywhere;
Parameters:elem (<PhysicalEntity>) – PhysicalEntity
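
The pitfall can be reproduced in a few lines. In Python, `<` between lists is a lexicographic comparison, whereas `<` between sets tests for a proper subset, so sets do not give a usable sort order. The sketch below uses hypothetical (uri, components) pairs and a key limited to the components; the real callback sorts on several attributes.

```python
from itertools import groupby

# Lists compare lexicographically; sets compare by inclusion.
list_comparison = ["W", "X"] < ["X", "Y"]
set_comparison = {"X", "W"} < {"X", "Y"}

rows = [
    ("A", {"X", "Y"}),
    ("Ater", {"W", "X"}),
    ("Abis", {"Y", "X"}),
]


def components_key(row):
    """Cast the set of components into a sorted list before comparing."""
    return sorted(row[1])


ordered = sorted(rows, key=components_key)
groups = [
    [uri for uri, _ in group]
    for _, group in groupby(ordered, key=components_key)
]
```

itertools.groupby only groups consecutive items with equal keys, which is why the input must be sorted with the same key first.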
biopax2cadbiom.biopax_converter.transfer_class_attributes_on_child_entities(entities, dictPhysicalEntity)[source]

Transfer modificationFeatures and location of classes on child entities

If a child entity does not have the same attributes as its class, it is inserted in the list of BioPAX entities under a specific (new) URI, with its new inherited attributes. It is possible that an entity describing this state is already in the BioPAX ontology. In this case, the two entities will then be grouped by the function merge_duplicated_entities().

Todo

If the duplicated entity is already in the dictionary:

  • it is already used elsewhere in a class. => It must be decompiled even if it does not participate in any reaction.
  • Otherwise, remove the reactions. => There is no point in creating entities that are not used in the model.

Note

In a general way, sub-entities are not duplicated if the class doesn’t provide information that is not already in the sub-entity.

We try not to overwrite modifications or location if they are the same. The transfer of similar post-translational modifications AND location is useless. => Avoid the duplication of entities.

However, we cannot exclude that inconsistent or conflicting modifications are applied to the sub-entities, such as: residue modification, active and residue modification, inactive

Note

About reactions attached to duplicate entities: We CAN reset all reactions (the reactions attribute of sub-entities) involving the entity in its old context (without the transfer of attributes that we operate here). This prevents entities that are not reused anywhere else from appearing in the model. If the entity must be present in the model, this will be decided during the merge by the function merge_duplicated_entities(), which also merges the reactions of the duplicates.

BUT we choose to keep the reactions of the parent entity in order to solve VirtualCase14 bug. We prefer to have more entities than false transitions in the final model.

There are four cases to consider about this problem:

  • none of the duplicates contains a reaction. => the merged entity will be absent from the model

  • the duplicate entity has no reaction but the duplicate already in the model contains one. => the merged entity will be in the model

  • the duplicate entity has a reaction but the duplicate already in the model does not contain one.

    • if the attribute reactions is not reset, the merged entity will be wrongly in the model because it will be flagged as being reused elsewhere by detect_members_used().
    • if the attribute reactions is reset, a side effect described in testCase 14 will appear: the decompilation of classes participating in reactions causes the formation of incorrect relations between the entities of these classes.

Warning

dictPhysicalEntity is modified here.

Parameters:
  • entities (<dict <str>: <PhysicalEntity>>) – Dictionary of entities to be processed. keys: uris; values entity objects
  • dictPhysicalEntity (<dict <str>: <PhysicalEntity>>) – Dictionary of all entities in the model. keys: uris; values entity objects

cadbiom_writer

This module groups functions used to export BioPAX-processed data to a Cadbiom model file.

biopax2cadbiom.cadbiom_writer.build_json_data(entity)[source]

Build JSON data about the given entities.

Note

We can handle reactions from dictReaction, or entities from dictPhysicalEntity

Note

Return these attributes if they exist:

  • PhysicalEntity:
    • uri
    • entityType
    • name + synonyms
    • entityRef
    • location
    • modificationFeatures
    • members
    • reactions
  • Reaction:
    • uri
    • interactionType
Parameters:entity (<str>) – URI of an entity or a list of reactions.
Returns:JSON formatted str.
Return type:<str>
biopax2cadbiom.cadbiom_writer.create_cadbiom_model(dictTransition, dictPhysicalEntity, dictReaction, model_name, file_path)[source]

Export data into a Cadbiom file format.

Parameters:
  • dictTransition (<dict <tuple <str>, <str>>: <list <dict>>>) –

    Dictionary of transitions and their respective set of events.

    Example:
    subDictTransition[(cadbiomL, right)].append({
        'event': transition['event'],
        'reaction': reaction,
        'sympyCond': transitionSympyCond
    })

  • dictPhysicalEntity (<dict <str>: <PhysicalEntity>> keys: uris; values entity objects) – Dictionary of biopax physicalEntities, created by the function query.get_biopax_physicalentities()
  • dictReaction (<dict>) –
  • model_name (<str>) – Name of the model.
  • file_path (<str>) – File path.
biopax2cadbiom.cadbiom_writer.format_condition(condition)[source]

Build the string representation of the given sympy expression

We just replace logical operators by their textual version in the string representation of the given condition. To see the previous version with tree parsing, go to commits <= 9702ea8; that earlier method is cleaner but recursive and costly for BIG and COMPLEX conditions.

Parameters:condition (<sympy.Or>, <sympy.And>, <sympy.Not>) – Sympy expression
Return type:<str>
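
The replacement approach can be sketched on a string such as the one obtained by calling str() on a sympy boolean expression, where "&", "|" and "~" stand for And, Or and Not. The function name is hypothetical, and the sketch assumes that entity names never contain these three characters.

```python
def format_condition_sketch(condition_text):
    """Replace symbolic logical operators by their textual version."""
    replacements = (("&", "and"), ("|", "or"), ("~", "not "))
    for symbol, word in replacements:
        condition_text = condition_text.replace(symbol, word)
    return condition_text


formatted = format_condition_sketch("(A | B) & ~C")
```

This runs in a single pass over the string per operator, without walking the expression tree.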
biopax2cadbiom.cadbiom_writer.format_events_and_conditions(events_conditions)[source]

Build the condition of a transition based on the given set of events and conditions

Parameters:events_conditions (<set <tuple <str>, <str>>>) – Set of tuples (event name, sympy condition)
Return type:<str>
biopax2cadbiom.cadbiom_writer.get_names_of_missing_physical_entities(dictPhysicalEntity)[source]

Get URI and cadbiom name for each entity in the model.

Parameters:dictPhysicalEntity (<dict>) – Dictionary of uris as keys and PhysicalEntities as values.
Returns:Dictionary of names as keys and uris as values.
Return type:<dict>
biopax2cadbiom.cadbiom_writer.remove_scc_from_model(file_path)[source]

Remove SCC (Strongly Connected Components) from a model

The new model will be exported with “_without_scc” suffix in its filename.

Parameters:file_path (<str>) – Path of a model (.bcx file)

classes

This module describes the classes that wrap the BioPAX formalism.

class biopax2cadbiom.classes.Control(uri, interactionType, controlType, reaction_uri, controller)[source]

Bases: biopax2cadbiom.classes.GenericBioPAXEntity, biopax2cadbiom.classes.GenericBioPAXInteraction

Class for Control

Attributes:

Parameters:
  • interactionType (<str>) – Subclass of Control
  • controlType (<str>) – type of control (ACTIVATION or INHIBITION). See warning below.
  • controlled (<str>) – Control/Reaction that is controlled. (supposed to be a subclass of Interaction; can be None)
  • controller (<str>) – Entity that controls the reaction
  • controllers (<set>) – Entities that control the reaction. It happens in some borderline cases (KEGG). TODO: Not currently supported!

Optional:

Parameters:evidences (<set>) – set of evidences uris (identify controllers of the same reaction)
controlType

Get control type

class biopax2cadbiom.classes.GenericBioPAXEntity[source]

Bases: object

Generic class for BioPAX entities which brings basic common functions

short_uri

Return the URI without the prefix of the host.

Example:

Note

This attribute short_uri is read-only.
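
A read-only attribute of this kind is typically written as a property without a setter. The sketch below is an assumption about the behaviour, not the package's code: the class name is hypothetical, and the host prefix is assumed to end at the last "#" or "/" of the URI.

```python
import re


class UriHolder:
    """Hypothetical stand-in for GenericBioPAXEntity."""

    def __init__(self, uri):
        self.uri = uri

    @property
    def short_uri(self):
        # Assumption: keep what follows the last '#' or '/'.
        return re.split(r"[#/]", self.uri)[-1]


entity = UriHolder("http://simulated/test#anywhere")
short = entity.short_uri
```

Because no setter is defined, assigning to short_uri raises an AttributeError.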

class biopax2cadbiom.classes.GenericBioPAXInteraction[source]

Bases: object

Generic class for BioPAX interactions which brings basic common functions

interactionType

Get interaction type

class biopax2cadbiom.classes.Location(uri, locationTerm)[source]

Bases: biopax2cadbiom.classes.GenericBioPAXEntity

Class for Location

Attributes:

Parameters:
  • uri (<str>) – Uri of the location.
  • locationTerm (<str>) –

Optional:

Parameters:
  • xrefs (<dict <str>:<set>>) – UnificationXref with dbnames as keys and sets of terms as values
  • cadbiom_name (<str>) –
add_xref(dbref, idref)[source]

Add xref to the existing xrefs
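
Given that xrefs maps database names to sets of terms, the method can be sketched with a defaultdict. The class below is a hypothetical stand-in that models only the xrefs attribute, and the database name and identifiers are example values.

```python
from collections import defaultdict


class LocationSketch:
    """Hypothetical stand-in for Location (xrefs only)."""

    def __init__(self):
        self.xrefs = defaultdict(set)

    def add_xref(self, dbref, idref):
        """Register the identifier idref under the database dbref."""
        self.xrefs[dbref].add(idref)


location = LocationSketch()
location.add_xref("GENE ONTOLOGY", "GO:0005737")
location.add_xref("GENE ONTOLOGY", "GO:0005737")  # duplicates are ignored
location.add_xref("GENE ONTOLOGY", "GO:0005829")
```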

class biopax2cadbiom.classes.PhysicalEntity(uri, name, location_uri, entityType, entityRef)[source]

Bases: biopax2cadbiom.classes.GenericBioPAXEntity

Class for PhysicalEntity

Attributes:

Parameters:
  • uri (<str>) – Uri of the entity.
  • name (<str>) – Name of the entity (displayName biopax attribute).
  • location_uri (<str>) – Uri of the location of the entity.
  • entityType – Type of the entity (Protein, Complex, etc.).
  • entityRef (<str>) – Uri of the entity reference.

Optional:

Parameters:
  • synonyms (<set>) – All alternative names of an entity (name biopax attribute).
  • components_uris (<set>) – Uris of the components of a Complex.
  • members (<set>) – Uris of the members of a generic entity (class).
  • location (<Location>) – Location object related to the entity. This attribute is set by biopax2cadbiom.biopax_converter.add_locations_to_entities().
  • xrefs (<dict <str>:<set>>) – UnificationXref with dbnames as keys and sets of terms as values
  • reactions (<set>) – Set of reactions where the entity is involved. This attribute is set by biopax2cadbiom.biopax_converter.add_reactions_and_controllers_to_entities().
  • membersUsed (<set>) – Set of members of a class involved in at least one reaction. Empty if the entity does not have members. This attribute is set by biopax2cadbiom.biopax_converter.detect_members_used().
  • cadbiom_name (<set>) – Unique name of the entity in a Cadbiom model. This attribute is set by add_unique_cadbiom_name_to_entities().
  • modificationFeatures (<Counter <str>:<int>>) – Dictionary of modification features; modifications as keys; number of modifications as values. This attribute is set by add_modifications_features_to_entities().
  • flat_components (<list>) – Possible components of a complex. If classes are in components_uris, the length of flat_components is > 1. The unique Cadbiom name of a flat component is available at the same index in cadbiom_names. This attribute is set by develop_complexes().
  • flat_components_primitives (<list>) – All primitive objects in flat_components. Only intermediate complexes are replaced by their components. Classes and nested classes are not decomposed and are kept as they are. This attribute is used to rebuild a flat_component when we remove the genericity during the duplication of the reactions. This attribute is set by develop_complexes().
  • cadbiom_names (<list>) – Possible names of the entity. Based on the flat components for complexes, members for classes and cadbiom_name for simple entities. This attribute is set by add_cadbiom_names_to_entities().
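
To illustrate why a complex containing classes can have several flat components, here is a hedged sketch that expands each class into its members and enumerates the combinations. The function flatten_components and its inputs are hypothetical; this is not the code of develop_complexes().

```python
from itertools import product


def flatten_components(components, members_by_class):
    """Return every combination of primitive components.

    components: component names of a complex (hypothetical input).
    members_by_class: dict mapping a class name to the list of its members.
    """
    # A class contributes one alternative per member; any other component
    # contributes exactly one alternative (itself).
    alternatives = [members_by_class.get(name, [name]) for name in components]
    return [list(combination) for combination in product(*alternatives)]


flat = flatten_components(["A", "B_class"], {"B_class": ["B1", "B2"]})
print(flat)
```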
add_xref(dbref, idref)[source]

Add an xref to the existing xrefs

Parameters:
  • dbref (<str>) – Name of external database.
  • idref (<str>) – Identifier in the given database.
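
A minimal sketch of the xrefs structure described above (database names as keys, sets of identifiers as values). The class XrefHolderSketch and the identifier used are hypothetical, and the real method may handle more cases.

```python
from collections import defaultdict


class XrefHolderSketch(object):
    """Hypothetical holder mimicking the xrefs attribute (illustration only)."""

    def __init__(self):
        # Database name -> set of identifiers in that database.
        self.xrefs = defaultdict(set)

    def add_xref(self, dbref, idref):
        # Using a set keeps each identifier once per database.
        self.xrefs[dbref].add(idref)


holder = XrefHolderSketch()
holder.add_xref("uniprot", "ID_1")
holder.add_xref("uniprot", "ID_1")
holder.add_xref("uniprot", "ID_2")
print(dict(holder.xrefs))
```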
entityType

Get entity type

is_class

Return True if the object is a class, False otherwise

is_complex

Return True if the object is a complex, False otherwise

modificationFeatures

Get modification features

Returns:Dictionary of modification features; modifications as keys; number of modifications as values
Return type:<Counter <str>:<int>>
class biopax2cadbiom.classes.Reaction(uri, name, interactionType, productComponent, participantComponent)[source]

Bases: biopax2cadbiom.classes.GenericBioPAXEntity, biopax2cadbiom.classes.GenericBioPAXInteraction

Class for reaction

Attributes:

Parameters:
  • uri (<str>) – Uri of the reaction.
  • name (<str>) – Name of the reaction.
  • interactionType (<str>) – Type of the interaction. Subclass of the BioPAX Interaction class.
  • productComponent (<str>) –
  • participantComponent (<str>) –

Optional:

Parameters:
  • pathways (<set>) – Set of pathways containing the reaction.
  • leftComponents (<set>) – Set of uris of reagents.
  • rightComponents (<set>) – Set of uris of products.
  • controllers (<set>) – Control entities that control the reaction
  • cadbiomSympyCond (<sympy.core.symbol.Symbol>) – Sympy condition.
  • event (<str>) – Name of the event (Cadbiom notation)
  • complexes (<dict>) – Used during duplication of reactions to replace generic complexes by their unique flat_component.

TODO: Handle conversionDirection attr.

sparql_biopaxQueries

This module contains a list of functions to query any SPARQL endpoint with BioPAX data.

biopax2cadbiom.sparql_biopaxQueries.get_biopax_controls(graph_uris, provenance_uri)[source]

Get objects of the Control class and its subclasses in the given graphs

Note

controlType is one of (ACTIVATION, INHIBITION). Please note that only Catalysis is allowed to have a default (unspecified) controlType; this is why the attribute is optional.

In the near future, if you try to create a Modulation or any other class with a controlType which is None, this object will not be considered.

See: biopax2cadbiom.classes.Control.
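
The rule stated in this note can be sketched as a small predicate. This is only an illustration under the assumptions of the note; the function name is_usable_control is hypothetical and does not exist in the package.

```python
ALLOWED_CONTROL_TYPES = {"ACTIVATION", "INHIBITION"}


def is_usable_control(interaction_type, control_type):
    """Return True if a control with these attributes would be kept."""
    if control_type is None:
        # Only Catalysis may omit its controlType.
        return interaction_type == "Catalysis"
    return control_type in ALLOWED_CONTROL_TYPES


print(is_usable_control("Catalysis", None))
print(is_usable_control("Modulation", None))
```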

Parameters:
  • graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
  • provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns:

Dictionary of controls. uris as keys; Control objects as values

Return type:

<dict <str>:<Control>>

biopax2cadbiom.sparql_biopaxQueries.get_biopax_locations(graph_uris)[source]

Get Location objects in the given graphs

Parameters:
  • graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
  • provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns:

Dictionary of locations. uris as keys; Location objects as values

Return type:

<dict <str>:<Location>>

biopax2cadbiom.sparql_biopaxQueries.get_biopax_modificationfeatures(graph_uris, provenance_uri)[source]

Get ModificationFeatures that occur on PhysicalEntities, grouped by entity, modification type and number of modifications per type.

Parameters:
  • graph_uris – List of RDF graphs that will be queried on the triplestore.
  • provenance_uri – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns:

A dict of dicts (not Counters)! Each dict contains the modifications as keys and their number as values.

Return type:

<dict <dict>>
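
Since plain dicts are returned, a caller that needs the Counter form used by the modificationFeatures attribute could convert them as sketched below. The URI and modification names are made-up sample data.

```python
from collections import Counter

# Hypothetical result shaped like the documented return value.
raw = {
    "http://example.org/entity_1": {"phosphorylation": 2, "ubiquitination": 1},
}

# Convert each inner dict to a Counter; missing modifications then count as 0.
features = {uri: Counter(mods) for uri, mods in raw.items()}
print(features["http://example.org/entity_1"]["phosphorylation"])
```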

biopax2cadbiom.sparql_biopaxQueries.get_biopax_parent_pathways(graph_uris, provenance_uri)[source]

Get sets of direct parent pathways for every pathway in the given graphs

Parameters:
  • graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
  • provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns:

Dict of parent pathways. Pathways as keys; sets of parent pathways as values

Return type:

<dict <str>:<set>>
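
A hedged sketch of how rows of (pathway, parent) pairs could be grouped into the documented structure. The helper group_parent_pathways and the sample rows are hypothetical; the actual query and row format are not shown in this documentation.

```python
from collections import defaultdict


def group_parent_pathways(rows):
    """Group (pathway_uri, parent_uri) pairs into a dict of sets."""
    parents = defaultdict(set)
    for pathway, parent in rows:
        parents[pathway].add(parent)
    return dict(parents)


rows = [("p1", "root_a"), ("p1", "root_b"), ("p2", "root_a")]
grouped = group_parent_pathways(rows)
print(grouped)
```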

biopax2cadbiom.sparql_biopaxQueries.get_biopax_pathways(graph_uris, provenance_uri)[source]

Extract pathways from the given graphs

Parameters:
  • graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
  • provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns:

Dict of pathways URIs and names. keys: URIs; values: names (or uri if no name)

Return type:

<dict>

biopax2cadbiom.sparql_biopaxQueries.get_biopax_physicalentities(graph_uris, provenance_uri)[source]

Get objects of the PhysicalEntity class and its subclasses in the given graphs

Note

From the BioPAX documentation, about the use of memberPhysicalEntity:

Using this property is not recommended. memberPhysicalEntity is only defined to support legacy data in certain databases. It is used to define a generic physical entity that is a collection of other physical entities. In general, EntityReference class should be used to create generic groups of physical entities, however, there are some cases where this is not possible, and the property has to be used. For instance, when an entity reference is used to define a generic physical entity with generic features, the generic features of the same type must be grouped. If you do not have grouping information for features of generic physical entities, you cannot use entity reference to define generic physical entities and must use the memberPhysicalEntity property. Another example for using this property is to create generic complexes, which are currently not supported with the EntityReference scheme (there is no “ComplexReference” class).

Parameters:
  • graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
  • provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns:

Dictionary of physical entities. uris as keys; PhysicalEntity objects as values

Return type:

<dict <str>:<PhysicalEntity>>

biopax2cadbiom.sparql_biopaxQueries.get_biopax_reactions(graph_uris, provenance_uri)[source]

Query all Interactions of the database, minus Control objects.

Warning

Querying ‘rdfs:subClassOf* biopax3:Interaction’ also returns ‘Control’ objects, but these must be retrieved by get_biopax_controls().

Therefore, the controls are removed from the results via MINUS {}.

Note

Control class contains (Catalysis, TemplateReactionRegulation, …)

Note

We correct the BioPAX hierarchy generated by some tools like BiNOM. This tool declares the entire hierarchy of parent classes for each BioPAX object instead of letting users rely on the RDFS reasoner and the rdfs:subClassOf property. As a result, objects are returned as many times as they have parent classes. Fortunately, we remove Control derivatives from Interaction objects. However, Interaction objects are far too generic to be interpreted/used in the program, so we must ensure that objects created here have the most accurate interactionType attribute possible. In practice, Virtuoso returns the most accurate rdf:type property first, then the parent classes (e.g. BiochemicalReaction then Conversion for an object that carries both types). In theory, nothing seems to guarantee that this always happens.
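
One way to avoid depending on the order of the returned types is to pick the deepest class explicitly. The sketch below assumes a small hand-written excerpt of the BioPAX hierarchy; the names PARENT, depth and most_specific_type are hypothetical and this is not how the package currently proceeds.

```python
# Partial excerpt of the BioPAX Level 3 class hierarchy (child -> parent).
PARENT = {
    "BiochemicalReaction": "Conversion",
    "Conversion": "Interaction",
}


def depth(biopax_type):
    """Number of known ancestors of a type in the excerpt above."""
    steps = 0
    while biopax_type in PARENT:
        biopax_type = PARENT[biopax_type]
        steps += 1
    return steps


def most_specific_type(types):
    """Return the type located deepest in the hierarchy."""
    return max(types, key=depth)


print(most_specific_type(["Conversion", "BiochemicalReaction", "Interaction"]))
```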

Note

conversionDirection and catalysisDirection apply to the Conversion and Catalysis subclasses respectively. Do not forget that the Catalysis direction overrides the Conversion direction. Currently we assume that Conversions are LEFT_TO_RIGHT (although this is not recommended in the standard). Order of priority for directions: catalysisDirection > conversionDirection > spontaneous > thermodynamic constants and FBA analysis
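
The first two levels of this priority order, together with the LEFT_TO_RIGHT default, can be sketched as follows. The function effective_direction is hypothetical; the lower-priority criteria (spontaneous, thermodynamic constants, FBA) are deliberately left out.

```python
def effective_direction(catalysis_direction=None, conversion_direction=None):
    """Resolve a direction: catalysis overrides conversion, else default."""
    if catalysis_direction is not None:
        return catalysis_direction
    if conversion_direction is not None:
        return conversion_direction
    # Assumption stated in the note: conversions default to LEFT_TO_RIGHT.
    return "LEFT_TO_RIGHT"


print(effective_direction("RIGHT_TO_LEFT", "LEFT_TO_RIGHT"))
```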

Parameters:
  • graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
  • provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns:

Dictionary of reactions. uris as keys; Reaction objects as values

Return type:

<dict <str>:<Reaction>>

biopax2cadbiom.sparql_biopaxQueries.get_biopax_xrefs(graph_uris, provenance_uri, database_name=None)[source]

Get xrefs of all entities in the given database (if specified)

  • An Xref is a reference from an instance of a class in the current ontology
    to an object in an external resource.
  • An xref can be an instance of PublicationXref, RelationshipXref,
    UnificationXref.

Warning

WE DO NOT filter the references according to the relation of identity or similarity that they define. That is, UnificationXref relationships have the same weight as RelationshipXref relationships, and the relationshipType attributes of RelationshipXref objects are not used to show the degree of similarity between the current object and the object in the external database (see the note below).

Note

Classes inherit xref from their members.

Note

Each ontology may name its databases differently. Ex: ‘UniProt’ vs ‘uniprot knowledgebase’, ‘ChEBI’ vs ‘chebi’
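
A caller comparing xrefs across ontologies may want to normalize database names first. The sketch below lowercases names and applies a small alias table; both ALIASES and normalize_db_name are hypothetical and not part of the package.

```python
# Hypothetical alias table; extend it for the databases you encounter.
ALIASES = {"uniprot knowledgebase": "uniprot"}


def normalize_db_name(name):
    """Lowercase a database name and map known aliases to one spelling."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


print(normalize_db_name("UniProt"), normalize_db_name("uniprot knowledgebase"))
```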

Note

Some objects (RelationshipXref, ?) have relationshipType attributes pointing to RelationshipTypeVocabulary objects. These objects use the PSI Molecular Interaction ontology (MI).

Parameters:
  • graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
  • provenance_uri (<str>) – URI of the queried subgraphs. Used to filter objects on their dataSource attribute.
Returns:

Dictionary of entityRefs. Keys: uris; values: dicts of databases (keys: database names; values: ids)

Return type:

<dict <str>: <dict <str>: <list>>>

biopax2cadbiom.sparql_biopaxQueries.get_graphs_from_triplestore()[source]

Get the list of graphs URIs in the triplestore

Note

The queried graphs are named graphs.

Returns:Iterable of tuples (1 graph URI per tuple)
Return type:<generator>
biopax2cadbiom.sparql_biopaxQueries.get_info_from_triplestore(graph_uris=[])[source]

List graphs and subgraphs from the triplestore and retrieve some metadata

Parameters:graph_uris (<list>) – List of graphs uris (optional)
Returns:Generator of tuples: (graph_uri, provenance_uri, name, dname, comment)
Return type:<generator>
biopax2cadbiom.sparql_biopaxQueries.get_subgraphs_from_triplestore(graph_uris)[source]

Get URIs of BioPAX graphs in the configured triplestore

Note

We assume that graphs are in full BioPAX format, i.e. that the dataSource attribute is set on entities. That’s the only way to separate one database from another in a merged graph (cf. PathwayCommons).

Note

In practice, name is more precise than displayName.

Note

SPARQL query:

PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
SELECT ?graph ?provenance ?name ?dname ?comment
WHERE {
    GRAPH ?graph {
        ?provenance a bp:Provenance.
        OPTIONAL {
            ?provenance bp:standardName ?name.
        }
        OPTIONAL {
            ?provenance bp:displayName ?dname.
        }
        OPTIONAL {
            ?provenance bp:comment ?comment.
        }
    }
}
ORDER BY ?graph ?name
Parameters:graph_uris (<list>) – List of RDF graphs that will be queried on the triplestore.
Returns:Iterable of tuples. (graph_uri, provenance_uri, name, display_name, comment)

Note

If you get an encoding error in name or comment, please put ‘from __future__ import unicode_literals’ at the beginning of your Python script.

Type:<generator>
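
As an illustration of the returned tuples, the sketch below turns a response in the standard SPARQL JSON results format into (graph_uri, provenance_uri, name, display_name, comment) tuples, using None for unbound OPTIONAL variables. The helper iter_subgraph_rows and the sample response are hypothetical.

```python
FIELDS = ("graph", "provenance", "name", "dname", "comment")


def iter_subgraph_rows(json_results):
    """Yield one tuple per binding; unbound variables become None."""
    for binding in json_results["results"]["bindings"]:
        yield tuple(
            binding[field]["value"] if field in binding else None
            for field in FIELDS
        )


# Made-up response shaped like SPARQL 1.1 JSON query results.
sample = {
    "head": {"vars": list(FIELDS)},
    "results": {"bindings": [{
        "graph": {"type": "uri", "value": "http://example.org/graph"},
        "provenance": {"type": "uri", "value": "http://example.org/prov"},
        "name": {"type": "literal", "value": "Example DB"},
    }]},
}
rows = list(iter_subgraph_rows(sample))
print(rows)
```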

namespaces

This module is used to load all RDF Namespaces.

Use: from namespaces import *

biopax2cadbiom.namespaces.get_RDF_prefixes()[source]

Prefixes sent in SPARQL queries.

sparql_wrapper

Module used to query SPARQL endpoint.

biopax2cadbiom.sparql_wrapper.auto_add_prefixes(func)[source]

Decorator: Add all prefixes to the SPARQL query passed as the first argument of sparql_query()
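
A minimal sketch of such a decorator, assuming the prefixes are available as a single string. The names PREFIXES, add_prefixes and show_query are hypothetical; in the package the prefixes come from namespaces.get_RDF_prefixes().

```python
from functools import wraps

# Assumption: one prefix only, for brevity.
PREFIXES = "PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>\n"


def add_prefixes(func):
    """Prepend the prefixes to the first positional argument (the query)."""
    @wraps(func)
    def wrapper(query, *args, **kwargs):
        return func(PREFIXES + query, *args, **kwargs)
    return wrapper


@add_prefixes
def show_query(query):
    # Stand-in for a function that would send the query to an endpoint.
    return query


print(show_query("SELECT ?s WHERE { ?s a bp:Pathway }"))
```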

biopax2cadbiom.sparql_wrapper.load_sparql_endpoint()[source]

Make a connection to SPARQL endpoint & retrieve a cursor.

Returns:SPARQL cursor in version 1. We do not use the SPARQLWrapper2 cursor, which provides the SPARQLWrapper.SmartWrapper.Bindings class to convert the JSON returned by the server.
Return type:<SPARQLWrapper>
biopax2cadbiom.sparql_wrapper.order_results(query, orderby='?uri', limit=4000)[source]

Build nested query for access points with restrictions.

Build the nested query by encapsulating the original between the same SELECT command (minus useless DISTINCT clause), and the OFFSET & LIMIT clauses at the end. PS: don’t forget to add the ORDER BY at the end of the original query.

http://vos.openlinksw.com/owiki/wiki/VOS/VirtTipsAndTricksHowToHandleBandwidthLimitExceed
https://etl.linkedpipes.com/components/e-sparqlendpointselectscrollablecursor

Warning

WE ASSUME THAT THE SECOND LINE OF THE QUERY CONTAINS THE FULL SELECT COMMAND !!!

Parameters:
  • query (<str>) – Original normal SPARQL query.
  • orderby (<str>) – Order queries by this variable.
  • limit (<int>) – Max items queried for 1 block.
Returns:

A generator of lines of results.

Return type:

<dict>
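
The nesting described above can be sketched as follows. This is an illustration, not the package's code: build_page_query, paginate and the run_query callable are hypothetical, and the sketch relies on the same assumption that the second line of the query holds the full SELECT command.

```python
def build_page_query(query, orderby, limit, offset):
    """Wrap the ordered original query in an outer SELECT with OFFSET/LIMIT."""
    lines = query.split("\n")
    # Assumption: the second line contains the full SELECT command.
    select_line = lines[1].replace("DISTINCT ", "")
    return "\n".join([
        select_line,
        "WHERE {",
        "{",
        query.strip(),
        "ORDER BY " + orderby,
        "}",
        "}",
        "OFFSET {}".format(offset),
        "LIMIT {}".format(limit),
    ])


def paginate(query, run_query, orderby="?uri", limit=4000):
    """Yield rows block by block until a block is shorter than limit."""
    offset = 0
    while True:
        rows = run_query(build_page_query(query, orderby, limit, offset))
        for row in rows:
            yield row
        if len(rows) < limit:
            break
        offset += limit


original = "\nSELECT DISTINCT ?uri\nWHERE { ?uri a ?type }"
print(build_page_query(original, "?uri", 4000, 0))
```

Here run_query stands for any callable that sends a query string to the endpoint and returns a list of rows.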

biopax2cadbiom.sparql_wrapper.sparql_query(*args, **kwargs)[source]

Return modified function with prefix added on the first argument