graphein.construct_graphs
graphein.construct_graphs.ProteinGraph
class graphein.construct_graphs.ProteinGraph(granularity, keep_hets, insertions, node_featuriser, get_contacts_path, pdb_dir, contacts_dir, exclude_waters=True, covalent_bonds=True, include_ss=True, include_ligand=False, intramolecular_interactions=None, graph_constructor=None, edge_featuriser=None, edge_distance_cutoff=None, verbose=True, deprotonate=False, remove_string_labels=False, long_interaction_threshold=None)
__init__(granularity, keep_hets, insertions, node_featuriser, get_contacts_path, pdb_dir, contacts_dir, exclude_waters=True, covalent_bonds=True, include_ss=True, include_ligand=False, intramolecular_interactions=None, graph_constructor=None, edge_featuriser=None, edge_distance_cutoff=None, verbose=True, deprotonate=False, remove_string_labels=False, long_interaction_threshold=None)
Initialise the ProteinGraph generator class.
- Parameters
granularity (str) – Specifies granularity of the graph construction. {‘atom’, ‘CA’, ‘CB’}. CA = Alpha Carbon, CB = Beta Carbon
keep_hets (bool) – Keep heteroatoms present in the PDB file. Typically, these correspond to metal ions or modified residues (e.g. MSE)
insertions (bool) – Keep atoms/residues with multiple insertion positions. Multiple insertions exist when the electron density is too vague to define a single insertion
node_featuriser (DGL Node Featuriser) – DGL node featuriser for atom-level graphs. Canonical featurisers recommended.
pdb_dir (str) – Directory containing PDB files. PDB files are downloaded to this folder if there is no existing local copy of the requisite structure
contacts_dir (str) – Directory to GetContacts files
exclude_waters (bool) – Specifies whether water molecules are excluded. Not yet fully operational.
covalent_bonds (bool) – Specifies inclusion of covalent backbone edges, i.e. edges joining adjacent residues in the sequence
include_ss (bool) – Specifies inclusion of secondary structure features computed by DSSP. Future warning: this will be changed in a subsequent update for managing feature selection.
include_ligand (bool) – Not yet implemented. Will specify option to include bound ligand(s) in the graph.
intramolecular_interactions (list) – List of allowable intramolecular interactions to include from GetContacts. [‘sb’, ‘pc’, ‘ps’, ‘ts’, ‘vdw’, ‘hb’, ‘hbb’, ‘hbsb’, ‘hbbb’, ‘hbss’, ‘wb’, ‘wb2’, ‘hblb’, ‘hbls’, ‘lwb’, ‘lwb2’, ‘hp’]. See https://getcontacts.github.io/interactions.html for details.
edge_distance_cutoff (float) – Distance in angstroms specifying cutoff distance for constructing an edge when using distance construction
long_interaction_threshold (int) – Specifies minimum distance in sequence for two nodes to be connected
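A minimal sketch of assembling the constructor arguments listed above. The helper make_protein_graph_kwargs, the example paths, and the placeholder values are hypothetical and not part of Graphein; only the allowed granularity and interaction values are copied from the parameter list. The Graphein import is left as a comment so the block runs without the library installed.

```python
# Allowed values copied from the parameter descriptions above.
GRANULARITIES = {"atom", "CA", "CB"}
INTERACTIONS = {
    "sb", "pc", "ps", "ts", "vdw", "hb", "hbb", "hbsb", "hbbb",
    "hbss", "wb", "wb2", "hblb", "hbls", "lwb", "lwb2", "hp",
}


def make_protein_graph_kwargs(granularity, interactions, pdb_dir, contacts_dir):
    """Hypothetical helper: validate and collect ProteinGraph keyword arguments."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity!r}")
    unknown = sorted(set(interactions) - INTERACTIONS)
    if unknown:
        raise ValueError(f"unknown interaction types: {unknown}")
    return {
        "granularity": granularity,
        "keep_hets": False,
        "insertions": False,
        "node_featuriser": None,  # placeholder (assumption): supply a DGL featuriser for atom graphs
        "get_contacts_path": "/path/to/getcontacts",  # placeholder path
        "pdb_dir": pdb_dir,
        "contacts_dir": contacts_dir,
        "intramolecular_interactions": list(interactions),
    }


kwargs = make_protein_graph_kwargs("CA", ["hb", "sb"], "pdbs/", "contacts/")

# With Graphein installed, the dictionary could be passed on as:
#   from graphein.construct_graphs import ProteinGraph
#   pg = ProteinGraph(**kwargs)
```

Validating the values before construction surfaces a misspelt interaction type early, instead of after GetContacts has already run.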
-
dgl_graph_from_pdb_code(pdb_code=None, file_path=None, chain_selection='all', contact_file=None, edge_construction=['contacts'], encoding=False, k_nn=None, custom_edges=None)
Produces a DGL graph from a PDB code and a selection of polypeptide chains
- Parameters
file_path (str) – Path to a local PDB file, defaults to None
custom_edges (Pandas DataFrame, optional) – Pass user-defined custom edges to use in edge construction, defaults to None
edge_construction (list) – Specifies edge construction methods. {‘contacts’, ‘distance’, ‘custom’}, defaults to [‘contacts’]
k_nn (int) – Specifies number of nearest neighbours to make K_NN edges with
encoding (bool) – Indicates whether or not node names and labels should be encoded
contact_file (str) – Path to local GetContacts output file, defaults to None
pdb_code (str) – 4 character PDB accession code
chain_selection (str) – String indicating which chains to select {‘A’, ‘B’, ‘AB’, …, ‘all’}, defaults to ‘all’
- Returns
DGLGraph object, nodes populated by residues or atoms as specified in class initialisation
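To illustrate what the distance-based and k-nearest-neighbour options mean geometrically, here is a pure-Python sketch. It is not Graphein's implementation: the functions distance_edges and knn_edges and the toy coordinates are assumptions made for illustration, with cutoff playing the role of edge_distance_cutoff and k the role of k_nn.

```python
import math


def distance_edges(coords, cutoff):
    """Connect every pair of nodes closer than `cutoff` angstroms (undirected)."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                edges.append((i, j))
    return edges


def knn_edges(coords, k):
    """Connect each node to its k nearest neighbours (directed edges)."""
    edges = []
    for i, point in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(point, coords[j]),
        )
        edges.extend((i, j) for j in others[:k])
    return edges


# Toy coordinates in angstroms, one point per node (made-up values).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.0, 0.0, 0.0), (20.0, 0.0, 0.0)]

cutoff_edges = distance_edges(coords, cutoff=5.0)
nearest_edges = knn_edges(coords, k=1)
```

The distance variant can leave isolated nodes (the last point here has no neighbour within 5 angstroms), whereas the k-NN variant gives every node at least k outgoing edges.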
-
dgl_graph_from_pdb_file(file_path, chain_selection, contact_file, edges=None)
Produces a DGL graph from a PDB file and a selection of polypeptide chains
- Parameters
edges (Pandas DataFrame, optional) – User-defined custom edges, defaults to None
contact_file (str) – Path to local GetContacts output file
file_path (str) – Path to the local PDB file
chain_selection (str) – Polypeptide chains in structure to select {‘A’, ‘B’, ‘AB’, …, ‘all’}
- Returns
DGLGraph object, nodes populated by residues or atoms as specified in class initialisation
- Return type
DGLGraph
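The chain_selection strings above can be read as a concatenation of chain identifiers, with ‘all’ as a special case. The sketch below shows one way to interpret them; select_chains is a hypothetical helper, not a Graphein function, and it assumes single-character chain IDs.

```python
def select_chains(chain_selection, available_chains):
    """Hypothetical helper: expand a selection string into a list of chain IDs."""
    if chain_selection == "all":
        return list(available_chains)
    selected = list(chain_selection)  # e.g. 'AB' -> ['A', 'B']
    missing = [chain for chain in selected if chain not in available_chains]
    if missing:
        raise ValueError(f"chains not present in structure: {missing}")
    return selected


chains_ab = select_chains("AB", ["A", "B", "C"])
chains_all = select_chains("all", ["A", "B", "C"])
```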
-
nx_graph_from_pdb_code(pdb_code, chain_selection='all', contact_file=None, edge_construction=['contacts'], encoding=False, k_nn=None, custom_edges=None)
Produces a NetworkX Graph Object
- Parameters
encoding (bool) – Indicates whether or not node names and labels should be encoded
custom_edges (Pandas DataFrame, optional) – User-supplied edges, defaults to None
pdb_code (str) – 4 character PDB accession code
chain_selection (str) – string indicating chain selection {‘A’, ‘B’, ‘AB’, …, ‘all’}, defaults to ‘all’
contact_file (str, optional) – Path to GetContacts output file.
- Returns
NetworkX graph object of protein
- Return type
NetworkX graph
-
nx_graph_from_pdb_file(pdb_code, chain_selection='all', contact_file=None)
Produces a NetworkX Graph Object
- Parameters
pdb_code (str) – 4 character PDB accession code
chain_selection (str) – string indicating chain selection {‘A’, ‘B’, ‘AB’, …, ‘all’}
contact_file (str, optional) – Path to GetContacts output file.
- Returns
NetworkX graph object of protein
-
torch_geometric_graph_from_pdb_code(pdb_code, chain_selection='all', edge_construction=['contacts'], contact_file=None, encoding=False, k_nn=None, custom_edges=None)
Produces a PyTorch Geometric Data object from a protein structure
- Parameters
k_nn (int, optional) – Specifies K nearest neighbours to use in KNN edge construction, defaults to None
custom_edges (Pandas DataFrame, optional) – User-supplied edges to use, defaults to None
encoding (bool) – Indicates whether or not node names and labels should be encoded
edge_construction (list) – List containing edge construction to be used. [‘contacts’, ‘distance’, ‘delaunay’], defaults to [‘contacts’]
pdb_code (str) – 4-character PDB accession code
chain_selection (str) – Specifies polypeptide chains to include, e.g. one of {‘A’, ‘B’, ‘AB’, ‘BC’}, defaults to ‘all’
contact_file (str) – Path to contact file if using local file.
- Returns
PyTorch Geometric graph of the protein structure.
- Return type
PyTorch Geometric Data object
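PyTorch Geometric Data objects store connectivity as an edge_index of shape [2, num_edges] holding integer node indices. The sketch below shows, in plain Python lists, how named edges could be mapped to that layout. The helper to_edge_index and the residue naming scheme are assumptions made for illustration, not Graphein's internal conversion.

```python
def to_edge_index(edges):
    """Map named edges to integer node indices and a [2, num_edges] edge list."""
    node_index = {}
    sources, targets = [], []
    for u, v in edges:
        for name in (u, v):
            # Assign the next free integer the first time a node name is seen.
            node_index.setdefault(name, len(node_index))
        sources.append(node_index[u])
        targets.append(node_index[v])
    return node_index, [sources, targets]


# Hypothetical residue identifiers in a chain:residue:position format.
edges = [
    ("A:MET:1", "A:LYS:2"),
    ("A:LYS:2", "A:VAL:3"),
    ("A:MET:1", "A:VAL:3"),
]
node_index, edge_index = to_edge_index(edges)
```

With PyTorch installed, the nested list could then be wrapped as torch.tensor(edge_index, dtype=torch.long) before being placed in a Data object.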
-