graphein.construct_graphs

graphein.construct_graphs.ProteinGraph

class graphein.construct_graphs.ProteinGraph(granularity, keep_hets, insertions, node_featuriser, get_contacts_path, pdb_dir, contacts_dir, exclude_waters=True, covalent_bonds=True, include_ss=True, include_ligand=False, intramolecular_interactions=None, graph_constructor=None, edge_featuriser=None, edge_distance_cutoff=None, verbose=True, deprotonate=False, remove_string_labels=False, long_interaction_threshold=None)
__init__(granularity, keep_hets, insertions, node_featuriser, get_contacts_path, pdb_dir, contacts_dir, exclude_waters=True, covalent_bonds=True, include_ss=True, include_ligand=False, intramolecular_interactions=None, graph_constructor=None, edge_featuriser=None, edge_distance_cutoff=None, verbose=True, deprotonate=False, remove_string_labels=False, long_interaction_threshold=None)

Initialise ProteinGraph Generator Class

Parameters
  • granularity (str) – Granularity of the graph construction, one of {‘atom’, ‘CA’, ‘CB’}, where CA = alpha carbon and CB = beta carbon

  • keep_hets (bool) – Keep heteroatoms present in the PDB file. Typically, these correspond to metal ions or modified residues (e.g. MSE)

  • insertions (bool) – Keep atoms/residues with multiple insertion positions. Multiple insertions exist when the electron density is too vague to define a single insertion

  • node_featuriser (DGL Node Featuriser) – DGL node featuriser for atom-level graphs. Canonical featurisers recommended.

  • pdb_dir (str) – Directory containing PDB files. Structures are downloaded to this folder if no local copy of the requisite structure exists

  • contacts_dir (str) – Directory to GetContacts files

  • exclude_waters (bool) – Specifies whether water molecules are excluded. Not yet fully operational.

  • covalent_bonds (bool) – Specifies inclusion of the covalent backbone, i.e. edges joining adjacent residues in the sequence

  • include_ss (bool) – Specifies inclusion of secondary structure features computed by DSSP. Future warning: this will be changed in a subsequent update for managing feature selection.

  • include_ligand (bool) – Not yet implemented. Will specify option to include bound ligand(s) in the graph.

  • intramolecular_interactions (list) – List of allowable intramolecular interactions to include from GetContacts. [‘sb’, ‘pc’, ‘ps’, ‘ts’, ‘vdw’, ‘hb’, ‘hbb’, ‘hbsb’, ‘hbbb’, ‘hbss’, ‘wb’, ‘wb2’, ‘hblb’, ‘hbls’, ‘lwb’, ‘lwb2’, ‘hp’]. See https://getcontacts.github.io/interactions.html for details.

  • edge_distance_cutoff (float) – Distance in angstroms specifying cutoff distance for constructing an edge when using distance construction

  • long_interaction_threshold (int) – Specifies minimum distance in sequence for two nodes to be connected
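
As a rough illustration of how the parameters above fit together, the following sketch assembles a keyword-argument dictionary and checks it against the option values listed in this section. The helper `check_config`, all directory paths, and the choice of `node_featuriser=None` are hypothetical placeholders rather than part of the library; only the parameter names and the allowed values come from the documentation above.

```python
# Interaction codes and granularities as listed in the parameter documentation.
GETCONTACTS_INTERACTIONS = {
    "sb", "pc", "ps", "ts", "vdw", "hb", "hbb", "hbsb", "hbbb", "hbss",
    "wb", "wb2", "hblb", "hbls", "lwb", "lwb2", "hp",
}
GRANULARITIES = {"atom", "CA", "CB"}


def check_config(config):
    """Hypothetical helper: raise ValueError on undocumented option values."""
    if config["granularity"] not in GRANULARITIES:
        raise ValueError(f"Unknown granularity: {config['granularity']!r}")
    requested = set(config.get("intramolecular_interactions") or [])
    unknown = requested - GETCONTACTS_INTERACTIONS
    if unknown:
        raise ValueError(f"Unknown interaction codes: {sorted(unknown)}")
    return config


# All paths are placeholders; node_featuriser=None is an assumption made
# only to keep the sketch free of a DGL dependency.
config = check_config({
    "granularity": "CA",
    "keep_hets": False,
    "insertions": False,
    "node_featuriser": None,
    "get_contacts_path": "/path/to/getcontacts",
    "pdb_dir": "pdbs/",
    "contacts_dir": "contacts/",
    "intramolecular_interactions": ["hb", "sb", "vdw"],
    "edge_distance_cutoff": 10.0,
    "long_interaction_threshold": 5,
})

# With the library installed, the dictionary could then be unpacked:
# from graphein.construct_graphs import ProteinGraph
# pg = ProteinGraph(**config)
```

Validating the interaction codes up front is a design choice of this sketch; it surfaces typos such as 'hbond' before any structure is downloaded or processed.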

dgl_graph_from_pdb_code(pdb_code=None, file_path=None, chain_selection='all', contact_file=None, edge_construction=['contacts'], encoding=False, k_nn=None, custom_edges=None)

Produces a DGL graph from a PDB code and a selection of polypeptide chains

Parameters
  • file_path (str) – Path to a local PDB file, defaults to None

  • custom_edges (Pandas DataFrame, optional) – Pass user-defined custom edges to use in edge construction, defaults to None

  • edge_construction (list) – Specifies edge construction methods. {‘contacts’, ‘distance’, ‘custom’}, defaults to [‘contacts’]

  • k_nn (int) – Specifies number of nearest neighbours to make K_NN edges with

  • encoding (bool) – Indicates whether or not node names and labels should be encoded

  • contact_file (str) – Path to local GetContacts output file, defaults to None

  • pdb_code (str) – 4 character PDB accession code

  • chain_selection (str) – String indicating which chains to select {‘A’, ‘B’, ‘AB’, …, ‘all’}, defaults to ‘all’

Returns

DGLGraph object, nodes populated by residues or atoms as specified in class initialisation
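
To make the distance-based and nearest-neighbour options more concrete, here is a minimal pure-Python sketch of the two ideas, operating on a list of 3D node coordinates. It is not the library's implementation: the function names are hypothetical, and the inclusive comparisons (distance <= cutoff, sequence separation >= threshold) are assumptions about how `edge_distance_cutoff` and `long_interaction_threshold` might be applied.

```python
import math


def distance_edges(coords, cutoff, min_seq_sep=0):
    """Return pairs (i, j), i < j, closer than `cutoff` in space and at
    least `min_seq_sep` positions apart in the sequence."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if j - i < min_seq_sep:
                continue  # too close in sequence to count as long-range
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


def knn_edges(coords, k):
    """Return directed edges (i, j) from each node i to its k nearest nodes."""
    edges = []
    for i, ci in enumerate(coords):
        others = [j for j in range(len(coords)) if j != i]
        others.sort(key=lambda j: math.dist(ci, coords[j]))
        edges.extend((i, j) for j in others[:k])
    return edges


# Four made-up node positions (in angstroms), indexed by sequence position.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.0, 0.0, 0.0), (7.0, 5.0, 0.0)]

print(distance_edges(coords, cutoff=4.0))                 # [(0, 1), (1, 2)]
print(distance_edges(coords, cutoff=6.0, min_seq_sep=2))  # [(1, 3)]
print(knn_edges(coords, k=1))  # [(0, 1), (1, 2), (2, 1), (3, 2)]
```

Note that k-nearest-neighbour edges are naturally directed (node 3 points to node 2, but not the reverse), whereas a distance cutoff yields symmetric pairs.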

dgl_graph_from_pdb_file(file_path, chain_selection, contact_file, edges=None)

Produces a DGL graph from a PDB file and a selection of polypeptide chains

Parameters
  • edges (Pandas DataFrame, optional) – User-defined custom edges, defaults to None

  • contact_file (str) – Path to local GetContacts output file

  • file_path (str) – Path to local PDB file

  • chain_selection (str) – Polypeptide chains in structure to select {‘A’, ‘B’, ‘AB’, …, ‘all’}

Returns

DGLGraph object, nodes populated by residues or atoms as specified in class initialisation

Return type

DGLGraph
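
The chain_selection values listed above suggest that each character of the string names one chain, with ‘all’ as a special case. The sketch below illustrates that reading; `select_chains` is a hypothetical helper, and the one-character-per-chain interpretation is an assumption rather than documented behaviour.

```python
def select_chains(chain_selection, available_chains):
    """Return the chain IDs picked by a string such as 'A', 'AB' or 'all'."""
    if chain_selection == "all":
        return list(available_chains)
    # Assumption: every character in the string is a single-letter chain ID.
    missing = [c for c in chain_selection if c not in available_chains]
    if missing:
        raise ValueError(f"Chains not found in structure: {missing}")
    return list(chain_selection)


chains_in_structure = ["A", "B", "C"]
print(select_chains("AB", chains_in_structure))   # ['A', 'B']
print(select_chains("all", chains_in_structure))  # ['A', 'B', 'C']
```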

nx_graph_from_pdb_code(pdb_code, chain_selection='all', contact_file=None, edge_construction=['contacts'], encoding=False, k_nn=None, custom_edges=None)

Produces a NetworkX Graph Object

Parameters
  • encoding (bool) – Indicates whether or not node names and labels should be encoded

  • custom_edges (Pandas DataFrame, optional) – User-supplied edges, defaults to None

  • pdb_code (str) – 4 character PDB accession code

  • chain_selection (str) – string indicating chain selection {‘A’, ‘B’, ‘AB’, …, ‘all’}, defaults to ‘all’

  • contact_file (str, optional) – Path to GetContacts output file.

Returns

NetworkX graph object of protein

Return type

NetworkX graph
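
A possible way to call this method and inspect the result is sketched below. The wrapper `summarise_protein_graph` is hypothetical, and `protein_graph` is assumed to be an already configured ProteinGraph instance; the keyword arguments follow the signature above, and `number_of_nodes()` / `number_of_edges()` are standard NetworkX graph methods.

```python
def summarise_protein_graph(protein_graph, pdb_code, chain_selection="all"):
    """Build a NetworkX graph for `pdb_code` and count its nodes and edges.

    `protein_graph` is assumed to expose nx_graph_from_pdb_code with the
    signature documented above.
    """
    graph = protein_graph.nx_graph_from_pdb_code(
        pdb_code=pdb_code,
        chain_selection=chain_selection,
        edge_construction=["contacts"],
    )
    return {
        "pdb_code": pdb_code,
        "nodes": graph.number_of_nodes(),
        "edges": graph.number_of_edges(),
    }
```

Given a configured instance `pg`, a call such as `summarise_protein_graph(pg, "3eiy", chain_selection="A")` would return a small dictionary of counts; the accession code here is only an example.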

nx_graph_from_pdb_file(pdb_code, chain_selection='all', contact_file=None)

Produces a NetworkX Graph Object

Parameters
  • pdb_code (str) – 4 character PDB accession code

  • chain_selection (str) – string indicating chain selection {‘A’, ‘B’, ‘AB’, …, ‘all’}

  • contact_file (str, optional) – Path to GetContacts output file.

Returns

NetworkX graph object of protein

torch_geometric_graph_from_pdb_code(pdb_code, chain_selection='all', edge_construction=['contacts'], contact_file=None, encoding=False, k_nn=None, custom_edges=None)

Produces a PyTorch Geometric Data object from a protein structure

Parameters
  • k_nn (int, optional) – Specifies K nearest neighbours to use in KNN edge construction, defaults to None

  • custom_edges (Pandas DataFrame, optional) – User-supplied edges to use, defaults to None

  • encoding (bool) – Indicates whether or not node names and labels should be encoded

  • edge_construction (list) – List containing edge construction to be used. [‘contacts’, ‘distance’, ‘delaunay’], defaults to [‘contacts’]

  • pdb_code (str) – 4-character PDB accession code

  • chain_selection (str) – Specifies polypeptide chains to include, e.g. one of {‘A’, ‘B’, ‘AB’, ‘BC’}, defaults to ‘all’

  • contact_file (str) – Path to contact file if using local file.

Returns

PyTorch Geometric graph of protein structure.

Return type

PyTorch Geometric Data object
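
A PyTorch Geometric Data object stores connectivity as an `edge_index` of shape [2, num_edges] in COO format, with undirected edges listed once per direction. The following sketch shows that layout using plain Python lists; `to_edge_index` is a hypothetical helper, and whether this method builds its edges in exactly this way is an assumption.

```python
def to_edge_index(edges, undirected=True):
    """Convert (i, j) pairs into a [sources, targets] COO-style layout."""
    sources, targets = [], []
    for i, j in edges:
        sources.append(i)
        targets.append(j)
        if undirected:
            # Store the reverse direction so the edge is usable both ways.
            sources.append(j)
            targets.append(i)
    return [sources, targets]


edge_index = to_edge_index([(0, 1), (1, 2)])
print(edge_index)  # [[0, 1, 1, 2], [1, 0, 2, 1]]

# With PyTorch installed, the nested list could be turned into a tensor:
# import torch
# edge_index = torch.tensor(edge_index, dtype=torch.long)
```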