This notebook is the fourth in the training material series, and focuses on investigating Ramachandran and side-chain outliers using the REST API of PDBe.
First, we import some packages that we will use, and set some variables.
Note: The full list of valid URLs is available from http://www.ebi.ac.uk/pdbe/api/doc/
[1]:
import requests
import re
base_url = "https://www.ebi.ac.uk/pdbe/"
api_base = base_url + "api/"
outlier_data_url = api_base + 'validation/protein-ramachandran-sidechain-outliers/entry/'
Let’s start with defining a function that can be used to GET a single PDB entry, or POST a comma-separated list of PDB entries.
We will use this function to retrieve Ramachandran and side-chain outlier data for entries.
[2]:
def make_request(url, mode, pdb_id):
    """
    This function can make GET and POST requests to
    the PDBe API
    :param url: String,
    :param mode: String, either "get" or "post"
    :param pdb_id: String, one PDB id (GET) or a comma-separated list of PDB ids (POST)
    :return: JSON or None
    """
    if mode == "get":
        response = requests.get(url=url+pdb_id)
    elif mode == "post":
        response = requests.post(url, data=pdb_id)
    else:
        # Any other mode would leave `response` undefined
        print("Unsupported mode: %s" % mode)
        return None

    if response.status_code == 200:
        return response.json()
    else:
        print("[No data retrieved - %s] %s" % (response.status_code, response.text))

    return None
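To make the difference between the two modes concrete, the sketch below shows what each mode would send, without touching the network. `describe_request` is a hypothetical helper introduced only for this illustration (it is not part of the PDBe API or of `requests`), and "1cbs" is just an arbitrary second PDB id.

```python
# Same URL variables as defined in the first cell of this notebook
base_url = "https://www.ebi.ac.uk/pdbe/"
api_base = base_url + "api/"
outlier_data_url = api_base + "validation/protein-ramachandran-sidechain-outliers/entry/"


def describe_request(url, mode, pdb_ids):
    """
    Hypothetical helper: return (method, url, body) describing what
    make_request() would send, without making the call.
    :param url: String,
    :param mode: String, "get" or "post"
    :param pdb_ids: List of Strings
    :return: Tuple
    """
    if mode == "get":
        # GET appends a single PDB id to the URL
        if len(pdb_ids) != 1:
            raise ValueError("GET expects exactly one PDB id")
        return ("GET", url + pdb_ids[0], None)
    if mode == "post":
        # POST keeps the URL as-is and sends the ids as a
        # comma-separated string in the request body
        return ("POST", url, ",".join(pdb_ids))
    raise ValueError("mode must be 'get' or 'post'")


get_call = describe_request(outlier_data_url, "get", ["2aqa"])
post_call = describe_request(outlier_data_url, "post", ["2aqa", "1cbs"])
print(get_call)
print(post_call)
```

In other words, the `pdb_id` argument of `make_request()` is a single id for "get" and a string such as "2aqa,1cbs" for "post".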
We will use one of the validation data calls of the PDBe API to get information on the Ramachandran and side-chain outliers of various models in a PDB entry.
For this exercise, we will look at the NMR entry “2aqa”. Generally, the JSON data will have the following basic structure:
[3]:
example = {
    "2aqa": {
        "ramachandran_outliers": [],
        "sidechain_outliers": []
    }
}
The lists will contain dictionaries which give residue-level information. For example:
[4]:
example = {
    "model_id": 2,
    "entity_id": 1,
    "residue_number": 47,
    "author_residue_number": 48,
    "chain_id": "A",
    "author_insertion_code": "",
    "alt_code": "",
    "struct_asym_id": "A"
}
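Given this structure, counting outliers per model comes down to tallying the "model_id" values in each list. The sketch below tries this offline on a hand-made payload: the records are illustrative only (not real PDBe output), and `count_outliers_per_model` is a hypothetical helper name.

```python
from collections import Counter

# Hand-made payload that mimics the structure described above.
# The values are made up for illustration; they are NOT real API output.
sample_payload = {
    "2aqa": {
        "ramachandran_outliers": [
            {"model_id": 2, "entity_id": 1, "residue_number": 47, "chain_id": "A"},
        ],
        "sidechain_outliers": [
            {"model_id": 1, "entity_id": 1, "residue_number": 10, "chain_id": "A"},
            {"model_id": 2, "entity_id": 1, "residue_number": 10, "chain_id": "A"},
            {"model_id": 2, "entity_id": 1, "residue_number": 31, "chain_id": "A"},
        ],
    }
}


def count_outliers_per_model(payload, pdb_id):
    """
    Hypothetical helper: tally outlier residues per model
    for every outlier type found in the payload
    :param payload: Dictionary,
    :param pdb_id: String
    :return: Dictionary of Counters
    """
    entry = payload.get(pdb_id, {})
    return {
        outlier_type: Counter(residue["model_id"] for residue in residues)
        for outlier_type, residues in entry.items()
    }


counts = count_outliers_per_model(sample_payload, "2aqa")
for outlier_type, per_model in counts.items():
    for model_id in sorted(per_model):
        print("%s: model %i has %i outlier(s)" % (outlier_type, model_id, per_model[model_id]))
```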
The entry “2aqa” has multiple models, and it may be of interest to see if any of the models has relatively more outliers than the rest.
First, we will list the number of outlier residues per model using the functions below:
[5]:
def get_outlier_data(pdb_id):
    """
    This function will GET the outlier data from
    the PDBe API using the make_request() function
    :param pdb_id: String
    :return: JSON
    """
    # Check if the provided PDB id is valid
    # There is no point in making an API call
    # with bad PDB ids
    # A PDB id is a digit followed by three alphanumeric
    # characters; fullmatch() also rejects trailing characters
    if not re.fullmatch("[0-9][A-Za-z0-9]{3}", pdb_id):
        print("Invalid PDB id")
        return None

    # GET the outlier data
    outlier_data = make_request(outlier_data_url, "get", pdb_id)

    # Check if there is data
    if not outlier_data:
        print("No data found")
        return None

    return outlier_data

def list_number_of_outliers_per_model(pdb_id):
    """
    This function calls get_outlier_data() and
    then lists the number of Ramachandran and
    side-chain outliers per model in the PDB entry
    :param pdb_id: String,
    :return: None
    """
    # We will collect the number of outlier
    # residues per model
    outliers = {"ramachandran_outliers": {}, "sidechain_outliers": {}}

    # Getting outlier data
    outlier_data = get_outlier_data(pdb_id)

    # If there is no data, return None
    if not outlier_data:
        return None

    # Iterate through both Ramachandran and
    # side-chain outliers
    for key in outliers.keys():
        for i in range(len(outlier_data[pdb_id][key])):
            # Grab the model id
            model_id = outlier_data[pdb_id][key][i]["model_id"]
            # If the model id was not observed before, add to
            # the outliers dictionary with the corresponding
            # outlier type, otherwise increase the count by one
            if model_id not in outliers[key].keys():
                outliers[key][model_id] = 1
            else:
                outliers[key][model_id] += 1

    print("Ramachandran outliers:")
    for model in outliers["ramachandran_outliers"].keys():
        print("Model %i has %i Ramachandran outliers" % (model,
              outliers["ramachandran_outliers"][model]))
    print()
    print("Side-chain outliers:")
    for model in outliers["sidechain_outliers"].keys():
        print("Model %i has %i Side-chain outliers" % (model,
              outliers["sidechain_outliers"][model]))

list_number_of_outliers_per_model("2aqa")
Ramachandran outliers:
Model 2 has 1 Ramachandran outliers
Model 3 has 1 Ramachandran outliers
Model 5 has 1 Ramachandran outliers
Model 6 has 1 Ramachandran outliers
Model 7 has 1 Ramachandran outliers
Model 8 has 1 Ramachandran outliers
Model 9 has 1 Ramachandran outliers
Model 11 has 1 Ramachandran outliers
Model 12 has 1 Ramachandran outliers
Model 13 has 1 Ramachandran outliers
Model 15 has 1 Ramachandran outliers
Model 16 has 1 Ramachandran outliers
Model 19 has 1 Ramachandran outliers
Side-chain outliers:
Model 1 has 4 Side-chain outliers
Model 2 has 6 Side-chain outliers
Model 3 has 8 Side-chain outliers
Model 4 has 6 Side-chain outliers
Model 5 has 6 Side-chain outliers
Model 6 has 4 Side-chain outliers
Model 7 has 9 Side-chain outliers
Model 8 has 4 Side-chain outliers
Model 9 has 5 Side-chain outliers
Model 10 has 4 Side-chain outliers
Model 11 has 8 Side-chain outliers
Model 12 has 4 Side-chain outliers
Model 13 has 5 Side-chain outliers
Model 14 has 5 Side-chain outliers
Model 15 has 4 Side-chain outliers
Model 16 has 6 Side-chain outliers
Model 17 has 7 Side-chain outliers
Model 18 has 2 Side-chain outliers
Model 19 has 7 Side-chain outliers
Model 20 has 4 Side-chain outliers
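To see whether any model stands out, the counts printed above can be combined and ranked. The sketch below copies those counts by hand into two dictionaries (so it runs without an API call); `rank_models` is a hypothetical helper, and summing the two outlier types into a single total is only one possible way to compare models.

```python
# Side-chain outlier counts per model, copied from the output above
sidechain_counts = {
    1: 4, 2: 6, 3: 8, 4: 6, 5: 6, 6: 4, 7: 9, 8: 4, 9: 5, 10: 4,
    11: 8, 12: 4, 13: 5, 14: 5, 15: 4, 16: 6, 17: 7, 18: 2, 19: 7, 20: 4,
}
# Ramachandran outlier counts per model, copied from the output above
# (models that are not listed have no Ramachandran outliers)
ramachandran_counts = {
    2: 1, 3: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1,
    11: 1, 12: 1, 13: 1, 15: 1, 16: 1, 19: 1,
}


def rank_models(sidechain, ramachandran):
    """
    Hypothetical helper: sum both outlier types per model and
    sort from most to fewest outliers (ties broken by model id)
    :param sidechain: Dictionary,
    :param ramachandran: Dictionary
    :return: List of (model_id, total) tuples
    """
    models = set(sidechain) | set(ramachandran)
    totals = {m: sidechain.get(m, 0) + ramachandran.get(m, 0) for m in models}
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_models(sidechain_counts, ramachandran_counts)
worst_model, worst_total = ranking[0]
best_model, best_total = ranking[-1]
print("Model %i has the most outliers (%i)" % (worst_model, worst_total))
print("Model %i has the fewest outliers (%i)" % (best_model, best_total))
```

With these counts, Model 7 comes out on top (9 side-chain outliers plus 1 Ramachandran outlier) and Model 18 at the bottom (2 side-chain outliers).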
Next, we will write a function that lists the outlier residues within a specific model.
[6]:
def list_outlier_residues_of_model(pdb_id, model_id):
    """
    This function calls get_outlier_data()
    and lists all outlier residues of a
    specific model
    :param pdb_id: String,
    :param model_id: Integer,
    :return: None
    """
    outlier_data = get_outlier_data(pdb_id)

    # If there is no data, return None
    if not outlier_data:
        return None

    # Iterate through the outlier types
    for outlier_type in outlier_data[pdb_id].keys():
        # Loop through all the residue-level outlier information
        for i in range(len(outlier_data[pdb_id][outlier_type])):
            outlier_information = outlier_data[pdb_id][outlier_type][i]
            # Only process outlier information that corresponds to
            # the model id of interest
            if outlier_information["model_id"] != model_id:
                continue
            # Set outlier type labels
            if outlier_type == "ramachandran_outliers":
                label = "Ramachandran"
            else:
                label = "side-chain"
            residue = outlier_information["residue_number"]
            chain = outlier_information["chain_id"]
            entity = outlier_information["entity_id"]
            print("Residue %i in chain %s of entity %i is a %s outlier" % (residue,
                                                                           chain,
                                                                           entity,
                                                                           label))

print("Listing outlier residues for Model #7 of PDB entry 2aqa:")
list_outlier_residues_of_model("2aqa", 7)
Listing outlier residues for Model #7 of PDB entry 2aqa:
Residue 14 in chain A of entity 1 is a Ramachandran outlier
Residue 3 in chain A of entity 1 is a side-chain outlier
Residue 22 in chain A of entity 1 is a side-chain outlier
Residue 27 in chain A of entity 1 is a side-chain outlier
Residue 30 in chain A of entity 1 is a side-chain outlier
Residue 39 in chain A of entity 1 is a side-chain outlier
Residue 42 in chain A of entity 1 is a side-chain outlier
Residue 44 in chain A of entity 1 is a side-chain outlier
Residue 46 in chain A of entity 1 is a side-chain outlier
Residue 57 in chain A of entity 1 is a side-chain outlier
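A possible follow-up is to check whether the same residue is flagged in several models. The sketch below shows one way to do this with the same JSON structure; it runs on a small hand-made payload (the records are illustrative, not real data for 2aqa), and `residues_flagged_across_models` is a hypothetical helper name.

```python
from collections import defaultdict


def residues_flagged_across_models(payload, pdb_id, outlier_type):
    """
    Hypothetical helper: count in how many models each residue
    is reported as an outlier of the given type
    :param payload: Dictionary,
    :param pdb_id: String,
    :param outlier_type: String,
    :return: List of ((chain_id, residue_number), model_count) tuples
    """
    models_per_residue = defaultdict(set)
    for record in payload.get(pdb_id, {}).get(outlier_type, []):
        key = (record["chain_id"], record["residue_number"])
        # Using a set so that a model is counted once per residue
        models_per_residue[key].add(record["model_id"])
    # Most frequently flagged residues first
    return sorted(
        ((key, len(models)) for key, models in models_per_residue.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hand-made records for illustration only; NOT real API output
sample_payload = {
    "2aqa": {
        "ramachandran_outliers": [],
        "sidechain_outliers": [
            {"model_id": 1, "chain_id": "A", "residue_number": 22},
            {"model_id": 2, "chain_id": "A", "residue_number": 22},
            {"model_id": 2, "chain_id": "A", "residue_number": 3},
            {"model_id": 3, "chain_id": "A", "residue_number": 22},
            {"model_id": 3, "chain_id": "A", "residue_number": 3},
        ],
    }
}

recurrent = residues_flagged_across_models(sample_payload, "2aqa", "sidechain_outliers")
for (chain, residue), n_models in recurrent:
    print("Residue %i in chain %s is a side-chain outlier in %i models" % (residue, chain, n_models))
```

To run this on real data, the hand-made payload could be replaced with the dictionary returned by get_outlier_data().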