This notebook is the second in the training material series, and focuses on getting secondary structure mappings for PDB entries using the REST API of PDBe.
First, we import some packages that we will use, and set some variables.
Note: Full list of valid URLs is available from http://www.ebi.ac.uk/pdbe/api/doc/
[1]:
import re
import requests
base_url = "https://www.ebi.ac.uk/pdbe/"
api_base = base_url + "api/"
secondary_structure_url = api_base + 'pdb/entry/secondary_structure/'
Let’s start with defining a function that can be used to GET a single PDB entry, or POST a comma-separated list of PDB entries.
We will use this function to retrieve secondary structure mapping for entries.
[2]:
def make_request(url, mode, pdb_id):
"""
This function can make GET and POST requests to
the PDBe API
:param url: String,
:param mode: String,
:param pdb_id: String
:return: JSON or None
"""
if mode == "get":
response = requests.get(url=url+pdb_id)
elif mode == "post":
response = requests.post(url, data=pdb_id)
if response.status_code == 200:
return response.json()
else:
print("[No data retrieved - %s] %s" % (response.status_code, response.text))
return None
Next, we will define a function that can be used to retrieve secondary structural element ranges for PDB entries, and extracts this information so that it can be displayed in a user-friendly way.
The function will rely on the make_request() function we have defined previously.
This new function should either accept a single PDB id, or a list of PDB ids, and make a GET or a POST call to the API accordingly. The data structure then has to be traversed, and the residue ranges of helices and strands have to be recorded. Since the data is in a nested JSON format, (for the sake of not touching on more advanced Python topics) we will use nested for-loops to get to the level of interest.
If you are wondering how the complete JSON looks like, follow this link: https://www.ebi.ac.uk/pdbe/api/pdb/entry/secondary_structure/1cbs
[3]:
def get_secondary_structure_ranges(pdb_id=None, pdb_list=None):
"""
This function calls the PDBe API and retrieves the residue
ranges of secondary structural elements in a single PDB entry
or in a list of PDB entries
:param pdb_id: String,
:param pdb_list: String
:return: None
"""
# If neither a single PDB id, nor a list was provided,
# exit the function
if not pdb_id and not pdb_list:
print("Either provide one PDB id, or a list of ids")
return None
if pdb_id:
# If a single PDB id was provided, call the API with GET
data = make_request(secondary_structure_url, "get", pdb_id)
else:
# If multiple PDB ids were provided, call the API with POST
# The POST API call expects PDB ids as a comma-separated lise
pdb_list_string = ", ".join(pdb_list)
data = make_request(secondary_structure_url, "post", pdb_list_string)
# When no data is returned by the API, exit the function
if not data:
print("No data available")
return None
# Loop through all the PDB entries in the retrieved data
for entry_id in data.keys():
entry = data[entry_id]
molecules = entry["molecules"]
# Loop through all the molecules of a given PDB entry
for i in range(len(molecules)):
chains = molecules[i]["chains"]
# Loop through all the chains of a given molecules
for j in range(len(chains)):
secondary_structure = chains[j]["secondary_structure"]
helices = secondary_structure["helices"]
strands = secondary_structure["strands"]
helix_list = []
strand_list = []
# Loop through all the helices of a given chain
for k in range(len(helices)):
start = helices[k]["start"]["residue_number"]
end = helices[k]["end"]["residue_number"]
helix_list.append("%s-%s" % (start, end))
# Loop through all the strands of a given chain
for l in range(len(strands)):
start = strands[l]["start"]["residue_number"]
end = strands[l]["end"]["residue_number"]
strand_list.append("%s-%s" % (start, end))
report = "%s chain %s has " % (entry_id, chains[j]["chain_id"])
if len(helix_list) > 0:
report += "helices at residue ranges %s " % str(helix_list)
else:
report += "no helices "
report += "and "
if len(strand_list) > 0:
report += "strands at %s" % str(strand_list)
else:
"no strands"
print(report)
return None
Let’s try our new function first with a single PDB entry (1cbs), and then with a list of two entries (2aqa and 2klm)
[4]:
print("Example of a single PDB entry")
get_secondary_structure_ranges(pdb_id="1cbs")
print()
print("Example of multiple PDB entries")
get_secondary_structure_ranges(pdb_list=["2aqa", "2klm"])
Example of a single PDB entry
1cbs chain A has helices at residue ranges ['14-22', '25-37'] and strands at ['5-13', '40-46', '49-55', '60-66', '71-74', '80-89', '92-99', '107-113', '119-125', '128-136']
Example of multiple PDB entries
2klm chain A has helices at residue ranges ['24-29', '33-49', '74-83', '100-110', '119-135'] and strands at ['4-14', '51-60', '64-66', '98-99', '137-138']
2aqa chain A has helices at residue ranges ['41-46'] and strands at ['5-6', '12-13']
[ ]: