This notebook is the third in the training material series, and focuses on evaluating the impact of PDB entries by counting citations using the REST API of PDBe.
First, we import some packages that we will use, and set some variables.
Note: The full list of valid URLs is available from http://www.ebi.ac.uk/pdbe/api/doc/
[2]:
import requests
import re
base_url = "https://www.ebi.ac.uk/pdbe/"
api_base = base_url + "api/"
citation_url = api_base + 'pdb/entry/related_publications/'
Let’s start by defining a function that can either GET the data for a single PDB entry or POST a comma-separated list of PDB entries.
We will use this function to retrieve citation data for entries.
[3]:
def make_request(url, mode, pdb_id):
    """
    This function can make GET and POST requests to
    the PDBe API

    :param url: String, the API endpoint URL
    :param mode: String, either "get" or "post"
    :param pdb_id: String, one PDB id (GET) or a comma-separated list (POST)
    :return: JSON or None
    """
    if mode == "get":
        response = requests.get(url=url+pdb_id)
    elif mode == "post":
        response = requests.post(url, data=pdb_id)
    else:
        # Any other mode would leave `response` undefined
        print("[Unsupported mode] %s" % mode)
        return None

    if response.status_code == 200:
        return response.json()
    else:
        print("[No data retrieved - %s] %s" % (response.status_code, response.text))

    return None
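To make the difference between the two modes concrete, the sketch below builds the request target and body as plain values, without contacting the server. The helper `describe_request` is hypothetical: it is not part of this notebook or of the PDBe API, and only mirrors how `make_request` combines its arguments.

```python
# Hypothetical helper (not part of the notebook): shows what make_request
# would send for each mode, without any network traffic.
citation_url = "https://www.ebi.ac.uk/pdbe/api/pdb/entry/related_publications/"


def describe_request(url, mode, pdb_id):
    """Return (HTTP method, target URL, request body) for a given mode."""
    if mode == "get":
        # GET: the single PDB id is appended to the URL
        return ("GET", url + pdb_id, None)
    if mode == "post":
        # POST: the URL is unchanged, the id list travels in the body
        return ("POST", url, pdb_id)
    raise ValueError("mode must be 'get' or 'post', got %r" % mode)


print(describe_request(citation_url, "get", "3bow"))
print(describe_request(citation_url, "post", "1cbs, 2aqa"))
```

In GET mode the id becomes part of the URL, so only one entry can be requested per call; in POST mode several ids can be sent at once in the request body.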
In this exercise, we will try to evaluate the impact of a set of PDB entries based on their number of citations (publications that mention the entry, where the authors are not the same as the PDB depositors).
We will use the “related publications” API call to do this.
[4]:
pdb_list = ("1cbs", "2aqa", "3bow", "2klm", "5tok")
Let’s see what the API data would look like for a specific entry.
For example, the entry “3bow” returns information such as this:
[5]:
{
    "3bow": {
        "appears_without_citation": {
            "Reviews": [],
            "Articles": []
        },
        "cited_by": {
            "Reviews": [
                {
                    "title": "Calpains and cancer: friends or enemies?",
                    "journal": "Arch. Biochem. Biophys.",
                    "citation_type": "Review",
                    "year": "2014",
                    "volume": "564",
                    "pubmed_id": "25305531",
                    "authors": "Moretti D, Del Bello B, Allavena G, Maellaro E.",
                    "cited_by_count": 12,
                    "pages": "26-36"
                }
            ],
            "Articles": []
        },
        "uniprot_publications": {
            "Reviews": [],
            "Articles": []
        }
    }
}
[5]:
{'3bow': {'appears_without_citation': {'Reviews': [], 'Articles': []},
'cited_by': {'Reviews': [{'title': 'Calpains and cancer: friends or enemies?',
'journal': 'Arch. Biochem. Biophys.',
'citation_type': 'Review',
'year': '2014',
'volume': '564',
'pubmed_id': '25305531',
'authors': 'Moretti D, Del Bello B, Allavena G, Maellaro E.',
'cited_by_count': 12,
'pages': '26-36'}],
'Articles': []},
'uniprot_publications': {'Reviews': [], 'Articles': []}}}
As you can see, the API call returns a structured JSON object with information on three types of related publications: “appears_without_citation”, “cited_by” and “uniprot_publications”. For our purposes, the most relevant citations are found in the “cited_by” sub-dictionary. For the sake of this exercise, we will assume that the impact of a PDB entry is best quantified by how many times it was cited directly. We will also distinguish between reviews and articles, as articles are on the frontline of science, and the most impactful developments are published in this type of publication.
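As a quick offline illustration, the sketch below counts the reviews and articles in the “cited_by” sub-dictionary of the sample data for “3bow” shown earlier. The variable name `sample` is only illustrative, and because the sample lists a single publication, the numbers will differ from what a live API call returns.

```python
# Sample response for "3bow", copied from the example data in this notebook
sample = {
    "3bow": {
        "appears_without_citation": {"Reviews": [], "Articles": []},
        "cited_by": {
            "Reviews": [
                {
                    "title": "Calpains and cancer: friends or enemies?",
                    "journal": "Arch. Biochem. Biophys.",
                    "citation_type": "Review",
                    "year": "2014",
                    "volume": "564",
                    "pubmed_id": "25305531",
                    "authors": "Moretti D, Del Bello B, Allavena G, Maellaro E.",
                    "cited_by_count": 12,
                    "pages": "26-36",
                }
            ],
            "Articles": [],
        },
        "uniprot_publications": {"Reviews": [], "Articles": []},
    }
}

# Direct citations live under "cited_by", split into reviews and articles
cited_by = sample["3bow"]["cited_by"]
number_of_reviews = len(cited_by["Reviews"])
number_of_articles = len(cited_by["Articles"])
print(number_of_reviews, number_of_articles)  # prints: 1 0
```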
[6]:
def calculate_citations(pdb_list):
    """
    This function will calculate the number of review and article
    citations for each PDB entry in a list of ids

    :param pdb_list: List of PDB ids
    :return: Dict or None
    """
    # We will save valid and unique PDB ids
    validated_unique_ids = []
    citation_counts = {}

    # First, we loop through the PDB list
    # and check if the ids match the PDB
    # id pattern: one digit followed by three
    # alphanumeric characters. fullmatch also
    # rejects ids with trailing characters
    for pdb_id in pdb_list:
        if not re.fullmatch("[0-9][A-Za-z0-9]{3}", pdb_id):
            continue
        if pdb_id not in validated_unique_ids:
            validated_unique_ids.append(pdb_id)

    # Join the list of PDB ids in a string
    # format that the API requires
    joined_list = ", ".join(validated_unique_ids)

    # Get the citations data for the list of
    # PDB entries
    citations_data = make_request(citation_url, "post", joined_list)

    if not citations_data:
        print("No data")
        return None

    # Count the direct citations of each entry by type
    for entry_id in citations_data.keys():
        number_of_reviews = len(citations_data[entry_id]["cited_by"]["Reviews"])
        number_of_articles = len(citations_data[entry_id]["cited_by"]["Articles"])
        citation_counts[entry_id] = {"reviews": number_of_reviews, "articles": number_of_articles}

    return citation_counts
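The validation step can be tried on its own, without calling the API. The following is a minimal sketch, assuming that a PDB id consists of one digit followed by three alphanumeric characters; the helper `filter_valid_ids` and the candidate list are hypothetical and only serve as an illustration.

```python
import re

# Assumption: a PDB id is one digit followed by three alphanumeric characters
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def filter_valid_ids(pdb_ids):
    """Hypothetical helper: keep valid ids once each, in their original order."""
    valid = []
    for pdb_id in pdb_ids:
        # fullmatch rejects strings with trailing characters, e.g. "1cbs_extra"
        if PDB_ID_PATTERN.fullmatch(pdb_id) and pdb_id not in valid:
            valid.append(pdb_id)
    return valid


candidates = ["1cbs", "2aqa", "1cbs", "xyz1", "1cbs_extra", "102l"]
print(filter_valid_ids(candidates))  # prints: ['1cbs', '2aqa', '102l']
```

Filtering before the request keeps malformed or duplicated ids out of the POST body, so the API only receives ids it can resolve.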
Now that we have this function, we can call it with the list of PDB entries. We also add a simple function that prints the results in a more user-friendly manner.
[7]:
print("Getting citation counts:")
counts = calculate_citations(pdb_list)
print(counts)
print()

def print_nicely(counts):
    """
    This function iterates through a count
    dictionary and prints the values in a
    user-friendly way

    :param counts: Dict,
    :return: None
    """
    print("pdb id\tarticles\treviews")
    for entry_id in counts.keys():
        print("%s\t%i\t%i" % (entry_id,
                              counts[entry_id]["articles"],
                              counts[entry_id]["reviews"]))
    return None

print_nicely(counts)
Getting citation counts:
{'2klm': {'reviews': 11, 'articles': 10}, '1cbs': {'reviews': 16, 'articles': 80}, '2aqa': {'reviews': 22, 'articles': 42}, '5tok': {'reviews': 3, 'articles': 1}, '3bow': {'reviews': 25, 'articles': 86}}
pdb id articles reviews
2klm 10 11
1cbs 80 16
2aqa 42 22
5tok 1 3
3bow 86 25
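To compare the entries, the counts can be sorted. The sketch below reuses the counts printed above and ranks the entries by article citations, using review citations as a tie-breaker. This ranking rule is an assumption made for illustration, in line with the emphasis on articles in this exercise, and not an established impact metric.

```python
# Counts copied from the output above
counts = {
    "2klm": {"reviews": 11, "articles": 10},
    "1cbs": {"reviews": 16, "articles": 80},
    "2aqa": {"reviews": 22, "articles": 42},
    "5tok": {"reviews": 3, "articles": 1},
    "3bow": {"reviews": 25, "articles": 86},
}

# Assumed ranking rule: most article citations first, reviews as tie-breaker
ranked = sorted(
    counts,
    key=lambda entry_id: (counts[entry_id]["articles"], counts[entry_id]["reviews"]),
    reverse=True,
)

print("pdb id\tarticles\treviews")
for entry_id in ranked:
    print("%s\t%i\t%i" % (entry_id,
                          counts[entry_id]["articles"],
                          counts[entry_id]["reviews"]))
```

With these counts, “3bow” and “1cbs” come out on top, while “5tok” has the fewest citations of either type.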