Examples
Basics
The main aim of AIUPred is to identify Intrinsically Disordered Protein Regions (IDPRs, i.e.
regions that lack a stable monomeric structure under native
conditions) based on a biophysics-based model enhanced with deep learning techniques.
The user can input any protein sequence, and AIUPred returns a score between 0 and 1 for each
residue, corresponding to the probability of the given
residue being part of a disordered region.
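As an illustration of how per-residue scores might be read, the sketch below groups consecutive residues scoring above a cutoff into candidate disordered regions. The cutoff of 0.5, the function name `disordered_regions`, and the example scores are assumptions made for this illustration and are not taken from AIUPred.

```python
def disordered_regions(scores, cutoff=0.5):
    """Return 1-based inclusive (start, end) ranges where score > cutoff.

    The 0.5 cutoff is an assumed illustrative value, not an AIUPred setting.
    """
    regions = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score > cutoff and start is None:
            start = position  # a new region opens at this residue
        elif score <= cutoff and start is not None:
            regions.append((start, position - 1))  # region ended at the previous residue
            start = None
    if start is not None:
        regions.append((start, len(scores)))  # region runs to the end of the sequence
    return regions


# Made-up scores for a 10-residue sequence
example_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.6, 0.7, 0.4, 0.9]
print(disordered_regions(example_scores))  # [(1, 3), (7, 8), (10, 10)]
```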
The disordered nature of a protein segment can be context dependent: certain protein regions
can switch between an ordered and a disordered state
depending on various environmental factors. Currently, the AIUPred server is able to detect
such context-dependent disorder in the case where the
environmental factors are either a change in the redox state or the presence of an ordered
binding partner (for more details see
here).
The following sections outline the use of AIUPred in various scenarios.
Protein sequence input
There are two basic ways to input protein sequences into AIUPred:
- I - If the protein is deposited in the UniProt database (either in SwissProt or
TrEMBL) you can specify the accession
code or the Entry of the protein in the "Enter UniProt accession or entry" field. The AIUPred server
is always linked to the latest version of UniProt. The header of the UniProt entry will be
- II (through advanced options) - Type or copy and paste your sequence into the "paste the amino acid sequence" field. The amino acid sequence must be in standard FASTA format or a plain sequence. Spaces and other non-standard characters within the pasted sequence are permitted; they are removed, and the remaining sequence is treated as a single continuous chain.
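The clean-up described for pasted sequences can be sketched as follows. This is only an approximation of the behaviour described above: the function name `clean_pasted_sequence` is hypothetical, and treating every character outside the 20 standard amino acid letters as "non-standard" is an assumption, since the server's exact rule is not stated here.

```python
# Assumption: only the 20 standard amino acid letters are kept.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_pasted_sequence(text):
    """Return (header, sequence) from FASTA or plain-sequence input."""
    header = None
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is None:
                header = line[1:].strip()  # keep the first FASTA header
            continue  # header lines carry no residues
        # Drop spaces, digits, and any other character outside the alphabet
        residues.extend(ch for ch in line.upper() if ch in STANDARD_AMINO_ACIDS)
    return header, "".join(residues)


pasted = ">sp|TEST|example\nMKT AYI\nAK-QR 12\n"
print(clean_pasted_sequence(pasted))  # ('sp|TEST|example', 'MKTAYIAKQR')
```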
Prediction type
Two disorder prediction types are offered through the advanced options, each suited to
slightly different applications: default smoothing and no smoothing.
Default smoothing:
Apply a Savitzky–Golay filter with a window size of 11 and a polynomial order of 5, which increases the performance of the method.
No smoothing:
Show the raw output of the neural network without any smoothing.
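To make the smoothing step concrete, here is a minimal pure-Python sketch of a Savitzky–Golay filter with a window of 11 and a polynomial order of 5. It is not AIUPred's own code: the function names are hypothetical, and leaving the first and last five residues unsmoothed is a simplifying assumption. In practice a library routine such as `scipy.signal.savgol_filter` is the usual choice and handles the edges differently.

```python
from fractions import Fraction


def savgol_weights(window=11, order=5):
    """Centre-point Savitzky-Golay convolution weights."""
    half = window // 2
    offsets = range(-half, half + 1)
    size = order + 1
    # Normal equations (A^T A) c = e_0 of a least-squares polynomial fit,
    # with A[i][j] = offset_i ** j; solved exactly using fractions.
    matrix = [[Fraction(sum(x ** (r + c) for x in offsets)) for c in range(size)]
              for r in range(size)]
    rhs = [Fraction(int(r == 0)) for r in range(size)]
    for col in range(size):  # Gauss-Jordan elimination
        pivot = next(r for r in range(col, size) if matrix[r][col] != 0)
        matrix[col], matrix[pivot] = matrix[pivot], matrix[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(size):
            if r != col and matrix[r][col] != 0:
                factor = matrix[r][col] / matrix[col][col]
                matrix[r] = [a - factor * b for a, b in zip(matrix[r], matrix[col])]
                rhs[r] -= factor * rhs[col]
    coeffs = [rhs[i] / matrix[i][i] for i in range(size)]
    # Weight of each window position = fitted polynomial's response at the centre
    return [float(sum(c * x ** j for j, c in enumerate(coeffs))) for x in offsets]


def smooth_scores(scores, window=11, order=5):
    """Smooth interior points; edges are left unchanged (simplifying assumption)."""
    half = window // 2
    smoothed = list(scores)
    if len(scores) < window:
        return smoothed
    weights = savgol_weights(window, order)
    for i in range(half, len(scores) - half):
        segment = scores[i - half:i + half + 1]
        smoothed[i] = sum(w * s for w, s in zip(weights, segment))
    return smoothed


# Made-up raw scores with some jitter
raw = [0.2, 0.25, 0.1, 0.3, 0.2, 0.6, 0.5, 0.8, 0.7, 0.9, 0.85, 0.95, 0.8, 0.9, 0.88]
print([round(value, 3) for value in smooth_scores(raw)])
```

A filter of polynomial order 5 reproduces any polynomial of degree 5 or lower exactly, so it removes jitter while preserving the broader shape of the score profile.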
Context-dependent predictions
IDPRs often harbor binding regions that are able to specifically interact with a globular domain. During this interaction, in the majority of known cases, the binding disordered region adopts an ordered structure in its bound form. This is probably the most commonly occurring context-dependent protein disorder, where the transition between the unstructured and the structured states is initiated by the presence of an appropriate protein partner. Such disordered binding regions are identified using the ANCHOR2 prediction algorithm. Similarly to AIUPred, ANCHOR2 assigns each residue a score between 0 and 1, representing the probability of the given residue being part of a disordered binding region. When ANCHOR2 is selected as a prediction option, the ANCHOR2 score is provided along with the AIUPred score.
Output
Basic features:
The primary output of AIUPred is a graph showing the disorder tendency of each residue in
the given protein, where higher values correspond
to a higher probability of disorder. The graph is scalable and can be directly downloaded
for presentation/publication purposes. The list of
position-specific disorder scores is also downloadable in simple text or JSON
format.
Extended features:
If the prediction was run by specifying a UniProt ID/accession, the output of AIUPred also
shows additional protein annotations, including
Pfam regions; post-translational
modifications (PTMs), including phosphorylations (upper line),
methylations and acetylations (lower line) taken from PhosphoSitePlus;
corresponding structures
from the PDB; and regions that were
experimentally verified to be disordered, taken from
DisProt, DIBS, and
MFIB.
If context-dependent predictions were selected, the output graph and the downloadable
results incorporate additional data as well.
Regions overlapping with experimentally verified disordered regions are marked with a red background on the
plot. In addition, regions that were categorised as ordered are marked with a grey
background.
In case of disordered binding region prediction via ANCHOR2, the graph shows the probability
of each residue being part of a binding region in blue.
The AIUPred and ANCHOR2 scores can be shown or hidden by clicking on the
legend.
RESTful API
AIUPred can be accessed through a RESTful API to analyse proteins programmatically. The API accepts a standard GET request at
https://aiupred.elte.hu/rest_api
Available parameters are:
Parameter | Required | Default | Values
---|---|---|---
accession | Required | | UniProt accession
analysis_type | Optional | | 'binding' for ANCHOR2 or 'redox' for AIUPred-redox
smoothing | Optional | default | 'default' or 'False'
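import requests

# Query the AIUPred REST API for a UniProt accession with redox analysis enabled.
data = {'accession': 'q32p44', 'smoothing': 'default', 'analysis_type': 'redox'}
url = 'https://aiupred.elte.hu/rest_api'
response = requests.get(url, params=data)
response.raise_for_status()  # stop early on an HTTP error
for key, val in response.json().items():
    print(key, val)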
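When requests are generated in bulk, it can help to check the parameters before sending anything. The sketch below builds the request URL with the standard library only and rejects values not listed in the parameter table. The helper name `build_request_url` is hypothetical, and treating the listed values as the complete set the server accepts is an assumption.

```python
from urllib.parse import urlencode

API_URL = "https://aiupred.elte.hu/rest_api"
# Values taken from the parameter table; assumed here to be exhaustive.
ANALYSIS_TYPES = {"binding", "redox"}
SMOOTHING_VALUES = {"default", "False"}


def build_request_url(accession, analysis_type=None, smoothing=None):
    """Return the GET URL for one accession, validating optional parameters."""
    if not accession:
        raise ValueError("accession is required")
    params = {"accession": accession}
    if analysis_type is not None:
        if analysis_type not in ANALYSIS_TYPES:
            raise ValueError(f"unknown analysis_type: {analysis_type!r}")
        params["analysis_type"] = analysis_type
    if smoothing is not None:
        if smoothing not in SMOOTHING_VALUES:
            raise ValueError(f"unknown smoothing value: {smoothing!r}")
        params["smoothing"] = smoothing
    return f"{API_URL}?{urlencode(params)}"


print(build_request_url("q32p44", analysis_type="redox", smoothing="default"))
# https://aiupred.elte.hu/rest_api?accession=q32p44&analysis_type=redox&smoothing=default
```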
Programmatic usage
AIUPred can be freely downloaded by academic users. The package contains an importable Python library as well as an executable Python script. First, download and extract AIUPred. Then change the working directory to the extracted directory and install the dependencies. Using a virtual environment is highly advised.
pip3 install -r requirements.txt
If you want to use the standalone executable just run python3 aiupred.py
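The script can also be driven from another Python program. The sketch below only assembles an argument list, using the flags that aiupred.py reports in its help text. The helper name `build_aiupred_command` and the file names `proteins.fasta` and `scores.txt` are hypothetical placeholders.

```python
import sys


def build_aiupred_command(input_file, output_file=None, verbose=False,
                          gpu=None, force_cpu=False, no_smoothing=False,
                          low_memory=None):
    """Assemble an argument list for aiupred.py from its help-text flags."""
    command = [sys.executable, "aiupred.py", "-i", str(input_file)]
    if output_file is not None:
        command += ["-o", str(output_file)]
    if verbose:
        command.append("-v")
    if gpu is not None:
        command += ["-g", str(gpu)]
    if force_cpu:
        command.append("--force-cpu")
    if no_smoothing:
        command.append("--no-smoothing")
    if low_memory is not None:
        command += ["--low-memory", str(low_memory)]  # chunk size
    return command


# Hypothetical file names, for illustration only
command = build_aiupred_command("proteins.fasta", output_file="scores.txt",
                                no_smoothing=True, low_memory=500)
print(" ".join(command[1:]))
# To run it from the extracted AIUPred directory:
# import subprocess; subprocess.run(command, check=True)
```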
Available options:
usage: aiupred.py [-h] -i INPUT_FILE [-o OUTPUT_FILE] [-v] [-g GPU] [--force-cpu] [--no-smoothing] [--low-memory [LOW_MEMORY]]

options:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input_file INPUT_FILE
                        Input file in (multi) FASTA format
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Output file
  -v, --verbose         Increase output verbosity
  -g GPU, --gpu GPU     Index of GPU to use, default=0
  --force-cpu           Force the network to only utilize the CPU. Calculation will be very slow, not recommended
  --no-smoothing        Removes the default SavGol smoothing function
  --low-memory [LOW_MEMORY]
                        Use chunking to lower the memory usage. Default chunk size is 1000. The lower the chunk size the lower the memory consumption well as the accuracy

The following section gives some tips on how to use the importable library.
Add the location of the extracted directory to your PYTHONPATH environment variable (assuming standard bash shell)
export PYTHONPATH="${PYTHONPATH}:/path/to/aiupred/folder"
After reloading the shell, AIUPred will be importable in your Python scripts.
import aiupred_lib
# Load the models and let AIUPred find if a GPU is available.
embedding_model, regression_model, device = aiupred_lib.init_models()
# Predict disorder of a sequence
sequence = 'THISISATESTSEQENCE'
prediction = aiupred_lib.predict_disorder(sequence, embedding_model, regression_model, device)
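Once a prediction is available, the scores can be paired with the residues for inspection or export. The sketch below assumes that `prediction` holds one numeric score per residue, in sequence order, as the per-residue description above suggests. The return type of `predict_disorder` is not confirmed here, and the helper name `format_prediction` and the placeholder scores are hypothetical.

```python
def format_prediction(sequence, prediction):
    """Return tab-separated lines of position, residue, and score.

    Assumes one score per residue, in sequence order.
    """
    if len(sequence) != len(prediction):
        raise ValueError("expected one score per residue")
    lines = ["# pos\tresidue\tscore"]
    for position, (residue, score) in enumerate(zip(sequence, prediction), start=1):
        lines.append(f"{position}\t{residue}\t{float(score):.4f}")
    return "\n".join(lines)


# Placeholder scores standing in for a real prediction
print(format_prediction("THIS", [0.1, 0.25, 0.5, 0.9]))
```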
Gábor Erdős, Zsuzsanna Dosztányi
Nucleic Acids Research 2024; gkae385