Examples

Basics

The main aim of AIUPred is to identify Intrinsically Disordered Protein Regions (IDPRs, i.e. regions that lack a stable monomeric structure under native conditions) using a biophysics-based model enhanced with deep learning techniques. The user can input any protein sequence, and AIUPred returns a score between 0 and 1 for each residue, corresponding to the probability that the residue is part of a disordered region.
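As an illustration of how per-residue scores can be turned into regions, the sketch below groups consecutive residues whose score reaches a cutoff. The 0.5 cutoff, the `min_length` parameter and the function name are assumptions made for this example; they are not settings documented for AIUPred.

```python
from typing import List, Tuple


def disordered_regions(scores: List[float], threshold: float = 0.5,
                       min_length: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) spans where score >= threshold.

    The threshold and min_length defaults are illustrative assumptions.
    """
    regions = []
    start = None  # 1-based start of the region currently being extended
    for index, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = index
        elif start is not None:
            # The region ended at the previous residue
            if index - start >= min_length:
                regions.append((start, index - 1))
            start = None
    # Close a region that runs to the end of the sequence
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


example_scores = [0.1, 0.2, 0.7, 0.9, 0.8, 0.3, 0.6, 0.95]
print(disordered_regions(example_scores))  # [(3, 5), (7, 8)]
```

Raising `min_length` filters out short stretches, which may be useful when only longer disordered segments are of interest.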

The disordered nature of a protein segment can be context dependent: certain protein regions can switch between an ordered and a disordered state depending on various environmental factors. Currently, the AIUPred server is able to detect such context-dependent disorder in the case where the environmental factors are either a change in the redox state or the presence of an ordered binding partner (for more details see here).

The following sections outline the use of AIUPred in various scenarios.

Protein sequence input

There are two basic ways to input protein sequences into AIUPred:

  • I - If the protein is deposited in the UniProt database (either in SwissProt or TrEMBL), you can specify the accession code or the Entry of the protein in the "Enter UniProt accession or entry" field. The AIUPred server is always linked to the latest version of UniProt. The header of the UniProt entry will be
  • II (through advanced options) - Type or cut and paste your sequence in the "paste the amino acid sequence" field. The amino acid sequence must be in standard FASTA format or a plain sequence. Spaces and other non-standard characters within the pasted sequence are permitted, but they are removed and the remaining residues are treated as a single continuous chain.
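The clean-up of pasted input described above can be sketched roughly as follows. This is not the server's code: the function name is hypothetical, and treating only the 20 standard one-letter amino acid codes as valid is an assumption, since the exact character set accepted by the server is not stated here.

```python
# Assumption: only the 20 standard one-letter amino acid codes are kept
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw_input: str) -> str:
    """Turn pasted FASTA or plain text into one continuous residue string."""
    lines = raw_input.strip().splitlines()
    # Drop the FASTA header line if one is present
    if lines and lines[0].startswith(">"):
        lines = lines[1:]
    joined = "".join(lines).upper()
    # Remove spaces, digits and any other character outside the alphabet
    return "".join(ch for ch in joined if ch in STANDARD_AMINO_ACIDS)


pasted = ">sp|P1|TEST\nMKT AYI\nAKQ-R12"
print(clean_sequence(pasted))  # MKTAYIAKQR
```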

Prediction type

Two disorder prediction types are offered through the advanced options, each suited to slightly different applications: default smoothing and no smoothing.

Default smoothing:
Applies a Savitzky–Golay filter with a window size of 11 and a polynomial order of 5, which improves the performance of the method.

No smoothing:
Shows the raw output of the neural network without any smoothing.
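To give a feel for what the default smoothing does, here is a minimal pure-Python sketch of a Savitzky-Golay filter with an 11-point window and polynomial order 5, using the standard published convolution weights for that window. Leaving the first and last five positions unchanged is a simplification made for this example; how AIUPred treats sequence ends is not described here. In practice, `scipy.signal.savgol_filter(scores, window_length=11, polyorder=5)` is the usual library route.

```python
from typing import List

# Savitzky-Golay smoothing weights for an 11-point window and a
# quartic/quintic polynomial; the weights sum to 429.
SAVGOL_11_5 = [18, -45, -10, 60, 120, 143, 120, 60, -10, -45, 18]
NORMALISATION = 429
HALF_WINDOW = len(SAVGOL_11_5) // 2


def smooth_scores(scores: List[float]) -> List[float]:
    """Smooth interior positions; edge positions are copied unchanged.

    Copying the edges is a simplification for this sketch only.
    """
    smoothed = list(scores)
    for centre in range(HALF_WINDOW, len(scores) - HALF_WINDOW):
        window = scores[centre - HALF_WINDOW:centre + HALF_WINDOW + 1]
        weighted = sum(w * s for w, s in zip(SAVGOL_11_5, window))
        smoothed[centre] = weighted / NORMALISATION
    return smoothed


# A flat profile with a single spike: the spike is damped, not removed
spike = [0.5] * 21
spike[10] = 1.0
print(round(smooth_scores(spike)[10], 4))  # 0.6667
```

Because some weights are negative, smoothed values can fall slightly outside the 0 to 1 range for sharply varying input.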

Context-dependent predictions

IDPRs often harbor binding regions that are able to specifically interact with a globular domain. During this interaction, in the majority of known cases, the binding disordered region adopts an ordered structure in its bound form. This is probably the most commonly occurring form of context-dependent protein disorder, where the transition between the unstructured and the structured states is initiated by the presence of an appropriate protein partner. Such disordered binding regions are identified using the ANCHOR2 prediction algorithm. Like AIUPred, ANCHOR2 assigns each residue a score between 0 and 1, representing the probability that the residue is part of a disordered binding region. When ANCHOR2 is selected as a prediction option, the ANCHOR2 score is provided along with the AIUPred score.

Output

Basic features:
The primary output of AIUPred is a graph showing the disorder tendency of each residue in the given protein, where higher values correspond to a higher probability of disorder. The graph is scalable and can be directly downloaded for presentation/publication purposes. The list of position-specific disorder scores is also downloadable in simple text or JSON format.


Extended features:
If the prediction was run by specifying a UniProt ID/accession, the output of AIUPred also shows additional protein annotations, including Pfam regions; post-translational modifications (PTMs), including phosphorylations (upper line), methylations and acetylations (lower line) taken from PhosphoSitePlus; corresponding structures from the PDB; and regions that were experimentally verified to be disordered, taken from DisProt, DIBS, and MFIB.

If context-dependent predictions were selected, the output graph and the downloadable results incorporate additional data as well.
Regions overlapping with experimentally verified disordered regions are marked with a red background on the plot, and regions categorised as ordered are marked with a grey background. For disordered binding region prediction via ANCHOR2, the graph shows the probability of each residue being part of a binding region in blue. The AIUPred and ANCHOR2 score curves can be shown or hidden by clicking on the legend.

RESTful API

AIUPred can be accessed through a RESTful API to analyse proteins programmatically. The API accepts a standard GET request at

https://aiupred.elte.hu/rest_api
Available parameters are:

Parameter       Requirement   Values
accession       Required      UniProt accession
analysis_type   Optional      'binding' for ANCHOR2 or 'redox' for AIUPred-redox
smoothing       Optional      'default' (the default) or 'False'

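import requests

# Query parameters: UniProt accession plus optional analysis type and smoothing
params = {'accession': 'q32p44', 'smoothing': 'default', 'analysis_type': 'redox'}
url = 'https://aiupred.elte.hu/rest_api'

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()  # stop early on HTTP errors

# The response body is JSON; print each top-level key and its value
for key, val in response.json().items():
    print(key, val)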
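For scripts that should not depend on third-party packages, the same request can be sketched with the Python standard library alone. The function names below are hypothetical, and the validation simply mirrors the parameter values listed above; the structure of the returned JSON is not assumed beyond it being a JSON object.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

REST_URL = "https://aiupred.elte.hu/rest_api"
ANALYSIS_TYPES = {"binding", "redox"}    # values listed in the parameter table
SMOOTHING_VALUES = {"default", "False"}  # values listed in the parameter table


def build_query_url(accession: str, analysis_type: Optional[str] = None,
                    smoothing: Optional[str] = None) -> str:
    """Build the GET URL, rejecting values not listed in the documentation."""
    if not accession:
        raise ValueError("accession is required")
    params = {"accession": accession}
    if analysis_type is not None:
        if analysis_type not in ANALYSIS_TYPES:
            raise ValueError(f"unknown analysis_type: {analysis_type!r}")
        params["analysis_type"] = analysis_type
    if smoothing is not None:
        if smoothing not in SMOOTHING_VALUES:
            raise ValueError(f"unknown smoothing value: {smoothing!r}")
        params["smoothing"] = smoothing
    return f"{REST_URL}?{urlencode(params)}"


def fetch_prediction(accession: str, analysis_type: Optional[str] = None,
                     smoothing: Optional[str] = None, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (requires network access)."""
    url = build_query_url(accession, analysis_type, smoothing)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the URL is built here, so the example runs without a network connection
print(build_query_url("q32p44", analysis_type="binding"))
```

Calling `fetch_prediction("q32p44", analysis_type="binding")` would then perform the actual request.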

Programmatic usage

AIUPred is freely available for download to academic users. The package contains an importable Python library as well as an executable Python script. First, download and extract AIUPred. Then change the working directory to the extracted directory and install the dependencies. Using a virtual environment is strongly advised.

pip3 install -r requirements.txt
If you want to use the standalone executable just run
python3 aiupred.py
Available options:
usage: aiupred.py [-h] -i INPUT_FILE [-o OUTPUT_FILE] [-v] [-g GPU] [--force-cpu] [--no-smoothing] [--low-memory [LOW_MEMORY]]

options:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input_file INPUT_FILE
                        Input file in (multi) FASTA format
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Output file
  -v, --verbose         Increase output verbosity
  -g GPU, --gpu GPU     Index of GPU to use, default=0
  --force-cpu           Force the network to only utilize the CPU. Calculation will be very slow, not recommended
  --no-smoothing        Removes the default SavGol smoothing function
  --low-memory [LOW_MEMORY]
                        Use chunking to lower the memory usage. Default chunk size is 1000. The lower the chunk size the lower the memory consumption well as the accuracy
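When the standalone script is driven from another Python program, for example in a batch pipeline, the command line can be assembled from the options listed above. The sketch below is only an illustration: the helper names and the file names `proteins.fasta` and `scores.txt` are hypothetical, and only flags shown in the help text are used.

```python
import subprocess
from typing import List, Optional


def build_aiupred_command(input_file: str, output_file: Optional[str] = None,
                          smoothing: bool = True, force_cpu: bool = False,
                          gpu: Optional[int] = None) -> List[str]:
    """Assemble the argument list for aiupred.py from the documented flags."""
    command = ["python3", "aiupred.py", "-i", input_file]
    if output_file is not None:
        command += ["-o", output_file]
    if gpu is not None:
        command += ["-g", str(gpu)]
    if force_cpu:
        command.append("--force-cpu")
    if not smoothing:
        command.append("--no-smoothing")
    return command


def run_aiupred(command: List[str]) -> None:
    """Run the command; call this from the extracted AIUPred directory."""
    subprocess.run(command, check=True)


# Only the command is printed here; nothing is executed
print(" ".join(build_aiupred_command("proteins.fasta", "scores.txt", smoothing=False)))
```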
The following section gives some tips on how to use the importable library.

Add the location of the extracted directory to your PYTHONPATH environment variable (assuming a standard bash shell):
export PYTHONPATH="${PYTHONPATH}:/path/to/aiupred/folder"
After reloading the shell, AIUPred will be importable in your Python scripts.
import aiupred_lib
# Load the models and let AIUPred find if a GPU is available.
embedding_model, regression_model, device = aiupred_lib.init_models()
# Predict disorder of a sequence
sequence = 'THISISATESTSEQENCE'
prediction = aiupred_lib.predict_disorder(sequence, embedding_model, regression_model, device)
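The returned prediction can then be paired with the residues for inspection or export. The helper below is a hypothetical sketch that assumes `predict_disorder` returns one numeric score per residue, in sequence order; the exact return type is not documented in this section. Stand-in scores are used so that the example runs without the library installed.

```python
from typing import Iterable, List


def format_scores(sequence: str, scores: Iterable[float]) -> List[str]:
    """Return tab-separated 'position, residue, score' lines (1-based positions)."""
    scores = list(scores)
    if len(scores) != len(sequence):
        raise ValueError("expected exactly one score per residue")
    return [
        f"{position}\t{residue}\t{score:.4f}"
        for position, (residue, score) in enumerate(zip(sequence, scores), start=1)
    ]


# Stand-in values; with the library this would be format_scores(sequence, prediction)
demo_lines = format_scores("MKT", [0.91, 0.5, 0.123456])
print("\n".join(demo_lines))
```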

Primary citation
AIUPred: combining energy estimation with deep learning for the enhanced prediction of protein disorder
Gábor Erdős, Zsuzsanna Dosztányi
Nucleic Acids Research 2024; gkae385
v1.2.2