# AlphaPulldown
AlphaPulldown is a customized implementation of AlphaFold-Multimer designed for customizable high-throughput screening of protein-protein interactions.
## Policy
AlphaPulldown is freely available to users at HPC2N under the GNU General Public License v3.0.
Citations
If you use this software, please cite it along with the dependencies it uses internally; at a minimum, cite AlphaPulldown itself:
Molodenskiy, D. et al. (2025) AlphaPulldown2: a general pipeline for high-throughput structural modeling. Bioinformatics 41(3), btaf115.
## Overview
From the AlphaPulldown repository:
“AlphaPulldown is a customized implementation of AlphaFold-Multimer designed for customizable high-throughput screening of protein-protein interactions. It extends AlphaFold’s capabilities by incorporating additional run options, such as customizable multimeric structural templates (TrueMultimer), MMseqs2 multiple sequence alignment (MSA) via ColabFold databases, protein fragment predictions, and the ability to incorporate mass spec data as an input using AlphaLink2.
AlphaPulldown can be used in two ways: either by a two-step pipeline made of python scripts, or by a Snakemake pipeline as a whole. For details on using the Snakemake pipeline, please refer to the separate GitHub repository.
To enable faster usage and avoid redundant feature recalculations, we have developed a public database containing precomputed features for all major model organisms, available for download. You can check the full list and download individual features at https://alphapulldown.s3.embl.de/index.html or https://s3.embl.de/alphapulldown/index.html.”

For more details, see the official AlphaPulldown documentation linked at the bottom of this page.
## Usage at HPC2N
On HPC2N we have AlphaPulldown available as a module.
### Loading
To use the AlphaPulldown module, add it to your environment. You can find the available versions with:
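```bash
module spider AlphaPulldown
```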
and then see how to load a specific version, including its prerequisites, with:
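```bash
module spider AlphaPulldown/2.0.0-CUDA-12.1.1
```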
### Example
The AlphaPulldown workflow is divided into two steps: a first step that computes the multiple sequence alignments (MSAs) and runs purely on CPUs, and a second step that predicts the structures and runs on GPUs.
Use your project directory (for instance `/proj/nobackup/folder-for-alphapulldown`) and not your `HOME` folder for running your simulations, because the latter has a limited size (25 GB). In your folder for AlphaPulldown, place the FASTA file `my-fasta-file.fasta` (in a real case, this is where all FASTA files can be located) containing the lines:
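As an illustration only, a minimal FASTA file with two hypothetical entries (`protein_A` and `protein_B` are placeholder names and sequences, not part of the original example) could look like:

```
>protein_A
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQ
>protein_B
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDG
```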
This script (`job-first.sh`) is used for the first MSA step (see the comments regarding MMseqs2):
```bash
#!/bin/bash
#SBATCH -A Project_ID      # Your project ID
#SBATCH -J af-pd           # Job name in the queue
#SBATCH -t 05:00:00        # Wall time
#SBATCH -c 8               # Number of CPU cores

ml purge > /dev/null 2>&1  # Purge the module environment
module load GCC/12.3.0 OpenMPI/4.1.5 AlphaPulldown/2.0.0-CUDA-12.1.1

# The database is located here: /pfs/data/databases/AlphaFold/20240325
# First phase (CPU):
#create_individual_features.py --fasta_paths=my-fasta-file.fasta --data_dir=/pfs/data/databases/AlphaFold/20240325 --max_template_date=2024-11-20 --skip_existing=True --seq_index=$SLURM_ARRAY_TASK_ID --output_dir=$PWD

# Using MMseqs2: this option is much faster than the one above and, as far as we know,
# is the recommended one (double-check with a trial simulation)
create_individual_features.py --fasta_paths=my-fasta-file.fasta --data_dir=/pfs/data/databases/AlphaFold/20240325 --max_template_date=2024-11-20 --skip_existing=True --use_mmseqs2=True --seq_index=$SLURM_ARRAY_TASK_ID --output_dir=$PWD
```
To submit the jobs to the SLURM queue, execute on the terminal:
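Assuming one array task per sequence in the FASTA file (the array index feeds `--seq_index`), a submission for the two example sequences would be:

```bash
sbatch --array=1-2 job-first.sh
```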
This should submit only two jobs, as the example FASTA file contains only two sequences, but one can have many more.
For the second prediction step, one can use a different script (`job-second.sh`), as this part can take advantage of the GPUs:
```bash
#!/bin/bash
#SBATCH -A Project_ID      # Your project ID
#SBATCH -J af-pd           # Job name in the queue
#SBATCH -t 05:00:00        # Wall time
#SBATCH -C nvidia_gpu      # Select any NVIDIA GPU
#SBATCH --gpus=1           # Select one card only

ml purge > /dev/null 2>&1  # Purge the module environment
module load GCC/12.3.0 OpenMPI/4.1.5 AlphaPulldown/2.0.0-CUDA-12.1.1

# The database is located here: /pfs/data/databases/AlphaFold/20240325
# Second phase (GPUs)
run_multimer_jobs.py --mode=custom --monomer_objects_dir=$PWD --data_dir=/pfs/data/databases/AlphaFold/20240325 --protein_lists=protein_list.txt --output_path=$PWD --num_cycle=3 --num_predictions_per_model=1 --job_index=$SLURM_ARRAY_TASK_ID
```
One can create a file called, for instance, `protein_list.txt`, which contains all the proteins considered in the first step:
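A minimal sketch, reusing the hypothetical sequence names from the FASTA example above: in `custom` mode, each line of this file defines one prediction job, with the members of a complex separated by semicolons, so two lines produce two jobs:

```
protein_A;protein_B
protein_A;protein_A
```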
On the terminal execute:
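As before, this assumes one array task per line of `protein_list.txt` (the array index feeds `--job_index`):

```bash
sbatch --array=1-2 job-second.sh
```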
This will submit only two jobs, as there are only two lines in `protein_list.txt`.
Note
- In this example the variable `$PWD` was used to indicate that the working directory is the one where the submit files (and the files for AlphaPulldown) are located. If you change this variable to any other path, you will need to change it in a consistent manner in both batch scripts.
- This example is a basic adaptation of the documentation page of AlphaPulldown. For more realistic cases, we refer you to that official documentation, in the link provided at the bottom.
- The lines for the commands `create_individual_features.py` and `run_multimer_jobs.py` should be single continuous lines.
- You can monitor the resources by using the `job-usage` tool available on the Kebnekaise terminal.
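For instance, assuming `12345678` stands in for the ID of a running or finished job (as reported by `sbatch` or `squeue`), a typical invocation would be:

```bash
job-usage 12345678   # replace 12345678 with your actual SLURM job ID
```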
## Additional info
More information can be found in the official AlphaPulldown documentation and repository: https://github.com/KosinskiLab/AlphaPulldown