AlphaPulldown¶
AlphaPulldown is a customized implementation of AlphaFold-Multimer designed for customizable high-throughput screening of protein-protein interactions.
Policy¶
AlphaPulldown is freely available to users at HPC2N, under GNU General Public License v3.0.
Citations
If you use this software, please cite it along with the dependencies it uses internally; at a minimum, cite AlphaPulldown itself:
Molodenskiy, D. et al. (2025). AlphaPulldown2—a general pipeline for high-throughput structural modeling. Bioinformatics 41(3), btaf115.
Overview¶
From the AlphaPulldown repository:
“AlphaPulldown is a customized implementation of AlphaFold-Multimer designed for customizable high-throughput screening of protein-protein interactions. It extends AlphaFold’s capabilities by incorporating additional run options, such as customizable multimeric structural templates (TrueMultimer), MMseqs2 multiple sequence alignment (MSA) via ColabFold databases, protein fragment predictions, and the ability to incorporate mass spec data as an input using AlphaLink2.
AlphaPulldown can be used in two ways: either by a two-step pipeline made of python scripts, or by a Snakemake pipeline as a whole. For details on using the Snakemake pipeline, please refer to the separate GitHub repository.
To enable faster usage and avoid redundant feature recalculations, we have developed a public database containing precomputed features for all major model organisms, available for download. You can check the full list and download individual features at https://alphapulldown.s3.embl.de/index.html or https://s3.embl.de/alphapulldown/index.html.”
Usage at HPC2N¶
On HPC2N we have AlphaPulldown available as a module.
Loading¶
To use the AlphaPulldown module, add it to your environment. You can find versions with
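```bash
# List the available AlphaPulldown versions
module spider AlphaPulldown
```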
and you can then find how to load a specific version (including prerequisites), with
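```bash
# Show how to load a specific version, including its prerequisites
# (the version shown here is the one used in the examples below)
module spider AlphaPulldown/2.0.0-CUDA-12.1.1
```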
Example¶
The workflow in AlphaPulldown is divided into two steps: a first step that performs the alignments and runs purely on CPUs, and a second step that predicts the structures and runs on GPUs.
Use your project directory (for instance `/proj/nobackup/folder-for-alphapulldown`) and not your `HOME` folder for running your simulations, because the latter has a limited size (25 GB). In your AlphaPulldown folder, place the FASTA file `my-fasta-file.fasta` (in a real case, this is where all your FASTA files can be located) containing the lines:
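```
>proteinA
MDSKGSSQKGSRLLLLLVVSNLLLCQGVVS...
>proteinB
MKTAYIAKQRQISFVKSHFSRQLEERLGLI...
```

(The names `proteinA` and `proteinB` and the truncated sequences above are only placeholders; use your own identifiers and full sequences.)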
This script (`job-first.sh`) is used for the first MSA step (see the comments regarding mmseqs2):
```bash
#!/bin/bash
#SBATCH -A Project_ID        # Your project ID
#SBATCH -J af-pd             # Job name in the queue
#SBATCH -t 05:00:00          # Wall time
#SBATCH -c 8                 # Number of CPU cores

ml purge > /dev/null 2>&1    # Purge the module environment
module load GCC/12.3.0 OpenMPI/4.1.5 AlphaPulldown/2.0.0-CUDA-12.1.1

# The database is located here: /pfs/data/databases/AlphaFold/20240325
# First phase: CPU
#create_individual_features.py --fasta_paths=my-fasta-file.fasta --data_dir=/pfs/data/databases/AlphaFold/20240325 --max_template_date=2024-11-20 --skip_existing=True --seq_index=$SLURM_ARRAY_TASK_ID --output_dir=$PWD

# Using mmseqs2: this option is much faster than the one above
create_individual_features.py --fasta_paths=my-fasta-file.fasta --data_dir=/pfs/data/databases/AlphaFold/20240325 --max_template_date=2024-11-20 --skip_existing=True --use_mmseqs2=True --seq_index=$SLURM_ARRAY_TASK_ID --output_dir=$PWD
```
To submit the jobs to the SLURM queue, execute on the terminal:
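```bash
# Submit as a job array with one task per sequence in the FASTA file
# (here 1-2, since the example file contains two sequences); each task
# picks its sequence through --seq_index=$SLURM_ARRAY_TASK_ID
sbatch --array=1-2 job-first.sh
```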
This should submit only two jobs, as the example FASTA file contains only two sequences, but one can have many.
For the second prediction step, one can use a different script (`job-second.sh`), as this part can take advantage of the GPUs:
```bash
#!/bin/bash
#SBATCH -A Project_ID        # Your project ID
#SBATCH -J af-pd             # Job name in the queue
#SBATCH -t 05:00:00          # Wall time
#SBATCH -C nvidia_gpu        # Select any NVIDIA GPU
#SBATCH --gpus=1             # Select one card only

ml purge > /dev/null 2>&1    # Purge the module environment
module load GCC/12.3.0 OpenMPI/4.1.5 AlphaPulldown/2.0.0-CUDA-12.1.1

# The database is located here: /pfs/data/databases/AlphaFold/20240325
# Second phase: GPUs
run_multimer_jobs.py --mode=custom --monomer_objects_dir=$PWD --data_dir=/pfs/data/databases/AlphaFold/20240325 --protein_lists=protein_list.txt --output_path=$PWD --num_cycle=3 --num_predictions_per_model=1 --job_index=$SLURM_ARRAY_TASK_ID
```
One can create a file called, for instance, `protein_list.txt`, which contains all the proteins considered in the first step. In `custom` mode, each line defines one prediction job, and proteins separated by `;` on the same line are modeled together as a complex. Using the placeholder names from the FASTA example above, it could look like:
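```
proteinA;proteinB
proteinA;proteinA
```

(Here the first line models the A-B complex and the second a homodimer of `proteinA`; adapt the lines to the complexes you want to screen.)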
On the terminal execute:
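```bash
# Submit one array task per line in protein_list.txt (here 1-2);
# each task picks its line through --job_index=$SLURM_ARRAY_TASK_ID
sbatch --array=1-2 job-second.sh
```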
This will submit only two jobs, as there are only two lines in `protein_list.txt`.
Note
- In this example the variable `$PWD` was used to indicate that the working directory is the one where the submit files (and files for AlphaPulldown) are located. If you change this variable to any other path, you will need to change it consistently in both batch scripts.
- This example is a basic adaptation of the AlphaPulldown documentation. For more realistic cases, we refer you to the official documentation, linked at the bottom.
- The command lines for `create_individual_features.py` and `run_multimer_jobs.py` should each be written as a single continuous line.
- You can monitor resource usage with the `job-usage` tool available in the Kebnekaise terminal.
Installation of Downstream analysis tools¶
Preferably execute the following steps on a login node accessed through SSH or ThinLinc. If you access a compute node instead (through Open OnDemand), you will need to request several cores (for instance, 6):
```bash
# Use your project for caching data instead of the default at $HOME/.apptainer
export APPTAINER_CACHEDIR=/proj/nobackup/your-project-folder/.apptainer

# In your project folder, create another folder (for instance, called CCP4):
cd /proj/nobackup/your-project-folder/
mkdir CCP4 && cd CCP4

# Pull the container
apptainer pull docker://kosinskilab/fold_analysis:latest

# Create a folder for the installation (called, for instance, install)
# and build a writable sandbox from the pulled image into it:
mkdir install
apptainer build --sandbox /proj/nobackup/your-project-folder/CCP4/install fold_analysis_latest.sif
```
Download the CCP4 program according to the instructions and continue with the next steps as given in the Kosinski repository.
In that repository, `conda` is used for installing dependencies, but this is not recommended at HPC2N. The following modules satisfy the requirements that might be needed, depending on the analysis you are targeting:
```bash
ml GCC/12.3.0 OpenMPI/4.1.5 AlphaPulldown/2.0.0-CUDA-12.1.1
ml OpenMM/8.0.0-CUDA-12.1.1 Kalign/3.4.0 HH-suite/3.3.0 HMMER/3.4 jax/0.4.25-CUDA-12.1.1 PyTorch/2.1.2-CUDA-12.1.1
```
Note
Because AlphaPulldown is able to separate the alignment (performed on CPUs) and the prediction (suitable for GPUs) steps, it can be a good alternative to the installed AlphaFold versions where both steps need to be done in the same batch script with the same resources.
AlphaPulldown uses AlphaFold 2.3.2 as a backend to compute monomers and multimers.
Additional info¶
More information can be found on the AlphaPulldown GitHub repository: https://github.com/KosinskiLab/AlphaPulldown