AlphaPulldown

AlphaPulldown is a customized implementation of AlphaFold-Multimer designed for customizable high-throughput screening of protein-protein interactions.

Policy

AlphaPulldown is freely available to users at HPC2N, under GNU General Public License v3.0.

Citations

If you use this software, please cite it along with the dependencies it uses internally; at a minimum, cite AlphaPulldown itself:

Molodenskiy, D. et al. (2025) AlphaPulldown2—a general pipeline for high-throughput structural modeling. Bioinformatics 41(3), btaf115.

Overview

From the AlphaPulldown repository:

“AlphaPulldown is a customized implementation of AlphaFold-Multimer designed for customizable high-throughput screening of protein-protein interactions. It extends AlphaFold’s capabilities by incorporating additional run options, such as customizable multimeric structural templates (TrueMultimer), MMseqs2 multiple sequence alignment (MSA) via ColabFold databases, protein fragment predictions, and the ability to incorporate mass spec data as an input using AlphaLink2.

AlphaPulldown can be used in two ways: either by a two-step pipeline made of python scripts, or by a Snakemake pipeline as a whole. For details on using the Snakemake pipeline, please refer to the separate GitHub repository.

To enable faster usage and avoid redundant feature recalculations, we have developed a public database containing precomputed features for all major model organisms, available for download. You can check the full list and download individual features at https://alphapulldown.s3.embl.de/index.html or https://s3.embl.de/alphapulldown/index.html.

For more details, see the AlphaPulldown GitHub repository.”

Usage at HPC2N

On HPC2N we have AlphaPulldown available as a module.

Loading

To use the AlphaPulldown module, add it to your environment. You can find versions with

module spider AlphaPulldown

and you can then see how to load a specific version (including prerequisites) with

module spider AlphaPulldown/<VERSION>
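For instance, assuming the version used in the batch scripts below is available, the load line would be:

module load GCC/12.3.0 OpenMPI/4.1.5 AlphaPulldown/2.0.0-CUDA-12.1.1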

Example

The workflow in AlphaPulldown is divided into two steps: a first step that computes the multiple sequence alignments (MSAs) and runs purely on CPUs, and a second step that predicts the structures and runs on GPUs.

Use your project directory (for instance /proj/nobackup/folder-for-alphapulldown) and not your HOME folder for running your simulations, because the latter has a limited size (25GB). In your AlphaPulldown folder, place the fasta file my-fasta-file.fasta (in a real case, this is where all your fasta files would be located) containing the lines:

cat my-fasta-file.fasta
>proteinA
MHHHHHHPEIV
>proteinB
MHHHHHHPEIV
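A minimal way to set up the working directory and create this file (using the illustrative project path from above):

mkdir -p /proj/nobackup/folder-for-alphapulldown
cd /proj/nobackup/folder-for-alphapulldown
cat > my-fasta-file.fasta << 'EOF'
>proteinA
MHHHHHHPEIV
>proteinB
MHHHHHHPEIV
EOF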

This script (job-first.sh) is used for the first MSA step (see the comments regarding mmseqs2):

#!/bin/bash
#SBATCH -A Project_ID # Your project ID
#SBATCH -J af-pd # Job name in the queue
#SBATCH -t 05:00:00 # Wall time
#SBATCH -c 8 # Number of cores

ml purge > /dev/null 2>&1 # Purge the module environment
module load GCC/12.3.0 OpenMPI/4.1.5 AlphaPulldown/2.0.0-CUDA-12.1.1
# The database is located here: /pfs/data/databases/AlphaFold/20240325

# First phase CPU
#create_individual_features.py --fasta_paths=my-fasta-file.fasta --data_dir=/pfs/data/databases/AlphaFold/20240325 --max_template_date=2024-11-20 --skip_existing=True --seq_index=$SLURM_ARRAY_TASK_ID --output_dir=$PWD

# using mmseqs2: this option is much faster than the one above and, as far as we know, gives results of comparable quality
create_individual_features.py --fasta_paths=my-fasta-file.fasta --data_dir=/pfs/data/databases/AlphaFold/20240325 --max_template_date=2024-11-20 --skip_existing=True --use_mmseqs2=True --seq_index=$SLURM_ARRAY_TASK_ID --output_dir=$PWD

To submit the jobs to the SLURM queue, execute on the terminal:

count=`grep ">" my-fasta-file.fasta | wc -l`; echo $count
sbatch --array=1-$count job-first.sh

This should submit only two jobs, since the example file contains only two sequences, but one can have many more.
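You can follow the array jobs in the queue and, once they have finished, check the feature files produced by the first step (one pickle file per sequence, assuming the default naming used by create_individual_features.py):

squeue -u $USER   # monitor your jobs in the queue
ls *.pkl          # expect proteinA.pkl and proteinB.pkl in the output directory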

For the Second prediction step, one can use a different script (job-second.sh), as this part can take advantage of the GPUs:

#!/bin/bash
#SBATCH -A Project_ID # Your project ID
#SBATCH -J af-pd # Job name in the queue
#SBATCH -t 05:00:00 # Wall time
#SBATCH -C nvidia_gpu # select any NVIDIA GPU
#SBATCH --gpus=1 # select one card only

ml purge > /dev/null 2>&1 # Purge the module environment
module load GCC/12.3.0 OpenMPI/4.1.5 AlphaPulldown/2.0.0-CUDA-12.1.1
# The database is located here: /pfs/data/databases/AlphaFold/20240325

# Second phase GPUs
run_multimer_jobs.py --mode=custom --monomer_objects_dir=$PWD --data_dir=/pfs/data/databases/AlphaFold/20240325 --protein_lists=protein_list.txt --output_path=$PWD --num_cycle=3 --num_predictions_per_model=1 --job_index=$SLURM_ARRAY_TASK_ID

Before submitting, create a file called, for instance, protein_list.txt, which contains all the proteins considered in the first step:

cat protein_list.txt
proteinA
proteinB
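Note that with --mode=custom each line of this file defines one prediction job, so the list above models proteinA and proteinB individually. According to the AlphaPulldown documentation, proteins can be combined on a single line with a semicolon to model them as a complex; a hypothetical list for an A-B dimer would be:

cat protein_list.txt
proteinA;proteinB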

On the terminal execute:

count=`grep -c "" protein_list.txt`; echo $count
sbatch --array=1-$count job-second.sh

This will submit only two jobs as there are only two lines in protein_list.txt.
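Once the predictions finish, each job should have written its models into a subfolder of --output_path named after the protein (or complex). The file names follow the usual AlphaFold output convention, for instance something like:

ls proteinA/
# ranked_0.pdb ... ranking_debug.json ... timings.json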

Note

  1. In this example the variable $PWD was used to indicate that the working directory will be the one where the submit files (and files for AlphaPulldown) are located. If you change this variable to any other path, you will need to change it in a consistent manner for both batch scripts.
  2. This example is a basic adaptation of the documentation page of AlphaPulldown. For more realistic cases, we refer you to that official documentation, in the link provided at the bottom.
  3. The commands create_individual_features.py and run_multimer_jobs.py, together with all their options, must each be written on a single continuous line.
  4. You can monitor resource usage with the job-usage tool available on Kebnekaise's terminal.

Installation of downstream analysis tools

Preferably, execute the following steps on a login node accessed through ssh or ThinLinc. If you use a compute node (through Open OnDemand), you will need to request several cores (6, for instance):

# Use your project for caching data instead of the default at $HOME/.apptainer
export APPTAINER_CACHEDIR=/proj/nobackup/your-project-folder/.apptainer

# In your project folder, create another folder (for instance, called CCP4):
cd /proj/nobackup/your-project-folder/
mkdir CCP4 && cd CCP4 

# Pull the container
apptainer pull docker://kosinskilab/fold_analysis:latest

# Create a folder for the installation (called, for instance, install):

mkdir install
apptainer build --sandbox /proj/nobackup/your-project-folder/CCP4/install fold_analysis_latest.sif
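You can then enter the writable sandbox to continue the installation inside it, for example:

apptainer shell --writable /proj/nobackup/your-project-folder/CCP4/install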

Download the CCP4 program according to the instructions and continue with the next steps as given in the Kosinski repository.

In that repository, conda is used to install the dependencies, but this is not recommended at HPC2N. The following modules satisfy the requirements, depending on the analysis you are targeting:

ml GCC/12.3.0  OpenMPI/4.1.5 AlphaPulldown/2.0.0-CUDA-12.1.1
ml OpenMM/8.0.0-CUDA-12.1.1  Kalign/3.4.0  HH-suite/3.3.0 HMMER/3.4 jax/0.4.25-CUDA-12.1.1 PyTorch/2.1.2-CUDA-12.1.1

Note

Because AlphaPulldown separates the alignment step (performed on CPUs) from the prediction step (suited to GPUs), it can be a good alternative to the installed AlphaFold versions, where both steps must run in the same batch script with the same resources.

AlphaPulldown uses AlphaFold 2.3.2 as a backend to compute monomers and multimers.

Additional info

More information can be found on the AlphaPulldown GitHub repository: https://github.com/KosinskiLab/AlphaPulldown