====== OmegaFold ======
OmegaFold is another class of structure prediction tool, based on a Protein Language Model (PLM). It doesn't require any multiple sequence alignment and uses solely the sequence of the protein of interest.
**For now, it does not support multimer predictions.**
OmegaFold's main limitation is GPU memory, as the predictions require a lot of it (see below).
OmegaFold is //really fast//: seconds for small sequences (up to ~100 residues) and minutes for bigger ones (5-10 minutes for an 800-residue protein).
===== Version =====
It uses version 1.1.0, available from the GitHub repository: https://github.com/HeliXonProtein/OmegaFold
===== Resources =====
To know more about OmegaFold, I highly recommend reading:
* the preprint : https://www.biorxiv.org/content/10.1101/2022.07.21.500999v1
* the GitHub repo: https://github.com/HeliXonProtein/OmegaFold
* the available notebook: https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/omegafold.ipynb
===== Installation =====
The installation follows the same process as AlphaFold.\\
**It's available on nodes node061, node062, node063 and node081**
The installation only requires a single Python package. A conda environment named **omegafold** was created for this purpose.
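For a quick interactive check that the environment works, you can load the same modules used by the submission script further down this page and activate the environment (a minimal sketch, assuming interactive access to one of the nodes listed above):
<code bash>
module load gcc/8.3.0
module load miniconda-py3/latest
conda activate omegafold
omegafold -h   # should print the help output shown in the "Running" section
</code>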
===== Utilization =====
**Use the same queues as alphafold: ''alphafold'' or ''alphafold2'' **
See here: http://www-lbt.ibpc.fr/wiki/doku.php?id=cluster-lbt:extra-tools:alphafold_tool#queues
Since the main limitation of OmegaFold is GPU memory, you should **always** use half of a node for the predictions.
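For reference, a half-node request corresponds to the following PBS directives (taken from the submission script further down, which uses the ''alphafold_hn'' queue; adapt the queue, walltime and project to your needs):
<code bash>
#PBS -l nodes=1:ppn=8      # half node
#PBS -l walltime=24:00:00
#PBS -A simlab_project
#PBS -q alphafold_hn
</code>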
==== Input file ====
OmegaFold supports only a FASTA file as input.
For several predictions, you can provide a multi-FASTA file; the sequences will be treated as a batch (one after another), as in the example below.
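For example, a multi-FASTA file with two (purely illustrative) sequences produces one PDB file per entry, named after its FASTA identifier:
<code>
>protein_A
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKR
>protein_B
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
</code>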
==== GPU Memory ====
OmegaFold uses a lot of GPU memory for the predictions:
* ~500 MB for a 70-residue protein
* ~27 GB for an 800-residue protein
The GPUs installed in the node06X nodes have ~10 GB of memory and the ones in node081 have ~48 GB.
However, OmegaFold has an option called ''--subbatch_size'' to decrease the memory used (and thus increase the prediction time). Here is the explanation taken from the [[https://github.com/HeliXonProtein/OmegaFold#setting-subbatch|GitHub]], followed by an illustrative command:
> Subbatch makes a trade-off between time and space. One can greatly reduce the space requirements by setting --subbatch_size very low. The default is the number of residues in the sequence and the lowest possible number is 1. For now we do not have a rule of thumb for setting the --subbatch_size, but we suggest half the value if you run into GPU memory limitations.
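For example, for an ~800-residue sequence on a ~10 GB card, you could halve the subbatch size from its default (the sequence length) and halve it again if the job still runs out of memory. The value below is only an illustrative starting point:
<code bash>
# default subbatch size = sequence length (~800 here); halve it until the job fits in GPU memory
omegafold --subbatch_size 400 query.fasta outputdir/
</code>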
==== Running ====
The first time you use OmegaFold, it will download a weights file (''model.pt'') and copy it into the ''~/.cache/omegafold_ckpt'' directory.
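You can check that the weights are in place with the following (a sketch; the exact file size may vary between versions):
<code bash>
ls -lh ~/.cache/omegafold_ckpt/
# should list model.pt
</code>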
A command called ''omegafold'' is available; its help output is shown below:
<code>
(omegafold) [santuz@node061 simple_dimere]$ omegafold -h
usage: omegafold [-h] [--num_cycle NUM_CYCLE] [--subbatch_size SUBBATCH_SIZE] [--device DEVICE] [--weights_file WEIGHTS_FILE] [--weights WEIGHTS]
                 [--pseudo_msa_mask_rate PSEUDO_MSA_MASK_RATE] [--num_pseudo_msa NUM_PSEUDO_MSA] [--allow_tf32 ALLOW_TF32]
                 input_file output_dir

Launch OmegaFold and perform inference on the data. Some examples (both the input and output files) are included in the Examples folder, where each
folder contains the output of each available model from model1 to model3. All of the results are obtained by issuing the general command with only model
number chosen (1-3).

positional arguments:
  input_file            The input fasta file
  output_dir            The output directory to write the output pdb files. If the directory does not exist, we just create it. The output file name
                        follows its unique identifier in the rows of the input fasta file"

optional arguments:
  -h, --help            show this help message and exit
  --num_cycle NUM_CYCLE
                        The number of cycles for optimization, default to 10
  --subbatch_size SUBBATCH_SIZE
                        The subbatching number, the smaller, the slower, the less GRAM requirements. Default is the entire length of the sequence. This
                        one takes priority over the automatically determined one for the sequences
  --device DEVICE       The device on which the model will be running, default to the accelerator that we can find
  --weights_file WEIGHTS_FILE
                        The model cache to run
  --weights WEIGHTS     The url to the weights of the model
  --pseudo_msa_mask_rate PSEUDO_MSA_MASK_RATE
                        The masking rate for generating pseudo MSAs
  --num_pseudo_msa NUM_PSEUDO_MSA
                        The number of pseudo MSAs
  --allow_tf32 ALLOW_TF32
                        if allow tf32 for speed if available, default to True
</code>
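In its simplest form (the file names below are placeholders), a run looks like this:
<code bash>
omegafold query.fasta outputdir/
# writes one PDB file per sequence in query.fasta, named after its FASTA identifier
</code>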
==== Submission script ====
You can find below an example of a submission script to perform OmegaFold computations.
**Script version 18/11/2022**
<code bash>
#!/bin/bash
#PBS -S /bin/bash
#PBS -N omegafold
#PBS -o $PBS_JOBID.out
#PBS -e $PBS_JOBID.err
#Half node always
#PBS -l nodes=1:ppn=8
#PBS -l walltime=24:00:00
#PBS -A simlab_project
#PBS -q alphafold_hn
#script version 18.11.2022
### FOR EVERYTHING BELOW, I ADVISE YOU TO MODIFY THE USER-part ONLY ###
WORKDIR="/"
NUM_NODES=$(cat $PBS_NODEFILE|uniq|wc -l)
if [ ! -n "$PBS_O_HOME" ] || [ ! -n "$PBS_JOBID" ]; then
echo "At least one variable is needed but not defined. Please touch your manager about."
exit 1
else
if [ $NUM_NODES -le 1 ]; then
WORKDIR+="scratch/"
export WORKDIR+=$(echo $PBS_O_HOME |sed 's#.*/\(home\|workdir\)/\(.*_team\)*.*#\2#g')"/$PBS_JOBID/"
mkdir $WORKDIR
rsync -ap $PBS_O_WORKDIR/ $WORKDIR/
# if you need to check your job output during execution (example: each hour) you can uncomment the following line
# /shared/scripts/ADMIN__auto-rsync.example 3600 &
else
export WORKDIR=$PBS_O_WORKDIR
fi
fi
echo "your current dir is: $PBS_O_WORKDIR"
echo "your workdir is: $WORKDIR"
echo "number of nodes: $NUM_NODES"
echo "number of cores: "$(cat $PBS_NODEFILE|wc -l)
echo "your execution environment: "$(cat $PBS_NODEFILE|uniq|while read line; do printf "%s" "$line "; done)
cd $WORKDIR
# If you're using only one node, it's counterproductive to use IB network for your MPI process communications
if [ $NUM_NODES -eq 1 ]; then
export PSM_DEVICES=self,shm
export OMPI_MCA_mtl=^psm
export OMPI_MCA_btl=shm,self
else
# Since we are using a single IB card per node which can initiate only up to a maximum of 16 PSM contexts
# we have to share PSM contexts between processes
# CIN is here the number of cores in node
CIN=$(cat /proc/cpuinfo | grep -i processor | wc -l)
if [ $(($CIN/16)) -ge 2 ]; then
PPN=$(grep $HOSTNAME $PBS_NODEFILE|wc -l)
if [ $CIN -eq 40 ]; then
export PSM_SHAREDCONTEXTS_MAX=$(($PPN/4))
elif [ $CIN -eq 32 ]; then
export PSM_SHAREDCONTEXTS_MAX=$(($PPN/2))
else
echo "This computing node is not supported by this script"
fi
echo "PSM_SHAREDCONTEXTS_MAX defined to $PSM_SHAREDCONTEXTS_MAX"
else
echo "no PSM_SHAREDCONTEXTS_MAX to define"
fi
fi
function get_gpu-ids() {
if [ $PBS_NUM_PPN -eq $(cat /proc/cpuinfo | grep -cE "^processor.*:") ]; then
echo "0,1" && return
fi
if [ -e /dev/cpuset/torque/$PBS_JOBID/cpus ]; then
FILE="/dev/cpuset/torque/$PBS_JOBID/cpus"
elif [ -e /dev/cpuset/torque/$PBS_JOBID/cpuset.cpus ]; then
FILE="/dev/cpuset/torque/$PBS_JOBID/cpuset.cpus"
else
FILE=""
fi
if [ -e $FILE ]; then
if [ $(cat $FILE | sed -r 's/^([0-9]).*$/\1/') -eq 0 ]; then
echo "0" && return
else
echo "1" && return
fi
else
echo "0,1" && return
fi
}
gpus=$(get_gpu-ids)
## USER Part
module load gcc/8.3.0
module load miniconda-py3/latest
conda activate omegafold
#Run
cd $WORKDIR/
d1=`date +%s`
echo $(date)
# restrict the run to the GPU(s) selected by get_gpu-ids above (assumption: the job should only see the GPUs of its half node)
export CUDA_VISIBLE_DEVICES=$gpus
omegafold query.fasta outputdir/
d2=$(date +%s)
echo $(date)
diff=$((($d2 - $d1)/60))
echo "Time spent (min) : ${diff}"
## DO NOT MODIFY THIS PART OF THE SCRIPT: you will be accountable for any damage you cause
# At the end of your job, you need to get back all produced data by synchronizing the workdir folder with your starting job folder, then delete the temporary one (workdir)
if [ $NUM_NODES -le 1 ]; then
cd $PBS_O_WORKDIR
rsync -ap $WORKDIR/ $PBS_O_WORKDIR/
rm -rf $WORKDIR
fi
## END-DO
</code>
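Assuming the script is saved as ''run_omegafold.pbs'' (a file name chosen here for illustration) next to your ''query.fasta'', submit it with ''qsub'' and follow the job status with ''qstat'':
<code bash>
qsub run_omegafold.pbs
qstat -u $USER
</code>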
==== Benchmarks ====
===== Troubleshooting =====
In case of trouble, you can contact me at: ''hubert.santuz[at]ibpc.fr''
==== RuntimeError: CUDA out of memory. ====
If you encounter this error:
<code>
Traceback (most recent call last):
  File "/shared/compilers/conda-py3/latest/envs/omegafold/bin/omegafold", line 8, in <module>
    sys.exit(main())
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/omegafold/__main__.py", line 74, in main
    output = model(
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/omegafold/model.py", line 175, in forward
    result, prev_dict = self.omega_fold_cycle(
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/omegafold/model.py", line 89, in forward
    prev_node, edge_repr, node_repr = self.geoformer(
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/omegafold/geoformer.py", line 175, in forward
    node_repr, edge_repr = block(
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/omegafold/geoformer.py", line 122, in forward
    edge_repr += layer(edge_repr, mask[..., 0, :], fwd_cfg=fwd_cfg)
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/omegafold/modules.py", line 677, in forward
    out = self._get_attended(edge_repr, mask, fwd_cfg)
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/omegafold/modules.py", line 607, in _get_attended
    attended[s:e] = self.attention(
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/omegafold/modules.py", line 431, in forward
    attn_out = self._get_attn_out(q_inputs, kv_inputs, fwd_cfg, bias)
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/omegafold/modules.py", line 455, in _get_attn_out
    attn_out, _ = attention(
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/omegafold/modules.py", line 156, in attention
    res, attn = _attention(
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/omegafold/modules.py", line 93, in _attention
    logits = torch.einsum("...id, ...jd -> ...ij", query * scale, key)
  File "/shared/compilers/conda-py3/latest/envs/omegafold/lib/python3.9/site-packages/torch/functional.py", line 360, in einsum
    return _VF.einsum(equation, operands) # type: ignore[attr-defined]
RuntimeError: CUDA out of memory. Tried to allocate 14.09 GiB (GPU 0; 10.92 GiB total capacity; 9.36 GiB already allocated; 747.38 MiB free; 9.60 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
</code>
It means your prediction uses more GPU memory than the card can handle. Try playing with the ''--subbatch_size'' option (as explained [[cluster-lbt:extra-tools:omegafold#gpu_memory|here]]) to reduce the memory used.