Installation Guide
DDGWizard consists of three components: the feature calculation pipeline, which processes raw ΔΔG data and outputs feature-enriched ΔΔG data with 1547 features; the DDGWizard dataset, which contains 15752 ΔΔG data points; and the accurate ΔΔG prediction model.
This section explains how to install the dependencies needed to run DDGWizard (nothing needs to be installed to access the DDGWizard dataset; it can be downloaded directly).
Installation prerequisites:
CentOS 7 or Ubuntu system; GCC version higher than 4.8.5; Conda version higher than 23.0; Git.
Feature Calculation Pipeline (for Generating Feature-Enriched ΔΔG Data)
This subsection is for users who need the feature calculation pipeline. The pipeline processes raw ΔΔG input data and outputs feature-enriched data with 1547 calculated features, which can support further analysis, feature selection, and machine learning.
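Before starting, it can help to confirm that the prerequisites listed above are met. The following is a minimal sketch, not part of DDGWizard itself: the helper names version_gt and check_min are hypothetical, and the comparison relies on sort -V from GNU coreutils, which is assumed to be available on CentOS 7 and Ubuntu.

```shell
# version_gt A B: succeeds when version A is strictly higher than version B.
version_gt() {
    [ "$1" != "$2" ] && \
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# check_min NAME FOUND_VERSION MINIMUM: print a one-line verdict.
check_min() {
    if [ -z "$2" ]; then
        echo "$1: not found"
    elif version_gt "$2" "$3"; then
        echo "$1: $2 (OK, higher than $3)"
    else
        echo "$1: $2 (need a version higher than $3)"
    fi
}

# Newer GCC builds may print only the major version here (e.g. "11"),
# which still compares correctly against 4.8.5.
gcc_version=$(gcc -dumpversion 2>/dev/null)
# "conda --version" prints "conda X.Y.Z"; keep the second field.
conda_version=$(conda --version 2>/dev/null | awk '{print $2}')

check_min "GCC" "$gcc_version" "4.8.5"
check_min "Conda" "$conda_version" "23.0"
if command -v git >/dev/null 2>&1; then
    echo "Git: found"
else
    echo "Git: not found"
fi
```

The script only reports; it does not install or change anything.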
The installation steps are as follows.
1. Git clone the DDGWizard repository
$ git clone https://github.com/bioinfbrad/DDGWizard.git
2. Configure and install the Conda virtual environment
There is an Environment.yml file located in the path DDGWizard/src, which is the Conda environment configuration file.
Open this file with your text editor (e.g., nano, vim, vi, etc.). Here we use nano as an example:
$ cd </path/to/DDGWizard/>src/
$ nano Environment.yml
Modify the prefix, which is on the last line, so that it points to your local Conda envs folder.
If you do not know the path to your local Conda envs folder, you can list it with:
$ conda info --envs
After the change, the last line should read prefix: /path/to/your_conda/envs/DDGWizard.
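As an alternative to editing the file by hand, the prefix line can be rewritten from the command line. This is a sketch under two assumptions: the helper name set_env_prefix is hypothetical, and your environments live in the default envs folder under the Conda base directory (check the output of conda info --envs if unsure). It uses GNU sed's in-place option.

```shell
# set_env_prefix YML ENVS_DIR: rewrite the "prefix:" line of a Conda
# environment file so that it points to ENVS_DIR/DDGWizard.
set_env_prefix() {
    local yml="$1" envs_dir="${2%/}"   # drop a trailing slash, if any
    sed -i "s|^prefix:.*|prefix: ${envs_dir}/DDGWizard|" "$yml"
}

# Example, run from DDGWizard/src/. The block is skipped when conda or
# Environment.yml is not available in the current directory.
if command -v conda >/dev/null 2>&1 && [ -f Environment.yml ]; then
    set_env_prefix Environment.yml "$(conda info --base)/envs"
    tail -n 1 Environment.yml          # show the updated prefix line
fi
```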
Once you have changed the prefix in the Environment.yml file, use Conda to create the virtual environment and install the dependencies. This may take some time.
$ conda env create -f Environment.yml
3. Download NCBI-BLAST-2.13.0+
Users need to download the NCBI-BLAST-2.13.0+ program so that DDGWizard can carry out multiple sequence alignment (MSA). Users can visit Download NCBI-BLAST-2.13.0+ to download the ncbi-blast-2.13.0+-x64-linux.tar.gz file. We recommend downloading this file to the path DDGWizard/src/. Users can also use wget to download it:
$ cd </path/to/DDGWizard/>src/
$ wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.13.0/ncbi-blast-2.13.0+-x64-linux.tar.gz
Then copy this compressed file to the path DDGWizard/bin/ncbi_blast_2_13_0+/ and extract it. Use the following commands (assuming the file has been downloaded to the path DDGWizard/src/):
$ cd </path/to/DDGWizard/>src/
$ cp ncbi-blast-2.13.0+-x64-linux.tar.gz ../bin/ncbi_blast_2_13_0+/
$ cd ../bin/ncbi_blast_2_13_0+/
$ tar -zxvf ncbi-blast-2.13.0+-x64-linux.tar.gz
$ cp -r ncbi-blast-2.13.0+/* .
NCBI-BLAST-2.13.0+ is a "United States Government Work" under the terms of the United States Copyright Act. Please read and accept the license file in its folder before proceeding further.
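The copy, extract, and flatten commands above can also be collapsed into one step with GNU tar. This is a sketch, not the documented procedure: the helper name extract_flat is hypothetical, and it assumes DDGWizard only needs the extracted contents directly under the target folder (which is what the cp -r step suggests), not the archive itself.

```shell
# extract_flat ARCHIVE DEST: unpack a .tar.gz/.tgz archive into DEST and
# drop the archive's single top-level folder (--strip-components=1).
extract_flat() {
    mkdir -p "$2" && tar -zxf "$1" -C "$2" --strip-components=1
}

# Example, run from DDGWizard/src/ after the download. The block is
# skipped when the archive is not present.
archive="ncbi-blast-2.13.0+-x64-linux.tar.gz"
if [ -f "$archive" ]; then
    extract_flat "$archive" "../bin/ncbi_blast_2_13_0+"
    ls "../bin/ncbi_blast_2_13_0+"
fi
```

The same helper would apply to the other .tgz and .tar.gz archives in the optional steps, with the corresponding target folder.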
4. Configure Modeller
Modeller is used for homology (comparative) modeling of protein three-dimensional structures. In DDGWizard, Modeller constructs the PDB structure files of mutants from the wild-type PDB structure files supplied by the user. Modeller was already installed when the Conda environment was created, but DDGWizard can only call it once you have obtained a license and configured it.
Please visit the Official Modeller Website to register an account. Modeller is distributed under the "Academic End-User Software License Agreement for MODELLER". Follow the instructions there, and read and accept the terms to obtain a license key. Then enter the license key in the configuration file of the installed Modeller, which is located under the Conda envs folder. Use the following command:
$ nano </path/to/your_conda/envs/DDGWizard/>lib/modeller-10.6/modlib/modeller/config.py
Replace XXXX with your license key, then save and close the file.
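The same replacement can be scripted. This is a sketch under stated assumptions: the placeholder appears literally as XXXX in config.py (as this step describes), the helper name set_modeller_key and the variables CONDA_ENV and MODELLER_KEY are hypothetical, and the key contains no characters that sed treats specially.

```shell
# set_modeller_key CONFIG KEY: replace the first XXXX placeholder on each
# line of CONFIG with KEY. Assumes KEY contains no "|", "&" or backslash.
set_modeller_key() {
    sed -i "s|XXXX|$2|" "$1"
}

# Hypothetical variables, to be set before running:
#   CONDA_ENV=/path/to/your_conda/envs/DDGWizard
#   MODELLER_KEY=<your license key>
config="${CONDA_ENV:-}/lib/modeller-10.6/modlib/modeller/config.py"
if [ -n "${MODELLER_KEY:-}" ] && [ -f "$config" ]; then
    set_modeller_key "$config" "$MODELLER_KEY"
    grep -n "license" "$config"    # show the updated line for a visual check
fi
```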
For the DDGWizard feature calculation pipeline, the software dependencies in steps 5-11 are optional (if a given program is not installed, the feature values it calculates will not be output).
To calculate more features, install the software dependencies below. To test the feature calculation pipeline right away, it can already run at this point (for usage, see section Generate Feature-Enriched ΔΔG data).
Before running, make sure the DDGWizard programs have executable permission (step 12). Return to the DDGWizard program folder and execute:
$ cd </path/to/DDGWizard/>
$ chmod -R +x .
To use the DDGWizard prediction model, users must also complete steps 5-8 (downloading Ring 3.0 requires applying for and being granted permission, which might take some time).
(Optional) 5. Download FoldX 5.0
Users can download the FoldX 5.0 program so that DDGWizard can calculate energy terms of proteins. FoldX has an academic version and a commercial version; the academic version is sufficient for DDGWizard. Please visit Apply for FoldX 5.0 to register an account, then read and accept the "FoldX Academic License" terms to download the foldx5Linux64.zip file. Copy this compressed file to the path DDGWizard/bin/FoldX_5.0/ and extract it. Use the following commands (assuming the file has been downloaded to the path DDGWizard/src/):
$ cd </path/to/DDGWizard/>src/
$ cp foldx5Linux64.zip ../bin/FoldX_5.0/
$ cd ../bin/FoldX_5.0/
$ unzip foldx5Linux64.zip
(Optional) 6. Download Ring 3.0
Users can download the Ring 3.0 application so that DDGWizard can calculate residue interaction information. Please visit Apply for Ring 3.0 to apply, and wait for permission to download. Read and accept the Ring 3.0 license to obtain the ring-3.0.0.tgz file. Copy this compressed file to the path DDGWizard/bin/ring-3.0.0/ and extract it. Use the following commands (assuming the file has been downloaded to the path DDGWizard/src/):
$ cd </path/to/DDGWizard/>src/
$ cp ring-3.0.0.tgz ../bin/ring-3.0.0/
$ cd ../bin/ring-3.0.0/
$ tar -zxvf ring-3.0.0.tgz
$ cp -r ./ring-3.0.0/* .
(Optional) 7. Download DisEMBL
Users can download the DisEMBL program so that DDGWizard can calculate disorder information of proteins. Please visit Download the DisEMBL to download the DisEMBL-1.4.tgz file. Copy this compressed file to the path DDGWizard/bin/DisEMBL_1_4/ and extract it. Use the following commands (assuming the file has been downloaded to the path DDGWizard/src/):
$ cd </path/to/DDGWizard/>src/
$ cp DisEMBL-1.4.tgz ../bin/DisEMBL_1_4/
$ cd ../bin/DisEMBL_1_4/
$ tar -zxvf DisEMBL-1.4.tgz
$ cp -r ./DisEMBL-1.4/* .
DisEMBL uses the GPL 2.0 open-source license. Please read and accept the license file in its folder before proceeding further.
(Optional) 8. Configure DSSP
DSSP is used to calculate the RSA (relative surface area) and secondary structures of PDB files. To allow DDGWizard to use DSSP, go to your local Conda envs folder, enter the bin folder, and copy mkdssp as dssp:
$ cd </path/to/your_conda/envs/DDGWizard/bin/>
$ cp mkdssp dssp
(Optional) 9. Install Bio3D
Users can install the Bio3D package so that DDGWizard can calculate atomic fluctuations based on NMA (normal mode analysis). This requires R as a prerequisite (it can be downloaded and installed from the Official R Website). Then use the following commands to install the Bio3D package:
$ R
install.packages("bio3d")
(Optional) 10. Download PROFbval
PROFbval relies on the Ubuntu environment. To address cross-platform compatibility, we have created container images that users can download. This requires Docker or Singularity as a prerequisite. Please download the following two files: myprof.tar (128MB) and myprof.sif (360MB) from https://zenodo.org/records/12817843, and copy them to the path DDGWizard/src/Prof_Source. Use the following commands (assuming the files have been downloaded to the path DDGWizard/src/):
$ cd </path/to/DDGWizard/>src/
$ cp ./myprof.tar ./Prof_Source
$ cp ./myprof.sif ./Prof_Source
DDGWizard will automatically call the programs within the container images. Users only need to have either Docker or Singularity. If the user chooses Docker, an additional step is required:
$ docker load -i </path/to/DDGWizard/>src/Prof_Source/myprof.tar
PROFbval uses the GPL 3.0+ open-source license. Please read and accept its license before proceeding further.
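Since only one of the two container runtimes is needed, a quick check shows which applies on your machine. This is a sketch: the helper name detect_runtime is hypothetical, and checking Docker first is an arbitrary choice made here, not necessarily the order DDGWizard itself uses.

```shell
# detect_runtime: print "docker", "singularity", or "none".
# Docker is checked first; this ordering is an assumption of the sketch.
detect_runtime() {
    if command -v docker >/dev/null 2>&1; then
        echo docker
    elif command -v singularity >/dev/null 2>&1; then
        echo singularity
    else
        echo none
    fi
}

case "$(detect_runtime)" in
    docker)
        echo "Docker found: run the 'docker load' step for myprof.tar." ;;
    singularity)
        echo "Singularity found: no extra step after copying myprof.sif." ;;
    none)
        echo "No container runtime found: PROFbval features will not be output." ;;
esac
```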
(Optional) 11. Download SIFT 6.2.1
Users can download the SIFT 6.2.1 program so that DDGWizard can predict the impact of amino acid substitutions on protein function. Please visit Download SIFT 6.2.1 or use wget to download the sift6.2.1.tar.gz file. Copy this compressed file to the path DDGWizard/bin/sift6_2_1/ and extract it. Use the following commands (assuming the file has been downloaded to the path DDGWizard/src/):
$ cd </path/to/DDGWizard/>src/
$ wget https://s3.amazonaws.com/sift-public/nsSNV/sift6.2.1.tar.gz
$ cp sift6.2.1.tar.gz ../bin/sift6_2_1/
$ cd ../bin/sift6_2_1/
$ tar -zxvf sift6.2.1.tar.gz
$ cp -r sift6.2.1/* .
SIFT 6.2.1 uses a non-commercial license. Please read and accept the license file in its folder before proceeding further.
12. Make sure the DDGWizard programs have executable permission
The DDGWizard programs need executable permission to run. Return to the DDGWizard program folder and execute:
$ cd </path/to/DDGWizard/>
$ chmod -R +x .
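Because features from missing tools are silently omitted, it can be useful to see which third-party folders have been populated. This is a rough sketch: the helper name report_tools and the variable DDGWIZARD_HOME are hypothetical, and it assumes the target folders are empty until an archive is extracted into them. If the repository ships placeholder files in these folders, the check would need to look for specific executables instead.

```shell
# report_tools ROOT: for each third-party folder used in the steps above,
# print whether it exists under ROOT and contains at least one file.
report_tools() {
    local root="${1%/}" dir
    for dir in bin/ncbi_blast_2_13_0+ bin/FoldX_5.0 bin/ring-3.0.0 \
               bin/DisEMBL_1_4 bin/sift6_2_1 src/Prof_Source; do
        if [ -d "$root/$dir" ] && [ -n "$(ls -A "$root/$dir" 2>/dev/null)" ]; then
            echo "present  $dir"
        else
            echo "missing  $dir"
        fi
    done
}

# Hypothetical variable: point DDGWIZARD_HOME at your clone, or run the
# script from inside the DDGWizard folder.
report_tools "${DDGWIZARD_HOME:-.}"
```

Modeller, DSSP, and Bio3D are installed outside the DDGWizard folder, so they are not covered by this check.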
ΔΔG Prediction Model (for Predicting ΔΔG)
This subsection is for users who need to use the ΔΔG prediction model.
To use DDGWizard's ΔΔG prediction model, users must complete steps 1-8 (steps 5-8 are no longer optional) and step 12 of the Feature Calculation Pipeline installation above. Steps 9-11 are not required.