The ScoresDecomposer.py script summarizes the scoring components from Dock scoring files, OEDocking scoring tables and Gold log files into a table-like file to be used as input to CompScore. This script is freely available for download from here. System requirements Before running ScoresDecomposer be sure that your system meets the following requirements:
- Python >=3.6 (Conda >= 4.4.10 recommended)
- NumPy >= 1.14.0
Supported scoring functions
Currently, ScoresDecomposer.py supports the extraction of scores components from the scoring functions listed in the below table. The components of these scoring functions are extracted according to files formats and identifying keywords.
|Docking program||Scoring functions|
|Dock 6.8||Grid, Contact, Continuous, Hawkins, PBSA, SASA|
|Gold||PLP, GoldScore, ChemScore, ASP|
|OEDocking||Shapegauss, ChemScore, ChemGauss 3, ChemGauss 4, PLP|
The script is run as:
python ./ScoresDecomposer.py -i <Input File> -o <Output Scores Table>
The OutputFile will contain a text file that can be used as Data file for the CompScore Web Service. The format of the input file for “ScoresDecomposer.py” is described in teh next section.
The input file contains information regarding the output files from docking calculations. This information includes the file location, the docking program that produced the file and the scoring function the file corresponds to. The file contains information of one scoring file per line with fields separated by tab. Columns is the input file are organized as follow:
- Column 1: Location of the scoring file relative to current directory
- Column 2: Docking program ID. The IDs of the docking programs are Dock35 for Dock 3.6, Dock6 for Dock 6.8, OpenEye for OEDocking and GOLD for Gold. The use of these IDs are mandatory.
- Column 3: Scoring function ID. Each scoring function from the above listed programs are identified as:
|Param||Scoring function||Scoring function ID|
It is important that IDs for programs and scoring functions are correctly specified in the input file, otherwise the ScoresDecomposer.py will fail to generate a scores components summary file. A typical input file for this script will look like:
The OutputFile is a text file that can be used as Data file for the CompScore Web Service. This file is organized as:
- First row: data heading, i.e scores components IDs.
- Column 1: The ID of each compound.
- Column 2: The classification of the compounds in either ligand (1) or decoy (0).
- Column 3: The value of the number of heavy atoms.
- Column 4: to last: Scoring components values.
Dock 6.8 Grid scoring files are used as the reference files for the extraction of the classification and number of heavy atoms of the compounds. Compounds not present in a Dock 6.8 Grid scoring file will be assigned a number of 999999. Given that all calculations for CompScore validation were performed with the DUD-E database, for classification extraction if the “dbgen_dec” pattern is found in the reference file the compounds in that file are classified as decoys (0). On the other hand, if the “dbgen_lig” pattern is found in the reference file the compounds in that file are classified as ligands (1).
Compounds not present in a Dock 6.8 Grid scoring file or those present but for which no “dbgen_dec” or “dbgen_lig” patterns are found in their paths a classification of -1 is assigned. In cases where some compounds can’t be assigned to either the ligands or decoys groups and the OutputFile of ScoresDecomposer.py are required for a CompScore GA optimization run, the user can manually add the compounds classification latter. Finally, the IDs that ScoresDecomposer.py assign to each docking score component are listed here .