Caveat

TargetDBP+: Enhancing the Performance of Identifying DNA-Binding Proteins via Weightedly Convolutional Features


Protein-DNA interactions are ubiquitous and play important roles in the life cycles of living cells. Accurate identification of DNA-binding proteins (DBPs) is a key step toward understanding the mechanisms of protein-DNA interactions. Although many DBP identification methods have been proposed, their performance is still unsatisfactory. In this study, a new method, called TargetDBP+, is developed to further improve the identification of DBPs. In TargetDBP+, five convolutional features are first extracted from five feature sources: AAOHM (amino acid one-hot matrix), PSSM (position-specific scoring matrix), PSSPM (predicted secondary structure probability matrix), PSAPM (predicted solvent accessibility probability matrix), and PPDBS (predicted probabilities of DNA-binding sites). Second, the five features are weighted and serially combined, with all element weights learned by the differential evolution algorithm. Finally, the DBP identification model of TargetDBP+ is trained with the SVM (support vector machine) algorithm. To evaluate TargetDBP+ and compare it with existing methods, a new gold-standard benchmark dataset, called UniSwiss, is constructed; it consists of 4,881 DBPs and 4,881 non-DBPs extracted from the UniProtKB/Swiss-Prot database. Experimental results show that, on the independent validation subset of UniSwiss, TargetDBP+ achieves an accuracy of 85.83% and a precision of 88.45% while covering 82.41% of all DBPs (recall), with an MCC value (0.718) significantly higher than those of other state-of-the-art control methods. The TargetDBP+ web server is accessible at http://csbio.njust.edu.cn/bioinf/targetdbpplus/; the UniSwiss dataset and the standalone program of TargetDBP+ can be found at https://github.com/jun-csbio/TargetDBPplus.
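The evaluation measures quoted above (accuracy, precision, coverage of DBPs, and MCC) can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the evaluation code of TargetDBP+; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and MCC from confusion-matrix counts.

    tp/fn: DBPs predicted as DBP / non-DBP.
    tn/fp: non-DBPs predicted as non-DBP / DBP.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators so degenerate inputs do not raise.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only; these are NOT the UniSwiss results.
example = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four counts in both the numerator and the denominator, so it stays informative when one class is predicted much better than the other.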

To use the TargetDBP+ server, enter one or more protein sequences and a valid email address. The server will then estimate the running time and send the estimate to you by email. Once the prediction task has finished, a result email is sent automatically with instructions for accessing the result page, which is kept on the TargetDBP+ web server for 3 months.

If only one protein sequence is submitted, the TargetDBP+ prediction typically takes about 3 to 90 minutes, depending on the length of the sequence. The relatively long computation time arises because TargetDBP+ must run PSI-BLAST, PSIPRED, SANN, the DNA-binding site (DBS) prediction procedure, and LIBSVM to obtain discriminative features and predict whether the submitted protein is a DBP.

Although the TargetDBP+ server places no limit on the number of submitted protein sequences, we strongly suggest submitting fewer than 10 protein sequences at a time, since our computational resources are limited.
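For larger sequence sets, one way to follow this suggestion is to split the sequences locally into groups of at most 9 and submit each group as a separate job. The sketch below assumes the sequences are held in FASTA format, which is an assumption and not something this page states; the helper names `read_fasta`, `batches`, and `to_fasta` are hypothetical, and the code does not contact the server.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batches(records, max_per_batch=9):
    """Yield consecutive groups of at most max_per_batch records."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    for start in range(0, len(records), max_per_batch):
        yield records[start:start + max_per_batch]


def to_fasta(records):
    """Format (header, sequence) pairs back into FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Placeholder sequences for illustration only.
demo_text = "".join(f">seq{i}\nMKVLAAGIT\n" for i in range(20))
demo_records = read_fasta(demo_text)
demo_batches = list(batches(demo_records))
print([len(b) for b in demo_batches])
```

Each element of `demo_batches` can be passed to `to_fasta` to produce the text for one submission; the default of 9 simply keeps every job under the suggested limit of 10 sequences.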