Introduction
DOMAC is an accurate, hybrid protein domain prediction server.
DOMAC integrates homology modeling, domain parsing, and ab initio methods.
The preliminary implementation of DOMAC (server name: FOLDpro; Cheng and Baldi,
Bioinformatics, 2006) was ranked
first among all domain prediction servers in the seventh edition of the
Critical Assessment of Techniques for Protein Structure Prediction (CASP7) (Moult
et al., Proteins, 2005). DOMAC predicts protein domains in the following two steps.
-
First, it uses PSI-BLAST (Altschul et al., Nucleic Acids Res., 1997) to search the target sequence against the NCBI non-redundant
sequence database to construct a profile. The profile is then used to search a template
library built from the proteins in the Protein Data Bank (Berman et al., Nucleic
Acids Res., 2000) to identify templates.
-
Second, if any significant templates are identified (e-value <= 0.001), it
generates a structural model of the target using Modeller (Sali and Blundell, JMB, 1993) based on the template
structures. When multiple significant templates are available, they are combined to improve model
quality. It then uses the domain parsing tool PDP (Alexandrov and Shindyalov, Bioinformatics, 2003) to parse the model
into domains. If the parsed domains do not cover the whole target sequence, DOMAC
assigns the uncovered regions to the adjacent domains.
-
If no significant template is found, DOMAC invokes the ab initio domain predictor
DOMpro to predict domains. DOMpro (Cheng et al., Data Mining and Knowledge Discovery, 2006) uses neural networks in conjunction with sequence
profiles, predicted secondary structure, and relative solvent accessibility to predict
domain boundaries.
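The choice between the two prediction routes, and the handling of uncovered regions, can be sketched in a few lines of Python. This is only an illustration and not DOMAC's actual code: the function names are hypothetical, domains are simplified to contiguous segments, and the rule for which adjacent domain receives a gap (the preceding one, or the first domain for a gap at the start of the sequence) is an assumption, since the description above does not specify it. Only the e-value cutoff of 0.001 comes from the text.

```python
from typing import List, Tuple

E_VALUE_CUTOFF = 0.001  # significance threshold stated in the text


def choose_method(template_e_values: List[float]) -> str:
    """Return 'template-based' if any template is significant, else 'ab initio'."""
    if any(e <= E_VALUE_CUTOFF for e in template_e_values):
        return "template-based"
    return "ab initio"


def fill_uncovered(segments: List[Tuple[int, int]], length: int) -> List[Tuple[int, int]]:
    """Extend parsed segments (1-based, inclusive) so they cover 1..length.

    Assumptions (not taken from the DOMAC description): segments are
    contiguous and non-overlapping; a gap is given to the segment before
    it; a gap at the start of the sequence is given to the first segment;
    an empty parse is treated as a single domain.
    """
    if not segments:
        return [(1, length)]
    ordered = sorted(segments)
    filled = []
    for i, (start, end) in enumerate(ordered):
        new_start = 1 if i == 0 else start
        # Extend to just before the next segment, or to the sequence end.
        if i + 1 < len(ordered):
            new_end = ordered[i + 1][0] - 1
        else:
            new_end = length
        filled.append((new_start, max(end, new_end)))
    return filled
```

For example, a 200-residue target parsed into segments 5-100 and 110-190 would, under these assumptions, be reported as domains 1-109 and 110-200.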
Input and Output
-
Inputs to the web server are the target name, the sequence, and an email address.
Processing one query usually takes less than 15 minutes, depending on the server load and
sequence length. The sequence must be entered as a plain sequence of amino acids. The maximum sequence length
allowed is 1500 residues.
-
Domain prediction outputs include the user-defined target name, the protein sequence, the predicted number of domains,
the start and end positions of each domain, and the method (template-based or ab initio) used to make the
prediction. For template-based predictions, the output also reports the PDB codes of the templates used.
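Before submitting, a sequence can be checked locally against the input constraints described above (plain amino acid sequence, at most 1500 residues). The following is a minimal sketch, not part of the server: the function name is hypothetical, and restricting input to the 20 standard one-letter codes is an assumption, since the page does not say how the server treats other characters.

```python
MAX_LENGTH = 1500  # maximum sequence length stated in the text
# Assumption: only the 20 standard one-letter amino acid codes are accepted.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_sequence(raw: str) -> str:
    """Return the cleaned, upper-case sequence, or raise ValueError."""
    if raw.lstrip().startswith(">"):
        raise ValueError("FASTA header found; enter the plain sequence only")
    # Remove spaces and line breaks that often come along when pasting.
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    if len(seq) > MAX_LENGTH:
        raise ValueError(f"sequence has {len(seq)} residues; maximum is {MAX_LENGTH}")
    unexpected = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if unexpected:
        raise ValueError("unexpected characters: " + ", ".join(unexpected))
    return seq
```

Calling `check_sequence("mkv lla\n")` returns the cleaned string `MKVLLA`, while a sequence longer than 1500 residues or one starting with a FASTA header line raises `ValueError`.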
Dataset Download