TY - JOUR
T1 - AFP-Pred: A random forest approach for predicting antifreeze proteins from sequence-derived properties
AU - Kandaswamy, Krishna Kumar
AU - Chou, Kuo-Chen
AU - Martinetz, Thomas
AU - Möller, Steffen
AU - Suganthan, P. N.
AU - Sridharan, S.
AU - Pugalenthi, Ganesan
PY - 2011/2/7
Y1 - 2011/2/7
N2 - Some creatures living in extremely low temperatures can produce special materials called “antifreeze proteins” (AFPs), which can prevent the cell and body fluids from freezing. AFPs are present in vertebrates, invertebrates, plants, bacteria, fungi, etc. Although AFPs have a common function, they show a high degree of diversity in sequences and structures. Therefore, sequence-similarity-based search methods often fail to predict AFPs from sequence databases. In this work, we report a random forest approach, “AFP-Pred”, for the prediction of antifreeze proteins from protein sequence. AFP-Pred was trained on a dataset containing 300 AFPs and 300 non-AFPs and tested on a dataset containing 181 AFPs and 9193 non-AFPs. AFP-Pred achieved 81.33% accuracy from training and 83.38% from testing. The performance of AFP-Pred was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggest that AFP-Pred can be a useful approach to identify antifreeze proteins from sequence information, irrespective of their sequence similarity.
AB - Some creatures living in extremely low temperatures can produce special materials called “antifreeze proteins” (AFPs), which can prevent the cell and body fluids from freezing. AFPs are present in vertebrates, invertebrates, plants, bacteria, fungi, etc. Although AFPs have a common function, they show a high degree of diversity in sequences and structures. Therefore, sequence-similarity-based search methods often fail to predict AFPs from sequence databases. In this work, we report a random forest approach, “AFP-Pred”, for the prediction of antifreeze proteins from protein sequence. AFP-Pred was trained on a dataset containing 300 AFPs and 300 non-AFPs and tested on a dataset containing 181 AFPs and 9193 non-AFPs. AFP-Pred achieved 81.33% accuracy from training and 83.38% from testing. The performance of AFP-Pred was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggest that AFP-Pred can be a useful approach to identify antifreeze proteins from sequence information, irrespective of their sequence similarity.
UR - https://www.ncbi.nlm.nih.gov/pubmed/21056045
U2 - 10.1016/j.jtbi.2010.10.037
DO - 10.1016/j.jtbi.2010.10.037
M3 - Journal articles
SN - 0022-5193
VL - 270
SP - 56
EP - 62
JO - Journal of Theoretical Biology
JF - Journal of Theoretical Biology
IS - 1
ER -