Computational approaches for the prediction of apicoplast-targeted proteins
Motivation: The cells of eukaryotic organisms contain subunits called organelles. The apicoplast is a unique organelle found in a group of parasites, known as Apicomplexa, which are responsible for a wide range of serious diseases, including malaria. Because of its unique properties, the apicoplast is an ideal drug target. Identifying apicoplast-targeted proteins (ATPs) is necessary for drug target identification, and accurate in silico prediction methods are needed to accelerate this process. Current computational approaches concentrate on a single species of Apicomplexa and can predict only a subset of ATPs.

Methodology: We have developed two new computational approaches, ApicoAP and ApicoAMP, that concentrate on different types of ATPs and are applicable to multiple species of Apicomplexa.

ApicoAP is a generalized rule-based classification model. In ApicoAP, we conduct a systematic search over a rule space, using the expected prediction performance of a rule on a training set as the optimization criterion. The rule space is formalized by our parametric rule definition. We devised a genetic algorithm to perform this optimization, which yields a classification rule. The performance of ApicoAP is evaluated on labeled datasets of proteins from four different apicomplexan species, and expected prediction accuracies range between 82% and 87%.

ApicoAMP is an ensemble classification model. In ApicoAMP, different algorithms and feature sets are used to train several classifiers, which are evaluated and combined in an ensemble classification model to obtain the best expected performance. ApicoAMP is trained on a set of proteins from 11 apicomplexan species, and its expected prediction accuracy is 91%.

In addition, we developed ApicoAP Pipeline, in which we introduced an automated training data gathering procedure.
This pipeline works as an automated ApicoAP classifier generator: it does not require training data to be provided, but instead generates a classifier from the information available in public resources at a given time.

Conclusions: Our work significantly broadens the set of apicoplast-targeted proteins that can be identified computationally. The ApicoAP and ApicoAMP prediction software and the ApicoAP Pipeline client software are available for public use at http://bcb.eecs.wsu.edu.
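The abstract does not give ApicoAP's parametric rule definition or its genetic-algorithm settings. The following is therefore only a minimal sketch, under stated assumptions, of the general idea: a genetic algorithm searches a space of parametric rules, scoring each candidate by its performance on a training set. The rule form is hypothetical: predict "apicoplast-targeted" when at least k of a protein's numeric features exceed per-feature thresholds. So are the operators (tournament selection, uniform crossover, Gaussian mutation, elitism) and every name such as `evolve_rule`. Plain training-set accuracy stands in for the expected prediction performance used in the thesis.

```python
import random
from typing import List, Sequence, Tuple

# HYPOTHETICAL parametric rule: (per-feature thresholds, k).
# Predict "apicoplast-targeted" if at least k features exceed their thresholds.
Rule = Tuple[List[float], int]


def predict(rule: Rule, x: Sequence[float]) -> bool:
    thresholds, k = rule
    hits = sum(1 for xi, t in zip(x, thresholds) if xi > t)
    return hits >= k


def fitness(rule: Rule, X, y) -> float:
    """Training-set accuracy, a simplified stand-in for expected performance."""
    correct = sum(predict(rule, x) == label for x, label in zip(X, y))
    return correct / len(y)


def random_rule(n_features: int, rng: random.Random) -> Rule:
    return [rng.random() for _ in range(n_features)], rng.randint(1, n_features)


def crossover(a: Rule, b: Rule, rng: random.Random) -> Rule:
    # Uniform crossover on thresholds; k is inherited from either parent.
    thresholds = [ta if rng.random() < 0.5 else tb for ta, tb in zip(a[0], b[0])]
    return thresholds, rng.choice([a[1], b[1]])


def mutate(rule: Rule, rng: random.Random, rate=0.2, sigma=0.1) -> Rule:
    thresholds, k = rule
    # Gaussian perturbation of some thresholds, clamped to [0, 1].
    thresholds = [
        min(1.0, max(0.0, t + rng.gauss(0, sigma))) if rng.random() < rate else t
        for t in thresholds
    ]
    if rng.random() < rate:
        k = min(len(thresholds), max(1, k + rng.choice([-1, 1])))
    return thresholds, k


def tournament(pop, scores, rng: random.Random, size=3) -> Rule:
    # Pick `size` random candidates and return the fittest of them.
    idx = rng.sample(range(len(pop)), size)
    return pop[max(idx, key=lambda i: scores[i])]


def evolve_rule(X, y, pop_size=40, generations=60, seed=0) -> Rule:
    rng = random.Random(seed)
    pop = [random_rule(len(X[0]), rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(r, X, y) for r in pop]
        best = pop[max(range(pop_size), key=lambda i: scores[i])]
        children = [best]  # elitism: carry the current best rule forward
        while len(children) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    return max(pop, key=lambda r: fitness(r, X, y))


if __name__ == "__main__":
    # Toy, made-up feature vectors (values in [0, 1]) and labels.
    X_toy = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.5], [0.7, 0.75, 0.2],
             [0.2, 0.1, 0.6], [0.1, 0.3, 0.9], [0.3, 0.2, 0.4]]
    y_toy = [True, True, True, False, False, False]
    rule = evolve_rule(X_toy, y_toy)
    print(rule, fitness(rule, X_toy, y_toy))
```

Because the best rule of each generation is copied unchanged into the next, the best training accuracy never decreases across generations. A practical version would more likely estimate expected performance with cross-validation rather than raw training accuracy, so that the search does not simply overfit the rule to the training set.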