dc.contributor.advisor: Broschat, Shira L.
dc.creator: CILINGIR, GOKCEN
dc.date.accessioned: 2013-09-20T18:44:11Z
dc.date.available: 2013-09-20T18:44:11Z
dc.date.issued: 2013
dc.identifier.uri: http://hdl.handle.net/2376/4763
dc.description: Thesis (Ph.D.), School of Electrical Engineering and Computer Science, Washington State University [en_US]
dc.description.abstract: Motivation: The cells of eukaryotic organisms contain subunits called organelles. The apicoplast is a unique organelle found in a group of parasites, known as Apicomplexa, that are responsible for a wide range of serious diseases, including malaria. Its unique properties make the apicoplast an ideal drug target. Identifying apicoplast-targeted proteins (ATPs) is necessary for drug target identification, and accurate in silico prediction methods are needed to accelerate this process. Current computational approaches concentrate on a single species of Apicomplexa and can predict only a subset of ATPs. Methodology: We have developed two new computational approaches, ApicoAP and ApicoAMP, that address different types of ATPs and are applicable to multiple species of Apicomplexa. ApicoAP is a generalized rule-based classification model. In ApicoAP, we conduct a systematic search over a rule space, using the expected prediction performance of a rule on a training set as the optimization criterion. The rule space is formalized by our parametric rule definition. We devised a genetic algorithm to perform this optimization, which yields a classification rule. ApicoAP's performance is evaluated on labeled datasets of proteins from four apicomplexan species, with expected prediction accuracies ranging from 82% to 87%. ApicoAMP is an ensemble classification model. In ApicoAMP, different algorithms and feature sets are used to train several classifiers, which are then evaluated and combined into an ensemble classification model to obtain the best expected performance. ApicoAMP is trained on a set of proteins from 11 apicomplexan species, and its expected prediction accuracy is 91%. In addition, we developed ApicoAP Pipeline, which introduces an automated procedure for gathering training data.
This pipeline works as an automated ApicoAP classifier generator: it does not require training data to be provided, but instead generates a classifier from the information available in public resources at a given time. Conclusions: Our work significantly broadens the set of apicoplast-targeted proteins that can be identified computationally. The ApicoAP and ApicoAMP prediction software and the ApicoAP Pipeline client software are available for public use at http://bcb.eecs.wsu.edu. [en_US]
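The abstract describes ApicoAP's training as a genetic-algorithm search over a parametric rule space, scored by a rule's expected performance on a training set. The sketch below illustrates only that general technique, under stated assumptions: the rule form (per-feature thresholds with on/off switches), the numeric features, the use of plain training accuracy as a stand-in for expected prediction performance, the toy data, and every name (`Rule`, `evolve`, `synthetic_data`, and so on) are hypothetical and are not taken from the thesis or the ApicoAP software.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical sample format: (numeric feature vector, label); True = apicoplast-targeted.
Sample = Tuple[Sequence[float], bool]


@dataclass(frozen=True)
class Rule:
    """Hypothetical parametric rule: predict 'targeted' when every active
    feature exceeds its threshold. Not the thesis's actual rule definition."""
    active: Tuple[bool, ...]
    thresholds: Tuple[float, ...]

    def predict(self, x: Sequence[float]) -> bool:
        return all(xi > t for xi, t, on in zip(x, self.thresholds, self.active) if on)


def accuracy(rule: Rule, data: Sequence[Sample]) -> float:
    # Plain training accuracy, standing in for "expected prediction performance".
    return sum(rule.predict(x) == y for x, y in data) / len(data)


def random_rule(n: int, rng: random.Random) -> Rule:
    return Rule(tuple(rng.random() < 0.5 for _ in range(n)),
                tuple(rng.random() for _ in range(n)))


def crossover(a: Rule, b: Rule, rng: random.Random) -> Rule:
    # Uniform crossover: each feature's (active, threshold) gene comes from one parent.
    genes = [(a.active[i], a.thresholds[i]) if rng.random() < 0.5
             else (b.active[i], b.thresholds[i]) for i in range(len(a.active))]
    return Rule(tuple(on for on, _ in genes), tuple(t for _, t in genes))


def mutate(r: Rule, rng: random.Random, rate: float = 0.1) -> Rule:
    # Flip feature switches and jitter thresholds (clamped to [0, 1]) with probability `rate`.
    active = tuple((not on) if rng.random() < rate else on for on in r.active)
    thresholds = tuple(min(1.0, max(0.0, t + rng.gauss(0.0, 0.1))) if rng.random() < rate else t
                       for t in r.thresholds)
    return Rule(active, thresholds)


def evolve(data: Sequence[Sample], n_features: int, pop_size: int = 30,
           generations: int = 40, seed: int = 0) -> Tuple[Rule, List[float]]:
    """Genetic search over the rule space; returns the best rule and the
    best fitness recorded at the start of each generation."""
    rng = random.Random(seed)
    pop = [random_rule(n_features, rng) for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        scored = sorted(((accuracy(r, data), r) for r in pop), key=lambda p: p[0], reverse=True)
        history.append(scored[0][0])

        def pick() -> Rule:  # tournament selection of size 3
            return max(rng.sample(scored, 3), key=lambda p: p[0])[1]

        # Elitism: the best rule survives unchanged, so recorded fitness never decreases.
        pop = [scored[0][1]] + [mutate(crossover(pick(), pick(), rng), rng)
                                for _ in range(pop_size - 1)]
    return max(pop, key=lambda r: accuracy(r, data)), history


def synthetic_data(n: int, seed: int = 1) -> List[Sample]:
    # Toy data only: label is True when feature 0 > 0.5 and feature 1 > 0.3.
    rng = random.Random(seed)
    data: List[Sample] = []
    for _ in range(n):
        x = (rng.random(), rng.random(), rng.random())
        data.append((x, x[0] > 0.5 and x[1] > 0.3))
    return data


if __name__ == "__main__":
    data = synthetic_data(200)
    best, history = evolve(data, n_features=3)
    print(best, history[-1])
```

Elitism guarantees that the best training fitness never decreases from one generation to the next. On real protein data, a cross-validated or held-out estimate would be a more faithful proxy for expected prediction performance than the training accuracy used in this sketch.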
dc.description.sponsorship: Department of Computer Science, Washington State University [en_US]
dc.language.iso: English
dc.rights: In copyright
dc.rights: Publicly accessible
dc.rights: openAccess
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.rights.uri: http://www.ndltd.org/standards/metadata
dc.rights.uri: http://purl.org/eprint/accessRights/OpenAccess
dc.subject: Bioinformatics [en_US]
dc.subject: Computer science [en_US]
dc.subject: Artificial intelligence [en_US]
dc.subject: apicomplexa [en_US]
dc.subject: apicoplast [en_US]
dc.subject: genetic algorithms [en_US]
dc.subject: machine learning [en_US]
dc.subject: prediction [en_US]
dc.subject: subcellular localization prediction [en_US]
dc.title: Computational approaches for the prediction of apicoplast-targeted proteins
dc.type: Electronic Thesis or Dissertation