Computational methods for estimating genetic relationships
The advent of molecular methods has altered our approach to the study of the genetic relationships of microbes. In particular, we now have the unprecedented ability to estimate genetic relationships from whole genome sequences whose numbers are increasing exponentially. In this dissertation we examine several computational methods for using genome sequences to infer the genetic relationships of both plasmids and bacteria.

First we describe a method for relating 527 Gram-negative bacterial plasmids based on their genetic sequences. Initial classification of their genetic relationships was accomplished using a computational approach analogous to hybridization of "mixed-genome microarrays." Relationships were refined for several clusters by identifying conserved proteins within a cluster. The replication of consistent results produced in a separate study for a small group of IncA/C plasmids and clusters of Borrelia plasmids provides evidence that the approach used can correctly predict genetic relationships.

Second, we use the pClust program to estimate the genetic relationships of the same 527 plasmid genomes. Protein clusters generated by pClust are used to create profiles for each plasmid in the tree, which are then used as correlation filters for classification of a new bacterial plasmid. The major contribution of this work is the development of a method that can be used to construct a tree and, more importantly, to insert a new taxon a posteriori. While this method was developed specifically for plasmids, it can be used with genomes of any kind.

The third project is a study of the genetic relationships of bacteria, more specifically species of the alphaproteobacteria class. Typically phylogeny studies of bacteria are based on the 16S rRNA gene. In this work, however, we again use the software program pClust with twelve genomes to generate homologous protein clusters which are then used to construct a tree. The results are compared with a tree constructed using 16S rRNA; while certain features in both trees are similar, the differences indicate that the use of whole-genome sequences may provide a better estimate of genetic relationships.