UWA Computer Science & Software Engineering
Honours, Grad.Dip., and Masters(Coursework) projects offered in 2008

Dr Michael Wise, mwise@cyllene.uwa.edu.au

Since the early 1990s A/Prof Wise's research focus has been bioinformatics, following many years in mainstream computer science research, particularly language primitives for parallel computation and automated plagiarism detection. A/Prof Wise's bioinformatics research interests are diverse, ranging from keyword clustering of proteins based on text mining, through analysis of low-complexity and aggregation-prone proteins, to simulation of metabolic and other chemical reaction pathways. His philosophy is that bioinformatics is purely and simply the application of computer technology to biology, generally based on molecular-level data.

Having started at UWA in 2004, A/Prof Wise holds a joint appointment between the Schools of Biomolecular, Biomedical & Chemical Sciences and Computer Science & Software Engineering, and in the Faculty of Natural and Agricultural Sciences.

Projects

  1. A Novel Method for Building Phylogenetic Trees

    Phylogeny is the study of the relatedness of species. Today this is done through the computational analysis of genes in living organisms (after all, we can't go back in time to track speciation events as they happened). The phylogeny of organisms is often depicted as phylogenetic trees, and there is a considerable literature on how best to create such trees. Most methods take as input data from a single gene or protein sequence. That is, the same gene is found in all the species of interest and then compared to build the tree. The problem with this approach is that it assumes that the gene is "typical" and that evolutionary pressures have acted in the same way across all the species to shape that gene. A second problem is finding a gene that is both ubiquitous and conserved in its function, but with sufficient variability to differentiate the various species possessing that gene. In this project you will create an application which extracts data generated by an existing genome analysis application as it traverses whole bacterial chromosomes, and then, after normalising the elements of the data vectors, try different methods for building phylogenetic trees from the data. In other words, rather than trying to find the ideal gene around which to build a tree, this method will use all the data available in chromosomes or, by extension, entire genomes.
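
    As a rough illustration of the pipeline described above (normalise per-genome data vectors, then build a tree from them), the following Python sketch uses one possible choice among the "different methods": z-score normalisation, Euclidean distance and UPGMA (average-linkage) clustering. The species labels and feature values are hypothetical placeholders, not output of the existing genome analysis application, and nothing here is meant as the project's prescribed method.

```python
import math
from itertools import combinations


def normalise(vectors):
    """Scale every feature (column) to zero mean and unit variance across genomes."""
    names = list(vectors)
    n_features = len(vectors[names[0]])
    scaled = {name: [] for name in names}
    for j in range(n_features):
        column = [vectors[name][j] for name in names]
        mean = sum(column) / len(column)
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
        for name, x in zip(names, column):
            # A constant feature carries no signal, so map it to 0.
            scaled[name].append((x - mean) / std if std > 0 else 0.0)
    return scaled


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma(vectors):
    """Return a tree as nested 2-tuples, built by average-linkage clustering."""
    names = sorted(vectors)
    trees = {i: name for i, name in enumerate(names)}   # cluster id -> subtree
    sizes = {i: 1 for i in trees}                       # cluster id -> leaf count
    dist = {
        (i, j): euclidean(vectors[names[i]], vectors[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(trees) > 1:
        # Pick the closest pair of clusters (ties broken by cluster id).
        i, j = min(dist, key=lambda pair: (dist[pair], pair))
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Size-weighted mean keeps the distance an average over all leaves.
            dist[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))


def to_newick(tree):
    """Serialise the nested tuples as a Newick topology (no branch lengths)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Hypothetical per-genome feature vectors (placeholder values only).
profiles = {
    "sp_A": [0.10, 5.0, 200.0],
    "sp_B": [0.12, 5.2, 210.0],
    "sp_C": [0.90, 1.0, 800.0],
    "sp_D": [0.88, 1.1, 790.0],
}

tree = upgma(normalise(profiles))
print(to_newick(tree) + ";")
```

    Normalising first matters because the raw features may sit on very different scales; without it, the largest-valued feature would dominate the distances. Other distance measures or tree-building methods (neighbour joining, for instance) could be swapped in at the `euclidean` and `upgma` steps.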

  2. Computing the Genetic Basis for the Sit-and-Wait Hypothesis of Bacterial Pathogenicity

    The Sit-and-Wait Hypothesis (Walther & Ewald 2004) relates the environmental durability of non-vector-borne microbial pathogens to their pathogenicity. Non-vector-borne pathogens, such as E. coli, are passed directly from host to host, rather than being transmitted by a vector species, such as mosquitoes. (For example, Ross River Fever, seen around Perth, is mosquito borne, while influenza is transmitted from person to person. Both, however, are caused by viruses.) In the hypothesis, durability - the ability to survive the stresses associated with existing for a period outside a host - is, in effect, a cofactor for pathogenicity, in concert with the necessary presence of virulence factors. That is, without an assortment of virulence factors, a microorganism is unable to colonise a host, but if the microorganism is labile, i.e. not able to withstand desiccation, the pathogen cannot afford to be too virulent, because a host that is too sick to move is unable to spread the infection, and the infection will die out.

    GeneOntology (Gene Ontology Consortium 2000) is a set of three ontologies specific to molecular biology. The project will involve mapping GeneOntology (GO) terms to genes in bacterial genomes. This will be done either directly, from annotations associated with the genes during the annotation of the genomes, or indirectly, via text mining for keywords that have been extracted from the annotation using techniques similar to those employed in the Protein Annotators Assistant (Wise 2000). A third track will be the association of genes with particular protein families based on hidden Markov model searches.

    Term extraction will focus on the GeneOntology biological process terms related to tolerance of abiotic stress. Examples might include GO:0042538 (hyperosmotic salinity response), GO:0009414 (response to water deprivation) and GO:0009409 (response to cold). In the hierarchic GeneOntology system, all these terms come under the parent terms GO:0009628 (response to abiotic stimulus) and GO:0006950 (response to stress). The sets of genes related to tolerance of abiotic stress from the different bacterial species will then be correlated with information from the literature about durability of the species and known pathology, much as was reported in Walther and Ewald (2004), except that in this case durability will be marked by the presence of genes thought most closely related to tolerance of abiotic stress.
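
    The selection of genes whose GO terms fall under the two parent terms can be sketched as a walk up the ontology. In the Python sketch below, the parent links are deliberately simplified to what the paragraph above states (the real ontology has intermediate terms between these children and parents), GO:0006412 (translation) is included only as an unrelated contrast term, and the gene names and annotations are hypothetical.

```python
# Simplified "is under" links based on the terms named in the text.
# Assumption: intermediate GO terms are omitted for brevity.
PARENTS = {
    "GO:0042538": {"GO:0009628", "GO:0006950"},  # hyperosmotic salinity response
    "GO:0009414": {"GO:0009628", "GO:0006950"},  # response to water deprivation
    "GO:0009409": {"GO:0009628", "GO:0006950"},  # response to cold
    "GO:0009628": set(),                         # response to abiotic stimulus
    "GO:0006950": set(),                         # response to stress
    "GO:0006412": set(),                         # translation (contrast term)
}

# Hypothetical gene-to-term annotations for one genome.
ANNOTATIONS = {
    "gene_a": {"GO:0009414"},
    "gene_b": {"GO:0006412"},
    "gene_c": {"GO:0009409", "GO:0006412"},
}

STRESS_ROOTS = {"GO:0009628", "GO:0006950"}


def ancestors(term, parents):
    """Collect every term reachable by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_under(annotations, parents, roots):
    """Map each gene to its terms that equal, or descend from, one of the roots."""
    selected = {}
    for gene, terms in annotations.items():
        hits = {t for t in terms if t in roots or ancestors(t, parents) & roots}
        if hits:
            selected[gene] = hits
    return selected


stress_genes = genes_under(ANNOTATIONS, PARENTS, STRESS_ROOTS)
for gene in sorted(stress_genes):
    print(gene, sorted(stress_genes[gene]))
```

    Running the same selection over each bacterial genome would give, per species, a set of candidate stress-tolerance genes that could then be compared against the durability and pathology information from the literature.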

    1. Gene Ontology Consortium (2000), "Gene Ontology: tool for the unification of biology", Nature Genetics 25:25-29.
    2. Walther B.A. and Ewald P.W. (2004), "Pathogen Survival in the External Environment and the Evolution of Virulence", Biological Reviews 79:849-869.
    3. Wise M.J. (2000), "Protein Annotators Assistant: A Novel Application of Information Retrieval Techniques", Journal of the American Society for Information Science (JASIS) 51(12):1131-1136, John Wiley.

    Background Reading

    Tutorials aimed at computer scientists can be found at:
