Since the early 1990s, A/Prof Wise's research focus has been bioinformatics,
following many years in mainstream computer science research, particularly
language primitives for parallel computation and automated plagiarism
detection. A/Prof Wise's bioinformatics research interests are diverse,
ranging from keyword clustering of proteins based on text mining, through
analysis of low-complexity and aggregation-prone proteins, to simulation of
metabolic and other chemical reaction pathways. His philosophy is that
bioinformatics is purely and simply the application of computer technology to
biology, generally based on molecular-level data.
A/Prof Wise joined UWA in 2004 and holds a joint appointment in the
Schools of Biomolecular, Biomedical & Chemical Sciences and of Computer Science &
Software Engineering, and in the Faculty of Natural and Agricultural
Sciences.
Projects
- A Novel Method for Building Phylogenetic Trees
Phylogeny is the study of the relatedness of species. Today this is done
through the computational analysis of genes in living organisms (after all, we cannot go
back in time to observe speciation events as they happened).
The phylogeny of organisms is often depicted as a phylogenetic tree, and
there is a considerable literature on how best to construct such trees.
Most methods take as input the sequences of a single gene or protein.
That is, the same gene is identified in all the species of interest, and its sequences are then compared to build
the tree.
One problem with this approach is that it assumes that the gene is "typical"
and that evolutionary pressures have acted in the same way across all the
species to shape that gene.
A second problem is finding a gene that is both ubiquitous and functionally conserved,
yet variable enough to differentiate the species
that possess it.
In this project you will create an application that extracts the data generated by
an existing genome analysis application
as it traverses whole bacterial chromosomes, and then, after normalising the elements
of the data vectors, you will try different methods for building phylogenetic trees from
the data.
In other words, rather than trying to find the ideal gene around which to build a tree,
this method will use all the data available in chromosomes or, by extension, entire
genomes.
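The normalise-then-build-a-tree pipeline described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's method: each species is assumed to be summarised by one numeric vector of equal length (the hypothetical `data` values stand in for the output of the genome analysis application), each vector element is z-score normalised across species, distances are Euclidean, and the tree is built with UPGMA, which is only one of the candidate tree-building methods the project would compare.

```python
import math
from itertools import combinations


def zscore_columns(vectors):
    """Normalise each vector element (column) across species to mean 0, SD 1."""
    names = list(vectors)
    n_cols = len(vectors[names[0]])
    normed = {name: [] for name in names}
    for j in range(n_cols):
        col = [vectors[name][j] for name in names]
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        for name in names:
            # A constant column cannot separate species, so map it to 0.
            normed[name].append((vectors[name][j] - mean) / sd if sd > 0 else 0.0)
    return normed


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma(vectors):
    """Cluster species by UPGMA and return the topology as a Newick string."""
    # Clusters are keyed by their Newick label; size counts the leaves in each.
    size = {name: 1 for name in vectors}
    dist = {}
    for a, b in combinations(vectors, 2):
        dist[(a, b)] = dist[(b, a)] = euclidean(vectors[a], vectors[b])
    while len(size) > 1:
        # Merge the closest pair of current clusters (ties broken by label order).
        a, b = min(combinations(sorted(size), 2), key=lambda pair: dist[pair])
        merged = "(" + ",".join(sorted([a, b])) + ")"
        size[merged] = size[a] + size[b]
        for c in size:
            if c in (a, b, merged):
                continue
            # UPGMA: leaf-count-weighted average of the merged clusters' distances.
            d = (size[a] * dist[(a, c)] + size[b] * dist[(b, c)]) / size[merged]
            dist[(merged, c)] = dist[(c, merged)] = d
        del size[a], size[b]
    return next(iter(size)) + ";"


# Hypothetical per-species data vectors (one value per extracted feature).
data = {
    "A": [10.0, 0.50, 3.0],
    "B": [11.0, 0.52, 3.1],
    "C": [30.0, 0.10, 9.0],
    "D": [29.0, 0.12, 8.7],
}
print(upgma(zscore_columns(data)))
```

Swapping `upgma` for neighbour-joining, or `euclidean` for another distance, is the kind of comparison the project calls for; UPGMA is used here only because it is short, and it assumes a constant rate of change along all branches.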
- Computing the Genetic Basis for the Sit-and-Wait Hypothesis of Bacterial Pathogenicity
The Sit-and-Wait Hypothesis (Walther & Ewald 2004) relates the environmental durability of
non-vector-borne microbial pathogens to their pathogenicity.
Non-vector-borne pathogens, such as E. coli, are passed directly from host to host,
rather than being transmitted by another vector species, such as mosquitoes.
(For example, Ross River fever, seen around Perth, is mosquito-borne, while influenza is
transmitted from person to person. Both, however, are caused by viruses.)
In the hypothesis, durability (the ability to survive the stresses associated with
existing for a period outside a host) is, in effect, a cofactor for pathogenicity,
acting in concert with the necessary presence of virulence factors.
That is, without an assortment of virulence factors a microorganism is unable to
colonise a host; but if the microorganism is labile, i.e. not able to withstand
desiccation, the pathogen cannot afford to be too virulent, because a host made too sick to move
cannot spread the infection, and the infection will die
out.
GeneOntology (Gene Ontology Consortium 2000) is a set of three ontologies specific to molecular
biology, covering molecular function, biological process and cellular component.
The project will involve mapping GeneOntology (GO) terms to genes in bacterial genomes.
This will be done either directly, from the annotations attached to the genes when the
genomes were annotated, or indirectly, via text mining for
keywords extracted from those annotations using techniques similar to those employed in
the Protein Annotators Assistant (Wise 2000).
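The indirect, keyword-based route might look something like the following sketch. It is far simpler than the information-retrieval techniques used in the Protein Annotators Assistant: it only looks for whole-word, case-insensitive keyword matches in each gene's annotation text. The keyword-to-GO table, gene identifiers and annotation strings are all hypothetical; in the project the keywords would be extracted from the annotations themselves.

```python
import re

# Hypothetical keyword table; a real one would be derived from the annotation
# corpus and the GO term names, not written by hand.
KEYWORD_TO_GO = {
    "cold shock": "GO:0009409",    # response to cold
    "desiccation": "GO:0009414",   # response to water deprivation
    "hyperosmotic": "GO:0042538",  # hyperosmotic salinity response
}

# One case-insensitive, whole-word pattern per keyword.
PATTERNS = {
    kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
    for kw in KEYWORD_TO_GO
}


def map_annotations_to_go(annotations):
    """Return {gene_id: set of GO IDs} for genes whose annotation matches a keyword."""
    hits = {}
    for gene, text in annotations.items():
        terms = {KEYWORD_TO_GO[kw] for kw, pat in PATTERNS.items() if pat.search(text)}
        if terms:
            hits[gene] = terms
    return hits


# Hypothetical gene IDs and annotation lines.
annotations = {
    "gene0001": "Cold shock protein CspA",
    "gene0002": "ABC transporter, ATP-binding protein",
    "gene0003": "Desiccation tolerance protein; induced under hyperosmotic stress",
}
print(map_annotations_to_go(annotations))
```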
A third track will associate genes with particular protein families on the basis of
hidden Markov model searches.
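HMM-based family assignment rests on scoring a sequence under a model of the family and comparing that score with a background model. The sketch below shows only that core idea, using the forward algorithm on a hypothetical two-state toy HMM over a two-letter alphabet (`h` for hydrophobic and `p` for polar residues). Real protein-family searches use profile HMMs with match, insert and delete states built from multiple sequence alignments (as in tools such as HMMER), so the model, alphabet and parameters here are illustrative assumptions only.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(seq, start, trans, emit):
    """Forward algorithm: log P(seq | HMM), summed over all hidden state paths."""
    states = list(start)
    # alpha[s] = log P(symbols so far, current hidden state is s)
    alpha = {s: math.log(start[s] * emit[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: logsumexp([alpha[r] + math.log(trans[r][s]) for r in states])
            + math.log(emit[s][symbol])
            for s in states
        }
    return logsumexp(list(alpha.values()))


def log_odds_bits(seq, start, trans, emit, background):
    """Family-model score relative to an i.i.d. background model, in bits."""
    log_bg = sum(math.log(background[x]) for x in seq)
    return (forward_log_likelihood(seq, start, trans, emit) - log_bg) / math.log(2)


# Hypothetical toy "family" model: residues tend to alternate between a
# hydrophobic-preferring state and a polar-preferring state.
START = {"core": 0.5, "loop": 0.5}
TRANS = {"core": {"core": 0.1, "loop": 0.9},
         "loop": {"core": 0.9, "loop": 0.1}}
EMIT = {"core": {"h": 0.9, "p": 0.1},
        "loop": {"h": 0.1, "p": 0.9}}
BACKGROUND = {"h": 0.5, "p": 0.5}

for seq in ("hphphp", "hhhhhh"):
    print(seq, round(log_odds_bits(seq, START, TRANS, EMIT, BACKGROUND), 2))
```

A positive log-odds score means the sequence is better explained by the family model than by the background; a search would rank genes by such scores and apply a threshold, whose choice is left open here.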
Term extraction will focus on the
GeneOntology biological process terms related to tolerance of abiotic stress. Examples might include
GO:0042538 (hyperosmotic salinity response), GO:0009414 (response to water deprivation) and
GO:0009409 (response to cold). In the hierarchical GeneOntology system, all these terms fall under
the parent terms GO:0009628 (response to abiotic stimulus) and GO:0006950 (response to stress).
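Selecting genes whose GO annotations fall under these parent terms amounts to walking up the GO graph from each annotated term. A minimal sketch follows. The `PARENTS` map is a deliberately simplified, hypothetical excerpt: real GO is a directed acyclic graph with intermediate terms between these child and parent terms, and both parent terms have parents of their own. Requiring a term to lie under both GO:0009628 and GO:0006950 is likewise an assumption about the selection rule.

```python
from collections import deque

# Simplified, hypothetical excerpt of the GO biological-process graph:
# term -> direct parent terms. Intermediate terms are omitted, and the
# two parent terms' own parents are truncated.
PARENTS = {
    "GO:0042538": ["GO:0009628", "GO:0006950"],  # hyperosmotic salinity response
    "GO:0009414": ["GO:0009628", "GO:0006950"],  # response to water deprivation
    "GO:0009409": ["GO:0009628", "GO:0006950"],  # response to cold
    "GO:0009628": [],                            # response to abiotic stimulus
    "GO:0006950": [],                            # response to stress
}


def ancestors(term, parents):
    """All terms reachable by following parent links, including the term itself."""
    seen = {term}
    queue = deque([term])
    while queue:
        # Terms missing from the map are treated as having no parents.
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def stress_tolerance_genes(gene_terms, parents, required=("GO:0009628", "GO:0006950")):
    """Genes with at least one GO term lying under every required parent term."""
    required = set(required)
    return {
        gene
        for gene, terms in gene_terms.items()
        if any(required <= ancestors(term, parents) for term in terms)
    }


# Hypothetical gene-to-GO annotations.
gene_terms = {
    "gene0001": {"GO:0009409"},
    "gene0002": {"GO:0006950"},
    "gene0003": set(),
}
print(sorted(stress_tolerance_genes(gene_terms, PARENTS)))
```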
The sets of genes related to tolerance of abiotic stress from the different bacterial species will then
be correlated with information from the literature about the durability and known pathology of each species,
much as was reported in Walther and Ewald (2004), except that in this case durability will be marked
by the presence of the genes thought most closely related to tolerance of abiotic stress.
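The correlation step could be as simple as a rank correlation between each species' count of stress-tolerance genes and a literature-derived virulence score. The sketch below uses Spearman's rank correlation, with tied values given averaged ranks; the choice of statistic and the numbers in the usage example are hypothetical and are not taken from Walther and Ewald (2004).

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-species values, one entry per species.
stress_genes = [12, 30, 25, 8, 41]      # count of genes mapped to abiotic-stress GO terms
virulence = [0.1, 0.6, 0.4, 0.05, 0.9]  # literature-derived virulence score
print(spearman(stress_genes, virulence))
```

A rank correlation is used here on the assumption that virulence scores gathered from the literature are unlikely to share a common linear scale, so only their ordering is relied on.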
- Gene Ontology Consortium (2000), "Gene Ontology: tool for the unification of biology", Nature Genetics 25:25-29.
- Walther B.A. and Ewald P.W. (2004), "Pathogen Survival in the External Environment and the Evolution of Virulence", Biological Reviews 79:849-869.
- Wise M.J. (2000), "Protein Annotators Assistant: A Novel Application of Information Retrieval Techniques", Journal of the American Society for Information Science (JASIS) 51(12):1131-1136, John Wiley.
Background Reading
Tutorials aimed at computer scientists can be found at: