CSU EAST BAY
DEPARTMENT OF MATHEMATICS AND
COMPUTER SCIENCE
THESIS PRESENTATION
Friday, August 26, 2005; Noon; Sc S105C
Speaker: Monica Jain
Computational Algorithms for the EST Clustering Problem to Identify Unique Genes
ESTs (Expressed Sequence Tags) are short sequences derived from cloned DNA molecules that represent the transcribed regions of a genome. ESTs contain sequences of the coding regions and of the 5' and 3' transcribed noncoding regions of genes. Assembling EST sequences into clusters is important for discovering the structure and function of the EST-associated genes. Algorithms for assembling EST sequences must be specific enough to ensure that only sequences from a single gene are clustered together, while remaining computationally feasible. Different clusters derived from sequences of the same gene can then be grouped and used to help define gene function, to identify elements that can be exploited for the development of DNA microarrays, and to facilitate whole-genome assembly. Developing efficient EST clustering algorithms is critical to the future of biology.
In this thesis, I developed an efficient algorithm for large-scale clustering of EST sequences derived from the unicellular green alga Chlamydomonas reinhardtii. C. reinhardtii is a model organism used by a large and active community of researchers who study processes such as photosynthesis and motility. A tremendous amount of EST sequence information for this organism is available in various public databases. Having developed the algorithm to form sequence clusters from the available EST data, I was able to assemble each cluster into either full-length cDNAs or contiguous stretches of cDNA (contigs) that represent unique genes. Although my protocol was developed for and tailored to a dataset derived from C. reinhardtii, it could, in principle, be used to assemble sequences derived from any eukaryotic organism.
Pizza and soda will be served to those attending!