Age of Genes: Looking for Common Ancestors
Daniel Greengard, Horace Greeley High School, Chappaqua; Dmitry Mozzherin and Moises Eisenberg, Departments of Pharmacology and Medical Informatics, Stony Brook University
We can only look at the genetic code of present-day species. After we have decoded and identified each gene from each species, how can we determine how old each gene is? To get closer to answering this question, we first need a reliable set of complete genomes from a large and diverse set of species. We then have to compare every gene in one species to every gene in each of the other species, to establish proximities and homologies.

To organize all this information in a way that allows comparisons to be made, we needed to put all of the data into a common database and follow a systematic nomenclature and format. We chose MySQL as the database platform and the simple "FASTA" format to describe the amino acid sequences of all genes in the various genomes. All programming to generate the organized data was done in Python. This portion of the project has been completed and can now be used to update the data continually and extend it to more species.

The following species were selected for complete genomes: Homo sapiens (human), Caenorhabditis elegans (nematode worm), Drosophila melanogaster (fruit fly), Fugu rubripes (blowfish), Saccharomyces cerevisiae (baker's yeast), Mus musculus (house mouse), Arabidopsis thaliana (small flowering plant), and Xenopus laevis (frog). Data was obtained from the Swiss-Prot, TrEMBL, and TrEMBL New databases, which are publicly available on the internet.

To rank gene pairs from two species, we computed the values of "score", "identity", "total identity", "similarity", and "total similarity", which are taken from the Smith-Waterman set of gene comparison algorithms. The implementation of these algorithms that we used is the one developed by D. Mozzherin at Stony Brook University. This process paves the way for all comparisons of complete genomes.
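
As an illustration of the comparison step described above, the following is a minimal Python sketch, not the Stony Brook implementation. It reads protein sequences in FASTA format and computes a Smith-Waterman local alignment score together with the number of identical residues in the best alignment. The function names (parse_fasta, smith_waterman) and the scoring parameters (match, mismatch, and a linear gap penalty) are assumptions chosen for brevity. A production comparison would normally use an amino acid substitution matrix, which is also what a "similarity" value requires, so similarity is left out here.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, identities, aligned_length) for the best local alignment.

    Scoring is a simplified assumption: fixed match/mismatch values and a
    linear gap penalty instead of a substitution matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; local alignment never lets a cell drop below 0.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score reaches 0.
    i, j = best_pos
    identities = 0
    length = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            if a[i - 1] == b[j - 1]:
                identities += 1
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return best, identities, length


if __name__ == "__main__":
    # Hypothetical two-record FASTA input, for demonstration only.
    genes = parse_fasta(">gene_a\nGGMKTAWWW\n>gene_b\nMKTA\n")
    score, identities, length = smith_waterman(genes["gene_a"], genes["gene_b"])
    print(score, identities, length)
```

Dividing identities by the aligned length gives a fractional identity for the aligned region. Dividing by the full length of one of the sequences instead would give a whole-sequence figure, which may be what "total identity" refers to, though the text above does not define it.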