Age of Genes; Looking for Common Ancestors
Daniel Greengard, Horace Greeley High School, Chappaqua; Dmitry Mozzherin and Moises Eisenberg, Departments of Pharmacology and Medical Informatics, Stony Brook University

We can only examine the genetic code of species living today. Once every gene in each species has been decoded and identified, how can we determine how old each gene is? To approach this question, we first need a reliable set of complete genomes from a large and diverse set of species. We then have to compare every gene in each species with every gene in every other species to establish sequence similarities and homologies.
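The all-against-all comparison plan can be sketched in a few lines of Python. This is a minimal illustration, assuming each genome is held as a dictionary mapping a species name to a list of gene identifiers (the identifiers below are hypothetical). Each unordered pair of species is visited once, which assumes the comparison score is symmetric. The pair count shows how quickly the work grows: it is the sum of |A| × |B| over every pair of species A, B.

```python
from itertools import combinations, product
from typing import Dict, Iterator, List, Tuple

# Hypothetical layout: species name -> list of gene identifiers.
Genomes = Dict[str, List[str]]


def cross_species_pairs(genomes: Genomes) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (species_a, gene_a, species_b, gene_b) for every gene pair
    taken from two different species, visiting each species pair once."""
    for sp_a, sp_b in combinations(sorted(genomes), 2):
        for gene_a, gene_b in product(genomes[sp_a], genomes[sp_b]):
            yield sp_a, gene_a, sp_b, gene_b


def count_pairs(genomes: Genomes) -> int:
    """Number of comparisons: sum of |A| * |B| over all species pairs."""
    sizes = [len(genes) for genes in genomes.values()]
    return sum(a * b for a, b in combinations(sizes, 2))


toy = {
    "Homo sapiens": ["h1", "h2"],
    "Saccharomyces cerevisiae": ["y1"],
    "Drosophila melanogaster": ["d1", "d2", "d3"],
}
print(count_pairs(toy))  # 2*1 + 2*3 + 1*3 = 11
```

With real genomes of tens of thousands of genes each, the count reaches billions of alignments, which is why the comparisons have to be organized and stored systematically.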

To organize this information so that comparisons could be made, we put all of the data into a common database with a systematic nomenclature and format. We chose MySQL as the database platform and the simple FASTA format to represent the amino acid sequences encoded by all genes in the various genomes. All programming to generate the organized data was done in Python. This part of the project is complete, and the database can now be updated continually and extended to more species.
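The FASTA-to-database import step might look like the sketch below. It is a minimal illustration, not the project's actual scripts: it uses Python's standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a database server, and the `protein` table and its columns are hypothetical. A FASTA record is a header line starting with `>` followed by one or more sequence lines, which the parser joins into a single string.

```python
import io
import sqlite3
from typing import Iterator, TextIO, Tuple


def parse_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def load_genome(conn: sqlite3.Connection, species: str, handle: TextIO) -> None:
    """Insert every record of one genome into a hypothetical `protein` table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS protein (
               species  TEXT NOT NULL,
               header   TEXT NOT NULL,
               sequence TEXT NOT NULL)"""
    )
    conn.executemany(
        "INSERT INTO protein (species, header, sequence) VALUES (?, ?, ?)",
        ((species, h, s) for h, s in parse_fasta(handle)),
    )
    conn.commit()


if __name__ == "__main__":
    fasta = io.StringIO(">P1 example protein\nMKTAYIAK\nQRQISFVK\n>P2 second\nMSLEQ\n")
    conn = sqlite3.connect(":memory:")
    load_genome(conn, "Homo sapiens", fasta)
    rows = conn.execute(
        "SELECT header, length(sequence) FROM protein ORDER BY rowid"
    ).fetchall()
    print(rows)  # [('P1 example protein', 16), ('P2 second', 5)]
```

Keeping the species name in every row lets new genomes be appended to the same table without changing the schema.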

The following species were selected for their complete genomes: Homo sapiens (human), Caenorhabditis elegans (roundworm), Drosophila melanogaster (fruit fly), Fugu rubripes (blowfish), Saccharomyces cerevisiae (baker's yeast), Mus musculus (house mouse), Arabidopsis thaliana (small flowering plant), and Xenopus laevis (frog).

Data were obtained from the Swiss-Prot, TrEMBL, and TrEMBL New databases, which are publicly available on the internet.

To rank gene pairs from two species, we computed the values "score", "identity", "total identity", "similarity", and "total similarity", which are taken from the Smith-Waterman family of sequence comparison algorithms. We used an implementation of these algorithms developed by D. Mozzherin at Stony Brook University. This process paves the way for systematic comparisons of complete genomes.
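For illustration, here is a minimal Python sketch of Smith-Waterman local alignment with a linear gap penalty, followed by one plausible way to derive the four ratios. It is not D. Mozzherin's implementation. The scoring parameters (`match=2`, `mismatch=-1`, `gap=-2`), the amino acid similarity groups, and the definitions of "total identity" and "total similarity" (counts divided by the longer full-length sequence instead of the alignment length) are all assumptions; practical tools score with a substitution matrix such as BLOSUM62 and use affine gap penalties.

```python
from typing import Dict, Tuple

# Assumed groups of chemically similar amino acids (illustrative only).
SIMILAR_GROUPS = ("ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG")


def is_similar(a: str, b: str) -> bool:
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def smith_waterman(s: str, t: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment score plus the two aligned strings."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    H = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the best cell until a zero cell ends the local alignment.
    a, b, i, j = [], [], bi, bj
    while H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(s[i - 1], t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))


def alignment_stats(s: str, t: str, aln_s: str, aln_t: str) -> Dict[str, float]:
    """Identity and similarity ratios; the 'total' variants divide by the
    longer full-length sequence (an assumed definition)."""
    pairs = list(zip(aln_s, aln_t))
    ident = sum(1 for x, y in pairs if x == y and x != "-")
    simil = sum(1 for x, y in pairs if "-" not in (x, y) and is_similar(x, y))
    aln_len = max(len(pairs), 1)  # avoid dividing by zero for empty alignments
    full_len = max(len(s), len(t), 1)
    return {
        "identity": ident / aln_len,
        "total identity": ident / full_len,
        "similarity": simil / aln_len,
        "total similarity": simil / full_len,
    }


score, aln_s, aln_t = smith_waterman("MKTAYIAK", "MKTAFIAK")
print(score, aln_s, aln_t)  # 13 MKTAYIAK MKTAFIAK
print(alignment_stats("MKTAYIAK", "MKTAFIAK", aln_s, aln_t))
```

In this toy example the Y/F mismatch lowers identity to 7/8, while similarity stays at 1.0 because Y and F fall in the same assumed group.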

Beyond the research itself, my fellowship at Stony Brook University taught me a great deal. I learned new programming techniques and concepts, learned the SQL and Python languages, and became familiar with UNIX. My research contributions included the following: I helped debug the original comparison program; I analyzed existing data using SQL; and I helped write Python scripts to import data from FASTA files into the MySQL database. In addition, Dr. Alex Backer of the California Institute of Technology, who is studying the age of genes, gave me 40 human genes (the 30 with the highest connectivities and the 10 with the lowest), which I compared against each of the genomes listed above.
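The kind of SQL analysis described above can be illustrated with a query that keeps, for every query gene, the best-scoring target in each species. This is a hypothetical sketch: the `comparison` table, its columns, and the gene names are invented for the example, and `sqlite3` again stands in for MySQL. The query uses only a grouped subquery and a join, which MySQL also supports.

```python
import sqlite3


def best_hits(conn: sqlite3.Connection):
    """For each (query gene, species), return the row(s) with the top score.
    Ties produce more than one row for the same query gene and species."""
    return conn.execute(
        """
        SELECT c.query_gene, c.species, c.target_gene, c.score
        FROM comparison AS c
        JOIN (SELECT query_gene, species, MAX(score) AS best
              FROM comparison
              GROUP BY query_gene, species) AS m
          ON c.query_gene = m.query_gene
         AND c.species = m.species
         AND c.score = m.best
        ORDER BY c.query_gene, c.species
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comparison (query_gene TEXT, species TEXT, target_gene TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO comparison VALUES (?, ?, ?, ?)",
    [
        ("Q1", "Mus musculus", "m1", 120),
        ("Q1", "Mus musculus", "m2", 80),
        ("Q1", "Drosophila melanogaster", "d1", 40),
        ("Q2", "Mus musculus", "m3", 60),
    ],
)
for row in best_hits(conn):
    print(row)
```

Comparing the best score per species for each of the 40 human genes is one simple way to see in which genomes a gene has a close counterpart.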
