Daniel Bogenhagen

Analysis of CryoEM Images of Human Pol Gamma

The computational task we currently need to resolve is the reconstruction of the 3D structures of several mitochondrial proteins from cryo-electron microscopy (cryoEM) data. CryoEM allows us to obtain structural information about large protein molecules and their complexes that are difficult to study by X-ray diffraction. The initial proteins we are studying are the mitochondrial DNA polymerase and a replicative helicase. Large numbers (10,000-100,000) of individual 2D cryoEM images of protein molecules preserved in vitreous ice are collected and used for 3D reconstruction. Because these images are noisy and the molecules are presented in orientations that are not known a priori, the 3D reconstruction is itself a substantial computational task. Individual particle images (~100x100 pixels) are extracted and subjected to reference-free alignment to sort them into classes corresponding to similar molecular orientations. These classes are used to generate a starting model, which is then subjected to several rounds of refinement by iterative projection matching, yielding a 3D structure at 10-20 Å resolution (sufficient to see the overall molecular shape and its general structural features). The latter procedure requires considerable computational resources: a typical medium-resolution 3D reconstruction involves a data set of roughly 100 GB and requires up to several days on a dual-Xeon Linux machine. Processing full data sets can take weeks and is the rate-limiting step in our research. We hope that use of the Seawulf cluster will accelerate this work.
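To make the projection-matching step concrete, the sketch below shows the core of one matching pass in plain Python/NumPy: each experimental particle image is compared against a set of reference projections of the current model and assigned to the one it correlates with best. The function and variable names are ours for illustration only; this is not the EMAN implementation, which additionally handles in-plane alignment, CTF correction, class averaging, and the actual 3D reconstruction.

    # Minimal sketch of one projection-matching pass (illustrative, not EMAN code).
    import numpy as np

    def cross_correlation(a, b):
        """Normalized cross-correlation between two images of equal shape."""
        a = (a - a.mean()) / (a.std() + 1e-12)
        b = (b - b.mean()) / (b.std() + 1e-12)
        return float((a * b).mean())

    def match_projections(images, projections):
        """Assign each 2D particle image to the best-matching reference
        projection; returns (projection index, correlation) per image."""
        assignments = []
        for img in images:
            scores = [cross_correlation(img, proj) for proj in projections]
            best = int(np.argmax(scores))
            assignments.append((best, scores[best]))
        return assignments

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Toy data: 8 reference projections and 20 noisy 100x100 "particles".
        projections = [rng.normal(size=(100, 100)) for _ in range(8)]
        images = [projections[i % 8] + 2.0 * rng.normal(size=(100, 100))
                  for i in range(20)]
        for i, (cls, score) in enumerate(match_projections(images, projections)):
            print(f"particle {i:2d} -> projection {cls}, ccc = {score:.3f}")

In the real refinement, the assigned orientations are used to back-project the particles into a new 3D volume, new reference projections are generated from that volume, and the cycle repeats until the model converges.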

The main software we currently use for reconstruction is the EMAN package (http://ncmi.bcm.tmc.edu/~stevel/EMAN/doc/). It is an open-source program that runs on multiple platforms, including Linux, and supports parallel execution. It consists of several routines that can be run independently; most of them, including all of the time- and resource-consuming programs, run from the command line, which allows us to work remotely over SSH. We may also be interested in using SPIDER as an alternative package (http://www.wadsworth.org/spider_doc/spider/docs/spider.html). We do not yet have a precise estimate of the computer time required for this project, since we have no benchmarks for this software on a cluster. We have estimated a requirement of 20 hr on 4 processors per month for a period of 6 months.
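As a rough summary of the request, the short calculation below converts the figures above into totals. The raw particle-stack size is only a lower bound; treating the remainder of the ~100 GB working set as micrographs and intermediate refinement files is our assumption, not a measured breakdown.

    # Back-of-envelope totals for the numbers quoted above; inputs are
    # illustrative assumptions rather than measured benchmarks.
    n_particles = 100_000      # upper end of the 10,000-100,000 range
    box = 100                  # particle image edge length in pixels
    bytes_per_pixel = 4        # assuming 32-bit floats

    stack_gb = n_particles * box * box * bytes_per_pixel / 1e9
    print(f"raw particle stack alone: ~{stack_gb:.0f} GB "
          "(the ~100 GB working set also holds micrographs and intermediates)")

    # Requested allocation: 20 hr/month on 4 processors for 6 months.
    cpu_hours = 20 * 4 * 6
    print(f"requested allocation: {cpu_hours} processor-hours total")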

 
