Daniel Bogenhagen
Analysis of CryoEM Images of Human Pol Gamma
The computational task we currently need to resolve is the reconstruction of the
3D structure of several mitochondrial proteins from cryo-electron microscopy
(cryoEM) data. CryoEM allows us to obtain structural information about
large protein molecules and their complexes that are difficult to study
by X-ray diffraction. The initial proteins we are studying include the
mitochondrial DNA polymerase and a replicative helicase.
Large numbers (10,000 to 100,000) of individual 2D cryoEM images of protein
molecules preserved in vitreous ice are collected and used for 3D
reconstruction. Because these images are noisy, and the molecules are
present in multiple orientations that are not known a priori, the 3D
reconstruction is a substantial computational task in its own right.
Individual particle images (~100×100 pixels) are extracted and subjected to
reference-free alignment, which sorts them into classes corresponding to
similar molecular orientations. The resulting classes are used to generate a
starting model, which is then refined over several rounds of iterative
projection matching to yield a 3D structure at 10-20 Å resolution
(sufficient to see the overall shape of the molecule and its general
structural features). This refinement step requires considerable
computational resources. A typical medium-resolution 3D reconstruction deals
with a ~100 GB data set and can take up to several days on a dual-Xeon Linux
machine. Processing full data sets can take weeks and is a rate-limiting
step in our research. We hope that use of the Seawulf cluster will
accelerate this work.
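To make the projection-matching step more concrete, the following is a minimal Python sketch of its core assignment step, under simplifying assumptions: each particle image and each reference projection of the current 3D model is a flattened list of pixel values of equal size, and each particle is simply assigned the orientation of the reference with the highest normalized cross-correlation. The function names and the image representation are hypothetical, and this is not how EMAN implements refinement; a real refinement also searches in-plane rotations and translations, and then reconstructs a new 3D model from the assigned orientations before the next round.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: a particle image or reference projection
# flattened to a 1D list of pixels (a 100x100 image becomes 10,000 values).
Image = Sequence[float]


def normalized_cross_correlation(a: Image, b: Image) -> float:
    """Correlation coefficient between two equally sized images (-1 to 1)."""
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and the same size")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    # A flat (constant) image carries no orientation information.
    return num / den if den > 0 else 0.0


def assign_orientations(particles: List[Image],
                        references: List[Image]) -> List[Tuple[int, float]]:
    """For each particle, return (index of best-matching reference, score)."""
    if not references:
        raise ValueError("need at least one reference projection")
    assignments = []
    for particle in particles:
        # Score this particle against every reference projection.
        scores = [normalized_cross_correlation(particle, ref) for ref in references]
        best = max(range(len(scores)), key=lambda i: scores[i])
        assignments.append((best, scores[best]))
    return assignments


# Toy example: two 4-pixel "reference projections" and two noisy "particles".
references = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
particles = [[0.9, 0.1, 0.0, 0.0], [0.0, 0.2, 0.0, 1.1]]
print([best for best, _ in assign_orientations(particles, references)])  # [0, 1]
```

Because each particle is scored independently against every reference, this step divides naturally across processors, which is one reason we expect a cluster to shorten the refinement.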
The main software we currently use for reconstruction is the EMAN software package (http://ncmi.bcm.tmc.edu/~stevel/EMAN/doc/). EMAN is open source, runs on multiple platforms including Linux, and supports parallel processing. It consists of several programs that can be run separately. Most of them, including all of the time- and resource-consuming programs, run from the command line, which allows us to work via SSH. We may also be interested in using SPIDER (http://www.wadsworth.org/spider_doc/spider/docs/spider.html) as an alternative software package. We do not have a precise estimate of the computer time this project will require, since we have no benchmarks for this software on a cluster. Our current estimate is 20 hours on 4 processors per month for a period of 6 months.
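The time estimate above can be restated as a processor-hour budget. The short Python sketch below assumes that "20 hours on 4 processors" means 20 wall-clock hours per month with all 4 processors in use; under that reading the request comes to 80 processor-hours per month and 480 processor-hours over 6 months. The constant and function names are illustrative only.

```python
# Processor-hour budget for the requested allocation.
# Assumption: "20 hours on 4 processors" = 20 wall-clock hours per month,
# with all 4 processors busy for the whole time.
WALL_HOURS_PER_MONTH = 20
PROCESSORS = 4
MONTHS = 6


def processor_hours(wall_hours: float, processors: int, months: int) -> float:
    """Processor-hours used by `processors` CPUs running `wall_hours` per month."""
    return wall_hours * processors * months


monthly = processor_hours(WALL_HOURS_PER_MONTH, PROCESSORS, 1)
total = processor_hours(WALL_HOURS_PER_MONTH, PROCESSORS, MONTHS)
print(f"{monthly:g} processor-hours per month, {total:g} in total")
# prints "80 processor-hours per month, 480 in total"
```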