NYCCS/Computer Science Seminar

Workshop on Massively Parallel Computing, October 14, 2008

Math Tower - Room S-240

Carlos Simmerling, SBU - Functional Motions of Proteins Studied through Molecular Dynamics

Experimental methods have been highly successful in determining
3-dimensional biomolecular structures. However, most approaches
provide only time- and ensemble-averaged data, making it much more
difficult to study the dynamic and energetic aspects of biological
systems. Atomic-resolution simulations are highly complementary to
experiments, and can provide data with unparalleled resolution in time
and space. Increasingly powerful supercomputers have enabled atomistic
simulations to address many important questions. However, the current
hardware trend is toward massively parallel architectures, and
traditional "brute-force" molecular dynamics simulations are unable to
scale well to the tens of thousands of processors on these machines.
This seminar will present our recent work in biomolecular simulation
algorithms, including the coupling of multiple simulations that share
information to significantly improve efficiency as compared to
uncoupled simulations. Applications include the study of protein
structure and stability as well as the modeling of slow yet
functionally important dynamics that have been inaccessible to
traditional simulations.
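
One common way to couple simulations so that they share information is replica exchange, in which copies of the system run at different temperatures and periodically attempt to swap configurations. The abstract does not name the coupling scheme, so the following is only a minimal sketch under that assumption. It uses a toy one-dimensional double-well potential and Metropolis Monte Carlo moves in place of molecular dynamics; all function names and parameters are illustrative.

```python
import math
import random


def energy(x):
    """Toy double-well potential with minima at x = -1 and x = +1 (assumed)."""
    return (x * x - 1.0) ** 2


def swap_probability(beta_i, beta_j, e_i, e_j):
    """Metropolis acceptance probability for exchanging two replicas."""
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def replica_exchange(temperatures, sweeps=2000, steps_per_sweep=10, seed=0):
    """Run coupled replicas; return final positions and swap statistics."""
    rng = random.Random(seed)
    betas = [1.0 / t for t in temperatures]
    xs = [-1.0] * len(temperatures)  # every replica starts in the left well
    accepted = 0
    attempted = 0
    for sweep in range(sweeps):
        # Uncoupled part: each replica takes local Metropolis steps.
        for k, beta in enumerate(betas):
            for _ in range(steps_per_sweep):
                trial = xs[k] + rng.uniform(-0.2, 0.2)
                d_e = energy(trial) - energy(xs[k])
                if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
                    xs[k] = trial
        # Coupled part: neighboring replicas attempt to swap configurations,
        # alternating between even and odd pairs on successive sweeps.
        for k in range(sweep % 2, len(betas) - 1, 2):
            attempted += 1
            p = swap_probability(betas[k], betas[k + 1],
                                 energy(xs[k]), energy(xs[k + 1]))
            if rng.random() < p:
                xs[k], xs[k + 1] = xs[k + 1], xs[k]
                accepted += 1
    return xs, accepted, attempted


positions, n_accepted, n_attempted = replica_exchange([0.05, 0.1, 0.2, 0.4])
```

The swap step is the only communication between replicas, which is why schemes of this kind map naturally onto many processors: each replica can run on its own group of processors and exchange only energies and configurations.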

Jeremy Rice, IBM - Cardiac Modeling

High performance computing in multiscale cardiac modeling:  Bridging proteins to cells to whole heart

Abstract

The talk will cover current methods to simulate cardiac tissue at multiple spatial and temporal scales.  The first part of the talk will describe a mechanistic model to understand the fundamental properties of the contractile proteins in cardiac cells.  The model is computationally expensive and requires a supercomputer to simulate sub-cellular structures.  While useful for addressing biophysical questions, such detailed models are impractical for modeling the billions of cells that comprise a whole human heart.  The second part of the talk will cover efforts to bridge from cells to large organ-level anatomical structures with practical run times.  Specifically, a method is developed to distribute the computational workload over 8196 computational nodes on an IBM Blue Gene system.  By using optimal recursive bisection, the anatomical data of the Visible Men ventricles is distributed to the computational nodes.  The segments are created with an overlap region so that spatial derivatives can be computed.  The anatomical data dictated up to 1.44 billion elements with a size of 0.1 mm, a spatial increment that approximates real cell dimensions.  This decomposition method represents an increase of roughly 2 orders of magnitude over existing published methods, which are limited to hundreds of computational cores.  Small numbers of computational cores limit spatial resolution or produce impractical runtimes.  Hence, the results show that appropriate decomposition methods for large numbers of cores can make whole heart simulations practical with high spatial resolution and reasonable runtimes.  The goal of this work is to foster wider use of cardiac models for research and clinical applications.
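
The idea of distributing irregular anatomical data by recursive bisection can be illustrated with a small sketch. The version below is a plain coordinate bisection that splits at the median along the longest axis; it is an assumption for illustration, not the speaker's method, and it omits the overlap regions mentioned above. The spherical "tissue" point set is a hypothetical stand-in for real anatomical data.

```python
def recursive_bisection(points, levels):
    """Split points into up to 2**levels parts of near-equal size.

    At each level the points are cut at the median along the axis with
    the largest extent, so irregular shapes still yield balanced parts.
    """
    if levels == 0 or len(points) <= 1:
        return [points]
    dims = len(points[0])
    extents = [max(p[d] for p in points) - min(p[d] for p in points)
               for d in range(dims)]
    axis = extents.index(max(extents))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (recursive_bisection(ordered[:mid], levels - 1)
            + recursive_bisection(ordered[mid:], levels - 1))


# Hypothetical irregular domain: grid elements inside a sphere.
tissue = [(x, y, z)
          for x in range(20) for y in range(20) for z in range(20)
          if (x - 10) ** 2 + (y - 10) ** 2 + (z - 10) ** 2 <= 81]

parts = recursive_bisection(tissue, 3)   # 2**3 = 8 "nodes"
sizes = [len(part) for part in parts]
```

Because every cut is made at the median, the element counts per part differ by at most one, which is the load-balancing property that makes this family of methods attractive when the node count is large.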

 

John Reinitz, SBU - "From Data to Dynamical Systems via Parallel Computing"

Abstract:

I will present a case study of how parallel computing can be used to
solve an important scientific problem. Animals develop from embryos
that self-organize their body patterns in the presence of noise, which
arises from molecular fluctuations and other sources. In order to
produce a viable organism, the variance associated with this noise
must be reduced, a process called "canalization", which was first
predicted in 1942 by C. H. Waddington. Using the early embryo of the
fruit fly Drosophila as an example, I will show how dynamical
interactions between genes reduce positional variance of gene
expression in the early embryo. The model takes the form of a
deterministic dynamical system with fluctuating maternal inputs, and
it is well supported by confirmed predictions of increased variance in
certain double mutants. Finally, I will show how the reduction in
variance can be understood in terms of trajectories in state space
which are governed by point attractors in the anterior of the embryo
and an attracting manifold in the posterior.

All of these biological results were obtained through high
performance parallel computing. Obtaining the dynamical model required
solving an inverse problem in which the 48 parameters of a system of
232 simultaneous nonlinear ODEs were determined from about 2100
observations of gene expression over time. This difficult numerical
problem requires about 50 CPU-days of computation and is best solved
in parallel. This is done by means of parallel simulated annealing,
using an algorithm devised in collaboration with Y. Deng and
K.-W. Chu. I will describe this algorithm as well as its strengths and
limitations on various parallel architectures, with emphasis on the
prospects for highly scalable speedup on BlueGene and other systems
with fast and balanced communication.
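
To make the inverse problem concrete, here is a minimal serial sketch of simulated annealing used to recover the parameters of an ODE from observations. It is not the parallel algorithm described in the talk. The single equation dx/dt = a - b*x, the forward Euler integrator, the cooling schedule, and all names are assumptions chosen to keep the example small.

```python
import math
import random


def simulate(a, b, x0=0.0, dt=0.01, steps=500, every=50):
    """Integrate dx/dt = a - b*x with forward Euler; return sampled values."""
    x = x0
    samples = []
    for i in range(1, steps + 1):
        x += dt * (a - b * x)
        if i % every == 0:
            samples.append(x)
    return samples


def cost(params, observed):
    """Sum of squared differences between model output and observations."""
    a, b = params
    model = simulate(a, b)
    return sum((m - o) ** 2 for m, o in zip(model, observed))


def anneal(observed, start=(1.0, 1.0), t0=1.0, cooling=0.995,
           iters=4000, seed=1):
    """Serial simulated annealing over the two parameters (a, b)."""
    rng = random.Random(seed)
    current = list(start)
    current_cost = cost(current, observed)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(iters):
        # Propose a random perturbation, keeping parameters non-negative.
        trial = [max(0.0, p + rng.gauss(0.0, 0.1)) for p in current]
        trial_cost = cost(trial, observed)
        delta = trial_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann odds.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = trial, trial_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temp *= cooling
    return best, best_cost


# Synthetic "observations" generated from assumed true parameters.
observed = simulate(2.0, 0.5)
fitted, fitted_cost = anneal(observed)
```

Each cost evaluation requires integrating the full model, so for a system of hundreds of equations the evaluations dominate the run time, which is what motivates running the annealing in parallel.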

 

Chad Peck, IBM - Brain Simulation and Imaging with BlueGene/P

High Performance Computing for Medical Imaging and Neuroscience Research

Abstract:

By reducing or eliminating computational barriers in medical imaging and neuroscience research, high performance computing expands the scope of intellectual exploration and the range of possible solutions.  This talk presents IBM research in medical imaging that exploits high performance computing for improved data acquisition and analysis.  It also presents our research into large-scale neural systems modeling.

 

I-Hsin Chung, IBM - Performance Tools for Parallel Programming

Next Generation Application Enablement Tools

Abstract:  In the presentation, I will first give an overview of the IBM High Performance Computing Toolkit, which collects performance data from various system and programming “dimensions” (e.g., CPU, memory, message passing, threads, I/O...).  It provides an excellent starting point for programmers to understand the performance behavior of their applications.  The second part of the presentation describes an ongoing project that responds to the productivity challenge of the U.S. DARPA High Productivity Computing Systems (HPCS) initiative.  We have developed a framework that provides a simple and pain-free interface through which scientists can collect and query rich performance data during application execution and analyze the application's performance by evaluating this data against predefined bottleneck signatures.  The framework helps to make sense of the performance data collected and to automate the performance tuning process.

 

Carlos Sosa, IBM - Case Study: OpenMP/Hybrid Programs on BG/P

Introduction to OpenMP on Blue Gene

Abstract:

In this session we provide an introduction to OpenMP.  This parallel paradigm is supported on the Blue Gene/P system for shared-memory parallel programming in C/C++ and Fortran.  It has been jointly defined by a group of hardware and software vendors and has evolved into a standard for shared-memory parallel programming.

OpenMP consists of a collection of compiler directives and a library of functions that can be invoked within the application. This combination provides a paradigm for developing parallel programs on shared-memory architectures. In the case of the Blue Gene/P system, it allows the user to exploit the SMP mode on each compute node. Multi-threading is now enabled on the Blue Gene/P system. Using OpenMP, the user can have access to data parallelism as well as functional parallelism.

 

Tulin Kaman, SBU - Performance Modeling with TAU

Performance Modeling with TAU

Abstract:

We discuss performance modeling and also a programming model for hybrid (threads + MPI) architectures.

Computational resources are under-used due to performance bottlenecks.  Performance tuning tools are essential to identify and eliminate the bottlenecks.  A variety of profiling and execution analysis tools exist for parallel programs.  TAU (Tuning and Analysis Utilities) is a toolkit for the performance analysis and tuning of distributed and multi-threaded programs.  TAU can be used by New York Blue users writing a C, C++, or Fortran application who want to understand where the performance bottlenecks are.  The TAU performance evaluation tool is available for codes which run on IBM Blue Gene L/P platforms.

We analyzed our code running on BG/P with threads enabled but without any modifications to identify thread-safe segments.  The result was a 20% (out of a potential 400%) improvement.
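
One way to read this result is through Amdahl's law, which is not mentioned in the abstract and is used here only as an interpretive aid. The sketch assumes that the "20% improvement" means a speedup of 1.2 and that the potential figure corresponds to 4 threads per node; both readings are assumptions.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_threads)


def implied_parallel_fraction(speedup, n_threads):
    """Invert Amdahl's law: fraction of run time that must have been threaded."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_threads)


# Assumed reading of the reported numbers: speedup 1.2 on 4 threads.
fraction = implied_parallel_fraction(1.2, 4)
```

Under these assumptions, only about two ninths of the run time would have benefited from threads, which is consistent with the need for a programming model that exposes more thread-safe work.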

We also introduce a noninvasive programming model that allows the effective use of multithreading.  It consists of three levels of mesh decomposition: the finest is the computational mesh, the middle is the thread mesh and the coarsest is the MPI mesh.
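
The three-level decomposition can be pictured with a one-dimensional sketch: cells of the computational mesh are grouped into thread blocks, and thread blocks are grouped into MPI blocks. The function names and the contiguous, near-equal splitting rule below are assumptions for illustration; the talk's actual model is not specified here.

```python
def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous, near-equal sub-ranges."""
    base, extra = divmod(stop - start, parts)
    ranges = []
    lo = start
    for i in range(parts):
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def three_level_decomposition(n_cells, n_ranks, n_threads):
    """Map cells -> thread blocks -> MPI blocks for a 1D mesh.

    Returns a dict: MPI rank -> list of (first_cell, last_cell_exclusive)
    ranges, one range per thread on that rank.
    """
    layout = {}
    for rank, (r_lo, r_hi) in enumerate(split_range(0, n_cells, n_ranks)):
        layout[rank] = split_range(r_lo, r_hi, n_threads)
    return layout


# Example: 100 cells, 2 MPI ranks, 4 threads per rank.
layout = three_level_decomposition(100, 2, 4)
```

Keeping the thread mesh nested inside the MPI mesh means that threads on a node work only on cells owned by that node's rank, so existing MPI communication patterns need not change.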

 

Gheorghe Almasi, IBM - The Common PGAS Runtime for IBM's X10, UPC and Co-array Fortran Compilers

The Common PGAS Runtime for IBM's X10, UPC and Co-array Fortran Compilers

Abstract:

The talk presents the design and implementation challenges of the common PGAS runtime that supports three different IBM compilers and languages.  The design goals include interoperability with existing code (MPI and OpenMP), operation in several different programming environments (Blue Gene, System p AIX and Linux, as well as the Barcelona Supercomputer), and support for multiple network transports, including Infiniband and IBM HPS/Federation.  We are also exploring options to expand the limits imposed by PGAS languages by adding collective communication into the mix.