COVID-19 Data Commons and Analytic Environment
The Renaissance School of Medicine and the College of Engineering and Applied Sciences have developed a COVID-19 data commons that will support integrated management, query and analysis of clinical, radiology, pathology, spatial and molecular data. The clinical data captures, in a HIPAA-compliant fashion, all available information about COVID-19 patient symptoms, past medical history, family history, clinical course, treatment and response, as well as data elements relating to patient demographics and comorbidities.
The radiology data includes all imaging studies obtained during each patient’s treatment, including CT and chest X-ray data, along with computationally derived data products. Radiology imaging data is extremely important in COVID-19 for both diagnosis and monitoring, because pulmonary disease is central to COVID-19 and its phenotype can change rapidly. COVID-19 can also affect other organs, so we include all radiological studies. Pathology whole-slide imaging data is being generated from COVID-19 patient biopsies and autopsies.
We anticipate that the data commons will help guide molecular studies, so information on clinically validated and experimental biomarkers, together with host and viral genomic, epigenetic, and transcriptomic data, will be incorporated as it becomes available. The integrated use of molecular and imaging methods to understand host/pathogen interaction is particularly important in COVID-19.
The spatial information will encompass the home and work addresses of patients who have been under investigation for COVID-19, whether or not they tested positive. It will also include patients with positive or negative antibody tests. Notably, the data commons will support querying and analyzing spatial patient information while maintaining patient privacy.
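The text does not describe how spatial queries are kept private. As a purely illustrative sketch, and not a description of the data commons itself, one common approach is to report only aggregated counts per grid cell and to suppress cells with too few patients. The function name `aggregate_to_grid`, the cell size, the `min_count` threshold, and the coordinates in the usage note are all hypothetical.

```python
from collections import Counter
from math import floor


def aggregate_to_grid(points, cell_deg=0.01, min_count=5):
    """Count (lat, lon) points per grid cell and suppress small cells.

    Hypothetical illustration only: cell_deg (cell size in degrees) and
    min_count (smallest count that may be reported) are assumed values,
    not parameters taken from the data commons.
    """
    # Map each point to the integer index of the grid cell containing it.
    counts = Counter(
        (floor(lat / cell_deg), floor(lon / cell_deg)) for lat, lon in points
    )
    # Drop cells below the threshold so that no small group is exposed.
    return {cell: n for cell, n in counts.items() if n >= min_count}
```

With synthetic input such as `aggregate_to_grid([(40.9121, -73.1231)] * 5 + [(41.5005, -72.5005)])`, the cell holding five points is reported and the cell holding a single point is withheld. A query layer built this way returns only cell-level counts and never individual addresses.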
The data commons will make these datasets available to all researchers from contributing organizations. The goals are to help guide best clinical practices in complex clinical situations, to support analytic pipelines designed to discover and evaluate biomarkers that predict clinical course and treatment response, and to support pipelines that can predict potential outbreaks and steer prevention efforts. Effective responses to COVID-19 are likely to require the coordination and integration of multiple approaches and multiple data sources. The data commons will also manage the results of these analytic pipelines and make them available to the research and clinical communities.
Stony Brook Medicine is actively using the data commons to help lead a variety of broader national and international COVID-19 clinical and research efforts. We are using the data commons to develop a variety of molecular, imaging and geospatial computational modeling pipelines. In a collaboration among the Renaissance School of Medicine, the College of Engineering and Applied Sciences, and the Institute for Engineering-Driven Medicine, we have launched a variety of statistical and artificial intelligence-based projects to predict outcome, progression and response to treatment in our patient population. This work builds on highly productive research already underway in medical applications of machine learning and artificial intelligence, and it makes integrated use of clinical, imaging, molecular and spatial information.