IT Infrastructure & Informatics: Data Storage, Analysis and Visualization


The amount of data generated by the life sciences is growing at an exponential pace; data coming from next-generation sequencing platforms, together with data generated by imaging efforts, have created a large-scale informatics challenge. IT infrastructure and informatics continue to evolve to keep pace with the storage, analysis, visualization and sharing needs of data being generated at the terabyte level. This track will feature case studies from leading IT, IS and informatics experts in pharmaceutical drug discovery and development, and will discuss the current approaches, platforms and workflows used to handle data on such a large scale.


9:00 Conference Registration and Morning Coffee




9:30 Chairperson’s Opening Remarks

Derek Burke, Director, International Marketing, Panasas

9:35 Next-Generation Sequencing Data: Next-Generation Management Problems

Guy Coates, Ph.D., Informatics System Group, The Wellcome Trust Sanger Institute

Sequencing centers are now collecting and storing large amounts of data from their sequencing machines. However, as datasets get ever larger, simply keeping track of the data becomes a challenge; datasets are lost or needlessly duplicated. The situation is further complicated as datasets become dispersed over collaborative and cloud infrastructures. The talk will explore some of the technologies we are using to help people manage and share their data in the next-gen sequencing era.

10:05 Keeping Pace—Scalable IT Infrastructures to Support Data Intensive Science

Rupert Lueck, Ph.D., Head, IT Services, EMBL Heidelberg

The IT infrastructure required to support science at EMBL is seriously challenged by the enormous amounts of data generated by technologies such as NGS and high-throughput microscopy that are used in large-scale and interdisciplinary systems biology projects. This talk highlights our strategies to ensure a scalable, reliable and cost-efficient IT environment that keeps pace with the rapidly growing demands for high-performance storage and compute power.

10:35 Coffee Break


11:00 Omics Data - Serving Sequencing at the National Level

Francois Artiguenave, Ph.D., Head, Bioinformatics Laboratory, Genoscope

Life science has been profoundly impacted by technological advances allowing faster and cheaper DNA sequencing. While opening a wide range of applications, the latest sequencing platforms have raised new challenges in processing, analysing and interpreting massive data. The growing role of informatics and bioinformatics will be illustrated by providing some figures about genome sequencing and other applications aimed at unraveling biological mechanisms.

11:30 A Novel Web 2.0 Solution to Address the Sequencing Data Analysis Bottleneck

Brigitte Ganter, Ph.D., Director of Product, DNAnexus

DNA sequencing output has completely outstripped the pace of Moore's law, making a data analysis bottleneck inevitable. At the same time, sequencing costs have plummeted, bringing large-scale sequencing projects within the grasp of more institutions. New, easily accessible informatics solutions are needed to enable researchers to fully harness the advancements in DNA sequencing technology for practical use. DNAnexus will present an integrated, web-based sequence analysis platform powered by cloud computing that removes the data analysis bottleneck from next-generation sequencing.

11:45 Enabling Decision Making in Drug Discovery: Knowledge Enrichment Framework

Ilya Mazo, Ph.D., CEO, Ariadne Genomics

Today, bioinformaticians are challenged not only by the large volumes of generated data but also by the need to provide usable knowledge across the different phases of drug discovery and development. Coherently integrating data from different domains (target ID, lead optimization, preclinical, clinical development), which often have nonexistent or quickly changing data models, remains a high priority. To meet this challenge, Ariadne developed a system that capitalizes on knowledge extraction to interpret and integrate information from public and legacy sources, including disease associations, druggability, mechanisms of action and toxicity, and experimental data. The application of this practical knowledge management solution to a variety of areas ranging from pharmacology to oncology and biomarkers will be presented.

12:00 Data Management Challenges in Translational Collaborative Research and Clinical Care at Erasmus MC

Bert Eussen, Clinical Genetics, Erasmus MC
Peter Walgemoed, Director, Carelliance

Translational medicine is driving an evolution in research and personalized patient care, particularly in next-generation sequencing research. This evolution also requires a new approach in the data center to architect new data access and reporting policies, set service levels and manage the explosive data growth associated with this new model for research and clinical care. Erasmus MC, in partnership with HP, is implementing a storage and computational data services concept. They will share their approach to managing the Cell Biology/Clinical Genetics data sprawl and information management challenges.

12:30 Lunch for Purchase in the Exhibit Hall

13:45 Dedicated Poster Viewing in the Exhibit Hall


Storage: Solutions for Massive Sequencing Data Management

14:30 Chairperson’s Remarks

14:35 Featured Presentation: Sequencing Data Storage

Chris Dagdigian, Founding Partner & Director, Technology, BioTeam, Inc.

Next-Generation Sequencing (NGS) instruments are forcing evolutionary and revolutionary changes in research IT architectures & infrastructures. Chemistry and lab protocols are advancing faster than the underlying IT systems and methods, leading to a crisis of capability in many organizations. This presentation will focus on “science-centric storage” for life science informatics, with specific attention on requirements, trends, data management methods and emerging best practices.

15:05 Experiences from the European Bioinformatics Institute’s Data Resources, Storage, and Management

Guy Cochrane, Ph.D., European Nucleotide Archive Team Leader, European Bioinformatics Institute

15:35 Refreshment Break

Sponsored by
Isilon Systems


16:00 How Scale-out Requirements in Life Sciences are Making Obsolete RAID, LUNs, Volumes and the Rest of the Traditional Storage Model
Rob Anderson, Director, Technology & Business Development, EMEA
Many areas of life science research such as genomics, proteomics and high-resolution measurement and modeling involve the generation, accumulation, analysis, and distribution of large amounts of data. These advances are offset by capacity increases in storage technologies that are undergoing their own rapid evolution, such as the increase in hard-drive density. However, storage systems at scale, incorporating these ever-denser disks, myriad processing elements and proliferating gateways, are becoming less and less reliable and manageable, as they try to cope with ever-increasing quantities of data. A differentiated approach is shown which avoids the elements of traditional storage, such as hardware RAID, allowing massive scale-out needs to be served with high levels of availability and performance, and with incredible ease of management, regardless of scale.

16:30 How to Overcome the 100 Miles between Petabases and Petabytes

Jürgen Eils, Bioinformatics Database Group Leader, German Cancer Research Center

Recently, Heidelberg University received a grant to build the largest data storage facility in Germany, at 5-10 petabytes. From a management and logistical perspective, the massive throughput of next-generation sequencing requires new concepts and strategies. One problem is the long-distance transport of data from the sequencing machine to the data storage facility. We will present strategies and concepts, with emphasis on reusability and sustainability, for storing and retrieving the comprehensive collection of sequencing data in combination with associated clinical and histopathological annotation data, all in accordance with the International Cancer Genome Consortium (ICGC) guidelines.

17:00 Enhanced Scalability, Large Data Volumes Management, Integrated Analysis, and NGS Informatics Support in a Medical Setting

Andrew Stubbs, Ph.D., Assistant Professor, Department of Bioinformatics, Erasmus Medical Center

17:30 The First Success Stories after the Swedish Buildup of Computational Power and Large-Scale Storage for Gene-Sequence Data

Ingela Nystrom, Ph.D., Director, UPPMAX, Center for Image Analysis, Uppsala University

Last year, we reported on the buildup of a system at Uppsala University, Sweden, intended for researchers who deal with the large-scale data from modern gene-sequencing technology. The system has 1200 cores, 4 TB RAM, and 500 TB storage. Now, we report our first success stories, e.g., the whole-genome resequencing project revealing signatures of selection during chicken domestication.

18:00 Sponsored Presentation (Opportunity Available)

18:30 Interactive Breakout Discussion Groups:

Repositories of Metagenomic Data and Tools for Academic and Commercial Users

Moderator: Oleg Reva, Ph.D., Senior Lecturer, Biochemistry, University of Pretoria

  • Data formats and supplementary information requirements
  • Clustering and binning of environmental sequences
  • Modeling metabolic pathways and ecological interactions: facts and artifacts

Storage of Omic Data - The Cloud & Beyond

Moderator: Chris Dagdigian, Founding Partner & Director of Technology, BioTeam, Inc.

  • Science-centric storage
  • Data management issues
  • Best practices

Web Services

Moderator: Christian Hauck, Ph.D., Knowledge Management & Competitive Intelligence, Novartis Pharma AG

Analyzing and Storing Gene Sequence Data

Moderator: Ingela Nystrom, Ph.D., Director, UPPMAX, Center for Image Analysis, Uppsala University

  • User support
  • Data security issues
  • What will come next?

Semantic Web and Ontologies

Moderator: Martin Gollery, Senior Bioinformatics Scientist, Tahoe Informatics


Title To Be Announced
Moderator: Richard Compton, Vice President, EMEA

  • Determine priority areas where ‘scientifically enabling’ the research desktop provides most value to the critical path of the research cycle
  • Determine issues to overcome in meeting corporate ‘enterprise standards’ for deployment
  • Showcase areas of nascent technology

19:15 – 21:00 CHI Networking Reception

Sponsored by

Isilon Systems


9:00 Conference Registration and Morning Coffee



9:30 Chairperson’s Opening Remarks
Sponsored by GGA Software

Yuriy Gankin, Ph.D., Chief Scientific Officer, GGA Software Services LLC

9:35 Implementation & Use of PHAEDRA, a Standards-Based System for High-Content Image Analysis and Evaluation

Frans Cornelissen, IT Manager, Janssen Pharmaceutical

High-content image analysis-based screening (HTS-HCA) technology is an important drug discovery tool for identifying biological probes and drug leads by screening large-volume, diverse biochemical and cell-based assays using image capture and analysis. The architecture and usage of the PHAEDRA environment will be described, using examples such as the measurement of 3D tumor colony size on brightfield image stacks and the quantification of dendritic length, spine density and spine diameter in 3D fluorescent image stacks.

9:55 Lessons Learned in Imaging Informatics for Drug Discovery

Gudrun Zahlmann, Ph.D., Manager, Imaging Infrastructure, pRED, F. Hoffmann-La Roche Ltd.

10:15 Quantitative Image Analysis Tools for Biological Research

Ewert Bengtsson, Ph.D., Professor, Center for Image Analysis, Uppsala University

Purely visual analysis of microscopy images is limited in its ability to provide objective quantitative information. Here, computerized image analysis can provide automated, quantitative tools enabling accurate and high-throughput analysis. We will describe methods for improved microscope image analysis, ranging from better utilization of the color information via robust modeling and segmentation to quantification and classification methods.

Sponsored by
Isilon Systems
10:35 Coffee Break



11:00 Machine Intelligence Can Improve the Quality of Drug and siRNA Screens

Peter Horvath, Ph.D., Head, Image Analysis, Light Microscopy Centre, ETH Zurich


11:30 Next-Generation Interfaces and Interaction with Complex Information Landscapes

Bryn Roberts, Ph.D., Global Head, Informatics, Pharma Research and Early Development, F. Hoffmann-La Roche AG

This presentation covers background and proof-of-concept projects on two main themes: potential approaches to enabling scientists to navigate complex, heterogeneous information using semantic integration technologies and a next generation of user interfaces; and enabling teams to generate, explore and progress hypotheses together using collaborative computer interfaces.


12:00 Informatics for Data Driven Drug Discovery: How to Win the "War" against Data Complexity and Silos

Jacob de Vlieg, Ph.D., Professor & Global Head, Molecular Design and Informatics, Merck

Modern drug discovery and development is based on a highly data-intensive and complex multidisciplinary research process, and the capacity to perform data-driven research is potentially the biggest differentiator. Working smarter is an absolute requirement to win the “war” against data complexity and silos. However, many unmet scientific, technological and business process challenges need to be addressed before we can truly make full use of powerful informatics and omics-inspired technologies within Pharma R&D. New broadly oriented in silico drug hunters, able to work at the interface of chemistry, biology and informatics and employing enhanced scientific (e-science) concepts, are required to connect the “inhuman” scale of data and to present the data and information at a “human” scale of understanding. Data-driven technologies (e.g. molecular profiling technologies, SBDD, HTS, modeling & simulation) are becoming increasingly interwoven with each other and often require significant modifications of the Pharma R&D business process to have full impact or meaning. In the presentation, I will give examples of how integrative informatics, pattern recognition and e-science concepts are used to make better use of internal and external (experimental) data sets and collaborations for output-driven drug design and biomarker discovery.

12:30 Lunch for Purchase in the Exhibit Hall

13:00 Dedicated Poster Viewing in the Exhibit Hall

13:30 Close of Conference
