Clinical Genomics Europe Conference

Bio-IT World and Cambridge Healthtech Institute’s Fifth Annual

High-Scale Computing
Turning Big Genomics Data into Smart Data

4-5 December 2013 | Sheraton Lisbon Hotel & Spa | Lisbon, Portugal


The management and analysis of the massive amounts of genomic data generated by next-generation sequencing (NGS) are demanding. The flexibility, sustainability and scalability of IT infrastructure are vital to support genomic research and its translation into the clinic. CHI’s Fifth Annual High-Scale Computing conference showcases innovative network applications, architectures, protocols and interfaces for genomic data management and analysis. It presents best practices for applying, using and improving critical systems, services, networking and storage to turn big data into smart data.

Recommended Pre-Conference Symposia*

Clinical Epigenetics
Quantitative Digital Detection Technologies 

*Separate Registration Required

Wednesday, 4 DECEMBER

07:30 Registration and Morning Coffee



08:30 Chairperson’s Opening Remarks

Janis E. Landry-Lane, Program Director, World Wide High Performance Technical Computing, Life Sciences/Higher Education Segments, IBM


08:40 ELIXIR: The European Research Infrastructure for Life Science Data

Niklas Blomberg, Ph.D., ELIXIR Director, ELIXIR Hub, EMBL-EBI, United Kingdom

The mission of ELIXIR is to construct and operate a sustainable infrastructure for the sharing of biological information throughout Europe, to support life science research and drive its translation to medicine and the environment, the bio industries and society. The challenges in storing, integrating and analyzing the data from modern biological experiments are real; ELIXIR meets these challenges through a distributed e-infrastructure of bioinformatics services built around established European centres of excellence.

09:10 Scalability, Reproducibility and Traceability in Large-Scale NGS Facilities

Gianmauro Cuccuru, Ph.D., Researcher, CRS4 Bioinformatics Laboratory, Italy

As the rate of samples to process increases, manually performing and tracking operations becomes increasingly difficult, costly and error-prone, while processing the massive amounts of data poses significant computational challenges. We will present how combining scientific workflow applications (Galaxy) with state-of-the-art processing technologies like Hadoop, OMERO and iRODS can help address these challenges, thus empowering more complex life science studies while providing scalability, full reproducibility and traceability.

09:40 Providing the Clinical Genomics Platform: A How-To Guide for Flexible and Extensible Services for Clinical Big Data

Brent Richter, Executive Director, Enterprise Research Infrastructure & Services, Information Systems & Academic Programs, Partners HealthCare & Massachusetts General Hospital/Brigham & Women’s Hospital, United States

Managing NGS data and reporting results require secure but extensible infrastructures. Continuing to adapt these platforms to additional clinical areas such as pathology and microbiotics is the current challenge, but the future brings the promise of network medicine that incorporates all information about a patient, from genomics to real-time imaging. What information technology architectures will be required to place storage, networks and analytics together in a secure and available environment? What services need to be developed?

10:10 Coffee Break in the Exhibit Hall with Poster Viewing

10:45 1000 Genomes/UK10K Projects: Data Management and Data Sharing

Thomas Keane, Ph.D., Senior Scientific Manager, Vertebrate Resequencing Informatics, Wellcome Trust Sanger Institute, United Kingdom

We have now reached the point where large-scale human disease association studies are carried out primarily using next-generation sequencing technologies. These studies can generate many hundreds of terabases of sequencing data. One of the key challenges is to devise scalable and robust data management and data sharing solutions. In this talk, I will cover how we have addressed these challenges at the Wellcome Trust Sanger Institute.

11:15 Implementation of Translational Informatics in the Clinic

Jesper Tegnér, Ph.D., M.D., Professor, Strategic Chair, Computational Medicine and Director, Unit of Computational Medicine, Center for Molecular Medicine, Medicine, Karolinska Institutet and Karolinska University Hospital, Sweden

There is an urgent need to manage and integrate different data types originating from molecular translational research and healthcare. In collaboration with clinicians, we have implemented a system integrating 15172 DNA, serum and synovial samples, 1436 cell samples and 65 SNPs per patient, and a clinical database with 5652 clinical visits, for a cohort of 379 patients. Basic functionalities include research data management, development of bioinformatics workflows and analyses, sub-cohort selection and reuse of clinical data in research settings. We will describe the challenges and solutions inherent in this kind of work.

11:45 An Integrated High Performance Computing Platform for Genomics and Translational Research

Janis E. Landry-Lane, Program Director, World Wide High Performance Technical Computing, Life Sciences/Higher Education Segments, IBM

Tzy-Hwa (Kathy) Tzeng, Ph.D., Senior Technical Staff Member, IBM Life Science and NGS Solution


High-performance computing and storage are required to efficiently process data generated by NGS. The applications used to map reads and detect variants are typically CPU and I/O intensive. IBM has characterized this workload and developed an optimal genomics platform to address the demand. The goal of any sequencing project is to gain insight into a diverse range of biological processes by integrating genome data with corresponding phenotypes. Large computational capacity and sophisticated algorithms are mandatory in the translational platform. We will illustrate how IBM seamlessly integrates genomics and translational platforms.


12:15 Luncheon Presentation (Sponsorship Opportunity Available) or Lunch on Your Own

13:15 Session Break



14:00 Chairperson’s Opening Remarks

Per Hansen, Technical Specialist EMEA, Aspera, Inc.

14:05 Cloud4Science: Using Public Clouds in NGS Pipelines

Ignacio Blanquer, Ph.D., Associate Professor/Researcher, Computer Systems/Institute of Instrumentation for Molecular Imaging, Universitat Politècnica de València, Spain

Cloud4Science is an initiative funded by Microsoft to boost the use of clouds in science. The first prototype focuses on genomics on the cloud and integrates a complete pipeline for mutation detection analysis running on the Windows Azure cloud, which users can easily deploy and run using their Azure credentials. Cloud4Science has focused on the issues in data transfer, provenance and massive processing from a user-friendly portal that transparently deploys the needed resources.

14:35 Cost-Effective GPU-Grid for Genome-Wide Epistasis Calculations

Benno Pütz, Ph.D., Statistical Genetics, Max Planck Institute of Psychiatry, Germany

Because local restrictions prevent us from handing out clinical data to external computing centers such as the cloud, we were forced to establish appropriate computing resources in-house. From a price/performance point of view, low-cost systems based on consumer graphics cards turned out to be our best option. The system setup as well as some of the work performed on epistasis will be presented.


15:05 CloudMC: A Cloud Computing Platform for Radiation Calculations

Hector Miras, Medical Physicist, Department of Medical Physics, Virgen Macarena University Hospital, Spain

Rubén Jiménez, Chief Software Architect, R&D Division, Icinetic, Spain

Monte Carlo (MC) algorithms are considered the gold standard for radiation calculations, but they are computationally expensive, which is why they are not used in routine clinical practice. CloudMC is a platform developed on Windows Azure for parallelizing radiation calculations that use MC algorithms. It gives any user access to substantial computing power for MC simulations while paying only for the resources used.

15:35 High-Speed Data Movement for Global Collaboration in Genomic Research - Bridging Enterprise and Cloud Infrastructure for Effective Global Collaboration in Life Sciences

Michelle Munson, Founder & CEO, Aspera, Inc.




16:05 Refreshment Break in the Exhibit Hall with Poster Viewing

16:45 Selected Oral Poster Presentation: Bringing the Tools to Data: Providing Scientists with Personalized Bioinformatics Services on Clouds

Christophe Blanchet, Ph.D., Distributed Research Infrastructure for Life Science, Centre National de la Recherche Scientifique (CNRS), France

Improvements in experimental technologies force life science researchers to face a deluge of data that requires relevant tools and sufficient computing resources. To address this difficulty, we created personalized bioinformatics cloud services: predefined, turnkey appliances with common bioinformatics tools, workflows and gateways. These appliances are usually gigabytes in size, so it is more efficient to move them to where the terabytes of biological data are stored than to move the data. To better identify the needs and the relevant appliances to develop, this work is done in collaboration with the French Institute of Bioinformatics (IFB).

17:05 The EUDAT Project and Cloud Storage 

Wolfgang Gentzsch, Ph.D., Co-Founder, The UberCloud HPC Experiment; Executive Consultant, HPC, Grid and Cloud; Advisor, EUDAT; Chairman, ISC Cloud Conferences, Germany

EUDAT aims to develop and support a Collaborative Data Infrastructure allowing researchers to share data across communities and carry out research effectively. EUDAT’s data services, like persistent storage, identification, authenticity, workflow execution and mining, can leverage cloud storage. But copyright or national law might not allow the data to leave the country or even the data centre in which they are curated. EUDAT takes a long-term view of the data it holds and is considering “trust marks” for digital archiving.

17:40 Close of Session

17:45 Welcome Reception in the Exhibit Hall with Poster Viewing

19:15 Close of Day


*IBM and the IBM logo are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. 

