NGS Data Management: Sequencing Systems, Storage, and Analysis

Download Brochure |  Short Courses 

Next-generation sequencing (NGS) platforms play an increasingly prominent role in biological research. In the coming year, we can expect the amount of sequence data generated to exceed all the data generated in the past decade. To use NGS at maximum capability, we must address the IT tools for data management, storage, and analysis. NGS Data Management convenes hardware/software engineers, database architects, storage managers, systems integrators and analysts, as well as biological researchers and bioinformaticists. This track will provide perspectives from each specialty, emphasizing how they can be integrated into a cohesive, comprehensive team to manage the sequencing data deluge.


9:00 Conference Registration and Morning Coffee


INFORMATICS: Meeting the Challenge of Turning NGS Data into Knowledge

9:30 Chairperson’s Opening Remarks

9:35 Into the Unknown: Expression Profiling without Genome Sequence Information by Next-Generation Sequencing

Fabian Birzele, Ph.D., Genomics, Boehringer Ingelheim Pharma, GmbH & Co. KG
Expression profiling in organisms lacking genome or transcriptome sequence information is feasible by combining Illumina’s mRNA-seq technology with a novel bioinformatics pipeline that integrates assembled and annotated sequences from read data with information derived from related organisms. Using the Chinese hamster as a model, expression patterns for more than 13,000 genes can be analyzed. A detailed analysis of selected biological functions such as DNA replication and cell cycle control demonstrates the potential of NGS expression profiling in organisms without extended genome sequence to improve data quantity and quality.

10:05 Iterative Read Mapping and Assembly Allows the Use of a More Distant Reference in Metagenome Assembly

Bas E. Dutilh, Ph.D., Centre for Molecular and Biomolecular Informatics, Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen Medical Center

Using a reference can greatly improve the assembly of next-generation sequencing reads, but a closely related genome is not always available. However, by using a permissive mapping algorithm and by iterating the mapping and assembly several times, a more distant reference can still be used. From short-read sequencing of an enriched bioreactor, we construct a sequence that captures the consensus of the population’s metagenome.

10:35 Coffee Break


11:00 Utilizing Next-Generation Sequencing to Analyze the Complex Genome of Barley

Burkhard Steuernagel, Department of Cytogenetics and Genome Analysis, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK)

De novo sequencing of a large, complex plant genome such as that of barley (Hordeum vulgare L.) is a major challenge in terms of both experimental feasibility and cost. The emergence and breathtaking progress of next-generation sequencing technologies have brought this goal into focus. As a proof of concept, 3500 clones were sequenced to at least 15-fold coverage in pools of up to 48 barcoded BACs. The high quality of the assembled contigs shows the applicability of a clone-based strategy for sequencing the barley genome.

11:30 Fully Automated Genome Annotation with Deep RNA Sequencing

Gunnar Rätsch, Ph.D., Group Leader, Friedrich Miescher Laboratory, Max Planck Society

The development of high-throughput sequencing technologies allows the determination of the complete set of RNA transcripts expressed under a given condition. We present accurate and efficient computational methods to automatically annotate genes and transcripts together with their expression levels from deep RNA sequencing. We illustrate that these approaches require only deep RNA sequencing reads and the genome sequence, and are able to determine very accurate genome annotations. Hence, high-quality genome annotation can now be fully automated. We applied these techniques to annotate novel genomes and to reannotate well-characterized ones, and could identify many previously undiscovered transcripts.

Sponsored by
Quantum
12:00 Next Gen Data Management for Next Gen Life Sciences
Roberto Fabbretti, Ph.D., IT Manager, Swiss Institute of Bioinformatics
The advent of next-generation sequencers is contributing orders of magnitude more data to sift through, analyze, and share, increasing the complexity of genomic sequencing workflows. Added complexity means added risk, jeopardizing time to discovery. A tightly integrated, scalable high-performance computing platform with intelligent data management options could make all the difference. Learn how to deploy an end-to-end data management infrastructure that handles the most demanding next-gen sequencing workflows, so that your scientists’ drive toward the next major medical breakthrough or discovery goes unimpeded. Roberto Fabbretti, Ph.D., IT Manager at the Swiss Institute of Bioinformatics, explains his implementation.



12:15 Avadis NGS – Next-Gen Sequencing Analysis for the Rest of Us
Thon de Boer, Ph.D., Director, Product Management, Software, Strand Scientific Intelligence, Inc.
Since the early 1970s, the time of Frederick Sanger in Cambridge and Maxam & Gilbert at Harvard, DNA sequencing has been an invaluable tool in the arsenal of the life science researcher. The introduction of low-cost, high-throughput next-gen sequencing technologies has resulted in exponential growth in the amount of data produced on a daily basis. However, turning these data into biologically relevant and actionable results has not been as easy. Biologists have found themselves having to master many different, complex tools and learn command-line utilities to get even the most basic results, let alone to do this for hundreds of samples at a time.

Strand Scientific Intelligence has developed the application Avadis NGS to help biologists make sense of their NGS data and do this for hundreds of samples on even the most basic of computers. We will showcase the capabilities of Avadis NGS for ChIP-Seq, RNA-Seq and variation analysis and will show how a user can quickly get from large amounts of NGS data to biologically relevant information, through powerful statistics, insightful visualizations and real biological information such as pathways.

12:30 Lunch for Purchase in the Exhibit Hall

13:45 Dedicated Poster Viewing in the Exhibit Hall


STORAGE: Solutions for Massive Sequencing Data Management

14:30 Chairperson’s Remarks

14:35 Featured Presentation: Sequencing Data Storage

Chris Dagdigian, Founding Partner & Director of Technology, BioTeam, Inc.

Next-Generation Sequencing (NGS) instruments are forcing evolutionary and revolutionary changes in research IT architectures & infrastructures. Chemistry and lab protocols are advancing faster than the underlying IT systems and methods, leading to a crisis of capability in many organizations. This presentation will focus on “science-centric storage” for life science informatics, with specific attention to requirements, trends, data management methods and emerging best practices.

15:05 Experiences from the European Bioinformatics Institute’s Data Resources, Storage, and Management

Guy Cochrane, Ph.D., European Nucleotide Archive Team Leader, European Bioinformatics Institute

15:35 Refreshment Break

Sponsored by
Isilon Systems
16:00 How Scale-out Requirements in Life Sciences are Making Obsolete RAID, LUNs, Volumes and the Rest of the Traditional Storage Model
Rob Anderson, Director, Technology & Business Development, EMEA
Many areas of life science research such as genomics, proteomics and high-resolution measurement and modeling involve the generation, accumulation, analysis, and distribution of large amounts of data. These advances are offset by capacity increases in storage technologies that are undergoing their own rapid evolution, such as the increase in hard-drive density. However, storage systems at scale, incorporating these ever-denser disks, myriad processing elements and proliferating gateways, are becoming less and less reliable and manageable, as they try to cope with ever-increasing quantities of data. A differentiated approach is shown which avoids the elements of traditional storage, such as hardware RAID, allowing massive scale-out needs to be served with high levels of availability and performance, and with incredible ease of management, regardless of scale.

16:30 How to Overcome the 100 Miles between Petabases and Petabytes

Jürgen Eils, Bioinformatics Database Group Leader, German Cancer Research Center

Recently, Heidelberg University received a grant to build the largest data storage facility in Germany at 5-10 petabytes. From a management and logical perspective, the massive throughput of next generation sequencing requires new concepts and strategies. One problem is the long distance transport of data from the sequencer machine to the data storage facility. We will present strategies and concepts with emphasis on reusability and sustainability for storing and retrieving the comprehensive collection of sequencing data in combination with associated clinical and histopathological annotation data – all in accordance with the International Cancer Genome Consortium (ICGC) guidelines.

17:00 Enhanced Scalability, Large Data Volumes Management, Integrated Analysis, and NGS Informatics Support in a Medical Setting

Andrew Stubbs, Ph.D., Assistant Professor, Department of Bioinformatics, Erasmus Medical Center

17:30 The First Success Stories after the Swedish Buildup of Computational Power and Large-Scale Storage for Gene-Sequence Data

Ingela Nystrom, Ph.D., Director, UPPMAX, Center for Image Analysis, Uppsala University

Last year, we reported on the buildup of a system at Uppsala University, Sweden, intended for researchers who deal with the large-scale data from modern gene-sequencing technology. The system has 1200 cores, 4 TB RAM, and 500 TB storage. Now, we report our first success stories, e.g., the whole-genome resequencing project revealing signatures of selection during chicken domestication.

18:00 Sponsored Presentation (Opportunity Available)

18:30 Interactive Breakout Discussion Groups
Join a facilitated discussion group focused on specific scientific and technology-related topics. This unique session allows conference participants to exchange ideas and experiences and to develop future collaborations around a focused topic. Current discussion groups include:

Repositories of Metagenomic Data and Tools for Academic and Commercial Users

Moderator: Oleg Reva, Ph.D., Senior Lecturer, Biochemistry, University of Pretoria

  • Data formats and supplementary information requirements
  • Clustering and binning of environmental sequences
  • Modeling metabolic pathways and ecological interactions: facts and artifacts

Storage of Omic Data - The Cloud & Beyond

Moderator: Chris Dagdigian, Founding Partner & Director of Technology, BioTeam, Inc.

  • Science-centric storage
  • Data management issues
  • Best practices

Web Services

Moderator: Christian Hauck, Ph.D., Knowledge Management & Competitive Intelligence, Novartis Pharma AG

Analyzing and Storing Gene Sequence Data

Moderator: Ingela Nystrom, Ph.D., Director, UPPMAX, Center for Image Analysis, Uppsala University

  • User support
  • Data security issues
  • What will come next?

Semantic Web and Ontologies

Moderator: Martin Gollery, Senior Bioinformatics Scientist, Tahoe Informatics


Title To Be Announced

Moderator: Richard Compton, Vice President, EMEA

  • Determine priority areas where ‘scientifically enabling’ the research desktop provides most value to the critical path of the research cycle
  • Determine issues to overcome in meeting corporate ‘enterprise standards’ for deployment
  • Showcase areas of nascent technology


19:15 – 21:00 CHI Networking Reception

Sponsored by

  Isilon Systems



9:00 Conference Registration and Morning Coffee


SEQUENCING DATA: Provides a Promising Tool for Molecular Diagnostics

9:30 Chairperson’s Opening Remarks

Kevin Davies, Ph.D., Editor-in-Chief, Bio-IT World



9:35 Applications of Next-Generation Sequencing in an Academic Medical Hospital: From Single Molecules to a Complete Female Genome

Marjolein Kriek, M.D., Ph.D., Clinical Geneticist, Human and Clinical Genetics, Leiden University Medical Centre

Medical application of full human genome sequencing to resolve a health problem will soon be a realistic option. Within our academic hospital we have started to implement next-generation sequencing with applications covering many subjects: candidate disease genes, bacterial genomes, single-molecule sequencing and the sequencing of a complete female genome. We found that human genome sequencing is technologically feasible, but computationally it was at the limit of our capabilities. The problem resides in interpretation, where efficient analytical tools are largely lacking.

10:05 Drivers, Challenges and Opportunities to Bring Diagnostics Closer to the Points of Need

Rudi Pauwels, Ph.D., Founding Director & CEO, Biocartis

The rapidly expanding atlas of molecular-based biomarkers and the advent of novel technologies are creating new opportunities to improve the outcome for the individual patient by providing tools to implement a more personalized and increasingly molecular-based medicine. These trends will likely accelerate in a climate of intensive changes and pressures that relate to the various industries, healthcare players and regulatory bodies involved. Although technical, regulatory, economic, business and adoption hurdles need to be overcome, the overall need for affordable healthcare for all is likely to be an important selection pressure, if not a critical determinant, for the future evolution of healthcare.

Sponsored by
Isilon Systems
10:35 Coffee Break




Sponsored by
QIAGEN
11:00 Pioneering Personalized Healthcare through Pharma Partnering: A Case Study in Companion Diagnostic Co-Development
Stephen Little, Ph.D., Vice President Personalized Healthcare, QIAGEN
QIAGEN (formerly DxS Ltd) has established itself as the market leader in the successful co-development of drug-diagnostic solutions with pharmaceutical partners. QIAGEN has a considerable portfolio of previous and ongoing collaborations with drug giants such as Amgen, AstraZeneca, Bristol-Myers Squibb and ImClone Systems, and Boehringer Ingelheim. This presentation will outline:

  • The current state of the personalized medicine industry
  • Some of the hurdles involved in bringing a companion diagnostic to market
  • The importance of pharma partnering during clinical development

It will also look at two key examples of QIAGEN’s experience: producing the first companion diagnostic of its kind, the TheraScreen®: K-RAS Mutation Kit, which predicts patient response to the metastatic colorectal cancer therapies Vectibix® and Erbitux® based on the mutation status of the K-RAS oncogene, and the TheraScreen: EGFR29 Mutation Kit for the non-small cell lung cancer drug IRESSA®.

11:30 Bringing Next-Generation Sequencing to the Clinic

Linh Hoang, M.D., Ph.D., Director, Genomic Medicine, LIFE Technologies, USA
While the pace of technological improvement has lowered costs, the next frontier will be in demonstrating clinical utility. In select cases, diagnostic tests limited to a panel of genes fail to correctly diagnose all patients who have the clinical manifestations of the disease. We will highlight collaborative work on Charcot-Marie-Tooth and Noonan Syndrome using NGS. In acquired diseases, we are focusing on cancer, where the mutation profiles are so complex that whole-genome approaches are preferred. Addressing key issues around privacy, reimbursement, education, and regulation is essential to realizing the clinical power of NGS.


Moderator: Kevin Davies, Ph.D., Editor-in-Chief, Bio-IT World

The explosion of next-generation sequencing and other tools for high-throughput genomic analysis is already proving its value, for example in identifying mutations underlying rare Mendelian disorders and rationalizing cancer treatments. But the extraordinary volume and complexity of these data make the informatics of data analysis more costly and time-consuming than generating the original data. As one scientist said, “What use is the $1,000 genome if it costs $20,000 to do the analysis?” In this panel discussion, Doctors Kriek, Pauwels, Little and Hoang discuss the challenges of clinically interpreting next-gen genomic data and delivering those data in a timely and accessible fashion to the bedside.

12:30 Lunch for Purchase and Poster Viewing in the Exhibit Hall

13:30 Close of Conference
