October 2014
Volume 4, Issue 1


Meeting the Challenge of Rapid Whole Genome Analysis


In the never-ending race to stay a few steps ahead of the deluge of genomic data pouring out of our lab, CIDR recently acquired a new booster rocket in the form of the Bina Box, a highly optimized genomic analysis system from Bina Technologies. The main attraction is speed: a single unit can process a whole 30X human genome in 7 to 9 hours, half the time it takes on our compute cluster. The system is a 4-node, 64-core genomic analysis appliance that runs best-practices workflows (e.g., BWA 0.7.8 with GATK 3.1) and uses state-of-the-art "big data" techniques to speed up the analyses, primarily by optimizing the complex tasks of job and data distribution.


Our R&D staff tracked Bina for two years as the company refined its technology. Earlier this year we concluded successful WGS analysis tests on the Amazon cloud version of their product. After validating the results of those tests, we acquired a single unit of the Bina Genomic Analysis Platform, primarily to speed up whole genome analysis, but also with an eye to other potential uses such as a tumor/normal workflow. We are currently validating the Bina WGS workflow against the same analysis run on our own compute cluster, and the two sets of results have become increasingly difficult to tell apart. The platform also provides four structural variant callers, although these are not yet optimized for the system. Beyond research studies, and due in part to the exceptionally short analysis turnaround time, we are also considering the system for clinical (CAP/CLIA-certified) projects. That will depend on work Bina is currently pursuing to support the requirements for clinical studies (e.g., CAP's "Bioinformatics Pipelines for NGS" checklist).
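For readers curious what comparing two pipelines' output can look like in practice, here is a minimal Python sketch. It is an illustration only and not CIDR's actual validation procedure: the function names (load_variants, concordance) and the sample records are hypothetical, and a real comparison would also normalize variant representations and compare genotypes, which this sketch does not attempt.

```python
from typing import Dict, Iterable, Set, Tuple

# A variant site is identified here by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def load_variants(vcf_lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from VCF-formatted text lines (hypothetical helper)."""
    variants: Set[Variant] = set()
    for line in vcf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        # Multi-allelic records list alternate alleles separated by commas.
        for alt in alts.split(","):
            variants.add((chrom, int(pos), ref, alt))
    return variants


def concordance(first: Set[Variant], second: Set[Variant]) -> Dict[str, float]:
    """Summarize the overlap between two call sets (hypothetical helper)."""
    shared = first & second
    union = first | second
    return {
        "shared": len(shared),
        "only_first": len(first - second),
        "only_second": len(second - first),
        # Jaccard index: shared calls divided by all distinct calls.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Made-up records standing in for calls from the two analysis systems.
cluster_calls = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT",
    "chr2\t300\t.\tG\tA,C",
]
appliance_calls = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr2\t300\t.\tG\tA",
    "chr3\t50\t.\tT\tC",
]

summary = concordance(load_variants(cluster_calls), load_variants(appliance_calls))
print(summary)
```

A Jaccard value near 1.0 would indicate that the two systems report nearly the same set of variant sites; the calls unique to each side are the ones worth inspecting by hand.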


We continue to work with Bina on specific enhancements, including a programmatic interface for automating job submission and integration with our sequencing pipeline, as well as testing of their cloud-based annotation platform, which will be tightly integrated with the Bina Box. Because the system also has an easy-to-use web-based graphical interface for job submission, monitoring and results evaluation, we are exploring the possibility of giving CIDR PIs access to the platform for sequence analysis tasks. This could perhaps be done by using excess capacity on our system or by running identical workflows in the Bina Cloud. One example would be multi-sample calling that combines the samples from a PI's CIDR project with additional samples from other sources.

To subscribe to CIDR News, send a blank email to sympa@lists.johnshopkins.edu with the subject line: subscribe CIDR_News