HUANG Liren (started October 2014)
The growing volume of next-generation sequencing (NGS) data poses a fundamental challenge for genomic analytics. Addressing it requires solutions in both hardware and software. Cloud infrastructure and platform services (IaaS and PaaS) are already well established in informatics. To run on a cluster, however, methods that involve message passing and data aggregation between computers must be re-implemented with a higher-level programming interface. In this project, we will develop Sparkhit, a Spark- and Hadoop-based bioinformatics framework designed to be easy to use on a local server or in the cloud. We plan to implement a variety of analytical tools and methods for different types of genomic NGS data. These methods will be written in the MapReduce programming model, in which parallelization is optimized and execution is supervised by the framework to provide fault tolerance.
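To illustrate the MapReduce model described above, here is a minimal single-process Python sketch that counts k-mers in a few sequencing reads. K-mer counting, the example reads, and the function names (`map_read`, `shuffle`, `reduce_counts`, `count_kmers`) are hypothetical choices for illustration, not Sparkhit's implementation. The sketch shows only the map, shuffle, and reduce data flow; it does not distribute tasks across machines or re-run failed tasks, which is what Spark or Hadoop would handle on a real cluster.

```python
from collections import defaultdict
from itertools import chain


def map_read(read, k):
    """Map phase (hypothetical): emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key (the k-mer)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    """Run map over every read, then shuffle and reduce the combined output."""
    mapped = chain.from_iterable(map_read(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT"]
    print(count_kmers(reads, 3))  # {'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}
```

Because each call to `map_read` depends only on its own read, the map phase could run on many machines independently; only the shuffle step needs to move data between them, which is the part a framework like Spark or Hadoop manages for the user.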
Supervisors: Alexander Sczyrba (Bielefeld University), Alexander Goesmann (Giessen University), Colin C. Collins (Vancouver Prostate Centre)