Konstantinos Tzanakis (started October 2016)
Recent improvements in technologies and experimental methods have led to rapidly increasing amounts of data in all areas of the life sciences. This also holds for metabolomics, in which mass spectrometry is the key technology for investigating the metabolites that are abundant in an organism or a tissue: state-of-the-art methods make it possible to acquire tens of thousands of mass spectra within a few minutes, which in turn characterize hundreds of potential compounds. Increasingly, the processing of these large amounts of data has emerged as a bottleneck and requires new approaches to data handling, processing and interpretation.
This project aims at the development and evaluation of novel methods for the storage and analysis of mass spectrometry-based metabolomics data, building on so-called Big Data frameworks, which allow for the distributed processing of large data sets across clusters of computers in a scalable manner. Examples are the Apache Hadoop software libraries and Apache Spark, a fast and general engine for large-scale data processing. The goal of the project is to provide users with a bioinformatics platform for the efficient and user-friendly handling of their own experimental data. The key objectives of the platform are, first, the automated processing of large numbers of chromatographic datasets in terms of untargeted metabolite profiling, quantification and de novo identification, and second, their integration with other omics data to ultimately enable a target-oriented and time-efficient interpretation of the data.
Supervisors: Stefan P. Albaum (Bielefeld University), Tim W. Nattkemper (Bielefeld University), Karsten Niehaus (Bielefeld University)