YU Jia (started October 2014)
Next-generation sequencing (NGS) makes it possible to sequence a full genome in a short time. As a result, more and more strains are being sequenced and become available for pan-genome computation. Because genome data sets are typically large, pan-genome computation requires massive processing power and storage resources.
Bielefeld University and Justus Liebig University Giessen have collaboratively developed EDGAR, a powerful pan-genome computation platform. To cope with today's flood of genome data, however, it has to become more efficient.
The first goal of my project could be to find a way to deal with such massive amounts of genome data. We will try to deploy EDGAR in a fully distributed mode so that it can scale out horizontally.
An alternative first step would be to add more functionality to the existing EDGAR platform, for instance phylogenetic analysis or new machine learning algorithms.
Supervisors: Alexander Goesmann (Bielefeld University), Alexander Sczyrba (Bielefeld University)