9.30 | Welcome coffee |
10.00 | Rayan Chikhi: “AllSome Sequence Bloom Trees” |
11.00 | Georges Hattab: “Analyzing colony dynamics and visualizing cell diversity in spatiotemporal experiments” |
11.30 | Robert Müller: “Applying succinct data structures to massive sequence clustering” |
12.00 | Lunch break |
1.30 | Michael Menzel: “User-friendly software for viral integration site analysis” |
2.30 | Krister Swenson: “Linking large-scale genomic rearrangements to 3D chromatin structure” |
3.30 | Coffee break |
4.15 | Eyla Willing: “Genome rearrangement by inversions and indels” (PhD defense) |
7.30 | Dinner (downtown: Pepper's, Niederwall 31-35, next to tram station “Rathaus”) |
9.21 | Bus 24 from downtown (“Jahnplatz”) to “Bauernhausmuseum” |
9.45 | Group event at Bauernhausmuseum incl. lunch |
12.45 | Walk to workshop site |
1.30 | Faraz Hach: “Computational proteogenomic identification and functional interpretation of translated fusions and micro structural variations in cancer” |
2.30 | Leonid Chindelevitch: “PathOGiST: Calibrated multi-criterion genomic analysis for public health microbiology” |
3.30 | Coffee break |
4.15 | Guillaume Holley: “Pan-genome search and storage” (PhD defense) |
6.30 | Dinner (on site) |
9.00 | Individual Research Meetings |
9.30 | Faculty Meeting |
12.00 | Lunch (faculty and guests only) |
Studierendenwerk, administrative building (Morgenbreede 2-4), close to main campus
Room 1.01
Arriving at Bielefeld Hbf (main station), exit the building towards the city center; you should see a white building (Hotel “Bielefelder Hof” - not your hotel!) in front of you (if you see a Cinemaxx, you are on the wrong side!). Cross the street to get to the underground tram (Stadtbahn / U-Bahn) station.
Hotel Arcadia
Niederwall 31-35, 33602 Bielefeld
Phone: +49 521 5253 0
Hotel Golden Tulip
Waldhof 15, 33602 Bielefeld
Phone: +49 521 528 00
Hotel Bültmannskrug
Babenhauser Str. 37, 33613 Bielefeld
Phone: +49 521 88 31 44
The ubiquity of next generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2,652 human RNA-seq experiments uploaded to the Sequence Read Archive. Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples that have a transcript of interest potentially expressed. I'll talk about a set of algorithmic improvements called the AllSome Sequence Bloom Tree that significantly reduce the tree construction time and query time.
The rise of high-throughput methods in genomic research has greatly expanded the number of genomic annotations. Annotations can be used to characterise genomic positions, e.g. protein binding, virus integration, or differential methylation. Nevertheless, the number of these sites generated by high-throughput methods is far too large for manual inspection. Here, we present Enhort, a novel, user-friendly software tool for the deep analysis of large numbers of genomic positions. It uses a complex but easy-to-use mechanism for adjusting statistical background models according to experimental conditions or specific scientific questions. The models can adapt to annotation tracks, sequence logos or the distribution of distances for given genomic positions. We conducted a comparative analysis of several viral integration data sets with respect to integration site preferences and genotoxicity estimation.
Towards the beginning of the 20th century, Sturtevant discovered that the genes of Drosophila are organized linearly on the chromosomes. Later, through hybridization experiments on polytene chromosomes, Sturtevant and Dobzhansky noticed that a substrand of the fruit fly DNA can be inverted; in some strains of fruit fly the sequence of genes on the chromosome appears in reverse order. Further, he showed that these inversions were linked to the phenotype of those individuals that possessed them: male flies with a particular inversion had few or no male offspring. So as early as 1936, evolutionary histories between species of fruit fly were being inferred based on inversion histories. It is now known that rearrangements can be fixed in a population and are also associated with a multitude of diseases. It was not for another half a century that appropriate questions were asked about the inference of rearrangement histories. In this talk we introduce concepts and models for understanding rearrangement histories. We present our work relating 3D chromatin conformation, as represented by Hi-C data, to large-scale rearrangements across evolutionary time scales.
Rapid advancement in high-throughput genome and transcriptome sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of genomic, transcriptomic and proteomic data from the same tissue sample. In this talk, I will introduce a novel computational framework which can integratively analyze all three types of omics data to obtain a complete molecular profile of a tissue sample, in normal and disease conditions. This framework includes MiStrVar, an algorithmic method we developed to identify micro structural variants (microSVs) in genomic HTS data. Coupled with deFuse, a popular gene fusion detection method we developed earlier, MiStrVar can provide an accurate profile of structurally aberrant transcripts in cancer samples. Given the breakpoints obtained by MiStrVar and deFuse, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures in the respective proteomics data sets. Our framework's ability to observe structural aberrations at three levels of omics data provides a means of validating their presence.
As public health organizations start to rely on whole-genome sequencing (WGS) data for infectious disease surveillance and outbreak investigations, two main issues emerge from the use of WGS data for genotyping. First, methods for differentiating outbreak-related strains from sporadic strains are often based on a single type of genomic variation. Second, WGS-based sample clustering algorithms are often not calibrated, meaning that the determination of clustering thresholds or sub-typing cutoffs is still mostly arbitrary. We hypothesize that we can achieve better outbreak cluster identification by combining multiple variants within a unified statistical model, with model parameters calibrated to specific pathogens.
In this talk, I will describe the current progress in the development of the PathOGiST application, which implements existing and novel genomic variant-calling algorithms for WGS data (SNPs, tandem repeats, MLST), clustering algorithms for WGS datasets based on a multi-criterion genome dissimilarity measure using these various kinds of genomic variants, and calibration of the statistical models and algorithms using large reference sets of selected pathogen genomes from epidemiologically confirmed outbreak strains.