Differences

This shows you the differences between two versions of the page.

cds [2017/11/13 07:57]
tischulz Added abstract and title
cds [2017/11/19 16:34] (current)
jensstoye [Schedule]
Line 7: Line 7:
  
 | Organized by: | Faculty of Technology, GRK 1906 DiDy |
-| Place:| Bielefeld University, rooms V2-105/115 |
+| Place:| Bielefeld University, main building, room V2-105/115 |
 | Date:| November 20-22, 2017 |
 \\
Line 21: Line 21:
  
 **Tuesday, November 21st**
-| 9h30 | Annalisa Marsico | Statistical Models of post-transcriptional gene regulation |
+| <del>9h30</del> | <del>Annalisa Marsico</del> | <del>Statistical Models of post-transcriptional gene regulation</del> (//postponed//) |
 | 14h00 | Tobias Marschall | Towards haplotype-resolved genome assembly -- or how to solve multiple jigsaw puzzles simultaneously |
  
Line 27: Line 27:
 **Wednesday, November 22nd**
 | 9h30 | Stephan Schiffels | Unlocking human history - Computational methods for demographic inference from genome sequences |
-| 14h00 | Alexander Schönhuth | |
+| 14h00 | Alexander Schönhuth | Genome Data Science |
  
 \\
Line 35: Line 35:
 //by Jochen Kruppa//
  
-Bioinformatics methods often incorporate the frequency distribution of nucleobases or k-mers in DNA or RNA sequences, for example as part of metagenomic or phylogenetic analysis. Because the frequency matrix, with sequences in the rows and nucleobases in the columns, is multi-dimensional, it is hard to visualize. Here, we present the R package 'kmerPyramid', which displays each sequence, based on its nucleobase or k-mer distribution projected onto the space of principal components, as a point within a 3-dimensional, interactive pyramid (Kruppa et al., 2017). Using the computer mouse, the user can turn the pyramid?s axes, zoom in and out, and identify individual points. Additionally, the package provides the related frequency distribution matrices of about 2,000 bacteria and 5,000 viruses, respectively, calculated from NCBI GenBank. The ?kmerPyramidcan particularly be used for intra- and inter-species comparisons. We show its application to clustering genetic regions, such as coding and non-coding DNA sequences, to visualizing genomic islands in bacterial genomes, and to detecting low-complexity regions in a genome. We are also able to visualize the direct comparison of two sequences considering higher k-mers. This feature might guide a later motif search. The kmerPyramid is based on principal component analysis (PCA), which is used to project the multi-dimensional matrix of nucleobase and k-mer frequencies into 3-dimensional space. PCA, as a method for dimension reduction, has already been demonstrated to preserve relevant information when exploring these frequencies (Dodsworth et al., 2013; Podar et al., 2013; Imelfort et al., 2014). The kmerPyramid package is available on GitHub (https://github.com/jkruppa/kmerPyramid).
+Bioinformatics methods often incorporate the frequency distribution of nucleobases or k-mers in DNA or RNA sequences, for example as part of metagenomic or phylogenetic analysis.
+Because the frequency matrix, with sequences in the rows and nucleobases in the columns, is multi-dimensional, it is hard to visualize. Here, we present the R package 'kmerPyramid', which displays each sequence, based on its nucleobase or k-mer distribution projected onto the space of principal components, as a point within a 3-dimensional, interactive pyramid (Kruppa et al., 2017). Using the computer mouse, the user can turn the pyramid's axes, zoom in and out, and identify individual points. Additionally, the package provides the related frequency distribution matrices of about 2,000 bacteria and 5,000 viruses, respectively, calculated from NCBI GenBank. The 'kmerPyramid' can particularly be used for intra- and inter-species comparisons. We show its application to clustering genetic regions, such as coding and non-coding DNA sequences, to visualizing genomic islands in bacterial genomes, and to detecting low-complexity regions in a genome. We are also able to visualize the direct comparison of two sequences considering higher k-mers. This feature might guide a later motif search. The kmerPyramid is based on principal component analysis (PCA), which is used to project the multi-dimensional matrix of nucleobase and k-mer frequencies into 3-dimensional space. PCA, as a method for dimension reduction, has already been demonstrated to preserve relevant information when exploring these frequencies (Dodsworth et al., 2013; Podar et al., 2013; Imelfort et al., 2014). The kmerPyramid package is available on GitHub (https://github.com/jkruppa/kmerPyramid).
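
The frequency matrix described in the abstract (sequences in the rows, nucleobases or k-mers in the columns) could be built roughly as follows. This is only an illustrative Python sketch, not code from the kmerPyramid R package: the function names `kmer_frequencies` and `frequency_matrix` are hypothetical, and the PCA projection to three dimensions is not shown.

```python
from itertools import product


def kmer_frequencies(seq, k=1, alphabet="ACGT"):
    """Return the relative frequency of every k-mer over `alphabet` in `seq`.

    Hypothetical helper for illustration. Columns follow lexicographic
    k-mer order (for k=1: A, C, G, T).
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with N or other symbols
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def frequency_matrix(sequences, k=1):
    """One row per sequence, one column per k-mer."""
    return [kmer_frequencies(s, k) for s in sequences]


# Toy sequences, made up for the example.
matrix = frequency_matrix(["AACG", "GGGT", "ACGTACGT"], k=1)
for row in matrix:
    print(row)
```

For k=1 every row has four entries that sum to 1; for larger k the number of columns grows as 4^k, which is the dimensionality that a projection such as PCA would then reduce.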
  
 \\
Line 89: Line 89:
  
 \\
 +
 +**Genome Data Science**
 +
 +//by Alexander Schönhuth//​
 +
 +Modern sequencing technologies have confronted biology, and
 +genomics in particular, with a deluge of data.
 +The consequences are enormous, not only in terms of the resulting
 +opportunities for life expectancy and quality of life,
 +but also in terms of the challenges that fall to
 +data science. In my talk I will address two currently
 +dominant topics.
 +
 +First, I will discuss how to enumerate cliques in
 +genome assembly graphs quickly, in order to then
 +use them to reconstruct viral genomes. This approach to
 +reconstructing viral genomes is new. The results show that
 +this data-mining-oriented, strictly data-driven approach
 +has decisive advantages over (less data-driven)
 +state-of-the-art methods.
 +
 +Second, I will discuss how to represent DNA sequence, and
 +sequence in general, using Hilbert curves
 +in order to classify it with deep convolutional neural networks.
 +Convolutional neural networks have recently been very successful,
 +particularly in image analysis. The idea is to reproduce such successes
 +in DNA sequence analysis. Owing to their
 +characteristic properties, Hilbert curves have the potential to turn
 +sequence into images such that the strengths of convolution are
 +optimally exploited, which is reflected in the corresponding
 +results.
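
The idea of laying a sequence out along a Hilbert curve can be sketched as follows. This is a minimal illustration using the standard index-to-coordinate conversion for Hilbert curves, not the speaker's implementation: the names `hilbert_d2xy` and `sequence_to_grid` and the fill symbol `N` are assumptions made for the example. A real classifier input would additionally encode each base numerically (for example one-hot), which is not shown here.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order square grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves connect end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sequence_to_grid(seq, order, fill="N"):
    """Place the characters of seq along the curve; unused cells get `fill`."""
    n = 1 << order
    if len(seq) > n * n:
        raise ValueError("sequence does not fit on the grid")
    grid = [[fill] * n for _ in range(n)]
    for d, base in enumerate(seq):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = base
    return grid


# Toy sequence, made up for the example.
grid = sequence_to_grid("ACGTACGTACGTACGT", order=2)
for row in grid:
    print("".join(row))
```

Positions that are adjacent in the sequence always land in neighbouring grid cells, which is the locality property that makes the resulting image suitable for convolutional filters.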
 +
 +
 +
cds.1510559879.txt.gz · Last modified: 2017/11/13 07:57 by tischulz