The morning sessions will consist of lectures introducing the topics, whereas the afternoon sessions will cover special problems and methods (mostly for genomics). The afternoon sessions will feature presentations of selected research papers and give participants the opportunity to discuss issues related to their own research projects.
The lectures will introduce the area of data mining, focusing on algorithms, particularly for cluster analysis and classification.
Classification
* Classifier evaluation
* Decision trees
* Naïve Bayes classifier
* Logistic regression
* Bayesian networks
* Support vector machines
* Nearest neighbor classifier
* Ensemble methods
* Regression analysis
|Lecture||Introduction, Clustering (to slide 49)||Clustering (slides 50-95)||Clustering (to the end), Classification (to slide 125)||Classification (to slide 164)||Classification (to the end), Conclusion|
|Seminar||Hofree et al. (2013) Network-based stratification of tumor mutations. Nat. Methods, 10, 1108–15.||Lawrence et al. (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262, 208–14.||Gardy et al. (2003) PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria. Nucleic Acids Res., 31, 3613–3617.||Vazquez et al. (2003) Global protein function prediction from protein-protein interaction networks. Nat. Biotechnol., 21, 697–700.||Bleakley et al. (2007) Supervised reconstruction of biological networks with local models. Bioinformatics, 23, i57–65. Yu et al. Using Bayesian network inference algorithms to recover molecular genetic regulatory networks.|
In this part of the course, principles of data visualization will be introduced. First, the history of visualization will be reviewed to develop definitions of the terms “visualization”, “scientific visualization” and “information visualization”. Second, categories of data and metadata will be defined, and basic visualization techniques will be described for each category. Third, basic aspects of human visual perception and cognition will be reviewed in the context of data visualization.
The last part of the course will center on two recent developments in advanced data analysis: (a) nonlinear dimensionality reduction techniques for intuitive data inspection, which address the question of how to map high-dimensional data points to low dimensions such that the structure of the data becomes visible, and (b) metric learning techniques, which adjust the metric, i.e. the data representation, according to auxiliary knowledge. Both are mature fields of research, with a variety of methods readily available and a number of advanced applications in bioinformatics and beyond. We will give an overview of the underlying concepts and aim to clearly classify the differences between the currently most popular techniques.
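To make the idea of adjusting a metric according to auxiliary knowledge more concrete, here is a minimal sketch in Python. It is not a method taken from the course material: it assumes that the auxiliary knowledge is a set of class labels, and it uses a simple variance-ratio heuristic (between-class variance divided by within-class variance) to weight each feature in a Euclidean distance. The function names `relevance_weights`, `weighted_distance` and `nearest_label`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
from statistics import mean, pvariance


def relevance_weights(X, y):
    """Heuristic feature relevances (an assumption, not a published algorithm):
    between-class variance / within-class variance, normalized to sum to 1."""
    classes = sorted(set(y))
    weights = []
    for j in range(len(X[0])):
        # Collect the values of feature j separately for each class.
        by_class = {
            c: [row[j] for row, label in zip(X, y) if label == c]
            for c in classes
        }
        between = pvariance([mean(v) for v in by_class.values()])
        within = mean(pvariance(v) for v in by_class.values())
        weights.append(between / (within + 1e-12))  # guard against division by zero
    total = sum(weights)
    return [w / total for w in weights]


def weighted_distance(a, b, weights):
    """Euclidean distance in which each squared difference is scaled by a weight."""
    return sum(w * (p - q) ** 2 for w, p, q in zip(weights, a, b)) ** 0.5


def nearest_label(query, X, y, weights):
    """Label of the training point closest to `query` under the weighted metric."""
    best = min(range(len(X)), key=lambda i: weighted_distance(query, X[i], weights))
    return y[best]


# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
X = [(0.0, 10.0), (0.2, -8.0), (0.1, 3.0), (1.0, -9.0), (1.2, 7.0), (1.1, -2.0)]
y = ["A", "A", "A", "B", "B", "B"]
query = (0.1, -9.0)  # looks like class A on the informative feature

uniform = [0.5, 0.5]
learned = relevance_weights(X, y)

print("uniform metric:", nearest_label(query, X, y, uniform))
print("adapted metric:", nearest_label(query, X, y, learned))
```

With the uniform metric, the noisy second feature dominates the distance and the query is assigned to its nearest neighbor from class B. Once the weights are adapted to the labels, almost all weight falls on the first feature and the nearest neighbor comes from class A. Established metric learning methods optimize such weights (or a full matrix) against an explicit cost function rather than using a fixed heuristic like this one.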