Master’s course 1000-719bMSB
- University of Warsaw USOS web page
- Prerequisites: statistical analysis or equivalent; R/Python programming and data analysis.
We examine modern challenges in modeling and understanding complex biological systems through data. High-throughput molecular measurements have necessitated the development and application of statistics and machine learning, giving rise to computational biology. Microarray and sequencing technologies enable us to quantify how complex systems respond to, and are influenced by, experimental and external conditions. This may lead to a better understanding of the fundamental organizational principles and functions of molecules and cells. Lately, there have been interesting developments in single-cell analysis, spatial genomics, imaging, and other areas that involve higher resolutions, scales, and complexities. In this course, we study exploratory data analysis, statistical learning, and neural networks designed specifically for such biological studies. A good understanding of statistics and programming is a prerequisite. Students will program in R and Python, read primary literature weekly, and complete data analysis projects.
Lectures
Computer Labs
Homework Assignments
Homework is assigned throughout the semester and is presented in the Lab Notebooks.
When you have completed all the homework problems, upload your R Markdown and HTML files, or your Python Jupyter Notebook, with outputs and graphics included (e.g., show figures directly in the notebook). Additionally, save your figures as PDF/JPEG files.
Upload those files to your GitHub account. Keep one repository for the course, create a separate directory for each homework, and name your figures “yourlastname_problem1.pdf” and so on. When done uploading, add https://github.com/ncchung as your collaborator.
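As a rough illustration of the layout described above, the following Python sketch builds a figure path inside a per-homework directory. The helper name `figure_path`, the `homework<N>` directory naming, and the lowercasing of the last name are assumptions made for this example only; the course itself prescribes just one directory per homework and the “yourlastname_problem1.pdf” file pattern.

```python
from pathlib import Path


def figure_path(repo_root, homework, lastname, problem, ext="pdf"):
    """Return the path for one figure, creating the homework directory if needed.

    Assumptions (not prescribed by the course): directories are named
    "homework<N>" and the last name is lowercased.
    """
    if ext not in ("pdf", "jpeg", "jpg"):
        raise ValueError("figures should be saved as PDF or JPEG")
    # One directory per homework inside the single course repository.
    hw_dir = Path(repo_root) / f"homework{homework}"
    hw_dir.mkdir(parents=True, exist_ok=True)
    # File name follows the "yourlastname_problem1.pdf" pattern.
    return hw_dir / f"{lastname.lower()}_problem{problem}.{ext}"
```

The returned path can then be handed to whatever plotting call you use to write the file, for example Matplotlib's `savefig` in a Python notebook.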
Classnotes
Students will be assigned to write 2-page summaries of the course materials for the upcoming week. These classnotes must be in your own words; do not copy or plagiarize from any source. The notes will be shared with all students. Email your classnotes by Sunday night, 23:00.
- Deep Learning: Binda
- Interpretability of ML: Kraszewski
Textbooks
- An Introduction to Statistical Learning with Applications in R (ISL) by James, Witten, Hastie and Tibshirani
- Dive into Deep Learning (D2L) by Zhang, Lipton, Li, and Smola
Learn R and Python
- Codecademy
- DataCamp
- Coursera Data Science Specialization by JHU
- PyTorch Tutorials
- R for Data Science Book by Hadley Wickham
- Yet Another R Primer by John Storey
Readings
- Austin, Dialsingh, Altman 2014 ISAS: Multiple Hypothesis Testing Review
- Storey and Tibshirani 2003 PNAS: false discovery rates and q-value
- Wall, Rechtsteiner, Rocha 2014: SVD PCA Review
- Leek and Storey 2007 PLoS Genetics: Surrogate Variable Analysis
- Leek et al. 2010 NRG: Batch effect Review
- Chung and Storey 2015 Bioinformatics: Jackstraw with PCA
- Kursa and Rudnicki 2010 JSS: Boruta
- Zheng et al. 2017 Nature Comm: scRNA-seq 10X Genomics
- Chung 2020 Bioinformatics: Jackstraw for Clustering
- Macosko et al. 2015 Cell: scRNA-seq nanoliter droplets
- Butler et al. 2018 Nature Biotech: Seurat
- Luecken and Theis 2018 MSB: Single Cell RNA-seq Review
- Eraslan et al. 2019 NRG: Deep learning for Genomics Review
- Stahl et al. 2016 Science: Spatial Transcriptomics
- Trapnell et al. 2014 Nature Biotech: pseudotemporal ordering of single cells
- Simonyan et al. 2013 ICLR: vanilla saliency map
- Murdoch et al. 2019 PNAS: interpretable machine learning
- Rudin 2019 Nature: Stop explaining black box ML
- Papadimitroulas et al. 2021 PhysMed: deep learning radiomics Review