Master’s course 1000-719bMSB
- University of Warsaw USOS web page
- Prerequisite: Statistical analysis or equivalents. R/Python programming and data analysis.
We examine modern challenges in modeling and understanding complex biological systems through data. High-throughput molecular measurements have necessitated the development and application of statistics and machine learning, giving rise to computational biology. Microarray and sequencing technologies enable us to quantify how complex systems respond to, and are influenced by, experimental and external conditions. This may lead to a better understanding of the fundamental organizational principles and functions of molecules and cells. Lately, there have been interesting developments in single-cell analysis, spatial genomics, imaging, and other areas that involve higher resolutions, scales, and complexities. In this course, we study exploratory data analysis, statistical learning, and neural networks that are specifically designed for such biological studies. A good understanding of statistics and programming is a prerequisite. Students will program in R and Python, read primary literature weekly, and complete data analysis projects.
Lectures
- Week 1 Introduction and exploratory data analysis
- Week 2 Multiple hypothesis tests
- Week 3 Latent variable models and dimension reduction
- Week 4 Batch effects, technical variables, and unwanted variation
- Week 5 Empirical Bayes, shrinkage, and SVA
- Week 6 Statistical tests and feature selection in unsupervised learning
- Week 7 Single cell biology and single cell RNA sequencing
- Week 8 scRNA-seq analysis and cellular populations
- Week 9a Integration and multiplexing of single cell RNA-seq
- Week 9b Cellular trajectories in scRNA-seq
Computer Labs
- Week 3 Exploratory data analysis, dimension reduction, and latent variable models
- Week 4-5 Batch effects, technical variables, and unwanted variation
- Week 6 Statistical tests and feature selection in unsupervised learning
Homework Assignments
Homework is assigned throughout the semester. The assignments are embedded in the Lab Notebooks, which are only partially presented during the lab, so you must read through each Markdown file to see all of the assignments. Write your own code.
When you have completed all the homework problems, upload your R or Python script (preferably Markdown, a Jupyter notebook, or a similar format) and PDF/PNG files to your GitHub account. Make sure your figures are named “yourlastname_problem1.pdf” and so on. When you are done uploading, add your instructor as a collaborator.
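The figure naming convention above can be sketched in Python. This is only an illustration: the helper name `figure_filename` is hypothetical, and lowercasing the last name and stripping non-alphanumeric characters are assumptions, not course requirements.

```python
def figure_filename(last_name: str, problem: int, ext: str = "pdf") -> str:
    """Build a name such as 'kowalski_problem1.pdf' (hypothetical helper)."""
    ext = ext.lower().lstrip(".")
    if ext not in {"pdf", "png"}:
        raise ValueError("figures are expected as PDF or PNG files")
    if problem < 1:
        raise ValueError("problem numbers start at 1")
    # Assumption: keep only letters and digits, in lowercase.
    cleaned = "".join(ch for ch in last_name.lower() if ch.isalnum())
    if not cleaned:
        raise ValueError("last name must contain at least one letter or digit")
    return f"{cleaned}_problem{problem}.{ext}"


if __name__ == "__main__":
    # 'Kowalski' is a placeholder surname.
    for number in (1, 2, 3):
        print(figure_filename("Kowalski", number))
```

If you plot with matplotlib, the returned string could be passed to `plt.savefig(...)`; R users might build the same name with `paste0()`.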
Classnotes
Students will be assigned to write 2-page summaries of the course materials for the upcoming week. These class notes must be in your own words; do not copy or plagiarize from any source. The notes will be shared with all students. Email your class notes by Saturday night at 23:00.
Textbooks and Resources
- An Introduction to Statistical Learning with Applications in R (ISL) by James, Witten, Hastie and Tibshirani
- Dive into Deep Learning (D2L) by Zhang, Lipton, Li, and Smola
R materials
- Yet Another R Primer (R Primer) by John D. Storey
- Coursera Data Science Specialization by JHU
- R for Data Science by Hadley Wickham
Required Readings
- Austin, Dialsingh, Altman 2014 ISAS: Multiple Hypothesis Testing Review
- Butler 2018 NatureBiotech: Seurat
- Chung and Storey 2015 Bioinformatics: Jackstraw with PCA
- Chung 2020 Bioinformatics: Jackstraw for Clustering
- Eraslan et al. 2019 NRG: Deep learning for Genomics Review
- Kursa and Rudnicki 2010 JSS: Boruta
- Leek and Storey 2007 Plos Genetics: SVA
- Leek et al. 2010 NRG: Batch effect Review
- Luecken and Theis 2019 MSB: Single Cell RNA-seq Review
- Maaskola et al. 2018 bioRxiv: Spatial Transcriptome Decomposition
- Macosko et al. 2015 Cell: ScRNA-seq nanoliter droplets
- Papadimitroulas et al. 2021 PhysMed: Deep learning radiomics Review
- Stahl et al. 2016 Science: Spatial Transcriptomics
- Storey and Tibshirani 2003 PNAS: qvalue
- Trapnell et al. 2014 Nature Biotech: pseudotemporal ordering of single cells
- Wall, Rechtsteiner, Rocha 2003: SVD PCA Review
- Way and Greene 2018 PSB: VAE in gene expression
- Zheng et al. 2017 Nature Comm: scRNA-seq 10XGenomics
- Zou, Hastie, Tibshirani 2006 JCGS: Sparse PCA