# Advanced Topics in Modeling and Data Analysis (a.k.a. Statistical Machine Learning)

### STATS 605, Fall 2017

Schedule: Mondays and Wednesdays, 10:10 - 11:30am, 296 Weiser Hall

Instructor: Shuheng Zhou
Office Hours: Mondays 4-5pm at 445B West Hall.

GSI: Yanzhen Deng (dengyz@umich.edu)
Office Hours: Fridays 1:30 - 3:00pm
Location: Science Learning Center (SLC, 1720 Chemistry).

Course description

Advanced Topics in Modeling and Data Analysis is a graduate-level course in statistical machine learning. It assumes students have taken Multivariate and Categorical Data Analysis (Stats 601 or its equivalent, where dimensionality reduction, classification, and clustering methods are covered in full) and Mathematical Statistics (Casella and Berger 2002, or Wasserman 2004). The course emphasizes statistical analysis and methodology for handling large data sets.

We aim to develop both the intuition and the theoretical foundation for each method we study. Therefore, statistical convergence, consistency, and computational analysis are presented together with practical aspects of methodology throughout our lectures. We also introduce advanced mathematical, computational, and algorithmic tools such as concentration of measure, elements of convex optimization, and probabilistic methods which are essential to address challenges arising from modern data sets.

Textbooks

Most of the topics will come from the first three books on this list, as well as a list of recent research papers. The other books on the list provide useful references.

1. Larry Wasserman (2006). All of Nonparametric Statistics. Springer Series in Statistics.

2. Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009). The Elements of Statistical Learning: Second Edition (ESL).

3. Trevor Hastie, Robert Tibshirani, Martin J. Wainwright (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC Press, Series in Statistics and Applied Probability.

4. Peter Bühlmann and Sara van de Geer (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics.

5. Larry Wasserman (2004). All of Statistics: A Concise Course in Statistical Inference. Springer Series in Statistics.

Schedule
| Week | Date | Lecture | Topic | Handouts / Papers |
|---|---|---|---|---|
| 1 | Sep 6 | 1 | Introduction/Linear models | Syllabus, Project Description |
| 2 | Sep 11, 13 | 2, 3 | Model Selection using the Basis Pursuit | Chp 2 - 4 in ESL, Tib96, Donoho06, CT05, CT07, MB06, HTW book Chp 1, 5, 10 |
| 3 | Sep 18, 20 | 4, 5 | Restricted Strong Convexity, Lasso consistency properties | MB06, BRT09, GR04, ZY06, Wainwright09, HTW book Chp 11 |
| 6 | Sep 25, 27 | 6, 7 | Graphical Lasso Algorithm, Kernel Smoothing, Linear Smoothers and Nonparametric Regression | MB06, BGA08, FHT08, HTW book Chp 9, Wasserman 06 Chp 4, 5 |
| 5 | Oct 2, 4 | 8, 9 | Kernel Density Estimation | Wasserman Chp 6, ESL Chp 6, Tsybakov book, Stone 80, 84 |
| 8 | Oct 9, 11 | 10, 11 | Local Polynomial Regression | Wasserman 06 Chp 5, Tsybakov Chp 1, Fan 92, 93, Loader96, Hastie and Loader 93 |
| 7 | Oct 18 | 12 | Multivariate Density Estimation, Minimax theory | Wasserman 06 Chp 4, 6, 7, Yu97 |
| 4 | Oct 23, 25 | 13, 14 | The Gaussian Graphical Model (GGM) and its estimation procedures; Matrix variate normal models | ESL Chp 17, Dempster72, Drton-Perlman 04, Yuan-Lin 07, BGA08, FHT08, RBLZ08, RWRY11, ZLW08, Zhou 2014, Hsieh et al. 2014 |
| 9 | Oct 30, Nov 1 | 15, 16 | Markov Properties on Graphs | Lauritzen book (96) |
| 10 | Nov 6, 8, 10 | 17, 18, 19 | Factorization, Graphical models as Exponential Families | Wainwright and Jordan monograph (2008) |
| 11 | Nov 13, 15 | | Project midterm report due | |
| 12 | Nov 20, 22 | 20, 21 | Multinomials and log-linear models; Tensor modeling, Minimax theory | Wasserman Lecture Notes, Yu97 |
| 13 | Nov 27, 29 | 22, 23 | Minimax theory | Wasserman Lecture Notes |
| 14 | Dec 4, 8 | 24, 25, 26 | Confidence regions and tests for high-dimensional models, Project Presentation | Wasserman 06 Chp 6, 7, Yu97, Tsybakov book, Chapter 2 |
| 15 | Dec 11 | 26, Final Presentation | Project Presentation | Final Report Due |

References

more to come....
[BGA08] Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data, O. Banerjee, L. El Ghaoui, A. d'Aspremont, Journal of Machine Learning Research, 9(Mar):485-516, 2008.

[BRT09] Simultaneous Analysis of Lasso and Dantzig Selector, Peter Bickel, Ya'acov Ritov and Alexander Tsybakov, Annals of Statistics, Vol. 37, No. 4, 1705-1732, 2009.

[Bre95] Better Subset Regression Using the Nonnegative Garrote, Leo Breiman, Technometrics, Vol. 37, No. 4, Nov. 1995.

[CT05] Decoding by Linear Programming, Emmanuel Candes and Terence Tao, IEEE Trans. Info. Theory 51 (2005), 4203-4215.

[CT06] Near Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?, Emmanuel Candes and Terence Tao, IEEE Trans. Info. Theory 52 (2006), 5406-5425.

[CT07] The Dantzig selector: statistical estimation when $p$ is much larger than $n$, Emmanuel Candes and Terence Tao, Annals of Statistics 35 (2007), 2313-2351.

[Don06] Compressed sensing, D. Donoho, IEEE Trans. Info. Theory, 52(4):1289-1306, 2006.

[Dempster72] Covariance Selection, A. P. Dempster, Biometrics 28 (March 1972), pp. 157-175.

[FHT08] Sparse inverse covariance estimation with the graphical lasso, Jerome Friedman, Trevor Hastie, and Robert Tibshirani, Biostatistics, Vol. 9, No. 3 (1 July 2008), pp. 432-441.

[FG94] The Risk Inflation Criterion for Multiple Regression, Dean P. Foster and Edward I. George, Annals of Statistics, Vol. 22, No. 4 (1994), pp. 1947-1975.

[vandeGBRD14] On asymptotically optimal confidence regions and tests for high-dimensional models, Sara van de Geer, Peter Bühlmann, Ya'acov Ritov and Ruben Dezeure, Annals of Statistics, Vol. 42, No. 3, 1166-1202, 2014.

[GR04] Persistence in high-dimensional linear predictor selection and the virtue of overparametrization, Eitan Greenshtein and Ya'acov Ritov, Bernoulli, Vol. 10, No. 6 (2004), pp. 971-988.

[LW08] Rodeo: Sparse, greedy nonparametric regression, John Lafferty and Larry Wasserman, Annals of Statistics, Vol. 36, No. 1 (2008), pp. 28-63.

[MB06] High dimensional graphs and variable selection with the Lasso, Nicolai Meinshausen and Peter Bühlmann, Annals of Statistics, 34(3):1436-1462, 2006.

[RWL10] High-dimensional Ising model selection using $\ell_1$-regularized logistic regression, P. Ravikumar, M. J. Wainwright and J. Lafferty, Annals of Statistics, Vol. 38, No. 3, pp. 1287-1319.

[Tib96] Regression shrinkage and selection via the lasso, R. Tibshirani, J. Royal Statist. Soc. B, Vol. 58, No. 1, pp. 267-288.

[Yu97] Assouad, Fano, and Le Cam, Bin Yu, Festschrift for Lucien Le Cam, D. Pollard, E. Torgersen, and G. Yang (eds), pp. 423-435, Springer-Verlag.

[Yuan-Lin 07] Model selection and estimation in the Gaussian graphical model, Ming Yuan and Yi Lin, Biometrika, Vol. 94, No. 1 (2007), pp. 19-35.

[Wai09] Sharp thresholds for noisy and high-dimensional recovery of sparsity using $\ell_1$-constrained quadratic programming (Lasso), M. J. Wainwright, IEEE Transactions on Information Theory, 55:2183-2202, May 2009.

[WR09] High-dimensional variable selection, Larry Wasserman and Kathryn Roeder, Annals of Statistics, Vol. 37, No. 5A, 2178-2201, 2009.

[ZZ13] Confidence intervals for low dimensional parameters in high dimensional linear models, Cun-Hui Zhang and Stephanie Zhang, Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 76, Issue 1, 217-242.

[ZY06] On Model Selection Consistency of Lasso, Peng Zhao and Bin Yu, Journal of Machine Learning Research, 7(Nov):2541-2567.

[ZLW08] Time varying undirected graphs, Shuheng Zhou, John Lafferty and Larry Wasserman, Machine Learning Journal, Vol. 80, No. 2-3, pp. 295-319, Sep. 2010.