Advanced Topics in Modeling and Data Analysis
A.k.a., Statistical Machine Learning
STATS 605, Fall 2017
Schedule: Mondays and Wednesdays, 10:10-11:30am, 296 Weiser Hall
Instructor: Shuheng Zhou
Office Hours: Mondays 4-5pm at 445B West Hall.
GSI: Yanzhen Deng (dengyz@umich.edu)
Office Hours: Fridays 1:30-3:00pm
Location: Science Learning Center (SLC, 1720 Chemistry).
Course description
Advanced Topics in Modeling and Data Analysis is a graduate-level course in statistical machine learning. It assumes students have taken Multivariate and Categorical Data Analysis (Stats 601 or its equivalent, in which dimensionality reduction, classification, and clustering methods are covered at full length) and Mathematical Statistics (Casella and Berger 2002, or Wasserman 2004). The course emphasizes statistical analysis and methodology for handling large data sets.
We aim to develop both the intuition and the theoretical foundation for each method we study. Therefore, statistical convergence, consistency, and computational analysis are presented together with practical aspects of methodology throughout our lectures. We also introduce advanced mathematical, computational, and algorithmic tools such as concentration of measure, elements of convex optimization, and probabilistic methods, which are essential for addressing the challenges arising from modern data sets.
Textbooks
Most of the topics will come from the first three books on this list, as well as a list of recent research papers. The other books on the list provide useful references.
1. Larry Wasserman (2006). All of Nonparametric Statistics. Springer Series in Statistics.
2. Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009). The Elements of Statistical Learning: Second Edition (ESL).
3. Trevor Hastie, Robert Tibshirani, Martin J. Wainwright (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC Press, Series in Statistics and Applied Probability.
4. Peter Bühlmann and Sara van de Geer (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics.
5. Larry Wasserman (2004). All of Statistics: A Concise Course in Statistical Inference. Springer Series in Statistics.

Schedule
Week | Date | Lecture | Topic | Handouts / Papers
1 | Sep 6 | 1 | Introduction/Linear models | Syllabus, Project Description
2 | Sep 11, 13 | 2, 3 | Model Selection using the Basis Pursuit | Chp 2-4 in ESL, Tib96, Donoho06, CT05, CT07, MB06, HTW book Chp 1, 5, 10
3 | Sep 18, 20 | 4, 5 | Restricted Strong Convexity, Lasso consistency properties | MB06, BRT09, GR04, ZY06, Wainwright09, HTW book Chp 11
6 | Sep 25, 27 | 6, 7 | Graphical Lasso Algorithm, Kernel Smoothing, Linear Smoothers and Nonparametric Regression | MB06, BGA08, FHT08, HTW book Chp 9, Wasserman 06 Chp 4, 5
5 | Oct 2, 4 | 8, 9 | Kernel Density Estimation | Wasserman Chapter 6, ESL Chp 6, Tsybakov book, Stone 80, 84
8 | Oct 9, 11 | 10, 11 | Local Polynomial Regression | Wasserman 06 Chp 5, Tsybakov Chp 1, Fan 92, 93, Loader96, Hastie and Loader 93
7 | Oct 18 | 12 | Multivariate Density Estimation, Minimax theory | Wasserman 06 Chp 4, 6, 7, Yu97
4 | Oct 23, 25 | 13, 14 | The Gaussian Graphical Model (GGM) and its estimation procedures; Matrix variate normal models | ESL Chp 17, Dempster72, DrtonPerlman 04, YuanLin 07, BGA08, FHT08, RBLZ08, RWRY11, ZLW08, Zhou 2014, Hsieh et al. 2014
9 | Oct 30, Nov 1 | 15, 16 | Markov Properties on Graphs | Lauritzen book (96)
10 | Nov 6, 8, 10 | 17, 18, 19 | Factorization, Graphical models as Exponential Families | Wainwright and Jordan monograph (2008)
11 | Nov 13, 15 | | Project midterm report due |
12 | Nov 20, 22 | 20, 21 | Multinomials and log-linear models; Tensor modeling, Minimax theory | Wasserman Lecture Notes, Yu97
13 | Nov 27, 29 | 22, 23 | Minimax theory | Wasserman Lecture Notes
14 | Dec 4, 8 | 24, 25, 26 | Confidence regions and tests for high-dimensional models, Project Presentation | Wasserman 06 Chp 6, 7, Yu97, Tsybakov book, Chapter 2
15 | Dec 11 | 26, Final Presentation | Project Presentation | Final Report Due
References
more to come....
[BGA08]

Model Selection Through Sparse Maximum Likelihood Estimation
for Multivariate Gaussian or Binary Data
O. Banerjee, L. El Ghaoui, A. d'Aspremont,
Journal of Machine Learning Research, 9(Mar):485-516, 2008.

[BRT09]

Simultaneous Analysis of Lasso and Dantzig Selector,
Peter Bickel, Ya'acov Ritov and Alexander Tsybakov,
Annals of Statistics, Vol. 37, No. 4, 1705-1732, 2009.

[Bre95]

Better Subset Regression Using the Nonnegative Garrote,
Leo Breiman, Technometrics, Vol. 37, No. 4, Nov., 1995.

[CT05]

Decoding by Linear Programming,
Emmanuel Candes and Terence Tao,
IEEE Trans. Inf. Theory 51 (2005), 4203-4215.

[CT06]

Near Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?
Emmanuel Candes and Terence Tao,
IEEE Trans. Inf. Theory 52 (2006), 5406-5425.

[CT07]

The Dantzig selector: statistical estimation when $p$ is much larger than $n$,
Emmanuel Candes and Terence Tao,
Annals of Statistics 35 (2007), 2313-2351.

[Don06]

Compressed sensing,
D. Donoho, IEEE Trans. Info. Theory, 52(4):1289-1306, 2006.

[Dempster72]

Covariance Selection,
A. P. Dempster, Biometrics 28 (March 1972), pp. 157-175.

[FHT08]

Sparse inverse covariance estimation with the graphical lasso,
Jerome Friedman, Trevor Hastie, and Robert Tibshirani,
Biostatistics, Vol. 9, No. 3 (1 July 2008), pp. 432-441.

[FG94]

The Risk Inflation Criterion for Multiple Regression,
Dean P. Foster and Edward I. George,
Ann. Statist. Volume 22, Number 4 (1994), pp. 1947-1975.

[vandeGBRD14]

On asymptotically optimal confidence regions and tests for high-dimensional models,
Sara van de Geer, Peter Bühlmann, Ya'acov Ritov and Ruben Dezeure,
Annals of Statistics, Volume 42, Number 3, 1166-1202, 2014.

[GR04]

Persistence in high-dimensional linear predictor selection and the virtue of overparametrization,
Eitan Greenshtein and Ya'acov Ritov,
Bernoulli Volume 10, Number 6 (2004), pp. 971-988.

[LW08]

Rodeo: Sparse, greedy nonparametric regression,
John Lafferty and Larry Wasserman, Ann. Statist., Vol. 36, No. 1 (2008), pp. 28-63.

[MB06]

High dimensional graphs and variable selection with the Lasso,
Nicolai Meinshausen and Peter Bühlmann,
Annals of Statistics, 34(3):1436-1462, 2006.

[RWL10]

High-dimensional Ising model selection using $\ell_1$-regularized logistic regression,
P. Ravikumar, M. J. Wainwright and J. Lafferty,
Annals of Statistics, Vol. 38, Number 3, pp. 1287-1319.

[Tib96]

Regression shrinkage and selection via the lasso, R. Tibshirani,
J. Royal. Statist. Soc. B, Vol. 58, No. 1, pages 267-288.

[Yu97]

Assouad, Fano, and Le Cam, Bin Yu, Festschrift for Lucien Le Cam,
D. Pollard, E. Torgersen, and G. Yang (eds), pp. 423-435, Springer-Verlag.

[YuanLin 07]

Model selection and estimation in the Gaussian graphical model,
Ming Yuan and Yi Lin, Biometrika Vol. 94, No. 1 (2007), pp. 19-35.

[Wai09]

Sharp thresholds for noisy and high-dimensional recovery of sparsity using $\ell_1$-constrained quadratic programming (Lasso),
M. J. Wainwright, IEEE Transactions on Information Theory, 55:2183-2202, May 2009.

[WR09]

High-dimensional variable selection,
Larry Wasserman and Kathryn Roeder,
Annals of Statistics 2009, Vol. 37, No. 5A, 2178-2201.

[ZZ13]

Confidence intervals for low dimensional parameters in high dimensional linear models,
Cun-Hui Zhang and Stephanie Zhang, Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 76, Issue 1, 217-242.

[ZY06]

On Model Selection Consistency of Lasso,
Peng Zhao and Bin Yu, J. Machine Learning Research, 7 (Nov), 2541-2567.

[ZLW08]

Time varying undirected graphs,
Shuheng Zhou, John Lafferty and Larry Wasserman,
Machine Learning Journal, Vol. 80, Numbers 2-3, Pages 295-319, Sep. 2010.

