SNPS
SNPS are a neat idea I picked up while interning at Google in the summer of 2007. Basically,
you report what you've been up to so that interested parties might contact you for collaboration.
Week of January 28
Goals
- Finish draft of UAI paper
- Finish work for ALEA conference
- Analyze behaviour of gen err CI as function of noise/signal
- Simulations for KShed
Interesting Stuff
- Read first two chapters of Neuro-Dynamic Programming by Bertsekas and Tsitsiklis
Finished
- First five sections of UAI paper
- Looked at behavior of gen err CI as function of noise/signal. Findings
were intuitive: as noise grew, so did the width. One thing to note is
that Yang's CV method had nearly identical growth.
- Yang and I ran some simulations for KShed relating to learning curves.
Our current approach didn't work very well. We have a new approach based on
online learning algorithms which we hope will work better.
- Finished and submitted ALEA stuff
- Some interesting papers from this week:
- A Comparison of Tight Generalization Error Bounds (Langford and Kaariainen)
- Beating the Hold-Out: Bounds for K-fold and Progressive Cross-Validation (Blum, Kalai, Langford)
- Tutorial on Practical Prediction Theory for Classification (Langford)
Week of February 4
Goals
- Make revisions to UAI paper
- More background reading on training bounds
- Test online learning approach to estimating learning curve
- Chapter 3 Neuro-Dynamic Programming
- Get machine learning group up and running on N. Campus
- Set up adaptive design group
- Modeling stuff for NODA
Finished
- Chapter 3 NDP
- Met with Harsh about the adaptive design group; arranged the following talks:
- Stochastic approximation by Anindya
- Bandit Problems by Yizao Wang
- More to come!
- Made revisions to UAI paper; still unhappy about conclusions and future work (see that draft here)
- Tested online approach to learning--works reasonably well, but needs more theoretical justification
- Some interesting stuff from the last two weeks
- The following papers were excellent reads:
- Convexity, classification, and risk bounds by Bartlett, Jordan and McAuliffe
- Theory of Classification: A Survey of Recent Advances by Boucheron, Bousquet and Lugosi
- Model Selection by Bootstrap Penalization for Classification by Magalie Fromont
- I also had an interesting chat with Natesh Pillai this week. Natesh is a great stats grad student at Duke University. He's
interested in collaborating on some learning theory stuff--should be a fantastic opportunity
Week of February 18
Goals
- More revisions to UAI paper
- Beg and plead for feedback from CS buddies:
- --Ajit Singh, a good friend from Google who puts up with my endless questions
- --Joelle Pineau, a mentor and collaborator at McGill University
- --Clay Scott, a great CS Prof. here at UMICH who has been
kind enough to provide insight and references
- Read a good part of Sylvain Arlot's thesis (not sure if I can link to this, but Google should find it if interested)
- Chapter 4 Neuro-Dynamic Programming
- Get machine learning group up and running on N. Campus (carry over from last time)
- Construct simulation framework to learn more about online-learning curve construction
- Modeling stuff for NODA (carry over from last time)
Finished
- Submitted the UAI paper which can be found here
- Read "Pathwise Stochastic Optimal Control" by Rogers (an interesting paper but not too helpful for current research)
- Finished chapter 4 from NDP
- Met with Erik Talvitie, a CS grad student here; he's agreed to help get the machine learning group up and running on N. Campus
Good News!
- A paper that JJ Prescott and I submitted to ALEA was accepted; I'll post a copy when it's finished
Week of March 3
Goals
- Keep working on Arlot's thesis
- More background reading on training bounds
- Construct simulation scenario for online learning
- Work on ALEA paper
Finished
- Worked on ALEA paper (testing random assignment of criminal cases to declination prosecutors)
- Toward the above end, I read a paper on Sequential Monte Carlo Methods for Statistical Analysis of Tables by Chen, Diaconis, Holmes, and Liu (JASA 2005). This was a great paper--useful for
small sample inference for contingency tables.
- Also read a paper on the mixing of Simulated Annealing and Importance Sampling by Neal (Statistics and Computing 2001)
- I went to Abbott's Close-Up convention in Colon, MI--this was great but impeded the rest of my work
Week of March 17
Goals
- Simulations for testing coverage properties of std. bootstrap
- Prepare for talk in Math 626 (Topic: Pathwise Stochastic Optimal Control)
- More background reading and simulation design for online learning
- Keep working on ALEA paper
Finished
- Finished sims for Stanford talk
- Incorporated new data into NODA dataset--data scrubbing is always a good time
- The online learning algorithm is still giving us trouble...
Week of April 06
It's been too long since I've updated this. Since the last update I've been doing a bit
of traveling and spending way too much time on a plane. The upside is that I was able to
eat kangaroo... so I guess there's that. Anywho, on to the goals...
Goals
- Finish paper for ALEA (or at least declination part)
- Give 811 talk
- Meet with Prof. Shedden about online learning algorithm--seems unstable
- Work on outline for CUD-Bound paper
- I came up with an idea for evaluating all possible classifications in polynomial time
for VC classes of fns; I'll post the idea here. Need to check if this is new or not
- Finish machine learning and adaptive design websites
Finished
All of the above except the adaptive design website...
Week of May 11th
Again, it's been too long since my last update. The UAI paper has been accepted; the reviewers
wanted more simulations, so I've been scrambling to get these done. We're adding 6 new data sets:
5 of these are from the UCI data repository and 1 is a particularly nasty simulated data set. I
also tried adding the double bootstrap as a possible competitor--this turned out to be too unstable for
small sample sizes. The reason for this is that the repeated subsampling reduced the effective
sample size too much [Example: Suppose we have 11 features and n = 30 training pts. The first
resample ==> ~ 19 unique points, the second resample ==> ~ 12 unique points, which leads
to severe overfitting.] Despite its poor performance with the standard percentile bootstrap and small samples,
I still plan on trying this with the CUD bound as a possible correction for conservatism.
For future simulations I think the m out of n bootstrap may also be a reasonable
thing to try. I have lots to finish this week, so let's get started.
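As a rough check on the unique-point counts in the double bootstrap example, here is a minimal sketch of the arithmetic. It assumes each resample draws uniformly with replacement, and it approximates the second level as a resample of the distinct points left by the first level (a simplification that ignores multiplicities). The function name `expected_unique` is made up for illustration.

```python
def expected_unique(n_distinct: int, n_draws: int) -> float:
    """Expected number of distinct points hit when drawing n_draws times,
    uniformly with replacement, from n_distinct equally likely points."""
    # A given point is missed by one draw with probability 1 - 1/n_distinct,
    # so it is missed by every draw with probability (1 - 1/n_distinct)**n_draws.
    p_missed = (1.0 - 1.0 / n_distinct) ** n_draws
    return n_distinct * (1.0 - p_missed)


n = 30  # training set size from the example

# First-level bootstrap: n draws from n distinct training points.
first = expected_unique(n, n)

# Assumption: treat the second level as k draws from the k distinct
# points that survived the first level (ignores repeated copies).
k = round(first)
second = expected_unique(k, k)

print(f"first resample:  about {round(first)} unique points")
print(f"second resample: about {round(second)} unique points")
```

Each level keeps roughly 63% of the distinct points it starts from, so with 11 features the second-level fit sees only about as many distinct points as it has features.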
Goals
- Write up second stage randomization tests for ALEA
- Analyze results from new UAI simulations
- Make talk for adaptive design group (Topic: Stochastic Optimal Control)