Some statistics-related things

collected here to share and for reference

R & Shiny

  • You don't have to know JavaScript to use JavaScript libraries in Shiny. A guide to htmlwidgets. (Mar 2019)

  • To use output values in a conditionalPanel, you must render them in the UI first. See this post. (Feb 2019)

  • Caching plots in Shiny is now possible (though not with Plotly yet). (Feb 2019)

Stats

  • The ridge solution can be written as a weighted average of the 2^p regression coefficient vectors fitted on all possible subsets of variables. See Leamer and Chamberlain (1976). (Nov 2019)

  • A brainteaser that I couldn't solve during an interview. Not really stats-related. (Oct 2019)

  • Don't use AUROC when evaluating predictive performance for rare events. The precision-recall curve handles multiple testing (through FDR) and better reflects the difficulty of the problem. See also a post by Jason Brownlee. (Jul 2019)

  • You cannot arbitrarily design full conditional distributions and expect them to be compatible (only when they are compatible can you use Brook's lemma). I was surprised how often this is glossed over. (numerical example, theory)

  • There is little reason to unconditionally favor Fisher's method (sum of negative log p-values) over, say, Edgington's method (sum of p-values) for combining p-values (although it is better known that Fisher's method and Bonferroni's method are good at picking out different alternatives). In some cases of small and distributed effects, the latter wins; see numerical example, theory. In particular, Edgington's method is optimal when the p-values under the alternative are truncated exponentials.

  • Don't include both log(x) and log^2(x) in your linear model; see this post.
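
As a rough illustration of the Fisher versus Edgington item above, here is a minimal Python sketch that computes both combined p-values with the standard library only. It is not the linked numerical example; the function names and the two sets of input p-values are made up for illustration, and the p-values are assumed to be independent, uniform under the null, and in (0, 1].

```python
import math


def fisher_combined(pvalues):
    """Fisher's method: -2 * sum(log p) is chi-square with 2n df under the null."""
    n = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # test statistic divided by 2
    # For even degrees of freedom 2n the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{n-1} (x/2)^i / i!
    return math.exp(-half_stat) * sum(
        half_stat**i / math.factorial(i) for i in range(n)
    )


def edgington_combined(pvalues):
    """Edgington's method: sum(p) follows the Irwin-Hall distribution under the null."""
    n = len(pvalues)
    s = sum(pvalues)
    # Irwin-Hall CDF evaluated at the observed sum (small sums are evidence
    # against the null).
    total = sum(
        (-1) ** k * math.comb(n, k) * (s - k) ** n
        for k in range(math.floor(s) + 1)
    )
    return total / math.factorial(n)


if __name__ == "__main__":
    # Hypothetical inputs: five moderately small p-values (distributed effect)
    # versus one very small p-value among unremarkable ones (concentrated effect).
    distributed = [0.2] * 5
    concentrated = [0.001, 0.5, 0.5, 0.5, 0.5]
    for name, ps in [("distributed", distributed), ("concentrated", concentrated)]:
        print(name, "Fisher:", fisher_combined(ps), "Edgington:", edgington_combined(ps))
```

With these made-up inputs, Edgington's method gives the smaller combined p-value in the distributed case and Fisher's method gives the smaller one in the concentrated case, which is the pattern the item describes.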