Quick Links

Remove Unwanted Variation (RUV)
Classification Permutation Test
Cell type deconvolution
Randomized experiments
Microenvironment Microarrays
Social Media and Surveys
Stably Expressed Genes

Working and Recent Papers

Working Papers

Precise Unbiased Estimation in Randomized Experiments using Auxiliary Observational Data
J. A. Gagnon-Bartsch*, A. C. Sales*, E. Wu, A. E. Botelho, L. W. Miratrix, N. T. Heffernan
The Role of Scale in the Estimation of Cell-type Proportions
G. J. Hunt, J. A. Gagnon-Bartsch
Additional resources
A Critical Evaluation of Tracking Surveys with Social Media: A Case Study in Presidential Approval
R. A. Ferg, F. G. Conrad, and J. A. Gagnon-Bartsch
Additional resources

Recently Published / Available Online

Design-Based Covariate Adjustments in Paired Experiments
E. Wu and J. A. Gagnon-Bartsch
Journal of Educational and Behavioral Statistics, 2020
Automatic Transformation and Integration to Improve Visualization and Discovery of Latent Effects in Imaging Data
G. J. Hunt, M. A. Dane, J. E. Korkola, L. M. Heiser, J. A. Gagnon-Bartsch
Journal of Computational and Graphical Statistics, 2020
Additional resources
Stably expressed genes in single-cell RNA-sequencing
J. Deeke and J. A. Gagnon-Bartsch
Journal of Bioinformatics and Computational Biology, 2020
Additional resources
The Classification Permutation Test: A Flexible Approach to Testing for Covariate Imbalance in Observational Studies
J. A. Gagnon-Bartsch and Y. Shem-Tov
The Annals of Applied Statistics, 2019
Additional resources
Social Media as an Alternative to Surveys of Opinions about the Economy
F. G. Conrad*, J. A. Gagnon-Bartsch*, R. A. Ferg, M. Schober, J. Pasek, and E. Hou
Social Science Computer Review, 2019
Additional resources
A new normalization for the Nanostring nCounter gene expression assay
R. Molania, J. A. Gagnon-Bartsch, A. Dobrovic, and T. P. Speed
Nucleic Acids Research, 2019
scMerge: Integration of multiple single-cell transcriptomics datasets leveraging stable expression and pseudo-replication
Y. Lin, S. Ghazanfar, K. Wang, J. A. Gagnon-Bartsch, K. K. Lo, X. S. Su, Z.-G. Han, J. T. Ormerod, T. P. Speed, P. Yang, and J. Y. H. Yang
Proceedings of the National Academy of Sciences, 2019

Projects

Remove Unwanted Variation (RUV)

About

RUV is a set of methods originally developed to remove batch effects and other unwanted variation from gene expression data. More generally, RUV attempts to adjust high-dimensional data for unobserved confounders by making use of negative controls and replicates. A negative control is a variable that is known a priori to be (1) unaffected by the factor of interest and (2) affected by the unobserved confounders. Negative controls and replicates can be used to help identify unwanted variation and separate it from the variation of interest, even when the wanted and unwanted variation are correlated, and even when the factors causing the unwanted variation are unknown.
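As a rough illustration of how negative controls can help, the sketch below simulates one gene of interest and six negative control genes that share a hidden batch effect correlated with the factor of interest. It estimates a single unwanted factor from the negative controls and then includes that factor as a regression covariate. This is a simplified, pure-Python toy in the spirit of the description above, not the interface of the RUV R package; the function names, the simulated data, and the choice of a single unwanted factor are all assumptions made for illustration.

```python
import random

def first_left_singular_vector(rows, iters=50, seed=0):
    """Leading left singular vector of a matrix (list of rows), by power iteration."""
    n, m = len(rows), len(rows[0])
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]  # random start, not constant
    for _ in range(iters):
        u = [sum(rows[i][j] * v[i] for i in range(n)) for j in range(m)]  # M^T v
        v = [sum(rows[i][j] * u[j] for j in range(m)) for i in range(n)]  # M u
        norm = sum(a * a for a in v) ** 0.5
        v = [a / norm for a in v]
    return v

def least_squares(design, y):
    """Solve the normal equations (D^T D) b = D^T y by Gauss-Jordan elimination."""
    p = len(design[0])
    aug = []
    for i in range(p):
        row = [sum(d[i] * d[j] for d in design) for j in range(p)]
        row.append(sum(d[i] * yi for d, yi in zip(design, y)))
        aug.append(row)
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

# Hypothetical simulated data: 8 samples, a factor of interest x, and a
# hidden batch variable w that is correlated with x.
rng = random.Random(0)
x = [0, 0, 0, 1, 0, 1, 1, 1]
w = [-1, -1, -1, -1, 1, 1, 1, 1]

# Negative controls: affected by w but not by x.
alpha_controls = [2.0, -1.5, 3.0, 1.0, -2.0, 2.5]
controls = [[a * wi + rng.gauss(0, 0.1) for a in alpha_controls] for wi in w]

# Gene of interest: true effect of x is 2.0, batch effect is 3.0.
gene = [2.0 * xi + 3.0 * wi + rng.gauss(0, 0.1) for xi, wi in zip(x, w)]

# Step 1: estimate the unwanted factor from the column-centered negative controls.
col_means = [sum(col) / len(col) for col in zip(*controls)]
centered = [[v - m for v, m in zip(row, col_means)] for row in controls]
w_hat = first_left_singular_vector(centered)

# Step 2: compare the coefficient on x without and with the estimated factor.
naive = least_squares([[1.0, xi] for xi in x], gene)[1]
adjusted = least_squares([[1.0, xi, wh] for xi, wh in zip(x, w_hat)], gene)[1]

print("naive estimate:   ", round(naive, 2))
print("adjusted estimate:", round(adjusted, 2))
```

In this toy setup the naive estimate absorbs part of the batch effect because x and w are correlated, while the adjusted estimate should land close to the simulated effect of 2.0, even though w itself is never observed.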

Resources

Main Project Page
Selected Manuscripts
CRAN R package
Docker image on Docker Hub. Includes shiny app and tutorial.

Balance Testing

About

In many studies in the social sciences and medicine, the researcher does not control treatment assignment and may instead rely on natural experiments or matching methods as a substitute for experimental randomization. In such cases, it is helpful to check whether observed covariates are balanced across treatment conditions. The Classification Permutation Test (CPT) is a covariate balance test that first trains a classifier to distinguish treated units from control units and then, using permutation inference, determines whether the classifier is able to do so better than would be expected by chance.
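The two steps described above can be sketched in a few lines. The example below uses a nearest-centroid classifier and in-sample accuracy as the test statistic; both are illustrative assumptions chosen to keep the sketch dependency-free, and the function names and simulated data are hypothetical. It is not the authors' implementation.

```python
import random

def centroid(rows):
    """Coordinate-wise mean of a list of covariate vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def in_sample_accuracy(covs, treated):
    """Train a nearest-centroid classifier and score it on the same data."""
    c1 = centroid([c for c, t in zip(covs, treated) if t == 1])
    c0 = centroid([c for c, t in zip(covs, treated) if t == 0])
    preds = [1 if sq_dist(c, c1) < sq_dist(c, c0) else 0 for c in covs]
    return sum(p == t for p, t in zip(preds, treated)) / len(treated)

def cpt_p_value(covs, treated, n_perm=500, seed=0):
    """Permutation p-value: how often do shuffled labels classify as well?"""
    rng = random.Random(seed)
    observed = in_sample_accuracy(covs, treated)
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between covariates and treatment
        if in_sample_accuracy(covs, labels) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)

# Hypothetical data: 20 treated and 20 control units with two covariates.
rng = random.Random(1)
treated = [1] * 20 + [0] * 20
# Imbalanced: treated units have shifted covariates.
imbalanced = [[rng.gauss(3.0 * t, 1.0), rng.gauss(3.0 * t, 1.0)] for t in treated]
# Balanced: covariates are drawn independently of treatment.
balanced = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in treated]

p_imbalanced = cpt_p_value(imbalanced, treated)
p_balanced = cpt_p_value(balanced, treated)
print("p-value, imbalanced covariates:", p_imbalanced)
print("p-value, balanced covariates:  ", p_balanced)
```

Because the classifier is retrained on every permutation, the comparison stays fair even when the in-sample accuracy is optimistic: the permuted accuracies are optimistic in the same way.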

Resources

Estimating Cell Type Proportions

About

Biological tissues are typically composed of several distinct cell types. dtangle is a method to estimate, from gene expression data, the proportions of the different cell types that make up a tissue sample. (This is sometimes referred to as "cell type deconvolution.") Like other deconvolution methods, dtangle requires reference expression profiles for each cell type, as well as a list of marker genes that are expressed primarily in one cell type. dtangle is unique in its treatment of scale: gene expression values are considered on both linear and log scales, with the dual aims of a scientifically plausible mixing model and a statistically robust fitting procedure.
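To see why both scales can matter, consider a toy version of the problem. If expression mixes linearly and a marker gene g is expressed only in cell type k, then the mixture expression is Y_g = p_k * Z_kg, so log2(Y_g) - log2(Z_kg) = log2(p_k). The sketch below averages these differences over marker genes on the log scale and renormalizes the results to sum to one. It is a simplified illustration under those idealized assumptions, not the estimator implemented in the dtangle package; the function name and all numbers are hypothetical.

```python
import math

def estimate_proportions(mixture, reference, markers):
    """Estimate cell type proportions from marker genes.

    mixture:   {gene: expression in the mixed sample} (linear scale)
    reference: {cell_type: {gene: expression in the pure cell type}}
    markers:   {cell_type: [genes expressed only in that cell type]}
    """
    raw = {}
    for cell_type, genes in markers.items():
        # Average log-scale differences between mixture and reference.
        diffs = [math.log2(mixture[g]) - math.log2(reference[cell_type][g])
                 for g in genes]
        raw[cell_type] = 2 ** (sum(diffs) / len(diffs))
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}

# Hypothetical reference profiles for two cell types and four genes.
reference = {
    "A": {"g1": 100.0, "g2": 40.0, "g3": 0.0, "g4": 0.0},
    "B": {"g1": 0.0, "g2": 0.0, "g3": 80.0, "g4": 20.0},
}
markers = {"A": ["g1", "g2"], "B": ["g3", "g4"]}

# Mix the references linearly with true proportions 0.3 and 0.7.
true_p = {"A": 0.3, "B": 0.7}
mixture = {g: sum(true_p[k] * reference[k][g] for k in reference)
           for g in ["g1", "g2", "g3", "g4"]}

estimates = estimate_proportions(mixture, reference, markers)
print(estimates)
```

With noise-free data and perfect markers, the estimates recover the true proportions up to floating-point error. Real data violate both idealizations, which is where the choice of scale and the robustness of the fitting procedure become important.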

Resources

dtangle: accurate and robust cell type deconvolution
- Docker image
The Role of Scale in the Estimation of Cell-type Proportions
- Docker image
GitHub Project Page (Includes vignettes, datasets, and scripts to reproduce the analyses in the papers.)
CRAN R package
Greg Hunt's web page

Covariate Adjustment in Randomized Experiments

About

Two advantages of randomized trials are (1) potential confounding variables are largely balanced across treatment conditions, and (2) design-based inference may be used, in which statistical assumptions are largely justified by the physical act of randomization. Randomization does not balance potential confounders perfectly, however, and there are typically small observed imbalances in baseline covariates. Adjusting for these imbalances can improve the precision of treatment effect estimates, but methods that do so are not always design-based. We are working on developing new design-based estimators to fill this gap.

Resources

Microenvironment Microarrays

About

The immediate physical and biochemical surroundings of a cell, the cellular microenvironment, are an important component of many fundamental cell- and tissue-level processes and are implicated in many diseases and dysfunctions. To study perturbations of cellular microenvironments, a novel image-based cell-profiling technology called the microenvironment microarray (MEMA) has recently been employed. We are helping to develop an analysis pipeline for MEMA data, which will allow for the integration of image features, as well as adjustments for spatial and other technical artifacts in the data.

Resources

Automatic Transformation and Integration to Improve Visualization and Discovery of Latent Effects in Imaging Data
- Docker image
rrscale CRAN R package
rrscale project page on GitHub
The LINCS and MEP LINCS web pages
Synapse (data)
Greg Hunt's web page

Social Media and Surveys

About

Traditional surveys, such as those used for political polling and economic research, are expensive and suffer increasingly from non-response. Social media has been suggested as an alternative data source to augment or even replace survey data. We are exploring the feasibility of using social media data in this manner.

Resources

Stably Expressed Genes

About

Genes with relatively stable expression are of biological interest and are also useful for normalization (see RUV above). We are interested in discovering such genes, and especially interested in genes that are stable even at the level of single cells. At the single-cell level, the notion of stability is somewhat ambiguous; for example, a gene could be stable in terms of the absolute quantity of transcripts, or in terms of concentration (transcript quantity proportional to cell size). Our goal is to identify different sets of genes that satisfy different notions of stability.
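The ambiguity can be made concrete with a small made-up example. Below, one hypothetical gene has a constant transcript count in every cell, while another has a count that scales with cell size. Using the coefficient of variation as an illustrative stability measure (an assumption made here, not a measure taken from the papers), each gene is perfectly stable under one notion and unstable under the other.

```python
from statistics import mean, pstdev

def cv(values):
    """Coefficient of variation: standard deviation relative to the mean."""
    return pstdev(values) / mean(values)

# Hypothetical cells of increasing size (arbitrary units).
cell_size = [1.0, 2.0, 4.0]

gene_a = [100.0, 100.0, 100.0]  # same absolute count in every cell
gene_b = [50.0, 100.0, 200.0]   # count proportional to cell size

# Concentration = transcript count divided by cell size.
conc_a = [count / size for count, size in zip(gene_a, cell_size)]
conc_b = [count / size for count, size in zip(gene_b, cell_size)]

print("gene A: CV of counts =", cv(gene_a), " CV of concentration =", cv(conc_a))
print("gene B: CV of counts =", cv(gene_b), " CV of concentration =", cv(conc_b))
```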

Johann Gagnon-Bartsch
