Remove Unwanted Variation, 2-step

Description

The RUV-2 algorithm. Estimates and adjusts for unwanted variation using negative controls.

Usage

RUV2(Y, X, ctl, k, Z = 1, eta = NULL, fullW = NULL, inputcheck = TRUE)

Arguments

Y

The data. An m by n matrix, where m is the number of samples and n is the number of features.

X

The factor(s) of interest. An m by p matrix, where m is the number of samples and p is the number of factors of interest. Very often p = 1.

ctl

The negative controls. A logical vector of length n.

k

The number of unwanted factors to use. Can be 0.

Z

Any additional covariates to include in the model. Either an m by q matrix of covariates, or simply 1 (the default) for an intercept term.

eta

Gene-wise (as opposed to sample-wise) covariates. These covariates are adjusted for by RUV-1 before any further analysis proceeds. A matrix with n columns.

fullW

Can be included to speed up execution. Pass the fullW element returned by a previous call of RUV2.

inputcheck

Perform a basic sanity check on the inputs, and issue a warning if there is a problem.

Details

Implements the RUV-2 algorithm as described in Gagnon-Bartsch and Speed (2012), using the SVD as the factor analysis routine. Unwanted factors W are estimated using control genes. Y is then regressed on the variables X, Z, and W.
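
The two steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper resid_on is hypothetical, Z is taken to be an intercept only and is handled by residualizing Y and X on it first, eta is ignored, and the residual degrees of freedom are assumed to be m - p - q - k.

```r
## Minimal base-R sketch of the RUV-2 idea (not the package's code).
set.seed(1)
m <- 20; n <- 200; nc <- 50; k <- 2
ctl <- rep(FALSE, n); ctl[1:nc] <- TRUE
X <- matrix(rep(c(0, 1), each = m / 2), m, 1)   # one factor of interest (p = 1)
Z <- matrix(1, m, 1)                            # intercept only (q = 1)
beta <- matrix(rnorm(n), 1, n); beta[, ctl] <- 0
W0 <- matrix(rnorm(m * k), m, k)                # true unwanted factors
alpha0 <- matrix(rnorm(k * n), k, n)
Y <- X %*% beta + W0 %*% alpha0 + matrix(rnorm(m * n), m, n)

## Hypothetical helper: residuals of A after regressing on B.
resid_on <- function(A, B) A - B %*% solve(crossprod(B), crossprod(B, A))

## Assumed handling of Z: adjust both Y and X for Z up front.
Yz <- resid_on(Y, Z)
Xz <- resid_on(X, Z)

## Step 1: estimate W by an SVD of the negative control columns only.
W <- svd(Yz[, ctl])$u[, seq_len(k), drop = FALSE]

## Step 2: ordinary least squares of Y on X and W.
D <- cbind(Xz, W)
coefs <- solve(crossprod(D), crossprod(D, Yz))  # (p + k) by n
betahat <- coefs[1, , drop = FALSE]             # p by n
alpha <- coefs[-1, , drop = FALSE]              # k by n

## Feature-wise residual variances, with the assumed degrees of freedom.
df <- m - ncol(X) - ncol(Z) - k
sigma2 <- colSums((Yz - D %*% coefs)^2) / df
```

Because the negative controls are assumed to be unaffected by X, the leading singular vectors of their columns should mostly reflect unwanted variation, which is why W is estimated from them alone.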

Value

A list containing

betahat

The estimated coefficients of the factor(s) of interest. A p by n matrix.

sigma2

Estimates of the features' variances. A vector of length n.

t

t statistics for the factor(s) of interest. A p by n matrix.

p

P-values for the factor(s) of interest. A p by n matrix.

multiplier

The constant by which sigma2 must be multiplied in order to get an estimate of the variance of betahat.

df

The number of residual degrees of freedom.

W

The estimated unwanted factors.

alpha

The estimated coefficients of W.

byx

The coefficients in a regression of Y on X (after both Y and X have been "adjusted" for Z). Useful for projection plots.

bwx

The coefficients in a regression of W on X (after X has been "adjusted" for Z). Useful for projection plots.

X

X. Included for reference.

k

k. Included for reference.

ctl

ctl. Included for reference.

Z

Z. Included for reference.

fullW

Can be used to speed up future calls of RUV2.
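
As an illustration of how betahat, sigma2, multiplier, and df relate, the sketch below recomputes t statistics and p-values from the other elements. The list fit here is a hand-built stand-in with invented numbers, not output from RUV2, and the two-sided t-test is an assumption about how t and p are obtained.

```r
## Stand-in for a fitted object. Element names follow the Value list;
## the numbers are invented purely for illustration (p = 1, n = 3).
fit <- list(
  betahat    = matrix(c(0.5, -1.2, 0.1), 1, 3),
  sigma2     = c(1.0, 0.8, 1.5),
  multiplier = 0.08,
  df         = 27
)

## Estimated variance of betahat: multiplier times sigma2 (a p by n matrix).
varbetahat <- outer(as.vector(fit$multiplier), fit$sigma2)

## Assumed: t = betahat / standard error, with two-sided p-values on df
## residual degrees of freedom.
tstat <- fit$betahat / sqrt(varbetahat)
pval <- 2 * pt(-abs(tstat), df = fit$df)
```

With a real fit, the same arithmetic could be used to check the returned t and p, or to recompute them after replacing sigma2 with a different variance estimate.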

Examples

## Create some simulated data
m = 50
n = 10000
nc = 1000
p = 1
k = 20
ctl = rep(FALSE, n)
ctl[1:nc] = TRUE
X = matrix(c(rep(0,floor(m/2)), rep(1,ceiling(m/2))), m, p)
beta = matrix(rnorm(p*n), p, n)
beta[,ctl] = 0
W = matrix(rnorm(m*k),m,k)
alpha = matrix(rnorm(k*n),k,n)
epsilon = matrix(rnorm(m*n),m,n)
Y = X%*%beta + W%*%alpha + epsilon

## Run RUV-2
fit = RUV2(Y, X, ctl, k)

## Get adjusted variances and p-values
fit = variance_adjust(fit)