Remove Unwanted Variation, ridged inverse method

Description

The RUV-rinv algorithm. Estimates and adjusts for unwanted variation using negative controls.

Usage

RUVrinv(Y, X, ctl, Z = 1, eta = NULL, fullW0 = NULL, invsvd = NULL, lambda = NULL, k = NULL, l = NULL, randomization = FALSE, iterN = 1e+05, inputcheck = TRUE)

Arguments

Y

The data. An m by n matrix, where m is the number of samples and n is the number of features.

X

The factor(s) of interest. An m by p matrix, where m is the number of samples and p is the number of factors of interest. Very often p = 1.

ctl

The negative controls. A logical vector of length n.

Z

Any additional covariates to include in the model. Either an m by q matrix of covariates, or simply 1 (the default) for an intercept term.

eta

Gene-wise (as opposed to sample-wise) covariates. These covariates are adjusted for by RUV-1 before any further analysis proceeds. A matrix with n columns.

fullW0

Can be included to speed up execution.

invsvd

Can be included to speed up execution. Generally used when calling RUV(r)inv many times with different values of lambda.

lambda

Ridge parameter. If unspecified, an appropriate default will be used.

k

When calculating the default value of lambda, a call to RUV4 is made. This parameter specifies the value of k to use in that call. If unspecified, an appropriate default k will be used.

l

If lambda and k are both NULL, then k must be estimated using the getK routine. The getK routine only accepts a single-column X. If p > 1, l specifies which column of X should be used in the getK routine.

randomization

Whether the inverse-method variances should be computed using randomly generated factors of interest (as opposed to a numerical integral).

iterN

The number of random "factors of interest" to generate (used only when randomization=TRUE).

inputcheck

Perform a basic sanity check on the inputs, and issue a warning if there is a problem.
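The shapes described above can be sketched in base R as follows. This is only an illustration: the dimensions, the control indices in ctl_idx, and the batch covariate are hypothetical, and the explicit intercept column in Z rests on the assumption that an intercept is not added automatically when Z is supplied as a matrix.

```r
# Hypothetical toy dimensions, for illustration only
m <- 6    # number of samples (rows of Y)
n <- 10   # number of features (columns of Y)

# ctl must be a logical vector of length n.
# Here it is built from assumed column indices of the negative controls.
ctl_idx <- c(2, 5, 9)
ctl <- rep(FALSE, n)
ctl[ctl_idx] <- TRUE

# X: an m by p matrix for the factor of interest (p = 1, two groups of three samples)
X <- matrix(rep(c(0, 1), each = m / 2), nrow = m, ncol = 1)

# Z: an m by q matrix of additional covariates.
# Assumption: the intercept is included explicitly as a column of ones.
batch <- rep(c(0, 1), times = m / 2)
Z <- cbind(1, batch)
```

With objects of these shapes and an m by n data matrix Y, the call would take the form RUVrinv(Y, X, ctl, Z = Z).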

Details

Implements the RUV-rinv algorithm as described in Gagnon-Bartsch, Jacob, and Speed (2013). This function is essentially just a wrapper to RUVinv, but with a little extra code to calculate the default value of lambda.

Value

A list containing

betahat

The estimated coefficients of the factor(s) of interest. A p by n matrix.

sigma2

Estimates of the features' variances. A vector of length n.

t

t statistics for the factor(s) of interest. A p by n matrix.

p

P-values for the factor(s) of interest. A p by n matrix.

multiplier

The constant by which sigma2 must be multiplied in order to get an estimate of the variance of betahat.

df

The number of residual degrees of freedom.

W

The estimated unwanted factors.

alpha

The estimated coefficients of W.

byx

The coefficients in a regression of Y on X (after both Y and X have been "adjusted" for Z). Useful for projection plots.

bwx

The coefficients in a regression of W on X (after X has been "adjusted" for Z). Useful for projection plots.

X

X. Included for reference.

k

k. Included for reference.

ctl

ctl. Included for reference.

Z

Z. Included for reference.

fullW0

Can be used to speed up future calls of RUV4.

lambda

lambda. Included for reference.
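The relationship between sigma2, multiplier, and betahat can be sketched as below. The list named fit is a made-up stand-in, not real output; a real fit comes from RUVrinv. The sketch assumes p = 1, that multiplier is then a single number, and that the t statistics are compared against a t distribution with df degrees of freedom. None of these details is spelled out above, so treat this as an illustration rather than the package's exact computation.

```r
# Stand-in for a fitted object with p = 1 and n = 4 features.
# All numbers are invented for illustration.
fit <- list(
  betahat    = matrix(c(0.5, -1.2, 0.1, 2.0), nrow = 1),
  sigma2     = c(1.0, 0.8, 1.5, 1.1),
  multiplier = 0.09,
  df         = 20
)

# Estimated variance of betahat: sigma2 times multiplier, one value per feature
var_betahat <- fit$sigma2 * as.numeric(fit$multiplier)

# Standard errors, t statistics, and two-sided p-values (assumed t reference distribution)
se     <- sqrt(var_betahat)
t_stat <- as.numeric(fit$betahat) / se
p_val  <- 2 * pt(-abs(t_stat), df = fit$df)
```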

Examples

## Create some simulated data
m = 50
n = 10000
nc = 1000
p = 1
k = 20
ctl = rep(FALSE, n)
ctl[1:nc] = TRUE
X = matrix(c(rep(0,floor(m/2)), rep(1,ceiling(m/2))), m, p)
beta = matrix(rnorm(p*n), p, n)
beta[,ctl] = 0
W = matrix(rnorm(m*k),m,k)
alpha = matrix(rnorm(k*n),k,n)
epsilon = matrix(rnorm(m*n),m,n)
Y = X%*%beta + W%*%alpha + epsilon

## Run RUV-rinv
fit = RUVrinv(Y, X, ctl)

## Get adjusted variances and p-values
fit = variance_adjust(fit)
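Because the simulation sets beta to zero for the negative controls, one rough sanity check is to compare the p-values of control and non-control features. The sketch below repeats the simulation with a fixed seed (an addition for reproducibility) and uses only the documented p element of the result. It assumes the ruv package is installed and that variance_adjust keeps the p element in the returned list.

```r
if (requireNamespace("ruv", quietly = TRUE)) {
  set.seed(1)  # fixed seed, added here so the simulation is reproducible

  ## Same simulated data as in the example above
  m <- 50; n <- 10000; nc <- 1000; p <- 1; k <- 20
  ctl <- rep(FALSE, n)
  ctl[1:nc] <- TRUE
  X <- matrix(c(rep(0, floor(m / 2)), rep(1, ceiling(m / 2))), m, p)
  beta <- matrix(rnorm(p * n), p, n)
  beta[, ctl] <- 0
  W <- matrix(rnorm(m * k), m, k)
  alpha <- matrix(rnorm(k * n), k, n)
  epsilon <- matrix(rnorm(m * n), m, n)
  Y <- X %*% beta + W %*% alpha + epsilon

  ## Fit and adjust variances
  fit <- ruv::RUVrinv(Y, X, ctl)
  fit <- ruv::variance_adjust(fit)

  ## p is documented as a p by n matrix; with p = 1 it flattens to length n
  pvals <- as.numeric(fit$p)

  ## Controls have no true effect, non-controls do
  print(summary(pvals[ctl]))
  print(summary(pvals[!ctl]))
}
```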