R: Remove Unwanted Variation, ridged inverse method

Description

Implements the RUV-rinv algorithm, which estimates and adjusts for unwanted variation using negative controls.

Usage

RUVrinv(Y, X, ctl, Z = 1, eta = NULL, fullW0 = NULL, invsvd = NULL, lambda = NULL, k = NULL, l = NULL, randomization = FALSE, iterN = 1e+05, inputcheck = TRUE)

Arguments

`Y`	The data. An m by n matrix, where m is the number of samples and n is the number of features.
`X`	The factor(s) of interest. An m by p matrix, where m is the number of samples and p is the number of factors of interest. Very often p = 1.
`ctl`	The negative controls. A logical vector of length n.
`Z`	Any additional covariates to include in the model. Either an m by q matrix of covariates, or simply 1 (the default) for an intercept term.
`eta`	Gene-wise (as opposed to sample-wise) covariates. These covariates are adjusted for by RUV-1 before any further analysis proceeds. A matrix with n columns.
`fullW0`	Can be included to speed up execution.
`invsvd`	Can be included to speed up execution. Generally used when calling RUV(r)inv many times with different values of lambda.
`lambda`	Ridge parameter. If unspecified, an appropriate default will be used.
`k`	When calculating the default value of lambda, a call to RUV4 is made. This parameter specifies the value of k to use. Otherwise, an appropriate default k will be used.
`l`	If lambda and k are both NULL, then k must be estimated using the getK routine. The getK routine only accepts a single-column X. If p > 1, l specifies which column of X should be used in the getK routine.
`randomization`	Whether the inverse-method variances should be computed using randomly generated factors of interest (as opposed to a numerical integral).
`iterN`	The number of random "factors of interest" to generate (used only when randomization=TRUE).
`inputcheck`	Perform a basic sanity check on the inputs, and issue a warning if there is a problem.
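
The shapes required of `Y`, `X`, `ctl`, and `Z` can be sketched in base R as follows. This is only an illustration: the dimensions, the choice of control features, and the `batch` covariate are hypothetical, and including an explicit intercept column in a matrix-valued `Z` is an assumption that this page does not confirm.

```r
# Toy dimensions (hypothetical values, chosen only for illustration)
m <- 6    # samples
n <- 8    # features

# Y: m by n data matrix (samples in rows, features in columns)
Y <- matrix(rnorm(m * n), m, n)

# X: m by p matrix for the factor of interest (here p = 1)
X <- matrix(rep(c(0, 1), each = m / 2), m, 1)

# ctl: logical vector of length n; the first three features are
# treated as negative controls (hypothetical choice)
ctl_index <- c(1, 2, 3)
ctl <- rep(FALSE, n)
ctl[ctl_index] <- TRUE

# Z: m by q covariate matrix, used instead of the default Z = 1.
# Assumption: an intercept column is supplied explicitly alongside
# a hypothetical batch indicator.
batch <- rep(c(0, 1), times = m / 2)
Z <- cbind(1, batch)

# Shape checks mirroring the documented requirements
stopifnot(nrow(X) == nrow(Y), nrow(Z) == nrow(Y),
          is.logical(ctl), length(ctl) == ncol(Y))
```

Converting a set of control indices into a logical vector, as done for `ctl` here, keeps the input in the form the argument description asks for.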

Details

Implements the RUV-rinv algorithm as described in Gagnon-Bartsch, Jacob, and Speed (2013). This function is essentially just a wrapper to RUVinv, but with a little extra code to calculate the default value of lambda.
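
As a rough sketch of the difference between the default and a user-supplied ridge parameter, the calls below use only arguments listed under Usage. The simulated data and the value `lambda = 5` are hypothetical, and the block assumes the function is provided by the `ruv` package; it is skipped if that package is not installed.

```r
# Small simulated data set (hypothetical sizes, for speed)
set.seed(1)
m <- 20; n <- 500; nc <- 100; k <- 3
ctl <- rep(FALSE, n)
ctl[1:nc] <- TRUE
X <- matrix(rep(c(0, 1), each = m / 2), m, 1)
beta <- matrix(rnorm(n), 1, n)
beta[, ctl] <- 0
W <- matrix(rnorm(m * k), m, k)
alpha <- matrix(rnorm(k * n), k, n)
Y <- X %*% beta + W %*% alpha + matrix(rnorm(m * n), m, n)

# Assumption: RUVrinv is exported by the 'ruv' package
if (requireNamespace("ruv", quietly = TRUE)) {
  # Default: lambda (and, if needed, k) are chosen by the function
  fit_default <- ruv::RUVrinv(Y, X, ctl)

  # Explicit ridge parameter; 5 is an arbitrary illustrative value
  fit_manual <- ruv::RUVrinv(Y, X, ctl, lambda = 5)

  # The lambda used is returned with the fit, for reference
  print(fit_default$lambda)
  print(fit_manual$lambda)
}
```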

Value

A list containing

`betahat`	The estimated coefficients of the factor(s) of interest. A p by n matrix.
`sigma2`	Estimates of the features' variances. A vector of length n.
`t`	t statistics for the factor(s) of interest. A p by n matrix.
`p`	P-values for the factor(s) of interest. A p by n matrix.
`multiplier`	The constant by which `sigma2` must be multiplied in order to get an estimate of the variance of `betahat`.
`df`	The number of residual degrees of freedom.
`W`	The estimated unwanted factors.
`alpha`	The estimated coefficients of W.
`byx`	The coefficients in a regression of Y on X (after both Y and X have been "adjusted" for Z). Useful for projection plots.
`bwx`	The coefficients in a regression of W on X (after X has been "adjusted" for Z). Useful for projection plots.
`X`	`X`. Included for reference.
`k`	`k`. Included for reference.
`ctl`	`ctl`. Included for reference.
`Z`	`Z`. Included for reference.
`fullW0`	Can be used to speed up future calls of RUV4.
`lambda`	`lambda`. Included for reference.
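
The way `betahat`, `sigma2`, `multiplier`, and `df` fit together can be sketched in base R. The `fit_toy` list and its numbers are hypothetical stand-ins for a returned fit with p = 1, and the t statistic and two-sided p-value formulas are the conventional ones, assumed here for illustration; the function may compute its returned `t` and `p` differently in detail.

```r
# Hypothetical stand-in for a returned fit, with p = 1 and n = 4
fit_toy <- list(
  betahat    = matrix(c(0.5, -1.2, 0.1, 2.0), nrow = 1),
  sigma2     = c(1.0, 0.8, 1.5, 1.1),
  multiplier = 0.09,
  df         = 20
)

# Estimated variance of betahat: sigma2 scaled by the multiplier,
# as described for `multiplier`
var_betahat <- fit_toy$multiplier * fit_toy$sigma2

# Assumed conventional t statistics and two-sided p-values.
# With a single row in betahat, element-wise division by a
# length-n vector lines up feature by feature.
t_stat <- fit_toy$betahat / sqrt(var_betahat)
p_val  <- 2 * pt(-abs(t_stat), df = fit_toy$df)
```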

Examples

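## Create some simulated data
m = 50      # number of samples
n = 10000   # number of features
nc = 1000   # number of negative controls
p = 1       # number of factors of interest
k = 20      # number of unwanted factors
ctl = rep(FALSE, n)
ctl[1:nc] = TRUE   # the first nc features are negative controls
X = matrix(c(rep(0,floor(m/2)), rep(1,ceiling(m/2))), m, p)   # two-group indicator
beta = matrix(rnorm(p*n), p, n)   # true effects of X
beta[,ctl] = 0   # negative controls are unaffected by X
W = matrix(rnorm(m*k),m,k)   # unwanted factors
alpha = matrix(rnorm(k*n),k,n)   # coefficients of W
epsilon = matrix(rnorm(m*n),m,n)   # noise
Y = X%*%beta + W%*%alpha + epsilon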

## Run RUV-rinv
fit = RUVrinv(Y, X, ctl)

## Get adjusted variances and p-values
fit = variance_adjust(fit)
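
One informal check on a fit is to look at the p-values of the negative controls, which have no true effect of `X` in a simulation like the one above. The sketch below is self-contained and uses a smaller, hypothetical data set; it relies only on the `p` element documented under Value and assumes the function comes from the `ruv` package (the block is skipped otherwise). Because the controls are also used to estimate the unwanted variation, the proportion it prints is only a rough diagnostic.

```r
# Smaller simulated data set (hypothetical sizes, for speed)
set.seed(2)
m <- 20; n <- 500; nc <- 100; k <- 3
ctl <- rep(FALSE, n)
ctl[1:nc] <- TRUE
X <- matrix(rep(c(0, 1), each = m / 2), m, 1)
beta <- matrix(rnorm(n), 1, n)
beta[, ctl] <- 0                       # controls have no true effect
W <- matrix(rnorm(m * k), m, k)
alpha <- matrix(rnorm(k * n), k, n)
Y <- X %*% beta + W %*% alpha + matrix(rnorm(m * n), m, n)

# Assumption: RUVrinv is exported by the 'ruv' package
if (requireNamespace("ruv", quietly = TRUE)) {
  fit <- ruv::RUVrinv(Y, X, ctl)

  # fit$p is documented as a p by n matrix; row 1 is the single factor
  p_ctl   <- fit$p[1, ctl]
  p_other <- fit$p[1, !ctl]

  # Proportion of features below 0.05 in each group
  print(mean(p_ctl < 0.05))
  print(mean(p_other < 0.05))
}
```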