The RUV-4 algorithm. Estimates and adjusts for unwanted variation using negative controls.
RUV4(Y, X, ctl, k, Z = 1, eta = NULL, fullW0 = NULL, inputcheck = TRUE)
Y |
The data. An m by n matrix, where m is the number of samples and n is the number of features. |
X |
The factor(s) of interest. An m by p matrix, where m is the number of samples and p is the number of factors of interest. Very often p = 1. |
ctl |
The negative controls. A logical vector of length n. |
k |
The number of unwanted factors to use. Can be 0. |
Z |
Any additional covariates to include in the model. Either an m by q matrix of covariates, or simply 1 (the default) for an intercept term. |
eta |
Gene-wise (as opposed to sample-wise) covariates. These covariates are adjusted for by RUV-1 before any further analysis proceeds. A matrix with n columns. |
fullW0 |
Can be included to speed up execution. Pass the fullW0 component returned by a previous call of RUV4 on the same data. |
inputcheck |
If TRUE, perform a basic sanity check on the inputs and issue a warning if a problem is found. |
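The eta argument is handled by an RUV-1 step before RUV-4 proper. The following is a minimal base-R sketch of that idea, under the assumption that RUV-1 estimates each sample's coefficients on the gene-wise covariates by least squares over the control columns only and then subtracts the fitted term from every column. The helper `ruv1_sketch` is hypothetical, is not the ruv package's code, and fits no intercept.

```r
# Hypothetical sketch of an RUV-1 style adjustment (not the ruv package's code).
# Assumption: per-sample coefficients on eta are fit by least squares using only
# the control columns, and the fitted term is then removed from all columns.
ruv1_sketch <- function(Y, eta, ctl) {
  if (is.null(dim(eta))) eta <- matrix(eta, nrow = 1)  # treat a vector as 1 by n
  eta_c <- eta[, ctl, drop = FALSE]                     # r by n_c
  Y_c <- Y[, ctl, drop = FALSE]                         # m by n_c
  # m by r coefficients b solving Y_c ~ b %*% eta_c in the least-squares sense
  b <- Y_c %*% t(eta_c) %*% solve(eta_c %*% t(eta_c))
  Y - b %*% eta
}

# Usage on simulated data whose only structure is a sample-specific eta effect
set.seed(1)
m <- 6; n <- 40
ctl <- rep(FALSE, n); ctl[1:20] <- TRUE
eta <- matrix(rnorm(n), 1, n)                  # one gene-wise covariate
b_true <- matrix(rnorm(m), m, 1)               # its effect in each sample
Y <- b_true %*% eta + matrix(rnorm(m * n, sd = 0.01), m, n)
Y1 <- ruv1_sketch(Y, eta, ctl)
max(abs(Y1))  # small: only the 0.01-scale noise should remain
```

Only the control columns inform the coefficients, which mirrors how the controls are used throughout RUV: they are assumed to carry unwanted variation but no signal of interest.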
Implements the RUV-4 algorithm as described in Gagnon-Bartsch, Jacob, and Speed (2013), using the SVD as the factor analysis routine. The unwanted factors W are estimated using the negative control features given by ctl, and Y is then regressed on X, Z, and W.
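The steps described above can be sketched in base R as follows. This is a simplified illustration, not the ruv package's implementation: the helper `ruv4_sketch` is hypothetical, Z is fixed to an intercept, k must be at least 1, and the way the controls are used to recover the part of W that is correlated with X (regressing the controls' X-coefficients on their factor loadings) is an assumption based on the cited paper.

```r
# Simplified, hypothetical sketch of RUV-4 with an SVD factor analysis.
# Assumptions: Z is an intercept only, k >= 1, no eta/RUV-1 step.
ruv4_sketch <- function(Y, X, ctl, k) {
  stopifnot(k >= 1)
  m <- nrow(Y); p <- ncol(X)
  Z <- matrix(1, m, 1)
  # Residuals of A after least-squares regression on B
  resid_on <- function(A, B) A - B %*% solve(crossprod(B), crossprod(B, A))
  Y <- resid_on(Y, Z)  # adjust Y for Z
  X <- resid_on(X, Z)  # adjust X for Z
  # Remove X; what remains is unwanted variation plus noise
  Y0 <- resid_on(Y, X)
  # Factor analysis by SVD: the k leading left singular vectors of Y0
  W0 <- svd(Y0, nu = k, nv = 0)$u
  alpha <- solve(crossprod(W0), crossprod(W0, Y0))  # k by n loadings
  # For controls the true X effect is 0, so their X-coefficients are
  # attributed to the part of W that is correlated with X
  bycx <- solve(crossprod(X), crossprod(X, Y[, ctl, drop = FALSE]))
  alpha_c <- alpha[, ctl, drop = FALSE]
  bwx <- bycx %*% t(alpha_c) %*% solve(alpha_c %*% t(alpha_c))  # p by k
  W <- W0 + X %*% bwx                                           # m by k
  # Final regression of Y on X, Z and W
  XZW <- cbind(X, Z, W)
  coefs <- solve(crossprod(XZW), crossprod(XZW, Y))
  list(betahat = coefs[1:p, , drop = FALSE], W = W, bwx = bwx)
}

# Simulated data in which the unwanted factors are correlated with X
set.seed(42)
m <- 50; n <- 2000; nc <- 500; k <- 3
ctl <- rep(FALSE, n); ctl[1:nc] <- TRUE
X <- matrix(rep(0:1, each = m / 2), m, 1)
beta <- matrix(rnorm(n), 1, n); beta[, ctl] <- 0
W_true <- X %*% matrix(c(1, -1, 0.5), 1, k) + matrix(rnorm(m * k), m, k)
alpha_true <- matrix(rnorm(k * n), k, n)
Y <- X %*% beta + W_true %*% alpha_true + matrix(rnorm(m * n), m, n)

fit <- ruv4_sketch(Y, X, ctl, k)
D0 <- cbind(1, X)  # unadjusted design: intercept and X only
naive <- solve(crossprod(D0), crossprod(D0, Y))[2, ]
ruv_err <- mean(abs(fit$betahat[1, ] - beta[1, ]))
naive_err <- mean(abs(naive - beta[1, ]))
c(ruv4_sketch = ruv_err, unadjusted = naive_err)
```

Because W is built here to be correlated with X, the unadjusted regression is expected to absorb part of W alpha into its estimate of beta, whereas the sketch attempts to separate the two using the controls.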
A list containing
betahat |
The estimated coefficients of the factor(s) of interest. A p by n matrix. |
sigma2 |
Estimates of the features' variances. A vector of length n. |
t |
t statistics for the factor(s) of interest. A p by n matrix. |
p |
P-values for the factor(s) of interest. A p by n matrix. |
multiplier |
The constant by which sigma2 must be multiplied to obtain an estimate of the variance of betahat. |
df |
The number of residual degrees of freedom. |
W |
The estimated unwanted factors. An m by k matrix. |
alpha |
The estimated coefficients of W. A k by n matrix. |
byx |
The coefficients in a regression of Y on X (after both Y and X have been "adjusted" for Z). Useful for projection plots. |
bwx |
The coefficients in a regression of W on X (after X has been "adjusted" for Z). Useful for projection plots. |
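X |
The X argument, included for reference. |
k |
The k argument, included for reference. |
ctl |
The ctl argument, included for reference. |
Z |
The Z argument, included for reference. |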
fullW0 |
Can be used to speed up future calls of RUV4. |
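Several of the returned components fit together as in an ordinary least-squares fit. The hypothetical sketch below uses a small design matrix D standing in for cbind(X, Z, W) with p = 1. It assumes that multiplier is the diagonal entry of (D'D)^{-1} for the factor of interest, that sigma2 is the residual sum of squares divided by df, and that t and p follow the usual formulas; these are assumptions, not the ruv package's code. It also checks the identity behind projection plots, byx = betahat + bwx %*% alpha, which holds exactly for any least-squares fit of this form.

```r
# Hypothetical sketch: how betahat, sigma2, multiplier and df give t and p.
# D plays the role of cbind(X, Z, W) with p = 1, Z = intercept, k = 1.
set.seed(7)
m <- 30; n <- 5
D <- cbind(X = rep(0:1, each = m / 2), intercept = 1, W = rnorm(m))
Y <- matrix(rnorm(m * n), m, n)

A <- solve(crossprod(D))
coefs <- A %*% crossprod(D, Y)
resids <- Y - D %*% coefs
df <- m - ncol(D)                          # residual degrees of freedom
sigma2 <- colSums(resids^2) / df           # one variance estimate per feature
betahat <- coefs[1, , drop = FALSE]        # 1 by n
multiplier <- diag(A)[1]                   # assumed: var(betahat) = multiplier * sigma2
tstat <- betahat / sqrt(multiplier * sigma2)
pval <- 2 * pt(-abs(tstat), df)

# Cross-check the first feature against lm()
ref <- summary(lm(Y[, 1] ~ D - 1))$coefficients

# Projection-plot identity. X adjusted for the intercept is just centered X;
# regressing raw Y or raw W on it equals regressing their Z-adjusted versions.
Xz <- D[, "X"] - mean(D[, "X"])
byx <- crossprod(Xz, Y) / sum(Xz^2)          # 1 by n
bwx <- crossprod(Xz, D[, "W"]) / sum(Xz^2)   # 1 by 1
alpha <- coefs[3, , drop = FALSE]            # coefficients of W, 1 by n
max(abs(byx - (betahat + bwx %*% alpha)))    # zero up to rounding error
```

The identity explains why byx and bwx are useful together: the naive regression coefficient byx decomposes into the adjusted estimate betahat plus the contribution of the unwanted factors, bwx %*% alpha.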
## Load the ruv package, which provides RUV4 and variance_adjust
library(ruv)

## Create some simulated data
m = 50
n = 10000
nc = 1000
p = 1
k = 20
ctl = rep(FALSE, n)
ctl[1:nc] = TRUE
X = matrix(c(rep(0, floor(m/2)), rep(1, ceiling(m/2))), m, p)
beta = matrix(rnorm(p*n), p, n)
beta[, ctl] = 0
W = matrix(rnorm(m*k), m, k)
alpha = matrix(rnorm(k*n), k, n)
epsilon = matrix(rnorm(m*n), m, n)
Y = X %*% beta + W %*% alpha + epsilon

## Run RUV-4
fit = RUV4(Y, X, ctl, k)

## Get adjusted variances and p-values
fit = variance_adjust(fit)