RUV starter analysis

Description

A script that performs many RUV analyses, generates plots and tables, and formats the output as an HTML file. This script is useful for getting a quick first look at your data; it should not be treated as a final or complete analysis. More information about the usage of this script is provided in the how-to file in the "inst" sub-directory of the ruv.extras package.

Usage

ruv_starter_analysis(Y, X, ctl, Z = 1, eta = NULL, pctl = NULL,
  genecoloring = NULL, samplecoloring = NULL, genetexts = NULL,
  sampletexts = NULL, genesymbols = NULL, samplesymbols = NULL,
  geneinfo = NULL, rankbybeta = FALSE, topN = 40,
  topcount_thresholds = c(20, 40, 60, 80, 100), rankset = NULL,
  kset = c(1, 2, 3, 5, 7, 10, 15, 20, 30, 50, 75, 100, 200, 500, 1000,
    2000, 5000, 10000),
  factorset = 1:5, bin = 10, do_general = TRUE, do_unadjusted = TRUE,
  do_ruv2 = TRUE, do_ruv4 = TRUE, do_ruvinv = TRUE, do_ruvrinv = TRUE,
  do_pptable = TRUE, outdir = "html", initialize_collapsed = FALSE,
  webtitle = "RUV Starter Analysis", inputcheck = TRUE, verbose = FALSE)

Arguments

Y

The data. An m by n matrix, where m is the number of samples and n is the number of features.

X

The factor of interest. An m by 1 matrix, where m is the number of samples.

ctl

The negative controls. A logical vector of length n.

Z

Any additional covariates to include in the model. Either an m by q matrix of covariates, or simply 1 (the default) for an intercept term.

eta

Gene-wise (as opposed to sample-wise) covariates. These covariates are adjusted for by RUV-1 before any further analysis proceeds. A matrix with n columns.

pctl

Positive controls. A logical vector of length n.

genecoloring

A vector of length n. The colors to use when plotting genes.

samplecoloring

A vector of length m. The colors to use when plotting samples.

genetexts

A vector of length n. Any text to be used in place of symbols, when plotting genes. Elements that are NA are plotted as symbols.

sampletexts

A vector of length m. Any text to be used in place of symbols, when plotting samples. Elements that are NA are plotted as symbols.

genesymbols

A vector of length n. The plot symbols to use when plotting genes.

samplesymbols

A vector of length m. The plot symbols to use when plotting samples.

geneinfo

A matrix with n rows. Each column should contain some information about the genes (such as their names) for use in tables.

rankbybeta

Should the analysis include a ranking of the features based on the absolute value of the estimated effect size (betahat)?

topN

The number of top-ranked genes to include in tables.

topcount_thresholds

The thresholds to use when counting the number of top-ranked positive controls.

rankset

The genes to be considered when determining which are top-ranked. A logical vector. NULL implies all genes.

kset

Which values of K should be considered.

factorset

Which factors should be included in the projection plot table.

bin

The bin size in the method of empirical variances.

do_general

Should the "general" analysis be performed?

do_unadjusted

Should the "unadjusted" analysis be performed?

do_ruv2

Should the RUV-2 analysis be performed?

do_ruv4

Should the RUV-4 analysis be performed?

do_ruvinv

Should the RUV-inv analysis be performed?

do_ruvrinv

Should the RUV-rinv analysis be performed?

do_pptable

Should the factor projection plot table be created?

outdir

Directory where the web page should be written.

initialize_collapsed

Should the web page initially show only the section headers, which must then be expanded manually?

webtitle

The title of the web page.

inputcheck

Perform a basic sanity check on the inputs, and issue a warning if there is a problem.

verbose

Should verbose output be printed?

Value

Does not return a (meaningful) value. Called for its side effect: it generates a web page in outdir.
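
Examples

The following is a minimal sketch of how inputs with the shapes described under Arguments might be assembled and passed to the function. The data are simulated, and the sizes, the choice of controls, the shortened kset, and the output directory are illustrative assumptions rather than recommendations from this page. The call itself runs only if ruv_starter_analysis is already available in the session.

```r
# Simulated inputs (illustrative only): m samples, n features.
set.seed(1)
m <- 20
n <- 500

# Y: m by n data matrix (samples in rows, features in columns).
Y <- matrix(rnorm(m * n), nrow = m, ncol = n)

# X: m by 1 factor of interest (here, two groups of equal size).
X <- matrix(rep(c(0, 1), each = m / 2), ncol = 1)

# ctl: logical vector of length n marking the negative controls.
ctl <- rep(FALSE, n)
ctl[1:100] <- TRUE

# pctl: logical vector of length n marking the positive controls.
pctl <- rep(FALSE, n)
pctl[401:420] <- TRUE

# Give the simulated positive controls a true effect of the factor of interest.
Y[X[, 1] == 1, pctl] <- Y[X[, 1] == 1, pctl] + 2

# geneinfo: matrix with n rows; each column holds information shown in tables.
geneinfo <- matrix(paste0("gene", seq_len(n)), ncol = 1,
                   dimnames = list(NULL, "name"))

# Assumption: a shortened kset, chosen here only to keep the run small.
kset <- c(1, 2, 3, 5, 7, 10)

# Assumption: write the web page to a temporary directory.
outdir <- file.path(tempdir(), "ruv_html")

# Run only if the function has been made available (e.g. via library()).
if (exists("ruv_starter_analysis", mode = "function")) {
  ruv_starter_analysis(Y, X, ctl, pctl = pctl, geneinfo = geneinfo,
                       kset = kset, outdir = outdir)
}
```

All other arguments are left at the defaults listed under Usage; when the call runs, the generated web page is written to outdir.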