bootMeans: Bootstrap two gene sets and compare their mean values


Description

Compares the mean of a single value column between a test set and a reference set of genes using a bin-matched bootstrap, without relying on any distributional assumptions.

Usage

bootMeans(valCol, data, testIds, refIds, idCol = 1L, binCol = "lengthBin",
  filt = "> -Inf", nGenes = 1000L, nBoot = 100L, minGenes = 200L, ...,
  na.rm = TRUE, replace = TRUE, maxP)

Arguments

valCol

A single column in data containing the values to be bootstrapped. Can be specified as an integer position or as a character (regular expression).

data

A data.frame containing the ID, bin and value columns.

testIds

A character vector containing the IDs of the test set

refIds

A character vector containing the IDs of the reference set

idCol

The column in data containing the Ids in the vectors testIds and refIds. Can be specified as an integer position or as a character (regular expression).

binCol

The column in data containing the bin allocations for each gene. Can also be specified as an integer or by name.

filt

A character string giving the filtering criterion, which is passed as an expression to the filter_ function.

nGenes

integer. The number of genes to sample at each iteration. Values greater than the number of testIds are automatically capped at the number of testIds.

nBoot

integer. The number of bootstrap iterations to perform.

minGenes

integer. The minimum number of IDs required for the bootstrap procedure to be meaningful.

...

Additional arguments passed internally to mean.

na.rm

logical. Also passed internally to mean.

replace

logical. Should the bootstrap sample with replacement (replace = TRUE) or without replacement (replace = FALSE)?

maxP

The maximum probability (weight) allowed for an individual gene in the reference set. Defaults to 1/nGenes.

Details

This is a modification of the bootMedians function, but is written to only work with a single column of values to be bootstrapped. To apply across multiple value columns, please use lapply or sapply.

This function splits the supplied data.frame into a test set and a reference set using testIds and refIds. The data.frame must contain a column (binCol) that assigns each ID to a bin. The probabilities of bin membership in the test set are then used as sampling weights during the bootstrap procedure.
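One way to turn test-set bin probabilities into per-gene sampling weights is sketched below. This is an illustration under stated assumptions, not the package source: the helper geneWeights() is hypothetical, and it assumes each bin's test-set probability is spread evenly across the genes in that bin. Applied to the test set itself, this gives every gene the same weight; applied to the reference set, it re-weights genes so that draws follow the test-set bin distribution.

```r
# Hypothetical helper (not part of funsForLu): per-gene sampling weights that
# make a set of genes reproduce the test-set bin distribution.
# bins:     bin label for each gene in the set being sampled
# binProbs: named vector or table of bin probabilities from the test set
geneWeights <- function(bins, binProbs) {
  bins <- as.character(bins)
  p <- setNames(as.numeric(binProbs), names(binProbs))  # plain named vector
  n <- table(bins)
  n <- setNames(as.numeric(n), names(n))                # genes per bin in this set
  # spread each bin's probability evenly across this set's genes in that bin
  w <- p[bins] / n[bins]
  w[is.na(w)] <- 0          # bins absent from the test set are never drawn
  unname(w / sum(w))        # normalise so the weights sum to 1
}

testBins <- c("short", "short", "long", "long", "long", "medium")
refBins  <- c("short", "long", "medium", "medium", "long", "short", "short")
binProbs <- prop.table(table(testBins))

wTest <- geneWeights(testBins, binProbs)  # uniform: every test gene gets 1/6
wRef  <- geneWeights(refBins, binProbs)   # reference genes re-weighted by bin
```

Summing wRef within each bin recovers the test-set proportions (long 0.5, short 1/3, medium 1/6), even though the reference set has a different bin composition.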

The values to be bootstrapped must be specified in the argument valCol, and this can be a regular expression or integer, but must specify only a single column in the supplied data.frame.
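The rule that valCol may be an integer or a regular expression but must identify exactly one column can be illustrated with the hypothetical helper below. It is a sketch of one possible approach, not the function's actual argument handling.

```r
# Hypothetical helper: resolve a column given either as an integer position
# or as a regular expression, and insist that exactly one column matches.
resolveCol <- function(data, col) {
  idx <- if (is.numeric(col)) {
    intersect(as.integer(col), seq_along(data))  # keep only valid positions
  } else {
    grep(col, names(data))                       # regex match on column names
  }
  if (length(idx) != 1L) stop("col must identify exactly one column")
  names(data)[idx]
}

genes <- data.frame(geneId = c("g1", "g2"), logFC = c(0.3, -1.2),
                    lengthBin = c("short", "long"))
resolveCol(genes, "logFC")   # "logFC"
resolveCol(genes, 3L)        # "lengthBin"
```

A pattern such as "e" matches both geneId and lengthBin, so it is rejected rather than silently picking one column.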

The function automatically filters the data to remove any values outside the criterion specified in filt.
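The filtering step can be approximated in base R as below. This is a hypothetical stand-in (the package itself uses filter_), and it assumes the criterion in filt is appended to the value column name, so that filt = "> 0" on column logFC becomes the expression logFC > 0. Under that assumption, the default "> -Inf" drops only -Inf and missing values.

```r
# Hypothetical base-R stand-in for the filtering step. Assumes the criterion
# in filt is appended to the value column name and evaluated inside data.
applyFilt <- function(data, valCol, filt = "> -Inf") {
  keep <- eval(parse(text = paste(valCol, filt)), envir = data)
  data[!is.na(keep) & keep, , drop = FALSE]  # rows evaluating to NA are dropped
}

genes <- data.frame(geneId = c("g1", "g2", "g3"), logFC = c(-Inf, 0.5, NA))
applyFilt(genes, "logFC")          # keeps only g2
applyFilt(genes, "logFC", "> 1")   # keeps no rows
```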

The function samples the same number of IDs (nGenes) from each set, using the probabilities of bin membership in the test set. At each bootstrap iteration, the mean of the value column is calculated for both samples, and the reference mean is subtracted from the test mean. Because both samples follow the same distribution of the binning variable, the resulting differences can be compared directly.
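The core loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the package source: bootMeanDiff() is a hypothetical name, and the per-gene weights testW and refW are assumed to have been derived beforehand from the test-set bin probabilities. It uses sample.int() rather than sample() so that a set containing a single value is not misinterpreted as a range.

```r
# Minimal sketch (hypothetical, not the package source) of a bin-matched
# bootstrap of the difference in means between a test and a reference set.
bootMeanDiff <- function(testVals, refVals, testW, refW, nGenes = 1000L,
                         nBoot = 100L, replace = TRUE, na.rm = TRUE, ...) {
  nGenes <- min(nGenes, length(testVals))  # cap nGenes at the test-set size
  vapply(seq_len(nBoot), function(i) {
    # draw the same number of genes from each set using the supplied weights
    testDraw <- testVals[sample.int(length(testVals), size = nGenes,
                                    replace = replace, prob = testW)]
    refDraw  <- refVals[sample.int(length(refVals), size = nGenes,
                                   replace = replace, prob = refW)]
    # test mean minus reference mean for this iteration
    mean(testDraw, na.rm = na.rm, ...) - mean(refDraw, na.rm = na.rm, ...)
  }, numeric(1))
}

set.seed(1)
testVals <- rnorm(300, mean = 1)
refVals  <- rnorm(500, mean = 0)
diffs <- bootMeanDiff(testVals, refVals, rep(1, 300), rep(1, 500),
                      nGenes = 200L, nBoot = 50L)
quantile(diffs, c(0.025, 0.975))  # bootstrap interval for the difference
```

Returning the vector of per-iteration differences, rather than a single summary, leaves the choice of interval or test statistic to the caller.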

If any genes have a probability of being resampled greater than maxP, they may exert undue influence on the results. If any such genes are found, the function stops so that the offending grouping can be removed. Alternatively, maxP can be raised, up to a maximum of 1, which imposes no restriction at all.
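The maxP rule can be illustrated with the hypothetical check below. It assumes that a gene's "probability of being resampled" is its normalised per-draw sampling weight in the reference set; the function name and error message are illustrative only.

```r
# Hypothetical check mirroring the maxP rule: stop if any reference gene's
# per-draw sampling probability exceeds maxP (default 1/nGenes).
checkMaxP <- function(refW, ids, nGenes, maxP = 1 / nGenes) {
  stopifnot(maxP > 0, maxP <= 1)
  p <- refW / sum(refW)                 # per-draw probability of each gene
  tooHeavy <- ids[p > maxP]
  if (length(tooHeavy) > 0) {
    stop("Genes exceed maxP: ", paste(tooHeavy, collapse = ", "))
  }
  invisible(TRUE)
}

# 2000 equally weighted genes: each has probability 0.0005 <= 1/1000, so this passes
checkMaxP(rep(1, 2000), paste0("g", 1:2000), nGenes = 1000L)
# gene "a" has probability 5/8 > 1/4, so this stops unless maxP is raised
try(checkMaxP(c(5, 1, 1, 1), c("a", "b", "c", "d"), nGenes = 4L))
checkMaxP(c(5, 1, 1, 1), c("a", "b", "c", "d"), nGenes = 4L, maxP = 1)
```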

Value

A list with components:

Author(s)

Steve Pederson


steveped/funsForLu documentation built on May 30, 2019, 5:39 p.m.