A function for comparing two sets of genes without relying on any distributional assumptions.
valCol |
A single column containing the values to be bootstrapped |
data |
A data frame containing all the data required |
testIds |
A |
refIds |
A |
idCol |
The column in |
binCol |
The column in |
filt |
A text expression passed to the NSE capabilities of the |
nGenes |
|
nBoot |
|
minGenes |
|
... |
Passed to the function |
na.rm |
|
replace |
|
maxP |
The maximum probability (weight) allowed for an individual gene in the reference set.
Defaults to |
This is a modification of the bootMedians function, but is written to work with only a single column of values to be bootstrapped.
To apply across multiple value columns, please use lapply or sapply.
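As a rough sketch of that pattern, the code below applies a single-column bootstrap to several value columns with lapply. The helper bootOneColumn, the column names and the toy data are all hypothetical stand-ins written for this illustration; the helper ignores binning and is not the documented function, so in practice the call inside lapply would be replaced by a call to this function with its real arguments.

```r
set.seed(1)

# Toy data: gene IDs plus two value columns (all names are illustrative)
df <- data.frame(
  id = paste0("g", 1:20),
  logFC_A = rnorm(20),
  logFC_B = rnorm(20),
  stringsAsFactors = FALSE
)
testIds <- df$id[1:8]
refIds <- df$id[9:20]

# Hypothetical stand-in for a single-column bootstrap:
# resample each set and return the difference in means (test - reference)
bootOneColumn <- function(data, valCol, testIds, refIds, nBoot = 100) {
  test <- data[[valCol]][data$id %in% testIds]
  ref <- data[[valCol]][data$id %in% refIds]
  replicate(
    nBoot,
    mean(sample(test, replace = TRUE)) - mean(sample(ref, replace = TRUE))
  )
}

# One call per value column, collected in a named list
valCols <- c("logFC_A", "logFC_B")
results <- lapply(valCols, function(x) bootOneColumn(df, x, testIds, refIds))
names(results) <- valCols
```

Using sapply instead would simplify the result to a matrix with one column per value column, provided every call returns a vector of the same length.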
This function splits the supplied data.frame into two sets: test IDs and reference IDs.
The data.frame must contain a column (binCol) which classifies each ID into a bin.
The probabilities of bin membership in the test IDs are then used for sampling during the bootstrap procedure.
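One way such sampling weights could be derived is sketched below: the proportion of test IDs in each bin is computed, then each bin's probability is shared evenly among the reference genes in that bin. The data, the column names and the even-sharing rule are assumptions made for illustration, not a description of the function's internals.

```r
# Toy data: each gene ID is assigned to a bin (names are illustrative)
df <- data.frame(
  id = paste0("g", 1:10),
  bin = c("low", "low", "mid", "mid", "mid", "high", "low", "mid", "high", "high"),
  stringsAsFactors = FALSE
)
testIds <- c("g1", "g3", "g4", "g6")
refIds <- setdiff(df$id, testIds)

# Proportion of test IDs falling in each bin
testBins <- table(df$bin[df$id %in% testIds])
binProb <- testBins / sum(testBins)

# Assumed rule: share each bin's probability evenly over the
# reference genes belonging to that bin
ref <- df[df$id %in% refIds, ]
nRefInBin <- table(ref$bin)
ref$p <- as.numeric(binProb[ref$bin] / nRefInBin[ref$bin])
```

With these weights, a weighted sample of reference genes is expected to show the same bin proportions as the test set.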
The values to be bootstrapped must be specified in the argument valCol, and this can be a regular expression or integer, but must specify only a single column in the supplied data.frame.
The function will automatically filter the data to remove any values outside the specified criterion.
The function itself will sample the same number of IDs (nGenes) from each dataset,
based on the probabilities of bin membership in the test dataset.
At each bootstrap iteration, the mean of the values in valCol will be calculated for both datasets,
with the reference mean then subtracted from the test mean.
This allows direct comparison of these values as they will be drawn from similar
distributions based on the binning variable used.
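The procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the function's actual implementation: the data are simulated, the column names (id, bin, value) and the helper weightByBin are hypothetical, and sampling is done with replacement using per-gene weights derived from the test set's bin proportions.

```r
set.seed(42)

# Simulated data with a deterministic bin assignment (names are illustrative)
n <- 200
df <- data.frame(
  id = paste0("g", seq_len(n)),
  bin = rep(c("low", "mid", "high"), length.out = n),
  value = rnorm(n),
  stringsAsFactors = FALSE
)
testIds <- df$id[1:50]
test <- df[df$id %in% testIds, ]
ref <- df[!df$id %in% testIds, ]

# Bin membership probabilities taken from the test set
binProb <- table(test$bin) / nrow(test)

# Hypothetical helper: per-gene weight = bin probability / genes in that bin
weightByBin <- function(d) as.numeric(binProb[d$bin] / table(d$bin)[d$bin])
testW <- weightByBin(test)
refW <- weightByBin(ref)

nGenes <- 30
nBoot <- 500

# Each iteration: sample nGenes from each set, then take test mean - reference mean
samples <- replicate(nBoot, {
  testMean <- mean(sample(test$value, nGenes, replace = TRUE, prob = testW))
  refMean <- mean(sample(ref$value, nGenes, replace = TRUE, prob = refW))
  testMean - refMean
})

# Proportion of sampled differences greater than zero
p <- mean(samples > 0)
```

Because both samples are drawn with the same bin proportions, any difference in the binning variable between the two gene sets is approximately balanced out before the means are compared.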
If any genes have a probability of being resampled > maxP, they may exert undue influence on the results.
If any are found, the process will stop to allow removal of this grouping.
Alternatively, the value for maxP can be raised up to a maximum of 1, which would represent maximum permissibility.
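A check of this kind might look like the sketch below. The helper checkMaxP, the example probabilities and the threshold of 0.5 are all hypothetical and chosen only to trigger the error; they do not reflect the function's default or its exact error message.

```r
# Illustrative per-gene sampling probabilities for a reference set
refProb <- c(g1 = 0.15, g2 = 0.15, g3 = 0.60, g4 = 0.10)
maxP <- 0.5  # illustrative threshold, not the documented default

# Hypothetical helper: stop if any gene's sampling probability exceeds maxP
checkMaxP <- function(prob, maxP) {
  stopifnot(maxP > 0, maxP <= 1)
  over <- names(prob)[prob > maxP]
  if (length(over) > 0) {
    stop("Sampling probability exceeds maxP for: ", paste(over, collapse = ", "))
  }
  invisible(TRUE)
}

# Capture the error message rather than halting the script
res <- tryCatch(checkMaxP(refProb, maxP), error = function(e) conditionMessage(e))
```

Setting maxP to 1 in this sketch lets every gene through, which mirrors the most permissive setting described above.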
A list with components:
$samples
The sampled differences in the mean values
$p
The proportion of sampled differences which are > 0
$nGenes
The number of genes sampled at each bootstrap iteration
$nBoot
The number of bootstrap iterations
$sampleSizes
A data_frame with the sample sizes for each dataset, broken down into Expressed and Not Expressed genes.
$testBins
The distributions of genes amongst the binning variable in the set of test IDs
$refBins
The distributions of genes amongst the binning variable in the set of reference IDs.
The final column represents the sampling probability for each individual gene in the corresponding bin
$missingBins
These are the bins not commonly represented in the dataset.
If any are found a non-fatal warning message will be printed during running of the process.
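To illustrate the $missingBins component, the sketch below compares the bins present in the test and reference sets and issues a non-fatal warning when they differ. It assumes that "missing" means a bin that appears in only one of the two sets, which is an interpretation rather than something stated above; the helper findMissingBins and the bin counts are hypothetical.

```r
# Illustrative bin counts for the test and reference sets
testBins <- c(low = 5, mid = 10, high = 3)
refBins <- c(low = 40, mid = 55)

# Hypothetical helper: bins found in only one of the two sets (assumed meaning)
findMissingBins <- function(testBins, refBins) {
  missing <- union(
    setdiff(names(testBins), names(refBins)),
    setdiff(names(refBins), names(testBins))
  )
  if (length(missing) > 0) {
    warning("Bins not present in both sets: ", paste(missing, collapse = ", "))
  }
  missing
}

# Report the warning as a message and carry on, since it is non-fatal
missingBins <- withCallingHandlers(
  findMissingBins(testBins, refBins),
  warning = function(w) {
    message("Warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

Test genes in a bin with no reference counterpart cannot be matched during sampling, which is why such bins are worth inspecting before interpreting $p.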
Steve Pederson