centerGeneData | R Documentation |
Performs per-row centering on a numeric matrix
centerGeneData(
x,
centerGroups = NULL,
na.rm = TRUE,
controlSamples = NULL,
useMedian = TRUE,
rmOutliers = FALSE,
madFactor = 5,
controlFloor = NA,
naControlAction = c("na", "row", "floor", "min"),
naControlFloor = 0,
rowStatsFunc = NULL,
returnGroupedValues = FALSE,
returnGroups = FALSE,
mean = NULL,
verbose = FALSE,
...
)
x |
|
centerGroups |
|
na.rm |
|
controlSamples |
|
useMedian |
|
rmOutliers |
|
madFactor |
|
controlFloor |
|
naControlAction |
|
rowStatsFunc |
|
returnGroupedValues |
|
returnGroups |
|
verbose |
|
... |
additional arguments are passed to |
This function centers data by subtracting the median or mean for each row.
Columns can be grouped using argument centerGroups. Each group of columns defined by centerGroups is centered independently.
Data can be centered relative to specific control columns using argument controlSamples. When controlSamples is not supplied, the default behavior is to use all columns. This process is consistent with typical MA-plots. It may be preferred to define controlSamples in cases where there are known reference samples, against which other samples should be compared.
The controlSamples logic is applied independently to each group defined in centerGroups.
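The grouped, control-relative centering described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package implementation: center_rows_sketch() is a hypothetical helper, it always uses the median, and it assumes that a group with no control columns falls back to all columns in that group.

```r
# Hypothetical helper for illustration only; not part of any package.
center_rows_sketch <- function(x, groups, controls = colnames(x)) {
  out <- x
  for (g in unique(groups)) {
    # columns that belong to this center group
    cols <- colnames(x)[groups == g]
    # control columns within this group
    ctrl <- intersect(cols, controls)
    # assumption: with no controls in the group, use every column in the group
    if (length(ctrl) == 0) ctrl <- cols
    # per-row median across the control columns of this group
    row_med <- apply(x[, ctrl, drop = FALSE], 1, median, na.rm = TRUE)
    # subtract each row's summary value from that row, within this group only
    out[, cols] <- x[, cols, drop = FALSE] - row_med
  }
  out
}

x <- matrix(1:100, ncol = 10)
colnames(x) <- letters[1:10]
groups <- rep(c("A", "B"), c(5, 5))
x_ctr <- center_rows_sketch(x, groups, controls = letters[c(1:3, 6:8)])
x_ctr[1, ]
```

In this sketch, each group is centered against its own control columns, so the control median within every group and row becomes zero.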
You can confirm that centerGroups and controlSamples are correct in the result data by accessing the attribute "center_df"; see the examples below.
Note: This function assumes input data is suitable for centering by subtraction. This data requirement is true for:
most log-transformed gene expression data
quantitative PCR (QPCR) cycle threshold (CT) values
other numeric data that has been suitably transformed to meet reasonable parametric assumptions of normality
rank-transformed data which results in difference in rank
generally speaking, any data where the difference between 5 and 7 (2) is reasonably similar to the difference between 15 and 17 (2).
it may be feasible to perform background subtraction on straight count data, for example sequence coverage at a particular location in a genome.
The data requirement is not true for:
most gene expression data in normal space (hint: if any value is above 100, it is generally not log-transformed)
numeric data that is strongly skewed
generally speaking, any data where the difference between 5 and 7 is not reasonably similar to the difference between 15 and 17. If the percent difference is more likely to be the interesting measure, data may be log-transformed for analysis.
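As a small base R illustration of why log transformation matters here, the sketch below uses made-up values in which two pairs each differ two-fold. The values and variable names are hypothetical, chosen only to show the effect.

```r
# Hypothetical values: 5 vs 10 and 50 vs 100 are both two-fold changes.
raw <- c(low_a = 5, low_b = 10, high_a = 50, high_b = 100)

# In normal space the two differences are not comparable.
raw_diff <- c(raw[["low_b"]] - raw[["low_a"]],
              raw[["high_b"]] - raw[["high_a"]])

# In log2 space the same two-fold change gives the same difference.
log_diff <- c(log2(raw[["low_b"]]) - log2(raw[["low_a"]]),
              log2(raw[["high_b"]]) - log2(raw[["high_a"]]))

raw_diff
log_diff
```

After the log2 transformation both differences are approximately 1, so centering by subtraction treats the two pairs equivalently.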
For special cases, rowStatsFunc can be supplied to perform specific group summary calculations per row.
When controlSamples is supplied, and contains all NA values for a given row of data, within relevant centerGroups subsets, the behavior is defined by naControlAction, with the default naControlAction="na", as described below:
naControlAction="na": values are centered versus NA, which results in all values NA (current behavior, default).
naControlAction="row": values are centered versus the row, using all samples in the same center group. This action effectively "centers to what we have".
naControlAction="floor": values are centered versus a numeric floor defined by argument naControlFloor. When naControlFloor=0 then values are effectively not centered. However, naControlFloor=10 could for example be used to center values versus a practical noise floor, if the range of detection for a particular experiment starts at 10 as a low value.
naControlAction="min": values are centered versus the minimum observed summary value in the data, which effectively uses the data to define a value for naControlFloor.
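The four actions above can be illustrated for a single row within a single center group. This is a sketch under assumptions, not the package code: center_one_row() and its arguments floor_value and min_value are hypothetical, and for "min" the caller is assumed to supply the minimum summary value observed elsewhere in the data.

```r
# Hypothetical helper for illustration only; not part of any package.
center_one_row <- function(values, is_control,
                           action = c("na", "row", "floor", "min"),
                           floor_value = 0, min_value = NA_real_) {
  action <- match.arg(action)
  # summary of the control columns; NA when every control value is NA
  ctrl <- median(values[is_control], na.rm = TRUE)
  if (is.na(ctrl)) {
    ctrl <- switch(action,
      na = NA_real_,                        # center versus NA
      row = median(values, na.rm = TRUE),   # center versus the whole row
      floor = floor_value,                  # center versus a fixed floor
      min = min_value)                      # center versus a data-derived minimum
  }
  values - ctrl
}

# Made-up row where both control measurements are NA
values <- c(ctl1 = NA_real_, ctl2 = NA_real_, trt1 = 12, trt2 = 14)
is_control <- c(TRUE, TRUE, FALSE, FALSE)

res_na <- center_one_row(values, is_control, "na")
res_row <- center_one_row(values, is_control, "row")
res_floor <- center_one_row(values, is_control, "floor", floor_value = 10)
res_min <- center_one_row(values, is_control, "min", min_value = 8)
```

With "na" every value in the row becomes NA, while the other three actions keep numeric values for the non-control columns.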
The motivation to center versus something other than controlSamples when all measurements for controlSamples are NA is to have a numeric value to indicate that a measurement was detected in non-control columns. This situation occurs in technologies where control samples have very low signal, and in some cases report NA when no measurement is detected within the instrument range of detection.
Other jam matrix functions: jammacalc(), jammanorm(), matrix_to_column_rank()
x <- matrix(1:100, ncol=10);
colnames(x) <- letters[1:10];
# basic centering
centerGeneData(x);
# grouped centering
centerGeneData(x,
centerGroups=rep(c("A","B"), c(5,5)));
# centering versus specific control columns
centerGeneData(x,
controlSamples=letters[c(1:3)]);
# grouped centering versus specific control columns
centerGeneData(x,
centerGroups=rep(c("A","B"), c(5,5)),
controlSamples=letters[c(1:3, 6:8)]);
# confirm the centerGroups and controlSamples
x_ctr <- centerGeneData(x,
centerGroups=rep(c("A","B"), c(5,5)),
controlSamples=letters[c(1:3, 6:8)],
returnGroups=TRUE);
attr(x_ctr, "center_df");