Returns a dataframe containing all the 'hits', where 2 or more observations in source and/or in ID pass the threshold set by the supplied criteria.
data | Dataframe containing a column of identifiers and column(s) of assay data providing scores to determine if an individual is a putative hit.
ID | The name of the column containing individual identifiers. Must contain same values or as
source | A list of individuals (contained in
values | Values (or columns) in
var.cuts | Logical, will variable cutoffs be used for each of the assays (columns)? Must provide high.cut and low.cut if TRUE.
low.cut | A list of values (same length as the number of assay columns) giving the MAXIMUM value for an observation to be considered BELOW 'normal'.
high.cut | A list of values (same length as the number of assay columns) giving the MINIMUM value for an observation to be considered ABOVE 'normal'.
cutoff | p value below which observations are considered a putative hit.
Z | Z score which is considered a hit.
... | Other parameters.
This function uses data coming out of the cdf.pval function or data with Z scores. Suggestions for using p-value data are given below. The whole data object can be used, including any additional descriptor columns. ID refers to the identifier for individuals and does not need to be unique. source is optional and contains a list of identifiers to be tested for putative hits. If there are multiple individuals with the same ID (e.g., in the same test group), then over half of them need to meet the criteria to be a putative hit. values indicates the columns containing values to evaluate, with start = the position of the first column and stop = the position of the last column.
If you wish to use a different cutoff for each column, set var.cuts = TRUE and supply lists for both low.cut and high.cut, giving the largest value to be considered a hit on the low side (e.g., low abundance) and the smallest value to be considered a hit on the high side (e.g., high abundance), respectively. Alternatively, cutoff is used for data coming out of cdf.pval: if cutoff = 0.05, then values <= 0.025 and values >= 0.975 will be considered putative hits. If Z scores are provided (or other criteria where abs(value) >= x is considered a hit), then Z should be used to define a cutoff.
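The three cutoff rules described above can be sketched in base R. This is only an illustration of the thresholds as stated in the text, not the package's implementation; the helper names `is_hit_pval`, `is_hit_z`, and `is_hit_varcut`, the default `Z = 2`, and all example values are hypothetical.

```r
# Hypothetical helpers illustrating the cutoff rules; not part of MIPHENO.

# Two-tailed rule for cdf.pval-style output: with cutoff = 0.05,
# values <= 0.025 or >= 0.975 are flagged.
is_hit_pval <- function(x, cutoff = 0.05) {
  !is.na(x) & (x <= cutoff / 2 | x >= 1 - cutoff / 2)
}

# Z-score rule: flag values whose magnitude is at least Z.
is_hit_z <- function(x, Z = 2) {
  !is.na(x) & abs(x) >= Z
}

# Per-column rule (var.cuts = TRUE): low.cut is the largest value counted
# as low, high.cut is the smallest value counted as high.
is_hit_varcut <- function(x, low.cut, high.cut) {
  !is.na(x) & (x <= low.cut | x >= high.cut)
}

p <- c(0.02, 0.03, 0.50, 0.97, 0.98)
pval_hits <- is_hit_pval(p, cutoff = 0.05)    # TRUE FALSE FALSE FALSE TRUE

z <- c(-3.1, -0.4, 1.9, 2.0)
z_hits <- is_hit_z(z, Z = 2)                  # TRUE FALSE FALSE TRUE

assay <- c(5, 12, 30)
varcut_hits <- is_hit_varcut(assay, low.cut = 6, high.cut = 25)  # TRUE FALSE TRUE
```

Missing values are treated as non-hits here; that choice is an assumption of the sketch, since the text does not say how NA values are handled.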
data are subset by the ID column, either by all levels (e.g., group A, group B) or by source, if provided. Each column in values (e.g., an assay) is evaluated to see if any individuals in that column meet the criteria for a putative hit. If more than half of the individuals in a level meet the criteria for that column, all the individuals belonging to that level are put into the output data frame. If not, the remaining columns are evaluated; once no columns remain, evaluation moves to the next level. Individual responses that are low or high are evaluated separately.
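The group-level rule can be sketched as follows. This is a hedged illustration, not the code used by find_hits: the data frame `dat`, its column names, and the helper `majority` are hypothetical. It assumes that "evaluated separately" means the more-than-half test is applied to low responses and to high responses independently, and that missing values count toward group size but not as hits.

```r
# Hypothetical sketch of the more-than-half rule; not the package's code.
# 'dat' holds an ID column plus assay columns of cdf.pval-style values.
dat <- data.frame(
  ID     = c("A", "A", "A", "B", "B", "B"),
  assay1 = c(0.01, 0.02, 0.40, 0.50, 0.60, 0.01),
  assay2 = c(0.50, 0.45, 0.55, 0.50, 0.40, 0.60),
  stringsAsFactors = FALSE
)

cutoff <- 0.05
assay_cols <- c("assay1", "assay2")

# TRUE when more than half of the group's members are flagged.
majority <- function(flag) sum(flag, na.rm = TRUE) > length(flag) / 2

# For each ID level, check each assay column; low and high responses
# are tested separately.
group_is_hit <- sapply(split(dat, dat$ID), function(g) {
  any(sapply(assay_cols, function(col) {
    majority(g[[col]] <= cutoff / 2) || majority(g[[col]] >= 1 - cutoff / 2)
  }))
})

# Keep every row belonging to a level flagged as a putative hit.
hits <- dat[dat$ID %in% names(group_is_hit)[group_is_hit], ]
```

In this toy data, two of the three individuals in group A are low for assay1, so all three rows of group A are returned; group B has only one low value and is dropped.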
find_hits returns a dataframe containing putative hits and data for other individuals in their group.
Shannon M. Bell
Bell SM, Burgoon LD, Last RL. MIPHENO: Data normalization for high throughput metabolite analysis. BMC Bioinformatics 2012, 13(10)
# See the Sweave document in the corresponding paper for examples