require(kableExtra) require(tidyverse) require(MF) ## example data a <- data.frame( room = paste("Room", rep(c("W", "Z"), each = 24)), pen = paste("Pen", rep(LETTERS[1:6], each = 8)), litter = paste("Litter", rep(11:22, each = 4)), tx = rep(rep(c("vac", "con"), each = 2), 12) ) set.seed(76153) a$lung[a$tx == "vac"] <- round(rnorm(24, 5, 1.3), 2) a$lung[a$tx == "con"] <- round(rnorm(24, 7, 1.3), 2) a <- a[-48, ] a
\clearpage
This document is intended to supplement function help pages and
guide users through the steps of applying functions of
the MF
package to the evaluation of mitigated fraction when data is
arranged in a nested hierarchical structure, first implemented in version 4.3.5.
Details of the algorithm are covered in a separate vignette titled Algorithms for calculating MF from hierarchical data. When calculating MF for clustered but
not hierarchical data, refer to help pages for functions MFClus
and MFClusBoot
.
For convenience, all help pages can also be accessed via the MF Manual.
Examples in this document expect input data to be structured in a nested hierarchical tree, as shown in Table 1. If there is only one grouping variable or the experimental design is crossed, it is not appropriate to use this guide as written.
A nested design assumes that the factor level of one variable co-occurs with that of another variable. For example, Pen D exists only within Room Z.
kbl(a, align = "c", row.names = FALSE, caption = "Nested hierarchical data structure.") |> kable_styling("bordered", full_width = FALSE, font_size = 8) |> collapse_rows(columns = 1:4, valign = "middle")
Users have the option of simply calculating the mitigated fraction or simultaneously bootstrapping the mitigated fraction (including confidence interval) and calculating the mitigated fraction. Examples of both options are shown in this document.
Examples in this manual were created using MF version r packageVersion('MF')
,
r R.version.string
on Windows. CVB has not tested code usage on other systems.
The package can be found online at: https://github.com/ABS-dev/MF/blob/master/README.md, including installation instructions. It is expected that users have passing familiarity with R code and usage, as CVB does not have resources to address training or IT issues that may occur at external organizations.
Bug reports can be submitted online at: https://github.com/ABS-dev/MF/issues. To expedite resolution, please include a minimal working example and refrain from using confidential data. Incomplete issues may be closed without further investigation. Do not include confidential data as issue tracking is visible to all.
\clearpage
To calculate mitigated fraction directly from data without bootstrapping, use
the function MFClusHier
.
This function requires four inputs:
formula
: The formula in form y ~ x + a/b/c
where y
is a continuous
response, x
is a factor with two levels of treatment, and a/b/c
are grouping
variables. Nesting is assumed to be in order, left
to right, highest to lowest. So a single level of "a" will contain multiple
levels of "b" and a single level of "b" will contain multiple levels of "c".data
: The data table (data.frame
or tibble
) with variables as identified
in formula. If there are extra variables, they will be ignored.compare
: Treatment groups, a text vector of length two. The first position [1] is treated as
the control or reference group to which members of the second position [2] are
compared.which.factor
: One or more variable(s) of interest. This can be any of the
grouping variables from the data set. If "All" is specified, a summary MF will be
calculated for the whole tree.Refer to the MFClusHier
help page (?MFClusHier
) for extended usage details.
The which.factor
argument allows users the option of selecting if mitigated
fraction should be calculated for levels of a particular variable. In the data
from Table 1, it may be of interest to calculate the mitigated fraction for each
level of variable "room". It is possible to consider levels from multiple
variables simultaneously, and the designator All
can be used to evaluate for
the entire tree, without breaking out by any variable. Evaluating the value of
mitigated fraction for the entire tree is the default behavior, should the user
not specify anything to the which.factor
argument.
Default case, looking only at whole tree:
MFClusHier(formula = lung ~ tx + room / pen / litter, data = a)
Selecting a single variable to evaluate for each factor:
MFClusHier(formula = lung ~ tx + room / pen / litter, data = a, which.factor = "room")
\clearpage
Selecting multiple variables, including the entire tree:
MFClusHier(formula = lung ~ tx + room / pen / litter, data = a, which.factor = c("room", "litter", "All"))
The default display of MFClusHier
output is a table with one row
for each unique mitigated fraction calculated. A unique mitigated fraction value
will be calculated for each level of the variables identified by the user in the
argument which.factor
(see previous discussion). The total number of rows is the
sum of the number of unique levels for each grouping variable the user specified.
The value "All" will add one row to this table.
mf_multiple <- MFClusHier(formula = lung ~ tx + room / pen / litter, data = a, which.factor = c("room", "litter", "All")) mf_multiple
mf_multiple$MFnest |> mutate(MF = round(MF, 2)) |> as.data.frame()
The table columns are:
variable
: Which variable was considered when evaluating MF for the row.level
: A unique factor level of the variable which was considered when evaluating MF for the row.MF
: Mitigated fraction calculated value.N1N2
: Sum of the n1n2
values from the rank table for the factor level of
that row.U
: Sum of the u
values from the rank table for the factor level of
that row.con_N
& vac_N
: Sum of the counts from the rank table matching the
particular factor level of the row. Note that the left hand side of the underscore will match values passed by user to compare
.con_medResp
& vac_medResp
: Median of responses for each comparison
group matching the particular factor level of the row. Note that the left hand side of the underscore will match values passed by user to argument compare
.This section covers the technical description of how to access the rank table
from MFh()
and summarize using MFnest()
. For a full discussion of the
algorithms used in these functions and how they are related, see Algorithms for calculating MF from hierarchical data.
To access the rank table, use the MFh field of the output object from
MFClusHier()
. For example:
thisMFh <- mf_multiple$MFh thisMFh
thisMFh$coreTbl |> as.data.frame()
Alternatively, calculate rank table directly by using the MFh()
function:
MFh(formula = lung ~ tx + room / pen / litter, data = a)
MFh(formula = lung ~ tx + room / pen / litter, data = a)$coreTbl |> as.data.frame()
The output table includes the following information, used to calculate the Basic Output table as described previously:
con_n
& vac_n
: Counts of observations for each treatment for a
particular instance of unique factor level of a variable. Note that the left
hand side of the underscore will match values passed by user to compare
.n1n2
: Product of counts, con_n
* vac_n
.w
: Wilcoxon statistic.u
: Mann-Whitney statistic.con_medResp
& vac_medResp
: Median observed response for each treatment group
in a particular instance of unique factor level. Note that the
left hand side of the underscore will match values passed by user to compare
.There is one row for each unique combination of factor levels across all variables. This table shows the initial statistics as determined by the experimental design. Refer to the vignette for algorithm design Algorithms for calculating MF from hierarchical data for in-depth discussion of how the rank is used to evaluate mitigated fraction values.
The rank table is not affected by changes to the which.factor
argument.
The rank table can be used to re-calculate the mitigated fraction values, for
example if the user intends to explore a different selection to the
which.factor
argument. To do this, use the function MFnest
which takes
the following input arguments:
Y
: MFh field output as from MFClusHier()
which.factor
: As above.For example:
MFnest(thisMFh, which.factor = "pen")
MFnest(thisMFh, which.factor = "pen") |> as.data.frame()
This is the same output as if the user had initially selected for variable "pen":
MFClusHier(formula = lung ~ tx + room / pen / litter, data = a, which.factor = "pen")
MFClusHier(formula = lung ~ tx + room / pen / litter, data = a, which.factor = "pen")$MFnest |> as.data.frame()
The function MFClusBootHier
allows for a bootstrapping approach to calculating
mitigated fraction values. Input arguments are:
Same as in non-bootstrapped usage (i.e. MFClusHier)
formula
: The formula in form y ~ x + a/b/c
where y
is a continuous
response, x
is a factor with two levels of treatment, and a/b/c
are grouping
variables. Nesting is assumed to be in order, left
to right, highest to lowest. So a single level of "a" will contain multiple
levels of "b" and a single level of "b" will contain multiple levels of "c".data
: The data table (data.frame
or tibble
) with variables as identified in argument formula
. If there are extra variables, they will be ignored.compare
: Treatment groups, a text vector of length two. The first position [1] is treated as the control or reference group to which members of the second position [2] are compared.which.factor
: Variable(s) of interest. This can be any of the
grouping variables from the data set. If "All" is specified, a summary MF will be
calculated for the whole tree.Additional arguments
nboot
: Number of bootstrapping events.boot.unit
: Boolean whether to sample observations from within those of the
same core.boot.cluster
: Boolean whether to sample which clusters are present. If TRUE,
some trees have all the clusters represented in the original data while others only
have a subset.alpha
: Complement of the confidence level. As used in MFClusBoot
.seed
: Used to initialize random number generator for reproducibility.A "core" is the unique combination of variable levels from the data, including
the compare
designation. In Table 1, one core would
be Room W/Pen A/Litter 11/vac and another would be Room Z/Pen F/Litter 22/con.
A "cluster" is the the unique combination of variable levels from the data,
without the compare
designation. In Table 1, the first four observations
are from the same core, Room W/Pen A/Litter 11.
Refer to the MFClusHier
help page (?MFClusBootHier
) for extended usage details.
Further discussion regarding the bootstrapping algorithm can be found in Algorithms for calculating MF from hierarchical data.
Default case, looking only at whole tree and bootstrapping both at the cluster and unit levels:
MFClusBootHier(formula = lung ~ tx + room / pen / litter, data = a)
Specifying what bootstrapping sampling to occur:
MFClusBootHier(formula = lung ~ tx + room / pen / litter, data = a, boot.unit = TRUE, boot.cluster = FALSE, which.factor = "room")
Adjusting the number of bootstrapping events and alpha:
MFClusBootHier(formula = lung ~ tx + room / pen / litter, data = a, boot.unit = FALSE, boot.cluster = TRUE, alpha = 0.1, which.factor = c("room", "litter", "All"))
\clearpage
The default display of MFClusBootHier
is a table with one row for each unique
mitigated fraction calculated, just like in the non-bootstrapped approach.
However, instead of summary statistics about the calculated mitigated fraction,
there are values summarizing the mitigated fraction from a bootstrapped
population.
mfboot_multiple <- MFClusBootHier(formula = lung ~ tx + room / pen / litter, data = a, boot.unit = FALSE, boot.cluster = TRUE, alpha = 0.1, which.factor = c("room", "litter", "All"), seed = 150) mfboot_multiple
mfboot_multiple$MFnestBoot$mfnest_summary |> mutate(median = round(median, 2), etlower = round(etlower, 2), mf.obs = round(mf.obs, 2)) |> as.data.frame()
\clearpage
The variable columns are:
variable
: Which variable was considered when evaluating MF for the row.level
: A unique factor level of the variable which was considered when
evaluating MF for the row.median
: Median of mitigated fractions calculated from the bootstrapped
population.etlower
: Lower value of equal tailed range of mitigated fractions calculated
from the bootstrapped population.etupper
: Upper value of equal tailed range of mitigated fractions calculated
from the bootstrapped population.hdlower
: Lower value of the highest posterior density range of mitigated fractions calculated
from the bootstrapped population.hdupper
: Upper value of the highest posterior density range of mitigated fractions calculated
from the bootstrapped population.mf.obs
: Mitigated fraction value calculated from data input, without
bootstrapping.This section covers the technical description of how to access the bootstrapped
results and use the MFnestBoot()
function. For a full discussion of the
algorithms of how MFhBoot()
,
MFnestBoot()
and MFClusBootHier
are related, or details of the bootstrapping
algorithm, see Algorithms for calculating MF from hierarchical data.
The bootstrapping stage is the most computationally intensive, so a user may
wish to bypass this step subsequently if the only change are values
being passed to the which.factor
or alpha
arguments. Access bootstrapping step outputs
using the MFhBoot field of the output object. For example:
this_boot_mfh <- mfboot_multiple$MFhBoot this_boot_mfh
It is possible to calculate mfboot_multiple$MFhBoot
directly by using the
MFhBoot()
function, however this executes the bootstrapping stage again. To
get the same output with both approaches, the seed
argument must be the same
value.
MFhBoot(formula = lung ~ tx + room / pen / litter, data = a, boot.unit = FALSE, boot.cluster = TRUE, seed = 150)
This output is a list of four objects:
bootmfh
: The rank table of all the bootstrapped data sets. This is
formatted as the rank table for non-bootstrapped data with designation of a single
bootstrapping instance using the additional variable bootID. Values
for bootID are simply 1:nboot
.clusters
: Unique clusters in the data.compare
: User-supplied value passed to compare argument.mfh
: The rank table from user-supplied data; formatted as for non-bootstrapped data.Note that in the particular case of boot.cluster = FALSE
, although all
nboot
instances will be the same clusters as input data, observed responses
will vary due to sampling of the cores. The case of both
boot.cluster
and boot.unit
as FALSE is simply calculating MF from data
without any bootstrapping.
To bootstrap from the same data in a reproducible manner, numerical
value for the seed argument in MFClusBootHier
. If no value for seed
is
specified, the function defaults to a value of sample(1:1e+05, 1)
and
subsequent iterations may yield different summaries.
MFClusBootHier(formula = lung ~ tx + room / pen / litter, data = a, seed = 150)
Since the bootstrapping step is the most computationally intensive, a user may
wish to use the bootstrapped rank table to recalculate mitigated fraction for
levels of different variables or a different alpha
for calculating confidence
intervals. This is possible using the MFnestBoot
function,
which takes the following input arguments:
x
: MFhBoot field output as from MFClusBootHier
.which.factor
: As above.alpha
: As above.For example:
MFnestBoot(this_boot_mfh, which.factor = c("pen", "All"), alpha = 0.1)
a <- data.frame( room = paste("Room", rep(c("W", "Z"), each = 24)), pen = paste("Pen", rep(LETTERS[1:6], each = 8)), litter = paste("Litter", rep(11:22, each = 4)), tx = rep(rep(c("vac", "con"), each = 2), 12) ) set.seed(76153) a$lung[a$tx == "vac"] <- round(rnorm(24, 5, 1.3), 2) a$lung[a$tx == "con"] <- round(rnorm(24, 7, 1.3), 2) a <- a[-48, ]
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.