Description Usage Arguments Details Value Author(s) References See Also Examples
This function creates a frequency MAP object from categorical data and a continuous covariate. The frequency MAP is a moving average estimate of category frequencies, where frequency means and posterior bounds are estimated.
dat | A dataframe with two columns. The first column should hold the numeric value over which the moving average is calculated (e.g. age); the second column is a string vector giving the category (e.g. genotype).
x | A numeric vector of points at which to calculate the moving average (relative to the first column of dat).
x.label | A string used to name the covariate column (first column) of the output moving average dataframe, cat.ma.
hw | The half-width of the moving average. See Details below.
cat.names | Optional. The categories to analyze. The default behavior is to use all unique values in dat[,2]. See Details below.
cat.short | Optional. A string vector of the same length as cat.names giving short forms of the category names.
num.samples | The number of samples to generate from the posterior on the true category frequencies.
The following calculations are performed independently for each element of x:
First, a frequency moving average is generated by binning the category data in dat[,2] into buckets x[i]+/-hw by the value in dat[,1]. Then the observed category frequency is tabulated for all categories. (Note that with certain choices of x and hw, data in dat[,2] can be counted multiple times in multiple buckets.)
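The binning step can be sketched in base R as follows. This is only an illustration, not the package's internal code: the helper name `bin_counts` and the toy data are hypothetical, and the buckets are assumed to be closed intervals [x[i]-hw, x[i]+hw], which the text does not specify.

```r
# Hypothetical helper (not part of freqMAP): count categories in x[i] +/- hw.
bin_counts <- function(dat, x, hw, cat.names = sort(unique(dat[, 2]))) {
  counts <- matrix(0L, nrow = length(x), ncol = length(cat.names),
                   dimnames = list(NULL, cat.names))
  for (i in seq_along(x)) {
    # Assumption: a point belongs to bucket i if it lies within hw of x[i]
    in_bucket <- abs(dat[, 1] - x[i]) <= hw
    counts[i, ] <- table(factor(dat[in_bucket, 2], levels = cat.names))
  }
  counts
}

# Toy data, assumed for illustration only
toy <- data.frame(age = c(1, 2, 3, 4, 5),
                  genotype = c("a", "b", "a", "a", "b"),
                  stringsAsFactors = FALSE)

counts <- bin_counts(toy, x = c(2, 4), hw = 1)
freqs <- counts / rowSums(counts)   # observed category frequencies per bucket
```

Here the buckets around 2 and 4 overlap at age 3, so that observation is counted in both rows, which is the multiple-counting behavior noted above.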
Next, the frequency data is modeled as Multinomial with an unknown
true category frequency vector. The prior on the true category
frequency vector is assumed to be uniform (Dirichlet(1,...,1)). Samples are
generated from the Dirichlet posterior distribution on the true
category frequency vector. The central 95% posterior interval (CPI)
on all true category frequencies is estimated from the posterior
samples. See Value below for details on how results are tabulated.
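By Dirichlet-Multinomial conjugacy, a Dirichlet(1,...,1) prior combined with observed counts n gives a Dirichlet(n + 1) posterior. A minimal sketch of the sampling and interval step for a single bucket, using normalized Gamma draws in base R, might look like this. The helper name `rdirichlet_sketch`, the counts, and the seed are assumptions for illustration; the package's own sampler may differ.

```r
# Hypothetical helper: draw from Dirichlet(alpha) via normalized Gamma variates.
rdirichlet_sketch <- function(num.samples, alpha) {
  # byrow = TRUE lines up the recycled shape vector with the columns
  g <- matrix(rgamma(num.samples * length(alpha), shape = alpha),
              ncol = length(alpha), byrow = TRUE)
  g / rowSums(g)
}

set.seed(1)
counts <- c(a = 12, b = 5, c = 0)              # assumed counts for one bucket
post <- rdirichlet_sketch(10000, counts + 1)   # uniform prior adds 1 per category
colnames(post) <- names(counts)

post_mean <- colMeans(post)
cpi <- apply(post, 2, quantile, probs = c(0.025, 0.975))   # central 95% interval
```

Each row of `post` is one draw of the full frequency vector, so the rows sum to one and the intervals in `cpi` are computed per category.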
If it is specified, cat.names must contain at least all of the unique values of dat[,2]. It can be useful to define extra elements of cat.names if you know that there are other possible categories that were not observed in dat[,2] simply due to finite sampling of low probability categories.
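To see why an extra element of cat.names matters, consider the following base R sketch. The genotype labels and counts are made up for illustration, and the calculation uses only the uniform Dirichlet prior described above, not any freqMAP function.

```r
# Assumed toy data: "GG" is a possible category that was never observed.
observed <- c("AA", "AG", "AA", "AG", "AA")
cat.names <- c("AA", "AG", "GG")

# cat.names must cover every observed value
stopifnot(all(unique(observed) %in% cat.names))

# Tabulating against cat.names keeps the unobserved category with a zero count
counts <- table(factor(observed, levels = cat.names))

# Mean of the Dirichlet(counts + 1) posterior under the uniform prior
posterior_mean <- (counts + 1) / sum(counts + 1)
```

With "GG" included, the unobserved category still receives posterior mass (1/8 here) instead of being left out of the analysis.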
Along with the example given below, see freqMAP-package for an example based on genotype data.
A list with the following elements:
cat.ma | A dataframe with the following columns.
post.samples | Three dimensional array containing the posterior samples. The first dimension indexes the sample, the second dimension indexes the category, the third dimension indexes the bucket centered at each element of x.
cat.names | The category names analyzed.
cat.short | Shortforms of the category names. If not supplied, then this will equal cat.names.
hw | The bucket halfwidth passed in.
x.label | The name of the continuous covariate. All functions using the object created by this function will search for element
The returned object has class c("freqMAP", "list").
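The layout of post.samples (sample by category by bucket) lends itself to summaries with apply(). The sketch below builds a mock array with that layout rather than calling freqMAP, so the array contents, dimensions, and seed are all assumptions for illustration.

```r
set.seed(2)
num.samples <- 1000
cat.names <- c("a", "b")
x <- c(-1, 0, 1)

# Mock stand-in for a post.samples element: sample x category x bucket
post.samples <- array(NA_real_,
                      dim = c(num.samples, length(cat.names), length(x)),
                      dimnames = list(NULL, cat.names, NULL))
for (i in seq_along(x)) {
  p <- rbeta(num.samples, 2 + i, 4 - i)   # arbitrary two-category draws
  post.samples[, , i] <- cbind(p, 1 - p)
}

# Collapse over the sample dimension: one value per category and bucket
post_mean <- apply(post.samples, c(2, 3), mean)
cpi_lower <- apply(post.samples, c(2, 3), quantile, probs = 0.025)
cpi_upper <- apply(post.samples, c(2, 3), quantile, probs = 0.975)
```

The resulting matrices have one row per category and one column per bucket.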
Colin McCulloch <colin.mcculloch@themccullochgroup.com>
Payami, H., Kay, D.M., Zabetian, C.P., Schellenberg, G.D., Factor, S.A., and McCulloch, C.C. (2009) "Visualizing Disease Associations: Graphic Analysis of Frequency Distributions as a Function of Age Using Moving Average Plots (MAP) with Application to Alzheimer's and Parkinson's Disease", Genetic Epidemiology
plot.freqMAP, summary.freqMAP, posterior.comparison.freqMAP
library(freqMAP)

# Make two sets of 2-category frequency data, y1 & y2, which both vary as
# a function of a continuous variable x
x <- runif(2000, min = -2, max = 2)
y1 <- c("a", "b")[1 + rbinom(n = length(x), size = 1, prob = pnorm(x / 2))]
y2 <- c("a", "b")[1 + rbinom(n = length(x), size = 1, prob = pnorm(x / 5))]

# Create the frequency MAP objects for y1 and y2
fp1 <- freqMAP(data.frame(x = x, y = y1, stringsAsFactors = FALSE),
               x = seq(-2, 2, by = .2), x.label = "x", hw = .2)
fp2 <- freqMAP(data.frame(x = x, y = y2, stringsAsFactors = FALSE),
               x = seq(-2, 2, by = .2), x.label = "x", hw = .2)

# Examine the frequency MAP objects
summary(fp1)
print(fp2)

# Compare the posterior distributions on the two frequency MAPs
pc <- posterior.comparison.freqMAP(group1 = fp1, group2 = fp2)

# Three example plots
plot(fp1, ylim = matrix(c(0, 1), nrow = length(fp1$cat.names), ncol = 2, byrow = TRUE))
plot(fp1, fp2, type = "freq", legend = c("y1", "y2"), show.p.value.legend = TRUE)
plot(fp1, fp2, type = "or")