freqMAP: Create a Frequency Moving Average Plot (MAP) Object


Description

This function creates a frequency MAP object from categorical data and a continuous covariate. The frequency MAP is a moving average estimate of category frequencies as a function of the covariate. At each evaluation point it reports the observed category frequencies together with posterior bounds on the true frequencies.

Usage

freqMAP(dat, x, x.label, hw, cat.names = NULL, cat.short = NULL, num.samples = 1e+05)

Arguments

dat

A data frame with two columns. The first column holds the numeric covariate over which the moving average is calculated (e.g. age). The second column is a string vector giving the category of each observation (e.g. genotype).

x

A numeric vector of points at which to calculate the moving average, on the same scale as the first column of dat.

x.label

A string which will be used to name the covariate column (first column) of the output moving average dataframe, cat.ma. The name of the covariate column in dat is not used for this purpose.

hw

The half-width of the moving average. See Details below.

cat.names

Optional. The categories to analyze. The default behavior is to use all unique values in dat[,2]. See Details below.

cat.short

Optional. A string vector of the same length as cat.names giving short forms of the category names. These short forms are used in subsequent plots.

num.samples

The number of samples to generate from the posterior on the true category frequencies.

Details

The following calculations are performed independently for each element of x:

First, a frequency moving average is generated by binning the category data in dat[,2] into buckets x[i]+/-hw according to the value in dat[,1]. The observed category frequency is then tabulated for all categories. Note that when adjacent elements of x are closer together than 2*hw, the buckets overlap and an observation in dat can be counted in more than one bucket.

Next, the frequency data are modeled as Multinomial with an unknown true category frequency vector. The prior on the true category frequency vector is assumed to be uniform, i.e. Dirichlet(1,...,1). Samples are generated from the resulting Dirichlet posterior distribution on the true category frequency vector, and the central 95% posterior interval (CPI) on each true category frequency is estimated from those samples. See Value below for details on how results are tabulated.
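The steps above can be sketched in base R for a single bucket. This is an illustration of the described calculation, not the package's own code: the toy data, the variable names (x.i, in.bucket, post, cpi) and the inclusive treatment of the bucket endpoints are all assumptions, and the Dirichlet draws are produced by normalizing Gamma variates, which is one standard way to sample that distribution.

```r
set.seed(1)

# Toy data in the layout described for dat: covariate first, category second.
dat <- data.frame(age = runif(500, min = 40, max = 80),
                  genotype = sample(c("AA", "AB", "BB"), 500, replace = TRUE,
                                    prob = c(0.5, 0.3, 0.2)),
                  stringsAsFactors = FALSE)

cat.names <- sort(unique(dat[, 2]))
x.i <- 60            # one bucket centre, i.e. a single element of x
hw <- 5              # half-width of the bucket
num.samples <- 10000

# Step 1: select the observations falling in the bucket x.i +/- hw.
# Assumption: both endpoints are inclusive.
in.bucket <- abs(dat[, 1] - x.i) <= hw

# Step 2: tabulate counts and observed frequencies for every category.
counts <- table(factor(dat[in.bucket, 2], levels = cat.names))
n <- sum(counts)
obs.freq <- counts / n

# Step 3: a uniform Dirichlet(1,...,1) prior with Multinomial counts gives a
# Dirichlet(counts + 1) posterior. Sample it by normalizing Gamma draws.
alpha <- as.numeric(counts) + 1
g <- matrix(rgamma(num.samples * length(alpha),
                   shape = rep(alpha, each = num.samples)),
            nrow = num.samples)
post <- g / rowSums(g)
colnames(post) <- cat.names

# Step 4: central 95% posterior interval for each category.
cpi <- apply(post, 2, quantile, probs = c(0.025, 0.975))

print(obs.freq)
print(cpi)
```

Repeating these four steps for every element of x yields one row of observed frequencies and interval bounds per bucket.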

If it is specified, cat.names must contain at least all of the unique values of dat[,2]. It can be useful to define extra elements of cat.names if you know that there are other possible categories that were not observed in dat[,2] simply due to finite sampling of low probability categories.
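A small base R sketch shows why listing an unobserved category matters under the uniform prior described above. The bucket contents and category labels here are hypothetical, and the posterior mean is computed analytically rather than by calling freqMAP.

```r
# Hypothetical contents of one bucket: "BB" is a possible category
# that happens not to be observed.
obs <- c("AA", "AB", "AA", "AA", "AB")
cat.names <- c("AA", "AB", "BB")

# Tabulating against the full set of levels keeps a zero count for "BB".
counts <- table(factor(obs, levels = cat.names))

# Dirichlet(counts + 1) posterior: the mean of component i is
# alpha[i] / sum(alpha).
alpha <- as.numeric(counts) + 1
post.mean <- alpha / sum(alpha)
names(post.mean) <- cat.names

print(counts)     # AA = 3, AB = 2, BB = 0
print(post.mean)  # AA = 0.5, AB = 0.375, BB = 0.125
```

With "BB" listed, the posterior reserves some probability for it despite the zero count. If it were left out, the same five observations would be spread over only two categories.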

Along with the example given below, see freqMAP-package for an example based on genotype data.

Value

A list with the following elements:

cat.ma

A data frame with the following columns. The first column, named by x.label, holds the values of the x argument passed in. n gives the number of category observations falling in the bucket centered at x. Columns titled by the elements of cat.names are the observed frequencies for each category. Columns with names ending in .lpi and .upi give the lower and upper bounds of the 95% CPI for each category.

post.samples

A three-dimensional array containing the posterior samples. The first dimension indexes the sample, the second dimension indexes the category, and the third dimension indexes the bucket centered at x.
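As a sketch of how an array with this layout can be summarized, the code below builds a stand-in array with the same dimension order (sample, category, bucket) and reduces it with apply. The array is simulated for illustration and is not freqMAP output. Whether the real post.samples carries dimnames is not stated here, so the sketch indexes by position only.

```r
set.seed(2)

num.samples <- 1000
n.cat <- 2                 # two categories
x <- c(-1, 0, 1)           # three bucket centres

# Stand-in for post.samples: dimensions are sample x category x bucket.
post.samples <- array(NA_real_, dim = c(num.samples, n.cat, length(x)))
for (k in seq_along(x)) {
  p <- rbeta(num.samples, 2 + k, 5 - k)   # arbitrary illustrative draws
  post.samples[, , k] <- cbind(p, 1 - p)  # frequencies sum to 1 per sample
}

# Posterior mean for every category (rows) and bucket (columns).
post.mean <- apply(post.samples, c(2, 3), mean)

# Central 95% interval for the first category in each bucket.
cpi.first <- apply(post.samples[, 1, ], 2, quantile, probs = c(0.025, 0.975))

print(post.mean)
print(cpi.first)
```

Keeping the sample index in the first dimension means that any per-sample quantity, such as a difference or ratio between two categories, can be computed across the whole first dimension before summarizing.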

cat.names

The category names analyzed.

cat.short

Short forms of the category names. If not supplied, this equals cat.names.

hw

The bucket half-width passed in.

x.label

The name of the continuous covariate. All functions using the object created by this function will look up the column obj$cat.ma[,obj$x.label] as the continuous covariate.

The returned object has class c("freqMAP", "list").

Author(s)

Colin McCulloch <colin.mcculloch@themccullochgroup.com>

References

Payami, H., Kay, D.M., Zabetian, C.P., Schellenberg, G.D., Factor, S.A., and McCulloch, C.C. (2009) "Visualizing Disease Associations: Graphic Analysis of Frequency Distributions as a Function of Age Using Moving Average Plots (MAP) with Application to Alzheimer's and Parkinson's Disease", Genetic Epidemiology

See Also

plot.freqMAP, summary.freqMAP, posterior.comparison.freqMAP

Examples

  #Make two sets of 2-category frequency data, y1 & y2, which both vary as
  #a function of a continuous variable x
  x <- runif(2000,min=-2,max=2)
  y1 <- c("a","b")[1+rbinom(n=length(x),size=rep(1,length(x)),prob=pnorm(x/2))]
  y2 <- c("a","b")[1+rbinom(n=length(x),size=rep(1,length(x)),prob=pnorm(x/5))]

  #Create the frequency MAP objects for y1 and y2
  fp1 <- freqMAP(data.frame(x=x,y=y1,stringsAsFactors=FALSE),
                  x=seq(-2,2,by=.2),x.label="x",hw=.2)
  fp2 <- freqMAP(data.frame(x=x,y=y2,stringsAsFactors=FALSE),
                  x=seq(-2,2,by=.2),x.label="x",hw=.2)

  #Examine the frequency MAP objects
  summary(fp1)
  print(fp2)

  #Compare the posterior distributions on the two frequency MAPs
  pc <- posterior.comparison.freqMAP(group1=fp1,group2=fp2)

  #Three example plots
  plot(fp1,ylim=matrix(c(0,1),nrow=length(fp1$cat.names),ncol=2,byrow=TRUE))
  plot(fp1,fp2,type="freq",legend=c("y1","y2"),show.p.value.legend=TRUE)
  plot(fp1,fp2,type="or")

freqMAP documentation built on May 29, 2017, 11:42 p.m.