freqMAP: Create a Frequency Moving Average Plot (MAP) Object
In freqMAP: Frequency Moving Average Plots (MAP) of Multinomial Data by a Continuous Covariate


This function creates a frequency MAP object from categorical data and a continuous covariate. The frequency MAP is a moving average estimate of category frequencies, where frequency means and posterior bounds are estimated.

freqMAP(dat, x, x.label, hw, cat.names = NULL, cat.short = NULL, num.samples = 1e+05)

`dat`	A data frame with two columns. The first column holds the numeric covariate over which the moving average is calculated (e.g. age); the second column is a string vector giving the category (e.g. genotype).
`x`	A numeric vector at which to calculate the moving average (relative to the first column of `dat`).
`x.label`	A string which will be used to name the covariate column (first column) of the output moving average dataframe, `cat.ma`. The name of the covariate column in `dat` is not used for this purpose.
`hw`	The half-width of the moving average. See Details below.
`cat.names`	Optional. The categories to analyze. The default behavior is to use all unique values in dat[,2]. See Details below.
`cat.short`	Optional. A string vector of the same length as `cat.names` giving short forms of the category names. These short forms are used in subsequent plots.
`num.samples`	The number of samples to generate from the posterior on the true category frequencies.

The following calculations are performed independently for each element of x:

First, a frequency moving average is generated by binning the category data in dat[,2] into buckets x[i]+/-hw according to the value in dat[,1]. The observed category frequency is then tabulated for all categories. (Note that when buckets overlap, for example when neighboring elements of x are closer together than 2*hw, a single observation in dat can be counted in more than one bucket.) Next, the category counts are modeled as Multinomial with an unknown true category frequency vector. The prior on the true category frequency vector is assumed to be uniform, i.e. Dirichlet(1,...,1). Samples are generated from the resulting Dirichlet posterior distribution on the true category frequency vector. The central 95% posterior interval (CPI) on each true category frequency is estimated from the posterior samples. See Value below for details on how results are tabulated.
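
The per-bucket calculation described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper name `bucket_posterior` is hypothetical, the bucket is assumed to include both endpoints, and Dirichlet draws are generated by normalizing independent gamma variates.

```r
# Hypothetical helper (not part of freqMAP): posterior summary for one bucket.
# Assumption: the bucket is the closed interval [x0 - hw, x0 + hw].
bucket_posterior <- function(dat, x0, hw, cat.names, num.samples = 1000) {
  in.bucket <- abs(dat[, 1] - x0) <= hw
  # Count observations per category, keeping zero counts for unseen categories
  counts <- table(factor(dat[in.bucket, 2], levels = cat.names))
  n <- sum(counts)
  # A uniform Dirichlet(1,...,1) prior with Multinomial counts
  # gives a Dirichlet(counts + 1) posterior
  alpha <- as.numeric(counts) + 1
  # Dirichlet samples: normalize independent Gamma(alpha_k, 1) draws row by row
  g <- matrix(rgamma(num.samples * length(alpha),
                     shape = rep(alpha, each = num.samples)),
              nrow = num.samples)
  samples <- g / rowSums(g)
  colnames(samples) <- cat.names
  list(n = n,
       freq = as.numeric(counts) / n,                      # observed frequencies
       lpi = apply(samples, 2, quantile, probs = 0.025),   # lower 95% CPI bound
       upi = apply(samples, 2, quantile, probs = 0.975))   # upper 95% CPI bound
}

# Toy data (made up for illustration)
set.seed(1)
toy <- data.frame(age = c(40, 42, 45, 47, 60),
                  g = c("a", "b", "a", "a", "b"),
                  stringsAsFactors = FALSE)
res <- bucket_posterior(toy, x0 = 45, hw = 5, cat.names = c("a", "b"))
res$n     # 4 observations fall in [40, 50]
res$freq  # 0.75 0.25
```

Repeating this for every element of `x` would give one row of frequencies and interval bounds per bucket, which is the kind of table described under Value.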

If it is specified, cat.names must contain at least all of the unique values of dat[,2]. It can be useful to define extra elements of cat.names if you know that there are other possible categories that were not observed in dat[,2] simply due to finite sampling of low probability categories.
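
As a small illustration of why an extra, unobserved category can matter, the sketch below uses made-up data and plain base R; it does not call freqMAP, and the variable names are chosen for the example only. Under the uniform Dirichlet prior described above, a category with a zero count still receives a nonzero posterior mean.

```r
# Observed categories; "c" is a possible category that was not sampled
observed <- c("a", "b", "a", "a", "b")
cat.names <- c("a", "b", "c")

# Every observed category must appear in cat.names
missing.cats <- setdiff(unique(observed), cat.names)
stopifnot(length(missing.cats) == 0)

# Tabulating against cat.names keeps the unobserved category with a zero count
counts <- table(factor(observed, levels = cat.names))

# Posterior mean under a Dirichlet(1,...,1) prior: (count + 1) / (n + K)
post.mean <- (as.numeric(counts) + 1) / (sum(counts) + length(cat.names))
post.mean  # 0.500 0.375 0.125
```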

Along with the example given below, see freqMAP-package for an example based on genotype data.

A list with the following elements:

`cat.ma`	A dataframe with the following columns. `x` gives the `x` argument passed in. `n` gives the number of category observations falling in the bucket centered at `x`. Columns titled by the elements of `cat.names` are the observed frequencies for each category. Columns with names ending in `.lpi` and `.upi` give the lower and upper bounds of the 95% CPI for each category.
`post.samples`	Three dimensional array containing the posterior samples. The first dimension indexes the sample, the second dimension indexes the category, the third dimension indexes the bucket centered at `x`.
`cat.names`	The category names analyzed.
`cat.short`	Short forms of the category names. If not supplied, this equals `cat.names`.
`hw`	The bucket half-width passed in.
`x.label`	The name of the continuous covariate. All functions using the object created by this function will look up the column `obj$cat.ma[, obj$x.label]` as the continuous covariate.

The returned object has class c("freqMAP", "list").
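
To make the sample x category x bucket layout of `post.samples` concrete, the following sketch builds a stand-in array with that shape and summarizes it. The array here is simulated for illustration and is not produced by freqMAP; with a real object one would presumably use `obj$post.samples` in place of `post`.

```r
set.seed(1)
num.samples <- 500; n.cat <- 2; n.bucket <- 3

# Stand-in for obj$post.samples. For two categories, a Dirichlet(3, 2) draw
# is (p, 1 - p) with p ~ Beta(3, 2).
p.a <- matrix(rbeta(num.samples * n.bucket, 3, 2), nrow = num.samples)
post <- array(NA_real_, dim = c(num.samples, n.cat, n.bucket))
post[, 1, ] <- p.a
post[, 2, ] <- 1 - p.a

# All posterior samples for category 1 in bucket 2
s <- post[, 1, 2]

# Central 95% interval for every category/bucket combination
cpi <- apply(post, c(2, 3), quantile, probs = c(0.025, 0.975))
dim(cpi)  # 2 2 3: (lower, upper) x category x bucket
```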

Colin McCulloch <colin.mcculloch@themccullochgroup.com>

Payami, H., Kay, D.M., Zabetian, C.P., Schellenberg, G.D., Factor, S.A., and McCulloch, C.C. (2009) "Visualizing Disease Associations: Graphic Analysis of Frequency Distributions as a Function of Age Using Moving Average Plots (MAP) with Application to Alzheimer's and Parkinson's Disease", Genetic Epidemiology

plot.freqMAP, summary.freqMAP, posterior.comparison.freqMAP

  library(freqMAP)

  #Make two sets of 2-category frequency data, y1 & y2, which both vary as
  #a function of a continuous variable x
  x <- runif(2000,min=-2,max=2)
  y1 <- c("a","b")[1+rbinom(n=length(x),size=rep(1,length(x)),prob=pnorm(x/2))]
  y2 <- c("a","b")[1+rbinom(n=length(x),size=rep(1,length(x)),prob=pnorm(x/5))]

  #Create the frequency MAP objects for y1 and y2
  fp1 <- freqMAP(data.frame(x=x,y=y1,stringsAsFactors=FALSE),
                  x=seq(-2,2,by=.2),x.label="x",hw=.2)
  fp2 <- freqMAP(data.frame(x=x,y=y2,stringsAsFactors=FALSE),
                  x=seq(-2,2,by=.2),x.label="x",hw=.2)

  #Examine the frequency MAP objects
  summary(fp1)
  print(fp2)

  #Compare the posterior distributions on the two frequency MAPs
  pc <- posterior.comparison.freqMAP(group1=fp1,group2=fp2)

  #Three example plots
  plot(fp1,ylim=matrix(c(0,1),nrow=length(fp1$cat.names),ncol=2,byrow=TRUE))
  plot(fp1,fp2,type="freq",legend=c("y1","y2"),show.p.value.legend=TRUE)
  plot(fp1,fp2,type="or")