pamCat: Prediction Analysis of Categorical Data
In scrime: Analysis of High-Dimensional Categorical Data Such as SNP Data

pamCat

R Documentation

Description

Performs a Prediction Analysis of Categorical Data.

Usage

pamCat(data, cl, theta = NULL, n.theta = 10, newdata = NULL, newcl = NULL)

Arguments

`data`	a numeric matrix consisting of the integers between 1 and `n_{cat}`, where `n_{cat}` is the number of levels that each of the variables must take. Each row of `data` represents a variable, and each column an observation. Missing values are not allowed.
`cl`	a numeric vector of length `ncol(data)` comprising the class labels of the observations represented by the columns of `data`. `cl` must consist of the integers between 1 and `n_{cl}`, where `n_{cl}` is the number of classes.
`theta`	a numeric vector consisting of the strictly positive values of the shrinkage parameter used in the Prediction Analysis. If `NULL`, a vector consisting of `n.theta` values for the shrinkage parameter is determined automatically.
`n.theta`	an integer specifying the number of values for the shrinkage parameter of the Prediction Analysis. Ignored if `theta` is specified.
`newdata`	a numeric matrix composed of the integers between 1 and `n_{cat}`. Must have the same number of rows as `data`, and each row of `newdata` must contain the same variable as the corresponding row of `data`. `newdata` is employed to compute the misclassification rates of the Prediction Analysis for the given values of the shrinkage parameter. If `NULL`, `data` is used to determine the misclassification rates.
`newcl`	a numeric vector of length `ncol(newdata)` that consists of integers between 1 and `n_{cl}` and specifies the class labels of the observations in `newdata`. Must be specified if `newdata` is specified.
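
The requirements on `data` and `cl` listed above can be checked before calling `pamCat`. The following is a minimal sketch in base R. The helper `check_pamcat_input` is hypothetical and not part of scrime, and `pamCat` may perform its own, possibly different, checks.

```r
# Hypothetical helper (not part of scrime): checks the documented
# requirements on `data` and `cl` before a call to pamCat().
check_pamcat_input <- function(data, cl) {
  stopifnot(is.matrix(data), is.numeric(data), is.numeric(cl))
  if (anyNA(data) || anyNA(cl))
    stop("missing values are not allowed")
  if (length(cl) != ncol(data))
    stop("length(cl) must equal ncol(data)")
  n.cat <- max(data)
  n.cl <- max(cl)
  # All entries of data must be integers between 1 and n.cat.
  if (!all(data %in% seq_len(n.cat)))
    stop("data must consist of the integers between 1 and n.cat")
  # cl must use exactly the labels 1, ..., n.cl.
  if (!all(cl %in% seq_len(n.cl)) || !all(seq_len(n.cl) %in% cl))
    stop("cl must consist of the integers between 1 and n.cl")
  invisible(list(n.cat = n.cat, n.cl = n.cl))
}

set.seed(1)
toy <- matrix(sample(3, 60, TRUE), nrow = 6)  # 6 variables, 10 observations
toy.cl <- rep(1:2, each = 5)
res <- check_pamcat_input(toy, toy.cl)
```

The same checks could be applied to `newdata` and `newcl`, with the additional condition `nrow(newdata) == nrow(data)`.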

Value

An object of class pamCat composed of

`mat.chisq`	a matrix with `m` rows and `n_{cl}` columns consisting of the classwise values of Pearson's `\chi^2` statistic for each of the `m` variables.
`mat.obs`	a matrix with `m` rows and `n_{cat} * n_{cl}` columns in which each row shows a contingency table between the corresponding variable and `cl`.
`mat.exp`	a matrix of the same size as `mat.obs` containing the numbers of observations expected under the null hypothesis of no association between the respective variable and `cl`.
`mat.theta`	a data frame consisting of the numbers of variables used in the classification of the observations in `newdata` and the corresponding misclassification rates for a set of values of the shrinkage parameter `\theta`.
`tab.cl`	a table summarizing the values of the response, i.e. the class labels.
`n.cat`	`n_{cat}`, i.e. the number of levels of the variables.
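
To make the roles of `mat.obs`, `mat.exp` and `mat.chisq` more concrete, the following base-R sketch computes, for a single variable, the observed contingency table with `cl`, the counts expected under no association, and the contributions to Pearson's `\chi^2` statistic summed within each class. It is an illustration only. How `pamCat` arranges these quantities internally, and whether its classwise statistics are exactly the column sums of the usual `(O - E)^2 / E` terms, is an assumption here.

```r
# One variable with 3 levels measured on 10 observations in 2 classes.
x  <- c(1, 1, 2, 3, 1, 2, 3, 3, 3, 2)
cl <- rep(1:2, each = 5)

# Observed counts: rows = levels of the variable, columns = classes.
obs <- table(factor(x, levels = 1:3), factor(cl, levels = 1:2))

# Expected counts under no association: row total * column total / n.
expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Contributions (O - E)^2 / E, summed over the levels within each class
# (assumed definition of a "classwise" statistic).
chisq.cl <- colSums((obs - expd)^2 / expd)

# Their sum is the usual Pearson statistic for the whole table.
chisq.total <- sum(chisq.cl)
```

Flattening `obs` and `expd` into vectors of length `n_{cat} * n_{cl}` would give one row in the layout described for `mat.obs` and `mat.exp`, although the ordering of the columns used by `pamCat` is not specified here.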

Author(s)

Holger Schwender, holger.schwender@udo.edu

References

Schwender, H. (2007). Statistical Analysis of Genotype and Gene Expression Data. Dissertation, Department of Statistics, University of Dortmund.

Examples

## Not run: 
# Generate a data set consisting of 2000 rows (variables) and 50 columns
# (observations). Assume that the first 25 observations belong to class 1,
# and the other 25 observations to class 2.

mat <- matrix(sample(3, 100000, TRUE), 2000)
rownames(mat) <- paste("SNP", 1:2000, sep = "")
cl <- rep(1:2, each = 25)

# Apply PAM for categorical data to this matrix, and compute the
# misclassification rate on the training set, i.e. on mat.

pam.out <- pamCat(mat, cl)
pam.out

# Now generate a new data set consisting of 20 observations, 
# and predict the classes of these observations using the
# value of theta that has led to the smallest misclassification
# rate in pam.out.

mat2 <- matrix(sample(3, 40000, TRUE), 2000)
rownames(mat2) <- paste("SNP", 1:2000, sep = "")
predict(pam.out, mat2)

# Let's assume that the predicted classes are the real classes
# of the observations. Then, mat2 can also be used in pamCat
# to compute the misclassification rate. 

cl2 <- predict(pam.out, mat2)
pamCat(mat, cl, newdata = mat2, newcl = cl2)


## End(Not run)
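
The misclassification rates mentioned in the examples can be illustrated without scrime. The sketch below assumes the usual definition, i.e. the proportion of observations whose predicted class differs from the true class. The vectors `truth` and `pred` are made-up labels for illustration, not output of `pamCat` or `predict`.

```r
# Hypothetical true and predicted class labels for 8 observations.
truth <- c(1, 1, 1, 1, 2, 2, 2, 2)
pred  <- c(1, 1, 2, 1, 2, 2, 1, 2)

# Misclassification rate: proportion of observations whose predicted
# class differs from the true class.
mcr <- mean(pred != truth)

# Cross-tabulation of true versus predicted classes.
conf.tab <- table(truth = truth, predicted = pred)
```

Under this definition, using predicted classes as if they were the true classes, as in the last example above, yields a rate of zero for the value of the shrinkage parameter that produced those predictions.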

scrime documentation built on Jan. 26, 2026, 1:07 a.m.