ps_tree: ps_tree
In benmarwick/karon: Compositional Data Analysis of Archaeological Artefacts

ps_tree

R Documentation

ps_tree

Description

Fit a recursive partitioning model (classification tree) to data from sources

Usage

ps_tree(
  doc = "ps_tree",
  data,
  GroupVar,
  Groups = "All",
  AnalyticVars,
  wts = NA,
  Seed = 11111,
  CpDigits = 3,
  plotTree = TRUE,
  plotCp = TRUE,
  Model,
  ModelTitle,
  minSplit = 20,
  cP = 0.01,
  predictSources = TRUE,
  predictUnknowns = FALSE,
  unknownData,
  ID = " ",
  unknownID = " ",
  folder = " "
)

Arguments

`doc`	A string with documentation added to defintion of usage, default is ps_tree (the function name)
`data`	A data frame with the source data to be analyzed
`GroupVar`	The name of the variable defining groups, grouping is required
`Groups`	A vector of codes for groups to be used, 'All' (the default) if use all groups
`AnalyticVars`	A vector with the names (character values) of the analytic variables
`wts`	Option to weight the observations, if used, vector with length nrow(data); if NA (the default), assume equal weights
`Seed`	A positive integer, to produce a reproducible analysis
`CpDigits`	The number of significant digits to display in the Cp table, default value is 3
`plotTree`	Logical. If TRUE (the default), plot the recursive partitioning tree
`plotCp`	Logical. If TRUE (the default), plot the Cp table values
`Model`	A character string containing the names of the variables (characters) considered separated by + signs
`ModelTitle`	The parameter Model as a single character value
`minSplit`	The minimum size of a group for splitting, default is 20 (the default in rpart())
`cP`	The required improvement in Cp for a group to be split, default is .01 (the default in rpart())
`predictSources`	Logical: if TRUE, use the tree to predict sources for the source data; default is TRUE
`predictUnknowns`	Logical: if TRUE, use the tree to predict sources for observations in unknownData; default is FALSE
`unknownData`	Data frame with data used to predict sources, must contain all variables in AnalyticVars
`ID`	If not " " (the default), the name of a variable identifying a sample in data
`unknownID`	If not " " (the default), the name of a variable identifying a sample in unknownData
`folder`	The path to the folder in which data frames will be saved; default is " "

Details

The function fits a classification tree model us the R function rpart(). The variables in AnalyticVars are considered in the order in which they appear in the Model argument (from left to right). See the vignette for more details.

Value

The function returns a list with the following components:

usage: A string with the contents of the argument doc, the date run, the version of R used
dataUsed: The contents of the argument data restricted to the groups used
params_grouping: A list with the values of the arguments GroupVar and Groups
analyticVars: A vector with the value of the argument AnalyticVars
params: A list with the values of the grouping, logical, and splitting parameters
Seed: A positive integer to set the random number generator
model: A character string with the value of the argument ModelTitle
treeFit: A list with details of the tree construction_
classification: A data frame showing the crossclassification of sources and predicted sources. Rows represent sources, columns represent predicted source
CpTable: A data frame showing the decrease in Cp with increasing numbers of splits
predictedSource: If predictSources = TRUE, a data frame with the predicted source for each source sample, plus the known source, the sample ID (if given), and the analytic variable values
predictedProbs: If predictSources = TRUE, a data frame with the set of prediction probabilities for each source sample, plus the known source and sample ID (if given)
predictedSourceUnknowns: If predictUnknowns = TRUE, a data frame with the predicted source for each unknown sample, plus the the sample ID (if given) and the analytic variable values
predictedProbsUnknowns: If predictUnknowns = TRUE, a data frame with the set of prediction probabilities for each unknown sample, plus the sample ID (if given)
errorRate: If predictSources = TRUE, the proportion of misassigned source samples
errorCount: If predictSources = TRUE, a vector with the number of misassigned sources and total number of sources
predictedTotalsUnknowns: If predictUnknowns = TRUE, a vector with the number of objects predicted to be from each source
location: The value of the argument folder

Examples

# Analyze the obsidian source data with variables in the model statement in order of
# importance from a random forest analysis
data(ObsidianSources)
analyticVars<-c("Rb","Sr","Y","Zr","Nb")
save_tree <- ps_tree(data=ObsidianSources, GroupVar="Code",Groups="All",
 AnalyticVars=analyticVars, Model = "Rb"+"Sr"+"Y"+"Zr"+"Nb",
 ModelTitle = "Sr + Nb + Rb + Y + Zr", predictSources=TRUE, predictUnknowns=FALSE,
 ID="ID")

# Predict the sources of the obsidian artifacts
data(ObsidianSources)
data(ObsidianArtifacts)
analyticVars<-c("Rb","Sr","Y","Zr","Nb")
save_tree <- ps_tree(data=ObsidianSources, GroupVar="Code",Groups="All",
 AnalyticVars=analyticVars, Model = "Rb"+"Sr"+"Y"+"Zr"+"Nb",
 ModelTitle = "Sr + Nb + Rb + Y + Zr", predictSources=FALSE, predictUnknowns=TRUE,
 unknownData=ObsidianArtifacts, unknownID="ID")

benmarwick/karon documentation built on July 29, 2023, 10:11 a.m.