diagnosability: Diagnosability
In EricArcher/strataG: Summaries and Population Structure Analyses of Genetic Data

diagnosability

R Documentation

Diagnosability

Description

Runs a Random Forest classification on a `gtypes` object to compute the diagnosability of each stratum (PD from Archer et al. 2017).

Usage

diagnosability(
  g,
  gene = 1,
  pairwise = FALSE,
  conf.level = 0.95,
  replace = FALSE,
  sampsize = NULL,
  train.pct = 0.5,
  min.n = 2,
  min.votes.pct = c(0.8, 0.9, 0.95),
  rp.nrep = 0,
  unk = NULL
)

Arguments

`g`	haploid `gtypes` object with aligned sequences.
`gene`	number or name of gene to use from multidna `@sequences` slot. Defaults to the first gene in the object.
`pairwise`	do analysis on all pairwise combinations of strata?
`conf.level`	confidence level for the `binom.test` confidence interval.
`replace`	sample with replacement in Random Forest trees? (see `randomForest`).
`sampsize`	sample size for each Random Forest tree (see `randomForest`). If `NULL`, a balanced sample size is chosen (see `balancedSampsize`).
`train.pct`	if `sampsize` is `NULL`, the proportion of the minimum stratum size to use for `sampsize` (the default of 0.5 uses half of the smallest stratum).
`min.n`	minimum sample size across all strata.
`min.votes.pct`	numeric vector giving the minimum proportion of votes for the assigned stratum for a sample to be considered correctly assigned.
`rp.nrep`	number of replicates for `rfPermute` computation of significance of site importance scores.
`unk`	vector of strata to be treated as "unknowns" for prediction with Random Forest model.
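
The interaction between `sampsize` and `train.pct` can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the stratum labels are made up, and rounding up with `ceiling()` is a guess at how `balancedSampsize` handles fractional sizes.

```r
# Hypothetical stratum labels for 23 samples (not from the package data)
strata <- factor(rep(
  c("Coastal", "Offshore.North", "Offshore.South"),
  times = c(12, 7, 4)
))
train.pct <- 0.5

# Number of samples in each stratum
n.per.stratum <- table(strata)

# Assumption: every stratum contributes the same number of samples to each
# tree, set to train.pct of the smallest stratum (rounding up is a guess)
n.draw <- ceiling(min(n.per.stratum) * train.pct)
sampsize <- setNames(rep(n.draw, nlevels(strata)), levels(strata))

print(sampsize)
```

Drawing the same number of samples from every stratum keeps the largest stratum from dominating each tree, which is the usual motivation for a balanced `sampsize` in `randomForest`.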

Value

A list containing a data.frame of summary statistics (`smry`) and the randomForest object (`rf`). If `pairwise` is `TRUE`, the `rf` element is a list of randomForest results, one for each row in `smry`.
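
Going by the `min.votes.pct` and `conf.level` arguments, the summary statistics combine a vote threshold with a `binom.test` confidence interval. The base R sketch below shows that kind of calculation for a single stratum. The vote proportions, the column names, and the use of `>=` for the threshold are all assumptions made for illustration; the actual columns of `smry` are not listed on this page.

```r
# Hypothetical vote proportions for 10 samples from one stratum:
# the share of trees that voted for each sample's true stratum
votes.true <- c(0.99, 0.97, 0.96, 0.93, 0.91, 0.88, 0.85, 0.72, 0.55, 0.40)
min.votes.pct <- c(0.8, 0.9, 0.95)
conf.level <- 0.95

# For each threshold, count the samples whose vote share reaches it and
# attach an exact binomial confidence interval for that proportion
vote.smry <- do.call(rbind, lapply(min.votes.pct, function(p) {
  n <- length(votes.true)
  n.correct <- sum(votes.true >= p)  # assumption: threshold is inclusive
  ci <- binom.test(n.correct, n, conf.level = conf.level)$conf.int
  data.frame(
    min.votes = p,
    n = n,
    n.correct = n.correct,
    pct.correct = n.correct / n,
    lci = ci[1],
    uci = ci[2]
  )
}))

print(vote.smry)
```

Raising the threshold can only reduce the number of samples counted as correctly assigned, so comparing rows shows how sensitive the diagnosability estimate is to the required level of vote support.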

Author(s)

Eric Archer <eric.archer@noaa.gov>

Examples

## Not run: 
library(strataG)
data(dloop.g)

pd <- diagnosability(dloop.g, pairwise = TRUE)

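# Assuming each element of pd$rf is a randomForest object (as described
# under Value), its confusion matrix is stored in the `confusion` component
lapply(pd$rf, function(x) x$confusion)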

## End(Not run)

EricArcher/strataG documentation built on June 8, 2025, 2:12 a.m.