phenotypes: Create Core Hunter phenotype data from data frame or file.
In corehunter/corehunter3-r: Multi-Purpose Core Subset Selection

phenotypes

R Documentation

Create Core Hunter phenotype data from data frame or file.

Description

Specify either a data frame containing the phenotypic trait observations or a file from which to read the data. See https://www.corehunter.org for documentation and examples of the phenotype data format used by Core Hunter.

Usage

phenotypes(data, types, min, max, file)

Arguments

`data`	Data frame containing one row per individual and one column per trait. Unique row and column names are required and used as item and trait ids, respectively. The data frame may optionally include a first column `NAME` used to assign names to some or all individuals.
`types`	Variable types (optional). Character vector with one element per trait, each consisting of one or two letters. Ignored when reading from file. The first letter indicates the scale type and should be one of `N` (nominal), `O` (ordinal), `I` (interval) or `R` (ratio). The optional second letter indicates the variable encoding (in Java) and should be one of `B` (boolean), `T` (short), `I` (integer), `L` (long), `R` (big integer), `F` (float), `D` (double), `M` (big decimal), `A` (date) or `S` (string). The default encoding is `S` (string) for nominal variables, `I` (integer) for ordinal and interval variables and `D` (double) for ratio variables. Interval and ratio variables are limited to numeric encodings.
If no explicit variable types are specified, they are inferred automatically from the data frame column types and classes, whenever possible. Columns of type `character` are treated as string encoded nominal variables (`N`). Unordered `factor` columns are converted to `character` and therefore also treated as string encoded nominals. Ordered factors are converted to integer encoded interval variables (`I`), as described below. Columns of type `logical` are taken to be asymmetric binary variables (`NB`). Finally, `integer` columns are treated as integer encoded interval variables (`I`) and other `numeric` columns as double encoded ratio variables (`R`).
Boolean encoded nominals (`NB`) are treated as asymmetric binary variables. For symmetric binary variables, use the default string encoding (`N` or `NS`). Other nominal variables are converted to factors.
Ordinal variables of class `ordered` are converted to integers that respect the order and range of the factor levels, and are subsequently treated as integer encoded interval variables (`I`). This conversion makes it possible to model the full range of factor levels, even when some levels do not occur in the data. For other ordinal variables, it is assumed that each value occurs at least once and that values follow the natural ordering of the chosen data type (in Java).
If explicit types are given for some variables, the others can still be inferred automatically by setting their type to `NA`.
`min`	Minimum values of interval or ratio variables (optional). Numeric vector. Ignored when reading from file. If undefined for some variables, the respective minimum is inferred from the data. If some observed values fall below a given minimum, the minimum is updated accordingly. For nominal and ordinal variables, use `NA`.
`max`	Maximum values of interval or ratio variables (optional). Numeric vector. Ignored when reading from file. If undefined for some variables, the respective maximum is inferred from the data. If some observed values exceed a given maximum, the maximum is updated accordingly. For nominal and ordinal variables, use `NA`.
`file`	File containing the phenotype data.
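To preview which types and ranges `phenotypes()` would infer for a data frame, the documented rules can be mirrored in a few lines of base R. This is a minimal sketch under stated assumptions, not corehunter's implementation: `infer_ch_type()` and `infer_range()` are hypothetical helpers written only for illustration, and the real conversion happens inside `phenotypes()` (and in Java).

```r
# Hypothetical helpers (NOT part of corehunter) that mirror the documented
# rules for inferring variable types and ranges from a data frame.

# Map a column to a Core Hunter type code based on its class.
infer_ch_type <- function(column) {
  # Check ordered before factor: an ordered factor is also a factor.
  if (is.ordered(column)) return("I")                          # integer encoded interval
  if (is.factor(column) || is.character(column)) return("N")   # string encoded nominal
  if (is.logical(column)) return("NB")                         # asymmetric binary
  # Check integer before numeric: an integer vector is also numeric.
  if (is.integer(column)) return("I")                          # integer encoded interval
  if (is.numeric(column)) return("R")                          # double encoded ratio
  NA_character_                                                # no rule applies
}

# Combine an optional bound with the observed data: an NA bound is taken
# from the data, a given bound is widened if the data falls outside it.
infer_range <- function(x, lo = NA, hi = NA) {
  observed <- range(x, na.rm = TRUE)
  lo <- if (is.na(lo)) observed[1] else min(lo, observed[1])
  hi <- if (is.na(hi)) observed[2] else max(hi, observed[2])
  c(min = lo, max = hi)
}

df <- data.frame(
  season = c("winter", "summer", "summer"),
  yield = c(34.5, 32.6, 22.1),
  size = ordered(c("l", "s", "s"), levels = c("s", "m", "l")),
  resistant = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

inferred <- vapply(df, infer_ch_type, character(1))
print(inferred)  # season "N", yield "R", size "I", resistant "NB"

# Partial specification: NA entries fall back to the inferred type.
given <- c(NA, NA, NA, "NS")
final <- ifelse(is.na(given), inferred, given)
print(final)     # "N" "R" "I" "NS"

print(infer_range(df$yield))          # min 22.1, max 34.5
print(infer_range(df$yield, 20, 30))  # min 20.0, max 34.5
```

The ordering of the checks is the main design point: ordered factors must be tested before plain factors, and integers before general numerics, because each of those classes is a special case of the one tested after it.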

Value

Phenotype data of class `chpheno` with the following elements:

data: Phenotypes (data frame).
size: Number of individuals in the dataset.
ids: Unique item identifiers.
names: Item names. Names of individuals to which no explicit name has been assigned are equal to the unique ids.
types: Variable types and encodings.
ranges: Variable ranges, when applicable (NA elsewhere).
java: Java version of the data object.
file: Normalized path of file from which the data was read (if applicable).

Examples

# create from data frame
pheno.data <- data.frame(
 season = c("winter", "summer", "summer", "winter", "summer"),
 yield = c(34.5, 32.6, 22.1, 54.12, 43.33),
 size = ordered(c("l", "s", "s", "m", "l"), levels = c("s", "m", "l")),
 resistant = c(FALSE, TRUE, TRUE, FALSE, TRUE)
)
pheno <- phenotypes(pheno.data)

# explicit types
pheno <- phenotypes(pheno.data, types = c("N", "R", "O", "NB"))
# treat last column as symmetric binary, auto infer others
pheno <- phenotypes(pheno.data, types = c(NA, NA, NA, "NS"))

# explicit ranges
pheno <- phenotypes(pheno.data, min = c(NA, 20.0, NA, NA), max = c(NA, 60.0, NA, NA))

# read from file
pheno.file <- system.file("extdata", "phenotypes.csv", package = "corehunter")
pheno <- phenotypes(file = pheno.file)
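In the examples above, `size` is an ordered factor. The level handling that the `types` description relies on can be checked in base R alone, without corehunter or Java: integer codes follow the declared level order, and subsetting keeps levels that are not observed. Treating 1 to `nlevels()` as the range of the converted variable is an assumption made for this sketch, consistent with the stated goal of modeling the full range of factor levels.

```r
# Base R only: how an ordered factor maps to integers that respect level order.
size <- ordered(c("l", "s", "s", "m", "l"), levels = c("s", "m", "l"))

sub <- size[c(1, 2)]      # only "l" and "s" are observed in this subset
codes <- as.integer(sub)  # codes follow the level order s < m < l
print(codes)              # [1] 3 1
print(levels(sub))        # [1] "s" "m" "l" (unobserved "m" is kept)

# Assumed range for the converted variable: 1 .. number of levels.
level_range <- c(1L, nlevels(sub))
print(level_range)        # [1] 1 3
```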

corehunter/corehunter3-r documentation built on May 16, 2023, 5:12 p.m.