phenotypes: Create Core Hunter phenotype data from data frame or file.

View source: R/data.R

phenotypes    R Documentation

Description

Specify either a data frame containing the phenotypic trait observations or a file from which to read the data. See https://www.corehunter.org for documentation and examples of the phenotype data format used by Core Hunter.

Usage

phenotypes(data, types, min, max, file)

Arguments

data

Data frame containing one row per individual and one column per trait. Unique row and column names are required and used as item and trait ids, respectively. The data frame may optionally include a first column NAME used to assign names to some or all individuals.
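As a minimal sketch of the layout described above, the following base R snippet builds a data frame with unique row names (item ids), unique column names (trait ids) and an optional first column NAME. The ids, names and trait values are made up for illustration, and using NA for individuals without an explicit name is an assumption, not something stated in this documentation.

```r
# Hypothetical example data: three individuals, two traits.
pheno.named <- data.frame(
  NAME = c("Alice", NA, "Carol"),        # optional names (NA = no explicit name; assumption)
  yield = c(34.5, 32.6, 22.1),           # trait column
  season = c("winter", "summer", "summer"),
  row.names = c("id-1", "id-2", "id-3")  # unique item ids
)

# Check the uniqueness requirements before handing the data to phenotypes().
ids.unique <- anyDuplicated(rownames(pheno.named)) == 0
traits.unique <- anyDuplicated(colnames(pheno.named)) == 0
```

Both checks should be TRUE for a data frame that meets the stated requirements.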

types

Variable types (optional). Character vector with one element per variable, each element being one or two letters long. Ignored when reading from file.

The first letter indicates the scale type and should be one of N (nominal), O (ordinal), I (interval) or R (ratio).

The second letter optionally indicates the variable encoding (in Java) and should be one of B (boolean), T (short), I (integer), L (long), R (big integer), F (float), D (double), M (big decimal), A (date) or S (string). The default encoding is S (string) for nominal variables, I (integer) for ordinal and interval variables and D (double) for ratio variables. Interval and ratio variables are limited to numeric encodings.
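The one or two letter codes described above can be checked and expanded with a few lines of base R. This is only an illustrative sketch: the helpers is_valid_type_code and expand_type_code are hypothetical and not part of corehunter, and the set of encodings treated as numeric here (T, I, L, R, F, D, M) is an assumption derived from the encoding names listed above.

```r
# Encodings assumed to be numeric (short, integer, long, big integer,
# float, double, big decimal); boolean, date and string are excluded.
numeric.encodings <- c("T", "I", "L", "R", "F", "D", "M")

# Hypothetical helper: TRUE if 'code' is a well-formed type code.
is_valid_type_code <- function(code) {
  if (!grepl("^[NOIR][BTILRFDMAS]?$", code)) return(FALSE)
  scale <- substr(code, 1, 1)
  encoding <- substr(code, 2, 2)  # "" when no encoding is given
  if (scale %in% c("I", "R") && nzchar(encoding)) {
    # interval and ratio variables are limited to numeric encodings
    return(encoding %in% numeric.encodings)
  }
  TRUE
}

# Default encodings per scale type, as stated in the text above.
default.encoding <- c(N = "S", O = "I", I = "I", R = "D")

# Hypothetical helper: append the default encoding to a one letter code.
expand_type_code <- function(code) {
  if (nchar(code) == 2) return(code)
  paste0(code, default.encoding[[code]])
}
```

For example, expand_type_code("N") yields "NS", and is_valid_type_code("RS") is FALSE because a ratio variable cannot be string encoded.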

If no explicit variable types are specified, they are inferred from the data frame column types and classes whenever possible. Columns of type character are treated as nominal string encoded variables (N). Unordered factor columns are converted to character and also treated as string encoded nominals. Ordered factors are converted to integer encoded interval variables (I) as described below. Columns of type logical are taken to be asymmetric binary variables (NB). Finally, integer columns are treated as integer encoded interval variables (I) and other numeric columns as double encoded ratio variables (R).
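The inference rules in the preceding paragraph can be written down as a small lookup on column classes. The function infer_type_code below is a hypothetical illustration in base R, not the code used by corehunter, and it returns NA for column classes the paragraph does not mention.

```r
# Hypothetical helper mirroring the documented inference rules.
infer_type_code <- function(column) {
  if (is.ordered(column)) return("I")                            # ordered factor
  if (is.factor(column) || is.character(column)) return("N")     # nominal
  if (is.logical(column)) return("NB")                           # asymmetric binary
  if (is.integer(column)) return("I")                            # interval
  if (is.numeric(column)) return("R")                            # ratio
  NA_character_                                                  # not covered by the rules
}

pheno.data <- data.frame(
  season = c("winter", "summer", "summer", "winter", "summer"),
  yield = c(34.5, 32.6, 22.1, 54.12, 43.33),
  size = ordered(c("l", "s", "s", "m", "l"), levels = c("s", "m", "l")),
  resistant = c(FALSE, TRUE, TRUE, FALSE, TRUE)
)

# One inferred type code per column.
inferred <- vapply(pheno.data, infer_type_code, character(1))
```

The ordered factor check comes first because an ordered factor is also a factor in R.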

Boolean encoded nominals (NB) are treated as asymmetric binary variables. For symmetric binary variables just use the default string encoding (N or NS). Other nominal variables are converted to factors.

Ordinal variables of class ordered are converted to integers respecting the order and range of the factor levels and subsequently treated as integer encoded interval variables (I). This conversion makes it possible to model the full range of factor levels even when some levels do not occur in the data. For other ordinal variables it is assumed that each value occurs at least once and that values follow the natural ordering of the chosen data type (in Java).
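To see why the conversion preserves the full range of levels, consider an ordered factor in which one level never occurs. The base R sketch below uses R's own 1-based level codes; that corehunter uses exactly these integer values internally is an assumption.

```r
# The level "m" is declared but does not occur in the observations.
size <- ordered(c("l", "s", "s", "l"), levels = c("s", "m", "l"))

# Integer codes follow the level order: s = 1, m = 2, l = 3.
codes <- as.integer(size)

# The range is taken from the declared levels, not from the observed values.
code.range <- c(1L, nlevels(size))
```

Here the codes are 3, 1, 1, 3, while the range still spans 1 to 3, so the unobserved middle level is accounted for.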

If explicit types are given for some variables only, the remaining types can still be inferred automatically by setting them to NA.

min

Minimum values of interval or ratio variables (optional). Numeric vector. Ignored when reading from file. If undefined for some variables, the respective minimum is inferred from the data. If the data contain values below the specified minimum, the minimum is updated accordingly. For nominal and ordinal variables just put NA.

max

Maximum values of interval or ratio variables (optional). Numeric vector. Ignored when reading from file. If undefined for some variables, the respective maximum is inferred from the data. If the data contain values above the specified maximum, the maximum is updated accordingly. For nominal and ordinal variables just put NA.
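The way min and max interact with the observed data can be sketched as follows. The helper resolve_range is hypothetical and only mirrors the behaviour described for these two arguments: an NA bound is inferred from the data, and a bound that the data exceed is widened to fit.

```r
# Hypothetical helper: combine user-specified bounds with the observed range.
resolve_range <- function(values, lower = NA, upper = NA) {
  observed <- range(values, na.rm = TRUE)
  c(
    min = min(observed[1], lower, na.rm = TRUE),  # NA bound falls back to the data
    max = max(observed[2], upper, na.rm = TRUE)
  )
}

yield <- c(34.5, 32.6, 22.1, 54.12, 43.33)

wide <- resolve_range(yield, lower = 20, upper = 60)    # bounds kept as given
narrow <- resolve_range(yield, lower = 30, upper = 50)  # bounds widened to the data
auto <- resolve_range(yield)                            # bounds inferred from the data
```

With these values, wide keeps 20 and 60, while narrow and auto both resolve to the observed extremes 22.1 and 54.12.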

file

File containing the phenotype data.

Value

Phenotype data of class chpheno with elements

data

Phenotypes (data frame).

size

Number of individuals in the dataset.

ids

Unique item identifiers.

names

Item names. Names of individuals to which no explicit name has been assigned are equal to the unique ids.

types

Variable types and encodings.

ranges

Variable ranges, when applicable (NA elsewhere).

java

Java version of the data object.

file

Normalized path of file from which the data was read (if applicable).

Examples

# create from data frame
pheno.data <- data.frame(
 season = c("winter", "summer", "summer", "winter", "summer"),
 yield = c(34.5, 32.6, 22.1, 54.12, 43.33),
 size = ordered(c("l", "s", "s", "m", "l"), levels = c("s", "m", "l")),
 resistant = c(FALSE, TRUE, TRUE, FALSE, TRUE)
)
pheno <- phenotypes(pheno.data)

# explicit types
pheno <- phenotypes(pheno.data, types = c("N", "R", "O", "NB"))
# treat last column as symmetric binary, auto infer others
pheno <- phenotypes(pheno.data, types = c(NA, NA, NA, "NS"))

# explicit ranges
pheno <- phenotypes(pheno.data, min = c(NA, 20.0, NA, NA), max = c(NA, 60.0, NA, NA))

# read from file
pheno.file <- system.file("extdata", "phenotypes.csv", package = "corehunter")
pheno <- phenotypes(file = pheno.file)


corehunter documentation built on Sept. 1, 2023, 5:07 p.m.