setna.sp: Set values to 'NA' for sweetpotato data.
In reyzaguirre/st4gi: Statistical tools for genetic improvement

setna.sp

R Documentation

Set values to `NA` for sweetpotato data.

Description

Detect impossible values in sweetpotato data and set them to missing values (NA) according to a set of rules.

Usage

setna.sp(dfr, f = 10)

Arguments

`dfr`	The name of the data frame.
`f`	Factor for extreme value detection. See Details.

Details

The data frame must use the labels (lower or upper case) listed in the function check.names.sp. Consider the following groups of traits:

pre (traits evaluated pre-harvest): vir, vir1, vir2, alt, alt1, alt2, and vv.
wvn (non-pre-harvest traits evaluated with vines): vw, biom, biom.d, vw.d, fytha, fytha.aj, dmvy, dmvy.aj, bytha, bytha.aj, dmby, dmby.aj, vpp, vpsp, dmvf, dmvd, hi, shi, and dmv.
cnn (continuous non-negative traits): vw, crw, ncrw, trw, trw.d, biom, biom.d, cytha, cytha.aj, rytha, rytha.aj, dmry, dmry.aj, vw.d, fytha, fytha.aj, dmvy, dmvy.aj, bytha, bytha.aj, dmby, dmby.aj, nrpp, nrpsp, ncrpp, ncrpsp, ypp, ypsp, vpp, vpsp, rtyldpct, rfr, bc, tc, fe, zn, ca, and mg.
cpo (continuous positive traits): dmf, dmd, dmvf, dmvd, acrw, ancrw, and atrw.
pnn (percentage non-negative traits): ci, hi, shi, fruc, gluc, sucr, and malt.
ppo (percentage positive traits): dm, dmv, prot, and star.
dnn (discrete non-negative traits): nops, nope, noph, nopr, nocr, nonc, and tnr.
ctg (categorical 1 to 9 traits): vir, vir1, vir2, alt, alt1, alt2, vv, scol, fcol, fcol2, rs, rf, rtshp, damr, rspr, alcdam, wed, stspwv, milldam, fraw, suraw, straw, coof, coosu, coost, coot, and cooap.
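Because labels may be in lower or upper case, one way to see which columns of a data frame fall into a given group is to lowercase the column names before matching. The following is a minimal sketch in base R, not the st4gi implementation; the data frame and the variable names `labels`, `in_ppo`, and `in_dnn` are illustrative assumptions, and only two of the groups are shown.

```r
# Group labels copied from the lists above (only two groups shown).
ppo <- c("dm", "dmv", "prot", "star")
dnn <- c("nops", "nope", "noph", "nopr", "nocr", "nonc", "tnr")

# Hypothetical data frame with upper-case and lower-case labels.
dfr <- data.frame(DM = c(21, 23), TNR = c(10, 11), trw = c(2.2, 5.0))

# Lowercase the column names before matching against a group.
labels <- tolower(names(dfr))
in_ppo <- names(dfr)[labels %in% ppo]
in_dnn <- names(dfr)[labels %in% dnn]
```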

Values are set to NA with the following rules:

cnn traits with negative values are set to NA.
cpo traits with non-positive values are set to NA.
pnn traits with values outside the [0, 100] interval are set to NA.
ppo traits with values outside the (0, 100] interval are set to NA.
dnn traits with negative or non-integer values are set to NA.
ctg traits with out-of-scale values are set to NA.
Beta carotene values determined by RHS color charts with values different from the possible values in the RHS color chart are set to NA.
Extreme low and high values are detected using the interquartile range. Any value outside the interval [Q_1 - f \times (m/3 + IQR), Q_3 + f \times (m/3 + IQR)] is flagged, where Q_1 and Q_3 are the first and third quartiles, IQR is the interquartile range, and m is the mean. By default f = 10; if f is less than 10, a warning is shown. Values outside this interval are set to NA.
If nope == 0 and there is some data for any trait, then nope is set to NA.
If noph == 0 and there is some data for any non-pre-harvest trait, then noph is set to NA.
If nopr == 0 and there is some data for any trait evaluated with roots, then nopr is set to NA.
If noph > 0 and nocr, nonc, crw, ncrw, and vw are all 0, then vw is set to NA.
If nopr > 0 and nocr, nonc, crw, and ncrw are all 0, then ncrw and nonc are both set to NA.
If nocr == 0 and crw > 0, then nocr is set to NA.
If nocr > 0 and crw == 0, then crw is set to NA.
If nonc == 0 and ncrw > 0, then nonc is set to NA.
If nonc > 0 and ncrw == 0, then ncrw is set to NA.
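A few of the rules above can be sketched in base R as follows. This is an illustration under stated assumptions, not the st4gi source: the helper `extreme_bounds` is hypothetical, the quartiles use R's default `quantile` method, and the order in which the rules are applied here is a guess.

```r
# Illustrative only: hypothetical helper and data, not the st4gi implementation.
dfr <- data.frame(trw  = c(2.2, 5.0, 3.6, 12, 1600, -4),
                  dm   = c(21, 23, 105, 24, -3, 30),
                  tnr  = c(1.3, 10, 11, NA, 2, 5),
                  nocr = c(3, 0, 4, 2, 5, 1),
                  crw  = c(1.5, 0.8, 0, 2.0, 3.1, 0.4))

# Range rules: cnn (x >= 0), ppo (0 < x <= 100), dnn (non-negative integers).
# which() drops NA comparisons, so existing missing values are left alone.
dfr$trw[which(dfr$trw < 0)] <- NA
dfr$dm[which(dfr$dm <= 0 | dfr$dm > 100)] <- NA
dfr$tnr[which(dfr$tnr < 0 | dfr$tnr %% 1 != 0)] <- NA

# Extreme values: outside [Q1 - f * (m/3 + IQR), Q3 + f * (m/3 + IQR)].
extreme_bounds <- function(x, f = 10) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- mean(x, na.rm = TRUE) / 3 + IQR(x, na.rm = TRUE)
  c(lower = q[1] - f * spread, upper = q[2] + f * spread)
}
b <- extreme_bounds(dfr$trw)
dfr$trw[which(dfr$trw < b[["lower"]] | dfr$trw > b[["upper"]])] <- NA

# Consistency rules between nocr and crw.
# Both conditions are evaluated before assigning so neither affects the other.
no_roots_but_weight <- which(dfr$nocr == 0 & dfr$crw > 0)
roots_but_no_weight <- which(dfr$nocr > 0 & dfr$crw == 0)
dfr$nocr[no_roots_but_weight] <- NA
dfr$crw[roots_but_no_weight] <- NA
```

Because the mean enters the interval, a single very large value widens the bounds considerably, which is why only gross outliers are flagged at the default f = 10.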

Value

Returns the data frame with all impossible values set to NA, together with a list of warnings identifying the rows that have been modified.
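To review what was changed independently of the warnings, the input and the output can be compared cell by cell. The sketch below assumes the output has the same rows and columns as the input; `new_na_cells` is a hypothetical helper, not part of st4gi, and the `after` data frame is a hand-made stand-in for the result of setna.sp.

```r
# Hypothetical helper (not part of st4gi): list cells that became NA.
new_na_cells <- function(before, after) {
  changed <- !is.na(as.matrix(before)) & is.na(as.matrix(after))
  idx <- which(changed, arr.ind = TRUE)
  data.frame(row = idx[, "row"],
             trait = colnames(before)[idx[, "col"]],
             row.names = NULL)
}

before <- data.frame(dm = c(21, 105, 30), scol = c(1, 0, 7))
# Stand-in for the cleaned data frame; in practice this would come from setna.sp.
after <- data.frame(dm = c(21, NA, 30), scol = c(1, NA, 7))

res <- new_na_cells(before, after)
```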

Author(s)

Raul Eyzaguirre.

Examples

dfr <- data.frame(trw = c(2.2, 5.0, 3.6, 12, 1600, -4),
                  dm = c(21, 23, 105, 24, -3, 30),
                  tnr = c(1.3, 10, 11, NA, 2, 5),
                  scol = c(1, 0, 15, 5, 4, 7),
                  fcol.cc = c(1, 15, 12, 24, 55, 20))
setna.sp(dfr)

reyzaguirre/st4gi documentation built on April 30, 2024, 5:45 a.m.