cleanData: Score samples using the CSCI tool


View source: R/cleanData.r

Description

Function to find or remove errors in BMI data

Usage

cleanData(data, purge = FALSE, msgs = FALSE)

Arguments

data

A data frame with BMI data (see details)

purge

logical, if TRUE a data frame is returned with problematic rows removed, see details.

msgs

logical, if FALSE a purged or non-purged data frame is returned, if TRUE a two-element list with the data frame and a concatenated list of messages, see the return value

Details

This function checks for several types of common errors: incorrect case in FinalID names, FinalIDs that are missing from the internal database, and FinalIDs with inappropriate life stage codes (e.g., non-insects with a LifeStageCode other than 'X').

This function requires that the data frame contains at least two columns: FinalID and LifeStageCode.

With the default purge = FALSE, rows with incorrect FinalID values are kept and a new column problemFinalID is added as a T/F vector indicating which rows are incorrect; with purge = TRUE, those rows are removed. For both purge = FALSE and purge = TRUE, rows with correct FinalID values are also checked for correct life stage codes in the LifeStageCode column. Incorrect values are replaced with default values from a lookup table provided with the package. A new column fixedLifeStageCode is added as a T/F vector indicating which rows had an incorrect life stage code fixed.
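The checks described above can be illustrated with a small base R sketch. It is not the package's implementation: the table lookup, its columns (including IsDefault), and the helper flag_rows() are hypothetical stand-ins for the internal database and defaults shipped with the package. The sketch also assumes that a case mismatch in FinalID is corrected rather than flagged, which the text above does not state.

```r
# Hypothetical stand-in for the package's internal lookup table (assumption):
# one row per valid FinalID / LifeStageCode pair, with one default per FinalID
lookup <- data.frame(
  FinalID       = c("Baetis", "Baetis", "Baetis", "Oligochaeta"),
  LifeStageCode = c("L", "P", "A", "X"),
  IsDefault     = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Illustrative helper, not part of CSCI
flag_rows <- function(data, lookup, purge = FALSE) {
  # both columns are required
  stopifnot(all(c("FinalID", "LifeStageCode") %in% names(data)))
  data$FinalID <- as.character(data$FinalID)
  data$LifeStageCode <- as.character(data$LifeStageCode)

  # match FinalID ignoring case; unmatched IDs are "problem" rows
  valid_ids <- unique(lookup$FinalID)
  idx <- match(tolower(data$FinalID), tolower(valid_ids))
  bad_id <- is.na(idx)
  # assumption: restore the lookup's spelling for case-only mismatches
  data$FinalID[!bad_id] <- valid_ids[idx[!bad_id]]

  # among rows with a valid FinalID, find invalid life stage codes
  pair <- paste(data$FinalID, data$LifeStageCode, sep = "|")
  valid_pair <- pair %in% paste(lookup$FinalID, lookup$LifeStageCode, sep = "|")
  fix <- !bad_id & !valid_pair

  # replace invalid codes with the default for that FinalID
  defaults <- lookup[lookup$IsDefault, ]
  data$LifeStageCode[fix] <-
    defaults$LifeStageCode[match(data$FinalID[fix], defaults$FinalID)]
  data$fixedLifeStageCode <- fix

  if (purge) {
    # drop rows whose FinalID could not be matched
    data <- data[!bad_id, , drop = FALSE]
  } else {
    # keep every row and flag the unmatched ones
    data$problemFinalID <- bad_id
  }
  data
}

bugs <- data.frame(
  FinalID       = c("baetis", "Oligochaeta", "idwrong1"),
  LifeStageCode = c("L", "A", "L"),
  stringsAsFactors = FALSE
)
flagged <- flag_rows(bugs, lookup)
purged  <- flag_rows(bugs, lookup, purge = TRUE)
```

In this toy input, the third row has an unknown FinalID and the second row has a life stage code that is not valid for its FinalID, so flagged keeps all three rows with both T/F columns while purged keeps only the first two.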

Value

If msgs = FALSE (default), a data frame is returned that is either the same as the input if all checks have passed or a purged (purge = TRUE) or non-purged (purge = FALSE) dataset with additional columns for FinalID and LifeStageCode. If msgs = TRUE, a two-element list is returned: the first element, data, is the data frame that would be returned if msgs = FALSE, and the second element, msg, is a concatenated character string of messages indicating whether all checks have passed and, if not, which issues were encountered. When issues are found, row numbers in the messages indicate which observations in the input data had them.
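Because the return type depends on msgs, downstream code may want to normalize it. The following is a minimal sketch, assuming the list elements are named data and msg as described above; the helper unpack_clean() is hypothetical and not part of the package, and the inputs are stand-in values rather than real cleanData output.

```r
# Hypothetical helper (not part of CSCI): always return list(data, msg)
unpack_clean <- function(res) {
  if (is.data.frame(res)) {
    # msgs = FALSE: only the data frame came back
    list(data = res, msg = NULL)
  } else {
    # msgs = TRUE: assumed to be a list with elements `data` and `msg`
    stopifnot(is.list(res), all(c("data", "msg") %in% names(res)))
    res[c("data", "msg")]
  }
}

# stand-in values shaped like the two documented return types
df <- data.frame(FinalID = "Baetis", LifeStageCode = "L",
                 stringsAsFactors = FALSE)
out_plain <- unpack_clean(df)
out_msgs  <- unpack_clean(list(data = df, msg = "example message"))
```

With this wrapper, calling code can always read the cleaned table from the data element and test is.null() on msg to see whether messages were requested.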

Examples

# load bug, station data
data(bugs_stations) 

## Not run: 

# function returns input data
cleanData(bugs_stations[[1]])

# same as above but retrieve msgs
cleanData(bugs_stations[[1]], msgs = TRUE)

# create some wrong FinalID values in bug data
wrongdata <- bugs_stations[[1]]
wrongdata$FinalID <- as.character(wrongdata$FinalID)
wrongdata$FinalID[c(1, 15, 30)] <- c('idwrong1', 'idwrong2', 'idwrong3')

# default, purge nothing
# new columns fixedLifeStageCode, ProblemFinalID with T/F for wrong/right
cleanData(wrongdata)

# purge
# removes from output
cleanData(wrongdata, purge = TRUE)

# create some wrong lifestagecodes, only applies if purge is T
wrongdata$LifeStageCode <- as.character(wrongdata$LifeStageCode)
wrongdata$LifeStageCode[c(2, 16, 31)] <- c('lscwrong1', 'lscwrong2', 'lscwrong3')

# no purge
cleanData(wrongdata)

# compare with purge
cleanData(wrongdata, purge = TRUE)

# with messages
cleanData(wrongdata, purge = TRUE, msgs = TRUE)

## End(Not run)

SCCWRP/CSCI documentation built on Feb. 8, 2022, 11:25 a.m.