dss_check_data: Test whether your data file has the required format for use...

View source: R/dss_check_data.R

dss_check_data    R Documentation

Test whether your data file has the required format for use in rdss.

Description

This is the mandatory first step when using rdss. This function runs several checks for possible formatting mistakes, and returns a dataframe with “normalized”, reformatted contents.

Usage

dss_check_data(dtf, sex, females, males,
               tbd, rm_empty_rows = FALSE,
               mode = "console")

Arguments

dtf

previously imported dataframe. Warning: at this stage, individual IDs must be given as a character vector in the first column of the dataframe, and not directly as custom row names (see Note below, see also the package vignette).

sex

character string; name of the column filled with the sex of individuals in the dataframe dtf.

females

character string; abbreviation used for female individuals in the sex column.

males

character string; abbreviation used for male individuals in the sex column.

tbd

character string; abbreviation used for target individuals in the sex column.

rm_empty_rows

boolean. Should individuals with no value at all be removed from the dataframe?

mode

for internal use in the shiny app only; end users working in R scripts should keep the default value, "console".

Details

This function performs a series of six checks on the dataframe dtf, and displays explicit and useful error messages when formatting mistakes are found (duplicates in row names, typos in the Sex column, etc.).

Also, it returns a dataframe whose contents are “standardized”:

  • the sex column is automatically renamed as Sex

  • the sex factor is then releveled: females now match the level F, males now match the level M, target individuals now match the level TBD. This will facilitate and standardize the presentation of classification results for all users.
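The renaming and releveling described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the dataframe, the column name gender, and the abbreviations "f", "m" and "unknown" are hypothetical values chosen for the example.

```r
# Hypothetical input: IDs in the first column, sex coded "f"/"m"/"unknown".
dtf <- data.frame(
  ID     = c("ind1", "ind2", "ind3"),
  gender = c("f", "m", "unknown"),
  femur  = c(41.2, 45.8, 43.0),
  stringsAsFactors = FALSE
)

# Rename the sex column to "Sex".
names(dtf)[names(dtf) == "gender"] <- "Sex"

# Recode the original abbreviations as the levels F, M and TBD.
dtf$Sex <- factor(dtf$Sex,
                  levels = c("f", "m", "unknown"),
                  labels = c("F", "M", "TBD"))

levels(dtf$Sex)
```

In this sketch, "gender", "f", "m" and "unknown" play the roles of the arguments sex, females, males and tbd respectively.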

Value

A dataframe with the same contents as dtf, but whose sex factor has possibly been renamed and releveled (see Details).

Note

Please note that the input dataframe dtf must not have custom row names, i.e. it must not have been imported using the argument row.names = 1 of read.csv(), for instance. Instead, its first column must be a character vector filled with individual IDs. This function converts that character vector to row names, after several checks. See the package vignette for additional details.
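As a sketch of the expected import step, the following base R code writes a small hypothetical CSV file and reads it back without row.names = 1, so that the IDs remain in the first column. The file contents and the argument values in the commented call are assumptions made for illustration; the call itself requires rdss to be installed and is therefore left commented out.

```r
# Write a small hypothetical CSV file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("ID,Sex,femur",
             "ind1,f,41.2",
             "ind2,m,45.8",
             "ind3,unknown,43.0"),
           csv_path)

# Import WITHOUT row.names = 1: the IDs stay in the first column.
dtf <- read.csv(csv_path, stringsAsFactors = FALSE)

# Quick sanity checks on the ID column before calling rdss.
ids_are_character <- is.character(dtf[[1]])
ids_are_unique    <- anyDuplicated(dtf[[1]]) == 0

# With rdss installed, the call would then look like this
# (argument values are specific to this hypothetical file):
# checked <- dss_check_data(dtf, sex = "Sex", females = "f",
#                           males = "m", tbd = "unknown")
```

Both sanity checks should be TRUE for a dataframe imported this way; they only mirror the requirements stated in this note and do not replace the checks made by dss_check_data itself.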

Author(s)

Frédéric Santos


frederic-santos/rdss documentation built on March 25, 2023, 5:25 p.m.