sample_verification: Add Sample Verification Column (Level-2)

View source: R/sample_verification.R

sample_verification    R Documentation

Description

This function takes in a level-1 data frame and an exclusion list and returns a level-2 data frame with a verification column. The verification column contains either "Y", indicating the row is good for analysis, or messages contained in the exclusion list for why the data rows are excluded. If an exclusion list is not provided, all rows are assumed to be good for use in further analyses and are verified with "Y".

Usage

sample_verification(
  FILENAME,
  data.in,
  exclusion.info,
  assay,
  output.res = FALSE,
  INPUT.DIR = NULL,
  OUTPUT.DIR = NULL,
  verbose = TRUE
)

Arguments

FILENAME

(Character) A string used to identify the input level-1 file, "<FILENAME>-<assay>-Level1.tsv".

data.in

(Data Frame) A level-1 data frame from the format functions.

exclusion.info

(Data Frame) A data frame listing the level-1 variables, and the values of those variables, that identify the rows to exclude. See Details for a full explanation.

assay

(Character) A string indicating which assay the input data come from. Valid input is one of the following: "Clint", "fup-UC", "fup-RED", or "Caco-2". This argument only needs to be specified when importing the input data set with FILENAME or when exporting a data file.

output.res

(Logical) When set to TRUE, the resulting data frame (level-2) will be exported to the user's per-session temporary directory or OUTPUT.DIR (if specified) as a .tsv file. (Defaults to FALSE.)

INPUT.DIR

(Character) Path to the directory where the input level-1 file exists. If NULL, the function looks for the input level-1 file in the current working directory. (Defaults to NULL.)

OUTPUT.DIR

(Character) Path to the directory in which to save the output file. If NULL, the output file is saved to INPUT.DIR if specified, and otherwise to the user's per-session temporary directory. (Defaults to NULL.)

verbose

(Logical) Indicates whether printed statements should be shown. (Defaults to TRUE.)

Details

The 'exclusion.info' should be a data frame with the following columns:

Variables    Level-1 variable(s) used to filter rows for exclusion
Values       Value(s) to exclude
Message      Simple explanation for the exclusion

When filtering on multiple variable-value pairs, the character inputs for "Variables" and "Values" should each be separated by a vertical bar, "|", and the values should be listed in the same order as the variables they belong to. See the demonstration in Examples, Scenario 1.
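As a minimal sketch of the layout described above, the following base R code builds a two-row exclusion.info data frame, one row filtering on a single variable and one on a pair of variables. The compound names, sample names, and messages are hypothetical placeholders; Compound.Name and Lab.Sample.Name are the level-1 columns used in the Examples section.

```r
# Hypothetical exclusion criteria; all names and messages below are made up.
exclusion_sketch <- data.frame(
  # Row 1 filters on one variable; row 2 filters on a pair of variables.
  Variables = c("Lab.Sample.Name",
                paste("Compound.Name", "Lab.Sample.Name", sep = "|")),
  # Values are listed in the same order as the variables they belong to.
  Values = c("sample_001",
             paste("compound_A", "sample_042", sep = "|")),
  Message = c("Instrument error", "Wrong concentration"),
  stringsAsFactors = FALSE
)

# Sanity check: each row has as many values as variables.
n_vars <- lengths(strsplit(exclusion_sketch$Variables, "|", fixed = TRUE))
n_vals <- lengths(strsplit(exclusion_sketch$Values, "|", fixed = TRUE))
stopifnot(all(n_vars == n_vals))
```

The final check is only a convenience for catching mismatched pairs before calling the function; it is not part of the package.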

NOTE: Currently, if a variable of interest contains NAs, that variable cannot be used to assign verification. In that case, either use alternative variable-value pairs in place of the variable with missing values or (though less ideal) manually adjust the verification column.
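One way to act on this note is to check candidate variables for missing values before building the exclusion list. The sketch below uses only base R on a small mock data frame; the column names follow the Examples section, and the values are invented for illustration.

```r
# Mock level-1 data; column names follow the Examples, values are made up.
level1_sketch <- data.frame(
  Compound.Name = c("compound_A", "compound_A", "compound_B"),
  Lab.Sample.Name = c("sample_001", NA, "sample_003"),
  stringsAsFactors = FALSE
)

# Variables being considered for the "Variables" column of exclusion.info.
candidate_vars <- c("Compound.Name", "Lab.Sample.Name")

# TRUE marks a variable that contains at least one NA.
has_na <- vapply(level1_sketch[candidate_vars], anyNA, logical(1))

# Variables free of NAs, and therefore usable for verification assignments.
usable_vars <- names(has_na)[!has_na]
print(usable_vars)
```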

If the output level-2 data frame is chosen to be exported and an output directory is not specified, it will be exported to the user's R session temporary directory. This temporary directory is a per-session directory whose path can be found with the following code: tempdir(). For more details, see https://www.collinberke.com/til/posts/2023-10-24-temp-directories/.
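For illustration, the temporary directory can be inspected from the same R session with base R alone. This sketch lists every .tsv file found there rather than assuming a particular output file name.

```r
# Path to the temporary directory of the current R session.
tmp_dir <- tempdir()
print(tmp_dir)

# List any .tsv files currently in that directory (the result may be empty).
tsv_files <- list.files(tmp_dir, pattern = "\\.tsv$", full.names = TRUE)
print(tsv_files)
```

Files in this directory are normally removed when the R session ends, so copy anything worth keeping to a permanent location.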

As a best practice, INPUT.DIR (when importing a .tsv file) and/or OUTPUT.DIR should be specified to simplify the process of importing and exporting files. This practice ensures that the exported files can easily be found and will not be exported to a temporary directory.
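A minimal sketch of preparing an output folder before calling the function is shown below. The folder name is hypothetical, and it is placed under tempdir() only so that the sketch runs anywhere; in practice a permanent project folder would be used. The commented call uses only arguments listed in Usage and assumes the level1 object from the Examples section.

```r
# Hypothetical output folder; replace with a permanent location in practice.
out_dir <- file.path(tempdir(), "level2-output")

# Create the folder if it does not exist yet.
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}

# The folder could then be passed as OUTPUT.DIR, for example:
# my.level2 <- sample_verification(FILENAME = "Smeltz",
#                                  data.in = level1,
#                                  assay = "Clint",
#                                  output.res = TRUE,
#                                  OUTPUT.DIR = out_dir)
```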

Value

A level-2 data frame with a verification column.
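As a sketch of how the returned data frame might be used, the code below tallies verification statuses and keeps only the rows marked "Y". It runs on a mock data frame, and the column name Verified is an assumption based on the column read from clint_L2_heldout in the Examples section; check the names of the actual output before relying on it.

```r
# Mock level-2 data; the column name "Verified" is an assumption.
level2_sketch <- data.frame(
  Lab.Sample.Name = c("sample_001", "sample_002", "sample_003"),
  Verified = c("Y", "Instrument error", "Y"),
  stringsAsFactors = FALSE
)

# Count rows per verification status.
status_counts <- table(level2_sketch$Verified)
print(status_counts)

# Keep only the rows verified for analysis.
verified_rows <- level2_sketch[level2_sketch$Verified == "Y", ]
```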

Author(s)

Zhihui (Grace) Zhao

Examples

level1 <- invitroTKstats::clint_L1

# Scenario 1: Pass in data.in and exclusion.info data frame from R session 

# Create an exclusion criteria data frame.
# Use the excluded samples found in invitroTKstats::clint_L2_heldout.
# If more than one variable is used to define a set of samples to be excluded,
# enter them as one string, separate the Variables with a vertical bar, "|",
# and do the same for Values.

excluded_level2 <- invitroTKstats::clint_L2_heldout

exclusion_criteria <- data.frame(
  Variables = paste("Compound.Name", "Lab.Sample.Name", sep = "|"), 
  Values = paste(excluded_level2[,"Compound.Name"], excluded_level2[,"Lab.Sample.Name"], sep = "|"),
  Message = excluded_level2[,"Verified"]
  )
  
# Run the verification function.
my.level2 <- sample_verification(data.in=level1,
                                 exclusion.info = exclusion_criteria,
                                 output.res = FALSE)

# Scenario 2: Import 'tsv' as input data and do not pass in an exclusion.info data frame

## Not run: 
# Write the level-1 file to some folder
# Will need to replace <desired level-1 FOLDER> with desired export folder location.
# The <desired level-1 FOLDER> needs to already exist.   

write.table(level1,
            file = here::here("<desired level-1 FOLDER>/Smeltz-Clint-Level1.tsv"),
            sep = "\t",
            row.names = FALSE,
            quote = FALSE)

# Run the verification function.
# Specify the path to import level-1 data with INPUT.DIR.
# Replace <desired level-1 FOLDER> with the folder location chosen above.
# If no exclusion.info data frame is used, all samples are labeled as verified.
# A level-2 file is also exported to INPUT.DIR when OUTPUT.DIR is not specified.
my.level2 <- sample_verification(FILENAME = "Smeltz",
                                 assay = "Clint",
                                 INPUT.DIR = here::here("<desired level-1 FOLDER>"))

## End(Not run)


invitroTKstats documentation built on Aug. 23, 2025, 9:08 a.m.