format_caco2: Creates a Standardized Data Frame with Caco-2 Data (Level-1)

View source: R/format_caco2.R

format_caco2    R Documentation

Description

This function formats data describing mass spectrometry (MS) peak areas from samples collected as part of in vitro measurements of membrane permeability using Caco-2 cells (Hubatsch et al., 2007). The input data frame is organized into a standard set of columns and can optionally be written to a tab-separated text file (see output.res).

Usage

format_caco2(
  FILENAME = "MYDATA",
  data.in,
  sample.col = "Lab.Sample.Name",
  lab.compound.col = "Lab.Compound.Name",
  dtxsid.col = "DTXSID",
  date = NULL,
  date.col = "Date",
  compound.col = "Compound.Name",
  area.col = "Area",
  istd.col = "ISTD.Area",
  type.col = "Type",
  direction.col = "Direction",
  membrane.area = NULL,
  membrane.area.col = "Membrane.Area",
  receiver.vol.col = "Vol.Receiver",
  donor.vol.col = "Vol.Donor",
  test.conc = NULL,
  test.conc.col = "Test.Compound.Conc",
  cal = NULL,
  cal.col = "Cal",
  dilution = NULL,
  dilution.col = "Dilution.Factor",
  time = NULL,
  time.col = "Time",
  istd.name = NULL,
  istd.name.col = "ISTD.Name",
  istd.conc = NULL,
  istd.conc.col = "ISTD.Conc",
  test.nominal.conc = NULL,
  test.nominal.conc.col = "Test.Target.Conc",
  biological.replicates = NULL,
  biological.replicates.col = "Biological.Replicates",
  technical.replicates = NULL,
  technical.replicates.col = "Technical.Replicates",
  analysis.method = NULL,
  analysis.method.col = "Analysis.Method",
  analysis.instrument = NULL,
  analysis.instrument.col = "Analysis.Instrument",
  analysis.parameters = NULL,
  analysis.parameters.col = "Analysis.Parameters",
  note.col = "Note",
  level0.file = NULL,
  level0.file.col = "Level0.File",
  level0.sheet = NULL,
  level0.sheet.col = "Level0.Sheet",
  output.res = FALSE,
  save.bad.types = FALSE,
  sig.figs = 5,
  INPUT.DIR = NULL,
  OUTPUT.DIR = NULL,
  verbose = TRUE
)

Arguments

FILENAME

(Character) A string used to identify the output level-1 file, "<FILENAME>-Caco-2-Level1.tsv", and/or the input level-0 file, "<FILENAME>-Caco-2-Level0.tsv", if importing from a .tsv file. (Defaults to "MYDATA".)
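To make the naming pattern concrete, the short sketch below builds both file names from a FILENAME value with base R. It only illustrates the pattern described above and is not the package's internal code; the variable names level0_name and level1_name are hypothetical.

```r
# Illustration of the documented naming pattern (not the package's internals).
FILENAME <- "MYDATA"

# Hypothetical variable names, used here only for illustration.
level0_name <- paste0(FILENAME, "-Caco-2-Level0.tsv")
level1_name <- paste0(FILENAME, "-Caco-2-Level1.tsv")

print(level0_name)
print(level1_name)
```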

data.in

(Data Frame) A level-0 data frame containing mass-spectrometry peak areas, indication of chemical identity, and measurement type. The data frame should contain columns with names specified by the following arguments:

sample.col

(Character) Column name of data.in containing the unique mass spectrometry (MS) sample name used by the laboratory. (Defaults to "Lab.Sample.Name".)

lab.compound.col

(Character) Column name of data.in containing the test compound name used by the laboratory. (Defaults to "Lab.Compound.Name".)

dtxsid.col

(Character) Column name of data.in containing EPA's DSSTox Structure ID (http://comptox.epa.gov/dashboard). (Defaults to "DTXSID".)

date

(Character) The laboratory measurement date, format "MMDDYY" where "MM" = 2 digit month, "DD" = 2 digit day, and "YY" = 2 digit year. (Defaults to NULL.) (Note: Single entry only, use only if all data were collected on the same date.)
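As a small aid, an "MMDDYY" string can be produced from a Date object with base R's format(). The date used below is arbitrary and chosen only for illustration.

```r
# Convert a Date object to the "MMDDYY" character format with base R.
measurement_date <- as.Date("2025-08-23")  # arbitrary example date
date_string <- format(measurement_date, "%m%d%y")

print(date_string)  # "082325"
```

The resulting character string is the kind of value that could be passed as the date argument when all samples share one measurement date.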

date.col

(Character) Column name containing date information. (Defaults to "Date".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in date.)
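Many arguments below follow this same pairing of a single value (for example, date) with a column name (for example, date.col). The sketch below shows one way such an auto-fill could behave. The helper fill_column() is hypothetical and written only for this illustration; it is an assumption about the general idea, not the package's implementation.

```r
# Hypothetical helper illustrating the "single value or column" convention.
# This is NOT the package's internal code.
fill_column <- function(data, col, value = NULL) {
  if (!(col %in% colnames(data))) {
    if (is.null(value)) {
      stop("Column '", col, "' is missing and no single value was supplied.")
    }
    data[[col]] <- value  # the single value is recycled down every row
  }
  data
}

# Toy level-0 data without a "Date" column
toy <- data.frame(Lab.Sample.Name = c("S1", "S2", "S3"))
toy <- fill_column(toy, col = "Date", value = "082325")
print(toy)
```

If the column already exists in the data, the helper in this sketch leaves it untouched, so per-row values take precedence over the single value.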

compound.col

(Character) Column name of data.in containing the test compound. (Defaults to "Compound.Name".)

area.col

(Character) Column name of data.in containing the target analyte (that is, the test compound) MS peak area. (Defaults to "Area".)

istd.col

(Character) Column name of data.in containing the MS peak area for the internal standard. (Defaults to "ISTD.Area".)

type.col

(Character) Column name of data.in containing the sample type (see table under Details). (Defaults to "Type".)

direction.col

(Character) Column name of data.in containing the direction of the Caco-2 permeability experiment: either apical donor to basolateral receiver (AtoB), or basolateral donor to apical receiver (BtoA). (Defaults to "Direction".)

membrane.area

(Numeric) The area of the Caco-2 monolayer (in cm^2). (Defaults to NULL.) (Note: Single entry only, use only if all tested compounds have the same area for the Caco-2 monolayer.)

membrane.area.col

(Character) Column name containing membrane.area information. (Defaults to "Membrane.Area".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in membrane.area.)

receiver.vol.col

(Character) Column name of data.in containing the media volume (in cm^3) of the receiver portion of the Caco-2 experimental well. (Defaults to "Vol.Receiver".)

donor.vol.col

(Character) Column name of data.in containing the media volume (in cm^3) of the donor portion of the Caco-2 experimental well where the test chemical is added. (Defaults to "Vol.Donor".)
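Because both volume columns are expected in cm^3, volumes recorded in microliters need converting first (1 mL equals 1 cm^3, so 1 uL equals 0.001 cm^3). The volumes below are made-up values for illustration only, not recommended assay settings.

```r
# Illustrative volumes in microliters (hypothetical values).
vol_receiver_uL <- 250
vol_donor_uL <- 750

# 1 mL = 1 cm^3, so divide microliters by 1000 to obtain cm^3.
toy <- data.frame(
  Vol.Receiver = vol_receiver_uL / 1000,
  Vol.Donor = vol_donor_uL / 1000
)
print(toy)
```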

test.conc

(Numeric) The standard test chemical concentration for the Caco-2 assay. (Defaults to NULL.) (Note: Single entry only, use only if the same standard concentration was used for all tested compounds.)

test.conc.col

(Character) Column name containing test.conc information. (Defaults to "Test.Compound.Conc".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in test.conc.)

cal

(Character) MS calibration the samples were based on. Typically, this uses indices or dates to represent if the analyses were done on different machines on the same day or on different days with the same MS analyzer. (Defaults to NULL.) (Note: Single entry only, use only if all data were collected based on the same calibration.)

cal.col

(Character) Column name containing cal information. (Defaults to "Cal".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in cal.)

dilution

(Numeric) Number of times the sample was diluted before MS analysis. (Defaults to NULL.) (Note: Single entry only, use only if all samples underwent the same number of dilutions.)

dilution.col

(Character) Column name containing dilution information. (Defaults to "Dilution.Factor".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in dilution.)

time

(Numeric) The amount of time (in hours) before the receiver and donor compartments are measured. (Defaults to NULL.)

time.col

(Character) Column name containing time information. (Defaults to "Time".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in time.)

istd.name

(Character) The identity of the internal standard. (Defaults to NULL.) (Note: Single entry only, use only if all tested compounds use the same internal standard.)

istd.name.col

(Character) Column name containing istd.name information. (Defaults to "ISTD.Name".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in istd.name.)

istd.conc

(Numeric) The concentration for the internal standard. (Defaults to NULL.) (Note: Single entry only, use only if all tested compounds have the same internal standard concentration.)

istd.conc.col

(Character) Column name containing istd.conc information. (Defaults to "ISTD.Conc".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in istd.conc.)

test.nominal.conc

(Numeric) The nominal concentration added to the donor compartment at time 0. (Defaults to NULL.) (Note: Single entry only, use only if all tested compounds used the same concentration at time 0.)

test.nominal.conc.col

(Character) Column name containing test.nominal.conc information. (Defaults to "Test.Target.Conc".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in test.nominal.conc.)

biological.replicates

(Character) Replicates with the same analyte. Typically, this uses numbers or letters to index. (Defaults to NULL.) (Note: Single entry only, use only if none of the test compounds have replicates.)

biological.replicates.col

(Character) Column name of data.in containing the number or the indices of replicates with the same analyte. (Defaults to "Biological.Replicates".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in biological.replicates.)

technical.replicates

(Character) Repeated measurements from one sample. Typically, this uses numbers or letters to index. (Defaults to NULL.) (Note: Single entry only, use only if none of the test compounds have replicates.)

technical.replicates.col

(Character) Column name of data.in containing the number or the indices of replicates taken from one sample. (Defaults to "Technical.Replicates".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in technical.replicates.)

analysis.method

(Character) The analytical chemistry analysis method, typically "LCMS" or "GCMS", liquid chromatography or gas chromatography–mass spectrometry, respectively. (Defaults to NULL.) (Note: Single entry only, use only if the same method was used for all tested compounds.)

analysis.method.col

(Character) Column name containing analysis.method information. (Defaults to "Analysis.Method".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in analysis.method.)

analysis.instrument

(Character) The instrument used for chemical analysis, for example "Agilent 6890 GC with model 5973 MS". (Defaults to NULL.) (Note: Single entry only, use only if the same instrument was used for all tested compounds.)

analysis.instrument.col

(Character) Column name containing analysis.instrument information. (Defaults to "Analysis.Instrument".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in analysis.instrument.)

analysis.parameters

(Character) The parameters used to identify the compound on the chemical analysis instrument, for example "Negative Mode, 221.6/161.6, -DPb=26, FPc=-200, EPd=-10, CEe=-20, CXPf=-25.0". (Defaults to NULL.) (Note: Single entry only, use only if the same parameters were used for all tested compounds.)

analysis.parameters.col

(Character) Column name containing analysis.parameters information. (Defaults to "Analysis.Parameters".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in analysis.parameters.)

note.col

(Character) Column name of data.in containing additional notes on test compounds. (Defaults to "Note".)

level0.file

(Character) The level-0 file from which the data.in were obtained. (Defaults to NULL.) (Note: Single entry only, use only if all rows in data.in were obtained from the same level-0 file.)

level0.file.col

(Character) Column name containing level0.file information. (Defaults to "Level0.File".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in level0.file.)

level0.sheet

(Character) The specific sheet name of the level-0 file from which the data.in were obtained, if the level-0 file is an Excel workbook. (Defaults to NULL.) (Note: Single entry only, use only if all rows in data.in were obtained from the same sheet in the same level-0 file.)

level0.sheet.col

(Character) Column name containing level0.sheet information. (Defaults to "Level0.Sheet".) (Note: data.in does not necessarily have this field. If this field is missing, it can be auto-filled with the value specified in level0.sheet.)

output.res

(Logical) When set to TRUE, the result table (level-1) will be exported to the user's per-session temporary directory or OUTPUT.DIR (if specified) as a .tsv file. (Defaults to FALSE.)

save.bad.types

(Logical) When set to TRUE, export data removed due to inappropriate sample types. See the Details section for the required sample types. (Defaults to FALSE.)

sig.figs

(Numeric) The number of significant figures to round the exported result table (level-1). (Defaults to 5.)
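Rounding to significant figures differs from rounding to decimal places. The sketch below uses base R's signif() to show the effect of five significant figures on values of different magnitudes. Whether the package calls signif() internally is an assumption; the example only illustrates the concept.

```r
# Rounding to 5 significant figures with base R (illustrative values).
x <- c(123.456789, 0.000123456789, 98765.4321)
rounded <- signif(x, digits = 5)

# Print with enough digits to see the rounded values:
# 123.46, 0.00012346, 98765
print(format(rounded, digits = 10, scientific = FALSE))
```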

INPUT.DIR

(Character) Path to the directory where the input level-0 file exists. If NULL, the function looks for the input level-0 file in the current working directory. (Defaults to NULL.)

OUTPUT.DIR

(Character) Path to the directory to save the output file. If NULL, the output file will be saved to the user's per-session temporary directory or INPUT.DIR if specified. (Defaults to NULL.)

verbose

(Logical) Indicate whether printed statements should be shown. (Defaults to TRUE.)

Details

In this experiment, an in vitro well is separated into two compartments by a membrane composed of a monolayer of Caco-2 cells. A test chemical is added to either the apical or basolateral side of the monolayer at time 0, and after a set time samples are taken from both the "donor" side (where the test chemical was added) and the "receiver" side. Depending on the direction of the test, the donor side can be either apical or basolateral.

The data frame of observations should be annotated according to direction (either apical to basolateral – "AtoB" – or basolateral to apical – "BtoA") and type of concentration measured:

Sample description                                                Type
Blank with no chemical added                                      Blank
Target concentration added to donor compartment at time 0 (C0)    D0
Donor compartment at end of experiment                            D2
Receiver compartment at end of experiment                         R2
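The annotation scheme above can be checked on a data frame before formatting. The following base R sketch separates rows with recognized sample types from the rest, which is the kind of split the save.bad.types argument refers to. The toy data, including the unrecognized type "QC", are invented for illustration, and the filtering shown is an assumption about the general idea rather than the package's actual code.

```r
# Toy annotated observations (hypothetical values).
toy <- data.frame(
  Lab.Sample.Name = paste0("S", 1:5),
  Direction = c("AtoB", "AtoB", "AtoB", "BtoA", "AtoB"),
  Type = c("Blank", "D0", "D2", "R2", "QC"),
  stringsAsFactors = FALSE
)

# Sample types listed in the table above
valid_types <- c("Blank", "D0", "D2", "R2")

is_valid <- toy$Type %in% valid_types
kept <- toy[is_valid, ]      # rows with a recognized sample type
removed <- toy[!is_valid, ]  # rows that would be set aside as "bad types"

# Cross-tabulate direction against type for the retained rows
print(table(kept$Direction, kept$Type))
print(removed)
```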

Chemical concentration is calculated qualitatively as a response and returned as a column in the output data frame:

Response <- AREA / ISTD.AREA * ISTD.CONC
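As a worked instance of the formula above, the sketch below computes the response for three made-up samples using the default column names. The peak areas are invented for illustration.

```r
# Toy peak areas and internal standard values (hypothetical).
toy <- data.frame(
  Area = c(1000, 2500, 400),
  ISTD.Area = c(2000, 2000, 1600),
  ISTD.Conc = c(1, 1, 1)
)

# Response = analyte peak area / ISTD peak area * ISTD concentration
toy$Response <- toy$Area / toy$ISTD.Area * toy$ISTD.Conc

print(toy)  # Response column: 0.50, 1.25, 0.25
```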

If the output level-1 result table is chosen to be exported and an output directory is not specified, it will be exported to the user's R session temporary directory. This temporary directory is a per-session directory whose path can be found with the following code: tempdir(). For more details, see https://www.collinberke.com/til/posts/2023-10-24-temp-directories/.
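The sketch below shows where such a file would land when no output directory is given, by writing and re-reading a small tab-separated file in the session's temporary directory with base R. The toy data frame is invented for illustration, and the export step shown is an assumption about the general mechanism, not the package's own code.

```r
# Locate the per-session temporary directory.
out_dir <- tempdir()
out_file <- file.path(out_dir, "MYDATA-Caco-2-Level1.tsv")

# Toy result table (hypothetical values).
toy <- data.frame(Compound.Name = "Example", Response = 0.5)

# Write a tab-separated file, then read it back to confirm the round trip.
write.table(toy, file = out_file, sep = "\t", row.names = FALSE, quote = FALSE)
read_back <- read.delim(out_file)

print(out_file)
print(read_back)
```

Files in this directory are removed when the R session ends, which is why specifying a persistent output directory is preferable for results that need to be kept.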

As a best practice, INPUT.DIR and/or OUTPUT.DIR should be specified to simplify the process of importing and exporting files. This practice ensures that the exported files can easily be found and will not be exported to a temporary directory.

Value

A level-1 data frame with a standardized format containing a standardized set of columns and column names with membrane permeability data from a Caco-2 assay.

Author(s)

John Wambaugh

References

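Hubatsch, I., Ragnarsson, E. G. E., and Artursson, P. (2007). Determination of drug permeability and prediction of drug absorption in Caco-2 monolayers. Nature Protocols, 2(9), 2111-2119.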

Examples

## Load the package and the example level-0 data; do not export the result table
library(invitroTKstats)
level0 <- invitroTKstats::caco2_L0
level1 <- format_caco2(data.in = level0,
                       sample.col = "Sample",
                       lab.compound.col = "Lab.Compound.ID",
                       compound.col = "Compound",
                       area.col = "Peak.Area",
                       istd.col = "ISTD.Peak.Area",
                       membrane.area = 0.11,
                       test.conc.col = "Compound.Conc",
                       cal = 1, 
                       time = 2, 
                       istd.conc = 1, 
                       test.nominal.conc = 10, 
                       biological.replicates = 1, 
                       technical.replicates = 1,
                       analysis.method.col = "Analysis.Params",
                       analysis.instrument = "Agilent.GCMS",
                       analysis.parameters = "Unknown",
                       note.col = NULL,
                       output.res = FALSE
)


invitroTKstats documentation built on Aug. 23, 2025, 9:08 a.m.