merge_level0    R Documentation

Merge Multiple Level-0 files into a Single Table for Processing

Description

This function reads multiple Excel files containing mass-spectrometry (MS) data and extracts the chemical sample data from the specified sheets. The argument 'level0.catalog' is a table that provides the necessary information to find the data for each chemical. The primary data of interest are the analyte peak area, the internal standard peak area, and the target concentration for calibration curve (CC) samples. The argument 'data.label' is used to annotate this particular mapping of level-0 files into data ready to be organized into a level-1 file.

Usage

merge_level0(
  FILENAME = "MYDATA",
  level0.catalog,
  file.col = "File",
  sheet = NULL,
  sheet.col = "Sheet",
  skip.rows = NULL,
  skip.rows.col = "Skip.Rows",
  num.rows = NULL,
  num.rows.col = NULL,
  date = NULL,
  date.col = "Date",
  compound.col = "Chemical.ID",
  istd.col = "ISTD",
  col.names.loc = NULL,
  col.names.loc.col = "Col.Names.Loc",
  sample.colname = NULL,
  sample.colname.col = "Sample.ColName",
  type.colname = NULL,
  type.colname.col = "Type",
  peak.colname = NULL,
  peak.colname.col = "Peak.ColName",
  istd.peak.colname = NULL,
  istd.peak.colname.col = "ISTD.Peak.ColName",
  conc.colname = NULL,
  conc.colname.col = "Conc.ColName",
  analysis.param.colname = NULL,
  analysis.param.colname.col = "AnalysisParam.ColName",
  additional.colnames = NULL,
  additional.colname.cols = NULL,
  chem.ids,
  chem.lab.id.col = "Chem.Lab.ID",
  chem.name.col = "Compound",
  chem.dtxsid.col = "DTXSID",
  catalog.out = FALSE,
  output.res = FALSE,
  INPUT.DIR = NULL,
  OUTPUT.DIR = NULL,
  verbose = TRUE
)

Arguments

FILENAME

(Character) A string used to identify outputs of the function call. (Defaults to "MYDATA".)

level0.catalog

A data frame describing which columns of which sheets in which Excel files contain MS data for analysis. See details for full explanation.

file.col

(Character) Column name containing level-0 file names to pull data from.

sheet

(Character) Name/identifier of the Excel sheet from which level-0 data are to be pulled. (Defaults to 'NULL'.) (Note: Single entry only, use only if all files have the same sheet identifier for level-0 data.)

sheet.col

(Character) Catalog column name containing 'sheet' information. (Defaults to "Sheet".)

skip.rows

(Numeric) Number of rows to skip when extracting level-0 data from the specified Excel file(s). (Defaults to 'NULL'.) (Note: Single entry only, use only if all files need to skip the same number of rows for extracting level-0 data.)

skip.rows.col

(Character) Catalog column name containing 'skip.rows' information. (Defaults to "Skip.Rows".)

num.rows

(Numeric) Number of rows to pull when extracting level-0 data from the specified Excel file(s). (Defaults to 'NULL'.) (Note: Single entry only, use only if all files need to pull the same number of rows for extracting level-0 data.)

num.rows.col

(Character) Catalog column name containing 'num.rows' information. (Defaults to 'NULL'.)

date

(Character) Date of laboratory measurements. Typical format "MMDDYY" ("MM" = 2-digit month, "DD" = 2-digit day, and "YY" = 2-digit year). (Defaults to 'NULL'.) (Note: Single entry only, use only if all files have the same laboratory measurement date.)

date.col

(Character) Catalog column name containing 'date' information. (Defaults to "Date")

compound.col

(Character) Catalog column name containing 'compound' information. (Defaults to "Chemical.ID")

istd.col

(Character) Catalog column name containing 'istd' information, or the MS peak area for the internal standard. (Defaults to "ISTD")

col.names.loc

(Numeric) Row location of data column names. (Defaults to 'NULL'.) (Note: Single entry only, use only if all files have column names in the same row location, typically the first row.)

col.names.loc.col

(Character) Catalog column name containing 'col.names.loc' information. (Defaults to "Col.Names.Loc")

sample.colname

(Character) Column name of level-0 data containing sample information. (Defaults to 'NULL'.) (Note: Single entry only, use only if all files use the same column name for sample names when extracting level-0 data.)

sample.colname.col

(Character) Catalog column name containing 'sample.colname' information. (Defaults to "Sample.ColName")

type.colname

(Character) Column name of the level-0 data containing the type of sample. (Defaults to 'NULL'.) (Note: Single entry only, use only if all files use the same column name for sample type information when extracting level-0 data.)

type.colname.col

(Character) Catalog column name containing 'type.colname' information. (Defaults to "Type".)

peak.colname

(Character) Column name of the level-0 data containing the analyte Mass Spectrometry peak area. (Defaults to 'NULL'.) (Note: Single entry only, use only if all files use the same column name for analyte peak area information when extracting level-0 data.)

peak.colname.col

(Character) Catalog column name containing 'peak.colname' information. (Defaults to "Peak.ColName")

istd.peak.colname

(Character) Column name of the level-0 data containing the internal standard Mass Spectrometry peak area. (Defaults to 'NULL'.) (Note: Single entry only, use only if all files use the same column name for internal standard MS peak area information when extracting level-0 data.)

istd.peak.colname.col

(Character) Catalog column name containing 'istd.peak.colname' information. (Defaults to "ISTD.Peak.ColName")

conc.colname

(Character) Column name of the level-0 data containing intended concentrations for calibration curves. (Defaults to 'NULL'.) (Note: Single entry only, use only if all files use the same column name for intended concentration information when extracting level-0 data.)

conc.colname.col

(Character) Catalog column name containing 'conc.colname' information. (Defaults to "Conc.ColName")

analysis.param.colname

(Character) Column name of the level-0 data containing Mass Spectrometry instrument parameters for the analyte. (Defaults to 'NULL'.) (Note: Single entry only, use only if all files use the same column name for analysis parameter information when extracting level-0 data.)

analysis.param.colname.col

(Character) Catalog column name containing 'analysis.param.colname' information. (Defaults to "AnalysisParam.ColName")

additional.colnames

Additional columns from the level-0 data files to pull information from when extracting level-0 data and include in the compiled level-0 returned from 'merge_level0'. (Defaults to 'NULL'.)

additional.colname.cols

Catalog column name(s) containing 'additional.colnames' information. (Defaults to 'NULL'.)

chem.ids

(Data frame) A data frame containing basic chemical identification information for tested chemicals.

chem.lab.id.col

(Character) Column in 'chem.ids' containing the compound/chemical identifier used by the laboratory in level-0 measured data. (Defaults to "Chem.Lab.ID")

chem.name.col

(Character) 'chem.ids' column name containing the "standard" chemical name to use for annotation of the compiled level-0 returned from 'merge_level0'. (Defaults to "Compound")

chem.dtxsid.col

(Character) 'chem.ids' column name containing EPA's DSSTox Structure ID (http://comptox.epa.gov/dashboard). (Defaults to "DTXSID".)

catalog.out

(Logical) When set to TRUE, the data frame specified in level0.catalog will be exported to the user's per-session temporary directory or OUTPUT.DIR (if specified) as a .tsv file. (Defaults to FALSE.)

output.res

(Logical) When set to TRUE, the result table (level-0) will be exported to the user's per-session temporary directory or OUTPUT.DIR (if specified) as a .tsv file. (Defaults to FALSE.)

INPUT.DIR

(Character) Path to the directory containing the Excel files with level-0 data. If not specified, the function looks for the files in the current working directory. (Defaults to NULL.)

OUTPUT.DIR

(Character) Path to the directory to save the output file. If NULL, the output file will be saved to the user's per-session temporary directory. (Defaults to NULL.)

verbose

(Logical) Indicates whether printed statements should be shown. (Defaults to TRUE.)
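
Most extraction settings above come in pairs: a single value (for example 'sheet') that applies to every file, or a catalog column name (for example 'sheet.col') that holds one value per catalog row. The sketch below illustrates that pattern using base R only. The helper 'resolve_setting' and the toy catalog are hypothetical and are not part of invitroTKstats; the package's internal handling may differ.

```r
# Hypothetical helper (not part of invitroTKstats): if a single value is
# supplied, recycle it into the named catalog column; otherwise require
# that the catalog already provides that column.
resolve_setting <- function(catalog, value, col) {
  if (!is.null(value)) {
    catalog[[col]] <- rep(value, nrow(catalog))
  } else if (!(col %in% colnames(catalog))) {
    stop("Catalog is missing column '", col,
         "' and no single value was given.")
  }
  catalog
}

# Toy catalog with two files and no "Sheet" column (illustrative only)
toy.catalog <- data.frame(File = c("a.xlsx", "b.xlsx"),
                          stringsAsFactors = FALSE)

# A single sheet name is applied to every row of the catalog
toy.catalog <- resolve_setting(toy.catalog, value = "Data", col = "Sheet")
print(toy.catalog)
```

With neither a single value nor a matching catalog column, the helper stops with an error, which mirrors the idea that each setting must come from one of the two sources.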

Details

The argument 'level0.catalog' should be a data frame with the following columns. A column may be omitted when the corresponding argument is given as a single value for all files, for example sheet = "Data".

File                    The Excel filename to be loaded
Sheet                   The name of the sheet to examine within the Excel file
Skip.Rows               How many rows should be skipped on the sheet to get usable column names
Date                    The date the measurements were made
Chemical.ID             The laboratory chemical identity
ISTD                    The internal standard used
Col.Names.Loc           The row location of the column names
Sample.ColName          The column name on the sheet that contains sample identity
Type.ColName            The column name on the sheet that contains the type of sample
Peak.ColName            The column name on the sheet that contains the analyte MS peak area
ISTD.Peak.ColName       The column name on the sheet that contains the internal standard MS peak area
Conc.ColName            The column name on the sheet that contains the intended concentration for calibration curves
AnalysisParam.ColName   The column name on the sheet that contains the MS instrument parameters for the analyte

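Columns with names ending in ".ColName" indicate the columns to be extracted from the specified Excel file and sheet containing level-0 data. Note that the default for 'type.colname.col' is "Type", so a catalog that names this column "Type.ColName" needs type.colname.col = "Type.ColName", as in the Examples.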
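
For illustration, a one-row catalog with the columns described above can be sketched in base R. The values are copied from the Examples section of this page. Whether merge_level0 accepts a hand-built data frame like this without further arguments is an assumption; the package example builds its catalog with create_catalog instead.

```r
# Build a one-row catalog by hand using the column names described in Details.
# Values are copied from the package example for chemical "745".
catalog <- data.frame(
  File = "Hep_745_949_959_082421_final.xlsx",
  Sheet = "Data063021",
  Skip.Rows = 44,
  Date = format(as.Date("2021-06-30"), "%m%d%y"),  # "MMDDYY" format
  Chemical.ID = "745",
  ISTD = "MFBET",
  Col.Names.Loc = 2,
  Sample.ColName = "Name",
  Type.ColName = "Type",
  Peak.ColName = "Area...13",
  ISTD.Peak.ColName = "Resp....16",
  Conc.ColName = "Final Conc....11",
  AnalysisParam.ColName = "Exp. Conc....10",
  stringsAsFactors = FALSE
)

# Columns ending in ".ColName" name the sheet columns to extract
extract.cols <- grep("\\.ColName$", colnames(catalog), value = TRUE)
print(extract.cols)
```

The date is generated with format() only to show how the "MMDDYY" string relates to a calendar date; typing "063021" directly is equivalent.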

If the output level-0 file is chosen to be exported and an output directory is not specified, it will be exported to the user's R session temporary directory. This temporary directory is a per-session directory whose path can be found with the following code: tempdir(). For more details, see https://www.collinberke.com/til/posts/2023-10-24-temp-directories/.

As a best practice, INPUT.DIR (when importing a .tsv file) and/or OUTPUT.DIR should be specified to simplify the process of importing and exporting files. This practice ensures that the exported files can easily be found and will not be exported to a temporary directory.
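
As a minimal sketch of the two export locations discussed above, the code below prints the per-session temporary directory and prepares a dedicated folder that could be passed as OUTPUT.DIR. The folder name "level0_output" is a hypothetical choice for illustration, and it is created under tempdir() here only so the sketch leaves nothing behind; in practice a persistent location would be used.

```r
# Path of the per-session temporary directory (the default export location)
tmp.dir <- tempdir()
print(tmp.dir)

# Prepare a dedicated output directory to pass as OUTPUT.DIR.
# "level0_output" is a hypothetical folder name used for illustration.
out.dir <- file.path(tmp.dir, "level0_output")
dir.create(out.dir, showWarnings = FALSE, recursive = TRUE)

# Confirm the directory exists, then list any exported .tsv files in it
stopifnot(dir.exists(out.dir))
print(list.files(out.dir, pattern = "\\.tsv$"))
```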

Value

data.frame

A data.frame in standardized level-0 format

Author(s)

John Wambaugh

Examples


# Create level0.catalog data.frame
# The file "Hep_745_949_959_082421_final.xlsx" is located in
# inst/extdata/Kreutz-Clint; it can also be copied to a directory of your choice.
# Note: the XLSX file does not need to be saved to the current working directory.
catalog <- create_catalog(file = "Hep_745_949_959_082421_final.xlsx",
                          sheet = "Data063021",
                          skip.rows = 44,
                          num.rows = 30,
                          date = "063021",
                          compound = "745",
                          istd = "MFBET",
                          sample = "Name",
                          type = "Type",
                          peak = "Area...13",
                          istd.peak = "Resp....16",
                          conc = "Final Conc....11",
                          analysis.param = "Exp. Conc....10",
                          col.names.loc = 2)
# Create chem.ids data.frame
chem.ids <- data.frame("Chem.Lab.ID" = "745",
                       "Compound" = "(Heptafluorobutanoyl)pivaloylmethane",
                       "DTXSID" = "DTXSID3066215")
# Create level0 data.frame
# INPUT.DIR below points to the extdata/Kreutz-Clint folder of the installed
# package; replace it with your own directory if the XLSX file was saved elsewhere.
level0 <- merge_level0(level0.catalog = catalog,
             INPUT.DIR = system.file("extdata/Kreutz-Clint",package = "invitroTKstats"),
             istd.col = "ISTD.Name",
             type.colname.col = "Type.ColName",
             num.rows.col = "Number.Data.Rows",
             chem.ids = chem.ids,
             catalog.out = FALSE,
             output.res = FALSE) # do not auto-save the file


invitroTKstats documentation built on Aug. 23, 2025, 9:08 a.m.