read_from_excel: Read formatted Excel files
In antonvsdata/notame: Workflow for non-targeted LC-MS metabolic profiling

Description

Reads data from an Excel file with the following layout:

- The left side of the sheet contains information about the features (size: features x feature info columns).
- The top part contains sample information (size: sample info variables x samples).
- The middle contains the actual abundances (size: features x samples).

This function separates the three parts of the sheet and returns them in a list. See the package vignette for more information.
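
To illustrate the layout, the sketch below splits a sheet that has already been read into a character matrix with no header row, using a given corner cell. The helper split_sheet() is hypothetical and is not notame's implementation. It assumes that the sample information variable names sit in the corner column above the corner row, and it labels the corner row "Datafile" because that row usually holds the data file names.

```r
# Hypothetical helper (not part of notame): split a sheet, read as a character
# matrix without a header row, into its three parts at the corner cell.
split_sheet <- function(sheet, corner_row, corner_column) {
  rows_below <- which(seq_len(nrow(sheet)) > corner_row)
  cols_right <- which(seq_len(ncol(sheet)) > corner_column)
  sample_names <- sheet[corner_row, cols_right]

  # Feature information: rows below the corner row, columns up to and including
  # the corner column; the corner row supplies the column names
  feature_data <- as.data.frame(sheet[rows_below, seq_len(corner_column), drop = FALSE],
                                stringsAsFactors = FALSE)
  colnames(feature_data) <- sheet[corner_row, seq_len(corner_column)]

  # Sample information (sample info variables x samples): rows down to the corner
  # row, columns right of the corner column. Assumption: variable names for the
  # upper rows are in the corner column; the corner row holds data file names
  pheno_data <- as.data.frame(sheet[seq_len(corner_row), cols_right, drop = FALSE],
                              stringsAsFactors = FALSE)
  rownames(pheno_data) <- c(sheet[seq_len(corner_row - 1), corner_column], "Datafile")
  colnames(pheno_data) <- sample_names

  # Abundances (features x samples): the block below and right of the corner cell
  exprs <- sheet[rows_below, cols_right, drop = FALSE]
  storage.mode(exprs) <- "numeric"
  colnames(exprs) <- sample_names

  list(exprs = exprs, pheno_data = pheno_data, feature_data = feature_data)
}

# A tiny example sheet: one sample info row above the corner row, two features
sheet <- matrix(c(
  "",     "",       "Injection_order", "1",     "2",
  "Name", "Mass",   "RT",              "a.raw", "b.raw",
  "f1",   "100.05", "1.25",            "10",    "20",
  "f2",   "200.10", "3.50",            "30",    "40"
), nrow = 4, byrow = TRUE)

parts <- split_sheet(sheet, corner_row = 2, corner_column = 3)
parts$exprs
```

Here the corner cell is the "RT" cell in row 2, column 3, so parts$exprs is a 2 x 2 numeric matrix with one row per feature and one column per sample.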

Usage

read_from_excel(
  file,
  sheet = 1,
  id_column = NULL,
  corner_row = NULL,
  corner_column = NULL,
  id_prefix = "ID_",
  split_by = NULL,
  name = NULL,
  mz_limits = c(10, 2000),
  rt_limits = c(0, 20),
  skip_checks = FALSE
)

Arguments

`file`	path to the Excel file
`sheet`	the sheet number or name
`id_column`	character, name of the sample information variable that uniquely identifies samples (see Details)
`corner_row`	integer, the bottom row of the sample information; it usually contains the data file names and the feature info column names. If NULL, it is detected automatically.
`corner_column`	integer or character, the column of the corner cell (the rightmost column of feature information), given either as a column number or as an Excel column letter. If NULL, it is detected automatically.
`id_prefix`	character, prefix for autogenerated sample IDs, see Details
`split_by`	character vector, used when all the modes are in the same Excel file: the names of the feature data columns that separate the modes (usually Mode and Column)
`name`	character, used when the Excel file contains only one mode: the name of the mode, such as "Hilic_neg"
`mz_limits`	numeric vector of length two; all m/z values should lie between these limits
`rt_limits`	numeric vector of length two; all retention time values should lie between these limits
`skip_checks`	logical, whether to skip the data integrity checks. Not recommended, but sometimes useful when you just want to read the data in as is and fix errors later. NOTE: the Sample_ID and QC columns will not be constructed. The data integrity checks need to be passed when constructing MetaboSet objects.
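
The mz_limits and rt_limits arguments describe a simple range check. A minimal sketch of such a check follows, using the default limits from Usage; out_of_limits() is a hypothetical helper rather than a notame function, and treating the limits themselves as allowed values is an assumption.

```r
# Hypothetical helper (not part of notame): return the indices of features
# whose m/z or retention time lies outside the given limits.
# Assumption: the limits themselves count as inside the allowed range.
out_of_limits <- function(mz, rt, mz_limits = c(10, 2000), rt_limits = c(0, 20)) {
  bad_mz <- mz < mz_limits[1] | mz > mz_limits[2]
  bad_rt <- rt < rt_limits[1] | rt > rt_limits[2]
  which(bad_mz | bad_rt)
}

# The second feature has an m/z below 10, the third a retention time above 20
out_of_limits(mz = c(150.1, 5.2, 800.4), rt = c(2.1, 3.0, 25.0))
# returns c(2L, 3L)
```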

Details

Only specify one of split_by and name. The returned feature data will contain a column named "Split", which is used to separate features from different modes. Unless a column named "Feature_ID" is found in the file, a feature ID is generated from the value of "Split", the mass and the retention time. The function tries to find the mass and retention time columns by checking a few common alternative column names, and throws an error if no matching column is found.

Sample information needs to contain a row called "Injection_order", and its values need to be unique. In addition, a sample identifier row, if present, needs to be named "Sample_ID" or be specified in id_column, and its values need to be unique, with the exception of QC samples: any "QC" identifiers are replaced with "QC_1", "QC_2" and so on. If no "Sample_ID" row is found, one is created from id_prefix and the injection order.
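
The sample identifier rules above can be sketched as follows. make_sample_ids() is a hypothetical helper, not notame's code. It assumes the sample information has already been transposed so that each sample is a row, and it numbers "QC" identifiers in their order of appearance, which is an assumption about how "QC_1", "QC_2" and so on are assigned.

```r
# Hypothetical helper (not part of notame): apply the sample ID rules from
# Details to sample information with one row per sample.
make_sample_ids <- function(pheno_data, id_column = NULL, id_prefix = "ID_") {
  if (anyDuplicated(pheno_data$Injection_order) > 0) {
    stop("Injection_order values must be unique")
  }
  if (!is.null(id_column)) {
    # Use the user-specified identifier variable as Sample_ID
    pheno_data$Sample_ID <- pheno_data[[id_column]]
  }
  if (is.null(pheno_data[["Sample_ID"]])) {
    # No identifier: build IDs from the prefix and the injection order
    pheno_data$Sample_ID <- paste0(id_prefix, pheno_data$Injection_order)
  } else {
    # Number plain "QC" identifiers in order of appearance: QC_1, QC_2, ...
    ids <- as.character(pheno_data$Sample_ID)
    is_qc <- ids %in% "QC"
    ids[is_qc] <- paste0("QC_", seq_len(sum(is_qc)))
    pheno_data$Sample_ID <- ids
  }
  if (anyDuplicated(pheno_data$Sample_ID) > 0) {
    stop("Sample_ID values must be unique")
  }
  pheno_data
}

pd <- data.frame(Injection_order = 1:4, Sample_ID = c("QC", "S1", "QC", "S2"))
make_sample_ids(pd)$Sample_ID
# returns c("QC_1", "S1", "QC_2", "S2")

pd2 <- data.frame(Injection_order = c(3, 1, 2))
make_sample_ids(pd2)$Sample_ID
# returns c("ID_3", "ID_1", "ID_2")
```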

Value

A list of three data frames:

- exprs: the actual abundances (size: features x samples)
- pheno_data: sample information (size: sample info variables x samples)
- feature_data: information about the features (size: features x feature info columns)

antonvsdata/notame documentation built on Sept. 14, 2024, 11:09 p.m.