curate_se_colData: Curate Summarized Experiment colData
In jmw86069/platjam: Platform Jam, biological platform importers.

curate_se_colData

R Documentation

Curate Summarized Experiment colData

Description

Apply curation to colData in a SummarizedExperiment object

Usage

curate_se_colData(
  se,
  df,
  pattern_colname = head(colnames(df), 1),
  group_colname = NULL,
  id_colname = "Label",
  use = c("colnames"),
  use_delim = "_",
  subset_se = FALSE,
  warn_multimatch = TRUE,
  indent = 0,
  verbose = TRUE,
  ...
)

Arguments

`se`	`SummarizedExperiment` object.
`df`	`data.frame` (or equivalent) which contains columns of data annotation to be applied. The first column is assumed to be the column used for patterns to be matched with identifiers in the `se` object. The pattern column can be defined with `pattern_colname`.
`pattern_colname`	`character` value indicating which column in `df` contains patterns to be matched with identifiers in `se`. The default uses the first column in `df`. This value is passed to `curate_to_df_by_pattern()`.
`group_colname`	`character` or `NULL` (default) indicating which column(s) represent experimental groups, used only to create a corresponding column with a unique label for each entry. When `NULL` (the default), no action is taken.
`id_colname`	`character` used only when `group_colname` is defined and present in `colnames(df)`, used to create a unique label for each row in `colData(se)`. By default `group_colname=NULL` so no action is taken.
`use`	`character` string indicating the data to use as the identifiers when applying curation logic. The default is to use `colnames(se)`, however it can use one or more columns from `SummarizedExperiment::colData(se)`. Supported values: `"colnames"` uses `colnames(se)`, which should be equivalent to `rownames(SummarizedExperiment::colData(se))`; `"rownames"` uses `rownames(SummarizedExperiment::colData(se))`, which as stated above should be equivalent to `colnames(se)`; or one or more `character` values that match `colnames(colData(se))`.
`use_delim`	`character` string used as a delimiter when `use` is supplied as a vector with multiple colnames. The values in each column are concatenated using this delimiter, by calling `jamba::pasteByRow()`.
`subset_se`	`logical` indicating whether the `se` object columns should be subset when not all identifiers matched the patterns in `df`. When `subset_se=FALSE`, for any entry in `se` whose identifier did not match a pattern in `df`, the corresponding row of `SummarizedExperiment::colData()` will contain `NA` values. When `subset_se=TRUE`, any entry in `se` whose identifier did not match a pattern in `df` will be removed from the `se` object. This option is sometimes a convenient way to subset a large dataset to use only user-defined samples.
`warn_multimatch`	`logical` indicating whether to print a warning when any one pattern matches two or more identifiers. Sometimes this behavior is intended, however it may indicate that the patterns are not specific enough to match one unique identifier. See Details.
`indent`	`numeric` value used when `verbose=TRUE`, passed to `jamba::printDebug()`.
`verbose`	`logical` indicating whether to print verbose output.
`...`	additional arguments are passed to `curate_to_df_by_pattern()`.
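
The `use` and `use_delim` arguments can be pictured with a small base R sketch. It is only an illustration under assumptions: the `col_data` table and its columns are hypothetical, and base `paste()` stands in for `jamba::pasteByRow()`, which may treat empty or missing values differently.

```r
# Hypothetical sample annotation standing in for colData(se)
col_data <- data.frame(
  Treatment = c("ctl", "ctl", "trt"),
  Time = c("0h", "6h", "6h"),
  row.names = c("s1", "s2", "s3"),
  stringsAsFactors = FALSE
)

# Columns named in `use` are joined row by row with `use_delim`
use <- c("Treatment", "Time")
use_delim <- "_"
identifiers <- do.call(
  paste,
  c(unname(as.list(col_data[, use, drop = FALSE])), sep = use_delim)
)
names(identifiers) <- rownames(col_data)

# These combined strings are what the patterns would be matched against
print(identifiers)
```

With identifiers built this way, a pattern such as `"^ctl_6h$"` would target one specific treatment and time combination.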

Details

Given a SummarizedExperiment object, this function is intended to augment the SummarizedExperiment::colData() annotation associated with columns, which are typically biological or experimental samples. Measurements within each sample are typically stored as rows.

This function is a convenient wrapper around curate_to_df_by_pattern(). It applies the result directly to SummarizedExperiment::colData(), which is stored as an S4Vectors::DataFrame-class object.

Note that colnames present in both colData(se) and df will take the value from df as replacement, including the presence of NA values.
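
The replacement rule described above can be sketched with plain data frames. This is a base R illustration of the stated behavior, not the package's implementation; `col_data` and `curated` are hypothetical stand-ins for colData(se) and the curated annotation derived from df.

```r
# Hypothetical existing annotation, standing in for colData(se)
col_data <- data.frame(
  Group = c("A", "B"),
  Batch = c("x", "y"),
  row.names = c("s1", "s2"),
  stringsAsFactors = FALSE
)

# Hypothetical curated annotation, already aligned to the same samples
curated <- data.frame(
  Group = c("Control", NA),
  Time = c("0h", "6h"),
  row.names = c("s1", "s2"),
  stringsAsFactors = FALSE
)

# Shared columns are overwritten (NA included), new columns are appended,
# and columns absent from the curated table are left untouched
for (nm in colnames(curated)) {
  col_data[[nm]] <- curated[[nm]]
}
print(col_data)
```

Here the original `Group` value for `s2` is lost and becomes `NA`, so it is worth checking for shared column names before curating.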

About pattern matching

The patterns are used to match identifiers using regular expressions, and the argument warn_multimatch=TRUE (default) will print a warning when one pattern matches two or more identifiers. It may be intended, or may indicate that some patterns are not specific enough to match only one intended identifier.

For example, pattern="sample_3" will match all three of these identifiers: c("one_sample_3", "two_sample_3", "one_sample_31"), including the likely unintended "one_sample_31".

To overcome this type of issue, use regular expressions to limit matching to the end, for example pattern="sample_3$" will only match c("one_sample_3", "two_sample_3") and will not match "one_sample_31".
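
The difference between the two patterns can be checked directly with base R `grepl()`. This sketch assumes ordinary regular expression matching, as described above; the exact matching call used inside curate_to_df_by_pattern() is not shown here and may differ in its options.

```r
ids <- c("one_sample_3", "two_sample_3", "one_sample_31")
patterns <- c(loose = "sample_3", anchored = "sample_3$")

# Identifiers matched by each pattern
matches <- lapply(patterns, function(p) ids[grepl(p, ids)])
print(matches)

# Patterns that match two or more identifiers, the situation
# that warn_multimatch=TRUE is described as reporting
n_matches <- lengths(matches)
print(names(n_matches)[n_matches > 1])
```

Both patterns still match more than one identifier in this example, so a fully specific pattern would also need the prefix, for example `"^one_sample_3$"`.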

It can be helpful to name the pattern column "Pattern" so that the pattern used is clearly defined in the output colData(se), and can be compared to the intended identifiers.

Value

SummarizedExperiment::SummarizedExperiment object.

When subset_se=FALSE (default), the output will contain the same dimensions and column order as the input se.
When subset_se=TRUE the output object may contain fewer columns based upon the number of identifiers that matched the patterns supplied in df.
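
The two outcomes can be sketched in base R without a SummarizedExperiment object. This is an illustration under assumptions: the identifiers and the `df` contents are hypothetical, and taking the first matching pattern for each identifier is an assumption made for the sketch, not confirmed package behavior.

```r
ids <- c("ctl_rep1", "trt_rep1", "blank_1")
df <- data.frame(
  Pattern = c("^ctl", "^trt"),
  Group = c("Control", "Treated"),
  stringsAsFactors = FALSE
)

# For each identifier, find the first pattern row that matches (NA if none)
match_row <- vapply(ids, function(id) {
  hit <- which(vapply(df$Pattern, function(p) grepl(p, id), logical(1)))
  if (length(hit) == 0) NA_integer_ else as.integer(hit[1])
}, integer(1))

# Like subset_se=FALSE: every identifier is kept, unmatched rows are NA
kept_all <- df[match_row, , drop = FALSE]
rownames(kept_all) <- ids
print(kept_all)

# Like subset_se=TRUE: unmatched identifiers are dropped
subsetted <- kept_all[!is.na(match_row), , drop = FALSE]
print(subsetted)
```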
