remove_redundancy-methods: Drop redundant elements (e.g., samples) for which feature...

remove_redundancyR Documentation

Drop redundant elements (e.g., samples) for which feature (e.g., transcript/gene) abundances are correlated

Description

remove_redundancy() takes as input A 'tbl' (with at least three columns for sample, feature and transcript abundance) or 'SummarizedExperiment' (more convenient if abstracted to tibble with library(tidySummarizedExperiment)) for correlation method or | <DIMENSION 1> | <DIMENSION 2> | <...> | for reduced_dimensions method, and returns a consistent object (to the input) with dropped elements (e.g., samples).

Usage

remove_redundancy(
  .data,
  .element = NULL,
  .feature = NULL,
  .abundance = NULL,
  method,
  of_samples = TRUE,
  correlation_threshold = 0.9,
  top = Inf,
  transform = identity,
  Dim_a_column,
  Dim_b_column,
  log_transform = NULL
)

## S4 method for signature 'spec_tbl_df'
remove_redundancy(
  .data,
  .element = NULL,
  .feature = NULL,
  .abundance = NULL,
  method,
  of_samples = TRUE,
  correlation_threshold = 0.9,
  top = Inf,
  transform = identity,
  Dim_a_column = NULL,
  Dim_b_column = NULL,
  log_transform = NULL
)

## S4 method for signature 'tbl_df'
remove_redundancy(
  .data,
  .element = NULL,
  .feature = NULL,
  .abundance = NULL,
  method,
  of_samples = TRUE,
  correlation_threshold = 0.9,
  top = Inf,
  transform = identity,
  Dim_a_column = NULL,
  Dim_b_column = NULL,
  log_transform = NULL
)

## S4 method for signature 'tidybulk'
remove_redundancy(
  .data,
  .element = NULL,
  .feature = NULL,
  .abundance = NULL,
  method,
  of_samples = TRUE,
  correlation_threshold = 0.9,
  top = Inf,
  transform = identity,
  Dim_a_column = NULL,
  Dim_b_column = NULL,
  log_transform = NULL
)

## S4 method for signature 'SummarizedExperiment'
remove_redundancy(
  .data,
  .element = NULL,
  .feature = NULL,
  .abundance = NULL,
  method,
  of_samples = TRUE,
  correlation_threshold = 0.9,
  top = Inf,
  transform = identity,
  Dim_a_column = NULL,
  Dim_b_column = NULL,
  log_transform = NULL
)

## S4 method for signature 'RangedSummarizedExperiment'
remove_redundancy(
  .data,
  .element = NULL,
  .feature = NULL,
  .abundance = NULL,
  method,
  of_samples = TRUE,
  correlation_threshold = 0.9,
  top = Inf,
  transform = identity,
  Dim_a_column = NULL,
  Dim_b_column = NULL,
  log_transform = NULL
)

Arguments

.data

A 'tbl' (with at least three columns for sample, feature and transcript abundance) or 'SummarizedExperiment' (more convenient if abstracted to tibble with library(tidySummarizedExperiment))

.element

The name of the element column (normally samples).

.feature

The name of the feature column (normally transcripts/genes)

.abundance

The name of the column including the numerical value the clustering is based on (normally transcript abundance)

method

A character string. The method to use, correlation and reduced_dimensions are available. The latter eliminates one of the most proximar pairs of samples in PCA reduced dimensions.

of_samples

A boolean. In case the input is a tidybulk object, it indicates Whether the element column will be sample or transcript column

correlation_threshold

A real number between 0 and 1. For correlation based calculation.

top

An integer. How many top genes to select for correlation based method

transform

A function that will tranform the counts, by default it is log1p for RNA sequencing data, but for avoinding tranformation you can use identity

Dim_a_column

A character string. For reduced_dimension based calculation. The column of one principal component

Dim_b_column

A character string. For reduced_dimension based calculation. The column of another principal component

log_transform

DEPRECATED - A boolean, whether the value should be log-transformed (e.g., TRUE for RNA sequencing data)

Details

'r lifecycle::badge("maturing")'

This function removes redundant elements from the original data set (e.g., samples or transcripts). For example, if we want to define cell-type specific signatures with low sample redundancy. This function returns a tibble with dropped redundant elements (e.g., samples). Two redundancy estimation approaches are supported: (i) removal of highly correlated clusters of elements (keeping a representative) with method="correlation"; (ii) removal of most proximal element pairs in a reduced dimensional space.

Underlying method for correlation: widyr::pairwise_cor(sample, transcript,count, sort = TRUE, diag = FALSE, upper = FALSE)

Underlying custom method for reduced dimensions: select_closest_pairs = function(df) couples <- df |> head(n = 0)

while (df |> nrow() > 0) pair <- df |> arrange(dist) |> head(n = 1) couples <- couples |> bind_rows(pair) df <- df |> filter( !'sample 1' !'sample 2' )

couples

Value

A tbl object with with dropped redundant elements (e.g., samples).

A tbl object with with dropped redundant elements (e.g., samples).

A tbl object with with dropped redundant elements (e.g., samples).

A tbl object with with dropped redundant elements (e.g., samples).

A 'SummarizedExperiment' object

A 'SummarizedExperiment' object

Examples



 tidybulk::se_mini |>
 identify_abundant() |>
   remove_redundancy(
	   .element = sample,
	   .feature = transcript,
	   	.abundance =  count,
	   	method = "correlation"
	   	)

counts.MDS =
 tidybulk::se_mini |>
 identify_abundant() |>
  reduce_dimensions( method="MDS", .dims = 3)

remove_redundancy(
	counts.MDS,
	Dim_a_column = `Dim1`,
	Dim_b_column = `Dim2`,
	.element = sample,
  method = "reduced_dimensions"
)


stemangiola/tidybulk documentation built on April 9, 2024, 4:16 a.m.