DaMiR.sampleFilt: Filter Samples by Mean Correlation Distance Metric


View source: R/Normalization.R

Description

This function computes the pairwise correlation between samples. Samples whose mean correlation falls below a user-defined threshold are filtered out.

Usage

DaMiR.sampleFilt(data, th.corr = 0.9, type = c("spearman", "pearson"))

Arguments

data

A SummarizedExperiment object

th.corr

Threshold of mean correlation; default is 0.9

type

Type of correlation metric; default is "spearman"

Details

This step introduces a sample quality checkpoint. Global gene expression should exhibit a high correlation among biological replicates; conversely, poorly correlated samples may carry a technical artifact (e.g. poor RNA or library preparation quality), even though they may have passed sequencing quality checks. If left unassessed, such samples may negatively affect all downstream analyses. This function computes the mean absolute correlation of each sample and removes the samples whose mean correlation is lower than the value set in the th.corr argument. The appropriate threshold may differ between experimental settings but should be as high as possible. For sequencing data we suggest setting th.corr greater than 0.85.
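The filtering rule described above can be illustrated with a minimal base R sketch. This is not the DaMiRseq implementation: the helper name mean_corr_filter, the plain numeric matrix input (genes in rows, samples in columns), and the choice to exclude each sample's self-correlation from its mean are all assumptions made for illustration. The toy data has only five samples, so a single outlier pulls down every mean; the example therefore uses a lower threshold than would be suggested for real data.

```r
# Hypothetical helper (not part of DaMiRseq): keep the samples whose mean
# absolute correlation with the other samples is at least th.corr.
mean_corr_filter <- function(expr, th.corr = 0.9,
                             type = c("spearman", "pearson")) {
  type <- match.arg(type)
  # Sample-by-sample absolute correlation matrix (samples are columns).
  cor_mat <- abs(cor(expr, method = type))
  # Assumption: ignore the self-correlation on the diagonal.
  diag(cor_mat) <- NA
  mean_cor <- rowMeans(cor_mat, na.rm = TRUE)
  keep <- mean_cor >= th.corr
  list(filtered = expr[, keep, drop = FALSE],
       mean_cor = mean_cor,
       removed  = colnames(expr)[!keep])
}

# Toy data: four samples follow the same trend, S5 does not.
g <- 1:20
expr <- cbind(
  S1 = g,
  S2 = g + rep(c(0, 1), 10),
  S3 = g + rep(c(1, 0), 10),
  S4 = g + rep(c(0, 0, 1, 1), 5),
  S5 = rep(c(1, 20), 10)
)

# Low threshold chosen only because the toy set is tiny.
res <- mean_corr_filter(expr, th.corr = 0.5, type = "pearson")
print(round(res$mean_cor, 3))
print(res$removed)
```

In this toy set the outlier also lowers the mean correlation of the well-behaved samples, which is one reason the threshold may need tuning for small experiments.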

Value

A SummarizedExperiment object that contains a normalized and filtered expression matrix (log2 scale) and a filtered data frame with 'class' and (optionally) other variables.

Author(s)

Mattia Chiesa, Luca Piacentini

Examples

# load the package and its example data:
library(DaMiRseq)
data(data_norm)
# filter out samples whose mean Pearson correlation is < 0.92:
data_filt <- DaMiR.sampleFilt(data_norm, th.corr = 0.92, type = "pearson")

BioinfoMonzino/DaMiRseq documentation built on Aug. 22, 2021, 3:11 p.m.