Description Usage Arguments Details Value Author(s) See Also Examples
Filtering of genes and features with low expression. Additionally, for the dmSQTLdata object, filtering of genotypes with low frequency.
1 2 3 4 5 6 7 8 9 10 11 12 13 | dmFilter(x, ...)
## S4 method for signature 'dmDSdata'
dmFilter(x, min_samps_gene_expr = 0,
min_samps_feature_expr = 0, min_samps_feature_prop = 0,
min_gene_expr = 0, min_feature_expr = 0, min_feature_prop = 0)
## S4 method for signature 'dmSQTLdata'
dmFilter(x, min_samps_gene_expr = 0,
min_samps_feature_expr = 0, min_samps_feature_prop = 0,
minor_allele_freq = 0.05 * nrow(samples(x)), min_gene_expr = 0,
min_feature_expr = 0, min_feature_prop = 0,
BPPARAM = BiocParallel::SerialParam())
|
x |
|
... |
Other parameters that can be defined by methods using this generic. |
min_samps_gene_expr |
Minimal number of samples where genes should be expressed. See Details. |
min_samps_feature_expr |
Minimal number of samples where features should be expressed. See Details. |
min_samps_feature_prop |
Minimal number of samples where features should be expressed. See details. |
min_gene_expr |
Minimal gene expression. |
min_feature_expr |
Minimal feature expression. |
min_feature_prop |
Minimal proportion for feature expression. This value should be between 0 and 1. |
minor_allele_freq |
Minimal number of samples where each of the genotypes has to be present. |
BPPARAM |
Parallelization method used by
|
Filtering parameters should be adjusted according to the sample size of the experiment data and the number of replicates per condition.
min_samps_gene_expr
defines the minimal number of samples where
genes are required to be expressed at the minimal level of
min_gene_expr
in order to be included in the downstream analysis.
Ideally, we would like that genes were expressed at some minimal level in
all samples because this would lead to better estimates of feature ratios.
Similarly, min_samps_feature_expr
and min_samps_feature_prop
defines the minimal number of samples where features are required to be
expressed at the minimal levels of counts min_feature_expr
or
proportions min_feature_prop
. In differential transcript/exon usage
analysis, we suggest using min_samps_feature_expr
and
min_samps_feature_prop
equal to the minimal number of replicates in
any of the conditions. For example, in an assay with 3 versus 5 replicates,
we would set these parameters to 3, which allows a situation where a
feature is expressed in one condition but may not be expressed at all in
another one, which is an example of differential transcript/exon usage.
By default, all the filtering parameters equal zero which means that features with zero expression in all samples are removed as well as genes with only one non-zero feature.
In QTL analysis, usually, we deal with data that has many more replicates
than data from a standard differential usage assay. Our example data set
consists of 91 samples. Requiring that genes are expressed in all samples may
be too stringent, especially since there may be missing values in the data
and for some genes you may not observe counts in all 91 samples. Slightly
lower threshold ensures that we do not eliminate such genes. For example, if
min_samps_gene_expr = 70
and min_gene_expr = 10
, only genes
with expression of at least 10 in at least 70 samples are kept. Samples with
expression lower than 10 have NA
s assigned and are skipped in the
analysis of this gene. minor_allele_freq
indicates the minimal number
of samples for the minor allele presence. Usually, it is equal to roughly 5%
of total samples.
Returns filtered dmDSdata
or
dmSQTLdata
object.
Malgorzata Nowicka
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 | # --------------------------------------------------------------------------
# Create dmDSdata object
# --------------------------------------------------------------------------
## Get kallisto transcript counts from the 'PasillaTranscriptExpr' package
library(PasillaTranscriptExpr)
data_dir <- system.file("extdata", package = "PasillaTranscriptExpr")
## Load metadata
pasilla_metadata <- read.table(file.path(data_dir, "metadata.txt"),
header = TRUE, as.is = TRUE)
## Load counts
pasilla_counts <- read.table(file.path(data_dir, "counts.txt"),
header = TRUE, as.is = TRUE)
## Create a pasilla_samples data frame
pasilla_samples <- data.frame(sample_id = pasilla_metadata$SampleName,
group = pasilla_metadata$condition)
levels(pasilla_samples$group)
## Create a dmDSdata object
d <- dmDSdata(counts = pasilla_counts, samples = pasilla_samples)
## Use a subset of genes, which is defined in the following file
gene_id_subset <- readLines(file.path(data_dir, "gene_id_subset.txt"))
d <- d[names(d) %in% gene_id_subset, ]
# --------------------------------------------------------------------------
# Differential transcript usage analysis - simple two group comparison
# --------------------------------------------------------------------------
## Filtering
## Check what is the minimal number of replicates per condition
table(samples(d)$group)
d <- dmFilter(d, min_samps_gene_expr = 7, min_samps_feature_expr = 3,
min_gene_expr = 10, min_feature_expr = 10)
plotData(d)
# --------------------------------------------------------------------------
# Create dmSQTLdata object
# --------------------------------------------------------------------------
# Use subsets of data defined in the GeuvadisTranscriptExpr package
library(GeuvadisTranscriptExpr)
geuv_counts <- GeuvadisTranscriptExpr::counts
geuv_genotypes <- GeuvadisTranscriptExpr::genotypes
geuv_gene_ranges <- GeuvadisTranscriptExpr::gene_ranges
geuv_snp_ranges <- GeuvadisTranscriptExpr::snp_ranges
colnames(geuv_counts)[c(1,2)] <- c("feature_id", "gene_id")
colnames(geuv_genotypes)[4] <- "snp_id"
geuv_samples <- data.frame(sample_id = colnames(geuv_counts)[-c(1,2)])
d <- dmSQTLdata(counts = geuv_counts, gene_ranges = geuv_gene_ranges,
genotypes = geuv_genotypes, snp_ranges = geuv_snp_ranges,
samples = geuv_samples, window = 5e3)
# --------------------------------------------------------------------------
# sQTL analysis - simple group comparison
# --------------------------------------------------------------------------
## Filtering
d <- dmFilter(d, min_samps_gene_expr = 70, min_samps_feature_expr = 5,
minor_allele_freq = 5, min_gene_expr = 10, min_feature_expr = 10)
plotData(d)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.