diffExpressedVariants: Retrieve condition-specific variants in RNA-seq data

View source: R/diffExpressedVariants.R

diffExpressedVariantsR Documentation

Retrieve condition-specific variants in RNA-seq data

Description

Function that retrieves condition-specific variants in RNA-seq data.

Usage

diffExpressedVariants(countsData, conditions, pvalue = 1, 
    filterLowCountsVariants = 10, flagLowCountsConditions = 10,
    technicalReplicates = FALSE,
    nbCore = 1)

Arguments

countsData

a data frame containing the counts in the appropriate format (see Details below).

conditions

a character vector containing the experimental conditions.

pvalue

a numerical value indicating the p-value threshold below which the events will be kept in the final data frame.

filterLowCountsVariants

a numerical value indicating the global variant count value (see Details below) below which events are filtered out in order to increase statistical power of the analysis. Both variant must have a read coverage below this value in order to remove the event. This filter is done after the normalization and the overdispersion estimation.

flagLowCountsConditions

a numerical value indicating the global condition count value (see Details below) below which we flag events as 'lowCounts' in the final data frame. At least n-1 conditions (over n conditions) must have low counts to flag the event as 'lowCounts'.

technicalReplicates

a boolean value indicating if the counts in countsData come from technical replicates only or not.

nbCore

an integer indicating the number of cores to use for the model fitting step.

Details

The countsData data frame must be formatted as follows:

  • Column 1: names of the events

  • Column 2: lengths (in bp) of the variants

  • Column 3 to n: counts corresponding to each replicate of each experimental condition of one variant

Each row corresponds to one variant, thus an event correspond to two rows with the longest variant (or inclusion variant) in the first row (thus denotated as upper path: UP) and the smallest variant (or exclusion variant) in the second row (thus denotated as lower path: LP). This data frame can be obtained using kissplice2counts function.\ The global variant count is the minimal number of reads that cover one or the other variant across all the replicates (sum by variant).\ The global condition count is the minimal number of reads that cover one or the other condition (sum by replicates for each conditions).

Value

diffExpressedVariants returns a list of 6 objects:

finalTable

a data frame containing the columns

  • ID: the variation identifier

  • Length_diff: the size of the variable region

  • UP_Condi_Rj_Norm (resp LP_Condi_Rj_Norm): returns the normalized counts of the first variant (UP, resp. second variant: LP), for the condition i (Condi) and the replicate j (Rj)

  • Adjusted_pvalue: p-value adjusted for multiple testing with Benjamini & Hochberg method

  • Deltaf/DeltaPSI: difference of relative abundance of variants across conditions. For instance if there are 2 conditions, deltaPSI returns relative abudance in condition 2 - relative abundance in condition 1. Inclusion variant's counts are corrected for the length of the variant so that we do not overestimate the PSI value.

  • lowcounts: a column that flag low counts in data. If TRUE, at least n-1 conditions over n conditions have less than 10 reads.

correctedPval

a numeric vector containing p-values after correction for multiple testing

uncorrectedPVal

a numeric vector containing p-values before correction for multiple testing

resultFitNBglmModel

a data frame containing the results of the fitting of the model to the data

f/psiTable

a data frame containing the allele frequency (f)/Percent Spliced In (PSI) of each replicate

k2rgFile

a string containing either the KisSplice2RefGenome file path and name or NULL if no KisSplice2RefGenome input file was given

References

Lopez-Maestre et al., 2016. Snp calling from rna-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence. Nucleic Acids Research, 44(19):e148. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1093/nar/gkw655")}

Examples

fpath1 <- system.file("extdata", "output_kissplice_SNV.fa", package = "kissDE")
mySNVcounts <- kissplice2counts(fpath1, counts = 0, pairedEnd = TRUE)
mySNVconditions <- c("EUR", "EUR", "TSC", "TSC")
# diffSNV <- diffExpressedVariants(mySNVcounts, mySNVconditions)

fpath2 <- system.file("extdata", "table_counts_alt_splicing.txt", 
package = "kissDE")
mySplicingconditions <- c("C1", "C1", "C2", "C2")
mySplicingcounts <- read.table(fpath2, header = TRUE)
# diffSplicing <- diffExpressedVariants(mySplicingcounts, mySplicingconditions)

aursiber/kissDE documentation built on Jan. 28, 2024, 10:01 a.m.