diffExpressedVariants: Retrieve condition-specific variants in RNA-seq data
In kissDE: Retrieves Condition-Specific Variants in RNA-Seq Data



Function that retrieves condition-specific variants in RNA-seq data.

diffExpressedVariants(countsData, conditions, pvalue = 1, 
    filterLowCountsVariants = 10, flagLowCountsConditions = 10,
    technicalReplicates = FALSE,
    nbCore = 1)

`countsData`	a data frame containing the counts in the appropriate format (see Details below).
`conditions`	a character vector containing the experimental conditions.
`pvalue`	a numerical value indicating the p-value threshold below which the events will be kept in the final data frame.
`filterLowCountsVariants`	a numerical value indicating the global variant count value (see Details below) below which events are filtered out in order to increase the statistical power of the analysis. Both variants must have a read coverage below this value for the event to be removed. This filter is applied after the normalization and the overdispersion estimation.
`flagLowCountsConditions`	a numerical value indicating the global condition count value (see Details below) below which events are flagged as 'lowCounts' in the final data frame. At least n-1 conditions (out of n conditions) must have low counts for the event to be flagged as 'lowCounts'.
`technicalReplicates`	a boolean value indicating if the counts in `countsData` come from technical replicates only or not.
`nbCore`	an integer indicating the number of cores to use for the model fitting step.

The countsData data frame must be formatted as follows:

Column 1: names of the events
Column 2: lengths (in bp) of the variants
Column 3 to n: counts corresponding to each replicate of each experimental condition of one variant

Each row corresponds to one variant, so an event corresponds to two rows: the longest variant (or inclusion variant) in the first row (denoted as upper path: UP) and the shortest variant (or exclusion variant) in the second row (denoted as lower path: LP). This data frame can be obtained using the kissplice2counts function.

The global variant count is the minimal number of reads that cover one or the other variant across all the replicates (sum by variant).

The global condition count is the minimal number of reads that cover one or the other condition (sum by replicates for each condition).
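The layout and the two global counts described above can be illustrated with a small base R sketch. Everything in it is hypothetical: the column names, the count values, and the choice to pool the UP and LP rows of an event when summing by condition are assumptions made for illustration, and the sketch works on raw counts, whereas kissDE applies its filter after normalization. It is not the package's internal implementation.

```r
# Toy countsData in the documented layout (hypothetical names and values):
# column 1 = event name, column 2 = variant length, columns 3..n = counts.
countsData <- data.frame(
  events.names  = c("event1", "event1", "event2", "event2"),
  events.length = c(150, 50, 120, 40),
  C1_r1 = c(30, 12, 2, 1),
  C1_r2 = c(28, 15, 1, 0),
  C2_r1 = c(10, 40, 3, 2),
  C2_r2 = c(12, 35, 0, 1)
)
conditions <- c("C1", "C1", "C2", "C2")  # one entry per count column
countCols <- 3:ncol(countsData)

# Rows alternate UP (first row of an event) and LP (second row).
upRows <- seq(1, nrow(countsData), by = 2)
lpRows <- seq(2, nrow(countsData), by = 2)
eventNames <- countsData$events.names[upRows]

# Sum by variant: total reads per variant across all replicates.
variantSums <- rowSums(countsData[, countCols])
upSums <- variantSums[upRows]
lpSums <- variantSums[lpRows]

# An event is removed only if BOTH variants fall below the threshold.
filterLowCountsVariants <- 10
removed <- upSums < filterLowCountsVariants & lpSums < filterLowCountsVariants
names(removed) <- eventNames

# Sum by replicates for each condition.
# Assumption: the UP and LP rows of an event are pooled first.
eventId <- rep(seq_along(eventNames), each = 2)
eventCounts <- rowsum(as.matrix(countsData[, countCols]), group = eventId)
conditionSums <- t(rowsum(t(eventCounts), group = conditions))
rownames(conditionSums) <- eventNames

# Flag the event if at least n-1 of the n conditions have low counts.
flagLowCountsConditions <- 10
nConditions <- length(unique(conditions))
lowCounts <- rowSums(conditionSums < flagLowCountsConditions) >= nConditions - 1

print(removed)
print(conditionSums)
print(lowCounts)
```

In this toy table, event2 has few reads on both variants and in both conditions, so it is the one that would be both filtered and flagged under the assumptions stated above.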

diffExpressedVariants returns a list of 6 objects:

`finalTable`	a data frame containing the following columns:
`ID`: the variation identifier.
`Length_diff`: the size of the variable region.
`UP_Condi_Rj_Norm` (resp. `LP_Condi_Rj_Norm`): the normalized counts of the first variant (UP, resp. second variant: LP), for condition i (`Condi`) and replicate j (`Rj`).
`Adjusted_pvalue`: the p-value adjusted for multiple testing with the Benjamini & Hochberg method.
`Deltaf/DeltaPSI`: the difference in relative abundance of the variants across conditions. For instance, with 2 conditions, `deltaPSI` is the relative abundance in condition 2 minus the relative abundance in condition 1. The counts of the inclusion variant are corrected for the length of the variant so that the PSI value is not overestimated.
`lowcounts`: a column that flags low counts in the data. If `TRUE`, at least n-1 conditions out of n conditions have fewer than 10 reads.
`correctedPval`	a numeric vector containing p-values after correction for multiple testing
`uncorrectedPVal`	a numeric vector containing p-values before correction for multiple testing
`resultFitNBglmModel`	a data frame containing the results of the fitting of the model to the data
`f/psiTable`	a data frame containing the allele frequency (f)/Percent Spliced In (PSI) of each replicate
`k2rgFile`	a string containing either the `KisSplice2RefGenome` file path and name or NULL if no `KisSplice2RefGenome` input file was given
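As a rough illustration of how `uncorrectedPVal`, `correctedPval`, the `pvalue` threshold and the `Deltaf/DeltaPSI` column relate to each other, the base R sketch below uses made-up numbers. The Benjamini & Hochberg step relies on the standard `p.adjust` function. The PSI part is a simplification: it averages per-replicate PSI within each condition (an assumption) and leaves out the length correction of the inclusion variant, whose exact form is not given on this page. The strict comparison against the threshold is also an assumption.

```r
# Hypothetical raw p-values, one per event.
uncorrectedPVal <- c(0.001, 0.01, 0.04, 0.30, 0.75)

# Benjamini & Hochberg adjustment for multiple testing.
correctedPval <- p.adjust(uncorrectedPVal, method = "BH")

# Keep the events whose adjusted p-value is below the threshold.
pvalue <- 0.05
kept <- correctedPval < pvalue

# Hypothetical normalized counts of one event (UP and LP variants).
conditions <- c("C1", "C1", "C2", "C2")
up <- c(C1_r1 = 30, C1_r2 = 28, C2_r1 = 10, C2_r2 = 12)
lp <- c(C1_r1 = 10, C1_r2 = 12, C2_r1 = 30, C2_r2 = 28)

# PSI per replicate: share of reads supporting the inclusion (UP) variant.
# The length correction of the UP counts is intentionally omitted here.
psi <- up / (up + lp)

# Assumption: relative abundance per condition is the mean PSI of its
# replicates; deltaPSI is condition 2 minus condition 1.
meanPsi <- tapply(psi, conditions, mean)
deltaPSI <- as.numeric(meanPsi["C2"] - meanPsi["C1"])

print(round(correctedPval, 4))
print(kept)
print(deltaPSI)
```

A negative `deltaPSI` in this toy case means the inclusion variant is relatively less abundant in the second condition than in the first.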

Lopez-Maestre et al., 2016. SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence. Nucleic Acids Research, 44(19):e148. https://doi.org/10.1093/nar/gkw655

# SNV example: build the counts table from a KisSplice output file
fpath1 <- system.file("extdata", "output_kissplice_SNV.fa", package = "kissDE")
mySNVcounts <- kissplice2counts(fpath1, counts = 0, pairedEnd = TRUE)
# One condition label per replicate, in the order of the count columns
mySNVconditions <- c("EUR", "EUR", "TSC", "TSC")
# diffSNV <- diffExpressedVariants(mySNVcounts, mySNVconditions)

# Alternative splicing example: read an existing counts table
fpath2 <- system.file("extdata", "table_counts_alt_splicing.txt",
    package = "kissDE")
mySplicingconditions <- c("C1", "C1", "C2", "C2")
mySplicingcounts <- read.table(fpath2, header = TRUE)
# diffSplicing <- diffExpressedVariants(mySplicingcounts, mySplicingconditions)