DEXSeqDataSet: DEXSeqDataSet object and constructors

View source: R/classes.R

DEXSeqDataSet-classR Documentation

DEXSeqDataSet object and constructors

Description

The DEXSeqDataSet is a subclass of DESeqDataSet, specifically designed to adapt the DESeqDataSet to test for differences in exon usage.

Usage


  DEXSeqDataSet( countData, sampleData, 
    design= ~ sample + exon + condition:exon, 
    featureID, groupID, featureRanges=NULL, 
    transcripts=NULL, alternativeCountData=NULL)

  DEXSeqDataSetFromHTSeq( 
    countfiles, sampleData, 
    design= ~ sample + exon + condition:exon, 
    flattenedfile=NULL )

  DEXSeqDataSetFromSE( SE, 
    design= ~ sample + exon + condition:exon )

Arguments

countData

A matrix of count data of non-negative integer values. The rows correspond to counts for each exon counting bin, the columns correspond to samples. Note that biological replicates should each get their own column, while the counts of technical replicates (i.e., several sequencing runs/lanes from the same sample) should be summed up into a single column

alternativeCountData

DEXSeq can be also used for test for differences in exon inclusion based on the number of reads supporting the inclusion of an exon and the number of reads supporting the exclusion of an exon. A matrix of count data of non-negative integer values.The rows correspond to exonic regions and the columns correspond to samples. This matrix should contain the number of exon-exon junction reads that skip each exon in each sample. If NULL, then the sum of the other exons belonging to the same gene is considered for testing (i.e. the normal DEXSeq approach).

countfiles

A character vector containing the path to the files that were originated with the script 'dexseq_count.py'.

sampleData

A data.frame with the annotation (e.g. treatments, or tissue types, or phenotypes, or the like). The number of rows in the data frame must to be equal to the number of columns of the countData matrix, assigning the annotation of each sample.

design

A formula which specifies the design of the experiment. It must specify an interaction term between a variable from the sampleData columns with the 'exon' variable. By default, the design will be '~ sample + exon + condition:exon'. This formula indicates the contrast between 'condition' and exon', i.e. differences in exon usage due to changes in the 'condition' variable. See the vignette for more examples of other designs.

featureID

A character vector of counting regions identifiers ordered according to the rows in countData. The identifiers names can be repeated between groups but not within groups.

groupID

A vector of group identifiers ordered according to its respective row in countData. It must reflect the sets of counting regions belonging to the same group, for example, exon bins in belonging to the same gene should have the same group identifier.

featureRanges

Optional. GRanges or GRangesList describing the genomic coordinates of each of the rows of countData.

transcripts

Optional. A list of the same length as the number of rows in countData. Each element of the list should contain a character vector of transcript identifiers indicating the transcript identifiers overlapping with the exonic regions.

flattenedfile

A character vector containing the path to the flattened annotation file that was originated with the script 'dexseq_prepare_annotation.py'.

SE

A SummarizedExperiments object, originated using the function SummarizeOverlaps. See examples for more details.

Value

A DEXSeqDataSet object.

See Also

DEXSeqDataSetFromHTSeq DEXSeqDataSetFromSE

Examples


## Not run: 
  #######################################
  ### From the output of the          ### 
  ### acconpaning python scripts      ### 
  #######################################

  inDir = system.file("extdata", package="pasilla", mustWork=TRUE)
  flattenedfile = file.path(inDir, "Dmel.BDGP5.25.62.DEXSeq.chr.gff")
  sampleData = data.frame(
      condition = c( rep("treated", 3), rep("untreated", 4) ),
      type = c("single", "paired", "paired", "single", "single", "paired", "paired") )
     
  countFiles <- list.files(inDir, pattern="fb.txt")
  rownames( sampleData ) <- countFiles

  DEXSeqDataSetFromHTSeq(
    countfiles=file.path( inDir, countFiles ),
    sampleData = sampleData,
    design = ~ sample + exon + type:exon + condition:exon,
    flattenedfile=flattenedfile )


  #######################################
  ### From GRanges derived objects    ###
  #######################################

  library(GenomicRanges)
  library(GenomicFeatures)
  library(GenomicAlignments)

  hse <- makeTxDbFromBiomart( biomart="ensembl",
                             dataset="hsapiens_gene_ensembl",
                             host="grch37.ensembl.org")

  bamDir <- system.file( 
    "extdata", package="parathyroidSE", mustWork=TRUE )
  fls <- list.files( bamDir, pattern="bam$", full=TRUE )
  
  bamlst <- BamFileList( 
    fls, index=character(), 
    yieldSize=100000, obeyQname=TRUE )

  exonicParts <- exonicParts( hse, linked.to.single.gene.only = TRUE )

  SE <- summarizeOverlaps( exonicParts, bamlst, 
    mode="Union", singleEnd=FALSE,
    ignore.strand=TRUE, inter.feature=FALSE, fragments=TRUE )

  colData(SE)$condition <- c("A", "A", "B")

  DEXSeqDataSetFromSE( SE, 
    design= ~ sample + exon + condition:exon )
   

  #######################################
  ### From elementary data structures ###
  #######################################
  countData <- matrix( rpois(10000, 100), nrow=1000 )
  sampleData <- data.frame(
      condition=rep( c("untreated", "treated"), each=5 ) )
  design <- formula( ~ sample + exon + condition:exon )
  groupID <- rep(
      paste0("gene", 1:10),
      each= 100 )
  featureID <- rep(
      paste0("exon", 1:100),
      times= 10 )
  DEXSeqDataSet( countData, sampleData, design,
              featureID, groupID )


## End(Not run)


areyesq89/DEXSeq documentation built on Oct. 15, 2023, 1:40 p.m.