isoformSwitchAnalysisPart1: Isoform Switch Analysis Workflow Part 1: Extract Isoform...

View source: R/high_level_functions.R

Description

This high-level function takes a pre-existing switchAnalyzeRlist as input (see importRdata) and performs part 1 of the workflow. Specifically, the switchAnalyzeRlist is filtered to remove lowly expressed features, isoform switches are identified with the selected statistical method (unless switchTestMethod='none'), and ORFs are predicted if they are not already annotated. Lastly, the function extracts the nucleotide sequences and the ORF amino acid (AA) sequences of the isoforms involved in isoform switches. To enable both external and internal sequence analysis, these sequences are saved to disk (as fasta files) and added to the switchAnalyzeRlist.

This function is meant to be used as part 1 of the isoform switch analysis workflow, which can be followed by the second step via isoformSwitchAnalysisPart2.

Usage

isoformSwitchAnalysisPart1(
    switchAnalyzeRlist,
    alpha = 0.05,
    dIFcutoff = 0.1,
    switchTestMethod='DEXSeq',
    orfMethod = "longest",
    genomeObject = NULL,
    cds = NULL,
    pathToOutput = getwd(),
    outputSequences = TRUE,
    prepareForWebServers = FALSE,
    overwriteORF=FALSE,
    quiet=FALSE
)

Arguments

switchAnalyzeRlist

A switchAnalyzeRlist.

alpha

The cutoff which the FDR-corrected p-values must be smaller than for a switch to be called significant. Default is 0.05.

dIFcutoff

The cutoff which the (absolute) change in isoform usage must be larger than before an isoform is considered switching. This cutoff removes cases where isoforms with (very) low dIF values are deemed significant and would otherwise be included in the downstream analysis. It is analogous to a cutoff on log2 fold change in a standard differential gene expression analysis, which ensures the genes have a certain effect size. Default is 0.1 (10%).

switchTestMethod

A string indicating which statistical method should be used for testing differential isoform usage. The following options are available:

  • 'DEXSeq' : Uses the DEXSeq package to perform the statistical test. See isoformSwitchTestDEXSeq. Default.

  • 'DRIMSeq' : Uses the DRIMSeq package to perform the statistical test. See isoformSwitchTestDRIMSeq.

  • 'none' : No statistical test is performed. Should only be used if a test has already been performed and should not be overwritten (e.g. when importing cuffdiff data).

orfMethod

A string indicating which ORF identification method should be used. The methods listed here are:

  • longest : Identifies the longest ORF in the transcript. This approach is similar to what the CPAT tool uses in its analysis of coding potential.

  • longestAnnotated : Identifies the longest ORF downstream of an annotated translation start site (supplied via the cds argument)

  • mostUpstreamAnnoated : Identifies the ORF downstream of the most upstream overlapping annotated translation start site (supplied via the cds argument)

Default is longest.

genomeObject

A BSgenome object (for example Hsapiens for Homo sapiens).

pathToOutput

A path to the folder in which the output (fasta) files should be saved. Default is the working directory ( getwd() ).

cds

A CDSSet object containing annotated coding regions, see ?CDSSet and ?getCDS for more information. Only necessary if the orfMethod argument is 'longestAnnotated' or 'mostUpstreamAnnoated'.

overwriteORF

A logical indicating whether to overwrite the ORF analysis already stored in the supplied switchAnalyzeRlist. Default is FALSE.

outputSequences

A logical indicating whether transcript nucleotide and amino acid sequences should be written to pathToOutput. Default is TRUE.

prepareForWebServers

A logical indicating whether the amino acid fasta files saved (if outputSequences=TRUE) should be prepared for the online web services currently supported (as they have some limitations on what can be submitted). See details. Default is FALSE (for backward compatibility).

quiet

A logical indicating whether to avoid printing progress messages (incl. progress bar). Default is FALSE.
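
As a rough illustration of how alpha and dIFcutoff act together, the sketch below applies both cutoffs to a small hand-made table using base R only. The data frame, its values and the column names (dIF, isoform_switch_q_value) are assumptions chosen for illustration; this is not the package's internal code.

```r
# Toy test results; every value here is made up for illustration.
toy <- data.frame(
    isoform_id             = c("iso1", "iso2", "iso3", "iso4"),
    dIF                    = c(0.35, -0.22, 0.04, 0.50),
    isoform_switch_q_value = c(0.001, 0.030, 0.0001, 0.200),
    stringsAsFactors       = FALSE
)

alpha     <- 0.05  # cutoff on the FDR-corrected p-value
dIFcutoff <- 0.1   # cutoff on the absolute change in isoform usage

# An isoform counts as switching only if it passes BOTH cutoffs:
# it must be significant and have a large enough effect size.
toy$switching <- toy$isoform_switch_q_value < alpha &
    abs(toy$dIF) > dIFcutoff

switching_ids <- toy$isoform_id[toy$switching]
print(switching_ids)
```

In this toy table iso3 is highly significant but is dropped because its dIF is below the cutoff, while iso4 has a large dIF but is dropped because it is not significant.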

Details

This function performs the first part of the isoform switch analysis workflow by:

  1. Removing non-expressed isoforms and single-isoform genes (see preFilter).

  2. Predicting isoform switches, unless switchTestMethod is set to 'none'.

  3. Analyzing the isoforms for open reading frames (ORFs, see analyzeORF) if no ORFs are annotated.

  4. Extracting the isoform nucleotide and ORF amino acid sequences, saving them to fasta files and adding them to the switchAnalyzeRlist. This enables external sequence analysis with CPAT, Pfam and SignalP (see vignette for more info).

If prepareForWebServers=TRUE, both the "removeLongAAseq" and "alsoSplitFastaFile" options of the extractSequence function are enabled.
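
The four steps above can be sketched as a chain of the lower-level functions listed under See Also. This is a minimal sketch, not the function's actual source: part1Sketch is a hypothetical helper, the argument names passed to each function (for example writeToFile) and the orfAnalysis list entry are assumptions that should be checked against the respective help pages, and only the 'DEXSeq' test method is covered.

```r
# Hypothetical helper chaining the four steps described in Details.
# Function names are taken from 'See Also'; the arguments passed to them
# are assumptions and should be verified with ?preFilter, ?analyzeORF, etc.
part1Sketch <- function(switchList,
                        alpha = 0.05,
                        dIFcutoff = 0.1,
                        orfMethod = "longest",
                        genomeObject = NULL,
                        cds = NULL,
                        pathToOutput = getwd(),
                        outputSequences = TRUE,
                        prepareForWebServers = FALSE) {
    # 1. Remove non-expressed isoforms and single-isoform genes
    switchList <- IsoformSwitchAnalyzeR::preFilter(switchList, quiet = TRUE)

    # 2. Test for isoform switches (DEXSeq-based test only in this sketch)
    switchList <- IsoformSwitchAnalyzeR::isoformSwitchTestDEXSeq(
        switchList,
        alpha = alpha,
        dIFcutoff = dIFcutoff,
        quiet = TRUE
    )

    # 3. Predict ORFs only if none are stored yet
    #    (assumes ORFs live in the 'orfAnalysis' entry of the list)
    if (is.null(switchList$orfAnalysis)) {
        switchList <- IsoformSwitchAnalyzeR::analyzeORF(
            switchList,
            genomeObject = genomeObject,
            orfMethod = orfMethod,
            cds = cds,
            quiet = TRUE
        )
    }

    # 4. Extract nucleotide and amino acid sequences; optionally write fasta
    #    files, trimmed and split for the web servers when requested
    IsoformSwitchAnalyzeR::extractSequence(
        switchList,
        genomeObject = genomeObject,
        writeToFile = outputSequences,
        removeLongAAseq = prepareForWebServers,
        alsoSplitFastaFile = prepareForWebServers,
        pathToOutput = pathToOutput,
        quiet = TRUE
    )
}
```

With the package installed and the example data loaded, a call such as part1Sketch(exampleSwitchList, dIFcutoff = 0.4, outputSequences = FALSE) is intended to mirror the call shown in Examples. Running the steps separately like this is mainly useful when one of them needs non-default settings.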

Value

This function has two outputs. It returns a switchAnalyzeRlist object to which information about the isoform switch test, the ORF prediction, and the nucleotide (nt) and amino acid (aa) sequences has been added. Additionally, if outputSequences is TRUE, the nucleotide and amino acid sequences of transcripts involved in switches are saved as fasta files, enabling external sequence analysis.

Author(s)

Kristoffer Vitting-Seerup

References

Vitting-Seerup et al. The Landscape of Isoform Switches in Human Cancers. Mol. Cancer Res. (2017).

See Also

preFilter
isoformSwitchTestDEXSeq
isoformSwitchTestDRIMSeq
analyzeORF
extractSequence

Examples

data("exampleSwitchList")
exampleSwitchList

exampleSwitchList <- isoformSwitchAnalysisPart1(
    switchAnalyzeRlist=exampleSwitchList,
    dIFcutoff = 0.4,        # Set high for short runtime in example data
    outputSequences = FALSE # keeps the function from outputting the fasta files from this example
)

exampleSwitchList

IsoformSwitchAnalyzeR documentation built on Nov. 8, 2020, 5:36 p.m.