analyzeSwitchConsequences: Analyze Consequences of Isoform Switches

View source: R/analyze_switch_consequences.R

analyzeSwitchConsequencesR Documentation

Analyze Consequences of Isoform Switches

Description

This function extracts all isoforms with an absolute dIF change larger than dIFcutoff from genes with a significant isoform switch (as defined by alpha). For each gene these isoforms are then analyzed for differences in the functional annotation (defined by consequencesToAnalyze) by pairwise comparing the isoforms that are used more (switching up (dIF > 0)) with the isoforms that are used less (switching down (dIF < 0)). For each comparison a small report of the analyzed features is returned.

Usage

analyzeSwitchConsequences(
    switchAnalyzeRlist,
    consequencesToAnalyze=c(
        'intron_retention',
        'coding_potential',
        'ORF_seq_similarity',
        'NMD_status',
        'domains_identified',
        'domain_isotype',
        'IDR_identified',
        'IDR_type',
        'signal_peptide_identified'
    ),
    alpha=0.05,
    dIFcutoff=0.1,
    onlySigIsoforms=FALSE,
    ntCutoff=50,
    ntFracCutoff=NULL,
    ntJCsimCutoff=0.8,
    AaCutoff=10,
    AaFracCutoff=0.8,
    AaJCsimCutoff=0.9,
    removeNonConseqSwitches=TRUE,
    showProgress=TRUE,
    quiet=FALSE
)

Arguments

switchAnalyzeRlist

A switchAnalyzeRlist object containing the result of an isoform switch analysis (such as the one provided by isoformSwitchTestDEXSeq) as well as additional annotation data for the isoforms.

consequencesToAnalyze

A vector of strings indicating what type of functional consequences to analyze. Do note that there is bound to be some differences between isoforms (else they would be identical and not annotated as separate isoforms). See details for full list of usable strings and their meaning. Default is c('intron_retention','coding_potential','ORF_seq_similarity','NMD_status','domains_identified','signal_peptide_identified') (corresponding to analyze: intron retention, CPAT result, ORF AA sequence similarity, NMD status, protein domains annotated and signal peptides annotated by Pfam).

alpha

The cutoff which the FDR correct p-values (q-values) must be smaller than for calling significant switches. Default is 0.05.

dIFcutoff

The cutoff which the changes in (absolute) isoform usage must be larger than before an isoform is considered switching. This cutoff can remove cases where isoforms with (very) low dIF values are deemed significant and thereby included in the downstream analysis. This cutoff is analogous to having a cutoff on log2 fold change in a normal differential expression analysis of genes to ensure the genes have a certain effect size. Default is 0.1 (10%).

onlySigIsoforms

A logic indicating whether to only consider significant isoforms, meaning only analyzing genes where at least two isoforms which both have significant usage changes in opposite direction (quite strict) Naturally this only works if the isoform switch test used have isoform resolution (which the build in isoformSwitchTestDEXSeq has). If FALSE all isoforms with an absolute dIF value larger than dIFcutoff in a gene with significant switches (defined by alpha and dIFcutoff) are included in the pairwise comparison. Default is FALSE (non significant isoforms are also considered based on the logic that if one isoform changes it contribution - there must be an equivalent opposite change in usage in the other isoforms from that gene).

ntCutoff

An integer indicating the length difference (in nt) a comparison must be larger than for reporting differences when evaluating 'isoform_length', 'ORF_length', '5_utr_length', '3_utr_length', 'isoform_seq_similarity', '5_utr_seq_similarity' and '3_utr_seq_similarity'. Default is 50 (nt).

ntFracCutoff

An numeric indicating the cutoff in length difference, measured as a fraction of the length of the downregulated isoform, a comparison must be larger than for reporting differences when evaluating 'isoform_length', 'ORF_length', '5_utr_length', '3_utr_length'. For example does 0.05 mean the upregulated isoform must be 5% longer/shorter before it is reported. NULL disables the filter. Default is NULL.

ntJCsimCutoff

An numeric (between 0 and 1) indicating the cutoff on Jaccard Similarity (JCsim) (see details) between the overlap of two nucloetide (nt) sequences. If the measured JCsim is smaller than this cutoff the sequences are considered different and reported as such. This cutoff affects the result of the 'isoform_seq_similarity', '5_utr_seq_similarity' and '3_utr_seq_similarity' analysis. Default is 0.8

AaCutoff

An integer indicating the length difference (in AA) a comparison must be larger than for reporting differences when evaluating 'ORF_seq_similarity', primarily implemented to avoid differences in very short AA sequences being classified as different. Default is 10 (AA).

AaFracCutoff

An numeric indicating the cutoff of length difference of the protein domain or IDR. The difference is measured as a fraction of the longest region, a comparison must be larger than before reporting it. Only used when analyzing 'domain_length' or 'IDR_length'. For example setting AaFracCutoff = 0.8 means the short protein domain (or IDR) must be less than 80% of the length of the long region, before it is reported. NULL disables the filter. Default is 0.8.

AaJCsimCutoff

An numeric (between 0 and 1) indicating the cutoff on Jaccard Similarity (JCsim) (see details) between the overlap of two amino acid (AA) sequences. If the measured JCsim is smaller than this cutoff the sequences are considered different and reported as such. This cutoff affect the result of the 'ORF_seq_similarity' analysis. Default is 0.9

removeNonConseqSwitches

A logic indicating whether to, in the "switchConsequence" entry added to the switchAnalyzeRList, remove the comparison of isoforms where no consequences were found (if TRUE) or to keep then (if FALSE). Defaults is TRUE.

showProgress

A logic indicating whether to make a progress bar (if TRUE) or not (if FALSE). Default is TRUE.

quiet

A logic indicating whether to avoid printing progress messages (incl. progress bar). Default is FALSE

Details

Changes in isoform usage are measure as the difference in isoform fraction (dIF) values, where isoform fraction (IF) values are calculated as <isoform_exp> / <gene_exp>.

The idea is that once we know there is (at least) one isoform with a significant change in how much it is used (as defined by alpha and dIFcutoff) in a gene we take that/those isoform(s) and compare the functional annotation of this isoform to the isoform(s) with the compensatory change(s) in isoform usage (since if one isoform is use more another/others have to be used less). Here we only require that one of the isoforms in the comparison of annotation is significant (unless onlySigIsoforms=TRUE, then both must be), but all isoforms considered must have a change in isoform usage larger than dIFcutoff.

Note that sometimes we find complex switches meaning that many isoforms passes all the filters. In these cases we compare all pairwise combinations of the isoform(s) used more (positive dIF) vs the isoform(s) used less (negative dIF).

For sequences similarity analysis the two compared sequences are (globally) aligned to one another and the Jaccard similarity (JCsim) is calculated. Here JCsim is defined as the length of the aligned regions (omitting gaps) divided by the total combined unique sequence length: JCsim = (length of aligned region w.o gaps) / ( (length of sequence a) + (length of sequence b) - (length of aligned region w.o gaps) ). The pairwise alignment is done with pairwiseAlignment{Biostrings} as a Needleman-Wunsch global alignment which is guaranteed to find the optimal global alignment. The pairwise alignment is done with end gap penalties for the full sequences alignments ('isoform_seq_similarity' and 'ORF_seq_similarity') and without gap penalties for the alignment of sub-sequence ('5_utr_seq_similarity' and '3_utr_seq_similarity') by specifying type='global' and type='overlap' respectively.

If AA sequences were trimmed in the process of exporting the fasta files when using extractSequence the regions not analyzed in both isoforms will be ignored.

The arguments passed to consequencesToAnalyze must be a combination of:

  • all : Test transcripts for any of the differences described below. Please note that jointly the analysis below covers all transcript feature meaning that they should be different. Furthermore note that 'class_code' will only be included if the switchAnalyzeRlist was made from Cufflinks/Cuffdiff output.

  • tss : Test transcripts for whether they use different Transcription Start Site (TSS) (more than ntCutoff from each other).

  • tts : Test transcripts for whether they use different Transcription Termination Site (TTS) (more than ntCutoff from each other).

  • last_exon : Test whether transcripts utilizes different last exons (defined as the last exon of each transcript is non-overlapping).

  • isoform_seq_similarity : Test whether the isoform nucleotide sequences are different (as described above). Reported as different if the measured JCsim is smaller than ntJCsimCutoff and the length difference of the aligned and combined region is larger than ntCutoff.

  • isoform_length : Test transcripts for differences in isoform length. Only reported if the difference is larger than indicated by the ntCutoff and ntFracCutoff. Please note that this is a less powerful analysis than implemented in 'isoform_seq_similarity' as two equally long sequences might be very different.

  • exon_number : Test transcripts for differences in exon number.

  • intron_structure : Test transcripts for differences in intron structure, e.g. usage of exon-exon junctions. This analysis corresponds to analyzing whether all introns in one isoform is also found in the other isoforms.

  • intron_retention : Test for differences in intron retentions (and their genomic positions). Require that analyzeIntronRetention have been run.

  • isoform_class_code : Test transcripts for differences in the transcript classification provide by cufflinks. For a updated list of class codes see http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/#transfrag-class-codes.

  • coding_potential : Test transcripts for differences in coding potential, as indicated by the CPAT or CPC2 analysis. Requires that analyzeCPAT or analyzeCPC2 have been used to add external conding potential analysis to the switchAnalyzeRlist.

  • ORF_seq_similarity : Test whether the amino acid sequences of the ORFs are different (as described above). Reported as different if the measured JCsim is smaller than AaJCsimCutoff and the length difference of the aligned and combined region is larger than AaCutoff. Requires that least one of the isoforms are annotated with a ORF either via identifyORF or by supplying a GTF file and setting addAnnotatedORFs=TRUE when creating the switchAnalyzeRlist.

  • ORF_genomic : Test transcripts for differences in genomic position of the Open Reading Frames (ORF). Requires that least one of the isoforms are annotated with an ORF either via identifyORF or by supplying a GTF file and settingaddAnnotatedORFs=TRUE when creating the switchAnalyzeRlist.

  • ORF_length : Test transcripts for differences in length of Open Reading Frames (ORF). Note that this is a less powerful analysis than implemented in ORF_seq_similarity as two equally long sequences might be very different. Only reported if the difference is larger than indicated by the ntCutoff and ntFracCutoff. Requires that least one of the isoforms are annotated with a ORF either via identifyORF or by supplying a GTF file and setting addAnnotatedORFs=TRUE when creating the switchAnalyzeRlist.

  • 5_utr_seq_similarity : Test whether the isoform nucleotide sequences of the 5' UnTranslated Region (UTR) are different (as described above). The 5'UTR is defined as the region from the transcript start to the ORF start. Reported as different if the measured JCsim is smaller than ntJCsimCutoff and the length difference of the aligned and combined region is larger than ntCutoff. Requires that both the isoforms are annotated with an ORF.

  • 5_utr_length : Test transcripts for differences in the length of the 5' UnTranslated Region (UTR). The 5'UTR is defined as the region from the transcript start to the ORF start. Note that this is a less powerful analysis than implemented in '5_utr_seq_similarity' as two equally long sequences might be very different. Only reported if the difference is larger than indicated by the ntCutoff and ntFracCutoff. Requires that both the isoforms are annotated with a ORF.

  • 3_utr_seq_similarity : Test whether the isoform nucleotide sequences of the 3' UnTranslated Region (UTR) are different (as described above). The 3'UTR is defined as the region from the end of the ORF to the transcript end. Reported as different if the measured JCsim is smaller than ntJCsimCutoff and the length difference of the aligned and combined region is larger than ntCutoff. Requires that both the isoforms are annotated with a ORF.

  • 3_utr_length : Test transcripts for differences in the length of the 3' UnTranslated Regions (UTR). The 3'UTR is defined as the region from the end of the ORF to the transcript end. Note that this is a less powerful analysis than implemented in 3_utr_seq_similarity as two equally long sequences might be very different. Requires that identifyORF have been used to predict NMD sensitivity or that the ORF was imported though one of the dedicated import functions implemented in isoformSwitchAnalyzeR. Only reported if the difference is larger than indicated by the ntCutoff and ntFracCutoff. Requires that both the isoforms are annotated with a ORF.

  • NMD_status : Test transcripts for differences in sensitivity to Nonsense Mediated Decay (NMD). Requires that both the isoforms have been annotated with PTC either via identifyORF or by supplying a GTF file and setting addAnnotatedORFs=TRUE when creating the switchAnalyzeRlist.

  • domains_identified : Test transcripts for differences in the name and order of which domains are identified by the Pfam in the transcripts. Requires that analyzePFAM have been used to add external Pfam analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF.

  • domain_isotype : Test transcripts for differences in the isotype of overlapping protein domain. Requires that analyzePFAM have been used to add external Pfam analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF.

  • domain_length : Test transcripts for differences in the length of overlapping domains of the same type (same hmm_name) thereby enabling analysis of protein domain truncation. Do however note that a small difference in length is will likely not truncate the protein domain. The length difference, measured in AA, must be larger than AaCutoff and AaFracCutoff. Requires that analyzePFAM have been used to add external Pfam analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF.

  • genomic_domain_position : Test transcripts for differences in the genomic position of the domains identified by the Pfam analysis (Will be different unless the two isoforms have the same domains at the same genomic location). Requires that analyzePFAM have been used to add external Pfam analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF.

  • IDR_identified : Test for differences in isoform IDRs. Specifically the two isoforms are analyzed for whether they contain IDRs which do not overlap in genomic coordinates. Requires that analyzeNetSurfP2 or analyzeIUPred2A have been used to add external IDR analysis to the switchAnalyzeRlist.

  • IDR_length : Test for differences in the length of overlapping (in genomic coordinates) IDRs. The length difference, measured in AA, must be larger than AaCutoff and AaFracCutoff. Requires that analyzeNetSurfP2 or analyzeIUPred2A have been used to add external IDR analysis to the switchAnalyzeRlist.

  • IDR_type : Test for differences in IDR type. Specifically the two isoforms are tested for overlapping IDRs (genomic coordinates) and overlapping IDRs are compared with regards to their IDR type (IDR vs IDR w binding site). Only available if analyzeIUPred2A was used to add external IDR analysis to the switchAnalyzeRlist.

  • signal_peptide_identified : Test transcripts for differences in whether a signal peptide was identified or not by the SignalP analysis. Requires that analyzeSignalP have been used to add external SignalP analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF.

  • sub_cell_location : Test for any differences in predicted sub-cellular localization. Requires that analyzeDeepLoc2 have been used to add external location analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF.

  • sub_cell_shift_to_cell_membrane : Test for differences in membrane associations. Requires that analyzeDeepLoc2 have been used to add external location analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF.

  • sub_cell_shift_to_cytoplasm : Test for differences in cytoplasm associations. Requires that analyzeDeepLoc2 have been used to add external location analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF.

  • sub_cell_shift_to_nucleus : Test for differences in nuclear localization. Requires that analyzeDeepLoc2 have been used to add external location analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF.

  • sub_cell_shift_to_Extracellular : Test differences in extracellular localization. Requires that analyzeDeepLoc2 have been used to add external location analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF.

  • isoform_topology : Test differences in predicted topology compared to the cell membrane (intracellular, transmembrane helix, extracellular). Requires that analyzeDeepTMHMM have been used to add external topology analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF.

  • extracellular_region_count : Test differences in number of extracellular region. Requires that analyzeDeepTMHMM have been used to add external topology analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF.

  • intracellular_region_count : Test differences in number of intracellular region. Requires that analyzeDeepTMHMM have been used to add external topology analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF.

  • extracellular_region_length : Test differences in total length of extracellular regions. Requires that analyzeDeepTMHMM have been used to add external topology analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF. The length difference, measured in AA, must be larger than AaCutoff and AaFracCutoff

  • intracellular_region_length : Test differences in total length of intracellular regions. Requires that analyzeDeepTMHMM have been used to add external topology analysis to the switchAnalyzeRlist. Requires that both the isoforms are annotated with a ORF. The length difference, measured in AA, must be larger than AaCutoff and AaFracCutoff

Value

The supplied switchAnalyzeRlist is returned, but now annotated with the predicted functional consequences as follows. First a column called 'switchConsequencesGene' is added to isoformFeatures entry of switchAnalyzeRlist. This column containing a binary indication (TRUE/FALSE (and NA)) of whether the switching gene have predicted functional consequences or not.

Secondly the data.frame 'switchConsequence' is added to the switchAnalyzeRlist containing one row feature analyzed per comparison of isoforms pr comparison of condition. It contains 8 columns:

  • gene_ref : A unique reference to a specific gene in a specific comparison of conditions. Enables easy handles to integrate data from all the parts of a switchAnalyzeRlist.

  • gene_id: The id of the gene which the isoforms compared belongs to. Matches the 'gene_id' entry in the 'isoformFeatures' entry of the switchAnalyzeRlist

  • gene_name : The gene name associated with the <gene_id>, typically a more readable one (for example p53 or BRCA1)

  • condition_1: The first condition of the comparison. Should be though of as the ground state - meaning the changes occure from condition_1 to condition_2. Matches the 'condition_1' entry in the 'isoformFeatures' entry of the switchAnalyzeRlist

  • condition_2: The second condition of the comparison. Should be though of as the changed state - meaning the changes occure from condition_1 to condition_2. Matches the 'condition_2' entry in the 'isoformFeatures' entry of the switchAnalyzeRlist

  • isoformUpregulated: The name of the isoform which is used more in condition_2 (when compared to condition_1, positive dIF values). Matches the 'isoform_id' entry in the 'isoformFeatures' entry of the switchAnalyzeRlist

  • isoformDownregulated: The name of the isoform which is used less in condition_2 (when compared to condition_1, negative dIF values). Matches the 'isoform_id' entry in the 'isoformFeatures' entry of the switchAnalyzeRlist

  • iso_ref_up : A unique reference to a specific isoform in a specific comparison of conditions for the isoform switching up. Enables easy handles to integrate data from all the parts of a switchAnalyzeRlist.

  • iso_ref_down : A unique reference to a specific isoform in a specific comparison of conditions for the isoform switching down. Enables easy handles to integrate data from all the parts of a switchAnalyzeRlist.

  • featureCompared: The category of the isoform features/annotation compared in this row (see details above)

  • isoformsDifferent: A logic (TRUE/FALSE) indicating whether the two isoforms are different with respect to the featureCompared (see details above)

  • switchConsequence: If the isoforms compared are different this columns contains a short description of the features of the upregulated isoform. E.g. domain loss means that the upregulated isoforms (isoformUpregulated) have lost domains compare to the downregulated isoform (isoformDownregulated).

Author(s)

Kristoffer Vitting-Seerup

References

Vitting-Seerup et al. The Landscape of Isoform Switches in Human Cancers. Mol. Cancer Res. (2017).

See Also

analyzeORF
analyzeCPAT
analyzePFAM
analyzeSignalP
extractConsequenceSummary
extractConsequenceEnrichment
extractConsequenceEnrichmentComparison
extractConsequenceGenomeWide

Examples

### Prepare example data
data("exampleSwitchListAnalyzed")

# subset for fast runtime
exampleSwitchListAnalyzed <- subsetSwitchAnalyzeRlist(
    exampleSwitchListAnalyzed,
    exampleSwitchListAnalyzed$isoformFeatures$gene_id %in% sample(exampleSwitchListAnalyzed$isoformFeatures$gene_id, 10)
)

### Analyze consequences
consequencesOfInterest <- c(
    'intron_retention',
    'coding_potential',
    'NMD_status',
    'domains_identified'
)

exampleSwitchListAnalyzed <- analyzeSwitchConsequences(
    exampleSwitchListAnalyzed,
    consequencesToAnalyze = consequencesOfInterest,

)

### simple overview
extractSwitchSummary(exampleSwitchListAnalyzed, filterForConsequences = FALSE)
extractSwitchSummary(exampleSwitchListAnalyzed, filterForConsequences = TRUE)


### Detailed switch overview
consequenceSummary <- extractConsequenceSummary(
    exampleSwitchListAnalyzed,
    includeCombined = TRUE,
    returnResult = TRUE,        # return data.frame with summary
    plotGenes = TRUE            # plot summary
)



### Now switches are analyzed we can also extract the the largest/most significant switches with the extractTopSwitches() function
# Extract top 2 switching genes (by q-value)
extractTopSwitches(
    exampleSwitchListAnalyzed,
    filterForConsequences = TRUE,
    n = 2,
    extractGenes = TRUE,
    sortByQvals = TRUE
)

# Extract top 2 switching isoforms (by q-value)
extractTopSwitches(
    exampleSwitchListAnalyzed,
    filterForConsequences = TRUE,
    n = 2,
    extractGenes = FALSE,
    sortByQvals = TRUE
)

# Extract top 2 switching isoforms (by dIF)
extractTopSwitches(
    exampleSwitchListAnalyzed,
    filterForConsequences = TRUE,
    n = 2,
    extractGenes = FALSE,
    sortByQvals = FALSE
)

### Note the function ?extractConsequenceSummary is specific made for the post analysis of switching consequences

kvittingseerup/IsoformSwitchAnalyzeR documentation built on Jan. 14, 2024, 11:30 p.m.