iso.switch: Isoform switch analysis for time-series data

iso.switch    R Documentation

Description

This function searches for and scores transcript isoform switches in time-series expression data.

Usage

iso.switch(data.exp, mapping, times, rank = F, min.t.points = 2,
  min.difference = 1, spline = F, spline.df = NULL, verbose = T)

Arguments

data.exp

Time-series isoform expression data, with the first row giving the replicate labels and the second row giving the time points. The remaining rows contain the isoform name in the first column followed by the expression values. All replicates of a time point should be grouped together, and the time points should be in sequential order.

mapping

gene-isoform mapping table with gene names in the first column and transcript isoform names in the second column.

times

a numeric vector of time labels for all the replicated samples, e.g. 1,1,1,2,2,2,3,3,3,...

rank

logical (TRUE or FALSE). Should isoform expression be converted to ranks within each sample?

min.t.points

if every interval contains fewer than min.t.points time points, the pair of isoforms is not considered a switch candidate, since it has only transient switches.

min.difference

if the mean difference of the average isoform expression (or of the spline-fitted expression) is < min.difference, the two isoforms are regarded as tied and are not considered switch candidates.

spline

logical, whether to fit isoform expression with the spline method (TRUE) or to use the mean expression at each time point (FALSE).

spline.df

the degrees of freedom used in the spline method. See ns in the splines package for details.

verbose

logical, whether to report running progress (TRUE) or not (FALSE).
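The inputs described above can be sketched with toy data. This is only an illustration: the object names (expr, expr.rank), the isoform and gene names, and all expression values are invented, and the sample-wise ranking is one plausible reading of what rank = TRUE does rather than a copy of the package code.

```r
# Toy inputs: 2 genes, 4 isoforms, 3 time points x 3 replicates (values invented).
set.seed(1)
times <- rep(1:3, each = 3)                      # 1,1,1,2,2,2,3,3,3
expr <- matrix(rexp(4 * length(times), rate = 0.1),
               nrow = 4,
               dimnames = list(paste0("iso", 1:4),
                               paste0("t", times, ".rep", rep(1:3, times = 3))))

# Gene names in the first column, isoform names in the second column.
mapping <- data.frame(gene = c("geneA", "geneA", "geneB", "geneB"),
                      isoform = rownames(expr),
                      stringsAsFactors = FALSE)

# Assumed meaning of rank = TRUE: rank the isoforms within each sample (column).
expr.rank <- apply(expr, 2, rank)
```

Building times with rep(..., each = n) keeps the replicates of a time point grouped together, as the data.exp argument requires.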

Details

The detailed steps:

Figure: figures\_001.png

Figure 1: Isoform switch analysis methods. Expression data with 3 replicates for each condition/time point are simulated for isoforms iso_i and iso_j. The points in the plots represent the samples and the black lines connect the sample averages. (A) The iso-kTSP algorithm for comparing two conditions c_1 and c_2. (B) The Time-Series Isoform Switch (TSIS) tool, designed to detect and characterize isoform switches in time-series data. The time series with 6 time points is divided into 4 intervals by the intersection points of the average expression.

Step 1: determine switch points.

Because a pair of isoforms iso_i and iso_j may switch several times in a time series, TSIS offers two approaches to search for the switch time points:

  • The first approach takes the average of the replicates at each time point for each transcript isoform. It then searches for the cross points of the average values of the two isoforms across the time points (see Figure 1(B)).

  • The second approach fits natural spline curves to the time-series data of each transcript isoform and finds the cross points of the fitted curves for each pair of isoforms.
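The first approach can be sketched as follows. The helper mean.cross.points is hypothetical (it is not a TSIS function), the data are invented, and the cross point is located by linear interpolation between the two neighbouring time points, which is an assumption about how the intersection is computed.

```r
# Hypothetical helper (not part of TSIS): cross points of two mean profiles.
mean.cross.points <- function(x1, x2, times) {
  t.unique <- sort(unique(times))
  m1 <- as.numeric(tapply(x1, times, mean))   # mean of replicates per time point
  m2 <- as.numeric(tapply(x2, times, mean))
  d <- m1 - m2
  # A switch lies between consecutive time points where the difference changes
  # sign (exact ties, d == 0, are ignored in this sketch).
  idx <- which(d[-length(d)] * d[-1] < 0)
  # Linear interpolation of the zero of d between t[idx] and t[idx + 1].
  frac <- d[idx] / (d[idx] - d[idx + 1])
  x <- t.unique[idx] + frac * (t.unique[idx + 1] - t.unique[idx])
  y <- m1[idx] + frac * (m1[idx + 1] - m1[idx])
  data.frame(x.value = x, y.value = y)
}

times <- rep(1:4, each = 3)
iso.i <- c(10, 11, 12,  8, 9, 10,  4, 5,  6,  2,  3,  4)   # decreasing isoform
iso.j <- c( 2,  3,  4,  4, 5,  6,  8, 9, 10, 10, 11, 12)   # increasing isoform
cross <- mean.cross.points(iso.i, iso.j, times)
```

With these toy values the mean profiles are 11, 9, 5, 3 and 3, 5, 9, 11, so the single cross point lies at time 2.5 with expression 7.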

In most cases, the two methods produce very similar results. However, the average values may lose precision because they ignore information from the preceding and following time points. The spline method fits the expression time series using control points (their number depends on the spline degrees of freedom provided) and the weights of several neighbouring points to obtain the desired precision (Hastie and Tibshirani, 1990). The spline method is useful for finding global trends when the time-series data are very noisy, but it may miss local details of an isoform switch. Users are advised to search for switch points with both the average and the spline method, and to examine the results manually when the two methods disagree.
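A minimal sketch of the spline approach, assuming that each isoform is fitted with lm() on an ns() basis and that cross points are found by scanning a fine time grid for sign changes. The data are simulated, df = 4 is an arbitrary choice for illustration, and none of this is taken from the TSIS source.

```r
library(splines)  # ships with R; provides ns()

set.seed(42)
times <- rep(1:8, each = 3)
iso.i <- 10 - times + rnorm(length(times), sd = 0.5)   # decreasing trend
iso.j <-  2 + times + rnorm(length(times), sd = 0.5)   # increasing trend

# One natural cubic spline fit per isoform.
fit.i <- lm(iso.i ~ ns(times, df = 4))
fit.j <- lm(iso.j ~ ns(times, df = 4))

# Evaluate both fits on a fine grid and look for sign changes of the difference.
grid <- data.frame(times = seq(min(times), max(times), length.out = 200))
d <- predict(fit.i, grid) - predict(fit.j, grid)
cross.idx <- which(d[-length(d)] * d[-1] < 0)
spline.cross <- grid$times[cross.idx]   # approximate switch time(s)
```

The underlying trends cross at time 4, so the fitted curves are expected to cross close to that point. A smaller df gives a smoother fit, while a larger df follows local fluctuations more closely.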

Step 2: define switch scoring measures

We define each transcript isoform switch by 1) the switch point P_i, 2) the time points between switch points P_(i-1) and P_i, which form interval I_1 before switch P_i, and 3) the time points between switch points P_i and P_(i+1), which form interval I_2 after switch P_i (see Figure 1(B)). Five measures characterize each isoform switch. The first two are the probability/frequency of the switch and the sum of the average sample differences before and after the switch, which are similar to Score 1 and Score 2 in the iso-kTSP method (Sebestyen et al., 2015) (see Figure 1(A)).

  • Firstly, for a switch point P_i of two isoforms iso_i and iso_j with interval I_1 before the switch and interval I_2 after the switch (Figure 1(B)), Score 1 is defined as

    S_1(iso_i, iso_j | I_1, I_2) = |p(iso_i > iso_j | I_1) + p(iso_i < iso_j | I_2) - 1|,

    where p(iso_i > iso_j | I_1) and p(iso_i < iso_j | I_2) are the frequencies/probabilities that the samples of one isoform are greater or less than those of the other in the corresponding intervals.

  • Secondly, to avoid possible ties, we use the average abundance differences directly instead of the rank differences used in iso-kTSP. The sum of the mean sample differences in intervals I_1 and I_2 is calculated as

    S_2(iso_i, iso_j | I_1, I_2) = d(iso_i, iso_j | I_1) + d(iso_i, iso_j | I_2),

    where d(iso_i, iso_j | I_k) is the average expression difference in interval I_k, k = 1, 2, defined as

    d(iso_i, iso_j | I_k) = \frac{1}{|I_k|} \sum_{m_{I_k}} |exp(iso_i | s_{m_{I_k}}, I_k) - exp(iso_j | s_{m_{I_k}}, I_k)|

    Here |I_k| is the number of samples in interval I_k and exp(iso_i | s_{m_{I_k}}, I_k) is the expression of iso_i in sample s_{m_{I_k}} of interval I_k.

  • Thirdly, paired t-tests are carried out to test whether the two switched isoforms differ significantly within the intervals before and after the switch.

  • Fourthly, the numbers of time points in intervals I_1 and I_2 are provided, which indicate whether the switch is transient or a long-lived change.

  • Finally, isoforms with strong negative correlations across the time points may indicate that splicing regulation has antagonistic effects on the switched isoforms, which makes them of great interest for investigating alternative splicing regulation. As an additional score, we calculate the Pearson correlation of the two isoforms across the whole time series.
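The measures listed above can be sketched for a single switch as follows. The helper switch.scores and its arguments are hypothetical (not the TSIS implementation), the data are invented, and computing the correlation over all samples rather than over time-point means is an assumption.

```r
# Hypothetical helper (not part of TSIS): score one switch of iso_i and iso_j.
# in.I1 / in.I2 are logical vectors marking the samples in intervals I_1 and I_2.
switch.scores <- function(x1, x2, in.I1, in.I2) {
  # Score 1: |p(iso_i > iso_j | I_1) + p(iso_i < iso_j | I_2) - 1|
  p1 <- mean(x1[in.I1] > x2[in.I1])
  p2 <- mean(x1[in.I2] < x2[in.I2])
  # Score 2: sum of the mean absolute sample differences in I_1 and I_2
  d1 <- mean(abs(x1[in.I1] - x2[in.I1]))
  d2 <- mean(abs(x1[in.I2] - x2[in.I2]))
  list(prob = abs(p1 + p2 - 1),
       diff = d1 + d2,
       # Paired t-tests within each interval
       left.pval  = t.test(x1[in.I1], x2[in.I1], paired = TRUE)$p.value,
       right.pval = t.test(x1[in.I2], x2[in.I2], paired = TRUE)$p.value,
       # Number of time points in each interval (supplied via attr "times")
       cor = cor(x1, x2, method = "pearson"))
}

times <- rep(1:4, each = 3)
iso.i <- c(10, 11, 12,  8, 9, 10,  4, 5,  6,  2,  3,  4)
iso.j <- c( 2,  3,  4,  4, 5,  6,  8, 9, 10, 10, 11, 12)
# Assume a switch between time points 2 and 3.
s <- switch.scores(iso.i, iso.j, in.I1 = times <= 2, in.I2 = times >= 3)
n.t.points <- c(left = length(unique(times[times <= 2])),
                right = length(unique(times[times >= 3])))
```

In this toy case iso_i exceeds iso_j in every sample of I_1 and is below it in every sample of I_2, so Score 1 is 1. The mean absolute differences are 6 in each interval, so Score 2 is 12.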

For TSIS analysis details and examples, please see the user manual on GitHub: https://github.com/wyguo/TSIS.

Value

a data frame of scores with the following columns:

  • iso1,iso2: the isoform pairs.

  • iso1.mean.ratio, iso2.mean.ratio: the mean ratios of isoforms to their gene.

  • left.interval, right.interval: the intervals before and after the switch points.

  • x.value, y.value: The values of x axis (time) and y axis (expression) coordinates of the switch points.

  • left.prob, right.prob: the frequencies/probabilities that the samples of one isoform are greater or less than those of the other in the left and right intervals, respectively.

  • left.dist, right.diff: the average sample differences in intervals before and after switch, respectively.

  • left.pval, right.pval: the paired t-test p-values of the samples in the intervals before and after switch points.

  • left.t.points, right.t.points: the number of time points in intervals before and after the switch points.

  • prob: Score 1, the probability/frequency of the switch.

  • diff: Score 2, the sum of the average sample differences before and after the switch.

  • cor: Pearson correlation of two isoforms.
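As an illustration of how the returned columns might be used, the sketch below filters a toy score table. The table is invented (it is not output from iso.switch), only a subset of the documented columns is included, and the cut-offs are arbitrary assumptions, not TSIS defaults.

```r
# Toy result table using documented column names; all values are invented.
scores <- data.frame(
  iso1 = c("G1.1", "G2.1", "G3.1"),
  iso2 = c("G1.2", "G2.2", "G3.2"),
  prob = c(1.00, 0.55, 0.90),
  diff = c(12.0, 0.8, 5.0),
  left.pval = c(0.001, 0.20, 0.01),
  right.pval = c(0.002, 0.30, 0.04),
  left.t.points = c(3, 1, 2),
  right.t.points = c(4, 2, 1),
  cor = c(-0.8, 0.1, -0.6),
  stringsAsFactors = FALSE
)

# Illustrative cut-offs (assumptions): frequent, large, significant,
# long-lived switches between negatively correlated isoforms.
keep <- with(scores,
             prob > 0.5 & diff > 1 &
             left.pval < 0.05 & right.pval < 0.05 &
             left.t.points >= 2 & right.t.points >= 2 &
             cor < 0)
filtered <- scores[keep, ]
```

Only the first pair passes every filter here: the second fails the diff cut-off and the third has a single time point in its right interval.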

References

1. Sebestyen E, Zawisza M, Eyras E: Detection of recurrent alternative splicing switches in tumor samples reveals novel signatures of cancer. Nucleic Acids Res 2015, 43(3):1345-1356.

2. Hastie, T.J. and Tibshirani, R.J. Generalized additive models. Chapter 7 of Statistical Models in S eds. Wadsworth & Brooks/Cole 1990.


wyguo/TSIS documentation built on May 21, 2023, 12:36 a.m.