meta2d: Detect rhythmic signals from time-series datasets with...


View source: R/meta2dMainF.R

Description

This is a function that incorporates ARSER, JTK_CYCLE and Lomb-Scargle to detect rhythmic signals from time-series datasets.

Usage

meta2d(infile, outdir = "metaout", filestyle, timepoints, minper = 20,
  maxper = 28, cycMethod = c("ARS", "JTK", "LS"),
  analysisStrategy = "auto", outputFile = TRUE,
  outIntegration = "both", adjustPhase = "predictedPer",
  combinePvalue = "fisher", weightedPerPha = FALSE, ARSmle = "auto",
  ARSdefaultPer = 24, outRawData = FALSE, releaseNote = TRUE,
  outSymbol = "", parallelize = FALSE, nCores = 1, inDF = NULL)

Arguments

infile

a character string. The name of input file containing time-series data.

outdir

a character string. The name of directory used to store output files.

filestyle

a character vector (length 1 or 3). The data format of the input file: either "txt", "csv", or a character vector containing the field separator character (sep), the quoting character (quote), and the character used for decimal points (dec); for details see read.table.

timepoints

a numeric vector corresponding to the sampling time points of the input time-series data; if the sampling time points are in the first line of the input file, it can be set to the character string "Line1" or "line1".

minper

a numeric value. The minimum period length of the rhythms of interest. The default is 20 for circadian rhythms.

maxper

a numeric value. The maximum period length of the rhythms of interest. The default is 28 for circadian rhythms.

cycMethod

a character vector (length 1, 2 or 3). User-defined methods for detecting rhythmic signals: any one, any two, or all three (default) of "ARS" (ARSER), "JTK" (JTK_CYCLE) and "LS" (Lomb-Scargle).

analysisStrategy

a character string. The strategy used to select suitable methods from cycMethod for analyzing the input time-series data; must be "auto" (default) or "selfUSE". See Details for more information.

outputFile

logical. If TRUE, analysis results will be written to the output files. If FALSE, analysis results will be returned as an R list.

outIntegration

a character string. Controls which analysis results are output; must be one of "both" (default), "onlyIntegration" (only output the integration file), or "noIntegration" (do not output the integration file).

adjustPhase

a character string. The method used to adjust the original phase calculated by each method in the integration file; must be one of "predictedPer" (adjust phase with the predicted period length) or "notAdjusted" (do not adjust phase).

combinePvalue

a character string. The method used to integrate multiple p-values; must be one of "bonferroni" (Bonferroni correction) or "fisher" (Fisher's method).

weightedPerPha

logical. If TRUE, weighted scores based on p-value given by each method will be used to calculate the integrated period length and phase.

ARSmle

a character string. The strategy for using the MLE method in the ar fit of "ARS"; must be one of "auto" (use MLE depending on the number of time points), "mle" (always use MLE), or "nomle" (never use MLE).

ARSdefaultPer

a numeric value. The expected period length of the rhythm of interest, which is a necessary parameter for ARS. The default is 24 (for circadian rhythms). Set it to another suitable numeric value for other rhythms.

outRawData

logical. If TRUE, raw time-series data will be added in the output files.

releaseNote

logical. If TRUE, reminding or warning notes during the analysis will be released on the screen.

outSymbol

a character string. A common prefix added to the names of output files.

parallelize

logical. If TRUE, computation will be done in parallel. This does not work on Windows machines.

nCores

an integer greater than or equal to one. The number of cores to use.

inDF

data.frame. If inDF is not NULL and timepoints is a numeric vector, meta2d will use this data.frame instead of loading data from infile.
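For the three-element form of filestyle described in the arguments above, the following base-R sketch may help show what the sep, quote and dec characters mean. It writes a small semicolon-separated file with decimal commas and reads it back with read.table. The file name, column names and toy values are made up for illustration; the same three-character vector is what would be passed as filestyle.

```r
# Toy time-series table: first column holds IDs, the rest are values.
toy <- data.frame(CycID = c("g1", "g2"),
                  T0 = c(1.5, 2.25),
                  T4 = c(3.75, NA),
                  T8 = c(0.5, 1.125))

# Write it with ';' as field separator and ',' as decimal mark.
toy_file <- file.path(tempdir(), "toy_semicolon.txt")  # hypothetical file name
write.table(toy, file = toy_file, sep = ";", dec = ",",
            quote = FALSE, row.names = FALSE)

# The three characters in the order sep, quote, dec (no quoting character here).
style <- c(";", "", ",")

# read.table needs exactly these three settings to parse the file correctly.
back <- read.table(toy_file, header = TRUE, sep = style[1],
                   quote = style[2], dec = style[3],
                   stringsAsFactors = FALSE)
print(back)
```

With a mismatched dec character the value columns would typically be read as text rather than numbers, so it is worth checking the parsed table before running an analysis.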

Details

ARSER (Yang, 2010), JTK_CYCLE (Hughes, 2010), and Lomb-Scargle (Glynn, 2006) are three popular methods for detecting rhythmic signals. ARS cannot analyze unevenly sampled datasets, or evenly sampled datasets with missing values, replicate samples, or a non-integer sampling interval. JTK is not suitable for unevenly sampled datasets or evenly sampled datasets with a non-integer sampling interval. If analysisStrategy is set to "auto" (default), meta2d automatically selects suitable methods from cycMethod for each input dataset. If the user knows that the dataset can be analyzed by every method listed in cycMethod and does not want integrated values to be output, analysisStrategy can be set to "selfUSE".
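The restrictions above can be sketched as a small check on the sampling design. The helper below is hypothetical (its name and logic are not taken from the package source); it only encodes the rules stated in this section and is not meant to reproduce what analysisStrategy = "auto" does internally.

```r
# Hypothetical helper: which of ARS/JTK/LS fit a sampling design,
# following only the restrictions listed in the Details text.
applicable_methods <- function(timepoints, values = NULL) {
  tp <- sort(unique(timepoints))
  steps <- diff(tp)
  # evenly sampled: all intervals between distinct time points are equal
  even <- length(steps) > 0 && all(abs(steps - steps[1]) < 1e-8)
  # integer sampling interval
  integer_step <- even && abs(steps[1] - round(steps[1])) < 1e-8
  # replicate samples: some time point occurs more than once
  replicated <- any(duplicated(timepoints))
  # missing values in the data, if data are supplied
  has_missing <- !is.null(values) && anyNA(values)

  c(ARS = even && integer_step && !replicated && !has_missing,
    JTK = even && integer_step,
    LS  = TRUE)
}

# Evenly sampled every 4 h, no replicates: all three methods qualify.
applicable_methods(seq(0, 44, by = 4))

# Three replicates per time point: ARS is excluded.
applicable_methods(rep(seq(0, 45, by = 3), each = 3))

# Unevenly sampled: only LS remains.
applicable_methods(c(0, 2, 5, 9, 14))
```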

The ARS used here is translated from its Python version, which always uses the "yule-walker", "burg", and "mle" methods (see ar) to fit autoregressive models to time-series data. Fitting by "mle" is very slow for datasets with many time points. If ARSmle = "auto" is used, meta2d only includes "mle" when the number of time points is smaller than 24. In addition, one evaluation (Wu, 2014) indicates that ARS shows a relatively high false positive rate when analyzing high-resolution datasets (1h/2days and 2h/2days). The JTK used here (version 3) is the latest version, which improves the p-value calculation for datasets with missing values.
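The ARSmle rule described above can be illustrated with stats::ar, which offers the same three fitting methods. This is a minimal sketch under assumptions: the helper name ars_fit_methods, the simulated series and the fixed model order of 2 are chosen for illustration only and do not reflect how ARS selects its model order.

```r
# Hypothetical helper mirroring the ARSmle options described above.
ars_fit_methods <- function(n_timepoints, ARSmle = c("auto", "mle", "nomle")) {
  ARSmle <- match.arg(ARSmle)
  base <- c("yule-walker", "burg")
  use_mle <- switch(ARSmle,
                    auto  = n_timepoints < 24,  # "mle" only for short series
                    mle   = TRUE,
                    nomle = FALSE)
  if (use_mle) c(base, "mle") else base
}

# Simulated 24-h rhythm sampled every 4 h over 20 time points (toy data).
set.seed(1)
tp <- seq(0, 76, by = 4)
x <- sin(2 * pi * tp / 24) + rnorm(length(tp), sd = 0.3)

# Fit an AR(2) model with each selected method (order fixed for illustration).
methods <- ars_fit_methods(length(tp))
fits <- lapply(methods, function(m) {
  stats::ar(x, aic = FALSE, order.max = 2, method = m)
})
names(fits) <- methods

# Compare the estimated autoregressive coefficients across methods.
sapply(fits, function(f) round(as.numeric(f$ar), 3))
```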

The power of an algorithm to detect rhythmic signals depends on the nature of the data and the periodic pattern of interest (Deckard, 2013), which suggests that integrating analysis results from multiple methods may help rhythm detection. For integrating p-values, Bonferroni correction ("bonferroni") or Fisher's method ("fisher") (Fisher, 1925; implementation code from MADAM) can be selected; "bonferroni" is usually more conservative than "fisher". The integrated period is the arithmetic mean of the periods reported by the methods. For integrating phase, meta2d uses the mean of circular quantities. The integrated period and phase are then used to calculate the baseline value and amplitude by fitting a constructed periodic model.
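The p-value and period integration described above can be sketched in base R. Fisher's method below is the textbook formula; the Bonferroni-style combination (smallest p-value times the number of methods, capped at 1) is an assumption about what "bonferroni" means here, and neither function is copied from the package. The input numbers are the ARS, JTK and LS values of YLR072W_1777968_at in the example output, and the results come out close to the meta2d_pvalue and meta2d_period printed for that row.

```r
# Fisher's method: -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom under the null hypothesis (k = number of p-values).
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Assumed Bonferroni-style combination: smallest p-value multiplied by the
# number of methods, capped at 1.
bonferroni_combine <- function(p) min(1, length(p) * min(p))

# p-values reported by the three methods for one profile
p <- c(ARS = 0.012242720, JTK = 0.001455026, LS = 0.09701453)
p_fisher <- fisher_combine(p)
p_bonf <- bonferroni_combine(p)
print(c(fisher = p_fisher, bonferroni = p_bonf))

# Integrated period as the arithmetic mean of the three reported periods
period_mean <- mean(c(ARS = 87.73626, JTK = 96, LS = 89.73913))
print(period_mean)
```

For this profile the Bonferroni-style value is larger than the Fisher value, in line with the remark that "bonferroni" is usually the more conservative choice.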

Phases given by JTK and LS need to be adjusted with their predicted period (adjustPhase = "predictedPer") before integration. If adjustPhase = "notAdjusted" is selected, no integrated phase is calculated. If weightedPerPha is set to TRUE, weighted scores are used in averaging periods and phases. The weighted scores for one method are based on all of its reported p-values, so the weighted score assigned to any one profile is affected by all other profiles. Averaging phases that come with quite different period lengths is always problematic (e.g. averaging two phases with period lengths of 16 hours and 30 hours). Currently, setting minper, maxper and ARSdefaultPer to the same value may be the only way to eliminate this problem completely.
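One way to picture the phase averaging discussed above is a circular mean in which each phase is first expressed as an angle on its own cycle. The function below is a sketch under that assumption; its name and its weights argument are hypothetical, and the actual weighting scheme based on p-values is not reproduced. With the adjusted phases and periods of YLR072W_1777968_at from the example output, it gives a value close to the meta2d_phase printed for that row.

```r
# Circular mean of phases that were each estimated with their own period.
# Each phase becomes an angle on its own cycle, the angles are averaged on
# the unit circle, and the result is scaled back with the (weighted) mean period.
circ_mean_phase <- function(phase, period, weights = rep(1, length(phase))) {
  angle <- 2 * pi * phase / period
  w <- weights / sum(weights)
  mean_angle <- atan2(sum(w * sin(angle)), sum(w * cos(angle))) %% (2 * pi)
  mean_period <- sum(w * period)
  mean_angle / (2 * pi) * mean_period
}

# Adjusted phases and periods from ARS, JTK and LS for one profile
ph1 <- circ_mean_phase(phase  = c(61.412879, 66, 66.208254),
                       period = c(87.73626, 96, 89.73913))
print(ph1)

# Why very different periods are a problem: both phases below sit exactly
# half-way through their own cycle, yet they are 7 hours apart in clock time.
ph2 <- circ_mean_phase(phase = c(8, 15), period = c(16, 30))
print(ph2)
```

In the second call the averaged phase falls half-way through the 23-hour mean cycle, which matches neither input in clock time; this is the kind of ambiguity that fixing minper, maxper and ARSdefaultPer to one value avoids.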

This function was originally designed to analyze large-scale periodic data (e.g. circadian transcriptome data) without individual information. Please pay attention to the data format of the input file (see Examples). Apart from the first column and first row, all entries are time-series experimental values (with missing values set to NA).
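The input layout described above can be sketched with a toy table: an identifier column first, a header row first, and NA for missing measurements. All names and values below are invented for illustration, and the file is written to a temporary directory.

```r
# Toy input table: one ID column, then one column per sample.
toy <- data.frame(
  CycID = c("gene_a", "gene_b", "gene_c"),
  ZT0  = c(10.2, 5.1, NA),     # NA marks a missing measurement
  ZT4  = c(12.8, 5.0, 7.7),
  ZT8  = c(15.1, 4.9, 9.3),
  ZT12 = c(12.6, 5.2, 7.9),
  ZT16 = c(10.1, 5.0, 6.2),
  ZT20 = c(7.9, 5.1, 4.8)
)

# Write it as a csv file without row names, as in the Examples section.
toy_file <- file.path(tempdir(), "toy_input.csv")  # hypothetical file name
write.csv(toy, file = toy_file, row.names = FALSE)

# Show the raw file: header row first, ID column first, NA for missing values.
writeLines(readLines(toy_file))

# One sampling time point per value column.
timepoints <- seq(0, 20, by = 4)
stopifnot(length(timepoints) == ncol(toy) - 1)
```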

Value

meta2d writes analysis results to different files under outdir if outputFile = TRUE. Files whose names contain "ARSresult", "JTKresult" and "LSresult" store analysis results from ARS, JTK and LS, respectively. The file whose name contains "meta2d" is the integration file; it stores integrated values in columns with the common name tag "meta2d". The integration file also contains the p-value, FDR value, period, phase (adjusted phase if adjustPhase = "predictedPer") and amplitude values calculated by each method. If outputFile = FALSE is selected, meta2d returns a list containing the following components:

ARS     analysis results from the ARS method
JTK     analysis results from the JTK method
LS      analysis results from the LS method
meta    the integrated analysis results as mentioned above

References

Yang R. and Su Z. (2010). Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation. Bioinformatics, 26(12), i168–i174.

Hughes M. E., Hogenesch J. B. and Kornacker K. (2010). JTK_CYCLE: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets. Journal of Biological Rhythms, 25(5), 372–380.

Glynn E. F., Chen J. and Mushegian A. R. (2006). Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics, 22(3), 310–316.

Wu G., Zhu J., Yu J., Zhou L., Huang J. Z. and Zhang Z. (2014). Evaluation of five methods for genome-wide circadian gene identification. Journal of Biological Rhythms, 29(4), 231–242.

Deckard A., Anafi R. C., Hogenesch J. B., Haase S. B. and Harer J. (2013). Design and analysis of large-scale biological rhythm studies: a comparison of algorithms for detecting periodic signals in biological data. Bioinformatics, 29(24), 3174–3180.

Fisher R. A. (1925). Statistical methods for research workers. Oliver and Boyd (Edinburgh).

Kugler K. G., Mueller L. A. and Graber A. (2010). MADAM - an open source toolbox for meta-analysis. Source Code for Biology and Medicine, 5, 3.

Examples

library(MetaCycle)

# write 'cycSimu4h2d', 'cycMouseLiverRNA' and 'cycYeastCycle' into three
# 'csv' files
write.csv(cycSimu4h2d, file="cycSimu4h2d.csv", row.names=FALSE)
write.csv(cycMouseLiverRNA, file="cycMouseLiverRNA.csv", row.names=FALSE)
write.csv(cycYeastCycle, file="cycYeastCycle.csv", row.names=FALSE)

# write 'cycMouseLiverProtein' into a 'txt' file
write.table(cycMouseLiverProtein, file="cycMouseLiverProtein.txt",
  sep="\t", quote=FALSE, row.names=FALSE)

# analyze 'cycMouseLiverRNA.csv' with JTK_CYCLE
# commented out to keep the total running time within the 10 s required by the CRAN check
# meta2d(infile="cycMouseLiverRNA.csv", filestyle="csv", outdir="example",
#  timepoints=18:65, cycMethod="JTK", outIntegration="noIntegration")

# analyze 'cycMouseLiverProtein.txt' with JTK_CYCLE and Lomb-Scargle
meta2d(infile="cycMouseLiverProtein.txt", filestyle="txt",
  outdir="example", timepoints=rep(seq(0, 45, by=3), each=3),
  cycMethod=c("JTK","LS"), outIntegration="noIntegration")

# analyze 'cycSimu4h2d.csv' with ARSER, JTK_CYCLE and Lomb-Scargle and
# output integration file with analysis results from each method
meta2d(infile="cycSimu4h2d.csv", filestyle="csv", outdir="example",
  timepoints="Line1")

# analyze 'cycYeastCycle.csv' with ARSER, JTK_CYCLE and Lomb-Scargle to
# detect transcripts associated with cell cycle, and only output
# integration file
meta2d(infile="cycYeastCycle.csv",filestyle="csv", outdir="example",
  minper=80, maxper=96, timepoints=seq(2, 162, by=16),
  outIntegration="onlyIntegration", ARSdefaultPer=85,
  outRawData=TRUE)

# return analysis results instead of writing them to files
cyc <- meta2d(infile="cycYeastCycle.csv",filestyle="csv",
  minper=80, maxper=96, timepoints=seq(2, 162, by=16),
  outputFile=FALSE, ARSdefaultPer=85, outRawData=TRUE)
head(cyc$ARS)
head(cyc$JTK)
head(cyc$LS)
head(cyc$meta)

Example output

The JTK is in process from  11:48:35 08-07-2019 
Warning: the input 'minper' is not suitable for JTK, it was reset as  21 
Warning: the input 'maxper' is not suitable for JTK, it was reset as  27 
The analysis by JTK is finished at  11:48:38 08-07-2019 
The LS is in process from  11:48:38 08-07-2019 
The analysis by LS is finished at  11:48:38 08-07-2019 
DONE! The analysis about ' cycMouseLiverProtein.txt '  has been finished.
                user.self     sys.self      elapsed   user.child    sys.child 
"Time used:"      "2.475"       "0.02"       "2.51"          "0"          "0" 


The ARS is in process from  11:48:38 08-07-2019 
The analysis by ARS is finished at  11:48:39 08-07-2019 
The JTK is in process from  11:48:39 08-07-2019 
The analysis by JTK is finished at  11:48:39 08-07-2019 
The LS is in process from  11:48:39 08-07-2019 
The analysis by LS is finished at  11:48:39 08-07-2019 
DONE! The analysis about ' cycSimu4h2d.csv '  has been finished.
                                  user.self              sys.self 
         "Time used:"               "1.006" "0.00800000000000001" 
              elapsed            user.child             sys.child 
              "1.027"                   "0"                   "0" 


The ARS is in process from  11:48:39 08-07-2019 
The analysis by ARS is finished at  11:48:39 08-07-2019 
The JTK is in process from  11:48:39 08-07-2019 
The analysis by JTK is finished at  11:48:39 08-07-2019 
The LS is in process from  11:48:39 08-07-2019 
The analysis by LS is finished at  11:48:39 08-07-2019 
DONE! The analysis about ' cycYeastCycle.csv '  has been finished.
                                  user.self              sys.self 
         "Time used:"               "0.335" "0.00399999999999999" 
              elapsed            user.child             sys.child 
              "0.339"                   "0"                   "0" 


The ARS is in process from  11:48:40 08-07-2019 
The analysis by ARS is finished at  11:48:40 08-07-2019 
The JTK is in process from  11:48:40 08-07-2019 
The analysis by JTK is finished at  11:48:40 08-07-2019 
The LS is in process from  11:48:40 08-07-2019 
The analysis by LS is finished at  11:48:40 08-07-2019 
DONE! The analysis about ' cycYeastCycle.csv '  has been finished.
                              user.self            sys.self             elapsed 
       "Time used:" "0.225000000000001"                 "0"             "0.225" 
         user.child           sys.child 
                "0"                 "0" 


               CycID filter_type   ar_method period_number           period
1 YLR072W_1777968_at           0         mle             1 87.7362637362637
2 YBR275C_1771064_at           0 yule-walker             1 89.7078651685393
3 YCL024W_1779753_at           1 yule-walker             1 86.3135135135135
4 YGR177C_1771190_at           0     default             1               85
5 YEL069C_1775358_at           1         mle             1 89.7078651685393
6 YLR125W_1769482_at           0 yule-walker             1 91.7701149425287
         amplitude            phase      mean  R_square R2_adjust   coef_var
1 94.5793636593827 61.4128785798212  996.0288 0.6673638 0.5842048 0.10218618
2 101.952188023999 7.15399100874371  567.2834 0.5396428 0.4245535 0.26405373
3 452.102904641171 1.57365902889558 1187.0017 0.5867804 0.4834755 0.40121892
4  150.64941279979 49.6491302047991 1842.6738 0.7439124 0.6798906 0.07935373
5 9.59969195724396  82.549797941594  199.5306 0.8345810 0.7932263 0.03683684
6 26.1892698123898 6.54455912941449  425.5334 0.6511591 0.5639488 0.06605052
       pvalue     fdr_BH recov_2min recov_18min recov_34min recov_50min
1 0.012242720 0.02155702   869.8609    797.2953    942.6339   1069.1362
2 0.044913802 0.04592298   879.3746    849.6005    518.1937    477.9088
3 0.029155702 0.03644463  2448.7403   1560.6675    745.9879    599.9138
4 0.004300846 0.01433615  1664.7032   1605.5524   1817.0391   2066.6323
5 0.000748758 0.00748758   211.1860    192.3690    191.4977    194.5959
6 0.014808455 0.02155702   487.0184    454.4303    405.9583    408.3755
  recov_66min recov_82min recov_98min recov_114min recov_130min recov_146min
1   1124.1407   1070.2156    903.6072     960.5258    1049.3727    1080.8489
2    501.3832    557.3082    589.2522     563.5682     465.7285     415.0808
3   1061.5011   1398.0423   1247.4566    1059.1671     967.4813     902.5316
4   1915.4511   1675.6887   1724.0113    1942.5318    1989.9812    1955.0953
5    204.7598    208.4046    203.6591     196.5309     192.7436     191.1770
6    402.5615    422.6579    455.0661     439.8246     408.4970     399.6859
  recov_162min
1    1088.6792
2     422.7184
3    1065.5292
4    1912.7260
5     207.9130
6     396.7915
               CycID        BH.Q       ADJ.P PER LAG       AMP recov_2min
1 YLR072W_1777968_at 0.009169698 0.001455026  96  64 141.67214   869.8609
2 YBR275C_1771064_at 0.009169698 0.003667879  96   8 234.34002   879.3746
3 YCL024W_1779753_at 0.013924713 0.008354828  96   0 576.06547  2448.7403
4 YGR177C_1771190_at 0.042711778 0.034169422  80  56 177.30558  1664.7032
5 YEL069C_1775358_at 0.069754311 0.062778880  96  88  11.33888   211.1860
6 YLR125W_1769482_at 0.013924713 0.008354828  96   8  34.27492   487.0184
  recov_18min recov_34min recov_50min recov_66min recov_82min recov_98min
1    797.2953    942.6339   1069.1362   1124.1407   1070.2156    903.6072
2    849.6005    518.1937    477.9088    501.3832    557.3082    589.2522
3   1560.6675    745.9879    599.9138   1061.5011   1398.0423   1247.4566
4   1605.5524   1817.0391   2066.6323   1915.4511   1675.6887   1724.0113
5    192.3690    191.4977    194.5959    204.7598    208.4046    203.6591
6    454.4303    405.9583    408.3755    402.5615    422.6579    455.0661
  recov_114min recov_130min recov_146min recov_162min
1     960.5258    1049.3727    1080.8489    1088.6792
2     563.5682     465.7285     415.0808     422.7184
3    1059.1671     967.4813     902.5316    1065.5292
4    1942.5318    1989.9812    1955.0953    1912.7260
5     196.5309     192.7436     191.1770     207.9130
6     439.8246     408.4970     399.6859     396.7915
               CycID PhaseShift PhaseShiftHeight PeakIndex  PeakSPD   Period
1 YLR072W_1777968_at   66.20825        1097.1808        16 4.082555 89.73913
2 YBR275C_1771064_at   97.10063         576.7867         8 3.056345 92.97297
3 YCL024W_1779753_at   92.04248        1273.4348        13 3.125249 90.92511
4 YGR177C_1771190_at   51.04197        1941.0634        30 3.876712 84.59016
5 YEL069C_1775358_at   82.30036         206.3205         6 4.300112 93.81818
6 YLR125W_1769482_at   99.51891         441.0674         1 4.096129 96.00000
           p  N Nindependent  Nyquist      BH.Q recov_2min recov_18min
1 0.09701453 11            6 0.034375 0.2360983   869.8609    797.2953
2 0.25114957 11            6 0.034375 0.2790551   879.3746    849.6005
3 0.23625389 11            6 0.034375 0.2790551  2448.7403   1560.6675
4 0.11804913 11            6 0.034375 0.2360983  1664.7032   1605.5524
5 0.07869067 11            6 0.034375 0.2360983   211.1860    192.3690
6 0.09576088 11            6 0.034375 0.2360983   487.0184    454.4303
  recov_34min recov_50min recov_66min recov_82min recov_98min recov_114min
1    942.6339   1069.1362   1124.1407   1070.2156    903.6072     960.5258
2    518.1937    477.9088    501.3832    557.3082    589.2522     563.5682
3    745.9879    599.9138   1061.5011   1398.0423   1247.4566    1059.1671
4   1817.0391   2066.6323   1915.4511   1675.6887   1724.0113    1942.5318
5    191.4977    194.5959    204.7598    208.4046    203.6591     196.5309
6    405.9583    408.3755    402.5615    422.6579    455.0661     439.8246
  recov_130min recov_146min recov_162min
1    1049.3727    1080.8489    1088.6792
2     465.7285     415.0808     422.7184
3     967.4813     902.5316    1065.5292
4    1989.9812    1955.0953    1912.7260
5     192.7436     191.1770     207.9130
6     408.4970     399.6859     396.7915
               CycID  ARS_pvalue   ARS_BH.Q ARS_period ARS_adjphase
1 YLR072W_1777968_at 0.012242720 0.02155702   87.73626    61.412879
2 YBR275C_1771064_at 0.044913802 0.04592298   89.70787     7.153991
3 YCL024W_1779753_at 0.029155702 0.03644463   86.31351     1.573659
4 YGR177C_1771190_at 0.004300846 0.01433615   85.00000    49.649130
5 YEL069C_1775358_at 0.000748758 0.00748758   89.70787    82.549798
6 YLR125W_1769482_at 0.014808455 0.02155702   91.77011     6.544559
  ARS_amplitude  JTK_pvalue    JTK_BH.Q JTK_period JTK_adjphase JTK_amplitude
1     94.579364 0.001455026 0.009169698         96           66     141.67214
2    101.952188 0.003667879 0.009169698         96           10     234.34002
3    452.102905 0.008354828 0.013924713         96            2     576.06547
4    150.649413 0.034169422 0.042711778         80           58     177.30558
5      9.599692 0.062778880 0.069754311         96           90      11.33888
6     26.189270 0.008354828 0.013924713         96           10      34.27492
   LS_pvalue   LS_BH.Q LS_period LS_adjphase LS_amplitude meta2d_pvalue
1 0.09701453 0.2360983  89.73913   66.208254    1097.1808  0.0001767816
2 0.25114957 0.2790551  92.97297    4.127657     576.7867  0.0025662522
3 0.23625389 0.2790551  90.92511    1.117369    1273.4348  0.0033620108
4 0.11804913 0.2360983  84.59016   51.041968    1941.0634  0.0012498505
5 0.07869067 0.2360983  93.81818   82.300358     206.3205  0.0003392889
6 0.09576088 0.2360983  96.00000    3.518911     441.0674  0.0009084778
  meta2d_BH.Q meta2d_period meta2d_phase meta2d_Base meta2d_AMP meta2d_rAMP
1 0.001696444      91.15846    64.575261    997.4851 107.517320  0.10778839
2 0.003666075      92.89361     7.071532    569.6488 118.435669  0.20790997
3 0.003876404      91.07954     1.559115   1200.1148 483.025552  0.40248278
4 0.002499701      83.19672    52.945684   1848.1603 165.485553  0.08954069
5 0.001696444      93.17535    84.947692    200.1210   9.952018  0.04973000
6 0.002271194      94.59004     6.689105    426.2096  31.079671  0.07292110
  recov_2min recov_18min recov_34min recov_50min recov_66min recov_82min
1   869.8609    797.2953    942.6339   1069.1362   1124.1407   1070.2156
2   879.3746    849.6005    518.1937    477.9088    501.3832    557.3082
3  2448.7403   1560.6675    745.9879    599.9138   1061.5011   1398.0423
4  1664.7032   1605.5524   1817.0391   2066.6323   1915.4511   1675.6887
5   211.1860    192.3690    191.4977    194.5959    204.7598    208.4046
6   487.0184    454.4303    405.9583    408.3755    402.5615    422.6579
  recov_98min recov_114min recov_130min recov_146min recov_162min
1    903.6072     960.5258    1049.3727    1080.8489    1088.6792
2    589.2522     563.5682     465.7285     415.0808     422.7184
3   1247.4566    1059.1671     967.4813     902.5316    1065.5292
4   1724.0113    1942.5318    1989.9812    1955.0953    1912.7260
5    203.6591     196.5309     192.7436     191.1770     207.9130
6    455.0661     439.8246     408.4970     399.6859     396.7915

MetaCycle documentation built on May 2, 2019, 9:14 a.m.