
Introduction Problems

Rev 3: The definition of FLE and FSE is confusing. It is a good rule to use acronyms with more than one letter of difference, and the terms "feature-list" and "feature-set" do not provide an intuitive idea of the difference. Set/threshold-based (e.g. Fisher's Exact) and rank-based/threshold-free (GSEA) would have been better depictions of the two categories of (annotation) gene-set enrichment tests (GSE, or GSET) or over-representation analysis (ORA). Another major distinction the authors may consider is between competitive and self-contained tests, based on the type of null hypothesis to be rejected, although for microarray and gene expression analysis the competitive methods (or hybrids) are far more popular than self-contained ones (which are more suitable for rare genetic variants, or specific applications to expression data). FLA is another poor acronym choice, too similar to FLE; I suggest feature intersection analysis (FIA)?

Response: We agree that the definitions were not truly useful, and appreciate the comments on how to improve them. The original and new text are below.

Original Text: Methods commonly used to summarize and infer relationships from high-throughput experiments include feature-list enrichment (FLE) and feature-set enrichment (FSE). FLE seeks to find statistically significant over-represented feature annotations in the differentially expressed feature list compared to all of the measured features (i.e. they are enriched), whereas FSE attempts to find sets of features (defined by common annotations) that appear significantly at the extremes of a ranked feature list.

New Text: Methods commonly used to summarize and infer relationships from high-throughput experiments include set-based threshold-based (SBTB) and rank-based threshold-free (RBTF) methods. SBTB methods typically seek to find statistically significant, over-represented feature annotations (sets) in the differentially expressed (thresholded) feature list compared to all of the measured features (i.e. they are enriched), whereas RBTF methods attempt to find sets of features (defined by common annotations) that appear significantly at the extremes of a ranked feature list.
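As a concrete illustration of the SBTB case, a minimal sketch of Fisher's exact test for over-representation of a single annotation in a thresholded feature list is below; the counts are invented for illustration and are not from the manuscript.

```r
# Fisher's exact test for over-representation of one annotation in the
# differentially expressed (thresholded) feature list. Counts are illustrative.
n_universe <- 20000   # all measured features
n_annot    <- 400     # features carrying the annotation
n_de       <- 1500    # differentially expressed features (past the threshold)
n_overlap  <- 60      # DE features carrying the annotation

cont_table <- matrix(c(n_overlap,                               # annotated & DE
                       n_annot - n_overlap,                     # annotated & not DE
                       n_de - n_overlap,                        # not annotated & DE
                       n_universe - n_annot - n_de + n_overlap  # neither
                       ), nrow = 2)
fisher.test(cont_table, alternative = "greater")
```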

Methods Problems

Rev 1: The use of a single approach (or two similar approaches if you include the supplemental data) is not enough. There is time-course data here; some kind of time-course analysis would have been nice, using maSigPro, STEM, or similar. Something like WGCNA for module analysis could also have been employed. Also lacking is any biological follow-up on the in-silico inferences. As mentioned in the paper, there is probably a profound tissue effect in the data, which could have been addressed by removal of tissue-specific genes.

Response: As you mention, we could not decide whether we were writing a methods or a biology paper. In principle, if this were a single analysis of the skin denervation time course, then a time-course method might be appropriate. The rewrite of the manuscript (in progress) focuses on the categoryCompare method, and therefore we did not include a time-course-specific analysis of the skin denervation data. In addition, as the goal was the comparison between tissues, and there is no equivalent dataset for muscle (that we have been able to find), such an analysis would likely have been criticized for not matching the muscle dataset.

Rev 2:

1. Page 6, line 192. Reference to Bioconductor (Gentleman et al, 2004) is misleading here. It would be appropriate to give a reference or URL to CategoryCompare here, otherwise it may seem that CategoryCompare was developed in 2004.
2. Page 7, line 237, Muscle Denervation: It should be mentioned here that it is mouse (not rat) data, otherwise further discussions are not clear.
3. Fig 5 requires more detailed explanations in the figure caption. Currently it is hard to understand.

Response:

1. Thank you for pointing out the confusion here. As noted, the current wording implies a reference to the package itself, whereas the intention was to reference the Bioconductor software project. Therefore, we have moved the Bioconductor reference directly adjacent to the mention of Bioconductor itself, and included the URL to the categoryCompare package site on the Bioconductor project.
2. Mouse is mentioned in the description of the data set acquisition.
3. Additional information about the figure has been included, including how Table 2 relates to Figure 5 and what the legend signifies.

Rev 3:

* The overlap and Jaccard coefficients used in Enrichment Map and cited works are also available in combined form (the combined overlap coefficient), which was reported by its authors to work better, so it should be implemented as well.
* The authors should provide for input from any gene-set enrichment test producing a gene-set level significance score; restricting to Fisher's Exact test (aka hypergeometric) is over-simplistic and disregards all the method development that happened in the last decade; being able to process GSEA output would be a good start.
* The way the color scheme is used is suboptimal for visualization (the colors do not help very much in distinguishing the various condition combinations in an intuitive way); the authors should make a better effort on that side. I specifically suggest trying the following methodology: pick one color for each comparison with an associated list of enriched annotation gene-sets (e.g. skin t7 up: yellow, skin t14 up: orange, skin t7 dw: olive green, skin t14 dw: water green, muscle up: magenta, muscle dw: blue) and then represent significance in more than one comparison using Cytoscape pie-chart node coloring (so a node representing an annotation gene-set significantly over-represented in skin t7 up and skin t14 up would be half yellow, half orange).

Responses:

One publication's Supporting Materials (http://www.nature.com/nature/journal/v466/n7304/extref/nature09146-s1.pdf) mentions that "We find that the arithmetic average of these two behaves better, and that is what we use here and refer to as the weighted overlap"; however, no evidence of the improvement is provided outside of this statement. Therefore, the combined coefficient has been included, but without more evidence from the authors of EM, I am not inclined to make it the default.

The combined coefficient has been implemented; see https://github.com/rmflight/categoryCompare/commit/7d00dcbcfb4c1fdae3401e45e5c4dccbe3637822 for the commit adding the function, and https://github.com/rmflight/categoryCompare/commit/7930b0925d31fa2d7691ce3d182a2ae7a89c174d and https://github.com/rmflight/categoryCompare/commit/4bf13e21ed174dc73c4d360f92fb5aa75d40d72c for where the slot is added to the classes so it can be used.
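For reference, the three similarity coefficients on two annotation gene sets can be computed as below; this is a minimal sketch of the arithmetic (equal weighting of Jaccard and overlap, as in the quoted "arithmetic average" statement), not the exact categoryCompare implementation.

```r
# Similarity coefficients between two annotation gene sets:
#   jaccard:  |A intersect B| / |A union B|
#   overlap:  |A intersect B| / min(|A|, |B|)
#   combined: arithmetic average of the two (the EM "weighted overlap")
set_similarity <- function(a, b) {
  n_int   <- length(intersect(a, b))
  jaccard <- n_int / length(union(a, b))
  overlap <- n_int / min(length(a), length(b))
  c(jaccard = jaccard, overlap = overlap,
    combined = 0.5 * (jaccard + overlap))
}

set_similarity(c("g1", "g2", "g3", "g4"), c("g3", "g4", "g5"))
```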

Discussion Problems

Rev 1: There is little tie-in of the results to the underlying biological question, either independently or compared to prior art; it is not clear how much, if any, the field has been advanced by the findings. The graph relationships between the factors as presented in Figure 5 are not addressed at all.

Rev 2:

1. The authors discussed the advantages of the CategoryCompare approach, but have not mentioned possible concerns or limitations. It would help to list the assumptions of the approach.
2. While using CategoryCompare for mice and rats, the authors preserved only the genes that have homologs in both species. How many genes (%) were discarded as a result of such selection? How much information is lost? Why was it necessary if you perform comparison at the level of pathways and biological processes? Have you tried the same comparison without discarding the genes that had no homologs?

Responses: 1. This was an oversight. We hope that an assessment of the method using hypothetical data demonstrates possible limitations.

Rev 3: Besides the well-thought-out point about angiogenesis, there is a bit too much speculation about biological results; unless the results are more convincing, this just burdens the paper, which is more about the tool and a demonstration of its use. The only point that may deserve a bit more attention is the presence/absence of an inflammatory/immune response signature.

Results Problems

Rev 1: The figures in the paper are inappropriate. Figures 1 and 2 are unnecessary; they are simplistic and only referenced in the introduction. Figure 4 is relatively uninformative and also not needed. Figure 5 is a wall of graphs which is very hard to interpret. In addition, there is no evidence of the edge weighting mentioned in the paper.

Rev 3:

* GSEA (pre-ranked using the limma moderated t statistic) should be tried out for the analysis; results may get better.
* I am disappointed there is no clue of axonogenesis or other developmental gradients other than angiogenesis and VEGF.
* The authors should compare to the use of Enrichment Map (EM), with the following two-comparison maps: muscle up + skin t7 up, muscle up + skin t14 up, muscle down + skin t7 dw, muscle down + skin t14 dw. Yes, EM was not designed for (wildly) multi-condition studies, but in these cases the comparison can still be broken down and analyzed -- the authors should strive to make a convincing point about the increased intuitiveness of their representation, and convince the reader. Note this would probably work better with GSEA NES for coloring nodes.

Response:

Submission of Petruska data to GEO

Rev 1: The biology in the paper is a microarray experiment; this should be submitted to GEO.

Response:

Test on multiple datasets

Drop mapping to orthologues

Argue that this removes the benefit of using annotations instead of genes.

Support GSEA

Use GSEA results too

Claim of Novelty

Rev 1: It is not clear if this is a biology paper that uses a bioinformatic tool or a bioinformatic paper using biology as an example. From a bioinformatics perspective, there is no clear evidence that the novel algorithm is any better than established methods for performing this kind of analysis, and not enough comparison with other methods was presented in the paper. From a biological perspective, there are few novel findings presented in this paper and no clear indication of how they impact the field.

If it were to be refactored, the authors should decide whether it is the biology or the algorithm they want to present and prioritize the paper accordingly. If the former, then a more detailed analysis of the array should be undertaken, preferably with further wet-bench validation. If the latter, then a wider range of examples of its application should be included, with emphasis on its ability to integrate heterogeneous data and the utility of visualizing the results in Cytoscape.

Response: Nowhere do we claim the algorithm is novel. We simply lay it out. Seriously, a lot of other people have done this, as we list in our references. This is simply packaging it up in a nice little application that we feel is useful.

What we really have done

Took a simple method that had been previously done (see Shen & Tseng 2010, Merico et al 2010), generalized Shen & Tseng's work (i.e. we don't really need to have the same items for the MAPE-P part), and then put the equivalent of Merico 2010 into R, which gets visualized in Cytoscape.

Advantages: all the annotation available from Bioconductor, as well as the various enrichment methods, plus the visualization capabilities of R. This means we get the statistical manipulation abilities of R for our data.

Discussion with Eric

Trim down the amount of focus on Jeff's data. Can we trim down the biology and background and just focus on it as an example data set?

Hypothetical situation, theoretical data set: start with the GO terms; have 2000 differentially expressed genes; from the 2000, choose some Gene Ontology terms (perhaps 20); how many genes need to be represented?

Test data set: 60 GO terms from BP, 20 with a large number of annotated genes, 20 with a medium number, and 20 with a low number (>= 10 genes annotated), and choose a large fraction of genes from each.

Select 1500 differentially expressed genes from the genes annotated with the GO terms chosen above, apply the two methods, and see how they differ; a sketch of this setup is below.
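A minimal sketch of the simulated data set construction described above; go2gene is a hypothetical named list mapping GO BP terms to annotated gene IDs (in practice it could come from a Bioconductor annotation package), and the size cutoffs and 80% fraction are assumptions for illustration.

```r
set.seed(42)

# go2gene: hypothetical named list, GO BP term -> vector of annotated gene IDs
sizes  <- sapply(go2gene, length)
large  <- sample(names(sizes)[sizes >= 100], 20)
medium <- sample(names(sizes)[sizes >= 30 & sizes < 100], 20)
small  <- sample(names(sizes)[sizes >= 10 & sizes < 30], 20)
chosen <- c(large, medium, small)   # the 60 GO terms

# take a large fraction (here 80%) of the genes annotated to each chosen term
annotated <- unique(unlist(lapply(go2gene[chosen], function(g) {
  sample(g, ceiling(0.8 * length(g)))
})))

# the "differentially expressed" gene list (all of them if fewer than 1500)
diff_genes <- sample(annotated, min(1500, length(annotated)))
```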

Major Things to Address after Round #2

Reviewer #1 is happy

Pending the GEO submission and a revised discussion. Yay! Note: we still need to write a kick-ass discussion.

Reviewer #3 still seems to feel that the major contribution is the visualization, and is rather confused by the simulation. This needs to be addressed in some detail.

Doesn't want so much emphasis that categoryCompare is better, but rather that the methods may be complementary. This is actually borne out by the simulations: in some cases feature combining worked better, in other cases annotation combining worked better. So it is probably not a bad idea to use both, as was advocated by Shen and Tseng.

Currently our major contribution is showing, using simulations, why you might want to combine datasets using annotations instead of features, and implementing a method that is rather generic in R.

Specific Points:

(316) "Traditional methods for comparing high-throughput data sets depend on finding shared features between significant feature lists for each data set and then performing some form of enrichment analysis on the overlapping feature sets to determine biological themes among the shared features"; (329) "Figure 1 illustrates this contrast. As a simple example, consider two feature or annotation lists from two different data sets (“Start”, middle box of Figure 1). In the “Traditional Analysis”, the intersection and set-difference of the features are determined, and biological interpretation provided by annotation enrichment of each of the resulting lists."

(321) "However, there are many different scenarios that complicate this analysis, including cases when different types of features are measured (i.e. two studies using different microarray platforms), or comparing studies that examine different types of features that may share the same annotations (transcriptomics vs. metabolomics for example)."

(339) "When no feature-annotation relationships are available, the network considers the lists themselves as nodes, with edges to annotations that were derived from that list."

Regarding Results and Methods:

These are major additions.

I understand the simulations were added to demonstrate that considering all annotations significant in at least one condition, rather than annotations significant only for the overlapping genomic features, does not inflate the number of noisy annotations. This is a legitimate concern, but as presented now it looks like the main point of the paper. This should be addressed in a subsection of the results, and part of the details should be put in the methods section; the figure should also be modified to convey this message more clearly and concisely.

That said, I am skeptical about proving the superiority of categoryCompare mainly by simulations. This is a visualization method; its performance must be measured by how much it helps a researcher productively interpret data. This cannot be measured by simulation.

Right now, I find the results partially overwhelming, confusing, and unconvincing. I would like to see two or three examples where a multi-condition, publicly available data set is presented, the biological question is concisely explained, and what can be gleaned by the "traditional analysis" as well as by "category compare" is presented, with corresponding graphics. If researchers read this, they should agree the method is useful for their research. Mind that the two approaches may well be complementary, showing different aspects of the data; this conclusion would not weaken this article at all. Conclusions from simulations could be discussed but, in the current form, the amount of work done on simulations and the details presented in the results section do not translate to a sufficiently convincing message about the method's performance.

I encourage the authors to do some more work to address these concerns as they feel appropriate.

Ability to import GSEA type results:

I think what I suggested is very easy to implement: a user should just export their enrichment results in a simple and standardized format and then upload them into categoryCompare.
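A minimal sketch of such an import is below; the tab-delimited layout and the column names (geneset_id, p_value) are assumptions for illustration, not a format defined by categoryCompare or GSEA.

```r
# Read externally computed enrichment results (e.g. exported GSEA output)
# from a tab-delimited file with one row per gene set.
read_enrichment <- function(file) {
  res <- read.delim(file, stringsAsFactors = FALSE)
  stopifnot(all(c("geneset_id", "p_value") %in% names(res)))
  res[order(res$p_value), ]
}

# gsea_results <- read_enrichment("gsea_report.txt")
```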


