
Introduction Problems

Rev 3: The definition of FLE and FSE is confusing. It is a good rule to use acronyms with more than one letter of difference, and the terms "feature-list" and "feature-set" do not provide an intuitive idea of the difference. Set/threshold-based (e.g. Fisher's Exact) and rank-based/threshold-free (GSEA) would have been better depictions of the two categories of (annotation) gene-set enrichment tests (GSE, or GSET) or over-representation analysis (ORA). Another major distinction the authors may consider is between competitive and self-contained tests, based on the type of null hypothesis to be rejected, although for microarray and gene expression analysis the competitive methods (or hybrids) are far more popular than self-contained ones (which are more suitable for rare genetic variants, or specific applications to expression data). FLA is another poor acronym choice, too similar to FLE; I suggest feature intersection analysis (FIA)?

Response: We agree that the definitions were not truly useful, and appreciate the comments on how to improve them. The original and new text are below.

Original Text: Methods commonly used to summarize and infer relationships from high-throughput experiments include feature-list enrichment (FLE) and feature-set enrichment (FSE). FLE seeks to find statistically significant over-represented feature annotations in the differentially expressed feature list compared to all of the measured features (i.e. they are enriched), whereas FSE attempts to find sets of features (defined by common annotations) that appear significantly at the extremes of a ranked feature list.

New Text: Methods commonly used to summarize and infer relationships from high-throughput experiments include set-based threshold-based (SBTB) and rank-based threshold-free (RBTF) methods. SBTB methods typically seek to find statistically significant, over-represented feature annotations (sets) in the differentially expressed (thresholded) feature list compared to all of the measured features (i.e. they are enriched), whereas RBTF methods attempt to find sets of features (defined by common annotations) that appear significantly at the extremes of a ranked feature list.
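As a concrete illustration of the SBTB case, a minimal sketch of Fisher's exact test for over-representation of a single annotation in a thresholded feature list is below; the counts are invented for illustration and are not from the manuscript.

```r
# Fisher's exact test for over-representation of one annotation in the
# differentially expressed (thresholded) feature list. Counts are illustrative.
n_universe <- 20000   # all measured features
n_annot    <- 400     # features carrying the annotation
n_de       <- 1500    # differentially expressed features (past the threshold)
n_overlap  <- 60      # DE features carrying the annotation

cont_table <- matrix(c(n_overlap,                               # annotated & DE
                       n_annot - n_overlap,                     # annotated & not DE
                       n_de - n_overlap,                        # not annotated & DE
                       n_universe - n_annot - n_de + n_overlap  # neither
                       ), nrow = 2)
fisher.test(cont_table, alternative = "greater")
```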

Methods Problems

Rev 1: The use of a single approach (or two similar approaches if you include the supplemental data) is not enough. There is time-course data here; some kind of time-course analysis would have been nice, using maSigPro, STEM, or similar. Something like WGCNA for module analysis could also have been employed. Also lacking is any biological follow-up on the in-silico inferences. As mentioned in the paper, there is probably a profound tissue effect in the data, which could have been addressed by removal of tissue-specific genes.

Response: As you mention, we could not decide whether we were writing a methods or a biology paper. In principle, if this were a single analysis of the skin denervation time course, then a time-course method might be appropriate. The rewrite of the manuscript (in progress) focuses on the categoryCompare method, and therefore we did not include a time-course-specific analysis of the skin denervation data. In addition, as the goal was the comparison between tissues, and there is no equivalent dataset for muscle (that we have been able to find), such an analysis would likely have been criticized for not matching the muscle dataset.

Rev 2:

1. Page 6, line 192. Reference to Bioconductor (Gentleman et al, 2004) is misleading here. It would be appropriate to give a reference or URL to CategoryCompare here, otherwise it may seem that CategoryCompare was developed in 2004.
2. Page 7, line 237, Muscle Denervation: It should be mentioned here that it is mouse (not rat) data, otherwise further discussions are not clear.
3. Fig 5 requires more detailed explanations in the figure caption. Currently it is hard to understand.

Response:

1. Thank you for pointing out the confusion here. As noted, the current wording implies a reference to the package itself, whereas the intention was to reference the Bioconductor software project. Therefore, we have moved the Bioconductor reference directly adjacent to the mention of Bioconductor itself, and included the URL to the categoryCompare package site on the Bioconductor project.
2. Mouse is mentioned in the description of the data set acquisition.
3. Additional information about the figure has been included, including how Table 2 relates to Figure 5 and what the legend signifies.

Rev 3:

* The overlap and Jaccard coefficients used in Enrichment Map and cited works are also available in combined form (the combined overlap coefficient), which was reported by its authors to work better, so it should be implemented as well.
* The authors should provide for input from any gene-set enrichment test producing a gene-set level significance score; restricting to Fisher's Exact test (aka hypergeometric) is over-simplistic and disregards all the method development that happened in the last decade; being able to process GSEA output would be a good start.
* The way the color scheme is used is suboptimal for visualization (the colors do not help very much in distinguishing the various condition combinations in an intuitive way); the authors should make a better effort on that side. I specifically suggest trying the following methodology: pick one color for each comparison with an associated list of enriched annotation gene-sets (e.g. skin t7 up: yellow, skin t14 up: orange, skin t7 dw: olive green, skin t14 dw: water green, muscle up: magenta, muscle dw: blue) and then represent significance in more than one comparison using Cytoscape pie-chart node coloring (so a node representing an annotation gene-set significantly over-represented in skin t7 up and skin t14 up would be half yellow, half orange).

Responses:

One publication's Supporting Materials (http://www.nature.com/nature/journal/v466/n7304/extref/nature09146-s1.pdf) mentions that "We find that the arithmetic average of these two behaves better, and that is what we use here and refer to as the weighted overlap"; however, no evidence of the improvement is provided outside of this statement. Therefore, the combined coefficient has been included, but without more evidence from the authors of EM, I am not inclined to make it the default.

The combined coefficient has been implemented; see https://github.com/rmflight/categoryCompare/commit/7d00dcbcfb4c1fdae3401e45e5c4dccbe3637822 for the commit adding the function, and https://github.com/rmflight/categoryCompare/commit/7930b0925d31fa2d7691ce3d182a2ae7a89c174d and https://github.com/rmflight/categoryCompare/commit/4bf13e21ed174dc73c4d360f92fb5aa75d40d72c for where the slot is added to the classes so it can be used.
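For reference, the three similarity coefficients on two annotation gene sets can be computed as below; this is a minimal sketch of the arithmetic (equal weighting of Jaccard and overlap, as in the quoted "arithmetic average" statement), not the exact categoryCompare implementation.

```r
# Similarity coefficients between two annotation gene sets:
#   jaccard:  |A intersect B| / |A union B|
#   overlap:  |A intersect B| / min(|A|, |B|)
#   combined: arithmetic average of the two (the EM "weighted overlap")
set_similarity <- function(a, b) {
  n_int   <- length(intersect(a, b))
  jaccard <- n_int / length(union(a, b))
  overlap <- n_int / min(length(a), length(b))
  c(jaccard = jaccard, overlap = overlap,
    combined = 0.5 * (jaccard + overlap))
}

set_similarity(c("g1", "g2", "g3", "g4"), c("g3", "g4", "g5"))
```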

Discussion Problems

Rev 1: There is little tie-in of the results to the underlying biological question, either independently or compared to prior art; it is not clear how much, if any, the field has been advanced by the findings. The graph relationships between the factors as presented in Figure 5 are not addressed at all.

Rev 2:

1. The authors discussed the advantages of the CategoryCompare approach, but have not mentioned possible concerns or limitations. It would help to list the assumptions of the approach.
2. While using CategoryCompare for mice and rats, the authors preserved only the genes that have homologs in both species. How many genes (%) were discarded as a result of such selection? How much information is lost? Why was it necessary if you perform comparison at the level of pathways and biological processes? Have you tried the same comparison without discarding the genes that had no homologs?

Responses: 1. This was an oversight. We hope that an assessment of the method using hypothetical data demonstrates possible limitations.

Rev 3: Besides the well-thought-out point about angiogenesis, there is a bit too much speculation about biological results; unless the results are more convincing, this just burdens the paper, which is more about the tool and a demonstration of its use. The only point that may deserve a bit more attention is the presence/absence of an inflammatory/immune response signature.

Results Problems

Rev 1: The figures in the paper are inappropriate. Figures 1 and 2 are unnecessary; they are simplistic and only referenced in the introduction. Figure 4 is relatively uninformative and also not needed. Figure 5 is a wall of graphs which is very hard to interpret. In addition, there is no evidence of the edge weighting mentioned in the paper.

Rev 3:

* GSEA (pre-ranked using the limma moderated t statistic) should be tried out for the analysis; results may get better.
* I am disappointed there is no clue of axonogenesis or other developmental gradients other than angiogenesis and VEGF.
* The authors should compare to the use of Enrichment Map (EM), with the following two-comparison maps: muscle up + skin t7 up, muscle up + skin t14 up, muscle down + skin t7 dw, muscle down + skin t14 dw. Yes, EM was not designed for (wildly) multi-condition studies, but in these cases the comparison can still be broken down and analyzed -- the authors should strive to make a convincing point about the increased intuitiveness of their representation, and convince the reader. Note this would probably work better with GSEA NES for coloring nodes.

Response:

Submission of Petruska data to GEO

Rev 1: The biology in the paper is a microarray experiment; this should be submitted to GEO.

Response:

Test on multiple datasets

Drop mapping to orthologues

Argue that this removes the benefit of using annotations instead of genes.

Support GSEA

Use GSEA results too

Claim of Novelty

Rev 1: It is not clear if this is a biology paper that uses a bioinformatic tool or a bioinformatic paper using biology as an example. From a bioinformatics perspective, there is no clear evidence that the novel algorithm is any better than established methods for performing this kind of analysis, and not enough comparison with other methods was presented in the paper. From a biological perspective, there are few novel findings presented in this paper and no clear indication of how they impact the field.

If it were to be refactored, the authors should decide whether it is the biology or the algorithm they want to present and prioritize the paper accordingly. If the former, then a more detailed analysis of the array should be undertaken, preferably with further wet-bench validation. If the latter, then a wider range of examples of its application should be included, with emphasis on its ability to integrate heterogeneous data and the utility of visualizing the results in Cytoscape.

Response: Nowhere do we claim the algorithm is novel. We simply lay it out. Seriously, a lot of other people have done this, as we list in our references. This is simply packaging it up in a nice little application that we feel is useful.

What we really have done

Took a simple method that had been previously done (see Shen & Tseng 2010, Merico et al 2010), generalized Shen & Tseng's work (i.e. we don't really need to have the same items for the MAPE-P part), and then put the equivalent of Merico 2010 into R, which gets visualized in Cytoscape.

Advantages: all the annotation available from Bioconductor, as well as the various enrichment methods, plus the visualization capabilities of R. This means we get the statistical manipulation abilities of R for our data.

Discussion with Eric

Trim down the amount of focus on Jeff's data. Can we trim down the biology and background and just focus on it as an example data set?

Hypothetical situation, theoretical data set: start with the GO terms; have 2000 differentially expressed genes; from the 2000, choose some Gene Ontology terms (perhaps 20); how many genes need to be represented?

Test data set: 60 GO terms from BP, 20 with a large number of annotated genes, 20 with a medium number, and 20 with a low number (>= 10 genes annotated), and choose a large fraction of genes from each.

Select 1500 differentially expressed genes from the genes annotated with the GO terms chosen above, apply the two methods, and see how they differ; a sketch of this setup is below.
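A minimal sketch of the simulated data set construction described above; go2gene is a hypothetical named list mapping GO BP terms to annotated gene IDs (in practice it could come from a Bioconductor annotation package), and the size cutoffs and 80% fraction are assumptions for illustration.

```r
set.seed(42)

# go2gene: hypothetical named list, GO BP term -> vector of annotated gene IDs
sizes  <- sapply(go2gene, length)
large  <- sample(names(sizes)[sizes >= 100], 20)
medium <- sample(names(sizes)[sizes >= 30 & sizes < 100], 20)
small  <- sample(names(sizes)[sizes >= 10 & sizes < 30], 20)
chosen <- c(large, medium, small)   # the 60 GO terms

# take a large fraction (here 80%) of the genes annotated to each chosen term
annotated <- unique(unlist(lapply(go2gene[chosen], function(g) {
  sample(g, ceiling(0.8 * length(g)))
})))

# the "differentially expressed" gene list (all of them if fewer than 1500)
diff_genes <- sample(annotated, min(1500, length(annotated)))
```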

Major Things to Address after Round #2

Reviewer #1 is happy

Pending the GEO submission and a revised discussion. Yay! Note: we still need to write a kick-ass discussion.

Reviewer #3 still seems to feel that the major contribution is the visualization, and is rather confused by the simulation. This needs to be addressed in some detail.

Doesn't want so much emphasis that categoryCompare is better, but rather that the methods may be complementary. This is actually borne out by the simulations: in some cases feature combining worked better, in other cases annotation combining worked better. So it is probably not a bad idea to use both, as was advocated by Shen and Tseng.

Currently our major contribution is showing, using simulations, why you might want to combine datasets using annotations instead of features, and implementing a method that is rather generic in R.

Specific Points:

(316) "Traditional methods for comparing high-throughput data sets depend on finding shared features between significant feature lists for each data set and then performing some form of enrichment analysis on the overlapping feature sets to determine biological themes among the shared features"; (329) "Figure 1 illustrates this contrast. As a simple example, consider two feature or annotation lists from two different data sets (“Start”, middle box of Figure 1). In the “Traditional Analysis”, the intersection and set-difference of the features are determined, and biological interpretation provided by annotation enrichment of each of the resulting lists."

(321) "However, there are many different scenarios that complicate this analysis, including cases when different types of features are measured (i.e. two studies using different microarray platforms), or comparing studies that examine different types of features that may share the same annotations (transcriptomics vs. metabolomics for example)."

(339) "When no feature-annotation relationships are available, the network considers the lists themselves as nodes, with edges to annotations that were derived from that list."

Regarding Results and Methods:

These are major additions.

I understand the simulations were added to demonstrate that considering all annotations significant in at least one condition, rather than annotations significant only for the overlapping genomic features, does not inflate the number of noisy annotations. This is a legitimate concern, but as presented now it looks like the main point of the paper. This should be addressed in a subsection of the results, and part of the details should be put in the methods section; the figure should also be modified to convey this message more clearly and concisely.

That said, I am skeptical about proving the superiority of categoryCompare mainly by simulations. This is a visualization method; its performance must be measured by how much it helps a researcher productively interpret data. This cannot be measured by simulation.

Right now, I find the results partially overwhelming, confusing, and unconvincing. I would like to see two or three examples where a multi-condition, publicly available data set is presented, the biological question is concisely explained, and what can be gleaned by the "traditional analysis" as well as by "category compare" is presented, with corresponding graphics. If researchers read this, they should agree the method is useful for their research. Mind that the two approaches may well be complementary, showing different aspects of the data; this conclusion would not weaken this article at all. Conclusions from simulations could be discussed but, in the current form, the amount of work done on simulations and the details presented in the results section do not translate to a sufficiently convincing message about the method's performance.

I encourage the authors to do some more work to address these concerns as they feel appropriate.

Ability to import GSEA type results:

I think what I suggested is very easy to implement: a user should just export their enrichment results in a simple and standardized format and then upload them into categoryCompare.
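A minimal sketch of such an import is below; the tab-delimited layout and the column names (geneset_id, p_value) are assumptions for illustration, not a format defined by categoryCompare or GSEA.

```r
# Read externally computed enrichment results (e.g. exported GSEA output)
# from a tab-delimited file with one row per gene set.
read_enrichment <- function(file) {
  res <- read.delim(file, stringsAsFactors = FALSE)
  stopifnot(all(c("geneset_id", "p_value") %in% names(res)))
  res[order(res$p_value), ]
}

# gsea_results <- read_enrichment("gsea_report.txt")
```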


