manuscript/f1000_revisions/response.md

Taranu Response

Spatial/Regional Heterogeneity in Chlorophyll/Microcystin relationship

We agree that there are likely regional differences and would like to account for this; however, sample sizes for each region vary (67 to 155) and are relatively small. The resulting conditional probability analysis would have very wide confidence intervals. Thus, comparison between regions would be difficult and inferring a pattern would not be possible. We have added additional text in the discussion (second to last paragraph) that raises this issue. Additionally, we have added the Beaver et al. reference in this discussion.

Specific Edits

P3

P4

Marion Response

Additional discussion needed in methods related to the National Lakes Assessment:

The readership may not be aware of the U.S. NLA performed in 2007. The author(s) should clarify where samples were collected (nearshore or from the surface in the deeper waters). NLA chlorophyll a samples were taken from the profundal zone rather than the littoral zone. The readership may also be interested in how many chl a samples were collected from each lake. Where were the microcystin-LR samples collected? - Response: We agree that additional information was needed describing the NLA. We have added this to the first paragraph of the Data section.

Additional discussion needed on how the data were organized for data analysis:

Were these samples paired (collected at the same time from the same locale) or are these some type of aggregated value over a lake season? Describing this in the methods will really help for understanding the importance of this work. Paired results (MC-LR and Chl a from the same day) are much more impactful for demonstrating the rapid advantage of chl a compared to using results that are a seasonal average indicating that the hypereutrophic and eutrophic lakes (ones with the highest chl a) are also the ones that are most likely to have a cyanoHAB event sometime during the year. - Response: We agree and have added some additional wording to the Data section indicating that the samples are taken at the same time.

Improved discussion needed on alternative indicators for cHABs and cyanotoxins not assessed in the NLA:

Brief mention is given to phycocyanin (one study), and the additional language (about phycocyanin not always being available for measure and when measured, it is for only measuring pigment and not toxins) is equally relevant for chl a. The same in vivo handheld fluorometers and continuous monitoring solutions available for chl a are now widely available for phycocyanin, often at the same cost as a rapid measure for chl a. Phycocyanin, like chl a, does not measure toxin either, but phycocyanin in many studies has outperformed chl a, and in some studies it has not (especially when toxin concentration is low). Historical records on PC are likely not as great as chlorophyll a. Overall, several studies on this topic have been produced in the last two to four years (see Zamyadi and Dorner’s work), with one study using phycocyanin to predict non-alcoholic liver disease presuming a relationship with cyanotoxins (Zhang et al. 2015) - Response: We agree that phycocyanin is more closely linked to microcystin than is chl a. Our paragraph mentioning phycocyanin was confusing and did suggest that chl a had a stronger association. Wording of that paragraph has been changed and the Ahn et al. paper was added as a reference. We feel that further discussion of phycocyanin, while important, is beyond the scope of our paper with its focus on chlorophyll.

Consideration desired on region-specific criteria or limitations of national recommendations for chl a:

With nearly 30% of the lakes in the temperate plains being coded as poor for chlorophyll a in the 2007 NLA, what impact would these conditional probabilities have on these lakes? Should the lake managers in this region be monitoring continuously all the time? What are the mean/median chlorophyll a levels for this part of the U.S? Regional variability may be really important and did the conditional probability approach take this into consideration or can it take it into consideration? Is there a way to evaluate if there are significant regional effects in the U.S? For nutrient standards in the U.S. and macroinvertebrate assessments, EPA has had to issue region-specific guidelines/criteria, etc. for some parameters. - Response: We agree that there are likely regional differences and would like to account for this; however, sample sizes for each region vary (67 to 155) and are relatively small. The resulting conditional probability analysis would have very wide confidence intervals. Thus, comparison between regions would be difficult and inferring a pattern would not be possible. We have added additional text in the discussion (second to last paragraph) that raises this issue. Additionally, we have added the Beaver et al. reference in this discussion.
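
To illustrate the sample-size concern, the following is a minimal sketch and not the analysis from the paper (which is implemented in R in the linked repository). It uses simulated exceedance flags with an assumed exceedance probability of 0.2, and compares percentile bootstrap intervals for a region-sized sample (67 lakes) with a national-sized sample (1028 lakes). The function names, the probability, and the bootstrap settings are all hypothetical choices made for the example.

```python
import random


def exceedance_ci(flags, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the proportion of True flags."""
    rng = random.Random(seed)
    n = len(flags)
    estimates = []
    for _ in range(n_boot):
        # Resample lakes with replacement and recompute the proportion.
        resample = [flags[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def simulate_flags(n, p, seed):
    """Simulated exceedance flags (assumed data, not NLA measurements)."""
    rng = random.Random(seed)
    return [rng.random() < p for _ in range(n)]


# Assumed exceedance probability of 0.2 in both cases; only n differs.
small = simulate_flags(67, 0.2, seed=1)     # smallest regional sample size
large = simulate_flags(1028, 0.2, seed=2)   # national sample size

small_lo, small_hi = exceedance_ci(small)
large_lo, large_hi = exceedance_ci(large)
small_width = small_hi - small_lo
large_width = large_hi - large_lo

print(f"n=67:   [{small_lo:.3f}, {small_hi:.3f}]")
print(f"n=1028: [{large_lo:.3f}, {large_hi:.3f}]")
```

In a conditional probability analysis the effective sample shrinks further, because each estimate uses only the lakes at or above a given chlorophyll cutoff, so regional intervals would be wider still at high chlorophyll values.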

Greater discussion needed on limitations of NLA and need for model validation/future studies:

The paper fails to address the limitations of the NLA – as a reader, I’m not aware of the limitations. I have much respect for the NLA, but I do have questions regarding the number of samples for each lake. Furthermore, a statement or two discussing the need to validate modeled data may be worthwhile. Is there a way to see if the probabilities actually align with the accuracy and type II error rates predicted by the conditional probability approach? - Response: We added a paragraph to the discussions about validation and the single sample limitations of the NLA.
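
One way such a validation could be framed is sketched below, under stated assumptions: a chlorophyll cutoff is treated as a yes/no predictor of microcystin exceedance and scored against independent lakes. The data, the cutoff of 30, the threshold of 1.0, and the function name are hypothetical, and this is not the procedure used in the paper. The type II error rate is computed here as the false negative rate.

```python
def classification_rates(lakes, chl_cutoff, mc_threshold):
    """Score a chlorophyll cutoff as a predictor of microcystin exceedance.

    `lakes` holds (chlorophyll, microcystin) pairs, one pair per lake.
    """
    tp = fp = tn = fn = 0
    for chl, mc in lakes:
        predicted = chl >= chl_cutoff   # cutoff flags the lake as at risk
        actual = mc > mc_threshold      # lake actually exceeds the threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        # Type II error: exceedances that the cutoff failed to flag.
        "false_negative_rate": fn / (tp + fn) if (tp + fn) else None,
    }


# Hypothetical validation lakes: (chlorophyll ug/L, microcystin ug/L).
holdout_lakes = [
    (5.0, 0.05), (10.0, 1.2), (40.0, 2.0), (55.0, 0.05),
    (80.0, 3.5), (3.0, 0.1), (120.0, 0.6), (25.0, 0.05),
]

rates = classification_rates(holdout_lakes, chl_cutoff=30.0, mc_threshold=1.0)
print(rates)
```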

Specific Edits

Abstract-Specific Comments:

Results Comments:

Discussion Comments:

Wilson Response

I think the authors need to more broadly consider the existing literature and describe how their findings relate to and build from past studies. Below, I provide some related studies that the authors might want to consider.

- **Response:**  First, thanks for the fantastic list of refs!  Having it linked with this publication is a resource in and of itself.  We have looked at those carefully and have added several, including Ahn et al., Beaver et al., Yuan and Pollard, and Marion et al.  We have not added significantly to the background on this paper because our goal was to keep this research communication short and focused on the chl and microcystin relationship.

Based on the 2007 National Lakes Assessment report, roughly two-thirds of the waterbodies reported no detectable microcystin (detection limit = 0.05 ug/L) despite covering a huge range of chlorophyll concentrations. And, Fig 2 suggests that a large number of sites had barely detectable concentrations of microcystin across a wide range of chlorophyll. It is not clear from the text how the authors dealt with waterbodies with undetectable or barely detectable microcystin concentrations.

- **Response:**  We have added some text to the Data section indicating how we deal with the detection limit.  We feel it is important to keep these values in the analysis as removing them would inflate our confidence around the conditional probabilities.  We hope this is clearer in our revision.

Presenting histograms of chlorophyll and microcystin concentrations for the study lakes would be useful.

- **Response:**  We have chosen to present the distribution information in text and present for both chlorophyll and microcystin the range, mean, and median.  Figure 2 also indicates the distribution of both.  Lastly, the data are available via [code from the GitHub repository](https://github.com/USEPA/Microcystinchla/blob/master/R/get_nla.R).

I am not an expert on conditional probability analysis. Based on the authors’ text (second paragraph in Analytical Methods section), it appears that this analysis considers multiple events over time. If their dataset includes single measurements in a waterbody, I don’t understand where the temporal component comes into the analysis. Again, I could be totally misunderstanding how this analysis works and should probably read the relevant references the authors provided.

- **Response:**  We have added some additional text in the methods about the NLA as well as in the Discussion on NLA limitations.  In short, this is not a temporal analysis and is based on a single snapshot.
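
To make the snapshot point concrete, here is a minimal sketch, assuming the conditional probability is estimated as the proportion of lakes exceeding a microcystin threshold among those lakes at or above a chlorophyll cutoff. The data values, the threshold of 1.0, and the function name are hypothetical; the analysis in the paper is implemented in R and may differ in detail.

```python
def conditional_probability(lakes, chl_cutoff, mc_threshold):
    """Estimate P(microcystin > mc_threshold | chlorophyll >= chl_cutoff).

    `lakes` is a list of (chlorophyll, microcystin) pairs, one pair per
    lake, so the estimate is a proportion across lakes, not across time.
    """
    subset = [mc for chl, mc in lakes if chl >= chl_cutoff]
    if not subset:
        return None  # no lakes at or above this chlorophyll cutoff
    exceed = sum(1 for mc in subset if mc > mc_threshold)
    return exceed / len(subset)


# Hypothetical (chlorophyll ug/L, microcystin ug/L) pairs, one per lake.
example_lakes = [
    (2.0, 0.05), (5.0, 0.05), (12.0, 0.3), (20.0, 1.5),
    (35.0, 0.05), (60.0, 2.4), (90.0, 4.0), (150.0, 0.8),
]

p_all = conditional_probability(example_lakes, 0.0, 1.0)    # all lakes
p_high = conditional_probability(example_lakes, 30.0, 1.0)  # chl >= 30 only
print(p_all, p_high)
```

Each lake contributes one paired observation, and the "events" being counted are lakes rather than repeated measurements of the same lake.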

Based on increasing error in the conditional probability plots as chlorophyll increases, the reported chlorophyll thresholds should not include significant digits (i.e., ± 0.1) but instead be whole numbers.

- **Response:**  Done. NEED TO DO on table already in overleaf

I would organize the information in table 1 by either concentration (low to high) or advisory type (drinking or recreational) and concentration (low to high). It might also be useful to include the number of lakes represented in each category based on microcystin.

- **Response:** Table re-ordered based on concentration.  Number of lakes (as percentage) included in text.  Need to do directly on table in overleaf.

In table 2, I would add the specific microcystin concentration target under each advisory type to avoid having to look back at table 1 for these data.

- **Response:**  Done.  Need to transfer to overleaf.

Most waterbodies lacked microcystin and Figure 2 clearly shows that there are a huge number of waterbodies across a large chlorophyll range that apparently had microcystin concentrations at the detection limit of 0.05 ug/L. I am concerned about the microcystin data at the detection limit. They appear to be false positives. I agree with the authors who acknowledged that high chlorophyll is not always a good predictor of high microcystin. What should be done for those waterbodies with high concentrations of chlorophyll but that had no or barely detectable microcystin?

- **Response:** We added some discussion about this in the last paragraph of the Data section.  We feel that these should be left in as removing them would erroneously inflate our confidence intervals and impact the conditional probabilities.  Essentially these are lakes with very low microcystin but widely varying chlorophyll values.

I am confused about the data collected and available for the 2007 National Lakes Assessment. For example, I organized this dataset in July 2010 and found that 1158 lakes were sampled once (1152 of these lakes included data for both chlorophyll and microcystin) and 95 of the 1158 originally sampled lakes were sampled a second time in 2007. Yuan et al. 2014 (Freshwater Biology) used data for 1077 sampled lakes. The current study (as well as the National Lakes Assessment website and report) describes data for 1028 lakes. Clarity about these discrepancies is not necessarily the authors’ job, but it would be good to understand why the differences exist across these datasets. Also, for this study, how were data used for lakes sampled twice in 2007?

- **Response:** We share your confusion!  There are many "types" of samples included with the raw NLA data.  For this analysis, we only used the probability samples (i.e. no reference samples) and only used the first visit to a lake.  Additionally, lakes that had no data reported for either chl or microcystin were not included.  As noted, this results in 1028 samples.
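
The selection rules above can be sketched as follows. This is an illustration only: the field names (`sample_type`, `visit_no`, `chla`, `microcystin`) and the records are hypothetical and do not reflect the actual NLA column names, and we assume here that a lake is dropped when either measurement is missing. The code actually used is the R script in the repository.

```python
def select_analysis_samples(records):
    """Keep first-visit probability samples with both measurements present."""
    kept = []
    for rec in records:
        if rec["sample_type"] != "probability":
            continue  # drop reference (non-probability) samples
        if rec["visit_no"] != 1:
            continue  # use only the first visit to each lake
        if rec["chla"] is None or rec["microcystin"] is None:
            continue  # drop lakes missing either measurement
        kept.append(rec)
    return kept


# Hypothetical records; values and identifiers are made up for the example.
raw_records = [
    {"site_id": "A", "sample_type": "probability", "visit_no": 1,
     "chla": 12.3, "microcystin": 0.05},
    {"site_id": "A", "sample_type": "probability", "visit_no": 2,
     "chla": 15.0, "microcystin": 0.10},
    {"site_id": "B", "sample_type": "reference", "visit_no": 1,
     "chla": 3.1, "microcystin": 0.05},
    {"site_id": "C", "sample_type": "probability", "visit_no": 1,
     "chla": None, "microcystin": 0.40},
    {"site_id": "D", "sample_type": "probability", "visit_no": 1,
     "chla": 48.7, "microcystin": 2.10},
]

analysis_records = select_analysis_samples(raw_records)
kept_ids = [rec["site_id"] for rec in analysis_records]
print(kept_ids)
```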

Although all of the National Lakes Assessment data are publicly available, the authors should provide the dataset that they used for this study.

- **Response:** Code to access the data is available from [USEPA/microcystinchla](https://github.com/USEPA/microcystinchla).  We have also added a static .csv file of the data used for our analysis to this repository. This is listed in the "Data and software availability" section.

Specific Edits:

Title and Abstract:


