SydColDat | R Documentation |
Transformed counts of faecal coliform bacteria in sea water at seven locations: Longreef, Bondi East, Port Hacking “50”, and Port Hacking “100” (controls), and Bondi Offshore, Malabar Offshore and North Head Offshore (outfalls). At each location measurements were made at four depths: 0, 20, 40, and 60 metres.
The data sets are named SydColCount and SydColDisc.
Data frames with 5432 observations on the following 6 variables.
y
Transformed measures of the number of faecal coliform bacteria in a sea-water sample of some specified volume. The original measures were obtained by a repeated dilution process.
For SydColCount the transformation used was essentially a square root transformation, with resulting values greater than 150 being set to NA. The results are putatively compatible with a Poisson model for the emission probabilities.
For SydColDisc the data were discretised using the cut() function with breaks given by c(0,1,5,25,200,Inf) and labels equal to c("lo","mlo","m","mhi","hi").
Note that in the SydColDisc data there are 180 fewer missing values (NAs) in the y column than in the SydColCount data. This is because in forming the SydColCount data (transforming the original data to a putative Poisson distribution) values that were greater than 150 were set equal to NA, and there were 180 such values.
locn
A factor with levels “LngRf” (Longreef), “BondiE” (Bondi East), “PH50” (Port Hacking 50), “PH100” (Port Hacking 100), “BondiOff” (Bondi Offshore), “MlbrOff” (Malabar Offshore) and “NthHdOff” (North Head Offshore).
depth
A factor with levels “0” (0 metres), “20” (20 metres), “40” (40 metres) and “60” (60 metres).
ma.com
A factor with levels no and yes, indicating whether the Malabar sewage outfall had been commissioned.
nh.com
A factor with levels no and yes, indicating whether the North Head sewage outfall had been commissioned.
bo.com
A factor with levels no and yes, indicating whether the Bondi Offshore sewage outfall had been commissioned.
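The discretisation described for SydColDisc under y can be illustrated with a small base-R sketch. The vector raw below is made up for illustration and is not taken from the data. The argument include.lowest = TRUE is also an assumption (it makes a value of exactly 0 land in "lo" rather than become NA); the description above does not say which cut() options were actually used.

```r
# Hypothetical values, for illustration only (not from the Sydney data).
raw <- c(0, 1, 3, 5, 17, 25, 150, 200, 1000, NA)

# Breaks and labels as quoted in the description of SydColDisc.
brks <- c(0, 1, 5, 25, 200, Inf)
labs <- c("lo", "mlo", "m", "mhi", "hi")

# Assumption: include.lowest = TRUE, so that 0 falls into "lo".
# Intervals are closed on the right, which is the cut() default.
disc <- cut(raw, breaks = brks, labels = labs, include.lowest = TRUE)

# Tabulate the classes, counting missing values separately.
table(disc, useNA = "ifany")
```

Because the intervals are closed on the right, a value equal to a break (5, say) is assigned to the lower of the two adjacent classes.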
The observations corresponding to each location-depth combination constitute a time series. The sampling interval is ostensibly 1 week; distinct time series are ostensibly synchronous. The measurements were made over a 194 week period. See Turner et al. (1998) for more detail.
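The layout described above can be checked with a little arithmetic: 7 locations by 4 depths give 28 series, and 28 series of 194 weekly observations account for the 5432 rows. The sketch below builds a placeholder data frame with the same factor levels and splits it into one series per location-depth combination. The y values are all NA and the week column is hypothetical (it is not one of the six variables). The same split() call could be applied to SydColCount or SydColDisc, assuming the rows within each combination are stored in time order.

```r
# Factor levels as listed in the variable descriptions.
locn_levels  <- c("LngRf", "BondiE", "PH50", "PH100",
                  "BondiOff", "MlbrOff", "NthHdOff")
depth_levels <- c("0", "20", "40", "60")
n_weeks      <- 194

# Placeholder data frame: 'week' is a hypothetical helper column and
# 'y' holds no real measurements.
toy <- expand.grid(week  = seq_len(n_weeks),
                   depth = depth_levels,
                   locn  = locn_levels,
                   stringsAsFactors = TRUE)
toy$y <- NA_integer_

# One element per location-depth combination, rows kept in their original order.
series <- split(toy$y, list(toy$locn, toy$depth), drop = TRUE)

length(series)          # number of series
unique(lengths(series)) # length of each series
nrow(toy)               # total number of observations
```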
Geoff Coade, of the New South Wales Environment Protection Authority (Australia)
T. Rolf Turner, Murray A. Cameron, and Peter J. Thomson. Hidden Markov chains in generalized linear models. Canadian J. Statist., vol. 26, pp. 107–125, 1998.
Rolf Turner. Direct maximization of the likelihood of a hidden Markov model. Computational Statistics and Data Analysis, vol. 52, pp. 4147–4160, 2008, doi:10.1016/j.csda.2008.01.029.
# Select out a subset of four locations:
loc4 <- c("LngRf","BondiE","BondiOff","MlbrOff")
SCC4 <- SydColCount[SydColCount$locn %in% loc4,]
SCC4$locn <- factor(SCC4$locn) # Get rid of unused levels.
rownames(SCC4) <- seq_len(nrow(SCC4)) # Renumber the rows of the subset.