Description
A collection of tools for dealing with Point Intercept Method (PIM) data.
Arguments
| Argument | Description |
| --- | --- |
| file | A string specifying the name of a file to read and process. |
| samp_year | An integer. The year in which the survey was carried out, e.g. 2013. |
| samp_season | A character string. The season in which the survey was carried out, e.g. "Summer". |
| raw_pim | A dataframe read directly from a … |
| subunit | A dataframe of class parsed.pim (i.e. output from pim_parse). |
Details
pim_read takes data forms filled out in the field and returns the information they contain in a sensible long-format dataframe.
Importantly, the file must have 6 rows of metadata. If it does not, this function will produce unpredictable results (most likely an error).
Several internal checks must be passed for the function to run. If a check fails, an error identifying the file and the problem is printed to the screen. Currently these checks cannot be circumvented. The specific tests that must be passed are (in order):
1. The first two columns of data must be strings and must not be blank. They are assumed to be the FieldName and the ScientificName.
2. If there is something in the first two columns (i.e. a species), then it must have data associated with it somewhere in the file.
3. If there are data in a row, the row must have its first two columns filled out (i.e. it must have a FieldName or a ScientificName associated with it).
4. There can be no blank columns in the middle of the dataset.
5. The first two columns must be read by R as strings (i.e. they cannot be all numeric codes). If R reads them as numeric values, this is treated as an error.
6. Cell A2 must look something like the phrase 'Sampling Unit' and cell B2 cannot be blank.
7. Cell A3 must look something like the word 'Date' and cell B3 cannot be blank.
8. Cell A4 must look something like the word 'Assessor' and cell B4 cannot be blank.
9. Cell A5 must look something like the phrase 'Transect No.' and cell B5 must be coercible to an integer.
10. Steps must be labelled sequentially from 1 to n, and n must match the number of data columns collected (i.e. if there are n steps, there must be 2n data columns).
11. There must be an equal number of strata and condition scores, and their column headers must be 's' and 'c' (in that order).
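The step-numbering and 's'/'c' header rules in the last two checks can be sketched in base R as follows. This is a hypothetical helper (check_step_headers is not part of the package), written under the assumption that the step labels and the data-column headers have already been pulled out of the file; it is not pim_read's actual implementation.

```r
# Hypothetical helper, not part of the package: a minimal sketch of the
# "steps 1..n" and "2n alternating s/c columns" checks described above.
check_step_headers <- function(step_labels, headers) {
  # step_labels: the non-blank step numbers from the Step row
  # headers: the data-column headers after FieldName and ScientificName
  n <- length(step_labels)
  if (!identical(as.integer(step_labels), seq_len(n))) {
    stop("Steps must be labelled sequentially from 1 to n")
  }
  if (length(headers) != 2 * n) {
    stop("There must be 2n data columns for n steps")
  }
  if (!all(headers == rep(c("s", "c"), times = n))) {
    stop("Data-column headers must be 's' then 'c' for every step")
  }
  invisible(TRUE)
}

check_step_headers(c(1, 2), c("s", "c", "s", "c"))  # passes silently
```

Stopping on the first failure mirrors the "in order" wording of the list: later checks are only meaningful once the earlier ones hold.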
Essentially, the file head should look something like this:

| | A | B | C | D | E | F |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | PIMs stratum ... | | | | | |
| 2 | Sampling unit | BNS01 | | | | |
| 3 | Date | 19/10/2012 | | | | |
| 4 | Assessor | DP | | | | |
| 5 | Transect No. | 1 | | | | |
| 6 | Step | 1 | | | | |
| 7 | Field name | Scientific name | s | c | s | c |
| 8 | Lept_gran | Leptospermum | 1 | 4 | 1 | 3 |
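To see how the metadata cells in the layout above map onto R objects, the sketch below writes those illustrative rows to a temporary CSV and reads rows 2 to 5 back with base read.csv. The label patterns used for the cell A2 to A5 checks are assumptions; pim_read's exact matching rules are not documented here.

```r
# Write the illustrative file head from the table above to a temporary CSV.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "PIMs stratum ...,,,,,",
  "Sampling unit,BNS01,,,,",
  "Date,19/10/2012,,,,",
  "Assessor,DP,,,,",
  "Transect No.,1,,,,",
  "Step,1,,,,",
  "Field name,Scientific name,s,c,s,c",
  "Lept_gran,Leptospermum,1,4,1,3"
), path)

# Rows 2-5 hold the metadata: labels in column A, values in column B.
meta <- read.csv(path, header = FALSE, skip = 1, nrows = 4,
                 colClasses = "character")
labels <- meta[[1]]
values <- setNames(meta[[2]], labels)

# Loose, case-insensitive label checks (assumed patterns, for illustration).
patterns <- c("sampling unit", "date", "assessor", "transect")
stopifnot(all(mapply(grepl, patterns, labels,
                     MoreArgs = list(ignore.case = TRUE))))
# B2-B4 must be non-blank; B5 must coerce to an integer.
stopifnot(all(nzchar(values[1:3])), !is.na(as.integer(values[4])))
values
```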
For pim_parse, raw_pim must at a minimum contain columns named Year, Season, Start.Date..YYYY.MM.DD., Scientific.Name, Sampling.Unit.ID, Step, Stratum, Condition.Score..1.5. and Plot.Transect.Number.
This is consistent with raw data extracted from the CMLR database on 5 July 2013 (2013-07-05).
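Before calling pim_parse, it can help to list which required columns are missing. The helper below (missing_pim_cols) is hypothetical and only compares names. As an aside, the dotted names look like what read.csv's default check.names = TRUE produces from headers containing spaces and brackets; the original header text shown in the make.names call is an assumption.

```r
# Required column names for raw_pim, as listed above.
required_pim_cols <- c("Year", "Season", "Start.Date..YYYY.MM.DD.",
                       "Scientific.Name", "Sampling.Unit.ID", "Step",
                       "Stratum", "Condition.Score..1.5.",
                       "Plot.Transect.Number")

# Hypothetical helper: return the required columns absent from raw_pim.
missing_pim_cols <- function(raw_pim) {
  setdiff(required_pim_cols, names(raw_pim))
}

# A toy dataframe holding only two of the required columns.
toy <- data.frame(Year = 2013, Season = "Summer")
missing_pim_cols(toy)

# Assumed original header text; make.names turns invalid characters into dots.
make.names("Start Date (YYYY-MM-DD)")
```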
pim_sum takes the output from pim_parse (an error will be thrown otherwise). The function is intended to process one site at a time; if more than one site is detected, the function will continue, but a snippy warning is issued. The function returns a dataframe summarising the PIM data by sampling unit (species frequencies and median condition). See the examples below for how to do this.
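The behaviour described for pim_sum (class check, warning on multiple sites, per-species summary) can be approximated in base R. sketch_pim_sum is a hypothetical stand-in, not the package's code, and it assumes that "frequency" means the number of records for each species; the package may define it differently (e.g. as a proportion of steps).

```r
# Hypothetical approximation of pim_sum's documented behaviour.
sketch_pim_sum <- function(x) {
  if (!inherits(x, "parsed.pim")) {
    stop("x must be a parsed.pim object (output from pim_parse)")
  }
  if (length(unique(x$SampUnit)) > 1) {
    warning("More than one sampling unit detected; summarising them together")
  }
  # Assumption: Frequency = number of records (hits) for each species.
  freq <- tapply(x$Condition, x$ScientificName, length)
  med  <- tapply(x$Condition, x$ScientificName, median, na.rm = TRUE)
  data.frame(ScientificName = names(freq),
             Frequency = as.vector(freq),
             MedianCondition = as.vector(med),
             stringsAsFactors = FALSE)
}

# Toy input tagged with the parsed.pim class.
toy <- data.frame(SampUnit = "BNS01", ScientificName = c("A", "A", "B"),
                  Condition = c(4, 2, 3))
class(toy) <- c("parsed.pim", class(toy))
sketch_pim_sum(toy)
```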
Value
pim_read returns a named list with two items:
| Item | Description |
| --- | --- |
| metadata | A dataframe with columns 'SamplingUnit', 'Date', 'Assessor' and 'TransectNo'. Contains a single row. |
| data | A long-format dataframe with columns 'Date', 'Assessor', 'SamplingUnit', 'Transect', 'Step', 'FieldName', 'ScientificName', 'Strata' and 'Condition'. Contains a row for each data point (i.e. each species at each step). |
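The shape of this return value can be illustrated with a hand-built mock. pim_read is not called here: mock_pim uses the documented item and column names with the illustrative values from the file-head table, and shows an alternative to growing dataframes with rbind inside a loop when combining several files.

```r
# Hand-built mock of pim_read()'s documented return value (not real output).
mock_pim <- list(
  metadata = data.frame(SamplingUnit = "BNS01", Date = "19/10/2012",
                        Assessor = "DP", TransectNo = 1),
  data = data.frame(Date = "19/10/2012", Assessor = "DP",
                    SamplingUnit = "BNS01", Transect = 1, Step = c(1, 2),
                    FieldName = "Lept_gran", ScientificName = "Leptospermum",
                    Strata = c(1, 1), Condition = c(4, 3))
)

# Stand-in for lapply(files, pim_read): a list of per-file results.
results <- list(mock_pim, mock_pim)

# Pull out each item and stack them with a single rbind call.
alldata <- do.call(rbind, lapply(results, `[[`, "data"))
allmeta <- do.call(rbind, lapply(results, `[[`, "metadata"))
```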
Items returned by pim_read are either strings or numerics. If factors are required, the coercion must be done after the data have been read in.
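One way to do that coercion is to convert every character column to a factor while leaving numeric columns alone. to_factors is a hypothetical helper shown on a toy dataframe.

```r
# Hypothetical helper: convert character columns to factors, keep the rest.
to_factors <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], factor)
  df
}

d <- data.frame(SamplingUnit = c("BNS01", "BNS02"), Step = 1:2,
                stringsAsFactors = FALSE)
str(to_factors(d))
```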
pim_parse returns a dataframe of class parsed.pim with columns 'Date', 'SampUnit', 'Transect', 'Step', 'Strata', 'ScientificName' and 'Condition'.
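A parsed.pim object is still a dataframe with an extra S3 class tag. The sketch below assumes the tag is added in the usual way (prepending to the class vector); it shows that inherits() can detect it and that row subsetting with subset(), as used in the examples, keeps the tag, which is what lets subsets be passed on to pim_sum.

```r
# Assumed S3 tagging: prepend "parsed.pim" to the dataframe's class vector.
x <- data.frame(SampUnit = c("BNS01", "BNS02"), Transect = c(1, 1),
                ScientificName = "Leptospermum", Condition = c(4, 3))
class(x) <- c("parsed.pim", class(x))

inherits(x, "parsed.pim")      # TRUE: the kind of check a consumer can make
is.data.frame(x)               # TRUE: still a dataframe underneath
sub <- subset(x, SampUnit == "BNS01")
inherits(sub, "parsed.pim")    # TRUE: row subsetting keeps the class
```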
pim_sum returns a dataframe (of no specific class) with columns 'Season', 'Year', 'SamplingUnit', 'Transect', 'Date', 'ScientificName', 'Frequency' and 'MedianCondition'.
todo
- Make date dynamic.
- Make methods code dynamic.
- Enforce the 6-row rule.
- Check date formatting and return an R Date.
- Allow free-format metadata in key:value pairs (odd columns = key; even columns = value).
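The last two todo items could look something like the sketch below. parse_kv_row is hypothetical: it assumes a metadata row where odd cells are keys and even cells are values, drops empty keys, and then converts the day/month/year date used in the example file to an R Date.

```r
# Hypothetical sketch of the proposed key:value metadata format.
parse_kv_row <- function(cells) {
  cells <- as.character(cells)
  if (length(cells) %% 2 != 0) cells <- c(cells, "")  # pad a lone key
  keys <- cells[c(TRUE, FALSE)]   # odd positions
  vals <- cells[c(FALSE, TRUE)]   # even positions
  keep <- !is.na(keys) & nzchar(keys)
  as.list(setNames(vals[keep], keys[keep]))
}

meta <- parse_kv_row(c("Sampling unit", "BNS01", "Date", "19/10/2012",
                       "Assessor", "DP", "", ""))
# The example file uses day/month/year; convert to an R Date.
meta$Date <- as.Date(meta$Date, format = "%d/%m/%Y")
str(meta)
```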
Author(s)
Gretchen Brownstein and Daniel Pritchard
Examples
## Not run:
# First: ensure your current working directory contains the .csv files to process.
all_csv <- sort(Sys.glob('*.csv'))
allmeta <- NULL
alldata <- NULL
for (a in seq_along(all_csv)) {
  message('File number ', a)  # progress report
  pimout <- pim_read(all_csv[a])
  alldata <- rbind(alldata, pimout$data)
  allmeta <- rbind(allmeta, pimout$metadata)
}
# Read Data
# Assumes data is extracted from the CMLR database. Developed with data extracted on
# 2013-07-05
pim_data<-read.csv(file=file.choose())
parsed_data<-pim_parse(pim_data)
# Run pim_sum for each season-year-sampling unit combination (set the working
# directory to where you want the output to end up). This sums all transects
# together for a plot.
summary_pim <- NULL
for (a in unique(parsed_data$seasonyearsu)) {
  message('season-year-SamplingUnit ', a)
  subunit <- subset(parsed_data, seasonyearsu == a)
  pimsum <- pim_sum(subunit)
  summary_pim <- rbind(summary_pim, pimsum)
}
# To sum each transect individually, do this:
parsed_data$site_trans <- paste(parsed_data$seasonyearsu, parsed_data$Transect) # makes a new ID
summary_pim <- NULL
for (a in unique(parsed_data$site_trans)) {
  message('season-year-SamplingUnit ', a)
  subunit <- subset(parsed_data, site_trans == a)
  pimsum <- pim_sum(subunit)
  summary_pim <- rbind(summary_pim, pimsum)
}
## End(Not run)
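The two loops in the examples can also be written with split() and lapply(). The sketch below uses a hypothetical stand-in (fake_sum) in place of pim_sum so that it runs without the package and without real data; the toy parsed_data only has the columns the grouping needs. Because split() subsets rows with the dataframe's own `[` method, each piece keeps the parsed.pim class.

```r
# Hypothetical stand-in for pim_sum so the sketch runs on its own.
fake_sum <- function(d) data.frame(Group = d$seasonyearsu[1], Rows = nrow(d))

# Toy parsed data with the grouping columns used in the examples.
parsed_data <- data.frame(
  seasonyearsu = c("Summer 2013 BNS01", "Summer 2013 BNS01",
                   "Summer 2013 BNS02"),
  Transect = c(1, 2, 1))
class(parsed_data) <- c("parsed.pim", class(parsed_data))

# One summary per season-year-sampling unit (all transects together).
pieces <- split(parsed_data, parsed_data$seasonyearsu)
summary_pim <- do.call(rbind, lapply(pieces, fake_sum))

# One summary per transect: split on both columns, dropping empty groups.
pieces_t <- split(parsed_data,
                  list(parsed_data$seasonyearsu, parsed_data$Transect),
                  drop = TRUE)
summary_trans <- do.call(rbind, lapply(pieces_t, fake_sum))
```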