pim_tools: CMLR Point Intercept Method Tools
In dpritchard/dgmisc: Miscellaneous R functions for research

A collection of tools for dealing with Point Intercept Method (PIM) data.

pim_read(file, samp_year=NA, samp_season=NA)
pim_parse(raw_pim)
pim_sum(subunit)

`file`	A string specifying a filename to read and process.
`samp_year`	An integer. The year in which the survey was carried out, e.g. 2013.
`samp_season`	A character string. The season in which the survey was carried out, e.g. "Summer".
`raw_pim`	A dataframe read directly from a `.csv` file from the CMLR database. See Details.
`subunit`	A dataframe of class `parsed.pim` (i.e. the output from `pim_parse`). See Details.

pim_read takes data-forms filled out in the field and returns the information they contain in a sensible long-format dataframe.

Importantly, the file must have 6 rows of metadata. If it does not, this function will produce unpredictable results (probably an error).

Several internal checks must be passed for the function to run. If any check fails, an error naming the file and the failed check is printed to the screen. Currently these checks cannot be circumvented. The specific tests that must be passed are (in order):

First two columns of data must be strings and must not be blank. They are assumed to be the FieldName and the ScientificName.
If there is something in the first two columns (i.e. a species) then this must have data associated with it at some point in the file.
If there are data in a row, it must have the first two columns filled out (i.e. it must have a FieldName or a ScientificName associated with it).
There can be no blank columns in the middle of the dataset.
The first two columns must be read by R as strings (i.e. they cannot be all numeric codes). If they are read as numeric values, this is treated as an error.
Cell A2 must look something like the phrase 'Sampling Unit' and cell B2 cannot be blank.
Cell A3 must look something like the word 'Date' and cell B3 cannot be blank.
Cell A4 must look something like the word 'Assessor' and cell B4 cannot be blank.
Cell A5 must look something like the phrase 'Transect No.' and cell B5 must be able to be coerced to an integer.
Steps must be labelled sequentially from 1 to n, and n must match the number of data columns collected (i.e. if there are n steps, there must be n*2 data columns).
There must be an equal number of strata and condition scores and their column headers must be 's' and 'c' (in that order).
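
The metadata checks on cells A2 to B5 could be sketched roughly as follows. This is an illustration only, not the code used inside pim_read: the helper name `check_pim_metadata` is hypothetical, and the sheet is assumed to have been read without a header so that row and column indices match spreadsheet cells.

```r
# Hypothetical helper (not part of the package): checks the metadata cells
# of a PIM form. `raw` is assumed to hold the sheet as character columns,
# read with header = FALSE, so raw[2, 1] corresponds to cell A2.
check_pim_metadata <- function(raw) {
  is_blank <- function(x) is.na(x) || !nzchar(trimws(x))
  labels <- c("sampling unit", "date", "assessor", "transect no")
  problems <- character(0)
  for (i in seq_along(labels)) {
    r <- i + 1  # the labels live in rows 2 to 5
    if (!grepl(labels[i], raw[r, 1], ignore.case = TRUE)) {
      problems <- c(problems,
                    sprintf("Cell A%d does not look like '%s'", r, labels[i]))
    }
    if (is_blank(raw[r, 2])) {
      problems <- c(problems, sprintf("Cell B%d is blank", r))
    }
  }
  # The transect number must be coercible to an integer.
  if (is.na(suppressWarnings(as.integer(raw[5, 2])))) {
    problems <- c(problems, "Cell B5 cannot be coerced to an integer")
  }
  problems
}

# Columns A and B of the first five rows of a well-formed form.
raw <- data.frame(
  A = c("PIMs stratum", "Sampling unit", "Date", "Assessor", "Transect No."),
  B = c("", "BNS01", "19/10/2012", "DP", "1"),
  stringsAsFactors = FALSE
)
problems <- check_pim_metadata(raw)  # empty when all checks pass
```

Collecting the problems in a character vector, rather than stopping at the first one, is a design choice of this sketch; the function described above reports an error instead.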

Essentially the file head should look something like this:

	A	B	C	D	E	F
	---------------------	---------------------	---------------------	---------------------	---------------------	---------------------
1 |	PIMs stratum ...
2 |	Sampling unit	BNS01
3 |	Date	19/10/2012
4 |	Assessor	DP
5 |	Transect No.	1
6 |	Step		1
7 |	Field name	Scientific name	s	c	s	c
8 |	Lept_gran	Leptospermum	1	4	1	3
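
To illustrate what a "long-format" result means for a file shaped like this, the following sketch reshapes the paired 's' and 'c' columns into one row per species per step. It is not the package's implementation. The CSV text mirrors the layout above, except that the step label 2 in column E is an assumption (the layout above only shows the first label).

```r
# A minimal sketch, assuming a file laid out as in the table above.
# The "2" in row 6 (column E) is an assumed step label.
csv_text <- "PIMs stratum,,,,,
Sampling unit,BNS01,,,,
Date,19/10/2012,,,,
Assessor,DP,,,,
Transect No.,1,,,,
Step,,1,,2,
Field name,Scientific name,s,c,s,c
Lept_gran,Leptospermum,1,4,1,3"

raw <- read.csv(text = csv_text, header = FALSE, colClasses = "character")

# Rows 1 to 6 are metadata and row 7 holds the column headers,
# so the species records start at row 8.
body <- raw[8:nrow(raw), , drop = FALSE]

# Each step contributes one 's' (stratum) and one 'c' (condition) column.
n_steps <- (ncol(raw) - 2) / 2

long <- do.call(rbind, lapply(seq_len(n_steps), function(step) {
  s_col <- 2 * step + 1  # stratum column for this step
  c_col <- 2 * step + 2  # condition column for this step
  data.frame(
    Step = step,
    FieldName = body[[1]],
    ScientificName = body[[2]],
    Strata = as.integer(body[[s_col]]),
    Condition = as.integer(body[[c_col]]),
    stringsAsFactors = FALSE
  )
}))
```

Here `long` has one row for each species at each step, which is the shape described for the `data` item returned by pim_read.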

For pim_parse at a minimum raw_pim must contain columns named:

Year, Season, Start.Date..YYYY.MM.DD., Scientific.Name, Sampling.Unit.ID, Step, Stratum, Condition.Score..1.5. and Plot.Transect.Number.

This is consistent with raw data extracted from the CMLR database on 5 July 2013 (2013-07-05).
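
Before calling pim_parse it can be useful to confirm that these columns are present. The sketch below is one way to do that; the helper `missing_pim_columns` is hypothetical. The dotted names are what `read.csv()` produces with its default `check.names = TRUE`, which passes headers through `make.names()`; the header text used in the last line is an assumed example, not a confirmed database header.

```r
# Column names required by pim_parse, as listed above.
required_cols <- c("Year", "Season", "Start.Date..YYYY.MM.DD.",
                   "Scientific.Name", "Sampling.Unit.ID", "Step", "Stratum",
                   "Condition.Score..1.5.", "Plot.Transect.Number")

# Hypothetical helper: returns the required columns absent from raw_pim.
missing_pim_columns <- function(raw_pim) {
  setdiff(required_cols, names(raw_pim))
}

# A deliberately incomplete stand-in for a database extract.
incomplete <- data.frame(Year = 2013, Season = "Summer")
missing_cols <- missing_pim_columns(incomplete)

# How a header with spaces and punctuation becomes a dotted name
# (the header text here is an assumption for illustration).
dotted <- make.names("Condition Score (1-5)")
```

An empty result from `missing_pim_columns()` means all required columns were found.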

pim_sum takes the output from pim_parse (an error will be thrown otherwise). The function is intended to process one site at a time. If more than one site is detected, the function will continue, but a snippy warning is issued. The function returns a dataframe summarising the PIM data by sampling unit (species frequencies and median condition). See the example below for how to do this.
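
The kind of summary described here can be approximated in base R as shown below. This is a sketch under stated assumptions, not the body of pim_sum: "Frequency" is assumed to mean the number of records per species, the input is a small made-up dataframe, and "Species B" is a placeholder name.

```r
# Made-up stand-in for the parsed data from one sampling unit.
parsed <- data.frame(
  SampUnit = "BNS01",
  Transect = 1,
  Step = c(1, 2, 3, 1, 3),
  ScientificName = c("Leptospermum", "Leptospermum", "Leptospermum",
                     "Species B", "Species B"),
  Condition = c(4, 3, 5, 2, 4),
  stringsAsFactors = FALSE
)

# Mirror the documented behaviour: carry on, but warn, if several units appear.
if (length(unique(parsed$SampUnit)) > 1) {
  warning("More than one sampling unit detected")
}

# Number of records per species (assumed meaning of 'Frequency').
freq <- aggregate(Step ~ SampUnit + ScientificName, data = parsed, FUN = length)
names(freq)[names(freq) == "Step"] <- "Frequency"

# Median condition score per species.
med <- aggregate(Condition ~ SampUnit + ScientificName, data = parsed,
                 FUN = median)
names(med)[names(med) == "Condition"] <- "MedianCondition"

summary_sketch <- merge(freq, med, by = c("SampUnit", "ScientificName"))
```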

pim_read returns a named list with two items:

`metadata`	A dataframe with columns 'SamplingUnit', 'Date', 'Assessor' and 'TransectNo'. Contains a single row.
`data`	A long-format dataframe with columns 'Date', 'Assessor', 'SamplingUnit', 'Transect', 'Step', 'FieldName', 'ScientificName', 'Strata' and 'Condition'. Contains a row for each data point (i.e. each species at each step).

Items returned by pim_read are either strings or numerics. If required, coercing to factors will need to be done after the data have been read in.
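
For example, selected string columns can be converted to factors after reading. The dataframe below is a made-up stand-in for the `data` item, used only to keep the sketch self-contained.

```r
# Made-up stand-in for the `data` item returned by pim_read.
pim_data <- data.frame(
  SamplingUnit = c("BNS01", "BNS01"),
  Assessor = c("DP", "DP"),
  Condition = c(4, 3),
  stringsAsFactors = FALSE
)

# Convert only the chosen string columns; numeric columns are left alone.
factor_cols <- c("SamplingUnit", "Assessor")
pim_data[factor_cols] <- lapply(pim_data[factor_cols], factor)
```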

pim_parse returns a dataframe of class parsed.pim with columns 'Date', 'SampUnit', 'Transect', 'Step', 'Strata', 'ScientificName' and 'Condition'.

pim_sum returns a dataframe (of no specific class) with columns 'Season', 'Year', 'SamplingUnit', 'Transect', 'Date', 'ScientificName', 'Frequency' and 'MedianCondition'.

Make date dynamic.
Make methods code dynamic.
Enforce the 6-row rule.
Check date formatting. Return an R date.
Allow free-format metadata in key:value pairs (odd columns = key; even columns = value).
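
Two of these items could be approached as sketched below. Nothing here exists in the package yet. The date format is inferred from the sample form (19/10/2012), and the key:value row is a made-up example of the proposed layout.

```r
# Returning an R date: parse the day/month/year string used on the form.
field_date <- as.Date("19/10/2012", format = "%d/%m/%Y")

# Free-format metadata: odd positions hold keys, even positions hold values.
meta_row <- c("Sampling unit", "BNS01", "Assessor", "DP")
keys <- meta_row[seq(1, length(meta_row), by = 2)]
values <- meta_row[seq(2, length(meta_row), by = 2)]
metadata <- setNames(as.list(values), keys)
```

With this layout `metadata[["Assessor"]]` retrieves a value by its key, whatever order the pairs appear in.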

Gretchen Brownstein and Daniel Pritchard

bb_tools.

## Not run: 
# First: ensure your current working directory contains the .csv files to process.
all_csv <- sort(Sys.glob('*.csv'))
allmeta <- NULL
alldata <- NULL
for(a in seq_along(all_csv)){
    # cat() prints directly; wrapping it in print() would also print NULL.
    cat('File number ', a, '\n', sep='')
    pimout <- pim_read(all_csv[a])
    alldata <- rbind(alldata, pimout$data)
    allmeta <- rbind(allmeta, pimout$metadata)
}

# Read Data
# Assumes data is extracted from the CMLR database. Developed with data extracted on
# 2013-07-05

pim_data <- read.csv(file=file.choose())
parsed_data <- pim_parse(pim_data)

# Run pim_sum for each season-year-sampling unit. This sums all transects together for a plot.
# (Set the working directory to where you want any output file to end up.)

summary_pim <- NULL

for(a in unique(parsed_data$seasonyearsu)){
  cat('season-year-SamplingUnit', a, '\n', sep=' ')
  subunit <- subset(parsed_data, seasonyearsu == a)
  pimsum <- pim_sum(subunit)
  summary_pim <- rbind(summary_pim, pimsum)
}

# To sum each transect individually, do this:

parsed_data$site_trans <- paste(parsed_data$seasonyearsu, parsed_data$Transect) # makes a new id

summary_pim <- NULL

for(a in unique(parsed_data$site_trans)){
  cat('season-year-SamplingUnit-Transect', a, '\n', sep=' ')
  subunit <- subset(parsed_data, site_trans == a)
  pimsum <- pim_sum(subunit)
  summary_pim <- rbind(summary_pim, pimsum)
}

## End(Not run)