pipeline: Run the SeaFlow Pipeline

Description Usage Arguments Examples

Description

run the pipeline

Usage

1
2
3
4
5
6
7
8
9
pipeline(cruise.name='', repo=REPO.PATH, range=NULL, steps=1:4, pct=.97,  
	clust.concat.ct=3, resample.size=300, resamp.concat.max=10,
	 filter.width=1.5, filter.notch=1,  filter.origin=NA,
	 classify.func =2, classify.varnames=CHANNEL.CLMNS.SM, classify.numc=0, classify.noise=0,
	 map.margin=2, 
	concat.sds=!is.na(match(1,steps)), load.to.db=FALSE,  preplot=FALSE, cleanup=TRUE,
	input.path=paste(repo, '/', cruise.name, sep=''),
	output.path=input.path,  log.dir=output.path,
	def.path=paste(input.path,'/', 'pop.def.tab',sep=''), parallel=TRUE, submit.cmd='qsub')

Arguments

cruise.name

Simplified cruise name (same name as the subdirectory in the seaflow data dir).

steps

Which steps of the pipeline to run. step 1 is filter, step 2 is classify, step 3 is census and consensus, step 4 is summarize. 1:2 will do step 1 to 2, etc.

pct

percentage completion (number of indicator files created vs input files) each job step should go to.

clust.concat.ct

Number of event file to concatenate at a time during the clustering/classification step.

map.margin

Margin in latitude/longitude around the map plots.

resample.size

Minimum number of events in a population.

resamp.concat.max

Maximum number of allowable event files to concatenate to generate statistics from.

filter.notch

the location of the x=y (by default) point to create the notch in the gated filter

filter.width

the margin of error for particle alignment determination in the filter step.

origin

correction factor for the stream alignment. When stream is not properly aligned, aligned particles do not scatter light equally on D1 and D2 and must be corrected. By default (NA), the value of the origin is calculated as the difference between the median value of D2 with respect to the median value of D1

classify.func

Choose the clustering method, either flowClust (func = 1) or flowMeans (func = 2, by default) function

classify.varnames

A character vector specifying the variables (columns) to be included in clustering when choosing flowMeans.

classify.numc

Number of clusters when choosing flowMeans. If set to 0 (default) the value matches the number of populations defined in pop.def table . If set to NA, the optimal number of clusters will be estimated automatically.

classify.noise

Set up the noise threshold for phytoplankton cells. Only cells with chlorophyll value higher than the noise will be clustered

concat.sds

Determines if the sds files in the individual julian day directories should be concatenated together into sds.tab

load.to.db

Load the sds and stat files to the database.

preplot

Preplot the level 2 analysis plots to 'output.path'.

cleanup

Cleanup the submission and (non error reporting) R CMD BATCH log files.

input.path

Path to the directory with input data (raw evt or opp files.

output.path

Path to the directory where you wish to output data.

log.dir

Path to the directory where log file will be written.

def.path

Path to the file that defines how to gate & cluster the events into populations.

parallel

Boolean indicating if the job should be run in parallel using qsub (vs in serial)

repo

Full path to your SeaFlow repository

range

A named, two-integer vector specifying the start and end (inclusive) range for subsetting the input files used in each analysis step (with the exception of summarize). Values should be a (evt/opp) file numbers and names should be strings corresponding to the year_julianday directory names. The nv() function is useful for creating this vector.

submit.cmd

the command used to deploy an R CMD BATCH system call to a cluster. Must be used in conjunction with parallel=TRUE.

Examples

1
2
3
4
5
6
7
8
9
example.cruise.name <- 'seaflow_cruise'
temp.out.dir <- '.' #path.expand('~')
output.path <- paste(temp.out.dir,'/',example.cruise.name,sep='')
seaflow.path <- system.file("extdata", example.cruise.name, package="flowPhyto")

file.copy(from=seaflow.path, to=temp.out.dir, recursive=TRUE)

pipeline(repo= temp.out.dir, cruise.name='seaflow_cruise', steps=4, parallel=FALSE) 
unlink(example.cruise.name, recursive=TRUE)

armbrustlab/flowPhyto documentation built on May 10, 2019, 1:40 p.m.