readFSA: Read and size .fsa files
In plantarum/binner: Read fsa fragment files from an ABI Genetic Analyzer


View source: R/readFSA.R

readFSA reads and processes raw .fsa files into R.

readFSA(files = NULL, path = "./", dye, lad.channel = 105, pretrim = NA,
  posttrim = ".fsa", ladder = c(35, 50, 75, 100, 139, 150, 160, 200, 250,
  300, 340, 350, 400, 450, 490, 500), SNR = 6000, ladder.check = 250,
  sizing = "local", bin.width = 1, min.peak.height = 50,
  baseline.width = 51, verbose = TRUE, smoothing = 3, CORES = 1)

`files`	A list of fsa files to read. If NULL (the default), all .fsa files in the directory specified by `path` will be read.
`path`	The directory to search for `files`. The default is the current directory.
`dye`	A vector of dyes to include when reading data. Valid values include: "FAM", "VIC", "NED", "PET".
`lad.channel`	Which .fsa data channel has the size standard ladder data. The default is 105, which is the value for our system.
`pretrim`	A regexp - text to trim off the front of the sample names.
`posttrim`	A regexp - text to trim off the end of the sample names.
`ladder`	A vector with the fragments present in the ladder, in order. The default is the standard GS500(-250)LIZ ladder.
`SNR`	A cut-off value, used to exclude the primer-dimer spike at the beginning of the run from being erroneously interpreted as a ladder fragment. This spike is usually > 6000 rfus, and the true ladder peaks are usually (always?) well below this cut-off. Not setting this value may lead to slower and poorer ladder-fitting.
`ladder.check`	If not null, the size of a ladder fragment that is present but not used for sizing. The size of this fragment will be estimated, and the estimate reported during scanning. Otherwise, it will be ignored. See below.
`sizing`	Currently two options are supported, "local" and "cubic". "local" provides the local Southern method, identical to the one used in PeakScanner et al., and recommended. "cubic" uses a cubic spline function.
`bin.width`	The width in basepairs of each bin. Used to tune the peak-finding algorithm of the internal function `get.peaks`.
`min.peak.height`	The minimum rfu value to consider a true peak, passed to `get.peaks`. Note that you can exclude low peaks later on in the process. This is preferable, because normalization isn't done inside `readFSA`; the height of some peaks may be increased by normalization, if they aren't excluded here.
`baseline.width`	The width of the window to use when 'correcting' the rfu intensity. Each rfu value will be corrected by having the running minimum from a window `baseline.width` units wide subtracted from it. Without this correction, the rfu values on some runs will gradually decline over the course of the run. As I understand it, this is identical to the implementation in PeakScanner.
`verbose`	Do you want to see all the details scroll by or not? `readFSA` can take a while, so this gives you something to watch while you wait.
`smoothing`	This is a tuning value. If smoothing is > 1, the rfu values will be converted to the running mean of the actual values, with a window width of `smoothing`. 3 seems to work nicely and is the default. 1 may be fine too. Even numbers or non-integer values may break the time-space continuum (untested).
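
The `baseline.width` and `smoothing` descriptions above can be illustrated with a minimal base R sketch. This is not the code used inside `readFSA`: the function names `baseline_correct` and `smooth_rfu`, the centred windows, the order of the two steps, and the rfu values are all assumptions made for illustration.

```r
## Synthetic rfu trace: two peaks on a slowly declining baseline.
## These values are made up for illustration only.
rfu <- c(100, 98, 97, 300, 95, 94, 93, 250, 91, 90, 89)

## Hypothetical helper: subtract the running minimum taken over a
## centred window `width` units wide (truncated at the ends of the trace).
baseline_correct <- function(x, width) {
  half <- (width - 1) %/% 2
  n <- length(x)
  running_min <- vapply(seq_len(n), function(i) {
    min(x[max(1, i - half):min(n, i + half)])
  }, numeric(1))
  x - running_min
}

## Hypothetical helper: running mean over a centred window.
## stats::filter() leaves NA at the ends where the window is incomplete.
smooth_rfu <- function(x, width) {
  if (width <= 1) return(x)
  as.numeric(stats::filter(x, rep(1 / width, width), sides = 2))
}

corrected <- baseline_correct(rfu, width = 5)
smoothed <- smooth_rfu(corrected, width = 3)
```

After correction the declining baseline sits near zero while the two peaks keep most of their height; smoothing then spreads each peak across neighbouring scans, which is why a small odd window such as 3 is a reasonable choice.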

pretrim and posttrim are regexps, passed to grep. The substring at the front of each rowname matching pretrim (or the end for posttrim) is removed. To cancel trimming, set these to NA.
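
The effect of trimming can be sketched with base R's `sub()`. This is only an illustration of the behaviour described above, not the package's internal code; the helper `trim_names` and the sample names are hypothetical, and the regexps are the ones used in the Examples.

```r
## Hypothetical sample names, as a sequencing lab might label them.
samples <- c("AFLP_run1_AFLP_plantA-5_Frag_001.fsa",
             "AFLP_run1_AFLP_plantB-5_Frag_002.fsa")

pretrim <- "AFLP_.*AFLP_"
posttrim <- "-5_Frag.*"

## Hypothetical helper: remove the match for pretrim at the front of
## each name and the match for posttrim at the end. NA cancels trimming.
trim_names <- function(x, pretrim = NA, posttrim = NA) {
  if (!is.na(pretrim)) x <- sub(paste0("^", pretrim), "", x)
  if (!is.na(posttrim)) x <- sub(paste0(posttrim, "$"), "", x)
  x
}

trimmed <- trim_names(samples, pretrim, posttrim)
trimmed
```

With these inputs the names reduce to "plantA" and "plantB". Because the arguments are regexps, a literal dot should be escaped if an exact match matters (for example "\\.fsa" rather than ".fsa").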

ladder.check: In the standard GS500 ladder, the 250 bp fragment commonly migrates at an odd rate, making it inappropriate for use in sizing. Setting ladder.check = 250, which is the default, will exclude this fragment from the sizing process. Set ladder.check = NA if you want to use all the peaks in ladder in sizing the data.
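
As a rough sketch of the idea, the check fragment can be dropped from the ladder, a sizing curve fitted to the remaining peaks, and the size of the excluded fragment estimated from that curve. The example below uses a cubic spline from base R (`splinefun`) rather than the local Southern method, and the scan positions are invented; it is not the fitting code used by `readFSA`.

```r
ladder <- c(35, 50, 75, 100, 139, 150, 160, 200, 250,
            300, 340, 350, 400, 450, 490, 500)
ladder.check <- 250

## Made-up scan positions for the ladder peaks (illustration only).
## The 250 bp peak is shifted to mimic its odd migration.
scan.pos <- 1000 + 10 * ladder
scan.pos[ladder == ladder.check] <- scan.pos[ladder == ladder.check] - 15

## Keep every ladder fragment except the check fragment.
## With ladder.check = NA, all fragments are kept.
use <- is.na(ladder.check) | ladder != ladder.check

## Fit scan position -> size (bp) on the retained fragments only.
size_fun <- splinefun(scan.pos[use], ladder[use], method = "natural")

## Estimate the size of the excluded fragment from its scan position.
est <- size_fun(scan.pos[!use])
est
```

With this synthetic ladder the estimate comes out close to 248.5 bp rather than 250, which is the kind of discrepancy the reported estimate is meant to expose.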

readFSA returns an object of class fsa. The elements include:

ep:: A list of electropherogram objects, each corresponding to one fsa file (see below).
dyes:: A list of the data channels (fluorescent dyes) read.
area:: A data frame recording the total area under the curve used for each dye/sample combination, used for normalizing results.
error:: If present, a vector of sample names for all samples that produced unsatisfactory sizing results. Most likely bad reactions that should be removed.

electropherogram objects have three components:

scans:: A data frame, the columns of which are the heights (in RFUs) of each dye, including the size standard, for each time step in the capillary run. The data is ordered, with the first reads at the beginning of the table. There is an additional column, ‘bp’, which stores the size, in base pairs, of each row in the table.
peaks:: A list of vectors, each of which contains the position of the peaks for each dye in the electropherogram, in base pairs.
sample:: The original sample name for the fsa file.

Tyler Smith

fsaNormalize, fsa2PeakTab, plot.fsa, fsaRGbin, binSet, scanGel

## Not run: 
## A set of fsa files are included in this package, which you can read
## with the following example. For your own data replace
## system.file(...) with the path to your fsa files.

## Read the raw files:
## pretrim and posttrim are optional, and serve only to remove
## extraneous components of the sample name added by the sequencing
## lab.

## Note that I've deliberately included a bad sample, which takes
## considerably longer to process than clean reads.
fsa.data <- readFSA(path = system.file("pp5", package = "binner"),
                     pretrim = "AFLP_.*AFLP_", posttrim = "-5_Frag.*",
                     dye = "FAM")

## The print function for fsa objects doesn't do much yet:
fsa.data
summary(fsa.data)

## Plot the second sample, which has a nice, clean ladder
plot(fsa.data, 2)

## Plot the bad sample, note the funky ladder
plot(fsa.data, fsa.data$errors[1])

## Kill it! KILL IT WITH FIRE!!
fsa.data <- fsaDrop(fsa = fsa.data, epn = fsa.data$errors[1])

fsa.data
summary(fsa.data)

## Normalize the electropherograms
fsa.norm <- fsaNormalize(fsa.data)

## Plot the second sample again, note the peak heights (y-axis) have
## changed, but otherwise this plot is identical to the first plot
## above.
plot(fsa.norm, 2)

## Convert the electropherograms into a peak table
peaktab <- fsa2PeakTab(fsa.norm, dye = "FAM")
head(peaktab)

## Binning:
bins <- fsaRGbin(peaktab)

## Review the bins:
scanGel(peaktab, bins)
aflp <- binSet(peaktab, bins, pref = "A")

## Extract the scoring data and proceed with analysis:
mydata <- aflp[, , "alleles"]

## See scanGel() for additional examples


## End(Not run)