srFilter | R Documentation |
These functions create user-defined (srFilter) or built-in instances of SRFilter objects. Filters can be applied to objects from ShortRead, returning a logical vector to be used to subset the objects to include only those components satisfying the filter.
srFilter(fun, name = NA_character_, ...)
## S4 method for signature 'missing'
srFilter(fun, name=NA_character_, ...)
## S4 method for signature 'function'
srFilter(fun, name=NA_character_, ...)
compose(filt, ..., .name)
idFilter(regex=character(0), fixed=FALSE, exclude=FALSE,
.name="idFilter")
occurrenceFilter(min=1L, max=1L,
withSread=c(NA, TRUE, FALSE),
duplicates=c("head", "tail", "sample", "none"),
.name=.occurrenceName(min, max, withSread,
duplicates))
nFilter(threshold=0L, .name="CleanNFilter")
polynFilter(threshold=0L, nuc=c("A", "C", "T", "G", "other"),
.name="PolyNFilter")
dustyFilter(threshold=Inf, batchSize=NA, .name="DustyFilter")
srdistanceFilter(subject=character(0), threshold=0L,
.name="SRDistanceFilter")
##
## legacy filters for ungapped alignments
##
chromosomeFilter(regex=character(0), fixed=FALSE, exclude=FALSE,
.name="ChromosomeFilter")
positionFilter(min=-Inf, max=Inf, .name="PositionFilter")
strandFilter(strandLevels=character(0), .name="StrandFilter")
alignQualityFilter(threshold=0L, .name="AlignQualityFilter")
alignDataFilter(expr=expression(), .name="AlignDataFilter")
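fun | A function accepting a single argument x and returning a logical vector, so that x[fun(x)] selects the elements satisfying the filter. |
name | A character(1) name for the filter. |
filt | An SRFilter object to be combined with further filters by compose. |
.name | An optional character(1) name overriding the default name of the filter. |
regex | Either character(0) or a character(1) regular expression matched against id(x) (idFilter) or chromosome(x) (chromosomeFilter). |
fixed | logical(1), passed to grep; when TRUE, regex is matched as a literal string. |
exclude | logical(1); when TRUE, elements matching regex are excluded rather than selected. |
min | numeric(1) lower bound: the minimum number of occurrences (occurrenceFilter) or the minimum position (positionFilter). |
max | numeric(1) upper bound: the maximum number of occurrences (occurrenceFilter) or the maximum position (positionFilter). |
strandLevels | Either character(0) or a character vector of strand levels to be selected by strandFilter. |
withSread | A logical(1) (NA, TRUE or FALSE) determining which fields identify duplicate reads in occurrenceFilter. |
duplicates | Either one of "head", "tail", "sample" or "none", or a function of a single argument, determining which reads occurring more than max times are retained. |
threshold | A numeric(1) cutoff whose meaning depends on the filter, as described for each filter below. |
nuc | A character vector of nucleotides ("A", "C", "T", "G") and/or "other" (all non-nucleotide symbols) considered by polynFilter. |
batchSize | NA or an integer(1) number of sequences processed at a time by dustyFilter; NA processes all sequences at once. |
subject | A character vector of sequences against which srdistanceFilter computes edit distances. |
expr | An expression evaluated against pData(alignData(x)) by alignDataFilter. |
... | Additional arguments for subsequent methods; these arguments are not currently used. |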
srFilter allows users to construct their own filters. The fun argument to srFilter must be a function accepting a single argument x and returning a logical vector that can be used to select elements of x satisfying the filter with x[fun(x)].

The signature(fun="missing") method creates a default filter that returns a vector of TRUE values with length equal to length(x).
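The contract described above can be illustrated without ShortRead. The following is a minimal base-R sketch, not the package's implementation: the helper make_filter is hypothetical, plain character vectors stand in for ShortRead objects, and the length check inside the wrapper is an extra safeguard added for illustration.

```r
# Hypothetical base-R sketch of the srFilter contract (not ShortRead code).
# A filter is a function of one argument x returning a logical vector of
# length(x); x[fun(x)] then keeps the elements that pass.
make_filter <- function(fun = NULL) {
  if (is.null(fun)) {
    # Mirrors the documented 'missing' method: every element passes.
    return(function(x) rep(TRUE, length(x)))
  }
  function(x) {
    keep <- fun(x)
    # Extra safeguard for this sketch: enforce the logical, length(x) contract.
    stopifnot(is.logical(keep), length(keep) == length(x))
    keep
  }
}

reads <- c("ACGTACGTAC", "ACG", "TTTTGGGGCCCC")
long_reads <- make_filter(function(x) nchar(x) >= 5)
reads[long_reads(reads)]   # "ACGTACGTAC" "TTTTGGGGCCCC"
pass_all <- make_filter()
reads[pass_all(reads)]     # all three reads
```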
compose constructs a new filter from one or more existing filters. The result is a filter that returns a logical vector that is TRUE for the components of x passing all filters. If .name is not provided, the name of the filter consists of the names of all component filters, each separated by " o ".
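Composition amounts to a logical AND of the component filters, plus a default name built from the component names. The base-R sketch below assumes filters are held in a named list of plain functions; compose_sketch and the example filter names are hypothetical and do not reflect how ShortRead stores SRFilter objects.

```r
# Hypothetical sketch of compose(): an element passes only if every
# component filter passes; the default name joins component names with " o ".
compose_sketch <- function(filters, name = NULL) {
  if (is.null(name)) name <- paste(names(filters), collapse = " o ")
  fun <- function(x) {
    keep <- rep(TRUE, length(x))
    for (f in filters) keep <- keep & f(x)  # AND the results together
    keep
  }
  list(name = name, fun = fun)
}

filters <- list(
  MinLength5 = function(x) nchar(x) >= 5,
  NoN        = function(x) !grepl("N", x, fixed = TRUE)
)
both <- compose_sketch(filters)
both$name                   # "MinLength5 o NoN"
reads <- c("ACGTACGT", "ACNTACGT", "ACG")
reads[both$fun(reads)]      # "ACGTACGT"
```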
The remaining functions documented on this page are built-in filters that accept an argument x and return a logical vector of length(x) indicating which components of x satisfy the filter.
idFilter selects the elements matched by grep(regex, id(x), fixed=fixed).
chromosomeFilter selects the elements matched by grep(regex, chromosome(x), fixed=fixed).
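Both regular-expression filters reduce to a pattern match turned into a logical vector. In the base-R sketch below, grepl() supplies the logical form of grep(), and a plain character vector stands in for id(x) or chromosome(x). The helper regex_keep is hypothetical, and treating exclude=TRUE as an inversion of the match is an assumption based on the argument's name and description.

```r
# Hypothetical sketch of the idFilter/chromosomeFilter rule (not ShortRead code).
# 'values' stands in for id(x) or chromosome(x); exclude = TRUE is assumed
# to invert the match.
regex_keep <- function(values, regex, fixed = FALSE, exclude = FALSE) {
  hit <- grepl(regex, values, fixed = fixed)
  if (exclude) !hit else hit
}

chrom <- c("chr5.fa", "chr1.fa", "chr5.fa", "chrX.fa")
regex_keep(chrom, "chr5.fa", fixed = TRUE)                  # TRUE FALSE TRUE FALSE
regex_keep(chrom, "chr5.fa", fixed = TRUE, exclude = TRUE)  # FALSE TRUE FALSE TRUE
# With fixed = FALSE the pattern is a regular expression:
regex_keep(chrom, "^chr[15]\\.")                            # TRUE TRUE TRUE FALSE
```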
positionFilter selects elements satisfying min <= position(x) & position(x) <= max.
strandFilter selects elements satisfying match(strand(x), strandLevels, nomatch=0) > 0.
occurrenceFilter selects elements that occur at least min and at most max times. withSread determines how reads are compared: TRUE includes the sread, chromosome, strand, and position when determining occurrence; FALSE includes chromosome, strand, and position; and NA includes only the sread. The default is withSread=NA.

duplicates determines how reads occurring more than max times are treated. "head" selects the first max reads of each set of duplicates, "tail" the last max reads, and "sample" a random sample of max reads; "none" removes all reads represented more than max times. The user can also provide a function (as used by tapply) of a single argument to select amongst reads.
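The min/max and duplicates rules can be sketched in base R by counting how often each read's key occurs and ranking the copies within each group. This is an illustrative sketch, not ShortRead's implementation: occurrence_keep is hypothetical, it covers only "head", "tail" and "none" (not "sample" or user-supplied functions), and key is an assumed character vector identifying each read. With withSread=TRUE, such a key might be built by pasting the sread, chromosome, strand and position together.

```r
# Hypothetical sketch of occurrenceFilter's rules for "head", "tail", "none".
occurrence_keep <- function(key, min = 1L, max = 1L,
                            duplicates = c("head", "tail", "none")) {
  duplicates <- match.arg(duplicates)
  idx <- seq_along(key)
  count <- stats::ave(idx, key, FUN = length)         # occurrences of each key
  rank_first <- stats::ave(idx, key, FUN = seq_along) # 1 = first copy, 2 = second, ...
  rank_last <- count - rank_first + 1L                # 1 = last copy
  keep <- count >= min & count <= max                 # groups already within [min, max]
  over <- count > max                                 # groups with too many copies
  keep[over] <- switch(duplicates,
    head = rank_first[over] <= max,  # first max copies
    tail = rank_last[over] <= max,   # last max copies
    none = FALSE)                    # drop the whole group
  keep
}

key <- c("A", "B", "A", "C", "A", "B")
occurrence_keep(key, max = 2L, duplicates = "head")  # TRUE TRUE TRUE TRUE FALSE TRUE
occurrence_keep(key, max = 2L, duplicates = "none")  # FALSE TRUE FALSE TRUE FALSE TRUE
```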
nFilter selects elements with fewer than threshold 'N' symbols in each element of sread(x).
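The nFilter rule, as stated above, amounts to counting 'N' symbols per read and comparing the count with threshold. The base-R sketch below uses plain strings in place of sread(x); the helper names count_symbol and n_keep are hypothetical, and the strict "fewer than" comparison follows the wording of this page.

```r
# Hypothetical sketch of the nFilter rule on plain strings (not ShortRead code).
count_symbol <- function(reads, symbol) {
  # Length lost after deleting every occurrence of 'symbol'.
  nchar(reads) - nchar(gsub(symbol, "", reads, fixed = TRUE))
}
n_keep <- function(reads, threshold) count_symbol(reads, "N") < threshold

reads <- c("ACGT", "ANGT", "NNGT")
count_symbol(reads, "N")        # 0 1 2
n_keep(reads, threshold = 2L)   # TRUE TRUE FALSE
```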
polynFilter selects elements with fewer than threshold copies of any nucleotide indicated by nuc.
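One way to read the polynFilter rule is that a read passes when its largest count among the symbols in nuc is below threshold. The sketch below implements that reading in base R on plain strings; base_count and polyn_keep are hypothetical names, and "other" is taken to mean any symbol outside A, C, G and T.

```r
# Hypothetical sketch of the polynFilter rule (not ShortRead code).
base_count <- function(reads, b) {
  if (b == "other") {
    nchar(gsub("[ACGT]", "", reads))  # symbols that are not A, C, G or T
  } else {
    nchar(reads) - nchar(gsub(b, "", reads, fixed = TRUE))
  }
}
polyn_keep <- function(reads, threshold, nuc = c("A", "C", "T", "G", "other")) {
  # Largest count, per read, over the requested symbols.
  worst <- Reduce(pmax, lapply(nuc, function(b) base_count(reads, b)))
  worst < threshold
}

reads <- c("AAAAAC", "ACGTAC", "ANNNNC")
polyn_keep(reads, threshold = 4L, nuc = "A")              # FALSE TRUE TRUE
polyn_keep(reads, threshold = 4L, nuc = c("A", "other"))  # FALSE TRUE FALSE
```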
dustyFilter selects elements with high sequence complexity, as characterized by their dustyScore. This emulates the dust command from the WindowMasker software. Calculations can be memory intensive; use batchSize to process the argument to dustyFilter in batches of the specified size.
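The two ideas here, a triplet-based complexity score and batched processing, can be illustrated in base R. The score below is an assumed, simplified DUST-style formula (repeated overlapping trinucleotides raise the score), not necessarily what dustyScore computes; dust_like_score, score_in_batches and the cutoff of 1 are all hypothetical choices for illustration.

```r
# Assumed, simplified DUST-style score (not ShortRead's dustyScore()):
# count overlapping trinucleotides and sum c * (c - 1) / 2 over the counts,
# normalised by (number of triplets - 1). Low-complexity reads score high.
dust_like_score <- function(read) {
  n <- nchar(read) - 2L                     # number of overlapping triplets
  if (n < 2L) return(0)
  triplets <- substring(read, 1:n, 3:(n + 2L))
  counts <- table(triplets)
  sum(counts * (counts - 1) / 2) / (n - 1)
}

# Score reads batchSize at a time (NA = all at once), keeping input order.
score_in_batches <- function(reads, batchSize = NA) {
  if (is.na(batchSize)) batchSize <- length(reads)
  batch <- ceiling(seq_along(reads) / batchSize)
  per_batch <- lapply(split(reads, batch), function(b)
    vapply(b, dust_like_score, numeric(1), USE.NAMES = FALSE))
  unlist(per_batch, use.names = FALSE)
}

reads <- c("AAAAAAA", "ACGTACG", "AAAAAAA")
scores <- score_in_batches(reads, batchSize = 2L)  # 2.5 0.25 2.5
reads[scores <= 1]  # hypothetical cutoff: keep low-scoring (high-complexity) reads
```

Batching only bounds how many reads are scored at once; the scores are the same as scoring everything in one pass.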
srdistanceFilter selects elements at an edit distance greater than threshold from all sequences in subject.
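"Greater than threshold from all sequences in subject" means the nearest subject sequence must still be farther than threshold. The sketch below expresses that rule with base R's utils::adist() (generalized Levenshtein distance) as a stand-in for ShortRead's own edit-distance computation; srdistance_keep is a hypothetical name.

```r
# Hypothetical sketch of the srdistanceFilter rule using utils::adist().
srdistance_keep <- function(reads, subject, threshold = 0L) {
  d <- utils::adist(reads, subject)  # length(reads) x length(subject) distances
  apply(d, 1, min) > threshold       # distance to the closest subject
}

reads <- c("ACGT", "ACGA", "TTTT")
srdistance_keep(reads, subject = "ACGT", threshold = 1L)   # FALSE FALSE TRUE
```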
alignQualityFilter selects elements with alignQuality(x) greater than threshold.
alignDataFilter selects elements with pData(alignData(x)) satisfying expr. expr should be formulated as though it were to be evaluated as eval(expr, pData(alignData(x))).
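Because expr is evaluated with the alignment data as its environment, column names can be used directly as variables. The sketch below shows that evaluation step with a plain data.frame standing in for pData(alignData(x)); the x and y columns mirror the SolexaExport example further down, and the values are made up for illustration.

```r
# Sketch of the evaluation step behind alignDataFilter; align_data is a
# hypothetical stand-in for pData(alignData(x)).
align_data <- data.frame(x = c(100, 500, 900), y = c(100, 900, 950))
expr <- expression(abs(x - 500) > 200 & abs(y - 500) > 200)
keep <- eval(expr, align_data)  # columns are looked up as variables
keep                            # TRUE FALSE TRUE
```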
srFilter returns an object of class SRFilter.

Built-in filters, when applied to x, return a logical vector of length(x), with TRUE indicating components that pass the filter.
Martin Morgan <mtmorgan@fhcrc.org>
See also SRFilter.
library(ShortRead)
sp <- SolexaPath(system.file("extdata", package="ShortRead"))
aln <- readAligned(sp, "s_2_export.txt") # Solexa export file, as example
# a 'chromosome 5' filter
filt <- chromosomeFilter("chr5.fa")
aln[filt(aln)]
# filter during input
readAligned(sp, "s_2_export.txt", filter=filt)
# x- and y- coordinates stored in alignData, when source is SolexaExport
xy <- alignDataFilter(expression(abs(x-500) > 200 & abs(y-500) > 200))
aln[xy(aln)]
# both filters as a single filter
chr5xy <- compose(filt, xy)
aln[chr5xy(aln)]
# both filters as a collection
filters <- c(filt, xy)
subsetByFilter(aln, filters)
summary(filters, aln)
# read, chromosome, strand, position tuples occurring exactly once
aln[occurrenceFilter(withSread=TRUE, duplicates="none")(aln)]
# reads occurring exactly once
aln[occurrenceFilter(withSread=NA, duplicates="none")(aln)]
# chromosome, strand, position tuples occurring exactly once
aln[occurrenceFilter(withSread=FALSE, duplicates="none")(aln)]
# custom filter: minimum calibrated base call quality >20
goodq <- srFilter(function(x) {
apply(as(quality(x), "matrix"), 1, min, na.rm=TRUE) > 20
}, name="GoodQualityBases")
goodq
aln[goodq(aln)]