These functions create user-defined (srFilter) or built-in
instances of SRFilter objects. Filters can be
applied to objects from ShortRead, returning a logical vector
to be used to subset the objects to include only those components
satisfying the filter.
srFilter(fun, name = NA_character_, ...)
## S4 method for signature 'missing'
srFilter(fun, name=NA_character_, ...)
## S4 method for signature 'function'
srFilter(fun, name=NA_character_, ...)
compose(filt, ..., .name)
idFilter(regex=character(0), fixed=FALSE, exclude=FALSE,
.name="idFilter")
occurrenceFilter(min=1L, max=1L,
withSread=c(NA, TRUE, FALSE),
duplicates=c("head", "tail", "sample", "none"),
.name=.occurrenceName(min, max, withSread,
duplicates))
nFilter(threshold=0L, .name="CleanNFilter")
polynFilter(threshold=0L, nuc=c("A", "C", "T", "G", "other"),
.name="PolyNFilter")
dustyFilter(threshold=Inf, batchSize=NA, .name="DustyFilter")
srdistanceFilter(subject=character(0), threshold=0L,
.name="SRDistanceFilter")
##
## legacy filters for ungapped alignments
##
chromosomeFilter(regex=character(0), fixed=FALSE, exclude=FALSE,
.name="ChromosomeFilter")
positionFilter(min=-Inf, max=Inf, .name="PositionFilter")
strandFilter(strandLevels=character(0), .name="StrandFilter")
alignQualityFilter(threshold=0L, .name="AlignQualityFilter")
alignDataFilter(expr=expression(), .name="AlignDataFilter")
fun |
An object of class |
name |
A |
filt |
A |
.name |
An optional |
regex |
Either |
fixed |
|
exclude |
|
min |
|
max |
|
strandLevels |
Either |
withSread |
A |
duplicates |
Either |
threshold |
A |
nuc |
A |
batchSize |
|
subject |
A |
expr |
A |
... |
Additional arguments for subsequent methods; these arguments are not currently used. |
srFilter allows users to construct their own filters. The
fun argument to srFilter must be a function accepting a
single argument x and returning a logical vector that can be
used to select elements of x satisfying the filter with
x[fun(x)]
The signature(fun="missing") method creates a default filter
that returns a vector of TRUE values with length equal to
length(x).
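As a minimal sketch of both forms, the following builds a user-defined filter and the default pass-all filter. It assumes the ShortRead package is installed; the toy reads, their ids, and the names minWidth8 and passAll are made up for illustration.

```r
library(ShortRead)

# Three toy reads (hypothetical data, not from any real run).
reads <- ShortRead(
    sread = DNAStringSet(c("ACGTACGT", "ACG", "TTTTGGGGCC")),
    id    = BStringSet(c("read1", "read2", "read3"))
)

# User-defined filter: the function takes a single argument 'x' and
# returns one logical value per read.
minWidth8 <- srFilter(function(x) width(x) >= 8L, name = "MinWidth8")
longReads <- reads[minWidth8(reads)]

# Default filter: srFilter() called without 'fun' passes every element.
passAll <- srFilter()
allPass <- passAll(reads)
```

Here longReads should hold only the reads that are at least 8 nucleotides wide, while allPass has one TRUE per read.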
compose constructs a new filter from one or more existing
filters. The result is a filter that returns a logical vector with
indices corresponding to components of x that pass all
filters. If not provided, the name of the filter consists of the names
of all component filters, each separated by " o ".
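The following sketch composes two user-defined filters and inspects the automatically generated name. It assumes the ShortRead package is installed; the toy reads and the filter names MinWidth8 and StartsWithA are hypothetical.

```r
library(ShortRead)

# Toy reads (hypothetical data).
reads <- ShortRead(
    sread = DNAStringSet(c("ACGTACGT", "ACG", "TTTTGGGGCC")),
    id    = BStringSet(c("read1", "read2", "read3"))
)

# Two simple user-defined filters.
minWidth8 <- srFilter(function(x) width(x) >= 8L, name = "MinWidth8")
startsWithA <- srFilter(
    function(x) substr(as.character(sread(x)), 1, 1) == "A",
    name = "StartsWithA"
)

# A read passes the composed filter only if it passes every component.
both <- compose(minWidth8, startsWithA)
composedName <- as.character(name(both))
kept <- reads[both(reads)]
```

Because .name is not supplied, composedName is built from the component names joined by " o ".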
The remaining functions documented on this page are built-in filters
that accept an argument x and return a logical vector of
length(x) indicating which components of x satisfy the
filter.
idFilter selects elements satisfying
grep(regex, id(x), fixed=fixed).
chromosomeFilter selects elements satisfying
grep(regex, chromosome(x), fixed=fixed).
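The effect of the fixed argument on these grep-based filters can be seen in plain R. This sketch does not call idFilter or chromosomeFilter; the identifiers are hypothetical stand-ins for id(x) or chromosome(x), and the conversion from indices to a logical vector is only an illustration of the documented return value.

```r
# Hypothetical identifiers standing in for id(x) or chromosome(x).
ids <- c("chr5.fa", "chr5_fa", "chr15.fa", "chrX.fa")

# grep() returns the indices of the matching elements.
hitsRegex <- grep("chr5.fa", ids, fixed = FALSE)  # "." matches any character
hitsFixed <- grep("chr5.fa", ids, fixed = TRUE)   # "." is a literal dot

# Turn the indices into a logical vector of length(ids).
keepRegex <- seq_along(ids) %in% hitsRegex
keepFixed <- seq_along(ids) %in% hitsFixed
```

With fixed=FALSE the pattern also matches "chr5_fa"; with fixed=TRUE only the literal "chr5.fa" matches.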
positionFilter selects elements satisfying
min <= position(x) <= max.
strandFilter selects elements satisfying
match(strand(x), strandLevels, nomatch=0) > 0.
occurrenceFilter selects elements that occur >=min and
<=max times. withSread determines how reads will be
treated: TRUE to include the sread, chromosome, strand, and
position when determining occurrence, FALSE to include
chromosome, strand, and position, and NA to include only
sread. The default is withSread=NA. duplicates
determines how reads occurring more than max times are
treated. head selects the first max reads of each set of
duplicates, tail the last max reads, and sample a
random sample of max reads. none removes all reads
represented more than max times. The user can also provide a
function (as used by tapply) of a single argument to
select amongst reads.
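A small sketch of how the duplicates argument changes the result, using sequence-only occurrence (withSread=NA) on unaligned reads. It assumes the ShortRead package is installed and that occurrenceFilter accepts a ShortRead object when withSread=NA; the toy reads are made up.

```r
library(ShortRead)

# "ACGT" occurs three times; "TTTT" and "GGCC" occur once (hypothetical data).
reads <- ShortRead(
    sread = DNAStringSet(c("ACGT", "ACGT", "TTTT", "GGCC", "ACGT")),
    id    = BStringSet(paste0("read", 1:5))
)

# duplicates="none": drop every read whose sequence occurs more than 'max' times.
uniqueOnly <- occurrenceFilter(withSread = NA, duplicates = "none")(reads)

# duplicates="head": keep the first 'max' reads of each set of duplicates.
firstOfEach <- occurrenceFilter(withSread = NA, duplicates = "head")(reads)

which(uniqueOnly)
which(firstOfEach)
```

With the defaults min=1L and max=1L, "none" removes all three copies of the repeated sequence, whereas "head" retains one representative of it.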
nFilter selects elements with fewer than threshold
'N' symbols in each element of sread(x).
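A minimal sketch of nFilter with its default threshold, assuming the ShortRead package is installed. The toy reads and ids are hypothetical. The sketch also assumes that the default threshold=0L keeps reads containing no 'N' at all, as the default name CleanNFilter suggests.

```r
library(ShortRead)

# One clean read and two reads containing 'N' (hypothetical data).
reads <- ShortRead(
    sread = DNAStringSet(c("ACGTACGT", "ACGTNCGT", "NNNNACGT")),
    id    = BStringSet(c("clean", "oneN", "fourN"))
)

# Default filter: assumed to keep only reads without any 'N'.
cleanOnly <- nFilter()(reads)
cleanReads <- reads[cleanOnly]
```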
polynFilter selects elements with fewer than threshold
copies of any nucleotide indicated by nuc.
dustyFilter selects elements with high sequence complexity, as
characterized by their dustyScore. This emulates the
dust command from WindowMasker
software. Calculations can be memory intensive; use
batchSize to process the argument to dustyFilter in
batches of the specified size.
srdistanceFilter selects elements at an edit distance greater
than threshold from all sequences in subject.
alignQualityFilter selects elements with alignQuality(x)
greater than threshold.
alignDataFilter selects elements with
pData(alignData(x)) satisfying expr. expr should
be formulated as though it were to be evaluated as
eval(expr, pData(alignData(x))).
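The evaluation rule for expr can be illustrated in plain R. In this sketch a small data frame with made-up x and y columns stands in for pData(alignData(x)); no ShortRead object is involved.

```r
# Hypothetical per-read metadata standing in for pData(alignData(x)).
meta <- data.frame(x = c(100, 480, 900), y = c(50, 510, 950))

# Column names of the data frame are visible as variables inside 'expr'.
expr <- expression(abs(x - 500) > 200 & abs(y - 500) > 200)
keep <- eval(expr, meta)
```

The result, keep, is a logical vector with one element per row of the metadata, which is the form a filter needs to return.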
srFilter returns an object of class SRFilter.
Built-in filters return a logical vector of length(x), with
TRUE indicating components that pass the filter.
Martin Morgan <mtmorgan@fhcrc.org>
library(ShortRead)
sp <- SolexaPath(system.file("extdata", package="ShortRead"))
aln <- readAligned(sp, "s_2_export.txt") # Solexa export file, as example
# a 'chromosome 5' filter
filt <- chromosomeFilter("chr5.fa")
aln[filt(aln)]
# filter during input
readAligned(sp, "s_2_export.txt", filter=filt)
# x- and y- coordinates stored in alignData, when source is SolexaExport
xy <- alignDataFilter(expression(abs(x-500) > 200 & abs(y-500) > 200))
aln[xy(aln)]
# both filters as a single filter
chr5xy <- compose(filt, xy)
aln[chr5xy(aln)]
# both filters as a collection
filters <- c(filt, xy)
subsetByFilter(aln, filters)
summary(filters, aln)
# read, chromosome, strand, position tuples occurring exactly once
aln[occurrenceFilter(withSread=TRUE, duplicates="none")(aln)]
# reads occurring exactly once
aln[occurrenceFilter(withSread=NA, duplicates="none")(aln)]
# chromosome, strand, position tuples occurring exactly once
aln[occurrenceFilter(withSread=FALSE, duplicates="none")(aln)]
# custom filter: minimum calibrated base call quality >20
goodq <- srFilter(function(x) {
    apply(as(quality(x), "matrix"), 1, min, na.rm=TRUE) > 20
}, name="GoodQualityBases")
goodq
aln[goodq(aln)]