matchprobes: A function to match a query sequence to the sequences of a...


View source: R/matchprobes.R


The query sequence, a character string (probably representing a transcript of interest), is scanned for the presence of exact matches to the sequences in the character vector records. The indices of the set of matches are returned.

The function is inefficient: it works on R's character vectors, and the actual matching algorithm is of time complexity length(query) times length(records)!

See matchPattern, vmatchPattern and matchPDict for more efficient sequence matching functions.
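As a rough illustration of the alternative route, the sketch below matches a small probe set against one subject with matchPDict. The probe and subject sequences are made up for the example, and all probes are assumed to have the same width, which PDict expects by default.

```r
library(Biostrings)

# Made-up, equal-width probes and subject (illustration only).
probes  <- DNAStringSet(c("ACGT", "TTGA", "GGCC"))
subject <- DNAString("AAACGTTTGACC")

# Preprocess the probes once, then search the subject.
pdict <- PDict(probes)
hits  <- matchPDict(pdict, subject)

n_hits     <- countIndex(hits)   # number of matches per probe
matched    <- which(n_hits > 0)  # indices of probes with at least one match
start_list <- startIndex(hits)   # start positions, one list element per probe
```

Unlike matchprobes, this works on DNAStringSet and DNAString objects rather than plain character vectors, so inputs need converting first.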


matchprobes(query, records, probepos=FALSE)



query: A character vector. For example, each element may represent a gene (transcript) of interest. See Details.


records: A character vector. For example, each element may represent the probes on a DNA array.


probepos: A logical value. If TRUE, also return the start positions of the matches in the query sequence.


toupper is applied to the arguments query and records before matching. The intention of this is to make the matching case-insensitive. The function is embarrassingly naive. The matching is done using the C library function strstr.
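The behaviour described above can be approximated in base R. This is only a sketch, not the package's implementation: match_probes_sketch is a hypothetical helper, it uses regexpr with fixed = TRUE in place of the C-level strstr call, and it assumes that only the first occurrence of each record is reported.

```r
# Hypothetical helper (not part of Biostrings): base-R approximation of the
# upper-casing plus exact substring search described in the text.
match_probes_sketch <- function(query, records, probepos = FALSE) {
  query   <- toupper(query)
  records <- toupper(records)

  # For every query, find the first occurrence of every record.
  # regexpr() returns -1 when there is no match.
  first_pos <- lapply(query, function(q) {
    vapply(records,
           function(r) as.integer(regexpr(r, q, fixed = TRUE)),
           integer(1), USE.NAMES = FALSE)
  })

  # First element: per query, the indices of the records that matched.
  result <- list(lapply(first_pos, function(p) which(p > 0)))

  # Optional second element: per query, the start positions of those matches.
  if (probepos) {
    result[[2]] <- lapply(first_pos, function(p) p[p > 0])
  }
  result
}

res <- match_probes_sketch("aaACGTttgacc", c("acgt", "TTGA", "GGCC"),
                           probepos = TRUE)
res[[1]][[1]]  # indices of matching records for the first query
res[[2]][[1]]  # their start positions in that query
```

The nested loop over queries and records makes the cost visible: one substring search per (query, record) pair.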


A list. Its first element is a list of the same length as the input vector query. Each element of that list is a numeric vector containing the indices of the probes in records that have a perfect match in the corresponding query sequence.

If probepos is TRUE, the returned list has a second element: it is of the same shape as described above, and gives the respective positions of the matches.


R. Gentleman, Laurent Gautier, Wolfgang Huber

See Also

matchPattern, vmatchPattern, matchPDict


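    library(Biostrings)
    if (require("hgu95av2probe")) {  # probe data package; may not be installed
        seq <- hgu95av2probe$sequence[1:20]      # first 20 probe sequences
        target <- paste(seq, collapse="")        # concatenate into one query
        matchprobes(target, seq, probepos=TRUE)
    }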

Example output

Loading required package: hgu95av2probe
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called 'hgu95av2probe'

Biostrings documentation built on Nov. 8, 2020, 11:12 p.m.