virtualFISH: Virtual _in situ_ hybridization.


View source: R/virtualFISH.R

Description

This function queries a list of DNA sequences with a virtual probe (either a sequence or a profile hidden Markov model) and returns only the sequences and regions that are of sufficient similarity based on log-odds alignment scoring.

Usage

virtualFISH(
  x,
  probe,
  minscore = 100,
  minamplen = 50,
  maxamplen = 500,
  up = NULL,
  down = NULL,
  rcdown = TRUE,
  minfsc = 60,
  minrsc = 60,
  maxNs = 0.02,
  cores = 1,
  quiet = FALSE
)

Arguments

x

a list of DNA sequences in DNAbin format.

probe

a DNA sequence ("DNAbin" object) or profile hidden Markov model ("PHMM" object) to be used as the virtual hybridization probe.

minscore

numeric; the minimum specificity (log-odds score for the optimal alignment) between the query sequence and the probe for the former to be retained in the output object.

minamplen, maxamplen

integers giving the minimum and maximum acceptable amplicon lengths.

up, down

optional objects of class DNAbin giving the forward and reverse primer sequences with which to query the sequence list following virtual probe hybridization.

rcdown

logical indicating whether the reverse primer should be reverse-complemented prior to aligning with the input sequences. Should be set to TRUE if down is the reverse complement of the target sequence (e.g. the sequence of a reverse primer as would be ordered from an oligo supplier).
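
To illustrate what reverse-complementing the reverse primer means, here is a minimal base-R sketch on plain character strings. The helper `revcomp` and the toy primer are hypothetical and are not part of the package, which operates on DNAbin objects and may handle ambiguity codes differently.

```r
# Hypothetical helper: reverse-complement a plain DNA string.
# Assumes only the characters A, C, G, T and N are present.
revcomp <- function(s) {
  bases <- rev(strsplit(toupper(s), "")[[1]])
  paste(chartr("ACGTN", "TGCAN", bases), collapse = "")
}

# A toy reverse primer written 5'-3' as it would be ordered;
# its reverse complement is what appears on the target strand.
primer_as_ordered <- "AACG"
on_target_strand <- revcomp(primer_as_ordered)
print(on_target_strand)
```

With rcdown = TRUE, the equivalent transformation is applied to down before alignment, so a primer supplied in its as-ordered orientation does not need to be converted by hand.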

minfsc

numeric, giving the minimum specificity (log-odds score for the optimal alignment) between the forward primer and a sequence for that sequence to be retained.

minrsc

numeric, the minimum specificity (log-odds score for the optimal alignment) between the reverse primer (if provided) and a sequence for that sequence to be retained.

maxNs

numeric giving the maximum acceptable proportion of the ambiguous residue "N" within the output sequences. Defaults to 0.02.
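
As a rough illustration of this threshold, the sketch below computes the proportion of "N" residues in plain character strings and keeps those at or below 0.02. The helper `prop_N` and the toy sequences are hypothetical; the package itself works on DNAbin objects and its internal check may differ.

```r
# Hypothetical helper: proportion of "N" residues in a plain DNA string.
prop_N <- function(s) {
  chars <- strsplit(toupper(s), "")[[1]]
  mean(chars == "N")
}

# Toy sequences, not taken from any real dataset.
seqs <- c(clean = "ACGTACGTAC", gappy = "ACGTNNGTAC")

# Keep sequences whose proportion of Ns does not exceed the threshold.
maxNs <- 0.02
keep <- vapply(seqs, prop_N, numeric(1)) <= maxNs
kept <- seqs[keep]
print(names(kept))
```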

cores

integer giving the number of processors for multithreading. Defaults to 1, and reverts to 1 if x is not a list. This argument may alternatively be a 'cluster' object, in which case it is the user's responsibility to close the socket connection at the conclusion of the operation, for example by running parallel::stopCluster(cores). The string 'autodetect' is also accepted, in which case the maximum number of cores to use is one less than the total number of cores available. Note that in this case there may be a tradeoff in terms of speed depending on the number and size of sequences to be processed, due to the extra time required to initialize the cluster.
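
The sketch below shows one way to manage a user-supplied cluster so that it is always closed, as the description above requires. The helpers `pick_cores` and `fish_on_cluster` are hypothetical wrappers, not part of the package; `pick_cores` only mirrors the "one less than the total" rule described for 'autodetect' and is an assumption about that behaviour.

```r
library(parallel)

# Hypothetical helper: one less than the cores available, never below 1.
# detectCores() can return NA on some platforms, so that case is guarded.
pick_cores <- function(total = detectCores()) {
  if (is.na(total)) return(1L)
  max(1L, as.integer(total) - 1L)
}

# Hypothetical wrapper: create a cluster, pass it as 'cores', and
# guarantee that the socket connections are closed on exit.
fish_on_cluster <- function(x, probe, n = pick_cores(), ...) {
  cl <- makeCluster(n)
  on.exit(stopCluster(cl), add = TRUE)
  insect::virtualFISH(x, probe = probe, cores = cl, ...)
}

# Example call (requires the insect package and suitable inputs):
# z <- fish_on_cluster(whales, probe = model)
```

For small inputs the time spent starting the cluster can outweigh the gain, so cores = 1 may well be faster.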

quiet

logical indicating whether progress messages should be suppressed. Defaults to FALSE, in which case progress is printed to the console.

Details

This function is generally used when filtering or trimming a local sequence database, to mop up any high-scoring sequences with partial or missing primer-binding sites that were not included in the output of virtualPCR. Examples include sequences that were generated using the same primer set as the virtual PCR, but whose primer-binding sites were trimmed prior to deposition in the sequence database. Unlike the virtualPCR function, there is no option to retain the primer-binding sites in the returned sequences.

Value

a list of trimmed sequences, returned as an object of class DNAbin.

Author(s)

Shaun Wilkinson

See Also

virtualPCR

Examples

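library(insect)
## ensure whale sequences are globally alignable
data(whales)
## derive a profile hidden Markov model to use as the virtual probe
model <- aphid::derivePHMM(whales)
## retain only the sequences and regions that match the probe
z <- virtualFISH(whales, probe = model)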

shaunpwilkinson/insect documentation built on Aug. 9, 2021, 5 a.m.