cut_probes: Cut probes

cut_probes    R Documentation

Cut probes


Generate probes from nucleotide reference sequences


cut_probes(
  ref.seq.from.file = FALSE,
  ref.seq.id,
  ref.seq.db,
  fasta.file = NULL,
  delete.fasta = FALSE,
  start = 1,
  stop = NULL,
  start.correction = FALSE,
  size = 24:32,
  delete.incomplete = FALSE,
  delete.identical = FALSE,
  give.probes.id = FALSE,
  mc.cores = 1,
  verbose = TRUE
)



ref.seq.from.file

logical; read reference sequences from a file (TRUE) or download them from an NCBI database (FALSE).

ref.seq.id

identification numbers of the reference nucleotide sequences. Only used when ref.seq.from.file = FALSE. GenBank accession numbers, GenInfo identifiers (GI) or Entrez unique identifiers (UID) may be used.


ref.seq.db

character; NCBI database to search. See entrez_dbs for possible values. Only used when ref.seq.from.file = FALSE.


fasta.file

character; FASTA file name and path. Only used when ref.seq.from.file = TRUE.


delete.fasta

logical; delete the FASTA file.

start, stop

integer; positions of the first and last nucleotides of the reference sequence segment that should be cut into probes. The whole sequence is used by default.


start.correction

logical; count probes' start and stop nucleotides relative to the specified segment (FALSE) or to the whole sequence (TRUE). Only used if start > 1.


size

integer; vector of probe sizes


delete.incomplete

logical; remove probes that contain undeciphered nucleotides


delete.identical

logical; remove identical (duplicated) probes

give.probes.id

logical; add probes' identification numbers


mc.cores

integer; number of processors for parallel computation (not supported on Windows)


verbose

logical; show messages


This function takes nucleotide sequences and cuts them into segments (probes) of the given sizes. Sequences may be read from a given FASTA file or downloaded from NCBI databases. In the latter case, a FASTA file is created. If desired, the FASTA file can be deleted afterwards.
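
To make the cutting step concrete, here is a minimal base-R sketch. It is not the package's implementation: cut_windows is a hypothetical helper, and it assumes that a probe is taken at every position of the sequence for every requested size.

```r
# Hypothetical helper (not part of disprose): cut one sequence into all
# overlapping windows of the requested sizes.
# Assumption: windows start at every position (step of one nucleotide).
cut_windows <- function(sequence, sizes) {
  n <- nchar(sequence)
  # Sizes longer than the sequence cannot produce any window
  pieces <- lapply(sizes[sizes <= n], function(size) {
    starts <- seq_len(n - size + 1)
    data.frame(
      size = size,
      start = starts,
      stop = starts + size - 1,
      sequence = substring(sequence, starts, starts + size - 1),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

# Toy sequence of 10 nucleotides cut into probes of size 4 and 5
toy.probes <- cut_windows("acgtacgtac", sizes = 4:5)
print(toy.probes)
```

The toy result has the same kind of columns as described for the returned data frame (size, start, stop, sequence), without the sequence and probe identifiers.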

The whole sequence need not be cut into probes; you can select a segment with the start and stop parameters. In this case, probes' start and stop nucleotides are counted either relative to the specified segment (start.correction = FALSE) or relative to the whole sequence (start.correction = TRUE).
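
The difference between the two coordinate systems can be illustrated with plain arithmetic. The sketch below is not package code; the variable names are hypothetical, and it assumes that the correction is a shift by start - 1, so that the first nucleotide of the segment maps to position start in the whole sequence.

```r
# Hypothetical illustration (not disprose code) of the two ways to count
# probe coordinates when only a segment of the sequence is cut.
segment.start <- 1000   # value passed as 'start'
probe.size <- 24

# Coordinates counted within the segment (as with start.correction = FALSE)
relative.start <- c(1, 2, 3)
relative.stop <- relative.start + probe.size - 1

# Assumed shift to whole-sequence coordinates (as with start.correction = TRUE)
absolute.start <- relative.start + segment.start - 1
absolute.stop <- relative.stop + segment.start - 1

data.frame(relative.start, relative.stop, absolute.start, absolute.stop)
```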

Undeciphered nucleotides are those indicated by the symbols "rywsmkhbvdn".
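
As a rough illustration of this filter, the following base-R sketch drops every probe that contains one of these symbols. It is an assumption about how such a check could be written, not the package's own code; matching upper-case letters as well (ignore.case = TRUE) is also an assumption.

```r
# Toy probes; two of them contain undeciphered nucleotides ("n" and "r")
probes <- c("acgtacgt", "acgnacgt", "ttrgacca", "ggccaatt")

# Flag probes containing any of the symbols listed above
# (case-insensitive matching is an assumption of this sketch)
has.undeciphered <- grepl("[rywsmkhbvdn]", probes, ignore.case = TRUE)

# Keep only probes made of a, c, g and t
complete.probes <- probes[!has.undeciphered]
print(complete.probes)
```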

Probes' identification numbers are created by adding numeric indexes to the reference sequence's identification number.
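
One possible way to build such identifiers in base R is sketched below. The "_" separator and the per-sequence numbering are assumptions made for illustration; the format actually produced by make_ids may differ.

```r
# Sequence identifiers of five toy probes from two reference sequences
seq.id <- c("seqA", "seqA", "seqA", "seqB", "seqB")

# Number the probes within each reference sequence: 1, 2, 3, 1, 2
index <- ave(seq_along(seq.id), seq.id, FUN = seq_along)

# Append the index to the sequence identifier
# (the "_" separator is an assumption of this sketch)
probe.id <- paste(seq.id, index, sep = "_")
print(probe.id)
```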

See cut_string, delete_duplicates_DF and make_ids for details.


Data frame with probe id (optional), sequence id, probe size, start and stop nucleotides, and sequence.


Elena N. Filatova


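# use a dedicated subdirectory so that the session's temporary directory itself is not removed
path <- file.path(tempdir(), "cut_probes_example")
dir.create(path)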
# download and save as FASTA "Chlamydia pneumoniae B21 contig00001,
# whole genome shotgun sequence" (GI = 737435910)
if (!requireNamespace("rentrez", quietly = TRUE)) {
  stop("Package \"rentrez\" needed for this function to work. Please install it.", call. = FALSE)
}
reference.string <- rentrez::entrez_fetch(db = "nucleotide", id = 737435910,
                                          rettype = "fasta")
write(x = reference.string, file = paste0(path, "/fasta"))
probes <- cut_probes (ref.seq.from.file = TRUE, fasta.file = paste0(path, "/fasta"),
                     delete.fasta = TRUE, start = 1000, stop = 1500,
                     start.correction = FALSE, size = c(400, 500),
                     delete.incomplete = FALSE,
                     delete.identical = FALSE, give.probes.id = TRUE, mc.cores = 1)
unlink (path, recursive = TRUE)

disprose documentation built on March 19, 2022, 2:15 a.m.