cut_probes: Cut probes

cut_probes R Documentation

Cut probes

Description

Generate probes from nucleotide reference sequences

Usage

cut_probes(
  ref.seq.from.file = FALSE,
  ref.seq.id,
  ref.seq.db,
  fasta.file = NULL,
  delete.fasta = FALSE,
  start = 1,
  stop = NULL,
  start.correction = FALSE,
  size = 24:32,
  delete.incomplete = FALSE,
  delete.identical = FALSE,
  give.probes.id = FALSE,
  mc.cores = 1,
  verbose = TRUE
)

Arguments

ref.seq.from.file

logical; read reference sequences from a file (TRUE) or download them from an NCBI database (FALSE).

ref.seq.id

identification numbers of the reference nucleotide sequences. Only used when ref.seq.from.file = FALSE. GenBank accession numbers, GenInfo identifiers (GI) or Entrez unique identifiers (UID) may be used.

ref.seq.db

character; NCBI database for search. See entrez_dbs for possible values. Only used when ref.seq.from.file = FALSE.

fasta.file

character; path and name of the FASTA file. Only used when ref.seq.from.file = TRUE.

delete.fasta

logical; delete the FASTA file after the probes have been cut.

start, stop

integer; positions of the first and last nucleotides of the reference sequence segment that should be cut into probes. The whole sequence is used by default.

start.correction

logical; count probes' start and stop nucleotides relative to the specified segment (FALSE) or to the whole sequence (TRUE). Only used if start > 1.

size

integer; vector of probe sizes

delete.incomplete

logical; remove probes that contain undeciphered nucleotides

delete.identical

logical; remove identical (duplicated) probes

give.probes.id

logical; add probes' identification numbers

mc.cores

integer; number of processors for parallel computation (not supported on Windows)

verbose

logical; show messages

Details

This function takes nucleotide sequences and cuts them into segments (probes) of the given sizes. Sequences can be read from a given FASTA file or downloaded from NCBI databases. In the latter case, a FASTA file is created. If desired, the FASTA file can be deleted afterwards.
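The cutting step can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper cut_windows is hypothetical, and it assumes that a probe is taken at every possible start position for each requested size. See cut_string for the actual behaviour.

```r
# Hypothetical helper, not part of disprose: cut one sequence into all
# overlapping windows of each requested size (assumed behaviour).
cut_windows <- function(sequence, size) {
  n <- nchar(sequence)
  # sizes longer than the sequence cannot produce any probe
  pieces <- lapply(size[size <= n], function(k) {
    starts <- seq_len(n - k + 1)
    data.frame(
      size = k,
      start = starts,
      stop = starts + k - 1,
      sequence = substring(sequence, starts, starts + k - 1),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

# 10-nucleotide toy sequence cut into 8-mers and 9-mers:
# three 8-mers plus two 9-mers, 5 rows in total
probes <- cut_windows("acgtacgtac", size = 8:9)
print(probes)
```

With the default size = 24:32, a sketch like this would return one row per start position for each of the nine sizes.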

The whole sequence does not have to be cut into probes; you may select a segment with the start and stop parameters. In this case, probes' start and stop nucleotides are counted either relative to the specified segment (start.correction = FALSE) or relative to the whole sequence (start.correction = TRUE).
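The difference between the two coordinate systems comes down to a constant offset. The arithmetic below is a sketch that assumes 1-based, inclusive positions; the variable names are illustrative and are not taken from the package.

```r
start <- 1000  # first nucleotide of the selected segment

# A 24-nucleotide probe at the very beginning of the segment,
# counted relative to the segment (start.correction = FALSE)
segment.start <- 1
segment.stop <- 24

# The same probe counted relative to the whole sequence
# (start.correction = TRUE): shift by start - 1
whole.start <- segment.start + start - 1
whole.stop <- segment.stop + start - 1

c(whole.start, whole.stop)  # 1000 1023
```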

Undeciphered nucleotides are those indicated by the "rywsmkhbvdn" symbols.
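A filter of this kind can be sketched in base R with a regular expression character class. This is an illustration only; in particular, matching upper-case symbols as well (ignore.case = TRUE) is an assumption and may differ from what the package does.

```r
probes <- c("acgtacgt", "acgtnnnn", "ACGRTTAA", "ttgacca")

# TRUE for probes containing at least one undeciphered symbol
incomplete <- grepl("[rywsmkhbvdn]", probes, ignore.case = TRUE)

# keep only probes made of a, c, g and t
complete.probes <- probes[!incomplete]
complete.probes
```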

Probes' identification numbers are created by adding numeric indexes to the reference sequence's identification number.
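As a rough illustration of this naming scheme, the sketch below appends a running index to a sequence id. The id "seqA" is a placeholder and the underscore separator is an assumption; see make_ids for the format the package actually produces.

```r
seq.id <- "seqA"  # placeholder reference sequence id
n.probes <- 3

# append a running index to the sequence id (separator is assumed)
probe.id <- paste(seq.id, seq_len(n.probes), sep = "_")
probe.id
```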

See cut_string, delete_duplicates_DF and make_ids for details.

Value

A data frame with the probe id (optional), sequence id, probe size, start and stop nucleotides, and probe sequence.

Author(s)

Elena N. Filatova

Examples

# use a dedicated subdirectory so that cleaning up does not remove
# the session's temporary directory itself
path <- file.path(tempdir(), "cut_probes_example")
dir.create(path)
# download and save as FASTA "Chlamydia pneumoniae B21 contig00001,
# whole genome shotgun sequence" (GI = 737435910)
if (!requireNamespace("rentrez", quietly = TRUE)) {
  stop("Package \"rentrez\" needed for this function to work. Please install it.",
       call. = FALSE)
}
reference.string <- rentrez::entrez_fetch(db = "nucleotide", id = 737435910,
                                          rettype = "fasta")
fasta.path <- file.path(path, "fasta")
write(x = reference.string, file = fasta.path)
# cut nucleotides 1000 to 1500 into probes of 400 and 500 nucleotides
probes <- cut_probes(ref.seq.from.file = TRUE, fasta.file = fasta.path,
                     delete.fasta = TRUE, start = 1000, stop = 1500,
                     start.correction = FALSE, size = c(400, 500),
                     delete.incomplete = FALSE,
                     delete.identical = FALSE, give.probes.id = TRUE, mc.cores = 1)
# remove the example directory
unlink(path, recursive = TRUE)


disprose documentation built on March 19, 2022, 2:15 a.m.