extract_region: Extract regions from a set of sequences (maybe with...


View source: R/tzara.R

Description

Extract regions from a set of sequences (maybe with qualities)

Usage

extract_region(
  seq,
  positions,
  region,
  region2 = region,
  outfile = NULL,
  append = FALSE,
  format,
  compress = NULL,
  ...
)

## S3 method for class 'character'
extract_region(
  seq,
  positions,
  region,
  region2 = region,
  outfile = NULL,
  append = FALSE,
  format = NULL,
  compress = NULL,
  qualityType = "FastqQuality",
  ...
)

## S3 method for class 'XStringSet'
extract_region(
  seq,
  positions,
  region,
  region2 = region,
  outfile = NULL,
  append = FALSE,
  format = NULL,
  compress = NULL,
  ...
)

## S3 method for class 'ShortRead'
extract_region(
  seq,
  positions,
  region,
  region2 = region,
  outfile = NULL,
  append = FALSE,
  format = NULL,
  compress = NULL,
  ...
)

## S3 method for class 'list'
extract_region(seq, positions, region, region2 = region, outfile = NULL, ...)

Arguments

seq

(a character scalar giving a file name, an object of one of several classes representing nucleotide sequences, a character vector of nucleotide sequences, or a list of any of these) The sequences to extract regions from. To extract from several files, use a list of single filenames rather than a vector of filenames.

positions

(data.frame) Region positions, as returned by itsx with positions = TRUE and read_function set. It should have columns $seq_id with sequence IDs (matching those in seq), $region giving the name of each region, and $start and $end giving the start and end locations of each region, if found.

region

(character) The region to extract. Should match a value given in positions$region.

region2

(character) If different from region, then the entire segment beginning at the start of region and ending at the end of region2 will be extracted. For instance, to extract the entire ITS region, use region = 'ITS1', region2 = 'ITS2'.

outfile

(character) If given, the output will be written to the filename given in fasta or fastq format. The format is determined by seq, not by the extension of outfile.

append

(logical scalar) If TRUE, data are appended to outfile; if FALSE, existing data in outfile are overwritten.

format

(character) File format to write (if outfile is given). Default is to guess based on ".fasta[.gz]" or ".fastq[.gz]" extension.

compress

(logical scalar or NULL) Whether to gz-compress outfile. Default is to detect based on presence of ".gz" extension.

...

Passed to methods.

qualityType

(character scalar) fastq file quality encoding; see readFastq.
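
The relationship between positions, region, and region2 can be illustrated with a small base-R sketch. This is not tzara's implementation: span_region is a hypothetical helper, the sequences and coordinates are invented, and plain character vectors stand in for the sequence classes the package accepts. It only shows the rule described above, namely taking the start of region and the end of region2 for each sequence in which both were found.

```r
# Toy inputs: a named character vector of sequences and a hand-made positions
# table with the four columns described under `positions`.
seqs <- c(seq1 = "AAAACCCCGGGGTTTTACGTACGT", seq2 = "TTTTGGGGCCCCAAAA")
positions <- data.frame(
  seq_id = c("seq1", "seq1", "seq1", "seq2"),
  region = c("ITS1", "5_8S", "ITS2", "ITS1"),
  start  = c(5L, 9L, 13L, 5L),
  end    = c(8L, 12L, 16L, 8L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of tzara): extract from the start of `region`
# to the end of `region2` for every sequence where both were located.
span_region <- function(seqs, positions, region, region2 = region) {
  starts <- positions[positions$region == region, c("seq_id", "start")]
  ends   <- positions[positions$region == region2, c("seq_id", "end")]
  span   <- merge(starts, ends, by = "seq_id")  # drops sequences missing either end
  span   <- span[!is.na(span$start) & !is.na(span$end), ]
  ids    <- as.character(span$seq_id)
  out    <- substr(seqs[ids], span$start, span$end)
  names(out) <- ids
  out
}

# Whole span from ITS1 through ITS2; seq2 has no ITS2 row, so it is dropped.
span_region(seqs, positions, "ITS1", "ITS2")

# With region2 left at its default, only ITS1 itself is extracted.
span_region(seqs, positions, "ITS1")
```

In this toy version, sequences lacking either endpoint are silently omitted, which mirrors the Value description ("where it was found").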

Value

(an object of the same class as seq or, if seq is a filename, XStringSet-class or QualityScaledXStringSet-class depending on the format of the file) The requested region from each input sequence in which it was found.
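
A minimal usage sketch follows, assuming the tzara and Biostrings packages are installed. The sequences and the hand-built positions table are invented for illustration; in practice positions would come from itsx as described under Arguments, and its output may carry columns or types that this toy table does not reproduce. The calls use only the arguments listed under Usage.

```r
# Toy positions table; IDs and coordinates are made up for illustration.
positions <- data.frame(
  seq_id = c("seq1", "seq1", "seq2", "seq2"),
  region = c("ITS1", "ITS2", "ITS1", "ITS2"),
  start  = c(5L, 13L, 3L, 11L),
  end    = c(8L, 16L, 6L, 14L),
  stringsAsFactors = FALSE
)

# Run only if both packages are available.
if (requireNamespace("tzara", quietly = TRUE) &&
    requireNamespace("Biostrings", quietly = TRUE)) {
  # Names of the DNAStringSet must match positions$seq_id.
  seqs <- Biostrings::DNAStringSet(
    c(seq1 = "AAAACCCCGGGGTTTTACGT", seq2 = "TTGGGGCCCCAAAATT")
  )

  # Extract ITS1 only (region2 defaults to region).
  its1 <- tzara::extract_region(seqs, positions, region = "ITS1")

  # Extract everything from the start of ITS1 to the end of ITS2.
  its <- tzara::extract_region(seqs, positions,
                               region = "ITS1", region2 = "ITS2")
}
```

Since seq is a DNAStringSet here, the Value description implies that both results should be of that same class.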


brendanf/tzara documentation built on March 11, 2021, 5:40 a.m.