adapter_filter: Remove full and partial adapters from a ShortReadQ object



Description

This function removes full and partial adapters from the 3' and 5' ends of reads, using the trimLRPatterns function of Biostrings. It extends the methodology of trimLRPatterns: it can also remove adapters located within reads, and it provides additional options (e.g., a threshold for the minimum number of bases required for trimming). For a given position in the read, the underlying Biostrings functions return TRUE when a substring of the read matches the adapter. Like trimLRPatterns, adapter_filter takes as the best match the region that extends from the match to the end of the sequence in the corresponding flank. The default error rate is 0.2. If several valid matches are found, the function removes the largest subsequence. Adapters can be anchored or not. When indels are allowed, matching uses the 'edit distance' between the subsequences and the adapter.
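The flank-trimming idea described above can be illustrated with a small base R sketch. This is a toy model, not the FastqCleaner or Biostrings implementation: the helper trim_right_flank is hypothetical, it handles only mismatches (no indels) in the 3' flank, and it assumes that a match of length min_match_flank or less is ignored.

```r
# Toy illustration only: NOT the FastqCleaner/Biostrings implementation.
# trim_right_flank is a hypothetical helper written for this sketch.
trim_right_flank <- function(read, adapter, error_rate = 0.2,
                             min_match_flank = 3L) {
  read_chars <- strsplit(read, "")[[1]]
  adapter_chars <- strsplit(adapter, "")[[1]]
  n <- length(read_chars)
  max_len <- min(n, length(adapter_chars))
  if (max_len < 1L) return(read)
  # Try the longest candidate first, so the largest valid match is removed
  for (len in seq(max_len, 1L)) {
    # Assumption: matches of length min_match_flank or less are not trimmed
    if (len <= min_match_flank) break
    read_suffix <- read_chars[(n - len + 1L):n]
    adapter_prefix <- adapter_chars[seq_len(len)]
    mismatches <- sum(read_suffix != adapter_prefix)
    # Allowed mismatches scale with the length of the compared subsequence
    if (mismatches <= error_rate * len) {
      return(paste(read_chars[seq_len(n - len)], collapse = ""))
    }
  }
  read
}

full_adapter    <- trim_right_flank("GGCCTTAAATCGACT", "ATCGACT")
partial_adapter <- trim_right_flank("GGCCTTAAATCGA", "ATCGACT")
short_match     <- trim_right_flank("GGCCTTAAAT", "ATCGACT")
```

In this sketch the full adapter and the 5-base partial adapter are both removed, while the 2-base partial match at the end of the third read is left in place because it does not exceed the min_match_flank threshold.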

Usage

adapter_filter(input, Lpattern = "", Rpattern = "", rc.L = FALSE,
  rc.R = FALSE, first = c("R", "L"), with_indels = FALSE,
  error_rate = 0.2, anchored = TRUE, fixed = "subject",
  remove_zero = TRUE, checks = TRUE, min_match_flank = 3L, ...)

Arguments

input

ShortReadQ object

Lpattern

5' pattern (character or DNAString object)

Rpattern

3' pattern (character or DNAString object)

rc.L

Reverse complement Lpattern? default FALSE

rc.R

Reverse complement Rpattern? default FALSE

first

Which side of the sequences to trim first, right ('R') or left ('L'), when both Lpattern and Rpattern are passed

with_indels

Allow indels? This feature is available only when the error_rate is not null

error_rate

Error rate (value in the range [0, 1]). The error rate is the proportion of mismatches allowed between the adapter and the aligned portion of the subject. For a given adapter A, the number of allowed mismatches between each subsequence s of A and the subject is computed as: error_rate * L_s, where L_s is the length of the subsequence s

anchored

Adapter or partial adapter within the sequence (anchored = FALSE) or only at the 3' and 5' ends (anchored = TRUE, default)?

fixed

Parameter passed to trimLRPatterns. Default 'subject': only ambiguities in the pattern are interpreted as wildcards. See the argument fixed in trimLRPatterns

remove_zero

Remove zero-length sequences? Default TRUE

checks

Perform checks? Default TRUE

min_match_flank

Do not trim in the flanks of the subject if the match has a length of min_match_flank or less. Default 3L (with 1L, for example, trimming requires >=2 coincidences in a flank match)

...

additional parameters passed to trimLRPatterns
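As a worked example of the error_rate formula, the following base R sketch tabulates error_rate * L_s for every subsequence length of a 7-base adapter. The source states only the product; rounding it down to a whole number of mismatches is an assumption made here for illustration.

```r
adapter <- "ATCGACT"
error_rate <- 0.2

# One entry per possible subsequence length L_s of the adapter
L_s <- seq_len(nchar(adapter))

# Value given by the formula error_rate * L_s
raw_allowed <- error_rate * L_s

# Assumption: only whole mismatches count, so round down
# (the small epsilon guards against floating-point representation error)
whole_allowed <- floor(raw_allowed + 1e-9)

print(data.frame(L_s, raw_allowed, whole_allowed))
```

Under that assumption, with the default error rate of 0.2 a subsequence needs at least 5 bases before a single mismatch is tolerated.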

Value

Edited DNAString or DNAStringSet object

Filtered ShortReadQ object

Author(s)

Leandro Roser learoser@gmail.com

Examples

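require('Biostrings')
require('ShortRead')
# random_seq, random_qual, seq_names and adapter_filter come from FastqCleaner
require('FastqCleaner')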

# create 6 sequences of width 43
set.seed(10)
input <- random_seq(6, 43)

# add adapter in 3' 
adapter <- "ATCGACT"

input <- paste0(input, as.character(DNAString(adapter)))
input <- DNAStringSet(input)

# create qualities of width 50
set.seed(10)
input_q <- random_qual(c(30,40), slength = 6, swidth = 50,
                       encod = 'Sanger')

# create names
input_names <- seq_names(length(input))

# create ShortReadQ object
my_read <- ShortReadQ(sread = input, quality = input_q, id = input_names)

# trim adapter
filtered <- adapter_filter(my_read, Rpattern = adapter)

# look at the filtered sequences
sread(filtered)
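
A variant of the example above that trims both flanks in one call may also be useful. This is a sketch under stated assumptions: the adapter sequences are arbitrary, only arguments listed in the Usage section are used, and no particular output is claimed since the result depends on the random sequences.

```r
library('Biostrings')
library('ShortRead')
library('FastqCleaner')

# arbitrary adapters chosen for illustration
adapter_L <- "GCTAGTC"
adapter_R <- "ATCGACT"

# 6 random sequences of width 36, flanked by both adapters (total width 50)
set.seed(10)
core <- as.character(random_seq(6, 36))
input <- DNAStringSet(paste0(adapter_L, core, adapter_R))

# qualities of width 50, matching the sequence width
set.seed(10)
input_q <- random_qual(c(30,40), slength = 6, swidth = 50,
                       encod = 'Sanger')

my_read <- ShortReadQ(sread = input, quality = input_q,
                      id = seq_names(length(input)))

# trim the 5' adapter first, then the 3' adapter
filtered <- adapter_filter(my_read, Lpattern = adapter_L,
                           Rpattern = adapter_R, first = "L")

# compare widths before and after trimming
width(sread(my_read))
width(sread(filtered))
```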

FastqCleaner documentation built on Nov. 8, 2020, 5:05 p.m.