extractSubseq: Extract subsequence
In MarioniLab/sarlacc: Pipeline for Oxford Nanopore RNA-Seq Data Analysis


Extract an arbitrary read subsequence corresponding to positions of the aligned adaptor.

extractSubseq(aligned, subseq1, subseq2, number=1e5, BPPARAM=SerialParam())

`aligned`	A DataFrame containing the output of `adaptorAlign`.
`subseq1`	A list of two integer vectors `start` and `end` of equal length. Parallel entries specify the start and end positions on adaptor 1 for which the aligned read subsequence is extracted.
`subseq2`	Same as `subseq1` but for adaptor 2.
`number`	Integer scalar specifying the number of records to read at once from the FASTQ file (see `?FastqStreamer`).
`BPPARAM`	A BiocParallelParam object specifying how the parallelization is to be performed.
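
As a minimal sketch of how the parallel position vectors fit together, the snippet below builds a specification for two regions of adaptor 1 and checks it. The element names `starts` and `ends` follow the calls shown in the Examples; the positions themselves are made up for illustration and do not refer to any real adaptor.

```r
# Two illustrative regions on adaptor 1: positions 1-9 and 15-20.
# Entry i of 'starts' pairs with entry i of 'ends'.
subseq1 <- list(starts = c(1L, 15L), ends = c(9L, 20L))

# Parallel vectors must have equal length, and no region may end
# before it starts.
stopifnot(length(subseq1$starts) == length(subseq1$ends),
          all(subseq1$starts <= subseq1$ends))

# Number of adaptor positions covered by each requested region.
widths <- subseq1$ends - subseq1$starts + 1L
```

Each pair of entries describes one region, so requesting several regions only requires extending both vectors together.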

This function will align the adaptors in `aligned` to the start and end of each read (see `?adaptorAlign`). From the alignment, it will extract the subsequence of the read corresponding to the specified positions on the adaptor sequence in `subseq1` or `subseq2`. This is useful in other functions such as `expectedDist`, which rely on read sequences corresponding to constant regions of the adaptor.

At least one of `subseq1` or `subseq2` must be specified.
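
The idea of mapping adaptor positions onto the read through an alignment can be sketched in base R. This is only an illustration under simplifying assumptions, not the code used by `extractSubseq`: the helper `extract_by_adaptor_pos` is hypothetical, the alignment is supplied as two gapped strings of equal length with `-` marking gaps, and the sequences are invented.

```r
# Hypothetical helper (not part of sarlacc): given a gapped pairwise
# alignment, return the read bases aligned to adaptor positions
# 'start' to 'end' (1-based, inclusive).
extract_by_adaptor_pos <- function(adaptor_aln, read_aln, start, end) {
    stopifnot(nchar(adaptor_aln) == nchar(read_aln), start <= end)
    a <- strsplit(adaptor_aln, "")[[1]]
    r <- strsplit(read_aln, "")[[1]]

    # Adaptor coordinate of each alignment column; columns where the
    # adaptor has a gap carry the coordinate of the preceding base.
    adaptor_pos <- cumsum(a != "-")

    # Alignment columns holding the first and last requested adaptor bases.
    first <- which(a != "-" & adaptor_pos == start)
    last <- which(a != "-" & adaptor_pos == end)
    stopifnot(length(first) == 1L, length(last) == 1L)

    # Take the read over those columns and drop read gaps (deletions).
    bases <- r[first:last]
    paste(bases[bases != "-"], collapse = "")
}

# The read has one extra base after adaptor position 4 and lacks
# the base at adaptor position 7.
extract_by_adaptor_pos("ACGT-ACGTAC", "ACGTTAC-TAC", start = 3, end = 7)
# "GTTAC"
```

Because of insertions and deletions, the extracted read subsequence need not have the same length as the requested adaptor region.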

A list containing up to two DataFrames. Each DataFrame corresponds to one adaptor and contains the extracted read subsequences, where each row corresponds to a row of `aligned`. A DataFrame is returned only for an adaptor whose `subseq*` argument was specified.

Aaron Lun

`adaptorAlign`, to generate `aligned`.

library(sarlacc)

# Run the adaptorAlign examples to create the objects used below.
example(adaptorAlign)

# Let's say we want to take the first part of 'a1'.
substr(a1, 1, 9)
extractSubseq(out, subseq1=list(starts=1, ends=9))

# Let's say we also want to take some part of 'a2'.
substr(a2, 5, 11)
extractSubseq(out, subseq1=list(starts=1, ends=9),
    subseq2=list(starts=5, ends=11))