annotate_seq: Annotate sequences by aligning to different reference...
In ctl43/TranSpotteR: Pipeline for identifying LINE1 insertion in the WGS data

View source: R/annotate_contigs.R

annotate_seq

R Documentation

Annotate sequences by aligning to different reference sequences sequentially.

Description

Annotate each sequence with the regions it aligns to.

Usage

annotate_seq(seq, insert, genome, 
             customised_annotation = customised_annotation,
             BPPARAM = MulticoreParam(workers = 3L))

Arguments

`seq`	A vector of strings with names.
`insert`	A file path pointing to a BWA-indexed FASTA file. It holds the target sequences to be identified in the genome, e.g. LINE1, a virus genome or a plasmid sequence.
`genome`	A file path pointing to a BWA-indexed genome sequence.
`customised_annotation`	A list object containing functions that further annotate the sequence, e.g. identifying the polyA sequence, etc.
`BPPARAM`	A BiocParallelParam object controlling parameters in parallelization. Using more than 3 workers does not improve performance.

Details

The annotation process consists of the following steps:

1. Contigs are first aligned to the insert sequence, and the primary alignment is annotated to the sequence regardless of its mapping quality.
2. The unaligned parts are extracted and aligned to the genome. Only aligned regions with mapping quality greater than 10 are annotated to the contigs. If supplementary alignments are found, the primary alignment is assigned to the sequence first, then the supplementary alignments. Supplementary alignments are assigned in their order of appearance rather than by any specific rule.
3. The parts that remain unannotated are passed to the customised annotations to identify customised structures, for example a polyA sequence.
4. Any parts that are still unannotated are left as the sequence itself.
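
The "extract the unannotated parts" idea used between the steps above can be illustrated with a small base R sketch. It is not the package's implementation: `find_unannotated` is a hypothetical helper, and the sketch assumes 1-based, inclusive coordinates for `start` and `end` within a read.

```r
# Hypothetical helper (not part of TranSpotteR): given annotated intervals
# on a read of length `read_len`, return the intervals that no annotation
# covers. Coordinates are assumed to be 1-based and inclusive.
find_unannotated <- function(start, end, read_len) {
  # Mark every position that falls inside an annotated interval.
  covered <- logical(read_len)
  for (i in seq_along(start)) {
    covered[start[i]:end[i]] <- TRUE
  }
  # Collapse the per-position flags into runs of covered/uncovered positions.
  runs <- rle(covered)
  run_end <- cumsum(runs$lengths)
  run_start <- run_end - runs$lengths + 1L
  # Keep only the uncovered runs.
  keep <- !runs$values
  data.frame(start = run_start[keep],
             end = run_end[keep],
             width = runs$lengths[keep])
}

# Toy example: a 120 bp read annotated at positions 1-40 and 61-100.
gaps <- find_unannotated(start = c(1L, 61L), end = c(40L, 100L), read_len = 120L)
print(gaps)
```

In this toy case the uncovered parts are positions 41-60 and 101-120, which are the stretches that would move on to the next annotation step.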

Value

A data.table object with the fields start, end, width, QNAME, annotation, cigar and seq.

`start`	Starting location of the annotation in the read.
`end`	Ending location of the annotation in the read.
`width`	Length of the annotated part.
`QNAME`	Names of the annotated sequence.
`annotation`	Annotation of the read part, e.g. aligned genomic regions, aligned insertion or other customised annotation, etc.
`cigar`	Concise Idiosyncratic Gapped Alignment Report (CIGAR).
`seq`	Original sequence of the annotated part.

Author(s)

Cheuk-Ting Law
