README.md
In ctl43/TranSpotteR: Pipeline for identifying LINE1 insertions in WGS data

TranSpotteR

This package aims to identify annotated and unannotated LINE1 insertions in the genome.

Workflow

1. Extracting useful reads (`extract_reads`)

Reads that are potentially informative for inferring insertions are retrieved. These include discordant reads, reads aligning to the sequence of interest, and reads located at the region of interest.
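To make "discordant" concrete, here is a minimal base-R sketch of one common way to flag discordant pairs. It is not the code used by `extract_reads`: the toy table, its column names (`read_chr`, `mate_chr`, `isize`) and the 1000 bp insert-size cutoff are assumptions made for illustration only.

```r
# Toy table of read pairs; columns and values are hypothetical.
pairs <- data.frame(
  read_chr = c("chr1", "chr1", "chr2", "chr3"),
  mate_chr = c("chr1", "chr5", "chr2", "chr3"),
  isize    = c(350L, 0L, 25000L, -410L),
  stringsAsFactors = FALSE
)

# Simplified rule (an assumption): a pair is discordant when the mates map to
# different chromosomes or the absolute insert size exceeds max_isize.
is_discordant <- function(pairs, max_isize = 1000L) {
  pairs$read_chr != pairs$mate_chr | abs(pairs$isize) > max_isize
}

pairs$discordant <- is_discordant(pairs)
print(pairs)
```

Real pipelines usually also look at mate orientation and the proper-pair flag, which this sketch ignores.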

2. Clustering reads (`cluster_reads`)

Reads with MAPQ >= 10 are considered uniquely mapped and are clustered together.
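The idea can be sketched in base R as follows. This is only an illustration, not the algorithm inside `cluster_reads`: the MAPQ threshold comes from the text above, but the toy table, the helper name `cluster_by_gap` and the 500 bp maximum gap are assumptions.

```r
# Toy read table; column names and coordinates are hypothetical.
reads <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr1", "chr2"),
  start = c(100L, 250L, 5000L, 300L, 120L),
  mapq  = c(60L, 30L, 42L, 3L, 15L),
  stringsAsFactors = FALSE
)

cluster_by_gap <- function(reads, min_mapq = 10L, max_gap = 500L) {
  # Keep reads treated as uniquely mapped (MAPQ >= min_mapq).
  kept <- reads[reads$mapq >= min_mapq, , drop = FALSE]
  # Sort by chromosome and position so that neighbours are adjacent rows.
  kept <- kept[order(kept$chr, kept$start), , drop = FALSE]
  n <- nrow(kept)
  # A new cluster starts at the first row, at a chromosome change,
  # or when the distance to the previous read exceeds max_gap.
  new_cluster <- rep(TRUE, n)
  if (n > 1) {
    same_chr <- kept$chr[-1] == kept$chr[-n]
    close    <- (kept$start[-1] - kept$start[-n]) <= max_gap
    new_cluster[-1] <- !(same_chr & close)
  }
  kept$cluster <- cumsum(new_cluster)
  rownames(kept) <- NULL
  kept
}

clustered <- cluster_by_gap(reads)
print(clustered)
```

In this toy input the read with MAPQ 3 is dropped, the two nearby chr1 reads share a cluster, and the distant chr1 read and the chr2 read each form their own.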

3. Assembling read clusters (`construct_contigs`)

For the assembly step, a custom assembly function based on the Overlap-Layout-Consensus (OLC) method assembles the reads in each cluster into longer contigs.
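As a rough illustration of the overlap and layout ideas behind OLC, the following base-R sketch joins two error-free reads on their longest suffix-prefix overlap. It is not the assembler used by `construct_contigs`; the helper names `suffix_prefix_overlap` and `merge_reads` and the minimum overlap of 3 bases are assumptions, and a real OLC assembler additionally handles many reads, mismatches and a consensus step.

```r
# Length of the longest suffix of `a` that is also a prefix of `b`,
# requiring at least `min_overlap` matching bases (returns 0 if none).
suffix_prefix_overlap <- function(a, b, min_overlap = 3L) {
  max_len <- min(nchar(a), nchar(b))
  if (max_len < min_overlap) return(0L)
  for (len in seq(max_len, min_overlap)) {
    if (substr(a, nchar(a) - len + 1L, nchar(a)) == substr(b, 1L, len)) {
      return(as.integer(len))
    }
  }
  0L
}

# Simplest possible layout: append the non-overlapping tail of `b` to `a`.
# Returns NA when the reads do not overlap sufficiently.
merge_reads <- function(a, b, min_overlap = 3L) {
  ov <- suffix_prefix_overlap(a, b, min_overlap)
  if (ov == 0L) return(NA_character_)
  paste0(a, substr(b, ov + 1L, nchar(b)))
}

contig <- merge_reads("ACGTTGCA", "TGCAGGTC")
print(contig)
```

Here the two reads share the 4-base overlap `TGCA`, so the merged contig is 12 bases long.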

4. Annotating the constructed contigs (`annotate_contigs`)

Contigs are annotated with the regions that they align to. First, contigs are aligned to the sequences of interest (e.g. a consensus sequence of hot LINE1). Then, the unaligned parts of each contig are aligned against the genome. The first alignment serves as a 'bait' to collect all contigs containing the sequence of interest, and the second alignment locates the insertion site in the genome. In addition, some useful features, for example polyA sequences, are identified and annotated in this step.
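The two ideas in this step, taking the part of a contig left unaligned by the bait alignment and spotting a polyA run in it, can be sketched in base R. This is a simplified illustration rather than the logic of `annotate_contigs`: the helper names `unaligned_parts` and `find_polyA`, the toy contig, the bait coordinates and the minimum run length of 10 are all assumptions. Runs of T are reported as well because a polyA tail reads as polyT on the opposite strand.

```r
# Return the contig segments to the left and right of the bait alignment,
# given 1-based inclusive alignment coordinates on the contig.
unaligned_parts <- function(contig, aln_start, aln_end) {
  c(left  = substr(contig, 1L, aln_start - 1L),
    right = substr(contig, aln_end + 1L, nchar(contig)))
}

# Locate homopolymer runs of A or T with at least `min_len` bases.
find_polyA <- function(seq, min_len = 10L) {
  pattern <- sprintf("A{%d,}|T{%d,}", min_len, min_len)
  m <- gregexpr(pattern, seq)[[1]]
  if (m[1] == -1L) {
    return(data.frame(start = integer(0), end = integer(0),
                      base = character(0), stringsAsFactors = FALSE))
  }
  starts <- as.integer(m)
  lens   <- attr(m, "match.length")
  data.frame(start = starts, end = starts + lens - 1L,
             base = substring(seq, starts, starts),
             stringsAsFactors = FALSE)
}

# Toy contig: genomic flank + insert-derived sequence + polyA + genomic flank.
contig <- paste0("GGCTAGCTAGGATCC", "ACGTACGTAC", strrep("A", 12), "TTGCCATGCA")

# Pretend the bait alignment covered contig positions 16 to 25.
parts <- unaligned_parts(contig, aln_start = 16L, aln_end = 25L)
hits  <- find_polyA(parts[["right"]])
print(parts)
print(hits)
```

In practice the alignment coordinates would come from an aligner's output, and the flanks would then be aligned against the reference genome to place the insertion site.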

5. Inferring the LINE1 integration (`infer_transposon`)

Under development and will be out soon.

6. Inferring viral/plasmid integration (`infer_simple_insertion`)

Example usage (LINE1 insertion)

extract_reads(bam = "tesing.bam", out_dir = getwd())
reads <- import_files(extracted = "tesing_extracted.txt", anchor_min_mapq = 10)
clusters <- cluster_reads(reads)
clusters <- construct_contigs(clusters) # The most time-consuming step; saving the result is recommended
saveRDS(clusters, "test_contigs.rds")
annotation <- annotate_contigs(clusters, insert = "LINE1.fa", genome = "hg19.fa")
result <- infer_transposon(annotation) # under development (for LINE1)

Example usage (viral insertion)

extract_reads(bam = "tesing.bam", out_dir = getwd())
reads <- import_files(extracted = "tesing_extracted.txt", anchor_min_mapq = 10)
clusters <- cluster_reads(reads)
clusters <- construct_contigs(clusters)
# Or simply import the result generated for LINE1 insertion
# clusters <- readRDS("test_contigs.rds")
annotation <- annotate_contigs(clusters, insert = "HBV.fa", genome = "hg19.fa")
result <- infer_simple_insertion(annotation)

In the future/To-do list

The package will be extended to detect chromosomal translocations and transduction of viral genomes.
Long-read sequencing data will be supported later.
The origin of each LINE1 insertion will be traced back in the genome based on sequence similarity (although many of them are likely to be untraceable).

ctl43/TranSpotteR documentation built on Sept. 9, 2022, 5:49 p.m.
