map_rangetype: Interval classification.

View source: R/map_rangetype.R

map_rangetypeR Documentation

Interval classification.

Description

map_rangetype Classifies sequences based on interval mapping against a reference.

Usage

map_rangetype(
  map,
  type = "percent",
  ss = NULL,
  min_loop_width = 4,
  intervals = list(start = 1:5, mid = 45:55, end = 95:100),
  N_include = FALSE
)

Arguments

map

PAC_map (generated by PAC_mapper) or a Reanno object (generated by map_reanno), containing mapping coordinates against a reference fasta file.

type

Character indicating what type of intervals that is provided.

If type="nucleotides", then the interval list is given as ranges of nucleotide positions. For example, if interval=list(start=1:3, end=1:3) the function will classify sequences starting within the first three nucleotides of the reference as 'type_start_nuc' and sequences ending in within the last three nucleotides of the reference as 'type_end_nuc'.

If type="percent", then intervals needs to be provided as percent ranges. For example, if intervals=list(start=1:5, mid=45:50, end=95:100) then the function will classify sequences starting within the 5 in the references as 'type_start_per', and sequences ending within the 5 last nucleotides of the references as "type_end_per". It will also, classify sequences starting within 45-50 "type_mid_start_per" and sequences ending within 45-50 as 'type_mid_end_per'.

If type="ss", then intervals is obtained from an ss file, obtained for example from tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/) or at GtRNAdb http://gtrnadb.ucsc.edu/.

Importantly - the intervals list is name sensitive. If type="nuclotides", intervals can only contain two intervals named 'start' and 'end', while if type="percent" then intervals needs to contain three intervals named 'start', 'mid' and 'end'.

Hint, for classifying 5' and 3' half tsRNA you need to run the function twice. First, classify each sequence as 5'-start or 3'-end tsRNA using type="nucleotides", and then rerun the the map object using type="percent" specifying the 'mid' region as the half interval.

ss

File path to ss file (character), readLines vector of ss file (character) or ss list. If character, the function will attempt to read a file from the path given in the character string. If this fails, the function assumes that the ss file has already been read using readLines, and will attempt to split that character vector into a list of unique sequences by splitting at the empty lines. Empty line normally delimits each sequence entry in the ss file. Such a list can also be parsed directly to the function, making it easy to change for example sequence names using lapply prior to running the function.

min_loop_width

Integer setting the minimum number of nucleotides for a loop. Only applicable when type="ss". Loops in ss-files are defined by ">" followed by x number of "." ending with "<". For example:
ATCGGTGGTTCAGTGGTAGAATGCTCGCCTCGCGGGCGGCCCGGGTTCGATTCCCGGCCGATG
>>>>>..>>>>.......<<<<.>>>>>...<<<<<....>>>>>.......<<<<<<<<<<<
Here are three possible loops: "AGTGGTA", "CTC", "TTCGATT". If min_loop_width=3, the middle loop (">...<"="CTC") will be classified as a loop. If min_loop_width=4 (default), the middle loop will not be classified as a loop because it is too short.

intervals

A named list with integer intervals.

N_include

Logical whether or not N "wild card" nucleotides should be counted in the terminals. This conveniently controls the N_up and N_down arguments in the PAC_mapper function. If N_include=FALSE (default), start and end of tRNA will be measured from the first and last canonical nucleotides (A, T, C, G). Thus, if fragments align to an NNN-terminal, it will receive a negitve value. If N_include=TRUE, N wild-cards will be treated as any other nucleotide.

Details

Given a PAC_map object (PAC_mapper) and an interval list this function will attempt to classify mapped sequences based on where these sequences starts and ends in reference. This function can for example be used for 5' and 3' tRNA classification.

Value

Map list object containing reference sequence (Ref_seq) as Biostrings::DNAStringSet and the new classifications embedded with the alignments (Alignments) in a dataframe.

See Also

https://github.com/Danis102 for updates on the current package.

Other PAC analysis: PAC_covplot(), PAC_deseq(), PAC_filter(), PAC_filtsep(), PAC_gtf(), PAC_jitter(), PAC_mapper(), PAC_nbias(), PAC_norm(), PAC_pca(), PAC_pie(), PAC_saturation(), PAC_sizedist(), PAC_stackbar(), PAC_summary(), PAC_trna(), as.PAC(), filtsep_bin(), tRNA_class()

Examples


###########################################################
### test the map_rangetype function
# More complicated examples can be found in the vignette.
##----------------------------------------

# First create an annotation blank PAC with group means
load(system.file("extdata", "drosophila_sRNA_pac_filt_anno.Rdata", 
                 package = "seqpac", mustWork = TRUE))
anno(pac) <- anno(pac)[,1, drop=FALSE]
pac_trna <- PAC_summary(pac, norm = "cpm", type = "means", 
                        pheno_target=list("stage"), merge_pac = TRUE)

# Then re-annotate only tRNA using the PAC_mapper function
ref <- system.file("extdata/trna", "tRNA.fa", 
                         package = "seqpac", mustWork = TRUE)
map_object <- PAC_mapper(pac_trna, ref=ref, N_up = "NNN", N_down = "NNN", 
                         mismatches=0, threads=2, report_string=TRUE, 
                         override=TRUE)

## Coverage plot of tRNA using PAC_covplot

# Single tRNA targeting a summary table 
PAC_covplot(pac_trna, map=map_object, 
                      summary_target= list("cpmMeans_stage"),
                      map_target="tRNA-Ala-AGC-1-1")
            
## Classify range types with map_rangetype (see vignette for examples
# on how to use ss-files for detailed tRNA loop structure).

# Classify fragments using percent intervals
map_object <- map_rangetype(map_object, 
                intervals = list(start = 1:5, mid = 45:55, end = 95:100))
       
names(map_object)
map_object[[1]]


Danis102/seqpac documentation built on Aug. 26, 2023, 10:15 a.m.