load_anno: Load a transcriptome lengths file

load_lengths    R Documentation

Load a transcriptome lengths file

Description

'load_lengths' reads in a transcriptome lengths file

'gff_to_lengths' reads in a GFF-formatted file and parses it for 5' UTR, CDS, and 3' UTR lengths. The first column must be the transcript name; the third column must be one of 'UTR5', 'CDS', or 'UTR3'.
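
As a rough illustration of the kind of parsing described above, the following base R sketch sums feature lengths per transcript and region. It is not the choros implementation: the function name 'gff_to_lengths_sketch' and the output column names are hypothetical, and it assumes a tab-delimited file with the standard GFF start and end coordinates in columns 4 and 5.

```r
# Hypothetical helper, not the choros implementation.
# Assumes tab-delimited GFF with start/end coordinates in columns 4 and 5.
gff_to_lengths_sketch <- function(gff_fname) {
  gff <- read.table(gff_fname, sep = "\t", header = FALSE, quote = "",
                    comment.char = "#", stringsAsFactors = FALSE)
  # GFF coordinates are 1-based and inclusive
  gff$len <- as.numeric(gff[[5]] - gff[[4]] + 1)
  transcripts <- unique(gff[[1]])
  # Total length of one region type per transcript (0 if the region is absent)
  sum_region <- function(region) {
    vapply(transcripts, function(tx) {
      sum(gff$len[gff[[1]] == tx & gff[[3]] == region])
    }, numeric(1), USE.NAMES = FALSE)
  }
  data.frame(transcript = transcripts,
             utr5_length = sum_region("UTR5"),
             cds_length = sum_region("CDS"),
             utr3_length = sum_region("UTR3"),
             stringsAsFactors = FALSE)
}
```

The per-transcript loop is quadratic in the worst case; it is written for readability rather than speed.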

'load_fasta' reads in a FASTA-formatted file and generates a character vector of transcript sequences
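
To make the expected return shape concrete, here is a minimal base R sketch that reads a FASTA file into a named character vector. The name 'load_fasta_sketch' is hypothetical and this is not necessarily how choros does it. It assumes a well-formed file in which every header line is followed by at least one sequence line, and it names each sequence by the first whitespace-delimited token of its header.

```r
# Hypothetical helper, not the choros implementation.
# Assumes every ">" header is followed by at least one sequence line.
load_fasta_sketch <- function(transcript_fa_fname) {
  lines <- readLines(transcript_fa_fname)
  lines <- lines[nzchar(lines)]            # drop blank lines
  is_header <- startsWith(lines, ">")
  # Each sequence line belongs to the most recent header line
  record_id <- cumsum(is_header)
  seqs <- vapply(split(lines[!is_header], record_id[!is_header]),
                 paste, character(1), collapse = "")
  # Name each sequence by the first token of its header
  names(seqs) <- sub("^>([^[:space:]]+).*$", "\\1", lines[is_header])
  seqs
}
```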

'load_offsets' reads in a .txt file containing A-site offset values per RPF length and frame. Row names correspond to RPF length and column names correspond to RPF frame (named 'frame_0', 'frame_1', and 'frame_2').
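
The wide-to-long reshaping implied by this layout can be sketched in base R as follows. The name 'load_offsets_sketch' and the output column names ('length', 'frame', 'offset') are hypothetical; the columns choros actually returns may differ. The sketch assumes the file was written in the style of 'write.table' with row names, so the header line has one fewer field than the data rows and 'read.table' takes the first column as row names.

```r
# Hypothetical helper, not the choros implementation.
# Assumes the header line has one fewer field than the data rows,
# so read.table() uses the first column (RPF length) as row names.
load_offsets_sketch <- function(offsets_fname) {
  wide <- read.table(offsets_fname, header = TRUE)
  frames <- c("frame_0", "frame_1", "frame_2")
  # Stack the three frame columns: all frame_0 rows, then frame_1, then frame_2
  data.frame(
    length = rep(as.integer(rownames(wide)), times = length(frames)),
    frame = rep(0:2, each = nrow(wide)),
    offset = unlist(wide[frames], use.names = FALSE)
  )
}
```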

'read_fasta_as_codons' reads in a FASTA-formatted file and generates a list of character vectors where transcript sequences have been split into codons.
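
The codon-splitting step can be illustrated with a small base R sketch. This page does not describe how 'read_fasta_as_codons' uses the lengths file to choose where the first codon begins, so the sketch takes a hypothetical 'start' argument instead. The helper name 'split_into_codons' is also hypothetical, and trailing bases that do not fill a codon are dropped.

```r
# Hypothetical helper, not the choros implementation.
# 'start' is the 1-based position of the first base of the first codon.
split_into_codons <- function(sequence, start = 1) {
  n_codons <- (nchar(sequence) - start + 1) %/% 3
  if (n_codons < 1) return(character(0))
  starts <- seq(start, by = 3, length.out = n_codons)
  substring(sequence, starts, starts + 2)
}

transcripts <- c(tx1 = "ATGGCCTAA", tx2 = "GGATGAAATAGCC")
# lapply() keeps the names, giving a named list of character vectors
codons <- lapply(transcripts, split_into_codons)
# codons$tx1 is c("ATG", "GCC", "TAA")
# split_into_codons(transcripts[["tx2"]], start = 3) is c("ATG", "AAA", "TAG")
```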

Usage

load_lengths(lengths_fname)

gff_to_lengths(gff_fname)

load_fasta(transcript_fa_fname)

load_offsets(offsets_fname)

read_fasta_as_codons(transcript_fa_fname, transcript_length_fname)

Arguments

gff_fname

character; file path to GFF annotation file

transcript_fa_fname

character; file path to transcriptome .fasta file

offsets_fname

character; file path to A site assignment rules .txt file

transcript_length_fname

character; file path to transcriptome lengths file

lengths_fname

character; file path to transcriptome lengths file

Value

A data frame containing transcript names, 5' UTR lengths, CDS lengths, and 3' UTR lengths

A data frame containing transcript names, 5' UTR lengths, CDS lengths, and 3' UTR lengths

A named character vector of transcript sequences

A data frame in long format of A site offset values per RPF length and frame

A named list of character vectors of transcript sequences as codons


amandamok/choros documentation built on March 15, 2023, 7:57 p.m.