sort_tr2g: Sort transcripts to the same order as in kallisto index

View source: R/tr2g.R

sort_tr2g    R Documentation

Sort transcripts to the same order as in kallisto index

Description

This function takes the data frame output by the tr2g_* family of functions in this package and sorts it so that the transcripts are in the same order as in the kallisto index used to generate the BUS file. Sorting is required to obtain the correct sparse matrix from the BUS file, because equivalence classes refer to transcripts by their position in the kallisto index.
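The reordering described above can be sketched in base R with match(). This is only an illustration of the idea, not the package's implementation; the data frame tr2g_toy and the vector index_order are hypothetical toy inputs, with index_order standing in for the transcript order of a kallisto index.

```r
# Hypothetical toy data: a tr2g-like data frame in arbitrary order
tr2g_toy <- data.frame(
  transcript = c("tx3", "tx1", "tx2"),
  gene = c("geneB", "geneA", "geneA"),
  stringsAsFactors = FALSE
)

# Hypothetical transcript order, standing in for the order in the kallisto index
index_order <- c("tx1", "tx2", "tx3")

# For each transcript in the index, find its row in tr2g_toy
ord <- match(index_order, tr2g_toy$transcript)

# Reorder the rows so that row i describes the i-th transcript of the index
sorted_toy <- tr2g_toy[ord, , drop = FALSE]
rownames(sorted_toy) <- NULL
```

After this step, the rows of sorted_toy follow index_order, so a position in the index can be used directly to look up the corresponding gene.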

Usage

sort_tr2g(tr2g, file, kallisto_out_path)

Arguments

tr2g

The data frame output from the tr2g_* family of functions.

file

Character vector of length 1, path to a TSV file with transcript IDs and the corresponding gene IDs, either in the format required by bustools or as written by save_tr2g_bustools.

kallisto_out_path

Character vector of length 1, path to the directory for the outputs of kallisto bus.

Details

Since the attribute field of GTF and GFF3 files varies across sources, output from tr2g_gtf and tr2g_gff3 may need further cleanup. You may also supply gene and transcript IDs from other sources. This function should be used after the cleanup, when the transcript IDs in the cleaned-up data frame have the same format as those in transcript
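As one example of such a cleanup, the sketch below assumes that the only mismatch is a trailing version suffix (such as ".2") on some transcript IDs. The IDs are illustrative toy values and the approach is an assumption; other sources may need different handling.

```r
# Toy transcript IDs, some carrying a version suffix (assumed mismatch)
ids <- c("ENST00000456328.2", "ENST00000450305.2", "ENST00000488147")

# Strip a trailing ".<digits>" version suffix, if present
ids_clean <- sub("\\.[0-9]+$", "", ids)
```

IDs without a suffix are left unchanged, so the same call can be applied to a whole transcript column before sorting.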

Value

A data frame with columns transcript and gene and the other columns present in tr2g or the data frame in file, with the transcript IDs sorted to be in the same order as in the kallisto index.

Note

This function has been superseded by the new versions of the tr2g_* functions, which can extract the transcriptome for only the specified biotypes and only the standard chromosomes. The new versions also sort the transcriptome, so that tr2g and the transcriptome have transcripts in the same order.

See Also

Other functions to retrieve transcript and gene info: tr2g_EnsDb(), tr2g_TxDb(), tr2g_ensembl(), tr2g_fasta(), tr2g_gff3(), tr2g_gtf(), transcript2gene()

Examples

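library(BUSpaRse)
# Locate the toy data directory shipped with the package
toy_path <- system.file("testdata", package = "BUSpaRse")
file_use <- file.path(toy_path, "gtf_test.gtf")
# Get transcript and gene IDs without extracting the transcriptome or writing files
tr2g <- tr2g_gtf(file = file_use, get_transcriptome = FALSE,
  write_tr2g = FALSE, save_filtered_gtf = FALSE, transcript_version = NULL)
# Sort using the kallisto bus output directory (here, the toy data directory)
tr2g <- sort_tr2g(tr2g, kallisto_out_path = toy_path)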

BUStools/BUSpaRse documentation built on Aug. 2, 2024, 5:07 a.m.