get_velocity_files | R Documentation |
Computation of RNA velocity requires the number of unspliced transcripts, which can be quantified with reads containing intronic sequences. This function extracts intronic sequences flanked by L-1 bases of exonic sequences where L is the biological read length of the single cell technology of interest. The flanking exonic sequences are included for reads partially mapping to an intron and an exon.
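The flanking rule described above can be illustrated with plain coordinate arithmetic. This is only a sketch in base R: the intron coordinates and chromosome length are hypothetical, and it ignores the short-exon case and anything else the function does internally.

```r
# Hypothetical values for illustration only
L     <- 91L        # biological read length (e.g. 10xv3)
flank <- L - 1L     # exonic bases kept on each side of an intron

# Hypothetical intron on a chromosome of length 5000 (1-based, inclusive)
chr_length   <- 5000L
intron_start <- 1001L
intron_end   <- 1500L

# Extend the intron into the neighbouring exons, clipped to the chromosome ends
flanked_start <- max(1L, intron_start - flank)
flanked_end   <- min(chr_length, intron_end + flank)
flanked_width <- flanked_end - flanked_start + 1L
```

With L - 1 flanking bases, a read of length L that overlaps the intron by even a single base still fits entirely within the flanked sequence.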
get_velocity_files(
X,
L,
Genome,
Transcriptome = NULL,
out_path = ".",
style = c("genome", "Ensembl", "UCSC", "NCBI", "other"),
isoform_action = c("separate", "collapse"),
exon_option = c("full", "junction"),
compress_fa = FALSE,
width = 80L,
...
)
## S4 method for signature 'GRanges'
get_velocity_files(
X,
L,
Genome,
Transcriptome = NULL,
out_path = ".",
style = c("genome", "Ensembl", "UCSC", "NCBI", "other"),
isoform_action = c("separate", "collapse"),
exon_option = c("full", "junction"),
compress_fa = FALSE,
width = 80L,
transcript_id = "transcript_id",
gene_id = "gene_id",
transcript_version = "transcript_version",
gene_version = "gene_version",
version_sep = ".",
transcript_biotype_col = "transcript_biotype",
gene_biotype_col = "gene_biotype",
transcript_biotype_use = "all",
gene_biotype_use = "all",
chrs_only = TRUE,
save_filtered_gtf = FALSE
)
## S4 method for signature 'character'
get_velocity_files(
X,
L,
Genome,
Transcriptome = NULL,
out_path = ".",
style = c("genome", "Ensembl", "UCSC", "NCBI", "other"),
isoform_action = c("separate", "collapse"),
exon_option = c("full", "junction"),
compress_fa = FALSE,
width = 80L,
is_circular = NULL,
transcript_id = "transcript_id",
gene_id = "gene_id",
transcript_version = "transcript_version",
gene_version = "gene_version",
version_sep = ".",
transcript_biotype_col = "transcript_biotype",
gene_biotype_col = "gene_biotype",
transcript_biotype_use = "all",
gene_biotype_use = "all",
chrs_only = TRUE,
save_filtered_gtf = FALSE
)
## S4 method for signature 'TxDb'
get_velocity_files(
X,
L,
Genome,
Transcriptome,
out_path,
style = c("genome", "Ensembl", "UCSC", "NCBI", "other"),
isoform_action = c("separate", "collapse"),
exon_option = c("full", "junction"),
compress_fa = FALSE,
width = 80L,
chrs_only = TRUE
)
## S4 method for signature 'EnsDb'
get_velocity_files(
X,
L,
Genome,
Transcriptome,
out_path,
style = c("genome", "Ensembl", "UCSC", "NCBI", "other"),
isoform_action = c("separate", "collapse"),
exon_option = c("full", "junction"),
compress_fa = FALSE,
width = 80L,
use_transcript_version = TRUE,
use_gene_version = TRUE,
transcript_biotype_col = "TXBIOTYPE",
gene_biotype_col = "GENEBIOTYPE",
transcript_biotype_use = "all",
gene_biotype_use = "all",
chrs_only = TRUE
)
X |
Gene annotation with transcript and exon information. It can be a
path to a GTF file with annotation of exon coordinates of
transcripts, preferably from Ensembl. In the metadata, the following fields
are required: type (e.g. whether the range of interest is a gene or
transcript or exon or CDS), gene ID, and transcript ID. These
fields need not have standard names, as long as their names are specified
in arguments of this function. It can also be a |
L |
Length of the biological read. For instance, 10xv1: 98 nt,
10xv2: 98 nt, 10xv3: 91 nt, Drop-seq: 50 nt. If in doubt check read length
in a fastq file for biological reads with the |
Genome |
Either a |
Transcriptome |
A |
out_path |
Directory to save the outputs written to disk. If this directory does not exist, then it will be created. Defaults to the current working directory. |
style |
Formatting of chromosome names. Use
|
isoform_action |
Character, indicating action to take with different transcripts of the same gene. Must be one of the following:
|
exon_option |
Character, indicating how exonic sequences should be included in the kallisto index. Must be one of the following:
|
compress_fa |
Logical, whether to compress the output fasta file. If
|
width |
Maximum number of letters per line of sequence in the output fasta file. Must be an integer. |
... |
Extra arguments for methods. |
transcript_id |
Character vector of length 1. Tag in |
gene_id |
Character vector of length 1. Tag in |
transcript_version |
Character vector of length 1. Tag in |
gene_version |
Character vector of length 1. Tag in |
version_sep |
Character separating the main ID from the version number. Defaults to ".", as in Ensembl. |
transcript_biotype_col |
Character vector of length 1. Tag in
|
gene_biotype_col |
Character vector of length 1. Tag in |
transcript_biotype_use |
Character, can be "all" or
a vector of transcript biotypes to be used. Transcript biotypes aren't
entirely the same as gene biotypes. For instance, in Ensembl annotation,
|
gene_biotype_use |
Character, can be "all", "cellranger", or
a vector of gene biotypes to be used. If "cellranger", then the biotypes
used by Cell Ranger's reference are used. See |
chrs_only |
Logical, whether to include chromosomes only. GTF and
GFF files can contain annotations for scaffolds, which are not incorporated
into chromosomes. This will also exclude haplotypes. Defaults to |
save_filtered_gtf |
Logical. If filtering by type, biotype, and/or
chromosomes, whether to save the filtered |
is_circular |
Logical vector of the same length as the number of
sequences in the annotation and with the same names as the sequences,
indicating whether the sequence is circular. If |
use_transcript_version |
Logical, whether to include version number in the Ensembl transcript ID. |
use_gene_version |
Logical, whether to include version number in the Ensembl gene ID. Unlike transcript version number, it's up to you whether to include gene version number. |
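If the value to pass as L is unknown, one way to check it is to look at the sequence lines of a FASTQ file of biological reads. The following is a minimal sketch in base R, assuming an uncompressed four-line-per-record FASTQ file; the mock file and the variable names are made up for illustration.

```r
# Write a tiny mock FASTQ file so the sketch is self-contained
fastq_path <- tempfile(fileext = ".fastq")
writeLines(c(
  "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
  "@read2", "TTGCATTGCA", "+", "IIIIIIIIII"
), fastq_path)

# Each FASTQ record spans 4 lines; the sequence is the 2nd line of each record
n_reads  <- 2L
fq_lines <- readLines(fastq_path, n = 4L * n_reads)
seqs     <- fq_lines[seq(2L, length(fq_lines), by = 4L)]

# Tabulate the observed read lengths
read_lengths <- nchar(seqs)
print(table(read_lengths))

unlink(fastq_path)
```

For a real file, replace fastq_path with the path to the biological read file and raise n_reads to a few thousand records.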
The following files will be written to disk in the directory out_path:
A fasta file containing both the spliced transcripts
and the flanked intronic sequences. The intronic sequences are flanked by L-1
nt of exonic sequences to capture reads from nascent transcripts partially
mapping to exons. If the exon is shorter than 2*(L-1) nt, then the entire
exon will be included in the intronic sequence. This will be used to build
the kallisto index.
A text file of transcript IDs of spliced
transcripts. If exon_option == "junction", then this file contains the IDs of the exon-exon
junctions. These IDs will have the pattern "transcript ID"-Jx, where x is a
number differentiating between different junctions of the same transcript.
Here x will always be ordered from 5' to 3' as on the plus strand.
A text file of IDs of introns. The names will have the pattern "transcript ID"-Ix, where x is a number differentiating between introns of the same transcript. If all transcripts of the same gene are collapsed before inferring intronic sequences, gene ID will be used in place of transcript ID. Here x will always be ordered from 5' to 3' as on the plus strand.
A text file with two columns matching transcripts and introns to genes. The first column is transcript or intron ID, and the second column is the corresponding gene ID. The part for transcripts is generated from the gene annotation supplied.
Nothing is returned into the R session.
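As a sketch of how the two-column file could be inspected afterwards, the base R snippet below reads a mock stand-in and separates intron IDs from transcript IDs using the "-Ix" suffix described above. The IDs are hypothetical, and the file layout (tab-separated, no header) is an assumption rather than something stated in this documentation.

```r
# Mock stand-in for the two-column transcript/intron-to-gene file
# (assumed here to be tab-separated without a header; IDs are hypothetical)
tr2g_path <- tempfile(fileext = ".txt")
writeLines(c(
  "TX1\tGENE1",
  "TX2\tGENE1",
  "TX1-I1\tGENE1",
  "TX3\tGENE2",
  "TX3-I1\tGENE2",
  "TX3-I2\tGENE2"
), tr2g_path)

tr2g <- read.table(tr2g_path, sep = "\t", header = FALSE,
                   col.names = c("id", "gene"), stringsAsFactors = FALSE)

# Intron IDs end in "-I" followed by a number; everything else is a transcript
is_intron <- grepl("-I[0-9]+$", tr2g$id)

# Count transcripts and introns per gene
n_by_gene <- table(gene = tr2g$gene,
                   type = ifelse(is_intron, "intron", "transcript"))
print(n_by_gene)

unlink(tr2g_path)
```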
# Use toy example
toy_path <- system.file("testdata", package = "BUSpaRse")
file <- file.path(toy_path, "velocity_annot.gtf")
genome <- Biostrings::readDNAStringSet(file.path(toy_path, "velocity_genome.fa"))
transcriptome <- file.path(toy_path, "velocity_tx.fa")
get_velocity_files(file, 11, genome, transcriptome, ".",
  gene_version = NULL, transcript_version = NULL)
# Clean up output of the example
file.remove("cDNA_introns.fa", "cDNA_tx_to_capture.txt",
"introns_tx_to_capture.txt", "tr2g.txt")