makeTxDb is a low-level constructor for making
a TxDb object from user-supplied transcript annotations.
Note that the end user will rarely need to use makeTxDb directly
but will typically use one of the high-level constructors
makeTxDbFromUCSC, makeTxDbFromEnsembl,
or makeTxDbFromGFF.
transcripts: Data frame containing the genomic locations of a set of transcripts.
splicings: Data frame containing the exon and CDS locations of a set of transcripts.
genes: Data frame containing the genes associated with a set of transcripts.
chrominfo: Data frame containing information about the chromosomes hosting the set of transcripts.
metadata: 2-column data frame containing meta information about this set of
transcripts like organism, genome, UCSC table, etc.
The names of the columns must be
reassign.ids:
on.foreign.transcripts: Controls what to do when the input contains foreign transcripts
i.e. transcripts that are on sequences not in
The transcripts (required), splicings (required)
and genes (optional) arguments must be data frames that
describe a set of transcripts and the genomic features related
to them (exons, CDS and genes at the moment).
The chrominfo (optional) argument must be a data frame
containing chromosome information like the length of each chromosome.
transcripts must have 1 row per transcript and the following
columns:
tx_id: Transcript ID. Integer vector. No NAs. No duplicates.
tx_chrom: Transcript chromosome. Character vector (or factor)
with no NAs.
tx_strand: Transcript strand. Character vector (or factor)
with no NAs where each element is either "+" or "-".
tx_start, tx_end: Transcript start and end.
Integer vectors with no NAs.
tx_name: [optional] Transcript name. Character vector (or
factor). NAs and/or duplicates are ok.
tx_type: [optional] Transcript type (e.g. mRNA, ncRNA, snoRNA,
etc.). Character vector (or factor). NAs and/or duplicates are ok.
gene_id: [optional] Associated gene. Character vector (or
factor). NAs and/or duplicates are ok.
Other columns, if any, are ignored (with a warning).
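The requirements on the required columns can be checked with base R before the data frame is handed to makeTxDb. The following is only a minimal sketch: the helper name check_transcripts is hypothetical (it is not part of any package), the values are toy data, and makeTxDb performs its own validation regardless.

```r
# Hypothetical helper (not part of any package): checks the required
# columns of a 'transcripts' data frame as described above.
check_transcripts <- function(transcripts) {
  required <- c("tx_id", "tx_chrom", "tx_strand", "tx_start", "tx_end")
  missing_cols <- setdiff(required, colnames(transcripts))
  if (length(missing_cols) > 0L)
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  stopifnot(
    !any(is.na(transcripts[required])),            # no NAs in required columns
    anyDuplicated(transcripts$tx_id) == 0L,        # tx_id must be unique
    all(transcripts$tx_strand %in% c("+", "-"))    # strand is "+" or "-"
  )
  invisible(TRUE)
}

# Toy data: one row per transcript, with two of the optional columns.
transcripts <- data.frame(
  tx_id = 1:3,
  tx_chrom = "chr1",
  tx_strand = c("-", "+", "+"),
  tx_start = c(1L, 2001L, 2001L),
  tx_end = c(999L, 2199L, 2199L),
  tx_name = c("txA", NA, "txA"),            # NAs and duplicates are ok
  gene_id = c("geneX", "geneY", "geneY")    # duplicates are ok
)
transcripts_ok <- check_transcripts(transcripts)
```

The optional tx_name and gene_id columns are deliberately left out of the checks, since NAs and duplicates are allowed in them.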
splicings must have N rows per transcript, where N is the number
of exons in the transcript. Each row describes an exon plus, optionally,
the CDS contained in this exon. Its columns must be:
tx_id: Foreign key that links each row in the splicings
data frame to a unique row in the transcripts data frame.
Note that more than 1 row in splicings can be linked to the
same row in transcripts (many-to-one relationship).
Same type as transcripts$tx_id (integer vector). No NAs.
All the values in this column must be present in
transcripts$tx_id.
exon_rank: The rank of the exon in the transcript.
Integer vector with no NAs. (tx_id, exon_rank)
pairs must be unique.
exon_id: [optional] Exon ID.
Integer vector with no NAs.
exon_name: [optional] Exon name. Character vector (or factor).
NAs and/or duplicates are ok.
exon_chrom: [optional] Exon chromosome.
Character vector (or factor) with no NAs.
If missing then transcripts$tx_chrom is used.
If present then exon_strand must also be present.
exon_strand: [optional] Exon strand.
Character vector (or factor) with no NAs.
If missing then transcripts$tx_strand is used
and exon_chrom must also be missing.
exon_start, exon_end: Exon start and end.
Integer vectors with no NAs.
cds_id: [optional] CDS ID. Integer vector.
If present then cds_start and cds_end must also
be present.
NAs are allowed and must match those in cds_start and
cds_end.
cds_name: [optional] CDS name. Character vector (or factor).
If present then cds_start and cds_end must also be
present. NAs and/or duplicates are ok. Must contain NAs at least
where cds_start and cds_end contain them.
cds_start, cds_end: [optional] CDS start and end.
Integer vectors.
If one of the 2 columns is missing then all cds_* columns
must be missing.
NAs are allowed and must occur at the same positions in
cds_start and cds_end.
cds_phase: [optional] CDS phase. Integer vector.
If present then cds_start and cds_end must also
be present.
NAs are allowed and must match those in cds_start and
cds_end.
Other columns, if any, are ignored (with a warning).
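Several of the splicings constraints are relational (foreign key, unique pairs, matching NA positions) and may be easier to see in code. The sketch below checks a subset of them with base R; check_splicings is a hypothetical helper name, the data are toy values, and the checks are not a substitute for the validation done by makeTxDb itself.

```r
# Hypothetical helper (not part of any package): checks a subset of the
# 'splicings' constraints listed above against a 'transcripts' data frame.
check_splicings <- function(splicings, transcripts) {
  stopifnot(
    !any(is.na(splicings$tx_id)),
    all(splicings$tx_id %in% transcripts$tx_id),               # foreign key
    !any(is.na(splicings$exon_rank)),
    anyDuplicated(splicings[c("tx_id", "exon_rank")]) == 0L    # unique pairs
  )
  has_start <- "cds_start" %in% colnames(splicings)
  has_end <- "cds_end" %in% colnames(splicings)
  if (has_start != has_end)
    stop("cds_start and cds_end must be both present or both missing")
  if (has_start)
    # NAs must occur at the same positions in cds_start and cds_end
    stopifnot(identical(is.na(splicings$cds_start), is.na(splicings$cds_end)))
  invisible(TRUE)
}

transcripts <- data.frame(
  tx_id = 1:2,
  tx_chrom = "chr1",
  tx_strand = c("-", "+"),
  tx_start = c(1L, 2001L),
  tx_end = c(999L, 2199L)
)
# One row per exon; transcript 2 has three exons (many-to-one on tx_id).
# The last two exons carry no CDS, so both cds_* columns are NA there.
splicings <- data.frame(
  tx_id = c(1L, 2L, 2L, 2L),
  exon_rank = c(1L, 1L, 2L, 3L),
  exon_start = c(1L, 2001L, 2101L, 2131L),
  exon_end = c(999L, 2085L, 2144L, 2199L),
  cds_start = c(1L, 2022L, NA, NA),
  cds_end = c(999L, 2085L, NA, NA)
)
splicings_ok <- check_splicings(splicings, transcripts)
```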
genes should not be supplied if transcripts has a
gene_id column. If supplied, it must have N rows per transcript,
where N is the number of genes linked to the transcript (N will be 1 most
of the time). Its columns must be:
tx_id: [optional] genes must have either a
tx_id or a tx_name column but not both.
Like splicings$tx_id, this is a foreign key that
links each row in the genes data frame to a unique
row in the transcripts data frame.
tx_name: [optional]
Can be used as an alternative to the genes$tx_id
foreign key.
gene_id: Gene ID. Character vector (or factor). No NAs.
Other columns, if any, are ignored (with a warning).
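As an illustration of the layout described above, the sketch below builds a genes data frame keyed on tx_id, in which one transcript is linked to two genes and therefore gets two rows. The gene names and values are made up, and the checks are plain base R written for this example only.

```r
# Toy transcripts without a gene_id column, so 'genes' may be supplied.
transcripts <- data.frame(
  tx_id = 1:3,
  tx_chrom = "chr1",
  tx_strand = c("-", "+", "+"),
  tx_start = c(1L, 2001L, 2001L),
  tx_end = c(999L, 2199L, 2199L)
)

# Transcript 3 is linked to two genes, hence two rows for tx_id 3.
genes <- data.frame(
  tx_id = c(1L, 2L, 3L, 3L),
  gene_id = c("geneA", "geneB", "geneB", "geneC")
)

stopifnot(
  !("gene_id" %in% colnames(transcripts)),            # otherwise omit 'genes'
  xor("tx_id" %in% colnames(genes),
      "tx_name" %in% colnames(genes)),                # one key column, not both
  all(genes$tx_id %in% transcripts$tx_id),            # foreign key into transcripts
  !any(is.na(genes$gene_id))                          # no NAs in gene_id
)

# Number of genes linked to each transcript (the "N" in the text above).
genes_per_tx <- table(genes$tx_id)
```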
chrominfo must have 1 row per chromosome and the following
columns:
chrom: Chromosome name.
Character vector (or factor) with no NAs and no duplicates.
length: Chromosome length.
Integer vector with either all NAs or no NAs.
is_circular: [optional] Chromosome circularity flag.
Logical vector. NAs are ok.
Other columns, if any, are ignored (with a warning).
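A chrominfo data frame following this layout might look like the sketch below. The chromosome names and lengths are illustrative toy values, not real genome data, and the checks are base R assertions written for this example.

```r
# Toy chromosome table: one row per chromosome.
chrominfo <- data.frame(
  chrom = c("chr1", "chrM"),
  length = c(3000L, 500L),          # illustrative lengths only
  is_circular = c(FALSE, TRUE)      # optional; NAs would also be accepted
)

stopifnot(
  !any(is.na(chrominfo$chrom)),
  anyDuplicated(chrominfo$chrom) == 0L,                        # no duplicates
  all(is.na(chrominfo$length)) || !any(is.na(chrominfo$length)),  # all or no NAs
  is.logical(chrominfo$is_circular)
)
```

Supplying lengths for some chromosomes but not others would violate the "either all NAs or no NAs" rule, which is what the third assertion guards against.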
A TxDb object.
Hervé Pagès
makeTxDbFromUCSC, makeTxDbFromBiomart,
and makeTxDbFromEnsembl, for making a TxDb
object from online resources.
makeTxDbFromGRanges and makeTxDbFromGFF
for making a TxDb object from a GRanges
object, or from a GFF or GTF file.
The TxDb class.
saveDb and
loadDb in the AnnotationDbi
package for saving and loading a TxDb object as an SQLite
file.
# Three transcripts on chr1; transcripts 2 and 3 span the same range.
transcripts <- data.frame(
    tx_id=1:3,
    tx_chrom="chr1",
    tx_strand=c("-", "+", "+"),
    tx_start=c(1, 2001, 2001),
    tx_end=c(999, 2199, 2199))
# One row per exon. Transcript 3 has no CDS, so its cds_* values are NA.
splicings <- data.frame(
    tx_id=c(1L, 2L, 2L, 2L, 3L, 3L),
    exon_rank=c(1, 1, 2, 3, 1, 2),
    exon_start=c(1, 2001, 2101, 2131, 2001, 2131),
    exon_end=c(999, 2085, 2144, 2199, 2085, 2199),
    cds_start=c(1, 2022, 2101, 2131, NA, NA),
    cds_end=c(999, 2085, 2144, 2193, NA, NA),
    cds_phase=c(0, 0, 2, 0, NA, NA))
# Build the TxDb object from the two required data frames.
txdb <- makeTxDb(transcripts, splicings)