describeGtfAttrNames: describe gtf/gff3 attribute names by feature type

describeGtfAttrNames    R Documentation

describe gtf/gff3 attribute names by feature type

Description

Summarizes the attribute names observed in a GTF or GFF3 file, grouped by feature type.

Usage

describeGtfAttrNames(
  GTF,
  geneFeatureType = "gene",
  txFeatureType = c("transcript", "mRNA"),
  nrows = 10000,
  maxNper = 10,
  maxAttrs = 50,
  zcat_command = "zcat",
  verbose = FALSE,
  ...
)

Arguments

GTF

character path to GTF or GFF3 file, or data.frame containing GTF or GFF3 data.

geneFeatureType, txFeatureType

character vectors with values in column 3 of the GTF or GFF3 file, used to subset then split the output data.

  • Return all feature types by providing any of these terms: ".", "any", "all"

nrows

integer maximum number of rows to process. Because the goal is only to summarize the kind of data seen for each feature type, a subset of rows is usually sufficient.

maxNper

integer default 10, number of entries retained within each feature type. Set to Inf to retain all data. As a brief summary, 10 is sufficient to show typical content.

maxAttrs

integer default 50, maximum number of attributes to retain for each entry. Only in rare cases are more than 50 attributes present for one record, and typically in those cases the extra attributes are not necessary for annotation purposes.

zcat_command

character name or path to the zcat command or equivalent, used only when the R package 'R.utils' is not installed and the input GTF has a .gz file extension.

verbose

logical indicating whether to print verbose output.

...

additional arguments are ignored.

Details

  • Note that when the "name" in a name/value pair is repeated, the first instance retains the name, while subsequent instances are versioned by jamba::makeNames(x, renameFirst=FALSE). For example, if "tag" appears three times in one record, the resulting colnames become: c("tag", "tag_v1", "tag_v2").
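
The renaming rule above can be illustrated with a minimal base R sketch. It does not call jamba::makeNames(); the helper version_names() and the sample attribute string are hypothetical, written only to mimic the behavior described for renameFirst=FALSE.

```r
# Hypothetical helper (not part of splicejam or jamba): keep the first
# occurrence of each name unchanged, and append "_v1", "_v2", ... to repeats.
version_names <- function(x) {
  # running count per name: 1 for the first occurrence, 2 for the second, ...
  occurrence <- ave(seq_along(x), x, FUN = seq_along)
  ifelse(occurrence == 1, x, paste0(x, "_v", occurrence - 1))
}

# Made-up GTF-style attribute string in which "tag" is repeated.
attr_string <- 'gene_id "G1"; transcript_id "T1"; tag "basic"; tag "CCDS"; tag "MANE"'

# Split into name/value pairs and drop any empty trailing element.
pairs <- trimws(strsplit(attr_string, ";", fixed = TRUE)[[1]])
pairs <- pairs[nzchar(pairs)]

# The name is everything before the first whitespace; the value is the rest,
# with surrounding double quotes removed.
attr_names <- sub("[[:space:]].*$", "", pairs)
attr_values <- gsub('"', "", sub("^[^[:space:]]+[[:space:]]+", "", pairs))

# Repeated names are versioned so each value gets its own column name.
names(attr_values) <- version_names(attr_names)
print(attr_values)
```

Keeping the first name unchanged means a record with a single "tag" and a record with several both populate the same "tag" column.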

Value

list named by c(geneFeatureType, txFeatureType), containing data.frame objects in which the name/value pairs have been split into columns. Each data.frame may have different columns, depending on the name/value pairs observed for that feature type.

See Also

Other jam gtf functions: getGtfAttrs(), makeTx2geneFromGtf(), readGtf()
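
Examples

The following is a minimal usage sketch, not taken from the package itself. It writes a small made-up GTF file to a temporary path, then calls describeGtfAttrNames() only if the splicejam package is installed. The file contents and the variable names gtf_lines, gtf_file, and attr_summary are assumptions for illustration.

```r
# Three made-up GTF records, each with nine tab-separated columns.
gtf_lines <- c(
  paste("chr1", "example", "gene", 1000, 5000, ".", "+", ".",
        'gene_id "G1"; gene_name "GeneA";', sep = "\t"),
  paste("chr1", "example", "transcript", 1000, 5000, ".", "+", ".",
        'gene_id "G1"; transcript_id "T1"; tag "basic"; tag "CCDS";', sep = "\t"),
  paste("chr1", "example", "exon", 1000, 1200, ".", "+", ".",
        'gene_id "G1"; transcript_id "T1"; exon_number "1";', sep = "\t")
)
gtf_file <- tempfile(fileext = ".gtf")
writeLines(gtf_lines, gtf_file)

# Run only when splicejam is available; arguments follow the Usage section.
if (requireNamespace("splicejam", quietly = TRUE)) {
  attr_summary <- splicejam::describeGtfAttrNames(
    GTF = gtf_file,
    geneFeatureType = "gene",
    txFeatureType = c("transcript", "mRNA"),
    nrows = 1000
  )
  # One list element is expected per feature type found in the file.
  print(names(attr_summary))
  # Column names show which attribute names were seen for each feature type.
  print(lapply(attr_summary, colnames))
}
```

To summarize every feature type rather than only gene and transcript rows, the Arguments section indicates that ".", "any", or "all" can be supplied as the feature type.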


jmw86069/splicejam documentation built on April 14, 2025, 3:12 a.m.