hpgltools: A pile of (hopefully) useful R functions

gather_utrs_padding

R Documentation

Take a BSgenome and a data frame of chr/start/end/strand, and provide 5' and 3' padded sequences.

Description

For some species, we do not have a fully realized set of UTR boundaries, so it can be useful to query an arbitrary but consistent amount of sequence before and after every CDS. This function provides that sequence. Note: I decided to use tibble for this so that accidentally printing too much will not overwhelm the console.

Usage

gather_utrs_padding(
  bsgenome,
  annot_df,
  gid = NULL,
  name_column = "gid",
  chr_column = "chromosome",
  start_column = "start",
  end_column = "end",
  strand_column = "strand",
  type_column = "annot_gene_type",
  gene_type = "protein coding",
  padding = 120,
  ...
)

Arguments

`bsgenome`	BSgenome object containing the genome of interest.
`annot_df`	Annotation data frame containing all the entries of interest. This is generally extracted using a function in the load_something_annotations() family (load_orgdb_annotations() being the most likely).
`gid`	Specific GID(s) to query.
`name_column`	Give each gene a name using this column.
`chr_column`	Column name of the chromosome names.
`start_column`	Column name of the start information.
`end_column`	Column name of the end information.
`strand_column`	Column name of the strand information.
`type_column`	Subset the annotation data using this column, if not null.
`gene_type`	Subset the annotation data using the type_column with this type.
`padding`	Number of nucleotides of flanking sequence to return on the 5' and 3' side of each gene.
`...`	Arguments passed to child functions (I think none currently).
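
To make the idea of a consistent amount of sequence before and after every CDS concrete, the following is a minimal base R sketch of the coordinate arithmetic only. It is not the hpgltools implementation: pad_windows() is a hypothetical helper, the toy annotation uses the default column names from the usage above, and it assumes strands are encoded as "+" and "-". It does not touch a BSgenome object, and it does not clamp windows at the chromosome end, which would require the chromosome lengths.

```r
# Hypothetical helper (not part of hpgltools): compute strand-aware
# flanking windows of `padding` nucleotides around each feature.
pad_windows <- function(annot, padding = 120) {
  # Assumption: strand is encoded as "+" or "-".
  plus <- annot[["strand"]] == "+"
  # Window on the low-coordinate side, never starting before position 1.
  low_start <- pmax(1, annot[["start"]] - padding)
  low_end <- annot[["start"]] - 1
  # Window on the high-coordinate side (not clamped to chromosome length).
  high_start <- annot[["end"]] + 1
  high_end <- annot[["end"]] + padding
  data.frame(
    gid = annot[["gid"]],
    chromosome = annot[["chromosome"]],
    strand = annot[["strand"]],
    # On the plus strand the 5' side is the low-coordinate side;
    # on the minus strand the two sides swap.
    fivep_start = ifelse(plus, low_start, high_start),
    fivep_end = ifelse(plus, low_end, high_end),
    threep_start = ifelse(plus, high_start, low_start),
    threep_end = ifelse(plus, high_end, low_end),
    stringsAsFactors = FALSE
  )
}

# Toy annotation with made-up coordinates, one gene per strand.
annot <- data.frame(
  gid = c("geneA", "geneB"),
  chromosome = c("chr1", "chr1"),
  start = c(1000, 5000),
  end = c(1900, 5600),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)
windows <- pad_windows(annot, padding = 120)
print(windows)
```

Windows like these could then be used to pull the flanking sequence out of a genome object, with the minus-strand sequences reverse-complemented. How gather_utrs_padding() handles those details internally is not stated on this page.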