gather_utrs_padding: Take a BSgenome and data frame of chr/start/end/strand,...

View source: R/sequence.R

gather_utrs_padding    R Documentation

Take a BSgenome and data frame of chr/start/end/strand, provide 5' and 3' padded sequence.

Description

For some species, we do not have a fully realized set of UTR boundaries, so it can be useful to query an arbitrary but consistent amount of sequence before and after every CDS. This function provides those sequences. Note: I decided to use a tibble for the result so that accidentally printing too much of it does not flood the console.
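The idea of a fixed amount of sequence before and after each CDS can be sketched as strand-aware coordinate arithmetic. This is a minimal illustration in base R, not the hpgltools implementation: the helper pad_windows() and its output column names are hypothetical, the toy table simply reuses the default column names from Usage, and strands are assumed to be encoded as "+" and "-".

```r
# Toy annotation table; column names mirror the defaults listed under Usage.
annot_df <- data.frame(
  gid = c("geneA", "geneB"),
  chromosome = c("chr1", "chr1"),
  start = c(11, 31),
  end = c(20, 40),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of hpgltools): compute the 5' and 3'
# padding windows for each row, assuming strand is "+" or "-".
pad_windows <- function(df, padding) {
  plus <- df[["strand"]] == "+"
  # Window to the left (lower coordinates) and right of each feature.
  left_start <- df[["start"]] - padding
  left_end <- df[["start"]] - 1
  right_start <- df[["end"]] + 1
  right_end <- df[["end"]] + padding
  # On the minus strand the 5' side is the higher-coordinate side.
  data.frame(
    gid = df[["gid"]],
    fivep_start = ifelse(plus, left_start, right_start),
    fivep_end = ifelse(plus, left_end, right_end),
    threep_start = ifelse(plus, right_start, left_start),
    threep_end = ifelse(plus, right_end, left_end),
    stringsAsFactors = FALSE
  )
}

windows <- pad_windows(annot_df, padding = 5)
print(windows)
```

The sketch does not clip windows that run past the start or end of a chromosome; real data would need that check.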

Usage

gather_utrs_padding(
  bsgenome,
  annot_df,
  gid = NULL,
  name_column = "gid",
  chr_column = "chromosome",
  start_column = "start",
  end_column = "end",
  strand_column = "strand",
  type_column = "annot_gene_type",
  gene_type = "protein coding",
  padding = 120,
  ...
)

Arguments

bsgenome

BSgenome object containing the genome of interest.

annot_df

Annotation data frame containing all the entries of interest. This is generally extracted using a function in the load_something_annotations() family (load_orgdb_annotations() being the most likely).

gid

Specific GID(s) to query.

name_column

Give each gene a name using this column.

chr_column

Column name of the chromosome names.

start_column

Column name of the start information.

end_column

Column name of the end information.

strand_column

Column name of the strand information.

type_column

Subset the annotation data using this column, if not null.

gene_type

Subset the annotation data using the type_column with this type.

padding

Return this number of nucleotides before (5') and after (3') each gene.

...

Arguments passed to child functions (probably none at present).
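The subsetting arguments can be pictured with a small base R sketch. It assumes, based only on the argument descriptions above, that rows are first filtered by gene_type in type_column (skipped when type_column is NULL) and then by gid in name_column; the helper subset_annotations() is hypothetical and the real function may order or implement these steps differently.

```r
# Toy annotation table using the default column names from Usage.
annot_df <- data.frame(
  gid = c("geneA", "geneB", "geneC"),
  annot_gene_type = c("protein coding", "rRNA", "protein coding"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of hpgltools) illustrating the filters.
subset_annotations <- function(df, gid = NULL, name_column = "gid",
                               type_column = "annot_gene_type",
                               gene_type = "protein coding") {
  # Filter by type only when a type column was requested.
  if (!is.null(type_column) && !is.null(gene_type)) {
    df <- df[df[[type_column]] %in% gene_type, , drop = FALSE]
  }
  # Optionally keep only the requested gene IDs.
  if (!is.null(gid)) {
    df <- df[df[[name_column]] %in% gid, , drop = FALSE]
  }
  df
}

coding <- subset_annotations(annot_df)
one_gene <- subset_annotations(annot_df, gid = "geneC")
everything <- subset_annotations(annot_df, type_column = NULL)
```

Using %in% rather than == keeps rows with a missing type from turning into NA rows in the result.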

Value

Data frame of UTR, CDS, and UTR+CDS sequences.
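To make the shape of such a result concrete, here is a minimal base R sketch that builds one row of 5' padding, CDS, 3' padding, and the combined sequence. A named character vector stands in for the BSgenome object, and the helpers revcomp() and padded_sequences(), along with the output column names, are hypothetical; the columns returned by the real function may be named differently.

```r
# A named character vector stands in for a BSgenome (assumption).
genome <- c(chr1 = "AAAAACCCCCGGGGGTTTTT")

# Reverse complement of plain A/C/G/T strings.
revcomp <- function(x) {
  vapply(
    strsplit(chartr("ACGT", "TGCA", x), ""),
    function(chars) paste(rev(chars), collapse = ""),
    character(1)
  )
}

# Hypothetical helper (not part of hpgltools): one feature, one row.
padded_sequences <- function(genome, chr, start, end, strand, padding) {
  left <- substr(genome[[chr]], start - padding, start - 1)
  body <- substr(genome[[chr]], start, end)
  right <- substr(genome[[chr]], end + 1, end + padding)
  if (strand == "-") {
    # Minus strand: swap the sides and reverse complement everything.
    fivep <- revcomp(right)
    cds <- revcomp(body)
    threep <- revcomp(left)
  } else {
    fivep <- left
    cds <- body
    threep <- right
  }
  data.frame(
    fivep = fivep, cds = cds, threep = threep,
    combined = paste0(fivep, cds, threep),
    stringsAsFactors = FALSE
  )
}

plus_row <- padded_sequences(genome, "chr1", 11, 15, "+", 5)
minus_row <- padded_sequences(genome, "chr1", 11, 15, "-", 5)
print(rbind(plus_row, minus_row))
```

For the minus-strand row the combined sequence is the reverse complement of the same genomic span, so the 5' padding comes from the higher coordinates.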


elsayed-lab/hpgltools documentation built on May 9, 2024, 5:02 a.m.