In elsayed-lab/hpgltools: A pile of (hopefully) useful R functions

pattern_count_genome

R Documentation

Find how many times a given pattern occurs in every gene of a genome.

Description

Knowing how many times a given string appears in a genome or in its CDS is sometimes helpful. This function counts those occurrences for each feature and is primarily used by cp_seq_m().
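As a rough illustration of the counting step only, the following base R sketch counts how often a pattern occurs in each of a set of named sequences. It is not the hpgltools implementation: the function name `count_pattern_per_seq` and the toy sequences are hypothetical, and the pattern is treated as a Perl regular expression. A zero-width lookahead is used so that overlapping matches are counted, which matters for self-overlapping patterns such as "AA" (for "TA" the overlapping and non-overlapping counts are the same).

```r
# Hypothetical sketch, not hpgltools' own code: count occurrences of a
# pattern in each named sequence, including overlapping matches.
count_pattern_per_seq <- function(seqs, pattern = "TA") {
  counts <- vapply(seqs, function(s) {
    # The zero-width lookahead lets matches overlap ("AA" in "AAA" -> 2).
    hits <- gregexpr(paste0("(?=", pattern, ")"), s, perl = TRUE)[[1]]
    # gregexpr() returns -1 when there is no match at all.
    if (hits[1] == -1) 0L else length(hits)
  }, integer(1))
  data.frame(ID = names(seqs), count = unname(counts),
             stringsAsFactors = FALSE)
}

seqs <- c(gene1 = "ATATAGC", gene2 = "GGGCCC", gene3 = "TATATA")
count_pattern_per_seq(seqs)
```

Here gene1 contains two TA sites, gene2 none, and gene3 three.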

Usage

pattern_count_genome(
  fasta,
  gff = NULL,
  pattern = "TA",
  type = "gene",
  id_col = "ID",
  key = NULL
)

Arguments

`fasta`	Genome sequence.
`gff`	GFF of annotation information from which to acquire the CDS; if it is not provided, the entire genome is queried.
`pattern`	Pattern to search for. This was written for Tn-seq, where TA is the mariner transposon insertion site.
`type`	Column to use in the GFF file.
`id_col`	Column containing the gene IDs.
`key`	Type of GFF entry to key from.
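To show how the `gff`, `type`, and `id_col` arguments might fit together, here is a minimal base R sketch. It assumes a single chromosome held as a character string and a GFF-like data.frame with hypothetical columns `type`, `start`, `end`, and `ID`; it treats `type` as the feature type to keep (an assumption suggested by the default "gene", since the argument is documented only as the column to use), ignores strand, and omits `key`. The function name `count_in_features` is hypothetical and this is not the package's implementation.

```r
# Hypothetical sketch, not hpgltools' own code. `genome` is one chromosome
# as a string; `annot` is a GFF-like data.frame (type, start, end, ID).
count_in_features <- function(genome, annot = NULL, pattern = "TA",
                              type = "gene", id_col = "ID") {
  count_one <- function(s) {
    # Zero-width lookahead so overlapping matches are also counted.
    hits <- gregexpr(paste0("(?=", pattern, ")"), s, perl = TRUE)[[1]]
    if (hits[1] == -1) 0L else length(hits)
  }
  # Without annotation, count over the whole genome, as the gff docs describe.
  if (is.null(annot)) {
    return(data.frame(ID = "genome", count = count_one(genome),
                      stringsAsFactors = FALSE))
  }
  # Assumption: `type` selects which feature type to keep.
  feats <- annot[annot[["type"]] == type, , drop = FALSE]
  # Extract each feature (1-based, inclusive; strand is ignored here).
  seqs <- substring(genome, feats[["start"]], feats[["end"]])
  data.frame(ID = as.character(feats[[id_col]]),
             count = vapply(seqs, count_one, integer(1), USE.NAMES = FALSE),
             stringsAsFactors = FALSE)
}

genome <- "ATATAGCGGGCCCTATATA"
annot <- data.frame(type = c("gene", "gene", "exon"),
                    start = c(1, 8, 14), end = c(7, 13, 19),
                    ID = c("g1", "g2", "e1"), stringsAsFactors = FALSE)
count_in_features(genome, annot)  # g1 and g2 only; the exon row is skipped
count_in_features(genome)         # whole-genome count
```

Ignoring strand is harmless for "TA" because it is its own reverse complement; other patterns would need the minus-strand features reverse-complemented first.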

Details

This is another place where using mcols() might simplify the implementation.

Value

Data frame of gene names and the number of times the pattern appears in each gene.

Examples

 pa_data <- get_paeruginosa_data()
 pa_fasta <- pa_data[["fasta"]]
 pa_gff <- pa_data[["gff"]]
 ta_count <- pattern_count_genome(pa_fasta, pa_gff)
 head(ta_count)

elsayed-lab/hpgltools documentation built on May 9, 2024, 5:02 a.m.