pattern_count_genome: Find how many times a given pattern occurs in every gene of a...

View source: R/annotation_gff.R

pattern_count_genome    R Documentation

Find how many times a given pattern occurs in every gene of a genome.

Description

It is sometimes useful to know how many times a given string appears in a genome or in each of its CDS. This function provides that count and is primarily used by cp_seq_m().

Usage

pattern_count_genome(
  fasta,
  gff = NULL,
  pattern = "TA",
  type = "gene",
  id_col = "ID",
  key = NULL
)

Arguments

fasta

Genome sequence.

gff

GFF annotation file from which to acquire the CDS. If it is not provided, the entire genome is queried instead.

pattern

Pattern to search for. The default, TA, is the mariner insertion point; this function was originally used for TNSeq analyses.

type

Feature type to use from the gff file; the default is gene.

id_col

Column containing the gene IDs.

key

What type of entry of the gff file to key from?

Details

This is once again a place where mcols() usage might improve the overall quality of life.

Value

Data frame of gene names and the number of times the pattern appears in each gene.

See Also

Biostrings, Rsamtools::FaFile(), Biostrings::PDict()

Examples

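 ## Fetch the P. aeruginosa example data, then pull out the fasta and gff entries.
 pa_data <- get_paeruginosa_data()
 pa_fasta <- pa_data[["fasta"]]
 pa_gff <- pa_data[["gff"]]
 ## Count the default pattern ("TA") in every feature of the default type ("gene").
 ta_count <- pattern_count_genome(pa_fasta, pa_gff)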
 head(ta_count)
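
For intuition about what is being counted, here is a minimal base R sketch of per-gene pattern counting. It is not the implementation used by pattern_count_genome(), which reads a fasta and gff. The toy sequences, the helper count_pattern(), and the column names gene and count are all hypothetical and chosen only for illustration.

```r
## Toy sequences standing in for per-gene CDS (hypothetical names and data).
genes <- c(geneA = "ATATTA", geneB = "GGGCCC", geneC = "TATATA")

## Count occurrences of a literal pattern in one string by sliding a
## window of the pattern's width along it (overlapping hits are counted).
count_pattern <- function(sequence, pattern = "TA") {
  if (nchar(sequence) < nchar(pattern)) {
    return(0L)
  }
  starts <- seq_len(nchar(sequence) - nchar(pattern) + 1L)
  windows <- substring(sequence, starts, starts + nchar(pattern) - 1L)
  sum(windows == pattern)
}

## Assemble a data frame with one row per gene (assumed column names).
counts <- data.frame(
  gene = names(genes),
  count = vapply(genes, count_pattern, integer(1), USE.NAMES = FALSE)
)
print(counts)
```

On real genomes a vectorized matcher such as Biostrings::vcountPattern() is likely a better fit than this character-window loop, which is meant only to make the counting rule explicit.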

elsayed-lab/hpgltools documentation built on May 9, 2024, 5:02 a.m.