View source: R/annotation_gff.R
pattern_count_genome | R Documentation |
It is sometimes helpful to know how many times a given string appears in a genome or in each CDS. This function provides that count and is primarily used by cp_seq_m().
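To make "how many times a string appears" concrete, here is a minimal base-R sketch of per-sequence pattern counting. It is not the function's actual implementation (the See Also entries suggest that relies on Biostrings). The helper name `count_pattern()` and the toy sequences are hypothetical and exist only for illustration.

```r
# Hypothetical helper, not part of the package: count occurrences of a literal
# DNA pattern in each element of a named character vector.
count_pattern <- function(seqs, pattern = "TA") {
  # A zero-width lookahead also counts overlapping hits (e.g. "AA" in "AAA").
  regex <- paste0("(?=", pattern, ")")
  hits <- gregexpr(regex, seqs, perl = TRUE)
  # gregexpr() returns -1 when nothing matches, so count only positive positions.
  counts <- vapply(hits, function(h) sum(h > 0), integer(1))
  data.frame(name = names(seqs), number = counts, stringsAsFactors = FALSE)
}

# Made-up sequences standing in for per-gene CDS.
toy <- c(geneA = "TATATA", geneB = "GGCC", geneC = "ATTAAT")
toy_counts <- count_pattern(toy)
print(toy_counts)
```

The result has one row per sequence, which mirrors the shape described for this function's return value (gene names plus a count).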
pattern_count_genome(
fasta,
gff = NULL,
pattern = "TA",
type = "gene",
id_col = "ID",
key = NULL
)
fasta | Genome sequence. |
gff | GFF file of annotation information from which to acquire CDS. If not provided, the entire genome is queried. |
pattern | Pattern to search for. This was written for TNSeq, where TA is the mariner insertion point. |
type | Column to use in the gff file. |
id_col | Column containing the gene IDs. |
key | What type of entry of the gff file to key from? |
This is once again a place where mcols() usage might improve the overall quality of life.
Data frame of gene names and the number of times the pattern appears in each gene.
[Biostrings] [Rsamtools::FaFile()] [Biostrings::PDict()]
pa_data <- get_paeruginosa_data()
pa_fasta <- pa_data[["fasta"]]
pa_gff <- pa_data[["gff"]]
ta_count <- pattern_count_genome(pa_fasta, pa_gff)
head(ta_count)
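The example above depends on get_paeruginosa_data() and real annotation files. For a quick check without them, the following sketch counts the default pattern in a few made-up sequences using Biostrings directly. It assumes the Bioconductor package Biostrings is installed (the block does nothing otherwise), and it illustrates the kind of counting involved rather than reproducing what pattern_count_genome() does internally. The names `toy_set` and `toy_result` are hypothetical.

```r
# Sketch only: runs if Biostrings (Bioconductor) is installed.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  # Toy sequences standing in for per-gene CDS; the gene names are made up.
  toy_set <- Biostrings::DNAStringSet(
    c(geneA = "TATATA", geneB = "GGCC", geneC = "ATTAAT")
  )
  # vcountPattern() counts exact matches of one pattern in every sequence.
  bs_counts <- Biostrings::vcountPattern("TA", toy_set)
  toy_result <- data.frame(name = names(toy_set), number = bs_counts,
                           stringsAsFactors = FALSE)
  print(toy_result)
}
```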