getkmer: Compare the kmer difference between 2 sets of sequences

View source: R/structure.R

getkmer {proRate}    R Documentation

Compare the kmer difference between 2 sets of sequences

Description

Compare kmer frequencies between 2 sets of sequences. For each kmer, the function reports the ratio between its frequencies in the 2 sequence sets, together with the p-value and adjusted p-value of the difference.

Usage

getkmer(
  targetfile = NULL,
  genomename = "mm10",
  k = 6,
  genes1,
  genes2,
  feature,
  radius = 1000
)

Arguments

targetfile

The path to a file specifying the gene regions whose kmers are compared between the 2 sequence sets. The regions do not need to be exactly the gene bodies between the TSS and TTS sites. The file must contain columns named chr, start, end, strand, and gene_id. If NULL (the default), all gene regions defined jointly by the parameters genomename, feature, and radius are analyzed.
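A minimal sketch of preparing a file for targetfile in base R. The gene IDs and coordinates are hypothetical, and writing a tab-delimited file with a header row is an assumption about the format getkmer reads; check R/structure.R before relying on it. Coordinates are stored as integers so that write.table does not emit them in scientific notation.

```r
# Hypothetical regions: coordinates and gene IDs are made up for illustration.
regions <- data.frame(
  chr     = c("chr1", "chr1", "chr2"),
  start   = c(3000000L, 4500000L, 1200000L),
  end     = c(3002000L, 4502000L, 1202000L),
  strand  = c("+", "-", "+"),
  gene_id = c("GeneA", "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)

# The columns the documentation lists as required.
required <- c("chr", "start", "end", "strand", "gene_id")
stopifnot(all(required %in% names(regions)))

# Assumption: a tab-delimited text file with a header row is accepted.
targetfile <- file.path(tempdir(), "target_regions.txt")
write.table(regions, targetfile, sep = "\t", quote = FALSE, row.names = FALSE)
```

The resulting path can then be passed as the targetfile argument.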

genomename

The genome of the genes to be analyzed (default "mm10"). Used when the parameter targetfile is NULL.

k

The length of the kmers to be analyzed. Default is 6, meaning 6-mers are analyzed.
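To make the role of k concrete, the sketch below counts overlapping kmers in a single DNA string with base R. It is an illustration, not proRate's implementation: the function name count_kmers is hypothetical, and skipping kmers that contain non-ACGT characters (such as N) is an assumption. The number of possible kmers grows as 4^k, so the default k = 6 gives 4096 possible 6-mers.

```r
# Hypothetical helper: count overlapping kmers in one DNA sequence.
count_kmers <- function(seq, k = 6) {
  seq <- toupper(seq)
  n <- nchar(seq)
  if (n < k) return(integer(0))
  starts <- seq_len(n - k + 1)
  # substring() is vectorized over start/stop, giving every overlapping window
  kmers <- substring(seq, starts, starts + k - 1)
  # Assumption: drop windows containing ambiguous bases such as N
  kmers <- kmers[!grepl("[^ACGT]", kmers)]
  counts <- table(kmers)
  setNames(as.integer(counts), names(counts))
}

count_kmers("ACGTACGT", k = 3)
# ACG = 2, CGT = 2, GTA = 1, TAC = 1
```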

genes1

A vector of gene symbols for set 1, whose kmers are compared with those of set 2. Of the regions given by targetfile, or defined by genomename, feature, and radius, only those belonging to these genes are used for set 1.

genes2

A vector of gene symbols for set 2, whose kmers are compared with those of set 1. Regions are selected for set 2 in the same way as for genes1.

feature

Used when targetfile is NULL and genomename is defined, to select which region of each gene in genomename is analyzed. Can be 'promoter', 'end', or 'genebody'. For 'promoter' or 'end', the parameter radius defines the size of the region centered on the TSS or TTS site, respectively.

radius

A numeric value defining the radius of the gene promoter or end region when feature is 'promoter' or 'end'. Default is 1000.
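How feature and radius could map gene coordinates to regions is sketched below. This is a hypothetical helper (flank_regions), not proRate's code, and it rests on stated assumptions: 1-based coordinates, the TSS is start on the "+" strand and end on the "-" strand (the TTS is the opposite end), each region spans radius on both sides of the site, and region starts are clipped at position 1.

```r
# Hypothetical helper: derive promoter/end regions centered on the TSS/TTS.
flank_regions <- function(genes, feature = c("promoter", "end", "genebody"),
                          radius = 1000) {
  feature <- match.arg(feature)
  if (feature == "genebody") return(genes)  # keep the whole gene body
  plus <- genes$strand == "+"
  site <- if (feature == "promoter") {
    ifelse(plus, genes$start, genes$end)    # assumed TSS
  } else {
    ifelse(plus, genes$end, genes$start)    # assumed TTS
  }
  out <- genes
  out$start <- pmax(1, site - radius)        # clip at chromosome start
  out$end <- site + radius
  out
}

genes <- data.frame(chr = c("chr1", "chr1"), start = c(5000, 20000),
                    end = c(9000, 26000), strand = c("+", "-"),
                    gene_id = c("GeneA", "GeneB"), stringsAsFactors = FALSE)
flank_regions(genes, "promoter", radius = 1000)
# GeneA: 4000-6000 (around start, "+"); GeneB: 25000-27000 (around end, "-")
```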

Value

A data.frame giving the kmer frequency ratios between sequence set 1 and set 2, as well as p-values (calculated with Fisher's exact test) and adjusted p-values (adjusted using the Benjamini & Hochberg method).
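The statistics in the returned data.frame can be sketched as follows. This is a hypothetical reconstruction, not proRate's code: it assumes each kmer is tested with a 2 x 2 table of (count of this kmer, count of all other kmers) in set 1 versus set 2 using fisher.test, and that the p-values are then adjusted with p.adjust(method = "BH"). The helper name compare_kmer_counts and the example counts are made up. A kmer absent from set 2 gets a ratio of Inf.

```r
# Hypothetical helper: compare kmer counts of two sequence sets.
compare_kmer_counts <- function(counts1, counts2) {
  kmers <- union(names(counts1), names(counts2))
  # Align both count vectors on the same kmers; missing kmers count as 0
  c1 <- as.numeric(counts1[kmers]); c1[is.na(c1)] <- 0
  c2 <- as.numeric(counts2[kmers]); c2[is.na(c2)] <- 0
  total1 <- sum(c1)
  total2 <- sum(c2)
  pvals <- vapply(seq_along(kmers), function(i) {
    # Rows: this kmer vs all other kmers; columns: set 1 vs set 2
    tab <- matrix(c(c1[i], total1 - c1[i], c2[i], total2 - c2[i]), nrow = 2)
    fisher.test(tab)$p.value
  }, numeric(1))
  data.frame(
    kmer   = kmers,
    ratio  = (c1 / total1) / (c2 / total2),  # frequency in set 1 / set 2
    pvalue = pvals,
    padj   = p.adjust(pvals, method = "BH"),
    stringsAsFactors = FALSE
  )
}

set1 <- c(AAAAAA = 50, CCCCCC = 10, GGGGGG = 40)
set2 <- c(AAAAAA = 10, CCCCCC = 10, GGGGGG = 80)
res <- compare_kmer_counts(set1, set2)
```

Testing each kmer against all other kmers keeps the 2 x 2 tables comparable across kmers; a different background (for example, per-region counts) would change the p-values.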


yuabrahamliu/proRate documentation built on Nov. 3, 2024, 10:14 a.m.