create_ngrams: Get all possible n-Grams

View source: R/ngrams.R

create_ngrams R Documentation

Get all possible n-Grams

Description

Creates a vector of all possible n-grams of size n that can be built from the given unigrams.
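The following base-R sketch shows what "all possible n-grams" means when no position information is requested: every combination (with repetition) of n unigrams, joined by dots. It is a hypothetical reimplementation for illustration only. The name sketch_ngrams is made up, it is not biogram's source, and the order of the returned n-grams may differ from create_ngrams.

```r
# Hypothetical sketch, not biogram's implementation.
sketch_ngrams <- function(n, u) {
  # one column per n-gram element; rows are all combinations of unigrams
  grid <- expand.grid(rep(list(u), n), stringsAsFactors = FALSE)
  # join the elements of each row with a dot, e.g. "1.2"
  do.call(paste, c(grid, sep = "."))
}

length(sketch_ngrams(2, 1L:4))   # 16 (= 4^2)
head(sketch_ngrams(2, 1L:4), 4)  # "1.1" "2.1" "3.1" "4.1"
```

The number of n-grams grows as length(u)^n, so for 20 amino acids there are already 400 bigrams and 8000 trigrams.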

Usage

create_ngrams(n, u, possible_grams = NULL)

Arguments

n

integer, size of the n-gram.

u

integer, numeric or character vector of all possible unigrams.

possible_grams

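number of positions at which an n-gram can be located in a sequence. If NULL (default), the n-grams contain no information about position; otherwise position information is added to each n-gram (see Examples).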
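As an illustration of how position information could be attached when possible_grams is not NULL, here is a minimal sketch. The helper add_positions is hypothetical; it assumes the position_ngram naming used by count_ngrams (for example "3_1.2" for the bigram 1.2 at position 3) and that every n-gram is paired with every position. The ordering produced by create_ngrams itself may differ.

```r
# Hypothetical helper, not part of biogram.
# Pairs every n-gram with every position 1..possible_grams and names the
# result position_ngram, e.g. "3_1.2".
add_positions <- function(grams, possible_grams) {
  paste(rep(seq_len(possible_grams), each = length(grams)),
        rep(grams, times = possible_grams),
        sep = "_")
}

add_positions(c("1.1", "1.2"), 3)
# "1_1.1" "1_1.2" "2_1.1" "2_1.2" "3_1.1" "3_1.2"
```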

Details

See the Details section of count_ngrams for more information about the n-gram naming convention. create_ngrams does not add information about distance; if needed, it must be appended by hand (see Examples).
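Appending distance information by hand, as the Examples do with paste0, can be wrapped in a small helper. The name add_distance is hypothetical, and for n-grams longer than two elements it assumes that the n - 1 distances are joined by dots, following the count_ngrams naming convention; check count_ngrams before relying on that.

```r
# Hypothetical helper, not part of biogram.
# d holds the distances between consecutive elements (0 = adjacent);
# they are joined by dots and appended after an underscore.
add_distance <- function(grams, d) {
  paste0(grams, "_", paste(d, collapse = "."))
}

add_distance(c("1_1.2", "2_3.4"), 0)  # "1_1.2_0" "2_3.4_0"
add_distance("1_1.2.3", c(0, 1))      # "1_1.2.3_0.1"
```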

Value

A character vector. Elements of each n-gram are separated by a dot.
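Because elements are separated by a dot, an n-gram without position or distance parts can be split back into its unigrams with base R's strsplit. The wrapper ngram_elements is a hypothetical convenience, not part of biogram.

```r
# Hypothetical helper, not part of biogram.
ngram_elements <- function(grams) {
  # fixed = TRUE treats "." literally instead of as a regex wildcard
  strsplit(grams, ".", fixed = TRUE)
}

ngram_elements(c("1.2", "A.C.G"))
# list(c("1", "2"), c("A", "C", "G"))
```

This only works unambiguously if the unigrams themselves contain no dots; a numeric unigram such as 0.5 would be split into two pieces.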

Note

Input data must be a matrix or data frame of numeric elements.

Examples

# bigrams for the 20 standard amino acids
create_ngrams(2, 1L:20)
# bigrams for standard amino acids with positions: in a sequence of 10 amino
# acids, a bigram can start at only 9 positions
create_ngrams(2, 1L:20, 9)
# bigrams for DNA with positions: in a sequence of 10 nucleotides with distance 1
# (one element between the two elements of the bigram), a bigram can start at
# only 8 positions
# paste0 appends the distance information to the end of each n-gram
paste0(create_ngrams(2, 1L:4, 8), "_1")
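The values 9 and 8 passed as possible_grams in the Examples follow from simple arithmetic: an n-gram with gaps d covers n + sum(d) elements, so in a sequence of length L it can start at L - (n + sum(d)) + 1 positions. The helper n_positions below is hypothetical and only reproduces that arithmetic; the final line additionally assumes that every unigram combination is paired with every position.

```r
# Hypothetical helper, not part of biogram.
# d holds the n - 1 gaps between consecutive elements (0 = adjacent).
n_positions <- function(seq_len, n, d = rep(0, n - 1)) {
  span <- n + sum(d)  # elements covered from the first to the last element
  seq_len - span + 1
}

n_positions(10, 2)         # 9, as in create_ngrams(2, 1L:20, 9)
n_positions(10, 2, d = 1)  # 8, as in create_ngrams(2, 1L:4, 8)

# assumed size of the positioned amino-acid bigram vocabulary
length(1L:20)^2 * n_positions(10, 2)  # 3600
```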

michbur/biogram documentation built on Feb. 4, 2024, 6:38 p.m.