biogram: N-Gram Analysis of Biological Sequences

Tools for extraction and analysis of various n-grams (k-mers) derived from biological sequences (proteins or nucleic acids). Contains QuiPT (quick permutation test) for fast feature-filtering of the n-gram data.
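As a rough illustration of what counting n-grams (k-mers) in a sequence means, here is a minimal base R sketch. It does not use biogram and is not the package's implementation; `count_kmers` is a hypothetical helper introduced only for this example, and it handles contiguous n-grams only (no gaps or positions).

```r
# Count contiguous n-grams (k-mers) in one sequence using base R only.
# count_kmers is a hypothetical helper, not a biogram function.
count_kmers <- function(sequence, n) {
  # Split the sequence into single characters.
  chars <- strsplit(sequence, "")[[1]]
  # A sequence shorter than n contains no n-grams.
  if (length(chars) < n) {
    return(table(character(0)))
  }
  # Slide a window of width n over the sequence.
  starts <- seq_len(length(chars) - n + 1)
  kmers <- vapply(
    starts,
    function(i) paste(chars[i:(i + n - 1)], collapse = ""),
    character(1)
  )
  # Tabulate how often each distinct n-gram occurs.
  table(kmers)
}

counts <- count_kmers("ACGTACGA", 2)
print(counts)
```

For the example sequence the seven overlapping 2-grams are AC, CG, GT, TA, AC, CG and GA, so AC and CG are each counted twice.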

Install the latest version of this package by entering install.packages("biogram") in R.
Author: Michal Burdukiewicz [cre, aut], Piotr Sobczyk [aut], Chris Lauber [aut]
Date of publication: 2017-01-06 01:18:55
Maintainer: Michal Burdukiewicz


Man pages

aaprop: Normalized amino acids properties

add_1grams: Add 1-grams

Coerce feature_test object to a data frame

binarize: Binarize

biogram-package: biogram - analysis of biological sequences using n-grams

calc_criterion: Calculate value of criterion

calc_cs: Calculate Chi-squared-based measure

calc_ed: Calculate encoding distance

calc_ig: Calculate IG for single feature

calc_kl: Calculate KL divergence of features

calc_pi: Calculate partition index

calc_si: Compute similarity index

cluster_reg_exp: Clustering of sequences based on regular expression

code_ngrams: Code n-grams

construct_ngrams: Construct and filter n-grams

count_multigrams: Detect and count multiple n-grams in sequences

count_ngrams: Count n-grams in sequences

count_specified: Count specified n-grams

count_total: Count total number of n-grams

create_encoding: Create encoding

create_feature_target: Create feature according to given contingency matrix

create_ngrams: Get all possible n-Grams

criterion_distribution: criterion_distribution class

cut.feature_test: Categorize tested features

decode_ngrams: Decode n-grams

degenerate: Degenerate protein sequence

distr_crit: Compute criterion distribution

encoding2df: Convert encoding to data frame

fast_crosstable: Very fast 2d cross-tabulation

feature_test: feature_test class

gap_ngrams: Gap n-grams

get_ngrams_ind: Get indices of n-grams

human_cleave: Human signal peptides cleavage sites

is_ngram: Validate n-gram

l2n: Convert letters to numbers

list2matrix: Convert list of sequences to matrix

n2l: Convert numbers to letters

ngrams2df: n-grams to data frame

plot.criterion_distribution: Plot criterion distribution

position_ngrams: Position n-grams

print.feature_test: Print tested features

seq2ngrams: Extract n-grams from sequence

summary.feature_test: Summarize tested features

table_ngrams: Tabulate n-grams

test_features: Permutation test for feature selection

validate_encoding: Validate encoding
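The calc_ig and test_features entries above refer to information gain and permutation-based feature selection. The sketch below shows the general idea in base R, assuming a binary target and a binary feature (for example, presence or absence of an n-gram). It is a plain Monte Carlo permutation test, not QuiPT and not biogram's code; `entropy`, `info_gain` and `perm_test_ig` are hypothetical names used only here.

```r
# Entropy (in bits) of a discrete vector.
entropy <- function(x) {
  p <- table(x) / length(x)
  -sum(p * log2(p))
}

# Information gain of a feature with respect to a target:
# H(target) - H(target | feature).
info_gain <- function(target, feature) {
  cond <- 0
  for (v in unique(feature)) {
    idx <- feature == v
    # Weight the entropy within each feature level by that level's frequency.
    cond <- cond + mean(idx) * entropy(target[idx])
  }
  entropy(target) - cond
}

# Plain Monte Carlo permutation test (NOT QuiPT): shuffle the target and
# count how often the permuted information gain reaches the observed one.
perm_test_ig <- function(target, feature, times = 1000) {
  observed <- info_gain(target, feature)
  permuted <- replicate(times, info_gain(sample(target), feature))
  # The add-one correction keeps the estimated p-value strictly positive.
  p_value <- (sum(permuted >= observed) + 1) / (times + 1)
  list(ig = observed, p_value = p_value)
}

set.seed(1)
target <- rep(c(1, 0), each = 20)
feature <- target  # a perfectly informative feature, for illustration
res <- perm_test_ig(target, feature, times = 200)
print(res)
```

With many features, a resampling test like this has to be repeated for each one, which is the cost that a quick permutation test is meant to avoid.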




tests/testthat/test_create_ngrams.R tests/testthat/test_table_ngrams.R tests/testthat/test_crosstable.R tests/testthat/test_position_ngrams.R tests/testthat/test_seq2grams.R tests/testthat/test_is_ngram.R tests/testthat/test_quipt_consistency.R tests/testthat/test_count_ngrams.R tests/testthat/test_calc_ed.R tests/testthat/test_get_ngrams_pos.R tests/test-all.R
R/count_ngrams.R R/create_encoding.R R/indices_and_positions.R R/position_ngrams.R R/test_features.R R/calc_ed.R R/human_cleave.R R/table_ngrams.R R/information_gain.R R/seq2matrix.R R/count_specified.R R/utilities.R R/aaprop.R R/ngram_coding.R R/criterion_distribution.R R/kl_divergence.R R/feature_test_class.R R/construct_ngrams.R R/criterions.R R/biogram.R R/distr_crit.R R/add_remove_ngrams.R R/calc_si.R R/cluster_reg_exp.R R/count_multigrams.R R/data_manipulation.R R/is_ngram.R R/chi_square.R R/degenerate.R R/ngrams.R
man/calc_ed.Rd man/create_feature_target.Rd man/create_encoding.Rd man/list2matrix.Rd man/table_ngrams.Rd man/get_ngrams_ind.Rd man/seq2ngrams.Rd man/calc_ig.Rd man/cluster_reg_exp.Rd man/calc_cs.Rd man/degenerate.Rd man/count_total.Rd man/count_multigrams.Rd man/calc_pi.Rd man/plot.criterion_distribution.Rd man/ man/calc_criterion.Rd man/binarize.Rd man/position_ngrams.Rd man/summary.feature_test.Rd man/is_ngram.Rd man/calc_si.Rd man/construct_ngrams.Rd man/fast_crosstable.Rd man/cut.feature_test.Rd man/count_specified.Rd man/human_cleave.Rd man/create_ngrams.Rd man/calc_kl.Rd man/gap_ngrams.Rd man/code_ngrams.Rd man/criterion_distribution.Rd man/validate_encoding.Rd man/count_ngrams.Rd man/l2n.Rd man/print.feature_test.Rd man/encoding2df.Rd man/test_features.Rd man/decode_ngrams.Rd man/feature_test.Rd man/biogram-package.Rd man/add_1grams.Rd man/ngrams2df.Rd man/n2l.Rd man/aaprop.Rd man/distr_crit.Rd
