generate_random_db: Generate a random database of peptides.


View source: R/generate_random_db.R

Description

This function generates a database of random sequences using a restricted randomization procedure that shuffles the amino acids of the input database over its peptide length distribution. It also preserves the frequencies of N-terminal pyroglutamination and C-terminal amidation. If the sorted molecular weights of the mock database are plotted against the sorted molecular weights of the input database, the points are expected to lie approximately on the line y = x. The generate_random_db function is used in the false discovery estimation of pep.id.
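The sorted-versus-sorted comparison described above can be sketched in base R. This is only an illustration, not the package's plotting code: compare_sorted_mw is a hypothetical helper, and the molecular weights are simulated toy values rather than real peptide masses.

```r
# Hypothetical helper (not part of the package): pair up the sorted
# molecular weights of two databases of equal size.
compare_sorted_mw <- function(real_mw, mock_mw) {
  stopifnot(length(real_mw) == length(mock_mw))
  data.frame(real = sort(real_mw), mock = sort(mock_mw))
}

# Toy data, simulated purely for illustration.
set.seed(1)
real_mw <- rnorm(200, mean = 1500, sd = 300)
mock_mw <- sample(real_mw, 200, replace = TRUE)

mw_pairs <- compare_sorted_mw(real_mw, mock_mw)

# If the two mass distributions agree, the points fall close to y = x.
plot(mw_pairs$real, mw_pairs$mock,
     xlab = "sorted MW, input database",
     ylab = "sorted MW, mock database")
abline(0, 1)
```

Plotting one sorted vector against another is in effect a quantile-quantile comparison, which is why equal database sizes make the pairing straightforward.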

Usage

generate_random_db(db, size = 1, plot = F, verbose = F)

Arguments

db

A database whose first three columns are "name", "MW" and "sequence" (as read in with the download_lpm_db function).

size

Numeric. The desired mock database size as a proportion of the original database size.

plot

Logical. If TRUE, the mock database is plotted against the original database. This only works if the mock and real databases are of equal size (size parameter is 1).

verbose

Logical. If TRUE, some properties of the mock database are printed to the console.

Details

A mock database is generated based on the input database. The function works as follows: all peptide lengths (in number of amino acids) of the input database are stored in one vector, and all amino acids of the input database are stored in a second vector. Next, lengths are sampled with replacement from the length vector, and a peptide of each sampled length is generated by sampling with replacement from the amino acid vector. In a last step, amidations and pyroglutaminations are added with a probability equal to their proportion in the input database. As a result, all masses in the newly generated database are realistic peptide masses.
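The procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: sketch_random_db, its arguments p_amidated and p_pyroglu, and the logical output columns are all hypothetical, and the modification proportions are passed in directly because the way the input database encodes them is not described here.

```r
# Hypothetical sketch of the restricted randomization (not the package code).
# sequences:  character vector of peptide sequences (one-letter codes)
# p_amidated: assumed proportion of C-terminally amidated input peptides
# p_pyroglu:  assumed proportion of N-terminally pyroglutaminated input peptides
# size:       mock database size as a proportion of the input size
sketch_random_db <- function(sequences, p_amidated = 0, p_pyroglu = 0, size = 1) {
  pep_lengths <- nchar(sequences)               # length distribution
  residues <- unlist(strsplit(sequences, ""))   # pooled amino acids
  n_mock <- round(length(sequences) * size)

  # Sample lengths with replacement (indexing avoids the sample() pitfall
  # for a numeric vector of length one).
  idx <- sample.int(length(pep_lengths), n_mock, replace = TRUE)
  mock_lengths <- pep_lengths[idx]

  # Build each mock peptide by sampling residues with replacement.
  mock_seqs <- vapply(mock_lengths, function(len) {
    picked <- residues[sample.int(length(residues), len, replace = TRUE)]
    paste(picked, collapse = "")
  }, character(1))

  # Flag modifications with a chance equal to the given proportions.
  data.frame(
    sequence = mock_seqs,
    amidated = runif(n_mock) < p_amidated,
    pyroglu  = runif(n_mock) < p_pyroglu,
    stringsAsFactors = FALSE
  )
}

# Toy input, for illustration only.
set.seed(42)
toy_sequences <- c("ACDEF", "GHIKLMN", "PQR", "STVWY")
mock <- sketch_random_db(toy_sequences, p_amidated = 0.5, p_pyroglu = 0.25, size = 2)
```

Because both lengths and residues are drawn from the input, every mock peptide has a length that occurs in the input and consists only of amino acids that occur in the input.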

Value

A database of random peptides.

Author(s)

Rik Verdonck

See Also

pep.id


goat-anti-rabbit/labelpepmatch.R documentation built on May 17, 2019, 7:29 a.m.