sim.mol.data | R Documentation |
The molecular data simulator generates either gene.data or cpd.data with a chosen ID type, number of molecules and sample size, as either continuous or discrete data.
sim.mol.data(mol.type = c("gene", "gene.ko", "cpd")[1], id.type = NULL,
species="hsa", discrete = FALSE, nmol = 1000, nexp = 1, rand.seed=100)
mol.type |
character of length 1, specifying the molecular type, either "gene" (including
transcripts, proteins), or "gene.ko" (KEGG ortholog genes, as defined in
KEGG ortholog pathways), or "cpd" (including metabolites, glycans,
drugs). Note that KEGG ortholog genes are considered "gene" in function
|
id.type |
character of length 1, the molecular ID type. When mol.type="gene", proper ID types include "KEGG" and "ENTREZ" (Entrez Gene). Multiple other ID types are also valid when species is among the 19 major species fully annotated in Bioconductor, e.g. "hsa" (human), "mmu" (mouse) etc., check:
|
species |
character, either the KEGG code, scientific name or the common name of
the target species. This is only effective when mol.type =
"gene". Setting species="ko" is equivalent to
mol.type="gene.ko". Default species="hsa", equivalent to either "Homo
sapiens" (scientific name) or "human" (common name). Gene data id.type
has multiple other choices for 19 major research species, for details
do: |
discrete |
logical, whether to generate discrete or continuous data. Default discrete=FALSE (continuous data); otherwise, the result is a character vector of molecular IDs. |
nmol |
integer, the target number of different molecules. Note that the specified id.type may not have as many different IDs as nmol. In this case, all IDs of id.type are used. |
nexp |
integer, the sample size, i.e. the number of columns in the simulated data. |
rand.seed |
numeric of length 1, the seed number to start the random sampling process. This argument makes the simulation reproducible as long as its value stays the same. Default rand.seed=100. |
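The effect of rand.seed can be checked with a short sketch like the one below. It assumes the pathview package is installed and uses the default compound ID type; the object names (sim.a, sim.b, sim.c) are made up for illustration.

```r
library(pathview)

# Same seed twice: the two simulated data sets should be identical
sim.a <- sim.mol.data(mol.type = "cpd", nmol = 50, rand.seed = 1)
sim.b <- sim.mol.data(mol.type = "cpd", nmol = 50, rand.seed = 1)
identical(sim.a, sim.b)

# A different seed is expected to give different IDs and values
sim.c <- sim.mol.data(mol.type = "cpd", nmol = 50, rand.seed = 2)
identical(sim.a, sim.c)
```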
This function is written mainly for simulation or experiments with the pathview package. With the simulated molecular data, you may check whether and how pathview works for molecular data of different types, IDs, formats or sample sizes etc. You may also generate both gene.data and cpd.data and check pathway-based data integration with pathview.
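As a rough sketch of the integration check described above, simulated gene and compound data can be passed to pathview together. The pathway ID "00640" and the out.suffix value are illustrative choices, not taken from this page. The pathview call downloads pathway data from KEGG and writes image files to the working directory, so it is guarded here to run only in an interactive session.

```r
library(pathview)

# Simulate one sample each of gene and compound data (default ID types)
gene.d <- sim.mol.data(mol.type = "gene", nmol = 3000)
cpd.d <- sim.mol.data(mol.type = "cpd", nmol = 3000)

# Map both data sets onto one pathway; needs network access,
# hence the interactive() guard (an assumption of this sketch)
if (interactive()) {
  pv.out <- pathview(gene.data = gene.d, cpd.data = cpd.d,
                     pathway.id = "00640", species = "hsa",
                     out.suffix = "sim")
}
```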
Either a vector (single sample) or a matrix-like object (multiple
samples), depending on the value of nexp. A vector is numeric
with molecular IDs as names, or a character vector of molecular
IDs, depending on the value of discrete. A matrix-like object has
molecules as rows and samples as columns, with molecular IDs as row names.
The returned data can be used directly as the gene.data or cpd.data
input of the pathview main function.
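The shapes described above can be inspected with a sketch such as the following, assuming pathview is installed. The object names (one.sample, three.samples, id.only) are hypothetical.

```r
library(pathview)

# nexp = 1: a vector named with molecular IDs
one.sample <- sim.mol.data(mol.type = "cpd", nmol = 100, nexp = 1)
str(one.sample)
head(names(one.sample))

# nexp > 1: molecules as rows, samples as columns
three.samples <- sim.mol.data(mol.type = "cpd", nmol = 100, nexp = 3)
dim(three.samples)
head(rownames(three.samples))

# discrete = TRUE: molecular IDs only, no numeric values
id.only <- sim.mol.data(mol.type = "cpd", nmol = 100, discrete = TRUE)
str(id.only)
```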
Weijun Luo <luo_weijun@yahoo.com>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
node.map, the node data mapper function;
mol.sum, the auxiliary molecular data mapper;
id2eg, cpd2kegg etc., the auxiliary molecular ID mappers;
pathview, the main function.
#continuous compound data
cpd.data.c=sim.mol.data(mol.type="cpd", nmol=3000)
#discrete compound data
cpd.data.d=sim.mol.data(mol.type="cpd", nmol=3000, discrete=TRUE)
head(cpd.data.c)
head(cpd.data.d)
#continuous compound data named with "CAS Registry Number"
cpd.cas <- sim.mol.data(mol.type = "cpd", id.type = "CAS Registry Number", nmol = 10000)
#gene data with two samples
gene.data.2=sim.mol.data(mol.type="gene", nmol=1000, nexp=2)
head(gene.data.2)
#KEGG ortholog gene data
ko.data=sim.mol.data(mol.type="gene.ko", nmol=5000)