prepMSig: Preparation of Molecular Signatures

prepMSig    R Documentation

Preparation of Molecular Signatures

Description

prepMSig downloads and prepares databases of Molecular Signatures (MSig) for gene-set enrichment analysis.

Usage

prepMSig(
  species = "human",
  msig_url = NULL,
  abbr_species = NULL,
  ortho_mart = switch(species, mouse = "mmusculus_gene_ensembl", rat =
    "rnorvegicus_gene_ensembl", human = "to_itself", "unknown"),
  db_path = "~/proteoQ/dbs/msig",
  filename = NULL,
  overwrite = FALSE
)

Arguments

species

Character string; the name of a species for the convenient preparation of MSig databases. The convenience feature is available for species in c("human", "mouse", "rat"), with "human" being the default. The argument is not required for other species; instead, users provide a value under argument ortho_mart for the lookup of orthologs to human genes.

msig_url

A URL to an MSig database. At the NULL default, a c2.all.v[...].entrez.gmt file will be used for all species. A valid web address is required for a custom database. For simplicity, only files with Entrez IDs are handled; files such as c2.all.v[...].symbols.gmt will not be parsed.
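For orientation, a GMT file stores one gene set per line as tab-separated fields: the set name, a description (often a URL), and then the member gene IDs. The sketch below parses two made-up lines in that layout with base R. The set names and Entrez IDs are invented for illustration, and the helper parse_gmt_lines is hypothetical, not a proteoQ function.

```r
# Hypothetical helper (not part of proteoQ): parse GMT-formatted lines
# into a named list of gene-ID vectors.
parse_gmt_lines <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # Field 1 is the set name, field 2 a description; the rest are gene IDs.
  sets <- lapply(fields, function(x) x[-(1:2)])
  names(sets) <- vapply(fields, function(x) x[1], character(1))
  sets
}

# Two invented gene sets with invented Entrez IDs
gmt_lines <- c(
  "SET_A\thttp://example.org/SET_A\t1001\t1002\t1003",
  "SET_B\thttp://example.org/SET_B\t2001\t2002"
)

gsets <- parse_gmt_lines(gmt_lines)
str(gsets)
```

The IDs are kept as character strings here; whether prepMSig stores them as character or numeric values is not stated in this page.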

abbr_species

Two-letter character string; the abbreviated name of species used with org.Xx.eg.db. The value of abbr_species will be determined automatically if the species is in one of c("human", "mouse", "rat"). Otherwise, for example, users need to provide abbr_species = Ce for fetching the org.Ce.eg.db package in the name space of proteoQ.

For analysis against gene ontology and Molecular Signatures, the argument is further applied to differentiate the same biological terms under different species; e.g., GO~0072686 mitotic spindle becomes hs_GO~0072686 mitotic spindle for human and mm_GO~0072686 mitotic spindle for mouse.
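The prefixing described above can be pictured with a few lines of base R. This is only an illustration of the naming scheme: the helper tag_species is hypothetical, and lower-casing the abbreviation is an assumption inferred from the hs_ and mm_ examples.

```r
# Illustration only: prefix gene-set names with a lower-case species tag.
# `tag_species` is a hypothetical helper, not a proteoQ function.
tag_species <- function(set_names, abbr_species) {
  paste0(tolower(abbr_species), "_", set_names)
}

term <- "GO~0072686 mitotic spindle"
tag_species(term, "Hs")  # "hs_GO~0072686 mitotic spindle"
tag_species(term, "Mm")  # "mm_GO~0072686 mitotic spindle"
```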

ortho_mart

Character string; a dataset name from useMart and/or listDatasets for the lookup of orthologs to human genes. For species in c("human", "mouse", "rat"), the value will be determined automatically unless otherwise specified.
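The automatic value comes from the switch expression shown under Usage. The sketch below wraps that same expression in a hypothetical helper, default_ortho_mart, to show what each species resolves to; any species outside the three built-in ones falls through to "unknown".

```r
# Same `switch` expression as the default in Usage, wrapped in a
# hypothetical helper for illustration.
default_ortho_mart <- function(species) {
  switch(species,
    mouse = "mmusculus_gene_ensembl",
    rat   = "rnorvegicus_gene_ensembl",
    human = "to_itself",
    "unknown"
  )
}

default_ortho_mart("human")  # "to_itself"
default_ortho_mart("mouse")  # "mmusculus_gene_ensembl"
default_ortho_mart("dog")    # "unknown": supply `ortho_mart` explicitly
```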

db_path

Character string; the local path for database(s). The default is "~/proteoQ/dbs/msig".

filename

Character string; an output file name. At the NULL default, the name will be determined automatically for a given species; e.g., msig_hs.rds for human data. The file is saved as an .rds object for use with prnGSPA.

overwrite

Logical; if TRUE, overwrite the downloaded database(s). The default is FALSE.

Examples


library(proteoQ)

## the default `MSig` is `c2.all`
# `human`; outputs under `db_path`
prepMSig()
head(readRDS(file.path("~/proteoQ/dbs/msig/msig_hs.rds")))

prnGSPA(
  gset_nms = file.path("~/proteoQ/dbs/msig/msig_hs.rds"), 
)

# `mouse`
prepMSig(species = mouse, filename = msig_mm.rds)
head(readRDS(file.path("~/proteoQ/dbs/msig/msig_mm.rds")))

# `rat`
prepMSig(species = rat, filename = msig_rn.rds)
head(readRDS(file.path("~/proteoQ/dbs/msig/msig_rn.rds")))

# `dog`; need `ortho_mart` for species other than `human`, `mouse` and `rat`
# (try `?biomaRt::useMart` for a list of marts)
prepMSig(
  # species = dog,
  abbr_species = Cf, 
  ortho_mart = cfamiliaris_gene_ensembl,
  filename = msig_cf.rds,
)

# also `dog`
prepMSig(
  species = my_dog,
  abbr_species = Cf, 
  ortho_mart = cfamiliaris_gene_ensembl,
  filename = msig_cf2.rds,
)

msig_cf <- readRDS(file.path("~/proteoQ/dbs/msig/msig_cf.rds"))
msig_cf2 <- readRDS(file.path("~/proteoQ/dbs/msig/msig_cf2.rds"))
identical(msig_cf, msig_cf2)

## use an `MSig` other than the default of `c2.all`
prepMSig(
  msig_url = "https://data.broadinstitute.org/gsea-msigdb/msigdb/release/7.0/c2.cgp.v7.0.entrez.gmt",
  species = human,
  filename = c2_cgp_hs.rds,
)

prepMSig(
  msig_url = "https://data.broadinstitute.org/gsea-msigdb/msigdb/release/7.0/c2.cgp.v7.0.entrez.gmt",
  species = dog,
  ortho_mart = cfamiliaris_gene_ensembl,
  filename = c2_cgp_cf.rds,
)


## Not run: 
# enrichment analysis with custom `MSig`
prnGSPA(
  gset_nms = c("~/proteoQ/dbs/msig/msig_hs.rds",
               "~/proteoQ/dbs/msig/msig_mm.rds"),
)

## End(Not run)


qzhang503/proteoQ documentation built on March 16, 2024, 5:27 a.m.