xGR2xGenes: Function to define genes from an input list of genomic...
In hfang-bristol/XGR: Exploring Genomic Relations for Enhanced Interpretation Through Enrichment, Similarity, Network and Annotation Analysis

xGR2xGenes

R Documentation

Function to define genes from an input list of genomic regions given the crosslink info

Description

xGR2xGenes is supposed to define genes crosslinking to an input list of genomic regions (GR). Also required is the crosslink info with a score quantifying the link of a GR to a gene. Currently supported built-in crosslink info is enhancer genes, eQTL genes, conformation genes and nearby genes (purely), though the user can customise it via 'crosslink.customised'; if so, it has priority over the built-in data.

Usage

xGR2xGenes(
data,
format = c("chr:start-end", "data.frame", "bed", "GRanges"),
build.conversion = c(NA, "hg38.to.hg19", "hg18.to.hg19"),
crosslink = c("genehancer", "PCHiC_PMID27863249_combined",
"GTEx_V6p_combined",
"nearby"),
crosslink.customised = NULL,
cdf.function = c("original", "empirical"),
scoring = FALSE,
scoring.scheme = c("max", "sum", "harmonic"),
scoring.rescale = FALSE,
nearby.distance.max = 50000,
nearby.decay.kernel = c("rapid", "slow", "linear", "constant"),
nearby.decay.exponent = 2,
verbose = TRUE,
silent = FALSE,
RData.location = "http://galahad.well.ox.ac.uk/bigdata",
guid = NULL
)

Arguments

`data`	input genomic regions (GR). If formatted as "chr:start-end" (see the next parameter 'format' below), GR should be provided as a vector in the format of 'chrN:start-end', where N is either 1-22 or X, start (or end) is genomic positional number; for example, 'chr1:13-20'. If formatted as a 'data.frame', the first three columns correspond to the chromosome (1st column), the starting chromosome position (2nd column), and the ending chromosome position (3rd column). If the format is indicated as 'bed' (browser extensible data), the same as 'data.frame' format but the position is 0-based offset from chromomose position. If the genomic regions provided are not ranged but only the single position, the ending chromosome position (3rd column) is allowed not to be provided. The data could also be an object of 'GRanges' (in this case, formatted as 'GRanges')
`format`	the format of the input data. It can be one of "data.frame", "chr:start-end", "bed" or "GRanges"
`build.conversion`	the conversion from one genome build to another. The conversions supported are "hg38.to.hg19" and "hg18.to.hg19". By default it is NA (no need to do so)
`crosslink`	the built-in crosslink info with a score quantifying the link of a GR to a gene. It can be one of 'genehancer' (enhancer genes; PMID:28605766), 'nearby' (nearby genes; if so, please also specify the relevant parameters 'nearby.distance.max', 'nearby.decay.kernel' and 'nearby.decay.exponent' below), 'PCHiC_PMID27863249_combined' (conformation genes; PMID:27863249), 'PCHiC_PMID31501517_combined' (conformation genes; PMID:31501517), 'GTEx_V6p_combined' (eQTL genes; PMID:29022597), 'eQTL_scRNAseq_combined' (eQTL genes; PMID:29610479), 'eQTL_jpRNAseq_combined' (eQTL genes; PMID:28553958), 'eQTL_ImmuneCells_combined' (eQTL genes; PMID:24604202,22446964,26151758,28248954,24013639), 'eQTL_DICE_combined' (eQTL genes; PMID:30449622)
`crosslink.customised`	the crosslink info with a score quantifying the link of a GR to a gene. A user-input matrix or data frame with 4 columns: 1st column for genomic regions (formatted as "chr:start-end", genome build 19), 2nd column for Genes, 3rd for crosslink score (crosslinking a genomic region to a gene, such as -log10 significance level), and 4th for contexts (optional; if not provided, it will be added as 'C'). Alternatively, it can be a file containing these 4 columns. Required, otherwise it will return NULL
`cdf.function`	a character specifying how to transform the input crosslink score. It can be one of 'original' (no such transformation), and 'empirical' for looking at empirical Cumulative Distribution Function (cdf; as such it is converted into pvalue-like values [0,1])
`scoring`	logical to indicate whether gene-level scoring will be further calculated. By default, it sets to false
`scoring.scheme`	the method used to calculate seed gene scores under a set of GR. It can be one of "sum" for adding up, "max" for the maximum, and "sequential" for the sequential weighting. The sequential weighting is done via: ∑_{i=1}{\frac{R_{i}}{i}}, where R_{i} is the i^{th} rank (in a descreasing order)
`scoring.rescale`	logical to indicate whether gene scores will be further rescaled into the [0,1] range. By default, it sets to false
`nearby.distance.max`	the maximum distance between genes and GR. Only those genes no far way from this distance will be considered as seed genes. This parameter will influence the distance-component weights calculated for nearby GR per gene
`nearby.decay.kernel`	a character specifying a decay kernel function. It can be one of 'slow' for slow decay, 'linear' for linear decay, and 'rapid' for rapid decay. If no distance weight is used, please select 'constant'
`nearby.decay.exponent`	a numeric specifying a decay exponent. By default, it sets to 2
`verbose`	logical to indicate whether the messages will be displayed in the screen. By default, it sets to true for display
`silent`	logical to indicate whether the messages will be silent completely. By default, it sets to false. If true, verbose will be forced to be false
`RData.location`	the characters to tell the location of built-in RData files. See `xRDataLoader` for details
`guid`	a valid (5-character) Global Unique IDentifier for an OSF project. See `xRDataLoader` for details

Value

If scoring sets to false, a data frame with following columns:

GR: genomic regions
Gene: crosslinked genes
Score: the original score between the gene and the GR (if cdf.function is 'original'); otherwise cdf (based on the whole crosslink inputs)
Context: the context

If scoring sets to true, a data frame with following columns:

Gene: crosslinked genes
Score: gene score summarised over its list of crosslinked GR
Pval: p-value-like significance level transformed from gene scores
Context: the context

Examples

RData.location <- "http://galahad.well.ox.ac.uk/bigdata"
## Not run: 

# 1) provide the genomic regions
## load ImmunoBase
ImmunoBase <- xRDataLoader(RData.customised='ImmunoBase',
RData.location=RData.location)
## get lead SNPs reported in AS GWAS and their significance info (p-values)
gr <- ImmunoBase$AS$variant
names(gr) <- NULL
dGR <- xGR(gr, format="GRanges")

# 2) using built-in crosslink info
## enhancer genes
df_xGenes <- xGR2xGenes(dGR, format="GRanges", crosslink="genehancer",
RData.location=RData.location)
## conformation genes
df_xGenes <- xGR2xGenes(dGR, format="GRanges",
crosslink="PCHiC_combined", RData.location=RData.location)
## eQTL genes
df_xGenes <- xGR2xGenes(dGR, format="GRanges",
crosslink="GTEx_V6p_combined", RData.location=RData.location)
## nearby genes (50kb, decaying rapidly)
df_xGenes <- xGR2xGenes(dGR, format="GRanges", crosslink="nearby",
nearby.distance.max=50000, nearby.decay.kernel="rapid",
RData.location=RData.location)

# 3) advanced use
# 3a) provide crosslink.customised
## illustration purpose only (see the content of 'crosslink.customised')
df <- xGR2nGenes(dGR, format="GRanges", RData.location=RData.location)
crosslink.customised <- data.frame(GR=df$GR, Gene=df$Gene,
Score=df$Weight, Context=rep('C',nrow(df)), stringsAsFactors=FALSE)
#crosslink.customised <- data.frame(GR=df$GR, Gene=df$Gene, Score=df$Weight, stringsAsFactors=FALSE)
# 3b) define crosslinking genes
# without gene scoring
df_xGenes <- xGR2xGenes(dGR, format="GRanges",
crosslink.customised=crosslink.customised,
RData.location=RData.location)
# with gene scoring
df_xGenes <- xGR2xGenes(dGR, format="GRanges",
crosslink.customised=crosslink.customised, scoring=TRUE,
scoring.scheme="max", RData.location=RData.location)

## End(Not run)

hfang-bristol/XGR documentation built on Feb. 4, 2023, 7:05 a.m.

hfang-bristol/XGR index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

hfang-bristol/XGR
Exploring Genomic Relations for Enhanced Interpretation Through Enrichment, Similarity, Network and Annotation Analysis

xGR2xGenes: Function to define genes from an input list of genomic...
In hfang-bristol/XGR: Exploring Genomic Relations for Enhanced Interpretation Through Enrichment, Similarity, Network and Annotation Analysis

Function to define genes from an input list of genomic regions given the crosslink info

Description

Usage

Arguments

Value

See Also

Examples

Related to xGR2xGenes in hfang-bristol/XGR...

R Package Documentation

Browse R Packages

We want your feedback!

hfang-bristol/XGR Exploring Genomic Relations for Enhanced Interpretation Through Enrichment, Similarity, Network and Annotation Analysis

xGR2xGenes: Function to define genes from an input list of genomic... In hfang-bristol/XGR: Exploring Genomic Relations for Enhanced Interpretation Through Enrichment, Similarity, Network and Annotation Analysis

Function to define genes from an input list of genomic regions given the crosslink info

Description

Usage

Arguments

Value

See Also

Examples

Related to xGR2xGenes in hfang-bristol/XGR...

R Package Documentation

Browse R Packages

We want your feedback!

hfang-bristol/XGR
Exploring Genomic Relations for Enhanced Interpretation Through Enrichment, Similarity, Network and Annotation Analysis

xGR2xGenes: Function to define genes from an input list of genomic...
In hfang-bristol/XGR: Exploring Genomic Relations for Enhanced Interpretation Through Enrichment, Similarity, Network and Annotation Analysis