xGR2xGenes
defines genes crosslinked to an input
list of genomic regions (GR). It also requires crosslink info with
a score quantifying the link of a GR to a gene. The built-in crosslink
info currently supported covers enhancer genes, eQTL genes, conformation
genes and nearby genes (purely by distance), though the user can supply
their own via 'crosslink.customised'; if so, it takes priority over the
built-in data.
xGR2xGenes(
data,
format = c("chr:start-end", "data.frame", "bed", "GRanges"),
build.conversion = c(NA, "hg38.to.hg19", "hg18.to.hg19"),
crosslink = c("genehancer", "PCHiC_PMID27863249_combined",
"GTEx_V6p_combined",
"nearby"),
crosslink.customised = NULL,
cdf.function = c("original", "empirical"),
scoring = F,
scoring.scheme = c("max", "sum", "sequential"),
scoring.rescale = F,
nearby.distance.max = 50000,
nearby.decay.kernel = c("rapid", "slow", "linear", "constant"),
nearby.decay.exponent = 2,
verbose = T,
silent = F,
RData.location = "http://galahad.well.ox.ac.uk/bigdata",
guid = NULL
)
data |
input genomic regions (GR). If formatted as "chr:start-end" (see the parameter 'format' below), GR should be provided as a vector of the form 'chrN:start-end', where N is either 1-22 or X, and start (or end) is a genomic position; for example, 'chr1:13-20'. If formatted as a 'data.frame', the first three columns correspond to the chromosome (1st column), the start position (2nd column), and the end position (3rd column). If the format is 'bed' (browser extensible data), the layout is the same as 'data.frame' but positions are 0-based offsets from the chromosome position. If the genomic regions are single positions rather than ranges, the end position (3rd column) may be omitted. The data can also be a 'GRanges' object (in which case the format is 'GRanges') |
format |
the format of the input data. It can be one of "data.frame", "chr:start-end", "bed" or "GRanges" |
build.conversion |
the conversion from one genome build to another. The conversions supported are "hg38.to.hg19" and "hg18.to.hg19". By default it is NA (no conversion is done) |
crosslink |
the built-in crosslink info with a score quantifying the link of a GR to a gene. It can be one of 'genehancer' (enhancer genes; PMID:28605766), 'nearby' (nearby genes; if so, please also specify the relevant parameters 'nearby.distance.max', 'nearby.decay.kernel' and 'nearby.decay.exponent' below), 'PCHiC_PMID27863249_combined' (conformation genes; PMID:27863249), 'PCHiC_PMID31501517_combined' (conformation genes; PMID:31501517), 'GTEx_V6p_combined' (eQTL genes; PMID:29022597), 'eQTL_scRNAseq_combined' (eQTL genes; PMID:29610479), 'eQTL_jpRNAseq_combined' (eQTL genes; PMID:28553958), 'eQTL_ImmuneCells_combined' (eQTL genes; PMID:24604202,22446964,26151758,28248954,24013639), 'eQTL_DICE_combined' (eQTL genes; PMID:30449622) |
crosslink.customised |
the crosslink info with a score quantifying the link of a GR to a gene. A user-input matrix or data frame with 4 columns: 1st column for genomic regions (formatted as "chr:start-end", genome build 19), 2nd column for Genes, 3rd for crosslink score (crosslinking a genomic region to a gene, such as -log10 significance level), and 4th for contexts (optional; if not provided, it will be added as 'C'). Alternatively, it can be a file containing these 4 columns. Required, otherwise it will return NULL |
cdf.function |
a character specifying how to transform the input crosslink score. It can be one of 'original' (no such transformation), and 'empirical' for looking at empirical Cumulative Distribution Function (cdf; as such it is converted into pvalue-like values [0,1]) |
scoring |
logical to indicate whether gene-level scores will be further calculated. By default, it is set to false |
scoring.scheme |
the method used to calculate seed gene scores under a set of GR. It can be one of "sum" for adding up, "max" for the maximum, and "sequential" for the sequential weighting. The sequential weighting is done via: \sum_{i=1}{\frac{R_{i}}{i}}, where R_{i} is the i^{th} rank (in decreasing order) |
scoring.rescale |
logical to indicate whether gene scores will be further rescaled into the [0,1] range. By default, it is set to false |
nearby.distance.max |
the maximum distance between genes and GR. Only genes no farther than this distance from a GR will be considered as seed genes. This parameter also influences the distance-component weights calculated for nearby GR per gene |
nearby.decay.kernel |
a character specifying a decay kernel function. It can be one of 'slow' for slow decay, 'linear' for linear decay, and 'rapid' for rapid decay. If no distance weight is used, please select 'constant' |
nearby.decay.exponent |
a numeric specifying a decay exponent. By default, it is set to 2 |
verbose |
logical to indicate whether messages will be displayed on the screen. By default, it is set to true for display |
silent |
logical to indicate whether messages will be suppressed completely. By default, it is set to false. If true, verbose will be forced to be false |
RData.location |
the characters to tell the location of built-in
RData files. See |
guid |
a valid (5-character) Global Unique IDentifier for an OSF
project. See |
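The three 'nearby.*' arguments above work together: a distance cutoff, a kernel shape and an exponent. The sketch below shows one way such distance weights could be computed. The helper 'decay_weight' is hypothetical (it is not part of XGR), and the kernel formulas are assumptions chosen to match the names 'rapid', 'slow', 'linear' and 'constant'; this page does not state the formulas actually used.

```r
# Hypothetical helper (not an XGR function): distance-decay weight in [0, 1].
# The formulas per kernel are assumptions for illustration only.
decay_weight <- function(dist, distance.max = 50000,
                         kernel = c("rapid", "slow", "linear", "constant"),
                         exponent = 2) {
  kernel <- match.arg(kernel)
  x <- dist / distance.max            # relative distance, 0 at the GR itself
  w <- switch(kernel,
    rapid    = (1 - x)^exponent,      # drops quickly near the GR
    slow     = 1 - x^exponent,        # stays high, drops near the cutoff
    linear   = 1 - x,
    constant = rep(1, length(x))      # no distance weighting
  )
  w[dist > distance.max] <- 0         # beyond the cutoff: not considered
  w
}

d <- c(0, 25000, 50000, 60000)
decay_weight(d, kernel = "rapid")     # 1.00 0.25 0.00 0.00
decay_weight(d, kernel = "slow")      # 1.00 0.75 0.00 0.00
```

With these assumed formulas, a gene halfway to the cutoff keeps a quarter of its weight under 'rapid' but three quarters under 'slow'.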
If scoring is set to false, a data frame with the following columns:
GR
: genomic regions
Gene
: crosslinked genes
Score
: the original score between the gene and the GR (if
cdf.function is 'original'); otherwise cdf (based on the whole
crosslink inputs)
Context
: the context
If scoring is set to true, a data frame with the following columns:
Gene
: crosslinked genes
Score
: gene score summarised over its list of crosslinked
GR
Pval
: p-value-like significance level transformed from
gene scores
Context
: the context
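To make the 'Score' columns above more concrete, the sketch below runs a toy crosslink table through an empirical CDF transform and then through the three scoring schemes. It is a minimal illustration, not the package's implementation: the data are made up, 'gene_score' and 'rescale01' are hypothetical helpers, R_i is read as the i-th largest score, and min-max rescaling into [0,1] is an assumption.

```r
# Toy crosslink table (made-up values, for illustration only)
xlink <- data.frame(
  GR    = c("chr1:100-200", "chr1:300-400", "chr2:50-80", "chr2:90-120"),
  Gene  = c("A", "A", "A", "B"),
  Score = c(3, 1, 2, 4),
  stringsAsFactors = FALSE
)

# cdf.function = "empirical": replace each score by its empirical CDF value
xlink$CDF <- stats::ecdf(xlink$Score)(xlink$Score)   # 0.75 0.25 0.50 1.00

# Hypothetical helper: summarise the scores of one gene over its GR
gene_score <- function(s, scheme = c("max", "sum", "sequential")) {
  scheme <- match.arg(scheme)
  r <- sort(s, decreasing = TRUE)        # R_1 >= R_2 >= ...
  switch(scheme,
    max        = r[1],
    sum        = sum(r),
    sequential = sum(r / seq_along(r))   # sum over i of R_i / i
  )
}

# Hypothetical helper: min-max rescaling into [0, 1] (an assumption)
rescale01 <- function(x) {
  rng <- range(x)
  if (rng[1] == rng[2]) return(rep(1, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

by_gene <- split(xlink$Score, xlink$Gene)
seq_scores <- sapply(by_gene, gene_score, scheme = "sequential")
seq_scores              # A: 3/1 + 2/2 + 1/3 = 4.333..., B: 4
rescale01(seq_scores)   # A: 1, B: 0
```

Here gene A has three modest links and gene B a single strong one; "max" favours B (4 vs 3), "sum" favours A (6 vs 4), and "sequential" sits in between because each additional link is down-weighted by its rank.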
## Not run:
RData.location <- "http://galahad.well.ox.ac.uk/bigdata"
# 1) provide the genomic regions
## load ImmunoBase
ImmunoBase <- xRDataLoader(RData.customised='ImmunoBase',
RData.location=RData.location)
## get lead SNPs reported in AS GWAS and their significance info (p-values)
gr <- ImmunoBase$AS$variant
names(gr) <- NULL
dGR <- xGR(gr, format="GRanges")
# 2) using built-in crosslink info
## enhancer genes
df_xGenes <- xGR2xGenes(dGR, format="GRanges", crosslink="genehancer",
RData.location=RData.location)
## conformation genes
df_xGenes <- xGR2xGenes(dGR, format="GRanges",
crosslink="PCHiC_PMID27863249_combined", RData.location=RData.location)
## eQTL genes
df_xGenes <- xGR2xGenes(dGR, format="GRanges",
crosslink="GTEx_V6p_combined", RData.location=RData.location)
## nearby genes (50kb, decaying rapidly)
df_xGenes <- xGR2xGenes(dGR, format="GRanges", crosslink="nearby",
nearby.distance.max=50000, nearby.decay.kernel="rapid",
RData.location=RData.location)
# 3) advanced use
# 3a) provide crosslink.customised
## illustration purpose only (see the content of 'crosslink.customised')
df <- xGR2nGenes(dGR, format="GRanges", RData.location=RData.location)
crosslink.customised <- data.frame(GR=df$GR, Gene=df$Gene,
Score=df$Weight, Context=rep('C',nrow(df)), stringsAsFactors=F)
#crosslink.customised <- data.frame(GR=df$GR, Gene=df$Gene, Score=df$Weight, stringsAsFactors=F)
# 3b) define crosslinking genes
# without gene scoring
df_xGenes <- xGR2xGenes(dGR, format="GRanges",
crosslink.customised=crosslink.customised,
RData.location=RData.location)
# with gene scoring
df_xGenes <- xGR2xGenes(dGR, format="GRanges",
crosslink.customised=crosslink.customised, scoring=T,
scoring.scheme="max", RData.location=RData.location)
## End(Not run)
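The examples above all pass a 'GRanges' object. For the text-based input formats described under 'data', the base-R sketch below shows how 'chr:start-end' strings map onto the three-column 'data.frame' layout, and how a 'bed' table relates to it. Both helpers are hypothetical (not XGR functions), and the bed conversion assumes the standard BED convention of a 0-based start with an unchanged end.

```r
# Hypothetical helper: split 'chrN:start-end' (or 'chrN:pos') strings
# into chromosome, start and end columns.
parse_gr <- function(x) {
  chr   <- sub(":.*$", "", x)
  pos   <- sub("^[^:]*:", "", x)
  start <- as.numeric(sub("-.*$", "", pos))
  # single positions have no '-', so the end defaults to the start
  end   <- ifelse(grepl("-", pos), as.numeric(sub("^.*-", "", pos)), start)
  data.frame(chr = chr, start = start, end = end, stringsAsFactors = FALSE)
}

# Hypothetical helper: shift a bed-style table to 1-based start positions
# (assumes the standard BED convention; the end column is left as is).
bed_to_one_based <- function(bed) {
  data.frame(chr = bed[[1]], start = bed[[2]] + 1, end = bed[[3]],
             stringsAsFactors = FALSE)
}

parse_gr(c("chr1:13-20", "chrX:500"))
bed_to_one_based(data.frame(chr = "chr1", start = 12, end = 20))
```

Under these assumptions, the bed row (chr1, 12, 20) and the string 'chr1:13-20' describe the same region.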