R/data.R

#' Data frame with overlaps between GoNL variants and 450K probes
#'
#' Data frame containing all SNPs and short INDELS from GoNLv5 that
#' overlap with 450K probes.  This release does not include X and Y
#' chromosomes, so only information for autosomal probes is
#' available. For each overlap there is a unique row. Consequently,
#' some probes are duplicated (probes that overlap with multiple
#' variants) and some variants are duplicated (some variants overlap
#' with more than one probe).
#'
#'
#' @format A data frame with 207866 rows and 19 variables:
#' \describe{
#'   \item{CHROM}{chromosome; X and Y chromosomes are not available,
#'        since they are not included in this GoNL release}
#'   \item{probe}{probe ID}
#'   \item{type}{Infinium probe design}
#'   \item{strand}{orientation of the probe}
#'   \item{probeType}{whether the probe measures a CpG-site (cg) or
#'        a non-CpG site (ch)}
#'   \item{location_c}{Location of the queried 'C' of the CpG dinucleotide. 
#'        Note that this is the location of the C that is actually measured.
#'        For probes that interrogate the reverse strand (plus-strand probes) 
#'        this is one base downstream of the C nucleotide on the forward strand}
#'   \item{location_g}{Location of the G nucleotide of the CpG dinucleotide. 
#'        Note that this is the location of the queried G. For probes that 
#'        interrogate the reverse strand (plus-strand probes) this is one base
#'        upstream of the G nucleotide on the forward strand}
#'   \item{ID}{SNP ID}
#'   \item{snpBeg}{Start coordinate of the variant. Identical to snpEnd for 
#'        SNPs.}
#'   \item{snpEnd}{End coordinate of the variant. Identical to snpBeg for SNPs}
#'   \item{AF}{Allele frequency of alternative allele}
#'   \item{REF}{Reference allele}
#'   \item{ALT}{Alternative allele}
#'   \item{FILTER}{Filter information from GoNL.}
#'   \item{MAF}{Minor allele frequency}
#'   \item{variantType}{SNP or INDEL}
#'   \item{distance_3end}{Distance between SNP and 3'end of the probe. For type
#'        I probes the 3'end of the probe coincides with the queried C 
#'        nucleotide. For type II probes the 3'end of the probe coincides with 
#'        the G nucleotide directly after the C nucleotide.}
#'   \item{distance_c}{Distance from queried C nucleotide. A distance of -1 
#'        indicates that the SNP overlaps the SBE-position for type I probes.}
#'   \item{channel_switch}{Indicates whether a variant in the SBE-location of
#'        type I probes causes a color-channel switch or whether the 
#'        SBE-location overlaps with an INDEL. For plus-strand probes C/T, C/A 
#'        and C/G SNPs are expected to cause a color-channel switch. For 
#'        minus-strand probes A/G, G/T and C/G SNPs are expected to cause a 
#'        color-channel switch.}
#' }
#'
#' @usage data(hg19.GoNLsnps)
#' 
#' @examples
#'     data(hg19.GoNLsnps)
#'     
#'     # Select variants that overlap with queried C nucleotide
#'     snps_c <- hg19.GoNLsnps[hg19.GoNLsnps$distance_c == 0, ] 
#'     
#'     # Select all INDELS
#'     indels <- hg19.GoNLsnps[hg19.GoNLsnps$variantType == "INDEL",] 
#'     
#'     # Select SNPs that cause a channel-switch
#'     channel_switch <- hg19.GoNLsnps[!is.na(hg19.GoNLsnps$channel_switch) &
#'         hg19.GoNLsnps$channel_switch == "Yes", ]
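#'
#'     # A minimal sketch, not part of the original examples: because each
#'     # probe-variant overlap has its own row, probe IDs can repeat.
#'     # Counting rows per probe shows how many variants overlap each probe.
#'     variants_per_probe <- table(hg19.GoNLsnps$probe)
#'     head(sort(variants_per_probe, decreasing = TRUE))
#'
#'     # Probes that overlap with more than one variant
#'     multi_variant_probes <- names(variants_per_probe)[variants_per_probe > 1]
#'
#'     # Common SNPs (MAF > 0.01) that overlap with the queried C nucleotide.
#'     # This assumes that MAF is stored as a numeric column; which() drops
#'     # rows for which the comparison is NA.
#'     common_snps_c <- hg19.GoNLsnps[which(
#'         hg19.GoNLsnps$variantType == "SNP" &
#'         hg19.GoNLsnps$distance_c == 0 &
#'         hg19.GoNLsnps$MAF > 0.01), ]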
#'
#' @source 
#'     \url{http://zwdzwd.github.io/InfiniumAnnotation}
#'         
#'     \url{https://molgenis26.target.rug.nl/downloads/gonl_public/variants/release5/}
"hg19.GoNLsnps"

#' HM450 population-specific probe-masking recommendations
#' 
#' Adapted version of the annotation file provided by Zhou et al. (see
#' source, Mar-13-2017 release).  This annotation file contains
#' population-specific probe-masking recommendations based on SNPs
#' within 5 bases from the 3'end of the probe, mapping issues,
#' non-unique 3' 30bp subsequence and channel-switching SNPs in the
#' single-base-extension for type I probes. We added
#' population-specific masking recommendations for the Dutch
#' population using GoNL release 5. This release does not include X
#' and Y chromosomes, so for the Dutch population, only masking
#' information for the autosomal probes is available.
#' 
#' Note: Zhou et al. identified several probes that match to a
#' different location than annotated in the original Illumina manifest
#' file. The authors have used the 'updated' location in their
#' annotation file. Therefore, a handful of probes in this annotation
#' file are annotated to a different location than the original
#' Illumina manifest file. For the identification of the overlaps we
#' used the locations as annotated in the original Illumina file.
#' Therefore, a few of the identified overlaps do not match the
#' locations as specified in this annotation file.  This also explains
#' why GoNL masking information is available for a couple of probes
#' that are located in the X- and Y-chromosome in this annotation:
#' these probes map to autosomal locations in the original Illumina
#' file. All probes that map to a different location than originally
#' annotated are recommended to be masked (in the MASK.mapping and
#' MASK.general column), so generally they won't be included in
#' further analyses.
#'
#' @format A GRanges object with 485577 ranges and 65 metadata columns:
#' \describe{
#'   \item{MASK.general.pop}{Recommended general purpose masking merged from 
#'        "MASK.sub30.copy", "MASK.mapping" (in either the hg38 or hg19 
#'        genome), "MASK.extBase", "MASK.typeINextBaseSwitch" and 
#'        "MASK.snp5.pop" from the "hm450.manifest" file and the 
#'        "hm450.manifest.pop" file (see source). For GoNL, 
#'        "MASK.typeINextBaseSwitchandINDEL.GoNL" is used instead of 
#'        "MASK.typeINextBaseSwitch"}
#'   \item{MASK.snp5.pop}{Whether the 5bp 3'-subsequence (including extension 
#'         for type II) overlaps with a SNP with population-specific AF > 0.01}
#'   \item{MASK.typeINextBaseSwitchandINDEL.GoNL}{SNPs (that cause a 
#'        color-channel switch) and INDELS with AF > 0.01 in GoNL.  In 
#'        contrast, the "MASK.typeINextBaseSwitch" column is based on all SNPs 
#'        in 1000 genomes and dbSNP, regardless of population or allele 
#'        frequency}
#' }
#'
#' @source \url{http://zwdzwd.github.io/InfiniumAnnotation}
#' 
#' @usage data(hm450.manifest.pop.GoNL)
#' 
#' @examples 
#' data(hm450.manifest.pop.GoNL)
#'
#' # Select probes that should be masked in the Dutch population 
#' # (note that X and Y chromosomes are not included)
#' masked_gonl <- hm450.manifest.pop.GoNL[!is.na(
#'     hm450.manifest.pop.GoNL$MASK.general.GoNL) &
#'     hm450.manifest.pop.GoNL$MASK.general.GoNL == TRUE, ]
#'   
#' # Select probes that should be masked in the Dutch population because there 
#' # is a SNP within 5 bases of the 3'end of the probe 
#' # (note that X and Y chromosomes are not included)
#' masked_snp5_gonl <- hm450.manifest.pop.GoNL[!is.na(
#'     hm450.manifest.pop.GoNL$MASK.snp5.GoNL) &
#'     hm450.manifest.pop.GoNL$MASK.snp5.GoNL == TRUE, ]
#'     
#' # To include X- and Y-chromosomal probes when studying a Dutch population,
#' # the EUR or CEU population can be used instead.
#' # Select probes that should be masked in the European population 
#' # (these include X and Y chromosomes)
#' masked_eur <- hm450.manifest.pop.GoNL[
#'     hm450.manifest.pop.GoNL$MASK.general.EUR == TRUE, ]
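#'
#' # A minimal sketch, not part of the original examples, of applying the
#' # GoNL masking recommendation to a methylation matrix. `betas` is a
#' # hypothetical toy matrix with probe IDs as row names, and the sketch
#' # assumes that the GRanges object carries probe IDs as names().
#' data(hm450.manifest.pop.GoNL)  # reload so the sketch uses the full object
#' is_masked <- !is.na(hm450.manifest.pop.GoNL$MASK.general.GoNL) &
#'     hm450.manifest.pop.GoNL$MASK.general.GoNL == TRUE
#' masked_ids <- names(hm450.manifest.pop.GoNL)[is_masked]
#'
#' # Toy beta-value matrix: 10 probes (rows) by 3 samples (columns)
#' toy_ids <- head(names(hm450.manifest.pop.GoNL), 10)
#' betas <- matrix(runif(30), nrow = 10, dimnames = list(toy_ids, NULL))
#'
#' # Keep only the probes that are not recommended for masking
#' betas_unmasked <- betas[!rownames(betas) %in% masked_ids, , drop = FALSE]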
#'
#'
#' @references
#'     Zhou W, Laird PW and Shen H: Comprehensive characterization, 
#'     annotation and innovative use of Infinium DNA Methylation BeadChip 
#'     probes.
#'     Nucleic Acids Research 2016
"hm450.manifest.pop.GoNL"

omicsPrint documentation built on Nov. 8, 2020, 4:55 p.m.