View source: R/create_hapmap_reference.R
create_hapmap_reference | R Documentation |
This function creates the standard allele reference file, as used by QC_GWAS and match_alleles, from data publicly available at the website of the International HapMap Project (see 'References').
create_hapmap_reference(dir = getwd(),
    download_hapmap = FALSE, download_subset,
    hapmap_files = list.files(path = dir, pattern = "freqs_chr"),
    filename = "allele_reference_HapMap",
    save_txt = TRUE, save_rdata = !save_txt,
    return_reference = FALSE)
dir |
character string; the directory of the input and output files. Note that R uses forward slash (/) where Windows uses the backslash (\). |
download_hapmap |
logical; if |
download_subset |
character string; indicates the population to download for creating the reference. Options are: ASW, CEU, CHB, CHD, GIH, JPT, LWK, MEX, MKK, TSI, YRI. |
hapmap_files |
character vector of the filenames of HapMap frequency files to be included in the reference. The default option includes all files with the string "freqs_chr" in the filename. (This argument is only used when |
filename |
character string; the name of the output file, without file-extension. |
save_txt, save_rdata |
logical; should the reference be saved as a tab-delimited text file and/or an RData file? If saved as RData, the object name |
return_reference |
logical; should the function return the reference as its output value? |
The function removes SNPs with invalid alleles and with allele frequencies that do not add up to 1. It also removes all instances of duplicate SNPids. If such entries are encountered, a warning is printed in the R console and the entries are saved in a .txt file in the output directory.
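The three filters described above can be illustrated with a minimal base-R sketch. This is not the package's own code: the column names (SNPID, A1, A2, FRQ1, FRQ2), the set of alleles treated as valid, and the numeric tolerance are all assumptions made for the example.

```r
# Toy allele-frequency table; the column names are hypothetical.
freqs <- data.frame(
  SNPID = c("rs1", "rs2", "rs3", "rs4", "rs4", "rs5"),
  A1    = c("A", "C", "N", "G", "G", "T"),
  A2    = c("G", "T", "A", "A", "A", "C"),
  FRQ1  = c(0.30, 0.60, 0.50, 0.25, 0.25, 0.10),
  FRQ2  = c(0.70, 0.30, 0.50, 0.75, 0.75, 0.90),
  stringsAsFactors = FALSE
)

# Assumption: only the four nucleotide codes count as valid alleles.
valid_alleles <- c("A", "C", "G", "T")

# Flag SNPs with an allele outside the accepted set.
bad_allele <- !(freqs$A1 %in% valid_alleles & freqs$A2 %in% valid_alleles)

# Flag SNPs whose two frequencies do not sum to 1 (small tolerance assumed).
bad_freq <- abs(freqs$FRQ1 + freqs$FRQ2 - 1) > 1e-6

# Flag every instance of a duplicated SNP ID, not only the repeats.
dup_id <- freqs$SNPID %in% freqs$SNPID[duplicated(freqs$SNPID)]

drop      <- bad_allele | bad_freq | dup_id
removed   <- freqs[drop, ]
reference <- freqs[!drop, ]

# Keep a record of the removed entries in a tab-delimited text file.
removed_file <- tempfile(fileext = ".txt")
write.table(removed, file = removed_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Matching on all instances of a duplicated ID, rather than calling `duplicated()` alone, is what removes the first occurrence along with the repeats.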
Like QC_GWAS, create_hapmap_reference codes the X chromosome as 23, Y as 24, XY (not available on the HapMap website) as 25 and M as 26.
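As an illustration of this coding scheme, the sketch below recodes chromosome labels to integers. The helper name `recode_chr` and the handling of a "chr" prefix are hypothetical and not part of the package; only the four numeric codes come from the text above.

```r
# Lookup table for the non-autosomal chromosomes, as described above.
chr_codes <- c(X = 23L, Y = 24L, XY = 25L, M = 26L)

# Hypothetical helper: convert chromosome labels to integer codes.
recode_chr <- function(chr) {
  # Strip an optional "chr" prefix and normalise case (an assumption).
  chr <- toupper(sub("^chr", "", as.character(chr), ignore.case = TRUE))
  is_named <- chr %in% names(chr_codes)
  out <- rep(NA_integer_, length(chr))
  # Named chromosomes use the lookup table.
  out[is_named] <- unname(chr_codes[chr[is_named]])
  # Everything else is parsed as a number; unparseable labels become NA.
  out[!is_named] <- suppressWarnings(as.integer(chr[!is_named]))
  out
}

recode_chr(c("1", "chr22", "X", "Y", "XY", "M"))
```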
Both the .RData export and the function return store the alleles as factors rather than character strings.
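If downstream code expects character alleles, the factor columns can be converted after loading. The sketch below uses a small stand-in table with hypothetical column names; it does not depend on the actual layout of the reference.

```r
# Stand-in for a reference table whose allele columns are factors;
# the column names are hypothetical.
ref <- data.frame(
  SNPID = c("rs1", "rs2"),
  A1 = factor(c("A", "C")),
  A2 = factor(c("G", "T")),
  stringsAsFactors = FALSE
)

# Convert every factor column to character, leaving other columns alone.
is_fac <- vapply(ref, is.factor, logical(1))
ref[is_fac] <- lapply(ref[is_fac], as.character)

str(ref)
```

Converting with `as.character()` preserves the allele labels, whereas `as.integer()` on a factor would return the internal level codes.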
If return_reference is TRUE, the function returns the generated reference table. If FALSE, it returns an invisible NULL.
The required data is available at the website of the International HapMap Project, under bulk data downloads > bulk data > frequencies:
http://hapmap.ncbi.nlm.nih.gov
The HapMap files downloaded by this function are subject to the HapMap terms and policies. See: http://hapmap.ncbi.nlm.nih.gov/datareleasepolicy.html
match_alleles
# This command will download the CEU HapMap dataset and use
# it to generate an allele-reference. Create a folder
# "new_hapmap" to store the data and make sure there is
# sufficient disk space and a reasonably fast internet
# connection.

## Not run: 
new_hapmap <- create_hapmap_reference(dir = "C:/new_hapmap",
    download_hapmap = TRUE, download_subset = "CEU",
    filename = "new_hapmap", save_txt = TRUE,
    return_reference = TRUE)
## End(Not run)