munge | R Documentation |
Function to process GWAS summary statistics files and prepare them for LD score regression
munge(files,hm3,trait.names=NULL,N,info.filter = .9,maf.filter=0.01, column.names=list(),parallel=FALSE,cores=NULL,overwrite=TRUE, ...)
files |
A vector of file names. Files must be located in the working directory, or a path to each file must be provided. |
hm3 |
A file of SNPs with A1, A2, and rsID, used to align alleles across traits. We suggest using an (UNZIPPED) file of HapMap3 SNPs with some basic cleaning applied (e.g., MHC region removed) that is created and supplied by the original LD score regression developers and is available here: https://data.broadinstitute.org/alkesgroup/LDSCORE/w_hm3.snplist.bz2 |
trait.names |
A vector of trait names which will be used as names for the munged files |
N |
A vector of sample sizes |
info.filter |
Numeric value used as a lower bound for imputation quality (INFO) |
maf.filter |
Numeric value used as a lower bound for minor allele frequency (MAF) |
column.names |
Optional list detailing which columns represent SNP, MAF, etc., e.g. list(SNP=my_snp_column) |
parallel |
Indicates whether munge should process the summary statistics files in parallel or in serial. Default is FALSE, indicating that the files will be processed serially. |
cores |
Indicates how many cores to use when running in parallel. The default is NULL, in which case munge will use one less than the total number of cores available in the local environment. |
overwrite |
Indicates whether existing .sumstats.gz files should be overwritten |
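To illustrate what the info.filter and maf.filter lower bounds mean, here is a minimal base R sketch on a toy data frame. It is not the code munge runs internally; the column names INFO and MAF, the toy values, and the use of an inclusive comparison are assumptions made for illustration only.

```r
# Toy summary statistics (hypothetical values and column names).
sumstats <- data.frame(
  SNP  = c("rs1", "rs2", "rs3", "rs4"),
  INFO = c(0.95, 0.85, 0.99, 0.92),
  MAF  = c(0.20, 0.30, 0.005, 0.05),
  stringsAsFactors = FALSE
)

# Same values as the defaults shown in the usage line.
info.filter <- 0.9
maf.filter  <- 0.01

# Keep only SNPs that meet both lower bounds.
# Assumption: the comparison is inclusive (>=); the real function may differ.
kept <- sumstats[sumstats$INFO >= info.filter & sumstats$MAF >= maf.filter, ]

print(kept$SNP)
```

In this toy data, rs2 falls below the INFO bound and rs3 falls below the MAF bound, so only rs1 and rs4 remain.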
The function writes files in the ".sumstats" format, which can be used to estimate SNP heritability and genetic covariance using the ldsc() function. The function will also output a .log file that should be examined to ensure that column names are being interpreted correctly.
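A call might look like the sketch below. The file names, trait names, and sample sizes are hypothetical placeholders, and the sketch assumes that munge is provided by the GenomicSEM package and that the HapMap3 SNP list has already been downloaded and unzipped into the working directory. The call is guarded so that nothing runs unless the package and the input files are actually present.

```r
# Hypothetical inputs: replace with your own files, names, and sample sizes.
files       <- c("trait1_gwas.txt", "trait2_gwas.txt")
trait.names <- c("trait1", "trait2")
N           <- c(50000, 42000)
hm3         <- "w_hm3.snplist"   # assumed name of the unzipped HapMap3 SNP list

# Only attempt the call when every input file exists.
inputs_present <- all(file.exists(c(files, hm3)))

# Assumption: munge is exported by the GenomicSEM package.
if (requireNamespace("GenomicSEM", quietly = TRUE) && inputs_present) {
  GenomicSEM::munge(
    files = files, hm3 = hm3, trait.names = trait.names, N = N,
    info.filter = 0.9, maf.filter = 0.01,
    parallel = FALSE, overwrite = TRUE
  )
} else {
  message("Skipping: GenomicSEM or the input files are not available.")
}
```

After a run, check the .log file to confirm that each column of the input files was assigned the intended role.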