read_gwas: Read a GWAS results file into a data frame.


View source: R/read.r


Read a GWAS results file into a data frame.


read_gwas(input, missing = c("NA", "N/A", "null", "."),
  chromosome_style = "ucsc", preprocess = NULL, verbose = TRUE)



input: Path to a file containing GWAS summary statistics. If multiple paths are supplied, all files are read and combined into a single data.frame.


missing: Character vector of strings that represent missing value codes. By default the following strings are interpreted as NA: "", ".", "NA", "N/A", and "null".


chromosome_style: Convert chromosomes to ordered factors with labels based on the specified style (default is "ucsc"; see below for a comparison of the different styles). Set to NULL to leave chromosomes unchanged.


preprocess: A shell command that preprocesses the file before it is read into R; see below for more details.


verbose: If TRUE, print a description of the processing steps.
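
To illustrate how missing value codes are interpreted, here is a minimal sketch using base R's read.table() and its na.strings argument. It does not call read_gwas() itself, and the toy data and the object names (lines, missing_codes, gwas) are made up for illustration; whether read_gwas() handles missing codes through the same mechanism is an assumption.

```r
# Toy summary statistics that use several different missing-value codes.
lines <- c(
  "SNP CHR P",
  "rs1 1 0.01",
  "rs2 2 N/A",
  "rs3 . null",
  "rs4 4 NA"
)

# The same codes that appear as the default for `missing` in the usage above.
missing_codes <- c("NA", "N/A", "null", ".")

# Every listed code is converted to NA, so CHR and P are read as numeric
# columns instead of falling back to character.
gwas <- read.table(text = lines, header = TRUE,
                   na.strings = missing_codes, stringsAsFactors = FALSE)

is.na(gwas$P)
```

Without the extra codes, a single "N/A" or "null" would be enough to turn the whole P column into character strings.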

Chromosome Styles

We use the Homo sapiens chromosome styles defined in Bioconductor's GenomeInfoDb. Valid options include "ncbi", "ensembl", "ucsc" and "dbsnp". The following table provides a preview of each style (note ncbi and ensembl are identical):

ncbi/ensembl    ucsc    dbsnp
1               chr1    ch1
2               chr2    ch2
3               chr3    ch3
...             ...     ...
X               chrX    chX
Y               chrY    chY
MT              chrM    chMT
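
As a rough illustration of what converting chromosomes to an ordered factor in the "ucsc" style could look like, the sketch below relabels NCBI/Ensembl-style names in base R. The helper to_ucsc() is hypothetical and is not part of the package; it assumes the human chromosomes 1 to 22, X, Y and MT and does not use GenomeInfoDb.

```r
# Hypothetical helper: relabel NCBI/Ensembl-style chromosome names in the
# UCSC style and return an ordered factor (1 < 2 < ... < 22 < X < Y < M).
to_ucsc <- function(chr) {
  ncbi_levels <- c(as.character(1:22), "X", "Y", "MT")
  ucsc_labels <- c(paste0("chr", c(1:22, "X", "Y")), "chrM")
  factor(as.character(chr), levels = ncbi_levels,
         labels = ucsc_labels, ordered = TRUE)
}

chr <- to_ucsc(c("MT", "2", "X", "10"))

# Because the factor is ordered, sorting follows chromosome order rather
# than alphabetical order.
sort(chr)
```

An ordered factor keeps chr2 ahead of chr10, which plain character sorting would not.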


Preprocessing

The preprocess argument allows you to specify a shell command that preprocesses the file before it is read into R. For example, we could use grep to filter our results to include only markers with an rs number:

 read_gwas("my-results.txt", preprocess = "grep -e '^rs'") 

Note that read_gwas() handles the header row separately so column labels wouldn't be filtered out by grep in this example.
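
The effect of treating the header separately can be mimicked in plain R. The sketch below is only an emulation on an in-memory vector of lines, with made-up data and object names; it is not how read_gwas() is implemented. It applies the same '^rs' filter to the data rows only and then re-attaches the header.

```r
# Toy file contents: one header line followed by three markers.
lines <- c(
  "SNP CHR P",
  "rs123 1 0.04",
  "chr1:1000 1 0.20",
  "rs456 2 0.50"
)

# Split off the header so the filter never sees it.
header <- lines[1]
body <- lines[-1]

# Equivalent of grep -e '^rs' applied to the data rows only.
kept <- body[grepl("^rs", body)]

# Re-attach the header before parsing.
gwas <- read.table(text = c(header, kept), header = TRUE,
                   stringsAsFactors = FALSE)
gwas$SNP
```

Had the filter been applied to every line, the header "SNP CHR P" would have been dropped because it does not start with "rs".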

By default, the input filename is appended to the preprocess argument prior to execution. However, you can control where the filename is inserted in the command by using %s as a placeholder. In the following example, tr is used to remove null terminators:

  # The backslash is doubled so that the shell receives the literal text \000
  read_gwas("my-results.txt", preprocess = "tr -d '\\000' < %s")
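
The two ways of combining the command and the filename described above can be sketched as follows. build_command() is a hypothetical helper written for illustration; the package's actual implementation may differ, for example in how it quotes filenames.

```r
# Hypothetical helper: insert the filename at the %s placeholder if there
# is one, otherwise append it to the end of the command.
build_command <- function(preprocess, file) {
  if (grepl("%s", preprocess, fixed = TRUE)) {
    # fixed = TRUE treats both the pattern and the replacement literally
    sub("%s", file, preprocess, fixed = TRUE)
  } else {
    paste(preprocess, file)
  }
}

# No placeholder: the filename is appended.
cat(build_command("grep -e '^rs'", "my-results.txt"), "\n")

# Placeholder present: the filename replaces %s.
cat(build_command("tr -d '\\000' < %s", "my-results.txt"), "\n")
```

Using sub() with fixed = TRUE rather than sprintf() means any other percent signs in the command are left alone.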

aaronwolen/gwasio documentation built on April 9, 2018, 9:07 a.m.