

Read a GWAS results file into a data frame.

read_gwas(input, sep = "auto", missing = c("NA", "N/A", "null", "."),
  chromosome_style = "ucsc", preprocess = NULL, nrows = -1L,
  header = TRUE, col.names = NULL, verbose = TRUE)

`input`	Path to a file containing GWAS summary statistics. If multiple paths are specified, all files are read and combined into a single `data.frame`.
`sep`	The separator between columns. By default (`"auto"`), the separator is the first character in the set [`,\t \|;:`] that appears on line `autostart` outside quoted (`""`) regions and that also splits the rows above `autostart` into a consistent number of fields.
`missing`	Character vector of strings that represent missing values. By default the following strings are interpreted as `NA`: `""`, `"."`, `"NA"`, `"N/A"`, and `"null"`.
`chromosome_style`	Convert chromosomes to ordered factors with labels based on the specified style (default is `"ucsc"`; see below for a comparison of the different styles). Set to `NULL` to leave chromosomes unchanged.
`preprocess`	A shell command that preprocesses the file before it is read; see below for more details.
`nrows`	The number of rows to read. The default, -1, reads all rows. Unlike with `read.table`, setting this to the number of rows in the file (or an estimate) does not improve speed, because the number of rows is determined automatically and quickly. Set `nrows` only when you want a subset of the file, for example the first 10 rows. `nrows = 0` is a special case that returns only the column names and types; use it, for example, as a dry run on a large file or to quickly check that a set of files has a consistent format before reading any of them.
`header`	Does the first data line contain column names? Defaults according to whether every non-empty field on the first data line is of type character. If so, or if `TRUE` is supplied, any empty column names are given a default name.
`col.names`	An optional vector of names for the variables (columns). By default, names are taken from the header row if one is present or detected; otherwise columns are named `"V"` followed by the column number (`V1`, `V2`, ...).
`verbose`	If `TRUE`, print a description of each processing step.
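The `header` default relies on a simple rule: the first line is treated as column names when every non-empty field on it is character rather than numeric. The sketch below illustrates that rule in base R. `looks_like_header()` is a hypothetical helper, not part of `read_gwas()`, and it simplifies "type character" to "does not parse as a number"; the actual detection may distinguish more types.

```r
# Hypothetical, simplified version of the header heuristic described for
# `header`; not the function read_gwas() actually uses.
looks_like_header <- function(first_line, sep = "\t") {
  fields <- strsplit(first_line, sep, fixed = TRUE)[[1]]
  fields <- fields[nzchar(fields)]  # ignore empty fields
  # as.numeric() yields NA for strings that are not numbers, so a line is a
  # header candidate only if none of its fields parses as a number
  all(is.na(suppressWarnings(as.numeric(fields))))
}

looks_like_header("SNP\tCHR\tBP\tP")        # all fields are text
looks_like_header("rs123\t1\t752566\t0.04") # contains numeric fields
```

A single numeric field is enough to classify the line as data, which is why passing `header = TRUE` explicitly is useful when column names happen to look like numbers.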

We use the Homo sapiens chromosome styles defined in Bioconductor's GenomeInfoDb. Valid options are "ncbi", "ensembl", "ucsc" and "dbsnp". The following table previews each style (the ncbi and ensembl styles are identical):

ncbi/ensembl	ucsc	dbsnp
1	chr1	ch1
2	chr2	ch2
3	chr3	ch3
...	...	...
X	chrX	chX
Y	chrY	chY
MT	chrM	chMT
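To make the table concrete, here is a minimal base-R sketch of converting NCBI/Ensembl-style names into the other styles as an ordered factor. `convert_chrom_style()` is a hypothetical helper, not the package's implementation; it assumes the input is already in NCBI/Ensembl notation and covers only the human chromosomes 1 to 22, X, Y and MT.

```r
# Hypothetical helper (not part of read_gwas()): converts NCBI/Ensembl-style
# human chromosome names to another style as an ordered factor.
convert_chrom_style <- function(chrom, style = c("ucsc", "dbsnp", "ncbi", "ensembl")) {
  style <- match.arg(style)
  # Human chromosomes in NCBI/Ensembl notation, in karyotype order
  ncbi <- c(as.character(1:22), "X", "Y", "MT")
  labels <- switch(style,
    ncbi    = ncbi,
    ensembl = ncbi,                                  # identical to NCBI
    ucsc    = paste0("chr", sub("^MT$", "M", ncbi)), # MT becomes chrM
    dbsnp   = paste0("ch", ncbi)                     # MT becomes chMT
  )
  # Names not in NCBI notation (e.g. "chr1") become NA
  factor(as.character(chrom), levels = ncbi, labels = labels, ordered = TRUE)
}

chrom <- c("10", "2", "X", "MT")
convert_chrom_style(chrom, "ucsc")
as.character(sort(convert_chrom_style(chrom, "dbsnp")))  # "ch2" "ch10" "chX" "chMT"
```

Using an ordered factor rather than a character vector means chromosomes sort in karyotype order (2 before 10) instead of alphabetically.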

The `preprocess` argument lets you specify shell commands that preprocess the file before it is read into R. For example, we could use grep to filter our results so that they include only markers with an rs number:

read_gwas("my-results.txt", preprocess = "grep -e '^rs'")

Note that `read_gwas()` handles the header row separately, so the column labels are not filtered out by grep in this example.
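Why the separate header handling matters can be shown with a small pure-R emulation. The sketch below uses `grepl()` in place of the shell `grep` call, and `filter_keep_header()` is a hypothetical helper, not how `read_gwas()` is implemented: the filter is applied only to the data lines, while the header line is kept as-is.

```r
# Hypothetical pure-R emulation of a preprocessing filter that leaves the
# header alone; grepl() stands in for the shell grep command.
filter_keep_header <- function(lines, pattern) {
  header <- lines[1L]
  body <- lines[-1L]
  c(header, body[grepl(pattern, body)])
}

lines <- c("SNP\tCHR\tP",
           "rs1\t1\t0.01",
           "1:752566\t1\t0.50",
           "rs2\t2\t0.20")

# Filtering every line would drop the header, since "SNP" does not match "^rs"
naive <- lines[grepl("^rs", lines)]

# Keeping the header out of the filter preserves the column names
kept <- filter_keep_header(lines, "^rs")
df <- read.table(text = kept, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
df
```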

By default, the input filename is appended to the `preprocess` command before it is executed. However, you can control where the filename is inserted in the command by using `%s` as a placeholder. In the following example, tr is used to remove null terminators: