format_data: Format GWAS summary data.

View source: R/Format_data.R

Description

Reads in GWAS summary data and infers Z scores from p-values and signed statistics. This function is adapted from the format_data() function in MRCIEU/TwoSampleMR.
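As a rough illustration of what inferring a Z score from a p-value and a signed statistic can look like, the sketch below uses the standard two-sided normal relationship. The helper name z_from_p is hypothetical and is not part of MRAPSS; the package's internal computation may differ.

```r
# Hypothetical helper (not an MRAPSS function): recover a Z score from a
# two-sided p-value and the sign of a signed statistic such as beta or log(OR).
z_from_p <- function(p, signed_stat) {
  sign(signed_stat) * qnorm(p / 2, lower.tail = FALSE)
}

b <- c(0.12, -0.08, 0.00)   # made-up effect sizes
p <- c(0.05, 0.001, 1)      # made-up two-sided p-values
z <- z_from_p(p, b)

# When both an effect size and its standard error are available,
# the Z score can instead be taken directly as b / se.
z_direct <- 0.12 / 0.06
```

An odds ratio would first be put on the log scale, since log(OR) carries the sign that the raw ratio does not.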

Usage

format_data(
  dat,
  snps.merge = w_hm3.snplist,
  snps.remove = MHC.SNPs,
  snp_col = "SNP",
  b_col = "b",
  or_col = "or",
  se_col = "se",
  freq_col = "freq",
  A1_col = "effect_allele",
  A2_col = "other_allele",
  p_col = "pval",
  ncase_col = "ncase",
  ncontrol_col = "ncontrol",
  n_col = "N",
  n = NULL,
  z_col = "z",
  info_col = "info",
  log_pval = FALSE,
  n_qc = FALSE,
  chi2_max = 80,
  min_freq = 0.05
)
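A minimal sketch of a call, assuming the MRAPSS package is installed and that the input table uses non-default column names. The toy values and the column names rsid, beta, ea, nea, p and n_total are invented for illustration, and the rs IDs are only examples: SNPs absent from snps.merge would presumably be dropped. Because such a tiny table may not survive the function's own checks, the call is wrapped in tryCatch.

```r
# Toy summary statistics with non-default column names (all values invented).
raw <- data.frame(
  rsid    = c("rs3094315", "rs3131972", "rs3131969"),
  beta    = c(0.012, -0.020, 0.005),
  se      = c(0.010, 0.011, 0.009),
  ea      = c("G", "A", "A"),
  nea     = c("A", "G", "G"),
  p       = c(0.23, 0.069, 0.58),
  n_total = c(50000, 50000, 50000),
  stringsAsFactors = FALSE
)

formatted <- NULL
if (requireNamespace("MRAPSS", quietly = TRUE)) {
  # Attach the package so the default data objects in the signature
  # (w_hm3.snplist, MHC.SNPs) can be found.
  library(MRAPSS)
  formatted <- tryCatch(
    format_data(
      raw,
      snp_col = "rsid",
      b_col   = "beta",
      se_col  = "se",
      A1_col  = "ea",
      A2_col  = "nea",
      p_col   = "p",
      n_col   = "n_total"
    ),
    error = function(e) {
      message("format_data() failed on the toy input: ", conditionMessage(e))
      NULL
    }
  )
}
```

Arguments left out of the call keep the defaults listed in the signature.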

Arguments

dat

Data frame of GWAS summary statistics. Must have a header with at least the SNP, A1, A2, signed statistic, p-value and sample size columns.

snps.merge

Data frame with the SNPs to extract. Must have the headers SNP, A1 and A2. For example, the HapMap3 SNP list.

snps.remove

A set of SNPs to remove. For example, the SNPs in the MHC region.

snp_col

Name of column with SNP rs IDs. The default is "SNP".

b_col

Name of column with effect sizes. The default is "b".

se_col

Name of column with standard errors. The default is "se".

freq_col

Name of column with effect allele frequency. The default is "freq".

A1_col

Name of column with effect allele. Must contain only the characters "A", "C", "T" or "G". The default is "effect_allele".

A2_col

Name of column with non-effect allele. Must contain only the characters "A", "C", "T" or "G". The default is "other_allele".

p_col

Name of column with p-value. The default is "pval".

ncase_col

Name of column with number of cases. The default is "ncase".

ncontrol_col

Name of column with number of controls. The default is "ncontrol".

n_col

Name of column with sample size. The default is "N".

n

Sample size. The default is NULL.

z_col

Name of column with Z score. The default is "z".

info_col

Name of column with imputation INFO score. The default is "info".

log_pval

Whether the p-value column holds -log10(p) rather than p. The default is FALSE.

n_qc

Whether to remove SNPs according to the sample size of SNPs. The default is FALSE.

chi2_max

SNPs with chi^2 test statistics larger than chi2_max will be removed. The default is 80.

min_freq

SNPs with allele frequency less than min_freq will be removed. The default is 0.05.

or_col

Name of column with odds ratio. The default is "or".
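The log_pval, chi2_max and min_freq arguments can be read as the following steps on a toy table. This is only a sketch of the behaviour described above, not the package's code: the data values are made up, and the chi^2 statistic is assumed here to be the squared Z score.

```r
# Toy table; column names follow the defaults in Usage, values are made up.
dat <- data.frame(
  SNP  = c("rs1", "rs2", "rs3", "rs4"),
  z    = c(1.2, -9.5, 0.4, 2.1),
  freq = c(0.30, 0.45, 0.02, 0.60),
  pval = c(0.64, 20.7, 0.16, 1.45),  # stored as -log10(p), as with log_pval = TRUE
  stringsAsFactors = FALSE
)

# log_pval = TRUE: convert -log10(p) back to a p-value.
dat$P <- 10^(-dat$pval)

# Assumption: chi^2 is taken as the squared Z score.
dat$chi2 <- dat$z^2

chi2_max <- 80
min_freq <- 0.05

# Drop SNPs whose chi^2 exceeds chi2_max or whose frequency is below min_freq.
keep <- dat$chi2 <= chi2_max & dat$freq >= min_freq
filtered <- dat[keep, ]
```

In this toy table rs2 is dropped for its large chi^2 (9.5^2 = 90.25) and rs3 for its low allele frequency.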

Value

Data frame with headers: SNP: rsid; A1: effect allele; A2: non-effect allele; Z: Z score; N: sample size; chi2: chi-square statistic; P: p-value.


YangLabHKUST/MRAPSS documentation built on Dec. 12, 2020, 11:36 p.m.