format_data: Format GWAS summary data.

View source: R/Format_data.R

format_data    R Documentation

Format GWAS summary data.

Description

Reads in GWAS summary data and infers Z scores from p-values and signed statistics. This function is adapted from the format_data() function in MRCIEU/TwoSampleMR.
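As a rough illustration of the inference step, a two-sided p-value together with the sign of an effect estimate determines a Z score through the normal quantile function. The sketch below uses base R only. The helper name z_from_p is hypothetical and not part of the package, and the package's own implementation may differ.

```r
# Hypothetical helper (not part of MR-APSS): recover a signed Z score
# from a two-sided p-value and a signed statistic such as b.
z_from_p <- function(p, signed_stat) {
  # |Z| is the upper-tail standard normal quantile of p / 2
  abs_z <- qnorm(p / 2, lower.tail = FALSE)
  # The sign of the statistic gives the direction of the effect
  sign(signed_stat) * abs_z
}

# Made-up p-values and effect sizes for illustration
p <- c(0.05, 0.001, 1)
b <- c(0.12, -0.30, 0.05)
z <- z_from_p(p, b)
```

When both an effect size and its standard error are available, the Wald statistic b / se is the usual alternative to going through the p-value.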

Usage

format_data(
  dat,
  snps.merge = w_hm3.snplist,
  snps.remove = MHC.SNPs,
  snp_col = "SNP",
  b_col = "b",
  or_col = "or",
  se_col = "se",
  freq_col = "freq",
  A1_col = "A1",
  A2_col = "A2",
  p_col = "p",
  ncase_col = "ncase",
  ncontrol_col = "ncontrol",
  n_col = "n",
  n = NULL,
  z_col = "z",
  info_col = "INFO",
  log_pval = FALSE,
  chi2_max = NULL,
  min_freq = 0.05
)

Arguments

dat

Data frame. Must have a header with at least the following columns: SNP, A1, A2, a signed statistic, p-value, and sample size.

snps.merge

Data frame with SNPs to extract. Must have headers SNP, A1, and A2. For example, the HapMap3 SNP list.

snps.remove

A set of SNPs to be removed. For example, the SNPs in the MHC region.

snp_col

Name of column with SNP rs IDs. The default is SNP.

b_col

Name of column with effect sizes. The default is b.

se_col

Name of column with standard errors. The default is se.

freq_col

Name of column with effect allele frequency. The default is freq.

A1_col

Name of column with effect allele. Must contain only the characters "A", "C", "T" or "G". The default is A1.

A2_col

Name of column with non-effect allele. Must contain only the characters "A", "C", "T" or "G". The default is A2.

p_col

Name of column with p-value. The default is p.

ncase_col

Name of column with number of cases. The default is ncase.

ncontrol_col

Name of column with number of controls. The default is ncontrol.

n_col

Name of column with sample size. The default is n.

n

Sample size. The default is NULL.

z_col

Name of column with Z score. The default is z.

info_col

Name of column with imputation INFO score. The default is INFO.

log_pval

Whether the values in p_col are -log10(p) rather than p-values. The default is FALSE.

chi2_max

SNPs with chi^2 statistics larger than chi2_max will be removed. The documented default is 80, although the usage above shows chi2_max = NULL.

min_freq

SNPs with allele frequency less than min_freq will be removed. The default is 0.05.

or_col

Name of column with odds ratio. The default is or.

n_qc

Whether to remove SNPs according to the sample size of SNPs. The default is FALSE.
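Several of the arguments above describe simple transformations and filters. The base R sketch below walks through them on a toy data frame: converting -log10 p-values back to p-values (log_pval), turning odds ratios into log odds ratios (or_col), and dropping SNPs by chi^2 and allele frequency (chi2_max, min_freq). The toy data and the column name mlogp are made up. The assumptions that effect sizes are taken as log(or) and that chi^2 is the squared Z score are illustrative, and format_data() itself may handle these steps differently.

```r
# Toy summary statistics; all values are made up for illustration.
dat <- data.frame(
  SNP   = c("rs1", "rs2", "rs3", "rs4"),
  or    = c(1.10, 0.90, 1.50, 1.02),
  se    = c(0.02, 0.03, 0.04, 0.05),
  freq  = c(0.30, 0.02, 0.45, 0.10),
  mlogp = c(5.7, 3.4, 23.0, 0.4),  # hypothetical column holding -log10(p)
  stringsAsFactors = FALSE
)

# log_pval = TRUE case: the column stores -log10(p), so invert it
dat$p <- 10^(-dat$mlogp)

# Assumption: an odds ratio is converted to an effect size on the log scale
dat$b <- log(dat$or)

# Assumption: Z is the Wald statistic and chi^2 is its square
dat$z <- dat$b / dat$se
dat$chi2 <- dat$z^2

# Thresholds as documented for chi2_max and min_freq
chi2_max <- 80
min_freq <- 0.05

# Keep SNPs that pass both filters
keep <- dat$chi2 <= chi2_max & dat$freq >= min_freq
cleaned <- dat[keep, ]
```

In this toy input, rs3 is dropped because its chi^2 exceeds 80 and rs2 is dropped because its allele frequency is below 0.05, leaving rs1 and rs4.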

Value

Data frame with headers: SNP: rsID; A1: effect allele; A2: non-effect allele; Z: Z score; N: sample size; chi2: chi-square statistic; P: p-value.


YangLabHKUST/MR-APSS documentation built on April 13, 2025, 7:56 p.m.