format_data: format data
In MRCIEU/mrQC: CheckSumStats

Description

Get the trait summary data ready for the QC checks.

Usage

format_data(
  dat = NULL,
  trait = NA,
  population = NA,
  ncase = NA,
  ncontrol = NA,
  rsid = NA,
  effect_allele = NA,
  other_allele = NA,
  beta = NA,
  se = NA,
  lnor = NA,
  lnor_se = NA,
  eaf = NA,
  p = NA,
  or = NA,
  or_lci = NA,
  or_uci = NA,
  chr = NA,
  pos = NA,
  z_score = NA,
  drop_duplicate_rsids = TRUE
)
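
As a rough illustration of how the column-name arguments map onto a dataset, the sketch below builds a toy data frame and passes its column names to `format_data()`. The data frame `toy`, its column names (`snp`, `a1`, `odds_ratio`, and so on), and the trait and population labels are all made up for this example. The call is guarded so that it only runs when the CheckSumStats package is installed, and it has not been checked against any particular package version.

```r
# Toy summary statistics. Every column name here is hypothetical and chosen
# only to show how names are passed to format_data().
toy <- data.frame(
  snp        = c("rs1", "rs2", "rs3"),
  a1         = c("A", "C", "G"),
  a2         = c("G", "T", "A"),
  odds_ratio = c(1.10, 0.95, 1.25),
  lower95    = c(1.02, 0.88, 1.05),
  upper95    = c(1.19, 1.03, 1.49),
  freq       = c(0.21, 0.47, 0.08),
  pval       = c(0.014, 0.21, 0.012),
  stringsAsFactors = FALSE
)

# Only attempt the call when the package is available.
if (requireNamespace("CheckSumStats", quietly = TRUE)) {
  formatted <- CheckSumStats::format_data(
    dat           = toy,
    trait         = "example disease",   # hypothetical trait label
    population    = "European",          # hypothetical ancestry label
    ncase         = 1000,                # given as a number, not a column name
    ncontrol      = 2000,
    rsid          = "snp",
    effect_allele = "a1",
    other_allele  = "a2",
    or            = "odds_ratio",
    or_lci        = "lower95",
    or_uci        = "upper95",
    eaf           = "freq",
    p             = "pval"
  )
  str(formatted)
}
```

Arguments left at their default of `NA`, such as `beta` and `se` here, are simply not supplied; for a continuous trait one would presumably pass `beta` and `se` instead of the odds ratio columns.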

Arguments

`dat`	the dataset to be formatted
`trait`	the name of the trait.
`population`	description of the population ancestry of the dataset
`ncase`	number of cases or name of the column specifying the number of cases
`ncontrol`	number of controls or name of the column specifying the number of controls. If your summary data was generated in a linear model of a continuous trait, use ncontrol to indicate the total sample size.
`rsid`	name of the column containing the rs number or identifiers for the genetic variants
`effect_allele`	name of the effect allele column
`other_allele`	name of the non-effect allele column
`beta`	name of the column containing the SNP effect sizes. Use this argument if your summary data was generated in a linear model of a continuous trait.
`se`	standard error for the beta. Use this argument if your summary data was generated in a linear model of a continuous trait.
`lnor`	name of the column containing the log odds ratio. If missing, the function tries to infer it from the odds ratio.
`lnor_se`	name of the column containing the standard error for the log odds ratio. If missing, the function tries to infer it from the 95% confidence intervals or p-values.
`eaf`	name of the effect allele frequency column
`p`	name of the p-value column
`or`	name of column containing the odds ratio
`or_lci`	name of column containing the lower 95% confidence interval for the odds ratio
`or_uci`	name of column containing the upper 95% confidence interval for the odds ratio
`chr`	name of the column containing the chromosome number for each genetic variant
`pos`	genomic position for the genetic variant in base pairs
`z_score`	effect size estimate divided by its standard error
`drop_duplicate_rsids`	logical; whether to drop duplicate rsids. Default is TRUE. Duplicate rsids may, for example, correspond to triallelic SNPs.
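
The descriptions of `lnor`, `lnor_se` and `z_score` refer to quantities that can be derived from one another. The sketch below shows the standard textbook relationships, assuming a normal approximation on the log odds scale. It is not taken from the package source, so the exact calculations inside `format_data()` may differ; the numeric values are invented for illustration.

```r
# Invented example values: odds ratios, 95% confidence limits, two-sided p-values.
or     <- c(1.10, 0.95, 1.25)
or_lci <- c(1.02, 0.88, 1.05)
or_uci <- c(1.19, 1.03, 1.49)
p      <- c(0.014, 0.21, 0.012)

# Log odds ratio from the odds ratio.
lnor <- log(or)

# Standard error from the 95% confidence interval: on the log scale the
# interval is lnor +/- qnorm(0.975) * SE, so its width is 2 * qnorm(0.975) * SE.
lnor_se_ci <- (log(or_uci) - log(or_lci)) / (2 * stats::qnorm(0.975))

# Standard error from a two-sided p-value: recover |z| from p, then
# SE = |lnor| / |z|. This is undefined when lnor is exactly 0.
z_abs     <- stats::qnorm(p / 2, lower.tail = FALSE)
lnor_se_p <- abs(lnor) / z_abs

# Z score: effect size estimate divided by its standard error.
z_score <- lnor / lnor_se_ci
```

The two standard error estimates will generally not agree exactly, because reported confidence limits and p-values are usually rounded.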