R/Nmarkers_SimpleM.R

#' Estimate the number of effective markers in a chromosome based on an adapted version of the simpleM methodology
#'
#' @param ld.file A data frame with the pairwise linkage disequilibrium (LD) values for a chromosome. The columns SNP_A, SNP_B, and R are mandatory: SNP_A and SNP_B contain the marker names, and R contains the LD value between the two markers.
#' @param PCA_cutoff A cutoff, between 0 and 1, for the proportion of the total variance that must be explained. Defaults to 0.995.
#' @details This function estimates the effective number of markers in a chromosome using an adapted version of the simpleM methodology described in Gao et al. (2008). The input is a data frame with three mandatory columns (SNP_A, SNP_B, and R), which can be obtained with PLINK or any other software that computes LD between markers. A threshold for the proportion of the total variance explained must also be provided. The number of effective markers identified by this approach can be used in multiple testing corrections, such as Bonferroni.
#' @return The effective number of markers identified by the simpleM approach.
#' @importFrom data.table setkeyv
#' @importFrom stats qnorm
#' @importFrom stats pnorm
#' @importFrom Matrix sparseMatrix
#' @importFrom Matrix Diagonal
#' @importFrom dplyr filter
#' @references Gao et al. (2008) Genet Epidemiol, Volume 32, Issue 4, Pages 361-369.
#' (\doi{10.1002/gepi.20310})
#' @name Nmarkers_SimpleM
#' @export

Nmarkers_SimpleM<-function(ld.file, PCA_cutoff=0.995){
  
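  # All distinct marker names found in either column
  snps<-unique(c(ld.file$SNP_A,ld.file$SNP_B))
  
  # Keep the pairs whose two markers are both in the marker list
  tmp.ld<-dplyr::filter(ld.file, SNP_A %in% snps & SNP_B %in% snps)
  
  num_snps <- length(snps)
  
  # Integer index for each marker, following the sorted order of the names
  snp_indices <- as.numeric(factor(snps))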
  
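  # Sparse matrix with one entry per marker pair listed in the LD table
  mat.r <- Matrix::sparseMatrix(
    i = snp_indices[match(tmp.ld$SNP_A, snps)],
    j = snp_indices[match(tmp.ld$SNP_B, snps)],
    x = tmp.ld$R,  # LD values from the documented R column
    dims = c(num_snps, num_snps)
  )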
  
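  # Mirror the entries across the diagonal so the matrix is symmetric
  mat.r <- as.matrix(mat.r) + t(as.matrix(mat.r)) - Matrix::Diagonal(num_snps)
  
  # Each marker is in complete LD with itself
  diag(mat.r)<-1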
  
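  # inferCut() is not defined in this file; it is expected to return the
  # number of effective markers for the given cutoff
  out.cut<-inferCut(mat.r, PCA_cutoff)
  
  return(out.cut)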
}
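
The helper `inferCut()` is not shown in this file, so the cutoff step stays opaque here. The sketch below is one possible reading of it, assuming it follows the eigenvalue step of simpleM in Gao et al. (2008): take the eigenvalues of the LD matrix and count how many are needed for their cumulative share of the total to reach `PCA_cutoff`. The function name `effective_markers_sketch`, the toy LD values, and the base R matrix construction are all illustrative assumptions, not GALLO code.

```r
# Hypothetical stand-in for the internal helper inferCut(); the name and
# body are assumptions based on the simpleM description, not GALLO code.
effective_markers_sketch <- function(cor_mat, cutoff = 0.995) {
  # Eigenvalues of the symmetric LD matrix
  ev <- eigen(cor_mat, symmetric = TRUE, only.values = TRUE)$values
  # Absolute values, largest first (a pairwise LD matrix need not be
  # positive semi-definite, so small negative eigenvalues can occur)
  ev <- sort(abs(ev), decreasing = TRUE)
  # Cumulative share of the total variance
  cum_share <- cumsum(ev) / sum(ev)
  # Smallest number of components whose share reaches the cutoff
  which(cum_share >= cutoff)[1]
}

# Toy LD table with the documented columns (made-up values)
ld <- data.frame(
  SNP_A = c("rs1", "rs1", "rs2"),
  SNP_B = c("rs2", "rs3", "rs3"),
  R     = c(0.9, 0.1, 0.2),
  stringsAsFactors = FALSE
)

# Build the symmetric marker-by-marker matrix with a unit diagonal
snps <- unique(c(ld$SNP_A, ld$SNP_B))
cor_mat <- diag(length(snps))
idx <- cbind(match(ld$SNP_A, snps), match(ld$SNP_B, snps))
cor_mat[idx] <- ld$R
cor_mat[idx[, 2:1, drop = FALSE]] <- ld$R

# Effective number of markers and a Bonferroni-style threshold based on it
m_eff <- effective_markers_sketch(cor_mat, cutoff = 0.995)
alpha_adj <- 0.05 / m_eff
```

With the package installed, the equivalent call would be `Nmarkers_SimpleM(ld, PCA_cutoff = 0.995)`. Dividing the nominal significance level by the effective number of markers, rather than by the raw marker count, is the Bonferroni use mentioned in the function details.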

GALLO documentation built on June 22, 2024, 9:17 a.m.