prepareGeno: Prepare genomic input


View source: R/Preprocessing.R

Description

Writes a new genomic file that sambada can work with, after applying the selected genomic filtering options. This function requires SamBada to be installed on your computer; if it is not, you can install it with downloadSambada() (Mac users, please read the details in the downloadSambada documentation). The output file has the same name as the input file but with a .csv extension.

Usage

prepareGeno(
  fileName,
  outputFile,
  saveGDS,
  mafThresh = NULL,
  missingnessThresh = NULL,
  ldThresh = NULL,
  mgfThresh = NULL,
  directory = NULL,
  interactiveChecks = FALSE,
  verbose = FALSE
)

Arguments

fileName

char Name of the input file (must be in the active directory). Can be .gds, .ped, .bed or .vcf. If the input is not a .gds file, a GDS file (SNPRelate-specific format) is created, unless no filtering option is chosen.

outputFile

char Name of the output file. Must be a .csv

saveGDS

logical If TRUE (and if the input file is not a GDS file), the GDS file is saved. We recommend setting this parameter to TRUE to save time in subsequent functions that rely on the GDS file.

mafThresh

double A number between 0 and 1 specifying the Minor Allele Frequency (MAF) threshold used for filtering (if NULL, no MAF filtering is applied).

missingnessThresh

double A number between 0 and 1 specifying the missing rate threshold used for filtering (if NULL, no missing-rate filtering is applied).

ldThresh

double A number between 0 and 1 specifying the linkage disequilibrium (LD) threshold used for filtering (if NULL, no LD filtering is applied).

mgfThresh

double A number between 0 and 1 specifying the Major Genotype Frequency (MGF) threshold used for filtering (if NULL, no MGF filtering is applied). NB: sambada computations rely on genotypes. NB2: this filter is written in C++ and must be compiled on your computer, so Rtools is needed if this parameter is not NULL.

directory

char The directory where the sambada binaries are saved. This parameter is not necessary if the directory path is permanently stored in the PATH environment variable, or if a function invoking the sambada executable (prepareGeno or sambadaParallel) has already been run in the active R session.

interactiveChecks

logical If TRUE, plots showing the distribution of allele frequency, missingness, etc. are displayed, and the user can interactively change the chosen thresholds for mafThresh, missingnessThresh and mgfThresh (optional, default value=FALSE)

verbose

logical Turn on verbose mode
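
The threshold arguments above filter SNPs on per-SNP summary statistics. The base-R sketch below shows how the minor allele frequency (MAF), missing rate and major genotype frequency (MGF) are commonly computed on a toy genotype matrix. It is not prepareGeno's internal implementation (which works on SNPRelate GDS files and, for MGF, compiled C++ code): the data, the 0/1/2 genotype coding and the threshold values (in particular mgfThresh = 0.8) are hypothetical, and the filtering rule (drop a SNP when MAF is below mafThresh, missing rate is above missingnessThresh, or MGF is above mgfThresh) is an assumption. ldThresh is not illustrated.

```r
# Hypothetical toy data: rows = individuals, columns = SNPs.
# Each value is the number of copies of one allele (0, 1 or 2); NA = missing.
geno <- matrix(c(0, 0, 1, 2, NA,   # snp1
                 1, 1, 1, 1, 1,    # snp2 (all heterozygous)
                 0, 0, 0, 0, 0,    # snp3 (monomorphic)
                 2, NA, NA, 2, 1), # snp4
               nrow = 5,
               dimnames = list(paste0("ind", 1:5), paste0("snp", 1:4)))

# Missing rate: fraction of individuals with no genotype call for each SNP
missing <- colMeans(is.na(geno))

# Allele frequency from non-missing calls, folded to the minor allele (MAF)
p <- colMeans(geno, na.rm = TRUE) / 2
maf <- pmin(p, 1 - p)

# Major genotype frequency (MGF): share of the most common genotype
# among the non-missing calls of each SNP
mgf <- apply(geno, 2, function(g) {
  g <- g[!is.na(g)]
  max(table(g)) / length(g)
})

# Hypothetical thresholds (illustrative values only)
mafThresh <- 0.05
missingnessThresh <- 0.25
mgfThresh <- 0.8

# Assumed filtering rule: keep a SNP only if it passes all three checks
keep <- maf >= mafThresh & missing <= missingnessThresh & mgf <= mgfThresh

print(round(rbind(MAF = maf, missing = missing, MGF = mgf), 3))
print(names(which(keep)))
```

With these toy values only snp1 is kept: snp3 is monomorphic (MAF 0), snp4 has 40% missing calls, and snp2 fails on MGF (every individual shares the same genotype) even though its MAF is 0.5, which illustrates why a genotype-based filter can matter for sambada.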

Value

None

Author(s)

Solange Duruz, Oliver Selmoni

Examples

# Example with data from the package
# You first need to download sambada and set the directory input parameter to the folder where
# you saved it, unless you have added that folder to your PATH environment variable
#################
# Run prepareGeno
#################
# Example with ped input file, no filtering
prepareGeno(system.file("extdata", "uganda-subset-mol.ped", package = "R.SamBada"),
     outputFile=file.path(tempdir(),'uganda-subset-mol.csv'), saveGDS=FALSE, interactiveChecks=FALSE)


# Example with gds file and filtering
# Define the right GDS file according to your OS
if(Sys.info()['sysname']=='Windows'){
  gdsFile=system.file("extdata", "uganda-subset-mol_windows.gds", package = "R.SamBada")
} else {
  gdsFile=system.file("extdata", "uganda-subset-mol_unix.gds", package = "R.SamBada")
}
prepareGeno(gdsFile, outputFile=file.path(tempdir(),'uganda-subset-mol.csv'),
     saveGDS=FALSE, mafThresh=0.05, missingnessThresh=0.1, interactiveChecks=FALSE)

# Run prepareGeno with interactiveChecks=TRUE
prepareGeno(fileName=system.file("extdata", "uganda-subset-mol.ped", package = "R.SamBada"),
     outputFile=file.path(tempdir(),'uganda-subset-mol.csv'), saveGDS=TRUE, mafThresh=0.05,
     missingnessThresh=0.05, interactiveChecks=TRUE)

R.SamBada documentation built on Jan. 5, 2022, 1:08 a.m.