sambadaParallel: Run sambada on parallel cores


View source: R/Run_sambada.R

Description

Reads samBada's input file to retrieve the necessary information (number of individuals, etc.), splits the dataset using samBada's Supervision tool, runs sambada on the split dataset, and merges the results using Supervision. This function requires samBada to be installed on your computer; if it is not, you can install it with downloadSambada() (Mac users, please read the details in downloadSambada's documentation). The function produces the following output files: outputFile-Out-0.csv to outputFile-Out-dimMax.csv as well as outputFile-storey.csv (outputFile and dimMax are parameters of the function). See sambada's documentation for more information. When a parameter takes several words, you can either give them in a single string separated by spaces or pass them as a character vector.
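The output file names listed above can be worked out before a run. The following is a minimal sketch, assuming the names are formed exactly as the description states (base name, then -Out-k.csv for k from 0 to dimMax, plus -storey.csv). The helper name expectedOutputs is hypothetical and is not part of R.SamBada.

```r
# Hypothetical helper (not part of R.SamBada): list the output files
# that the description says sambadaParallel produces.
expectedOutputs <- function(outputFile, dimMax = 1) {
  c(paste0(outputFile, "-Out-", 0:dimMax, ".csv"),
    paste0(outputFile, "-storey.csv"))
}

# Names expected for a bivariate run with base name 'uganda-subset-mol'
expectedOutputs("uganda-subset-mol", dimMax = 2)
```

Such a list could, for instance, be passed to file.exists() after a run to check that every expected file was written.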

Usage

sambadaParallel(genoFile, envFile, idGeno, idEnv, outputFile, dimMax = 1,
  cores = NULL, wordDelim = " ", saveType = "END BEST 0.05",
  populationVar = NULL, spatial = NULL, autoCorr = NULL,
  shapeFile = NULL, colSupEnv = NULL, colSupMark = NULL,
  subsetVarEnv = NULL, subsetVarMark = NULL, headers = TRUE,
  directory = NULL, keepAllFiles = FALSE)

Arguments

genoFile

Name of the file, in the current directory, containing the genetic information, compliant with samBada's format (use prepareGeno to convert it)

envFile

Name of the file, in the current directory, containing the environmental information (use createEnv to create it and prepareEnv to reduce the correlated dataset and check the order)

idGeno

Name of the column in the genoFile corresponding to the id of the animals

idEnv

Name of the column in the envFile corresponding to the id of the animals

outputFile

char Base name(s) for the results file(s). Output files will be created from the base name with suffixes (e.g. -Out-)

dimMax

Maximum number of environmental variables included in the logistic models. Use 1 for univariate models, 2 for univariate and bivariate models

cores

Number of cores to use. If NULL, all cores available on your computer but one are used.

wordDelim

char Word delimiter of the input file(s). Default ' '

saveType

composed of three words: 1) one of 'end' or 'real': 'end' saves the results at the end of the analysis (which allows sorting them), 'real' saves them during the analysis 2) one of 'all' or 'best': to save all models or only significant models 3) if 'best', the significance threshold (before applying Bonferroni's correction). Default 'END BEST 0.05'

populationVar

one of 'first' or 'last'. This option indicates whether any explanatory variables represent the population structure. If present, these population variables must be grouped together in the input file, either on the left or on the right side of the group of environmental variables. Default NULL.

spatial

composed of 5 words: 1) Column name (or number) for longitude 2) Column name (or number) for latitude 3) one of 'spherical' or 'cartesian': to indicate the type of coordinates 4) one of 'distance', 'gaussian', 'bisquare' or 'nearest': type of weighting scheme (see sambadoc) 5) a number giving the bandwidth of the weighting function: units are in [m] for spherical coordinates; for cartesian coordinates, units match those of the samples' positions (see sambadoc)

autoCorr

composed of 3 words: 1) one of 'global', 'local' or 'both': to indicate the type of spatial autocorrelation to compute 2) one of 'env', 'mark' or 'both': to indicate the variables on which to compute the analysis 3) integer: the number of permutations used to compute the pseudo p-value. E.g. 'global both 999'

shapeFile

one of yes or no. With this option, the LISA are saved as a shapefile (in addition to the usual output)

colSupEnv

char or vector of char Name(s) of the column(s) in the environmental data to be excluded from the analysis. Default NULL

colSupMark

char or vector of char Name(s) of the column(s) in the molecular data to be excluded from the analysis. Default NULL

subsetVarEnv

char or vector of char Name(s) of the column(s) in the environmental data to be included in the analysis while the other columns are set as inactive. Default NULL

subsetVarMark

char or vector of char Name(s) of the column(s) in the molecular data to be included in the analysis while the other columns are set as inactive. Default NULL

headers

logical Presence or absence of variable names in the input files. Default TRUE

directory

char The directory where the sambada binaries are saved. This parameter is not necessary if the directory path is permanently stored in the PATH environment variable or if a function invoking the sambada executable (prepareGeno or sambadaParallel) has already been run in the active R session.

keepAllFiles

logical If TRUE, the parameter files, the split genoFile and the log files are kept instead of being removed. Default FALSE
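Several arguments above (saveType, spatial, autoCorr) take multiple words, given either as one space-separated string or as a character vector. The sketch below shows one way to build and sanity-check such values before calling sambadaParallel. The helpers collapseWords and checkSaveType are hypothetical and not part of R.SamBada, the column names and bandwidth are placeholders, and the case-insensitive keyword comparison is an assumption.

```r
# Hypothetical helpers (not part of R.SamBada).

# Collapse a character vector into the single space-separated string form.
collapseWords <- function(x) paste(x, collapse = " ")

# Check a saveType value against the three-word rule described above.
# Assumption: keywords are compared case-insensitively.
checkSaveType <- function(saveType) {
  w <- toupper(strsplit(collapseWords(saveType), " +")[[1]])
  if (length(w) < 2 || !w[1] %in% c("END", "REAL") || !w[2] %in% c("ALL", "BEST")) {
    return(FALSE)
  }
  if (w[2] == "BEST") {
    # 'best' needs a numeric significance threshold as third word
    return(length(w) == 3 && !is.na(suppressWarnings(as.numeric(w[3]))))
  }
  length(w) == 2
}

# 'longitude' and 'latitude' are placeholder column names; 1000 is a
# placeholder bandwidth.
spatial  <- collapseWords(c("longitude", "latitude", "spherical", "distance", "1000"))
autoCorr <- collapseWords(c("global", "both", "999"))

checkSaveType("END BEST 0.05")   # default value from the Usage section
checkSaveType(c("END", "ALL"))   # vector form
```

Checking the words up front in R can surface a malformed value before the external sambada executable is launched.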

Author(s)

Solange Duruz, Sylvie Stucki

Examples

# Example with data from the package
# You first need to download sambada with downloadSambada(tempdir())
# Example without population structure, using only one core
sambadaParallel(genoFile=system.file("extdata", "uganda-subset-mol.csv", package = "R.SamBada"), 
     envFile=system.file("extdata", "uganda-subset-env-export.csv", package = "R.SamBada"), 
     idGeno='ID_indiv', idEnv='short_name', dimMax=1, cores=1, saveType='END ALL', 
     outputFile=file.path(tempdir(),'uganda-subset-mol')) 

# Example with population structure, using multiple cores
sambadaParallel(genoFile=system.file("extdata", "uganda-subset-mol.csv", package = "R.SamBada"), 
     envFile=system.file("extdata", "uganda-subset-env-export.csv", package = "R.SamBada"), 
     idGeno='ID_indiv', idEnv='short_name', dimMax=2, cores=2, saveType='END ALL', 
     populationVar='LAST', outputFile=file.path(tempdir(),'uganda-subset-mol'))
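
When choosing a value for cores in a multi-core run like the one above, it can help to see what "all available cores minus one" amounts to on the current machine. This is a minimal sketch using detectCores() from the base parallel package; whether sambadaParallel counts cores in this way is an assumption, and defaultCores is a hypothetical helper.

```r
library(parallel)

# Hypothetical helper (not part of R.SamBada): number of available cores
# minus one, never less than 1. detectCores() may return NA on some
# platforms, in which case we fall back to a single core.
defaultCores <- function() {
  n <- detectCores()
  if (is.na(n)) 1L else max(1L, n - 1L)
}

defaultCores()
```

The result could be passed explicitly as cores=defaultCores() to make the number of cores used visible in a script.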

SolangeD/R.SamBada documentation built on Dec. 25, 2021, 10:48 a.m.