sGD: Calculate spatially explicit indices of genetic diversity...


Description

Calculate spatially explicit indices of genetic diversity and Wright's neighborhood size (NS).

Usage

sGD(genind_obj, xy, dist.mat, NH_radius, min_N, max_N = NULL,
  metrics = NULL, NHmat_ans = FALSE, genout_ans = FALSE,
  file_name = NULL, NeEstimator_dir = NULL)

Arguments

genind_obj

A genind object (created by the adegenet package function import2genind or related methods) containing individual genotypes. The order of the individuals must be the same as the order in the xy and dist.mat inputs below.

xy

A data frame containing 3 columns in the following order: individual IDs, X coordinates, and Y coordinates. The order of the rows must match the order in the genind_obj and dist.mat inputs.

dist.mat

An N x N (N = sample size) matrix of pairwise landscape distances (Euclidean or effective). The distmat function in the sGD package may be used to produce Euclidean and cost-weighted distance matrices. The order of the rows and columns in the matrix must match the order in the xy and genind_obj inputs.
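As a rough illustration of the shape dist.mat is expected to have, the sketch below builds a Euclidean distance matrix with base R's dist function instead of the package's distmat function. The toy coordinates and the column names are assumptions made for this example only.

```r
# Hypothetical toy coordinates; IDs and column names are illustrative only
xy <- data.frame(Indiv_ID = c("a", "b", "c"),
                 X = c(0, 3, 0),
                 Y = c(0, 4, 8))

# Full N x N matrix of pairwise Euclidean distances, rows/columns in xy order
dist.mat <- unname(as.matrix(dist(xy[, c("X", "Y")])))

# Basic sanity checks before passing the matrix on to sGD
stopifnot(nrow(dist.mat) == nrow(xy),
          ncol(dist.mat) == nrow(xy),
          isSymmetric(dist.mat),
          all(diag(dist.mat) == 0))
```

Because the matrix is derived directly from xy, its row and column order matches xy by construction. A cost-weighted matrix read from a file would need that ordering verified separately.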

NH_radius

A single value to be used for all neighborhoods, or a vector of genetic neighborhood radii, optionally obtained from using the infer2sigma function. Note that if you specify a vector of radii, sGD will not calculate metrics when the radius value is NA.

min_N

The minimum sample size per neighborhood for indices to be calculated. NA is returned for neighborhoods containing fewer than min_N individuals.

max_N

Optional. The maximum sample size per neighborhood for indices to be calculated. If the number of individuals in the neighborhood exceeds max_N, a sample of size max_N will be drawn from the neighborhood to compute the metrics and output files specified by the user. Note that if max_N is specified and the value is too small to be representative of the neighborhood, the results could differ significantly from those obtained using all individuals in the neighborhood.
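The effect of max_N can be pictured with a short base R sketch. The helper name cap_neighborhood and the use of simple random sampling without replacement are assumptions for illustration; the sampling scheme sGD uses internally may differ.

```r
set.seed(1)  # only so the sketch is reproducible

# Hypothetical indices of the individuals that fall inside one neighborhood
neighborhood <- c(2, 5, 7, 8, 11, 13, 17)
max_N <- 4

# Hypothetical helper: keep everyone if max_N is NULL or not exceeded,
# otherwise draw max_N members at random without replacement
cap_neighborhood <- function(members, max_N = NULL) {
  if (is.null(max_N) || length(members) <= max_N) {
    return(members)
  }
  members[sort(sample.int(length(members), max_N))]
}

used <- cap_neighborhood(neighborhood, max_N)
```

Sampling positions with sample.int rather than the values themselves avoids the base R pitfall where sample(x, ...) treats a single number x as the range 1:x.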

metrics

Optional. A vector of the metrics you would like sGD to produce. Options include "GD" (genetic diversity indices), "NS" (Wright's genetic neighborhood size), "HWE" (tests for Hardy-Weinberg equilibrium, heterozygote excess, and homozygote excess), and "pFST" (a matrix of pairwise FST values for all neighborhoods). Note that calculating pFST takes considerable time (several hours using the sGD demo data).

NHmat_ans

Logical (Default = FALSE). If TRUE, a matrix defining neighborhood membership is written to the working directory. For each row in the matrix, a value of 1 occurs at the indices of all individuals inside the neighborhood and a value of 0 occurs for all individuals outside the neighborhood.
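A membership matrix of this form can be sketched directly from dist.mat and a single NH_radius. The toy distances below are made up, and treating a distance exactly equal to the radius as inside the neighborhood is an assumption; sGD's own boundary rule is not stated here.

```r
# Hypothetical 3 x 3 distance matrix (same units as NH_radius)
dist.mat <- matrix(c(0, 5, 8,
                     5, 0, 5,
                     8, 5, 0), nrow = 3, byrow = TRUE)
NH_radius <- 6
min_N <- 3

# Row i describes the neighborhood centered on individual i:
# 1 if individual j is within the radius, 0 otherwise
NH_mat <- (dist.mat <= NH_radius) * 1

# Neighborhood sample size; the center counts itself (distance 0)
N <- rowSums(NH_mat)

# Neighborhoods smaller than min_N would receive NA for their indices
enough <- N >= min_N
```

In this toy case only the second individual has all three samples within its radius, so it is the only neighborhood that meets min_N = 3.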

genout_ans

Logical (Default = FALSE). If TRUE, a genepop file containing the genotypes for all neighborhoods is written to the working directory.

file_name

Optional. A character string that will be prepended to the output filename (which will end with "_sGD.csv"). If none is specified, no output file will be written.

NeEstimator_dir

Optional. Path to the NeEstimator directory. NeEstimator 2.01 is required only if you include the "NS" metric. It can be downloaded from http://molecularfisherieslaboratory.com.au/neestimator-software.

Value

sGD returns a data frame containing estimates of genetic diversity and/or neighborhood size for neighborhoods surrounding each sample location. The order of the rows in the output matches the order of the samples in the inputs.

Variables in the output data frame include (depending on the metrics selected):

NH_Index - an index of the neighborhoods, from 1 to the total number of neighborhoods.

NH_ID - the ID of the individual at the neighborhood center, taken from the individual's ID in the xy input.

X - the X coordinate of the neighborhood center.

Y - the Y coordinate of the neighborhood center.

N - the number of individuals within the neighborhood.

A - the average number of alleles across all loci/individuals within the neighborhood.

Ap - the proportion of alleles from the entire population that are actually present in the neighborhood.

Ar - the allelic richness across all loci/individuals within the neighborhood.

He - the average expected heterozygosity across all loci/individuals within the neighborhood.

Ho - the average observed heterozygosity across all loci/individuals within the neighborhood.

FIS - the average inbreeding coefficient across all loci/individuals within the neighborhood.

NS_ex0 - an estimate of the effective number of breeding individuals (Wright's neighborhood size) present within the neighborhood, without excluding rare alleles, which could bias the estimate.

NS_ex0.02 - an estimate of the effective number of breeding individuals (Wright's neighborhood size) present within the neighborhood, excluding alleles with a frequency of 0.02 or less.

NS_ex0.05 - an estimate of the effective number of breeding individuals (Wright's neighborhood size) present within the neighborhood, excluding alleles with a frequency of 0.05 or less.

NS_ex0.10 - an estimate of the effective number of breeding individuals (Wright's neighborhood size) present within the neighborhood, excluding alleles with a frequency of 0.10 or less.
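To make the Ho, He, and FIS columns concrete, here is a minimal single-locus sketch using the textbook definitions (He = 1 - sum of squared allele frequencies, FIS = 1 - Ho/He). The genotypes are invented, and sGD may apply sample-size corrections and averages across loci, so its values need not match this simplified calculation.

```r
# Hypothetical genotypes at one biallelic locus for 10 individuals
genotypes <- c("AA", "AA", "AA", "AA", "Aa", "Aa", "Aa", "Aa", "aa", "aa")

# Observed heterozygosity: proportion of heterozygous individuals
Ho <- mean(genotypes == "Aa")

# Allele frequencies from the pooled allele copies
alleles <- unlist(strsplit(genotypes, ""))
p <- as.numeric(table(alleles)) / length(alleles)

# Expected heterozygosity under Hardy-Weinberg (uncorrected form)
He <- 1 - sum(p^2)

# Inbreeding coefficient: relative heterozygote deficit
FIS <- 1 - Ho / He
```

With 12 copies of "A" and 8 of "a", the allele frequencies are 0.6 and 0.4, giving He = 0.48 against Ho = 0.4, a positive FIS that indicates a heterozygote deficit in this toy neighborhood.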

If specified in the sGD arguments, the following output files will also be written to the working directory:

NHmat - if NHmat_ans = TRUE, sGD writes the NH membership matrix described above to a .csv file in the working directory. The row and column names match the individual IDs in the input files, and are in the same order as the input files.

genout - if genout_ans = TRUE, sGD writes the NH genepop file to the working directory.

Examples

library(sGD)
library(adegenet)
library(raster)

# read in genotypes, locations, and distance matrix
genepop.file <- system.file("extdata","sGD_demo_IBR.gen",package="sGD") 
xy = read.csv(system.file("extdata","sGD_demo_xy.csv",package="sGD"))
dist.mat <- as.matrix(read.csv(system.file("extdata","sGD_demo_cdmat.csv",package="sGD"),
                               header=FALSE))

# convert genepop to genind (make sure you specify the correct allele code digits - ncode)
genind_obj <- read.genepop(genepop.file,ncode=3L,quiet=TRUE)
pop(genind_obj) = xy$Indiv_ID # give each location a unique population ID

# run sGD (the "NS" metric requires NeEstimator; set NeEstimator_dir to your local install path)
sGD_output <- sGD(genind_obj,xy,dist.mat,NH_radius=16000,min_N=20,max_N=NULL,
                  metrics=c("GD","NS","HWE"), NHmat_ans=TRUE,genout_ans=TRUE,
                  file_name="sGD_demo", NeEstimator_dir="C:/NeEstimator_2.01")

# read in the landscape raster to use in plots
landscape <- raster(system.file("extdata","sGD_demo_IBR_landscape.asc",package="sGD"))

# Convert raster to dataframe for ggplot 
landscape.p <- rasterToPoints(landscape)
landscape.df <- data.frame(landscape.p)
colnames(landscape.df) <- c("X", "Y", "Resistance")

# Plot sGD output (Ap is shown here, but explore all sGD outputs) atop the resistance model
library(ggplot2)

ggplot()  +
 geom_raster(data=landscape.df,aes(x=X,y=Y,fill=Resistance),alpha=I(0.5)) +
 scale_fill_gradient(low="black", high="lightgrey") + 
 geom_point(data=sGD_output, aes(x=X, y=Y,color=Ap),size=5) + 
 scale_color_gradient(low="red", high="green",na.value = "white") + 
 theme(panel.grid.major = element_blank(),
       panel.grid.minor = element_blank(),
       panel.background = element_blank())
       

Andrew-Shirk/sGD documentation built on May 26, 2019, 6:38 a.m.