popgenreport: This is the main function of the package. It analyses an...


View source: R/popgenreport.r


Description

This function is used to analyse population genetic data. The main idea is to provide a framework for analysing microsatellite and also SNP genetic data (if there are not too many loci, say fewer than 1000) using a mix of existing and new functions. The function works on an object of class genind. There are several ways to convert data into a genind object using existing functions provided by the adegenet package (import2genind, df2genind, read.fstat, read.structure, read.genetix, read.genepop); alternatively, see read.genetable for how to import data from an Excel (CSV) file. The function performs a number of different genetic analyses (e.g. counts of individuals and alleles across sub-populations, tests for heterozygosity and Hardy-Weinberg equilibrium, the differentiation statistics Fst, G'st and Jost's D, and genetic distances between individuals and populations), and users can select which analysis routines are included in the report. To select a routine, the user simply turns on a switch, e.g. mk.map=TRUE returns a map with the sampling location of each individual (if coordinates are provided).
Coordinates need to be specified within the genind object. As a standard genind object does not require spatial coordinates, we extended it by using the other slot of the genind object. The easiest way to provide spatial coordinates is to use the read.genetable function with its lat, long arguments (for WGS1984 latitude/longitude data) or its x, y arguments (for Mercator projected data). To calculate distances, the data are internally reprojected using the Mercator function in package dismo; Mercator is the projection used by Google Maps. Alternatively, you can add the coordinates manually to your genind object (e.g. cats@other$latlong <- yourlatlongdata or cats@other$xy <- your_xy_data, where cats is your genind object). If your data are in a different projection, you need to reproject them into either WGS1984 or the Google Maps Mercator projection; otherwise distance calculations may be wrong and the map will probably be incorrect. See the manual for an example of how to project spatial coordinates and add them to your genetic data.
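The reprojection step can be illustrated with the standard spherical ("Google Maps"/Web) Mercator formulas. This is a minimal base-R sketch, assuming a sphere of radius 6378137 m; to_mercator is a hypothetical helper, not part of PopGenReport or dismo, and we have not checked that its output matches dismo's Mercator function digit for digit.

```r
# Hypothetical helper (not part of PopGenReport or dismo): project WGS1984
# latitude/longitude in decimal degrees to spherical ("Google Maps") Mercator
# x/y in metres. Assumption: sphere of radius 6378137 m.
to_mercator <- function(lat, long, radius = 6378137) {
  stopifnot(length(lat) == length(long),
            all(abs(lat) < 90),        # the poles would map to +/- infinity
            all(abs(long) <= 180))
  x <- radius * long * pi / 180
  y <- radius * log(tan(pi / 4 + lat * pi / 360))
  data.frame(x = x, y = y)
}

# Made-up example coordinates (not from any real data set)
latlong <- data.frame(lat = c(0, -27.5), long = c(0, 153))
xy <- to_mercator(latlong$lat, latlong$long)
print(xy)
```

The inverse transform, if needed, is lat = (2 * atan(exp(y / radius)) - pi / 2) * 180 / pi and long = x / radius * 180 / pi.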
Allele names in the genind object are truncated if they are longer than six characters. If names are truncated, capital letters linked by a hyphen are appended to guarantee that they remain unique. You can rename the alleles by assigning new names in the genind object before running popgenreport.
Note that popgenreport can take a long time to run if the options mk.complete, mk.gd.kosman, or mk.gd.smouse are set to TRUE. For example, running popgenreport with mk.complete=TRUE on a dataset of 500 individuals and 36 loci takes 14 to 15 minutes on a PC with a 3.5 GHz processor, and nearly 3 hours for a dataset of ~3200 individuals.


Usage

popgenreport(cats = NULL, mk.counts = TRUE, mk.map = FALSE,
  maptype = "satellite", mapdotcolor = "blue", mapdotsize = 1,
  mapdotalpha = 0.4, mapdottype = 19, mapzoom = NULL, mk.locihz = FALSE,
  mk.hwe = FALSE, mk.fst = FALSE, mk.gd.smouse = FALSE,
  mk.gd.kosman = FALSE, mk.pcoa = FALSE, mk.spautocor = FALSE,
  mk.allele.dist = FALSE, mk.null.all = FALSE, mk.allel.rich = FALSE,
  mk.differ.stats = FALSE, mk.custom = FALSE, fname = "PopGenReport",
  foldername = "results", path.pgr = NULL, mk.Rcode = FALSE,
  mk.complete = FALSE, mk.pdf = TRUE)



Arguments

cats: the genind object on which the analysis is based.


mk.counts: switch to provide overview counts: the number of individuals sampled, the numbers of individuals and alleles sampled per sub-population, the number of alleles per locus, the mean number of alleles per locus, and the percentage of missing data.
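The overview counts listed for mk.counts can be reproduced on a toy table in base R. This is a minimal sketch, not PopGenReport's implementation: it assumes diploid genotypes stored as "allele/allele" strings in a plain data frame (a genind object stores them differently), and all data and helper names are made up for illustration.

```r
# Made-up genotype table: one row per individual, one column per locus,
# alleles separated by "/", NA = missing genotype.
geno <- data.frame(
  pop  = c("A", "A", "A", "B", "B"),
  loc1 = c("120/124", "120/120", NA, "124/128", "128/128"),
  loc2 = c("200/204", "204/204", "200/200", NA, "204/208"),
  stringsAsFactors = FALSE
)
loci <- c("loc1", "loc2")

# Number of individuals overall and per sub-population
n_ind <- nrow(geno)
ind_per_pop <- table(geno$pop)

# Hypothetical helper: distinct alleles in a vector of genotypes (NAs ignored)
alleles_at <- function(x) unique(unlist(strsplit(x[!is.na(x)], "/", fixed = TRUE)))

# Distinct alleles per locus and their mean
n_alleles <- sapply(geno[loci], function(x) length(alleles_at(x)))
mean_alleles <- mean(n_alleles)

# Distinct alleles per sub-population, summed over loci
alleles_per_pop <- sapply(split(geno[loci], geno$pop),
                          function(d) sum(sapply(d, function(x) length(alleles_at(x)))))

# Percentage of missing genotypes over all individual-by-locus cells
pct_missing <- 100 * mean(is.na(as.matrix(geno[loci])))
```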


mk.map: switch to produce a map with the sampling location of each individual marked. This switch requires individual coordinates (latitudes and longitudes in WGS1984) to be provided (under cats@other$latlong, or see read.genetable for how to import them from a table of genetic data). An error message is generated if you turn this routine on but do not provide the coordinates in the right format. If the coordinates are provided in a separate file, they must be attached to the genind object via
cats@other$latlong <- yourlatlongdata
where yourlatlongdata is a data frame with one row per individual, with the same number and order of individuals as the population genetic data. Note that an internet connection is required to connect to the Google Maps server, which provides the base map for this routine.
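Because the coordinate table must match the genetic data row for row, it can help to reorder and check it before attaching it. A minimal base-R sketch, assuming the separate file has an id column matching the individual names and columns named lat and long in decimal degrees; align_latlong, the column names and the data are hypothetical, and whether popgenreport expects exactly these column names is an assumption we have not checked.

```r
# Individual names in the order of the genetic data (made up here; with
# adegenet you would take them from your genind object instead)
ind_names <- c("ind1", "ind2", "ind3")

# Coordinates from a separate file, possibly in a different row order
coords <- data.frame(id   = c("ind3", "ind1", "ind2"),
                     lat  = c(-27.4, -27.5, -27.6),
                     long = c(153.0, 153.1, 153.2),
                     stringsAsFactors = FALSE)

# Hypothetical helper: reorder coordinates to match ind_names and run basic checks
align_latlong <- function(coords, ind_names) {
  idx <- match(ind_names, coords$id)
  if (anyNA(idx))
    stop("no coordinates for: ", paste(ind_names[is.na(idx)], collapse = ", "))
  out <- coords[idx, c("lat", "long")]
  rownames(out) <- ind_names
  if (any(abs(out$lat) > 90) || any(abs(out$long) > 180))
    stop("coordinates do not look like WGS1984 decimal degrees")
  out
}

latlong <- align_latlong(coords, ind_names)
# With a genind object named cats you would then attach it:
# cats@other$latlong <- latlong
```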


maptype: defines the type of map. Default is 'satellite'. Other options are 'roadmap', 'mobile', 'terrain' and 'hybrid'.


mapdotcolor: color of the dot for each individual on the map. Default is 'blue'.


mapdotsize: size of the dot for each individual. Default is 1.


mapdotalpha: transparency of the dots. 1 is invisible, 0 is no transparency. Default is 0.4.


mapdottype: defines the type of symbol. For an explanation see pch under par. Default is 19 (a filled circle).


mapzoom: zoom level of the map. If not specified, the default zoom of Google Maps is used. Be aware that if you set the zoom level too high, the map may not show all sample locations.


mk.locihz: switch to test for population heterozygosity.
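The two quantities usually compared in such a test, observed (Ho) and expected (He) heterozygosity, can be sketched for one locus as follows. het_locus is a hypothetical helper assuming diploid "allele/allele" strings and the textbook estimators (He = 1 minus the sum of squared allele frequencies, no small-sample correction); PopGenReport's own routine may compute or compare them differently.

```r
# Hypothetical helper: observed and expected heterozygosity at one diploid
# locus; genotypes are "a/b" strings, NA = missing (dropped).
het_locus <- function(genotypes) {
  g <- genotypes[!is.na(genotypes)]
  alleles <- strsplit(g, "/", fixed = TRUE)
  is_het <- vapply(alleles, function(a) a[1] != a[2], logical(1))
  p <- table(unlist(alleles)) / (2 * length(g))   # allele frequencies
  c(Ho = mean(is_het), He = 1 - sum(p^2))
}

# Made-up genotypes: two heterozygotes, two homozygotes, one missing
het_locus(c("120/124", "120/120", "124/124", "120/124", NA))
```

Here both alleles have frequency 0.5, so Ho and He are both 0.5; a large gap between them would be what a heterozygosity test looks for.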


mk.hwe: switch to test for Hardy-Weinberg equilibrium for each locus and population.
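As an illustration of what a Hardy-Weinberg test checks, the sketch below runs the textbook chi-square goodness-of-fit test for a biallelic locus from genotype counts. hwe_chisq is a hypothetical helper; PopGenReport may use a different (e.g. exact or Monte-Carlo) test, for instance via pegas, so treat this only as an explanation of the idea.

```r
# Hypothetical helper: chi-square test of Hardy-Weinberg proportions at a
# biallelic locus from genotype counts AA, Aa, aa (1 degree of freedom).
hwe_chisq <- function(n_AA, n_Aa, n_aa) {
  n <- n_AA + n_Aa + n_aa
  p <- (2 * n_AA + n_Aa) / (2 * n)          # frequency of allele A
  stopifnot(p > 0, p < 1)                   # monomorphic locus: test undefined
  q <- 1 - p
  expected <- n * c(p^2, 2 * p * q, q^2)
  observed <- c(n_AA, n_Aa, n_aa)
  stat <- sum((observed - expected)^2 / expected)
  c(chisq = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

hwe_chisq(25, 50, 25)  # exactly Hardy-Weinberg proportions
hwe_chisq(50, 0, 50)   # no heterozygotes: strong deficit
```

The first call gives a statistic of 0 and a p-value of 1; the second gives a statistic of 100, i.e. a very small p-value.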


mk.fst: switch to calculate Fst values for each locus and pairwise Fst (Nei 1973) between sub-populations.
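The Nei (1973) statistic can be illustrated for a single locus from sub-population allele frequencies. This is a minimal base-R sketch under stated assumptions: nei_gst and pairwise_gst are hypothetical helpers, sub-populations are weighted equally, no sample-size correction is applied and multi-locus averaging is left out, so the numbers need not match what PopGenReport reports.

```r
# Hypothetical helper: Nei's Gst = (Ht - Hs) / Ht for one locus.
# freqs: matrix, one row per sub-population, one column per allele.
nei_gst <- function(freqs) {
  stopifnot(is.matrix(freqs), all(abs(rowSums(freqs) - 1) < 1e-8))
  hs <- mean(1 - rowSums(freqs^2))   # mean within-population heterozygosity
  pbar <- colMeans(freqs)            # pooled frequencies, equal weights
  ht <- 1 - sum(pbar^2)              # total heterozygosity
  if (ht == 0) return(NA_real_)      # monomorphic locus: undefined
  (ht - hs) / ht
}

# Hypothetical helper: Gst for every pair of sub-populations
pairwise_gst <- function(freqs) {
  k <- nrow(freqs)
  out <- matrix(0, k, k, dimnames = list(rownames(freqs), rownames(freqs)))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      out[i, j] <- out[j, i] <- nei_gst(freqs[c(i, j), , drop = FALSE])
    }
  }
  out
}

# Made-up frequencies of two alleles in three sub-populations
freqs <- rbind(popA = c(0.9, 0.1), popB = c(0.5, 0.5), popC = c(0.1, 0.9))
pairwise_gst(freqs)
```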


mk.gd.smouse: individual pairwise genetic distances based on Smouse and Peakall (1999). Refer to gd_smouse. Spatial coordinates need to be provided to be able to run this analysis.
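A sketch of the genotypic distance associated with Smouse and Peakall (1999), in the GenAlEx-style coding: per locus it equals half the summed squared difference of the two individuals' allele-count vectors, giving 0 (identical), 1 (one allele shared), 2 (ab vs cd), 3 (aa vs bc) and 4 (aa vs bb), summed over loci. The helpers are hypothetical, missing data are not handled, and we have not checked whether gd_smouse returns these squared values or a transformation of them.

```r
# Hypothetical helper: allele-count vector (0, 1 or 2) of one diploid genotype
allele_counts <- function(genotype, alleles) {
  a <- strsplit(genotype, "/", fixed = TRUE)[[1]]
  vapply(alleles, function(x) as.numeric(sum(a == x)), numeric(1))
}

# Hypothetical helper: squared genotypic distance at one locus
smouse_locus <- function(g1, g2) {
  alleles <- unique(c(strsplit(g1, "/", fixed = TRUE)[[1]],
                      strsplit(g2, "/", fixed = TRUE)[[1]]))
  0.5 * sum((allele_counts(g1, alleles) - allele_counts(g2, alleles))^2)
}

# Multi-locus distance: g1, g2 are genotype vectors, one entry per locus
smouse_dist <- function(g1, g2) sum(mapply(smouse_locus, g1, g2))

smouse_locus("a/a", "b/c")                         # no shared allele, 3
smouse_dist(c("a/a", "1/2"), c("b/b", "1/2"))      # 4 + 0
```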


mk.gd.kosman: individual pairwise genetic distances based on Kosman & Leonard (2005). Refer to gd_kosman. Spatial coordinates need to be provided to be able to run this analysis.
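For comparison, a sketch of our reading of the Kosman & Leonard (2005) dissimilarity for diploid codominant markers: per locus, 1 minus the number of alleles shared under the best matching divided by 2, i.e. 0 (identical), 0.5 (one allele shared) or 1 (none shared), averaged over loci. kosman_locus and kosman_dist are hypothetical helpers, not gd_kosman itself, and missing data are not handled.

```r
# Hypothetical helper: Kosman & Leonard-style dissimilarity at one diploid locus
kosman_locus <- function(g1, g2) {
  a1 <- strsplit(g1, "/", fixed = TRUE)[[1]]
  a2 <- strsplit(g2, "/", fixed = TRUE)[[1]]
  alleles <- unique(c(a1, a2))
  # best matching shares min(count in g1, count in g2) copies of each allele
  shared <- sum(pmin(as.vector(table(factor(a1, levels = alleles))),
                     as.vector(table(factor(a2, levels = alleles)))))
  1 - shared / 2
}

# Multi-locus dissimilarity: mean over loci
kosman_dist <- function(g1, g2) mean(mapply(kosman_locus, g1, g2))

kosman_locus("a/b", "a/c")                        # one allele shared, 0.5
kosman_dist(c("a/a", "x/y"), c("a/b", "x/y"))     # mean of 0.5 and 0
```

Unlike the squared distance sketched for mk.gd.smouse, this measure is bounded between 0 and 1 per locus and does not distinguish "aa vs bb" from "ab vs cd".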


mk.pcoa: principal component analysis following Jombart et al. 2009. Spatial coordinates need to be provided to be able to run this analysis. Refer to the vignettes within adegenet.
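The core of such an ordination can be sketched with base R's prcomp on an allele-count table (individuals in rows, alleles in columns, similar in spirit to the tab slot of a genind object). The data are made up, and adegenet's own approach (scaling, missing-data replacement, plotting) is not reproduced here; for a principal coordinate analysis of a distance matrix, base R's cmdscale would be the analogous tool.

```r
# Made-up allele counts: rows = individuals, columns = alleles (0, 1 or 2 copies)
X <- rbind(
  ind1 = c(2, 0, 1, 1),
  ind2 = c(1, 1, 2, 0),
  ind3 = c(0, 2, 0, 2),
  ind4 = c(0, 2, 1, 1)
)
colnames(X) <- c("L1.120", "L1.124", "L2.200", "L2.204")

# Centred (not scaled) principal component analysis of the allele counts
pca <- prcomp(X, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]                        # individuals on PC1 and PC2
var_explained <- pca$sdev^2 / sum(pca$sdev^2) # share of variance per axis
round(var_explained, 3)
```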


mk.spautocor: spatial autocorrelation analysis following Smouse & Peakall 1999. Spatial coordinates need to be provided to be able to run this analysis. Refer to spautocor for more information.
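The autocorrelation coefficient r of Smouse & Peakall (1999) for one distance class can be sketched as follows: squared genetic distances are turned into a covariance matrix C by Gower double-centring, and r is the sum of c_ij over pairs in the class divided by the matching sum of diagonal terms. sp_autocor_r is a hypothetical helper; spautocor may define distance classes, permutation tests and confidence intervals differently, and none of those are included here.

```r
# Hypothetical helper: Smouse & Peakall (1999) r for one distance class.
# gen_d2:   n x n matrix of squared genetic distances
# in_class: n x n logical matrix, TRUE where pair (i, j) is in the class
sp_autocor_r <- function(gen_d2, in_class) {
  n <- nrow(gen_d2)
  J <- diag(n) - matrix(1 / n, n, n)      # centring matrix
  C <- -0.5 * J %*% gen_d2 %*% J          # genetic covariance matrix
  X <- in_class * 1
  diag(X) <- 0
  num <- sum(X * C)                       # sum over i != j of x_ij * c_ij
  den <- sum(rowSums(X) * diag(C))        # sum over i of x_ii * c_ii
  num / den
}

# Made-up example: two genetically similar pairs that are also close in space
g   <- c(0, 1, 5, 6)       # 1-D genetic "trait"
geo <- c(0, 1, 10, 11)     # positions along a transect
gen_d2 <- outer(g, g, "-")^2
geo_d  <- abs(outer(geo, geo, "-"))
r_near <- sp_autocor_r(gen_d2, geo_d > 0 & geo_d <= 2)   # first distance class
r_all  <- sp_autocor_r(gen_d2, geo_d > 0)                # all pairs in one class
```

A useful check: when a single class contains every pair, the rows of C sum to zero and r is exactly -1/(n - 1), here -1/3.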


mk.allele.dist: switch to look at allele distributions by locus and sub-population.


mk.null.all: switch to check for null alleles.
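One common way to estimate null-allele frequency from heterozygosities is Brookfield's (1996) first estimator, r = (He - Ho) / (1 + He). The sketch below is illustrative only: null_brookfield1 is a hypothetical helper, and which estimators PopGenReport actually reports is not stated in this page.

```r
# Hypothetical helper: Brookfield (1996) estimator 1 of null-allele frequency.
# Ho, He: observed and expected heterozygosity at one locus.
# Values near zero or negative (heterozygote excess) suggest no null alleles.
null_brookfield1 <- function(Ho, He) (He - Ho) / (1 + He)

null_brookfield1(Ho = 0.3, He = 0.5)   # heterozygote deficit, about 0.133
null_brookfield1(Ho = 0.5, He = 0.5)   # no deficit, 0
```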


mk.allel.rich: calculation of allelic richness.
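Allelic richness is usually compared across sub-populations of different sizes by rarefaction (El Mousadik & Petit 1996): the expected number of distinct alleles in a random subsample of g gene copies is A(g) = sum_i [1 - choose(N - N_i, g) / choose(N, g)], where N_i is the count of allele i and N is the total count. rarefied_richness is a hypothetical helper; we assume, but have not verified, that PopGenReport's routine follows this standard formula.

```r
# Hypothetical helper: rarefied allelic richness at one locus.
# allele_counts: number of gene copies of each allele in one sub-population
# g: subsample size, typically the smallest number of gene copies among
#    the sub-populations being compared
rarefied_richness <- function(allele_counts, g) {
  N <- sum(allele_counts)
  stopifnot(g >= 1, g <= N)
  sum(1 - choose(N - allele_counts, g) / choose(N, g))
}

counts <- c(5, 3, 2)          # made-up allele counts, N = 10 gene copies
rarefied_richness(counts, 2)  # about 1.69 alleles expected in 2 copies
```

With g equal to N the formula returns the observed number of alleles, and with g = 1 it always returns 1.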


mk.differ.stats: switch to look at population differentiation statistics (Nei's Gst, Hedrick's Gst, and Jost's D).
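Hedrick's G'st and Jost's D are built from the same Hs and Ht as Nei's Gst. The sketch below uses the population-level formulas (Hedrick 2005; Jost 2008) for one locus; diff_stats is a hypothetical helper, and packages such as mmod apply sample-size corrections that are omitted here.

```r
# Hypothetical helper: Nei's Gst, Hedrick's G'st and Jost's D for one locus.
# freqs: matrix, one row per sub-population (k rows), one column per allele.
diff_stats <- function(freqs) {
  k <- nrow(freqs)
  stopifnot(k >= 2, all(abs(rowSums(freqs) - 1) < 1e-8))
  hs <- mean(1 - rowSums(freqs^2))       # mean within-population heterozygosity
  ht <- 1 - sum(colMeans(freqs)^2)       # heterozygosity of pooled frequencies
  gst <- (ht - hs) / ht
  c(Gst         = gst,
    Gst_Hedrick = gst * (k - 1 + hs) / ((k - 1) * (1 - hs)),
    D_Jost      = ((ht - hs) / (1 - hs)) * (k / (k - 1)))
}

diff_stats(rbind(c(1, 0), c(0, 1)))                      # fixed for different alleles
diff_stats(rbind(c(0.5, 0.5, 0, 0), c(0, 0, 0.5, 0.5)))  # no shared alleles, high Hs
```

In the second call the two sub-populations share no alleles, yet Gst is only 1/3 because within-population diversity is high; G'st and D both reach 1. A monomorphic locus (Ht = 0) gives NaN.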


mk.custom: edit custom.snw to add your own function to the report.


fname: file name for the output files. Defaults to PopGenReport. Note that using a file name that includes a space causes the file name of each figure to be printed in the PDF report. Replacing the space with an underscore should prevent this.


foldername: name of the folder where the files are stored. Defaults to 'results'.


path.pgr: folder where the output files are stored. Defaults to the temporary directory (tempdir()). If you want to store the output in another directory, simply provide the path here, e.g. path.pgr=getwd() saves it in your current working directory.


mk.Rcode: switch to get the full R script that is used to generate the report. This gives detailed insight into the analyses performed and is also an easy way to generate a script that you can customize for your analytical needs.


mk.complete: switch to create a full report using all of the routines (all switches are set to TRUE, except mk.subgroups).


mk.pdf: switch to create a PDF report. You need a working LaTeX installation on your system (e.g. MiKTeX (Windows) or Texmaker (Linux, macOS)). For more information on how to install LaTeX on your system, refer to www.popgenreport.org and to the manuals of the knitr package.


Value

The function returns an object (e.g. res) that contains all of the results produced by this function. The structure of the object can be inspected via str(res). The main slots in this object (if you ran a full report) are:
dataoverview, PopHet, Alleledist, Fst, HsHtdifferentiate, HWEresults,
subgroups, GDKosman, GDSmouse

Additional output is provided in the form of a PDF (if mk.pdf=TRUE), which is saved, together with the maps and figures, to the subfolder specified via foldername. This folder is created automatically inside the directory given by path.pgr; if you do not specify path.pgr, R's temporary directory (tempdir()) is used. If mk.Rcode=TRUE is set, an R file named fname.R is saved to the same subfolder.


Author(s)

Aaron Adamack & Bernd Gruber, [email protected], [email protected]


References

Kosman E., Leonard K.J. 2005. Similarity coefficients for molecular markers in studies of genetic relationships between individuals for haploid, diploid, and polyploid species. Molecular Ecology 14:415-424

Peakall R., Smouse P. 2012. GenAlEx 6.5: Genetic analysis in Excel. Population genetic software for teaching and research - an update. Bioinformatics 28:2537-2539

See Also

adegenet, pegas, mmod


Examples

## Not run:
# library(PopGenReport)
# data(bilby) # a generated data set
# res <- popgenreport(bilby, mk.counts=TRUE, mk.map=TRUE, mk.pdf=FALSE)
# # check the results via res, or use the tables created in the results folder

### RUN ONLY with a working LaTeX installation
# res <- popgenreport(bilby, mk.counts=TRUE, mk.map=TRUE, mk.pdf=TRUE, path.pgr="c:/temp")
# # for a full report in a single PDF, set mk.complete to TRUE
# res <- popgenreport(bilby, mk.complete=TRUE)

PopGenReport documentation built on May 29, 2017, 9:09 p.m.