unPC: Unbundled PCA (un-PC) uses PCA results together with...


Description

Unbundled PCA (un-PC) uses PCA results together with geographic sampling information to infer past patterns of migration on the landscape.

Usage

unPC(inputToProcess, outputPrefix = "unPC_visualization",
  runDataImport = TRUE, runPairwiseCalc = TRUE, geogrCoords,
  roundEarth = FALSE, firstPC = 1, secondPC = 2, runPlotting = TRUE,
  geogrCoordsForPlotting = NULL, plotWithMap = FALSE,
  applyManualColor = FALSE, colorBrewerPalette = NULL,
  ellipseWidth = NULL, populationPointNormalization = 2,
  runAggregated = TRUE, savePlotsToPdf = TRUE)
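
As an illustration of the signature above, the following sketch assembles a call with named arguments. The file and directory names are hypothetical placeholders, not files shipped with the package, and the call is only attempted if a function named unPC is already available in the session and the coordinates file exists.

```r
# Hypothetical paths; replace them with your own files.
unpc_args <- list(
  inputToProcess = "pca_results/",       # hypothetical directory of PCA result files
  outputPrefix   = "my_unPC_run",        # hypothetical output prefix
  geogrCoords    = "sample_coords.txt",  # hypothetical coordinates file
  roundEarth     = TRUE,
  firstPC        = 1,
  secondPC       = 2,
  plotWithMap    = TRUE,
  runAggregated  = TRUE,
  savePlotsToPdf = TRUE
)

# Only call unPC if it is loaded in this session and the input file exists.
if (exists("unPC", mode = "function") && file.exists(unpc_args$geogrCoords)) {
  do.call(unPC, unpc_args)
}
```

Arguments not listed in the list keep the defaults shown in the usage block.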

Arguments

inputToProcess

either a single PCA result file or a directory containing multiple PCA result files to process.

outputPrefix

the name prefix used to generate the output .r file from data import, the .rds file from pairwiseCalc, and the .pdf files from the unPC plotting (depending on which of these tasks are activated by the function flags).

runDataImport

boolean TRUE/FALSE. TRUE runs the data import module; FALSE does not.

runPairwiseCalc

boolean TRUE/FALSE. TRUE runs the pairwise calculation module (which must follow the import module); FALSE does not.

geogrCoords

specifies the path to the file containing the geographic coordinates for each individual represented in the input file(s). If the input PCA results were calculated with smartPCA following the msLandscape pipeline, geogrCoords is the automatically generated file of focal population locations, which is used to calculate the pairwise unPC values between populations. If the input PCA results are not from msLandscape, geogrCoords must give the coordinates for each individual (i.e. the number of rows in geogrCoords must equal the number of rows in the input PCA data). These coordinates are used to generate a population label for each individual, and the resulting reduced population-level coordinates are then used to calculate the pairwise unPC values between populations.
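
The reduction from per-individual coordinates to population-level coordinates can be sketched as follows. This is an illustration only: it assumes that individuals sharing an identical coordinate pair belong to the same population, which may not match how unPC assigns labels internally, and the data frame and column names are hypothetical.

```r
# Hypothetical per-individual coordinates: one row per individual.
coords <- data.frame(
  lon = c(10, 10, 20, 20, 20, 30),
  lat = c(50, 50, 55, 55, 55, 60)
)

# Assumption: individuals with an identical coordinate pair form one population.
key <- paste(coords$lon, coords$lat, sep = "_")
pop_label <- match(key, unique(key))

# Reduced, population-level table: one row per population plus its sample size.
pop_coords <- coords[!duplicated(key), ]
pop_coords$n <- as.vector(table(pop_label))
rownames(pop_coords) <- NULL
```

Here six individuals collapse to three populations, with the number of sampled individuals kept alongside each population's coordinates.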

roundEarth

boolean TRUE/FALSE. TRUE uses the Haversine formula to calculate the distance between pairs of populations on a globe; FALSE uses Cartesian distance on a plane instead.
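
To make the two distance options concrete, here is a minimal sketch of a Haversine distance and a planar distance. The function names, the assumption that coordinates are in decimal degrees, and the mean Earth radius of 6371 km are illustrative choices, not taken from the unPC source.

```r
# Great-circle distance (km) via the Haversine formula.
# Inputs are in decimal degrees; the Earth radius is an assumed mean value.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1.
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Straight-line distance on a plane, in the same units as the coordinates.
planar_dist <- function(x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

d_globe <- haversine_km(0, 0, 90, 0)  # a quarter of the equator
d_plane <- planar_dist(0, 0, 3, 4)    # classic 3-4-5 triangle
```

The two measures diverge most for widely separated populations and at high latitudes, which is when roundEarth = TRUE is likely to matter.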

firstPC

the number of the first principal component to use in calculating the unPC values; this is used as a column index into the inputToProcess file(s).

secondPC

the number of the second principal component to use in calculating the unPC values; this is used as a column index into the inputToProcess file(s).

runPlotting

boolean TRUE/FALSE. TRUE runs the unPC plotting module (which must follow the pairwise calculation module); FALSE does not.

geogrCoordsForPlotting

specifies the path to the file containing the geographic coordinates for each individual represented in the input file(s). These geographic coordinates are used in generating the unPC plot ONLY (not calculating the unPC scores). If not specified, the geogrCoords are used for both unPC calculation and plotting.

plotWithMap

boolean TRUE/FALSE. TRUE includes the portion of the world map specified by the geographic coordinates used for plotting; FALSE does not include any map.

applyManualColor

boolean TRUE/FALSE (this may be removed later). TRUE applies the manual coloring used for the manuscript; FALSE applies dynamic coloring based on the range of unPC values for the given dataset.

colorBrewerPalette

a string specifying a color palette in the RColorBrewer package. This is only used if applyManualColor is FALSE; otherwise it is ignored.

ellipseWidth

if specified, is used as the width of the plotted ellipses instead of the default value. This can help to fix cases where an ellipse is wider than it is long (especially a problem for small ellipses), which makes it appear to connect populations other than the populations it actually connects. Setting this takes some trial and error based on the geographic range of the data being processed. Try 1 or 0.5 as a start.

populationPointNormalization

float. This is a normalization (division) factor for the size of the points representing the number of individuals sampled from each population. By default the normalization is 2, which halves the size of all points (all sizes will be divided by 2).
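
As a small worked example of the division described above, the sketch below assumes that the unnormalized point size equals the number of individuals sampled; the sample counts and variable names are hypothetical.

```r
# Hypothetical numbers of individuals sampled from three populations.
n_sampled <- c(10, 4, 20)

# Default normalization factor from the usage block.
populationPointNormalization <- 2

# Assumption: unnormalized point size equals the sample count.
point_size <- n_sampled / populationPointNormalization
```

A larger factor shrinks every point by the same proportion, so relative differences in sample size between populations are preserved.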

runAggregated

boolean TRUE/FALSE. If inputToProcess is a directory containing multiple files, this controls whether the output from all the files is averaged and then plotted (TRUE), or whether the output from each file is plotted individually (FALSE).
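
The averaging step can be pictured with the sketch below. It assumes the per-file output is a square matrix of pairwise values with populations in the same order in every file; that storage format is an assumption for illustration, and the values are made up.

```r
# Hypothetical pairwise score matrices, one per input file (3 populations each).
per_file <- list(
  matrix(c(0, 1, 2, 1, 0, 3, 2, 3, 0), nrow = 3),
  matrix(c(0, 3, 4, 3, 0, 5, 4, 5, 0), nrow = 3)
)

# Element-wise mean across files.
aggregated <- Reduce(`+`, per_file) / length(per_file)
```

With runAggregated = FALSE, each element of a list like per_file would instead be plotted on its own.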

savePlotsToPdf

boolean TRUE/FALSE. Whether to automatically save the plots to PDF (TRUE, the default) or to display the plots on the screen (FALSE).

Value

None


hahnlab/un-PC documentation built on May 17, 2019, 2:25 p.m.