In mrc-ide/PlasmoMAPI: Mapping Plasmodium Spatial Connectivity from Genetic Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

set.seed(1)
library(PlasmoMAPI)
library(magrittr)

Sometimes barriers/corridors of gene flow can act at different spatial scales. For example, a river might be a barrier to human movement when on foot but not at longer distances because people tend to drive and so can quickly access the nearest road bridge. We can explore this phenomenon in PlasmoMAPI by restricting our analysis to edges of a certain length, and ignoring anything outside the defined range. In some cases this can lead to sharper inference by reducing the amount of random noise in the data.

Simulating and loading data

We assume a uniform scatter of sampling locations, which we load into a new project.

# assume uniform scatter of sampling locations
node_df <- data.frame(long = runif(2e2, -10, 10),
                      lat = runif(2e2, -10, 10))

# create project and map
p <- pm_project() %>%
  load_coords(node_df$long, node_df$lat) %>%
  create_map() %>%
  assign_map(pb_markdown = TRUE)

As in the previous tutorial we will simulate some data with a known barrier to gene flow using the get_barrier_intersect() function. This time we will assume a more detailed barrier shape to test our ability to detect small and complex features.

# create barrier polygon
barrier_list <- list(data.frame(long = c(-5,0,6,7,4,-3,-2,3,4,3,-5,-5),
                                lat = c(5,7,6,0,-8,-4,-3,-4,-1,4,4,5)))

# plot map with barriers
plot_map(p, poly_list = barrier_list)

The argument max_barrier_range in get_barrier_intersect() allows us to define a spatial distance beyond which the barrier penalty does not apply, for example if max_barrier_range = 100 and two sampling locations are either side of the barrier and separated by 110km then the barrier effectively does not exist between these locations. We will use this mtehod to simulate pairwise genetic distances between sampling locations, this time on the [-Inf,Inf] interval rather than the [0,1] interval just for a change.

# calculate distances, taking into account barriers
sim_dist <- get_barrier_intersect(node_df$long, node_df$lat,
                                  barrier_list = barrier_list,
                                  barrier_penalty = -50,
                                  barrier_method = 2,
                                  max_barrier_range = 500)

# add some noise to the data
sim_dist_noise <- sim_dist + rnorm(length(sim_dist), sd = 2e2)
genetic_distance <- as.matrix(as.dist(sim_dist_noise))

Create a project, load the data, and look at a simple plot of spatial vs. genetic distance.

# load data and produce simple scatterplot
p <- load_data(p, genetic_distance)

plot_dist(p)

As before, it is quite difficult to see any systematic pattern in the data.

Running a range-limited analysis

Now let's see what happens when we carry out the PlasmoMAPI analysis over the entire range of data, and then a second time restricting to distances less than 500km:

# run analysis over all distance and plot map
p <- pm_analysis(p, max_dist = Inf, pb_markdown = TRUE)
plot_map(p, poly_list = barrier_list)

# run analysis over short distance and plot map
p <- pm_analysis(p, max_dist = 500, pb_markdown = TRUE)
plot_map(p, poly_list = barrier_list)

Notice that we didn't find any areas of significance when using the whole dataset because the random noise at distances above 500km was washing out any signal, but when restricting to short distances the barrier becomes clear. However, throwing data away like this comes at a cost - there are now quite a number of hexes that are below our required coverage level, and so are shown in grey. The more we subset the data, the fewer edges we we have at our disposal and the fewer hexes will pass our coverage checks. So, performing a focussed analysis like this is a trade-off between precision and statistical power, and how far you can push it will depend on the number of sampling locations at hand.

mrc-ide/PlasmoMAPI documentation built on Oct. 1, 2020, 9:41 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com