miss_plot_hist: Plot missing genotypes, by samples

View source: R/miss_plot_hist.R

miss_plot_hist    R Documentation

Plot missing genotypes, by samples

Description

Use to visualise missing genotype data as histograms, summarised by samples or by loci and optionally split by population.

Usage

miss_plot_hist(
  dat,
  plotBy,
  look = "ggplot",
  sampCol = "SAMPLE",
  locusCol = "LOCUS",
  genoCol = "GT",
  popCol = NA,
  plotColours = "white",
  plotNCol = 2
)

Arguments

dat

Data table: Contains genetic information and must have the following columns,

  1. The sampled individuals (see param sampCol).

  2. The locus ID (see param locusCol).

  3. The genotype, e.g. a genotype or an allele frequency (see param genoCol).
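
To illustrate the layout described above, here is a minimal sketch of a long-format data table with one row per sample-locus pair. The sample names, locus names and integer genotype coding are made up for illustration and are not taken from the package data.

```r
library(data.table)

# Hypothetical toy data: 3 samples x 2 loci, one row per sample-locus pair
toyGt <- data.table(
  SAMPLE = rep(c("Ind1", "Ind2", "Ind3"), each = 2),
  LOCUS  = rep(c("Loc1", "Loc2"), times = 3),
  GT     = c(0L, 1L, NA, 2L, 1L, NA)  # NA marks a missing genotype
)

# Confirm the three default column names are present
hasCols <- all(c("SAMPLE", "LOCUS", "GT") %in% names(toyGt))
print(hasCols)
print(toyGt)
```

If the columns are named differently, the names would be passed through sampCol, locusCol and genoCol instead of renaming the table.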

plotBy

Character: One of 'samples' or 'loci'. Sets whether missing data is summarised per sample or per locus.
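
The two choices can be pictured as two different groupings of the same table. The sketch below computes the proportion of missing genotypes per sample and per locus with data.table. It assumes toy data and a proportion-based summary; the function may summarise missingness differently internally (for example as counts or percentages).

```r
library(data.table)

# Hypothetical toy data, as in a long-format genotype table
toyGt <- data.table(
  SAMPLE = rep(c("Ind1", "Ind2", "Ind3"), each = 2),
  LOCUS  = rep(c("Loc1", "Loc2"), times = 3),
  GT     = c(0L, 1L, NA, 2L, 1L, NA)
)

# Focus on samples: proportion of loci missing within each sample
missBySamp <- toyGt[, .(MISS = mean(is.na(GT))), by = SAMPLE]

# Focus on loci: proportion of samples missing at each locus
missByLoc <- toyGt[, .(MISS = mean(is.na(GT))), by = LOCUS]

print(missBySamp)
print(missByLoc)
```

A histogram of the MISS column from either table gives the kind of distribution this argument switches between.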

look

Character: The look of the plot. Default = 'ggplot', the typical gray background with gridlines produced by ggplot2. Alternatively, when set to 'classic', produces a base R style plot.

sampCol

Character: The column name with the sampled individual ID. Default = 'SAMPLE'.

locusCol

Character: The column name with the locus ID. Default = 'LOCUS'.

genoCol

Character: The column name with the genotype info. Default = 'GT'. Missing data should be represented by NA.

popCol

Character: The column name with the population ID. Optional parameter. Default = NA.

plotColours

Character: The fill colour for histogram bars. Default = 'white'.

plotNCol

Integer: The number of columns to arrange individual population plots into. Only takes effect when popCol is specified. Default = 2.

Details

When popCol is unspecified, then all samples are used to create the plots. If it is specified, then that column name is used to make one plot for each population. These are arranged in rows and columns, and the user can specify the number of columns with the argument plotNCol.
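
As a rough sketch of the per-population behaviour, the code below summarises missingness per sample within each population and then splits the result into one table per population, each of which would feed one panel. The data are made up, and the row count assumes a simple row-by-column grid; neither is taken from the function's source.

```r
library(data.table)

# Hypothetical toy data: 2 populations, 2 samples each, 2 loci
toyGt <- data.table(
  POP    = rep(c("PopA", "PopB"), each = 4),
  SAMPLE = rep(c("Ind1", "Ind2", "Ind3", "Ind4"), each = 2),
  LOCUS  = rep(c("Loc1", "Loc2"), times = 4),
  GT     = c(0L, NA, 1L, 2L, NA, NA, 0L, 1L)
)

# Proportion of missing genotypes per sample, keeping the population label
missBySamp <- toyGt[, .(MISS = mean(is.na(GT))), by = .(POP, SAMPLE)]

# One table per population: each would correspond to one panel
missByPop <- split(missBySamp, by = "POP")

# Rows needed to lay the panels out in plotNCol columns (assumed grid layout)
plotNCol <- 2
plotNRow <- ceiling(length(missByPop) / plotNCol)

print(names(missByPop))
print(plotNRow)
```

With four populations and plotNCol = 2, the same arithmetic gives two rows of two panels.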

Value

Returns a ggplot object.
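
Because the return value is a ggplot object, it can be stored and extended with further ggplot2 layers. The sketch below uses a stand-in histogram built directly with ggplot2 from made-up values, not actual output from miss_plot_hist. Whether extra layers apply cleanly to the multi-panel output produced when popCol is set is not confirmed here.

```r
library(ggplot2)

# Hypothetical per-sample missingness values, standing in for real data
missBySamp <- data.frame(
  SAMPLE = paste0("Ind", 1:6),
  MISS   = c(0.05, 0.10, 0.10, 0.20, 0.25, 0.40)
)

# Stand-in for the object returned by miss_plot_hist()
ggMiss <- ggplot(missBySamp, aes(x = MISS)) +
  geom_histogram(bins = 5, fill = "deeppink2", colour = "black")

# Extend the stored plot with a title and axis labels
ggCustom <- ggMiss +
  labs(title = "Missing data by sample", x = "Proportion missing", y = "Count")

# Draw the customised plot
print(ggCustom)
```

A stored plot can also be written to file with ggplot2's ggsave().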

Examples

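library(genomalicious)
# `:=` and `%>%` are used below; attach data.table and magrittr
# explicitly in case they are not already attached
library(data.table)
library(magrittr)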

####   MISSING GENOTYPE DATA   ####
data(data_Genos)
datGt <- data_Genos

# Add missing values
datGt <- do.call(
 'rbind',
 # Split data table by population, and iterate through populations, Dpop
 split(datGt, by='POP') %>%
   lapply(., function(Dpop){
     pop <- Dpop$POP[1]

     if(pop=='Pop1'){
       pr <- 0.1
     } else if(pop=='Pop2'){
       pr <- 0.2
     } else if(pop %in% c('Pop3','Pop4')){
       pr <- 0.05
     }

     # Numbers and unique loci and samples
     num.loc <- Dpop$LOCUS %>% unique %>% length
     uniq.loc <- Dpop$LOCUS %>% unique
     num.samp <- Dpop$SAMPLE %>% unique %>% length
     uniq.samp <- Dpop$SAMPLE %>% unique

     # Vector of missingness
     num.miss <- rbinom(n=num.samp, size=num.loc, prob=pr)

     # Iterate through samples and set randomly chosen loci to missing
     for(i in seq_len(num.samp)){
       locs <- sample(uniq.loc, size=num.miss[i], replace=FALSE)
       Dpop[SAMPLE==uniq.samp[i] & LOCUS%in%locs, GT:=NA]
     }

     # Return
     return(Dpop)
   }
   )
)

head(datGt, 10)

####   PLOT MISSING BY SAMPLES   ####
# Histograms, ggplot and classic looks
miss_plot_hist(datGt, plotBy='samples', look='ggplot')
miss_plot_hist(datGt, plotBy='samples', look='classic')

# Histograms, by population, specifying colour
miss_plot_hist(datGt, plotBy='samples', look='ggplot',
               popCol='POP', plotColours='deeppink2')

####   PLOT MISSING BY LOCI   ####
miss_plot_hist(datGt, plotBy='loci', look='classic',
               popCol='POP', plotColours='deeppink2')

####   CATCH PLOT OUTPUT FOR LATER USE   ####
gg4pops <- miss_plot_hist(datGt, plotBy='samples', popCol='POP')
plot(gg4pops)


j-a-thia/genomalicious documentation built on Oct. 19, 2024, 7:51 p.m.