miss_plot_heatmap: Plot a heatmap of missing information

View source: R/miss_plot_heatmap.R

miss_plot_heatmap    R Documentation

Plot a heatmap of missing information

Description

Visualises the presence or absence of genetic information for every sample and locus combination. The plot can optionally be split into one panel per population.

Usage

miss_plot_heatmap(
  dat,
  sortLoci = "order",
  sortSamp = "order",
  chromCol = "CHROM",
  posCol = "POS",
  sampCol = "SAMPLE",
  genoCol = "GT",
  popCol = NA,
  plotColours = NULL,
  plotNCol = 2
)

Arguments

dat

Data table: Contains genetic information and must have the following columns:

  1. The sampled individuals (see param sampCol).

  2. The chromosome/contig ID (see param chromCol).

  3. The position ID (see param posCol).

  4. The genotype column, e.g. a genotype or allele frequency (see param genoCol).
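
As a rough illustration of the layout described above, the sketch below builds a small table by hand with data.table, one row per sample and locus combination. The sample names, contig IDs, positions, and genotype values are made up, the default column names are assumed, and the 0/1/2 genotype coding is an assumption rather than a stated requirement.

```r
library(data.table)

# Hypothetical loci: chromosome/contig ID plus position
loci <- data.table(
  CHROM = c("Contig1", "Contig1", "Contig2", "Contig2"),
  POS   = c(101L, 250L, 33L, 480L)
)

# Hypothetical sample IDs
samples <- c("Ind1", "Ind2", "Ind3")

# Repeat the locus table once per sample: one row per sample/locus pair
toyDat <- loci[rep(seq_len(nrow(loci)), times = length(samples))]
toyDat[, SAMPLE := rep(samples, each = nrow(loci))]

# Made-up genotypes; missing calls are coded as NA
toyDat[, GT := c(0L, 1L, NA, 2L,
                 1L, NA, NA, 0L,
                 2L, 2L, 1L, 0L)]

print(toyDat)
```

A table in this shape uses the default column names, so it could in principle be passed as miss_plot_heatmap(toyDat) without further arguments.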

sortLoci

Character: Sort the loci by their genomic order ('order'), or by the proportion of missing data ('missing'). Default = 'order'.

sortSamp

Character: Sort the samples by their alpha-numeric order ('order'), or by the proportion of missing data ('missing'). Default = 'order'.
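
To make the 'missing' option concrete, the sketch below computes the proportion of missing genotypes per sample and per locus with data.table, then orders by that proportion. It illustrates the quantity being sorted on and is not the function's internal code; the toy table, the P.MISS column name, and the descending sort direction are all assumptions.

```r
library(data.table)

# Hypothetical toy data in the default column layout
toyDat <- data.table(
  CHROM  = rep(c("Contig1", "Contig2"), times = 3),
  POS    = rep(c(101L, 33L), times = 3),
  SAMPLE = rep(c("Ind1", "Ind2", "Ind3"), each = 2),
  GT     = c(0L, NA, NA, NA, 1L, 2L)
)

# Proportion of missing genotypes per sample, most missing first
missSamp <- toyDat[, .(P.MISS = mean(is.na(GT))), by = SAMPLE]
missSamp <- missSamp[order(-P.MISS)]

# Proportion of missing genotypes per locus, most missing first
missLoc <- toyDat[, .(P.MISS = mean(is.na(GT))), by = .(CHROM, POS)]
missLoc <- missLoc[order(-P.MISS)]

print(missSamp)
print(missLoc)
```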

chromCol

Character: The column name with the chromosome/contig ID. Default = 'CHROM'.

posCol

Character: The column name with the position ID. Default = 'POS'.

sampCol

Character: The column name with the sampled individual ID. Default = 'SAMPLE'.

genoCol

Character: The column name with the genotype info. Default = 'GT'. Missing data should be represented by NA.

popCol

Character: The column name with the population ID. Optional parameter. Default = NA.

plotColours

Character: A vector of two colours to use in plotting. The first colour is used for missing data and the second for non-missing data. Default = NULL.

plotNCol

Integer: The number of columns to arrange individual population plots into. Only takes effect when popCol is specified. Default = 2.

Value

Returns a ggplot object.
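
Because the return value is a ggplot object, standard ggplot2 operations should apply to it. The sketch below uses a stand-in geom_tile plot in place of the real output, then adds a title and saves the result to a temporary PDF. The stand-in plot, its column names (LOCUS, MISSING), and the output file name are assumptions for illustration and are not taken from the function itself.

```r
library(ggplot2)

# Hypothetical stand-in for the object returned by miss_plot_heatmap()
toyDat <- data.frame(
  SAMPLE  = rep(c("Ind1", "Ind2", "Ind3"), each = 2),
  LOCUS   = rep(c("Contig1_101", "Contig2_33"), times = 3),
  MISSING = c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
)
gg <- ggplot(toyDat, aes(x = LOCUS, y = SAMPLE, fill = MISSING)) +
  geom_tile() +
  scale_fill_manual(values = c("TRUE" = "black", "FALSE" = "deeppink2"))

# Add layers to the stored object as with any other ggplot
ggCustom <- gg +
  labs(title = "Missing data heatmap", x = "Locus", y = "Sample") +
  theme(axis.text.x = element_blank())

# Save to a temporary file; the PDF device ships with base R
outFile <- file.path(tempdir(), "miss_heatmap.pdf")
ggsave(outFile, plot = ggCustom, width = 6, height = 4)
```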

Examples

library(genomalicious)

####   MISSING GENOTYPE DATA   ####
data(data_Genos)
datGt <- data_Genos

# Add missing values
datGt <- do.call(
 'rbind',
 # Split data table by population, and iterate through populations, Dpop
 split(datGt, by='POP') %>%
   lapply(., function(Dpop){
     pop <- Dpop$POP[1]

     if(pop=='Pop1'){
       pr <- 0.1
     } else if(pop=='Pop2'){
       pr <- 0.2
     } else if(pop %in% c('Pop3','Pop4')){
       pr <- 0.05
     }

     # Numbers and unique loci and samples
     num.loc <- Dpop$LOCUS %>% unique %>% length
     uniq.loc <- Dpop$LOCUS %>% unique
     num.samp <- Dpop$SAMPLE %>% unique %>% length
     uniq.samp <- Dpop$SAMPLE %>% unique

     # Vector of missingness
     num.miss <- rbinom(n=num.samp, size=num.loc, prob=pr)

     # Iterate through samples and set genotypes to NA at randomly drawn loci
     for(i in seq_len(num.samp)){
       locs <- sample(uniq.loc, size=num.miss[i], replace=FALSE)
       Dpop[SAMPLE==uniq.samp[i] & LOCUS%in%locs, GT:=NA]
     }

     # Return
     return(Dpop)
   }
   )
)

head(datGt, 10)

# Heatmaps, default and specified colours
miss_plot_heatmap(datGt)
miss_plot_heatmap(datGt, plotColours=c('black', 'deeppink2'))

# Heatmaps, by population
miss_plot_heatmap(datGt, popCol='POP', plotNCol=4)

####   CATCH PLOT OUTPUT FOR LATER USE   ####
gg4pops <- miss_plot_heatmap(datGt, popCol='POP')
plot(gg4pops)


j-a-thia/genomalicious documentation built on Oct. 19, 2024, 7:51 p.m.