Objective

Update and replace the draw_stackbar_chr() function from rnaseqTools to use ggplot2 and use an HermesData object as input.

Idea

We need to access two rowData factors in the HermesData object: Chromosome and LowExpressionFlag, and then use ggplot function to draw stackbar.

To make the plot easy to read, the Chromosome factor should be sorted to 1-22, "X", "Y", "MT", "Others".

Use ggplot() and geom_bar() functions to create stack bar.

#Creating the HermesData object and read rowData information

load("~/hermes/data/hermes_data.rda")

object <- hermes_data

df <- data.frame(Chromosome = rowData(object)$Chromosome, LowExpressionFlag = rowData(object)$LowExpressionFlag)

# Sort Chromosome factor
chrname <- c(as.character(c(1:22)), "X", "Y", "MT", "Others")
df$chr <- factor(ifelse(df$Chromosome %in% chrname, df$Chromosome, "Others"), levels = chrname)

# Use ggplot to create stackplot
ggplot(data=df, aes(x=chr))+ geom_bar(aes(fill = LowExpressionFlag))

Editing changes for the plots will be added later.

Miscellaneous issue 1: the name of Chromosome

At this stage, the Chromosome factor is mandatory in HermesData object. However, its contents (i.e. the names of the chromosome) are not limited. In this setting, we force the names of the chromosomes to be 1, 2, ..., 22, X, Y, MT. If the names are not in this list, they will be identified as Others. This setting is limited. For example, if a dataset includes chromosomes with names chr1, chr2, etc, they will all be counted as Others. This problem should be solved.

We may either force the Chromosome to have the names 1, 2, ..., 22, X, Y, MT in the HeremesData() function, which could make our package to be limited. The other approach could be adding an option use.original.chr.namein the draw_stackbar_chr() function. If users choose use.original.chr.name = Y, then we do not sort chromosome factors, and directly print a stack bars based on original chromosome names, which may be lengthy because there are lots of chromosomes such as KI270734.1, GL000205.2 (i.e. unplaced geno).

Miscellaneous issue 2: NA in LowExpressionFlag

As the LowExpressionFlag is optional in HermesData object. If a SE object does not include LowExpressionFlag, HermesData() will add this factor into rowData with all values in NA.

The HermesData object will lead to error message in ggplot since all LowExpressionFlag values are NA. We need one test in assert_that() to dectect if all values in the LowExpressionFlag are NA.



insightsengineering/hermes documentation built on May 2, 2024, 6:01 a.m.