knitr::opts_chunk$set(
  # code or die
  echo = TRUE,
  # minimize verbosity
  warning = FALSE, message = FALSE,
  # dpi = 150, # for hires images
  comment = "#>")
set.seed(0xFEED)

Under Construction

Easy to run a PCA over the rnaseq data, or whatever other assy you got there (CNV, proteomics, take your pick)

library(FacileData)
library(FacileAnalysis)
library(dplyr)

pca.crc <- exampleFacileDataSet() |> 
  filter_samples(indication == "CRC", sample_type == "tumor") |> 
  fpca(assay_name = "rnaseq")

Vizualize the results:

viz(pca.crc, color_aes = "subtype_crc_cms", shape_aes = "sex")

Need to fix up report()

Not sure how to compare() two PCA results ...

Exploring Feature Loadings on Principal Components

The principal components are a new basis of orthogonal axes that are linear combinations of the original axes, which in this examples are genes. By exploring which genes (and gene sets) are most highly loaded along each principal component, analysts can begin to better understand what biological processes might be driving the variability in their data.

The ranks(fpca()) result extracts the feature (gene) loadings onto the principal components. signature(fpca()) extracts the top N ntop features from each principal component.

The code below returns the top 10 features over the first three PCs. The score column corresponds to the loading:

pca.sig <- signature(pca.crc, dims = 1:3, ntop = 10)
pca.sig |> 
  tidy() |> 
  select(feature_id, symbol, dimension, score)

When we map the expression values of the genes to the PCA viz, we can see that the expression of the genes with high abs(score) values should track with its position on its principal component.

:::note

# We need to hack the expression values for these genes into the pca result for
# now, but we are working to upgrade the data retrieval abilities around
# FacileAnalysisResult objects so that this can be made effortless.
pcgenes <- c(IL8 = "3576", ARFGEF2 = "10564", BRIP1 = "83990")
pca.crc$result <- pca.crc$result |> 
  with_assay_data(pcgenes)

# The following issue tracks our intent to enable easier query/retrieval of
# data from different places to make interactive exploration more facile:
# 
# https://github.com/facileverse/FacileData/issues/8
# 
# Instead of hacking the gene expression data back into the pca.crc$result
# tibble, we should be able to do something like this:
# 
#   viz(pca.crc, color_aes = "feature:IL8|name")

:::

For example, we can see that IL8 is highly loaded on PC1:

viz(pca.crc, color_aes = "IL8", title = "IL8")

ARFGEF2 is also highly loaded on PC1, but witha flipped sign:

viz(pca.crc, color_aes = "ARFGEF2", title = "ARFGEF2")

and BRIP1 is a highly loaded gene on PC2

viz(pca.crc, color_aes = "BRIP1", title = "BRIP1")


facilebio/FacileAnalysis documentation built on March 15, 2024, 7:37 a.m.