In wangjiaxuan666/autopca: the packages created for draw pca plot more easily.

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

autopca

Originally,this R package autopca was a script I used to draw PCA. PCA can analyze batch effects in experimental processing, and also analyze experimental processing factors. In my point, it is very important in checking omics data. At present, this R package Features are very few But are under active development. The basic function is already there, so I placed on github, if necessary, you can download and use it yourself.

Update

fix bug 20201211

[x] pca(dat) :Error in if (sample_group == FALSE) { : 参数长度为零
[ ] 将数据整理放在pca_data_tidy中，为后续loading 图做准备
[x] sample_group必须是tibble，要提取第二列
[x] 要加上一个注释说明，str_group必须在str_sample已经更改的基础上修改

Installation

The R packages only can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("wangjiaxuan666/autopca")

Notice : if you install autopca failed , please check the R installed envirment. maybe you need install tidyverse or plyr before. IF YOU FAILED WITH ANGRY AND BOOM ERRORS. Please contact me with poormouse@126.com , I' m sorry about what happen to you. Although it's me, I also have to admit that this is a bug-filled R package.

Every time I use this R packages , it will spend 95% time to fix new bug! Although it will spend a little time if i use the scicpt not R packages. Automation is always very difficult because it is suitable for all kinds of situations and data type. BUT IT IS A FUN WHEN A PROBLEM NEED TO THINK

Example

Before we start the PCA analysis, we first need to tidy the input data. We can use the function pca_data_tidyto make the data clean for the next PCA analysis.It should be noted that the format of the input data is very important to determine the success of subsequent analysis.

Here, Emphasize the format of the input data. First, PCA analysis use the function stat::prcomp, Consequently the result given the every observed values' Variance in principal components. the observed values is the prcomp data' rownames. For example, the data iris

head(iris)

the rownames is every iris ID number, in iris, we want to demonstrate the proportion of each iris flower in terms of petal length and width and so on. That is we want, so for iris data, we just tidy a little for next analysis. Just like

library(autopca)
irisgroup <- iris[,5] 
iristidy <- iris[,-5]
pca(iristidy,sample_group = as.data.frame(irisgroup))

the group information is Grouping information is required, otherwise an error will be reported!

#pca(iris)

As for why and how to use autopca, We need to start from the beginning.

Illustration

the autopca designed for transcriptome data, The classic transcriptome data is rownames is gene id, the column names is every observed sample. sometime it also add some annotation information in the tail. like this.

test = matrix(rnorm(200), 20, 10)
test[1:10, seq(1, 10, 2)] = test[1:10, seq(1, 10, 2)] + 3
test[11:20, seq(2, 10, 2)] = test[11:20, seq(2, 10, 2)] + 2
test[15:20, seq(2, 10, 2)] = test[15:20, seq(2, 10, 2)] + 4
colnames(test) = paste("Test", 1:10, sep = "")
rownames(test) = paste("Gene", 1:20, sep = "")
annot <- c(rep("KEGG",20))
test <- data.frame(test,annot)
head(test)

Through the above steps, we obtained a classic transcriptome data frame. NOW WE explain the sample variance in PC. So we need tidy the data by the functionpca_data_tidy.

pca_data_tidy(as.data.frame(test)) -> test_tidy

NEXT, we can use the tidy data to analysis, just like this

as.data.frame(c(rep("A",5),rep("B",5))) -> group
rownames(group) <- colnames(test[,-11])
colnames(group) <- "group"
head(group)
pca(test_tidy,sample_group = group)

But pca function not only that, It supports regular matching characters to replace the names of sample or group. When the sample names is "CK-1_fokm, CK-2_fokm," and so on , It will be very useful.

#pca(data = re,# the data didn't exist,just a example to display the parameter
#    display_sample = TRUE,
#    rename = "replace",
#    str_sample = "_.*",
#    str_group = "-\\d$",
#    add_ploy = TRUE)

ALL parameters explain :

Usage:

pca(
  data = data,
  center = T,
  retx = T,
  scale = FALSE,
  display_sample = FALSE,
  rename = c("diy", "replace"),
  sample_group = NULL,
  str_sample = NULL,
  str_group = "-.*",
  add_ploy = FALSE
)

Arguments

data:iput data form the function 'pca_data_tidy'

center:the prcomp param, detail see '?prcomp'

retx:the prcomp param, detail see '?prcomp'

scale:the prcomp param, detail see '?prcomp'

display_sample:if TRUE will add the text labels on points.

rename：the method for change the sample and group names, two argment can choose, "diy" is for the creat a data for name,"replace" is use regexp to replace or change the name

sample_group：a data for change the sample and group name, the rownames is sample and the first column is group

str_sample:the 'regexp' for the sample name to become the target name

str_group:the 'regexp' for the group name to become the target name

add_ploy:if TRUE will add the polygon on points.

Of course, as a fan of tidyverse, all function in autopca also support tibble data input. If there are any questions and suggestions in use, welcome questions and suggestions

wangjiaxuan666/autopca documentation built on Jan. 19, 2021, 6:54 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com