pyramid_3d: ACGT pyramid 3D plotting function


View source: R/pyramid_3d.R

Description

The ACGT pyramid 3D plotting function plots the ACGT distribution of a given sample in 3D space by means of a principal component analysis (PCA).

Usage

pyramid_3d(df, type = "points", color = "black", ids = NULL,
  text = NULL, cex = 1, identify = FALSE, classify = NULL, groups = 2)

Arguments

df

Data frame of k-mer frequencies. The columns are the individual k-mers and the rows are the different samples, sequence reads, or contigs. See the function get_kmer_distribution to generate the frequencies from a DNA sequence.

type

Whether to show single points [default] or lines between the points.

color

A single value such as 'black' if all points should have the same color, or a vector of length nrow(df) if each point should be colored individually.

ids

Ids shown when single points are selected with identify = TRUE. Must have length nrow(df).

text

A single value such as 'x' if all points should be printed as 'x', or a vector of length nrow(df) if each point should be printed as its own letter.

cex

Size of the shown text.

identify

Set to TRUE if points should be identified by their ids.

classify

Specify the classifier to use: 'kmeans' or 'hclust'.

groups

Number of assumed groups, 'k' [default = 2].
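
To illustrate the layout expected for df, the following minimal sketch builds a 1-mer frequency table in base R from three made-up sequences. The sequences, the helper count_bases, and all object names are assumptions for illustration only; the package's get_kmer_distribution is the intended way to generate the frequencies, and its exact output format may differ.

```r
# Hypothetical input: three short, made-up sequences (not package data).
seqs <- c(read1 = "ACGTACGTAA",
          read2 = "GGGCCCATAT",
          read3 = "TTTTACGCGC")
bases <- c("A", "C", "G", "T")

# Hypothetical helper: count each base in one sequence.
count_bases <- function(s) {
  table(factor(strsplit(s, "")[[1]], levels = bases))
}

# One row per sequence, one column per 1-mer.
counts <- t(sapply(seqs, count_bases))

# Convert counts to relative frequencies so that every row sums to 1.
freq_df <- as.data.frame(counts / rowSums(counts))
print(freq_df)
```

Each row is one sample and each column one k-mer, which matches the description of df above.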

Details

The function plots the ACGT distribution of a given sample in 3D space using a PCA. The PCA is computed by the function prcomp with its default parameters. In addition, the points can be drawn in different colors and with different text symbols; only points and letters are available. The identify option makes it possible to identify single points in the 3D plot.
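
The PCA step described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's source code: the simulated frequencies and the object names (freq_df, pca, coords) are made up, and how pyramid_3d hands the coordinates to its 3D plotting code is not shown here.

```r
# Hypothetical 1-mer frequencies: rows are samples, columns are A, C, G, T.
set.seed(1)
counts <- matrix(rpois(40, lambda = 50), ncol = 4,
                 dimnames = list(paste0("seq", 1:10), c("A", "C", "G", "T")))
freq_df <- as.data.frame(counts / rowSums(counts))

# PCA with prcomp's default parameters (centered, not scaled).
pca <- prcomp(freq_df)

# Coordinates of each sample on the first three principal components.
coords <- pca$x[, 1:3]
print(head(coords))

# Standard deviations of the components; the fourth is numerically zero.
print(pca$sdev)
```

For k = 1 the four frequencies of every sample sum to one, so the points lie in a three-dimensional subspace and the first three components retain all of the variance.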

The sequences must be provided as a DNAStringSet object. See https://web.stanford.edu/class/bios221/labs/biostrings/lab_1_biostrings.html for more details and readDNAStringSet() for reading in FASTA files.

Value

If classify is set, a data.frame with the predicted group labels.
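
As a rough illustration of what the two classifiers named under classify do, the sketch below groups simulated 1-mer frequencies with the base R functions kmeans and hclust. It rests on several assumptions: the data are simulated, the helper make_group and the column names of pred_df are hypothetical, and clustering the raw frequencies (rather than, say, the PCA coordinates) is a guess, since this page does not state what pyramid_3d does internally.

```r
set.seed(42)

# Hypothetical helper: simulate n samples of 1-mer frequencies.
make_group <- function(n, prob) {
  counts <- t(rmultinom(n, size = 500, prob = prob))
  counts / rowSums(counts)
}

# Two made-up groups: A/T-rich and G/C-rich sequences.
freq <- rbind(make_group(10, c(0.4, 0.1, 0.1, 0.4)),
              make_group(10, c(0.1, 0.4, 0.4, 0.1)))
colnames(freq) <- c("A", "C", "G", "T")
rownames(freq) <- paste0("seq", seq_len(nrow(freq)))

groups <- 2  # plays the role of the 'groups' argument

# k-means with several random starts.
km <- kmeans(freq, centers = groups, nstart = 10)

# Hierarchical clustering on Euclidean distances, cut into 'groups' clusters.
hc <- hclust(dist(freq))
hc_labels <- cutree(hc, k = groups)

# Hypothetical result layout: one row per sample with its group labels.
pred_df <- data.frame(id = rownames(freq),
                      kmeans = km$cluster,
                      hclust = hc_labels)
print(pred_df)
```

The label numbers themselves are arbitrary; only the partition of the samples into groups is meaningful.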

Author(s)

Jochen Kruppa

Examples

## To use your own DNA sequences, read them in with the package Biostrings (see Details for more information)

data(viralExampleSeqs)

kmer_distr <- get_kmer_distribution(viralExampleSeqs, k = 1)

pyramid_3d(kmer_distr,
           cex = 2,
           color = "blue")

ids <- names(viralExampleSeqs)

pyramid_3d(kmer_distr,
           ids = ids,
           cex = 2,
           color = "blue",
           identify = TRUE)

data(viralExampleCodingSeq)

kmer_distr <- get_kmer_distribution(viralExampleCodingSeq, k = 1)
text_ids <- ifelse(names(viralExampleCodingSeq) == "non_coding", "x", "o")
color_ids <- ifelse(names(viralExampleCodingSeq) == "non_coding", "black", "red")

pyramid_3d(kmer_distr,
           cex = 1,
           text = text_ids,
           color = color_ids)

ids <- names(viralExampleCodingSeq)

pyramid_3d(kmer_distr,
           ids = ids,
           cex = 1,
           text = text_ids,
           color = color_ids,
           identify = TRUE)

## Use the built-in classification

pred_df <- pyramid_3d(kmer_distr,
                      cex = 1,
                      text = text_ids,
                      classify = "kmeans")

pred_df <- pyramid_3d(kmer_distr,
                      cex = 1,
                      text = text_ids,
                      classify = "hclust")

jkruppa/acgtPyramid documentation built on May 19, 2019, 12:45 p.m.