knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction to group_14_package

The group_14_package contains a dataset called 'codonTable.rda' and 5 different functions.

Data - codonTable

To explore the different functions and their possibilities a dataset is preloaded with the package.
Data is a table describing the different amino acids by a capital letter and how different combinations/codons can describe an amino acid.

data = get(load("/cloud/project/data/codonTable.rda"))
data

Generate your own DNA string using stringDNA()

The function generates a string of DNA bases “A”, “T”, “G” and “C” using sampling method.  The length of the DNA string is defined by the size argument, where default is 1. Output is a DNA string.

stringDNA <- function(size = 1){
  set.seed(41)
  sampleDNA <- sample(c("A", "T", "G", "C"), size = size, replace = TRUE)
  strDNA <- paste0(sampleDNA, collapse = "")
  return(strDNA)
}

# Example run
stringDNA(size = 50)

Translation of DNA to RNA

When generating a DNA string using the function above it might also be wanted to translate this into a RNA string. For this the RNATransfunction can be used. This function takes a certain DNA string (character) and returns the corresponding RNA string that would be development as a result of the transcription process, in line with the central dogma.

RNATrans <- function(strDNA){ 
  strRNA <- gsub("T", "U", strDNA) 
  return(strRNA) 
}

# Example runs
RNATrans("CGCGGTTGGCCCTGAAGCTGCATGCAATTATCATATGGTGGGACTCGTGC")
RNATrans(stringDNA(size = 50))

From DNA to RNA to codons

The getCodon function takes an RNA sequence as input string and returns a character vector of codons. Each invividual element in the vector is a triplet of three ribonucleic acids. As a default the open reading frame starts at at first character in the input RNA sequence, but this function also allows the user to determine the start position of the open reading frame.

The getCodon function can be used in conjuction with RNATrans() as the next step in processing sequencing data. The output from RNATrans() is an RNA sequence that can be directly inputted into getCodon() to retrieve the codons given in the RNA sequence.

getCodon <- function(RNASeq, start = 1){ 
  seqLen <- nchar(RNASeq) 
  codons <- substring(RNASeq, 
                      first = seq(from = start, to = seqLen-3+1, by = 3), 
                      last = seq(from = 3+start-1, to = seqLen, by = 3)) 
  return(codons) 
} 

# Example runs
RNA <- "UCGUUA" 
getCodon(RNA, start = 1)
getCodon(RNATrans("CGCGGTTGGCCCTGAAGCTGCATGCAATTATCATATGGTGGGACTCGTGC"))
getCodon(RNATrans(stringDNA(size = 50)))

From Codons to Amino acids

Using the data table called codonTable it is possible to translate codons into amino acids and then return it as a sting

check_codon <- function(codons){
  is_codon <- paste0(codonTable[codons], collapse = "")
  return(is_codon)
}


#Example run
check_codon(getCodon(RNATrans(stringDNA(size = 50))))

Amino acid occurance plot

The plotAAOccur function creates a bar chart of every unique character (here assumed to be amino acids) in the input string, in correspondence to the number of occurrences for each character in the input string.

The function will only return a plot so if the data is needed, using the counts variable as an extra output could be usefull (the output would be a dataframe)

#tidyverse is needed for the pipe functionality
library(tidyverse)
plotAAOccur <- function(stringAA){
  #create the categories
  uniqueAA <- stringAA %>%
    #splitting the input string for every character
    stringr::str_split(pattern = stringr::boundary("character"), simplify = TRUE) %>%
    #making everything in the final string a character
    as.character() %>%
    #keep only unique entries in the string
    unique()

# counting occurance of the uniqe charaters in the original string

  counts <- sapply(uniqueAA, function(occurrenceAA) stringr::str_count(
      string = stringAA, pattern =  occurrenceAA)) %>%
    as.data.frame()



  colnames(counts) <- c("Counts")
  counts[["stringAA"]] <- rownames(counts)

  #plotting the entries in the input and the occurrence of each
  occurrenceAAplot <- counts %>%
    ggplot2::ggplot(ggplot2::aes(x = stringAA, y = Counts, fill = stringAA)) +
    ggplot2::geom_col() +
    ggplot2::theme_bw()

  return(occurrenceAAplot)
}

#Example run using the four DNA nucleotides
plotAAOccur(stringDNA(size = 100))


#Example run with Amino acids
plotAAOccur(check_codon(getCodon(RNATrans(stringDNA(size = 50)))))


rforbiodatascience22/group_14_package documentation built on April 7, 2022, 8:20 p.m.