knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Creating the Package

File > New Project > Version Control
Copy the repository address from GitHub into RStudio.
install.packages("devtools")
library(devtools)
library(usethis)
use_vignette("kamakiVignette")

Downloading FastQC Files

library(kamaki)

Locate your FastQC files on the server and move them to your home directory.
Use the following commands as a template for transferring your files.

mv /path/to/your/fastqc_data.txt ~/
exit
In a local terminal:
scp stp4004@aristotle.med.cornell.edu:fastqc_data.txt ./
Rename each file after its sample (for example, ERR4588859.txt) for record keeping.

5 Explanation of the Internal sed and grep Functionality

"sed -n '/", test, "/,/END_MODULE/p' ", file, " | grep -v '^>>'"
sed is a UNIX stream editor; with the -n option and the p command it prints only the lines inside the given range. Here the range starts at the first line matching the value of test (by default "Per base sequence quality") and stops at the next line containing "END_MODULE". grep -v '^>>' then drops every line that starts with '>>', which are the module's title line and its END_MODULE marker, so only the column header and the values remain.
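To make the two filtering steps concrete, here is a minimal pure-R sketch of the same logic. It is not how kamaki does the extraction (the package shells out to sed and grep); `extract_module` is a hypothetical name, and the FastQC excerpt is invented for illustration.

```r
# Pure-R sketch of the sed/grep extraction. `extract_module` is a hypothetical
# helper, not a function exported by kamaki.
extract_module <- function(lines, test = "Per base sequence quality") {
  start <- grep(test, lines, fixed = TRUE)[1]       # first line matching the module name
  ends  <- grep("END_MODULE", lines, fixed = TRUE)  # all module terminators
  end   <- ends[ends > start][1]                    # first terminator after the start
  block <- lines[start:end]                         # what sed -n '/test/,/END_MODULE/p' prints
  block[!grepl("^>>", block)]                       # what grep -v '^>>' keeps
}

# Tiny synthetic excerpt in the layout of fastqc_data.txt (values are made up).
fastqc_lines <- c(
  ">>Basic Statistics\tpass",
  "#Measure\tValue",
  ">>END_MODULE",
  ">>Per base sequence quality\tpass",
  "#Base\tMean\tMedian",
  "1\t31.5\t33.0",
  "2\t31.8\t34.0",
  ">>END_MODULE"
)

kept <- extract_module(fastqc_lines)
print(kept)
```

The lines that survive are the "#Base ..." column header and the two value rows; both '>>' marker lines are gone.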

6 Now go back to the function’s code and add a variable to the function that adds an additional column to the resulting data frame containing a user-specified sample name (e.g. “WT_1_ERR458493”), i.e., the function should get at least one more argument.

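reading_in <- function(file, test = "Per base sequence quality", input, input2) {
  # ... existing body that reads the module into `dat` stays unchanged ...
  dat[, "Sample"] <- input      # user-specified sample name
  dat[, "Genotype"] <- input2   # user-specified genotype
  dat
}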

Example (file holds the path to a FastQC data file):
reading_in(file, test = "Per base sequence quality", input = "ERR458493", input2 = "WT")
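As a rough illustration of what the two extra arguments do, the sketch below parses already-extracted module lines and tags every row with a sample name and genotype. `tag_module` is a hypothetical helper written for this example, and the three input lines are invented; the real `reading_in()` obtains its lines from the sed/grep call instead.

```r
# Hypothetical helper: parse extracted module lines and add the two new columns.
tag_module <- function(module_lines, sample, genotype) {
  # Drop the leading '#' so the header line yields clean column names.
  module_lines[1] <- sub("^#", "", module_lines[1])
  dat <- read.delim(text = paste(module_lines, collapse = "\n"),
                    header = TRUE, check.names = FALSE,
                    stringsAsFactors = FALSE)
  dat[, "Sample"] <- sample      # same value repeated on every row
  dat[, "Genotype"] <- genotype
  dat
}

# Invented lines in the shape left over after the sed/grep step.
module_lines <- c("#Base\tMean\tMedian", "1\t31.5\t33.0", "2\t31.8\t34.0")

tagged <- tag_module(module_lines, sample = "ERR458493", genotype = "WT")
print(tagged)
```

Because the sample name travels with each row, the data frames from several files can later be stacked without losing track of where each row came from.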

7 Use your updated function to read in the FastQC results of at least 4 fastq files that should cover 2 biological replicates and 2 technical replicates of each. Make sure to keep track of the sample name in the new R objects you’re creating.

Grabbing files: ERR4538493 was transferred earlier.
I went back for one more technical replicate, plus two technical replicates for WT2.

scp stp4004@aristotle.med.cornell.edu:/home/stp4004/{ERR4538494,ERR458878,ERR458879}.txt ./
mv ERR4538494.txt ERR45887{8..9}.txt /home/spirpinias/Desktop

The files were renamed after their accessions: ERR4538493, ERR4538494, ERR458878, ERR458879.

Object naming syntax: WT(biological replicate number)_(technical replicate number)

WT1 <- reading_in("/home/spirpinias/Desktop/WT1ERR4538493.txt", test = "Per base sequence quality", "ERR4538493", "WT")
WT1_2 <- reading_in("/home/spirpinias/Desktop/WT2ERR4538494.txt", test = "Per base sequence quality", "ERR4538494", "WT")
WT2 <- reading_in("/home/spirpinias/Desktop/WT2ERR458878.txt", test = "Per base sequence quality", "ERR458878", "WT")
WT2_1 <- reading_in("/home/spirpinias/Desktop/WT2ERR458879.txt", test = "Per base sequence quality", "ERR458879", "WT")

I did not write a for loop here, as I was not sure what I was supposed to loop over.
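One natural thing to loop over is a small table with one row per FastQC file, holding the path, sample name, and genotype. The sketch below shows the pattern under that assumption. `reading_in_stub` is a stand-in written for this example so the code runs without the actual files; with the package loaded, `reading_in()` would take its place.

```r
# Stand-in for kamaki's reading_in() so the sketch runs without FastQC files.
# It ignores the file contents and returns a two-row data frame.
reading_in_stub <- function(file, test = "Per base sequence quality", input, input2) {
  data.frame(Base = 1:2, Mean = c(31.5, 31.8),
             Sample = input, Genotype = input2,
             stringsAsFactors = FALSE)
}

# One row per file: where it lives and how it should be labelled.
samples <- data.frame(
  file = file.path("/home/spirpinias/Desktop",
                   c("WT1ERR4538493.txt", "WT2ERR4538494.txt",
                     "WT2ERR458878.txt", "WT2ERR458879.txt")),
  sample = c("ERR4538493", "ERR4538494", "ERR458878", "ERR458879"),
  genotype = "WT",
  stringsAsFactors = FALSE
)

# Loop over the rows of the table, collecting one data frame per file.
results <- vector("list", nrow(samples))
for (i in seq_len(nrow(samples))) {
  results[[i]] <- reading_in_stub(samples$file[i],
                                  input = samples$sample[i],
                                  input2 = samples$genotype[i])
}

# Stack the per-file data frames into one.
combined <- do.call(rbind, results)
print(combined)
```

Adding a fifth file then only requires a new row in `samples`, not another hand-written call.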

8 Combine all these data.frames into one

TotalWTData <- rbind(WT1, WT1_2, WT2, WT2_1)

colnames(TotalWTData) <- c("Base", "Mean", "Median", "Lower Quartile", "Upper Quartile", "10th Percentile", "90th Percentile", "Sample", "Genotype")

9 The goal is to include that combined data frame as a data object with your package.

save(TotalWTData, file="WTBioRep1-2.rda")
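R packages conventionally ship data objects as .rda files in the package's data/ directory, usually one object per file named after the object. The sketch below mimics that layout in a temporary directory and checks that the object loads back unchanged. The temporary directory and the small stand-in data frame are assumptions made so the example runs anywhere; inside the real package project, `usethis::use_data(TotalWTData)` performs the equivalent save.

```r
# Small stand-in for the combined data frame (values are made up).
TotalWTData <- data.frame(Base = 1:2, Mean = c(31.5, 31.8),
                          Sample = "ERR458493", Genotype = "WT",
                          stringsAsFactors = FALSE)

# A temporary directory stands in for the package root.
pkg_root <- tempfile("kamaki_")
dir.create(file.path(pkg_root, "data"), recursive = TRUE)

# Save the object under data/, named after the object itself.
rda_path <- file.path(pkg_root, "data", "TotalWTData.rda")
save(TotalWTData, file = rda_path)

# Load into a fresh environment to confirm the object round-trips.
env <- new.env()
loaded <- load(rda_path, envir = env)
print(loaded)
```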

10 How do you build your package?

install.packages("devtools")
library(devtools)
install_github("spirpinias/kamaki")
library(kamaki)

11 Make a ggplot2-based plot using the combined data frame. Try to mimic the basic features of the example plot below, but feel free to change the color palette, remove the grey background and other details.

knitr::include_graphics("/home/spirpinias/Documents/RStudioStuff/kamaki/HW5.png")
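A plot of this kind could be sketched as follows, assuming the goal is mean quality per base position with one line per sample. The data frame here is a made-up stand-in with only the Base, Mean, and Sample columns, and the styling choices (theme_bw() to drop the grey background, the axis labels) are illustrative rather than a reproduction of the example figure.

```r
library(ggplot2)

# Made-up stand-in for the combined data frame: two samples, five positions each.
plot_data <- data.frame(
  Base = rep(1:5, times = 2),
  Mean = c(31, 32, 33, 33, 32, 30, 31, 33, 34, 33),
  Sample = rep(c("ERR458493", "ERR458878"), each = 5),
  stringsAsFactors = FALSE
)

# One line per sample; theme_bw() replaces the default grey background.
p <- ggplot(plot_data, aes(x = Base, y = Mean, colour = Sample)) +
  geom_line() +
  geom_point() +
  labs(x = "Position in read (bp)", y = "Mean quality score") +
  theme_bw()
```

Calling `print(p)` renders the figure; with the real data, `TotalWTData` would replace `plot_data`, and `facet_wrap(~ Genotype)` is one option for splitting panels once more than one genotype is present.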


spirpinias/kamaki documentation built on March 8, 2020, 10:41 a.m.