knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

DEGSimilar is a package which provides a streamlined pipeline in order to compare a gene of interest to a set of genes chosen by the user. This is done via the BLAST+ algorithm.

In order to download and use DEGSimilar, you must first have the BLAST+ executables available locally on your machine. In order to check whether this is already available locally on your machine run:

Sys.which("blastn")
Sys.which("tblastn")
Sys.which("makeblastdb")

If the executable exists on your machine then a path should be returned by the above command. If you do not have the executable then it may be downloaded from:

https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download.

You may have to restart R after running the installer for R to be able find it.

If you install the executables and R cannot find it still, then you may need to manually refer R to the location of the executables on your system.

In addition in some instances the Biostrings package must be installed from Bioconductor before installing DEGSimilar. This can be done by running the following.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("Biostrings")

After doing the above you may run the following to install DEGSimilar:

require("devtools")
devtools::install_github("GeorgeLy7/DEGSimilar", build_vignettes = TRUE)

Afterwards the function may be loaded into R by running

library(DEGSimilar)

Functions

The following section will describe the two functions of the package getBlastResults and graphBLASTResults.

getBlastResults

This function will output the results of a BLAST search based on the user parameters. The type of BLAST will vary on the parameters.

This function will take in two fasta files, one (labelled as userFasta) is the query sequence the user wishes to use in their BLAST run. That is the fasta file the user wishes to compare against the second file (lablled as fastaDatabase). getBlastResults will create a BLAST+ database file based on the fastaDatabase file. The user should provide paths to the files as parameters. These fasta files may be of nucleotide or amino acids, the user will have to input the type of the fasta files as parameters.

The other parameters of the function are as follows:

maxSequencesReturned will truncate the returned list of the function to the number maxSequencesReturned is set to. The value of this parameter must be >= 1 or the function will return an error.

An example with a value of maxSequencesReturned of 2:

getBlastResults("exampleQueryFasta.fasta","exampleDatabase.fasta",maxSequencesReturned=2, dbType="nucl", userFastaType="prot")

An example with a value of maxSequencesReturned of 5:

getBlastResults("exampleQueryFasta.fasta","exampleDatabase.fasta",maxSequencesReturned=5, dbType="nucl", userFastaType="prot")

dbType indicates whether the fastaDatabase file is a fasta file of nucleotides of amino acids. This value must be either "nucl" for nucleotides or "prot" for protein.

userFastaType indicates whether the fastaDatabase file is a fasta file of nucleotides of amino acids. This value must be either "nucl" for nucleotides or "prot" for protein.

The two above parameters will determine the type of BLAST ran, for example if both dbType and userFastaType are "nucl" then blastn will be ran. If dbType is "nucl" and userFastaType is "prot" then tBlastn will be ran.

graphBLASTResults

This function will create a heatmap of the sequences recieved from getBlastResults and the provided gene expression values from the user.

The parameter of blastResults will take in the exact output of getBlastResults with no alterations.

The parameter of geneData is required to be a list or database with the following format:

geneData <- read.csv("GeneDataExample.csv",encoding="UTF-8",row.names=1)
geneData

The row names must be the same as their respective gene names in the database file used in getBlastResults.

The rowLabelScaling and colLabelScaling change the size of the row and column labels.

Package References

Ly, George. (2020) DEGSimilar: A streamlined pipeline to compare DEGs via BLAST. Unpublished.

Other References

R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/



GeorgeLy7/DEGSimilar documentation built on Dec. 31, 2020, 12:04 p.m.