View source: R/G4iM.Grinder.Funs.R
The function determines the genomic features that share their location with (and hence may be affected by) GiG's PQS and PiMS results. It requires an annotation file for the sequence, against which the positions of the results are matched. The function returns a data frame (within a tibble) of all the matches found for the input sequences, with all the information of each genomic feature hit.
GiG.df.GenomicFeatures(GiG.df, , db, NumRow = NA, Feature = NA, sep = ";")
GiG.df
data.frame, G4-iM Grinder M2a or M3a result data frame (PQSM2a or PQSM3a). Usually found within a GiG.List.
GFF
data.frame, annotation file as a data.frame. Must have "Start", "Finish", "type" and "attribute" columns.
NumRow
integer or vector of integers, the rows of the GiG.df to co-locate with the GFF file. It can be left as NA (the default).
Feature
character or vector of characters, the genomic features to co-locate with G4-iM Grinder's results (e.g. "CDS" or c("gene", "UTR", "telomere")). It can be left as NA (the default).
sep
character, separator character used to split information from the GFF attribute column. The factory-fresh default is ";".
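As an illustration of the GFF and sep arguments, the following base R sketch prepares a toy annotation table with the four required columns and splits its attribute column on the separator. The table values and the lowercase start/end column names of the raw table are assumptions made for the example, and the splitting shown here is not necessarily how the function processes attributes internally.

```r
# Toy annotation table (hypothetical values). The raw column names
# "start" and "end" are an assumption for this example.
raw <- data.frame(
  start     = c(100, 500),
  end       = c(300, 900),
  type      = c("gene", "CDS"),
  attribute = c("ID=gene1;Name=abc", "ID=cds1;Parent=gene1;product=xyz"),
  stringsAsFactors = FALSE
)

# Rename the position columns to the names the GFF argument asks for.
GFF <- raw
names(GFF)[names(GFF) == "start"] <- "Start"
names(GFF)[names(GFF) == "end"]   <- "Finish"

# Check that every required column is present before using the table.
required <- c("Start", "Finish", "type", "attribute")
stopifnot(all(required %in% colnames(GFF)))

# Split each attribute string on the separator (the role of the sep argument).
sep <- ";"
attribute_fields <- strsplit(GFF$attribute, split = sep, fixed = TRUE)

# Number of fields recovered per feature.
lengths(attribute_fields)
```

Checking the column names up front gives a clearer failure than a missing-column error raised later during matching.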
If both the GiG.df and the GFF file have a strand column ("Strand" or "strand" can be used), the results will also be matched by strand. Otherwise, strands will not be used to match the results and a warning will be displayed. Please make sure the annotation file supplied matches the sequence used in the analysis. G4-iM Grinder reverse complements the sequence to generate the supplementary strand. If the annotation file describes the supplementary strand only as the complement, the positions of one of the two files must first be adapted (see below).
The function creates two new columns for the GiG.df.
nFeatures is the total number of biological features co-located with the results.
Features is a dataframe with the actual biological features that co-locate with the results.
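The idea behind these two columns can be sketched in base R with toy data. The tables below are hypothetical, and the overlap rule used (two intervals co-locate when each starts before the other ends, and their strands agree) is an assumption for illustration, not necessarily the exact criterion the function applies.

```r
# Toy results table standing in for a GiG.df (hypothetical values).
results <- data.frame(
  Start  = c(120, 400, 850),
  Finish = c(140, 420, 870),
  Strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Toy annotation table (hypothetical values).
annotation <- data.frame(
  Start  = c(100, 800, 860),
  Finish = c(300, 900, 1000),
  type   = c("gene", "CDS", "gene"),
  strand = c("+", "+", "-"),
  stringsAsFactors = FALSE
)

# Return the annotation rows that overlap result i on the same strand.
colocated <- function(i) {
  hit <- annotation$Start  <= results$Finish[i] &
         annotation$Finish >= results$Start[i] &
         annotation$strand == results$Strand[i]
  annotation[hit, , drop = FALSE]
}

# One data frame of hits per result, plus the count of hits.
Features  <- lapply(seq_len(nrow(results)), colocated)
nFeatures <- vapply(Features, nrow, integer(1))
nFeatures
```

In this toy case the third result overlaps two annotated features by position, but only one of them lies on the same strand, so it is counted once.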
Efres Belmonte-Reche
# Downloading a sequence from RefSeq via the biomartr package
library(G4iMGrinder)
library(biomartr)
# Ebola virus genome identifier for the RefSeq database
org <- "GCF_000855585.1"
# Download the genome once and reuse it for both the sequence and its name
Genome <- biomartr::read_genome(biomartr::getGenome(db = "refseq", organism = org, reference = FALSE))
Sequence <- toString(Genome)
Name <- names(Genome)
# Creating the G4iMGrinder Results for the DNA search of G-quadruplex.
Rs <- G4iMGrinder(Name = Name, Sequence = Sequence)
# Getting the annotation file of the genome (it must match the sequence analysed)
GFF <- biomartr::getGFF(db = "refseq", organism = org, reference = FALSE)
GFF <- biomartr::read_gff(GFF)
# Finding biological features that co-locate with the PQS results (overlapping, size-restricted search).
Rs.Features <- GiG.df.GenomicFeatures(GiG.df = Rs$PQSM2a, GFF = GFF)
# Finding biological features that co-locate with the Potential Higher Order Quadruplex Sequence (PHOQS) results (non-overlapping, size-unrestricted search).
Rs.Features <- GiG.df.GenomicFeatures(GiG.df = Rs$PQSM3a, org = org, db = "refseq")
# To change the direction (regarding the PQS positions) of the supplementary strand in G4-iM Grinder's results.
# Get results dataframe (Here for example, Method 2 (M2)- overlapping and size-restricted search).
M2 <- Rs$PQSM2a
#Length of Sequence
LSeq <- nchar(Sequence)
#Dataframe of old positions.
Supdf <- data.frame(Start = M2$Start[M2$Strand =="-"], Finish = M2$Finish[M2$Strand =="-"])
# invert the positions (from reverse complement, to complement).
M2$Start[M2$Strand =="-"] <- LSeq - Supdf$Finish
M2$Finish[M2$Strand =="-"] <- LSeq - Supdf$Start
# Positions are now inverted: the supplementary strand is expressed in the anti-sense direction (complement).
# Then apply GiG.df.GenomicFeatures function.
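The inversion step above can be checked on a small, self-contained toy case. The sequence length and positions below are hypothetical, and the arithmetic is copied from the example as written; whether an additional offset of one is needed depends on the coordinate convention of the annotation file, which this sketch does not verify.

```r
# Hypothetical sequence length and a toy results table with one hit per strand.
LSeq <- 100
M2 <- data.frame(
  Start  = c(5, 10),
  Finish = c(15, 20),
  Strand = c("+", "-"),
  stringsAsFactors = FALSE
)

# Keep a copy of the old "-" strand positions before overwriting them.
minus <- M2$Strand == "-"
Supdf <- data.frame(Start = M2$Start[minus], Finish = M2$Finish[minus])

# Invert the positions: the old Finish gives the new Start, and vice versa.
M2$Start[minus]  <- LSeq - Supdf$Finish
M2$Finish[minus] <- LSeq - Supdf$Start

# The "+" row is unchanged; the "-" row keeps its width and Start <= Finish.
M2
```

Copying the old positions first matters: overwriting Start in place and then computing Finish from it would use the already inverted values.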