| FiltContig | R Documentation |
Filter out data from contigs that do not meet the selection criteria.
FiltContig( gposModBasePos, grangesModPos, cContigToBeRemoved = NULL, dnastringsetGenome, nContigMinSize = -1, listPctSeqByContig, nContigMinPctOfSeq = 95, listMeanCovByContig, nContigMinCoverage = 20 )
gposModBasePos |
An UnStitched GPos object containing PacBio CSV data to be filtered. |
grangesModPos |
A GRanges object containing PacBio GFF data to be filtered. |
cContigToBeRemoved |
Names of contigs for which the data will be removed. Defaults to NULL. |
dnastringsetGenome |
A DNAStringSet object containing the sequence for each contig. |
nContigMinSize |
Minimum size for contigs to keep. Contigs with a size below this value will be removed. Defaults to -1 (= no filter). |
listPctSeqByContig |
A list containing, for each strand, the percentage of sequencing for each contig. The list must contain two data frames, one per strand, named f_strand and r_strand. Each data frame must have a "refName" column with the contig names and a "seqPct" column with the percentage of sequencing. |
nContigMinPctOfSeq |
Minimum percentage of sequencing for contigs to keep. Contigs with a percentage below this value will be removed. Defaults to 95. |
listMeanCovByContig |
A list containing, for each strand, the mean coverage for each contig. The list must contain two data frames, one per strand, named f_strand and r_strand. Each data frame must have a "refName" column with the contig names and a "mean_coverage" column with the mean coverage. |
nContigMinCoverage |
Minimum mean coverage for contigs to keep. Contigs with a mean coverage below this value will be removed. Defaults to 20. |
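The per-strand list layout expected by listPctSeqByContig can be illustrated with a small base R sketch. The contig names and percentages below are invented, and the rule used to flag a contig (below the threshold on either strand) is an assumption made for illustration; it is not confirmed to be how FiltContig combines the two strands internally.

```r
# Hypothetical per-strand tables in the layout described for listPctSeqByContig.
# Contig names and values are invented for illustration only.
listPctSeqByContig <- list(
  f_strand = data.frame(
    refName = c("contig_A", "contig_B", "contig_C"),
    seqPct = c(99.2, 91.5, 97.8),
    stringsAsFactors = FALSE
  ),
  r_strand = data.frame(
    refName = c("contig_A", "contig_B", "contig_C"),
    seqPct = c(98.7, 96.0, 94.1),
    stringsAsFactors = FALSE
  )
)

nContigMinPctOfSeq <- 95

# Assumption: a contig is flagged if it falls below the threshold on either strand.
below_by_strand <- lapply(listPctSeqByContig, function(df) {
  df$refName[df$seqPct < nContigMinPctOfSeq]
})
contigs_below <- sort(unique(unlist(below_by_strand, use.names = FALSE)))
print(contigs_below)
```

A listMeanCovByContig object follows the same layout, with a "mean_coverage" column in place of "seqPct". In practice both lists are produced by GetSeqPctByContig and GetMeanParamByContig rather than built by hand.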
A list with filtered gposModBasePos and filtered grangesModPos.
# loading genome
myGenome <- Biostrings::readDNAStringSet(system.file(
  package = "DNAModAnnot", "extdata",
  "ptetraurelia_mac_51_sca171819.fa"
))
myGrangesGenome <- GetGenomeGRanges(myGenome)
# Preparing the gposPacBioCSV and grangesPacBioGFF datasets
myGrangesPacBioGFF <-
  ImportPacBioGFF(
    cPacBioGFFPath = system.file(
      package = "DNAModAnnot", "extdata",
      "ptetraurelia.modifications.sca171819.gff"
    ),
    cNameModToExtract = "m6A",
    cModNameInOutput = "6mA",
    cContigToBeAnalyzed = names(myGenome)
  )
myGposPacBioCSV <-
  ImportPacBioCSV(
    cPacBioCSVPath = system.file(
      package = "DNAModAnnot", "extdata",
      "ptetraurelia.bases.sca171819.csv"
    ),
    cSelectColumnsToExtract = c(
      "refName", "tpl", "strand", "base",
      "score", "ipdRatio", "coverage"
    ),
    lKeepExtraColumnsInGPos = TRUE, lSortGPos = TRUE,
    cContigToBeAnalyzed = names(myGenome)
  )
# Preparing ParamByStrand Lists
myPct_seq_csv <- GetSeqPctByContig(myGposPacBioCSV, grangesGenome = myGrangesGenome)
myMean_cov_list <- GetMeanParamByContig(
  grangesData = myGposPacBioCSV,
  dnastringsetGenome = myGenome,
  cParamName = "coverage"
)
# Filtering
myFiltered_data <- FiltContig(myGposPacBioCSV, myGrangesPacBioGFF,
  cContigToBeRemoved = NULL,
  dnastringsetGenome = myGenome, nContigMinSize = 1000,
  listPctSeqByContig = myPct_seq_csv, nContigMinPctOfSeq = 95,
  listMeanCovByContig = myMean_cov_list, nContigMinCoverage = 20
)
myFiltered_data$csv
myFiltered_data$gff