FiltContig | R Documentation |
Filter out data from contigs that do not reach criterias of selection.
FiltContig( gposModBasePos, grangesModPos, cContigToBeRemoved = NULL, dnastringsetGenome, nContigMinSize = -1, listPctSeqByContig, nContigMinPctOfSeq = 95, listMeanCovByContig, nContigMinCoverage = 20 )
gposModBasePos |
An UnStitched GPos object containing PacBio CSV data to be filtered. |
grangesModPos |
A GRanges object containing PacBio GFF data to be filtered. |
cContigToBeRemoved |
Names of contigs for which the data will be removed. Defaults to NULL. |
dnastringsetGenome |
A DNAStringSet object containing the sequence for each contig. |
nContigMinSize |
Minimum size for contigs to keep. Contigs with a size below this value will be removed. Defaults to -1 (= no filter). |
listPctSeqByContig |
List containing, for each strand, the percentage of sequencing for each contig. This list must be composed of 2 dataframes (one by strand) called f_strand and r_strand. In each dataframe, "refName" column returning names of contigs and "seqPct" column returning percentage of sequencing. |
nContigMinPctOfSeq |
Minimum percentage of sequencing for contigs to keep. Contigs with a percentage below this value will be removed. Defaults to 95. |
listMeanCovByContig |
List containing, for each strand, the mean of coverage for each contig. This list must be composed of 2 dataframes (one by strand) called f_strand and r_strand. In each dataframe, "refName" column returning names of contigs and "mean_coverage" column returning mean of coverage. |
nContigMinCoverage |
Minimum mean coverage for contigs to keep. Contigs with a mean coverage below this value will be removed. Defaults to 20. |
A list with filtered gposModBasePos and filtered grangesModPos.
# loading genome myGenome <- Biostrings::readDNAStringSet(system.file( package = "DNAModAnnot", "extdata", "ptetraurelia_mac_51_sca171819.fa" )) myGrangesGenome <- GetGenomeGRanges(myGenome) # Preparing a gposPacBioCSV and a grangesPacBioGFF datasets myGrangesPacBioGFF <- ImportPacBioGFF( cPacBioGFFPath = system.file( package = "DNAModAnnot", "extdata", "ptetraurelia.modifications.sca171819.gff" ), cNameModToExtract = "m6A", cModNameInOutput = "6mA", cContigToBeAnalyzed = names(myGenome) ) myGposPacBioCSV <- ImportPacBioCSV( cPacBioCSVPath = system.file( package = "DNAModAnnot", "extdata", "ptetraurelia.bases.sca171819.csv" ), cSelectColumnsToExtract = c( "refName", "tpl", "strand", "base", "score", "ipdRatio", "coverage" ), lKeepExtraColumnsInGPos = TRUE, lSortGPos = TRUE, cContigToBeAnalyzed = names(myGenome) ) # Preparing ParamByStrand Lists myPct_seq_csv <- GetSeqPctByContig(myGposPacBioCSV, grangesGenome = myGrangesGenome) myMean_cov_list <- GetMeanParamByContig( grangesData = myGposPacBioCSV, dnastringsetGenome = myGenome, cParamName = "coverage" ) # Filtering myFiltered_data <- FiltContig(myGposPacBioCSV, myGrangesPacBioGFF, cContigToBeRemoved = NULL, dnastringsetGenome = myGenome, nContigMinSize = 1000, listPctSeqByContig = myPct_seq_csv, nContigMinPctOfSeq = 95, listMeanCovByContig = myMean_cov_list, nContigMinCoverage = 20 ) myFiltered_data$csv myFiltered_data$gff
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.