Home

/

GitHub

/

Benjamin-Vincent-Lab/StarSalmon

/

README.md

README.md
In Benjamin-Vincent-Lab/StarSalmon: Functions for writing commands and post processing the Star Salmon pipeline.

Post processes output from the star salmon pipeline.

Depricated. Renamed and moved to PostRNASeqAlign

post_process_salmon(
  this_script_path = housekeeping::get_script_dir_path(include_file_name = T),
  input_file_paths = <input_file_paths>,
  output_dir = <output_dir>,
  ref = "grch38",
  gene_biotypes = c('protein_coding', 
                    'IG_C_gene','IG_D_gene', 'IG_J_gene', 'IG_V_gene',
                    'TR_C_gene', 'TR_D_gene', 'TR_J_gene','TR_V_gene'),
  thread_num = 8,
  output_transcript_matrix = F,
  output_hgnc_matrix = T,
  output_entrez_id_matrix = F,
  output_piped_hugo_entrez_id_matrix = F,
  output_upper_quartile_norm = F,
  output_log2_upper_quartile_norm = F,
  counts_or_tpm = "counts"
)

In R:

housekeeping::assemble_package(package_name = "StarSalmon", my_version = "0.2-02",
  my_dir = "/datastore/alldata/shiny-server/rstudio-common/dbortone/packages/StarSalmon")

In bash:

cd /datastore/alldata/shiny-server/rstudio-common/dbortone/packages/StarSalmon
my_comment="Forgot to rebuild 0.2-01"
git add .
git commit -am "$my_comment"; git push origin master
git tag -a 0.2-02 -m "$my_comment"; git push -u origin --tags

Restart R In R (local library, packrat library):

devtools::install_github("Benjamin-Vincent-Lab/StarSalmon")

Or for a specific version:

devtools::install_github("Benjamin-Vincent-Lab/StarSalmon", ref = "0.2-02")

https://sc.unc.edu/dbortone/starsalmon https://sc.unc.edu/benjamin-vincent-lab/starsalmon Moved to github so that the package could be accessed without a token.

The following code was used to make the grch38 bm_results

mart = useMart(biomart="ENSEMBL_MART_ENSEMBL",
                  dataset="hsapiens_gene_ensembl", 
                  host="useast.ensembl.org") # uk, useast, uswest, asia was intermitant, www was hardley ever working
#https://useast.ensembl.org/info/website/archives/index.html
# other hosts failed to find the data !@$!#$
# failed submisions
# failures mid submission

# datasets <- biomaRt::listDatasets(ensembl)
unique_names = my_dt[[1]]
# my_filters = listFilters(mart)
# my_attr = listAttributes(mart)
# ucsc  |  UCSC Stable ID(s) [e.g. ENST00000000233.9] # these aren't ucsc!?!
# had to run this a ton to get it to go all the way through.  was in the process of breaking it up into a loop when it ran through so I saved it.
failed = FALSE
BM_results = tryCatch({
  biomaRt::getBM(
    filters= "ucsc",
    attributes= c("ucsc", "hgnc_symbol", "entrezgene_id", "gene_biotype"),
    values= unique_names,
    mart= mart
  )
}, warning = function(w) {
  message("Got warning")
  failed = TRUE
}, error = function(e) {
  message("Got error")
  failed = TRUE
})

I renamed the 'entrezgene_id' to 'entrezgene.' Connecting with the dataabse above was very problematic. It failed to connect 1 out of 5 times and when it did connnect it didn't finish. I was giving up on it and was going to write a loop to keep sending smaller batches using the try catch statement when finally the whole thing went through. For future uses a loop is the way to go. Also don't expect the column names to stay stable. They change these on almost a monthly basis. I'd love to switch to something other than biomaRt, but unfortunately AFAIK there isn't anything else.

Using test_code.R, I checked if hgnc or entrez was better for not having one ensemble map to them. Almost all of the duplicates for the ensembl id were from one ensembl id mapping to multiple entrez. Very few of the hgnc caused multimappings. I also checked if using entrez to lookup hgnc and visa versa caused more multimappings. It wasn't a huge contributor and looking up the genes did find a lot fo new genes: ~70 for hgnc and ~300 for entrez. I skiped using tximport. I don't see the value of this package, since I'd have to make the tx2gene matrix anyway. That's the hard part.

See inst/mapping_genes.R for the addition of another 1882 genes using AnnotationDbi on 20200805.

This package really should be called something like PostSalmon or PostRNAQuant(once adapted to other quantifiers).
Could just merge this package into binfotron.

Benjamin-Vincent-Lab/StarSalmon documentation built on Feb. 15, 2024, 11:34 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Benjamin-Vincent-Lab/StarSalmon
Functions for writing commands and post processing the Star Salmon pipeline.

README.md
In Benjamin-Vincent-Lab/StarSalmon: Functions for writing commands and post processing the Star Salmon pipeline.

Example

Assembling this package

Push changes

Install

Previous locations

Creating BM_results

Comments

ToDos

R Package Documentation

Browse R Packages

We want your feedback!

Benjamin-Vincent-Lab/StarSalmon Functions for writing commands and post processing the Star Salmon pipeline.

README.md In Benjamin-Vincent-Lab/StarSalmon: Functions for writing commands and post processing the Star Salmon pipeline.

Example

Assembling this package

Push changes

Install

Previous locations

Creating BM_results

Comments

ToDos

R Package Documentation

Browse R Packages

We want your feedback!

Benjamin-Vincent-Lab/StarSalmon
Functions for writing commands and post processing the Star Salmon pipeline.

README.md
In Benjamin-Vincent-Lab/StarSalmon: Functions for writing commands and post processing the Star Salmon pipeline.