Mail from Cédric.
Les Long Reads :
/work/mzahm/Project_Pangafish/RawData/*.fastq.gz
Les reads Hi-C :
/work/mzahm/Project_Pangafish/RawData/HiC/*.fastq.gz
Les reads 10X :
/work/mzahm/Project_Pangafish/RawData/Run_PANGA1G.13800/Analyse_DemultiplexData.53847/*.fastq.gz
Le génome assemblé LR :
/work/project/sigenae/Christophe/Panga_longranger/map_to_scaffolds/genome.fa
Le bam Longranger :
/work/project/sigenae/Christophe/Panga_longranger/map_to_scaffolds/ALIGN/outs/possorted_bam.bam
Le fichier de comptage des tags 10X par bin de 10k :
/work/project/sigenae/Christophe/Panga_longranger/map_to_scaffolds/possorted-tagcmp.tsv.gz
Le fichier “contact map” simulé :
/work/project/sigenae/Christophe/Panga_longranger/map_to_scaffolds/genome.mnd.txt
Le fichier .hic généré à partir des données Hi-C :
/work/project/sigenae/Christophe/Panga_longranger/map_to_scaffolds/genome.final.10X.hic
Le fichier .hic généré à partir des données 10X :
/work/project/sigenae/Christophe/Panga_longranger/map_to_scaffolds/genome.final.Hi-C.hic
Le fichier .assembly du génome assemblé avec LR+Hi-C pour le chargement dans Juicebox :
/work/project/sigenae/Christophe/Panga_longranger/map_to_scaffolds/genome.final.assembly
La procédure utilisée pour générer un fichier .hic à partir de données 10X :
http://genoweb.toulouse.inra.fr/~sigenae/10X_and_Juicebox.html
Mother 1
- Unpolished contigs: /work2/project/seqoccin/assemblies/assemblers/nanopore/bos_taurus/wtdbg2_trio1_mother_37165_run123/mother_raw.cgt.fa
- ONT: /work2/project/seqoccin/data/reads/nanopore/bos_taurus/trio1.mother.run?.fastq.gz
- Hi-C: /work2/project/seqoccin/assemblies/scaffolders/hic/bos_taurus/juicer_trio1_mother_37165_wtdbg2_ont_run123_hic_run1/Maison_plus/aligned/inter_30.hic
- 10X: /work2/project/seqoccin/assemblies/scaffolders/10x/bos_taurus/LR_wtdbg_assembly_mother_37165_run123/Bovin-37165-align/outs/possorted_bam.bam
- Assembly from JuiceBox: /work2/project/seqoccin/assemblies/scaffolders/hic/bos_taurus/juicer_trio1_mother_37165_wtdbg2_ont_run123_polished_hic_run1/Maison_plus/juicebox/mother_raw_wtdbg2_58x_polished.final.assembly
Offspring 2
- Polished contigs: /work/project/seqoccin/assemblies/polishing_1/bos_taurus/nfPolishing_trio2_offspring_37160_wtdbgassemblyrun12345_ontrun12345_10Xrun1234/PolishingResult/PILON_SR/assembly.pilon1.fa
- ONT: /work/project/seqoccin/data/reads/nanopore/bos_taurus/trio2.offspring.*.fastq.gz
- Hi-C: /work/project/seqoccin/data/reads/hic/bos_taurus/trio2.offspring*
- 10X: /work/project/seqoccin/data/reads/10x/bos_taurus/Bovin-37160/*
~/Desktop/Apps/minimap2/minimap2 -t 10 -o /Scratch/mazytnicki/Mother_1/mapping.paf /Scratch/mazytnicki/GCF_002263795.1_ARS-UCD1.2_genomic.fna.gz /Scratch/mazytnicki/Mother_1/mother_raw.cgt.fa ~/Apps/samtools-1.10/samtools faidx /Scratch/mazytnicki/Mother_1/mother_raw.cgt.fa cut -f 1-2 /Scratch/mazytnicki/Mother_1/mother_raw.cgt.fa.fai > /Scratch/mazytnicki/Mother_1/mother_raw.cgt.len ~/Apps/samtools-1.10/samtools view -c /Scratch/mazytnicki/Mother_1/possorted_bam.bam # 1590040440 ../CheckScaffolding/checkBreaks /Scratch/mazytnicki/Mother_1/mapping.paf /Scratch/mazytnicki/Mother_1/mother_raw.cgt.len 1000000 100000 outBreaks.tsv ../CheckScaffolding/checkBreaks /Scratch/mazytnicki/Mother_1/mapping.paf /Scratch/mazytnicki/Mother_1/mother_raw.cgt.len breaks_hic.tsv 100000 10000 predicted.out putative.out > difference_breaks_hic.txt ../CheckScaffolding/checkBreaks /Scratch/mazytnicki/Mother_1/mapping.paf /Scratch/mazytnicki/Mother_1/mother_raw.cgt.len breaks_ont.tsv 100000 10000 predicted.out putative.out > difference_breaks_ont.txt ../CheckScaffolding/checkChromosome /Scratch/mazytnicki/Mother_1/mapping.paf > scaffChrs.txt ../CheckScaffolding/checkStitch scaffChrs.txt stitch.txt ./checkJoins /Scratch/mazytnicki/Mother_1/mapping.paf joins.txt 10000 10000 zcat /Scratch/mazytnicki/Mother_1/trio1.mother.run1.fastq.gz | wc -l # 12064000 ~/Desktop/Apps/minimap2/minimap2 -t 10 -o /Scratch/mazytnicki/Mother_1/trio1.mother.run1.paf /Scratch/mazytnicki/Mother_1/mother_raw.cgt.fa /Scratch/mazytnicki/Mother_1/trio1.mother.run1.fastq.gz wc -l /Scratch/mazytnicki/Mother_1/trio1.mother.run1.paf # 4862048 /Scratch/mazytnicki/Mother_1/trio1.mother.run1.paf ~/Apps/samtools-1.10/samtools index -@ 5 /Scratch/mazytnicki/Mother_1/possorted_bam.bam
scp mzytnicki@genologin.toulouse.inrae.fr:/work/project/seqoccin/assemblies/polishing_1/bos_taurus/nfPolishing_trio2_offspring_37160_wtdbgassemblyrun12345_ontrun12345_10Xrun1234/PolishingResult/PILON_SR/assembly.pilon1.fa /Scratch/mazytnicki/Offspring_2/ # From genologin sbatch /home/mzytnicki/work/Projects/tenxChecker/map_offspring2.sh sbatch /home/mzytnicki/work/Projects/tenxChecker/lrmkref.sh sbatch /home/mzytnicki/work/Projects/tenxChecker/lralign.sh sbatch /home/mzytnicki/work/Projects/tenxChecker/lralignSmall.sh ~/Apps/samtools-1.10/samtools faidx /Scratch/mazytnicki/Offspring_2/Juicer/references/assembly.pilon1.fa cut -f 1-2 /Scratch/mazytnicki/Offspring_2/Juicer/references/assembly.pilon1.fa.fai > /Scratch/mazytnicki/Offspring_2/Juicer/references/assembly.pilon1.size ~/Apps/bwa-0.7.17/bwa index /Scratch/mazytnicki/Offspring_2/Juicer/references/assembly.pilon1.fa python3 /home/mazytnicki/Apps/juicer/misc/generate_site_positions.py HindIII /Scratch/mazytnicki/Offspring_2/Juicer/references/assembly.pilon1.fa /Scratch/mazytnicki/Offspring_2/Juicer/references/assembly.pilon1_HindIII.txt # Add bwa, samtools, & gawk to the PATH PATH=~/bin:$PATH /Scratch/mazytnicki/Offspring_2/Juicer/scripts/juicer.sh -d /Scratch/mazytnicki/Offspring_2/Juicer/ -z /Scratch/mazytnicki/Offspring_2/Juicer/references/assembly.pilon1.fa -p /Scratch/mazytnicki/Offspring_2/Juicer/references/assembly.pilon1.size -y /Scratch/mazytnicki/Offspring_2/Juicer/references/assembly.pilon1_HindIII.txt -D /Scratch/mazytnicki/Offspring_2/Juicer/ -t 6 -g bt37160 &> /Scratch/mazytnicki/Offspring_2/Juicer/juicer.out ln -s /Scratch/mazytnicki/Offspring_2/Juicer/aligned/inter.hic /Scratch/mazytnicki/Offspring_2/inter.hic scp mzytnicki@genologin.toulouse.inrae.fr:/home/mzytnicki/work/Projects/tenxChecker/alignment.bam /Scratch/mazytnicki/Offspring_2/ # LongRanger data does not seem sorted... samtools sort -m 2G -o possorted_bam2.bam -@ 6 possorted_bam.bam
devtools::load_all(".") resolution <- 1000 bamFileName <- "/Scratch/mazytnicki/Mother_1/possorted_bam2.bam" bamData <- parseBamFile(bamFileName, resolution) hicFileName <- "/Scratch/mazytnicki/Mother_1/inter_30.hic" hicData <- parseHicFile(hicFileName, resolution) ontFileName <- "/Scratch/mazytnicki/Mother_1/trio1.mother.run1.paf" ontData <- parsePafFile(ontFileName, resolution, 500, 10, 10) contigFileName <- "/Scratch/mazytnicki/Mother_1/mother_raw.cgt.fa" scaffolder <- msscaf(contigFileName, resolution) scaffolder <- addExp(scaffolder, ontData, "ONT") scaffolder <- addExp(scaffolder, hicData, "HiC") scaffolder <- addExp(scaffolder, bamData, "10X") #saveRDS(scaffolder, file = "/Scratch/mazytnicki/Mother_1/allObject0.rds") #scaffolder <- readRDS(file = "/Scratch/mazytnicki/Mother_1/allObject0.rds") scaffolder <- estimateDistributions(scaffolder) scaffolder <- cleanData(scaffolder) #saveRDS(scaffolder, file = "/Scratch/mazytnicki/Mother_1/allObject.rds") #scaffolder <- readRDS(file = "/Scratch/mazytnicki/Mother_1/allObject.rds") scaffolder <- findBreaks(scaffolder, 0.01) scaffolder <- splitChromosomes(scaffolder) #saveRDS(scaffolder, file = "/Scratch/mazytnicki/Mother_1/allObject1.rds") #scaffolder <- readRDS(file = "/Scratch/mazytnicki/Mother_1/allObject1.rds") scaffolder <- findJoins(scaffolder, 0.05) #saveRDS(scaffolder, file = "/Scratch/mazytnicki/Mother_1/allObject2.rds") #scaffolder <- readRDS(file = "/Scratch/mazytnicki/Mother_1/allObject2.rds") scaffolder <- scaffold(scaffolder) Biostrings::writeXStringSet(Biostrings::DNAStringSet(scaffolder@sequences), "sequences.fa") scaffolder@breaks %>% dplyr::select(ref, bin) %>% dplyr::mutate(bin = bin * resolution) %>% readr::write_tsv("breaks.tsv", col_names = FALSE) scaffolder@joins %>% dplyr::arrange(pvalue) %>% dplyr::select(ref1, ref2) %>% readr::write_tsv("joins.tsv", col_names = FALSE)
../CheckScaffolding/verifyJoins /Scratch/mazytnicki/Mother_1/mapping.paf 100000 10000 joins.tsv ../CheckScaffolding/checkJoins /Scratch/mazytnicki/Mother_1/mapping.paf outJoins.tsv 1000 1000 joins.tsv ../CheckScaffolding/checkBreaks /Scratch/mazytnicki/Mother_1/mapping.paf /Scratch/mazytnicki/Mother_1/mother_raw.cgt.len 100000 10000 predicted.out breaks.tsv putative.out > difference_breaks.txt # From tapou ~/Desktop/Apps/quast-5.0.2/quast-lg.py -t 15 -r /Scratch/mazytnicki/GCF_002263795.1_ARS-UCD1.2_genomic.fna.gz -t 6 -o /home/mazytnicki/Desktop/Projects/Other/Assembly/Quast/Test1 /Scratch/mazytnicki/Mother_1/mother_raw.cgt.fa wtdbg2_trio1_mother_37165_arcs_links.fa sequences.fa sequences_ONT_HiC.fa # From genologin
scp /Scratch/mazytnicki/GCF_002263795.1_ARS-UCD1.2_genomic.fna.gz /Scratch/mazytnicki/Mother_1/mother_raw.cgt.fa wtdbg2_trio1_mother_37165_arcs_links.fa sequences.fa sequences_ONT_HiC.fa mzytnicki@genologin.toulouse.inrae.fr:/home/mzytnicki/work/Projects/tenxChecker scp sequences.fa mzytnicki@genologin.toulouse.inrae.fr:/home/mzytnicki/work/Projects/tenxChecker # From genologin module load system/Miniconda3 module load bioinfo/Inspector-v1.2 cd /home/mzytnicki/work/Projects/tenxChecker/ sbatch inspector.sh
devtools::load_all(".") #object <- makeSimObject(1, 1, 100, 20, 5, 0.2, c(1), c(50), c(), c(), c()) #object <- makeSimObject(2, 5, 100, 20, 10, 0.2, c(1, 2), c(25, 75), c(3, 4), c(4, 5), c(1, 5)) object <- makeSimObject(2, 5, 100, 100, 20, 0.5, c(1, 1, 2, 2), c(25, 75, 75, 75), c(3, 4), c(4, 5), c(1, 10)) #object <- makeSimObject(1, 2, 100, 20, 10, 0.2, c(), c(), c(1), c(2), c(5)) #plot.10X(object) object <- estimateDistributions(object) object <- cleanData(object) object <- findBreaks(object, 0.05) object <- splitChromosomes(object) object <- findJoins(object, 0.05) object <- scaffold(object)
devtools::load_all(".") scaffolder <- readRDS(file = "/Scratch/mazytnicki/Mother_1/allObject.rds") scaffolder <- readRDS(file = "/Scratch/mazytnicki/Mother_1/tmp.rds") plot.msscafJoin(scaffolder, ref1 = "ctg672", ref2 = "ctg200", after1 = T, after2 = T, outliers = T, metaSize = T) plot.msscafJoin(scaffolder, ref1 = "ctg853", ref2 = "ctg304", after1 = F, after2 = T, outliers = F) plot.msscafJoin(scaffolder, ref1 = "ctg345", ref2 = "ctg326", after1 = F, after2 = F, outliers = F) plot.msscafJoin(scaffolder, ref1 = "ctg369", ref2 = "ctg347", after1 = F, after2 = F, outliers = F, metaSize=T) plot.msscafJoin(scaffolder, ref1 = "ctg502", ref2 = "ctg310", after1 = F, after2 = F, outliers = F, metaSize=T) # Nothing here plot.msscafJoin(scaffolder, ref1 = "ctg975", ref2 = "ctg941", after1 = T, after2 = F, outliers = F, metaSize=T) # 100% outlier plot.msscafJoin(scaffolder, ref1 = "ctg672", ref2 = "ctg200", after1 = T, after2 = T, outliers = F, metaSize=T) plot.msscafJoin(scaffolder, ref1 = "ctg831", ref2 = "ctg534", after1 = T, after2 = T, outliers = F, metaSize=T) plot.msscafJoin(scaffolder, ref1 = "ctg831", ref2 = "ctg534", after1 = F, after2 = T, outliers = T, metaSize=T) # Borderline plot.msscafJoin(scaffolder, ref1 = "ctg295", ref2 = "ctg159", after1 = F, after2 = T, outliers = T, metaSize=T) plot.msscafJoin(scaffolder, ref1 = "ctg36", ref2 = "ctg18", after1 = T, after2 = F, outliers = T, metaSize=T) plot.msscafJoin(scaffolder, ref1 = "ctg75", ref2 = "ctg33", after1 = T, after2 = T, outliers = T, metaSize=T) sizes <- scaffolder@sizes scaffolder <- checkJoins(scaffolder, 0.05) scaffolder <- removeAmbiguousJoins(scaffolder) scaffolder <- checkCorners(scaffolder, 0.05) scaffolder <- mergeJoins(scaffolder, 0.05)
Oddly, the possorted is not always pos-sorted...
devtools::load_all(".") resolution <- 1000 #dirName <- "/media/Stock/tenxchecker/Offspring_2" dirName <- "/Scratch/mazytnicki/Offspring_2" bamFileName <- file.path(dirName, "possorted_bam2.bam") # 438,352,223 elements bamData <- parseBamFile(bamFileName, resolution) hicFileName <- file.path(dirName, "inter.hic") hicData <- parseHicFile(hicFileName, resolution) ontFileName <- file.path(dirName, "alignment.paf") # 19,005,147 elements ontData <- parsePafFile(ontFileName, resolution, 500, 10, 10) contigFileName <- file.path(dirName, "assembly.pilon1.fa") scaffolder <- msscaf(contigFileName, resolution) scaffolder <- addExp(scaffolder, ontData, "ONT") scaffolder <- addExp(scaffolder, hicData, "HiC") scaffolder <- addExp(scaffolder, bamData, "10X") #saveRDS(scaffolder, file = "/Scratch/mazytnicki/Offspring_2/allObject0.rds") #scaffolder <- readRDS(file = "/Scratch/mazytnicki/Offspring_2/allObject0.rds") scaffolder <- estimateDistributions(scaffolder) scaffolder <- cleanData(scaffolder) #saveRDS(scaffolder, file = "/Scratch/mazytnicki/Offspring_2/allObject.rds") #scaffolder <- readRDS(file = "/Scratch/mazytnicki/Offspring_2/allObject.rds") scaffolder <- findBreaks(scaffolder, 0.01) scaffolder <- splitChromosomes(scaffolder) #saveRDS(scaffolder, file = "/Scratch/mazytnicki/Offspring_2/allObject1.rds") #scaffolder <- readRDS(file = "/Scratch/mazytnicki/Offspring_2/allObject1.rds") #Biostrings::writeXStringSet(Biostrings::DNAStringSet(scaffolder@sequences), "sequencesSplit.fa") scaffolder <- findJoins(scaffolder, 0.05) #saveRDS(scaffolder, file = "/Scratch/mazytnicki/Offspring_2/allObject2.rds") #scaffolder <- readRDS(file = "/Scratch/mazytnicki/Offspring_2/allObject2.rds") scaffolder <- scaffold(scaffolder) Biostrings::writeXStringSet(Biostrings::DNAStringSet(scaffolder@sequences), "sequences.fa") scaffolder@breaks %>% dplyr::select(ref, bin) %>% dplyr::mutate(bin = bin * resolution) %>% readr::write_tsv("breaks.tsv", col_names = FALSE) scaffolder@joins %>% dplyr::arrange(pvalue) %>% dplyr::select(ref1, ref2) %>% readr::write_tsv("joins.tsv", col_names = FALSE)
Tigmint
# from genologin cd /home/mzytnicki/work/Projects/tenxChecker/Offspring sbatch lrb.sh fold refdata-assembly.pilon1/fasta/genome.fa > genome.fa sbatch tigmint.sh # from tapou mazytnicki@tapou:~/Projects/Other/Assembly/tenxchecker$ scp mzytnicki@genologin.toulouse.inrae.fr:/home/mzytnicki/work/Projects/tenxChecker/Offspring/genome.tigmint.arcs.fa /Scratch/mazytnicki/Offspring_2/
Quast
~/Desktop/Apps/quast-5.0.2/quast-lg.py -t 15 -r /Scratch/mazytnicki/Offspring_2/assembly.pilon1.fa -t 6 -o /home/mazytnicki/Desktop/Projects/Other/Assembly/Quast/Offspring sequences.fa /Scratch/mazytnicki/Offspring_2/genome.tigmint.arcs.fa
Inspector ```{bash} scp sequencesSplit.fa mzytnicki@genologin.toulouse.inrae.fr:/home/mzytnicki/work/Projects/tenxChecker/Offspring/ scp sequences.fa mzytnicki@genologin.toulouse.inrae.fr:/home/mzytnicki/work/Projects/tenxChecker/Offspring/sequencesMs.fa
cd /home/mzytnicki/work/Projects/tenxChecker/ sbatch inspector_raw.sh sbatch inspector_tigmint.sh sbatch inspector_tigmint0.sh sbatch inspector_mss.sh
scp mzytnicki@genologin.toulouse.inrae.fr:/home/mzytnicki/work/Projects/tenxChecker/Offspring/genome.tigmint.arcs.fa /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/references mv /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/references/genome.tigmint.arcs.fa /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/references/tmp sed '/^>/s/ .*//' /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/references/tmp | fold > /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/references/genome.tigmint.arcs.fa ~/Apps/samtools-1.10/samtools faidx /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/references/genome.tigmint.arcs.fa cut -f 1-2 /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/references/genome.tigmint.arcs.fa.fai > /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/references/genome.tigmint.arcs.size ~/Apps/bwa-0.7.17/bwa index /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/references/genome.tigmint.arcs.fa cd /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/restriction_sites python3 /home/mazytnicki/Apps/juicer/misc/generate_site_positions.py HindIII genome.tigmint.arcs.fa /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/references/genome.tigmint.arcs.fa
PATH=~/bin:$PATH mkdir splits ln -s fastq/* splits/ /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/scripts/juicer.sh -d /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/ -z /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/references/genome.tigmint.arcs.fa -p /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/references/genome.tigmint.arcs.size -y /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/restriction_sites/genome.tigmint.arcs.fa_HindIII.txt -D /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/ -t 6 -g bt37160 &> /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/juicer.out samtools view -O SAM -F 1024 /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/aligned/merged_dedup.bam | awk -v mnd=1 -f /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/scripts/common/sam_to_pre.awk > /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/aligned/merged_nodups.txt xz -T 5 /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/aligned/merged_nodups.txt scp /Scratch/mazytnicki/Offspring_2/Juicer/Tigmint/aligned/merged_nodups.txt.xz mzytnicki@genologin.toulouse.inrae.fr:/home/mzytnicki/work/Projects/tenxChecker/Offspring/
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.