prepUMI4C: Prepare UMI4C data

View source: R/contactsUMI4C.R

prepUMI4C    R Documentation

Prepare UMI4C data

Description

Prepares the FastQ files for downstream analysis by selecting the reads that contain the bait and adding each read's UMI identifier to its header.
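As a rough illustration of the filtering idea described above, the base R sketch below keeps the reads that start with the bait and appends a UMI to each retained header. It is not the package's implementation. The helper `tag_bait_reads()`, the toy reads, the `:` header separator, and the choice of taking the UMI from the first 10 bases of the mate read are all assumptions made for this example. The bait prefix is assumed to be the bait primer followed by the pad and the restriction enzyme sequence, in line with the argument descriptions.

```r
# Hypothetical helper (not part of UMI4Cats): keep reads that start with the
# bait prefix and append a UMI to the header of each retained read.
tag_bait_reads <- function(headers, r1, r2, bait_prefix, umi_len = 10) {
  keep <- startsWith(r1, bait_prefix)   # TRUE for reads beginning with the bait
  umi <- substr(r2[keep], 1, umi_len)   # assumed UMI: first bases of the mate read
  data.frame(
    header = paste0(headers[keep], ":", umi),
    seq = r1[keep],
    stringsAsFactors = FALSE
  )
}

# Assumed prefix layout: bait primer, then pad, then restriction enzyme site.
bait_prefix <- paste0("GGACAAGCTCCCTGCAACTCA", "GGACTTGCA", "GATC")

# Toy paired reads: read2 lacks the bait and should be dropped.
headers <- c("@read1", "@read2", "@read3")
r1 <- c(
  paste0(bait_prefix, "TTAGGCA"),
  "ACGTACGTACGTACGT",
  paste0(bait_prefix, "CCATGGA")
)
r2 <- c("AAAAACCCCCGGGTT", "TTTTTGGGGGAAACC", "ACGTACGTACTTTTT")

tagged <- tag_bait_reads(headers, r1, r2, bait_prefix)
print(tagged$header)
```

In this toy input, only the first and third reads carry the bait prefix, so only those two are kept and tagged.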

Usage

prepUMI4C(
  fastq_dir,
  wk_dir,
  file_pattern = NULL,
  bait_seq,
  bait_pad,
  res_enz,
  numb_reads = 1e+09
)

Arguments

fastq_dir

Path of the directory containing the FastQ files (compressed or uncompressed).

wk_dir

Working directory in which to save the outputs generated by the UMI-4C analysis.

file_pattern

Character string that can be used to filter the files in fastq_dir that you want to analyze. Default: NULL.

bait_seq

Character containing the bait primer sequence.

bait_pad

Character containing the pad sequence (sequence between the bait primer and the restriction enzyme sequence).

res_enz

Character containing the restriction enzyme sequence.

numb_reads

Number of lines from the FastQ file to load in each loop. If you run into memory problems, reduce this number. Default: 1e9.
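To make the roles of file_pattern and numb_reads more concrete, here is a minimal base R sketch of pattern-based file selection followed by chunked reading. It only illustrates the general idea and assumes that file_pattern behaves like the `pattern` argument of `list.files()`; the file names and the tiny chunk size are invented for the example, and the package may select and stream files differently.

```r
# Toy FastQ file with three records (4 lines per record) in a temporary directory.
fastq_dir <- file.path(tempdir(), "umi4c_demo_fastq")
dir.create(fastq_dir, showWarnings = FALSE)
fq <- file.path(fastq_dir, "sampleA_R1.fastq")
writeLines(c(
  "@read1", "ACGT", "+", "IIII",
  "@read2", "GGCC", "+", "IIII",
  "@read3", "TTAA", "+", "IIII"
), fq)
# Second (empty) file that the pattern below should exclude.
invisible(file.create(file.path(fastq_dir, "sampleB_R1.fastq")))

# Assumed behaviour: file_pattern narrows the files picked up from fastq_dir.
file_pattern <- "sampleA"
files <- list.files(fastq_dir, pattern = file_pattern, full.names = TRUE)

# Read the file in chunks of numb_reads lines, so only one chunk is in memory
# at a time. In this sketch, a multiple of 4 keeps FastQ records intact.
numb_reads <- 8
con <- file(files[1], open = "r")
n_records <- 0
n_loops <- 0
repeat {
  chunk <- readLines(con, n = numb_reads)
  if (length(chunk) == 0) break   # end of file reached
  n_records <- n_records + length(chunk) / 4
  n_loops <- n_loops + 1
}
close(con)

print(c(records = n_records, loops = n_loops))
```

A smaller numb_reads means more loops but less data held in memory per loop, which is the trade-off the argument description refers to.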

Value

Creates a compressed FastQ file in wk_dir/prep named basename(fastq).fq.gz, containing the filtered reads with the UMI sequence in the header. A log file with the statistics, named umi4c_stats.txt, is also generated in wk_dir/logs.
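After a run, the outputs can be located from the paths given above. The snippet below is a small sketch that only builds those paths and checks whether anything exists there; the `wk_dir` value is a placeholder, and the `.fq.gz` pattern simply mirrors the file naming described in this section.

```r
# Placeholder working directory; replace with the wk_dir passed to prepUMI4C().
wk_dir <- file.path(tempdir(), "CIITA")

prep_dir <- file.path(wk_dir, "prep")                        # filtered FastQ files
stats_file <- file.path(wk_dir, "logs", "umi4c_stats.txt")   # statistics log

# List what has been written so far (empty if prepUMI4C() has not been run).
prep_files <- if (dir.exists(prep_dir)) {
  list.files(prep_dir, pattern = "\\.fq\\.gz$")
} else {
  character(0)
}
has_stats <- file.exists(stats_file)

print(prep_files)
print(has_stats)
```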

See Also

contactsUMI4C.

Examples

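if (interactive()) {
    # Download the reduced example dataset and locate its FastQ files
    path <- downloadUMI4CexampleData(reduced = TRUE)
    raw_dir <- file.path(path, "CIITA", "fastq")

    # Select the reads containing the bait and add the UMI to each read header
    prepUMI4C(
        fastq_dir = raw_dir,
        wk_dir = file.path(path, "CIITA"),
        bait_seq = "GGACAAGCTCCCTGCAACTCA",
        bait_pad = "GGACTTGCA",
        res_enz = "GATC"
    )
}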

Pasquali-lab/UMI4Cats documentation built on Nov. 3, 2024, 3:10 p.m.