parse_ucsc_gokey: Parse UCSC tracks that use the Gokey format

parse_ucsc_gokey R Documentation

Parse UCSC tracks that use the Gokey format

Description

Parse UCSC tracks that use the Gokey format

Usage

parse_ucsc_gokey(
  track_lines,
  overlay_grep = c("[ -._](plus|minus|pos|neg)($|[ -._])"),
  priority = 5000,
  output_format = c("text", "list"),
  debug = c("none"),
  multiwig_concat_header = TRUE,
  group_header_delim = ": ",
  default_env = new.env(),
  verbose = FALSE,
  ...
)

Arguments

track_lines

character vector containing lines read from a track file, or valid path or connection to a track file.

overlay_grep

character vector containing valid regular expression patterns used to recognize when a track should be treated as an overlay coverage track. For example track name="trackA F" and track name="trackA R" would be recognized as forward and reverse strand for a track named "trackA", given a pattern that matches the F and R suffixes (the default pattern matches plus, minus, pos, and neg). Overlay tracks are handled using the UCSC "multiWig" approach, and not the composite track approach. To disable overlay_grep, use overlay_grep="^$". To enable overlay_grep for all tracks, use overlay_grep="$".

priority

integer value indicating the starting priority, used when assigning a priority to each track.

output_format

character string indicating the output format, where "text" will return one long character string, and "list" will return a list with one track per list element with class "glue","character".

debug

character string indicating the type of debug output:

  • df: returns the intermediate track_df data.frame;

  • pri: prints priority during track parsing;

  • none: produces no debug output, the default.

multiwig_concat_header

logical indicating whether multiWig parent tracks should be named by concatenating header1 and header2 values.

group_header_delim

character string used as delimiter between track group and track header label, where for example "headingA1" and "headingA2" would be combined with delimiter ": " to form "headingA1: headingA2" as the visible label for each group of tracks.

verbose

logical indicating whether to print verbose output during processing.

...

additional arguments are treated as a named list of track parameters that override existing parameter values. For example scoreFilter=1 will override the default for bigBed tracks scoreFilter=5.

Details

Given a text file, or lines from a text file, in the Gokey format, this function will parse the track lines into groups, and return a text string usable in a UCSC genome browser track hub.

In general, the intention is to convert a set of UCSC track lines to a track hub format, where common track options are converted to relevant track hub configuration lines.

Tracks are generally divided into two types of groupings:

multiWig Overlay Tracks

Tracks whose name matches the overlay_grep regular expression pattern are configured as multiWig overlay tracks. This configuration uses the UCSC multiWig format as described here https://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#aggregate

  • Tracks must be named such that after the text matched by overlay_grep is removed, the resulting string defines a set of tracks. The tracks that share this string are assigned to the same multiWig track.

  • To customize the default aggregate method, supply for example aggregate="none" in the ... arguments to this function, which will cause multiWig tracks to use "none" when tracks are displayed.

  • Each parent track is configured as a superTrack, which contains one or more multiWig tracks beneath it.

  • It is recommended to have one heading "ChIP-seq" with sub-heading "coverage", then each track is grouped by name after removing the pattern matched by overlay_grep. This configuration will create one pulldown entry in the track hub configuration representing the superTrack. The superTrack will contain one multiWig track for each unique name (after removing the text matched by overlay_grep), each of which contains all tracks that match that name.

  • Each track group is assigned priority in order of each unique track group defined in the track config lines.

  • Individual tracks are configured as child tracks to the track groups.
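
The name-based grouping described above can be illustrated with base R alone. This is a minimal sketch, assuming the default overlay_grep pattern shown in Usage and the track names from the Examples section; the variable names are hypothetical, and the code does not reproduce the internal logic of parse_ucsc_gokey().

```r
# Default pattern copied from the Usage section of this page.
overlay_grep <- "[ -._](plus|minus|pos|neg)($|[ -._])"

# Track names from the Examples section, plus one name without a strand suffix.
track_names <- c("trackname1_pos", "trackname1_neg",
   "trackname2_pos", "trackname2_neg", "trackname3")

# Which names look like overlay (strand-specific) coverage tracks?
is_overlay <- grepl(overlay_grep, track_names)

# Remove the matched text to obtain the shared group name.
# (Illustrative only; the package may derive the name differently.)
group_names <- sub(overlay_grep, "", track_names[is_overlay])

# Tracks that share a group name would belong to the same multiWig track.
overlay_groups <- split(track_names[is_overlay], group_names)
print(overlay_groups)

# "^$" matches only an empty string, so no track name matches;
# "$" matches the end of any string, so every track name matches.
none_match <- grepl("^$", track_names)
all_match <- grepl("$", track_names)
```

In this sketch "trackname3" does not match the pattern, so it would be handled as a composite view track rather than an overlay track.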

Composite View Tracks

All other tracks are grouped as composite tracks, specifically using composite track view, as described here https://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#compositeTrack

  • More specifically, the parent track is configured as a compositeTrack, including views as "view Views COV=Coverage JUNC=Junctions PEAK=Peaks" by default.

  • An intermediate track is created to represent each view, by default "JUNC"; however, this value is not visible to users unless there are multiple different view values.

  • Each track is configured as a child to the relevant view track. Track priority is assigned in the order it appears in the track config lines. The priority allows peak tracks to be ordered directly after or before the associated coverage track.

Top-Level Parent Tracks

Note that in both scenarios above, there is one top-level parent track that contains a subset of tracks. The top-level grouping can be defined in the track lines by supplying two header lines immediately before each top-level grouping of tracks, referred to as header1 and header2 for clarity.

The first header line header1 is used as the top-level track. For composite tracks, one composite track view is created underneath the top-level track for each secondary header header2. Composite tracks can associate two views with the same parent by supplying only the second header line header2 for subsequent track groups. In this way, composite views can effectively contain a subgroup of tracks within each top-level header header1.

For multiWig overlay tracks, each overlay track is grouped into the top-level header header1 track. However, there is no additional subgroup available.

An example of two composite tracks, each with one view:

headingA1
headingA2
track name=trackname1
track name=trackname2

headingB1
headingB2
track name=trackname5
track name=trackname6

In this case, there will be two top-level parent tracks, labeled "headingA1" and "headingB1", which appear inside the track hub. Within each track, there will be one composite view: for headingA1 there is one internal track headingA2; and for headingB1 there is one internal track headingB2.
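
The header1 and header2 assignment described in this section can be sketched in base R. This is an illustration only, under two assumptions that are not stated by the package: a line that starts with "track " is a track line, and any other non-empty line is a header line. The column names of group_df are hypothetical and are not guaranteed to match the track_df returned by debug="df".

```r
track_lines <- c(
   "headingA1", "headingA2",
   "track name=trackname1", "track name=trackname2",
   "",
   "headingB1", "headingB2",
   "track name=trackname5", "track name=trackname6")

# Assumption: drop blank lines, then classify each line as track or header.
lines_use <- track_lines[nzchar(trimws(track_lines))]
is_track <- grepl("^track ", lines_use)

header1 <- NA_character_
header2 <- NA_character_
pending <- character(0)
rows <- list()
for (i in seq_along(lines_use)) {
   if (!is_track[i]) {
      # collect header lines until the next track line
      pending <- c(pending, lines_use[i])
      next
   }
   if (length(pending) >= 2) {
      # two header lines: new top-level parent and new sub-group
      header1 <- pending[1]
      header2 <- pending[2]
   } else if (length(pending) == 1) {
      # one header line: new sub-group under the current parent
      header2 <- pending[1]
   }
   pending <- character(0)
   rows[[length(rows) + 1]] <- data.frame(
      header1 = header1,
      header2 = header2,
      name = sub("^track name=([^ ]+).*$", "\\1", lines_use[i]),
      stringsAsFactors = FALSE)
}
group_df <- do.call(rbind, rows)
print(group_df)
```

Each row pairs a track with its top-level parent and its view, which mirrors the two-level grouping described above.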

Value

By default, when output_format="text", a character string suitable to write directly into a text file with cat(). When output_format="list", a list of glue objects, which can be concatenated into one character string with Reduce("+", trackline_list).

See Also

Other jam ucsc browser functions: assign_track_defaults(), get_track_defaults(), make_ucsc_trackname()

Examples

# example of two composite track top-level parent tracks
track_lines_text <- c("headingA1
headingA2
track name=trackname1 shortLabel=trackname1 bigDataUrl=some_url
track name=trackname2 shortLabel=trackname2 bigDataUrl=some_url
track name=trackname3 shortLabel=trackname3 bigDataUrl=some_url
track name=trackname4 shortLabel=trackname4 bigDataUrl=some_url

headingB1
headingB2
track name=trackname5 shortLabel=trackname5 bigDataUrl=some_url
track name=trackname6 shortLabel=trackname6 bigDataUrl=some_url
track name=trackname7 shortLabel=trackname7 bigDataUrl=some_url
track name=trackname8 shortLabel=trackname8 bigDataUrl=some_url
")
track_lines <- unlist(strsplit(track_lines_text, "\n"));
cat(parse_ucsc_gokey(track_lines))
track_df <- parse_ucsc_gokey(track_lines, debug="df")

# example of two multiWig track top-level parent tracks
# each of which contains two tracks with positive/negative coverage
track_lines_text2 <- c("headingA1
headingA2
track name=trackname1_pos shortLabel=trackname1_pos bigDataUrl=some_url
track name=trackname1_neg shortLabel=trackname1_neg bigDataUrl=some_url
track name=trackname2_pos shortLabel=trackname2_pos bigDataUrl=some_url
track name=trackname2_neg shortLabel=trackname2_neg bigDataUrl=some_url

headingB1
headingB2
track name=trackname3_pos shortLabel=trackname3_pos bigDataUrl=some_url
track name=trackname3_neg shortLabel=trackname3_neg bigDataUrl=some_url
track name=trackname4_pos shortLabel=trackname4_pos bigDataUrl=some_url
track name=trackname4_neg shortLabel=trackname4_neg bigDataUrl=some_url
")
track_lines2 <- unlist(strsplit(track_lines_text2, "\n"));
track_text2 <- parse_ucsc_gokey(track_lines2);
cat(track_text2);

# the final step is to save into a text file
if (FALSE) {
   cat(track_text2, file="trackDb_platjam.txt")
}


jmw86069/platjam documentation built on Sept. 26, 2024, 3:31 p.m.