parse_ucsc_gokey | R Documentation |
Parse UCSC tracks that use the Gokey format
parse_ucsc_gokey(
track_lines,
overlay_grep = c("[ -._](plus|minus|pos|neg)($|[ -._])"),
priority = 5000,
output_format = c("text", "list"),
debug = c("none"),
multiwig_concat_header = TRUE,
group_header_delim = ": ",
default_env = new.env(),
verbose = FALSE,
...
)
track_lines |
|
overlay_grep |
|
priority |
|
output_format |
|
debug |
|
multiwig_concat_header |
|
group_header_delim |
|
verbose |
|
... |
additional arguments are treated as a named list
of track parameters that override existing parameter values.
For example |
Given a text file, or lines from a text file, in the Gokey format, this function parses the track lines into groups and returns a text string usable in a UCSC genome browser track hub.
In general, the intention is to convert a set of UCSC track lines to a track hub format, where common track options are converted to relevant track hub configuration lines.
Tracks are generally divided into two types of groupings:
Tracks whose names match the overlay_grep regular expression pattern are configured as multiWig overlay tracks. This configuration uses the UCSC multiWig format, as described here:
https://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#aggregate
Tracks must be named such that, after the overlay_grep match is removed, the resulting string defines a set of tracks. The tracks that share this string are assigned to the same multiWig track.
To customize the default aggregate method, supply for example aggregate="none" in the ... arguments to this function, which causes multiWig tracks to use "none" as the aggregate method when tracks are displayed.
Each parent track is configured as a superTrack, which contains one or more multiWig tracks beneath it.
It is recommended to have one heading, such as "ChIP-seq", with a sub-heading, such as "coverage"; each track is then grouped by name after removing the pattern matched by overlay_grep. This configuration creates one pull-down entry in the track hub configuration representing the superTrack. The superTrack contains one multiWig track for each unique name (after removing the overlay_grep match), each of which contains all tracks that match that name.
Each track group is assigned priority in order of each unique track group defined in the track config lines.
Individual tracks are configured as child tracks to the track groups.
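The name-based grouping described above can be sketched in base R. This is only an illustration of the idea, not the function's internal code: it assumes the match is stripped with sub() and that tracks sharing the remaining string form one multiWig group. The track names and the variables is_overlay, base_names, and overlay_groups are hypothetical.

```r
# Default pattern taken from the function signature
overlay_grep <- "[ -._](plus|minus|pos|neg)($|[ -._])"

# Hypothetical track names; "trackname3" has no strand suffix
track_names <- c("trackname1_pos", "trackname1_neg",
  "trackname2_pos", "trackname2_neg",
  "trackname3")

# Tracks whose names match the pattern are overlay candidates
is_overlay <- grepl(overlay_grep, track_names, perl = TRUE)

# Remove the matched suffix to obtain the shared group name (assumption:
# the real function may derive this name differently)
base_names <- sub(overlay_grep, "", track_names, perl = TRUE)

# One list element per multiWig group, containing its member tracks
overlay_groups <- split(track_names[is_overlay], base_names[is_overlay])
print(overlay_groups)

# Names that do not match would fall into the composite track path
print(track_names[!is_overlay])
```

In this sketch, "trackname1_pos" and "trackname1_neg" land in the same group because both reduce to "trackname1".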
All other tracks are grouped as composite tracks, specifically using composite track view, as described here https://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#compositeTrack
More specifically, the parent track is configured as a compositeTrack, including views as "view Views COV=Coverage JUNC=Junctions PEAK=Peaks" by default.
An intermediate track is created to represent each view, by default "JUNC"; however, this value is not visible to users unless there are multiple different view values.
Each track is configured as a child to the relevant view track. Track priority is assigned in the order it appears in the track config lines. The priority allows peak tracks to be ordered directly after or before the associated coverage track.
Note that in both scenarios above, there is one top-level parent track that contains a subset of tracks. The top-level grouping can be defined in the track lines by supplying two header lines immediately before each top-level grouping of tracks, referred to as header1 and header2 for clarity.
The first header line header1 is used as the top-level track.
For composite tracks, one composite track view is created underneath the top-level track for each secondary header header2.
Composite tracks can associate two views with the same parent by supplying only the second header line header2 for subsequent track groups. In this way, composite views can effectively contain a subgroup of tracks within each top-level header header1.
For multiWig overlay tracks, each overlay track is grouped into the top-level header header1 track. However, there is no additional subgroup available.
An example for two composite tracks, each with one view:
headingA1
headingA2
track name=trackname1
track name=trackname2
headingB1
headingB2
track name=trackname5
track name=trackname6
In this case, there will be two top-level parent tracks, labeled "headingA1" and "headingB1", which appear inside the track hub.
Within each track, there will be one composite view: for headingA1 there is one internal track headingA2; and for headingB1 there is one internal track headingB2.
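The header rules above can be illustrated with a small base R sketch. It is not the package's parser. It assumes that every line starting with "track " is a track line and that any other non-empty line is a header; two headers before a track set header1 and header2, while a single header sets only header2 under the current header1. The helper assign_headers() and the extra "headingA3" group are hypothetical.

```r
# Hypothetical helper (not part of the package): label each track line
# with the header1/header2 values in effect where the track appears.
assign_headers <- function(track_lines) {
  track_lines <- trimws(track_lines)
  track_lines <- track_lines[nzchar(track_lines)]
  is_track <- grepl("^track\\s", track_lines)
  header1 <- NA_character_
  header2 <- NA_character_
  pending <- character(0)  # header lines seen since the last track line
  out <- list()
  for (i in seq_along(track_lines)) {
    if (!is_track[i]) {
      pending <- c(pending, track_lines[i])
      next
    }
    if (length(pending) >= 2) {
      # two header lines: a new top-level parent and its first view
      header1 <- pending[1]
      header2 <- pending[2]
    } else if (length(pending) == 1) {
      # one header line: a new view under the current top-level parent
      header2 <- pending[1]
    }
    pending <- character(0)
    # extract the value of the first name= field on the track line
    name <- sub("^track\\s+.*?name=(\\S+).*$", "\\1", track_lines[i], perl = TRUE)
    out[[length(out) + 1]] <- data.frame(
      header1 = header1, header2 = header2, name = name,
      stringsAsFactors = FALSE)
  }
  do.call(rbind, out)
}

track_lines <- c(
  "headingA1", "headingA2",
  "track name=trackname1",
  "track name=trackname2",
  "headingA3",                 # header2 only: second view under headingA1
  "track name=trackname3",
  "headingB1", "headingB2",
  "track name=trackname5")
res <- assign_headers(track_lines)
print(res)
```

Here trackname3 stays under headingA1 but moves to the view headingA3, which mirrors how a lone header2 line adds a second view to the same parent.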
By default, when output_format="text", a character string suitable to pass to cat() for writing directly into a text file.
When output_format="list", a list of glue objects, which can be concatenated into one character string with Reduce("+", trackline_list).
Other jam ucsc browser functions: assign_track_defaults(), get_track_defaults(), make_ucsc_trackname()
# example of two composite track top-level parent tracks
track_lines_text <- c("headingA1
headingA2
track name=trackname1 shortLabel=trackname1 bigDataUrl=some_url
track name=trackname2 shortLabel=trackname2 bigDataUrl=some_url
track name=trackname3 shortLabel=trackname3 bigDataUrl=some_url
track name=trackname4 shortLabel=trackname4 bigDataUrl=some_url
headingB1
headingB2
track name=trackname5 shortLabel=trackname5 bigDataUrl=some_url
track name=trackname6 shortLabel=trackname6 bigDataUrl=some_url
track name=trackname7 shortLabel=trackname7 bigDataUrl=some_url
track name=trackname8 shortLabel=trackname8 bigDataUrl=some_url
")
track_lines <- unlist(strsplit(track_lines_text, "\n"))
cat(parse_ucsc_gokey(track_lines))
track_df <- parse_ucsc_gokey(track_lines, debug="df")
# example of two multiWig track top-level parent tracks
# each of which contain two tracks with positive/negative coverage
track_lines_text2 <- c("headingA1
headingA2
track name=trackname1_pos shortLabel=trackname1_pos bigDataUrl=some_url
track name=trackname1_neg shortLabel=trackname1_neg bigDataUrl=some_url
track name=trackname2_pos shortLabel=trackname2_pos bigDataUrl=some_url
track name=trackname2_neg shortLabel=trackname2_neg bigDataUrl=some_url
headingB1
headingB2
track name=trackname3_pos shortLabel=trackname3_pos bigDataUrl=some_url
track name=trackname3_neg shortLabel=trackname3_neg bigDataUrl=some_url
track name=trackname4_pos shortLabel=trackname4_pos bigDataUrl=some_url
track name=trackname4_neg shortLabel=trackname4_neg bigDataUrl=some_url
")
track_lines2 <- unlist(strsplit(track_lines_text2, "\n"))
track_text2 <- parse_ucsc_gokey(track_lines2)
cat(track_text2)
# the final step is to save into a text file
if (FALSE) {
cat(track_text2, file="trackDb_platjam.txt")
}