utils: Generic internal functions used by genespace

Description

utils Convenience functions for GENESPACE, not meant to be called directly by the user. Documentation is minimal; use at your own risk.

.onAttach prints startup messages when the package is attached

check_integer Checks and parses integer arguments to GENESPACE functions. Values of x outside the range (min, max) are replaced with min or max, respectively. If only a single value is expected and x cannot be coerced to an integer, returns default. If na.rm = TRUE and onlySingleValue = FALSE, drops NAs from the vector.

check_numeric See check_integer. Same but for numeric values.

check_character See check_integer. Same but for character values.

check_logical See check_integer. Same but for logical values.

check_filePathParam QC of a user-specified file path parameter

check_onlyDNA QC to ensure that the peptide sequences are actually peptides and not DNA

read_aaFasta read fasta-formatted peptide sequences

get_nAA count the number of amino acids by gene

read_bed read and check a raw bed file with four columns.

align_charLeft for a vector of character strings, pad each string with spaces on the right so that all strings align to the left when printed

align_charRight for a vector of character strings, pad each string with spaces on the left so that all strings align to the right when printed

read_orthofinderSpeciesIDs Parses the SpeciesIDs.txt file into a data.table and returns to R.

read_orthofinderSequenceIDs Reads the dictionary mapping OrthoFinder sequence IDs to gene names into memory.

get_nSeqs Counts the number of lines with ">" in a file. If the output is not convertible to an integer, returns NA.

check_annotFiles ensure the annotations match correctly

check_MCScanXhInstall check that MCScanX_h can be called

parse_ogs read and parse OrthoFinder Orthogroups.tsv files

parse_hogs read and parse OrthoFinder phylogenetically hierarchical orthogroup (N0.tsv) files

parse_orthologues read and parse OrthoFinder orthologues

round_toInteger flexible rounding to any integer.

add_rle run-length encoding conversion: for each element, returns either the length of the run it belongs to or a unique run ID.

gs_colors get a set of colors from the genespace palette

clus_igraph cluster connected subgraphs from pairwise observations

are_colors check if a vector is coercible to R colors

scale_between scale a vector to lie between min and max

read_combBed ensures consistent combBed IO

write_combBed ensures consistent combBed IO

read_allBlast ensures consistent allBlast IO

write_allBlast ensures consistent allBlast IO

read_synHits ensures consistent synHit IO

write_synHits ensures consistent synHit IO

add_alpha add transparency to a color

read_intSynPos utility to read interpolated syntenic position files

write_intSynPos utility to write interpolated syntenic position files

get_orthofinderVersion Checks that OrthoFinder is installed and, if so, returns the installed version.

get_diamondVersion Checks that DIAMOND is installed and, if so, returns the installed version.

theme_genespace specifies publication-style ggplot2 themes. Here, col is the color of panel.background.

download_exampleData downloads chicken and human annotations from NCBI

interp_approx use approx to interpolate missing positions based on the positions of the bounding x/y coordinates.

read_refGenomeSynHits read in all syntenic hits files involving a single reference genome and, where necessary, invert the hits so that the reference genome is always the query (genome1).

read_refGenomeAllBlast read in all hits files involving a single reference genome and, where necessary, invert the hits so that the reference genome is always the query (genome1).

get_bedInBlk splits the bed file so that two entries (query and target) match the physical bounds of blocks in the hits object.

add_array2bed add array ID to the bed file

add_arrayReps2bed add array representative genes to the combined bed object

write_pangenes utility to correctly write long-formatted pangene text files.

read_pangenes utility to correctly read in long-formatted pangene text files.

find_nnHit find the anchor hits that are nearest to each non-anchor xy position

flag_hitsInRadius given a vector of anchors, pulls the xy positions that lie within radius of an anchor, using dbscan

flag_hitsInBlk finds hits within the bounding coordinates of blocks

get_orderedTips get the tree tip order, respecting the ordering of the tree when ladderized

pull_pairwise Builds new pairwise files in /syntenicHits

Usage

.onAttach(...)

check_integer(
  x,
  min = -Inf,
  max = Inf,
  default = NA,
  na.rm = FALSE,
  onlySingleValue = length(x) <= 1
)

check_numeric(
  x,
  min = -Inf,
  max = Inf,
  default = NA,
  na.rm = FALSE,
  onlySingleValue = length(x) <= 1
)

check_character(
  x,
  default = NULL,
  na.rm = FALSE,
  onlySingleValue = length(x) <= 1
)

check_logical(x, default = NA, na.rm = FALSE, onlySingleValue = length(x) <= 1)

check_filePathParam(filepath)

check_onlyDNA(path)

read_aaFasta(path)

get_nAA(path)

read_bed(filepath)

align_charLeft(x)

align_charRight(x)

read_orthofinderSpeciesIDs(filepath)

read_orthofinderSequenceIDs(filepath)

get_nSeqs(filepath)

check_annotFiles(filepath, genomeIDs)

check_MCScanXhInstall(filepath)

parse_ogs(filepath, genomeIDs)

parse_hogs(filepath)

parse_orthologues(filepath)

round_toInteger(x, to)

add_rle(x, which = "n")

gs_colors(n = 10)

clus_igraph(id1, id2)

are_colors(col)

scale_between(x, min, max, scale1toMean = TRUE)

read_combBed(filepath)

write_combBed(x, filepath)

read_allBlast(filepath, ...)

write_allBlast(x, filepath)

read_synHits(filepath, ...)

write_synHits(x, filepath)

add_alpha(col, alpha = 1)

read_intSynPos(filepath)

write_intSynPos(x, filepath)

get_orthofinderVersion(filepath)

get_diamondVersion(filepath)

theme_genespace(col = "black")

download_exampleData(filepath)

interp_approx(x, y)

read_refGenomeSynHits(gsParam, refGenome)

read_refGenomeAllBlast(gsParam, refGenome)

get_bedInBlk(hits, bed)

add_array2bed(bed, synBuff, maxIter = 10, reorder = TRUE)

add_arrayReps2bed(bed)

write_pangenes(x, filepath)

read_pangenes(x, filepath, ...)

find_nnHit(x, y, isAnchor, radius)

flag_hitsInRadius(x, y, isAnchor, radius)

flag_hitsInBlk(x, y, blkID)

get_orderedTips(treFile, ladderize = TRUE, genomeIDs)

pull_pairwise(gsParam, verbose)

Arguments

...

additional parameters passed on to other functions
If called, utils returns its own arguments.

x

a single value or a vector (character, integer, or numeric), or a list

min

if x is an integer or numeric, the minimum value allowed

max

if x is an integer or numeric, the maximum value allowed

default

if there is a problem with x, replace with this value

na.rm

logical, should NAs be dropped?

onlySingleValue

logical, should only a single value be returned?

filepath

character string, path to a file

path

file path character string

genomeIDs

character vector of genomeIDs

to

integer, top end value

which

character, what to return: "n" (the default) for run lengths; other values return run IDs

n

integer, number of observations

id1

character, first id

id2

character, second id

col

color

scale1toMean

logical, should values be scaled to 1?

alpha

numeric, transparency

y

numeric, y values

gsParam

genespace parameters, see init_genespace.

refGenome

character string specifying the reference genome

hits

data.table of syntenic hits

bed

data.table containing the combined bed object

synBuff

see init_genespace.

maxIter

integer, the maximum number of iterations to use

reorder

logical, should the gene rank position be re-ordered?

isAnchor

logical, is a hit an anchor?

radius

numeric, the 2d search radius.

blkID

vector of block IDs

treFile

file path to the tree file.

ladderize

logical, should the tree be ladderized?

verbose

logical, should updates be printed to the console?
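A minimal base-R sketch of the clamping and fallback rules described for check_integer follows. The name check_integer_sketch is hypothetical and this is not the GENESPACE source; it assumes that values which cannot be coerced become NA and that the bounds min and max are inclusive.

```r
# Hypothetical sketch of the behavior described for check_integer;
# not the GENESPACE implementation.
check_integer_sketch <- function(x, min = -Inf, max = Inf, default = NA,
                                 na.rm = FALSE,
                                 onlySingleValue = length(x) <= 1) {
  # Coerce to integer; values that cannot be coerced become NA
  out <- suppressWarnings(as.integer(x))
  if (onlySingleValue) {
    # Single-value mode: keep the first element, fall back to default
    out <- out[1]
    if (is.na(out)) return(default)
  } else if (na.rm) {
    # Vector mode with na.rm = TRUE: drop NAs
    out <- out[!is.na(out)]
  }
  # Replace values outside [min, max] with the nearest bound
  as.integer(pmin(pmax(out, min), max))
}

check_integer_sketch(15, min = 1, max = 10)                # 10
check_integer_sketch("abc", default = 5L)                  # 5
check_integer_sketch(c(1, NA, 20), max = 10, na.rm = TRUE) # 1 10
```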
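Space-padding alignment, as described for align_charLeft and align_charRight, can be sketched with base R's formatC(). The helper names are hypothetical, and the GENESPACE functions may pad differently; note that nchar() counts characters, not display width, so wide Unicode text may not line up.

```r
# Hypothetical helpers illustrating space-padding alignment.
align_left_sketch <- function(x) {
  # Pad on the right to the widest string, so text is left-aligned
  formatC(x, width = max(nchar(x)), flag = "-")
}
align_right_sketch <- function(x) {
  # Pad on the left to the widest string, so text is right-aligned
  formatC(x, width = max(nchar(x)))
}

align_left_sketch(c("a", "bbb"))   # "a  " "bbb"
align_right_sketch(c("a", "bbb"))  # "  a" "bbb"
```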
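The header count described for get_nSeqs can be sketched in base R. count_fasta_headers is a hypothetical name; it reads the whole file into memory, which is fine for illustration but may be slow for very large FASTA files, and it simply returns NA for a missing file rather than replicating GENESPACE's exact error handling.

```r
# Hypothetical sketch: count FASTA header lines (lines containing ">").
count_fasta_headers <- function(filepath) {
  if (!file.exists(filepath)) return(NA_integer_)
  lines <- readLines(filepath, warn = FALSE)
  # sum() of a logical vector returns an integer count
  sum(grepl(">", lines, fixed = TRUE))
}

tf <- tempfile(fileext = ".fa")
writeLines(c(">gene1", "MKVL", ">gene2", "MAAT"), tf)
count_fasta_headers(tf)  # 2
```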
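The description of round_toInteger does not spell out its rounding rule. Assuming that "rounding to any integer" means rounding x to the nearest multiple of to, a sketch could look like the following; round_to_multiple is a hypothetical name.

```r
# Hypothetical sketch, assuming round_toInteger rounds x to the
# nearest multiple of `to`.
round_to_multiple <- function(x, to) {
  round(x / to) * to
}

round_to_multiple(c(7, 12, 26), to = 5)  # 5 10 25
```

R's round() rounds exact halves to the nearest even digit (round(2.5) is 2), so a value lying exactly halfway between two multiples may round down.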
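The two outputs described for add_rle (run lengths or run IDs) can be illustrated with base R's rle(). This sketch assumes that which = "n" returns run lengths and any other value returns run IDs; add_rle_sketch is a hypothetical name.

```r
# Hypothetical sketch of the two outputs described for add_rle.
add_rle_sketch <- function(x, which = "n") {
  r <- rle(x)
  if (which == "n") {
    # Each element gets the length of the run it belongs to
    rep(r$lengths, r$lengths)
  } else {
    # Each element gets a sequential ID for its run
    rep(seq_along(r$lengths), r$lengths)
  }
}

x <- c("a", "a", "b", "a", "a", "a")
add_rle_sketch(x)                # 2 2 1 3 3 3
add_rle_sketch(x, which = "id")  # 1 1 2 3 3 3
```

Note that the final run of "a" is a separate run from the first, because runs are defined by consecutive equal values.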
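Clustering connected subgraphs from pairwise observations, as described for clus_igraph, amounts to finding connected components. The function name suggests igraph is used; as a dependency-free illustration, the sketch below swaps in a simple union-find in base R. clus_pairs_sketch is hypothetical, and its integer cluster IDs are assigned in order of first appearance.

```r
# Hypothetical union-find sketch; not the GENESPACE implementation.
clus_pairs_sketch <- function(id1, id2) {
  id1 <- as.character(id1)
  id2 <- as.character(id2)
  ids <- unique(c(id1, id2))
  # Every ID starts as its own cluster root
  parent <- setNames(ids, ids)
  find_root <- function(a) {
    while (parent[[a]] != a) a <- parent[[a]]
    a
  }
  # Merge the clusters of each observed pair
  for (i in seq_along(id1)) {
    r1 <- find_root(id1[i])
    r2 <- find_root(id2[i])
    if (r1 != r2) parent[[r2]] <- r1
  }
  roots <- vapply(ids, find_root, character(1))
  # Relabel roots as integers in order of first appearance
  setNames(match(roots, unique(roots)), ids)
}

clus_pairs_sketch(c("a", "b", "d"), c("b", "c", "e"))
# a, b and c share one cluster; d and e share another
```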
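One way to check whether values are coercible to R colors, as described for are_colors, is to test whether grDevices::col2rgb() accepts each one. This is a hypothetical sketch (are_colors_sketch), not necessarily how GENESPACE performs the check.

```r
# Hypothetical sketch: a value counts as a color if col2rgb() accepts it.
are_colors_sketch <- function(col) {
  vapply(col, function(cl) {
    tryCatch({
      grDevices::col2rgb(cl)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1), USE.NAMES = FALSE)
}

are_colors_sketch(c("red", "#00FF00", "notacolor"))  # TRUE TRUE FALSE
```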
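Scaling a vector to lie between min and max, as described for scale_between, can be sketched as linear min-max rescaling. The helper name is hypothetical, the scale1toMean option is not modeled, and mapping a constant vector to the midpoint of the range is an assumption made here to avoid dividing by zero.

```r
# Hypothetical sketch of min-max rescaling (scale1toMean is not modeled).
scale_between_sketch <- function(x, min, max) {
  rng <- range(x, na.rm = TRUE)
  # Assumption: a constant vector maps to the midpoint of [min, max]
  if (rng[1] == rng[2]) return(rep((min + max) / 2, length(x)))
  (x - rng[1]) / (rng[2] - rng[1]) * (max - min) + min
}

scale_between_sketch(c(2, 4, 6), min = 0, max = 1)  # 0.0 0.5 1.0
```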
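Adding transparency to a color, as described for add_alpha, can be sketched with grDevices::adjustcolor(), assuming alpha is a value between 0 and 1. The helper name is hypothetical; note that adjustcolor() multiplies any alpha the color already has by alpha.f rather than replacing it.

```r
# Hypothetical sketch using grDevices::adjustcolor(); alpha is in [0, 1].
add_alpha_sketch <- function(col, alpha = 1) {
  grDevices::adjustcolor(col, alpha.f = alpha)
}

# Returns hex color strings with an alpha channel appended
add_alpha_sketch(c("red", "blue"), alpha = 0.5)
```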
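The interpolation described for interp_approx can be illustrated with stats::approx(): known (x, y) pairs bound the missing positions, and missing y values are filled by linear interpolation. The helper name is hypothetical, and rule = 2 (carrying the nearest known value past either end) is an assumption; the GENESPACE function may leave such positions as NA.

```r
# Hypothetical sketch: fill NA y values by linear interpolation on x.
interp_missing_sketch <- function(x, y) {
  miss <- is.na(y)
  # Need something to fill and at least two known points
  if (!any(miss) || sum(!miss) < 2) return(y)
  # rule = 2 (an assumption) extends the nearest known value past the ends
  y[miss] <- stats::approx(x = x[!miss], y = y[!miss],
                           xout = x[miss], rule = 2)$y
  y
}

interp_missing_sketch(x = 1:5, y = c(10, NA, 30, NA, NA))  # 10 20 30 30 30
```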
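The nearest-anchor lookup (find_nnHit) and radius flagging (flag_hitsInRadius) can be illustrated with brute-force Euclidean distances. This swaps in an O(n*m) scan for the dbscan-based search the description mentions, so it only suits small inputs. The function names, the choice that anchors map to themselves, and returning NA when no anchor lies within radius are all assumptions.

```r
# Hypothetical brute-force sketches; not the GENESPACE implementations.
nearest_anchor_sketch <- function(x, y, isAnchor, radius = Inf) {
  anchorIdx <- which(isAnchor)
  vapply(seq_along(x), function(i) {
    # Anchors map to themselves
    if (isAnchor[i]) return(i)
    d <- sqrt((x[anchorIdx] - x[i])^2 + (y[anchorIdx] - y[i])^2)
    j <- which.min(d)
    # NA when there are no anchors or the nearest one is beyond radius
    if (length(j) == 0 || d[j] > radius) NA_integer_ else anchorIdx[j]
  }, integer(1))
}

in_radius_sketch <- function(x, y, isAnchor, radius) {
  ax <- x[isAnchor]
  ay <- y[isAnchor]
  # TRUE if any anchor lies within radius (anchors flag themselves)
  vapply(seq_along(x), function(i) {
    any((ax - x[i])^2 + (ay - y[i])^2 <= radius^2)
  }, logical(1))
}

x <- c(0, 10, 1, 9, 50)
y <- c(0, 10, 1, 9, 50)
isAnchor <- c(TRUE, TRUE, FALSE, FALSE, FALSE)
nearest_anchor_sketch(x, y, isAnchor, radius = 5)  # 1 2 1 2 NA
in_radius_sketch(x, y, isAnchor, radius = 2)       # TRUE TRUE TRUE TRUE FALSE
```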
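Finding hits within the bounding coordinates of blocks, as described for flag_hitsInBlk, can be sketched by computing each block's x/y range from its own hits and then testing every hit against those boxes. The helper name is hypothetical, and labeling a hit that falls in overlapping boxes with the first block in sorted order is an assumption.

```r
# Hypothetical sketch: label each hit with the first block whose bounding
# box (range of x and y over the block's own hits) contains it.
blk_bbox_sketch <- function(x, y, blkID) {
  has <- !is.na(blkID)
  xmin <- tapply(x[has], blkID[has], min)
  xmax <- tapply(x[has], blkID[has], max)
  ymin <- tapply(y[has], blkID[has], min)
  ymax <- tapply(y[has], blkID[has], max)
  out <- rep(NA_character_, length(x))
  for (b in names(xmin)) {
    inside <- x >= xmin[[b]] & x <= xmax[[b]] &
      y >= ymin[[b]] & y <= ymax[[b]]
    # Only fill hits not already assigned to an earlier block
    out[which(is.na(out) & inside)] <- b
  }
  out
}

blk_bbox_sketch(x = c(1, 3, 10, 11, 2, 20),
                y = c(1, 3, 10, 11, 2, 20),
                blkID = c("A", "A", "B", "B", NA, NA))
# "A" "A" "B" "B" "A" NA
```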


jtlovell/GENESPACE documentation built on Jan. 25, 2025, 6:39 a.m.