process_input: Process sequence data
In almeidasilvaf/syntenet: Inference And Analysis Of Synteny Networks

process_input

R Documentation

Process sequence data

Description

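Process protein sequences and their gene annotation so that sequence, gene, and chromosome names are clean and carry a unique species identifier.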

Usage

process_input(
  seq = NULL,
  annotation = NULL,
  gene_field = "gene_id",
  filter_annotation = FALSE
)

Arguments

`seq`	A list of AAStringSet objects, each list element containing protein sequences for a given species. This list must have names (not NULL), and names of each list element must match the names of list elements in annotation.
`annotation`	A GRangesList, CompressedGRangesList, or list of GRanges with the annotation for the sequences in seq. This list must have names (not NULL), and names of each list element must match the names of list elements in seq.
`gene_field`	Character, name of the column in the GRanges objects that contains gene IDs. Default: "gene_id".
`filter_annotation`	Logical indicating whether annotation should be filtered to keep only genes that are also in seq. This is particularly useful if users want to remove information on non-protein coding genes from annotation, since such genes are typically not present in sets of whole-genome protein sequences. Default: FALSE.
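
The naming requirement shared by seq and annotation can be checked before calling the function. The sketch below is only an illustration in base R: names_match() is a hypothetical helper, not part of syntenet, and the plain lists stand in for the real AAStringSet and GRanges inputs. It assumes that the order of the list elements does not matter, which the text above does not state.

```r
# Hypothetical stand-ins: empty list elements replace AAStringSet/GRanges
# objects, since only the list names matter for this check.
seq_list <- list(Athaliana = character(0), Osativa = character(0))
annot_list <- list(Osativa = list(), Athaliana = list())

# Hypothetical helper (not part of syntenet): TRUE if both lists are named
# and contain the same set of names. Order is ignored (an assumption).
names_match <- function(seq, annotation) {
    !is.null(names(seq)) &&
        !is.null(names(annotation)) &&
        setequal(names(seq), names(annotation))
}

ok_named <- names_match(seq_list, annot_list)
ok_unnamed <- names_match(seq_list, unname(annot_list))
```

Here ok_named is TRUE because both lists carry the same names, while ok_unnamed is FALSE because the second list has no names.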

Details

This function processes the input sequences and annotation to:

Remove whitespace and anything after it in sequence names (i.e., names(seq[[x]]), which are equivalent to FASTA headers), if present.
Add a unique species identifier to sequence names. The species identifier consists of the first 3-5 characters of the element name. For instance, if the first element of the seq list is named "Athaliana", the identifier "Atha_" will be added to the beginning of each gene name in it (e.g., Atha_AT1G01010).
If sequences contain an asterisk (*) representing a stop codon, remove it.
Add a unique species identifier (same as above) to gene and chromosome names of each element of the annotation GRangesList/CompressedGRangesList.
Filter each element of the annotation GRangesList/CompressedGRangesList to keep only seqnames, ranges, and gene ID.
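
The first three steps above can be sketched in base R on a toy input. This is not the package's implementation: clean_toy() is a hypothetical helper, plain character vectors stand in for AAStringSet objects, the species identifier is assumed to be the first four characters of the element name (as in the "Atha_" example), and every asterisk is stripped regardless of position.

```r
# Hypothetical toy input: a named character vector replaces an AAStringSet
# so that the sketch runs without Bioconductor.
toy_seq <- list(
    Athaliana = c(
        "AT1G01010 some description" = "MEDQVGF*",
        "AT1G01020" = "MAASEHR"
    )
)

# Hypothetical helper (not part of syntenet)
clean_toy <- function(seqs, species_id) {
    # Drop whitespace and everything after it in the sequence names
    ids <- sub("\\s.*$", "", names(seqs))
    # Prepend the species identifier followed by an underscore
    ids <- paste0(species_id, "_", ids)
    # Remove asterisks (stop codons); assumption: all asterisks are removed
    cleaned <- gsub("*", "", seqs, fixed = TRUE)
    names(cleaned) <- ids
    cleaned
}

# Assumed identifier: the first four characters of the list element name
species_id <- substr(names(toy_seq)[1], 1, 4)
toy_clean <- clean_toy(toy_seq[[1]], species_id)
```

With this toy input, names(toy_clean) contains "Atha_AT1G01010" and "Atha_AT1G01020", and the trailing asterisk is gone from the first sequence.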

Value

A list of 2 elements:

seq: The processed list of AAStringSet objects from seq.
annotation: The processed GRangesList or CompressedGRangesList object from annotation.

Examples

library(syntenet)

# Load the example annotation and proteomes shipped with the package
data(annotation)
data(proteomes)
seq <- proteomes
clean_data <- process_input(seq, annotation)
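
As a further illustration, the call below sets filter_annotation = TRUE so that the annotation keeps only genes that are also present in the sequences, as described under Arguments. It is a sketch that assumes the syntenet package and its example data sets are installed; clean_filtered is just an illustrative variable name.

```r
library(syntenet)

# Example data shipped with the package
data(annotation)
data(proteomes)

# Keep only annotated genes that also appear in the protein sequences
clean_filtered <- process_input(
    seq = proteomes,
    annotation = annotation,
    filter_annotation = TRUE
)

# The result is a list holding the processed sequences and annotation
result_names <- names(clean_filtered)
```

The processed objects can then be accessed as clean_filtered$seq and clean_filtered$annotation.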

almeidasilvaf/syntenet documentation built on Dec. 23, 2024, 6:26 a.m.
