When importing data into iAtlas, the following conventions MUST be followed. Doing so ensures that the new data can be loaded into the iAtlas database and made available to the app.
Information on the data model can be found in the `data_model` folder, which contains this README.md file.
All data should come into the iAtlas application in the form of feather files. Feather files allow for fast reading and help ensure structural integrity.
All data (feather files) should be located in the iAtlas Synapse directory.
Within the `feather_file` directory, data files should be segregated into folders as follows:
- `copy_number_results`
- `datasets`
- `driver_results`
- `edges`
- `features`
- `gene_types`
- `genes`
- `mutation_codes`
- `mutation_types`
- `mutations`
- `nodes`
- `patients`
- `publications`
- `relationships`
  - `publications_to_genes`
  - `datasets_to_tags`
  - `features_to_samples`
  - `genes_to_samples`
  - `genes_to_types`
  - `samples_to_mutations`
  - `samples_to_tags`
  - `tags_to_tags`
- `samples`
- `slides`
- `tags`
Data files in each folder MUST follow the convention for that data type. Files may be named however is most descriptive, but MUST end in `.feather`.
Column names MUST be spelled exactly as shown in this document.
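The folder and file-name rules above can be checked before uploading. The following is a minimal sketch, not part of any iAtlas tooling: it walks a local copy of the feather directory and reports files that do not end in `.feather` or whose immediate parent folder is not one of the folder names listed above. It assumes each data file sits directly inside its named folder; `find_layout_problems` is a hypothetical helper.

```python
from pathlib import Path

# Folder names from the folder list in this README, including the
# relationship folders.
KNOWN_FOLDERS = {
    "copy_number_results", "datasets", "driver_results", "edges",
    "features", "gene_types", "genes", "mutation_codes", "mutation_types",
    "mutations", "nodes", "patients", "publications", "samples", "slides",
    "tags", "publications_to_genes", "datasets_to_tags",
    "features_to_samples", "genes_to_samples", "genes_to_types",
    "samples_to_mutations", "samples_to_tags", "tags_to_tags",
}


def find_layout_problems(root):
    """Return one message per file under `root` that breaks the naming rules."""
    problems = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # folders themselves are not checked
        if path.suffix != ".feather":
            problems.append(f"{path}: file name must end in .feather")
        elif path.parent.name not in KNOWN_FOLDERS:
            problems.append(f"{path}: unexpected folder {path.parent.name!r}")
    return problems
```

Calling `find_layout_problems("feather_file")` on a local checkout (the path is illustrative) returns an empty list when every file passes.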
The conventions for the feather files in each folder are as follows:
### copy_number_results
#### Copy Number Results Column Names
feature
The name of a feature. These unique names MUST exist in data in the `features` folder.
type - (character)
entrez
The entrez id of a gene. These genes MUST exist in data in the `genes` folder.
type - (numeric)
dataset
The name of a dataset. These unique names MUST exist in data in the `datasets` folder.
type - (character)
tag
The tag name associated with this copy number result. These tags MUST exist in data in the `tags` folder.
type - (character)
direction
The direction of this copy number result.
type - (DIRECTION_ENUM)
mean_normal
The mean normal value of this copy number result.
type - (numeric)
mean_cnv
The mean CNV value of this copy number result.
type - (numeric)
p_value
The p value associated with this copy number result.
type - (numeric)
log10_p_value
The log10 p value associated with this copy number result.
type - (numeric)
t_stat
The t stat value of this copy number result.
type - (numeric)
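Each "MUST exist in data in the ... folder" rule is effectively a foreign-key constraint. A minimal sketch of checking them for `copy_number_results`, assuming rows are plain Python dicts (for example, as produced by pandas `DataFrame.to_dict(orient="records")`) and that the valid keys of each referenced folder have already been collected into sets. `check_references` and all sample values are hypothetical.

```python
# Foreign-key columns in copy_number_results and the folder each refers to.
COPY_NUMBER_REFERENCES = {
    "feature": "features",
    "entrez": "genes",
    "dataset": "datasets",
    "tag": "tags",
}


def check_references(rows, references, known):
    """Return {column: sorted missing values} for every foreign-key column
    whose values are not all present in the referenced folder.

    `known` maps a folder name to the set of valid keys for that folder
    (feature, dataset or tag names, or entrez ids for genes).
    """
    missing = {}
    for column, folder in references.items():
        absent = {row[column] for row in rows} - known[folder]
        if absent:
            missing[column] = sorted(absent)
    return missing


# Hypothetical reference data and rows.
known = {
    "features": {"leukocyte_fraction"},
    "genes": {3627},
    "datasets": {"TCGA"},
    "tags": {"C1"},
}
rows = [
    {"feature": "leukocyte_fraction", "entrez": 3627, "dataset": "TCGA", "tag": "C1"},
    {"feature": "leukocyte_fraction", "entrez": 999999, "dataset": "PCAWG", "tag": "C1"},
]
print(check_references(rows, COPY_NUMBER_REFERENCES, known))
# {'entrez': [999999], 'dataset': ['PCAWG']}
```

The same function covers other folders by swapping the mapping, e.g. adding `"mutation_code": "mutation_codes"` for `driver_results`.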
### datasets
#### Datasets Column Names
name
The name of the dataset. It MUST be unique and MUST NOT contain any characters other than letters, numbers, and underscores.
type - (character)
display
A display name for the dataset.
type - (character)
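The dataset naming rule can be expressed as a regular expression plus a uniqueness check. A minimal sketch, assuming "letters" means ASCII letters; `invalid_dataset_names` is a hypothetical helper and the names are illustrative.

```python
import re

# One or more ASCII letters, digits or underscores (ASCII is an assumption).
DATASET_NAME = re.compile(r"[A-Za-z0-9_]+")


def invalid_dataset_names(names):
    """Return names that break the rules: disallowed characters or repeats."""
    seen = set()
    bad = []
    for name in names:
        if not DATASET_NAME.fullmatch(name) or name in seen:
            bad.append(name)
        seen.add(name)
    return bad


print(invalid_dataset_names(["TCGA", "PCAWG", "TCGA", "bad-name", "with space"]))
# ['TCGA', 'bad-name', 'with space']
```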
### driver_results
#### Driver Results Column Names
feature
The name of a feature. These unique names MUST exist in data in the `features` folder.
type - (character)
dataset
The name of a dataset. These unique names MUST exist in data in the `datasets` folder.
type - (character)
entrez
The entrez id of a gene. These genes MUST exist in data in the `genes` folder.
type - (numeric)
mutation_code
The mutation code associated with this driver result. These mutation codes MUST exist in data in the `mutation_codes` folder.
type - (character)
tag
The tag name associated with this driver result. These tags MUST exist in data in the `tags` folder.
type - (character)
p_value
The p value associated with this driver result.
type - (numeric)
fold_change
The fold change value associated with this driver result.
type - (numeric)
log10_p_value
The log10 p value associated with this driver result.
type - (numeric)
log10_fold_change
The log10 fold change value associated with this driver result.
type - (numeric)
n_wt
The number of "Wild Type" genes associated with this driver result.
type - (integer)
n_mut
The number of "Mutant" genes associated with this driver result.
type - (integer)
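Because column names MUST be spelled exactly as documented, a set comparison catches misspelled or missing columns. This sketch uses the `driver_results` columns listed above; the column names passed in could come from a loaded table (for example, pandas `DataFrame.columns`). `column_problems` is a hypothetical helper.

```python
# driver_results column names, spelled exactly as documented above.
DRIVER_RESULTS_COLUMNS = {
    "feature", "dataset", "entrez", "mutation_code", "tag", "p_value",
    "fold_change", "log10_p_value", "log10_fold_change", "n_wt", "n_mut",
}


def column_problems(columns, expected):
    """Report expected columns that are absent and columns that are not expected."""
    present = set(columns)
    return {
        "missing": sorted(expected - present),
        "unexpected": sorted(present - expected),
    }


found = (DRIVER_RESULTS_COLUMNS - {"n_mut"}) | {"n_mutant"}
print(column_problems(found, DRIVER_RESULTS_COLUMNS))
# {'missing': ['n_mut'], 'unexpected': ['n_mutant']}
```

For folders with optional columns, the required and optional names would need to be tracked separately so that an absent optional column is not reported as missing.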
### edges
#### Edges Column Names
name
The name of the edge. Each edge must have a unique name; we suggest `<dataset_name>_<network_name>_<row_number>`.
type - (character)
node1
The node the edge is starting from. This must be in the name column of the corresponding `nodes` file.
type - (integer)
node2
The node the edge is ending at. This must be in the name column of the corresponding `nodes` file.
type - (integer)
label (optional)
The label of the edge.
type - (character)
score (optional)
The numeric value of the edge.
type - (numeric)
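A minimal sketch of the suggested edge naming pattern and of the rule that `node1` and `node2` refer to node names. Numbering rows from 1, representing edges as dicts, and the network name used below are assumptions; both functions are hypothetical helpers.

```python
def edge_names(dataset_name, network_name, n_rows):
    """Names in the suggested <dataset_name>_<network_name>_<row_number> form.

    Numbering rows from 1 is an assumption; the README does not say.
    """
    return [f"{dataset_name}_{network_name}_{i}" for i in range(1, n_rows + 1)]


def dangling_edges(edges, node_names):
    """Return edges whose node1 or node2 is not a name in the nodes data."""
    names = set(node_names)
    return [e for e in edges if e["node1"] not in names or e["node2"] not in names]


print(edge_names("TCGA", "extracellular", 2))
# ['TCGA_extracellular_1', 'TCGA_extracellular_2']
```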
### features
#### Features Column Names
name
The name of the feature.
type - (character)
display
A friendly display name for the feature.
type - (character)
class
The class of the feature. If the feature does not have a class, use `Miscellaneous`.
type - (character)
method_tag (optional)
The method tag of the feature.
type - (character)
order (optional)
The preferred order of priority for the feature.
type - (integer)
unit (optional)
The unit used for the value of the feature.
type - (UNIT_ENUM)
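Applying the `Miscellaneous` default can be done while preparing the features data. A minimal sketch, assuming rows are dicts and that a missing class shows up as an absent key, `None`, an empty string, or NaN (how pandas typically marks missing floats); both helpers are hypothetical and the class name in the test is illustrative.

```python
import math


def is_missing(value):
    """True for None, empty strings and NaN."""
    return value is None or value == "" or (
        isinstance(value, float) and math.isnan(value)
    )


def with_default_class(features):
    """Copy feature rows, using "Miscellaneous" where the class is missing."""
    return [
        {**row, "class": "Miscellaneous" if is_missing(row.get("class")) else row["class"]}
        for row in features
    ]
```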
### gene_types
#### Gene Type Column Names
name
The name of the gene type.
type - (character)
display
A friendly display name for the gene type.
type - (character)
### genes
#### Gene Column Names
entrez (required)
The entrez identifier of the gene. This is used throughout the app to uniquely identify the gene. This is REQUIRED.
type - (numeric)
hgnc
The Hugo Id of the gene.
type - (character)
description
A description of the gene.
type - (character)
friendly_name (optional)
A human friendly display name for the gene.
type - (character)
io_landscape_name (optional)
The IO Landscape name for the gene.
type - (character)
gene_family (optional)
The gene family of the gene.
type - (character)
gene_function (optional)
The gene function of the gene.
type - (character)
immune_checkpoint (optional)
The immune checkpoint for the gene.
type - (character)
pathway (optional)
The pathway of the gene.
type - (character)
super_category (optional)
The super category of the gene.
type - (character)
therapy_type (optional)
The therapy type of the gene.
type - (character)
### mutation_codes
#### Mutation Code Column Names
code
The mutation code. This is REQUIRED.
type - (character)
### mutation_types
#### Mutation Type Column Names
name
The name of the mutation type.
type - (character)
display
A friendly display name for the mutation type.
type - (character)
### mutations
#### Mutation Column Names
entrez (required)
The entrez id of a gene. These genes MUST exist in data in the `genes` folder.
type - (numeric)
mutation_code (required)
The code (name) of a mutation code. These mutation codes MUST exist in data in the `mutation_codes` folder.
type - (character)
mutation_type
The name of a mutation type. These mutation types MUST exist in data in the `mutation_types` folder.
type - (character)
### nodes
#### Node Column Names
A node may use a gene OR a feature. One of these is REQUIRED.
name
The name of the node. Each node must have a unique name; we suggest `<dataset_name>_<network_name>_<row_number>`.
type - (character)
entrez
The entrez id of a gene. These genes MUST exist in data in the `genes` folder.
type - (numeric)
feature
The name of the feature. These features MUST exist in data in the `features` folder.
type - (character)
network
The network tag related to the node. These tags MUST exist in data in the `tags` folder.
type - (character)
dataset
The name of a dataset. These unique names MUST exist in data in the `datasets` folder.
type - (character)
tag
A tag related to the node. These tags MUST exist in data in the `tags` folder.
For a node to be used in edges, this column name MUST also exist in the edges data.
type - (character)
`tag_*` (optional)
Additional tags related to the node. These tags MUST exist in data in the `tags` folder.
The column name MUST start with `tag` but may be followed by an underscore (`_`) and some additional descriptive text, e.g. `tag_second` or `tag_01`. There may be as many tag columns as needed.
This column name MUST also exist in the nodes data.
type - (character)
label (optional)
The label of the node.
type - (character)
score (optional)
The numeric value of the node.
type - (numeric)
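Two of the node rules lend themselves to simple checks: tag columns are `tag` or `tag_` followed by descriptive text, and every node needs an `entrez` or a `feature`. A minimal sketch, assuming rows are dicts with `None` or an absent key for missing values; it checks for at least one of the two, and both helpers are hypothetical.

```python
import re

# "tag" alone, or "tag_" followed by descriptive text (tag_second, tag_01, ...).
TAG_COLUMN = re.compile(r"tag(_.+)?")


def tag_columns(columns):
    """Return the node columns that hold tags, in their original order."""
    return [c for c in columns if TAG_COLUMN.fullmatch(c)]


def nodes_without_gene_or_feature(nodes):
    """Return names of nodes that have neither an entrez id nor a feature."""
    return [
        n["name"] for n in nodes
        if n.get("entrez") is None and n.get("feature") is None
    ]


print(tag_columns(["name", "entrez", "tag", "tag_second", "tag_01", "tagline"]))
# ['tag', 'tag_second', 'tag_01']
```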
### patients
#### Patients Column Names
barcode
The unique identifier representing a patient.
type - (character)
age (optional)
The age of the patient.
type - (character)
ethinicity (optional)
The ethnicity of the patient.
type - (character)
gender (optional)
The gender of the patient.
type - (character)
height (optional)
The height of the patient.
type - (character)
race (optional)
The race of the patient.
type - (character)
weight (optional)
The weight of the patient.
type - (character)
### samples
#### Sample Column Names
name
The unique identifier representing the sample.
type - (character)
patient_barcode
The unique identifier representing a patient related to the sample. The patient MUST exist in the data in the `patients` folder.
type - (character)
dataset
The name of the dataset. These unique names MUST exist in data in the `datasets` folder.
type - (character)
### slides
#### Slide Column Names
name
The unique identifier representing the slide.
type - (character)
patient_barcode
The unique identifier representing a patient related to the slide. The patient MUST exist in the data in the `patients` folder.
type - (character)
### tags
#### Tag Column Names
Tags may be used to group various pieces of data. At a base level, a tag is simply a string (with some descriptive metadata). Multiple pieces of data may be related by tagging them. Tags may even be tagged to create the semblance of a hierarchy.
name
The unique identifying name of the tag.
type - (character)
characteristics
Any identifying characteristics of the tag.
type - (character)
display
A human-friendly display name for the tag.
type - (character)
color
A specific hex value to represent the tag by color.
type - (character)
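A quick format check for the `color` column. This README only says "hex value", so the sketch assumes the common six-digit `#RRGGBB` notation; `is_hex_color` is a hypothetical helper and would need loosening if short or alpha forms are accepted.

```python
import re

# Assumes "#RRGGBB" notation; the README only says "hex value".
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def is_hex_color(value):
    """True if `value` is a six-digit hex color such as "#1F77B4"."""
    return isinstance(value, str) and HEX_COLOR.fullmatch(value) is not None
```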
### publications
#### Publications Column Names
pubmed_id
The unique PubMed id, as used in the URL `https://pubmed.ncbi.nlm.nih.gov/{id}`.
type - (integer)
journal
The journal the publication appeared in.
type - (character)
first_author_last_name
The last name of the first author.
type - (character)
year
The year of publication.
type - (integer)
title
The title of the publication.
type - (character)
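The `pubmed_id` maps directly onto the PubMed URL pattern given above. A minimal sketch; `pubmed_url` is a hypothetical helper and the id in the test is illustrative.

```python
def pubmed_url(pubmed_id):
    """PubMed page for a publication, following the URL pattern in this README.

    int() also accepts ids that were loaded as floats (e.g. 12345678.0).
    """
    return f"https://pubmed.ncbi.nlm.nih.gov/{int(pubmed_id)}"
```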
### relationships
Often data is about relationships. The following folders hold relationship data. Each relationship depends on the original pieces of data being present in their respective folders.
#### publications_to_genes
The pubmed id of the publication. These unique ids MUST exist in data in the `publications` folder.
type - (integer)
The entrez id of the gene. These unique ids MUST exist in data in the `genes` folder.
type - (integer)
#### datasets_to_tags
The name of the dataset. These unique names MUST exist in data in the `datasets` folder.
type - (character)
The name of the tag. These unique names MUST exist in data in the `tags` folder.
type - (character)
#### features_to_samples
The name of the feature. These features MUST exist in data in the `features` folder.
type - (character)
The name of the sample. These samples MUST exist in data in the `samples` folder.
type - (character)
The numeric value of the feature to sample relationship. The unit of the value is expressed in the features data.
type - (numeric)
#### genes_to_samples
The entrez id of a gene. These genes MUST exist in data in the `genes` folder.
type - (numeric)
The name of the sample. These samples MUST exist in data in the `samples` folder.
type - (character)
The numeric RNA-seq expression value of the gene in the sample.
type - (numeric)
#### genes_to_types
The entrez id of a gene. These genes MUST exist in data in the `genes` folder.
type - (numeric)
The type of gene this specific gene is related to. These gene types MUST exist in data in the `gene_types` folder.
type - (character)
#### samples_to_mutations
The name of the sample. These samples MUST exist in data in the `samples` folder.
type - (character)
The entrez id of a gene. These genes MUST exist in data in the `genes` folder.
type - (numeric)
The code (name) of the mutation code. These mutation codes MUST exist in data in the `mutation_codes` folder.
type - (character)
The name of the mutation type. These mutation types MUST exist in data in the `mutation_types` folder.
type - (character)
The status of the gene in this specific relationship. May be `Wt` (Wild Type) or `Mut` (Mutant).
type - (STATUS_ENUM)
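The status column accepts only the two values above. A minimal sketch that takes the status values as a plain list (this section does not name the column); treating the comparison as case-sensitive is an assumption, and `invalid_statuses` is a hypothetical helper.

```python
# The two STATUS_ENUM values described in this README.
STATUS_VALUES = {"Wt", "Mut"}


def invalid_statuses(statuses):
    """Return the distinct status values that are not "Wt" or "Mut".

    The comparison is case-sensitive (an assumption), so "WT" is rejected.
    """
    return sorted(set(statuses) - STATUS_VALUES)


print(invalid_statuses(["Wt", "Mut", "WT", "mutant", "Mut"]))
# ['WT', 'mutant']
```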
#### samples_to_tags
The name of the sample. These samples MUST exist in data in the `samples` folder.
type - (character)
The tag related to the sample. These tags MUST exist in data in the `tags` folder.
type - (character)
#### tags_to_tags
The name of the tag. These tags MUST exist in data in the `tags` folder.
type - (character)
The tag related to the initial tag. These tags MUST exist in data in the `tags` folder.
type - (character)
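Tagging tags is what gives tags their semblance of hierarchy. This section does not name its columns, so the minimal sketch below reads each row as a `(tag, related_tag)` pair and groups the related tags per tag. `related_tags` and the tag names are hypothetical.

```python
from collections import defaultdict


def related_tags(pairs):
    """Map each tag to the set of tags it is related to.

    `pairs` holds one (tag, related_tag) tuple per tags_to_tags row.
    """
    related = defaultdict(set)
    for tag, other in pairs:
        related[tag].add(other)
    return dict(related)


rows = [("C1", "Immune_Subtype"), ("C2", "Immune_Subtype"), ("C1", "TCGA_Study")]
print(sorted(related_tags(rows)["C1"]))
# ['Immune_Subtype', 'TCGA_Study']
```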