TFpredict: TFpredict
In brengong/ConservationtextmineR:


If type = "single", uses a data.table with a column named "Targeting_Factor" and a column named "Consensus_Sequence" to predict which targeting factor consensus sequences are located within a protein, DNA, or RNA sequence. Requires a character vector containing the protein, DNA, or RNA sequence of interest. Requires a transcription factor data.table with a column labeled "Targeting_Factor" housing the targeting protein of interest and a column labeled "Consensus_Sequence" housing the targeting protein consensus sequence.

TFpredict(Target, Targeting_Factor_DT, type)

`Target`	If type = "single", a vector containing one protein, DNA, or RNA character string. If type = "multiple", a data.table containing two columns, one labeled "Sequence" and one labeled "gene_symbol". If type = "multiple_species", a data.table containing four columns, one labeled "Sequence", one labeled "gene_symbol", one labeled "Common_Name", and one labeled "Scientific_Name". If type = "multiple_species_unknown", a data.table containing three columns, one labeled "Sequence", one labeled "gene_symbol", and one labeled "Scientific_Name".
`Targeting_Factor_DT`	a data.table with two columns labeled "Targeting_Factor" and "Consensus_Sequence"
`type`	a single character string: "single", "multiple", "multiple_species", or "multiple_species_unknown"
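
The column requirements above can be summarized in a small check. This is only an illustrative sketch in base R: `required_target_columns` and `check_target` are hypothetical names that are not part of ConservationtextmineR, and the package may validate its inputs differently or not at all.

```r
# Hypothetical helper (not part of ConservationtextmineR): checks that
# `Target` has the columns the argument list names for each `type`.
required_target_columns <- list(
  multiple                 = c("Sequence", "gene_symbol"),
  multiple_species         = c("Sequence", "gene_symbol",
                               "Common_Name", "Scientific_Name"),
  multiple_species_unknown = c("Sequence", "gene_symbol", "Scientific_Name")
)

check_target <- function(Target, type) {
  if (type == "single") {
    # "single" expects one character string rather than a table
    return(is.character(Target) && length(Target) == 1)
  }
  needed <- required_target_columns[[type]]
  if (is.null(needed)) stop("unknown type: ", type)
  all(needed %in% names(Target))
}

# Small demonstration table without a Common_Name column
demo <- data.frame(
  Sequence = c("AAGCT", "GCGC"),
  gene_symbol = c("AKT", "PI3K"),
  Scientific_Name = c("Rattus_norvegicus", "Mus_musculus"),
  stringsAsFactors = FALSE
)

check_target(demo, "multiple")                  # TRUE
check_target(demo, "multiple_species")          # FALSE: no Common_Name column
check_target(demo, "multiple_species_unknown")  # TRUE
check_target("AAGCT", "single")                 # TRUE
```

A table that lacks "Common_Name" fails the "multiple_species" check but passes "multiple_species_unknown", which is the case that type is described as covering.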

If type = "multiple", uses two data.tables. The first (Targeting_Factor_DT) houses a column named "Targeting_Factor" and a column named "Consensus_Sequence". The second (Target) houses a column named "Sequence" and a column named "gene_symbol". Returns which targeting protein consensus sequences housed in Targeting_Factor_DT are located within the protein, DNA, or RNA sequences housed in Target. Requires a data.table (Targeting_Factor_DT) with a column named "Targeting_Factor" and a column named "Consensus_Sequence" housing the consensus sequence. Requires a data.table (Target) with columns named "Sequence" and "gene_symbol" housing the protein, DNA, or RNA sequences and the designated names to query.

If type = "multiple_species", uses two data.tables. The first (Targeting_Factor_DT) houses a column named "Targeting_Factor" and a column named "Consensus_Sequence". The second (Target) houses a column named "Sequence", a column named "gene_symbol", a column named "Common_Name", and a column named "Scientific_Name". Returns which consensus sequences housed in Targeting_Factor_DT are located within the protein, DNA, or RNA sequences housed in Target across multiple species. Requires a data.table (Targeting_Factor_DT) with a column named "Targeting_Factor" and a column named "Consensus_Sequence" housing the consensus sequence. Requires a data.table (Target) with columns named "Sequence", "gene_symbol", "Common_Name", and "Scientific_Name" housing the protein, DNA, or RNA sequences and the designated names to query.

If type = "multiple_species_unknown", used when the Target data.table does not contain a column named "Common_Name". Uses two data.tables. The first (Targeting_Factor_DT) houses a column named "Targeting_Factor" and a column named "Consensus_Sequence". The second (Target) houses a column named "Sequence", a column named "gene_symbol", and a column named "Scientific_Name". Returns which consensus sequences housed in Targeting_Factor_DT are located within the protein, DNA, or RNA sequences housed in Target across multiple species. Requires a data.table (Targeting_Factor_DT) with a column named "Targeting_Factor" and a column named "Consensus_Sequence" housing the consensus sequence. Requires a data.table (Target) with columns named "Sequence", "gene_symbol", and "Scientific_Name" housing the protein, DNA, or RNA sequences and the designated names to query.
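
To make the idea of locating consensus sequences within a target sequence concrete, here is a minimal base R sketch. It is not the TFpredict source. It assumes each consensus sequence is interpreted as a regular expression, which entries such as "(A|T)..(H|F)" in the examples suggest but this page does not confirm. The function name `find_consensus` and the `start`/`end` columns are hypothetical and may not match the columns TFpredict returns.

```r
# Hypothetical sketch, not the TFpredict implementation: treats each
# consensus sequence as a regular expression and reports where it matches.
find_consensus <- function(sequence, factors, consensus) {
  hits <- lapply(seq_along(consensus), function(i) {
    # gregexpr returns the start of every non-overlapping match, or -1
    m <- gregexpr(consensus[i], sequence, perl = TRUE)[[1]]
    if (m[1] == -1) return(NULL)  # this consensus sequence is absent
    data.frame(
      Targeting_Factor = factors[i],
      Consensus_Sequence = consensus[i],
      start = as.integer(m),
      end = as.integer(m) + attr(m, "match.length") - 1L,
      stringsAsFactors = FALSE
    )
  })
  hits <- Filter(Negate(is.null), hits)  # drop factors with no match
  do.call(rbind, hits)
}

res <- find_consensus(
  "AAGCTAAGCTGCGC",
  factors = c("KLF2", "PGC1A", "MYC"),
  consensus = c("AAGCT", "GCGC", "ATA")
)
res
```

For this toy input, "AAGCT" is found at positions 1 and 6 and "GCGC" at position 11, while "ATA" does not occur, so `res` has three rows. The "multiple" variants could be approximated by applying such a function to each row of Target and carrying along "gene_symbol" and the species columns.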

A data table

Brendan Gongol

Targeting_Factor <- c("AMPK", "PKA", "PKC", "MAPK", "CAMKKB", "CAMKI", "CAMKIV", "CKII", "CDK", "SRC", "AKT")
Consensus_Sequence <- c("AGCNVTQ", "PPKLYS", "AAT","AAAAAAAAAAATTGCNVMDEDE", "AA", "ATA", "GTTT", "AAAAAAAAAAAAAAAAAAAAA", "(A|T)..(H|F)", "GCTAAGCTGCGCAATTTTTGTATTTTGT|AGTTCTTTTTGTGTATTAGCTCAGATTTTCCAGCTG","(A|T)T(A|G)(C|T)(C|T)T(C|T)")

library(data.table)  # provides data.table(); harmless if already attached

data_table <- data.table(cbind(Targeting_Factor, Consensus_Sequence))
data_table

Prot <- "ACQVAPKLHGEAGCNVTQDWTYMMGVCSTASYAATWEQDEPLWYMAATNGHCATWWAAASSCATAFQTSKLPIIIGHATSDF"

Sequence <- c("ACDFEQAGCNVTQPCTSTSGANDEPHYYASTGFWYKAGCNVTQETCCKLLHAQSWW",
              "ACQVAPKLHGEDWTYMMGVCSTASYWEQDEPLWYMNGHCATWWAAASSCTAQTSKLPIIIGHATSDF",
              "TGHATSHCTANMKLPYWQEDTGSCANMHGTYYYDEDEDDASQWWWMNNNCGYTEWSDFGCPKKK",
              "AAAATSTSTSGGGGGAAACCCCNNNNMMMMPPPPWWWWQHGGTTYYNNCCAA", "AAAAAATTTTTTCCCCCCGGGGGG")
gene_symbol <- c("PABP", "EIF4E", "SREBP", "FOXO", "ABCA1")
Common_Name <- c("Rat", "Mouse", "Human", "Cattle", "Dog")
Scientific_Name <- c("Rattus_norvegicus", "Mus_musculus", "Homo_sapiens", "Bos_taurus", "Canis_familiaris")
proteinTarg <- data.frame(cbind(gene_symbol,Sequence, Common_Name, Scientific_Name))
proteinTarg$Sequence <- as.character(proteinTarg$Sequence)
proteinTarg$gene_symbol <- as.character(proteinTarg$gene_symbol)
proteinTarg$Common_Name <- as.character(proteinTarg$Common_Name)
proteinTarg$Scientific_Name <- as.character(proteinTarg$Scientific_Name)
proteinTarg


TFpredict(Prot, data_table, type = "single")
TFpredict(proteinTarg, data_table, type = "multiple")
TFpredict(proteinTarg, data_table, type = "multiple_species")
TFpredict(proteinTarg, data_table, type = "multiple_species_unknown")


Targeting_Factor <- c("KLF2", "PGC1A", "FOXO1", "NCL", "SREBP", "MYC", "HIF", "NF-KB", "TXNIP", "PPAR")
Consensus_Sequence <- c("AAGCT", "GCGC", "AAT","AAAAAAAAAAAAAAAAAAAAAAAA", "AA", "ATA", "GTTT", "AAAAAAAAAAAAAAAAAAAAA", "(A|T)..(C|G)", "GCTAAGCTGCGCAATTTTTGTATTTTGT|AGTTCTTTTTGTGTATTAGCTCAGATTTTCCAGCTG")
TX_data_table <- data.table(cbind(Targeting_Factor, Consensus_Sequence))
TX_data_table

chromo_seq <- "AAGCTAAGCTAAGCTGCGCAATTTTTGTATTTTGTTTAAACAGAATCCTCAAGGGAACATCATCCTCAGTTCTTTTTGTGTATTAGCTCAGATTTTCCAGCTGTTTTTAAAGCT"


Sequence <- c("AAGCTAAGCTAAGCTGCGCAATTTTTGTATTTTGTTTAAACAGAATCCTCAAGGGAACATCATCCTCAGTTCTTTTTGTGTATTAGCTCAGATTTTCCAGCTGTTTTTAAAGCT",
              "CTGTTTCGAGCCTGAATCTCGATCGCTCGCGCTAGACAGCTCGACGCACTTTTCAGCAGGAGCCTG",
              "TCAGCAGATAGCGCTCGATACAGCTCGACAGCTCTTGCTGTATTGTGTG",
              "TTGCTGTATTGTGTGATCCTCGATACAGGTATTTTCTGAGCCTGATAGCTAGCTTTGCTGTATTGTGTG",
              "AAAAAATTTTTTCCCCCCGGGGGG")
gene_symbol <- c("AKT", "PI3K", "SREBP", "FOXO", "ABCA1")
Common_Name <- c("Rat", "Mouse", "Human", "Cattle", "Dog")
Scientific_Name <- c("Rattus_norvegicus", "Mus_musculus", "Homo_sapiens", "Bos_taurus", "Canis_familiaris")
chromo <- data.frame(cbind(Sequence, gene_symbol, Common_Name, Scientific_Name))
chromo$Sequence <- as.character(chromo$Sequence)
chromo$gene_symbol <- as.character(chromo$gene_symbol)
chromo$Common_Name <- as.character(chromo$Common_Name)
chromo$Scientific_Name <- as.character(chromo$Scientific_Name)
chromo



TFpredict(chromo_seq, TX_data_table, type = "single")
TFpredict(chromo, TX_data_table, type = "multiple")
TFpredict(chromo, TX_data_table, type = "multiple_species")
TFpredict(chromo, TX_data_table, type = "multiple_species_unknown")