GAN: Generative adversarial network for generating protein...
In dongminjung/GenProSeq: Generating Protein Sequences with Deep Generative Models

GAN	R Documentation

Generative adversarial network for generating protein sequences

Description

The generative adversarial network (GAN) is made up of a discriminator and a generator that compete in a two-player minimax game. The objective of the generator is to produce an output that is so close to real that it confuses the discriminator in being able to differentiate the fake data from the real data. The conditional GAN (CGAN) is based on vanilla GAN with additional conditional input to generator and discriminator. The auxiliary classifier GAN (ACGAN) is an extension of CGAN that adds conditional input only to the generator. The Word2vec is applied to amino acids for embedding. The GAN or ACGAN model can be trained by the function "fit_GAN", and then the function "gen_GAN" generates protein sequences from the trained model.

Usage

fit_GAN(prot_seq,
        label = NULL,
        length_seq,
        embedding_dim,
        embedding_args = list(),
        latent_dim = NULL,
        intermediate_generator_layers,
        intermediate_discriminator_layers,
        prot_seq_val = NULL,
        label_val = NULL,
        epochs,
        batch_size,
        preprocessing = list(
            x_train = NULL,
            x_val = NULL,
            y_train = NULL,
            y_val = NULL,
            lenc = NULL,
            length_seq = NULL,
            num_seq = NULL,
            embedding_dim = NULL,
            embedding_matrix = NULL,
            removed_prot_seq = NULL,
            removed_prot_seq_val = NULL,
            latent_dim = NULL),
        optimizer = "adam",
        validation_split = 0)

gen_GAN(x,
        label = NULL,
        num_seq,
        remove_gap = TRUE)

Arguments

`prot_seq`	aligned amino acid sequence
`label`	label (default: NULL)
`length_seq`	length of sequence
`embedding_dim`	dimension of the dense embedding
`embedding_args`	list of arguments for "word2vec::word2vec" but for dim, min_count and split
`latent_dim`	dimension of latent vector (default: NULL)
`intermediate_generator_layers`	list of intermediate layers for generator, without input layer
`intermediate_discriminator_layers`	list of intermediate layers for discriminator, without output layer
`prot_seq_val`	amino acid sequence for validation (default: NULL)
`label_val`	label for validation (default: NULL)
`epochs`	number of epochs
`batch_size`	batch size
`preprocessing`	list of preprocessed results, they are set to NULL as default x_train : embedded sequence data for train x_val : embedded sequence data for validation y_train : labels for train y_val : labels for validation lenc : encoded labels length_seq : length of sequence num_seq : number of sequences for train embedding_dim : dimension of the dense embedding embedding_matrix : embedding matrix removed_prot_seq : index for removed protein sequences while checking removed_prot_seq_val : index for removed protein sequences of validation latent_dim : dimension of latent vector
`optimizer`	name of optimizer (default: adam)
`validation_split`	proportion of validation data, it is ignored when there is a validation set (default: 0)
`x`	result of the function "fit_GAN"
`num_seq`	number of sequences to be generated
`remove_gap`	remove gaps from sequences (default: TRUE)

Value

`model`	trained GAN model
`generator`	trained generator model
`discriminator`	trained discriminator model
`preprocessing`	preprocessed results
`gen_seq`	generated sequence data
`label`	labels for generated sequence data

Author(s)

Dongmin Jung

References

Liebowitz, J. (Ed.). (2020). Data Analytics and AI. CRC Press.

Pedrycz, W., & Chen, S. M. (Eds.). (2020). Deep Learning: Concepts and Architectures. Springer.

Suguna, S. K., Dhivya, M., & Paiva, S. (Eds.). (2021). Artificial Intelligence (AI): Recent Trends and Applications. CRC Press.

Sun, S., Mao, L., Dong, Z., & Wu, L. (2019). Multiview machine learning. Springer.

Examples

# model parameters
length_seq <- 403
embedding_dim <- 8
latent_dim <- 4
epochs <- 2
batch_size <- 64

# GAN
GAN_result <- fit_GAN(prot_seq = example_PTEN,
                    length_seq = length_seq,
                    embedding_dim = embedding_dim,
                    latent_dim = latent_dim,
                    intermediate_generator_layers = list(
                        layer_dense(units = 16),
                        layer_dense(units = 128)),
                    intermediate_discriminator_layers = list(
                        layer_dense(units = 128, activation = "relu"),
                        layer_dense(units = 16, activation = "relu")),
                    prot_seq_val = example_PTEN,
                    epochs = epochs,
                    batch_size = batch_size)
set.seed(1)
gen_prot_GAN <- gen_GAN(GAN_result, num_seq = 100)


### from preprocessing
GAN_result2 <- fit_GAN(preprocessing = GAN_result$preprocessing,
                        intermediate_generator_layers = list(
                            layer_dense(units = 16),
                            layer_dense(units = 128)),
                        intermediate_discriminator_layers = list(
                            layer_dense(units = 128, activation = "relu"),
                            layer_dense(units = 16, activation = "relu")),
                        epochs = epochs,
                        batch_size = batch_size)
gen_prot_GAN <- gen_GAN(GAN_result2, num_seq = 100)

dongminjung/GenProSeq documentation built on May 3, 2022, 10:28 p.m.