| nlp_medical_ner | R Documentation | 
This named entity recognition annotator trains a generic NER model based on neural networks. Its training data (train_ner) is either a labeled Spark dataset or an external CoNLL 2003 IOB-based Spark dataset with Annotation columns. The user must also provide a word embeddings annotation column.
nlp_medical_ner(
  x,
  input_cols,
  output_col,
  label_col = NULL,
  max_epochs = NULL,
  lr = NULL,
  po = NULL,
  batch_size = NULL,
  dropout = NULL,
  verbose = NULL,
  include_confidence = NULL,
  random_seed = NULL,
  graph_folder = NULL,
  validation_split = NULL,
  eval_log_extended = NULL,
  enable_output_logs = NULL,
  output_logs_path = NULL,
  enable_memory_optimizer = NULL,
  pretrained_model_path = NULL,
  override_existing_tags = NULL,
  tags_mapping = NULL,
  test_dataset = NULL,
  use_contrib = NULL,
  log_prefix = NULL,
  include_all_confidence_scores = NULL,
  graph_file = NULL,
  uid = random_string("medical_ner_")
)
x | 
 A spark_connection, ml_pipeline, or a tbl_spark.  | 
input_cols | 
 Input columns. String array.  | 
output_col | 
 Output column. String.  | 
label_col | 
 If DatasetPath is not provided, this column of Annotation type should contain the label for each token (string)  | 
max_epochs | 
 Maximum number of epochs to train (integer)  | 
lr | 
 Initial learning rate (float)  | 
po | 
 Learning rate decay coefficient. Real Learning Rate: lr / (1 + po * epoch) (float)  | 
batch_size | 
 Batch size for training (integer)  | 
dropout | 
 Dropout coefficient (float)  | 
verbose | 
 Verbosity level (integer)  | 
include_confidence | 
 whether to include confidence values (boolean)  | 
random_seed | 
 Random seed (integer)  | 
graph_folder | 
 folder path that contains external graph files  | 
validation_split | 
 proportion of the data to use for validation (float)  | 
eval_log_extended | 
 whether to extend the validation logs so that they display the time and the evaluation of each label (boolean)  | 
enable_output_logs | 
 whether to enable the TensorFlow output logs (boolean)  | 
output_logs_path | 
 path for the output logs  | 
enable_memory_optimizer | 
 whether to allow training NerDLApproach on a dataset larger than the available memory  | 
pretrained_model_path | 
 set the location of an already trained MedicalNerModel, which is used as a starting point for training the new model.  | 
override_existing_tags | 
 controls whether to override already learned tags when using a pretrained model to initialize the new model.  | 
tags_mapping | 
 a string list specifying how old tags are mapped to new ones. (e.g. c("B-PER,B-VIP", "I-PER,I-VIP"))  | 
test_dataset | 
 path to test dataset  | 
use_contrib | 
 whether to use contrib LSTM cells  | 
log_prefix | 
 a string prefix to be included in the logs  | 
include_all_confidence_scores | 
 whether to include confidence scores in annotation metadata  | 
graph_file | 
 Folder path that contains external graph files  | 
uid | 
 A character string used to uniquely identify the ML estimator.  | 
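Two of the arguments above, po and tags_mapping, can be illustrated in plain R without a Spark session. The sketch below is not part of the package: the helper names `effective_lr` and `make_tags_mapping` are hypothetical, the numeric values are illustrative rather than documented defaults, and counting epochs from 0 is an assumption.

```r
# Learning rate decay as stated for `po`: lr / (1 + po * epoch).
# Assumption: epochs are counted from 0; the documentation does not say.
effective_lr <- function(lr, po, epoch) {
  lr / (1 + po * epoch)
}

# Build the "old,new" strings expected by `tags_mapping` from a named
# character vector whose names are the old tags and values the new tags.
make_tags_mapping <- function(mapping) {
  unname(paste(names(mapping), mapping, sep = ","))
}

# Illustrative values only, not documented defaults.
schedule <- data.frame(
  epoch = 0:4,
  effective_lr = effective_lr(lr = 0.001, po = 0.005, epoch = 0:4)
)
print(schedule)

# Reproduces the mapping used as the example for `tags_mapping`.
tags <- make_tags_mapping(c("B-PER" = "B-VIP", "I-PER" = "I-VIP"))
print(tags)
```

A larger po shrinks the learning rate faster as epochs accumulate, while po = 0 keeps it fixed at lr.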
The neural network architecture is Char CNNs - BiLSTM - CRF, which achieves state-of-the-art results on most datasets. See https://nlp.johnsnowlabs.com/docs/en/annotators#ner-dl
The object returned depends on the class of x.
spark_connection: When x is a spark_connection, the function returns an instance of a ml_estimator object. The object contains a pointer to
a Spark Estimator object and can be used to compose
Pipeline objects.
ml_pipeline: When x is a ml_pipeline, the function returns a ml_pipeline with
the NLP estimator appended to the pipeline.
tbl_spark: When x is a tbl_spark, an estimator is constructed then
immediately fit with the input tbl_spark, returning an NLP model.
When x is a spark_connection, the function returns a NerDLApproach estimator.
When x is a ml_pipeline, it returns the pipeline with the NerDLApproach added. When x
is a tbl_spark, it returns a transformed tbl_spark (note that the DataFrame passed in must have the columns specified in input_cols).
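A minimal sketch of how the function might be called follows. It assumes the sparklyr and sparknlp packages are installed and that the Spark session can load the library this annotator wraps, which may require a license. The wrapper name `build_medical_ner`, the column names, and the numeric values are all hypothetical; only the argument names come from the usage shown earlier.

```r
# Hypothetical wrapper: configures the estimator for whatever `x` is passed in
# (a spark_connection, a ml_pipeline, or a tbl_spark).
build_medical_ner <- function(x) {
  sparknlp::nlp_medical_ner(
    x,
    # Assumed names of columns produced by upstream stages
    # (sentence detection, tokenization, word embeddings).
    input_cols = c("sentence", "token", "embeddings"),
    output_col = "ner",      # assumed output column name
    label_col = "label",     # assumed column holding per-token labels
    # Illustrative training settings, not documented defaults.
    max_epochs = 10,
    lr = 0.001,
    batch_size = 8,
    validation_split = 0.2,
    random_seed = 42
  )
}
```

Given a connection `sc` from `sparklyr::spark_connect()`, `build_medical_ner(sc)` would return the estimator, and `build_medical_ner(sparklyr::ml_pipeline(sc))` would return a pipeline with the estimator appended.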