LogisticRegression: Logistic Regression


Description

Logistic Regression (aka logit, MaxEnt) classifier.

In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the 'multi_class' option is set to 'ovr', and uses the cross-entropy loss if the 'multi_class' option is set to 'multinomial'. (Currently the 'multinomial' option is supported only by the 'lbfgs', 'sag' and 'newton-cg' solvers.)

This class implements regularized logistic regression using the 'liblinear' library and the 'newton-cg', 'sag' and 'lbfgs' solvers. It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).

The 'newton-cg', 'sag' and 'lbfgs' solvers support only L2 regularization with primal formulation. The 'liblinear' solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty.

Read more in the scikit-learn User Guide.

Usage

rsk_LogisticRegression

LogisticRegression(x, y, penalty = "l2", dual = FALSE, C = 1,
  fit_intercept = TRUE, intercept_scaling = 1, class_weight = NULL,
  max_iter = 100, random_state = NULL, solver = "liblinear",
  tol = 1e-04, multi_class = "ovr", verbose = 0, warm_start = FALSE,
  n_jobs = 1)
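
As an illustration, a minimal call could look like the following. This is a sketch only: it assumes the rsk package is attached and uses the built-in iris data as placeholder input; these objects and the exact form of y are assumptions, not part of the documented interface.

library(rsk)

x <- as.matrix(iris[, 1:4])                    # numeric feature matrix
y <- as.matrix(as.integer(iris$Species) - 1L)  # integer class labels 0, 1, 2

fit <- LogisticRegression(x, y, C = 1, penalty = "l2",
  solver = "liblinear", multi_class = "ovr")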

Arguments

x

matrix. Training Data

y

matrix. Target Values

penalty

str, 'l1' or 'l2', default: 'l2' Used to specify the norm used in the penalization. The 'newton-cg', 'sag' and 'lbfgs' solvers support only l2 penalties.

dual

bool, default: False Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

C

float, default: 1.0 Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
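
For a sense of scale, a sketch (x and y as constructed in the example under Usage):

# Smaller C = stronger regularization; larger C = weaker regularization
fit_strong <- LogisticRegression(x, y, C = 0.01)
fit_weak   <- LogisticRegression(x, y, C = 100)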

fit_intercept

bool, default: True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

intercept_scaling

float, default: 1. Useful only when the solver 'liblinear' is used and fit_intercept is set to TRUE. In this case, x becomes (x, intercept_scaling), i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight. Note that the synthetic feature weight is subject to L1/L2 regularization like all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.

class_weight

dict or 'balanced', default: None. Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified. New in version 0.17: class_weight='balanced' instead of deprecated class_weight='auto'.
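
A sketch of both forms (x and y as in the example under Usage). Passing the per-class weights as a named list is an assumption about this wrapper, not something stated above:

# 'balanced': weights inversely proportional to class frequencies in y
fit_bal <- LogisticRegression(x, y, class_weight = "balanced")

# Explicit weights, one entry per class label (named list standing in for the
# Python-style dict {class_label: weight}; this form is assumed, not documented)
fit_wtd <- LogisticRegression(x, y, class_weight = list("0" = 1, "1" = 1, "2" = 5))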

max_iter

int, default: 100 Useful only for the newton-cg, sag and lbfgs solvers. Maximum number of iterations taken for the solvers to converge.

random_state

int seed, RandomState instance, default: None The seed of the pseudo random number generator to use when shuffling the data. Used only in solvers 'sag' and 'liblinear'.

solver

str, 'newton-cg', 'lbfgs', 'liblinear' or 'sag', default: 'liblinear'. Algorithm to use in the optimization problem.

  • For small datasets, 'liblinear' is a good choice, whereas 'sag' is faster for large ones.

  • For multiclass problems, only 'newton-cg', 'sag' and 'lbfgs' handle multinomial loss; 'liblinear' is limited to one-versus-rest schemes.

  • 'newton-cg', 'lbfgs' and 'sag' handle only the L2 penalty (see the sketch after this list). Note that fast convergence with 'sag' is only guaranteed on features with approximately the same scale; you can preprocess the data with a scaler from sklearn.preprocessing. New in version 0.17: Stochastic Average Gradient descent solver.
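
A sketch of the penalty/solver pairing noted in the last bullet (x and y as in the example under Usage; for illustration only):

# 'liblinear' (the default) accepts both the "l1" and "l2" penalties
fit_l1 <- LogisticRegression(x, y, penalty = "l1", solver = "liblinear")

# 'newton-cg', 'lbfgs' and 'sag' accept only the "l2" penalty
fit_l2 <- LogisticRegression(x, y, penalty = "l2", solver = "lbfgs")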

tol

float, default: 1e-4 Tolerance for stopping criteria.

multi_class

str, 'ovr' or 'multinomial', default: 'ovr'. If the option chosen is 'ovr', then a binary problem is fit for each label. Otherwise the loss minimised is the multinomial loss fit across the entire probability distribution. The 'multinomial' option works only with the 'newton-cg', 'sag' and 'lbfgs' solvers. New in version 0.18: Stochastic Average Gradient descent solver for the 'multinomial' case.
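
For example (x and y as in the example under Usage, where y holds three classes; sketch only):

# One binary problem fit per class; works with every solver, including 'liblinear'
fit_ovr <- LogisticRegression(x, y, multi_class = "ovr")

# A single multinomial (cross-entropy) fit; needs 'lbfgs', 'sag' or 'newton-cg'
fit_mnl <- LogisticRegression(x, y, multi_class = "multinomial", solver = "lbfgs")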

verbose

int, default: 0 For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.

warm_start

bool, default: False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. New in version 0.17: warm_start to support lbfgs, newton-cg, sag solvers.

n_jobs

int, default: 1 Number of CPU cores used during the cross-validation loop. If given a value of -1, all cores are used.

Format

An object of class R6ClassGenerator of length 24.

