Class LogisticRegression (1.31.0)

LogisticRegression(
    *,
    optimize_strategy: typing.Literal[
        "auto_strategy", "batch_gradient_descent"
    ] = "auto_strategy",
    fit_intercept: bool = True,
    l1_reg: typing.Optional[float] = None,
    l2_reg: float = 0.0,
    max_iterations: int = 20,
    warm_start: bool = False,
    learning_rate: typing.Optional[float] = None,
    learning_rate_strategy: typing.Literal["line_search", "constant"] = "line_search",
    tol: float = 0.01,
    ls_init_learning_rate: typing.Optional[float] = None,
    calculate_p_values: bool = False,
    enable_global_explain: bool = False,
    class_weight: typing.Optional[
        typing.Union[typing.Literal["balanced"], typing.Dict[str, float]]
    ] = None
)

Logistic Regression (aka logit, MaxEnt) classifier.

>>> from bigframes.ml.linear_model import LogisticRegression
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({
...     "feature0": [20, 21, 19, 18],
...     "feature1": [0, 1, 1, 0],
...     "feature2": [0.2, 0.3, 0.4, 0.5]})
>>> y = bpd.DataFrame({"outcome": [0, 0, 1, 1]})

Create the LogisticRegression

>>> model = LogisticRegression()
>>> model.fit(X, y)
LogisticRegression()
>>> model.predict(X)  # doctest:+SKIP
   predicted_outcome                            predicted_outcome_probs  feature0  feature1  feature2
0                  0  [{'label': 1, 'prob': 3.1895929877221615e-07} ...        20         0       0.2
1                  0  [{'label': 1, 'prob': 5.662891265051953e-06} ...         21         1       0.3
2                  1  [{'label': 1, 'prob': 0.9999917826885262} {'l...          19         1       0.4
3                  1  [{'label': 1, 'prob': 0.9999999993659574} {'l...          18         0       0.5

4 rows × 5 columns

[4 rows x 5 columns in total]

Score the model

>>> score = model.score(X, y)
>>> score  # doctest:+SKIP
   precision  recall  accuracy  f1_score  log_loss  roc_auc
0        1.0     1.0       1.0       1.0  0.000004      1.0

1 rows × 6 columns

[1 rows x 6 columns in total]

Parameters

Name | Description
optimize_strategy : str, default "auto_strategy"

The strategy used to train logistic regression models. Possible values are "auto_strategy" and "batch_gradient_descent". The two are equivalent, since "auto_strategy" falls back to "batch_gradient_descent"; the option is kept for API consistency. Defaults to "auto_strategy".

fit_intercept : bool, default True

Specifies whether a constant (a.k.a. bias or intercept) should be added to the decision function. Defaults to True.

class_weight : dict or 'balanced', default None

Weights associated with classes in the form {class_label: weight}. If not given, all classes are assumed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)). The dict form is not supported. Defaults to None.

l1_reg : float or None, default None

The amount of L1 regularization applied. Cannot be set in "normal_equation" mode. If unset, a value of 0 is used. Defaults to None.

l2_reg : float, default 0.0

The amount of L2 regularization applied. Defaults to 0.0.

max_iterations : int, default 20

The maximum number of training iterations or steps. Defaults to 20.

warm_start : bool, default False

Determines whether to train a model with new training data, new model options, or both. Unless you explicitly override them, the initial options used to train the model are used for the warm start run. Defaults to False.

learning_rate : float or None, default None

The learning rate for gradient descent when learning_rate_strategy='constant'. If unset, a value of 0.1 is used. If learning_rate_strategy='line_search', an error is returned.

learning_rate_strategy : str, default "line_search"

The strategy for specifying the learning rate during training. Defaults to "line_search".

tol : float, default 0.01

The minimum relative loss improvement that is necessary to continue training when EARLY_STOP is set to true. For example, a value of 0.01 specifies that each iteration must reduce the loss by 1% for training to continue. Defaults to 0.01.

ls_init_learning_rate : float or None, default None

Sets the initial learning rate used when learning_rate_strategy='line_search'. This option can only be used if line_search is specified. If unset, a value of 0.1 is used.

calculate_p_values : bool, default False

Specifies whether to compute p-values and standard errors during training. Defaults to False.

enable_global_explain : bool, default False

Whether to compute global explanations using explainable AI to evaluate global feature importance to the model. Defaults to False.
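
A minimal sketch of wiring several of these options together, mirroring the toy feature0/feature1/feature2 and outcome columns from the example above. The specific option values are illustrative, not recommendations:

    import bigframes.pandas as bpd
    from bigframes.ml.linear_model import LogisticRegression

    # Toy training data, mirroring the example at the top of this page.
    X = bpd.DataFrame({
        "feature0": [20, 21, 19, 18],
        "feature1": [0, 1, 1, 0],
        "feature2": [0.2, 0.3, 0.4, 0.5],
    })
    y = bpd.DataFrame({"outcome": [0, 0, 1, 1]})

    # All constructor arguments are keyword-only. "balanced" reweights classes
    # inversely to their frequencies; the dict form of class_weight is not supported.
    model = LogisticRegression(
        fit_intercept=True,
        l2_reg=0.1,
        max_iterations=50,
        learning_rate_strategy="line_search",
        ls_init_learning_rate=0.1,
        tol=0.01,
        class_weight="balanced",
        enable_global_explain=True,
    )
    model.fit(X, y)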

Methods

__repr__

__repr__()

Return a string representation of the estimator's constructor with all non-default parameter values.

fit

fit(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    X_eval: typing.Optional[
        typing.Union[
            bigframes.dataframe.DataFrame,
            bigframes.series.Series,
            pandas.core.frame.DataFrame,
            pandas.core.series.Series,
        ]
    ] = None,
    y_eval: typing.Optional[
        typing.Union[
            bigframes.dataframe.DataFrame,
            bigframes.series.Series,
            pandas.core.frame.DataFrame,
            pandas.core.series.Series,
        ]
    ] = None,
) -> bigframes.ml.base._T

Fit the model according to the given training data.

Parameters
Name | Description
X : bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series

Series or DataFrame of shape (n_samples, n_features). Training vector, where n_samples is the number of samples and n_features is the number of features.

y : bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series

DataFrame of shape (n_samples,). Target vector relative to X.

X_eval : bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series

Series or DataFrame of shape (n_samples, n_features). Evaluation vector, where n_samples is the number of samples and n_features is the number of features.

y_eval : bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series

DataFrame of shape (n_samples,). Target vector relative to X_eval.

Returns
Type | Description
LogisticRegression : Fitted estimator.
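
A hedged sketch of fitting with an explicit evaluation set; the train/eval split below is purely illustrative (a real workflow would split a larger table), and X and y are the DataFrames from the example above:

    # Hold out the last row as an evaluation set (illustrative only).
    X_train, y_train = X.iloc[:3], y.iloc[:3]
    X_eval, y_eval = X.iloc[3:], y.iloc[3:]

    model = LogisticRegression()
    # When X_eval and y_eval are supplied, they are used as the
    # evaluation data for the training run.
    model.fit(X_train, y_train, X_eval=X_eval, y_eval=y_eval)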

get_params

get_params(deep: bool = True) -> typing.Dict[str, typing.Any]

Get parameters for this estimator.

Parameter
Name | Description
deep : bool, default True

If True, will return the parameters for this estimator and contained subobjects that are estimators. Defaults to True.

Returns
Type | Description
Dictionary : A dictionary of parameter names mapped to their values.
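
For example (a small sketch; model is the estimator from the example above, and the keys follow the constructor argument names):

    params = model.get_params()
    # A plain dict keyed by constructor argument names.
    print(params["fit_intercept"], params["l2_reg"], params["max_iterations"])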

predict

predict(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ]
) -> bigframes.dataframe.DataFrame

Predict class labels for samples in X.

Parameter
Name | Description
X : bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series

Series or DataFrame of shape (n_samples, n_features). The data matrix for which we want to get the predictions.

Returns
Type | Description
bigframes.dataframe.DataFrame : DataFrame of shape (n_samples, n_input_columns + n_prediction_columns). Returns predicted values.
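
A short sketch of calling predict and pulling results back to local memory; the prediction column names follow the predicted_<label_column> pattern shown in the example at the top of this page:

    predictions = model.predict(X)
    # Input columns are preserved; prediction columns such as
    # predicted_outcome and predicted_outcome_probs are appended.
    predictions[["predicted_outcome", "predicted_outcome_probs"]].to_pandas()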

predict_explain

predict_explain(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    *,
    top_k_features: int = 5
) -> bigframes.dataframe.DataFrame

Explain predictions for a logistic regression model.

Parameter
Name | Description
top_k_features : int, default 5

An INT64 value that specifies how many top feature attribution pairs are generated for each row of input data. The features are ranked by the absolute values of their attributions. By default, top_k_features is set to 5. If its value is greater than the number of features in the training data, the attributions of all features are returned.

Returns
Type | Description
bigframes.pandas.DataFrame : The predicted DataFrame with explanation columns.
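
A hedged sketch of requesting per-row attributions for only the top two features instead of the default five; model and X are the objects from the example above:

    # Ask for the 2 largest feature attributions (by absolute value) per row.
    explained = model.predict_explain(X, top_k_features=2)
    explained.to_pandas()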

register

register(vertex_ai_model_id: typing.Optional[str] = None) -> bigframes.ml.base._T

Register the model to Vertex AI.

After registering, go to the Google Cloud console (https://console.cloud.google.com/vertex-ai/models) to manage the model registry. Refer to https://cloud.google.com/vertex-ai/docs/model-registry/introduction for more options.

Parameter
Name | Description
vertex_ai_model_id : Optional[str], default None

Optional string ID to use as the model ID in Vertex AI. If not set, defaults to 'bigframes_{bq_model_id}'. The Vertex AI model ID will be truncated to 63 characters due to its length limit.
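
A minimal sketch, assuming the Vertex AI API is enabled for the session's project and the credentials can write to the Vertex AI Model Registry; the explicit model ID shown is a hypothetical example:

    # Default ID: 'bigframes_{bq_model_id}', truncated to 63 characters.
    model.register()

    # Or supply an explicit ID (hypothetical name; also truncated to 63 characters).
    model.register(vertex_ai_model_id="bigframes_logreg_demo")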

score

score(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
) -> bigframes.dataframe.DataFrame

Calculate evaluation metrics of the model on the given test data and labels.

The output matches that of the BigQuery ML.EVALUATE function: a single-row DataFrame with precision, recall, accuracy, f1_score, log_loss, and roc_auc, as shown in the example near the top of this page.

Parameters
Name | Description
X : bigframes.dataframe.DataFrame or bigframes.series.Series

DataFrame of shape (n_samples, n_features). Test samples.

y : bigframes.dataframe.DataFrame or bigframes.series.Series

DataFrame of shape (n_samples,) or (n_samples, n_outputs). True labels for X.

Returns
Type | Description
bigframes.dataframe.DataFrame : A DataFrame of the evaluation result.
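
For example (a sketch; model, X, and y are the objects from the example above):

    metrics = model.score(X, y)
    # One-row DataFrame with precision, recall, accuracy, f1_score,
    # log_loss, and roc_auc, as shown near the top of this page.
    metrics.to_pandas()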

to_gbq

to_gbq(
    model_name: str, replace: bool = False
) -> bigframes.ml.linear_model.LogisticRegression

Save the model to BigQuery.

Parameters
Name | Description
model_name : str

The name of the model.

replace : bool, default False

Determines whether to replace the model if it already exists. Defaults to False.

Returns
Type | Description
LogisticRegression : Saved model.
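
A minimal sketch, assuming a dataset named "my_dataset" already exists in the session's project; the dataset and model names here are placeholders:

    # Persist the trained model as a BigQuery ML model; replace=True overwrites
    # an existing model with the same name.
    saved_model = model.to_gbq("my_dataset.logistic_reg", replace=True)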