
📚 Logistic Regression, Explained: A Visual Guide with Code Examples for Beginners



CLASSIFICATION ALGORITHM

Finding the perfect weights to fit the data

While some probability-based machine learning models (like Naive Bayes) make bold assumptions about feature independence, logistic regression takes a more measured approach. Think of it as drawing a line (or plane) that separates two outcomes, allowing us to predict probabilities with a bit more flexibility.

All visuals: Author-created using Canva Pro. Optimized for mobile; may appear oversized on desktop.

Definition

Logistic regression is a statistical method used for predicting binary outcomes. Despite its name, it is used for classification rather than regression. It estimates the probability that an instance belongs to a particular class: if the estimated probability is greater than 50%, the model predicts that the instance belongs to that class; otherwise, it predicts the other class.

Dataset Used

Throughout this article, we’ll use this artificial golf dataset (inspired by [1]) as an example. This dataset predicts whether a person will play golf based on weather conditions.

Just like in KNN, logistic regression requires the data to be prepared first: convert categorical columns into 0 & 1 and scale the numerical features, so that no single feature dominates the weighted sum and the optimization converges smoothly.

Columns: ‘Outlook’, ‘Temperature’, ‘Humidity’, ‘Wind’ and ‘Play’ (target feature). The ‘Outlook’ column is encoded using one-hot encoding, the boolean ‘Wind’ column is converted to 0 & 1, and the numerical columns are scaled using standard scaling (z-normalization).
# Import required libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np

# Create dataset from dictionary
dataset_dict = {
'Outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast', 'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy', 'sunny', 'overcast', 'rainy', 'sunny', 'sunny', 'rainy', 'overcast', 'rainy', 'sunny', 'overcast', 'sunny', 'overcast', 'rainy', 'overcast'],
'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0, 72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0, 88.0, 77.0, 79.0, 80.0, 66.0, 84.0],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'Wind': [False, True, False, False, False, True, True, False, False, False, True, True, False, True, True, False, False, True, False, True, True, False, True, False, False, True, False, False],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(dataset_dict)

# Prepare data: encode categorical variables
df = pd.get_dummies(df, columns=['Outlook'], prefix='', prefix_sep='', dtype=int)
df['Wind'] = df['Wind'].astype(int)
df['Play'] = (df['Play'] == 'Yes').astype(int)

# Rearrange columns
column_order = ['sunny', 'overcast', 'rainy', 'Temperature', 'Humidity', 'Wind', 'Play']
df = df[column_order]

# Split data into features and target
X, y = df.drop(columns='Play'), df['Play']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)

# Scale numerical features
scaler = StandardScaler()
X_train[['Temperature', 'Humidity']] = scaler.fit_transform(X_train[['Temperature', 'Humidity']])
X_test[['Temperature', 'Humidity']] = scaler.transform(X_test[['Temperature', 'Humidity']])

# Print results
print("Training set:")
print(pd.concat([X_train, y_train], axis=1), '\n')
print("Test set:")
print(pd.concat([X_test, y_test], axis=1))

Main Mechanism

Logistic regression works by applying the logistic function to a linear combination of the input features. Here’s how it operates:

  1. Calculate a weighted sum of the input features (similar to linear regression).
  2. Apply the logistic function (also called sigmoid function) to this sum, which maps any real number to a value between 0 and 1.
  3. Interpret this value as the probability of belonging to the positive class.
  4. Use a threshold (typically 0.5) to make the final classification decision.
For our golf dataset, logistic regression might combine the weather factors into a single score, then transform this score into a probability of playing golf.
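As a minimal sketch of these four steps (the weights and the instance below are made-up values for illustration, not fitted coefficients):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical weights in the order [bias, sunny, overcast, rainy, Temperature, Humidity, Wind]
weights = np.array([0.0, -0.5, 1.0, -0.3, -0.2, -0.4, -0.6])

# One (already scaled) instance, with a leading 1 for the bias term
x = np.array([1.0, 1.0, 0.0, 0.0, 0.3, -1.2, 0.0])

z = np.dot(weights, x)        # Step 1: weighted sum of the features
p = sigmoid(z)                # Steps 2-3: map the sum to a probability
prediction = int(p >= 0.5)    # Step 4: apply the 0.5 threshold
print(f"z = {z:.2f}, p = {p:.2f}, prediction = {prediction}")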

Training Steps

The training process for logistic regression involves finding the best weights for the input features. Here’s the general outline:

  1. Initialize the weights (often to small random values).
# Initialize weights (including bias) to 0.1; the +1 accounts for the bias term
initial_weights = np.full(X_train.shape[1] + 1, 0.1)

# Display the initial weights
print(f"Initial Weights: {initial_weights}")

2. For each training example:
a. Calculate the predicted probability using the current weights.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def calculate_probabilities(X, weights):
    z = np.dot(X, weights)
    return sigmoid(z)

def calculate_log_loss(probabilities, y):
    return -y * np.log(probabilities) - (1 - y) * np.log(1 - probabilities)

def create_output_dataframe(X, y, weights):
    probabilities = calculate_probabilities(X, weights)
    log_losses = calculate_log_loss(probabilities, y)

    df = pd.DataFrame({
        'Probability': probabilities,
        'Label': y,
        'Log Loss': log_losses
    })

    return df

def calculate_average_log_loss(X, y, weights):
    probabilities = calculate_probabilities(X, weights)
    log_losses = calculate_log_loss(probabilities, y)
    return np.mean(log_losses)

# Convert X_train and y_train to numpy arrays for easier computation
X_train_np = X_train.to_numpy()
y_train_np = y_train.to_numpy()

# Add a column of 1s to X_train_np for the bias term
X_train_np = np.column_stack((np.ones(X_train_np.shape[0]), X_train_np))

# Display each training example's probability and log loss under the initial weights
initial_df = create_output_dataframe(X_train_np, y_train_np, initial_weights)
print(initial_df.to_string(index=False, float_format=lambda x: f"{x:.6f}"))
print(f"\nAverage Log Loss: {calculate_average_log_loss(X_train_np, y_train_np, initial_weights):.6f}")

b. Compare this probability to the actual class label by calculating its log loss.
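As a quick numeric illustration of log loss (the probabilities below are made-up, not values from the dataset):

import numpy as np

# Log loss for a positive example (y = 1) under two different predicted probabilities
for p in (0.9, 0.1):
    loss = -np.log(p)  # with y = 1, the (1 - y) term vanishes
    print(f"p = {p}: log loss = {loss:.3f}")
# p = 0.9 gives about 0.105, while p = 0.1 gives about 2.303:
# confident mistakes are penalized much more heavily.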

3. Update the weights to minimize the loss (usually with an optimization algorithm such as gradient descent; this involves repeating Step 2 until the log loss stops decreasing).

def gradient_descent_step(X, y, weights, learning_rate):
    m = len(y)
    probabilities = calculate_probabilities(X, weights)
    gradient = np.dot(X.T, (probabilities - y)) / m
    new_weights = weights - learning_rate * gradient  # Create new array for updated weights
    return new_weights

# Perform one step of gradient descent (one of the simplest optimization algorithms)
learning_rate = 0.1
updated_weights = gradient_descent_step(X_train_np, y_train_np, initial_weights, learning_rate)

# Print initial and updated weights
print("\nInitial weights:")
for feature, weight in zip(['Bias'] + list(X_train.columns), initial_weights):
    print(f"{feature:11}: {weight:.2f}")

print("\nUpdated weights after one iteration:")
for feature, weight in zip(['Bias'] + list(X_train.columns), updated_weights):
    print(f"{feature:11}: {weight:.2f}")
# With sklearn, you can get the final weights (coefficients)
# and final bias (intercepts) easily.
# The result is almost the same as doing it manually above.

from sklearn.linear_model import LogisticRegression

lr_clf = LogisticRegression(penalty=None, solver='saga')
lr_clf.fit(X_train, y_train)

coefficients = lr_clf.coef_
intercept = lr_clf.intercept_

y_train_prob = lr_clf.predict_proba(X_train)[:, 1]
loss = -np.mean(y_train * np.log(y_train_prob) + (1 - y_train) * np.log(1 - y_train_prob))

print(f"Weights & Bias Final: {coefficients[0].round(2)}, {round(intercept[0],2)}")
print("Loss Final:", loss.round(3))

Classification Steps

Once the model is trained:
1. For a new instance, calculate the probability with the final weights (also called coefficients), just like during the training step.

2. Interpret the output by looking at the probability: if p ≥ 0.5, predict class 1; otherwise, predict class 0.

# Calculate prediction probability
predicted_probs = lr_clf.predict_proba(X_test)[:, 1]

# Recover the linear scores (log-odds) from the predicted probabilities
z_values = np.log(predicted_probs / (1 - predicted_probs))

result_df = pd.DataFrame({
    'ID': X_test.index,
    'Z-Values': z_values.round(3),
    'Probabilities': predicted_probs.round(3)
}).set_index('ID')

print(result_df)

# Make predictions
y_pred = lr_clf.predict(X_test)
print(y_pred)

Evaluation Step

result_df = pd.DataFrame({
    'ID': X_test.index,
    'Label': y_test,
    'Probabilities': predicted_probs.round(2),
    'Prediction': y_pred,
}).set_index('ID')

print(result_df)
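A single summary metric such as accuracy can then be computed from these test-set predictions:

from sklearn.metrics import accuracy_score

# Overall accuracy on the test set
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")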

Key Parameters

Logistic regression has several important parameters that control its behavior:

1. Penalty: The type of regularization to use (‘l1’, ‘l2’, ‘elasticnet’, or None). Regularization in logistic regression prevents overfitting by adding a penalty term to the model’s loss function, which encourages simpler models.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

regs = [None, 'l1', 'l2']
coeff_dict = {}

for reg in regs:
    lr_clf = LogisticRegression(penalty=reg, solver='saga')
    lr_clf.fit(X_train, y_train)
    coefficients = lr_clf.coef_
    intercept = lr_clf.intercept_
    predicted_probs = lr_clf.predict_proba(X_train)[:, 1]
    loss = -np.mean(y_train * np.log(predicted_probs) + (1 - y_train) * np.log(1 - predicted_probs))
    predictions = lr_clf.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)

    coeff_dict[reg] = {
        'Coefficients': coefficients,
        'Intercept': intercept,
        'Loss': loss,
        'Accuracy': accuracy
    }

for reg, vals in coeff_dict.items():
    print(f"{reg}: Coeff: {vals['Coefficients'][0].round(2)}, Intercept: {vals['Intercept'].round(2)}, Loss: {round(vals['Loss'], 3)}, Accuracy: {round(vals['Accuracy'], 3)}")

2. Regularization Strength (C): Controls the trade-off between fitting the training data and keeping the model simple. A smaller C means stronger regularization.

# List of regularization strengths to try for L1
strengths = [0.001, 0.01, 0.1, 1, 10, 100]

coeff_dict = {}

for strength in strengths:
    lr_clf = LogisticRegression(penalty='l1', C=strength, solver='saga')
    lr_clf.fit(X_train, y_train)

    coefficients = lr_clf.coef_
    intercept = lr_clf.intercept_

    predicted_probs = lr_clf.predict_proba(X_train)[:, 1]
    loss = -np.mean(y_train * np.log(predicted_probs) + (1 - y_train) * np.log(1 - predicted_probs))
    predictions = lr_clf.predict(X_test)

    accuracy = accuracy_score(y_test, predictions)

    coeff_dict[f'L1_{strength}'] = {
        'Coefficients': coefficients[0].round(2),
        'Intercept': round(intercept[0], 2),
        'Loss': round(loss, 3),
        'Accuracy': round(accuracy * 100, 2)
    }

print(pd.DataFrame(coeff_dict).T)

# List of regularization strengths to try for L2
strengths = [0.001, 0.01, 0.1, 1, 10, 100]

coeff_dict = {}

for strength in strengths:
    lr_clf = LogisticRegression(penalty='l2', C=strength, solver='saga')
    lr_clf.fit(X_train, y_train)

    coefficients = lr_clf.coef_
    intercept = lr_clf.intercept_

    predicted_probs = lr_clf.predict_proba(X_train)[:, 1]
    loss = -np.mean(y_train * np.log(predicted_probs) + (1 - y_train) * np.log(1 - predicted_probs))
    predictions = lr_clf.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)

    coeff_dict[f'L2_{strength}'] = {
        'Coefficients': coefficients[0].round(2),
        'Intercept': round(intercept[0], 2),
        'Loss': round(loss, 3),
        'Accuracy': round(accuracy * 100, 2)
    }

print(pd.DataFrame(coeff_dict).T)

3. Solver: The algorithm to use for optimization (‘liblinear’, ‘newton-cg’, ‘lbfgs’, ‘sag’, ‘saga’). Some penalties require a particular solver: for example, ‘elasticnet’ is only supported by ‘saga’, and ‘l1’ requires either ‘liblinear’ or ‘saga’.

4. Max Iterations: The maximum number of iterations for the solver to converge.

For our golf dataset, we might start with ‘l2’ penalty, ‘liblinear’ solver, and C=1.0 as a baseline.
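A minimal sketch of that baseline, reusing the train/test split prepared earlier (max_iter=100 is simply scikit-learn's default, written out explicitly):

# Baseline suggested above: L2 penalty, liblinear solver, C=1.0
baseline_clf = LogisticRegression(penalty='l2', C=1.0, solver='liblinear', max_iter=100)
baseline_clf.fit(X_train, y_train)
print(f"Baseline accuracy: {accuracy_score(y_test, baseline_clf.predict(X_test)):.3f}")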

Pros & Cons

Like any algorithm in machine learning, logistic regression has its strengths and limitations.

Pros:

  1. Simplicity: Easy to implement and understand.
  2. Interpretability: The weights directly show the importance of each feature.
  3. Efficiency: Doesn’t require too much computational power.
  4. Probabilistic Output: Provides probabilities rather than just classifications.

Cons:

  1. Linearity Assumption: Assumes a linear relationship between features and log-odds of the outcome.
  2. Feature Independence: Assumes features are not highly correlated.
  3. Limited Complexity: May underfit in cases where the decision boundary is highly non-linear.
  4. Requires More Data: Needs a relatively large sample size for stable results.

In our golf example, logistic regression might provide a clear, interpretable model of how each weather factor influences the decision to play golf. However, it might struggle if the decision involves complex interactions between weather conditions that can’t be captured by a linear model.

Final Remark

Logistic regression shines as a powerful yet straightforward classification tool. It stands out for its ability to handle complex data while remaining easy to interpret. Unlike some other basic models, it provides smooth probability estimates and works well with many features. In the real world, from predicting customer behavior to medical diagnoses, logistic regression often performs surprisingly well. It’s not just a stepping stone — it’s a reliable model that can match more complex models in many situations.

🌟 Logistic Regression Code Summarized

# Import required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the dataset
dataset_dict = {
'Outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast', 'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy', 'sunny', 'overcast', 'rainy', 'sunny', 'sunny', 'rainy', 'overcast', 'rainy', 'sunny', 'overcast', 'sunny', 'overcast', 'rainy', 'overcast'],
'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0, 72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0, 88.0, 77.0, 79.0, 80.0, 66.0, 84.0],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'Wind': [False, True, False, False, False, True, True, False, False, False, True, True, False, True, True, False, False, True, False, True, True, False, True, False, False, True, False, False],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(dataset_dict)

# Prepare data: encode categorical variables
df = pd.get_dummies(df, columns=['Outlook'], prefix='', prefix_sep='', dtype=int)
df['Wind'] = df['Wind'].astype(int)
df['Play'] = (df['Play'] == 'Yes').astype(int)

# Split data into training and testing sets
X, y = df.drop(columns='Play'), df['Play']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)

# Scale numerical features
scaler = StandardScaler()
float_cols = X_train.select_dtypes(include=['float64']).columns
X_train[float_cols] = scaler.fit_transform(X_train[float_cols])
X_test[float_cols] = scaler.transform(X_test[float_cols])

# Train the model
lr_clf = LogisticRegression(penalty='l2', C=1, solver='saga')
lr_clf.fit(X_train, y_train)

# Make predictions
y_pred = lr_clf.predict(X_test)

# Evaluate the model
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")

Further Reading

For a detailed explanation of the Logistic Regression and its implementation in scikit-learn, readers can refer to the official documentation [2], which provides comprehensive information on its usage and parameters.

Technical Environment

This article uses Python 3.7 and scikit-learn 1.5. While the concepts discussed are generally applicable, specific code implementations may vary slightly with different versions.

About the Illustrations

Unless otherwise noted, all images are created by the author, incorporating licensed design elements from Canva Pro.

Reference

[1] T. M. Mitchell, Machine Learning (1997), McGraw-Hill Science/Engineering/Math, pp. 59

[2] F. Pedregosa et al., Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html


Logistic Regression, Explained: A Visual Guide with Code Examples for Beginners was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
