📚 Logistic Regression, Explained: A Visual Guide with Code Examples for Beginners
CLASSIFICATION ALGORITHM
Finding the best weights to fit the data
While some probabilistic-based machine learning models (like Naive Bayes) make bold assumptions about feature independence, logistic regression takes a more measured approach. Think of it as drawing a line (or plane) that separates two outcomes, allowing us to predict probabilities with a bit more flexibility.

Definition
Logistic regression is a statistical method used for predicting binary outcomes. Despite its name, it's used for classification rather than regression. It estimates the probability that an instance belongs to a particular class. If the estimated probability is greater than 50%, the model predicts the instance belongs to the positive class; otherwise, it predicts the negative class.

Dataset Used
Throughout this article, we’ll use this artificial golf dataset (inspired by [1]) as an example. This dataset predicts whether a person will play golf based on weather conditions.
Just like in KNN, logistic regression requires the data to be preprocessed first: convert categorical columns into 0s and 1s, and scale the numerical features so that no single feature dominates the optimization.

# Import required libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np
# Create dataset from dictionary
dataset_dict = {
'Outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast', 'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy', 'sunny', 'overcast', 'rainy', 'sunny', 'sunny', 'rainy', 'overcast', 'rainy', 'sunny', 'overcast', 'sunny', 'overcast', 'rainy', 'overcast'],
'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0, 72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0, 88.0, 77.0, 79.0, 80.0, 66.0, 84.0],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'Wind': [False, True, False, False, False, True, True, False, False, False, True, True, False, True, True, False, False, True, False, True, True, False, True, False, False, True, False, False],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(dataset_dict)
# Prepare data: encode categorical variables
df = pd.get_dummies(df, columns=['Outlook'], prefix='', prefix_sep='', dtype=int)
df['Wind'] = df['Wind'].astype(int)
df['Play'] = (df['Play'] == 'Yes').astype(int)
# Rearrange columns
column_order = ['sunny', 'overcast', 'rainy', 'Temperature', 'Humidity', 'Wind', 'Play']
df = df[column_order]
# Split data into features and target
X, y = df.drop(columns='Play'), df['Play']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)
# Scale numerical features
scaler = StandardScaler()
X_train[['Temperature', 'Humidity']] = scaler.fit_transform(X_train[['Temperature', 'Humidity']])
X_test[['Temperature', 'Humidity']] = scaler.transform(X_test[['Temperature', 'Humidity']])
# Print results
print("Training set:")
print(pd.concat([X_train, y_train], axis=1), '\n')
print("Test set:")
print(pd.concat([X_test, y_test], axis=1))
Main Mechanism
Logistic regression works by applying the logistic function to a linear combination of the input features. Here’s how it operates:
- Calculate a weighted sum of the input features (similar to linear regression).
- Apply the logistic function (also called sigmoid function) to this sum, which maps any real number to a value between 0 and 1.
- Interpret this value as the probability of belonging to the positive class.
- Use a threshold (typically 0.5) to make the final classification decision.
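Below is a minimal numeric sketch of these four steps for a single instance. The feature vector, weights, and bias are made up purely for illustration (they are not learned from the golf dataset); only the column order matches the features prepared above.

import numpy as np

# Hypothetical scaled feature vector: [sunny, overcast, rainy, Temperature, Humidity, Wind]
x = np.array([1.0, 0.0, 0.0, -0.5, 1.2, 1.0])
# Hypothetical weights and bias (made up, not learned)
w = np.array([0.4, -0.2, 0.1, -0.3, -0.8, 0.5])
b = 0.1

z = np.dot(w, x) + b                          # 1. weighted sum of the inputs
p = 1 / (1 + np.exp(-z))                      # 2. sigmoid maps z into (0, 1)
print(f"Probability of playing: {p:.3f}")     # 3. probability of the positive class
print("Prediction:", int(p >= 0.5))           # 4. threshold at 0.5 for the final decision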

Training Steps
The training process for logistic regression involves finding the best weights for the input features. Here’s the general outline:
1. Initialize the weights (often to small random values).

# Convert the training data to NumPy arrays and prepend a column of 1s for the bias term
X_train_np = np.column_stack((np.ones(len(X_train)), X_train.to_numpy()))
y_train_np = y_train.to_numpy()

# Initialize all weights (including the bias weight) to 0.1
initial_weights = np.full(X_train_np.shape[1], 0.1)
print(f"Initial Weights: {initial_weights}")
2. For each training example:
a. Calculate the predicted probability using the current weights.

def sigmoid(z):
    # Map any real number z to a probability between 0 and 1
    return 1 / (1 + np.exp(-z))

def calculate_probabilities(X, weights):
    # Weighted sum of the features, squashed by the sigmoid
    z = np.dot(X, weights)
    return sigmoid(z)

def calculate_log_loss(probabilities, y):
    # Per-example log loss: confident but wrong predictions are penalized heavily
    return -y * np.log(probabilities) - (1 - y) * np.log(1 - probabilities)

def create_output_dataframe(X, y, weights):
    probabilities = calculate_probabilities(X, weights)
    log_losses = calculate_log_loss(probabilities, y)
    df = pd.DataFrame({
        'Probability': probabilities,
        'Label': y,
        'Log Loss': log_losses
    })
    return df

def calculate_average_log_loss(X, y, weights):
    probabilities = calculate_probabilities(X, weights)
    log_losses = calculate_log_loss(probabilities, y)
    return np.mean(log_losses)

# Convert X_train and y_train to numpy arrays (repeated here so this snippet runs on its own)
X_train_np = X_train.to_numpy()
y_train_np = y_train.to_numpy()

# Add a column of 1s to X_train_np for the bias term
X_train_np = np.column_stack((np.ones(X_train_np.shape[0]), X_train_np))

# Display probabilities and log loss under the initial weights
initial_df = create_output_dataframe(X_train_np, y_train_np, initial_weights)
print(initial_df.to_string(index=False, float_format=lambda x: f"{x:.6f}"))
print(f"\nAverage Log Loss: {calculate_average_log_loss(X_train_np, y_train_np, initial_weights):.6f}")
b. Compare this probability to the actual class label by calculating its log loss. For example, if y = 1 and the predicted probability is 0.7, the log loss is -ln(0.7) ≈ 0.36; a confident wrong prediction (y = 1, p = 0.1) is punished much harder, with a loss of -ln(0.1) ≈ 2.30.

3. Update the weights to minimize the loss, usually with an optimization algorithm such as gradient descent. This involves repeating Step 2 until the log loss stops decreasing.

def gradient_descent_step(X, y, weights, learning_rate):
    m = len(y)
    probabilities = calculate_probabilities(X, weights)
    # Gradient of the average log loss with respect to the weights
    gradient = np.dot(X.T, (probabilities - y)) / m
    new_weights = weights - learning_rate * gradient  # Create a new array for the updated weights
    return new_weights

# Perform one step of gradient descent (one of the simplest optimization algorithms)
learning_rate = 0.1
updated_weights = gradient_descent_step(X_train_np, y_train_np, initial_weights, learning_rate)

# Print initial and updated weights
print("\nInitial weights:")
for feature, weight in zip(['Bias'] + list(X_train.columns), initial_weights):
    print(f"{feature:11}: {weight:.2f}")

print("\nUpdated weights after one iteration:")
for feature, weight in zip(['Bias'] + list(X_train.columns), updated_weights):
    print(f"{feature:11}: {weight:.2f}")
# With sklearn, you can get the final weights (coefficients)
# and final bias (intercept) easily. The result is close to what
# the manual gradient descent above converges to after enough iterations.
from sklearn.linear_model import LogisticRegression
lr_clf = LogisticRegression(penalty=None, solver='saga')
lr_clf.fit(X_train, y_train)
coefficients = lr_clf.coef_
intercept = lr_clf.intercept_
y_train_prob = lr_clf.predict_proba(X_train)[:, 1]
loss = -np.mean(y_train * np.log(y_train_prob) + (1 - y_train) * np.log(1 - y_train_prob))
print(f"Weights & Bias Final: {coefficients[0].round(2)}, {round(intercept[0],2)}")
print("Loss Final:", loss.round(3))
Classification Steps
Once the model is trained:
1. For a new instance, calculate the probability using the final weights (also called coefficients), just as during training.
2. Interpret the output by looking at the probability: if p ≥ 0.5, predict class 1; otherwise, predict class 0.

# Calculate predicted probabilities for the test set
predicted_probs = lr_clf.predict_proba(X_test)[:, 1]
# Recover the linear scores z (the log-odds) by inverting the sigmoid
z_values = np.log(predicted_probs / (1 - predicted_probs))
result_df = pd.DataFrame({
'ID': X_test.index,
'Z-Values': z_values.round(3),
'Probabilities': predicted_probs.round(3)
}).set_index('ID')
print(result_df)
# Make predictions
y_pred = lr_clf.predict(X_test)
print(y_pred)
Evaluation Step

result_df = pd.DataFrame({
'ID': X_test.index,
'Label': y_test,
'Probabilities': predicted_probs.round(2),
'Prediction': y_pred,
}).set_index('ID')
print(result_df)
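To put a single number on test-set performance, we can also compute the accuracy from the predictions above (a short sketch reusing accuracy_score, which was imported earlier):

# Fraction of test instances predicted correctly
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")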
Key Parameters
Logistic regression has several important parameters that control its behavior:
1. Penalty: The type of regularization to use ('l1', 'l2', 'elasticnet', or None). Regularization prevents overfitting by adding a penalty term to the model's loss function that encourages smaller weights and, therefore, simpler models.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
regs = [None, 'l1', 'l2']
coeff_dict = {}
for reg in regs:
    lr_clf = LogisticRegression(penalty=reg, solver='saga')
    lr_clf.fit(X_train, y_train)
    coefficients = lr_clf.coef_
    intercept = lr_clf.intercept_
    predicted_probs = lr_clf.predict_proba(X_train)[:, 1]
    loss = -np.mean(y_train * np.log(predicted_probs) + (1 - y_train) * np.log(1 - predicted_probs))
    predictions = lr_clf.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    coeff_dict[reg] = {
        'Coefficients': coefficients,
        'Intercept': intercept,
        'Loss': loss,
        'Accuracy': accuracy
    }

for reg, vals in coeff_dict.items():
    print(f"{reg}: Coeff: {vals['Coefficients'][0].round(2)}, Intercept: {vals['Intercept'].round(2)}, Loss: {vals['Loss'].round(3)}, Accuracy: {vals['Accuracy'].round(3)}")
2. Regularization Strength (C): Controls the trade-off between fitting the training data and keeping the model simple. A smaller C means stronger regularization.

# List of regularization strengths to try for L1
strengths = [0.001, 0.01, 0.1, 1, 10, 100]
coeff_dict = {}
for strength in strengths:
    lr_clf = LogisticRegression(penalty='l1', C=strength, solver='saga')
    lr_clf.fit(X_train, y_train)
    coefficients = lr_clf.coef_
    intercept = lr_clf.intercept_
    predicted_probs = lr_clf.predict_proba(X_train)[:, 1]
    loss = -np.mean(y_train * np.log(predicted_probs) + (1 - y_train) * np.log(1 - predicted_probs))
    predictions = lr_clf.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    coeff_dict[f'L1_{strength}'] = {
        'Coefficients': coefficients[0].round(2),
        'Intercept': round(intercept[0], 2),
        'Loss': round(loss, 3),
        'Accuracy': round(accuracy*100, 2)
    }

print(pd.DataFrame(coeff_dict).T)
# List of regularization strengths to try for L2
strengths = [0.001, 0.01, 0.1, 1, 10, 100]
coeff_dict = {}
for strength in strengths:
    lr_clf = LogisticRegression(penalty='l2', C=strength, solver='saga')
    lr_clf.fit(X_train, y_train)
    coefficients = lr_clf.coef_
    intercept = lr_clf.intercept_
    predicted_probs = lr_clf.predict_proba(X_train)[:, 1]
    loss = -np.mean(y_train * np.log(predicted_probs) + (1 - y_train) * np.log(1 - predicted_probs))
    predictions = lr_clf.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    coeff_dict[f'L2_{strength}'] = {
        'Coefficients': coefficients[0].round(2),
        'Intercept': round(intercept[0], 2),
        'Loss': round(loss, 3),
        'Accuracy': round(accuracy*100, 2)
    }

print(pd.DataFrame(coeff_dict).T)
3. Solver: The algorithm used for optimization ('liblinear', 'newton-cg', 'lbfgs', 'sag', 'saga'). Some penalties require a particular solver; for example, 'elasticnet' is only supported by 'saga'.
4. Max Iterations: The maximum number of iterations for the solver to converge.
For our golf dataset, we might start with ‘l2’ penalty, ‘liblinear’ solver, and C=1.0 as a baseline.
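As a quick sanity check, such a baseline can be fit in a couple of lines. This is only a sketch that reuses the training and test splits prepared earlier; the reported accuracy depends on that particular split.

# A simple baseline: L2 penalty, C=1.0, liblinear solver
baseline_clf = LogisticRegression(penalty='l2', C=1.0, solver='liblinear', max_iter=100)
baseline_clf.fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline_clf.predict(X_test))
print(f"Baseline accuracy: {baseline_acc:.3f}")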
Pros & Cons
Like any algorithm in machine learning, logistic regression has its strengths and limitations.
Pros:
- Simplicity: Easy to implement and understand.
- Interpretability: The weights directly show the importance of each feature (see the odds-ratio sketch below).
- Efficiency: Doesn’t require too much computational power.
- Probabilistic Output: Provides probabilities rather than just classifications.
Cons:
- Linearity Assumption: Assumes a linear relationship between features and log-odds of the outcome.
- Feature Independence: Assumes features are not highly correlated.
- Limited Complexity: May underfit in cases where the decision boundary is highly non-linear.
- Requires More Data: Needs a relatively large sample size for stable results.
In our golf example, logistic regression might provide a clear, interpretable model of how each weather factor influences the decision to play golf. However, it might struggle if the decision involves complex interactions between weather conditions that can’t be captured by a linear model.
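To make the interpretability point concrete, here is a small sketch that fits a model with default settings on the prepared golf features and reads each coefficient as an odds ratio; the exact numbers are only illustrative and depend on the split and scaling used above.

# exp(coefficient) = multiplicative change in the odds of playing for a one-unit
# increase in that feature (one standard deviation for the scaled numerical features)
interp_clf = LogisticRegression()   # default L2 penalty, C=1.0
interp_clf.fit(X_train, y_train)
for feature, coef in zip(X_train.columns, interp_clf.coef_[0]):
    print(f"{feature:11}: coefficient {coef:+.2f}, odds ratio {np.exp(coef):.2f}")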
Final Remark
Logistic regression shines as a powerful yet straightforward classification tool. It stands out for its ability to handle complex data while remaining easy to interpret. Unlike some other basic models, it provides smooth probability estimates and works well with many features. In the real world, from predicting customer behavior to medical diagnoses, logistic regression often performs surprisingly well. It’s not just a stepping stone — it’s a reliable model that can match more complex models in many situations.
🌟 Logistic Regression Code Summarized
# Import required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
# Load the dataset
dataset_dict = {
'Outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast', 'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy', 'sunny', 'overcast', 'rainy', 'sunny', 'sunny', 'rainy', 'overcast', 'rainy', 'sunny', 'overcast', 'sunny', 'overcast', 'rainy', 'overcast'],
'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0, 72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0, 88.0, 77.0, 79.0, 80.0, 66.0, 84.0],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'Wind': [False, True, False, False, False, True, True, False, False, False, True, True, False, True, True, False, False, True, False, True, True, False, True, False, False, True, False, False],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(dataset_dict)
# Prepare data: encode categorical variables
df = pd.get_dummies(df, columns=['Outlook'], prefix='', prefix_sep='', dtype=int)
df['Wind'] = df['Wind'].astype(int)
df['Play'] = (df['Play'] == 'Yes').astype(int)
# Split data into training and testing sets
X, y = df.drop(columns='Play'), df['Play']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)
# Scale numerical features
scaler = StandardScaler()
float_cols = X_train.select_dtypes(include=['float64']).columns
X_train[float_cols] = scaler.fit_transform(X_train[float_cols])
X_test[float_cols] = scaler.transform(X_test[float_cols])
# Train the model
lr_clf = LogisticRegression(penalty='l2', C=1, solver='saga')
lr_clf.fit(X_train, y_train)
# Make predictions
y_pred = lr_clf.predict(X_test)
# Evaluate the model
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
Further Reading
For a detailed explanation of the Logistic Regression and its implementation in scikit-learn, readers can refer to the official documentation [2], which provides comprehensive information on its usage and parameters.
Technical Environment
This article uses Python 3.7 and scikit-learn 1.5. While the concepts discussed are generally applicable, specific code implementations may vary slightly with different versions.
About the Illustrations
Unless otherwise noted, all images are created by the author, incorporating licensed design elements from Canva Pro.
Reference
[1] T. M. Mitchell, Machine Learning (1997), McGraw-Hill Science/Engineering/Math, pp. 59
[2] F. Pedregosa et al., Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html