📚 Mastering Pandas: A Comprehensive Guide with Exercises


💡 News category: Programming
🔗 Source: dev.to

Day 5 of 100 Days Data Science Bootcamp from noob to expert.

GitHub link: Complete-Data-Science-Bootcamp

Main Post: Complete-Data-Science-Bootcamp

Recap Day 4

Yesterday we studied NumPy in Python in detail.

Let's Start

Pandas is a powerful data analysis and manipulation library for Python. It lets you access, select, and manipulate the data in your dataset. In this post, you'll learn how to use Pandas to create a gradebook for tracking student grades. You'll learn how to read in data from a CSV file, manipulate the data, and create a report of the grades. You'll also learn how to handle missing values and prepare your data for visualization. By the end of this post, you'll be able to use Pandas efficiently to manage and analyze your data.

Let's say we have a dataset of student grades in a CSV file called "grades.csv". The first step in exploring this dataset with Pandas is to read it into a Pandas DataFrame. We can do this using the read_csv() function:

import pandas as pd

df = pd.read_csv('grades.csv')
df
name grade
0 John 89.0
1 Mary 95.0
2 Emily 77.0
3 Michael 82.0
4 Rachel NaN
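If you don't have grades.csv on disk, a minimal sketch like the following reproduces the same table from an in-memory string. The CSV contents are an assumption reconstructed from the output above, and io.StringIO simply stands in for the file.

```python
import io

import pandas as pd

# Assumed file contents, reconstructed from the table shown above
csv_text = """name,grade
John,89
Mary,95
Emily,77
Michael,82
Rachel,
"""

# read_csv accepts any file-like object, so StringIO stands in for 'grades.csv'
df = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = df.shape                  # number of rows and columns
n_missing = int(df["grade"].isna().sum())  # how many grades are missing
print(df)
print(n_rows, n_cols, n_missing)
```

The empty field after "Rachel," is read as NaN, which is why the whole grade column comes back as floats.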
  • Now that we have our DataFrame, we can start exploring the data. Let's say we want to access the grades of a specific student. We can do this by selecting the row of the student and then selecting the 'grade' column:
student_name = 'John'
grade = df[df['name'] == student_name]['grade']
print(grade)

0    89.0
Name: grade, dtype: float64
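The selection above returns a Series, not a plain number. As a small sketch (the three-row DataFrame is made up for illustration), the same lookup can be written as a single .loc call, and .iloc[0] pulls out the scalar value:

```python
import pandas as pd

# Small illustrative gradebook (sample data, not the full file)
df = pd.DataFrame({"name": ["John", "Mary", "Emily"],
                   "grade": [89.0, 95.0, 77.0]})

# Row mask and column label in a single .loc call (avoids chained indexing)
john_grades = df.loc[df["name"] == "John", "grade"]

# The result is still a Series; take its first element to get a plain number
john_grade = john_grades.iloc[0]
print(john_grade)
```

Using one .loc call also makes later assignments safer, because pandas knows you are addressing the original DataFrame and not a temporary copy.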

  • We can also select a specific column by its label using the '[]' operator:
grades = df['grade']
print(grades)

0    89.0
1    95.0
2    77.0
3    82.0
4     NaN
Name: grade, dtype: float64

  • If we want to select multiple columns, we can pass a list of column labels to the '[]' operator:
student_info = df[['name', 'grade']]
print(student_info)

name grade
0 John 89.0
1 Mary 95.0
2 Emily 77.0
3 Michael 82.0
4 Rachel NaN

  • Now let's say we have some missing values in our dataset. We can handle these missing values using the fillna() function:
df = df.fillna(-1)
df
name grade
0 John 89.0
1 Mary 95.0
2 Emily 77.0
3 Michael 82.0
4 Rachel -1.0

This will replace all missing values with -1.
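One thing to keep in mind: a sentinel such as -1 is treated as a real grade in later calculations. The sketch below (using the five grades from the example) compares the mean before and after filling, and shows filling with the column mean as one possible alternative. Which strategy is appropriate depends on your data; this is only an illustration.

```python
import numpy as np
import pandas as pd

grades = pd.Series([89.0, 95.0, 77.0, 82.0, np.nan], name="grade")

# mean() skips NaN by default, so only the four real grades are averaged
mean_skipping_nan = grades.mean()

# After fillna(-1) the sentinel is counted as if it were a real grade
mean_with_sentinel = grades.fillna(-1).mean()

# Alternative: fill the gap with the mean of the known grades
filled_with_mean = grades.fillna(grades.mean())

print(mean_skipping_nan, mean_with_sentinel)
print(filled_with_mean)
```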

That was a basic overview of pandas. Now let's take a deeper dive.

1. Importing and reading in data:

  • read in data from a variety of sources, such as a CSV file:
import pandas as pd
df = pd.read_csv('people_data.csv')
df
# Or an Excel file:
# df = pd.read_excel('data.xlsx')
Name Age Gender
0 John 20 Male
1 Jane 30 Female
2 Bob 40 Male
3 Alice 50 Female

2. Inspecting data:

Once you have your data in a pandas DataFrame, you can use various methods to inspect it.

For example, you can view the first rows of the data (five by default) using the head() method:

df.head()
Name Age Gender
0 John 20 Male
1 Jane 30 Female
2 Bob 40 Male
3 Alice 50 Female

You can also view the column names and data types using the info() method:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    4 non-null      object
 1   Age     4 non-null      int64
 2   Gender  4 non-null      object
dtypes: int64(1), object(2)
memory usage: 224.0+ bytes
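Besides head() and info(), a few other attributes and methods are handy for a first look. This is a small sketch that rebuilds the people table in code (so it runs without people_data.csv) and uses shape, dtypes, and describe():

```python
import pandas as pd

# Same values as the people table above, built in code instead of read from a file
df = pd.DataFrame({"Name": ["John", "Jane", "Bob", "Alice"],
                   "Age": [20, 30, 40, 50],
                   "Gender": ["Male", "Female", "Male", "Female"]})

shape = df.shape                 # (number of rows, number of columns)
dtypes = df.dtypes               # data type of each column
summary = df["Age"].describe()   # count, mean, std, min, quartiles, max

print(shape)
print(dtypes)
print(summary)
```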

3. Selecting data:

You can select specific columns or rows of data using the [] operator or the loc and iloc attributes.

For example, to select the "Name" and "Age" columns, you can use the following code:

df[['Name', 'Age']]
Name Age
0 John 20
1 Jane 30
2 Bob 40
3 Alice 50

To select rows with a specific value in a certain column, you can use the loc attribute:

df.loc[df['Gender'] == 'Female']
Name Age Gender
1 Jane 30 Female
3 Alice 50 Female
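To make the difference between loc and iloc concrete, here is a short sketch on the same people table (rebuilt in code). iloc selects by integer position, loc selects by label or boolean mask, and masks can be combined with & (each condition needs its own parentheses):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Jane", "Bob", "Alice"],
                   "Age": [20, 30, 40, 50],
                   "Gender": ["Male", "Female", "Male", "Female"]})

# iloc: position-based, the end of the slice is excluded (rows 0 and 1)
first_two = df.iloc[:2]

# loc: label-based, the end of the slice is included (labels 1 and 2)
by_label = df.loc[1:2, ["Name", "Age"]]

# Combine two conditions with & and pick a single column
older_females = df.loc[(df["Gender"] == "Female") & (df["Age"] > 35), "Name"]

print(first_two)
print(by_label)
print(older_females)
```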

4. Manipulating data:

You can use various methods to manipulate data in a pandas DataFrame.

For example, you can add a new column by assigning a value to a new column name:

df['Country'] = ["India", "USA", "India", "Canada"]
df
Name Age Gender Country
0 John 20 Male India
1 Jane 30 Female USA
2 Bob 40 Male India
3 Alice 50 Female Canada

You can also drop columns or rows using the drop() method:

newdf = df.drop('Country', axis=1)  # drop a column; returns a new DataFrame
newdf
Name Age Gender
0 John 20 Male
1 Jane 30 Female
2 Bob 40 Male
3 Alice 50 Female
# drop rows with Age < 35 in place; with inplace=True, drop() returns None, so don't assign the result
df.drop(df[df['Age'] < 35].index, inplace=True)
df
Name Age Gender Country
2 Bob 40 Male India
3 Alice 50 Female Canada
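If you would rather keep the original DataFrame untouched, a boolean mask gives you the filtered rows as a new object, and assign() adds a column without modifying df. This is a sketch on the people table; the IsSenior column and its threshold of 40 are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Jane", "Bob", "Alice"],
                   "Age": [20, 30, 40, 50],
                   "Gender": ["Male", "Female", "Male", "Female"]})

# Keep rows with Age >= 35 (same effect as dropping Age < 35), df itself is unchanged
age_35_plus = df[df["Age"] >= 35]

# assign() returns a new DataFrame with the extra column (hypothetical IsSenior flag)
with_flag = df.assign(IsSenior=df["Age"] >= 40)

print(age_35_plus)
print(with_flag)
```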

5. Grouping and aggregating data:

You can group data by the values in a column and apply an aggregation function using the groupby() method together with apply(). Note that df now holds only the two rows left after the drop above:

import numpy as np
groupdf = df.groupby('Gender')['Age'].apply(np.mean)  # group by Gender and calculate mean Age
groupdf

Gender
Female    50.0
Male      40.0
Name: Age, dtype: float64
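For common aggregations you can also call them by name, and agg() computes several at once. A short sketch on the full four-row people table (so the numbers differ from the two-row result above):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Jane", "Bob", "Alice"],
                   "Age": [20, 30, 40, 50],
                   "Gender": ["Male", "Female", "Male", "Female"]})

# One row per Gender, one column per aggregation
stats = df.groupby("Gender")["Age"].agg(["mean", "min", "max", "count"])
print(stats)
```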

6. Merging and joining data:

You can merge or join data from multiple DataFrames using the merge() function or the concat() function.

For example, to merge two DataFrames based on a common column, you can use the following code:

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df1
key A B
0 K0 A0 B0
1 K1 A1 B1
2 K2 A2 B2
3 K3 A3 B3
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
df2
key C D
0 K0 C0 D0
1 K1 C1 D1
2 K2 C2 D2
3 K3 C3 D3
merged_df = pd.merge(df1, df2, on='key')
merged_df
key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3

To concatenate data horizontally (i.e. adding columns), you can use the concat() function:

concat_df = pd.concat([df1, df2], axis=1)
concat_df
key A B key C D
0 K0 A0 B0 K0 C0 D0
1 K1 A1 B1 K1 C1 D1
2 K2 A2 B2 K2 C2 D2
3 K3 A3 B3 K3 C3 D3
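In the example above both DataFrames share exactly the same keys, so the join type does not matter. The sketch below uses two small made-up frames whose keys only partly overlap, to show the default inner join next to how='outer' with indicator=True, plus a vertical concat with axis=0:

```python
import pandas as pd

# Illustrative frames: K0 exists only on the left, K3 only on the right
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"key": ["K1", "K2", "K3"], "C": ["C1", "C2", "C3"]})

# Default how='inner': only keys present in both frames survive
inner = pd.merge(left, right, on="key")

# how='outer' keeps every key; indicator=True adds a _merge column with the origin
outer = pd.merge(left, right, on="key", how="outer", indicator=True)

# axis=0 stacks rows; ignore_index=True renumbers the result from 0
stacked = pd.concat([left, left], axis=0, ignore_index=True)

print(inner)
print(outer)
print(stacked)
```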

7. Handling missing data:

It's common to encounter missing data in real-world datasets. Pandas provides various methods to handle missing data, such as filling missing values with a specific value or dropping rows with missing values.

To fill missing values with a specific value, you can use the fillna() method:

data = {'Name': ['John', 'Jane', 'Bob', 'Alice'],
        'Age': [20, 30, 40, np.nan],
        'Gender': ['Male', 'Female', 'Male', 'Female']}
df = pd.DataFrame(data)
df
Name Age Gender
0 John 20.0 Male
1 Jane 30.0 Female
2 Bob 40.0 Male
3 Alice NaN Female
df['Age'] = df['Age'].fillna(22)  # fill with a number, not the string '22', so the column stays numeric
df
Name Age Gender
0 John 20.0 Male
1 Jane 30.0 Female
2 Bob 40.0 Male
3 Alice 22.0 Female

To drop rows with missing values, you can use the dropna() method:

df.dropna(inplace=True)
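Before dropping anything, it often helps to count the gaps per column, and dropna() can be limited to specific columns with subset. A small sketch follows; the City column and its values are invented for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Jane", "Bob", "Alice"],
                   "Age": [20, 30, 40, np.nan],
                   "City": ["Pune", None, "Delhi", "Toronto"]})  # hypothetical column

# Number of missing values in each column
missing_per_column = df.isna().sum()

# Drop rows only where Age is missing; a missing City is tolerated
age_known = df.dropna(subset=["Age"])

# Drop rows with a missing value in any column
complete = df.dropna()

print(missing_per_column)
print(age_known)
print(complete)
```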

8. Working with dates and times:

Pandas has built-in support for working with dates and times.

Assuming your DataFrame has a column of date strings named 'Date', you can convert it to datetime objects using the to_datetime() function:

df['Date'] = pd.to_datetime(df['Date'])

You can then extract specific parts of the datetime, such as the year or month, using the dt attribute:

df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
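The DataFrame used so far has no 'Date' column, so here is a self-contained sketch with made-up dates and sales figures. Passing an explicit format is optional, but it makes the parsing unambiguous:

```python
import pandas as pd

# Sample data for illustration only
df = pd.DataFrame({"Date": ["2023-01-15", "2023-02-20", "2024-03-05"],
                   "Sales": [100, 150, 200]})

# Parse the strings into datetime64 values
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# The .dt accessor exposes the parts of each date
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Weekday"] = df["Date"].dt.day_name()

print(df)
```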

9. Advanced operations:

There are many more advanced operations that you can perform with pandas, such as pivot tables, time series analysis, and machine learning. Here are a few more code snippets to help you explore these topics:

To create a pivot table, you can use the pivot_table() function:

pivot_table = df.pivot_table(index='Column1', columns='Column2', values='Column3', aggfunc='mean')  # column names are placeholders

To perform time series analysis, you can use the resample() method to resample data at a different frequency. It needs a datetime-like index (or a datetime column passed via the on parameter):

resampled_df = df.resample('D').mean()  # resample to daily frequency
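The two snippets above use placeholder column names. As a runnable sketch, the following builds a tiny made-up sales table (all names and numbers are assumptions for illustration) and applies both pivot_table() and resample() to it:

```python
import pandas as pd

# Invented sales records: two per day, one per region
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01 09:00", "2023-01-01 15:00",
                            "2023-01-02 10:00", "2023-01-02 18:00"]),
    "Region": ["North", "South", "North", "South"],
    "Product": ["A", "A", "B", "B"],
    "Amount": [10.0, 20.0, 30.0, 50.0],
})

# Rows = Region, columns = Product, cells = mean Amount
pivot = sales.pivot_table(index="Region", columns="Product",
                          values="Amount", aggfunc="mean")

# resample() needs a datetime index, so set Date as the index first
daily = sales.set_index("Date")["Amount"].resample("D").mean()

print(pivot)
print(daily)
```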

10. Visualizing data:

You can use the plot() method, which relies on Matplotlib being installed, to create various types of plots, such as bar plots, scatter plots, and line plots:

df.plot(x='X Column', y='Y Column', kind='scatter')  # scatter plot
df.plot(x='X Column', y='Y Column', kind='bar')  # bar plot
df.plot(x='X Column', y='Y Column')  # line plot

In conclusion, pandas is a powerful and versatile library for data manipulation and analysis in Python. With its wide range of built-in functions and methods, pandas makes it easy to work with a variety of data sources, perform complex data operations, and visualize results. Whether you're a beginner or an experienced data scientist, pandas is an essential tool for any data-related project.

You will find the exercise questions in the Day 5 exercise notebook on GitHub.
