Lädt...


🔧 SQL Data Exploration for BoardGameGeek datasets


Nachrichtenbereich: 🔧 Programmierung
🔗 Quelle: dev.to

Based on the BoardGameGeek data fetched with the Python script (and this script) I've created before, I have made some SQL data exploration for the datasets that are available in the repository. You can check the SQL queries in the queries.sql file in this repo, or clone the repo to your machine and use the datasets for your own exploration needs.

For start I will find number of games published in each year, and which year have the highest number of published games.

SELECT year_published, COUNT(*) total_games
FROM bgg.items
WHERE type = 'boardgame'
    AND year_published <> '0'
    AND rating <> '0'
    AND year_published < 2024
    AND year_published > 2012
GROUP BY year_published
ORDER BY year_published;

The output:

year_published total_games
2013 2539
2014 2941
2015 3158
2016 3439
2017 3619

...

Next exploration is about the average rating of games per year for the past 20 years:

SELECT year_published, ROUND(AVG(rating), 3) AS rating
FROM bgg.items
WHERE type = 'boardgame'
    AND year_published <> '0'
    AND rating <> '0'
    AND year_published < 2024
    AND year_published > 2012
GROUP BY year_published
ORDER BY year_published;

The output:

year_published rating
2013 6.284
2014 6.289
2015 6.369
2016 6.517
2017 6.645

...

Followed by a discovery for the top 10 board game categories in the past 10 years:

SELECT categories, SUM(games) AS total_games
FROM (
    SELECT categories, COUNT(*) AS games
    FROM bgg.items AS items
    LEFT JOIN bgg.categories AS categories
    USING (game_id)
    WHERE year_published <> '0'
        AND categories <> '0'
        AND type = 'boardgame'
        AND year_published > 2012
        AND year_published < 2024
    GROUP BY categories
) AS category_counts
GROUP BY categories
ORDER BY total_games DESC
LIMIT 10;

The output:

categories total_games
Card Game 14860
Party Game 5345
Wargame 5129
Fantasy 4928
Children's Game 4164

...

And also, beside the top categories, here are the 10 most popular mechanics in the past 10 years:

SELECT mechanics, SUM(games) AS total_games
FROM (
    SELECT mechanics, COUNT(*) AS games
    FROM bgg.items AS items
    LEFT JOIN bgg.mechanics AS mechanics
    USING (game_id)
    WHERE year_published <> '0'
        AND mechanics <> '0'
        AND type = 'boardgame'
        AND year_published > 2012
        AND year_published < 2024
    GROUP BY mechanics
) AS mechanics_counts
GROUP BY mechanics
ORDER BY total_games DESC
LIMIT 10;

The output:

mechanics total_games
Dice Rolling 10461
Hand Management 7833
Set Collection 5305
Variable Player Powers 4215
Cooperative Game 4195

...

Next in the list are the top five categories in the past 10 years with total games per category / per year ratio

WITH top_categories AS (
    SELECT categories, COUNT(*) AS total_games
    FROM bgg.items AS items
    LEFT JOIN bgg.categories AS category
    USING (game_id)
    WHERE type = 'boardgame'
        AND year_published <> '0'
        AND rating <> '0'
        AND year_published > 2012
        AND year_published < 2024
    GROUP BY categories
    ORDER BY total_games DESC
    LIMIT 5
)
SELECT year_published, categories, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.categories AS category
USING (game_id)
WHERE type = 'boardgame'
    AND year_published <> '0'
    AND rating <> '0'
    AND year_published > 2012
    AND year_published < 2024
    AND categories IN (SELECT categories FROM top_categories)
GROUP BY year_published, categories
ORDER BY year_published

The output:

year_published categories total_games
2013 Card Game 817
2013 Dice 238
2013 Fantasy 234
2013 Party Game 269
2013 Wargame 325
2014 Card Game 1022

...

In addition, I've done the similar exploration for the top five mechanics in the past 10 years with total games per category / per year ratio:

WITH top_mechanics AS (
    SELECT mechanics, COUNT(*) AS total_games
    FROM bgg.items AS items
    LEFT JOIN bgg.mechanics AS mechanic
    USING (game_id)
    WHERE type = 'boardgame'
        AND year_published <> '0'
        AND rating <> '0'
        AND year_published > 2012
        AND year_published < 2024
    GROUP BY mechanics
    ORDER BY total_games DESC
    LIMIT 5
)
SELECT year_published, mechanics, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.mechanics AS mechanic
USING (game_id)
WHERE type = 'boardgame'
    AND year_published <> '0'
    AND rating <> '0'
    AND year_published > 2012
    AND year_published < 2024
    AND mechanics IN (SELECT mechanics FROM top_mechanics)
GROUP BY year_published, mechanics
ORDER BY year_published

The output:

year_published mechanics total_games
2013 Cooperative Game 148
2013 Dice Rolling 635
2013 Hand Management 487
2013 Set Collection 327
2013 Variable Player Powers 260

...

Next are the top 15 most active publishers in the past 10 years,

SELECT publishers, COUNT(*) AS total_games
FROM bgg.items as items
LEFT JOIN bgg.publishers
USING (game_id)
WHERE publishers <> '0'
    AND year_published > 2012
    AND year_published < 2024
    AND type = 'boardgame'
    AND publishers NOT IN (
        '(Self-Published)',
        '(Web published)',
        'Inc.',
        'LLC',
        'Ltd.',
        '(Unknown)',
        '(Looking for a publisher)',
        '(Public Domain)'
    )
GROUP BY publishers
ORDER BY total_games DESC
LIMIT 15;

The output:

publishers total_games
Hasbro 506
Pegasus Spiele 468
Hobby World 397
Korea Boardgames Co. 384
Rebel Sp. z o.o. 369

...

top 15 game designers in the past 10 years,

SELECT designers, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.designers
USING (game_id)
WHERE type = "boardgame"
    AND year_published > 2012
    AND year_published < 2024
    AND designers NOT IN (
        '0',
        '(Uncredited)',
        'Jr.'
    )
GROUP BY designers
ORDER BY total_games DESC
LIMIT 15;

The output:

designers total_games
Reiner Knizia 203
Paul Rohrbaugh 130
Joseph Miranda 110
Prospero Hall 109
Charles Darrow 108

...

and the top 15 game artists in the past 10 years:

SELECT artists, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.artists
USING (game_id)
WHERE type = "boardgame"
    AND year_published > 2012
    AND year_published < 2024
    AND artists NOT IN (
        '0',
        '(Uncredited)'
    )
GROUP BY artists
ORDER BY total_games DESC
LIMIT 15;

The output:

artists total_games
Joe Youst 168
Mark Mahaffey 135
Ilya Kudriashov 123
Klemens Franz 109
Michael Menzel 106

...

There are two more discoveries that I have made, one being the top 15 categories in the past 10 years that have the highest average rating (although, these are not ordered by the games published in these categories; some other categories have more games published)

SELECT categories, ROUND(AVG(rating), 3) AS average_rating, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.categories AS categories
USING (game_id)
WHERE type = 'boardgame'
    AND categories <> '0'
    AND year_published > 2012
    AND year_published < 2024
GROUP BY categories
ORDER BY average_rating DESC
LIMIT 15;

The output:

categories average_rating total_games
World War II 6.504 1448
Vietnam War 6.473 96
Civilization 6.437 466
Renaissance 6.382 311
Civil War 6.353 199

...

And finally, the top 15 mechanics in the past 10 years that have the highest average rating:

SELECT mechanics, ROUND(AVG(rating), 3) AS average_rating, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.mechanics AS mechanics
USING (game_id)
WHERE type = 'boardgame'
    AND mechanics <> '0'
    AND year_published > 2012
    AND year_published < 2024
GROUP BY mechanics
ORDER BY average_rating DESC
LIMIT 15;

The output:

mechanics average_rating total_games
Auction: English 8.348 2
Tags 7.6 32
Neighbor Scope 7.557 8
Auction Compensation 7.374 2
Ratio / Combat Results Table 7.374 137

...

...

🔧 SQL Data Exploration for BoardGameGeek datasets


📈 70.31 Punkte
🔧 Programmierung

🔧 BoardGameGeek board games data fetching with Python


📈 35.47 Punkte
🔧 Programmierung

🔧 I've updated the Python fetcher for BoardGameGeek data


📈 35.47 Punkte
🔧 Programmierung

🔧 CVPR 2024 Datasets and Benchmarks - Part 1: Datasets


📈 32.73 Punkte
🔧 Programmierung

🍏 Open Data Editor 1.0.0 - Explore and publish data: datasets, tables, charts, maps, stories, and more.


📈 21.99 Punkte
🍏 iOS / Mac OS

📰 From Data-Informed to Data-Driven Decisions: An Introduction to Tradespace Exploration


📈 20.75 Punkte
🔧 AI Nachrichten

🔧 Technical Report: Initial Data Analysis of Titanic Datasets


📈 19.18 Punkte
🔧 Programmierung

🔧 Dynamic Datasets and Customizable Parameters for Data Analysis


📈 19.18 Punkte
🔧 Programmierung

📰 Using GPT-3.5-Turbo and GPT-4 to Apply Text-defined Data Quality Checks on Humanitarian Datasets


📈 19.18 Punkte
🔧 AI Nachrichten

📰 Master Data Integrity to Clean Your Computer Vision Datasets


📈 19.18 Punkte
🔧 AI Nachrichten

📰 Researchers develop a process to categorize massive datasets, making data more accessible


📈 19.18 Punkte
📰 IT Security Nachrichten

🔧 Data Drift Monitoring for Azure ML Datasets | AI Show


📈 19.18 Punkte
🔧 Programmierung

🎥 Data Drift Monitoring for Azure ML Datasets


📈 19.18 Punkte
🎥 Video | Youtube

🎥 Azure Open Datasets: ML-Ready Open Data on Azure


📈 19.18 Punkte
🎥 Video | Youtube

🔧 Mining of Massive Datasets: The Ultimate Guide for Data Science Enthusiasts


📈 19.18 Punkte
🔧 Programmierung

🔧 Azure Open Datasets: ML-Ready Open Data on Azure | AI Show


📈 19.18 Punkte
🔧 Programmierung

📰 Facebook Data of Millions Exposed in Leaky Datasets


📈 19.18 Punkte
📰 IT Security Nachrichten

📰 Sepal AI: A Data Development Platform that Enables You to Curate Useful Datasets


📈 19.18 Punkte
🔧 AI Nachrichten

📰 Sepal AI: A Data Development Platform that Enables You to Curate Useful Datasets


📈 19.18 Punkte
🔧 AI Nachrichten

🔧 SQL and Database Design: A Detailed Exploration - Week Seven


📈 18.48 Punkte
🔧 Programmierung

🔧 Exploration of Digital Banking Transactions: A SQL Analysis


📈 18.48 Punkte
🔧 Programmierung

🔧 LUX: Toolkit for Smart Data Exploration


📈 17.93 Punkte
🔧 Programmierung

🔧 PivotTable : Toolkit for Smart Data Exploration and Analysis


📈 17.93 Punkte
🔧 Programmierung

🔧 Data Management in Distributed Systems: A Comprehensive Exploration of Open Table Formats


📈 17.93 Punkte
🔧 Programmierung

📰 Microsoft Researchers Introduce InsightPilot: An LLM-Empowered Automated Data Exploration System


📈 17.93 Punkte
🔧 AI Nachrichten

🎥 How autonomous agents will support data exploration


📈 17.93 Punkte
🎥 Video | Youtube

📰 Exploration of DShield Cowrie Data with jq, (Wed, Apr 5th)


📈 17.93 Punkte
📰 IT Security

📰 Visualization Module in Arabica Speeds Up Text Data Exploration


📈 17.93 Punkte
🔧 AI Nachrichten

matomo