🔧 SQL Data Exploration for BoardGameGeek datasets
Nachrichtenbereich: 🔧 Programmierung
🔗 Quelle: dev.to
Based on the BoardGameGeek data fetched with the Python script (and this script) I've created before, I have made some SQL data exploration for the datasets that are available in the repository. You can check the SQL queries in the queries.sql
file in this repo, or clone the repo to your machine and use the datasets for your own exploration needs.
For start I will find number of games published in each year, and which year have the highest number of published games.
SELECT year_published, COUNT(*) total_games
FROM bgg.items
WHERE type = 'boardgame'
AND year_published <> '0'
AND rating <> '0'
AND year_published < 2024
AND year_published > 2012
GROUP BY year_published
ORDER BY year_published;
The output:
year_published | total_games |
---|---|
2013 | 2539 |
2014 | 2941 |
2015 | 3158 |
2016 | 3439 |
2017 | 3619 |
...
Next exploration is about the average rating of games per year for the past 20 years:
SELECT year_published, ROUND(AVG(rating), 3) AS rating
FROM bgg.items
WHERE type = 'boardgame'
AND year_published <> '0'
AND rating <> '0'
AND year_published < 2024
AND year_published > 2012
GROUP BY year_published
ORDER BY year_published;
The output:
year_published | rating |
---|---|
2013 | 6.284 |
2014 | 6.289 |
2015 | 6.369 |
2016 | 6.517 |
2017 | 6.645 |
...
Followed by a discovery for the top 10 board game categories in the past 10 years:
SELECT categories, SUM(games) AS total_games
FROM (
SELECT categories, COUNT(*) AS games
FROM bgg.items AS items
LEFT JOIN bgg.categories AS categories
USING (game_id)
WHERE year_published <> '0'
AND categories <> '0'
AND type = 'boardgame'
AND year_published > 2012
AND year_published < 2024
GROUP BY categories
) AS category_counts
GROUP BY categories
ORDER BY total_games DESC
LIMIT 10;
The output:
categories | total_games |
---|---|
Card Game | 14860 |
Party Game | 5345 |
Wargame | 5129 |
Fantasy | 4928 |
Children's Game | 4164 |
...
And also, beside the top categories, here are the 10 most popular mechanics in the past 10 years:
SELECT mechanics, SUM(games) AS total_games
FROM (
SELECT mechanics, COUNT(*) AS games
FROM bgg.items AS items
LEFT JOIN bgg.mechanics AS mechanics
USING (game_id)
WHERE year_published <> '0'
AND mechanics <> '0'
AND type = 'boardgame'
AND year_published > 2012
AND year_published < 2024
GROUP BY mechanics
) AS mechanics_counts
GROUP BY mechanics
ORDER BY total_games DESC
LIMIT 10;
The output:
mechanics | total_games |
---|---|
Dice Rolling | 10461 |
Hand Management | 7833 |
Set Collection | 5305 |
Variable Player Powers | 4215 |
Cooperative Game | 4195 |
...
Next in the list are the top five categories in the past 10 years with total games per category / per year ratio
WITH top_categories AS (
SELECT categories, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.categories AS category
USING (game_id)
WHERE type = 'boardgame'
AND year_published <> '0'
AND rating <> '0'
AND year_published > 2012
AND year_published < 2024
GROUP BY categories
ORDER BY total_games DESC
LIMIT 5
)
SELECT year_published, categories, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.categories AS category
USING (game_id)
WHERE type = 'boardgame'
AND year_published <> '0'
AND rating <> '0'
AND year_published > 2012
AND year_published < 2024
AND categories IN (SELECT categories FROM top_categories)
GROUP BY year_published, categories
ORDER BY year_published
The output:
year_published | categories | total_games |
---|---|---|
2013 | Card Game | 817 |
2013 | Dice | 238 |
2013 | Fantasy | 234 |
2013 | Party Game | 269 |
2013 | Wargame | 325 |
2014 | Card Game | 1022 |
...
In addition, I've done the similar exploration for the top five mechanics in the past 10 years with total games per category / per year ratio:
WITH top_mechanics AS (
SELECT mechanics, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.mechanics AS mechanic
USING (game_id)
WHERE type = 'boardgame'
AND year_published <> '0'
AND rating <> '0'
AND year_published > 2012
AND year_published < 2024
GROUP BY mechanics
ORDER BY total_games DESC
LIMIT 5
)
SELECT year_published, mechanics, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.mechanics AS mechanic
USING (game_id)
WHERE type = 'boardgame'
AND year_published <> '0'
AND rating <> '0'
AND year_published > 2012
AND year_published < 2024
AND mechanics IN (SELECT mechanics FROM top_mechanics)
GROUP BY year_published, mechanics
ORDER BY year_published
The output:
year_published | mechanics | total_games |
---|---|---|
2013 | Cooperative Game | 148 |
2013 | Dice Rolling | 635 |
2013 | Hand Management | 487 |
2013 | Set Collection | 327 |
2013 | Variable Player Powers | 260 |
...
Next are the top 15 most active publishers in the past 10 years,
SELECT publishers, COUNT(*) AS total_games
FROM bgg.items as items
LEFT JOIN bgg.publishers
USING (game_id)
WHERE publishers <> '0'
AND year_published > 2012
AND year_published < 2024
AND type = 'boardgame'
AND publishers NOT IN (
'(Self-Published)',
'(Web published)',
'Inc.',
'LLC',
'Ltd.',
'(Unknown)',
'(Looking for a publisher)',
'(Public Domain)'
)
GROUP BY publishers
ORDER BY total_games DESC
LIMIT 15;
The output:
publishers | total_games |
---|---|
Hasbro | 506 |
Pegasus Spiele | 468 |
Hobby World | 397 |
Korea Boardgames Co. | 384 |
Rebel Sp. z o.o. | 369 |
...
top 15 game designers in the past 10 years,
SELECT designers, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.designers
USING (game_id)
WHERE type = "boardgame"
AND year_published > 2012
AND year_published < 2024
AND designers NOT IN (
'0',
'(Uncredited)',
'Jr.'
)
GROUP BY designers
ORDER BY total_games DESC
LIMIT 15;
The output:
designers | total_games |
---|---|
Reiner Knizia | 203 |
Paul Rohrbaugh | 130 |
Joseph Miranda | 110 |
Prospero Hall | 109 |
Charles Darrow | 108 |
...
and the top 15 game artists in the past 10 years:
SELECT artists, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.artists
USING (game_id)
WHERE type = "boardgame"
AND year_published > 2012
AND year_published < 2024
AND artists NOT IN (
'0',
'(Uncredited)'
)
GROUP BY artists
ORDER BY total_games DESC
LIMIT 15;
The output:
artists | total_games |
---|---|
Joe Youst | 168 |
Mark Mahaffey | 135 |
Ilya Kudriashov | 123 |
Klemens Franz | 109 |
Michael Menzel | 106 |
...
There are two more discoveries that I have made, one being the top 15 categories in the past 10 years that have the highest average rating (although, these are not ordered by the games published in these categories; some other categories have more games published)
SELECT categories, ROUND(AVG(rating), 3) AS average_rating, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.categories AS categories
USING (game_id)
WHERE type = 'boardgame'
AND categories <> '0'
AND year_published > 2012
AND year_published < 2024
GROUP BY categories
ORDER BY average_rating DESC
LIMIT 15;
The output:
categories | average_rating | total_games |
---|---|---|
World War II | 6.504 | 1448 |
Vietnam War | 6.473 | 96 |
Civilization | 6.437 | 466 |
Renaissance | 6.382 | 311 |
Civil War | 6.353 | 199 |
...
And finally, the top 15 mechanics in the past 10 years that have the highest average rating:
SELECT mechanics, ROUND(AVG(rating), 3) AS average_rating, COUNT(*) AS total_games
FROM bgg.items AS items
LEFT JOIN bgg.mechanics AS mechanics
USING (game_id)
WHERE type = 'boardgame'
AND mechanics <> '0'
AND year_published > 2012
AND year_published < 2024
GROUP BY mechanics
ORDER BY average_rating DESC
LIMIT 15;
The output:
mechanics | average_rating | total_games |
---|---|---|
Auction: English | 8.348 | 2 |
Tags | 7.6 | 32 |
Neighbor Scope | 7.557 | 8 |
Auction Compensation | 7.374 | 2 |
Ratio / Combat Results Table | 7.374 | 137 |
...
...
🎥 Data Drift Monitoring for Azure ML Datasets
📈 19.18 Punkte
🎥 Video | Youtube
🔧 LUX: Toolkit for Smart Data Exploration
📈 17.93 Punkte
🔧 Programmierung