The following packages are used in this analysis:
library(dplyr)
library(ggplot2)
library(jsonlite)
library(knitr)
library(readr)
library(stringr)
library(tidyr)
library(zoo)
Hearthstone is a popular collectible card game published by Blizzard Entertainment in 2014, which is based on the Warcraft series by the same company. The goal of the game is to build a deck of 30 cards and defeat the opponent who also has a deck of 30 cards.
Cards can be classified according to the following categories:
Which are the most popular cards used in Ranked decks?
Which are the most popular cards used in Ranked decks?
We focus on the Ranked format where players get to decide which cards to include in their deck, therefore the cards’ popularity are more accurately represented, and the gameplay is not subject to additional constraints that other game modes (like Tavern Brawls and Adventures) may impose.
How do we determine popularity?
We determine popularity by the number of decks that include at least 1 copy in the starting 30 cards (not generated by other effects).
A deck can include at most 2 copies of any card (1 for Legendary cards), thus a card’s popularity is not heavily influenced by the number of copies players wish to use.
Possible biases to consider
Decks and cards presented in this data may not necessarily be used by players in the game itself. However, without the means to track actual matches being played in the game itself, we instead use the list of decks submitted by players as an approximation of the decks and cards often used by the playerbase.
Since Neutral cards can be used by multiple classes, they should be more popular than Class cards.
For the Wild format, cards from the older sets may be more popular simply because they have been in the game longer.
For the Standard format, cards from the Basic and Classic sets will be more popular because they do not rotate out of the format unlike expansion cards.
If a certain card becomes too popular (i.e. the community thinks players must include it in their decks), it reduces the card variety in the metagame and makes gameplay frustrating for other players (amongst other consequences). In the long term, this may lead to player attrition and loss of potential revenue (when players purchase card packs or other cosmetics).
Historically, Blizzard has dealt with problematic cards in one of several ways:
There are three datasets we will use:
data.csv
contains a list of decks submitted by players to HearthPwn from 2013 (pre-launch) to 2017,refs.json
contains detailed information about all cards (both collectible and non-collectible) up to March 2017.
cards_collectible.json
contains detailed information about all the cards that are collectible in the game.
Assuming that the datasets are stored in a separate folder at the same subfolder level, the following code reads the datasets to be used:
data_path <- file.path("..", "data")
# data for decks
decks_raw <- read_csv(file.path(data_path, "data.csv"))
# data for collectible cards
cards_raw <- tbl_df(fromJSON(file.path(data_path, "cards_collectible.json"), flatten = TRUE))
# data for all cards
cards_all_raw <- fromJSON(file.path(data_path, "refs.json"), flatten = TRUE)
The first few rows and columns of the raw data decks_raw
is shown below:
craft_cost | date | deck_archetype | deck_class | deck_format | deck_id | deck_set | deck_type | rating | title |
---|---|---|---|---|---|---|---|---|---|
9740 | 2016-02-19 | Unknown | Priest | W | 433004 | Explorers | Tavern Brawl | 1 | Reno Priest |
9840 | 2016-02-19 | Unknown | Warrior | W | 433003 | Explorers | Ranked Deck | 1 | RoosterWarrior |
2600 | 2016-02-19 | Unknown | Mage | W | 433002 | Explorers | Theorycraft | 1 | Annoying |
15600 | 2016-02-19 | Unknown | Warrior | W | 433001 | Explorers | None | 0 | Standart pay to win warrior |
7700 | 2016-02-19 | Unknown | Paladin | W | 432997 | Explorers | Ranked Deck | 1 | Palamix |
5740 | 2016-02-19 | Unknown | Warrior | W | 432995 | Explorers | Ranked Deck | 2 | Kolento’s Elise Control Warrior |
The decks_raw
data has 346232 rows and 41 columns. The first 11 columns craft_cost
to user
describe the deck’s attributes (like date submitted, class, deck format) while the remaining 30 columns describe the cards each deck contains (based on the card’s unique ID which can be referenced from the cards_raw
data).
Detailed information on the variables can be found on the Kaggle dataset.
names(decks_raw)
## [1] "craft_cost" "date" "deck_archetype" "deck_class"
## [5] "deck_format" "deck_id" "deck_set" "deck_type"
## [9] "rating" "title" "user" "card_0"
## [13] "card_1" "card_2" "card_3" "card_4"
## [17] "card_5" "card_6" "card_7" "card_8"
## [21] "card_9" "card_10" "card_11" "card_12"
## [25] "card_13" "card_14" "card_15" "card_16"
## [29] "card_17" "card_18" "card_19" "card_20"
## [33] "card_21" "card_22" "card_23" "card_24"
## [37] "card_25" "card_26" "card_27" "card_28"
## [41] "card_29"
glimpse(decks_raw)
## Observations: 346,232
## Variables: 41
## $ craft_cost <int> 9740, 9840, 2600, 15600, 7700, 5740, 1800, 1800...
## $ date <date> 2016-02-19, 2016-02-19, 2016-02-19, 2016-02-19...
## $ deck_archetype <chr> "Unknown", "Unknown", "Unknown", "Unknown", "Un...
## $ deck_class <chr> "Priest", "Warrior", "Mage", "Warrior", "Paladi...
## $ deck_format <chr> "W", "W", "W", "W", "W", "W", "W", "W", "W", "W...
## $ deck_id <int> 433004, 433003, 433002, 433001, 432997, 432995,...
## $ deck_set <chr> "Explorers", "Explorers", "Explorers", "Explore...
## $ deck_type <chr> "Tavern Brawl", "Ranked Deck", "Theorycraft", "...
## $ rating <int> 1, 1, 1, 0, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,...
## $ title <chr> "Reno Priest", "RoosterWarrior", "Annoying", "S...
## $ user <chr> "FunKaliTy", "RooosterRooo", "Messalm", "KingSn...
## $ card_0 <int> 9, 75, 195, 75, 476, 75, 1007, 48, 272, 147, 11...
## $ card_1 <int> 237, 75, 195, 290, 476, 75, 90, 48, 613, 290, 3...
## $ card_2 <int> 279, 285, 315, 304, 679, 400, 285, 397, 613, 29...
## $ card_3 <int> 545, 401, 315, 338, 943, 400, 389, 753, 635, 41...
## $ card_4 <int> 613, 401, 531, 401, 890, 401, 1026, 753, 220, 4...
## $ card_5 <int> 791, 546, 555, 401, 756, 401, 523, 1014, 825, 4...
## $ card_6 <int> 797, 546, 555, 546, 1073, 546, 531, 1029, 825, ...
## $ card_7 <int> 841, 596, 614, 546, 1073, 546, 642, 1029, 1186,...
## $ card_8 <int> 1186, 596, 614, 581, 1167, 596, 654, 1092, 1650...
## $ card_9 <int> 1189, 635, 662, 785, 1721, 596, 757, 1098, 1650...
## $ card_10 <int> 1363, 635, 662, 785, 1753, 636, 825, 1940, 1753...
## $ card_11 <int> 1365, 785, 1084, 810, 1783, 785, 921, 1997, 175...
## $ card_12 <int> 1367, 785, 1084, 825, 1786, 785, 985, 2012, 179...
## $ card_13 <int> 1650, 810, 1659, 825, 1786, 912, 994, 2012, 179...
## $ card_14 <int> 1933, 1023, 1659, 912, 1793, 1023, 1037, 2033, ...
## $ card_15 <int> 1938, 1186, 1753, 1023, 1793, 1023, 1073, 2033,...
## $ card_16 <int> 2040, 1657, 1753, 1023, 1804, 1657, 1098, 2035,...
## $ card_17 <int> 2067, 1659, 1783, 1074, 1804, 1659, 1109, 2046,...
## $ card_18 <int> 1935, 1659, 1783, 1074, 2026, 1659, 1182, 2046,...
## $ card_19 <int> 2262, 1721, 1793, 1186, 2026, 1793, 1940, 2065,...
## $ card_20 <int> 374, 1781, 1793, 1657, 2027, 1793, 2050, 2067, ...
## $ card_21 <int> 2280, 1781, 1801, 1721, 2029, 1805, 2064, 2071,...
## $ card_22 <int> 2511, 2021, 1801, 2018, 2029, 1805, 2064, 2071,...
## $ card_23 <int> 2555, 2021, 2037, 2296, 2064, 1808, 2257, 2073,...
## $ card_24 <int> 2566, 2064, 2037, 2262, 2078, 1808, 2490, 2093,...
## $ card_25 <int> 2582, 2064, 2064, 336, 374, 2021, 2495, 2306, 2...
## $ card_26 <int> 2683, 2078, 2064, 2729, 2717, 2729, 2495, 2490,...
## $ card_27 <int> 2736, 2510, 2078, 2729, 2717, 2729, 2521, 2507,...
## $ card_28 <int> 2568, 2729, 38710, 2736, 2889, 2736, 2576, 2520...
## $ card_29 <int> 2883, 2736, 38710, 2760, 2889, 2951, 2580, 2752...
There are 8 rows that contain missing data. The missing values are in the 10th column, which contains the decks’ title
as submitted by the users, so they can be safely ignored.
which(is.na(decks_raw), arr.ind = TRUE)
## row col
## [1,] 16747 10
## [2,] 175608 10
## [3,] 216047 10
## [4,] 238021 10
## [5,] 278491 10
## [6,] 326192 10
## [7,] 329285 10
## [8,] 329286 10
The first few rows and columns of the cards_raw
data is shown below (some columns are shown truncated):
col_trunc <- function(col, width, side = c("right", "left", "center"),
ellipsis = "..."){
if (class(col) == "character"){
col <- str_trunc(col, width, side, ellipsis)
return(col)
} else {
return(col)
}
}
cards_raw[1:12] %>%
lapply(col_trunc, width = 10, side = "right") %>%
as_tibble() %>%
head()
artist | cardClass | collectible | cost | dbfId | flavor | id | name | rarity | set | text | type |
---|---|---|---|---|---|---|---|---|---|---|---|
Nutthap… | MAGE | TRUE | 5 | 2539 | It’s on… | AT_001 | Flame L… | COMMON | TGT | Deal $8… | SPELL |
Tooth | MAGE | TRUE | 3 | 2541 | Burning… | AT_002 | Effigy | RARE | TGT | Secr… | SPELL |
Arthur … | MAGE | TRUE | 2 | 2545 | And he … | AT_003 | Fallen … | RARE | TGT | Your He… | MINION |
Gabor S… | MAGE | TRUE | 1 | 2572 | Now wit… | AT_004 | Arcane … | EPIC | TGT | Deal $2… | SPELL |
Mike Sass | MAGE | TRUE | 3 | 2542 | It’s al… | AT_005 | Polymor… | RARE | TGT | Transfo… | SPELL |
Dan Scott | MAGE | TRUE | 4 | 2549 | Is he a… | AT_006 | Dalaran… | COMMON | TGT | Insp… | MINION |
This dataset has 1751 rows and 65 columns. The first 32 columns artist
to questReward
describe the characteristics of each card. Some of the characteristics, including name
, Mana cost
, race
, health
and text
, can also been seen on the card itself.
Each card has two unique identifiers: a character/string id
and an integer dbfId
. The integer dbfId
is used across both the deck and card datasets, which can be joined later.
names(cards_raw[1:32])
## [1] "artist" "cardClass" "collectible"
## [4] "cost" "dbfId" "flavor"
## [7] "id" "name" "rarity"
## [10] "set" "text" "type"
## [13] "mechanics" "attack" "health"
## [16] "referencedTags" "race" "elite"
## [19] "targetingArrowText" "durability" "overload"
## [22] "spellDamage" "armor" "faction"
## [25] "howToEarn" "howToEarnGolden" "collectionText"
## [28] "classes" "multiClassGroup" "entourage"
## [31] "hideStats" "questReward"
Some of the characteristics, including name
, Mana cost
, race
, health
and text
, can also been seen on the card itself. Key mechanics
are highlighted bold in the card text:
The remaining columns are in fact nested under the playRequirements
field, which specifies how certain cards can only be played in the game. These requirements are also implicitly/explicitly stated in the card text. (Column names are shown truncated)
names(cards_raw)[33:65] %>%
str_trunc(25, "left")
## [1] "...ents.REQ_MINION_TARGET" "...nts.REQ_TARGET_TO_PLAY"
## [3] "...ments.REQ_ENEMY_TARGET" "...s.REQ_TARGET_WITH_RACE"
## [5] "..._MINIMUM_ENEMY_MINIONS" "...s.REQ_TARGET_FOR_COMBO"
## [7] "...ts.REQ_FRIENDLY_TARGET" "...EQ_TARGET_IF_AVAILABLE"
## [9] "...s.REQ_NUM_MINION_SLOTS" "...NIMUM_FRIENDLY_MINIONS"
## [11] "...ARGET_WITH_DEATHRATTLE" "...OF_RACE_DIED_THIS_GAME"
## [13] "..._MINION_DIED_THIS_GAME" "...s.REQ_LEGENDARY_TARGET"
## [15] "...BLE_AND_DRAGON_IN_HAND" "....REQ_TARGET_MAX_ATTACK"
## [17] "...nts.REQ_NONSELF_TARGET" "...s.REQ_STEALTHED_TARGET"
## [19] "...ements.REQ_HERO_TARGET" "...T_OR_MANA_CRYSTAL_SLOT"
## [21] "...s.REQ_UNDAMAGED_TARGET" "...ts.REQ_WEAPON_EQUIPPED"
## [23] "...nts.REQ_DAMAGED_TARGET" "...EQ_MUST_TARGET_TAUNTER"
## [25] "....REQ_TARGET_MIN_ATTACK" "..._MINIMUM_TOTAL_MINIONS"
## [27] "...ments.REQ_DRAG_TO_PLAY" "...ONE_CAP_FOR_NON_SECRET"
## [29] "...ents.REQ_FROZEN_TARGET" "...NO_3_COST_CARD_IN_DECK"
## [31] "...NIMUM_FRIENDLY_SECRETS" "...s.REQ_CANNOT_PLAY_THIS"
## [33] "...ENTAL_PLAYED_LAST_TURN"
Additional information on the variables can be found on HearthstoneJSON.
The following card, based on its text, would require certain conditions to be played (a minion on the board, that has not taken any damage):
The following computes the number of missing values in each field, with the exception of those that are present as lists or data frames (mechanics
, referencedTags
, classes
, entourage
).
Many of the columns contain significant amounts of missing values. However, some key attributes that define the card such as cardClass
, collectible
, dbfId
, name
, set
, and type
do not contain any missing values.
# code is not run
sapply(cards_raw, function(x) sum(is.na(x)))
The raw dataset is split into two, decks_attr
containing the deck attributes and decks_comp
containing the deck composition (cards), joined by the deck_id
identifier. We also exclude decks created before launch (there were many card changes in the alpha and beta stages, making card popularity very volatile).
The decks_comp
data will be pivoted to long format later on, thus excluding fields that are not related to the cards will minimize the size of the dataset.
launch_date <- as.Date("2014-03-11")
decks_attr <- decks_raw %>%
filter(date >= launch_date) %>%
select(deck_id, craft_cost:deck_format, deck_set:user)
decks_comp <- decks_raw %>%
filter(date >= launch_date) %>%
select(deck_id, card_0:card_29)
Within decks_attr
, the factor columns are identified and converted to the appropriate type.
fct_cols_attr <- c("deck_archetype", "deck_class", "deck_format", "deck_set", "deck_type")
decks_attr[fct_cols_attr] <- lapply(decks_attr[fct_cols_attr], factor)
rm(fct_cols_attr)
While each deck has a submission date
, we may also be interested in grouping the decks by month (which corresponds to Ranked seasons) and by year (which is marked by expansion release dates instead of calendar dates).
Years in the game based on a time period that:
So based on the release dates (taken from the above link), the years would be:
# month (with year)
decks_attr$hsmonth <- as.yearmon(decks_attr$date)
# year (not by calendar)
decks_attr$hsyear <- case_when(
decks_attr$date <= as.Date("2015-04-01") ~ "2014",
decks_attr$date >= as.Date("2015-04-02") &
decks_attr$date <= as.Date("2016-04-25") ~ "2015",
decks_attr$date >= as.Date("2016-04-26") ~ "2016"
) %>%
factor()
The Standard and Wild formats were formally introduced into the game on 2016-04-26 with the release of Whispers of the Old Gods; however the plot below shows that many decks from June 2014 to April 2016 were mistakenly labelled as Wild.
We can simply relabel all decks created before 2016-04-26 as Standard since all cards before then are not separated by format:
decks_attr$deck_format[decks_attr$date < as.Date("2016-04-26")] <- "S"
A summary of the processed data is shown below:
## Observations: 324,104
## Variables: 13
## $ deck_id <int> 433004, 433003, 433002, 433001, 432997, 432995,...
## $ craft_cost <int> 9740, 9840, 2600, 15600, 7700, 5740, 1800, 1800...
## $ date <date> 2016-02-19, 2016-02-19, 2016-02-19, 2016-02-19...
## $ deck_archetype <fct> Unknown, Unknown, Unknown, Unknown, Unknown, Un...
## $ deck_class <fct> Priest, Warrior, Mage, Warrior, Paladin, Warrio...
## $ deck_format <fct> S, S, S, S, S, S, S, S, S, S, S, S, S, S, S, S,...
## $ deck_set <fct> Explorers, Explorers, Explorers, Explorers, Exp...
## $ deck_type <fct> Tavern Brawl, Ranked Deck, Theorycraft, None, R...
## $ rating <int> 1, 1, 1, 0, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,...
## $ title <chr> "Reno Priest", "RoosterWarrior", "Annoying", "S...
## $ user <chr> "FunKaliTy", "RooosterRooo", "Messalm", "KingSn...
## $ hsmonth <S3: yearmon> Feb 2016, Feb 2016, Feb 2016, Feb 2016,...
## $ hsyear <fct> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,...
## deck_id craft_cost date
## Min. : 36923 Min. : 0 Min. :2014-03-11
## 1st Qu.:253573 1st Qu.: 2840 1st Qu.:2015-05-26
## Median :428597 Median : 5120 Median :2016-02-09
## Mean :419989 Mean : 5745 Mean :2015-12-21
## 3rd Qu.:603508 3rd Qu.: 7840 3rd Qu.:2016-08-09
## Max. :749548 Max. :48000 Max. :2017-03-19
##
## deck_archetype deck_class deck_format
## Unknown :220501 Mage :42230 S:307743
## Midrange Shaman: 5472 Priest :41756 W: 16361
## Control Priest : 5135 Paladin:39368
## Control Warrior: 4939 Warlock:35598
## Tempo Mage : 4545 Druid :35488
## Midrange Hunter: 4371 Shaman :33969
## (Other) : 79141 (Other):95695
## deck_set deck_type rating
## Explorers : 57307 Arena : 8178 Min. : 0.000
## Old Gods : 49895 None : 75120 1st Qu.: 1.000
## Blackrock Launch: 38900 PvE Adventure: 9059 Median : 1.000
## Gadgetzan : 31329 Ranked Deck :202104 Mean : 2.777
## Naxx Launch : 22283 Tavern Brawl : 6360 3rd Qu.: 1.000
## Yogg Nerf : 22175 Theorycraft : 19686 Max. :4016.000
## (Other) :102215 Tournament : 3597
## title user hsmonth hsyear
## Length:324104 Length:324104 Min. :2014 2014: 65119
## Class :character Class :character 1st Qu.:2015 2015:128062
## Mode :character Mode :character Median :2016 2016:130923
## Mean :2016
## 3rd Qu.:2017
## Max. :2017
##
Many columns in the cards_raw
data pertain to the card’s stats, mechanics and play requirements, which is better explained by the text on the card image. So we choose to only include the columns that we consider are the core properties of the card:
cards_simple <- select(cards_raw,
dbfId, name, cost, cardClass,
rarity, type, set, collectible, id)
## Observations: 1,751
## Variables: 9
## $ dbfId <int> 2539, 2541, 2545, 2572, 2542, 2549, 2571, 2544, 25...
## $ name <chr> "Flame Lance", "Effigy", "Fallen Hero", "Arcane Bl...
## $ cost <int> 5, 3, 2, 1, 3, 4, 3, 6, 8, 5, 4, 4, 1, 3, 2, 2, 4,...
## $ cardClass <chr> "MAGE", "MAGE", "MAGE", "MAGE", "MAGE", "MAGE", "M...
## $ rarity <chr> "COMMON", "RARE", "RARE", "EPIC", "RARE", "COMMON"...
## $ type <chr> "SPELL", "SPELL", "MINION", "SPELL", "SPELL", "MIN...
## $ set <chr> "TGT", "TGT", "TGT", "TGT", "TGT", "TGT", "TGT", "...
## $ collectible <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR...
## $ id <chr> "AT_001", "AT_002", "AT_003", "AT_004", "AT_005", ...
The set
column contains abbreviated names or nicknames and is not necessarily informative; we create a new column that uses the actual names of the card sets:
unique(cards_simple$set)
## [1] "TGT" "BOOMSDAY" "BRM" "GANGS"
## [5] "CORE" "EXPERT1" "HOF" "NAXX"
## [9] "GILNEAS" "GVG" "HERO_SKINS" "ICECROWN"
## [13] "KARA" "LOE" "LOOTAPALOOZA" "OG"
## [17] "UNGORO"
# list for recoding card sets
cardset_lst <- list(
"CORE" = "Basic",
"EXPERT1" = "Classic",
# 2014 sets
"NAXX" = "Curse of Naxxramas",
"GVG" = "Goblins vs Gnomes",
# 2015 sets
"BRM" = "Blackrock Mountain",
"TGT" = "The Grand Tournament",
"LOE" = "League of Explorers",
# 2016 sets
"OG" = "Whispers of the Old Gods",
"KARA" = "One Night in Karazhan",
"GANGS" = "Mean Streets of Gadgetzan",
"HOF" = "Hall of Fame",
# 2017 sets
"UNGORO" = "Journey to Un'Goro",
"ICECROWN" = "Knights of the Frozen Throne",
"LOOTAPALOOZA" = "Kobolds & Catacombs",
# 2018 sets
"GILNEAS" = "The Witchwood",
"BOOMSDAY" = "The Boomsday Project"
)
cards_simple$card_set <- recode(cards_simple$set, !!!cardset_lst,
.default = "Other")
# drop the set column to avoid confusion (it remains in the cards_raw dataset)
cards_simple$set <- NULL
# remove list (no longer needed)
rm(cardset_lst)
Some columns are entirely uppercase, which we convert to title case for readability:
titlecase_cols <- c("cardClass", "rarity", "type")
cards_simple[titlecase_cols] <- lapply(cards_simple[titlecase_cols],
str_to_title)
rm(titlecase_cols)
The factor columns are then identified and converted to the appropriate type:
fct_cols_cards <- c("cardClass", "rarity", "type", "card_set")
cards_simple[fct_cols_cards] <- lapply(cards_simple[fct_cols_cards], factor)
rm(fct_cols_cards)
A summary of the processed data is shown below:
## dbfId name cost cardClass
## Min. : 7 Length:1751 Min. : 0.000 Neutral:657
## 1st Qu.: 1987 Class :character 1st Qu.: 2.000 Paladin:123
## Median :38957 Mode :character Median : 4.000 Hunter :122
## Mean :25375 Mean : 3.856 Mage :122
## 3rd Qu.:43163 3rd Qu.: 5.000 Warlock:122
## Max. :53187 Max. :20.000 Druid :121
## NA's :22 (Other):484
## rarity type collectible id
## Common :612 Hero : 33 Mode:logical Length:1751
## Epic :298 Minion:1192 TRUE:1751 Class :character
## Free :142 Spell : 471 Mode :character
## Legendary:253 Weapon: 55
## Rare :446
##
##
## card_set
## Classic :236
## Basic :142
## Journey to Un'Goro :135
## Knights of the Frozen Throne:135
## Kobolds & Catacombs :135
## The Boomsday Project :135
## (Other) :833
Card IDs in the decks_comp
data may be incorrect, due to the following reasons:
Specifically, we are looking for the version of each card that is collectible (since all cards used in Ranked decks must be collectible). This step requires using the full card data cards_all_raw
(which contains non-collectible cards).
The first step is to compile a list of unique card IDs found in decks_comp
:
# IDs of all cards used
cards_used <- decks_comp %>%
select(card_0:card_29) %>%
unlist(use.names = FALSE) %>% # flatten into a vector
unique() %>%
sort()
The following code filters for those IDs that are not found in our collectible cards data cards_raw
:
# IDs of missing cards (using dbfId)
missing_cards <- cards_used[!cards_used %in% cards_raw$dbfId]
Using the above vector of IDs, we filter out the relevant non-collectible cards from the cards_all_raw
dataset.
# filter for those missing cards only
mssng_cards <- cards_all_raw %>%
select(dbfId, name, cardClass, type, collectible) %>%
filter(dbfId %in% missing_cards)
dbfId | name | cardClass | type | collectible |
---|---|---|---|---|
40341 | Cleave | NEUTRAL | SPELL | NA |
2177 | Dark Wispers | DRUID | SPELL | NA |
42146 | Doppelgangster | NEUTRAL | MINION | NA |
38319 | Druid of the Claw | DRUID | MINION | NA |
2230 | Druid of the Fang | DRUID | MINION | NA |
2310 | Druid of the Flame | DRUID | MINION | NA |
To find the correct IDs, we join the cards by name to the cards_simple
dataset. The dbfID.x
on the left would be replaced by the dbfID.y
on the right:
mislabelled <- mssng_cards %>%
select(dbfId, name) %>%
# also drops uncollectible mislabelled cards (which are not in simple_cards)
inner_join(cards_simple, by = "name") %>%
arrange(name)
mislabelled
dbfId.x | name | dbfId.y | cost | cardClass | rarity | type | collectible | id | card_set |
---|---|---|---|---|---|---|---|---|---|
40341 | Cleave | 940 | 2 | Warrior | Free | Spell | TRUE | CS2_114 | Basic |
2177 | Dark Wispers | 2009 | 6 | Druid | Epic | Spell | TRUE | GVG_041 | Goblins vs Gnomes |
42146 | Doppelgangster | 40953 | 5 | Neutral | Rare | Minion | TRUE | CFM_668 | Mean Streets of Gadgetzan |
38319 | Druid of the Claw | 692 | 5 | Druid | Common | Minion | TRUE | EX1_165 | Classic |
2230 | Druid of the Fang | 2048 | 5 | Druid | Common | Minion | TRUE | GVG_080 | Goblins vs Gnomes |
2310 | Druid of the Flame | 2292 | 3 | Druid | Common | Minion | TRUE | BRM_010 | Blackrock Mountain |
40402 | Evolve | 38266 | 1 | Shaman | Rare | Spell | TRUE | OG_027 | Whispers of the Old Gods |
41409 | Jade Idol | 40372 | 1 | Druid | Rare | Spell | TRUE | CFM_602 | Mean Streets of Gadgetzan |
468 | Mark of Nature | 151 | 3 | Druid | Common | Spell | TRUE | EX1_155 | Classic |
41609 | Nefarian | 2261 | 9 | Neutral | Legendary | Minion | TRUE | BRM_030 | Blackrock Mountain |
38113 | Raven Idol | 13335 | 1 | Druid | Common | Spell | TRUE | LOE_115 | League of Explorers |
1161 | Starfall | 86 | 5 | Druid | Rare | Spell | TRUE | NEW1_007 | Classic |
38710 | Unstable Portal | 1929 | 2 | Mage | Rare | Spell | TRUE | GVG_003 | Goblins vs Gnomes |
38653 | Wisp | 179 | 0 | Neutral | Common | Minion | TRUE | CS2_231 | Classic |
137 | Wrath | 836 | 2 | Druid | Common | Spell | TRUE | EX1_154 | Classic |
We create a named list that can be used within recode()
:
# list values are correct ids
mislab_recode <- as.list(mislabelled$dbfId.y)
# list names are mislabelled ids
names(mislab_recode) <- mislabelled$dbfId.x
The following step applies the recode()
function to all the card columns card_0
to card_29
in our decks_comp
data, which will relabel all the incorrect IDs with their respective IDs (dbfID.x
with dbfID.y
based on the table above).
# relabel ids in data
decks_comp <- mutate_at(decks_comp, vars(starts_with("card")),
recode, !!!mislab_recode)
# intermediate objects no longer needed
rm(cards_used, missing_cards, cards_all_raw, mssng_cards)
The following objects are used for plotting and visualization:
# class colours
class_colors <- c(
"Druid" = "#FF7D0A",
"Hunter" = "#569B56", #"#ABD473",
"Mage" = "#40C7EB",
"Paladin" = "#F58CBA",
"Priest" = "#FFFFFF",
"Rogue" = "#FFF569",
"Shaman" = "#0070DE",
"Warlock" = "#8787ED",
"Warrior" = "#C79C6E",
"Neutral" = "#777777"
)
# rarity colours
rarity_colors <- c(
"Free" = NA,
"Common" = "#000000",
"Rare" = "#1E90FF",
"Epic" = "#9932CC",
"Legendary" = "#FFB90F"
)
# release dates of card sets
release_dates <- c(
"Launch" = "2014-03-11",
"Naxxramas" = "2014-07-22",
"GvG" = "2014-12-08",
"Blackrock" = "2015-04-02",
"TGT" = "2015-08-24",
"Explorers" = "2015-11-12",
"Old Gods" = "2016-04-26",
"Karazhan" = "2016-08-11",
"Gadgetzan" = "2016-12-01"
)
# darker plot theme for this doc
theme_doc <- theme(panel.spacing = unit(1, "points"),
panel.grid.major.y = element_blank(),
plot.title = element_text(size = 12),
axis.text.y = element_text(size = 9),
panel.background = element_rect(fill = "#C3C3C3"))
The question we are trying to answer is
Which are the most popular cards used in Ranked decks (Standard format)?
We measure card popularity by counting the number of decks that include at least one copy of that card.
To begin, we filter only for Ranked Decks in Standard format, pick out the variables of interest, join the decks to their compositions, and pivot the card columns.
# create set of ranked standard decks for pct calculation
rs_decks_all <- decks_attr %>%
filter(deck_type == "Ranked Deck",
deck_format == "S")
ranked_deck_cards <- rs_decks_all %>%
select(deck_id, deck_class,
date, hsmonth, hsyear,
deck_format, deck_set, craft_cost) %>%
inner_join(decks_comp, by = "deck_id") %>%
gather(card_no, card_id, card_0:card_29) %>%
select(-card_no) %>% # drop the key column which is unnecessary
# select unique combinations of deck and card (ignore duplicates)
distinct(deck_id, card_id, .keep_all = TRUE)
glimpse(ranked_deck_cards)
## Observations: 3,806,924
## Variables: 9
## $ deck_id <int> 433003, 432997, 432995, 432992, 432990, 432989, 43...
## $ deck_class <fct> Warrior, Paladin, Warrior, Priest, Mage, Mage, Rog...
## $ date <date> 2016-02-19, 2016-02-19, 2016-02-19, 2016-02-19, 2...
## $ hsmonth <S3: yearmon> Feb 2016, Feb 2016, Feb 2016, Feb 2016, Fe...
## $ hsyear <fct> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 20...
## $ deck_format <fct> S, S, S, S, S, S, S, S, S, S, S, S, S, S, S, S, S,...
## $ deck_set <fct> Explorers, Explorers, Explorers, Explorers, Explor...
## $ craft_cost <int> 9840, 7700, 5740, 8780, 2800, 0, 2980, 1540, 4440,...
## $ card_id <int> 75, 476, 75, 272, 113, 77, 251, 242, 290, 180, 476...
To derive overall card popularity, we count the number of times each card_id
appears, then join it to the cards_simple
data to retrieve the card details. After that, we sort by number of decks. In addition, we also calculate the percentage of all Ranked Standard decks that feature each card.
pop_cards_all <- ranked_deck_cards %>%
group_by(card_id) %>%
summarise(n_decks = n(),
pct_all = n_decks * 100 / nrow(rs_decks_all)) %>%
inner_join(cards_simple, by = c("card_id" = "dbfId"))
As the table below shows, the most popular cards are expected to be Neutral class since they can be used by all nine classes in the game, as opposed to Class cards:
Name | Class | Rarity | Card Set | No. of Decks | % of All Decks |
---|---|---|---|---|---|
Azure Drake | Neutral | Rare | Hall of Fame | 62327 | 32.2 |
Sylvanas Windrunner | Neutral | Legendary | Hall of Fame | 46499 | 24.0 |
Emperor Thaurissan | Neutral | Legendary | Blackrock Mountain | 37614 | 19.4 |
Dr. Boom | Neutral | Legendary | Goblins vs Gnomes | 37033 | 19.1 |
Sludge Belcher | Neutral | Rare | Curse of Naxxramas | 34718 | 17.9 |
Defender of Argus | Neutral | Rare | Classic | 31081 | 16.1 |
Ragnaros the Firelord | Neutral | Legendary | Hall of Fame | 29656 | 15.3 |
Piloted Shredder | Neutral | Common | Goblins vs Gnomes | 28434 | 14.7 |
Acolyte of Pain | Neutral | Common | Classic | 28340 | 14.6 |
Bloodmage Thalnos | Neutral | Legendary | Classic | 28227 | 14.6 |
Which abilities/properties makes these cards popular?
As a side note, the Hall of Fame was only created in April 2017; the popularity of those three cards (Azure Drake, Sylvanas Windrunner, Ragnaros the Firelord) may have been one of the factors behind Blizzard’s decision to move them out of Standard format and into the Hall of Fame.
The following plot shows the relative popularity of all cards in the dataset:
Since Neutral cards are expected to be more popular in general, we follow up by asking:
Amongst Class cards, which is the most popular card for each class?
By tallying the total number of decks by class, we can evaluate for each class the percentage of decks which use the most popular class cards.
rs_decks_class <- rs_decks_all %>%
group_by(deck_class) %>%
summarise(tot_decks_class = n())
pop_cards_class <- pop_cards_all %>%
filter(!is.na(cardClass), cardClass != "Neutral") %>%
select(cardClass, name, rarity, card_set, n_decks) %>%
inner_join(rs_decks_class, by = c("cardClass" = "deck_class")) %>%
mutate(pct_class = n_decks * 100 / tot_decks_class)
The most popular class cards are as follows:
Class | Name | Rarity | Card Set | No. of Decks | Total No. of Class Decks | % of Class Decks |
---|---|---|---|---|---|---|
Druid | Swipe | Free | Basic | 18802 | 20163 | 93.3 |
Hunter | Animal Companion | Free | Basic | 15772 | 18226 | 86.5 |
Mage | Frostbolt | Free | Basic | 22374 | 23850 | 93.8 |
Paladin | Truesilver Champion | Free | Basic | 20395 | 23742 | 85.9 |
Priest | Shadow Word: Death | Free | Basic | 21863 | 24396 | 89.6 |
Rogue | Eviscerate | Common | Classic | 17717 | 19058 | 93.0 |
Shaman | Hex | Free | Basic | 16880 | 21099 | 80.0 |
Warlock | Power Overwhelming | Common | Hall of Fame | 13400 | 21268 | 63.0 |
Warrior | Fiery War Axe | Free | Basic | 20856 | 21776 | 95.8 |
Which abilities/properties makes the above cards popular?
Some of the other popular cards (in the following plot) also have effects like card draw or area of effect.
While Frostbolt appeared in the largest number of decks, many of the most popular cards were included in more than 90% of decks from their class. The only exception would be Power Overwhelimg (Warlock) which only appeared in 63% of Warlock decks.
As the table shows, cards from the Basic and Classic sets tend to be more popular because they do not rotate out of Standard format, and they have been around since the launch of Hearthstone.
The following plot shows the relative popularity of the top 3 cards for each class. Dashed lines indicate the total number of decks representing each class in the dataset:
Our decks_attr
data spans 3 years from March 2014 to March 2017. Recall that years are defined by release dates of card sets instead of calendar months:
To derive card popularity, we count the number of times each card_id
appears in each hsyear
:
rs_decks_year <- rs_decks_all %>%
group_by(hsyear) %>%
summarise(tot_decks_year = n())
pop_cards_year <- ranked_deck_cards %>%
group_by(hsyear, card_id) %>%
summarise(n_decks = n()) %>%
inner_join(cards_simple, by = c("card_id" = "dbfId"))
The following table shows the top 3 cards for each year. As expected, Neutral cards are used more often than class-specific cards. At the same time, the most popular card in 2015 and 2016 (Dr. Boom and Azure Drake) were considered even more essential than the others, considering that they were included in more than 40% of decks submitted:
Year | Name | Rarity | Card Set | No. of Decks | Total No. of Decks | % of Decks |
---|---|---|---|---|---|---|
2014 | Sludge Belcher | Rare | Curse of Naxxramas | 10490 | 35791 | 29.3 |
2014 | Defender of Argus | Rare | Classic | 8832 | 35791 | 24.7 |
2014 | Harvest Golem | Common | Classic | 8814 | 35791 | 24.6 |
2015 | Dr. Boom | Legendary | Goblins vs Gnomes | 31770 | 73406 | 43.3 |
2015 | Sludge Belcher | Rare | Curse of Naxxramas | 24227 | 73406 | 33.0 |
2015 | Piloted Shredder | Common | Goblins vs Gnomes | 23333 | 73406 | 31.8 |
2016 | Azure Drake | Rare | Hall of Fame | 34079 | 84381 | 40.4 |
2016 | Sylvanas Windrunner | Legendary | Hall of Fame | 21198 | 84381 | 25.1 |
2016 | Brann Bronzebeard | Legendary | League of Explorers | 19699 | 84381 | 23.3 |
Which abilities/properties makes the above cards popular?
The following plot illustrates the relative popularity of cards in each year, and clearly shows the popularity of 2015 Dr. Boom and 2016 Azure Drake over the others.
On the other hand, the Ranked ladder resets to mark a new Season at the start of every calendar month. Players take this opportunity to improve upon their previous ranking, and they may try to include new cards to overcome the popular decks that they tend to encounter.
From the graph above, there is usually a spike in the number of decks submitted when a new card set is released (marked by the vertical lines). The number of submissions gradually tapers off until the next card set is released.
A possible question to ask may be:
Are there any shifts in card popularity in the three months following the release of the Old Gods card set (2016-04-26)?
The plot below charts the number of decks submitted every day for the first three months following 26th April. The number of decks submitted (and the number of cards included) sharply tapers off after the first week.
The following code evaluates the card popularity for each month from April 2016 to July 2016, as well as the percentage of decks that included these cards in each month:
pop_cards_2016_aprjul <- ranked_deck_cards %>%
filter(date >= as.Date("2016-04-26"),
date <= as.Date("2016-07-25")) %>%
group_by(hsmonth, card_id) %>%
summarise(n_decks = n()) %>%
inner_join(cards_simple, by = c("card_id" = "dbfId"))
rs_decks_2016_aprjul <- rs_decks_all %>%
filter(date >= as.Date("2016-04-26"),
date <= as.Date("2016-07-25")) %>%
group_by(hsmonth) %>%
summarise(tot_decks_month = n())
While cards from the latest card set were popular in the first week (still in April), cards from the older card sets (such as Classic) returned to the top spots in subsequent months. However, since the data concerns decks submitted (not necessarily played in games) we cannot infer that the new cards fell in popularity, only that decks submitted in the later weeks favoured including older cards over the new ones.
Month | Name | Card Set | No. of Decks | Total No. of Decks | % of Decks |
---|---|---|---|---|---|
Apr 2016 | C’Thun | Whispers of the Old Gods | 2126 | 6721 | 31.6 |
Apr 2016 | Azure Drake | Hall of Fame | 2110 | 6721 | 31.4 |
Apr 2016 | Sylvanas Windrunner | Hall of Fame | 1984 | 6721 | 29.5 |
Apr 2016 | Disciple of C’Thun | Whispers of the Old Gods | 1866 | 6721 | 27.8 |
Apr 2016 | Beckoner of Evil | Whispers of the Old Gods | 1825 | 6721 | 27.2 |
May 2016 | Sylvanas Windrunner | Hall of Fame | 3742 | 10546 | 35.5 |
May 2016 | Azure Drake | Hall of Fame | 3333 | 10546 | 31.6 |
May 2016 | Brann Bronzebeard | League of Explorers | 2204 | 10546 | 20.9 |
May 2016 | C’Thun | Whispers of the Old Gods | 2188 | 10546 | 20.7 |
May 2016 | Emperor Thaurissan | Blackrock Mountain | 2132 | 10546 | 20.2 |
Jun 2016 | Sylvanas Windrunner | Hall of Fame | 1563 | 4776 | 32.7 |
Jun 2016 | Azure Drake | Hall of Fame | 1526 | 4776 | 32.0 |
Jun 2016 | Emperor Thaurissan | Blackrock Mountain | 979 | 4776 | 20.5 |
Jun 2016 | Acolyte of Pain | Classic | 927 | 4776 | 19.4 |
Jun 2016 | Doomsayer | Classic | 878 | 4776 | 18.4 |
Jul 2016 | Azure Drake | Hall of Fame | 1001 | 2858 | 35.0 |
Jul 2016 | Sylvanas Windrunner | Hall of Fame | 953 | 2858 | 33.3 |
Jul 2016 | Emperor Thaurissan | Blackrock Mountain | 638 | 2858 | 22.3 |
Jul 2016 | Acolyte of Pain | Classic | 599 | 2858 | 21.0 |
Jul 2016 | Doomsayer | Classic | 572 | 2858 | 20.0 |
Which were the most popular cards (Neutral and Class-specific) for each card set?
The first table shows the most popular Neutral card from each card set, with many of them being Legendary:
Card Set | Name | Class | Rarity | No. of Decks |
---|---|---|---|---|
Basic | Acidic Swamp Ooze | Neutral | Free | 17107 |
Blackrock Mountain | Emperor Thaurissan | Neutral | Legendary | 37614 |
Classic | Defender of Argus | Neutral | Rare | 31081 |
Curse of Naxxramas | Sludge Belcher | Neutral | Rare | 34718 |
Goblins vs Gnomes | Dr. Boom | Neutral | Legendary | 37033 |
Hall of Fame | Azure Drake | Neutral | Rare | 62327 |
League of Explorers | Brann Bronzebeard | Neutral | Legendary | 25885 |
Mean Streets of Gadgetzan | Kazakus | Neutral | Legendary | 5308 |
Mean Streets of Gadgetzan | Aya Blackpaw | Neutral | Legendary | 5308 |
One Night in Karazhan | Barnes | Neutral | Legendary | 9228 |
The Grand Tournament | Justicar Trueheart | Neutral | Legendary | 15013 |
Whispers of the Old Gods | N’Zoth, the Corruptor | Neutral | Legendary | 11764 |
The following plot shows the relative popularity of Neutral cards in each card set. Notably, the Basic, Blackrock Mountain and Karazhan sets have an overwhelming favourite (with a large difference over the 2nd most popular card):
The next table shows the most popular Class-specific cards from each card set.
Card Set | Name | Class | Type | Rarity | No. of Decks |
---|---|---|---|---|---|
Basic | Frostbolt | Mage | Spell | Free | 22374 |
Blackrock Mountain | Imp Gang Boss | Warlock | Minion | Common | 11162 |
Classic | Eviscerate | Rogue | Spell | Common | 17717 |
Curse of Naxxramas | Death’s Bite | Warrior | Weapon | Common | 9012 |
Goblins vs Gnomes | Shielded Minibot | Paladin | Minion | Common | 9620 |
Hall of Fame | Power Overwhelming | Warlock | Spell | Common | 13400 |
League of Explorers | Entomb | Priest | Spell | Common | 10527 |
Mean Streets of Gadgetzan | Dragonfire Potion | Priest | Spell | Epic | 3410 |
One Night in Karazhan | Maelstrom Portal | Shaman | Spell | Rare | 5505 |
The Grand Tournament | Totem Golem | Shaman | Minion | Common | 10933 |
Whispers of the Old Gods | Thing from Below | Shaman | Minion | Rare | 7361 |
The following plot shows the relative popularity of Class cards in each card set. Besides the Hall of Fame (which usually consists of Classic cards), the Naxxramas set has an overwhelming favourite in Death’s Bite (Warrior), which was highly valued both for its strong damage and Deathrattle effect.
Between 1 and 10-cost mana cards, which cards were the most popular amongst cards of the same cost?
Mana Cost | Name | Class | Type | Card Set | No. of Decks |
---|---|---|---|---|---|
0 | Innervate | Druid | Spell | Basic | 18345 |
1 | Power Word: Shield | Priest | Spell | Basic | 20956 |
2 | Bloodmage Thalnos | Neutral | Minion | Classic | 28227 |
3 | Acolyte of Pain | Neutral | Minion | Classic | 28340 |
4 | Defender of Argus | Neutral | Minion | Classic | 31081 |
5 | Azure Drake | Neutral | Minion | Hall of Fame | 62327 |
6 | Sylvanas Windrunner | Neutral | Minion | Hall of Fame | 46499 |
7 | Dr. Boom | Neutral | Minion | Goblins vs Gnomes | 37033 |
8 | Ragnaros the Firelord | Neutral | Minion | Hall of Fame | 29656 |
9 | Ysera | Neutral | Minion | Classic | 17256 |
10 | N’Zoth, the Corruptor | Neutral | Minion | Whispers of the Old Gods | 11764 |
The following plot shows the relative popularity of cards with the same mana cost. Notably, the groups of 5, 7, 8 and 9-cost cards have the most popular cards leading with a significant margin. There are also some Class cards that fare well against cards with the same mana cost, despite our expectation that Neutral cards should be favoured over Class cards.
So far, we have looked at cards that users tend to include in decks in Standard format for Ranked play, which is also used for official Hearthstone tournaments - making these popular cards highly visible to a wide audience. We have also looked at card popularity when broken down by various categories, such as class, time period and card set.
Are there any limitations to the data that may have affected our analysis?
The major limitation of this data is that it only looks at decks submitted to a third-party website, which brings up the following issues:
rating
attribute may reflect how strong other players consider a deck, it is also biased towards the popularity of the user as well as the date of submission:
How can we expand on this analysis?