survivoR

610 episodes. 41 seasons. 1 package!

survivoR is a collection of data sets detailing events across all 41 seasons of the US version of Survivor, including castaway information, vote history, immunity and reward challenge winners, and jury votes.

Installation

Now on CRAN (v0.9.6).

install.packages("survivoR")

Or install the latest version (v0.9.9) from GitHub. I’m constantly improving the data sets, so the GitHub version is likely to be slightly ahead of the CRAN release.

devtools::install_github("doehm/survivoR")

News

survivoR v0.9.9

Season 41

For episode-by-episode updates, follow me on Twitter.

Dataset overview

Season summary

A table containing summary details of each season of Survivor, including the winner, runners-up and location.

season_summary
#> # A tibble: 41 x 20
#>    season_name  season location  country tribe_setup  full_name winner_id winner
#>    <chr>         <dbl> <chr>     <chr>   <chr>        <chr>         <dbl> <chr> 
#>  1 Survivor: 41     41 Mamanuca~ Fiji    "Three trib~ Eirka Ca~       597 Erika 
#>  2 Survivor: W~     40 Mamanuca~ Fiji    "Two tribes~ Tony Vla~       424 Tony  
#>  3 Survivor: I~     39 Mamanuca~ Fiji    "Two tribes~ Tommy Sh~       590 Tommy 
#>  4 Survivor: E~     38 Mamanuca~ Fiji    "Two tribes~ Chris Un~       559 Chris 
#>  5 Survivor: D~     37 Mamanuca~ Fiji    "Two tribes~ Nick Wil~       556 Nick  
#>  6 Survivor: G~     36 Mamanuca~ Fiji    "Two tribes~ Wendell ~       536 Wende~
#>  7 Survivor: H~     35 Mamanuca~ Fiji    "Three trib~ Ben Drie~       516 Ben   
#>  8 Survivor: G~     34 Mamanuca~ Fiji    "Two tribes~ Sarah La~       414 Sarah 
#>  9 Survivor: M~     33 Mamanuca~ Fiji    "Two tribes~ Adam Kle~       498 Adam  
#> 10 Survivor: K~     32 Koh Rong~ Cambod~ "Three trib~ Michele ~       478 Miche~
#> # ... with 31 more rows, and 12 more variables: runner_ups <chr>,
#> #   final_vote <chr>, timeslot <chr>, premiered <date>, ended <date>,
#> #   filming_started <date>, filming_ended <date>, viewers_premier <dbl>,
#> #   viewers_finale <dbl>, viewers_reunion <dbl>, viewers_mean <dbl>, rank <dbl>

Castaways

This data set contains season and demographic information about each castaway. It is structured to view their results for each season. Castaways who have played in multiple seasons will feature more than once, with the age and location representing that point in time. Castaways who re-entered the game will feature more than once in the same season, as they technically have more than one boot order, e.g. Natalie Anderson in Winners at War.

Each castaway has a unique castaway_id which links the individual across all data sets and seasons. It also links to the related ID columns on the vote_history, jury_votes and challenges data sets, such as vote_id, voted_out_id and finalist_id.

castaways |> 
  filter(season == 40)
#> # A tibble: 22 x 20
#>    season_name     season full_name    castaway_id castaway   age city   state  
#>    <chr>            <dbl> <chr>              <dbl> <chr>    <dbl> <chr>  <chr>  
#>  1 Survivor: Winn~     40 Tony Vlachos         424 Tony        45 Allen~ New Je~
#>  2 Survivor: Winn~     40 Natalie And~         442 Natalie     33 Edgew~ New Je~
#>  3 Survivor: Winn~     40 Michele Fit~         478 Michele     29 Hobok~ New Je~
#>  4 Survivor: Winn~     40 Sarah Lacina         414 Sarah       34 Cedar~ Iowa   
#>  5 Survivor: Winn~     40 Ben Drieber~         516 Ben         36 Boise  Idaho  
#>  6 Survivor: Winn~     40 Denise Stap~         386 Denise      48 Marion Iowa   
#>  7 Survivor: Winn~     40 Nick Wilson          556 Nick        28 Willi~ Kentuc~
#>  8 Survivor: Winn~     40 Jeremy Coll~         433 Jeremy      41 Foxbo~ Massac~
#>  9 Survivor: Winn~     40 Kim Spradli~         371 Kim         36 San A~ Texas  
#> 10 Survivor: Winn~     40 Sophie Clar~         353 Sophie      29 Santa~ Califo~
#> # ... with 12 more rows, and 12 more variables: personality_type <chr>,
#> #   episode <dbl>, day <dbl>, order <dbl>, result <chr>, jury_status <chr>,
#> #   original_tribe <chr>, swapped_tribe <chr>, swapped_tribe_2 <chr>,
#> #   merged_tribe <chr>, total_votes_received <dbl>, immunity_idols_won <dbl>

Castaway details

A few castaways have changed their name from season to season or have been referred to by a different name during the season, e.g. Amber Mariano; in season 8, Survivor: All-Stars, there were Rob C and Rob M. That information has been retained here in the castaways data set.

castaway_details contains unique information for each castaway. It takes the full name from their most recent season and their most verbose short name, which is handy for labelling.

It also includes gender, date of birth, occupation, race and ethnicity data. If no source was found to determine a castaway's race and ethnicity, the data is kept as missing rather than making an assumption.

castaway_details
#> # A tibble: 608 x 10
#>    castaway_id full_name     short_name date_of_birth date_of_death gender race 
#>          <dbl> <chr>         <chr>      <date>        <date>        <chr>  <chr>
#>  1           1 Sonja Christ~ Sonja      1937-01-28    NA            Female <NA> 
#>  2           2 B.B. Andersen B.B.       1936-01-18    2013-10-29    Male   <NA> 
#>  3           3 Stacey Still~ Stacey     1972-08-11    NA            Female <NA> 
#>  4           4 Ramona Gray   Ramona     1971-01-20    NA            Female Black
#>  5           5 Dirk Been     Dirk       1976-06-15    NA            Male   <NA> 
#>  6           6 Joel Klug     Joel       1972-04-13    NA            Male   <NA> 
#>  7           7 Gretchen Cor~ Gretchen   1962-02-07    NA            Female <NA> 
#>  8           8 Greg Buis     Greg       1975-12-31    NA            Male   <NA> 
#>  9           9 Jenna Lewis   Jenna L.   1977-07-16    NA            Female <NA> 
#> 10          10 Gervase Pete~ Gervase    1969-11-02    NA            Male   Black
#> # ... with 598 more rows, and 3 more variables: ethnicity <chr>,
#> #   occupation <chr>, personality_type <chr>
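Because castaway_id is shared across tables, the per-season rows in castaways can be joined to the one-row-per-person castaway_details table. The following is a minimal sketch, assuming the column names printed above; castaways_40 is only an illustrative name.

```r
library(survivoR)
library(dplyr)

# Season 40 castaways with date of birth and gender attached.
# All column names are taken from the printed output above.
castaways_40 <- castaways |>
  filter(season == 40) |>
  select(season, castaway_id, castaway, age, result) |>
  left_join(
    castaway_details |>
      select(castaway_id, short_name, date_of_birth, gender),
    by = "castaway_id"
  )

castaways_40
```

Since castaway_details holds one row per castaway, the left join should keep the same number of rows as the filtered castaways table.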

Vote history

This data frame contains a complete history of votes cast across all seasons of Survivor. This allows you to see who voted for whom at which Tribal Council. It also includes details on who had individual immunity as well as who had their votes nullified by a hidden immunity idol. This details the key events for the season.

vh <- vote_history |> 
  filter(
    season == 40,
    episode == 10
  ) 
vh
#> # A tibble: 11 x 15
#>    season_name        season episode   day tribe_status castaway immunity  vote 
#>    <chr>               <dbl>   <dbl> <dbl> <chr>        <chr>    <chr>     <chr>
#>  1 Survivor: Winners~     40      10    25 Merged       Ben      <NA>      Tyson
#>  2 Survivor: Winners~     40      10    25 Merged       Denise   Hidden    None 
#>  3 Survivor: Winners~     40      10    25 Merged       Jeremy   <NA>      Immu~
#>  4 Survivor: Winners~     40      10    25 Merged       Kim      <NA>      Soph~
#>  5 Survivor: Winners~     40      10    25 Merged       Michele  <NA>      Tyson
#>  6 Survivor: Winners~     40      10    25 Merged       Nick     <NA>      Tyson
#>  7 Survivor: Winners~     40      10    25 Merged       Sarah    <NA>      Deni~
#>  8 Survivor: Winners~     40      10    25 Merged       Sarah    <NA>      Tyson
#>  9 Survivor: Winners~     40      10    25 Merged       Sophie   <NA>      Deni~
#> 10 Survivor: Winners~     40      10    25 Merged       Tony     Individu~ Tyson
#> 11 Survivor: Winners~     40      10    25 Merged       Tyson    <NA>      Soph~
#> # ... with 7 more variables: nullified <lgl>, voted_out <chr>, order <dbl>,
#> #   vote_order <dbl>, castaway_id <dbl>, vote_id <dbl>, voted_out_id <dbl>
vh |> 
  count(vote)
#> # A tibble: 5 x 2
#>   vote       n
#>   <chr>  <int>
#> 1 Denise     2
#> 2 Immune     1
#> 3 None       1
#> 4 Sophie     2
#> 5 Tyson      5

Events in the game such as fire challenges, rock draws, steal-a-vote advantages or countbacks in the early days often mean a vote wasn’t placed for an individual. Instead, a challenge may have been won or lost, or a castaway may have attended Tribal Council without casting a vote, etc. These events are recorded in the vote field. I have included a function, clean_votes, for when you only need the votes cast for individuals. If the input data frame has the vote column, it can simply be piped.

vh |> 
  clean_votes() |> 
  count(vote)
#> # A tibble: 3 x 2
#>   vote       n
#>   <chr>  <int>
#> 1 Denise     2
#> 2 Sophie     2
#> 3 Tyson      5

Challenges

The challenge_results and challenge_description data sets supersede the challenges data set.

Challenge results

A nested tidy data frame of immunity and reward challenge results. The winners and winning tribe of the challenge are found by expanding the winners column. For individual immunity challenges the winning tribe is simply NA.

challenge_results |> 
  filter(season == 40)
#> # A tibble: 25 x 10
#>    season_name  season episode   day episode_title challenge_name challenge_type
#>    <chr>         <dbl>   <dbl> <dbl> <chr>         <chr>          <chr>         
#>  1 Survivor: W~     40       1     2 Greatest of ~ By Any Means ~ Reward and Im~
#>  2 Survivor: W~     40       1     3 Greatest of ~ Blue Lagoon B~ Immunity      
#>  3 Survivor: W~     40       2     6 It's Like a ~ Draggin' the ~ Reward and Im~
#>  4 Survivor: W~     40       3     9 Out for Blood Rise and Shine Reward and Im~
#>  5 Survivor: W~     40       4    11 I Like Reven~ Beyond the Wh~ Reward and Im~
#>  6 Survivor: W~     40       5    14 The Buddy Sy~ Sea Crates     Immunity      
#>  7 Survivor: W~     40       6    16 Quick on the~ Rice Race      Reward and Im~
#>  8 Survivor: W~     40       7    18 We're in the~ Dear Liza      Immunity      
#>  9 Survivor: W~     40       7    18 We're in the~ Losing Face    Reward        
#> 10 Survivor: W~     40       8    21 This is Wher~ Get a Grip     Immunity      
#> # ... with 15 more rows, and 3 more variables: outcome_type <chr>,
#> #   challenge_id <chr>, winners <list>
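As a sketch of what "expanding" means here, tidyr::unnest() turns each nested winners table into ordinary rows, one per winner. The columns that appear after unnesting are whatever the nested tables contain; they are not shown in the printout above, so no output is assumed, and winners_40 is only an illustrative name.

```r
library(survivoR)
library(dplyr)
library(tidyr)

# Expand the list-column `winners` so each winner gets its own row.
# Only a few outer columns are kept to keep the result readable.
winners_40 <- challenge_results |>
  filter(season == 40) |>
  select(episode, challenge_name, challenge_type, winners) |>
  unnest(cols = winners)

winners_40
```

If a nested table happens to share a column name with the outer data frame, unnest() will stop with a duplicate-name error; in that case drop or rename the outer column first.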

Typically after the merge, if a single person wins a reward they are allowed to bring others along with them. This is identified by the outcome_status column. If it states Chosen to participate, it means they were chosen by the winner to participate in the reward.

The day field on this data set represents the day of the tribal council rather than the day of the challenge. This is to more easily associate the reward challenge with the immunity challenge and result of the tribal council. It also helps for joining tables.

The challenge_id is the primary key for the challenge_description data set. The challenge_id will change as the data or descriptions change.

Challenge description

This data set contains descriptive binary fields for each challenge. Challenges can go by different names, but where possible recurring challenges are kept consistent. While there are tweaks to the challenges, where the main components of the challenge are consistent, they share the same name.

The features of each challenge have been determined largely through string searches of key words that describe the challenge. It may not capture the full essence of the challenge but on the whole will provide a good basis for analysis. Since the description is simply a short paragraph or sentence it may not flag all appropriate features. If any descriptive features need altering please let me know in the issues.

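Features (one logical column each): puzzle, race, precision, endurance, strength, turn_based, balance, food, knowledge, memory, fire and water.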

challenge_description
#> # A tibble: 886 x 14
#>    challenge_id challenge_name    puzzle race  precision endurance strength
#>    <chr>        <chr>             <lgl>  <lgl> <lgl>     <lgl>     <lgl>   
#>  1 CH0001       Quest for Fire    FALSE  TRUE  FALSE     FALSE     FALSE   
#>  2 CH0002       Bridging the Gap  FALSE  TRUE  FALSE     FALSE     FALSE   
#>  3 CH0003       Trail Blazer      FALSE  TRUE  FALSE     FALSE     FALSE   
#>  4 CH0004       Buggin' Out       FALSE  FALSE FALSE     FALSE     FALSE   
#>  5 CH0005       Tucker'd Out      FALSE  TRUE  TRUE      FALSE     FALSE   
#>  6 CH0006       Safari Supper     FALSE  TRUE  FALSE     FALSE     FALSE   
#>  7 CH0007       Marquesan Menu    FALSE  FALSE FALSE     FALSE     FALSE   
#>  8 CH0008       Thai Menu         FALSE  FALSE FALSE     TRUE      FALSE   
#>  9 CH0009       Amazon Menu       FALSE  TRUE  FALSE     FALSE     FALSE   
#> 10 CH0010       Survivor Smoothie FALSE  FALSE FALSE     FALSE     FALSE   
#> # ... with 876 more rows, and 7 more variables: turn_based <lgl>,
#> #   balance <lgl>, food <lgl>, knowledge <lgl>, memory <lgl>, fire <lgl>,
#> #   water <lgl>

challenge_description |> 
  summarise(across(where(is.logical), sum))
#> # A tibble: 1 x 12
#>   puzzle  race precision endurance strength turn_based balance  food knowledge
#>    <int> <int>     <int>     <int>    <int>      <int>   <int> <int>     <int>
#> 1    238   721       184       115       50        132     143    23        55
#> # ... with 3 more variables: memory <int>, fire <int>, water <int>
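Because challenge_id is the key shared by challenge_results and challenge_description, the feature flags can be attached to the results of any season. This is a minimal sketch, assuming the column names printed above; season_40_features is only an illustrative name.

```r
library(survivoR)
library(dplyr)

# Attach the logical feature flags to each season 40 challenge.
# challenge_name exists on both tables, so it is dropped from one side
# to avoid duplicated columns after the join.
season_40_features <- challenge_results |>
  filter(season == 40) |>
  select(episode, challenge_name, challenge_type, challenge_id) |>
  left_join(
    challenge_description |> select(-challenge_name),
    by = "challenge_id"
  )

# How many season 40 challenges involved a puzzle?
season_40_features |>
  count(puzzle)
```

Keep in mind that challenge_id values can change between package versions, so joins should be done within one installed version rather than against saved IDs.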

Jury votes

History of jury votes. It is more verbose than it needs to be; however, having a 0-1 column indicating whether a vote was placed makes it easier to summarise castaways that received no votes.

jury_votes |> 
  filter(season == 40)
#> # A tibble: 48 x 7
#>    season_name            season castaway finalist  vote castaway_id finalist_id
#>    <chr>                   <dbl> <chr>    <chr>    <dbl>       <dbl>       <dbl>
#>  1 Survivor: Winners at ~     40 Adam     Michele      0         498         478
#>  2 Survivor: Winners at ~     40 Amber    Michele      0          27         478
#>  3 Survivor: Winners at ~     40 Ben      Michele      0         516         478
#>  4 Survivor: Winners at ~     40 Danni    Michele      0         166         478
#>  5 Survivor: Winners at ~     40 Denise   Michele      0         386         478
#>  6 Survivor: Winners at ~     40 Ethan    Michele      0          48         478
#>  7 Survivor: Winners at ~     40 Jeremy   Michele      0         433         478
#>  8 Survivor: Winners at ~     40 Kim      Michele      0         371         478
#>  9 Survivor: Winners at ~     40 Nick     Michele      0         556         478
#> 10 Survivor: Winners at ~     40 Parvati  Michele      0         197         478
#> # ... with 38 more rows
jury_votes |> 
  filter(season == 40) |> 
  group_by(finalist) |> 
  summarise(votes = sum(vote))
#> # A tibble: 3 x 2
#>   finalist votes
#>   <chr>    <dbl>
#> 1 Michele      0
#> 2 Natalie      4
#> 3 Tony        12

Hidden Idols

A dataset containing the history of hidden immunity idols, including who found them, the day they were found and the day they were played. The idol number increments for each idol the castaway finds during the game.

hidden_idols |> 
  filter(season == 40)
#> # A tibble: 10 x 10
#>    season_name              season castaway_id castaway idol_number idols_held
#>    <chr>                     <dbl>       <dbl> <chr>    <chr>            <dbl>
#>  1 Survivor: Winners at War     40         112 Sandra   1                    1
#>  2 Survivor: Winners at War     40         386 Denise   1                    1
#>  3 Survivor: Winners at War     40         371 Kim      1                    1
#>  4 Survivor: Winners at War     40         353 Sophie   1                    1
#>  5 Survivor: Winners at War     40         386 Denise   2                    1
#>  6 Survivor: Winners at War     40         478 Michele  1                    1
#>  7 Survivor: Winners at War     40         424 Tony     1                    1
#>  8 Survivor: Winners at War     40         516 Ben      1                    1
#>  9 Survivor: Winners at War     40         442 Natalie  1                    1
#> 10 Survivor: Winners at War     40         442 Natalie  2                    1
#> # ... with 4 more variables: votes_nullified <chr>, day_found <dbl>,
#> #   day_played <dbl>, legacy_advantage <lgl>
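The day_found and day_played columns make it straightforward to work out how long each idol was held. The sketch below assumes day_played is missing for idols that were never played, in which case days_held will also be NA; idol_days and days_held are illustrative names.

```r
library(survivoR)
library(dplyr)

# Number of days between finding and playing each season 40 idol.
# Assumption: an unplayed idol has NA in day_played, giving NA here too.
idol_days <- hidden_idols |>
  filter(season == 40) |>
  mutate(days_held = day_played - day_found) |>
  select(castaway, idol_number, day_found, day_played, days_held)

idol_days
```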

Confessionals

A dataset containing the number of confessionals for each castaway by season and episode.

confessionals |> 
  filter(season == 40) |> 
  group_by(castaway) |> 
  summarise(n_confessionals = sum(confessional_count))
#> # A tibble: 20 x 2
#>    castaway   n_confessionals
#>    <chr>                <dbl>
#>  1 Adam                    37
#>  2 Amber                   21
#>  3 Ben                     30
#>  4 Boston Rob              28
#>  5 Danni                   14
#>  6 Denise                  18
#>  7 Ethan                   19
#>  8 Jeremy                  32
#>  9 Kim                     19
#> 10 Michele                 25
#> 11 Natalie                 24
#> 12 Nick                    21
#> 13 Parvati                 25
#> 14 Sandra                  16
#> 15 Sarah                   31
#> 16 Sophie                  20
#> 17 Tony                    52
#> 18 Tyson                   26
#> 19 Wendell                 12
#> 20 Yul                     17

Viewers

A data frame containing the viewer information for every episode across all seasons. It also includes the rating and viewer share information for viewers aged 18 to 49.

viewers |> 
  filter(season == 40)
#> # A tibble: 14 x 9
#>    season_name    season episode_number_o~ episode episode_title    episode_date
#>    <chr>           <dbl>             <dbl>   <dbl> <chr>            <date>      
#>  1 Survivor: Win~     40               583       1 Greatest of the~ 2020-02-12  
#>  2 Survivor: Win~     40               584       2 It's Like a Sur~ 2020-02-19  
#>  3 Survivor: Win~     40               585       3 Out for Blood    2020-02-26  
#>  4 Survivor: Win~     40               586       4 I Like Revenge   2020-03-04  
#>  5 Survivor: Win~     40               587       5 The Buddy Syste~ 2020-03-11  
#>  6 Survivor: Win~     40               588       6 Quick on the Dr~ 2020-03-18  
#>  7 Survivor: Win~     40               589       7 We're in the Ma~ 2020-03-25  
#>  8 Survivor: Win~     40               590       8 This is Where t~ 2020-04-01  
#>  9 Survivor: Win~     40               591       9 War is Not Pret~ 2020-04-08  
#> 10 Survivor: Win~     40               592      10 The Full Circle  2020-04-15  
#> 11 Survivor: Win~     40               593      11 This is Extorti~ 2020-04-22  
#> 12 Survivor: Win~     40               594      12 Friendly Fire    2020-04-29  
#> 13 Survivor: Win~     40               595      13 The Penultimate~ 2020-05-06  
#> 14 Survivor: Win~     40               596      14 It All Boils Do~ 2020-05-13  
#> # ... with 3 more variables: viewers <dbl>, rating_18_49 <dbl>,
#> #   share_18_49 <dbl>

Tribe colours

This data frame contains the tribe names and colours for each season, including the RGB values. These colours can be joined with the other data frames to customise colours for plots. Another option is to add tribal colours to ggplots with the scale functions.

tribe_colours
#> # A tibble: 145 x 5
#>    season_name                      season tribe      tribe_colour tribe_status
#>    <chr>                             <dbl> <chr>      <chr>        <chr>       
#>  1 Survivor: Borneo                      1 Pagong     #FFFF05      Original    
#>  2 Survivor: Borneo                      1 Rattana    #7CFC00      Merged      
#>  3 Survivor: Borneo                      1 Tagi       #FF9900      Original    
#>  4 Survivor: The Australian Outback      2 Barramundi #FF6600      Merged      
#>  5 Survivor: The Australian Outback      2 Kucha      #32CCFF      Original    
#>  6 Survivor: The Australian Outback      2 Ogakor     #A7FC00      Original    
#>  7 Survivor: Africa                      3 Boran      #FFD700      Original    
#>  8 Survivor: Africa                      3 Moto Maji  #00A693      Merged      
#>  9 Survivor: Africa                      3 Samburu    #E41A2A      Original    
#> 10 Survivor: Marquesas                   4 Maraamu    #DFFF00      Original    
#> # ... with 135 more rows
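One way to use the colours directly is to join tribe_colours onto another data frame and let ggplot2 treat the hex codes as literal fill values. This is a minimal sketch, assuming the column names printed above; tribe_sizes and p are illustrative names, and the plot simply counts castaways per original tribe in season 1.

```r
library(survivoR)
library(dplyr)
library(ggplot2)

# Count castaways per original tribe and attach each tribe's hex colour.
tribe_sizes <- castaways |>
  filter(season == 1) |>
  count(season, original_tribe) |>
  left_join(
    tribe_colours |> select(season, tribe, tribe_colour),
    by = c("season", "original_tribe" = "tribe")
  )

# scale_fill_identity() uses the hex codes in tribe_colour as they are.
p <- ggplot(tribe_sizes, aes(x = original_tribe, y = n, fill = tribe_colour)) +
  geom_col() +
  scale_fill_identity() +
  theme_minimal() +
  labs(x = "Original tribe", y = "Castaways")

p
```

Joining on both season and tribe name matters because the colours are recorded per season.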

Scale functions

Included are ggplot2 scale functions of the form scale_fill_survivor() and scale_fill_tribes() to add season and tribe colours to ggplot2 plots. The scale_fill_survivor() scale uses a colour palette extracted from the season logo, and the scale_fill_tribes() scale uses the tribal colours of the specified season as a colour palette.

All that is required for the ‘survivor’ palettes is the desired season as input. If no season is provided it will default to season 40.

castaways |> 
  count(season, personality_type) |> 
  ggplot(aes(x = season, y = n, fill = personality_type)) +
  geom_bar(stat = "identity") +
  scale_fill_survivor(40) +
  theme_minimal()

Below are the palettes for all seasons.

To use the tribe scales, simply input the season number desired to use those tribe colours. If the fill or colour aesthetic is the tribe name, this needs to be passed to the scale function as scale_fill_tribes(season, tribe = tribe) (for now) where tribe is on the input data frame. If the fill or colour aesthetic is independent from the actual tribe names, like gender for example, tribe does not need to be specified and will simply use the tribe colours as a colour palette, such as the viewers line graph above.

library(survivoR)
library(dplyr)
library(stringr)
library(glue)
library(ggplot2)

ssn <- 35

# Finalists of the season, labelled with their original tribe
labels <- castaways |>
  filter(
    season == ssn,
    str_detect(result, "Sole|unner")
  ) |>
  mutate(label = glue("{castaway} ({original_tribe})")) |>
  select(label, castaway)

# Jury votes for each finalist, split by the juror's original tribe
votes <- jury_votes |>
  filter(season == ssn) |>
  left_join(
    castaways |>
      filter(season == ssn) |>
      select(castaway, original_tribe),
    by = "castaway"
  ) |>
  group_by(finalist, original_tribe) |>
  summarise(votes = sum(vote), .groups = "drop") |>
  left_join(labels, by = c("finalist" = "castaway"))

# The native pipe has no `.` placeholder, so the plot is built from `votes`
ggplot(votes, aes(x = label, y = votes, fill = original_tribe)) +
  geom_bar(stat = "identity", width = 0.5) +
  scale_fill_tribes(ssn, tribe = votes$original_tribe) +
  theme_minimal() +
  labs(
    x = "Finalist (original tribe)",
    y = "Votes",
    fill = "Original\ntribe",
    title = "Votes received by each finalist"
  )

Issues

Given the variable nature of the game of Survivor and the changing of the rules, there are bound to be edge cases where the data is not quite right. Before logging an issue, please install the GitHub version to see if it has already been corrected. If not, please log an issue and I will correct the datasets.

New features will be added, such as details on exiled castaways across the seasons. If you have a request for specific data let me know in the issues and I’ll see what I can do. Also, if you’d like to contribute by adding to existing datasets or contribute a new dataset, please contact me directly.

Showcase

Data viz projects to showcase the data sets. This looks at the number of immunity idols won and votes received for each winner.

Contributors

A big thank you to:

References

Data was almost entirely sourced from Wikipedia. Other data, such as the tribe colours, was manually recorded and entered by myself and contributors.

Torch graphic in hex: Fire Torch Vectors by Vecteezy