The UFC brings out the big guns (again…)

Why did the UFC let Brock Lesnar challenge Daniel Cormier for the belt?

This weekend I watched UFC 226, Miocic vs. Cormier. This was a title match with Miocic defending the heavyweight belt for the fourth time. Cormier landed a hard right hook in the final minute of the first round that dropped Miocic to the mat, and after a few ground-and-pound head shots, Cormier was given the win by KO (punches).

Before Joe Rogan could finish his post-fight interview, Cormier grabbed the microphone and called out Brock Lesnar (who was conveniently standing just outside the octagon). Lesnar entered the octagon, said some rather disrespectful things about Miocic, then called out Cormier, alluding to an upcoming title fight between the 39-year-old Cormier and the WWE star.

I found this spectacle surreal, and couldn’t believe Joe Rogan let someone take his microphone. To say the least, it was an unsportsmanlike display, and it made me feel like I was watching a stage play rather than an athletic event.

Why would Dana White allow Brock Lesnar back into the UFC?

This question was bouncing around in my head most of the night, and I decided to dig into the numbers to see if I could understand what happened.

The Sports Daily data

After some snooping, I found a data set of pay-per-view sales (the actual metric is ‘Buy Rate’). These data come from the post titled “All-Time UFC PPV Sales Data,” and they run up to UFC 225 (the event right before the fight in question).

The code chunk below downloads the page, extracts the HTML table, cleans it up, and checks the shape of the new data frame.

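# packages used throughout this post
library(tidyverse)  # dplyr, tidyr, ggplot2, readr, stringr, tibble
library(rvest)      # read_html(), html_nodes(), html_table()
library(hrbrthemes) # theme_ipsum(), theme_ipsum_rc(), scale_color_ipsum()
library(ggthemes)   # theme_foundation(), scale_color_calc()
PPVUFC_url <- "http://thesportsdaily.com/2018/02/16/all-time-ufc-ppv-sales-data-fox11/"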
PPVUFC_extraction <- PPVUFC_url %>%
     read_html() %>%
     html_nodes("table")
# check the structure of the new PPVUFC_extraction object
# PPVUFC_extraction %>% str()
# extract the html table
PpvUfcRaw <- rvest::html_table(PPVUFC_extraction[[1]]) 
# check the shape of the raw data 
# PpvUfcRaw %>% dplyr::glimpse(78)
# Now I need to clean these data up a bit by making the following changes:
# 1. rename the variables
# 2. remove the first row of data (they are the column names)
# 3. remove an empty row of data (row 2)
PayPerViewUFC <- PpvUfcRaw %>%
  dplyr::rename(event = X1,
                date = X2,
                main_event = X3,
                buy_rate = X4)
PayPerViewUFC <- PayPerViewUFC %>%
  filter(event != "Event" & event != "")
# check the dataShape
# dataShape() is a function I wrote that combines a little bit of
# dplyr::glimpse(), utils::head(), utils::tail(), and base::class()
dataShape <- function(df) {
    obs <- nrow(df)
    vars <- ncol(df)
    class <- paste0(class(df), collapse = "; ")
    first_var <- base::names(df) %>% head(1)
    last_var <- base::names(df) %>% tail(1)
    group <- is_grouped_df(df)
    # first five and last five rows
    heads_tails <- tibble::as_tibble(rbind(utils::head(df, 5), utils::tail(df, 5)))
    cat("Observations: ", obs, "\n", sep = "")
    cat("Variables: ", vars, "\n", sep = "")
    cat("Class(es): ", class, "\n", sep = " ")     
    cat("First/last variable: ", first_var, "/", last_var, "\n", sep = "")
    cat("Grouped: ", group, "\n", sep = "")
    cat("Top 5 & bottom 5 observations:", "\n", sep = "") 
    heads_tails 
} 
PayPerViewUFC %>% dataShape()
## Observations: 219
## Variables: 4
## Class(es):  data.frame
## First/last variable: event/buy_rate
## Grouped: FALSE
## Top 5 & bottom 5 observations:
## # A tibble: 10 x 4
##    event   date      main_event                         buy_rate
##  * <chr>   <chr>     <chr>                              <chr>
##  1 UFC 1   Nov 12/93 Tournament                         86,000
##  2 UFC 2   Mar 11/94 Tournament                         300,000
##  3 UFC 3   Sept 9/94 Tournament                         90,000
##  4 UFC 4   Dec16/94  Tournament                         120,000
##  5 UFC 5   Apr 7/95  Royce Gracie vs Ken Shamrock       260,000
##  6 UFC 221 Feb 11/18 Yoel Romero vs Luke Rockhold       N/A
##  7 UFC 222 Mar 3/18  Cris Cyborg vs Yana Kunitskaya     N/A
##  8 UFC 223 Apr 7/18  Khabib Nurmagomedov vs Al Iaquinta N/A
##  9 UFC 224 May 12/18 Amanda Nunes vs Raquel Pennington  N/A
## 10 UFC 225 Jun 9/18  Robert Whittaker vs Yoel Romero    250,000

Export the raw data and the processed data file.

# writeLines(fs::dir_ls("data"))
write_csv(as_data_frame(PpvUfcRaw), "data/PpvUfcRaw.csv")
write_csv(as_data_frame(PayPerViewUFC), "data/PayPerViewUFC.csv")

Unique Events

The event column should hold every UFC event (all the way back to the beginning). I want to see if each event is unique (one per row). The best way to be sure of this is with base::identical().

base::identical(x = nrow(dplyr::distinct(PayPerViewUFC, event)),
          y = nrow(PayPerViewUFC))
## [1] TRUE

That’s helpful information–now I know I don’t have duplicate identification numbers for each UFC event.
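
As a quick cross-check, the same question can be asked with base::anyDuplicated(), which returns 0 when a vector has no repeated values. This is only a sketch on made-up event names (the two vectors below are hypothetical, not the scraped data).

```r
# toy event names (hypothetical, not taken from the scraped table)
events_unique <- c("UFC 1", "UFC 2", "UFC 3")
events_repeat <- c("UFC 1", "UFC 2", "UFC 2")

# anyDuplicated() returns 0 if every value is unique,
# otherwise the index of the first duplicated value
dup_unique <- anyDuplicated(events_unique)
dup_repeat <- anyDuplicated(events_repeat)

# the same idea as the identical() check: distinct count equals total count
same_length <- identical(length(unique(events_unique)), length(events_unique))
```

On the real data, anyDuplicated(PayPerViewUFC$event) should return 0 if the identical() check is TRUE.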

The dates for each event

The next column in the data set is date, and the values are all given as Month (abbreviated) DD/YY. I can quickly clean these up using the lubridate::mdy() function.

# check the format
PayPerViewUFC$date %>% glimpse(78)
##  chr [1:219] "Nov 12/93" "Mar 11/94" "Sept 9/94" "Dec16/94" "Apr 7/95" ...
# pick the function and parse date
PayPerViewUFC$date <- lubridate::mdy(PayPerViewUFC$date) 
# check the new date
PayPerViewUFC$date %>% glimpse(78)
##  Date[1:219], format: "1993-11-12" "1994-03-11" "1994-09-09" "1994-12-16" "1995-04-07" ...

The Main Event (main_event) column

These are the titles for each main event in the UFC. We can see a quick count of how many are listed as Tournament, how many rematches there are (those with a Main Event that showed up more than once), and how many events are listed only once.

PayPerViewUFC %>%
  dplyr::count(main_event, sort = TRUE) %>%
  head(10)
## # A tibble: 10 x 2
##    main_event                            n
##    <chr>                             <int>
##  1 Tournament                           11
##  2 Chuck Liddell vs Randy Couture        3
##  3 Anderson Silva vs Chael Sonnen        2
##  4 Andrei Arlovski vs Tim Sylvia         2
##  5 Chuck Liddell vs Tito Ortiz           2
##  6 Frankie Edgar vs Benson Henderson     2
##  7 Frankie Edgar vs Gray Maynard         2
##  8 George St-Pierre vs Matt Serra        2
##  9 Johny Hendricks vs Robbie Lawler      2
## 10 Jose Aldo vs Chad Mendes              2

The ‘Buy Rates’ (buy_rate) column

This column is the pay per view buy rate, but as noted in the original post,

No need for much of an introduction here – this is a list of the sales totals for every UFC pay-per-view since Day 1. Now, since the UFC is a private company and doesn’t release sales info, all this is based on estimates, usually released by the inimitable Dave Meltzer based on info from PPV providers (and listed on the Wikipedia pages for the events – UFC 178 & UFC 179 from Tapology). And during the Dark Ages of the sport, when it was pretty much banned everywhere, no PPV info is available.

These can be formatted correctly by removing the commas (stringr::str_remove_all()) and then converting to numeric (base::as.numeric()). The N/A entries cannot be converted, so they become NA.

PayPerViewUFC <- PayPerViewUFC %>%
  mutate(buy_rate = stringr::str_remove_all(string = buy_rate, pattern = ","),
         buy_rate = base::as.numeric(buy_rate))
PayPerViewUFC %>% dataShape()
## Observations: 219
## Variables: 4
## Class(es):  data.frame
## First/last variable: event/buy_rate
## Grouped: FALSE
## Top 5 & bottom 5 observations:
## # A tibble: 10 x 4
##    event   date       main_event                         buy_rate
##  * <chr>   <date>     <chr>                                 <dbl>
##  1 UFC 1   1993-11-12 Tournament                            86000
##  2 UFC 2   1994-03-11 Tournament                           300000
##  3 UFC 3   1994-09-09 Tournament                            90000
##  4 UFC 4   1994-12-16 Tournament                           120000
##  5 UFC 5   1995-04-07 Royce Gracie vs Ken Shamrock         260000
##  6 UFC 221 2018-02-11 Yoel Romero vs Luke Rockhold             NA
##  7 UFC 222 2018-03-03 Cris Cyborg vs Yana Kunitskaya           NA
##  8 UFC 223 2018-04-07 Khabib Nurmagomedov vs Al Iaquinta       NA
##  9 UFC 224 2018-05-12 Amanda Nunes vs Raquel Pennington        NA
## 10 UFC 225 2018-06-09 Robert Whittaker vs Yoel Romero      250000
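
Converting the string "N/A" with as.numeric() produces NA together with a coercion warning. A minimal sketch of one way to make that step explicit is below; the three values are toy inputs in the same shape as the scraped column, and the explicit recoding is my assumption about a tidier approach, not part of the original pipeline.

```r
library(stringr)

# toy values shaped like the raw buy_rate column (hypothetical)
buy_rate_raw <- c("86,000", "300,000", "N/A")

# drop the thousands separators
buy_rate_clean <- str_remove_all(buy_rate_raw, pattern = ",")

# recode "N/A" as a real missing value so as.numeric() has nothing to warn about
buy_rate_clean[buy_rate_clean == "N/A"] <- NA_character_

buy_rate_num <- as.numeric(buy_rate_clean)
```

The result is a numeric vector with a missing value in the third position, which is the same thing the pipeline above produces.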

Remove Tournaments to focus on events with fighters

I’m going to filter out the Main Events that were merely listed as Tournament.

UFCFighterEvents <- PayPerViewUFC %>%
  dplyr::filter(main_event != "Tournament")
UFCFighterEvents %>% dataShape()
## Observations: 208
## Variables: 4
## Class(es):  data.frame
## First/last variable: event/buy_rate
## Grouped: FALSE
## Top 5 & bottom 5 observations:
## # A tibble: 10 x 4
##    event   date       main_event                         buy_rate
##  * <chr>   <date>     <chr>                                 <dbl>
##  1 UFC 5   1995-04-07 Royce Gracie vs Ken Shamrock         260000
##  2 UFC 6   1995-07-14 Ken Shamrock vs Dan Severn           240000
##  3 UFC 7   1995-09-08 Ken Shamrock vs Oleg Taktarov        190000
##  4 UFC 8   1996-02-16 Ken Shamrock vs Kimo Leopoldo        300000
##  5 UFC 9   1996-05-17 Ken Shamrock vs Dan Severn           141000
##  6 UFC 221 2018-02-11 Yoel Romero vs Luke Rockhold             NA
##  7 UFC 222 2018-03-03 Cris Cyborg vs Yana Kunitskaya           NA
##  8 UFC 223 2018-04-07 Khabib Nurmagomedov vs Al Iaquinta       NA
##  9 UFC 224 2018-05-12 Amanda Nunes vs Raquel Pennington        NA
## 10 UFC 225 2018-06-09 Robert Whittaker vs Yoel Romero      250000

Separate the fighters into their own columns

Now I want to separate the names of both fighters in the main_event column into two different columns fighter_1 and fighter_2.

UFCFighterEvents <- UFCFighterEvents %>%
    tidyr::separate(
        col = main_event,
        into = c("fighter_1",
                 "fighter_2"),
        sep = " vs ",
        remove = FALSE)
UFCFighterEvents %>% dataShape()
## Observations: 208
## Variables: 6
## Class(es):  data.frame
## First/last variable: event/buy_rate
## Grouped: FALSE
## Top 5 & bottom 5 observations:
## # A tibble: 10 x 6
##    event   date       main_event         fighter_1    fighter_2   buy_rate
##  * <chr>   <date>     <chr>              <chr>        <chr>          <dbl>
##  1 UFC 5   1995-04-07 Royce Gracie vs K… Royce Gracie Ken Shamro…   260000
##  2 UFC 6   1995-07-14 Ken Shamrock vs D… Ken Shamrock Dan Severn    240000
##  3 UFC 7   1995-09-08 Ken Shamrock vs O… Ken Shamrock Oleg Takta…   190000
##  4 UFC 8   1996-02-16 Ken Shamrock vs K… Ken Shamrock Kimo Leopo…   300000
##  5 UFC 9   1996-05-17 Ken Shamrock vs D… Ken Shamrock Dan Severn    141000
##  6 UFC 221 2018-02-11 Yoel Romero vs Lu… Yoel Romero  Luke Rockh…       NA
##  7 UFC 222 2018-03-03 Cris Cyborg vs Ya… Cris Cyborg  Yana Kunit…       NA
##  8 UFC 223 2018-04-07 Khabib Nurmagomed… Khabib Nurm… Al Iaquinta       NA
##  9 UFC 224 2018-05-12 Amanda Nunes vs R… Amanda Nunes Raquel Pen…       NA
## 10 UFC 225 2018-06-09 Robert Whittaker … Robert Whit… Yoel Romero   250000
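
To see what tidyr::separate() does with a row that has no " vs " in it (which is why I removed the Tournament rows first), here is a small sketch on a made-up two-row data frame. The fill = "right" argument is my addition; it pads the missing piece with NA instead of raising a warning.

```r
library(tidyr)

# toy data: one normal main event and one without " vs " (hypothetical rows)
toy_events <- data.frame(
  main_event = c("Royce Gracie vs Ken Shamrock", "Tournament"),
  stringsAsFactors = FALSE
)

toy_sep <- separate(
  toy_events,
  col = main_event,
  into = c("fighter_1", "fighter_2"),
  sep = " vs ",
  remove = FALSE,   # keep the original main_event column
  fill = "right"    # pad missing pieces on the right with NA
)
```

The Tournament row ends up with "Tournament" in fighter_1 and NA in fighter_2, so filtering those rows out before separating keeps the fighter columns clean.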

Pay-per-view purchases over time

Now I can visualize these data by plotting date on the x-axis and buy_rate on the y-axis.

UFCFighterEvents %>%
  ggplot(aes(x = date,
             y = buy_rate)) +
    geom_point(alpha = 0.5,
               size = 1.5) +
          ggplot2::labs(x = "Date",
                        y = "Pay-Per-View Buy Rate",
                        title = "UFC pay-per-view viewership") +
   ggplot2::labs(caption = "data source: https://goo.gl/UWhwEZ")

This shows a pretty clear rise of UFC pay-per-view purchases over time since 2001. I want to get rid of some additional zeros in the buy_rate variable by creating buy_rate_mil (which is done by dividing each number by 1,000,000).

I also want to check the top 50 UFC fights and see who the main attractions were, because I’m curious about which fighters bring the most viewership (and when this tends to happen). I am going to switch to a line plot because I will be looking at fewer points and I’d like to see the trends (or changes) more clearly.

# Create buy_rate_mil --------
UFCFighterEvents <- UFCFighterEvents %>%
  dplyr::mutate(buy_rate_mil = buy_rate/1000000)
UFCFighterEventsTop50 <- UFCFighterEvents %>%
  dplyr::arrange(desc(buy_rate)) %>%
  utils::head(50)
UFCFighterEventsTop50 %>%
    ggplot(aes(x = date, y = buy_rate_mil)) +
    geom_line() +
      theme_ipsum() +
        ggplot2::labs(x = "Date",
                      y = "Pay-Per-View Buy Rate (Millions)",
                      title = "Top 50 UFC pay-per-view events") +
   ggplot2::labs(caption = "data source: https://goo.gl/UWhwEZ")

This graph shows the 50 most purchased UFC events. I added theme_ipsum() from the hrbrthemes package. Read more about it in the hrbrthemes documentation.

It looks like there are three distinct peaks (one in mid 2009, the others in early and late 2016). Any idea who was fighting in these events?

Write a function to summarize the buy_rate and buy_rate_mil variables

I want some descriptive statistics on the two PPV metrics in this data set, so I am going to write a function that will give me a quick numerical summary of each variable. Learn more about writing functions in the awesome “Programming with dplyr” vignette.

numvarSum <- function(df, expr) {
  expr <- enquo(expr) # turns expr into a quosure
  summarise(df,
          n = sum((!is.na(!!expr))), # non-missing
          na = sum((is.na(!!expr))), # missing
          mean = mean(!!expr, na.rm = TRUE), # !! unquotes expr inside mean()
          median = median(!!expr, na.rm = TRUE), # same for median()
          sd = sd(!!expr, na.rm = TRUE), # standard deviation
          variance = var(!!expr, na.rm = TRUE), # variance
          min = min(!!expr, na.rm = TRUE), # smallest value
          max = max(!!expr, na.rm = TRUE), # largest value
          se = sd/sqrt(n)) # standard error, from the sd and n columns created above
}
UFCFighterEvents %>%
  numvarSum(buy_rate)
##     n na   mean median     sd  variance   min     max    se
## 1 184 24 437701 350000 317497 1.008e+11 35000 1650000 23406
UFCFighterEvents %>%
  numvarSum(buy_rate_mil)
##     n na   mean median     sd variance   min  max      se
## 1 184 24 0.4377   0.35 0.3175   0.1008 0.035 1.65 0.02341
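
The se column is just sd divided by the square root of the non-missing count, so it can be checked by hand. The sketch below plugs in the values printed in the first summary above (n = 184, sd = 317497); the variable names are only for illustration.

```r
# values copied from the numvarSum(buy_rate) output above
n_obs  <- 184
sd_buy <- 317497

# standard error of the mean: sd / sqrt(n)
se_buy <- sd_buy / sqrt(n_obs)

# the same calculation on the "millions" scale
se_buy_mil <- (sd_buy / 1000000) / sqrt(n_obs)
```

Rounded to the nearest whole purchase, se_buy agrees with the 23406 reported in the se column.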

How many events had more than 1 million views?

The line graph above shows the buying rate trends of the top 50 UFC events over time. However, I want some more details on the events themselves. I will start by creating a factor variable (buy_rate_fct) that sorts the buy_rate_mil variable into five levels. The numerical summaries above can help me make sure the new variable is coded correctly.

 

UFCFighterEvents <- UFCFighterEvents %>%
  dplyr::mutate(
    buy_rate_fct = case_when(
      buy_rate_mil < 1.00 ~ "less than 1.00 million ppvs",
      buy_rate_mil >= 1.00 & buy_rate_mil < 1.25 ~ "1.00-1.25 million ppvs",
      buy_rate_mil >= 1.25 & buy_rate_mil < 1.50 ~ "1.25-1.50 million ppvs",
      buy_rate_mil >= 1.50 & buy_rate_mil < 1.75 ~ "1.50-1.75 million ppvs",
      buy_rate_mil >= 1.75 & buy_rate_mil < 2.00 ~ "1.75-2.00 million ppvs"),
    # convert to factor
    buy_rate_fct = factor(buy_rate_fct,
                          levels = c("1.75-2.00 million ppvs",
                                     "1.50-1.75 million ppvs",
                                     "1.25-1.50 million ppvs",
                                     "1.00-1.25 million ppvs",
                                     "less than 1.00 million ppvs")))

 

# check the buy_rate_fct
knitr::kable(
UFCFighterEvents %>% dplyr::count(buy_rate_fct))
|buy_rate_fct                |   n|
|:---------------------------|---:|
|1.50-1.75 million ppvs      |   3|
|1.25-1.50 million ppvs      |   1|
|1.00-1.25 million ppvs      |  11|
|less than 1.00 million ppvs | 169|
|NA                          |  24|
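
The same binning could also be done with base::cut(), which is a common alternative to a long case_when(). This is only a sketch on five toy values (not the real column). Note two differences from the code above: cut() needs a lower break (I assume 0 here), and the factor levels come out in ascending order rather than the descending order I set by hand.

```r
# toy buy rates in millions (hypothetical values), including a missing one
toy_buy_rate_mil <- c(0.35, 1.00, 1.30, 1.65, NA)

toy_buy_rate_fct <- cut(
  toy_buy_rate_mil,
  breaks = c(0, 1.00, 1.25, 1.50, 1.75, 2.00),
  labels = c("less than 1.00 million ppvs",
             "1.00-1.25 million ppvs",
             "1.25-1.50 million ppvs",
             "1.50-1.75 million ppvs",
             "1.75-2.00 million ppvs"),
  right = FALSE  # intervals are [lower, upper), matching the >= and < conditions
)

# tabulate the bins, keeping the missing value visible
table(toy_buy_rate_fct, useNA = "ifany")
```

A value of exactly 1.00 lands in the 1.00-1.25 bin, just as it does with the case_when() conditions.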

Now I can map the adjusted buy rate (buy_rate_mil) on the y-axis and the categorical buy rate variable (buy_rate_fct) to the color aesthetic to see more details of the distribution.

UFCFighterEvents %>%
  dplyr::filter(!is.na(buy_rate_fct)) %>%
    ggplot(aes(x = date,
               y = buy_rate_mil,
               group = buy_rate_fct)) +
    geom_point(aes(color = buy_rate_fct), size = 1.5) +
    ggplot2::xlab("Date") +
    ggplot2::ylab("UFC PPV Sales (in millions)") +
    ggplot2::ggtitle(label = "Pay-per-view UFC Events",
                     subtitle = "Every UFC pay-per-view since UFC 1") +
    ggplot2::labs(caption = "data source: https://goo.gl/UWhwEZ") +
      scale_color_ipsum() +
      scale_fill_ipsum() +
      theme_ipsum_rc()

Why not just assign a different color to every level of buy_rate_mil? Too many different colors would make it hard to track the differences in the PPV buy rate (I would need to keep re-checking the legend to figure out what I was seeing).

Which fighters attract the most viewership?

We can see from looking at the head() of the data in UFCFighterEventsTop50 that Conor McGregor is in four of the top 5 fights.

knitr::kable(
UFCFighterEventsTop50 %>%
  dplyr::select(main_event,
                fighter_1,
                fighter_2,
                buy_rate,
                buy_rate_mil) %>%
  head(5))
|main_event                      |fighter_1      |fighter_2      | buy_rate| buy_rate_mil|
|:-------------------------------|:--------------|:--------------|--------:|------------:|
|Nate Diaz vs Conor McGregor     |Nate Diaz      |Conor McGregor |  1650000|         1.65|
|Brock Lesnar vs Frank Mir       |Brock Lesnar   |Frank Mir      |  1600000|         1.60|
|Conor McGregor vs Nate Diaz     |Conor McGregor |Nate Diaz      |  1500000|         1.50|
|Eddie Alvarez vs Conor McGregor |Eddie Alvarez  |Conor McGregor |  1300000|         1.30|
|Jose Aldo vs Conor McGregor     |Jose Aldo      |Conor McGregor |  1200000|         1.20|

Tidy the fighters

I want to create a tidy data frame that gathers up the fighters from the main_event column (now in the fighter_1 and fighter_2 columns). This means I want a single column (fighter_val) that lists all the fighters, and another column (fighter_key) that tells me whether they were fighter_1 or fighter_2. I also reorganize the data frame with some of dplyr’s handy select() helper functions, and sort the data so we can see what this new arrangement looks like.

TidyUFCFighters <- UFCFighterEvents %>%
  tidyr::gather(key = fighter_key,
                value = fighter_val,
                fighter_1:fighter_2) %>%
  dplyr::select(event,
                date,
        dplyr::contains("buy"),
        dplyr::contains("fight"),
        main_event) %>%
  dplyr::arrange(event)
TidyUFCFighters %>% dataShape()
## Observations: 416
## Variables: 8
## Class(es):  data.frame
## First/last variable: event/main_event
## Grouped: FALSE
## Top 5 & bottom 5 observations:
## # A tibble: 10 x 8
##    event   date       buy_rate buy_rate_mil buy_rate_fct       fighter_key
##  * <chr>   <date>        <dbl>        <dbl> <fct>              <chr>
##  1 UFC 100 2009-07-11  1600000        1.6   1.50-1.75 million… fighter_1
##  2 UFC 100 2009-07-11  1600000        1.6   1.50-1.75 million… fighter_2
##  3 UFC 101 2009-08-08   850000        0.85  less than 1.00 mi… fighter_1
##  4 UFC 101 2009-08-08   850000        0.85  less than 1.00 mi… fighter_2
##  5 UFC 102 2009-08-29   435000        0.435 less than 1.00 mi… fighter_1
##  6 UFC 99  2009-06-13   360000        0.36  less than 1.00 mi… fighter_2
##  7 UFC Br… 1998-10-16       NA       NA     <NA>               fighter_1
##  8 UFC Br… 1998-10-16       NA       NA     <NA>               fighter_2
##  9 UFC Ja… 1997-12-21       NA       NA     <NA>               fighter_1
## 10 UFC Ja… 1997-12-21       NA       NA     <NA>               fighter_2
## # ... with 2 more variables: fighter_val <chr>, main_event <chr>

This is exactly what I should expect–each event listed twice–once for each fighter.
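
In newer tidyr releases, pivot_longer() is the recommended replacement for gather(). As a sketch of the same reshaping on a made-up two-event data frame (the event and fighter names are placeholders), each event again ends up on two rows:

```r
library(tidyr)

# toy data: two events with two fighters each (hypothetical names)
toy_events <- data.frame(
  event     = c("UFC A", "UFC B"),
  fighter_1 = c("Fighter W", "Fighter Y"),
  fighter_2 = c("Fighter X", "Fighter Z"),
  stringsAsFactors = FALSE
)

# one row per event-fighter pair
toy_tidy <- pivot_longer(
  toy_events,
  cols = c(fighter_1, fighter_2),
  names_to = "fighter_key",
  values_to = "fighter_val"
)
```

Unlike gather(), pivot_longer() keeps the two rows for each event next to each other, so no extra arrange() is needed to see the pairs together.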

Who are the top five (most frequently occurring) fighters?

I can now use the dplyr::count() function to determine which fighter has the most main event occurrences.

knitr::kable(
TidyUFCFighters %>%
  dplyr::count(fighter_val, sort = TRUE) %>%
  head(5))
|fighter_val    |  n|
|:--------------|--:|
|Randy Couture  | 17|
|Anderson Silva | 16|
|Tito Ortiz     | 15|
|Chuck Liddell  | 11|
|Jon Jones      | 11|

And now I can see that when it comes to main event appearances, Randy Couture is the most frequent main event fighter in the UFC (followed by Anderson Silva and Tito Ortiz). This is a count of appearances, not of PPV purchases.

Which fighter draws the most PPV purchases?

Now I want to know which fighter draws the most PPV purchases in the UFC. This question is a little trickier than simply asking ‘what events have the highest PPV buy rate?’, because each event has two fighters and it isn’t always clear who is drawing the crowd. One way to get at this is by seeing how many of the most purchased events featured fighters who were also in many events.

I can do this in the following steps:
1. remove the missing PPV purchase data,
2. count the number of UFC events per fighter,
3. sort the data by the number of UFC events per fighter, then by the PPV purchase rate,
4. limit this to only the events with PPV purchase rates over 1 million.

The wrangling steps below use the same functions as above to create TopPPVFighters.

TopPPVFighters <- TidyUFCFighters %>%
  # remove missing buy rates
  filter(!is.na(buy_rate_mil)) %>%
  # count the number of occurrences per fighter
  dplyr::count(fighter_val) %>%
  # rename n to fghtr_events
  dplyr::rename(fghtr_events = n) %>%
  # join back to original tidy data set
  dplyr::left_join(., TidyUFCFighters, by = "fighter_val") %>%
  # arrange by descending number of occurrences, then by descending PPV buy rate
  # (desc() takes one column at a time)
  dplyr::arrange(desc(fghtr_events), desc(buy_rate_mil)) %>%
  # limit to events over 1 million
  dplyr::filter(buy_rate_mil > 1.00) %>%
  # keep one row per main event
  dplyr::distinct(main_event, .keep_all = TRUE)
TopPPVFighters %>% dataShape()
## Observations: 14
## Variables: 9
## Class(es):  tbl_df; tbl; data.frame
## First/last variable: fighter_val/main_event
## Grouped: FALSE
## Top 5 & bottom 5 observations:
## # A tibble: 10 x 9
##    fighter_val    fghtr_events event   date       buy_rate buy_rate_mil
##    <chr>                 <int> <chr>   <date>        <dbl>        <dbl>
##  1 Anderson Silva           16 UFC 168 2013-12-28  1025000         1.02
##  2 Randy Couture            14 UFC 91  2008-11-15  1010000         1.01
##  3 Chuck Liddell            11 UFC 66  2006-12-30  1050000         1.05
##  4 Rashad Evans              9 UFC 114 2010-05-29  1050000         1.05
##  5 Jose Aldo                 7 UFC 194 2015-12-12  1200000         1.2
##  6 Brock Lesnar              5 UFC 116 2010-07-03  1160000         1.16
##  7 Conor McGregor            5 UFC 196 2016-03-05  1500000         1.5
##  8 Conor McGregor            5 UFC 202 2016-08-20  1650000         1.65
##  9 Conor McGregor            5 UFC 205 2016-11-12  1300000         1.3
## 10 Amanda Nunes              3 UFC 200 2016-07-09  1200000         1.2
## # ... with 3 more variables: buy_rate_fct <fct>, fighter_key <chr>,
## #   main_event <chr>
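
Sorting by two descending keys needs one desc() call per column. Here is a minimal sketch on a made-up three-row data frame (the fighter labels and numbers are hypothetical) showing how the second key breaks ties in the first:

```r
library(dplyr)

# toy data: fighters A and B are tied on number of events
toy_fighters <- data.frame(
  fighter      = c("A", "B", "C"),
  fghtr_events = c(5, 5, 3),
  buy_rate_mil = c(1.2, 1.6, 1.1),
  stringsAsFactors = FALSE
)

# most events first; within ties, highest buy rate first
toy_sorted <- arrange(toy_fighters, desc(fghtr_events), desc(buy_rate_mil))
```

Fighter B comes out ahead of fighter A because both have five events and B has the higher buy rate.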

Now I can create a plot that uses the TopPPVFighters data frame, and maps the name of the fighters to the point on the graph. But before I do this, I want to make a few adjustments to the fivethirtyeight theme from ggthemes. As you can see from the code below, I make a few minor changes to display the axis titles and adjust the fonts. Read more about theme_foundation() and ggthemes.

# check the colors for fivethirtyeight theme
# ggthemes_data$fivethirtyeight
 # dkgray medgray ltgray red blue green 
# "#3C3C3C" "#D2D2D2" "#F0F0F0" "#FF2700" "#008FD5" "#77AB43"
theme_fivethirtyeightv1.1 <- function(base_size = 11, base_family = "sans") {
 (theme_foundation(base_size = base_size, base_family = base_family) + 
     theme(line = element_line(color = "black"), 
             rect = element_rect(
                 fill = "#F0F0F0", 
                 linetype = 0, 
                 color = NA), 
       text = element_text(color = "#3C3C3C"), 
       axis.text = element_text(), 
       axis.ticks = element_blank(),
       axis.line = element_blank(),
       legend.background = element_rect(
                fill = "#F0F0F0",
                color = "#D2D2D2",
                size = 1), 
       legend.position = "bottom",
       legend.direction = "horizontal",
         panel.grid = element_line(color = NULL), 
         panel.grid.major = element_line(color = "#D2D2D2"), 
         panel.grid.minor = element_blank(), 
         plot.title = element_text(family = "mono",
                     hjust = 0, 
                     size = rel(1.5), 
                     face = "bold"), 
       plot.margin = unit(c(1, 1, 1, 1), "lines"), 
       strip.background = element_rect()))
}
TopPPVFighters %>% 
 ggplot(aes(x = date, 
            y = buy_rate_mil, 
            label = fighter_val)) + 
 ggplot2::geom_point(aes(size = fghtr_events,
                         color = fighter_val),
                         alpha = 0.7,
                         show.legend = TRUE) +
 ggplot2::guides(color = FALSE) +
 ggplot2::scale_size_continuous(name = "number of events",
                               breaks = c(3, 6, 9, 12, 15),
                               labels = c("3", "6", "9", 
                                        "12", "15")) +
 ggrepel::geom_text_repel(direction = "both",
                          hjust = 0.5,
                          vjust = 0.5,
                          segment.size = 0.5,
                          color = "black",
                          size = 2.5) +
 ggthemes::scale_color_calc() +
 theme_fivethirtyeightv1.1() +
            ggplot2::xlab("Date") +
            ggplot2::ylab("UFC PPV Sales (in millions)") +
            ggplot2::ggtitle(label = "UFC Events by Number of Appearances", 
            subtitle = "larger point = more appearances") + 
            ggplot2::labs(caption = "data source: https://goo.gl/UWhwEZ")

 

The thing to notice about this graph is that Conor McGregor and Brock Lesnar are the only two fighters at or above the 1.5 million PPV purchase mark. It remains unknown if McGregor will return to the UFC, so it’s important to consider how many of the big names in the UFC have retired or moved on to different franchises.

After looking at PPV purchase numbers and the recent exodus of talent (voluntary or otherwise), it makes sense that Saturday’s fight felt more like a staged drama than a typical bout. The UFC has to compete with an increasing number of fighting organizations (Viacom’s Bellator and the Rizin Fighting Federation, formerly Pride, out of Tokyo), and nothing attracts fight fans like a narrative.

In the next post I will look into the MMA attendance data to see if the same big names also sell the most tickets.

devtools::session_info()
##  setting  value
##  version  R version 3.5.0 (2018-04-23)
##  system   x86_64, darwin15.6.0
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  tz       America/Los_Angeles
##  date     2018-07-09
##
##  package       * version    date
##  assertthat      0.2.0      2017-04-11
##  backports       1.1.2      2017-12-13
##  base          * 3.5.0      2018-04-24
##  base64enc       0.1-3      2015-07-28
##  bindr           0.1.1      2018-03-13
##  bindrcpp      * 0.2.2      2018-03-29
##  broom           0.4.5      2018-07-03
##  Cairo         * 1.5-9      2015-09-26
##  cellranger      1.1.0      2016-07-27
##  cli             1.0.0      2017-11-05
##  colorspace      1.3-2      2016-12-14
##  compiler        3.5.0      2018-04-24
##  crayon          1.3.4      2017-09-16
##  curl            3.2        2018-03-28
##  datasets      * 3.5.0      2018-04-24
##  devtools        1.13.6     2018-06-27
##  digest          0.6.15     2018-01-28
##  dplyr         * 0.7.6      2018-06-29
##  evaluate        0.10.1     2017-06-24
##  extrafont     * 0.17       2014-12-08
##  extrafontdb     1.0        2012-06-11
##  fastmatch     * 1.1-0      2017-01-28
##  forcats       * 0.3.0      2018-02-19
##  foreign         0.8-70     2017-11-28
##  formatR         1.5        2017-04-25
##  ggplot2       * 3.0.0.9000 2018-07-09
##  ggrepel       * 0.8.0.9000 2018-07-09
##  ggthemes      * 3.5.0      2018-05-07
##  glue            1.2.0      2017-10-29
##  graphics      * 3.5.0      2018-04-24
##  grDevices     * 3.5.0      2018-04-24
##  grid          * 3.5.0      2018-04-24
##  gridExtra     * 2.3        2017-09-09
##  gtable          0.2.0      2016-02-26
##  haven           1.1.2      2018-06-27
##  highr           0.7        2018-06-09
##  hms             0.4.2      2018-03-10
##  hrbrmisc      * 0.2.0      2018-07-09
##  hrbrthemes    * 0.5.0      2018-07-08
##  htmltools       0.3.6      2017-04-28
##  htmlwidgets     1.2        2018-04-19
##  httr            1.3.1      2017-08-20
##  janeaustenr     0.1.5      2017-06-10
##  jsonlite        1.5        2017-06-01
##  knitr           1.20       2018-02-20
##  labeling        0.3        2014-08-23
##  lattice         0.20-35    2017-03-25
##  lazyeval        0.2.1      2017-10-29
##  lubridate       1.7.4      2018-04-11
##  magrittr      * 1.5        2014-11-22
##  Matrix          1.2-14     2018-04-13
##  memoise         1.1.0      2017-04-21
##  methods       * 3.5.0      2018-04-24
##  mnormt          1.5-5      2016-10-15
##  modelr          0.1.2      2018-05-11
##  munsell         0.5.0      2018-06-12
##  nlme            3.1-137    2018-04-07
##  pacman        * 0.4.6      2017-05-14
##  parallel        3.5.0      2018-04-24
##  pillar          1.2.3      2018-05-25
##  pkgconfig       2.0.1      2017-03-21
##  plyr            1.8.4      2016-06-08
##  psych           1.8.4      2018-05-06
##  purrr         * 0.2.5      2018-05-29
##  R6              2.2.2      2017-06-17
##  Rcpp            0.12.17    2018-05-18
##  readr         * 1.1.1      2017-05-16
##  readxl          1.1.0      2018-04-20
##  reshape2        1.4.3      2017-12-11
##  rlang           0.2.1      2018-05-30
##  rmarkdown       1.10       2018-06-11
##  rprojroot       1.3-2      2018-01-03
##  rstudioapi      0.7        2017-09-07
##  Rttf2pt1        1.3.7      2018-06-29
##  rvest         * 0.3.2      2016-06-17
##  scales        * 0.5.0.9000 2018-07-09
##  selectr         0.4-1      2018-04-06
##  seleniumPipes   0.3.7      2016-10-01
##  SnowballC       0.5.1      2014-08-09
##  stats         * 3.5.0      2018-04-24
##  stringi         1.2.3      2018-06-12
##  stringr       * 1.3.1      2018-05-10
##  tibble        * 1.4.2      2018-01-22
##  tidyr         * 0.8.1      2018-05-18
##  tidyselect      0.2.4      2018-02-26
##  tidytext      * 0.1.9      2018-05-29
##  tidyverse     * 1.2.1      2017-11-14
##  tokenizers      0.2.1      2018-03-29
##  tools           3.5.0      2018-04-24
##  utf8            1.1.4      2018-05-24
##  utils         * 3.5.0      2018-04-24
##  whisker         0.3-2      2013-04-28
##  withr           2.1.2      2018-03-15
##  xml2          * 1.2.0      2018-01-24
##  yaml            2.1.19     2018-05-01
##  source
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  local
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  local
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  local
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  Github (tidyverse/ggplot2@a27c365)
##  Github (slowkow/ggrepel@8fa50e0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  local
##  local
##  local
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  Github (hrbrmstr/hrbrmisc@ebb928c)
##  Github (hrbrmstr/hrbrthemes@beae03c)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  local
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  local
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  Github (hadley/scales@a0f0da1)
##  CRAN (R 3.5.0)
##  cran (@0.3.7)
##  CRAN (R 3.5.0)
##  local
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  local
##  CRAN (R 3.5.0)
##  local
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
##  CRAN (R 3.5.0)
