Movie information and user ratings from IMDB.com (wide format).
Format
A data frame with 1,579 rows and 13 variables:
title. Title of the movie.
year. Year of release.
budget. Total budget in millions of US dollars.
length. Length in minutes.
rating. Average IMDB user rating.
votes. Number of IMDB users who rated this movie.
mpaa. MPAA rating.
Action, Animation, Comedy, Drama, Romance. Binary variables indicating whether the movie was classified as belonging to that genre.
NumGenre. The number of different genres the movie was classified in; an integer between one and four.
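As a minimal sketch of how the genre indicators relate to NumGenre, the snippet below rebuilds the genre columns for the six rows printed in the Examples section and checks that the row-wise sum of the indicators matches NumGenre. The names movies_head, genres, and genre_count are hypothetical and used only for illustration. That NumGenre equals this row sum is an assumption that holds for the six displayed rows; it has not been checked here against the full dataset.

```r
# Genre indicator columns as they appear in the printed output.
genres <- c("Action", "Animation", "Comedy", "Drama", "Romance")

# Hypothetical toy data frame: values copied from the first six rows
# shown in the Examples section (titles omitted).
movies_head <- data.frame(
  Action    = c(0L, 0L, 0L, 0L, 1L, 0L),
  Animation = c(0L, 0L, 0L, 0L, 0L, 0L),
  Comedy    = c(1L, 1L, 1L, 1L, 0L, 0L),
  Drama     = c(0L, 0L, 0L, 1L, 0L, 1L),
  Romance   = c(1L, 1L, 0L, 1L, 0L, 0L),
  NumGenre  = c(2L, 2L, 1L, 3L, 1L, 1L)
)

# Count the genres per movie by summing the 0/1 indicators across each row.
genre_count <- unname(rowSums(movies_head[genres]))

# Assumption under test: NumGenre is the number of genre flags set to 1.
all(genre_count == movies_head$NumGenre)
#> [1] TRUE
```

The same check could be run on the full data frame by replacing movies_head with movies_wide, provided the package that ships the dataset is loaded.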
Details
Modified dataset from the ggplot2movies package.
The Internet Movie Database, https://imdb.com/, is a website devoted to collecting movie data supplied by studios and fans. It claims to be the biggest movie database on the web and is run by Amazon.
Movies were selected for inclusion if they had a known length and had been rated by at least one IMDB user. Small categories such as documentaries and NC-17 movies were removed.
Examples
dim(movies_wide)
#> [1] 1579 13
head(movies_wide)
#> # A tibble: 6 × 13
#>   title      year length budget rating votes mpaa  Action Animation Comedy Drama
#>   <chr>     <int>  <int>  <dbl>  <dbl> <int> <fct>  <int>     <int>  <int> <int>
#> 1 'Til The…  1997    113   23      4.8   799 PG-13      0         0      1     0
#> 2 10 Thing…  1999     97   16      6.7 19095 PG-13      0         0      1     0
#> 3 100 Mile…  2002     98    1.1    5.6   181 R          0         0      1     0
#> 4 13 Going…  2004     98   37      6.4  7859 PG-13      0         0      1     1
#> 5 13th War…  1999    102   85      6.1 14344 R          1         0      0     0
#> 6 15 Minut…  2001    120   42      6.1 10866 R          0         0      0     1
#> # ℹ 2 more variables: Romance <int>, NumGenre <int>
dplyr::glimpse(movies_wide)
#> Rows: 1,579
#> Columns: 13
#> $ title     <chr> "'Til There Was You", "10 Things I Hate About You", "100 Mil…
#> $ year      <int> 1997, 1999, 2002, 2004, 1999, 2001, 1972, 2003, 1999, 2000, …
#> $ length    <int> 113, 97, 98, 98, 102, 120, 180, 107, 101, 99, 129, 124, 93, …
#> $ budget    <dbl> 23.0, 16.0, 1.1, 37.0, 85.0, 42.0, 4.0, 76.0, 6.0, 26.0, 12.…
#> $ rating    <dbl> 4.8, 6.7, 5.6, 6.4, 6.1, 6.1, 7.3, 5.1, 5.4, 2.5, 7.6, 8.0, …
#> $ votes     <int> 799, 19095, 181, 7859, 14344, 10866, 1754, 9556, 4514, 2023,…
#> $ mpaa      <fct> PG-13, PG-13, R, PG-13, R, R, PG, PG-13, R, R, R, R, R, R, P…
#> $ Action    <int> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, …
#> $ Animation <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ Comedy    <int> 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, …
#> $ Drama     <int> 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, …
#> $ Romance   <int> 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, …
#> $ NumGenre  <int> 2, 2, 1, 3, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 2, 3, 2, 2, 1, …