
Movie information and user ratings from IMDB.com (wide format).

Usage

movies_wide

Format

A data frame with 1,579 rows and 13 variables:

  • title. Title of the movie.

  • year. Year of release.

  • budget. Total budget in millions of US dollars.

  • length. Length in minutes.

  • rating. Average IMDB user rating.

  • votes. Number of IMDB users who rated this movie.

  • mpaa. MPAA rating.

  • Action, Animation, Comedy, Drama, Romance. Binary variables (0 or 1) indicating whether the movie was classified as belonging to that genre.

  • NumGenre. Number of different genres the movie was classified in; an integer between one and four.
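
The relationship between the genre indicators and NumGenre can be sketched as follows. This is a minimal check, assuming NumGenre is the row-wise sum of the five genre columns; that assumption is consistent with the six rows printed by head(movies_wide) on this page but is not stated in the documentation. The names first_six, genre_cols, and genre_count are illustrative only.

```r
# First six rows of movies_wide, transcribed from the head() output on this
# page (genre indicators and NumGenre only).
first_six <- data.frame(
  Action    = c(0L, 0L, 0L, 0L, 1L, 0L),
  Animation = c(0L, 0L, 0L, 0L, 0L, 0L),
  Comedy    = c(1L, 1L, 1L, 1L, 0L, 0L),
  Drama     = c(0L, 0L, 0L, 1L, 0L, 1L),
  Romance   = c(1L, 1L, 0L, 1L, 0L, 0L),
  NumGenre  = c(2L, 2L, 1L, 3L, 1L, 1L)
)

genre_cols <- c("Action", "Animation", "Comedy", "Drama", "Romance")

# Assumption: NumGenre counts how many genre indicators are set to 1.
genre_count <- as.integer(rowSums(first_six[genre_cols]))

# Compare the recomputed count with the stored column.
identical(genre_count, first_six$NumGenre)
```

With the package that provides movies_wide attached, the same comparison could be run on the full data by replacing first_six with movies_wide.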

Details

Modified dataset from the ggplot2movies package.

The Internet Movie Database, https://imdb.com/, is a website devoted to collecting movie data supplied by studios and fans. It claims to be the biggest movie database on the web and is run by Amazon.

Movies were selected for inclusion if they had a known length and had been rated by at least one IMDB user. Small categories such as documentaries and NC-17 movies were removed.

Examples

dim(movies_wide)
#> [1] 1579   13
head(movies_wide)
#> # A tibble: 6 × 13
#>   title      year length budget rating votes mpaa  Action Animation Comedy Drama
#>   <chr>     <int>  <int>  <dbl>  <dbl> <int> <fct>  <int>     <int>  <int> <int>
#> 1 'Til The…  1997    113   23      4.8   799 PG-13      0         0      1     0
#> 2 10 Thing…  1999     97   16      6.7 19095 PG-13      0         0      1     0
#> 3 100 Mile…  2002     98    1.1    5.6   181 R          0         0      1     0
#> 4 13 Going…  2004     98   37      6.4  7859 PG-13      0         0      1     1
#> 5 13th War…  1999    102   85      6.1 14344 R          1         0      0     0
#> 6 15 Minut…  2001    120   42      6.1 10866 R          0         0      0     1
#> # ℹ 2 more variables: Romance <int>, NumGenre <int>
dplyr::glimpse(movies_wide)
#> Rows: 1,579
#> Columns: 13
#> $ title     <chr> "'Til There Was You", "10 Things I Hate About You", "100 Mil…
#> $ year      <int> 1997, 1999, 2002, 2004, 1999, 2001, 1972, 2003, 1999, 2000, …
#> $ length    <int> 113, 97, 98, 98, 102, 120, 180, 107, 101, 99, 129, 124, 93, …
#> $ budget    <dbl> 23.0, 16.0, 1.1, 37.0, 85.0, 42.0, 4.0, 76.0, 6.0, 26.0, 12.…
#> $ rating    <dbl> 4.8, 6.7, 5.6, 6.4, 6.1, 6.1, 7.3, 5.1, 5.4, 2.5, 7.6, 8.0, …
#> $ votes     <int> 799, 19095, 181, 7859, 14344, 10866, 1754, 9556, 4514, 2023,…
#> $ mpaa      <fct> PG-13, PG-13, R, PG-13, R, R, PG, PG-13, R, R, R, R, R, R, P…
#> $ Action    <int> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, …
#> $ Animation <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ Comedy    <int> 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, …
#> $ Drama     <int> 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, …
#> $ Romance   <int> 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, …
#> $ NumGenre  <int> 2, 2, 1, 3, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 2, 3, 2, 2, 1, …