Movie information and user ratings from IMDB.com (wide format).

movies_wide

Format

A data frame with 1,579 rows and 13 variables:

  • title. Title of the movie.

  • year. Year of release.

  • budget. Total budget in millions of US dollars.

  • length. Length in minutes.

  • rating. Average IMDB user rating.

  • votes. Number of IMDB users who rated this movie.

  • mpaa. MPAA rating.

  • Action, Animation, Comedy, Drama, Romance. Binary variables indicating whether the movie was classified as belonging to that genre.

  • NumGenre. The number of different genres a film was classified in; an integer between one and four.
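The relationship between the genre indicators and NumGenre can be checked on the six rows printed by head() in the Examples section. This is only a sketch: the values are copied from that output, the helper names genre_cols and genre_count are illustrative, and whether the same identity holds for all 1,579 rows is an assumption that should be verified on the full data.

```r
# Genre indicators and NumGenre for the first six rows of movies_wide,
# copied from the head() output shown in Examples.
genres <- data.frame(
  Action    = c(0L, 0L, 0L, 0L, 1L, 0L),
  Animation = c(0L, 0L, 0L, 0L, 0L, 0L),
  Comedy    = c(1L, 1L, 1L, 1L, 0L, 0L),
  Drama     = c(0L, 0L, 0L, 1L, 0L, 1L),
  Romance   = c(1L, 1L, 0L, 1L, 0L, 0L),
  NumGenre  = c(2L, 2L, 1L, 3L, 1L, 1L)
)

# Illustrative helper names (not part of the dataset).
genre_cols  <- c("Action", "Animation", "Comedy", "Drama", "Romance")
genre_count <- rowSums(genres[genre_cols])

# For these six rows, NumGenre equals the number of genre flags set to 1.
all(genre_count == genres$NumGenre)
#> [1] TRUE
```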

Source

https://CRAN.R-project.org/package=ggplot2movies

Details

Modified dataset from the ggplot2movies package.

The Internet Movie Database, http://imdb.com/, is a website devoted to collecting movie data supplied by studios and fans. It claims to be the biggest movie database on the web and is run by Amazon. More information about imdb.com can be found online, http://imdb.com/help/show_leaf?about, including information about the data collection process, http://imdb.com/help/show_leaf?infosource.

Movies were selected for inclusion if they had a known length and had been rated by at least one IMDB user. Small categories such as documentaries and NC-17 movies were removed.

Examples

dim(movies_wide)
#> [1] 1579 13
head(movies_wide)
#> # A tibble: 6 x 13
#>   title                       year length budget rating votes mpaa  Action
#>   <chr>                      <int>  <int>  <dbl>  <dbl> <int> <fct>  <int>
#> 1 'Til There Was You          1997    113   23      4.8   799 PG-13      0
#> 2 10 Things I Hate About You  1999     97   16      6.7 19095 PG-13      0
#> 3 100 Mile Rule               2002     98    1.1    5.6   181 R          0
#> 4 13 Going On 30              2004     98   37      6.4  7859 PG-13      0
#> 5 13th Warrior, The           1999    102   85      6.1 14344 R          1
#> 6 15 Minutes                  2001    120   42      6.1 10866 R          0
#>   Animation Comedy Drama Romance NumGenre
#>       <int>  <int> <int>   <int>    <int>
#> 1         0      1     0       1        2
#> 2         0      1     0       1        2
#> 3         0      1     0       0        1
#> 4         0      1     1       1        3
#> 5         0      0     0       0        1
#> 6         0      0     1       0        1
dplyr::glimpse(movies_wide)
#> Rows: 1,579
#> Columns: 13
#> $ title     <chr> "'Til There Was You", "10 Things I Hate About You", "100 ...
#> $ year      <int> 1997, 1999, 2002, 2004, 1999, 2001, 1972, 2003, 1999, 200...
#> $ length    <int> 113, 97, 98, 98, 102, 120, 180, 107, 101, 99, 129, 124, 9...
#> $ budget    <dbl> 23.0, 16.0, 1.1, 37.0, 85.0, 42.0, 4.0, 76.0, 6.0, 26.0, ...
#> $ rating    <dbl> 4.8, 6.7, 5.6, 6.4, 6.1, 6.1, 7.3, 5.1, 5.4, 2.5, 7.6, 8....
#> $ votes     <int> 799, 19095, 181, 7859, 14344, 10866, 1754, 9556, 4514, 20...
#> $ mpaa      <fct> PG-13, PG-13, R, PG-13, R, R, PG, PG-13, R, R, R, R, R, R...
#> $ Action    <int> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, ...
#> $ Animation <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
#> $ Comedy    <int> 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, ...
#> $ Drama     <int> 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, ...
#> $ Romance   <int> 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, ...
#> $ NumGenre  <int> 2, 2, 1, 3, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 2, 3, 2, 2, ...
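
Because the data are stored in wide format, with one indicator column per genre, a movie with several genres occupies a single row. The sketch below shows one possible way to reshape such data to one row per movie and genre. It assumes the tidyr package is installed, uses three rows copied from the head() output rather than the full dataset, and the column names genre and flag are illustrative choices, not part of the package.

```r
library(tidyr)

# First three rows of movies_wide (title and genre indicators only),
# copied from the head() output.
wide <- data.frame(
  title     = c("'Til There Was You", "10 Things I Hate About You", "100 Mile Rule"),
  Action    = c(0L, 0L, 0L),
  Animation = c(0L, 0L, 0L),
  Comedy    = c(1L, 1L, 1L),
  Drama     = c(0L, 0L, 0L),
  Romance   = c(1L, 1L, 0L)
)

# Stack the five indicator columns into (genre, flag) pairs.
# "genre" and "flag" are hypothetical names chosen for this sketch.
long <- pivot_longer(
  wide,
  cols      = c(Action, Animation, Comedy, Drama, Romance),
  names_to  = "genre",
  values_to = "flag"
)

# Keep only the genres each movie actually belongs to.
long <- long[long$flag == 1L, c("title", "genre")]
long
```

With these three rows the result has five title and genre pairs: the first two movies appear under both Comedy and Romance, the third only under Comedy.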