"Since the summer of 2013, UI Assistant Professor of business Kang Zhao and UI graduate student Michael Lash* have tried to find a reliable way to predict the profitability of film releases. Their shared interests in movies and big data inspired them to develop a unique method for determining box-office success. [...]
'This is a nice example of data-mining computer science,' [Street] said. 'People working on specific business-related problems in a way other people don’t know how to predict.' Because manually copying and pasting data for thousands of movies would be impossible, Lash had to write code that would collect the statistics automatically. 'I built what’s called a web scraper to scrape lots of data from a site called Box Office Mojo, and later augmented that to scrape additional information from IMDB,' he said. 'We now have, I think, more than 16,000 or 17,000 movies in our database.'
The researchers said they used an algorithm called 'random forest' to predict profitability."
* Michael Lash is a second-year PhD student in the Department of Computer Science at the University of Iowa [now at the University of Kansas]. He is currently working with Nick Street, the Computational Epidemiology Research Group (Alberto Maria Segre), and Kang Zhao. His research interests are machine learning, data mining, and predictive analytics.