The Petabyte Era of Gaming Data

Article Author
Dr. Ashok K. Singh and Andrew Cardno
Publish Date
August 31, 2008

The volume of data in the world is increasing at an extraordinary rate, so quickly that the total volume and its eventual end point can only be estimated. At this pace, by 2010 the world will be storing more than 10^21 data points (a 1 followed by 21 zeros), about the same as the estimated number of stars in the universe.[1] Businesses operating in this universe of information will be challenged not only to apply their own data, but also to relate it to the data produced and stored by other businesses.

Back in 1992, Wal-Mart started with a database of 1 terabyte (TB) for more than 3,600 stores in the United States.[2] This grew to more than 4 petabytes (PB) of data for 6,000 stores by 2007.[3] The casino industry has not lagged in data warehousing and data mining: casinos have used loyalty cards for a long time, and they cross-reference external databases with their own. When a customer walks into a casino and swipes a loyalty card, a network of databases begins to capture total play time, total trip win or loss, and even betting strategy for skill games. This information is used in many ways, including estimating player worth so that the player can be given suitable rewards, including comps. (The total value of comps given out by casinos is considerable. MGM Mirage, for example, gave out almost $300 million in comps in 2000.)

Casino databases are rather small relative to the Wal-Mart database, but the total data collected by various casinos adds up to a significant amount. MGM Mirage has 6TB of data on its 9 million customers. Harrah’s Total Rewards program has 29 million members, from which its database can be estimated at about 18TB. Research shows that casino loyalty programs play an important role in encouraging customer loyalty, as can be seen in Chart 1.

The amount of data collected by casinos is going to increase with the advent of downloadable games. Changes are already happening—CityCenter, the new $9.2 billion, 4,000-room hotel-casino project of MGM Mirage and Dubai World expected to open in 2009, will be wired for the slot machines of the future. According to a Las Vegas Sun news report, CityCenter will be populated with networked slot machines connected to a server in a back office. These server-based slot machines will allow the players to select a game from a large menu, order drinks, print show tickets, and participate in progressive games. All of this data (gaming, comps, tickets, etc.) will be continuously collected and stored for data mining applications in order to increase customer satisfaction and customer loyalty.

Growth in Diversity
Gaming systems are growing dramatically in number, complexity and data volumes generated.

“At Fontainebleau resorts, we believe that world-class resorts simply cannot be run, nor can revenue be optimized, without world-class information systems. Our resort plans to open with over 156 separate systems, and this sea of systems will provide us with tremendous ability to offer our guests a unique experience. It also produces a huge volume of complex and interrelated data after the systems are operational, and we see significant value coming from action that is driven from insight gained from the operational systems.”
–Tim Rod, CIO, Fontainebleau Resorts

As Rod states, he is utilizing more than 156 systems to run a large-scale resort operation. The sheer volume of data and complexity of the entertainment business now makes it a necessity to have automated systems for everything from restaurant management to valet service. The byproduct of what seems to be a massive number of systems is a massive diversity in the data collected.

“When the industry could rely mainly on gaming to produce the cash, things were easy. Now efficient management of the entire system is critical,” added a casino general manager who wished to remain anonymous.

Growth in Density
The advent of downloadable games will give huge flexibility and control to the user, but this flexibility will also produce a torrent of data. We calculate that it will produce at least one line of data per game play, or approximately 480 times more data than current slot monitoring systems, where data is typically stored hourly. With some of the more complex systems, we expect that multiple interactions for one play will be tracked separately, further increasing the volume of data.

The largest dataset for most traditional casino operators is generated by slot machine transactions, where two records are collected per machine per hour: 1 poll and approximately 1 rating. For a casino with 1,000 slot machines that stores its records for five years, this results in 2 x 24 x 365 x 1,000 x 5 = 87.6 x 10^6 records, or approximately 82GB of data. As there are approximately 720,000 slot machines in U.S. casinos, we estimate the total gaming data size for 2008 to be 58.74TB.
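The arithmetic behind this estimate can be retraced in a few lines of Python. This is only a sketch: the constants come from the paragraph above, while the linear scaling from one 1,000-machine casino to all U.S. machines and the use of decimal terabytes are assumptions made here, not stated in the article.

```python
# Constants taken from the article's estimate.
RECORDS_PER_MACHINE_HOUR = 2      # 1 poll + approximately 1 rating
HOURS_PER_YEAR = 24 * 365
MACHINES_PER_CASINO = 1_000
RETENTION_YEARS = 5
CASINO_DATA_GB = 82               # article's figure for a 1,000-machine casino
US_SLOT_MACHINES = 720_000

# Records held by one 1,000-machine casino over the retention period.
records = (RECORDS_PER_MACHINE_HOUR * HOURS_PER_YEAR
           * MACHINES_PER_CASINO * RETENTION_YEARS)

# Assumption: storage scales linearly with the number of machines.
national_gb = CASINO_DATA_GB * US_SLOT_MACHINES / MACHINES_PER_CASINO
national_tb = national_gb / 1_000  # assumption: decimal terabytes

print(f"records per 1,000-machine casino: {records:,}")
print(f"national estimate: about {national_tb:.1f} TB")
```

Under these assumptions the national figure comes out at roughly 59TB, close to the 58.74TB quoted above; the small gap presumably reflects rounding in the 82GB per-casino figure.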

We next estimate the total gaming data for 2016. With each game play stored for each slot machine, and at an estimated 8 plays per minute, there will be 8 x 60 = 480 polls per hour. Assuming that there will still be approximately 1 rating per slot per hour, this amounts to 481 records per hour for one slot machine. Further, assuming a 25 percent increase in the number of casinos in the United States, the arithmetic as shown above yields approximately 35PB of data stored by the year 2016.
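The projection can be sketched the same way. The article does not spell out which baseline the 481 records per hour are compared against, so the function below takes the baseline as a parameter; the function name and the decimal TB-to-PB conversion are illustrative assumptions.

```python
BASELINE_TB_2008 = 58.74   # article's 2008 national estimate
PLAYS_PER_MINUTE = 8
RATINGS_PER_HOUR = 1
CASINO_GROWTH = 1.25       # 25 percent more casinos by 2016

# One poll per play plus one rating per hour, per machine.
records_per_hour = PLAYS_PER_MINUTE * 60 + RATINGS_PER_HOUR


def projected_pb(baseline_records_per_hour):
    """Scale the 2008 total by the increase in records per hour and in casinos."""
    scale = records_per_hour / baseline_records_per_hour * CASINO_GROWTH
    return BASELINE_TB_2008 * scale / 1_000  # assumption: 1 PB = 1,000 TB


print(f"{records_per_hour} records per machine per hour")
print(f"vs. 1 record per hour:  {projected_pb(1):.1f} PB")
print(f"vs. 2 records per hour: {projected_pb(2):.1f} PB")
```

Scaling against a single hourly record gives a little over 35PB, which matches the figure quoted above. Scaling against the two hourly records (poll plus rating) used in the 2008 estimate would give about half that, so the result is sensitive to the choice of baseline.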

Growth in the Industry
Each U.S. casino, racino and Native American casino has started using data warehousing and data mining, and the total number of commercial casinos in the United States is steadily increasing (see Chart 2). The growth rate estimated from 2006–2008 data is about 6 new casinos per year for commercial casinos and racinos, and 10 per year for Native American casinos. As more and more casinos are built, and each starts its own customer loyalty program, the total data stored will also increase.

Centralized Data and Data Sharing
As the industry grows in its capacity to collect data, operators who share data with each other and with their suppliers will gain an information advantage. This advantage will further lock in the drive to accumulate data and the drive to apply it for value. It is likely that the casinos that are heavily involved in data warehousing and data mining will, as Google has in the marketing industry, develop new business models and become large players in this new market environment.

New Analytical and Statistical Techniques
George E. P. Box, one of the most influential statisticians of the 20th century, once said, “All models are wrong, but some are useful.” An earlier version of this idea is, “Models, of course, are never true, but fortunately it is only necessary that they be useful.” There is certainly truth in these statements. We must remember that a model is typically developed for a purpose, and things may go wrong if one tries to use it for a purpose other than what it was designed for. We must also keep in mind that with the amount of data now available in a typical data mining application, the usual statistical methods of inference are typically not needed. The problem of formulating a hypothesis and testing it based on information in a sample has been replaced by the problem of finding patterns and associations in vast amounts of data. There will still be problems for which statistical inference will be required, no matter how much data we have. Forecasting gaming revenues into the future is one example. In this case, one can use the time series method of Autoregressive Integrated Moving Average (ARIMA) modeling to project the gaming revenue series into the future, and also to calculate confidence and prediction intervals for the future series.
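To make the forecasting idea concrete, here is a minimal pure-Python sketch of the simplest member of the ARIMA family, ARIMA(1,1,0) with a constant, fitted by ordinary least squares on the differenced series. It is not the authors' procedure and not a full ARIMA implementation: the function names and the revenue figures are made up for illustration, and the 95 percent prediction intervals assume normal errors and ignore uncertainty in the estimated parameters.

```python
import math


def fit_arima_110(series):
    """Fit d_t = c + phi * d_(t-1) + e_t by least squares, where d_t = y_t - y_(t-1)."""
    d = [b - a for a, b in zip(series, series[1:])]
    x, y = d[:-1], d[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    resid = [yi - (c + phi * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in resid) / (n - 2)  # residual variance
    return c, phi, sigma2


def forecast(series, steps, z=1.96):
    """Return (point, lower, upper) for each of the next `steps` periods."""
    c, phi, sigma2 = fit_arima_110(series)
    level = series[-1]
    diff = series[-1] - series[-2]
    psi, var, out = 0.0, 0.0, []
    for h in range(steps):
        diff = c + phi * diff          # forecast of the next difference
        level += diff                  # add it back to get the level
        psi += phi ** h                # psi_h = 1 + phi + ... + phi^h
        var += sigma2 * psi ** 2       # variance of the forecast error
        half = z * math.sqrt(var)
        out.append((level, level - half, level + half))
    return out


# Hypothetical monthly gaming revenue, in millions of dollars (made-up numbers).
revenue = [100.0, 102.0, 101.5, 104.0, 106.5, 105.0,
           108.0, 110.5, 109.0, 112.0, 114.5, 113.5]

for point, lower, upper in forecast(revenue, 3):
    print(f"{point:7.2f}  [{lower:7.2f}, {upper:7.2f}]")
```

The intervals widen as the horizon grows, which is the practical point: the further ahead revenue is projected, the less certain the projection. In practice a statistical package would be used to choose the model order and to estimate the parameters.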

As gaming transforms itself in the petabyte age of gaming data, we will see traditional rules of thumb replaced by huge supercomputer correlation calculations. As with the sequencing of DNA, gaming data will in many situations be treated as a whole population and not a sample.

Data Visualization
Advanced data visualizations applied in new and exciting ways will emerge as the interface to the data. In an era where most data is never seen by a human, visual summarization will allow humans to bridge the gap and truly apply their intuition to these huge data sets.

The Players of the Future
Players have very specific needs. With massive data gathering, casinos will no longer have to cluster and sample. Casinos will now be able to “know” individual players and meet their exact needs, and will not have to categorize and build hierarchies of customer preferences. A massive casino will have the capability to essentially transform its casino floor to a 15-machine bar where a regular player walks in and is handed a cup of coffee just the way he likes it. Casino comps will be specific to the player, lifting the gaming experience to another level.

The Challenge
For an industry that was early to adopt tracking systems, the challenge now is to become an industry that integrates, visualizes and acts on the petabytes of data that we collect.

Footnotes
1 http://imagine.gsfc.nasa.gov/docs/ask_astro/answers/970115.html
2 www.teradata.com/t/page/169696/
3 http://storefrontbacktalk.com/story/080307walmart.php

Dr. Ashok K. Singh has taught statistics, mathematics and operations research courses at New Mexico Tech, Socorro, N.M., and statistics and mathematics courses at University of Nevada, Las Vegas. He has over 75 publications in theoretical and applied statistics.


Andrew Cardno has more than 16 years of experience in business analytics, ranging from modeling healthcare drive times to casino gaming floor analytics. He often presents on the future of analytics across the world and has spent the last seven years living in the United States and working with corporations around the world.
