What follows are notes and slides from my East Coast Games Conference presentation.
My name is Zack Hiwiller. I’m a game designer from Orlando, Florida. I’ve worked on a bunch of different platforms from the GBA to the iPad, at tiny independent studios and large ones like EA and Gameloft, and on traditional retail products and free-to-play apps. Today I want to talk about how players spend and how that can inform your basic design.
But first: why? Why talk about spending and design in the same breath? Aren’t monetization metrics something for the producers and suits to worry about, while design remains the land of limitless creativity? Sometimes. It depends. There are infinite ideas out there and you have limited time on this Earth to make games. If you work for someone, then you need your game to make a profit at the end of the day to keep making games. If you are making games as a hobby, you may want to be able to choose which ideas from that infinite set get people involved enough to want to give you money for them. It’s one method of validation.
Today’s talk is going to cover three areas. First, I’m going to give a quick primer on statistical distributions, just so no one is lost. Next, I’m going to go through some simulations of thousands of theoretic games to see what their inputs, sometimes using real world data, can tell us. Finally, we are going to try and sum up some design lessons that we can glean from what we found.
Caveat: I’m not a statistician. Someone once said that a little knowledge is much more dangerous than ignorance, and it is true. I could be completely off-base. Some elements of this talk come with huge caveats and I try to identify them as I go, but I may forget to mention them. One major one is that I’m using bounded distributions for what in real life is unbounded. I know this. I’m not trying to come up with “the answer” but instead to visualize things in a different way to get us closer to “the answer”. I think this is a foundation for very useful research that can be done in the future, but I don’t pretend that this has any scientific rigor beyond a “hey, look at this a bit closer”.
There’s been a lot of talk in the past decade about “fat-tailed” distributions and how they model real-life phenomena. Chris Anderson’s The Long Tail is one of these, but I credit Nassim Nicholas Taleb’s books The Black Swan and Fooled by Randomness with bringing the topic to the forefront. Mark Buchanan’s Nexus is also a great pop-sci look at the math and science of networks.
Let’s say that you had access to a service that would get you a player for three dollars. You look at your revenue, divide by the number of players, and get $3.25. This is what the industry calls “ARPU”, average revenue per user. Since you expect $3.25 per user and it costs $3.00 per user, you should do it, right? That’s $0.25 profit. Well, maybe. It depends on what the underlying distribution looks like. On average, sure, you will gain $0.25/user, but it may take a long time to get to that average. If your underlying distribution is what’s called “normal” (on the left above), most of the weight is clustered around the mean and that’s a likely scenario. If you have something like on the right, then that’s much more risky. Most people give you $0. Most $3 acquisitions will result in no revenue return. Now how certain are you to get that n+1th user?
Here’s the quick recap on distributions. What’s above is an approximation of what’s called the normal distribution. It’s sometimes called a bell curve. If you took a bunch of samples of American men and graphed their heights, it would look like this. Most would be around 5’9″, but some will be very short and some will be very tall. Lots of things are normally distributed so we use this all the time: the lifetime of a lightbulb, an SAT score, the size of snowflakes, and how much cereal is in each box of Cap’n Crunch can all be modeled by the normal distribution. The central limit theorem even makes the means of samples of independent random variables follow a normal distribution. It’s everywhere.
But it isn’t the only type of distribution. Above is an approximation of the Pareto distribution, named after 19th century economist Vilfredo Pareto. In a Pareto distribution, most of the weight is near zero, but there is a fat tail that goes out very far. Imagine if heights were distributed in this way. Most everyone would be 5’1″, every once in a while you would get someone 5’3″ or so, but don’t think you will ever get someone NBA sized. Imagine if the cereal in your Cap’n Crunch was Pareto-distributed. You would get two or three bits in most boxes, but somewhere out there would be a box with the mass of a neutron star.
There are tons of things that are Pareto-distributed. Pareto came up with this when observing that 80% of the wealth in Italy at the time was held by 20% of the population, which is true more or less for most every society throughout history. But there are other things that are Pareto-distributed, like the number of links to a particular website or the power of earthquakes. Most earthquakes are very minor, unnoticeable without sensitive equipment. But every once in a while there is a “Big One” with great power.
There are vast differences between normal (or Gaussian) distributions and power (or Paretian) distributions. Whereas Gaussian distributions have a stable mean and variance, Paretian distributions do not, because they are highly sensitive to extreme values (the “big one” earthquakes account for the vast majority of earthquake damage in the United States). Gaussian analyses, by contrast, often dismiss outliers as anomalous. Think about it this way: if I’m opening a tailor’s shop, I’m not preparing for Yao Ming to walk in. He’s an outlier and I can ignore him.
One of the main differences in these two types of distributions has to do with independence. In Gaussian distributions, we assume that events are independent. In power distributions, we assume that events are interconnected. For instance, if I am already rich I have more opportunities to invest, which means I have the ability to become even richer.
But if a power distribution has no stable mean or variance, because a single Yao Ming instantly shifts both, then what does that say about acting on statistics built from the mean or variance? It means that if user spends are Paretian, a figure like “the average spend of a user” has little meaning! For more on this, see McKelvey (2005).
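To make that concrete, here is a quick Python sketch (my own illustration, not from the slides): we take ten independent samples from a Gaussian and ten from a Pareto distribution with shape α = 1.1, and compare how much their sample means bounce around.

```python
import random

random.seed(7)

def sample_means(draw, trials=10, n=10_000):
    """Mean of n draws, repeated for several independent trials."""
    return [sum(draw() for _ in range(n)) / n for _ in range(trials)]

# Gaussian: mean 100, sd 15. Ten sample means will all hug 100.
gauss_means = sample_means(lambda: random.gauss(100, 15))

# Pareto with shape alpha = 1.1: the mean technically exists (alpha > 1),
# but one extreme draw can yank a whole sample's mean around.
pareto_means = sample_means(lambda: random.paretovariate(1.1))

print("Gaussian sample means:", [round(m, 1) for m in gauss_means])
print("Pareto sample means:  ", [round(m, 1) for m in pareto_means])
```

Run it a few times with different seeds: the Gaussian means barely move, while the Pareto means can differ by a factor of two or more between trials. That is exactly why “the average” is a shaky summary of a fat-tailed quantity.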
So do we just throw our hands up and say we cannot say anything about Paretian distributions? No. We have to find a way to work with what we have. If our models can incorporate the chance of these outliers then it can at least point us in the direction of the truth. By definition, we cannot account for Taleb’s Black Swans, but we can model the universe around them.
Back to games since that’s why we are here. Let’s do the simplest possible simulation that can be useful to us. Let’s say there are two types of customers. Morlocks are the jerks who download and play your game but never pay you. Eloi are the enlightened few who give you money for your work.
In Experiment 1, we say that we have a game where 1 in a thousand of your users are Eloi, but these Eloi spend bank: $1,000 each. This $1,000 figure is called the ARPPU, or Average Revenue per Paying User, since we are only averaging the people who actually give us money. When we run a simulation of 1,000 independent games (independent in the statistical sense, not “indie”), each with 1,000 monthly players, we would expect, on average, to get $1,000 per game. But the results show a lot more variability: 38.6% of games never capture an Eloi at all and get nothing. 1.9% capture four times the expected number of Eloi or more and make a great deal of money. Even with this simple model, we see that there is a lot of variability.
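The whole of Experiment 1 fits in a few lines of Python (a sketch of the simulation described above; the constant names are mine). Your percentages will wobble a bit from run to run, which is rather the point:

```python
import random

random.seed(42)

GAMES, USERS = 1000, 1000
P_ELOI, SPEND = 0.001, 1000  # 1-in-1000 payers, $1,000 each (the ARPPU)

revenues = []
for _ in range(GAMES):
    # Count how many of this game's users turn out to be Eloi.
    eloi = sum(1 for _ in range(USERS) if random.random() < P_ELOI)
    revenues.append(eloi * SPEND)

zero = sum(1 for r in revenues if r == 0) / GAMES
big = sum(1 for r in revenues if r >= 4 * 1000) / GAMES  # 4x expected or more
print(f"Mean revenue per game: ${sum(revenues) / GAMES:,.0f}")
print(f"Games that never captured an Eloi: {zero:.1%}")
print(f"Games with 4x+ the expected revenue: {big:.1%}")
```

With these parameters the zero-Eloi fraction hovers around (0.999)^1000 ≈ 37%, in line with the 38.6% figure from my original run.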
But we don’t have to use made-up numbers. Swrve, the mobile metrics company, puts out great reports showing what games that use their analytics software are reporting. In the January report, 1.5% of total players made a purchase and the average they spent was $15.27. If we change our Experiment 1 numbers to these and add the 30% fee that Apple, Google, et al. require, we have Experiment 2:
This shows the distribution for 1,000 games. Most games make near the expected value of $0.23 per user. But knowing your revenue is only one side of the story. What about costs? At EA, when we were doing back-of-the-napkin calculations for profit and loss statements, we used the heuristic of $10,000 per developer per month. But that’s EA. They have big offices with big-time executives who take home big-time paychecks. Let’s say that your team is super efficient and can get the same job done for less than $6,000 per person per month. Let’s say your team of four for your mobile game costs $23,000/month. We will use that odd number because it is a reasonable cost and makes the math easy when your ARPU is $0.23.
We run another version of Experiment 2 that adds costs into the equation. So using Swrve’s numbers and a break-even of $23,000/month, how often do you break even? Naturally, that will depend on how many users you have. Before, we were doing our calculations on a per-kilo-user basis. We can’t do that here so we need to come up with a number. I’m using 100,000 users/month. That sounds like a lot and it is. I worked on an independent game called Fire and Dice which was critically successful and was featured on Kotaku during prime game-buying hours. We had around 24,000 downloads total give or take. To give you a sense of scale though, Distimo says you need to get 23,000 users per day to be seen on the Top 50 chart. So 100,000 per month is somewhere in-between wild success and middling success.
What we see in this experiment is that not a single run of a thousand exceeded break-even. One problem is the 30% portal tax. When we remove it, we get much better odds: a 43% chance of breaking even. The real problem though is cost. If you were to decrease the cost to $14,500/month, then you could be nearly certain of breaking even.
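The expected values behind that result are easy to check with back-of-the-envelope arithmetic (a sketch using the Swrve numbers and costs above):

```python
# Back-of-the-envelope check on the Experiment 2 break-even.
USERS = 100_000    # monthly users
P_PAYER = 0.015    # 1.5% of players make a purchase (Swrve, January report)
ARPPU = 15.27      # average spend per paying user (same report)
PORTAL_CUT = 0.30  # Apple/Google fee
COST = 23_000      # monthly cost for the four-person team

gross = USERS * P_PAYER * ARPPU   # revenue before the portal tax
net = gross * (1 - PORTAL_CUT)    # revenue after the portal tax

print(f"Expected gross revenue: ${gross:,.0f}")
print(f"Expected net revenue:   ${net:,.0f}")
print(f"Break-even cost:        ${COST:,.0f}")
```

The expected net revenue (about $16,000) sits far below the $23,000 break-even, which is why no run clears it with the fee in place. The expected gross (about $22,900) sits just barely below break-even, which is why removing the fee turns success into roughly a coin flip.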
This is a very simplified model of the world. If we could increase the fidelity of the models, we could get results that better approximate the real world. One assumption that was overly simple is that players are neatly classified as either Eloi or Morlock. But that’s not even close to reality.
There are as many types of players as there are players. If we could model the Eloi with a bit more accuracy, maybe we would come closer to what is the case in reality. In Swrve’s data, they break down the spending habits of players who do actually spend money into ten deciles. Each decile has a listed frequency of purchases and an average price per purchase. This allows us to make ten “castes” of Eloi and break up our model.
Using Swrve’s 11 castes (10 Eloi + Morlock), we can get a little closer to something modeling reality. We run 1,000 simulated games at 1,000 simulated users/month each, using the above-mentioned spending frequencies. If we count the 30% portal tax, you need to make $328.57 per 1,000 users to actually net $0.23/user. If that is your break-even, then how often do you succeed? This is Experiment 3.
In only 45.5% of runs does the game make $230 per 1,000 users. This is the amount you need to make to get $0.23/user without the portal tax. With the portal tax, the share of successes drops to less than 15%. Interestingly, some of the runs collected less than $25 per 1,000 users while others collected over $600 per 1,000 users. If you had two games at your studio that performed like that, it would be tempting to say that one had an inspired design and successful monetization strategy while the other was crud. Don’t be fooled.
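The caste version of the simulation looks like this in Python. Note the big caveat: the caste table below is a made-up placeholder with roughly the right fat-tailed shape, not Swrve’s actual decile data, so treat the printed percentage as illustrative only.

```python
import random

random.seed(1)

# HYPOTHETICAL caste table. Swrve publishes real decile figures; these
# placeholder numbers only illustrate the shape of the simulation and are
# tuned so the expected revenue works out to $0.23 per user.
# Each tuple: (probability a user falls in this caste, monthly spend).
CASTES = [
    (0.985, 0.00),    # Morlocks: never pay
    (0.006, 1.50),    # low-spend Eloi...
    (0.004, 5.00),
    (0.003, 12.00),
    (0.0015, 40.00),
    (0.0005, 210.00), # ...up to the fat tail
]

def simulate_game(users=1000):
    """Gross revenue for one game: assign each user a caste, sum spends."""
    revenue = 0.0
    for _ in range(users):
        r, cum = random.random(), 0.0
        for p, spend in CASTES:
            cum += p
            if r < cum:
                revenue += spend
                break
    return revenue

runs = [simulate_game() for _ in range(1000)]
target = 230  # $0.23/user per 1,000 users, ignoring the portal tax
frac = sum(r >= target for r in runs) / len(runs)
print(f"Runs at or above ${target}/1,000 users: {frac:.1%}")
```

Even though the placeholder table is tuned so the mean is exactly $230 per 1,000 users, a large share of runs land below that target: the fat tail drags the mean above the median, so “average” games underperform the average.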
So running numbers is fun and it looks like information, but what can it really tell us? I have six lessons that I think are pertinent:
Lesson 1: The more fat-tailed the distribution, the more you live and die by the Eloi (or, ugh, “whales”)
Experiment 1 had a vastly wide tail. A tiny percent of the users paid a huge percentage (all, actually) of the money. And those runs that didn’t capture one of those users were hung out to dry. If your choice is between having a lot of players pay a little bit of money versus a few super-users paying the lion’s share of the money, go for the former. It’s easier to acquire the non-super users.
One of the best designs that I think captures this principle is Jetpack Joyride. The game offers the normal “buy coins” monetization where the coins allow for in-game items. But one of the purchasable items is different. Called the “Counterfeit Machine,” it doubles any coins you collect in the game. This means it has greater lifetime value when you buy it earlier. Priced at the store minimum of $0.99, it’s easy to reason that many players look at this as the cheapest way to get a lot of coins. Thus, it is one of their most popular in-app purchases.
Casinos may cater to high-rollers, but they don’t do it at the expense of the penny slots.
We used to have a great way for all of our users to pay us. It was called retail. It’s not dead. You don’t have to be free-to-play no matter what someone shilling analytics software or pushing their own wildly successful F2P game tells you. They might have just lucked into the right side of the distribution.
Lesson 2: The more users you have, the narrower your standard deviation and the less likely you are to win or miss big.
Lesson 2 is similar to Lesson 1. Having more users reduces the variability and shortens the tails. This is an obvious lesson. Of course you want more users. You don’t get paid by what your average user pays, you get paid by what the sum of all users pays. But don’t let your statistics fool you. Your ARPU could easily go down while your revenue is going up, yet decision-makers remain obsessed with ARPU as a magical per-user value, which it cannot be on a person-by-person basis.
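Lesson 2 can be sketched directly (my own illustration, using single-caste Swrve-style numbers): simulate the same game at different audience sizes and watch the standard deviation of ARPU shrink.

```python
import random

random.seed(3)

P_PAYER, ARPPU = 0.015, 15.27  # Swrve-style single-caste numbers

def arpu_of_one_game(users):
    """ARPU for one simulated game of the given size."""
    payers = sum(1 for _ in range(users) if random.random() < P_PAYER)
    return payers * ARPPU / users

sds = {}
for users in (1_000, 10_000, 100_000):
    arpus = [arpu_of_one_game(users) for _ in range(60)]
    mean = sum(arpus) / len(arpus)
    sd = (sum((a - mean) ** 2 for a in arpus) / len(arpus)) ** 0.5
    sds[users] = sd
    print(f"{users:>7} users: ARPU mean ${mean:.3f}, sd ${sd:.4f}")
```

The mean ARPU stays near $0.23 at every size, but the spread falls roughly as one over the square root of the user count: ten times the users, about a third the standard deviation.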
Lesson 3: Reduce costs.
Back in Experiment 2B, we saw how sensitive the break-even was to costs. Above is a chart that shows the likelihood of success based on different break-even values. If you design for the smallest valuable feature set, you can attempt to control one parameter that determines your costs. I think this is why commercial games that are developed from game jam ideas turn out so well. See: 868-HACK, Don’t Starve, Surgeon Simulator, Super Time Force and numerous others. They did the hard part for free. It’s better to release something quickly and see how it does. Your game may be the best thing since sliced bread while the market just isn’t there for it. It’s better to lose a little than go all-in and lose a lot.
While creating this talk, I got curious about my own usage patterns in F2P games. I looked through my emails to get an idea of how long it took me to spend money in Hearthstone and League of Legends. In both, it took three to four weeks between first log-on and first spend. I understand that the Swrve data measures mobile F2P spends and I am choosing PC F2P games, so there’s a bit of apples and oranges here. But it follows a reasonable theoretic model: the player explores the free portion of the game, exhausts the content, and then pays to get more of it. That makes sense.
But that actually isn’t what happens. According to Swrve, the majority of players pay in their first week. Not only do they pay in their first week, they largely give up in their first week. Only a third of players stay after the first day. Only a sixth stay after the first week.
A quarter of users pay on the first day. A third of users leave on the first day. What does that tell you about designing your on-boarding and your monetization items? This leads to Lesson 4:
Lesson 4: Your Users Have Options
Here is the embodiment of the mobile games market:
Between 1985 and 1994, there were 822 games released for the Nintendo Entertainment System. This is considered by many in my generation a golden age of gaming. There were 822 games released playable on mobile phones last week. There are 400,000 apps on the Google Play store. Players have these, Netflix, Facebook, Amazon Prime, Skype, Twitter, and this thing I’ve heard good things about called “outside”. What do you provide that these do not? Providing a functional, okay experience isn’t enough.
Below is a list of results from the 2013 NFL season (give or take 1 game):
NFL teams spend millions and millions of dollars to get additional wins. And we create great narratives based on these figures. For instance, the Kansas City Chiefs started off an amazing 9-0, but then struggled after some injuries to key pass rushers and finished out the season 2-5. In the same list you have the 0.750 New England Patriots with their future hall-of-fame coach Bill Belichick and their future hall-of-fame quarterback Tom Brady and you also have the hapless 0.250 Cleveland Browns who rotate staff so fast that they have a temp agency on speed dial.
There’s only one problem with this list. It isn’t real:
The list is actually the output of an Excel simulation I ran, in which seventeen teams each played every other team once in a coin-flipping contest, each flip a 50/50 chance. Team A (the Patriots) didn’t have a base rate any higher than Team E (the Browns). We would feel silly applying those same narratives from above to my coin-flipping simulation.
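The simulation is trivial to reproduce. Here is a Python version of the same coin-flipping league (the talk used Excel, but it is identical in spirit):

```python
import random

random.seed(2013)

TEAMS = 17  # each team plays every other team once: 16 games apiece

def season():
    """Play one full round-robin season of coin flips; return win totals."""
    wins = [0] * TEAMS
    for a in range(TEAMS):
        for b in range(a + 1, TEAMS):
            winner = a if random.random() < 0.5 else b
            wins[winner] += 1
    return sorted(wins, reverse=True)

records = season()
print("Win totals, best to worst:", records)
print(f"Best 'team' went {records[0]}-{16 - records[0]} on pure coin flips")
```

Every number in the standings is pure luck, yet the sorted win totals practically beg for a narrative about the “team” at the top and the one at the bottom.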
Lesson 5: Be Lucky.
There is variance in the world. And sometimes it makes things that are random look like they have order. But that order is decided post-facto. This makes it dangerous to try to copy the successes of others and expect that it will lead to your own success. Sometimes people are just really good at flipping heads. All you can do is keep playing and hope that it is you next time.
I didn’t want to end on something so fatalistic, so I have one more:
Lesson 6: Sometimes we live in Pareto’s world, not Gauss’.
It’s tempting to use Gaussian analysis on everything because it applies quite often, it’s what we’ve learned in school, and the math is so easy. But it is vastly inappropriate for many things, especially things that don’t exhibit independence. By understanding how this works, we can craft our designs in a direction that gives us a better chance of being sustainable and successful.