Too Short vs. Too Long

Clive Thompson has a great article making the age-old argument of gamers who grow up and get jobs: games don’t need to be forty hours long to be fun.

I read a couple of dozen write-ups of [The Maw], all of which were highly positive — but which complained that [it] was “too short.” … The Maw felt like the perfect length — because the game ends precisely at the moment that your learning curve flattens out. After three hours, I felt like I’d figured out every permutation of weird trick I could pull with my ever-expanding Maw — so when the ending arrived, my brain felt perfectly exercised.

I’m on Thompson’s bandwagon and he elucidates the reasoning well, so I am disinclined to make the same argument. But I do want to raise a question: which is better for a game to be – too short or too long?

A game that is too long is defended by its proponents for being full of value (see the Disgaea series’ post-game for an extreme example, though most RPGs would serve). With a long game, the subset of gamers who stick with it are rewarded the most, while those who quit early take all they can handle. With a short game, every gamer gets the maximum satisfaction the game offers, yet many still want more.

Hey! This sounds like a supply and demand problem! Long games provide a utility surplus. Some folks will be at the far end of the curve where utility supplied = utility demanded, some will be where utility supplied > utility demanded, but next to no one will be in the situation where utility supplied < utility demanded. The short game provides the opposite: a utility shortage, where most players end up with utility supplied < utility demanded.

So it seems obvious: long games are better because everyone can take their fill – it’s like a buffet.

But that is only correct if your objective is to maximize total utility given. Perhaps the real objective should be to maximize the number of people who receive everything the game has to offer, so that everyone who gets your game enjoys it as much as possible. This is the case for the short game.

Obviously the truly economic issues matter a lot more than the design issues examined as if they were economic issues. But looking at the problem as if it were a problem of economics allows us to see two things. The “buffet” style long game is great for those who gorge themselves, as it maximizes utility over all players. The “gourmet” style short game is great for those who want the complete package and no more, as it maximizes the number of players who enjoy all there is to see.
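To make the buffet/gourmet distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the ten player appetites are invented, the 80-hour long game is arbitrary, and "utility" is crudely treated as hours enjoyed. The only number taken from above is the three hours Thompson reports for The Maw.

```python
# Hypothetical model: each player "demands" some number of hours of fun,
# and a game "supplies" a fixed number of hours of content.

def total_utility(demands, supplied_hours):
    """Total hours of enjoyment delivered across all players.

    A player stops when either their appetite or the content runs out.
    """
    return sum(min(d, supplied_hours) for d in demands)


def players_who_see_everything(demands, supplied_hours):
    """Number of players whose appetite covers the entire game."""
    return sum(1 for d in demands if d >= supplied_hours)


# Invented appetites, in hours, for ten players.
demands = [3, 4, 5, 6, 8, 10, 15, 20, 40, 80]

short_game = 3   # "gourmet"
long_game = 80   # "buffet"

for name, hours in (("short", short_game), ("long", long_game)):
    print(name,
          total_utility(demands, hours),
          players_who_see_everything(demands, hours))
# prints:
# short 30 10
# long 191 1
```

With these made-up numbers the long game delivers far more total utility, but only one player in ten sees all of it, while every player finishes the short game.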

So asking whether games are too short or too long is suggesting that both Golden Corral and gourmet restaurants can’t exist in the same economy. If you have a short game, market it like a gourmet dinner. If you have a long game, market it like Golden Corral.

I’m hungry.

Design By Numbers

Via, I found this blog post from a departing Google employee with some strikingly resonant themes:

Without a person at (or near) the helm who thoroughly understands the principles and elements of Design, a company eventually runs out of reasons for design decisions. With every new design decision, critics cry foul. Without conviction, doubt creeps in. Instincts fail. “Is this the right move?” When a company is filled with engineers, it turns to engineering to solve problems. Reduce each decision to a simple logic problem. Remove all subjectivity and just look at the data. Data in your favor? Ok, launch it. Data shows negative effects? Back to the drawing board. And that data eventually becomes a crutch for every decision, paralyzing the company and preventing it from making any daring design decisions.

Analyzing data has a cost, even if it is only a day of time and not focus testing forty-one shades of blue. You can cut that cost significantly by hiring skilled designers who aren’t blinded by the art and engineering problems faced day to day and who only focus on the design problems. But the difference between capital D design above and game design is that everyone thinks they know game design. Everyone’s opinion is just as valuable as the next because everyone plays games, right?

I’ve worked with a lot of engineers in decision-making roles and they all, without exception, follow the “data as arbiter” approach to which Google is apparently addicted. It certainly has its place, as designers aren’t Oracles. But designers aren’t hired just to make decisions the crowd and the powers-that-be agree with; they are hired to make decisions based on an underlying design consistency that may initially be a hard sell. So why do you pay your designer again?

Man-Months to Quality

One assumption that designers and developers naively make is that the highest quality game will lead to the highest profit for the company. Unfortunately, that may not be true and I will show you an example of this. Keep in mind that this is a clearly hypothetical example, but I will be disclosing the assumptions made along the way so that you can see my steps. There are some wildly controversial assumptions here and I want anyone reading this to understand my analysis.

Let’s take two individuals: a lead designer and an executive producer. The lead designer wants to create the highest quality game possible and the executive producer wants to create the most profitable game possible. They are sitting down attempting to discuss the scope of the project, which will be boiled down to a spend of man-months. The game being discussed is a mid-size project in a genre that has had a wide array of quality from critical panning to critical acclaim. Assume there is a base level of functionality that must be delivered to ship a game in this genre. This means that there are only so many features that they can cut and still fulfill the requirements of the genre.

Metacritic is a widely used but also wildly disputed measure of quality. The site gives what is called a “Metacritic Score” by aggregating the review scores of as many legitimate review sites and magazines as possible. Below we see a possible relationship of Man-Month spend to resulting quality. Extremely low spends result in broken and terribly rated games. But each marginal man-month spent increases quality by smaller and smaller amounts (decreasing marginal utility) until the project reaches a point of maximum quality. At this point, the team is adding some of their less exciting and impactful features and the glut detracts from the impressiveness, usability and/or functionality of the other, better features and quality drops.

Fig 1 – Metacritic vs. Man-Months

This graph will vary wildly depending on the market expectations and the subject matter. For instance, World of Warcraft can continue adding quality features without diluting the product for a very long time. A casual market Sudoku title can only do so much before additions become trivial or distracting.

Also, this relationship only holds looking forward from the beginning. The team cannot base their design, code and assets on a particular plan of N man-months and then have an additional X man-months added to the plan in hopes that they will get the same bang-for-the-buck in terms of quality as if they had based their original plan on having N+X man-months. While the team that suddenly has X more man-months will be able to move to the right on the curve, they will not be able to move the full X over. There is always some issue of refactoring code, data or design that makes this “bonus” time inefficient. In fact, sometimes it is so inefficient that it yields no quality improvements at all. This depends wholly on the team and the circumstances of the project. This is related to the widely quoted (by engineers) assertion that changes made during design are orders of magnitude cheaper than changes made during testing.

Now, further assume as in Figure Two that revenue rises with the Metacritic measure of quality. There are numerous counterexamples to this statement: both critical darlings that couldn’t turn a profit and dreck that raked in huge revenues. But by and large, the correlation of Metacritic Score and Sales has been reasonably tight (especially for non-licensed titles), so we will use that to aid in our analysis.

Fig. 2 – Revenue vs. Metacritic

There is a certain score in each genre on each platform where any score below is considered a statement of terrible quality. For instance, the difference between a 30-rated game and a 40-rated game is unnoticeable for most examples, while the difference between an 80-rated game and a 90-rated game is significant. You see this in the example in Figure Two where up until a score of 60, revenues are mostly flat. Then the 60-75 range (highlighted by the bracket) is the “bang for the buck” area where each additional point of “quality” convinces a larger and larger audience that the game is worth buying. Then, eventually, a title reaches a level of quality where additional points do not matter as much and the “bang for the buck” dies off.

Cost is a fixed amount plus a variable component that grows linearly with man-months.

Now that we have revenues and costs, we can make a graph of profits (Figure Three). As you can see, there is a point of maximum profit, beyond which each additional man-month costs more than the revenue it will bring in.

Fig. 3 – Profit vs. Man-Months

But if we look at both the points of max quality and max profit on the same graph, we see that they are not the same! The point of max profit occurs before the point of max quality.
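For anyone who wants the reasoning behind that gap, here is a short derivation. The symbols are mine and not part of the figures: m is man-months, Q(m) is the Metacritic curve of Figure One, R(Q) is the revenue curve of Figure Two, F is the fixed cost and c is the cost per man-month. It assumes smooth curves and a maximum somewhere in the interior, so treat it as a sketch rather than a proof.

```latex
\begin{align*}
  P(m) &= R\bigl(Q(m)\bigr) - (F + c\,m)
    && \text{profit = revenue minus cost} \\
  P'(m^{*}) &= R'\bigl(Q(m^{*})\bigr)\,Q'(m^{*}) - c = 0
    && \text{first-order condition at the max-profit point } m^{*} \\
  Q'(m^{*}) &= \frac{c}{R'\bigl(Q(m^{*})\bigr)} > 0
    && \text{since } c > 0 \text{ and revenue rises with score}
\end{align*}
```

Quality is still climbing at the max-profit point, and the Figure One curve only climbs to the left of its peak, so the max-profit point has to sit before the max-quality point. Only when c is zero does the condition become Q'(m*) = 0, which is the quality peak itself.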

Fig. 4 – Comparing Profit and Metacritic on the same graph
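The comparison can also be reproduced numerically. The sketch below is only an illustration: every curve shape and every number in it (a quality peak of 90 at 60 man-months, a revenue ceiling of 10,000,000, a fixed cost of 500,000 and 10,000 per man-month) is invented to roughly match the descriptions of Figures One through Three, not taken from real data.

```python
import math


def metacritic(months):
    """Hypothetical Figure One: gains shrink each month, peak at 60 man-months."""
    peak_months, peak_score = 60.0, 90.0
    return max(0.0, peak_score - 0.02 * (months - peak_months) ** 2)


def revenue(score):
    """Hypothetical Figure Two: flat below ~60, steep through 60-75, flat above."""
    max_revenue = 10_000_000.0
    return max_revenue / (1.0 + math.exp(-(score - 67.5) / 3.0))


def cost(months, fixed=500_000.0, per_month=10_000.0):
    """Fixed cost plus a component that grows linearly with man-months."""
    return fixed + per_month * months


def profit(months):
    """Hypothetical Figure Three: revenue minus cost."""
    return revenue(metacritic(months)) - cost(months)


# Search whole man-month plans from 0 to 120.
plans = range(0, 121)
best_quality = max(plans, key=metacritic)
best_profit = max(plans, key=profit)

print("max quality at", best_quality, "man-months")
print("max profit at", best_profit, "man-months")
```

With these assumed curves the max-profit plan comes in well short of the 60 man-months needed for max quality. Different made-up numbers will move the gap around, but it only closes when labor is free.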

What happens when we do a sensitivity analysis varying the cost of labor? We find that as the cost of labor gets closer to zero, the point of max profit approaches the point of max quality. But as we increase our variable costs, the point of max profit moves away from the point of max quality! Eventually, costs reach a level where no amount of man-month effort is profitable, and then the max-profit choice is to not do the project at all.
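Here is a minimal sketch of that sensitivity analysis. As before, the curves and all of the numbers are assumptions chosen to resemble the figures (quality peaking at 60 man-months, an S-shaped revenue curve, a fixed cost of 500,000), and best_plan is a hypothetical helper name.

```python
import math


def metacritic(months):
    """Hypothetical quality curve peaking at a score of 90 after 60 man-months."""
    return max(0.0, 90.0 - 0.02 * (months - 60.0) ** 2)


def revenue(score):
    """Hypothetical S-shaped revenue curve with its steep stretch around 60-75."""
    return 10_000_000.0 / (1.0 + math.exp(-(score - 67.5) / 3.0))


def best_plan(per_month, fixed=500_000.0):
    """Return (months, profit) for the most profitable plan.

    months is None when no plan makes money, meaning the
    profit-maximizing choice is to skip the project.
    """
    def profit(months):
        return revenue(metacritic(months)) - fixed - per_month * months

    months = max(range(1, 121), key=profit)
    if profit(months) <= 0:
        return None, 0.0
    return months, profit(months)


# Sweep the assumed cost of one man-month from free to very expensive.
for per_month in (0, 1_000, 10_000, 50_000, 150_000, 400_000):
    months, result = best_plan(per_month)
    print(per_month, months, round(result))
```

Under these assumptions, free labor puts the max-profit plan right at the 60 man-month quality peak, each cost increase pulls it further to the left, and at the highest cost no plan is profitable at all.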

So this has huge implications if the assumptions are valid. Companies where it is very expensive to make games (assuming they are profit-maximizing enterprises as well) will choose an amount of man-months that results in games farther from their point of max quality than will a similar but cheaper outfit under the same conditions.

But also consider our two developers. Even if the designer has the perfect plan for how to make the highest quality game, the producer will almost always be forced to cut him short. This is sort of comforting for us who have this happen on every project.

So, as designers, how do we act on this information?

Conclusion One: Find Your Happy Place
There is some level of satisfaction where you can feel good about releasing the project into the wild. A common trait among designers is the desire to be a perfectionist. Shake that desire. You likely won’t be able to hit your game’s full potential in terms of quality, so figure out the minimum level of quality you would be satisfied with and do everything in your power to at least hit that level. Everything else is gravy. Obsessing about making the complete, perfect experience will only cause pain and strife when your beloved features are cut to lower costs.

Conclusion Two: Figure out the way to make every man-month count.
Many do not realize how truly expensive overhead can be. If you believe that the higher the monthly costs of operating, the farther away the point of maximum profit is from the point of maximum quality, then you will want to do everything in your power to make those precious man-months count. This is essentially changing Figure One into a steeper curve, where top quality happens with less effort. (Note that I didn’t say “where top quality happens sooner”. Many managers in this industry seem to believe that working their charges to complete features by such-and-such date no matter what is productive. What they don’t realize is how little productivity death marches yield.) Productivity is a cause everyone can rally behind.

I realize that this analysis boils a very complex market down to a few variables, but the purpose of this was not to make a proof or a law, but to get you thinking about the differences between maximum quality and maximum profit, which many assume must necessarily be one and the same.