$Unique_ID{BAS01277} $Pretitle{} $Title{Appendix: Baseball, Computers, and New Statistics} $Subtitle{} $Author{ Gillette, Gary} $Subject{Computers Statistics computer statistic statisticians number numbers sabermetrics sabermetricians statistical computerized Epstein James Palmer Elias Stats stat OPS gambling CD-ROM DataDiscman reference CMC Sony Franklin Electronics statistician} $Log{}

Total Baseball: Appendixes
Baseball, Computers, and New Statistics
Gary Gillette

With the coming of age of the personal computer, the sea of baseball statistics in recent years has become a veritable flood. With the baseball world seemingly inundated with numbers, there has been a backlash against statistics and the statisticians. Many fans--especially those of the "old school"--believe that baseball already had enough stats, and that the newer numbers served mostly to obscure the game they love, rather than to illuminate it. Which is true?

Certainly, baseball has been blessed over the years with a wealth of numbers. Far more than any other major sport, baseball has been described and preserved in the script of Arabic numerals--it is the primary way in which the stars of yesterday are remembered. All sports have their legends, with their apocryphal stories about their superstars. Where baseball transcends other sports, however, is in the numbers. Baseball statistics give clarity and perspective to the hallowed yet hazy pictures of the past; numbers make almost tangible the exploits of players long gone. It is, in fact, the accuracy and scope of these numbers which make meaningful comparisons of players across eras at all possible.

Who is responsible for this flood of numbers? First, the fans, who have always looked for evidence to buttress their opinions. Second, the media, who have latched onto this trend and exploited it commercially. Third, Major League Baseball, which has aided and abetted the trend for publicity.
Fourth, fantasy baseball players, many of whom were only casual fans before the advent of Rotisserie leagues. Fifth, authors and analysts, who use these new numbers to examine and explain the seemingly simple game of baseball.

Of all the new statistics, the type which has become most popular is the situational. In contrast to traditional baseball totals or averages like home runs and batting average, situational stats are an attempt to break baseball statistics into how and when they happened. The most important situational breakdowns are how batters perform versus lefthanded and righthanded pitchers (and pitchers vs. same and opposite handed batters) and how players perform in their home park as opposed to when they are on the road. In reality, these situational stats have been part of the game for decades, as generations of fans have argued about the effect of Yankee Stadium on sluggers like Ruth and Maris, while generations of managers have platooned their hitters. Aside from these two key categories, several other types of breakdowns have become prominent: "clutch" hitting measures, grass/turf splits, and individual pitcher-batter matchups.

The biggest problem with this flood of data is that very little analysis has been done. For example, when a manager chooses to start a right-handed batter (who would normally be platooned) against a right-handed pitcher, he can justify it statistically in any number of ways. He could base his decision on that hitter's past performance against the opposing team, or on his performance against the opposing pitcher, or on his performance in day games, or on his performance against ground-ball pitchers, or even on his performance in the past week if he's been on a hot streak. All of these statistics (and many more) are available to teams on a daily basis. Of course, many of these numbers will be contradictory: a hitter might have done poorly against right-handed pitchers, but have done well against that team.
He could also carry a pitiful average against that day's starting pitcher, but have done excellently in day games. The real problem, then, is not a lack of available data--the problem is too little information.

Literally everyone these days comes armed with reams of statistics--managers, teams, players and their agents, the media, and the fans. Whether you are managing the 1993 World Champion Toronto Blue Jays or playing armchair manager while watching the home team on cable, which statistics do you rely upon? Unfortunately, the answer is that no one really knows. Precious little work has been done in understanding what these numbers mean; most of the work has been done in generating still more numbers. No one really knows if individual pitcher-batter matchups are more reliable predictors of performance than left/right breakdowns, and while everyone talks about "clutch" hitters, there are a dozen ways of measuring what might be called "clutch" hitting. Indeed, many studies have been done which fail to show any consistent, measurable effect which could be labeled as "clutch" hitting.

Just what is true and what isn't? The answer is both true and false. Most of the statistics quoted and published are literally true. That is, they are accounts of events which actually happened on the field, and therefore must be true. Stating that a player batted .390 with runners in scoring position doesn't leave anything to argue over. Most of the claims made about these statistics, however, are manifestly false. If one argues that batting .390 with runners in scoring position means that the player is a good "clutch" hitter, the validity of that statement depends not on the actual numbers, but on the interpretation of those numbers. The interpretation and misinterpretation of the numbers and the predictions that flow from them are what cause the arguments.
At their best, these new statistics illuminate the various aspects of the game, making it easier to understand how and why players and teams win and lose. The fact that they describe performances in specific situations (hence their name) is both their strength and their weakness. At their worst, situational stats divide up the game into irrelevant categories which hinder understanding. The common parody of situational statistics--how a player hits "in Tuesday night games at home when facing a southpaw in July with the bases loaded"--is sometimes all too close to reality. In Game Seven of the 1992 World Series, for example, more than five hundred stats were bandied about by the network broadcasters or displayed on the screen during the game. With so many numbers coming fast and furious, the significant ones get lost in the trivial, and the currency of all analytical statistics is debased.

The schizoid attitudes in baseball toward new statistics and analysis are shown by the fact that there isn't even general agreement on the use of the word sabermetrics: many fans have never heard of the term, and while some professionals in the field call themselves sabermetricians, others eschew the word and call themselves statistical analysts. The word sabermetrics was coined by best-selling author Bill James, who modified the acronym for the Society for American Baseball Research (SABR) for the root of the word and added the suffix -metrics to denote measurement. While SABR may have lent these new statistics its name, the Society for American Baseball Research is not the primary purveyor of these new statistics.

There are many outlets for statistics today, but most of these come from only a few sources. The Major League Baseball-IBM Baseball Information System, which operates out of the commissioner's office in New York City, now compiles baseball's official statistics. (Computer giant IBM is an official sponsor of major-league baseball and provides the computer hardware.)
This system was set up in the late 1980s to provide statistics to major league teams, and it now also provides much of the material used by the media. The Elias Sports Bureau, also headquartered in New York, remains the official statistician for both major leagues and is another major source of statistics. Howe News Bureau, which had previously served for many years as the official statisticians of the American League, became the official statisticians in the late 1980s for all minor leagues and most of the Latin and Caribbean leagues. Signifying its new business orientation was a change in name to Howe Sportsdata International, which now provides minor-league statistics to numerous publications, making the records of tomorrow's stars available to fans today.

In the front offices of baseball teams, there have been two primary uses of these new statistics. Team Public Relations departments, aided by the computerized MLB-IBM Baseball Information System, have become increasingly proficient and prolific at churning out special stats about their players. Most of these find their way to the fans via the media, who publicize these stats in print and on the air.

The other way in which the new statistics have penetrated the business end of baseball is through salary arbitration, a quasilegal proceeding which directly sets salaries for a few dozen players each year. Indirectly, however, salary arbitration has a much broader impact on the game's salary structure. Newer statistical measures have been used by both management and labor in arbitration in recent years, although many analytical stats are still not admissible in arbitration proceedings by the mutual agreement of the disputing parties.

Outside the hearing room, new analytical measures have had a greater impact. Several major league teams have employed full-time professional statistical analysts in the past decade, and other teams have employed statistical analysts as consultants.
Of these analysts, Eddie Epstein, now Director of Research and Statistics for the Baltimore Orioles, has risen the highest and had the most influence. Some other teams have employed computer systems to compile and analyze performance data, with the Oakland Athletics and manager Tony LaRussa getting the most credit for successfully using these tools. It is clear that the impact of computers (and the analytical statistics they make possible) will continue to grow in baseball's executive suites in the future.

In the publishing world, the main effect of the new statistics has been to create a new subgenre of baseball books. Celebrity biographies still sell the most sports books, but the number of statistically oriented titles released in recent years is astounding. Bill James, a Kansan who was not a sportswriter, made the best-seller lists year after year in the 1980s on the strength of his detailed statistical analysis as well as his witty and satirical prose. Pete Palmer, a computer programmer by trade, blazed the way for accurate historical comparisons of players and teams by combining his tireless research and top-notch computer skills to produce a comprehensive historical data base. With that data base, published for the first time in Total Baseball, a comprehensive reference work, Palmer brought sabermetrics and serious analytical measures to the general baseball public. Many other authors in the last decade have published baseball books which relied on the numbers. By 1990, well over a hundred baseball books were being released by major publishers each spring; several dozen of these were devoted to statistical analysis or intended as reference works.

The Elias Sports Bureau, longtime official statisticians of the National League, made their mark by publishing their annual eponymous book of situational stats starting in 1985. These stats, previously available only to major-league teams, instantly became part of the baseball public's consciousness.
The Elias Baseball Analyst became the best-seller of the annual statistical tomes and is very widely quoted by the media.

Just as James, Palmer, and Elias were preceded by the members of the Society for American Baseball Research, they were also followed by many others. Project Scoresheet, a nonprofit organization founded by James in 1984, coordinated the efforts of hundreds of volunteers and produced the first and only publicly available data base of contemporary baseball. Retrosheet, another volunteer group founded by David Smith (a longtime SABR member and professor at the University of Delaware), is now collecting scoresheets from pre-1984 games. Armed with copies of more than 75,000 scoresheets donated by teams, sportswriters and fans, Retrosheet will soon make public its first computerized data (for the 1967 season).

As the official statisticians, Major League Baseball and the Elias Sports Bureau are the primary sources from which both the electronic and the print media get their baseball statistics. Two independent organizations also maintain baseball data bases, providing both the media and the fans with ready access to almost any conceivable baseball statistic: Stats, Inc., of Chicago (run by John Dewan and Dick Cramer), and The Baseball Workshop of Philadelphia (run by the author). Stats, Inc., has provided greatly improved and expanded box scores to USA Today, Baseball Weekly, and other newspapers since 1990. Stats also provides statistics to ESPN and other media outlets, produces the annual Scouting Report, and self-publishes other baseball reference books. The Baseball Workshop maintains and updates the former Project Scoresheet data base while providing statistics and analysis to publishers and to media clients. The Workshop's annual Great American Baseball Stat Book features comprehensive situational statistics for all active major-league players.
On the periodical side, media conglomerate Gannett founded a weekly newspaper in 1990 devoted solely to baseball--more specifically to baseball statistics, which take up a large portion of the paper. While the hundred-year-old Sporting News has deemphasized baseball, USA Today Baseball Weekly has found a market hundreds of thousands strong by focusing exclusively on baseball. It is true, though, that a very large share of the Baseball Weekly audience is composed of fantasy baseball players, and the format of the statistics published in BBW is designed for their convenience. Baseball America, a biweekly newspaper which focuses on minor-league and college baseball, has also gained a large following among dedicated fans and fantasy baseball players. The ready availability of information about thousands of minor-leaguers in Baseball America, and the regular publication by BBW in 1992 of OPS stats (On-Base plus Slugging, an analytical measure developed by Pete Palmer) show just how far the new statistics have come.

Probably the most high-profile and controversial element of the new statistics trend has been the explosion in popularity of fantasy baseball, a pastime that occupies several million players. An undeniable attraction of fantasy baseball is that it brings to baseball one element which football has had for decades: gambling. Indeed, without the wager, there would be very few fantasy leagues. The excitement of betting on football is one of the main reasons for its spectacular growth in popularity, and fantasy games give baseball fans a chance to partake of the action during the summer as well as in the fall. Gamblers, both fantasy players and others, make extensive use of the new stats. Fantasy baseball is firmly established and likely to increase in popularity.

Another area of great growth has been in baseball games.
Baseball board games, using dice as the element of chance and statistics to recreate performance, have been played by a small but devoted group since the birth of APBA Baseball in the early 1950s. In the mid-1980s computer baseball games took hold and now appear to be the future of baseball gaming. While board games still have their audience, computer games are much more flexible and are able to provide fans with a variety of simulated experiences which board games cannot. Furthermore, the virtues of a computer opponent, when a flesh-and-blood one is unavailable, are understandable. The latest computer games simulate sophisticated opposing managers as well as recreating player performance.

Moreover, the era of almost real-time baseball games has already dawned. In 1991 the computer service Prodigy debuted a baseball game which used last night's real-life performances to play simulated games. The participants in "Big League Manager" send in their lineups by modem before they go to bed; the next morning, they dial up the Prodigy computer and see a boxscore for last night's game for their team displayed along with current league standings.

CompuServe and other major computer on-line services also provide an electronic forum which connects baseball fans across the country. These services run hundreds of electronic fantasy baseball leagues which attract thousands of players. Fantasy baseball contests, made possible by the combination of computers and "800" and "900" phone lines, have also become big business in the 1990s.
Yet another electronic milestone was reached in 1991-1992 with the debut of the electronic baseball encyclopedia, in two forms. Total Baseball became available in compact disc, read-only memory form (CD-ROM), both as a mini-disc for Sony's palmtop DataDiscman player and, for MS-DOS or Macintosh desktop computers, as a conventional-size disc published by CMC. Big League Baseball, a handheld reference device published by Franklin Electronics, fit into a shirt pocket. The most recent development, as of 1994, is Microsoft Baseball, which incorporates the statistical database and prose features of Total Baseball within a larger reference framework, created for the Windows graphical environment.

The advantage of an electronic baseball encyclopedia is not simply its portability or its compact data storage and retrieval. These electronic editions invite the user to manipulate the numbers, to make customized lists and complicated research requests, all of which make the numbers more meaningful and accessible to the fan than they are on the printed page.

Last, and least, is the negative reaction to the new statistics. There has been a backlash, and it's true that many fans believe the game has suffered from these stats and their purveyors, but the new statistics don't change the grand old game; they just provide new ways of looking at it. Almost all traditional baseball statistics (e.g., batting average, earned run average, errors) were invented in the nineteenth century or the early twentieth, and they reflect the way the game was played at that time. The game on the field is quite different now, and it is appropriate that the statistics used to describe and analyze the game reflect the way the game is played now. As with new strategies on the field, the old stats will persist while their replacements become established. During this time, the improvement in analysis will be obscured by the clash of stats and the arguments of the analysts.
Inevitably, though, the best of the new stats will oust the worst of the old, and the game will look a little different in the future. Not too long ago, you know, nobody bothered to count such silly things as runs batted in, batter strikeouts, times caught stealing, or saves. Players change, teams change, ballparks change, strategies change--even "the unchanging game" itself changes. Why shouldn't baseball statistics change along with them?