- $Unique_ID{BAS01277}
- $Pretitle{}
- $Title{Appendix: Baseball, Computers, and New Statistics}
- $Subtitle{}
- $Author{
- Gillette, Gary}
- $Subject{Computers Statistics computer statistic statisticians number numbers
- sabermetrics sabermetricians statistical computerized Epstein James Palmer
- Elias Stats stat OPS gambling CD-ROM DataDiscman reference CMC Sony Franklin
- Electronics statistician}
- $Log{}
-
- Total Baseball: Appendixes
-
-
- Baseball, Computers, and New Statistics
-
- Gary Gillette
-
-
- With the coming of age of the personal computer, the sea of baseball
- statistics in recent years has become a veritable flood. With the baseball
- world seemingly inundated with numbers, there has been a backlash against
- statistics and the statisticians. Many fans--especially those of the "old
- school"--believe that baseball already had enough stats, and that the newer
- numbers served mostly to obscure the game they love, rather than to illuminate
- it.
- Which is true? Certainly, baseball has been blessed over the years with
- a wealth of numbers. Far more than any other major sport, baseball has been
- described and preserved in the script of Arabic numerals--it is the primary
- way in which the stars of yesterday are remembered. All sports have their
- legends, with their apocryphal stories about their superstars. Where baseball
- transcends other sports, however, is in the numbers. Baseball statistics give
- clarity and perspective to the hallowed yet hazy pictures of the past; numbers
- make almost tangible the exploits of players long gone. It is, in fact, the
- accuracy and scope of these numbers which make meaningful comparisons of
- players across eras at all possible.
- Who is responsible for this flood of numbers? First, the fans, who have
- always looked for evidence to buttress their opinions. Second, the media, who
- have latched onto this trend and exploited it commercially. Third, Major
- League Baseball, which has aided and abetted the trend for publicity. Fourth,
- fantasy baseball players, many of whom were only casual fans before the advent
- of Rotisserie leagues. Fifth, authors and analysts, who use these new numbers
- to examine and explain the seemingly simple game of baseball.
- Of all the new statistics, the type which has become most popular is the
- situational. In contrast to traditional baseball totals or averages like home
- runs and batting average, situational stats are an attempt to break baseball
- statistics into how and when they happened. The most important situational
- breakdowns are how batters perform versus lefthanded and righthanded pitchers
- (and pitchers vs. same- and opposite-handed batters) and how players perform in
- their home park as opposed to when they are on the road. In reality, these
- situational stats have been part of the game for decades, as generations of
- fans have argued about the effect of Yankee Stadium on sluggers like Ruth and
- Maris, while generations of managers have platooned their hitters. Aside from
- these two key categories, several other types of breakdowns have become
- prominent: "clutch" hitting measures, grass/turf splits, and individual
- pitcher-batter matchups.
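- As a rough illustration of what a situational breakdown involves, the
- sketch below splits one hitter's batting average by pitcher handedness and by
- home versus road. The at-bat log, its field layout, and the function name
- batting_splits are invented for this example; they are assumptions, not the
- format used by any of the statistical services discussed here.

```python
from collections import defaultdict

# Hypothetical at-bat log for one hitter: (pitcher_hand, venue, is_hit).
# The records are invented sample data, not real results.
AT_BATS = [
    ("L", "home", True),
    ("L", "home", False),
    ("L", "road", True),
    ("R", "home", False),
    ("R", "home", True),
    ("R", "road", False),
    ("R", "road", False),
    ("R", "road", True),
]


def batting_splits(at_bats, key_index):
    """Group at-bats by one field and return {value: (hits, at_bats, average)}."""
    totals = defaultdict(lambda: [0, 0])  # value -> [hits, at_bats]
    for record in at_bats:
        bucket = totals[record[key_index]]
        bucket[0] += int(record[2])  # count a hit when is_hit is True
        bucket[1] += 1               # every record is one at-bat
    return {value: (h, ab, h / ab) for value, (h, ab) in totals.items()}


by_hand = batting_splits(AT_BATS, 0)   # versus lefthanded / righthanded pitchers
by_venue = batting_splits(AT_BATS, 1)  # home / road

for label, split in (("vs. pitcher hand", by_hand), ("home/road", by_venue)):
    for value, (hits, ab, avg) in sorted(split.items()):
        print(f"{label:17s} {value:5s} {hits}-for-{ab}  {avg:.3f}")
```

- The same few at-bats produce a different picture depending on which field
- is used to divide them, which is why a single hitter can look good in one
- split and poor in another.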
- The biggest problem with this flood of data is that very little analysis
- has been done. For example, when a manager chooses to start a right-handed
- batter (who would normally be platooned) against a right-handed pitcher, he
- can justify it statistically in any number of ways. He could base his
- decision on that hitter's past performance against the opposing team, or on
- his performance against the opposing pitcher, or on his performance in day
- games, or on his performance against ground-ball pitchers, or even on his
- performance in the past week if he's been on a hot streak. All of these
- statistics (and many more) are available to teams on a daily basis. Of course,
- many of these numbers will be contradictory: a hitter might have done poorly
- against right-handed pitchers, but have done well against that team. He could
- also carry a pitiful average against that day's starting pitcher, but have
- done excellently in day games.
- The real problem, then, is not a lack of available data--the problem is
- too little information. Literally everyone these days comes armed with reams
- of statistics--managers, teams, players and their agents, the media, and the
- fans. But whether you are managing the 1993 World Champion Toronto Blue Jays
- or playing armchair manager while watching the home team on cable, which
- statistics do you rely upon? Unfortunately, the answer is that no one really
- knows.
- Precious little work has been done in understanding what these numbers mean;
- most of the work has been done in generating still more numbers. No one really
- knows if individual pitcher-batter matchups are more reliable predictors of
- performance than left/right breakdowns, and while everyone talks about
- "clutch" hitters, there are a dozen ways of measuring what might be called
- "clutch" hitting. Indeed, many studies have been done which fail to show any
- consistent, measurable effect which could be labeled as "clutch" hitting. Just
- what is true and what isn't?
- The answer is both true and false. Most of the statistics quoted and
- published are literally true. That is, they are accounts of events which
- actually happened on the field, and therefore must be true. Stating that a
- player batted .390 with runners in scoring position doesn't leave anything to
- argue over. Manifestly false, however, are most of the claims made
- about these statistics. If one argues that batting .390 with runners in
- scoring position means that the player is a good "clutch" hitter, the validity
- of that statement depends not on the actual numbers, but on the interpretation
- of those numbers. The interpretation and misinterpretation of the numbers and
- the predictions that flow from them are what cause the arguments.
- At their best, these new statistics illuminate the various aspects of the
- game, making it easier to understand how and why players and teams win and
- lose. The fact that they describe performances in specific situations (hence
- their name) is both their strength and their weakness. At their worst,
- situational stats divide up the game into irrelevant categories which hinder
- understanding. The common parody of situational statistics--how a player hits
- "in Tuesday night games at home when facing a southpaw in July with the bases
- loaded"--is sometimes all too close to reality. In Game Seven of the 1992
- World Series, for example, more than five hundred stats were bandied about by
- the network broadcasters or displayed on the screen during the game. With so
- many numbers coming fast and furious, the significant ones get lost in the
- trivial, and the currency of all analytical statistics is debased.
- The schizoid attitudes in baseball toward new statistics and analysis are
- shown by the fact that there isn't even general agreement on the use of the
- word sabermetrics: many fans have never heard of the term, and while some
- professionals in the field call themselves sabermetricians, others eschew the
- word and call themselves statistical analysts. The word sabermetrics was
- coined by best-selling author Bill James, who modified the acronym for the
- Society for American Baseball Research (SABR) for the root of the word and
- added the suffix -metrics to denote measurement.
- While SABR may have lent these new statistics its name, the Society for
- American Baseball Research is not the primary purveyor of these new
- statistics. There are many outlets for statistics today, but most of these
- come from only a few sources. The Major League Baseball-IBM Baseball
- Information System, which operates out of the commissioner's office in New
- York City, now compiles baseball's official statistics. (Computer giant IBM is
- an official sponsor of major-league baseball and provides the computer
- hardware.) This system was set up in the late 1980s to provide statistics to
- major league teams, and it now also provides much of the material used by the
- media. The Elias Sports Bureau, also headquartered in New York, remains the
- official statistician for both major leagues and is another major source of
- statistics. Howe News Bureau, which had previously served for many years as
- the official statisticians of the American League, became the official
- statisticians in the late 1980s for all minor leagues and most of the Latin
- and Caribbean leagues. Signifying its new business orientation was a change in
- name to Howe Sportsdata International, which now provides minor-league
- statistics to numerous publications, making the records of tomorrow's stars
- available to fans today.
- In the front offices of baseball teams, there have been two primary
- usages of these new statistics. Team Public Relations departments, aided by
- the computerized MLB-IBM Baseball Information System, have become increasingly
- proficient and prolific at churning out special stats about their
- players. Most of these find their way to the fans via the media, who publicize
- these stats in print and on the air. The other way in which the new statistics
- have penetrated the business end of baseball is through salary arbitration, a
- quasilegal proceeding which directly sets salaries for a few dozen players
- each year. Indirectly, however, salary arbitration has a much broader impact
- on the game's salary structure. Newer statistical measures have been used by
- both management and labor in arbitration in recent years, although many
- analytical stats are still not admissible in arbitration proceedings by the
- mutual agreement of the disputing parties.
- Outside the hearing room, new analytical measures have had a greater
- impact. Several major league teams have employed full-time professional
- statistical analysts in the past decade, and other teams have employed
- statistical analysts as consultants. Of these analysts, Eddie Epstein, now
- Director of Research and Statistics for the Baltimore Orioles, has risen the
- highest and had the most influence. Some other teams have employed computer
- systems to compile and analyze performance data, with the Oakland Athletics
- and manager Tony La Russa getting the most credit for successfully using these
- tools. It is clear that the impact of computers (and the analytical statistics
- they make possible) will continue to grow in baseball's executive suites in
- the future.
- In the publishing world, the main effect of the new statistics has been
- to create a new subgenre of baseball books. Celebrity biographies still sell
- the most sports books, but the number of statistically oriented titles
- released in recent years is astounding. Bill James, a Kansan who was not a
- sportswriter, made the best-seller lists year after year in the 1980s on the
- strength of his detailed statistical analysis as well as his witty and
- satirical prose. Pete Palmer, a computer programmer by trade, blazed the way
- for accurate historical comparisons of players and teams by combining his
- tireless research and top-notch computer skills to produce a comprehensive
- historical data base. First published in Total Baseball, a comprehensive
- reference work, Palmer's data base brought sabermetrics and serious
- analytical measures to the general baseball public.
- Many other authors in the last decade have published baseball books which
- relied on the numbers. By 1990, well over a hundred baseball books were being
- released by major publishers each spring; several dozen of these were devoted
- to statistical analysis or intended as reference works. The Elias Sports
- Bureau, longtime official statisticians of the National League, made their
- mark by publishing their annual eponymous book of situational stats starting
- in 1985. These stats, previously available only to major-league teams,
- instantly became part of the baseball public's consciousness. The Elias
- Baseball Analyst became the best-seller of the annual statistical tomes and is
- very widely quoted by the media.
- Just as James, Palmer, and Elias were preceded by the members of the
- Society for American Baseball Research, they were also followed by many
- others. Project Scoresheet, a nonprofit organization founded by James in 1984,
- coordinated the efforts of hundreds of volunteers and produced the first and
- only publicly available data base of contemporary baseball. Retrosheet,
- another volunteer group founded by David Smith (a longtime SABR member and
- professor at the University of Delaware), is now collecting scoresheets from
- pre-1984 games. Armed with copies of more than 75,000 scoresheets donated by
- teams, sportswriters and fans, Retrosheet will soon make public its first
- computerized data (for the 1967 season).
- As the official statisticians, Major League Baseball and the Elias Sports
- Bureau are the primary sources from which both the electronic and the print
- media get their baseball statistics. Two independent organizations also
- maintain baseball data bases, providing both the media and the fans with ready
- access to almost any conceivable baseball statistic: Stats, Inc., of Chicago
- (run by John Dewan and Dick Cramer), and The Baseball Workshop of Philadelphia
- (run by the author). Stats, Inc., has provided greatly improved and expanded
- box scores to USA Today, Baseball Weekly, and other newspapers since 1990.
- Stats also provides statistics to ESPN and other media outlets, produces the
- annual Scouting Report, and self-publishes other baseball reference books. The
- Baseball Workshop maintains and updates the former Project Scoresheet data
- base while providing statistics and analysis to publishers and to media
- clients. The Workshop's annual Great American Baseball Stat Book features
- comprehensive situational statistics for all active major-league players.
- On the periodical side, media conglomerate Gannett founded a weekly
- newspaper in 1990 devoted solely to baseball--more specifically to baseball
- statistics, which take up a large portion of the paper. While the
- hundred-year-old Sporting News has deemphasized baseball, USA Today Baseball
- Weekly has found a market hundreds of thousands strong by focusing exclusively
- on baseball. It is true, though, that a very large share of the Baseball
- Weekly audience is composed of fantasy baseball players, and the format of the
- statistics published in BBW is designed for their convenience. Baseball
- America, a biweekly newspaper which focuses on minor-league and college
- baseball, has also gained a large following among dedicated fans and fantasy
- baseball players. The ready availability of information about thousands of
- minor-leaguers in Baseball America, and the regular publication by BBW in 1992
- of OPS stats (On-Base plus Slugging, an analytical measure developed by Pete
- Palmer) show just how far the new statistics have come.
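- For readers who have not met OPS before, the arithmetic can be sketched as
- follows. The formulas are the standard modern definitions of on-base
- percentage and slugging average, which may differ in detail from the
- versions used elsewhere in this book, and the batting line is a made-up
- example rather than any real player's season.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Times on base divided by the plate appearances that count toward OBP."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_average(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def ops(obp, slg):
    """On-Base plus Slugging: the simple sum of the two rates."""
    return obp + slg


# Hypothetical season line (invented for illustration only).
singles, doubles, triples, home_runs = 100, 30, 5, 15
hits = singles + doubles + triples + home_runs  # 150
at_bats, walks, hit_by_pitch, sac_flies = 500, 60, 5, 5

obp = on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies)
slg = slugging_average(singles, doubles, triples, home_runs, at_bats)
print(f"OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops(obp, slg):.3f}")
```

- Because OPS is a plain sum of two rates with different denominators, it is
- easy to compute and publish, which helps explain its appeal to a weekly
- newspaper.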
- Probably the most high-profile and controversial element of the new
- statistics trend has been the explosion in popularity of fantasy baseball, a
- pastime that occupies several million players. An undeniable attraction of
- fantasy baseball is that it brings to baseball one element which football has
- had for decades: gambling. Without the wager, there would be very few fantasy
- leagues. The excitement of betting on football is
- one of the main reasons for its spectacular growth in popularity, and fantasy
- games give baseball fans a chance to partake of the action during the summer
- as well as in the fall. Gamblers, both fantasy players and others, make extensive
- use of the new stats. Fantasy baseball is firmly established and likely to
- increase in popularity.
- Another area of great growth has been in baseball games. Baseball board
- games, using dice as the element of chance and statistics to recreate
- performance, have been played by a small but devoted group since the birth of
- APBA Baseball in the early 1950s. In the mid-1980s computer baseball games
- took hold and now appear to be the future of baseball gaming. While board
- games still have their audience, computer games are much more flexible and are
- able to provide fans with a variety of simulated experiences which board games
- cannot. Furthermore, the virtues of a computer opponent, when a
- flesh-and-blood one is unavailable, are understandable.
- The latest computer games simulate sophisticated opposing managers as
- well as recreating player performance. Moreover, the era of almost real-time
- baseball games has already dawned. In 1991 the computer service Prodigy
- debuted a baseball game which used last night's real-life performances to play
- simulated games. The participants in "Big League Manager" send in their
- lineups by modem before they go to bed; the next morning, they dial up the
- Prodigy computer and see a boxscore for last night's game for their team
- displayed along with current league standings. CompuServe and other major
- computer on-line services also provide an electronic forum which connects
- baseball fans across the country. These services run hundreds of electronic
- fantasy baseball leagues which attract thousands of players. Fantasy baseball
- contests, made possible by the combination of computers and "800" and "900"
- phone lines, have also become big business in the 1990s.
- Yet another electronic milestone was reached in 1991-1992 with the debut
- of the electronic baseball encyclopedia, in two forms. Total Baseball became
- available on compact disc, read-only memory (CD-ROM), both as a mini-disc for
- Sony's palmtop DataDiscman player and as a conventional-size disc, published
- by CMC, for MS-DOS or Macintosh desktop computers. Big League Baseball,
- published by Franklin Electronics, is a handheld reference device which fits
- into a shirt pocket. The most recent development, as of 1994, is Microsoft
- Baseball, which incorporates the statistical database and prose features of
- Total Baseball within a larger reference framework, created for the Windows
- graphical environment. The advantage of an electronic baseball encyclopedia is
- not simply its portability or its compact data storage and retrieval. These
- electronic editions invite the user to manipulate the numbers, to make
- customized lists and complicated research requests, all of which make the
- numbers more meaningful and accessible to the fan than they are on the printed
- page.
- Last, and least, is the negative reaction to the new statistics. There
- has been a backlash, and it's true that many fans believe the game has
- suffered from these stats and their purveyors, but the new statistics don't
- change the grand old game; they just provide new ways of looking at it. Almost
- all traditional baseball statistics (e.g., batting average, earned run
- average, errors) were invented in the nineteenth century or the early
- twentieth, and they reflect the way the game was played at that time. The game
- on the field is quite different now, and it is appropriate that the statistics
- used to describe and analyze the game reflect the way the game is played now.
- As with new strategies on the field, the old stats will persist while
- their replacements become established. During this time, the improvement in
- analysis will be obscured by the clash of stats and the arguments of the
- analysts. Inevitably, though, the best of the new stats will oust the worst of
- the old, and the game will look a little different in the future.
- Not too long ago, you know, nobody bothered to count such silly things as
- runs batted in, batter strikeouts, times caught stealing, or saves. Players
- change, teams change, ballparks change, strategies change--even "the
- unchanging game" itself changes. Why shouldn't baseball statistics change
- along with them?
-