


ABSTRACT 
Relative to other team games, the contribution of individual team members to the overall team performance is more easily quantifiable in cricket. Viewing players as securities and the team as a portfolio, cricket thus lends itself better to the use of analytical methods usually employed in the analysis of securities and portfolios. This paper demonstrates the use of stochastic dominance rules, normally used in investment management, to analyze the One Day International (ODI) batting performance of Indian cricketers. The data used span the years 1989 to 2005. In dealing with cricketing data the existence of ‘not out’ scores poses a problem while processing the data. In this paper, using a Bayesian approach, the ‘notout’ scores are first replaced with a conditional average. The conditional average that is used represents an estimate of the score that the player would have gone on to score, if the ‘not out’ innings had been completed. The data thus treated are then used in the stochastic dominance analysis. To use stochastic dominance rules we need to characterize the ‘utility’ of a batsman. The first derivative of the utility function, with respect to runs scored, of an ODI batsman can safely be assumed to be positive (more runs scored are preferred to less). However, the second derivative needs not be negative (no diminishing marginal utility for runs scored). This means that we cannot clearly specify whether the value attached to an additional run scored is lesser at higher levels of scores. Because of this, only firstorder stochastic dominance is used to analyze the performance of the players under consideration. While this has its limitation (specifically, we cannot arrive at a complete utility value for each batsman), the approach does well in describing player performance. Moreover, the results have intuitive appeal. 
Key words:
Bayesian, utility function, batting average, conditional average, geometric distribution

Key
Points
 The problem of dealing with ‘not out’ scores in cricket is tackled using a Bayesian approach.
 Stochastic dominance rules are used to characterize the utility of a batsman.
 Since the marginal utility of runs scored is not diminishing in nature, only first order stochastic dominance rules are used.
 The results, demonstrated using data for the Indian cricket team are intuitively appealing.
 The limitation of the approach is that it cannot arrive at a complete utility value for the batsman.

As a game, cricket is a statistician’s delight. Each game of cricket throws up a huge amount of performance related statistics. As other games have evolved and developed, they too have become richer in the use of performance statistics. For example, use of statistics like ‘unforced errors’ in lawn tennis or ‘assists’ in basketball is increasingly becoming popular. However in cricket these statistics have always been part and parcel of the game. Cricket is one of the few games in which a ‘scorer’ is required to continuously maintain statistical data on key game/playerspecific performance statistics. It is one of the few games that have detailed ‘scoring sheets’. These scoring sheets were maintained manually in the predigital age and are maintained electronically today. In spite of this legacy and long history of maintaining statistical data, two aspects associated with cricketing data are striking. The first is the idiosyncrasy that has persisted in the treatment of the ‘not out’ scores of a player. The second is the lack of effort in exploiting the richness of data to improve the representation of player performance. The batting average of player i, R_{i}, is computed as: Equation (1) introduces an upward bias in the average. This bias is caused because the numerator is the total runs scored over all innings while the denominator excludes the innings in which the player has remained ‘not out’. This bias cannot seemingly be avoided. Taking the denominator to be n instead of nk would instead introduce a downward bias in the average. A similar problem arises while preparing the input data required for the stochastic dominance rules developed later in the paper. The input data that is required is the inningsbyinnings runs scored by the player. What should be done with the scores for the innings in which the player has remained ‘not out’? This paper first proposes a method to deal with this problem. The second aspect of cricketing data is the scant attention that has been focused by researchers on certain aspects of cricket. A substantial portion of the work has focused on devising optimal playing strategies. The strategies studied have either focused on batting strategies (Clarke, 1988; Clarke and Norman, 1999; Preston and Thomas, 2000; Swartz et al., 2006) or on bowling strategies (Rajadhyaksha and Arapostathis, 2004). A fair amount of work has also focused on the problem of arriving at a fair result when a game has to be prematurely terminated due to weather conditions or other disturbances (Duckworth and Lewis, 1998; Preston and Thomas, 2002; Carter and Guthrie, 2004). The third stream of work, on the understanding and development of playerspecific performance statistics, (Kimber and Hansford, 1993; Lemmer, 2004; Lewis, 2005; Wood, 1945), has received little attention. Cricket, with its slow pace and noncontinuous nature, is a very televisionfriendly game. It allows viewers the leisure of watching replays without impinging on realtime action. It thus allows for the presentation of a vast amount of descriptive statistics during the course of a game. In spite of this feature of the game and the long history of the game, cricket commentators sometimes seem to feel constrained by the inability of performance statistics to really describe player performance. Comments like “Statistics don’t say everything ”are very commonly heard. The attention devoted by researchers to this aspect of cricket, therefore, seems surprisingly scant in relation to its importance and relevance. This paper seeks to develop methods to assess the performance of batsmen in cricket that (i) makes use of more information than current methods do and (ii) can be converted into visually appealing graphics for the television medium. The method is demonstrated using player statistics for the some of the key members of the Indian One Day International (ODI) cricket team between 1989 and 2005. The names of the players included in the study are listed in Table 1.
The primary measure of a batsman’s performance in cricket today is the player’s batting average defined as in Equation (1). This measure suffers from the shortcoming that it is a onedimensional number and does not capture the richness of the underlying data. Though cited very often, this measure fails to capture the various facets of a batsman. It does not provide answers to many questions that arise during the course of a game. These unaddressed concerns or questions feature often in the comments of cricket commentators. For example, commentators of the game are found to say “Player X is a dangerous player once he is set”. Or “Player X has the ability to convert a good start to a big score”. Or “Though his average does not reflect it, Player X is a more consistent performer than Player Y”.
Adjusting the raw dataThe raw data used in the development of any method for representing a batsman’s performance are the inningsbyinnings runs scored by the player. However, using this raw data poses a problem. In some of the innings the batsman would not have been dismissed. In such cases the score would not reflect the number of runs the player could potentially have gone on to score. The scores for these innings (the ‘not out’ situations) have thus to be replaced by a number that is a good estimate of the number of runs the player would have scored had he batted on. In an early work Wood, 1945 had provided empirical support to support the claim that a batsman’s scores follow a geometric distribution. Under this assumption, because of the memoryless property of the geometric distribution, a batsman’s chance of getting out is independent of the score he is on. This assumption can be used to arrive at an estimate of the number of runs a ‘not out’ player would have scored had he batted on. However, the assumption of a geometric distribution for a batsman’s scores might not hold for all players. There may be some players who are ‘slow starters’ and who therefore do better as they progress. There may be other players who become more adventurous as their score increases. For such adventurous players their chances of getting out might increase as their score increases. Kimber and Hansford, 1993 did consider deviations from the geometric distribution, but their focus was on arriving at an optimal estimator for the population mean. On the other hand, we need a method to arrive at an estimate of the number of runs a ‘not out’ batsman would have gone on to score. A Bayesian approach has been adopted in this paper to arrive at this estimate. This is achieved in the following manner. Assume that in his j th innings player i remains ‘not out’ on a score of R_{ij}. Define a binary variable G_{rik} such that: The estimate of the number of runs that the ‘notout’ batsman would have gone on to score is then given by: In other words, the estimator used for the runs that the ‘not out’ batsman would have gone on to score is the conditional average of the batsman at that point of time, given that he has already scored a certain number of runs. In every instance of a ‘not out’, the batsman’s score in that innings j is replaced by the estimate Eij. This approach has the advantage of handling deviations from the geometric distribution assumption. It is also information efficient, with the posterior values of the conditional average incorporating more information on the batsman’s performance. Table 2 gives an example of the computational procedure used for finding the replacement values for the first two ‘not outs’ in the career of one member of the Indian ODI team, Sachin Tendulkar.
Stochastic DominanceThe adjusted raw data is now used to arrive at an analytical representation of the player’s batting performance. The approach adopted draws from methods normally used for the analysis of securities and portfolios in investment management. The focus in investment management is on wealth creation. The problem of portfolio choice is that of selecting a portfolio that maximizes the utility for the investor. The utility function for the investor attaches a utility to various levels of wealth. The utility function can be constrained to have certain properties like nonsatiation (more wealth is always preferred to less) or risk aversion (diminishing marginal utility for incremental units of wealth). In mathematical terms the first constraint requires the first derivative of the utility function to be positive. Again, in mathematical terms the second constraint requires the second derivative of the utility function to be negative. Consistent with some of the abovelisted features of utility functions, the traditional approach to the portfolio selection problem has been the meanvariance approach. Amongst the alternative approaches to the portfolio selection problem suggested in the investment management literature is the set of stochastic dominance rules (Ali, 1975; Bawa, 1978; Levy, 1973). To use stochastic dominance rules we need to characterize the utility function of the investor. According to the firstorder stochastic dominance rules a portfolio A is preferred to another portfolio B if, for any level of return, the cumulative probability of portfolio A giving a return lesser than the given level of return is never greater, and sometimes less, than the cumulative probability of portfolio B giving a return lesser than that given level of return. This rule is consistent with the assumption that in the investors’ utility function more wealth is preferred to less. (Elton and Gruber, 1997) Analogous to the portfolio selection problem, a similar approach is adopted in this paper to represent the batting performance of cricketers. Using this approach we can say that a batsman A’s performance is better than another batsman B’s if, for any level of score, the probability of batsman A getting a score greater than the given score is never lesser, and sometimes greater, than the probability of batsman B getting a score greater than that given score. This rule corresponds to the firstorder stochastic dominance rules and assumes that more runs are always preferred to less. The cumulative probability charts of various batsmen can now be charted with runs on the Xaxis (with the origin as zero) and the probability of scoring more runs than the Xaxis value of the score (that is one minus the cumulative probabilities of scoring lesser than the Xaxis value of score) on the Yaxis. Visually this would mean that a batsman whose stochastic dominance curve envelops another’s curve stochastically dominates the other batsman.
The method is demonstrated using data for the Indian ODI cricket team spanning the years 1989 (the year one of India’s most highly rated players, Sachin Tendulkar, made his debut) to 2005. This period was chosen because this was a period during which the compositional changes in the Indian ODI team were very few. A sample batting performance stochastic dominance chart output for five Indian players is given in Figure 1. Four of the five players represented are essentially specialist batsmen (Tendulkar, Dravid, Sehwag and Laxman) and one a specialist bowler (Khan). The results are interesting and have intuitive appeal. They are consistent with popular notions regarding the batsmen whose performances were studied. For example, the curve for Sachin Tendulkar, who is considered an icon of Indian cricket, almost completely envelops the curves for other players. And the curve for Rahul Dravid, who is referred to as ‘the wall’ because of his perceived consistency, does indeed dominate the curves for other players till the 20 run point. In other words, the chances of Rahul Dravid getting a score less than 20 is lesser than the chances for any other player in the Indian team getting a score lesser than 20. Finally, the curves for the specialist batsmen very clearly dominate the curves for the specialist bowlers, as should be the case.
The method that has been developed only provides an alternative approach to represent the batting performance of cricket players. This alternative approach is visually and intuitively appealing. The attempt in this paper is not to arrive at a model to rank the utility of players. Nor is the goal to develop a model to assist in team selection. The utility of a player goes far beyond the runs scored by him. Factors like tactical skills, passive support to the partner batsmen, etc. cannot be gauged by looking at the runs scored. Even if we use runs scored as the sole measure of utility, firstorder stochastic dominance rules alone cannot be used to rank players in terms of their utility. And if we go on to secondorder stochastic dominance rules the utility function might not have a negative second derivative. In other words, there could be potentially match winning situations in which a batsman who is batting on a very high score (say, 108) has to score one more run in order for the team to win the match. In this situation the incremental one run (from 108 to 109) might be much more valuable than the incremental one run the batsman scored while he was on a lower score (say, 23) during the same innings.
ConclusionsWithin the limits of this study, the paper seeks to highlight the tremendous scope that exists to improve and develop on the measures currently used to describe the performances of cricket players in general, and batsmen in particular. The measures used today do not adequately capture the richness of the underlying data. Similar approaches can be adopted to represent the performances of bowlers too.

AUTHOR BIOGRAPHY 

Uday Damodaran 
Employment: Professor of Finance, XLRI School of Business and Human Resources, Jamshedpur, Jharkand, India. 
Degree: Fellow (IIM Bangalore), M.Stat., B.Sc. (Physics) 
Research interests: Investments, Corporate Finance, Cricket. 
Email: uday@xlri.ac.in 



REFERENCES 
Ali M. (1975) Stochastic Dominance and Portfolio Analysis. Journal of Financial Economics 2, 205230.

Bawa V. (1978) Safetyfirst, stochastic dominance, and optimal portfolio choice. Journal of Financial and Quantitative Analysis XIII, 225271.

Carter M., Guthrie G. (2004) Cricket interruptus: fairness and incentive in limited overs cricket matches. Journal of the Operational Research Society 55, 822829.

Clarke S.R. (1988) Dynamic Programming in oneday cricket optimal scoring rates. Journal of the Operational Research Society 39, 331337.

Clarke S.R., Norman J.M. (1999) To run or not?: Some dynamic programming models in cricket. Journal of the Operational Research Society 50, 536545.

Duckworth F.C., Lewis A.J. (1998) A fair method of resetting the target in interrupted oneday cricket matches. Journal of the Operational Research Society 49, 220227.

Elton E.J., Gruber M.J. (1997) . Modern portfolio theory and investment analysis , .

Kimber A.C., Hansford A.R. (1993) A statistical analysis of batting in cricket. Journal of the Royal Statistical Society: Series A (Statistics in Society) 156, 443455.

Lemmer H.H. (2004) A measure for the batting performance of cricket players. South African Journal for Research in Sport, Physical Education and Recreation 26, 5564.

Levy H. (1973) Stochastic dominance, efficiency criteria, and efficient portfolios: The multiperiod case. American Economic Review LXIII, 986994.

Lewis A.J. (2005) Towards fairer measures of player performance in oneday cricket. Journal of the Operational Research Society 56, 804815.

Preston I., Thomas J. (2000) Batting strategy in limited overs cricket. Journal of the Royal Statistical Society: Series D (The Statistician) 49, 95106.

Preston I., Thomas J. (2002) Batting strategy in limited overs cricket. Journal of the Royal Statistical Society: Series D (The Statistician) 51, 189202.

Rajadhyaksha G., Arapostathis A., Hubbard M., Mehta R.D., Pallis J.M. (2004) Cricket strategy: Using a Bayesian network to recommend the best bowling option. The Engineering of Sport V, Vol 2 , 425431.

Swartz T.B., Gill P.S., Beaudoin D., de Silva B.M. (2006) Optimal batting orders in oneday cricket. Computers and Operations Research 33, 19391950.

Wood G.H. (1945) Cricket scores and geometric progressions. Journal of the Royal Statistical Society: Series A (Statistics in Society) 108, 1222.






