BIOSTATISTICS
Year: 2016 | Volume: 2 | Issue: 2 | Page: 217-219
Understanding the calculation of the kappa statistic: A measure of inter-observer reliability
Sidharth S Mishra, Nitika
Department of Community Medicine, School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh, India
Date of Submission: 22-Feb-2016
Date of Acceptance: 29-Mar-2016
Date of Web Publication: 28-Dec-2016
Correspondence Address: Nitika, School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh, India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/2455-5568.196883
It is common practice to assess the consistency of diagnostic ratings in terms of “agreement beyond chance.” The kappa coefficient is a popular index of agreement for binary and categorical ratings. This article focuses on the calculation of the unweighted kappa statistic, providing a stepwise approach supplemented with an example. The aim is that health care personnel may better understand the purpose of the kappa statistic and how to calculate it. The following core competencies are addressed in this article: Medical knowledge.
Keywords: Inter-rater agreement, kappa coefficient, unweighted kappa
How to cite this article: Mishra SS, Nitika. Understanding the calculation of the kappa statistic: A measure of inter-observer reliability. Int J Acad Med 2016;2:217-9.
Introduction
It is common practice to assess the consistency of diagnostic ratings in terms of “agreement beyond chance,” specifically the agreement between two clinicians under two different conditions or the agreement among multiple clinicians under one condition.[1] To this end, a relevant statistical technique is Cohen's kappa, a common index of agreement for binary and categorical ratings.[2] If the categories are unordered, the unweighted kappa statistic (K) is appropriate. If the categories are ordered, as they are in most rating scales in clinical, psychological, and epidemiological research, the weighted kappa statistic (K[w]) is preferable.[3] While there are many modifications and variants of the kappa statistic, this article focuses on the calculation of the unweighted kappa statistic, providing a stepwise approach supplemented with an example.
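For readers who use statistical software, both forms of the statistic are widely implemented. The short sketch below is not part of the original article; it is a minimal illustration using scikit-learn's cohen_kappa_score, which computes the unweighted statistic by default and accepts a weights argument for ordered categories. The two rating vectors are hypothetical.

```python
# Minimal sketch (hypothetical data): Cohen's kappa via scikit-learn.
from sklearn.metrics import cohen_kappa_score

# Hypothetical categorical ratings of the same six subjects by two raters.
rater_1 = ["present", "absent", "present", "present", "absent", "absent"]
rater_2 = ["present", "absent", "absent", "present", "absent", "present"]

# Unweighted kappa: appropriate when the categories are unordered (nominal).
kappa = cohen_kappa_score(rater_1, rater_2)
print(round(kappa, 2))

# For ordered categories, weighted kappa is obtained by passing
# weights="linear" or weights="quadratic" instead of the default.
```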
Estimating the Kappa Statistic
Step 1: Calculate the row and column totals as percentages of the grand total of all four cells [Table 1].
Step 2: Calculate the percentage of observed agreement:
Percentage of observed agreement (Po) = [(A + D)/Grand total] × 100
Step 3: Calculate the percentage of agreement expected by chance alone.
Agreement is present in two cells: cell A, in which both raters agree on a positive rating, and cell D, in which both raters agree on a negative rating. Here, “a” is the expected value for cell A, and “d” is the expected value for cell D.
For each of these cells, the expected value is found by:
Expected cell value = (Row total × Column total)/Grand total
That is, a = (Row total of cell A × Column total of cell A)/Grand total.
The same method is followed for calculating d [Table 2].
The percentage of agreement expected by chance is:
Percentage of chance agreement (Pe) = [(a + d)/Grand total] × 100
Step 4: Calculate the kappa statistic:
K = (Po − Pe)/(100 − Pe)
where Po is the percentage of observed agreement and Pe is the percentage of agreement expected by chance.
Step 5 (inference): Landis and Koch [4] suggested that a kappa value greater than 0.75 represents excellent agreement beyond chance, that a value below 0.40 represents poor agreement, and that a value in the range of 0.40–0.75 represents intermediate to good agreement. A short code sketch consolidating these five steps is given below.
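The following sketch makes the five steps concrete for a 2 × 2 table. It is a minimal illustration written for this summary, not code from the article; the function names unweighted_kappa and interpret are ours, the cell names follow the layout described above (A and D are the agreement cells), and the cut-offs in Step 5 are those quoted from Landis and Koch.[4]

```python
# Minimal sketch of Steps 1-5 for a 2x2 agreement table (not the article's own code).
# Assumed layout: A = both raters positive, D = both raters negative,
# B and C = the two disagreement cells.

def unweighted_kappa(a, b, c, d):
    """Return (observed agreement %, chance agreement %, kappa) for a 2x2 table."""
    grand_total = a + b + c + d

    # Step 1: row and column totals (marginals) of the table.
    row_1, row_2 = a + b, c + d
    col_1, col_2 = a + c, b + d

    # Step 2: percentage of observed agreement, Po = [(A + D)/grand total] x 100.
    p_observed = (a + d) / grand_total * 100

    # Step 3: expected values of the agreement cells and the chance agreement,
    # expected cell value = (row total x column total)/grand total.
    a_expected = row_1 * col_1 / grand_total
    d_expected = row_2 * col_2 / grand_total
    p_chance = (a_expected + d_expected) / grand_total * 100

    # Step 4: K = (Po - Pe)/(100 - Pe).
    kappa = (p_observed - p_chance) / (100 - p_chance)
    return p_observed, p_chance, kappa


def interpret(kappa):
    # Step 5: interpretation using the cut-offs quoted above (Landis and Koch).
    if kappa > 0.75:
        return "excellent agreement beyond chance"
    if kappa < 0.40:
        return "poor agreement"
    return "intermediate to good agreement"
```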
Example of Estimation of the Kappa Statistic
Suppose that 100 patients suffering from pancreatic carcinoma underwent contrast-enhanced computed tomography of the abdomen and that two radiologists reviewed the reports [Table 3].
Solution to Problem
Step 1: Calculate the row and column totals as percentages of the grand total of all four cells [Table 4].
Step 2: Calculate the percentage of observed agreement:
Po = [(A + D)/100] × 100, where A and D are the counts in the agreement cells of [Table 4].
Step 3: Calculate the percentage of agreement expected by chance alone [Table 5].
For each cell, the expected value is:
Expected cell value = (Row total × Column total)/Grand total
That is, a = (45 × 55)/100 = 24.75
Similarly, b = 20.25, c = 30.25, and d = 24.75
The percentage of agreement expected by chance alone is:
Pe = [(a + d)/100] × 100 = [(24.75 + 24.75)/100] × 100 = 49.5%
Step 4: Calculate the kappa statistic:
K = (Po − Pe)/(100 − Pe) = (Po − 49.5)/(100 − 49.5), where Po is the percentage of observed agreement obtained in Step 2.
Step 5 (inference): The calculated kappa value falls in the range of 0.40–0.75, representing intermediate to good agreement. The chance-agreement arithmetic above is checked in the short sketch that follows.
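The individual cell counts of [Table 3] are not reproduced in the text, so they cannot be recovered here. The counts below are hypothetical but are chosen to match the marginal totals implied by the expected values quoted in Step 3 (row totals of 45 and 55, column totals of 55 and 45). Using the unweighted_kappa and interpret sketch given after Step 5 above, the chance-agreement figure of 49.5% is reproduced exactly, whereas the observed agreement and the resulting kappa depend on the assumed cells.

```python
# Hypothetical 2x2 counts (NOT the article's Table 3), chosen only so that the
# marginal totals match those implied by b = 20.25, c = 30.25, and d = 24.75.
p_obs, p_chance, kappa = unweighted_kappa(a=40, b=5, c=15, d=40)

print(p_chance)           # 49.5 -> matches the chance agreement derived in Step 3
print(round(kappa, 2))    # depends on the assumed cells; about 0.60 here
print(interpret(kappa))   # "intermediate to good agreement"
```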
Conclusion
The kappa statistic is a frequently used measure of inter-observer reliability, but its manual calculation may cause confusion. The aim of this article is to help health care personnel better understand the purpose of the kappa statistic and how to calculate it.
Acknowledgment
We would like to thank Dr. Reshmi Mishra, and Dr. Tushar Subhadarshan Mishra.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References
1. Cao H, Sen PK, Peery AF, Dellon ES. Assessing agreement with multiple raters on correlated kappa statistics. Biom J. doi: 10.1002/bimj.201500029.
2. Barnhart HX, Williamson JM. Weighted least-squares approach for comparing correlated kappa. Biometrics 2002;58:1012-9.
3. Ludbrook J. Statistical techniques for comparing measurers and methods of measurement: A critical review. Clin Exp Pharmacol Physiol 2002;29:527-36.
4. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.
[Table 1], [Table 2], [Table 3], [Table 4], [Table 5]