Agreement between raters and groups of raters

Vanbelle, Sophie

Doctoral thesis (Dissertations and theses)

2009

Permalink
https://hdl.handle.net/2268/39575

Files

Full Text

vanbelle-thesis-5-5-2009.pdf

Publisher postprint (1.62 MB)

Annexes

vanbelle-thesis-errata-28-05-2009.pdf

Publisher postprint (135.09 kB)

errata

All documents in ORBi are protected by a user license.

Details

Keywords :

reliability; intraclass correlation coefficient; categorical data; fiabilité; coefficient de corrélation intraclasse; données catégorielles

Abstract :

[en] Agreement between raters on a categorical scale is not only a subject of scientific research but also a problem frequently encountered in practice. Whenever a new scale is developed to assess individuals or items in a certain context, inter-rater agreement is a prerequisite for the scale to be implemented in routine use. Cohen's kappa coefficient is a landmark in the development of rater agreement theory. This coefficient, which departed radically from previously proposed indexes, opened a new field of research in the domain. In the first part of this work, after a brief review of agreement on a quantitative scale, the kappa-like family of agreement indexes is described in various settings: two raters, several raters, an isolated rater and a group of raters, and two groups of raters. To quantify the agreement between two individual raters, Cohen's kappa coefficient (Cohen, 1960) and the intraclass kappa coefficient (Kraemer, 1979) are widely used for binary and nominal scales, while the weighted kappa coefficient (Cohen, 1968) is recommended for ordinal scales. An interpretation of the quadratic (Schuster, 2004) and the linear (Vanbelle and Albert, 2009c) weighting schemes is given. Cohen's kappa (Fleiss, 1971) and intraclass kappa (Landis and Koch, 1977c) coefficients were extended to the case where agreement is sought among several raters. Next, the kappa-like family of agreement coefficients is extended to the case of an isolated rater and a group of raters (Vanbelle and Albert, 2009a) and to the case of two groups of raters (Vanbelle and Albert, 2009b). These agreement coefficients are derived from a population-based model and reduce to the well-known Cohen's kappa coefficient in the case of two single raters. The proposed agreement indexes are also compared to existing methods, namely the consensus method and Schouten's agreement index (Schouten, 1982). The superiority of the new approach over the latter is shown.
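As an illustration of the two-rater coefficients named in this paragraph, the following is a minimal Python sketch, not code taken from the thesis. It assumes the standard textbook definition kappa = (p_o - p_e) / (1 - p_e), with agreement weights 1 - |i - j| / (k - 1) for the linear scheme and 1 - (|i - j| / (k - 1))^2 for the quadratic scheme. The function name `weighted_kappa` and its `weights` argument are hypothetical names chosen for this example.

```python
def weighted_kappa(table, weights="identity"):
    """Kappa for two raters from a k x k table of counts.

    table[i][j] = number of items placed in category i by rater A
    and in category j by rater B. `weights` is "identity" (Cohen's
    unweighted kappa), "linear" or "quadratic". Names are illustrative.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Marginal proportions of each rater.
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def w(i, j):
        # Agreement weight: 1 on the diagonal, decreasing with distance.
        if weights == "identity":
            return 1.0 if i == j else 0.0
        d = abs(i - j) / (k - 1)  # assumes k >= 2 ordered categories
        if weights == "linear":
            return 1.0 - d
        if weights == "quadratic":
            return 1.0 - d * d
        raise ValueError("unknown weighting scheme: %r" % weights)

    cells = [(i, j) for i in range(k) for j in range(k)]
    # Observed and chance-expected (weighted) agreement.
    p_o = sum(w(i, j) * table[i][j] / n for i, j in cells)
    p_e = sum(w(i, j) * row_marg[i] * col_marg[j] for i, j in cells)
    return (p_o - p_e) / (1.0 - p_e)
```

For the 2 x 2 table `[[20, 5], [10, 15]]`, the observed agreement is 0.7 and the chance-expected agreement is 0.5, so `weighted_kappa([[20, 5], [10, 15]])` is (0.7 - 0.5) / (1 - 0.5) = 0.4 up to floating-point rounding. With only two categories, the linear and quadratic schemes coincide with the unweighted coefficient.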
In the second part of the work, methods for hypothesis testing and data modeling are discussed. First, the method proposed by Fleiss (1981) for comparing several independent agreement indexes is presented. Then, a bootstrap method initially developed by McKenzie et al. (1996) to compare two dependent agreement indexes is extended to several dependent agreement indexes (Vanbelle and Albert, 2008). All these methods apply equally to the kappa coefficients introduced in the first part of the work. Next, regression methods for testing the effect of continuous and categorical covariates on the agreement between two or several raters are reviewed. These include the weighted least-squares method, which allows only for categorical covariates (Barnhart and Williamson, 2002), and a regression method based on two sets of generalized estimating equations. The latter method was developed for the intraclass kappa coefficient (Klar et al., 2000), Cohen's kappa coefficient (Williamson et al., 2000) and the weighted kappa coefficient (Gonin et al., 2000). Finally, a heuristic method restricted to the case of independent observations is presented (Lipsitz et al., 2001, 2003); it turns out to be equivalent to the generalized estimating equations approach. These regression methods are compared to the bootstrap method extended by Vanbelle and Albert (2008), but they have not yet been generalized to agreement between a single rater and a group of raters, nor to agreement between two groups of raters.
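The idea of comparing two dependent agreement indexes by bootstrap can be sketched in Python as follows. This is a generic percentile bootstrap written for illustration and is not claimed to reproduce the exact procedure of McKenzie et al. (1996) or of Vanbelle and Albert (2008). It assumes two raters are each compared with the same reference rating on the same items, so that the two kappa estimates are dependent. The function names, arguments and defaults are hypothetical.

```python
import random


def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa from two equal-length lists of labels."""
    n = len(r1)
    cats = set(r1) | set(r2)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    p_e = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)
    if p_e == 1.0:
        return float("nan")  # undefined when both raters use one category
    return (p_o - p_e) / (1.0 - p_e)


def bootstrap_kappa_difference(ref, rater_a, rater_b,
                               n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval for kappa(ref, A) - kappa(ref, B).

    Items are resampled with replacement and all three ratings of an
    item are kept together, which preserves the dependence between
    the two kappa estimates.
    """
    rng = random.Random(seed)
    n = len(ref)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ref_s = [ref[i] for i in idx]
        a_s = [rater_a[i] for i in idx]
        b_s = [rater_b[i] for i in idx]
        d = cohen_kappa(ref_s, a_s) - cohen_kappa(ref_s, b_s)
        if d == d:  # drop resamples where a kappa is undefined (NaN)
            diffs.append(d)
    if not diffs:
        raise ValueError("no usable bootstrap resamples")
    diffs.sort()
    lo = diffs[int((alpha / 2) * len(diffs))]
    hi = diffs[min(len(diffs) - 1, int((1 - alpha / 2) * len(diffs)))]
    return lo, hi
```

Under these assumptions, an interval that excludes 0 suggests that the two raters do not agree equally well with the reference. Extending the comparison to more than two dependent coefficients, as described in the abstract, would need a joint test statistic rather than this pairwise interval.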
[fr] Sujet d'intenses recherches scientifiques, l'accord entre observateurs sur une échelle catégorisée est aussi un problème fréquemment rencontré en pratique. Lorsqu'une nouvelle échelle de mesure est développée pour évaluer des sujets ou des objets, l'étude de l'accord inter-observateurs est un prérequis indispensable pour son utilisation en routine. Le coefficient kappa de Cohen constitue un tournant dans les développements de la théorie sur l'accord entre observateurs. Ce coefficient, radicalement différent de ceux proposés auparavant, a ouvert de nouvelles voies de recherche dans le domaine. Dans la première partie de ce travail, après une brève revue des mesures d'accord sur une échelle quantitative, la famille des coefficients kappa est décrite dans différentes situations: deux observateurs, plusieurs observateurs, un observateur isolé et un groupe d'observateurs, et enfin deux groupes d'observateurs. Pour quantifier l'accord entre deux observateurs, le coefficient kappa de Cohen (Cohen, 1960) et le coefficient kappa intraclasse (Kraemer, 1979) sont largement utilisés pour les échelles binaires et nominales. Par contre, le coefficient kappa pondéré (Cohen, 1968) est recommandé pour les échelles ordinales. Schuster (2004) a donné une interprétation des poids quadratiques tandis que Vanbelle and Albert (2009c) se sont intéressés aux poids linéaires. Les coefficients d'accord correspondant au coefficient kappa de Cohen (Fleiss, 1971) et au coefficient kappa intraclasse (Landis and Koch, 1977c) sont aussi donnés dans le cas de plusieurs observateurs. La famille des coefficients kappa est ensuite étendue au cas d'un observateur isolé et d'un groupe d'observateurs (Vanbelle and Albert, 2009a) et au cas de deux groupes d'observateurs (Vanbelle and Albert, 2009b). Les coefficients d'accord sont élaborés à partir d'un modèle de population et se réduisent au coefficient kappa de Cohen dans le cas de deux observateurs isolés.
Les coefficients d'accord proposés sont aussi comparés aux méthodes existantes, la méthode du consensus et le coefficient d'accord de Schouten (Schouten, 1982). La supériorité de la nouvelle approche sur ces dernières est démontrée. Des méthodes qui permettent de tester des hypothèses et modéliser des coefficients d'accord sont abordées dans la seconde partie du travail. Une méthode permettant la comparaison de plusieurs coefficients d'accord indépendants (Fleiss, 1981) est d'abord présentée. Puis, une méthode basée sur le bootstrap, initialement développée par McKenzie et al. (1996) pour comparer deux coefficients d'accord dépendants, est étendue au cas de plusieurs coefficients dépendants par Vanbelle and Albert (2008). Pour finir, des méthodes de régression permettant de tester l'effet de covariables continues et catégorisées sur l'accord entre deux observateurs sont exposées. Ceci comprend la méthode des moindres carrés pondérés (Barnhart and Williamson, 2002), admettant seulement des covariables catégorisées, et une méthode de régression basée sur deux équations d'estimation généralisées. Cette dernière méthode a été développée dans le cas du coefficient kappa intraclasse (Klar et al., 2000), du coefficient kappa de Cohen (Williamson et al., 2000) et du coefficient kappa pondéré (Gonin et al., 2000). Enfin, une méthode heuristique, limitée au cas d'observations indépendantes, est présentée (Lipsitz et al., 2001, 2003). Elle est équivalente à l'approche par les équations d'estimation généralisées. Ces méthodes de régression sont comparées à l'approche par le bootstrap (Vanbelle and Albert, 2008) mais elles n'ont pas encore été généralisées au cas d'un observateur isolé et d'un groupe d'observateurs ni au cas de deux groupes d'observateurs.
[nl] Het bepalen van overeenstemming tussen beoordelaars voor categorische gegevens is niet alleen een kwestie van wetenschappelijk onderzoek, maar ook een probleem dat men veelvuldig in de praktijk tegenkomt. Telkens wanneer een nieuwe schaal wordt ontwikkeld om individuele personen of zaken te evalueren in een bepaalde context, is interbeoordelaarsovereenstemming een noodzakelijke voorwaarde vooraleer de schaal in de praktijk kan worden toegepast. Cohen's kappa coëfficiënt is een mijlpaal in de ontwikkeling van de theorie van interbeoordelaarsovereenstemming. Deze coëfficiënt, die een radicale verandering met de voorgaande indices inhield, opende een nieuw onderzoeksspoor in het domein. In het eerste deel van dit werk wordt, na een kort overzicht van overeenstemming voor kwantitatieve gegevens, de kappa-achtige familie van overeenstemmingsindices beschreven in verschillende gevallen: twee beoordelaars, verschillende beoordelaars, één geïsoleerde beoordelaar en een groep van beoordelaars, en twee groepen van beoordelaars. Om de overeenstemming tussen twee individuele beoordelaars te kwantificeren worden Cohen's kappa coëfficiënt (Cohen, 1960) en de intraklasse kappa coëfficiënt (Kraemer, 1979) veelvuldig gebruikt voor binaire en nominale gegevens, terwijl de gewogen Kappa coëfficiënt (Cohen, 1968) aangewezen is voor ordinale gegevens. Een interpretatie van de kwadratische (Schuster, 2004) en lineaire (Vanbelle and Albert, 2009c) weegschema's wordt gegeven. Overeenstemmingsindices die overeenkomen met Cohen's Kappa (Fleiss, 1971) en intraklasse-kappa (Landis and Koch, 1977c) coëfficiënten kunnen worden gebruikt om de overeenstemming tussen verschillende beoordelaars te beschrijven. Daarna wordt de familie van kappa-achtige overeenstemmingscoëfficiënten uitgebreid tot het geval van één geïsoleerde beoordelaar en een groep van beoordelaars (Vanbelle and Albert, 2009a) en tot het geval van twee groepen van beoordelaars (Vanbelle and Albert, 2009b). 
Deze overeenstemmingscoëfficiënten zijn afgeleid van een populatie-gebaseerd model en kunnen worden herleid tot de welbekende Cohen's kappa coëfficiënt in het geval van twee individuele beoordelaars. De voorgestelde overeenstemmingsindices worden ook vergeleken met bestaande methodes, de consensusmethode en Schoutens overeenstemmingsindex (Schouten, 1982). De superioriteit van de nieuwe benadering over de laatstgenoemde wordt aangetoond. In het tweede deel van het werk worden hypothesetesten en gegevensmodellering besproken. Vooreerst wordt de methode van Fleiss (1981) om verschillende onafhankelijke overeenstemmingsindices te vergelijken voorgesteld. Daarna wordt een bootstrapmethode, oorspronkelijk ontwikkeld door McKenzie et al. (1996) om twee afhankelijke overeenstemmingsindices te vergelijken, uitgebreid tot verschillende afhankelijke overeenstemmingsindices (Vanbelle and Albert, 2008). Al deze methoden kunnen ook worden toegepast op de overeenstemmingsindices die in het eerste deel van het werk zijn beschreven. Ten slotte wordt een overzicht gegeven van regressiemethodes om het effect van continue en categorische covariabelen op de overeenstemming tussen twee of meer beoordelaars te testen. Dit omvat de gewogen kleinste kwadraten methode, die alleen werkt met categorische covariabelen (Barnhart and Williamson, 2002) en een regressiemethode gebaseerd op twee sets van gegeneraliseerde schattingsvergelijkingen. De laatste methode was ontwikkeld voor de intraklasse kappa coëfficiënt (Klar et al., 2000), Cohen's kappa coëfficiënt (Williamson et al., 2000) en de gewogen kappa coëfficiënt (Gonin et al., 2000). Ten slotte wordt een heuristische methode voorgesteld die alleen van toepassing is op het geval van onafhankelijke waarnemingen (Lipsitz et al., 2001, 2003). Ze blijkt equivalent te zijn met de benadering van de gegeneraliseerde schattingsvergelijkingen.
Deze regressiemethoden worden vergeleken met de bootstrapmethode uitgebreid door Vanbelle and Albert (2008) maar werden niet veralgemeend tot de overeenstemming tussen een enkele beoordelaar en een groep van beoordelaars, en ook niet tussen twee groepen van beoordelaars.

Disciplines :

Mathematics

Author, co-author :

Vanbelle, Sophie ; Université de Liège - ULiège > Département de mathématique > Département de mathématique

Language :

English

Title :

Agreement between raters and groups of raters

Alternative titles :

[fr] Accord entre observateurs et groupes d'observateurs

Defense date :

11 June 2009

Number of pages :

226

Institution :

ULiège - Université de Liège

Degree :

Docteur en Sciences

Promotor :

Albert, Adelin ; Université de Liège - ULiège > Département des sciences de la santé publique

President :

Gérard, Paul ; Université de Liège - ULiège > Département de mathématique

Jury member :

Haesbroeck, Gentiane ; Université de Liège - ULiège > Mathematics

Giet, Didier ; Université de Liège - ULiège > Soins primaires et santé

Lambert, Philippe ; Université de Liège - ULiège > Mathematics

Lesaffre, Emmanuel

Monseur, Christian ; Université de Liège - ULiège > Unités de recherche interfacultaires > Research Unit for a life-Course perspective on Health and Education (RUCHE)

Schouten, Hubert

Available on ORBi :

since 25 May 2010

Statistics

Number of views

727 (13 by ULiège)

Number of downloads

5939 (24 by ULiège)
