Confidence testing; Educational research; Test validity
Abstract:
In a confidence weighting situation, the examinee is asked to indicate the correct answer and how certain he or she is of the correctness of that answer. This paper reviews the bases for confidence marking, its validity and accuracy in evaluating students, and its use in research.
Disciplines:
Education & instruction
Author, co-author:
Leclercq, Dieudonné ; Université de Liège - ULiège > Département d'éducation et formation > Technologie de l'éducation
Adams, Adams (1961) Realism of confidence judgments. Psychological Review 68:33-45.
Ahlgren Confidence on achievement tests and the prediction of retention, Unpublished doctoral dissertation, Harvard University; 1967.
Allais (1953) Le comportement de l'homme rationnel devant le risque: critique des postulats et axiomes de l'école américaine. Econometrica 21:503-546.
Allais (1953) La psychologie de l'homme rationnel devant le risque: La théorie et l'expérience. Journal of Social Statistics 94:43-73.
Atkinson An Introduction to Motivation, Van Nostrand, Princeton; 1964.
Atkinson, Litwin (1960) Achievement motive and test anxiety conceived as motive to approach success and motive to avoid failure. Journal of Abnormal and Social Psychology 60:52-63.
Attneave Application of information theory to psychology, Holt Rinehart and Winston, New York; 1959.
Baker (1969) The uncertain student and the understanding computer. La Recherche en enseignement programmé, Tendances actuelles , Dunod, Paris; 303-319.
Bartholomé, Houziaux (1979) SIAM-DOCEO II. Instruction manual.
Bayes (1763) An essay toward solving a problem in the doctrine of chance. Philosophical Transactions of the Royal Society, London.
Beaujot, Didelez, Fontaine, Leclercq (1966) Etude d'une nouvelle technique d'évitement sans signal avertisseur chez le rat. Psychologica Belgica 7.
Beenen (1970) Psychiatric diagnosis and subjective probabilities. Acta Psychologica 34.
Brier (1950) Verification of forecasts expressed in terms of probability. Monthly Weather Review 75:1-3.
Brownless, Keats (1958) A retest method of studying partial knowledge. Psychometrika 23(1):67-73.
Bujas, Koviacic, Rohacek (1975) Psychological function based on confidence rating. Acta Instituti Psychologici Universitatis Zagrabiensis 74.
Chauvin (1967) Paradoxe dans les résultats du conditionnement. Journal de Psychologie Normale et Pathologique 129-141.
Chernoff (1954) Rational selection of decision functions. Econometrica 422-443.
Chernoff, Moses Elementary Decision Theory, Wiley, New York; 1954.
Chernoff (1962) The scoring of multiple choice questionnaires. The Annals of Mathematical Statistics 33:375-393.
Cherry (1957) On the validity of applying communication theory to experimental psychology. British Journal of Psychology 48:176-188.
Choppin The correction for guessing on objective tests, International Association for the Evaluation of Educational Achievement, Stockholm; 1974.
Choppin (1975) Guessing the answer on objective tests. British Journal of Educational Psychology 45:206-213.
Choppin (1975) Recent developments in item banking. A review. Montreux, Second International Symposium on Educational Testing.
Choppin (1978) Item banking and the monitoring of achievement. Research in progress, NFER Series.
Clark, Teevan, Ricciuti (1956) Hope of success and fear of failure as aspects of need for achievement. Journal of Abnormal and Social Psychology 53:182-186.
Cooke (1906) Forecasts and verifications in Western Australia. Monthly Weather Review 34:23-24.
Coombs (1950) Psychological scaling without a unit of measurement. Psychological Review 57:145-158.
Coombs (1953) On the use of objective examinations. Educational and Psychological Measurement 13:108-130.
Coombs, Milholland, Womer (1956) The assessment of partial knowledge. Educational and Psychological Measurement 16:13-37.
Coombs, Pruitt (1950) Components of risk in decision making: Probability and variance preferences. Journal of Experimental Psychology 265-277.
Coombs (1960) A Theory of data. Psychological Review 67:143-159.
Coombs, Greenberg, Zinnes (1961) A double law of comparative judgment for the analysis of preferential choice and similarities data. Psychometrika 26:165-171.
Coombs, Bowen (1971) A test of VE theories of risk and the effect of the central limit theorem. Acta Psychologica 35:15-28.
Crawford, Lewy A rapid and efficient method for scoring and analyzing complex multiple choice examinations, National Council on Measurement in Education, Chicago; 1965.
Cronbach (1950) Further evidence on response sets and test design. Educational and Psychological Measurement 10:3-31.
Davis (1959) Use of correction for chance success in test scoring. Journal of Educational Research 52:179-180.
Davis, Fifer (1959) The effect on test reliability and validity of scoring aptitude and achievement tests with weights for every choice. Educational and Psychological Measurement 19:159-170.
Davis Educational Measurements and their Interpretations, Wadsworth, Belmont, CA; 1964.
De Finetti (1937) La prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut Henri Poincaré 7.
De Finetti (1959) Dans quel sens la théorie de la décision est-elle et doit-elle être normative? La Décision, F.N.R.S, FNRS, Paris.
De Finetti (1962) Does it make sense to speak of good probability appraisers?. The Scientist Speculates , I.J. Good, Basic Books, New York; 357-364.
De Finetti (1963) La décision et les probabilités. Revue des Mathématiques pures et appliquées , Bucarest; 405-413.
De Finetti (1965) Methods for discriminating levels of partial knowledge concerning a test item. British Journal of Mathematical and Statistical Psychology 18:87-123.
De Finetti (1970) Logical foundations and measurement of subjective probability. Acta Psychologica 34:129-145.
D'Hainaut (1974) Une méthode de compensation statistique des choix heureux par ignorance dans les questions fermées d'épreuves d'acquisition. Les sciences de l'éducation 7(1):57-83.
Diamond, Evans (1973) The correction for guessing. Review of Educational Research 43:2.
Dressel, Schmid (1953) Some modifications of the multiple-choice item. Educational and Psychological Measurement 13:574-595.
Ebel (1965) Confidence-weighting and test reliability. Journal of Educational Measurement 2:49-57.
Ebel (1968) Review of valid confidence testing demonstration kit. Journal of Educational Measurement 5:353-354.
Ebel (1969) Expected reliability as a function of choices per item. Educational and Psychological Measurement 29:565-570.
Ebert (1971) Sequential decision making An aggregate scheduling methodology. Psychometrika 36.
Echternacht (1972) The use of confidence testing in objective tests. Review of Educational Research 42:217-236.
Edgington (1965) Scoring formulas that correct for guessing. Journal of Experimental Education 33:345-346.
Edwards (1953) Probability preferences in gambling. The American Journal of Psychology 66:349-364.
Edwards (1954) Variance preferences in gambling. American Journal of Psychology 67:441-452.
Edwards (1954) Probability preferences among bets with differing expected values. American Journal of Psychology 67:56-57.
Edwards (1954) The reliability of probability preferences. American Journal of Psychology 67:68-95.
Edwards (1954) The theory of decision making. Psychological Bulletin 51:380-417.
Edwards (1954) Methods for computing uncertainties. American Journal of Psychology 67:164-170.
Edwards (1955) The prediction of decisions among bets. Journal of Experimental Psychology 59:201-214.
Edwards (1960) Measurement of utility and subjective probability. Psychological Scaling: Theory and Applications, H. Gulliksen, Messick, Wiley, New York.
Edwards (1961) Probability learning in 1000 trials. Journal of Experimental Psychology 62(4):385-394.
Edwards (1961) Behavioral Decision theory. Annual Review of Psychology 12:473-498.
Edwards (1962) Subjective probabilities inferred from decisions. Psychological Review 69:109-135.
Edwards (1962) Utility subjective probability their interaction and variance preferences. Journal of Conflict Resolution 6:42-51.
Edwards (1967) Probabilistic information processing by men and man-machine systems. La simulation du comportement humain , Dunod, Paris; 187.
Edwards, Lindman, Phillips (1965) Emerging technologies for making decisions. New Directions in psychology , Holt, Rinehart and Winston, New York; 2:261-325.
Epstein (1969) A scoring system for probability forecasts of ranked categories. Journal of Applied Meteorology 8:985-987.
Fabre (1977) Docimologie et évaluation par questionnaires: étude du jugement multiple et de l'autopondération. Thèse de doctorat de 3e cycle en psychologie, Université de Provence.
Festinger Conflict, Decision and dissonance, Stanford University Press, Stanford; 1964.
Fischer (1977) Tailored testing on the basis of the Rasch model. Paper presented at the 3rd International Symposium on Educational Testing, Leyden.
Greenberg (1963) J scale models for preference behavior. Psychometrika 28(3):265-271.
Hambleton, Roberts, Traub (1970) A Comparison of the reliability and validity of two methods for assessing partial knowledge on a multiple-choice test. Journal of Educational Measurement 7:75-82.
Hamilton (1950) Bias and error in multiple-choice tests. Psychometrika 15:151-168.
Hammerton (1965) The guessing correction in vocabulary tests. British Journal of Educational Psychology 35:249-251.
Hancock, Teevan (1964) Fear of failure and risk-taking behavior. J. Pers. 32:200-209.
Hardy Approche expérimentale du comportement d'estimation et de sa mesure, Unpublished graduate dissertation, University of Liège; 1980.
Hardy Using computer-based feedback to improve estimation ability, IFIP, NCCE, Lausanne; 1981.
Henmon (1911) The relation of the time of a judgment to its accuracy. Psychological Review 18:186-201.
Hevner (1932) A method of correcting for guessing in true-false tests and empirical evidence in support of it. The Journal of Social Psychology 3:359-362.
Hollingworth (1913) Experimental studies in judgment. Archives of Psychology 29:1-119.
Hopkins (1964) Extrinsic reliability estimating and attenuating variance from response styles chance and other irrelevant sources. Educational and Psychological Measurement 24:271-281.
Hopkins, Hakstian, Hopkins (1973) Validity and reliability consequences of confidence weighting. Educational and Psychological Measurement 33:135-141.
Horst (1933) The difficulty of a multiple-choice test item. Journal of Educational Psychology 24:229-232.
Houziaux (1965) Les fonctions didactiques de DOCEO. Actes du XII Colloque de l'association internationale de pédagogie expérimentale de langue française, University of Caen; 47-71.
Houziaux Vers l'enseignement assisté par ordinateur, Presses Universitaires Francaises, Paris; 1972.
Houziaux, Godart, Lavigne, Bartholome, Luyckx, Lefebvre (1978) Une expérience d'enseignement assisté par ordinateur chez des patients diabétiques insulinodépendants. Scientia Paedagogica Experimentalis 15:215-250.
Hurwicz (1951) Optimality Criteria for Decision Making under Ignorance. Technical report no. 70, Cowles commission discussion paper, Statistics.
Isaacson (1964) Relation between achievement, test anxiety and curricular choices. Journal of Abnormal and Social Psychology 68:447-452.
Irwin, Smith (1957) Value, cost and information as determiners of decision. Journal of Experimental Psychology 54:229-232.
Jacobs (1968) An empirical investigation of the relationship between selected aspects of personality and confidence-weighting behaviors. Doctoral dissertation, University of Maryland, University Microfilms; 68-16,676.
Jacobs (1971) Correlates of unwarranted confidence in response to objective test items. Journal of Educational Measurement 8:1.
Jungermann, Dezeeuw (1977) Decision making and change in human affairs. Proceedings of the fifth research conference on subjective probability, utility and decision making, Darmstadt, Reidel.
Kido (1970) The utilization of subjective probabilities in production planning. Acta Psychologica 34:338-347.
Koehler (1971) A comparison of the validities of conventional choice testing and various confidence marking procedures. Journal of Educational Measurement 8:4.
Kuder (1950) Identifying the faker. Personnel Psychology 3:155-167.
Leclercq (1977) Sequential adaptive tailored testing and confidence marking. Psychometrics for Educational Debates: Proceedings of the 3rd International Symposium on Education Testing , Vanderkamp, Langerak, De Gruyter; 306.
Leclercq (1977) Concepts, procedures and coefficients to be used with confidence marking. Paper presented at the 8th European Mathematical Psychology meeting, Saarbrücken.
Leclercq (1977) L̇ Matrices or the computation of consequences for confidence marking procedures in educational settings; rationale, algorithm and FORTRAN program. Paper presented at the 6th Research Conference on Subjective Probability, Utility and Decision-Making, Warsaw.
Leclercq (1979) Test-retest replication and spontaneous acuity of subjective probabilities; results from a guessing game. Paper presented at the 7th research conference on subjective probability, utility and decision-making, Gothenburg.
Leclercq (1978) Un module d'auto-évaluation ou Comment impliquer l'étudiant dans la régulation de ses apprentissages. Education (165):59-73.
Leclercq (1978) L'Auto-évaluation des compétences dans le domaine cognitif. Revue, 13e année, February; (2):3-20.
Leclercq (1980) Computerised tailored testing Structured and calibrated item banks for summative and formative evaluation. European Journal of Education 15(3).
Lefebvre, Houziaux (1969) Anamnèse assistée par ordinateur en diabétologie: résultats préliminaires. Revue Médicale de Liège 24:803-809.
Lewy, McGuire (1966) A study of alternative approaches in estimating the reliability of conventional tests. Paper presented at the AERA annual meeting, Chicago.
Lieblich (1968) The effect of Stress and the motivation to succeed on test risk. Journal of Personality 36:608-615.
Lichtenstein, Fischhoff, Phillips Calibration of probabilities: The state of the art, Jungermann, DeZeeuw; 1977.
Linder, Wortman, Brehm (1971) Temporal changes in predecision preferences among choice alternatives. Journal of Personality and Social Psychology 19:282-284.
Lindley Introduction to probability and statistics from a Bayesian viewpoint, Part 1: Probability, Cambridge University Press, London; 1969.
Lindley Introduction to Probability… Part 2: Inference, Cambridge University Press, London; 1970.
Lindley Making decisions, Wiley, London; 1971.
Littig The Effect of Motivation on Probability Preference and Subjective Personality, University of Michigan; 1959.
Lord (1963) Formula scoring and validity. Educational and Psychological Measurement 23:663-672.
Lord (1964) The effect of random guessing on test validity. Educational and Psychological Measurement 24:745-747.
Lord, Novick Statistical Theories of Mental Test Scores, Addison-Wesley, Reading, MA; 1968.
Lord Some test theory for tailored testing, W.H. Holtzman; 1970.
Lord (1970) The self-scoring flexilevel test. E.T.S. Research Bulletin 70:43.
Lovie, Davies (1970) The effect of rate of revision and initial revision on the perception of another's age. Acta Psychologica 34:322-327.
Luce Individual Choice Behavior, Wiley, New York; 1959.
Luce, Raiffa Games and Decision, Wiley, New York; 1966.
Lumingu Etude préalable à la construction d'un test diagnostique sur la consultation du dictionnaire, Unpublished thesis, University of Liège; 1974.
Lyerly (1951) A Note for correcting for chance success in objective tests. Psychometrika 16:21-30.
Manz (1970) Experiments on probabilistic information processing. Acta Psychologica 34:184-200.
Martin Bayesian Decision Problems and Markov Chains, Wiley, New York; 1967.
Massengill, Shuford What Pupils and Teachers Should Know About Guessing, Shuford-Massengill Corp, Lexington, MA; 1967.
Massengill, Shuford A Report on the Effect of Degree of Confidence in Student Teaching, U.S. Air Force, Office of Scientific Research; 1968.
Medley (1966) The effects of heterogeneity of content and guessing on the accuracy of scores in multiple-choice tests. American Educational Research Journal 3:27-33.
Mellenbergh (1967) Nieuwe Ervaringen met een Zekerheidsaanduiding. Ned. T. Psychol. 22:168-181.
Meuwese, Barendregt, Vastenhout (1960) Een onderzoek naar de relatie tussen de juistheid van oordelen en het begeleidend gevoel van zekerheid. Ned. T. Psychol. 15:529-541.
McClelland Studies in Motivation, Appleton, New York; 1955.
McNeel, Messick (1970) A Bayesian analysis of subjective probabilities of interpersonal relationships. Acta Psychologica 34:311-321.
Michael (1968) The reliability of a multiple-choice examination under various test-making instructions. Journal of Educational Measurement 5:307-314.
Miller (1956) The magical number seven, plus or minus two. Psychological Review 63:81-97.
Murphy (1966) A note on the utility of probabilistic predictions and the probability score in the cost-loss ratio decision situation. Journal of Applied Meteorology 5:534-537.
Murphy The evaluation of probabilistic predictions in meteorology, Unpublished doctoral dissertation, University of Michigan; 1969.
Murphy (1969) Measures of the utility of probabilistic predictions in cost-loss ratio decision situations in which knowledge of the cost-loss ratio is incomplete. Journal of Applied Meteorology 8:863-873.
Murphy (1970) The ranked probability score and the probability score A comparison. Monthly Weather Review 98.
Murphy (1969) On expected-utility measures in cost-loss ratio decision situations. Journal of Applied Meteorology 8:989-991.
Murphy (1972) Scalar and vector partitions of the probability score (Part 1) Two-state situation. Journal of Applied Meteorology 11:273-282.
Murphy (1973) A new vector partition of the probability score. Journal of Applied Meteorology 12:595-600.
Murphy (1974) A sample skill score for probability forecasts. Monthly Weather Review 102:48-55.
Murphy, Epstein (1967) Verification of probabilistic predictions A brief review. Journal of Applied Meteorology 6:748-755.
Myers (1965) Risk taking and academic success and their relation to an objective measure of achievement motivation. Educational and Psychological Measurement 25:355-363.
Oskamp (1962) The relationship of clinical experience and training methods to several criteria of clinical prediction. Psychological Monographs: General and Applied 76.
Pitz (1974) Subjective probability distributions for imperfectly known quantities. Knowledge and Cognition, L.W. Gregg, Wiley, New York.
Raiffa Decision Analysis, Introductory Lectures on Choice under Uncertainty, Addison-Wesley, New York; 1970.
Richelle (1970) Malentendus sur les apports du conditionnement. Rev. Comp. Animal 4(1):22-31.
Rippey (1968) A Fortran Program for scoring and analyzing probabilistic tests. Behavioral Science 13:424.
Rippey (1968) Probabilistic testing. Journal of Educational Measurement 5:211-215.
Rippey (1970) A comparison of five different scoring functions for confidence tests. Journal of Educational Measurement 7:3.
Rouanet (1961) Études de décisions expérimentales et calcul de probabilités. La décision, C.N.R.S, Paris; 33-43.
Ruch, Stoddard (1925) Comparative reliabilities of five types of objective examinations. Journal of Educational Psychology 16:89-103.
Ruch, DeGraaff (1926) Corrections for chance and "guess" vs. "do not guess" instructions in multiple-choice tests. Journal of Educational Psychology 17:368-375.
Sandbergen (1968) Test strategie/test strategy. Ned. T. Psychol. 23:16-38.
Sandbergen Meningen van Studenten over Zekerheidscoring/Students Opinions about Confidence Marking, R.I.T.P. memorandum (unpublished); 1972.
Sandbergen Guessing and confidence in testing educational achievement, In Choppin, B. (A/106 IEA Memorandum); 1972.
Savage The Foundations of Statistics, Wiley, New York; 1951.
Savage (1971) Elicitation of personal probabilities and expectations. Journal of the American Statistical Association 66(336):783-801.
Schaefer, Borcherding (1973) The assessment of subjective probability distribution: A training experiment. Acta Psychologica 37:117-129.
Schum, Goldstein, Howell, Southard (1967) Subjective probability under several cost payoff arrangements. Org. Behav. Hum. Perform. 2:84-104.
Shannon (1948) A mathematical theory of communication. Bell System Technical Journal 27.
Shannon, Weaver The Mathematical Theory of Communication, University of Illinois Press; 1949.
Shannon (1951) Prediction and entropy of printed English. Bell System Technical Journal 30:50-64.
Sherrifs, Boomer (1954) Who is penalized by the penalty for guessing. Journal of Educational Psychology 45:81-90.
Shuford (1967) How to Shorten a Test and Increase its Reliability and Validity. Technical Report SCM R-8, Shuford-Massengill Corporation, Lexington.
Shuford (1969) Systems of confidence weighting, theory, and practice. Los Angeles, Annual Meeting of the American Educational Research Association.
Shuford, Albert, Massengill (1966) Admissible probability measurement procedures. Psychometrika 31:125-145.
Shuford, Brown (1975) Elicitation of personal probabilities and their assessment. Instructional Science 4:137-188.
Sidman (1953) Avoidance conditioning with brief shock and no exteroceptive warning signal. Science.
Siegel, Siegel, McMichael Choice, Strategy and Utility, McGraw-Hill, New York; 1961.
Slakter (1967) Risk-taking on objective examinations. American Educational Research Journal 4:31-43.
Slakter (1968) The penalty for not guessing. Journal of Educational Measurement 5:141-144.
Slovic (1962) Convergent validation of risk taking measures. Journal of Abnormal and Social Psychology 65:68-71.
Slovic, Lichtenstein, Edwards (1968) Boredom induced changes in preferences among bets. The American Journal of Psychology 78:208-217.
Slovic, Lichtenstein (1968) Relative importance of probabilities and payoffs in risk taking. Journal of Exper. Psych. 78.
Smith (1964) Relationship between achievement-related motives and intelligence, performance level, and persistence. J. Abnorm. Soc. Psych. 68:523-533.
Smith (1970) An empirical investigation of complexity and process in multiple-choice items. Journal of Educational Measurement 7:1.
Soderquist (1936) A new method of weighting scores in a true-false test. Journal of Educational Research 30:290-292.
Solomon Studies in Item Analysis and Prediction, Stanford University Press; 1961.
Stanley, Wang (1968) Differential Weighting. A Survey of Methods and Empirical Studies, College Entrance Exam. Board, New York.
Stanley, Wang (1970) Weighting test items and test-item opinions. Educational and Psychological Measurement 30:21-35.
Stevens Handbook of Experimental Psychology, Wiley, New York; 1951.
Stevens (1957) On the psychophysical law. Psychological Review 64:153-181.
Stevens (1959) Measurement, psychophysics and utility. Measurement, Definitions and Theories, C.W. Churchman, P. Ratoosh, Wiley, New York.
Stevens (1962) The surprising simplicity of sensory metrics. American Psychologist 17:29-39.
Swineford (1938) The measurement of a personality trait. Journal of Educational Psychology 29:289-292.
Swineford (1941) Analysis of a personality trait. Journal of Educational Psychology 32:438-444.
Swineford, Miller (1953) Effects of directions regarding guessing on item statistics of a multiple-choice vocabulary test. Journal of Educational Psychology 44:129-133.
Tables of the Cumulative Binomial Probability Distribution, Harvard Univ. Press, Cambridge, Mass; 1955.
Thorndike, Hagen Measurement and Evaluation in Psychology and Education, 3rd ed., Wiley, New York; 1969.
Thorndike Educational Measurement, 2nd ed., Amer. Council on Education; 1971.
Thrall, Coombs, Davies Decision Processes, Wiley, New York; 1954.
Tiberghien (1968) Etude de la certitude du rappel au cours d'un apprentissage verbal. Année psychol. 18:32-39.
Torgerson Theory and Methods of Scaling, Wiley, New York; 1967.
Trow (1923) The psychology of confidence an experimental inquiry. Archiv für Psychiatrie und Nervenkrankheiten 67:1-47.
Tversky, Kahneman (1974) Judgement under uncertainty: Heuristics and biases. Science 185:1124-1131.
Van Naerssen (1962) A scale for the measurement of subjective probability. Acta Psychologica , 2; 20:159-166.
Van Naerssen, Van Beaumont (1965) Ervaringen met een Zekerheidsaanduiding bij objektieve Tentamens. Ned. T. Psychol. 20:308-315.
Van Naerssen, Sandbergen, Bruynis (1966) Is de Utiliteitscurve van Examenscores een Ogief?. Ned. T. Psychol. 21(6):358-363.
Von Neumann, Morgenstern Theory of Games and Economic Behavior, Princeton Univ. Press; 1947.
Votaw (1936) The effect of Do-Not-Guess directions upon the validity of true-false or multiple-choice tests. Journal of Educational Psychology 28:698-703.
Waters, Waters (1971) Validity and likeability ratings for three scoring instructions for a multiple-choice vocabulary test. Educational and Psychological Measurement 31:935-938.
Wiley, Trimble (1936) The ordinary objective test as a possible criterion of certain personality traits. School and Society 43:446-448.
Williamson (1964) Assessing clinical judgment. J. of Medical Educ. 39:893.
Winkler (1967) The quantification of judgment Some methodological suggestions. Journal of the American Statistical Association 62:1105-1120.
Winkler (1969) Scoring rules and the evaluation of probability assessors. Journal of the American Statistical Association 64:1073-1078.
Winkler, Murphy (1968) “Good” probability assessors. Journal of Applied Meteorology 7.
Winkler (1970) Nonlinear utility and the probability score. Journal of Applied Meteorology 9:143-148.
Wood (1977) Multiple choice: A state of the art report. Evaluation in Education: International Progress, B. Choppin, T.N. Postlethwaite, Pergamon.
Wright, Stone Best Test Design, Mesa Press, Chicago; 1979.
Ziller (1957) A measure of the gambling response set in objective tests. Psychometrika 22:289-292.