A SHORT GUIDE TO THE DEVELOPMENT OF PERFORMANCE TESTS

United States Civil Service Commission
Bureau of Policies and Standards

ABSTRACT

This pamphlet was written as a source document for the Psychologists in the Regional and Central offices of the U. S. Civil Service Commission on the development of performance tests, particularly those tests which measure necessary physical and/or psychomotor skills for a job. It summarizes the literature and the experiences of experts in this area.

Objective 408
Professional Series 75-1

A SHORT GUIDE TO THE DEVELOPMENT OF PERFORMANCE TESTS

Lynnette B. Plumlee, Ph. D.
Test Services Section
Personnel Research and Development Center
United States Civil Service Commission
Washington, D. C. 20415

January 1975

TABLE OF CONTENTS

                                                             Page
Introduction                                                    1
Purposes of Performance Testing                                 1
Deciding whether to Use a Performance Test                      2
Overview of Performance Test Development                        4
Job Analysis                                                    5
Planning the Test                                              10
Setting Specific Test Tasks                                    18
Establishing Rating Procedures for Work Sample and
  Simulated Work Sample Tests                                  22
Tryout                                                         26
Tryout Analysis and Use of Analysis Results                    31
Final Test Preparation                                         37
Final Test Analysis                                            37
Validation                                                     38
Charts and Exhibits (Pages 43-52)
  Chart A. Illustrative Test Techniques                        43
  Chart B. Sample Performance Testing Techniques
    Illustrated in the Literature                              45
  Chart C. Sample Test Specifications                          49
  Exhibit 1. Rating Form                                       51
  Exhibit 2. Summary Tryout Analysis Form                      52
Selected Bibliography                                          54
References                                                     58

PREFACE

This Guide was designed primarily to provide Psychologists in the Regional and Central offices of the U. S. Civil Service Commission with a brief source document which summarizes the procedures for developing performance tests. It was also designed to serve as a source document for other behavioral scientists, such as those in State and local governments. Since these scientists may or may not be Psychologists, they are referred to as Measurement Specialists (MS) throughout this Guide.

The Guide represents an effort on the part of the author to bring together techniques and procedures used by those who have been extensively involved in test development. References have been made to tests and other sources which will provide further background material.

The author has expressed her acknowledgement for the helpful suggestions offered by Dr. Frank Schmidt, Mrs. Luzelle Mays, Mr. John Kraft, and others of the Personnel Research and Development Center of the U. S. Civil Service Commission.

Dr. Plumlee has a Ph. D. in Psychology from the University of Chicago. Formerly she was Director of the Test Development Division of the Educational Testing Service. Currently she is in private practice as a consultant on test development and validation procedures. She prepared the Guide in fulfillment of Purchase Order 74-2248.

Introduction

This pamphlet summarizes the principles, problems, and procedures of performance test development and use. Its purpose is to assist the Measurement Specialist (MS) working with Civil Service selection problems in deciding when and how to use performance tests.
The term "performance test" has been used to designate a wide variety of tests I principally those which are not heavily dependent on intellectual skills. This pamphlet considers primarily tests designed to determine whether an applicant has the necessary physical and/or psychomotor skills for a job. It will discuss work sample tests~ simulated work sample tests I and tests of motor and sensory skills related to typical jobs. It will not consider in depth such situational tests as those commonly used in assessment centers I although these are mentioned.. Performance tests are also used as a training technique and in the evaluation of training but the primary focus here will be on their use in selecting personnel. There are several excellent sources of information on performance testing which provide more detailed discussions of this type of test and which include illustrations of a variety of techniques. References will be provided to some of these techniques in later It is suggested that those planning to develop performance sections. tests review the following publications: Gagne and Fleishman (1959 1 Boyd and Shimberg (1971) I and Fitzpatrick and Chaps. 11 and 12) I Morrison (1971). The material in Adkins (1947) (now out of print) is helpful if the reader can obtain a copy. Purposes of Performance Testing in Personnel Selection Since performance tests are generally more costly and time consuming to administer than paper-and-pencil tests I such tests will probably be used only when one of the following circumstances exists: 1 • The skill being tested does not depend on verbal ability; e.g. I skill in reading printed questions is irrelevant to the skills required on the job. 2. Ability to perform the job cannot be measured by paper-and-pencil tests; e.g., a required skill in handling tools and equipment cannot be determined by paper-and-pencil tests. 3. Not all those hired on the basis of training and experience records and paper-and-pencil tests are capable of learning the job. 4. There are too few candidates for a criterion-related validity study and a work sample test lends itself more readily to content validity. The use of performance tests is not limited to manual jobs, though such use has been more common in this area. The most familiar performance tests are the stenographic and typing skills tests. Performance tests of non-manual skills include those designed to measure leadership, administrative, and diagnostic skills. Examples of various techniques and uses are described in Chart A. Deciding whether to Use a Performance Test The first step in considering the use of measurement procedures for evaluating job applicants is a job analysis. This analysis will provide a description of tasks to be performed and the relevant knowledges, skills, abilities, and other worker characteristics (KSA' s) required to carry out the tasks. Using the job analysis, the MS then determines whether each skill is needed on entry or whether training to attain the skill will be provided after entry. If a skill can be learned with ease by some but with difficulty by others, one may be justified in requiring this skill on entry or an aptitude for learning it. Even where skills are easily learned, if learning a composite set of such skills would take considerable time, it may be reasonable to require that some of these be brought to the job. 
In choosing the particular selection procedures for identifying candidates who possess the necessary developed skills or aptitudes, the following factors will normally be considered:

1. Evidence that the procedure will be effective in identifying those with the necessary skills.

2. Cost of development and use (relative to the gain to be expected in productivity over use of a less costly procedure).

3. Acceptability to those tested.

4. Cost of validation.

Those qualities which can be satisfactorily measured and validated by paper-and-pencil tests will normally be tested by such means. The MS may decide that some essential skills cannot be measured adequately by paper-and-pencil tests. He may conclude that the ability to perform some critical motor tasks can be measured only by performance measures; or he may decide that no suitable tests are available and that the small number of candidates makes a work sample test more practical.

Performance tests are ordinarily more costly than paper-and-pencil tests if many examinees are involved. Some costs in performance testing are fairly obvious: cost of equipment, individual administration, and material use. Other costs may be less apparent: staff time in detailed skill analysis, content validation, and preparation of the rating system; loss of security (due to discussion among examinees) and hence the need for parallel sets of tasks; worker time lost in experimental tryout; training of administrators, etc.

Nevertheless, the cost of performance testing relative to paper-and-pencil testing diminishes as the number of candidates decreases. If the test is to be used only once and only one administrator is required, then existing equipment may be usable and other costs may be counterbalanced by the cost of developing and validating a paper-and-pencil test for a handful of individuals. Also, although some performance tests are fairly elaborate, not all such tests need be. The typing and shorthand tests are examples of relatively simple performance tests. A sampling of the skills required by the job may be sufficiently representative of the total job that it is not necessary to test all skills.

One must weigh as a cost factor the relative effectiveness of alternative selection procedures. What would be the cost of hiring a person who could "pass" the paper-and-pencil test but lacks the necessary psychomotor skills to perform the job? Such evaluation of relative costs is not easy, but it may be possible to gain some insight through analysis of past hiring experience and informal tryout before undertaking a full-scale development project. Nor should one overlook the cost of defending a paper-and-pencil test which lacks face validity should it be challenged.
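The cost comparison above can be roughed out numerically. The following minimal sketch (in Python) assumes that each procedure's cost splits into a one-time development or validation cost and a per-candidate administration cost; all dollar figures are hypothetical illustrations, not estimates from this Guide.

    def total_cost(development, per_candidate, n_candidates):
        # One-time development/validation cost plus per-candidate administration.
        return development + per_candidate * n_candidates

    for n in (5, 25, 100, 500):
        performance = total_cost(2000, 40, n)   # existing equipment, but costly to administer
        paper = total_cost(8000, 2, n)          # costly to develop and validate, cheap to give
        better = "performance" if performance < paper else "paper-and-pencil"
        print(f"{n:3d} candidates: performance ${performance:,}, paper ${paper:,} -> {better}")

Under these assumed figures, the performance test is the cheaper procedure for small applicant groups and the paper-and-pencil test becomes cheaper as the number of candidates grows, which is the pattern described above.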
On the other hand, in using a performance test to facilitate validation through content validity, one should not assume that, because the particular test task appears related to some phase of the job, the test is representative of the job. The circumstances of administration may alter the relationship between test and job, and the sampling of tasks may not measure certain critical elements of the job. Also, abilities tested by the performance test may, on statistical analysis, prove to contribute little or nothing beyond the measurement provided by paper-and-pencil tests and may even contribute less (Cronbach, 1970, pp. 388-394). (The problems of validation are considered at greater length in a later section.)

Chart B provides a literature guide to illustrations of various performance testing techniques which have been devised for specific jobs.

Overview of Performance Test Development

Where a performance test is to be used, the relevant job tasks must be analyzed in considerable detail, unless actual segments of the job are being used as the test. A decision must be made regarding which job tasks will be tested, and whether they will be tested in order to determine the developed skill or the aptitude for developing the skill. For the test of developed skill, the sample of job tasks is converted to a set of test tasks. For the test of aptitude, the sample of job tasks is analyzed for the psychomotor and sensory skills required to develop skill on the job, and tasks requiring these same underlying skills are devised.

Instructions for administering the test are prepared. Decisions are made regarding standards for acceptable performance and the relative weight to be assigned to quality, speed, and other relevant criteria. Tentative scoring or rating procedures are developed and the test is administered on a tryout basis. An analysis of tryout results will indicate whether each task will discriminate among examinees. It will also provide information on the difficulty of the task and the consistency with which different raters rate a candidate. On the basis of this tryout, some tasks may be revised and a final selection will be made. Post-test analysis is part of the test development procedure and includes a determination of rater reliability and the appropriateness of test difficulty. Finally, the validity of the test must be shown using content or criterion-related validation strategies, as appropriate. These steps will be discussed at greater length in succeeding sections.

Job Analysis

Some job analysis procedures have been described in the document Job Analysis, Developing and Documenting Data, BIPP 152-35, December 1973, published by the U. S. Civil Service Commission.* Examples of job descriptions are available in the Dictionary of Occupational Titles, published by the U. S. Department of Labor. These descriptions tend to be generalized in order to make them usable by a number of users. If these descriptions are used, one should compare them step by step with the job under consideration. Also, Adkins (1947, pp. 216-224), Boyd and Shimberg (1971, pp. 12-20), and Fitzpatrick and Morrison (1971, pp. 256-257) give examples of or describe job analyses directed at developing performance tests.

* The Personnel Research and Development Center does not recommend any one method of job analysis. Each has advantages and disadvantages depending on its intended use. The methods mentioned here are for illustration only; they may not be appropriate for some users.

The Department of Labor's Handbook for Analyzing Jobs describes in more detail one process of analyzing jobs and provides many examples. It provides an extensive list of worker functions and an analysis of worker traits by training time, aptitudes, temperaments, interest, and physical demands. Data are given in the aptitude section by level of skill required and may be useful in detailing the job requirements.
In analyzing jobs as a basis for psychomotor tests other than work samples or simulated work samples, the analysis should be performed by one who is trained in such analysis and who has had experience in discriminating among skill requirements. Other discussions of job analysis techniques are found in Gagne and Fleishman (1959, pp. 344-348) and Morsh (1967, pp. 4-11).

The job description which results from a job analysis should include:*

1. What the worker does.
2. Procedures performed by the worker.
3. Equipment and tools used by the worker.
4. End product sought.
5. KSA's required, including entry level requirements.
6. Relative importance of job tasks and KSA's.
7. Documentation of procedures used in obtaining the foregoing information as well as supporting evidence for any standards established.

* See Job Analysis, Developing and Documenting Data, BIPP 152-35, p. 9ff. for a more detailed outline and discussion. (In a few methods of job analysis, steps 1-4 are not performed. The KSA's are directly determined, such as those required for the barely acceptable and the superior workers.)
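The seven elements above amount to a structured record, and it may help to see them laid out as one. The following minimal sketch in Python uses a hypothetical job; the field names and entries are illustrations, not a prescribed format.

    # Hypothetical job-description record following the seven-point outline above.
    job_description = {
        "what_worker_does": "assembles miniature electronic components",
        "procedures":       ["inserting lead wires", "bonding", "soldering"],
        "equipment_tools":  ["tweezers", "magnifier", "soldering iron"],
        "end_product":      "populated assembly board",
        "ksas": [
            {"ksa": "finger dexterity",         "required_on_entry": True},
            {"ksa": "soldering technique",      "required_on_entry": True},
            {"ksa": "local layout conventions", "required_on_entry": False},  # trained after entry
        ],
        "importance_ratings": {},   # task and KSA weights, assigned as discussed below
        "documentation": "interview notes, observation records, evidence for standards",
    }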
Certain problems in developing job descriptions may be encountered, such as the following:

1. Deciding the level of generality of description. The definition of the KSA's must be general enough to apply to all those who perform the specific job for which the performance test is being developed. It must be specific enough to identify the essential skills.

Example:

Too general - Assembles microelectronic equipment.

More specific - Inserts and attaches, by bonding or soldering, lead wires of electronic components, including diodes, transistors, resistors, and capacitors, into miniature assembly boards, using tweezers and magnifier.

Too specific - Inserts and solders lead wires of diodes, transistors, resistors, and capacitors into printed circuit board No. 78 for the 510 computer assembly.

2. Verifying the job description. The analyst should be aware of the following sources of error:

a. Shift in task requirements over time--work schedules and needs may be such that certain tasks will be performed only seasonally; or there may be changes in work requirements from one year to the next.

b. Observer bias--the person observing the job may have more familiarity with certain aspects of the job and emphasize these over those less familiar.

c. Employee bias--the employee being interviewed may attribute a larger percent of work time to those tasks he dislikes; he may focus on those he most enjoys; or he may overlook important details which have become routine and automatic.

To increase the likelihood that the job description is a true representation of the job, one should obtain information from independent sources or by independent means. These should be compared to make certain that they are in essential agreement.

3. Assigning importance weights to job tasks and/or KSA's. This is a complex procedure which should be carried out with care. There are several ways that this can be done. One method is described in the U. S. Civil Service Commission's Job Analysis Manual and uses four categories of importance: frequency, time required, level of difficulty, and consequence of error. To obtain estimates of these, the Manual recommends having workers, supervisors, and others familiar with the job rate job tasks according to each category. It also suggests methods of rating. If the number of job tasks is small, the tasks may be ranked within each category.

When the number of job tasks is large, a ranking of all tasks can become unwieldy, especially in the middle range where distinctions are difficult to make. A modified ranking approach can sometimes be used whereby the evaluator assigns a proportion of tasks to each of 5 or more ranks, as follows: A scale is established from most to least important, with the proportion to be assigned to each category determined on the basis of an assumed normal distribution of importance. If five categories are used, for example, proportions would be 7%, 24%, 38%, 24%, and 7%. In this procedure, the evaluator first makes a rough assignment to each of the 5 categories and then compares tasks in adjacent categories to make the necessary adjustments toward achieving the desired increment in importance and the specified proportions.

The MS may find it more satisfactory to set up a rating scale for use by those evaluating importance when the number of tasks is large. For example, the scale for frequency might be: required several times a day; required on a daily basis; required weekly; required several times a year; rarely required. In interpreting the results of such a rating scale, one should keep in mind that, although a task performed three times a day is important, one performed once a month may also be critical to adequate performance of the job.

4. Determining reliability of the job description. Whether ratings or rankings are used, it is important to use independent evaluations by different evaluators as a basis for establishing reliability of the ratings. If each task is ranked individually, a rank order correlation may be used between pairs of evaluators, or a combined rank order correlation across more than two evaluators may be obtained by the formula:

    rho = 1 - 12(sum of D^2) / [n(n - 1)N(N^2 - 1)],

where D = the difference between ratings of two evaluators; n = the number of raters; and N = the number of tasks ranked (Adkins, 1947, p. 234). If ratings or grouped ranks are used, a Pearson product-moment correlation may be obtained between pairs of evaluators across tasks.

If two or more evaluators are familiar with the job and qualified to rate task importance, a combination of their ratings or rankings will ordinarily be more reliable (and more valid) than those of the individual evaluator. If ratings are added to produce a combined rating, the reliability of the combination may be estimated by use of the Spearman-Brown formula:

    r = 2r_xy / (1 + r_xy),

where r_xy is the correlation between evaluators. (This should be a low estimate if the reliabilities of both ratings exceed the inter-rater correlation.) If one rater lacks familiarity with the job and its component tasks, combining the ratings could result in a reliability lower than that of a single qualified rater. (A computational sketch of both formulas follows.)
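The two formulas above can be computed directly. The sketch below, in Python, reads the combined formula as summing D squared over every pair of evaluators, an interpretation which makes it reduce to the ordinary Spearman rank correlation when n = 2; the ranking data in the example are hypothetical.

    from itertools import combinations

    def combined_rank_correlation(rankings):
        """rankings: one list of ranks per evaluator, each ranking the same N tasks."""
        n = len(rankings)                    # number of evaluators
        N = len(rankings[0])                 # number of tasks ranked
        sum_d2 = sum((a[i] - b[i]) ** 2      # D^2 for every pair of evaluators,
                     for a, b in combinations(rankings, 2)
                     for i in range(N))      # summed over every task
        return 1 - 12 * sum_d2 / (n * (n - 1) * N * (N ** 2 - 1))

    def spearman_brown(r_xy):
        """Estimated reliability of two evaluators' summed ratings."""
        return 2 * r_xy / (1 + r_xy)

    # Hypothetical example: three evaluators ranking five tasks.
    ranks = [[1, 2, 3, 4, 5],
             [2, 1, 3, 5, 4],
             [1, 3, 2, 4, 5]]
    print(round(combined_rank_correlation(ranks), 3))   # 0.767
    print(round(spearman_brown(0.60), 3))               # 0.75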
Planning the Test

Planning the test

Responsibility for planning the test development project and carrying out the test development process may rest with a panel of persons familiar with the job, under the guidance of the MS; or the MS may take major responsibility, calling on the experts for help as needed. (See Plumlee, 1974, for a more complete discussion of these techniques.) The MS will ordinarily wish to assemble a group of subject-matter experts at the job analysis and planning stage to assist in determining those job tasks to be tested.

A panel is likely to be used for development of the test when it is to be employed by two or more different agencies and when developed job skills are being tested rather than aptitude for learning the skill. When the test is to be used by different agencies, the decisions necessary for developing a test suitable for all work units can be made more readily in a panel meeting with representatives from the various units than by the MS meeting with these experts individually.

Determination of skills to be tested

Whether or not the MS uses a panel in test development, it will be helpful to have supervisors and key employees identify those characteristics of job performance (i.e., how the job is done by the employee) which indicate top level performance and those which indicate a minimally acceptable level. This may be facilitated by asking the supervisor to identify his best workers and those most needing improvement and to analyze the differences in the ways they do the job. This will help assure that those skills which make the difference between successful and less successful performance are included. As a separate step, it may be helpful to go through the job sequence with the supervisor and employee and ask which operations give the most trouble. On the basis of this information a decision is then made regarding the skills which should be tested. Some job analysts perform the job to determine skill requirements, but a danger then exists of identifying skills required by the novice rather than by the experienced worker. (See Fleishman, 1962, p. 146 ff.)

If a decision is made to test psychomotor or sensory skills, the analysis should provide evidence indicating how the specific skills were identified as critical. Psychomotor skills will normally be tested apart from a work sample only when candidates have not yet been trained for the specific job; hence a distinction must be drawn between those skills which are normally developed by all and/or improved on the job and those which are possessed by some but not by all candidates and are basic to learning the job. Fleishman defines these basic skills as "abilities"--"fairly enduring traits which, in the adult, are more difficult to change" (Fleishman, 1964, p. 9). It is these abilities which will ordinarily be measured by psychomotor tests. Fleishman reserves the term "skill" for identifying "level of proficiency on a specific task or limited group of tasks." On the basis of extensive factor analysis research, Fleishman identified the following more important psychomotor factors and described tests which may be used to measure them: control precision, multilimb coordination, response orientation, reaction time, speed of arm movement, rate control, manual dexterity, finger dexterity, arm-hand steadiness, wrist-finger speed, and aiming (Fleishman, 1964, pp. 16-26). The MS who is considering the measurement of psychomotor skills will find it instructive to read Fleishman's article on "The Description and Prediction of Perceptual-Motor Skill Learning" (Fleishman, 1962, Chap. 5). He presents a summary of research on validity of psychomotor tests for predicting job performance and discusses the problems in obtaining validity. He provides evidence, for example, that the importance of specific psychomotor abilities in contributing to performance on complex tasks changes as training progresses (p. 146 ff.).

Determination of the most efficient measurement techniques

Chart A shows some of the measurement techniques used for selection purposes.
1. Paper-and-pencil tests

Since the present pamphlet is concerned only with a candidate's physical and/or psychomotor skills relative to the job, little attention will be given to the non-psychomotor paper-and-pencil tests. The MS, however, should be familiar with these tests as job-relevant measurement techniques.

a. Tests of knowledge

Knowledge required for jobs which demand physical skills can sometimes be tested with pictorial items and minimal or no written instructions. These picture-identification tests may require such responses as matching tools with jobs or showing proper and improper procedures and asking the candidate to specify the correct procedure. Oral tests have also been found to be very useful in testing knowledge. The use of these two types of tests may enable the examiner to test more broadly the knowledge required of the candidate than would be possible in a simulated work situation. Where trade knowledge can be tested by paper-and-pencil tests, it may be more efficient to use these before giving candidates a performance test which requires such knowledge. This avoids use of the more costly performance test for some who are not prepared to take it.

b. Tests of diagnostic skills

The Tab Item was developed by Glaser, Damrin, and Gardner for diagnostic type jobs, such as TV repairman (Fitzpatrick and Morrison, 1971, pp. 248-250*). The examinee is given a list of possible checking steps from which he selects the steps he would take, in the order in which he would take them. As he chooses a step, he lifts the tab (or erases a coating) which covers the results of making the check. (For instance, choosing the step "Check the voltage of the power supply" leads to the information beneath the tab: "30 volts." The candidate adds this to his fund of knowledge about the problem and chooses another step.) The number of tabs lifted and the kind of steps taken are used to indicate the examinee's proficiency in diagnosing the problem. This approach has also been used with medical students in testing their diagnostic skills (Boyd and Shimberg, 1971, pp. 62-65). (A small computational sketch of this bookkeeping appears at the end of this list of techniques.)

* The following source publication was written by the Tab Item's developers: Glaser, R., Damrin, D. E., and Gardner, F. M., "The Tab Item: a technique for the measurement of proficiency in diagnostic problem-solving tasks." Urbana: University of Illinois, College of Education, Bureau of Research and Service, 1952.

c. Psychomotor paper-and-pencil tests

Some paper-and-pencil tests of psychomotor skills have been used to indicate the applicant's skill in quick and skillful fine motions. Research indicates that these should not be assumed to be equivalent to motions required in working with equipment, without further checking.

2. Equipment tests

a. Psychomotor equipment tests

These have the advantage of making it possible to require motions similar to the actual job. (See previous discussion of Fleishman's work in this area on page 11. Also see Guion, 1965a, pp. 288-296.) Since the few commercially available psychomotor tests are limited to a relatively small number of skills, it may be necessary for the MS to develop his own equipment tests to match the skills required by the jobs with which he is concerned. Where specific psychomotor skills are required to perform a number of different jobs, such tests may make it possible to cover a wide range of jobs with fewer different tests than would be required if all the skills needed for each of these jobs were tested in a corresponding work sample.
b. Actual work sample

The most straightforward means of measuring job performance skills is with an actual sample of the job operation. It is applicable where the candidate is expected to bring trained skills to the job, although it would probably not be used where expensive equipment is required or where less costly techniques can be shown to be valid. The task is put in a standard situation so that all examinees do the same job. Performance is measured against a uniform set of standards.

c. Simulated work sample

A simulated work sample test makes more efficient use of time when the job involves too many steps for an actual work sample test. It is also more efficient for those tasks where most candidates can perform many of the steps and a disproportionate amount of time would thus be spent on non-differentiating job tasks if a work sample were used. Those elements of the job which require skilled performance are identified and form the basis for the test tasks. Insofar as possible, the processes which all can perform are eliminated, and the operations through which the examinee goes are reduced to those which will best identify the expert worker or, alternatively, the worker with inadequate skills. Less expensive materials and equipment may be substituted for those used on the job.

3. Situational tests

The situational test is a kind of simulated work sample which ordinarily does not utilize equipment as described above. The candidate is told to assume he is engaged in a specified real life task, which typically involves interaction with others. The MS will find a further discussion of this kind of test in Fitzpatrick and Morrison (1971, pp. 242-245) and sources cited there. The In-Basket test is a type of situational test in which the candidate is provided with a collection of typical business memoranda, letters, and other records and asked to make administrative decisions. The In-Basket test and its application in industry are further discussed in Frederiksen (1967) and Lopez (1967).

4. Sensory skill tests

These tests are designed to measure such attributes as visual acuity, color blindness, depth of vision, and hearing acuity and may be administered as part of a physical examination. The level of function needed should be determined by careful job analysis and validation studies. Tiffin and McCormick (1965, pp. 180-183) have grouped many industrial jobs into visual job families (those requiring similar visual skills) and have provided evidence that performance on jobs in a given visual job family is related to meeting vision standards established for that family. Also, see Guion (1965a, pp. 279-288) for a discussion of testing for vision.
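Returning to the Tab Item described under 1.b above, its record-keeping lends itself to a simple illustration. In the Python sketch below, the checking steps, the hidden results, and the scoring rule are all hypothetical; the published scoring procedure should be taken from Glaser, Damrin, and Gardner (1952).

    # Hypothetical TV-repair problem: each checking step hides a result under a tab.
    tabs = {
        "Check voltage of power supply": "30 volts",
        "Check picture tube filament":   "filament lights",
        "Substitute known-good fuse":    "no change",
        "Check horizontal oscillator":   "no output",   # the actual fault
    }

    def score(steps_taken, diagnostic_step):
        """Illustrative rule: fewer tabs lifted before the key check, higher score."""
        for count, step in enumerate(steps_taken, start=1):
            print(f"{count}. {step} -> {tabs[step]}")
            if step == diagnostic_step:
                return max(0, len(tabs) - count + 1)   # simple penalty per extra tab
        return 0   # fault never identified

    result = score(["Check voltage of power supply",
                    "Check horizontal oscillator"],
                   diagnostic_step="Check horizontal oscillator")
    print("score:", result)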
Identification of job tasks to be covered by the performance test

Unless the job is one which involves few tasks, it is obvious that one cannot test all aspects of the job. The basic problem will be to make the most efficient use of the available time. In selecting those tasks to be covered by the test, the MS is advised to consider the following:

1. The total time required for the tasks under consideration must be reasonable in terms of both the candidate's time and the test administrator's time.

2. The final composite of test tasks should be representative of the job tasks as indicated by the ratings of importance in the job analysis.

3. Tasks which all or most candidates can perform will contribute little or nothing to differentiation among candidates. One should look for procedures where errors commonly occur.

4. Tasks for which training on the job is normally provided must not be included.

5. The operation must be sufficiently exact to permit accurate standardization and objective judgment.

6. The cost of equipment, material, storage, and staff time should be minimized so far as is consistent with sound testing principles. Where two tasks require similar procedures and one is less expensive to administer, it will ordinarily be used.

7. If the use of an expert for administration can reasonably be avoided, it is less costly to do so.

8. It is desirable initially to identify more job tasks than can be tested in the final test. Some will not be readily measurable and others may be discarded on the basis of tryout results.

Final test plan

Some decisions, such as those regarding rating procedures and the training needed for administrators, cannot be made before test tasks have been tentatively developed. The following decisions, however, should be made before proceeding to develop the test tasks:

1. Time limitations.

2. Tentative list of job tasks to be represented by test tasks, considering

a. Relative weight to be assigned to each selected job task. The final weighting will be expected to correspond to the importance ratings developed earlier.

b. Relative importance to be assigned in test results to quality of product, speed of production, and job method. (A sketch of how these weights might be carried through to a composite score follows this list.)

3. Type of test to be used for each job skill to be measured (i.e., job sample, simulated job sample, psychomotor skill, etc.).

4. If psychomotor skill tests are to be used, the job tasks for which each skill is relevant must be specified.

5. Equipment needs. These needs will be spelled out sufficiently to indicate whether presently available equipment can be used for testing or whether new equipment must be built or otherwise obtained.

6. Level of proficiency required. Job tasks which not all can master or which require extensive training time may be tested at the level of mastery if it can be reasonably expected that qualified candidates will be able to perform them on entry. Other job tasks may be easily learned by those with basic skills, in which event one will ordinarily measure performance at a lower or more general level of skill, or one may measure aptitude for learning the skill.

7. Plans for determining reliability. A decision must be made at this point regarding means for determining the reliability because, if a decision is made to use the parallel task approach, this must be built into the actual test. Reliability is discussed on page 31.

8. Validation plans. It is important to consider plans for validation at the time test plans are developed to insure that data needed for validation will be available. (See discussion of validity on p. 38.)
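Decisions 2a and 2b above can be recorded in a form that makes the eventual combination of scores mechanical. In the following minimal Python sketch, the tasks, weights, and ratings are hypothetical; it is meant only to show how task weights and the quality/speed weighting might be carried through to a composite score.

    # Hypothetical plan: task -> (task weight, quality weight, speed weight).
    plan = {
        "solder joints":    (0.5, 0.8, 0.2),
        "wire a connector": (0.3, 0.6, 0.4),
        "read schematic":   (0.2, 1.0, 0.0),
    }

    # One candidate's ratings on a 1-5 scale for quality and speed on each task.
    scores = {
        "solder joints":    {"quality": 4, "speed": 3},
        "wire a connector": {"quality": 5, "speed": 4},
        "read schematic":   {"quality": 3, "speed": 0},
    }

    def composite(plan, scores):
        # Weight quality and speed within each task, then weight tasks by importance.
        total = 0.0
        for task, (w_task, w_quality, w_speed) in plan.items():
            task_score = (w_quality * scores[task]["quality"]
                          + w_speed * scores[task]["speed"])
            total += w_task * task_score
        return total

    print(f"composite score: {composite(plan, scores):.2f}")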
Setting Specific Test Tasks

In setting the specific tasks to be performed by the candidate in the test situation, the following considerations are important:

1. The sampling of operations or steps must be a reasonably faithful representation of the selected job task operations, and the required solution should be one which may reasonably be expected. For example, in troubleshooting, the defect should not be one which would occur very rarely. Such a defect could discriminate against the experienced worker who checks first for the most probable defect. An exception to this principle might occur when one is differentiating among top level performers.

2. The expected operation(s) and/or product must be defined to the extent that candidates know what is expected of them and different observers will agree on whether standards have been met.

3. To the extent possible, required steps should not depend on successfully performing a preceding step. For example, if an error in cutting material prevents an examinee from subsequently demonstrating skill in placing and nailing, his score on this task will be no better than that of an examinee who can perform none of the component steps satisfactorily. It is generally better to use many short test tasks rather than a few long tasks. Interdependence of tasks is a major problem of performance testing and is not easily avoided in a work sample test, but the problem can be reduced by breaking the task down into steps and setting a new test task for each step. This may have the disadvantage of failing to test the examinee's skill in integrating steps on a job; if this seems important to measure in a particular instance, one can consider inserting oral questions such as "What would you do next?"

4. Equipment, materials, and testing conditions must be the same from candidate to candidate.

5. The task should not give an advantage to those who have typically used one procedure over those who have typically used another. For example, terminology or procedures which may vary from one locality to another and which are not necessary to the job skill being tested should not be required for satisfactory performance of the test task. The procedures selected should have substantially equal efficacy and standing in the trade or occupation.

6. Tasks should make efficient use of equipment and administration time.

7. Where time permits, it will help improve the accuracy of measurement of operations requiring precision if the candidate is provided with more than one opportunity to demonstrate precision. One needs enough measures of each skill, knowledge, or ability to optimize reliability. By maximizing reliability, one maximizes the potential for validity, but not if reliability is gained at the expense of measuring a variety of tasks. One sometimes has to sacrifice reliability (as statistically determined) to maximize validity. Reliability is simply the upper bound of validity, and the gap is usually sufficient that more attention should be given to validity than to reliability per se. Validity can be maximized by measuring as wide a variety as possible of the operational elements which are critical to job performance. If, however, one can maximize reliability without sacrificing important job requirements, one thereby maximizes the potential for validity.

8. More tasks will be included in the preliminary test form than will be used in the final form, to permit the elimination of tasks which prove ineffective in the tryout.

9. The scoring system to be used for each task must be such that scores across tasks can be combined in some meaningful way. (See section on establishing rating procedures on page 22.)

If a simulated work sample is contemplated, the MS is advised to read the discussion of simulation on pages 238-242 of Fitzpatrick and Morrison (1971), on pages 96-104 of Besnard and Briggs (1967), and in Chapter 8 of Gagne (1962). If a test of psychomotor skills is being used, it should be noted that many components involved in typical electronic assemblies h