Sample outline of an evaluation plan with Kirkpatricks’ four levels.
The chapter describes a system for the development and evaluation of educational programs (e.g., individual courses or whole programs). The system describes steps that reflect best practices. The early stages in development (planning, design, development, implementation) are described briefly. The final stage (evaluation) is described in more detail. The evaluation step is a four-tiered process based on the Kirkpatrick and Kirkpatrick model and a performance indexing measurement system of Tatum and Nebeker. The chapter should be a valuable guide for teachers, program directors, and department chairs in their efforts to create and maintain quality educational experiences and high levels of student learning.
- educational programs
- curriculum development
- course evaluation
- program evaluation
- student learning
Most educational endeavors (e.g., producing curricula, programs, courses) follow a fairly standard set of activities for the purpose of educating students, as shown in Figure 1. The chapter relies mostly on college and university curriculum examples, but this does not exclude primary and secondary schools. Think of this as a roadmap. Like any roadmap, it is not the only way to get from Point A to Point B, but it shows the landscape and the road signs needed to navigate the process of creating new and better educational experiences for students. This chapter briefly describes the first four phases of the process and then focuses in more detail on the evaluation phase. The emphasis on the evaluation phase is in line with current trends in education that view student learning and success as essential to academic performance.
2. Phase I: planning
Planning is the first of five phases in creating an educational experience (e.g., an individual course, an academic program). Planning includes a set of data gathering and assessment activities aimed at helping to decide whether or not to proceed to the design, development, implementation, and evaluation phases. The output of the planning phase should be a written concept proposal that makes an academic case for proceeding to the subsequent phases. This planning document can then be submitted to the appropriate approval structures (e.g., principal, department chair, dean, academic committee).
A planning document needs to cover several areas, including but not limited to, a mission statement, a needs analysis, required resources, benchmark assessments, general target competencies and/or outcomes, and an evaluation plan. Before proceeding to the design phase, a few words should be said about the distinction between competencies and outcomes and the evaluation plan.
2.1 Competencies
Competencies refer to a general set of knowledge, skills, abilities, and other personal traits (e.g., attitudes, ethics, interests) that predict behavior in a wide variety of situations. Competencies provide the student with an integrated “mental model” of the current state and evolving standards of the field [1, 2, 3, 4, 5, 6, 7, 8]. Examples of competencies include problem solving ability, communication skills, personal and professional ethics, and values, to name just a few.
2.2 Outcomes
Outcomes come in two varieties: program learning outcomes (PLOs) and course learning outcomes (CLOs). Learning outcomes tend to be more specific than competencies, with PLOs representing broad program objectives and CLOs representing specific ways in which a particular course meets those objectives. Learning outcomes should be expressed as observable, behavioral outcomes (i.e., what the student is expected to do), and typically include an action verb and a target content area. The action verb is often taken from Bloom’s taxonomy [10, 11], which ranges from low-level actions (e.g., remember, understand) to high-level actions (e.g., produce, construct). For example, a PLO might be: A graduate of this program will be able to evaluate research designs and construct research projects. A CLO for a research course in the program might be: At the completion of this class, the student will be able to identify the major designs from Campbell and Stanley. Another CLO for the same class might be: At the completion of this class, the student will be able to create a research project using one of the Campbell and Stanley designs.
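The "action verb plus target content" structure of a learning outcome can be made concrete with a short sketch. The verb-to-level table below is an illustrative assumption (a small sample of verbs ranked in the order of the revised Bloom taxonomy), and the class and field names are hypothetical; none of this is prescribed by the chapter.

```python
from dataclasses import dataclass

# Illustrative assumption: a small sample of action verbs ranked from
# 1 (lowest, "remember") to 6 (highest, "create"). A real program would
# adopt its own verb list.
BLOOM_LEVELS = {
    "remember": 1, "identify": 1,
    "understand": 2, "explain": 2,
    "apply": 3,
    "analyze": 4,
    "evaluate": 5,
    "create": 6, "construct": 6, "produce": 6,
}


@dataclass(frozen=True)
class LearningOutcome:
    kind: str     # "PLO" or "CLO"
    verb: str     # observable action verb
    content: str  # target content area

    def bloom_level(self) -> int:
        """Return the assumed taxonomy level of the outcome's action verb."""
        try:
            return BLOOM_LEVELS[self.verb.lower()]
        except KeyError:
            raise ValueError(f"'{self.verb}' is not in the verb table") from None

    def statement(self) -> str:
        """Render the outcome as an observable, behavioral statement."""
        return f"The student will be able to {self.verb} {self.content}."


# The two CLO examples from the text above.
clo_identify = LearningOutcome(
    "CLO", "identify", "the major designs from Campbell and Stanley")
clo_create = LearningOutcome(
    "CLO", "create",
    "a research project using one of the Campbell and Stanley designs")

print(clo_identify.statement())
print(clo_identify.bloom_level(), clo_create.bloom_level())
```

Writing outcomes this way makes it easy to check that a course includes both low-level and high-level actions rather than clustering at one end of the taxonomy.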
2.3 Evaluation plan
An important element in any planning document is an answer to the question: What will be used as evidence that a program or course was successful? One highly researched and successful approach to this question comes from the model proposed by Kirkpatrick and Kirkpatrick. The model identifies four evaluation levels as shown in Table 1. These levels are (a) reaction: participant satisfaction and self-assessment of learning, (b) learning: the learners’ knowledge and skill improvement, (c) behavior: transfer of learned skills to other areas (e.g., jobs or future classes), and (d) results: impact on the institution’s success and improvement. Often, educators seem satisfied with assessing only the first two of these levels (reaction and learning). The last two (behavior and results), however, may be even more essential to academic performance. They go beyond learning alone and assess what students can do and how this contributes to a more general measure of educational success.
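One way to keep an evaluation plan from stopping at the first two levels is to record the planned evidence per level and check for gaps. The sketch below is a minimal illustration; the function name, the dictionary layout, and the draft plan are hypothetical and are not taken from Table 1.

```python
# The four evaluation levels named in the text.
KIRKPATRICK_LEVELS = ("reaction", "learning", "behavior", "results")


def missing_levels(plan):
    """Return the levels for which the plan lists no evidence.

    `plan` maps a level name to a list of planned data sources.
    """
    unknown = set(plan) - set(KIRKPATRICK_LEVELS)
    if unknown:
        raise ValueError(f"Unknown level(s): {sorted(unknown)}")
    return [level for level in KIRKPATRICK_LEVELS if not plan.get(level)]


# Hypothetical draft plan that covers only reaction and learning.
draft_plan = {
    "reaction": ["participant satisfaction ratings"],
    "learning": ["midterm and final exam scores"],
}

print(missing_levels(draft_plan))  # ['behavior', 'results']
```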
3. Phase II: design
The design phase involves creating a general structure for later development (see [14, 15]). Completing these steps will help guide the next phase (Development). Several actions should be taken such as (a) establish time frames for the future phases, (b) specify desired competencies or learning outcomes, (c) identify learning and performance activities that demonstrate successful achievement of the competencies/outcomes, (d) set prerequisites (e.g., students taking Algebra II must have completed Algebra I, students enrolled in a college program must have a high school diploma), (e) determine the major administrative concerns, and (f) decide what data can be collected that will reflect the four Kirkpatrick levels (see Table 1 for some suggestions).
For some ideas about what design actions can be taken, see the checklist below (adapted from ).
Who is the primary point of contact (POC)?
To whom are the applications submitted?
How will candidates and participants be kept informed?
How will prerequisites be assessed?
Who will ensure the application materials are complete?
Who reviews and approves the applications?
When and where will training be conducted?
Where will the student records (e.g., attendance, course completion, start dates) be kept?
What budget will pay for the support personnel?
How will exams be administered?
How will exams be secured?
Who will write, proctor, and grade the exams?
Where, when, and how will skills training be conducted?
What corrective action steps will be used and who will monitor this process?
What awards and/or recognitions will be issued?
Who will oversee ongoing program maintenance?
What sources of data are required to assess the Kirkpatrick four levels and how will they be obtained?
4. Phase III: development
The development phase covers the steps required to produce an educational practice that is ready for implementation (Phase IV). It can generally be carried out in two steps.
4.1 Select and develop learning and performance activities
This step builds on the work completed earlier under Design (identifying learning and performance activities). This is where the actual learning and training activities are generated and matched to the learning outcomes/competencies. There are two, not mutually exclusive, options for achieving this step: (a) find relevant learning and performance activities from external sources, and (b) select or develop these activities in-house.
Procuring the relevant activities from an external source is far less time consuming than developing them in-house. The principal disadvantage is that the learning opportunities offered by outside sources may not be entirely suitable for the curriculum (i.e., the activities may not address the outcomes and competencies in the most direct and relevant fashion).
Selecting and developing the learning and performance activities in-house allows for customized experiences that can target specific knowledge, skills, and abilities. Home-grown educational experiences have the advantage of being directly relevant to the outcomes and competencies identified for the curriculum. The disadvantage of this customized approach is that it can be very specialized and may require a high degree of instructional design expertise and technical skill to develop.
4.2 Establish tests and measures of outcomes/competencies
In this step, the learning outcomes/competencies that the students are acquiring will be assessed. The process of developing tests and measures is described in the literature on testing theory and practice (see [14, 16]). The easiest approach is to locate existing tests and measures. These existing materials may come from a variety of sources including curricula from other institutions, training classes, certification programs, continuing education units, extension classes, and so forth. Below is a partial list of candidate tests and measures that can be used to assess whether students are mastering the content and meeting the outcomes and/or competencies:
Participant satisfaction ratings
Participant self-assessment of learning
Class quiz scores
Midterm and final exam scores
Instructor ratings of class assignments
Final project/thesis evaluation
Supervisor’s assessment ratings
Self-review of functional skills
Expert ratings of oral presentations
Skill exercise observations
Panel review recommendations
When an adequate set of existing assessment tools cannot be located from external sources, customized tests and measures must be developed. When developing custom assessment items, two important criteria must be met: reliability and validity. A reliable assessment is one that is consistent. A valid assessment is one that is accurate. The first criterion (reliability) is generally established by showing that the test or measure is stable over time (e.g., repeated use of class quizzes yields consistent scores). The second criterion (validity) assures that the tests or measures accurately evaluate what they are intended to appraise. There are several techniques for ensuring the validity of tests and measures, but the most common validity check is to use Subject Matter Experts (SMEs) who closely examine the tests and measures and form a consensus that these tools in fact reflect the relevant outcomes or competencies.
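Stability over time is commonly quantified by correlating scores from two administrations of the same measure. The sketch below assumes a Pearson correlation is used for this purpose, which is one conventional choice rather than a method specified in this chapter; the quiz scores are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("Need two score lists of equal length (at least 2).")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("Correlation is undefined when scores do not vary.")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for five students on two administrations of a quiz.
first_administration = [70, 80, 90, 60, 85]
second_administration = [72, 78, 91, 63, 84]

r = pearson_r(first_administration, second_administration)
print(f"Test-retest correlation: {r:.2f}")
```

A correlation near 1 suggests the measure ranks students consistently across occasions; what counts as acceptable is a judgment for the program to make.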
5. Phase IV: implementation
After a course or program has been developed, it is ready to be implemented. There is no standardized process for implementation, but educational institutions have developed and implemented initiatives across a wide variety of disciplines, and there is a large body of common practices to draw from [14, 15, 17, 18]. In general, there are at least four steps involved in a standard implementation.
5.1 Conduct pilot studies
A pilot study is a “pre-study” conducted as a dry-run prior to launching the full effort. The study should be on a much smaller scale than the full curriculum (e.g., fewer students, less costly technology, fewer classes), but still preserve the essence of the program.
5.2 Refine essential elements
The results of the pilot study should be examined and lessons learned should be noted. Specifically, at least the following elements need to be reviewed and modifications made.
5.2.1 Outcomes and competencies
Are these the right outcomes or competencies for this curriculum? Should more be added? Should some be deleted?
5.2.2 Time frames
Is the timing of events (course duration, project times, testing schedules) optimal? Where can changes occur?
5.2.3 Prerequisites
Were the correct prerequisites identified? Should some be added? Should some be removed?
5.2.4 Administrative procedures
Was the administration of the pilot study efficient? Where were the administrative bottlenecks and glitches? How can these be improved?
5.2.5 Learning and performance activities
Did the learning and performance activities produce the intended outcomes? Should new activities be added? Should some activities be discarded? Can improvements be made to the existing set of activities?
5.2.6 Tests and measures
Did the knowledge tests and performance measures assess the outcomes and competencies of the students as expected? What adjustments should be made?
5.2.7 Data collection
Are the data collected easily obtained and in a usable form? Can clear conclusions be drawn from these data?
5.3 Market the initiative
To help ensure success, a marketing plan should be devised to advertise the program and recruit students. The following items should be considered: (a) identify the target audience, (b) align the marketing objectives with the curriculum objectives, (c) create a communication plan, (d) publish a schedule, and (e) use specific institutional marketing techniques (e.g., fact sheets, web and electronic media, newsletter, brochures, communication networks, open house events, personal visits to potential recruiting venues).
5.4 Launch full curriculum
In this step, the program is implemented in accordance with the pilot study modifications.
6. Phase V: evaluation
After the course or program has been implemented, it must be evaluated for effectiveness. This evaluation should be driven by some formal model such as the Kirkpatrick and Kirkpatrick model shown in Table 1. If the Kirkpatrick model is adopted, then data are required that assess each of the four levels. If the evaluation is for a single course, then the tests and measures will be mainly, but not exclusively, relevant to level 2 (learning). If the evaluation is for an entire program, then all the levels should be assessed (as shown in Figure 3, discussed below). The discussion begins with the evaluation of a single course and shows one possible approach.
6.1 Evaluating a course
Once the knowledge tests and performance measures have been administered to students in a class, each person should have a set of relatively objective scores. These scores, when combined, should show how successful the student was with regard to the class outcomes. The approach illustrated here is based on “performance indexing” developed by Tatum and Nebeker. Performance indexing is a system for combining and weighting a set of scores and generating an overall index. Performance indexing has been employed successfully in fields outside of education (e.g., real estate, environmental quality management, organizational improvement), but can be used just as effectively in an educational setting. The weighting feature is especially useful because it takes into account how valuable each test or measure is in the overall assessment. If, for example, in a biology class, mid-term and final exam performance is more important than homework, this difference will be reflected in the final index. Often, the degree of importance is reflected by the number of points that can be earned by each assignment. Performance indexing offers a more sophisticated system for balancing performance and getting at the essence of student learning. An example of performance indexing used in a hypothetical class is shown in Figure 2.
There are several steps to developing and using performance indexing (for a more complete discussion of the topic see Tatum and Nebeker). The most essential features are (a) each test or measure is given a weight according to its importance in the assessment, and (b) an overall index score is generated from the weighted values (e.g., 370–400 is outstanding performance). Table 2 is a step-by-step guide for building and using the performance index table in Figure 2.
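The two essential features, weighting and a single overall index, can be illustrated with a small sketch. It rests on assumptions that are not taken from Table 2 or Figure 2: weights sum to 100, each raw score is converted to a performance level from 0 to 4 using hypothetical cutoffs, and the index is the sum of weight times level, so the maximum is 400 (consistent with the 370–400 "outstanding" band mentioned above). All names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    name: str
    weight: int       # importance weight; weights are assumed to sum to 100
    raw_score: float  # observed score in the measure's own units
    cutoffs: tuple    # ascending raw-score thresholds for levels 1..4 (assumed)


def performance_level(measure):
    """Convert a raw score to a level: the number of cutoffs reached (0-4)."""
    return sum(measure.raw_score >= cutoff for cutoff in measure.cutoffs)


def performance_index(measures):
    """Overall index: sum of weight x performance level across measures."""
    if sum(m.weight for m in measures) != 100:
        raise ValueError("Weights are assumed to sum to 100.")
    return sum(m.weight * performance_level(m) for m in measures)


# Hypothetical biology student: exams weigh more than homework.
student = [
    Measure("Homework", 20, 85, (60, 70, 80, 90)),    # level 3 -> 60
    Measure("Midterm exam", 30, 72, (60, 70, 80, 90)),  # level 2 -> 60
    Measure("Final exam", 50, 91, (60, 70, 80, 90)),    # level 4 -> 200
]

print(performance_index(student))  # 320
```

Because the final exam carries half of the total weight in this example, a strong final contributes far more to the index than equally strong homework would.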
6.2 Evaluating a program
A program (e.g., clinical psychology, biology, history) is designed so that students graduate having met certain competencies or PLOs. Whether the program uses competencies or PLOs is a matter of preference, but regardless of this choice, the CLOs must be designed to meet one or more of these competencies or PLOs. When students successfully complete all of the courses in the program, they will have satisfied the expected objectives of the program and will leave with specific knowledge, skills, abilities, and other desired characteristics (e.g., attitudes, personal ethics, interests).
Figure 3 is an example of how performance indexing can be used to evaluate an entire program (as opposed to a specific course within that program, as shown above in Figure 2). Developing an index table for a program involves basically the same steps outlined in Table 2, with a few modifications. The evaluation measures (shown in the diagonal spaces) are based on the Kirkpatrick and Kirkpatrick levels (see Table 1), which are shown at the top of Figure 3. The specific evaluation measures will vary from program to program, but each measure should fall into one of the levels. For example, level 1 (reaction) is supposed to indicate the students’ satisfaction rating of the program and an assessment of how much they think they have learned. These ratings can be obtained from each class or as part of an exit survey at the end of the program. Level 2 (learning) is intended to reveal, on a more objective basis, how much the students learned (in this case based on grades, ratings of acquired skills, and test scores averaged across classes). Level 2 is closely tied to the competencies or PLOs of the program. Level 3 (behavior) is an indication of the degree to which the program changed the student’s behavior and the extent to which the student can transfer this behavior to other settings (e.g., did they learn valuable job skills, did they acquire knowledge and skills in prerequisite classes that they can apply to future classes?). Although level 3 is normally associated with assessing an entire program, it is still possible to include behavioral measures at the course level. For example, Figure 2 shows an “internship supervisor rating” as an essential measure of the student’s ability to apply what was learned. Finally, level 4 (results) is supposed to show that the program had a positive impact on the current success and future improvement of the institution (e.g., local school, school district, college).
Evidence for positive results can be demonstrated by a variety of data such as graduation rates, employment success, advancement to higher levels of education, or ratings by external agencies. Level 4 measures are not common among educational institutions when evaluating individual programs (although these data are routinely collected at higher levels), but they should be. To capture the essence of academic performance, we must assess the degree to which our programs contribute to the general success and welfare of the broader academic community.
Once the final index is computed, the overall success of the program can be evaluated. In the hypothetical program depicted in Figure 3, the index score of 300 indicates that the program is above average. A close examination of Figure 3 will also reveal where the program is performing well (students rate their learning and job skills as exceptional, the program gets an exceptionally high rating by external agencies) and where it requires improvement (test scores are low, there is a low percentage of students finding jobs or advancing to other programs).
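The same kind of reading, an overall index plus a look at where the program is strong or weak, can be sketched in code. The data below are hypothetical and are not the values in Figure 3; the sketch again assumes weights that sum to 100 and performance levels from 0 to 4, and the function names are illustrative.

```python
from collections import defaultdict

# Hypothetical program data:
# (Kirkpatrick level, measure, weight, performance level on an assumed 0-4 scale)
PROGRAM_MEASURES = [
    ("reaction", "Exit-survey satisfaction", 10, 4),
    ("reaction", "Self-assessed learning", 10, 4),
    ("learning", "Average test scores", 25, 1),
    ("learning", "Ratings of acquired skills", 15, 3),
    ("behavior", "Job-skill ratings", 15, 4),
    ("results", "Job placement or advancement rate", 15, 1),
    ("results", "External agency rating", 10, 4),
]


def program_index(measures):
    """Overall index: sum of weight x performance level."""
    if sum(weight for _, _, weight, _ in measures) != 100:
        raise ValueError("Weights are assumed to sum to 100.")
    return sum(weight * perf for _, _, weight, perf in measures)


def subtotals_by_level(measures):
    """Contribution of each Kirkpatrick level to the overall index."""
    totals = defaultdict(int)
    for level, _, weight, perf in measures:
        totals[level] += weight * perf
    return dict(totals)


def needs_improvement(measures, threshold=1):
    """Names of measures at or below the given performance level."""
    return [name for _, name, _, perf in measures if perf <= threshold]


print(program_index(PROGRAM_MEASURES))       # 265
print(subtotals_by_level(PROGRAM_MEASURES))
print(needs_improvement(PROGRAM_MEASURES))
```

Breaking the index down by level shows whether a respectable overall score is masking weak learning or results measures, which is the kind of diagnosis described above.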
7. Concluding remarks
The phases and steps advocated here are obviously a mechanistic (non-theoretical) approach. It resembles Tyler’s thinking about curricular design more than contemporary thought (e.g., [21, 22]). There is nothing wrong with a more mechanical approach. In fact, the phases and steps proposed in this article are not incompatible with modern views of education such as the sharing of common goals, scaffolding, or the spiral curriculum. At some point, however, we need to find and follow a path towards building an educational program, and this roadmap shows us the way without too many detours.
Academic performance has been the focus of much research and interest during the past few years. Initiatives such as No Child Left Behind and Race to the Top have generated much debate and concern regarding the components of academic performance and the optimal methods for assessing learning and success. This chapter proposes a method for developing and evaluating courses and programs that gets at the heart of academic performance in five phases (i.e., planning, design, development, implementation, and evaluation). The first four phases are a prelude to what the author considers the true essence of academic performance; namely, the identification and measurement of performance indicators. This chapter presents an evaluation model (based on [13, 19]) that guides the user down a well-traveled road that leads, in the end, to a quantitative understanding of student course performance and program success. In the inimitable words of Peter Drucker: “You can’t manage what you can’t measure.”
The author would like to acknowledge the valued assistance of Dr. Don T. Sine for his support and critique of parts of this research and manuscript.