Open access peer-reviewed chapter - ONLINE FIRST

Using ChatGPT to Create Constructively Aligned Assessment Tasks and Criteria in the Context of Higher Education Teaching

Written By

Niclas Schaper

Submitted: 29 February 2024 Reviewed: 01 March 2024 Published: 28 April 2024

DOI: 10.5772/intechopen.1005129

From the Edited Volume

Artificial Intelligence for Quality Education [Working Title], edited by Dr. Seifedine Kadry (IntechOpen)


Abstract

The chapter deals with the question of how ChatGPT can be used to support and facilitate the planning of competence-oriented written university examinations. To this end, the possibilities and potentials of ChatGPT in the formulation of learning outcomes, the derivation of examination tasks from these outcomes, and the formulation of assessment criteria are demonstrated and analyzed by example. First, it is explained what ChatGPT actually is and which aspects should be considered when using it. Then, didactic concepts and requirements for designing higher education examinations (especially the constructive alignment concept) that ChatGPT can support are presented. The focus is on the competence-oriented design of written examinations with open answer formats. The use of ChatGPT in the generation of learning outcomes, examination tasks, and assessment criteria, as well as the critical review and optimization of the generated drafts, is presented using an example from the domain of work and organizational psychology.

Keywords

  • ChatGPT
  • constructive alignment
  • competence-oriented examinations
  • higher education teaching
  • written exam tasks
  • assessment criteria
  • learning outcomes

1. Introduction

Several authors claim that the fastest impact of ChatGPT on university teaching was seen in the area of examinations after this generative AI was made freely available to the public in November 2022 [1, 2]. Particularly for unproctored written assessments such as seminar papers, students were already experimenting in December 2022 with how ChatGPT could take over writing tasks [3], and in January 2023 the first students were caught having their papers written by ChatGPT [4]. In a survey of US-American students conducted at the same time, 30% of respondents stated that they were already using ChatGPT for assignments [5]. University teachers also tested relatively early on whether their exams could be solved by ChatGPT, which was confirmed for exams in Business Information Systems [6] and Astrophysics [7]. Naturally, such examples prompted alarmists to think about how to prevent the misuse of chatbots in higher education (cf. e.g., Fleck [8], Gamage et al. [2] or Sullivan et al. [9]). According to a recent survey by Solis [10], the use of ChatGPT is completely prohibited at only a small number of German universities (approx. 2%) and permitted solely for certain purposes (e.g., research) at just over a fifth of the universities surveyed.

However, ChatGPT can also be put to good use, for example, as an effective learning aid for students [1, 11]. It can help with reading academic texts (e.g., by asking the bot to summarize texts or to explain difficult passages in more detail), with learning (e.g., by asking ChatGPT to extract the most important key terms from a textbook chapter or to pose questions about specific course content), or with writing (e.g., by having ChatGPT suggest outlines or key questions on a topic or give feedback on text passages). University teachers can use ChatGPT not only for writing academic texts but also for teaching tasks. For example, texts can be generated from keyword lists, or text summaries can be created (e.g., for exercises in seminars). To illustrate teaching content, ChatGPT can help to develop examples and counter-examples. ChatGPT can also be used to test comprehension and application skills in seminars or lectures by creating quiz questions or short tests (see, e.g., Patel [12]). ChatGPT therefore harbors both risks and potential for scientific activities and university teaching. Although it is understandable to try to avoid or minimize the risks, the potential and benefits of AI can and should also be used for one's own work.

As someone who teaches and designs courses myself, but who also accompanies other teachers in workshops on the design of competence-oriented teaching and, in particular, corresponding examination formats, I am familiar with how laborious it often is to determine and formulate learning outcomes or competence-oriented learning objectives and to develop, on this basis, teaching processes and learning objective-oriented examination tasks and assessment criteria. This requires not only subject-specific but also didactic expertise at an advanced level, as well as time and cognitive resources, which are often scarce in the everyday life of an academic. For inexperienced university teachers in particular, such tasks are not only laborious but also associated with uncertainties, doubts, and considerable motivational challenges.

This raises the question of whether the design of these didactic components—in particular the learning outcomes and the corresponding examination tasks and assessment criteria—can be meaningfully supported and made more efficient or less time-consuming with the help of AI technologies. If this can actually be realized and ChatGPT can be an effective support for teaching and examination tasks, it would make things much easier for many teachers and offer support for a qualitative improvement of teaching and examinations.

The aim of this chapter is therefore to test and analyze the extent to which the AI tool ChatGPT can be used sensibly to support and facilitate the conceptual tasks and requirements described, especially in the area of examinations. The possibilities and potentials of ChatGPT in formulating learning outcomes, deriving examination tasks from these outcomes, and formulating assessment criteria and schemes for the learning goal-oriented evaluation of learning achievements are demonstrated and analyzed by example. Since generative language models are prone to errors and inaccuracies in their answers, we will also look at how AI answers can be made more concrete and improved with regard to incorrect, incomplete, and overly vague information.


2. What is ChatGPT and what should I bear in mind when using it?

ChatGPT is a chatbot that not only answers (knowledge) questions but can also write its own texts (e.g., term papers or speech manuscripts), summarize articles, program, or solve mathematical problems. The core of ChatGPT is an AI-supported language model that has been trained to compose texts that correspond to a high degree to human conversational or textual contributions on the topics addressed. For this purpose, the language model's neural network was fed an enormous amount of text in order to learn the characteristics and content of linguistic representation. In simple terms, the program's language-generative mechanism uses a highly complex statistical algorithm to generate one word after another so that the words fit together meaningfully with a specific context or linguistic requirement.
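
The following minimal Python sketch imitates this word-by-word mechanism with an invented bigram distribution; the vocabulary and probabilities are toy assumptions with no relation to ChatGPT's actual model, which conditions on far longer contexts.

```python
import random

# A deliberately tiny "language model": for each previous word it stores a
# probability distribution over possible next words. ChatGPT's model is
# vastly larger and conditions on the whole context, but the generation
# loop is conceptually the same: sample one token, append it, repeat.
toy_model = {
    "learning": {"outcomes": 0.6, "objectives": 0.3, "curves": 0.1},
    "outcomes": {"describe": 0.7, "guide": 0.3},
    "objectives": {"guide": 0.6, "describe": 0.4},
}

def generate(start: str, steps: int) -> str:
    words = [start]
    for _ in range(steps):
        dist = toy_model.get(words[-1])
        if dist is None:  # no known continuation: stop early
            break
        next_word = random.choices(list(dist), weights=list(dist.values()))[0]
        words.append(next_word)
    return " ".join(words)

print(generate("learning", 2))  # e.g. "learning outcomes describe"
```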

The language model is also further refined using human feedback: output generated by the model is manually evaluated by humans. However, this is usually not done by experts but by so-called "clickworkers", often students or unemployed academics (e.g., from India or Kenya), who evaluate the responses generated by the model with regard to content and linguistic characteristics (e.g., the use of offensive words) [13]. Clickworkers often have only 2–3 minutes per content fact check to examine the truth of a statement with a quick search. Wikipedia articles (which should themselves be treated with caution as a source of information) are often used as the benchmark for deciding whether a statement made by the chatbot is true or false. In view of the precarious working conditions and the high time pressure under which clickworkers carry out such checks, the fact checks can be characterized as rather inaccurate and unreliable; they only superficially mask the language model's susceptibility to errors. Against this background, the best open-source language models achieve an average accuracy of 73% at best [13, 14]. Even ChatGPT 4 achieves a hit rate of just over 80% for only two topics, while the rest fluctuate between 60% and 80%. Overall, it can therefore be assumed that the quality of ChatGPT's responses varies greatly.

The procedure described above attempts to train language models so that they answer as many questions as possible correctly and helpfully. In principle, however, the program itself cannot check and decide whether its answers are correct or incorrect, good or bad [13, 14]. This applies in particular to answers about complex issues or questions with several possible answers. Although a language model such as ChatGPT can compute an answer, it cannot check how good or correct this answer is; this task is left to the user. In areas where a high degree of accuracy is required and even the smallest errors can cause major damage (e.g., in medicine), other expert systems are needed to provide reliable answers. However, if one is aware of these problems, language models can be used in areas where the accuracy of the solution matters less than the clarification of the basic solution path. In such cases, precision and quality can be generated and ensured by the human users themselves. This includes, in particular, the fact that the quality of chatbot answers can be significantly improved by the way they are prompted (e.g., by giving detailed instructions, providing examples, or giving step-by-step instructions) [15, 16].

A major benefit of ChatGPT, as of other language models, is the program's ease of use. A simple text interface is used to formulate a question, to which ChatGPT responds directly (usually after a few seconds). However, as already mentioned, the quality and fit of the answers depend to a large extent on how the questions or prompts are formulated. In general, it is recommended to develop an incremental, dialogic approach, that is, instead of posing one difficult and complex question in a single long prompt, to start with simple questions and gradually increase their complexity [17]. The structure of the prompt should also be based on the ROMANE scheme [18]; a code sketch of a ROMANE-structured prompt follows the list:

  • R stands for role assigned to the AI (who or what is being simulated)

  • O stands for the goal of the prompt

  • M stands for instructions or assistance (what to do and in what order)

  • A stands for application examples or content specifications (what exactly or specifically is it about)

  • N stands for useful details (e.g., secondary conditions or restrictions; what should also be taken into account)

  • E stands for the recipient of the text (for whom the text is written)
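
Below is a minimal sketch of how such a ROMANE-structured prompt could be assembled and sent programmatically, assuming the official openai Python package (v1 interface); the model name and all field contents are hypothetical examples, not prescriptions.

```python
from openai import OpenAI  # assumes the official `openai` package, v1 interface

# Hypothetical ROMANE components; the wording is an invented example.
romane = {
    "Role": "You are an experienced instructional designer in higher education.",
    "Objective": "Formulate learning outcomes for a lecture on personnel development.",
    "Manual (instructions)": "Work through the requested taxonomy levels one at a time.",
    "Application examples": "Cover needs analysis, competence promotion and evaluation.",
    "Notable details": "Use Anderson and Krathwohl's taxonomy; two outcomes per level.",
    "Expected recipients": "4th-semester Bachelor of Psychology students.",
}
prompt = "\n".join(f"{label}: {text}" for label, text in romane.items())

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # model name is an assumption; use whatever is available
    messages=[{"role": "user", "content": prompt}],  # a fresh list = a fresh chat
)
print(response.choices[0].message.content)
```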

Furthermore, it has proven useful to develop an understanding of the effects of prompts by trying them out yourself. It is also important to bear in mind that language models such as ChatGPT take the previous context or course of the conversation into account, so it makes sense to generate answers to a complex task step by step. Among other things, this means that the chat should be restarted when changing tasks or starting a new one. In this context, it can also be helpful to use additional plug-ins such as "Text Blaze" or "Prompt Box" to save text snippets or prompts efficiently and insert them quickly into ChatGPT's input fields, or to extract and summarize texts from videos with the help of "Chrome" [1, 19].


3. Didactic concepts and requirements for competence-oriented examination planning

Competence-oriented planning and design of examination arrangements in the spirit of the Bologna reform requires a basis of learning outcomes formulated in a curriculum- and target-group-oriented way [20, 21]. Corresponding learning outcomes for courses or course units are derived, on the one hand, from the overarching qualification objectives of the degree program and refer to the competences or sub-competences to be acquired during the course. On the other hand, they describe as specifically as possible the knowledge, skills, and abilities to be acquired in a course or its teaching units within a defined subject domain, which should contribute to the development of the overarching competencies. On the basis of the learning outcomes, the teaching and learning activities and the associated content must then be planned in further steps, aiming not only at the acquisition of knowledge but also at initiating and practicing skills, abilities, and professional attitudes. The step of determining and developing the examination forms and tasks is carried out before or immediately after this. It must be done in such a way that the examinations are suitable for observing and recording the knowledge, skills, and competencies to be acquired under conditions of action and performance. It also includes the identification and description of assessment criteria and schemes in order to analyze and evaluate the extent of competence acquisition in the examination tasks. These criteria can also be used to provide informative, learning-promoting feedback to the learners. The next step is to check whether these didactic elements are sufficiently related and coordinated, which is referred to as constructive alignment [22]. This concept is the central didactic principle of competence-oriented teaching and examination design. ChatGPT or other generative language models can be used to support the design of the above-mentioned didactic elements. Before we come to this, it should be clarified which design requirements these core elements of competence-oriented teaching and examination design should fulfill.

3.1 Definition and formulation of learning outcomes

Learning outcomes describe the competence facets related to the relevant knowledge, skills, abilities, and attitudes that have to be acquired in a course or course unit [20]. The outcomes should be formulated as specifically as possible so that they can be operationalized for examination purposes. This is done in such a way that the description of objectives makes clear which behavior, especially in terms of cognitive performance, demonstrates the application of knowledge or its mastery in certain situations [23]. On the one hand, reference must be made to the subject-specific content of the knowledge and application domain covered in the course, as well as to the tasks and requirements that must be mastered in this context in order to solve certain problems. However, the social-communicative, metacognitive, and motivational-affective aspects that additionally motivate, control, and support professional action should not be neglected.

On the other hand, certain linguistic and conceptual principles must be taken into account when formulating learning outcomes in order to ensure that the learning outcomes provide a sufficient basis for further planning steps. The following aspects should be taken into account:

  • Learning outcomes should describe actions to be learned, that is, they should refer to both a content component (What does the learning action relate to in terms of content?) and a behavioral component in the form of a verb (What is to be done? What are the requirements for dealing with the knowledge?)

  • Learning outcomes should describe concretely observable modes of action so that they can be operationalized for teaching and assessment purposes; the criterion of concreteness and specificity refers both to the behavior to be demonstrated or the required cognitive activities and to the relevant knowledge concepts.

  • It is also helpful if, in connection with the learning outcome, it is described which methods, instruments or procedures are required to implement the action to be learned (How is the learning outcome achieved?) and why the action to be learned is needed in a comprehensive sense.

  • It is also advisable to use proven learning objective taxonomies to derive and formulate learning objectives in order to create a description of the learning content that is grounded in learning theory and comprehensible to other participants. The taxonomy by Anderson and Krathwohl [24], which provides categories for the systematization and derivation of cognitive learning objectives or learning outcomes, has become widespread and has proven its usefulness in the higher education context. For the formulation, categorization, and analysis of cognitive learning objectives, it distinguishes two dimensions: (1) the process dimension, which characterizes different cognitive processes in dealing with knowledge (remembering, understanding, applying, analyzing, evaluating, synthesizing/creating), and (2) the knowledge dimension, which differentiates between types of knowledge (factual knowledge, conceptual/contextual knowledge, procedural knowledge, metacognitive knowledge). A small data sketch of these two dimensions follows this list.
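
The following Python sketch represents the two taxonomy dimensions as plain data and adds a naive verb-based lookup; the signal-verb mapping is a heavily abbreviated assumption for illustration, not a validated classification instrument.

```python
# The two dimensions of Anderson and Krathwohl's taxonomy as plain data.
PROCESS_LEVELS = ["remember", "understand", "apply",
                  "analyze", "evaluate", "create"]
KNOWLEDGE_TYPES = ["factual", "conceptual", "procedural", "metacognitive"]

# Hypothetical signal verbs per process level (heavily abbreviated).
VERB_TO_LEVEL = {
    "define": "remember", "explain": "understand", "transfer": "apply",
    "compare": "analyze", "assess": "evaluate", "design": "create",
}

def classify_outcome(outcome: str) -> str | None:
    """Return the first process level whose signal verb occurs in the text."""
    lowered = outcome.lower()
    for verb, level in VERB_TO_LEVEL.items():
        if verb in lowered:
            return level
    return None

print(classify_outcome("Students can compare two evaluation designs."))  # analyze
```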

3.2 Construction of examination tasks

The next step is to design the examination tasks on the basis of the learning outcomes and the learning content. The learning outcomes can be assessed in different formats, which can also be combined with each other. In principle, a choice or distinction can be made between written, oral and practical examination forms, which in turn can be implemented and methodically designed in a variety of ways [25, 26].

The further discussion is focused on written examinations, especially the more open formats. This is done because of space restrictions and in order to provide the most concrete advice possible for the use of ChatGPT in the generation of examination tasks.

Written examinations are the main examination format in many degree programs. Here, test performance consists of answering or completing written tasks or requirements. With regard to their design, a distinction is usually made between strongly pre-structured forms (e.g., multiple-choice tasks) and more open forms (e.g., based on open questions on a topic), as well as between the time periods made available (a defined, rather short time slot for written exams or tests vs. longer time frames for, e.g., term papers) [26]. The more structured formats are suitable for testing knowledge, whereas more open formats can also test more complex performance or competencies. Within the more open forms of written examinations, a further distinction is made between restricted tasks (which can be answered with keywords or limited explanations and which relate more to knowledge reproduction and understanding or to simple analysis and application requirements) and tasks that require a more detailed answer in the form of complex statements, case analyses and solutions, or more extensive conceptual drafts.

When formulating open written tasks, certain principles must also be considered [25, 27]. The tasks should be formulated as clearly, completely and structured as possible; that is, the examinees should understand exactly what is expected of them (e.g., with regard to the focus of the content, the scope and level of detail and the structure of the answers) as well as the criteria that the answers should meet. The questions or assignments should therefore clearly state what kind of knowledge is required to answer them (especially for less demanding learning outcomes) and what cognitive requirements are to be met when answering the tasks (especially for more demanding learning outcomes; e.g., describe, compare, justify, evaluate, develop).

Since the learning outcomes should relate to cognitive requirement levels based on learning taxonomies (e.g., the Anderson and Krathwohl taxonomy [24]), certain task forms and variants must also be taken into account when designing tasks for a specific requirement level (Schaper [28]). For the "Understanding" level, it is particularly important to construct tasks that require the explanation or clarification of a theory, or tasks in which it is necessary to justify why a certain method is particularly suitable for analyzing certain facts, while for the "Applying" level, tasks such as transferring a theory to a specific situation should be set. At the "Analyzing" level, tasks should be constructed in which, for example, two or more variants of a concept or case are compared or errors in an application description are to be identified. For requirements at the "Evaluating" level, tasks are to be constructed in which, for example, a solution approach for a specific case is evaluated on the basis of various criteria, or a case or topic is viewed and assessed from different perspectives. Finally, the "Creating" level calls for tasks in which, for example, a complex plan for solving a specific problem must be designed using certain principles or heuristics, or a concept or technical solution for a specific task or function must be designed or constructed.
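
These level-specific task forms can be condensed into a small lookup of illustrative task stems, as in the following sketch; the stems and the slot values in the usage example are simplified assumptions derived from the paragraph above.

```python
# Illustrative task stems per taxonomy level, condensed from the text above.
TASK_STEMS = {
    "understand": "Explain {concept} and justify why {method} suits this analysis.",
    "apply": "Transfer {theory} to the following situation: {scenario}.",
    "analyze": "Compare {variant_a} and {variant_b} and identify the errors.",
    "evaluate": "Evaluate {solution} for this case against the criteria: {criteria}.",
    "create": "Design a plan for solving {problem} using {principles}.",
}

def draft_task(level: str, **slots: str) -> str:
    """Fill the stem for one level with concrete content slots."""
    return TASK_STEMS[level].format(**slots)

print(draft_task("apply",
                 theory="a model of training transfer",
                 scenario="a team communication training in a mid-sized firm"))
```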

In addition, the tasks must be designed with regard to difficulty-determining aspects (e.g., the number of aspects to be considered, the scope of the text to be analyzed, the contextual relevance of the analysis and the request for additional justifications at the “Analyzing” level). It should also be considered to what extent the tasks should be better illustrated or made more difficult by providing additional documents (e.g., statistics, illustrations, calculation examples, descriptions).

3.3 Determining and designing assessment criteria and schemes

In a further design step, the focus is on the question of how examination performance can be evaluated and assessed. Basically, this step involves comparing the examination performance shown with the expectations that were formulated in advance of the examination with regard to certain requirements [26, 29]. According to the purpose of the examination, the type of performance expected and the examination format, an assessment scheme should be developed on which the assessment is based [30]. An assessment scheme requires the determination of assessment-relevant criteria that can be used for a transparent and objective assessment. In most cases, the individual tasks are presented in tabular form in the assessment scheme with response categories, their definition or characterizing description, anchor examples and the allocation of values (e.g., points) [31, 32]. For each response category, the characteristics that a response must fulfill in order to be assigned to the category are described. The descriptions therefore contain characteristics or indicators that allow different answers to be assigned without any doubt. In some cases, it is advisable to list all possible correct answers, in other cases, especially more complex questions, examiners should provide a generic description of characteristics for assessment purposes. The naming of anchor examples can also facilitate the assignment to a specific assessment category [32].

Furthermore, it is usually necessary to convert the assessments into points and grades; that is, the assessments of the individual aspects of an examination performance must be converted into a numerical value in the form of points. These points must then be aggregated and transformed into a grade using appropriate schemes. This requires that the assessment schemes should also specify how the individual aspects of the examination performance are to be scored and how the point totals are to be converted into grades. For more complex examination requirements (e.g., when assessing case analyses or more complex answer texts), where it makes little sense to refer only to individual aspects of the performance, but rather to assess the solution as a whole on the basis of certain criteria, it is advisable to use so-called rubrics. Rubrics are assessment schemes that consist of a list of competence-related assessment criteria (including a definition of the criterion) and, for each criterion, a description of competence levels that reflect different levels of competence in relation to the criterion (e.g., excellent performance, satisfactory but improvable performance, unsatisfactory performance) [33].
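
As an illustration, a rubric and the subsequent conversion of points into grades can be sketched as plain data and a small function; the criteria, level descriptors, and grade cut-offs below are invented examples, not recommended standards.

```python
# A rubric as data: assessment criteria with descriptors per quality level.
rubric = {
    "depth of analysis": {
        3: "excellent: all relevant factors examined and weighed",
        2: "satisfactory but improvable: the main factors examined",
        1: "unsatisfactory: factors merely listed, not examined",
    },
    "use of concepts": {
        3: "excellent: concepts applied precisely and correctly",
        2: "satisfactory but improvable: concepts mostly correct",
        1: "unsatisfactory: concepts missing or misused",
    },
}

def to_grade(points: int, max_points: int) -> str:
    """Convert an aggregated point total into a verbal grade (invented cut-offs)."""
    share = points / max_points
    if share >= 0.90:
        return "very good"
    if share >= 0.75:
        return "good"
    if share >= 0.60:
        return "satisfactory"
    if share >= 0.50:
        return "sufficient"
    return "fail"

scores = {"depth of analysis": 3, "use of concepts": 2}  # one examinee's ratings
print(to_grade(sum(scores.values()), max_points=6))      # -> "good"
```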

Once the central methodological elements of an examination arrangement have been designed, the final step is to check the fit of the learning outcome, examination task, and assessment scheme and to adjust them if necessary. This corresponds to the constructive alignment concept [22], in which the fit of the teaching/learning activities with the learning outcomes and the examination tasks and requirements must also be checked and ensured. Methodologically, the corresponding fit aspects can be checked with the help of learning objective taxonomies; Anderson and Krathwohl's approach is particularly suitable for this (see Universität Zürich—Arbeitsstelle für Hochschuldidaktik [34]). Essentially, this involves analyzing which knowledge categories and requirement levels the learning outcome(s), the examination tasks, and the assessment criteria and schemes can be assigned to. A comparison is then made as to whether the learning outcome, task, and assessment criteria fall into the same categories and levels (e.g., procedural knowledge and application for the learning outcome, the task, and the assessment scheme alike). If so, successful constructive alignment can be assumed. If there are deviating assignments, adjustments or modifications to the three components are necessary in order to achieve constructive alignment. In addition, content-related and representational aspects (e.g., whether the content area is sufficiently represented by the tasks) must also be taken into account when assessing the fit.
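
This comparison step can be expressed as a trivial check over taxonomy cell assignments, as in the following sketch; the assignments themselves would come from human analysis, and the example cells are made up.

```python
# Each component is assigned a taxonomy cell: (process level, knowledge type).
def aligned(outcome_cell, task_cell, criteria_cell) -> bool:
    """Constructive alignment holds if all three share one taxonomy cell."""
    return outcome_cell == task_cell == criteria_cell

outcome  = ("apply", "procedural")
task     = ("apply", "procedural")
criteria = ("understand", "conceptual")  # deviating assignment

if not aligned(outcome, task, criteria):
    print("Alignment violated: adjust the task or the assessment criteria.")
```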

The conceptual and design aspects described provide an overview of the key conditions that need to be taken into account when designing examination arrangements, particularly for written but also other examination forms. These aspects are also decisive in the instruction of ChatGPT for the generation of corresponding examination elements as well as in the evaluation of the elements generated by ChatGPT. The following section presents how to proceed when instructing or prompting ChatGPT for the generation of learning outcomes, examination tasks and assessment schemes for specific learning objectives.


4. Generation of a competence-oriented examination arrangement with the help of ChatGPT

4.1 Generation of learning outcomes

Teachers at universities, be they research assistants, academic staff, professors, or lecturers, often have little training in didactics and instructional psychology, with a few exceptions. However, they usually have a high level of professional and academic expertise in the subject matter they teach. When planning courses and teaching units, including the examination arrangements required in this context, they often approach the task from a subject-systematic, content-oriented perspective or are guided by traditional concepts of teaching design. It is therefore often difficult for them to determine learning objectives and the learning outcomes to be achieved for their courses and to formulate them in a didactically appropriate way. This difficulty often results in learning objectives that only vaguely represent what is to be achieved, or that are too knowledge-heavy and do not adequately describe which cognitive and non-cognitive skills are to be acquired in the course. It is difficult to derive learning activities and examination requirements in an appropriate form on the basis of such deficient learning objectives or learning outcomes.

In this context, a generative, AI-supported language model such as ChatGPT can provide helpful support by partially taking on the task of formulating adequate learning outcomes for certain learning domains. In particular, ChatGPT is able to take into account the conceptual and linguistic aspects described in Section 3. The more precisely and skillfully the prompting is done, the better the system performs. The following aspects and levels should therefore be addressed when generating learning outcomes for a course with the help of ChatGPT (a template sketch follows the list):

  • The assignment for ChatGPT can be formulated as follows, for example: “Formulate learning outcomes for [specification of the overarching topic] with the learning content [keywords of the learning content and aspects to be covered].”

  • Furthermore, the target group of the course should be stated (e.g., Bachelor's students of Computer Science in the 3rd semester), as well as how the course or learning unit is integrated into the curriculum (e.g., the course builds on the introductory lecture in computer science).

  • Another crucial aspect of prompting is to specify the taxonomic system according to which the learning outcomes are to be developed (e.g., “Base your formulation on the taxonomy of Anderson and Krathwohl.”). It is also advisable to specify the exact learning objective levels that you want to address in the course (e.g., understanding, applying, analyzing) or to specify each level individually using a separate prompt.

  • It should also be specified what the overarching qualification objective of the course or teaching unit is (e.g., the ability to carry out requirements analyses for specific issues), which format the course is based on (e.g., lecture, seminar, tutorial, project seminar), and which didactic principle or approach it follows (e.g., problem-oriented didactics).

  • Didactic planning usually requires the consideration of different levels of planning, which refer to different levels of concretization of the instructional approach; at least two, if not three or more levels are to be distinguished (e.g., program, module, and course level). These different levels of planning also refer to different levels of goal formulation. For the prompting of learning outcomes, this means that these different levels should be addressed by specifying the respective curricular level.

  • Last but not least, it may be useful to specify which additional aspects should be taken into account when formulating the learning outcomes (e.g., explain the why and wherefore of the learning outcomes).
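
A minimal sketch of how these aspects could be bundled into a reusable prompt template is shown below; the function and all concrete values in the usage example are hypothetical placeholders modeled on the request shown in Table 1.

```python
def outcome_prompt(topic, contents, audience, qualification_goal, levels):
    """Assemble a learning-outcome request from the prompting aspects above."""
    return (
        f"Formulate learning outcomes for {topic} "
        f"with the learning contents: {', '.join(contents)}. "
        f"Participants: {audience}. "
        f"Overarching qualification objective: {qualification_goal}. "
        "When formulating the learning outcomes, use Anderson and Krathwohl's "
        f"learning objective taxonomy and address the levels {', '.join(levels)}."
    )

print(outcome_prompt(
    topic="a lecture on psychology-based personnel development",
    contents=["needs analysis", "competence promotion", "evaluation", "transfer"],
    audience="Bachelor of Psychology students in their 4th semester",
    qualification_goal="developing solutions for practical HR problems",
    levels=["applying", "analyzing", "evaluating"],
))
```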

Two examples of generating learning outcomes using ChatGPT are presented in Tables 1 and 2; they take the aforementioned prompting aspects into account in different ways. First, learning outcomes are generated with the help of ChatGPT for a lecture covering topics of "Psychology-based personnel development" for psychology students in the 4th semester of their bachelor's degree program; second, learning outcomes are generated for a lecture unit on the transfer of learning or training results:

The request for generating learning outcomes for this course unit was: “Formulate learning outcomes for a lecture with 10 sessions on the topic of personnel development based on personnel psychology with the following learning contents: definitions and objects of personnel development, methods for determining learning and training needs, approaches to promoting different areas of competence, evaluation of personnel development and approaches to enhance transfer. Participants in the lecture are Bachelor of Psychology students in their 4th semester. The overarching qualification objective of the lecture is to familiarize students with personnel psychology concepts and to enable them to develop solutions for practical problems in corporate personnel development on the basis of scientific concepts and findings. When formulating the learning outcomes, use Anderson and Krathwohl’s learning objective taxonomy as a guide.”
ChatGPT provided me with the following answer (this is limited to a selection of learning outcomes for reasons of space): “After completing the teaching unit ‘Psychology-based Personnel Development’, students should be able to achieve the following learning outcomes based on Anderson and Krathwohl’s learning objective taxonomy:
3. Applying:
  • Students will be able to apply appropriate methods to determine learning and training needs in specific organizational contexts.

  • Students can apply various approaches to promote different areas of competence in practice.

4. Analyzing:
  • Students can analyze the evaluation of personnel development and select suitable evaluation methods.

  • Students can analyze approaches to transfer enhancement and assess their effectiveness.

5. Evaluating:
  • Students can evaluate the effectiveness of personnel development measures and make suggestions for improvement.

  • Students can evaluate the effectiveness of transfer enhancement approaches and suggest alternative approaches.

These learning outcomes should ensure that students gain a sound understanding of personnel development based on personnel psychology and are able to apply and evaluate this knowledge in practical situations.”

Table 1.

Generation of learning outcomes for a lecture in personnel psychology by ChatGPT.

In a second trial the query was focused on a specific learning unit (approaches to enhance transfer in company training measures):
Inquiry: “Formulate learning outcomes at the Understand, Apply, Analyze, Evaluate and Create learning objective levels of Anderson and Krathwohl’s learning objective taxonomy for the teaching unit ‘Approaches to transfer enhancement for in-company training courses’. Approaches, influencing factors and methods of transfer enhancement in occupational psychology research are discussed with examples of operational application.”
ChatGPT provided me with the following answer (presentation is limited to excerpts):
  • “Understanding: Participants will understand the basic concepts and theories of transfer enhancement. They will learn about the various approaches to promoting the transfer of training content into workplace practice and understand their importance to performance development and organizational success.

  • Apply: Participants will be able to apply the approaches and methods of transfer enhancement they have learned to specific in-company training programs. They will be able to identify suitable measures to promote transfer and integrate these into the design and implementation of training courses.

  • Analyze: Participants will be able to analyze the factors influencing the transfer of training content into operational practice. They will understand the individual, organizational and contextual factors that can influence the transfer process and will be able to evaluate these and derive appropriate measures to optimize transfer.”

Table 2.

Generation of learning outcomes for a learning unit of the lecture by ChatGPT.

Comments on the first example (Table 1): The learning outcomes generated by ChatGPT consistently follow the cognitive process dimension of Anderson and Krathwohl's taxonomy, with at least two learning outcomes formulated for each learning objective level. However, the knowledge dimension of the taxonomy is not taken into account; it is not addressed which type of knowledge (declarative, conceptual, procedural, or metacognitive) is to be acquired. It should also be noted that the learning objectives do not systematically refer to the various learning content areas addressed in the prompt; the formulated outcomes therefore only incompletely cover the breadth of the mentioned learning contents. Nevertheless, the generated learning outcomes specify meaningful learning requirements of the HR development domain. They are thus best understood as exemplary suggestions for relevant learning outcomes of this domain, although the system does not make this clear. Overall, it should be noted that specifying the taxonomy leads to a comprehensible addressing of cognitive learning requirements and relevant learning content in relation to the relevant knowledge and action domain.

Comments on the second example (Table 2): The first thing noticeable about the individual learning outcomes is that they are formulated more extensively than those for the overarching subject area. In addition to an introductory, rather general description of the learning outcome at the respective level, further sentences specify what characterizes the cognitive activity at this learning objective level and for which aspects of the action the learners are to be enabled. Overall, the three learning outcomes address important aspects of the respective learning objective level for the content area in question, and the cognitive requirements of the content area are covered better and more completely than in the previous example. However, no reference is made to specific concepts of transfer enhancement; these may still need to be specified.

In both cases, the learning outcomes generated by ChatGPT need to be revised. The language model is therefore not able to provide ready-made solutions for formulating the learning outcomes of a course or teaching unit. However, the system provides useful suggestions as to the form in which learning outcomes can be formulated and which content aspects they should address. It also makes clear how the learning objective levels differ in terms of specific learning requirements. In my opinion, this creates a good basis for further revision.

Therefore, in the next step, you should revise the generated learning outcomes yourself (possibly in collaboration with colleagues or as part of an expert consultation). This first includes analyzing the learning outcomes generated for a specific case in terms of their strengths and deficits, similar to the analysis in the comments above. In further steps, depending on the initial situation, either minor additions or extensions should be made (e.g., with regard to the concepts mentioned in the learning outcomes). In some cases, however, it may also be necessary to formulate additional learning outcomes for subject-specific content not yet covered, or to expand the description of the learning outcomes considerably in order to describe more specifically which aspects of acting and thinking the learners are to be enabled to perform.

Furthermore, ChatGPT itself can be used to optimize the learning outcomes, for example through narrower and more specific prompts. It is advisable, for instance, to limit a prompt to a single learning objective level in order to achieve a greater breadth of content in the generated learning outcomes. It is also possible to provide ChatGPT with a best-practice example of a learning outcome for the learning domain and to ask the system to generate learning outcomes for other content areas or learning objective levels in a similar way.

4.2 Generating examination tasks on the basis of predefined learning outcomes

On the one hand, examination tasks should cover the relevant learning area broadly enough in terms of subject matter to test a representative cross-section of the knowledge and skills in a domain. On the other hand, the tasks should address the cognitive and non-cognitive skills to be learned in the course, for example, with regard to different requirement levels. While most teachers can still meet the first requirement with relative certainty thanks to their subject matter expertise, many are unsure about the second condition; that is, they are unsure whether their examination tasks appropriately address the more demanding learning objective levels. This uncertainty, or in some cases a lack of knowledge about how certain cognitive learning requirements can be specifically addressed with certain tasks, leads teachers to focus only on the requirements that they can confidently implement and assess (often tasks that require only knowledge reproduction and comprehension). As a result, certain requirement areas are not sufficiently addressed in the examination. Or tasks are constructed that leave students underchallenged or overwhelmed, or that pose demands irrelevant to the skills to be learned (e.g., by disproportionately emphasizing the fulfillment of formal requirements).

ChatGPT can also provide support in designing examination tasks. If the relevant learning outcomes are specified in the prompt, ChatGPT is able to construct examination tasks that represent the relevant cognitive requirements of the addressed learning objective levels in a valid and appropriate manner. However, when generating examination tasks for a specific teaching area with the help of ChatGPT, certain aspects should be taken into account for effective prompting (a prompt-builder sketch follows the list):

  • The assignment for ChatGPT can be formulated, for example, as follows: "Suggest to me [number, e.g., 2] examination tasks for a [desired task or examination format; e.g., written exam] on the following learning outcome: [original formulation of the learning outcome]."

  • Furthermore, the target group for the examination should be specified, as well as the prerequisites the candidates bring to it (e.g., whether they have already acquired, through exercises, skills in applying certain concepts or dealing with certain tasks).

  • It is also advisable to provide information on the framework conditions of the examination (e.g., the average amount of time available per task, or additional material/devices available for working on the task, such as reference books or calculators).

  • Further information that may be relevant for the design of examination tasks concerns the contextual reference or the degree of realism to be taken into account. It should also be specified how certain difficulty-determining aspects of the task are to be considered (e.g., requesting a justification).
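
As a minimal sketch, the core prompt from the first bullet can be wrapped in a small function so that one learning outcome at a time is expanded into a request; the function name, parameters, and example values are hypothetical.

```python
def task_prompt(n_tasks: int, exam_format: str, learning_outcome: str,
                framework: str = "") -> str:
    """Build the core task-generation prompt from the aspects listed above."""
    prompt = (
        f"Suggest to me {n_tasks} examination tasks for a {exam_format} "
        f"on the following learning outcome: {learning_outcome}"
    )
    return f"{prompt} {framework}".rstrip()

print(task_prompt(
    2, "written exam",
    "Participants will be able to apply methods of transfer enhancement "
    "to specific in-company training programs.",
    framework="Candidates have about 15 minutes per task and no aids.",
))
```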

Three examples of how ChatGPT generates examination tasks are presented in Table 3 to illustrate the bot's potential for creating such tasks.

  1. Request: “Suggest examination tasks for a written exam on the following learning outcome (learning objective level Understanding): Participants will understand the basic concepts and theories of transfer enhancement in workplace training programs. They will become familiar with the various approaches for promoting the transfer of training content into operational practice and understand their significance for performance development and organizational success.”

    Answer: “Examination task (exemplary): Explain the basic concepts of transfer enhancement in in-company training measures and give an example of their application in each case. Also describe how these concepts can contribute to performance development and organizational success.”

  2. Request: “Suggest examination tasks for a written exam on the following learning outcome (learning objective level Apply): Participants will be able to apply the learned approaches and methods of transfer enhancement in training measures to specific in-company training programs. They will be able to identify suitable measures to promote transfer and integrate these into the design and implementation of training courses.”

    Answer: “Examination task (exemplary): You are working as a trainer for a company and are to carry out a training measure on the topic of ‘Effective communication in a team’. Describe three specific measures that you would take to promote the transfer of what you have learned into the participants’ everyday working life. Explain how these measures can be integrated into the design and delivery of the training.”

  3. Request: “Suggest examination tasks for a written exam on the following learning outcome (learning objective level Analyze): Participants will be able to analyze the factors influencing the transfer of training content into operational practice. They will understand the organizational, contextual and individual factors that can influence the transfer process and be able to evaluate these and derive appropriate measures to optimize transfer.”

    Answer: “Examination task (exemplary): Consider the organizational context and contextual factors that can influence the transfer of training content into workplace practice. Describe three organizational or contextual factors and analyze how they can influence the transfer process. In addition, evaluate what measures could be taken to optimize transfer in relation to these factors.”

Table 3.

Generation of examination tasks based on the learning outcomes by ChatGPT.

Comments on the examples in Table 3: In my opinion, the suggestions ChatGPT generated for examination tasks on the topic of transfer enhancement adequately address the cognitive requirements relevant to the respective learning objective level. The tasks could certainly be adopted in this form. Nevertheless, they should be treated as draft samples, as they are formulated rather vaguely in certain respects. Below, I discuss for each of the three tasks the aspects that are not sufficiently addressed:

  • In the task for the Understanding level, it would be useful to specify the approach for classifying or systematizing the basic concepts to which the examinees should refer here, as various systematization approaches exist for this.

  • In my opinion, the task for the Applying level should not only ask for a description of the measures for enhancing transfer but also for an explanation of why these are relevant for the stated training topic. If necessary, the overarching objective of the training measure should also be stated so that it is clear what the communication skills are specifically needed for and why they are to be improved.

  • The task for the Analyzing level is somewhat general and is aimed more at requirements of the knowledge reproduction and comprehension level. I would therefore make the application context more concrete or contextualize it with a specific training scenario, so that the examinees are required to perform analytical work to a greater extent. In my opinion, the additional instruction in this assignment ("Explain how these measures can be integrated into the design and delivery of the training.") also requires a correction: the examinees should first be instructed to devise measures for adapting and optimizing the context factors and then be asked to justify why these measures lead to improved transfer.

The comments make clear that the task suggestions generated by ChatGPT need to be optimized in several respects. At best, they can be used as a basis for further elaboration; in my opinion, however, they are still useful as a source of ideas. They also substantially facilitate and support the conceptual work of designing examination tasks. When generating examination tasks, ChatGPT draws relatively systematically on knowledge about suitable task formats for certain learning objective levels and contextualizes these in relation to the subject domains addressed. The bot thus provides valuable support in constructing tasks tailored to learning objectives, as the consistent design of such examination tasks is often difficult to achieve on one's own or without appropriate assistance, especially for teachers with little didactic experience.

However, further revision steps are required to achieve a usable result. This also requires an analysis of which aspects the proposed tasks already fulfill and where there are still deficits and a need for changes or additions. Such changes can relate to technical aspects that make the task more specific and focus it better on the intended cognitive requirements or performance. Additionally, the task instructions should provide information about the criteria used to assess the answers; this tells the examinees what they need to pay attention to when answering the question or task and improves the clarity of the examination task. Furthermore, ChatGPT can be used to further develop and optimize examination tasks. For this purpose, it is recommended to provide the bot with the previous version of the task and ask it to specify certain aspects (e.g., to generate a suitable application scenario). The bot can also be used to generate additional tasks for a specific learning outcome in order to increase the breadth of content covered.

4.3 Generation of assessment criteria and schemes for examination tasks

The assessment of examination performance should be as objective, reliable, valid, transparent and fair as possible. In addition to examination tasks that are geared toward learning objectives and the guarantee of standardized and fair implementation conditions, assessment criteria also essentially contribute to the examination being experienced as appropriate and fair and that the examination fulfills the above-mentioned quality criteria. Assessment criteria and assessment schemes support examiners in ensuring that the assessment of examination performance is not carried out arbitrarily or intuitively, but on the basis of comprehensible and content-appropriate standards, and that quality criteria of objectivity, reliability, validity and fairness are fulfilled to a sufficient extent. If this is not the case, the result of the examination is not only questionable and distorted by uncontrolled assessment processes, but also legally contestable. Assessment criteria for evaluating examination performance—not only for written examinations—are therefore essential. They also provide examiners with certainty when evaluating examination performance, increase transparency with regard to how the examination result is arrived at and are helpful in providing detailed feedback on the strengths and weaknesses of performance. Incidentally, this applies not only to inexperienced examiners but also to experienced and didactically skilled examiners. In my opinion, assessment criteria for the tasks used in an examination that are appropriate in terms of content and requirements are therefore an indispensable part of the examination arrangement. For various reasons, appropriate assessment criteria should also be developed during the construction of the examination tasks, that is, parallel to or immediately after the task construction. On the one hand, they are necessary for a solid task design in order to avoid errors or ambiguities in the formulation of the task (e.g., to check whether the task is sufficiently focused on the skills and knowledge to be tested and whether this is also clearly stated in the task instructions for the examinees). Since the assessment criteria should also refer to a large extent to the learning outcome and the corresponding learning objective level as well as the knowledge required for this, they also ensure that the examination arrangement fulfills the principle of constructive alignment. In addition, the assessment criteria can be used to transparently communicate to examinees what is important in the examination and what standards their examination performance should meet.

In my opinion, ChatGPT can also effectively support examiners in this step. Various application tests show that the system is able to develop meaningful suggestions for determining and formulating assessment criteria and schemes for certain examination tasks. The bot is oriented toward common assessment dimensions in examinations and is also able to take into account specific methodological concepts of assessment schemes (e.g., the rubrics concept). When generating assessment criteria for a specific examination task or even a class/group of tasks with the help of ChatGPT, certain aspects of prompting should be taken into account:

  • The core prompt for ChatGPT can be formulated, for example, as follows: “How should the answers to the following examination task be scored: [state the original wording of the exam task]. Also, give me a detailed evaluation scheme for scoring the performance.”

  • In addition to stating the original task, it may be useful to state the learning outcome on which the task is based, which is also an important reference point for generating the assessment criteria.

  • Additionally, it may be necessary to specify the type of assessment scheme required (e.g., whether this should take the form of a rubric). As described above, rubrics consist of an evaluation matrix in which the evaluation criteria relevant to a specific task or examination performance are named and defined on one axis, while the type and number of quality levels for evaluating the individual criteria or dimensions are specified on the second axis (e.g., (1) good to very good, (2) satisfactory, and (3) insufficient fulfillment of the criteria). The cells of the matrix then describe notes and aspects for evaluating the respective quality level for each criterion. Another tested evaluation scheme refers to the following aspects: evaluation category, definition/characterization of the level, anchor example, points. This scheme is particularly useful for tasks in which a distinction is made only between "correct" and "incorrect" or "partially correct"; on the basis of the level definitions and the anchor examples given, a well-founded decision or assignment can be made (a data sketch of this scheme follows the list).
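
The following sketch holds such a category/definition/anchor/points scheme as plain data together with a small lookup; the sample content about a transfer-enhancement concept is invented purely for illustration.

```python
# One task's evaluation scheme: category, definition, anchor example, points.
scheme = [
    {"category": "correct",
     "definition": "names the concept and describes its mechanism accurately",
     "anchor": "'Relapse prevention rehearses likely obstacles before re-entry.'",
     "points": 2},
    {"category": "partially correct",
     "definition": "names the concept, but the mechanism stays vague",
     "anchor": "'Relapse prevention is about avoiding setbacks.'",
     "points": 1},
    {"category": "incorrect",
     "definition": "concept misidentified or missing",
     "anchor": "'Relapse prevention means repeating the whole training.'",
     "points": 0},
]

def score(assigned_category: str) -> int:
    """Look up the points for the category an examiner has assigned.

    Assumes the category exists in the scheme.
    """
    return next(r["points"] for r in scheme if r["category"] == assigned_category)

print(score("partially correct"))  # -> 1
```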

In the following (see Tables 4 and 5), two examples are presented in which ChatGPT was asked to generate suitable assessment criteria for two specific examination tasks. These in turn relate to the teaching unit “Enhancing transfer in in-company training measures” and include examination tasks for the learning objective levels “Understanding” and “Evaluating.”

Question: “How should the answers to the following examination task be assessed? Explain the basic concepts of transfer enhancement in in-company training measures and give an example of their application in each case. Also describe how these concepts can contribute to performance development and organizational success. Give me a detailed assessment scheme for scoring performance.”
Answer: “The following criteria and a detailed scoring scheme could be used to score performance when evaluating responses to this assessment task:
  1. Understanding of the basic concepts of transfer enhancement in workplace training programs (e.g., repetition, practical application, feedback, transfer climate):

    • Complete and accurate explanation of concepts: 3 points

    • Partially correct explanation of the concepts: 2 points

    • Superficial or incomplete explanation of concepts: 1 point

    • Missing or incorrect explanation of concepts: 0 points

  2. Exemplary application of the concepts:

    • Clear and relevant examples that illustrate the application of the concepts: 3 points

    • Partially appropriate examples that partially illustrate the application of the concepts: 2 points

    • Unclear or inadequate examples that do not clearly illustrate the application of the concepts: 1 point

    • Missing or incorrect examples: 0 points

  3. Description of the contribution of the concepts to performance development and organizational success:

    • Clear and comprehensive connection between the concepts and their contribution to performance development and organizational success: 3 points

    • Partially correct connection between the concepts and their contribution to performance development and organizational success: 2 points

    • Superficial or incomplete description of the link: 1 point

    • Missing or incorrect description of the connection: 0 points

The evaluation scheme could then add up the points for each category and determine an overall score for the task. For example, a possible scale could be as follows:
  • 9–10 points: Very good performance

  • 7–8 points: Good performance

  • 5–6 points: Adequate performance

  • 3–4 points: Poor performance

  • 0–2 points: Insufficient performance”

Table 4.

Generation of assessment criteria by ChatGPT for the task at the understanding level.
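To show how such a generated scheme translates into an overall judgment, the following minimal sketch applies the point bands from Table 4, assuming the per-subtask scores (0–3 each) have already been awarded. Note that the three subtasks yield at most 9 points, so the "9–10 points" band in ChatGPT's scale already contains an unreachable value, exactly the kind of small inconsistency that manual review has to catch.

```python
# A minimal sketch applying the scoring scheme from Table 4.
# Note: three subtasks at 0-3 points each give at most 9 points,
# so the "9-10" band proposed by ChatGPT contains an unreachable
# value (10) that a manual review should correct.
GRADE_BANDS = [
    (9, "Very good performance"),
    (7, "Good performance"),
    (5, "Adequate performance"),
    (3, "Poor performance"),
    (0, "Insufficient performance"),
]

def grade(subtask_points: list[int]) -> str:
    """Sum the per-subtask scores and map the total to a band."""
    total = sum(subtask_points)
    for threshold, label in GRADE_BANDS:
        if total >= threshold:
            return f"{total} points: {label}"
    return f"{total} points: unclassified"

print(grade([3, 2, 3]))  # -> 8 points: Good performance
```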

Question: “How should the answers to the following assessment task be evaluated: Critically analyze the effectiveness of a particular approach to transfer enhancement in workplace training (e.g., collegial mentoring) and discuss its advantages and disadvantages. Assess in which workplace contexts this approach might work best and what factors might influence its effectiveness. Provide recommendations on how the approach could be improved to facilitate the transfer of training content into practice. Give me a detailed assessment scheme for scoring performance.”
Answer: “A possible scheme for scoring performance in response to this examination task could be as follows:
  1. Analyze the effectiveness of the transfer enhancement approach (max. 10 points):

    • Critically evaluate the effectiveness of the approach (e.g., using studies or case studies)

    • Consideration of advantages and disadvantages of the approach

    • Understanding of the underlying principles and mechanisms of the approach

  2. Discussion of the operational contexts and influencing factors (max. 10 points):

    • Identification of operational contexts in which the approach might work best

    • Analysis of factors that could influence the effectiveness of the approach (e.g., corporate culture, employee motivation)

  3. Recommendations for improving the approach (max. 5 points):

    • Creative and realistic suggestions for optimizing the approach

    • Consideration of the challenges and potentials in the implementation of the recommendations

  4. Structure and clarity of the argumentation (max. 5 points):

    • Logical and well-structured presentation of the arguments

    • Comprehensible and precise formulation of the statements

  5. References and use of specialist terminology (max. 5 points):

    • Citation of relevant sources to support the statements

    • Use of specialist terminology appropriately and correctly

The maximum score is therefore 35 points. The assessment should be based on these criteria, with particular consideration given to the quality of the argument, the depth of the analysis and the originality of the recommendations.”

Table 5.

Generation of assessment criteria by ChatGPT for the task at the evaluating level.
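The weighted scheme from Table 5 can be operationalized just as simply: each criterion carries its own maximum, and the maxima sum to the stated total of 35 points. In the sketch below, the awarded scores are hypothetical example values.

```python
# A minimal sketch of the weighted scheme from Table 5: each
# criterion has its own maximum and the maxima sum to 35 points.
# The awarded scores are hypothetical example values.
MAX_POINTS = {
    "analysis of effectiveness": 10,
    "contexts and influencing factors": 10,
    "recommendations for improvement": 5,
    "structure and clarity of argumentation": 5,
    "references and use of terminology": 5,
}

awarded = {
    "analysis of effectiveness": 8,
    "contexts and influencing factors": 7,
    "recommendations for improvement": 4,
    "structure and clarity of argumentation": 5,
    "references and use of terminology": 3,
}

# Clamp each score to its criterion maximum and add everything up.
total = sum(min(awarded[c], cap) for c, cap in MAX_POINTS.items())
print(f"{total} / {sum(MAX_POINTS.values())} points")  # -> 27 / 35 points
```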

Comments on the first example (Table 4): The assessment criteria for the task at the "Understanding" level refer to the various subtasks and propose a scoring scheme with different levels of response quality for each subtask. However, the levels are characterized in relatively general terms (from "complete and accurate" to "missing or incorrect explanation of concepts"), making it difficult to differentiate between the levels or to assign a specific answer to one of them. If this assessment scheme is to be used, it is therefore advisable to add more specific guidance for assigning answers to levels and for delimiting the scoring levels from one another. In particular, the insertion of anchor examples, that is, typical response examples for the respective level, could be helpful here. The recommended grading scheme is also relatively generic and can or must be adapted to the specific application context (e.g., with regard to the point ranges assigned to the respective grades).

Comments on the second example (Table 5): The assessment criteria for the task at the "Evaluating" level refer on the one hand to the various subtasks and on the other to overarching aspects of response quality ("Structure and clarity of the argumentation" and "References and use of specialist terminology"). However, no levels of response quality are described for assessing the individual subtasks; instead, specific criteria are named to which the response should refer (e.g., "understanding of the underlying principles and mechanisms of the approach" for the first subtask). In my opinion, this yields a more detailed specification of the assessment criteria than in the first case. With regard to a concrete application of this assessment scheme, however, an additional concretization of these specific criteria is recommended (e.g., through sample solutions or anchor examples) in order to obtain even clearer guidance for evaluating individual answers. The scoring scheme should also be defined more clearly: the maximum point totals indicated for each criterion are not explained in detail and primarily reflect a different weighting of the subtasks. Overall, ChatGPT provides useful suggestions for structuring the assessment and determining relevant assessment criteria and aspects, which, however, require further elaboration as described.

My comments on the two examples have already illustrated both the potential of the suggestions generated by ChatGPT for the assessment of examination tasks and their need for revision. For this step too, it can be stated that ChatGPT is able to generate useful suggestions for assessing examination performance on defined examination tasks. Here as well, ChatGPT's suggestions have the status of useful ideas and hints rather than ready-made solutions. Suggestions for the independent revision of such proposals have already been mentioned and relate primarily to specifying the assessment criteria through anchor examples and sample solutions, defining more clearly which aspects points are to be awarded for, and stating how the points are to be transformed into grades. ChatGPT can also be used in this context for the further development of assessment schemes, in particular by specifying certain forms of assessment scheme (e.g., rubrics or true/false assessment schemes), as the sketch below illustrates.
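Such follow-up prompting can also be scripted rather than typed into the chat interface. The following is a minimal sketch, assuming the openai Python package (v1 interface) and an API key in the OPENAI_API_KEY environment variable; the model name and the prompt wording are my own illustrative choices, not part of the procedure described above.

```python
# A minimal sketch of scripting the follow-up prompt via the
# openai Python package (v1 interface); the model name and prompt
# wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

draft_scheme = "..."  # paste the previously generated assessment scheme here

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": (
            "Rewrite the following assessment scheme as a rubric: "
            "criteria as rows, three quality levels as columns, and "
            "a short scoring note plus an anchor example in each cell.\n\n"
            + draft_scheme
        ),
    }],
)
print(response.choices[0].message.content)
```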

Finally, in a further test step, it is advisable to analyze and evaluate the constructive alignment of the various elements. On this basis, adjustments can be made so that the fit between learning outcomes, examination tasks, and assessment criteria, as well as the associated teaching/learning activities, is improved. This coordination of the various elements is not only important for didactic coherence; it also affects the quality of the learning process and has been shown to promote motivation to learn. Various studies have shown that learning arrangements that follow the constructive alignment principle promote the use of deeper learning strategies, greater learning effort, and the achievement of more challenging learning goals [35, 36, 37]. However, attempts to determine the extent to which ChatGPT is also able to support teachers in checking constructive alignment proved less successful: the language model was not able to make appropriate assessments of the fit between learning outcome, examination task, and assessment criteria in relation to this didactic principle.


5. Conclusion and outlook

In my opinion, this attempt to use ChatGPT for the design of written examinations, in particular for the design of learning outcome- and competence-oriented examination arrangements, has made clear that ChatGPT can indeed support this task and process in various ways. For example, the AI tool can be used to generate drafts and ideas for the appropriate formulation of learning outcomes, the learning outcome-oriented design of examination tasks, and the construction of suitable assessment criteria and schemes. However, it also became clear that ChatGPT does not provide ready-made solutions, but at best drafts and ideas that require further elaboration. It also became clear that ChatGPT produces errors and inaccuracies in its proposals, which need to be identified and corrected through careful and critical analysis. In this context, it should be pointed out once again that generative language models are not trained to produce true and correct solutions, but rather probable word sequences; that these word sequences make sense and are true is to a certain extent only a by-product [1]. Although between 60 and 80% of the answers produced in this way are correct, this conversely means that 20–40% of the answers can be or are incorrect [13]. ChatGPT therefore repeatedly presents answers that are partially incorrect or even invented. The providers and developers of generative language models will certainly work on increasing the rate of correct answers, but the fundamental problem will remain, as it is inherent in the design of such language models. For the application context under consideration, this means that ChatGPT's proposals for the formulation of learning outcomes and the design of examination tasks and assessment criteria/schemes must in any case be reviewed and revised; this cannot be repeated often enough.

Finally, it should be pointed out that ChatGPT can be used to generate not only open response formats for exams, but also closed task formats. Reference should be made here to the presentations by Patel [12] and Workera [38], which show that ChatGPT can also generate useful suggestions for multiple-choice questions, cloze texts, true/false questions, matching tasks, and open short-answer questions across a wide range of subject areas. As described above, prompting requires information not only on the content to be tested, but also on the target group, the level of difficulty or type of cognitive demand, the style of wording, and so on (see the template sketch below). These sources also address what needs to be done and considered when revising the proposals (in particular [38]). ChatGPT can furthermore be used to check answers to exam questions; examples can be found in Herft [39], where it is pointed out in particular that such checks can be used to give students individual feedback on their performance in formative (exercise) tasks. Potential for change through ChatGPT in the area of examinations also arises from the fact that the tool can be used deliberately in unsupervised examination formats (such as seminar papers and term papers) and in the writing of academic theses. ChatGPT thus becomes a partner in the writing process (in the sense of hybrid writing) and not only eases certain writing demands but also enables further learning processes through its support. However, in order to allow such hybrid processes in written examinations, clear framework conditions are required that specify in which form the bot may be used for the creation of such texts and what must be observed (e.g., that all passages written with the help of ChatGPT or another generative AI must be marked).
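Since these prompt elements recur across task formats, they can be collected in a reusable template. The sketch below is illustrative only; all field values are hypothetical examples.

```python
# A minimal sketch of a reusable prompt template for closed task
# formats; all field values are illustrative placeholders.
PROMPT_TEMPLATE = (
    "Create {n} {task_format} questions on the topic '{topic}' "
    "for {target_group}. Difficulty: {difficulty}. "
    "Cognitive level (Anderson/Krathwohl): {cognitive_level}. "
    "For each question, provide the correct answer and plausible "
    "distractors, and keep the wording {style}."
)

prompt = PROMPT_TEMPLATE.format(
    n=5,
    task_format="multiple-choice",
    topic="transfer enhancement in in-company training",
    target_group="psychology undergraduates",
    difficulty="medium",
    cognitive_level="Understanding",
    style="concise and unambiguous",
)
print(prompt)
```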

Let me conclude with a statement by an experienced colleague in this field [1]: "I assume that sooner rather than later ChatGPT will become a tool that we use as a matter of course - especially at universities. We should understand this tool (with all its strengths and weaknesses), use it confidently and reflectively and be vigilant about the risks and side effects."

References

  1. Fleischmann A. ChatGPT in der Hochschullehre. Wie künstliche Intelligenz uns unterstützen und herausfordern wird. In: Berendt B, Fleischmann A, Schaper N, Szczyrba B, Wiemer M, Wildt J, editors. Neues Handbuch Hochschullehre A 1.30. Ausgabe 110 ed. Berlin: DUZ; 2023. pp. 13-56
  2. Gamage K, Dehideniya SCP, Xu Z, Zu XT. ChatGPT and higher education assessments: More opportunities than concerns? Journal of Applied Learning and Teaching. 2023;6(2):1-12
  3. Kalnitski M. Bachelorarbeit mit KI schreiben? TikTok video from 21.12.2022. 2023. Available from: https://www.tiktok.com/@botscaler/video/719688580804087046
  4. Tangermann V. College Student Caught Submitting Paper Using ChatGPT. 2023. Available from: https://futurism.com/college-student-caught-writing-paper.chatgpt
  5. Pollfish Surveys. Nearly 1 in 3 College Students Have Used ChatGPT on Written Assignments. 2023. Available from: https://www.intelligent.com/nearly-1-in-3-college-students-have-used-chatgpt-on-written-assignments
  6. Jouanne-Diedrich Hv. Professor gegen KI! Kann ChatGPT meine Klausur lösen? Video. 2023. Available from: https://www.youtube.com/watch?v=SS24iTTOn7o
  7. Kipping D. How Does ChatGPT Do on a College Level Astrophysics Exam? Video. 2023. Available from: https://www.youtube.com/watch?v=KOcmmKPKlp4
  8. Fleck T. Prüfungsrechtliche Fragen zu ChatGPT. Erlangen: Handreichung der Stabsstelle IT-Recht der bayerischen Universitäten; 2023. Available from: https://www.rz.uni-wuerzburg.de/fileadmin/42010000/2023/ChatGPT_und_Prüfungsrecht.pdf
  9. Sullivan M, Kelley A, McLaughlin P. ChatGPT in higher education: Considerations for academic integrity and student learning. Journal of Applied Learning and Teaching. 2023;6(1):1-10
  10. Solis T. Die ChatGPT-Richtlinien der 100 größten deutschen Universitäten. Amsterdam: Scribbr; 2023. Available from: https://www.scribbr.de/ki-tools-nutzen/chatgpt-universitaere-richtlinien/
  11. Rasul T et al. The role of ChatGPT in higher education: Benefits, challenges, and future research directions. Journal of Applied Learning and Teaching. 2023;6(1):1-16
  12. Patel N. Using ChatGPT Prompts to Generate Assessment Questions. 2023. Available from: https://www.playpowerlabs.com/post/using-chatgpt-prompts-to-generate-effective-formative-assessment-questions
  13. Gieselmann H. Die 80-Prozent-Maschinen. Warum KI-Sprachmodelle weiterhin Fehler machen und was das für den produktiven Einsatz bedeutet. c’t, Heft. 2023;21:17-19
  14. Caulfield J. Is ChatGPT Trustworthy? Accuracy Tested. 2023. Available from: https://www.scribbr.com/ai-tools/is-chatgpt-trustworthy/
  15. Ekin S. Prompt engineering for ChatGPT: A quick guide to techniques, tips, and best practices. 2023. DOI: 10.36227/techrxiv.22683919.v1
  16. Gptbot.io. Mastering ChatGPT: How to Craft Effective Prompts. 2023. Available from: https://gptbot.io/master-chatgpt-prompting-techniques-guide/
  17. Bager J. Instruieren und verifizieren. Tipps und Tools, mit denen Sie Sprachmodelle produktiv nutzen. c’t, Heft. 2023;21:26-19
  18. Coursera. How To Write ChatGPT Prompts: Your 2024 Guide. 2024. Available from: https://www.coursera.org/articles/how-to-write-chatgpt-prompts
  19. Crabtree M. 10 of The Best ChatGPT Plugins to Get The Most From AI in 2023. 2023. Available from: https://www.datacamp.com/blog/best-chat-gpt-plugins
  20. Schaper N, Soyka C. Kompetenzorientiertes Prüfen. Grundlagen, präsenz- und onlinegestützte Formate, Bewertung und Rückmeldung kompetenzorientierter Prüfungsleistungen. In: Neues Handbuch Hochschullehre, H 6.8. Ausgabe 100 ed. Berlin: DUZ Verlags- und Medienhaus; 2021. pp. 95-122
  21. Wesselink R, Biemans H, Gulikers J, Mulder M. Models and principles for designing competence-based curricula, teaching, learning and assessment. In: Mulder M, editor. Competence-Based Vocational and Professional Education. London: Springer; 2017. pp. 533-553
  22. Biggs J, Tang K. Teaching for Quality Learning at University. 4th ed. Maidenhead: Open University Press; 2011
  23. Prøitz TS. Learning outcomes: What are they? Who defines them? When and where are they defined? Educational Assessment, Evaluation and Accountability. 2010;22:119-137
  24. Anderson LW, Krathwohl DR. A Taxonomy for Learning, Teaching and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives: Complete Edition. New York: Longman; 2001
  25. Gamage KAA, Pradeep RGG, de Silva EK. Rethinking assessment: The future of examinations in higher education. Sustainability. 2022;14:3552. DOI: 10.3390/su14063552
  26. Schaper N. Prüfen in der Hochschullehre. In: Kordts-Freudinger R, Schaper N, Scholkmann A, Szczyrba B, editors. Handbuch Hochschuldidaktik. Bielefeld: wbv/utb; 2021. pp. 87-102
  27. Klapproth F. Von den Lehrzielen zur schriftlichen Prüfung. Ein Leitfaden für Lehrende der Psychologie. Göttingen: Hogrefe; 2023
  28. Schaper N. Konzeption schriftlicher Prüfungsaufgaben auf Basis der Lernzieltaxonomie von Anderson & Krathwohl. In: Schaper N, Vogelsang C, editors. Prüfen in der Hochschullehre—Konzeptionell Fundiert, Evidenzbasiert und Qualitätsorientiert. Köln: FIHB-Reihe, TH Köln; 2024. pp. 58-72
  29. Bloxham S, Boyd P. Developing Assessment in Higher Education: A Practical Guide. New York, NY: Open University Press, McGraw-Hill; 2007
  30. Metzger C, Nüesch C. Fair prüfen. Ein Qualitätsleitfaden für Prüfende an Hochschulen. Hochschuldidaktische Schriften. Vol. 6. St. Gallen: Institut für Wirtschaftspädagogik, Universität St. Gallen; 2024
  31. Glion Institute of Higher Education. How to Write Assessment Criteria? 2023. Available from: https://library.glion.edu/celt/how-to-write-assessment-criteria/
  32. Schlomske-Bodenstein N, Strasser A, Schindler C, Schulz F. Handreichungen zum kompetenzorientierten Prüfen. 2018. Available from: https://www.lehren.tum.de/fileadmin/w00bmo/www/Downloads/Themen/Pruefungen/Handreichungen_kompetenzorientiert_Pruefen.pdf
  33. Stevens DD, Levi A. Introduction to Rubrics: An Assessment Tool to Save Grading Time, Convey Effective Feedback, and Promote Student Learning. Sterling, VA: Stylus Publishing LLC; 2012
  34. Universität Zürich—Arbeitsstelle für Hochschuldidaktik. Taxonomie-Matrix zur Analyse und Selbstevaluation von Hochschullehre (TAMAS). 2010. Available from: https://www.weiterbildung.uzh.ch/dam/jcr:ffffffff-9a08-8cca-0000-000037b2e4ce/DU_Tamas_def.pdf
  35. Biggs J. Constructive alignment in university teaching. HERDSA Review of Higher Education. 2014;1:5-22. Available from: https://www.herdsa.org.au/herdsa-review-higher-education-vol-1/5-22
  36. Hailikari T, Virtanen V, Vesalainen M, Postareff L. Student perspectives on how different elements of constructive alignment support active learning. Active Learning in Higher Education. 2022;23(3):217-231. DOI: 10.1177/1469787421989160
  37. Raeburn P, Muldoon N, Bookallil C. Blended spaces, work-based learning and constructive alignment: Impacts on student engagement. In: Atkinson RJ, McBeath C, editors. Same Places, Different Spaces. Proceedings of the 26th Annual Ascilite International Conference. Auckland: Ascilite; 2009. pp. 820-831
  38. Workera. Making the Right Choices: How to Generate Outstanding Multiple-Choice Questions using ChatGPT. 2023. Available from: https://workera.ai/blog/chatgpt-learning-assessments
  39. Herft A. Leitfaden für Lehrkräfte zur Nutzung von ChatGPT. Talk given at the Pedagogical Day at the Deutsch-Französisches Gymnasium in Saarbrücken on 13.02.2023. 2023. Available from: https://material.rpi-virtuell.de/material/leitfaden-fuer-lehrkraefte-zur-nutzung-von-chatgpt/
