Descriptive statistics and correlations for Studies 1–4.
Abstract
Despite the growing interest in utilizing commercial off-the-shelf (COTS) games for instructional and assessment purposes, there is a lack of research evidence regarding COTS games for these applications. This chapter considers the application of COTS games for instruction and assessment and provides preliminary evidence comparing COTS game scores to traditional multiple-choice assessments. In a series of four studies, we compared performance in a COTS game to scores on a traditional multiple-choice assessment written for each study to evaluate the same content presented in the game. Three of the four studies demonstrated a significant correlation between the COTS game and the traditional multiple-choice assessment scores. The non-significant value in Study 4 was likely due to a small sample size (n < 100). The results of these studies support our hypothesis and demonstrate that COTS games may be a useful educational tool for training or assessment purposes. We recommend that future research focus on specific applications of COTS games to explore further opportunities for utilizing COTS games in education and assessment.
Keywords
- commercial off-the-shelf (COTS) game
- game-based assessments (GBAs)
- serious games
- game-based learning (GBL)
- instructional games
- multiple-choice assessments
1. Introduction
Over the last decade, the use of games for purposes outside of entertainment has grown in both research and industry, leading to discrepancies in conceptual definitions, use cases, and evidence-based benefits [1, 2, 3, 4]. A recent increase in literature reviews and meta-analyses on game-based interventions reflects the wealth of research being conducted in this area [5, 6]. However, game developers have quickly capitalized on the widespread interest and often market game-based interventions for uses that have not been empirically tested [7]. In this chapter, we aim to provide clarity on this topic by first reviewing important terminology, second discussing applications of games for instruction, and third exploring applications of games for assessment. Then we focus on the rise of commercial off-the-shelf (COTS) games for both instructional and assessment purposes. We close by highlighting the authors’ research from four studies focused on comparing COTS games to traditional assessments.
1.1 Important terminology
In this chapter, we use the term
1.2 Game-based learning (GBL)
Games have been used to stimulate student learning by attracting students’ attention through increased motivation [14]. Additionally, researchers and practitioners have shown increasing interest in alternative instructional methods, which emphasize a more active learner role [8]. Combined with the increasing growth of interactive technologies, this provides a unique opportunity to explore learning environments which involve students in their own learning process, such as the gamification of learning content and using a GBL solution [15]. In contrast with traditional learning methods, educational games have shown a positive relationship with the thinking skills of learners and can improve learner motivation [16, 17]. Considering the ways that GBL solutions can present learning experiences in a motivating way to students, using games for learning could feasibly increase student engagement and in turn have the potential to improve the academic performance of students [18, 19, 20].
1.3 Game-based assessment (GBA)
GBAs are designed to measure knowledge, skills, or abilities (KSAs) within the context of the game [21, 22, 23]. They are defined as evaluations that use game elements to immerse the individual in a specific game environment, allowing them to interact with it and demonstrate the desired KSA [2]. Importantly, GBAs typically use game activities as tasks to generate evidence of complex skills [24]. Using a GBA with strong psychometric evidence (e.g., reliability and validity evidence for the intended use) has the potential to provide a number of benefits. Some researchers have proposed that GBAs might be designed to generate positive outcomes for test-takers, such as reduced test anxiety, greater difficulty faking responses, and more behavior-based measurement [3, 7, 8, 9, 16, 25, 26]. In the context of work, some researchers have referred to GBA as “an assessment method in which job candidates are players participating in a core gameplay loop while trait information is inferred” [9].
While traditional assessments use participants’ responses to textual or graphic prompts to collect data about their knowledge, skills, and abilities, GBAs gather this data from the test-taker’s in-game behaviors [27]. These behaviors can span a range of information, including overt information such as the choice a player makes when faced with a discrete decision in a game. A theory-driven assessment often draws on a theoretical research model to build the design and intention of the evaluative components into the assessment. These components may lead to a narrative or series of decisions around targeted behaviors related to the variable being assessed. An alternative to a theory-driven approach is a data-driven approach, in which minor behaviors during gameplay are used to predict the outcomes of interest. A data-driven GBA is not built with a targeted construct and measurement in mind [28, 29, 30] but can empirically measure a construct using
GBAs offer several benefits that make them particularly appealing to organizations that might use them for personnel selection. One of these is the suggestion that GBAs might be able to predict job performance beyond traditional selection methods [16, 32]. Given the longstanding concern of applicant faking, the potential to reduce socially desirable responding is another benefit that GBAs may provide to organizational assessments [16, 32]. Because GBAs can be designed to measure traits and behaviors indirectly, they can obscure the purpose of the assessment, which could in turn make it more challenging for candidates to identify the variable being assessed and what a good answer would look like [32]. A last benefit that may be of particular interest to organizations is that GBAs have been shown to reduce adverse impact compared with traditional paper-and-pencil tests [27, 33, 34]. Because of these benefits, organizations may want to consider investing in GBAs to evaluate different characteristics in the workplace, such as cognitive ability, individual characteristics, or job skills [27, 35].
1.4 Commercial off-the-shelf (COTS) games
Given the time, money, and expertise needed to develop game-based interventions, there are contexts in which it is more reasonable for researchers or practitioners to use an existing COTS game for their purpose rather than investing the time or resources into developing their own game [3, 27, 36]. However, a critical consideration when choosing to use a COTS game is that these games are rarely designed for the purpose they will be used for. Although a growing number of vendors are developing COTS games for learning or assessment purposes and building a body of evidence to support those uses, in most contexts a researcher or practitioner will not have the option of a COTS game designed for the purpose they intend. This makes the consideration and selection of the COTS game a critical step.
If a COTS game is being used for assessment or evaluation, the behaviors displayed by the player must be measurable, either by a metric captured in the game or by an observable behavior that can be recorded by an observer. Additionally, the content needs to present a scenario in which the player has the opportunity to demonstrate behavior relevant to the variable being measured. For example, see the series of studies in [27], where VR COTS games were used to measure spatial recognition variables.
If a COTS game were being used for a learning application, the game would need to present the relevant content that the player needs to learn. This includes presenting the information and ideally presenting a context in which the relevant information could be used or the intended skill could be practiced. It would be further valuable for the selection of the COTS game to avoid games with excessive information or components that aren’t relevant to the learning experience, also called
Research on traditional assessments focuses heavily on the psychometric reliability and validity of the instrument [9], but empirical support for the psychometric properties of COTS games is still in its infancy. This lack of research on the efficacy and utility of COTS games for instructional and assessment purposes has led to the current series of studies, which are intended to provide preliminary evidence on the use of COTS game scores. In these studies, we ask the following research question.
Below we detail four correlational studies focused on the application of COTS games. We seek to better understand potential applications of COTS games as learning or assessment interventions by examining the similarity between scores produced in a COTS game and scores on a traditional multiple-choice assessment. Each of the COTS games in the studies below was carefully considered and chosen because it represents content that could be relevant to learners in a GBL context. Since the scores produced by COTS games are not intended for learning or assessment purposes, our question is a general one: are these scores meaningfully related to a relevant measure in these particular cases, namely scores on a traditional multiple-choice assessment in the same content area? We caveat that this evidence does not generalize to all COTS games, and individual data would need to be collected and analyzed if a COTS game is being considered for a similar application. However, this series of studies does provide a general proof of concept that may aid future applications of COTS games.
2. Research series
Data from the first three studies were collected as part of other projects and analyzed for this research question. All four studies sampled undergraduate and graduate students from universities in the Western United States, with approval from the associated Institutional Review Board. Each study reports demographic information, scores from the COTS game used, and scores from the multiple-choice assessment. An image of the COTS game used in each study is shown in Figure 1. Each multiple-choice assessment item included four response options with one correct answer and showed acceptable item metrics in a pilot sample (i.e., reasonable difficulty between 0.30 and 0.90 and knowledge discrimination of
2.1 Study 1
Participants were 385 students; primarily female (51%) and Caucasian (73%), with an average age of almost 19 years (
2.1.1 COTS game
In the videogame
2.1.2 Multiple-choice assessment
A 26-item multiple-choice assessment was written for this study with questions linked to the specific missions, roles, and responsibilities in the game.
2.2 Study 2
Participants were 140 students; primarily female (69%), and mostly Caucasian (39%) or Hispanic (29%) with an average age of almost 24 years (
2.2.1 COTS game
The videogame
2.2.2 Multiple-choice assessment
A 17-item multiple-choice assessment was written for this study. All questions pertained to the surgical terms, tools, and procedures presented during the game.
2.3 Study 3
Participants were 100 students; mostly females (67%) of Caucasian (40%) or Hispanic (32%) descent with an average age of about 23 years (
2.3.1 COTS game
In the game
2.3.2 Multiple-choice assessment
An 18-item multiple-choice assessment was written for this study. All questions were related to the tasks and terminology used in the game for repairing PCs.
2.4 Study 4
Participants were 78 students; mostly females (70%) of Hispanic (27%) or Asian (28%) descent with an average age of about 23 years (
2.4.1 COTS game
In the game
2.4.2 Multiple-choice assessment
A 10-item multiple-choice assessment was written for this study. All questions asked about the tools and types of repairs done in the game.
3. Research results
Results of the correlation coefficient analyses are shown in Table 1. These demonstrate that there are positive, significant correlations between the COTS game and multiple-choice assessment scores in three of the four studies; Study 1
Variable | M | SD | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|---|
Study 1 | | | | | | | |
1 | 0.48 | 0.50 | — | | | | |
2 | 18.97 | 1.72 | 0.18** | — | | | |
3 | 2.64 | 0.99 | 0.62** | 0.10* | — | | |
4 | 0.55 | 0.15 | 0.16** | 0.04 | 0.14** | — | |
5 | 0.56 | 0.15 | 0.29** | 0.07 | 0.45** | 0.23** | — |
Study 2 | | | | | | | |
1 | 0.28 | 0.44 | — | | | | |
2 | 23.75 | 4.85 | 0.14 | — | | | |
3 | 2.16 | 1.01 | 0.60** | −0.12 | — | | |
4 | 0.60 | 0.24 | 0.05 | −0.26** | 0.21* | — | |
5 | 0.11 | 0.03 | 0.05 | 0.16 | −0.01 | 0.27** | — |
Study 3 | | | | | | | |
1 | 0.29 | 0.46 | — | | | | |
2 | 22.49 | 5.61 | 0.15 | — | | | |
3 | 2.90 | 0.55 | 0.59** | −0.13 | — | | |
4 | 0.55 | 0.27 | 0.11 | −0.16 | 0.22* | — | |
5 | 0.71 | 0.16 | 0.09 | −0.04 | 0.23* | 0.36** | — |
Study 4 | | | | | | | |
1 | 0.20 | 0.40 | — | | | | |
2 | 24.04 | 6.85 | 0.29* | — | | | |
3 | 2.51 | 1.10 | 0.50** | 0.10 | — | | |
4 | 0.42 | 0.10 | −0.02 | 0.03 | −0.14 | — | |
5 | 0.55 | 0.25 | 0.06 | −0.25* | 0.30* | 0.02 | — |
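As a rough illustration of how the significance of the focal correlations depends on both the size of the correlation and the sample size, the sketch below applies Fisher's z transformation to the correlation between the fourth and fifth variables reported for each study (0.23, 0.27, 0.36, and 0.02). It assumes that n equals the number of participants reported for each study (385, 140, 100, and 78); the analytic samples may have been smaller. The chapter does not state which significance test was used, so the function name and the choice of test are illustrative assumptions, not the authors' procedure.

```python
import math


def fisher_z_test(r, n):
    """Two-sided test of H0: rho = 0 using Fisher's z transformation.

    Returns (z statistic, p-value). This is a large-sample normal
    approximation, not necessarily the test used in the chapter.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation."""
    half_width = z_crit / math.sqrt(n - 3)
    center = math.atanh(r)
    return math.tanh(center - half_width), math.tanh(center + half_width)


# (r between variables 4 and 5, assumed n = reported participants)
studies = {
    "Study 1": (0.23, 385),
    "Study 2": (0.27, 140),
    "Study 3": (0.36, 100),
    "Study 4": (0.02, 78),
}

for name, (r, n) in studies.items():
    z, p = fisher_z_test(r, n)
    low, high = fisher_ci(r, n)
    print(f"{name}: r = {r:.2f}, n = {n}, z = {z:.2f}, "
          f"p = {p:.4f}, 95% CI [{low:.2f}, {high:.2f}]")
```

The confidence interval is included because it conveys the range of plausible correlations for each study, which a significance flag alone does not.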
4. Discussion
The purpose of this chapter was to provide information about the application of COTS games as game-based interventions. In the research series presented as part of this chapter, we sought to build upon a growing body of research literature on COTS games by showing the convergence between COTS game scores and traditional multiple-choice assessment scores on the same content. Overall, the results addressed our research question and showed that COTS game scores significantly and positively correlated with the traditional knowledge assessment scores in Studies 1–3 but not in Study 4. Although a non-significant relationship was found in Study 4, we note that this result may be attributable to a limited sample size. Data collection was halted during Study 4 due to the onset of the COVID-19 pandemic, when local laws prohibited in-person gatherings, including the in-person data collection that was part of this research protocol. Overall, these results suggest that when a COTS game is considered and chosen intentionally, the COTS game scores may be reasonably similar to traditional knowledge assessment scores. We caveat these results by stating that these findings cannot be generalized beyond these particular games and serve primarily as a proof of concept for the consideration and use of COTS games. We encourage researchers and practitioners to collect and evaluate their own evidence for the COTS games they select and the intended application of those games.
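For readers planning their own data collection, the relationship between sample size and the ability to detect a correlation can be sketched with a standard approximation based on Fisher's z transformation. The example below estimates the sample size needed to detect a given population correlation in a two-sided test. The significance level of 0.05 and power of 0.80 are conventional defaults assumed here, not values taken from the studies, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_n(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a population correlation r.

    Uses the Fisher z approximation for a two-sided test:
    n = ((z_alpha/2 + z_power) / atanh(r))^2 + 3, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


# Illustrative target correlations; 0.23 and 0.36 are the smallest and
# largest significant game-assessment correlations reported in Table 1.
for target_r in (0.23, 0.30, 0.36):
    print(f"r = {target_r:.2f}: approximately n = {required_n(target_r)}")
```

Comparing these estimates with the sample actually available gives a quick sense of which effect sizes a planned study could reasonably detect.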
4.1 Intentional game selection
The meaningful relationships between the COTS game scores and traditional assessments highlight the potential benefits of using game-based interventions. COTS games may provide viable solutions in learning and assessment contexts but must be considered carefully with the particular purpose of the game in mind. While some COTS games may demonstrate meaningful uses without being intentionally designed to do so [7, 9], it is important to note that not all COTS games will demonstrate similar results. Here we highlight the importance of selecting the right COTS game for the intended purpose. To avoid potential pitfalls of using a COTS game, individuals can prioritize the selection process by incorporating time for planning and review into the early stages of their project. Having a thorough understanding of the differences in game design and content is critical to selecting an appropriate COTS game for the intended purpose. As mentioned earlier in this chapter, learning and assessment are two examples of desired outcomes. To further illustrate how a COTS game could be reasonably connected to the end goal, we provide more detail about the content of the COTS games used in the series of research studies presented in this chapter.
4.1.1 Study 1
In the game
4.1.2 Study 2
In the game
4.1.3 Study 3
In the game
4.1.4 Study 4
In the game
Each of these topics is relevant to workplace learning and may present a relevant context for training or assessment. For example, in Study 1, participants must learn their role in the game but also communicate with members of their team, coordinate their actions, and effectively work together to meet the game objectives. In this instance, the participants cannot be successful unless they work as a team. Although being in space is not a typical work environment, players were actively engaging in workplace-applicable teamwork skills. In contrast, the physical environments of the games in Studies 2–4 were relevant to the fields of their subject matter (healthcare, computer science, and the automotive industry, respectively). As an increasing number of industries continue to consider and apply game-based interventions, the variety of different COTS game contexts becomes more relevant [41, 42, 43].
4.2 Low and high stakes contexts
While COTS games offer a reasonable alternative to the resource-intensive commitment of developing a game, they do not provide the same precision and control as traditional methods, which can be an important component in some training and assessment contexts [9]. Because of this, it is important to use COTS games only in contexts where this precision and control are not critical. Beyond assessing how the content of a COTS game is associated with the intended purpose, it is imperative to consider whether the data will be used in a low stakes or a high stakes scenario.
High stakes scenarios occur when results could have a large, consequential influence on a person’s access to resources. This means the results might affect their pay, promotion, or selection into a program, course, or position. A low stakes scenario occurs when the results are used as general information given to the individual or for developmental purposes when shared with a larger institution. For example, using a game to assess applicants for a job would be high stakes, whereas using a game to assess high school students’ language skills to provide feedback on how to better prepare for college would be low stakes.
An additional consideration to distinguish high and low stakes scenarios is when a cost is associated with the service and when a claim is made with regard to the actual result of the intervention. For example, someone paying for a training course intended to improve their skills with a software is high stakes and should have evidence that this program improves the skills in individuals that it claims to improve. Similarly, if the program is free but it claims to improve the skill, it should still provide this evidence based on its claim. In contrast, if the program is a free game intended for entertainment purposes and does not make claims about improving cognitive functioning or other skill development then it is low stakes.
High stakes scenarios should be approached with extra care and should use instruments or interventions with strong psychometric evidence of reliability and validity for the intended use. To this end, we recommend that COTS games be used primarily in low stakes scenarios. Given the degree of importance and variation that can occur in high stakes scenarios, control over a game or gamified experience is paramount so that any bias or unintended negative consequences can be addressed. COTS games are typically used without the option to customize aspects of the game, hindering the ability to identify and rectify any pitfalls. Using established and rigorous science-based processes for the development of game-based interventions is especially important for high stakes training or assessment purposes. GBL solutions are best developed following empirical instructional system design (ISD) methodology, while GBAs should follow standard assessment development and validation processes to demonstrate the psychometric properties of the assessment [9].
4.3 Recommendations for software developers
Given the results of this chapter, it appears that COTS games have potential uses beyond entertainment when selected with intention and applied thoughtfully. Below are recommendations for software developers regarding actions that might make a COTS game more suitable for alternative purposes.
Firstly, software developers can make a variety of game metrics available to collect from gameplay, as providing frequent and detailed feedback on performance is a key component of both training and assessment [44]. When games are used for training or assessment purposes, having detailed metrics (e.g., how long a player spent on a level or how many incorrect selections were made) can be helpful. COTS games that provide a metrics sheet or summary of data would be far more accessible for alternative purposes. For example, many Massively Multiplayer Online Role-Playing Games (MMORPGs) and Multiplayer Online Battle Arena games (MOBAs) make detailed game metrics publicly available so that players can view statistics on their own and other players’ performance to better understand and improve their future gameplay [45].
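To make the idea of a metrics sheet concrete, the following is a minimal sketch of how per-level gameplay metrics might be structured and exported for later analysis. All field names, the `LevelMetrics` class, and the `metrics_to_csv` helper are hypothetical; they are not drawn from any particular game or from the studies in this chapter. The two example metrics (time spent on a level and number of incorrect selections) follow the examples given above.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class LevelMetrics:
    """One row of a hypothetical per-level metrics sheet."""
    player_id: str
    level: int
    seconds_on_level: float
    incorrect_selections: int
    completed: bool


def metrics_to_csv(records):
    """Serialize a list of LevelMetrics records to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(LevelMetrics)]
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Made-up example records, for illustration only.
example_records = [
    LevelMetrics("p001", 1, 92.5, 3, True),
    LevelMetrics("p001", 2, 140.0, 7, False),
]

print(metrics_to_csv(example_records))
```

A flat, one-row-per-level export of this kind is a design choice that keeps the data easy to load into common statistical software; a real game might instead expose an event log or an API.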
Secondly, when possible, it would help to have software developers create easily customizable options in their games that could increase the usability of the game for alternative purposes. For example, some games could add additional modes (e.g., single player, cooperative, or competitive modes) which allow for alignment with a greater variety of intervention purposes. Games could also provide more content related tutorials so players have the opportunity to learn about the context of the game in a tutorial rather than just learning through game mechanics. For example, a medical based game could provide specific information about tools, procedures, and different considerations when caring for a patient. Additionally, learning and assessment components can be an added benefit when thoughtfully integrated into other contexts such as educational settings [46].
Thirdly, when a game is naturally aligned to an area that might have potential for alternative uses outside of entertainment (e.g., game has content relevant to child or adult learning or the content might potentially assess a relevant topic area) it would help to have software developers seek input or feedback from potential users. This may include contacting educators to ask about particular features they would want in the game if they were to use it for learning or posting summaries of the game on online forums and asking for general ideas on how to improve the game if it were used for assessment purposes. Understanding, and marketing, the particular qualities and game characteristics that would be most beneficial for alternative uses would help in promoting the selection of the COTS game.
4.4 Understanding COTS games and modern technologies
Considering the rapid emergence of new technologies, it is relevant to consider how some of the most recent innovations in the gaming world might impact the results discussed here. New technologies such as XR and VR offer high fidelity experiences in contexts that can be created and customized for individuals to experience. The potentially high impact of these new experiences involves the degree of realism they create for the player. Several researchers have begun to explore these technologies and their potential uses for training and assessment purposes, highlighting the hyper realistic environment as a key feature that future researchers and practitioners can capitalize on [3, 27].
This realism can be a valuable training or assessment component of the game or experience for individuals. For example, if trying to teach medical students how to perform a particular procedure, it may be beneficial to have them perform this skill under a variety of conditions such as in a quiet, focused space but also in a loud, and busy space that might better replicate the environment of an Emergency Room. Using XR or VR, the procedure could be adapted to different scenarios, and embedded within a seemingly realistic ER environment that allows common sights and sounds from this experience to be included within the training session.
Similarly, for assessment, it may be critical for an electrical engineer to be able to perform specific repairs on power lines and equipment. In XR or VR, these repairs could be assessed in an environment that emulates working at height, where the participant truly feels they are at the top of a power line. The sights, sounds, and sensations they experience can bring a sense of presence and embodiment that is difficult, if not impossible, to emulate in non-XR or non-VR environments. In short, emerging technologies could further enhance the usability and application of future COTS games.
5. Conclusion
Research has demonstrated that some game-based interventions may provide benefits in learning and assessment contexts. As the research on game-based interventions continues to mature, more empirical evidence is being produced on how to design and implement games in a more impactful way [47, 48]. Game-based interventions have much potential [9], but game design can be resource-intensive, making COTS games an attractive alternative when the requisite time, resources, or expertise are not available for game development. In the focal research series, we sought to explore how COTS game scores converge with traditional knowledge assessment scores and found preliminary proof-of-concept evidence that, in this context, three of the four COTS games demonstrated significant convergence with the respective traditional multiple-choice assessment. Although these findings do not generalize outside the context of these particular games, they provide a promising basis for further investigation of COTS games as viable solutions in low stakes instructional and evaluative contexts. Overall, we recommend COTS games within particular contexts where a well-suited game is chosen and encourage individuals to exercise intention in how the COTS games are used.
References
- 1.
Bedwell WL, Pavlas D, Heyne K, Lazzara EH, Salas E. Toward a taxonomy linking game attributes to learning: An empirical study. Simulation & Gaming. 2012; 43 :729-760. DOI: 10.1177/1046878112439444 - 2.
Landers RN. An introduction to game-based assessment: Frameworks for the measurement of knowledge, skills, abilities and other human characteristics using behaviors observed within videogames. International Journal of Gaming and Computer-Mediation Simulations. 2015; 7 :iv-viii. DOI: 10.4018/IJGCMS - 3.
Sanchez DR, Weiner EJ, Van Zelderen A. Virtual reality assessments (VRAs): Exploring the reliability and validity of evaluations in VR. International Journal of Selection and Assessment. 2022; 30 (1):103-125. DOI: 10.1111/ijsa.12369 - 4.
Sanchez DR. Videogame-based learning: A comparison of direct and indirect effects across outcomes. Multimodal Technologies and Interaction. 2022; 5 :1 - 5.
Bai S, Hew KF, Huang B. Does gamification improve student learning outcome? Evidence from a meta-analysis and synthesis of qualitative data in educational contexts. Educational Research Review. 2020; 30 :100322. DOI: 10.1016/j.edurev.2020.100322 - 6.
Wu WH, Hsiao HC, Wu PL, Lin CH, Huang SH. Investigating the learning-theory foundations of game-based learning: A meta-analysis. Journal of Computer Assisted Learning. 2011; 28 :265-279. DOI: 10.1111/j.1365-2729.2011.00437.x - 7.
Landers RN, Collmus AB. Gamifying a personality measure by converting it into a story: Convergence, incremental prediction, faking, and reactions. International Journal of Selection and Assessment. 2022; 30 (1):145-156. DOI: 10.1111/ijsa.12373 - 8.
Garris R, Ahlers R, Driskell JE. Games, motivation, and learning: A research and practice model. Simulation & Gaming. 2002; 33 :441-467. DOI: 10.1177/1046878102238607 - 9.
Landers R, Sanchez DR. Game-based, gamified, and gamefully designed assessments for employee selection: Definitions, distinctions, design, and validation. International Journal of Selection and Assessment. 2022; 30 (1):1-13. DOI: 10.1111/ijsa.12376 - 10.
Ritzhaupt AD, Gunter E, Jones JG. Survey of commercial off-the-shelf video games: Benefits and barriers in informal educational settings. International Journal of Instructional Technology and Distance Learning. 2010; 7 :45-55. DOI: 10.1177/0047239515588161 - 11.
Stansbury JA, Wheeler EA, Buckingham JT. Can Wii engage college-level learners? Ease of commercial off-the-shelf gaming in an introductory statistics course. Computers in the Schools. 2014; 31 :103-115. DOI: 10.1080/07380569.2014.879791 - 12.
Sundqvist P. Commercial-off-the-shelf games in the digital wild and L2 learner vocabulary. Language Learning & Technology. 2019; 23 :87-113. DOI: 10125/44674 - 13.
Wiklund M, Ekenberg L. Going to school in World of Warcraft: Observations from a trial program using off-the-shelf computer games as learning tools in secondary education. Designs for Learning. 2009; 2 :36-55. DOI: 10.16993/dfl.18 - 14.
Sanchez DR, Nelson TQ , Kraiger K, Weiner EJ, Lu Y, Schnall J. Defining motivation in video game-based training: Exploring the differences between measures of motivation. International Journal of Training & Development. 2022; 26 (1):1-28. DOI: 10.1111/ijtd.12233 - 15.
Krath J, Schürmann L, Von Korflesch HF. Revealing the theoretical basis of gamification: A systematic review and analysis of theory in research on gamification, serious games and game-based learning. Computers in Human Behavior. 2021; 125 :106963. DOI: 10.1016/j.chb.2021.106963 - 16.
Armstrong MB, Landers RN, Collmus AB. Gamifying recruitment, selection, training, and performance management: Game-thinking in human resource management. In: Gangadharbatla H, Davis DZ, editors. Emerging Research and Trends in Gamification. Hershey, Pennsylvania, USA: IGI Global; 2016. pp. 140-165. DOI: 10.4018/978-1-4666-8651-9.ch007 - 17.
Sanchez DR, Langer M, Kaur R. Gamification in the classroom: Examining the impact of gamified quizzes on student learning. Computers & Education. 2020; 144 :103666. DOI: 10.1016/j.compedu.2019.103666 - 18.
Erhel S, Jamet E. Digital game-based learning: Impact of instructions and feedback on motivation and learning effectiveness. Computers & Education. 2013; 67 :156-167. DOI: 10.1016/j.compedu.2013.02.019 - 19.
Taub M, Sawyer R, Smith A, Rowe J, Azevedo R, Lester J. The agency effect: The impact of student agency on learning, emotions, and problem-solving behaviors in a game-based learning environment. Computers & Education. 2020; 147 :103781. DOI: 10.1016/j.compedu.2019.103781 - 20.
Yang YTC, Chang CH. Empowering students through digital game authorship: Enhancing concentration, critical thinking, and academic achievement. Computers & Education. 2013; 68 :334-344. DOI: 10.1016/j.compedu.2013.05.023 - 21.
Hummel HG, Joosten-ten Brinke D, Nadolski RJ, Baartman LK. Content validity of game-based assessment: A case study of a serious game for ICT managers in training. Technology, Pedagogy, and Education. 2017; 26 :225-240. DOI: 10.1080/1475939X.2016.1192060 - 22.
Shute VJ, Ventura M, Kim YJ. Assessment and learning of qualitative physics in Newton's Playground. Journal of Educational Research. 2013; 106 :423-430. DOI: 10.1080/00220671.2013.832970 - 23.
Stănescu DF, Ionită C, Ionită AM. Game-thinking in personnel recruitment and selection: Advantages and disadvantages. Postmodern Openings. 2020; 11 :267-276. DOI: 10.18662/po/11.2/174 - 24.
Kim YJ, Almond RG, Shute VJ. Applying evidence-centered design for the development of game-based assessments in physics playground. International Journal of Testing. 2016; 16 :142-163. DOI: 10.1080/15305058.2015.1108322 - 25.
Courtney L, Graham S. ‘It’s like having a test but in a fun way’: Young learners’ perceptions of a digital game-based assessment of early language learning. Language Teaching for Young Learners. 2019; 1 :161-186. DOI: 10.1075/ltyl.18009.cou - 26.
Mavridis A, Tsiatsos T. Game-based assessment: Investigating the impact on test anxiety and exam performance. Journal of Computer Assisted Learning. 2017; 33 :137-150. DOI: 10.1111/jcal.12170 - 27.
Weiner EJ, Sanchez DR. Cognitive ability in virtual reality: Validity evidence for VR game-based assessments. International Journal of Selection and Assessment. 2020; 28 :215-236. DOI: 10.1111/ijsa.12295 - 28.
Auer EM, Mersy G, Marin S, Blaik J, Landers RN. Using machine learning to model trace behavioral data from a game-based assessment. International Journal of Selection and Assessment. 2022; 30 (1):82-102. DOI: 10.1111/ijsa.12363 - 29.
Shute VJ, Wang L, Greiff S, Zhao W, Moore G. Measuring problem-solving skills via stealth assessment in an engaging video game. Computers in Human Behavior. 2016; 63 :106-117. DOI: 10.1016/j.chb.2016.05.047 - 30.
Westera W, Nadolski R, Hummel H. Serious gaming analytics: What students' log files tell us about gaming and learning. International Journal of Serious Games. 2014; 1 :35-50. DOI: 10.17083/ijsg.v1i2.9 - 31.
Levy R. Dynamic Bayesian network modeling of game-based diagnostic assessments. Multivariate Behavioral Research. 2019; 54 :771-794. DOI: 10.1080/00273171.2019.1590794 - 32.
Nikolaou I, Georgiou K, Kotsasarlidou V. Exploring the relationship of a gamified assessment with performance. Spanish Journal of Psychology. 2019; 22 :E6. DOI: 10.1017/sjp.2019.5 - 33.
Chan D, Schmitt N. Video-based versus paper-and-pencil method of assessment in situational judgment tests: Subgroup differences in test performance and face validity perceptions. Journal of Applied Psychology. 1997; 82 :143-159. DOI: 10.1037/0021-9010.82.1.143 - 34.
Outtz JL. The role of cognitive ability tests in employment selection. Human Performance. 2002; 15 :161-172. DOI: 10.1207/S15327043HUP1501&02_10 - 35.
Landers R. Developing a theory of gamified learning: Linking serious games and gamification of learning. Simulation & Gaming. 2014; 45 :752-768. DOI: 10.1177/1046878114563660 - 36.
Kim YJ, Shute VJ. The interplay of game elements with psychometric qualities, learning, and enjoyment in game-based assessment. Computers & Education. 2015; 87 :340-356. DOI: 10.1016/j.compedu.2015.07.009 - 37.
Sanchez DR. Videogame-based training: The impact and interaction of videogame characteristics on learning outcomes. Multimodal Technologies and Interaction. 2022; 6 (3):19. DOI: 10.3390/mti6030019 - 38.
Findley WG. A rationale for evaluation of item discrimination statistics. Educational and Psychological Measurement. 1956; 16 :175-180. DOI: 10.1177/001316445601600201 - 39.
Lord FM. The relation of the reliability of multiple-choice tests to the distribution of item difficulties. Psychometrika. 1952; 17 :181-194. DOI: 10.1007/BF02288781 - 40.
Sanchez DR, Langer M. Video Game Pursuit (VGPu) scale development: Designing and validating a scale with implications for game-based learning and assessment. Simulation & Gaming. 2020; 51 :55-86. DOI: 10.1177/1046878119882710 - 41.
Aziz R, Norman H, Nordin N, Wahid FN, Tahir NA. They like to play games? Student interest of serious game-based assessments for language literacy. Creative Education. 2019; 10 :3175-3185. DOI: 10.4236/ce.2019.1012241 - 42.
Fong G. Adapting COTS games for military experimentation. Simulation & Gaming. 2006; 37 :452-465. DOI: 10.1177/1046878106291670 - 43.
Graafland M, Schraagen JM, Schijven MP. A systematic review of serious games for medical education and surgical skills training. British Journal of Surgery. 2012; 99 :1322-1330. DOI: 10.1002/bjs.8819 - 44.
Shute VJ, Ke F. Games, learning, and assessment. In: Ifenthaler D, Eseryel D, Ge X, editors. Assessment in Game-Based Learning. New York, NY, USA: Springer; 2012. pp. 43-58. DOI: 10.1007/978-1-4614-3546-4_4 - 45.
Steinkuehler C, Duncan S. Scientific habits of mind in virtual worlds. Journal of Science Education and Technology. 2008; 17 :530-543. DOI: 10.1007/s10956-008-9120-8 - 46.
Shute VJ, Ventura M, Bauer M, Zapata-Rivera D. Monitoring and fostering learning through games and embedded assessments. ETS Research Report Series. 2008; 2008 :i-32. DOI: 10.1002/j.2333-8504.2008.tb02155.x - 47.
All A, Castellar EPN, Van Looy J. Assessing the effectiveness of digital game-based learning: Best practices. Computers & Education. 2016; 92 :90-103. DOI: 10.1016/j.compedu.2015.10.007 - 48.
Marquard JL, Zayas-Cabán T. Commercial off-the-shelf consumer health informatics interventions: Recommendations for their design, evaluation, and redesign. Journal of the American Medical Informatics Association. 2012; 19 :137-142. DOI: 10.1136/amiajnl-2011-000338