Open access peer-reviewed chapter

Commercial-off-the-Shelf (COTS) Games: Exploring the Applications of Games for Instruction and Assessment

Written By

Diana R. Sanchez, Amanda Rueda, Leila Jimeno Jimènez and Mahsa Norouzi Nargesi

Submitted: 15 February 2022 Reviewed: 27 February 2022 Published: 11 April 2022

DOI: 10.5772/intechopen.103965

From the Edited Volume

Computer Game Development

Edited by Branislav Sobota


Despite the growing interest in utilizing commercial off-the-shelf (COTS) games for instructional and assessment purposes, there is a lack of research evidence regarding COTS games for these applications. This chapter considers the application of COTS games for instruction and assessment and provides preliminary evidence comparing COTS game scores to traditional multiple-choice assessments. In a series of four studies, we compared performance in a COTS game to scores on a traditional multiple-choice assessment written for that study to evaluate the same content presented in the game. Three of the four studies demonstrated a significant correlation between the COTS game and the traditional multiple-choice assessment scores. The non-significant value in Study 4 was likely due to a small sample size (n < 100). The results of these studies support our hypothesis and demonstrate that COTS games may be a useful educational tool for training or assessment purposes. We recommend that future research focus on specific applications of COTS games to explore further opportunities for utilizing COTS games in education and assessment.


  • commercial off-the-shelf (COTS) game
  • game-based assessments (GBAs)
  • serious games
  • game-based learning (GBL)
  • instructional games
  • multiple-choice assessments

1. Introduction

Over the last decade, the use of games for purposes outside of entertainment has grown in both research and industry, leading to discrepancies in conceptual definitions, use cases, and evidence-based benefits [1, 2, 3, 4]. A recent increase in literature reviews and meta-analyses on game-based interventions emphasizes the wealth of research being conducted in this area [5, 6]. However, game developers have quickly capitalized on the widespread interest and often market game-based interventions for uses that have not been empirically tested [7]. In this chapter we aim to provide clarity on this topic by first reviewing important terminology, second discussing applications of games for instruction, and third exploring applications of games for assessment. Then we focus on the rise of commercial off-the-shelf (COTS) games for both instructional and assessment purposes. We close by highlighting the authors’ research from four studies focused on comparing COTS games to traditional assessments.

1.1 Important terminology

In this chapter, we use the term game-based intervention to speak broadly of game-like experiences and the ways they are applied for non-entertainment purposes such as education or evaluation. Within this broad conceptualization, there are several important terms: game-based learning (GBL), game-based assessment (GBA), gameful design, gamification, and COTS games. GBL, also referred to as serious games [8], occurs when a game is used for the primary purpose of learning [2]. Similarly, GBAs are evaluations that take place within the context of a game expressly designed to identify specific skills, traits, or behaviors [9]. In contrast, gameful design and gamification are design strategies that incorporate game elements into new or existing assessments, respectively, typically for the purpose of engaging and motivating individuals [2, 9]. A thorough understanding of game design is requisite to the development of effective GBL and GBAs, as these interventions use the full suite of game attributes (see [1] for a taxonomy of game attributes). As a result, developing GBL and GBAs entails considerably more expertise and resources than incorporating game elements through gamification or gameful design. A potential solution to the resource-heavy nature of game development is the utilization of COTS games. COTS games are most commonly designed by game developers for entertainment purposes, are available for purchase from a vendor, and can be used almost immediately after purchase [10, 11]. They offer a reasonable alternative to the time- and resource-intensive commitment of developing one’s own games [12, 13]. GBL, GBA, and COTS games are explored in more detail in the following sections.

1.2 Game-based learning (GBL)

Games have been used to stimulate student learning by attracting students’ attention through increased motivation [14]. Additionally, researchers and practitioners have shown increasing interest in alternative instructional methods, which emphasize a more active learner role [8]. Combined with the increasing growth of interactive technologies, this provides a unique opportunity to explore learning environments which involve students in their own learning process, such as the gamification of learning content and using a GBL solution [15]. In contrast with traditional learning methods, educational games have shown a positive relationship with the thinking skills of learners and can improve learner motivation [16, 17]. Considering the ways that GBL solutions can present learning experiences in a motivating way to students, using games for learning could feasibly increase student engagement and in turn have the potential to improve the academic performance of students [18, 19, 20].

1.3 Game-based assessment (GBA)

GBAs are designed to measure knowledge, skills, or abilities (KSAs) within the context of the game [21, 22, 23]. They are defined as evaluations that use game elements to immerse the individual in a specific game environment, allowing them to interact with it and demonstrate the desired KSA [2]. Importantly, GBAs typically use game activities as tasks to generate evidence of complex skills [24]. Using a GBA with strong psychometric evidence (e.g., reliability and validity evidence for the intended use) has the potential to provide a number of benefits. Some researchers have proposed that GBAs might be designed to generate positive outcomes for test-takers, such as reduced test anxiety, greater difficulty faking responses, and more behavior-based measurement [3, 7, 8, 9, 16, 25, 26]. In the context of work, some researchers have referred to GBA as “an assessment method in which job candidates are players participating in a core gameplay loop while trait information is inferred” [9].

While traditional assessments use participants’ responses to textual or graphic prompts to collect data about their knowledge, skills, and abilities, GBAs gather this data from the test-taker’s in-game behaviors [27]. These behaviors can cover a range of information, including overt information such as the choice a player makes when faced with a discrete decision in a game. A theory-driven assessment uses a theoretical research model to build the design and intention of the evaluative components into the assessment. These components may lead to a narrative or series of decisions around targeted behaviors related to the variable being assessed. An alternative is a data-driven approach, in which minor behaviors during gameplay are used to predict the outcomes of interest. A data-driven GBA is not built with a targeted construct and measurement in mind [28, 29, 30] but can empirically measure a construct using trace data such as mouse clicks, movements, interactions with objects in games, or time spent on a task [28]. This trace data can be collected automatically by a game and has been shown to provide meaningful information for assessing players [28]. Either type of GBA changes how assessment is traditionally conducted while retaining the psychometric properties needed to evaluate a variety of KSAs [9, 31].

There are several benefits of GBAs that make them particularly appealing to organizations that might use them for personnel selection. One of these is the suggestion that GBAs might be able to predict job performance beyond traditional selection methods [16, 32]. Given the longstanding concern of applicant faking, the potential to reduce socially desirable responding is another benefit that GBAs may provide to organizational assessments [16, 32]. Since GBAs can be designed to measure traits and behaviors indirectly, they have the potential to obscure the purpose of the assessment, which could in turn make it more challenging for candidates to identify the variable being assessed and what a good answer would look like [32]. A last benefit that may be of particular interest to organizations is that GBAs have been shown to reduce adverse impact compared with traditional paper-and-pencil tests [27, 33, 34]. Because of these benefits, organizations may want to consider investing in GBAs to evaluate different characteristics in the workplace, such as cognitive ability, individual characteristics, or job skills [27, 35].

1.4 Commercial off-the-shelf (COTS) games

Given the time, money, and expertise needed to develop game-based interventions, there are contexts in which it is more reasonable for researchers or practitioners to use an existing COTS game for their purpose rather than investing the time or resources into developing their own game [3, 27, 36]. However, a critical consideration when choosing to use a COTS game is that these games are rarely designed for the purpose they will be used for. Although a growing number of vendors are developing COTS games for learning or assessment purposes and building a body of evidence to support those uses, in most contexts a researcher or practitioner will not have the option of a COTS game designed for the purpose they intend. This makes the careful consideration and selection of the COTS game a critical step.

If a COTS game is being used for assessment or evaluation, the behaviors displayed by the player must be measurable, either by a metric captured in the game or by an observable behavior that can be recorded by an observer. Additionally, the content needs to present a scenario in which the player has the opportunity to demonstrate behavior relevant to the variable being measured. For example, see the series of studies by [27] where VR COTS games were used for measuring spatial recognition variables.

If a COTS game were being used for a learning application, the game would need to present the relevant content that the player needs to learn. This includes presenting the information and ideally presenting a context in which the relevant information could be used or the intended skill could be practiced. It would be further valuable for the selection of the COTS game to avoid games with excessive information or components that aren’t relevant to the learning experience, also called seductive details [37].

Research on traditional assessments focuses heavily on the psychometric reliability and validity of the instrument [9], but empirical support for the psychometric properties of COTS games is still in its infancy. This lack of research on the efficacy and utility of COTS games for instructional and assessment purposes motivated the current series of studies, which are intended to provide preliminary evidence on the use of COTS game scores. In these studies, we ask the following research question.

Research question: Will scores on COTS games significantly converge with traditional multiple-choice assessment scores?

Below we detail four correlational studies focused on the application of COTS games. We seek to better understand potential applications of COTS games as learning or assessment interventions by examining the similarity between scores produced in a COTS game and scores on a traditional multiple-choice assessment. Each of the COTS games in the studies below was carefully considered and chosen because it represents content that could be relevant to learners in a GBL context. Since the scores produced by COTS games are not intended for learning or assessment purposes, our question is a general one: are these scores meaningfully related to a relevant measure in these particular cases, namely scores on a traditional multiple-choice assessment in the same content area? We caveat that this evidence does not generalize to all COTS games, and individual data would need to be collected and analyzed if a COTS game is being considered for a similar application. However, this series of studies does provide a general proof-of-concept that may aid future applications of COTS games.


2. Research series

Data from the first three studies were collected as part of other projects and analyzed for this research question. All four studies collected samples of undergraduate and graduate students from universities in the Western United States, with approval from the associated Institutional Review Board. Each study reports demographic information, scores from the COTS game used, and scores from the multiple-choice assessment. An image of the COTS game used in each study is shown in Figure 1. Each multiple-choice assessment item included four response options with one correct answer, and showed acceptable item statistics in a pilot sample (i.e., reasonable difficulty between 0.30 and 0.90 and knowledge discrimination of r > 0.25). Each assessment was developed using research-based principles [37, 38]. Lastly, each study collected the participant’s tendency to play video games using the Video Game Pursuit scale (VGPu) [39]. The scores on this 19-item measure are reported on a 5-point scale from 1 = strongly disagree to 5 = strongly agree. This measure is used since researchers have proposed that a propensity towards playing video games may impact the results of video game interventions [40].
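The pilot screening criteria described above can be illustrated with a short sketch. It assumes that item difficulty is the proportion of pilot participants answering an item correctly and that discrimination is the corrected item-total correlation (the correlation between an item and the sum of the remaining items); the chapter does not specify how discrimination was computed, so this is one common choice rather than the authors' procedure. The function names and the example data are hypothetical.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def item_analysis(responses, min_diff=0.30, max_diff=0.90, min_disc=0.25):
    """Screen items scored 0/1.

    responses: one list per participant, one 0/1 score per item.
    Returns a list of (difficulty, discrimination, keep) tuples, one per item.
    Thresholds default to the values reported in the chapter.
    """
    n_items = len(responses[0])
    results = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        # Assumption: discrimination = corrected item-total correlation.
        rest = [sum(row) - row[i] for row in responses]
        difficulty = mean(item)
        discrimination = pearson_r(item, rest)
        keep = min_diff <= difficulty <= max_diff and discrimination > min_disc
        results.append((difficulty, discrimination, keep))
    return results


# Hypothetical pilot data: 4 participants x 3 items.
pilot = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
results = item_analysis(pilot)
for number, (diff, disc, keep) in enumerate(results, start=1):
    print(f"item {number}: difficulty={diff:.2f} discrimination={disc:.2f} keep={keep}")
```

In practice a pilot sample would be far larger than this toy example; the sketch only shows how the two thresholds combine into a keep-or-revise decision for each item.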

Figure 1.

Images from the commercial-off-the-shelf (COTS) games used in Study 1 (Quintet, top left), Study 2 (Arm Surgery 2, top right), Study 3 (PC Building Simulator, bottom left), and Study 4 (Car Mechanic Simulator, bottom right).

2.1 Study 1

Participants were 385 students, primarily female (51%) and Caucasian (73%), with an average age of almost 19 years (SD = 1.72). In the study, participants completed a consent form, then an initial survey measuring demographic information and video game pursuit. Participants then played the assigned game for 20 minutes, followed by a second questionnaire which included a multiple-choice assessment. The study concluded with a four-minute debriefing video.

2.1.1 COTS game

In the videogame Quintet, participants play as crew members of a spaceship (i.e., captain, helm, tactical, engineer, or scientist) who must work with a team to complete various science missions. Participants must learn their role and how to manage their responsibilities on the ship to earn points and meet the mission objectives.

2.1.2 Multiple-choice assessment

A 26-item multiple choice assessment was written for this study with questions linked to the specific missions, roles, and responsibilities in the game.

2.2 Study 2

Participants were 140 students, primarily female (69%) and mostly Caucasian (39%) or Hispanic (29%), with an average age of almost 24 years (SD = 4.85). In the study, participants completed a consent form, then played the assigned game for 15 minutes, followed by a questionnaire with the multiple-choice assessment, demographic questions, and the measure of video game pursuit. The study concluded with a short debriefing statement.

2.2.1 COTS game

The videogame Arm Surgery 2 takes place in a virtual operating room where players take on the role of a surgeon. The game begins with a tutorial, where a nurse guides the player through surgical techniques and medical instruments. Players then have to complete a surgery to repair a broken arm as quickly as possible with as few mistakes as possible.

2.2.2 Multiple-choice assessment

A 17-item multiple choice assessment was written for this study. All questions pertained to the surgical terms, tools, and procedures presented during the game.

2.3 Study 3

Participants were 100 students; mostly females (67%) of Caucasian (40%) or Hispanic (32%) descent with an average age of about 23 years (SD = 6.28). In the study, participants signed a consent form then reviewed an 8-page tutorial on how to play the game. Participants then played the assigned game for 30 minutes before completing a survey of demographic questions, videogame pursuit, and a multiple-choice assessment. Participants were then given a debriefing handout and dismissed.

2.3.1 COTS game

In the game PC Building Simulator, players take on the role of a PC repair shop owner. They must manage their shop by diagnosing and repairing computers to make money for their store.

2.3.2 Multiple-choice assessment

An 18-item multiple choice assessment was written for this study. All questions were related to the tasks and terminology used in the game for repairing PCs.

2.4 Study 4

Participants were 78 students; mostly females (70%) of Hispanic (27%) or Asian (28%) descent with an average age of about 23 years (SD = 6.28). In the study, participants completed a consent form then played the assigned game for 15 minutes. The study ended with a survey of demographic measures, videogame pursuit, and a multiple-choice assessment.

2.4.1 COTS game

In the game Car Mechanic Simulator 2, players take on the role of a mechanic to repair, paint, and tune cars in a garage. Their goal is to make money by repairing as many cars as possible.

2.4.2 Multiple-choice assessment

A 10-item multiple choice assessment was written for this study. All questions asked about the tools and types of repairs done in the game.


3. Research results

Results of the correlation analyses are shown in Table 1. There were positive, significant correlations between the COTS game and multiple-choice assessment scores in three of the four studies: Study 1 (r = 0.23, p < 0.01), Study 2 (r = 0.27, p < 0.01), and Study 3 (r = 0.36, p < 0.01), but not Study 4 (r = 0.02, p > 0.05). This shows that in three of the four studies presented here, the score generated by playing the COTS game was significantly related to the score earned by the participant on the traditional multiple-choice assessment. These results are valuable for the area of COTS games because they show that, when selected carefully, a COTS game might produce scores that provide meaningful information for individuals wanting to use COTS games for learning or assessment purposes.
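As a rough check on how a correlation and a sample size combine into a significance decision, the sketch below uses the Fisher z-transformation with a normal approximation. This is an approximation chosen because it needs only the Python standard library; it is not necessarily the test the authors ran. It also assumes that each correlation was computed on the full participant count reported for that study, which the chapter does not confirm. The function names are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via the Fisher z-transformation."""
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def critical_r(n, alpha=0.05):
    """Approximate smallest |r| that reaches two-sided significance at level alpha."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return tanh(z_crit / sqrt(n - 3))


# (r, n) pairs taken from the text; n is ASSUMED to equal the reported sample size.
studies = {
    "Study 1": (0.23, 385),
    "Study 2": (0.27, 140),
    "Study 3": (0.36, 100),
    "Study 4": (0.02, 78),
}

p_values = {name: fisher_p_value(r, n) for name, (r, n) in studies.items()}

for name, (r, n) in studies.items():
    print(f"{name}: r={r:.2f}, n={n}, approx p={p_values[name]:.4f}, "
          f"critical |r|={critical_r(n):.3f}")
```

Under these assumptions, the approximation reproduces the reported pattern: significance for Studies 1 to 3 and a non-significant result for Study 4. The `critical_r` helper may also be useful when planning data collection, since it indicates how large a correlation would need to be to reach significance at a given sample size.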

Study 1
  1. Sex at birth
  2. Age
  3. Video game pursuit
  4. COTS game
  5. Multiple-choice assessment

Study 2
  1. Sex at birth
  2. Age
  3. Video game pursuit ** −0.12 0.91
  4. COTS game
  5. Multiple-choice assessment −0.01 0.27**

Study 3
  1. Sex at birth
  2. Age
  3. Video game pursuit
  4. COTS game
  5. Multiple-choice assessment

Study 4
  1. Sex at birth
  2. Age
  3. Video game pursuit
  4. COTS game
  5. Multiple-choice assessment


Table 1.

Descriptive statistics and correlations for Studies 1–4.

*p < 0.05, **p < 0.01. M = mean, SD = standard deviation, COTS = commercial-off-the-shelf. Sex at birth coded as 0 = female, 1 = male. The COTS game scores and multiple-choice assessment scores are percentages. Estimates of reliability are bolded along the diagonal.


4. Discussion

The purpose of this chapter was to provide information about the application of COTS games as game-based interventions. In the research series presented as part of this chapter, we sought to build upon a growing body of research literature on COTS games by examining the convergence between COTS game scores and traditional multiple-choice assessment scores on the same content. Overall, the results answered our research question and showed that COTS game scores significantly and positively correlated with the traditional knowledge assessment scores in Studies 1–3 but not Study 4. Although a non-significant relationship was found in Study 4, we note that this result may be attributable to a limited sample size. Data collection was halted during Study 4 due to the onset of the COVID-19 pandemic, when local laws prohibited in-person gatherings, including the in-person data collection that was part of this research protocol. Overall, these results suggest that when a COTS game is considered and chosen intentionally, the COTS game scores may be reasonably similar to traditional knowledge assessment scores. We caveat these results by stating that these findings cannot be generalized beyond these particular games and serve primarily as a proof-of-concept for the consideration and use of COTS games. We encourage researchers and practitioners to collect and evaluate their own evidence for the COTS games they select and the intended application of those games.

4.1 Intentional game selection

The meaningful relationships between the COTS game scores and traditional assessments highlight the potential benefits of using game-based interventions. COTS games may provide viable solutions in learning and assessment contexts but must be considered carefully with the particular purpose of the game in mind. While some COTS games may demonstrate meaningful uses without being intentionally designed to do so [7, 9], it is important to note that not all COTS games will demonstrate similar results. Here we highlight the importance of selecting the right COTS game for the intended purpose. To avoid potential pitfalls of using a COTS game, individuals can prioritize the selection process by incorporating time for planning and review into the early stages of their project. Having a thorough understanding of the differences in game design and content is critical to selecting an appropriate COTS game for the intended purpose. As mentioned earlier in this chapter, learning and assessment are two examples of desired outcomes. To further illustrate how a COTS game could be reasonably connected to the end goal, we provide more detail about the content of the COTS games used in the series of research studies presented in this chapter.

4.1.1 Study 1

In the game Quintet, the primary content focuses on players learning their distinct responsibilities (e.g., the Scientist role can survey the surroundings while the Engineer role can repair and boost various functions) while learning to communicate and coordinate their actions with their teammates. Because the game missions require teammates to coordinate the timing of their actions, gameplay inevitably promotes active communication and collaboration between teammates. Teams perform better in meeting the mission objectives when they are actively communicating and coordinating their different actions.

4.1.2 Study 2

In the game Arm Surgery 2, the player is presented with factual information about specific surgical equipment and procedures. During gameplay, players are guided through several procedures where they have to select the correct tool and perform the correct action before the game can progress. Progressing through the game naturally promotes active attention towards these tools and procedures as players must monitor and maintain the patient’s health while performing their tasks to earn a higher score.

4.1.3 Study 3

In the game PC Building Simulator, players are responsible for their own PC repair shop. In order to improve their score, they must accurately diagnose the computer’s problem, order the correct parts, and repair or replace those parts before returning the PC to the customer. When players aren’t sure about particular parts, they can go to a guide to click on parts for the name and information on that part. For the gameplay to progress, players must continue to accurately identify and repair the defective components.

4.1.4 Study 4

In the game Car Mechanic Simulator 2, the player owns a car mechanic shop. Similar to the game in Study 3, the player must accurately diagnose the problem, order the needed parts, and replace those parts in the car. The individual can also go to schematics which will give the name and primary functioning of the different parts that will help them when trying to understand and diagnose a problem. Thus, gameplay naturally leads to improved understanding of the different parts of a car and their functioning.

Each of these topics is relevant to workplace learning and may present a relevant context for training or assessment. For example, in Study 1, participants must learn their role in the game but also communicate with members of their team, coordinate their actions, and effectively work together to meet the game objectives. In this instance, the participants cannot be successful unless they work as a team. Although a spaceship is not a typical work environment, players were actively engaging in teamwork skills applicable to the workplace. In contrast, the physical environments of the games in Studies 2–4 were relevant to the fields of their subject matter (healthcare, computer science, and automotive industry, respectively). As an increasing number of industries continue to consider and apply game-based interventions, the variety of different COTS game contexts becomes more relevant [41, 42, 43].

4.2 Low and high stakes contexts

While COTS games offer a reasonable alternative to the resource-intensive commitment of developing a game, they do not provide the same precision and control as traditional methods, which can be an important component in some training and assessment contexts [9]. Because of this, it is important to use COTS games only in contexts where this precision and control is not critical. Beyond assessing how the content of a COTS game is associated with the intended purpose, it is imperative to consider whether the data will be used in a low stakes or a high stakes scenario.

High stakes scenarios occur when results could have a large, consequential influence on a person’s access to resources. This means the results might impact their pay, promotion, or selection into a program, course, or position. A low stakes scenario occurs when the results are used as general information given to the individual or for developmental purposes when shared with a larger institution. For example, using a game to assess applicants for a job would be high stakes, whereas using a game to assess high school students’ language skills to provide feedback on how to better prepare for college would be low stakes.

An additional consideration in distinguishing high and low stakes scenarios is whether a cost is associated with the service and whether a claim is made about the actual result of the intervention. For example, a paid training course intended to improve someone’s skills with a piece of software is high stakes, and there should be evidence that the program improves the skills it claims to improve. Similarly, if the program is free but claims to improve the skill, it should still provide this evidence based on its claim. In contrast, if the program is a free game intended for entertainment purposes and does not make claims about improving cognitive functioning or other skill development, then it is low stakes.

High stakes scenarios should be approached with extra care and should use instruments or interventions with strong psychometric evidence of reliability and validity for the intended use. To this end, we recommend that COTS games be used primarily in low stakes scenarios. Given the degree of importance and variation that can occur in high stakes scenarios, control over a game or gamified experience is paramount so that any bias or unintended negative consequences can be addressed. COTS games are typically used without the option to customize aspects of the game, hindering the ability to identify and rectify any pitfalls. Using established and rigorous science-based processes for the development of game-based interventions is especially important for high stakes training or assessment purposes. GBL solutions are best developed following empirical instructional system design (ISD) methodology, while GBAs should follow standard assessment development and validation processes to demonstrate the psychometric properties of the assessment [9].

4.3 Recommendations for software developers

Given the results of this paper, it appears that COTS games have potential uses beyond entertainment purposes when selected with intention and applied thoughtfully. Below are recommendations for software developers regarding actions which might benefit a COTS game for being used for alternative purposes.

Firstly, software developers can make a variety of game metrics available to collect from gameplay, as providing frequent and detailed feedback on performance is a key component of both training and assessment [44]. When games are used for training or assessment purposes, having detailed metrics (e.g., how long a player spent on a level, how many incorrect selections were made) can be helpful. COTS games that provide a metrics sheet or summary of data would be considerably easier to use for alternate purposes. For example, many Massively Multiplayer Online Role-Playing Games (MMORPGs) and Multiplayer Online Battle Arenas (MOBAs) make detailed game metrics publicly available so that players can view statistics on their own and other players’ performance to better understand and improve their future gameplay [45].
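To make the idea of a metrics summary concrete, the sketch below shows one way a game might record the two example metrics mentioned above (time spent on a level and number of incorrect selections) and aggregate them per player. All names here, including `LevelAttempt` and `summarize_metrics`, are hypothetical and do not correspond to any particular game or vendor API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LevelAttempt:
    """One hypothetical record a game could log each time a player finishes a level."""
    player_id: str
    level: str
    seconds_spent: float
    incorrect_selections: int


def summarize_metrics(attempts):
    """Aggregate logged attempts into a per-player summary dictionary."""
    summary = defaultdict(
        lambda: {"levels_attempted": 0, "seconds_spent": 0.0, "incorrect_selections": 0}
    )
    for attempt in attempts:
        row = summary[attempt.player_id]
        row["levels_attempted"] += 1
        row["seconds_spent"] += attempt.seconds_spent
        row["incorrect_selections"] += attempt.incorrect_selections
    return dict(summary)


# Hypothetical log for two players.
log = [
    LevelAttempt("p1", "tutorial", 60.0, 2),
    LevelAttempt("p1", "level_1", 120.5, 1),
    LevelAttempt("p2", "tutorial", 45.0, 0),
]

metrics = summarize_metrics(log)
for player, row in metrics.items():
    print(player, row)
```

A summary of this kind, exported as a spreadsheet or file, would let a researcher or instructor relate in-game behavior to an external measure without needing access to the game's internals.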

Secondly, when possible, it would help to have software developers create easily customizable options in their games that could increase the usability of the game for alternative purposes. For example, some games could add additional modes (e.g., single player, cooperative, or competitive modes) which allow for alignment with a greater variety of intervention purposes. Games could also provide more content related tutorials so players have the opportunity to learn about the context of the game in a tutorial rather than just learning through game mechanics. For example, a medical based game could provide specific information about tools, procedures, and different considerations when caring for a patient. Additionally, learning and assessment components can be an added benefit when thoughtfully integrated into other contexts such as educational settings [46].

Thirdly, when a game is naturally aligned to an area that might have potential for alternative uses outside of entertainment (e.g., the game has content relevant to child or adult learning, or the content might potentially assess a relevant topic area), it would help to have software developers seek input or feedback from potential users. This may include contacting educators to ask about particular features they would want in the game if they were to use it for learning, or posting summaries of the game on online forums and asking for general ideas on how to improve the game if it were used for assessment purposes. Understanding and marketing the particular qualities and game characteristics that would be most beneficial for alternative uses would help in promoting the selection of the COTS game.

4.4 Understanding COTS games and modern technologies

Considering the rapid emergence of new technologies, it is relevant to consider how some of the most recent innovations in the gaming world might impact the results discussed here. New technologies such as XR and VR offer high fidelity experiences in contexts that can be created and customized for individuals to experience. The potentially high impact of these new experiences involves the degree of realism they create for the player. Several researchers have begun to explore these technologies and their potential uses for training and assessment purposes, highlighting the hyper realistic environment as a key feature that future researchers and practitioners can capitalize on [3, 27].

This realism can be a valuable training or assessment component of the game or experience. For example, when teaching medical students how to perform a particular procedure, it may be beneficial to have them perform this skill under a variety of conditions, such as in a quiet, focused space but also in a loud and busy space that better replicates the environment of an Emergency Room (ER). Using XR or VR, the procedure could be adapted to different scenarios and embedded within a seemingly realistic ER environment that includes the common sights and sounds of that setting within the training session.

Similarly, for assessment, it may be critical for an electrical engineer to be able to perform specific repairs on power lines and equipment. In XR or VR, these repairs could be assessed in an environment that emulates working at height, where the participant truly feels they are at the top of a power line. The sights, sounds, and sensations they experience can bring a sense of presence and embodiment that is difficult, if not impossible, to emulate outside of XR or VR environments. In short, emerging technologies could further enhance the usability and application of future COTS games.


5. Conclusion

Research has demonstrated that some game-based interventions may provide benefits in learning and assessment contexts. As the research on game-based interventions continues to mature, more empirical evidence is being produced on how to design and implement games in a more impactful way [47, 48]. Game-based interventions have much potential [9], but game design can be resource-intensive, making COTS games an attractive alternative when the requisite time, resources, or expertise are not available for game development. In the focal research series, we sought to explore how COTS game scores converge with traditional knowledge assessment scores. We found preliminary proof-of-concept evidence that, in this context, three of the four COTS games demonstrated significant convergence with the respective traditional multiple-choice assessment. Although these findings do not generalize outside the context of these particular games, they point to a promising future for further investigation of COTS games as viable solutions in low-stakes instructional and evaluative contexts. Overall, we recommend COTS games within particular contexts where a well-suited game is chosen, and we encourage individuals to be intentional in how the COTS games are used.


Conflict of interest

The authors declare no conflict of interest.


References

  1. Bedwell WL, Pavlas D, Heyne K, Lazzara EH, Salas E. Toward a taxonomy linking game attributes to learning: An empirical study. Simulation & Gaming. 2012;43:729-760. DOI: 10.1177/1046878112439444
  2. Landers RN. An introduction to game-based assessment: Frameworks for the measurement of knowledge, skills, abilities and other human characteristics using behaviors observed within videogames. International Journal of Gaming and Computer-Mediation Simulations. 2015;7:iv-viii. DOI: 10.4018/IJGCMS
  3. Sanchez DR, Weiner EJ, Van Zelderen A. Virtual reality assessments (VRAs): Exploring the reliability and validity of evaluations in VR. International Journal of Selection and Assessment. 2022;30(1):103-125. DOI: 10.1111/ijsa.12369
  4. Sanchez DR. Videogame-based learning: A comparison of direct and indirect effects across outcomes. Multimodal Technologies and Interaction. 2022;5:1
  5. Bai S, Hew KF, Huang B. Does gamification improve student learning outcome? Evidence from a meta-analysis and synthesis of qualitative data in educational contexts. Educational Research Review. 2020;30:100322. DOI: 10.1016/j.edurev.2020.100322
  6. Wu WH, Hsiao HC, Wu PL, Lin CH, Huang SH. Investigating the learning-theory foundations of game-based learning: A meta-analysis. Journal of Computer Assisted Learning. 2011;28:265-279. DOI: 10.1111/j.1365-2729.2011.00437.x
  7. Landers RN, Collmus AB. Gamifying a personality measure by converting it into a story: Convergence, incremental prediction, faking, and reactions. International Journal of Selection and Assessment. 2022;30(1):145-156. DOI: 10.1111/ijsa.12373
  8. Garris R, Ahlers R, Driskell JE. Games, motivation, and learning: A research and practice model. Simulation & Gaming. 2002;33:441-467. DOI: 10.1177/1046878102238607
  9. Landers R, Sanchez DR. Game-based, gamified, and gamefully designed assessments for employee selection: Definitions, distinctions, design, and validation. International Journal of Selection and Assessment. 2022;30(1):1-13. DOI: 10.1111/ijsa.12376
  10. Ritzhaupt AD, Gunter E, Jones JG. Survey of commercial off-the-shelf video games: Benefits and barriers in informal educational settings. International Journal of Instructional Technology and Distance Learning. 2010;7:45-55. DOI: 10.1177/0047239515588161
  11. Stansbury JA, Wheeler EA, Buckingham JT. Can Wii engage college-level learners? Ease of commercial off-the-shelf gaming in an introductory statistics course. Computers in the Schools. 2014;31:103-115. DOI: 10.1080/07380569.2014.879791
  12. Sundqvist P. Commercial-off-the-shelf games in the digital wild and L2 learner vocabulary. Language Learning & Technology. 2019;23:87-113. DOI: 10125/44674
  13. Wiklund M, Ekenberg L. Going to school in World of Warcraft: Observations from a trial program using off-the-shelf computer games as learning tools in secondary education. Designs for Learning. 2009;2:36-55. DOI: 10.16993/dfl.18
  14. Sanchez DR, Nelson TQ, Kraiger K, Weiner EJ, Lu Y, Schnall J. Defining motivation in video game-based training: Exploring the differences between measures of motivation. International Journal of Training & Development. 2022;26(1):1-28. DOI: 10.1111/ijtd.12233
  15. Krath J, Schürmann L, Von Korflesch HF. Revealing the theoretical basis of gamification: A systematic review and analysis of theory in research on gamification, serious games and game-based learning. Computers in Human Behavior. 2021;125:106963. DOI: 10.1016/j.chb.2021.106963
  16. Armstrong MB, Landers RN, Collmus AB. Gamifying recruitment, selection, training, and performance management: Game-thinking in human resource management. In: Gangadharbatla H, Davis DZ, editors. Emerging Research and Trends in Gamification. Hershey, Pennsylvania, USA: IGI Global; 2016. pp. 140-165. DOI: 10.4018/978-1-4666-8651-9.ch007
  17. Sanchez DR, Langer M, Kaur R. Gamification in the classroom: Examining the impact of gamified quizzes on student learning. Computers & Education. 2020;144:103666. DOI: 10.1016/j.compedu.2019.103666
  18. Erhel S, Jamet E. Digital game-based learning: Impact of instructions and feedback on motivation and learning effectiveness. Computers & Education. 2013;67:156-167. DOI: 10.1016/j.compedu.2013.02.019
  19. Taub M, Sawyer R, Smith A, Rowe J, Azevedo R, Lester J. The agency effect: The impact of student agency on learning, emotions, and problem-solving behaviors in a game-based learning environment. Computers & Education. 2020;147:103781. DOI: 10.1016/j.compedu.2019.103781
  20. Yang YTC, Chang CH. Empowering students through digital game authorship: Enhancing concentration, critical thinking, and academic achievement. Computers & Education. 2013;68:334-344. DOI: 10.1016/j.compedu.2013.05.023
  21. Hummel HG, Joosten-ten Brinke D, Nadolski RJ, Baartman LK. Content validity of game-based assessment: A case study of a serious game for ICT managers in training. Technology, Pedagogy, and Education. 2017;26:225-240. DOI: 10.1080/1475939X.2016.1192060
  22. Shute VJ, Ventura M, Kim YJ. Assessment and learning of qualitative physics in Newton's Playground. Journal of Educational Research. 2013;106:423-430. DOI: 10.1080/00220671.2013.832970
  23. Stănescu DF, Ionită C, Ionită AM. Game-thinking in personnel recruitment and selection: Advantages and disadvantages. Postmodern Openings. 2020;11:267-276. DOI: 10.18662/po/11.2/174
  24. Kim YJ, Almond RG, Shute VJ. Applying evidence-centered design for the development of game-based assessments in physics playground. International Journal of Testing. 2016;16:142-163. DOI: 10.1080/15305058.2015.1108322
  25. Courtney L, Graham S. ‘It’s like having a test but in a fun way’: Young learners’ perceptions of a digital game-based assessment of early language learning. Language Teaching for Young Learners. 2019;1:161-186. DOI: 10.1075/ltyl.18009.cou
  26. Mavridis A, Tsiatsos T. Game-based assessment: Investigating the impact on test anxiety and exam performance. Journal of Computer Assisted Learning. 2017;33:137-150. DOI: 10.1111/jcal.12170
  27. Weiner EJ, Sanchez DR. Cognitive ability in virtual reality: Validity evidence for VR game-based assessments. International Journal of Selection and Assessment. 2020;28:215-236. DOI: 10.1111/ijsa.12295
  28. Auer EM, Mersy G, Marin S, Blaik J, Landers RN. Using machine learning to model trace behavioral data from a game-based assessment. International Journal of Selection and Assessment. 2022;30(1):82-102. DOI: 10.1111/ijsa.12363
  29. Shute VJ, Wang L, Greiff S, Zhao W, Moore G. Measuring problem-solving skills via stealth assessment in an engaging video game. Computers in Human Behavior. 2016;63:106-117. DOI: 10.1016/j.chb.2016.05.047
  30. Westera W, Nadolski R, Hummel H. Serious gaming analytics: What students' log files tell us about gaming and learning. International Journal of Serious Games. 2014;1:35-50. DOI: 10.17083/ijsg.v1i2.9
  31. Levy R. Dynamic Bayesian network modeling of game-based diagnostic assessments. Multivariate Behavioral Research. 2014;54:771-794. DOI: 10.1080/00273171.2019.1590794
  32. Nikolaou I, Georgiou K, Kotsasarlidou V. Exploring the relationship of a gamified assessment with performance. Spanish Journal of Psychology. 2019;22:E6. DOI: 10.1017/sjp.2019.5
  33. Chan D, Schmitt N. Video-based versus paper-and-pencil method of assessment in situational judgment tests: Subgroup differences in test performance and face validity perceptions. Journal of Applied Psychology. 1997;82:143-159. DOI: 10.1037/0021-9010.82.1.143
  34. Outtz JL. The role of cognitive ability tests in employment selection. Human Performance. 2002;15:161-172. DOI: 10.1207/S15327043HUP1501&02_10
  35. Landers R. Developing a theory of gamified learning: Linking serious games and gamification of learning. Simulation & Gaming. 2014;45:752-768. DOI: 10.1177/1046878114563660
  36. Kim YJ, Shute VJ. The interplay of game elements with psychometric qualities, learning, and enjoyment in game-based assessment. Computers & Education. 2015;87:340-356. DOI: 10.1016/j.compedu.2015.07.009
  37. Sanchez DR. Videogame-based training: The impact and interaction of videogame characteristics on learning outcomes. Multimodal Technologies and Interaction. 2022;5. DOI: 10.3390/mti6030019
  38. Findley WG. A rationale for evaluation of item discrimination statistics. Educational and Psychological Measurement. 1956;16:175-180. DOI: 10.1177/001316445601600201
  39. Lord FM. The relation of the reliability of multiple-choice tests to the distribution of item difficulties. Psychometrika. 1952;17:181-194. DOI: 10.1007/BF02288781
  40. Sanchez DR, Langer M. Video Game Pursuit (VGPu) scale development: Designing and validating a scale with implications for game-based learning and assessment. Simulation & Gaming. 2020;51:55-86. DOI: 10.1177/1046878119882710
  41. Aziz R, Norman H, Nordin N, Wahid FN, Tahir NA. They like to play games? Student interest of serious game-based assessments for language literacy. Creative Education. 2019;10:3175-3185. DOI: 10.4236/ce.2019.1012241
  42. Fong G. Adapting COTS games for military experimentation. Simulation & Gaming. 2006;37:452-465. DOI: 10.1177/1046878106291670
  43. Graafland M, Schraagen JM, Schijven MP. A systematic review of serious games for medical education and surgical skills training. British Journal of Surgery. 2012;99:1322-1330. DOI: 10.1002/bjs.8819
  44. Shute VJ, Ke F. Games, learning, and assessment. In: Ifenthaler D, Eseryel D, Ge X, editors. Assessment in Game-Based Learning. New York, NY, USA: Springer; 2012. pp. 43-58. DOI: 10.1007/978-1-4614-3546-4_4
  45. Steinkuehler C, Duncan S. Scientific habits of mind in virtual worlds. Journal of Science Education and Technology. 2008;17:530-543. DOI: 10.1007/s10956-008-9120-8
  46. Shute VJ, Ventura M, Bauer M, Zapata-Rivera D. Monitoring and fostering learning through games and embedded assessments. ETS Research Report Series. 2008;2008:i-32. DOI: 10.1002/j.2333-8504.2008.tb02155.x
  47. All A, Castellar EPN, Van Looy J. Assessing the effectiveness of digital game-based learning: Best practices. Computers & Education. 2016;92:90-103. DOI: 10.1016/j.compedu.2015.10.007
  48. Marquard JL, Zayas-Cabán T. Commercial off-the-shelf consumer health informatics interventions: Recommendations for their design, evaluation, and redesign. Journal of the American Medical Informatics Association. 2012;19:137-142. DOI: 10.1136/amiajnl-2011-000338
