Worked Examples in Physics Games: Challenges in Integrating Proven Cognitive Scaffolds into Game Mechanics

This exploratory study examines the potential of integrating research on worked examples and physics games. Students were assigned either to a base version of a physics game, the Fuzzy Chronicles, or to a version of the Fuzzy Chronicles augmented with worked examples. Students in both conditions demonstrated significant gains from pretest to posttest, but students in the base game version demonstrated significantly greater gains than students in the worked example version. These results reinforce results from other studies by our research group demonstrating how important it is that scaffolds based on multimedia research (a) do not over-scaffold the student or promote passive, automatic behaviors, (b) do not excessively detract from the student’s gameplay time, and (c) do not disrupt game cognition and flow.


Introduction
This exploratory study examines the potential of integrating research on worked examples into physics games to support deeper learning. The theory behind worked examples is that working memory, which is limited in capacity, is heavily utilized when solving problems, for example when setting subgoals that require highly focused cognition [1]. Problem solving has been shown to consume cognitive resources that could be better allocated to learning, integration, and consolidation. Worked examples free cognitive resources in working memory for learning, specifically for the assimilation of new knowledge through generative processing [2].
Many research studies have shown the advantages of learning from correct worked examples [1][3][4][5][6][7][8][9][10]. It is important, however, to be aware of the "expertise reversal effect" [11]: numerous studies have shown that an instructional technique that benefits low prior knowledge learners can lose its benefit for, and in some cases be detrimental to, high prior knowledge learners.
Based on the research discussed above, worked examples can enhance learning in multimedia settings. The purpose of the current study was to examine the efficacy of one approach to integrating worked examples into physics learning games. The study compares two conditions, one that integrates worked examples into gameplay and one that does not, in order to explore four hypotheses:

4. These effects would be especially pronounced in low prior knowledge learners.

Participants
Participants were 53 seventh grade students (F = 24, M = 29) from a Nashville area middle school. Twelve students were removed from the sample for missing the pretest (4), a day of gameplay (4), or the posttest (4). One additional student was removed from the worked examples condition for having a difference score more than three standard deviations below the mean. This left a total of 40 students (F = 17, M = 23). A chi-squared analysis revealed no significant difference in the distribution of males and females between the two groups, χ²(1, N = 40) = 0.102, p = 0.75.
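The gender-balance check can be illustrated with a short sketch of a Pearson chi-squared test on a 2 × 2 table. The per-group cell counts are not reported in the text, so the counts below are an assumption: 20 students per group, with 9 versus 8 females and 11 versus 12 males. These counts were chosen only because they agree with the reported marginals (F = 17, M = 23, N = 40). The function name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    `table` is [[a, b], [c, d]]; returns (chi2, p) with df = 1.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With df = 1, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# ASSUMED cell counts (not reported in the paper); only the marginals
# F = 17, M = 23, N = 40 come from the text.
assumed_counts = [[9, 11],   # one condition: females, males
                  [8, 12]]   # other condition: females, males
chi2, p = chi_square_2x2(assumed_counts)
print(f"chi2(1, N = 40) = {chi2:.3f}, p = {p:.2f}")
```

Under these assumed counts the sketch yields a statistic of about 0.102 and a p-value of about 0.75, which agrees with the reported values. Other allocations of students to groups may be equally consistent with the text.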

Materials
The game used for this study was an updated version of the conceptually-integrated educational physics game known as The Fuzzy Chronicles [12][13][14]. During the game, the player takes on the role of the space pilot Surge who is trying to help a group of aliens known as Fuzzies.
Similar to versions of Fuzzy Chronicles used in other studies, each game level in the version used for the current study takes place on a grid with the goal of navigating from a launching point to a goal portal/door while avoiding obstacles. Unlike the previous version, the version of the game used for this experiment included only one level of difficulty for each mission to encourage students to progress through the educational content of the game. The levels were broken up into five concepts based on Newton's laws of motion and the game mechanics designed to teach those concepts. The five concepts were: combination of forces (using rocket boosts), changes in mass (picking up Fuzzies), equal and opposite reactions in 1D (launching Fuzzies while not moving), equal and opposite reactions in 2D (launching Fuzzies while moving), and the law of inertia, where an object in motion will stay in motion unless acted upon by an outside force (dropping Fuzzies).
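The equal-and-opposite-reaction concepts can be illustrated with conservation of momentum. The sketch below is not the game's physics engine, and the function name, masses, and velocities are hypothetical values in arbitrary units. It only shows the relationship students are expected to reason about: launching a Fuzzy in one direction changes the ship's velocity in the opposite direction, scaled by the ratio of the masses.

```python
def recoil_velocity(ship_mass, fuzzy_mass, initial_velocity, launch_velocity):
    """Ship velocity after launching a Fuzzy, from conservation of momentum.

    Velocities are (vx, vy) tuples in the same reference frame.
    `initial_velocity` is the shared velocity of ship and Fuzzy before the
    launch; `launch_velocity` is the Fuzzy's velocity after the launch.
    """
    total_mass = ship_mass + fuzzy_mass
    return tuple(
        (total_mass * v0 - fuzzy_mass * vf) / ship_mass
        for v0, vf in zip(initial_velocity, launch_velocity)
    )

# 1D case (hypothetical numbers): a ship at rest launches a Fuzzy to the
# right, so the ship recoils to the left.
print(recoil_velocity(4.0, 1.0, (0.0, 0.0), (8.0, 0.0)))

# 2D case (hypothetical numbers): a ship moving right launches a Fuzzy
# upward; the ship keeps its horizontal velocity and gains downward velocity.
print(recoil_velocity(4.0, 1.0, (2.0, 0.0), (2.0, 10.0)))
```

In the 1D case the ship ends up moving at one quarter of the Fuzzy's speed in the opposite direction, because the ship is assumed to be four times as massive.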
There were 7 levels for each concept, making for a total of 35 levels. Each set of seven levels included one boss level that required the students to use all the skills they developed through the previous six levels to show mastery over the concept. For example, for the boss level for the concept of combining forces, students had to learn how to increase and decrease their speed as well as complete two 90° turns. For the experimental manipulation, the six non-boss levels in each set were grouped in pairs. The first level in each pair for the worked examples group was designed to be split into two isomorphic segments which are separated by a laser.
During the first segment there is a preplaced trajectory and preplaced forces that take the ship from the starting launch point to the button that switches off the laser. The first segment thus demonstrates how to complete the target maneuver successfully, and serves as an example that students can use to help them navigate from the laser button to the goal.
For the base game group, students are given only the second half of the level and are required to figure out how to complete the maneuver on their own using the basic tips given in the level introductory text. An example of the same level, which requires students to move on a diagonal path, for the control and the worked examples groups can be found in Figure 1. For the second level in each pair, both groups are given the same level, which asks them to complete a similar maneuver. A table with the level progression for the two groups can be found in Table 1.
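The level structure described above can be summarized in a small sketch. The concept names come from the text; the function name `level_progression`, the level-type labels, the placement of the boss level as the seventh level in each set, and the pairing of consecutive levels (1-2, 3-4, 5-6) are assumptions made for illustration.

```python
CONCEPTS = [
    "combination of forces",
    "changes in mass",
    "equal and opposite reactions in 1D",
    "equal and opposite reactions in 2D",
    "law of inertia",
]

def level_progression(condition):
    """Return (concept, level_number, level_type) tuples for one condition.

    `condition` is "base" or "worked_example". Level numbering and the
    position of the boss level are assumptions, not taken from Table 1.
    """
    if condition not in ("base", "worked_example"):
        raise ValueError(f"unknown condition: {condition}")
    levels = []
    for concept in CONCEPTS:
        for number in range(1, 7):  # six non-boss levels, grouped in pairs
            first_in_pair = number % 2 == 1
            if first_in_pair and condition == "worked_example":
                level_type = "worked example segment + practice segment"
            elif first_in_pair:
                level_type = "practice segment only"
            else:
                level_type = "shared level"
            levels.append((concept, number, level_type))
        levels.append((concept, 7, "boss level"))
    return levels

worked = level_progression("worked_example")
print(len(worked))  # total number of levels
print(sum(1 for _, _, kind in worked if kind.startswith("worked example")))
```

Under these assumptions, the two conditions differ only in the first level of each pair, which is three levels per concept.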
The pretest and posttest consisted of 18 multiple choice questions. Eight of the items dealt with changes in acceleration in 1D, with subsets of those items dealing with the relationship between force and acceleration when mass is unchanged (2 items), balanced forces (2 items), and combining forces (4 items). Four of the items dealt with adding forces together to create 2D movement. Three of the items focused on the F = ma relationship, highlighting changes in mass. Finally, the last three questions dealt with Newton's third law of motion (for every action there is an equal and opposite reaction) and required the students to imagine how throwing an object would affect the speed and direction of another object.
In addition to addressing whether students can learn from the game, this study also examines students' perceptions of the game in terms of enjoyment, difficulty, cognitive effort, and perceived learning. To address this goal, a seven-item game evaluation survey with a five-point Likert scale ranging from "1 Strongly Agree" to "5 Strongly Disagree" was developed.
A second survey was administered to determine the students' level of video game experience. Students were asked how many hours a week they typically played video games, which video game consoles and portable video game devices their families owned and they played regularly, whether they played games on a desktop or laptop computer, and whether they regularly played games on a smartphone or tablet.
Finally, to assess the students' self-efficacy in terms of playing video and computer games, the video gaming subset of items from the Self-Efficacy in Technology and Sciences (SETS) instrument was used [18]. The instrument was included in order to determine whether self-efficacy while playing games could affect students' play style or whether the different versions of the game were more helpful to students with lower video game self-efficacy.

Procedure
Ten days prior to playing the game, students were given the pretest to determine prior knowledge levels related to the learning goals of the game. Students were given as much time as they needed to complete the test. All students were given two full class periods (90 min) to play the game. On the first day of gameplay, students were introduced to the WISE system and given instructions on how to set up their personal game accounts. Students were given hard copy instructions related to their version of the game. The first activity students had to complete in the game was the 15-item video game self-efficacy survey [18]. Upon completing the survey, students were taken to the in-game tutorial, which instructed students on how to place trajectory points, place forces on the timeline, set the direction and magnitude of each force, and combine forces.
Students then played the game at their own pace. Students were told that they could assist other students at their lab table, but were instructed not to touch the other student's computer.

Table 1. Level progression for the two groups (columns: Level, Base game, Worked example (WE)).
The researchers gave minimal help in terms of game instructions and were instructed to refrain from giving any physics-related assistance. The posttest was administered on Friday at the end of the week. Due to the alternating block scheduling, half of the students completed the test 1 day after finishing gameplay while the other half completed the test 2 days after finishing gameplay. After the posttest, students were asked to complete the game evaluation survey and the gaming experience survey.

Learning gains results
Due to a typo on the pretest materials, one question relating to mass was removed from the analysis for both the pretest and posttest. There were no significant differences between the two groups on the pretest, t(38) = 0. There was also a significant interaction between testing session and group, F(1, 36) = 4.38, p = .04, with the base game group showing significantly higher gains between the two testing sessions compared to the worked example group. Table 2 contains the means and standard deviations for both groups for both the pretest and the posttest.
In addition to looking at overall learning gains, an additional repeated measures ANOVA was conducted to examine which concepts showed gains between the pretest and posttest as well as significant differences in gains between the two conditions. For questions dealing with 1D and changes in acceleration, there was a significant main effect of testing session for 1D questions in general.
Although there was no significant benefit for providing worked examples in terms of learning gains, one possibility is that worked examples could have been more effective for lower prior knowledge players. Students were ranked as high and low prior knowledge learners using a median split (18 low, 22 high). A chi-squared analysis revealed no significant difference in the distribution of high and low ranked prior knowledge individuals between the two conditions, χ²(1, N = 40) = 0.404, p = 0.53. A MANOVA looking at gain scores between the pretest and posttest showed no significant effect of prior knowledge ranking, F(1, 36) = 0.01, p = 0.93, as well as no significant interaction between condition and prior knowledge rank, F(1, 37) = 0.08, p = 0.78. This suggests that regardless of prior physics knowledge, as measured by our pretest, participants did significantly better when they were in the base game group.
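The median split used to rank students can be sketched as follows. The scores below are hypothetical and are not the study's data, and the tie-handling rule (scores equal to the median are labeled "high") is an assumption, since the text does not say how ties were treated. The sketch shows one way a median split can yield unequal groups, such as the reported 18 low and 22 high: several students may share the median score.

```python
from statistics import median

def median_split(scores):
    """Label each score "low" or "high" relative to the sample median.

    ASSUMPTION: scores equal to the median are labeled "high"; the paper
    does not describe how ties were handled.
    """
    cutoff = median(scores)
    return ["high" if score >= cutoff else "low" for score in scores]

# HYPOTHETICAL pretest scores, for illustration only.
hypothetical_scores = [5, 6, 6, 7, 8, 8, 8, 9, 10, 12]
labels = median_split(hypothetical_scores)
print(labels.count("low"), labels.count("high"))
```

With these hypothetical scores, three students sit exactly at the median, so the "high" group ends up larger than the "low" group.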

Game play analysis
Highest game level completed was used to determine how much of the game and learning content was experienced by the students. A significant positive relationship was found between pretest score and highest level completed, r(40) = 0.48, p = 0.002. There was also a significant positive correlation between the highest level completed during the game and learning gains, r(40) = 0.33, p = 0.04, suggesting that progressing further in the game also helped students learn more. The strongest correlation was between posttest score and highest level completed, r(40) = 0.63, p < 0.001. Together these results indicate that prior knowledge helped students to progress through the game and that progressing further also helped students improve their scores between the two testing sessions.
To examine whether differences in the number of levels completed affected learning differences between the two groups, an ANOVA was conducted. There was a significant difference between the two conditions in terms of the average highest level completed by the participants, F(1, 36) = 5.02, p = 0.03, with participants in the base game group tending to complete more levels (M = 19.10, SD = 7.21) than participants in the worked examples group (M = 15.20, SD = 8.61). Similar to the work of Sweller, this project was also interested in how the two conditions affected play style. To explore this question, the average number of attempts made for each level was computed for the levels that did not include a worked example and were the same across the two conditions (the second level in each pair). This was done for just the first 3 non-worked example levels since all participants had completed those levels. There were no significant differences between the two conditions for average overall attempts, t
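The significance of a Pearson correlation follows from converting r to a t statistic, t = r·sqrt(df / (1 − r²)). The sketch below assumes that the number in parentheses in r(40) is the sample size, so that df = 40 − 2 = 38; this is an interpretation, not something the text states. The helper names are hypothetical, and the t-distribution tail is computed by simple numerical integration so that the example needs only the standard library.

```python
import math

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value for Student's t, via Simpson's rule on the density."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # `steps` must be even for Simpson's rule
    area = density(0.0) + density(upper)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * density(k * h)
    area *= h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)

def correlation_p(r, n):
    """Return (t, p) for a Pearson r computed from n paired observations."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, t_two_tailed_p(t, df)

# Reported correlations with highest level completed, ASSUMING N = 40.
for r in (0.48, 0.33, 0.63):
    t, p = correlation_p(r, 40)
    print(f"r = {r:.2f}: t(38) = {t:.2f}, p = {p:.4f}")
```

Under the N = 40 assumption, the computed p-values are in line with the reported p = 0.002, p = 0.04, and p < 0.001.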

Game satisfaction survey
The game satisfaction survey revealed no significant differences between the two groups in terms of whether students enjoyed playing the game (p = 0.38), found the game difficult to play (p = 0.87), found the interface confusing (p = 0.73), would play again (p = 0.27), or reported working hard to complete game missions (p = 0.57). In terms of perceived physics learning, there were no significant differences between the two groups in whether they thought they learned a lot about physics from the game (p = 0.21) or thought that the game helped them understand physics lessons they had learned in class (p = 0.19).
The distribution of responses for all of the participants can be found in Table 3. In general, 65% of the students reported that they strongly agreed or agreed that they enjoyed playing the game, while only one student disagreed. Over half of the students (60%) reported that they agreed or strongly agreed that they would like to play the game or games like it again in the future. In terms of mental effort, 80.4% reported that they worked hard to understand how to play the game and to complete the missions. Regarding difficulty, 32.5% of the students either agreed or strongly agreed that they found the game difficult to play, 30% neither agreed nor disagreed, and 37.5% did not agree that the game was difficult to play. Only 12.5% of the students found interacting with the game to be difficult, while the largest share were neutral (42.5%). For physics learning, the majority of students agreed to some degree that they learned about physics from playing the game (72.5%) and also thought that the game helped them understand physics lessons they had learned in class (72.5%).

Video game self-efficacy
The analyses focus on the video game self-efficacy subscale due to the lack of a significant relationship between any of the learning measures and the computer gaming self-efficacy scale. There was no significant difference in reported video game self-efficacy scores between the two groups, t(38) = −1.61, p = 0.12. In addition, there was no significant interaction between game version and self-efficacy ranking (high vs. low, created using a median split) on learning gains, F(1, 36) = 0.01, p = 0.91. In general, there was a significant positive relationship between video game self-efficacy and performance on the posttest, r(40) = 0.33, p = 0.04, although there was no significant relationship with either the pretest or learning gains. There was also a significant positive relationship between the video game self-efficacy score and the highest game level completed, r(40) = 0.38, p = 0.02, as well as a significant negative relationship with the number of attempts made on the non-worked example levels, r(40) = −0.37, p = 0.02. This suggests that students with higher self-efficacy were less likely to just use trial and error to solve the levels and completed more levels as a result. This is further supported by significant positive relationships between video game self-efficacy and the survey ratings for perceived game difficulty, r(40) = 0.44, p = 0.005, and for finding the game confusing, r(40) = 0.51, p = 0.001. Because the survey items were coded from 1 (Strongly Agree) to 5 (Strongly Disagree), higher ratings indicate disagreement: students with lower self-efficacy were more likely to agree that they found the game confusing to interact with and difficult to play. However, as students' self-efficacy increased they were more likely to say they liked the game, r(40) = −0.52, p < 0.001, and would play the game again, r(40) = −0.34, p = 0.03. Most importantly, as students' self-efficacy increased, students were more likely to report that they learned physics from the game, r(40) = −0.37, p = 0.02, and that the game helped them understand lessons from class, r(40) = −0.50, p = 0.001.

Simulation and Gaming

Discussion
The highest level completed by each student served as a measure of how much game and learning content the students experienced while interacting with The Fuzzy Chronicles. Pretest score was positively related to how far students advanced in the game, suggesting that students' prior knowledge was a significant factor in the number of levels students were able to complete. Pretest scores did not differ between students in the worked example condition and the baseline condition, which suggests that the conditions were balanced in terms of prior knowledge. Students who completed more levels also showed higher learning gains.
Overall, the baseline condition proved more beneficial than the worked example condition for both high and low prior knowledge learners. This finding contradicted our expectation that low prior knowledge participants would perform better in the worked example condition. Highest level completed also correlated positively with students' posttest scores, indicating that progress through the game was associated with better understanding of the game's content. The average highest game level completed differed significantly between the two conditions, with baseline students completing more levels than their counterparts in the worked example condition. This result was unexpected. If anything, we would have expected students in the worked example condition to need the same or less time to complete each level. Instead, the inclusion of the worked examples somehow impeded level progression and had a negative effect on student performance overall. No significant difference was found between the worked example and baseline conditions in the overall number of attempts, suggesting that gameplay behavior was equivalent across conditions even though the number of levels completed differed.
Self-efficacy of the students was analyzed to determine how individual judgments of performance ability affected gameplay. Self-efficacy did not differ between the two groups and did not appear to have a significant influence on pretest scores or learning gains. However, self-efficacy had a significant positive relationship with posttest scores and highest game level completed, indicating that students' perception of their ability to understand the game while playing could have influenced their performance and engagement. Self-efficacy and number of attempts made on non-worked example levels were inversely related, suggesting that students with a higher degree of self-efficacy were less likely to approach the levels by trial and error. Students with lower self-efficacy scores were also more likely to perceive the game as confusing and difficult to play, whereas students with higher self-efficacy reported that they enjoyed the game and would play it again. Higher self-efficacy was also associated with students indicating that they learned physics concepts from the game and that the game helped to reinforce content from class. Self-efficacy may therefore have influenced how students perceived the value of the game.

Conclusions
The findings show that students in both the base game condition and the worked example condition demonstrated significant pre-post gains. In an earlier study, we included a null condition with only a pretest and posttest but no intervention to determine whether a test/retest phenomenon could account for gains without an intervention [17]. That study showed that gains on the test could not be attributed simply to a test/retest effect. We therefore believe that the significant pre-post gains in both conditions in the current study demonstrate the overall efficacy of the approach enacted in Fuzzy Chronicles. Newtonian concepts can be very challenging for students, and are often resistant to instruction [19][20]. We are pleased that these findings are consistent with the overarching disciplinary integration ideas behind the Fuzzy Chronicles.
The findings from the worked examples condition, however, do not support our hypotheses. While these findings are disappointing, we have encountered similar patterns in our prior research when we have attempted to integrate well-documented principles about scaffolding from psychology and cognitive science into digital games for learning. Our research has shown that when scaffolding functionality comes at the expense of time spent in gameplay, it can detract from game cognition and STEM learning [15,21]. Adams and Clark's findings demonstrated that self-explanation prompts slowed students in the prompt condition so that the students completed significantly fewer levels and scored significantly lower on a learning assessment [15]. Looking across those studies and the current study, we note that implementing well-documented multimedia principles of learning in STEM games may not enhance learning if the design interferes with students' flow, cognitive load, or engagement with the game mechanics. In particular, the results suggest that when the worked example approach is overemphasized in a STEM game, it can disrupt or possibly over-scaffold learners, resulting in reduced learning gains and less productive gaming behavior. More specifically, across the current study and the other two studies to which we referred, we have repeatedly found that scaffolds based on multimedia research must not (a) over-scaffold the student or promote passive, automatic behaviors, (b) excessively detract from the student's gameplay time, or (c) disrupt game cognition and flow, especially the pace of flow.
This does not mean that these well-documented learning and scaffolding principles are incompatible with the design of digital games for learning. It simply means that designs integrating game mechanics with scaffolding require careful, iterative refinement. In the case of the self-explanation functionality, for example, building on the findings of the Adams and Clark study, we redesigned the self-explanation functionality to adapt to students' level of sophistication in working with abstract prompts [15]. We also adjusted the timing and frequency of the prompts so that the prompts appeared only after the player had successfully completed a level. Timed in this way, the prompts were less intrusive and disruptive to players' gameplay, and more likely to be appropriate to the player's current progress and solution. Our research on this refined approach to self-explanation demonstrated significant pre-post learning gains as compared to a version without the self-explanation functionality [22]. Similarly, we believe that these findings imply the need to refine our approach to worked examples within gameplay rather than implying that worked examples are inappropriate for application in this setting. Essentially, we consider the findings a reminder of the complexity of integrating scaffolds, an idea which has been developed in other educational contexts and applied to the context of digital games for learning.