First Critical Research Review

As part of my doctoral program, I was asked to examine an article and provide a critical review of the publication based on a series of questions. The content below corresponds to the questions asked.

  1. Overall, Erhel and Jamet (2013) poorly specified the problem their research was designed to address. A clear statement of the problem generally appears in the introduction of a paper, but in this case the authors did not clearly state any problem there. The clearest statement of the paper's objectives was found in the second-to-last paragraph, which indicates that “one of the objectives of the present study was to answer the question ‘Is deep learning compatible with serious games?’” (Erhel and Jamet, 2013, p. 165). This objective was not clearly stated anywhere in the preceding nine pages, although the authors hint at the issue when they comment generically in their introduction that “the use of games to teach educational content inevitably raises the question of their compatibility with deep learning” (Erhel and Jamet, 2013, p. 156). In addition, while they indicated that there are two objectives, they do not outline the second objective in any overt way. This leaves the reader to work out whether there is a second objective and what it may be. The authors did state in the introduction that the question of deep learning compatibility “has prompted many researchers to investigate the actual benefits of digital games in terms of learning and motivation” (Erhel and Jamet, 2013, p. 156). However, given that a value-added approach was used in this study (see question four for a discussion of that framework), the second objective does not seem to be determining whether digital games are an effective means of improving learning and motivation, since the assumption underlying that approach is that they are effective.
Thus, based on what is presented in both the literature review and the experimental design, the second objective (referred to here as the assumed objective) may be to determine how effectively elements of game design (instructions and feedback) promote learning and motivation. Unfortunately, the authors did not state this second objective explicitly, as one would expect in an academic research paper, so it can only be inferred. Given how the literature review and the research were conducted, one could argue that even the authors were unclear about both objectives in their research design and may have introduced the idea of two objectives only at the end, to tie their research together around the results they obtained.


  2. As noted in question one, the actual objectives of this research were far from clear. If one works from inferences about what the authors may have intended but poorly expressed, there was one stated objective (to assess whether deep learning is compatible with serious games) and one assumed objective (to understand how elements of game design affect overall learning and motivation). If these were the problems the authors wished to address, then there is a need for this study, since one of the open issues in the use of games in education is how games connect to learning and how specific aspects of a game affect its effectiveness. In addition, Erhel and Jamet (2013) used this study to present the value-added approach as a mechanism for assessing effectiveness. While there are issues with how this was implemented within this study (see question four), the idea that disciplines should develop, evaluate, and improve upon conceptual models for testing is neither new nor out of place, since educational research needs to test different conceptual models to identify effective ways of measuring learning.
  3. Based on the stated and assumed objectives (see question one), the problem is potentially researchable, but not within a single study as presented here, and not in the way the authors designed their study (see questions 4 through 14 for more on the issues with their study). Since the authors’ first objective was assessing learning through “serious games,” one would first have to define what constitutes a “serious game” in terms of design elements and goals. The authors offered little in the way of defining what specifically constitutes a “serious game” within their study. They suggested that how the learner approaches the game (as either entertainment or learning) affects the learner’s goals (performance or mastery), but they did not specify what distinguishes a serious game from a non-serious one. Once the distinction between “serious” and “non-serious” games was defined, one would need to identify or design a serious game with the required characteristics that allowed the researcher to assess the depth of learning (surface versus deep). This would likely require further refinement of the hallmarks of deep versus surface learning (and why this distinction is significant to educational research) and of how these are assessable (qualitatively and/or quantitatively). Addressing the second problem, how elements of digital game design affect learning and motivation, would require many additional experiments in which design aspects, such as instructions and feedback, were assessed independently of one another (qualitatively and/or quantitatively) for their effects on motivation and learning before being assessed in combination, as the authors do. Therefore, the intended objectives are researchable, even if that research is not what the authors carried out.
  4. The conceptual framework underlying Erhel and Jamet’s (2013) research is found in their discussion of the benefits of digital learning games compared with conventional media (p. 157). In section 1.3, the authors outlined that this study was designed to illustrate an alternative approach to assessing digital game-based learning (DGBL) and, through this alternative method, to identify specific features of game-based design that improve motivation and learning. This approach rests on the initial premise that digital games are an effective medium for learning, so that the authors are testing only what specifically affects that effectiveness. In making this case, the authors argued that prior media comparison studies of digital game-based learning have been unable to reach concrete conclusions as to whether digital games can be an effective learning medium. They attributed this to a general vulnerability in the media comparison approach, in which “many confounding factors (e.g. format, pace, educational content, teacher’s social presence), …prevent us from clearly identifying the factors responsible for the benefits of DGBL” (Erhel and Jamet, 2013, p. 157). However, the authors failed to explain specifically how these “confounding factors” affected the outcomes of the prior studies they cited and why a value-added approach mitigates them. In response to these studies, the authors designed this research around a value-added approach. This framework relies on measuring the difference in scores between a start point and an end point, between which only a single variable has changed. The approach also works best when applied to random samples or, if the samples are nonrandom, to samples that have been statistically controlled for variation within the population, such as prior experience and learning level (Douglas, 2017).
Given the sampling and procedural issues discussed in questions 9 and 10 below, it is unlikely that the sample was randomly constituted. This also means that additional measures should have been taken to determine the degree of non-randomness in variables the authors did not account for, such as sex, major, and year of schooling. Since these other variables were either not collected or are suspect given numerical inconsistencies, it is questionable how effectively the value-added approach could be applied in this research.
  5. Overall, Erhel and Jamet (2013) offered a rather rambling combination of theory and research as the foundation of their study, and as a result there seems to be no central theory underlying it. They went so far as to write “whichever interpretive theory we choose to apply,” suggesting that they did not subscribe to a single theory at the start of the paper (Erhel and Jamet, 2013, p. 158). What does underlie their research is the value-added approach. This approach rested on the assumption that digital game-based learning is effective, despite the inconclusiveness of prior studies, and that the authors needed only to assess how that effectiveness varied with changing conditions (instructions and feedback). Overall, they did not effectively explain why the value-added approach was better than the media comparison approach, nor did they address any limitations or criticisms of the value-added approach.

In their literature review, the research presented is only tenuously connected to the only stated research objective and to the specifics of assessing digital game-based learning. The first section (1.1) was meant to outline what digital game-based learning is. However, the literature sources selected did little to differentiate digital games from other games and specified little about what a digital game is beyond that it is for “entertainment” and results in “cognitive changes.” Since not all cognitive changes represent actual learning processes, this lack of a clear definition is problematic. In section 1.2, the authors attempted to lay out how games affect motivation by linking a learner’s goals of mastery or performance (which seem to be presented as mutually exclusive) to the entertainment and educational dimensions of games. To do this, the authors examined literature on general motivation and on mastery and performance goals, but offered no concrete account of research that has assessed the link between games and learner goals. The only actual digital game research they offered indicated a positive link between intrinsic motivation and scoring, but said nothing about how this connects to learner goals. In section 1.3, the authors sought to present the benefits of DGBL games compared with conventional media but ultimately presented DGBL games as a mixed bag; instead, they used this section to introduce their value-added approach, with little discussion of the literature on why it is better and what caveats apply to such an approach. In section 1.4, the authors introduced the concept of instruction design as a way to improve the effectiveness of DGBL games, justifying this focus on the grounds that it had not previously been studied within DGBL.
To address how important instruction design may be to DGBL, the authors then offered research only on how text-based instructions affect how someone reads and approaches a learning item, without addressing how reading on screen differs cognitively from reading a document. They then turned to literature outlining how having no instructions before learning from a text document (what they call incidental learning) promotes surface learning, whereas clear instructions promote deeper learning (what they call intentional learning). This is despite the fact that none of their experiments used a text document or involved a situation with no instructions prior to learning.

  6. Erhel and Jamet’s (2013) literature review offered no summary of the literature, and none directed at the only stated research objective of the paper. Instead, the authors offered a prediction that incidental (surface) learning is likely when “the instructions given to learners encourage them to play rather than learn,” suggesting to them that “when the emphasis is placed on the playful components of a digital learning game, learners may fail to put in the effort required for learning” (Erhel and Jamet, 2013, p. 158). This lack of connection to the only stated research objective may stem from that objective having been an afterthought: it is stated only at the end of the paper and may not have shaped the literature review in any meaningful way. Overall, the authors did not offer any concrete information about what “serious games” are or about the literature on how serious games may connect to surface and deep learning (they discuss only how the presence of written instructions in documents may connect to surface or deep learning), nor did they offer much on the specifics of game design that would allow the effectiveness of its elements, such as instructions, to be examined. Since most of their references applied solely to text-based documents, it is questionable whether cognitive associations established for paper documents carry over to digital game-based learning. Most interestingly, even though they undertook an experiment in which feedback was used as a tool of game design, they did not offer any references in their literature review addressing how feedback bears on either motivation or learning. The only discussion of feedback appears in the introduction to the experiment itself, almost as if experiment two were an afterthought, conducted only when experiment one did not produce the desired results.
  7. Given the general foundations of their research, it is not surprising that Erhel and Jamet (2013) failed to offer clear and specific research questions or hypotheses connected to the stated objective of the paper. For experiment one, the authors first indicated that the study was done “to ascertain whether the effects of instructions given during the readings phase that have been observed for text-based learning would also manifest themselves during DGBL” (Erhel and Jamet, 2013, p. 158). However, at the end of their discussion of the experiment, they stated that “the first experiment was designed to assess the effects of instructions on learning quality and motivation in DGBL” (Erhel and Jamet, 2013, p. 161). These are two different research questions, so it is unclear which one the experiment was evaluating. The first suggests that they were replicating studies conducted on text-based learning to see whether the outcomes also hold true for DGBL. The second suggests that they were testing the relationship between instructions on the one hand and learning and motivation on the other. Setting aside the latter research question and looking only at the former, the authors formulated two assumptions to test at the start of experiment one. The first is that “entertainment instruction would improve our participants subjective experience and be reflected in significantly higher intrinsic motivation scores” (Erhel and Jamet, 2013, p. 158). For this first assumption, they turned to a meta-analysis by Vogel et al. (2006) for support, but this is problematic, as that study compared learning through digital simulations and games with traditional learning and did not specifically examine how instructions affect learning in digital games. The second assumption was that “participants in the entertainment instruction condition would achieve a higher learning outcome” (Erhel and Jamet, 2013, p. 158).
This is based on research conducted by Lieberman (2006), a reference they cite numerous times without any description of the actual study and its results. In doing this, Erhel and Jamet (2013) failed to specify the measure of “higher learning outcome” on which they base this assumption. For their study, they used scores on the questionnaires, but it is unclear whether that resembles what was done in Lieberman’s study. Overall, it is unclear how this research question and these assumptions connect to the stated objective, “Is deep learning compatible with serious games?” (Erhel and Jamet, 2013, p. 165).

For the second experiment, Erhel and Jamet (2013) “set out to determine whether the presence of KCR [knowledge of correct response] feedback in DGBL quizzes can influence the types of learning strategies induced by the instructions” (p. 162). They predicted that “the addition of feedback containing the correct response would reduce redundant, superficial cognitive processing, thereby making learning more relevant in both the entertainment and learning instructions conditions” (Erhel and Jamet, 2013, p. 162). As with the first research question, the connection between the one stated research objective and the experiment conducted around it is unclear. In addition, feedback for learning was not mentioned in either the introduction or the literature review prior to experiment two, so this experiment was completely disconnected from the groundwork the paper had laid. Indeed, experiment two does not appear to have been part of the original research design, as the authors indicated that it “was designed to overcome the problem” they found in experiment one, namely that the outcomes did not match the assumptions (Erhel and Jamet, 2013, p. 164). They attributed this to the entertainment instructions, in which “the instructions failed to engage a sufficiently strong effort to trigger the central processes needed for learning,” and to compensate they added feedback as a further dimension to change the outcomes (Erhel and Jamet, 2013, p. 164). This indicates that they were more concerned with finding conditions under which their predictions would be correct than with addressing the stated research objective. Overall, there is a general lack of cohesiveness between what the introduction says this paper is about and what the experiments are actually designed to answer.

  8. An examination of the two experiments raises some concerns about the study design relative to the research questions presented. For experiment one, Erhel and Jamet (2013) indicated that the study was done “to ascertain whether the effects of instructions given during the readings phase that have been observed for text-based learning would also manifest themselves during DGBL” (p. 158), and from this they derived two hypotheses. The first was that “entertainment instruction would improve our participants subjective experience and be reflected in significantly higher intrinsic motivation scores” (Erhel and Jamet, 2013, p. 158), although they did not specify how subjective experience would be measured. The second was that “participants in the entertainment instruction condition would achieve a higher learning outcome” (Erhel and Jamet, 2013, p. 158). These hypotheses were built on the findings presented in the literature review. However, in the text-based study of instructional conditions (entertainment versus study) that they shared in the literature review, the data were collected through think-aloud protocols. This allowed the researchers in that study to observe the learning process under the two conditions. The present study had no mechanism for this type of data collection, so it did not collect data comparable to that study’s, which makes applying conclusions from that study to this one problematic. In addition, the authors used two different sets of questions (paraphrase and inferential) to assess learning outcomes and proposed that one type reflected deeper learning and the other surface learning, without offering evidence to support that assumption.

In experiment two, the authors “set out to determine whether the presence of KCR [knowledge of correct response] feedback in DGBL quizzes can influence the types of learning strategies induced by the instructions” (Erhel and Jamet, 2013, p. 162). Adding this experiment to a study that (a) did not include it from the start (as evidenced by the authors’ own words) and (b) was designed to bear out their predictions by changing the conditions of the experiment undermines the study’s overall purpose. Setting aside the reasoning given for experiment two, its general design added feedback responses to the ASTRA simulation while continuing with the two instructional conditions of entertainment or learning. Thus, the authors were testing not only the presence of the feedback but also the instructional condition, even though they said they were testing only the instruction condition (Erhel and Jamet, 2013, p. 162). Overall, the study’s design is poorly matched to the research questions actually asked, and it suggests that, at least in the case of experiment two, the researchers were significantly biased toward a particular outcome, which led them to modify the overall purpose of the research.

  9. Overall, the sampling methods presented by Erhel and Jamet (2013) for their two experiments were poorly explained. The authors indicated that students were recruited from a pool of students, but did not indicate how the original pool was established, how large it was, or how the actual participants for each experiment were drawn from it. Looking more closely at experiment one, the authors indicated that they randomly assigned participants to the two experimental conditions (learning versus entertainment instructions), but since they reported the same uneven numbers of men and women in each group (9 men and 15 women), it appears they actually used stratified random assignment to distribute the available participants of each sex evenly between the two conditions. Furthermore, in experiment one they did not describe the population in each condition beyond its mean age, and in experiment two they did not break the populations down by condition at all, whether by sex or by age. Erhel and Jamet (2013) did mention that in both experiments they excluded students enrolled in medical or allied health programs. However, they did not offer any breakdown of the majors of those who did participate in the study. This could have added another dimension to their data that may have been relevant to comparisons between the two experiments and may have affected the results. Because of the weak description of the populations participating in the study and the overall lack of explanation of the actual sampling methods, the generalizability of the study’s results is limited for at least two reasons.
First, since they did not explain their population parameters, they could not offer intra- and inter-group analyses within and between the two experiments, which would help the reader understand whether the comparisons between these two populations are valid. Second, since they offered no population parameters, what they found cannot be applied to “like” populations, because “like” is left undefined by the authors. While this might lead one to think that broad generalizations would be safe given the absence of specific population parameters, the opposite is true. Since they offer no population data by which to break down their analyses, there may be underlying characteristics of the populations that influenced the results and that they have not accounted for. For example, differences in the distribution of majors between the two experiments might explain why one group scored better on the inferential questions than the other. Unfortunately, the authors, for reasons unknown, did not even consider these sampling issues when outlining the limitations of their research.
  10. In evaluating the adequacy of Erhel and Jamet’s (2013) procedures, several issues are evident. The first concerns the actual number of participants in their study. In experiment one, the authors reported a total of 46 participants (22 men and 24 women), but when they broke the data down by the two groups, they indicated that each group had 9 men and 15 women, for a total of 18 men and 30 women (48 participants) in experiment one. They mentioned omitting only one man, for having scored too high on the pretest. This means that they either miscalculated their original number of study participants, removed additional men and added women without explaining when, how, and why this occurred within the experimental procedures, or padded their data to reach a desired result.

In the procedures outlined for experiment one, the authors indicated that there were five experimental phases but described only three (pretest, simulation, and questionnaires). It is unclear whether they miscounted or simply failed to separate the phases clearly in their write-up. Beyond the number of phases, each phase they did explain raises concerns. The first phase, a pretest, involved the participants completing a questionnaire on prior knowledge. This pretest was on “medical notions,” and according to the authors “they would not help the learners answer the quizzes or the knowledge questionnaire” (Erhel and Jamet, 2013, p. 159). However, despite this statement that the pretest content would not bear on what participants would be exposed to later, the researchers used the pretest score to exclude people from participation for possessing “too much prior knowledge” (Erhel and Jamet, 2013, p. 159), without evidence in the data that these individuals’ prior knowledge would have skewed the results.

In the second phase of the experiment (in which the two groups received the two experimental conditions), the authors indicated that the distinction was between a “learning condition in which the instructions stresses ASTRA’s playful dimension, presenting it as a game” and another that stressed “the educational dimensions, presenting it as a learning module” (Erhel and Jamet, 2013, p. 159). Translating the two examples the researchers provided in French raises some concerns. Although one set of instructions was described as presenting a learning module and the other a game, the phrase “helps you to learn” appears in both, suggesting that the distinction between game and educational instructions was incomplete. There was also a variation in the wording used to emphasize the game (“be challenged to answer quizzes”) versus the learning module (“be introduced to quizzes”); see the underlined text in Box 1 and Box 2. The choice of the more passive-sounding “be introduced” for learning and the more active-sounding “be challenged” for gaming may have influenced how learners framed the instructions, so the results may say more about word choice in instructions than about framing the activity as a game versus a learning module. Also in phase two, the authors indicated that they used a single room with six booths, which means that each experimental condition (24 participants, per the reported breakdown of 9 men and 15 women) had to be split into at least four groups to work through the ASTRA simulation. There was no indication of how close together in time these sessions took place, no data reported on completion time, and no analysis of variation in results across these separate simulation runs.

In the third phase, participants completed a motivational questionnaire and then a knowledge questionnaire. Unfortunately, these questionnaires were not provided for review in the paper, as one would have expected, especially since they are not drawn in full from any previously published motivational questionnaire. For the motivational questionnaire, the authors indicated that 12 learning goal questions and 3 motivational questions were assessed, but only 5 were shown and broken down in the paper to illustrate how they addressed performance goals, mastery goals, and intrinsic motivation. It would have been more appropriate to show the entire questionnaire broken down into these categories. The knowledge questionnaire contained “four paraphrase-type” questions and four “inference-type questions,” with each written response scored 0, .5, 1, or 2 points depending on its accuracy. However, the authors did not indicate who did the scoring (one author, both authors, or someone else), when it was done (all at once or over time, blind to group assignment or by group), or whether scoring bias was assessed.
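To make the missing scoring check concrete: one common way to assess scoring consistency, assuming two raters had independently scored every written response, is Cohen's kappa, which compares the raters' observed agreement with the agreement expected by chance. The Python sketch below is illustrative only; the function name and the ratings are hypothetical and are not data from Erhel and Jamet (2013). Because the 0/.5/1/2 scale is ordinal, a weighted kappa would arguably fit better; the unweighted form is used here to keep the sketch short.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same responses."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of responses")
    n = len(rater_a)
    # Observed agreement: share of responses given the identical score.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal rate for every score level.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one identical score level")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores on the 0 / 0.5 / 1 / 2 scale for eight written answers;
# these are illustrative values, not data from the study.
rater_a = [0, 0.5, 1, 2, 2, 1, 0.5, 0]
rater_b = [0, 0.5, 1, 2, 1, 1, 0.5, 0.5]
kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.67"
```

Here the raters agree on 6 of 8 answers (0.75) against a chance agreement of 0.25, giving a kappa of about 0.67; reporting a figure like this would have let readers judge how consistently the knowledge questionnaire was scored.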

For experiment two, the authors offered no procedural specifics beyond noting that they adjusted the ASTRA simulation to tell the learner whether a response was correct or incorrect and to give the correct answer. Data collection procedures were not discussed beyond the phrase “exactly the same as Experiment 1” (Erhel and Jamet, 2013, p. 162). This is problematic because, as in experiment one, the participant numbers do not add up. The authors indicated that they had 16 men and 28 women, for a total of 44 participants, in experiment two, and that 4 were excluded after the pretest phase. Despite this exclusion, all of their analyses of the ASTRA quiz data and the motivational and knowledge questionnaires still used 44 participants.
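The participant arithmetic discussed under this question can be laid out explicitly. The Python sketch below re-adds the counts exactly as this review reports them (it does not re-check them against the article) and also computes the minimum number of sessions per condition implied by six booths. The variable and function names are illustrative only.

```python
import math

# Counts as reported in this review (not re-checked against the article).
EXP1_STATED = {"men": 22, "women": 24}          # stated sample: 46 participants
EXP1_PER_CONDITION = {"men": 9, "women": 15}    # reported for each of the two conditions
EXP1_CONDITIONS = 2
EXP1_EXCLUDED = {"men": 1, "women": 0}          # one man dropped for a high pretest score

EXP2_STATED = {"men": 16, "women": 28}          # stated sample: 44 participants
EXP2_EXCLUDED_TOTAL = 4                         # excluded after the pretest phase
EXP2_ANALYZED = 44                              # n carried through the reported analyses

BOOTHS = 6                                      # booths in the single testing room


def remaining_after_exclusion(stated, excluded):
    """Sample that should remain once the stated exclusions are applied."""
    return {sex: stated[sex] - excluded[sex] for sex in stated}


def implied_by_conditions(per_condition, conditions):
    """Sample implied by multiplying the per-condition breakdown."""
    return {sex: n * conditions for sex, n in per_condition.items()}


exp1_expected = remaining_after_exclusion(EXP1_STATED, EXP1_EXCLUDED)
exp1_implied = implied_by_conditions(EXP1_PER_CONDITION, EXP1_CONDITIONS)
exp2_expected_total = sum(EXP2_STATED.values()) - EXP2_EXCLUDED_TOTAL

# With only six booths, each condition needs at least this many separate sessions.
sessions_per_condition = math.ceil(sum(EXP1_PER_CONDITION.values()) / BOOTHS)

print("Experiment 1, expected after exclusion:", exp1_expected, sum(exp1_expected.values()))
print("Experiment 1, implied by condition breakdown:", exp1_implied, sum(exp1_implied.values()))
print("Experiment 2, expected after exclusion:", exp2_expected_total, "vs analyzed:", EXP2_ANALYZED)
print("Minimum sessions per condition:", sessions_per_condition)
```

Run as written, it shows 45 participants expected versus 48 implied by the group breakdown in experiment one, 40 expected versus 44 analyzed in experiment two, and at least four sessions per condition.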

  11. In examining the appropriateness and quality of the measures used by Erhel and Jamet (2013), some issues are evident. Given the participant number issues noted in both experiments, as well as the scoring of the knowledge questionnaire discussed in question 10, it is uncertain how reliable the data are from which they draw their conclusions. In addition, as noted above, the authors presented the second experiment as testing only a single independent variable (the instructions), but in reality they had layered a second variable, feedback, into the experiment. Given that the results of testing the instructions in experiment one did not support their expectations, it seems unusual that the authors would add this variable without first testing it in an experiment of its own (i.e., does knowledge of correct response (KCR) feedback improve performance regardless of instruction style?). They should therefore have run a prior experiment, before combining KCR feedback with the instructions, that would have allowed them to assess the degree of impact of the feedback alone.
  12. Overall, several critical issues are evident in the data analyses presented by Erhel and Jamet (2013). First is the general lack of information about how the sampling was done. Since the value-added approach relies on either random sampling or control for the non-randomness of the sample, one would have expected the specifics of this process to be discussed, and a finer breakdown of the data by variables such as sex, age, major, and level of schooling would have helped verify that the dataset suits this approach. Second, there is a lack of clarity, and possibly inaccuracy, in the sizes of the populations reported and those that were analyzed. For both experiments, the authors reported the total number of participants, and since pretesting was part of the experiment, these totals presumably describe the sample at the start of the procedures. In both experiments they dropped individuals (due to high pretest scores), but the numbers do not change in any of the phase three scores. This could mean that the authors were using the pretest not as part of the experimental phases but as part of the sampling design, so that these individuals were dropped before the participant count was finalized. However, that is not how it is presented in the procedures of the experiment. In addition, in experiment one the reported number of participants varies from 46 to 48 and back to 46, and the numbers of men and women do not match what was initially reported, a discrepancy that is not resolved by accounting for the participant dropped because of a pretest score. Given these small samples (both under 50) and the unclear numbers of actual participants, even the addition or loss of one or two individuals could change whether the conclusions reached are statistically significant.
  3. Erhel and Jamet (2013) did outline three limitations to their study. The first was their selection of a game with limited interactivity. Since they stated that their research objective was to answer “is deep learning compatible with serious games” (Erhel and Jamet, 2013, p. 165), the choice of game should have been considered at the outset of the research design. The authors did not explain why the ASTRA simulation was selected over other games; such a discussion would have addressed part of this issue. The second limitation they noted was that scores on the in-game quizzes were generally high, so participants received very little feedback. Since their entire second experiment was built around feedback, it is surprising that they claim that, even though feedback was limited, the data show it was a “beneficial factor” (Erhel and Jamet, 2013, p. 165). They should have assessed the impact of feedback alone, independent of the instruction conditions, to understand this result in greater detail. The third limitation they mentioned was that their data were collected asynchronously from game play, so the data reflected participants’ recollection of play rather than the play itself. They mention that they could have collected real-time data but do not explain why they did not. Overall, the authors are correct that these are limitations of the research, but they overlooked larger issues in the study’s research design, experimental procedures, sampling, and population controls (see prior questions). Since many of these call their overall results into doubt, it is not surprising that they are not mentioned, as they would undermine the entire rationale for publishing the study.
  4. Research design and procedural issues aside, Erhel and Jamet (2013) offer some conclusions in their final discussion that seem to leap beyond the actual results of the study. In experiment one, Erhel and Jamet (2013) reported no significant difference between the instruction modes on the paraphrase-type questions, but found that participants in the learning condition “performed significantly better that those in the entertainment instruction condition” on inferential questions (p. 161). By the final discussion, these results are described as coming “out against the entertainment instruction” because it “failed to trigger the central processes for learning” (Erhel and Jamet, 2013, p. 164). However, the results do not indicate this: both modes were effective for learning, just not for the same kinds of questions. Whether these question types actually reflect distinct learning processes is not demonstrated by the experimental design the authors provide. Thus, the authors assumed explanations for the outcomes that have no foundation in the research actually conducted.

In the second experiment, Erhel and Jamet (2013) concluded that the entertainment instruction group “performed better on comprehension (inference) questions than those in the learning group,” that this was caused by their being “less frightened of failure” (p. 164), and that this supports the cognitive load notion that “adding features such as feedback to an educational document can trigger deep cognitive processing that contributes better to learning” as well as extending “findings that adding feedback in DGBL enhances memorization” (p. 164). However, the experiment included no measurement of cognitive load, and cognitive load is discussed only in relation to experiment two, not experiment one. It was therefore premature of the authors to suggest that their results support this theory to the exclusion of others. In addition, the authors connected memorization to the paraphrase questions in the last paragraph of section 3.3 and comprehension to the inferential questions (even though they do not concretely demonstrate that this holds in their study). By that same logic, their second experiment actually indicates that feedback serves those in the learning condition more for memorization than it does those in the entertainment condition.

  1. Since the theoretical basis of this study is not apparent in the introduction, and the only overarching statement the authors make regarding underlying paradigms concerns their use of the value-added approach, it is difficult to discern how the authors relate their findings to a specific overarching theory. The authors do mention several articles that offer explanations for their various findings, but they do not convey that they subscribe to a specific theoretical foundation. Rather, they seem to pick the ideas that best match their results instead of designing their research around a theoretical perspective.
  2. While assessing how digital games connect to learning, and which conditions affect different kinds of learning through digital games, is important within educational technology research, the value of Erhel and Jamet’s (2013) results is lessened significantly by the problems in their research design and numerical controls. The lack of information on sampling methods and population parameters makes it difficult to judge how representative this sample is, and therefore how generalizable the results are. According to ResearchGate, this paper has been cited in over 168 articles, so quite a few studies have drawn on it. This is problematic given the extensive issues in this study’s design and the lack of cohesive research objectives, both of which undermine the quality of Erhel and Jamet’s results. It would be beneficial to examine other studies in this field to determine what additional research addresses the foundations of learning through digital games, and then to identify the critical components of digital games that should be examined in future studies.


Douglas, D. (2017). The value of value-added: Science, technology, and policy in educational evaluation. CUNY Academic Works. Retrieved from

Erhel, S., & Jamet, E. (2013). Digital game-based learning: Impact of instructions and feedback on motivation and learning effectiveness. Computers & Education, 67(C), 156–167.

Vogel, J. J., Vogel, D. S., Cannon-Bowers, J., Bowers, C. A., Muse, K., & Wright, M. (2006). Computer gaming and interactive simulations for learning: a meta-analysis. Journal of Educational Computing Research, 34(3), 229–243.

Promoting Student Engagement in Videos Through Quizzing

Cummins, S., Beresford, A. R., & Rice, A. (2016). Investigating engagement with in-video quiz questions in a programming course. IEEE Transactions on Learning Technologies, 9(1), 57–66.

The use of videos to supplement or replace lectures that were previously given face-to-face is standard in many online courses. However, these videos often encourage passivity on the part of the learner. Other than watching and taking notes, there may be little to challenge the video-watching learner to transform the information into retained knowledge, to self-assess whether they understand the content, or to demonstrate their ability to apply what they have learned to novel situations. Since engagement with videos is often the first step towards learning, Cummins, Beresford, and Rice (2016) tested whether students can become actively engaged with video materials through the use of in-video quizzes. They had two research questions: a) “how do students engage with quiz questions embedded within video content” and b) “what impact do in-video quiz questions have on student behavior” (p. 60).

Using an Interactive Lecture Video Platform (ILVP) that they developed and open-sourced, the researchers collected real-time data on student interactions with 18 videos developed as part of a flipped classroom for programmers. Multiple-choice and text-answer questions were embedded within the videos and graded automatically by the system. Video playback stopped automatically at each question and students were prompted to answer. Correct answers automatically resumed playback, while for incorrect answers students had the option of retrying or moving ahead. The correct responses were discussed in the video immediately after each quiz question when playback resumed. The questions targeted the Remember, Understand, Apply, and Analyse levels of Bloom’s revised taxonomy. In addition to the interaction data, the researchers administered anonymous questionnaires to collect students’ views on the technology and on their own viewing behaviors, and they evaluated student engagement relative to question complexity. Degree of student engagement was measured as the number of students answering the quiz questions relative to the number of students accessing the video.

According to Cummins et al. (2016), students were likely to engage with the videos through the quizzes, but question style, question difficulty, and the number of questions in a video affected the likelihood of engagement. Student behavior also varied in how often and in what ways this engagement took place. Some students viewed videos in their entirety, while others skipped through them to the sections they felt were relevant, and still others combined these approaches. Based on both the observed interactions and the questionnaire responses, the authors suggest that four motivational patterns underlie student engagement with the videos: completionism (completing everything because it exists), challenge-seeking (engaging only with questions they found challenging), feedback (verifying their understanding of the material), and revision (reviewing materials repeatedly). Interestingly, the researchers noted that students’ recollections of their engagement sometimes differed from their recorded behavior; the authors suggest this may indicate that students were answering the questions not within the quiz but in other contexts that the system did not record. Given the evidence that students respond selectively to questions according to their motivations, the authors suggest that a diverse approach to question design within videos will offer something for all learners.

While this study makes no attempt to assess the actual impact on learners’ performance and retention (due to the type of class and its assessment design within the program), it does show that in-video quizzes may offer an effective way to promote student engagement with video-based materials. It is unfortunate that the authors did not build an assessment structure into the research design to collect some measure of learning. However, the platform they used is available to anyone, and other integrated video-quizzing systems (e.g. TechSmith Relay), when combined with keystroke and eye-movement recording technology, could capture similar information. Together these open up the possibility of further testing how in-video quizzing affects student performance and retention.

In terms of further research, one could envision a series of studies using similar processes to examine in-video quizzing in greater depth: not only how it affects engagement, learning, and retention, but also how these outcomes vary with video purpose, length, and context and with the knowledge level of the questions. As Schwartz and Hartman (2007) noted, design variations across video genres may depend on the intended learning outcomes, so it would be intriguing to assess whether this engagement exists only for lecture-based videos or transfers to other genres. As Cummins et al. (2016) explain, students “engaged less with the Understand questions in favour of other questions” (p. 62), which suggests that students actively selected what they engaged with based on what they felt was most useful to them. Thus, further investigation into how to design more engaging, learner-centered questions could benefit knowledge retention. In addition, since the videos replaced lecture sessions and ranged in length from 5 minutes 59 seconds to 29 minutes 6 seconds, understanding how length affects engagement would help determine whether there is a point at which student motivation, and thus learning, wavers. While the authors do identify where drop-offs in engagement occurred at specific questions, they do not break down engagement by video length, and they acknowledge that the number of questions varied between videos (three had no questions at all) and was unrelated to video length. Knowing more about the connections between in-video quizzing and student learning, as well as the variables that affect this process, could help us better assess the overall impact of in-video quizzing and optimize in-video quizzes to promote student engagement, performance, and retention.

Schwartz, D. L., & Hartman, K. (2007). It is not television anymore: Designing digital video for learning and assessment. In R. Goldman, R. Pea, B. Barron, & S. J. Derry (Eds.), Video research in the learning sciences (pp. 349–366). Mahwah, NJ: Lawrence Erlbaum Associates.

Video Podcasts and Education

Kay, R. H. (2012). Exploring the use of video podcasts in education: A comprehensive review of the literature. Computers in Human Behavior, 28, 820–831.

While the use of podcasts in education is growing, the literature supporting their effectiveness for learning is far from conclusive. Kay (2012) offers an overview of the literature on the use of podcasts in education a) to understand the ways in which podcasts have been used, b) to identify the overall benefits and challenges of using video podcasts, and c) to outline areas of research design that could enhance evaluations of their effectiveness for learning. Using keywords such as “podcasts, vodcasts, video podcasts, video streaming, webcasts, and online videos” (p. 822), Kay searched for articles published in peer-reviewed journals. Through this search she identified 53 studies published between 2009 and 2011 to analyze. Since the vast majority of these focused on undergraduates in specific fields, Kay presents this as a review of “the attitudes, behaviors and learning outcomes of undergraduate students studying science, technology, arts and health” (p. 823). Within this context, Kay (2012) shows that there is considerable diversity in how podcasts are used and how they are structured and tied to learning. She notes that podcasts generally fall into four categories (lecture-based, enhanced, supplementary, and worked examples), vary in length and segmentation, are designed for differing pedagogical approaches (passive viewing, problem solving, and applied production), and have differing levels of focus (from narrow, specific skills to broader, higher-order cognitive concepts). Because of the variability in research design, purpose, and analysis methods, Kay (2012) approached the review not as a meta-analysis but as a broad comparison of the benefits and challenges of using video podcasts.

In comparing the benefits and challenges, Kay (2012) reports that while most studies show substantial benefits, some are less conclusive. In examining the benefits, Kay finds that students in these studies accessed podcasts mainly in the evenings and on weekends, mostly on home computers rather than mobile devices (though this varies by type of video), that they used different viewing styles, and that their access was tied to a desire to improve their knowledge (often ahead of an exam or class). This suggests that students value the flexibility and freedom podcasts afford them to learn anywhere and in ways that suit their learning patterns. Student attitudes toward podcasts are positive in many of the studies. However, some studies showed that students preferred lectures to podcasts, which limited their desire to access the podcasts. Many studies noted that students felt podcasts gave them a sense of control over their learning, motivated them to learn through relevance and attention, and helped them improve their understanding and performance. With regard to performance, some studies showed improvement in test scores over traditional approaches, while others showed no improvement. In addition, while some studies reported that educators and students believed podcasts helped develop specific skills, such as team building, technology use, and teaching skills, the processes by which this occurs were not described. Finally, some studies indicate that technical problems and a lack of awareness can make podcasts inaccessible to some students, and several studies showed that students who regularly accessed podcasts attended class less often.

In reflecting on these diverse outcomes, Kay argues that the conflicting evidence on benefits and challenges is connected to research design. Kay (2012) argues that issues of podcast description, sample selection and description, and data collection need to be addressed “in order to establish the reliability and validity of results, compare and contrast results from different studies, and address some of the more difficult questions such as under what conditions and with whom are video podcasts most effective” (p. 826). She contends that better describing the variation in the length, structure, and purpose of podcasts would help researchers differentiate and compare study data. Furthermore, Kay calls for more diverse populations (e.g. K-12) and better demographic descriptions of samples so that findings can be compared across contexts. Finally, she argues that a general lack of scrutiny of quantitative measures and generally poor descriptions of qualitative data analysis techniques undermine the data being collected: “It is difficult to have confidence in the results reported, if the measures used are not reliable and valid or the process of qualitative data analysis and evaluation is not well articulated” (p. 827). On the basis of these three issues, Kay recommends greater depth in the design, description, and data collection of video podcasting research.

While the literature review offers a general overview of the patterns the author observed in the studies collected, questions remain about the data collection process, as the author does not make clear a) why three prior literature reviews were included in the analysis and b) whether the patterns she discusses come only from the papers with undergraduate populations (as her statement quoted above implies) or from all of the studies she collected. The author also used only articles published in peer-reviewed journals and included no conference papers. It is unclear how the findings would have differed had these other sources been included.

Overall, the most critical information she provides is that there is no unifying research design underlying the studies on video podcasts. This results in a diverse set of studies without consensus on the effective use of podcasts in education and with little practical guidance on how to implement video podcasts effectively. The importance of research design in creating a comparable body of data cannot be overstated and should be considered in all good educational technology research. Unfortunately, while Kay identifies issues in how the studies she examined coded, collected, and analyzed data, she gives little attention to underlying research design issues when discussing areas for further research. This does not diminish the issues she does raise for future research, but the need for better research design is evident and Kay offers few specifics on it. One would have liked a more specific vision from her, since greater consideration of the underlying research design issues, including how video podcasts are described and categorized, what sampling strategies are used, and how methods of both qualitative and quantitative analysis are developed, is needed.


Designing Effective Qualitative Research

Hoepfl, M. C. (1997). Choosing qualitative research: A primer for technology education researchers. Journal of Technology Education, 9, 47–63.

According to Hoepfl (1997), research in technology education has largely relied on quantitative methods, possibly because of the field’s own limited knowledge of and skill in qualitative research design. To encourage more qualitatively designed research, Hoepfl offers a “primer” on the purpose, processes, and practice of qualitative research. Presenting qualitative research as expanding knowledge beyond what quantitative research can achieve, Hoepfl (1997) sees it as having three critical purposes. First, it can help us understand issues about which little is known. Second, it can offer new insight into what we already know. Third, it can convey the depth of data more easily than quantitative research can. In addition, since qualitative data are often presented in ways similar to how people experience their world, he suggests that they resonate more with readers. With regard to the processes of qualitative research, Hoepfl (1997) notes that, by its nature, qualitative research design requires different considerations, since the “particular design of a qualitative study depends on the purpose of the inquiry, what information will be most useful, and what information will have the most credibility” (p. 50). This leads to flexibility, rather than finality, of research strategy before data collection, and to less emphasis on random sampling and sample size as the sole sources of confidence in the data. This flexibility means that a great deal of thought must go into how best to situate data collection, recognizing that work in the field may require design adjustments if some questions fail or new patterns emerge. In terms of strategies, the author presents purposeful sampling options and discusses how maximum variation sampling can yield both depth of description and sensitivity to emergent patterns.
He also outlines the various forms of data available in qualitative research and the stages of data analysis. In doing so, Hoepfl (1997) recognizes that qualitative data are much more difficult to collect and analyze than quantitative data, and that research often requires numerous cycles through the stages of collection and analysis. Importantly, he addresses how researchers and reviewers should consider authority and trustworthiness in qualitative research by examining issues of credibility, transferability, dependability, and confirmability.

Hoepfl’s work offers a solid starting point for understanding the strengths and struggles of qualitative research. He correctly argues that greater acceptance of qualitative research within technology education rests on researchers’ ability to address questions of authority and trustworthiness, which are more readily (albeit possibly erroneously) accepted in quantitative research. However, he gives almost no treatment to other aspects inherent in qualitative research. These include how relationships between subjects and researcher are built and defined, and the impact these relationships can have on subject behavior. Hoepfl (1997) mentions these relationships and the risk of altering participant behavior, noting that this is something “the researcher must be aware of, and work to minimize” (p. 53), but he offers no process either for recognizing when this occurs in the data or for actually minimizing it. When it comes to the ethics of interacting with human subjects, Hoepfl (1997) notes that “the researcher must consider the legal and ethical responsibilities associated with naturalistic observation” (p. 53), yet earlier suggests that limiting participants’ knowledge of the researcher’s identity and purpose, or even concealing them, may be appropriate. This is a problematic statement given informed consent guidelines, and it highlights a key omission in this primer: how to consider human subjects research ethics within qualitative research design. Since Hoepfl is offering a general guide to qualitative research, and since IRBs and the primary guidelines for informed consent were established in 1974 by the National Research Act, one would have expected at least some consideration of those guidelines, a mention of informed consent, or at least a discussion of how to handle the sensitive data that qualitative data collection may produce.

In reflecting on the applicability of Hoepfl’s work to my research interests, I found his emphasis on what qualitative research can bring to the educational technology table enlightening, as I had not recognized how new this approach was to education; it was something of a staple in my anthropological training. Of particular interest was Hoepfl’s discussion of maximum variation sampling. He cites Patton, saying:

“The maximum variation sampling strategy turns that apparent weakness into a strength by applying the following logic: Any common patterns that emerge from great variation are of particular interest and value in capturing the core experiences and central, shared aspects or impacts of a program” (Hoepfl, 1997, p. 52)

This statement and his discussion of trustworthiness connect to a recent article I read on generalizing in educational research by Ercikan and Roth (2014). In particular, the authors discuss the reliance on quantitative research for its supposed generalizability, but then break down this assumption to argue that qualitative research actually has broader applicability since, if properly designed, it can produce essentialist generalizations. These are:

“the result of a systematic interrogation of ‘the particular case by constituting it as a “particular instance of the possible”… in order to extract general or invariant properties’…. In this approach, every case is taken as expressing the underlying law or laws; the approach intends to identify invariants in phenomena that, on the surface, look like they have little or nothing in common” (p. 10).

Thus, by looking at the “central, shared aspects” that Hoepfl associates with maximum variation sampling and discerning the essential aspects that underlie these patterns, qualitative research could “identify the work and processes that produce phenomena.” Once this is established, the generalization can be tested against any other case study. The authors further argue that if population heterogeneity is also considered in the design of qualitative data collection, the ability to generalize from the data is potentially greater with qualitative research.

Additional References

Ercikan, K., & Roth, W.-M. (2014). Limits of generalizing in education research: Why criteria for research generalization should include population heterogeneity and uses of knowledge claims. Teachers College Record, 116(5), 1–28.