CRR#2: Using Peer Feedback to Enhance the Quality of Student Online Postings: An Exploratory Study

As part of the educational technology program, the following is offered as a critical reflection to specific questions on the following article:

Ertmer, P., Richardson, J., Belland, B., Camin, D., Connolly, P., Coulthard, G., Lei, K., and Mong, C. (2007). Using Peer Feedback to Enhance the Quality of Student Online Postings: An Exploratory Study. Journal of Computer-Mediated Communication, 12(2), 412-433.

1. Identify the clarity with which this article states a specific problem to be explored.

According to Maxwell (2005), a research problem “identifies something that is going on in the world, something in itself that is problematic or that has consequences that are problematic” (p. 40). In the article “Using Peer Feedback to Enhance the Quality of Student Online Postings: An Exploratory Study,” Ertmer et al. (2007) explained that this exploratory study was created to address the research gap in understanding how feedback impacts learning, particularly how peer feedback construction impacts higher-level thinking and learning. How problematic this gap was to the authors was presented in the introduction. The authors noted that online discussions are critical to “collaborative meaning-making” and that, to be effective, discussions need to “progress to include both reflection and critical thinking” (Ertmer et al., 2007, p. 413). But, as they noted by citing Black (2005), there is little evidence that these interactions develop much beyond the basics of “sharing and comparing information” (Ertmer et al., 2007, p. 413). To help move online discussions into deeper levels of thinking, the authors proposed that peer feedback “specifically related to the quality of their postings” can assist students in developing their learning towards these deeper levels (Ertmer et al., 2007, p. 413).

In reflecting on the clarity with which they proposed this research problem, the most specific statement of the problem appeared within the Purpose of Study section, but their initial idea and reasoning were presented within the introduction and supported throughout the literature review. There was a clear flow between these sections as well as to the content of the abstract. If anything were to be improved, it would only be a minor adjustment to the last paragraph of the introduction, wherein the authors could conclude with a firmer research problem statement that mirrors the one in their Purpose of Study. Such a statement could frame the literature review around the foundational principles underlying the study. Such changes, however, are not strictly required, as their research problem and reasoning were presented with well-developed clarity and whetted the audience’s appetite for further reading, a task specific to the introduction of any paper.

2. Comment on the need for this study and its educational significance as it relates to this problem.

The role of feedback is to increase student motivation and performance by connecting students to their learning in meaningful and cognitively significant ways. This feedback can come from a variety of sources, some of which may be more effective than others in meeting both student and faculty expectations. This, Ertmer et al. (2007) noted, is particularly vital for online students, whose expectations about feedback can often impact retention. The authors advised that meeting students’ expectations regarding feedback requires “a significant amount of time and effort” on the part of the instructor (Ertmer et al., 2007, p. 414). While they offer no research regarding the impact of feedback on instructor workload, from this reviewer’s perspective (and that of her online colleagues), providing personalized, timely, and constructive feedback in online asynchronous discussions is time-consuming: it often requires the instructor to be online continuously, be active in the conversations, and offer detailed feedback for further improvement. In proposing the use of peer feedback to address this workload issue, Ertmer et al. (2007) referenced four studies which suggest that students benefit from both giving and receiving peer feedback within traditional classroom settings, but noted that these benefits have yet to be fully determined within the online environment. With the increasing number of online courses and programs being offered, knowing what works in online education is a particularly salient issue even eleven years after the publication of this paper. One could surmise that at the time this was written, online discussions and the technology supporting them were relatively new within online education. Therefore, assessing how effective peer feedback is at promoting higher-order thinking would have been just as educationally significant then.

3. Comment on whether the problem is “researchable.” That is, can it be investigated through the collection and analysis of data?

The point of this exploratory study was to examine “student perceptions of the value of giving and receiving peer feedback” with a goal of determining whether this feedback impacted the quality of the discussion postings (Ertmer et al., 2007, p. 416). To address this, Ertmer et al. (2007) framed three specific research questions that were investigated through the collection and analysis of data. Their first research question asked what the impact of peer feedback would be on the quality of online conversations. This is testable by establishing a research design that allows for the collection of data on the quality of student postings over time when students are given peer feedback. There could potentially be several ways to measure quality, as well as several ways to create systems of peer feedback, and there would be a need to address other variables, such as timeliness and format, which could impact this process.

In their second research question, Ertmer et al. (2007) asked how students perceive “the value of receiving peer feedback” and how this compares to their perceptions of instructor feedback (Ertmer et al., 2007, p. 416). The third research question considers the “students’ perceptions on the value of giving peer feedback” (Ertmer et al., 2007, p. 416). These questions would be testable by establishing a research design that collects students’ pre-feedback perceptions of the value of both peer and instructor feedback. This could be done using surveys and/or interviews. Then, after receiving both peer and instructor feedback, students could reflect on their experiences with these two forms of feedback through surveys and/or interviews. Since the ordering, timeliness, and quality of feedback, as well as overall motivation and past experiences, may impact these perceptions, this design would need to address these variables.

4. Critique the authors’ conceptual framework.

A conceptual framework, as proposed by Maxwell (2005), is the basic model of what a researcher plans to study “and of what is going on with these things and why” so as to create the foundation of a tentative theory which can “inform the rest of your design” (p. 39). Primarily within their introduction and literature review, Ertmer et al. (2007) outlined a conceptual framework that ties discussion, higher-order thinking, feedback, and perceptions together and examines them within an exploratory case study framework.

The authors started from the vantage point that there is a consensus among faculty and students that student discussions are “where the real learning take place” (p. 412). Citing Black (2005) and Lang (2005), Ertmer et al. (2007) shared that discussions create learning opportunities as they engage the student in a “dialogical process that leads to increasingly sound, well grounded, and valid understandings of a topic or issue” and “have the potential to motivate student inquiry and to create a learning context in which collaborative meaning-making occurs” (Ertmer et al., 2007, p. 413). Even so, they returned to Black (2005) in reflecting that there is little evidence that “the critical level of learning desired” is a natural outcome of student discussions (Ertmer et al., 2007, p. 413).

In suggesting that more is needed to promote higher-order thinking, the authors looked to feedback as a means of providing this stimulus. Citing Higgins et al. (2002), Ertmer et al. (2007) offered that “feedback that is meaningful, of high quality, and timely helps students become cognitively engaged in the content under study as well as in the learning environment in which they are studying” (p. 413). The authors commented that feedback is critical within the online environment. Referencing Ko and Rossen (2001), the authors noted that “students in online courses are more likely to disconnect from material or environment than students in face-to-face courses” when there is a lack of feedback (Ertmer et al., 2007, p. 414). Furthermore, Ertmer et al. (2007) indicated that student perceptions of feedback are significant, as Schwartz and White’s research indicated that “students expect feedback to be 1) prompt, timely and thorough; 2) ongoing formative (about online discussions) and summative (about grades); 3) constructive, supportive and substantive; 4) specific, objective and individual; and 5) consistent” (Ertmer et al., 2007, p. 414). However, in looking to Dunlap (2005), the authors surmised that providing this level of feedback places a problematic burden on the instructor’s workload.

Consequently, the authors offered peer feedback as an alternative since, as Corgan et al. (2004) noted, peer feedback “offers a number of distinct advantages including the timeliness of feedback, providing new learning opportunities for both givers and receivers of feedback, humanizing the environment and building community” (Ertmer et al., 2007, pp. 414-415). However, the use of peer feedback is not without issue. In citing Palloff and Pratt, the authors shared that “the ability to give meaningful feedback which helps others think about the work they have produced is not a naturally acquired skill” (Ertmer et al., 2007, p. 415). This, coupled with “overcoming anxiety about giving and receiving feedback…ensuring the reliability of the feedback” and addressing how the online environment affects communication, means that there is no guarantee that peer feedback will help develop higher-level thinking (Ertmer et al., 2007, p. 415).

To this end, they proposed testing the connection between peer feedback, higher-order thinking (as demonstrated by the quality of discourse), and perceptions about feedback with this exploratory case study. The authors explained why a case study was an appropriate avenue for inquiry, but not until their methods section. According to Yin (2012), “case studies are the preferred strategy when ‘how’ or ‘why’ questions are being posed, when the investigator has little control over events, and when the focus is on a contemporary phenomenon within some real-life context” (p. 1).

Overall, the authors offered a clear and easily read account of how they were constructing their research, and for the most part were consistent in connecting the various aspects of their reasoning. There were a few issues this reviewer noted that could use additional clarification. Within the introduction, Ertmer et al. (2007) indicated that discussions are not enough to get to the “critical level of learning desired” (p. 413), but they are rather vague as to what exactly that level of learning is and why achieving higher-order thinking is important within the online environment. In addition, the authors are not clear enough in explaining how this is a problem inherent to discussions rather than one reflective of online learning in general. At this point within their article, the authors inserted a separate paragraph on how the “use of discussions in online environments is supported by the socio-cognitive perspective” and referenced Vygotsky’s Zone of Proximal Development (Ertmer et al., 2007, p. 413). This separation is a bit confusing, as the prior paragraph also discusses the importance of discussions, whether online or face-to-face, for learning. Giving the theory its own paragraph initially led this reviewer to expect that it would be used as part of the conceptual framework, since it could explain why peer feedback (and scaffolding it) may be effective in promoting higher-order thinking. However, after this section, the theory is not alluded to in any way. This reviewer took this to mean that the intention was only to use the theory as additional research support and that it was not a significant part of the conceptual framework. Given this, incorporating these ideas within the prior paragraph may be warranted.

5. How effectively does the author tie the study to relevant theory and prior research? Are all cited references relevant to the problem under investigation?

Within their article, Ertmer et al. (2007) cited numerous prior research studies in support of their conceptual framework, and there is a relatively good connection between these sources and the points the authors are making. They provide references to studies on why discussions are an important part of online learning, what role feedback plays within instruction, what makes good feedback, and what expectations students and faculty have about online feedback. However, most sentences are supported by only a single citation, which is far fewer than this reviewer has seen in other papers. This may indicate a clear and directed focus by the authors on the most relevant research (as some authors superfluously cite references to convey scholarly aptitude) or a lack of available studies, as the authors mention later in their purpose of study.

In addition, there are some gaps in their research at points. For example, there is no citation for their comment that “lack of feedback is most often cited as the reason for withdrawing from online courses,” when one would expect a citation to support that statement since it is different in scope from the prior sentence mentioning student disconnectedness due to lack of feedback (Ertmer et al., 2007, p. 414). Second, most of their commentary centers on the critical components of feedback in general, and little research is offered specifically on feedback that occurs online or within discussions, which is the central focus of their research study. This is likely due to the overall lack of research available in these areas, as the authors note in their purpose of study. However, a note to this effect within the earlier section would clarify this. Thirdly, while the authors discuss the potential student benefits of giving and receiving peer feedback, the issues students may have with giving feedback, and student expectations of feedback, little is discussed regarding how students perceive giving and receiving peer feedback, even though this is a focus of two of the research questions they present.

In examining relevant theories, Ertmer et al. (2007) specifically reference only Vygotsky’s Zone of Proximal Development but do nothing with this theory throughout the rest of the article. In this case, the authors seem to use it more as additional support for why discussions can be important avenues for learning rather than as a specific theory underpinning their research design. As such, its usage seems a bit disingenuous, as they have several references which already address this idea.

6. Does the literature review conclude with a brief summary of the literature and its implications for the problem investigated?

Unlike other articles this reviewer has encountered, Ertmer et al. (2007) structured their article such that the literature review is not explicitly presented as one. Following their introduction, the authors presented sections outlining the role of feedback in instruction, the role of feedback in online environments, the advantages of using peer feedback, and the challenges of using peer feedback. Within each of these sections they presented literature to support their ideas and connect the reasoning behind their conceptual framework. Thus, it is left to the reader to surmise that these sections were intended as the literature review. A simple header of “Literature Review” prior to the first section could make this clearer. As for a brief summary of the literature, there is none where one would expect it to be, between these sections and the purpose of study. Rather, within each section the authors summarized the main points of that area in the final paragraph or few sentences. Including a summary that pulls the salient ideas together from the prior sections and transitions into the purpose of study would make this clearer for the reader.

7. Evaluate the clarity and appropriateness of the research questions or hypotheses.

In this exploratory study, Ertmer et al. (2007) offered three research questions within their purpose of study that were appropriately related to the research problem they stated. These were specifically constructed to fill the research gap by assessing the “impact of using peer feedback to shape the quality of postings” and to examine “student perceptions on the value of giving and receiving peer feedback regarding the quality of discussion postings” (Ertmer et al., 2007, p. 416). While for the most part these research questions are very clearly written, the wording of the second part of research question one is problematic. The first part of research question one asks about the “impact of peer feedback on the quality of student postings in an online environment” (Ertmer et al., 2007, p. 416). This is very clear and measurable. The authors then asked whether “the quality of discourse/learning can be maintained and/or increased through the use of peer feedback” (Ertmer et al., 2007, p. 416). This is confusing as written since it conflates the quality of discourse and the quality of learning into a single entity. As these are two different variables, they may be affected differently by peer feedback, and thus combining them within a single question may not yield clear answers.

8. Critique the appropriateness and adequacy of the study’s design in relation to the research questions or hypotheses.

Ertmer et al. (2007) approached their research from a case study framework and collected qualitative and quantitative data from student discussion postings, surveys, and interviews in order to build their dataset. A case study is a form of “empirical inquiry about a contemporary phenomenon (e.g., a ‘case’), set within its real-world context – especially when the boundaries between phenomenon and context are not clearly evident” (Yin, 2009, p. 18). According to Yin (2012), a case study approach would be appropriate when the researcher is asking descriptive or explanatory questions, is collecting data in a natural setting, and/or is concerned with an evaluative process that is occurring. Since the research focused on “describing the process of giving and receiving peer feedback within an online course” (Ertmer et al., 2007, p. 416), posed several descriptive research questions, and collected data within the “natural” setting of the discussions that were occurring within the class, the use of the case study framework is appropriate.

By using a combination of qualitative and quantitative data, the authors may have been attempting to balance the strengths and weaknesses of each form of data. As Ertmer et al. (2007) commented within their study, “limited research has been conducted that examines the role or impact of feedback in online environments in which learners construct their own knowledge, based on prior experiences and peer interactions” (p. 416). As Hoepfl (1997) noted, qualitative research “can be used to better understand any phenomenon about which little is yet known” and is “appropriate in situations where one needs to first identify the variables that might later be tested quantitatively” (pp. 48-49). Therefore, the focus on qualitatively derived data collected from surveys and interviews is an appropriate choice, since little prior work has established student perceptions of peer feedback and how this would impact quality of work within the online context, and their research questions (RQ2 and RQ3) are specific to this. Ertmer et al. (2007) also chose to collect quantitative data by statistically analyzing student posting quality (based on scoring) after receiving peer feedback. This is an appropriate methodology since their first research question, designed to determine whether there is a relationship between peer feedback and posting quality, specifically asks whether change occurred.

9. Critique the adequacy of the study’s sampling methods (e.g., choice of participants) and their implications for generalizability.

Based on the descriptions of their context and procedures within this case study, it appears that rather than randomly sampling from several classes, Ertmer et al. (2007) utilized purposeful sampling to focus intensively on a small group of fifteen students (10 females and 5 males) who were enrolled in a single course. Hoepfl (1997) noted that “purposeful sampling is the dominant strategy in qualitative research method” as it seeks “information-rich cases which can be studied in depth” (p. 51). However, there is little to indicate whether these fifteen comprised the whole class or why this class was specifically selected for the case study. Such information is needed to ascertain whether the authors were selecting out of convenience which, as Patton (1990) noted, “saves time, money, and effort” but has the “poorest rationale; lowest credibility” and yields “information-poor cases” relative to other purposeful sampling strategies (p. 183).

While small sample sizes are problematic for quantitative studies, Patton (1990) remarked that,

“There are no rules for sample size in qualitative inquiry. Sample size depends on what you want to know, the purpose of the inquiry, what’s at stake, what will be useful, what will have credibility, and what can be done with available time and resources” (p. 184).

However, qualitative researchers must still be aware of the potential for sampling error when using purposeful sampling. As Hoepfl (1997) noted, sampling errors may be introduced into purposeful sampling when there is insufficient breadth in the sampling, when there are “distortions introduced by changes over time,” or when there is a lack of depth in data collection within each case (p. 52). In the Ertmer et al. (2007) study population, the 15 participants were drawn from only a single class during a single term. Within this group, 12 were either educational administrators or educators and 14 were pursuing advanced degrees. Given the short period (one term), there is likely little distortion due to changes over time, and the fact that they collected multiple forms of data from each participant means there was depth to their data. Perhaps the greatest weakness in the study sample lies in the potential lack of breadth due to the common backgrounds within this small group (educational administrators or teachers pursuing advanced degrees). Patton (1990) remarked that purposeful samples should “be judged on the basis of the purpose and rationale of each study and the sampling strategy used to achieve the study’s purpose” (p. 185). Applying Patton’s measure, one could suggest that because Ertmer et al. (2007) used Bloom’s taxonomy, the sample selection was justified, since these participants “were familiar with Bloom’s taxonomy or assessing levels of questioning and determining instances of critical thinking” (p. 417). This indicates that the lack of breadth is not a source of sampling error; however, it does pose an issue for generalizability.

In commenting on their study limitations, Ertmer et al. (2007) remarked that sample size did “limit the results of this study” (p. 428). This comment may be reflective of this issue of sample breadth. Ercikan and Roth (2014) noted that as long as qualitative studies “take into account the contextual particulars relevant to the manifestation of the generalization” (p. 17), they can offer aspects of generalizability. The relative homogeneity of the population within this study suggests that the results may not be as applicable to groups which lack a similar occupational and educational composition, but Ertmer et al. (2007) are candid in describing the population parameters and recognize this in their section on study limitations.

10. Critique the adequacy of the study’s procedures and materials (e.g., interventions, interview protocols, data collection procedures).

Ertmer et al. (2007) elected to collect data through scored ratings of students’ postings, participant interviews, and participant surveys. In order to collect the data, the researchers utilized a group of 7 graduate students and 2 faculty members. This team collaboratively created the data collection instruments and “each team member took primary responsibility for collecting and analyzing the data from a subgroup of two participants” (Ertmer et al., 2007, pp. 416-417). They also indicated that “each member of the research team interviewed two participants via telephone or in person” (Ertmer et al., 2007, p. 420). However, as there were only 15 participants and 9 members of the research team, some clarification is needed as to how this worked procedurally. To address how data would be analyzed, the researchers established several well-designed protocols to address observer bias in scoring discussion postings and in coding interview responses.

Within this study, the feedback students viewed was given on each discussion posting as both a score and descriptive comments. For the first five weeks of the course, the two instructors of the course provided the feedback to the students. It was unclear whether they were also members of the research team. Beginning in week seven and for the next six weeks (ending week 13), the students provided peer feedback to two classmates, with peer review assignments being rotated on a weekly basis. At some point within this peer feedback period, interviews were conducted; however, it is unclear specifically when these started and ended, as Ertmer et al. (2007) only noted in the data analysis that the interviews were conducted “several weeks after the peer feedback process had started” (p. 420). Further clarification of this timing would be beneficial for understanding the study timeline. Three weeks after this peer feedback period had ended (week 16), the students completed a post-survey on their final perceptions of both the instructor and peer feedback they received. It is unclear why there is a three-week gap between when the peer feedback period ended and when the surveys were administered. This could have impacted the survey data, as students were recollecting their perceptions rather than providing them in the moment.

For a scoring rubric, the researchers used Bloom’s taxonomy to create a 0-2 point scale for students and researchers to use in evaluating posting quality. The selection of Bloom’s taxonomy was appropriate, as it is one with which their education students should have some familiarity, but this is a very narrow scale considering the number of levels within Bloom’s taxonomy and the researchers’ desire to measure a change in quality. The study would be improved with a larger scale that is more reflective of the actual structure of the taxonomy and better able to measure quality change over time. In addition, while the instructors modeled scoring feedback through the rubric for the first five weeks, there was very little evaluation of the students’ ability to effectively evaluate postings based on the rubric prior to its implementation. Ertmer et al. (2007) mentioned that students were provided examples of possible responses and explanations for these, but there was no demonstration within the study that the participants could effectively apply the rubric in giving peer feedback. Incorporating a scaffolded approach to peer feedback wherein, after modeling the rubric use, the instructor offered feedback to individual students on their use of the form in giving peer feedback would have established a better foundation for students to use it effectively. An overall lack of training could be one of the reasons why participants within this study preferred instructor feedback to peer feedback. The authors seemed to be aware of this after the fact and discussed it as part of the limitations of their study.

Since the scorings were used for grading, the procedures of the study required that all peer feedback be passed through and reviewed by the instructor before being sent on. This was a thoughtful step designed to address issues of anonymity and any problems that might arise. However, this likely impacted the study’s outcome. Students within the class saw the scores and comments from their instructors very soon after their submissions, so these were useful in shaping how they responded on a subsequent discussion. Peer feedback, however, was moderated by the instructor, which resulted in a delay of up to a couple of weeks before it reached the student. This meant students would potentially have no recent peer feedback to use for improvement on subsequent boards. This is likely one of the larger issues in this study’s design and may have impacted not only student performance on postings but also students’ perceptions of peer versus instructor feedback.

To determine the change in quality of student postings when given peer feedback, the researchers did not rely on the actual instructor and peer feedback scores that were given but rather scored all postings themselves using the same rubric the students and instructors used. This was done to “provide a better indication of the changing quality of the responses,” to “ensure consistency in scoring student’s online postings,” and to address the incompleteness of the student dataset due to the design of the class (Ertmer et al., 2007, p. 418). While these are all valid reasons, some analysis of the actual peer feedback scores would have been helpful to support the need for a measurement of quality different from the one students received and based their posting improvement on during the class.

11. Critique the appropriateness and quality (e.g., reliability, validity) of the measures used.

As Drost (2011) indicated, “reliability is the extent to which measurements are repeatable” (p. 106). The research team’s protocols for addressing interobserver biases in scoring student postings, as well as for standardizing the interview protocol and the coding of interview data, were evidence of their effort to ensure the reliability of their measurements. According to Drost (2011), “validity is concerned with the meaningfulness of research components” and “whether they are measuring what they intended to measure” (p. 114). One way to address validity is to use measurement tools which have been validated by prior studies. As Ertmer et al. (2007) noted, the use of Bloom’s taxonomy “provided a relatively high degree of face validity” as it was familiar to the participants and researchers (p. 421) and “had been successfully implemented by the researchers in a similar graduate course” (p. 417). Validity was also addressed by the authors’ triangulation between sources of data, such as the survey results and the individual interviews.

12. skipped per faculty instructions

13. Critique the author’s discussion of the methodological and/or conceptual limitations of the results.

Ertmer et al. (2007) properly acknowledged several issues within their study that were linked to their methodology. Some issues were addressed within the limitations and suggestions for further work area. These included the small sample, the short duration of the study, and the evaluation scale. However, Ertmer et al. (2007) noted several specific issues which likely impacted the study outcomes only within the analysis discussion section. These included:

  • Use of discussion questions that were not of the caliber to be conducive to “higher-level responses” (p. 426)
  • Time delay in receiving peer feedback due to faculty moderation and review
  • Inclusion of general interpersonal and motivational postings in the analysis even though they were not likely to ever reach the upper levels of the scoring taxonomy.

These highlight three critical issues within the study design that, as the authors rightly noted, likely had an impact on the results. First, given that this study was designed to evaluate quality changes in postings when given peer feedback, it is concerning that there was no significant effort to evaluate the questions prior to the study start to confirm they would elicit the desired level of student response. A follow-up analysis to see if the results vary when question design is directed toward higher-order thinking would be useful. Second, since their own literature review stressed the importance of timeliness in student perceptions of feedback, it was concerning that they selected a class to study wherein peer feedback was intentionally delayed. A follow-up analysis to see if students who received timelier peer feedback perceived it differently than those who did not would also be useful. Finally, the inclusion of motivational and interpersonal postings likely affected their dataset since they were using average scores between the two feedback periods. The authors indicated that they did not remove these from the dataset because they could not determine post hoc which ones the students intended to be counted, and because removing them would have left only 160 postings to analyze rather than 778, which they felt “would limit our ability to measure change in posting quality” (Ertmer et al., 2007, p. 426). The first reasoning is a non-issue. Since the authors were not using the actual scores the students were providing to one another but their own scoring, the intention of the student is irrelevant. Since these non-content postings accounted for 79% of the total volume of student postings scored to evaluate for quality change, their inclusion likely affected the data, as the authors themselves acknowledged that these postings were unlikely to score high. 
Given that they knew the actual number of these non-content postings (618), the authors could have considered one analysis with these included and one with these excluded to see if this impacts the quality change observed. It is worth noting that these three issues are addressed only within the analysis discussion section and not in the later limitations and suggested future work, even though at several points the authors indicate a need to address these “in the future” (Ertmer et al., 2007, p. 426). Therefore, some reiteration of these issues within that later section – perhaps as a commentary on considerations for future study design – is warranted.
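
The proportions cited above can be checked directly from the two counts reported in the article (778 postings scored in total, 160 remaining if non-content postings were removed). The short Python sketch below is only an arithmetic check; the variable names are illustrative and are not drawn from the study.

```python
# Counts reported by Ertmer et al. (2007, p. 426), as cited in this review.
total_postings = 778      # all postings scored by the researchers
content_postings = 160    # postings left if non-content ones were removed

# Non-content (interpersonal and motivational) postings are the remainder.
non_content_postings = total_postings - content_postings

# Share of the scored dataset made up of non-content postings.
non_content_share = non_content_postings / total_postings

print(non_content_postings)          # 618
print(f"{non_content_share:.1%}")    # 79.4%
```

Roughly four out of every five scored postings were therefore of a type the authors themselves did not expect to reach the upper levels of the taxonomy, which is why an analysis with and without them would have been informative.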

14. How consistent and comprehensive are the author’s conclusions with the reported results?

Overall, Ertmer et al. (2007) offer a clear, consistent and comprehensive set of conclusions that are well supported by the data. In addressing RQ1, the authors concluded that there was no quantitative change in the quality of postings during the peer feedback period (it neither increased nor decreased). While the authors appropriately addressed several reasons why this may be, the results suggested to Ertmer et al. (2007) that once a level has been reached, peer feedback “may be effective in maintaining quality of postings” (p. 422) and that “peer feedback is a viable alternative to instructor feedback” since there was no negative impact (p. 428). In evaluating RQ2/RQ3, the authors found that student perceptions of the importance of feedback rose over the term and that there was perceived value in both giving and receiving peer feedback, based on survey and interview data. This is consistent with the results of the study. In specifically comparing perceptions of peer and instructor feedback in RQ2, students perceived more value from instructor feedback than peer feedback. This was counter to what the authors thought would occur. While the authors reflected on several factors which could be impacting this, they also acknowledged that it is consistent with what other studies had indicated.

15. How well did the author relate the results to the study’s theoretical base?

In reflecting on their results, Ertmer et al. (2007) connected and compared their results to the Ertmer and Stepich (2004) study. Overall, the authors found their results ran counter to what was seen in the prior study. In analyzing their results, Ertmer et al. (2007) returned to several of the studies which formed the foundation of their original literature discussion at the start of the article, including Black (2005), Ko and Rossen (2001), Palloff and Pratt (1999) and Topping (1998). This offered a well-developed connection between their theoretical basis and their results and demonstrated their interest in the continued development of this already existing body of knowledge.

16. In your view, what is the significance of the study, and what are its primary implications for theory, future research, and practice?

In this reviewer’s opinion, Ertmer et al. (2007) provided the reader with good research on why giving and receiving peer feedback may impact student performance. Their analysis of how peer feedback benefits and challenges the learner, and of the perceptions that students then have of peer feedback relative to faculty feedback, indicates there is more to building effective feedback systems into online courses than just creating a discussion board. In particular, the need to develop students into providers of effective peer feedback and the considerations that need to be made in how to structure that feedback are of critical importance. This requires faculty to take into consideration not only how new students are to providing peer feedback but also the issues of anxiety and responsibility which some students are unprepared to handle, particularly within the asynchronous nature of an online class. The onus is on the faculty member wishing to use peer feedback to reflect on and scaffold peer feedback as a viable source of learning input for online students. This may not result in the workload decrease that Ertmer et al. (2007) hinted at, but it would provide a skill set that could serve the student well throughout their educational experience and beyond.


Black, A. (2005). The use of asynchronous discussion: Creating a text of talk. Contemporary Issues in Technology and Teacher Education, 5(1), 5-24.

Drost, E. (2011). Validity and Reliability in Social Science Research. Education Research and Perspectives, 38(1), 105-123.

Ertmer, P., Richardson, J., Belland, B., Camin, D., Connolly, P., Coulthard, G., Lei, K., and Mong, C. (2007). Using Peer Feedback to Enhance the Quality of Student Online Postings: An Exploratory Study. Journal of Computer-Mediated Communication, 12(2), 412-433.

Hoepfl, M. C. (1997). Choosing qualitative research: A primer for technology education researchers. Journal of Technology Education, 9, 47-63.

Cobb, P., Confrey, J., Lehrer, R., & Schauble, L. (2003). Design experiments in educational research. Educational Researcher, 32(1), 9-13.

Maxwell, J. A. (2013). Qualitative research design: An interactive approach (3rd Ed.). Thousand Oaks, CA: SAGE Publications.

Patton, M. (1990). Designing qualitative studies. In Qualitative evaluation and research methods (pp. 169-186). Beverly Hills, CA: Sage.

Roehler, L. R., & Cantlon, D. J. (1997). Scaffolding: A powerful tool in social constructivist classrooms. In K. Hogan & M. Pressley (Eds.), Scaffolding student learning: Instructional approaches and issues (pp. 6-42). Cambridge, MA: Brookline.

Yin, R. K. (2009). Case study research: Design and methods (4th Ed.). Thousand Oaks, CA: Sage.

Yin, R. K. (2012). A (very) brief refresher on the case study method. In Applications of case study research (pp. 3-20). Thousand Oaks, CA: SAGE Publications.


First Critical Research Review

For part of my doctoral program I was asked to examine an article and provide a critical review of the publication based on a series of questions. The content below corresponds to the questions asked.

  1. Overall, Erhel and Jamet (2013) poorly specified the problem for which their research was designed. Generally, a clear statement of a problem appears in the introduction section of the paper but, in this case, the authors did not clearly state any problem in their introduction. The clearest statement of the objectives of the paper was found in the second-to-last paragraph of the paper, which indicates that “one of the objectives of the present study was to answer the question ‘Is deep learning compatible with serious games?’” (Erhel and Jamet, 2013, p. 165). This objective was not clearly stated anywhere in the prior nine pages, although there is a hint of this issue when the authors generically comment in their introduction that “the use of games to teach educational content inevitably raises the question of their compatibility with deep learning” (Erhel and Jamet, 2013, p. 156). In addition, while they indicated there are two objectives, they do not outline the second objective in any overt way. This leaves the reader responsible for trying to ascertain whether there is a second objective of the study and what it may be. The authors did state in the introduction that the question of deep learning compatibility “has prompted many researchers to investigate the actual benefits of digital games in terms of learning and motivation” (Erhel and Jamet, 2013, p. 156). However, given that a value-added approach was used in this study (see question four for a discussion of that framework), the second objective does not seem to be to determine if digital games are an effective means for impacting learning and motivation since the assumption underlying that approach is that these are effective. 
Thus, based on what is presented in both the literature review and the experiment design, the second objective (what is called the assumed objective) may be that they are seeking to determine the effectiveness of elements (instructions and feedback) within game design in promoting learning and motivation. Unfortunately, this second objective was not specifically stated by the authors as one would have expected in an academic research paper and is only assumed. Based on how the literature review and research was conducted, one could consider that even the authors were unclear on both objectives within their research design and may have only thrown the idea of two objectives in at the end to try and tie their research together based on the results seen.


  2. As noted in question one, there was very little clarity of the actual objectives of this research. If one were to work from inferences of what the authors may have intended but poorly expressed, then there was one known stated objective – to assess if deep learning is compatible with serious games – and one assumed objective – to understand how elements of game design impact overall learning and motivation. If these were the actual problems the authors wished to address, then there is a need for this study, as one of the issues in understanding the use of games in education is to determine how they connect to learning and how the aspects of the actual game may impact effectiveness. In addition, Erhel and Jamet (2013) also used this study to present the value-added approach as a mechanism for assessing effectiveness. While there are issues with how this was implemented within this study (see question four), the idea that disciplines should develop, evaluate and improve upon conceptual models for testing is neither new nor out of place, since testing different conceptual models to identify effective ways to measure learning within educational research is needed.
  3. Based upon the stated and assumed objectives (see question one), there is a potential for this problem to be researchable but not within a single study as presented here and not in how the authors designed their study (see questions 4 through 14 for more on the issues of their study). Since the authors’ first objective was assessing learning through “serious games,” one would first have to define what constitutes a “serious game” in terms of design elements and goals. The authors did not offer much in terms of defining what specifically constitutes a “serious game” within their study. They suggested that how the learner approaches the game (as either entertainment or as learning) impacts their goals (performance or mastery) but they did not specify what distinguishes a serious from a non-serious game. Once this idea of what is a “serious” versus “non-serious” game was defined, then one would need to identify or design a serious game with the required characteristics that allowed the researcher to assess the depth of learning (surface versus deep). This would likely require further refinement for understanding the hallmarks of deep versus surface learning (and why this is significant to educational research) and how these are assessable (qualitatively and/or quantitatively). To address the second problem of how elements of digital game design impact learning and motivation would require many additional experiments whereby design aspects, such as instructions and feedback, were assessed independently (qualitatively and/or quantitatively) from one another for impacts on motivation and learning before assessing them within groupings as the authors do. Therefore, the intended objectives are researchable even if that is not what was done by the authors.
  4. The conceptual framework underlying Erhel and Jamet’s (2013) research was found in their discussion of the benefits of digital learning games compared with conventional media (p. 157). Within section 1.3, the authors outlined that this study was designed to illustrate an alternative approach to assessing digital game-based learning (DGBL) and, through this alternative method, to identify specifics of game-based design which improve motivation and learning. This is done in such a way that it rests on an initial belief that digital games are an effective medium for learning and that they are only testing to determine what specifically impacts that effectiveness. In testing this idea, the authors argued that prior media comparison approaches to digital game-based learning have been ineffective in reaching any concrete conclusions as to whether digital games can be an effective learning medium. This they felt was due to a general vulnerability in the media comparison approach in which “many confounding factors (e.g. format, pace, educational content, teacher’s social presence), …prevent us from clearly identifying the factors responsible for the benefits of DGBL” (Erhel and Jamet, 2013, p. 157). However, the authors failed to address specifics as to how these “confounding factors” impacted the outcomes in the prior studies they cited and why they were mitigated within a value-added approach. Thus, in response to these studies, the authors designed this research to utilize a value-added approach. This framework relied on determining a difference in scores between a start point and an end point wherein only a single variable has changed between those two scoring events. This approach also works best when applied to random samples or, if nonrandom, to samples which have been statistically controlled for population variance, such as prior experience and learning level (Douglas, 2017). 
Given the sampling and procedural issues discussed in questions 9 and 10 below, it is unlikely that the sample was randomly constituted, which also means that additional measures should have been implemented to determine the degree of non-randomness of variables the authors do not account for – such as sex, major, and year of schooling. Since these other variables either were not collected or are suspect given numerical inconsistencies, this calls into question how effective the application of the value-added approach would be for this research.
  5. Overall, Erhel and Jamet (2013) offered a rather rambling combination of theory and research as the foundation of their research and, as a result, there seems to be no central theory underlying their research. They went so far as to state “whichever interpretive theory we choose to apply,” suggesting they are not subscribing to a single theory at the start of the paper (Erhel and Jamet, 2013, p. 158). What does underlie their research is their use of the value-added approach. This approach rested on the idea that digital game-based learning is assumed to be effective, despite the inconclusiveness of prior studies, and that they only needed to assess how that effectiveness varied with changing conditions (instructions and feedback). Overall, they did not effectively present why the value-added approach was a better approach than the media comparison approach, nor did they address any limitations and criticism of the value-added approach.

When examining their literature review, the connection of the research they presented to the only stated research objective and to the specifics of assessing digital game-based learning is tenuous. The first section (1.1) was meant to outline what digital game-based learning is. However, the literature sources selected did not offer much concreteness in differentiating digital games from other games and specified little about what a digital game is beyond that it is for “entertainment” and results in “cognitive changes.” Since not all cognitive changes represent actual learning processes, this lack of clear definition is problematic. In section 1.2, the authors attempted to lay out how games impact motivation by linking how a learner’s goals of mastery or performance (these seem to be presented as mutually exclusive goals) are connected to the entertainment and educational dimensions of games. To do this, the authors examined literature on general motivation and mastery and performance goals and offered no concrete explanation of research which has assessed the linkage between games and learner goals. The only offering of actual digital game research indicated that there is a positive linkage between intrinsic motivation and scoring but nothing about how this connects to learner goals. In section 1.3, the authors sought to present the benefits of DGBL compared with conventional media but overall ended up presenting DGBL as a mixed bag; instead, the authors used this section to present their value-added approach without much literature discussion to establish why this is better and what caveats structure such an approach. In section 1.4, the authors introduced the concept of instruction design to improve DGBL effectiveness. They stated that they were addressing this because it is an area not studied before within DGBL. 
To address how important instruction design may be to DGBL, the authors then proceeded to offer research only on how text-based instructions impact how someone reads and approaches a learning item without addressing how reading on screen is cognitively different from reading from a document. They then transitioned to a discussion of literature which outlines how having no instructions prior to learning from a text document (what they call incidental learning) promotes surface learning whereas clear instructions promote deeper learning (what they call intentional learning). This is even though none of their experiments used a text document or included a condition with no instructions prior to learning.

  6. Erhel and Jamet’s (2013) literature review offered no summary of the literature and did not specifically direct any summary to addressing the only stated research objective of the paper. Instead the authors offered a prediction that incidental (surface) learning is likely when “the instructions given to learners encourage them to play rather than learn” suggesting to them that “when the emphasis is placed on the playful components of a digital learning game, learners may fail to put in the effort required for learning” (Erhel and Jamet, 2013, p. 158). This lack of connection to the only stated research objective may stem from this objective having been an afterthought since it is only stated at the end of the paper and may not have been used to build the literature review in any meaningful way. Overall the authors did not offer any concrete information regarding what are “serious games,” the literature on how serious games may connect to surface and deep learning (they only present on how the presence of written instructions in documents may connect to surface or deep learning) nor did they offer much on the specifics of game design which would examine the effectiveness of the elements within it such as instructions. Since most of their references applied solely to text-based documents, one questions the ability to transition cognitive associations with paper documents to that of digital game-based learning. Most interestingly, even though they undertook an experiment in which feedback was used as a tool of game design, they did not offer any references within their literature review which addressed how feedback plays into either motivation or learning. The only discussion of this occurred within the context of the experiment introduction and existed almost as if experiment two was an afterthought done only when experiment one did not present the desired results.
  7. In examining the general foundations for their research, it is not surprising that Erhel and Jamet (2013) failed to offer clear and specific research questions or hypotheses which connect to their stated objective of the paper. In experiment one, the authors first indicated that the study was done “to ascertain whether the effects of instructions given during the readings phase that have been observed for text-based learning would also manifest themselves during DGBL” (Erhel and Jamet, 2013, p. 158). However, they stated at the end of the discussion of the experiment that “the first experiment was designed to assess the effects of instructions on learning quality and motivation in DGBL” (Erhel and Jamet, 2013, p. 161). These are two different research questions, so it is unclear which one they were evaluating with their experiment. The first would suggest they were replicating studies conducted in text-based learning to see if the outcomes also hold true for DGBL. The second suggests that they were testing the relationship between instructions and learning and motivation. Setting aside the latter research question and looking only at the first, the authors created two assumptions to test at the start of experiment one. The first is that “entertainment instruction would improve our participants subjective experience and be reflected in significantly higher intrinsic motivation scores” (Erhel and Jamet, 2013, p. 158). For their first assumption they turned to a meta-analysis by Vogel et al. (2006) for support, but this is problematic as that study compared digital simulation and game learning with traditional learning and did not specifically examine how instructions impacted learning for digital games. The second assumption was that “participants in the entertainment instruction condition would achieve a higher learning outcome” (Erhel and Jamet, 2013, p. 158). 
This is based on research conducted by Lieberman (2006) – a reference they give numerous times without any description of the actual study and results. In doing this, Erhel and Jamet (2013) failed to specify the measure of the higher learning outcome on which they are basing this assumption. For the study, they used scoring on the questionnaires, but it is unclear if that was similar to what was done in Lieberman’s study. Overall it is unclear how this research question and the assumptions stated are connected to the stated objective of “Is deep learning compatible with serious games?” (Erhel and Jamet, 2013, p. 165).

The second experiment outlined that Erhel and Jamet (2013) “set out to determine whether the presence of KCR [knowledge of correct response] feedback in DGBL quizzes can influence the types of learning strategies induced by the instructions” (p. 162). They predicted that “the addition of feedback containing the correct response would reduce redundant, superficial cognitive processing, thereby making learning more relevant in both the entertainment and learning instructions conditions” (Erhel and Jamet, 2013, p. 162). As with the first research question, the connection between the one stated research objective and the experiment conducted around it is not clear. In addition, there was no mention of using feedback for learning in either the introduction or the literature review prior to experiment two, meaning that this experiment was completely disconnected. Overall, experiment two was never actually part of the original research design, as the authors clearly indicated that experiment two “was designed to overcome the problem” they found in experiment one – namely that the outcomes did not match the assumptions (Erhel and Jamet, 2013, p. 164). This they felt was from the entertainment perspective in which “the instructions failed to engage a sufficiently strong effort to trigger the central processes needed for learning,” so to compensate they added an additional dimension of feedback to change the outcomes (Erhel and Jamet, 2013, p. 164). This indicates they were more concerned with figuring out when their predictions would be correct than with addressing the research objective stated. Overall there is a general lack of cohesiveness between what the introduction states this paper is about and what the actual experiments are designed to answer relative to any stated objectives for the research.

  8. In examining the two experiments conducted, there are some concerns about the actual study design relative to the research questions presented. In experiment one, Erhel and Jamet (2013) indicated that the study was done “to ascertain whether the effects of instructions given during the readings phase that have been observed for text-based learning would also manifest themselves during DGBL” (p. 158) and from this they created two hypotheses. The first was that “entertainment instruction would improve our participants subjective experience [although they do not specify how they are measuring that] and be reflected in significantly higher intrinsic motivation scores” (Erhel and Jamet, 2013, p. 158). The second was that “participants in the entertainment instruction condition would achieve a higher learning outcome” (Erhel and Jamet, 2013, p. 158). These assumptions were built from the data of the literature review. However, in reflecting on the results they shared for the text-based study of instructional conditions (entertainment versus study) in the literature review, the data was collected via a think-aloud. This allowed the researchers in that study to observe the process of learning under the entertainment and learning conditions. There were no mechanisms for this type of data collection in this study and thus the current study is not collecting data comparable to that one. This means that applying conclusions from that study to this one is problematic. In addition, the authors included two different sets of questions – paraphrase and inferential – in assessing learning outcomes and offered that one was an example of deeper learning and the other of surface learning without offering evidence to support that assumption.

In examining the design of experiment number two, the authors “set out to determine whether the presence of KCR [knowledge of correct response] feedback in DGBL quizzes can influence the types of learning strategies induced by the instructions” (Erhel and Jamet, 2013, p. 162). Now the addition of this experiment to a study which a) did not have it included from the start (as evidenced by the authors’ own words) and b) for which the experiment was designed to prove their predictions were correct by changing the conditions of the experiment is problematic towards their overall purpose. Aside from the reasoning given for experiment number two, the general design was that this experiment was done by adding feedback responses to the ASTRA simulation while at the same time continuing with the two instructional conditions of entertainment or learning. Thus, the authors were not only testing the presence of the feedback but also that of the instructional condition even though they said they are only testing the instruction condition (Erhel and Jamet, 2013, p. 162). Overall the study’s design relative to the actual research questions asked is less than ideal and suggests that, at least in the case of experiment two, there was significant bias on the part of the researchers towards a particular outcome which resulted in modifications of overall research purposes.

  9. Overall the sampling methods presented by Erhel and Jamet (2013) within their two experiments were poorly explained. The authors indicated that students were recruited from a pool of students but did not indicate how the original pool was established, how big this pool was to begin with, and how the actual participants were pulled from that larger pool for both experiments. In looking more closely at experiment one, the authors indicated that they randomly assigned participants to the two experimental conditions (learning versus entertainment instructions), but since they showed the same uneven numbers of males and females in each of these groups (9 men, 15 women), it would suggest they actually did a stratified random sample to evenly distribute the numbers of each sex that they had available between the two experimental conditions. Furthermore, they failed to examine the specifics of the populations within each of the two conditional states beyond their general age mean within experiment one, and they did not offer any breakdowns of the populations, such as by sex or age, within experiment two. Erhel and Jamet (2013) did mention that for both experiments they excluded students based on their enrollment in medical or allied health programs. However, they did not offer any breakdown of the background majors of those that did participate within the study. This could have created another dimension to their data which may have been relevant between the two experiments and may impact results. Because of the weakness in the description of the populations participating in the study and overall lack of explanation of their actual sampling methods, the overall generalizability of the study’s results is limited for at least two reasons. 
First, since they did not explain their population parameters, they cannot offer any intra- and inter-group analyses within and between the two experiments which would help the reader to understand if the results from the comparisons between these two populations are valid. Second, since they offered no population parameters, the ability to apply what they discover to “like” populations is not possible since the “like” is undefined by the authors. While this could make one think that overall generalizations would be safe due to the lack of specific population parameters, the opposite is true. Since they offer no discussion of population data by which to break down their analyses, it may be that there are underlying specifics to the populations that influenced the results which they have not accounted for. For example, perhaps there were differences in distribution of majors between the two experiments which influenced why one group scored better on the inferential questions than the other. Unfortunately, the authors, for reasons unknown, did not even consider these issues of sampling when outlining the limitations of their research.
  2. In evaluating the adequacy of Erhel and Jamet’s (2013) procedures, several issues are evident. The first concerns the actual number of participants in their study. In experiment one, the authors indicated a total of 46 participants (22 men and 24 women), but when they broke the data down by the two groups, they indicated that each group had 9 males and 15 females, for a total of 18 men and 30 women (48 participants) in experiment one. They only mentioned omitting one male for having scored too high on the pretest. This means that they either miscalculated their original number of study participants, removed more males and added additional females without explaining when, how, and why this occurred within the experimental procedures, or padded their data to reach a desired result.

In the procedures outlined for experiment one, the authors indicated there were five experimental phases but explained only three (pretest, simulation, and questionnaires). It is unclear whether they miscounted or failed to properly divide the phases in the writing. Beyond the actual number of phases, concerns can be raised within each phase they did explain. The first phase, a pretest, involved the participants completing a questionnaire on prior knowledge. This pretest was on “medical notions,” and according to the authors “they would not help the learners answer the quizzes or the knowledge questionnaire” (Erhel and Jamet, 2013, p. 159). However, despite this statement that the pretest content would not affect what the participants would be exposed to later, the researchers used the pretest score to exclude persons from participation for possessing “too much prior knowledge” (Erhel and Jamet, 2013, p. 159), without evidence in the data that these individuals’ prior knowledge would have skewed the results.

In the second phase of the experiment (where they tested the two experimental conditions between the two groups), the authors indicated that the distinction between these was a “learning condition in which the instructions stresses ASTRA’s playful dimension, presenting it as a game” whereas the other stressed “the educational dimensions, presenting it as a learning module” (Erhel and Jamet, 2013, p. 159). Translating the two examples that the researchers provided in French raises some concerns. Despite their description that one set of instructions was for learning and the other for a game, the phrase “helps you to learn” appeared in both sets, suggesting that the distinction between game and educational instructions was not complete. In addition, there was a further variation in the wording used to emphasize the game (“be challenged to answer quizzes”) versus the learning module (“be introduced to quizzes”); see the underlined text in Box 1 and Box 2. The choice of the passive “be introduced” for learning and the active “be challenged” for gaming may have influenced how learners perceived the instructions, so the results may say more about word selection in instructions than about the framing of a game versus a learning module. Later in phase two, the authors indicated that they used a single room with 6 booths, which would mean that they had to break each experimental condition (24 participants) into at least four groups to work through the ASTRA simulation. There was no indication of how close in time these rotations occurred, no data on time to completion, and no analysis of variation in results across these differing simulation runs.

In the third phase, they indicated they gave a motivational questionnaire and then a knowledge questionnaire. Unfortunately, these questionnaires were not provided for review in the paper, as one would have expected, especially since they do not originate in full from any previously published motivational questionnaire. For the motivational questionnaire, the authors indicated that 12 learning goal questions and 3 motivational questions were assessed, but only 5 were shown and deconstructed within the paper for how they addressed aspects of performance goals, mastery goals, and intrinsic motivation. It would have been more appropriate to show the entire questionnaire deconstructed into these specific categories. The knowledge questionnaire contained “four paraphrase-type” questions and four “inference-type questions,” with the written response to each scored as 0, 0.5, 1, or 2 points based on degree of accuracy. However, the authors did not indicate who did the scoring (one author, both authors, or another person), when this was done (all at once or over time, blind to group assignment or by group), or whether scoring bias was assessed.

For experiment two, the authors offered no specifics of procedures beyond noting that they adjusted the ASTRA simulation so that the learner was told whether a response was correct or incorrect and was given the correct answer. No discussion of data collection procedures was offered beyond the phrase “exactly the same as Experiment 1” (Erhel and Jamet, 2013, p. 162). This is problematic since, as in experiment one, there is a numbers issue with the participants. The authors indicated they had 16 men and 28 women for a total of 44 participants in experiment two and that 4 were excluded after the pretest phase. Despite this exclusion, all of their analyses of the ASTRA quiz data and of the motivational and knowledge questionnaires were still run with 44 participants.
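
The participant-count concerns raised above come down to simple arithmetic. As a minimal sketch in Python, the tallies can be reproduced as follows. It uses only the figures as quoted in this review, which were not re-checked against the published paper, and the variable names are purely illustrative.

```python
import math

# Figures as quoted in this review of Erhel and Jamet (2013).
# Assumption: these are transcribed correctly from the paper.

# Experiment one: reported total versus total implied by the group breakdown
reported_total_exp1 = 22 + 24              # 22 men + 24 women = 46
per_group_exp1 = {"men": 9, "women": 15}   # reported for EACH of the two conditions
n_conditions = 2
implied_total_exp1 = n_conditions * sum(per_group_exp1.values())   # 48
discrepancy_exp1 = implied_total_exp1 - reported_total_exp1        # 2

# Phase two: one room with 6 booths, 24 participants per condition
booths = 6
min_sessions_per_condition = math.ceil(sum(per_group_exp1.values()) / booths)  # 4

# Experiment two: 44 reported, 4 excluded after the pretest
reported_total_exp2 = 16 + 28              # 16 men + 28 women = 44
excluded_exp2 = 4
# Expected sample size IF the 4 exclusions came out of the reported 44
expected_analyzed_exp2 = reported_total_exp2 - excluded_exp2       # 40

print(reported_total_exp1, implied_total_exp1, discrepancy_exp1)
print(min_sessions_per_condition)
print(reported_total_exp2, expected_analyzed_exp2)
```

Under these assumptions, the group breakdown in experiment one implies two more participants than the reported total, and the analyses in experiment two would be expected to run on 40 rather than 44 participants if the exclusions followed the initial count.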

  1. In examining the appropriateness and quality of the measures used by Erhel and Jamet (2013), some issues are evident. Given the participant number issues noted in both experiments, as well as the questions about scoring of the knowledge questionnaire discussed in question 10, it is uncertain how reliable the data are from which they draw their conclusions. In addition, the second experiment was presented as testing only a single independent variable (the instructions), but in reality the authors had layered a second variable, the feedback, into this experiment. Given that the results of the instructions testing in experiment one did not support their expectations, it seems unusual that the authors would layer in this additional variable without first testing it in its own experiment (i.e., does knowledge of correct response (KCR) feedback improve performance regardless of instruction style?). Thus, before combining the KCR feedback with the instructions, they should have run a prior experiment that would have allowed them to assess the degree of impact of the feedback alone.
  2. Overall, in examining the data analyses presented by Erhel and Jamet (2013), several critical issues are seen. First is the overall lack of information about how the sampling was done. Since the value-added approach relies on either random sampling or control of the non-randomness of the sample, one would have expected the specifics of this process to be discussed, and a greater breakdown of the data by variables such as sex, age, major, and level of schooling would have helped verify the suitability of the dataset for this approach. Second, there is an overall lack of clarity, and possibly inaccuracy, in the sizes of the populations reported and those analyzed. Both experiments indicated the total number of participants, but since pretesting was part of the experiment, these numbers would seem to be the counts at the start of the procedures. In both experiments they dropped individuals (due to high pretest scores), yet the numbers do not change in any of the phase three scores. This could mean that the authors were using the pretests not as part of the experimental phasing but as part of the sampling design, so that these individuals were dropped before the participant count was finalized. However, that is not how it is presented in the procedures of the experiment. In addition, in experiment one, the reported number of participants varies from 46 to 48 and back to 46, the numbers of males and females do not match what is initially reported, and the mismatch is not resolved when the participants dropped due to pretest scores are considered. Given these small samples (both under 50) and the unclear numbers of actual participants, even the addition or loss of one or two individuals could shift the significance of the conclusions reached.
  3. Erhel and Jamet (2013) did outline three limitations to their study. The first was the selection of a less-than-interactive game. Since they stated that their research objective was assessing “is deep learning compatible with serious games” (Erhel and Jamet, 2013, p. 165), the actual game design should have been considered at the start of the research design. The authors did not explain why the ASTRA simulation was selected over other games; such an explanation would have clarified some of this issue and should have been included. The second limitation they noted was that overall scores were high in the in-game quizzes, and thus very limited feedback was given to participants. Since their entire second experiment was based on the importance of feedback, it is surprising that they describe the feedback as limited yet say the data show it was a “beneficial factor” (Erhel and Jamet, 2013, p. 165). They should have assessed the impact of feedback alone, without the instruction conditions, to understand what this meant in greater detail. The third limitation they mentioned was that their use of asynchronous data meant the data received were not a reflection of actual play but a recollection of it. They mentioned they could have collected in-time data but failed to explain why they did not. Overall, the authors are correct that these are three limitations of the research, but they failed to see the larger issues present in this study with regard to research design, experimental procedures and sampling, and overall population controls (see prior questions). Since many of these call their overall results into doubt, it is not surprising that they are not mentioned, as they would undermine the entire reasoning for publishing the study.
  4. Research design and procedural issues aside, Erhel and Jamet (2013) offered some conclusions in their final discussion which seem to make leaps from the actual results of the study. In experiment one, Erhel and Jamet (2013) reported that there was no significant difference between the instruction modes in responses to the paraphrase-type questions but that participants in the learning module “performed significantly better than those in the entertainment instruction condition” on inferential questions (p. 161). By the final discussion of the paper, these results are expressed as coming “out against the entertainment instruction” since it “failed to trigger the central processes for learning” (Erhel and Jamet, 2013, p. 164). However, that is not what the results indicate, since both modes were effective for learning, though not for the same kinds of questions. Whether or not these questions are demonstrated examples of learning processes is not evident from the experimental design provided by the authors. Thus, the authors assumed a reasoning behind the outcomes without any foundation in the actual research conducted.

In the second experiment, Erhel and Jamet (2013) concluded that the entertainment instruction group “performed better on comprehension (inference) questions than those in the learning group” and that this was caused by being “less frightened of failure” (p. 164). They further concluded that this supports the notions of cognitive load, that “adding features such as feedback to an educational document can trigger deep cognitive processing that contributes better to learning,” as well as extending “findings that adding feedback in DGBL enhances memorization” (p. 164). However, the experiment offered no measurement of cognitive load, and the authors discussed cognitive load only in relation to experiment two and not experiment one. It was therefore premature of the authors to suggest that their results support that theory to the exclusion of others. In addition, since the authors connected memorization to the paraphrase questions in the last paragraph of section 3.3 and comprehension to the inferential questions (even though they do not concretely demonstrate this is the case in their study), their second experiment actually indicates that feedback serves those in the learning condition more for memorization than it does those in the entertainment condition.

  1. Since the overall theoretical base of this study is not very apparent within the introduction, and the only overarching statement the authors make about underlying paradigms concerns the use of the value-added approach, the authors’ ability to relate their findings to a specific overarching theory is not easily discernible. The authors did mention several articles which offer reasoning for their various findings, but they did not convey that they were subscribing to a specific theoretical foundation. Rather, they seemed to be picking those ideas which best matched their results instead of designing their research around a theoretical perspective.
  2. While assessing how digital games connect to learning, and the specific conditions which affect different kinds of learning through digital games, is of importance within educational technology research, the value of Erhel and Jamet’s (2013) results is lessened significantly by the overall problems in their research design and numerical controls. The lack of information on sampling methods and population parameters means that the representativeness of this population, and thus the generalizability of the results, cannot be properly judged. Given that this paper has been cited in over 168 articles (based on ResearchGate results), quite a few studies have drawn on it as part of their research. This is problematic given the extensive issues present in the study design and the overall lack of cohesive research objectives, which undermine the quality of Erhel and Jamet’s results. It would be very beneficial to examine other studies in this field to determine what additional research examines the foundations of learning through digital games and then to identify the critical components of digital games which should be examined in future studies.


Douglas, D. (2017). The Value of Value-Added: Science, Technology, and Policy in Educational Evaluation. CUNY Academic Works. Retrieved from

Erhel, S., & Jamet, E. (2013). Digital game-based learning: Impact of instructions and feedback on motivation and learning effectiveness. Computers & Education, 67, 156-167.

Vogel, J. J., Vogel, D. S., Cannon-Bowers, J., Bowers, C. A., Muse, K., & Wright, M. (2006). Computer gaming and interactive simulations for learning: a meta-analysis. Journal of Educational Computing Research, 34(3), 229–243.