Timothy S. Faith, JD
tfaith@ccbcmd.edu
From the Legal Studies Department, School of Business, Technology and Law, Community College of Baltimore County, Baltimore, Maryland.
This study compared traditional methods of college-level instruction, including lecture and class discussion followed by assessment via course content exams, with a variety of other instructional techniques. The intent was to evaluate whether more contemporary instructional techniques are significantly correlated with improved average exam scores for students in a business law course at a large, mid-Atlantic community college. The author found that certain techniques, such as plickers (where students respond with a unique QR code to multiple choice questions presented on screen), were effective. A combination of an open educational resource textbook with fewer content items spread across more exams coupled with journaling (where students were invited to reflect through writing on course content at various times during the course) also yielded significantly improved exam scores. Conversely, the use of practice exams and Kahoots! was related to a significant decline in average exam scores. Assigning more homework was also negatively correlated with performance on exams. However, the percentage of homework assignments completed by students was significantly related to exam performance, which may be an indirect method to measure student motivation to do well in the course.
What is the role of the instructor in teaching legal courses to community college students? Traditionally, law professors lecture and assess student performance through the administration of exams (Glofcheski, 2017; Preece, 2001; Tribe & Tribe, 1988). However, changes in technology and challenges during the COVID-19 pandemic altered the education landscape for teachers and students, leading to increased use of technology and remote-communication tools to deliver instruction (Hashemi et al., 2024).
A question arises as to whether the use of newer engagement techniques positively correlates with student performance compared with a more traditional, lecture-based approach to education. This study was conducted to investigate alternative teaching techniques in a business law course and assess their impacts on student exam performance.
For this study, Business Law I (MNGT 140) courses taught at the Community College of Baltimore County from spring 2014 to spring 2017 are treated as the controls, representing a more traditional educational approach to teaching students. This combined control group contains a total of 410 observations and only includes students that took all of the exams in the semester. Observations are derived from sections that were taught both in person and asynchronously.
During the 2014 semesters, the course was taught with a publisher textbook and related online MindTap assignments (practice quizzes on the assigned reading), coupled with lecture and some discussion in the courses conducted in person or online. Williamson and Zumalt (2017) studied the use of MindTap software in chemistry courses and suggested its use may improve student course attitudes. Another study by Bertheussen and Myrland (2016) suggests that interactive practice activities such as MindTap can positively impact exam scores in a finance course, though the authors noted that other factors are in play, such as prior academic success. Practice exams, which have also been shown to increase course exam grades (Gretes & Green, 2000), were offered as well. In the control group semesters, 2 exams were administered on course content during the semester as the primary method of assessing student performance.
During the 2015, 2016, and spring 2017 semesters, in addition to the above teaching methods, the course material was broken down into 3 units, separating out the contract and uniform commercial code content from the other course materials into a separate exam. Some academic research suggests that breaking down course materials into smaller assessments results in overall improved student exam performance (Paola & Scoppa, 2011). Total homework assignments ranged from 27 to 51 in the control group.
Treatment cohort A contains 251 observations and only includes students that took all of the course exams from the fall 2017 to fall 2019 semesters. Observations are derived from sections that were taught both in person and asynchronously. I implemented several changes to the course instruction in cohort A. First, I replaced the publisher textbook with an open educational resource textbook that I developed with 2 of my colleagues for use by all faculty teaching this course in the department (Mandl et al., 2017). Second, the publisher MindTap assignments were replaced with 6 problem sets, which were similar practice quizzes on the assigned reading. Third, I began to utilize journaling, where students were invited to write an entry on the course content throughout the semester and subjectively reflect on their understanding of the material. Cisero’s study supports the notion that reflective journaling may improve student course performance (Cisero, 2006). Fourth, I implemented a 4-exam format where common law and uniform commercial code concepts were tested separately on the premise that further breaking down the course material into smaller assessments would improve overall student exam performance (Paola & Scoppa, 2011). Fifth and finally, for my in-person students, I used plickers (https://get.plickers.com/) in the classroom, where students are presented with a multiple-choice question on the course content and are invited to select an answer based on their understanding. Student responses to each question were then summarized, and we discussed the answers and why the correct answer was so marked. Research by Stines-Chaumeil et al. (2019) suggested that use of systems such as plickers may improve student exam performance, and research by Chanialidis (2019) suggested that systems like plickers may also improve student engagement and participation in the course. Total homework assignments in cohort A ranged from 18 to 22.
Treatment cohort A2 is comprised of students taught from spring 2020 to spring 2022 and contains 252 observations. Cohort A2 only includes students that took all of the exams in the semester. Observations are derived from sections that were taught in-person, a section in spring 2022 that was taught remote synchronously, sections that were taught asynchronously, and spring 2020 sections that were converted to a remote synchronous format mid-semester. This cohort is similar to cohort A, except that practice exams were not offered. Total homework assignments in cohort A2 ranged from 15 to 26.
Treatment cohort B, comprised of sections taught from summer 2022 through summer 2023, contains 129 observations, and only includes students that took all of the exams in the semester. Observations are derived from sections that were taught both in person and asynchronously. In cohort B I included additional problem sets, bringing the total number to 12. For summer and fall 2022, I used a 3-exam format, recombining common law and uniform commercial code contracts materials into a single exam. In spring 2023, I created additional homework, introducing scenario-based assignments with several questions on the scenario, bringing the total number of problems and problem sets to 22. Research by Ryan and Hemmes (2005) and Latif and Miles (2011) both found that increasing homework can improve student course and/or exam performance. In addition, I began to assign Kahoots! (https://kahoot.com/) as homework assignments. Studies by Iwamoto and colleagues (2017) and Ares and colleagues (2018) both found the use of Kahoots! can improve student performance in undergraduate courses and on high-stakes exams. I also returned to a 4-exam format and assigned the development of a study outline for each exam as homework. I also dropped journaling as an assignment for the semester. Total homework assignments in cohort B ranged from 19 to 37.
Cohort C, comprised of sections taught from fall 2023 to spring 2024 semesters, contains 125 observations, and only includes students that took all of the exams in the semester. Observations are derived from sections that were taught both in person and asynchronously. I implemented several changes to the course. First, I re-introduced journaling but changed the subject matter to a case extract for each unit of the course where students were invited to summarize the case and connect it with the materials in the unit. Second, I introduced skeletal outlines as a review activity for students to complete in combination with a YouTube streaming lecture on the main unit course concepts. Some educational researchers have found that skeletal outlines, which require the student to substantially annotate the outline with their own notes, can improve student exam performance (Bui & McDaniel, 2015; Prabhu et al., 2015). Third and finally I re-introduced practice exams in combination with Kahoots! such that students could complete either or both for each unit. Total homework assignments in cohort C ranged from 36 to 51.
Cohort D is comprised of students that completed at least one practice exam or one Kahoot! during the term. It includes sections of students taught both in person and asynchronously. Cohort D contains 516 treated observations, and only includes students that took all of the exams in the same semester.
Cohort E is comprised of students that were exposed to plickers in the classroom. Cohort E contains 336 treated observations, and only includes students that took all of the exams in the same semester, and includes sections of students taught both in person and asynchronously.
Cohorts D and E each use all students not exposed to the respective treatment (practice exams or Kahoots! for cohort D; plickers for cohort E) as the controls for propensity score matching and analysis.
I defined a dependent variable of the student’s average overall exam score, which I calculated by adding all points the student earned on all exams that semester and dividing by the total points that could be earned on the exams.
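The calculation described above can be illustrated with a minimal Python sketch. The function name and the sample point values are hypothetical and are not drawn from the study data; the study's own analysis was carried out in R.

```python
def average_exam_score(points_earned, points_possible):
    """Pool all exam points: total earned divided by total possible."""
    if len(points_earned) != len(points_possible):
        raise ValueError("each exam needs both an earned and a possible value")
    total_possible = sum(points_possible)
    if total_possible <= 0:
        raise ValueError("total possible points must be positive")
    return sum(points_earned) / total_possible


# Hypothetical student who sat four exams of unequal point value.
earned = [82, 45, 91, 38]
possible = [100, 50, 100, 50]
print(round(average_exam_score(earned, possible), 4))
```

Because points are pooled before dividing, an exam worth more points carries more weight than it would in a simple mean of per-exam percentages.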
To fairly evaluate whether any treatment impacted average exam scores, I developed a propensity score match utilizing the following independent variables: (a) whether the student was male, (b) the grouped cumulative GPA of the students, and (c) the rate at which the student completed the available homework assignments. GPAs were grouped as follows: students with GPAs >3.75 were grouped as a 4.0 GPA; students with GPAs between 3.25 and 3.75 were grouped as a 3.5 GPA; students with GPAs between 2.75 and 3.25 were grouped as a 3.0 GPA; students with GPAs between 2.25 and 2.75 were grouped as a 2.5 GPA; students with GPAs between 1.75 and 2.25 were grouped as a 2.0 GPA; students with GPAs between 1.25 and 1.75 were grouped as a 1.5 GPA; students with GPAs between 0.75 and 1.25 were grouped as a 1.0 GPA; students with GPAs between 0.25 and 0.75 were grouped as a 0.5 GPA; all other students were grouped as a 0 GPA.
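The GPA grouping can be sketched as a small lookup in Python. The stated ranges share their endpoints (for example, 3.25 appears in two ranges), so this sketch assumes each lower bound is exclusive and each upper bound is inclusive, consistent with the ">3.75" rule for the top group. That boundary rule and the function name are assumptions for illustration, not details confirmed by the study.

```python
# (exclusive lower bound, group label), checked from the highest group down.
# ASSUMPTION: a GPA exactly on a boundary falls into the lower group.
GPA_GROUPS = [
    (3.75, 4.0),
    (3.25, 3.5),
    (2.75, 3.0),
    (2.25, 2.5),
    (1.75, 2.0),
    (1.25, 1.5),
    (0.75, 1.0),
    (0.25, 0.5),
]


def group_gpa(gpa):
    """Map a cumulative GPA to the grouped value used as a covariate."""
    for lower_bound, label in GPA_GROUPS:
        if gpa > lower_bound:
            return label
    return 0.0  # all remaining students


for gpa in (3.9, 3.75, 3.0, 0.2):
    print(gpa, "->", group_gpa(gpa))
```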
I utilized the MatchIt package in the statistical software R to calculate a propensity score for each of the cohorts compared with the control group. I defined a total of 5 models, 1 for each treatment cohort as defined above, and applied a propensity score match to each model. I utilized the subclass matching methodology for all models to match up similar observations using the above 3 independent variables between the control and each treatment cohort. This matching methodology resulted in no discarded control or treatment observations.
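The study relied on MatchIt in R for this step. As a rough illustration only, the Python sketch below shows one way subclassification can work once a score has been estimated for each student: cut points are taken from the treated students' scores, and every student is assigned to the subclass their score falls in. The scores, the number of subclasses, and the quantile rule are all assumptions for illustration and may differ from what MatchIt does internally.

```python
from bisect import bisect_right


def quantile_cutpoints(values, n_subclasses):
    """Interior cut points that split the sorted values into similar-size groups."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (k * n) // n_subclasses)]
            for k in range(1, n_subclasses)]


def assign_subclasses(scores, treated, n_subclasses=5):
    """Label every unit with a subclass index based on the treated units' scores."""
    treated_scores = [s for s, t in zip(scores, treated) if t]
    if not treated_scores:
        raise ValueError("at least one treated unit is required")
    cuts = quantile_cutpoints(treated_scores, n_subclasses)
    # Every unit receives a label, so no observation is discarded.
    return [bisect_right(cuts, s) for s in scores]


# Hypothetical, already-estimated scores for six students.
scores = [0.2, 0.4, 0.6, 0.8, 0.1, 0.7]
treated = [True, True, True, True, False, False]
print(assign_subclasses(scores, treated, n_subclasses=2))
```

Because each control unit is placed in whichever subclass its score falls into, this style of matching keeps all observations, which is in line with the absence of discarded units reported above.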
I then conducted a linear regression analysis on each matched model to assess the impact on exam scores of the teaching techniques on each treatment cohort compared with the control group. An average treatment on the treated was calculated for each model (Greifer, 2022; Zhao et al., 2021).
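To make the idea of an average treatment on the treated more concrete, the Python sketch below uses a simpler estimator than the regression described above: within each subclass it takes the difference between treated and control mean outcomes, then averages those differences weighted by the number of treated units in each subclass. This is a stand-in for illustration, not the study's regression model, and all data values and names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def subclass_att(outcomes, treated, subclasses):
    """Treated-weighted average of within-subclass differences in mean outcome."""
    groups = defaultdict(lambda: {"treated": [], "control": []})
    for y, t, s in zip(outcomes, treated, subclasses):
        groups[s]["treated" if t else "control"].append(y)

    weighted_sum = 0.0
    n_treated = 0
    for group in groups.values():
        if not group["treated"] or not group["control"]:
            continue  # no comparison is possible in this subclass
        difference = mean(group["treated"]) - mean(group["control"])
        weighted_sum += difference * len(group["treated"])
        n_treated += len(group["treated"])

    if n_treated == 0:
        raise ValueError("no subclass contains both treated and control units")
    return weighted_sum / n_treated


# Hypothetical average exam scores in two subclasses.
outcomes = [0.80, 0.84, 0.78, 0.90, 0.86, 0.88]
treated = [True, True, False, True, False, False]
subclasses = [0, 0, 0, 1, 1, 1]
print(round(subclass_att(outcomes, treated, subclasses), 4))
```

A positive value indicates that treated students scored higher on average than comparable control students, on the same proportion scale as the dependent variable.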
I investigated whether any of the treatment cohorts performed better on exams compared to the control group. Other educational researchers have identified several covariates that may significantly correlate with student outcomes, such as student demographics and cumulative college GPA (Alyahyan & Düştegör, 2020; Fischer et al., 2015). In examining my data set, I noted that gender and cumulative GPA both significantly correlated with average student exam scores, as shown in Figure 1.
Educational researchers have also suggested that increasing the amount of homework and practice correlates with improved educational outcomes (Latif & Miles, 2011; Ryan & Hemmes, 2005). However, I found that the total number of assignments completed by each student was not clearly correlated with student exam performance as shown in the left panel of Figure 2. In addition, the middle panel of Figure 2 shows that the total possible assignments to complete may be negatively correlated with student exam performance. However, I found that the percentage of assignments completed by the student was significantly correlated with 14% higher exam average scores (Figure 2, right panel).
Love plots illustrating that the matched models are in better balance (where the standardized mean differences on all variables are less than 0.1) are provided in Figure 3 for each cohort. I then conducted a linear regression analysis on each matched model to assess the impact on exam scores of the teaching techniques on each treatment cohort compared with the control group. An average treatment on the treated (ATT) was calculated for each model (Greifer, 2023; Zhao et al., 2021). This analysis is summarized in Table 1.
| Table 1. Impact of Teaching Techniques on Exam Scores. | ||||
|---|---|---|---|---|
| Cohort | ATT | P | F-statistic | Adjusted R squared |
| A | +0.021 | 0.00239** | 71.88 | 0.3008 |
| A2 | +0.019 | 0.01525* | 67.63 | 0.2873 |
| B | -0.021 | 0.0296* | 38.63 | 0.2186 |
| C | -0.004 | 0.661 | 46.85 | 0.2556 |
| D | -0.014 | 0.0125* | 87.42 | 0.2288 |
| E | +0.015 | 0.013* | 89.37 | 0.2328 |
| ATT = average treatment on the treated. ** = p < 0.01; * = p < 0.05. | ||||
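The balance criterion behind the Love plots in Figure 3 (standardized mean differences below 0.1) can be illustrated with a short Python sketch. It assumes the difference in group means is standardized by a pooled standard deviation, which is one common convention; the tooling used in the study may standardize differently, and the GPA values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated_values, control_values):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated_values) + variance(control_values)) / 2)
    if pooled_sd == 0:
        raise ValueError("covariate has no variation in either group")
    return (mean(treated_values) - mean(control_values)) / pooled_sd


# Hypothetical grouped GPAs for a treated and a control group.
treated_gpa = [3.0, 3.5, 4.0, 2.5]
control_gpa = [3.0, 3.5, 3.5, 2.5]
smd = standardized_mean_difference(treated_gpa, control_gpa)
print("balanced" if abs(smd) < 0.1 else "imbalanced")
```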
I found that the teaching techniques implemented in treatment cohorts A, A2, and E showed a significant, small improvement on average exam scores compared with the control group. Conversely, cohorts B and D showed a significant, small decline in average exam scores compared with the control group. Treatment cohort C did not show a significant difference in average exam scores compared with the control group. Finally, exam scores from 2014 to 2024 are presented in Figure 4.
The trend line in Figure 4, frustratingly, shows a return in 2024 to median exam scores similar to those of 2014, in spite of numerous course changes away from primarily lecture to a wide variety of other student engagement techniques. One might conclude from this graphic that traditional methods of teaching law are no worse or better than more contemporary techniques. However, the story is more complex when applying a propensity score methodology to control for known covariates, with some groupings of techniques showing improvement on exam scores for students, while other techniques are not correlated with such an improvement.
For example, the additional teaching techniques in treatment cohorts A and A2 seemed to show an improvement on average exam scores compared to the control (Table 1). In that case, I had implemented an OER textbook while also breaking down the tests into 4 non-cumulative exams. I also introduced the idea of student journaling in these cohorts, where students were invited to write independently about content that interested them from each unit and reflect on what they had learned from the material. For in-person students, I also introduced plickers, which I used in the classroom primarily as a review activity where students could respond to multiple choice questions about the content covered in that class meeting, and discuss the questions where students appeared to be confused to help clarify the topic. I also examined whether plickers alone was impactful on exam scores in cohort E, and found that the method again was (compared with all units not exposed to this activity) positively correlated with improved exam average scores.
However, other techniques that one would expect to be successful in improving student exam success, such as practice exams and Kahoots!, turned out not to be correlated with improved average exam scores. This result was unexpected given prior research on the topic. A study by Bernal (2018) involved the use of Kahoots! in a university chemistry course, where the author compared 2 groups of chemistry students that participated in Kahoots! during the academic term to a prior control group not exposed to this treatment. Bernal found that the treatment groups (combined n = 89) both passed the final exam at a higher rate than the control group from the prior academic year. However, this study did not attempt a propensity score match of control and treatment units, and so it is possible that the positive result with Kahoots! in that study was caused by differences in other variables not considered in the pilot study.
Similarly, Iwamoto (2019) implemented Kahoots! in a first-year university general psychology class (n = 49) and found that students exposed to Kahoots! showed a significant improvement in test scores. In this course a coin toss was used to determine which section would participate in the Kahoots! rather than receiving a study guide prior to each exam. There was a significant (p = 0.008) improvement in mean test scores; however, Iwamoto’s study involved fewer students exposed to the treatment than the present study, and he notes that other confounding variables may have been present that may have contributed to the correlation.
In the present study, I did not assess student self-regulation, but I did control for both the overall GPA of students in matching control and treatment units, and the rate at which students completed homework assignments for the course. Other studies have shown that GPA is highly correlated with academic outcomes (Alyahyan & Düştegör, 2020; Fischer et al., 2015), and it is possible that there was variation in student GPAs in the Iwamoto (2019) study that may have impacted the outcome.
Student motivation is a psychological factor that other educational researchers have identified as significantly correlated with academic performance (Cho, 2023). Cho administered an academic motivation scale to 350 college freshmen enrolled in an elective course taught in 3 modalities and found that both intrinsic and extrinsic motivation were significant predictors of midterm exam grades in the course. I posit that the rate at which students complete the homework in the course may be correlated with student motivation to succeed in the class. A follow-up study that administered a survey like the academic motivation scale with homework completion rates could help to determine if there is a significant correlation between motivation and homework completion rates, such that the latter could indirectly measure the former.
One other unexpected result is that adding more homework assignments actually appears to be slightly negatively correlated with student performance, even though other researchers have identified a significant correlation between homework and academic success (Planchard et al., 2015). I found a significant negative correlation (-0.097) between the total number of homework assignments and average exam scores using the ggpairs function in GGally in R. However, McJames and colleagues (2024) applied a machine learning model, Bayesian Causal Forests, to 2019 data on 8th grade Irish students from the Trends in International Mathematics and Science Study (TIMSS) and examined the frequency and time spent by Irish students completing homework, and the relationships of these to mean achievement scores. Their results suggest that daily mathematics homework increased mean achievement scores by 7.51 points, and science homework 3 to 4 days per week increased mean achievement scores by 5.31 points. The study by McJames and colleagues also takes into account the duration students reported spending on science and math homework, which suggests a possible follow-up study on the duration students are spending on my homework assignments, and whether that (or the frequency with which they are assigned, or both) may be correlated with exam scores.
In terms of the reliability of the models used for this analysis, all models appear to have reasonably improved balance for the covariates included within the study based on the Love plots in Figure 3. As summarized below in Table 2, bootstrap sampling of each model through the boot library with boot.ci indicates that, where noted, each model shows a reliable and significant result when run repeatedly with varying samples from the underlying matched model: bias and standard error values are relatively low, and the 95% confidence interval has a narrow range that excludes zero (Greifer, 2023). Treatment cohort C did not show a significant result and was excluded from the bootstrap sampling.
| Table 2. Reliability of Propensity Score Match Models. | | |
|---|---|---|
| Cohort | Rosenbaum Sensitivity Test | Bootstrap Sampling |
| A | Unconfounded estimate 0.0046; P = 0.0803 at Gamma 1.2 | Original 1.033266; Bias = 0.00158; SE = 0.019; 95% percentile confidence = 0.996 to 1.075 |
| A2 | Unconfounded estimate 0.044; P = 0.1443 at Gamma 1.2 | Original 1.031743; Bias = 0.01139; SE = 0.025; 95% percentile confidence = 0.998 to 1.095 |
| B | Unconfounded estimate 0.9929; P = 0.9929 at Gamma 1 | Original 1.00474; Bias = -0.0184; SE = 0.0259; 95% percentile confidence = 0.931 to 1.033 |
| C | Unconfounded estimate 0.4446; P = 0.4446 at Gamma 1 | n/a |
| D | Unconfounded estimate 0.9559; P = 0.9559 at Gamma 1 | Original 0.9986617; Bias = -0.009; SE = 0.0136; 95% percentile confidence = 0.9638 to 1.0186 |
| E | Unconfounded estimate 0.0002; P = 0.1021 at Gamma 1.2 | Original 1.0281; Bias = -0.003438; SE = 0.01534; 95% percentile confidence = 0.995 to 1.055 |
| SE = standard error | | |
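The bootstrap quantities reported in Table 2 (original estimate, bias, standard error, and a 95% percentile interval) can be illustrated with a minimal Python sketch. The study used the boot library in R on the matched models; this sketch instead resamples a plain list of hypothetical exam averages and recomputes a simple statistic, so the function name, data, seed, and number of resamples are all assumptions.

```python
import random
from statistics import mean, stdev


def bootstrap_summary(values, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Resample with replacement and summarize the statistic's distribution."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    original = statistic(values)
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return {
        "original": original,
        "bias": mean(estimates) - original,  # mean of resamples minus original
        "se": stdev(estimates),              # spread of the resampled estimates
        "ci": (estimates[lower_index], estimates[upper_index]),
    }


# Hypothetical average exam scores for twelve students.
exam_averages = [0.62, 0.71, 0.75, 0.78, 0.80, 0.81,
                 0.83, 0.85, 0.88, 0.90, 0.93, 0.97]
summary = bootstrap_summary(exam_averages, mean)
print({key: summary[key] for key in ("original", "bias", "se")})
print("95% percentile interval:", summary["ci"])
```

A small bias and standard error, together with a narrow percentile interval, indicate that the estimate changes little from one resample to the next.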
However, there are several important limitations on the above analysis. First, not all variation is accounted for, based on the adjusted R squared values in Table 1 (where the models account for between 22% and 30% of the overall variability in average exam scores). Unknown additional covariates are likely to be present in the population studied that may further explain the correlations, though these additional covariates were not available to include in this study. This limitation is further highlighted by the loss of significance relatively close to the unconfounded estimate in the Rosenbaum sensitivity test for each model above, where the gamma is relatively close to 1 at the point where the significance value exceeds 0.05 (and treatment cohorts B, C, and D show no significance at the start of this validation step). The Rosenbaum sensitivity test, combined with the relatively low R squared values in Table 1, emphasizes that the treatment effects evaluated in these models could be sensitive to potential hidden biases, such as other demographic or student-specific variables.
Second, I considered whether course modality (in person or online) and course timing (before the pandemic versus during and after it) impacted average exam grades, but found no linear relationship for either variable using the ggpairs function from GGally in R. The function reports a correlation value that ranges from 1 to -1, with values at or close to 0 indicating no correlation, along with a significance estimate at p values of < 0.05, < 0.01, and < 0.001. Teaching modality showed a non-significant correlation with exam averages of 0.024, supporting the premise that modality is not related to exam scores in the data studied. In addition, I defined a dummy variable for whether the semesters were prior to the pandemic or post-pandemic, and found a non-significant correlation with exam averages of -0.016, supporting the premise that whether students took the course before or during the pandemic has no relationship with the exam scores in the dataset. However, the absence of relationships between these variables and exam scores may not generalize to larger student populations, as I have found that instructional modality can impact course outcomes (Faith, 2024), and the pandemic itself may have had an impact on student outcomes as a result of negative impacts on student self-efficacy and motivation (Wolniak & Burman, 2022).
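The correlation values discussed above can be illustrated with a short Python sketch, assuming they are Pearson correlation coefficients. A 0/1 dummy variable, such as a pre- or post-pandemic flag, can be passed in like any other numeric variable. The function name and data are hypothetical, and the sketch does not compute the significance estimate that ggpairs reports.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return covariance / (spread_x * spread_y)


# Hypothetical dummy variable (0 = pre-pandemic, 1 = post-pandemic)
# paired with hypothetical average exam scores.
post_pandemic = [0, 0, 0, 0, 1, 1, 1, 1]
exam_average = [0.81, 0.70, 0.92, 0.75, 0.78, 0.73, 0.88, 0.77]
r = pearson_r(post_pandemic, exam_average)
print("correlation near zero" if abs(r) < 0.1 else "some linear association")
```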
Third, the control groups used in this study are relatively small. A more robust propensity score match may be achieved with a larger group of control units than was available for this study. In some cases, other researchers have indicated that a control group that is substantially larger than the treatment group can reduce bias or improve matching among control and treatment units in the study (Desai et al., 2016).
Finally, propensity score matching is not a controlled study where students are randomly assigned to a control or treatment classroom, as such a study was not feasible. A controlled study that assessed a wider variety of covariates may lead to higher confidence in the correlations between various treatments considered in this study.