Time-of-Day Effect on Student Evaluations
Mark Linville
Washington State University
Student
evaluations are commonly used to measure the quality of teaching (Kemp and Kumar
1990). Yet, the validity of student evaluations as a tool to evaluate teaching
is still being debated (Marks 2000). Research has found that several factors
unrelated to learning affect student evaluations: class size (Fernandez et al.
1998), perceived teacher care (Teven and McCroskey 1997), enthusiasm (Lang 1997),
gender (Centra and Gaubatz 2000), instructor’s pregnancy (Baker and Copp
1997), prior interest in the topic (Scherr and Scherr 1990), expected grade
(Scherr and Scherr 1990), and elective or required status of the course (Scherr
and Scherr 1990). Some of these factors produce only small effects, but the existence
of any bias raises questions about the validity of student evaluations.
I find evidence of another bias that appears to have been ignored in recent
research: a time-of-day effect.
Method and Results
My university
uses a 0-4 summary scale for student evaluations. During the time of the study,
my overall student evaluations averaged 3.03, my auditing course 2.89, and my
other courses 3.26. The department and college means during that time were 2.93
and 2.85, respectively. In analyzing the evaluations in the auditing course,
I noted that student evaluations in early morning auditing sections were consistently
below the evaluations in auditing sections offered later in the day.
The student
evaluations used in this study are from the sixteen sections of undergraduate
auditing that I have taught. The instructional methods used in the course differ
little across the semesters and do not differ at all within the semesters in
which I taught two sections of the course. Instructor skill and instructional
methods are therefore approximately held constant, so variation in student
evaluations can be attributed to a source other than variation in teaching. In all cases
the early morning section was a twice-a-week 75-minute class beginning at 7:45
a.m. The other auditing sections had a variety of starting times ranging from
11:00 a.m. to 3:45 p.m. and generally met twice a week (two of the sections
met three times a week).
To determine
whether time of day affects student evaluations, I have divided the evaluations
of the auditing course into two groups: sections offered in the early morning
and sections offered at other times. Each group has eight sections. The evaluations
from the sections offered early in the morning range from 2.44 to 3.10 (standard
deviation of 0.18) on the summary measure. The evaluations from sections offered
at other times of the day range from 2.90 to 3.27 (standard deviation of 0.13).
The mean evaluation of the sections at other times of the day is 3.06, while
the mean evaluation of the early morning sections is 2.72. The later sections' mean is 12.5% higher,
a difference that is statistically significant (p = 0.0008 in a two-tail test). Clearly,
the student evaluations are lower in the early morning sections than in the
sections of the same class offered later in the day despite the fact that almost
all elements of the class were the same.
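The comparison above can be approximately reproduced from the reported summary statistics alone. The sketch below is illustrative only: the text does not state which test was used, so a pooled-variance (Student's) two-sample t-test is assumed, the function names are hypothetical, and the inputs are the rounded means and standard deviations given above, so the result may differ slightly from the reported p-value.

```python
import math


def percent_difference(higher, lower):
    """Difference between two means, as a percentage of the lower mean."""
    return (higher - lower) / lower * 100


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic and degrees of freedom, assuming equal variances."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, integrating the density over [0, |t|] with Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    return 1 - 2 * central_area


# Reported summary statistics (rounded): later sections vs. early morning sections.
pct = percent_difference(3.06, 2.72)
t_stat, df = pooled_t(3.06, 0.13, 8, 2.72, 0.18, 8)
p_value = two_tailed_p(t_stat, df)
print(f"difference: {pct:.1f}%  t = {t_stat:.2f}  df = {df}  p = {p_value:.4f}")
```

Under these assumptions the difference works out to 12.5% of the early morning mean, the t statistic is about 4.3 on 14 degrees of freedom, and the two-tailed p-value falls below 0.001, in line with the reported result.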
One possible explanation
for the higher student evaluations in the later sections is that the initial,
early morning presentation provided practice that improved the subsequent, later
section’s presentation. In four semesters during the time frame of this
study, I taught two sections of auditing, one in the early morning
and the other later in the day, which allows this explanation to be tested.
I compare the student evaluations from the sections given the initial presentation:
eight of these sections met in the early morning and three met
later in the day. If an inferior presentation is the problem, the means
from these two groups should be approximately the same. As reported above, the
mean student evaluation in the early morning sections is 2.72. The mean student
evaluation in the sections given the same presentation later in the day is 3.11.
The difference is statistically significant (p = 0.0085, two-tail test), which
suggests that the initial class presentation is not the cause of the lower
student evaluations in the early morning sections.
To further test
whether improvement in subsequent presentations is the source of the lower student
evaluations in the early morning classes, I compare student evaluations from
the sections meeting later in the day. Three sections had the initial presentation
of the material and five sections were given a subsequent presentation. If the
initial presentation differs in quality from the subsequent one, a difference
in mean student evaluations should be seen. The mean for the first group, as
reported earlier, is 3.11, while the mean for the second group is 3.02. The difference
is not statistically significant (p = 0.3686, two-tail test). Like the prior
results, these results suggest that a differing quality between the initial
and subsequent presentations is not likely causing the lower student evaluations
in the early morning sections. Time of day is left as a more likely factor.
Discussion
The results
reported above show that student evaluations from early morning sections of
a course are lower than student evaluations in sections of the same course offered
later in the day. This could be due to student resentment of the early morning
time. The most common complaint I heard from the early morning sections (other
than possibly the ubiquitous "too much work" complaint) was that the
class was "too early."
Another possibility
for the lower student evaluations is that my class presentation skills were
negatively affected by the early time. If so, I was not consciously aware of
it. Indeed, in four semesters when I taught an early morning section (7:45 a.m.)
and a late afternoon section (3:45 p.m.), I was concerned that due to mental
fatigue my presentations were weaker in the afternoon sections. Yet, student
evaluations in the afternoon sections were higher than in the morning sections
in each of the four semesters despite exactly the same lectures and class
activities in the two sections (2.71 vs. 2.92; 2.44 vs. 3.10; 2.72 vs. 2.90;
2.73 vs. 3.00; morning vs. afternoon respectively). Although I am convinced
that my morning lecture presentations were not inferior to those in the afternoon,
a limitation to this study is that I have a vested interest in drawing such
a conclusion.
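The four within-semester pairs reported above can be tabulated directly. The following sketch only restates that arithmetic; the variable names are illustrative, and no significance test is implied because the text reports none for these pairs.

```python
# (morning, afternoon) summary evaluations for the four semesters
# in which both a 7:45 a.m. and a 3:45 p.m. section were taught.
pairs = [(2.71, 2.92), (2.44, 3.10), (2.72, 2.90), (2.73, 3.00)]

# Afternoon-minus-morning difference per semester, rounded to the
# two decimal places used in the reported evaluations.
differences = [round(afternoon - morning, 2) for morning, afternoon in pairs]
mean_difference = sum(differences) / len(differences)

for (morning, afternoon), diff in zip(pairs, differences):
    print(f"morning {morning:.2f}  afternoon {afternoon:.2f}  difference {diff:+.2f}")
print(f"mean afternoon-minus-morning difference: {mean_difference:.2f}")
```

Every difference is positive, and the average gap is 0.33 points on the 0-4 scale.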
If early morning
times negatively influence student evaluations, the effects of this possible
bias could be easily mitigated. Administrators could rotate among instructors
the time slots unpopular with students or possibly avoid scheduling classes
at those times altogether. Another possibility is that student evaluations from
unpopular times could carry less weight for faculty evaluation purposes.
Additional
research is needed on student evaluations. The results of research on the validity
and effectiveness of student evaluations have been equivocal (Marks 2000). Research
has noted several dysfunctional effects that appear to be induced as faculty
attempt to increase their student evaluations (Crumbley 1995) including grade
inflation (Nelson and Lynch 1984, Addy and Herring 1996). If biases exist in
student evaluations or dysfunctional behaviors are being induced, these need
to be identified in order to mitigate them. If enough biases or dysfunctional
behaviors exist so that student evaluations are deemed inherently flawed, then
more appropriate methods of evaluating teaching must be developed (Crumbley
1995, Wallace and Wallace 1998). Ultimately, reaching a consensus about student
evaluations matters because many important decisions are based on them.
If the instrument used to make these decisions is flawed, many
parties may be negatively affected: students may not truly receive quality
instruction, faculty may be denied promotion or tenure, and the university may
make inappropriate resource decisions.
References
Addy, N., and
C. Herring, 1996. "Grade Inflation Effects of Administrative Policies."
Issues in Accounting Education 11(1): 1-13.
Baker, P.,
and M. Copp, 1997. "Gender Matters Most: The Interaction of Gendered
Expectations, Feminist Course Content, and Pregnancy in Student Course Evaluations."
Teaching Sociology 25(1): 29-43.
Centra, J.
A., and N. B. Gaubatz, 2000. "Is There Gender Bias in Student Evaluations
of Teaching?" Journal of Higher Education 71(1): 17-33.
Crumbley, D.
L. 1995. "The Dysfunctional Atmosphere of Higher Education: Games Professors
Play." Accounting Perspectives 1 (Spring). http://www.bus.lsu/accounting/faculty/lcrumbley/behavior.html
Fernandez,
J., M. A. Mateo, and J. Muniz, 1998. "Is There A Relationship Between
Class Size and Student Ratings of Teaching Quality?" Educational
and Psychological Measurement 58(4): 596-604.
Kemp, B. W.,
and G. S. Kumar, 1990. "Students Evaluations: Are We Using Them Correctly?"
Journal of Education for Business (November/December): 106-111.
Lang, S. S.
1997. "Student Ratings Soar When Professor Uses Enthusiasm." Human
Ecology Forum 25(4): 24.
Marks, R. B.
2000. "Determinants of Student Evaluations of Global Measures of Instructor
and Course Value." Journal of Marketing Education 22(2): 108-119.
Nelson, J.,
and K. Lynch. 1984. "Grade Inflation, Real Income, Simultaneity, and
Teaching Evaluations." Journal of Economic Education (Winter):
21-36.
Scherr, F.,
and S. S. Scherr, 1990. "Bias In Student Evaluations of Teacher Effectiveness."
Journal of Education for Business 65(May): 356-8.
Teven, J. J.,
and J. C. McCroskey, 1997. "The Relationship of Perceived Teacher Caring
with Student Learning and Teacher Evaluation." Communication Education
46(1): 1-9.
Wallace, J.
J. and W. A. Wallace, 1998. "Why The Costs of Student Evaluations Have
Long Since Exceeded Their Value." Issues in Accounting Education
13(2): 443-447.