Technical Report No. 18: Reliability
By Steve Benton
Today’s blog incorporates some of the content in Technical Report No. 18.
Reliability evidence is important for determining whether student ratings are consistent enough to be used as a source of evidence for making judgments about teaching effectiveness. When ratings vary substantially among students within the same course or when average instructor ratings change dramatically from one class to another, evaluative decisions about effectiveness are difficult. As revealed in the following paragraphs, credible evidence exists to support both the class-level and instructor-level reliability of each item in the IDEA system.
Hoyt and Lee (2002) provided evidence for adequate class-level reliability (i.e., consistency in ratings by students in the same class) by computing split-half reliabilities for each item on the DF. Classes with 13 to 17 student respondents were randomly split in half, and means were computed for each half. The half means were then correlated, and the Spearman-Brown formula was applied to estimate reliabilities for class size ranges of 10-14, 15-34, 35-49, and 50+. For all items, split-half reliability estimates were above .80 when class size was at least 15. Standard errors of measurement (SEM) were approximately .30 or less once class size reached 10.
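The split-half procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual code: the data layout (a list of per-class rating lists) and all function names are my own, and the Spearman-Brown step-up with k = 2 is the standard correction for halving a measure.

```python
import random
import statistics

def spearman_brown(r, k=2):
    # Spearman-Brown formula: reliability of a measure lengthened
    # by factor k, given observed reliability r (k=2 corrects a split half).
    return k * r / (1 + (k - 1) * r)

def pearson(x, y):
    # Pearson correlation between two equal-length lists.
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def split_half_reliability(classes, seed=0):
    # classes: list of classes, each a list of student ratings on one item.
    # Randomly split each class in half, compute the mean rating of each
    # half, correlate the half means across classes, then step up.
    rng = random.Random(seed)
    half1, half2 = [], []
    for ratings in classes:
        shuffled = ratings[:]
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half1.append(statistics.mean(shuffled[:mid]))
        half2.append(statistics.mean(shuffled[mid:]))
    return spearman_brown(pearson(half1, half2))
```

With perfectly consistent ratings within each class, the half means agree exactly and the estimate is 1.0; real classes show within-class disagreement, which pulls the estimate down, and larger classes stabilize the half means, which is why reliability rose with class size in the report.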
Consistency in ratings within the same class is a prerequisite for instructor-level reliability (i.e., stability in ratings of the same instructor across different classes) (Gillmore, 2000). However, ratings can have adequate class-level reliability without being consistent at the instructor level. Benton et al. (2015) obtained measures of instructor-level reliability on IDEA items by computing inter-class reliability coefficients on a subset of data from 2,500 instructors who had been rated in at least five classes. The Spearman-Brown prophecy formula was then applied to estimate reliabilities for 1 to 15 classes. All reliability estimates approached or exceeded .60 for a single class. When at least two classes were rated, all coefficients were above .70 and most approached or exceeded .80. Reliability coefficients increased as the number of classes rated increased; all were .90 or greater when at least seven classes were rated. Standard errors averaged .30 or less for all items when at least two classes were rated.
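The projection across classes uses the same Spearman-Brown prophecy formula, with k now counting classes rather than test halves. As a quick sanity check on the figures above (an illustration with an assumed single-class reliability of .60, not the report's actual computation):

```python
def projected_reliability(r1, k):
    # Spearman-Brown prophecy formula: estimated reliability of the
    # mean rating across k classes, given single-class reliability r1.
    return k * r1 / (1 + (k - 1) * r1)

# Starting from a single-class reliability of .60:
for k in (1, 2, 7, 15):
    print(k, round(projected_reliability(0.60, k), 2))
# prints 0.6, 0.75, 0.91, 0.96 for 1, 2, 7, and 15 classes
```

Note how a .60 single-class reliability already clears .70 with two classes and .90 with seven, consistent with the pattern Benton et al. (2015) report.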
Summary of Reliability Evidence
Substantial evidence exists to support both class-level and instructor-level reliability for IDEA items. As class size and the number of classes rated increase, faculty and administrators can gain greater confidence in the reliability of the ratings. Read the full report here.
Hoyt, D. P., & Lee, E. (2002). IDEA Technical Report No. 12: Basic data for the revised IDEA system. Kansas State University, Manhattan, KS: The IDEA Center.