
For several years we debated the need for substantive reform of our first-year curriculum. NSSE results provided evidence to persuade the faculty to change, and helped inform the new curriculum that we are now implementing.

Richard F. Vaz, Dean for Interdisciplinary and Global Studies, Worcester Polytechnic Institute

Reliability


Introduction to NSSE Reliability

Reliability is an important indicator of an instrument’s psychometric quality. It is the degree to which a set of items consistently measures the same thing across respondents and institutional settings. Another characteristic of a reliable instrument is stability, the degree to which students respond in similar ways at two different points in time. One approach to measuring stability is test-retest, in which the same students are asked to complete NSSE two or more times within a reasonably short period. Very few large-scale survey instruments have test-retest information available because of the substantial expense and effort needed to obtain it. Collecting test-retest data is particularly challenging for a national study of college students conducted during the spring term: administering the original survey takes time, which leaves little of the term in which to locate respondents again and persuade them to complete the instrument a second time.

Estimating the stability aspect of reliability is problematic in two other ways. First, the student experience is something of a moving target; for some students, a month can make a non-trivial difference in how they respond to some items because of what has transpired between the first and second administrations of the survey. Second, attempts to estimate the stability of an instrument assume that the items have not been changed or reworded. To improve the validity and reliability of NSSE, minor edits and item substitutions have been made prior to each administration. We will return to these points later.

Two additional pertinent indicators are estimates of skewness and kurtosis. Skewness represents the extent to which scores are bunched toward the upper or lower end of a distribution, while kurtosis indicates the extent to which a distribution of scores is relatively flat or relatively peaked. Values ranging from approximately -1.00 to +1.00 on these indicators are generally regarded as evidence of normality. For some items, out-of-range skewness values can be expected. One example is participating in a community-based project as part of a regular course, where a combination of factors (major, course selection, faculty interest) means that relatively few students will give a response other than “never.”
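As an illustration of the rule of thumb above, the following is a minimal sketch of how skewness and kurtosis might be computed for a single item. It assumes moment-based (population) estimators and reports kurtosis as excess kurtosis (kurtosis minus 3), so that a normal distribution scores zero on both indicators. The function names, the 4-point response coding, and the response counts are all hypothetical and are not taken from NSSE data or from the procedures NSSE actually used.

```python
def central_moment(values, order):
    """Return the population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: third central moment over variance^1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution gives 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def within_rule_of_thumb(stat, bound=1.0):
    """Check the approximate -1.00 to +1.00 range described in the text."""
    return -bound <= stat <= bound


# Hypothetical responses coded 1 = never ... 4 = very often (assumed coding).
symmetric_item = [1] * 10 + [2] * 40 + [3] * 40 + [4] * 10
mostly_never_item = [1] * 85 + [2] * 10 + [3] * 3 + [4] * 2

for name, item in [("symmetric", symmetric_item),
                   ("mostly never", mostly_never_item)]:
    s, k = skewness(item), excess_kurtosis(item)
    print(f"{name}: skewness={s:.2f} (in range: {within_rule_of_thumb(s)}), "
          f"excess kurtosis={k:.2f} (in range: {within_rule_of_thumb(k)})")
```

In this made-up data, the item on which most students answer “never” is strongly right-skewed and falls outside the range, which is the pattern the paragraph above describes for the community-based project item.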

Internal Consistency of NSSE Benchmarks

2009
2006
2005
NSSE Benchmark and Scale Items Internal Consistency and Intercorrelation

Student-level Test-retest Analysis

Assuming little variation in an individual student’s behavior between the test and the retest, we would expect consistent, or reliable, responses to the survey items. In 2002, we conducted a test-retest analysis using 1,226 respondents who completed the same form of the paper survey twice over a period of several months. For the items related to three of the benchmarks (level of academic challenge, active and collaborative learning, and enriching educational experiences), the reliability coefficient was 0.74 in each case. For the items related to student-faculty interaction and to supportive campus environment, the reliability coefficients were 0.75 and 0.78, respectively. In 2005, we conducted the study again using 1,536 respondents who completed the paper or Web survey twice within a period of several months. The results were similar to those of the earlier study, with reliability coefficients ranging from 0.69 (level of academic challenge) to 0.74 (enriching educational experiences). The following table shows the test-retest results from the 2002 and 2005 NSSE survey administrations. These findings suggest little variation in student responses from one testing period to the next.

Benchmarks Test-retest Correlations

Benchmark                            2002     2005
Level of Academic Challenge          0.74     0.69
Active and Collaborative Learning    0.74     0.72
Student-Faculty Interaction          0.75     0.70
Enriching Educational Experiences    0.74     0.74
Supportive Campus Environment        0.78     0.70
N                                    1,226    1,536
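For readers who want to see how a coefficient like those in the table could be obtained, here is a minimal sketch. It assumes the test-retest coefficient is a Pearson correlation between each student's benchmark score on the first and second administrations. The source does not state which coefficient was used, so this is an assumption, and the function name and scores below are invented for illustration only.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sums of cross-products and squared deviations around each mean.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    return cov / sqrt(var_a * var_b)


# Hypothetical benchmark scores for six students (not NSSE data).
test_scores = [40.0, 55.0, 62.0, 48.0, 70.0, 35.0]
retest_scores = [44.0, 50.0, 65.0, 52.0, 66.0, 41.0]

r = pearson_r(test_scores, retest_scores)
print(f"test-retest correlation: {r:.2f}")
```

A value near 1 would indicate that students kept nearly the same relative standing across the two administrations, while lower values indicate more change in responses between them.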


Institution-level Stability Analysis
