January 8, 2003

EVI-2 Report

This report covers the EVI-2 administration in November 2002.  There were 52 pre-tests and 57 post-tests administered and completed.  A total of 47 students completed both the pre- and post-test. Because more than thirty participants had both pre- and post-test scores, repeated measures t-tests were conducted on the scale scores to assess whether scores increased significantly after participation in the program. Descriptive statistics for each subscale and for the total score can be found in Table 1 below, with the results of the repeated measures t-tests in the right-hand column of that table.
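As an illustration of the repeated measures t-test described above, the following is a minimal sketch rather than the procedure actually used for this report. The function name paired_t and the five pairs of scores are hypothetical and are not EVI-2 data. The statistic is the mean of the post-minus-pre differences divided by its standard error.

```python
import math
from statistics import mean, stdev

def paired_t(pre, post):
    """Repeated measures (paired) t statistic for matched pre/post scores."""
    if len(pre) != len(post):
        raise ValueError("pre and post must hold one score per student")
    diffs = [b - a for a, b in zip(pre, post)]   # post minus pre for each student
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)             # standard error of the mean difference
    return mean(diffs) / se, n - 1               # t statistic and degrees of freedom

# Hypothetical scores for five students (illustration only, not EVI-2 data)
pre = [6, 7, 5, 6, 8]
post = [7, 8, 5, 8, 8]
t, df = paired_t(pre, post)
print(f"t({df}) = {t:.3f}")
```

Only students with both scores enter the calculation, which is why the analyses in Table 1 are based on 47 students rather than on all pre-test or post-test takers.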

There was a significant increase in scores on the Integrity, Values, and Ethics subscales.  There was no significant difference from pre- to post-test on the Community subscale.  There was also a significant increase in the total scores from pre- to post-test.

Table 2 provides descriptive statistics for each item at both pre- and post-test.  Because each item is scored as correct (1) or incorrect (0), the mean for an item is the proportion of students answering it correctly; multiplying the mean by 100 gives the percentage.
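The relationship between an item mean and the percentage correct can be sketched as follows. This is an illustration under stated assumptions: the function name item_summary is hypothetical, and the count of 38 correct responses out of 47 is an assumed value, not a figure stated in this report.

```python
import math

def item_summary(responses):
    """Mean, percent correct, and sample SD for one item scored 0/1."""
    n = len(responses)
    p = sum(responses) / n                      # item mean = proportion correct
    sd = math.sqrt(p * (1 - p) * n / (n - 1))   # sample SD of a 0/1 variable
    return p, 100 * p, sd

# Assumed example: 38 of 47 students answer the item correctly
responses = [1] * 38 + [0] * 9
p, pct, sd = item_summary(responses)
print(f"mean = {p:.2f}, percent correct = {pct:.0f}%, SD = {sd:.3f}")
```

For a 0/1 item the standard deviation depends only on the proportion correct and the number of students, so an item that every student answers correctly has a standard deviation of zero.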

Reliability estimates (Cronbach's coefficient alpha) were computed for the scores from both the subscales and the total test at both pre- and post-testing occasions. The results of these analyses are in Table 3.  Reliability is the degree to which the items representing a construct are answered consistently. Reliability, in this case internal consistency, must be assessed if one intends to compute summated scores.  In the current situation, the EVI-2 was designed to produce a total score and four subscale summated scores.  Therefore, the consistency of the responses to the items representing each summated scale score must be assessed.  If reliability is poor, one should not compute a summated scale score because the items used to compute the score are not answered consistently.  The more reliable a test, the more confidence we can have that the scores obtained from the administration of the test are essentially the same scores that would be obtained if the test were readministered.  Reliability is expressed numerically, usually as a coefficient that ranges from 0 to 1.00.  A high coefficient indicates high reliability or consistency.  If the test scores were perfectly reliable, the coefficient would be 1.00.  However, test scores are seldom perfectly reliable.  Scores can be affected by errors in measurement or characteristics of the test itself (e.g., ambiguous items, items with no variance).  In general, reliabilities above .70 are considered adequate for program evaluation or research.  For evaluation of individual students, reliabilities closer to .80 are desirable.  Examining the results in Table 3, we can see that scores from the total test can be considered reliable (.72 at pre-test and .84 at post-test).  However, the reliability estimates of the subscale scores are low to moderate (.38 to .48 at pre-test and .48 to .68 at post-test).
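For readers who want to see how coefficient alpha is obtained, the following is a minimal sketch using the standard formula: alpha = k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the summated score), where k is the number of items. The function name cronbach_alpha and the response data are hypothetical; the estimates in Table 3 were produced with statistical software, not with this code.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: one row per student, each row holding that student's k item scores."""
    k = len(scores[0])
    # Sample variance of each item across students
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the summated (total) score
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 0/1 responses from six students on four items (not EVI-2 data)
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.4f}")
```

The formula shows why items with no variance cannot help: they add nothing to the variance of the summated score, which is where consistent responding shows up.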

To get a better picture of why the reliability estimates are low to moderate for the subscale scores, the correlation matrices, the item-total correlations, and the reliability value if an item were removed from the scale should be examined. These are presented in Appendix A.  When examining the correlation matrices, negative correlations and extremely low correlations should be noted. They indicate that an item is negatively, or very weakly, related to another item that represents the construct. If two items actually represent the same construct, one would expect them to have a moderate to strong positive relationship. Along the same lines, the item-total correlation indicates how strongly the corresponding item is correlated with the total of all the items representing the subscale. If all the items consistently represent the construct, one would expect these item-total correlations to be positive and of at least moderate magnitude (i.e., at least .40). Finally, the column entitled Alpha if Item Deleted indicates what the reliability of the subscale score would be if the corresponding item were deleted. This should be compared to the actual Cronbach's coefficient alpha value (called Alpha in Appendix A). If the value of alpha is higher when the item is deleted, this indicates that the item isn't answered in the same manner as the other items representing the scale.
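The two item-level statistics described above can be sketched as follows. This is an illustration, not the software output shown in Appendix A: the function names and the response data are hypothetical, and the item-total correlation is computed in its corrected form, which correlates each item with the total of the remaining items.

```python
import math
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation; returns nan when either variable has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def cronbach_alpha(scores):
    """Coefficient alpha for rows of item scores (one row per student)."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def item_analysis(scores):
    """Return (corrected item-total r, alpha if item deleted) for each item."""
    k = len(scores[0])
    results = []
    for j in range(k):
        item = [row[j] for row in scores]
        # Drop item j, then total the remaining items for each student
        rest = [[v for i, v in enumerate(row) if i != j] for row in scores]
        rest_total = [sum(row) for row in rest]
        results.append((pearson(item, rest_total), cronbach_alpha(rest)))
    return results

# Hypothetical 0/1 responses; the fourth item runs against the other three
scores = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
full_alpha = cronbach_alpha(scores)
results = item_analysis(scores)
for j, (r, a) in enumerate(results, start=1):
    print(f"item {j}: item-total r = {r:.4f}, alpha if deleted = {a:.4f}")
```

With these invented responses, the fourth item has a negative corrected item-total correlation, and alpha is higher when that item is dropped than when all four items are kept. This is the same pattern used to flag problem items in the paragraphs that follow.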

Looking at the correlation matrix for the Integrity subscale, it is apparent that items 9, 27, and 34 are negatively correlated with several other items representing the subscale. In addition, the Item-Total Correlation column shows that all three of these items have low values (all below .14).  Further inspection of the Alpha if Item Deleted column reveals that removing either item 27 or item 34 would increase the reliability estimate of the scores from .4761 to the .50 range (.5032 and .5227, respectively). In fact, if all three of these items were removed from the scale, the reliability of the Integrity scores would equal .56 (this information was gathered from re-running the analysis with these three items removed).  This evidence suggests that these three items may need inspection to see whether the wording of the items or the distracters could be contributing to their negative correlations with the other items in the subscale.

In the correlation matrix for the Values subscale, items 6, 7, 8, and 31 have several negative or extremely weak correlations with other items representing the subscale. In fact, item 6 has serious problems, as it is negatively correlated with ten of the other eleven items representing the subscale. Therefore, it is no surprise that it has a negative item-total correlation (-.05). The negative and weak correlations between items 7, 8, and 31 and other items on the scale produce low item-total correlations for these three items (.09, .18, and .22).  Finally, the Alpha if Item Deleted column indicates that the reliability estimate of the scores would increase if item 6 or item 7 were removed and would remain essentially the same if item 8 were removed. In fact, if items 6, 7, and 8 are all removed, the reliability increases to .70.

Problem items were also easily identified for the Ethics subscale. Specifically, items 24, 28, 32, 36, 38, and 39 had several negative or extremely weak relationships with other items representing the subscale. This is reflected in the low or negative item-total correlations for these items. If item 24, 32, 38, or 39 were deleted, reliability would increase (if all four are deleted it increases to .60; if item 36 is also deleted it increases to .61).

For the Community subscale, items 25 and 41 have several negative correlations with other items representing this construct. They, in turn, have negative item-total correlations. The Alpha if Item Deleted column reveals that if item 25 were removed from the subscale, the reliability estimate of the scores would increase from .5233 to .6500, which is a substantial increase. In addition, it indicates that reliability would increase if item 41 were deleted. In fact, reliability increases to .69 when both items are removed from the subscale.

Among the items listed above (6, 7, 8, 9, 24, 25, 27, 28, 31, 32, 34, 36, 38, 39, 41), a few have little or no variance at post-test (e.g., 6, 7, 27; see Table 2).  Because reliability is a function of the variance of the responses to items, low variance results in low reliability. Specifically, all or almost all students are responding correctly to these three items at post-test. Obviously, this is a good thing if the program promotes understanding of the concepts these items represent (even though it does decrease the estimate of reliability). However, notice that the majority of the students understood these concepts before completing the program (high percentage passing at pre-test). This calls into question the necessity of these items for evaluating the effectiveness of the program.

At this point, the items that functioned poorly for each subscale should be examined by content/program experts to try to identify the cause of the problem (poorly worded item, item doesn't actually represent the construct, confusing options, etc.). For example, items 9, 27, and 34 of the Integrity scale caused concern. As noted above, item 27 had no variance at post-test, which in turn decreases its relationship with other items representing the scale. That is fairly easy to diagnose. Item 9, however, had extremely weak relationships with other items, and this doesn't seem to be due to a lack of variance. This item was not answered in the same manner as the other items representing the Integrity construct (demonstrated by the weak relationships with other items). The same can be said for item 34. The question that program directors need to answer is why. Obviously, there is no right or wrong answer to this question, but if these items continue to function poorly, something should be done. What that something is depends on how the problem is diagnosed.

The reliability of the scores from the Values subscale looks close to adequate (.68 at post-test, just below the .70 guideline). In fact, the problems with the most troublesome items (6 and 7) are simply a function of little or no variance (nearly everyone answers these questions correctly at post-test). As noted above, given that students come into the program with this knowledge, a decision should be made as to whether these items are necessary to cover the breadth of this construct.

There were several items that caused problems for the Ethics subscale. Again, you should go through each of these items to try to diagnose the problem. For example, item 38 displays some interesting characteristics. There is variability in the responses (see Table 2), so the problem is not a function of low variance. The frequency distributions in Appendix B suggest that students are split between b, the correct answer, and c, one of the distracters. Therefore, one possible explanation for the negative correlations with other items on the scale is that students who answered the other items correctly may have answered this item incorrectly because they felt that c was the best definition.

As a final example, item 25 was one of the problem items representing the Community scale. However, on the surface, the wording of the item appears to fit in the Values subscale. Each item should clearly represent the construct of interest with limited overlap with other constructs. This may be an issue with this item.

It should be restated that the reliability estimates reviewed above are for the post-test scores.  Analyses were also conducted for the pre-test scores, and they indicated that items 6, 25, and 39 present the same concern as they did on the post-test.  However, at pre-test, items 11, 12, 28, 29, and 32 also have negative correlations with the total subscale scores.  To summarize our reliability analyses, it seems that one can confidently make inferences about program effectiveness using the total score; however, we advise caution in using the subscale scores.  If the instrument was designed to obtain a student's profile in terms of integrity, values, ethics, and community, we highly encourage further work on this measure.

Appendix B contains the frequency distributions for the unscored items both before and after the program (make sure to examine the Valid Percent column). These items were left unscored so one could see which distracter options were being chosen most often at each time period (the option with the * next to it is the correct answer). This can help identify misconceptions students bring to the program and those they leave with. Interestingly, there are several items that a majority of students answer correctly at pre-test.  This indicates that they come into the program with this knowledge.  If 70% or more of the students answered an item correctly at pre-test, they were considered to have prior knowledge of the item.  Items showing this effect are items 1-12, 14, 17, 24, 26, 27, 28, 31, 32, 34-37, 39-44, 46, and 47 (32 of the 47 items, or 68%). This indicates that students know the majority of the concepts before completing the program. This information can be useful in program development or redesign. Interestingly, for item 19, fewer students get the item correct at post-test than at pre-test.
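The 70% rule described above can be sketched as a simple filter. The function name prior_knowledge_items, the item labels, and the percentages below are all hypothetical and are not taken from Appendix B; only the 70% cutoff comes from this report.

```python
def prior_knowledge_items(pretest_pct_correct, threshold=70.0):
    """Return the items whose pre-test valid percent correct meets the threshold."""
    return sorted(
        item for item, pct in pretest_pct_correct.items() if pct >= threshold
    )

# Hypothetical pre-test valid percents for five invented items
pretest_pct_correct = {
    "item_a": 85.0,
    "item_b": 42.0,
    "item_c": 70.0,
    "item_d": 68.5,
    "item_e": 96.0,
}
flagged = prior_knowledge_items(pretest_pct_correct)
share = 100 * len(flagged) / len(pretest_pct_correct)
print(flagged, f"{share:.0f}% of items")
```

An item exactly at the cutoff is counted as prior knowledge here, matching the wording "70% or more".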

Finally, Table 4 displays the results for the demographic items (48-51).  According to these results, most students taking this administration of the EVI-2 are males, sophomores, and have not completed the By the Numbers or the Calling the Shots sanctions.


Table 1

Subscale Means, Standard Deviations, and t-values

Scale          Pre-Test Mean (SD)     Post-Test Mean (SD)     Number of Subjects     t-value, p

Integrity       6.1277  (1.32889)      6.7872  (1.33410)              47             3.028, p=.004

Values          9.2128  (1.68027)     10.2979  (1.87564)              47             3.944, p<.001

Ethics         11.3191  (1.95722)     13.3617  (2.35377)              47             3.660, p=.001

Community       7.5745  (1.26396)      7.9149  (1.26542)              47             1.563, p=.125

Total          35.1489  (4.42807)     39.2979  (5.83809)              47             5.833, p<.001

Items in each scale:

Integrity      items 2, 5, 9, 15, 17, 21, 27, 34

Values         items 1, 4, 6, 7, 8, 14, 20, 22, 23, 26, 31, 45

Ethics         items 11, 13, 16, 18, 19, 24, 28, 29, 30, 32, 33, 35, 36, 38, 39, 40, 46

Community      items 3, 10, 12, 25, 37, 41, 42, 43, 44, 47


Table 2

Item Means and Standard Deviations

Item Number     Pre-Test Mean (SD)     Post-Test Mean (SD)     Number of Subjects

 1                 .81   (.398)            .85   (.360)               47
 2                 .83   (.383)            .85   (.360)               47
 3                 .92   (.282)            .94   (.247)               47
 4                 .85   (.360)            .91   (.282)               47
 5                 .77   (.428)            .96   (.204)               47
 6                 .87   (.337)           1.00   (.000)               47
 7                 .96   (.204)            .98   (.146)               47
 8                 .98   (.146)            .87   (.337)               47
 9                 .83   (.383)            .81   (.398)               47
10                 .89   (.312)            .94   (.247)               47
11                 .77   (.428)            .89   (.312)               47
12                 .96   (.204)            .94   (.247)               47
13                 .40   (.496)            .75   (.441)               47
14                 .92   (.282)            .94   (.247)               47
15                 .64   (.486)            .79   (.414)               47
16                 .53   (.504)            .72   (.452)               47
17                 .96   (.204)            .83   (.380)               47
18                 .34   (.479)            .75   (.441)               47
19                 .70   (.462)            .55   (.503)               47
20                 .38   (.491)            .62   (.491)               47
21                 .53   (.504)            .72   (.452)               47
22                 .30   (.462)            .66   (.479)               47
23                 .72   (.452)            .91   (.285)               47
24                 .89   (.312)            .89   (.312)               47
25                 .47   (.504)            .60   (.496)               47
26                 .81   (.398)            .85   (.360)               47
27                 .85   (.360)           1.00   (.000)               47
28                 .85   (.360)            .85   (.360)               47
29                 .11   (.312)            .57   (.500)               47
30                 .49   (.505)            .68   (.471)               47
31                 .92   (.282)            .92   (.282)               47
32                 .87   (.337)            .85   (.363)               47
33                 .68   (.471)            .85   (.360)               47
34                 .89   (.312)            .83   (.380)               47
35                 .94   (.247)            .98   (.146)               47
36                 .92   (.282)            .94   (.247)               47
37                 .89   (.312)            .98   (.146)               47
38                 .45   (.503)            .60   (.496)               47
39                 .85   (.360)            .94   (.247)               47
40                 .79   (.414)            .77   (.428)               47
41                 .89   (.312)            .94   (.247)               47
42                 .94   (.247)            .89   (.312)               47
43                 .87   (.337)            .89   (.312)               47
44                 .89   (.315)            .87   (.337)               47
45                 .70   (.462)            .81   (.398)               47
46                 .75   (.441)            .81   (.398)               47
47                 .79   (.414)            .87   (.337)               47


Table 3

Reliability of Total Test and Subscales

                        Pre-Test          Post-Test

Integrity Subscale      alpha=.3826       alpha=.4761

Values Subscale         alpha=.4835       alpha=.6765

Ethics Subscale         alpha=.3822       alpha=.5344

Community Subscale      alpha=.4517       alpha=.5233

Total Test              alpha=.7152       alpha=.8393


Table 4

Results of Demographics Questions 48-51

For these questions, the number of students who responded at pre-test and at post-test is reported for each response option.  The Combined columns are the results for those who responded at both pre-test and post-test.  Some students did not answer these questions or used a response that is not among the listed choices.

Gender

            Pre-Test (N)   Pre-Test %   Post-Test (N)   Post-Test %   Combined (N)   Combined %

Male             40           78.4           39            70.9            35           76.1

Female           11           21.6           16            29.0            11           23.9

Total            51          100.00          55           100.00           46          100.00

Class Level

            Pre-Test (N)   Pre-Test %   Post-Test (N)   Post-Test %   Combined (N)   Combined %

Freshman         14           27.5           12            21.4            11           23.9

Sophomore        27           52.9           29            51.8            26           56.5

Junior            7           13.7            9            16.1             6           13.0

Senior            3            5.9            6            10.7             3            6.6

Total            51          100.00          56           100.00           46          100.00

Completion of By the Numbers

            Pre-Test (N)   Pre-Test %   Post-Test (N)   Post-Test %   Combined (N)   Combined %

Yes              11           22.0           12            22.6            10           22.2

No               39           78.0           41            77.4            35           77.8

Total            50          100.00          53           100.00           45          100.00

Completion of Calling the Shots

            Pre-Test (N)   Pre-Test %   Post-Test (N)   Post-Test %   Combined (N)   Combined %

Yes               3            6.1            3             5.6             3            6.7

No               46           93.9           51            94.4            42           93.3

Total            49          100.00          54           100.00           45          100.00


Appendix A

Item-Total Correlations and Correlation Matrices for Subscales


Integrity Subscale

Correlation Matrix

        I2P        I5P       I9P        I15P        I17P       I21P     I27P    I34P

I2P     1.0000

I5P      .0532    1.0000

I9P      .3791     .0358    1.0000

I15P     .2869    -.0080     .1304     1.0000

I17P    -.0536     .1831    -.0732      .2144     1.0000

I21P     .1263     .2918     .0849      .3417      .4341    1.0000

I27P    -.0771     .2779    -.0826     -.0985      .0880     .0596   1.0000

I34P     .0792     .0200    -.0732      .2144     -.0915    -.0492   -.0880    1.0000  

Item-total Statistics

               Scale          Scale       Corrected

               Mean         Variance       Item-               Alpha

              if Item        if Item       Total              if Item

              Deleted        Deleted    Correlation           Deleted

I2P            5.8070         1.4799        .2704              .4203

I5P            5.7544         1.5815        .2374              .4377

I9P            5.8246         1.5758        .1324              .4750

I15P           5.8772         1.2882        .4027              .3512

I17P           5.8421         1.4925        .2065              .4457

I21P           6.0175         1.1604        .4252              .3244

I27P           5.7018         1.8202       -.0425              .5032

I34P           5.8421         1.6711        .0152              .5227

Alpha=.4761

Values Subscale

Correlation Matrix

         I1P       I4P       I6P       I7P       I8P

I1P      1.0000

I4P       .2831    1.0000

I6P      -.0550    -.0374    1.0000

I7P       .1964    -.0534    -.0259    1.0000

I8P       .0236     .1281    -.0467    -.0667    1.0000

I14P      .3561     .2419    -.0321    -.0458     .1740

I20P      .0896     .2028     .1676     .0422     .0760

I22P      .1386     .2406    -.0966     .0653     .1176

I23P     -.1278     .3995    -.0422     .2772     .0940

I26P      .3771     .2562    -.0590    -.0842     .1628

I31P     -.1132     .1923    -.0374    -.0534     .1281

I45P      .1835     .3864    -.0667     .1470     .1194

         I14P      I20P      I22P      I23P      I26P     I31P     I45P

I14P     1.0000

I20P      .2958    1.0000

I22P      .1645     .4275    1.0000

I23P      .2036     .2610     .4369    1.0000

I26P      .3278     .1458     .0972     .2040    1.0000

I31P     -.0660     .3448     .3870     .1563    -.1214   1.0000

I45P      .0820     .2465     .2153     .3181     .2732    .0374    1.0000

Item-total Statistics

               Scale          Scale     

               Mean         Variance       Item-           Alpha

              if Item        if Item       Total          if Item

              Deleted        Deleted    Correlation       Deleted

I1P            9.4643         3.0169        .2583          .6665

I4P            9.3929         2.9701        .4698          .6385

I6P            9.3393         3.4646       -.0483          .6886

I7P            9.3571         3.3610        .0908          .6815

I8P            9.4286         3.1584        .1826          .6764

I14P           9.3750         3.1114        .3686          .6538

I20P           9.7143         2.4987        .4601          .6286

I22P           9.6607         2.5192        .4688          .6261

I23P           9.4107         2.9373        .4444          .6391

I26P           9.4821         2.9088        .3262          .6552

I31P           9.3929         3.1883        .2183          .6701

I45P           9.5179         2.7633        .4010          .6411

Alpha=.6765

Ethics Subscale

Correlation Matrix

         I11P      I13P      I16P      I18P        I19P

I11P     1.0000

I13P     -.0044    1.0000

I16P      .0273     .1771    1.0000

I18P      .1526     .1771     .0418    1.0000

I19P      .1336     .1307     .2045     .0341      1.0000

I24P      .2588    -.2025     .1056     .1056       .1291

I28P     -.0214     .0413     .1928     .1928       .1405

I29P      .3240     .0822     .0750     .1591       .2544

I30P      .0987     .0914     .2414     .1511       .0161

I32P      .0179     .2359    -.0979     .2778       .0223

I33P     -.0386     .3208    -.1672     .0492       .0000

I35P      .3563     .2125    -.0795     .2329       .1667

I36P     -.0917     .0224     .2272     .0434       .1307

I38P      .0367    -.0914     .1421    -.2015       .1222

I39P      .1031    -.1794    -.1637    -.1637      -.0857

I40P      .0624     .2432    -.2076     .0955       .1078

I46P     -.0546     .1801     .2296     .0209       .1485

          I24P      I28P      I29P      I30P        I32P

I24P     1.0000

I28P      .0311    1.0000

I29P      .2319     .2054    1.0000

I30P      .0622    -.0831     .2842    1.0000

I32P     -.1208    -.0214    -.0060    -.0193      1.0000

I33P     -.1491     .0463     .3457     .1947      -.0386

I35P     -.0430    -.0602     .1547     .2035       .3563

I36P     -.0760    -.1062     .1115     .0126      -.0917

I38P      .0118    -.3476    -.0123     .0412      -.1878

I39P     -.0886     .0654     .0359    -.1873      -.1069

I40P      .1392     .2423     .2453     .2182       .0624

I46P     -.1581     .0246     .0183     .0590      -.0546

         I33P      I35P      I36P      I38P        I39P

I33P     1.0000

I35P     -.0642    1.0000

I36P     -.1132    -.0327    1.0000

I38P     -.0794    -.1069     .3056    1.0000

I39P      .0495    -.0381    -.0673     .0681      1.0000

I40P      .3217     .2576    -.1269    -.2339       .1911

I46P      .2357     .2722     .0801    -.0187       .2100

     Item-total Statistics

               Scale          Scale     

               Mean         Variance       Item-           Alpha

              if Item        if Item       Total          if Item

              Deleted        Deleted    Correlation       Deleted

I11P          12.5455         4.8451        .2206         .5132

I13P          12.7091         4.5434        .2719         .4993

I16P          12.6727         4.7057        .2023         .5150

I18P          12.6727         4.6687        .2226         .5106

I19P          12.8182         4.4108        .3032         .4905

I24P          12.5091         5.1064        .0719         .5355

I28P          12.5818         4.9515        .1167         .5308

I29P          12.8545         4.0896        .4667         .4468

I30P          12.7273         4.5354        .2678         .5000

I32P          12.5455         5.1414        .0199         .5451

I33P          12.6000         4.7630        .2180         .5123

I35P          12.4364         5.0653        .3317         .5173

I36P          12.4727         5.1428        .0862         .5327

I38P          12.8000         5.2741       -.1020         .5840

I39P          12.4909         5.2916       -.0626         .5510

I40P          12.6364         4.6061        .2823         .4986

I46P          12.6182         4.7589        .2061         .5144

Alpha=.5344


Community Subscale

Correlation Matrix

         I3P         I10P        I12P        I25P        I37P

I3P      1.0000

I10P      .2963      1.0000

I12P      .2963       .6481      1.0000

I25P     -.0255       .1359       .1359      1.0000

I37P      .5669       .5669       .5669      -.1059      1.0000

I41P     -.0648      -.0648      -.0648      -.0767      -.0367

I42P      .2046       .2046       .2046      -.2458       .4309

I43P      .1512       .1512       .1512      -.1868       .3571

I44P      .1512       .1512       .1512      -.0771       .3571

I47P      .1310       .1310       .1310       .0946       .3307

         I41P        I42P        I43P        I44P        I47P

I41P     1.0000

I42P     -.0852      1.0000

I43P      .3157       .2619      1.0000

I44P     -.1028       .2619       .3486      1.0000

I47P      .0867       .0532       .1566       .1566      1.0000

Item-total Statistics

               Scale          Scale     

               Mean         Variance       Item-           Alpha

              if Item        if Item       Total          if Item

              Deleted        Deleted    Correlation       Deleted

I3P            7.9474         1.4793        .3156         .4784

I10P           7.9474         1.4079        .4571         .4456

I12P           7.9474         1.4079        .4571         .4456

I25P           8.2807         1.5627       -.0822         .6500

I37P           7.9123         1.4743        .6565         .4509

I41P           7.9649         1.6416       -.0076         .5531

I42P           7.9825         1.4818        .2011         .5028

I43P           8.0175         1.3390        .3320         .4601

I44P           8.0175         1.3747        .2816         .4774

I47P           8.0351         1.3559        .2749         .4793

Alpha=.5233


Appendix B

Frequency Tables for Items 1-47

The letter P after the item number indicates that this is a post-test item.  The asterisk indicates the correct response. Examine the valid percent column as it does not include missing data.