Implementation Fidelity

Once you've developed an intentional, evidence-based program, it is important to evaluate the extent to which the program is actually implemented as intended. This requires collecting what is known as implementation fidelity data. During this step of the process, you will need to consider questions such as:

Is the program implemented with high quality?
Are students actively engaged in the program?
Do facilitators adhere to the program outline?
If the program is not implemented as intended, what conclusions (if any) can be drawn about program effectiveness?

[Figure: The Assessment Cycle with step 4 highlighted.]

What is Implementation Fidelity?

Implementation Fidelity (IF) refers to the degree to which a program is delivered as intended. When we develop programs, we often assume they will be implemented exactly as we’ve planned. In other words, we assume our programs will be implemented with high fidelity.

For many reasons, however, this may not be the case. Facilitators may (intentionally or unintentionally) stray from the curriculum, activities may run long or be cut short, or participants may be unengaged. Unless we as program developers personally implement every aspect of the program, we cannot know what programming students actually receive. This lack of information about the actual or delivered program is often referred to as the “black box” of outcomes assessment.

[Figure: Black Box]

Why do we care about IF?

The degree to which a program is delivered as intended has implications for how we interpret assessment results. If assessment results are positive (meaning students achieved the stated student learning outcomes, or SLOs), it is tempting to conclude the program was effective. Conversely, if results are negative, it is easy to conclude the program was ineffective. However, both of these conclusions are based on an untested assumption that the delivered program was exactly the same as the intended program. If the program was not delivered as intended (for example, the facilitator skipped an activity), then assessment results are not an accurate reflection of the program’s effectiveness. Luckily, by gathering implementation fidelity data, we can evaluate the degree of alignment between the delivered and intended program.

When should we collect IF data?

If resources allow, implementation fidelity data can be collected alongside student learning outcomes data. For example, you could administer a pretest of the learning outcomes, use an implementation fidelity checklist to document how well the program was implemented, then follow up with a posttest of the learning outcomes.

It is also possible (and sometimes recommended) to collect implementation fidelity data before collecting outcomes data. The logic behind this is simple: if a program is being implemented with low fidelity, it would be a waste of resources to collect outcomes data. By examining implementation fidelity first, you can address implementation issues before devoting the resources to outcomes assessment.

Collecting Implementation Fidelity Data

Implementation Fidelity Checklists

Implementation fidelity data can be collected using a chart or checklist (Swain, Finney & Gerstner, 2013). See the example below:

A strong implementation fidelity checklist typically has some (or all) of the following components:

Program Differentiation: Program differentiation involves articulating the program's SLOs and identifying the program features believed to facilitate mastery of each outcome. This is the first (and most important) step in developing a good implementation fidelity checklist. For example, in the checklist above, a single SLO is highlighted. This outcome is then mapped to a broad program component, which is subsequently separated into three specific program features.

Adherence: Adherence is the most basic information that can be collected about a specific program feature. Here we simply ask, was the intended programming delivered or not? This is generally a yes/no question.

Exposure: We may also want to ask whether students were exposed to the full program. This is typically answered by specifying a priori how much time each program feature is expected to take, then recording the actual time spent.

Quality: Broadly speaking, quality refers to how well a program feature was delivered or implemented. Evaluators may be trained to judge a number of qualities, including (but not limited to) clarity, conciseness, charisma, and facilitation skills. In short, "quality" can be operationalized in many different ways. In the example above, quality is rated on a scale of 1 (low quality) to 3 (high quality).

Responsiveness: While most of the other checklist components are used to evaluate the performance of program facilitators, responsiveness ratings capture the degree to which students are engaged or actively participating. In the example above, responsiveness is rated on a scale of 1 (low engagement) to 3 (high engagement).
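The components above can be recorded in a simple data structure, one row per program feature. The following is a minimal sketch rather than a prescribed format: the class name `FeatureRecord`, the function `summarize`, the summary statistics, and the sample rows are all hypothetical and are not taken from the checklist in Swain, Finney & Gerstner (2013). Only the yes/no adherence question, the planned-versus-actual time comparison, and the 1 to 3 rating scales come from the text.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class FeatureRecord:
    """One checklist row: a single program feature mapped to an SLO."""
    slo: str                  # Program differentiation: the outcome targeted
    feature: str              # Program differentiation: the specific feature
    adhered: bool             # Adherence: was the feature delivered (yes/no)?
    planned_minutes: float    # Exposure: time specified a priori
    actual_minutes: float     # Exposure: time actually spent
    quality: Optional[int] = None         # 1 (low quality) to 3 (high quality)
    responsiveness: Optional[int] = None  # 1 (low) to 3 (high engagement)

    def __post_init__(self) -> None:
        # Ratings are optional, but when present they must be on the 1-3 scale.
        for name in ("quality", "responsiveness"):
            rating = getattr(self, name)
            if rating is not None and rating not in (1, 2, 3):
                raise ValueError(f"{name} must be 1, 2, or 3, got {rating}")


def summarize(records: List[FeatureRecord]) -> dict:
    """Aggregate checklist rows into a few illustrative (assumed) summaries."""
    if not records:
        raise ValueError("at least one checklist row is required")
    planned = sum(r.planned_minutes for r in records)
    if planned <= 0:
        raise ValueError("total planned time must be positive")
    quality = [r.quality for r in records if r.quality is not None]
    engaged = [r.responsiveness for r in records if r.responsiveness is not None]
    return {
        "adherence_rate": sum(r.adhered for r in records) / len(records),
        "exposure_ratio": sum(r.actual_minutes for r in records) / planned,
        "mean_quality": mean(quality) if quality else None,
        "mean_responsiveness": mean(engaged) if engaged else None,
    }


# Hypothetical data for one SLO split into three program features.
records = [
    FeatureRecord("SLO 1", "Opening discussion", True, 10, 10, 3, 3),
    FeatureRecord("SLO 1", "Case study activity", True, 20, 10, 2, 1),
    FeatureRecord("SLO 1", "Reflection worksheet", False, 15, 0),
]
summary = summarize(records)
print(summary)
```

In this made-up example, two of three features were delivered and less than half of the planned time was used, which is the kind of gap between the intended and delivered program that a checklist is meant to surface. How to weight or combine these components is a judgment call for the evaluation team.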

Practical Considerations

Implementation fidelity data can be collected in multiple ways. Each has its pros and cons:

Self-Report

Program facilitators evaluate their own program; students evaluate their own responsiveness.

Advantages: Time and cost efficient (no additional staff or equipment needed).

Disadvantages: Risk of social desirability bias (i.e., students may feel pressure to report high responsiveness and facilitators may feel pressure to indicate perfect adherence to the planned program).

Outside Observation

Trained, independent evaluators (often posing as participants) observe and evaluate the program.

Advantages: Outside observers are less subject to social desirability bias, and they experience the program as participants do.

Disadvantages: Time and cost intensive. Auditing an entire program may take a long time. Additionally, hiring outside observers may be expensive.

Audio Recording

The program is audio recorded and reviewed by one or more evaluators at a later date.

Advantages: Cost effective, convenient, ability to review the data multiple times, ability to use multiple evaluators, ability to have "blind" raters.

Disadvantages: Limited observation (loss of visual data); reactivity to recording (students and/or facilitators may act differently if they know they are being recorded).

Video Recording

The program is video recorded and reviewed by one or more evaluators at a later date.

Advantages: Relatively cost effective, convenient, ability to review the data multiple times, ability to use multiple evaluators, richer depiction of the environment.

Disadvantages: Camera costs, reactivity to camera (students and/or facilitators may act differently if they know they are being recorded).

Using Implementation Fidelity Data to Evaluate SLOs

By combining IF data with outcomes data, we can make much stronger inferences about the effectiveness of our programs. For example, if assessment results are unfavorable (meaning students did not achieve the stated SLOs) and implementation fidelity data suggest the program was implemented with high fidelity, then we can reasonably claim the program was ineffective.

If, however, implementation fidelity data indicate that the program was not implemented as intended, then unfavorable assessment results cannot be interpreted as a reflection of the program's effectiveness. Similar interpretations can be made when assessment results are favorable. These scenarios are summarized in the figure below:
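The four combinations of outcomes and fidelity can also be written as a small decision function. This is a sketch under stated assumptions: the function name `interpret` is hypothetical, fidelity is simplified to a high/low judgment, and the wording for the two favorable-outcome cases is one reading of "similar interpretations" rather than a quotation from the source.

```python
def interpret(outcomes_favorable: bool, high_fidelity: bool) -> str:
    """Return the inference supported by outcomes data plus fidelity data."""
    if high_fidelity and outcomes_favorable:
        # Assumed wording: the mirror image of the unfavorable, high-fidelity case.
        return ("Program delivered as intended and outcomes met: "
                "evidence that the program is effective.")
    if high_fidelity:
        # Stated in the text: unfavorable results with high fidelity.
        return ("Program delivered as intended but outcomes not met: "
                "reasonable to conclude the program was ineffective.")
    if outcomes_favorable:
        # Assumed wording: favorable results cannot be credited to a
        # program that was not delivered as designed.
        return ("Outcomes met, but the delivered program differed from the "
                "intended one: results cannot be credited to the intended program.")
    # Stated in the text: unfavorable results with low fidelity.
    return ("Outcomes not met, and the program was not delivered as intended: "
            "results do not reflect the intended program's effectiveness.")


# Print all four scenarios.
for favorable in (True, False):
    for fidelity in (True, False):
        print(f"favorable={favorable}, high_fidelity={fidelity}: "
              f"{interpret(favorable, fidelity)}")
```

In practice fidelity is a matter of degree, so the high/low split is a simplification; the point is only that low fidelity blocks a conclusion about the intended program in either direction.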

Integrating AI (IF)

While AI can’t replace real data collection, it can help you predict problems, break down your program components, and refine your evaluation tools.

What evidence would you gather to describe the programming students actually experienced?

ChatGPT and Copilot Chat can:

Help brainstorm data sources or questions you might use to collect this evidence.
Suggest templates or checklists for documenting student experiences based on the fidelity data you must collect (see SASS’s IF webpage).

The following questions can only be answered through actual data collection (observations, facilitator feedback, and/or student responses), which AI cannot generate. However, there are still ways to use generative AI tools.

How is the designed program being implemented?

Use ChatGPT or Copilot Chat to:

Predict where programming may fall short due to time constraints or logistical challenges.
Identify parts of the program where students are likely to disengage.
Brainstorm what issues might arise by student group.

AI tools like ChatGPT and Copilot can help you think critically about the fidelity of your program implementation: where things might go off track and how to plan proactively for those risks.

Note: These tools can help foreshadow potential issues, but should not be used as a substitute for actual fidelity monitoring.

If your program must fit within a specific timeframe, use AI to brainstorm how to allocate time wisely across components. You may prompt with:

Break down this 60-minute session into parts based on these learning objectives.
How can I balance discussion and activity time in a 90-minute workshop?

Note: AI can serve as a good starting point but should not replace the work you do as a program facilitator or implementation fidelity evaluator.