AI Grading and Feedback Annotations

TimelyGrader helps instructors streamline their grading and feedback workflows through the seamless integration of AI. However, due to a lack of trust in AI grading suggestions, instructors were not adopting our AI grading feature. I overhauled this feature to build instructor trust in our AI’s ability to deliver accurate and reliable grading suggestions.

TIMELINE

April - May 2024

TEAM

Erin Forbes, Lead Product Designer

Chris Du, Product Manager

Mingyu Gao, Full-Stack Engineer

Brandon Lee, Back-End Developer

Timothy Chan, QA Test Engineer

MY ROLE

As the lead (& solo) designer, I was responsible for the entire design process, including research, design, and testing.

I also created all marketing and support assets related to this feature.

Instructors lacked confidence in the accuracy of AI grading suggestions, leading them to revert to grading manually.

There were two main reasons for this:

1) Some instructors missed the justifications TimelyGrader provides to explain why the AI selected a particular grade, leaving them unable to validate the AI's reasoning.

2) For instructors who did review the justifications, the time needed to cross-reference them with student papers was as much as, or more than, the time spent grading manually.

The current implementation of grade justifications failed to build instructor confidence in suggested grades and detracted from the primary goal of saving instructors time.

We created an annotation system that highlights the context behind AI grading and feedback suggestions.

Pinpointing each part of the student’s assignment that is relevant to the criterion being assessed creates an efficient human-in-the-loop system that doesn’t increase instructors’ workloads. This review workflow helps instructors understand the AI’s reasoning at a glance and either confirm the grade or adjust it accordingly.

Previously, instructors had to manually identify the assignment context for grade justifications, often necessitating multiple re-reads. This view did not enable simultaneous review of the justifications and rubric.

The redesigned justifications can be viewed alongside the assignment, and context is pinpointed by highlight annotations. Instructors can review the assignment, justifications, and rubric simultaneously.

Micro-Feedback Annotations

In addition to grading annotations, we saw an opportunity to enhance assignment feedback, which had previously been provided only at the macro level. By applying the same framework as grading annotations, we were able to satisfy requests from students, instructional designers, and educators to integrate micro-feedback.

AI feedback annotations highlight a specific example for each piece of feedback, which students can use as a reference for further improvement throughout their work.

A New Way to Review and Edit

Annotations introduce a new way for instructors to review grades and feedback. They can choose either our split-screen view, which presents a traditional grading setup with the rubric/feedback on the right, or TimelyGrader's new full-screen view (pictured).

This new grading workflow allows instructors to concentrate solely on students' work and TimelyGrader’s grading/feedback suggestions. They can review and edit grades/feedback in-line with highlights.

Manual grading consumes significant instructor time and is a major pain point. We needed to discover why instructors were choosing to grade manually rather than adopting a more efficient solution.

Interviews with pilot instructors revealed they were pleased with the AI feedback but less satisfied with the grading suggestions. Their reluctance stemmed from two main reasons:

  1. Low confidence in AI suggested grades

  2. Lack of time savings

We then asked these instructors to walk us through their grading workflows in order to identify the root of these issues. Here’s what we found:

Low confidence in AI-suggested grades → These instructors lacked confidence in the AI-suggested grades because they were either unaware of or did not review the justifications provided. An instructor at ASU noted:

“I didn’t even know that was there, I completely missed it. That takes the guesswork out of if the AI is correct and would be game-changing.”

Lack of time savings → Instructors who engaged with the justifications felt much more confident in the suggested grades. However, they found that cross-referencing them with student papers only marginally improved efficiency compared to manual grading, leading them to abandon AI grading altogether.

Our research highlighted that building trust and saving time were equally important to the adoption of AI grading. While justifications proved to be a promising trust-building solution, their current implementation lacked efficiency, negatively impacting instructor satisfaction and feature adoption.

We needed to find a way to increase the visibility of justifications and reduce the amount of time it took to review them.

The considerable time required for manual grading is a significant pain point for most instructors. Increasing their grading efficiency was a huge opportunity to boost instructor satisfaction and loyalty and, ultimately, increase user retention. However, adoption of this solution hinged on building trust in the AI's suggestions, which required extensive review of its reasoning. My challenge lay in finding a solution that balanced these competing priorities.

Highlighting grading/feedback context with annotations

I started by exploring ways to integrate justifications into instructors’ main grading workflow, allowing them to view the student paper, grading rubric, feedback, and justifications all in one place.

If instructors had to read the entire paper to find what the grading justifications were referencing, manual grading would remain preferable. I aimed to remove the burden of pinpointing context manually, shifting that responsibility to our platform.

  1. Instructors should be able to compare the student work, grade justifications, and rubric simultaneously

  2. Instructors shouldn’t have to read the whole paper to identify what each justification is referring to

Given these criteria, I decided to integrate justifications directly into the student paper. This approach mimics how an instructor typically annotates an assignment during grading, but with our AI handling the markups. By embedding justifications as annotations, instructors can easily see which parts of the paper contributed to the AI’s suggestions and quickly assess if they agree or disagree. Additionally, this places justifications beside the rubric, enabling instructors to grade as they read, without having to switch between the justifications sidebar and the rubric sidebar.

Micro-feedback annotations

While grading was our initial focus, feedback from instructional designers pointed toward a new opportunity to enhance student learning: using this annotation framework to provide micro-feedback. Micro-feedback had been requested by instructors in the past, but what made us decide to expand the scope of this project now came from our recent student interviews: 70% of students felt overwhelmed by macro-feedback. Students needed feedback that was more specific and included more examples.

With these factors in mind, we began to see annotations as a feature that could provide value across our platform, from instant student feedback to final submission grading. Given the time constraints on this project and the expanded scope, I focused on designing components and systems adaptable to grading and feedback workflows for both instructors and students.

Previously, feedback was delivered in one large paragraph, which 70% of students found overwhelming and difficult to act on.

With annotations, feedback is delivered at the micro-level, and highlights correspond to each piece of feedback.

Harmonious roadblocks

We faced technical constraints when finding a document viewer that could support our requirements for grading/feedback annotations. I had to adjust our existing designs to align with the limitations of available viewers.

01. Annotation buttons couldn’t be implemented, and highlights couldn’t serve as the justification trigger because highlights for multiple criteria often overlapped

02. Difficult to target a highlight start and end point for separate criteria that shared document context (especially with larger rubrics)

Furthermore, testing these designs with users surfaced usability issues:

03. Instructors wanted to be able to move the justification pop-up to view additional context outside of the highlighted section. Space was already limited, and this was almost impossible on tablet and mobile sizes.

04. Instructors were confused about switching between criteria vs. switching between justifications related to one criterion

Next Steps

This feature is currently in development.

With the added scope, we had to postpone developing our secondary grading workflow. My next steps are to further build out the functionality of this workflow, making it an independent process from our traditional grading workflow.

Given time constraints, instructors in the current designs can only edit and delete AI-generated annotations. I’m excited to expand annotation functionality, enabling instructors to create their own annotations in the future!

Personal Outcomes

The project was a great lesson in managing changing requirements and addressing scope creep strategically. Initially, we viewed annotations as a point solution, but it wasn't until later in the design process that we recognized their potential as a larger system.

With time constraints and an expanding scope, I initially adapted the existing grading designs for feedback. I quickly learned that the challenges I was having with the grading designs were only magnified when applied to feedback.

Had technical constraints not prompted significant changes to the existing designs at this point, I might not have overhauled them so significantly. Luckily, this created an opportunity to revisit the requirements and address elements of the current designs that were hindering the broader system.

I definitely breathed a sigh of relief after finishing the final iteration: a few small changes significantly reduced the complexity of the designs (even for grading by itself) and left me wondering why I hadn’t seen this solution from the beginning! Humbling! The redesign proved much more effective and saved us weeks of development time.

My takeaway is that if your designs start to feel like trying to squish your foot into a too-small shoe, it’s probably best to take a step back and reassess the approach.