AI-Generated Alternative Text: Missing the Point of Context

I was on a call with a user who had been implementing an “AI” solution that automatically adds alternative text to images. He was explaining to me how good the AI was, and he was impressed when an image came back as “man in front of a crowd,” because that is what was in the image.

I agree: it is impressive that algorithms can analyze images and identify the details inside them with such a high degree of accuracy. Machine detection, discernment, and assistance have come a long way.

But is the current level of machine detection sufficient for writing good alternative text for images on websites?

It is important to understand that images are added to a web page to serve a purpose. They become part of the page’s narrative. Writing good alternative text for accessibility means considering that narrative and how the image supports it, and then conveying the image’s purpose to screen readers and other assistive technology.

This is why the WCAG Success Criterion focuses on the purpose of non-text content (including images), not merely on providing a description:

WCAG 2.1: 1.1.1 Non-text Content – Level A

All non-text content that is presented to the user has a text alternative that serves the equivalent purpose…

Note the Success Criterion’s language: “serves the equivalent purpose.” For images, meeting this Success Criterion means considering the purpose of the web page and the purpose the image serves on that page.

Principle: The same image used in different contexts or for a different purpose will often have different alternative text.

Take the image below of Steve Jobs on stage at Macworld 2008, unveiling the new MacBook Air.

Steve Jobs at Macworld unveiling the new MacBook Air

When writing the alternative text for this image, you need to ask: What is the purpose of this page? (What story are you trying to tell?) And what is the purpose of this image on this page? (How does this image support the page’s narrative?)

Is “man in front of a crowd” adequate here? Is that what the page author intended to convey when they added the image to the page?

Probably not.

So what alternative text would you use?

If I were writing an article about the magic of Steve Jobs to excite fans, I might use alternative text like:

“Steve Jobs exciting the crowd with a new product unveiling”

Or if the web page focuses on the laptop rather than the who:

“Unveiling the MacBook Air”

Or if the focus of the page is on the conference itself, the alt text might be:

“Macworld”

Or maybe I do care about the who, what, when, and where. If that were the case, I would write:

“Steve Jobs at Macworld 2008 unveiling the new MacBook Air”

The point is that both the purpose of the web page and the reason the image was included on it matter. Most of the time, a detail like the year wouldn’t belong in the alternative text. However, in some contexts the year carries meaning for the image or is apparent from the image itself, and then including it might be appropriate. It all depends on the context. The goal is to provide an equivalent experience when the image isn’t presented to the user.
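In markup, the difference lives entirely in the alt attribute. As a minimal sketch (the file path is a placeholder), the same photo could appear on two of the pages described above like this:

```html
<!-- Page about the magic of Steve Jobs -->
<img src="/images/macworld-2008.jpg"
     alt="Steve Jobs exciting the crowd with a new product unveiling">

<!-- Page about the MacBook Air release: same file, different purpose -->
<img src="/images/macworld-2008.jpg"
     alt="Unveiling the MacBook Air">
```

A tool that looks only at the image file would likely produce the same text for both pages.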

Principle: A generic alt text is often no better than marking the image as decorative.

In many instances, a generic “man in front of a crowd” conveys little value and could be less useful than simply having the screen reader skip over the image.

Think back to the different pages and uses for this image described above. Take two of those narratives and compare each against the generic alternative text:

Page Narrative: The Magic of Steve Jobs. Does “man in front of a crowd” add much value here?
Page Narrative: The MacBook Air Release. Does “man in front of a crowd” add much value here?

You get the point.

In most of these instances, simply marking the image as decorative would have provided as much value for accessibility (“a text alternative that serves the equivalent purpose”) as the machine-generated “man in front of a crowd.”

If the image has empty alternative text, most screen readers will skip over it and move on. If you add the generic “man in front of a crowd” instead, the screen reader reads it aloud when it reaches the image. That description adds little to no value, and it slows users down while they listen to it.

So what if an image doesn’t matter, or its content is fully conveyed by the surrounding text or the image caption? In that case, you can mark the image as decorative. To do so, give the image an empty alt attribute: alt=""

Most screen readers will skip over decorative images.

(Don’t run out and start marking every image as decorative, though. Use decorative images appropriately.)
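For reference, here is a minimal sketch of the difference between a decorative image and one that is simply missing its alt attribute (the file path is a placeholder). The empty value uses straight quotes; curly quotes pasted from a word processor are not valid attribute delimiters in HTML.

```html
<!-- Decorative: alt is present but empty, so most screen readers skip the image -->
<img src="/images/divider.png" alt="">

<!-- Avoid: with no alt attribute at all, many screen readers announce the file name instead -->
<img src="/images/divider.png">
```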

Of course, there will be times when the “AI” gets it right, or close enough. But will that happen often enough to really matter? For the most part, automatically generated alternative text is not sufficient for accessibility.

My recommendation: if you do implement a solution that autogenerates alternative text, make sure the tool provides a way for you to review what it added. Ideally, it would show you the before and after of each fix and let you override or correct the generated text as needed. A solution with this kind of functionality could be useful, acting as a guide or assistant that helps you make your pages accessible.

Without human consideration of context, a machine’s guess at what the alternative text should be is not likely to be of much value for accessibility. Your “solution” could actually make the page clunkier for screen reader users to navigate than it was before the “fix.”

If you want to go deeper on writing good alternative text, see WebAIM’s article “Alternative Text.”