Higher Ed in 4k Project (2019)

An accessibility analysis of web pages from every college and university in the United States (nearly 4,000 higher education institutions).

The 4k project considers the automatically detectable errors that can be identified by WAVE.

Purpose

The main purpose of the 4k Project is to document the web accessibility progress of higher education institutions in the United States.

Interpreting the results

All automated tools, including WAVE, have limitations—only 25% to 35% of possible conformance failures can be automatically detected. The absence of detectable errors does not indicate that a site is accessible or compliant. Still, the data presented in this project provide a meaningful representation of the state of web accessibility in Higher Ed. “Errors” are WAVE-detected accessibility barriers that have notable end-user impact and are likely WCAG 2 Level A/AA conformance failures.

What this isn’t

This project is not a condemnation of Higher Ed in the US. Remember that .edu, .us, and .gov had the lowest average number of accessibility errors of all common top-level domains (TLDs) in the WebAIM Million project. Before learning about web accessibility, some of the team members at Pope Tech worked on websites in higher education and created some of the same types of accessibility errors (perhaps some of those errors still exist on websites and made it into this study!).

Automated testing isn’t a silver bullet; automated tools can’t detect everything, or even close to everything. Only a human can determine true accessibility. WAVE is a suite of tools to help in this process, not the end goal.

We believe web accessibility in higher education is important to track, which is why we perform this study.

2019 Introduction

In November 2019, Pope Tech ran an accessibility evaluation of every top-level .edu domain in the United States, launching the Higher Ed in 4k project. The analysis was performed using the Pope Tech platform, which is powered by the WAVE testing engine. It looks only at automatically detectable errors that can be identified by WAVE. (There are most certainly more accessibility issues on these sites that can only be identified by manual testing.)

The Higher Ed in 4k Project was inspired by the WebAIM Million Project. After the WebAIM Million project launched, one of our takeaways was the opportunity to help make websites more accessible in a significant way. What would happen if we went deeper than the home page and focused on one group (Higher Education in the US)?

The sample

The category data and URLs for the initial launch of the 4k project were obtained from the US Department of Education’s IPEDS list containing 7,153 institutions. This list was trimmed of any non-.edu domains along with any subdomains. For example, if institution.edu was in the sample along with sub.institution.edu, only institution.edu was used. This trimmed list of institutions, containing tags for state, number of students, private/public status, etc., was then uploaded to Pope Tech and set to crawl up to 100 pages and 4 levels deep, including subdomains. Some websites timed out, and 29 had aggressive robots.txt files blocking all bots, which we respected. The final initial sample was 3,832 higher education institutions, 17,470 websites, and 314,305 pages scanned.

The WAVE Engine

The WAVE accessibility engine was used to analyze the rendered pages (i.e., the DOM of each page after scripting and styles were applied). The WAVE engine uses heuristics and logic to detect patterns in web page content that align with end-user accessibility issues and Web Content Accessibility Guidelines (WCAG) conformance failures. All automated tools, including WAVE, are limited in their detection of accessibility issues—only around a third of possible conformance failures can be automatically detected. The absence of detectable errors does not indicate that a site is accessible or compliant. Despite these limitations, the data presented in this project provide a meaningful representation of the state of web accessibility in Higher Education in the US.

Crawling Method

We didn’t try to influence where the crawler would go; it simply took the first 100 pages linked from each institution’s homepage within 4 links.

We don’t know who visits which pages the most, which page is most important, or which ones are prioritized – this is both a practical constraint and the intent of our chosen methodology, since it reduces human bias. The same crawl method and logic were applied consistently to all institutions. All pages in the sample are publicly available, so we can assume real people, including those with disabilities, visit them at least sometimes. The errors detected, and thus the rankings, reflect potential end-user barriers on those pages.

Results: How did we do?

For the initial analysis, only 0.078% of institutions had no detectable errors; in other words, 99.922% had detectable WCAG violations. At the page level, 93.331% of pages had detectable WCAG violations. There were a total of 7,464,465 detectable errors, or 23.8 errors per page.

Remember that the WebAIM Million report found 60 errors per page overall and 36 errors per page on .edu domains. This is encouraging for higher education. It is interesting, but makes sense, that as we include less complicated pages besides the home page, we would see fewer errors. This project also includes many more colleges and universities than were included in the WebAIM Million project.

The “average” institution

The average institution in the data set had the following:

  • 82 pages scanned
  • 24 errors per page
  • 27 alerts per page
  • a user would encounter a detectable error on 1 in every 30 elements on a page

The 5 most common errors

The 5 most common errors made up 91% of all detectable errors. Two of these, Linked Image Missing Alternative Text and Missing Alternative Text, require the same solution. In other words, by fixing just contrast errors, empty links, missing form labels, and alternative text errors, we would fix 6,773,773 accessibility errors in this sample.

1. Very Low Contrast

  • What it means:
    • Very low contrast between foreground and background colors.
  • Why it matters:
    • Adequate contrast is necessary for all users, especially users with low vision. (See the sketch after this list.)
  • 4k results:
    • 64% of all errors were low-contrast errors
    • 15.4 contrast errors per page
    • lowest institution had 0 contrast errors
    • highest institution had 1,435 contrast errors per page
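
For illustration, here is a minimal sketch of how this error typically arises and one way to fix it. The hex values are hypothetical examples; WCAG 2 AA requires a contrast ratio of at least 4.5:1 for normal-size text (3:1 for large text):

    <!-- Flagged: light gray (#999999) on white is roughly a 2.8:1 ratio -->
    <p style="color: #999999; background-color: #ffffff;">Application deadlines</p>

    <!-- Passes AA: #767676 on white is roughly 4.5:1 -->
    <p style="color: #767676; background-color: #ffffff;">Application deadlines</p>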

2. Empty Link

  • What it means:
    • A link contains no text.
  • Why it matters:
    • If a link contains no text, the function or purpose of the link will not be presented to the user. This can introduce confusion for keyboard and screen reader users. (See the sketch after this list.)
  • 4k results:
    • 14.6% of all errors were Empty link errors
    • 3.6 empty link errors per page
    • lowest institution had 0 empty links
    • highest institution had 341 empty link errors per page
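
For illustration, a minimal sketch of the pattern (the icon class name is a hypothetical stand-in for whatever renders the visual link):

    <!-- Flagged: the link contains no text for a screen reader to announce -->
    <a href="/apply"><span class="icon-arrow"></span></a>

    <!-- Fixed: the link now has an accessible name -->
    <a href="/apply"><span class="icon-arrow" aria-hidden="true"></span> Apply now</a>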

3. Missing Form Label

  • What it means:
    • A form control does not have a corresponding label.
  • Why it matters:
    • If a form control does not have a properly associated text label, the function or purpose of that form control may not be presented to screen reader users. Form labels also provide visible descriptions and larger clickable targets for form controls. (See the sketch after this list.)
  • 4k results:
    • 4.3% of all errors were missing form label errors
    • 1.1 missing form labels per page
    • lowest institution had 0 missing form labels
    • highest institution had 100 missing form labels per page
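
For illustration, a minimal sketch of a missing label and the standard for/id fix (the field names are hypothetical):

    <!-- Flagged: no label is associated with the input -->
    <input type="text" name="q">

    <!-- Fixed: the label is programmatically associated via matching for/id -->
    <label for="site-search">Search</label>
    <input type="text" id="site-search" name="q">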

4. Linked Image Missing Alternative Text

  • What it means:
    • An image without alternative text results in an empty link.
  • Why it matters:
    • Images that are the only content within a link must have descriptive alternative text. If an image is within a link that contains no text and that image does not provide alternative text, a screen reader has no content to present to the user regarding the function of the link. (See the sketch after this list.)
  • 4k results:
    • 4.2% of all errors
    • 1 per page
    • lowest institution had 0 linked images missing alternative text
    • highest institution had 178 per page
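
For illustration, a minimal sketch (file names hypothetical). Because the image is the link’s only content, its alt text must describe where the link goes:

    <!-- Flagged: the image is the only link content and has no alt text -->
    <a href="/"><img src="logo.png"></a>

    <!-- Fixed: the alt text describes the function of the link -->
    <a href="/"><img src="logo.png" alt="Example University home"></a>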

5. Missing Alternative Text

  • What it means:
    • Image alternative text is not present.
  • Why it matters:
    • Each image must have an alt attribute. Without alternative text, the content of an image will not be available to screen reader users or when the image is unavailable. (See the sketch after this list.)
  • 4k results:
    • 3.6% of all errors
    • 0.9 per page
    • lowest institution had 0 missing alternative text errors
    • highest institution had 44 per page
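
For illustration, a minimal sketch (file names hypothetical). Every img needs an alt attribute, even an empty one for purely decorative images:

    <!-- Flagged: no alt attribute at all -->
    <img src="campus.jpg">

    <!-- Fixed: descriptive alternative text -->
    <img src="campus.jpg" alt="Students crossing the quad in autumn">

    <!-- Also valid: explicitly empty alt on a purely decorative image -->
    <img src="divider.png" alt="">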

State/territory rankings

With this data, we are able to see how each state is doing. Alaska has the fewest detectable accessibility errors with only 9 per page, followed by Idaho, Montana, Hawaii, and Wyoming, which are all below 15 errors per page. While these states tend to have lower populations, overall state population didn’t correlate strongly with the number of errors. Texas and Pennsylvania were right at the average with 24 errors per page, and the bottom 6 states were Vermont, Utah, New Mexico, Florida, South Dakota, and Arkansas with over 30 errors per page each. Arkansas had 44 errors per page.

When we initially ran this analysis, we only looked at the total errors per page reflected above. After we discovered some large outliers with thousands of errors at a single institution, we changed the ranking to use the median of institution averages in each state. For example, this moved Wyoming up to the top and Maine to the bottom.

On our State Rankings page you can see each state’s ranking, median, and errors per page. We also show this for just public institutions. These are updated in real time as institutions rescan and (hopefully) improve their web accessibility.

Rankings by tags

With this project, each institution was tagged with IPEDS data, mixed with a few other data sources (including the list of UCEDD institutions), allowing us to compare institutions. Below are some of the interesting comparisons we found.

Highest Degree Offered

Type | Number of institutions | Errors per page
Up to Associate’s degree | 1,488 | 24.8
Up to Bachelor’s degree | 500 | 25.1
Post-Bachelor’s degrees | 1,843 | 23.3

Private vs. public institutions

Type | Number of institutions | Errors per page
Public | 1,581 | 20.21
Private not-for-profit | 1,518 | 25.3
Private for-profit | 733 | 30.1

Public institutions have 20 errors per page and are more accessible than private ones. Private not-for-profit institutions are much more accessible (with 25 errors per page) than private for-profit institutions, which have 30 errors per page. It would make sense that public institutions would do better, as they are subject to additional laws beyond the Americans with Disabilities Act, including Section 508.

Student enrollment

Type | Number of institutions | Errors per page
Enrollment under 1,000 | 1,485 | 28.3
Enrollment 1,000 – 4,999 | 1,311 | 24.06
Enrollment 5,000 – 9,999 | 454 | 19.02
Enrollment 10,000 – 19,999 | 304 | 18.98
Enrollment 20,000 and above | 208 | 14.02

There was a clear inverse correlation between student enrollment and accessibility errors per page: the more students, the fewer errors. This could be because larger institutions have more resources and budget. Institutions with over 20,000 students enrolled had only 14 detectable accessibility errors per page.

Land Grant

Type | Number of institutions | Errors per page
Land Grant institutions | 101 | 15.4
Non-Land Grant institutions | 3,731 | 24.3

Land Grant universities were much more accessible than non-Land Grant institutions, with only 15 errors per page.

In a system

Type | Number of institutions | Errors per page
In a system | 1,322 | 23.1
Not in a system | 2,510 | 24.6

Carnegie classifications

Type | Number of institutions | Errors per page
Doctoral/Research Universities–Extensive | 143 | 15.26
Master’s Colleges and Universities I | 454 | 19.1
Baccalaureate Colleges–Liberal Arts | 205 | 20.78
Associate’s Colleges | 1,002 | 21.52
Baccalaureate Colleges–General | 275 | 24.36
Medical schools and medical centers | 36 | 27.84
Schools of law | 18 | 30.66

When comparing Carnegie classifications, it is important to understand that not all institutions had a classification in the IPEDS list; the data only reflect those with a specified classification. Doctoral/Research Universities were the most accessible classification with 15 errors per page. It is interesting that the two least accessible classifications were medical schools with 28 errors per page and schools of law with over 30 errors per page.

UCEDD vs. non

Type | Number of institutions | Errors per page
UCEDD | 64 | 13.8
Non-UCEDD | 3,768 | 24.3

UCEDD stands for University Centers for Excellence in Developmental Disabilities Education. The vision of the UCEDD program is “a nation in which all Americans, including Americans with disabilities, participate fully in their communities.” There is at least one in every US state and territory, housed inside a host university. The host UCEDD institutions have fewer than 14 errors per page, which is 10 fewer errors per page than the average institution.

ARIA

WAVE automatically detects the presence of ARIA, and we were able to see this per institution. If you are not sure what Accessible Rich Internet Applications (ARIA) is or want to learn more about it, we recommend the WebAIM article on ARIA.

ARIA in the wild

In our analysis, we found an institution that had an average of 5,552 ARIA attributes per page, or 533,002 ARIA attributes across their 96-page sample. Another institution had more ARIA attributes than page elements, with 4,907 per page. These were two large universities whose samples included their admissions pages.

This is impressive and took some real effort to accomplish, but it is a reminder that ARIA doesn’t improve accessibility unless done correctly. Without knowing anything else, we can safely assume that having 4,907 ARIA attributes on a university admissions page is not correct. 1,773 of these attributes were tabindex values of 0 or less, and over 1,000 were aria-haspopup and aria-expanded attributes.

These are a conscious decision, made at the site developer, template creator, or CMS creator level. The good news is that, with a little education and minimal effort, these could simply be removed, significantly improving the accessibility of this website.
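
We don’t know exactly what that markup looked like, but as a hypothetical sketch, ARIA and tabindex attributes stamped onto static content by a template add nothing and can actively mislead assistive technology:

    <!-- Hypothetical: attributes like these on plain content are noise -->
    <div tabindex="-1" aria-haspopup="true" aria-expanded="false">
      <p tabindex="-1">Welcome to the Office of Admissions.</p>
    </div>

    <!-- Plain semantic HTML needs none of them -->
    <div>
      <p>Welcome to the Office of Admissions.</p>
    </div>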

Relationships between ARIA and detectable errors

In our analysis, we found a slight correlation between increased use of ARIA and detectable errors. When we changed the analysis to take into account page density (the number of elements on a page), this correlation reversed. It would make sense that as a page becomes more complex, there is more need for ARIA and there are more elements that can have errors. It is also important to understand that in the example above with 4,907 ARIA attributes per page, there were fewer than average detectable errors from an automated tool but many impactful accessibility issues that were not detectable.

As a comparison, the WebAIM Million project found that as ARIA increased, detectable errors tended to increase as well.

Structure vs overall errors correlation?

We looked for a correlation between whether a page used HTML regions or lacked an h1 element and whether it was more likely to have detectable errors, but didn’t find any. Even though no correlation was found, semantic structure by itself is very impactful in helping screen reader users navigate a page (see the sketch below).
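
As a minimal sketch of what that structure looks like (page content hypothetical), regions and a single h1 give screen reader users landmarks and a heading outline to navigate by:

    <header>Example University</header>
    <nav aria-label="Primary">
      <a href="/admissions">Admissions</a>
      <a href="/academics">Academics</a>
    </nav>
    <main>
      <h1>Office of Financial Aid</h1>
      <p>How to apply for grants, loans, and scholarships.</p>
    </main>
    <footer>Contact the web team</footer>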

Interesting and random tidbits

In the evaluation, 6,964 skip links didn’t have a target. This means someone went to the effort of adding a skip link, but then either never tested it or it broke with a template update (see the sketch below).
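
A hypothetical sketch of how this happens: the skip link survives a redesign, but its target id does not, so the fragment points at nothing:

    <!-- Broken: no element on the page has id="main-content" anymore -->
    <a href="#main-content" class="skip-link">Skip to main content</a>
    <main id="content">Page content</main>

    <!-- Fixed: the fragment and the id match again -->
    <a href="#main-content" class="skip-link">Skip to main content</a>
    <main id="main-content">Page content</main>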

There were 170 marquee tags still around.

We found 335,183 layout tables and only 53,715 data tables. A table is classified as a data table if it is properly structured with heading rows. Realistically, we suspect that there are relatively few true layout tables, but many tables of data without heading rows (see the sketch below).
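
As a hypothetical sketch of the distinction, the same data reads very differently to a screen reader with and without header cells:

    <!-- Counted as a layout table: tabular data but no heading row -->
    <table>
      <tr><td>Fall 2019</td><td>1,204</td></tr>
    </table>

    <!-- Counted as a data table: a proper heading row with th cells -->
    <table>
      <tr><th scope="col">Term</th><th scope="col">Applicants</th></tr>
      <tr><td>Fall 2019</td><td>1,204</td></tr>
    </table>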

We found 655,992 links to PDFs, or 2 per page. These may or may not be accessible, but as we know from the 2019 WebAIM screen reader survey, 74% of screen reader users are either Very Likely or Somewhat Likely to encounter significant issues accessing a PDF document.

Conclusion

While there is still significant work to be done to ensure Higher Education websites are accessible to everyone, we are encouraged by how much better the Higher Ed results are compared to non-Higher Ed websites. We are hopeful that this project and other endeavors by the Higher Education web accessibility community can help bring more awareness to web accessibility, and we are optimistic that over time we will see additional improvement.

There are countless ways this data can be analyzed and explored. We also see the potential for additional studies and further analysis against the ongoing WebAIM Million Project. We are open to feedback on ways to make this more impactful for the Higher Education community. If you have questions about this project or feedback, please contact us.