State of the Internet’s Languages

I didn’t realize there is an effort to “de-colonize” the Internet but it’s unsurprising and makes a lot of sense. While I’ve never seriously questioned it, I’ve wondered in the back of my mind just how weird it must be for developers who don’t speak English as their first language. Code is completely English and the way we name things is often that way as well.

So yeah, there is certainly a barrier to entry for any non-English speakers trying to crack into web development. We often hail the Internet as global and democratized information, but clearly there are many people who get left behind. How many? I’m not sure, but Whose Knowledge? is a global campaign trying to figure it out and offers at least some compelling figures about online representation as a whole in their 2020 report.

75% of the world’s online population is currently from the Global South, and 45% of all women in the world are online. At the same time, we know that content online remains heavily skewed towards rich, Western countries, and most online knowledge today is accessible only through colonial languages.

We estimate that only about 500 of the world’s 7000+ languages are represented online, with English and Chinese dominating. Google estimates that 129 million books have been published in about 480 languages. At best, then, only 7% of the world’s 7000 languages are captured in published material. An even smaller fraction of these languages is represented in digital content.

“De-Colonizing the Internet’s Languages” Summary Report (2020)

That’s usage, not development. But the point still holds. And the barriers keep mounting as technology evolves and grows.

I’m currently part of a cohort that is auditing and improving curriculum at the school where I teach so that it is more culturally inclusive and equitable, and that’s where I found this stuff. I applied to the cohort for a number of reasons — not the least of which is to learn to write a better syllabus and lesson plans — but I wanted to apply whatever lessons I gained there to the front-end development courses I teach. I doubt I’ll make any significant dent in the overall system, but I’m hoping that seeing code through less of an English lens opens up opportunities to make writing and learning it a little more inclusive.

Things like:

  • Surfacing the contributions of women and minorities in the formation of the Internet. They were there in the web’s early days, and many are currently pioneering the path forward. We ought to call them out and encourage others like them to participate. Note that folks like Jeffrey Zeldman have been ringing this bell for many years now.
  • Helping advance the careers of my students. This is truly important to me because I believe the next wave of web designers and developers are going to make incredible things. It’s important we raise them up starting with a solid foundation of core web languages and context so they can build off all the work we’ve done up to this point. It’s also important to help mentor students when it comes to job searches, interviewing, and the realities of the workplace.
  • De-mystifying web development. You’d be amazed by the number of students I talk to who think that learning basic HTML and CSS is out of their reach. Students have told me they think it’s too technical and that it takes a higher degree in math, computer science, or engineering to get a foot in the door. Clearly, there’s a perception that you’ve got to be smart, rich, and educated just to get started. That’s a shame and I want to make my class a place where code is not only approachable, but welcoming.

Anyway, there’s so much more to cover here and I’m only scratching the surface as I start this cohort. So stay tuned!

✏️ Handwritten by Geoff Graham on May 4, 2021