HTML language markup

When creating digital content, a central question is which language it will be published in. If no conscious decision is made, many content management systems (CMS) label the content as English by default.

Impact

What is invisible to the naked eye can leave screen reader users perplexed by incomprehensible content. Unless the user has deactivated this behavior, the screen reader selects the speech output and the Braille output table based on the language markup. Thus, a German text may be read aloud with an English voice or shown on the Braille display using the wrong Braille table.

Cheer up!

The root element at the top of the HTML file is written as <html lang="de">, where the "de" in this example stands for German. This sets the language markup for the entire document, so it should be identical to the main language of the page.
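As an illustration, a minimal page with a document-wide language markup might look like the sketch below. The title and the paragraph text are placeholders I made up for this example; only the lang attribute on the html element matters here.

```html
<!DOCTYPE html>
<!-- lang="de" on the root element declares German for the whole document -->
<html lang="de">
  <head>
    <meta charset="utf-8">
    <!-- placeholder title, not taken from a real page -->
    <title>Beispielseite</title>
  </head>
  <body>
    <!-- inherits lang="de" from the html element -->
    <p>Dieser Text wird mit deutscher Sprachausgabe vorgelesen.</p>
  </body>
</html>
```

Every element inside the page inherits this language unless it sets its own lang attribute.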

Multilingual content

However, a text can also contain paragraphs in other languages. If I want to insert an English paragraph into a German text, I start a new paragraph and give its <p> tag the language attribute: <p lang="en">. The text is then marked up as English up to the closing tag </p>.
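A short sketch of such a mixed text, assuming the surrounding page is marked up as German via <html lang="de">. The sentences are invented placeholders.

```html
<!-- no lang attribute: inherits German from the html element (assumed lang="de") -->
<p>Dieser Absatz ist auf Deutsch verfasst.</p>

<!-- lang="en" applies from the opening tag to the closing </p> -->
<p lang="en">This paragraph is written in English.</p>

<!-- back to the inherited document language -->
<p>Hier geht es auf Deutsch weiter.</p>
```

After the closing </p> of the English paragraph, the document language applies again without any further markup.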

Single words

Individual words can also be marked up with the help of a span element. The procedure is identical to that for the language markup of a paragraph: <span lang="fr">. However, individual words should only be marked up separately in exceptional cases, since frequent use disrupts the reading flow. Most people have developed much faster auditory comprehension in their native language, and a change of language changes not only the pronunciation but in many cases also the voice, and sometimes even the gender of the reading voice.
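For illustration, a single foreign-language term inside a German sentence could be marked up roughly like this. The sentence and the French term are my own example, and the surrounding page is again assumed to carry <html lang="de">.

```html
<!-- the paragraph inherits German from the document (assumed lang="de") -->
<p>
  Zum Nachtisch gab es
  <!-- only the French term is switched to French pronunciation -->
  <span lang="fr">île flottante</span>
  mit Karamell.
</p>
```

Only the text between the opening and closing span tags is affected, which keeps the switch as short as possible.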

When should individual words be marked up?

The correct pronunciation of common words from other languages is the responsibility of the screen reader manufacturer or text-to-speech engine provider. For example, words such as "restaurant" or "rendezvous" are already covered by the pronunciation rules and need no language markup of their own.

However, since it is impossible to know when creating the text which screen reader and which speech output will be used, it is advisable to avoid foreign-language terms as far as possible. If this is not possible, for instance with product names or rather unusual names, marking them up is advantageous. But the rule here is: less is more!

Show me!

The following two screencasts show the effects of language markup in HTML.

Screencast: Document-wide language markup

Screencast: Fine-grained language markup

DeepL is a deep learning company that develops AI systems for languages. The company, based in Cologne, Germany, was founded in 2009 as Linguee, and introduced the first internet search engine for translations. Linguee has answered over 10 billion queries from more than 1 billion users.

Dennis Westphal

Dennis is an IT consultant at the Company for the Development of Things. His field is accessibility. Helpfully, Dennis has been blind since birth. He creates his screencasts with open source software.
