meta charset='utf-8': what's the fuss?
I have seen HTML developers write the following piece of code in the head tag of an HTML document.
<meta charset='utf-8'> OR <meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
What does charset='utf-8' mean? What if this tag is omitted from the document?
Charset is used to specify character encoding.
Let's go back to basics. Text is a collection of characters, and it is stored in the computer as bytes. Anything we save to a computer exists as bytes. Each character is represented by a number, and those numbers are stored as a sequence of bytes. Sometimes more than one byte is used to represent a single character.
Character encoding governs the way these bytes will be converted back to characters.
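The round trip from characters to bytes and back can be sketched in Python. The sample string is only an illustration and is not taken from any particular page.

```python
# Encode a string to bytes and decode it back (the sample text is illustrative).
text = "caf\u00e9"  # "café": four characters, the last one is non-ASCII

# In UTF-8, "c", "a" and "f" take one byte each, while "é" takes two bytes.
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))  # 4 characters, 5 bytes

# Decoding with the same encoding recovers the original characters.
decoded = utf8_bytes.decode("utf-8")
print(decoded == text)
```

The same four characters occupy five bytes here, which is the "more than one byte per character" case in practice.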
What is character encoding?
Now, what happens when we omit the encoding declaration? When a developer leaves out the meta tag declaring the character encoding, the encoding of the content is left for the browser to work out.
Have you ever noticed garbled characters on a web page? See the pic below.
So the absence of a character encoding declaration may lead to garbled text, which hurts readability. Search engines may also fail to make sense of the text, which can hurt how the content shows up in search results (SEO).
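Garbling of this kind can be reproduced in a few lines of Python. This is a minimal sketch that assumes the page was saved as UTF-8 and then wrongly read as Windows-1252; other mismatched pairs of encodings produce different garbage.

```python
# UTF-8 bytes for "é" misread as Windows-1252 turn into two wrong characters.
original = "caf\u00e9"              # "café"
raw = original.encode("utf-8")      # the bytes actually stored or sent

garbled = raw.decode("cp1252")      # the reader guesses the wrong encoding
print(garbled)                      # cafÃ©

correct = raw.decode("utf-8")       # the reader uses the declared encoding
print(correct == original)
```

The bytes never changed; only the encoding used to interpret them did.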
One more important thing: fonts. Fonts are nothing but a representation of characters in symbolic form. A font is a collection of glyph definitions, which define the shapes of characters.
Once the bytes are interpreted as a character via a character encoding, the application looks for fonts which can be used to display these characters. If the encoding is wrong then the shape used to denote that character will be wrong.
If a font does not have a glyph for a particular character, the application may look in other fonts, and it may end up displaying the wrong symbol, a square box, a question mark or some other placeholder character.
Browser's Role
Browsers identify the character encoding of a document via an algorithm. In the absence of a character encoding declaration, the browser may guess the encoding incorrectly and render the page with garbled characters.
Specifying the character encoding can also speed up the rendering of a webpage, as the browser does not have to work out the encoding itself.
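A heavily simplified version of this detection can be sketched in Python. The helper name sniff_charset, the loose regular expression and the windows-1252 fallback are assumptions made for illustration; real browsers follow a more involved algorithm that also considers the HTTP header and content heuristics. The 1024-byte window mirrors the HTML requirement that the declaration appear within the first 1024 bytes of the document.

```python
import re

# Loose pattern for charset=... inside a <meta ...> tag (illustrative only).
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)

def sniff_charset(raw: bytes, fallback: str = "windows-1252") -> str:
    """Hypothetical helper: guess the encoding of raw HTML bytes."""
    # 1. A UTF-8 byte order mark at the very start settles the question.
    if raw.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # 2. Otherwise look for a meta declaration near the start of the document.
    match = META_CHARSET.search(raw[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. No declaration found: fall back to a default guess, which may be wrong.
    return fallback

print(sniff_charset(b"<html><head><meta charset='utf-8'></head></html>"))
print(sniff_charset(b"<html><head></head><body>no declaration</body></html>"))
```

The second call shows the risky case: with no declaration, the result is only a guess.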
Different ways of specifying character encodings
Character encoding can be specified with the meta tags shown earlier in the article, or it can be set by the server.
In PHP it can be done using the header function, like this:
header('Content-type: text/html; charset=utf-8');
In python:
print("Content-Type: text/html; charset=utf-8\n")
In JSP:
<%@ page contentType="text/html; charset=UTF-8" %>
In XML:
<?xml version="1.0" encoding="UTF-8"?>
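For the header-based methods above, the client ends up reading the charset parameter from the Content-Type header. A minimal Python sketch of that step, using the standard library's email.message module to parse the header value (the helper name charset_from_header is hypothetical):

```python
from email.message import Message

def charset_from_header(content_type: str):
    """Hypothetical helper: return the charset parameter, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset in lowercase, or None.
    return msg.get_content_charset()

print(charset_from_header("text/html; charset=UTF-8"))
print(charset_from_header("text/html"))
```

When the header carries no charset parameter, the helper returns None, which corresponds to the case where the browser has to fall back on the meta tag or on guessing.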
Apache Server configuration
It can also be configured in the Apache server via the .htaccess file. Just add the following line to the file:
AddCharset UTF-8 .html
We also need to configure our text editors to save data in whichever encoding we want our data to be in. For Sublime Text it can be done as shown in the image below.
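The same idea applies when a file is written by a script rather than an editor: the encoding used on disk should match the one the page declares. A minimal Python sketch, assuming UTF-8 and a temporary file named sample.html (both chosen purely for illustration):

```python
import os
import tempfile

text = "caf\u00e9"  # "café"

# Write to a temporary file, stating the encoding explicitly.
path = os.path.join(tempfile.mkdtemp(), "sample.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# The file on disk now holds UTF-8 bytes.
with open(path, "rb") as f:
    raw = f.read()
print(raw)

# Reading it back with the same encoding recovers the original text.
with open(path, encoding="utf-8") as f:
    restored = f.read()
print(restored == text)
```

Stating the encoding explicitly avoids depending on the platform default, which differs between systems.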