“If it looks like a duck, and it quacks like a duck, it must be a duck!” Or so the famous saying goes. But that isn’t always true on the web or in print. Two symbols can look identical and still be completely different. By “different” I mean “represented by different encodings”. That means an attacker can craft a URL that looks like a familiar one (e.g. the one for your bank) but leads somewhere else entirely!
Readers of this blog may remember my article on typography from last year. In it I talked about formatting in Microsoft Word. I have long had an interest in typography. Its use in cyber attacks is therefore intriguing to me.
Before I continue discussing the attack itself, I’d like to briefly discuss its name. The term homograph generally refers to two words that are written the same but mean different things (homo, “same”, plus graph, “write”). My favorite is ‘wind’, which could refer to blowing air, or to twisting something to make a clock or wristwatch continue to operate.
A homoglyph is one of two or more characters that look the same but are different. If you have ever been confused by the lower-case ell ‘l’ and the one ‘1’ in the Courier font family, you know what I mean. In some fonts the oh and the zero can be confusing, too.
Thus, what is often referred to as a “homograph attack” should probably be called a “homoglyph attack”.
Attackers can exploit this, primarily in phishing attacks, by using URLs that point to sites that look legitimate but are in fact malicious. I could encode “DOG” as “ᎠՕᏀ”. It looks enough like the original that it would confuse the average reader. In reality, it is three totally different characters. In Microsoft Word, the ‘D’ and the ‘G’ are from a font called Gadugi (the word is from the language of the Cherokee); both are Cherokee letters, and the ‘O’ is an Armenian letter. All three characters are non-ASCII here, but one is enough to have a URL point to a different site.
(I actually used characters represented in hexadecimal (hex) as 0x13a0, 0x0555, and 0x13c0 in the example above. They look different here than in Microsoft Word, so you can look at them there with this info if you wish.)
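If you would rather not open Word, a few lines of Python can show what is really inside each string. This is only a minimal sketch: the helper names describe and non_ascii_chars are my own, and the look-alike string is built from the three hex values quoted above rather than pasted in.

```python
import unicodedata

# The look-alike string from the text, written with escapes so the
# code points are unambiguous: U+13A0, U+0555, U+13C0.
LOOKALIKE = "\u13a0\u0555\u13c0"


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


def non_ascii_chars(text):
    """Return the characters in text that fall outside the ASCII range."""
    return [ch for ch in text if ord(ch) > 127]


if __name__ == "__main__":
    for label in ("DOG", LOOKALIKE):
        print(repr(label), "ASCII only:", not non_ascii_chars(label))
        for ch, code_point, name in describe(label):
            print("   ", code_point, name)
```

Run as a script, it prints the code point and official Unicode name of every character in both strings, which makes it obvious that the two “words” share nothing but their appearance.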
Fortunately, a domain name containing such characters also has an ASCII-only “Punycode” representation, so the URL would be xn--m9a134e1c.com for the bogus DOG.com. Some browsers display the Punycode and generally ask whether you meant the genuine URL. Older browsers didn’t do this, so be sure to keep your browser current. (You can set Firefox to always display Punycode through a setting in its configuration. This is a somewhat advanced technique, but for those who know what to do, the setting is network.IDN_show_punycode and it should be set to true to show the Punycode.)
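To get a feel for what Punycode does, here is a rough sketch using the punycode codec that ships with Python. The function names are hypothetical, and the sketch deliberately skips the normalization (such as case folding) that browsers and registrars apply before encoding, so its output is not guaranteed to match what a browser displays for the same name.

```python
ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded domain label


def to_punycode_label(label):
    """Encode one domain label; pure-ASCII labels are returned unchanged."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_punycode_label(label):
    """Decode one domain label if it carries the xn-- prefix."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def host_to_punycode(host):
    """Encode every label of a host name, leaving the dots in place."""
    return ".".join(to_punycode_label(part) for part in host.split("."))


if __name__ == "__main__":
    # Assumption: the look-alike label is built from the hex values
    # given in the text, with no normalization applied first.
    fake = "\u13a0\u0555\u13c0.com"
    encoded = host_to_punycode(fake)
    print(encoded)
    print(from_punycode_label(encoded.split(".")[0]))
```

The useful property is the round trip: the encoded form is plain ASCII and can always be turned back into the original characters, which is why a browser can choose to show either one.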
Important sites, such as those of banks, will have what’s called an “Extended Validation Certificate”. This is a certificate (as in https) that verifies the legal identity of the holder of the certificate. The major browsers turn the address bar green when the site uses such a certificate.
Are there legitimate uses for these characters in URLs? Absolutely! There are no Latin character representations for most symbols in Asian and other alphabets. People who natively use those languages have websites whose addresses use those alphabets, and those addresses are represented in Punycode, for example, for those of us without those symbols on our keyboards.