The Latest in IT Security

Spammers Take Advantage of Unicode Normalization to Hide URLs

04 Aug 2011

by Francisco Pardo and Nick Johnston

Spammers are never idle when it comes to finding new ways to bypass mail filters; after all, this is crucial to a spammer's success. Recently, we've seen a low but steady number of spam messages in which certain characters in URLs (which point to spam sites) are replaced with Unicode characters that look similar or identical. This is yet another way of obfuscating URLs to make them more difficult to analyze.

To understand how this technique works, a bit of knowledge of the Unicode standard is helpful. As well as specifying a large repertoire of characters, Unicode also provides normalization rules for converting similar and/or equivalent characters to a single form. For example, under the compatibility normalization forms (NFKC and NFKD), an encircled number is considered equivalent to the corresponding ordinary number. This latest spammer-led URL obfuscation technique relies on the HTML-rendering engine in mail clients (or Web browsers for Web-based email) to apply the appropriate Unicode normalization to URLs.
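The encircled-number example can be checked with Python's standard `unicodedata` module. This is only a minimal sketch of normalization itself, not of how any particular mail client or browser behaves; the helper name `normalize_all_forms` is ours, chosen for illustration.

```python
import unicodedata

CIRCLED_ONE = "\u2460"  # CIRCLED DIGIT ONE

def normalize_all_forms(text):
    """Return the text under each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text)
            for form in ("NFC", "NFD", "NFKC", "NFKD")}

# The canonical forms (NFC, NFD) leave the circled digit alone;
# the compatibility forms (NFKC, NFKD) fold it to the plain digit.
for form, result in normalize_all_forms(CIRCLED_ONE).items():
    print(form, repr(result), unicodedata.name(result))
```

Only the compatibility forms collapse the circled digit, and those are the forms this obfuscation technique depends on.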

For example, a spam message could contain the following URL:

http://example․ⅼy/xyz

At first glance, the period or dot may look like a normal dot character, but it has actually been replaced with Unicode character U+2024, "ONE DOT LEADER". The "l" in the top-level domain also appears to be a normal Latin letter "l", but is actually Unicode character U+217C, "SMALL ROMAN NUMERAL FIFTY". When a Web browser or mail client HTML-rendering engine processes this URL, it typically applies Unicode normalization to it, replacing the "ONE DOT LEADER" character with a normal dot and the "SMALL ROMAN NUMERAL FIFTY" with a normal "l" character, allowing the user to visit the spam site. The process works as follows: the message carries the obfuscated URL, the rendering engine normalizes it to a plain ASCII URL, and the browser then opens the spam site.
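A filter can reproduce the normalization step before analyzing a URL. The following is a minimal sketch that assumes NFKC is a reasonable stand-in for what a rendering engine does; real clients may apply different or additional mapping rules. The function names `deobfuscate_url` and `suspicious_chars` are hypothetical.

```python
import unicodedata

def deobfuscate_url(url):
    """Apply NFKC so compatibility characters fold to their plain forms."""
    return unicodedata.normalize("NFKC", url)

def suspicious_chars(url):
    """List (code point, name, replacement) for each character NFKC changes."""
    found = []
    for ch in url:
        folded = unicodedata.normalize("NFKC", ch)
        if folded != ch:
            name = unicodedata.name(ch, "UNKNOWN")
            found.append((f"U+{ord(ch):04X}", name, folded))
    return found

# The example URL, with U+2024 in place of "." and U+217C in place of "l".
obfuscated = "http://example\u2024\u217cy/xyz"

print(deobfuscate_url(obfuscated))
for entry in suspicious_chars(obfuscated):
    print(entry)
```

A URL for which `suspicious_chars` returns anything at all is worth a closer look, since legitimate links rarely need compatibility characters in the host name.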
 

In a sense, this is similar to internationalized domain name (IDN) homograph attacks, in which similar-looking Unicode characters are used to lead users to fake sites, often for phishing purposes. However, this technique differs because it involves using similar Unicode characters to obfuscate a site rather than fake or spoof a site.
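The difference between the two techniques also shows up in code. In this rough sketch, which again assumes NFKC as the normalization in play, a typical homograph character such as the Cyrillic "а" survives normalization as a distinct character, while the obfuscation character collapses into the real ASCII letter. The helper `folds_to_ascii` is a hypothetical name.

```python
import unicodedata

def folds_to_ascii(ch):
    """True if NFKC turns a non-ASCII character into ASCII text."""
    return not ch.isascii() and unicodedata.normalize("NFKC", ch).isascii()

cyrillic_a = "\u0430"   # CYRILLIC SMALL LETTER A, a common homograph for "a"
roman_fifty = "\u217c"  # SMALL ROMAN NUMERAL FIFTY, folds to "l"

# Homograph: stays a different character, so it names a different domain.
print(folds_to_ascii(cyrillic_a))
# Obfuscation: collapses to the real letter, so it names the same domain.
print(folds_to_ascii(roman_fifty))
```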
