Speech Technology Magazine

 

Domain-Squatting Scheme Exploits TTS

By Leonard Klie - Posted May 8, 2015

A new type of Internet domain name abuse has been uncovered that utilizes the text-to-speech capabilities of some computers and smartphones. The discovery was made by a professor in the computer science department at the State University of New York–Stony Brook.

Dubbed "sound-squatting," it is the latest form of cyber-squatting—defined in the federal Anticybersquatting Consumer Protection Act as registering, trafficking in, or using an Internet domain name with the bad-faith intent to profit from a trademark belonging to someone else.

The newest scheme involves the creation of domains with words that are pronounced exactly the same way as those used by legitimate brands. In the case of YouTube, for example, so-called sound-squatters have registered the domain names yewtube.com, ewetube.com, and utube.com. None are affiliated with Google’s video-sharing site.
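To make the mechanics concrete, the short Python sketch below builds sound-alike candidates by swapping a word in a brand name for its homophones. The homophone table is hand-written and hypothetical, and the function name is invented for illustration; it is not the researchers' tool, only a minimal sketch of the idea under those assumptions.

```python
from itertools import product

# Hypothetical, hand-written homophone table for illustration only.
# A real list would come from a pronunciation dictionary or similar source.
HOMOPHONES = {
    "you": ["yew", "ewe", "u"],
    "for": ["four", "fore"],
    "to": ["two", "too"],
}


def sound_alike_domains(words, tld="com"):
    """Return domains formed by replacing each word with its homophones.

    `words` is the brand name already split into pronounceable words,
    e.g. ["you", "tube"]. The original spelling is left out of the result.
    """
    # For every word, the choices are the word itself plus any homophones.
    options = [[word] + HOMOPHONES.get(word, []) for word in words]
    original = "".join(words)
    candidates = []
    for combo in product(*options):
        name = "".join(combo)
        if name != original:
            candidates.append(f"{name}.{tld}")
    return candidates


print(sound_alike_domains(["you", "tube"]))
# ['yewtube.com', 'ewetube.com', 'utube.com']
```

With this toy table, the YouTube example yields the same three names cited above. Splitting a domain into words is assumed to have been done by hand here.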

Primary uses for the bogus domains include displaying ads, conducting phishing attacks, installing malicious software, and stealing traffic from targeted domains.

Many of these domain squatters monetize their bogus URLs with parking services that generate dynamic ads every time the site is visited. If a site visitor clicks on any of these ads, the parking service gives a fraction of the advertising money to the owner of the domain.

Of the top 10,000 Web sites ranked by Web analytics provider Alexa, the report identified 8,476 domains that might be vulnerable to sound-squatting. Twenty-two percent (1,823) of those domains were already registered by domain squatters. In total, 1,037 (57 percent) of the 1,823 registered sound-squatted domains were tagged as malicious, with the majority displaying ads. "More than half of the registrations belonged to domain squatters trying to monetize these domains by abusing the trust users have for the original, authoritative domains," says Nick Nikiforakis, coauthor of the report and an assistant professor in the computer science department at Stony Brook University.

Low-ranking Web sites are just as vulnerable as ones that receive a lot of traffic, the research found.

Nikiforakis calls the link between sound-squatting and text-to-speech an "unfortunate interaction ... where attackers, if they ever wanted to, could abuse the near-identical sounding domains to attack people that depend on text-to-speech technologies." The victims, he says, are often users of personal assistants such as Apple's Siri, or vision-impaired people "who really need the help of text-to-speech software to navigate the Web and consume content."

Nikiforakis has urged regulators to take note due to the high proportion of malicious domains uncovered.

The good news is that not all domains are targets, and the number of homophones (words having the same pronunciation but different meanings, origins, or spellings) is typically much lower than the number of misspelling variations exploited by typo-squatting. Typo-squatting relies on the probability that some Internet users will mistype or misspell the name of a Web site in their browser address bars. Defending against sound-squatting, therefore, costs significantly less than defending against typo-squatting.
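A rough way to see the difference in scale is to count the candidates each technique produces for a single name. The Python sketch below enumerates every string one keystroke error away from a name (one inserted, deleted, substituted, or swapped character). That one-error model is an assumption made here for illustration, not the method used in the study, and the homophone set is simply the three YouTube examples from this article.

```python
import string


def typo_variants(name):
    """Return all distinct strings one keystroke error away from `name`.

    Assumed error model: a single inserted, deleted, or substituted
    lowercase letter, or two adjacent characters swapped.
    """
    letters = string.ascii_lowercase
    variants = set()
    for i in range(len(name) + 1):
        head, tail = name[:i], name[i:]
        for c in letters:
            variants.add(head + c + tail)  # insertion at position i
        if tail:
            variants.add(head + tail[1:])  # deletion of character i
            for c in letters:
                variants.add(head + c + tail[1:])  # substitution at i
        if len(tail) > 1:
            # swap characters i and i+1
            variants.add(head + tail[1] + tail[0] + tail[2:])
    variants.discard(name)  # the correct spelling is not a typo
    return variants


# The three sound-alike names mentioned in the article.
homophone_variants = {"yewtube", "ewetube", "utube"}

print(len(typo_variants("youtube")), "typo candidates versus",
      len(homophone_variants), "homophone candidates")
```

Each candidate is a domain a brand owner would have to register or monitor, which is why a shorter list translates into a lower defensive cost.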

There is no easy way to automatically check for sound-squatting. Nikiforakis suggests that companies use the site Homophone.com to identify words that sound similar to those in their brand domain names. His research team has also developed a tool to automatically generate valid sound-squatted domains, called AutoSoundSquatter, but Nikiforakis says it is being re-engineered and is not yet ready to go to market.

The easiest thing that a domain owner can do is to proactively register sound-squatting domains and then set them up to automatically redirect visitors back to their legitimate sites, according to Nikiforakis.
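As an illustration of the redirect half of that advice, the Python sketch below answers every request with a permanent (301) redirect to the legitimate site. The canonical URL is a placeholder, and the names `redirect_location`, `RedirectHandler`, and `serve` are invented for this example. In practice a registrar's forwarding feature or a web server configuration would more likely be used, so this is only a sketch of the behavior.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder for the brand's legitimate site (an assumption for this sketch).
CANONICAL = "https://www.example.com"


def redirect_location(path):
    """Build the URL on the legitimate site that matches the requested path."""
    return CANONICAL + path


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers every GET or HEAD request with a 301 to the canonical site."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", redirect_location(self.path))
        self.end_headers()

    do_HEAD = do_GET


def serve(port=8080):
    """Run the redirect server; call this on the host of the defensive domain."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Keeping the requested path in the redirect means a visitor who follows a deep link on the sound-alike domain still lands on the matching page of the real site.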

He also suggests that text-to-speech software vendors reconfigure their applications so that when they identify a URL, they automatically switch to a spelling mode that pronounces each letter of the Web address individually instead of trying to read the entire domain name as one word.
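A minimal Python sketch of such a spelling mode appears below. It assumes a simple regular expression is enough to spot domain names in text, which will miss some addresses and catch some false positives, and the function names are hypothetical; no actual text-to-speech product is known to work this way.

```python
import re

# Simplified pattern (an assumption): one or more dot-terminated labels
# followed by a top-level domain of at least two letters.
DOMAIN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def spell_out(domain):
    """Turn a domain into space-separated characters, naming punctuation."""
    spoken = []
    for ch in domain.lower():
        if ch == ".":
            spoken.append("dot")
        elif ch == "-":
            spoken.append("dash")
        else:
            spoken.append(ch)
    return " ".join(spoken)


def prepare_for_tts(text):
    """Replace each detected domain with its spelled-out form."""
    return DOMAIN.sub(lambda match: spell_out(match.group(0)), text)


print(prepare_for_tts("Visit yewtube.com today"))
# Visit y e w t u b e dot c o m today
```

Spelled out this way, yewtube.com and youtube.com no longer sound alike, because the listener hears the individual letters rather than a single spoken word.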
