Affiliation:
1. University of California, Irvine, CA
2. Duke University, Durham, NC
Abstract
An ads-portal domain refers to a Web domain that shows only advertisements, served by a third-party advertisement syndication service, in the form of ads listing. We develop a machine-learning-based classifier to identify ads-portal domains, which has 96% accuracy. We use this classifier to measure the prevalence of ads-portal domains on the Internet. Surprisingly, 28.3/25% of the (two-level) *.
com
/*.
net
web domains are ads-portal domains. Also, 41/39.8% of *.
com
/*.
net
ads-portal domains are typos of well-known domains, also known as typo-squatting domains. In addition, we use the classifier along with DNS trace files to estimate how often Internet users visit ads-portal domains. It turns out that ∼5% of the two-level *.
com
, *.
net
, *.
org
, *.
biz
and *.
info
web domains on the traces are ads-portal domains and ∼50% of these accessed ads-portal domains are typos. These numbers show that ads-portal domains and typo-squatting ads-portal domains are prevalent on the Internet and successful in attracting many visits. Our classifier represents a step towards better categorizing the web documents. It can also be helpful to search engines ranking algorithms, helpful in identifying web spams that redirects to ads-portal domains, and used to discourage access to typo-squatting ads-portal domains.
Funder
Division of Computer and Network Systems
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献