Abstract
The quality of web sources has been traditionally evaluated using exogenous signals such as the hyperlink structure of the graph. We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy.
The facts are automatically extracted from each source by information extraction methods commonly used to construct knowledge bases. We propose a way to distinguish errors made in the extraction process from factual errors in the web source per se, by using joint inference in a novel multi-layer probabilistic model.
We call the trustworthiness score we computed Knowledge-Based Trust (KBT). On synthetic data, we show that our method can reliably compute the true trustworthiness levels of the sources. We then apply it to a database of 2.8B facts extracted from the web, and thereby estimate the trustworthiness of 119M webpages. Manual evaluation of a subset of the results confirms the effectiveness of the method.
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
108 articles.