Trust evaluation of health websites by eliminating phishing websites and using similarity techniques

Author:

Gupta Sarika¹, Bansal Himani¹

Affiliation:

1. Department of CSE & IT, Jaypee Institute of Information Technology, Noida, India

Abstract

Users commonly rely on search engines to find health information on websites. This research focuses on content-rich health websites, because wrong information on such websites can threaten lives. A search engine returns a list of URLs related to the search keyword, and users generally follow the top-ranked results. Newly constructed websites have no ratings, hit counts, or reviews, so search engines do not show them among the top results. As a result, a new website whose content matches that of a top-ranked website fails to gain users' trust. A second problem is that Google Search also displays phishing URLs, which look similar to genuine websites.

To address these problems and increase users' trust in health websites that are not ranked at the top, we propose an approach that extracts all URLs returned for a keyword and identifies the legitimate ones using a machine learning classifier. Address-bar, domain-name, HTML, and JavaScript features were extracted to build the dataset for identifying legitimate URLs. Three classifiers (Decision Tree, Random Forest, and Support Vector Machine) were trained and evaluated. Decision Tree performed best, with a training accuracy of 94.125%, a testing accuracy of 92.75%, and a precision of 96.97%; the cross-validation score of all three models is approximately 93%. Decision Tree is therefore used to identify legitimate websites. The full content of each legitimate website is then extracted, and the semantic similarity between the content of the top-ranked legitimate website and that of the other legitimate websites is computed using natural language processing techniques. Finally, the websites are ranked by similarity and assigned trust values ranging from highly trustable to less trustable.
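To make the phishing-filtering step more concrete, the sketch below computes a handful of address-bar features of the kind commonly used in the phishing-detection literature: an IP address as host, a very long URL, an "@" symbol, an extra "//", a hyphen in the domain, many subdomains, missing HTTPS, and a URL-shortening host. This is a minimal illustration under stated assumptions: the abstract does not list the authors' exact features, so the feature names, the 54-character cut-off, and the shortener list are hypothetical, and the domain-name, HTML, and JavaScript features are omitted. Each resulting 0/1 vector could then be passed to a classifier such as scikit-learn's `DecisionTreeClassifier`.

```python
import ipaddress
from urllib.parse import urlparse

# Illustrative list of URL-shortening hosts (an assumption, not from the paper).
SHORTENERS = {"bit.ly", "goo.gl", "tinyurl.com", "t.co", "ow.ly"}


def address_bar_features(url: str) -> dict:
    """Return illustrative address-bar features: 1 = suspicious, 0 = not."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # urlparse already lower-cases the hostname

    # A raw IP address in place of a domain name is a classic phishing signal.
    try:
        ipaddress.ip_address(host)
        uses_ip = 1
    except ValueError:
        uses_ip = 0

    return {
        "uses_ip": uses_ip,
        # Long URLs can hide the real domain; 54 chars is a hypothetical cut-off.
        "long_url": int(len(url) >= 54),
        # Text before "@" in the authority is user info, not the real host.
        "has_at": int("@" in url),
        # A "//" after the scheme separator suggests a redirect.
        "double_slash_redirect": int(url.rfind("//") > 7),
        # Hyphens are often used to imitate brands, e.g. "paypal-secure".
        "dash_in_domain": int("-" in host),
        # More than two dots in the host, e.g. "a.b.example.com".
        "many_subdomains": int(host.count(".") > 2),
        "no_https": int(parsed.scheme != "https"),
        "shortener": int(host in SHORTENERS),
    }


if __name__ == "__main__":
    # Hypothetical example URLs (203.0.113.0/24 is a documentation range).
    for u in ["https://www.example.org/health/diabetes",
              "http://secure-login.example.com@203.0.113.5/"]:
        print(u, list(address_bar_features(u).values()))
```

Binary features keep the vectors directly usable by tree-based models without scaling; an SVM, the third classifier in the comparison, would also accept them as-is.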
We compared and correlated our results with Web of Trust (WoT), a reputation tool for trust analysis, and obtained a positive correlation. Our approach thus removes phishing websites and increases trust in websites that the search engine does not rank at the top.
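The final stage described in the abstract (ranking legitimate pages by content similarity to the top-ranked legitimate page, assigning trust levels, and checking agreement with Web of Trust) can be sketched as follows. This is a minimal sketch under several assumptions: it substitutes lexical TF-IDF cosine similarity for the paper's NLP-based semantic similarity, the three trust labels and their cut-offs (0.6 and 0.3) are hypothetical, Spearman rank correlation is only one possible choice of correlation measure, and all URLs, page texts, and WoT scores in the demo are placeholders.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (smoothed IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_by_similarity(reference, pages):
    """Rank pages {url: text} by cosine similarity to the reference text."""
    urls = list(pages)
    vecs = tfidf_vectors([reference] + [pages[u] for u in urls])
    scores = {u: cosine(vecs[0], v) for u, v in zip(urls, vecs[1:])}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def trust_level(score, high=0.6, low=0.3):
    """Map a similarity score to a trust label (cut-offs are hypothetical)."""
    if score >= high:
        return "highly trustable"
    if score >= low:
        return "moderately trustable"
    return "less trustable"


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else float("nan")


if __name__ == "__main__":
    # Hypothetical reference text, pages, and WoT scores for illustration only.
    reference = "Diabetes is managed with diet, exercise and insulin."
    pages = {
        "https://example.org/a": "Diabetes is managed with diet, exercise and sometimes insulin.",
        "https://example.org/b": "Exercise helps blood sugar control in diabetes.",
        "https://example.org/c": "Cheap watches and fashion deals.",
    }
    wot = {"https://example.org/a": 90, "https://example.org/b": 75, "https://example.org/c": 20}
    ranked = rank_by_similarity(reference, pages)
    for url, score in ranked:
        print(f"{url}  {score:.2f}  {trust_level(score)}")
    rho = spearman([s for _, s in ranked], [wot[u] for u, _ in ranked])
    print(f"Spearman rho vs. WoT: {rho:.2f}")
```

A rank correlation fits this comparison because the approach outputs an ordering of websites while WoT reports reputation on its own scale; a positive rho means the two orderings broadly agree. Replacing `tfidf_vectors` with sentence embeddings would move the sketch closer to a true semantic-similarity measure.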

Publisher

Wiley

Subject

Computational Theory and Mathematics, Computer Networks and Communications, Computer Science Applications, Theoretical Computer Science, Software

