A Shop Bot for Web Market Intelligence

Authors:

Morales-Arroyo Miguel A. (1), Yuan Foo Chee (2), Muar Lim Thian (2), Hwee Kwek Choon (2)

Affiliations:

1. UNAM, Mexico

2. Nanyang Technological University, Singapore

Abstract

With the large amount of information available on the WWW, the ability to distinguish relevant from irrelevant data has become crucial. In this project, eight web scraping spiders were configured and evaluated for their functionality. The aim was to determine whether Interactive Digital Media (IDM) start-ups could use them to gather competitive intelligence. The spiders were selected from those available on the Internet because they were readily obtainable and low in cost. Each spider was configured and tested on two web sites. The spiders were first scored in individual evaluations, and the team then moderated the scores jointly. The Web Info Extractor achieved the highest overall score as a web scraping spider, while the Web Content Extractor achieved the best task analysis result. The evaluation shows that the spiders differ in their capabilities and are therefore suited to different tasks. A spider that can handle more complex tasks is usually harder to configure and less user-friendly. Hence, to select the right spider, companies should do two things. First, they should understand the tasks their customers undertake, through basic task analysis. Second, they should know how many resources they can devote to configuring and operating the spiders.

Publisher

IGI Global
