Web Scraping or Web Crawling: State of Art, Techniques, Approaches and Application

Published:2021-11-28 Issue:3 Volume:13 Page:145-168
ISSN:2710-1274
Container-title:International Journal of Advances in Soft Computing and its Applications
Short-container-title:IJASCA

Author:

Khder Moaiad

Abstract

Web scraping, or web crawling, refers to the automatic extraction of data from websites using software. It is particularly important in fields such as Business Intelligence. Web scraping is a technology that allows us to extract structured data from text such as HTML, and it is extremely useful where data is not provided in a machine-readable format such as JSON or XML. Web scraping makes it possible to gather prices in near real time from retail store sites and to provide further details. It can also be used to gather intelligence on illicit businesses, such as drug marketplaces on the darknet, giving law enforcement and researchers valuable data, such as drug prices and varieties, that would be unavailable through conventional methods. It has been found that a web scraping program yields data that is far more thorough, accurate, and consistent than manual entry. Based on these results, it is concluded that web scraping is a highly useful tool in the information age and an essential one in modern fields. Implementing web scraping properly requires multiple technologies, such as spidering and pattern matching, which are discussed. This paper examines what web scraping is, how it works, its stages and technologies, how it relates to Business Intelligence, artificial intelligence, data science, big data, and cybersecurity, how it can be done with the Python language, some of its main benefits, and what its future may look like. Special emphasis is placed on the ethical and legal issues.

Keywords: Web Scraping, Web Crawling, Python Language, Business Intelligence, Data Science, Artificial Intelligence, Big Data, Cloud Computing, Cybersecurity, Legal, Ethical.

Publisher

Alzaytoonah University of Jordan

Cited by 62 articles.

1. Understanding the skills gap between higher education and industry in the UK in artificial intelligence sector;Industry and Higher Education;2024-09-03

2. Aligning hard skills demands between Brazilian universities and the labor market: A conceptual model for bridging the gap;Industry and Higher Education;2024-09-02

3. Potentials and challenges of artificial intelligence-supported greenwashing detection in the energy sector;Energy Research & Social Science;2024-09

4. Practical Survey Private Search Engine Over the Web 3.0;Advances in Web Technologies and Engineering;2024-08-16

5. Sentiment Analysis of Nuclear Power Plant and Nuclear Science in Indonesia Based on Platform X Using BERT, VADER, and TextBlob Methods;2024 International Electronics Symposium (IES);2024-08-06