An efficient content extraction method for webpage based on tag-line-block analysis

Published: 2023-08-24 Issue: 20 Volume: 27 Page: 14631-14645
ISSN: 1432-7643
Container-title: Soft Computing
Language: en
Short-container-title: Soft Comput

Author:

Chen Zeqiu, Zhou Jianghui, Sun Ruizhi

Funder

National Key Research and Development Program of China

Publisher

Springer Science and Business Media LLC

Subject

Geometry and Topology, Theoretical Computer Science, Software

Link

https://link.springer.com/content/pdf/10.1007/s00500-023-09076-x.pdf

References: 34 articles.

1. Baroni M, Chantree F, Kilgarriff A et al (2008) Cleaneval: a competition for cleaning web pages. In: Proceedings of the 6th international conference on language resources and evaluation, pp 638–643

2. Cai D, Yu S, Wen JR et al (2003) Extracting content structure for web pages based on visual representation. In: Proceedings of the 5th Asia-pacific web conference on web technologies and applications, pp 406–417

3. Cardoso E, Jabour I, Laber E et al (2011) An efficient language-independent method to extract content from news webpages. In: Proceedings of the 11th ACM symposium on document engineering, pp 121–128

4. Chen X (2011) Universal web content extraction based on row block distribution function. https://code.google.com/p/cx-extractor

5. Crescenzi V, Mecca G, Merialdo P (2001) Roadrunner: towards automatic data extraction from large web sites. In: Proceedings of the 27th international conference on very large data bases, vol. 1, pp 109–118
