A layered approach for investigating the topological structure of communities in the Web

Published: 2003-08-01; Volume: 59; Issue: 4; Pages: 410-429
ISSN: 0022-0418
Container-title: Journal of Documentation
Language: en

Author:

Thelwall, Mike

Abstract

A layered approach for identifying communities in the Web is presented and explored by applying the Flake exact community identification algorithm to the UK academic Web. Although community or topic identification is a common task in information retrieval, a new perspective is developed by: the application of alternative document models, shifting the focus from individual pages to aggregated collections based upon Web directories, domains and entire sites; the removal of internal site links; and the adaptation of a new fast algorithm to allow fully‐automated community identification using all possible single starting points. The overall topology of the graphs in the three least‐aggregated layers was first investigated and found to include a large number of isolated points but, surprisingly, with most of the remainder being in one huge connected component, with the exact proportions varying by layer. The community identification process then found that the number of communities far exceeded the number of topological components, indicating that community identification is a potentially useful technique, even with random starting points. Both the number and size of communities identified were dependent on the parameter of the algorithm, with very different results being obtained in each case. In conclusion, the UK academic Web is embedded with layers of non‐trivial communities and, if it is not unique in this, then there is the promise of improved results for information retrieval algorithms that can exploit this additional structure, and of applying the technique directly to partially automate Web metrics tasks such as finding all pages related to a given subject hosted by a single country's universities.
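The layered document model described in the abstract can be made concrete with a short sketch. Assuming page-level links are available as (source URL, target URL) pairs, the code maps each page to a page, directory, domain or site node, drops every link whose two ends share a site, and reports connected-component sizes so that isolated points and the largest component can be counted per layer. The helper names (`layer_key`, `build_layer`, `component_sizes`) are hypothetical, links are treated as undirected for the component count, and defining a "site" as the last three host labels (e.g. `wlv.ac.uk`) is an assumption suited to `.ac.uk` hosts, not a rule stated in the paper.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def layer_key(url: str, layer: str) -> str:
    """Map a page URL to its node in one document model (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if layer == "page":
        return url
    if layer == "directory":
        # Keep the host plus the path up to and including the last "/".
        directory = parts.path[: parts.path.rfind("/") + 1] or "/"
        return host + directory
    if layer == "domain":
        return host
    if layer == "site":
        # Assumption: a site is the last three host labels, e.g. wlv.ac.uk.
        return ".".join(host.split(".")[-3:])
    raise ValueError(f"unknown layer: {layer}")


def build_layer(pages, links, layer):
    """Aggregate page links to one layer, dropping internal site links."""
    nodes = {layer_key(p, layer) for p in pages}
    edges = set()
    for src, dst in links:
        if layer_key(src, "site") == layer_key(dst, "site"):
            continue  # internal site link: removed in every layer
        a, b = layer_key(src, layer), layer_key(dst, layer)
        nodes.update((a, b))
        edges.add((a, b))
    return nodes, edges


def component_sizes(nodes, edges):
    """Connected-component sizes (links treated as undirected), largest first."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    sizes = defaultdict(int)
    for n in nodes:
        sizes[find(n)] += 1
    return sorted(sizes.values(), reverse=True)


if __name__ == "__main__":
    # Toy data (invented for illustration): five pages on four sites.
    pages = [
        "http://www.wlv.ac.uk/a/x.html",
        "http://www.wlv.ac.uk/a/y.html",
        "http://www.cs.ox.ac.uk/b/z.html",
        "http://www.ed.ac.uk/c/w.html",
        "http://www.lboro.ac.uk/d/v.html",
    ]
    links = [
        ("http://www.wlv.ac.uk/a/x.html", "http://www.wlv.ac.uk/a/y.html"),
        ("http://www.wlv.ac.uk/a/x.html", "http://www.cs.ox.ac.uk/b/z.html"),
        ("http://www.cs.ox.ac.uk/b/z.html", "http://www.ed.ac.uk/c/w.html"),
    ]
    for layer in ("page", "directory", "domain", "site"):
        sizes = component_sizes(*build_layer(pages, links, layer))
        print(layer, sizes, "isolated:", sizes.count(1))
```

On the toy data the internal `wlv.ac.uk` link is discarded, so the page layer gives component sizes `[3, 1, 1]` (two isolated pages), while the directory, domain and site layers each give `[3, 1]` because aggregation merges the two pages in `/a/`.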
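The dependence of the communities on the algorithm's parameter can be illustrated with a parameterised minimum-cut formulation of the kind associated with Flake and colleagues. This sketch is an assumption, not the paper's exact procedure: links are treated as undirected unit-capacity edges, an artificial sink (`SINK`) is joined to every node with capacity `alpha`, and the community of a seed is the set of nodes still reachable from it in the residual graph after an Edmonds-Karp maximum flow. The function names `seed_community` and `all_seed_communities` are hypothetical.

```python
from collections import defaultdict, deque

SINK = object()  # artificial sink node added by the construction


def seed_community(edges, seed, alpha):
    """Community of `seed` under an alpha-parameterised min cut (a sketch)."""
    # Residual capacities: each undirected link gives capacity 1 in both directions.
    res = defaultdict(lambda: defaultdict(float))
    nodes = {seed}
    for a, b in edges:
        if a != b:
            res[a][b] += 1.0
            res[b][a] += 1.0
            nodes.update((a, b))
    # Join every node to the artificial sink with capacity alpha.
    for n in nodes:
        res[n][SINK] += alpha

    def bfs():
        """Breadth-first search from the seed over edges with spare capacity."""
        parent = {seed: None}
        queue = deque([seed])
        while queue:
            u = queue.popleft()
            for v, cap in res[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    if v is SINK:
                        return parent
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest paths until the sink is unreachable.
    while True:
        parent = bfs()
        if SINK not in parent:
            break
        path, v = [], SINK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][w] for u, w in path)
        for u, w in path:
            res[u][w] -= bottleneck
            res[w][u] += bottleneck

    # Source side of the minimum cut: nodes still reachable from the seed.
    return set(bfs()) - {SINK}


def all_seed_communities(edges, alpha):
    """Run seed_community from every node, i.e. all single starting points."""
    nodes = {n for edge in edges for n in edge}
    return {n: seed_community(edges, n, alpha) for n in nodes}
```

On two triangles joined by a single link (`A-B-C` and `D-E-F`, with `C-D` connecting them), seeding at `A` returns `{A}` for `alpha=3.0`, `{A, B, C}` for `alpha=0.4` and all six nodes for `alpha=0.1`: a large `alpha` makes every extra member expensive, a small one lets the whole component through, which mirrors the abstract's observation that the number and size of communities vary strongly with the parameter.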

Publisher

Emerald

Subject

Library and Information Sciences, Information Systems

References: 49 articles.

1. Albert, R., Jeong, H. and Barabási, A.L. (1999), “Diameter of the World‐Wide Web”, Nature, Vol. 401, pp. 130‐1.

2. Arasu, A., Cho, J., Garcia‐Molina, H., Paepcke, A. and Raghavan, S. (2001), “Searching the Web”, ACM Transactions on Internet Technology, Vol. 1 No. 1, pp. 2‐43.

3. Baeza‐Yates, R. and Castillo, C. (2001), “Relating Web characteristics with link based Web page ranking”, in Proceedings of SPIRE 2001, IEEE CS Press, November, pp. 21‐32.

4. Björneborn, L. (2001), “Small‐world linkage and co‐linkage”, in Proceedings of the 12th ACM Conference on Hypertext and Hypermedia, ACM Press, New York, NY, pp. 133‐4.

5. Borgman, C. and Furner, J. (2002), “Scholarly communication and bibliometrics”, in Cronin, B. (Ed.), Annual Review of Information Science and Technology 36, Information Today, Medford, NJ, pp. 3‐72.

Cited by 11 articles.

1. Website Quality Evaluation Methodology Universal Star: 1st point – “Content”; Informatics; 2020-09-30

2. A hierarchical typology of scholarly information units: based on a deduction-verification study; Journal of Documentation; 2019-09-20

3. Measuring the alignment of websites and organisational critical activities; Technology Analysis & Strategic Management; 2015-03-10

4. Towards the Development of Community Algorithm; 2009 International Conference on Information Management and Engineering; 2009

5. Towards the Development of Human Community Ontology; 2009 WRI World Congress on Software Engineering; 2009