Author:
Atanassova Iana, Bertin Marc, Larivière Vincent
Abstract
Purpose
– Scientific abstracts reproduce only part of the information and the complexity of argumentation in a scientific article. The purpose of this paper is to provide a first analysis of the similarity between the text of scientific abstracts and the body of articles, using sentences as the basic textual unit. It contributes to the understanding of the structure of abstracts.
Design/methodology/approach
– Using sentence-based similarity metrics, the authors quantify the phenomenon of text re-use in abstracts and examine where the body sentences that are similar to abstract sentences fall within the introduction, methods, results and discussion structure. The corpus comprises over 85,000 research articles published in the seven Public Library of Science journals.
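As an illustration of the general idea only, the following is a minimal sketch of sentence-level matching between an abstract and an article body. The abstract does not name the similarity metric, the threshold or the sentence segmentation used by the authors, so everything here is an assumption: Jaccard overlap on lowercase word tokens, a hypothetical threshold of 0.8, a naive punctuation-based splitter, and the hypothetical function names `split_sentences`, `jaccard` and `reused_sentences`.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(sentence):
    """Set of lowercase alphanumeric word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap of the token sets of two sentences (assumed metric)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def reused_sentences(abstract, body, threshold=0.8):
    """Return (abstract_idx, body_idx, relative_position, score) for each
    abstract/body sentence pair whose similarity reaches the threshold.

    relative_position is 0.0 for the first body sentence and 1.0 for the last.
    The threshold value is a hypothetical choice, not taken from the paper.
    """
    body_sents = split_sentences(body)
    last = max(len(body_sents) - 1, 1)  # avoid division by zero for short bodies
    matches = []
    for a_idx, a_sent in enumerate(split_sentences(abstract)):
        for b_idx, b_sent in enumerate(body_sents):
            score = jaccard(a_sent, b_sent)
            if score >= threshold:
                matches.append((a_idx, b_idx, b_idx / last, score))
    return matches
```

Collecting the relative positions of matched body sentences over a whole corpus would give a distribution that can be compared against section boundaries. A real pipeline would likely need a proper sentence segmenter and section annotations, which this sketch leaves out.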
Findings
– The authors provide evidence that 84 percent of abstracts have at least one sentence in common with the body of the paper. Studying the distributions of sentences in the body of the articles that are re-used in abstracts, the authors show that there exists a strong relation between the rhetorical structure of articles and the zones that authors re-use when writing abstracts, with sentences mainly coming from the beginning of the introduction and the end of the conclusion.
Originality/value
– Scientific abstracts contain what the author(s) consider to be the information that best describes the document's content. This is a first study that examines the relation between the contents of abstracts and the rhetorical structure of scientific articles. The work might provide new insight for improving automatic abstracting tools as well as information retrieval approaches, in which text organization and structure are important features.
Subject
Library and Information Sciences, Information Systems
Cited by
23 articles.