Extraction of Structure and Content from the Edgar Database: A Template-Based Approach

Author:

Cong Yu (1), Kogan Alexander (2), Vasarhelyi Miklos A. (2)

Affiliation:

1. Towson University

2. Rutgers, The State University of New Jersey, Newark

Abstract

This paper presents a template-based approach to extracting data from the EDGAR database. A set of heuristic-based templates configures the trainable system so that each type of EDGAR filing is processed under a single configuration. Such configurability is highly desirable because it adds expandability and flexibility to the system. The template-based approach also enables the system to extract both structural information and content from EDGAR filings. The ability to extract structural information from a section or a complete filing makes it possible to collect data from real-world documents for users of financial data in both academia and industry. We use the income statement section of 10-K filings to illustrate the system and the use of the template-based approach.

Publisher

American Accounting Association

Subject

Computer Science Applications, Accounting


Cited by 9 articles.

1. Say on pay votes, subsequent firm performance, and CEO risk‐taking behavior;Journal of Corporate Accounting & Finance;2022-05-02

2. Corporate governance, CEO turnover and say on pay votes;Accounting Research Journal;2021-07-26

3. Exploring the U.S. Securities and Exchange Commission’s Edgar database by sampling joint venture contracts;International Journal of Disclosure and Governance;2020-07-28

4. A Position-Based Method for the Extraction of Financial Information in PDF Documents;Proceedings of the 21st Australasian Document Computing Symposium;2016-12-05

5. Natural Language Processing in Accounting, Auditing and Finance: A Synthesis of the Literature with a Roadmap for Future Research;Intelligent Systems in Accounting, Finance and Management;2016-03-01
