General framework for mining, processing and storing large amounts of electronic texts for language modeling purposes

Published: 2013-07-24 Issue: 2 Volume: 48 Pages: 227–248
ISSN: 1574-020X
Container-title: Language Resources and Evaluation
Language: en
Short-container-title: Lang Resources & Evaluation

Author:

Švec Jan, Lehečka Jan, Ircing Pavel, Skorkovská Lucie, Pražák Aleš, Vavruška Jan, Stanislav Petr, Hoidekr Jan

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences, Linguistics and Language, Education, Language and Linguistics

Link

http://link.springer.com/content/pdf/10.1007/s10579-013-9246-z.pdf

References (25 articles)

1. Baroni, M., & Bernardini, S. (2004). BootCaT: Bootstrapping corpora and terms from the web. In Proceedings of LREC 2004, pp. 1313–1316.

2. Broder, A. Z., Glassman, S. C., Manasse, M. S., & Zweig, G. (1997). Syntactic clustering of the web. Computer Networks and ISDN Systems, 29(8–13), 1157–1166.

3. Bulyko, I., Ostendorf, M., Siu, M., Ng, T., Stolcke, A., & Çetin, O. (2007). Web resources for language modeling in conversational speech recognition. ACM Transactions on Speech and Language Processing (TSLP), 5(1), 1:1–1:25.

4. Fairon, C. (2006). Corporator: A tool for creating RSS-based specialized corpora. In Proceedings of the 2nd international workshop on web as corpus, WAC ’06 (pp. 43–49). Stroudsburg, PA, USA: Association for Computational Linguistics.

5. Kanis, J., & Skorkovská, L. (2010). Comparison of different lemmatization approaches through the means of information retrieval performance. In P. Sojka, A. Horák, I. Kopeček, & K. Pala (Eds.), TSD 2010. LNCS (Vol. 6231, pp. 93–100). Heidelberg: Springer.

Cited by 14 articles.

1. T5G2P: Text-to-Text Transfer Transformer Based Grapheme-to-Phoneme Conversion;IEEE/ACM Transactions on Audio, Speech, and Language Processing;2024

2. Is it Possible to Re-Educate RoBERTa? Expert-Driven Machine Learning for Punctuation Correction;Journal of Linguistics/Jazykovedný časopis;2023-06-01

3. Text-to-Text Transfer Transformer Phrasing Model Using Enriched Text Input;Text, Speech, and Dialogue;2022

4. Automatic Grammar Correction of Commas in Czech Written Texts: Comparative Study;Text, Speech, and Dialogue;2022

5. T5G2P: Using Text-to-Text Transfer Transformer for Grapheme-to-Phoneme Conversion;Interspeech 2021;2021-08-30