
Graph-based Turkish text normalization and its impact on noisy text processing

Full Text - Article (1.034Mb)

Access

Open access (info:eu-repo/semantics/openAccess)

Date

2022

Author

Demir, Şeniz
Topçu, Berkay

Citation

Demir, S., & Topçu, B. (June 2022). Graph-based Turkish text normalization and its impact on noisy text processing. Engineering Science and Technology, an International Journal, pp. 1-13. https://doi.org/10.1016/j.jestch.2022.101192

Abstract

User-generated texts on the web are freely available and lucrative sources of data for language technology researchers. Unfortunately, these texts are often dominated by informal writing styles, and the language used in user-generated content poses processing difficulties for natural language tools. The resulting performance drops and processing issues can be addressed either by adapting language tools to user-generated content or by normalizing noisy texts before they are processed. In this article, we propose a Turkish text normalizer that maps non-standard words to their appropriate standard forms using a graph-based methodology and a context-tailoring approach. Our normalizer benefits from both contextual and lexical similarities between normalization pairs, as identified by a graph-based subnormalizer and a transformation-based subnormalizer. The performance of our normalizer is demonstrated on a tweet dataset in the most comprehensive intrinsic and extrinsic evaluations reported so far for Turkish. This is the first graph-based solution to Turkish text normalization, and its novel context-tailoring approach advances the state of the art by outperforming other publicly available normalizers. For the first time in the literature, we measure the extent to which the accuracy of a Turkish language processing tool is affected by normalizing noisy texts before they are processed. An analysis of these extrinsic evaluations, which cover more than one Turkish NLP task (i.e., part-of-speech tagging and dependency parsing), reveals that Turkish language tools are not robust to noisy texts and that a normalizer leads to remarkable performance improvements when used as a preprocessing tool in this morphologically rich language.
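
The abstract does not spell out the algorithm, so the following is only a minimal, hypothetical Python sketch of the general idea it names: scoring normalization candidates by combining contextual similarity (shared neighbours in a word-context graph) with lexical similarity. It is not the authors' normalizer. The Jaccard overlap of context nodes, the use of difflib's SequenceMatcher ratio in place of the article's transformation-based subnormalizer, the weight alpha, the threshold min_score, the lexicon of standard words and the toy sentences are all illustrative assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical illustration only: not the normalizer described in the article.


def build_context_graph(sentences, window=1):
    """Word-context graph: each word node links to (offset, neighbour) context nodes."""
    graph = defaultdict(set)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(tokens):
                    graph[word].add((offset, tokens[j]))
    return graph


def contextual_similarity(graph, a, b):
    """Jaccard overlap of the context nodes two words share (assumed measure)."""
    ca, cb = graph.get(a, set()), graph.get(b, set())
    if not ca or not cb:
        return 0.0
    return len(ca & cb) / len(ca | cb)


def lexical_similarity(a, b):
    """Character-level similarity in [0, 1] via difflib (assumed stand-in)."""
    return SequenceMatcher(None, a, b).ratio()


def normalize_token(token, graph, lexicon, alpha=0.5, min_score=0.5):
    """Map an out-of-vocabulary token to the best-scoring standard word, or keep it."""
    if token in lexicon:
        return token  # already a standard form
    best, best_score = token, 0.0
    for candidate in sorted(lexicon):  # sorted for deterministic tie-breaking
        score = (alpha * contextual_similarity(graph, token, candidate)
                 + (1 - alpha) * lexical_similarity(token, candidate))
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= min_score else token


# Toy data (hypothetical): "cok" is an ASCII-fied spelling of the standard word "çok".
sentences = [
    ["bugün", "çok", "güzel"],
    ["bugün", "cok", "güzel"],
    ["hava", "çok", "soğuk"],
]
lexicon = {"bugün", "çok", "güzel", "hava", "soğuk"}
graph = build_context_graph(sentences)
print([normalize_token(t, graph, lexicon) for t in ["bugün", "cok", "güzel"]])
# → ['bugün', 'çok', 'güzel']
```

Here "cok" shares half of its context nodes with "çok" and differs from it by one character, so the combined score (about 0.58) clears the threshold; tokens with no convincing candidate are left unchanged. Raising alpha leans more on context evidence, lowering it leans more on spelling closeness.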

URI

https://doi.org/10.1016/j.jestch.2022.101192
https://hdl.handle.net/20.500.11779/1794

Collections

  • Research Outputs, Scopus-Indexed Publications Collection [455]
  • Research Outputs, WoS-Indexed Publications Collection [482]
  • MF, BM, Article Collection (Faculty of Engineering, Computer Engineering) [27]

MEF University Library, İstanbul, Turkey
MEF University Institutional Repository is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 Unported License.