Abstractive text summarization is an emerging natural language processing (NLP) technology that combines advanced natural language understanding (NLU) and natural language generation (NLG). Unlike extractive summarization, which selects a few informative sentences from the source text, abstractive summarization requires full-stack semantic parsing (the NLU technology) and text generation from the resulting abstract semantic structures (the NLG technology). The project's industrial partner, LETA, needs these technologies for efficient and innovative media monitoring and content production. The research partner, AiLab, has extensive experience in semantic parsing, language generation, and the creation of annotated language resources.

The goal of the project is twofold. First, to create a syntactically and semantically annotated multilayer text corpus for Latvian, anchored in established cross-lingual representations (Universal Dependencies, FrameNet, PropBank, AMR), together with a wide-coverage lexical database of Latvian that is linked to WordNet and accompanied by monolingual and multilingual computational lexicons. Second, to demonstrate the use of these language resources in developing data-driven NLU components and knowledge-based NLG components, and to combine these components into a proof-of-concept abstractive summarization pipeline.
Contract No. 1.1.1.1/16/A/219
Project Manager: Normunds Grūzītis
Duration of the project: 2016-2019