Using the SB Sam NLP tools for manual and automatic annotation of climate change texts
Reference number | |
Coordinator | Institutet för språk och folkminnen |
Funding from Vinnova | SEK 182 331 |
Project duration | November 2021 - December 2022 |
Status | Completed |
Venture | AI - Competence, ability and application |
Call | Staff exchange for applied AI-research 2.0 |
Important results from the project
The natural language processing technique "topic modeling" was applied to extract recurring topics from two corpora on climate change collected by "the Applied CompLing Discourse Research Lab": (i) a collection of recently published German tweets, and (ii) a collection of editorials published in the journals Nature and Science between 1969 and 2016. The results were presented at the Clarin conference, held in Prague in 2022, and in an article published in the Journal of Computational Social Science.
Expected long-term effects
For the corpus of editorials from Nature and Science, we compared topic trends identified through manual annotation of the editorials with trends extracted automatically by topic modeling. Most of the major trends found by the manual annotation were also found automatically by topic modeling. These results illustrate how natural language processing methods can support tasks that would otherwise require a fully manual analysis of large amounts of text.
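The summary does not state how the manual and automatic trends were compared. Purely as an illustration, the sketch below assumes that each editorial has a publication year, a hypothetical manual label (True if the annotators assigned a given theme) and a topic weight from a topic model. It computes the per-decade share of editorials flagged by each method and compares the two series with a Pearson correlation. The decade bins, the 0.3 weight threshold, the function names and the toy data are all assumptions, not values from the study.

```python
from collections import defaultdict
from math import sqrt


def share_per_period(years, flags, period=10):
    """Fraction of documents per period (default: decade) whose flag is True."""
    totals, hits = defaultdict(int), defaultdict(int)
    for year, flag in zip(years, flags):
        start = year - year % period  # e.g. 1987 -> 1980
        totals[start] += 1
        hits[start] += int(flag)
    return {p: hits[p] / totals[p] for p in sorted(totals)}


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the Nature/Science corpus.
    years = [1975, 1978, 1985, 1988, 1995, 1998, 2005, 2008]
    manual = [False, False, False, True, True, True, True, True]
    topic_weight = [0.05, 0.1, 0.2, 0.4, 0.5, 0.6, 0.7, 0.8]
    automatic = [w >= 0.3 for w in topic_weight]  # assumed threshold

    manual_trend = share_per_period(years, manual)
    auto_trend = share_per_period(years, automatic)
    r = pearson(list(manual_trend.values()), list(auto_trend.values()))
    print(manual_trend, auto_trend, round(r, 3))
```

A rank-based measure or a visual comparison of the two curves would be equally reasonable choices; correlation is used here only because it gives a single number to inspect.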
Approach and implementation
The text-mining tool Topics2Themes was used to explore both corpora. The tool is currently maintained and further developed at the Swedish national research infrastructure node "SB Sam", which is also a Clarin node. Topics2Themes uses topic modeling to automatically extract frequently recurring topics from large text collections, and displays the output in an interactive graphical user interface. Most of the collaboration was conducted on-site in Potsdam. All travel, i.e., visits to Potsdam as well as trips to conferences and seminars, was made by train.
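Topic modeling can be illustrated with non-negative matrix factorization (NMF), one common technique that approximates a document-term count matrix V as the product of a document-topic matrix W and a topic-term matrix H. The sketch below is a minimal pure-Python version using the Lee and Seung multiplicative update rules. It is not Topics2Themes' code, and the function names (`doc_term_matrix`, `nmf`, `top_terms`, `reconstruction_error`) and toy documents are hypothetical. Real tools typically add stop-word removal, TF-IDF weighting and more careful initialization.

```python
import random
from math import sqrt


def doc_term_matrix(docs):
    """Raw term counts: one row per document, one column per vocabulary word."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    V = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w in d.lower().split():
            V[i][index[w]] += 1.0
    return V, vocab


def nmf(V, k, iterations=200, seed=0):
    """Factorize V (docs x terms) into W (docs x k) and H (k x terms), V ~ W H."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    eps = 1e-9  # avoids division by zero
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        WtV = [[sum(W[i][a] * V[i][j] for i in range(n)) for j in range(m)] for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
        WtWH = [[sum(WtW[a][b] * H[b][j] for b in range(k)) for j in range(m)] for a in range(k)]
        H = [[H[a][j] * WtV[a][j] / (WtWH[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        VHt = [[sum(V[i][j] * H[a][j] for j in range(m)) for a in range(k)] for i in range(n)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(m)) for b in range(k)] for a in range(k)]
        WHHt = [[sum(W[i][b] * HHt[b][a] for b in range(k)) for a in range(k)] for i in range(n)]
        W = [[W[i][a] * VHt[i][a] / (WHHt[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H


def reconstruction_error(V, W, H):
    """Frobenius norm of V - W H."""
    k = len(H)
    return sqrt(sum(
        (V[i][j] - sum(W[i][a] * H[a][j] for a in range(k))) ** 2
        for i in range(len(V)) for j in range(len(V[0]))
    ))


def top_terms(H, vocab, n=3):
    """The n highest-weighted words of each topic."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]] for row in H]


if __name__ == "__main__":
    docs = [  # hypothetical toy documents
        "ice sheet melting arctic ice",
        "arctic ice melting sea level",
        "carbon emissions policy carbon tax",
        "emissions policy carbon targets",
    ]
    V, vocab = doc_term_matrix(docs)
    W, H = nmf(V, k=2)
    for t, terms in enumerate(top_terms(H, vocab)):
        print(f"topic {t}: {terms}")
```

The rows of W give each document's weight per topic, which is the kind of per-document output a tool can aggregate over time or display in an interface.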