Seminari

A lexicon-based approach to analyse short texts from social media

by Anastasiya Sopyryaeva

Europe/Rome
Sala Venturi+Prendiparte


Description

When analyzing social media texts, what makes the difference is a suitable lexicon, one that enables NLP algorithms to understand, analyze, and process human language effectively. In this report, we describe the approach used to define new lexicons for cultural heritage and vandalism and to extend the CrisisLexRec lexicon for natural disasters. Furthermore, we demonstrate how these lexicons can be applied to identify text themes, using topic modeling techniques such as BERTopic and the Dirichlet multinomial mixture model.
Additionally, we perform named entity recognition with a BERT model to identify entities in tweets and assign each one to the most suitable class.
The study was carried out in the context of the 4CH project on a set of English tweets collected from 1 January 2023 to 26 April 2023 about cultural heritage, vandalism, bombing, and a set of natural disasters such as downpour, earthquake, explosion, fire, flood, hail, landslide, squall, tsunami, and volcano.
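
As a rough illustration of what lexicon-based theme identification can look like, the sketch below counts lexicon hits in a short text and picks the best-matching theme. It is a minimal example under stated assumptions, not the method presented in the seminar: the lexicon entries, the theme names, and the functions `score_themes` and `assign_theme` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy lexicons; the terms are illustrative only and are
# NOT the lexicons built or extended in the study.
LEXICONS = {
    "cultural_heritage": {"museum", "monument", "heritage", "cathedral"},
    "vandalism": {"vandalism", "graffiti", "defaced", "vandalized"},
    "natural_disaster": {"earthquake", "flood", "landslide", "tsunami"},
}

# Lowercase alphabetic tokens; this also strips '#' from hashtags.
TOKEN_RE = re.compile(r"[a-z]+")


def score_themes(text):
    """Count how many tokens of the text appear in each lexicon."""
    tokens = TOKEN_RE.findall(text.lower())
    counts = Counter()
    for theme, terms in LEXICONS.items():
        counts[theme] = sum(1 for tok in tokens if tok in terms)
    return counts


def assign_theme(text):
    """Return the theme with the most lexicon hits, or None if nothing matches."""
    counts = score_themes(text)
    theme, hits = max(counts.items(), key=lambda kv: kv[1])
    return theme if hits > 0 else None


if __name__ == "__main__":
    for tweet in [
        "Flood damages the old cathedral after the #earthquake",
        "Graffiti defaced the monument last night",
        "Good morning everyone",
    ]:
        print(assign_theme(tweet), "<-", tweet)
```

Plain counting like this ignores word forms and context; in practice a lexicon match would more likely serve as a filter or a seed for the topic models mentioned above than as a final classifier.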