Resources
Project resources include datasets related to COVID-19 communication in social media, as well as language and classification models.
Datasets related to COVID-19 communication in Social Media
Senti-Cro-CoV-Tweets: dataset of 10,000 annotated COVID-19 related tweets in the Croatian language (each entry contains a tweet ID annotated with one of the labels: positive, negative, or neutral).
Senti-Cro-CoV-Reddit: dataset of 6,000 annotated messages in the Croatian language (labels: positive, negative, neutral).
Cro-CoV-Tweets: dataset of COVID-19 related tweets in the Croatian language posted between 1 January 2020 and 31 May 2021.
Cro-CoV-Texts: dataset with samples of pre-processed COVID-19 related news articles in the Croatian language published on online portals.
Cro-CoV-Texts-Comments: dataset with samples of pre-processed COVID-19 related user comments in the Croatian language posted on online portals.
Cro-CoV-Texts-Unigrams: dataset of unigrams extracted from the COVID-19 related news articles.
Cro-CoV-Texts-Links: dataset of links to COVID-19 related news articles.
Cro-CoV-Nets: datasets of networks and multilayer networks collected from Twitter.
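Senti-Cro-CoV-Tweets pairs each tweet ID with one of three sentiment labels. The sketch below shows one way to load and sanity-check such annotations. It assumes a hypothetical CSV layout with `tweet_id` and `label` columns; the actual distribution format may differ. The function names and sample rows are illustrative only and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Label set as described for the Senti-Cro-CoV-* datasets.
# The CSV layout (tweet_id,label) is an assumption for illustration.
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def load_annotations(text: str) -> list[tuple[str, str]]:
    """Parse hypothetical 'tweet_id,label' rows and validate each label."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        label = record["label"].strip().lower()
        if label not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label {label!r} for tweet {record['tweet_id']}")
        # Keep tweet IDs as strings: they are large integers that tools
        # such as spreadsheets may silently round.
        rows.append((record["tweet_id"].strip(), label))
    return rows


def label_distribution(rows: list[tuple[str, str]]) -> Counter:
    """Count how many tweets carry each sentiment label."""
    return Counter(label for _, label in rows)


# Made-up IDs for illustration only; they are not real dataset entries.
sample = """tweet_id,label
1000000000000000001,positive
1000000000000000002,neutral
1000000000000000003,negative
1000000000000000004,neutral
"""
rows = load_annotations(sample)
print(label_distribution(rows))
```

Keeping IDs as strings is a deliberate choice: tweet IDs are rehydrated through the Twitter API, so any loss of precision would make them unusable.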
Language models and classification models
Cro-CoV-cseBERT: general language model (based on cseBERT) trained on large corpora of COVID-19 related texts written in Croatian.
Cro-CoV-BERTić: general language model (based on BERTić) trained on large corpora of COVID-19 related texts written in Croatian.
Senti-CoV-cseBERT: model trained for sentiment classification in the domain of COVID-19.
Multi-Cro-CoV-cseBERT: model trained for prediction of spreading in the domain of COVID-19.
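A sentiment classifier such as Senti-CoV-cseBERT typically ends in a classification head that emits one score (logit) per class. The sketch below shows how such scores could be turned into a label with a numerically stable softmax and argmax. It assumes three classes ordered as negative, neutral, positive; this order is an assumption, and the real mapping should be read from the released model's configuration. The logits in the example are hypothetical.

```python
import math

# Assumed class order for a three-way sentiment head (not confirmed by the source).
LABELS = ("negative", "neutral", "positive")


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: list[float]) -> tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for a single message.
label, p = decode([-1.2, 0.3, 2.1])
print(label, round(p, 3))
```

Subtracting the maximum logit does not change the resulting probabilities but prevents overflow in `math.exp` for large scores.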