
Resources
Project resources include datasets related to COVID-19 communication on social media, as well as language models and classification models.

 

Datasets related to COVID-19 communication in Social Media

Senti-Cro-CoV-Tweets: dataset of 10,000 annotated COVID-19 related tweets in the Croatian language (each tweet ID is annotated with one of the labels: positive, negative, neutral).

Senti-Cro-CoV-Reddit: dataset of 6,000 annotated messages in the Croatian language (labels: positive, negative, neutral).

Cro-CoV-Tweets: dataset of COVID-19 related tweets in the Croatian language posted between 1 January 2020 and 31 May 2021.

Cro-CoV-Texts: dataset with samples of pre-processed COVID-19 related news articles in the Croatian language published on online portals.

Cro-CoV-Texts-Comments: dataset with samples of pre-processed COVID-19 related user comments in the Croatian language published on online portals.

Cro-CoV-Texts-Unigrams: dataset with a collection of unigrams extracted from the COVID-19 related news articles.

Cro-CoV-Texts-Links: dataset of links to COVID-19 related news articles.

Cro-CoV-Nets: datasets of networks and multilayer networks collected from Twitter.
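
The annotated datasets above pair each item with one of three labels (positive, negative, neutral). As a minimal sketch of how such annotations might be checked and summarised, the Python snippet below counts the label distribution. The file layout (a CSV with a header row), the column names tweet_id and label, and the sample rows are assumptions made for illustration only; the actual distribution format of the datasets may differ.

```python
import csv
import io
from collections import Counter

# The three labels named in the dataset descriptions.
VALID_LABELS = {"positive", "negative", "neutral"}


def read_rows(handle):
    """Yield (tweet_id, label) pairs from a CSV with a header row.

    The column names 'tweet_id' and 'label' are hypothetical.
    """
    for record in csv.DictReader(handle):
        yield record["tweet_id"], record["label"]


def label_distribution(rows):
    """Count how often each label occurs; reject labels outside the known set."""
    counts = Counter()
    for tweet_id, label in rows:
        label = label.strip().lower()
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label {label!r} for tweet {tweet_id}")
        counts[label] += 1
    return counts


# Made-up sample rows standing in for a real annotation file.
sample = io.StringIO(
    "tweet_id,label\n"
    "1001,positive\n"
    "1002,negative\n"
    "1003,neutral\n"
    "1004,negative\n"
)
counts = label_distribution(read_rows(sample))
print(dict(counts))
```

To run the same check on a real file, replace the in-memory sample with open(path, newline="", encoding="utf-8") and adjust the column names to match the file.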

 

Language models and classification models

Cro-CoV-cseBERT: general language model (based on cseBERT) trained on large corpora of COVID-19 related texts written in the Croatian language.

Cro-CoV-BERTić: general language model (based on BERTić) trained on large corpora of COVID-19 related texts written in the Croatian language.

Senti-CoV-cseBERT: model trained for sentiment classification in the domain of COVID-19.

Multi-Cro-CoV-cseBERT: model trained for prediction of spreading in the domain of COVID-19.
