A University of Maryland researcher, Ernesto Calvo, is analyzing real-time Twitter data from people in Ukraine to create an “early alarm system for human rights abuses” along with an international and multi-institutional team.
The project, called Data for Ukraine, consists of two main parts: figuring out who is saying things and deciphering what they are saying, said Calvo, a government and politics professor and director of the Interdisciplinary Lab for Computational Social Science at this university.
Calvo’s team determines which Twitter accounts are reliable enough to source data and information from. To do this, the team looks at where the community is gathering information, who is producing that information and how people and their social media accounts are connected to one another, he said.
By establishing a predetermined set of reliable accounts and networks of accounts, Calvo’s team can determine what information is reliable when events happen.
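The article does not describe the team's actual scoring method, but the idea of vetting accounts by their network connections can be illustrated with a minimal, purely hypothetical sketch: score each account by how many already-trusted "seed" accounts interact with it (for instance, by retweeting or mentioning it). All names and the threshold logic here are illustrative assumptions, not the project's real procedure.

```python
# Hypothetical sketch: rank accounts by how many distinct trusted seed
# accounts link to them (e.g. via retweets or mentions). This is NOT the
# Data for Ukraine team's actual method, which the article does not detail.

from collections import defaultdict

def reliability_scores(edges, trusted_seeds):
    """Score each account by the number of distinct trusted accounts
    that link to it in the interaction graph."""
    endorsements = defaultdict(set)
    for source, target in edges:
        if source in trusted_seeds:
            endorsements[target].add(source)
    return {account: len(backers) for account, backers in endorsements.items()}

# Toy interaction graph: (who_retweeted, whom_they_retweeted)
edges = [
    ("seed_a", "local_reporter"),
    ("seed_b", "local_reporter"),
    ("seed_a", "unknown_account"),
    ("random_user", "unknown_account"),
]
scores = reliability_scores(edges, trusted_seeds={"seed_a", "seed_b"})
print(scores)  # local_reporter is endorsed by two seeds, unknown_account by one
```

In a real system the seed set itself would be curated by hand, as the article suggests the team does when deciding where a community gathers its information.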
The Data for Ukraine team uses Twitter to collect data because it is widely used for global and local politics and other information that people want to be public, Calvo said.
“Twitter is a good platform, and it produces a lot of data. And the most important question is, is that data reliable, is that data good data?” Calvo said.
Erik Wibbels is a political science professor who leads Duke University’s DevLab. Wibbels and his team determine what the Twitter accounts are saying through a natural language processing element, Calvo said.
Graeme Robertson, a political science professor and Center for Slavic, Eurasian and East European Studies director at the University of North Carolina at Chapel Hill, also works on the language side of the project by identifying keywords and Russian translations.
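Matching tweets against a bilingual keyword list, as Robertson's work involves, can be sketched in a few lines. The specific keywords and matching rules below are invented for illustration; the article does not disclose the project's actual keyword list.

```python
# Hypothetical sketch: flag tweets containing any term from a bilingual
# (English/Russian) keyword list. The real project's keywords and matching
# logic are not described in the article.

KEYWORDS = {"shelling", "evacuation", "обстрел", "эвакуация"}

def matches_keywords(tweet_text):
    """Return True if any whitespace-separated word, lowercased and
    stripped of trailing punctuation, appears in the keyword set."""
    words = tweet_text.lower().split()
    return any(word.strip(".,!?") in KEYWORDS for word in words)

tweets = [
    "Heavy shelling reported near the station.",
    "Sunny day today.",
    "Началась эвакуация жителей.",
]
print(sum(matches_keywords(t) for t in tweets))  # 2 of the 3 tweets match
```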
To understand the language being used in tweets, Robertson said he often looks at spikes in the data and reads the underlying tweets to understand what is actually happening when those spikes occur.
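One simple way to define a "spike" of the kind Robertson describes is an hour whose tweet volume sits far above the series average. The two-standard-deviation threshold below is an assumption made for illustration, not the project's stated rule.

```python
# Hypothetical sketch: flag hours whose tweet counts exceed the mean by
# more than two population standard deviations. The article does not say
# how the team actually defines or detects spikes.

import statistics

def find_spikes(hourly_counts, threshold_sd=2.0):
    """Return indices of hours with unusually high tweet volume."""
    mean = statistics.mean(hourly_counts)
    sd = statistics.pstdev(hourly_counts)
    return [hour for hour, count in enumerate(hourly_counts)
            if sd > 0 and count > mean + threshold_sd * sd]

counts = [12, 15, 11, 14, 13, 95, 12, 10]  # tweet volume per hour
print(find_spikes(counts))  # [5] — only hour 5 stands out
```

An analyst would then read the underlying tweets from the flagged hour, as Robertson describes, rather than trusting the count alone.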
The project has drawn interest from people within the British government, human rights organizations in Geneva and within the Ukrainian government itself, Robertson said. But the team is still figuring out how to produce regular reports on the data and get those reports to interested parties.
“We’re trying to work out exactly what that’s gonna look like because this is all happening in real time and we’re all completely swamped,” Robertson said. “It’s been a whirlwind.”
Katrina Fenlon, a professor in the information studies college at this university who has a background in digital curation and archives, said analyzing and storing online data is a field that is “increasingly important, addressing problems that are increasingly urgent.”
“Data can be threatened by sort of shifts in political administrations, it can be threatened by war and by other crises that, you know, actually take down servers’ information,” Fenlon said.
But Fenlon pointed out that data preservation and archiving also require care about what information is stored, in order to protect the people who produce the data.
“These social media data can contain traces of information that personally identifies the people who posted it, for example, or their location. And it can endanger the privacy or the rights or the well-being of people who share information,” Fenlon said.
Twitter even released specific guidance to Ukrainians on how to secure or delete their accounts because of the risks of their Twitter data and location information being used by Russian forces to locate them, Fenlon said.
However, Fenlon said this risk mainly applies to projects that archive and reuse such data without context.
The data in Data for Ukraine is largely drawn from highly public accounts with many followers; otherwise, the users are anonymized, Robertson said.
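A common way to implement this kind of anonymization is to keep widely followed public accounts identifiable while replacing other handles with a salted hash. The follower threshold, salt, and function names below are illustrative assumptions; the article does not describe the team's actual procedure.

```python
# Hypothetical sketch: pseudonymize accounts below a follower threshold by
# replacing the handle with a salted SHA-256 digest. Illustrative only —
# not the Data for Ukraine team's actual anonymization scheme.

import hashlib

PUBLIC_FOLLOWER_THRESHOLD = 100_000
SALT = b"project-secret-salt"  # kept private so hashes can't be reversed by guessing

def anonymize(handle, follower_count):
    """Keep highly public accounts as-is; pseudonymize everyone else."""
    if follower_count >= PUBLIC_FOLLOWER_THRESHOLD:
        return handle
    digest = hashlib.sha256(SALT + handle.encode("utf-8")).hexdigest()
    return "user_" + digest[:12]

print(anonymize("major_news_outlet", 2_000_000))  # handle kept
print(anonymize("private_citizen", 150))          # replaced with a pseudonym
```

Salting matters here: an unsalted hash of a known handle could be recomputed by anyone, defeating the purpose of the pseudonym.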
In the future, Robertson said he hopes the original tweets used in their data collection can be archived and used by other researchers.
But for now, because the data collection is happening in real time, the researchers are mainly focused on getting the data out and making it as accurate as possible, Calvo explained.
“When you’re working in the past, you have all the time in the world to see what things are not working and to correct the model and to work again,” Calvo said. “You are going to lose some precision, and you’re going to have to deal with different types of data problems when you are trying to get things out quickly in the most efficient way possible.”