Six years ago I lived in Damascus, Syria.
It had already been two years since the beginning of the Arab Spring there. Life had to be planned and lived one day at a time. We adjusted to the changes and challenges as they came: less electricity, less water, less time outdoors, less air to breathe, and on the other hand a more expensive life, more crowding, more explosions, more death notices, more restrictions, more destruction. Over time, all of that became too hard to live with, and it was time for me to join the many friends, family members, and strangers with familiar stories who were leaving the country. I went through my own displacement journey until I settled in Ireland, but I always looked back, and still do, at all the Syrians scattered by the unfortunate events that took place in Syria, and I feel hopeless. I used to believe that donations, whether humanitarian or financial, were the only way to make the lives of refugees better, until I discovered the power of big data.
Social media, news articles, satellite images and online searches are all big data sources that are publicly available (to some extent) and can be harnessed for social good. For example, Facebook advertisements can tell us how closely the interests of migrants and locals match. A user’s network of friends can indicate how well a community is integrated in a host country. People’s tweets can also be used as an indicator of integration, xenophobia, unemployment or human mobility. Satellite imagery is another useful non-traditional data source for understanding the dynamics and evolution of refugee camps: it can provide, for example, the number of people and shelters, which can inform camp planning and monitoring. It is also used to measure damage in areas that are out of reach.
What the News Says About Refugees
I started looking at those data sources differently. Facebook was never the same for me. The Syrian crisis was documented through its pages and accounts. Twitter was awash with hashtags about refugees, some welcoming and some far less so. Satellite images showed what had happened in my home country, and they were the only way I could learn more about my house in the Damascus suburbs after we lost connection with the neighborhood. Unfortunately, the house was in a red square. My morning news browsing was full of refugee-related articles: welcoming refugees, blaming refugees, describing their escape journeys, their integration, their problems.
“Migrants are a threat to Hungary” – Viktor Orban, Hungary’s Prime Minister
“Refugees are a burden because they exploit the social benefits and work of the native inhabitants.” 65% of Italians believe this, according to a poll by Ipsos MORI
“Refugees do not come with firewood from their countries, they have destroyed our environment.” – Rebecca Kadaga, The Speaker of Parliament of Uganda
“Migrants are lucky we aren’t executing them” – a US National Guardsman
This is just a sample of the statements made publicly by people with high profiles and positions of authority around the world, and it covers just one month. Can you imagine what we could find if we dug deeper? Not only are these messages harshly worded (and delivered), but they also affect how people think of refugees and therefore how they interact with them. News “bubbles” can polarize host communities, and filtered information focused on negative stereotypes can make communication and integration between refugees and host communities much harder. But can we actually detect xenophobia from news articles?
What Effect Does the News Have?
I decided to explore the news using GDELT (the Global Database of Events, Language, and Tone), the biggest open-source collection of news articles, updated at high frequency (every 15 minutes) and covering over 100 languages. GDELT has previously been used to measure refugee flows in Europe based on media citations. It has also been used to measure the sentiment of German media coverage of refugee issues, and how this aligns with public interest as reflected in Google searches.
I was curious to explore the messages found in the news and create an overview of them. I extracted news from GDELT for one month, June 2018, looking for sentences describing refugees, migrants and asylum seekers.
Example: “Refugees are a threat to Hungary”
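The extraction step above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used: it assumes the article text is already available as a plain string, and the names KEYWORDS, split_sentences and refugee_sentences are hypothetical.

```python
import re

# Keywords from the post: refugees, migrants and asylum seekers (singular or plural).
KEYWORDS = re.compile(r"\b(refugees?|migrants?|asylum[ -]seekers?)\b", re.IGNORECASE)

def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def refugee_sentences(text):
    """Keep only the sentences that mention one of the keywords."""
    return [s for s in split_sentences(text) if KEYWORDS.search(s)]

article = (
    "Refugees are a threat to Hungary. "
    "The weather was mild. "
    "Asylum seekers arrived by boat."
)
for sentence in refugee_sentences(article):
    print(sentence)
```

A real pipeline would probably need a more robust sentence splitter, since abbreviations such as "U.S." break this one.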
Next, I extracted locations from those sentences. If no location was found in a sentence, I picked one at random from the article:
Example: “Refugees are a threat to Hungary” => Location: Hungary
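One way to picture the location step, including the random fallback, is the sketch below. It assumes a tiny hand-written list of country names (COUNTRIES) in place of whatever location extractor was really used; find_locations and locate are hypothetical helper names.

```python
import random
import re

# Tiny illustrative gazetteer (an assumption); a real one would be far larger.
COUNTRIES = ["Hungary", "Italy", "Uganda", "Germany", "Syria"]

def find_locations(text):
    """Return the gazetteer entries mentioned in the text, in gazetteer order."""
    return [c for c in COUNTRIES if re.search(rf"\b{re.escape(c)}\b", text)]

def locate(sentence, article, rng=random):
    """Prefer a location in the sentence; otherwise pick a random one from the article."""
    in_sentence = find_locations(sentence)
    if in_sentence:
        return in_sentence[0]  # first gazetteer match found in the sentence
    in_article = find_locations(article)
    return rng.choice(in_article) if in_article else None

print(locate("Refugees are a threat to Hungary", "Refugees are a threat to Hungary."))
```

The random fallback is cheap but can attach a sentence to a country that is only mentioned in passing, so the resulting locations are best treated as approximate.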
After that, I measured the sentiment of the sentence: whether it is a positive or negative story about refugees, or just a neutral piece of news. Since the number of news items is large and there is no labeled data for journalism sentiment, I started with a naive approach and applied an off-the-shelf classifier trained on movie reviews.
Example: “Refugees are a threat to Hungary” => Negative
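The post does not say which off-the-shelf classifier was used. As a stand-in, here is a toy multinomial Naive Bayes classifier written from scratch and trained on a handful of made-up movie reviews; the training data, the two-label setup and the names TRAIN, train and classify are all assumptions for illustration.

```python
import math
import re
from collections import Counter

# Made-up movie reviews standing in for a real training corpus (assumption).
TRAIN = [
    ("a wonderful and moving film", "positive"),
    ("great acting and a great story", "positive"),
    ("a terrible and boring film", "negative"),
    ("awful plot and terrible acting", "negative"),
]

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """Count words per label; return (word_counts, label_counts, vocabulary)."""
    word_counts, label_counts = {}, Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        score = math.log(label_counts[label] / total)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
print(classify("a terrible, boring film", model))  # negative
```

Because unseen words are ignored, news vocabulary such as "threat" carries no signal for a model like this, which is one reason a movie-review classifier is only a naive starting point for news.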
Users can then take part in labeling the news, and these crowd-sourced labels improve the accuracy of the classifier so that future news is tagged better.
After extracting the location and the sentiment for each article, I visualized the results on a map.
Users can click on a country to see which news items about refugees appeared on that date. Orange circles indicate negative news.
In addition to the map, there are three parts that help users better understand the narrative and answer the following questions:
What are the main topics in the refugee-related articles?
Using Latent Semantic Analysis (LSA) over the articles, I extracted groups of keywords, each representing a topic. A topic is a group of words that appear together in the same documents. Each topic is shown in its own color. The size of a node represents the weight of that word in the documents (calculated using tf-idf): the bigger the node, the more important the word. Topic keywords overlap in some cases, which is shown by the links between them.
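The node weights mentioned above can be illustrated with a small tf-idf computation. The sketch covers only the weighting, not LSA itself, which would additionally apply a truncated singular value decomposition to the resulting term-document matrix. The function name tfidf, the exact formula variant and the sample documents are assumptions.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns one {word: weight} dict per document.

    Uses tf = count / document length and idf = log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        term_counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in term_counts.items()
        })
    return weights

docs = [
    ["refugees", "border", "camp"],
    ["refugees", "school", "camp"],
    ["refugees", "border", "fence"],
]
for doc_weights in tfidf(docs):
    print(doc_weights)
```

With this variant, a word that appears in every document (here "refugees") gets a weight of zero, while a word unique to one document gets the largest weight.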
What are the words most co-occurring with refugees?
Users can see a word cloud of the words that most commonly co-occur with refugees, after removing stop words and noisy words (e.g., crisis, camps). The bigger the word, the more frequent it is.
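The counts behind such a word cloud could be produced along these lines. The stop-word list is a tiny illustrative subset, co-occurrence is assumed to mean "appears in the same sentence", and cooccurring_words is a hypothetical name.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "are", "is", "to", "of", "in", "and", "for", "at", "by"}
NOISY_WORDS = {"crisis", "camps"}   # the examples given in the post
TARGETS = {"refugee", "refugees"}
EXCLUDED = STOP_WORDS | NOISY_WORDS | TARGETS

def cooccurring_words(sentences):
    """Count words that share a sentence with 'refugee(s)', minus excluded words."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        if TARGETS & set(tokens):
            counts.update(t for t in tokens if t not in EXCLUDED)
    return counts

sentences = [
    "Refugees are a threat to Hungary",
    "Refugees flee the crisis in Syria",
    "Hungary builds a fence",
]
print(cooccurring_words(sentences).most_common(5))
```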
How are words associated with refugees?
For a more detailed view of the word associations, users can navigate through a network of words in which the word ‘refugee’ is the center node. The second level of nodes, in green, represents the word that occurs directly after ‘refugee’; only the top 20 words are shown to keep the visualization clear. Purple nodes are the third level: the third word in a sentence starting with ‘Refugee’. The edges of the network also indicate how frequently the words appear together: the thicker the edge, the more frequent the pair.
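The edge list for a network like this might be built as in the sketch below. As a simplification, it looks at the words following ‘refugee(s)’ anywhere in a sentence rather than only at the start; word_network and its top_n parameter are hypothetical.

```python
import re
from collections import Counter

def word_network(sentences, top_n=20):
    """Return weighted edges as (source, target, count) tuples.

    Level 2: words directly after 'refugee(s)', limited to the top_n most frequent.
    Level 3: the word after each kept level-2 word.
    """
    second = Counter()
    third = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            if token in ("refugee", "refugees"):
                if i + 1 < len(tokens):
                    second[tokens[i + 1]] += 1
                if i + 2 < len(tokens):
                    third[(tokens[i + 1], tokens[i + 2])] += 1
    kept = dict(second.most_common(top_n))
    edges = [("refugee", word, count) for word, count in kept.items()]
    edges += [(w2, w3, count) for (w2, w3), count in third.items() if w2 in kept]
    return edges

sentences = ["Refugees are a threat", "Refugees are welcome", "Refugees need shelter"]
for edge in word_network(sentences):
    print(edge)
```

The count in each tuple is what would drive the edge thickness in the visualization.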
This dashboard could help UNICEF identify countries with negative stories in their media, in order to better target the problem of xenophobia with customized local campaigns or media training. It could also help spread awareness of and empathy towards migrants and refugees, by allowing the public to take part in classifying the news as positive, negative or neutral and giving them the chance to understand refugees’ daily struggles and challenges.
Currently, the data available on the website is a sample for one month. The next step is to add more daily news and to automate the process to make it a real-time dashboard plugged in directly to GDELT news.
Also, the current data covers only English-language news, which introduces a bias, since local news tends to be closer to communities and may differ from what is published internationally. Adding localized news would therefore give the analysis more depth.
To explore the sample dataset and help us classify news articles, go to: http://refugeesare.info
TechFugees Challenge Winner