Date of Award

7-2019

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Engineering and Sciences

First Advisor

Ronaldo Menezes

Second Advisor

Nezamoddin Nezamoddini-Kachouie

Third Advisor

Eraldo Ribeiro

Fourth Advisor

Thomas Eskridge

Abstract

With the rapid growth in the number of online users, people tend to use emojis to enrich their text with emotions. In this dissertation, after an overview of previous studies on emojis, we provide a brief introduction to emojis and the way they are constructed from Unicode code points. We then extract the emojis from messages collected from Twitter on different topics. As a first step towards analyzing emoji usage on social media, we create a directed, weighted co-occurrence network of emojis for each topic. By analyzing these networks, we find that emoji usage has a similar structure regardless of topic. We then show that most emojis are grouped in the top 5 communities of those networks, and that most emojis are used in positive-sentiment tweets. As a further exploration, by analyzing the distribution of emoji positions, we find that most emojis are used at the end of tweets, independent of the sentiment of the emojis. We also show that the semantics of emojis change across categories. To find the cultural differences reflected by emojis, we consider languages and countries as two indicators of culture. We divide the whole data set by the language of the tweets and call the resulting subsets the subject-based language data sets. We then create the network of each subject-based language data set and extract the node betweenness and PageRank scores of the emojis. After calculating the rank correlation between pairs of subject-based language data sets, we cluster them using Ward’s method of hierarchical clustering. We show that some languages are similar even though the language families they come from would suggest less similarity. We follow the same procedure for countries and show the similarity between the subject-based country data sets and some social and economic indices.
We also introduce a novel approach to method validation to show that our method of structural hierarchical clustering can find meaningful clusters.
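
The pipeline described in the abstract (co-occurrence network, PageRank scores, rank correlation between data sets) can be illustrated with a minimal sketch. This is not the dissertation's implementation; it rests on several assumptions the abstract does not confirm. An edge u -> v is assumed to count how often emoji u appears before emoji v in the same tweet. The rank correlation is assumed to be Spearman's. The function names, the damping factor of 0.85, and the toy tweets are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(tweets):
    """Build directed weighted edges from emoji sequences.

    Assumption: edge (u, v) counts how often u precedes v in one tweet.
    """
    weights = defaultdict(float)
    for emojis in tweets:
        for i, j in combinations(range(len(emojis)), 2):
            if emojis[i] != emojis[j]:
                weights[(emojis[i], emojis[j])] += 1.0
    return dict(weights)


def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration (damping value is an assumption)."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_weight = defaultdict(float)
    for (u, _), w in edges.items():
        out_weight[u] += w
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for (u, v), w in edges.items():
            new_rank[v] += damping * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Two hypothetical data sets, each a list of emoji sequences (one per tweet).
dataset_a = [["😂", "😍"], ["😂", "😍", "🔥"], ["🔥", "😂"]]
dataset_b = [["😍", "😂"], ["😍", "🔥", "😂"], ["😂", "🔥"]]

scores_a = pagerank(cooccurrence_network(dataset_a))
scores_b = pagerank(cooccurrence_network(dataset_b))

# Compare the two data sets on the emojis they share.
shared = sorted(set(scores_a) & set(scores_b))
similarity = spearman([scores_a[e] for e in shared],
                      [scores_b[e] for e in shared])
print(similarity)
```

Repeating the comparison for every pair of data sets gives a correlation matrix. Converted to distances (for example 1 minus the correlation, which is again an assumption), that matrix could be passed to a hierarchical clustering routine such as `scipy.cluster.hierarchy.linkage` with `method="ward"`.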
