Date of Award

7-2015

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Engineering and Sciences

First Advisor

Philip Chan

Second Advisor

Ryan Stansifer

Third Advisor

Ken Lindeman

Fourth Advisor

Richard Newman

Abstract

On Twitter, many aspects of retweeting behavior, the most effective indicator of how widely a tweet spreads, have been studied, such as whether a reader will retweet a given tweet. However, the total number of retweets a tweet receives, a quantitative measure of its quality, has not been well addressed by existing work. To estimate the number of retweets and the factors associated with it, this paper proposes a procedure for developing a personalized model for a single author, trained on that author’s past tweets. We propose three new types of features based on the content of the tweets (Entity, Pair, and Cluster features) and combine them with features used in prior work. Experiments on seven authors demonstrate that, compared with using the prior features alone, adding the Pair feature yields a statistically significant improvement in the correlation coefficient between the predicted and actual number of retweets. We studied all combinations of the three feature types, and the combination of the Pair and Cluster features performs best overall. As an application, this work can serve as a personalized tool with which an author evaluates a tweet before posting it, so that the author can revise the tweet to attract more attention.
