Degree Name

Doctor of Philosophy (PhD)

Department

Computer Engineering and Sciences

First Advisor

Eraldo Ribeiro

Second Advisor

Theodore Petersen

Third Advisor

Ronaldo Menezes

Fourth Advisor

Philip Bernhard

Abstract

Human success cannot be attributed to language alone, but language and humanity are certainly inseparable. Since the first language appeared, language has evolved continually across space and social groups, giving rise to the roughly 7,000 languages spoken today. The origin and evolution of languages remain poorly understood, and the state of the art in language evolution still lacks a comprehensive characterization. In general, this problem has been tackled mainly by statistically measuring changes in parts of a language (e.g., words and sounds). Given the current availability of data and computational power, this dissertation proposes a comprehensive data-driven characterization of language evolution based on vocabulary, focusing on two main areas. First, it extracts and classifies the structural and chronological relations between languages using their vocabularies. Second, it studies the spatiotemporal effects on language vocabulary and their relation to socio-economic factors (specifically, educational attainment). The results demonstrate that the proposed method can uncover relations between languages from both structural and chronological perspectives. We also found that vocabulary levels can reveal the educational attainment of a resident population in specific areas and time periods.


Copyright held by author