Date of Award

12-2001

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Engineering and Sciences

First Advisor

Ryan Stansifer

Second Advisor

Phil Bernhard

Third Advisor

James Whittaker

Fourth Advisor

Gary Howell

Abstract

Recent and continuing rapid increases in computing power now enable more of humankind's written communication to be represented as digital data. The most recent and obvious changes in multilingual information processing have been the introduction of larger character sets encompassing more writing systems. Yet the very richness of larger collections of characters has made the interpretation and processing of text more difficult. The many competing motivations for standardizing character sets (satisfying the needs of linguists, computer scientists, and typographers) threaten the purpose of information processing: accurate and facile manipulation of data. Existing character sets are constructed without a consistent strategy or architecture. Complex algorithms and reports are now necessary to understand raw streams of characters representing multilingual text. We assert that information processing is an architectural problem and not just a character set problem. We analyze several multilingual information processing algorithms (e.g., bidirectional reordering and character normalization) and conclude that they are more dangerous than beneficial. The countless unexpected interactions suggest the lack of a coherent architecture. We introduce abstractions and novel mechanisms, and take the first steps towards organizing them into a new architecture for multilingual information processing. We propose a multilayered architecture, which we call Metacode, in which character sets appear in lower layers and protocols and algorithms in higher layers. We recast bidirectional reordering and character normalization in the Metacode framework.
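As a small illustration of why raw character streams need an algorithm such as character normalization before they can be compared, the following sketch uses Python's standard `unicodedata` module. It is not taken from the dissertation and does not represent Metacode; the helper name `same_text` and the choice of the NFC form are assumptions made only for this example.

```python
import unicodedata

# Two code point sequences for the same visible text "é":
# a single precomposed code point, and a base letter followed by
# a combining acute accent.
precomposed = "\u00e9"
decomposed = "e\u0301"


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form.

    `same_text` is a hypothetical helper for this illustration only.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Comparing the raw code point sequences treats the two strings as different.
print(precomposed == decomposed)

# Comparing after normalization treats them as the same text.
print(same_text(precomposed, decomposed))

# The two strings also differ in length when counted in code points.
print(len(precomposed), len(decomposed))
```

The point of the sketch is only that the same text can arrive as different code point sequences, so any processing step that compares or searches text has to agree on a normalization form first.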

Recommended Citation

Atkin, Steven Edward, "A Framework for Multilingual Information Processing" (2001). Theses and Dissertations. 681.
https://repository.fit.edu/etd/681

Included in

Computer Sciences Commons