Date of Award

12-2022

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Engineering and Sciences

First Advisor

Philip Chan

Second Advisor

Georgios Anagnostopoulos

Third Advisor

Debasis Mitra

Fourth Advisor

Marius C. Silaghi

Abstract

Machine learning models have achieved great success across research and industry, but that success relies heavily on massive data collection and human annotation. Because the real world is an open set, categories that emerge daily and the lack of annotations pose new challenges for these models. The absence of newly emerged categories from the training samples can be captured by Open Set Recognition (OSR). Then, given the newly emerged samples, the process of automatically identifying the novel categories is called Novel Category Discovery (NCD). In this dissertation, we focus on learning representations for OSR and NCD. To learn representations for OSR, we first introduce an extension called Min Max Feature (MMF) that can be incorporated into different loss functions to find more discriminative representations. Our evaluation shows that the proposed extension can significantly improve the OSR performance of different types of loss functions. Then, we propose a self-supervision method, Detransformation Autoencoder (DTAE), for the OSR problem on image datasets. This method learns representations that are invariant to transformations of the input data. Next, to extend DTAE to graph datasets, we present two transformations (FCG-shift and FCG-random) for Function Call Graph (FCG) based malware representations to facilitate the pretext task. The experimental results indicate that the proposed pre-training process can improve the performance of different downstream loss functions for the OSR problem on both image and graph datasets. To tackle the problem of NCD under an open-set scenario, we propose General Intra-Inter (GII) loss to learn a representation space that clusters the unlabeled samples into novel categories while maintaining sensitivity to the unknown category. Our evaluation on image and graph datasets shows that GII outperforms other approaches in NCD and OSR.

Recommended Citation

Jia, Jingyun, "Representation Learning for Open Set Recognition and Novel Category Discovery" (2022). Theses and Dissertations. 743.
https://repository.fit.edu/etd/743

Included in

Computer Sciences Commons
