Date of Award

5-2026

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Electrical Engineering and Computer Science

First Advisor

Sneha Sudhakaran

Second Advisor

Christian Sonnenberg

Third Advisor

Khaled Slhoub

Fourth Advisor

Brian A. Lail

Abstract

With the rapid advancement of artificial intelligence-based image generation and manipulation tools, it has become extremely difficult to determine whether an image is genuine or artificially crafted. Despite extensive research in this area, existing image detection systems suffer from three major problems: suboptimal cross-dataset generalization due to shortcut learning of dataset-specific patterns; unreliable probability estimates due to domain shift, particularly in cross-manipulation evaluation settings; and an inability to handle multiple manipulation types within a single detection framework. To address these limitations, we propose HDRA-Fusion (Hybrid Detection with Routed Architecture), a framework built on the conclusion that different types of AI face forgery require different detection approaches and that no single detector achieves high detection performance across multiple forgery techniques. The framework first passes the input image to a manipulation router, which uses a raw CLIP ViT-B/32 visual encoder embedding to determine the type of manipulation technique used: GAN-based generation or face-swap manipulation. The router, implemented as a lightweight classifier, achieves an AUC of 0.9726 and an accuracy of 93.25% on completely unseen sources, indicating that CLIP visual representations inherently encode manipulation modality without any specific fine-tuning.
Based on the router’s confidence, the image is sent to one of two implemented detectors. The first is FRED-Fusion (Forensic Recognition and Ensemble Detection Fusion), a hybrid feature-fusion detector tested on different manipulation techniques using the leave-one-source-out (LOSO) approach. FRED-Fusion uses fine-tuned CLIP ViT-B/32 embeddings, along with classical forensic features and frequency-domain representations, all combined in a domain-adversarially trained XGBoost ensemble. The second detector is the Self-Blended Images model, a pretrained EfficientNet-B4 specialist for face-swap and facial forgery detection. Images for which the router cannot decide on a manipulation type are flagged as abstain and withheld from detection; these account for only 1.2% of evaluated images. The main goal of this architecture is to combine face-swap and GAN-based detection in a single high-performance framework, demonstrating that manipulation-aware routing is a principled and effective approach to building forensic detection systems that generalize across the full spectrum of AI-generated and AI-manipulated facial imagery. Code and pre-trained models are publicly available at: https://github.com/omarebeid24/HDRA-Fusion
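
The confidence-based routing described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not state the decision rule, so the two thresholds, the single router probability `p_gan`, and the names `route_and_detect`, `fred_fusion`, and `sbi` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical cut-offs: the abstract does not state the actual confidence rule.
GAN_MIN_CONFIDENCE = 0.80
SWAP_MIN_CONFIDENCE = 0.80


@dataclass
class RoutingDecision:
    route: str                   # "fred_fusion", "sbi", or "abstain"
    fake_score: Optional[float]  # None when the router abstains


def route_and_detect(
    image: Any,
    p_gan: float,
    fred_fusion: Callable[[Any], float],
    sbi: Callable[[Any], float],
) -> RoutingDecision:
    """Dispatch an image to a specialist detector based on router confidence.

    p_gan is assumed to be the router's probability that the manipulation
    type is GAN-based generation; 1 - p_gan is read as the face-swap
    probability. Both detector arguments are placeholder callables.
    """
    if not 0.0 <= p_gan <= 1.0:
        raise ValueError("p_gan must be a probability in [0, 1]")
    if p_gan >= GAN_MIN_CONFIDENCE:
        # Confident GAN-type decision: use the GAN-oriented detector.
        return RoutingDecision("fred_fusion", fred_fusion(image))
    if 1.0 - p_gan >= SWAP_MIN_CONFIDENCE:
        # Confident face-swap decision: use the face-swap specialist.
        return RoutingDecision("sbi", sbi(image))
    # Router is undecided: withhold the image from detection.
    return RoutingDecision("abstain", None)
```

With these assumed thresholds, an image with `p_gan = 0.5` would be flagged as abstain and neither detector would be called, which mirrors the abstain behavior the abstract describes.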
