Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)


Computer Engineering and Sciences

First Advisor

Eraldo Ribeiro

Second Advisor

Georgios Anagnostopoulos

Third Advisor

Shengzhi Zhang

Fourth Advisor

Marius Silaghi


Model-free visual tracking is an important problem in computer vision. The abundance of applications makes the problem attractive and, as a result, significant progress has been made, especially in recent years. Several factors make tracking a hard problem: changes in lighting conditions throughout the video, changes in the scale and rotation of the object, and frequent occlusions. In this dissertation, we build upon a tracker known as Struck, which is based on a structured support vector machine, and improve it in a number of ways. To make the structured tracker robust to short-time occlusions and false-positive detections, we propose to use the Robust Kalman filter. We develop a strategy that allows us to detect, and recover from, short-time occlusions and incorrect detections. By handling inconsistent detections, which the filter labels as outliers, we show that our new method, called RobStruck, improves tracking accuracy as measured by standard tracking-accuracy metrics. To guide the tracker toward locations that are more likely to contain an object, we propose to use saliency measures. Saliency measures, also known as objectness, estimate how likely a given location in the image is to contain an object of any type. The objectness measures we consider here, straddling and edge density, are based on semantic object segmentation and edge detection. These measures are unsupervised and fast to compute, which makes them an ideal fit for tracking, where real-time performance is often desired. We build an object-aware tracker, which we call ObjStruck, and show that objectness measures improve tracking. To find a better feature representation, we incorporate deep features from a pretrained deep convolutional network in a computationally efficient manner. Using an M-Best diverse-sampling approach, we sample a small and diverse set of bounding boxes that are likely to contain the target. These bounding boxes are then used to perform detection using deep features. The resulting tracker, which we call MBestStruck, uses a high-quality feature representation while remaining computationally efficient. We systematically evaluate each of our contributions on four different visual-tracking benchmarks and compare them to the state of the art.
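As a rough illustration of how a filter can label inconsistent detections as outliers, the sketch below uses a plain constant-velocity Kalman filter with innovation gating on a single coordinate. This is a simplified stand-in, not the Robust Kalman filter used in RobStruck. The class name `GatedKalman1D`, the helper `track`, the noise values `q` and `r`, and the gate threshold are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class GatedKalman1D:
    """Constant-velocity Kalman filter for one coordinate, with outlier gating."""
    x: float            # position estimate
    v: float = 0.0      # velocity estimate
    pxx: float = 1.0    # covariance entries of the (position, velocity) state
    pxv: float = 0.0
    pvv: float = 1.0
    q: float = 0.01     # process noise added to each diagonal entry (assumed)
    r: float = 1.0      # measurement noise variance (assumed)
    gate: float = 9.0   # threshold on the squared normalized innovation (assumed)

    def step(self, z: float) -> bool:
        """Process one detection z; return False if it was rejected as an outlier."""
        # Predict with a unit time step: x <- x + v, P <- F P F^T + Q.
        self.x += self.v
        pxx = self.pxx + 2.0 * self.pxv + self.pvv + self.q
        pxv = self.pxv + self.pvv
        pvv = self.pvv + self.q

        # Innovation and its variance (only the position is observed).
        y = z - self.x
        s = pxx + self.r

        if y * y / s > self.gate:
            # Inconsistent detection: keep the prediction and skip the update.
            self.pxx, self.pxv, self.pvv = pxx, pxv, pvv
            return False

        # Standard Kalman update.
        kx, kv = pxx / s, pxv / s
        self.x += kx * y
        self.v += kv * y
        self.pxx = (1.0 - kx) * pxx
        self.pxv = (1.0 - kx) * pxv
        self.pvv = pvv - kv * pxv
        return True


def track(measurements, x0=0.0):
    """Run the gated filter over a sequence of detections of one coordinate."""
    kf = GatedKalman1D(x=x0)
    flags, estimates = [], []
    for z in measurements:
        flags.append(kf.step(z))
        estimates.append(kf.x)
    return flags, estimates


if __name__ == "__main__":
    # The sixth detection jumps far from the motion so far (e.g. a false positive).
    flags, estimates = track([1, 2, 3, 4, 5, 50, 7])
    print(flags)
    print([round(e, 2) for e in estimates])
```

When a detection is rejected, the filter's prediction is used in its place, which is one simple way to bridge a short occlusion or a false-positive detection. A full tracker would run one such filter per coordinate of the bounding box, or use a joint state.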