Portfolio

Video Domain Adaptation for Semantic Segmentation

Unsupervised domain adaptation (UDA) techniques address domain shift and allow deep neural networks to be used for various tasks. Although several image UDA methods have been presented, video-based UDA has received less attention. This is due to the complexity involved in adapting diverse modalities of video features, such as the long-term dependency of temporally associated pixels. Existing methods often use optical flow to capture motion information between consecutive frames of the same domain; however, leveraging optical flow across images in different domains is challenging, and the computation is generally heavy. In this work, we propose adversarial domain adaptation for video semantic segmentation by aligning temporally associated pixels in successive source and target frames. More specifically, we identify perceptually similar pixels with the highest correlation across consecutive frames and infer that such pixels correspond to the same class. Perceptual consistency matching (PCM) on successive frames, both within the same domain and across domains, captures temporal consistency and improves prediction accuracy. In addition, because it does not use optical flow, our method achieves faster inference. Extensive experiments on public datasets demonstrate that our method outperforms existing state-of-the-art video-based UDA approaches.
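The matching idea behind PCM can be illustrated with a minimal sketch. It assumes each pixel is represented by a feature vector and that "correlation" is measured by cosine similarity; both choices, along with the function names `match_pixels` and `consistency_rate`, are illustrative assumptions and not necessarily what the actual method uses.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors (assumed correlation measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def match_pixels(feats_prev, feats_next):
    """For each pixel feature in the previous frame, return the index of the
    most similar pixel feature in the next frame."""
    matches = []
    for u in feats_prev:
        scores = [cosine_similarity(u, v) for v in feats_next]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches


def consistency_rate(labels_prev, labels_next, matches):
    """Fraction of matched pixel pairs whose predicted classes agree."""
    agree = sum(1 for i, j in enumerate(matches) if labels_prev[i] == labels_next[j])
    return agree / len(matches)


# Toy example: two pixels per frame, 2-D features, hypothetical class labels.
feats_prev = [[1.0, 0.0], [0.0, 1.0]]
feats_next = [[0.0, 2.0], [3.0, 0.1]]
matches = match_pixels(feats_prev, feats_next)
rate = consistency_rate([0, 1], [1, 0], matches)
```

In a training setup, disagreement between matched pixels could serve as a consistency signal; how that signal enters the adversarial objective is not specified here and is left out of the sketch.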

Drone Tracking in IR Videos with Transformers

In this work, we introduce a framework that exploits temporal information for precise drone tracking in infrared (IR) video sequences. Temporal information is integrated at two stages: feature extraction and similarity map enhancement. During feature extraction, we employ an online, temporally adaptive convolution that leverages temporal information to augment spatial features. This is achieved by dynamically adjusting the convolution weights based on data from previous frames. To refine the similarity map, our method uses an adaptive temporal transformer, which encodes temporal information and then decodes it to fine-tune the similarity map, yielding more accurate tracking results.
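To make the idea of adjusting convolution weights from previous-frame data concrete, here is a minimal sketch on 1-D signals. The calibration rule (a scale of `1 + tanh(mean of previous frame)`) and the names `calibrated_kernel` and `temporally_adaptive_conv` are hypothetical, chosen only for illustration; the actual framework may derive its weights quite differently.

```python
import math


def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation of a signal with a kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def calibrated_kernel(base_kernel, prev_frame):
    """Scale the base weights by a factor computed from the previous frame.
    The rule below is a hypothetical stand-in for a learned calibration."""
    descriptor = sum(prev_frame) / len(prev_frame)  # global average pooling
    scale = 1.0 + math.tanh(descriptor)
    return [w * scale for w in base_kernel]


def temporally_adaptive_conv(frames, base_kernel):
    """Process frames online: each frame is convolved with weights that
    depend only on the frame that came before it."""
    outputs = []
    prev = None
    for frame in frames:
        kernel = base_kernel if prev is None else calibrated_kernel(base_kernel, prev)
        outputs.append(conv1d(frame, kernel))
        prev = frame
    return outputs


frames = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
outputs = temporally_adaptive_conv(frames, base_kernel=[1.0, 1.0])
```

Because each kernel depends only on earlier frames, the loop can run frame by frame as the video arrives, which is what makes this kind of scheme usable online.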

Protecting Personal Computers from Unauthorized Mobile Recordings

Unauthorized mobile screen recordings pose a serious threat to the security and privacy of personal computers in today’s digital era. Nevertheless, little prior research has addressed this challenge. To tackle it, we present a deep learning approach that manipulates the channels of video frames along the temporal dimension. This channel manipulation mixes feature maps from adjacent frames with those of the current frame, resulting in improved mobile action recognition in videos. The proposed method is built on the MobileNetV2 architecture, with the channel shifting module incorporated after the bypass connections, which improves computational efficiency for frame processing. Consequently, it is well suited to low-latency, real-time recognition of unauthorized mobile screen recording.
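The temporal channel manipulation can be sketched as a simple shift operation. The version below omits spatial dimensions, shifts one fraction of the channels from the next frame and another from the previous frame, and zero-pads at the sequence boundaries; the `fold_div` parameter and the exact split are assumptions for illustration, not details taken from the project.

```python
def temporal_shift(features, fold_div=4):
    """Mix channels across adjacent frames.

    features: list of T frames, each a list of C channel values
              (spatial dimensions omitted for brevity).
    The first C // fold_div channels are taken from the next frame, the
    following C // fold_div from the previous frame, and the rest stay put.
    Positions with no neighboring frame are filled with zeros.
    """
    num_frames = len(features)
    num_channels = len(features[0])
    fold = num_channels // fold_div
    shifted = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1      # borrow from the next frame
            elif c < 2 * fold:
                src = t - 1      # borrow from the previous frame
            else:
                src = t          # keep the current frame's channel
            if 0 <= src < num_frames:
                shifted[t][c] = features[src][c]
    return shifted


clip = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
mixed = temporal_shift(clip)
# mixed == [[5, 0.0, 3, 4], [9, 2, 7, 8], [0.0, 6, 11, 12]]
```

The shift itself adds no parameters or multiplications, which is consistent with the goal of keeping per-frame processing cheap. A strictly online variant would borrow only from previous frames, since the next frame is not yet available at inference time.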

Deep Annotator Project

Deep Annotator is a deep learning-based annotation tool for medical images that makes the annotation process faster and more accurate. It uses deep learning algorithms to predict annotations while still allowing manual adjustments. This semi-automatic approach balances automation and manual intervention to ensure accurate and reliable annotations. The result is an efficient tool for medical professionals that helps them make informed decisions for their patients.

Deep Vehicle Recognition System

A deep learning-based vehicle recognition system uses artificial neural networks and computer vision to accurately identify and classify different types of vehicles in real time. Developed using Caffe, C++, and Qt, this system has applications in transportation and security, such as traffic management and road safety. It is trained on large datasets of vehicle images and recognizes various types of vehicles using a convolutional neural network model. This technology has the potential to revolutionize the field of transportation and security.