My research focuses on multi-task architectures that jointly optimize related tasks and share representations to improve generalization. I work on adaptive loss weighting, task-interaction mechanisms, and cross-task attention for computer vision and remote sensing.
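To make the adaptive-weighting idea concrete, here is a minimal sketch of homoscedastic-uncertainty loss weighting (in the spirit of Kendall et al., 2018) in PyTorch. The two placeholder losses and the task count are illustrative assumptions, not the exact setup from my projects.

```python
# A minimal sketch of uncertainty-based adaptive loss weighting for
# multi-task learning. The task losses below are placeholders.
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Learns one log-variance per task and weights each task loss by it."""
    def __init__(self, num_tasks: int):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])
            # A task is down-weighted as its learned uncertainty grows;
            # the additive log-variance term keeps it from exploding.
            total = total + precision * loss + self.log_vars[i]
        return total

# Usage: combine, e.g., a segmentation loss and a depth-regression loss.
weighting = UncertaintyWeightedLoss(num_tasks=2)
seg_loss = torch.tensor(1.3)    # placeholder scalar losses
depth_loss = torch.tensor(0.4)
combined = weighting([seg_loss, depth_loss])
combined.backward()
```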
I integrate multiple modalities (images, text, audio) to develop robust multimodal representations. My work explores cross-modal alignment, fusion strategies, and transfer learning for zero-shot and few-shot tasks.
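A simplified sketch of symmetric contrastive alignment between two modalities (CLIP-style InfoNCE) is below; the linear "towers" and feature dimensions are stand-ins for pretrained backbones, not my actual models.

```python
# Two-tower contrastive alignment sketch; encoders are placeholder linear layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerAligner(nn.Module):
    def __init__(self, img_dim=512, txt_dim=768, embed_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)  # placeholder image tower
        self.txt_proj = nn.Linear(txt_dim, embed_dim)  # placeholder text tower
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # ~log(1/0.07)

    def forward(self, img_feats, txt_feats):
        z_i = F.normalize(self.img_proj(img_feats), dim=-1)
        z_t = F.normalize(self.txt_proj(txt_feats), dim=-1)
        logits = self.logit_scale.exp() * z_i @ z_t.t()
        targets = torch.arange(z_i.size(0))
        # Symmetric cross-entropy pulls matched pairs together and pushes
        # mismatched pairs apart in the shared embedding space.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

model = TwoTowerAligner()
loss = model(torch.randn(8, 512), torch.randn(8, 768))
loss.backward()
```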
I design models that generalize across diverse domains by learning domain-invariant representations and leveraging prompt-based adaptation and self-supervised pretraining for domain-robust vision models.
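One common route to domain-invariant representations is domain-adversarial training with a gradient-reversal layer (DANN). The sketch below is that generic formulation, not the specific method from my papers; encoders and dimensions are placeholders.

```python
# Domain-adversarial sketch: the encoder is trained to fool a domain classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient so the encoder learns to confuse the domain classifier.
        return -ctx.lambd * grad_output, None

encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
task_head = nn.Linear(128, 10)    # label classifier
domain_head = nn.Linear(128, 2)   # source-vs-target discriminator

x = torch.randn(16, 1, 28, 28)
y, d = torch.randint(0, 10, (16,)), torch.randint(0, 2, (16,))
feats = encoder(x)
loss = (F.cross_entropy(task_head(feats), y) +
        F.cross_entropy(domain_head(GradReverse.apply(feats, 1.0)), d))
loss.backward()
```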
I study self-distillation, in which a model refines its own representations by distilling knowledge from its deeper layers into intermediate ones, improving accuracy without an external teacher.
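A minimal sketch of one common self-distillation formulation follows: an auxiliary classifier attached to an intermediate block is trained against the network's own final predictions, softened with a temperature. The backbone split and hyperparameters are illustrative assumptions.

```python
# Self-distillation sketch: the deepest exit acts as the teacher for an
# intermediate auxiliary exit; no external teacher model is required.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfDistillNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Linear(256, 128), nn.ReLU())
        self.aux_head = nn.Linear(256, num_classes)    # intermediate exit
        self.final_head = nn.Linear(128, num_classes)  # deepest exit

    def forward(self, x):
        h1 = self.stage1(x)
        h2 = self.stage2(h1)
        return self.aux_head(h1), self.final_head(h2)

def self_distillation_loss(aux_logits, final_logits, labels, T=3.0, alpha=0.3):
    ce = F.cross_entropy(final_logits, labels) + F.cross_entropy(aux_logits, labels)
    # KL divergence from the (detached) final predictions to the auxiliary exit.
    kd = F.kl_div(F.log_softmax(aux_logits / T, dim=-1),
                  F.softmax(final_logits.detach() / T, dim=-1),
                  reduction="batchmean") * T * T
    return ce + alpha * kd

net = SelfDistillNet()
aux, final = net(torch.randn(16, 1, 28, 28))
loss = self_distillation_loss(aux, final, torch.randint(0, 10, (16,)))
loss.backward()
```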
I work on adaptation strategies for large vision-language models such as CLIP and BLIP, including prompt tuning, domain adaptation, and semantic alignment for robust cross-modal understanding.
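As a schematic example of prompt tuning, the sketch below follows the CoOp recipe: a small set of learnable context vectors is prepended to class-name token embeddings while the backbone stays frozen. `FrozenTextEncoder` is a stand-in for CLIP's text transformer, not the real API, and all dimensions are assumptions.

```python
# CoOp-style prompt tuning sketch with a placeholder frozen text encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrozenTextEncoder(nn.Module):
    """Placeholder for a pretrained, frozen text encoder."""
    def __init__(self, width=512):
        super().__init__()
        self.layer = nn.Linear(width, width)
        for p in self.parameters():
            p.requires_grad = False

    def forward(self, token_embeds):                   # (num_classes, seq_len, width)
        return self.layer(token_embeds.mean(dim=1))    # pooled text features

class PromptLearner(nn.Module):
    def __init__(self, num_classes, n_ctx=4, width=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, width) * 0.02)  # learnable prompt
        # Frozen class-name embeddings (would come from the backbone's tokenizer).
        self.register_buffer("cls_embeds", torch.randn(num_classes, 1, width))

    def forward(self):
        ctx = self.ctx.unsqueeze(0).expand(self.cls_embeds.size(0), -1, -1)
        return torch.cat([ctx, self.cls_embeds], dim=1)  # (C, n_ctx + 1, width)

text_encoder = FrozenTextEncoder()
prompts = PromptLearner(num_classes=5)
text_feats = F.normalize(text_encoder(prompts()), dim=-1)   # (5, 512)
image_feats = F.normalize(torch.randn(8, 512), dim=-1)      # frozen image-tower output
logits = 100.0 * image_feats @ text_feats.t()
loss = F.cross_entropy(logits, torch.randint(0, 5, (8,)))
loss.backward()   # gradients flow only into the learnable context vectors
```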
I develop few-shot learning frameworks that combine meta-learning and self-supervision to enable efficient learning from limited data and rapid adaptation to unseen tasks.
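For illustration, here is a compact episodic step with prototypical networks, one standard metric-based meta-learning baseline: class prototypes are mean support embeddings and queries are classified by distance to them. The embedding network and the random 5-way 1-shot episode are toy placeholders.

```python
# Prototypical-network episode sketch with a toy embedding network.
import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(nn.Flatten(), nn.Linear(784, 64))  # toy embedding network

def proto_episode_loss(support_x, support_y, query_x, query_y, n_way):
    z_support = embed(support_x)                        # (n_way * k_shot, 64)
    z_query = embed(query_x)                            # (n_query, 64)
    prototypes = torch.stack([z_support[support_y == c].mean(0)
                              for c in range(n_way)])   # (n_way, 64)
    # Negative squared Euclidean distance serves as the classification logit.
    logits = -torch.cdist(z_query, prototypes) ** 2
    return F.cross_entropy(logits, query_y)

# A single 5-way 1-shot episode with random data, for illustration only.
support_x, support_y = torch.randn(5, 1, 28, 28), torch.arange(5)
query_x, query_y = torch.randn(10, 1, 28, 28), torch.randint(0, 5, (10,))
loss = proto_episode_loss(support_x, support_y, query_x, query_y, n_way=5)
loss.backward()
```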
I explore pruning, quantization, and distillation to compress large-scale vision-language models, enabling efficient deployment on edge devices.
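A rough sketch of two of these compression steps, applied to a toy linear model, is shown below: L1 magnitude pruning followed by dynamic int8 quantization. Real vision-language models additionally need per-layer sensitivity analysis and distillation; the model here is only a placeholder.

```python
# Pruning + dynamic quantization sketch on a placeholder model.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))

# 1) Prune 40% of the smallest-magnitude weights in each linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.4)
        prune.remove(module, "weight")   # make the sparsity permanent

# 2) Dynamically quantize the remaining linear layers to int8 for CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    out = quantized(torch.randn(1, 512))
print(out.shape)   # torch.Size([1, 128])
```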