My research focuses on multi-task architectures that jointly optimize related tasks and share representations to improve generalization. I work on adaptive loss weighting, task-interaction mechanisms, and cross-task attention for computer vision and remote sensing.
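To make the adaptive-weighting idea concrete, here is a minimal sketch of homoscedastic-uncertainty loss weighting (in the spirit of Kendall et al., 2018) in PyTorch. The two placeholder losses and the task count are illustrative assumptions, not the exact setup from my projects.

```python
# A minimal sketch of uncertainty-based adaptive loss weighting for
# multi-task learning. The task losses below are placeholders.
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Learns one log-variance per task and weights each task loss by it."""
    def __init__(self, num_tasks: int):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])
            # A task is down-weighted as its learned uncertainty grows;
            # the additive log-variance term keeps it from exploding.
            total = total + precision * loss + self.log_vars[i]
        return total

# Usage: combine, e.g., a segmentation loss and a depth-regression loss.
weighting = UncertaintyWeightedLoss(num_tasks=2)
seg_loss = torch.tensor(1.3)    # placeholder scalar losses
depth_loss = torch.tensor(0.4)
combined = weighting([seg_loss, depth_loss])
combined.backward()
```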
I integrate multiple modalities (images, text, audio) to develop robust multimodal representations. My work explores cross-modal alignment, fusion strategies, and transfer learning for zero-shot and few-shot tasks.
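A simplified sketch of symmetric contrastive alignment between two modalities (CLIP-style InfoNCE) is below; the linear "towers" and feature dimensions are stand-ins for pretrained backbones, not my actual models.

```python
# Two-tower contrastive alignment sketch; encoders are placeholder linear layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerAligner(nn.Module):
    def __init__(self, img_dim=512, txt_dim=768, embed_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)  # placeholder image tower
        self.txt_proj = nn.Linear(txt_dim, embed_dim)  # placeholder text tower
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # ~log(1/0.07)

    def forward(self, img_feats, txt_feats):
        z_i = F.normalize(self.img_proj(img_feats), dim=-1)
        z_t = F.normalize(self.txt_proj(txt_feats), dim=-1)
        logits = self.logit_scale.exp() * z_i @ z_t.t()
        targets = torch.arange(z_i.size(0))
        # Symmetric cross-entropy pulls matched pairs together and pushes
        # mismatched pairs apart in the shared embedding space.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

model = TwoTowerAligner()
loss = model(torch.randn(8, 512), torch.randn(8, 768))
loss.backward()
```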
I design models that generalize across diverse domains by learning domain-invariant representations and leveraging prompt-based adaptation and self-supervised pretraining for domain-robust vision models.
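One common route to domain-invariant representations is domain-adversarial training with a gradient-reversal layer (DANN). The sketch below is that generic formulation, not the specific method from my papers; encoders and dimensions are placeholders.

```python
# Domain-adversarial sketch: the encoder is trained to fool a domain classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient so the encoder learns to confuse the domain classifier.
        return -ctx.lambd * grad_output, None

encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
task_head = nn.Linear(128, 10)    # label classifier
domain_head = nn.Linear(128, 2)   # source-vs-target discriminator

x = torch.randn(16, 1, 28, 28)
y, d = torch.randint(0, 10, (16,)), torch.randint(0, 2, (16,))
feats = encoder(x)
loss = (F.cross_entropy(task_head(feats), y) +
        F.cross_entropy(domain_head(GradReverse.apply(feats, 1.0)), d))
loss.backward()
```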
I study self-distillation, in which a model refines its own representations by distilling knowledge from its deeper layers into intermediate ones, improving accuracy without an external teacher.
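A minimal sketch of one common self-distillation formulation follows: an auxiliary classifier attached to an intermediate block is trained against the network's own final predictions, softened with a temperature. The backbone split and hyperparameters are illustrative assumptions.

```python
# Self-distillation sketch: the deepest exit acts as the teacher for an
# intermediate auxiliary exit; no external teacher model is required.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfDistillNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Linear(256, 128), nn.ReLU())
        self.aux_head = nn.Linear(256, num_classes)    # intermediate exit
        self.final_head = nn.Linear(128, num_classes)  # deepest exit

    def forward(self, x):
        h1 = self.stage1(x)
        h2 = self.stage2(h1)
        return self.aux_head(h1), self.final_head(h2)

def self_distillation_loss(aux_logits, final_logits, labels, T=3.0, alpha=0.3):
    ce = F.cross_entropy(final_logits, labels) + F.cross_entropy(aux_logits, labels)
    # KL divergence from the (detached) final predictions to the auxiliary exit.
    kd = F.kl_div(F.log_softmax(aux_logits / T, dim=-1),
                  F.softmax(final_logits.detach() / T, dim=-1),
                  reduction="batchmean") * T * T
    return ce + alpha * kd

net = SelfDistillNet()
aux, final = net(torch.randn(16, 1, 28, 28))
loss = self_distillation_loss(aux, final, torch.randint(0, 10, (16,)))
loss.backward()
```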
I work on adaptation strategies for large vision-language models such as CLIP and BLIP, including prompt tuning, domain adaptation, and semantic alignment for robust cross-modal understanding.
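As a schematic example of prompt tuning, the sketch below follows the CoOp recipe: a small set of learnable context vectors is prepended to class-name token embeddings while the backbone stays frozen. `FrozenTextEncoder` is a stand-in for CLIP's text transformer, not the real API, and all dimensions are assumptions.

```python
# CoOp-style prompt tuning sketch with a placeholder frozen text encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrozenTextEncoder(nn.Module):
    """Placeholder for a pretrained, frozen text encoder."""
    def __init__(self, width=512):
        super().__init__()
        self.layer = nn.Linear(width, width)
        for p in self.parameters():
            p.requires_grad = False

    def forward(self, token_embeds):                   # (num_classes, seq_len, width)
        return self.layer(token_embeds.mean(dim=1))    # pooled text features

class PromptLearner(nn.Module):
    def __init__(self, num_classes, n_ctx=4, width=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, width) * 0.02)  # learnable prompt
        # Frozen class-name embeddings (would come from the backbone's tokenizer).
        self.register_buffer("cls_embeds", torch.randn(num_classes, 1, width))

    def forward(self):
        ctx = self.ctx.unsqueeze(0).expand(self.cls_embeds.size(0), -1, -1)
        return torch.cat([ctx, self.cls_embeds], dim=1)  # (C, n_ctx + 1, width)

text_encoder = FrozenTextEncoder()
prompts = PromptLearner(num_classes=5)
text_feats = F.normalize(text_encoder(prompts()), dim=-1)   # (5, 512)
image_feats = F.normalize(torch.randn(8, 512), dim=-1)      # frozen image-tower output
logits = 100.0 * image_feats @ text_feats.t()
loss = F.cross_entropy(logits, torch.randint(0, 5, (8,)))
loss.backward()   # gradients flow only into the learnable context vectors
```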
I develop few-shot learning frameworks that combine meta-learning and self-supervision to enable efficient learning from limited data and rapid adaptation to unseen tasks.
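For illustration, here is a compact episodic step with prototypical networks, one standard metric-based meta-learning baseline: class prototypes are mean support embeddings and queries are classified by distance to them. The embedding network and the random 5-way 1-shot episode are toy placeholders.

```python
# Prototypical-network episode sketch with a toy embedding network.
import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(nn.Flatten(), nn.Linear(784, 64))  # toy embedding network

def proto_episode_loss(support_x, support_y, query_x, query_y, n_way):
    z_support = embed(support_x)                        # (n_way * k_shot, 64)
    z_query = embed(query_x)                            # (n_query, 64)
    prototypes = torch.stack([z_support[support_y == c].mean(0)
                              for c in range(n_way)])   # (n_way, 64)
    # Negative squared Euclidean distance serves as the classification logit.
    logits = -torch.cdist(z_query, prototypes) ** 2
    return F.cross_entropy(logits, query_y)

# A single 5-way 1-shot episode with random data, for illustration only.
support_x, support_y = torch.randn(5, 1, 28, 28), torch.arange(5)
query_x, query_y = torch.randn(10, 1, 28, 28), torch.randint(0, 5, (10,))
loss = proto_episode_loss(support_x, support_y, query_x, query_y, n_way=5)
loss.backward()
```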
I explore pruning, quantization, and distillation to compress large-scale vision-language models, enabling efficient deployment on edge devices.
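A rough sketch of two of these compression steps, applied to a toy linear model, is shown below: L1 magnitude pruning followed by dynamic int8 quantization. Real vision-language models additionally need per-layer sensitivity analysis and distillation; the model here is only a placeholder.

```python
# Pruning + dynamic quantization sketch on a placeholder model.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))

# 1) Prune 40% of the smallest-magnitude weights in each linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.4)
        prune.remove(module, "weight")   # make the sparsity permanent

# 2) Dynamically quantize the remaining linear layers to int8 for CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    out = quantized(torch.randn(1, 512))
print(out.shape)   # torch.Size([1, 128])
```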