Exploring Partial Multi-Label Learning via Integrating
Semantic Co-occurrence Knowledge


✨ Highlights


  • SCINet Framework: Models semantic co-occurrence among labels, instances, and instance-label assignments for precise partial multi-label learning.
  • Bi-Dominant Prompter & Cross-Modality Fusion: Uses CLIP to craft text- and image-based prompts and fuse them for robust label confidence.
  • Intrinsic Semantic Augmentation: Applies multi-level augmentations with dynamic thresholds and self-distillation to align confidence with sample difficulty.
  • Superior Performance: Outperforms baselines on VOC2007/12, COCO2014, and CUB, achieving up to a +9.02% mAP gain under partial annotations.
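The partial annotation setting referenced in these highlights can be made concrete with a toy label matrix. The sketch below is our own illustration, not the paper's code: entries of +1 mark known correct labels, -1 known incorrect labels, and 0 the unknown labels the learner must disambiguate.

```python
import numpy as np

# Toy partial label matrix: 4 instances x 5 labels.
# +1 = known correct, -1 = known incorrect, 0 = unknown.
Y = np.array([
    [ 1,  0, -1,  0,  1],
    [ 0,  1,  0, -1,  0],
    [-1,  0,  1,  1,  0],
    [ 0,  0,  0,  1, -1],
])

known_pos = (Y == 1).sum()    # labels confirmed present
known_neg = (Y == -1).sum()   # labels confirmed absent
unknown = (Y == 0).sum()      # the ambiguous part PML must resolve
print(known_pos, known_neg, unknown)  # prints: 6 4 10
```

Partial multi-label learning methods such as SCINet aim to turn the zero entries into calibrated confidences using the known entries and the instances themselves.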


Abstract

Partial multi-label learning aims to extract knowledge from incompletely annotated data, which includes known correct labels, known incorrect labels, and unknown labels. The core challenge lies in accurately identifying the ambiguous relationships between labels and instances. In this paper, we emphasize that matching co-occurrence patterns between labels and instances is key to addressing this challenge. To this end, we propose Semantic Co-occurrence Insight Network (SCINet), a novel and effective framework for partial multi-label learning. Specifically, SCINet introduces a bi-dominant prompter module, which leverages an off-the-shelf multimodal model to capture text-image correlations and enhance semantic alignment. To reinforce instance-label interdependencies, we develop a cross-modality fusion module that jointly models inter-label correlations, inter-instance relationships, and co-occurrence patterns across instance-label assignments. Moreover, we propose an intrinsic semantic augmentation strategy that enhances the model's understanding of intrinsic data semantics by applying diverse image transformations, thereby fostering a synergistic relationship between label confidence and sample difficulty. Extensive experiments on four widely-used benchmark datasets demonstrate that SCINet surpasses state-of-the-art methods.
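The intrinsic semantic augmentation strategy in the abstract aligns predictions across differently perturbed views of the same image. One common way to express such a consistency objective, shown here as a hedged numpy sketch (the temperature value and KL form are our assumptions, not the paper's exact loss), is to match the strong-view distribution to a sharpened weak-view "teacher" distribution:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax with max-subtraction for stability.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # KL(p || q), summed over the label dimension.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def consistency_loss(weak_logits, strong_logits, T=0.5):
    # The weak view acts as a self-distillation teacher; T < 1
    # sharpens its distribution before matching (teacher would be
    # gradient-detached in a real training framework).
    teacher = softmax(weak_logits, T=T)
    student = softmax(strong_logits)
    return kl(teacher, student).mean()

weak = np.array([[2.0, 0.5, -1.0]])    # logits from a weak view
strong = np.array([[1.5, 0.7, -0.5]])  # logits from a strong view
loss = consistency_loss(weak, strong)
print(float(loss))
```

In the paper's scheme this idea is applied across weak, medium, and strong transformations with dynamic thresholds; the single weak/strong pair above only illustrates the mechanism.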


Methodology


(method overview figure)

The proposed SCINet framework addresses partial multi-label learning through three key components: 1) Bi-Dominant Prompter, leveraging CLIP-based text and visual encoders to capture semantic co-occurrence between labels and instances, enhanced by weak/medium/strong image transformations; 2) Cross-Modality Fusion Module, which optimizes label confidence by integrating instance similarity (via Gaussian-based local relationships) and label correlations (via Pearson coefficients) into a unified confidence matrix; 3) Intrinsic Semantic Augmentation, employing triple transformations (weak, medium, strong) with consistency losses and self-distillation to align semantic distributions across perturbations. These components synergistically exploit global label-instance dependencies, refine multi-modal alignment, and enhance robustness against partial supervision.
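The confidence refinement in the Cross-Modality Fusion Module can be illustrated with a toy update. This is a sketch under our own assumptions (the kernel width sigma and blending weights alpha/beta are illustrative, and the paper's exact weighting may differ): a Gaussian kernel over instance features yields an instance-similarity matrix S, Pearson correlation over label columns yields a label-correlation matrix R, and both are used to smooth an initial confidence matrix C.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    # Pairwise Gaussian (RBF) similarity between instance feature rows,
    # row-normalised so each row is a weighting over instances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    return S / S.sum(axis=1, keepdims=True)

def label_correlation(C):
    # Pearson correlation between label columns of the confidence
    # matrix; negative correlations are clipped, rows normalised.
    R = np.corrcoef(C, rowvar=False)
    R = np.clip(np.nan_to_num(R), 0.0, None)
    return R / R.sum(axis=1, keepdims=True)

def fuse(C, X, alpha=0.5, beta=0.3):
    # Blend: keep part of C, borrow from similar instances (S @ C)
    # and from correlated labels (C @ R).
    S = gaussian_similarity(X)
    R = label_correlation(C)
    return (1 - alpha - beta) * C + alpha * (S @ C) + beta * (C @ R)

X = np.array([[0.0, 0.1], [0.1, 0.0], [2.0, 2.0]])  # instance features
C = np.array([[0.9, 0.1, 0.0],
              [0.8, 0.2, 0.1],
              [0.1, 0.9, 0.8]])                     # initial confidences
C_new = fuse(C, X)
print(C_new.round(3))
```

The first two instances are close in feature space, so their confidences are pulled toward each other, while correlated labels reinforce one another; SCINet performs this kind of propagation jointly with the CLIP-based prompting rather than on raw features.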


Interactive Demo


Partial Multi-label Learning Results
Example 1: COCO, Restaurant Scene
  Known objects: Person
  Full annotations: Sandwich, Car, Chair, Dining Table, Person, Wine Glass, Handbag, Backpack
  SCINet predictions: Person (72.73%)

Example 2: COCO, Tennis Scene
  Known objects: Person, Tennis Racket
  Full annotations: Sports Ball, Chair, Tennis Racket, Person
  SCINet predictions: Person (84.57%), Chair (87.01%)

Example 3: COCO, Kitchen Scene
  Known objects: Banana
  Full annotations: Refrigerator, Sink, Bottle, Bowl, Banana, Vase, Potted Plant, Oven
  SCINet predictions: Banana (56.10%), Dining Table (50.85%)

Example 4: COCO, Indoor Collection Scene
  Known objects: Airplane, Clock
  Full annotations: Clock, Bottle, Book, Airplane
  SCINet predictions: Airplane (82.24%), Car (53.30%)

More Results


(additional result figures)

Citation

@article{WU2025,
  title = {Exploring Partial Multi-Label Learning via Integrating Semantic Co-occurrence Knowledge},
  year = {2025},
  author = {Wu, Xin and Teng, Fei and Feng, Yue and Shi, Kaibo and Lin, Zhuosheng and Zhang, Ji and Wang, James}
}