Exploring Partial Multi-Label Learning via Integrating
Semantic Co-occurrence Knowledge


✨ Highlights


  • SCINet Framework: Models semantic co-occurrence among labels, instances, and instance-label assignments for precise partial multi-label learning.
  • Bi-Dominant Prompter & Cross-Modality Fusion: Uses CLIP to craft text- and image-based prompts and fuse them for robust label confidence.
  • Intrinsic Semantic Augmentation: Applies multi-level augmentations with dynamic thresholds and self-distillation to align confidence with sample difficulty.
  • Superior Performance: Outperforms baselines on VOC2007/12, COCO2014, and CUB, achieving up to a +9.02% mAP gain under partial annotations.
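The partial annotation setting referenced in these highlights can be made concrete with a toy label matrix. The sketch below is our own illustration, not the paper's code: entries of +1 mark known correct labels, -1 known incorrect labels, and 0 the unknown labels the learner must disambiguate.

```python
import numpy as np

# Toy partial label matrix: 4 instances x 5 labels.
# +1 = known correct, -1 = known incorrect, 0 = unknown.
Y = np.array([
    [ 1,  0, -1,  0,  1],
    [ 0,  1,  0, -1,  0],
    [-1,  0,  1,  1,  0],
    [ 0,  0,  0,  1, -1],
])

known_pos = (Y == 1).sum()    # labels confirmed present
known_neg = (Y == -1).sum()   # labels confirmed absent
unknown = (Y == 0).sum()      # the ambiguous part PML must resolve
print(known_pos, known_neg, unknown)  # prints: 6 4 10
```

Partial multi-label learning methods such as SCINet aim to turn the zero entries into calibrated confidences using the known entries and the instances themselves.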


Abstract

Partial multi-label learning aims to extract knowledge from incompletely annotated data, which includes known correct labels, known incorrect labels, and unknown labels. The core challenge lies in accurately identifying the ambiguous relationships between labels and instances. In this paper, we emphasize that matching co-occurrence patterns between labels and instances is key to addressing this challenge. To this end, we propose Semantic Co-occurrence Insight Network (SCINet), a novel and effective framework for partial multi-label learning. Specifically, SCINet introduces a bi-dominant prompter module, which leverages an off-the-shelf multimodal model to capture text-image correlations and enhance semantic alignment. To reinforce instance-label interdependencies, we develop a cross-modality fusion module that jointly models inter-label correlations, inter-instance relationships, and co-occurrence patterns across instance-label assignments. Moreover, we propose an intrinsic semantic augmentation strategy that enhances the model's understanding of intrinsic data semantics by applying diverse image transformations, thereby fostering a synergistic relationship between label confidence and sample difficulty. Extensive experiments on four widely-used benchmark datasets demonstrate that SCINet surpasses state-of-the-art methods.
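The intrinsic semantic augmentation strategy in the abstract aligns predictions across differently perturbed views of the same image. One common way to express such a consistency objective, shown here as a hedged numpy sketch (the temperature value and KL form are our assumptions, not the paper's exact loss), is to match the strong-view distribution to a sharpened weak-view "teacher" distribution:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax with max-subtraction for stability.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # KL(p || q), summed over the label dimension.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def consistency_loss(weak_logits, strong_logits, T=0.5):
    # The weak view acts as a self-distillation teacher; T < 1
    # sharpens its distribution before matching (teacher would be
    # gradient-detached in a real training framework).
    teacher = softmax(weak_logits, T=T)
    student = softmax(strong_logits)
    return kl(teacher, student).mean()

weak = np.array([[2.0, 0.5, -1.0]])    # logits from a weak view
strong = np.array([[1.5, 0.7, -0.5]])  # logits from a strong view
loss = consistency_loss(weak, strong)
print(float(loss))
```

In the paper's scheme this idea is applied across weak, medium, and strong transformations with dynamic thresholds; the single weak/strong pair above only illustrates the mechanism.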


Methodology


(method overview figure)

The proposed SCINet framework addresses partial multi-label learning through three key components: 1) Bi-Dominant Prompter, leveraging CLIP-based text and visual encoders to capture semantic co-occurrence between labels and instances, enhanced by weak/medium/strong image transformations; 2) Cross-Modality Fusion Module, which optimizes label confidence by integrating instance similarity (via Gaussian-based local relationships) and label correlations (via Pearson coefficients) into a unified confidence matrix; 3) Intrinsic Semantic Augmentation, employing triple transformations (weak, medium, strong) with consistency losses and self-distillation to align semantic distributions across perturbations. These components synergistically exploit global label-instance dependencies, refine multi-modal alignment, and enhance robustness against partial supervision.
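The confidence refinement in the Cross-Modality Fusion Module can be illustrated with a toy update. This is a sketch under our own assumptions (the kernel width sigma and blending weights alpha/beta are illustrative, and the paper's exact weighting may differ): a Gaussian kernel over instance features yields an instance-similarity matrix S, Pearson correlation over label columns yields a label-correlation matrix R, and both are used to smooth an initial confidence matrix C.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    # Pairwise Gaussian (RBF) similarity between instance feature rows,
    # row-normalised so each row is a weighting over instances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    return S / S.sum(axis=1, keepdims=True)

def label_correlation(C):
    # Pearson correlation between label columns of the confidence
    # matrix; negative correlations are clipped, rows normalised.
    R = np.corrcoef(C, rowvar=False)
    R = np.clip(np.nan_to_num(R), 0.0, None)
    return R / R.sum(axis=1, keepdims=True)

def fuse(C, X, alpha=0.5, beta=0.3):
    # Blend: keep part of C, borrow from similar instances (S @ C)
    # and from correlated labels (C @ R).
    S = gaussian_similarity(X)
    R = label_correlation(C)
    return (1 - alpha - beta) * C + alpha * (S @ C) + beta * (C @ R)

X = np.array([[0.0, 0.1], [0.1, 0.0], [2.0, 2.0]])  # instance features
C = np.array([[0.9, 0.1, 0.0],
              [0.8, 0.2, 0.1],
              [0.1, 0.9, 0.8]])                     # initial confidences
C_new = fuse(C, X)
print(C_new.round(3))
```

The first two instances are close in feature space, so their confidences are pulled toward each other, while correlated labels reinforce one another; SCINet performs this kind of propagation jointly with the CLIP-based prompting rather than on raw features.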


Interactive Demo


Partial Multi-label Learning Results
Example 1: COCO, Restaurant Scene
  Known objects: Person
  Full annotations: Sandwich, Car, Chair, Dining Table, Person, Wine Glass, Handbag, Backpack
  SCINet predictions: Person (72.73%)

Example 2: COCO, Tennis Scene
  Known objects: Person, Tennis Racket
  Full annotations: Sports Ball, Chair, Tennis Racket, Person
  SCINet predictions: Person (84.57%), Chair (87.01%)

Example 3: COCO, Kitchen Scene
  Known objects: Banana
  Full annotations: Refrigerator, Sink, Bottle, Bowl, Banana, Vase, Potted Plant, Oven
  SCINet predictions: Banana (56.10%), Dining Table (50.85%)

Example 4: COCO, Indoor Collection Scene
  Known objects: Airplane, Clock
  Full annotations: Clock, Bottle, Book, Airplane
  SCINet predictions: Airplane (82.24%), Car (53.30%)

More Results


(additional result figures)

Citation

@article{WU2025,
  title = {Exploring Partial Multi-Label Learning via Integrating Semantic Co-occurrence Knowledge},
  year = {2025},
  author = {Wu, Xin and Teng, Fei and Feng, Yue and Shi, Kaibo and Lin, Zhuosheng and Zhang, Ji and Wang, James}
}