Figure 2: CLIP-based Contrastive Learning Architecture for fMRI-to-Image Embedding
The system employs contrastive learning to align fMRI neural representations with visual features in CLIP's embedding space.
The fMRI encoder learns to map brain signals into the same 512-dimensional space as the CLIP image embeddings,
so that cross-modal retrieval and reconstruction can be performed by similarity-based matching between the two modalities.
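The alignment described above can be sketched with a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of paired fMRI and image embeddings, followed by nearest-neighbour retrieval by cosine similarity. This is a minimal, dependency-free sketch under stated assumptions, not the figure's actual implementation: the function names (`clip_contrastive_loss`, `retrieve`, and so on) are hypothetical, the temperature is fixed at 0.07 (CLIP's initial value for its learnable temperature), and toy 3-dimensional vectors stand in for the 512-dimensional embeddings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def similarity_matrix(fmri: Sequence[Vector], images: Sequence[Vector]) -> List[List[float]]:
    """Cosine similarity of every fMRI embedding (rows) with every image embedding (columns)."""
    f = [l2_normalize(v) for v in fmri]
    g = [l2_normalize(v) for v in images]
    return [[sum(a * b for a, b in zip(fi, gj)) for gj in g] for fi in f]


def _cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i (the matched pair)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_contrastive_loss(fmri: Sequence[Vector], images: Sequence[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: fMRI->image and image->fMRI cross-entropy, averaged.

    Assumption: fmri[i] and images[i] come from the same stimulus trial.
    """
    sims = similarity_matrix(fmri, images)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # image-to-fMRI direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


def retrieve(query: Vector, gallery: Sequence[Vector]) -> int:
    """Index of the gallery embedding most similar (cosine) to the query."""
    sims = similarity_matrix([query], gallery)[0]
    return max(range(len(gallery)), key=lambda j: sims[j])


if __name__ == "__main__":
    images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    fmri = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]
    print("aligned loss:", clip_contrastive_loss(fmri, images))
    print("retrieved index:", retrieve(fmri[2], images))
```

Correctly paired batches yield a lower loss than shuffled ones, which is the signal that pulls each fMRI embedding toward its own stimulus image and away from the others in the batch. The sketch only computes the loss value; training the fMRI encoder would additionally need an autodiff framework, which is omitted here.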