Multimodal, alias, guided, image restoration is the reconstruction of a degraded image from a target modality with the aid of a high quality image from another modality. A similar task is image fusion; it refers to merging images from different modalities into a composite image. Traditional approaches for multimodal image restoration and fusion include analytical methods that are computationally expensive at inference time. Recently developed deep learning methods have shown a great performance at a reduced computational cost; however, since these methods do not incorporate prior knowledge about the problem at hand, they result in a “black box” model, that is, one can hardly say what the model has learned. In this paper, we formulate multimodal image restoration and fusion as a coupled convolutional sparse coding problem, and adopt the Method of Multipliers (MM) for its solution. Then, we use the MM-based solution to design a convolutional neural network (CNN) encoder that follows the principle of deep unfolding. To address multimodal image restoration and fusion, we design two multimodal models which employ the proposed encoder followed by an appropriately designed decoder that maps the learned representations to the desired output. Unlike most existing deep learning designs comprising multiple encoding branches followed by a concatenation or a linear combination fusion block, the proposed design provides an efficient and structured way to fuse information at different stages of the network, providing representations that can lead to accurate image reconstruction. The proposed models are applied to three image restoration tasks, as well as two image fusion tasks. Quantitative and qualitative comparisons against various state-of-the-art analytical and deep learning methods corroborate the superior performance of the proposed framework.
|Number of pages||16|
|Journal||IEEE Transactions on Circuits and Systems for Video Technology|
|Publication status||Published - 1 Sep 2022|