Deep learning has achieved impressive performance across a wide range of signal and image processing tasks. With recent developments in
autonomous driving, robotics and other smart devices, there is a
growing need for small and efficient deep learning models that are
also interpretable and whose decisions can be clearly explained to
humans. The Transformer architecture has recently gained popularity
in the computer vision domain, improving the state-of-the-art in many
tasks. However, these models remain computationally expensive to train and use, which prohibits their deployment on resource-constrained platforms such as robots or smart cameras. When extended to multimodal imaging data, these costs grow even further. Therefore,
the DUST project proposes the development of a new class of
efficient and interpretable Transformers, based on sparsity and the
deep unfolding framework. First, these models will be used to achieve state-of-the-art performance in multimodal imaging applications, such as multimodal object and anomaly detection. Second, the models will be extended to GANs for guided super-resolution and image fusion of, e.g., RGB and depth or infrared data. Furthermore, DUST
will develop new explainability and interpretability methods, both to explain the classification and detection decisions of the Transformers and to gain insight into the generative processes of the Transformer GANs, an aspect that has not yet been studied extensively.
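
To illustrate the deep unfolding principle underlying DUST, the following is a minimal sketch of an unfolded sparse-coding network in the LISTA style (Gregor & LeCun, 2010), where each network layer corresponds to one iteration of the ISTA algorithm with learned parameters. All dimensions, the layer count, and the class name are illustrative assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class UnfoldedISTA(nn.Module):
    """Sketch of deep unfolding: K iterations of ISTA unrolled into K
    layers with learnable weights and thresholds (LISTA-style).
    Dimensions and layer count are illustrative assumptions."""

    def __init__(self, meas_dim=64, code_dim=256, n_layers=8):
        super().__init__()
        self.W_e = nn.Linear(meas_dim, code_dim, bias=False)   # plays the role of A^T / L
        self.W_s = nn.Linear(code_dim, code_dim, bias=False)   # plays the role of I - A^T A / L
        self.theta = nn.Parameter(torch.full((n_layers,), 0.1))  # per-layer thresholds
        self.n_layers = n_layers

    @staticmethod
    def soft_threshold(x, theta):
        # Proximal operator of the l1 norm: this is what induces sparsity.
        return torch.sign(x) * torch.relu(torch.abs(x) - theta)

    def forward(self, y):
        b = self.W_e(y)
        x = self.soft_threshold(b, self.theta[0])
        for k in range(1, self.n_layers):
            # Each loop body is one "unfolded" ISTA iteration, i.e. one layer.
            x = self.soft_threshold(b + self.W_s(x), self.theta[k])
        return x  # sparse code for the input measurement y
```

Because every layer mirrors one step of a well-understood optimization algorithm, the learned weights and thresholds retain a clear mathematical meaning, which is the source of the interpretability and parameter efficiency that the project builds on.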