Abstract
This doctoral thesis presents several contributions in the fields of interpretable and explainable deep learning, focusing on video processing and sequential problems. The developed methods provide various insights into model-based approaches for deep learning, thereby contributing to the application of trustworthy AI in signal processing. We subdivide our findings into three main research areas: (1) model-based designs of efficient deep neural networks (DNNs), (2) deep learning grounded in generalization and information theory, and (3) post-hoc explainability.
In area (1), we introduce several deep unfolding networks for video processing tasks. Deep unfolding designs DNNs to learn variations of iterative algorithms that promote low-complexity data structures. Our first model, reweighted-RNN, is a recurrent neural network (RNN) for reconstructing video frames from compressive measurements. Specifically, it unfolds a weighted-$\ell_1$-$\ell_1$ minimization algorithm with sparse priors on the frame representation and on the correlation of successive frames. Experimental validation on real video shows that our model consistently outperforms other deep unfolding RNNs as well as standard ones, while requiring substantially fewer training parameters. We then present a second network, refRPCA-Net, for the task of video foreground-background separation by learning robust principal component analysis (RPCA). The network unfolds a proximal method for RPCA and incorporates the video correlation-promoting scheme of reweighted-RNN to leverage the temporal relationships of the foreground, leading to enhanced performance compared to existing deep unfolding methods. Next, we address the related problem of video foreground masking by unfolding an alternating direction method of multipliers (ADMM). The resulting networks, ROMAN-S and ROMAN-R, outperform the original optimization methods in terms of both foreground detection quality and inference speed. They also outperform previous RPCA-based networks, as well as a typical 3D convolutional neural network (3D U-Net), with the additional advantages of providing video backgrounds, requiring substantially fewer parameters, and showing better generalization when trained on small datasets.
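As a rough illustration of the deep unfolding principle behind these networks, the sketch below shows one LISTA-style unfolded iteration augmented with a recurrent term for the previous frame's code. The matrix names (`W`, `S`, `G`) and the single soft-thresholding step are simplifying assumptions for illustration; they do not reproduce the exact weighted-$\ell_1$-$\ell_1$ proximal update of reweighted-RNN.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the l1 norm (element-wise shrinkage), which promotes sparse codes."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def unfolded_step(h, h_prev_frame, y, W, S, G, lam):
    """One hypothetical LISTA-style unfolded iteration with a recurrent term.

    h            : current estimate of the sparse code (iteration t)
    h_prev_frame : sparse code of the previous video frame (temporal prior)
    y            : compressive measurements of the current frame
    W, S, G      : learned matrices replacing the fixed gradient-step operators
    lam          : learned threshold controlling the sparsity level
    """
    # Affine update combining the measurements, the running estimate, and the
    # correlated code of the previous frame, followed by soft-thresholding.
    return soft_threshold(W @ y + S @ h + G @ h_prev_frame, lam)

# Stacking T such steps, each with its own learned (W, S, G, lam), yields one
# recurrent "layer" of a deep unfolding network; the parameters are trained
# end-to-end instead of being hand-tuned as in the original optimizer.
```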
In research area (2), we first focus on model generalization. Here, we present a generalization error bound (GEB) for reweighted-RNN and its derivative networks, based on their Rademacher complexity. This is a first-of-its-kind result that bridges machine learning theory with deep unfolding RNNs. The GEB is further extended to sequence classification. An empirical evaluation of the bound demonstrates that reweighted-RNN achieves a tighter theoretical generalization error than its non-reweighted counterpart. We also demonstrate the positive impact of weight regularization and norm-based reparameterization on the generalization properties, as captured by the proposed complexity bound. Our next contribution, to information-theoretic machine learning, considers the learning of scalable representations of distributed signals using RNNs. This problem is also known as successive refinement coding in the Wyner-Ziv setup, which considers the progressive encoding of a continuous source and its successive decoding with increasing levels of quality, aided by correlated side information. We derive a variational loss function to train layered encoders and decoders, showing that the trained RNNs can explicitly retrieve layered binning solutions akin to scalable nested quantization. This result opens the door to various applications that benefit from learned approaches to distributed compression.
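For context, a standard Rademacher-complexity generalization bound has the following textbook form (e.g., as in Mohri et al.); the thesis contribution lies in bounding the complexity term for the unfolded RNN hypothesis class, and the specific constants of that bound are not reproduced here:

```latex
% Generic Rademacher-complexity bound (textbook form): with probability at
% least 1 - \delta over a sample of size n, uniformly over f in the class F,
\[
\mathbb{E}\big[\ell(f(X),Y)\big]
\;\le\;
\frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i),y_i\big)
\;+\; 2\,\widehat{\mathfrak{R}}_n(\ell \circ \mathcal{F})
\;+\; 3\sqrt{\frac{\ln(2/\delta)}{2n}} ,
\]
% where \widehat{\mathfrak{R}}_n denotes the empirical Rademacher complexity
% of the loss class; the thesis specializes this term to deep unfolding RNNs.
```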
Lastly, in area (3), we introduce InteractionLIME, a model-agnostic feature attribution method to explain any multi-input model. The relevant interacting features are discovered by creating a bipartite graph of features and regressing a bilinear model on randomly perturbed samples. An experimental study on contrastive vision and language models demonstrates the effectiveness of InteractionLIME in explaining how such models capture similarity between inputs. The proposed technique also shows promise for explaining the learned representations of the other networks proposed in this thesis.
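To illustrate the idea of fitting a bilinear surrogate on randomly perturbed inputs, the sketch below estimates pairwise interaction weights for a hypothetical two-input similarity model. The function names, the binary perturbation scheme, and the plain least-squares fit are illustrative assumptions rather than the published InteractionLIME procedure (LIME-style methods, for instance, typically weight perturbed samples by a proximity kernel).

```python
import numpy as np

def interaction_lime_sketch(score_fn, n_a, n_b, n_samples=2000, seed=None):
    """Illustrative LIME-style bilinear surrogate for a two-input model.

    score_fn : callable taking two binary masks (which interpretable features of
               input A and input B are kept) and returning the model's similarity score
    n_a, n_b : number of interpretable features (e.g. superpixels, tokens)
    Returns an (n_a, n_b) matrix of pairwise interaction weights.
    """
    rng = np.random.default_rng(seed)
    Za = rng.integers(0, 2, size=(n_samples, n_a))  # random perturbation masks for input A
    Zb = rng.integers(0, 2, size=(n_samples, n_b))  # random perturbation masks for input B
    y = np.array([score_fn(za, zb) for za, zb in zip(Za, Zb)])

    # Design matrix of pairwise products: one column per (feature_a, feature_b)
    # edge of the bipartite feature graph.
    X = np.einsum('ni,nj->nij', Za, Zb).reshape(n_samples, n_a * n_b)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w.reshape(n_a, n_b)  # entry (i, j) attributes the score to the feature pair (i, j)
```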
| Original language | English |
| --- | --- |
| Awarding institution | |
| Supervisor(s)/advisor | |
| Award date | 7 Nov 2024 |
| Status | Published - 2024 |
Projects
- 1 Finished
- FWOSB97: Interpretable and Explainable Deep Learning for Video Processing
Joukovsky, B. & Deligiannis, N.
1/11/20 → 31/10/24
Project: Fundamental research