In this paper, we address the problem of neural architecture search (NAS) in a context where the optimality policy is driven by a black-box oracle O whose derivatives are unavailable. In this scenario, O(A) typically provides readings from a set of sensors on how a neural network architecture A fares on the target hardware, including its power consumption, working temperature, CPU/GPU usage, central bus occupancy, and more. Current differentiable NAS approaches fail in this problem context because they have no access to derivatives, whereas traditional reinforcement learning NAS approaches remain too computationally expensive. As a solution, we propose a reinforcement learning NAS strategy based on policy gradient with increasingly sparse rewards. We rely on the fact that one does not need to fully train the weights of two neural networks to compare them. Our solution starts by comparing architecture candidates with almost fixed weights and no training, and progressively shifts toward comparisons under full weight training. Experimental results confirm both the accuracy and the training efficiency of our solution, as well as its compliance with soft/hard constraints imposed on the sensor feedback. Our strategy finds near-optimal architectures significantly faster, in approximately 1/3 of the time it would take otherwise.
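The strategy summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the search space, the synthetic `toy_oracle` standing in for O(A), the linear training-budget schedule, and all parameter names are assumptions made for illustration only. It shows a REINFORCE-style policy-gradient update that uses only oracle values (no derivatives of O), a per-candidate training budget that grows from near zero to full training, and a hard constraint on a simulated power reading.

```python
import math
import random

# Hypothetical toy search space: number of options for each design decision.
SEARCH_SPACE = [3, 3, 3]
MAX_SIZE = sum(n - 1 for n in SEARCH_SPACE)  # largest possible sum of choices

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_architecture(logits, rng):
    """Sample one option per decision from the policy's categorical distributions."""
    return [rng.choices(range(len(row)), weights=softmax(row))[0] for row in logits]

def train_steps_for(iteration, total_iterations, max_steps):
    """Assumed linear schedule: no training at first, full training at the end."""
    return round(max_steps * iteration / max(1, total_iterations - 1))

def toy_oracle(arch, train_steps, max_steps):
    """Synthetic stand-in for the black-box O(A); returns (score, power).
    A real oracle would read sensors on the target hardware."""
    quality = sum(arch) / MAX_SIZE                   # larger options score higher
    fidelity = 0.5 + 0.5 * train_steps / max_steps   # more training, sharper signal
    power = 1.0 + sum(arch)                          # larger options draw more power
    return quality * fidelity, power

def reward(arch, train_steps, max_steps, power_cap):
    score, power = toy_oracle(arch, train_steps, max_steps)
    if power > power_cap:  # hard constraint: violating candidates are penalized
        return -1.0
    return score

def search(iterations=300, max_steps=100, power_cap=6.0, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * n for n in SEARCH_SPACE]
    baseline = 0.0  # moving-average baseline to reduce gradient variance
    for it in range(iterations):
        steps = train_steps_for(it, iterations, max_steps)
        arch = sample_architecture(logits, rng)
        r = reward(arch, steps, max_steps, power_cap)
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        # REINFORCE: d log pi(choice) / d logit_k = 1[k == choice] - prob_k
        for d, choice in enumerate(arch):
            probs = softmax(logits[d])
            for k, p in enumerate(probs):
                indicator = 1.0 if k == choice else 0.0
                logits[d][k] += lr * advantage * (indicator - p)
    best = [max(range(len(row)), key=row.__getitem__) for row in logits]
    return best, logits

if __name__ == "__main__":
    best, _ = search()
    print("most likely architecture under the learned policy:", best)
```

In this sketch, early iterations compare candidates almost for free because `train_steps_for` returns a budget close to zero, while later iterations pay for full training, so each reward becomes more expensive and therefore rarer per unit of compute. The actual schedule and reward definition used in the paper are not given in the abstract and may differ.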
|Title of host publication||International Conference on Artificial Intelligence in Information and Communication (ICAIIC)|
|Number of pages||6|
|Publication status||Published - 21 Feb 2020|
|Event||International Conference on Artificial Intelligence in Information and Communication: ICAIIC2020 - Takakura Hotel, Fukuoka, Japan|
|Duration||19 Feb 2020 → 21 Feb 2020|
|Conference||International Conference on Artificial Intelligence in Information and Communication|
|Period||19/02/20 → 21/02/20|