Abstract
We look at an application based on the chemical production plant of Castillo and Roberts. It is a two-stage process with four times two parallel machines.
Each order must be handled first by a 'stage-one' machine and afterwards by a 'stage-two' machine. At each stage, a scheduler must choose between two parallel machines. Parallel machines can handle the same type of tasks, but may differ in speed. The schedulers' objective is to fulfill a sequence of product orders in the least amount of time. This means we want to minimize the make-span.
The action and state space grow very large when the above scheduling problem would be controlled in a centralized way. A better alternative is a multi-agent approach. In this way, one large problem is divided into different smaller problems. Each of these problems are solved by a single agent using his local/private information. This, however, creates a new problem in the multi-agent setting. From the viewpoint of one agent, the outcome/success of his actions can also depend on the actions of other agents.
In our solution, agents decide which machine processes the next job. After each 'stage-one' machine, an agent selects a machine from stage-two.
The learning techniques which we apply to this problem are Learning Automata and Q-Learning. In previous work we used Q-Learning to the problem of selecting parallel machines and multi-stage scheduling separately.
We measure the efficiency of the (online) algorithms by the ratio of time needed to process a predefined series of orders and time needed by the optimal offline algorithm.
Each order must be handled first by a 'stage-one' machine and afterwards by a 'stage-two' machine. At each stage, a scheduler must choose between two parallel machines. Parallel machines can handle the same type of tasks, but may differ in speed. The schedulers' objective is to fulfill a sequence of product orders in the least amount of time. This means we want to minimize the make-span.
The action and state space grow very large when the above scheduling problem would be controlled in a centralized way. A better alternative is a multi-agent approach. In this way, one large problem is divided into different smaller problems. Each of these problems are solved by a single agent using his local/private information. This, however, creates a new problem in the multi-agent setting. From the viewpoint of one agent, the outcome/success of his actions can also depend on the actions of other agents.
In our solution, agents decide which machine processes the next job. After each 'stage-one' machine, an agent selects a machine from stage-two.
The learning techniques which we apply to this problem are Learning Automata and Q-Learning. In previous work we used Q-Learning to the problem of selecting parallel machines and multi-stage scheduling separately.
We measure the efficiency of the (online) algorithms by the ratio of time needed to process a predefined series of orders and time needed by the optimal offline algorithm.
Original language | English |
---|---|
Title of host publication | Book of Abstracts of the 14th Belgian-French-German Conference on Optimization |
Editors | J. De Schutter, M. Diehl, F. Glineur, E. Jarlebring, Q. Louveaux, W. Michiels, I. Smets, J. Suykens, J. Swevers, J. Vandewalle, J. Van Impe, M. Verschuure |
Publisher | Instituut voor Mediastudies (KU Leuven) |
Pages | 162-162 |
Number of pages | 1 |
ISBN (Print) | 978-90-73802-00-1 |
Publication status | Published - Sep 2009 |
Publication series
Name | Belgian-French-German Conference on Optimization |
---|
Bibliographical note
J. De Schutter, M. Diehl, F. Glineur, E. Jarlebring, Q. Louveaux, W. Michiels, I. Smets, J. Suykens, J. Swevers, J. Vandewalle, J. Van Impe, M. VerschuureKeywords
- Scheduling
- Reinforcement Learning