From demonstration to autonomy: Reinforcement learning for autonomous assembly planning and flexible human-robot cooperation

Research output: Thesis › PhD Thesis


In recent years, the significant reduction in the cost of industrial robots has led to
a shift in focus from the expense of robot acquisition to the challenges and costs
associated with programming these robots for repetitive tasks. This has become
a major concern as (expensive) specialized engineers are required to carry out
such programming tasks. The emergence of Industry 5.0 emphasizes the need for
greater flexibility in manufacturing processes, which in turn further increases the
complexity of assembly planning. Moreover, the integration of complementary
human strengths with the unique capabilities of robots is crucial for enhancing the
overall efficiency of manufacturing processes.
The manufacturing industry is witnessing a growing need for flexible assembly plans that can be updated on the fly to adapt to changing environmental conditions. Shifting programming efforts from system engineers to factory workers not only reduces programming costs but also calls for intuitive assembly paradigms that do not require extensive robot programming experience. This work divides robot programming into a low and a high level: low-level programming concerns the specific motions of the robot, while high-level programming orders the low-level tasks into an assembly sequence. By leveraging the Programming by Demonstration paradigm, humans can already program robot trajectories through physical demonstrations. This thesis utilizes Reinforcement Learning (RL) to enable a more intuitive programming experience for high-level planning tasks, allowing RL agents to learn optimal assembly plans. This combination leverages human demonstrations to generate flexible assembly plans, resulting in fully autonomous and adaptable robotic systems.
For an RL agent to autonomously generate optimal assembly plans, it requires a training environment. We investigate how a simulated copy of the environment, called a digital twin, can be used to make an initial reduction of the total solution space by filtering out assembly plans that are not physically possible (i.e. that cause collisions with the environment). Our experiments show that, starting from this reduced solution space, the agent converges faster to the fastest assembly plan in the real world. We show that this two-step procedure allows the robot to autonomously find correct and fast assembly sequences, without any additional human input or mismanufactured products.
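The two-step procedure can be sketched as follows. This is an illustrative toy, not the thesis implementation: the part names, the stand-in collision check, and the bandit-style learner with made-up execution times are all assumptions for the sake of the example. Step one prunes physically impossible sequences with a simulated check; step two learns the fastest remaining plan from noisy real-world timing feedback.

```python
import random
from itertools import permutations

PARTS = ["base", "wall", "roof"]

def collision_free(plan):
    # Stand-in for a digital-twin simulation: as an example physical
    # constraint, the "roof" cannot be placed before the "base".
    return plan.index("base") < plan.index("roof")

# Step 1: reduce the solution space using the simulated check.
candidates = [p for p in permutations(PARTS) if collision_free(p)]

# Hypothetical true execution times; the learner only sees noisy samples.
true_time = {p: 10.0 + i for i, p in enumerate(candidates)}

def sample_time(plan, rng):
    return true_time[plan] + rng.gauss(0, 0.5)

def learn_fastest(candidates, episodes=200, seed=0):
    # Step 2: epsilon-greedy learning over the pruned plan space only.
    rng = random.Random(seed)
    est = {p: 0.0 for p in candidates}
    n = {p: 0 for p in candidates}
    for t in range(episodes):
        if rng.random() < 0.1 or t < len(candidates):
            plan = candidates[t % len(candidates)]   # explore
        else:
            plan = min(est, key=est.get)             # exploit fastest so far
        duration = sample_time(plan, rng)
        n[plan] += 1
        est[plan] += (duration - est[plan]) / n[plan]  # running mean
    return min(est, key=est.get)

best = learn_fastest(candidates)
```

Because the impossible orderings were already filtered out, every episode is spent comparing only valid plans, which is what lets the agent converge faster in the real environment.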
The solution space of possible assemblies grows drastically with an increasing number of parts. To reduce the total solution space more efficiently, we present the novel Autonomous Constraint Generation (ACG) method. This method requires a single demonstrated sequence as input and uses it to generate all physically possible assembly plans. A major advantage of this method is that it scales only linearly with the number of parts in the assembly. We show a drastic decrease in planning time compared to the previous RL method, and show that it can be applied to a real industrial use case.
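The general idea can be illustrated as follows; this sketch is not the ACG algorithm itself, and the part names, the hypothetical support relation, and the brute-force enumeration at the end are all assumptions. The key point it mirrors is that constraints are extracted in a single pass over the one demonstrated sequence (linear in the number of parts), and those constraints then implicitly define every physically possible ordering.

```python
from itertools import permutations

DEMO = ["base", "left_wall", "right_wall", "roof"]

# Hypothetical support relation, standing in for a physical feasibility
# check performed while replaying the demonstration.
SUPPORTS = {
    "base": set(),
    "left_wall": {"base"},
    "right_wall": {"base"},
    "roof": {"left_wall", "right_wall"},
}

def generate_constraints(demo):
    # One pass over the demonstrated sequence: linear in the number of parts.
    return {part: SUPPORTS[part] for part in demo}

def valid_plans(constraints):
    # Enumerate every ordering that satisfies the precedence constraints.
    # (Brute force here, purely for illustration of what the constraints mean.)
    plans = []
    for order in permutations(constraints):
        placed, ok = set(), True
        for part in order:
            if not constraints[part] <= placed:
                ok = False
                break
            placed.add(part)
        if ok:
            plans.append(order)
    return plans

constraints = generate_constraints(DEMO)
plans = valid_plans(constraints)
```

From the single demonstration, two valid plans emerge in this toy case: the walls may be placed in either order, but the base must come first and the roof last.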
Humans and robots are each specialized in different types of tasks (dexterous and creative vs. simple and repetitive). Flexible assembly plans can be used to enable human-robot cooperation. We further enhance the RL method by modelling the occupancy of the operator in the assembly environment. The resulting system is able to learn when to perform robot assembly tasks in parallel with an operator, while ensuring the operator's safety. This system was implemented on a real-world use case. Results show that the system converges to a robot behavior that performs as many assembly steps as possible in parallel with the operator, significantly reducing the total assembly time.
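One simple way to picture occupancy modelling is as a mask over the robot's available actions; the sketch below is a hypothetical illustration (the zone names, task list, and masking rule are invented, not the thesis model). Tasks whose workspace zone the operator currently occupies are excluded, so whatever the robot executes can safely run in parallel with the human.

```python
# Hypothetical mapping from robot task to the workspace zone it needs.
ROBOT_TASKS = {
    "fasten_base": "left",
    "place_bracket": "left",
    "insert_pcb": "center",
    "attach_cover": "right",
}

def safe_parallel_tasks(operator_zones, done):
    """Robot tasks not yet done whose zone the operator does not occupy."""
    return [task for task, zone in ROBOT_TASKS.items()
            if task not in done and zone not in operator_zones]

# Operator is working in the "left" zone: the robot may run the other tasks
# in parallel, and the two left-zone tasks are deferred until the zone clears.
available = safe_parallel_tasks(operator_zones={"left"}, done=set())
```

In a learned system the agent additionally learns *when* such parallel steps pay off for total assembly time, rather than just which ones are currently safe.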
To further personalize this cooperation, operators should be able to advise the robot on a preferred assembly sequence. We investigate how natural language can be used to give commands to the RL agent. The proposed IRL-PBRS method uses Interactive Reinforcement Learning (IRL) to learn from human advice in an interactive way, and uses Potential-Based Reward Shaping (PBRS) to focus learning on a smaller part of the solution space. Compared to other feedback strategies, IRL-PBRS converges faster to a valid assembly plan and does so with the fewest human interactions. Moreover, we show in a user study that the system learns fast enough to keep up with advice given by a user, and is able to adapt online to a changing knowledge base.
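The PBRS component builds on the standard potential-based shaping form F(s, s') = γΦ(s') − Φ(s), which is known to leave the optimal policy unchanged. The sketch below shows that general form only; the advice-derived potential (matching the assembled prefix against an advised sequence) and all names are illustrative assumptions, not the thesis's IRL-PBRS implementation.

```python
GAMMA = 0.95
ADVISED_PREFIX = ["base", "left_wall"]   # hypothetical advice from language

def phi(state):
    """Potential: length of the match between the assembled-so-far
    sequence and the operator's advised prefix."""
    match = 0
    for done, advised in zip(state, ADVISED_PREFIX):
        if done != advised:
            break
        match += 1
    return float(match)

def shaped_reward(state, next_state, env_reward):
    # Standard PBRS form: F(s, s') = gamma * phi(s') - phi(s).
    return env_reward + GAMMA * phi(next_state) - phi(state)

# Placing "base" first follows the advice, so the shaping bonus is positive;
# a step that ignores the advice receives no bonus.
bonus = shaped_reward([], ["base"], env_reward=0.0)
```

Because the shaping term telescopes along any trajectory, the advice only biases *where* the agent explores, focusing learning on the advised part of the solution space without changing which plans are ultimately optimal.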
In conclusion, Reinforcement Learning based methods can be used to autonomously generate efficient and fast robot assembly plans based only on human demonstrations. By modelling the environment and the occupancy of the operator, they can be used to work safely together with an operator in a cooperative setting. These plans can be further fine-tuned by giving commands in natural language, which are incorporated into the learning process.
Original language: English
Awarding Institution:
  • Vrije Universiteit Brussel
Supervisors:
  • Vanderborght, Bram, Supervisor
  • Van de Perre, Greet, Co-Supervisor
  • Verstraten, Tom, Co-Supervisor
Award date: 16 Nov 2023
Publication status: Published - 2023


