FIGURE 10. The average elapsed time of different control methods at different task difficulties (horizontal axis: task difficulty).
consumes numerous computational resources but is meaningful.
For standard image augmentation, the texture and
lighting conditions are tuned to match the actual scene.
The results show that image augmentation has little
effect on the performance of the model. Nevertheless, we
retain the image augmentation process to
increase robustness to natural disturbances. We use one,
five, and 10 frames of historical robot-arm states as network
input; the results show that historical states
improve performance, but more frames increase the training
cost, so five frames are used. The success rate of
sim-to-real transfer with the BC algorithm is significantly
lower than that of the proposed algorithm, which supports the
effectiveness of the GAIL algorithm in the transfer process.
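The history-frame input described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `HistoryStacker`, the state dimension, and the choice to pad the buffer with the first state at episode start are all assumptions made for illustration. It only shows how the last five arm-state vectors could be concatenated into one network input.

```python
from collections import deque

import numpy as np


class HistoryStacker:
    """Hypothetical helper: keeps the last `num_frames` arm states."""

    def __init__(self, num_frames=5):
        # num_frames=5 mirrors the setting chosen in the text.
        self.num_frames = num_frames
        self.buffer = deque(maxlen=num_frames)

    def reset(self, first_state):
        # Assumption: at episode start, fill the history with the first state.
        first_state = np.asarray(first_state, dtype=np.float32)
        self.buffer.clear()
        for _ in range(self.num_frames):
            self.buffer.append(first_state)
        return self.observation()

    def push(self, state):
        # Appending to a full deque drops the oldest frame automatically.
        self.buffer.append(np.asarray(state, dtype=np.float32))
        return self.observation()

    def observation(self):
        # Oldest frame first, newest frame last, flattened into one vector.
        return np.concatenate(list(self.buffer))
```

The input size grows linearly with the number of frames, which is one way to see why a longer history raises the training cost.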
In addition, as shown in Figure 9, our agents manipulate more
quickly and efficiently than the scripts in most cases.
Figure 10 shows the average time taken to complete the assembly
task in each control mode for the three task difficulties:
the RL algorithm based on
simulation training and domain transfer outperforms script
and keyboard operations on simple tasks and even approaches
DM on complex tasks.
Controlling robots to automate complex tasks has been an
important area of research. In this work, we demonstrate that
an RL method based on training in a simulation
environment and sim-to-real transfer can outperform manually
written scripts. Our method trains robots in three stages.
First, a physical simulator based on Unity3D
is built to simulate real-world scenarios; second, large-scale
distributed robot training is performed in the physical simulator
to obtain an agent proficient in assembly tasks in the
simulation environment; and third, sim-to-real transfer is conducted
to adapt the agent to real-world deployment. The experimental
results show that the proposed algorithm outperforms other
algorithms, and the real robot arm completes the assembly
task significantly faster than script and keyboard operations.
Beyond this result, the work also reveals the potential of RL
methods when robots are confronted with complex tasks. In
future work, we will attempt to improve the sample efficiency
to speed up the training process of the neural network.
We gratefully acknowledge the financial support from the
National Key R&D Program of China (2021YFF0306405).
The corresponding author is Hong Wang.
Daqi Jiang, School of Mechanical Engineering and
Automation, Northeastern University, Shenyang 110819,
China. Email:
Hong Wang, School of Mechanical Engineering and
Automation, Northeastern University, Shenyang 110819,
China. Email:
Yanzheng Lu, School of Mechanical Engineering and
Automation, Northeastern University, Shenyang 110819,
China. Email:
