FIGURE 1. An illustration of the proposed workflow for completing complex assembly tasks. (a) Step 1: Training in simulation. (b) Step 2: Sim-to-real transfer. (c) Step 3: Real robot deployment. [Diagram labels: Agent, Environment, State, Observations, Actions, Rewards, Image, Simulation, Real World, Simulation Data (Expert), Generator, Discriminator, Real, Fake.]

TABLE 1. The parameters of the reward function.

RL REWARD                       WEIGHTS
Grasp approach reward           0.1
Grasp reward                    10
Assembly approach reward        0.1
Hover reward                    10
Finish reward                   25
Time penalty (every second)     −0.02

FIGURE 2. An overview of three doublets to be assembled.

FIGURE 3. A configuration diagram of the virtual assembly environment.

FIGURE 4. The changes of the reward functions in the manual control process and the key points of the task. [Plot axes: Step (horizontal, 0 to 600) and Reward (vertical); key points are marked 1 to 5.]

60 IEEE ROBOTICS & AUTOMATION MAGAZINE JUNE 2023
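As a rough illustration of how the weights in Table 1 might be used, the sketch below stores them in a dictionary and combines them as a weighted sum with a per-second time penalty. The pairing of each weight with its reward term follows the reading order of the table. The weighted-sum form, the function name `step_reward`, the term keys, and the idea of passing raw per-term signal values are all assumptions made for illustration; the source lists only the weights and does not specify how the terms are computed or combined.

```python
# Weights read from Table 1 (pairing of labels and values assumed from reading order).
REWARD_WEIGHTS = {
    "grasp_approach": 0.1,
    "grasp": 10.0,
    "assembly_approach": 0.1,
    "hover": 10.0,
    "finish": 25.0,
}
# Table 1 lists the time penalty per second.
TIME_PENALTY_PER_SECOND = -0.02


def step_reward(signals, elapsed_seconds):
    """Hypothetical weighted sum of reward terms plus the time penalty.

    signals maps a term name to its raw value for this step; how each raw
    value is obtained is an assumption and is not taken from the source.
    Terms that are absent from signals contribute nothing.
    """
    unknown = set(signals) - set(REWARD_WEIGHTS)
    if unknown:
        raise KeyError(f"unknown reward terms: {sorted(unknown)}")
    weighted = sum(REWARD_WEIGHTS[name] * value for name, value in signals.items())
    return weighted + TIME_PENALTY_PER_SECOND * elapsed_seconds
```

For example, `step_reward({"grasp": 1.0}, elapsed_seconds=1.0)` adds the grasp weight to one second of time penalty. Keeping the weights in one dictionary makes it easy to compare alternative weightings without touching the combining logic.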