Supplementary material for:
A novel plasticity rule can explain the development of sensorimotor intelligence

Ralf Der and Georg Martius

Videos of the conducted experiments are presented in the following sections. Information on the source code is given in Sect. 9.

1 Overview

Video 1: Overview video summarizing the experimental results of the paper. This video provides a demonstration of the self-organization of behavior created by the novel plasticity rule. Different systems are considered and the plasticity rule is briefly explained. Longer videos for each experiment are provided below.

2 Self-organized crawling behavior

Video 2: The humanoid robot on the floor developing a crawling behavior. During the initial period, small movements arising from the initial condition are amplified into a coherent movement, which is then increasingly shaped to fit the situation. From time 4:30 on, a stable crawling motion is observed. Parameters: global normalization, κ=1.4, τ_h=0.4 sec, τ=4 sec.

3 Hexapod walking

(A)
(B)

Video 3: Hexapod developing different gaits. The internal model is structured to break the symmetries and facilitate a locomotion behavior: model M_1 in (A) and model M_2 in (B), see Fig. 4 (main text)(B,C). (A) Initially a synchronous wave gait emerges, followed by a synchronous trot gait, both of which preserve most of the initial symmetries. After a perturbation caused by the front legs getting stuck, new locomotion gaits develop, which channel into the well-known tripod gait. (B) Due to the different correlation structure imposed by model M_2, a different set of gaits emerges. See Fig. 4 (main text)(D,E) for the stepping patterns. Parameters: individual normalization, κ=2.2, h=0, τ=0.7 sec.

4 Memorizing and recalling behavior

A set of fixed control structures can be extracted either by taking snapshots of the synaptic weights or, more objectively, by a clustering procedure. Examples of such a clustering are shown in Supplementary Fig. 1. If these synaptic connections are used in the controller network, the behavior can be reproduced (without synaptic dynamics) in most cases. A demonstration is given in Video 4(A), where 5 clusters from Video 2(A,B) have been selected. The switching between behaviors may not be successful if the old behavior does not lie in the basin of attraction of the new one. This happens in particular when starting from inactive behaviors, but it can be remedied by a short perturbation. In a second experiment we show how snapshots of the weights taken during Video 3(A,B) can be used to memorize and recall all the different emergent gaits. The synaptic weights were copied instantaneously, without selecting a precise time point and without averaging. In fact, while a particular gait was being learned a whole series of weight sets occurred, but apparently any of these weight sets is a viable controller.

Supplementary Figure 1: Clustering of the controller matrices of Video 2. Displayed are the cluster centers for some of the clusters. The names were given after a visual inspection of their corresponding behavior, see Video 4.
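The extraction of fixed controllers by clustering can be illustrated with a minimal sketch. The clustering method is not specified here, so plain k-means over flattened weight matrices is used purely as a stand-in. The function names (cluster_controllers, recall) are hypothetical, and the one-layer tanh controller in recall is an assumption made for illustration.

```python
import numpy as np

def _assign(X, centers):
    # Squared Euclidean distance from every sample to every center.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def cluster_controllers(snapshots, k, iters=50, seed=0):
    """Group recorded weight matrices with plain k-means (assumed method).

    snapshots: array of shape (n, rows, cols), one controller matrix per entry.
    Returns (centers, labels); centers has shape (k, rows, cols).
    """
    rng = np.random.default_rng(seed)
    n, rows, cols = snapshots.shape
    X = snapshots.reshape(n, rows * cols)
    # Initialize the centers with k distinct recorded matrices.
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(iters):
        labels = _assign(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    labels = _assign(X, centers)
    return centers.reshape(k, rows, cols), labels

def recall(C_fixed, x):
    """Motor command from a frozen weight matrix, with no synaptic dynamics.

    Assumes a one-layer tanh controller; this form is an illustrative choice.
    """
    return np.tanh(C_fixed @ x)
```

Switching between behaviors then amounts to replacing C_fixed with another cluster center (or with a stored snapshot) while the control loop keeps running.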

(A)
(B)

Video 4: Recall and transition between behaviors. (A) Sequence of behaviors generated by switching the synaptic weights to those determined by cluster analysis, see Supplementary Fig. 1. Typically, a fast transition between different behaviors occurs. In some cases, however, an external perturbation is required to facilitate the transition. (B) For the hexapod robot, a set of matrices has been stored by taking snapshots of the synaptic weights as indicated in Fig. 4 (main text)(E), i.e. at seconds 23, 26, 39 (faster sync trot, not in Fig. 4 (main text)) and 46 of run 1, and at seconds 8, 13 and 55 of run 2. No averaging was performed. All previously observed gaits are successfully reproduced, with remarkably smooth transitions between them.

5 Humanoid robot at the wheel: find your task in the world

(A)
(B)

Video 5: The humanoid robot at the wheels. (A) The hands of the robot are connected to the cranks of a massive wheel. Through the drive to build up correlations between joint angle velocities, the robot “discovers” how the crank can be moved in order to realize a stable periodic motion of its internal joint angles. When the wheel is turned in the opposite direction by an external torque, the control network quickly takes up the new direction. Parameters: global normalization, κ=0.96, h=0, τ=0.4 sec. (B) Hands and feet are connected to independent wheels. The feet start to rotate the lower wheel earlier than the hands start to rotate the upper one, due to simpler physics. This difference also hinders spontaneous synchronization. Note that the trunk is fixed on the stool such that the upper and lower body are physically decoupled. Nevertheless, when the lower wheel’s direction of revolution is inverted by an external force, the arms also stop and need some time to find the rotating motion again. Parameters: individual normalization, κ=1.4, h=0, τ=0.4 sec.

6 Emerging cooperation

(A)
(B)

Video 6: Emerging communication by force exchange. Both robots are connected to the wheel, each to one of the cranks. The robots have no information about their partner. Yet they manage to cooperate by “feeling” the other’s reactive forces. This works even if the robots are not supported by the stool (B) (muscle forces doubled). Parameters: see Video 5(A) but with κ=1.

7 Model learning from motor babbling

Video 7: Learning of the inverse model and behavior of the hexapod robot without and with guidance. The robot is fixed in the air as an idealized situation. All joints are controlled independently for 15 min by harmonic oscillations of varying frequency, and the inverse model is learned. Then the behavior of the robot with this model is shown. It does not differ significantly from the behavior with the unit matrix. Finally, if guidance is added (by adding entries to M), the same locomotion patterns as in Video 3(A) are observed. Parameters: see Video 3.
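The motor-babbling procedure can be sketched as follows. This is a toy illustration under stated assumptions, not the procedure used in the experiment: the inverse model is assumed to be a linear map M from sensor values to motor commands, it is fitted by ordinary least squares, and the frequency sweep is a simple linear chirp. The function names, frequency ranges and time step are hypothetical; whether the model relates raw values or their changes is not specified here.

```python
import numpy as np

def babble(n_motors, steps, dt=0.01, seed=0):
    """Independent harmonic motor commands with slowly varying frequency.

    Each motor gets a linear chirp whose start and end frequencies (in Hz)
    are drawn at random; these ranges are arbitrary illustrative values.
    Returns an array of shape (steps, n_motors).
    """
    rng = np.random.default_rng(seed)
    t = np.arange(steps) * dt
    total = steps * dt
    f0 = rng.uniform(0.2, 0.5, size=n_motors)  # start frequency per motor
    f1 = rng.uniform(1.0, 2.0, size=n_motors)  # end frequency per motor
    # Phase of a linear chirp: 2*pi * (f0*t + 0.5*(f1 - f0)*t^2 / total).
    phase = 2.0 * np.pi * (f0 * t[:, None]
                           + 0.5 * (f1 - f0) * t[:, None] ** 2 / total)
    return np.sin(phase)

def fit_inverse_model(X, Y):
    """Least-squares fit of M such that Y is approximately X @ M.T.

    X: recorded sensor values, shape (steps, n_sensors).
    Y: motor commands that produced them, shape (steps, n_motors).
    """
    M_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return M_T.T
```

In use, Y would come from babble and X from the robot's recorded sensor values. The fitted M can then be compared with the unit matrix that serves as the default model.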

8 Comparison of synaptic rules

Video 8: Comparison of plasticity rules with the hexapod robot. Left: DHL (differential Hebbian learning), middle: DEP (differential extrinsic plasticity), right: DEP-Guided (with structured model, Fig. 4 (main text)(B)). The insets show the eigenvalue spectrum over time and, vertically, the first 3 eigenvectors scaled with their corresponding eigenvalues. The robot is initialized with a zero controller as usual. Only DEP departs from this situation. After 10 sec the synaptic weights (C) are copied from DEP to DHL. DHL is not able to maintain a persistent motion, even after a perturbation (at 45 sec). Both DEP settings show smooth behavior and react to the perturbation with a different motion (middle) or a change of gait pattern (right). The eigenvalue spectrum differs across the examples. In the DHL case there is only one nonzero eigenvalue, whereas DEP has about five (see Fig. 7 (main text)). The effect of this difference becomes visible when the robot is disturbed after 45 sec: DHL remains largely uninfluenced and its eigenvector does not change direction (it changes intensity in the visualization because it is scaled with the corresponding eigenvalue). Parameters: see Fig. 7 (main text).
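The quantities shown in the insets can be computed along the following lines. This section does not state which matrix is analysed, so the sketch accepts any square matrix; the function names and the tolerance are hypothetical. The rank-one matrix in the usage example is only a stand-in for a spectrum with a single nonzero eigenvalue, not a recorded controller.

```python
import numpy as np

def scaled_leading_eigenvectors(W, n=3):
    """Return the n largest-magnitude eigenvalues of W and the matching
    eigenvectors, each scaled by its eigenvalue (one vector per column)."""
    vals, vecs = np.linalg.eig(W)
    order = np.argsort(-np.abs(vals))[:n]  # sort by descending magnitude
    vals = vals[order]
    return vals, vecs[:, order] * vals     # column i is vals[i] * v_i

def count_nonzero_eigenvalues(W, tol=1e-8):
    """Number of eigenvalues whose magnitude exceeds tol."""
    return int(np.sum(np.abs(np.linalg.eigvals(W)) > tol))

# Usage: an outer product u u^T has exactly one nonzero eigenvalue, u.u.
u = np.array([1.0, 2.0, 3.0, 4.0])
W = np.outer(u, u)
vals, scaled = scaled_leading_eigenvectors(W)
n_nonzero = count_nonzero_eigenvalues(W)
```

Because each eigenvector is multiplied by its eigenvalue, a direction that stays fixed while its eigenvalue changes appears in such a plot as a change of intensity only.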

9 Simulation source code

The experiments can be reproduced with our simulation software and the following sources. The simulation software can be downloaded from http://robot.informatik.uni-leipzig.de/software or https://github.com/georgmartius/lpzrobots and has to be compiled on a Linux platform (packages are available for some platforms). Version 0.81 is required (the git source is the best choice). The source code for the experiments of this paper can be downloaded from here as a gzipped tar file. Instructions for compilation and execution can be found in the README.txt file in the bundle.

This document was translated from LaTeX by HEVEA.
