In evolutionary robotics, an encoding of the control software, which maps sensor data (input) to motor control values (output), is shaped by stochastic optimization methods to complete a predefined task. This approach is assumed to be beneficial compared to standard methods of controller design in cases where no a priori model is available that could help to optimize performance. An evolutionary robotics approach is also favorable for robots that have to operate in unpredictable environments. We present here a simple-to-implement but hard-to-pass benchmark for quantifying the “evolvability” of such evolving robot control software towards increasing behavioral complexity. We demonstrate that such a model-free approach is not a free lunch: even simple tasks can pose insurmountable barriers for fully open-ended, uninformed evolutionary computation techniques. We propose the “Wankelmut” task as an objective for an evolutionary approach that starts from scratch, without pre-shaped controller software or any other informed approach that would force the behavior to evolve in a desired way. Our main claim is that “Wankelmut” represents the simplest set of problems that makes plain-vanilla evolutionary computation fail. We demonstrate this with a series of simple standard evolutionary approaches using different fitness functions and standard artificial neural networks, as well as continuous-time recurrent neural networks. All of our tested approaches failed. From these observations, we conclude that other evolutionary approaches will also fail if they do not per se favor or enforce modularity of the evolved structures and if they do not freeze or protect already-evolved functionalities from being destroyed later in the evolutionary process. However, such protection would require a priori knowledge of the solution to the task and would contradict the “no a priori model” claim that is often made in evolutionary computation.
Thus, we propose a hard-to-pass benchmark in order to make a strong statement for self-complexifying and generative approaches in evolutionary computation in general and in evolutionary robotics specifically. We anticipate that seeking the simplest task that causes the evolutionary process to fail can yield a valuable benchmark for promoting future development in the fields of artificial intelligence, evolutionary robotics, and artificial life.
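To make the “plain-vanilla evolutionary computation” referred to above concrete, the following is a minimal, self-contained sketch of such a loop: an elitist mutation-only algorithm shaping the weights of a single-layer perceptron controller that maps two sensor inputs to one motor output. The toy fitness function and all names here are illustrative assumptions for exposition, not the paper's actual Wankelmut task or experimental setup.

```python
import random

random.seed(0)  # for reproducibility of this sketch

def make_genome(n):
    """Random real-valued genome encoding the controller's weights."""
    return [random.uniform(-1.0, 1.0) for _ in range(n)]

def mutate(genome, sigma=0.1):
    """Gaussian mutation applied to every weight."""
    return [w + random.gauss(0.0, sigma) for w in genome]

def controller(genome, sensors):
    """Single-layer perceptron: two sensor inputs -> one motor output."""
    s = genome[0] * sensors[0] + genome[1] * sensors[1] + genome[2]  # weights + bias
    return max(-1.0, min(1.0, s))  # clamp to the motor's value range

def fitness(genome):
    """Toy objective: the motor output should track the difference of the
    two sensors (a stand-in for evaluating a real robot behavior)."""
    error = 0.0
    for a, b in [(0.1, 0.9), (0.8, 0.2), (0.5, 0.5), (0.3, 0.7)]:
        error += abs(controller(genome, (a, b)) - (a - b))
    return -error  # higher is better, 0.0 is a perfect controller

def evolve(pop_size=20, generations=50, elite=5):
    """Plain elitist loop: keep the best, refill by mutating elites."""
    pop = [make_genome(3) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        pop = pop[:elite] + [mutate(random.choice(pop[:elite]))
                             for _ in range(pop_size - elite)]
    return max(pop, key=fitness)

best = evolve()
```

Note that nothing in this loop favors modularity or protects already-evolved sub-behaviors from later disruption; every weight remains open to mutation at all times, which is exactly the property the abstract argues makes such uninformed approaches fail on tasks like Wankelmut.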