One of the essential challenges in developing Artificial Intelligence (AI) to achieve a specific goal is that the system should perform its task independently yet in a controllable way. For an ideally functioning AI system, it is enough to specify the task, that is, the goal. Using the sensors and actuators available to it, the ideal AI then acts independently toward the assigned goal in order to perform the task.
In biological systems, independent, purposeful operation is ensured by intention and will, functions shaped by evolution. Self-guided behavior is recognizable even in primitive, single-celled organisms that have no brain. Intention and will are essential to independent functioning.
Independent and purposeful operation is made possible by incorporating intention and will, that is, by integrating motivation into the governing system.
Evolution realizes motivation-based functioning, that is, the functions of intention and will, through the perception of favorable and unfavorable situations and the development of appropriate internal reward and penalty processes. An appropriately adapted biological system, guided by motivation in the form of internal rewards and penalties, performs the evolutionary task of adapting to the given environment while learning that environment's particular circumstances.
Reinforcement learning is a recent approach in AI to performing a specific task automatically by adapting to the given environment through learning.
Reinforcement learning simplifies the machine learning process for the programmer because it uses a universal method. The guiding system does not need to know the specific environment in advance; it only needs to observe the environment and examine how the environment affects the machine. The machine also has effectors through which it can influence the environment.
In applying the reinforcement learning method, the programmer designates a goal or state that the machine, placed in the environment, should achieve or maintain. The programmer also builds an evaluation rule into the guiding system that indicates whether the environment's effect on the machine is positive or negative with respect to achieving the specified goal or maintaining the desired state.
The machine affects its environment during operation. The guiding system does not need to know the environment; it only needs to monitor and evaluate the impact of its interventions on the environment. Reinforcement learning-based AI acts on its own. Initially, it intervenes in the environment at random. If an action issued by the machine elicits a favorable response from the environment, that action is reinforced; if the response is unfavorable, the action is discarded. Over time, the machine operates more and more purposefully and efficiently as it strives to achieve the designated goal.
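The trial-and-error loop described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a simple epsilon-greedy scheme with a fixed exploration rate; the names `run_trial_and_error` and `toy_environment`, the three actions, and their success probabilities are hypothetical and not taken from any particular system.

```python
import random

def run_trial_and_error(reward_fn, n_actions, steps=5000, explore=0.2, seed=0):
    """Learn which action pays off, using only the rewards that come back."""
    rng = random.Random(seed)
    value = [0.0] * n_actions   # running estimate of each action's payoff
    count = [0] * n_actions     # how many times each action was tried
    for _ in range(steps):
        if rng.random() < explore:
            action = rng.randrange(n_actions)                       # random intervention
        else:
            action = max(range(n_actions), key=lambda a: value[a])  # reuse the best-rated action
        reward = reward_fn(action, rng)   # the environment's favorable or unfavorable response
        count[action] += 1
        # Incremental mean: favorable responses raise the estimate, unfavorable ones lower it.
        value[action] += (reward - value[action]) / count[action]
    return value

def toy_environment(action, rng):
    # Hypothetical environment: action 2 leads to a favorable response most often.
    success_prob = [0.2, 0.5, 0.8][action]
    return 1.0 if rng.random() < success_prob else -1.0

values = run_trial_and_error(toy_environment, n_actions=3)
best_action = max(range(3), key=lambda a: values[a])
```

The guiding code never inspects how `toy_environment` works; it only sees the reward, which mirrors the point that the environment does not need to be known in advance.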
Reinforcement learning-based AI thus seeks to implement a motivation-based guidance system by applying reward and penalty in much the same way as biological systems do.
This guiding mechanism was formulated in the description of UAA-Systems.
Reinforcement learning is a step towards intention-based systems. The predetermined reward-penalty procedure of a reinforcement learning-based system can correspond to the goal-intention-will functions created by evolution. (In the formulation of the UAA-System, this function is defined as the primary critical stimuli.)
However, the motivational mechanisms used by the biological brain are not based solely on goal-intention-will functions rigidly fixed in the individual by evolution. The brain not only learns about its environment through its predetermined motivational system; it can also modify that motivational system on its own, without external intervention, flexibly and with proper regard to the goal to be achieved.
The brain uses a constantly and dynamically changing, hierarchical reward and penalty structure built on the pre-programmed reward and penalty system created by evolution. This structure is created by the associative learning processes of the nervous system, which continuously discover new critical stimuli among the incoming stimuli and forget previously discovered critical stimuli that are old, unused, or malfunctioning. (In the formulation of the UAA-System, this function is defined as the secondary critical stimuli.)
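One way to picture how learned reward values could be layered on top of fixed ones is the sketch below. It does not reproduce the UAA-System's own formulation; it assumes a simple delta-rule (Rescorla-Wagner-style) update chosen purely for illustration, and the stimulus names, `PRIMARY`, `update_secondary`, and all numeric rates are hypothetical.

```python
# Fixed, built-in values standing in for the primary critical stimuli (hypothetical).
PRIMARY = {"food": 1.0, "pain": -1.0}

def primary_signal(stimuli):
    """Sum of the built-in reward/penalty values among the stimuli present."""
    return sum(PRIMARY.get(s, 0.0) for s in stimuli)

def update_secondary(secondary, stimuli, rate=0.2, decay=0.01, floor=1e-3):
    """Return a new table of learned (secondary) values after one observation."""
    signal = primary_signal(stimuli)
    updated = {}
    for name, value in secondary.items():
        # Associations that are not exercised fade slowly (forgetting unused stimuli).
        updated[name] = value if name in stimuli else value * (1.0 - decay)
    for name in stimuli:
        if name in PRIMARY:
            continue  # built-in values are never rewritten by learning
        old = updated.get(name, 0.0)
        # A present stimulus moves toward the reward or penalty it co-occurs with;
        # if it no longer predicts anything, it is pulled back toward zero.
        updated[name] = old + rate * (signal - old)
    # Drop associations that have become negligible.
    return {n: v for n, v in updated.items() if abs(v) > floor}

secondary = {}
for _ in range(20):                       # "bell" repeatedly accompanies "food"
    secondary = update_secondary(secondary, {"bell", "food"})
learned = secondary["bell"]

for _ in range(100):                      # "bell" stops appearing
    secondary = update_secondary(secondary, {"light"})
faded = secondary.get("bell", 0.0)
```

In this toy run, `bell` acquires a value close to that of `food` while the two occur together and loses much of it once it stops appearing, while `light`, which never accompanies a primary stimulus, acquires no value at all.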
Continuous and dynamic modification of the secondary critical stimulus structure through learning makes the system qualitatively more efficient in adapting to the environment and achieving the designated goal.
The dynamic reward-penalty structure also makes the self-operating system manageable: by modifying the secondary critical stimuli from outside, the system can be steered while it continues to function on its own.
Controllability by modifying secondary critical stimuli does not mean direct and immediate control. It does not allow direct guidance; instead, it guides the system indirectly by modifying the goal-intention-will functions from outside. This way the programmer can indirectly modify the motivation of the machine while preserving its ability to operate independently.
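The difference between direct control and indirect guidance can be illustrated with a small sketch. It assumes, purely for illustration, that the machine scores each option by summing the primary and secondary values of the stimuli that option is expected to produce; `choose_action`, the option names, and all values are hypothetical.

```python
def choose_action(options, primary, secondary):
    """The machine decides on its own which option currently motivates it most."""
    def score(stimuli):
        return sum(primary.get(s, 0.0) + secondary.get(s, 0.0) for s in stimuli)
    return max(options, key=lambda name: score(options[name]))

primary = {"energy": 1.0}                              # fixed motivation, left untouched
secondary = {"charging_dock": 0.5, "window": 0.1}      # learned, adjustable motivation
options = {
    "go_to_dock": ["charging_dock"],                   # stimuli each option is expected to yield
    "go_to_window": ["window"],
}

before = choose_action(options, primary, secondary)

# External intervention: the programmer issues no command, but only raises
# the learned value of one stimulus.
secondary["window"] = 0.9
after = choose_action(options, primary, secondary)
```

The programmer never calls an action directly. The selection still happens inside `choose_action`, so the machine keeps deciding for itself, but a changed motivation table leads it to a different decision (`before` is `"go_to_dock"`, `after` is `"go_to_window"`).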
The performance of AI systems can grow rapidly because, in contrast to the biological brain, AI is not bound to a fixed computing capacity. Reinforcement learning-based AI systems already surpass human abilities in a number of areas. A hierarchical and dynamic reward-penalty system developed through self-learning, that is, a dynamically changing secondary critical stimulus structure, provides a qualitative enhancement in the capabilities of reinforcement learning-based guidance systems. In addition, external modification of the motivation system by the programmer makes the independently operating system controllable. Dynamic motivation, that is, dynamic goal-intention-will functions, can therefore be implemented in AI systems.