Abstract Learning optimal policies with Reinforcement Learning (RL) can be challenging in real-world applications, where agents must exhibit specific behaviors while remaining safe and efficient. For human behavior learning with RL, safety during both the learning process and subsequent deployment in real-world scenarios has not been adequately addressed. This paper introduces a novel reinforcement learning approach that combines behavior learning with safe exploration, yielding a practical and effective method for acquiring specific behaviors without violating safety constraints. The proposed algorithm is evaluated on guiding a 2-degree-of-freedom planar robot in its task space, where it converges to an optimal policy while strictly adhering to safety constraints. This research has potential impact on a range of real-world applications, including robotics and virtual assistants.
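To make the idea of safe exploration concrete, the sketch below shows one common mechanism: restricting the agent's action set at every step so that only transitions satisfying a safety constraint can be taken, while standard Q-learning acquires the desired goal-reaching behavior. This is a minimal illustration under assumed settings (a discretized 2-D task-space grid standing in for the planar robot's workspace, a hypothetical unsafe region, and tabular Q-learning); it is not the paper's actual algorithm.

```python
import numpy as np

# Minimal sketch: safe exploration via action masking on a discretized
# 5x5 task-space grid (a stand-in for a 2-DoF planar robot's workspace).
# The unsafe region and reward shaping below are illustrative assumptions.
rng = np.random.default_rng(0)
N = 5
GOAL = (4, 4)
UNSAFE = {(2, 2), (2, 3), (3, 2)}             # hypothetical constraint region
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def safe_actions(s):
    """Indices of actions whose successor state stays on-grid and safe."""
    out = []
    for i, (dx, dy) in enumerate(ACTIONS):
        nx, ny = s[0] + dx, s[1] + dy
        if 0 <= nx < N and 0 <= ny < N and (nx, ny) not in UNSAFE:
            out.append(i)
    return out

def step(s, a):
    dx, dy = ACTIONS[a]
    s2 = (s[0] + dx, s[1] + dy)
    r = 1.0 if s2 == GOAL else -0.01          # small step cost favors short paths
    return s2, r, s2 == GOAL

Q = np.zeros((N, N, len(ACTIONS)))
alpha, gamma, eps = 0.5, 0.95, 0.2

for episode in range(2000):
    s = (0, 0)
    for _ in range(50):
        allowed = safe_actions(s)
        # epsilon-greedy exploration restricted to the safe action set,
        # so the constraint holds during learning, not only at deployment
        if rng.random() < eps:
            a = int(rng.choice(allowed))
        else:
            a = max(allowed, key=lambda i: Q[s[0], s[1], i])
        s2, r, done = step(s, a)
        if done:
            target = r
        else:
            target = r + gamma * max(Q[s2[0], s2[1], i] for i in safe_actions(s2))
        Q[s[0], s[1], a] += alpha * (target - Q[s[0], s[1], a])
        s = s2
        if done:
            break

# Greedy rollout of the learned policy: reaches the goal, never enters UNSAFE.
s, path = (0, 0), [(0, 0)]
for _ in range(20):
    a = max(safe_actions(s), key=lambda i: Q[s[0], s[1], i])
    s, _, done = step(s, a)
    path.append(s)
    if done:
        break
```

Because unsafe successors are masked out before action selection, safety is enforced by construction at every step of training, rather than being learned only through penalties.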