This chapter has presented a fully automated real-time character-based interface in which a scriptable, affective, humanoid 3D agent interacts with the user. Special care has been taken to enable natural multimodal user-agent interaction: communication is accomplished via text, image and voice (natural language). Our embodied agents are equipped with an emotional state that can be modified throughout the conversation with the user and that depends on the emotional state detected from the user's facial expressions. This nonverbal affective information is interpreted by the agent, which responds empathetically by adapting its voice intonation, facial expression and answers. These agents have been used as virtual presenters, domotic assistants and pedagogical agents in different applications, and the results are promising.

The chapter has focused on two main aspects: the capture of the user's emotional state from webcam images, and the development of a natural language dialog system (in Spanish) that also takes emotional aspects into account.

The facial expression recognizer is based on the tracking of facial features and on an effective emotional classification method grounded in the theory of evidence and in Ekman's emotional categories. Using a set of distances and angles extracted from the user images, together with a set of thresholds derived from the analysis of a sufficiently broad image database, the classification results are acceptable, and recent developments have enabled us to improve the success rates (a sketch of this classification scheme is given at the end of the section). The utility of this kind of information is clear: the general vision is that, if a computer could recognize a user's emotion, human-computer interaction would become more natural, enjoyable and productive.

The dialog system has been developed so that the user can ask questions, give commands or ask the agent for help. It is based on the recognition of patterns, to which fixed answers are associated. These answers, however, vary depending on the virtual character's emotional state, or may undergo random variations so that the user does not get the impression of repetition if the conversation goes on for a long time (see the second sketch at the end of the section).

Special attention has also been paid to adding an emotional component to the synthesized voice in order to reduce its artificial nature. Voice emotions also follow Ekman's categories and are modeled by modifying the volume, speed and pitch of the synthesized speech (see the third sketch at the end of the section).

Several research lines remain open. Regarding Maxine, the next steps are:

- to allow not only facial expressions but also body postures to be affected by the emotional state of the agent;
- to use the user's emotional information in a more sophisticated way: the computer could offer help and assistance to a confused user, or try to cheer up a frustrated user, and hence react in more appropriate ways than simply ignoring the user's affective states, as is the case in most current interfaces;
- to consider not only emotion but also personality models for the virtual agents;
- to give the system learning mechanisms, so that it can modify its display rules based on what appears to be working for a particular user and improve its responses while interacting with that user; and
- to carry out a proper validation of the Maxine system and its characters.
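To make the evidence-based classification step concrete, the following is a minimal sketch in Python. The thresholds, measurement names and emotion sets in `mass_from_measure` are hypothetical (in the actual system they come from the analysis of the image database); what the sketch does show is the core operation of the theory of evidence, Dempster's rule of combination, used to fuse per-measurement evidence over Ekman's categories and select the most plausible emotion.

```python
from itertools import product

# Ekman's six basic emotions plus "neutral", as used by the recognizer.
EMOTIONS = frozenset({"joy", "sadness", "anger", "fear",
                      "disgust", "surprise", "neutral"})

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions,
    represented as dicts mapping frozensets of emotions to masses."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

def mass_from_measure(value, low, high, low_set, high_set):
    """Turn one normalized facial measurement into a mass function
    using two thresholds. Thresholds and emotion sets here are
    hypothetical placeholders."""
    if value <= low:
        return {low_set: 0.8, EMOTIONS: 0.2}   # residual ignorance
    if value >= high:
        return {high_set: 0.8, EMOTIONS: 0.2}
    return {EMOTIONS: 1.0}                     # uninformative reading

def classify(measurements):
    """Fuse the evidence of all measurements and return the emotion
    with the highest plausibility."""
    m = {EMOTIONS: 1.0}  # start from total ignorance (vacuous belief)
    for args in measurements:
        m = combine(m, mass_from_measure(*args))
    def plausibility(e):
        return sum(v for s, v in m.items() if e in s)
    return max(EMOTIONS, key=plausibility)

# Two hypothetical normalized measurements (e.g. mouth width and
# eyebrow height) whose high values point to overlapping emotion sets.
obs = [
    (0.9, 0.4, 0.7, frozenset({"sadness"}), frozenset({"joy", "surprise"})),
    (0.8, 0.3, 0.6, frozenset({"anger"}), frozenset({"surprise", "fear"})),
]
print(classify(obs))  # -> "surprise"
```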
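The pattern-answer mechanism of the dialog system can be sketched as follows. The pattern table, the emotion labels and the English example strings are illustrative placeholders (the actual system works in Spanish); the point is how a fixed-answer table is indexed by the agent's current emotional state and randomized so long conversations do not feel repetitive.

```python
import random
import re

# Hypothetical pattern table mapping regular expressions over the user's
# utterance to per-emotion answer variants.
PATTERNS = [
    (re.compile(r"\b(hello|hi)\b", re.I), {
        "joy":     ["Hello! Great to see you again!", "Hi there!"],
        "sadness": ["Hello...", "Oh, hi."],
        "neutral": ["Hello.", "Hi, how can I help you?"],
    }),
    (re.compile(r"\bhelp\b", re.I), {
        "joy":     ["Of course, I'd love to help!"],
        "sadness": ["I will try to help, I suppose."],
        "neutral": ["Sure, what do you need help with?"],
    }),
]

FALLBACK = ["I am not sure I understood that.", "Could you rephrase that?"]

def answer(utterance, agent_emotion="neutral"):
    """Return a reply for the first matching pattern. The answer set is
    selected by the agent's emotional state, and a random choice among
    variants provides the random variation described above."""
    for pattern, variants in PATTERNS:
        if pattern.search(utterance):
            options = variants.get(agent_emotion, variants["neutral"])
            return random.choice(options)
    return random.choice(FALLBACK)

print(answer("Hi Maxine!", agent_emotion="joy"))
```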
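Finally, the emotional modulation of the synthesized voice amounts to scaling the synthesizer's baseline volume, speed and pitch per emotional category. The multipliers below are invented placeholders, not the values used by the system, and `tts_speak` stands in for whatever TTS backend is attached.

```python
from dataclasses import dataclass

@dataclass
class Prosody:
    """Multiplicative modifiers applied to the synthesizer's baseline
    (1.0 = unchanged). The values below are invented placeholders."""
    volume: float = 1.0
    speed: float = 1.0
    pitch: float = 1.0

# One prosody profile per emotional category (Ekman's six plus neutral).
EMOTIONAL_PROSODY = {
    "joy":      Prosody(volume=1.1, speed=1.1, pitch=1.2),
    "sadness":  Prosody(volume=0.8, speed=0.8, pitch=0.9),
    "anger":    Prosody(volume=1.3, speed=1.1, pitch=1.1),
    "fear":     Prosody(volume=0.9, speed=1.2, pitch=1.3),
    "disgust":  Prosody(volume=0.9, speed=0.9, pitch=0.95),
    "surprise": Prosody(volume=1.1, speed=1.0, pitch=1.3),
    "neutral":  Prosody(),
}

def synthesize(text, emotion="neutral", tts_speak=print):
    """Look up the emotion's prosody profile and hand it to the TTS
    backend; tts_speak is a stand-in for the real synthesizer call."""
    p = EMOTIONAL_PROSODY.get(emotion, Prosody())
    tts_speak(f"[vol={p.volume} speed={p.speed} pitch={p.pitch}] {text}")

synthesize("¡Hola! ¿En qué puedo ayudarte?", emotion="joy")
```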