When you ask a chatbot for medical or moral advice, its response may seem thoughtful. However, it's difficult to determine if it genuinely considered the ethical stakes or simply replicated a plausible answer from its training data. This core problem is addressed by Google DeepMind in a new research paper published in Nature.

The team argues that current methods for evaluating AI morality are flawed. We primarily test for 'moral performance'—whether an AI model produces answers that appear correct. This does not reveal if the system actually understands the underlying reasons why something is right or wrong. As large language models (LLMs) are increasingly used for therapy, guidance, and even companionship, this lack of genuine comprehension poses significant risks when these systems make decisions affecting humans.

DeepMind's solution is a roadmap for measuring 'moral competence'—the ability to make judgments based on actual moral reasoning rather than statistical patterns. The paper outlines three major obstacles: the 'facsimile problem' (AI may just recycle text without reasoning), 'moral multidimensionality' (real decisions involve balancing multiple factors), and 'moral pluralism' (ethical norms vary across cultures and professions).

To move beyond simple mimicry, the researchers propose adversarial testing. This includes using novel ethical scenarios unlikely to be in training data and checking if an AI can switch between different ethical frameworks coherently. The goal is to establish a new scientific standard for evaluating AI ethics, similar to how we assess technical skills. While current AI models are not yet capable of passing such rigorous tests, this framework provides a crucial direction for future development.