In late July, OpenAI began rolling out an eerily humanlike voice interface for ChatGPT. In a safety analysis released today, the company acknowledges that this anthropomorphic voice may lure some users into becoming emotionally attached to their chatbot.
The warnings are included in a “system card” for GPT-4o, a technical document that lays out what the company believes are the risks associated with the model, plus details surrounding safety testing and the mitigation efforts the company’s taking to reduce potential risk.
OpenAI has faced scrutiny in recent months after a number of employees working on AI’s long-term risks quit the company. Some subsequently accused OpenAI of taking unnecessary chances and muzzling dissenters in its race to commercialize AI. Revealing more details of OpenAI’s safety regime may help mitigate the criticism and reassure the public that the company takes the issue seriously.
The risks explored in the new system card are wide-ranging, and include the potential for GPT-4o to amplify societal biases, spread disinformation, and aid in the development of chemical or biological weapons. It also discloses details of testing designed to ensure that AI models won’t try to break free of their controls, deceive people, or scheme catastrophic plans.
Some outside experts commend OpenAI for its transparency but say it could go further.
Lucie-Aimée Kaffee, an applied policy researcher at Hugging Face, a company that hosts AI tools, notes that OpenAI’s system card for GPT-4o does not include extensive details on the model’s training data or who owns that data. “The question of consent in creating such a large dataset spanning multiple modalities, including text, image, and speech, needs to be addressed,” Kaffee says.
Others note that risks could change as tools are used in the wild. “Their internal review should only be the first piece of ensuring AI safety,” says Neil Thompson, a professor at MIT who studies AI risk assessments. “Many risks only manifest when AI is used in the real world. It is important that these other risks are cataloged and evaluated as new models emerge.”
The new system card highlights how rapidly AI risks are evolving with the development of powerful new features such as OpenAI’s voice interface. In May, when the company unveiled its voice mode, which can respond swiftly and handle interruptions in a natural back and forth, many users noticed it appeared overly flirtatious in demos. The company later faced criticism from the actress Scarlett Johansson, who accused it of copying her style of speech.
A section of the system card titled “Anthropomorphization and Emotional Reliance” explores problems that arise when users perceive AI in human terms, something apparently exacerbated by the humanlike voice mode. During the red teaming, or stress testing, of GPT-4o, for instance, OpenAI researchers noticed instances of speech from users that conveyed a sense of emotional connection with the model. For example, people used language such as “This is our last day together.”
Anthropomorphism might cause users to place more trust in the output of a model when it “hallucinates” incorrect information, OpenAI says. Over time, it might even affect users’ relationships with other people. “Users might form social relationships with the AI, reducing their need for human interaction—potentially benefiting lonely individuals but possibly affecting healthy relationships,” the document says.
Joaquin Quiñonero Candela, head of preparedness at OpenAI, says that voice mode could evolve into a uniquely powerful interface. He also notes that the kind of emotional effects seen with GPT-4o can be positive—say, by helping those who are lonely or who need to practice social interactions. He adds that the company will study anthropomorphism and the emotional connections closely, including by monitoring how beta testers interact with ChatGPT. “We don’t have results to share at the moment, but it’s on our list of concerns,” he says.
Read the full article here