When people sign up for Koko, a San Francisco-based online emotional support chat service, they expect to exchange messages with an anonymous volunteer. They can seek relationship advice, talk about their depression, or find help for almost anything else — a kind of free, digital shoulder to cry on.
However, for a few thousand people, the mental health care they received was not entirely human. It was supplemented by robots.
Koko conducted an experiment in October in which GPT-3, a newly popular artificial intelligence chatbot, wrote responses in whole or in part. Humans could edit the responses and still had to press the send button, but they weren’t always the authors.
According to Koko co-founder Robert Morris, approximately 4,000 people received responses from Koko that were at least partially written by AI.
Since he revealed it a week ago, the experiment on the small and little-known platform has exploded into an intense controversy, in what may be a preview of more ethical disputes to come as AI technology works its way into more consumer products and health services.
Morris told NBC News that the idea seemed worth trying because GPT-3 is often both fast and eloquent.
“People who saw the co-written GPT-3 responses rated them significantly higher than the ones that were written purely by a human. What an interesting observation,” he said.
Morris stated that he did not have official test data to share.
However, once people realised the messages were co-created by a machine, the benefits of improved writing vanished. “Simulated empathy feels strange and empty,” Morris tweeted.
On Jan. 6, he shared the results of the experiment on Twitter and received a barrage of criticism. Academics, journalists, and fellow technologists accused him of acting unethically and duping people into becoming test subjects without their knowledge or consent when they were in need of mental health care. His Twitter thread received over 8 million views.
Of course, the senders of the AI-crafted messages knew whether they had written or edited them. However, recipients only received a notification saying, “Someone replied to your post! (written in collaboration with Koko Bot)” without elaborating on the bot’s role.
In a demonstration Morris posted online, GPT-3 responded to someone who expressed difficulty becoming a better person. “I hear you. It’s difficult for you to become a better person. Making changes in our lives is difficult, especially when we try to do so alone. However, you are not alone,” the chatbot said.
Morris stated that there was no way to opt out of the experiment other than not reading the response at all. “If you received a message, you could choose to ignore it,” he explained.
Leslie Wolf, a law professor at Georgia State University who writes about and teaches research ethics, expressed concern about how little Koko informed people who were receiving answers augmented by AI.
“This is an organisation trying to provide much-needed support in a mental health crisis where we don’t have enough resources to meet the needs, and yet when we manipulate people who are vulnerable, it’s not going to go well,” she explained. People in mental distress may be made to feel worse, she said, especially if the AI produces biased or careless text that goes unreviewed.
Now, Koko is defending its decision, and the entire tech industry is under scrutiny for the casual way it sometimes turns ordinary people into lab rats, especially as more tech companies venture into health-related services.
After revelations of harmful experiments such as the Tuskegee Syphilis Study, in which government researchers withheld treatment from hundreds of Black men with syphilis, some of whom died, Congress mandated oversight of some human subject tests in 1974. As a result, universities and others receiving federal funding are required to follow strict rules when conducting experiments on human subjects, a process enforced by institutional review boards, or IRBs.
However, there are no such legal obligations for private corporations or nonprofit organisations that do not receive federal funding and are not seeking FDA approval.
Morris stated that Koko has not received any federal funds.
“People are frequently surprised to learn that there are no actual laws specifically governing human research in the United States,” said Alex John London, director of the Center for Ethics and Policy at Carnegie Mellon University and author of a book on research ethics, in an email.
He believes that even if an entity is not required to go through IRB review, it should do so in order to reduce risks. He’d like to know what steps Koko took to ensure that research participants “weren’t the most vulnerable users in acute psychological crisis.”
“Users at higher risk are always directed to crisis lines and other resources,” Morris said, adding that “Koko closely monitored the responses while the feature was live.”
Morris said in an email Saturday, following the publication of this article, that Koko was now looking into ways to set up a third-party IRB process to review product changes. He stated that he wanted to go above and beyond current industry standards to demonstrate what is possible for other nonprofits and services.
There are numerous examples of tech companies taking advantage of the oversight void. Facebook revealed in 2014 that it had conducted a psychological experiment on 689,000 people, demonstrating that it could spread negative or positive emotions like a contagion by changing the content of people’s news feeds. Facebook, now known as Meta, apologised and overhauled its internal review process, but it also claimed that people should have been aware of the possibility of such experiments by reading Facebook’s terms of service — a position that perplexed people outside the company because few people actually understand the agreements they make with platforms like Facebook.
Despite the outrage over the Facebook study, there was no change in federal law or policy to require universal oversight of human subject experiments.
Koko is not Facebook, with its massive profits and user base. Morris, a former Airbnb data scientist with a doctorate from the Massachusetts Institute of Technology, founded Koko as a nonprofit platform and a passion project. It’s a peer-to-peer support service, not a would-be disruptor of professional therapy, and it’s available only through other platforms such as Discord and Tumblr, not as a standalone app.
Koko had 10,000 volunteers in the past month and serves about 1,000 people a day, according to Morris.
“The overarching goal of my work is to figure out how to help people in emotional distress online,” he explained. “There are millions of people looking for help online.”
Even as symptoms of anxiety and depression have increased during the coronavirus pandemic, there is a nationwide shortage of professionals trained to provide mental health support.
“We’re getting people to write short messages of hope to each other in a safe environment,” Morris explained.
Critics, on the other hand, have focused on whether participants gave informed consent to the experiment.
Camille Nebeker, a professor at the University of California, San Diego who specialises in human research ethics applied to emerging technologies, said Koko put people seeking help at unnecessary risk. According to her, informed consent by a research participant includes, at the very least, a description of the potential risks and benefits written in clear, simple language.
“Informed consent is critical for traditional research,” she said. “It’s a cornerstone of ethical practices, but without the requirement to do so, the public could be jeopardised.”
She also mentioned that the potential for bias in AI has alarmed people. And, while chatbots have proliferated in areas such as customer service, they are still a relatively new technology. ChatGPT, a bot built on GPT-3 technology, was banned from New York City schools’ devices and networks earlier this month.
“We’re in the Wild West,” said Nebeker. “It’s simply too dangerous not to have some standards and agreement on road rules.”
The FDA regulates some mobile medical apps that it considers to be “medical devices,” such as those that assist people in overcoming opioid addiction. However, not all apps meet that definition, and the agency issued guidance in September to help businesses distinguish between the two. An FDA representative told NBC News that some apps that provide digital therapy may be considered medical devices, but that the FDA does not comment on specific companies per FDA policy.
Other organisations are grappling with how to apply AI in health-related fields in the absence of official oversight. Google, which has struggled with its handling of AI ethics questions, co-hosted a “health bioethics summit” in October with The Hastings Center, a nonprofit research centre and think tank dedicated to bioethics. In June, the World Health Organization included informed consent among its six “guiding principles” for AI design and use.
Koko has a mental-health advisory board that weighs in on the company’s practices, but Morris says there is no formal process for it to approve proposed experiments.
According to Stephen Schueller, a member of the advisory board and a psychology professor at the University of California, Irvine, conducting a review every time Koko’s product team wanted to roll out a new feature or test an idea would be impractical. He declined to say whether Koko made a mistake, but he said the episode demonstrated the need for a public discussion about private-sector research.
“We really need to think about how we use new technologies responsibly as they come online,” he said.
Morris said he never imagined an AI chatbot would be able to solve the mental health crisis, and he didn’t like how it turned being a Koko peer supporter into an “assembly line” of approving prewritten answers.
He did, however, say that prewritten answers that are copied and pasted have long been a feature of online help services, and that organisations must keep experimenting with new ways to care for more people. Requiring a university-level review of every experiment, he said, would halt that search.
“AI is not the only or perfect solution. There is a lack of empathy and authenticity,” he said. “We can’t just have a position where any use of AI requires the ultimate IRB scrutiny,” he added.