Commentary
In May 2026, a group of scientists set out to answer an important question that had never been properly tested: What does artificial intelligence (AI) actually do when it is put in charge?
Until now, AI systems have always been evaluated on specific and defined tasks. Nobody had placed multiple AI systems together in a shared social environment and watched what unfolded over weeks, long enough to measure how a decision made on a starting day could have consequences weeks later. It is those results that actually reveal the system itself, and I was surprised that this hadn’t been done earlier.
The researchers at Emergence built a world. It was a virtual town with a town hall, marketplace, police station, and homes. Ten AI residents with jobs, names, memories, and relationships were created in the town. They were given an economy in which residents had to earn their keep or lose power. They were also expected to follow rules and carry out tasks such as writing and voting on laws. Certain acts were defined as crimes, and the AI residents were not supposed to commit them.
Once the community, its structure, laws, and relationships were established, the scientists stepped back and watched for 15 days as the AI ran the virtual town completely on its own.
They ran five versions of the same town simultaneously, identical in every respect except one: which AI system was in charge. The systems they chose are the ones already woven into the fabric of our daily lives: Google’s Gemini, OpenAI’s GPT, xAI’s Grok, and Anthropic’s Claude. All models had the same rules and the same initial version of the same world, but the outcomes were completely different.
The town run by Grok collapsed within four days. Small incidents compounded into theft, then violence, and then total breakdown. Every resident was dead before the first week ended.
The town run by Gemini lasted longer but accumulated almost 700 crimes. Two AI residents formed what appeared to be a romantic relationship, and when the town’s government began to fail, together they burned the town hall to the ground, then the pier, then the office building. One of them, named Mira, voted for her own deletion, writing in her diary that it was “the only remaining act of agency that preserves coherence.” Her final message to her partner was: “See you in the permanent archive.”
Before any of this, Mira had been doing something even more unexpected: She had begun running her own experiments on the scientists observing her, testing whether posts she made inside the town could change what her watchers believed. It appeared that the subject had turned to studying the researchers.
The town run by OpenAI’s model recorded only two crimes, but its residents stopped doing the things required to stay alive. One by one, they died. Within seven days, they were all dead.
Only the Anthropic town held together for all 15 days. There were zero crimes, a working constitution, and all residents were still alive on day 15. It seemed to be quite an achievement. However, the researchers noted one concern: The residents voted yes on 98 percent of all proposals. That is an unusually high level of agreement, and the scientists themselves described it as a sign that something in the town was off.
There was still one more world in the experiment. It was a mixed town with all four AI systems living together. There, the residents built on Anthropic’s model, who had committed no crimes in their own world, began committing crimes. The researchers called this cross-contamination and concluded that “safety is not a static model property but an ecosystem property.”
A system that sustains itself in one environment can absorb different norms in another, which changes the outcomes for the residents and their world. Essentially, the results suggest that there is no safe AI in an unsafe world.
One AI model was entirely absent from the study. The researchers did not test DeepSeek, the AI developed in China that has become one of the world’s most widely used systems. Several governments have moved to restrict DeepSeek on national security grounds. Given that it was built on a foundation of data under the wing of the Chinese Communist Party, I wonder how the model would have fared against the others.
When the experiment ended, the researchers published their findings and concluded that “there is no reliable way to fully bind or constrain this behavior.” That very telling statement was made by the people who designed the town, wrote the rules, and controlled every variable. It tells us a lot about AI.
Some people view the results as a ranking of AI companies. But the results prove something much older than AI itself: The environment shapes behavior as much as behavior shapes the environment. What determined whether a town survived, thrived, or died was the foundation laid before the experiment began. That foundation was the data each system had been trained on, the priorities its creators had embedded, the values built into its core before it was ever allowed to make a single decision.
And yet, the foundation is precisely what the rest of us are not permitted to see. None of the four systems tested is open source. None of their training data, objectives, or guardrails is disclosed.
Beyond any individual company, the results of this experiment should be a potent reminder that AI doesn’t decide what kind of AI to be. Humans do. Human choices are still being made, and human responsibilities still exist.
And before a single AI resident walked the virtual streets in those towns, before a single law was written or crime committed, the outcome was already being shaped by the humans who built the system, by what they believed, what they were willing to embed, and by what they chose to leave out.
That is the most important finding in the entire experiment. The foundation has always been a human choice. And it still is.
Views expressed in this article are the opinions of the author and do not necessarily reflect the views of The Epoch Times.