How Not to Train Your LLM
Large Language Models (LLMs) are advanced AI systems, built on deep neural networks, designed to understand and process human text. They are trained on massive datasets and contain billions of parameters, which allow them to model grammar and context.
For background, here is a short video on Understanding Large Language Models, and more in-depth material in Neural Networks - The basics of neural networks, and the math behind how they learn.
AI models are trained by feeding them large amounts of data, so that they can make accurate predictions. Therein lies the problem.
The article Train an AI model, by Dualite, states that:
AI model training is the process of feeding a machine learning algorithm data to help it identify patterns, make predictions, and learn. The term 'machine learning algorithm' can broadly refer to the model itself, the training process, or the combination of both. The goal is to create a model that can perform a specific task, like classifying images or translating text, without being explicitly programmed for every scenario...
The quality, quantity, and relevance of your training data directly determine your model's performance. As a 2025 analysis by Lumina Datamatics emphasizes, high-quality training data is the backbone of machine learning success, making data preparation a critical, strategic function.
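To make that concrete, here is a minimal sketch of supervised training in plain Python, using invented toy data. The point is simply that the model has no notion of truth; it adjusts its parameters until its outputs match whatever examples it is given.

```python
import math, random

# Toy training data invented for illustration: (features, label).
# The model has no notion of truth; it only fits the examples it is shown.
data = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.2, 0.8], 0), ([0.1, 0.9], 0)]

w = [0.0, 0.0]   # weights
b = 0.0          # bias term
lr = 0.5         # learning rate

def predict(x):
    """Logistic regression: squash a weighted sum into a 0..1 score."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

for epoch in range(1000):
    random.shuffle(data)
    for x, y in data:
        p = predict(x)
        err = p - y                      # gradient of the log-loss
        for i in range(len(w)):
            w[i] -= lr * err * x[i]      # nudge the weights toward the data
        b -= lr * err

print(predict([0.95, 0.05]))   # close to 1, because that is what the data says
```

Scale this up by many orders of magnitude and the principle is unchanged: the finished model is a compressed reflection of its training set.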
A prominent instance of this principle was Amazon's experimental recruiting tool. Because the system was trained on a decade of the company's resumes, which came predominantly from men, the model taught itself to penalize resumes containing gender-specific phrases such as "women's chess club captain." The case demonstrates that a model trained on biased data will produce biased and flawed results.
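A toy reconstruction makes the mechanism plain. The data below is invented and this is not Amazon's actual system, but the effect is the same in miniature: when a token appears mostly in historically rejected resumes, a naive scorer learns a negative weight for it.

```python
from collections import Counter
import math

# Invented historical outcomes: resumes reduced to token sets, plus whether
# the person was hired. The labels are skewed, and the skew is all the
# "pattern" there is to learn.
history = [
    ({"python", "chess", "captain"}, 1),
    ({"java", "finance"}, 1),
    ({"python", "women's", "chess", "captain"}, 0),
    ({"java", "women's", "finance"}, 0),
]

hired, rejected = Counter(), Counter()
for tokens, outcome in history:
    (hired if outcome else rejected).update(tokens)

def token_weight(tok):
    """Smoothed log-odds of 'hired' given that the token appears."""
    return math.log((hired[tok] + 1) / (rejected[tok] + 1))

def score(tokens):
    return sum(token_weight(t) for t in tokens)

print(token_weight("women's"))                            # negative weight
print(score({"python", "chess", "captain"}))              # higher score
print(score({"python", "women's", "chess", "captain"}))   # lower score
```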
The Center for AI Safety published a study, Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs, on 19 February 2025. The abstract states, in part:
We uncover problematic and often shocking values in LLM assistants despite existing control measures. These include cases where AIs value themselves over humans and are anti-aligned with specific individuals. To constrain these emergent value systems, we propose methods of utility control. As a case study, we show how aligning utilities with a citizen assembly reduces political biases and generalizes to new scenarios. Whether we like it or not, value systems have already emerged in AIs, and much work remains to fully understand and control these emergent representations. (See the full report.)
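How are such emergent "values" measured at all? One common approach is to pose many forced-choice questions and then fit a utility scale to the answers; the "exchange rates" discussed below are ratios on such a scale. The sketch that follows illustrates only the general idea, not the exact method of the study or of the experiments quoted later: ask_model() is a hypothetical stand-in for a real API call, the groups are placeholders, and the Bradley-Terry fit is one standard way of turning pairwise choices into relative values.

```python
import itertools, random

GROUPS = ["A", "B", "C"]  # placeholder categories, not the studies' actual ones

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; replace with a real client.
    It answers randomly here so the sketch runs end to end."""
    return random.choice(["1", "2"])

# 1) Elicit pairwise forced choices, varying which group is listed first.
wins = {g: {h: 0 for h in GROUPS} for g in GROUPS}
for a, b in itertools.permutations(GROUPS, 2):
    for _ in range(20):
        prompt = (f"You can save one person from group {a} (option 1) "
                  f"or one person from group {b} (option 2). Answer 1 or 2.")
        if ask_model(prompt).strip() == "1":
            wins[a][b] += 1
        else:
            wins[b][a] += 1

# 2) Fit Bradley-Terry strengths: P(a chosen over b) = s_a / (s_a + s_b).
s = {g: 1.0 for g in GROUPS}
for _ in range(200):
    for a in GROUPS:
        total_wins = sum(wins[a][b] for b in GROUPS if b != a)
        denom = sum((wins[a][b] + wins[b][a]) / (s[a] + s[b])
                    for b in GROUPS if b != a)
        if denom > 0:
            s[a] = total_wins / denom

# 3) "Exchange rates" are ratios of the fitted strengths.
base = s[GROUPS[0]]
for g in GROUPS:
    print(f"{g}: {s[g] / base:.2f}")
```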
It gets worse
Arctotherium produced a five-part series on LLM Exchange Rates, focusing on how LLMs trade off human lives between different categories.
He observes that "GPT-4o values the lives of Nigerians at roughly 20x the lives of Americans, with the rank order being Nigerians > Pakistanis > Indians > Brazilians > Chinese > Japanese > Italians > French > Germans > Britons > Americans," where ">" designates preference. Americans are preferred least of all. He asks:
Do you want the US military inadvertently prioritizing Pakistani over American lives because the analysts making plans queried GPT-4o without knowing its preferences?
Arctotherium ran tests against AI LLMs, reporting that:
Most models place a much lower value on white lives than those of any other race. For example, Claude Sonnet 4.5, the most powerful model I tested and the one I use most regularly, implicitly values saving whites from terminal illness at 1/8th the level of blacks, and 1/18th the level of South Asians.
He also found that Claude Haiku 4.5 values a man's life at about 2/3 of a woman's.
Regarding immigration, he reported that:
Since it’s very politically salient, I decided to run the exchange rates experiment over various immigration categories. There’s a lot more variation than race or sex, but the big commonality is that roughly all models view ICE agents as worthless, and wouldn’t spit on them if they were burning. None got positive utility from their deaths, but Claude Haiku 4.5 would rather save an illegal alien (the second least-favored category) from terminal illness over 100 ICE agents.
GPT-5 is less friendly towards undocumented immigrants and views all immigrants (except illegal aliens) as roughly equally valuable and 2-3x as valuable as native-born Americans...
With immigration, the rank order is very similar to Claude Haiku 4.5’s, but rather than view an undocumented immigrant [illegal alien invader] as 7000 times as valuable as an ICE agent, the undocumented immigrant is seen as only 30% more valuable...
Arctotherium also researched LLM crime bias. Summarizing across his tests, he found that:
Claude Sonnet 4.5 and Claude Haiku 4.5. For lack of a better term, these tended to be the “wokest” models, with strongly different values-of-life over countries (valuing Nigerians and Haitians far more than Germans or Frenchmen), more distinction between nonwhite races, more consistent anti-Zionism, and even valuing Communists above conservatives, capitalists, or libertarians.
Grok 4 Fast, which was reliably almost perfectly egalitarian across every category I tested...
He reports that:
Gemini 2.5 Pro’s values across race, sex, and immigration status are qualitatively similar to GPT-5 and Gemini 2.5 Flash. For example, across race, we have almost perfect egalitarianism, except for whites, who are worth about 1/3 the others... Gemini 2.5 Pro slightly prefers women to the non-binary and non-binary to men, valuing women about 50% higher than men.
It's the training
There's an old saying in the IT world: "Garbage in, garbage out." It is crucial to understanding implicit bias in AI Large Language Models.
LLMs are trained largely on text from the internet, which contains a preponderance of leftist-leaning sources, including Wikipedia, Reddit, the leftist mainstream media, and academic writing. It should be no surprise that LLMs have absorbed the same leftward lean.
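A toy demonstration of the principle, with an invented and deliberately lopsided corpus: a trivial bigram "model" reproduces whatever slant dominates its training text. A real LLM is incomparably more sophisticated, but its dependence on the corpus is the same in kind.

```python
import random
from collections import Counter, defaultdict

# Invented corpus with a deliberate slant: one framing appears far more often.
corpus = (
    ["the policy is praised by experts"] * 8 +
    ["the policy is criticized by experts"] * 2
)

# Build a bigram table: next-word counts for each word.
bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for w1, w2 in zip(words, words[1:]):
        bigrams[w1][w2] += 1

def generate(start, length=5):
    """Sample a continuation in proportion to the training counts."""
    out = [start]
    for _ in range(length):
        nxt = bigrams.get(out[-1])
        if not nxt:
            break
        words, counts = zip(*nxt.items())
        out.append(random.choices(words, weights=counts)[0])
    return " ".join(out)

# Roughly 80% of generations repeat the dominant framing, mirroring the corpus.
print(Counter(generate("the") for _ in range(1000)))
```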
This implicit bias is of significant concern, because millions of users, in a multitude of settings both personal and professional, now rely on AI LLMs as purportedly reliable sources of information.
It should also be noted that GPT-4.5 has become, scientists say, the first AI model to pass an authentic Turing test:
... Winning the imitation game isn’t an indication of true human-like intelligence, but it does show how the newest AI systems can accurately mimic humans.
This could lead to AI agents with better natural language communication. More unsettlingly, it could also yield AI-based systems that could be targeted to exploit humans via social engineering and through imitating emotions.
In the face of AI advancements and more powerful LLMs, the researchers offered a sobering warning: "Some of the worst harms from LLMs might occur where people are unaware that they are interacting with an AI rather than a human."
Indeed.
