News Analysis
An analysis I conducted of more than 5 million criminal records found that nearly one in three Hispanics are being assigned the label “white” in official Department of Corrections databases.
For the justice system, the failure to identify Hispanics correctly is not a research fluke but a feature of the bureaucracy. This is not the first time critical analysis has revealed a systematic misclassification of some Hispanics as whites.
A study by the National Bureau of Economic Research linked criminal justice records to Census Bureau data in which individuals self-reported their race. The research, published in 2024, found that 17 percent of court defendants and 10 percent of prison inmates had agency-recorded labels that did not match their self-reported race. The most common error was that Hispanics were recorded as white. In court records, only 16 percent of self-identified Hispanics were correctly labeled as Hispanic. In prison records, that figure rose to 67 percent, still meaning one-third of Hispanics were misclassified.
The direction of error was consistent. Of the 13 percent of court defendants who self-identified as Hispanic, 9 percentage points were recorded as white, 1 percentage point as black, and only 2 percentage points correctly as Hispanic.
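The percentage-point breakdown above can be converted into shares of self-identified Hispanic defendants with simple division. The sketch below uses only the rounded figures quoted in this article; the variable and function names are illustrative and do not come from the NBER study.

```python
# Figures quoted above: percentage points of ALL court defendants (rounded).
HISPANIC_SHARE_PTS = 13
recorded_as_pts = {"white": 9, "black": 1, "Hispanic": 2}


def share_within_group(points, group_points):
    """Convert percentage points of all defendants into a percent of the group."""
    return 100.0 * points / group_points


shares = {
    label: share_within_group(points, HISPANIC_SHARE_PTS)
    for label, points in recorded_as_pts.items()
}

for label, pct in shares.items():
    print(f"recorded as {label}: about {pct:.0f} percent of Hispanic defendants")
```

Two of 13 percentage points works out to about 15 percent, in line with the 16 percent figure once rounding in the published numbers is taken into account. The three listed components sum to 12 rather than 13, which presumably reflects the same rounding or other recorded categories.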
A 2021 ProPublica investigation examined traffic stop data from Louisiana. Of the more than 73,000 traffic tickets issued by the Jefferson Parish Sheriff’s Office between 2015 and 2020, deputies identified only six people as Hispanic. The parish is 18 percent Hispanic. Louisiana State Police did worse: Of almost 80,000 tickets issued in the same parish over nearly six years, not a single one was issued to a person labeled Hispanic.
The problem extended beyond Louisiana. Between 2010 and 2015, Texas Department of Public Safety troopers misidentified more than 1.9 million drivers with traditionally Hispanic names as white. After a 2015 investigation, the number of Hispanics misidentified as white dropped by more than 75 percent because of increased focus on data accuracy. The improvement demonstrated that the misclassification was correctable when agencies actually tried.
A 2023 review from the University of California–Irvine Department of Criminology, Law and Society examined data infrastructure across 101 criminal justice agencies in 14 jurisdictions. Only 30 percent captured Hispanic ethnicity separately from race. The remaining 70 percent either lumped ethnicity into a single race variable or didn’t record it at all.
Across all four counties examined in detail, Hispanics made up a smaller share of arrests and jail bookings than their share of the population. In Harris County, Texas, Hispanics were 43 percent of the population but only 27 percent of those arrested. In Charleston County, South Carolina, the court data contained so few Hispanics that researchers couldn’t analyze it.
A Census Bureau study compared self-reported race from the 2010 census with those same individuals’ recorded race in administrative records for 351 million people. It found that 19.6 percent of administrative records had no race data whatsoever. Hispanics were 43 times more likely than non-Hispanic whites to have non-matching race responses across databases and nine times more likely to have missing race data.
A Deep Dive Into the Data
I acquired 1.5 million mugshots from 39 U.S. state corrections departments. Using these data and more, I built a statistical model combining facial recognition software with first- and last-name demographics from U.S. Census Bureau data to predict each individual’s race independently of official records.
The model achieved nearly 93 percent accuracy in distinguishing black, white, and Hispanic individuals. When its predictions diverged from official classifications, the errors ran overwhelmingly in one direction: Hispanics were recorded as white, not the other way around. Nearly five times as many Hispanics were recorded as “white” as whites were recorded as “Hispanic.”
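The asymmetry described above can be measured by counting disagreements in each direction and dividing one by the other. The following is a minimal sketch with made-up counts, not the data or code behind this analysis; the function name `misclassification_ratio` is hypothetical.

```python
from collections import Counter


def misclassification_ratio(pairs, group_a, group_b):
    """Ratio of (predicted A, recorded B) cases to (predicted B, recorded A) cases."""
    counts = Counter(pairs)
    a_recorded_as_b = counts[(group_a, group_b)]
    b_recorded_as_a = counts[(group_b, group_a)]
    if b_recorded_as_a == 0:
        raise ValueError("no cases in the reverse direction; ratio is undefined")
    return a_recorded_as_b / b_recorded_as_a


# Hypothetical (predicted, officially recorded) pairs, for illustration only.
pairs = (
    [("Hispanic", "Hispanic")] * 30
    + [("Hispanic", "white")] * 10
    + [("white", "Hispanic")] * 2
    + [("white", "white")] * 50
    + [("black", "black")] * 40
)

ratio = misclassification_ratio(pairs, "Hispanic", "white")
print(f"Hispanic-as-white vs. white-as-Hispanic: {ratio:.1f} to 1")
```

A ratio near 1 would suggest random noise between the two labels. A ratio well above 1 points to errors that run mainly in one direction.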
A visual inspection confirms the pattern. The following individuals are all classified as “white” in official records.

Correcting for this mislabeling increased Hispanic criminal record rates by 31 percent and deflated white rates accordingly. Black rates changed by less than 1 percent. The bias in the data is asymmetric: It inflates white numbers and deflates Hispanic numbers, while leaving black classification largely untouched.
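Why the same correction moves Hispanic numbers far more than white numbers comes down to the size of each group's starting count. The sketch below uses hypothetical counts chosen only to illustrate the arithmetic. It assumes a fixed population, so changes in counts and changes in rates are proportional.

```python
def percent_change(before, after):
    """Percent change from the recorded value to the corrected value."""
    return 100.0 * (after - before) / before


# Hypothetical recorded counts; not figures from the analysis.
recorded = {"white": 1000, "Hispanic": 200, "black": 800}
moved_white_to_hispanic = 62  # hypothetical number of relabeled records

corrected = dict(recorded)
corrected["white"] -= moved_white_to_hispanic
corrected["Hispanic"] += moved_white_to_hispanic

changes = {group: percent_change(recorded[group], corrected[group]) for group in recorded}
for group, change in changes.items():
    print(f"{group}: {change:+.1f} percent")
```

Moving the same 62 records raises the smaller Hispanic count by 31 percent while lowering the larger white count by about 6 percent. The black count is unchanged in this toy example because no records move into or out of that category.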

How the Analysis Worked
I built a statistical model to predict race from mugshots and names. Its predictions agreed with official records 93 percent of the time, which indicates that the model learned to identify race. The remaining 7 percent, where the two disagree, mostly reflects mislabeling by authorities, not model mistakes. This is because the statistical model was rigid: It fit the dominant pattern in the data and averaged through the noise rather than memorizing individual labeling errors.
The model drew on 18 separate variables to predict race. Six came from DeepFace, a facial recognition system that analyzes mugshots and outputs probability scores across racial categories. Twelve came from name demographics: five variables from first names and seven from last names, drawn from U.S. Census Bureau data and academic databases tracking the racial distribution of American names.
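The 18-variable input can be pictured as three blocks of numbers joined into a single vector. The sketch below shows only that assembly step. The class name, field names, and example values are hypothetical; it does not call DeepFace or reproduce the actual variables used in the analysis.

```python
from dataclasses import dataclass
from typing import List

# Block sizes as described above: 6 face scores, 5 first-name, 7 last-name variables.
FACE_DIM, FIRST_NAME_DIM, LAST_NAME_DIM = 6, 5, 7


@dataclass
class PersonFeatures:
    face_scores: List[float]       # per-category scores derived from the mugshot
    first_name_stats: List[float]  # first-name demographic variables
    last_name_stats: List[float]   # last-name demographic variables

    def to_vector(self) -> List[float]:
        """Concatenate the three sources into one 18-value model input."""
        blocks = (
            (self.face_scores, FACE_DIM),
            (self.first_name_stats, FIRST_NAME_DIM),
            (self.last_name_stats, LAST_NAME_DIM),
        )
        vector: List[float] = []
        for values, expected in blocks:
            if len(values) != expected:
                raise ValueError(f"expected {expected} values, got {len(values)}")
            vector.extend(values)
        return vector


# Hypothetical values for one individual, for illustration only.
example = PersonFeatures(
    face_scores=[0.02, 0.01, 0.03, 0.20, 0.04, 0.70],
    first_name_stats=[0.10, 0.05, 0.80, 0.03, 0.02],
    last_name_stats=[0.05, 0.01, 0.90, 0.01, 0.01, 0.01, 0.01],
)
feature_vector = example.to_vector()
print(len(feature_vector))
```

Checking each block's length before concatenating guards against silently misaligned columns, which would otherwise feed a name variable into a slot the model treats as a face score.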

This multi-source approach matters. A Hispanic surname alone could be ambiguous. Hispanic facial features alone could be subjective. But when mugshot analysis, surname demographics, and first name demographics all point to the same conclusion, the combined signal is difficult to dismiss.
Statistically, these 18 features can be compressed into a two-dimensional map where similar individuals cluster together. When I plotted the data this way, individuals officially classified as “white” appeared distributed throughout regions associated with Hispanics (circled below). The reverse was not true. Hispanics did not appear scattered throughout white regions. This visual pattern matched what the model detected numerically.

The model confirmed this pattern numerically as well. Even when predictions exceeded 95 percent confidence that someone was Hispanic, 22.4 percent were still recorded as white in official databases. The median confidence for these misclassified cases was 91.7 percent. These were not borderline cases. These were individuals whose facial features, surnames, and first names all strongly pointed toward Hispanic ethnicity.
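The high-confidence check described above amounts to filtering on the model's predicted probability and then tabulating the official label. A minimal sketch with hypothetical records follows; the function name and the data are illustrative, not drawn from the analysis.

```python
def share_recorded_as(records, label, min_confidence):
    """Among records whose predicted P(Hispanic) exceeds the threshold,
    return the fraction officially recorded under the given label."""
    confident = [recorded for prob, recorded in records if prob > min_confidence]
    if not confident:
        return None  # no records clear the threshold
    return sum(1 for recorded in confident if recorded == label) / len(confident)


# Hypothetical (model P(Hispanic), officially recorded label) pairs.
records = [
    (0.99, "Hispanic"), (0.97, "white"), (0.96, "Hispanic"), (0.98, "Hispanic"),
    (0.955, "white"), (0.80, "white"), (0.40, "white"), (0.991, "Hispanic"),
]

share = share_recorded_as(records, "white", min_confidence=0.95)
print(f"recorded as white among high-confidence cases: {share:.1%}")
```

Restricting the comparison to high-confidence predictions is a way to separate likely labeling errors from cases where the model itself is unsure.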

Root Causes
Is partisan bias to blame?
I found no correlation between a state’s political leaning and the share of Hispanics recorded as “white.” Republican and Democratic states showed similar error rates.
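One common way to test a relationship like this is a Pearson correlation between a state-level partisan measure and a state-level error rate. The sketch below assumes that approach and uses invented numbers; the article does not specify which measure or statistic was used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical state-level data: partisan margin (points) and error rate.
partisan_margin = [-20, -10, 0, 10, 20]
error_rate = [0.30, 0.28, 0.31, 0.29, 0.30]

r = pearson_r(partisan_margin, error_rate)
print(f"r = {r:.2f}")
```

A coefficient close to zero, as in this toy data, would be consistent with error rates that do not track political leaning.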
If partisan bias isn’t driving these errors, what is?
The answer lies in how race data enter the system in the first place.
For the NBER study, researchers interviewed seven people who had worked in records management across multiple jurisdictions. They found that the police incident report is the critical point of data collection. Whatever the arresting officer writes down in the field shortly after arrest becomes the official record. That information then passes to jails, prosecutors, and courts, receiving “significant deference as the official record of a criminal incident and arrest.”
Once an error enters the police report, it propagates through the entire system. Records pass in one direction only. There is no feedback mechanism to correct mistakes.
A University of California–Irvine review confirmed this pattern. Across 101 criminal justice agencies, the most common method for determining race was police officer perception rather than self-report. In some jurisdictions, ethnicity fields were optional, making them “highly unreliable.” In others, when officers selected “Other” or “Unknown” for race, they routinely left the ethnicity field blank.
How unreliable can police reports be?
ProPublica analyzed surnames from traffic tickets issued in Louisiana. Of 167 tickets issued to drivers named Lopez, zero were labeled Hispanic. Same for the 252 tickets to people named Rodriguez, 234 named Martinez, 223 named Hernandez, and 189 named Garcia. Five of the top 10 most common surnames among people cited as “white” were Rodriguez, Martinez, Hernandez, Garcia, and Lopez.
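A surname check of this kind is a simple cross-tabulation: group tickets by surname and count the recorded labels. The sketch below uses a handful of invented tickets and is not ProPublica's code or data. Surname is also only a rough proxy for ethnicity, so a tally like this flags patterns rather than proving any individual record wrong.

```python
from collections import Counter

# The five surnames named above.
SURNAMES_OF_INTEREST = ["Rodriguez", "Martinez", "Hernandez", "Garcia", "Lopez"]


def labels_by_surname(tickets, surnames):
    """Count recorded race/ethnicity labels for each surname of interest."""
    table = {name: Counter() for name in surnames}
    for surname, label in tickets:
        if surname in table:
            table[surname][label] += 1
    return table


# Hypothetical (surname, recorded label) tickets, for illustration only.
tickets = [
    ("Lopez", "white"), ("Lopez", "white"), ("Garcia", "white"),
    ("Garcia", "Hispanic"), ("Smith", "white"), ("Martinez", "black"),
]

table = labels_by_surname(tickets, SURNAMES_OF_INTEREST)
for name in SURNAMES_OF_INTEREST:
    total = sum(table[name].values())
    print(f"{name}: {total} tickets, {table[name]['Hispanic']} labeled Hispanic")
```

A surname with many tickets and zero Hispanic labels is the pattern the Louisiana data showed.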
Frank Baumgartner, a political science professor at the University of North Carolina–Chapel Hill who studies racial profiling, told ProPublica there is “no real rhyme or reason or logic” to how officers classify race.
“The white/black distinction is generally well recorded, but the Hispanic one is not. Many Hispanics are wrongly classified as white,” he said.
These police reports, however inaccurate, are then propagated throughout the entire justice system. Courts. Prisons. National records. And even when errors are identified, there is no incentive to fix them.
As the NBER report noted from the interviews, “fixing flawed race and ethnicity information typically requires filing formal paperwork and dealing with a records management bureaucracy, work that generally would fall on top of existing job responsibilities.”
Worse still, the justice system’s own incentives encourage these errors to propagate, because courts actively avoid using race data.
The interviews further revealed: “Officers within the court system do not view race and ethnicity information as operationally relevant to their jobs since using that information during court proceedings could be illegal and a violation of civil rights law. Since it is considered improper to use this information, these fields are not monitored for data quality issues.”
This explains the difference between court and prison data. In courts, only 16 percent of Hispanics are correctly identified. In prisons, that figure increases to 67 percent. For prisons, the reason for the improvement is twofold: First, prisons verify race for gang-related security decisions, so accuracy matters. Second, prisons have greater autonomy; they can classify race independently of court records, police reports, and other intermediary sources.
Courts have no such ability. Furthermore, “prosecutors and investigating officers may be hesitant to contradict information contained in the original police report as those filings could be used by defense counsel to undermine the prosecution’s case, perhaps by suggesting that law enforcement officers may have misidentified the suspect.”
But all of this assumes “Hispanic” is recorded at all. In many states, it isn’t.
Nobody Knows How Many Hispanics Are in the System
“No one knows exactly how many Latinos are arrested each year or how many are in prison, on probation, or on parole,” the Urban Institute reported.
Of the 40 states that reported race in arrest records, only 15 reported ethnicity. Only one state, Alaska, consistently included data on Hispanics across all five categories examined: prison population, prison by offense, arrests, probation, and parole. Across the United States, the policies and laws for recording Hispanic ethnicity are a mosaic: inconsistent and wildly varying.
The consequences are predictable.
The Urban Institute said, “States that only count people as ‘black’ or ‘white’ likely label most of their Latino prison population ‘white,’ artificially inflating the number of ‘white’ people in prison and masking the white/black disparity in the criminal justice system.”
Thirty-eight states reported data on Hispanics in prison, but only one reported prison data broken down by both ethnicity and offense type.
Large Hispanic populations do not guarantee better data. California, Florida, and New Mexico all reported ethnicity data in fewer categories than the average state. Florida, where Hispanics are 24 percent of the population, reported in only one category. New Mexico, 48 percent Hispanic, reported in only two. Seventy-five percent of Hispanics in the United States live in just 10 states, yet many have significant gaps in reported ethnicity data.

State and local agencies are not required to follow federal standards. The Office of Management and Budget’s Statistical Policy Directive No. 15 requires federal agencies to record Hispanic as a separate ethnicity, but state criminal justice systems can ignore this entirely. A National Conference of State Legislatures report found that of 13 states that reallocated prisoner residences during the 2020 redistricting cycle, only one recorded race and ethnicity consistent with federal guidelines.
I excluded nine states in my original analysis because they did not record Hispanic as a distinct category. In these states, all Hispanics are classified as white by default.
A Recent Category
So how did this become such a significant issue?
The problem has historical roots.
Hispanic wasn’t always a category in criminal justice data. For most of American history, it didn’t exist, and the infrastructure was not built to track it. Even once the category existed, the United States took decades to incorporate it into the criminal justice system.
- Pre-1980: On a federal level, all Hispanics were classified as white. Local statistics ignored Hispanics as a separate group entirely. A 1979 analysis noted that “both locally and nationally, almost nothing is known about the arrests of Hispanics.”
- 1980: The Social Security Administration added Hispanic as an option for the first time. Before 1980, the only options were white, black, and other.
- 1997: The Office of Management and Budget revised Statistical Policy Directive No. 15, requiring federal agencies to record Hispanic as a separate ethnicity distinct from race.
- 2013: The FBI’s Uniform Crime Report, the primary national source for arrest statistics, finally added a Hispanic category, 16 years after the federal standard was established.
Effectively, the criminal justice system has been treating Hispanics as white by default since the beginning. When states finally began adding Hispanic as an option, they did so inconsistently, with no national standard and no enforcement mechanism. The result is a system where Hispanic crime rates have been systematically understated for generations.
The Stakes
At every level, the system fails to record Hispanics accurately. Arresting officers default to white. Courts don’t monitor race data for quality. States ignore federal standards. Nine states don’t record Hispanic at all. The FBI didn’t add a Hispanic category until 2013. And even when Hispanic is an option, nearly one in three are still misclassified.
Given all this, are Hispanic crime data from the FBI even real? How can federal systems accurately aggregate race statistics when the underlying data are broken?
Race and crime are subjects of constant political debate. Policymakers, journalists, and researchers cite arrest rates and incarceration statistics to argue about policing, sentencing, and discrimination. These debates assume the numbers are accurate. They are not. For Hispanics, the official statistics have been wrong for decades, and correcting them changes the picture substantially. Until agencies adopt and enforce consistent standards, “white by default” will remain the unofficial policy.
Views expressed in this article are the opinions of the author and do not necessarily reflect the views of The Epoch Times.