Bias and Transparency in AI Tools Used by Courts: The Constitutional Crisis of Secret Algorithms in Criminal Sentencing

Introduction

Walk into any courtroom today and it mostly looks the same as it did thirty years ago. Robes, gavels, files stacked on counsel’s table. But something has changed that you can’t see from the public gallery. In several countries, the judge deciding whether someone goes home tonight or sits in remand is receiving input from a piece of software. A risk score. A number generated by an algorithm that, in many cases, neither the judge nor the defence lawyer fully understands.

The case for these tools was never unreasonable. Courts are overwhelmed. Judges are human: they get tired, they carry unconscious assumptions, and at least one widely cited study of parole hearings found that something as mundane as the time of day can measurably affect outcomes. In that study, a prisoner heard just before a meal break stood, statistically, a worse chance than one heard at the start of a session. If software could iron out that kind of arbitrary variation, why wouldn’t you use it?

What’s happened instead is more troubling. These tools haven’t scrubbed bias out of the system; in documented cases they’ve encoded it more deeply, and then wrapped it in proprietary code that no one outside the company can examine. That’s not an efficiency problem. It’s a constitutional one.

How These Systems Actually Work

The best-known example in the United States is COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a name that tells you quite a lot about how these things get marketed. When someone enters the criminal justice system, information about them gets entered into the software: criminal record, employment history, housing situation, family background. The algorithm processes this and spits out a score, typically a number indicating how likely the person is to reoffend.

Not all uses of technology in courts are controversial. Software that schedules hearings, manages case files, or searches legal databases is unobjectionable – courts need that kind of help and technology delivers it well. The problem starts when the software shifts from organising information to making predictions about future human behaviour, and those predictions start influencing whether someone gets bail or a harsher sentence.

Predicting what a specific person will do in the future is genuinely hard – arguably impossible with any real precision. What these systems actually do is find statistical patterns in historical data. And that’s where things start to go wrong, because the historical data of criminal justice in many countries reflects decades of unequal enforcement.

The Bias Problem

The ProPublica investigation in 2016 is where this debate went public in a serious way. Journalists looked at thousands of COMPAS scores and compared them with what actually happened to the defendants afterward. What they found was stark: Black defendants who didn’t go on to reoffend were being labelled high-risk at nearly twice the rate of white defendants in the same position. White defendants who did reoffend had often been scored as low-risk.

Northpointe, the company behind COMPAS, disputed the methodology and, to be fair, the dispute wasn’t entirely frivolous. Statistical fairness is genuinely complicated; different mathematical definitions of it can produce different and sometimes contradictory results (iv). You can, in a technical sense, satisfy one definition of fairness while violating another. That’s a real problem for anyone trying to design these systems.
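To see how one definition of fairness can be satisfied while another is violated, it helps to run the arithmetic on a toy example. The sketch below uses invented counts for two hypothetical groups, A and B, with different underlying reoffending rates; none of these numbers come from COMPAS or from the ProPublica data, and the class name `Outcomes` and its methods are illustrative assumptions. In this example a high-risk label is equally reliable in both groups, yet people who don’t reoffend are wrongly flagged far more often in the group with the higher base rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcomes:
    """Hypothetical outcome counts for one group of defendants."""
    true_pos: int   # flagged high-risk, did reoffend
    false_pos: int  # flagged high-risk, did not reoffend
    false_neg: int  # flagged low-risk, did reoffend
    true_neg: int   # flagged low-risk, did not reoffend

    def predictive_value(self) -> float:
        """Share of high-risk flags that turned out to be correct."""
        return self.true_pos / (self.true_pos + self.false_pos)

    def false_positive_rate(self) -> float:
        """Share of non-reoffenders who were wrongly flagged high-risk."""
        return self.false_pos / (self.false_pos + self.true_neg)


# Invented counts: 100 people per group, 60 reoffenders in A, 30 in B.
group_a = Outcomes(true_pos=40, false_pos=10, false_neg=20, true_neg=30)
group_b = Outcomes(true_pos=20, false_pos=5, false_neg=10, true_neg=65)

for name, group in (("A", group_a), ("B", group_b)):
    print(
        f"group {name}: "
        f"predictive value {group.predictive_value():.2f}, "
        f"false positive rate {group.false_positive_rate():.2f}"
    )
```

With these invented counts, a flag is right 80 percent of the time in each group, while the false positive rate is 0.25 in group A and about 0.07 in group B. Which of those two measures counts as fairness is a value judgment, not a calculation.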

But the technical complexity doesn’t dissolve the underlying concern. An algorithm trained on arrest records from a city where policing was heavily concentrated in Black neighbourhoods is going to learn, without anyone telling it to, that living in those neighbourhoods is associated with criminal risk. Race doesn’t need to appear anywhere in the variables. The system finds it anyway, through proxies: postcode, employment gaps, family history with the justice system. The bias goes in through the data and comes out through the score.
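The proxy effect is easy to demonstrate on a toy dataset. The sketch below is a hypothetical illustration with invented postcodes (`NORTH`, `SOUTH`), invented groups (`X`, `Y`), and invented counts; the function names `baseline_accuracy` and `proxy_accuracy` are likewise assumptions. It asks how often someone’s group can be recovered from their postcode alone, which is roughly what a model is able to do when the group column has been removed but a correlated column is left in.

```python
from collections import Counter, defaultdict

# Hypothetical records: (postcode, group). A "blind" model would drop the
# group column but keep the postcode column.
records = (
    [("NORTH", "X")] * 45 + [("NORTH", "Y")] * 5
    + [("SOUTH", "X")] * 10 + [("SOUTH", "Y")] * 40
)


def baseline_accuracy(rows):
    """Accuracy of always guessing the most common group overall."""
    group_counts = Counter(group for _, group in rows)
    return max(group_counts.values()) / len(rows)


def proxy_accuracy(rows):
    """Accuracy of guessing the most common group within each postcode."""
    by_postcode = defaultdict(Counter)
    for postcode, group in rows:
        by_postcode[postcode][group] += 1
    recovered = sum(max(counts.values()) for counts in by_postcode.values())
    return recovered / len(rows)


print(f"guessing without postcode: {baseline_accuracy(records):.2f}")
print(f"guessing from postcode:    {proxy_accuracy(records):.2f}")
```

Here the postcode recovers the group for 85 of 100 records, against 55 of 100 for a blind guess of the larger group. Deleting the sensitive column removed the label, not the information.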

This is what makes it insidious. The number looks clean. It looks like something a machine produced neutrally. But it carries the assumptions of every policing decision that fed into the training data, dressed up in the language of statistical science.

The Black Box Problem

There’s a principle that sits at the core of criminal justice, so basic it’s easy to overlook: if the state is going to use something against you, you get to see it and challenge it. That’s why cross-examination exists. That’s why disclosure rules exist. The whole adversarial structure of criminal proceedings assumes that truth is best found when both sides can scrutinise each other’s evidence.

Proprietary risk-assessment tools gut that principle. Northpointe won’t release COMPAS’s source code: it’s a trade secret, commercially valuable, protected. So the defendant, their lawyer, and often the judge receive a score with no meaningful way to interrogate how it was produced. Which variables mattered most? Were any of them proxies for race or poverty? Was the training data appropriate for this jurisdiction? These questions can’t be answered because the methodology is hidden.

State v. Loomis, decided by the Wisconsin Supreme Court in 2016, put this tension on paper. Eric Loomis was sentenced partly on the basis of a COMPAS score and argued that he couldn’t get a fair hearing when part of the reasoning was a black box he couldn’t open. The court let the score stand, but it did so uncomfortably, issuing warnings that judges shouldn’t treat these outputs as determinative and acknowledging the transparency concerns as genuine. A court simultaneously permitting a practice and expressing serious reservations about it is, to put it charitably, an unstable position.

What’s Actually at Stake Constitutionally

Due process isn’t a procedural nicety. It’s the guarantee that the state can’t take your freedom through arbitrary or unchallengeable means. When an opaque algorithm shapes a sentencing outcome, a defendant is effectively being judged by reasoning they’re not allowed to inspect. That’s a problem regardless of whether the algorithm is accurate, because accuracy isn’t the only thing that matters in a justice system. Legitimacy matters too.

There’s also an authority problem that doesn’t get enough attention. Numbers carry weight. A score of 8 out of 10 on a recidivism risk scale looks precise and scientific in a way that a social worker’s narrative assessment doesn’t, even if the underlying evidence base is weaker. Judges are trained to think critically, but the psychological pull of quantitative outputs is real. Research on how people process numerical versus narrative information suggests that numbers are harder to argue with, which means a flawed algorithm can end up being more persuasive than it deserves to be.

And when a judge defers heavily to a system they don’t understand, accountability gets murky fast. Who’s responsible for the sentence: the judge, the algorithm, or the company that built it? Judicial accountability has always rested on judges being able to explain their reasoning. That chain breaks down when part of the reasoning is proprietary software.

What Should Actually Change

The answer isn’t to throw technology out of courts; that ship has sailed, and frankly some of it is genuinely useful. The answer is to set conditions that any tool must meet before it gets anywhere near a liberty decision.

Explainability is the starting point. A system that produces a risk score must be able to say, in terms a defence lawyer can work with, which factors drove that score and how much weight each one carried. This isn’t a revolutionary ask; it’s what we require from any expert witness. A forensic psychologist can’t just hand the court a number and refuse to explain their reasoning. There’s no principled basis for holding software to a lower standard.
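For the simplest kind of model, a weighted sum, this sort of explanation is cheap to produce. The sketch below is an illustration under stated assumptions, not a description of how COMPAS or any real tool works: the factor names, the weights, and the function `explain_score` are all hypothetical. It returns the total score together with each factor’s contribution, ranked by size, which is the breakdown a defence lawyer would need in order to challenge a particular input.

```python
def explain_score(weights, features):
    """Return (total, ranked contributions) for a simple weighted-sum score.

    Each contribution is weight * feature value; the list is ordered from
    the largest absolute contribution to the smallest.
    """
    contributions = {name: weights[name] * features[name] for name in weights}
    total = sum(contributions.values())
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return total, ranked


# Hypothetical weights and inputs, chosen only to make the arithmetic visible.
weights = {"prior_convictions": 0.5, "months_unemployed": 0.25, "age_under_25": 1.0}
defendant = {"prior_convictions": 4, "months_unemployed": 6, "age_under_25": 1}

total, ranked = explain_score(weights, defendant)
print(f"score: {total}")
for name, contribution in ranked:
    print(f"  {name}: {contribution:+.2f} ({contribution / total:.0%} of score)")
```

More complex models need more elaborate explanation methods, but the output a court should expect has the same shape: a list of factors and how much each one moved the score.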

Independent auditing should be a condition of any public contract. If Northpointe or any similar company wants to sell its product to courts, which are institutions exercising state power over citizens, it should have to open the methodology to independent researchers before deployment and periodically after it. Commercial confidentiality is a real interest, but it can’t trump constitutional requirements. The way to resolve that tension is through audit arrangements that protect genuine trade secrets while still allowing the kind of substantive scrutiny that due process demands.

Beyond that, courts need to be explicit about what these tools can and can’t do. An algorithmic score should be one piece of information among many, not a recommendation the judge is expected to follow unless they have a good reason not to. The individual before the court is not a data point. They have circumstances, history, and context that no training dataset fully captures.

The Limits of What Algorithms Can Measure

It’s worth being specific about what a risk-assessment algorithm genuinely cannot do. It can’t sit across from someone and register whether their remorse sounds real or rehearsed. It can’t know about the job offer that came in last week, the relationship that just stabilised, or the treatment programme that’s actually working. It can’t factor in that this particular person’s prior record reflects a period of acute mental illness that’s now being managed. It works with structured data fields, and life doesn’t fit neatly into structured data fields.

Sentencing has always required something more than calculation: a reckoning with the specific person, not the statistical type they happen to resemble. That’s not sentimentality; it’s what individualised justice actually means. Technology can inform that process. It shouldn’t be allowed to replace it.

This Isn’t Just an American Problem

COMPAS is the case study everyone uses because the ProPublica investigation gave it visibility, but the underlying dynamic, private companies selling predictive tools to public justice institutions, isn’t uniquely American. Courts and immigration authorities across Europe are piloting similar systems. Some jurisdictions in Asia have gone further, deploying AI in judicial administration at scale. The pressure to automate is global, driven by the same combination of caseload pressure, budget constraints, and technocratic optimism.

The countries best placed to catch problems are the ones with well-funded legal aid, active civil society organisations, and strong freedom of information frameworks, because those are the conditions that allow problematic deployments to be scrutinised and challenged. Where those conditions don’t exist, algorithmic tools can become embedded in justice systems before anyone has really examined what they’re doing.

Which is an argument for setting standards now, while governance frameworks are still being built. Once these tools are deeply embedded (relied on, budgeted for, politically defended), reform becomes much harder. The window for getting this right is open, but it won’t stay open indefinitely.

Conclusion

Courts adopted these tools because the pitch made sense: reduce inconsistency, cut through human fallibility, process more cases faster. Whether the tools have delivered is increasingly not a live question; most serious analysis now concedes that the bias problems are real and the transparency problems are serious. The debate that remains is whether those problems are fixable and, if so, on what terms.

That depends, I think, on whether institutions treat constitutional values as hard constraints or as obstacles to be managed. Due process, the right to challenge evidence, equal protection: these aren’t bureaucratic inconveniences. They’re the architecture of a system that can call itself legitimate. A risk score produced by a private algorithm operating behind trade secret protection doesn’t fit within that architecture. Not as currently deployed.

Courts can use technology. They should use technology. But the technology has to earn its place through genuine transparency, independent verification, and a clear understanding that the judge, not the software, remains responsible for the outcome. An algorithm that can’t be examined isn’t a tool. It’s a delegation of public power to a black box, and that’s not something a court should be willing to accept.

THIS ARTICLE IS WRITTEN BY SHRUTIKSHA SHAH FROM COLLEGE OF COMMERCE, ARTS & SCIENCE, PATNA

REFERENCES:
(i) Fourteenth Amendment, United States Constitution. The Due Process Clause protects individuals against arbitrary state action and guarantees procedural fairness in judicial proceedings.
(ii) Danielle Kehl, Priscilla Guo, and Samuel Kessler, ‘Algorithms in the Criminal Justice System,’ Berkman Klein Center for Internet & Society at Harvard University (2017).
(iii) Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner, ‘Machine Bias,’ ProPublica, 23 May 2016. Analysed COMPAS risk scores and highlighted racial disparities in algorithmic predictions.
(iv) Christopher Slobogin, ‘Principles of Risk Assessment: Distinguishing Security from Liberty,’ Social Science Research Network (2021).
(v) Sandra Wachter, Brent Mittelstadt, and Chris Russell, ‘Counterfactual Explanations Without Opening the Black Box,’ Harvard Journal of Law & Technology, Vol. 31 (2018).
(vi) State v. Loomis, 881 N.W.2d 749 (Wisconsin Supreme Court, 2016). Examined whether the use of the COMPAS risk-assessment tool during sentencing violated due process rights.
