[This post is authored by SpicyIP intern Anushka Dhankar. Anushka is a third-year student at the National Law School of India University. She is interested in the AI/copyright interface and hopes to pursue a career in IP litigation, with a dash of AI policy on the side. Long post ahead.]
As many of you may have read, last December, we saw the first instance of AI hallucination issues assailing administrative orders! The Bengaluru bench of the Income Tax Appellate Tribunal (“ITAT”) passed an interesting order in Buckeye Trust v. Principal Commissioner of Income Tax (ITA No.1051/Bang/2024) – it ascertained the taxability of a transaction on the basis of judgements which simply did not exist. The Tribunal relied on four judgements – of the Supreme Court and the Madras High Court – of which two bore completely fabricated case names and citations, one bore a fabricated case name accompanied by the citation of a different case, and another pointed to a real case, but one standing for an irrelevant proposition of law. While the order itself was recently recalled and removed from the public domain (see the ITAT order below), these citations were reported to be instances of AI hallucinations – occurrences where an AI model generates information that is incorrect, misleading, or entirely fabricated.
This incident calls for reckoning with the impacts of Generative AI usage in (or for) the discharge of judicial functions. In this post, I examine how AI hallucinations challenge conventional notions of plagiarism and citation, and ultimately underline the need for clearer ethical standards and wider AI literacy in the legal profession.
It’s a Bird? It’s a Plane? – GenAI in Legal Research, Plagiarism and the Problem of Classification
Intuitively, the ITAT’s erroneous reliance on hallucinated GenAI outputs seems to resemble plagiarism, i.e. the use of another’s work without attribution (see here). The Indian judiciary itself is no stranger to the problem of plagiarism, with previous instances including the plagiarising of a whopping 33 paragraphs from a law review article in a Delhi High Court judgement (analysed here), as well as the larger, context-blind adoption of formulations from U.S. judgements in constitutional cases (analysed by Gautam Bhatia here).
The Conceptual Limits of Plagiarism
However, a closer examination of the situation strains the conceptual boundaries of plagiarism. Plagiarism, as a concept, presupposes a pre-existing, complete work to which attribution is owed – and here, I use the word “work” loosely, as signifying the complete output resulting from any process of creation. In the previously cited examples, plagiarism was determined to have occurred by comparing the two outputs – for instance, the law review article and the judgement – and determining whether they were substantially similar.
In the case of GenAI usage, however, it can first be argued that there is no pre-existing work per se; the generation of an output by an AI model such as ChatGPT is merely a process (specifically, the process of generating the output most likely to follow from the instructions in the prompt), and the AI-generated content resulting from that process is merely a restatement of the training data the model draws upon. This view, however, fails to explain AI hallucinations such as those in the present case, where it is apparent that the model’s output was generated from no actual data at all (for the argument that an AI-generated output is not a “work” under the Copyright Act, see here). On this view, it becomes impossible to contend plagiarism; no one can refer to something which doesn’t exist!
Creatorship, Attribution and the AI Conundrum
A second possible view (which resolves the issues with the first) is that the AI-generated output itself is the pre-existing work. For instance, when I asked ChatGPT to define an AI hallucination before writing the first paragraph of this post (see the image below), I initiated a process resulting in the finished output which I reproduced in my writing. However, even in this example (where I could easily have referenced an academic source defining AI hallucinations), my use of the GenAI model problematises the next step of attribution.
It is important to note that attribution, as an activity, is inextricably premised upon the notion of creatorship of a finished output. Therefore, the relevant question becomes: Who created the output I generated? Is it OpenAI, the owner of the process I employed to generate it? Alternatively, can I be considered the creator of this definition, considering that my prompt generated the work, and the work would not exist without it? Creatorship, as well as ownership, of AI-generated content is currently a grey area in copyright law (see here, here and here), as the law depends upon the notion of human involvement. Thus, for the purposes of proving plagiarism, attribution is rendered impossible in the absence of a clear answer to the creatorship/ownership questions that attach to the process of using a GenAI model.
An alternative view of attribution as divorced from ownership – essentially, one where I simply admit to the usage of ChatGPT in writing this post – presents itself as an easy ethical solution to this conundrum. However, citing processes rather than sources distorts our current understanding of plagiarism – by that standard, every fiction author would have to credit the invention of the novel to avoid plagiarism allegations. AI hallucinations which are complete fabrications are thus difficult to classify in terms of our current legal and ethical vocabulary – they cannot possibly be “plagiarised”, and, simultaneously, cannot possibly be “cited”, in the current understanding of the respective terms.
Whose Fault is it Anyway? The Problem of Sanctions and The Way Forward
It is perhaps for this reason that, in similar cases in other jurisdictions, Courts have primarily characterised reliance on AI hallucinations in court documents as lapses in professional ethics and have penalised the perpetrators (see here, here and here). In all these cases, inquiries which sought to determine whether the lapse in judgement was deliberate proceeded on similar facts, but yielded mixed results. While some cases held the responsibility to verify the accuracy of pleadings to be paramount and heavily sanctioned the lawyers involved, others preferred to view the situation as an honest mistake and a product of AI illiteracy – with a Texas judge even requiring the lawyer responsible to attend a course on GenAI in the legal field.
None of these cases, however, faced the real brunt of the problem of classification discussed earlier. All of them involved AI-hallucinated case law cited by counsel in court filings, which made it easier to treat the factual inaccuracy of the pleadings (as opposed to the use of AI itself) as the issue to be sanctioned. The presence of AI in an administrative order, however, lends a new dimension to the question of sanctions. While the order in Buckeye Trust also contained fabricated case law, for the first time it formed part of the reasoning for a judicial decision, which raises the question – what ethical and legal issues are we confronted with when AI is used in the very act of judicial reasoning?
I believe AI usage in judicial reasoning itself raises a host of theoretical and practical issues.
On a practical level, when AI usage goes wrong – whether through blatant factual inaccuracies generated by AI hallucinations, or through the incorrect interpretation or application of case law – the problem of classification makes it difficult to identify where our ethical priorities should lie, and therefore what exactly potential sanctions must address. In each of the previously mentioned cases, courts saw AI use as a method of legal research, and thus attributed responsibility to the person using the method. However, attribution becomes complicated when AI supplies the reasoning for an order – firstly, because the extent of the role played by AI in the reasoning, as well as the reasoning actually employed, may become unclear, making them difficult to contest. Secondly, since the process of writing judicial pronouncements relies heavily on party submissions, apportioning blame to any one party may inevitably obscure the whole picture. For instance, in a scenario where one party has relied on hallucinated case law, there is of course the question of that counsel’s liability. But there may then be additional questions – could the opposing party’s failure to point this out in court amount to a negligence of duty? After all, if a counsel allows a fake judicial precedent to go unquestioned, surely it means they have not thoroughly responded to the claims? And of course, if the judicial authority goes on to ‘rely’ on such fabricated case law, it indicates that there has not been substantial application of mind. At the same time, being overly ‘generous’ with punitive action may not be the best way forward (more on this in the last paragraph).
Additionally, graver concerns about the requirements of natural justice lurk around the use of AI in the exercise of judicial functions. Can AI-generated reasoning fulfil the requirements of natural justice when it simply identifies and applies patterns without understanding the underlying rules and principles? (For further arguments on this point, see here.) For instance, in the present case, assume that the ITAT employed AI in its own research while rendering the order. Now, S.254(1) of the Income Tax Act mandates the provision of an opportunity to be heard before passing an order. However, the unrestricted use of AI by judicial actors might render such an opportunity meaningless – by justifying AI as yet another legal research method, the Tribunal could essentially pass orders without considering what the parties have pleaded.
Interestingly, leaving the issue of hallucination-based orders aside, the ITAT’s order recalling the order dated 30.12.2024 also suffers from illegality – under S.254(2) of the Act, the Tribunal does not have the power to recall its own order in its entirety, save to amend it by rectifying any mistake apparent from the record. In Commissioner of Income Tax v. Reliance Telecom, the Supreme Court clearly held that S.254(2) does not give the ITAT the power to rehear a case on merits – which is exactly what the recall order purports to do.
My View
I agree with the more lenient view taken by courts in the U.S. and Canada – instances of reliance upon AI hallucinations must be viewed holistically, as a product of widespread AI access but limited AI literacy. Considering the difficulties in categorising the ethical and legal issues with AI hallucinations, I believe the present incident calls for establishing a new, complementary standard of academic and ethical integrity, at the very least for public documents which determine rights and liabilities or have civil consequences. The potential benefits of GenAI in saving judicial time have previously been much-touted – but recognition of such benefits must come with training for judicial officers on what AI can do, what it cannot, and how GenAI usage squares with the requirements of natural justice. In the meantime, perhaps mandatorily “citing the AI process” – that is, disclosing any use of GenAI in official documents (whether by counsel or by judges) – is the best possible ethical safeguard we can observe. This not only clarifies where potentially innocuous errors or lapses may have crept in, but may also help us better understand where AI tools should and shouldn’t be allowed.