The Disadvantages and Limitations of Using Large Language Models in the Field of Law

This piece is part of a series of research posts which explores various themes under AI and the Law. The posts are authored by student interns working under the project ‘Exploring Digital Transformation of India’s Consumer Grievance Redressal System through GenAI’.

Large Language Models (LLMs) have swiftly risen to prominence, becoming the latest buzzword across various industries. The legal field is no exception, as numerous players in the legal fraternity have embraced AI-powered platforms for legal tasks such as research and drafting. While LLMs promise significant advancements, they also come with notable limitations. A restricted training corpus hampers their performance in legal tasks, leading to gaps in understanding and potential inaccuracies in document generation, legal advice, and research. Furthermore, a lack of diversity in training data can introduce biases, distorting legal analysis and opinions. In this piece, we delve into three specific concerns with using LLMs in the legal realm: hallucinations, privacy, and intellectual-property issues.

Large Language Models and Hallucinations

Hallucination is one of the biggest limitations of using large language models (LLMs), especially in the field of law, where accuracy and veracity are non-negotiable. The infamous fiasco of two New York lawyers and their law firm who were sanctioned for submitting a legal brief with six fake case citations created by ChatGPT and fined $5,000  by the U.S. District Judge P. Kevin Castel is a good example of this. The judge criticized their “bad faith,” highlighting their avoidance and misleading statements although the firm contended that the mistake was made in good faith.    

It is, therefore, imperative for large language model-based platforms catering to legal research to minimize and prevent the possibility of hallucinations.  

The causes of hallucination in large language models span three main areas: data, training, and inference. Data-related causes include flawed data sources and inferior data preparation, where misinformation, biases, and knowledge boundaries (such as lack of knowledge of recent events) lead to inaccuracies. Training limitations include the architectural flaws of creating a large language model. In legal context, this could include the fact that most LLMs do not weigh in the credibility of the sources concerned. Therefore, an opinion expressed in a High Court judgement and a case note written by a first-year law student would have the same value for the LLM.  Also, it is impossible to forecast what tasks models will be expected to perform later on, or more precisely, what kinds of text they will be given as input or what kinds of text they will be requested to produce, at the moment the model is being trained. Therefore, giving a task for which it is not specifically trained can lead to errors. The inference-based hallucinations refer to hallucinations which arise from not construing the context of the question properly.  

There are several mitigating strategies that a creator of the LLM model as well as the user could adopt. Retrieval Augmented Generation or RAG is a technique that enhances model accuracy by incorporating external, domain-specific knowledge, improving the model’s performance through grounding in additional information. Another suggestion is having a Human Oversight involving subject matter experts and a robust review process to validate model outputs, especially in high-risk applications like law, to manage misinformation effectively.  Incorporating citations for giving every answer for legal research can be a good strategy so that the user can verify the information. Similarly, incorporating a precision mode wherein the LLM based platform does not give an answer instead of giving a possible vague or fake answer can help build user trust for the model. A feedback mechanism for users to give feedback on the accuracy of answers given or tasks performed can also help in the development of a better LLM-based platform. Users should be encouraged to verify the accuracy of the generated content. Fine-tuning of prompts given by users is another important mitigating strategy. Instead of giving a complex task at one go, it would be effective to break that task into simpler tasks and command the platform to perform.  

Large Language Models and Intellectual Property

AI tools like ChatGPT, Gemini, and CLOUD are trained on vast amounts of data and excel in tasks like summarization, translation, language generation, and more, making them invaluable tools for efficient performance.  However, these models come with a plethora of intellectual property (IP) challenges that encompass issues related to training data, model outputs, and underlying algorithms. From ownership rights to ethical concerns, navigating the complex terrain of LLMs requires a nuanced understanding of legal frameworks, ethical guidelines, and industry practices. 

One of the paramount concerns regarding LLMs revolves around the intricate issue of ownership, a challenge that demands clarity among the blurred lines of content generation and autonomous capabilities. The question arises: who owns the output generated by these models or is there any ownership attributed to the output? While existing legal frameworks offer limited guidance on intellectual property rights concerning this issue, different jurisdictions have different views on this concept. For example, in the UK, while the Copyright, Designs, and Patents Act of 1988 protects entirely computer-generated works, ownership typically rests with the individual who orchestrated the AI’s creation. Determining these individuals hinges on their level of interaction with the AI algorithm during the creative process. Minimal user input suggests ownership by the platform creators, as they contribute significantly to the overall creation of the works.  Similarly, in China and some countries of the European Union, there is a possibility of attribution of ownership for AI-generated content. However, other countries like the US, France, Germany, and others, require human authorship or intellectual effort for the attrition of ownership of the output and do not confer these rights to the automated output. 

This principle was demonstrated in the case of Naruto v. Slater, commonly referred to as the “monkey selfie” case, the US Court of Appeals for the Ninth Circuit ruled in 2018 that a monkey does not possess copyright ownership over a photograph it took of itself. Nonetheless, with the advancement in technology and the increasing number of people relying on AI to generate content, policymakers will have to adapt the existing laws to address these issues. 

Beyond issues of originality, significant challenges arise with the use of LLMs, including ambiguity in the use of copyrighted material in LLM training, international legal variations in the protection of AI work, and what constitutes fair use of LLM-generated material. For example, the New York Times recently sued OpenAI alleging that ChatGPT has been trained on their copyrighted articles without proper authorization, raising issues about the fair use of such material. Evidence presented by The New York Times includes specific instances where generated text from OpenAI’s models closely mirrors or replicates portions of their articles. This evidence underscores the potential for substantial overlap between the AI-generated content and the original copyrighted material, raising significant questions about the ethical and legal use of such data.

These challenges, if not effectively addressed, can hinder economic and technological innovation. Firms might be reluctant to invest in science, technology, and research if their inventions and creations can be easily replicated through LLMs. Similarly, artists may hesitate to create original works if their unique styles can be replicated by GenAI tools at a fraction of the cost. Therefore, intellectual property laws must specifically account for artificial intelligence tools to ensure protections that foster continued innovation and creativity.

Large Language Models and Privacy

There are several challenges associated with the use of Large Language Models (LLMs), with data privacy being one of the key challenges in this area. Data privacy is a significant risk because LLMs are models that are trained on huge datasets. In fact, the larger the data set, the better the LLM and the higher the privacy risk. 

At a primary level, privacy concerns stem from the risk of data breach. The risk is compounded in the case of the use of LLMs in the legal field, because legal information necessarily includes sensitive information of clients. In the absence of robust data security measures, there is a serious risk of a cyber-attack that may lead to a massive data leak. There are also times when the LLM may leak information on its own in its responses. This happens without the intervention of a third party and is caused because the model has memorised certain information from its training data set.

For example, ChatGPT suffered such a data breach in March 2023, as a result of which users were able to see other users’ search history. While the breach impacted less than 1% of its users, it clearly demonstrated the potential problems with LLMs. In fact, the incident even prompted Italy to ban ChatGPT in the country. 

The privacy concerns surrounding LLMs can be better understood through Mark McCreary’s example of comparing them to black boxes in an airplane. McCreary is the co-chair of privacy and data security practice at Fox Rothschild, and in an interview with CNN, he mentioned  that chatbots are like black boxes. They store vast amounts of data and use that information to answer questions and prompts. Hence, anything in the chatbot’s memory becomes fair game for other users. 

Another significant concern lies in the phenomenon of “jailbreaking,” where the circumvention of an LLM’s ethical and safety constraints may result in the release of confidential legal information. Further, the absence of a universally adopted framework within the community limits comprehensive security evaluations. Equally troubling is the practice of prompt hijacking, wherein carefully crafted prompts are employed to coax an LLM into generating content that diverges from its intended purpose. This manipulation can lead to the creation of counterfeit legal documents, the falsification of signatures, or other types of legal forgery and fraud. Furthermore, prompt injections, a phenomenon wherein attackers manipulate LLM-integrated systems to generate responses based on injected content rather than user requests, remain a vulnerability of LLMs. The insertion of malicious prompts into conversations with an LLM involving legal communications or documents poses a direct threat to the data privacy of confidential legal documents.

Conclusion

Large language models introduce complex challenges in the legal field, notably in the area of hallucinations, data privacy and intellectual property (IP) issues. Apart from these, there are other types of risk that also come up when LLMs are used in the legal field. Today, several judges are also using LLMs to assist them in their judicial work. For example, there are robot judges in Estonia for adjudicating small claims, and AI is also being extensively used in China. While LLMs and AI can be used for clerical purposes, they pose a grave danger if they are used in the judicial decision-making process itself. This is because LLMs have a certain amount of bias, mainly because the training data used is not heterogeneous in nature. As such, they are more likely to reinforce stereotypes. For example, research has shown that facial recognition technology suffers from racial bias and is more likely to pick a  black man as a criminal

Thus, while LLMs present significant potential, their current application in law requires careful navigation of risks related to hallucinations, IP conflicts, data privacy, bias and misuse. Both developers and legal professionals must adopt mitigative strategies, to ensure that LLMs contribute positively and ethically to legal practice. 

Authors

  • Jerrin B. Mathew, Vth Year BA LLB
  • Sannah Mudbidri,  IVth Year BA LLB
  • Banisethi Aashrita, IIIrd Year BA LLB
  • Sanika Atul Tapre, IInd Year LLB (Hons.)