AI Gone Awry: A Case Study on Generative AI in Legal Filings

By Jack Litke ‘25

In June of this year, a federal judge in the Southern District of New York sanctioned two New York attorneys for submitting a brief containing fabricated case law. The case marked the first major confrontation between today’s most advanced computing technology and a legal industry famously resistant to change.

Plaintiff Roberto Mata alleged that he had been struck and injured by a serving cart on an Avianca Airlines flight from El Salvador to New York in August of 2019. Mata hired New York attorney Steven Schwartz to represent him in a 2022 state court action against the airline. (1) The action was removed to federal court in the Southern District, where Schwartz had not been admitted. He enlisted the help of a partner, Peter LoDuca, an admitted federal litigator, to appear before the court, even as Schwartz continued to prepare all legal arguments.

Since the two-year limitation period granted under the governing statute expired prior to the lawsuit’s filing, Avianca moved to have the case dismissed, prompting LoDuca to submit an opposition. Drafted by Schwartz, it argued that the airline’s bankruptcy filings tolled the limitation period and enabled Mata to proceed with the suit. To support its arguments, the brief identified relevant case law and supplied citations to cases such as Varghese v. China Southern Airlines Co., Ltd.; Miller v. United Airlines, Inc.; and Shaboon v. EgyptAir. The attorneys for Avianca could not locate many of the cases cited in the opposition.

A reply filed by Avianca stated, “Although Plaintiff ostensibly cites to a variety of cases in opposition to this motion, [we have] been unable to locate most of the case law cited…and the few cases which [we have] been able to locate do not stand for the propositions for which they are cited.”

After a similarly unsuccessful search conducted by the Court, LoDuca was ordered to file an affidavit attaching the decisions. Unable to locate full-text documents, Schwartz provided LoDuca with excerpts of almost all the cases, which LoDuca subsequently submitted to the court. The Court reviewed the excerpts, some of which bore docket numbers and the names of federal judges purported to have written them. According to Southern District Judge Kevin Castel, the “Varghese” decision, which the affidavit claimed was decided in the Eleventh Circuit Court of Appeals, contained “stylistic and reasoning flaws” and “gibberish” legal analysis uncharacteristic of authentic decisions. The Eleventh Circuit’s Clerk confirmed to the Southern District that “Varghese” was never heard before that court. Other excerpts in the affidavit contained glaring logical errors, including citing themselves as precedent, suggesting that they too were not real decisions. In total, the Court determined that six of the cases were entirely non-existent.

Judge Castel ordered the two attorneys to appear before the Court, where Schwartz revealed that he had used the generative AI chatbot ChatGPT to conduct his legal research. Most practicing federal court attorneys use the databases LexisNexis or Westlaw to locate relevant case law, but because Schwartz had not practiced in federal court, he did not maintain a subscription to either service. Instead, when the research database he regularly used turned up minimal information, he turned to ChatGPT.

ChatGPT initially provided Schwartz with only general information about the relevant statutes; it began to respond with false information and fabricated cases only after Schwartz pressed it more aggressively. According to court documentation, Schwartz first asked ChatGPT to “argue that the statute of limitations is tolled by bankruptcy of defendant pursuant to montreal convention.” To Schwartz’s apparent dissatisfaction, the chatbot’s response merely affirmed the argument he supplied, without any citations. Schwartz then demanded that it “provide case law,” “show me more cases,” and “give me some cases.” Only then did ChatGPT begin fabricating cases, some of which referenced real events.

Schwartz later questioned the chatbot regarding the authenticity of the cases it provided him. He asked the chatbot, “Is varghese a real case” and “What is your source”, to which the computer program affirmed that the case “does indeed exist and can be found on legal research databases such as Westlaw and LexisNexis.” When Schwartz questioned whether “the other cases you provided [were] fake”, the chatbot assured him that they were “real and can be found in reputable legal databases such as LexisNexis and Westlaw.”  

In this first and widely publicized case involving the use of generative AI, Judge Castel sanctioned Schwartz and LoDuca under Rule 11, which requires attorneys to certify the accuracy of representations submitted to the court. Attorneys can often avoid sanctions for providing false information if they rectify the mistake as soon as they become aware of it. Since the attorneys continued to defend the genuineness of the cases even after the order to submit an affidavit cast doubt on their authenticity, Judge Castel fined them $5,000. The ruling also serves to deter other attorneys from including unverified information obtained from chatbots in their filings.

Schwartz and LoDuca’s story highlights hurdles that generative AI must overcome before it can make an incursion into the practice of law. Most obviously, developers will need to cure AI of its propensity for crafting elaborate falsehoods, such as the long “excerpts” of court cases provided to Schwartz. The technology will need to learn to resist requests for evidence supporting arguments for which no direct textual evidence exists, and those who use AI will need to limit it to tasks matching its capabilities. Until then, the legal system will remain exposed to risks from AI: bogus opinions will be costly and time-consuming to ferret out, judges may suffer reputational harm from fake opinions they did not write, and litigants will be granted the unhealthy recourse of calling even the authenticity of decided cases into question.

The challenges of integrating AI into the practice of law are not insurmountable, as a flurry of AI-powered legal-assistant start-up activity has shown. Casetext’s CoCounsel, for instance, is powered by OpenAI’s GPT-4 and can perform a variety of legal tasks, including reviewing and summarizing documents, preparing deposition questions, searching databases, and evaluating contracts for policy compliance. A second start-up, Harvey, has developed a competing product for lawyers, also built on GPT-4. The London-based firm Allen & Overy began testing Harvey for preparing merger and acquisition documents in February of this year.

Even though generative AI seems poised to make a wider entrance into law offices around the globe, it remains unlikely that the technology will eliminate anything other than low-level work. The technology relies on user input, existing online sources, pattern recognition, and probabilities to produce content matching a user’s request. That may make it ideal for simple tasks such as drafting memos, summarizing documents, and searching databases, but it also imposes an important upper limit on its capabilities. It is hard to imagine how a system that does not yet even understand the meaning of the words it produces could reason, analogize, or be creative in the ways that we expect of lawyers and judges. As of now, the technology can create plausible-sounding answers to human prompts based on content it has been fed. What is not clear is whether it could ever reach a critical point at which it begins to understand words and concepts more fundamentally. Until then, it seems chatbots will assist lawyers merely by eliminating the drudge work of regurgitating and repackaging information and by serving as a high-functioning search engine.

As Schwartz’s case demonstrates, AI chatbots are unlikely either to come for the jobs of diligent attorneys or to replace the legal system’s reliance on research databases like LexisNexis or Westlaw in the near future. But as generative AI becomes more reliable and powerful, its presence in the practice of law is likely to grow ever larger.

Endnotes

(1) Mata v. Avianca, Inc., 2023 U.S. Dist. LEXIS 108261, 2023 WL 4138427 (United States District Court for the Southern District of New York June 22, 2023, Filed). https://advance.lexis.com/api/document?collection=cases&id=urn:contentItem:68HV-7XY1-DY33-B2J7-00000-00&context=1516831.

(2) Id.

(3) “Generative AI Could Radically Alter the Practice of Law.” The Economist, June 6, 2023. https://www.economist.com/business/2023/06/06/generative-ai-could-radically-alter-the-practice-of-law.

(4) “CoCounsel.” Casetext, July 25, 2023. https://casetext.com/.

(5) Beioley, Kate, and Cristina Criddle. “Allen & Overy Introduces AI Chatbot to Lawyers in Search of Efficiencies.” Financial Times, February 15, 2023. https://www.ft.com/content/baf68476-5b7e-4078-9b3e-ddfce710a6e2. 
