The Structural Barriers to AI Lawyers: Why Law Isn't Code
AI can't replace lawyers because law is fundamentally ambiguous, ethically guarded, and built on decades of human-edited data—here's what builders need to understand.
The hype around AI lawyers is deafening. Every week, a new startup promises to automate discovery, draft contracts, or predict case outcomes. But the reality is far messier. A recent article on DiffuseAI outlines the structural barriers to AI lawyers, and the Hacker News discussion reveals why this field is not another SaaS play—it's a minefield of ambiguity, data moats, ethics, and entrenched human judgment.
Key Structural Barriers: Ambiguity, Data, and Ethics
The DiffuseAI article highlights three major hurdles. First, only three entities in the U.S. have near-complete coverage of legal data, and they sell access to editorial infrastructure—headnotes, practice guides, treatises—not raw opinions. That's a formidable data moat. Second, the legal profession's ethical rules assume a human at the center. Third, the billable hour model resists productivity gains; if a lawyer charges by the hour, why would they embrace tools that slash hours?
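To make the incentive concrete, here is a toy calculation. Every number in it is a hypothetical placeholder, not data from the article: a $400 hourly rate and a task that drops from 10 billed hours to 4 with AI help. Under pure hourly billing, revenue falls in exact proportion to the hours saved.

```python
# Toy illustration of the billable-hour incentive; all numbers are hypothetical.
def hourly_revenue(hours: float, rate: float) -> float:
    """Revenue under pure hourly billing: hours worked times hourly rate."""
    return hours * rate

RATE = 400.0          # hypothetical hourly rate in USD
HOURS_MANUAL = 10.0   # hypothetical hours for the task done by hand
HOURS_WITH_AI = 4.0   # hypothetical hours with AI drafting plus review

before = hourly_revenue(HOURS_MANUAL, RATE)
after = hourly_revenue(HOURS_WITH_AI, RATE)
print(f"Revenue before: ${before:,.0f}")            # Revenue before: $4,000
print(f"Revenue after:  ${after:,.0f}")             # Revenue after:  $1,600
print(f"Revenue lost:   {1 - after / before:.0%}")  # Revenue lost:   60%
```

The lawyer's hours fell 60 percent, and so did the bill. That is the whole argument in one multiplication.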
The HN thread adds nuance. One top commenter quotes Nilay Patel:
"But law isn’t actually code, and society and courts aren’t computers. [...] the law is not deterministic. You simply cannot take the facts of a case, the law as written, and predict the outcome of that case with any real certainty."
Another commenter pushes back on the data-moat thesis, noting that CourtListener provides free access to millions of opinions. But the most heated discussion centers on ethics. One commenter writes:
"What horrendous morals behind this article. Why would anyone advocate for prioritizing economics and technology before ethics, especially in something as important as law?"
The tension between efficiency and justice is palpable.
Why Legal Ambiguity Blocks AI
The core problem is that law is not code. Even with perfect data and flawless models, legal reasoning involves values, precedent interpretation, and societal nuance. Consider a contract dispute over "reasonable efforts." A language model cannot resolve that—it requires human judgment about context, industry standards, and equity.
As Nilay Patel put it, the law is not deterministic. Facts, written laws, and outcomes don't map one-to-one. Every case has a unique set of facts, and courts interpret statutes with flexibility. That's a feature, not a bug, but for AI it's a structural barrier. ABA Model Rule 1.1 requires lawyers to provide competent representation, which the rule defines as the legal knowledge, skill, thoroughness, and preparation reasonably necessary for the representation. No model can fully replicate that judgment.
The Data Moat: Editorial Judgment vs. Open Access
The data-moat argument is strong but not insurmountable. CourtListener offers open access to millions of opinions. Many state courts publish freely. The real moat is editorial judgment—decades of human-curated headnotes, treatises, and practice guides that major publishers own. That's a service, not just a dataset.
For builders, the path is clear: partner with legal publishers or build on open data through APIs such as CourtListener's. Add value on top of the publishers' editorial work rather than trying to replicate it from scratch.
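As a starting point for building on open data, the sketch below queries CourtListener's search API using only the Python standard library. Several details are assumptions drawn from our reading of CourtListener's public API documentation and should be checked against the current docs before use: the v4 search endpoint path, the type value "o" for case-law opinions, the "Authorization: Token ..." header, and the "results" key in the response. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; confirm against CourtListener's current REST API docs.
SEARCH_URL = "https://www.courtlistener.com/api/rest/v4/search/"

def build_search_url(query: str, result_type: str = "o") -> str:
    """Build a search URL; type "o" is assumed to mean case-law opinions."""
    params = urllib.parse.urlencode({"q": query, "type": result_type})
    return f"{SEARCH_URL}?{params}"

def search_opinions(query: str, token: str) -> list:
    """Fetch one page of results (needs network access and an API token)."""
    request = urllib.request.Request(
        build_search_url(query),
        headers={"Authorization": f"Token {token}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # "results" is the assumed key holding the list of hits.
    return payload.get("results", [])

print(build_search_url('"reasonable efforts"'))
# https://www.courtlistener.com/api/rest/v4/search/?q=%22reasonable+efforts%22&type=o
```

With a valid token, iterating over search_opinions('"reasonable efforts"', token) and printing hit.get("caseName") should list matching case names, assuming the result objects carry a caseName field. The editorial layer (headnotes, treatises) is exactly what this raw search does not give you.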
Ethical Constraints and Billable Hours
The ethical rules are clear. Lawyers must retain independent professional judgment, and they cannot outsource core legal reasoning to an AI without review. That review is not free; it costs time and money. Billable time will shift from drafting to verification, but verification isn't necessarily faster when AI output has to be audited thoroughly.
Some argue lawyers will use AI to handle more cases, but that raises ethical concerns about access to justice and quality of representation. The profession cannot simply accept AI-generated work product without oversight; doing so courts malpractice.
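One way to encode "no AI work product without review" in software is a gate that refuses to release a draft until a named lawyer has approved it. The Draft class and its methods are hypothetical, a sketch of the pattern rather than any product's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Draft:
    """Hypothetical container for AI-generated work product awaiting review."""
    text: str
    reviewed_by: Optional[str] = None  # approving lawyer's name, if any

    def approve(self, lawyer: str) -> None:
        """Record that a named lawyer reviewed and approved the draft."""
        if not lawyer.strip():
            raise ValueError("approval requires a named reviewer")
        self.reviewed_by = lawyer

    def finalize(self) -> str:
        """Release the text only after a lawyer has approved it."""
        if self.reviewed_by is None:
            raise PermissionError("draft has not been reviewed by a lawyer")
        return self.text

draft = Draft("The seller shall deliver the goods within 30 days.")
try:
    draft.finalize()
except PermissionError as err:
    print(f"Blocked: {err}")  # Blocked: draft has not been reviewed by a lawyer
draft.approve("A. Attorney")
print(draft.finalize())  # The seller shall deliver the goods within 30 days.
```

A production system would tie approval to authenticated users and log it; the point here is only that release is blocked by default, so skipping review takes deliberate effort rather than happening by accident.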
What This Means for Legal Tech Builders
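Build tools that augment, not replace. Instead of generating a full brief, build a system that surfaces relevant case law and highlights ambiguous language. Here's a simple example: flag ambiguous phrases in a contract using spaCy's PhraseMatcher and a short, hand-written list of terms.

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
contract = "The seller shall deliver the goods within a reasonable time."
doc = nlp(contract)

# Flag known-ambiguous phrases (simplified, hand-written list)
ambiguous = ["reasonable time", "reasonable efforts"]
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")  # case-insensitive matching
matcher.add("AMBIGUOUS", [nlp.make_doc(term) for term in ambiguous])
# Matching token spans handles multi-word phrases like "reasonable time"
for match_id, start, end in matcher(doc):
    print(f"Ambiguous term: {doc[start:end].text}")
# Output: Ambiguous term: reasonable time
```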
This pattern of flagging ambiguity rather than resolving it keeps the human in charge.
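The spaCy example above relies on a hand-written phrase list. Truly undefined terms can be caught another way: many contracts introduce capitalized defined terms with wording like "Goods" means ..., so a tool can collect those definitions and flag capitalized words used mid-sentence that are never defined. The regular expressions, the naive sentence split, and the single-word restriction below are simplifying assumptions for illustration; real drafting styles vary widely, and the output is a list for a lawyer to review, not a verdict.

```python
import re

# Assumed definition style: a single capitalized word in straight or curly
# quotes, followed by "means" or "shall mean", e.g. "Goods" means ...
DEFINITION = re.compile(r'["\u201c]([A-Z][a-z]+)["\u201d]\s+(?:means|shall mean)\b')

def defined_terms(text: str) -> set:
    """Collect single-word terms the contract explicitly defines."""
    return set(DEFINITION.findall(text))

def undefined_capitalized_terms(text: str) -> set:
    """Flag capitalized words used mid-sentence that are never defined."""
    defined = defined_terms(text)
    flagged = set()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[A-Za-z]+", sentence)
        for word in words[1:]:  # skip the sentence-initial word
            if word[0].isupper() and word not in defined:
                flagged.add(word)
    return flagged

contract = ('"Goods" means the products listed in the order. '
            "The Seller shall deliver the Goods to the Buyer "
            "within a reasonable time.")
print(sorted(undefined_capitalized_terms(contract)))  # ['Buyer', 'Seller']
```

Here "Goods" passes because the contract defines it, while "Seller" and "Buyer" are flagged for a human to check. The tool surfaces the gap; deciding whether it matters stays with the lawyer.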
Ethically, ensure your product respects the professional standard of care. Provide audit trails, confidence scores, and disclaimers. The lawyer must still exercise independent judgment.
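Audit trails, confidence scores, and disclaimers can live in a single record type. The sketch below assumes a hypothetical AuditEntry schema appended as JSON Lines to a log file; the field names, the confidence range check, and the disclaimer wording are illustrative placeholders, not a standard.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Placeholder wording; real disclaimer text should come from counsel.
DISCLAIMER = ("AI-generated suggestion, not legal advice. "
              "A licensed attorney must review it before use.")

@dataclass
class AuditEntry:
    """Hypothetical record of one AI suggestion and the lawyer's response."""
    document_id: str
    suggestion: str
    confidence: float   # model-reported score in [0, 1], not a guarantee
    sources: List[str]  # citations the suggestion relies on
    reviewer: Optional[str] = None
    decision: Optional[str] = None  # e.g. "accepted", "edited", "rejected"
    disclaimer: str = DISCLAIMER
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        # Reject scores outside the expected range instead of storing them.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

def append_entry(log_path: str, entry: AuditEntry) -> None:
    """Append one JSON line per entry instead of rewriting earlier records."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(entry)) + "\n")

# Usage: record a flagged term and the reviewing lawyer's decision.
entry = AuditEntry(
    document_id="contract-001",  # hypothetical identifier
    suggestion='"reasonable time" is ambiguous; consider a fixed deadline.',
    confidence=0.72,
    sources=[],
)
entry.reviewer, entry.decision = "A. Attorney", "accepted"
log_path = os.path.join(tempfile.mkdtemp(), "audit_log.jsonl")
append_entry(log_path, entry)
```

Appending one line per event keeps the history of who accepted or rejected what, which is the record a lawyer would need to show that independent judgment was actually exercised.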
Takeaway
If you're building in legal tech, you will run into structural barriers: ambiguity, data moats, and ethics. Embrace them. Don't try to "disrupt" law; aim to make it more accessible and efficient without sacrificing justice. That's the only path forward.
HN thread: https://news.ycombinator.com/item?id=48228728
Original article: https://www.diffuseai.pub/p/the-structural-barriers-to-ai-lawyers
CourtListener (Free Law Project): https://www.courtlistener.com/
ABA Model Rules: https://www.americanbar.org/groups/professional_responsibility/publications/model_rules_of_professional_conduct/