Using a corpus of compiled codes from U.S. states containing labeled tax law sections, we train text classifiers to automatically tag tax-law documents and, further, to identify the associated revenue source (e.g., income, property, or sales). After evaluating classifier performance on held-out test data, we apply the classifiers to a historical corpus of U.S. state legislation to extract the flow of relevant laws over the years 1910 through 2010. We document that the classifiers are effective in the historical corpus, for example by automatically detecting the establishment of state personal income taxes.
The trained models and replication code are available at https://github.com/luyang521/tax-classification.