Algorithmic Fairness from a Non-ideal Perspective

http://arxiv.org/abs/2001.09773v1

Abstract: Inspired by recent breakthroughs in predictive modeling, practitioners in both industry and government have turned to machine learning with hopes of operationalizing predictions to drive automated decisions. Unfortunately, many social desiderata concerning consequential decisions, such as justice or fairness, have no natural formulation within a purely predictive framework. In efforts to mitigate these problems, researchers have proposed a variety of metrics for quantifying deviations from various statistical parities that we might expect to observe in a fair world and offered a variety of algorithms in attempts to satisfy subsets of these parities or to trade off the degree to which they are satisfied against utility. In this paper, we connect this approach to fair machine learning to the literature on ideal and non-ideal methodological approaches in political philosophy. The ideal approach requires positing the principles according to which a just world would operate. In the most straightforward application of ideal theory, one supports a proposed policy by arguing that it closes a discrepancy between the real and the perfectly just world. However, by failing to account for the mechanisms by which our non-ideal world arose, the responsibilities of various decision-makers, and the impacts of proposed policies, naive applications of ideal thinking can lead to misguided interventions. In this paper, we demonstrate a connection between the fair machine learning literature and the ideal approach in political philosophy, and argue that the increasingly apparent shortcomings of proposed fair machine learning algorithms reflect broader troubles faced by the ideal approach. We conclude with a critical discussion of the harms of misguided solutions, a reinterpretation of impossibility results, and directions for future research.[^1]


Introduction {#sec:introduction}

Machine Learning (ML) models play increasingly prominent roles in the allocation of social benefits and burdens in numerous sensitive domains, including hiring, social services, and criminal justice [@Berk2018; @Crawford2014; @Barocas2016; @Feldman2015]. A growing body of academic research and investigative journalism has focused attention on ethical concerns regarding algorithmic decisions [@Brown2019; @Dwork2012; @Angwin2016], with many scholars warning that in numerous applications, ML-based systems may harm members of already-vulnerable communities [@Barocas2016; @Eubanks2018].

Motivated by this awareness, a new field of technical research addressing fairness in algorithmic decision-making has emerged, with researchers publishing countless papers aspiring to (i) formalize "fairness metrics"---mathematical expressions intended to quantify the extent to which a given algorithm-based allocation is (un)just; and (ii) mitigate "unfairness" as assessed by these metrics via modified data processing procedures, objective functions, or learning algorithms [@Hardt2016; @Zafar2017; @Nabi2018; @Kilbertus2017; @Dwork2012; @Grgic-Hlaca2018a; @Corbett-Davies2018]. However, progress has been hindered by disagreements over the appropriate conceptualization and formalization of fairness [@Chouldechova2016; @Kleinberg2017; @Glymour2019; @Binns18].

The persistence of such disagreements raises a fundamental methodological question about the appropriate approach for constructing tools for assessing and mitigating potential injustices of ML-supported allocations. Importantly, any useful methodology must provide normative guidance for how a given agent ought to act in a world plagued by systemic injustices. Broadly speaking, justice requires apportioning benefits and burdens in accordance with each person's rights and deserts---giving individuals "their due" [@Miller2017; @Feldman2016]. Beyond this general framing, how can we offer more specific and practical guidance?

Drawing on literature in political philosophy, in Section 2, we distinguish between ideal and non-ideal methodological approaches to developing such normative prescriptions, and highlight three challenges facing the ideal approach. Then, in Section 3, we argue that most of the current technical approaches for addressing algorithmic injustice are reasonably (and usefully) characterized as small-scale instances of ideal theorizing. Next, in Section 4, we support this argument by demonstrating several ways that current approaches are, to varying extents, plagued by the same types of problems that confront naive applications of ideal theorizing more generally. Finally, drawing on these considerations, in Section 5, we provide a critical discussion of the real-world dangers of this flawed framing and offer a set of recommendations for future work on algorithmic fairness.

Two Methodologies: Ideal vs. Non-Ideal {#sec:ideal-non-ideal}

How should one go about developing normative prescriptions that can guide decision-makers who aspire to act justly in an unjust world? A useful distinction in political philosophy is between ideal and non-ideal modes of theorizing about the relevant normative prescriptions [@Simmons2010; @Valentini2012; @Stemplowska2012]. When adopting the ideal approach, one starts by articulating a conception of an ideally just world under a set of idealized conditions. The conception of the just world serves two functions: (i) it provides decision-makers with a target state to aspire towards [@Stemplowska2012]; and (ii) when suitably specified, it serves as an evaluative standard for identifying and assessing current injustices "by the extent of the deviation from perfect justice" [@Rawls1999 p. 216]. According to this perspective, a suitably-specified evaluative standard can provide decision-makers with normative guidance to adopt policies that minimize deviations with respect to some notion of similarity, thus closing the gap between the ideal and reality [@Anderson2010].

Non-ideal theory emerged within political philosophy as a result of a number of challenges to ideal modes of theorizing [@Galston2010; @Valentini2012]. We focus here on three challenges that motivate the non-ideal approach. A first set of issues arises when we consider the intended role of a conception of an ideally just world as an evaluative lens for diagnosing actual injustices. In the ideal approach, the conceptual framing of perfect justice determines whether some actual procedure or event is identified as unjust and if so, how that injustice gets represented [@Mills2005; @Pateman2007]. When this conception is impoverished, e.g., by failing to articulate important factors, it can lead to systematic neglect of injustices that were overlooked in constructing the ideal. Moreover, the static nature of ideal standards and the pursuant diagnostic lens can overlook the factors that give rise to injustice in the first place. This is because such standards identify injustices in terms of the discrepancies between the actual world and an ideally-just target state. However, the historical origins and dynamics of current injustices and the ongoing social forces that sustain them are typically absent from consideration. By obfuscating these causal factors, ideal evaluative standards can distort our understanding of current injustices.

According to a second challenge, employing a conception of an ideally just world as an evaluative standard is not sufficient for deciding how actual injustices should be mitigated [@Sen2006; @Sen2009]. This is because, from the standpoint of an ideal, any discrepancy between our imperfect world and that ideal might be interpreted naively as a cause of an actual injustice, and thus, any policy that aims to directly minimize such a discrepancy might be erroneously argued to be justice-promoting [@Anderson2010; @Sen2006]. Yet, the actual world can deviate from an ideal in multiple respects, and the same kind of deviation can have varied and complex causal origins [@Sen2009]. Moreover, as the fair machine learning literature clearly demonstrates (see Section 5.2), simultaneously eliminating all discrepancies might be impossible. Thus, a coherent approach requires not only a mandate to eliminate discrepancies, but also guidance for determining which discrepancies matter in a given context. Crucially, policies that simply seek to minimize any perceived gap between the ideal and reality without consideration for the underlying causes may not only be ineffective solutions to current injustices, but can potentially exacerbate the problem they purport to address. For example, ideal theorizing has been applied to argue for race-blind policies (against affirmative action) [@Anderson2010]. From the perspective of an ideally just society as a race-blind one, a solution to current injustices "would appear to be to end race-conscious policies" [@Anderson2010 4], thus blocking efforts devised to address historical racial injustices. Absent considerations of the dynamics by which disparities emerge, it is not clear that in a world where individuals have been racialized and treated differently on account of these perceived categories, race-blind policies are capable of bringing about the ideal [@Anderson2010].

Finally, a third challenge concerns the practical usefulness of the ideal approach for current decision-makers, given the type of idealized assumptions under which ideal theorizing proceeds. Consider, for example, the assumption of strict compliance, frequently assumed by ideal theorists as a condition under which the conception of an ideally just world can be developed. The condition assumes that nearly all relevant agents comply with what justice demands of them [@Rawls2001 13]. The condition thus idealizes away situations where some agents fail to act in conformity with their ethical duties (e.g., the duty not to racially discriminate), or are unwilling to do so. The vision of a just world constructed under this assumption fails to answer questions about what we might reasonably expect from a decision-maker in the real world, where others often neglect or avoid their responsibilities [@Schapiro2003; @Feinberg1973; @Valentini2012].

In short, when used as a lens for identifying current injustices, ideal modes of theorizing (1) can lead to systematic neglect of some injustices and distort our understanding of others; (2) do not, by themselves, offer sufficient practical guidance about what should be done, sometimes leading to misguided mitigation strategies; and finally, (3) do not, by themselves, make clear who, among decision-makers, is responsible for intervening to right specific injustices. As a result of these challenges to ideal modes of theorizing, a number of researchers in political philosophy have turned to non-ideal modes of theorizing. In contrast to the ideal approach, the non-ideal approach begins by identifying actual injustices that are of concern to decision-makers and that give rise to reasonable complaints on behalf of those affected by their decisions [@Anderson2010; @Sen2006]. Non-ideal theorizing can be seen as a trouble-shooting effort towards addressing these actual concerns and complaints. As Sen notes, this trouble-shooting aim distinguishes non-ideal modes of theorizing from ideal approaches that focus "on looking only for the simultaneous fulfilment of the entire cluster of perfectly just societal arrangements" [@Sen2006 p. 218].

Anderson offers a succinct description of the non-ideal approach towards this trouble-shooting goal and what that approach requires:

[Non-ideal theorists] ... seek a causal explanation of the problem to determine what can and ought to be done about it, and who should be charged with correcting it. This requires an evaluation of the mechanisms causing the problem, as well as responsibilities of different agents to alter these mechanisms [@Anderson2010 p. 22]

As noted by Anderson, there is still a crucial role for normative ideals within the non-ideal approach. But this role is importantly different from the roles assigned to ideals in the ideal approach [@Anderson2010 6]. In the ideal approach, normative ideals are extra-empirical, in the sense that they set the evaluative standards against which actual practices are assessed, without themselves being subject to empirical evaluation. In contrast, in non-ideal theorizing, normative ideals act as hypotheses about potential solutions to identified problems. Viewed in this way, normative ideals are subject to revision in light of their efficacy in addressing the concerns and complaints that arise in practice. In the following sections, we show how the distinction can be put to work in understanding and addressing algorithmic injustice.

Work on Algorithmic Fairness as Small-scale Ideal Theorizing {#sec:fair-ml-ideal}

In political philosophy, the distinction between ideal and non-ideal approaches typically refers to ways of understanding the demands of justice at large, and offering practical normative guidance to basic societal institutions for complying with these demands. While some researchers are beginning to discuss how the automation of decision making in consequential domains interacts with demands of justice at this large scale, most works on algorithmic fairness have the more restricted aim of assessing and managing various disparities that arise among particular demographic groups in connection with the deployment of ML-supported decision systems in various (often-allocative) settings. Nonetheless, in what follows, we show that the distinction between ideal and non-ideal approaches provides a fruitful lens for formulating strategies for addressing algorithmic injustices, even on this smaller scale (of an individual decision-maker). In this section, we argue that the dominant approach among current efforts towards addressing algorithmic harms can be seen as exercises in small-scale ideal theorizing.

Developing a Fairness Ideal

Works on algorithmic fairness typically begin by outlining a conception of a "fairness ideal". @Dwork2012 [p. 215], for example, seek to "capture fairness by the principle that any two individuals who are similar with respect to a particular task should be classified similarly" (see also @Jung2019). Others envision the fairness ideal at the group level. In nearly all cases, the groups of interest are those encompassing categories such as race, ethnic origin, sex, and religion. Following precedent in the United States Civil Rights Act, these groups are typically called protected classes or protected groups in the technical literature. According to one group-level conception of fairness, fair allocative policies and procedures are those that result in outcomes that impact different protected groups in the same way [@Zafar2017; @Feldman2015]. In other cases, a fair state is taken to be one in which membership in a protected group is irrelevant or does not make a difference to the allocative procedure [@Kilbertus2017; @Grgic-Hlaca2018a]. According to another view, a treatment disparity might exist in a fair state if it is justified by the legitimate aims of the distributive procedure [@Hardt2016; @Nabi2018]. The endorsed fairness ideals have different provenances: in some cases, authors refer to historical legal cases, such as Carson v. Bethlehem Steel Corp. or Griggs v. Duke Power, to support their conception of fairness. In other cases, the ideal of fairness is derived from people's intuitive judgments about fair allocation [@Grgic-Hlaca2018a; @Jung2019]. And less frequently, authors allude to the work of political philosophers such as Rawls, which is cited, for example, to support the conception of individual fairness in @Dwork2012.

Specifying a Fairness Metric

Next, on the basis of their favored fairness ideal, researchers specify a quantitative evaluative standard---a "fairness metric"---for diagnosing potential allocative injustices and guiding mitigation efforts. Typically, these fairness metrics take the form of mathematical expressions that quantify how far two protected groups are from parity. The magnitude of (dis)parity measured by a given fairness metric is taken to denote the degree of divergence from the ideal for which that metric is supposed to be a formal proxy.

Given their generality and abstract nature, fairness ideals do not fully determine the specific shape of fairness metrics. Accordingly, in addition to a fairness ideal, the construction of fairness metrics requires researchers to make further value judgments. For example, the ideal that membership in protected groups should be irrelevant to allocative decisions can be articulated in the language of statistics by requiring that the outcome $\hat{Y}$ be independent (probabilistically) of the protected attributes $A$ [@Feldman2015]. However, the same ideal can also be expressed in the language of causality, e.g., by requiring that the average causal effect of the protected attributes $A$ on $\hat{Y}$ be negligible [@Kilbertus2017]. Similarly, one can formalize in different ways the qualification that protected attributes may make a difference to outcomes when justified by the legitimate aims of allocative procedures. In the language of statistics, for example, one can require that while there may be some correlation between $\hat{Y}$ and $A$, the dependency must be screened off by the target variable, $Y$ [@Hardt2016]. Framed in the language of causality, some attempt to formalize this fairness ideal in terms of a parity between the causal effects of $A$ on $\hat{Y}$ along so-called legitimate pathways [@Nabi2018], where what counts as legitimate depends on the specific task and $Y$. Importantly, despite being motivated by the same ideal, such fairness metrics make different demands of the user and can result in different verdicts about the same case. In general, while statistical metrics can be formulated as functions of the joint distribution $P(Y, \hat{Y}, A, X)$, causal metrics additionally require the acquisition of a causal model that faithfully describes the data-generating processes and for which the desired causal effect is identifiable. Thus, in some situations, statistical parity metrics may be estimable from data while the corresponding causal quantities may not be, owing to our limited knowledge of the data-generating process [@pearl2009causality].
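To make this contrast concrete, consider the following minimal sketch (our own illustration, not code from any of the cited works), which computes two statistical metrics motivated by the same "irrelevance" ideal on a toy dataset; the column names `A`, `Y`, and `Y_hat` are hypothetical. Because the simulated classifier's error rates are, by construction, independent of group membership while the groups' base rates differ, the equalized-odds gaps come out near zero while the demographic-parity gap does not, illustrating how metrics motivated by a common ideal can return different verdicts on the same case.

```python
import numpy as np
import pandas as pd

def demographic_parity_gap(df, group_col="A", pred_col="Y_hat"):
    """|P(Y_hat = 1 | A = 0) - P(Y_hat = 1 | A = 1)| for a binary group attribute."""
    rates = df.groupby(group_col)[pred_col].mean()
    return abs(rates.iloc[0] - rates.iloc[1])

def equalized_odds_gaps(df, group_col="A", label_col="Y", pred_col="Y_hat"):
    """Gaps in P(Y_hat = 1 | A, Y) between groups: the row for Y = 0 is the
    false-positive-rate gap, the row for Y = 1 the true-positive-rate gap."""
    rates = df.groupby([label_col, group_col])[pred_col].mean().unstack(group_col)
    return (rates.iloc[:, 0] - rates.iloc[:, 1]).abs()

# Toy data: unequal base rates across groups, but a classifier whose error
# rates do not depend on group membership given the true label.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.integers(0, 2, 10_000)})
df["Y"] = rng.binomial(1, np.where(df["A"] == 1, 0.3, 0.5))
df["Y_hat"] = rng.binomial(1, np.where(df["Y"] == 1, 0.8, 0.2))

print(demographic_parity_gap(df))   # clearly non-zero
print(equalized_odds_gaps(df))      # both gaps near zero
```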

Promoting Justice by Minimizing Deviations from the Ideal

Finally, current approaches seek to promote fairness (or mitigate unfairness) by modifying ML algorithms to maximize utility subject to a parity constraint expressed in terms of the proposed fairness metric. Such fairness-enforcing modifications can take the form of interventions (i) in the pre-processing stage to produce "fair representations" (e.g., @Kamiran2012); (ii) in the learning stage to create "fair learning" (e.g., @Zafar2017); or (iii) in the post-processing stage to adjust the decision thresholds (e.g., @Hardt2016). Crucially, however, in all cases, the range of solutions to algorithmic harms is limited to interventions on the ML algorithm itself. Absent from consideration in these approaches is the broader context in which the "certifiably fair" model will be deployed. Recalling Anderson's critique [@Anderson2010 22] of ideal approaches, neither the mechanisms causing the problem, nor the consequences of algorithmically-guided decisions, nor "the responsibilities of different agents to alter these mechanisms" are captured in any of these approaches.
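As an illustration of the third, post-processing category, the sketch below chooses a separate decision threshold for each group so that group-level true positive rates roughly coincide. It is a simplified variant in the spirit of threshold-adjustment methods such as @Hardt2016, not a reproduction of that paper's (randomized) procedure; the function names and target rate are ours.

```python
import numpy as np

def thresholds_for_equal_tpr(scores, labels, groups, target_tpr=0.8):
    """Choose a per-group score threshold so that each group's true positive
    rate is approximately `target_tpr` (a crude post-processing adjustment)."""
    thresholds = {}
    for g in np.unique(groups):
        pos = np.sort(scores[(groups == g) & (labels == 1)])
        k = int(np.floor((1.0 - target_tpr) * len(pos)))   # index of the cut-off
        thresholds[g] = pos[min(k, len(pos) - 1)]
    return thresholds

def predict(scores, groups, thresholds):
    return np.array([s >= thresholds[g] for s, g in zip(scores, groups)])

# Usage with synthetic scores whose distribution differs by group.
rng = np.random.default_rng(0)
groups = rng.integers(0, 2, 5_000)
labels = rng.binomial(1, np.where(groups == 1, 0.3, 0.5))
scores = rng.normal(labels + 0.3 * groups, 1.0)   # group-dependent score shift

t = thresholds_for_equal_tpr(scores, labels, groups)
y_hat = predict(scores, groups, t)
for g in (0, 1):
    mask = (groups == g) & (labels == 1)
    print(g, y_hat[mask].mean())   # both TPRs land near the 0.8 target
```

The point of the sketch is how narrowly such interventions are scoped: they adjust the classifier's outputs and nothing else about the decision pipeline or its context.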

Troubles with Ideal Fairness Metrics {#sec:fair-ml-trouble}

If current works on algorithmic fairness pursue (small-scale) ideal theorizing, then we should expect these works to encounter the same types of challenges as those confronting ideal theorizing more generally. As explained above, according to critics, ideal modes of theorizing can (1) lead to systematic neglect of some injustices and distort our understanding of others. Such ideal evaluative standards (2) do not offer sufficient practical guidance and can lead to misguided mitigation strategies. What is more, they (3) fail to delineate the responsibilities of current decision-makers in a world where others fail to comply with their responsibilities. Below, we consider each of these challenges in turn, and show that these same types of worries arise with respect to current works on algorithmic fairness.

Systematic Neglects of Rights

The identification of injustices in ideal theorizing is constrained by the underlying conceptual framing of normative ideals. If this conceptual framing is not sufficiently rich or comprehensive, we run the risk of overlooking many actual injustices. The ideals of fairness in the literature on algorithmic fairness are predominantly expressed in terms of some type of parity among designated protected classes. Is this comprehensive enough to be sensitive to the types of injustices that would lead to legitimate complaints by those affected by ML-based allocations? We believe that the answer is negative. To see why, consider that assessing claims of injustice can require attending to different types of information. As noted by Feinberg [@Feinberg1974; @Feinberg2014], in some cases, what someone is due is determinable only in comparison with what is allocated to others or what would have been allocated to them had they been present. In other cases, an individual's just due is determinable independently of any comparison and solely by reference to how that individual should have been treated in light of her rights and deserts. An allocative procedure can thus result in comparative as well as non-comparative cases of injustice [@Feinberg1974; @Feldman2016; @Montague1980].

Yet, virtually all algorithmic fairness ideals are framed in comparative terms. This comparative focus renders these ideals insensitive to legitimate claims of non-comparative injustice. Consider, from this perspective, a judge who treats all defendants equally, denying parole to them all regardless of the specifics of their cases. Here the defendants can feel aggrieved because of how they should have been treated from the perspective of the standards of retributive justice: the review process was based on legally irrelevant factors, infringing on defendants' rights to due process, and at least in some cases, the punishments were disproportionately harsh, potentially resulting in arbitrary incarceration. Indeed, such sentencing behavior goes against Articles 9 and 11 of the Universal Declaration of Human Rights, cited throughout various documents concerning ethical design such as the IEEE Ethically Aligned Design and the Toronto Declaration [@IEEE]. Yet, this and other cases of non-comparative injustice in which an individual's rights and deserts have been ignored escape the purview of current fairness metrics.

The situation is troubling even with respect to comparative cases of injustice. This is because, due to their narrow focus, fairness metrics essentially take the set of protected classes to exhaust the comparison classes that might matter from the perspective of justice and fairness. However, consider a case where the appraisal of an employee's performance is influenced by factors such as their weight or height, despite the irrelevance (in a causal sense) of such characteristics to that job [@Judge2004; @Rudolph2009]. In this setting and from the perspective of comparative justice, height and weight are relevant categories. The complete reliance of such metrics on a particular specification of relevant comparison groups limits their adequacy in this regard. Indeed, unconstrained by these demands of comparative justice, algorithm-based decisions might result in the creation of new "protected groups".

Distortion of the Harms of Discrimination

From the perspective of current fairness ideals, any divergence from the ideal of parity among protected classes (potentially subject to certain qualifications) is identified as a case of unfairness. Accordingly, the fairness metrics based on these ideals often have the property of being anonymous or symmetric: whether a distribution of benefits and burdens is fair does not depend on who the affected individuals or groups are. In certain contexts and for certain purposes, anonymity is a desirable property. Quantitative metrics of income inequality are required to be anonymous, for example, because "from an ethical point of view, it does not matter who is earning the income" [@Ray1998]. Unlike the case of income inequality, however, evaluating fairness claims requires going beyond the observation that some disparity exists [@Hellman2008]. We need to know why the disparity exists and to understand "the processes that produce or maintain it" [@Anderson2010 18]. This knowledge is required to determine a coherent course of action, and yet it does not inform any of the mitigation strategies in the standard fair machine learning toolkits, making them unsuitable for off-the-shelf application.

Consider, for example, the very different mechanisms giving rise to disparities in representation between white and East Asian students, on the one hand, and white and black students, on the other, in US higher education. In the former case, the disparity (appearing to favor Asian students) emerges despite historical and institutional discrimination. In the latter, the disparity stems from well-documented historical and institutional discrimination. However, both represent violations of demographic parity [@Petersen1976]. A naive ideal approach may suggest that in both cases, the disparity requires alterations in admissions policies to enforce the parity across all groups that we might expect in our ideal. A more nuanced non-ideal approach might recognize the differences between these two situations. In the literature on fair ML, approaches that incorporate knowledge of demographic labels are colloquially referred to as "fairness through awareness". However, as demonstrated above, awareness of demographic membership alone is too shallow to distinguish between these two situations. Instead, we require a deeper awareness, not only of demographic membership but of the societal mechanisms that imbue demographic membership with social significance in the given context and that give rise to existing disparities.

While this is especially problematic for statistical metrics that neglect the provenance of the observed data, recently-proposed causal approaches, including those formalizing fairness in terms of the average causal effect or the effect of treatment on the treated, are similarly insufficient for capturing when a given disparity is reflective of discrimination, let alone whose discrimination it might reflect, or for providing guidance as to when the current decision-maker has a responsibility or license to intervene. Importantly, these causal methods typically address the problem of mediation analysis, adopting the perspective of an auditor seeking to explain the mechanisms by which the protected trait influences a model's prediction. Missing, however, is a coherent theory of how to relate those mechanisms to the responsibilities of the current decision-maker, or any accounting of the causal mechanisms by which a proposed intervention may impact the social system for better or worse.

Insufficient Insights and Misguided Mitigation

As noted in the previous section, current mitigation strategies are guided by the idea that justice is promoted by intervening on ML algorithms to minimize disparities detected by a given metric. Insofar as the underlying causes of preexisting disparities and the consequences of proposed policies are ignored, however, these mitigation techniques might have adverse effects. As one example, consider a series of proposed approaches that @Lipton2018 denote disparate learning processes (DLPs). These techniques are designed to jointly satisfy two parities, blindness and demographic parity (e.g., @Zafar2017). However, as @Lipton2018 show, DLPs are oblivious to the underlying causal mechanisms of potential disparities, and in some cases, DLPs achieve parity between protected classes (e.g., genders) by giving weight to irrelevant proxies (e.g., hair length). Using real-world data from graduate admissions to a computer science program, they showed that, prohibited from considering gender directly, a DLP would pick up on proxies such as the subfield of interest. In order to achieve parity, the DLP must advantage those applicants that appear (based on their non-protected attributes) more likely to be women, while disadvantaging those that appear more likely to be men. Thus, the DLP satisfies demographic parity by advantaging those pursuing studies in sub-fields chosen historically by more women (e.g., human-computer interaction) while disadvantaging those pursuing studies that are currently more male-dominated (e.g., machine learning). While the DLP achieves overall demographic parity, women in fields that already have greater parity receive the benefit, while women in precisely those fields that most want for diversity are penalized by the DLP.
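The mechanism is easy to reproduce. The toy simulation below is our own reconstruction of the admissions setting, with all numbers invented and the DLP approximated by a simple score bonus on the proxy rather than by the constrained learners studied in @Lipton2018: a gender-blind rule driven by raw scores admits women at a lower rate because their subfield happens to score lower, and the only lever available to a gender-blind adjustment for restoring demographic parity is the subfield proxy, which benefits applicants of both genders in one subfield while penalizing applicants, women included, in the other.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical applicant pool (all numbers invented): gender correlates with
# subfield, and review scores happen to run higher in ML than in HCI.
subfield = np.r_[["HCI"] * 100, ["ML"] * 100]
gender = np.r_[rng.choice(["F", "M"], 100, p=[0.6, 0.4]),
               rng.choice(["F", "M"], 100, p=[0.2, 0.8])]
score = rng.normal(np.where(subfield == "ML", 0.5, 0.0), 1.0)
pool = pd.DataFrame({"subfield": subfield, "gender": gender, "score": score})

def accept_rates(df, accepted, by):
    return df.loc[accepted].groupby(by).size() / df.groupby(by).size()

# Gender-blind rule: admit the top half by raw score.
base = pool["score"] >= pool["score"].median()

# A DLP-like remedy that cannot see gender can only reach demographic parity
# through proxies -- here, a bonus for the subfield where women are concentrated.
adjusted = pool["score"] + 0.6 * (pool["subfield"] == "HCI")
fixed = adjusted >= adjusted.median()

print(accept_rates(pool, base, "gender"))                   # women admitted less often
print(accept_rates(pool, fixed, "gender"))                  # overall gap narrows...
print(accept_rates(pool, fixed, ["gender", "subfield"])
      - accept_rates(pool, base, ["gender", "subfield"]))   # ...but women in ML lose out
```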

Stepping back from a myopic view of the statistical problem and of these arbitrarily chosen deviations from an ideal (the fairness metrics), when we consider the impact of a deployed DLP on the broader system of incentives, it becomes clear that the DLP risks amplifying the very injustices it is intended to address.

In addition to the non-comparative harm of making decisions on irrelevant grounds, the supposed remedy can reinforce social stereotypes, e.g., by incentivizing female applicants towards only those fields where they are already well represented (and away from others). Similarly, in simply seeking to minimize the disparity detected by fairness metrics, current approaches neglect considerations of whether the enforced parity might in fact result in long-term harms [@liu2018delayed].

Lack of Practical Guidance

Finally, consider that the types of unjust disparities often faced in a given allocation context correspond to events potentially unfolding over decades. Current approaches to algorithmic fairness seek to address the question "is there discrimination?" but leave open the questions of "who discriminated?" and "what are the responsibilities of the current decision-maker?" If sensitive features influence education, which in turn influences employment decisions, then to what extent does the causal effect reflect the discrimination of the education system compared to that of the employer? The answer to this question is not straightforward and requires considerations not captured in the entries of confusion matrices. While identifying statistical disparities may be valuable unto itself, e.g., as a first step indicating particular situations that warrant investigation, it provides little moral or legal guidance to the decision-maker. While the influence of protected attributes on predictions may reflect injustice, providing normative guidance requires identifying not only what would constitute a just world but also what constitutes just decisions in the actual world, with its history of injustice.

Discussion {#sec:discussion}

If not Solutions, then Solutionism?

Even as the mitigation strategies arising from the recent technical literature on fair machine learning fail to offer practical guidance on matters of justice, they have not failed to deliver in the marketplace. From the perspective of stakeholders caught in the tension between (i) the potential profit to be gained from deploying machine learning in socially-consequential domains, and (ii) the increased scrutiny of a public concerned with algorithmic harms, these metrics offer an alluring solution: continue to deploy machine learning systems per the status quo, but use some chosen parity metric to claim a certificate of fairness, seemingly inoculating the actor against claims that they have not taken the moral concerns seriously, and weaponizing the half-baked tools produced by academics in the early stages of formalizing fairness as a shield against criticism.

In socially-consequential settings such as criminal justice and hiring, which call for caution or even abstention from applying ML, fair ML offers an apparent academic stamp of approval. Notable recent examples include the IBM AI Fairness 360 toolkit, which offers fairness metrics and corresponding mitigation strategies as an open-source software service that claims to be able to "examine, report, and mitigate discrimination and bias in machine learning models throughout the AI application lifecycle" [@ibm360fairness]. Using just one parity metric (demographic parity), the algorithmic hiring platform Pymetrics, Inc. claims that its system is "proven to be free of gender and ethnic bias" [@pymetrics].

The literature on fair machine learning bears some responsibility for this state of affairs. In many papers, these fairness-inspired parity metrics are described as definitions of fairness and the resulting algorithms that satisfy the parities are claimed axiomatically to be fair. While many of these metrics are useful diagnostics, potentially alerting practitioners to disparities warranting further investigation, the looseness with definitions creates an opening for stakeholders to claim compliance, even when the problems have not been addressed. Lacking the basic primitives required to make the relevant moral distinctions, when blindly optimized, these metrics are as likely to cause harm as to mitigate it. Thus current methods produced by the fair ML community run the risk of serving as solutionism if not as solutions [@Selbst2019].

Re-interpreting Impossibility Results {#sec:impossibility}

An additional benefit of viewing fairness in ML through the lens of non-ideal theorizing in political philosophy is that it gives a new perspective for parsing the numerous impossibility results [@Kleinberg2017; @Chouldechova2016] famously showing that many statistical fairness metrics are irreconcilable, presenting inescapable trade-offs. These results are sometimes misinterpreted as communicating that fairness is impossible. However, through the non-ideal lens, these impossibility theorems are simply a frank confirmation of the fact that we do not live in an ideal world. The inputs to statistical fairness metrics include four groups of variables: the covariates $X$, the group membership $A$, the label $Y$, and the classification $\hat{Y}$. The distribution over these variables at a given point in time is the consequence of the complex dynamics of an unjust society constituted of many decision-making agents. Of these, the current decision-maker has control only over their own predictions $\hat{Y}$. That various metrics/parities cannot be satisfied simultaneously merely by setting the values taken by $\hat{Y}$ indicates only that our present decision-maker cannot through their actions alone bring about the immediate end to all disparity, even as viewed locally through the variables that their individual decisions concern.
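One of the cited results can be stated compactly. @Chouldechova2016 observes that, for any binary classifier, a group's false positive rate (FPR), false negative rate (FNR), positive predictive value (PPV), and base rate $p$ are linked by the identity

$$\mathrm{FPR} \;=\; \frac{p}{1-p} \cdot \frac{1-\mathrm{PPV}}{\mathrm{PPV}} \cdot \left(1-\mathrm{FNR}\right).$$

If two groups differ in their base rates, then no choice of $\hat{Y}$ can equalize PPV, FPR, and FNR across them simultaneously (outside of degenerate cases such as perfect prediction): the base-rate difference is a feature of the world the decision-maker inherits, not of the predictions they control.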

One potential contribution of ML impossibility theorems to philosophy is that they make evident an often-overlooked shortcoming with the ideal approach. These impossibility results make clear that in general, if we start from a non-ideal world, no set of actions (by a single agent) can instantaneously achieve the ideal world in every respect. Moreover, matching the ideal in a particular respect may only be possible at the expense of widening gaps in others. Thus this naive form of an ideal approach appears to be fundamentally under-specified. If matching the ideal in various respects simultaneously is impossible, then we require, in addition to an ideal, a basis for deciding which among competing discrepancies to focus on. In this manner, the impossibility results in fair ML provide a novel lens to approach the philosophical debate about the extent to which normative theorizing on matters of justice can proceed in isolation from empirical socio-historical facts [@Sen2009; @Farrelly2007].

While characterizing disparities and understanding the fundamental trade-offs among them may be valuable work, this work cannot by itself tell us what to do. The pressing issue in determining how to act justly is not how to optimize a given metric but how to make the determination of what, in a given situation, should be optimized in the first place.

Towards a Non-Ideal Perspective

Even if the reader finds the case against the ideal approach compelling, there remains a pragmatic question of what precisely a non-ideal approach might look like in practice. To begin, non-ideal theorizing about the demands of justice is a fact-sensitive exercise. Offering normative prescriptions to guide actions requires understanding the relevant causal mechanisms that (i) account for present injustices; and (ii) govern the impact of proposed interventions.

Empirical understanding of the problem: {#empirical-understanding-of-the-problem .unnumbered}

Developing causal models for understanding social dynamics that cause and maintain particular injustices requires extensive domain-knowledge as well as numerous value judgements about the relevance and significance of different aspects of the domain of interest. Choices must be made about what abstractions are reasonable, which simplifying assumptions are justified, and what formalizations are appropriate. Inevitably, these choices, embedded in design and modeling, raise coupled ethical-epistemic questions [@Tuana2010; @Proctor2008]. Consider, for instance, choices that might be made in understanding the causes of racial injustice in a particular allocative domain and a specific social setting. Aside from the challenge of understanding the concept of race [@Mills1998; @Mallon2006], research in psychology and sociology shows racial classification and identification to be dynamic categories that are shaped by a variety of socioeconomic factors such as unemployment, incarceration, and poverty [@Epp2014; @Penner2008; @Freeman2011]. Appreciating the complex and dynamic nature of race and the perception thereof is thus not only of ethical import; it also has important epistemic implications for formal models of racial injustice, as it shapes how "race" as an attribute should be understood and what causal relation it might bear to other factors of interest.

Empirically-informed choice of treatment: {#empirically-informed-choice-of-treatment .unnumbered}

Deployment of predictive models---whether those that simply maximize utility or those that maximize utility subject to some "fairness" constraint---constitutes a social intervention. As mentioned above, most existing approaches to fair ML consist only of modifying the data processing procedures or the objective functions. Crucially, the evaluation of these interventions is local and static: the evaluation is local insofar as it concerns the impact of the intervention only on that particular predictive model's statistics (i.e., its accuracy and various fairness metrics). The accompanying literature seldom considers the broader impacts of deploying such models in any particular social context. Moreover, the evaluation is typically static, ignoring the longer-term dynamics of proposed policies. When authors have attempted dynamic evaluations, the results have sometimes contraindicated proposed mitigation strategies [@liu2018delayed].
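As a minimal illustration of what a dynamic evaluation can reveal, the sketch below runs a single step of stylized score dynamics in the spirit of @liu2018delayed; every number and functional form is invented for illustration rather than taken from that paper. Lowering the disadvantaged group's lending threshold to match the advantaged group's approval rate (a demographic-parity-style intervention) extends credit to applicants who default often, so the group's average score falls where a more selective policy would have left it slightly improved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stylized one-step dynamics in the spirit of Liu et al. (2018); every number
# and functional form below is invented for illustration only.
def repay_prob(score):
    # Hypothetical link between an applicant's score and repayment probability.
    return 1.0 / (1.0 + np.exp(-(score - 600.0) / 40.0))

def mean_score_change(scores, threshold, gain=15.0, loss=45.0):
    """Approve applicants at or above `threshold`; repayment raises a score by
    `gain`, default lowers it by `loss`. Returns the group's expected change
    in average score after one round of lending."""
    approved = scores >= threshold
    expected_delta = repay_prob(scores) * gain - (1.0 - repay_prob(scores)) * loss
    return np.where(approved, expected_delta, 0.0).mean()

group_a = rng.normal(650, 50, 100_000)   # advantaged group (hypothetical)
group_b = rng.normal(580, 50, 100_000)   # disadvantaged group (hypothetical)

selective_thr = 660.0                                                  # profit-style cut-off
parity_thr = np.quantile(group_b, (group_a < selective_thr).mean())    # equalize approval rates

print(mean_score_change(group_b, selective_thr))   # small expected gain for group B
print(mean_score_change(group_b, parity_thr))      # parity-driven lending lowers B's average
```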

In contrast, a non-ideal approach to offering normative guidance should be based on evaluating the situated and system-wide (involving not just the predictive model but also the broader social context, actors, and users) and dynamic (evolving over longer periods) impact of potential fairness-promoting interventions.

Once more, we must face difficult questions and make value judgments. As some authors have noted, for instance, unjust circumstances can naturally arise as a result of seemingly benign initial conditions [@Schelling1971; @OConnor2019a]. To determine how to act, a coherent framework is needed for understanding when it is desirable or permissible for a given decision-maker to intervene. Importantly, we stress that the appropriate judgments simply cannot be made based on the reductive ($X$, $A$, $Y$, $\hat{Y}$) description upon which most statistical fair ML operates. Developing a coherent non-ideal approach requires (for the foreseeable future) human thought, both to understand the social context and to make the relevant normative judgments.

Conclusion {#sec:conclusion}

Approaching the issue of algorithmic fairness from a non-ideal perspective requires broadening the scope beyond parity-constrained predictive models to consider the wider socio-technological system consisting of human users who, informed by these models, make decisions in particular contexts and towards particular aims. Effectively addressing algorithmic harms demands nothing short of this broader, human-centered perspective, as it enables the formulation of novel and potentially more effective mitigation strategies that are not restricted to simple modifications of existing ML algorithms.

Acknowledgements {#sec:ackknowledge .unnumbered}

Many thanks to David Danks, Maria De-Arteaga, and our reviewers for helpful discussions and comments. Funding was provided by Social Sciences and Humanities Research Council of Canada (No. 756-2019-0289) and the AI Ethics and Governance Fund.

[^1]: A version of this paper was accepted at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) 2020.