Auto-generated fixes to algorithms don’t completely eliminate bias
As predictive models are deployed to make decisions ranging from employee hiring to loan approvals, there’s a growing emphasis on designing algorithms that explain their decision-making and provide recourse to affected individuals. (For example, when a person is denied a loan by a model, they should be informed of the reasons and what can be done to address them.) Several recourse generation algorithms have been proposed in academic research papers, but it remains an open question whether these algorithms are reliable in the sense that they consistently improve outcomes.
A study from Harvard- and Microsoft-affiliated researchers finds strong evidence that these algorithms aren’t reliable. That’s because algorithmically generated recourses tend to become invalid as stakeholders like banks and financial institutions retrain and update their models to adapt to new patterns in the data. It’s also because the data used to train these decision-making models is subject to temporal, geospatial, and other kinds of shifts due to data corrections, recourse intervention, and more.
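To make the invalidation mechanism concrete, here is a minimal sketch in Python. It is a toy, not the study's method or data: it assumes a two-feature linear classifier, a closed-form "smallest change that crosses the decision boundary" as the recourse, and hand-picked weights for the original and retrained models. All names (score, minimal_recourse, validity, OLD_MODEL, NEW_MODEL) are hypothetical.

```python
# Toy illustration (assumed setup, not the study's technique):
# a linear model approves x when weights . x + bias >= 0.

def score(weights, bias, x):
    """Linear decision score for a feature vector x."""
    return sum(w * v for w, v in zip(weights, x)) + bias

def approve(weights, bias, x):
    """True when the model gives a favorable prediction."""
    return score(weights, bias, x) >= 0.0

def minimal_recourse(weights, bias, x, margin=0.01):
    """Smallest L2 change to x that lifts the score to `margin`.

    For a linear model this means moving along the weight vector.
    Already-approved points are returned unchanged.
    """
    s = score(weights, bias, x)
    if s >= 0.0:
        return list(x)
    norm_sq = sum(w * w for w in weights)
    step = (margin - s) / norm_sq
    return [v + step * w for v, w in zip(x, weights)]

def validity(weights, bias, points):
    """Fraction of points the model approves."""
    return sum(approve(weights, bias, p) for p in points) / len(points)

# Hypothetical models: the original, and the same model after retraining
# on shifted data (weights and bias are made up for illustration).
OLD_MODEL = ([1.0, 1.0], -10.0)
NEW_MODEL = ([1.0, 0.5], -9.0)

# Hypothetical applicants, all denied by the original model.
denied = [[3.0, 4.0], [6.0, 2.0], [8.0, 1.0], [2.0, 2.0]]

# Recourses are computed against the original model only.
recoursed = [minimal_recourse(*OLD_MODEL, x) for x in denied]

old_validity = validity(*OLD_MODEL, recoursed)
new_validity = validity(*NEW_MODEL, recoursed)
print(f"valid under original model:  {old_validity:.0%}")
print(f"valid under retrained model: {new_validity:.0%}")
```

In this toy setup every recourse is valid against the model it was generated for, but only one of the four still earns a favorable prediction once the model is retrained. Because the recourses land just past the original decision boundary, even a modest change in the model's weights is enough to push most of them back onto the unfavorable side.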
Inspired by current events, the coauthors considered the problem of predicting grades using an AI classifier model. They worked with a dataset of schools spread out across Jordan and Kuwait, training the classifier on examples collected from schools in Jordan and deploying it to schools in Kuwait. In one hypothetical scenario, they assumed that students in Kuwait were provided recourses to improve their predicted grades but that by the time the students reapplied for grade prediction, the training dataset had been updated to include Kuwait school data. In a second scenario, the researchers swapped the initial training data to come from Kuwait instead of Jordan.
Applying a state-of-the-art recourse generation technique in the first scenario would provide explanations to 116 students in Kuwait who received failing grades from the classifier trained on the Jordan dataset, the coauthors found. However, were the students to follow the recommendations and reapply for grade prediction, the classifier would yield favorable predictions for only 28.3% of them after being updated with the Kuwait dataset. In the second scenario, the same recourse generation technique would provide recommendations to 66 students, but these recommendations would result in better grades for only 60.6% of students.
In another experiment, the researchers trained a classifier on an error-prone German credit dataset to determine the creditworthiness of loan applicants. After applying the same recourse generation technique used in the grade prediction problem, they found that 900 of 1,000 applicants would have been provided recourses. However, if the classifier were to be retrained on a corrected dataset with minor changes, only 22% of them would be accepted, even after implementing the recommended recourses.
In a final experiment, the coauthors benchmarked a classifier that predicted whether a candidate would repay a loan using income, age, and method of application data. Trained on a synthetic dataset, the classifier would give 261 (if age were considered) or 522 (without the age variable) out of 1,024 applicants unfavorable model predictions, the researchers report. But recourse generation wouldn’t vastly improve the candidates’ chances. They would have been told by the recourse generation technique to increase their income, but even with increased incomes, the classifier would predict that only 0% to 8% of them would repay loans.
The researchers claim that their work, taken as a whole, shows that distribution shifts can cause “significant invalidation” of generated recourses, endangering trust in decision makers. “The problem of distribution shifts invalidating recourses and counterfactual explanations seems to be a direct result of current recourse finding technologies, rather than of the properties of the initial model,” they wrote. “It would be interesting to develop novel recourse finding strategies that do not suffer from the drawbacks of existing techniques and are robust to distribution shifts.”