Participatory budgeting experiments are laboratory experiments and computerized simulations that examine various ethical and practical aspects of participatory budgeting (PB). These experiments aim to decide on two main issues:
Front-end: Which input format (ballot type) should voters use to express their preferences?
Back-end: Which rule to use for aggregating the voters' preferences? See combinatorial participatory budgeting for detailed descriptions of various aggregation rules.
Goel, Krishnaswamy, Sakshuwong and Aitamurto report the results of several experiments done on real PB systems in Boston (2015–2016), Cambridge (2014–2015), Vallejo (2015) and New York City (2015). They compare knapsack voting to k-approval voting. Their main findings are:[1]
Knapsack voting tends to favor cheaper projects, whereas k-approval favors more expensive projects. This is probably because knapsack voting draws the voters' attention to the project costs.
The time it takes users to vote using the digital interface is not significantly different between the two methods; knapsack voting does not take more time.
They claim that knapsack voting is more compatible with the aggregate preferences of the voters. To show this, they count, for each pair of projects x,y, the number of agents whose value/cost ratio for x is larger than their value/cost ratio for y. It turns out that, in their data, there is a Condorcet winner, that is, a project that wins a majority against every other project. Once this project is removed, there is a Condorcet winner among the remaining projects, and so on. Thus, there is a linear order that represents the aggregate preferences. Based on this order, they compute a Borda count for each set of projects, and compare the Borda count of the knapsack outcome with that of the k-approval outcome. They find that the knapsack outcome has a substantially higher score, and conclude that knapsack voting better represents the aggregate preferences.
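The pairwise-majority procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy data, and the choice to score a set of projects as the sum of its members' Borda points are all assumptions made for the example.

```python
from itertools import combinations

def pairwise_wins(ratios):
    """ratios: one dict per voter, mapping project -> value/cost ratio."""
    projects = sorted(ratios[0])
    wins = {(x, y): 0 for x in projects for y in projects if x != y}
    for r in ratios:
        for x, y in combinations(projects, 2):
            if r[x] > r[y]:
                wins[(x, y)] += 1
            elif r[y] > r[x]:
                wins[(y, x)] += 1
    return projects, wins

def condorcet_order(ratios):
    """Repeatedly remove the Condorcet winner.

    Returns the resulting linear order, or None if some round has no
    Condorcet winner (in which case no such order exists).
    """
    projects, wins = pairwise_wins(ratios)
    remaining, order = list(projects), []
    while remaining:
        winner = next(
            (x for x in remaining
             if all(wins[(x, y)] > wins[(y, x)] for y in remaining if y != x)),
            None,
        )
        if winner is None:
            return None
        order.append(winner)
        remaining.remove(winner)
    return order

def borda_score(order, outcome):
    """Assumed set score: sum of Borda points (top = len-1, last = 0)."""
    points = {p: len(order) - 1 - i for i, p in enumerate(order)}
    return sum(points[p] for p in outcome)

# Hypothetical toy data: three voters, three projects.
example_ratios = [
    {"a": 3, "b": 2, "c": 1},
    {"a": 3, "b": 1, "c": 2},
    {"a": 2, "b": 3, "c": 1},
]
example_order = condorcet_order(example_ratios)
```

Two candidate outcomes can then be compared by calling `borda_score(example_order, outcome)` on each; the outcome with the higher score agrees more closely with the aggregate order.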
Later experiments lead to different conclusions:
Benade, Itzhak, Shah, Procaccia and Gal compared input formats on two dimensions: efficiency (social welfare of the resulting outcomes), and usability (cognitive burden on the voters). They conducted an empirical study with over 1200 voters, who were presented with a story about resource allocation on a desert island. They concluded that k-approval voting imposes low cognitive burden and is efficient, although it is not perceived as such by the voters.[2]
Benade, Nath, Procaccia and Shah experimented with four input formats: knapsack voting, ranking by value, ranking by value-for-money, and threshold-approval. Their goal was to maximize social welfare by using observed votes as proxies for voters' unknown underlying utilities. They found that threshold-approval voting performs best on real PB data.[3]
Fairstein, Benade and Gal report the results of an experiment with Amazon Mechanical Turk workers, on a PB process in an imaginary town. In their experiment, 1800 participants voted in four PB elections in a controlled setting, to evaluate the practical effects of the choice of voting format and aggregation rule. They compared k-approval with k=5 (Figure 8(a)),[4] threshold-approval, knapsack voting, rank by value, rank by value/cost, and cardinal ballots. Their main findings[5] are that the k-approval voting format leads to the best user experience: users spent the least time learning the format and casting their votes, and found the format easiest to use. They felt that this format allowed them to express their preferences best, probably due to its simplicity.[4]
Yang, Hausladen, Peters, Pournaras, Fricker and Helbing constructed an experiment modeled on the PB process in Zurich. Their 180 subjects were students from Zurich universities. Each subject had to evaluate projects in six input formats: unrestricted approval, 5-approval, 5-approval with ranking, cumulative with 5 points, cumulative with 10 points, and cumulative with 10 points over 5 projects. The subjects were then asked which input format was easiest, most expressive, and most suitable. Unrestricted approval was perceived as easiest, but least expressive and least suitable; in contrast, 5-approval with ranking, and cumulative with 10 points over 5 projects, were found significantly more expressive and more suitable. Suitability was affected mainly by expressiveness; the effect of easiness was negligible. They also found that the project ranking in unrestricted approval was significantly different from that in the other 5 input formats. Approval voting encouraged voters to disperse their votes beyond their immediate self-interest. This may be considered altruism, but it may also mean that this format does not represent their preferences well enough.[6]
Fairstein, Benade and Gal compared the robustness of various methods to the participation rate, that is: if a random subset of the voters stays at home, how does this affect the final outcome? In particular, they compared the simple greedy algorithm (which assumes cost-based satisfaction) with the method of equal shares (MES), assuming cardinality-based satisfaction. They found that greedy outcomes are highly sensitive to the input format used and to the fraction of the population that participates. In contrast, MES outcomes are not sensitive to the type of voting format used. These outcomes are stable even when only 25–50% of the population participates in the election.[4]
Yang, Hausladen, Peters, Pournaras, Fricker and Helbing did a similar experiment comparing four rules: simple greedy (which assumes cost-satisfaction), value/cost greedy (which assumes cardinality-satisfaction), MES with cardinality-satisfaction, and MES with cost-based satisfaction. They found that the differences in stability are not significant when comparing rules that use the same satisfaction function.[6]
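The kind of participation-rate test described above can be sketched as follows. This is only an illustration under stated assumptions: it implements a simple greedy rule over approval ballots (ties broken alphabetically), not MES, and the Jaccard overlap used as the stability measure, the function names and the toy data are all hypothetical rather than taken from the cited studies.

```python
import random

def greedy(costs, ballots, budget):
    """Fund projects in decreasing order of approval count while they fit."""
    votes = {p: sum(p in b for b in ballots) for p in costs}
    funded, spent = set(), 0
    for p in sorted(costs, key=lambda p: (-votes[p], p)):
        if spent + costs[p] <= budget:
            funded.add(p)
            spent += costs[p]
    return funded

def jaccard(a, b):
    """Overlap between two sets of funded projects (1.0 = identical)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def participation_stability(costs, ballots, budget, rate, trials=200, seed=0):
    """Average overlap between the full outcome and outcomes computed
    from random subsets containing a fraction `rate` of the voters."""
    rng = random.Random(seed)
    full = greedy(costs, ballots, budget)
    k = max(1, round(rate * len(ballots)))
    overlaps = [
        jaccard(full, greedy(costs, rng.sample(ballots, k), budget))
        for _ in range(trials)
    ]
    return sum(overlaps) / trials

# Hypothetical toy election.
example_costs = {"park": 50, "library": 40, "bikes": 30, "wifi": 20}
example_ballots = [
    {"park", "wifi"}, {"park", "library"}, {"library", "bikes"},
    {"park", "bikes"}, {"wifi"}, {"park"},
]
full_outcome = greedy(example_costs, example_ballots, budget=100)
half_turnout = participation_stability(example_costs, example_ballots, 100, rate=0.5)
```

Replacing `greedy` with another aggregation rule, and repeating the computation for several values of `rate`, would give a comparison in the spirit of the experiments above.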
To compute the outcomes, Yang et al. added to the subjects' votes some random votes generated using a realistic probability distribution. They then compared three types of explanations: mechanism explanation (a general explanation of how the aggregation rule works given the voting input), individual explanation (explaining how many voters had at least one approved project, or at least 10,000 CHF in approved projects), and group explanation (explaining how the budget is distributed among the districts and topics). They compared the perceived trustworthiness and fairness of greedy and equal shares, before and after the explanations. They found that:[6]
Voters found the mechanism explanation of MES harder to understand than that of greedy.
Despite this fact, voters found MES fairer and more trustworthy. This shows that they put more emphasis on outcome than on simplicity of mechanism.
For MES, mechanism explanation yields the highest increase in perceived fairness and trustworthiness; the second-highest was group explanation.
For greedy, mechanism explanation increases perceived trustworthiness but not fairness, whereas individual explanation increases both perceived fairness and trustworthiness. Group explanation decreases the perceived fairness and trustworthiness.
With greedy, fairness concepts were correlated with personal gain; with MES there was no such correlation, which indicates that MES encourages voters to take a more community-centered stance.
Rosenfeld and Talmon conducted two experiments:[7]
In the first experiment, they assumed that agents have known and additive utilities (they presented a story of a mall manager, who has to partition a budget among projects that will yield different monetary revenue to different stores in the mall). They compared five aggregation methods: utilitarian rule, Nash-product rule, egalitarian rule, minimal transfers over costs[8] and BPJR.[9] They constructed random scenarios in which each of these rules yielded a different budget-allocation. They asked 100 subjects (university students from Israel and Poland) to help the mall manager choose among these 5 options. The Nash rule had the highest support, with the utilitarian rule a close second. Similar results were obtained when subjects were asked to choose by the verbal descriptions of the rules.
In the second experiment, they assumed that agents only report approval ballots (they presented a story of a residential building manager, who has to partition a budget among projects that will yield different benefits to different tenants). They compared five utility functions: dichotomous (1 if at least one approved project is funded, 0 otherwise); cardinality (number of approved projects funded); cost (total cost of approved projects funded); square root of cost; and maximum cost. They constructed random scenarios in which the utilitarian rule, with each of these utility models, yielded a different budget-allocation. They asked 80 subjects (Israeli students) to choose among these 5 options. They did the same experiment with the Nash aggregation rule. This yields indirect evidence about the most reasonable utility function. In both experiments, the aggregation based on the cardinality utility function scored best. This was consistent with the ranking of verbal descriptions for utilitarian aggregation, but not for Nash aggregation.
Similar results were found when more advanced students (M.Sc. students) were asked to construct the budget-allocation by themselves, rather than choose from 5 options.[7]
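The five approval-based utility functions from the second experiment can be written down compactly. The sketch below is illustrative only: the function names and toy data are hypothetical, and "square root of cost" is read here as the square root of the total cost of funded approved projects and "maximum cost" as the cost of the most expensive funded approved project, which are assumptions about the intended definitions.

```python
from math import sqrt

def funded_approved(approved, funded):
    """Projects that the voter approved and that were actually funded."""
    return approved & funded

def dichotomous(approved, funded, costs):
    return 1 if funded_approved(approved, funded) else 0

def cardinality(approved, funded, costs):
    return len(funded_approved(approved, funded))

def cost_utility(approved, funded, costs):
    return sum(costs[p] for p in funded_approved(approved, funded))

def sqrt_cost(approved, funded, costs):
    # Assumption: square root of the total cost, not a sum of square roots.
    return sqrt(cost_utility(approved, funded, costs))

def max_cost(approved, funded, costs):
    return max((costs[p] for p in funded_approved(approved, funded)), default=0)

def utilitarian_welfare(utility, ballots, funded, costs):
    """Sum of voters' utilities under the chosen utility model."""
    return sum(utility(b, funded, costs) for b in ballots)

# Hypothetical toy data.
example_costs = {"a": 4, "b": 9, "c": 16}
example_funded = {"b", "c"}
example_ballot = {"a", "b", "c"}
```

Because the models weigh the same funded set differently, a utilitarian rule can select a different budget allocation under each of them, which is what the scenarios in the experiment relied on.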
Peters and Skowron conducted a simulation experiment: they took the votes from the PB in Warsaw, which were aggregated using the greedy algorithm, and compared the outcome to aggregation using equal shares. Their conclusions are:[10]
The average cardinality-satisfaction is 4.5 with greedy and 6 with MES.
The share of voters with positive satisfaction is 87% with greedy and 93% with MES.
The average cost-satisfaction is 27% of the total budget with greedy and 24% with MES.
The distribution of voter satisfaction is more equal with MES than with greedy.
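The summary statistics listed above can be computed from approval ballots and a funded set along the following lines. This is a minimal sketch with hypothetical names and toy data, not the procedure used in the cited study; in particular it assumes that cost-satisfaction is reported as the average, over voters, of the cost of their funded approved projects divided by the total budget.

```python
def satisfaction_summary(costs, ballots, funded, budget):
    """Summarize voter satisfaction with a given set of funded projects."""
    n = len(ballots)
    # Cardinality-satisfaction: number of approved projects that were funded.
    card = [len(b & funded) for b in ballots]
    # Cost-satisfaction: total cost of approved projects that were funded.
    cost = [sum(costs[p] for p in b & funded) for b in ballots]
    return {
        "avg_cardinality": sum(card) / n,
        "share_positive": sum(c > 0 for c in card) / n,
        "avg_cost_share": sum(cost) / (n * budget),
    }

# Hypothetical toy election: only project "a" is funded.
example_costs = {"a": 60, "b": 40}
example_ballots = [{"a"}, {"a", "b"}, {"b"}, {"a"}]
example_summary = satisfaction_summary(
    example_costs, example_ballots, funded={"a"}, budget=100
)
```

Running the same function on the outcomes of two aggregation rules over the same ballots gives a side-by-side comparison of the kind reported above.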