author:

Lewis Hammond
James Fox
Tom Everitt
Alessandro Abate
Michael Wooldridge bibliography:
ms.bib title: "Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice"

Introduction {#sec:intro}

Multi-agent influence diagrams (MAIDs) are a compact and expressive graphical representation for non-cooperative games. Introduced by Koller and Milch (henceforth K&M) [@koller2003multi; @milch2008ignorable], they offer three key advantages over the classic extensive form game (EFG) representation. First, MAIDs can depict many games more compactly than EFGs, especially those with incomplete information. Second, MAIDs encode conditional independencies between variables. This means large MAIDs can often be decomposed into several smaller ones, with potentially exponential speedups for finding Nash equilibria [@koller2003multi]. Third, MAIDs often make it possible to explicitly represent aspects of game structure that are obscured in EFGs. While it is possible to convert any EFG to a MAID of at most the same size (Section 3.3.2{reference-type="ref" reference="sec:EFGtoMAID"}), it is true that EFGs are sometimes better suited for modelling asymmetric decision problems. With that said, every model has its weaknesses, and how useful a particular representation is rests on its strengths. We further develop both the theory and practical tools for MAIDs in order to allow both researchers and practitioners to make the most of their strengths.

Previous work on MAIDs has focussed on Nash equilibria as the core solution concept [@nash1950equilibrium]. Whilst this is arguably the most important solution concept in non-cooperative game theory, if there are many Nash equilibria we often wish to remove some of those that are less 'rational'. Many refinements to the Nash equilibrium have been proposed [@Maschler2013], with two of the most important being subgame perfect Nash equilibria [@selten1965spieltheoretische] and trembling hand perfect equilibria [@selten1974reexamination]. The first rules out 'non-credible' threats and the second requires that each player is still playing a best-response when other players make small mistakes. On the practical side, while much software exists for normal or extensive form games, there is no such implementation for reasoning about games expressed as MAIDs, despite their computational advantages.

Contribution

In this paper, we make the following contributions. First, we extend the applicability of MAIDs by introducing the concept of a MAID subgame (Section 3.1{reference-type="ref" reference="sec:subgames"}) and build on this concept to introduce subgame perfect and trembling hand perfect equilibrium refinements (Section 3.2{reference-type="ref" reference="sec:eqrefine"}). Second, we prove several equivalence results between MAIDs and EFGs, demonstrating the preservation of the key game-theoretic concepts described above when representing EFGs as MAIDs and thus further justifying the use of this model. These proofs are constructive and are based on procedures for converting between EFGs and MAIDs, the full details of which are included in Appendices 6.1{reference-type="ref" reference="sec:macid2efg"} and 6.2{reference-type="ref" reference="sec:EFG2MAID"}. Third, we report on our open source codebase for computing our equilibrium refinements in MAIDs (Section 4{reference-type="ref" reference="sec:implementation"}).

Related Work

Our work builds primarily on the seminal work of K&M [@koller2003multi; @milch2008ignorable]. More recently, casual influence diagrams (CIDs) have been defined [@Everitt2021], where the probabilistic arrows in influence diagrams are interpreted as describing a causal relationship, in accordance with Pearl's graphical causal models [@pearl2009causality]. CIDs model single agents, helping to predict behaviour by identifying the incentives that arise due to the agent optimising its objective, and have been shown to have many applications [@carey2020incentives; @everitt2019modeling; @everitt2019reward; @holtman2020agi; @Langlois2021]. Our equilibrium refinements for MAIDs and our implementation are partially targeted at extending this work on incentives to the multi-agent setting.

Pfeffer and Gal investigated when an agent is motivated to care about its decision in the context of MAIDs, identifying four reasoning patterns (with associated graphical criteria) that justify a particular decision choice [@pfeffer2007reasoning]. Later work showed practical applications of these reasoning patterns, which can lead to safer human-machine or machine-machine designs and again reduce the time complexity of computing Nash equilibria [@antos2012identifying]. In this work we implement these reasoning patterns in our codebase. Building on this, further research could consider which reasoning patterns arise when agents are playing a certain equilibrium refinement.

Several other formalisms, often partly inspired by MAIDs, have been proposed for representing and reasoning about games as probabilistic graphical models. For example: networks of influence diagrams represent mental models of the different agents as nodes in a graph and use these to describe and reason about belief structures [@Gal2008]; settable systems extend structural equation models to include the concept of optimisation and hence the idea of a 'best response', which is key to defining game-theoretic equilibria [@White2009]; temporal action graph games are similar to MAIDs, but can be more compact for games that involve anonymity or context-specific utility independencies [@Jiang2009]. These works, however, focus on the introduction of novel representations, whereas we focus on deepening the theory and practice behind an existing representation. It is an interesting question for further research whether our insights also apply to these related models.

Background

In this section, we define EFGs and MAIDs and show how their graphical representation of games differ with the help of the following example [@spence1978job].

[[ex:hiring]]{#ex:hiring label="ex:hiring"}

A company employs an AI system to automate their hiring process. A naturally hard-working or naturally lazy worker wants a job at this company and believes that a university degree will increase their chance of being hired; however, they also know that they will suffer an opportunity cost from three years of studying. A hard-worker will cope better with a university workload than a lazy worker. The algorithm must decide, on behalf of the company, whether to hire the worker. The company wants to hire someone who is naturally hard-working, but the algorithm can't observe the worker's temperament directly, it can only infer it indirectly through whether or not the worker attended university.

We use capital letters $X$ for variables and let $\dom(X)$ denote the domain of X. An assignment $x \in \dom(X)$ to $X$ is an instantiation of $X$ denoted by $X=x$. $\bm{X} = {X_1, \dots, X_n}$ is a set of variables with domain $\dom(\bm{X}) = \times^{n}_{i=1}\dom(X_i)$ and $\bm{x} = {x_1, \dots, x_n}$ is the set containing an instantiation of all variables in $\bm{X}$. We let $\Pa_V$ denote the parents of a node $V$ in a graphical representation and $\pa_V$ be the instantiation of $\Pa_V$. $\Ch_V$, $\Anc_V$, $\Desc_V$, and $\Fa_V \coloneqq \Pa_V \cup {V}$ are the children, ancestors, descendants, and family of $V$ with, analogously to $\pa_V$, their instantiations written in lowercase. Unless otherwise indicated we index mathematical objects with superscripts $i \in \bm{N}$ to denote their affiliation with a player $i$ (where $\bm{N}$ is a set of players) and with subscripts $j \in \mathbb{N}$ to enumerate them.

Extensive Form Games

An extensive-form game (EFG) $\efg$ is a tuple
$(\bm{N}, T, \bm{P}, \bm{D}, \lambda, \bm{I}, U), \textrm{where:}$

$\bm{N} = {1,\dots,n}$ is a set of agents.
$T = (\bm{V}, \bm{E})$ is a game tree with nodes $\bm{V}$ that are partitioned into the sets $\bm{V}^0, \bm{V}^1, \dots, \bm{V}^n, \bm{L}$ where $R \in \bm{V}$ is the root of $T$, $\bm{L}$ is the set of leaves of the tree, $\bm{V}^0$ is the set of chance nodes, and $\bm{V}^i$ is the set of nodes controlled by player $i \in \bm{N}$. These nodes are connected by edges $\bm{E} \subseteq \bm{V} \times \bm{V}$.
$\bm{P} = {P_1,\dots,P_{\vert\bm{V}^0\vert}}$ is a set of probability distributions where each $P_j : \Ch_{V_j} \rightarrow [0,1]$ determines the probability of a path through the game tree that has reached chance node $V^0_j$ proceeding to each child node in $\Ch_{V^0_j}$.
$\bm{D}$ is a set of decisions, we write $\bm{D}^i_j \subseteq \bm{D}$ to describe the set of available decisions at node $V^i_j \in \bm{V}^i$.
$\lambda : \bm{E} \rightarrow \bm{D}$ is a labelling function mapping an edge $(V^i_j, V^k_l)$ to a decision $d \in \bm{D}^i_j$.
$\bm{I} = {\bm{I}^1,\dots,\bm{I}^n}$ is a set such that for each player $\bm{I}^i \subset 2^{\bm{V}^i}$ defines a partition of the vertices controlled by player $i$ into information sets.
$U : \bm{L} \rightarrow \mathbb{R}^n$ is a utility function mapping each leaf node to a vector that determines the final payoff for each player.

An information set $I_j^i \in \bm{I}^i$ is defined such that for all $V^i_k, V^i_l \in I_j^i$ we have $\bm{D}^i_j \coloneqq \bm{D}^i_k = \bm{D}^i_l$. In other words, the same player $i$ selects the decision and the same decisions are available at each of the nodes in an information set. When $\vert I_j^i\vert = 1$ for all $i$ and $j$, $\efg$ is a perfect information game. A (behavioural) strategy $\sigma^i$ for a player $i$ is a set of probability distributions $\sigma_j^i : \bm{D}^i_j \rightarrow [0,1]$ over the actions available to the player at each of their information sets $I_j^i$.[^1] A strategy is pure when $\sigma_j^i(d) \in {0,1}$ for all information sets $I_j^i$ and fully mixed when $\sigma_j^i(d) > 0$ for all $d \in \bm{D}^i_j$. A strategy profile $\sigma = (\sigma^1, \dots, \sigma^n)$ is a tuple of strategies one for each player $i \in \bm{N}$. $\sigma^{-i} = (\sigma^1, \dots, \sigma^{i-1}, \sigma^{i+1},\dots, \sigma^n)$ denotes the partial strategy profile of all players other than $i$, and so $\sigma = (\sigma^i, \sigma^{-i})$. The combination of the distributions in $\bm{P}$ with a strategy profile $\sigma$ thus defines a full probability distribution $P^\sigma$ over paths in $\mathcal{G}$.

(0)[chance node]$V^0$ <grow=left>h[a] <grow=right>l[a] <grow=left>[b] <grow=right>[b] (1)(0-1)<180, red!70>$V^1_1$ <grow=north>g[l] <grow=south>a[l] (2)(0-2)<0, red!70>$V^1_2$ <grow=north>g[r] <grow=south>a[r] '[north](a1)(1-1)<315, blue!70>$V^2_1$ [bl](4,3) [br](-1,-1) (b1)(1-2)<45, blue!70>$V^2_2$ [al](5,3) [ar](0,-1) (a2)(2-2)<100, blue!70>$V^2_3$ [al](3,-2) [ar](0,0) '[north](b2)(2-1)<225, blue!70>$V^2_4$ [bl](2,-2) [br](-2,0) (a1)(b2)[$I^2_1$]{style="color: blue!70"} (b1)(a2)[$I^2_2$]{style="color: blue!70"}[below] (1)(1-1)(1-2)(2-1)(2-2)(a1-1)(a1-2)(b1-1)(b1-2)(a2-1)(a2-2)(b2-1)(b2-2)[black,inner sep = 12pt]

Figure [fig:signalEFG]{reference-type="ref" reference="fig:signalEFG"} shows Example [ex:hiring]{reference-type="ref" reference="ex:hiring"}'s signalling game in extensive form. Nature, as a chance node $V^0$, flips a biased coin at the root of the tree to decide whether the person is hard-working (probability $p$) or lazy (probability $1-p$). The worker (player 1)'s decision whether to go ($g$) or avoid ($a$) university is represented at nodes ${V^1_1, V^1_2} = \bm{V}^1$ and there are four nodes ${V^2_1, V^2_2, V^2_3, V^2_4} = \bm{V}^2$ for the hiring algorithm (player 2), each with two decision options: reject ($r$) or job offer ($j$). These nodes are split into two information sets (dotted lines between nodes) because the algorithm does not know whether the person is naturally hard-working. The payoffs for the worker and the employer respectively are given at the leaves of the tree.

Multi-Agent Influence Diagrams {#sec:MACID}

Following recent work [@howard2005influence; @Everitt2021], we depart slightly from the convention of K&M to distinguish between an influence diagram, which gives the structure of a strategic interaction, and an influence model, which adds a particular parametrisation to the diagram.

[[def:MAID]]{#def:MAID label="def:MAID"} A multi-agent influence diagram (MAID) is a triple $(\bm{N}, \bm{V}, \bm{E})$, where:

$\bm{N} = {1,\dots,n}$ is a set of agents.
$(\bm{V}, \bm{E})$ is a directed acyclic graph (DAG) with a set of vertices $\bm{V}$ connected by directed edges $\bm{E} \subseteq \bm{V} \times \bm{V}$. These vertices are partitioned into $\bm{D}$, $\bm{U}$, and $\bm{X}$, which correspond to decision, utility, and chance nodes respectively. $\bm{D}$ and $\bm{U}$ are in turn partitioned into ${\bm{D}^i}{i\in \bm{N}}$ and ${\bm{U}^i}{i\in \bm{N}}$ corresponding to their association with a particular agent $i \in \bm{N}$.

[[def:MAIM]]{#def:MAIM label="def:MAIM"} A multi-agent influence model (MAIM) is a tuple $(\bm{N}, \bm{V}, \bm{E}, \theta)$ where $(\bm{N}, \bm{V}, \bm{E})$ is a MAID and:

$\theta \in \Theta$ is a particular parametrisation over the nodes in the graph specifying a finite domain $\dom(V)$ for each node $V \in \bm{V}$, real-valued domains $\dom(U) \subset \mathbb{R}$ for each $U \in \bm{U}$, and a set of conditional probability distributions (CPDs) $\Pr(\bm{V} \mid \Pa_V)$ for every chance and utility node. Taken together, the CPDs form a partial distribution $\Pr(\bm{X},\bm{U} : \bm{D}) = \prod_{V \in \bm{V} \setminus \bm{D}} \Pr(V \mid \Pa_V)$ over the variables in the MAID.
The value $u \in \dom(U)$ of a utility node is a deterministic function of the values of its parents $\pa_U \in \dom(\Pa_U)$.

Figure [fig:signalMACID]{reference-type="ref" reference="fig:signalMACID"} a) shows the MAID for Example [ex:hiring]{reference-type="ref" reference="ex:hiring"} corresponding to the EFG in Figure [fig:signalEFG]{reference-type="ref" reference="fig:signalEFG"}. Whether the worker is hard-working or lazy is decided by nature's chance node $X$ (white circle). The worker's decision $D^1$ and utility $U^1$ nodes are depicted as a red rectangle and diamond respectively. The algorithm's decision $D^2$ and utility $U^2$ nodes are in blue. To instantiate a MAIM, CPD tables for $U^1$ and $U^2$ would be consistent with the payoffs and value of $p$ in Figure [fig:signalEFG]{reference-type="ref" reference="fig:signalEFG"}.

There are two types of directed edge in a MAID. Full edges leading into $\bm{X} \cup \bm{U}$ represent probabilistic dependence, as in a Bayesian network. Dotted edges leading into $\bm{D}$ represent information that is available to the agent at the time a decision is made (e.g. the edge $X \rightarrow D^1$). In this way, the values of the parents $\pa_D$ of a decision node $D$ represent the decision context for $D$. The CPDs of decision nodes are not defined when a MAIM is constructed because they are instead chosen by the agents playing the game. In general, a player's decision CPD need not be optimal.

This example demonstrates two clear advantages of MAIDs compared with EFGs. First, in many real world cases, MAIDs make it possible to explicitly represent aspects of game structure that are obscured in the extensive form. For example, in the EFG, information sets were drawn to reflect the fact that the algorithm does not know whether the worker is naturally hard-working or lazy when it selects its action. However, in the corresponding MAID, this incomplete information is represented simply by the fact that there is no edge $X \rightarrow D^2$. Moreover, the company's utility $U^2$ isn't a function of whether the applicant went to university or not -- it only cares whether the applicant is hard-working and whether or not they hired them. We can infer this from the EFG payoffs in Figure [fig:signalEFG]{reference-type="ref" reference="fig:signalEFG"}, but in the MAID this is shown instantly by the fact that there is no edge $D^1 \rightarrow U^2$.

Second, MAIDs can provide a more compact graphical representation of games [@koller2003multi; @milch2008ignorable]. In fact, the MAID representation of a game need never be bigger than the corresponding EFG and can be smaller in many cases. For example, there are four nodes $V^2_1, V^2_2, V^2_3, V^2_4$ in Figure [fig:signalEFG]{reference-type="ref" reference="fig:signalEFG"} which correspond to the company's decision. In the MAID, these are combined into one node, $D^2$.

A further strength of MAIMs derive from them being probabilistic graphical models, and so probabilistic dependencies between chance and strategic variables can be exploited. We recall the notion of d-separation, a graphical criterion for determining independence properties of the probability distribution associated with the graph. This is necessary for the concept of r-reachability (Section 2.5{reference-type="ref" reference="sec:stratrel"}) and consequently that of a MAID subgame (Section 3.1{reference-type="ref" reference="sec:subgames"}).

[[def:dsep]]{#def:dsep label="def:dsep"} A path $p$ in a MAID $(\bm{N}, \bm{V}, \bm{E})$ is said to be d-separated by a set of nodes $\bm{W} \subset \bm{V}$ if and only if either:

$p$ contains a chain $X \rightarrow Y \rightarrow Z$ or a fork $X \leftarrow Y \rightarrow Z$ and $Y \in \bm{W}$.
$p$ contains a collider $X \rightarrow Y \leftarrow Z$ and $({Y} \cup \Desc_Y) \not\subseteq \bm{W}$.

A set $\bm{W}$ d-separates $\bm{X}$ from $\bm{Y}$, denoted $\bm{X} \perp \bm{Y} \mid \bm{W}$, if and only if $\bm{W}$ d-separates every path from a node in $\bm{X}$ to a node in $\bm{Y}$. Sets of variables that are not d-separated are said to be d-connected, denoted $\bm{X} \not\perp \bm{Y} \mid \bm{W}$. If $\bm{X}$ and $\bm{Y}$ are d-separated conditioning on $\bm{W}$, then $\bm{X}$ and $\bm{Y}$ are probabilistically independent in the sense that $P(\bm{X}\mid \bm{Y}, \bm{W}) = P(\bm{X}\mid \bm{W})$.

For example, there are several paths from $U^2$ to $U^1$ in Figure [fig:signalMACID]{reference-type="ref" reference="fig:signalMACID"} a): direct forks through $X$ or $D^2$, a fork through $X$ and then a forward chain through $D^1$, or a backward chain through $D^2$ and then a fork through $D^1$. If $\bm{W} = \varnothing$ then $U^2$ is d-connected to $U^1$ ($U^2 \not\perp U^1\mid\varnothing$), but if $\bm{W} = {X, D^2}$ then all of the paths have been d-separated by conditioning on $\bm{W}$ and so $U^2 \perp U^1\mid\bm{W}$.

Policies {#sec:policies}

An agent makes a decision depending on the information it observes prior to making that decision. Therefore, in a MAIM, a decision rule $\pi_D$ for a decision node $D$ is a CPD $\pi_D(D\mid\Pa_D)$. A partial policy profile $\pi_{\bm{A}}$ is an assignment of decision rules $\pi_D(D\mid\Pa_D)$ to some subset $\bm{A} \subset \bm{D}$ and $\pi_{-\bm{A}}$ is the set of decision rules for all $D \in \bm{D} \setminus \bm{A}$. For example, $\pi_{D}$ refers to a decision rule at decision node $D$ and so $\pi_{-D} = \prod_{D' \in \bm{D} \setminus {D}} \pi_{D'}(D'\mid\Pa_{D'})$ denotes the partial policy profile over all of the MAIM's other decision nodes $\bm{D} \setminus {D}$. We refer to $\pi_{\bm{D}^i}$, which describes all the decision choices made by agent $i \in \bm{N}$, as that agent's policy, $\pi^i$, and we write $\pi^{-i} = (\pi^1, \dots, \pi^{i-1}, \pi^{i+1}, \dots, \pi^n)$ to denote the set of policies made by all agents other than agent $i$. A policy profile $\pi$ assigns a policy to every agent $\pi = (\pi^1, \dots, \pi^n)$; it describes all the decisions made by every agent in the MAIM. We denote spaces of policy profiles by $\Pi$ (e.g. $\Pi_{\bm{A}}$, $\Pi^i$, and $\Pi$).

If for every $\pa_d \in \dom(\Pa_D)$ and $d \in \dom(D)$ we have $\pi_D(d\mid\pa_D) \in {0,1}$, the decision rule is said to be pure or deterministic. Otherwise, the decision rule is said to be mixed and it is fully mixed if, for every $\pa_D$ and every $d$, we have $\pi_D(d\mid\pa_D) > 0$. Pure, mixed, and fully mixed policies or policy profiles are defined analogously.

When a partial policy profile $\pi_{\bm{A}}$ is applied to a MAIM $\macid$, a new MAIM $\macid(\pi_{\bm{A}})$ is obtained in which each decision node $D \in \bm{A}$ becomes a chance node with a CPD equal to $\pi_D$. In the case of a policy profile, all decision nodes are turned into chance nodes, and so the induced MAIM $\mathcal{M(\pi)}$ is now a Bayesian network (utility nodes are interpreted as chance nodes when a MAIM is viewed as a Bayesian network). This defines the joint probability distribution $\Pr^\pi$ over all variables in $\mathcal{M}$ and may be used for probabilistic inference.

Utilities {#sec:reward}

In an EFG $\efg$ the expected utility for each player depends on the set of probability distributions $\bm{P}$ and strategy profile $\sigma$ which give a full probability distribution ${P}^\sigma$ over the paths in $\efg$. For each path $\rho$ beginning from the root $R$ of $\efg$'s tree and terminating in a unique leaf node $\rho[\bm{L}]$, player $i$ receives utility $U(\rho[\bm{L}])[i]$ -- the $i$^th^ entry in the corresponding payoff vector. By playing strategy profile $\sigma$, player $i$'s expected utility $\mathcal{U}^i_{\efg}(\sigma) \coloneqq \sum_\rho P^\sigma(\rho) U(\rho[\bm{L}])[i]$.

Similarly, the joint distribution $\Pr^\pi$ induced by the policy profile $\pi$ in a MAIM $\macid$ allows us to define the expected utility for each player under this policy profile. Agent $i$'s expected utility from policy profile $\pi$ is the sum of the expected value of utility nodes $\bm{U}^i$ given by $\mathcal{U}^i_{\mathcal{M}}(\pi) \coloneqq \sum_{U_j \in \bm{U}^i}\sum_{u_j \in \dom(U_j)} !! u_j \Pr^\pi(U_j = u_j)$. We assume that each agent's goal is to select a policy $\pi^i$ that maximises its expected utility. Therefore, we can now define what it means for an agent to optimise $\pi_{\bm{A}}$ for a set of decisions $\bm{A} \subseteq \bm{D}^i$, given a partial policy profile $\pi_{-\bm{A}}$ over all of the other decision nodes in $\macid$. We write $\mathcal{U}^i_{\mathcal{M}}(\pi_{\bm{A}}, \pi_{\bm{-A}})$ to denote the expected utility for player $i$ under the policy profile $\pi = (\pi_{\bm{A}}, \pi_{\bm{-A}})$.

Let $\bm{A} \subseteq \bm{D}^i$. Player $i$'s partial policy $\pi_{\bm{A}}$ is optimal for a policy profile $\pi=(\pi_{\bm{A}}, \pi_{-\bm{A}})$ if $\mathcal{U}^i_\mathcal{M}(\pi_{\bm{A}}, \pi_{-\bm{A}}) \geq \mathcal{U}^i_\mathcal{M}(\hat{\pi}{\bm{A}}, \pi{-\bm{A}})$ for all $\hat{\pi}{\bm{A}} \in \Pi{\bm{A}}$. Player $i$'s policy $\pi^{i}$ is a best response to the partial policy profile $\pi^{-i}$ assigning policies to the other agents if $\mathcal{U}^i_\mathcal{M}(\pi^{i}, \pi^{-i}) \geq \mathcal{U}^i_\mathcal{M}(\hat{\pi}^i, \pi^{-i})$ for all $\hat{\pi}^i \in \Pi^i$. [[def:bestresponse]]{#def:bestresponse label="def:bestresponse"}

Strategic and Probabilistic Relevance {#sec:stratrel}

To optimise a particular decision rule, we often want to know which other decision rules need to already be known. This is captured by K$&$M's concept of strategic relevance.

[[def:stratrel]]{#def:stratrel label="def:stratrel"} Let $D_k, D_l \in \bm{D}$ be decision nodes in a MAIM $\mathcal{M}$. $D_l$ is strategically relevant to $D_k$ ($D_k$ strategically relies on $D_l$) if there exist two policy profiles $\pi$ and $\pi'$ and a decision rule $\pi_{D_k}$, such that:

$\pi_{D_k}$ is optimal for $\pi$.
$\pi$ differs from $\pi'$ only at $D_l$.
$\pi_{D_k}$ is not optimal for $\pi'$, and neither is any decision rule $\hat{\pi}{D_k}$ that agrees with $\pi{D_k}$ for all instantiations $\pa_{D_k}$ of $D_k$'s parents where the joint probability $\Pr^{\pi'}(\pa_{D_k}) > 0$.

The first two conditions say that if decision rule $\pi_{D_k}$ is optimal for a policy profile $\pi$, and $D_k$ does not strategically rely on $D_l$, then $\pi_{D_k}$ must also be optimal for any policy profile $\pi'$ that differs from $\pi$ only at $D_l$. The third condition deals with sub-optimal decisions in response to zero-probability decision contexts.

A related question is probabilistic relevance, which considers whether the probability distribution of a chance or utility node $X$ can influence the optimal policy.

Let $D$ be a decision node in a MAID $\macid$. A chance or utility node $Z \in \bm{X} \cup \bm{U}$ is probabilistically relevant to $D$ if the set of optimal decision rules for $D$ varies with the CPDs assigned to $Z$ under some joint policy profile $\pi$.

We generalise K&M's graphical criterion, s-reachability, as r-reachability to determine both strategic relevance and probabilistic relevance. Essentially, the criterion assesses whether knowing the CPD or decision rule of a node $V$ can have positive value of information [@Everitt2021]. The criterion is sound (if $V$ is relevant to $D$, then $V$ is r-reachable from $D$) and complete (if $V$ is r-reachable from $D$ then there is some parametrisation $\theta$ of the MAID and some policy profile $\pi$ such that $V$ is relevant to $D$). One can then further use r-reachability to define a relevance graph over $\bm{D}$.

[[def:rreachable]]{#def:rreachable label="def:rreachable"} A node $V$ is r-reachable from a decision $D \in \bm{D}^i$ in a MAID, $\macid = (\bm{N}, \bm{V}, \bm{E})$, if a newly added parent $\hat V$ of $V$ satisfies $\hat V \not\perp \bm{U}^i \cap \Desc_{D} \mid \Fa_{D}$.

[[def:relgraph]]{#def:relgraph label="def:relgraph"} The directed relevance graph for $\mathcal{M}$, denoted by $Rel(\mathcal{M}) = (\bm{D}, \bm{E}{Rel})$, is a graph where $\bm{D}$ is the set of $\mathcal{M}$'s decision nodes connected by directed edges $\bm{E}{Rel} \subseteq \bm{D} \times \bm{D}$. There is a directed edge from $D_j \rightarrow D_k$ if and only if $D_k$ is r-reachable from $D_j$.[^2]

Relevance graphs show which other decisions each decision depends on. The relevance graph for Example [ex:hiring]{reference-type="ref" reference="ex:hiring"}'s MAIM in Figure [fig:signalMACID]{reference-type="ref" reference="fig:signalMACID"} b) is cyclic because each decision node strategically relies on the other. The worker would be better off knowing the company's hiring policy before deciding whether or not to go to university, but the algorithm would also be better off knowing the worker's policy because it doesn't know the worker's temperament (lazy or hard-working). Our second example provides a case of acyclic strategic relevance.

[[ex:taxis]]{#ex:taxis label="ex:taxis"} Two autonomous taxis, operated by different companies, are driving along a road with two hotels located next to one another -- one expensive and one cheap. Each taxi must decide (one first, then the other) which hotel to stop in front of, knowing that it will likely receive a higher tip from guests of the expensive hotel. However, if both taxis choose the same location, this will reduce each taxi's chance of being chosen by that hotel's guests.

(0)<90,red!70>$V^1$+20mm..40mm+ [al] [ar] (1)(0-1)<180, blue!70>$V^2_1$ [al](2,2) [ar](5,3) (2)(0-2)<0, blue!70>$V^2_2$ [al](3,5) [ar](1,1) (1)(1-1)(1-2)[black,inner sep = 15pt] (2)(2-1)(2-2)[black,inner sep = 15pt] (0)(1-1)(1-2)(2-1)(2-2)[black,inner sep = 18pt]

Because the second taxi can observe which hotel the first taxi chooses to park in front of, it doesn't need to know the first taxi's policy in order to optimise its own; the second taxi's decision ($D^2$) does not strategically rely on the first taxi's decision ($D^1$). However, the first taxi would be better off knowing the second taxi's policy before deciding its own. For example, with the parametrisation in Figure [fig:taxiMACID]{reference-type="ref" reference="fig:taxiMACID"} e), if the first taxi knows that the second taxi's policy is to always park in front of the expensive hotel, the first taxi ought to always park in front of the cheaper hotel. $D^1$ does strategically rely on $D^2$. In Section 4{reference-type="ref" reference="sec:implementation"}, we shall see that it is easier to compute equilibria in MAIMs with acyclic relevance graphs.

(D1) [right = 2 of U, relevanceb] $D^1$; (D2) [below = of D1, relevanceb] $D^2$; D2;

Game Theory for MAIDs

In this section, we present novel material. We begin by defining MAID and MAIM subgames. These set up our discussion of several equilibrium refinements in MAIMs. Finally, we present and prove a number of equivalence results between EFGs and MAIMs.

Subgames {#sec:subgames}

In an EFG, a subtree of the original game tree is an EFG subgame if it is closed under information sets and descendants. Figure [fig:taxiEFG]{reference-type="ref" reference="fig:taxiEFG"} shows all the EFG subgames (dashed boxes) for the game described by Example [ex:taxis]{reference-type="ref" reference="ex:taxis"}. Any game tree is an EFG subgame of itself, and so an EFG subgame on a strictly smaller set of nodes is called a proper EFG subgame. We propose an analoguous definition for MAIDs. Just like for EFGs, MAIM subgames are parts of the game that can be solved independently.

A subgame base for a MAID $(\bm{N}, \bm{V}, \bm{E})$ is a subset $\bm{V'}\subseteq \bm{V}$ such that:

For any $X,Y\in \bm{V}'$ and any directed path $X \to \cdots \to Y$ in $\macid$, all nodes on the path are also in $\bm{V'}$.
$\bm{V}'$ is closed under r-reachability, i.e. if a node $Z$ is r-reachable from a decision $D\in \bm{V'}$, then $Z$ is also in $\bm{V'}$.

[[def:MACIDsubgame]]{#def:MACIDsubgame label="def:MACIDsubgame"} Let $\macid = (\bm{N}, \bm{V}, \bm{E})$ be a MAID, and let $\bm{V'}\subseteq \bm{V}$ be a subgame base. The MAID subgame corresponding to $\bm{V'}$, is a new MAID $\macid'=(\bm{N'}, \bm{V'}, \bm{E}')$ where:

$\bm{N'} = {i \in \bm{N} \mid \bm{D}^i \cap \bm{V'} \neq \varnothing }$, the players restricted to $\bm{V'}$.
$\bm{V'}$ is partitioned into $\bm{D'} = \bm{D}\cap \bm{V'}$, $\bm{U'} = \bm{U} \cap \Desc_{\bm{D'}}$, and $\bm{X'} = \bm{V'} \setminus (\bm{D'} \cap \bm{U'})$.
$\bm{E}'$ is the subset of edges in $\bm{E}$ that connect two nodes in $\bm{V'}$.

Analogously, the MAIM subgame of a MAIM $(\bm{N}, \bm{V}, \bm{E}, \theta)$ corresponding to a subset $\bm{V'}\subseteq \bm{V}$ and an instantiation $\bm{y}$ of the nodes $\bm{Y} = \bm{V}\setminus \bm{V'}$, is the modified MAIM $(\bm{N'}, \bm{V'}, \bm{E}', \theta')$ where:

$(\bm{N'}, \bm{V'}, \bm{E}')$ is the MAID subgame corresponding to $\bm{V'}$.
$\theta'$ is like $\theta$, restricted to nodes in $\bm{V'}$. If a node $X\in \bm{V'}$ has some parents outside of $\bm{V'}$ (i.e. in $\bm{Y}$) then $\Pr'(X\mid \pa'_X) = \Pr(X\mid \pa'_X, \bm{y}')$, where $\Pa'_X=\Pa_X\cap \bm{V'}$, $\bm{Y}' = \Pa_X\cap \bm{Y}$, $\Pr$ is the CPD of $X$ in $\theta$, and $\Pr'$ becomes the CPD of $X$ in $\theta'$.

In fact, only the setting $\bm{y}$ of the nodes that have a child in $\bm{V'}$ will matter. A MAIM subgame is feasible if there exists a policy profile $\pi$ where $\Pr^\pi(\bm{y}) > 0$.

In a sequential game with perfect information, the MAIM subgames will be in one-to-one correspondence with the subgames in any corresponding EFG. For example, Figures [fig:taxiEFG]{reference-type="ref" reference="fig:taxiEFG"} and [fig:taxiMACID]{reference-type="ref" reference="fig:taxiMACID"} show the EFG and MAID subgames of Example [ex:taxis]{reference-type="ref" reference="ex:taxis"}. As with EFG subgames, a MAID is trivially a MAID subgame of itself, as in Figure [fig:taxiMACID]{reference-type="ref" reference="fig:taxiMACID"} a). Figure [fig:taxiMACID]{reference-type="ref" reference="fig:taxiMACID"} c) shows the only proper MAID subgame of $\macid$. Two MAIM subgames are associated with this MAIM subgame: one for each value of $D^1$. The additional independencies represented by a MAID sometimes yields more independently solvable components than identifiable in an EFG representation; i.e., there can be more subgames in a MAIM than in a corresponding EFG (Appendix 6.4{reference-type="ref" reference="sec:extra_subgames"}).

A better sense of MAID subgames can be gained from looking at the strongly connected components (SCC) of the relevance graph, where recall that an SCC is a subgraph containing a directed path between every pair of nodes. A maximal SCC is an SCC that is not a strict subset of any other SCC. We can use this fact to define a condensed relevance graph, called the component graph by K$&$M, which aggregates each SCC into a single node.

[[def:ConRel]]{#def:ConRel label="def:ConRel"} For a given MAID $\mathcal{M} = (\bm{N}, \bm{V}, \bm{E})$, let $\bm{C}$ be the set of maximal SCCs of its relevance graph $Rel(\macid)$. The condensed relevance graph of $\mathcal{M}$ is the directed graph $ConRel(\macid) = (\bm{C}, \bm{E}_{ConRel})$. There is an edge $\bm{C}_m \rightarrow \bm{C}_l$ between $\bm{C}_m, \bm{C}_l\in\bm{C}$ if and only if there exists $C_m\in \bm{C}_m$ and $C_l\in \bm{C}_l$ with and edge $C_m\to C_l$ in $Rel(\macid)$.

Subgraphs of $ConRel(\macid)$ closed under descendants induce MAID subgames. Figures [fig:taxiMACID]{reference-type="ref" reference="fig:taxiMACID"} b) and d) highlight the nodes of the respective MAID subgames (since the relevance graph is acyclic here, condensing it has no effect). As the condenstion of a directed graph is always acyclic [@cormen2009introduction page 617], games can always be solved via backwards induction over the condensed relevance graph [@koller2003multi].

Equilibrium Refinements {#sec:eqrefine}

MAIDs represent dynamic games of incomplete information -- those in which at least one player $i$ does not have perfect information about the chance variables $\bm{X}$ (commonly interpreted as not knowing the type $T^i$ of the other agents, where $T^i$ defines the payoffs for agent $i$) -- and thus admit discussion of the beliefs that agents possess. In this work, we implicitly view such beliefs as defined by the induced distribution $\Pr^\pi$ and so eschew further discussion of them here; in a sense, chance variables can be viewed as decisions by nature (using fixed stochastic policies).

In non-cooperative games, the most fundamental solution is a Nash equilibrium [@nash1950equilibrium], a policy profile such that no agent has an incentive to unilaterally deviate. In other words, every player is simultaneously playing a best-response against all other players.

[[def:NE]]{#def:NE label="def:NE"} A full policy profile $\pi$ is a Nash equilibrium (NE) in a MAIM $\mathcal{M}$ if, for every player $i \in \bm{N}$, $\mathcal{U}^i_\mathcal{M}(\pi^i, \pi^{-i}) \geq \mathcal{U}^i_\mathcal{M}(\hat{\pi}^i, \pi^{-i})$ for all $\hat{\pi}^i \in \Pi^i$.

The concept of a subgame perfect equilibrium (SPE) was introduced by Reinhard Selten to address the issue that EFGs admit NEs with non-credible threats -- equilibria in which a player threatens to take some action that, if the player is rational, they would never actually carry out [@selten1965spieltheoretische; @selten1974reexamination]. In an EFG, a strategy profile is an SPE if it induces an NE in every EFG subgame; this eliminates all NEs containing non-credible threats. Our definition of MAID subgames above allows us to introduce an analogous equilibrium concept.

A full policy profile $\pi$ is a subgame perfect equilibrium (SPE) in a MAIM $\macid$ if $\pi$ is an NE in every MAIM subgame of $\macid$.

Figure [fig:taxiNE]{reference-type="ref" reference="fig:taxiNE"} shows the three pure NEs of Example [ex:taxis]{reference-type="ref" reference="ex:taxis"}. The policy profiles in b) and c) are NEs but not SPEs. To see why, consider the proper MAIM subgame when $D^1 = e$ and policy profile b). Here player 2 obtains utility 3 if they choose $c$ and utility 2 if they choose $e$. Therefore, player 2 is making a non-credible threat whenever $\pi^2(D^2=e \mid D^1=e) > 0$. For similar reasons, policy profile c) is also not SPE. Therefore a) is the only SPE of this MAIM.

Within maximal SCCs of $Rel(\macid)$, in which there are no proper subgames, the agents choose decision rules interdependently. This can lead to arbitrarily bad decision rules in decision contexts that occur with probability zero. Trembling hand equilibria offer a useful NE refinement in these situations [@selten1974reexamination]. Intuitively, they require that each player's policy is still a best response when the other players make mistakes, or 'tremble', with small probability. Let $\delta_k$ be a perturbation vector containing, for every $D \in \bm{D}$, $d \in \dom(D)$, and decision context $\pa_D$, a value $\epsilon^d_{\pa_D} \in (0,1)$ such that $\sum_{d \in \dom(D)} \epsilon^d_{\pa_D} \leq 1$. Then, given a MAIM $\macid$, the perturbed MAIM $\macid(\delta_k)$ is defined such that for every $d \in \dom(D)$ for $D \in \bm{D}^i$, agent $i$ must play $d$ with probability at least $\epsilon^d_{\pa_D}$ given $\pa_D$.

A full policy profile $\pi$ is a trembling hand perfect equilibrium (THPE) in a MAIM $\macid$ if there is a sequence of perturbation vectors ${\delta_k}{k\in\mathbb{N}}$ such that $\lim{k \rightarrow \infty}\vert\delta_k\vert_\infty = 0$ and for each perturbed MAIM $\macid(\delta_k)$ there is an NE $\pi_k$ such that $\lim_{k \rightarrow \infty} \pi_k = \pi$.

[[ex:cyber]]{#ex:cyber label="ex:cyber"} The security agencies for two governments both use an algorithm to manage their cyber-defence. Their algorithm decides whether to cyber-attack the other nation's security agency. If both agencies attack one another, both suffer some damage (mainly the opportunity cost of needing to continuously work on upgrading their defence systems). The attacker never gains much, but if only one agency attacks the other, the defender suffers a lot more damage.

Figure [fig:tremble]{reference-type="ref" reference="fig:tremble"} shows a MAIM with its parametrisation and relevance graph for Example [ex:cyber]{reference-type="ref" reference="ex:cyber"}. Using each player's parametrised policy in Figure [fig:tremble]{reference-type="ref" reference="fig:tremble"} d) this MAIM has two NEs: at $p=q=1$ and $p=q=0$, either both governments attack ($a$) or not ($n$). Figures [fig:THpolicy]{reference-type="ref" reference="fig:THpolicy"} a) and c) show player $1$'s best response policies for each of these NEs perturbed by $\epsilon$ to result in the perturbed MAIM $\macid(\epsilon)$. Figures [fig:THpolicy]{reference-type="ref" reference="fig:THpolicy"} b) and d) show player $2$'s expected utility if they attack (or not) in response to player $1$ using the policy in [fig:THpolicy]{reference-type="ref" reference="fig:THpolicy"} a) or c) respectively. For small $\epsilon>0$, player $2$'s best response to both the policies in Figures [fig:THpolicy]{reference-type="ref" reference="fig:THpolicy"} a) and c) is to choose $D^2 = a$ and so the NE $p=q=0$ is not robust against trembles. The NE $p=q=1$ is this MAIM's only THPE.[^3]

Transformations and Equivalences {#sec:trans_equivalences}

Both EFGs and MAIMs represent games graphically. In this section, we provide equivalence results between these models to demonstrate that alongside the increased compactness and structural clarity of MAIMs, the fundamental game-theoretic notions of subgames and equilibria are preserved when converting an EFG to a MAIM.

MAIM to EFG {#sec:MAIDtoEFG}

There are many ways to convert a MAIM into an EFG, but these differ in their computational costs [@koller2003multi; @pearl2014probabilistic]. We give a full and formal transformation procedure, , in Appendix 6.1{reference-type="ref" reference="sec:macid2efg"} based on that of K&M. The idea is to use a topological ordering $\prec$ over the nodes of the MAID to construct the EFG game tree by splitting on each of the nodes in $\prec$. Because there can be more than one such ordering, the output of is a set of EFGs. Our codebase implements a more efficient transformation, keeping only utility nodes, decision nodes, and informational parents ($\bigcup_{D\in\bm{D}} \Fa_{D}$). This information is enough for computing equilibria, and can offer significant efficiency gains since the cost of solving an EFG depends on its size, which is exponential in the length of $\prec$. The resulting EFG can be fed into Gambit, a popular tool for solving EFGs [@mckelvey2006gambit], though it may not contain enough information to fully recover the original MAIM.

EFG to MAIM {#sec:EFGtoMAID}

By encoding the CPDs for each variable in the MAIM using trees as opposed to tables, MAIMs can represent any decision-making problem using at most the same space, but often exponentially less space than an EFG [@koller2003multi]. In general, there are many MAIMs that can represent a given EFG. For instance, upon converting the EFG representation (Figure [fig:taxiEFG]{reference-type="ref" reference="fig:taxiEFG"}) of Example [ex:taxis]{reference-type="ref" reference="ex:taxis"} to a MAIM (Figure [fig:taxiMACID]{reference-type="ref" reference="fig:taxiMACID"}), we could naïvely retain the EFG's root and two child nodes as three decision nodes ($D^1$, $D^2_a$, and $D^2_b$) in the MAIM. Alternatively, we could recognise that $D^2_a$ and $D^2_b$ both correspond to the same real world variable, the decision made following $D^1$, and thus combine them (as shown in Figure [fig:taxiMACID]{reference-type="ref" reference="fig:taxiMACID"}). In Appendix 6.2{reference-type="ref" reference="sec:EFG2MAID"}, we formalise this notion and provide a procedure which maps an EFG to a unique, canonical MAIM (including those with absent-mindedness [@piccione1997interpretation]).

Equivalences {#sec:equivalences}

We now provide a series of equivalence results between EFGs and MAIMs to fortify the game-theoretic foundations behind our analysis of MAIDs. Results are justified using intuitive sketches, with full proofs in Appendix 7{reference-type="ref" reference="sec:proofs"}.

[[def:feasibleDC]]{#def:feasibleDC label="def:feasibleDC"} A decision context $\pa_D$ for a decision node $D$ in $\macid$ if feasible if there exists a policy profile $\pi$ where $\Pr^\pi(\pa_D) > 0$. A decision context $\pa_D$ is null if every player always receives utility 0, i.e. $\mathcal{U}^i_{\mathcal{M}}(\pi'' \mid \pa_D) = \sum_{U_j \in \bm{U}^i}\sum_{u_j \in \dom(U_j)} !! u_j \Pr^\pi(U_j = u_j \mid \pa_D) = 0$ for all $i$ and any policy profile $\pi''$, or if it is infeasible.

[[def:equiv]]{#def:equiv label="def:equiv"} We say that a MAIM $\macid$ is equivalent to an EFG $\efg$ (and vice versa) if there is a bijection $f: \Sigma \rightarrow \Pi / \sim$ between the strategies in $\efg$ and a partition of the policies in $\macid$ (the quotient set of $\Pi$ by an equivalence relation $\sim$) such that:

$\pi\sim\pi'$ only if $\pi$ and $\pi'$ differ only on null decision contexts.
For every $\pi \in f(\sigma)$ and every player $i$, $\mathcal{U}^i_{\efg}(\sigma) = \mathcal{U}^i_{\macid}(\pi)$.

We refer to $f$ as a natural mapping between $\efg$ and $\macid$.

The reason we use an equivalence relation on the space of policies is that can introduce additional null decision contexts: those that do not correspond to any path through the EFG. Although this equivalence is not exact, it is sufficient for preserving the essential game-theoretic features of each representation, as we show below. We begin with a supporting lemma that justifies the correctness of our procedures and , and forms the basis of our other results.

[[lem:correspondence]]{#lem:correspondence label="lem:correspondence"} If $\efg\in\bfsf{maim2efg}(\macid)$ or $\macid =\bfsf{efg2maim}(\efg)$ then $\efg$ and $\macid$ are equivalent.

This lemma follows directly by construction from the two procedures, and respectively. The intuition is that the information sets in an EFG correspond to the non-null decision contexts in a MAIM, and thus an EFG's behavioural strategy profile $\sigma$ corresponds to a policy profile $\pi$ in the MAIM, and vice versa. As an immediate consequence, we see that NEs are preserved by our transformations between EFGs and MAIMs.

[[prop:BE]]{#prop:BE label="prop:BE"} If $\efg\in\bfsf{maim2efg}(\macid)$ or $\macid = \bfsf{efg2maim}(\efg)$ then there is a natural mapping $f$ between $\efg$ and $\macid$ such that $\sigma$ is an NE in $\efg$ if and only if any $\pi \in f(\sigma)$ is an NE in $\macid$.

For an EFG subgame $\efg'$, the variables outside $\efg'$ are neither strategically nor probabilistically relevant to those in the corresponding MAIM subgame $\macid'$. This means that EFG subgames have equivalent counterparts in the equivalent MAIM, as established by the following proposition.

[[prop:subgames]]{#prop:subgames label="prop:subgames"} If $\efg\in\bfsf{maim2efg}(\macid)$ or $\macid = \bfsf{efg2maim}(\efg)$ then there is a natural mapping $f$ between $\efg$ and $\macid$ such that, for every EFG subgame $\efg'$ in $\efg$ there is a MAIM subgame $\macid'$ in $\macid$ that is equivalent to $\efg'$ under the natural mapping $f$ restricted to the strategies of $\efg'$.

This restriction of $f$ to the strategies in $\efg'$ can be made precise by considering only those non-null decision contexts that correspond to the information sets contained in $\efg'$, as in the case for Lemma [lem:correspondence]{reference-type="ref" reference="lem:correspondence"}. Given Proposition [prop:subgames]{reference-type="ref" reference="prop:subgames"} and Corollary [prop:BE]{reference-type="ref" reference="prop:BE"}, it can easily be seen that not only are NEs preserved when representing EFGs as MAIMs, but so too are SPEs. We remark, however, that as there may be more subgames in MAIM than in an equivalent EFG, that the criterion of subgame perfectness may be slightly stronger in the MAIM, and so not all SPEs in an EFG may be SPEs in the equivalent MAIM. This additional strength can be useful in ruling out what we intuitively view as 'irrational' behaviour, even when it does not fall under a particular subgame in the EFG.

[[prop:SPE]]{#prop:SPE label="prop:SPE"} If $\efg\in\bfsf{maim2efg}(\macid)$ or $\macid = \bfsf{efg2maim}(\efg)$ then there is a natural mapping $f$ between $\efg$ and $\macid$ such that if any $\pi \in f(\sigma)$ is an SPE in $\macid$, then $\sigma$ is an SPE in $\efg$.

Finally, we derive an equivalence between the THPEs in EFGs and those in MAIMs. In order to do so, it suffices to prove an equivalence between perturbed versions of the corresponding games $\efg(\delta_k)$ and $\macid(\delta_k)$, which can easily be done via construction using , and then by applying Lemma [lem:correspondence]{reference-type="ref" reference="lem:correspondence"} and Corollary [prop:BE]{reference-type="ref" reference="prop:BE"}.

[[prop:THPE]]{#prop:THPE label="prop:THPE"} If $\efg\in\bfsf{maim2efg}(\macid)$ or $\macid = \bfsf{efg2maim}(\efg)$ then there is a natural mapping $f$ between $\efg$ and $\macid$ such that $\sigma$ is a THPE in $\efg$ if and only if any $\pi \in f(\sigma)$ is a THPE in $\macid$.

This series of equivalence results serves to justify MAIDs as an appropriate choice of game representation. Not only do they provide computational advantages over EFGs, we have shown that they preserve the most fundamental game-theoretic concepts commonly employed in EFGs.

Implementation {#sec:implementation}

K&M showed that the explicit representation of dependencies between variables in MAIDs can substantially reduce the computational cost of finding an NE [@koller2003multi; @milch2008ignorable]. In this section, we describe a modified version of their algorithm and use MAID subgames to find all pure SPEs (see also Appendix 8{reference-type="ref" reference="sec:codebase"}). We show that MAID subgames exhibit the familiar subgame property of being useful for 'generalised backwards induction' algorithms [@kaminski2019generalized].

Beginning with an arbitrary policy profile $\pi(0)$ across all decision nodes in the original MAIM, $\macid$, we optimise decision rules associated to each $D \in \bm{D}$ by iterating backwards through a MAID subgame ordering from $\macid_m$ to $\macid_0$. In what follows, we write $\macid_i \prec \macid_j$ if $\macid_j$ is a proper MAID subgame of $\macid_i$, and $\bm{D}_k$ for the decision nodes in $\macid_k$. Several MAID subgames can have the same set of decisions, $\bm{D}_k$, so we choose a single MAID subgame $\macid_k$ (one with the fewest nodes $\bm{V}'$) for each $\bm{D}_k$ and discard the others. Each MAID in this ordering has an associated MAIM for each setting of the nodes which have a child in $\bm{V}'$.

When considering a MAIM for $\macid_{m-i}$, the decision rules for all decision nodes in proper MAIM subgames of $\macid_{m-i}$ will have already been optimised and fixed in previous iterations, so these are now chance nodes in $\macid_{m-i}$. In addition, none of the decision nodes $\bm{D}{m-i}$ in $\macid{m-i}$ strategically rely on any of the decision nodes outside of $\macid_{m-i}$. Therefore, this step is localised to computing the optimal decision rules only for $\bm{D}_{m-i}$.

The next step depends on $\vert \bm{D}{m-i} \vert$. If only one decision node $D \in \bm{D}^j$ remains, as in Figure [fig:taxiMACID]{reference-type="ref" reference="fig:taxiMACID"} c) for example, then its optimal decision rule is that which maximises player $j$'s expected utility in each of the MAIMs (for each value of $D^1$) for this MAID subgame. If $\vert \bm{D}{m-i} \vert > 1$, the relevance graph of $\macid_{m-i}$ is cyclic and so the decision nodes strategically rely on one another. We must therefore call a subroutine: the MAIMs for the MAID subgame induced by the policy profile at that step, $\macid(\pi_{-\bm{D}_{m-i}}(i))$, are converted to EFGs to be solved using Gambit [@mckelvey2006gambit]. Algorithm [algo:SPE]{reference-type="ref" reference="algo:SPE"} shows the full procedure.

Input: MAIM $\macid = (\bm{N}, \bm{V}, \bm{E}, \theta)$ Output: SPE $\pi$ initialise $\pi(0)$ as an arbitrary fully mixed policy profile compute an ordering $\prec$ over the subgames $\macid_0,\dots,\macid_m$ in $\macid$ compute a best response policy profile $\pi^{\bm{D}{m-i}}$ for all decision nodes in $\bm{D}{m-i}$ using $\macid(\pi{-\bm{D}{m-i}}(i))$ $\pi(i+1) \gets (\pi{-\bm{D}_{m-i}}(i),\pi^{\bm{D}{m-i}})$ return $\pi(m)$

It is more efficient to pass the EFGs in the algorithm's subroutine to an EFG solver such as Gambit [@mckelvey2006gambit], rather than passing an EFG for the entire original MAIM. In the induced MAIM $\macid(\pi_{-\bm{D}{m-i}}(i))$, all decision nodes in the proper MAIM subgames have been converted into chance nodes. In our MAID to EFG transformation we need only split on decision nodes and their informational parents, so the size of the EFG is exponential in $\vert \Fa{\bm{D}_{m-i}} \vert$. As the time complexity of solving an EFG depends on its size, the cost of solving a MAIM using Algorithm [algo:SPE]{reference-type="ref" reference="algo:SPE"} is never greater than solving an equivalent EFG representation of the original game, and is exponentially faster in many cases.

Our open-source Python codebase[^4] implements this procedure, provides methods for finding and plotting MAIDs $\macid$, along with $Rel(\macid)$ and $ConRel(\macid)$, and converts any MAIM into an EFG to be used with Gambit. Our aim is to provide the necessary computational tools for researchers and practitioners to develop further applications of MAIDs.

Discussion and conclusions {#sec:discussion}

This work has extended previous results on MAIDs by introducing the concept of a MAID subgame and a range of key equilibrium refinements. K&M argued that MAIDs offer several benefits [@koller2003multi]. First, MAIDs can represent games more concisely than EFGs. Second, because a parametrised MAID is a probabilistic graphical model, the probabilistic dependencies between chance and decision variables can be exploited in order to identify whether decision nodes strategically rely on one another; we used this to define MAID subgames and our resulting equilibrium refinements. Separately, MAIDs can lead to substantial savings in the computational cost of finding an SPE; in Section 4{reference-type="ref" reference="sec:implementation"}, we have described a modified version of an algorithm of K&M and implemented it in an open-source codebase.

These benefits of MAIDS, coupled with the theoretical and practical contributions of this paper, provide a rich basis for future work. One avenue for such work that we are already pursuing is to extend the analysis of incentives [@Everitt2021; @carey2020incentives] to the multi-agent setting by interpreting the directed edges in MAIDs causally. One could then investigate which variables in the graph each agent has an incentive to observe or control, and which reasoning patterns are involved [@pfeffer2007reasoning], given that all of the agents in the MAID are playing a certain equilibrium refinement.

The authors wish to thank Ryan Carey for invaluable feedback and assistance while completing this work, as well as Zac Kenton, Colin Rowat, and several anonymous reviewers for helpful comments. Hammond acknowledges the support of an EPSRC Doctoral Training Partnership studentship (Reference: 2218880). Fox acknowledges the support of the EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems (Reference: EP/S024050/1).

Transformations between Game Representations {#sec:transformations}

MAIM to EFG {#sec:macid2efg}

In Section 3.3.1{reference-type="ref" reference="sec:MAIDtoEFG"}, we mentioned that there are many ways of converting a MAIM into an EFG. In this section, we provide an encoding transforming a MAIM, $\macid = (\bm{N}, \bm{V}, \bm{E}, \theta)$, into an EFG, $\efg = (\bm{N}, T, \bm{P}, \bm{D}, \lambda, \bm{I}, U)$. One can decide on whether one wants to keep record of the structure of the original MAID, or if one wants to optimise for complexity by deciding which of $\macid$'s nodes we split on in $\efg$. If one wants to be able to preserve all of the MAID's structure (e.g. the dependencies between variables) then the set of splitting nodes in the resulting tree is $\bm{S} = \bm{X} \cup \bm{D} \subset \bm{V}$. If instead, however, one wants to minimise the complexity of calculating game equilibria, then one only needs to split on the MAID's decision nodes and informational parents, $\bm{S} = \bm{D} \cup \bigcup_{D\in\bm{D}} \Pa_{D}$ [@pearl2014probabilistic]. This is because the EFG's tree size will be exponential in the size of this set, $\vert \bm{S} \vert$. The following procedure, which we refer to as , is based on that of K&M [@koller2003multi]. Given a MAIM $\macid = (\bm{N}, \bm{V}, \bm{E}, \theta)$ we define an equivalent EFG $\efg = (\bm{N}, T, \bm{P}, \bm{D}, \lambda, \bm{I}, U)$ as follows:

Choose a topological ordering $\prec : = S_1,\dots,S_n$ over all nodes in $\bm{S}$ such that if $S_j$ is a descendent of $S_i$, then $S_i \prec S_j$.[^5] Further, define $\bm{S}_i^\prec = {S' : S' \prec S_i}$ to denote the set of nodes in $\bm{S}$ that come before $S_i$ in $\prec$.
The set of players, $\bm{N}$, in $\efg$ is the same as in $\mathcal{M}$.
The tree $T$ is a symmetric tree with each path containing splits over all the variables in $\bm{S}$ in the order defined by $\prec$.
For all nodes $S \in \bm{S}$, if $S$ is a chance node in $\macid$, add it to $\efg$'s set of chance nodes $\bm{V}^0$. Else, if $S$ is a decision node belonging to some player $S \in \bm{D}^i$, add $S$ to player $i$'s set of nodes in $\efg$, $\bm{V}^i$.
Label each node $Z$ in $T$ with an instantiation $\mu(Z)$ corresponding to the values taken by each EFG node (i.e., the branch followed from this node) on the path from the tree's root node $R$ to $Z$.
For every node $Y$ in $T$ such that its corresponding node $S_Y \in \bm{S}$ is a chance node in $\mathcal{M}$, we determine the probability distribution $P_Y : \Ch_{Y} \rightarrow [0,1]$ over its children $Z \in \Ch_Y$ (each child corresponds to a value $s_y \in \dom(S_Y)$) by querying that node's CPD table with the instantiation label at $Y$. In other words, $P_Y(Z) \coloneqq \Pr(Y=s_y\mid\bm{S}Y^\prec = \mu(Z))$. This equation can be simplified because in a Bayesian network the CPD for each node $S_Y$ only depends on the values of its parents, $\pa{S_Y}$. Therefore, we have: $$P_Y(Z) \coloneqq \Pr(S_Y=s_y\mid\bm{S}Y^\prec = \mu(Z)) = \Pr(S_Y = s_y \mid \bm{S}Y^\prec[{\Pa{S_Y}}] = \mu(Z)[\pa{S_Y}])$$ where $\bm{S}Y^\prec[\Pa{S_Y}]$, $\mu(Z)[\pa_{S_Y}]$ are the restriction of $\bm{S}Y^\prec$ and $\mu(Z)$ to the variables $\Pa{S_Y}$ and corresponding values $\pa_{S_Y}$. If $P_Y(Z) = 0$ for some child $Z$, then remove that child from the EFG.
The set of available decisions $\bm{D}^i_j$ for player $i$ at node $V^i_j$ in the EFG is given by the domain of the corresponding decision node $D^i$ in the MAIM: $\bm{D}^i_j \coloneqq \dom(D^i)$.
Define $\lambda: \bm{V} \times \bm{V} \rightarrow \bm{D}$ for each $Y$ corresponding to a decision node in the MAIM ($Y \in \bm{V}^1 \cup \dots \cup \bm{V}^n$) and $Z \in \Ch_Y$ such that $\lambda(Y,Z) = \mu(Z)[s_Y]$ to label the outcome of the decision.
Define an equivalence relation $\sim$ over $\bm{V}^1 \cup \dots \cup \bm{V}^n$ such that $Y \sim Z$ if and only if $S_Y = S_Z = S$ and $\mu(Y)[\pa_{S_Y}] = \mu(Z)[\pa_{S_Z}]$. Then the set of information sets $I^i$ for each player $i \in \bm{N}$ is simply the quotient set $\bm{V}^i/\sim$ -- the set of $\sim$ equivalence classes partitioning $\bm{V}^i$. In other words, two nodes $Y, Z \in \bm{V}$ which correspond to MAIM decision nodes are in the same information set $I^i_j$ for player $i$ if and only if their instantiation labels, restricted to their MAIM parents, are the same $\mu(Y)[\pa_{S_Y}] = \mu(Z)[\pa_{S_Z}]$.
The payoff $U: \bm{L} \rightarrow \mathbb{R}^n$ for each leaf $L\in \bm{L}$ of the EFG's tree is $U(L) = (u^1,\dots,u^n)$, where $u^i \coloneqq \sum_{U^i_j \in \bm{U}^i} \mu(L)[u^i_j]$ and $u^i_j$ is the value for each utility node $U^i_j \in \bm{U}^i$ which is recorded as part of leaf's full instantiation label $\mu(L)$.

EFG to MAIM {#sec:EFG2MAID}

In this section, we detail the construction, which we refer to as , that underlies our equivalence results. Unlike in the case for , we show how every EFG can be converted to a unique, canonical equivalent MAIM. In order to do this, it will be helpful to define the notion of an intervention set. Note that in what follows we assume that the EFGs do not include instances of absentmindedness (where a path from the root of the tree to one of its leaves passes through an information set more than once); this restriction is lifted in our discussion of absentmindedness in Appendix 6.3{reference-type="ref" reference="sec:absent"}.

[[def:intervention_set]]{#def:intervention_set label="def:intervention_set"} An intervention set $J$ in an EFG $\efg$ is either a set of chance nodes or a set of information sets belonging to the same player, such that:

Each node in the set has the same number of children.
No path from the root of $\efg$ to one of its leaves passes through $J$ more than once.
For any two information sets $I$ and $I'$ in $J$, the same knowledge regarding the paths taken from the root of $\efg$ to $I$ and $I'$ must be available at both $I$ and $I'$.

The intuition behind this definition is that an intervention set represents a single variable in the real world. For example, while flipping two fair coins independently would result in an EFG with three nodes (excluding utilities), the two children of the root would represent a single event: the flipping of the second coin. In general, without further domain knowledge about the EFG, one cannot tell whether two separate chance nodes or information sets represent the same variable in the real world, and so by default every intervention set in an EFG is a singleton, containing only one chance node or one information set. However, if one does possess this knowledge (as is often the case), then it can allow one to form a more compact version of the equivalent MAIM (as shown by example in Section 3.3.2{reference-type="ref" reference="sec:EFGtoMAID"}). Given an EFG $\efg = (\bm{N}, T, \bm{P}, \bm{D}, \lambda, \bm{I}, U)$ (including intervention sets, such that every non-utility node belongs to a single intervention set) we define an equivalent MAIM $\macid = (\bm{N}, \bm{V}, \bm{E}, \theta)$ as follows:

The set of players $\bm{N}$ remains the same in $\macid$ as in $\efg$.
Initialise the MAID's graph $(\bm{V}, \bm{E})$ as $T$, where edges are directed from parents to children.
For each of $\efg$'s chance nodes $V \in \bm{V}^0$, label each outgoing edge from $V$ with a value $v$. Every other node with an outgoing edge is labelled, by $\lambda$, with the decision corresponding to that edge. Thus, let $\dom(V)$ contain the labels of the outgoing edges from $V$.
For each variable $V \in \bm{V}$ let $\rho_V$ be the unique path formed by the sequence of labels from the root $R$ of $\efg$ to the corresponding $\efg$ node for $V$ and let $\rho_V[V']$ denote the label of the outgoing edge from node $V'$ on the path $\rho_V$.
For each information set $I^i_j$ in $\efg$, let $\rho^\prec(I^i_j)$ be the set of paths from $R$ into the nodes of $I^i_j$. Next, define a function $\mu : \bm{I} \rightarrow 2^{\bm{V}}$ that maps each information set $I^i_j$ to the set of variables in $\Anc_{I^i_j}$ whose outgoing label is the same in every path in $\rho^\prec(I^i_j)$. Note that by Definition [def:intervention_set]{reference-type="ref" reference="def:intervention_set"}, if information sets $I$ and $I'$ are in the same intervention set, then the nodes whose values are in $\mu(I)$ are the same as the nodes whose values are in $\mu(I')$.
We then consider $\efg$'s chance, decision, and leaf nodes in turn:
- For every outgoing edge with some label $v_j$ from a chance node $V \in \bm{V}^0$, add the label $(v_j \mid \rho_{V_j}) : p$ where $p = P_j(v_j)$.
- For every information set $I^i_j$ and each variable $D \in I^i_j$, add the label $(d \mid \rho_D[\mu(I^i_j)] ) : _$ where $\rho_D[\mu(I^i_j)]$ denotes the labels of $\rho_D$ restricted to those in ${\rho_D[V] : V \in \mu(I^i_j)}$ and $_$ is a placeholder to be parametrised by a decision rule.
- For each leaf variable $L \in \bm{L}$ with payoff vector $U(L) = u$, split $L$ into $n$ utility nodes $U^1,\dots,U^n$ (duplicating incoming edges) with labels $(u[i] \mid \rho[L]) : 1$ respectively, where $u[i]$ is the $i^\text{th}$ entry in $u$.
Given these labellings we proceed by merging variables according to the intervention sets:
- Merge each information set $I^i_j$ into a single variable $D_j \in \bm{D}^i$, collecting the labels $(v \mid \rho_{V}) : p$ for each $V \in I^i_j$ and retaining all incoming and outgoing edges.
- Begin by adding a directed edge from every $V' \in \Anc_V$ to $V$ for every node $V$. Then for every variable $D_j$ corresponding to an information set $I^i_j$, remove all the incoming edges from variables that are not in $\mu(I^i_j)$.
- Merge each group of nodes, collecting their incoming and outgoing labelled edges, that belong to the same intervention set.
- Merge any utility nodes $U^i_j$ and $U^i_k$ belonging to the same player $i$ that have the same sets of incoming edges, and collect their labels.
We then let $\bm{V} \coloneqq \bm{X} \cup \bm{D} \cup \bm{U}$ where $\bm{X} = \bm{V}^0$, $\bm{D}$ is the union of variable sets $\bm{D}^i$ defined above, and $\bm{U}$ is the collection of utility variables, also defined above.
$E$ is the set of edges in the graph defined above.
We conclude by defining a set of CPDs, $\Pr(\bm{V} \mid \pa_V)$, for every chance and utility node $\bm{X} \cup \bm{U} \in \bm{V}$.
- For certain instantiations, $\pa_V$, the CPD over $V$ is undefined because $\pa_V$ does not represent a path through the original game tree $T$. For non-decision variables, this is dealt with by simply adding a null value $\perp$ for every variable in $\bm{X}$ and a value of 0 for every variable in $\bm{U}$.
- Recall the labels of the form $(v \mid \rho_V) : p$ and let $lab(V)$ denote the set of $V$'s labels.
  - For each variable $V \in \bm{V} \setminus \bm{D}$, we define $\Pr(V = v\mid \Pa_V = \pa_V) \coloneqq \sum_{(v \mid \pa_V) : p ~\in~ lab(V)} p$.
  - For any $\pa_V$ such that for all $v \in \dom(V)$, $\Pr(V = v\mid \Pa_V = \pa_V) = 0$, we set $\Pr(V = \perp\mid \Pa_V = \pa_V) = 1$ if $V \in \bm{X}$ and $\Pr(V = 0\mid \Pa_V = \pa_V) = 1$ if $V \in \bm{U}$.
By construction, $\Pr(\bm{X},\bm{U} : \bm{D}) = \prod_{V \in \bm{V} \setminus \bm{D}} \Pr(V \mid \Pa_V)$ therefore forms a partial distribution over the nodes in $\macid$.
For decision variables, we now see that the only changes in parametrisations of $\pi(D \mid \pa_D)$ that can have any effect on any of the other variables are those that occur under settings $\pa_D$ such that there exists a strategy $\sigma$ and path $\rho$ in $\efg$ capturing all values in $\pa_D$ with $\Pr^\sigma(\rho) > 0$. In other words, the possible parametrisations of $\pi(D \mid \pa_D)$ correspond to parametrisations of $(d \mid \rho_D[\mu(I^i_j)] ) : _$.

Absentmindedness {#sec:absent}

An EFG $\efg$ with absentmindedness has at least one path from the root to some other vertex in $\efg$ that passes through the same information set more than once [@piccione1997interpretation]. In an EFG, a behavioural strategy defines a single distribution over decisions per information set; this corresponds to a single decision rule for a decision node in the MAIM. In a non-absentminded game, a behavioural strategy at an information set only corresponds to one decision instance (as that information set is encountered only once for any path through the EFG). However, in $\efg$ the player may have to apply a behavioural strategy for a single information set at multiple decision instances.

In our usual transformation , a decision node $D$ in a MAID corresponds to an information set, and thus a single behavioural strategy. Therefore, there is no way for the decision choice to be different for each decision instance. To handle absentmindedness, we can replace $D$ with a set of nodes $D \cup \bm{X}^D$. $D$ is the decision node that represents the behavioural strategy with a decision rule, $\bm{X}^D$ is a set of chance nodes representing decision instances, and there is an edge from $D$ to all $X^D_j \in \bm{X}^D$. $D$'s outgoing edges in the original MAID now flow through some $X^D_j \in \bm{X}^D$, but incoming edges to $D$ remain the same. There is an additional edge $X^D_j \rightarrow X^D_k$ if and only if there is a path in the game tree from the root of $\efg$ that passes through the decision node represented by $X^D_j$ and then the decision node represented by $X^D_k$. Without absentmindedness, there is no reason to do this as $\vert \bm{X}^D \vert = 1$ and so the nodes $D \cup \bm{X}^D$ can be merged; however, with absentmindedness $\vert \bm{X}^D \vert > 1$ and so we must allow for the possibility of the player making different decision choices at different decision instances within the same information set. This can be seen, for example, in Figure [fig:absentminded]{reference-type="ref" reference="fig:absentminded"} where the player may use the same (stochastic) behavioural strategy at both nodes but end up continuing ($c$) at the first junction and exiting ($e$) at the second.

(0)(0,0)<45, red!70>$V^1_1$ [br](0) [bl] (1)(0-2)<45, red!70>$V^1_2$ [br](4) [bl](1) (0)(0)(0-1)(1-1)(1-2)(2-1)(2-2)[black,inner sep = 18pt] (0)(1)[$I^1$]{style="color: red!70"}[above right]

(D) [decision, player1] $D$; (XD2) [right = of D] $X^D_2$; (XD1) [below = of D] $X^D_1$; (U) [utility, below = of XD2, player1] $U$; XD1, XD2; XD2; U;

Subgames {#sec:extra_subgames}

In Section 3.1{reference-type="ref" reference="sec:subgames"}, we mentioned that a MAID may have more subgames than a corresponding EFG. We illustrate this here with an example in Figure [fig:subgames2]{reference-type="ref" reference="fig:subgames2"}, which shows that the number of MAID subgames can vastly exceed the corresponding EFG subgames. In the EFG representation shown in Figure [fig:subgames2]{reference-type="ref" reference="fig:subgames2"} c), there are no proper subgames, because $D^2$ does not observe $X$. However, from the MAID representation shown in Figure [fig:subgames2]{reference-type="ref" reference="fig:subgames2"} a), it is clear that $D^2$ does not need to know $X$ to choose an optimal policy. Therefore, $D^2\to U^2$ can be factored out as a MAID subgame, shown top left in Figure [fig:subgames2]{reference-type="ref" reference="fig:subgames2"} b). Similarly, the MAID representation reveals that we can solve the subgame corresponding to the $D^1$ subtrees independently, shown bottom left in Figure [fig:subgames2]{reference-type="ref" reference="fig:subgames2"} b). More surprisingly, perhaps, MAID subgames need not be closed under descendants. Indeed, in the example in Figure [fig:subgames2]{reference-type="ref" reference="fig:subgames2"}, it is possible to solve for $D^1$ without knowing the policy for $D^2$. This is represented by the MAID subgames in the right column of Figure [fig:subgames2]{reference-type="ref" reference="fig:subgames2"} b). Additional subgames may of course also occur in much more complex games, in which the computational advantages of being able to flexibly factor the game into subgames are more keenly felt.

(X) $X$; (D1) [below = of X, decision, player1] $D^1$; (D2) [decision, player2, below = of D1] $D^2$; (U1) [utility, player1, right = of D1] $U^1$; (U2) [utility, player2, right = of D2] $U^2$; D1; U1; D2; U2;

(X) $X$; (D1) [below = of X, decision, player1] $D^1$; (U1) [utility, player1, right = of D1] $U^1$; D1; U1; ;

(D1) [decision, player1] $D^1$; (D2) [decision, player2, below = of D1] $D^2$; (U1) [utility, player1, right = of D1] $U^1$; (U2) [utility, player2, right = of D2] $U^2$; U1; D2; U2; ;

(D1) [decision, player1] $D^1$; (U1) [utility, player1, right = of D1] $U^1$; U1;

(D2) [decision, player2] $D^2$; (U2) [utility, player2, right = of D2] $U^2$; U2;

(0)$X$ [al] [ar] (1)(0-1)<180, red!70>$D^1_1$ [al] [ar] (2)(0-2)<0, red!70>$D^1_2$ [al] [ar] (3)(1-1)<180, blue!70>$D^2_1$ [al](1,1) [ar](1,0) (4)(1-2)<180, blue!70>$D^2_2$ [al](0,1) [ar](0,0) (5)(2-1)<0,blue!70>$D^2_3$ [al](0,0) [ar](0,1) (6)(2-2)<0,blue!70>$D^2_4$ [al](1,0) [ar](1,1) (3)(5)[$I^2_1$]{style="color: blue!70"}[above right] (4)(6)[$I^2_2$]{style="color: blue!70"}[above left] (0)(0)(1)(3-1)(3-2)(4-1)(4-2)(2)(5-1)(5-2)(6-1)(6-2)[black,inner sep = 15pt, xshift=0pt, yshift=-2pt]

Proofs {#sec:proofs}

If $\efg\in\bfsf{maim2efg}(\macid)$ or $\macid =\bfsf{efg2maim}(\efg)$ then $\efg$ and $\macid$ are equivalent.

::: {.proof} Proof. In what follows we make the trivial assumption that any non-leaf node $V$ in $\efg$ has more than one child (otherwise we could simply remove such nodes), and that if $V$ is a chance node then all of its children are reached with positive probability (otherwise we could delete the sub-trees rooted at such children). The proof follows directly by construction using our procedures and respectively.

To see this, first suppose we have a MAIM, $\macid$, and $\efg$ is an EFG resulting from $\text{\bfsf{maim2efg}}(\macid)$. A decision rule $\pi_D$ defines a probability distribution over $\dom(D)$ conditional on some feasible decision context, $\pa_D$. Following the procedure, each feasible decision context is an instantiation of a set of variables which corresponds to one information set $I_j^i$ where $S_Y = D$ for all $Y \in I_j^i$, and for each $d \in \dom(D)$ there exists precisely one node $Z \in \Ch_Y$ such that $\lambda(Y,Z) = d$. A policy $\pi^i$ for player $i$ is simply the set of decision rules for all $D \in \bm{D}^i$. Thus, by construction, there is a one-to-one correspondence between each behavioural strategy $\sigma^i$ in $\efg$ and policy $\pi^i$ in $\macid$, where $\sigma^i_j(d) \coloneqq \pi_D(d\mid\pa_D)$. By reasoning analogously about the chance and utility nodes, it follows from this and our construction above that the expected utility for each player $i$ under a policy profile $\pi$ in $\macid$ is the same as the expected utility in $\efg$ under $\sigma$, $\mathcal{U}^i_\macid(\pi) = \mathcal{U}^i_\efg(\sigma)$.

In the second direction, note first that the deterministic nature of the procedure guarantees uniqueness of the resulting MAIM. Then, suppose we have an EFG $\efg$ and $\macid$ is the MAIM that results from $\text{\bfsf{efg2maim}}(\efg)$. In our construction above, the incoming edges for each decision variable $D_j \in \bm{D}^i$ are precisely those that originate in the variables whose values player $i$ can determine when in the corresponding information set(s) $I^i_j$. Hence the strategy $\sigma^i_j$, which assigns a probability distribution over the set of available decisions $\bm{D}^i_j$ at node $V^i_j$, is determined as a function of $\pa_D$, where $\pa_D$ corresponds to a particular information set $I^i_j$ from which $D_j$ in the MAIM was created. Thus, given a strategy $\sigma^i$ in $\efg$ we define the corresponding policy $\pi^i$ in $\macid$ by $\pi^i(d\mid\pa_D) \coloneqq \sigma^i_j(d)$. Note that unlike in the case for , there may be more possible policy profiles in $\macid$ compared to strategy profiles in $\efg$ because of the possibility of decision contexts (involving null values $\perp$) corresponding to impossible paths through the game tree. In other words, via our construction these decision contexts are null. Hence, although the policy space may technically be larger, a player's decision rule under such contexts has no impact on their expected utility. Once again, therefore, we have that $\mathcal{U}^i_\macid(\pi) = \mathcal{U}^i_\efg(\sigma)$ for any policy $\pi$ corresponding to a strategy $\sigma$. ◻ :::

Before giving proofs of the remaining results, we provide the relevant definitions of the corresponding equilibrium refinements in EFGs.

[[def:EFGNE]]{#def:EFGNE label="def:EFGNE"} A strategy profile $\sigma$ is a Nash equilibrium (NE) in an EFG $\efg$ if, for every player $i \in \bm{N}$, $\mathcal{U}^i_\efg(\sigma^{-i}, \sigma^i) \geq \mathcal{U}^i_\efg(\sigma^{-i}, \hat{\sigma}^i)$ for all $\hat{\sigma}^i \in \Sigma^i$.

A strategy profile $\sigma$ is a subgame perfect equilibrium (SPE) in an EFG $\efg$ if $\sigma$ is a NE in every EFG subgame of $\efg$, when restricted to the decision nodes in each EFG subgame.

A behavioural strategy profile $\sigma$ is a trembling hand perfect equilibrium (THPE) in an EFG $\efg$ if there is a sequence of perturbation vectors ${\delta_k}{k\in\mathbb{N}}$ such that $\lim{k \rightarrow \infty}\vert\delta_k\vert_\infty = 0$ and for each perturbed game $\efg(\delta_k)$ there exists an NE $\sigma_k$ such that $\lim_{k \rightarrow \infty} \sigma_k = \sigma$. In $\efg$, a perturbation vector $\delta_k$ is a sequence of perturbations $\epsilon^d_j > 0$ with $\sum_{d \in D^i_j} \epsilon^d_j \leq 1$ for every information set $I^i_j$ such that in the perturbed game $\efg(\delta_k)$, each decision $d$ available at $I^i_j$ is played with probability at least $\epsilon^d_j$ (i.e., each strategy profile $\sigma_k$ is fully mixed).

If $\efg\in\bfsf{maim2efg}(\macid)$ or $\macid = \bfsf{efg2maim}(\efg)$ then then there is a natural mapping $f$ between $\efg$ and $\macid$ such that $\sigma$ is an NE in $\efg$ if and only if any $\pi \in f(\sigma)$ is an NE in $\macid$.

::: {.proof} Proof. By Lemma [lem:correspondence]{reference-type="ref" reference="lem:correspondence"} we have that $\efg$ and $\macid$ are equivalent. Thus, let $f : \Sigma \rightarrow \Pi / \sim$ be a natural mapping between $\efg$ and $\macid$ as defined in Definition [def:equiv]{reference-type="ref" reference="def:equiv"}. Next note that for any $\sigma$ and any $\pi,\pi' \in f(\sigma)$ we have $\mathcal{U}^i_{\macid}(\pi) = \mathcal{U}^i_{\macid}(\pi')$ for every player $i$. Finally, observe that by the equivalence between $\efg$ and $\macid$ we have that for any $\sigma^{-i}$ then $\sigma^i \in \argmax_{\hat{\sigma}^i} \mathcal{U}^i_{\efg}(\sigma^{-i},\hat{\sigma}^i)$ if and only if $\pi^i \in \argmax_{\hat{\pi}^i} \mathcal{U}^i_{\macid}(\pi^{-i},\hat{\pi}^i)$ where $\pi \in f(\sigma)$, and hence that $\sigma$ is an NE if and only if every $\pi \in f(\sigma)$ is an NE. ◻ :::

If $\efg\in\bfsf{maim2efg}(\macid)$ or $\macid = \bfsf{efg2maim}(\efg)$ then there is a natural mapping $f$ between $\efg$ and $\macid$ such that, for every EFG subgame $\efg'$ in $\efg$ there is a MAIM subgame $\macid'$ in $\macid$ that is equivalent to $\efg'$ under the natural mapping $f$ restricted to the strategies of $\efg'$.

::: {.proof} Proof. We begin by proving existence. Recall that a subgame $\efg'$ in an EFG $\efg$ is a subtree that is closed under information sets and descendants. Let $\bm{V}'$ be the set of intervention sets overlapping with or contained in $\efg'$, which are therefore a subset of the variables $\bm{V}$ in the equivalent MAIM $\macid$. We first show that $\bm{V}'$ forms a subgame base, i.e. that $\bm{V}'$ is closed under r-reachability in $\macid$, and for any $X,Y\in \bm{V}'$ and any directed path $X \to \dots \to Y$ in $\macid$, all nodes on the path are also in $\bm{V'}$. Beginning with the first condition, it suffices to show that there exists no variable $Y \in \bm{Y} = \bm{V} \setminus \bm{V}'$ such that $Y$ is r-reachable from some node $D' \in \bm{D'} = \bm{D} \cap \bm{V}'$: recall that this means we would have $\hat{Y} \not\perp \bm{U}^i \cap \Desc_{D'} \mid \Fa_{D'}$, where $\hat{Y}$ is an extra parent node added to $Y$ in $\macid$ and we have $D' \in \bm{D}^i$ for some player $i$. Any path supporting such a dependency must have one of the following two forms:

$\hat{Y} \rightarrow Y \rightarrow \cdots~ \bm{U}^i \cap \Desc_{D'}$. In this case, as $Y \notin \bm{V'}$ then, by either construction or , any node in $\efg$ corresponding to $Y$ must lie outside $\efg'$. Further, as $\efg'$ is closed under information sets, then the value of $Y$ must be observed by any decision node in $\efg'$ corresponding to $D'$, and thus there is an information link $Y \rightarrow D'$ in $\macid$ and so $Y \in \Fa_{D'}$. Hence, by conditioning on $\Fa_{D'}$ we block the path above, meaning there is no dependency.
$\hat{Y} \rightarrow Y \leftarrow \cdots~ \bm{U}^i \cap \Desc_{D'}$. In this case, let $X$ be the first variable in the path (from left to right) that is a fork node, i.e., we have $\cdots \leftarrow X \rightarrow \cdots$. Such a variable must exist because we assume that utility variables do not have children. As before, notice that by either construction or we have that $Y \notin \bm{V'}$ must lie outside $\efg'$, and as there is a directed path $X \to \dots \to Y$ in $\macid$ then $X$ must also lie outside $\efg'$. But then the fact that $\efg'$ is closed under information sets means that $X$ is observed by $D'$ and so conditioning on $\Fa_{D'}$ blocks the path above, meaning there is no dependency.

We next consider the second condition. Suppose that $X,Y\in \bm{V}'$ and there is a directed path $X \to \cdots \to Z \to \cdots \to Y$ in $\macid$. Then if $\efg \in \bfsf{maim2efg}(\macid)$ or $\macid = \bfsf{efg2maim}(\efg)$, any topological ordering $\prec$ over the nodes in $\macid$ must have $X \prec Z \prec Y$ and hence if there are nodes in $\efg'$ corresponding to both $X$ and $Y$ then as $\efg'$ is closed under descendants we must have a node corresponding to $Z$ in $\efg'$. This means that the intervention set containing $Z$ overlaps with $\efg'$ and so $Z \in \bm{V}'$ by the definition of $\bm{V}'$.

For the second part of the existence proof, we show that there is a setting $\bm{y}$ of $\bm{Y}$ which, when combined with $\bm{V'}$, leads to a subgame $\macid'$ of $\macid$ that is equivalent to $\efg'$. First, note that any node passed through on the path $\rho_{R'}$ from the root $R$ of $\efg$ to the root $R'$ of $\efg'$ must correspond to a variable in $\bm{Y}$. Let $\bm{y}$ be a setting of $\bm{Y}$ that is consistent with the path from $R$ to $R'$, and let $\macid'$ be the result MAIM subgame that is obtained by combining $\bm{V'}$ with $\bm{y}$. Observe that by the argument argument above, the decision variables in $\bm{D'} \subseteq \bm{V'}$ are precisely those corresponding to the information sets in $\efg'$, and moreover that any decision context of any $D' \in \bm{D'}$ that does not correspond to some information set in $\efg'$ is, by definition, not feasible and hence null. Thus, the partitioned policy space in $\macid'$ (where two policies belong to the same partition if and only if they differ only on those decision contexts that are null) is in one-to-one correspondence with the strategies in $\efg'$, just as was shown in the proof of Lemma [lem:correspondence]{reference-type="ref" reference="lem:correspondence"}. Further, for any equivalent strategy $\sigma$ in $\efg$ and $\pi$ in $\macid$, it follows directly by construction using or that: $$\begin{aligned} \mathcal{U}^i_{\efg'}(\sigma) &= \sum_\rho {P'}^\sigma(\rho) U(\rho[\bm{L}])[i]\ &= \sum_\rho {P}^\sigma(\rho \mid \rho_{R'}) U(\rho[\bm{L}])[i]\ &= \sum_{U_j \in \bm{U}^i}\sum_{u_j \in \dom(U_j)} !! u_j {\Pr}^\pi(U_j = u_j \mid \bm{y})\ &= \sum_{U_j \in \bm{U}^i}\sum_{u_j \in \dom(U_j)} !! u_j {\Pr'}^\pi(U_j = u_j)\ &= \mathcal{U}^i_{\macid'}(\pi) \end{aligned}$$ for every player $i$, where ${P}^\sigma(\rho \mid \rho_{R'})$ is the distribution over paths $\rho$ in $\efg$ conditional on $\rho$ containing $\rho_{R'}$ as a sub-path. Hence, not only do we have a one-to-one correspondence $f$ between strategies in $\efg'$ and (a partition over) the policies in $\macid'$, we see that for any $\pi \in f(\sigma)$ we have that $\mathcal{U}^i_{\efg'}(\sigma) = \mathcal{U}^i_{\macid'}(\pi)$ for all $i \in \bm{N'} = {i \in \bm{N} \mid \bm{D}^i \cap \bm{V'} \neq \varnothing }$ and hence that $\efg'$ and $\macid'$ are equivalent, and thus that $f$ forms a natural mapping between $\efg'$ and $\macid'$. ◻ :::

If $\efg\in\bfsf{maim2efg}(\macid)$ or $\macid = \bfsf{efg2maim}(\efg)$ then there is a natural mapping $f$ between $\efg$ and $\macid$ such that if any $\pi \in f(\sigma)$ is an SPE in $\macid$, then $\sigma$ is an SPE in $\efg$.

::: {.proof} Proof. The corollary can be seen to follow immediately from combining the definitions of SPEs in EFGs and MAIMs with Proposition [prop:subgames]{reference-type="ref" reference="prop:subgames"} and Corollary [prop:BE]{reference-type="ref" reference="prop:BE"}. Concretely, if $\efg \in \bfsf{maim2efg}(\macid)$ or $\macid = \bfsf{efg2maim}(\efg)$, and if any $\pi \in f(\sigma)$ is an SPE in $\macid$ (for some natural mapping $f$ between $\efg$ and $\macid$ satisfying Proposition [prop:subgames]{reference-type="ref" reference="prop:subgames"}) then $\pi$ is an NE in any subgame of $\macid$, and as every subgame of $\efg$ is equivalent to a subgame of $\macid$ under $f$, then we must have that $\sigma$ is an NE in each subgame of $\efg$ (by Corollary [prop:BE]{reference-type="ref" reference="prop:BE"}). ◻ :::

If $\efg\in\bfsf{maim2efg}(\macid)$ or $\macid = \bfsf{efg2maim}(\efg)$ then there is a natural mapping $f$ between $\efg$ and $\macid$ such that $\sigma$ is a THPE in $\efg$ if and only if any $\pi \in f(\sigma)$ is a THPE in $\macid$.

::: {.proof} Proof. Given Lemma [lem:correspondence]{reference-type="ref" reference="lem:correspondence"} and Corollary [prop:BE]{reference-type="ref" reference="prop:BE"}, it suffices to show that the perturbed MAIM resulting from $\text{\bfsf{efg2maim}}(\efg(\delta_k))$ is equivalent to $\efg(\delta_k)$, and similarly that any perturbed EFG resulting from $\text{\bfsf{maim2efg}}(\macid(\delta'_k))$ is equivalent to $\macid(\delta'k)$. It can readily be seen from the constructions detailed in Appendices 6.1{reference-type="ref" reference="sec:macid2efg"} and 6.2{reference-type="ref" reference="sec:EFG2MAID"} respectively that the entry $\epsilon^d_j$ of the perturbation vector $\delta_k$ for $\efg$ corresponds precisely to the entry $\epsilon^d{\pa_D}$ in the perturbation vector $\delta'_k$ for $\macid$. The only extra entries that $\delta'_k$ may have over $\delta_k$ are those correspond to null decision contexts $\pa_D$, which are are irrelevant for the determination of NEs in the perturbed MAIM. ◻ :::

Codebase {#sec:codebase}

In this section, we demonstrate the usage of our Python codebase, available at https://github.com/causalincentives/pycid. We provide a number of screenshots of a Jupyter notebook showing how to instantiate MAIDs and MAIMs, plot attributes, and how to find all of the MAIM's SPEs. We first give an overview of some of our codebase's key methods (Appendix 8.1{reference-type="ref" reference="sec:keymethods"}) before demonstrating how to instantiate a particular MAID or MAIM (Appendix 8.2{reference-type="ref" reference="sec:insMACID"}). We then show how we compute all pure policy SPEs (Appendix 8.3{reference-type="ref" reference="sec:NEcomp"}) and explain the usefulness of interfacing with Gambit [@mckelvey2006gambit] (Appendix 8.4{reference-type="ref" reference="sec:gambit"}).

Key Methods {#sec:keymethods}

The codebase relevant for what is discussed in this paper is centred around a MACID class. A MACID is a multi-agent causal influence diagram, which is a slightly different model than a MAID because the edges represent every causal relationship between the random variables chosen to be endogenous variables in the model. However, because MACIDs subsume MAIDs (in the sense of Pearl's 'causal hierarchy' [@pearl2009causality]), for the purpose of this discussion, one can interpret this object to be the same as that defined in Definition [def:MAID]{reference-type="ref" reference="def:MAID"}. Our MACID class inherits from pgmpy's BayesianModel class [@ankan2015pgmpy] because when a policy profile $\pi$ has been specified in a MAID $\macid$, the induced MAID $\macid(\pi)$ is a Bayesian network with joint probability distribution $\Pr^\pi$. Some of the most important methods belonging to this class, and relating to the work in this paper, perform the following operations:

Plot the MAID using NetworkX [@hagberg2008exploring].
Find and plot the MAID's relevance graph (Definition [def:relgraph]{reference-type="ref" reference="def:relgraph"}) with its maximal SCCs (as in, for example, Figure [fig:road]{reference-type="ref" reference="fig:road"}), and its condensed relevance graph.
Find a MAIM's pure SPEs using Algorithm [algo:SPE]{reference-type="ref" reference="algo:SPE"}.
Find all MAID and MAIM subgames.
Convert any MAID into an EFG using the procedure in Appendix 6.1{reference-type="ref" reference="sec:macid2efg"}, and into a format that can be used with Gambit [@mckelvey2006gambit].

Making use of the existing literature on MAIDs and CIDs and to advantage researchers for future work on MAIDs (and MACIDs), we have also implemented methods to:

Determine if various graphical criteria are satisfied such as:
- Whether a node faces incentives in single-agent CIDs [@Everitt2021; @carey2020incentives].
- Whether a decision node in a MAID faces one of Pfeffer and Gal's reasoning patterns [@pfeffer2007reasoning].
Prune the original MAID's edges based on these reasoning patterns and K&M's concept of ignorable information [@milch2008ignorable].

[[fig:jobhire]]{#fig:jobhire label="fig:jobhire"}

[[fig:jh_rel]]{#fig:jh_rel label="fig:jh_rel"}

[[fig:gambit]]{#fig:gambit label="fig:gambit"}

[[fig:road]]{#fig:road label="fig:road"}

[[fig:taxi]]{#fig:taxi label="fig:taxi"}

Instantiating a MAID {#sec:insMACID}

Recall that a MAID consists of a graph ($\bm{V}, \bm{E}$) to represent the conditional dependencies between random variables and a set of players $\bm{N}$. A MAIM adds a parametrisation, $\theta \in \Theta$, to this MAID to specify with CPDs the precise relationship between variables in the MAID. Therefore, if we are only interested in MAIM's structure, we need only specify the graph's nodes (including their partition into chance, decision, and utility variables, and their allocation to respective players) and edges. Figure [fig:jobhire]{reference-type="ref" reference="fig:jobhire"} shows how we define, instantiate, and then plot the MAID for Example [ex:hiring]{reference-type="ref" reference="ex:hiring"}. Decision and utility nodes are square- and diamond-shaped respectively. The nodes belonging to each agent are coloured uniquely and chance nodes are grey. Because r-reachability (Definition [def:rreachable]{reference-type="ref" reference="def:rreachable"}) is a graphical criterion, we can also plot the MAID's relevance graph before specifying the MAID's parametrisation.

To demonstrate a slightly more complex MAID, we use K&M's 'Road Example' [@koller2003multi]. As the relevance graph for this MAID contains cycles, it is helpful to have methods that visualise the relevance graph's distinct maximal SCCs and the MAID's condensed relevance graph (Figure [fig:road]{reference-type="ref" reference="fig:road"}). This shows clearly that although the initial game is highly complex, the MAID formalism allows us to exploit the dependencies between the decision variables because there are 3 maximal SCCs. The MAID's condensed relevance graph shows that $(B3W, B3E)$ are the decision nodes in the smallest proper MAID subgame, and then $(B2W, B2E)$, before finally $(B1W, B1E)$ are only decision nodes in the largest subgame, which is identical to the original MAID. This is also the ordering that Algorithm [algo:SPE]{reference-type="ref" reference="algo:SPE"} would follow once the MAID is parameterised as a MAIM.

If we want the MAID to be parametrised as a MAIM (such as for interfacing with Gambit [@mckelvey2006gambit] or computing SPEs), we must also define:

Each decision node's domain.
Domains and CPDs for each chance and utility node. Definition [def:MAIM]{reference-type="ref" reference="def:MAIM"} states that these CPDs must reflect the fact that the value of a utility node is a deterministic function of the value of its parents.

Finding Subgame Perfect Equilibria {#sec:NEcomp}

We will explain how Algorithm [algo:SPE]{reference-type="ref" reference="algo:SPE"} works with the help of Example [ex:taxis]{reference-type="ref" reference="ex:taxis"}. Figure [fig:taxiMACID]{reference-type="ref" reference="fig:taxiMACID"} shows the original MAID along with its parametrisation and proper MAID subgame. Figure [fig:taxi]{reference-type="ref" reference="fig:taxi"} shows how we would parameterise a MAIM object for this example to use with our codebase. The MAID subgame ordering will be $\prec~ = \macid_1, \macid_2$ (shown in Figures [fig:taxiMACID]{reference-type="ref" reference="fig:taxiMACID"} a) and [fig:taxiMACID]{reference-type="ref" reference="fig:taxiMACID"} c) respectively) because $\macid_2$ is a proper MAID subgame of $\macid_1$.

The only decision node in $\macid_2$ is $D^2$, and there are two two MAIMs for $\macid_2$ corresponding with the two possible instantiations of $D'1$: $d^1 \in {e, c}$. Player 2 will maximise its expected utility in $\macid_2$ by choosing $d^2 = c$ if $d^1 = e$ and $d^2 = e$ if $d^1=c$. In the next iteration of the optimisation loop, we consider the MAIM for $\macid_1$: $D^2$ has already been assigned a decision rule and so is now a chance node with a CPD corresponding to the policy just given. Therefore, $D^1$ is the only decision node remaining in $\macid_1$. Player 1's expected utility in this MAIM subgame is maximised by choosing $e$. Thus, the algorithm will output the only pure policy SPE of the game, shown in Figure [fig:taxiNE]{reference-type="ref" reference="fig:taxiNE"} a).

Our implementation uses a data tree to store results of each successive MAID subgame optimisation. The size of the tree depends on how many decision nodes there are and how many actions are available at each. The leaves of the tree are initially labelled with all the possible combination of actions selected at each of a MAIM's decision nodes. At each iteration of Algorithm [algo:SPE]{reference-type="ref" reference="algo:SPE"}, the data tree is reduced once. At each reduction, we find the first node that is empty according to a post-order traversal of the tree and assign it the label of its child which returns the largest expected utility for the agent making that choice. We continue reducing the tree until it is full, then read off the resulting pure policy SPE (Figure [fig:taxi]{reference-type="ref" reference="fig:taxi"}).

In this example, we did not have any 'ties', however these can arise when comparing the expected utility an agent will get from selecting different actions. To account for this, we create a queue system. We pop the tree from the front of our queue, reduce it once and then push it to the back of the queue; each time there is a tie, we push multiple trees (one for each option) to the back of the queue. Our algorithm is complete when all trees in the queue are full. Using this technique, we find all pure policy SPEs.

Gambit {#sec:gambit}

An especially useful method belonging to our MACID class converts any MAIM into an EFG that can be used with Gambit [@mckelvey2006gambit]. Gambit is a powerful open-source collection of tools for reasoning about games. Our method MAID_to_Gambit_file converts a MAID into an EFG using the procedure described in Appendix 6.1{reference-type="ref" reference="sec:macid2efg"} before writing this EFG as a .efg file using a prefix-order traversal of the EFG. This file can then run with Gambit's command-line tools as well as with its GUI (Figure [fig:gambit]{reference-type="ref" reference="fig:gambit"}) to compute different type of Nash Equilibria, to find beliefs at different nodes, or to find each agent's weakly or strongly dominated strategies.

[^1]: Formally, $\sigma_j^i(d) > 0$ only if $d \in D^i_k$ for any vertex $V_k \in I_j^i$, and $\Sigma_{d \in D^i_k}\sigma_j^i(d) = 1$.

[^2]: The edge directions used here are the same as originally defined by K&M [@koller2001multi] but reversed compared with those in their later work [@koller2003multi] as this eases our later exposition of MAID subgames (Section 3.1{reference-type="ref" reference="sec:subgames"}).

[^3]: In the case of a two-player game, a THPE removes all weakly dominated policies. Here, for both $D^1$ and $D^2$, the pure policy of choosing $n$ is weakly dominated by the pure policy of choosing $a$.

[^4]: Available online at https://github.com/causalincentives/pycid.

[^5]: This topological ordering will, in general, be non-unique.

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice