Contents
- Why this is weird
- Examples in the wild
- Mennen’s ABC example
- Interpreting Bayesian Networks
- Consequences
- In summary
- Acknowledgements

In short: There is no objective way of summarizing a Bayesian update over an event with three outcomes A : B : C as an update over two outcomes A : \neg A.

Suppose there is an event with possible outcomes A, B, C. We have prior beliefs about the outcomes p_1 : p_2 : p_3. An expert reports a likelihood factor of e_1 : e_2 : e_3. Our posterior beliefs about A : B : C are then p_1\cdot e_1 : p_2\cdot e_2 : p_3\cdot e_3.

\underbrace{ \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} }_{\text{Prior}} \times \underbrace{ \begin{pmatrix} e_1 \\ e_2 \\ e_3 \end{pmatrix} }_{\text{Update}} = \underbrace{ \begin{pmatrix} p_1 \cdot e_1 \\ p_2 \cdot e_2 \\ p_3 \cdot e_3 \end{pmatrix} }_{\text{Posterior}}

But suppose we only care about whether A happens. Our prior beliefs about A : \neg A are p_1 : (p_2+p_3). Our posterior beliefs are p_1\cdot e_1 : (p_2\cdot e_2 + p_3\cdot e_3). This implies that the likelihood factor of the expert regarding A : \neg A is

\frac{p_1\cdot e_1 : (p_2\cdot e_2 + p_3\cdot e_3)}{ p_1 : (p_2+p_3) } = e_1 : \frac{p_2\cdot e_2 + p_3\cdot e_3}{ p_2 + p_3 }.

\underbrace{ \begin{pmatrix} p_1 \\ p_2 + p_3 \end{pmatrix} }_{\text{Prior}} \times \underbrace{ \begin{pmatrix} e_1 \\ \frac{p_2\cdot e_2 + p_3\cdot e_3}{p_2 + p_3} \end{pmatrix} }_{\text{Update}} = \underbrace{ \begin{pmatrix} p_1 \cdot e_1 \\ p_2 \cdot e_2 + p_3 \cdot e_3 \end{pmatrix} }_{\text{Posterior}}

This likelihood factor depends on the ratio of prior beliefs p_2 : p_3. Concretely, the lower factor in the update is the weighted mean of the evidence e_2 and e_3 according to the weights p_2 and p_3.

This has a relatively straightforward interpretation. The update is supposed to be the ratio of the likelihoods under each hypothesis. The upper factor in the update is P(E | A). The lower factor is P(E | B \cup C) = \frac{P(B) \cdot P(E | B) + P(C) \cdot P(E | C)}{P(B) + P(C)}.

\underbrace{ \begin{pmatrix} P(A | E) \\ P(B \cup C | E) \end{pmatrix} }_{\text{Posterior}} \propto \underbrace{ \begin{pmatrix} P(A) \\ P(B \cup C) \end{pmatrix} }_{\text{Prior}} \times \underbrace{ \begin{pmatrix} P(E | A) \\ P(E | B \cup C) \end{pmatrix} }_{\text{Update}}

\underbrace{ \begin{pmatrix} P(E | A) \\ P(E | B \cup C) \end{pmatrix} }_{\text{Update}} = \begin{pmatrix} P(E | A) \\ \frac{ P(E \cap (B \cup C))}{P(B \cup C)} \end{pmatrix} = \begin{pmatrix} P(E | A) \\ \frac{P(B) \cdot P(E | B) + P(C) \cdot P(E | C)}{P(B) + P(C)} \end{pmatrix}

I found this very surprising: the summary of the expert report depends on my prior beliefs! I claim that this phenomenon is unintuitive, and that being unaware of it can lead to errors.
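The calculation above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name `summarize_update` and the example numbers are illustrative assumptions. It collapses a three-outcome update into an A : \neg A update and shows that two readers with the same prior on A, but different priors p_2 : p_3, end up with different summaries of the same expert report.

```python
def summarize_update(prior, evidence):
    """Collapse a three-outcome update e_1:e_2:e_3 into an A : not-A update.

    The not-A factor is the mean of e_2 and e_3 weighted by the prior
    masses p_2 and p_3, exactly as in the formula above.
    """
    p1, p2, p3 = prior
    e1, e2, e3 = evidence
    not_a_factor = (p2 * e2 + p3 * e3) / (p2 + p3)
    return e1, not_a_factor


# Hypothetical expert report, in relative likelihoods for A : B : C.
evidence = (1, 4, 1)

# Two readers who both give A a prior probability of 1/2,
# but who split the remaining mass between B and C differently.
reader_x = (2, 1, 1)
reader_y = (4, 1, 3)

summary_x = summarize_update(reader_x, evidence)
summary_y = summarize_update(reader_y, evidence)
print(summary_x, summary_y)
```

Both readers agree on the report and on P(A), yet the binary update they should apply differs, because only the split between B and C enters the weighted mean.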
Why this is weird
Bayes’ rule describes how to update our prior beliefs using data. In my mind, one very nice property of Bayes’ rule was that it cleanly separates the process into a subjective part (eliciting your priors) and an ~objective part (computing the update).

\text{Posterior} = \underbrace{ \text{Prior} }_{\text{Subjective}} \times \underbrace{ \text{Likelihood} }_{\text{Objective}}

For example, we may disagree on our prior beliefs on whether, e.g., COVID-19 originated in a lab. But we cannot disagree on the direction and magnitude of the update caused by learning that it originated in one of the few cities in the world with a gain-of-function lab working on coronaviruses.

Because of this, researchers are encouraged to report their update factors together with their all-things-considered beliefs. This way, users can use their research for their own conclusions by multiplying their prior with the update. And metastudies can just take the product of the likelihoods of all studies to estimate the combined effect of the evidence.

In the above example, we lose this nice property: the update factor depends on the prior beliefs of the user. Researchers would not be able to objectively summarize their likelihood about whether COVID-19 originated in a lab accidentally vs zoonotically vs being designed as a bioweapon as a single number for people who only care about whether it originated in a lab versus any other possibility.
Examples in the wild
I ran into this problem twice recently:
- When analyzing Mennen’s ABC example of a case where averaging the logarithmic odds of experts seems to result in nonsense.
- In my own research on interpreting Bayesian Networks, as I was trying to come up with a way of decomposing a Bayesian update into a combination of several updates.

In both cases, being unaware of the phenomenon led me to a conceptual mistake.
Mennen’s ABC example
Mennen’s example involves three experts debating an event with three possible outcomes, A : B : C. Expert #1 assigns relative odds of 2 : 1 : 1. Expert #2 assigns relative odds of 1 : 2 : 1. Expert #3 assigns relative odds of 1 : 1 : 2.

The log-odds-averaging pooled opinion of the experts is \sqrt[3]{2} : \sqrt[3]{2} : \sqrt[3]{2}, i.e. equal odds, which correspond to a probability of A equal to \frac{1}{3} \approx 33.33%.

\sqrt[3]{ \underbrace{ \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} }_{\text{Expert \#1}} \times \underbrace{ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} }_{\text{Expert \#2}} \times \underbrace{ \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} }_{\text{Expert \#3}} } = \underbrace{ \begin{pmatrix} \sqrt[3]{2} \\ \sqrt[3]{2} \\ \sqrt[3]{2} \end{pmatrix} }_{\text{Pooled opinion}}

But suppose we only care about A : \neg A. Expert #1’s implicit odds are 2 : 2. Expert #2’s implicit odds are 1 : 3. Expert #3’s implicit odds are 1 : 3.

The pooled odds in this case are \sqrt[3]{2} : \sqrt[3]{2 \cdot 3 \cdot 3}, which correspond to a probability of A equal to \frac{\sqrt[3]{2}}{\sqrt[3]{2} + \sqrt[3]{2 \cdot 3 \cdot 3}} \approx 32.47%.

\sqrt[3]{ \underbrace{ \begin{pmatrix} 2 \\ 1 + 1 \end{pmatrix} }_{\text{Expert \#1}} \times \underbrace{ \begin{pmatrix} 1 \\ 2 + 1 \end{pmatrix} }_{\text{Expert \#2}} \times \underbrace{ \begin{pmatrix} 1 \\ 1 + 2 \end{pmatrix} }_{\text{Expert \#3}} } = \underbrace{ \begin{pmatrix} \sqrt[3]{2} \\ \sqrt[3]{2\times 3 \times 3} \end{pmatrix} }_{\text{Pooled opinion}}

We get different results depending on whether we take the implicit odds after or before pooling expert opinion. What is going on?

Mennen claims that this is a strike against logarithmic pooling. The issue according to him is in the step where we take the opinion of the three experts and aggregate it using average log-odds. I think that this is related to the phenomenon I described at the beginning of the article.

The problem is with the step where we take the relative odds 1 : 2 : 1 and summarize them as 1 : 3. It’s no wonder that log-odds pooling gives inconsistent results when we aggregate outcomes. Bayesian updating is not well defined in that case!
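The two orders of operation in Mennen’s example can be reproduced with a few lines of Python. This is only a sketch: the helper names `pool_log_odds`, `to_binary` and `prob_first` are made up for illustration, and averaging log-odds is implemented as the outcome-by-outcome geometric mean of the odds, which is equivalent.

```python
import math


def pool_log_odds(opinions):
    """Average the experts' log-odds, i.e. take the geometric mean of
    their odds outcome by outcome."""
    n = len(opinions)
    return [math.prod(column) ** (1 / n) for column in zip(*opinions)]


def to_binary(odds):
    """Summarize odds over A : B : C as odds over A : not-A by summing."""
    return [odds[0], sum(odds[1:])]


def prob_first(odds):
    """Probability of the first outcome implied by a vector of odds."""
    return odds[0] / sum(odds)


experts = [(2, 1, 1), (1, 2, 1), (1, 1, 2)]

# Pool at full resolution, then look at A : not-A.
pool_then_summarize = prob_first(to_binary(pool_log_odds(experts)))

# Summarize each expert as A : not-A first, then pool.
summarize_then_pool = prob_first(pool_log_odds([to_binary(e) for e in experts]))

print(f"{pool_then_summarize:.4f} vs {summarize_then_pool:.4f}")
```

The first number matches the 33.33% above and the second the 32.47%, so the discrepancy comes entirely from where the summarizing step is placed.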
Interpreting Bayesian Networks
I will not enter into too much detail because my theory of interpretability of Bayesian Networks is very complex. But it suffices to say that I was getting inconsistent results because of this issue.

In essence, I came up with a way of decomposing a Bayesian update into a series of independent steps, corresponding to different subgraphs of a Bayesian Network. For example, I would decompose the update over a node with three outcomes A, B, C as the product of the baseline odds of the event and a number of updates.

In my system, I only cared about whether A happened. So I naively summarized each update before aggregating them.

O(\text{Event} | \text{Evidence}) \approx \underbrace{ \begin{pmatrix} p_1 \\ p_2 + p_3 \end{pmatrix} }_{\text{Prior}} \times \underbrace{ \begin{pmatrix} e_{1,1} \\ e_{1,2} + e_{1,3} \end{pmatrix} }_{\text{Argument 1}} \times \dots \times \underbrace{ \begin{pmatrix} e_{n,1} \\ e_{n,2} + e_{n,3} \end{pmatrix} }_{\text{Argument n}}

This was giving me very poor results: my resulting updates would be very off compared to traditional inference algorithms like message passing. It is no wonder this was giving me bad results, since it is the wrong way of going about it! Our analysis at the beginning implies that the summarized update should be a weighted average of e_{i,2} and e_{i,3}, instead of the sum.

After realizing the paradox, I changed my system so that it does not summarize the odds of A : \neg A until after aggregating all the updates.

O(\text{Event} | \text{Evidence}) \approx \underbrace{ \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} }_{\text{Prior}} \times \underbrace{ \begin{pmatrix} e_{1,1} \\ e_{1,2} \\ e_{1,3} \end{pmatrix} }_{\text{Argument 1}} \times \dots \times \underbrace{ \begin{pmatrix} e_{n,1} \\ e_{n,2} \\ e_{n,3} \end{pmatrix} }_{\text{Argument n}}

Performance improved.
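To make the difference between the two schemes concrete, here is a small sketch with made-up numbers. It is not the actual interpretability system described above; the function names and the two example updates are assumptions chosen only to illustrate the effect of summing the \neg A entries of each update before multiplying.

```python
import math


def exact_odds_of_a(prior, updates):
    """Multiply prior and updates at full resolution (A, B, C),
    and only then collapse to odds of A : not-A."""
    posterior = [p * math.prod(u[i] for u in updates) for i, p in enumerate(prior)]
    return posterior[0], sum(posterior[1:])


def naive_odds_of_a(prior, updates):
    """Collapse every factor to A : not-A first by SUMMING the not-A
    entries, then multiply. This is the mistaken scheme."""
    a = prior[0] * math.prod(u[0] for u in updates)
    not_a = sum(prior[1:]) * math.prod(sum(u[1:]) for u in updates)
    return a, not_a


# Hypothetical inputs: a uniform prior and two arguments that each
# favor A by 2 : 1 over both alternatives.
prior = (1, 1, 1)
updates = [(2, 1, 1), (2, 1, 1)]

exact = exact_odds_of_a(prior, updates)
naive = naive_odds_of_a(prior, updates)
print(exact, naive)
```

With these numbers the full-resolution product gives odds of 4 : 2 in favor of A, while the summed version gives 4 : 8, so evidence that clearly favors A is read as leaving A at its prior of one third.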
Consequences
I am quite confused about what to think about this.
It clearly has consequences, as illustrated by the examples in the previous section. But I am not sure what to recommend doing in response.
My most immediate takeaway is to be very careful when aggregating outcomes, since there is a significant chance we will introduce an error along the way.
Beyond that, the aggregation paradox seems to imply that we need to work at the correct level of aggregation. We cannot naively deduce implied binary odds from the distribution of a multiple outcome event.
But what is the right level of aggregation?
When aggregating, the lower factor of the update is a weighted mean of the evidence likelihoods P(E | B) and P(E | C). This suggests that the problem disappears when we impose P(E | B) = P(E | C) for any disaggregation of the joint event \neg A into subevents B and C.
But this condition is too strong. For example, we could base our disaggregation on the observed evidence. If the evidence E can either be \text{Red} or \text{Blue}, we could disaggregate \neg A into the cases where E=\text{Red} and the cases where E=\text{Blue}. In that case, the condition cannot ever be satisfied, by definition.
We can say that this disaggregation is not a sensible one, and ought to be excluded for the purposes of the condition. But in that case we have passed the buck down to defining what is a sensible disaggregation.
Another approach is to assume that the prior relative likelihood of any aggregated outcomes is uniform, i.e. P(B) = P(C). In that case, we have that P(E | B \cup C) = \frac{P(B) \cdot P(E | B) + P(C) \cdot P(E | C)}{P(B) + P(C)} = \frac{P(E | B) + P(E | C)}{2}.
But then we can no longer chain updates: after applying any likelihood where P(E | B) \neq P(E | C), the resulting posterior will no longer meet this condition.
Pragmatically, it seems like the best we can do if we want to rescue objectivity is to resign ourselves to summarizing the updates assuming a uniform prior. That is, by averaging the evidence associated with each aggregated outcome.
This is not enough to correctly approximate Bayesian updating, as we can see in the example below:
\underbrace{ \begin{pmatrix} 1 \\ 0.01 \\ 0.01 \end{pmatrix} }_{\text{Posterior}} = \underbrace{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} }_{\text{Prior}} \times \underbrace{ \begin{pmatrix} 1 \\ 0.01 \\ 1 \end{pmatrix} }_{\text{Refute B}} \times \underbrace{ \begin{pmatrix} 1 \\ 1 \\ 0.01 \end{pmatrix} }_{\text{Refute C}} \neq \underbrace{ \begin{pmatrix} 1 \\ 1 + 1 \end{pmatrix} }_{\text{Prior}} \times \underbrace{ \begin{pmatrix} 1 \\ \frac{0.01 + 1}{2} \end{pmatrix} }_{\text{Refute B}} \times \underbrace{ \begin{pmatrix} 1 \\ \frac{1 + 0.01}{2} \end{pmatrix} }_{\text{Refute C}} \approx \underbrace{ \begin{pmatrix} 1 \\ 0.5 \end{pmatrix} }_{\text{Posterior}}

But I can’t see how to do better in the absence of more information.
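The size of the gap in this example can be worked out directly. The sketch below uses the numbers from the equation above; the helper name `summarize_uniform` is an illustrative assumption. It compares the exact posterior probability of A with the one obtained by chaining updates that were each summarized by simple averaging.

```python
import math


def summarize_uniform(update):
    """Collapse e_1:e_2:e_3 to A : not-A by a simple average of the
    not-A entries, i.e. assuming a uniform prior over B and C."""
    rest = update[1:]
    return update[0], sum(rest) / len(rest)


prior = (1, 1, 1)
updates = [(1, 0.01, 1), (1, 1, 0.01)]  # "Refute B", then "Refute C"

# Exact: multiply at full resolution, collapse only at the end.
full = [p * math.prod(u[i] for u in updates) for i, p in enumerate(prior)]
exact_p_a = full[0] / sum(full)

# Approximate: summarize the prior by summing and each update by averaging.
a, not_a = prior[0], sum(prior[1:])
for update in updates:
    factor_a, factor_not_a = summarize_uniform(update)
    a, not_a = a * factor_a, not_a * factor_not_a
approx_p_a = a / (a + not_a)

print(f"exact P(A) = {exact_p_a:.3f}, summarized P(A) = {approx_p_a:.3f}")
```

The exact calculation leaves A at roughly 98%, while the chained summaries only reach about 66%, because after the first update the mass on B and C is no longer uniform and the second simple average uses the wrong weights.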
One key takeaway here is that beliefs and updates are summarized in different ways.
\underbrace{\begin{pmatrix}p_1 \\ p_2 \\ p_3\end{pmatrix}}_{\text{Belief}}\rightarrow\underbrace{\begin{pmatrix}p_1 \\ p_2 + p_3\end{pmatrix}}_{\text{Summarized belief}} \qquad \underbrace{\begin{pmatrix}e_1 \\ e_2 \\ e_3\end{pmatrix}}_{\text{Update}}\rightarrow\underbrace{\begin{pmatrix}e_1 \\ \frac{e_2 + e_3}{2}\end{pmatrix}}_{\text{Summarized update}}
In summary
I have explained one counterintuitive consequence of Bayesian updating on variables with more than two outcomes. This paradox implies that we should be careful when grouping together outcomes of a variable. And I have shown two situations where this unintuitive consequence is relevant.
This is a post meant to explore and start a discussion more than provide definite answers. Some things I’d be keen on discussing include:
- Is this a documented phenomenon? Where can I find more discussion?
- What does this imply for formulating forecasting questions? Will this result in problems when asking binary questions about events that are multifaceted?
- What is "the right level" of outcome aggregation for a given problem?
- Are there other examples where similar issues come up?

I’d be really interested in your thoughts. Please leave a comment if you have any!
Acknowledgements
Thanks to rossry, Nuño Sempere, Eric Neyman, Ehud Reiter and ForgedInvariant for discussing this topic with me and helping me clarify some ideas. Thanks to Alex Mennen for coming up with the example I referenced in the post.
The framing of this issue that makes the most sense to me is "P(E|B\cup C) is a function of P(B):P(C)". When I look at it this way, I disagree with the claim (in "Mennen’s ABC example") that "[Bayesian updating] is not invariant when we aggregate outcomes"—I think it’s clearer to say the Bayesian updating is not well-defined when we aggregate outcomes. Additionally, in "Interpreting Bayesian Networks", the framing seems to make it clearer that the problem is that you used e_{1,2}+e_{1,3} for P(E|B\cup C) -- but they’re not the same thing! In essence, you’re taking the sum where you should be taking the average... With this focus on (mis)calculating P(E|B\cup C), the issue seems to me more like "a common error in applying Bayesian updates", rather than a fundamental paradox in Bayesian updating itself. I agree with the takeaway "be careful when grouping together outcomes of a variable"—because grouping exposes one to committing this error—but I’m not sure I’m seeing the thing that makes you describe it as unintuitive?
Comment
I like this framing. This seems to imply that summarizing beliefs and summarizing updates are two distinct operations. For summarizing beliefs we can still resort to summing:

\underbrace{\begin{pmatrix}p_1 \\ p_2 \\ p_3\end{pmatrix}}_{\text{Belief}}\rightarrow\underbrace{\begin{pmatrix}p_1 \\ p_2 + p_3\end{pmatrix}}_{\text{Summarized belief}}

But for summarizing updates we need to use an average, which in the absence of prior information will be a simple average:

\underbrace{\begin{pmatrix}e_1 \\ e_2 \\ e_3\end{pmatrix}}_{\text{Update}}\rightarrow\underbrace{\begin{pmatrix}e_1 \\ \frac{e_2 + e_3}{2}\end{pmatrix}}_{\text{Summarized update}}

Annoyingly, and as you point out, this is not a perfect summary: we are definitely losing information here, and subsequent updates will not be as exact as if we were working with the disaggregated odds. I still find it quite disturbing that the update after summarizing depends on prior information, but I can’t see how to do better than this, pragmatically speaking.
Comment
Right, I agree that for the update aggregation \frac{e_2+e_3}2 is better than e_2+e_3 (but still lossy). And the thing that p_2:p_3 affects is the weighting in the average—so if e_2=e_3 then the ps don’t matter! (which is a possible answer to your question of "how much aggregation/disaggregation can you do?")
But yeah if e_2 is very different from e_3 then I don’t think there’s any way around it, because the effective e_i could be one or the other depending on what the p_i are.
(Possibly a bit of a tangent) It occurred to me while reading this that perhaps average log odds could make sense in the context in which there is a uniform prior, and the probabilities provided by experts differ because the experts disagree on how to interpret evidence that brings them away from the uniform prior. This has some intuitive appeal:
Comment
Comment
I think I’ve followed the basic argument here? Let me try a couple examples, first a toy problem and then a more realistic one.

Example 1: Dice. A person rolls some fair 20-sided dice and then tells you the highest number that appeared on any of the dice. They either rolled 1 die (and told you the number on it), or 5 dice (and told you the highest of the 5 numbers), or 6 dice (and told you the highest of the 6 numbers). For some reason you care a lot about whether there were exactly 5 dice, so you could break this down into two hypotheses:

H1: They rolled 5 dice
H2: They rolled 1 or 6 dice

Let’s say they roll and tell you that the highest number rolled was 20. This favors 5 dice over 1 die, and to a lesser degree it favors 6 dice over 5 dice. So if you started with equal (1/3) probabilities on the 3 possibilities, you’ll update in favor of H1. Someone who also started with a 1/3 chance on H1, but who thought that 1 die was more likely than 6 dice, would update even more in favor of H1. And someone whose prior was that 6 dice was more likely than 1 die would update less in favor of H1, or even in the other direction if it was lopsided enough.

Relatedly, if you repeated this experiment many times and got lots of 20s, that would eventually become evidence against H1. If the 100th roll is 20, then that favors 6 dice over 5, and by that point the possibility of there being only 1 die is negligible (if the first 99 rolls were large enough) so it basically doesn’t matter that the 20 also favors 5 dice over 1. This seems like another angle on the same phenomenon, since your posterior after 99 rolls is your prior for the 100th roll (and the evidence from the first 99 rolls has made it lopsided enough so that the 20 counts as evidence against H1).

Example 2: College choice. A high school freshman hopes & expects to attend Harvard for college in a few years. One observer thinks that’s unlikely, because Harvard admissions is very selective even for very good students. Another observer thinks that’s unlikely because the student is into STEM and will probably wind up going to a more technical university like MIT; they haven’t thought much yet about choosing a college and Harvard is probably just serving as a default stand-in for a really good school.

The two observers might give the same p(Harvard), but for very different reasons. And because their models are so different, they could even update in opposite directions on the same new data. For instance, perhaps the student does really well on a math contest, and the first observer updates in favor of the student attending Harvard (that’s an impressive accomplishment, maybe they will make it past the admissions filter) while the second observer updates a bit against the student attending Harvard (yep, they’re a STEM person). You could fit this into the "three outcomes" framing of this post, if you split "not attending Harvard" into "being rejected by Harvard" and "choosing not to attend Harvard".
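The dice example above is easy to check numerically. The sketch below is only an illustration: the function names and the particular 9 : 1 splits are assumptions, and it relies on the fact that the highest of n fair 20-sided dice equals 20 with probability 1 - (19/20)^n.

```python
def p_max_is_20(n_dice):
    """Probability that the highest of n fair d20 rolls is exactly 20."""
    return 1 - (19 / 20) ** n_dice


def bayes_factor_h1_vs_h2(p_one_die, p_six_dice):
    """Likelihood ratio of 'highest roll was 20' for H1 (5 dice) against
    H2 (1 or 6 dice), given the prior weights inside H2."""
    likelihood_h1 = p_max_is_20(5)
    likelihood_h2 = (
        p_one_die * p_max_is_20(1) + p_six_dice * p_max_is_20(6)
    ) / (p_one_die + p_six_dice)
    return likelihood_h1 / likelihood_h2


# Three observers who all give H1 the same prior, but split H2 differently.
uniform = bayes_factor_h1_vs_h2(1, 1)
mostly_one_die = bayes_factor_h1_vs_h2(9, 1)
mostly_six_dice = bayes_factor_h1_vs_h2(1, 9)

print(f"{uniform:.2f} {mostly_one_die:.2f} {mostly_six_dice:.2f}")
```

A factor above 1 means the observation favors H1. Under these assumed splits, the observer who leans towards one die updates most strongly towards H1, and the observer who leans heavily towards six dice updates slightly against it.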
Comment
I think your first example could be even simpler. Imagine you have a coin that’s either fair, all-heads, or all-tails. If your prior is "fair or all-heads with probability 1⁄2 each", then seeing heads is evidence against "fair". But if your prior is "fair or all-tails with probability 1⁄2 each", then seeing heads is evidence for "fair". Even though "fair" started as 1⁄2 in both cases. So the moral of the story is that there’s no such thing as evidence for or against a hypothesis, only evidence that favors one hypothesis over another.
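The coin version can be verified with exact fractions. This is a minimal sketch; the dictionary keys and the function name are illustrative choices.

```python
from fractions import Fraction

# P(heads | hypothesis) for the three kinds of coin in the example.
P_HEADS = {
    "fair": Fraction(1, 2),
    "all_heads": Fraction(1),
    "all_tails": Fraction(0),
}


def posterior_fair_after_heads(prior):
    """Posterior probability of 'fair' after observing a single heads."""
    joint = {h: prior[h] * P_HEADS[h] for h in prior}
    return joint["fair"] / sum(joint.values())


# Both priors put probability 1/2 on "fair".
prior_with_all_heads = {"fair": Fraction(1, 2), "all_heads": Fraction(1, 2)}
prior_with_all_tails = {"fair": Fraction(1, 2), "all_tails": Fraction(1, 2)}

post_vs_heads = posterior_fair_after_heads(prior_with_all_heads)
post_vs_tails = posterior_fair_after_heads(prior_with_all_tails)
print(post_vs_heads, post_vs_tails)
```

Starting from the same prior of 1/2 on "fair", the same observation moves it down to 1/3 in one case and up to 1 in the other, depending only on which alternative hypothesis is in play.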
Comment
That’s a great explanation. Evidence may also be compatible or incompatible with a hypothesis. For instance, if I get a die (without the dots on the sides that indicate 1-6), and I instead label* it: Red, 4, Life, X-Wing, Int, path through a tree Then finding out I rolled a 4, without knowing what die I used, is compatible with the regular dice hypothesis, but any of the other rolls, is not. *(likely using symbols, for space reasons)
This seems related to philosophy of science stuff, where updating is about pitting hypotheses against each other. In order to do that you have to locate the leading alternative hypotheses. It doesn’t work well to just pit a hypothesis against "everything else" (it’s hard to say what p(E|not-H) is, and it can change as you collect more data). You need to find data that distinguishes your hypothesis from leading alternatives. An experiment that favors Newtonian mechanics over Aristotelian mechanics won’t favor Newtonian mechanics over general relativity.
Seeing the equations, it was hard to intuitively grasp why updates work this way. This example made things more intuitive for me: If an event can have 3 outcomes, and we encounter strong evidence against outcomes B and C, then the update looks like this:

\underbrace{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} }_{\text{Prior}} \times \underbrace{ \begin{pmatrix} 1 \\ 0.01 \\ 1 \end{pmatrix} }_{\text{Refute B}} \times \underbrace{ \begin{pmatrix} 1 \\ 1 \\ 0.01 \end{pmatrix} }_{\text{Refute C}} \neq \underbrace{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}}_{\text{Pooled prior}} \times \underbrace{ \begin{pmatrix} 1 \\ 1.01 \end{pmatrix}}_{\text{Refute B}} \times \underbrace{ \begin{pmatrix} 1 \\ 1.01 \end{pmatrix}}_{\text{Refute C}}

The information about what hypotheses are in the running is important, and pooling the updates can make the evidence look much weaker than it is.
Comment
Note that you are making the same mistake as me! Updates are not summarized in the same way as beliefs. For the update, the "correct" way is to take an average of the B, C likelihoods:

\underbrace{\begin{pmatrix}1 \\ 0.01 \\ 0.01\end{pmatrix}}_{\text{Posterior}}=\underbrace{\begin{pmatrix}1 \\ 1 \\ 1\end{pmatrix}}_{\text{Prior}}\times\underbrace{\begin{pmatrix}1 \\ 0.01 \\ 1\end{pmatrix}}_{\text{Refute B}}\times\underbrace{\begin{pmatrix}1 \\ 1 \\ 0.01\end{pmatrix}}_{\text{Refute C}}\neq \underbrace{\begin{pmatrix}1 \\ 1 + 1\end{pmatrix}}_{\text{Prior}}\times\underbrace{\begin{pmatrix}1 \\ \frac{0.01 + 1}{2}\end{pmatrix}}_{\text{Refute B}}\times\underbrace{\begin{pmatrix}1 \\ \frac{1 + 0.01}{2}\end{pmatrix}}_{\text{Refute C}}\approx\underbrace{\begin{pmatrix}1 \\ 0.5\end{pmatrix}}_{\text{Posterior}}

This does not invalidate the example though! Thanks for suggesting it, I think it helps clarify the conundrum.
Comment
The left hand side of the example is deliberately making the mistake described in your article, as a way to build intuition on why it is a mistake. (Adding instead of averaging in the update summaries was an unintended mistake) Thanks for explaining how to summarize updates, it took me a bit to see why averaging works.
There’s probably a radical constructivist argument for not really believing in open/noncompact categories like \neg A. I don’t know how to make that argument, but this post too updates me slightly towards such a Tao of conceptualization. (To not commit this same error at the meta level: Specifically, I update *away* from thinking of general negations as "real" concepts, disallowing statements like "Consider a non-chair, …"). But this is maybe a tangent, since just adopting this rule doesn’t resolve the care required in aggregation with even compact categories.
I think entropy is a key to understanding this more deeply. I believe you could consider the unaggregated distribution as the "microstates" and the aggregated one as the "macrostates". The entropy would then tell you how much information you lose by aggregating in this way.
Minor quibble: The likelihood part of probability is also subjective in the sense that it depends on the evidence the agent is aware of.
I find the beginning of this post somewhat strange, and I’m not sure your post proves what you claim it does. You start out discussing what appears to be a combination of two forecasts, but present it as Bayesian updating. Recall that Bayes theorem says p(\theta\mid x)=\frac{p(x\mid\theta)p(\theta)}{p(x)}. To use this theorem, you need both an x (your data / evidence), and a \theta (your parameter). Using "posterior\propto prior \times likelihood" (with priors p_{1},p_{2},p_{3} and likelihoods e_{1},e_{2},e_{3}), you’re talking as if your expert’s likelihood equals p(x\mid\theta) – but is that true in any sense? A likelihood isn’t just something you multiply with your prior, it is a conditional pmf or pdf with a different outcome than your prior. I can see two interpretations of what you’re doing at the beginning of your post:
1. You’re combining two forecasts. That is, with \theta\in\{A,B,C\} being the outcome, you have your own pmf p_{1}(\theta) and the expert’s e=p_{2}(\theta), then combine them using p(\theta)\propto p_{1}(\theta)p_{2}(\theta). That’s fair enough, but I suppose p(\theta)\propto\sqrt{p_{1}(\theta)p_{2}(\theta)} or maybe p(\theta)\propto p_{1}(\theta)^{q}p_{2}(\theta)^{1-q} for some q\in[0,1] would be a better way to do it.
2. It might be possible to interpret your calculations as a proper application of Bayes’ rule, but that requires stretching it. Suppose \theta is your subjective probability vector for the outcomes A,B,C and x is the subjective probability vector for the event supplied by an expert (the value of x is unknown to us). To use Bayes’ rule, we will have to say that the evidence vector e=p(x\mid\theta), the probability of observing an expert judgment of x given that \theta is true. I’m not sure we ever observe such quantities directly, and it is pretty clear from your post that you’re talking about e=p_{2}(\theta) in the sense used above, not p(x\mid\theta).

Assuming interpretation 1, the rest of your calculations are not that interesting, as you’re using a method of knowledge pooling no one advocates. Assuming interpretation 2, the rest of your calculations are probably incorrect. I don’t think there is a unique way to go from p(x\mid\theta) to, let’s say, p(x^{*}\mid\theta^{*}), where x^{*} is the expert’s probability vector over A,A^{c} and \theta^{*} your probability vector over A,A^{c}.
Comment
Thanks for engaging!
Comment
Okay, thanks for the clarification! Let’s see if I understand your setup correctly. Suppose we have the probability measures p_{E} and p_{1}, where p_{E} is the probability measure of the expert. Moreover, we have an outcome x\in\{A,B,C\}. In your post, you use p_{1}(x\mid z)\propto p_{E}(z\mid x)p_{1}(x), where z is an unknown outcome known only to the expert. To use Bayes’ rule, we must make the assumption that p_{1}(z\mid x)=p_{E}(z\mid x). This assumption doesn’t sound right to me, but I suppose some strange assumption is necessary for this simple framework. In this model, I agree with your calculations.