Generalised models as a category

https://www.lesswrong.com/posts/nQxqSsHfexivsd6vB/generalised-models-as-a-category

Naming the "generalised" models

\newcommand{\E}{\mathcal{E}}\newcommand{\M}{\mathcal{M}}\newcommand{\W}{\mathcal{W}}\newcommand{\F}{\mathcal{F}}\newcommand{\L}{\mathcal{L}}\newcommand{\wF}{\overline{\F}}\newcommand{\suml}{\sum\limits}

In this post, I’ll apply some mathematical rigour to my ideas of model splintering, and see what they are as a category[1].

And the first question is… what to call them? I can’t refer to them as ‘the models I use in model splintering’. After a bit of reflection, I decided to call them ‘generalised models’. Though that’s a bit vague, it does describe well what they are, and what I hope to use them for: a formalism to cover all sorts of models.

The generalised models

A generalised model \M is given by three objects:

\M = (\F,\E,Q).

Here \F is a set of features. Each feature f consists of a name or label, and a set in which the feature takes values. For example, we might have the feature "room empty?" with values "true" and "false", or the feature "room temperature?" with values in \mathbb{R}^+, the positive reals.

We allow these features to sometimes take no values at all (such as the above two features if the room doesn’t exist) or multiple values (such as "potential running speed of person X" which includes the maximal speed and any speed below it).

Define \overline{f} as the set component of the feature, and \wF as the disjoint union of all the sets of the different features, ie \wF= \sqcup_{f\in\F} \overline{f}.

A world, in the most general sense, is defined by all the values that the different features could take (including situations where features take multiple values and none at all). So the set of worlds, \W, is the set of functions from \wF to \{0,1\}, with 1 representing the fact that the feature takes that value, and 0 the opposite. Hence \W=2^{\wF}, the power set of \wF.
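As a minimal sketch of this construction, the snippet below builds \wF and \W for a tiny, hypothetical pair of features (the feature names and values are illustrative assumptions, not taken from the post). Each element of \wF is tagged with its feature name to make the union disjoint, and a world is represented as the subset of \wF on which the indicator function equals 1.

```python
from itertools import chain, combinations

# Hypothetical toy features: name -> set of possible values.
features = {
    "room empty?": {"true", "false"},
    "light": {"on", "off"},
}

# Disjoint union of the value sets: tag each value with its feature name.
wF = [(name, value)
      for name in sorted(features)
      for value in sorted(features[name])]

def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

# The set of worlds W = 2^wF.
worlds = powerset(wF)

# A world where the room does not exist: no feature takes any value.
no_room = frozenset()
# A world where "room empty?" takes both of its values at once.
both = frozenset({("room empty?", "true"), ("room empty?", "false")})
```

Both unusual worlds are legitimate elements of \W here; it is the choice of \E \subset \W that would rule them in or out.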

The set of environments is a specific subset of these worlds: \E \subset \W. The choice of \E is actually more important than that of \W, as that establishes which values of the features we are modelling.

The Q is a partial probability distribution on \E. In general, we won’t worry as to whether Q is normalised (ie whether Q(\E)=1) or not; we’ll even allow Qs with Q(\E)>1. So Q could more properly be defined as a partial weight distribution. As long as we only consider terms like Q(A \mid B), the normalisation doesn’t matter.
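To illustrate why normalisation drops out of conditional terms, here is a small sketch. It assumes Q is stored as a weight per environment and that Q(A \mid B) = Q(A \cap B)/Q(B); the environment names and weights are hypothetical, and the "partial" aspect of Q (sets on which it is undefined) is ignored for simplicity.

```python
from fractions import Fraction

# Hypothetical environments with unnormalised weights (total weight is 3 > 1).
Q = {"e1": Fraction(1), "e2": Fraction(3, 2), "e3": Fraction(1, 2)}

def weight(Q, A):
    """Q(A): total weight of the environments in A."""
    return sum((Q[e] for e in A), Fraction(0))

def conditional(Q, A, B):
    """Q(A | B) = Q(A and B) / Q(B); raises ZeroDivisionError if Q(B) = 0."""
    return weight(Q, set(A) & set(B)) / weight(Q, B)

A, B = {"e1", "e3"}, {"e1", "e2"}

# Rescaling every weight by the same factor leaves Q(A | B) unchanged.
scaled = {e: 10 * w for e, w in Q.items()}
```

Here `conditional(Q, A, B)` and `conditional(scaled, A, B)` agree, since the common factor cancels between numerator and denominator.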

Morphisms: relations

For simplicity, assume there are finitely many features taking values in finite sets, making all sets in the generalised model finite.

If \M_0=(\F_0,\E_0,Q_0) and \M_1=(\F_1,\E_1,Q_1) are generalised models, then we want to use binary relations between \E_0 and \E_1 as morphisms between the generalised models.

Let r be a relation between \E_0 and \E_1, written as e_0 \sim_r e_1. Then it defines a map r:2^{\E_0} \to 2^{\E_1} between subsets of \E_0 and \E_1. This map is defined by e_1 \in r(E_0) iff there exists an e_0 \in E_0 with e_0 \sim_r e_1. The map r^{-1}:2^{\E_1} \to 2^{\E_0} is defined similarly[2], seeing r^{-1} as the inverse relation, e_0 \sim_r e_1 iff e_1 \sim_{r^{-1}} e_0.
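The two induced maps can be sketched in a few lines. This assumes a relation is stored as a set of pairs (e_0, e_1); the environment names are hypothetical.

```python
# A relation is stored as a set of pairs (e0, e1), meaning e0 ~_r e1.
# The environment names are hypothetical.
r = {("a", "x"), ("a", "y"), ("b", "y")}

def image(rel, subset):
    """r(E_0): every e1 related to at least one e0 in subset."""
    return {e1 for (e0, e1) in rel if e0 in subset}

def inverse(rel):
    """The inverse relation r^{-1}, obtained by swapping each pair."""
    return {(e1, e0) for (e0, e1) in rel}

forward = image(r, {"a"})             # everything "a" connects to
backward = image(inverse(r), {"y"})   # everything that connects to "y"
unrelated = image(r, {"c"})           # "c" is related to nothing
```

Note that r^{-1} is the map induced by the inverse relation; it is generally not an inverse of the map r on subsets.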

We say that the relation r is a morphism between the generalised models if, for any E_0 \subset \E_0 and E_1 \subset \E_1:

Q_0(E_0) \leq Q_1(r(E_0)) and Q_1(E_1) \leq Q_0(r^{-1}(E_1)).

The intuition here is that probability flows along the connections: if e_0 \sim_r e_1 then probability can flow from e_0 to e_1 (and vice-versa). Thus r(E_0) must have picked up all the probability that flowed out of E_0 - but it might have picked up more probability, since there may be connections coming into it from outside E_0. Same goes for r^{-1}(E_1) and the probability of E_1.
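For finite models, the condition can be checked by brute force. The sketch below assumes the morphism condition is the pair of inequalities Q_0(E_0) \leq Q_1(r(E_0)) and Q_1(E_1) \leq Q_0(r^{-1}(E_1)) (the form used in the r-stable section), and that each Q is a dictionary of weights; the example models are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

def subsets(items):
    """Generate every subset of items as a set."""
    items = list(items)
    return (set(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)))

def image(rel, subset):
    return {b for (a, b) in rel if a in subset}

def weight(Q, A):
    return sum((Q[e] for e in A), Fraction(0))

def is_morphism(rel, Q0, Q1):
    """Check both inequalities on every subset (exponential; toy sizes only)."""
    inv = {(b, a) for (a, b) in rel}
    forward = all(weight(Q0, E0) <= weight(Q1, image(rel, E0))
                  for E0 in subsets(Q0))
    backward = all(weight(Q1, E1) <= weight(Q0, image(inv, E1))
                   for E1 in subsets(Q1))
    return forward and backward

# Hypothetical example: "a" splits its weight between "x" and "y".
Q0 = {"a": Fraction(1)}
Q1 = {"x": Fraction(1, 2), "y": Fraction(1, 2)}
split = {("a", "x"), ("a", "y")}
lossy = {("a", "x")}  # the weight of "a" cannot all flow into "x"
```

With these weights `split` passes the check, while `lossy` fails the first inequality on E_0 = \{a\}.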

Morphism properties

We now check that these relations obey the requirements of morphisms in category theory.

Let r be a morphism \M_0 \to \M_1 (ie a relation between \E_0 and \E_1), and let q be a morphism \M_1 \to \M_2 (ie a relation between \E_1 and \E_2).

We compose relations by the composition of relations: e_0 \sim_{qr} e_2 iff there exists an e_1 with e_0 \sim_r e_1 and e_1 \sim_q e_2. Composition of relations is associative.
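Composition is easy to experiment with on small examples. In this sketch, relations are sets of pairs, q denotes the second relation (from \E_1 to \E_2), and s is an additional hypothetical relation out of \E_2 used only to spot-check associativity; all element names are made up.

```python
def compose(q, r):
    """The composite qr: apply r first, then q."""
    return {(a, c) for (a, b) in r for (b2, c) in q if b == b2}

def image(rel, subset):
    return {b for (a, b) in rel if a in subset}

r = {("a", "x"), ("b", "y")}                 # between E_0 and E_1
q = {("x", "u"), ("y", "u"), ("y", "v")}     # between E_1 and E_2
s = {("u", "m"), ("v", "n")}                 # hypothetical third relation

qr = compose(q, r)

# The map induced by qr is the composite of the induced maps.
same_image = image(qr, {"a"}) == image(q, image(r, {"a"}))

# Associativity on this example: s(qr) == (sq)r.
associative = compose(s, qr) == compose(compose(s, q), r)
```

The identity qr(E_0) = q(r(E_0)) is the fact that lets the morphism inequalities chain through a composite.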

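We now need to show that qr is a morphism. But this is easy to show, since the map induced by qr is the composition of the maps induced by r and q. For any E_0 \subset \E_0 and E_2 \subset \E_2:

Q_0(E_0) \leq Q_1(r(E_0)) \leq Q_2(q(r(E_0))) = Q_2(qr(E_0)),

Q_2(E_2) \leq Q_1(q^{-1}(E_2)) \leq Q_0(r^{-1}(q^{-1}(E_2))) = Q_0((qr)^{-1}(E_2)).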

Finally, the identity relation Id_{\E_0} is the one that relates a given e_0\in\E_0 only to itself; the maps it induces, Id_{\E_0} and Id_{\E_0}^{-1}, are then the identity maps on 2^{\E_0}, and the morphism properties (with Q_1=Q_0) are trivially true.

So define the category of generalised models as \mathcal{GM}.

r-stable sets

Say that a set E_0 \subset \E_0 is r-stable if r^{-1}r(E_0)=E_0.

For such an r-stable set, Q_0(E_0) \leq Q_1(r(E_0)) and Q_1(r(E_0))\leq Q_0(r^{-1}r(E_0)) = Q_0(E_0), thus Q_0(E_0)=Q_1(r(E_0)).

Hence if r is a morphism, it preserves the probability measure on the r-stable sets.
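A small sketch of r-stability, under the same set-of-pairs representation of relations. The relation and weights are hypothetical; Q_1 is chosen as the pushforward of Q_0 along r, so that r satisfies the inequalities Q_0(E_0) \leq Q_1(r(E_0)) and Q_1(E_1) \leq Q_0(r^{-1}(E_1)) used in the argument above.

```python
def image(rel, subset):
    return {b for (a, b) in rel if a in subset}

def inverse(rel):
    return {(b, a) for (a, b) in rel}

def is_stable(rel, E0):
    """E0 is r-stable iff r^{-1}(r(E0)) == E0."""
    return image(inverse(rel), image(rel, E0)) == set(E0)

def weight(Q, A):
    return sum(Q[e] for e in A)

# "a" and "b" are merged into "x"; "c" maps to "y".
r = {("a", "x"), ("b", "x"), ("c", "y")}
Q0 = {"a": 1, "b": 2, "c": 3}
Q1 = {"x": 3, "y": 3}   # pushforward of Q0 along r

merged = {"a", "b"}     # stable: r^{-1}r returns exactly {"a", "b"}
single = {"a"}          # not stable: r^{-1}r({"a"}) also contains "b"
```

On the stable set `merged` the weights agree, Q_0(\{a,b\}) = Q_1(\{x\}) = 3, whereas for the non-stable set `single` only the inequality Q_0(\{a\}) \leq Q_1(\{x\}) holds.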

In the particular case where r is a bijective function, all singletons \{e_0\} \subset \E_0 are r-stable (and all singletons in \E_1 are r^{-1}-stable), so Q_0(e_0) = Q_1(r(e_0)) for every e_0: r is an isomorphism between \E_0 and \E_1 that forces Q_0 = Q_1 under that identification.

Morphism example: probability update

Suppose we wanted to update our probability measure Q_0, maybe by updating that a particular feature f takes a certain value x.

Then let E_{f=x} \subset \E_0 be the set of environments where f takes that value x. Then updating on f=x is the same as restricting to E_{f=x} and then rescaling.

Since we don’t care about the scaling, we can consider updating on f=x as just restricting to E_{f=x}. This morphism is given by:

Morphism example: surjective partial function

In my previous posts I defined how \M_1=(\F_1,\E_1,Q_1) could be a refinement of \M_0=(\F_0,\E_0,Q_0).

In the language of the present post, \M_1 is a refinement of \M_0 if there exists a generalised model \M_1'=(\F_1,\E_1,Q_1') and a surjective partial function r: \E_1 \to \E_0 (functions and partial functions are specific examples of binary relations) that is a morphism from \M_1' to \M_0. The Q_1 is required to be potentially ‘better’ than Q_1' on \E_1, in some relevant sense.

This means that \M_1 is ‘better’ than \M_0 in three ways. The r is surjective, so \E_1 covers all of \E_0, so its set of environments is at least as detailed. The r is a partial function, so \E_1 might have even more environments that don’t correspond to anything in \E_0 (it considers more situations). And, finally, Q_1 is better than Q_1', by whatever definition of better that we’re using.
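The two structural properties of r used here can be checked directly on a finite relation. This sketch only tests the shape of r (partial function, surjective); it does not check the morphism inequalities or the "better" comparison between Q_1 and Q_1'. All environment names are hypothetical.

```python
def is_partial_function(rel):
    """Each source element is related to at most one target element."""
    sources = [a for (a, _) in rel]
    return len(sources) == len(set(sources))

def is_surjective(rel, codomain):
    """Every element of the codomain is hit by some pair of the relation."""
    return {b for (_, b) in rel} == set(codomain)

E1 = {"x1", "x2", "x3", "extra"}   # refined environments
E0 = {"a", "b"}                    # coarse environments

# r: E1 -> E0, undefined on "extra" (a new situation with no coarse counterpart).
r = {("x1", "a"), ("x2", "a"), ("x3", "b")}

domain = {a for (a, _) in r}
uncovered = E1 - domain
```

Here "a" is split into two refined environments, and `uncovered` lists the refined environments that correspond to nothing in \E_0.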

Feature-split relations

The morphisms/relations defined so far use \E and Q - but they don’t make any use of \F. Here is one definition that does make use of the feature structure.

Say that the generalised model \M=(\F,\E,Q) is feature-split if \F = \sqcup_{i=1}^n \F^i and \E = \times_{i=1}^n \E^i such that

\E^i \subset 2^{\overline{\F^i}}.

Note that \F = \sqcup_{i=1}^n \F^i implies \W=2^{\overline{\F}} = \times_{i=1}^n 2^{\overline{\F^i}}, so \times_{i=1}^n \E^i lies naturally within \W.

Designate such a generalised model by \M=(\{\F^i\},\E,Q).

Then a feature-split relation between \M_0=(\{\F^i_0\},\E_0,Q_0) and \M_1=(\{\F^i_1\},\E_1,Q_1) is a morphism r that is defined as r=(r^1,r^2, \ldots, r^n) with r^i a relation between \E_0^i and \E_1^i.
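One natural reading of r=(r^1, \ldots, r^n), assumed here rather than stated in the text, is componentwise: (e_0^1, \ldots, e_0^n) \sim_r (e_1^1, \ldots, e_1^n) iff e_0^i \sim_{r^i} e_1^i for every i. A sketch of that construction, with hypothetical component relations and with environments stored as tuples:

```python
from itertools import product

def product_relation(components):
    """Componentwise relation built from a list of relations r^1, ..., r^n.

    Each component is a set of pairs (a, b); the result relates the tuple
    of all the a's to the tuple of all the b's.
    """
    return {(tuple(a for a, _ in pairs), tuple(b for _, b in pairs))
            for pairs in product(*components)}

# Hypothetical components: one over temperature features, one over occupancy.
r1 = {("cold", "low"), ("hot", "high")}
r2 = {("empty", "0"), ("full", "1"), ("full", "2")}

r = product_relation([r1, r2])
```

This only builds the relation; whether it is a morphism still depends on Q_0 and Q_1 satisfying the morphism inequalities.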

Comment

https://www.lesswrong.com/posts/nQxqSsHfexivsd6vB/generalised-models-as-a-category?commentId=sbRpqEAqgpy7t4AAy

So the set of worlds, \mathcal W, is the set of functions from \mathcal F to …

I guess the \mathcal F should be a \bar{\mathcal F}? Also, you don’t seem to define \mathcal E; perhaps \mathcal E = \mathcal W?

Comment

https://www.lesswrong.com/posts/nQxqSsHfexivsd6vB/generalised-models-as-a-category?commentId=vQAjt942npuv4TBst

Thanks! Corrected both of those; \mathcal{E} is a subset of \mathcal{W}.

https://www.lesswrong.com/posts/nQxqSsHfexivsd6vB/generalised-models-as-a-category?commentId=4if7zLoTgSxKistTb

Re "I’m not fully sold on category theory as a mathematical tool", if someone (e.g. me) were to take the category you’ve outlined and run with it, in the sense of establishing its general structure and special features, could you be convinced? Are there questions that you have about this category that you currently are only able to answer by brute force computation from the definitions of the objects and morphisms as you’ve given them? More generally, are there variants of this category that you’ve considered that it might be useful to study in parallel?

Comment

https://www.lesswrong.com/posts/nQxqSsHfexivsd6vB/generalised-models-as-a-category?commentId=9CuKSqzNZ4EWvsPmK

For the moment, I’m going to be trying to resolve practical questions of model splintering, and then I’ll see if this formalism turns out to be useful for them.

https://www.lesswrong.com/posts/nQxqSsHfexivsd6vB/generalised-models-as-a-category?commentId=ED85aGk4jEZkmraSS

Cross reference: I am not a big fan of stating things in category theory notation, so I made some remarks on the building and interpretation of generalised models in the comment section of this earlier post on model splintering.

Comment

https://www.lesswrong.com/posts/nQxqSsHfexivsd6vB/generalised-models-as-a-category?commentId=9sLsrMCuCyn3nQ9t4

Cheers! My opinion on category theory has changed a bit, because of this post; by making things fit into the category formulation, I developed insights into how general relations could be used to connect different generalised models.

Comment

Definitely, it has also been my experience that you can often get new insights by constructing mappings to different models or notations.