TL;DR: I used to think the best way to get really good at a skill s_i was to specialize by investing lots of time t_i into s_i. I was wrong. That strategy works only as a first-order approximation. Once t_i becomes large, investing time in some other skill s_{j\neq i} improves real-world performance p_i more than continued investment in t_i does.
I like to think of intelligence as a vector s={s_1, s_2, \ldots, s_n} where each component s_i\geq0 is your level at a different skill. I think of general intelligence g as the Euclidean norm g=\sqrt{\sum_{i=1}^n s_i^2}.
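To make the definition concrete with purely hypothetical numbers: a skill vector s={3, 4, 12} gives g=\sqrt{3^2+4^2+12^2}=\sqrt{169}=13.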
I use the Euclidean norm instead of the straight sum \sum_{i=1}^n s_i because generality of experience equals generality of transference. Suppose you are exposed to a novel situation requiring a skill s_{n+1} you have never practiced, so s_{n+1}=0. Having no experience at s_{n+1}, you must borrow from your most similar skill. The wider the variety of skills you have, the more similar your most similar skill will be to s_{n+1}.
The best way to increase your general intelligence g is to invest time t_w into your weakest skill s_w. If the time t_s you have already invested in your strongest skill s_s is high, then further investment in t_w can even increase the real-world performance of your strongest skill faster than further investment in t_s does.
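A quick way to see the mechanism, using nothing beyond the definition of g and the chain rule: \frac{\partial g}{\partial t_i}=\frac{s_i}{g}\frac{\partial s_i}{\partial t_i}. If skill growth has diminishing returns (the assumption made explicit below), then \frac{\partial s_w}{\partial t_w} on a fresh learning curve can dwarf \frac{\partial s_s}{\partial t_s} on a nearly flat one, which is what lets time on the weak skill come out ahead despite its smaller \frac{s_i}{g} factor.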
Suppose you want to increase p_1, your real-world performance at s_1. Since \frac{\partial p_1}{\partial t_1}>0, investing time t_1 into s_1 always increases p_1. But eventually you hit diminishing returns: for every \epsilon>0 there exists a \delta\in\mathbb{R} such that if t_1>\delta then \frac{\partial s_1}{\partial t_1}<\epsilon.
\lim_{t_1\to\infty}\frac{\partial p_1}{\partial t_1}=\lim_{t_1\to\infty}\frac{\partial s_1}{\partial t_1}=0
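For instance, under the purely illustrative assumption that the learning curve is s_1(t_1)=\log(1+t_1), we get \frac{\partial s_1}{\partial t_1}=\frac{1}{1+t_1}, which falls below any \epsilon>0 once t_1>\frac{1}{\epsilon}-1 and tends to 0 as t_1\to\infty.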
Here’s where things get interesting. "All non-trivial abstractions, to some degree, are leaky," and a system is only as secure as its weakest link; cracking a system tends to happen on an overlooked layer of abstraction. All real-world applications of skill are non-trivial abstractions. Therefore skill in one domain occasionally leaks over to improve performance in adjacent domains. Your real-world performance p_1 picks up leakage from adjacent skills s_{j\neq1}\in s on the rungs above and below s_1 on the ladder of abstraction.
These adjacent skills increase your real-world performance p_1 by an amount independent of s_1. Since \lim_{t_1\to\infty}\frac{\partial s_1}{\partial t_1}=0, there will inevitably come a time when investing in t_1 increases p_1 less than investing in some t_{j\neq1} does.
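To make the crossover concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption of mine rather than a model from the argument above: direct practice returns decay like \frac{1}{1+t_1} (the same toy curve as before), an hour on an adjacent skill leaks a constant amount LEAK into p_1, and the numbers are arbitrary.

```python
# Toy model of the crossover point. The functional form and the constant
# below are illustrative assumptions, not derived from the post:
# direct practice of s_1 has diminishing returns d s_1 / d t_1 = 1 / (1 + t_1),
# while an hour spent on an adjacent skill s_j leaks a small constant
# amount into p_1, independent of s_1.

def direct_marginal_gain(t1: float) -> float:
    """Assumed diminishing-returns curve for practicing s_1 directly."""
    return 1.0 / (1.0 + t1)

LEAK = 0.02  # assumed constant spillover into p_1 per hour spent on s_j

for t1 in [0, 10, 100, 1000]:
    direct = direct_marginal_gain(t1)
    better = "s_1" if direct > LEAK else "s_j"
    print(f"t1={t1:4d}  direct={direct:.4f}  leak={LEAK:.4f}  -> spend the next hour on {better}")
```

With these made-up constants the switch happens somewhere between t_1=10 and t_1=100. The exact point is an artifact of the numbers, but the existence of a crossover only requires that direct returns go to zero while the leak does not.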
It follows that the number of avocations one pursues should correlate positively with winning Nobel Prizes, despite the time these hobbies take away from one’s specialization.
When I want to improve my ability to write machine learning algorithms, my first instinct is to study machine learning. But in practice, it’s often more profitable to do something seemingly unrelated, like learning about music theory. I find it hard to follow this strategy because it is so counterintuitive.
I think a hidden assumption here is that improving a weak skill always has a positive spillover effect on other skills. There might be a hidden truth within this: namely, that sometimes unlearning things will be the best way to make progress.
Perhaps this can be connected with another recent post. It was pointed out in Subspace Optima that when we optimize, we do so under constraints, external or internal. It seems like you had an internal constraint stopping you from optimizing over the whole space. Instead you focused on what you thought was the most correlated trait. This almost reads like an insight following the realization that you’ve been optimizing a skill along an artificial subspace.