8.4 Correlations

The model assigns a probability to each topic-article pair. So for any two topics, we can ask how tightly correlated their probabilities are across the articles. Which topics tend to go up when the other goes up? There are 4005 pairs of distinct topics (8010 if each pair is counted in both orders), which is too much data to usefully examine, or even visualise. But I wanted to go over the extremes. First, here are the thirty-two strongest correlations. (Why thirty-two? Because these seemed particularly interesting.)

Table 8.10: Highest topic correlations.
Topic one | Topic two | Correlation
--- | --- | ---
Knowledge | Justification | 0.2214
Chance | Theory testing | 0.2102
Idealism | Self-consciousness | 0.1956
Faith and theism | Ontological argument | 0.1852
Idealism | Life and value | 0.1713
Propositions and implications | Deduction | 0.1708
Moral conscience | Virtues | 0.1615
Moral conscience | Promises and imperatives | 0.1595
Laws | Causation | 0.1550
Physicalism | Perception | 0.1533
Methodology of science | Theories and realism | 0.1525
Mechanisms | Cognitive science | 0.1495
Temporal paradoxes | Classical space and time | 0.1490
Concepts | Wide content | 0.1468
Laws | Explanation | 0.1454
Promises and imperatives | Intention | 0.1445
Color/colour | Perception | 0.1422
Modality | Composition and constitution | 0.1408
Reasons | Norms | 0.1397
Life and value | Marx | 0.1369
Sense and reference | Belief ascriptions | 0.1347
Meaning and use | Ordinary language | 0.1345
Promises and imperatives | Duties | 0.1343
Other history | History and culture | 0.1336
Denoting | Sense and reference | 0.1319
Temporal paradoxes | Time | 0.1292
Dewey and pragmatism | Moral conscience | 0.1289
Dewey and pragmatism | Value | 0.1282
Definitions | Meaning and use | 0.1268
Life and value | Faith and theism | 0.1261
Marx | Liberal democracy | 0.1243
History and culture | Marx | 0.1243

I think these mostly make sense. The two epistemology topics are very tightly connected. The two topics about formal methods in scientific reasoning are correlated. (Remember that the chance topic included a lot of work on formal models of inference.) The philosophy of religion topics are correlated. Idealism is correlated with the other early topics. Topics about time are correlated. Denoting and sense and reference are correlated; Frege and Russell aren’t that far apart.

The bottom few here are particularly interesting. Moral conscience and value include some very analytic ethics; it’s interesting that they are so close to Dewey and pragmatism. Marx the topic plays well with life and value (i.e., idealist ethics), with liberal democracy, and with history and culture. This is a bit surprising, since Marx himself didn’t play well with any of them. Life and value also plays well with faith and theism, the core philosophy of religion topic. That mildly surprised me, but perhaps it should not have, given how important the Absolute is to idealists.
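For readers who want to build rankings like Tables 8.10 and 8.11 from their own topic model, here is a minimal Python sketch. It assumes the model's output is available as an articles-by-topics array of probabilities (topic-model implementations often call this matrix theta or gamma) plus a list of topic labels in column order. The helper names `topic_pair_correlations`, `strongest` and `most_negative` are hypothetical, and this is not necessarily how the tables here were produced. Each topic is treated as a variable and each article as an observation, and every unordered pair of distinct topics is scored once by its Pearson correlation.

```python
import itertools
import math

import numpy as np


def topic_pair_correlations(doc_topic, names):
    """Pearson correlation, across articles, for each unordered pair of distinct topics.

    doc_topic: array of shape (n_articles, n_topics); row i holds the topic
    probabilities the model assigns to article i (assumed input format).
    names: list of n_topics topic labels, in column order.
    Returns a list of (topic_a, topic_b, r) tuples, one per pair.
    """
    # rowvar=False: columns (topics) are the variables, rows (articles) the observations.
    corr = np.corrcoef(np.asarray(doc_topic, dtype=float), rowvar=False)
    return [
        (names[i], names[j], float(corr[i, j]))
        for i, j in itertools.combinations(range(len(names)), 2)
    ]


def strongest(pairs, k):
    """The k pairs with the highest (most positive) correlations."""
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:k]


def most_negative(pairs, k):
    """The k pairs with the lowest (most negative) correlations."""
    return sorted(pairs, key=lambda p: p[2])[:k]


# With 90 topics there are C(90, 2) unordered pairs of distinct topics.
print(math.comb(90, 2))  # 4005
```

Sorting the one list of pairs in opposite directions gives the top and the bottom of the ranking; rounding each `r` to four decimal places matches the display used in the tables.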

Let’s turn to the strongest negative correlations. These are a little less interesting.

Table 8.11: Lowest topic correlations.
Topic one | Topic two | Correlation
--- | --- | ---
Life and value | Arguments | -0.1298
Idealism | Arguments | -0.1167
Ordinary language | Sets and grue | -0.0962
Idealism | Norms | -0.0902
Life and value | Sets and grue | -0.0891
Life and value | Verification | -0.0888
Life and value | Propositions and implications | -0.0869
Life and value | Truth | -0.0841
Psychology | Arguments | -0.0789
Other history | Arguments | -0.0765
Mechanisms | Arguments | -0.0752
Idealism | Sets and grue | -0.0749
Life and value | Theories and realism | -0.0747
Ordinary language | Theories and realism | -0.0746
Methodology of science | Arguments | -0.0736
Life and value | Sense and reference | -0.0718
Methodology of science | Promises and imperatives | -0.0716
Ordinary language | Models | -0.0716
Life and value | Composition and constitution | -0.0714
Life and value | Deduction | -0.0712
Idealism | Justification | -0.0700
Life and value | Justification | -0.0693
Physicalism | Moral conscience | -0.0692
Life and value | Modality | -0.0685
Definitions | Arguments | -0.0684

The early topics and the late topics are negatively correlated. The Idealists aren’t positively correlated with anyone who isn’t sympathetic to idealism. No one was offering arguments, at least not as such, in the early going. We’ll come back to this table and see if we can find something more interesting.

What about the pairs of topics that are closest to uncorrelated? To four decimal places, these pairs have correlations of zero or very nearly zero.

Table 8.12: Most independent topics.
Topic one | Topic two | Correlation
--- | --- | ---
Personal identity | Wide content | 0.0000
Ordinary language | Crime and punishment | 0.0000
Decision theory | Models | 0.0000
Explanation | Reasons | 0.0001
Origins and purposes | Races and DNA | 0.0001
Intention | Knowledge | 0.0001
Beauty | Meaning and use | 0.0001
Beauty | Functions | 0.0001
Psychology | Minds and machines | -0.0001
Origins and purposes | Abortion and self-defence | -0.0001
Origins and purposes | Cognitive science | 0.0002
Abortion and self-defence | Formal epistemology | 0.0002
Denoting | Arguments | 0.0002
Theory testing | Evolutionary biology | 0.0002
Hume | Personal identity | -0.0002
Universals and particulars | Functions | -0.0002
Explanation | Quantum physics | -0.0002
Deduction | Thermodynamics | -0.0002
Value | Liberal democracy | 0.0003
Arguments | Sense and reference | -0.0003
History and culture | Mechanisms | -0.0003
Chance | Mathematics | -0.0003
Mechanisms | Theory testing | -0.0003
Physicalism | Wide content | -0.0003
Deduction | Modality | 0.0004

I don’t know what I expected here, but I don’t think it was this. Some of these felt like they should be positively correlated. Just on timing grounds, I expected personal identity to correlate with wide content. And I would have guessed that beauty would be negatively correlated with meaning and use. Maybe there isn’t anything to be found here; this mostly looks like noise to me.
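A ranking like Table 8.12 comes from sorting pairs by the absolute size of their correlation, so small negative values rank alongside small positive ones. The Python sketch below assumes an articles-by-topics array of topic probabilities, one column per topic; `most_uncorrelated` is a hypothetical helper name. One caveat worth keeping in mind: a Pearson correlation near zero rules out a linear relationship between two topics, not every kind of dependence, so "uncorrelated" is a safer description of these pairs than "independent".

```python
import itertools

import numpy as np


def most_uncorrelated(doc_topic, names, k):
    """The k pairs of distinct topics whose across-article correlation is closest to zero.

    doc_topic: array of shape (n_articles, n_topics) of topic probabilities (assumed format).
    names: topic labels in column order.
    """
    corr = np.corrcoef(np.asarray(doc_topic, dtype=float), rowvar=False)
    pairs = [
        (names[i], names[j], float(corr[i, j]))
        for i, j in itertools.combinations(range(len(names)), 2)
    ]
    # Rank on |r| so that, e.g., -0.0001 and 0.0001 count as equally close to zero.
    return sorted(pairs, key=lambda p: abs(p[2]))[:k]
```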

The low correlation table featured mostly topics from the first half of the list. (Indeed, every pair featured at least one such topic.) So let’s redo the high and low correlation tables, restricted to topics 46–90.
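One useful fact about this restriction: the correlation between two topics depends only on those two topics' columns, so restricting to topics 46–90 filters the full list of pairs rather than changing any of the numbers. That is why, for instance, knowledge and justification have the same correlation (0.2214) in Tables 8.10 and 8.13. Here is a minimal Python sketch, assuming topics are numbered from 1 in the column order of an articles-by-topics probability array; `pairs_within` is a hypothetical helper name.

```python
import itertools

import numpy as np


def pairs_within(doc_topic, names, first, last):
    """Correlations for pairs of distinct topics numbered first..last (1-based, inclusive).

    doc_topic: array of shape (n_articles, n_topics) of topic probabilities (assumed format).
    names: topic labels in column order, so topic t is column t - 1.
    """
    corr = np.corrcoef(np.asarray(doc_topic, dtype=float), rowvar=False)
    # Only the set of indices changes; each r is read from the full correlation matrix.
    cols = range(first - 1, last)
    return [
        (names[i], names[j], float(corr[i, j]))
        for i, j in itertools.combinations(cols, 2)
    ]
```

Sorting `pairs_within(doc_topic, names, 46, 90)` by `r` in descending or ascending order and keeping the first 25 entries gives tables shaped like Tables 8.13 and 8.14.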

Table 8.13: Highest topic correlations (topics 46–90).
Topic one | Topic two | Correlation
--- | --- | ---
Knowledge | Justification | 0.2214
Laws | Causation | 0.1550
Concepts | Wide content | 0.1468
Laws | Explanation | 0.1454
Modality | Composition and constitution | 0.1408
Reasons | Norms | 0.1397
Sense and reference | Belief ascriptions | 0.1347
Speech acts | Sense and reference | 0.1234
Theory testing | Theories and realism | 0.1224
Liberal democracy | Egalitarianism | 0.1110
Decision theory | Game theory | 0.1097
Truth | Vagueness | 0.1047
Quantum physics | Thermodynamics | 0.0991
Thermodynamics | Models | 0.0989
Causation | Models | 0.0987
Space and time | Quantum physics | 0.0983
Wide content | Cognitive science | 0.0960
Truth | Radical translation | 0.0960
Personal identity | Composition and constitution | 0.0946
Minds and machines | Wide content | 0.0899
Justification | Norms | 0.0898
Minds and machines | Cognitive science | 0.0892
Justification | Reasons | 0.0889
Liberal democracy | Duties | 0.0881
Theory testing | Models | 0.0877

Those all seem to make sense. That isn’t totally surprising, but it’s reassuring to see that the model doesn’t seem to have messed up here. Let’s look at the other end of the table.

Table 8.14: Lowest topic correlations (topics 46–90).
Topic one | Topic two | Correlation
--- | --- | ---
Perception | Truth | -0.0595
Decision theory | Concepts | -0.0490
Perception | Decision theory | -0.0488
Liberal democracy | Truth | -0.0463
Perception | Liberal democracy | -0.0458
Causation | Truth | -0.0437
Perception | Duties | -0.0434
Laws | Perception | -0.0432
Truth | Reasons | -0.0431
Knowledge | Composition and constitution | -0.0430
Theories and realism | Knowledge | -0.0423
Perception | Reasons | -0.0422
Concepts | Formal epistemology | -0.0421
Truth | Egalitarianism | -0.0419
Duties | Truth | -0.0415
Arguments | Thermodynamics | -0.0404
Decision theory | Composition and constitution | -0.0402
Perception | Models | -0.0400
Liberal democracy | Composition and constitution | -0.0396
Theory testing | Composition and constitution | -0.0395
Truth | Evolutionary biology | -0.0394
Perception | Mathematics | -0.0393
Perception | Egalitarianism | -0.0390
Mathematics | Reasons | -0.0389
Liberal democracy | Concepts | -0.0387

This is a bit surprising. I thought I’d see pairs like liberal democracy and composition and constitution turning up a lot here. That is, I thought the table would recreate the familiar divide between ethics and M&E (metaphysics and epistemology). But pairs like that are not the bulk of the table. Instead, we get a lot of negatively correlated pairs that are on the same side of this (alleged) divide.

Some such pairs are not surprising. Concepts and formal epistemology are negatively correlated, but this makes perfect sense, because virtually all the work in formal epistemology uses unstructured contents.

But one might worry that the lack of an ethics versus M&E divide here shows that the model has missed something important. I think a better conclusion is that the model is correctly detecting that M&E isn’t a useful classification in contemporary philosophy. This feels like something that could do with further study, but I doubt text mining will be the way forward here. It would be interesting, for example, to see whether citation studies show that there is (or is not) a big ethics versus M&E divide.