8.3 High-Confidence Topics

I mentioned in section 8.1 that several topics appeared frequently among the articles the model was very confident about. Let’s look at those topics on a graph.

This graph looks at the fifty articles the model is most confident about, and asks how many of them fall in each topic.

A histogram showing the topic distribution for the fifty articles the model is the most certain about, shown as the number of articles with topic probability at least 0.905. Space and time has by far the most articles in this category.

Figure 8.2: Topic distribution for the fifty articles the model is most certain about.

As I noted back when talking about space and time, it has a surprisingly large number of articles the model is very confident about. But as we saw above, a lot of the articles the model is confident about are very short. Let’s focus instead on the articles that are at least ten pages long, and again look at the distribution of the fifty articles the model is most confident about.
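The selection behind these graphs can be sketched roughly as follows. This is a minimal illustration in Python, not the code used to build the figures: the `Article` record, its field names, the function `top_confident_topics`, and the toy data are all assumptions made for the example. It filters by a page floor, keeps the `n` articles with the highest topic probability, counts them by topic, and reports the lowest probability among those kept, which plays the role of the cutoff quoted for each graph.

```python
from collections import Counter
from typing import NamedTuple


class Article(NamedTuple):
    """Hypothetical record: one article with its best-fitting topic."""
    title: str
    topic: str          # topic the model gives the highest probability
    probability: float  # that highest topic probability
    pages: int          # article length in pages


def top_confident_topics(articles, n=50, min_pages=0):
    """Count topics among the n most confident articles of at least min_pages.

    Returns (counts, cutoff), where cutoff is the lowest probability
    among the selected articles, or None if no article qualifies.
    """
    eligible = [a for a in articles if a.pages >= min_pages]
    selected = sorted(eligible, key=lambda a: a.probability, reverse=True)[:n]
    counts = Counter(a.topic for a in selected)
    cutoff = selected[-1].probability if selected else None
    return counts, cutoff


# Toy data, invented purely for illustration.
sample = [
    Article("A", "Space and Time", 0.97, 4),
    Article("B", "Space and Time", 0.95, 3),
    Article("C", "Evolutionary Biology", 0.90, 14),
    Article("D", "War", 0.85, 12),
    Article("E", "Evolutionary Biology", 0.80, 25),
]

# No length floor: the short, high-confidence articles dominate.
counts, cutoff = top_confident_topics(sample, n=2)
print(counts.most_common(), cutoff)

# With a ten-page floor, the short articles drop out and the cutoff falls.
counts_long, cutoff_long = top_confident_topics(sample, n=2, min_pages=10)
print(counts_long.most_common(), cutoff_long)
```

Raising `min_pages` shrinks the pool the fifty are drawn from, so the cutoff probability can only stay the same or fall, which matches the declining thresholds in the graphs in this section.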

A histogram showing the topic distribution for the fifty articles the model is the most certain about that are at least ten pages long. This is shown as the number of articles with topic probability at least 0.847. Evolutionary Biology is the most frequent in this distribution, followed by War.

Figure 8.3: Topic distribution for the fifty articles the model is most certain about (min ten pages).

And this isn’t surprising; the model gets really confident that evolutionary biology articles are properly placed. The same thing happens when we increase the length to twenty pages.

A histogram showing the topic distribution for the fifty articles the model is the most certain about that are at least twenty pages long. This is shown as the number of articles with topic probability at least 0.785. Evolutionary biology is the most frequent in this distribution, followed by quantum physics.

Figure 8.4: Topic distribution for the fifty articles the model is most certain about (min twenty pages).

There are still ten evolutionary biology articles, though mostly not the same ten. And there are fewer categories here: just eighteen are represented in these fifty articles. And the purples and reds indicate that the articles are getting much more recent. These trends extend when we raise the floor to thirty pages, though now the topics start to shift.

A histogram showing the topic distribution for the fifty articles the model is the most certain about that are at least thirty pages long. This is shown as the number of articles with topic probability at least 0.695. Liberal Democracy is the most frequent in this distribution, followed by Quantum Physics and Egalitarianism.

Figure 8.5: Topic distribution for the fifty articles the model is most certain about (min thirty pages).

There is more quantum physics, and more political philosophy. And when we move to forty pages, which means we’re just looking at the longest two percent of articles, these trends really accelerate.
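How much of the collection a page floor leaves behind is a simple proportion. The sketch below shows the calculation; the function name `share_at_least` and the list of lengths are invented for illustration and are not the real page counts.

```python
def share_at_least(page_counts, floor):
    """Return the fraction of articles that are at least `floor` pages long."""
    if not page_counts:
        raise ValueError("page_counts must not be empty")
    return sum(1 for pages in page_counts if pages >= floor) / len(page_counts)


# Invented lengths: 98 shorter articles and 2 of forty pages or more.
lengths = [12] * 98 + [41, 55]
share = share_at_least(lengths, 40)
print(f"{share:.0%} of these articles are at least forty pages long")
```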

A histogram showing the topic distribution for the fifty articles the model is the most certain about that are at least forty pages long. This is shown as the number of articles with topic probability at least 0.559. Quantum Physics is the most frequent in this distribution, followed by Liberal Democracy and Early Modern.

Figure 8.6: Topic distribution for the fifty articles the model is most certain about (min forty pages).

By this stage the graph is measuring less which articles the model is really confident in, and more which kinds of philosophers write articles that long. The answer is, apparently, philosophers of (quantum) physics, political philosophers, and early modern historians.