8.3 High Confidence Topics

I mentioned in section 8.1 that several topics appeared frequently among the articles the model was very confident about. Let’s look at those topics on a graph.

This graph takes the 50 articles the model is most confident about and counts how many of them fall into each topic.

Figure 8.2: Topic distribution for the 50 articles the model is most certain about

As I noted back when talking about Space and Time, that topic has a surprisingly large number of articles the model is very confident about. But as we saw above, a lot of the articles the model is confident about are very short. Let’s focus instead on the articles that are at least 10 pages long, and again look at the distribution of the 50 articles the model is most confident about.
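The filtering step can be sketched in a few lines. This is a minimal illustration, not the code behind these graphs: it assumes each article record carries a topic label, a page count, and a single confidence score (the field names `topic`, `pages`, and `confidence` are hypothetical), and the toy articles below are invented purely to exercise the function.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    topic: str
    pages: int
    confidence: float  # hypothetical: the model's probability for the article's topic


def top_confident_topics(articles, n=50, min_pages=0):
    """Count topics among the n most confidently classified articles,
    considering only articles at least min_pages long."""
    # Apply the length floor first, so short articles cannot crowd out long ones.
    eligible = [a for a in articles if a.pages >= min_pages]
    # Rank the remaining articles by confidence, highest first, and keep the top n.
    top = sorted(eligible, key=lambda a: a.confidence, reverse=True)[:n]
    # Tally how many of those top articles fall into each topic.
    return Counter(a.topic for a in top)


# Invented toy data, for illustration only.
articles = [
    Article("A", "Evolutionary Biology", 25, 0.99),
    Article("B", "Space and Time", 4, 0.995),
    Article("C", "Quantum Physics", 42, 0.97),
    Article("D", "Political Philosophy", 35, 0.90),
]
print(top_confident_topics(articles, n=3))
print(top_confident_topics(articles, n=3, min_pages=10))
```

Because the length floor is applied before ranking, raising `min_pages` removes the short, highly confident articles (like the toy Space and Time entry) and lets longer articles move into the top n.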

Figure 8.3: Topic distribution for the 50 articles the model is most certain about (min 10 pages)

And this isn’t surprising; the model gets really confident that Evolutionary Biology articles are properly placed. The same thing happens when we increase the length to 20 pages.

Figure 8.4: Topic distribution for the 50 articles the model is most certain about (min 20 pages)

There are still 10 Evolutionary Biology articles, though mostly not the same 10. There are also fewer categories here: just 18 are represented in these 50 articles. And the purples and reds indicate that the articles are, on the whole, much later. These trends continue when we raise the floor to 30 pages, though now the topics start to shift.

Figure 8.5: Topic distribution for the 50 articles the model is most certain about (min 30 pages)

There is more Quantum Physics, and more political philosophy. And when we move to 40 pages, which means we’re just looking at the longest 2% of articles, these trends really accelerate.

Figure 8.6: Topic distribution for the 50 articles the model is most certain about (min 40 pages)

By this stage the graph measures less which articles the model is really confident about, and more which kinds of philosophers write articles that long. The answer is, apparently, philosophers of (quantum) physics, political philosophers, and early modern historians.