1.9 Strengths and Weaknesses

The benefit of using this kind of modeling is that it allows you to take every article into account. This is the history of philosophy (in these journals) without any gaps whatsoever.

And this is no small feat. Remember that there are 32261 articles that we’re looking at. Let’s say that you could dedicate 8 hours a day, 5 days a week, just to reading these articles, and that you could on average read an article per hour. Some, to be sure, would take less than an hour even to read closely. But just one hour is an optimistic reading time for the longer articles. Still, let’s make the optimistic assumption. That would mean 807 weeks just to read through them all. If you take 2 weeks a year off, it would take you 16 years just to do the reading. And at the end of that time, you’d at best have some sketchy notes on the articles, not anything you can use for an analysis.
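The back-of-the-envelope arithmetic above can be checked with a few lines of Python. This is a minimal sketch using the same optimistic assumptions as the text: one article per hour, 8-hour days, 5-day weeks, and a 50-week working year (52 weeks minus 2 weeks off).

```python
import math

ARTICLES = 32261             # articles in the corpus
ARTICLES_PER_HOUR = 1        # optimistic average reading speed
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 5
WORKING_WEEKS_PER_YEAR = 50  # 52 weeks minus 2 weeks off


def reading_time(articles=ARTICLES):
    """Return (weeks, years) needed to read every article once."""
    per_week = ARTICLES_PER_HOUR * HOURS_PER_DAY * DAYS_PER_WEEK  # 40 articles
    weeks = math.ceil(articles / per_week)  # round up to whole weeks
    years = weeks / WORKING_WEEKS_PER_YEAR
    return weeks, years


weeks, years = reading_time()
print(f"{weeks} weeks, about {years:.1f} years")  # 807 weeks, about 16.1 years
```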

If you want to analyse all the articles, if you want to really have no gaps, then the only way to do it is by machine.

But there are a number of downsides to this algorithmic approach, all of which come from the fact that the machine is just doing string recognition. The algorithm doesn’t know any semantics, just syntax. And this causes some complications. I’ll mention five here, along with a brief discussion of how badly they impacted the model I ended up using.
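To make "just doing string recognition" concrete, here is a minimal sketch of the bag-of-words representation that topic models such as LDA typically start from. It is not the pipeline used for the model discussed here; the tokenizer and the two example sentences are made up for illustration. Once a text is reduced to word counts, the token `function` from a biology paper and from a logic paper are the very same string, so any distinction between the two uses has to come from the other words they appear alongside.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into alphabetic tokens, and count them.

    Word order and meaning are discarded; only string identity survives.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


biology = "The function of the heart is to pump blood."
logic = "A function maps each argument to a unique value."

bio_counts = bag_of_words(biology)
logic_counts = bag_of_words(logic)

# Tokens the two documents share: 'function' is identical in both.
shared = set(bio_counts) & set(logic_counts)
print(sorted(shared))  # ['function', 'to']
```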

One problem I expected to find more of was the algorithm running together different uses of the same word. But there was less of this than I feared. It seems, for example, to understand the difference between how ‘function’ is used in philosophy of biology and how it is used in logic and mathematics. It didn’t run together the different uses of ‘realism’, or of ‘internalism’/‘externalism’, as I would have expected. There is a hint of running together ‘scepticism’ in the sense most relevant to epistemology with other kinds of philosophical scepticism. (Someone who is a free will sceptic doesn’t say we don’t know whether free will exists, but that we know it doesn’t.) But maybe this isn’t too much of a problem, since the views aren’t that separate.

The one case where this particular model does seem to have confused two related meanings of a word concerned ‘free’. Topic 35 is a mishmash of work on free will and work on political freedom. Now you might think this isn’t too bad, since the subjects are somewhat connected. But it’s not optimal, and we’ll eventually work out a way to separate free will from political freedom. Still, the big picture is that something that seemed likely to be a problem turned out, pleasingly, not to be that bad.

A second problem comes from the reverse direction. Sometimes the differences between topics just come from a change in terminology. You can see this most clearly, I think, in the logic topics in the model. Papers about sequents get put in a different topic from papers about syllogisms. Papers about implications get put in a different topic from papers about validities. Now there is a sense in which that’s a good thing, and the model is picking up a philosophically significant change. But the change is smaller than the model takes it to be. Still, this isn’t a particularly serious problem. The worst-case scenario is that we have to come back afterwards and manually note that, when doing the analysis, papers on validities and papers on implications should be treated together. That’s a bit of work, but it isn’t too bad; we just have to remember that it happens.

A third, related problem comes from the model making fine-grained distinctions within a subject. I mentioned earlier that I saw several models that ended up separating work on causation that didn’t discuss counterfactuals (like Mackie’s work) from post-Lewisian work where counterfactuals are front and center. That’s not great, since these really are on the same topic, but it isn’t too bad. Again, the worst-case scenario is that we combine these topics by hand when doing the analysis. But in practice I don’t think this problem really arose in this particular run of the model.
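The manual fix for the second and third problems (treating, say, the validities and implications topics, or two causation topics, as one) can be sketched as summing the relevant entries of each article's topic-probability distribution. This is a minimal sketch under stated assumptions: the dictionary layout, the function name `merge_topics`, the topic labels and the probabilities are all hypothetical, not output from the actual model.

```python
def merge_topics(article_topics, to_merge, new_label):
    """Combine several topics into one for every article.

    article_topics: dict mapping article id -> dict of topic label -> probability.
    to_merge: set of topic labels to fold together.
    new_label: label for the combined topic.
    Merged probabilities are summed, so each article's distribution
    keeps the same total as before.
    """
    merged = {}
    for article, dist in article_topics.items():
        # Keep every topic that is not being merged.
        new_dist = {t: p for t, p in dist.items() if t not in to_merge}
        # Add the merged topics together under the new label.
        new_dist[new_label] = sum(dist.get(t, 0.0) for t in to_merge)
        merged[article] = new_dist
    return merged


# Hypothetical topic probabilities for two articles.
articles = {
    "a1": {"Validities": 0.5, "Implications": 0.3, "Other": 0.2},
    "a2": {"Validities": 0.1, "Implications": 0.6, "Other": 0.3},
}
combined = merge_topics(articles, {"Validities", "Implications"},
                        "Validities + Implications")
print(combined["a1"])
```

Summing (rather than, say, taking the larger value) keeps each article's distribution normalized, so later per-topic averages over the corpus still add up correctly.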

A fourth, and potentially bigger, problem is the converse, which I already discussed when talking about choosing the number of topics. Sometimes the topics are just disjunctive. For example, Topic 37 ends up being half about sets and half about the grue paradox. Now there is a connection of sorts here: Nelson Goodman is kind of important to both literatures. But really this shouldn’t be a single topic. As I already noted, this is a hard problem to fix. If you increase the number of topics, the model becomes harder to read, and you’re just as likely to split a coherent topic (like causation) as to split a disjunctive one.

I did three things here to address these disjunctive topics. The first, which I’ve already mentioned, was to keep running refinements until the worst of the disjunctiveness was polished away. (Before the refinements, some papers on probabilistic epistemology got classified with papers on Hume, and I don’t know what the computer was thinking. A handful still ended up there after the refinements, but not nearly as many.) The second was to use very clear labels for the topics, like “Sets and Grue”, to indicate that a topic is disjunctive. And the third was to run a further analysis on the articles in that topic to separate the sets articles from the grue articles. Eventually there were 10 topics where I felt this kind of split was worthwhile.
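The text doesn't say what form the further analysis took, so the following is only a minimal stand-in, not the method actually used: a keyword-based split of the articles in a disjunctive topic, using hand-chosen seed words for each half. The seed lists and the function name `split_article` are hypothetical; another option would be to fit a second, small topic model to just those articles.

```python
import re
from collections import Counter

# Hypothetical seed words for each half of the "Sets and Grue" topic.
SEEDS = {
    "Sets": {"set", "sets", "membership", "cardinal", "axiom", "zermelo"},
    "Grue": {"grue", "bleen", "induction", "projectible",
             "projectibility", "emerald", "emeralds"},
}


def split_article(text, seeds=SEEDS):
    """Assign an article to whichever sub-topic's seed words occur most often.

    Returns None on a tie (including no seed words at all), flagging the
    article for manual inspection.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {label: sum(counts[w] for w in words)
              for label, words in seeds.items()}
    best = max(scores.values())
    winners = [label for label, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None


print(split_article("All emeralds examined so far are green, and also grue."))  # Grue
print(split_article("Every set has a power set of greater cardinality."))      # Sets
print(split_article("Goodman wrote about many things."))                      # None
```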

The fifth and final problem is that the algorithm can’t tell changes of topic apart from changes in style. If it becomes a requirement on all right-thinking philosophers to express themselves more or less exclusively in monosyllables, as seems to have been the case in mid-century Britain, then the algorithm will think that a new topic is being discussed right then. I’m exaggerating about mid-century Britain, of course, but there is a trend here that matters, and I’ll talk much more about it later.

Or imagine what would happen if every philosopher all at once decided that you shouldn’t respond to objections with a new theory that has distinctive consequences, but should instead respond to worries with a new account that has distinctive commitments. Well, the model will think that there is this cool new subject about ‘worries’, ‘accounts’, and ‘commitments’, and that everyone is talking about it. And if this stylistic change happens all at once across philosophy, the model will think that the generalist journals, the philosophy of science journals, and the moral and political journals are all suddenly obsessed with the worry/account/commitment subject. Of course, philosophy couldn’t be so caught up chasing trends that something like this would all happen at once, could it? Could it? Let’s return to this issue at the very end, and see how bad things got.
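One rough way to watch for a purely stylistic shift like this, independently of the topic model, is to track how often a handful of marker words appear per year. This is a minimal sketch: the word list, the per-1,000-token rate, the function name `style_rate_by_year` and the two toy sentences are illustrative assumptions, not data from the corpus. A sharp rise that shows up in every journal at once would point to a change of style rather than of subject.

```python
import re
from collections import defaultdict

# Hypothetical markers of the stylistic shift imagined above.
STYLE_WORDS = {"worry", "worries", "account", "accounts",
               "commitment", "commitments"}


def style_rate_by_year(articles, style_words=STYLE_WORDS):
    """Return, for each year, marker-word occurrences per 1,000 tokens.

    articles: iterable of (year, text) pairs.
    Years with no tokens at all are skipped.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in articles:
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(tokens)
        hits[year] += sum(1 for t in tokens if t in style_words)
    return {y: 1000 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Toy example: the same move, phrased in the old and the new style.
corpus = [
    (1980, "We respond to this objection with a new theory."),
    (2015, "We respond to this worry with a new account and its commitments."),
]
rates = style_rate_by_year(corpus)
print(rates)  # {1980: 0.0, 2015: 250.0}
```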