5 Sorting Into Categories

This chapter goes over how I got from the 90 topics generated by the model to the 12 categories that were the focus of the previous chapter. It goes fairly deep into the weeds, and the audience for it is really just (a) people who don’t trust that I got the coding behind the previous chapter right, and (b) people who would like to learn how to do a similar project of their own in the future. (I expect substantial overlap between these two groups.)

The short version of what I was trying to do is easy to state: put each of these 90 topics into one of N familiar categories, for some small value of N, so the trends are easily visible on a single graph. In practice, it got more complicated than that, as you’ll see.

Note that one of the challenges here is working out the value of N, and working out which categories to include. You can see in the previous chapter how I answered this. But I just want to stress at the start that these answers came out of the sorting methodology; they were not things I settled on before trying to sort the topics. So let’s turn to how I did, or in the first instance didn’t do, that sorting.