I used data on 415,662 games from just over seven years of tournament play (from cross-tables.com; goddamn those guys are good). The data indicate that the ogive ratings curve used by the NSA systematically underestimates the winning chances of the lower-rated player and systematically overestimates those of the higher-rated player. This may be due to the total absence of any model of luck in the theory underlying the Elo system, which was developed for chess, a game with essentially no luck component.
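To make the comparison concrete, here is a minimal sketch (not the NSA's actual code) of the ogive prediction and of how one could bin game results by rating gap to check it against observed frequencies. The names `ogive_win_prob` and `empirical_win_rates`, and the classic Elo choice of sigma = 200, are my illustrative assumptions.

```python
import math
from collections import defaultdict

def ogive_win_prob(rating_diff, sigma=200.0):
    """Predicted win probability for the higher-rated side under the
    ogive (normal CDF) curve. Assumes each player's performance is
    normal with standard deviation sigma, so the difference of two
    performances has standard deviation sigma * sqrt(2). sigma=200 is
    Elo's classic value, used here as an assumption."""
    z = rating_diff / (sigma * math.sqrt(2.0))      # standardized gap
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF

def empirical_win_rates(games, bin_width=50):
    """Bin results by rating gap so the observed win frequency in each
    bin can be compared with ogive_win_prob at the bin midpoint.
    `games` is an iterable of (rating_diff, higher_rated_won) pairs."""
    wins, counts = defaultdict(int), defaultdict(int)
    for diff, won in games:
        b = int(diff // bin_width)
        counts[b] += 1
        wins[b] += 1 if won else 0
    return {b * bin_width + bin_width / 2: wins[b] / counts[b]
            for b in counts}
```

The underestimate described above would show up as empirical frequencies sitting consistently below the ogive curve in the larger rating-gap bins.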
Note, however, that any alternative ratings curve (estimated from past data, say) would have to be applied to past events in some way, to see whether the proposed alternative continued to match past performance once "revised" ratings were calculated. And the pairings in past tournaments would, of course, have been different under the counterfactual hypothesis of a different ratings curve. Tricky stuff.
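The replay idea can at least be sketched. Assuming a simple K-factor update rule (the NSA's actual update rules are more involved, and `replay_ratings`, `k=20`, and `initial=1500` are illustrative assumptions), one could recompute ratings by walking through the historical games in order under a candidate curve and scoring its predictions:

```python
def replay_ratings(games, curve, k=20.0, initial=1500.0):
    """Recompute ratings by replaying games chronologically under a
    given expectation curve (a hypothetical K-factor sketch, not the
    NSA's rating formula). `games` is an iterable of
    (player_a, player_b, a_won) triples, sorted by date; `curve` maps
    a rating difference to player A's expected score."""
    ratings = {}
    sq_errors = []
    for a, b, a_won in games:
        ra = ratings.get(a, initial)
        rb = ratings.get(b, initial)
        expected = curve(ra - rb)          # predicted score for A
        score = 1.0 if a_won else 0.0
        sq_errors.append((score - expected) ** 2)
        delta = k * (score - expected)     # symmetric zero-sum update
        ratings[a] = ra + delta
        ratings[b] = rb - delta
    return ratings, sum(sq_errors) / len(sq_errors)
```

This still cannot undo the counterfactual-pairings problem noted above, but it gives a way to compare candidate curves on the same fixed game history.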