Hi,
Yeah, I agree that the quality of the phoneme voice is pretty low. It was originally only meant to be a holdover until I could get hold of some diphone voices, which are much clearer and higher quality (for comparison, the phoneme voice has 39 unique sounds, while a diphone voice has around 1600). However, it's proven quite difficult to get hold of such voices.
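As a rough sanity check on those two numbers: a diphone covers the transition from one sound into the next, so the inventory grows with the number of ordered pairs of sounds rather than with the number of sounds. Here's a minimal Python sketch, assuming the voice has one extra "silence" unit on top of the 39 phonemes (the silence unit is my assumption, and an actual voice may leave out pairs that never occur):

```python
# Upper bound on the diphone inventory for a 39-phoneme voice.
# ASSUMPTION: one extra "silence" unit is added to the phoneme set.
PHONEMES = 39
units = PHONEMES + 1  # phonemes plus the assumed silence unit

# A diphone runs from one unit into the next, so the upper bound is
# the number of ordered pairs of units.
max_diphones = units * units

print(f"{PHONEMES} phonemes -> up to {max_diphones} diphones")
```

That lands right around the 1600 figure, which is also why a diphone voice is so much bigger than a phoneme one.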
For context, this article (http://www.mperfect.net/ttSpeech/) was the inspiration for the TTS mod. In it, the author was able to convert a voice file encoded using LPC (linear predictive coding) into its constituent diphones. This is where progress stopped: the linked article does not supply code for this process, and my attempts at modifying the LPC source code of the speech engine it mentions (FreeTTS) have failed. Long story short, since I'm rather busy lately, I doubt I'll make any progress on that any time soon, so I agree that using phonemes is likely a dead end.
I like the idea of having a bunch of common, high-quality words. I'm concerned about the much larger file size, but it might be okay with careful word selection.
I'll definitely look into adding a few of the voices you suggested to begin with. It would make the mod much more useful; it's more or less a novelty/proof of concept at the moment.
Thank you for your constructive feedback, I appreciate it.