While Voxforge has quite a lot of English speech, it is almost all in a US accent, so it can only produce models of that accent. Other English dialects are quite different, generally using more phonemes, lacking rhoticity, and mapping phonemes to words differently. For example, in General American English (as it is called), “father” rhymes with “bother” but does not sound identical to “farther”; in most other English dialects this is reversed. The vowel those dialects use in “bother” does not exist in General American.
A part of the problem (reasonably) not mentioned by Peter Grasch is the pronunciation model—a dictionary mapping each word to its phoneme sequences. Creating a new pronunciation model is neither trivial nor interesting. The only truly free one for English is the General American CMU dictionary. Wiktionary is a bit too messy and full of holes. The various text-to-speech engines can spit out phonemic transcriptions, but checking their output is a problem.
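To make the shape of the thing concrete, here is a minimal sketch of such a pronunciation model: CMUdict-style entries (ARPAbet phonemes, stress digits on vowels) parsed into a word-to-pronunciations mapping. The three entries are written in the CMU dictionary's General American style and illustrate the father/bother/farther point above.

```python
# CMUdict-style entries: WORD followed by its ARPAbet phonemes.
# These transcriptions follow the CMU dictionary's General American
# conventions; note "father" and "bother" sharing the AA vowel,
# with "farther" differing only by the rhotic R.
cmudict_lines = """\
FATHER  F AA1 DH ER0
BOTHER  B AA1 DH ER0
FARTHER  F AA1 R DH ER0
"""

# Build the pronunciation model: each word maps to a list of
# phoneme sequences (a word may have several pronunciations).
pronunciations = {}
for line in cmudict_lines.splitlines():
    word, *phones = line.split()
    pronunciations.setdefault(word, []).append(phones)

print(pronunciations["FATHER"])   # [['F', 'AA1', 'DH', 'ER0']]
```

A recogniser for a non-rhotic dialect would need a different table entirely—"father" and "farther" collapsing to one phoneme sequence, "bother" getting a vowel that General American lacks—which is exactly why building one is so much work.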
Nevertheless, Peter Grasch demonstrates that a Voxforge model tuned to a single non-US speaker can do surprisingly well. It is the multiple-speaker models that really suffer.
While I'm about it, I might as well mention another problem with Voxforge for general-purpose models: its voices almost all belong to non-elderly adult males—the usual free-software demographic. Its models won't perform well for children, women, or elderly men.
Anyway, great projects, Simon, Sphinx, and Voxforge; but great problems also, and they aren't software problems.