Initial release of gnuspeech available
| From: | David Hill <drh-AT-firethorne.com> | |
| To: | Gnu Announce <info-gnu-AT-gnu.org> | |
| Subject: | First release of gnuspeech project software | |
| Date: | Mon, 19 Oct 2015 18:41:22 -0700 | |
| Message-ID: | <AD48546B-E89C-4F7C-A2C5-D45D5C3C46A3@firethorne.com> | |
gnuspeech-0.9 and gnuspeechsa-0.1.5 first official release

Gnuspeech is a new approach to synthetic speech as well as a speech research tool. It comprises a true articulatory model of the vocal tract, databases and rules for parameter composition, a pronouncing dictionary of more than 70,000 words, a letter-to-sound fall-back module, and models of English rhythm and intonation, all based on extensive research that sets a new standard for synthetic speech and computer-based speech research.

There are two main components in this first official release.

For those who would simply like speech output from whatever system they are using, including incorporating speech output in their applications, there is the gnuspeechsa tarball (currently 0.1.5), a cross-platform speech synthesis application, compiled using CMake.

For those interested in an interactive system that gives access to the underlying algorithms and databases, providing an understanding of the mechanisms, databases, and output forms involved, as well as a tool for experiment and new language creation, there is the gnuspeech tarball (currently 0.9). It embodies several sub-apps, including the interactive database creation system Monet (My Own Nifty Editing Tool) and TRAcT (the Tube Resonance Access Tool), a GUI interface to the tube resonance model used in gnuspeech. That model emulates the human vocal tract and provides the basis for an accurate rendition of human speech.

This second tarball includes full manuals on both Monet and TRAcT. The Monet manual covers the compilation and installation of gnuspeechsa on a Macintosh under OS X 10.10.x, and references the related free software that allows the speech to be incorporated in applications. Appendix D of the Monet manual provides some additional information about gnuspeechsa and the associated software that is available, and details how to compile it using CMake on the Macintosh under 10.10.x (Yosemite).
The digitally signed tarballs may be accessed at
http://ftp.gnu.org/gnu/gnuspeech/

There is a list of mirrors at
http://www.gnu.org/order/ftp.html
and the site
http://ftpmirror.gnu.org/gnuspeech
will redirect to a nearby mirror.

A longer project description and credits may be found at:
http://www.gnu.org/software/gnuspeech/
which is also linked to a brief (four page) project history/component description, and a paper on the Tube Resonance Model by Leonard Manzara.

Signed: David R Hill
-----------------------
drh@firethorne.com
http://www.gnu.org/software/gnuspeech/
http://savannah.gnu.org/projects/gnuspeech
https://savannah.gnu.org/users/davidhill
Twitter: @t33guy
--------
The only function of economic forecasting is to make astrology look respectable. (J.K. Galbraith)
--------
--
If you have a working or partly working program that you'd like to offer to the GNU project as a GNU package, see https://www.gnu.org/help/evaluation.html.
Posted Oct 22, 2015 14:47 UTC (Thu)
by josh (subscriber, #17465)
[Link] (7 responses)
Posted Oct 22, 2015 16:07 UTC (Thu)
by hummassa (subscriber, #307)
[Link] (6 responses)
Posted Oct 22, 2015 17:47 UTC (Thu)
by deltasquared (guest, #99235)
[Link] (4 responses)
The intonation of sentence structure is there, such as rises in pitch towards the end of certain clauses. Of course it lacks emotional intonation (it was only reading text from the article), but I imagine that wouldn't be too hard to add. (Emotion tag chars in Unicode, anyone?)
Personally I found the syllables at certain points too fast to keep up with, though time-stretching the audio fixes this.
I wonder how easily it could be equipped with different voices, but as it stands I would totally have this read out stuff to me (possibly slowed down slightly).
Posted Oct 22, 2015 18:35 UTC (Thu)
by josh (subscriber, #17465)
[Link]
I was going to say that it sounded like a human voice in a particular regional dialect after passing through a low-fidelity phone connection.
Seems like it's on the far edge of the uncanny valley, where it's starting to rise again without quite sounding *entirely* correct yet.
Posted Oct 23, 2015 10:08 UTC (Fri)
by micka (subscriber, #38720)
[Link] (2 responses)
Posted Oct 25, 2015 7:23 UTC (Sun)
by deltasquared (guest, #99235)
[Link]
Posted Nov 12, 2015 16:26 UTC (Thu)
by nye (subscriber, #51576)
[Link]
I am a native speaker, and I managed to understand *maybe* one word in ten. It would not be an exaggeration to say that that is literally the worst example of text-to-speech I've ever heard. So bad, in fact, that I get the impression the software was mistakenly set to a language other than English and then given English text to read, making it an unfair example.
Posted Nov 6, 2015 23:51 UTC (Fri)
by fuhchee (guest, #40059)
[Link]
It has the right delays elsewhere, like at commas and stops.
Too fast, no accentuation, no pauses.
I don't think that can be considered better than the output from other FOSS attempts I have heard before.
I think I appreciate where you're coming from. I can say it's like no accent I've ever heard before. That and the speed of some bits made portions of the speech impossible to understand.
As it stands, it does need a lot of work (sorry, I had no intention of implying otherwise; I got a bit ahead of myself). What I found exciting was that I personally think the sample clip has the right bits hidden in the sound somewhere: while it sounds weird atm, as josh replied here, there's something to it that is uncanny in its approximation of human speech.
I feel that the potential is there for gnuspeech to become a great speech synthesis tool. I haven't read the paper yet, but its method of simulating the human vocal system looks promising.
