Initial release of gnuspeech available

[Posted October 21, 2015 by n8willis]

From:		David Hill <drh-AT-firethorne.com>
To:		Gnu Announce <info-gnu-AT-gnu.org>
Subject:		First release of gnuspeech project software
Date:		Mon, 19 Oct 2015 18:41:22 -0700
Message-ID:		<AD48546B-E89C-4F7C-A2C5-D45D5C3C46A3@firethorne.com>

gnuspeech-0.9 and gnuspeechsa-0.1.5: first official release

Gnuspeech is a new approach to synthetic speech as well as a speech research tool. It comprises a
true articulatory model of the vocal tract, databases and rules for parameter composition, a
pronouncing dictionary of more than 70,000 words, a letter-to-sound fall-back module, and models of
English rhythm and intonation, all based on extensive research that sets a new standard for
synthetic speech and computer-based speech research.

There are two main components in this first official release. For those who would simply like
speech output from whatever system they are using, including incorporating speech output in their
applications, there is the gnuspeechsa tarball (currently 0.1.5), a cross-platform speech synthesis
application, compiled using CMake.

For those interested in an interactive system that gives access to the underlying algorithms and
databases, and so provides an understanding of the mechanisms and output forms involved as well as
a tool for experiment and new language creation, there is the gnuspeech tarball (currently 0.9). It
embodies several sub-apps, including the interactive database creation system Monet (My Own Nifty
Editing Tool) and TRAcT (the Tube Resonance Access Tool), a GUI for the tube resonance model used
in gnuspeech, which emulates the human vocal tract and provides the basis for an accurate rendition
of human speech.

This second tarball includes full manuals on both Monet and TRAcT. The Monet manual covers the
compilation and installation of gnuspeechsa on a Macintosh under OS X 10.10.x, and references the
related free software that allows the speech to be incorporated in applications. Appendix D of the
Monet manual provides some additional information about gnuspeechsa and associated software that is
available, and details how to compile it using CMake on the Macintosh under 10.10.x (Yosemite).

The digitally signed tarballs may be accessed at http://ftp.gnu.org/gnu/gnuspeech/

There is a list of mirrors at http://www.gnu.org/order/ftp.html and the site
http://ftpmirror.gnu.org/gnuspeech will redirect to a nearby mirror.
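
As a rough illustration of fetching the signed tarballs and checking their signatures, here is a
minimal Python sketch. The base URL comes from this announcement. Everything else is an assumption:
the file names are guessed to follow the usual GNU "<name>-<version>.tar.gz" pattern with a
detached ".sig" file beside each tarball, the helper names release_urls and fetch_and_verify are
hypothetical, and gpg is assumed to be installed with the signer's public key already imported.
Check the directory listing for the real file names before relying on it.

```python
import subprocess
from pathlib import Path
from urllib.request import urlretrieve

# Download location given in the announcement.
BASE_URL = "http://ftp.gnu.org/gnu/gnuspeech/"


def release_urls(name, version, ext=".tar.gz"):
    """Return (tarball_url, signature_url) for one release.

    Assumption: files are named "<name>-<version><ext>" and each has a
    detached signature with ".sig" appended. This naming is a guess based
    on common GNU practice, not something stated in the announcement.
    """
    tarball = f"{BASE_URL}{name}-{version}{ext}"
    return tarball, tarball + ".sig"


def fetch_and_verify(name, version, dest="."):
    """Download a tarball and its signature, then run `gpg --verify`.

    Returns True only if gpg exits with status 0. gpg also exits non-zero
    when the signer's public key is missing from the local keyring.
    """
    local_paths = []
    for url in release_urls(name, version):
        target = Path(dest) / url.rsplit("/", 1)[1]
        urlretrieve(url, str(target))
        local_paths.append(target)
    tarball, signature = local_paths
    result = subprocess.run(["gpg", "--verify", str(signature), str(tarball)])
    return result.returncode == 0
```

Under those assumptions, fetch_and_verify("gnuspeech", "0.9") and
fetch_and_verify("gnuspeechsa", "0.1.5") would cover the two tarballs named in this release.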

A longer project description and credits may be found at http://www.gnu.org/software/gnuspeech/
which also links to a brief (four-page) project history and component description, and to a paper
on the Tube Resonance Model by Leonard Manzara.


Signed: David R Hill
-----------------------
drh@firethorne.com
http://www.gnu.org/software/gnuspeech/

http://savannah.gnu.org/projects/gnuspeech

https://savannah.gnu.org/users/davidhill

Twitter: @t33guy
--------
The only function of economic forecasting is to make astrology look respectable. (J.K. Galbraith)
--------

-- 
If you have a working or partly working program that you'd like
to offer to the GNU project as a GNU package,
see https://www.gnu.org/help/evaluation.html.

Posted Oct 22, 2015 14:47 UTC (Thu) by josh (subscriber, #17465)

Sounds quite impressive; if anyone has the necessary bits available to run this, I'd love to hear a sample of the speech this produces.

Posted Oct 22, 2015 16:07 UTC (Thu) by hummassa (subscriber, #307)

https://www.dropbox.com/s/7n3p807pcsxvpry/test.aac?dl=1

Posted Oct 22, 2015 17:47 UTC (Thu) by deltasquared (guest, #99235)

This sounds awesome. For those who haven't listened to it, it sounds like a person's voice passed through filters to create robotic sci-fi sounding speech, while still retaining a human quality. Except it's actually been generated by a program.

The intonation of sentence structure is there, like rises in pitch towards the end of certain clauses. Of course it lacks emotional intonation (it was only reading text from the article); however, I imagine that wouldn't be too hard to add. (Emotion tag chars in Unicode, anyone?)

Personally I found the syllables at certain points too fast to keep up with, though time-stretching the audio fixes this.
It has the right delays elsewhere, like at commas and stops.

I wonder how easily it could be equipped with different voices, but as it stands I would totally have this read out stuff to me (possibly slowed down slightly).

Posted Oct 22, 2015 18:35 UTC (Thu) by josh (subscriber, #17465)

> This sounds awesome. For those who haven't listened to it, it sounds like a person's voice passed through filters to create robotic sci-fi sounding speech, while still retaining a human quality. Except it's actually been generated by a program.

I was going to say that it sounded like a human voice in a particular regional dialect after passing through a low-fidelity phone connection.

Seems like it's on the far edge of the uncanny valley, where it's starting to rise again without quite sounding *entirely* correct yet.

Posted Oct 23, 2015 10:08 UTC (Fri) by micka (subscriber, #38720)

For me, it was incomprehensible. I did catch one or two words but that's all. Of course I'm not a native English speaker, but I handle most cases where people don't talk with accents that are too exotic.
Too fast, no accentuation, no pauses.
I don't think that can be considered better than the output from other FOSS attempts I've heard before.

Posted Oct 25, 2015 7:23 UTC (Sun) by deltasquared (guest, #99235)

I think I appreciate where you're coming from. I can say it's like no accent I've ever heard before. That and the speed of some bits made portions of the speech impossible to understand.

As it stands, it does need a lot of work (sorry, I had no intention to imply otherwise, got a bit ahead of myself). What I found exciting was that I personally think the sample clip has the right bits hidden in the sound somewhere: while it sounds weird atm, as josh replied here, there's something to it that is uncanny in its approximation of human speech.

I feel that there's the potential there for gnuspeech to become a great speech synthesis tool. I haven't read the paper yet but its method of simulating the human vocal system looks promising.

Posted Nov 12, 2015 16:26 UTC (Thu) by nye (subscriber, #51576)

> For me, it was incomprehensible. I did catch one or two words but that's all. Of course I'm not a native English speaker, but I handle most cases where people don't talk with accents that are too exotic.

I am a native speaker, and I managed to understand *maybe* one word in ten. It would not be an exaggeration to say that that is literally the worst example of text-to-speech I've ever heard. So bad, in fact, that I get the impression the software was mistakenly set to a language other than English and then given English text to read, making it an unfair example.

Posted Nov 6, 2015 23:51 UTC (Fri) by fuhchee (guest, #40059)

The old Amiga synthesis engine sounds to me at least as good.