GNU ed 1.6 released

[Posted January 2, 2012 by corbet]

From:		Antonio Diaz Diaz <ant_diaz-AT-teleline.es>
To:		info-gnu-AT-gnu.org
Subject:		Version 1.6 of GNU ed released
Date:		Mon, 02 Jan 2012 17:18:39 +0100
Message-ID:		<4F01D8DF.5040109@teleline.es>
Cc:		bug-directory-AT-fsf.org, bug-ed-AT-gnu.org

I am pleased to announce the release of GNU ed 1.6.

GNU ed is an 8-bit clean, more or less POSIX-compliant implementation of 
the standard Unix line editor.

The homepage is at http://www.gnu.org/software/ed/ed.html

The sources can be downloaded from http://ftpmirror.gnu.org/ed/, 
http://download.savannah.gnu.org/releases/ed/, or from your favorite GNU 
mirror.

This version is also available in lzip format. If your distro doesn't 
yet distribute the lzip program, you can download it from 
http://www.nongnu.org/lzip/lzip.html

The md5sums are:
9a78593decccaa889523aa4bb555ed4b  ed-1.6.tar.gz
e3d4dcfd260b2ebb2855c86ffca1947f  ed-1.6.tar.lz
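
A published checksum like the ones above can be checked after download. The following is a minimal sketch in Python, assuming the tarball was saved in the current directory as ed-1.6.tar.gz; the helper name md5_of_file is hypothetical. MD5 only guards against accidental corruption, so it does not replace checking the GPG signature.

```python
import hashlib
import os

# Checksum copied from the announcement above.
EXPECTED_MD5 = "9a78593decccaa889523aa4bb555ed4b"

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Assumed local file name; adjust to wherever the download was saved.
tarball = "ed-1.6.tar.gz"
if os.path.exists(tarball):
    print("checksum matches:", md5_of_file(tarball) == EXPECTED_MD5)
else:
    print("tarball not found; nothing to verify")
```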

This release is also GPG signed. You can download the signature by 
appending ".sig" to the URL.

Changes in version 1.6:

   * Displaying of null characters by the "l" command has been fixed.

   * The condition deciding when to show the message "Newline appended" 
has been corrected.

   * The "modified" flag is now set when reading a non-empty file into 
an empty buffer.

   * An error that prevented using NUL characters in regular expressions 
has been fixed.

   * Ed now signals an error if it can't create a shell process when 
executing a shell command.

   * Ed now flushes stdout/stderr before reading a new command.

   * Man page is now generated with "help2man". All command-line options 
are now documented in the man page.

   * The copyright notices of Andrew L. Moore have been restored. It 
seems Andrew granted some permissions but never assigned copyright to 
the FSF.


Please send bug reports and suggestions to bug-ed@gnu.org

If you are packaging ed for a distribution, please, try to use the 
lzipped source tarball, as this can improve the support for the lzip 
format in packaging systems. Thanks.


Regards,
Antonio Diaz, GNU ed maintainer.


_______________________________________________
GNU Announcement mailing list <info-gnu@gnu.org>
https://lists.gnu.org/mailman/listinfo/info-gnu

GNU ed 1.6 released

Posted Jan 2, 2012 22:27 UTC (Mon) by johnny (guest, #10110) [Link] (16 responses)

What does "8-bit clean" mean? That the code only uses 8-bit variables?

GNU ed 1.6 released

Posted Jan 2, 2012 22:39 UTC (Mon) by andrel (guest, #5166) [Link] (15 responses)

It means that this version of ed works with 8-bit character sets.

GNU ed 1.6 released

Posted Jan 3, 2012 0:35 UTC (Tue) by geuder (subscriber, #62854) [Link] (14 responses)

True, but I don't think the wikipedia article you link to is very clear.

UTF-8 is not a character set at all, but a variable length encoding of a 16-bit or 32-bit character set.

8 bit cleanliness was a big issue for most Europeans wanting to write anything correctly on a computer in their mother tongue in the early 90s. Most editors could only handle 7 bit character sets without quirks.

8 bit character sets were an intermediate step in the 90s. 8 bits per character are enough for most bigger European languages (but not a single 8 bit character set for all of them).

But 8 bits don't help the Asians. They need 16 bits per character at least, and because different sets were needed in Europe it wasn't ideal even there.

I think the majority of software in use has been basically 16 bit Unicode for many years now. Windows, Symbian, and Java use true 16 bit wide characters, while all Linux distributions I have used use UTF-8 encoding by default. The nice thing with UTF-8 is that you can't even tell the difference from the old ASCII as long as you stick to 7 bit ASCII characters, because their encoding is identical, 8 bits with the most significant bit being 0.
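
The ASCII compatibility described above is easy to check. A minimal Python sketch, using an arbitrary sample string: pure ASCII text produces the same bytes under ASCII and UTF-8, each with the most significant bit clear, while a 16-bit encoding doubles the size.

```python
sample = "Ed is the standard text editor."

as_ascii = sample.encode("ascii")
as_utf8 = sample.encode("utf-8")
as_utf16 = sample.encode("utf-16-le")  # 16-bit code units, little-endian, no BOM

# For 7-bit characters the two encodings are byte-for-byte identical.
print(as_ascii == as_utf8)

# Every byte has its most significant bit set to 0.
print(all(byte < 0x80 for byte in as_utf8))

# The same text needs two bytes per character in UTF-16.
print(len(as_utf8), len(as_utf16))
```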

Whether ed supports UTF-8 or not is not said in the announcement. IMHO 8 bit cleanliness defines support of 8 bit character sets, not stripping away or clearing the most significant bit. UTF-8 is more than this, the editor must be able to handle the variable length encoding.
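
To make the distinction concrete, here is a small Python sketch. The two helper functions are hypothetical stand-ins for a 7-bit program and an 8-bit clean one; they are not taken from ed's source.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Mimic a 7-bit-only program: clear the most significant bit of each byte."""
    return bytes(byte & 0x7F for byte in data)

def pass_through(data: bytes) -> bytes:
    """Mimic an 8-bit clean program: every byte value survives unchanged."""
    return bytes(data)

word = "Gr\u00fc\u00dfe"               # "Grüße"
latin1_bytes = word.encode("latin-1")  # ü is 0xFC, ß is 0xDF
utf8_bytes = word.encode("utf-8")      # ü and ß take two bytes each

# Clearing the high bit turns 0xFC into '|' and 0xDF into '_'.
print(strip_high_bit(latin1_bytes))    # b'Gr|_e'

# An 8-bit clean program round-trips both representations intact.
print(pass_through(latin1_bytes) == latin1_bytes)
print(pass_through(utf8_bytes).decode("utf-8") == word)
```

Passing bytes through untouched is enough to store UTF-8 text, but it says nothing about whether the program counts or displays characters correctly.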

But whether it can or cannot be used for writing texts in European languages I use regularly, I don't see a reason why I personally would do it using ed. As long as all American programmers remember every day that the world is not 7 bit and not even all ASCII characters are reachable on the keyboard without using a modifier key I'm happy. The original question shows that there is work to do, so excuse my long comment.

GNU ed 1.6 released

Posted Jan 3, 2012 0:56 UTC (Tue) by Karellen (subscriber, #67644) [Link] (2 responses)

I think the only problem with the article is that it means 8-bit character *encodings*, where the encoding of the character set uses 8 bits (e.g. UTF-8), rather than 7 (e.g. UTF-7), regardless of how many bits are required by the character *set* (21 for all proper Unicode encodings such as both UTF-7 and UTF-8).

Anyway, whatever that wikipedia article means, it's kind of a red herring, as "ed" being 8-bit clean means that it can handle both 8 bit character sets (e.g. ISO-8859-*) and 8-bit character encodings (e.g. UTF-8).

GNU ed 1.6 released

Posted Jan 3, 2012 19:39 UTC (Tue) by blitzkrieg3 (guest, #57873) [Link] (1 responses)

UTF-8 is a variable width encoding. It sounds like they now have support for something like extended ASCII.

This is not true...

Posted Jan 3, 2012 20:01 UTC (Tue) by khim (subscriber, #9252) [Link]

UTF-8 is not just run-of-the-mill variable-length encoding. Ken Thompson modified original IBM's proposal to make sure most algorithms which treat strings as sequence of 8-bit characters were still usable with UTF-8.

This means that yes, you can easily use UTF-8 with programs like GNU ED or GNU M4 which know absolutely nothing about UTF-8 but correctly support 8bit characters in strings.
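
A rough illustration of that property in Python, working on raw bytes the way a program with no UTF-8 knowledge would. The sample strings are made up for the example.

```python
# "naïve café" and "señor" on two lines, encoded as UTF-8.
text = "na\u00efve caf\u00e9\nse\u00f1or".encode("utf-8")
needle = "caf\u00e9".encode("utf-8")

# A plain byte search, with no decoding involved.
offset = text.find(needle)

# The match starts on a character boundary: the prefix decodes cleanly.
prefix = text[:offset].decode("utf-8")

# Bytes of multi-byte characters are all >= 0x80, so splitting on the
# ASCII newline byte can never cut a character in half.
lines = text.split(b"\n")
decoded = [line.decode("utf-8") for line in lines]

multibyte = "\u00ef\u00e9\u00f1".encode("utf-8")
print(offset, len(lines))
print(all(byte >= 0x80 for byte in multibyte))
```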

GNU ed 1.6 released

Posted Jan 3, 2012 3:28 UTC (Tue) by wahern (subscriber, #37304) [Link] (9 responses)

Both UTF-16 and UTF-32 are also variable length encodings, it's just that most Windows, Java, etc programmers treat them like fixed length encodings. It mostly works for the time being, but will begin breaking when non-Western markets mature and people start requiring more feature parity when slicing and dicing text without buying special-purpose editing packages.

For the time being people have low expectations. But political and technical movements like Simplified Chinese will eventually hit substantial cultural barriers and the push back will require that software handle locales which didn't adapt to western syntax. That will mean following the Unicode rules to a T. To follow the Unicode rules you have to use an API for even simple things like "character" iteration, etc, unless the programming language supports the proper semantic text operations, like Perl6 can over graphemes using its neat NFG hack. Scripts like Thai have no mandatory punctuation, so again you need to use accessors with a complex built-in rule base to detect, e.g., end-of-sentence. There's no hacking in this kind of support after the fact; it has to be baked into the code.

APIs like ICU are huge, but in many cases can make the code more clear. Unfortunately ICU doesn't get used much because the rule tables are so gargantuan that virtual memory explodes (though most of that is mmap'd straight from disk), and programmers are still beholden to their notion of low-level C-like character strings.

In 10-20 years we are going to see a surge in demand for I18N and L10N programmers to refactor all the crap hacks that came out of the 1990s, heralded by Microsoft's and Sun's half-hearted adoption of UTF-16.

GNU ed 1.6 released

Posted Jan 3, 2012 3:34 UTC (Tue) by mjg59 (subscriber, #23239) [Link] (3 responses)

From the spec:

"UTF-32 encoding form: The Unicode encoding form that assigns each Unicode scalar value to a single unsigned 32-bit code unit with the same numeric value as the Unicode scalar value"

So UTF-32 isn't variable length. The sudden rise in the use of emoji and other non-BMP characters means that ignoring the variable length of UTF-16 is already broken in real-world cases in non-CJK markets, too.
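
The difference shows up as soon as a character outside the Basic Multilingual Plane is encoded. A short Python sketch, with a hypothetical helper code_units that counts fixed-size units in an encoded string:

```python
import struct

def code_units(text, encoding, unit_size):
    """Count the code units of unit_size bytes that text occupies in encoding."""
    return len(text.encode(encoding)) // unit_size

e_acute = "\u00e9"      # U+00E9, inside the BMP
emoji = "\U0001F600"    # U+1F600, outside the BMP

for label, char in (("U+00E9", e_acute), ("U+1F600", emoji)):
    print(label,
          code_units(char, "utf-16-le", 2),   # 1 for U+00E9, 2 for U+1F600
          code_units(char, "utf-32-le", 4))   # always 1

# The two UTF-16 units for U+1F600 form a surrogate pair.
high, low = struct.unpack("<2H", emoji.encode("utf-16-le"))
print(hex(high), hex(low))   # 0xd83d 0xde00
```

Code that indexes UTF-16 strings by 16-bit unit would see two "characters" for the emoji, neither valid alone.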

GNU ed 1.6 released

Posted Jan 3, 2012 4:02 UTC (Tue) by wahern (subscriber, #37304) [Link] (2 responses)

My fault for being lazy with the terminology.

Question: do all combining sequences have precomposed equivalents? I think all the Latin ones do, but what about other scripts?

GNU ed 1.6 released

Posted Jan 3, 2012 4:04 UTC (Tue) by wahern (subscriber, #37304) [Link]

Also, it's worth pointing out, from the FAQ:

Q: Doesn’t it cause a problem to have only UTF-16 string APIs, instead of UTF-32 char APIs?

A: Almost all international functions (upper-, lower-, titlecasing, case folding, drawing, measuring, collation, transliteration, grapheme-, word-, linebreaks, etc.) should take string parameters in the API, not single code-points (UTF-32). Single code-point APIs almost always produce the wrong results except for very simple languages, either because you need more context to get the right answer, or because you need to generate a sequence of characters to return the right answer, or both.

(Source: http://unicode.org/faq/utf_bom.html)

GNU ed 1.6 released

Posted Jan 3, 2012 12:02 UTC (Tue) by mpr22 (subscriber, #60784) [Link]

Even if you ignore IPA, not all Latin-alphabet combining sequences used in the orthography of natural languages have precomposed code points. For example, as far as I know there is still no precomposed code point for n̈ - and yes, this does have a use other than correctly representing the name of a certain fictional heavy metal band.
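
Unicode normalization makes this visible. In the sketch below, Python's standard unicodedata module composes e plus a combining acute accent into a single code point, but leaves n plus a combining diaeresis as two, since no precomposed form exists for it.

```python
import unicodedata

e_with_acute = "e\u0301"      # e + COMBINING ACUTE ACCENT
n_with_diaeresis = "n\u0308"  # n + COMBINING DIAERESIS

composed_e = unicodedata.normalize("NFC", e_with_acute)
composed_n = unicodedata.normalize("NFC", n_with_diaeresis)

# NFC finds the precomposed U+00E9 for the first sequence.
print(len(composed_e), hex(ord(composed_e)))   # 1 0xe9

# The second sequence stays as two code points after NFC.
print(len(composed_n))                         # 2
```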

GNU ed 1.6 released

Posted Jan 3, 2012 5:56 UTC (Tue) by ssmith32 (subscriber, #72404) [Link] (1 responses)

>...There's no hacking in this kind of support after the fact;...

Yet somehow, this is what always happens :D

GNU ed 1.6 released

Posted Jan 3, 2012 12:55 UTC (Tue) by sorpigal (guest, #36106) [Link]

> There's no *nice* way to hack in this kind of support after the fact...

There, fixed it for ya.

GNU ed 1.6 released

Posted Jan 3, 2012 23:52 UTC (Tue) by cmccabe (guest, #60281) [Link] (2 responses)

I don't see a reason to use ICU unless you need the functionality that ICU provides. Nearly all of the C programs I've ever written just treat strings as opaque blobs and don't try to do "proper semantic text operations."

UTF-8 works great for what I need. My only wish is that it had been invented sooner, so that people didn't come up with N+1 different subtly defective, backwards incompatible "wide character" solutions.

If I were performing fancy operations on text, I would probably do it in a higher level language with built-in unicode support. At that point the encoding should be a non-issue (right?) because the high level language abstracts that away.

GNU ed 1.6 released

Posted Jan 4, 2012 2:38 UTC (Wed) by tialaramex (subscriber, #21167) [Link] (1 responses)

If the higher level language and its associated string manipulation APIs were created in the last decade or so, and by someone with a thorough understanding of Unicode itself and the general problems of different text systems, then yes, maybe.

In practice, I can't think of any languages like that. Many of them are built by people who at best assumed other writing systems are just like Latin except with differently shaped squiggles. They often mandate that "text" means "UTF-16 strings" and then blunder into all sorts of problems with filenames, URLs, streams of bytes some idiot stashed in a "text" field on a database, and other things that definitely aren't UTF-16 strings. There may be built-in assumptions about writing direction, the meaning of "character" (a very, very tricky issue) and so on.

As a rule of thumb if the language claims to be "high level" and yet it has a "character" data type that's distinct from a string, or can be treated meaningfully as an integer, or it has the same data type for binary data and text, then either they're yanking your chain or they had no idea about Unicode. C has the excuse that Unicode literally didn't exist back then. Languages like Python will have to provide their own excuses.

Some more bad signs:

• Mentions of the "length" of a string that don't either include or point at a multi-paragraph discussion of what "length" means in this context.

• Discussion of collation or "sorting" strings that doesn't mention locale.

• A string equality operator or comparison method that doesn't come with a multi-paragraph discussion of Unicode equivalence.
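
The first and last of these warning signs can be reproduced in a few lines. This is a sketch in Python, chosen only as a convenient example; canonically_equal is a hypothetical helper, and locale-aware collation is left out because it depends on which locales are installed.

```python
import unicodedata

composed = "caf\u00e9"      # ends in the precomposed U+00E9
decomposed = "cafe\u0301"   # ends in e + COMBINING ACUTE ACCENT

# Both render as the same four-letter word, yet "length" has several answers.
print(len(composed), len(decomposed))                                  # 4 5 code points
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 5 6 bytes

# The default equality operator compares code points, not meaning.
print(composed == decomposed)   # False

def canonically_equal(left, right):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)

print(canonically_equal(composed, decomposed))   # True
```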

Of course a lot of this stuff can be /fixed/ in theory. But fixes after the fact are often messy. They can involve things like deprecated methods on core objects, parallel APIs replacing every mention of character with "string", or even inventing another type "Unicode string" and then going around replacing all the other APIs in the system with Unicode-friendly ones, leaving maintenance programmers to handle the debris.

GNU ed 1.6 released

Posted Jan 4, 2012 16:33 UTC (Wed) by cmccabe (guest, #60281) [Link]

Well, I guess you are right. Unicode support, even in higher-level languages, still is not perfect. Disappointing.

GNU ed 1.6 released

Posted Jan 3, 2012 13:15 UTC (Tue) by bjartur (guest, #67801) [Link]

I think ed is even simpler than that: ed doesn't bother with encoding at all, but leaves the mess to terminals. You should be fine as long as you refrain from inputting partial characters.
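
What a "partial character" looks like at the byte level can be sketched in Python. The helper is_complete_utf8 is hypothetical; a byte-oriented editor would store either byte sequence without complaint, and only a decoder such as a terminal notices the difference.

```python
def is_complete_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 with no truncated sequences."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

e_acute = "\u00e9".encode("utf-8")     # two bytes: 0xC3 0xA9

print(is_complete_utf8(e_acute))       # whole character: True
print(is_complete_utf8(e_acute[:1]))   # lead byte only: False
```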

GNU ed 1.6 released

Posted Jan 3, 2012 1:06 UTC (Tue) by halfline (guest, #31920) [Link] (6 responses)

GNU ed 1.6 released

Posted Jan 3, 2012 1:48 UTC (Tue) by nescafe (subscriber, #45063) [Link] (5 responses)

note the consistent user interface

Posted Jan 3, 2012 3:38 UTC (Tue) by tnoo (subscriber, #20427) [Link] (4 responses)

Let's look at a typical novice's session with the mighty ed:

golem$ ed

?
help
?
?
?
quit
?
exit
?
bye
?
hello?
?
eat flaming death
?
^C
?
^C
?
^D
?

---

Note the consistent user interface and error reportage. Ed is generous enough to flag errors, yet prudent enough not to overwhelm the novice with verbosity.

“Ed is the standard text editor.”

Ed, the greatest WYGIWYG editor of all.

(from http://www.gnu.org/fun/jokes/ed.msg.html)

note the consistent user interface

Posted Jan 3, 2012 8:09 UTC (Tue) by rsidd (subscriber, #2582) [Link] (1 responses)

Source code for ed:

while :;do read x;echo \?;done

(from here)

note the consistent user interface

Posted Jan 4, 2012 11:15 UTC (Wed) by tnoo (subscriber, #20427) [Link]

this still breaks on ^C. The source code must be more complex, maybe like this:

trap "" SIGINT;while :;do read x;echo \?;done

note the consistent user interface

Posted Jan 3, 2012 15:28 UTC (Tue) by NAR (subscriber, #1313) [Link] (1 responses)

Once I had to use a router which had ed (but not vi) installed to edit its configuration file on the router. For some very simple tasks I actually could edit the file, but for anything complicated it was easier to download the file, edit it with vim, upload, then reload the configuration.

note the consistent user interface

Posted Jan 6, 2012 1:25 UTC (Fri) by k8to (guest, #15413) [Link]

I learned C++ in a mud environment where all I had was ed. I did a lot of pasting from a local editor with very tightly controlled flow rates.

Sometimes I did legitimately fix bugs in source on the server system using ed commands. It was painful.

GNU ed 1.6 released

Posted Jan 3, 2012 19:38 UTC (Tue) by nicku (subscriber, #777) [Link] (2 responses)

It's strange to relate how happy I was writing all my (mostly Pascal) computing assignments at UNSW in ed on the locally compiled Unix through 2400 bps green terminals in 1986--1989. About twenty of us simultaneously wrote 6809 assembly language programs on a time-share OS/9 system running on one 68000 CPU in an ed-like editor.

GNU ed 1.6 released

Posted Jan 3, 2012 21:00 UTC (Tue) by JoeBuck (subscriber, #2330) [Link] (1 responses)

Yes, ed was optimized for use with DECwriters spitting out characters at 300 baud (and I'm old enough that I actually used it as intended).

GNU ed 1.6 released

Posted Jan 9, 2012 15:43 UTC (Mon) by ndk (subscriber, #43509) [Link]

Ah, those were the days: back in the late 70s/early 80s, we had a Prime 750 running PrimOS (anybody remember those?) with an IBM-inspired line editor from the mid-60s as the system editor: if you think ed is frustrating, you should try that beast. I spent a lot of pleasant all-nighters on a terminal with a 300-baud modem porting the Kernighan & Plauger software tools to PL1/G; of course I started with ed. After that bootstrap step, there was a quantum leap in productivity (and a 1200-baud modem upgrade helped, but not as much). The ed port (and quite a few of the other tools) was actually used in the classroom for a few years, before Prime actually paid somebody to port emacs to their OS.

GNU ed 1.6 released

Posted Jan 5, 2012 2:11 UTC (Thu) by neilbrown (subscriber, #359) [Link] (1 responses)

We must never forget the greatest legacy that 'ed' has left us.

If 're' is a 'regular expression', then
/re/
will search for it.
/re/p
will search and then print.
g/re/p
will apply this globally - for every line that matches 're', print the line.
So if you wanted to write a program that just printed the lines that match a regular expression - what do you call it?
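
The behaviour of g/re/p can be sketched as a filter over lines. This is an illustration in Python, not ed's implementation; the function name global_re_print is made up, and Python's re module uses a different regular expression dialect from ed's POSIX basic regular expressions.

```python
import re
from typing import Iterable, Iterator

def global_re_print(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield every line in which pattern matches somewhere, like g/re/p."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line

buffer = ["ed is the standard editor", "vi came later", "sed is a stream editor"]
for line in global_re_print("ed ", buffer):
    print(line)
```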

GNU ed 1.6 released

Posted Feb 12, 2012 4:54 UTC (Sun) by dirtyepic (guest, #30178) [Link]

These days? SearchKit.

GNU ed 1.6 released

Posted Jan 5, 2012 17:18 UTC (Thu) by jhhaller (guest, #56103) [Link]

My first editor on Unix was ed; vi and emacs hadn't been written yet. em (ed for mortals) was next. But, then moving to emacs instead of vi, I never really learned much of vi other than a, i, d, r, and x; most of my use of vi consists of colon followed by an ed command. Go to the end of the file,

:$

make a copy of a line,

:.t.

move 3 lines

:.,.+2t52 :.,.+2d

(assuming moving lines forward), and substitute banana for apple

:g/apple/s//banana/g
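
For readers who have never used these commands, the effect of two of them on a buffer can be modelled with a list of lines. This is a simplified sketch in Python with hypothetical function names; it imitates the results of `.t.` and `g/apple/s//banana/g`, not how ed or vi implement them. Line addresses are 1-based, as in ed.

```python
import re

def copy_line_after_itself(buffer, current):
    """Model of `.t.`: duplicate line `current` (1-based) right after itself."""
    result = list(buffer)
    result.insert(current, buffer[current - 1])
    return result

def global_substitute(buffer, pattern, replacement):
    """Model of `g/pattern/s//replacement/g`: on every matching line,
    replace all occurrences of pattern with replacement."""
    regex = re.compile(pattern)
    return [regex.sub(replacement, line) if regex.search(line) else line
            for line in buffer]

buffer = ["apple pie", "cherry", "apple and apple"]

print(copy_line_after_itself(buffer, 2))
print(global_substitute(buffer, "apple", "banana"))
```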