Shell programming

Posted Dec 7, 2012 1:08 UTC (Fri) by sitaram (guest, #5959)
In reply to: Shell programming by davidescott
Parent article: Quotes of the week

Speaking of very complex command chains, here's something that happened yesterday...

My daughter said that 3 seems to be the most common last digit in primes. I thought it might be 7. Without even thinking too much about it, I banged this out in pretty much a "flow of thought" kind of way:

seq 1 100 | xargs -L1 factor | egrep '^([0-9]+): \1$' | grep -o '[0-9]$' | sort | uniq -c | sort -n -r

Sure you won't use this in any long term program but that's because it is **slow**, not because it is unreadable. Yes it is "calling out to 15 different programs", but they're either fairly common (grep, sort, uniq) or easy to guess what they do (seq, factor).

To be honest I've rarely used anything more than grep/egrep, cut, sort, and occasionally a line or two of sed or perl. Years later I can still understand all those scripts because these basic tools are still around and they're well known and stable.

I agree that this example is somewhat contrived for our discussion (though it was real enough for me at the time I wrote it), but the wins for shell, IMO, come from:

* the lack of any actual variables anywhere
* 'sort | uniq -c', which encapsulates what would be the second largest component of the processing if you did this in a proper language
* 'factor', although it needs to be filtered by a regex to throw out the non-primes
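
For readers following along, the one-liner can be split into commented stages. This is only a sketch: the wrapper name last_digit_counts and its argument are assumptions made for illustration, egrep is spelled grep -E, and it assumes GNU factor plus a grep that accepts -o and back-references in extended regexes.

```shell
# last_digit_counts is a hypothetical wrapper name, not part of the original post.
last_digit_counts() {
    seq 1 "$1" |                    # the integers 1..N, one per line
    xargs -L1 factor |              # run factor once per number, e.g. "12: 2 2 3"
    grep -E '^([0-9]+): \1$' |      # keep lines whose only factor is the number itself
    grep -o '[0-9]$' |              # print just the final digit of each remaining line
    sort |                          # bring identical digits together
    uniq -c |                       # count each run of identical digits
    sort -n -r                      # highest count first
}

last_digit_counts 100
```

Nothing here is stored in a variable apart from the function argument; each stage only reads the text produced by the one before it.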



Shell programming

Posted Dec 7, 2012 4:44 UTC (Fri) by davidescott (guest, #58580) [Link]

> The wins for shell, IMO, come from:
> * the lack of any actual variables anywhere
> * 'sort | uniq -c', which encapsulates what would be the second largest component of the processing if you did this in a proper language
> * 'factor', although it needs to be filtered by a regex to throw out the non-primes

What you are describing there is a mixture of functional and template/meta-programming aspects to shell.

The problem with this code is that it is impossible to read without knowing its purpose. This may have seemed trivial to you but it is beyond cryptic to anyone who does not know what it does:
> seq 1 100 |
trivial
> xargs -L1 factor |
I guess my xargs is weak (I seldom use it); I had to look up -L1.
> egrep '^([0-9]+): \1$' |
Dependent upon the output of factor which also must be looked up (I don't use factor).
Once you know what factor outputs you have to think a bit to understand how that relates to egrep.
> grep -o '[0-9]$' |
Something about the last character... but what is -o?
> sort |
trivial
> uniq -c | sort -n -r
c,n,r aren't terrible arguments but not everyone will know them.

I think this ruby is easier to follow except for the difficulty of knowing what group_by does:

require 'prime'  # provides Integer#prime?; 'mathn' was removed from the standard library in Ruby 2.5
(1..100).collect{|x| x.prime? ? x : nil }.compact().group_by{|x| x%10}.map{|k,v| [v.length(),k]}.sort()

Shell programming

Posted Dec 8, 2012 19:05 UTC (Sat) by nix (subscriber, #2304) [Link]

In most cases you're trying too hard. Rather than 'looking up' what xargs -L does, just do 'xargs --help' and it tells you (not that that is much harder than 'man xargs'). Rather than agonizing over what factor does, try 'factor 16' and 'factor 17' (one obvious prime, one obvious non-prime) and it is instantly obvious.

Shell programming

Posted Dec 8, 2012 19:09 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

>cyberax@cybmac:~$ factor 16
>bash: factor: команда не найдена [command not found]
Kinda fails.

Let's try "port search factor" - again it fails. There's no 'factor' utility in Mac OS X's ports.

On the other hand, Ruby version works perfectly even with the bundled Ruby interpreter.

And THAT is the problem with shells.

Shell programming

Posted Dec 8, 2012 20:01 UTC (Sat) by man_ls (guest, #15091) [Link]

I'd say that is a problem with the Mac OS X shell: it lacks the factor command.

Shell programming

Posted Dec 8, 2012 20:04 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

And Linux's shell lacks some SunOS's shell's tar options.

I've checked and it doesn't seem that the 'factor' command is mentioned anywhere in the POSIX spec.

Shell programming

Posted Dec 8, 2012 22:52 UTC (Sat) by man_ls (guest, #15091) [Link]

Neither does the POSIX spec mention ssh, rsync, wget, curl, less or a myriad of other commands in widespread use. Yet using them in scripts is easily forgiven; each command maintains excellent backwards compatibility on its own.

For your Mac OS X, I believe that brew install coreutils will get you a nice local copy of factor.
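
A script that depends on a non-POSIX tool can at least probe for it before use. The sketch below is illustrative only: find_factor is a hypothetical helper, and gfactor is an assumed name for a g-prefixed install of GNU factor (check what your package manager actually installs).

```shell
# find_factor is a hypothetical helper: print the name of the first available
# factor command, or return non-zero if none is installed.
find_factor() {
    for candidate in factor gfactor; do   # gfactor is an assumed alternative name
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if factor_cmd=$(find_factor); then
    "$factor_cmd" 17
else
    echo "no factor command found; install GNU coreutils" >&2
fi
```

command -v is specified by POSIX, so the probe itself should not add a new portability problem.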

Shell programming

Posted Dec 8, 2012 23:26 UTC (Sat) by davidescott (guest, #58580) [Link]

What is this "less" you speak of? Maybe you are confused; the proper command is "more". Also, what is this "ssh"? Do you mean "rsh"?

GNU has done an impressive job of unifying the *nix with a set of common tools, and the fact that someone can "brew install coreutils" and get a bunch of useful binaries is a testament to the value of the work GNU has done. BUT...

That does not mean that "shell" is the best language, it just means that shell is the least common denominator for interacting with a diverse set of programs (and it has served *nix well). Text input/output, flags for program options, silence is success, integer return codes, etc... As programs mature they get split into libraries, and those libraries get external interfaces to other languages like python/ruby/etc...

Shell programming

Posted Dec 8, 2012 23:10 UTC (Sat) by davidescott (guest, #58580) [Link]

I would go further and suggest that /usr/bin/factor is not something that should be in coreutils, and that OSX/BSD is far more sensible than GNU in excluding it. It just ends up polluting /usr/bin with functionality that would make more sense in a basic calculator like "bc" (which I curse out every time I start it because it's so much easier to do basic arithmetic in python, ruby, R or octave).

If you look at what else is in coreutils, factor seems completely out of place. There is not a single other tool in coreutils to do basic arithmetic. Why is there a /usr/bin/factor but not a /usr/bin/factorial, /usr/bin/exponent, /usr/bin/log or for that matter a /usr/bin/prime or /usr/bin/primes? In fact a good multi-function random tool (to generate random numbers or strings) would seem to be a far far far more useful thing to add to coreutils than "factor."

I'm really curious what considerations led to "factor" being included in coreutils. Is there an init script that really needs to factor a number before it can continue?

I don't mean this as a criticism of the developers of bc or coreutils or of GNU in general. Tools like "bc" were advanced and useful for their day. Thankfully we have better alternatives for many use cases, and with 8 cores and 16GB of RAM I'm happy to waste a bit of each to get a more user-friendly tool to do basic arithmetic and really don't need "bc" or "factor" anymore.

Shell programming

Posted Dec 9, 2012 17:39 UTC (Sun) by nix (subscriber, #2304) [Link]

I suspect only Roland McGrath can tell you why factor is in coreutils. It's been there since before it had an RCS repository (1992).

Shell programming

Posted Dec 8, 2012 22:44 UTC (Sat) by davidescott (guest, #58580) [Link]

> Rather than 'looking up' what xargs -L does, just do 'xargs --help' and it tells you (not that that is much harder than 'man xargs').

In what way is "xargs --help" not "looking it up"?

> Rather than agonizing over what factor does, try 'factor 16' and 'factor 17' (one obvious prime, one obvious non-prime) and it is instantly obvious.

Again, I know what factor does (it's in the name); what I don't know is that the output format is "INPUT: N1 N2 ... Nk". So I did exactly as you described (except I used 4 and 7).

There is nothing particularly complex about looking things up like this, and I had to look some things up for ruby as well (it's been a few years since I have used it heavily). The key difference is that ruby has functions like "numeric.prime?" which (following standard naming convention) is a test if the numeric is prime, and has a standard obvious boolean return value.

The shell variant of this was a rather convoluted test if the output of "factor" was "N: N", which is a rather indirect way of testing if a number is prime. One could reasonably suspect that "factor 7" would output "7: 1 7" or that "factor 8" would output "2^3" or "2,2,2". There is no a priori reason to believe that factor should output the input number followed by the non-unit factors in increasing order (and separated by ":" and " ").

That is one of the advantages of a programming language. Real datatypes make it easier to guess what a function might return (factor() should return an array of prime factors [sorting and the unit are open questions]), and the subsequent manipulation of those objects is usually sufficient to say what kind of object was returned. If my test for primality of x was:
> (x.factor().length()==1)
Then it is clear that factor() returns an array of factors not including 1. In addition the standardized syntax makes it easier to guess what the functions might be.

Now at the expense of portability I could write a program /usr/bin/prime and simplify the shell script. I might even be able to write something like ruby's group_by in shell but it stretches the meaning of "shell" to say that such a script is "standard *nix shell."
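
As a rough illustration of that trade-off, the primality test can be given a name inside the script instead of a separate /usr/bin/prime. This is a sketch under assumptions: is_prime is a hypothetical function, and it relies on GNU factor printing "N: N" for a prime N, the format described above.

```shell
# is_prime is a hypothetical stand-in for the /usr/bin/prime imagined above.
# It succeeds (exit status 0) when its argument is prime.
is_prime() {
    [ "$(factor "$1")" = "$1: $1" ]
}

# With the test named, the pipeline no longer needs the back-reference regex.
seq 1 100 | while read -r n; do
    if is_prime "$n"; then echo "$n"; fi
done | grep -o '[0-9]$' | sort | uniq -c | sort -n -r
```

The cost is one factor process per number, and the script still depends on a tool outside POSIX, so this improves readability rather than portability.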

Shell programming

Posted Dec 9, 2012 17:44 UTC (Sun) by nix (subscriber, #2304) [Link]

Hm. I hate to point it out, but your 'easy to read' Ruby one-liner was and remains utterly incomprehensible to me, full of thoroughly opaque punctuation, probably just as incomprehensible as the shell equivalent was to you: the presence of the occasional comprehensible word does not help much. The last time I saw that much nested foo().bar().baz().quux() was when building menus using Borland's Turbo Vision library. It is very much not intrinsically easier to follow than pipelines: at least with pipelines you know that what is being transported is text with a print syntax that you can easily examine, rather than some intricate datatype which you have to look up the various functions to figure out.

Your entire argument boils down to 'things that are familiar to me are intrinsically easier to read for everyone than things that are not familiar to me', which is not at all valid.

I'd agree that for larger programs, the shell is hardly ideal: datatyping is worthwhile -- but for quick one-liners to answer quick questions, it's exactly as valid as a bunch of other vaguely-scripty languages. Ruby is not magically better just because you know it better than the shell (and your claim that more people know Ruby than know the shell is unsupported, and to me sounds dubious in the extreme).

Shell programming

Posted Dec 9, 2012 19:01 UTC (Sun) by davidescott (guest, #58580) [Link]

> Ruby is not magically better just because you know it better than the shell

Never wrote that. In fact I wrote something much the opposite of that.

> (and your claim that more people know Ruby than know the shell is unsupported, and to me sounds dubious in the extreme).

Never wrote that.

> but for quick one-liners to answer quick questions

Not what this discussion was about.

> Your entire argument boils down to 'things that are familiar to me are intrinsically easier to read for everyone than things that are not familiar to me', which is not at all valid.

Your entire argument seems to be based on making up stuff and attributing it to others. There seems little reason to respond.

Shell programming

Posted Dec 9, 2012 23:20 UTC (Sun) by nix (subscriber, #2304) [Link]

No, it's based on not looking upthread to see who wrote what. Feel free to interpret 'you' as 'collective you' if you wish :)

Shell programming

Posted Dec 10, 2012 0:12 UTC (Mon) by Baylink (guest, #755) [Link]

> Your entire argument boils down to 'things that are familiar to me are intrinsically easier to read for everyone than things that are not familiar to me', which is not at all valid.

Exactly.

But the point you make immediately above, which was essentially: debugging is easier because you can merely lop off the end of the pipeline and *look at the data on the terminal*, is the really important one to me.

The best language for anything is very often *the one your programmer understands best*, as that is the one in which s/he'll be most effective.

Shell programming

Posted Dec 18, 2012 12:25 UTC (Tue) by wookey (subscriber, #5501) [Link]

I think this is the nub of it.

I write stuff in shell because I know how to. After many years I've learned how to deal with many of the horrible things that go wrong, and with a couple of hours of dicking about can usually get what I wanted.

As with others if I know there will be a lot of data manipulation then I'll use perl instead, and know just enough to do that. I avoid writing in python because I don't know how, and worse don't know how to fill in the gaps.

If I'd started writing python 20 years ago things might be different (and I might be more productive - I like shell but I'm not going to claim that it's a _nice_ language)

Shell programming

Posted Dec 10, 2012 0:47 UTC (Mon) by dlang (subscriber, #313) [Link]

the pipe based approach has the wonderful property that it's trivial to truncate the 'program' at any point and see the output at that stage (with normal tools like head/less/wc/etc)

In fact, I normally build up the monster pipe shell programs iteratively this way, one chunk at a time.
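
The iterative style described here might look like the following when applied to the prime-digit pipeline from the top of the thread. It is a sketch, not a transcript of anyone's session; it assumes GNU factor and uses grep -E in place of egrep.

```shell
# Stage 1: what does factor print? Look at the first few lines only.
seq 1 100 | xargs -L1 factor | head -n 5

# Stage 2: add the prime filter and count how many lines survive.
seq 1 100 | xargs -L1 factor | grep -E '^([0-9]+): \1$' | wc -l

# Stage 3: add the last-digit extraction and peek at the result again.
seq 1 100 | xargs -L1 factor | grep -E '^([0-9]+): \1$' | grep -o '[0-9]$' | head -n 5
```

Each step reuses the previous command line and appends one stage, with head or wc standing in for whatever comes next.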

Shell programming

Posted Dec 10, 2012 2:17 UTC (Mon) by davidescott (guest, #58580) [Link]

> it's trivial to truncate the 'program' at any point and see the output at that stage

So will interactive ruby (or python for that matter).

> In fact, I normally build up the monster pipe shell programs iteratively this way, one chunk at a time.

Same way I write long ruby chains.

The only difference is that Ruby passes objects and shell passes text.

Shell programming

Posted Dec 7, 2012 9:03 UTC (Fri) by man_ls (guest, #15091) [Link]

Compactness and the use of pipes: two basic Unix principles.

By the way, it appears that the most common last digit in primes is a bit random: running your script for 100 yields 3, for 1000 yields 7 and for 10000 it is 3 again, and for 100000 7 once more. Not bad for a Bash one-liner; and it is not so slow after all.
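
To repeat this kind of experiment for several limits, the pipeline can be wrapped in a function and looped over. The sketch below is an assumption-laden variant rather than the script that was actually run: top_last_digit is a hypothetical name, and it feeds the numbers to GNU factor on standard input (which factor reads when given no arguments) instead of starting one process per number with xargs -L1.

```shell
# top_last_digit is a hypothetical helper: print the most common final digit
# among the primes up to N. Ties, if any, are broken by sort order.
top_last_digit() {
    seq 1 "$1" |
    factor |                        # GNU factor reads numbers from stdin when it has no arguments
    grep -E '^([0-9]+): \1$' |      # keep only the primes
    grep -o '[0-9]$' |              # last digit of each
    sort | uniq -c | sort -n -r |   # tally, highest count first
    head -n 1 |
    awk '{ print $2 }'              # drop the count, keep the digit
}

for limit in 100 1000 10000; do
    printf '%s: %s\n' "$limit" "$(top_last_digit "$limit")"
done
```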

Shell programming

Posted Dec 7, 2012 10:53 UTC (Fri) by egk (subscriber, #50799) [Link]

Off topic, but anyone interested in the last digit of primes should search for "Chebychev bias", and especially the paper of Rubinstein and Sarnak on this topic. Very short summary: it's much more complicated than one might think...

And if you want to do numerical experiments on this type of question, the right tool for the job is not a shell-scripting language. (Hint: Pari/GP).

Shell programming

Posted Dec 7, 2012 11:35 UTC (Fri) by man_ls (guest, #15091) [Link]

What is really surprising is that for a quick check, Bash can be useful even for interesting mathematical questions.

Shell programming

Posted Dec 7, 2012 12:01 UTC (Fri) by renox (subscriber, #23785) [Link]

> Compactness and the use of pipes: two basic Unix principles.

Maybe, but even though I know the shell better than Ruby, the Ruby solution is more readable than the shell one.

Shell programming

Posted Dec 7, 2012 19:55 UTC (Fri) by dlang (subscriber, #313) [Link]

for you Ruby is the right answer, for now.

However, how much will Ruby change over time? If new people are hired (or your company is bought out, or you leave and go somewhere else), are these other people also going to be in the situation where Ruby is more familiar to them than shell?

Shell is everywhere. It's hard to administer a *nix system without dealing with shell (you can be an application programmer without dealing with shell, but not a sysadmin)

along similar lines, it's hard to be a sysadmin and completely ignore vi, you may prefer emacs, but any system you touch is going to have vi, not all systems will have emacs.

not all systems will have Ruby, Python, or Perl but every system will have shell of some sort.

Shell programming

Posted Dec 7, 2012 20:22 UTC (Fri) by bronson (subscriber, #4806) [Link]

Exactly right. Ten years ago Perl was the answer. Not anymore. (Perl5 is becoming nonessential a lot faster than I thought it would. It's almost scary...)

Scripting languages come and go, and the modern ones break their own scripts every five years. Shell (without bashisms) seems to be as close to eternal as the computer industry can manage.

Use the right tool for the job.

Shell programming

Posted Dec 7, 2012 22:02 UTC (Fri) by davidescott (guest, #58580) [Link]

> Shell (without bashisms) seems to be as close to eternal as the computer industry can manage.

Not so sure about that. Trying to run a shell script on an older sysv system I think bash-isms are the least of your problems. Consider all the gnu-isms in the commands you use every day.

Compare the single unix specification ls (copyright is only 11 years ago):
http://pubs.opengroup.org/onlinepubs/009695399/utilities/...
to gnu ls:
http://unixhelp.ed.ac.uk/CGI/man-cgi?ls

-A, --author, -b (aka --escape), --block-size, -B (--ignore-backups), --color, -D, --file-type, --format, --full-time, -G, -h, --si, --dereference-command-line-symlink-to-dir, --hide, --indicator-style, -I, -k, -N (--literal), --show-control-chars, -Q, --quoting-style, -R, -S, --sort, --time, --time-style, -T, -U, -v, -w, -X

If you think about POSIX sh as the "core language" and all the programs in /usr/bin as the "libraries" for that language, then shell has seen a very stable core language and a massive expansion in the number and variety of libraries. On the other hand languages like ruby/python/etc have evolved more evenly across the language/library split.

"Shell is universally understood" has more to do with the overwhelming success of the GNU system and the power of open-source to push out the inferior closed source versions. Not that shell programming has been completely static.

Shell programming

Posted Dec 7, 2012 22:07 UTC (Fri) by dlang (subscriber, #313) [Link]

The key is in backwards compatibility, not in lack of change.

If you take a shell script from 20 years ago, the odds of it being able to run on a system today are very high

Perl is also fairly good about backwards compatibility, but I don't know how far back Perl 5 goes.

most other languages are far worse, and if you take a program written in them 10-15 years ago (if they were even around that long), the odds are very poor that you could run the program today without going through a porting effort that would be comparable to porting to a different language.

Shell programming

Posted Dec 8, 2012 1:35 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

HAhahahahadhahadkLOLOLL.

I've recently spent 5 hours rewriting old scripts from a long ago retired Sun box. Searching old manuals to understand what non-standard command line options do is such a great way to spend time. Especially when it's not possible to simply run them.

Shell programming

Posted Dec 8, 2012 2:05 UTC (Sat) by sitaram (guest, #5959) [Link]

That has nothing to do with shell per se. It's just non-open source versus open source. If they were open source you'd still be able to compile and run them.

As for non-standard options, I suspect most of the Sun utilities would be closer to BSD.

Shell programming

Posted Dec 8, 2012 2:11 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

Even if they were open source (which they are), you'd need to compile 20-year-old crufty C code and quite likely newer GCC versions won't do the trick. Waaaaay too much work.

On the other hand, I've ported 15-year-old Python scripts without many problems. There were several non-backward-compatible changes in Python since then, but they are fairly minor.

Shell programming

Posted Dec 8, 2012 6:53 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

Some anecdata:

For Python, one I hit today was to change "except BaseException as e:" to "except BaseException: e = sys.exc_info()[1]" to support 2.4 through 3.2 in the same script. A little annoying, and not as pretty as either just-one-way canonical syntax.

As for shell, we found out that BSD sed and GNU sed don't support compatible -i flags. BSD requires a suffix, GNU requires there be no space if one is given, which BSD rejects. Using manual .bak files feels worse than either by a long shot. I think the call has been replaced by awk instead now.

In my experience, GNUisms tend to be harder to work around than Python incompatibilities. That's why I try to avoid them in my shell scripts. Unfortunately, sed -i is very convenient and sponge just isn't common enough, but this is slowly being trained out of my fingers when I'm in "portable" mode (I use every trick I can at the prompt, just not when writing .sh files).
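
One way to sidestep the incompatible -i flags is the temporary-file approach, wrapped in a small function. This is a minimal sketch and not what the project mentioned above ended up using: sed_in_place is a hypothetical helper, and it assumes mktemp is available (it is widespread but not specified by POSIX).

```shell
# sed_in_place is a hypothetical helper: apply a sed script to a file and
# write the result back, without relying on any -i flag.
sed_in_place() {
    script=$1
    file=$2
    tmp=$(mktemp) || return 1
    if sed "$script" "$file" > "$tmp"; then
        cat "$tmp" > "$file"        # overwrite contents, keeping the file's inode and permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"                # leave the original untouched if sed failed
        return 1
    fi
}

# Small demonstration on a scratch file.
demo=$(mktemp)
printf 'color\n' > "$demo"
sed_in_place 's/color/colour/' "$demo"
cat "$demo"
rm -f "$demo"
```

Unlike a manual .bak copy, nothing is left behind on success, though the write-back is not atomic.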

Commands on an old Sun box

Posted Dec 10, 2012 7:19 UTC (Mon) by jrn (subscriber, #64214) [Link]

Have you looked at the heirloom toolset (http://heirloom.sourceforge.net/)?

Commands on an old Sun box

Posted Dec 10, 2012 14:17 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

Nope (I didn't know about it).

In any case, I was interested in a rewrite in a sane language, not just running these scripts.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds