Ruby Performance (Linux Journal)
Antonio Cangiano posted a Ruby Implementation Shootout on his blog last week. While it's an interesting piece (and will likely be more interesting over time), it's still very premature.

Ruby Performance (Linux Journal)
Posted Feb 27, 2007 0:52 UTC (Tue) by BrucePerens (guest, #2510) [Link] (14 responses)

My Rails applications seem to fragment the heap over time, without obvious leaks (I've looked pretty thoroughly). So I think there's room for experimentation with the memory allocator running under Ruby.

Bruce

A reminder to benchmarkers
Posted Feb 27, 2007 3:43 UTC (Tue) by ncm (guest, #165) [Link] (13 responses)

These days, memory bandwidth is a crucial factor for real applications, and may not be captured in benchmarks. If you are writing about performance, you should present some material on locality of reference and memory usage.

We can expect implementations based on garbage-collecting VMs to do badly where locality matters. If previous experience is any guide, we will see plenty of artificial benchmarks demonstrating rosy (or, in Sr. Cangiano's case, verdant) performance numbers on these targets, but badly disappointed users. I hope this expectation proves wrong, but wouldn't bet that way.

garbage-collecting VMs
Posted Feb 27, 2007 4:19 UTC (Tue) by BrucePerens (guest, #2510) [Link] (2 responses)

There is lots of room for tuning allocation and object storage, so I'm not quite so willing to give up. And the interpreter's pretty good now. I have a few Rails applications in production, and I wish I had as much load as they can service with only one dispatcher.

Bruce

garbage-collecting VMs
Posted Feb 27, 2007 4:53 UTC (Tue) by ncm (guest, #165) [Link] (1 response)

I'm optimistic for a different reason: there's no need to run one's Ruby programs on a GC VM. If they turn out slow under real loads, it will be easy to tell and easy to switch. It will only be unfortunate on shared servers, where it may be hard to convince some users that they're unfairly hammering the rest, because their CPU usage looks minimal while they thrash the bus.

garbage-collecting VMs
Posted Feb 27, 2007 5:42 UTC (Tue) by BrucePerens (guest, #2510) [Link]

I wasn't aware of any non-GC environments for Ruby. There are node-traversal interpreters rather than VMs, but they still GC. And I'm curious about what you think would be better. I sometimes prefer reference counting, as with smart pointers in C++, and double-indirect schemes that work with compacting collectors, but the cache performance of such things is worse than GC on average. GC is only worse when the collector is running.

Thanks

Bruce

garbage-collecting VMs
Posted Feb 27, 2007 9:04 UTC (Tue) by flewellyn (subscriber, #5047) [Link] (9 responses)

> We can expect implementations based on garbage-collecting VMs to do badly where locality matters.

That's not necessarily true. It depends on the type of GC. A good compacting or copying collector, an ephemeral collector, and other strategies can help enormously with locality issues, especially with long-lived objects (and with a copying or ephemeral collector, short-lived objects don't matter anyway). Collectors can, in fact, be tuned to improve locality over hand-allocated memory. So let's not spread old misconceptions about GC, okay? It's the 21st century; we've had the technique for fifty years, and it's well proven by now. :-)

garbage-collecting VMs
Posted Feb 27, 2007 19:29 UTC (Tue) by ncm (guest, #165) [Link] (8 responses)

People have been saying that for decades. Somehow, fifty years down the road, programs that depend on GC and address real-world problems still exhibit abysmal memory-usage patterns, and often abysmal performance, despite stellar benchmark numbers. "Strategies that can help" evidently do not help enough, or are hard enough to apply that they aren't applied. GC programs have always made bad neighbors, and there is no evidence of progress there. Meanwhile, the machines are not getting faster any more, and are ever more dependent on good cache behavior.

How many more decades will it be before it's OK to say that the GC experiment has failed?

garbage-collecting VMs
Posted Feb 27, 2007 20:29 UTC (Tue) by flewellyn (subscriber, #5047) [Link] (7 responses)

Actually, people have been saying "Eww, GC = slow" for decades, despite huge amounts of research and development into making GCs faster and more robust.

I will freely admit that there are probably workloads in which GC can cause problems, but let's not relegate the whole concept to the bit bucket just because it doesn't work in all cases.

Perhaps what we really need is some kind of "happy medium" system, in which a programmer COULD manage memory manually if necessary, but can otherwise leave it up to a GC. I'm not sure what such a system would look like, but it would be interesting to see.

garbage-collecting VMs
Posted Feb 28, 2007 1:38 UTC (Wed) by drag (guest, #31333) [Link] (5 responses)

Maybe something like Pyrex?

http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/...

As you probably know, Python is a slow language. However, it is used in lots of places that demand very high performance from programs. The way you deal with this, as I am told, is that you write the program you want in Python. Use it and profile it. If it doesn't give acceptable performance, find the portions of the code that are slow and figure out optimizations for them.

If you optimize as much as you can in Python and it's still slow, then you take your now highly optimized Python code and translate it to C, which you then compile and import into your Python program as a module.

The key to making it all work is to write the program using generic code and _then_ identify bottlenecks (instead of speculating ahead of time), then refactor the code until it's no longer possible to get better performance, and _then_ rewrite it in a lower-level language. That way you end up with a working program first (albeit a slow one), and then spend your time as wisely as possible making it fast, with the benefit of some real-world experience and testing.

That's the theory, anyway. How well it works out is anyone's guess.

One of the major downsides to this approach is that although writing Python programs is easy, writing Python extensions in C is not. It requires a significant amount of boilerplate code, and it can be tricky to make things 'Python-ish'.

That is why Pyrex was invented. You can take straight Python code (with some caveats), compile it with Pyrex, and get compiled Python. Not really that much faster by itself, but the key is that you can then mix and match C code with Python code, letting the two intermingle freely in the module you're making, without all the boilerplate cruft. Often you can end up with superior results compared to writing in pure C or importing a C/C++ library through bindings and such.

To sum it up: "Pyrex is Python with C data types."

garbage-collecting VMs
Posted Feb 28, 2007 2:29 UTC (Wed) by flewellyn (subscriber, #5047) [Link] (1 response)

Well, that's kind of a separate issue from memory management, though. You can use a GC with a compiled language: Lisps have had it for years, and have been compiled since the 70s. Heck, you can use the Boehm-Demers-Weiser GC with C or C++. So memory management is orthogonal to compilation vs. interpretation.

Thanks for the tip, though. I do write a fair amount of Python code.

garbage-collecting VMs
Posted Feb 28, 2007 4:27 UTC (Wed) by drag (guest, #31333) [Link]

I wasn't really thinking about compiled vs. interpreted. But I figured that if you wrote in something like Pyrex, you could use C-like code to do memory management manually, and then use Python-like code when you don't feel like it.

garbage-collecting VMs
Posted Feb 28, 2007 5:31 UTC (Wed) by jdell (guest, #25923) [Link] (2 responses)

You can do the same thing in Ruby with RubyInline. Very elegant:

http://www.zenspider.com/ZSS/Products/RubyInline/

garbage-collecting VMs
Posted Feb 28, 2007 6:11 UTC (Wed) by drag (guest, #31333) [Link] (1 response)

Yeah, there is a PyInline also, if you're curious.

http://pyinline.sourceforge.net/

On that Pyrex page I linked to, they mention it. They say that it's nice, but it only converts basic types, and you can't use it to make new Python types. Maybe RubyInline is better. Both of them are based on the Perl Inline concept, of course.

garbage-collecting VMs
Posted Feb 28, 2007 18:24 UTC (Wed) by jdell (guest, #25923) [Link]

RubyInline also only auto-converts basic types. From their web page: "Automatic conversion between ruby and C basic types: char, unsigned, unsigned int, char *, int, long, unsigned long".

Yes, you are absolutely correct: it seems to me that WRT Python and Ruby, if it is interesting and worth doing, it has already been done in Perl :-)

garbage-collecting VMs
Posted Feb 28, 2007 7:53 UTC (Wed) by ldo (guest, #40946) [Link]

The problem is that even while the software continues to improve the locality of reference of garbage collectors, the hardware continues to increase the cost of violating locality of reference. CPU speeds keep increasing much faster than RAM speeds, which is why you have L1 and L2 caches (and sometimes even L3 ones) to bridge the gap. As the gap widens, so does the cost of crossing it.

It's been about half a century since the concepts of machine-independent languages and garbage collection were invented; the first is pretty much universally taken for granted these days, the second is not. This is despite processor speeds improving by something like five orders of magnitude over that time: everybody now accepts that you should write your code in machine-independent languages (even for something as machine-dependent as the Linux kernel!), but garbage collection is still too expensive for many uses, and will probably remain that way forever.

Ruby Performance (Linux Journal)
Posted Feb 27, 2007 16:05 UTC (Tue) by josh_stern (guest, #4868) [Link] (1 response)

Are there so many tradeoffs in the implementation of byte-code interpreters that it makes sense for so many new and different ones to be created?

Ruby Performance (Linux Journal)
Posted Feb 28, 2007 3:08 UTC (Wed) by josh_stern (guest, #4868) [Link]

Coincidentally, my question was mostly answered by this thread:

http://lambda-the-ultimate.org/node/1617
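
[Editor's illustration] The profile-first, optimize-later workflow drag describes above can be sketched with Python's standard-library profiler. This is a hypothetical sketch, not code from the thread: `slow_norms` and `fast_norms` are made-up stand-ins for a real bottleneck and its pure-Python optimization, the step you would take before reaching for C or Pyrex.

```python
import cProfile
import io
import math
import pstats

# Hypothetical hot spot, written the straightforward way first
# (step 1 of the workflow: just get a working program).
def slow_norms(points):
    total = 0.0
    for x, y in points:
        total += math.sqrt(x ** 2 + y ** 2)
    return total

# Step 3: once profiling shows slow_norms dominates, optimize it
# within Python before rewriting it in a lower-level language.
def fast_norms(points):
    return sum(math.hypot(x, y) for x, y in points)

points = [(i * 0.5, i * 0.25) for i in range(100_000)]

# Step 2: measure instead of speculating about where the time goes.
profiler = cProfile.Profile()
profiler.enable()
result = slow_norms(points)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()      # per-function call counts and times
assert "slow_norms" in report   # the bottleneck shows up by name

# The optimized version must remain a drop-in replacement.
assert math.isclose(result, fast_norms(points), rel_tol=1e-9)
```

If `fast_norms` still dominated the profile after this, the next step in the workflow would be rewriting just that function as a C extension module (or in Pyrex), leaving the rest of the program untouched.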
      
 
           