[Pharo-project] Cog VM -- Thanks and Performance / Optimization Questions

Stéphane Ducasse Stephane.Ducasse at inria.fr
Thu Feb 17 18:20:51 CET 2011


Hi John,

Have a look at the class-side methods of MessageNode; there you will see the list of messages that are inlined.

Stef

On Feb 17, 2011, at 3:21 PM, John B Thiel wrote:

> Cog VM -- Thanks and Performance / Optimization Questions
> 
> 
> To Everyone, thanks for your great work on Pharo and Squeak,  and to
> Eliot Miranda, Ian Piumarta, and all VM/JIT gurus, especially thanks
> for the Squeak VM Cog and its precursors, which I have been keenly
> anticipating for a decade or so, and which is really hitting its stride
> with the latest builds.
> 
> I like to code with awareness of performance issues.  Can you tell or
> point me to some performance and efficiency tips for Cog and the
> Squeak compiler -- detail on which methods are inlined, best among
> alternatives, etc.  For example, I understand #to:do: is inlined --
> what about #to:by:do: and #timesRepeat: and #repeat?  Basically, I
> would like to read a full overview of which core methods are specially
> optimized (or planned).
> 
> I know about the list of NoLookup primitives, as per Object
> class>>howToModifyPrimitives,  supposing that is still valid?
> 
> What do you think is a reasonable speed factor for number-crunching
> Squeak code vs C ?   I am seeing about 20x slower in the semi-large
> scale, which surprised me a bit because I got about 10x on smaller
> tests, and a simple fib: with beautiful Cog is now about 3x (wow!).
> That range, from 3x for a tiny tight loop to 20x for general
> multi-class computation, seems a bit wide -- is it about expected?
> 
> My profiling does not reveal any hotspots, as such -- it's basically
> 2, 3, 5% scattered around, so I envision this is just the general
> vm/jit overhead as you scale up -- referencing distant objects, slots,
> dispatch lookups, more cache misses, etc.  But maybe I am generally
> using some backwater loop/control methods, techniques, etc. that could
> be tuned up.  e.g. I seem to recall a trace at some point showing
> #timesRepeat: taking 10% of the time (?!).   Also, I recall reading
> about an anomaly with BlockClosures -- something like being rebuilt
> every time thru the loop - has that been fixed?  Any other gotchas to
> watch for currently?
> 
> (Also, any notoriously slow subsystems?  For example, Transcript
> writing is glacial.)
> 
> The Squeak bytecode compiler looks fairly straightforward and
> non-optimizing - just statement by statement translation.  So it
> misses e.g. chances to store and reuse, instead of pop, etc.  I see
> lots of redundant sequences emitted.  Are those kinds of things now
> optimized out by Cog, or would tighter bytecode be another potential
> optimization path?  (Is that what the Opal project is targeting?)
> 
> -- jbthiel
> 
