[Pharo-project] Cog VM -- Thanks and Performance / Optimization Questions

David T. Lewis lewis at mail.msen.com
Thu Feb 17 16:28:39 CET 2011


I cannot really answer your questions directly, but you will find
lots of information on these topics in Eliot's blog:

  http://www.mirandabanda.org/cogblog/

Dave

On Thu, Feb 17, 2011 at 06:21:01AM -0800, John B Thiel wrote:
> Cog VM -- Thanks and Performance / Optimization Questions
> 
> 
> To everyone, thanks for your great work on Pharo and Squeak, and to
> Eliot Miranda, Ian Piumarta, and all the VM/JIT gurus, special thanks
> for the Squeak VM Cog and its precursors.  I have been keenly
> anticipating Cog for a decade or so, and it is really hitting its
> stride with the latest builds.
> 
> I like to code with awareness of performance issues.  Can you tell me,
> or point me to, some performance and efficiency tips for Cog and the
> Squeak compiler -- detail on which methods are inlined, which is best
> among alternatives, etc.?  For example, I understand #to:do: is
> inlined -- what about #to:by:do:, #timesRepeat: and #repeat?
> Basically, I would like to read a full overview of which core methods
> are specially optimized (or planned to be).
> 
> I know about the list of NoLookup primitives, as per Object
> class>>howToModifyPrimitives -- is that still valid?
> 
> What do you think is a reasonable speed factor for number-crunching
> Squeak code vs C?  I am seeing about 20x slower at semi-large
> scale, which surprised me a bit because I got about 10x on smaller
> tests, and a simple fib: with beautiful Cog is now about 3x (wow!).
> That range, from 3x for a tiny tight loop to 20x for general
> multi-class computation, seems a bit wide -- is it about what you
> would expect?
> 
> My profiling does not reveal any hotspots as such -- it's basically
> 2, 3, 5% scattered around, so I imagine this is just the general
> VM/JIT overhead as you scale up -- referencing distant objects, slots,
> dispatch lookups, more cache misses, etc.  But maybe I am using some
> backwater loop/control methods or techniques that could be tuned
> up, e.g. I seem to recall a trace at some point showing
> #timesRepeat: taking 10% of the time (?!).  Also, I recall reading
> about an anomaly with BlockClosures -- something like being rebuilt
> every time through the loop -- has that been fixed?  Any other gotchas
> to watch for currently?
> 
> (Also, any notoriously slow subsystems?  For example, Transcript
> writing is glacial.)
> 
> The Squeak bytecode compiler looks fairly straightforward and
> non-optimizing -- just statement-by-statement translation.  So it
> misses, e.g., chances to store and reuse a value instead of popping
> it.  I see lots of redundant sequences emitted.  Are those kinds of
> things now optimized out by Cog, or would tighter bytecode be another
> potential optimization path?  (Is that what the Opal project is
> targeting?)
> 
> -- jbthiel



