[Pharo-project] Basic tricks for improving a serializer?

Mariano Martinez Peck marianopeck at gmail.com
Wed May 18 13:27:26 CEST 2011


On Wed, May 18, 2011 at 1:03 PM, Henrik Sperre Johansen <
henrik.s.johansen at veloxit.no> wrote:

>  On 18.05.2011 09:21, Martin Dias wrote:
>
>
>
> On Tue, May 17, 2011 at 7:16 PM, Igor Stasenko <siguctua at gmail.com> wrote:
>
>>  On 17 May 2011 22:58, Mariano Martinez Peck <marianopeck at gmail.com>
>> wrote:
>> >
>> >
>> > On Tue, May 17, 2011 at 10:31 PM, Sven Van Caekenberghe <sven at beta9.be>
>> > wrote:
>> >>
>> >> On 17 May 2011, at 21:57, Mariano Martinez Peck wrote:
>> >>
>> >> > Sven, I want to make it work :)
>> >> >
>> >> > so....the missing methods I told you that I need are:
>> >> >
>> >> > #nextStringPut:
>> >> > #nextNumber:put:
>> >> > #nextInt32Put:
>> >> > #nextWordPut:
>> >>
>> >> I guess these are pretty easy. But I think they clutter the interface
>> >> of ZnBufferedWriteStream, so maybe you should make a subclass.
>> >>
>> >
>> > Yeah, don't worry. I can even duplicate the class hehehe
>> >
>> >>
>> >> > #contents
>> >> >
>> >> > Implement #contents I guess it is something like:
>> >> >
>> >> > ZnBufferedWriteStream >> contents
>> >> > ^ stream contents
>> >>
>> >> Why do you need #contents?
>> >
>> > Because I am an idiot. No, I don't need it. You are correct. Thanks for
>> > asking.
>> >
>> >>
>> >> I would say that it goes a bit against the concept of a stream as a
>> >> sink of data.
>> >> I haven't looked, but I would guess that saying #contents to a
>> >> FileStream is not efficient.
>> >>
>> >> > Those missing methods I need are implemented in PositionableStream. I
>> >> > took the implementation from there and put it in ZnBufferedWriteStream.
>> >> > I just added to them a first line "    self flushBufferIfFull."
>> >>
>> >> That is probably OK, except when your string becomes larger than the
>> >> buffer. Have a look at #nextPutAll:
>> >>
>> >
>> > I am not sure if I understood. The following are correct for sure then:
>> >  #nextNumber:put:
>> >  #nextInt32Put:
>> >  #nextWordPut:
>> >
>> >
>> > And #nextStringPut:   is like this:
>> >
>> > nextStringPut: s
>> >     "Append the string, s, to the receiver.  Only used by DataStream.
>> >     Max size of 64*256*256*256."
>> >
>> >     | length |
>> >     self flushBufferIfFull.
>> >     (length := s size) < 192
>> >         ifTrue: [self nextPut: length]
>> >         ifFalse:
>> >             [self nextPut: (length digitAt: 4)+192.
>> >             self nextPut: (length digitAt: 3).
>> >             self nextPut: (length digitAt: 2).
>> >             self nextPut: (length digitAt: 1)].
>> >     self nextPutAll: s asByteArray.
>> >     ^s
>> >
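
For reference, a minimal Workspace sketch of what the length prefix in
#nextStringPut: above writes. It assumes a plain WriteStream on a ByteArray
instead of ZnBufferedWriteStream (so there is no flushBufferIfFull), and the
sample length 1000 is only an illustration:

| out length |
"Assumption: an in-memory binary stream stands in for the real one"
out := WriteStream on: (ByteArray new: 8).
length := 1000.
length < 192
    ifTrue: [out nextPut: length]
    ifFalse: [
        "Long form: 4 bytes, most significant first, top byte offset by 192"
        out nextPut: (length digitAt: 4) + 192.
        out nextPut: (length digitAt: 3).
        out nextPut: (length digitAt: 2).
        out nextPut: (length digitAt: 1)].
out contents

For 1000 (16r03E8) this should answer the four bytes 192 0 3 232, while any
length below 192 takes a single byte.
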
>>
>>  Sorry, but I can't resist commenting on that.
>> Why, if you demand that the stream implement #nextInt32Put:,
>> are you at the same time writing code like this
>>
>> self nextPut: (length digitAt: 4)+192.
>>             self nextPut: (length digitAt: 3).
>>             self nextPut: (length digitAt: 2).
>>             self nextPut: (length digitAt: 1)
>>
>>  ?
>> Then just extend your serializer with a notion of 'length' field,
>> which you can use for anything where you need to encode length/size value,
>> but not just for Strings.
>> So, then the above method could be as short as:
>>
>> nextStringPut: s
>>  self putLength: s size.
>>  self nextPutAll: s asByteArray.
>>
>> and here you have a potential caveat because your string could be
>> WideString .. muhahaha.
>>
>> So, I suggest you reconsider the way you are serializing strings.
>> Instead, what you could do is extend ByteString and WideString (and
>> perhaps similarly ByteSymbol and WideSymbol) with
>> the methods which are responsible for turning a receiver into a sequence
>> of bytes, and then simply put it into the output stream,
>> whatever it might be.
>>
>> Then you don't need #nextStringPut: because
>> a) it is not polymorphic, since apparently serializing a ByteString should
>> be different from serializing a WideString
>> b) instead you implement this in
>> Byte/WideString>>serializeToFuelStream: aStream and you are done.
>>
>
> Yes, I am not sure if #nextStringPut: is not polymorphic, but I think that
> with #serializeToFuelStream: we can avoid converting the WideString to
> ByteArray just to write the ByteArray to the stream, and instead just write
> the WideString to the stream. Right?
>
> You don't ever need the asByteArray call to store ByteStrings if you use
> nextPutAll: on a default stream in binary mode, the primitive handles that
> just fine for variableByte subclasses.
>
> If you are doing the same as DataStream, only ByteStrings are written using
> nextStringPut: , WideStrings are stored with the default writeWordLike:
> facilities.
>
> Is there a document describing the binary format of Fuel somewhere?
>

Hi Henry. When I sent this email, I was thinking of you :)

Depends what you mean by binary format ;)   How the whole subgraph is
encoded (the order, the references, indexes, etc.) is quite hard to follow.
This is because of the pickle format. But I guess that if you check these
slides: http://rmod.lille.inria.fr/web/pier/software/Fuel   you will get it.

That being said, seeing how Fuel serializes or deserializes a particular class
is really easy. Every class has a Fuel extension method called
#fuelSerializer that answers the class used to serialize/deserialize it.
So the easiest way is to check for #fuelSerializer implementors and then
map each one to its serializer class. The relation is not 1 to 1.
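
As a rough sketch of that lookup, assuming Fuel is already loaded in the
image, something like this could be evaluated in a Workspace. SystemNavigation
is the standard image facility; which implementors show up depends on the
Fuel version loaded:

"Open a browser on every class that implements #fuelSerializer"
SystemNavigation default browseAllImplementorsOf: #fuelSerializer.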

To load it:

Gofer it
    squeaksource: 'Fuel';
    package: 'ConfigurationOfFuel';
load.

((Smalltalk at: #ConfigurationOfFuel) project version:  '1.2-baseline')
load.

If you play with something and you want to run benchmarks, just open a
Transcript and evaluate:

FLBenchmarks newFullFuelOnly run


Thanks Henry for any review you can do.

-- 
Mariano
http://marianopeck.wordpress.com