[Pharo-project] Who broke UTF8TextConverter?

Igor Stasenko siguctua at gmail.com
Tue Aug 28 14:35:25 CEST 2012

Or has it been like that from birth??


| stream |
stream := WriteStream on: (ByteArray new: 100).

UTF8TextConverter new nextPut: (Character value: 129 ) toStream: stream.

stream contents

This answers #[129], a single byte. This is WRONG! UTF-8 needs two bytes
for any code point above 127. RTFM about UTF-8 encoding, please! :)
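For comparison, here is a minimal sketch of what a UTF-8 encoder should
emit for this character. It is hand-rolled and only covers code points
below 16r800 (the one-byte and two-byte forms); the names codePoint and
stream are just for illustration, and this is not the converter's actual code.

```smalltalk
| codePoint stream |
codePoint := 129.
stream := WriteStream on: (ByteArray new: 4).
codePoint < 16r80
	ifTrue: [
		"ASCII range: stored as a single byte"
		stream nextPut: codePoint ]
	ifFalse: [
		"Two-byte form, valid only for code points 16r80 to 16r7FF:
		 lead byte 110xxxxx, continuation byte 10xxxxxx"
		stream nextPut: 16rC0 + (codePoint bitShift: -6).
		stream nextPut: 16r80 + (codePoint bitAnd: 16r3F) ].
stream contents "#[194 129], i.e. 16rC2 16r81"
```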


nextPut: aCharacter toStream: aStream
	| leadingChar nBytes mask shift ucs2code |
	aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream].

In my case the stream is binary, so it goes directly to #storeBinaryOn:

storeBinaryOn: aStream
	"Store the receiver on a binary (file) stream"
	value < 256
		ifTrue:[aStream basicNextPut: value]
		ifFalse:[aStream nextInt32Put: value].

This is not even close to UTF-8.
If the character code is less than 256, it stores a single byte (wtf?),
and otherwise it stores the value as a 32-bit integer in
big-endian order (wtf raisedToPower: 2)..

I wonder what purpose this code path actually serves. This
stuff is completely useless.
Given the implementation of storeBinaryOn:,
there is no way to read the same character value back,
because it can be 1 byte or 4 bytes, and you simply cannot determine which.
This is one of the reasons we use UTF-8 encoding, btw ;)
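
To make the ambiguity concrete, here is a small sketch. The block named
store is hypothetical: it mimics the quoted storeBinaryOn: logic on plain
integers, assuming the 32-bit value is written big-endian as described
above. One character with value 256 and four characters with values
0 0 1 0 end up as exactly the same bytes.

```smalltalk
| store a b |
"Mimics the quoted storeBinaryOn: logic, using integers instead of Characters"
store := [ :value :aStream |
	value < 256
		ifTrue: [ aStream nextPut: value ]
		ifFalse: [
			"Write the value as 4 bytes, most significant byte first"
			#(-24 -16 -8 0) do: [ :shift |
				aStream nextPut: ((value bitShift: shift) bitAnd: 16rFF) ] ] ].
a := WriteStream on: (ByteArray new: 8).
b := WriteStream on: (ByteArray new: 8).
"One character with value 256"
store value: 256 value: a.
"Four characters with values 0, 0, 1, 0"
#(0 0 1 0) do: [ :each | store value: each value: b ].
a contents = b contents "true, both are #[0 0 1 0]"
```

A reader looking at those four bytes has no way to tell which of the two
was written.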

Best regards,
Igor Stasenko.
