[Pharo-project] Who broke UTF8TextConverter?
siguctua at gmail.com
Tue Aug 28 14:35:25 CEST 2012
Or was it like that from birth??
| stream |
stream := WriteStream on: (ByteArray new: 100).
UTF8TextConverter new nextPut: (Character value: 129 ) toStream: stream.
This is WRONG! RTFM about UTF-8 encoding, please! :)
Code point 129 must come out as two bytes (16rC2 16r81), not as a single byte.
nextPut: aCharacter toStream: aStream
| leadingChar nBytes mask shift ucs2code |
aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream].
In my case the stream is binary, so it goes directly to Character>>#storeBinaryOn:, whose body is:
"Store the receiver on a binary (file) stream"
value < 256
ifTrue:[aStream basicNextPut: value]
ifFalse:[aStream nextInt32Put: value].
This is not even close to UTF-8.
If the character code is less than 256, it stores a single byte (wtf?),
and otherwise it stores a 32-bit integer value in
big-endian order (wtf raisedToPower: 2).
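
For comparison, here is a minimal sketch of what a UTF-8 encoder has to do with a single code point. The block name utf8Bytes is made up for this mail and is not part of UTF8TextConverter; it also does not reject surrogates or values above 16r10FFFF, so treat it as an illustration only.

```smalltalk
| utf8Bytes |
"Hypothetical helper block, not Pharo's converter:
 answers the UTF-8 bytes for one code point."
utf8Bytes := [:code | | out |
	out := WriteStream on: (ByteArray new: 4).
	code < 16r80
		ifTrue: [out nextPut: code]
		ifFalse: [
			code < 16r800
				ifTrue: [
					"Two bytes: 110xxxxx 10xxxxxx"
					out nextPut: (16rC0 bitOr: (code bitShift: -6)).
					out nextPut: (16r80 bitOr: (code bitAnd: 16r3F))]
				ifFalse: [
					code < 16r10000
						ifTrue: [
							"Three bytes: 1110xxxx 10xxxxxx 10xxxxxx"
							out nextPut: (16rE0 bitOr: (code bitShift: -12)).
							out nextPut: (16r80 bitOr: ((code bitShift: -6) bitAnd: 16r3F)).
							out nextPut: (16r80 bitOr: (code bitAnd: 16r3F))]
						ifFalse: [
							"Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx"
							out nextPut: (16rF0 bitOr: (code bitShift: -18)).
							out nextPut: (16r80 bitOr: ((code bitShift: -12) bitAnd: 16r3F)).
							out nextPut: (16r80 bitOr: ((code bitShift: -6) bitAnd: 16r3F)).
							out nextPut: (16r80 bitOr: (code bitAnd: 16r3F))]]].
	out contents].

utf8Bytes value: 65.       "#[65]"
utf8Bytes value: 129.      "#[194 129], i.e. 16rC2 16r81"
utf8Bytes value: 16r20AC.  "#[226 130 172], the euro sign"
```

Only code points below 128 are a single byte in UTF-8; everything from 128 up needs a lead byte plus continuation bytes, which is exactly what the binary code path skips.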
I wonder what purpose this code path actually serves. This
stuff is completely useless.
According to the implementation of storeBinaryOn:,
there is no way to read the same character value back,
because it can be 1 byte or 4 bytes, and you simply cannot determine which one.
This is one of the reasons we use UTF-8 encoding, btw ;)
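
To make the ambiguity concrete, here is a small sketch that mirrors the logic quoted above. The block named store is hypothetical and uses plain nextPut: calls where the real method uses basicNextPut: and nextInt32Put:, on the assumption (stated above) that the 32-bit value is written big-endian.

```smalltalk
| store asOneChar asFourChars |
"Hypothetical block that mirrors the quoted storeBinaryOn: logic."
store := [:code :out |
	code < 256
		ifTrue: [out nextPut: code]
		ifFalse: [
			"Same bytes as a 32-bit big-endian integer."
			out nextPut: ((code bitShift: -24) bitAnd: 16rFF).
			out nextPut: ((code bitShift: -16) bitAnd: 16rFF).
			out nextPut: ((code bitShift: -8) bitAnd: 16rFF).
			out nextPut: (code bitAnd: 16rFF)]].

"One character with code 256..."
asOneChar := WriteStream on: (ByteArray new: 4).
store value: 256 value: asOneChar.

"...versus four characters with codes 0, 0, 1, 0."
asFourChars := WriteStream on: (ByteArray new: 4).
#(0 0 1 0) do: [:code | store value: code value: asFourChars].

asOneChar contents = asFourChars contents.  "true: both are #[0 0 1 0]"
```

A reader that sees #[0 0 1 0] has nothing to tell it whether that was one character or four. In UTF-8 the lead byte announces how many bytes follow, so the question never comes up.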