[Pharo-project] Issue 3347 in pharo: simplified and unified String's line-ending changing methods

pharo at googlecode.com
Sat Nov 27 18:22:20 CET 2010


Comment #1 on issue 3347 by stephane.ducasse: simplified and unified String's line-ending changing methods
http://code.google.com/p/pharo/issues/detail?id=3347

Levente Uzonyi uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-ul.410.mcz

==================== Summary ====================

Name: Collections-ul.410
Author: ul
Time: 23 November 2010, 8:24:12.434 am
UUID: a68748b5-3380-9645-8686-fc80a9710dc6
Ancestors: Collections-ul.409

- added a translation table to String for exchanging cr and lf characters
- simplified and enhanced String's #withSqueakLineEndings and #withUnixLineEndings
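
For readers skimming the summary, here is a minimal workspace sketch of what the exchange table is for. It is not part of the commit; it assumes an image with Collections-ul.410 loaded, and the variable names are made up for illustration. The table maps every byte character to itself except cr and lf, which map to each other, so a single table-driven pass swaps one line ending for the other.

```smalltalk
"Workspace sketch, assuming Collections-ul.410 is loaded.
 Evaluate each expression with 'print it'."
| table original swapped |
table := String crLfExchangeTable.

"The table is indexed by character value + 1, like the other tables in String class>>initialize."
(table at: Character cr asciiValue + 1) = Character lf.   "expected: true"
(table at: Character lf asciiValue + 1) = Character cr.   "expected: true"
(table at: $a asciiValue + 1) = $a.                       "expected: true"

"Exchange the line ending of a string that contains only lf."
original := 'one' , String lf , 'two'.
swapped := original copy translateWith: table.
swapped = ('one' , String cr , 'two').                    "expected: true"
```

The sketch translates a copy so that it does not depend on whether translateWith: modifies its receiver.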

=============== Diff against Collections-ul.409 ===============

Item was changed:
  ArrayedCollection subclass: #String
        instanceVariableNames: ''
+       classVariableNames: 'AsciiOrder CSLineEnders CSNonSeparators CSSeparators CaseInsensitiveOrder CaseSensitiveOrder CrLfExchangeTable HtmlEntities LowercasingTable Tokenish UppercasingTable'
-       classVariableNames: 'AsciiOrder CSLineEnders CSNonSeparators CSSeparators CaseInsensitiveOrder CaseSensitiveOrder HtmlEntities LowercasingTable Tokenish UppercasingTable'
        poolDictionaries: ''
        category: 'Collections-Strings'!

  !String commentStamp: '<historical>' prior: 0!
  A String is an indexed collection of Characters. Class String provides the abstract super class for ByteString (that represents an array of 8-bit Characters) and WideString (that represents an array of 32-bit characters).  In the similar manner of LargeInteger and SmallInteger, those subclasses are chosen accordingly for a string; namely as long as the system can figure out so, the String is used to represent the given string.

  Strings support a vast array of useful methods, which can best be learned by browsing and trying out examples as you find them in the code.

  Here are a few useful methods to look at...
        String match:
        String contractTo:

  String also inherits many useful methods from its hierarchy, such as
        SequenceableCollection ,
        SequenceableCollection copyReplaceAll:with:
  !

Item was added:
+ ----- Method: String class>>crLfExchangeTable (in category 'accessing') -----
+ crLfExchangeTable
+
+       ^CrLfExchangeTable!

Item was changed:
  ----- Method: String class>>initialize (in category 'initialization') -----
  initialize   "self initialize"

        | order |
        AsciiOrder := (0 to: 255) as: ByteArray.

        CaseInsensitiveOrder := AsciiOrder copy.
        ($a to: $z) do:
                [:c | CaseInsensitiveOrder at: c asciiValue + 1
                                put: (CaseInsensitiveOrder at: c asUppercase asciiValue +1)].

        "Case-sensitive compare sorts space, digits, letters, all the rest..."
        CaseSensitiveOrder := ByteArray new: 256 withAll: 255.
        order := -1.
        ' 0123456789' do:  "0..10"
                [:c | CaseSensitiveOrder at: c asciiValue + 1 put: (order := order+1)].
        ($a to: $z) do:     "11-62"
                [:c | CaseSensitiveOrder at: c asUppercase asciiValue + 1 put: (order := order+1).
                CaseSensitiveOrder at: c asciiValue + 1 put: (order := order+1)].
        1 to: CaseSensitiveOrder size do:
                [:i | (CaseSensitiveOrder at: i) = 255 ifTrue:
                        [CaseSensitiveOrder at: i put: (order := order+1)]].
        order = 255 ifFalse: [self error: 'order problem'].

        "a table for translating to lower case"
        LowercasingTable := String withAll: (Character allByteCharacters collect: [:c | c asLowercase]).

        "a table for translating to upper case"
        UppercasingTable := String withAll: (Character allByteCharacters collect: [:c | c asUppercase]).

        "a table for testing tokenish (for fast numArgs)"
        Tokenish := String withAll: (Character allByteCharacters collect:
                                                                        [:c | c tokenish ifTrue: [c] ifFalse: [$~]]).

        "CR and LF--characters that terminate a line"
        CSLineEnders := CharacterSet crlf.

        "separators and non-separators"
        CSSeparators := CharacterSet separators.
+       CSNonSeparators := CSSeparators complement.
+
+       "a table for exchanging cr with lf and vice versa"
+       CrLfExchangeTable := Character allByteCharacters collect: [ :each |
+               each
+                       caseOf: {
+                               [ Character cr ] -> [ Character lf ].
+                               [ Character lf ] -> [ Character cr ] }
+                       otherwise: [ each ] ]!
-       CSNonSeparators := CSSeparators complement.!

Item was changed:
  ----- Method: String>>withSqueakLineEndings (in category 'internet') -----
  withSqueakLineEndings
        "Assume the string is textual, and that CR, LF, and CRLF are all valid line endings.
        Replace each occurrence with a single CR."
-       | cr lf indexLF indexCR |
-       lf := Character linefeed.
-       indexLF := self indexOf: lf startingAt: 1.
-       indexLF = 0 ifTrue: [^self].
-
-       cr := Character cr.
-       indexCR := self indexOf: cr startingAt: 1.
-       indexCR = 0 ifTrue: [^self copy replaceAll: lf with: cr].

+       (self includes: Character lf) ifFalse: [ ^self ].
+       (self includes: Character cr) ifFalse: [
+               ^self translateWith: String crLfExchangeTable ].
        ^self withLineEndings: String cr!

Item was changed:
  ----- Method: String>>withUnixLineEndings (in category 'internet') -----
  withUnixLineEndings
        "Assume the string is textual, and that CR, LF, and CRLF are all valid line endings.
        Replace each occurrence with a single LF."
-       | cr lf indexLF indexCR |
-       cr := Character cr.
-       indexCR := self indexOf: cr startingAt: 1.
-       indexCR = 0 ifTrue: [^self].
-
-       lf := Character linefeed.
-       indexLF := self indexOf: lf startingAt: 1.
-       indexLF = 0 ifTrue: [^self copy replaceAll: cr with: lf].

+       (self includes: Character cr) ifFalse: [ ^self ].
+       (self includes: Character lf) ifFalse: [
+               ^self translateWith: String crLfExchangeTable ].
        ^self withLineEndings: String lf!
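
The three branches of the rewritten methods can be tried in a workspace along these lines. This is only a sketch, assuming an image with Collections-ul.410 loaded; the variable names are hypothetical, and the expected answers in the comments follow from the method source and comments above rather than from a recorded run.

```smalltalk
"Workspace sketch, assuming Collections-ul.410 is loaded.
 Evaluate each expression with 'print it'."
| crOnly lfOnly mixed |
crOnly := 'a' , String cr , 'b'.
lfOnly := 'a' , String lf , 'b'.
mixed := 'a' , String cr , String lf , 'b' , String cr , 'c'.

"Branch 1: no cr in the receiver, so the receiver itself is answered."
lfOnly withUnixLineEndings == lfOnly.      "expected: true"

"Branch 2: cr but no lf, so the exchange table does the work."
crOnly copy withUnixLineEndings = lfOnly.  "expected: true"

"Branch 3: both present, so the method falls through to withLineEndings:.
 Per the method comment, CRLF and a lone CR each become a single LF."
mixed copy withUnixLineEndings = ('a' , String lf , 'b' , String lf , 'c').
```

#withSqueakLineEndings mirrors this with cr and lf swapped. The receivers in branches 2 and 3 are copied so the sketch does not rely on whether the receiver is left untouched.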




