[Pharo-project] [Moose-dev] Re: how to deal with string position in relation to cr/crlf
toon.verwaest at gmail.com
Thu Apr 28 12:03:17 CEST 2011
> Indeed. The problem is that the token of PetitParser only knows the character position from the stream. This would mean that we would have to modify the tracking of the position with extra information.
> Is there no other option?
If what you are doing is relating it back to the original source code
... isn't the original source code stored in 1 specific format, \r, \n
or \r\n? Or do you use the models that are parsed once to map it back to
different versions of the same files on different platforms? In that
case you could always convert the input file into the format you want.
To me however it seems like it makes most sense to keep the line +
column count if you are going to keep anything yourself anyway. You do
not need to rely on what petitparser knows already, you can keep this
data yourself. Petitparser needs to have the char location since that's
where it's parsing. The line+column is metadata that you need, not PetitParser.
To implement this you just again need to keep track of all the newlines
you see. Every time you see a newline you update your newline count AND
keep track of the position where the newline happened. This way you
actually have the column count (the actual position - the location where
the last newline occurred).
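A minimal sketch of that bookkeeping, in Python rather than Smalltalk and not tied to PetitParser in any way. The function name line_and_column is hypothetical, and I am assuming 0-based character positions and 1-based line/column numbers. It counts "\r\n" as a single newline, so the line number comes out the same for all three formats:

```python
def line_and_column(text, position):
    """Return 1-based (line, column) for a 0-based character position.

    Handles "\n", "\r" and "\r\n" line endings. A "\r" that is directly
    followed by "\n" is not counted, so "\r\n" yields one newline.
    """
    line = 1
    last_newline_end = 0  # position just after the last newline seen
    for i in range(min(position, len(text))):
        ch = text[i]
        is_lone_cr = ch == "\r" and text[i + 1 : i + 2] != "\n"
        if ch == "\n" or is_lone_cr:
            line += 1
            last_newline_end = i + 1
    # column = actual position - location where the last newline ended
    return line, position - last_newline_end + 1
```

This rescans from the start on every call; a parser that tracks the count and the last newline position incrementally as it consumes input would avoid that.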
Another option I see is always parsing using a \r or \n file format by
first converting it. Then when you show the position, you will have to
check if the file is actually \r, \n or if it's rather \r\n. If it's \r
or \n then you just give back the number as is. Otherwise you walk over
the file to find out where all the newlines occur. From this you can
build an array that tells you which position ranges have to add how many
extra characters.
For example [0, 10, 15, 17, 20] if the newlines occur at [0, 9, 13, 14,
16] (always subtract 1 char of the newline since we map from 1-sized
newline to 2-sized newline). Now you can just translate your position by
looking for the highest number lower than the position. For example if
you were looking at position 15, this will map onto 14, which has index
4, so you have to do + 4 -> the real position is 19. This is just a
binary search for each position in the array of newlines, so it's
O(number of tokens * log(number of newlines in file)) to translate the
model to become architecture-dependent.
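Here is a rough sketch of that translation in Python, using the standard bisect module for the binary search. The names newline_positions and to_crlf_position are made up for illustration, and it assumes 0-based positions in text that was normalized to "\n" before parsing:

```python
from bisect import bisect_left


def newline_positions(normalized_text):
    """Positions of every newline in text normalized to 1-char newlines."""
    return [i for i, ch in enumerate(normalized_text) if ch == "\n"]


def to_crlf_position(position, newlines):
    """Map a position in the normalized text to the "\r\n" version.

    Every newline strictly before `position` adds one extra character,
    and bisect_left counts exactly those newlines in the sorted list.
    """
    return position + bisect_left(newlines, position)


# The numbers from the example above: position 15 has 4 newlines before it.
newlines = [0, 9, 13, 14, 16]
real_position = to_crlf_position(15, newlines)  # 15 + 4 = 19
```

Applied to the newline positions themselves, the same function gives [0, 10, 15, 17, 20], which is the array from the example.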
The last option is to just store both position formats in your model
directly, and figure out which file format you are mapping it back
onto. This is O(1) but requires double the data for position numbers (no
biggy I suppose); but it does require your parser to keep track of the
position info itself again. The previous option avoids that.
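As an illustration only, storing both formats could look something like the Python sketch below. DualPosition and dual_positions are hypothetical names, and a real parser would record these numbers per token while it consumes the input instead of building a list for every character:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DualPosition:
    lf: int    # offset when every newline is one character
    crlf: int  # offset when every newline is two characters


def dual_positions(normalized_text):
    """Return a DualPosition for each character of "\n"-normalized text."""
    extra = 0  # newlines seen so far, each adds one char in "\r\n" form
    result = []
    for i, ch in enumerate(normalized_text):
        result.append(DualPosition(lf=i, crlf=i + extra))
        if ch == "\n":
            extra += 1
    return result
```

Once these are stored, mapping back is just picking the field that matches the file format you are looking at.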
Hope this helps to make some sort of a decision :)