[wiki-standards] Re: Better Creole Tests?
Radomir Dopieralski
wikistandards at sheep.art.pl
Sun Jul 27 19:30:25 CEST 2008
Sun, Jul 27, 2008 at 12:39:17PM -0400:
> On Sun, Jul 27, 2008 at 5:37 AM, Radomir Dopieralski
> <wikistandards at sheep.art.pl> wrote:
> > Sat, Jul 26, 2008 at 02:02:44PM -0400:
> >> On Sat, Jul 26, 2008 at 12:40 PM, Radomir Dopieralski
> >> Unfortunately because my tokenizer uses a single regex with an OR
> >> condition for each token, the image closing token '}}' matches before
> >> the look-ahead expression. So it seems a look-ahead expression doesn't
> >> follow the precedence expected.
[...]
> > That's really straightforward too, isn't it?
> >
> > (?<!})}}(?!})
>
> Apparently not, because this doesn't work.
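<?php
// Match "}}" only when it is a run of exactly two braces,
// or "}}}" when it is the last three braces of a longer run.
preg_match_all('/(?<!})}}(?!})|}}}(?!})/', "{{{foo}}}}} {{bar}} baz", $matches);
print_r($matches);
?>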
gives output:
Array ( [0] => Array ( [0] => }}} [1] => }} ) )
which looks about right to me, but I'm not that experienced with PHP.
Of course, you have to add some basic cases for handling the beginnings and
ends of lines -- but this is left as an exercise for the reader.
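To check which of the five closing braces after "foo" each alternative actually consumes, the same call can be repeated with the standard PREG_OFFSET_CAPTURE flag, which makes preg_match_all report a byte offset next to each match. A small sketch, assuming PHP 7.1 or later for the [$token, $offset] destructuring:

```php
<?php
// Same pattern as in the example above; PREG_OFFSET_CAPTURE only adds the
// byte offset of each match, it does not change what is matched.
$subject = "{{{foo}}}}} {{bar}} baz";
preg_match_all('/(?<!})}}(?!})|}}}(?!})/', $subject, $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[0] is a [matched text, offset] pair.
foreach ($matches[0] as [$token, $offset]) {
    echo "$token at offset $offset\n";
}
```

It should print "}}} at offset 8" and "}} at offset 17": the "}}}" is the last three of the five braces at offsets 6 to 10, and the first two are left over as plain text.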
[...]
> So, I use a regex tokenizer that grabs two tokens at a time (stuff
> that doesn't match and a token). Then I use a regular state machine to
> handle the grammar.
Really, I'd advise you to use a regular expression that recognizes all
the tokens at once, because if you do it the way you describe here, you
are in fact reading the input log n times, where n is the number of
tokens. This produces O(n*log n) complexity of the parser, which is not
really too hot.
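A minimal sketch of the "all the tokens at once" approach, not the tokenizer from this thread: one combined regex is handed to preg_match_all, which scans the input once from left to right, and the plain text between tokens is recovered from the PREG_OFFSET_CAPTURE offsets. The function name tokenize and the token set ("{{{", "}}}", "{{", "}}") are hypothetical, a small subset of Creole chosen only for illustration; PHP 7.1 or later is assumed.

```php
<?php
// Hypothetical single-pass tokenizer for a tiny subset of Creole tokens.
function tokenize(string $text): array
{
    // "{{{" is listed before "{{" so the longer opener wins; the lookarounds
    // make "}}" match only a run of exactly two braces, and "}}}" only the
    // last three braces of a run of three or more.
    $pattern = '/\{\{\{|(?<!\})\}\}(?!\})|\}\}\}(?!\})|\{\{/';
    preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE);

    $tokens = [];
    $pos = 0; // byte offset just past the previous token
    foreach ($m[0] as [$tok, $offset]) {
        if ($offset > $pos) {
            // Plain text between the previous token and this one.
            $tokens[] = ['text', substr($text, $pos, $offset - $pos)];
        }
        $tokens[] = ['token', $tok];
        $pos = $offset + strlen($tok);
    }
    if ($pos < strlen($text)) {
        // Trailing plain text after the last token.
        $tokens[] = ['text', substr($text, $pos)];
    }
    return $tokens;
}

// Usage with the sample string from this thread.
print_r(tokenize("{{{foo}}}}} {{bar}} baz"));
```

Each element is a ['text', ...] or ['token', ...] pair, the same text/token interleaving described in the quoted message, so a state machine could consume the list unchanged while the input itself is only scanned once.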
> >> Of course it would be better if it could be handled with only regex
> >> but these sort of clauses are common for a tokenizer + loop model.
> > Not really, they are a sure sign of a failure at some point in the
> > development: either a failure of preparing proper grammar and parsing
> > flow for the specific language, or a failure of the parsing technique
> > used (like trying to parse a context free grammar with regexps). Of
> > course, they can sometimes simplify the parser greatly, so the ugliness
> > might be a cheap price, but you shouldn't take them so lightly.
> > In particular, they can increase the computational complexity of your
> > parser -- and even make it never stop in some cases.
> My implementation handles every case that I've managed to think up.
"Program testing can be a very effective way to show the presence of bugs,
but is hopelessly inadequate for showing their absence." -- EWD, EWD340
> Here's my sample page:
>
> http://www.ioplex.com/~miallen/CreoleTest.html
>
> And it's as fast as I think it could ever be for PHP.
>
> I'm not using regex to iteratively transform the entire input 50 times
> like some implementations are doing.
That's really nice of you and I really admire it. Honest.
> You're jumping to conclusions to ring your own bell at my expense. And
> I would care if you provided a working answer to my question but you
> didn't even do that.
I'm trying to help you, and as I'm providing answers to your questions,
you seem to be able to formulate your actual question better. If you don't
like my answers, feel free to ignore them and instead use the
documentation of the language you are using. Paper endures all.
http://pl2.php.net/manual/en/regexp.reference.php#regexp.reference.assertions
--
Radomir `The Sheep' Dopieralski <http://sheep.art.pl>
On and on until we change / Everything remains the same
On and on until we learn / On and on the wheels will turn