[wiki-standards] Free-standing URL PCRE - how not to match trailing punctuation?

Michael B Allen ioplex at gmail.com
Fri Jul 25 05:01:41 CEST 2008


On Thu, Jul 24, 2008 at 10:38 PM, Filippo A. Salustri
<salustri at ryerson.ca> wrote:
> Could you just define the regexp for a URL as a string ending with an
> alphanumeric?  Then, I should think any non-alphanum, including space,
> newline, and your 'said chars' should terminate the match.

Using such a method with the Wiki text:

  Please visit http://www.yahoo.com/index.html.

would result in:

  Please visit <a href="http://www.yahoo.com/index">http://www.yahoo.com/index</a>.html.
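
A minimal sketch of that truncation, assuming the suggestion is read as "after the host, only alphanumerics and '/' may appear, and anything else ends the match". The pattern and the linkify helper are hypothetical, and Python's re module stands in for PCRE here:

```python
import re

# Hypothetical pattern: scheme and host as in the original regex, but the
# path accepts only alphanumerics and '/', so any other character
# (including '.') terminates the match.
ALNUM_URL = re.compile(r"[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[a-zA-Z0-9/]*")


def linkify(text):
    """Wrap every match of ALNUM_URL in an anchor tag."""
    return ALNUM_URL.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), text
    )


print(linkify("Please visit http://www.yahoo.com/index.html."))
# Please visit <a href="http://www.yahoo.com/index">http://www.yahoo.com/index</a>.html.
```

The host class still allows '.', which is why "www.yahoo.com" survives while ".html" is cut off.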

Mike

> Michael B Allen wrote:
>>
>> Hi,
>>
>> The Creole 1.0 standard says:
>>
>>  Free-standing URLs should be detected and turned into links. Single
>> punctuation characters (,.?!:;"') at the end of URLs should not be
>> considered part of the URL.
>>
>> The problem is I can't seem to come up with a regex that does NOT
>> match the optional (,.?!:;"') (herein abbreviated "said chars") at the
>> end of a link.
>>
>> This is my regex:
>>
>>
>>  ([a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}0-9"!#$%&\\()+,\\./:;=?\\@\\\\^_{}~-]*)(?:[,\\.?!:;"\'](?:\\s|$))
>>
>> [Note that each backslash is escaped with an extra backslash because
>> this is a PHP string literal.]
>>
>> The problem is this bit on the end:
>>
>>  (?:[,\\.?!:;"\'](?:\\s|$))
>>
>> This matches one said char followed by white space or the end of the
>> subject string.
>>
>> This works with a URL like:
>>
>>  http://www.yahoo.com, end
>>
>> where the end of the link does not include the ','.
>>
>> But with:
>>
>>  http://www.yahoo.com end
>>
>> it doesn't match the URL at all since it doesn't have one of the said
>> chars.
>>
>> If I make the entire trailing clause optional, it never matches one of
>> the said chars, because they have already been consumed by the path
>> part of the regex. Meaning this:
>>
>>  http://www.yahoo.com,
>>
>> will include the ',' in the link because it was matched as part of the
>> path expression.
>>
>> Can someone recommend a suitable regex for this?
>>
>> Mike
>>
>
> --
> Filippo A. Salustri, Ph.D., P.Eng.
> Department of Mechanical and Industrial Engineering
> Ryerson University
> 350 Victoria St, Toronto, ON, M5B 2K3, Canada
> Tel: 416/979-5000 ext 7749
> Fax: 416/979-5265
> Email: salustri at ryerson.ca
> http://deseng.ryerson.ca/~fil/
>
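
The two failure modes described in the quoted message can be reproduced with a short Python sketch. It is an approximation under stated assumptions: Python's re module has no \p{L}, so \w stands in for the letter and digit part of the path class, and the names MANDATORY, OPTIONAL, LOOKAHEAD, STRICT_HOST and first_url are hypothetical. The last pattern is only one possible approach, not a recommendation from this thread: it makes the path lazy, replaces the trailing clause with a lookahead so the said char is checked but not consumed, and requires the host to end in an alphanumeric so a trailing '.' is not swallowed there.

```python
import re

SCHEME = r'[a-zA-Z0-9]{1,10}://'
HOST = r'[a-zA-Z0-9.-]+'
# \w approximates \p{L}0-9_ from the original PCRE character class.
PATH = r'[\w"!#$%&()+,./:;=?@\\^{}~-]*'
SAID = r'''[,.?!:;"']'''

# The original shape: one said char is required after the URL.
MANDATORY = re.compile('(' + SCHEME + HOST + PATH + ')(?:' + SAID + r'(?:\s|$))')
# The trailing clause made optional: the greedy path eats the said char.
OPTIONAL = re.compile('(' + SCHEME + HOST + PATH + ')(?:' + SAID + r'(?:\s|$))?')
# Assumed alternative: lazy path plus a lookahead for an optional said
# char followed by whitespace or the end of the subject.
STRICT_HOST = r'[a-zA-Z0-9.-]*[a-zA-Z0-9]'
LOOKAHEAD = re.compile(
    '(' + SCHEME + STRICT_HOST + PATH + '?)(?=' + SAID + r'?(?:\s|$))'
)


def first_url(pattern, text):
    """Return capture group 1 of the first match, or None if no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(first_url(MANDATORY, "http://www.yahoo.com, end"))  # http://www.yahoo.com
print(first_url(MANDATORY, "http://www.yahoo.com end"))   # None
print(first_url(OPTIONAL, "http://www.yahoo.com,"))       # http://www.yahoo.com,
print(first_url(LOOKAHEAD, "http://www.yahoo.com,"))      # http://www.yahoo.com
print(first_url(LOOKAHEAD, "Please visit http://www.yahoo.com/index.html."))
# http://www.yahoo.com/index.html
```

Because the lookahead allows at most one said char, only a single trailing punctuation character is left out of the link, which is how the Creole wording reads. A URL followed directly by some other character, such as ')', is not handled by this sketch.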



-- 
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/

