[wiki-standards] Free-standing URL PCRE - how not to match trailing punctuation?
Filippo A. Salustri
salustri at ryerson.ca
Fri Jul 25 04:38:34 CEST 2008
Could you just define the regexp for a URL as a string ending with an
alphanumeric character? Then, I should think any non-alphanumeric,
including a space, a newline, or one of your 'said chars', would
terminate the match.
Just an idea.
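A minimal sketch of that idea, written in Python rather than PHP so it can be run standalone. It is only one possible reading of "ending with an alphanumeric": a lookbehind, (?<=[a-zA-Z0-9]), forces the match to stop right after a letter or digit, and PCRE accepts the same lookbehind syntax. The names PATH_CHARS, URL_RE and find_urls are made up for illustration, the example.com URL is a placeholder, and the path class is an ASCII-only approximation of the one in the quoted regex, since Python's re module has no \p{L}.

```python
import re

# ASCII-only approximation of the path character class from the quoted
# regex (assumption: \p{L} is replaced by a-zA-Z because Python's `re`
# does not support Unicode property escapes).
PATH_CHARS = r'[a-zA-Z0-9"!#$%&()+,./:;=?@\\^_{}~-]'

# scheme://host, then any run of path characters, then a lookbehind that
# requires the last matched character to be alphanumeric. The greedy
# runs backtrack until that condition holds, so trailing punctuation is
# left outside the match.
URL_RE = re.compile(
    r'[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+'
    + PATH_CHARS + r'*'
    + r'(?<=[a-zA-Z0-9])'
)


def find_urls(text):
    """Return every free-standing URL found in text (hypothetical helper)."""
    return URL_RE.findall(text)


if __name__ == "__main__":
    samples = [
        "See http://www.yahoo.com, end",
        "See http://www.yahoo.com end",
        "See http://www.yahoo.com.",
        "Try http://example.com/a/b?x=1&y=2!",
    ]
    for sample in samples:
        print(sample, "->", find_urls(sample))
```

One consequence of taking "ends with an alphanumeric" literally: a URL such as http://deseng.ryerson.ca/~fil/ loses its trailing slash. Adding '/' to the lookbehind class is one way around that, though whether it is the right trade-off is a judgment call.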
Cheers.
Fil Salustri
Michael B Allen wrote:
> Hi,
>
> The Creole 1.0 standard says:
>
> Free-standing URLs should be detected and turned into links. Single
> punctuation characters (,.?!:;"') at the end of URLs should not be
> considered part of the URL.
>
> The problem is I can't seem to come up with a regex that does NOT
> match the optional (,.?!:;"') (herein abbreviated "said chars") at the
> end of a link.
>
> This is my regex:
>
> ([a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}0-9"!#$%&\\()+,\\./:;=?\\@\\\\^_{}~-]*)(?:[,\\.?!:;"\'](?:\\s|$))
>
> [Note that each backslash is escaped with an extra backslash because
> this is a PHP string literal.]
>
> The problem is this bit on the end:
>
> (?:[,\\.?!:;"\'](?:\\s|$))
>
> This matches one said char followed by white space or the end of the
> subject string.
>
> This works with a URL like:
>
> http://www.yahoo.com, end
>
> where the end of the link does not include the ','.
>
> But with:
>
> http://www.yahoo.com end
>
> it doesn't match the URL at all since it doesn't have one of the said chars.
>
> If I make the entire trailing clause optional, it won't match one of
> the said chars because the said chars will be matched in the path part
> of the regex. Meaning this:
>
> http://www.yahoo.com,
>
> will include the ',' in the link because it was matched as part of the
> path expression.
>
> Can someone recommend a suitable regex for this?
>
> Mike
>
--
Filippo A. Salustri, Ph.D., P.Eng.
Department of Mechanical and Industrial Engineering
Ryerson University
350 Victoria St, Toronto, ON, M5B 2K3, Canada
Tel: 416/979-5000 ext 7749
Fax: 416/979-5265
Email: salustri at ryerson.ca
http://deseng.ryerson.ca/~fil/