<div dir="ltr">This is what Oddmuse uses; it is similar to Sunir's idea:<br><br> $UrlProtocols = 'http|https|ftp|afs|news|nntp|mid|cid|mailto|wais|prospero|telnet|gopher|irc|feed';<br> $UrlProtocols .= '|file' if $NetworkFile;<br>
my $UrlChars = '[-a-zA-Z0-9/@=+$_~*.,;:?!\'"()&#%]'; # see RFC 2396<br> my $EndChars = '[-a-zA-Z0-9/@=+$_~*]'; # no punctuation at the end of the url.<br> $UrlPattern = "((?:$UrlProtocols):$UrlChars+$EndChars)";<br>
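A minimal Perl sketch that runs the Oddmuse pattern above against the test strings from Sunir's script. It assumes $NetworkFile is false (so file is not appended), declares every variable with my so it runs under use strict, and wraps the match in a hypothetical helper, first_url. Because $EndChars contains no punctuation, the regex engine backtracks out of $UrlChars+ until the final character is a non-punctuation one, which is what drops a trailing comma.<br>

```perl
use strict;
use warnings;

# Oddmuse-style URL pattern, copied from the message above.
# Assumption: $NetworkFile is false, so 'file' is not added to the protocols.
my $UrlProtocols = 'http|https|ftp|afs|news|nntp|mid|cid|mailto|wais|prospero|telnet|gopher|irc|feed';
my $UrlChars = '[-a-zA-Z0-9/@=+$_~*.,;:?!\'"()&#%]'; # see RFC 2396
my $EndChars = '[-a-zA-Z0-9/@=+$_~*]';               # no punctuation at the end
my $UrlPattern = "((?:$UrlProtocols):$UrlChars+$EndChars)";

# Hypothetical helper: return the first URL found in $text, or undef.
sub first_url {
    my ($text) = @_;
    return $text =~ /$UrlPattern/ ? $1 : undef;
}

for my $s ('http://www.yahoo.com', 'http://www.yahoo.com, end',
           'http://www.yahoo.com,', 'http://www.yahoo.com end') {
    my $url = first_url($s);
    print defined $url ? "$url\n" : "no match\n";
}
```

All four strings yield http://www.yahoo.com; text without a protocol prefix yields undef.<br>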
<br><br><div class="gmail_quote">On Fri, Jul 25, 2008 at 6:06 AM, Sunir Shah <span dir="ltr"><<a href="mailto:sunir@sunir.org">sunir@sunir.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Heya,<br>
<br>
Here is some Perl I just wrote that does the trick. The essential trick is to require<br>
the last character of the URL to be anything *except* punctuation or whitespace.<br>
<br>
my $UrlCharacter = '[-A-Za-z0-9;/?:@&=+$,_.!~*\'()%#|]'; # '-' first so it is literal, not a range<br>
my $UrlProtocols = "http|https|ftp|news|mailto|telnet|gopher"; # Alternatively, you can just use \w+<br>
my $UrlRegexp = qr<((?:$UrlProtocols):$UrlCharacter+[^,\\.?!:;"\'\s])>;<br>
<br>
my @strings = ( "<a href="http://www.yahoo.com" target="_blank">http://www.yahoo.com</a>", "<a href="http://www.yahoo.com" target="_blank">http://www.yahoo.com</a>, end",<br>
"<a href="http://www.yahoo.com" target="_blank">http://www.yahoo.com</a>,", "<a href="http://www.yahoo.com" target="_blank">http://www.yahoo.com</a> end" );<br>
<br>
foreach my $string (@strings) {<br>
print "\n$string\n";<br>
if( $string =~ /$UrlRegexp/ ) {<br>
print "$1\n";<br>
} else {<br>
print "no match\n";<br>
}<br>
}<br>
<br>
Cheers,<br>
<font color="#888888">Sunir<br>
</font><div><div></div><div class="Wj3C7c"><br>
-----Original Message-----<br>
From: <a href="mailto:wiki-standards-bounces@wikisym.org">wiki-standards-bounces@wikisym.org</a><br>
[mailto:<a href="mailto:wiki-standards-bounces@wikisym.org">wiki-standards-bounces@wikisym.org</a>] On Behalf Of Michael B Allen<br>
Sent: July 24, 2008 10:35 PM<br>
To: <a href="mailto:wiki-standards@wikisym.org">wiki-standards@wikisym.org</a><br>
Subject: [wiki-standards] Free-standing URL PCRE - how not to match trailing punctuation?<br>
<br>
Hi,<br>
<br>
The Creole 1.0 standard says:<br>
<br>
Free-standing URLs should be detected and turned into links. Single<br>
punctuation characters (,.?!:;"') at the end of URLs should not be<br>
considered part of the URL.<br>
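For comparison, a sketch of a different technique from the single-regex approaches in this thread: take the greedy URL match, then remove one trailing punctuation character in a second step, which follows the Creole wording about single punctuation characters literally. The protocol list is borrowed from Sunir's reply; the URL character set and the helper name creole_url are assumptions for illustration.<br>

```perl
use strict;
use warnings;

# Assumed URL pattern: Sunir's protocol list plus an RFC 2396-style character set.
my $url_re = qr{(?:http|https|ftp|news|mailto|telnet|gopher):[-A-Za-z0-9;/?:\@&=+\$,_.!~*'()%#]+};

# Hypothetical helper: take the greedy match, then strip exactly one
# trailing character from the set listed in the Creole 1.0 text.
sub creole_url {
    my ($text) = @_;
    return unless $text =~ /($url_re)/;
    my $url = $1;
    $url =~ s/[,.?!:;"']\z//;
    return $url;
}

for my $s ('http://www.yahoo.com, end', 'http://www.yahoo.com end', 'http://www.yahoo.com,') {
    my $url = creole_url($s);
    print "$url\n";
}
```

The trade-off: under this literal reading, http://www.yahoo.com?! keeps its '?', whereas the backtracking regexes from Sunir and Oddmuse strip every trailing punctuation character.<br>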
<br>
The problem is that I can't come up with a regex that leaves an optional trailing<br>
(,.?!:;"') character (hereafter "said chars") out of the matched link.<br>
<br>
This is my regex:<br>
<br>
<br>
([a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}0-9"!#$%&\\()+,\\./:;=?\\@\\\\^_{}~-]*)(?:[,\\.?!:;"\'](?:\\s|$))<br>
<br>
[Note that each backslash is escaped with an extra backslash because this is<br>
a PHP string literal.]<br>
<br>
The problem is this bit on the end:<br>
<br>
(?:[,\\.?!:;"\'](?:\\s|$))<br>
<br>
This matches one said char followed by white space or the end of the subject<br>
string.<br>
<br>
This works with a URL like:<br>
<br>
<a href="http://www.yahoo.com" target="_blank">http://www.yahoo.com</a>, end<br>
<br>
where the end of the link does not include the ','.<br>
<br>
But with:<br>
<br>
<a href="http://www.yahoo.com" target="_blank">http://www.yahoo.com</a> end<br>
<br>
it doesn't match the URL at all, because the URL is not followed by one of the said chars.<br>
<br>
If I make the entire trailing clause optional, it no longer excludes a trailing<br>
said char, because the path part of the regex already matches the said chars.<br>
For example, this:<br>
<br>
<a href="http://www.yahoo.com" target="_blank">http://www.yahoo.com</a>,<br>
<br>
will include the ',' in the link because it was matched as part of the path<br>
expression.<br>
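A small Perl sketch that reproduces both failure modes described above. The PHP string literal is un-escaped by hand (each doubled backslash becomes one) and split into three hypothetical pieces, $head, $path and $tail; first_capture is likewise a hypothetical helper. The first pattern requires the trailing clause, the second makes it optional.<br>

```perl
use strict;
use warnings;

# Mike's pattern with the PHP escaping removed, split into pieces for readability.
my $head = qr{[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+};
my $path = qr{[\p{L}0-9"!#\$%&()+,./:;=?\@\\^_{}~-]*};
my $tail = qr{[,.?!:;"'](?:\s|$)};

my $required_re = qr{($head$path)$tail};      # trailing clause mandatory
my $optional_re = qr{($head$path)(?:$tail)?}; # trailing clause optional

# Hypothetical helper: return capture group 1, or undef when nothing matches.
sub first_capture {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

for my $s ('http://www.yahoo.com, end', 'http://www.yahoo.com end', 'http://www.yahoo.com,') {
    for my $case (['required', $required_re], ['optional', $optional_re]) {
        my ($name, $re) = @$case;
        my $url = first_capture($re, $s);
        print "$name\t$s\t", (defined $url ? $url : 'no match'), "\n";
    }
}
```

The required variant finds no match in the second string, while the optional variant keeps the comma in the first and third, because the greedy path class has already consumed it.<br>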
<br>
Can someone recommend a suitable regex for this?<br>
<br>
Mike<br>
<br>
--<br>
Michael B Allen<br>
PHP Active Directory SPNEGO SSO<br>
<a href="http://www.ioplex.com/" target="_blank">http://www.ioplex.com/</a><br>
_______________________________________________<br>
<br>
wiki-standards mailing list. <a href="mailto:wiki-standards@wikisym.org">wiki-standards@wikisym.org</a><br>
<a href="http://www.wikisym.org/cgi-bin/mailman/listinfo/wiki-standards" target="_blank">http://www.wikisym.org/cgi-bin/mailman/listinfo/wiki-standards</a><br>
<br>
For the wiki-research, wiki-standards, wikisym-announce mailing lists,<br>
please see:<br>
<a href="http://www.wikisym.org/cgi-bin/mailman/listinfo" target="_blank">http://www.wikisym.org/cgi-bin/mailman/listinfo</a><br>
</div></div></blockquote></div><br></div>