Mailing List Archive

[Bug 6788] URL detection sometimes does not work
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6788

Kevin A. McGrail <kmcgrail@pccc.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
CC| |kmcgrail@pccc.com
Resolution|--- |DUPLICATE

--- Comment #4 from Kevin A. McGrail <kmcgrail@pccc.com> ---
I believe this is a duplicate of a bug already in the system.

*** This bug has been marked as a duplicate of bug 6751 ***

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 6788] URL detection sometimes does not work
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6788

Lemat <lemat@lemat.priv.pl> changed:

What |Removed |Added
----------------------------------------------------------------------------
Resolution|DUPLICATE |WORKSFORME

--- Comment #5 from Lemat <lemat@lemat.priv.pl> ---
This is not a duplicate of bug 6751. The dot is always 2E hex.

[Bug 6788] URL detection sometimes does not work
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6788

Kevin A. McGrail <kmcgrail@pccc.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|WORKSFORME |---

--- Comment #6 from Kevin A. McGrail <kmcgrail@pccc.com> ---
(In reply to comment #5)
> This is not a duplicate of bug 6751. The dot is always 2E hex.

Sorry about that. I read it as an alternate character being used in place of
the dot and lumped the two bugs together.

Reopening, though I tried your small fix in HTML.pm:

Index: lib/Mail/SpamAssassin/HTML.pm
===================================================================
--- lib/Mail/SpamAssassin/HTML.pm (revision 1338322)
+++ lib/Mail/SpamAssassin/HTML.pm (working copy)
@@ -240,6 +240,10 @@
# the HTML::Parser API won't do it for us
$text =~ s/<(\w+)\s*\/>/<$1>/gi;

+ # Bug 6788 https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6788
+ # we want a space after a closing tag so that URLs aren't lumped together
+ $text =~ s/>/> /g;
+
# Ignore stupid warning that can't be suppressed: 'Parsing of
# undecoded UTF-8 will give garbage when decoding entities at ..' (bug 4046)
{


This breaks html_obfu.t (5 of 9 tests fail, tests 1-5):

t/html_obfu.t 9 5 55.56% 1-5
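
To make the trade-off concrete, here is a minimal sketch in Python (not the Perl in HTML.pm) of what a blanket s/>/> /g does to rendered text. The sample strings, the TextExtractor class, and the helpers visible_text and pad_after_gt are all hypothetical, and the guess that this is what trips html_obfu.t is an assumption, not something confirmed in this bug.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only text nodes, a rough stand-in for the rendered message."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the concatenated text nodes of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def pad_after_gt(html):
    """Python equivalent of the Perl substitution s/>/> /g from the patch."""
    return re.sub(r">", "> ", html)


# Hypothetical samples: two adjacent links, and a word split by an empty tag.
two_links = (
    "<a href='http://a.example/'>http://a.example/</a>"
    "<a href='http://b.example/'>http://b.example/</a>"
)
obfuscated = "V<b></b>iagra"

print(repr(visible_text(two_links)))
print(repr(visible_text(pad_after_gt(two_links))))
print(repr(visible_text(obfuscated)))
print(repr(visible_text(pad_after_gt(obfuscated))))
```

In this sketch the padding separates the two URLs, which is the intent of the patch, but it also puts whitespace inside the word that was split by an empty tag pair, so the rendered text no longer contains "Viagra" as one token. If the obfuscation rules rely on such words staying joined, a narrower substitution may be needed.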

Thoughts?
