Mailing List Archive

[Bug 6788] URL detection sometimes does not work
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6788

Lemat <lemat@lemat.priv.pl> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |lemat@lemat.priv.pl

--- Comment #1 from Lemat <lemat@lemat.priv.pl> 2012-04-10 21:46:38 UTC ---
The dot in URL is 2e hex.

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

D. Stussy <software+spamassassin@kd6lvw.ampr.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |software+spamassassin@kd6lvw.ampr.org

--- Comment #2 from D. Stussy <software+spamassassin@kd6lvw.ampr.org> 2012-04-11 21:48:32 UTC ---
No. The dot in the URL is 2E in hex. Hexadecimal always uses CAPITAL letters.


--- Comment #3 from Lemat <lemat@lemat.priv.pl> 2012-04-23 22:45:41 UTC ---
Those headers were misleading.
The problem is the missing space at the end of the URL.
The parser is (probably) removing HTML tags and dots, commas, etc., which
produces something like this:

toppolandjob.com</b>Odp...   -> toppolandjob.comOdp...  - URI not found
toppolandjob.com,</b>Odp...  -> toppolandjob.comOdp...  - URI not found
toppolandjob.com</b> Odp...  -> toppolandjob.com Odp... - URI found
toppolandjob.com,</b> Odp... -> toppolandjob.com Odp... - URI found
toppolandjob.com </b>Odp...  -> toppolandjob.com Odp... - URI found

fix:

HTML.pm
sub parse {
...
# add a space after every '>' so the text after a tag is not glued to the URL
$text =~ s/>/> /g; # before $self->SUPER::parse(...
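
The effect described above can be reproduced with a small simulation. This is only a rough Python sketch, not SpamAssassin's actual code: the tag stripper, the hostname pattern, the tiny KNOWN_TLDS set and the function names are all assumptions made for illustration. It shows how removing a tag with no following space glues "Odp" onto "com", so the token no longer ends in a known TLD, and how padding every '>' with a space (the same idea as s/>/> /g) avoids that.

```python
import re

# Tiny illustrative TLD set (an assumption; a real detector uses a full list).
KNOWN_TLDS = {"com", "org", "net"}

# Naive tag stripper: removes anything between '<' and '>'.
TAG_RE = re.compile(r"<[^>]*>")

# Hostname-like token: one or more dotted labels, then an alphabetic run
# that is treated as the TLD candidate.
HOST_RE = re.compile(r"\b((?:[a-z0-9-]+\.)+)([a-z]+)", re.IGNORECASE)


def strip_tags(html):
    """Remove tags without inserting any replacement character."""
    return TAG_RE.sub("", html)


def find_uris(text):
    """Return hostname-like tokens whose last label is a known TLD."""
    return [m.group(0) for m in HOST_RE.finditer(text)
            if m.group(2).lower() in KNOWN_TLDS]


def find_uris_in_html(html, pad_tags=False):
    """Strip tags, then look for URIs; optionally pad every '>' first."""
    if pad_tags:
        # Python equivalent of the Perl substitution s/>/> /g
        html = html.replace(">", "> ")
    return find_uris(strip_tags(html))


if __name__ == "__main__":
    glued = "toppolandjob.com</b>Odp..."
    spaced = "toppolandjob.com</b> Odp..."
    print(find_uris_in_html(glued))                 # tag removal glues "Odp" to "com"
    print(find_uris_in_html(spaced))                # a space keeps the TLD intact
    print(find_uris_in_html(glued, pad_tags=True))  # padding restores detection
```

One thing to check before adopting the substitution: it also adds a space after any literal '>' that is not part of a tag, so the rendered text may change slightly in those places.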
