Mailing List Archive

[Bug 6781] multiple emails in From
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6781

Mark Martinec <Mark.Martinec@ijs.si> changed:

What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|Undefined |3.4.0

--- Comment #5 from Mark Martinec <Mark.Martinec@ijs.si> ---
I'm seeing these messages too, and some of them are sneaking in.

The RFC 5322 section 3.6.2 states:

If the originator of the message can be indicated by a single mailbox
and the author and transmitter are identical, the "Sender:" field
SHOULD NOT be used. Otherwise, both fields SHOULD appear.

As these spam messages currently do not have a Sender present,
it should be safe to do:

header __HAS_SENDER exists:Sender

header MULTI_FROM_ADDR From =~ /\@.*,.*\@/
describe MULTI_FROM_ADDR Multiple addresses in a From header field
score MULTI_FROM_ADDR 1

meta MULTI_FROM_BAD MULTI_FROM_ADDR && !__HAS_SENDER
describe MULTI_FROM_BAD Multiple addresses in From, but no Sender
score MULTI_FROM_BAD 6


(btw, we should be adding some of the missing '__HAS_* exists:*'
rules for completeness anyway; they come in handy with other official
or local metarules)

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 6781] multiple emails in From
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6781

--- Comment #6 from AXB <axb.lists@gmail.com> ---
(In reply to comment #5)
> I'm seeing these messages too, and some of them are sneaking in.
>
> The RFC 5322 section 3.6.2 states:
>
> If the originator of the message can be indicated by a single mailbox
> and the author and transmitter are identical, the "Sender:" field
> SHOULD NOT be used. Otherwise, both fields SHOULD appear.
>
> As these spam messages currently do not have a Sender present,
> it should be safe to do:
>
> header __HAS_SENDER exists:Sender
>
> header MULTI_FROM_ADDR From =~ /\@.*,.*\@/
> describe MULTI_FROM_ADDR Multiple addresses in a From header field
> score MULTI_FROM_ADDR 1
>
> meta MULTI_FROM_BAD MULTI_FROM_ADDR && !__HAS_SENDER
> describe MULTI_FROM_BAD Multiple addresses in From, but no Sender
> score MULTI_FROM_BAD 6
>
>
> (btw, we should be adding some of the missing '__HAS_* exists:*'
> rules for completeness anyway; they come in handy with other official
> or local metarules)

+1 to the motion for a 20_hasbase.cf

I have a collection of them and would volunteer to start adding

[Bug 6781] multiple emails in From
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6781

--- Comment #7 from Mark Martinec <Mark.Martinec@ijs.si> ---
> As these spam messages currently do not have a Sender present,
> it should be safe to do

I meant: good enough for this spam, and safe for valid mail even with
multiple authors.


> > (btw, we should be adding some of the missing '__HAS_* exists:*'
> > rules for completeness anyway; they come in handy with other official
> > or local metarules)
>
> +1 to the motion for a 20_hasbase.cf
> I have a collection of them and would volunteer to start adding

Good idea.

These would be needed for Bug 6780:

header __HAS_FROM exists:From
header __HAS_TO exists:To
header __HAS_CC exists:CC

and this one in this PR:

header __HAS_SENDER exists:Sender

Several others are scattered all over the place; it would be nice
to have them all in one place.

[Bug 6781] multiple emails in From
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6781

--- Comment #8 from AXB <axb.lists@gmail.com> ---
(In reply to comment #7)
> > As these spam messages currently do not have a Sender present,
> > it should be safe to do
>
> I meant: good enough for this spam, and safe for valid mail even with
> multiple authors.
>
>
> > > (btw, we should be adding some of the missing '__HAS_* exists:*'
> > > rules for completeness anyway; they come in handy with other official
> > > or local metarules)
> >
> > +1 to the motion for a 20_hasbase.cf
> > I have a collection of them and would volunteer to start adding
>
> Good idea.
>
> These would be needed for Bug 6780:
>
> header __HAS_FROM exists:From
> header __HAS_TO exists:To
> header __HAS_CC exists:CC
>
> and this one in this PR:
>
> header __HAS_SENDER exists:Sender
>
> Several others are scattered all over the place; it would be nice
> to have them all in one place.

Committed 10_hasbase.cf

[Bug 6781] multiple emails in From
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6781

--- Comment #9 from Mark Martinec <Mark.Martinec@ijs.si> ---
Observed 3800 messages which hit MULTI_FROM_BAD during the last four days.

Among these there were three legitimate mail messages with two addresses
in a From, and a missing Sender (a conference registration confirmation
or paper submissions). These were genuine false positives (of which one
was quarantined for exceeding a spam threshold, while the other two
were rescued by other rules).

Besides the above three, there were three additional false positives, where
my version of MULTI_FROM_ADDR misfired. These three were a result of a
B64-encoded display name in the iso-2022-jp character set, which happened
to contain bytes '@' and ',' in the b64-decoded string.

The string that was matched looked like (somewhat obfuscated):
_$B:#1xxf_(B _$B@5,_(B <xxx@example.com>
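
Illustration (not from the original comment): the misfire described
above can be reproduced outside SpamAssassin with a few lines of Python.
This is a minimal sketch under stated assumptions: the payload bytes are
synthetic, decode_b_words is a hypothetical helper that only undoes the
B encoding without any charset conversion, and the pattern is a plain
transcription of /\@.*,.*\@/ rather than SpamAssassin's own matcher.

```python
import base64
import re

# Python transcription of the MULTI_FROM_ADDR pattern /\@.*,.*\@/
MULTI_FROM_ADDR = re.compile(rb"@.*,.*@")

# Matches an RFC 2047 B-encoded word: =?charset?B?payload?=
ENCODED_B_WORD = re.compile(rb"=\?[^?]+\?[Bb]\?([^?]*)\?=")


def decode_b_words(raw: bytes) -> bytes:
    """Hypothetical helper: replace each B-encoded word with its decoded
    bytes, leaving them in the original charset (no conversion)."""
    return ENCODED_B_WORD.sub(lambda m: base64.b64decode(m.group(1)), raw)


# Synthetic ISO-2022-JP payload: ESC $ B, two-byte codes, ESC ( B.
# The two-byte codes happen to contain the bytes '@' and ','.
payload = b"\x1b$B@5,7\x1b(B"

raw_from = (
    b"=?ISO-2022-JP?B?" + base64.b64encode(payload) + b"?= <xxx@example.com>"
)
decoded_from = decode_b_words(raw_from)

# Raw form: the base64 alphabet has no '@' or ',', so only the real
# address contributes an '@' and the pattern cannot match.
print(bool(MULTI_FROM_ADDR.search(raw_from)))      # False

# Decoded form: '@' and ',' from the display name precede the real '@'.
print(bool(MULTI_FROM_ADDR.search(decoded_from)))  # True
```

The same single-address header is clean in its raw form and only looks
like a multi-address From once the display name has been decoded to
bytes.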

It is most unfortunate that the :addr modifier only returns the first
of multiple addresses (in a To, From, Cc, ...), which means it can't
be used in counting the number of e-mail addresses in a From.

It also seems wrong to do the manual (in-the-rule) parsing *after*
the QP or B decoding, so apparently the :raw form must be used,
which means having to deal with folding, comments, display names,
and a group name.
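
Illustration (not from the original comment): to show what counting
mailboxes on the undecoded header involves, here is a rough sketch that
leans on the address parser in Python's standard library instead of a
regular expression. It is not something a SpamAssassin rule can call;
count_mailboxes and multi_from_bad are hypothetical names, and the
sample headers are made up.

```python
from email.utils import getaddresses


def count_mailboxes(raw_from: str) -> int:
    """Count the addresses found in an undecoded From header value."""
    return sum(1 for _name, addr in getaddresses([raw_from]) if addr)


def multi_from_bad(raw_from: str, has_sender: bool) -> bool:
    """Mirror of the MULTI_FROM_BAD idea: several authors, no Sender."""
    return count_mailboxes(raw_from) > 1 and not has_sender


# An encoded display name stays an opaque ASCII token in the raw header.
single = "=?ISO-2022-JP?B?GyRCQDUsNxsoQg==?= <xxx@example.com>"
# A comma inside a quoted display name does not separate mailboxes.
quoted_comma = '"Doe, Jane" <jane@example.com>'
# Two real mailboxes.
two_authors = "Jane Doe <jane@example.com>, John Roe <john@example.com>"

print(count_mailboxes(single))        # 1
print(count_mailboxes(quoted_comma))  # 1
print(count_mailboxes(two_authors))   # 2
print(multi_from_bad(two_authors, has_sender=False))  # True
```

A rule written as a regular expression over the :raw header would have
to reproduce the quoting and display-name handling that the parser does
here.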

[Bug 6781] multiple emails in From
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6781

--- Comment #10 from D. Stussy <software+spamassassin@kd6lvw.ampr.org> ---
RE: Comment #9 - I must disagree:

A message with multiple from entries, no sender header, and non-spammy content
is not a "false positive" as it is an RFC violation and therefore not a valid
message.

Although spam is generally about content, I cannot accept that a malformed
message is a legitimate message. Such malformations are precisely the target
of the rule(set) that we are developing as a result of this bug report.

Now, as for the misfirings on a character-set-encoded string, that could be a
problem. Maybe we need a "decoded" function, which for non-encoded strings
will be identical to "raw", but for strings starting with "=?charset?" (an
RFC 2047 encoded word), it decodes them and performs comparisons thereafter.
