Mailing List Archive

spool format error: size
Hi there!

I get a lot of these errors:

4d  6.6K 1fhLuL-0003Kp-6q <systemd-devel-bounces@lists.freedesktop.org>
    *** spool format error: size=7871 ***
          jakob@localhost

In the logs I find:
Jul 22 23:30:22 aldebaran exim[13430]: 2018-07-22 23:30:22
1fhLvq-0003Uc-Es SA: Debug: SAEximRunCond expand returned: 'true'
Jul 22 23:30:22 aldebaran exim[13430]: 2018-07-22 23:30:22
1fhLvq-0003Uc-Es SA: Debug: check succeeded, running spamc
Jul 22 23:30:22 aldebaran spamd[14259]: spamd: connection from localhost
[::1]:56224 to port 783, fd 5
Jul 22 23:30:22 aldebaran spamd[14259]: spamd: processing message
<9132e567-2594-4796-9af6-0a0f9132c4c7@xtinp2mta4850.xt.local> for
Debian-exim:118
Jul 22 23:30:23 aldebaran spamd[14259]: spamd: clean message (-3.8/5.0)
for Debian-exim:118 in 0.9 seconds, 77991 bytes.
Jul 22 23:30:23 aldebaran spamd[14259]: spamd: result: . -3 -
DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,RCVD_IN_DNSWL_BLOCKED,RCVD_IN_RP_CERTIFIED,RCVD_IN_RP_SAFE,RDNS_NONE,T_DKIMW
Jul 22 23:30:23 aldebaran exim[13430]: 2018-07-22 23:30:23
1fhLvq-0003Uc-Es SA: Action: scanned but message isn't spam: score=-3.8
required=5.0 (scanned in 1/1 secs | Message-Id:
9132e567-2594-4796-9af6-0a0f9132
Jul 22 23:30:23 aldebaran spamd[9813]: prefork: child states: II
Jul 22 23:30:23 aldebaran exim[13430]: 2018-07-22 23:30:23
1fhLvq-0003Uc-Es <=
bounce-13_HTML-36038220-640150-6222865-771@bounce.email.XXXXXX.com
H=localhost (aldebaran.local) [127.0.0.1] P=esmtp S=80098
Jul 22 23:30:23 aldebaran fetchmail[4961]: reading message
me@my.domain.org@pop.myprovider.net:5199 of 5231 (78698
bytes), not deleted
Jul 22 23:30:23 aldebaran exim[13455]: 2018-07-22 23:30:23
1fhLvq-0003Uc-Es Format error in spool file 1fhLvq-0003Uc-Es-H: size=6808


I use btrfs as the filesystem, but scrubbing the filesystem finishes
without any errors.
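For the record, the scrub amounts to the following (the mountpoint "/"
is an assumption here):

btrfs scrub start -B /    # run the scrub in the foreground and wait
btrfs scrub status /      # afterwards: "Error summary: no errors found"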

I have exim4 4.91-5 from debian/buster.

Maybe it is a problem in exim? Or does it come from spamassassin?


Jakob


--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: spool format error: size
On 07/26/2018 12:33 PM, Jakobus Schürz via Exim-users wrote:
> I get a lot of these errors:
>
> 4d  6.6K 1fhLuL-0003Kp-6q <systemd-devel-bounces@lists.freedesktop.org>
>     *** spool format error: size=7871 ***
>           jakob@localhost


> Maybe it is a problem in exim? Or does it come from spamassassin?

Most likely, SA is modifying a file exim thinks it owns.
--
Cheers,
Jeremy


Re: spool format error: size
Hi Jakob,

I ran into this issue some weeks ago after updating to the latest Ubuntu
LTS (Exim 4.90.1-1ubuntu1). I tracked it down to sa-exim, which screws
things up as soon as more than one message is delivered over one
connection. By looking at the spool files I saw that random data was
missing or inserted.
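
In case you want to look at yours: on Debian/Ubuntu the spool files live
under /var/spool/exim4/input/, and Exim can dump them for you, e.g.

exim4 -Mvh 1fhLvq-0003Uc-Es    # show the -H header spool file
exim4 -Mvb 1fhLvq-0003Uc-Es    # show the -D body file

using a message id taken from your mailq or log output.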

On 26.07.2018 13:33, Jakobus Schürz via Exim-users wrote:
> Hi there!
>
> I get a lot of these errors:
>
> 4d  6.6K 1fhLuL-0003Kp-6q <systemd-devel-bounces@lists.freedesktop.org>
>     *** spool format error: size=7871 ***
>           jakob@localhost


Similar to what I saw on my system. Usually the random data was
inserted into the sa-exim header line that lists the hosts where
scanning was performed.


> I use btrfs as the filesystem, but scrubbing the filesystem finishes
> without any errors.

I initially suspected filesystem corruption as well, but found no evidence.

> I have exim4 4.91-5 from debian/buster.
>
> Maybe it is a problem in exim? Or does it come from spamassassin?

It is most likely the sa-exim module. I did some tests and was not able
to get it to run reliably. Limiting exim to accept only one message per
connection was not an option for my setup (too much traffic), but it
might be for yours.

I tried greylistd as an alternative, but was not very satisfied: no easy
way to depend on spamassassin scores, only time-based whitelisting,
stability issues, and it is not very robust against the alternating IP
addresses most big providers use, especially for IPv6. In the end I just
configured exim to do spam checks via spamassassin and reject messages
scoring >= 6.0. That works with acceptable spam rejection (more or less
no spam in the mailboxes) and is rock solid.
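
In outline it looks like this (a minimal sketch, not my exact config;
the Debian-exim user and the spamd address are the Debian defaults, and
the threshold is the 6.0 mentioned above):

# main section: where spamd listens
spamd_address = 127.0.0.1 783

# DATA-time ACL
acl_check_data:
  # always run spamassassin and record the score; the ":true" suffix
  # makes the condition succeed even below SA's own threshold
  warn    spam       = Debian-exim:true
          add_header = X-Spam-Score: $spam_score ($spam_bar)

  # $spam_score_int is the score times ten, so 60 means 6.0
  deny    condition  = ${if >={$spam_score_int}{60}}
          message    = rejected as spam (score $spam_score)

  accept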

hth,
Thomas

Re: spool format error: size
Thursday, 26 July 2018 at 19:41:08 CEST, exim-users--- via Exim-users wrote:
> On 26.07.2018 13:33, Jakobus Schürz via Exim-users wrote:
> > Hi there!
> >
> > I get a lot of these errors:
> >
> > 4d 6.6K 1fhLuL-0003Kp-6q <systemd-devel-bounces@lists.freedesktop.org>
> > *** spool format error: size=7871 ***
> > jakob@localhost

> I ran into this issue some weeks ago after updating to the latest Ubuntu
> LTS (Exim 4.90.1-1ubuntu1). I tracked it down to sa-exim, which screws
> things up as soon as more than one message is delivered over one
> connection. By looking at the spool files I saw that random data was
> missing or inserted.

I'm trying to figure this out so that the next Debian release can have
local_scan enabled again, but I haven't been able to reproduce it so far. I
thought it must have to do with CHUNKING and spool_wireformat, since those
are new, but the error only indicates a broken -H file, doesn't it (errno =
ERRNO_SPOOLFORMAT is set by spool_read_header())? Was body rewriting even
enabled when the breakage occurred? If it only happens when receiving more
than one message over the same connection, that would seem to suggest that
some static variable isn't reset properly, but I can't find any.

Nevertheless, can anybody please briefly explain how the body file differs
when the header file has the -spool_file_wireformat flag? The SMTP read
abstractions are a bit tricky to pierce through, and the documentation only
says that "some -D files can have an alternate format" and that users of the
local_scan API have to be aware of that. It also says "Lines are terminated
with an ASCII CRLF pair. There is no dot-stuffing (and no dot-termination)." I
see now that CRs are kept, but is there dot-stuffing otherwise? Anything else
to keep in mind?

I did find one other problem though, namely that sa-exim doesn't expect CRLF
and only strips LF when looking for the empty line that terminates the header.
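
A tolerant check would be simple enough; a minimal sketch of the idea
(not an actual patch against sa-exim, the names are made up):

#include <string.h>

/* True if `line` (one line read from the spool file, with its
   terminator still attached) is the blank line that ends the header
   section: accept both LF (traditional spool) and CRLF (wireformat). */
static int is_header_terminator(const char *line)
{
  return strcmp(line, "\n") == 0 || strcmp(line, "\r\n") == 0;
}

/* Strip one trailing CRLF or LF in place. */
static void chomp(char *line)
{
  size_t n = strlen(line);
  if (n > 0 && line[n-1] == '\n') line[--n] = '\0';
  if (n > 0 && line[n-1] == '\r') line[--n] = '\0';
}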

Thanks,
--
Magnus Holmgren holmgren@lysator.liu.se




Re: spool format error: size
On 22/04/2019 17:11, Magnus Holmgren via Exim-users wrote:
> Nevertheless, can anybody please briefly explain how the body file differs
> when the header file has the -spool_file_wireformat flag?

The spool datafile is a direct copy of what is on the wire.
Hence the name of the flag.
--
Cheers,
Jeremy


Re: spool format error: size
Hi Magnus,

I appreciate that you plan to look into this issue.

On 22.04.19 18:11, Magnus Holmgren via Exim-users wrote:
>> I ran into this issue some weeks ago after updating to the latest Ubuntu
>> LTS (Exim 4.90.1-1ubuntu1). I tracked it down to sa-exim, which screws
>> things up as soon as more than one message is delivered over one
>> connection. By looking at the spool files I saw that random data was
>> missing or inserted.

> I'm trying to figure this out so that the next Debian release can have
> local_scan enabled again, but I haven't been able to reproduce it so far. I
> thought it must have to do with CHUNKING and spool_wireformat, since those
> are new, but the error only indicates a broken -H file, doesn't it (errno =
> ERRNO_SPOOLFORMAT is set by spool_read_header())? Was body rewriting even
> enabled when the breakage occurred? If it only happens when receiving more
> than one message over the same connection, that would seem to suggest that
> some static variable isn't reset properly, but I can't find any.

spool_wireformat did not trigger it in my case (running Ubuntu 18.04
without spool_wireformat enabled). I was able to reproduce it with the
following setup: Exim 4.90.1-1ubuntu1 with sa-exim running on one host
(Ubuntu standard config with TLS enabled, sa-exim adding some headers)
acting as smarthost, and a second Exim generating mail (store some 10+
messages in the queue and trigger delivery via e.g. "exim4 -qff") using
that smarthost. As soon as more than one message was delivered via one
connection, files got corrupted (not in every delivery, but with a
chance of about 10-20%, iirc). From my tests, it seems that some random
data was written to the file (sometimes other parts of the message,
sometimes other stuff).
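
On the generating host that amounts to something like this (a sketch;
the recipient address and the message count are placeholders):

# queue a batch of test messages without delivering them immediately
for i in $(seq 1 20); do
  echo "test body $i" | exim4 -odq test@smarthost.example
done

# then force a queue run; deliveries to the smarthost will typically
# reuse one SMTP connection for several messages
exim4 -qff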

I could dig up some old backups and provide you with my config, if you
are not able to reproduce it.

Best regards,
Thomas

Re: spool format error: size
exim-users--- via Exim-users <exim-users@exim.org> (Mon 22 Apr 2019 19:57:42 CEST):

> Exim 4.90.1-1ubuntu1 with sa-exim running on one host (Ubuntu standard
> config with TLS enabled, sa-exim adding some headers) acting as
> smarthost, and a second Exim generating mail (store some 10+ messages in
> the queue and trigger delivery via e.g. "exim4 -qff") using that
> smarthost. As soon as more than one message was delivered via one
> connection, files got corrupted (not in every delivery, but with a
> chance of about 10-20%,

I had a similar issue with broken -H files, though if I understood your
issue right, only the symptoms are similar; the root cause might be
different.

But in my case the root cause was clearly a bug. (My case: multiple Exim
instances on multiple hosts used the same NFS-shared spool directory, so
two processes on different hosts could end up with the same PID and
clobber each other's hdr.<PID> temporary file.)

Maybe there is some relation (again: I'm quite convinced that it isn't
related, as the rest of the -H file handling looks OK to me; in particular
there is an fclose(hdr.<PID>) and a rename(hdr.<PID>, <MESSAGE_ID>-H), so
the next message should get a fresh hdr.<PID>, even if it has the same
PID).

Can you apply commit cb80814d1 to your Exim? (It breaks the testsuite;
there is another commit that fixes that.)

Basically it is the following change:

diff --git a/src/src/spool_out.c b/src/src/spool_out.c
index 3970206cb..a4a734a1a 100644
--- a/src/src/spool_out.c
+++ b/src/src/spool_out.c
@@ -134,8 +134,7 @@ struct stat statbuf;
 uschar * tname;
 uschar * fname;
 
-tname = spool_fname(US"input", message_subdir,
-  string_sprintf("hdr.%d", (int)getpid()), US"");
+tname = spool_fname(US"input", message_subdir, US"hdr.", message_id);
 
 if ((fd = spool_open_temp(tname)) < 0)
   return spool_write_error(where, errmsg, US"open", NULL, NULL);


Best regards from Dresden/Germany
Heiko Schlittermann
--
SCHLITTERMANN.de ---------------------------- internet & unix support -
Heiko Schlittermann, Dipl.-Ing. (TU) - {fon,fax}: +49.351.802998{1,3} -
gnupg encrypted messages are welcome --------------- key ID: F69376CE -
! key id 7CBF764A and 972EAC9F are revoked since 2015-01 ------------ -