Mailing List Archive

Replacement mbox.c (using ripmime)
>> <RANT>I'm not sure I really understand why Tomasz / Nigel seem so
>> reluctant to use ripmime / libripmime, when it seems to me to be a better
>> solution, and will soon have OLE unpacking support</RANT>
>
> If I remember correctly, the license of libripmime doesn't allow us to use it
> in libclamav.

You are kind of right, in that the "LICENSE.TXT" is unclear and the web
site says BSD license, so I mailed Paul to clarify this, and he said :-

"ripMIME is BSD licensed, so, nothing stopping them using it"

> Change to 20030831 snapshot brought about stability, so fix no
> longer needed in my case.

I'll give it a try, but seeing as 20030829 still leaked like hell (over
12Mb on 375 messages) and SEGV'ed on exactly the same e-mails that it
had before (the two I had already sent to Nigel), I was kind of losing
faith. On some days the leaking alone was enough to crash the whole
system - something had to be done.

My personal preference would be for a different structure to "clamd" so
that SEGVs and leaks can't bring the service down.

Something like :-

load_database();
sock = open_main_socket();
while (!got_sigterm)                        /* flag set by a signal handler */
{
    alarm(60);                              /* back-stop in case the loop hangs */
    check_database();                       /* reload signatures if they changed */
    FD_ZERO(&fds);
    FD_SET(sock, &fds);
    select(sock + 1, &fds, NULL, NULL, &timeout);
    if (FD_ISSET(sock, &fds))
    {
        client = accept(sock, NULL, NULL);
        if (!fork())
        {
            libripmime(tmpdir, client);     /* unpack the MIME parts into tmpdir */
            libclamscan(tmpdir);            /* scan the unpacked files */
            _exit(0);
        }
        close(client);                      /* parent: the child owns the connection */
    }
}

Forking for each incoming connection could work out expensive, but it's
essentially what clamd does already, except that clamd spawns a thread
instead of a process - in Linux the two are very similar.

With this kind of structure ripmime and clamscan can crash and leak to
their hearts' content and the service will stay up.


A more sophisticated variation would be to load the database, open the
main socket, then fork (say) 5 child processes all running a blocking
"accept"; one of the child processes (whichever happens to have the CPU
at the time) will then be given the incoming connection and scan the data.

The child could then either die (and be re-started by the master) or go
into another blocking "accept". You could then allow the child to scan,
say 10 jobs, before it dies (and is re-started by the master). This is
basically how Apache works.
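
Something concrete, in very rough C - the helpers open_main_socket(),
load_database() and scan_one_connection() are just placeholders I've made
up for illustration, not real clamd code :-

/* Sketch only - helper names are hypothetical */
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/wait.h>

#define NUM_CHILDREN    5
#define JOBS_PER_CHILD  10

extern int  open_main_socket(void);          /* bind + listen, done once */
extern void load_database(void);
extern void scan_one_connection(int client); /* ripmime unpack + clamscan */

static void child_loop(int sock)
{
    int jobs;
    for (jobs = 0; jobs < JOBS_PER_CHILD; jobs++) {
        int client = accept(sock, NULL, NULL);   /* all children block here */
        if (client < 0)
            continue;
        scan_one_connection(client);
        close(client);
    }
    _exit(0);                                    /* master sees the exit below */
}

int main(void)
{
    int i, sock = open_main_socket();
    load_database();
    for (i = 0; i < NUM_CHILDREN; i++)
        if (fork() == 0)
            child_loop(sock);
    for (;;)                                     /* re-start any child that dies */
        if (wait(NULL) > 0 && fork() == 0)
            child_loop(sock);
}

The kernel decides which blocked child gets each connection, so the
master does nothing but replace children as they wear out.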


Apache is slightly more sophisticated still, in that the Apache master
process also monitors the main request socket and if the queue is
filling up it starts more child processes (up to a MAX limit). As a further
check, if a child hasn't answered a query after more than a certain
period of time it is killed (by a SIGALRM) and the master re-starts it.

The master should also have a SIGALRM back stop, so that if it locks up,
it dies. The master would then be run through inittab so that it is
always immediately re-started.
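
The back stop in the master could be as simple as this (again just a
sketch, not real clamd code; "got_work" is an invented flag that the main
accept / dispatch loop sets on every pass, and the select() timeout needs
to be shorter than the alarm interval so an idle but healthy master still
gets round the loop) :-

#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_work;

static void watchdog(int sig)
{
    (void)sig;
    if (!got_work)
        _exit(1);         /* master is wedged: die, and inittab re-starts it */
    got_work = 0;
    alarm(60);            /* re-arm for the next interval */
}

/* in main():  signal(SIGALRM, watchdog);  alarm(60);
   then set got_work = 1 each time round the main loop */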

This would give a really bullet proof scanning service and allow for a
reasonable level of leaking / bugs in the scanning process itself.

I'd be more than happy to write and donate the code (I have most of it
already), but you'd have to wait a week or two as I'm a bit busy on
client work right now.

> It did not appear to pick up SoBig.

It's picked up over 4000 copies of SoBig sent to me so far. You may need the
"--mailbox" option and the STDIN patch to ripmime (as per my previous posting).

Like I said before, we mainly use clam to scan mail one at a time using
the milter, so not having the "--mailbox" option worked fine for me, up
to a point.


James
Re: Replacement mbox.c (using ripmime)
Hi list,

I'm currently working on Nigel's code and have found at least one source
of possible SIGSEGV; I'm now tracking memleaks. I should have more
results after the weekend.

Stay tuned.
Re: Replacement mbox.c (using ripmime)
2003-09-05T04:41:22 James Stevens:
> Forking for each incoming connection could work out expensive,
> [...]

[...] but not badly so at all on Linux. Other OSes aren't so
swift; I can damn near fork (and context switch) faster than
Solaris:-).

> A more sophisticated variation would be to load the database, open the
> main socket, then fork (say) 5 child processes all running a blocking
> "accept"; one of the child processes (whichever happens to have the CPU
> at the time) will then be given the incoming connection and scan the data.

I've done this, and it works out _magnificently_ for email content
scanning. An email content scanner is sandwiched between two bits of
MTA, so the MTA gets to have the Big Picture control over
concurrency management.

In my code, the master binds, then forks off N children (with quick
naps between, to keep from killing the system), then goes to sleep,
waiting for a child to exit. On healthy OSes the children just jump
right into accept on the sockets, letting the OS dispatch
connections to children as it wishes. On sick, sick platforms this
produces errors, so the children dispatch off a semaphore so that
only one child is attempting to accept at a time.
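
In C-flavoured pseudocode the serialised version looks roughly like the
following (not my actual code, which is perl; handle_connection() and the
rest are invented names; setup_sem() is run once in the master before it
forks). A process-shared POSIX semaphore lets only one child sit in
accept() at a time:

#include <semaphore.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

extern void handle_connection(int client);     /* hypothetical scanner hook */

static sem_t *accept_sem;                       /* lives in shared memory */

static void setup_sem(void)                     /* run in the master, pre-fork */
{
    accept_sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    sem_init(accept_sem, 1, 1);                 /* pshared = 1, one token */
}

static void child_loop(int sock)
{
    for (;;) {
        sem_wait(accept_sem);                   /* take the token */
        int client = accept(sock, NULL, NULL);
        sem_post(accept_sem);                   /* give it back before the slow work */
        if (client >= 0) {
            handle_connection(client);
            close(client);
        }
    }
}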

> The child could then either die (and be re-started by the master) or go
> into another blocking "accept". You could then allow the child to scan,
> say 10 jobs, before it dies (and is re-started by the master). This is
> basically how Apache works.

I had a configurable minjobs and maxjobs (defaults 100 and 200,
sounds like clamd might want to start a little lower:-), and each
child rolled a random number uniformly between those two and
serviced just that many before exiting; this schmeared the child
exits out over plenty of time so the master didn't suddenly find all
its children gone and service stalled until it could re-fork 'em.
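
The lifetime roll itself is only a couple of lines; with the same caveat
that the names are invented, the bounded version of a child's loop is
roughly:

#include <stdlib.h>
#include <time.h>
#include <unistd.h>

extern void serve_one_request(int sock);        /* hypothetical: accept + scan one job */

/* run in each child after the fork: pick a lifetime between minjobs and
   maxjobs so the children don't all exit at the same moment; seed per
   child, or every child rolls the same number */
static void child_bounded_loop(int sock, int minjobs, int maxjobs)
{
    int jobs;
    srand(getpid() ^ time(NULL));
    jobs = minjobs + rand() % (maxjobs - minjobs + 1);
    while (jobs-- > 0)
        serve_one_request(sock);
    _exit(0);                                   /* the master re-forks a replacement */
}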

> Apache is slightly more sophisticated still, [...]

Indeed, but it's solving a harder problem, adapting gracefully to
the fractally chaotic load a public webserver gets. An email content
analyzer can be presented a far, far better conditioned load by its
surrounding MTA.

> The master should also have a SIGALRM back stop, so that if it
> locks up, it dies. The master would then be run through inittab so
> that it is always immediately re-started.

That far I don't go; if a simple networking parent can't remain
stable and alive, I'll hunt it down and fix it. Or delete it.

This reminds me of djb's daemontools, where absolutely rock-solid
daemons like dnscache and tinydns are run under a respawner that's
run under an init-replacement respawner that's run under init to
make sure it's respawned as necessary.... Thanks anyway, I run my
djbdns components out of init scripts:-).

> This would give a really bullet proof scanning service and allow
> for a reasonable level of leaking / bugs in the scanning process
> itself.

Arranging to have the mime-hacker process a bounded number of jobs
before exiting, and having crashes in it not deny the whole service,
is definitely appropriate; MIME parsing is an impossible job to do
completely correctly, and is a fiendishly difficult job to do even
usefully competently. MIME is blecherous.

I'm less excited by massive efforts to carefully arrange for the
networking parent to be supervised and monitored and restarted if
necessary, and for the supervisor that monitors that process to be
so monitored, etc. If the parent process that bound the socket and
forks the children should die, my MTA monitoring will set off alarms
(that's only one of a class of possible environmental problems that
could give it constipation), and I'll figure out what happened and
fix it.

Oh, and about my code: if anybody wants it for anything you're
welcome to it, <URL:http://bent.latency.net/smtpprox/>, but as it's
an SMTP proxy written in perl it probably isn't directly useful to
clamd developers; the above description likely has all the goodies
you'd be able to get out of the perl.

-Bennett
Re: Memory leaks (was: Replacement mbox.c (using ripmime))
Thomas Lamy wrote:
> Hi list,
>
> I'm currently working on Nigel's code and have found at least one source
> of possible SIGSEGV; I'm now tracking memleaks. I should have more
> results after the weekend.
>
> Stay tuned.
>
I've submitted a couple of fixes to Nigel/Tomasz. The modified mbox code
seems stable now (checked with 200MB of mails), fixes a bug where
certain attachments would not be extracted/checked, and no longer leaks.
I s'pose they're currently in review.

From further investigations, at least unrarlib leaks like hell, and
fails to unrar some archives (but _not_ 3.0 ones). I'm on that issue now.


Thomas