Mailing List Archive

CNAME lookup failure started
Hi All,
For the past two weeks we have been getting CNAME lookup failures in the logs,
but this time it is a bit strange. We are using dnscache.
A "dnsqr any domain" query times out, *but* when I run "dnsmx domain" it
returns the servers, and after that dnsqr starts to work.
Here are some examples of the domains that I tried:
# dnsqr any IX.NETCOM.COM
255 ix.netcom.com:
temporary failure
Here is the dnscache/current during the lookup:
2010-10-07 10:05:51.846519500 query 5697532 aa3d1433:2385:e08d 255 ix.netcom.com.
2010-10-07 10:05:51.917603500 tx 0 255 ix.netcom.com. netcom.com. cf45bcc8 cf45bcc9
2010-10-07 10:06:03.932202500 query 5697580 aa3d1433:6004:506a 255 ix.netcom.com.
2010-10-07 10:06:03.932378500 tx 0 255 ix.netcom.com. netcom.com. cf45bcc9 cf45bcc8
2010-10-07 10:06:48.022789500 query 5697663 aa3d1433:1dc3:9009 255 ix.netcom.com.
2010-10-07 10:06:48.022916500 tx 0 255 ix.netcom.com. netcom.com. cf45bcc9 cf45bcc8
2010-10-07 10:07:52.093749500 servfail ix.netcom.com. input/output error
2010-10-07 10:08:04.123374500 servfail ix.netcom.com. input/output error
2010-10-07 10:08:48.203857500 servfail ix.netcom.com. input/output error
Now I do dnsqr mx ix.netcom.com and I get:
15 ix.netcom.com:
124 bytes, 1+4+0+0 records, response, noerror
query: 15 ix.netcom.com
answer: ix.netcom.com 300 MX 10 mx2.earthlink.net
answer: ix.netcom.com 300 MX 10 mx3.earthlink.net
answer: ix.netcom.com 300 MX 10 mx4.earthlink.net
answer: ix.netcom.com 300 MX 10 mx1.earthlink.net
Then I do dnsqr any ix.netcom.com and this time I am ok:
255 ix.netcom.com:
87 bytes, 1+2+0+0 records, response, noerror
query: 255 ix.netcom.com
answer: ix.netcom.com 1795 NS scratchy.earthlink.net
answer: ix.netcom.com 1795 NS itchy.earthlink.net
and the log file contains:
2010-10-07 10:12:04.991860500 query 5698245 aa3d1433:649d:6b74 255 ix.netcom.com.
2010-10-07 10:12:04.992004500 cached ns netcom.com. hearsay.earthlink.net.
2010-10-07 10:12:04.992006500 cached ns netcom.com. speakeasy.earthlink.net.
2010-10-07 10:12:04.992010500 tx 0 255 ix.netcom.com. netcom.com. cf45bcc9 cf45bcc8
2010-10-07 10:12:12.294680500 query 5698281 aa3d1433:7719:7637 15 ix.netcom.com.
2010-10-07 10:12:12.294765500 cached ns netcom.com. hearsay.earthlink.net.
2010-10-07 10:12:12.294768500 cached ns netcom.com. speakeasy.earthlink.net.
2010-10-07 10:12:12.294773500 tx 0 15 ix.netcom.com. netcom.com. cf45bcc9 cf45bcc8
2010-10-07 10:12:12.341771500 rr cf45bcc9 1800 ns ix.netcom.com. itchy.earthlink.net.
2010-10-07 10:12:12.341890500 rr cf45bcc9 1800 ns ix.netcom.com. scratchy.earthlink.net.
2010-10-07 10:12:12.341897500 tx 0 15 ix.netcom.com. ix.netcom.com. cf45bcc5 cf45bcc4
2010-10-07 10:12:12.367292500 rr cf45bcc5 1800 ns ix.netcom.com. scratchy.earthlink.net.
2010-10-07 10:12:12.367349500 rr cf45bcc5 1800 ns ix.netcom.com. itchy.earthlink.net.
2010-10-07 10:12:12.367352500 rr cf45bcc5 300 mx ix.netcom.com. 10 mx2.earthlink.net.
2010-10-07 10:12:12.367354500 rr cf45bcc5 300 mx ix.netcom.com. 10 mx3.earthlink.net.
2010-10-07 10:12:12.367356500 rr cf45bcc5 300 mx ix.netcom.com. 10 mx4.earthlink.net.
2010-10-07 10:12:12.367358500 rr cf45bcc5 300 mx ix.netcom.com. 10 mx1.earthlink.net.
2010-10-07 10:12:17.382496500 query 5698283 aa3d1433:f8b3:7903 255 ix.netcom.com.
2010-10-07 10:12:17.382574500 cached 2 ix.netcom.com.
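A note for reading these logs: dnscache prints IPv4 addresses as eight hex digits, so "cf45bcc8" in a tx line is the server 207.69.188.200. The helper below is only a small sketch for decoding such fields by hand; the function name log_hex_to_ipv4 is made up for this example and is not part of djbdns.

```c
#include <stdio.h>
#include <string.h>

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: convert an 8-digit hex address as it appears in
 * the dnscache log (e.g. "cf45bcc8") into dotted-quad text.
 * `out` needs room for at least 16 bytes.
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
int log_hex_to_ipv4(const char *hex, char *out, size_t outlen)
{
    unsigned int octet[4];
    int i;

    if (strlen(hex) != 8 || outlen < 16)
        return -1;

    for (i = 0; i < 4; i++) {
        int hi = hex_digit_value(hex[2 * i]);
        int lo = hex_digit_value(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        octet[i] = (unsigned int)(hi * 16 + lo);
    }

    snprintf(out, outlen, "%u.%u.%u.%u",
             octet[0], octet[1], octet[2], octet[3]);
    return 0;
}
```

Applied to the log above, cf45bcc8 and cf45bcc9 decode to 207.69.188.200 and 207.69.188.201, and the client field aa3d1433 decodes to 170.61.20.51.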
Some other domains with the same situation are:
NORTHCAROLINA.USA.COM
OIT.STATE.NJ.US
SE.EY.ORG this site might actually have a problem.

Can someone shed some light on this problem? Is it our problem or the domain's?
Why did this problem just start happening?
Thank you all.
Re: CNAME lookup failure started
Hi,



On Thursday, 07.10.2010, at 10:31 -0400, Vahid Moghaddasi wrote:
>
> Hi All,
> For the past two weeks we have started getting CNAME failure in the
> logs but this time is a bit strange. We are using dnscache.
> The "dnsqr any domain" will time out *but* when I do "dnsmx domain" it
> returns the servers and after that dnsqr will start to work.
> Here are some example of the domains that I tried:
> # dnsqr any IX.NETCOM.COM
> 255 ix.netcom.com:
> temporary failure

I don't get that failure here:

dnsqr any IX.NETCOM.COM
255 ix.netcom.com:
188 bytes, 1+4+0+4 records, response, noerror
query: 255 ix.netcom.com
answer: ix.netcom.com 266 MX 10 mx2.earthlink.net
answer: ix.netcom.com 266 MX 10 mx3.earthlink.net
answer: ix.netcom.com 266 MX 10 mx4.earthlink.net
answer: ix.netcom.com 266 MX 10 mx1.earthlink.net
additional: mx3.earthlink.net 521 A 209.86.93.228
additional: mx4.earthlink.net 566 A 209.86.93.229
additional: mx1.earthlink.net 335 A 209.86.93.226
additional: mx2.earthlink.net 471 A 209.86.93.227

However, I use a patched version of djbdns that allows 'big' UDP packets
(common these days). I suggest increasing the size of 'udpbuf' in
dns_transmit.c to 1200 bytes:

int dns_transmit_get(struct dns_transmit *d,const iopause_fd *x,const struct taia *when)
{
char udpbuf[1200];
unsigned char ch;
int r;
int fd;

Remember: The original maximum size for UDP packets was due to the IPv4
MTU constraints, roughly 500 bytes. Current networks -- able to
transmit IPv6 packets -- need to support a minimum MTU of 1280 bytes.

Change that value and recompile. This changed value will now be the
standard maximum size of all djbdns traffic.

regards.
--eh.

--
Dr. Erwin Hoffmann | FEHCom | http://www.fehcom.de
Re: CNAME lookup failure started
Thus said Vahid Moghaddasi on Thu, 07 Oct 2010 10:31:02 EDT:

> # dnsqr any IX.NETCOM.COM
> 255 ix.netcom.com:
> temporary failure

This problem is not specific to djbdns. I get the same problem when
trying to resolve this domain querying against a BIND server using dig.

> Can someone shed a light on this problem, is it our problem or the
> domain's?

This is the domain's problem. Even if it is a UDP byte size issue that
is the cause, the standard still says that the limit on the size of a
UDP based DNS datagram is ``512 octets or less.'' It also says:

Messages carried by UDP are restricted to 512 bytes (not counting the
IP or UDP headers). Longer messages are truncated and the TC bit is
set in the header.

It doesn't matter what the reason for the restriction on the size of the
packet was. All that matters is that the standard says it is to be
truncated at 512 bytes. Any server that sends anything else (unless the
client is using extended protocols like EDNS0) is in violation of this
standard and should expect problems.
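To make the rule concrete, here is a minimal sketch of how a client could classify a UDP reply against the 512-byte limit and the TC bit. It assumes the standard 12-byte DNS header, where TC is bit 0x02 of the third byte. The names classify_udp_reply and the REPLY_* values are invented for this example and do not come from djbdns or BIND.

```c
#include <stddef.h>

#define DNS_HEADER_LEN   12   /* fixed DNS header size */
#define DNS_UDP_PAYLOAD 512   /* classic limit when the client did not use EDNS0 */
#define DNS_FLAG_TC    0x02   /* TC bit in byte 2 of the header */

/* Hypothetical result codes for this sketch. */
enum udp_reply_class {
    REPLY_OK = 0,      /* complete answer, usable as-is */
    REPLY_TRUNCATED,   /* TC set: the client should retry over TCP */
    REPLY_OVERSIZED,   /* more than 512 bytes although EDNS0 was not used */
    REPLY_MALFORMED    /* too short to contain a DNS header */
};

/* Classify a raw UDP DNS reply of `len` bytes.
 * `sent_edns0` is nonzero if the query advertised a larger buffer. */
enum udp_reply_class classify_udp_reply(const unsigned char *msg,
                                        size_t len, int sent_edns0)
{
    if (len < DNS_HEADER_LEN)
        return REPLY_MALFORMED;
    if (!sent_edns0 && len > DNS_UDP_PAYLOAD)
        return REPLY_OVERSIZED;
    if (msg[2] & DNS_FLAG_TC)
        return REPLY_TRUNCATED;
    return REPLY_OK;
}
```

A REPLY_TRUNCATED result only helps if the authoritative server also answers on TCP; if it does not, the retry times out and the lookup ends as a temporary failure.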


As I mentioned earlier, djbdns is not the only DNS tool having problems
resolving these domains. Please run tests against them using other tools
like:

http://www.squish.net/dnscheck/

this one gave the following results for ANY ix.netcom.com:

50.0% of queries will end in failure at 207.69.188.201 (hearsay.earthlink.net) - query timed out

50.0% of queries will end in failure at 207.69.188.200 (speakeasy.earthlink.net) - query timed out

I see similar problems with the other domains you listed.
northcarolina.usa.com has major looping and nested query problems.
oit.state.nj.us also ends in major failures due to timeouts and
nameserver loops. se.ey.org also has a bunch of problems. You can verify
these problems yourself using the tool listed above (and others
previously mentioned on the list).

Is it any surprise that bind/dnscache is also having problems?

Andy
Re: CNAME lookup failure started
The reason some of the domains fail is that ANY lookups go over 512 bytes, but the authoritative server sometimes doesn't listen on TCP. Technically, the ANY lookup is not required anymore. The only reason qmail does an ANY lookup is to work around a bug with CNAME lookups in bind 4.91. Today that is not needed: you can trivially replace the ANY lookup with a CNAME lookup to resolve the domain if it's a CNAME, followed by an MX lookup.

Actually, I think a direct MX lookup is sufficient, as resolvers are supposed to resolve CNAMEs if the domain is a CNAME and then do the MX lookup.

-bhasker

On Oct 7, 2010, at 22:02, "Andy Bradford" <amb-sendok-1289105935.inipgeeiliidcaknkgof@bradfords.org> wrote:

> Thus said Vahid Moghaddasi on Thu, 07 Oct 2010 10:31:02 EDT:
>
>> # dnsqr any IX.NETCOM.COM
>> 255 ix.netcom.com:
>> temporary failure
>
> This problem is not specific to djbdns. I get the same problem when
> trying to resolve this domain querying against a BIND server using dig.
>
>> Can someone shed a light on this problem, is it our problem or the
>> domain's?
>
> This is the domain's problem. Even if it is a UDP byte size issue that
> is the cause, the standard still says that the limit on the size of a
> UDP based DNS datagram is ``512 octets or less.'' It also says:
>
> Messages carried by UDP are restricted to 512 bytes (not counting the
> IP or UDP headers). Longer messages are truncated and the TC bit is
> set in the header.
>
> It doesn't matter what the reason for the restriction on the size of the
> packet was. All that matters is that the standard says it is to be
> truncated at 512 bytes. Any server that sends anything else (unless the
> client is using extended protocols like EDNS0) is in violation of this
> standard and should expect problems.
>
>
> As I mentioned earlier, djbdns is not the only DNS tool having problems
> resolving these domains. Please run tests against them using other tools
> like:
>
> http://www.squish.net/dnscheck/
>
> this one gave the following results for ANY ix.netcom.com:
>
> 50.0% of queries will end in failure at 207.69.188.201 (hearsay.earthlink.net) - query timed out
>
> 50.0% of queries will end in failure at 207.69.188.200 (speakeasy.earthlink.net) - query timed out
>
> I see similar problems with the other domains you listed.
> northcarolina.usa.com has major looping and nested query problems.
> oit.state.nj.us also ends in major failures due to timeouts and
> nameserver loops. se.ey.org also has a bunch of problems. You can verify
> these problems yourself using the tool listed above (and others
> previously mentioned on the list).
>
> Is it any surprise that bind/dnscache is also having problems?
>
> Andy
>
Re: CNAME lookup failure started
On Thu, Oct 7, 2010 at 2:28 PM, Erwin Hoffmann <feh@fehcom.de> wrote:

> However, I use a patched version of djbdns allowing 'big' UDP packets
> (common these days). I suggest to enhance the value of 'udpbuf' in
> dns_transmit.c to 1200 byte:
>
> int dns_transmit_get(struct dns_transmit *d,const iopause_fd *x,const
> struct taia *when)
> {
> char udpbuf[1200];
> unsigned char ch;
> int r;
> int fd;
>
> Remember: The original maximum size for UDP packets was due to the IPv4
> MTU constraints, roughly about 500 byte. Current networks -- able to
> transmit IPv6 packets -- need to support a minimum MTU of 1280 byte.
>
> Change that value and recompile. This changed value will now be the
> standard maximum size of all djbdns traffic.
>
> regards.
> --eh.
>
> --
> Dr. Erwin Hoffmann | FEHCom | http://www.fehcom.de
>

FYI, I merged a similar fix (patch from Matthew Dempsky) into zinq-djbdns a
while back:

http://zinq.svn.sourceforge.net/viewvc/zinq/dns/trunk/dns_transmit.c?r1=51&r2=66
http://marc.info/?l=djbdns&m=122368590802063&w=2
Re: CNAME lookup failure started
On Mon, Oct 11, 2010 at 6:04 PM, Mark Johnson <johnsonm@gmail.com> wrote:

> On Thu, Oct 7, 2010 at 2:28 PM, Erwin Hoffmann <feh@fehcom.de> wrote:
>
>> However, I use a patched version of djbdns allowing 'big' UDP packets
>> (common these days). I suggest to enhance the value of 'udpbuf' in
>> dns_transmit.c to 1200 byte:
>>
>> int dns_transmit_get(struct dns_transmit *d,const iopause_fd *x,const
>> struct taia *when)
>> {
>> char udpbuf[1200];
>> unsigned char ch;
>> int r;
>> int fd;
>>
>> Remember: The original maximum size for UDP packets was due to the IPv4
>> MTU constraints, roughly about 500 byte. Current networks -- able to
>> transmit IPv6 packets -- need to support a minimum MTU of 1280 byte.
>>
>> Change that value and recompile. This changed value will now be the
>> standard maximum size of all djbdns traffic.
>>
>> regards.
>> --eh.
>>
>> --
>> Dr. Erwin Hoffmann | FEHCom | http://www.fehcom.de
>>
>
> FYI, I merged a similar fix (patch from Matthew Dempsky) into zinq-djbdns a
> while back:
>
>
> http://zinq.svn.sourceforge.net/viewvc/zinq/dns/trunk/dns_transmit.c?r1=51&r2=66
> http://marc.info/?l=djbdns&m=122368590802063&w=2
>


I have all the necessary patches for oversized UDP packets, but what I am not
clear on is why everyone thinks this is a djbdns problem.
I tried this on several machines and get the same result, regardless of the DNS
server.
dig -t any IX.NETCOM.COM will hang, but when you do dig -t mx IX.NETCOM.COM
everything is OK for a while, and dig -t any IX.NETCOM.COM no longer hangs
for a few hours.
Re: CNAME lookup failure started
Hi All,

> I have all the necessary patches for oversised UDP packets but what I am not
> clear is why everyone is thinking this is a DJBdns problem?
> I tried this on any machine and get the same result, regardless of the DNS
> server.
> dig -t any IX.NETCOM.COM will hang but when you do dig -t mx IX.NETCOM.COM
> everything is ok a while and dig -t any IX.NETCOM.COM will no longer hangs
> for a few hours.

I have noticed the same CNAME deferral problem crop up for
ix.netcom.com (and a few other domains) lately. I already have the
buffer size patch installed and that's not the source of the problem.
The problem appears to have something to do with the DNS servers for
the domains causing problems. The qmail dns.c code uses T_ANY in the
dns_cname() function. Change that to a T_CNAME and life gets better.
I presume there was a good reason for using type T_ANY in the
dns_cname() function instead of T_CNAME. Can someone explain?
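For reference, the only on-the-wire difference between the two lookups is the 16-bit QTYPE field at the end of the question: 255 for ANY, 5 for CNAME, 15 for MX (the same 255 and 15 that show up in dnscache query logs). The sketch below builds a bare query packet to show that; build_dns_query and the QTYPE_* names are made up for this example and are not qmail or resolver library code.

```c
#include <stddef.h>
#include <string.h>

/* Standard DNS type numbers (these match T_CNAME, T_MX and T_ANY). */
enum { QTYPE_CNAME = 5, QTYPE_MX = 15, QTYPE_ANY = 255 };

/* Hypothetical helper: write a one-question DNS query for `name` with
 * the given `qtype` into `buf`. Returns the packet length, or 0 if the
 * name is malformed or the buffer is too small. The 255-byte limit on
 * the whole name is not checked in this sketch. */
size_t build_dns_query(unsigned char *buf, size_t buflen,
                       unsigned int id, const char *name,
                       unsigned int qtype)
{
    size_t pos = 12;
    const char *label = name;

    if (buflen < 12)
        return 0;
    memset(buf, 0, 12);
    buf[0] = (unsigned char)((id >> 8) & 0xff);
    buf[1] = (unsigned char)(id & 0xff);
    buf[2] = 0x01;                       /* RD: ask for recursion */
    buf[5] = 1;                          /* QDCOUNT = 1 */

    /* Encode each dot-separated label as <length><bytes>. */
    while (*label) {
        const char *dot = strchr(label, '.');
        size_t n = dot ? (size_t)(dot - label) : strlen(label);
        if (n == 0 || n > 63 || pos + 1 + n > buflen)
            return 0;
        buf[pos++] = (unsigned char)n;
        memcpy(buf + pos, label, n);
        pos += n;
        label += n;
        if (*label == '.')
            label++;
    }

    if (pos + 5 > buflen)
        return 0;
    buf[pos++] = 0;                      /* root label ends the name */
    buf[pos++] = (unsigned char)((qtype >> 8) & 0xff);
    buf[pos++] = (unsigned char)(qtype & 0xff);
    buf[pos++] = 0;                      /* QCLASS = IN (1) */
    buf[pos++] = 1;
    return pos;
}
```

Calling it with QTYPE_ANY and then QTYPE_CNAME for the same name produces packets that differ in that single field. The size problem comes from the reply: an ANY answer can carry every record set at the name, while a CNAME answer carries at most the alias.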

- David
Re: CNAME lookup failure started
Reply Inlined.
David I. Bell wrote:
> Hi All,
>
>
> I have noticed the same CNAME deferral problem crop up for
> ix.netcom.com (and a few other domains) lately. I already have the
> buffer size patch installed and that's not the source of the problem.
> The problem appears to have something to do with the DNS servers for
> the domains causing problems. The qmail dns.c code uses T_ANY in the
> dns_cname() function. Change that to a T_CNAME and life gets better.
> I presume there was a good reason for using type T_ANY in the
> dns_cname() function instead of T_CNAME. Can someone explain?
>
> - David

I mailed DJB about this exact same problem a few months back, and
here is his response:


Hi Bhasker,

Back in the 1990s there were many sites relying on the following feature
of the SMTP infrastructure:

If you set up www.your.site with a CNAME for your.site, mail to
www.your.site will automatically be accepted by your.site's mailer.

This feature was implemented by SMTP clients: the client would see the
CNAME record for www.your.site and rewrite www.your.site as your.site in
SMTP. This wasn't in the RFCs---I'm pretty sure that it started with
Eric Allman misinterpreting a stupid side comment in the RFCs---but new
clients such as qmail had to do the same thing for interoperability.

Implementors discussing this in the late 1990s agreed that it would be
good to drop this feature, eliminating all special knowledge of CNAMEs
from clients and telling servers to take care of themselves. I hate to
break the mail system, so I advocated a two-step transition with a gap
in time between

(1) warning clients to stop relying on the feature and
(2) turning the feature off.

Other people---including the RFC 2821 author---advocated simply turning
the feature off, mail delivery be damned.

I don't know who was the first to actually turn the feature off. I'm
sure that there aren't any sites relying on the feature now. It's safe
to simply skip the CNAME lookup: i.e., have dns_cname simply return 0.

---Dan
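As a rough illustration of that last suggestion, the change amounts to turning the function into a no-op that always reports "no alias". The sketch below is not the real qmail source: struct name_buf stands in for qmail's stralloc, and dns_cname_skip is a made-up name so the example compiles on its own. It assumes that a return value of 0 is what callers treat as "the name is not a CNAME, use it unchanged".

```c
/* Hypothetical stand-in for qmail's stralloc (string pointer plus length). */
struct name_buf {
    char *s;
    unsigned int len;
};

/* Sketch of a dns_cname() replacement that skips the lookup entirely.
 * It sends no DNS query and leaves the host name untouched.
 * Returning 0 is assumed to mean "no CNAME found". */
int dns_cname_skip(struct name_buf *host)
{
    (void)host;   /* deliberately unused: nothing is rewritten */
    return 0;
}
```

With no query sent at this step, an oversized or unanswered ANY reply can no longer cause a temporary CNAME lookup failure; the MX lookup that follows is unaffected.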
Re: CNAME lookup failure started
On Wed, Oct 13, 2010 at 10:56 AM, David I. Bell <dibl283@gmail.com> wrote:

>
> I have noticed the same CNAME deferral problem crop up for
> ix.netcom.com (and a few other domains) lately. I already have the
> buffer size patch installed and that's not the source of the problem.
> The problem appears to have something to do with the DNS servers for
> the domains causing problems. The qmail dns.c code uses T_ANY in the
> dns_cname() function. Change that to a T_CNAME and life gets better.
> I presume there was a good reason for using type T_ANY in the
> dns_cname() function instead of T_CNAME. Can someone explain?
>
> - David
>

I only see one place in dns.c with T_ANY (line 214) and one with T_CNAME (line
220). So changing T_ANY would not break any part of qmail? I guess you have had
it that way for a while?