Mailing List Archive

dns and software, was Re: Reliable Cloud host ?
On Mon, 27 Feb 2012, William Herrin wrote:

> In some cases this is because of carelessness: The application does a
> gethostbyname once when it starts, grabs the first IP address in the
> list and retains it indefinitely. The gethostbyname function doesn't
> even pass the TTL to the application. Ntpd is/used to be one of the
> notable offenders, continuing to poll the dead address for years after
> the server moved.

While yes it often is carelessness - it's been reported by hardcore
development sorts that I trust that there is no standardized API to obtain
the TTL... What needs to get fixed is get[hostbyname,addrinfo,etc] so
programmers have better tools.



--
david raistrick http://www.netmeister.org/news/learn2quote.html
drais@icantclick.org http://www.expita.com/nomime.html
Re: dns and software, was Re: Reliable Cloud host ?
On Mon, Feb 27, 2012 at 3:43 PM, david raistrick <drais@icantclick.org> wrote:
> On Mon, 27 Feb 2012, William Herrin wrote:
>> In some cases this is because of carelessness: The application does a
>> gethostbyname once when it starts, grabs the first IP address in the
>> list and retains it indefinitely. The gethostbyname function doesn't
>> even pass the TTL to the application. Ntpd is/used to be one of the
>> notable offenders, continuing to poll the dead address for years after
>> the server moved.
>
> While yes it often is carelessness - it's been reported by hardcore
> development sorts that I trust that there is no standardized API to obtain
> the TTL...  What needs to get fixed is get[hostbyname,addrinfo,etc] so
> programmers have better tools.

Meh. What should be fixed is that connect() should receive a name
instead of an IP address. Having an application deal directly with the
IP address should be the exception rather than the rule. Then, deal
with the TTL issues once in the standard libraries instead of
repeatedly in every single application.

In theory, that'd even make the app code protocol agnostic so that it
doesn't have to be rewritten yet again for IPv12.

Regards,
Bill Herrin

--
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
Re: dns and software, was Re: Reliable Cloud host ?
On Feb 27, 2012, at 3:50 PM, William Herrin wrote:

> On Mon, Feb 27, 2012 at 3:43 PM, david raistrick <drais@icantclick.org> wrote:
>> On Mon, 27 Feb 2012, William Herrin wrote:
>>> In some cases this is because of carelessness: The application does a
>>> gethostbyname once when it starts, grabs the first IP address in the
>>> list and retains it indefinitely. The gethostbyname function doesn't
>>> even pass the TTL to the application. Ntpd is/used to be one of the
>>> notable offenders, continuing to poll the dead address for years after
>>> the server moved.
>>
>> While yes it often is carelessness - it's been reported by hardcore
>> development sorts that I trust that there is no standardized API to obtain
>> the TTL... What needs to get fixed is get[hostbyname,addrinfo,etc] so
>> programmers have better tools.
>
> Meh. What should be fixed is that connect() should receive a name
> instead of an IP address. Having an application deal directly with the
> IP address should be the exception rather than the rule. Then, deal
> with the TTL issues once in the standard libraries instead of
> repeatedly in every single application.
>
> In theory, that'd even make the app code protocol agnostic so that it
> doesn't have to be rewritten yet again for IPv12.
>

While I agree with the principle of what you are trying to say, I would argue
that it should be dealt with in getnameinfo() / getaddrinfo() and not connect().

It is perfectly reasonable for connect() to deal with an address structure.

If people are not using getnameinfo()/getaddrinfo() from the standard libraries,
then, I don't see any reason to believe that they would use connect() from the
standard libraries if it incorporated their functionality.

Owen
Re: dns and software, was Re: Reliable Cloud host ?
On Mon, Feb 27, 2012 at 7:07 PM, Owen DeLong <owen@delong.com> wrote:
> On Feb 27, 2012, at 3:50 PM, William Herrin wrote:
>> Meh. What should be fixed is that connect() should receive a name
>> instead of an IP address. Having an application deal directly with the
>> IP address should be the exception rather than the rule. Then, deal
>> with the TTL issues once in the standard libraries instead of
>> repeatedly in every single application.
>>
>> In theory, that'd even make the app code protocol agnostic so that it
>> doesn't have to be rewritten yet again for IPv12.
>
> While I agree with the principle of what you are trying to say, I would argue
> that it should be dealt with in getnameinfo() / getaddrinfo() and not connect().
>
> It is perfectly reasonable for connect() to deal with an address structure.

Yes, well, that's why we're still using a layer 4 protocol (TCP) that
can't dynamically rebind to the protocol level below (IP). God help us
when folks start overriding the ethernet MAC address to force machines
to keep the same IPv6 address that's been hardcoded somewhere or is
otherwise too much trouble to change.

Regards,
Bill Herrin



Re: dns and software, was Re: Reliable Cloud host ?
On Mon, Feb 27, 2012 at 4:59 PM, William Herrin <bill@herrin.us> wrote:
> ....
> Yes, well, that's why we're still using a layer 4 protocol (TCP) that
> can't dynamically rebind to the protocol level below (IP).

This is somewhat irritating, but on the scale of 0 (all is well) to 10
(you want me to do WHAT with DHCPv6???) this is about a 2.

The application can re-connect from the TCP layer if something wiggy
happens to the layer below. This is an application layer solution, is
well established, and works fine. One just has to notice something's
amiss and retry connection rather than abort the application.

> God help us
> when folks start overriding the ethernet MAC address to force machines
> to keep the same IPv6 address that's been hardcoded somewhere or is
> otherwise too much trouble to change.

It could be worse. Back in the day I worked for a company that did
one of the earlier two-on-motherboard ethernet chip servers. The Boot
PROM (from another vendor) had no clue about multiple ethernet
interfaces. It came up with both interfaces set to the same NVRAM-set
MAC. We wanted to fix it in firmware but kept having issues with
that.

I had to get an init script to bump the MAC for the second interface
up by one, ensure that it was in the OS and ran before the interfaces
got plumbed, get it bundled into the OS distribution, and ensure that
factory MACs were only set to even numbers to start with.

One of these steps ultimately failed rather spectacularly.
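George's workaround comes down to two rules: factory MACs are handed out in steps of two (so the last octet is even), and the second port uses the factory MAC plus one, which therefore can never equal another board's factory MAC. A minimal sketch of those rules follows. `mac_is_even()` and `mac_next()` are hypothetical helpers, not the code from that init script, and confining the carry to the three NIC-specific octets (so the vendor OUI never changes) is an assumption added here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Factory-side rule (hypothetical helper): base MACs end in an even octet. */
bool mac_is_even(const uint8_t mac[6]) {
    return (mac[5] & 1u) == 0;
}

/* Hypothetical helper: derive the second port's MAC as base + 1.
 * The carry is limited to octets 3-5 so the vendor OUI (octets 0-2)
 * never changes; returns false (and out is unusable) if the increment
 * would overflow into the OUI. */
bool mac_next(const uint8_t base[6], uint8_t out[6]) {
    memcpy(out, base, 6);
    for (int i = 5; i >= 3; i--) {
        out[i] = (uint8_t)(out[i] + 1);
        if (out[i] != 0)
            return true;   /* no further carry needed */
    }
    return false;
}
```

With an even base the increment only sets the low bit of the last octet, so in the scheme described the carry path is never taken.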



--
-george william herbert
george.herbert@gmail.com
Re: dns and software, was Re: Reliable Cloud host ?
In message <CAP-guGVA4eHv0K=U=x2B-WPYDy2RQ7ZE1Di2AHc+dmA_huyGzA@mail.gmail.com>,
William Herrin writes:
> On Mon, Feb 27, 2012 at 3:43 PM, david raistrick <drais@icantclick.org> wrote:
> > On Mon, 27 Feb 2012, William Herrin wrote:
> >> In some cases this is because of carelessness: The application does a
> >> gethostbyname once when it starts, grabs the first IP address in the
> >> list and retains it indefinitely. The gethostbyname function doesn't
> >> even pass the TTL to the application. Ntpd is/used to be one of the
> >> notable offenders, continuing to poll the dead address for years after
> >> the server moved.
> >
> > While yes it often is carelessness - it's been reported by hardcore
> > development sorts that I trust that there is no standardized API to obtain
> > the TTL...  What needs to get fixed is get[hostbyname,addrinfo,etc] so
> > programmers have better tools.
>
> Meh. What should be fixed is that connect() should receive a name
> instead of an IP address. Having an application deal directly with the
> IP address should be the exception rather than the rule. Then, deal
> with the TTL issues once in the standard libraries instead of
> repeatedly in every single application.

No. connect() should stay the way it is. Most developers cut and paste
the connection code. It's just that the code they cut and paste is very
old and is often IPv4 only.

> In theory, that'd even make the app code protocol agnostic so that it
> doesn't have to be rewritten yet again for IPv12.

The getaddrinfo() man page has IP-version-agnostic code examples. It
is, however, simplistic code which doesn't behave well when an address
is unreachable. For examples of how to behave better for TCP see:

https://www.isc.org/community/blog/201101/how-to-connect-to-a-multi-homed-server-over-tcp
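As a rough illustration of the problem Mark describes, the sketch below walks the getaddrinfo() results but bounds each attempt with a timeout, using a non-blocking connect() followed by poll(), so a single unreachable address cannot stall the caller for the kernel's full connect timeout. It is a simpler mitigation than a fully concurrent approach, not a reproduction of the method in the article linked above; `connect_with_timeout()` and its `per_addr_ms` parameter are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Try one address, waiting at most timeout_ms. Returns a connected
 * (blocking) socket or -1. */
static int try_one(const struct addrinfo *ai, int timeout_ms) {
    int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd < 0)
        return -1;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    int rc = connect(fd, ai->ai_addr, ai->ai_addrlen);
    if (rc < 0 && errno == EINPROGRESS) {
        /* Wait until the socket is writable or the timeout expires,
         * then ask the kernel whether the handshake succeeded. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int err = 0;
        socklen_t len = sizeof err;
        if (poll(&pfd, 1, timeout_ms) == 1 &&
            getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
            rc = 0;
    }
    if (rc != 0) {
        close(fd);
        return -1;
    }
    fcntl(fd, F_SETFL, flags);   /* back to blocking mode for the caller */
    return fd;
}

/* Resolve host/port and try each result in turn with a per-address
 * timeout. Returns a connected socket or -1. */
int connect_with_timeout(const char *host, const char *port, int per_addr_ms) {
    struct addrinfo hints, *res, *ai;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    int fd = -1;
    for (ai = res; ai != NULL && fd < 0; ai = ai->ai_next)
        fd = try_one(ai, per_addr_ms);
    freeaddrinfo(res);
    return fd;
}
```

A caller would write something like `int fd = connect_with_timeout("www.example.com", "80", 2000);`. Trying addresses strictly one after another keeps the code short; racing attempts concurrently, as Happy Eyeballs implementations do, recovers faster but is considerably more involved.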

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
Re: dns and software, was Re: Reliable Cloud host ?
On Feb 27, 2012, at 19:10, Owen DeLong <owen@delong.com> wrote:

>
> On Feb 27, 2012, at 3:50 PM, William Herrin wrote:
>
>> On Mon, Feb 27, 2012 at 3:43 PM, david raistrick <drais@icantclick.org> wrote:
>>> On Mon, 27 Feb 2012, William Herrin wrote:
>>>> In some cases this is because of carelessness: The application does a
>>>> gethostbyname once when it starts, grabs the first IP address in the
>>>> list and retains it indefinitely. The gethostbyname function doesn't
>>>> even pass the TTL to the application. Ntpd is/used to be one of the
>>>> notable offenders, continuing to poll the dead address for years after
>>>> the server moved.
>>>
>>> While yes it often is carelessness - it's been reported by hardcore
>>> development sorts that I trust that there is no standardized API to obtain
>>> the TTL... What needs to get fixed is get[hostbyname,addrinfo,etc] so
>>> programmers have better tools.
>>
>> Meh. What should be fixed is that connect() should receive a name
>> instead of an IP address. Having an application deal directly with the
>> IP address should be the exception rather than the rule. Then, deal
>> with the TTL issues once in the standard libraries instead of
>> repeatedly in every single application.
>>
>> In theory, that'd even make the app code protocol agnostic so that it
>> doesn't have to be rewritten yet again for IPv12.
>
> While I agree with the principle of what you are trying to say, I would argue
> that it should be dealt with in getnameinfo() / getaddrinfo() and not connect().
>
> It is perfectly reasonable for connect() to deal with an address structure.
>
> If people are not using getnameinfo()/getaddrinfo() from the standard libraries,
> then, I don't see any reason to believe that they would use connect() from the
> standard libraries if it incorporated their functionality.

gai/gni do not return TTL values on any platform I'm aware of; the
only way to get the TTL currently is to use a non-standard resolver (e.g.
lwres). The issue is application developers not calling gai every time
they connect (due to aforementioned security concerns, at least in the
browser realm), instead opting to hold onto the original resolved
address for unreasonable amounts of time. Modifying gai to provide TTL
has been proposed in the past (dnsop '04) but AFAIK was shot down to
prevent inconsistencies in the API. Maybe when Happy Eyeballs
stabilizes someone will propose an API for inclusion in the standard
library that implements HE-style connections. It looks like there was
already some talk on v6ops headed this way, but as always there's
resistance to standardizing it.

~Matt
Re: dns and software, was Re: Reliable Cloud host ?
getaddrinfo was designed to be extensible, as was struct
addrinfo. Part of the problem with TTL is that not all data sources
used by getaddrinfo have TTL information. Additionally, for
many uses you want to reconnect to the same server rather
than the same name. Note there is nothing to prevent a
getaddrinfo implementation from maintaining its own cache, though
if I was implementing such a cache I would have a flag
to force a refresh.
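A getaddrinfo()-side cache with a refresh flag, along the lines Mark suggests, might look something like the single-entry sketch below. Because getaddrinfo() does not report a TTL, the caller-supplied `max_age` stands in for one; `struct gai_cache`, `cached_lookup()` and `force_refresh` are hypothetical and not part of any libc. The current time is passed in as `now` so the expiry logic can be exercised without waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical single-entry cache in front of getaddrinfo(). */
struct gai_cache {
    char host[256];
    char port[32];
    struct addrinfo *res;   /* owned by the cache; NULL when empty */
    time_t fetched;         /* when res was obtained */
};

/* Return results for host/port, reusing the cached list unless it is
 * max_age seconds old or more, was for a different name, or the caller
 * sets force_refresh. max_age == 0 therefore means "never reuse".
 * Returns NULL if a needed lookup fails (the old entry is kept).
 * The returned list stays owned by the cache. */
const struct addrinfo *cached_lookup(struct gai_cache *c, const char *host,
                                     const char *port, time_t max_age,
                                     bool force_refresh, time_t now) {
    bool same = c->res != NULL && strcmp(c->host, host) == 0 &&
                strcmp(c->port, port) == 0;
    if (same && !force_refresh && now - c->fetched < max_age)
        return c->res;

    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return NULL;

    if (c->res != NULL)
        freeaddrinfo(c->res);   /* drop the stale list only after success */
    snprintf(c->host, sizeof c->host, "%s", host);
    snprintf(c->port, sizeof c->port, "%s", port);
    c->res = res;
    c->fetched = now;
    return res;
}
```

A program would pass `time(NULL)` as `now` and set `force_refresh` after a failed connection, so a renumbered server is picked up on the next attempt rather than only after `max_age` runs out.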

--
Mark Andrews, ISC
Re: dns and software, was Re: Reliable Cloud host ?
On Feb 27, 2012, at 9:45 PM, Mark Andrews wrote:

>
> getaddrinfo was designed to be extensible, as was struct
> addrinfo. Part of the problem with TTL is that not all data sources
> used by getaddrinfo have TTL information. Additionally, for
> many uses you want to reconnect to the same server rather
> than the same name. Note there is nothing to prevent a
> getaddrinfo implementation from maintaining its own cache, though
> if I was implementing such a cache I would have a flag
> to force a refresh.
>

Sorry if I wasn't clear... My point to Bill was that we should be using
calls that don't expose TTL information (GAI/GNI in their default forms),
that we don't need to abuse connect() to achieve that, and that if people
use GAI/GNI, then any brokenness is system-wide brokenness in the system's
resolver library and should be addressed there.

Owen
Re: dns and software, was Re: Reliable Cloud host ?
On Tue, Feb 28, 2012 at 12:45 AM, Mark Andrews <marka@isc.org> wrote:
>        getaddrinfo was designed to be extensible as was struct
>        addrinfo.  Part of the problem with TTL is not [all] data sources
>        used by getaddrinfo have TTL information.

Hi Mark,

By the time getaddrinfo replaced gethostbyname, NIS and similar
systems were on their way out. It was reasonably well understood that
many if not most of the calls would return information gained from the
DNS. Depending on how you look at it, choosing not to propagate TTL
knowledge was either a belligerent choice to continue disrespecting
the DNS Time To Live or it was fatalistic acceptance that the DNS TTL
isn't and would not become functional at the application level.

Still works fine deeper in the query system, timing out which server
holds the records though.


>  Additionally for
>        many uses you want to reconnect to the same server rather
>        than the same name.

The SRV record was designed to solve that whole class of problems
without damaging the operation of the TTL. No one uses it.
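For readers who haven't used SRV: RFC 2782 has clients try targets in ascending priority order and, among targets of equal priority, choose at random in proportion to their weights, with zero-weight targets placed first in the running-sum list. The sketch below applies that ordering to records assumed to have been fetched and parsed already (the lookup itself is out of scope); `struct srv`, `srv_pick()` and `srv_order()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parsed SRV record. */
struct srv {
    uint16_t priority, weight, port;
    const char *target;
};

/* Ascending priority; within a priority, weight-0 records first. */
static int srv_cmp(const void *a, const void *b) {
    const struct srv *x = a, *y = b;
    if (x->priority != y->priority)
        return x->priority < y->priority ? -1 : 1;
    return (x->weight != 0) - (y->weight != 0);
}

/* recs[pos..end) share one priority, weight-0 records first. Given r
 * drawn uniformly from [0, sum of their weights] (inclusive), return
 * the index of the first record whose running weight sum is >= r. */
size_t srv_pick(const struct srv *recs, size_t pos, size_t end,
                unsigned long r) {
    unsigned long running = 0;
    for (size_t i = pos; i < end; i++) {
        running += recs[i].weight;
        if (running >= r)
            return i;
    }
    return end - 1;   /* only if r exceeded the total weight */
}

/* Reorder recs[0..n) in place into the order a client should try them. */
void srv_order(struct srv *recs, size_t n) {
    qsort(recs, n, sizeof *recs, srv_cmp);
    for (size_t start = 0, end; start < n; start = end) {
        for (end = start; end < n && recs[end].priority == recs[start].priority; end++)
            ;
        for (size_t pos = start; pos < end; pos++) {
            unsigned long sum = 0;
            for (size_t i = pos; i < end; i++)
                sum += recs[i].weight;
            /* rand() keeps the sketch short; it is biased for large sums
             * and should not be used where unpredictability matters. */
            unsigned long r = (unsigned long)rand() % (sum + 1);
            size_t pick = srv_pick(recs, pos, end, r);
            struct srv chosen = recs[pick];
            for (size_t i = pick; i > pos; i--)   /* keep the rest in order */
                recs[i] = recs[i - 1];
            recs[pos] = chosen;
        }
    }
}
```

Because the target list comes back in full on every lookup, a client can walk it on each reconnect without ever holding on to a single address.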


It's all really very unfortunate. The recipe for SOHO multihoming, the
end of routing table bloat and IP roaming without pivoting off a home
base all boils down to two technologies: (1) a layer 4 protocol that
can dynamically rebind to the layer 3 IP address the same way IP uses
ARP to rebind to a changing ethernet MAC and (2) a DNS TTL that
actually works so that the DNS supports finding a connection's current
IP address.

Regards,
Bill Herrin

Re: dns and software, was Re: Reliable Cloud host ?
In message <CAP-guGV09HF7in+vZbKpGk0RR1Q4gpMMo5jQREUZVEj+ewzmkg@mail.gmail.com>,
William Herrin writes:
> On Tue, Feb 28, 2012 at 12:45 AM, Mark Andrews <marka@isc.org> wrote:
> > getaddrinfo was designed to be extensible as was struct
> > addrinfo. Part of the problem with TTL is not [all] data sources
> > used by getaddrinfo have TTL information.
>
> Hi Mark,
>
> By the time getaddrinfo replaced gethostbyname, NIS and similar
> systems were on their way out. It was reasonably well understood that
> many if not most of the calls would return information gained from the
> DNS. Depending on how you look at it, choosing not to propagate TTL
> knowledge was either a belligerent choice to continue disrespecting
> the DNS Time To Live or it was fatalistic acceptance that the DNS TTL
> isn't and would not become functional at the application level.

No. Propagating the TTL is still an issue, especially when you do not
always have one. You can't just wave the problem away. As for DNS TTL,
addresses are about the only thing which have multiple sources. You also
don't have to use getaddrinfo. It really is designed to be the first step
in connecting to a host. If you need to reconnect, you call it again.

> Still works fine deeper in the query system, timing out which server
> holds the records though.
>
>
> > Additionally for
> > many uses you want to reconnect to the same server rather
> > than the same name.
>
> The SRV record was designed to solve that whole class of problems
> without damaging the operation of the TTL. No one uses it.

You don't need to know the TTL to use SRV.

> It's all really very unfortunate. The recipe for SOHO multihoming, the
> end of routing table bloat and IP roaming without pivoting off a home
> base all boils down to two technologies: (1) a layer 4 protocol that
> can dynamically rebind to the layer 3 IP address the same way IP uses
> ARP to rebind to a changing ethernet MAC and (2) a DNS TTL that
> actually works so that the DNS supports finding a connection's current
> IP address.

DNS TTL works. Applications that don't honour it aren't an indication that
it doesn't work.

--
Mark Andrews, ISC
Re: dns and software, was Re: Reliable Cloud host ?
On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka@isc.org> wrote:
> DNS TTL works.  Applications that don't honour it aren't an indication that
> it doesn't work.

Mark,

If three people died and the building burned down then the sprinkler
system didn't work. It may have sprayed water, but it didn't *work*.

Regards,
Bill Herrin


Re: dns and software, was Re: Reliable Cloud host ?
In message <CAP-guGXK3WQGPLpmnVsnM0xnnU8==4zONK=UWTLkYWuduA6T9Q@mail.gmail.com>,
William Herrin writes:
> On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka@isc.org> wrote:
> > DNS TTL works.  Applications that don't honour it aren't an indication that
> > it doesn't work.
>
> Mark,
>
> If three people died and the building burned down then the sprinkler
> system didn't work. It may have sprayed water, but it didn't *work*.

Not enough evidence to say if it worked or not. Sprinkler systems
are designed to handle particular classes of fire, not every fire.

A 0 TTL means use this information for this transaction. We don't
tear down TCP sessions on DNS TTL going to zero.

If one really wants to deprecate addresses we need something a lot
more complicated than A and AAAA records in the DNS. We need stuff
like "use this address for new transactions", "this address is going
away soon, don't use it unless no other works". One also has to use
multiple addresses at the same time.
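To make Mark's point concrete, here is a purely hypothetical sketch: it assumes some future mechanism could tag each address as preferred or deprecated (similar in spirit to the preferred and deprecated states of IPv6 interface addresses), which plain A and AAAA records cannot express. A client would use preferred addresses for new transactions and fall back to deprecated ones only when nothing else works. `enum addr_state`, `struct tagged_addr` and `choose_addr()` are invented for illustration, and the `up` array stands in for real connection attempts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-address state; nothing like this exists in A/AAAA. */
enum addr_state { ADDR_PREFERRED, ADDR_DEPRECATED };

struct tagged_addr {
    const char *addr;        /* textual address, for illustration only */
    enum addr_state state;
};

/* Pick the address for a new transaction. up[i] stands in for "a
 * connection attempt to a[i] succeeded". Preferred addresses are tried
 * first in list order; deprecated ones only if no preferred one works.
 * Returns the chosen index, or -1 if nothing is reachable. */
int choose_addr(const struct tagged_addr *a, const bool *up, size_t n) {
    const enum addr_state order[2] = { ADDR_PREFERRED, ADDR_DEPRECATED };
    for (size_t pass = 0; pass < 2; pass++)
        for (size_t i = 0; i < n; i++)
            if (a[i].state == order[pass] && up[i])
                return (int)i;
    return -1;
}
```

Connections already established to a deprecated address would simply be left alone, in line with not tearing down TCP sessions when a TTL runs out.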

Mark
--
Mark Andrews, ISC
Re: dns and software, was Re: Reliable Cloud host ?
> In message <CAP-guGXK3WQGPLpmnVsnM0xnnU8==4zONK=UWTLkYWuduA6T9Q@mail.gmail.com>,
> William Herrin writes:
> > On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka@isc.org> wrote:
> > > DNS TTL works.  Applications that don't honour it aren't an indication that
> > > it doesn't work.
> >
> > Mark,
> >
> > If three people died and the building burned down then the sprinkler
> > system didn't work. It may have sprayed water, but it didn't *work*.
>
> Not enough evidence to say if it worked or not. Sprinkler systems
> are designed to handle particular classes of fire, not every fire.

It is also worth noting that many fire systems are not intended to
put out the fire, but to provide warning and then provide an extended
window for people to exit the affected building through use of sprinklers
and other measures to slow the spread of the fire. As you suggest, most
sprinkler systems are not actually designed to be able to completely
extinguish fires - but that even applies to fires they are intended to be
able to "handle". This is a common misunderstanding of the technology.

> A 0 TTL means use this information for this transaction. We don't
> tear down TCP sessions on DNS TTL going to zero.
>
> If one really wants to deprecate addresses we need something a lot
> more complicated than A and AAAA records in the DNS. We need stuff
> like "use this address for new transactions", "this address is going
> away soon, don't use it unless no other works". One also has to use
> multiple addresses at the same time.

This has always been a weakness of the technology and documentation.
The common usage scenario of static hosts and merely needing to be able
to resolve a hostname to reach the traditional example of a "departmental
server" or something like that is what most code and code examples are
intended to tackle; very little of what developers are actually given (in
real practical terms) even hints at needing to consider aspects such as
TTL or periodically refreshing host->ip mappings, and most of the
documentation I've seen fails to discuss the implications of overloading
things like TTL for deliberate load-balancing or geo purposes. Shocking
that it's poorly understood by developers who just want their poor little
program to connect over the Internet.

It's funny how these two technologies are both often misunderstood. I
would not have thought of comparing DNS to fire suppression. :-)

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
Re: dns and software, was Re: Reliable Cloud host ?
On Wed, Feb 29, 2012 at 7:57 AM, Joe Greco <jgreco@ns.sol.net> wrote:
>> In message <CAP-guGXK3WQGPLpmnVsnM0xnnU8==4zONK=UWTLkYWuduA6T9Q@mail.gmail.com>,
>>  William Herrin writes:
>> > On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka@isc.org> wrote:
>> > > DNS TTL works.  Applications that don't honour it aren't an indication that
>> > > it doesn't work.
>> >
>> > Mark,
>> >
>> > If three people died and the building burned down then the sprinkler
>> > system didn't work. It may have sprayed water, but it didn't *work*.
>>
>> Not enough evidence to say if it worked or not.  Sprinkler systems
>> are designed to handle particular classes of fire, not every fire.
>
> It is also worth noting that many fire systems are not intended to
> put out the fire, but to provide warning and then provide an extended
> window for people to exit the affected building through use of sprinklers
> and other measures to slow the spread of the fire.

Hi Joe,

The sprinkler system is designed to delay the fire long enough for
everyone to safely escape. As a secondary objective, it reduces the
fire damage that occurs while waiting for firefighters to arrive and
extinguish the fire. If "three people died" then the system failed.
Perhaps the design was inadequate. Perhaps some age-related issue
prevented the sprinkler heads from melting. Perhaps someone stacked
boxes to the ceiling and it blocked the water. Perhaps the water was
shut off and nobody knew it. Perhaps an initial explosion damaged the
sprinkler system so it could no longer work effectively. Whatever the
exact details, that sprinkler system failed.

Whoever you want to blame, DNS TTL dysfunction at the application
level is the same way. It's a failed system. With the TTL on an A
record set to 60 seconds, you can't change the address attached to the
A record and expect that 60 seconds later no one will continue to
connect to the old address. Nor 600 seconds later nor 6000 seconds
later. The "system" for renumbering a service of which the TTL setting
is a part consistently fails to reliably function in that manner.

Regards,
Bill Herrin



Re: dns and software, was Re: Reliable Cloud host ?
On Feb 29, 2012, at 6:18 AM, William Herrin wrote:

> On Wed, Feb 29, 2012 at 7:57 AM, Joe Greco <jgreco@ns.sol.net> wrote:
>>> In message <CAP-guGXK3WQGPLpmnVsnM0xnnU8==4zONK=UWTLkYWuduA6T9Q@mail.gmail.com>,
>>> William Herrin writes:
>>>> On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka@isc.org> wrote:
>>>>> DNS TTL works.  Applications that don't honour it aren't an indication that
>>>>> it doesn't work.
>>>>
>>>> Mark,
>>>>
>>>> If three people died and the building burned down then the sprinkler
>>>> system didn't work. It may have sprayed water, but it didn't *work*.
>>>
>>> Not enough evidence to say if it worked or not. Sprinkler systems
>>> are designed to handle particular classes of fire, not every fire.
>>
>> It is also worth noting that many fire systems are not intended to
>> put out the fire, but to provide warning and then provide an extended
>> window for people to exit the affected building through use of sprinklers
>> and other measures to slow the spread of the fire.
>
> Hi Joe,
>
> The sprinkler system is designed to delay the fire long enough for
> everyone to safely escape. As a secondary objective, it reduces the
> fire damage that occurs while waiting for firefighters to arrive and
> extinguish the fire. If "three people died" then the system failed.
> Perhaps the design was inadequate. Perhaps some age-related issue
> prevented the sprinkler heads from melting. Perhaps someone stacked
> boxes to the ceiling and it blocked the water. Perhaps the water was
> shut off and nobody knew it. Perhaps an initial explosion damaged the
> sprinkler system so it could no longer work effectively. Whatever the
> exact details, that sprinkler system failed.

Bill, you are blaming the sprinkler system for what could, in fact, be not
a failure of the sprinkler system, but, of the 3 humans.

If they were too intoxicated or stoned to react, for example, the sprinkler
system is not to blame. If they were overcome by smoke before the
sprinklers went off, that may be a failure of the smoke detectors, but, it
is not a failure of the sprinklers. If they were killed or rendered unconscious
and/or unresponsive in the preceding explosion you mentioned and did
not die in the subsequent fire, then, that is not a failure in the sprinkler
system.

>
> Whoever you want to blame, DNS TTL dysfunction at the application
> level is the same way. It's a failed system. With the TTL on an A
> record set to 60 seconds, you can't change the address attached to the
> A record and expect that 60 seconds later no one will continue to
> connect to the old address. Nor 600 seconds later nor 6000 seconds
> later. The "system" for renumbering a service of which the TTL setting
> is a part consistently fails to reliably function in that manner.

Yes, the assumption by developers that gni/ghi is a fire-and-forget
mechanism and that the data received is static is a failure. It is not a
failure of DNS TTL. It is a failure of the application developers that
code that way. Further analysis of the underlying causes of that failure
to properly understand name resolution technology and the environment
in which it operates is left as an exercise for the reader.

The fact that people playing interesting games with DNS TTLs don't
necessarily understand or well document the situation to raise awareness
among application developers could also be argued to be a failure
on the part of those people.

It is not, in either case, a failure of the technology.

One should always call gni/gai in close temporal (and ideally close
in the code as well) proximity to calling connect(). Obviously one
should call these resolver functions prior to calling connect().

Most example code is designed for short-lived non-recovering flows,
so, it's designed along the lines of resolve->(iterate through results
calling connect() for each result until connect() succeeds)->process->
close->exit.
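The short-lived pattern Owen describes, resolving right before connecting and then trying each result until connect() succeeds, looks roughly like this in C. It follows the shape of the IP-version-agnostic loop commonly shown in getaddrinfo(3) man pages, and a wrapper like it is roughly what a name-based connect() of the kind Bill proposed earlier in the thread would hide inside the library. `connect_to_host()` is a hypothetical name, and this is a sketch rather than production code: there is no per-address timeout, so an unreachable address blocks for the kernel's full connect timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve host/port immediately before connecting and try each result
 * in the order getaddrinfo() returned it. Returns a connected socket,
 * or -1 if resolution or every address failed. */
int connect_to_host(const char *host, const char *port) {
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6, whatever DNS says */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* success: keep this socket */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);                  /* the addresses are not kept */
    return fd;
}
```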

Examples for persistent connections and/or connections that recover
or re-establish after a failure and/or browsers that stay running for a
long time and connect to the same system again significantly later
are few and far between. As a result, most code doing that ends up
being poorly written.
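For the long-running case, one way to structure the code is to make the name, not the address, the thing the program remembers: whenever the connection fails, close it, call getaddrinfo() again and reconnect, backing off between failed rounds. In the sketch below, `struct named_conn`, `named_conn_reconnect()` and its `max_rounds` and `base_delay_ms` parameters are hypothetical, and the exponential backoff is an assumption of this sketch rather than something the thread prescribes.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical long-lived connection handle: it remembers the name and
 * service, never a resolved address. */
struct named_conn {
    const char *host;
    const char *port;
    int fd;                     /* -1 while disconnected */
};

/* One round: a fresh getaddrinfo(), then try each result in order. */
static int resolve_and_connect(const char *host, const char *port) {
    struct addrinfo hints, *res, *ai;
    int fd = -1;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL && fd < 0; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd >= 0 && connect(fd, ai->ai_addr, ai->ai_addrlen) != 0) {
            close(fd);
            fd = -1;
        }
    }
    freeaddrinfo(res);
    return fd;
}

/* Call when the current connection has failed (or was never opened):
 * close it, then re-resolve and reconnect, doubling the pause between
 * rounds. Returns the new fd, or -1 after max_rounds failed rounds. */
int named_conn_reconnect(struct named_conn *c, int max_rounds,
                         long base_delay_ms) {
    long delay_ms = base_delay_ms;
    if (c->fd >= 0)
        close(c->fd);
    c->fd = -1;
    for (int round = 0; round < max_rounds; round++) {
        if (round > 0) {
            struct timespec ts = { .tv_sec = delay_ms / 1000,
                                   .tv_nsec = (delay_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
            delay_ms *= 2;
        }
        c->fd = resolve_and_connect(c->host, c->port);   /* fresh lookup */
        if (c->fd >= 0)
            break;
    }
    return c->fd;
}
```

Because every round starts with a fresh lookup, whatever TTL handling the system resolver does applies automatically; the application never holds an address for longer than one connection.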

Further, DNS performance issues in the past have led developers of
such applications to "take matters into their own hands" to try and
improve the performance/behavior of their application in spite of
DNS. This is one of the things that led to many of the TTL ignorant
application-level DNS caches which you are complaining about.

Again, not a failure of DNS technology, but, of the operators of that
technology and the developers that tried to compensate for those
failures. They introduced a cure that is often worse than the disease.

Owen

Re: dns and software, was Re: Reliable Cloud host ?
On 02/29/12 10:01, Owen DeLong wrote:
> Further, DNS performance issues in the past have led developers of
> such applications to "take matters into their own hands" to try and
> improve the performance/behavior of their application in spite of
> DNS. This is one of the things that led to many of the TTL ignorant
> application-level DNS caches which you are complaining about.

I have found some carriers to run hacked nameservers. Several years
ago I was moving a website and found that Cox was overriding the TTL
for all "www" names. At least for their residential customers in
Oklahoma. The TTL value our test subject was getting was larger than
it had ever been set.

--
Mr. Flibble
King of the Potato People
Re: dns and software, was Re: Reliable Cloud host ?
On 2/29/2012 1:38 PM, Robert Hajime Lanning wrote:
> On 02/29/12 10:01, Owen DeLong wrote:
>> Further, DNS performance issues in the past have led developers of
>> such applications to "take matters into their own hands" to try and
>> improve the performance/behavior of their application in spite of
>> DNS. This is one of the things that led to many of the TTL ignorant
>> application-level DNS caches which you are complaining about.
>
> I have found some carriers to run hacked nameservers. Several years
> ago I was moving a website and found that Cox was overriding the TTL
> for all "www" names. At least for their residential customers in
> Oklahoma. The TTL value our test subject was getting was larger than
> it had ever been set.
>

Back in the day, the uu.net cache servers were set for 24 hours (I can't
remember if they claimed it was a performance issue or some other
justification). Several other large ISPs of the day also did this, so
you typically got the "allow 24 hours for full propagation of DNS
changes ..." response when updating external DNS entries. Nominal best
practice is to expect that and to run the service on both the old and
new IPs for at least 24 hours, then start doing redirection (where the
protocol allows it) or stop servicing the protocols on the old IP.


I'm sure other providers are doing the same to slow down fast flux
entries being used for spam site hosting today.

--
---
James M Keller
Re: dns and software, was Re: Reliable Cloud host ?
> On Wed, Feb 29, 2012 at 7:57 AM, Joe Greco <jgreco@ns.sol.net> wrote:
> >> In message <CAP-guGXK3WQGPLpmnVsnM0xnnU8==4zONK=UWTLkYWuduA6T9Q@mail.gmail.com>,
> >>  William Herrin writes:
> >> > On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka@isc.org> wrote:
> >> > > DNS TTL works.  Applications that don't honour it aren't an indication that
> >> > > it doesn't work.
> >> >
> >> > Mark,
> >> >
> >> > If three people died and the building burned down then the sprinkler
> >> > system didn't work. It may have sprayed water, but it didn't *work*.
> >>
> >> Not enough evidence to say if it worked or not.  Sprinkler systems
> >> are designed to handle particular classes of fire, not every fire.
> >
> > It is also worth noting that many fire systems are not intended to
> > put out the fire, but to provide warning and then provide an extended
> > window for people to exit the affected building through use of sprinklers
> > and other measures to slow the spread of the fire.
>
> Hi Joe,
>
> The sprinkler system is designed to delay the fire long enough for
> everyone to safely escape.

Hi Bill,

No, the sprinkler system is *intended* to delay the fire long enough
for everyone to safely escape; however, to accomplish this,
the designer chooses from some reasonable options to meet certain
goals that are commonly accepted to allow that. For example, the
suppression design applied to a multistory dwelling where people
live, cook, and sleep is typically different from the one for
single-story light office space. Neither design will be effective
against all possible types of fire.

> As a secondary objective, it reduces the
> fire damage that occurs while waiting for firefighters to arrive and
> extinguish the fire. If "three people died" then the system failed.

That's silly. The system fails if the system *fails* or doesn't
behave as designed. No system is capable of guaranteeing survival.

Just yesterday, here in Milwaukee, we had a child killed at a
railroad crossing. The crossing was well-marked, with signals
and gates, and approaching trains are visible for close to a mile
in either direction. The crew on the train saw him crossing,
blew their horn, and laid on the emergency brakes. CP Rail inspected
the gates and signals for any possible faults, but eyewitness
accounts were that the gates and signals were working, and the
train made every effort to make itself known.

The 11-year-old kid had his hood up and earbuds in, and apparently
didn't see the signals or look up and down the track before crossing,
and for whatever reason, didn't hear the train horn blaring at him.

At a certain point, you just can't protect against every possible
bad thing that can happen. I have a hard time seeing this as a
failure of the railroad's fully functional railroad crossing and
related safety mechanisms. The system doesn't guarantee survival.

> Whoever you want to blame, DNS TTL dysfunction at the application
> level is the same way. It's a failed system. With the TTL on an A
> record set to 60 seconds, you can't change the address attached to the
> A record and expect that 60 seconds later no one will continue to
> connect to the old address. Nor 600 seconds later nor 6000 seconds
> later. The "system" for renumbering a service of which the TTL setting
> is a part consistently fails to reliably function in that manner.

It's a failure because people don't understand the intent of the system,
and it is pretty safe to argue that it is a multifaceted failure, due
to failures by client implementations, server implementations, sample
code, attempts to use the system for things it wasn't meant for, etc.
This is by no means limited to TTL; we've screwed up multiple addresses,
IPv6 handling, negative caching, um, do I need to go on...?

In the specific case of TTL, the problem is made much worse due to the
way most client code has hidden this data from developers, so that many
developers don't even have any idea that such a thing exists.

I'm not sure how to see that as a design failure of the TTL mechanism.

I don't see developers ignoring DNS and hardcoding IP addresses into
code as a failure of the DNS system.

I see both as naive implementation errors. The difference with TTL is
that the implementation errors are so widespread as to render any sane
implementation relatively useless.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
Re: dns and software, was Re: Reliable Cloud host ?
On Wed, Feb 29, 2012 at 4:02 PM, Joe Greco <jgreco@ns.sol.net> wrote:
> In the specific case of TTL, the problem is made much worse due to the
> way most client code has hidden this data from developers, so that many
> developers don't even have any idea that such a thing exists.
>
> I'm not sure how to see that as a design failure of the TTL mechanism.

Hi Joe,

You shouldn't see that as a design failure of the TTL mechanism. It
isn't. It's a failure of the system of which DNS TTL is a component.
The TTL component itself was reasonably designed.

The failure is akin to installing a well-designed sprinkler system
(the DNS with a TTL) and then shutting off the water valve
(gethostbyname/getaddrinfo).


> I don't see developers ignoring DNS and hardcoding IP addresses into
> code as a failure of the DNS system.

It isn't. It's a failure of the sockets API design which calls on
every application developer to (a) translate the name to a set of
addresses with a mechanism that discards the TTL knowledge and (b)
implement his own glue between name to address mapping and connect by
address.

It would be like telling an app developer: here's the ARP function and
the SEND function. When you Send to an IP address, make sure you
attach the right destination MAC. Of course the app developer gets it
wrong most of the time.

Regards,
Bill Herrin



--
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
Re: dns and software, was Re: Reliable Cloud host ?
On Mon, Feb 27, 2012 at 10:57 PM, Matt Addison
<matt.addison@lists.evilgeni.us> wrote:
> gai/gni do not return TTL values on any platforms I'm aware of, the
> only way to get TTL currently is to use a non standard resolver (e.g.
> lwres). The issue is application developers not calling gai every time

GAI/GNI do not return TTL values, but this should not be a problem.
If they were to return anything, it should not be a TTL, but a time()
value after which the result may no longer be used.

One way to achieve that would be for GAI to return an opaque structure
that contained the IP and such a value, in a manner consumable by the
sockets API, and to adjust connect() to return an error if passed a
structure whose 'returned time + TTL' is in the past.
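To make that proposal concrete, here is a minimal C sketch. Everything in it is hypothetical: struct expiring_addr, expiring_addr_set() and expiring_connect() are illustrative names, not part of any libc, and ESTALE is an arbitrary choice of error code. Since getaddrinfo() exposes no TTL, the TTL handed to the constructor is assumed to come from some TTL-aware resolver.

```c
#include <errno.h>
#include <string.h>
#include <time.h>
#include <sys/socket.h>

/* Hypothetical opaque result type: an address plus the absolute time()
 * after which it must not be used (lookup time + TTL). Not a real libc
 * type; getaddrinfo() does not produce it. */
struct expiring_addr {
    struct sockaddr_storage addr;
    socklen_t addrlen;
    time_t expires;
};

/* Hypothetical constructor: what a TTL-aware resolver would fill in. */
int expiring_addr_set(struct expiring_addr *ea, const struct sockaddr *sa,
                      socklen_t salen, unsigned int ttl)
{
    if (salen > sizeof(ea->addr)) {
        errno = EINVAL;
        return -1;
    }
    memset(ea, 0, sizeof(*ea));
    memcpy(&ea->addr, sa, salen);
    ea->addrlen = salen;
    ea->expires = time(NULL) + (time_t)ttl;
    return 0;
}

/* Hypothetical connect() variant: fails with ESTALE (arbitrary choice)
 * once 'returned time + TTL' is in the past, forcing a fresh lookup. */
int expiring_connect(int fd, const struct expiring_addr *ea)
{
    if (time(NULL) > ea->expires) {
        errno = ESTALE;
        return -1;
    }
    return connect(fd, (const struct sockaddr *)&ea->addr, ea->addrlen);
}
```

A caller that gets ESTALE back would simply call getaddrinfo() again, which is exactly the behavior the proposal is trying to force.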


TTL values are a DNS resolver function; the application consuming the
sockets API should not be concerned about details of the DNS protocol.

All the application developer should need to know is that you invoke
GAI/GNI and wait for a response. Once you have that response, it is
permissible to use the value immediately, but you may not store or
re-use that value for more than a few seconds.

If you require that value again later, then you invoke GAI/GNI again;
any caching details are the concern of the resolver library developer
who has implemented GAI/GNI.

--
-JH
Re: dns and software, was Re: Reliable Cloud host ?
> GAI/GNI do not return TTL values, but this should not be a problem.
> If they were to return anything, it should not be a TTL, but a time()
> value, after which the result may no longer be used.
>
> One way to achieve that would be for GAI to return an opaque structure
> that contained the IP and such a value, in a manner consumable by the
> sockets API, and adjust connect() to return an error if passed a
> structure containing a ' returned time + TTL' in the past.

AF_INET_TTL and AF_INET6_TTL, with correspondingly expanded struct sockaddr_* ?

Code that explicitly requests AF_INET or AF_INET6 would get what it was expecting; code that requests AF_UNSPEC on a system with a modified getaddrinfo() would get the expanded structs with the different ai_family set, and could pass them straight into a modified connect().

I'm sure I'm grossly oversimplifying somewhere though...

Regards,
Tim.
Re: dns and software, was Re: Reliable Cloud host ?
On Feb 29, 2012, at 10:15 PM, Jimmy Hess wrote:

> On Mon, Feb 27, 2012 at 10:57 PM, Matt Addison
> <matt.addison@lists.evilgeni.us> wrote:
>> gai/gni do not return TTL values on any platforms I'm aware of, the
>> only way to get TTL currently is to use a non standard resolver (e.g.
>> lwres). The issue is application developers not calling gai every time
>
> GAI/GNI do not return TTL values, but this should not be a problem.
> If they were to return anything, it should not be a TTL, but a time()
> value, after which
> the result may no longer be used.
>
> One way to achieve that would be for GAI to return an opaque structure
> that contained the IP and such a value, in a manner consumable by the
> sockets API, and adjust connect() to return an error if passed a
> structure containing a ' returned time + TTL' in the past.
>
>
> TTL values are a DNS resolver function; the application consuming the
> sockets API
> should not be concerned about details of the DNS protocol.
>
> All the application developer should need to know is that you invoke
> GAI/GNI and wait for a response.
> Once you have that response, it is permissible to use the value immediately,
> but you may not store or re-use that value for more than a few seconds.
>
> If you require that value again later, then you invoke GAI/GNI again;
> any caching details
> are the concern of the resolver library developer who has implemented GAI/GNI.
>
> --
> -JH

A simpler approach, perfectly viable without mucking up what is already implemented and working:

Don't keep returns from GAI/GNI around longer than it takes to cycle through your connect() loop immediately after the GAI/GNI call.

If you write your code to the standard of:

getaddrinfo();
/* do something with the results */
freeaddrinfo();

with a very limited amount of time passing between getaddrinfo() and freeaddrinfo(), then, you don't need TTLs and it doesn't matter.

The system resolver library should do the right thing with DNS TTLs for records retrieved from DNS and a subsequent call to getaddrinfo() within the DNS TTL for the previously retrieved record should be a relatively cheap, fast in-memory operation.
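Owen's pattern, fleshed out as a minimal C sketch (connect_to_host() is an illustrative name, not a standard function). The addrinfo list is freed before the function returns, so nothing outlives the connect() loop. Whether the next call is a cheap in-memory hit depends on the platform actually running a caching resolver (for example nscd or a local caching nameserver); the code assumes that rather than guaranteeing it.

```c
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Resolve, try each address in order, and free the results before
 * returning. Nothing from getaddrinfo() outlives this call, so the next
 * connection re-resolves and the resolver gets to apply DNS TTLs.
 * Returns a connected socket, or -1. */
int connect_to_host(const char *host, const char *service)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6, whatever the name has */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected */
        close(fd);
        fd = -1;
    }

    freeaddrinfo(res);                /* do not keep the addresses around */
    return fd;
}
```

Each new connection calls connect_to_host() again rather than reusing a saved sockaddr.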

Owen
Re: dns and software, was Re: Reliable Cloud host ?
>
> On Wed, Feb 29, 2012 at 4:02 PM, Joe Greco <jgreco@ns.sol.net> wrote:
> > In the specific case of TTL, the problem is made much worse due to the
> > way most client code has hidden this data from developers, so that many
> > developers don't even have any idea that such a thing exists.
> >
> > I'm not sure how to see that as a design failure of the TTL mechanism.
>
> Hi Joe,
>
> You shouldn't see that as a design failure of the TTL mechanism. It
> isn't. It's a failure of the system of which DNS TTL is a component.
> The TTL component itself was reasonably designed.

Think that's pretty much what I said.

> The failure is akin to installing a well-designed sprinkler system
> (the DNS with a TTL) and then shutting off the water valve
> (gethostbyname/getaddrinfo).

No, the water still works as intended. I think your analogy starts to
fail here. It's more like expecting a water suppression system to put
out a grease fire. The TTL mechanism is completely suitable for what
it was originally meant for, and in an environment where everyone has
followed the rules, it works fine. If you take a light office space
with sprinklers and remodel it into a short order grill, the fire
inspector will require you to rework the fire suppression system to
an appropriate system.

Problem is, TTL is a relatively light-duty system that people have felt
free to ignore, overload for other purposes, etc., but there's no fire
inspector to come around and tell people how and why what they've done
is broken. In the case of TTL, the system is even largely hidden from
users, so that it is rarely thought about except now and then on NANOG,
dns-operations, etc. ;-) No wonder it is even poorly understood.

> > I don't see developers ignoring DNS and hardcoding IP addresses into
> > code as a failure of the DNS system.
>
> It isn't. It's a failure of the sockets API design which calls on
> every application developer to (a) translate the name to a set of
> addresses with a mechanism that discards the TTL knowledge and (b)
> implement his own glue between name to address mapping and connect by
> address.
>
> It would be like telling an app developer: here's the ARP function and
> the SEND function. When you Send to an IP address, make sure you
> attach the right destination MAC. Of course the app developer gets it
> wrong most of the time.

That's correct - and it doesn't imply that the system that was engineered
is faulty. In all likelihood, the fault lies with what the app developer
was told.

You originally said:

"If three people died and the building burned down then the sprinkler
system didn't work. It may have sprayed water, but it didn't *work*."

That's not true. If it sprayed water in the manner it was designed to,
then it worked. If three people took sleeping pills and didn't wake up
when the alarms blared, and an arsonist poured ten gallons of gas
everywhere before lighting the fire, the system still worked. It failed
to save those lives or protect the building from burning down, but I
am aware of no fire suppression system that realistically attempts to
address that. It is an unreasonable expectation.

I have a hard time seeing the many self-inflicted wounds of people who
have attempted to abuse TTL for various purposes as a failure of the TTL
design. The design is reasonable.

... JG
Re: dns and software, was Re: Reliable Cloud host ?
On Thu, Mar 1, 2012 at 7:20 AM, Owen DeLong <owen@delong.com> wrote:
> The simpler approach and perfectly viable without mucking
> up what is already implemented and working:
>
> Don't keep returns from GAI/GNI around longer than it takes
> to cycle through your connect() loop immediately after the GAI/GNI call.

The even simpler approach: create an AF_NAME with a sockaddr struct
that contains a hostname instead of an IPvX address. Then let
connect() figure out the details of caching, TTLs, protocol and
address selection, etc. Such a connect() could even support a revised
TCP stack which is able to retry with the other addresses at the first
subsecond timeout rather than camping on each address in sequence for
the typical system default of two minutes.
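AF_NAME does not exist, but the timeout half of this idea can be approximated in a library today. A minimal sketch, assuming POSIX non-blocking sockets: each candidate address gets a non-blocking connect() and a poll() with a short per-address deadline, instead of waiting out the kernel's default connect timeout. connect_by_name() and try_connect() are hypothetical helper names.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

/* Try one address with a non-blocking connect(), waiting at most
 * timeout_ms. Returns a connected socket in blocking mode, or -1. */
static int try_connect(const struct addrinfo *ai, int timeout_ms)
{
    int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, ai->ai_addr, ai->ai_addrlen) != 0) {
        if (errno != EINPROGRESS)
            goto fail;
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, timeout_ms) != 1)
            goto fail;                /* timed out (0) or poll error (-1) */
        int err = 0;
        socklen_t len = sizeof(err);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) != 0 || err != 0)
            goto fail;                /* refused, unreachable, ... */
    }

    if (fcntl(fd, F_SETFL, flags) < 0)  /* restore original blocking mode */
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}

/* Resolve host/service and try each address in turn, moving on after
 * timeout_ms instead of camping on a dead address. */
int connect_by_name(const char *host, const char *service, int timeout_ms)
{
    struct addrinfo hints, *res;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    int fd = -1;
    for (const struct addrinfo *ai = res; ai != NULL && fd < 0; ai = ai->ai_next)
        fd = try_connect(ai, timeout_ms);

    freeaddrinfo(res);
    return fd;
}
```

This tries addresses one after another and abandons a slow attempt; a fuller implementation would keep the earlier attempt open and race it against the next address, along the lines of RFC 6555 ("Happy Eyeballs").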

Regards,
Bill Herrin


Re: dns and software, was Re: Reliable Cloud host ?
On Thu, Mar 1, 2012 at 8:25 AM, Joe Greco <jgreco@ns.sol.net> wrote:
> "If three people died and the building burned down then the sprinkler
> system didn't work. It may have sprayed water, but it didn't *work*."
>
> That's not true.  If it sprayed water in the manner it was designed to,
> then it worked.

That's like the old crack about ICBM interceptors. Why yes, our system
performed swimmingly in the latest test achieving nine out of the ten
criteria for success. Which criterion didn't it achieve? It missed the
target.

Regards,
Bill Herrin


Re: dns and software, was Re: Reliable Cloud host ?
> On Thu, Mar 1, 2012 at 8:25 AM, Joe Greco <jgreco@ns.sol.net> wrote:
> > "If three people died and the building burned down then the sprinkler
> > system didn't work. It may have sprayed water, but it didn't *work*."
> >
> > That's not true.  If it sprayed water in the manner it was designed to,
> > then it worked.
>
> That's like the old crack about ICBM interceptors. Why yes, our system
> performed swimmingly in the latest test achieving nine out of the ten
> criteria for success. Which criterion didn't it achieve? It missed the
> target.

Difference: the fire suppression system worked as designed; the ICBM
interceptor didn't.

That's kind of the whole point here. If you have something like an
automobile that's designed to protect you against certain kinds of
accidents, it isn't a failure if it does not protect you against an
accident that is not reasonably within the protection envelope.

For example, cars these days are designed to protect against many
different types of impacts and provide survivability. It is a failure
if my car is designed to protect against a head-on crash at 30MPH by
use of engineered crumple zones and deploying air bags, and I get into
such an accident and am killed regardless. However, if I fly my car
into a bridge abutment at 150MPH and am instantly pulverized, I am not
prepared to consider that a failure of the car. Likewise, if a freeway
overpass slab falls on my car and crushes me as I drive underneath it,
I am not going to consider that a failure of the car.

There's a definite distinction between a system that fails when it is
deployed and used in the intended manner, and a system that doesn't
work as you'd like it to when it is used in some incorrect manner, which
is really not a failure as the word is normally used.

... JG
Re: dns and software, was Re: Reliable Cloud host ?
On 03/01/2012 06:26 AM, William Herrin wrote:
> On Thu, Mar 1, 2012 at 7:20 AM, Owen DeLong<owen@delong.com> wrote:
>> The simpler approach and perfectly viable without mucking
>> up what is already implemented and working:
>>
>> Don't keep returns from GAI/GNI around longer than it takes
>> to cycle through your connect() loop immediately after the GAI/GNI call.
> The even simpler approach: create an AF_NAME with a sockaddr struct
> that contains a hostname instead of an IPvX address. Then let
> connect() figure out the details of caching, TTLs, protocol and
> address selection, etc. Such a connect() could even support a revised
> TCP stack which is able to retry with the other addresses at the first
> subsecond timeout rather than camping on each address in sequence for
> the typical system default of two minutes.

The effect of what you're recommending is to move all of this
into the kernel, and in the process greatly expand its scope. Also:
even if you did this, you'd be saddled with the same problem because
nothing existing would use an AF_NAME.

The real issue is that gethostbyxxx has been inadequate for a very
long time. Moving it across the kernel boundary solves nothing and
most likely causes even more trouble: what if I want, say, asynchronous
name resolution? What if I want to use SRV records? What if a new DNS
RR comes around -- do I have to recompile the kernel? It's for these
reasons and probably a whole lot more that moving this into connect()
just confuses the actual issues.

When I was writing the first version of DKIM I used a library that I scraped
off the net called ARES. It worked adequately for me, but the most notable
thing was the very fact that I had to scrape it off the net at all. As far as
I could tell, standard distros don't have libraries with lower-level access to
DNS (in my case, it needed to not block). Before positing a super-deluxe
gethostbyxx that does address picking, etc., etc., it would be better to
lobby all of the distros to settle on a decomposed resolver library from
which that and more could be built.

Mike
Re: dns and software, was Re: Reliable Cloud host ?
> On 03/01/2012 06:26 AM, William Herrin wrote:
> > On Thu, Mar 1, 2012 at 7:20 AM, Owen DeLong<owen@delong.com> wrote:
> >> The simpler approach and perfectly viable without mucking
> >> up what is already implemented and working:
> >>
> >> Don't keep returns from GAI/GNI around longer than it takes
> >> to cycle through your connect() loop immediately after the GAI/GNI call.
> > The even simpler approach: create an AF_NAME with a sockaddr struct
> > that contains a hostname instead of an IPvX address. Then let
> > connect() figure out the details of caching, TTLs, protocol and
> > address selection, etc. Such a connect() could even support a revised
> > TCP stack which is able to retry with the other addresses at the first
> > subsecond timeout rather than camping on each address in sequence for
> > the typical system default of two minutes.
>
> The effect of what you're recommending is to move all of this
> into the kernel, and in the process greatly expand its scope. Also:
> even if you did this, you'd be saddled with the same problem because
> nothing existing would use an AF_NAME.
>
> The real issue is that gethostbyxxx has been inadequate for a very
> long time. Moving it across the kernel boundary solves nothing and
> most likely causes even more trouble: what if I want, say, asynchronous
> name resolution? What if I want to use SRV records? What if a new DNS
> RR comes around -- do I have to recompile the kernel? It's for these
> reasons and probably a whole lot more that connect just confuses the
> actual issues.
>
> When I was writing the first version of DKIM I used a library that I scraped
> off the net called ARES. It worked adequately for me, but the most notable
> thing was the very fact that I had to scrape it off the net at all. As far as
> I could tell, standard distros don't have libraries with lower-level access to
> DNS (in my case, it needed to not block). Before positing a super-deluxe
> gethostbyxx that does address picking, etc., etc., it would be better to
> lobby all of the distros to settle on a decomposed resolver library from
> which that and more could be built.

It's deeper than just that, though. The whole paradigm is messy, from
the point of view of someone who just wants to get stuff done. The
examples are (almost?) all fatally flawed. The code that actually gets
at least some of it right ends up being too complex and too hard for
people to understand why things are done the way they are.

Even in the "old days", before IPv6, geez, look at this:

bcopy(host->h_addr_list[n], (char *)&addr->sin_addr.s_addr, sizeof(addr->sin_addr.s_addr));

That's real comprehensible - and it's essentially the data interface
between the resolver library and the system's addressing structures
for syscalls.
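For readers who have not had the pleasure, a minimal sketch of the full IPv4-only pattern that line comes from (fill_sockaddr_in() is a hypothetical helper name). Note that struct hostent carries no TTL at all, so nothing in this interface tells the caller when the copied address goes stale.

```c
#include <string.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Old IPv4-only pattern: look the name up with gethostbyname() and copy
 * the first address into a sockaddr_in. struct hostent has no TTL
 * field, so the caller cannot tell how long this address stays valid. */
int fill_sockaddr_in(const char *name, unsigned short port,
                     struct sockaddr_in *addr)
{
    struct hostent *host = gethostbyname(name);
    if (host == NULL || host->h_addrtype != AF_INET ||
        host->h_addr_list[0] == NULL)
        return -1;

    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    /* memcpy() is the standard-C equivalent of the bcopy() call above;
     * note the reversed argument order (destination first). */
    memcpy(&addr->sin_addr.s_addr, host->h_addr_list[0],
           sizeof(addr->sin_addr.s_addr));
    return 0;
}
```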

On one hand, it's "great" that they wanted to abstract the dirty details
of DNS away from users, but I'd say they failed pretty much even at that.

... JG
Re: dns and software, was Re: Reliable Cloud host ?
On 03/01/2012 07:22 AM, Joe Greco wrote:
> It's deeper than just that, though. The whole paradigm is messy, from
> the point of view of someone who just wants to get stuff done. The
> examples are (almost?) all fatally flawed. The code that actually gets
> at least some of it right ends up being too complex and too hard for
> people to understand why things are done the way they are.
>
> Even in the "old days", before IPv6, geez, look at this:
>
> bcopy(host->h_addr_list[n], (char *)&addr->sin_addr.s_addr, sizeof(addr->sin_addr.s_addr));
>
> That's real comprehensible - and it's essentially the data interface
> between the resolver library and the system's addressing structures
> for syscalls.
>
> On one hand, it's "great" that they wanted to abstract the dirty details
> of DNS away from users, but I'd say they failed pretty much even at that.

Yes, as simple as the normal kernel interface is for net io, getting
to the point that you can do a connect() is both maddeningly
messy and maddeningly inflexible -- the worst of all possible
worlds. We shouldn't kid ourselves that DNS is a simple protocol
though. It has layers of complexity and the policy decisions about
address picking are not easy. But things like dealing with caching
correctly shouldn't be that painful if done right by, for example, discouraging
copying addresses with a wrapper function that validates the
TTL and hands you back a filled-out sockaddr.

But not wanting to block -- which is needed for an event loop or
run-to-completion style interface -- adds a completely new dimension.
Maybe it's the intersection of all of these complexities that's at the root
of why we're stuck with either gethostbyxx or roll your own.

Mike
Re: dns and software, was Re: Reliable Cloud host ?
Hi,

On Mar 1, 2012, at 7:22 AM, Joe Greco wrote:
> On Mar 1, 2012, at 7:01 AM, Michael Thomas wrote:
>> The effect of what you're recommending is to move all of this
>> into the kernel, and in the process greatly expand its scope. Also:
>> even if you did this, you'd be saddled with the same problem because
>> nothing existing would use an AF_NAME.

I always thought the right way to deal with IPv6 would have been to use a 32-bit number from the class E space as a 'network handle' where the actual address (be it IPv4 or IPv6) was handled by the kernel. I suspect this would have allowed the majority of network-utilizing applications to magically just work, regardless of whether the name supplied by gethostbyname/getnameinfo/etc. was mapped to an address with A or AAAA. Probably would make stuff faster too since you'd only have to deal with an unsigned int instead of (worst case) 16 bytes that have to be copied back and forth.

Instead, we have forced application developers to use a really odd mixture of old and new, e.g. 'struct sockaddr_in6' and GNI/GAI. Seems this is the worst of both worlds -- no backwards compatibility yet an adherence to a really broken model that requires applications to know useless details like the length of an address ("what do you mean a sizeof(struct sockaddr) isn't big enough to hold an IPv6 address?") and even its bit patterns.
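The size mismatch is easy to demonstrate. On common platforms such as Linux and the BSDs, struct sockaddr is 16 bytes while struct sockaddr_in6 is 28. A minimal C sketch of the usual workaround, assuming a getaddrinfo() result as input (copy_address() is a hypothetical helper name):

```c
#include <string.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Copy the address out of one getaddrinfo() result. The destination is
 * a struct sockaddr_storage (large enough for any supported family) and
 * the length comes from ai_addrlen, never from sizeof(struct sockaddr),
 * which is too small for a struct sockaddr_in6. */
int copy_address(const struct addrinfo *ai, struct sockaddr_storage *out,
                 socklen_t *outlen)
{
    if (ai->ai_addrlen > sizeof(*out))
        return -1;
    memset(out, 0, sizeof(*out));
    memcpy(out, ai->ai_addr, ai->ai_addrlen);
    *outlen = ai->ai_addrlen;
    return 0;
}
```

The application still has to know that sockaddr_storage exists and why, which is rather the complaint.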

>> Moving it across the kernel boundary solves nothing

Actually, it does. Right now, applications effectively cache the address in their data space, requiring the application developer to go to quite a bit of work to deal with the address changing (or, far more typically, just pretend addresses never change). This has a lot of unfortunate side effects.

>> and
>> most likely causes even more trouble: what if I want, say, asynchronous
>> name resolution?

Set non-blocking on the socket?

>> What if I want to use SRV records? What if a new DNS
>> RR comes around -- do I have to recompile the kernel?

I believe with the exception of A/AAAA, RDATA is typically returned as either opaque (to the DNS) data blobs or names. This means the only stuff the kernel would need to deal with would be the A/AAAA lookups, everything else would be passed back as data, presumably via a new system call.

>> As far as
>> I could tell, standard distros don't have libraries with lower-level access to
>> DNS (in my case, it needed to not block).

There have been lower-level resolver APIs since (at least) BSD 4.3 (man resolver(3)).
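Those APIs do exist, but they illustrate the problem: res_query() hands back the raw DNS message, and extracting a TTL is left to the caller (BIND-derived libresolv also provides ns_initparse() and ns_parserr() for this). A minimal C sketch that walks the RFC 1035 wire format by hand, assuming the buffer came from something like res_query(); first_answer_ttl() is a hypothetical helper name. It returns the TTL of the first answer record only, which may be a CNAME; real code would walk every answer and take the minimum.

```c
#include <stddef.h>
#include <stdint.h>

/* Advance *off past a (possibly compressed) domain name; -1 if truncated. */
static int skip_name(const unsigned char *msg, size_t len, size_t *off)
{
    while (*off < len) {
        unsigned char b = msg[*off];
        if ((b & 0xC0) == 0xC0) {        /* compression pointer ends the name */
            if (*off + 2 > len)
                return -1;
            *off += 2;
            return 0;
        }
        if (b == 0) {                     /* root label terminates the name */
            *off += 1;
            return 0;
        }
        if ((b & 0xC0) != 0)
            return -1;                    /* reserved label types */
        *off += 1u + b;                   /* length byte plus label bytes */
    }
    return -1;
}

static uint16_t get16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Store the TTL of the first answer RR in *ttl; returns 0 or -1. */
int first_answer_ttl(const unsigned char *msg, size_t len, uint32_t *ttl)
{
    if (len < 12)                         /* fixed-size DNS header */
        return -1;
    uint16_t qdcount = get16(msg + 4);
    uint16_t ancount = get16(msg + 6);
    if (ancount == 0)
        return -1;

    size_t off = 12;
    for (uint16_t i = 0; i < qdcount; i++) {
        if (skip_name(msg, len, &off) != 0)
            return -1;
        off += 4;                         /* QTYPE + QCLASS */
    }
    if (skip_name(msg, len, &off) != 0)
        return -1;
    if (off + 10 > len)                   /* TYPE, CLASS, TTL, RDLENGTH */
        return -1;

    const unsigned char *p = msg + off + 4;   /* skip TYPE and CLASS */
    *ttl = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return 0;
}
```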

> It's deeper than just that, though. The whole paradigm is messy, from
> the point of view of someone who just wants to get stuff done. The
> examples are (almost?) all fatally flawed. The code that actually gets
> at least some of it right ends up being too complex and too hard for
> people to understand why things are done the way they are.

Exactly. Even before IPv6, it was icky. Now, it's just crazy. We had an opportunity to fix this with IPv6 since IPv6 required non-trivial kernel hackage. Unfortunately, we didn't take advantage of that opportunity.

Regards,
-drc
Re: dns and software, was Re: Reliable Cloud host ?
On Thu, Mar 1, 2012 at 10:01 AM, Michael Thomas <mike@mtcc.com> wrote:
> On 03/01/2012 06:26 AM, William Herrin wrote:
>> The even simpler approach: create an AF_NAME with a sockaddr struct
>> that contains a hostname instead of an IPvX address. Then let
>> connect() figure out the details of caching, TTLs, protocol and
>> address selection, etc.  Such a connect() could even support a revised
>> TCP stack which is able to retry with the other addresses at the first
>> subsecond timeout rather than camping on each address in sequence for
>> the typical system default of two minutes.
>
>
> The effect of what you're recommending is to move all of this
> into the kernel, and in the process greatly expand its scope.

Hi Michael,

libc != kernel. I want to move the action into the standard libraries
where it can be done once and done well. A little kernel action on top
to parallelize connection attempts where there are multiple candidate
addresses would be gravy, but not required.


> even if you did this, you'd be saddled with the same problem because
> nothing existing would use an AF_NAME.

It won't instantly fix everything so we shouldn't do it at all?


> what if I want, say, asynchronous
> name resolution? What if I want to use SRV records? What if a new DNS
> RR comes around

Then you do it the long way, same as you do now. But in the 99% of the
time that you're initiating a connection the "normal" way, you don't
have to (badly) reinvent the wheel.


> As far as
> I could tell, standard distros don't have libraries with lower-level access to
> DNS (in my case, it needed to not block). Before positing a super-deluxe
> gethostbyxx that does address picking, etc., etc., it would be better to
> lobby all of the distros to settle on a decomposed resolver library from
> which that and more could be built.

(A) Revised standards are -how- multiple OSes from multiple vendors
coordinate the deployment of an identical capability.

(B) Application programmers generally DO want the abstraction from
"DNS" to "Name resolution." If there's an /etc/hosts name or a NIS
name or a Windows name available, you ordinarily want to use it. You
don't want to build extra code to search each name service
independently any more than you want to build extra code to cycle
through candidate addresses.

Regards,
Bill Herrin


--
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On 2012-03-01 17:57 , David Conrad wrote:
> Hi,
>
> On Mar 1, 2012, at 7:22 AM, Joe Greco wrote:
>> On Mar 1, 2012, at 7:01 AM, Michael Thomas wrote:
>>> The effect of what you're recommending is to move all of this
>>> into the kernel, and in the process greatly expand its scope.
>>> Also: even if you did this, you'd be saddled with the same
>>> problem because nothing existing would use an AF_NAME.
>
> I always thought the right way to deal with IPv6 would have been to
> use a 32-bit number from the class E space as a 'network handle'
> where the actual address (be it IPv4 or IPv6) was handled by the
> kernel.

This is the case when you pass in a sockaddr. Note, not a sockaddr_in or
a sockaddr_in6, but just a sockaddr.

There is a nice 14 year old article about this:
http://www.kame.net/newsletter/19980604/

> I suspect this would have allowed the majority of
> network-utilizing applications to magically just work, regardless of
> whether the name supplied by gethostbyname/getnameinfo/etc. was mapped
> to an address with A or AAAA. Probably would make stuff faster too
> since you'd only have to deal with an unsigned int instead of (worst
> case) 16 bytes that have to be copied back and forth.

There is quite a bit more state than that. And actually those addresses
are only 'copied' once: during accept() or connect(), there is no
"speed-loss" per send/recv as the only thing being moved from user space
to kernel space is the file descriptor and the actual data.

[..]
> Instead, we have forced application developers to use a really odd
> mixture of old and new, e.g. 'struct sockaddr_in6' and GNI/GAI.
> Seems this is the worst of both worlds -- no backwards compatibility
> yet an adherence to a really broken model that requires applications
> to know useless details like the length of an address ("what do you
> mean a sizeof(struct sockaddr) isn't big enough to hold an IPv6
> address?") and even its bit patterns.

Ever heard of sockaddr_storage? It was made to solve that little issue.
See also, that article above.
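Here is a small sketch of the sockaddr_storage idiom, assuming POSIX sockets. The storage is large enough for any supported address family, so code can copy whatever address it is given and inspect ss_family only in the one place that needs family-specific fields. save_addr() and addr_port() are hypothetical helper names.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: copy any socket address into caller-owned storage.
 * Returns 0 on success, -1 if len would not fit. */
int save_addr(struct sockaddr_storage *dst, const struct sockaddr *src, socklen_t len)
{
    if (len > sizeof(*dst))
        return -1;
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, len);
    return 0;
}

/* Hypothetical helper: the only code that needs to know the family.
 * Returns the port in host byte order, or -1 for families without one. */
int addr_port(const struct sockaddr_storage *ss)
{
    switch (ss->ss_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *)ss)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *)ss)->sin6_port);
    default:
        return -1;
    }
}
```

In real code the storage is often filled directly. For example, you can pass it to accept(), cast to struct sockaddr *, with the length set to sizeof(struct sockaddr_storage).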

[..]
> Exactly. Even before IPv6, it was icky. Now, it's just crazy. We
> had an opportunity to fix this with IPv6 since IPv6 required
> non-trivial kernel hackage. Unfortunately, we didn't take advantage
> of that opportunity.

What you are talking about is an API wrapper. Depending on platform
these have existed for years already. Quite a few do not expose
addresses at all to the calling code.

That is one of the many reasons why dropping the IPv6-enabled Winsock DLL
into place 14 years ago made various Winsock applications understand IPv6.

Greets,
Jeroen
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On 03/01/2012 08:57 AM, David Conrad wrote:
>
>> Moving it across the kernel boundary solves nothing
> Actually, it does. Right now, applications effectively cache the address in their data space, requiring the application developer to go to quite a bit of work to deal with the address changing (or, far more typically, just pretend addresses never change). This has a lot of unfortunate side effects.

My rule of thumb is for this sort of thing "does it *require* kernel level access?"
In this case, the answer is manifestly "no". As far as TTLs go in particular, most
apps would work perfectly well always doing real DNS socket IO to a local resolver
each time, which has the side effect that they would honor the TTL, as well as benefiting
from cross process caching. It could be done in the kernel, but it would be introducing
a *lot* of complexity and inflexibility.

Even if you did want super high performance local DNS resolution, there are
still a lot of other ways to achieve that besides jamming it into the kernel. A
lot of the beauty of UNIX is that the kernel system interface is simple... dragging
more into the kernel is aesthetically wrong.

>>> What if I want to use SRV records? What if a new DNS
>>> RR comes around -- do i have do recompile the kernel?
> I believe with the exception of A/AAAA, RDATA is typically returned as either opaque (to the DNS) data blobs or names. This means the only stuff the kernel would need to deal with would be the A/AAAA lookups, everything else would be passed back as data, presumably via a new system call.

SRV records? This is starting to get really messy inside the kernel and for
no good reason that I can see.

>
>>> As far as
>>> I could tell, standard distros don't have libraries with lower level access to
>>> DNS (in my case, it needed to not block).
> There have been lower-level resolver APIs since (at least) BSD 4.3 (man resolver(3)).

This is all getting sort of hazy since it was 8 years ago, but yes res_XX existed,
and hence the ares_ analog that I used. Maybe all that's really needed for low
level access primitives is a merger of res_ and ares_... asynchronous resolution
is a fairly important feature for modern event loop like things. But I don't claim
to be a DNS wonk so it might be worse than that.

Mike
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On 03/01/2012 08:58 AM, William Herrin wrote:
> On Thu, Mar 1, 2012 at 10:01 AM, Michael Thomas<mike@mtcc.com> wrote:
>> On 03/01/2012 06:26 AM, William Herrin wrote:
>>> The even simpler approach: create an AF_NAME with a sockaddr struct
>>> that contains a hostname instead of an IPvX address. Then let
>>> connect() figure out the details of caching, TTLs, protocol and
>>> address selection, etc. Such a connect() could even support a revised
>>> TCP stack which is able to retry with the other addresses at the first
>>> subsecond timeout rather than camping on each address in sequence for
>>> the typical system default of two minutes.
>>
>> The effect of what you're recommending is to move all of this
>> into the kernel, and in the process greatly expand its scope.
> Hi Michael,
>
> libc != kernel. I want to move the action into the standard libraries
> where it can be done once and done well. A little kernel action on top
> to parallelize connection attempts where there are multiple candidate
> addresses would be gravy, but not required.

connect(2) is a kernel level call just like open(2), etc. It may
have a thin wrapper, but that's OS dependent, IIRC.

man connect 2:

"The connect() system call connects the socket referred to by the file descriptor..."

Mike
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Thu, Mar 1, 2012 at 1:32 PM, Michael Thomas <mike@mtcc.com> wrote:
> On 03/01/2012 08:58 AM, William Herrin wrote:
>> libc != kernel. I want to move the action into the standard libraries
>> where [resolve and connect] can be done once and done well.
>> A little kernel action on top
>> to parallelize connection attempts where there are multiple candidate
>> addresses would be gravy, but not required.
>
> connect(2) is a kernel level call just like open(2), etc. It may
> have a thin wrapper, but that's OS dependent, IIRC.
>
> man connect 2:
>
> "The connect() system call connects the socket referred to by the file
> descriptor..."

Then name the new one something else and document it in man section 3.
Next objection?

-Bill
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
Michael,

On Mar 1, 2012, at 10:00 AM, Michael Thomas wrote:
> My rule of thumb is for this sort of thing "does it *require* kernel level access?"
> In this case, the answer is manifestly "no".

This is tilting at windmills since it's wildly unlikely anything will change, but...

The idea is to add a level of indirection that does not currently exist, similar to the mapping of filename/file handle/inode in the filesystem. This layer of indirection allows the kernel to remap things as it sees fit without impacting the application. If such functionality existed, the kernel could manage the mapping between name and address to do things like honoring DNS TTLs, transparently handling renumbering events, dealing with protocol transitions even during a connection, etc. As things are now, it's like having to rewrite non-trivial sections of code for _all_ disk-aware applications because we've gone from a 32-bit file system to a 64-bit file system, even though the vast majority of those applications couldn't care less.

> SRV records?

They do not have addresses in their RDATA; they have names.

Regards,
-drc
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
Jeroen,

On Mar 1, 2012, at 9:25 AM, Jeroen Massar wrote:
>> I always thought the right way to deal with IPv6 would have been to
>> use a 32-bit number from the class E space as a 'network handle'
>> where the actual address (be it IPv4 or IPv6) was handled by the
>> kernel.
>
> This is the case when you pass in a sockaddr. Note, not a sockaddr_in or
> a sockaddr_in6, but just a sockaddr.

Sorry? On which system? As far as I'm aware, there are no libraries that make use of class E addresses to act as a layer of indirection similar to file handles. Would love to know such exists.

> There is a nice 14 year old article about this:
> http://www.kame.net/newsletter/19980604/

Quoting from that article: "This way the network address and address family is will not live together, and leads to bunch of if/switch statement and mistakes in programming. " which is exactly the point. It has been 14 years and people are _STILL_ discussing this.

> And actually those addresses
> are only 'copied' once: during accept() or connect(),

Assuming the application doesn't need to copy the address, ever.

> Ever heard of sockaddr_storage?

Oddly, yes. It still astonishes me that sizeof(struct sockaddr) < sizeof(struct sockaddr_storage).
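The size mismatch is easy to demonstrate. This sketch assumes a C11 compiler for _Static_assert. The exact byte counts depend on the platform (on Linux they are typically 16, 16, 28 and 128), so the sketch checks only the relationships between the sizes.

```c
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Compile-time checks: a plain sockaddr cannot hold an IPv6 address,
 * but sockaddr_storage is meant to hold any supported family. */
_Static_assert(sizeof(struct sockaddr) < sizeof(struct sockaddr_in6),
               "struct sockaddr is too small for struct sockaddr_in6");
_Static_assert(sizeof(struct sockaddr_in6) <= sizeof(struct sockaddr_storage),
               "struct sockaddr_storage can hold struct sockaddr_in6");

/* Print the actual sizes on this platform. */
void show_sockaddr_sizes(void)
{
    printf("sockaddr:         %zu\n", sizeof(struct sockaddr));
    printf("sockaddr_in:      %zu\n", sizeof(struct sockaddr_in));
    printf("sockaddr_in6:     %zu\n", sizeof(struct sockaddr_in6));
    printf("sockaddr_storage: %zu\n", sizeof(struct sockaddr_storage));
}
```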

> It was made to solve that little issue. See also, that article above.

Thus requiring people to go in and muck with code thereby increasing the cost of migration with obvious effect.

> What you are talking about is an API wrapper. Depending on platform
> these have existed for years already. Quite a few do not expose
> addresses at all to the calling code.

And yet, look at the code Mark Andrews just referenced as his recommended way of dealing with initiating connections. How many applications actually do anything like that? More to the point, how many books/articles/etc. exist that reference these APIs you're talking about vs. how many reference the traditional way one goes about dealing with networks?

Rhetorical questions, no need to answer. Got tired of tilting at this windmill some time ago and I know nothing will change. I'm just amazed that people defend the abominable kludge that is the existing common sockets/resolver APIs.

Regards,
-drc
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Mar 1, 2012, at 6:26 AM, William Herrin wrote:

> On Thu, Mar 1, 2012 at 7:20 AM, Owen DeLong <owen@delong.com> wrote:
>> The simpler approach and perfectly viable without mucking
>> up what is already implemented and working:
>>
>> Don't keep returns from GAI/GNI around longer than it takes
>> to cycle through your connect() loop immediately after the GAI/GNI call.
>
> The even simpler approach: create an AF_NAME with a sockaddr struct
> that contains a hostname instead of an IPvX address. Then let
> connect() figure out the details of caching, TTLs, protocol and
> address selection, etc. Such a connect() could even support a revised
> TCP stack which is able to retry with the other addresses at the first
> subsecond timeout rather than camping on each address in sequence for
> the typical system default of two minutes.
>

That's not simpler for the following reasons:

1. It takes away abilities to manage the connect() process that some
applications want.

2. It requires a rewrite of a whole lot of software built on the current
mechanisms.

Most systems provide a mechanism for overriding the timeout for
connect().
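One common way to do this for a single socket, without changing the system default, is a non-blocking connect() followed by poll() and a check of SO_ERROR. Below is a minimal sketch that assumes POSIX. connect_timeout() is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect fd to addr, giving up after timeout_ms.
 * Returns 0 on success, or -1 with errno set (ETIMEDOUT if the wait ran
 * out). The socket's original blocking mode is restored either way. */
int connect_timeout(int fd, const struct sockaddr *addr, socklen_t len, int timeout_ms)
{
    int flags = fcntl(fd, F_GETFL);
    int rc, err = 0;
    socklen_t elen = sizeof(err);
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    rc = connect(fd, addr, len);
    if (rc < 0 && errno == EINPROGRESS) {
        do {                            /* a retry after EINTR restarts the wait */
            rc = poll(&pfd, 1, timeout_ms);
        } while (rc < 0 && errno == EINTR);
        if (rc == 0)
            err = ETIMEDOUT;
        else if (rc < 0)
            err = errno;
        else if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &elen) < 0)
            err = errno;                /* otherwise err holds the connect result */
        rc = err ? -1 : 0;
    } else if (rc < 0) {
        err = errno;                    /* failed immediately, e.g. ECONNREFUSED */
    }
    fcntl(fd, F_SETFL, flags);          /* back to the caller's mode */
    errno = err;
    return rc;
}
```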

Further, there are lots of classes, libraries, etc. that you can already use
if you want to abstract the gai/gni + connect functionality.

What exists isn't broken at the API level. Please stop trying to fix what
is not broken.

Owen
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
>
> It's deeper than just that, though. The whole paradigm is messy, from
> the point of view of someone who just wants to get stuff done. The
> examples are (almost?) all fatally flawed. The code that actually gets
> at least some of it right ends up being too complex and too hard for
> people to understand why things are done the way they are.
>
> Even in the "old days", before IPv6, geez, look at this:
>
> bcopy(host->h_addr_list[n], (char *)&addr->sin_addr.s_addr, sizeof(addr->sin_addr.s_addr));
>
> That's real comprehensible - and it's essentially the data interface
> between the resolver library and the system's addressing structures
> for syscalls.
>
> On one hand, it's "great" that they wanted to abstract the dirty details
> of DNS away from users, but I'd say they failed pretty much even at that.
>
> ... JG

I think that the modern set of getaddrinfo and connect is actually not that complicated:

/* Hints for getaddrinfo() (tell it what we want) */
memset(&addrinfo, 0, sizeof(addrinfo));   /* Zero out the buffer */
addrinfo.ai_family = PF_UNSPEC;           /* Any and all address families */
addrinfo.ai_socktype = SOCK_STREAM;       /* Stream Socket */
addrinfo.ai_protocol = IPPROTO_TCP;       /* TCP */
/* Ask the resolver library for the information. Exit on failure. */
/* argv[1] is the hostname passed in by the user. "demo" is the service name */
/* Extra parentheses: without them, rval would get the result of the != test */
if ((rval = getaddrinfo(argv[1], "demo", &addrinfo, &res)) != 0) {
    fprintf(stderr, "%s: Failed to resolve address information: %s\n",
            argv[0], gai_strerror(rval));
    exit(2);
}

/* Iterate through the results */
for (r = res; r; r = r->ai_next) {
    /* Create a socket configured for the next candidate */
    sockfd6 = socket(r->ai_family, r->ai_socktype, r->ai_protocol);
    if (sockfd6 < 0) {
        /* e.g. EAFNOSUPPORT on a host without IPv6: skip this candidate */
        perror("socket");
        continue;
    }
    /* Try to connect */
    if (connect(sockfd6, r->ai_addr, r->ai_addrlen) < 0) {
        /* Failed to connect: save errno before close()/fprintf() can change it */
        e_save = errno;
        /* Destroy socket */
        (void) close(sockfd6);
        /* Tell the user that this attempt failed, with error details */
        fprintf(stderr, "%s: Failed attempt to %s: %s\n", argv[0],
                get_ip_str((struct sockaddr *)r->ai_addr, buf, BUFLEN),
                strerror(e_save));
    } else { /* Success! */
        /* Inform the user */
        snprintf(s, BUFLEN, "%s: Succeeded to %s.", argv[0],
                 get_ip_str((struct sockaddr *)r->ai_addr, buf, BUFLEN));
        debug(5, argv[0], s);
        /* Flag our success */
        success++;
        /* Stop iterating */
        break;
    }
}
/* Out of the loop. Either we succeeded or ran out of possibilities */
if (success == 0) { /* If we ran out of possibilities... */
    /* Inform the user, free up the resources, and exit */
    fprintf(stderr, "%s: Failed to connect to %s.\n", argv[0], argv[1]);
    freeaddrinfo(res);
    exit(5);
}
/* Succeeded. Inform the user and continue with the application */
printf("%s: Successfully connected to %s at %s on FD %d.\n", argv[0], argv[1],
       get_ip_str((struct sockaddr *)r->ai_addr, buf, BUFLEN),
       sockfd6);
/* Free up the memory held by the resolver results */
freeaddrinfo(res);

It's really hard to make a case that this is all that complex.

I put a lot of extra comments in there to make it clear what's happening for people who may not be used to coding in C. It also contains a whole lot of extra user notification and debugging instrumentation because it is designed as an example people can use to learn with.

Yes, this was a lot messier and a lot stranger and harder to get right with get*by{name,addr}, but, those days are long gone and anyone still coding with those needs to move forward.
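The example above calls a helper, get_ip_str(), whose definition isn't shown. One plausible implementation, written here as an assumption rather than the code actually used, formats either address family with inet_ntop():

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed get_ip_str(): write the numeric form of the address in sa into
 * s (at most maxlen bytes) and return s, so it can be used inside printf. */
char *get_ip_str(const struct sockaddr *sa, char *s, size_t maxlen)
{
    const void *addr;

    switch (sa->sa_family) {
    case AF_INET:
        addr = &((const struct sockaddr_in *)sa)->sin_addr;
        break;
    case AF_INET6:
        addr = &((const struct sockaddr_in6 *)sa)->sin6_addr;
        break;
    default:
        snprintf(s, maxlen, "Unknown AF");
        return s;
    }
    if (inet_ntop(sa->sa_family, addr, s, (socklen_t)maxlen) == NULL)
        snprintf(s, maxlen, "?");       /* e.g. buffer too small */
    return s;
}
```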

Owen
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
In message <CAC38B59-1F54-4788-87A2-A1A8BE453500@delong.com>, Owen DeLong writes:
> I think that the modern set of getaddrinfo and connect is actually not that complicated:
> [..]
> Owen
>

These days you want something more complicated, as everyone is or
will soon be multi-homed. The basic loop above has very bad error
characteristics if the first machines are not reachable. I've got
working select, poll and thread based examples here:

http://www.isc.org/community/blog/201101/how-to-connect-to-a-multi-homed-server-over-tcp.

From http://www.isc.org/files/imce/select-connect_0.c:

/*
* Copyright (C) 2011 Internet Systems Consortium, Inc. ("ISC")
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND ISC DISCLAIMS ALL WARRANTIES WITH
* REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
* AND FITNESS. IN NO EVENT SHALL ISC BE LIABLE FOR ANY SPECIAL, DIRECT,
* INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
* LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
* OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
* PERFORMANCE OF THIS SOFTWARE.
*/

#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define TIMEOUT 500 /* ms */

int
connect_to_host(struct addrinfo *res0) {
	struct addrinfo *res;
	int fd = -1, n, i, j, flags, count, max = -1, *fds;
	struct timeval *timeout, timeout0 = { 0, TIMEOUT * 1000}, tv;
	fd_set fdset, wrset;

	/*
	 * Work out how many possible descriptors we could use.
	 */
	for (res = res0, count = 0; res; res = res->ai_next)
		count++;
	fds = calloc(count, sizeof(*fds));
	if (fds == NULL) {
		perror("calloc");
		goto cleanup;
	}
	FD_ZERO(&fdset);
	for (res = res0, i = 0, count = 0; res; res = res->ai_next) {
		fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
		if (fd == -1) {
			/*
			 * If AI_ADDRCONFIG is not supported we will get
			 * EAFNOSUPPORT returned. Behave as if the address
			 * was not there.
			 */
			if (errno != EAFNOSUPPORT)
				perror("socket");
			else if (res->ai_next != NULL)
				continue;
		} else if (fd >= FD_SETSIZE) {
			close(fd);
		} else if ((flags = fcntl(fd, F_GETFL)) == -1) {
			perror("fcntl");
			close(fd);
		} else if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
			perror("fcntl");
			close(fd);
		} else if (connect(fd, res->ai_addr, res->ai_addrlen) == -1) {
			if (errno != EINPROGRESS) {
				perror("connect");
				close(fd);
			} else {
				/*
				 * Record the information for this descriptor.
				 */
				fds[i] = fd;
				FD_SET(fd, &fdset);
				if (max == -1 || fd > max)
					max = fd;
				count++;
				i++;
			}
		} else {
			/*
			 * We connected without blocking.
			 */
			goto done;
		}

		if (count == 0)
			continue;

		assert(max != -1);
		do {
			if (res->ai_next != NULL) {
				/*
				 * Pass select() a copy: some systems
				 * (e.g. Linux) overwrite the timeout
				 * with the time remaining.
				 */
				tv = timeout0;
				timeout = &tv;
			} else
				timeout = NULL;

			/* The write bit is set on both success and failure. */
			wrset = fdset;
			n = select(max + 1, NULL, &wrset, NULL, timeout);
			if (n == 0) {
				timeout0.tv_usec >>= 1;
				break;
			}
			if (n < 0) {
				if (errno == EAGAIN || errno == EINTR)
					continue;
				perror("select");
				fd = -1;
				goto done;
			}
			for (fd = 0; fd <= max; fd++) {
				if (FD_ISSET(fd, &wrset)) {
					socklen_t len;
					int err;
					for (j = 0; j < i; j++)
						if (fds[j] == fd)
							break;
					assert(j < i);
					/*
					 * Test to see if the connect
					 * succeeded.
					 */
					len = sizeof(err);
					n = getsockopt(fd, SOL_SOCKET,
						       SO_ERROR, &err, &len);
					if (n != 0 || err != 0) {
						close(fd);
						FD_CLR(fd, &fdset);
						fds[j] = -1;
						count--;
						continue;
					}
					/* Connect succeeded. */
					goto done;
				}
			}
		} while (timeout == NULL && count != 0);
	}

	/* We failed to connect. */
	fd = -1;

done:
	/* Close all other descriptors we have created. */
	for (j = 0; j < i; j++)
		if (fds[j] != fd && fds[j] != -1) {
			close(fds[j]);
		}

	if (fd != -1) {
		/* Restore default blocking behaviour. */
		if ((flags = fcntl(fd, F_GETFL)) != -1) {
			flags &= ~O_NONBLOCK;
			if (fcntl(fd, F_SETFL, flags) == -1)
				perror("fcntl");
		} else
			perror("fcntl");
	}

cleanup:
	/* Free everything. */
	if (fds) free(fds);

	return (fd);
}

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Thu, Mar 1, 2012 at 4:07 PM, Owen DeLong <owen@delong.com> wrote:
> I think that the modern set of getaddrinfo and connect is actually not that complicated:

Owen,

It took you 50 lines of code to do
'socket=connect("www.google.com",80,TCP);' and you still managed to
produce a version which, due to the timeout on dead addresses, is
worthless for any kind of interactive program like a web browser. And
because that code isn't found in a system library, every single
application programmer has to write it all over again.
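For comparison, here is roughly what that one-call interface could look like as an ordinary library function built on today's calls. This is a sketch under assumptions: tcp_connect() and its per_try_ms parameter are hypothetical names, and libc provides no such function. The function resolves the name afresh on every call, so it holds no addresses between calls. It also bounds each attempt instead of waiting out the system default timeout. It still tries candidates one at a time, unlike the overlapping attempts in Mark Andrews' select() code in this thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wait up to timeout_ms for a non-blocking connect() on fd to finish.
 * Returns 0 if connected, -1 otherwise. */
static int wait_connected(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int err = 0, rc;
    socklen_t len = sizeof(err);

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0)
        return -1;                      /* timed out, or poll() failed */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
        return -1;                      /* connect() itself failed */
    return 0;
}

/* Hypothetical one-call API: resolve host/service now, try each candidate
 * for at most per_try_ms, and return a connected blocking TCP socket or -1. */
int tcp_connect(const char *host, const char *service, int per_try_ms)
{
    struct addrinfo hints, *res, *r;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* whatever families the name has */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    for (r = res; r != NULL; r = r->ai_next) {
        int flags;

        fd = socket(r->ai_family, r->ai_socktype, r->ai_protocol);
        if (fd < 0)
            continue;                   /* e.g. family unsupported here */
        flags = fcntl(fd, F_GETFL);
        if (flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0 &&
            (connect(fd, r->ai_addr, r->ai_addrlen) == 0 ||
             (errno == EINPROGRESS && wait_connected(fd, per_try_ms) == 0))) {
            fcntl(fd, F_SETFL, flags);  /* back to blocking mode */
            break;
        }
        close(fd);                      /* this candidate failed; try the next */
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```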

I'm a fan of Rube Goldberg machines but that was ridiculous.

Regards,
Bill Herrin
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
In message <CAP-guGXLpzai4LrxyJcNn06yQ1jAEu4QeRpVzGRah=+OGLy9Zw@mail.gmail.com>, William Herrin writes:
> On Thu, Mar 1, 2012 at 4:07 PM, Owen DeLong <owen@delong.com> wrote:
> > I think that the modern set of getaddrinfo and connect is actually not that complicated:
>
> Owen,
>
> It took you 50 lines of code to do
> 'socket=connect("www.google.com",80,TCP);' and you still managed to
> produce a version which, due to the timeout on dead addresses, is
> worthless for any kind of interactive program like a web browser. And
> because that code isn't found in a system library, every single
> application programmer has to write it all over again.

And your 'socket=connect("www.google.com",80,TCP);' won't work for
a web browser either unless you are using threads and are willing
to have the thread stall.

The existing connect() semantics actually work well for browsers
but they need to be properly integrated into the system as a whole.
Nameservers have similar connect() issues as web browsers with one
advantage, most of the time we are connecting to a machine we have
just connected to via UDP. That doesn't mean we don't do non-blocking
connect however.

> I'm a fan of Rube Goldberg machines but that was ridiculous.
>
> Regards,
> Bill Herrin
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
William,

I could have done it in far fewer lines of code, but it would have been much less readable.

Not blocking on the connect() call is a little more complex, but, not terribly so. It does, however, again, make the code quite a bit less readable.

There are libraries available that abstract everything I did there and you are welcome to use them.

Since C does not support overloading, they export different functions for the behavior you seek.

If you want, program in Python where the libraries do provide the abstraction you seek. Of course, that means you have to cope with Python's other disgusting habits like spaces are meaningful and variables are indistinguishable from code, but, there's always a tradeoff.

You don't have to reinvent what I've done. Neither does every or any other application programmer.
You are welcome to use any of the many connection abstraction libraries that are available in open source. I suggest you make a trip through google code.

Owen

On Mar 1, 2012, at 2:09 PM, William Herrin wrote:

> On Thu, Mar 1, 2012 at 4:07 PM, Owen DeLong <owen@delong.com> wrote:
>> I think that the modern set of getaddrinfo and connect is actually not that complicated:
>
> Owen,
>
> It took you 50 lines of code to do
> 'socket=connect("www.google.com",80,TCP);' and you still managed to
> produce a version which, due to the timeout on dead addresses, is
> worthless for any kind of interactive program like a web browser. And
> because that code isn't found in a system library, every single
> application programmer has to write it all over again.
>
> I'm a fan of Rube Goldberg machines but that was ridiculous.
>
> Regards,
> Bill Herrin
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Thu, Mar 1, 2012 at 5:37 PM, Owen DeLong <owen@delong.com> wrote:
> You don't have to reinvent what I've done. Neither does every
> or any other application programmer.
> You are welcome to use any of the many connection
> abstraction libraries that are available in open source.
> I suggest you make a trip through google code.

Which is what everybody basically does. And when it works during the
decidedly non-rigorous testing, they move on to the next problem...
with code that doesn't perform well in the corner cases. Such as when
a host has just been renumbered or one of the host's addresses is
unreachable.

And because most everybody has made more or less the same errors, the
DNS TTL fails to cause their applications to work as intended and
loses its utility as a tool to facilitate renumbering.


> If you want, program in Python where the libraries do
> provide the abstraction you seek. Of course, that
> means you have to cope with Python's other disgusting
> habits like spaces are meaningful and variables are
> indistinguishable from code, but, there's always a tradeoff.

::shudder:: I don't *want* to do anything in python. The occasional
reality of a situation dictates that I do some work in python, but I
most definitely don't *want* to.

Regards,
Bill Herrin
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Thu, Mar 01, 2012 at 05:57:11PM -0500, William Herrin wrote:
> Which is what everybody basically does. And when it works during the
> decidedly non-rigorous testing, they move on to the next problem...
> with code that doesn't perform well in the corner cases. Such as when
> a host has just been renumbered or one of the host's addresses is
> unreachable.
>
> And because most everybody has made more or less the same errors, the
> DNS TTL fails to cause their applications to work as intended and
> loses its utility as a tool to facilitate renumbering.

Is there an RFC or BCP that describes how to correctly write such a
library? Perhaps we need to work to get such a thing, and then push
for RFC-compliance of the resolver libraries, or develop a set of
libraries named after and fully compliant with the RFC and get
software to use them.
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Mar 1, 2012, at 2:57 PM, William Herrin wrote:

> On Thu, Mar 1, 2012 at 5:37 PM, Owen DeLong <owen@delong.com> wrote:
>> You don't have to reinvent what I've done. Neither does every
>> or any other application programmer.
>> You are welcome to use any of the many connection
>> abstraction libraries that are available in open source.
>> I suggest you make a trip through google code.
>
> Which is what everybody basically does. And when it works during the
> decidedly non-rigorous testing, they move on to the next problem...
> with code that doesn't perform well in the corner cases. Such as when
> a host has just been renumbered or one of the host's addresses is
> unreachable.
>

Then push for better written abstraction libraries. There's no need to
break the current functionality of the underlying system calls and
libc functions which would be needed by any such library anyway.

> And because most everybody has made more or less the same errors, the
> DNS TTL fails to cause their applications to work as intended and
> loses its utility as a tool to facilitate renumbering.
>

Since I don't write applications for a living, I will admit I haven't rigorously
tested any of the libraries out there, but, I'm willing to bet that someone,
somewhere has probably written a good one by now.

>
>> If you want, program in Python where the libraries do
>> provide the abstraction you seek. Of course, that
>> means you have to cope with Python's other disgusting
>> habits like spaces are meaningful and variables are
>> indistinguishable from code, but, there's always a tradeoff.
>
> ::shudder:: I don't *want* to do anything in python. The occasional
> reality of a situation dictates that I do some work in python, but I
> most definitely don't *want* to.

Believe me, I'm in the same boat on that one. However, it is the only
language I know of that provides the kind of interface you are demanding.
Perhaps this should tell you something about what you are asking for. ;-)

Owen
Re: dns and software, was Re: Reliable Cloud host ?
On Thu, Mar 1, 2012 at 8:02 PM, Owen DeLong <owen@delong.com> wrote:
> There's no need to
> break the current functionality of the underlying system calls and
> libc functions which would be needed by any such library anyway.

Owen,

Point to one sentence written by anybody in this entire thread in
which breaking current functionality was proposed.


>> And because most everybody has made more or less the same errors, the
>> DNS TTL fails to cause their applications to work as intended and
>> loses its utility as a tool to facilitate renumbering.
>
> Since I don't write applications for a living, I will admit I haven't rigorously
> tested any of the libraries out there, but, I'm willing to bet that someone,
> somewhere has probably written a good one by now.

Yeah, and if you give me a few weeks I can probably find it amidst all
the others which aren't so hot.

Regards,
Bill



--
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
Re: dns and software, was Re: Reliable Cloud host ?
On Mar 1, 2012, at 5:15 PM, William Herrin wrote:

> On Thu, Mar 1, 2012 at 8:02 PM, Owen DeLong <owen@delong.com> wrote:
>> There's no need to
>> break the current functionality of the underlying system calls and
>> libc functions which would be needed by any such library anyway.
>
> Owen,
>
> Point to one sentence written by anybody in this entire thread in
> which breaking current functionality was proposed.
>
When you said that:

connect(char *name, uint16_t port) should work

That can't work without breaking the existing functionality of the connect() system call.

>
>>> And because most everybody has made more or less the same errors, the
>>> DNS TTL fails to cause their applications to work as intended and
>>> loses its utility as a tool to facilitate renumbering.
>>
>> Since I don't write applications for a living, I will admit I haven't rigorously
>> tested any of the libraries out there, but, I'm willing to bet that someone,
>> somewhere has probably written a good one by now.
>
> Yeah, and if you give me a few weeks I can probably find it amidst all
> the others which aren't so hot.
>

I doubt it would take weeks, but, in any case, it's probably faster than writing and
debugging your own.

Owen
Re: dns and software, was Re: Reliable Cloud host ?
On Mar 1, 2012, at 17:10, William Herrin <bill@herrin.us> wrote:
> If it took you 50 lines of code to do
> 'socket=connect("www.google.com",80,TCP);' and you still managed to
> produce a version which, due to the timeout on dead addresses, is
> worthless for any kind of interactive program like a web browser. And
> because that code isn't found in a system library, every single
> application programmer has to write it all over again.
>
> I'm a fan of Rube Goldberg machines but that was ridiculous.

I'm thinking for this to work it would have to be 2 separate calls:

Call 1 being to the resolver (using lwres, system resolver, or
whatever you want to use) and returning a list of struct addrinfo,
the same as gai (getaddrinfo()) does currently. If applications need
TTL/SRV/$NEWRR awareness it would be implemented here.

Call 2 would be a "happy eyeballs" connect syscall (mconnect? In the
spirit of sendmmsg) which accepts an array of struct addrinfo and
returns an fd. In the case of O_NONBLOCK it would return a dummy fd
(as non-blocking connects do currently) then once one of the
connections finishes handshake the kernel connects it to the FD and
signals writable to trigger select/poll/epoll. This allows developers
to keep using the same loops (and most of the APIs) they're already
comfortable with, keeps DNS out of the kernel, but hopefully provides
a better and easier to use connect() experience, for SOCK_STREAM at
least.

It's not as neat as a single connect() accepting a name, but seems to
be a happy medium and provides a standardized/predictable connect()
experience without breaking existing APIs.
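For comparison, the "dummy fd that later signals writable" behaviour described above is how a non-blocking connect() already reports completion on POSIX systems. Below is a minimal sketch of a single connect bounded by a timeout, assuming poll() is available; the name connect_with_timeout is made up for illustration and is not a libc function.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: one connect() bounded by timeout_ms.
 * Returns a connected socket (back in blocking mode) or -1 with errno set. */
int connect_with_timeout(const struct sockaddr *addr, socklen_t addrlen,
                         int timeout_ms)
{
    int fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Non-blocking mode makes connect() return at once with EINPROGRESS. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, addr, addrlen) < 0) {
        if (errno != EINPROGRESS)
            goto fail;

        /* The socket turns writable once the handshake succeeds or fails. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int n = poll(&pfd, 1, timeout_ms);
        if (n == 0)
            errno = ETIMEDOUT;
        if (n <= 0)
            goto fail;

        /* SO_ERROR carries the result of the asynchronous connect. */
        int err = 0;
        socklen_t len = sizeof err;
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
            goto fail;
        if (err != 0) {
            errno = err;
            goto fail;
        }
    }

    if (fcntl(fd, F_SETFL, flags) < 0)   /* restore blocking mode */
        goto fail;
    return fd;

fail: {
        int saved = errno;               /* close() must not clobber errno */
        close(fd);
        errno = saved;
        return -1;
    }
}
```

An mconnect()-style call would, in effect, run several of these attempts side by side and hand back whichever socket finishes its handshake first.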

~Matt
Re: dns and software, was Re: Reliable Cloud host ?
In message <596196444196086313@unknownmsgid>, Matt Addison writes:
> On Mar 1, 2012, at 17:10, William Herrin <bill@herrin.us> wrote:
> > If it took you 50 lines of code to do
> > 'socket=connect("www.google.com",80,TCP);' and you still managed to
> > produce a version which, due to the timeout on dead addresses, is
> > worthless for any kind of interactive program like a web browser. And
> > because that code isn't found in a system library, every single
> > application programmer has to write it all over again.
> >
> > I'm a fan of Rube Goldberg machines but that was ridiculous.
>
> I'm thinking for this to work it would have to be 2 separate calls:
>
> Call 1 being to the resolver (using lwres, system resolver, or
> whatever you want to use) and returning an array of struct addrinfo-
> same as gai does currently. If applications need TTL/SRV/$NEWRR
> awareness it would be implemented here.
>
> Call 2 would be a "happy eyeballs" connect syscall (mconnect? In the
> spirit of sendmmsg) which accepts an array of struct addrinfo and
> returns an fd. In the case of O_NONBLOCK it would return a dummy fd
> (as non-blocking connects do currently) then once one of the
> connections finishes handshake the kernel connects it to the FD and
> signals writable to trigger select/poll/epoll. This allows developers
> to keep using the same loops (and most of the APIs) they're already
> comfortable with, keeps DNS out of the kernel, but hopefully provides
> a better and easier to use connect() experience, for SOCK_STREAM at
> least.
>
> It's not as neat as a single connect() accepting a name, but seems to
> be a happy medium and provides a standardized/predictable connect()
> experience without breaking existing APIs.
>
> ~Matt

And you can do the same in userland with kqueue and similar.

int
connectxx(struct addrinfo *res0, int *fd, int *timeout, void **state);

Return values:
    0            *fd is a connected socket.
    EINPROGRESS  Wait on '*fd' with a timeout of 'timeout' nanoseconds.
    ETIMEDOUT    connect failed.

If timeout or state is NULL you block.
You re-call with res0 set to NULL.

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
Re: dns and software, was Re: Reliable Cloud host ?
On Thu, Mar 1, 2012 at 8:47 PM, Owen DeLong <owen@delong.com> wrote:
> On Mar 1, 2012, at 5:15 PM, William Herrin wrote:
>> On Thu, Mar 1, 2012 at 8:02 PM, Owen DeLong <owen@delong.com> wrote:
>>> There's no need to
>>> break the current functionality of the underlying system calls and
>>> libc functions which would be needed by any such library anyway.
>>
>> Owen,
>>
>> Point to one sentence written by anybody in this entire thread in
>> which breaking current functionality was proposed.
>>
> When you said that:
>
> connect(char *name, uint16_t port) should work
>
> That can't work without breaking the existing functionality of the connect() system call.

You know, when I wrote 'socket=connect("www.google.com",80,TCP);' I
stopped and thought to myself, "I wonder if I should change that to
'connectbyname' instead just to make it clear that I'm not replacing
the existing connect() call?" But then I thought, "No, there's a
thousand ways someone determined to misunderstand what I'm saying will
find to misunderstand it. To someone who wants to understand my point,
this is crystal clear."

-Bill


--
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
Re: dns and software, was Re: Reliable Cloud host ?
On Mar 1, 2012, at 9:34 PM, William Herrin wrote:

> On Thu, Mar 1, 2012 at 8:47 PM, Owen DeLong <owen@delong.com> wrote:
>> On Mar 1, 2012, at 5:15 PM, William Herrin wrote:
>>> On Thu, Mar 1, 2012 at 8:02 PM, Owen DeLong <owen@delong.com> wrote:
>>>> There's no need to
>>>> break the current functionality of the underlying system calls and
>>>> libc functions which would be needed by any such library anyway.
>>>
>>> Owen,
>>>
>>> Point to one sentence written by anybody in this entire thread in
>>> which breaking current functionality was proposed.
>>>
>> When you said that:
>>
>> connect(char *name, uint16_t port) should work
>>
>> That can't work without breaking the existing functionality of the connect() system call.
>
> You know, when I wrote 'socket=connect("www.google.com",80,TCP);' I
> stopped and thought to myself, "I wonder if I should change that to
> 'connectbyname' instead just to make it clear that I'm not replacing
> the existing connect() call?" But then I thought, "No, there's a
> thousand ways someone determined to misunderstand what I'm saying will
> find to misunderstand it. To someone who wants to understand my point,
> this is crystal clear."

I'm all for additional library functionality built on top of what exists that does what you want.

As I said, there are many such libraries out there to do that.

If someone wants to add it to libc, more power to them. I'm not the libc maintainer.

I just don't want connect() to stop working the way it does or for getaddrinfo() to stop
working the way it does.

Since you were hell-bent on calling the existing mechanisms broken rather than
conceding that the current process is not broken but could stand some
improvements in the library (at http://owend.corp.he.net/ipv6 I even say as much myself),
it was not clear that you intended to augment the current capabilities with additional,
more abstract functions under different names rather than replace connect().

Owen
Re: dns and software, was Re: Reliable Cloud host ?
In a message written on Thu, Mar 01, 2012 at 05:02:30PM -0800, Owen DeLong wrote:
> Then push for better written abstraction libraries. There's no need to
> break the current functionality of the underlying system calls and
> libc functions which would be needed by any such library anyway.

Agree in part and disagree in part.

I think where the Open Source community has fallen behind in the
last decade is application level libraries. Open source pioneered
cross platform libraries (libX11, libresolv, libm) in the early
days and the benefit was they worked darn near exactly the same on
all platforms. It made programming and porting easier and led to
growth in the ecosystem.

Today that mantle has been taken up by Apple and Microsoft. In
Objective-C for example I can in one line of code say "retrieve
this URL", and the libraries know about DNS, IPv4 vs. IPv6, happy
eyeballs algorithms, multi-threading parts so that the user doesn't
wait, and so on. Typical application programs on these platforms
never make any of the system calls that have been discussed in this
thread.

Unfortunately the open source world is without even basic enhancements.
Library work in many areas has stagnated, and in the areas where it is
progressing it's often done in a way to make the same library (by name)
perform differently on different operating systems! Plenty of people
have done research finding rampant file copying and duplication of code,
and that's a bad sign:

http://tagide.com/blog/2011/09/file-cloning-in-open-source-the-good-the-bad-and-the-ugly/
http://www.solidsourceit.com/blog/?p=4
http://pages.cs.wisc.edu/~shanlu/paper/TSE-CPMiner.pdf

I can't find it now, but there was a paper a few years back that looked
for a hash or CRC algorithm, because those are easy to identify in source
by the fixed, unique constant they use. In the Linux kernel alone there
were something like 10 implementations; widen the search to all software
in the application repository and there were something like 10,000
instances of (nearly) the same code!

Now, where I disagree. Better libraries means not just better ones
at a high level (fetch me this URL), but better ones at a lower level.
For instance libresolv discussed here is old and creaky. It was
designed for a different time. Many folks doing DNS work have moved
on to libldns (from NLnet Labs, the developers of Unbound) because libresolv does not do what they
need with respect to DNSSEC or IPv4/IPv6 issues.

I think the entire community needs to come together with a strong bit of
emphasis on libraries, standardizing them, making them ship with the
base OS so programmers can count on them, and rolling in new stuff that
needs to be in them on a timely basis. Apple and Microsoft do it with
their (mostly closed) platforms, open source can do it better.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/
Re: dns and software, was Re: Reliable Cloud host ?
On Fri, Mar 2, 2012 at 1:03 AM, Owen DeLong <owen@delong.com> wrote:
> On Mar 1, 2012, at 9:34 PM, William Herrin wrote:
>> You know, when I wrote 'socket=connect("www.google.com",80,TCP);' I
>> stopped and thought to myself, "I wonder if I should change that to
>> 'connectbyname' instead just to make it clear that I'm not replacing
>> the existing connect() call?" But then I thought, "No, there's a
>> thousand ways someone determined to misunderstand what I'm saying will
>> find to misunderstand it. To someone who wants to understand my point,
>> this is crystal clear."

"Hyperbole." If I had remembered the word, I could have skipped the
long description.

> I'm all for additional library functionality
> I just don't want connect() to stop working the way it does or for getaddrinfo() to stop
> working the way it does.

Good. Let's move on.


First question: who actually maintains the standard for the C sockets
API these days? Is it a POSIX standard?

Next, we have a set of APIs with which, given sufficient caution and
skill (which is rarely the case), it's possible to string together a
reasonable process that starts with some kind of name in a text
string and ends with established communication with a remote server
for any sort of name and any sort of protocol. These APIs are complete,
but we repeatedly see certain kinds of error committed while using
them.

Is there a common set of activities an application programmer intends
to perform 9 times out of 10 when using getaddrinfo+connect? I think
there is, and it has the following functionality:

Create a [stream] to one of the hosts satisfying [name] + [service]
within [timeout] and return a [socket].

Does anybody disagree? Here's my reasoning:

Better than 9 times out of 10 it's a stream, and usually a TCP stream at
that. Connect also designates a receiver for a connectionless protocol
like UDP, but its use for that has always been a little peculiar since
the protocol doesn't actually connect. And indeed, sendto() can
designate a different receiver for each packet sent through the
socket.

Name + Service. If TCP, a hostname and a port.

Sometimes you want to start multiple connection attempts in parallel
or have some not-quite-threaded process implement its own scheduler
for dealing with multiple connections at once, but that's the
exception. Usually the only reason for dealing with the connect() in
non-blocking mode is that you want to implement sensible error recovery
with timeouts.

And the timeout - the direction that control should be returned to the
caller no later than X. If it would take more than X to complete, then
fail instead.
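One way to honour "no later than X" is to turn the timeout into an absolute deadline once, then derive the remaining budget before every blocking step (name lookups and connects alike). A small sketch using the POSIX monotonic clock; deadline_after_ms and remaining_ms are hypothetical names, and timeout_ms is assumed to be non-negative.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Convert a relative timeout into an absolute CLOCK_MONOTONIC deadline.
 * The monotonic clock is unaffected by wall-clock adjustments. */
struct timespec deadline_after_ms(int timeout_ms)
{
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    t.tv_sec += timeout_ms / 1000;
    t.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (t.tv_nsec >= 1000000000L) {      /* carry nanoseconds into seconds */
        t.tv_sec += 1;
        t.tv_nsec -= 1000000000L;
    }
    return t;
}

/* Milliseconds left before the deadline, clamped at 0 ("give up now").
 * Pass the result as the timeout of each poll()/select() call. */
int remaining_ms(struct timespec deadline)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    long long ms = (long long)(deadline.tv_sec - now.tv_sec) * 1000
                 + (deadline.tv_nsec - now.tv_nsec) / 1000000;
    return ms > 0 ? (int)ms : 0;
}
```

Because every wait is sized from the same deadline, the total time spent across lookups and connection attempts cannot exceed the caller's budget by more than the cost of the final non-blocking step.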



Next item: how would this work under the hood?

Well, you have two tasks: find a list of candidate endpoints from the
name, and establish a connection to one of them.

Find the candidates: ask all available name services in parallel
(hosts, NIS, DNS, etc). Finished when:

1. All services have responded negative (failure)

2. You have a positive answer and all services which have not yet
answered are at a lower priority (e.g. hosts answers, so you don't
need to wait for NIS and DNS).

3. You have a positive answer from at least one name service and 1/2
of the requested time out has expired.

4. The full time out has expired (failure).
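The four stopping rules can be expressed as a pure decision function that a resolver loop would consult each time a service answers or a timer fires. This is a sketch under assumptions: the ns_query record, the rank convention (0 = most preferred, as in the order of a "hosts:" line in nsswitch.conf) and lookup_status are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one name service queried in parallel. */
enum ns_state { NS_PENDING, NS_NEGATIVE, NS_POSITIVE };

struct ns_query {
    int rank;              /* 0 = most preferred; larger = lower priority */
    enum ns_state state;
};

enum lookup_verdict { LOOKUP_WAIT, LOOKUP_SUCCESS, LOOKUP_FAIL };

/* Apply the four stopping rules to the current state of the queries. */
enum lookup_verdict lookup_status(const struct ns_query *q, size_t n,
                                  int elapsed_ms, int timeout_ms)
{
    bool pending = false;
    int best_pos = -1;     /* rank of the most preferred positive answer */
    int best_pend = -1;    /* rank of the most preferred unanswered service */

    for (size_t i = 0; i < n; i++) {
        if (q[i].state == NS_POSITIVE && (best_pos < 0 || q[i].rank < best_pos))
            best_pos = q[i].rank;
        if (q[i].state == NS_PENDING) {
            pending = true;
            if (best_pend < 0 || q[i].rank < best_pend)
                best_pend = q[i].rank;
        }
    }

    if (best_pos < 0 && !pending)
        return LOOKUP_FAIL;                      /* rule 1: all negative */
    if (best_pos >= 0 && (!pending || best_pend > best_pos))
        return LOOKUP_SUCCESS;                   /* rule 2: nothing better outstanding */
    if (best_pos >= 0 && 2 * elapsed_ms >= timeout_ms)
        return LOOKUP_SUCCESS;                   /* rule 3: half the budget used */
    if (elapsed_ms >= timeout_ms)
        return LOOKUP_FAIL;                      /* rule 4: out of time */
    return LOOKUP_WAIT;
}
```

For example, if hosts (rank 0) answers positively while NIS and DNS are still pending, rule 2 ends the lookup immediately; if only DNS (rank 2) has answered while NIS (rank 1) is pending, the loop waits until half the timeout has passed.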

Cache the knowledge somewhere along with TTLs (locally defined if the
name service doesn't explicitly provide a TTL). This may well be the
first of a series of connection requests for the same host. If cached
and TTL valid knowledge was known for this name for a particular
service, don't ask that service again.
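A minimal sketch of such a cache, assuming the lookup layer can supply a TTL for each answer (getaddrinfo() does not report one, which is the gap discussed earlier in this thread) or falls back to a locally chosen default. It stores one address per name and takes the current time as a parameter so the expiry logic is easy to test; cache_put, cache_get and the size limits are hypothetical.

```c
#include <string.h>
#include <time.h>

#define CACHE_SLOTS 64
#define NAME_LEN    256
#define ADDR_LEN    64      /* room for any IPv6 literal */

struct cache_entry {
    char name[NAME_LEN];
    char addr[ADDR_LEN];
    time_t expires;         /* 0 marks a never-used slot */
};

static struct cache_entry cache[CACHE_SLOTS];

/* Reuse the slot already holding `name`; else an expired slot;
 * else evict the entry closest to expiry. */
static struct cache_entry *find_slot(const char *name, time_t now)
{
    struct cache_entry *victim = &cache[0];
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (strcmp(cache[i].name, name) == 0)
            return &cache[i];
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].expires <= now)
            return &cache[i];
        if (cache[i].expires < victim->expires)
            victim = &cache[i];
    }
    return victim;
}

/* Remember `addr` for `name` until now + ttl seconds. A TTL of 0 means
 * the answer is never served from the cache. */
void cache_put(const char *name, const char *addr, unsigned ttl, time_t now)
{
    struct cache_entry *e = find_slot(name, now);
    strncpy(e->name, name, NAME_LEN - 1);
    e->name[NAME_LEN - 1] = '\0';
    strncpy(e->addr, addr, ADDR_LEN - 1);
    e->addr[ADDR_LEN - 1] = '\0';
    e->expires = now + (time_t)ttl;
}

/* Cached address for `name`, or NULL if absent or past its TTL
 * (in which case the caller asks the name service again). */
const char *cache_get(const char *name, time_t now)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].expires > now && strcmp(cache[i].name, name) == 0)
            return cache[i].addr;
    return NULL;
}
```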

Also need to let the app tell us to deprioritize a particular result
later on. Why? Let's say I get an HTTP connection to a host but then
that connection times out. If the app is managing the address list, it
can try again to another address for the same name. We're now hiding
that detail from the app, so we need a callback for the app to tell
us, "when I try again, avoid giving me this answer because it didn't
turn out to work."
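The "avoid this answer next time" hint could be as simple as moving the failed candidate to the back of the ordered list while keeping the others in their original order. A sketch using address strings in place of real sockaddr entries; deprioritize() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Move the candidate that just failed to the back of the list so the next
 * attempt for the same name tries the others first.
 * Returns 1 if `failed` was found in the list, 0 otherwise. */
int deprioritize(const char **cands, size_t n, const char *failed)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(cands[i], failed) == 0) {
            const char *moved = cands[i];
            /* Shift the later entries up one place, keeping their order. */
            memmove(&cands[i], &cands[i + 1], (n - i - 1) * sizeof *cands);
            cands[n - 1] = moved;
            return 1;
        }
    }
    return 0;
}
```

Demoting rather than deleting keeps the address available as a last resort, in case the failure was transient.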


So, now we have a list of addresses with valid TTLs as of the start of
our connection attempt. Next step: start the connection attempt.

Pick the "first" address (chosen by whatever the ordering rules are)
and send the connection request packet and let the OS do its normal
retry schedule. Wait one second (system or sysctl configurable) or
until the previous connection request was either accepted or rejected,
whichever is shorter. If not connected yet, background it, pick the
next address and send a connection request. Repeat until one
connection request has been issued to all possible destination
addresses for the name.

Finished when:

1. Any of the pending connection requests completes (others are aborted).

2. The time out is reached (all pending requests aborted).

Once a connection is established, this should be cached alongside the
address and its TTL so that next time around that address can be tried
first.
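A userland sketch of the connection phase described above, built on a getaddrinfo() result list and poll(). Assumptions: staggered_connect and its parameters are hypothetical, at most 16 attempts are kept in flight, and the winning socket is returned still in non-blocking mode. The `used` out-parameter lets the caller record which address answered, so it can be cached and tried first next time.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define MAX_ATTEMPTS 16   /* arbitrary cap on simultaneous attempts */

static long now_ms(void)
{
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return (long)t.tv_sec * 1000L + t.tv_nsec / 1000000L;
}

/* Begin a non-blocking connect; returns the socket, or -1 if it failed at once. */
static int start_attempt(const struct addrinfo *ai)
{
    int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd < 0)
        return -1;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        (connect(fd, ai->ai_addr, ai->ai_addrlen) < 0 && errno != EINPROGRESS)) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns the first socket to complete its handshake, or -1 on failure/timeout. */
int staggered_connect(const struct addrinfo *list, int stagger_ms,
                      int timeout_ms, const struct addrinfo **used)
{
    struct pollfd pfd[MAX_ATTEMPTS];
    const struct addrinfo *src[MAX_ATTEMPTS];
    const struct addrinfo *next = list;
    int n = 0, active = 0, newest = -1, winner = -1;
    long deadline = now_ms() + timeout_ms, last_start = 0;

    while (winner < 0) {
        long now = now_ms();
        if (now >= deadline)
            break;                      /* time out: abort everything */

        /* Start the next address once the newest attempt was refused
         * or has had stagger_ms to complete. */
        while (next != NULL && n < MAX_ATTEMPTS &&
               (newest < 0 || pfd[newest].fd < 0 || now - last_start >= stagger_ms)) {
            int fd = start_attempt(next);
            if (fd >= 0) {
                pfd[n] = (struct pollfd){ .fd = fd, .events = POLLOUT };
                src[n] = next;
                newest = n++;
                active++;
                last_start = now;
            }
            next = next->ai_next;
        }
        if (active == 0)
            break;                      /* every address failed outright */

        /* Sleep until an attempt completes, the stagger ends, or the deadline. */
        long wait = deadline - now;
        if (next != NULL && n < MAX_ATTEMPTS && last_start + stagger_ms - now < wait)
            wait = last_start + stagger_ms - now;
        if (poll(pfd, (nfds_t)n, (int)wait) < 0 && errno != EINTR)
            break;

        for (int i = 0; i < n && winner < 0; i++) {
            if (pfd[i].fd < 0 || pfd[i].revents == 0)
                continue;
            int err = 0;
            socklen_t len = sizeof err;
            if (getsockopt(pfd[i].fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
                err = errno;
            if (err == 0) {
                winner = pfd[i].fd;     /* first completed handshake wins */
                if (used != NULL)
                    *used = src[i];
            } else {
                close(pfd[i].fd);       /* refused or unreachable */
                active--;
            }
            pfd[i].fd = -1;             /* poll() ignores negative descriptors */
        }
    }

    for (int i = 0; i < n; i++)         /* abort attempts still pending */
        if (pfd[i].fd >= 0)
            close(pfd[i].fd);
    return winner;
}
```

Passing stagger_ms = 0 starts every attempt at once, which is closer to the parallel mconnect() idea; a larger value sends fewer connection requests to addresses that turn out not to be needed.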

Thoughts?

The idea here, of course, is that any application which uses this
function to make its connections should, at an operations level, do a
good job handling both multiple addresses with one of them unreachable
as well as host renumbering that relies on the DNS TTL.



> Since you were hell bent on calling the existing mechanisms broken rather than
> conceding the point that the current process is not broken, but, could stand some
> improvements in the library

I hold that if an architecture encourages a certain implementation
mistake largely to the exclusion of correct implementations then that
architecture is in some way broken. That error may be in a particular
component, but it could be that the components themselves are correct.
There could be a missing component, or the components could be strung
together in a way that doesn't work right. Regardless of the exact
cause, there is an architecture level mistake which is the root cause
of the consistently broken implementations.


Regards,
Bill Herrin


--
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
Re: dns and software, was Re: Reliable Cloud host ?
On Mar 1, 2012, at 10:01 AM, Michael Thomas wrote:

> The real issue is that gethostbyxxx has been inadequate for a very
> long time. Moving it across the kernel boundary solves nothing and
> most likely causes even more trouble: what if I want, say, asynchronous
> name resolution? What if I want to use SRV records? What if a new DNS
> RR comes around -- do I have to recompile the kernel? It's for these
> reasons and probably a whole lot more that connect just confuses the
> actual issues.

<software-developer-hat-on>

My experience is that these calls are expensive and require a lot of work to get a true result. Some systems also have interim caching that happens as well (e.g. NSCD).

When building software that did a lot of DNS lookups at once, I had to build my own internal cache to maintain performance. Startup was expensive, but once the cache was warm, the lookups needed to keep it current were spread out over time and became less of an issue.

I ended up caching these entries for 1 hour by default.

</hat ?xml-fail>

- jared
Re: dns and software, was Re: Reliable Cloud host ?
On Mar 2, 2012, at 10:12 AM, William Herrin wrote:

> On Fri, Mar 2, 2012 at 1:03 AM, Owen DeLong <owen@delong.com> wrote:
>> On Mar 1, 2012, at 9:34 PM, William Herrin wrote:
>>> You know, when I wrote 'socket=connect("www.google.com",80,TCP);' I
>>> stopped and thought to myself, "I wonder if I should change that to
>>> 'connectbyname' instead just to make it clear that I'm not replacing
>>> the existing connect() call?" But then I thought, "No, there's a
>>> thousand ways someone determined to misunderstand what I'm saying will
>>> find to misunderstand it. To someone who wants to understand my point,
>>> this is crystal clear."
>
> "Hyperbole." If I had remembered the word, I could have skipped the
> long description.
>
>> I'm all for additional library functionality
>> I just don't want connect() to stop working the way it does or for getaddrinfo() to stop
>> working the way it does.
>
> Good. Let's move on.
>
>
> First question: who actually maintains the standard for the C sockets
> API these days? Is it a POSIX standard?
>

Well, some of it seems to be documented in RFCs, but, I think what you're wanting doesn't require additions to the sockets library, per se. In fact, I think wanting to make it part of that is a mistake. As I said, this should be a
higher-level library.

For example, in Perl, you have Socket (and Socket6), but, you also have several other abstraction libraries such as Net::HTTP.

While there's no hierarchical naming scheme for the functions in libc, if you look at the source for any of the open source libc libraries out there, you'll find definite hierarchy.

POSIX certainly controls one standard. The GNU libc maintainers control the standard for the libc that accompanies GCC to the best of my knowledge. I would suggest that is probably the best place to start since I think anything that gains acceptance there will probably filter to the others fairly quickly.

> Next, we have a set of APIs with which, given sufficient caution and
> skill (which is rarely the case), it's possible to string together a
> reasonable process that starts with some kind of name in a text
> string and ends with established communication with a remote server
> for any sort of name and any sort of protocol. These APIs are complete,
> but we repeatedly see certain kinds of error committed while using
> them.
>

Right... Since these are user errors (at the developer level), I wouldn't try to fix them in the APIs. I would, instead, build more developer-proof add-on APIs on top of them.

> Is there a common set of activities an application programmer intends
> to perform 9 times out of 10 when using getaddrinfo+connect? I think
> there is, and it has the following functionality:
>
> Create a [stream] to one of the hosts satisfying [name] + [service]
> within [timeout] and return a [socket].
>

Seems reasonable, but ignores UDP. If we're going to do this, I think we should target a more complete solution that covers a broader range of possibilities than just the most common TCP connect scenario.

> Does anybody disagree? Here's my reasoning:
>
> Better than 9 times out of 10 it's a stream, and usually a TCP stream at
> that. Connect also designates a receiver for a connectionless protocol
> like UDP, but its use for that has always been a little peculiar since
> the protocol doesn't actually connect. And indeed, sendto() can
> designate a different receiver for each packet sent through the
> socket.
>

Most applications using UDP that I have seen use sendto()/recvfrom() et al. Netflow data would suggest that it's less than 9 out of ten times for TCP, but, yes, I would agree it is the most common scenario.

> Name + Service. If TCP, a hostname and a port.
>
That would apply to UDP as well. Just the semantics of what you do once you have the filehandle are different. (and it's not really a stream, per se).

> Sometimes you want to start multiple connection attempts in parallel
> or have some not-quite-threaded process implement its own scheduler
> for dealing with multiple connections at once, but that's the
> exception. Usually the only reason for dealing with the connect() in
> non-blocking mode is that you want to implement sensible error recovery
> with timeouts.
>

Agreed.

> And the timeout - the direction that control should be returned to the
> caller no later than X. If it would take more than X to complete, then
> fail instead.
>

Actually, this is one thing I would like to see added to connect() and that could be done without breaking the existing API.

>
>
> Next item: how would this work under the hood?
>
> Well, you have two tasks: find a list of candidate endpoints from the
> name, and establish a connection to one of them.
>
> Find the candidates: ask all available name services in parallel
> (hosts, NIS, DNS, etc). Finished when:
>
> 1. All services have responded negative (failure)
>
> 2. You have a positive answer and all services which have not yet
> answered are at a lower priority (e.g. hosts answers, so you don't
> need to wait for NIS and DNS).
>
> 3. You have a positive answer from at least one name service and 1/2
> of the requested time out has expired.
>
> 4. The full time out has expired (failure).
>

I think the existing getaddrinfo() does this pretty well already.

I will note that the services you listed only apply to resolving the host name. Don't forget that you might also need to resolve the service to a port number. (An application should be looking up HTTP, not assuming it is 80, for example).

Conveniently, getaddrinfo simultaneously handles both of these lookups.
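As a small illustration of that point, a single getaddrinfo() call resolves the host and the service together. The sketch below (first_port is a hypothetical helper) returns the port that getaddrinfo() filled in for the first result, using getnameinfo() with NI_NUMERICSERV so it does not need to care whether that result is IPv4 or IPv6.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Port (host byte order) of the first result for host + service, or -1. */
int first_port(const char *host, const char *service)
{
    struct addrinfo hints = {0}, *res;
    hints.ai_family = AF_UNSPEC;      /* whatever address families the name has */
    hints.ai_socktype = SOCK_STREAM;  /* look the service up as a TCP port */

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    char port[32];
    int rc = getnameinfo(res->ai_addr, res->ai_addrlen, NULL, 0,
                         port, sizeof port, NI_NUMERICSERV);
    freeaddrinfo(res);
    return rc == 0 ? atoi(port) : -1;
}
```

On a system whose services database lists http, first_port("www.example.com", "http") would be expected to give 80; a numeric service such as "8080" needs no database lookup at all.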

> Cache the knowledge somewhere along with TTLs (locally defined if the
> name service doesn't explicitly provide a TTL). This may well be the
> first of a series of connection requests for the same host. If cached
> and TTL valid knowledge was known for this name for a particular
> service, don't ask that service again.
>

I recommend against doing this above the level of getaddrinfo(). Just call getaddrinfo() again each time you need something. If it has cached data, it will return quickly and is cheap. If it doesn't return quickly, it will still work just as quickly as anything else most likely.

If getaddrinfo() on a particular system is not well behaved, we should seek to fix that implementation of getaddrinfo(), not write yet another replacement.

> Also need to let the app tell us to deprioritize a particular result
> later on. Why? Let's say I get an HTTP connection to a host but then
> that connection times out. If the app is managing the address list, it
> can try again to another address for the same name. We're now hiding
> that detail from the app, so we need a callback for the app to tell
> us, "when I try again, avoid giving me this answer because it didn't
> turn out to work."
>

I would suggest that, instead of making this opaque and then complicating
it with these hints when we return, we use a mechanism where we
return a pointer to a dynamically allocated result (similar to getaddrinfo) and,
if we get called again with a pointer to that structure, we know to delete the
previously connected host from the list we try next time.

When the application is done with the struct, it should free it by calling an
appropriate free function exported by this new API.
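A sketch of that handle-based shape, in the spirit of getaddrinfo()/freeaddrinfo(). Everything here is hypothetical: struct conn_result, conn_result_new, conn_result_next and conn_result_free are illustrative names, candidates are plain address strings borrowed from the caller, and the actual connect step is left out. Calling conn_result_next() again on the same handle drops the address it returned the previous time.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical opaque result; only the library looks inside. */
struct conn_result {
    const char **cands;   /* remaining candidates, best first (borrowed strings) */
    size_t n;
    int handed_out;       /* nonzero once conn_result_next() has returned one */
};

struct conn_result *conn_result_new(const char *const *cands, size_t n)
{
    struct conn_result *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->cands = malloc((n ? n : 1) * sizeof *r->cands);
    if (r->cands == NULL) {
        free(r);
        return NULL;
    }
    memcpy(r->cands, cands, n * sizeof *r->cands);
    r->n = n;
    r->handed_out = 0;
    return r;
}

/* Next candidate to connect to. Being called again with the same handle
 * means the previous answer did not work out, so it is removed first.
 * Returns NULL once the list is exhausted. */
const char *conn_result_next(struct conn_result *r)
{
    if (r->handed_out && r->n > 0) {
        memmove(&r->cands[0], &r->cands[1], (r->n - 1) * sizeof *r->cands);
        r->n--;
    }
    r->handed_out = 1;
    return r->n > 0 ? r->cands[0] : NULL;
}

/* Counterpart of freeaddrinfo() for this hypothetical API. */
void conn_result_free(struct conn_result *r)
{
    if (r != NULL) {
        free(r->cands);
        free(r);
    }
}
```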

>
> So, now we have a list of addresses with valid TTLs as of the start of
> our connection attempt. Next step: start the connection attempt.
>
> Pick the "first" address (chosen by whatever the ordering rules are)
> and send the connection request packet and let the OS do its normal
> retry schedule. Wait one second (system or sysctl configurable) or
> until the previous connection request was either accepted or rejected,
> whichever is shorter. If not connected yet, background it, pick the
> next address and send a connection request. Repeat until one
> connection request has been issued to all possible destination
> addresses for the name.
>
> Finished when:
>
> 1. Any of the pending connection requests completes (others are aborted).
>
> 2. The time out is reached (all pending requests aborted).
>
> Once a connection is established, this should be cached alongside the
> address and its TTL so that next time around that address can be tried
> first.
>

Seems mostly reasonable. I would consider possibly having some form of inverse exponential backoff on the initial connection attempts. Maybe wait 5 seconds for the first one before trying the second one and waiting 2 seconds, then 1 second if the third one hasn't connected, then bottoming out somewhere around 500ms for the remainder.
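The example schedule written out as a lookup with a floor. The figures (5 s, 2 s, 1 s, then 500 ms) come straight from the paragraph above; stagger_delay_ms and start_offset_ms are hypothetical helpers, and the offsets assume no earlier attempt has been accepted or refused in the meantime.

```c
/* Delay in ms to wait after starting attempt i (0-based) before starting
 * attempt i + 1. Earlier attempts keep running; this only spaces out starts. */
int stagger_delay_ms(unsigned i)
{
    static const int schedule[] = { 5000, 2000, 1000 };
    const unsigned steps = sizeof schedule / sizeof schedule[0];
    return i < steps ? schedule[i] : 500;   /* bottom out at 500 ms */
}

/* Offset from the first attempt at which attempt k starts. */
int start_offset_ms(unsigned k)
{
    int t = 0;
    for (unsigned i = 0; i < k; i++)
        t += stagger_delay_ms(i);
    return t;
}
```

Under those figures the fourth address would be tried 8 seconds after the first and the fifth at 8.5 seconds. Compared with a flat one-second stagger, this gives the preferred address much longer before any alternative is tried.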

>
>
>> Since you were hell bent on calling the existing mechanisms broken rather than
>> conceding the point that the current process is not broken, but, could stand some
>> improvements in the library
>
> I hold that if an architecture encourages a certain implementation
> mistake largely to the exclusion of correct implementations then that
> architecture is in some way broken. That error may be in a particular

I don't believe that the architecture encourages the implementation mistake.

Rather, I think human behavior and our tendency not to seek proper understanding of the theory of operation of various things prior to implementing things which depend on them is more at fault. I suppose that you can argue that the API should be built to avoid that, but, we'll have to agree to disagree on that point. I think that low-level APIs (and this is a low-level API) have to be able to rely on the engineers that use them making the effort to understand the theory of operation. I believe that the fault here is the lack of a standardized higher-level API in some languages.

> component, but it could be that the components themselves are correct.
> There could be a missing component, or the components could be strung
> together in a way that doesn't work right. Regardless of the exact
> cause, there is an architecture level mistake which is the root cause
> of the consistently broken implementations.
>

I suppose by your definition this constitutes a missing component. I don't see it that way. I see it as a complete and functional system for a low-level API. There are high-level APIs available. As you have noted, some better than others. A standardized well-written high-level API would, indeed, be useful. However, that does not make the low-level API broken just because it is common for poorly trained users to make improper use of it. It is common for people using hammers to hit their thumbs. This does not mean that hammers are architecturally broken or that they should be re-engineered to have elaborate thumb-protection mechanisms.

The fact that you can electrocute yourself by sticking a fork into a toaster while it is operating is likewise, not an indication that toasters are architecturally broken.

It is precisely this attitude that has significantly increased the overhead and unnecessary expense of many systems while making product liability lawyers quite wealthy.

Owen