Mailing List Archive

Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Thu, Mar 1, 2012 at 7:20 AM, Owen DeLong <owen@delong.com> wrote:
> The simpler approach and perfectly viable without mucking
> up what is already implemented and working:
>
> Don't keep returns from GAI/GNI around longer than it takes
> to cycle through your connect() loop immediately after the GAI/GNI call.

The even simpler approach: create an AF_NAME with a sockaddr struct
that contains a hostname instead of an IPvX address. Then let
connect() figure out the details of caching, TTLs, protocol and
address selection, etc. Such a connect() could even support a revised
TCP stack which is able to retry with the other addresses at the first
subsecond timeout rather than camping on each address in sequence for
the typical system default of two minutes.
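
A rough sketch of what such a sockaddr might look like. Everything named here is hypothetical: no AF_NAME exists in any current system (the value below is a placeholder), and struct sockaddr_name and sockaddr_name_init() are invented purely for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical address family; the value is a placeholder, not a real AF_* constant. */
#define AF_NAME 200

/* Hypothetical sockaddr variant that carries a name instead of an IPvX address. */
struct sockaddr_name {
    sa_family_t sn_family;       /* would be AF_NAME */
    char        sn_host[256];    /* NUL-terminated hostname */
    char        sn_service[32];  /* service name or decimal port */
};

/* Fill in a sockaddr_name. Returns 0 on success, -1 if a string does not fit. */
int sockaddr_name_init(struct sockaddr_name *sn, const char *host, const char *service)
{
    assert(sn != NULL && host != NULL && service != NULL);
    if (strlen(host) >= sizeof(sn->sn_host) ||
        strlen(service) >= sizeof(sn->sn_service))
        return -1;
    memset(sn, 0, sizeof(*sn));
    sn->sn_family = AF_NAME;
    strcpy(sn->sn_host, host);       /* lengths were checked above */
    strcpy(sn->sn_service, service);
    return 0;
}
```

A name-aware connect() would then be handed (struct sockaddr *)&sn where it is handed a sockaddr_in or sockaddr_in6 today. That call is not sketched because no existing connect() accepts such a structure.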

Regards,
Bill Herrin


--
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Thu, Mar 1, 2012 at 8:25 AM, Joe Greco <jgreco@ns.sol.net> wrote:
> "If three people died and the building burned down then the sprinkler
> system didn't work. It may have sprayed water, but it didn't *work*."
>
> That's not true.  If it sprayed water in the manner it was designed to,
> then it worked.

That's like the old crack about ICBM interceptors. Why yes, our system
performed swimmingly in the latest test achieving nine out of the ten
criteria for success. Which criteria didn't it achieve? It missed the
target.

Regards,
Bill Herrin


Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
> On Thu, Mar 1, 2012 at 8:25 AM, Joe Greco <jgreco@ns.sol.net> wrote:
> > "If three people died and the building burned down then the sprinkler
> > system didn't work. It may have sprayed water, but it didn't *work*."
> >
> > That's not true.  If it sprayed water in the manner it was designed to,
> > then it worked.
>
> That's like the old crack about ICBM interceptors. Why yes, our system
> performed swimmingly in the latest test achieving nine out of the ten
> criteria for success. Which criteria didn't it achieve? It missed the
> target.

Difference: the fire suppression system worked as designed, the ICBM
interceptor didn't.

That's kind of the whole point here. If you have something like an
automobile that's designed to protect you against certain kinds of
accidents, it isn't a failure if it does not protect you against an
accident that is not reasonably within the protection envelope.

For example, cars these days are designed to protect against many
different types of impacts and provide survivability. It is a failure
if my car is designed to protect against a head-on crash at 30MPH by
use of engineered crumple zones and deploying air bags, and I get into
such an accident and am killed regardless. However, if I fly my car
into a bridge abutment at 150MPH and am instantly pulverized, I am not
prepared to consider that a failure of the car. Likewise, if a freeway
overpass slab falls on my car and crushes me as I drive underneath it,
I am not going to consider that a failure of the car.

There's a definite distinction between a system that fails when it is
deployed and used in the intended manner, and a system that doesn't
work as you'd like it to when it is used in some incorrect manner, which
is really not a failure as the word is normally used.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On 03/01/2012 06:26 AM, William Herrin wrote:
> On Thu, Mar 1, 2012 at 7:20 AM, Owen DeLong<owen@delong.com> wrote:
>> The simpler approach and perfectly viable without mucking
>> up what is already implemented and working:
>>
>> Don't keep returns from GAI/GNI around longer than it takes
>> to cycle through your connect() loop immediately after the GAI/GNI call.
> The even simpler approach: create an AF_NAME with a sockaddr struct
> that contains a hostname instead of an IPvX address. Then let
> connect() figure out the details of caching, TTLs, protocol and
> address selection, etc. Such a connect() could even support a revised
> TCP stack which is able to retry with the other addresses at the first
> subsecond timeout rather than camping on each address in sequence for
> the typical system default of two minutes.

The effect of what you're recommending is to move all of this
into the kernel, and in the process greatly expand its scope. Also:
even if you did this, you'd be saddled with the same problem because
nothing existing would use an AF_NAME.

The real issue is that gethostbyxxx has been inadequate for a very
long time. Moving it across the kernel boundary solves nothing and
most likely causes even more trouble: what if I want, say, asynchronous
name resolution? What if I want to use SRV records? What if a new DNS
RR comes around -- do I have to recompile the kernel? It's for these
reasons and probably a whole lot more that connect just confuses the
actual issues.

When I was writing the first version of DKIM I used a library that I scraped
off the net called ARES. It worked adequately for me, but the most notable
thing was the very fact that I had to scrape it off the net at all. As far as
I could tell, standard distros don't have libraries with lower level access to
DNS (in my case, it needed to not block). Before positing a super-deluxe
gethostbyxx that does address picking, etc, etc, it would be better to
lobby all of the distros to settle on a decomposed resolver library from
which that and more could be built.

Mike
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
> On 03/01/2012 06:26 AM, William Herrin wrote:
> > On Thu, Mar 1, 2012 at 7:20 AM, Owen DeLong<owen@delong.com> wrote:
> >> The simpler approach and perfectly viable without mucking
> >> up what is already implemented and working:
> >>
> >> Don't keep returns from GAI/GNI around longer than it takes
> >> to cycle through your connect() loop immediately after the GAI/GNI call.
> > The even simpler approach: create an AF_NAME with a sockaddr struct
> > that contains a hostname instead of an IPvX address. Then let
> > connect() figure out the details of caching, TTLs, protocol and
> > address selection, etc. Such a connect() could even support a revised
> > TCP stack which is able to retry with the other addresses at the first
> > subsecond timeout rather than camping on each address in sequence for
> > the typical system default of two minutes.
>
> The effect of what you're recommending is to move all of this
> into the kernel, and in the process greatly expand its scope. Also:
> even if you did this, you'd be saddled with the same problem because
> nothing existing would use an AF_NAME.
>
> The real issue is that gethostbyxxx has been inadequate for a very
> long time. Moving it across the kernel boundary solves nothing and
> most likely causes even more trouble: what if I want, say, asynchronous
> name resolution? What if I want to use SRV records? What if a new DNS
> RR comes around -- do I have to recompile the kernel? It's for these
> reasons and probably a whole lot more that connect just confuses the
> actual issues.
>
> When I was writing the first version of DKIM I used a library that I scraped
> off the net called ARES. It worked adequately for me, but the most notable
> thing was the very fact that I had to scrape it off the net at all. As far as
> I could tell, standard distros don't have libraries with lower level access to
> DNS (in my case, it needed to not block). Before positing a super-deluxe
> gethostbyxx that does address picking, etc, etc, it would be better to
> lobby all of the distros to settle on a decomposed resolver library from
> which that and more could be built.

It's deeper than just that, though. The whole paradigm is messy, from
the point of view of someone who just wants to get stuff done. The
examples are (almost?) all fatally flawed. The code that actually gets
at least some of it right ends up being too complex and too hard for
people to understand why things are done the way they are.

Even in the "old days", before IPv6, geez, look at this:

bcopy(host->h_addr_list[n], (char *)&addr->sin_addr.s_addr, sizeof(addr->sin_addr.s_addr));

That's real comprehensible - and it's essentially the data interface
between the resolver library and the system's addressing structures
for syscalls.

On one hand, it's "great" that they wanted to abstract the dirty details
of DNS away from users, but I'd say they failed pretty much even at that.

... JG
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On 03/01/2012 07:22 AM, Joe Greco wrote:
> It's deeper than just that, though. The whole paradigm is messy, from
> the point of view of someone who just wants to get stuff done. The
> examples are (almost?) all fatally flawed. The code that actually gets
> at least some of it right ends up being too complex and too hard for
> people to understand why things are done the way they are.
>
> Even in the "old days", before IPv6, geez, look at this:
>
> bcopy(host->h_addr_list[n], (char *)&addr->sin_addr.s_addr, sizeof(addr->sin_addr.s_addr));
>
> That's real comprehensible - and it's essentially the data interface
> between the resolver library and the system's addressing structures
> for syscalls.
>
> On one hand, it's "great" that they wanted to abstract the dirty details
> of DNS away from users, but I'd say they failed pretty much even at that.

Yes, as simple as the normal kernel interface is for net io, getting
to the point that you can do a connect() is both maddeningly
messy and maddeningly inflexible -- the worst of all possible
worlds. We shouldn't kid ourselves that DNS is a simple protocol
though. It has layers of complexity and the policy decisions about
address picking are not easy. But dealing with caching correctly
shouldn't be that painful if the API discourages copying addresses
around, for instance by offering a wrapper function that validates the
TTL and hands you back a filled-out sockaddr.
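
One way such a wrapper could look, as a minimal sketch. struct cached_addr, cached_addr_store() and cached_addr_get() are hypothetical names, and the TTL is assumed to come from a lower-level resolver, since getaddrinfo() does not report one.

```c
#include <assert.h>
#include <string.h>
#include <time.h>
#include <sys/socket.h>

/* Hypothetical cache entry: a resolved address plus the moment it stops being valid. */
struct cached_addr {
    struct sockaddr_storage addr;
    socklen_t               addrlen;
    time_t                  expires;   /* resolution time + TTL */
};

/* Record an address resolved at time `now` with a TTL given in seconds. */
void cached_addr_store(struct cached_addr *c, const struct sockaddr *sa,
                       socklen_t len, time_t now, unsigned ttl)
{
    assert(c != NULL && sa != NULL && len <= sizeof(c->addr));
    memset(c, 0, sizeof(*c));
    memcpy(&c->addr, sa, len);
    c->addrlen = len;
    c->expires = now + (time_t)ttl;
}

/* Hand back a filled-out sockaddr only while the TTL still holds.
 * Returns 0 on success, -1 if the entry has expired and must be re-resolved. */
int cached_addr_get(const struct cached_addr *c, time_t now,
                    struct sockaddr_storage *out, socklen_t *outlen)
{
    assert(c != NULL && out != NULL && outlen != NULL);
    if (now >= c->expires)
        return -1;
    *out = c->addr;
    *outlen = c->addrlen;
    return 0;
}
```

The caller never keeps a bare address around; it asks for one each time it is about to connect, and a -1 return is the cue to resolve again.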

But not wanting to block -- which is needed for an event loop or
run to completion like interface -- adds a completely new dimension.
Maybe it's the intersection of all of these complexities that's at the root
of why we're stuck with either gethostbyxx or roll your own.

Mike
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
Hi,

On Mar 1, 2012, at 7:22 AM, Joe Greco wrote:
> On Mar 1, 2012, at 7:01 AM, Michael Thomas wrote:
>> The effect of what you're recommending is to move all of this
>> into the kernel, and in the process greatly expand its scope. Also:
>> even if you did this, you'd be saddled with the same problem because
>> nothing existing would use an AF_NAME.

I always thought the right way to deal with IPv6 would have been to use a 32-bit number from the class E space as a 'network handle' where the actual address (be it IPv4 or IPv6) was handled by the kernel. I suspect this would have allowed the majority of network-utilizing applications to magically just work, regardless of whether the name supplied by gethostbyname/getnameinfo/etc. was mapped to an address with A or AAAA. Probably would make stuff faster too since you'd only have to deal with an unsigned int instead of (worst case) 16 bytes that have to be copied back and forth.

Instead, we have forced application developers to use a really odd mixture of old and new, e.g. 'struct sockaddr_in6' and GNI/GAI. Seems this is the worst of both worlds -- no backwards compatibility yet an adherence to a really broken model that requires applications to know useless details like the length of an address ("what do you mean a sizeof(struct sockaddr) isn't big enough to hold an IPv6 address?") and even its bit patterns.

>> Moving it across the kernel boundary solves nothing

Actually, it does. Right now, applications effectively cache the address in their data space, requiring the application developer to go to quite a bit of work to deal with the address changing (or, far more typically, just pretend addresses never change). This has a lot of unfortunate side effects.

>> and
>> most likely causes even more trouble: what if I want, say, asynchronous
>> name resolution?

Set non-blocking on the socket?

>> What if I want to use SRV records? What if a new DNS
>> RR comes around -- do I have to recompile the kernel?

I believe with the exception of A/AAAA, RDATA is typically returned as either opaque (to the DNS) data blobs or names. This means the only stuff the kernel would need to deal with would be the A/AAAA lookups, everything else would be passed back as data, presumably via a new system call.

>> As far as
>> I could tell, standard distros don't have libraries with lower level access to
>> DNS (in my case, it needed to not block).

There have been lower-level resolver APIs since (at least) BSD 4.3 (man resolver(3)).

> It's deeper than just that, though. The whole paradigm is messy, from
> the point of view of someone who just wants to get stuff done. The
> examples are (almost?) all fatally flawed. The code that actually gets
> at least some of it right ends up being too complex and too hard for
> people to understand why things are done the way they are.

Exactly. Even before IPv6, it was icky. Now, it's just crazy. We had an opportunity to fix this with IPv6 since IPv6 required non-trivial kernel hackage. Unfortunately, we didn't take advantage of that opportunity.

Regards,
-drc
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Thu, Mar 1, 2012 at 10:01 AM, Michael Thomas <mike@mtcc.com> wrote:
> On 03/01/2012 06:26 AM, William Herrin wrote:
>> The even simpler approach: create an AF_NAME with a sockaddr struct
>> that contains a hostname instead of an IPvX address. Then let
>> connect() figure out the details of caching, TTLs, protocol and
>> address selection, etc.  Such a connect() could even support a revised
>> TCP stack which is able to retry with the other addresses at the first
>> subsecond timeout rather than camping on each address in sequence for
>> the typical system default of two minutes.
>
>
> The effect of what you're recommending is to move all of this
> into the kernel, and in the process greatly expand its scope.

Hi Michael,

libc != kernel. I want to move the action into the standard libraries
where it can be done once and done well. A little kernel action on top
to parallelize connection attempts where there are multiple candidate
addresses would be gravy, but not required.


> even if you did this, you'd be saddled with the same problem because
> nothing existing would use an AF_NAME.

It won't instantly fix everything so we shouldn't do it at all?


> what if I want, say, asynchronous
> name resolution? What if I want to use SRV records? What if a new DNS
> RR comes around

Then you do it the long way, same as you do now. But in the 99% of the
time that you're initiating a connection the "normal" way, you don't
have to (badly) reinvent the wheel.


> As far as
> I could tell, standard distros don't have libraries with lower level access to
> DNS (in my case, it needed to not block). Before positing a super-deluxe
> gethostbyxx that does address picking, etc, etc it would be better to
> lobby all of the distros to settle on a decomposed resolver library from
> which that and more could be built.

(A) Revised standards are -how- multiple OSes from multiple vendors
coordinate the deployment of an identical capability.

(B) Application programmers generally DO want the abstraction from
"DNS" to "Name resolution." If there's an /etc/hosts name or a NIS
name or a Windows name available, you ordinarily want to use it. You
don't want to build extra code to search each name service
independently any more than you want to build extra code to cycle
through candidate addresses.

Regards,
Bill Herrin


Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On 2012-03-01 17:57 , David Conrad wrote:
> Hi,
>
> On Mar 1, 2012, at 7:22 AM, Joe Greco wrote:
>> On Mar 1, 2012, at 7:01 AM, Michael Thomas wrote:
>>> The effect of what you're recommending is to move all of this
>>> into the kernel, and in the process greatly expand its scope.
>>> Also: even if you did this, you'd be saddled with the same
>>> problem because nothing existing would use an AF_NAME.
>
> I always thought the right way to deal with IPv6 would have been to
> use a 32-bit number from the class E space as a 'network handle'
> where the actual address (be it IPv4 or IPv6) was handled by the
> kernel.

This is the case when you pass in a sockaddr. Note, not a sockaddr_in or
a sockaddr_in6, but just a sockaddr.

There is a nice 14 year old article about this:
http://www.kame.net/newsletter/19980604/

> I suspect this would have allowed the majority of
> network-utilizing applications to magically just work, regardless of
> whether the name supplied by gethostbyname/getnameinfo/etc. was mapped
> to an address with A or AAAA. Probably would make stuff faster too
> since you'd only have to deal with an unsigned int instead of (worst
> case) 16 bytes that have to be copied back and forth.

There is quite a bit more state than that. And actually those addresses
are only 'copied' once: during accept() or connect(), there is no
"speed-loss" per send/recv as the only thing being moved from user space
to kernel space is the file descriptor and the actual data.

[..]
> Instead, we have forced application developers to use a really odd
> mixture of old and new, e.g. 'struct sockaddr_in6' and GNI/GAI.
> Seems this is the worst of both worlds -- no backwards compatibility
> yet an adherence to a really broken model that requires applications
> to know useless details like the length of an address ("what do you
> mean a sizeof(struct sockaddr) isn't big enough to hold an IPv6
> address?") and even its bit patterns.

Ever heard of sockaddr_storage? It was made to solve that little issue.
See also, that article above.
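
For illustration, a small sketch of sockaddr_storage holding either family without the caller sizing anything by hand. parse_numeric_addr() is a hypothetical helper name; inet_pton() and the structures are the standard ones.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Parse a numeric IPv4 or IPv6 literal into a family-agnostic sockaddr_storage.
 * Returns the length of the sockaddr actually filled in, or 0 if `text`
 * is not a numeric address. */
socklen_t parse_numeric_addr(const char *text, unsigned short port,
                             struct sockaddr_storage *ss)
{
    struct sockaddr_in  *v4 = (struct sockaddr_in *)ss;
    struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)ss;

    assert(text != NULL && ss != NULL);

    memset(ss, 0, sizeof(*ss));
    if (inet_pton(AF_INET, text, &v4->sin_addr) == 1) {
        v4->sin_family = AF_INET;
        v4->sin_port = htons(port);
        return sizeof(*v4);
    }

    memset(ss, 0, sizeof(*ss));
    if (inet_pton(AF_INET6, text, &v6->sin6_addr) == 1) {
        v6->sin6_family = AF_INET6;
        v6->sin6_port = htons(port);
        return sizeof(*v6);
    }
    return 0;
}
```

The result can be passed to connect() or bind() as (struct sockaddr *)&ss together with the returned length, so the calling code never needs to know which family it got.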

[..]
> Exactly. Even before IPv6, it was icky. Now, it's just crazy. We
> had an opportunity to fix this with IPv6 since IPv6 required
> non-trivial kernel hackage. Unfortunately, we didn't take advantage
> of that opportunity.

What you are talking about is an API wrapper. Depending on platform
these have existed for years already. Quite a few do not expose
addresses at all to the calling code.

One of the many reasons why putting the IPv6 enabled winsock dll in
place 14 years ago made various winsock applications understand IPv6.

Greets,
Jeroen
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On 03/01/2012 08:57 AM, David Conrad wrote:
>
>> Moving it across the kernel boundary solves nothing
> Actually, it does. Right now, applications effectively cache the address in their data space, requiring the application developer to go to quite a bit of work to deal with the address changing (or, far more typically, just pretend addresses never change). This has a lot of unfortunate side effects.

My rule of thumb for this sort of thing is "does it *require* kernel level access?"
In this case, the answer is manifestly "no". As far as ttl's go in particular, most
apps would work perfectly well always doing real DNS socket IO to a local resolver
each time which has the side effect that it would honor ttl, as well as benefiting
from cross process caching. It could be done in the kernel, but it would be introducing
a *lot* of complexity and inflexibility.

Even if you did want super high performance local DNS resolution, there are
still a lot of other ways to achieve that besides jamming it into the kernel. A
lot of the beauty of UNIX is that the kernel system interface is simple... dragging
more into the kernel is aesthetically wrong.

>>> What if I want to use SRV records? What if a new DNS
>>> RR comes around -- do I have to recompile the kernel?
> I believe with the exception of A/AAAA, RDATA is typically returned as either opaque (to the DNS) data blobs or names. This means the only stuff the kernel would need to deal with would be the A/AAAA lookups, everything else would be passed back as data, presumably via a new system call.

SRV records? This is starting to get really messy inside the kernel and for
no good reason that I can see.

>
>>> As far as
>>> I could tell, standard distros don't have libraries with lower level access to
>>> DNS (in my case, it needed to not block).
> There have been lower-level resolver APIs since (at least) BSD 4.3 (man resolver(3)).

This is all getting sort of hazy since it was 8 years ago, but yes res_XX existed,
and hence the ares_ analog that I used. Maybe all that's really needed for low
level access primitives is a merger of res_ and ares_... asynchronous resolution
is a fairly important feature for modern event loop like things. But I don't claim
to be a DNS wonk so it might be worse than that.

Mike
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On 03/01/2012 08:58 AM, William Herrin wrote:
> On Thu, Mar 1, 2012 at 10:01 AM, Michael Thomas<mike@mtcc.com> wrote:
>> On 03/01/2012 06:26 AM, William Herrin wrote:
>>> The even simpler approach: create an AF_NAME with a sockaddr struct
>>> that contains a hostname instead of an IPvX address. Then let
>>> connect() figure out the details of caching, TTLs, protocol and
>>> address selection, etc. Such a connect() could even support a revised
>>> TCP stack which is able to retry with the other addresses at the first
>>> subsecond timeout rather than camping on each address in sequence for
>>> the typical system default of two minutes.
>>
>> The effect of what you're recommending is to move all of this
>> into the kernel, and in the process greatly expand its scope.
> Hi Michael,
>
> libc != kernel. I want to move the action into the standard libraries
> where it can be done once and done well. A little kernel action on top
> to parallelize connection attempts where there are multiple candidate
> addresses would be gravy, but not required.

connect(2) is a kernel level call just like open(2), etc. It may
have a thin wrapper, but that's OS dependent, IIRC.

man 2 connect:

"The connect() system call connects the socket referred to by the file descriptor..."

Mike
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Thu, Mar 1, 2012 at 1:32 PM, Michael Thomas <mike@mtcc.com> wrote:
> On 03/01/2012 08:58 AM, William Herrin wrote:
>> libc != kernel. I want to move the action into the standard libraries
>> where [resolve and connect] can be done once and done well.
>> A little kernel action on top
>> to parallelize connection attempts where there are multiple candidate
>> addresses would be gravy, but not required.
>
> connect(2) is a kernel level call just like open(2), etc. It may
> have a thin wrapper, but that's OS dependent, IIRC.
>
> man connect 2:
>
> "The connect() system call connects the socket referred to by the file
> descriptor..."

Then name the new one something else and document it in man section 3.
Next objection?
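
A minimal sketch of what such a section 3 function could look like, built only on getaddrinfo() and connect() as they exist today. connect_by_name() is a hypothetical name, not an existing libc call, and this version tries candidates one after another rather than in parallel.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical library helper: resolve host/service and return a connected
 * TCP socket, or -1 if resolution fails or every candidate address fails. */
int connect_by_name(const char *host, const char *service)
{
    struct addrinfo hints, *res, *r;
    int fd = -1;

    assert(host != NULL && service != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6, whatever resolves */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    for (r = res; r != NULL; r = r->ai_next) {
        fd = socket(r->ai_family, r->ai_socktype, r->ai_protocol);
        if (fd < 0)
            continue;                 /* family unsupported; try the next one */
        if (connect(fd, r->ai_addr, r->ai_addrlen) == 0)
            break;                    /* connected */
        close(fd);
        fd = -1;
    }

    /* The addresses are released before returning, so nothing is cached. */
    freeaddrinfo(res);
    return fd;
}
```

A caller would write something like fd = connect_by_name("www.example.com", "http") and never touch a sockaddr at all.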

-Bill


Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
Michael,

On Mar 1, 2012, at 10:00 AM, Michael Thomas wrote:
> My rule of thumb for this sort of thing is "does it *require* kernel level access?"
> In this case, the answer is manifestly "no".

This is tilting at windmills since it's wildly unlikely anything will change, but...

The idea is to add a level of indirection that does not currently exist, similar to the mapping of filename/file handle/inode in the filesystem. This layer of indirection allows the kernel to remap things as it sees fit without impacting the application. If such functionality existed, the kernel could manage the mapping between name and address to do things like honoring DNS TTL, transparently handling renumbering events, dealing with protocol transitions even during a connection, etc. As things are now, it's like having to rewrite non-trivial sections of code for _all_ disk-aware applications because we've gone from a 32-bit file system to a 64-bit file system, even though the vast majority of those applications couldn't care less.

> SRV records?

Do not have addresses in their RDATA, they have names.
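
To make that concrete: SRV RDATA (RFC 2782) is three 16-bit integers followed by a target name, with no address anywhere in it. Below is a minimal decoding sketch, assuming the target is an uncompressed name; struct srv_rdata and srv_rdata_parse() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded SRV RDATA: priority, weight, port, then a target *name*. */
struct srv_rdata {
    uint16_t priority;
    uint16_t weight;
    uint16_t port;
    char     target[256];   /* dotted name, NUL-terminated */
};

/* Decode SRV RDATA with an uncompressed target name.
 * Returns 0 on success, -1 on malformed or compressed input. */
int srv_rdata_parse(const uint8_t *rdata, size_t len, struct srv_rdata *out)
{
    size_t i = 6, o = 0, n;

    assert(rdata != NULL && out != NULL);
    if (len < 7)
        return -1;          /* three 16-bit fields plus at least the root label */

    out->priority = (uint16_t)((rdata[0] << 8) | rdata[1]);
    out->weight   = (uint16_t)((rdata[2] << 8) | rdata[3]);
    out->port     = (uint16_t)((rdata[4] << 8) | rdata[5]);

    /* Labels are length-prefixed; a zero length ends the name. */
    while (i < len && rdata[i] != 0) {
        n = rdata[i++];
        if (n > 63 || i + n > len || o + n + 1 >= sizeof(out->target))
            return -1;      /* n > 63 also rejects compression pointers */
        while (n-- > 0)
            out->target[o++] = (char)rdata[i++];
        out->target[o++] = '.';
    }
    if (i >= len)
        return -1;          /* ran off the end without a terminating root label */
    out->target[o] = '\0';
    return 0;
}
```

The target that comes out still has to be resolved to A/AAAA records before anything can connect to it, which is the only step that involves addresses.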

Regards,
-drc
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
Jeroen,

On Mar 1, 2012, at 9:25 AM, Jeroen Massar wrote:
>> I always thought the right way to deal with IPv6 would have been to
>> use a 32-bit number from the class E space as a 'network handle'
>> where the actual address (be it IPv4 or IPv6) was handled by the
>> kernel.
>
> This is the case when you pass in a sockaddr. Note, not a sockaddr_in or
> a sockaddr_in6, but just a sockaddr.

Sorry? On which system? As far as I'm aware, there are no libraries that make use of class E addresses to act as a layer of indirection similar to file handles. Would love to know such exists.

> There is a nice 14 year old article about this:
> http://www.kame.net/newsletter/19980604/

Quoting from that article: "This way the network address and address family is will not live together, and leads to bunch of if/switch statement and mistakes in programming. " which is exactly the point. It has been 14 years and people are _STILL_ discussing this.

> And actually those addresses
> are only 'copied' once: during accept() or connect(),

Assuming the application doesn't need to copy the address, ever.

> Ever heard of sockaddr_storage?

Oddly, yes. It still astonishes me that sizeof(struct sockaddr) < sizeof(struct sockaddr_storage).

> It was made to solve that little issue. See also, that article above.

Thus requiring people to go in and muck with code thereby increasing the cost of migration with obvious effect.

> What you are talking about is an API wrapper. Depending on platform
> these have existed for years already. Quite a few do not expose
> addresses at all to the calling code.

And yet, look at the code Mark Andrews just referenced as his recommended way of dealing with initiating connections. How many applications actually do anything like that? More to the point, how many books/articles/etc. exist that reference these APIs you're talking about vs. how many reference the traditional way one goes about dealing with networks?

Rhetorical questions, no need to answer. Got tired of tilting at this windmill some time ago and I know nothing will change. I'm just amazed that people defend the abominable kludge that is the existing common sockets/resolver APIs.

Regards,
-drc
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Mar 1, 2012, at 6:26 AM, William Herrin wrote:

> On Thu, Mar 1, 2012 at 7:20 AM, Owen DeLong <owen@delong.com> wrote:
>> The simpler approach and perfectly viable without mucking
>> up what is already implemented and working:
>>
>> Don't keep returns from GAI/GNI around longer than it takes
>> to cycle through your connect() loop immediately after the GAI/GNI call.
>
> The even simpler approach: create an AF_NAME with a sockaddr struct
> that contains a hostname instead of an IPvX address. Then let
> connect() figure out the details of caching, TTLs, protocol and
> address selection, etc. Such a connect() could even support a revised
> TCP stack which is able to retry with the other addresses at the first
> subsecond timeout rather than camping on each address in sequence for
> the typical system default of two minutes.
>

That's not simpler for the following reasons:

1. It takes away abilities to manage the connect() process that some
applications want.

2. It requires a rewrite of a whole lot of software built on the current
mechanisms.

Most systems provide a mechanism for overriding the timeout for
connect().

Further, there are lots of classes, libraries, etc. that you can already use
if you want to abstract the gai/gni + connect functionality.

What exists isn't broken at the API level. Please stop trying to fix what
is not broken.

Owen
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
>
> It's deeper than just that, though. The whole paradigm is messy, from
> the point of view of someone who just wants to get stuff done. The
> examples are (almost?) all fatally flawed. The code that actually gets
> at least some of it right ends up being too complex and too hard for
> people to understand why things are done the way they are.
>
> Even in the "old days", before IPv6, geez, look at this:
>
> bcopy(host->h_addr_list[n], (char *)&addr->sin_addr.s_addr, sizeof(addr->sin_addr.s_addr));
>
> That's real comprehensible - and it's essentially the data interface
> between the resolver library and the system's addressing structures
> for syscalls.
>
> On one hand, it's "great" that they wanted to abstract the dirty details
> of DNS away from users, but I'd say they failed pretty much even at that.
>
> ... JG

I think that the modern set of getaddrinfo and connect is actually not that complicated:

/* Hints for getaddrinfo() (tell it what we want) */
memset(&addrinfo, 0, sizeof(addrinfo));  /* Zero out the buffer */
addrinfo.ai_family = PF_UNSPEC;          /* Any and all address families */
addrinfo.ai_socktype = SOCK_STREAM;      /* Stream Socket */
addrinfo.ai_protocol = IPPROTO_TCP;      /* TCP */
/* Ask the resolver library for the information. Exit on failure. */
/* argv[1] is the hostname passed in by the user. "demo" is the service name */
/* The assignment is parenthesized so that rval holds the return code itself */
if ((rval = getaddrinfo(argv[1], "demo", &addrinfo, &res)) != 0) {
    fprintf(stderr, "%s: Failed to resolve address information: %s\n",
            argv[0], gai_strerror(rval));
    exit(2);
}

/* Iterate through the results */
for (r = res; r; r = r->ai_next) {
    /* Create a socket configured for the next candidate */
    sockfd6 = socket(r->ai_family, r->ai_socktype, r->ai_protocol);
    if (sockfd6 < 0) {
        /* No socket for this address family; move on to the next candidate */
        perror("Socket error");
        continue;
    }
    /* Try to connect */
    if (connect(sockfd6, r->ai_addr, r->ai_addrlen) < 0) {
        /* Failed to connect */
        e_save = errno;
        /* Destroy socket */
        (void) close(sockfd6);
        /* Recover the error information */
        errno = e_save;
        /* Tell the user that this attempt failed */
        fprintf(stderr, "%s: Failed attempt to %s.\n", argv[0],
                get_ip_str((struct sockaddr *)r->ai_addr, buf, BUFLEN));
        /* Give error details */
        perror("Socket error");
    } else { /* Success! */
        /* Inform the user */
        snprintf(s, BUFLEN, "%s: Succeeded to %s.", argv[0],
                 get_ip_str((struct sockaddr *)r->ai_addr, buf, BUFLEN));
        debug(5, argv[0], s);
        /* Flag our success */
        success++;
        /* Stop iterating */
        break;
    }
}
/* Out of the loop. Either we succeeded or ran out of possibilities */
if (success == 0) { /* If we ran out of possibilities... */
    /* Inform the user, free up the resources, and exit */
    fprintf(stderr, "%s: Failed to connect to %s.\n", argv[0], argv[1]);
    freeaddrinfo(res);
    exit(5);
}
/* Succeeded. Inform the user and continue with the application */
printf("%s: Successfully connected to %s at %s on FD %d.\n", argv[0], argv[1],
       get_ip_str((struct sockaddr *)r->ai_addr, buf, BUFLEN),
       sockfd6);
/* Free up the memory held by the resolver results */
freeaddrinfo(res);

It's really hard to make a case that this is all that complex.

I put a lot of extra comments in there to make it clear what's happening for people who may not be used to coding in C. It also contains a whole lot of extra user notification and debugging instrumentation because it is designed as an example people can use to learn with.

Yes, this was a lot messier and a lot stranger and harder to get right with get*by{name,addr}, but, those days are long gone and anyone still coding with those needs to move forward.
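
For anyone following along: get_ip_str() in the code above is not a libc function, and its definition is not shown in this message. A minimal sketch of what such a helper might look like, assuming it only needs to handle AF_INET and AF_INET6. The name and argument order are taken from the calls above; the return type, the fallback strings and everything else are assumptions:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: write a printable form of the address in sa
 * into buf (buflen bytes) and return buf, so the result can be passed
 * straight to a "%s" conversion.
 */
const char *
get_ip_str(const struct sockaddr *sa, char *buf, size_t buflen)
{
    const void *addr = NULL;

    assert(sa != NULL && buf != NULL && buflen > 0);

    /* Pick out the raw address according to the address family. */
    if (sa->sa_family == AF_INET)
        addr = &((const struct sockaddr_in *)sa)->sin_addr;
    else if (sa->sa_family == AF_INET6)
        addr = &((const struct sockaddr_in6 *)sa)->sin6_addr;

    if (addr == NULL) {
        /* Neither IPv4 nor IPv6: say so rather than guess. */
        snprintf(buf, buflen, "(address family %d)", (int)sa->sa_family);
    } else if (inet_ntop(sa->sa_family, addr, buf, (socklen_t)buflen) == NULL) {
        /* Most likely the caller's buffer was too small. */
        snprintf(buf, buflen, "(unprintable)");
    }
    return (buf);
}
```

A buffer of INET6_ADDRSTRLEN bytes is large enough for either family, so a BUFLEN at least that big would avoid the "(unprintable)" case.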

Owen
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
In message <CAC38B59-1F54-4788-87A2-A1A8BE453500@delong.com>, Owen DeLong writes:
> >
> > It's deeper than just that, though. The whole paradigm is messy, from
> > the point of view of someone who just wants to get stuff done. The
> > examples are (almost?) all fatally flawed. The code that actually gets
> > at least some of it right ends up being too complex and too hard for
> > people to understand why things are done the way they are.
> >
> > Even in the "old days", before IPv6, geez, look at this:
> >
> > bcopy(host->h_addr_list[n], (char *)&addr->sin_addr.s_addr, sizeof(addr->sin_addr.s_addr));
> >
> > That's real comprehensible - and it's essentially the data interface
> > between the resolver library and the system's addressing structures
> > for syscalls.
> >
> > On one hand, it's "great" that they wanted to abstract the dirty details
> > of DNS away from users, but I'd say they failed pretty much even at that.
> >
> > ... JG
> > --
> > Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
> > "We call it the 'one bite at the apple' rule. Give me one chance [and] then I
> > won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
> > With 24 million small businesses in the US alone, that's way too many apples.
>
> I think that the modern set of getaddrinfo and connect is actually not that complicated:
>
> /* Hints for getaddrinfo() (tell it what we want) */
> memset(&addrinfo, 0, sizeof(addrinfo)); /* Zero out the buffer */
> addrinfo.ai_family=PF_UNSPEC; /* Any and all address families */
> addrinfo.ai_socktype=SOCK_STREAM; /* Stream Socket */
> addrinfo.ai_protocol=IPPROTO_TCP; /* TCP */
> /* Ask the resolver library for the information. Exit on failure. */
> /* argv[1] is the hostname passed in by the user. "demo" is the service name */
> /* The inner parentheses matter: without them rval would receive the result of the != comparison */
> if ((rval = getaddrinfo(argv[1], "demo", &addrinfo, &res)) != 0) {
>     fprintf(stderr, "%s: Failed to resolve address information.\n", argv[0]);
>     exit(2);
> }
>
> /* Iterate through the results */
> for (r=res; r; r = r->ai_next) {
>     /* Create a socket configured for the next candidate */
>     sockfd6 = socket(r->ai_family, r->ai_socktype, r->ai_protocol);
>     /* Try to connect */
>     if (connect(sockfd6, r->ai_addr, r->ai_addrlen) < 0)
>     {
>         /* Failed to connect */
>         e_save = errno;
>         /* Destroy socket */
>         (void) close(sockfd6);
>         /* Recover the error information */
>         errno = e_save;
>         /* Tell the user that this attempt failed */
>         fprintf(stderr, "%s: Failed attempt to %s.\n", argv[0],
>             get_ip_str((struct sockaddr *)r->ai_addr, buf, BUFLEN));
>         /* Give error details */
>         perror("Socket error");
>     } else { /* Success! */
>         /* Inform the user */
>         snprintf(s, BUFLEN, "%s: Succeeded to %s.", argv[0],
>             get_ip_str((struct sockaddr *)r->ai_addr, buf, BUFLEN));
>         debug(5, argv[0], s);
>         /* Flag our success */
>         success++;
>         /* Stop iterating */
>         break;
>     }
> }
> /* Out of the loop. Either we succeeded or ran out of possibilities */
> if (success == 0) /* If we ran out of possibilities... */
> {
>     /* Inform the user, free up the resources, and exit */
>     fprintf(stderr, "%s: Failed to connect to %s.\n", argv[0], argv[1]);
>     freeaddrinfo(res);
>     exit(5);
> }
> /* Succeeded. Inform the user and continue with the application */
> printf("%s: Successfully connected to %s at %s on FD %d.\n", argv[0], argv[1],
>     get_ip_str((struct sockaddr *)r->ai_addr, buf, BUFLEN),
>     sockfd6);
> /* Free up the memory held by the resolver results */
> freeaddrinfo(res);
>
> It's really hard to make a case that this is all that complex.
>
> I put a lot of extra comments in there to make it clear what's happening for people who may not be used to coding in C. It also contains a whole lot of extra user notification and debugging instrumentation because it is designed as an example people can use to learn with.
>
> Yes, this was a lot messier and a lot stranger and harder to get right with get*by{name,addr}, but, those days are long gone and anyone still coding with those needs to move forward.
>
> Owen
>

These days you want something more complicated as everyone is or
will be soon multi-homed. The basic loop above has very bad error
characteristics if the first machines are not reachable. I've got
working select, poll and thread based examples here:

http://www.isc.org/community/blog/201101/how-to-connect-to-a-multi-homed-server-over-tcp.

From http://www.isc.org/files/imce/select-connect_0.c:

/*
* Copyright (C) 2011 Internet Systems Consortium, Inc. ("ISC")
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND ISC DISCLAIMS ALL WARRANTIES WITH
* REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
* AND FITNESS. IN NO EVENT SHALL ISC BE LIABLE FOR ANY SPECIAL, DIRECT,
* INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
* LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
* OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
* PERFORMANCE OF THIS SOFTWARE.
*/


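/* Headers required by the code below. */
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define TIMEOUT 500 /* ms */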

int
connect_to_host(struct addrinfo *res0) {
    struct addrinfo *res;
    int fd = -1, n, i, j, flags, count, max = -1, *fds;
    struct timeval *timeout, timeout0 = { 0, TIMEOUT * 1000};
    fd_set fdset, wrset;

    /*
     * Work out how many possible descriptors we could use.
     */
    for (res = res0, count = 0; res; res = res->ai_next)
        count++;
    fds = calloc(count, sizeof(*fds));
    if (fds == NULL) {
        perror("calloc");
        goto cleanup;
    }
    FD_ZERO(&fdset);
    for (res = res0, i = 0, count = 0; res; res = res->ai_next) {
        fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd == -1) {
            /*
             * If AI_ADDRCONFIG is not supported we will get
             * EAFNOSUPPORT returned. Behave as if the address
             * was not there.
             */
            if (errno != EAFNOSUPPORT)
                perror("socket");
            else if (res->ai_next != NULL)
                continue;
        } else if (fd >= FD_SETSIZE) {
            close(fd);
        } else if ((flags = fcntl(fd, F_GETFL)) == -1) {
            perror("fcntl");
            close(fd);
        } else if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
            perror("fcntl");
            close(fd);
        } else if (connect(fd, res->ai_addr, res->ai_addrlen) == -1) {
            if (errno != EINPROGRESS) {
                perror("connect");
                close(fd);
            } else {
                /*
                 * Record the information for this descriptor.
                 */
                fds[i] = fd;
                FD_SET(fd, &fdset);
                if (max == -1 || fd > max)
                    max = fd;
                count++;
                i++;
            }
        } else {
            /*
             * We connected without blocking.
             */
            goto done;
        }

        if (count == 0)
            continue;

        assert(max != -1);
        do {
            if (res->ai_next != NULL)
                timeout = &timeout0;
            else
                timeout = NULL;

            /* The write bit is set on both success and failure. */
            wrset = fdset;
            n = select(max + 1, NULL, &wrset, NULL, timeout);
            if (n == 0) {
                timeout0.tv_usec >>= 1;
                break;
            }
            if (n < 0) {
                if (errno == EAGAIN || errno == EINTR)
                    continue;
                perror("select");
                fd = -1;
                goto done;
            }
            for (fd = 0; fd <= max; fd++) {
                if (FD_ISSET(fd, &wrset)) {
                    socklen_t len;
                    int err;
                    for (j = 0; j < i; j++)
                        if (fds[j] == fd)
                            break;
                    assert(j < i);
                    /*
                     * Test to see if the connect
                     * succeeded.
                     */
                    len = sizeof(err);
                    n = getsockopt(fd, SOL_SOCKET,
                                   SO_ERROR, &err, &len);
                    if (n != 0 || err != 0) {
                        close(fd);
                        FD_CLR(fd, &fdset);
                        fds[j] = -1;
                        count--;
                        continue;
                    }
                    /* Connect succeeded. */
                    goto done;
                }
            }
        } while (timeout == NULL && count != 0);
    }

    /* We failed to connect. */
    fd = -1;

done:
    /* Close all other descriptors we have created. */
    for (j = 0; j < i; j++)
        if (fds[j] != fd && fds[j] != -1) {
            close(fds[j]);
        }

    if (fd != -1) {
        /* Restore default blocking behaviour. */
        if ((flags = fcntl(fd, F_GETFL)) != -1) {
            flags &= ~O_NONBLOCK;
            if (fcntl(fd, F_SETFL, flags) == -1)
                perror("fcntl");
        } else
            perror("fcntl");
    }

cleanup:
    /* Free everything. */
    if (fds) free(fds);

    return (fd);
}
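
The core step in the function above (start a non-blocking connect(), wait for the descriptor to become writable, then read SO_ERROR to see whether the connect really succeeded) can also be shown in isolation. What follows is a minimal sketch for a single candidate address using poll() rather than select(). It is not the poll-based example from the ISC page; connect_one() and its timeout_ms parameter are hypothetical names chosen for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to connect to one getaddrinfo() result,
 * giving up after timeout_ms milliseconds.  Returns a connected
 * descriptor in blocking mode, or -1 on failure or timeout.
 */
int
connect_one(const struct addrinfo *ai, int timeout_ms)
{
    struct pollfd pfd;
    socklen_t len;
    int fd, flags, err, n;

    assert(ai != NULL);
    fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd == -1)
        return (-1);
    if ((flags = fcntl(fd, F_GETFL)) == -1 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        goto fail;

    if (connect(fd, ai->ai_addr, ai->ai_addrlen) == -1) {
        if (errno != EINPROGRESS)
            goto fail;
        /* Writability is signalled on both success and failure. */
        pfd.fd = fd;
        pfd.events = POLLOUT;
        pfd.revents = 0;
        do {
            /* Simplification: an EINTR restarts the full timeout. */
            n = poll(&pfd, 1, timeout_ms);
        } while (n == -1 && errno == EINTR);
        if (n <= 0)
            goto fail; /* timed out, or poll() itself failed */
        /* SO_ERROR says whether the connect actually succeeded. */
        len = sizeof(err);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) != 0 ||
            err != 0)
            goto fail;
    }

    /* Restore default blocking behaviour for the caller. */
    if (fcntl(fd, F_SETFL, flags) == -1)
        goto fail;
    return (fd);

fail:
    close(fd);
    return (-1);
}
```

Calling this once per address in sequence still waits out the timeout on every dead address before moving on, which is why the function above keeps the earlier attempts open and selects on all of them at once.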

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Thu, Mar 1, 2012 at 4:07 PM, Owen DeLong <owen@delong.com> wrote:
> I think that the modern set of getaddrinfo and connect is actually not that complicated:

Owen,

It took you 50 lines of code to do
'socket=connect("www.google.com",80,TCP);' and you still managed to
produce a version which, due to the timeout on dead addresses, is
worthless for any kind of interactive program like a web browser. And
because that code isn't found in a system library, every single
application programmer has to write it all over again.

I'm a fan of Rube Goldberg machines but that was ridiculous.

Regards,
Bill Herrin





--
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
In message <CAP-guGXLpzai4LrxyJcNn06yQ1jAEu4QeRpVzGRah=+OGLy9Zw@mail.gmail.com>, William Herrin writes:
> On Thu, Mar 1, 2012 at 4:07 PM, Owen DeLong <owen@delong.com> wrote:
> > I think that the modern set of getaddrinfo and connect is actually not that complicated:
>
> Owen,
>
> It took you 50 lines of code to do
> 'socket=connect("www.google.com",80,TCP);' and you still managed to
> produce a version which, due to the timeout on dead addresses, is
> worthless for any kind of interactive program like a web browser. And
> because that code isn't found in a system library, every single
> application programmer has to write it all over again.

And your 'socket=connect("www.google.com",80,TCP);' won't work for
a web browser either unless you are using threads and are willing
to have the thread stall.

The existing connect() semantics actually work well for browsers
but they need to be properly integrated into the system as a whole.
Nameservers have similar connect() issues as web browsers with one
advantage, most of the time we are connecting to a machine we have
just connected to via UDP. That doesn't mean we don't do non-blocking
connect however.

> I'm a fan of Rube Goldberg machines but that was ridiculous.
>
> Regards,
> Bill Herrin
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
William,

I could have done it in far fewer lines of code, but it would have been much less readable.

Not blocking on the connect() call is a little more complex, but, not terribly so. It does, however, again, make the code quite a bit less readable.

There are libraries available that abstract everything I did there and you are welcome to use them.

Since C does not support overloading, they export different functions for the behavior you seek.

If you want, program in Python where the libraries do provide the abstraction you seek. Of course, that means you have to cope with Python's other disgusting habits like spaces are meaningful and variables are indistinguishable from code, but, there's always a tradeoff.

You don't have to reinvent what I've done. Neither does every or any other application programmer.
You are welcome to use any of the many connection abstraction libraries that are available in open source. I suggest you make a trip through google code.

Owen

On Mar 1, 2012, at 2:09 PM, William Herrin wrote:

> On Thu, Mar 1, 2012 at 4:07 PM, Owen DeLong <owen@delong.com> wrote:
>> I think that the modern set of getaddrinfo and connect is actually not that complicated:
>
> Owen,
>
> It took you 50 lines of code to do
> 'socket=connect("www.google.com",80,TCP);' and you still managed to
> produce a version which, due to the timeout on dead addresses, is
> worthless for any kind of interactive program like a web browser. And
> because that code isn't found in a system library, every single
> application programmer has to write it all over again.
>
> I'm a fan of Rube Goldberg machines but that was ridiculous.
>
> Regards,
> Bill Herrin
>
>
>
>
>
> --
> William D. Herrin ................ herrin@dirtside.com bill@herrin.us
> 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
> Falls Church, VA 22042-3004
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Thu, Mar 1, 2012 at 5:37 PM, Owen DeLong <owen@delong.com> wrote:
> You don't have to reinvent what I've done. Neither does every
> or any other application programmer.
> You are welcome to use any of the many connection
> abstraction libraries that are available in open source.
> I suggest you make a trip through google code.

Which is what everybody basically does. And when it works during the
decidedly non-rigorous testing, they move on to the next problem...
with code that doesn't perform well in the corner cases. Such as when
a host has just been renumbered or one of the host's addresses is
unreachable.

And because most everybody has made more or less the same errors, the
DNS TTL fails to cause their applications to work as intended and
loses its utility as a tool to facilitate renumbering.


> If you want, program in Python where the libraries do
> provide the abstraction you seek. Of course, that
> means you have to cope with Python's other disgusting
> habits like spaces are meaningful and variables are
> indistinguishable from code, but, there's always a tradeoff.

::shudder:: I don't *want* to do anything in python. The occasional
reality of a situation dictates that I do some work in python, but I
most definitely don't *want* to.

Regards,
Bill Herrin


--
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Thu, Mar 01, 2012 at 05:57:11PM -0500, William Herrin wrote:
> Which is what everybody basically does. And when it works during the
> decidedly non-rigorous testing, they move on to the next problem...
> with code that doesn't perform well in the corner cases. Such as when
> a host has just been renumbered or one of the host's addresses is
> unreachable.
>
> And because most everybody has made more or less the same errors, the
> DNS TTL fails to cause their applications to work as intended and
> loses its utility as a tool to facilitate renumbering.

Is there an RFC or BCP that describes how to correctly write such a
library? Perhaps we need to work to get such a thing, and then push
for RFC-compliance of the resolver libraries, or develop a set of
libraries named after and fully compliant with the RFC and get
software to use them.
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Mar 1, 2012, at 2:57 PM, William Herrin wrote:

> On Thu, Mar 1, 2012 at 5:37 PM, Owen DeLong <owen@delong.com> wrote:
>> You don't have to reinvent what I've done. Neither does every
>> or any other application programmer.
>> You are welcome to use any of the many connection
>> abstraction libraries that are available in open source.
>> I suggest you make a trip through google code.
>
> Which is what everybody basically does. And when it works during the
> decidedly non-rigorous testing, they move on to the next problem...
> with code that doesn't perform well in the corner cases. Such as when
> a host has just been renumbered or one of the host's addresses is
> unreachable.
>

Then push for better written abstraction libraries. There's no need to
break the current functionality of the underlying system calls and
libc functions which would be needed by any such library anyway.

> And because most everybody has made more or less the same errors, the
> DNS TTL fails to cause their applications to work as intended and
> loses its utility as a tool to facilitate renumbering.
>

Since I don't write applications for a living, I will admit I haven't rigorously
tested any of the libraries out there, but, I'm willing to bet that someone,
somewhere has probably written a good one by now.

>
>> If you want, program in Python where the libraries do
>> provide the abstraction you seek. Of course, that
>> means you have to cope with Python's other disgusting
>> habits like spaces are meaningful and variables are
>> indistinguishable from code, but, there's always a tradeoff.
>
> ::shudder:: I don't *want* to do anything in python. The occasional
> reality of a situation dictates that I do some work in python, but I
> most definitely don't *want* to.

Believe me, I'm in the same boat on that one. However, it is the only
language I know of that provides the kind of interface you are demanding.
Perhaps this should tell you something about what you are asking for. ;-)

Owen
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Thu, Mar 1, 2012 at 8:02 PM, Owen DeLong <owen@delong.com> wrote:
> There's no need to
> break the current functionality of the underlying system calls and
> libc functions which would be needed by any such library anyway.

Owen,

Point to one sentence written by anybody in this entire thread in
which breaking current functionality was proposed.


>> And because most everybody has made more or less the same errors, the
>> DNS TTL fails to cause their applications to work as intended and
>> loses its utility as a tool to facilitate renumbering.
>
> Since I don't write applications for a  living, I will admit I haven't rigorously
> tested any of the libraries out there, but, I'm willing to bet that someone,
> somewhere has probably written a good one by now.

Yeah, and if you give me a few weeks I can probably find it amidst all
the others which aren't so hot.

Regards,
Bill



--
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
Re: dns and software, was Re: Reliable Cloud host ? [ In reply to ]
On Mar 1, 2012, at 5:15 PM, William Herrin wrote:

> On Thu, Mar 1, 2012 at 8:02 PM, Owen DeLong <owen@delong.com> wrote:
>> There's no need to
>> break the current functionality of the underlying system calls and
>> libc functions which would be needed by any such library anyway.
>
> Owen,
>
> Point to one sentence written by anybody in this entire thread in
> which breaking current functionality was proposed.
>
When you said that:

connect(char *name, uint16_t port) should work

That can't work without breaking the existing functionality of the connect() system call.
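
To make the distinction concrete: a separately named library function can offer the by-name interface while leaving connect() itself alone. A minimal sketch, assuming a hypothetical connect_by_name() (not an existing libc call) that takes the service as a string, the way getaddrinfo() does. It blocks on each address in turn, so it does nothing about the dead-address timeout raised earlier in the thread:

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: resolve host/service and return a connected
 * TCP descriptor, or -1 if resolution failed or no address worked.
 * The name is resolved afresh on every call and the results are freed
 * before returning, so nothing from getaddrinfo() outlives the loop.
 */
int
connect_by_name(const char *host, const char *service)
{
    struct addrinfo hints, *res, *r;
    int fd = -1;

    assert(host != NULL && service != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6, whatever resolves */
    hints.ai_socktype = SOCK_STREAM; /* TCP */
    if (getaddrinfo(host, service, &hints, &res) != 0)
        return (-1);

    /* Try each candidate in the order the resolver returned them. */
    for (r = res; r != NULL; r = r->ai_next) {
        fd = socket(r->ai_family, r->ai_socktype, r->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, r->ai_addr, r->ai_addrlen) == 0)
            break; /* connected: keep this descriptor */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return (fd);
}
```

A caller would then write something like fd = connect_by_name("www.example.com", "80"); and existing code that calls connect() with a sockaddr is unaffected.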

>
>>> And because most everybody has made more or less the same errors, the
>>> DNS TTL fails to cause their applications to work as intended and
>>> loses its utility as a tool to facilitate renumbering.
>>
>> Since I don't write applications for a living, I will admit I haven't rigorously
>> tested any of the libraries out there, but, I'm willing to bet that someone,
>> somewhere has probably written a good one by now.
>
> Yeah, and if you give me a few weeks I can probably find it amidst all
> the others which aren't so hot.
>

I doubt it would take weeks, but, in any case, it's probably faster than writing and
debugging your own.

Owen
