Mailing List Archive

User-Agent:
Hi!

From now on, a specific per-bot/per-software/per-client User-Agent header is mandatory for contacting Wikimedia sites.

Domas
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: User-Agent:
On Mon, Feb 15, 2010 at 8:54 PM, Domas Mituzas <midom.lists@gmail.com> wrote:
> Hi!
>
> From now on, a specific per-bot/per-software/per-client User-Agent header is mandatory for contacting Wikimedia sites.
>
> Domas

In that case should we tweak the MediaWiki user agent to serve
something more unique than "MediaWiki/version?"

-Chad

Re: User-Agent:
Domas wrote:
> From now on, a specific per-bot/per-software/per-client User-Agent
> header is mandatory for contacting Wikimedia sites.

But why?

(This just broke one of my bots.)

Are the details of this policy discussed anywhere?

Is it permissible to send

User-Agent: x

thus providing precisely the same amount of information as if not
supplying the header at all?

Re: User-Agent:
Domas wrote:
> From now on, a specific per-bot/per-software/per-client User-Agent
> header is mandatory for contacting Wikimedia sites.

Oh, my. And not just to be a bot, or to edit the site manually,
but even to view it. You can't even fetch a single, simple page
now without supplying that header.

If this has been discussed to death elsewhere and represents
some bizarrely-informed consensus, I'll try to spare this list
my belated rantings, but this is a terrible, terrible idea.
Relying on User-Agent represents the very antithesis of
[[Postel's Law]], a rock-solid principle on which the Internet
is (or used to be) based.

Re: User-Agent:
Hello,
On Tuesday 16 February 2010 03:06:49, Steve Summit wrote:
> Is it permissible to send
>
> User-Agent: x

Why is it so hard to set
User-Agent: mytoolname/version mymail@mail.invalid

? (You can leave out the e-mail address if you're paranoid.)

It's clean, fast and good.

Sincerely,
DaB.

--
wp-blog.de
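A minimal Python sketch of the kind of identifying header DaB describes, using only the standard library. The tool name, version, contact address and target URL are placeholders, not anything Wikimedia prescribes. One deliberate deviation: the sketch wraps the contact address in parentheses, because the RFC 2616 grammar quoted later in the thread does not allow "@" inside a product token, so a bare address is only valid inside a comment.

```python
import urllib.request
from typing import Optional

# Placeholder identity; every value here is hypothetical.
TOOL_NAME = "mytoolname"
TOOL_VERSION = "1.0"
CONTACT = "mymail@mail.invalid"


def build_user_agent(name: str, version: str, contact: Optional[str] = None) -> str:
    """Return 'name/version', optionally followed by '(contact)' as an RFC 2616 comment."""
    ua = f"{name}/{version}"
    if contact:
        ua += f" ({contact})"
    return ua


def make_request(url: str) -> urllib.request.Request:
    """Build, but do not send, a request that carries the identifying User-Agent."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": build_user_agent(TOOL_NAME, TOOL_VERSION, CONTACT)},
    )


if __name__ == "__main__":
    req = make_request("https://en.wikipedia.org/wiki/Main_Page")
    # urllib stores header names via str.capitalize(), so the key is "User-agent".
    print(req.get_header("User-agent"))  # prints "mytoolname/1.0 (mymail@mail.invalid)"
    # Actually sending it would be: urllib.request.urlopen(req).read()
```

Keeping the identity in one place means every request the tool makes carries the same string, which is what makes it useful to the people reading the server logs.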
Re: User-Agent:
Hi Steve,

> But why?

Because we need to identify malicious behavior.

> (This just broke one of my bots.)
> Are the details of this policy discussed anywhere?

I don't know. Probably. We have always told people to specify a User-Agent; the check was just broken.

> Is it permissible to send
>
> User-Agent: x
>
> thus providing precisely the same amount of information as if not
> supplying the header at all?

No. You are missing a very simple idea: with such a User-Agent you clearly identify yourself as malicious, whereas when you don't specify one, you're either malicious or ignorant.

Do note that we're good at detecting spoofed User-Agents too, so if your bots disguise themselves as MSIE, Firefox or any other regular browser, your behavior is seen as malicious.

We do not like malicious behavior.

Domas

Re: User-Agent:
Steve,

> If this has been discussed to death elsewhere and represents
> some bizarrely-informed consensus, I'll try to spare this list
> my belated rantings, but this is a terrible, terrible idea.
> Relying on User-Agent represents the very antithesis of
> [[Postel's Law]], a rock-solid principle on which the Internet
> is (or used to be) based.

RFC2616:
14.43 User-Agent

The User-Agent request-header field contains information about the user agent originating the request. This is for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations. User agents SHOULD include this field with requests. The field can contain multiple product tokens (section 3.8) and comments identifying the agent and any subproducts which form a significant part of the user agent. By convention, the product tokens are listed in order of their significance for identifying the application.

User-Agent = "User-Agent" ":" 1*( product | comment )

Example:

User-Agent: CERN-LineMode/2.15 libwww/2.17b3

RFC2119:
3. SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

I guess you've just found one more implication to carefully weigh before not specifying a U-A.

Domas
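To make the grammar quoted above concrete, here is a small, simplified Python checker for `User-Agent = "User-Agent" ":" 1*( product | comment )`, with `product = token ["/" product-version]`, the token separators from RFC 2616 section 2.2, and parenthesised (possibly nested) comments. It is a sketch, not a complete implementation: for instance, it does not reject control characters inside comments. Note that `x` passes, which illustrates the point being argued in the thread: the grammar only checks the shape of the value, not whether it actually identifies anyone.

```python
# RFC 2616 section 2.2 separators; token characters are visible ASCII minus these.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token_char(c: str) -> bool:
    return 32 < ord(c) < 127 and c not in SEPARATORS


def parse_user_agent(value: str) -> list[str]:
    """Split a User-Agent value into product and comment elements.

    Raises ValueError if the value does not match 1*( product | comment ).
    Simplified: control characters inside comments are not rejected.
    """
    items = []
    i, n = 0, len(value)
    while i < n:
        c = value[i]
        if c in " \t":  # linear whitespace between elements
            i += 1
            continue
        if c == "(":  # comment, possibly nested, with backslash quoted-pairs
            start, depth = i, 0
            while i < n:
                ch = value[i]
                if ch == "\\":  # quoted-pair: skip the escaped character
                    i += 2
                    continue
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                    if depth == 0:
                        break
                i += 1
            if i >= n:
                raise ValueError("unterminated comment")
            items.append(value[start : i + 1])
            i += 1
            continue
        # product = token ["/" product-version], product-version = token
        start = i
        while i < n and is_token_char(value[i]):
            i += 1
        if i == start:
            raise ValueError(f"unexpected character {c!r} at offset {i}")
        if i < n and value[i] == "/":
            i += 1
            version_start = i
            while i < n and is_token_char(value[i]):
                i += 1
            if i == version_start:
                raise ValueError("empty product-version")
        items.append(value[start:i])
    if not items:
        raise ValueError("at least one product or comment is required")
    return items
```

Applied to the RFC's own example, `parse_user_agent("CERN-LineMode/2.15 libwww/2.17b3")` returns `['CERN-LineMode/2.15', 'libwww/2.17b3']`, while `mytoolname/1.0 mymail@mail.invalid` is rejected at the "@".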


Re: User-Agent:
Domas wrote:
> Hi Steve,
> > But why?
>
> Because we need to identify malicious behavior.

You're trying to detect / guard against malicious behavior using
*User-Agent*?? Good grief. Have fun with the whack-a-mole game, then.

Re: User-Agent:
>> Relying on User-Agent represents the very antithesis of
>> [[Postel's Law]], a rock-solid principle on which the Internet
>> is (or used to be) based.
>
> RFC2616:
> 14.43 User-Agent
> The User-Agent request-header field... is for... automated
> recognition of user agents for the sake of tailoring
> responses to avoid particular user agent limitations.

Yes, that's precisely the violation of Postel's Law I was
thinking of.

Re: User-Agent:
Hi!

> You're trying to detect / guard against malicious behavior using
> *User-Agent*?? Good grief. Have fun with the whack-a-mole game, then.


Thanks! I'm relatively new to this whole operations game, so I'm obsessed with graphs and whack-a-mole :)

Cheers,
Domas

Re: User-Agent:
On 02/15/2010 05:54 PM, Domas Mituzas wrote:
> Hi!
>
> From now on, a specific per-bot/per-software/per-client User-Agent header is mandatory for contacting Wikimedia sites.
>

Two questions:

Was there some urgent production impact that required doing this with no
notice?

Was any impact analysis done on this? Given Wikipedia's mission, we
can't be as casual about rejecting traffic as a commercial site would
be. If a commercial site accidentally gets rid of some third-world
traffic running behind a shoddy ISP, it's no loss; nobody wants to
advertise to them anyhow. But for us, those are the people who gain the
most from being able to reach us.

William

Re: User-Agent:
Hi!

> Was there some urgent production impact that required doing this with no
> notice?

Actually, we've had a User-Agent header requirement for ages; it just failed to do what it was supposed to do for a while. Consider this a bugfix.

> Was any impact analysis done on this?

Yup!

> Given Wikipedia's mission, we can't be as casual about rejecting traffic as a commercial site would be.
> If a commercial site accidentally gets rid of some third-world traffic running behind a shoddy ISP, it's no
> loss; nobody wants to advertise to them anyhow. But for us, those are the people who gain the
> most from being able to reach us.

Actually, at the moment this mostly affects crap sites that hot-load data from us to display spamvertisements on hacked sites on the internet.
I don't know where your 'shoddy ISP' speculation fits in.

Domas

Re: User-Agent:
Hello,
On Tuesday 16 February 2010 04:15:57, William Pietri wrote:
> some third-world traffic

Why shouldn't browsers in the third world send User-Agents like our browsers do
(I doubt that they use different ones than we do)? Domas's change blocks just
two kinds of requests: 1) those from broken bots and crawlers, and 2) those from
paranoid users who removed the User-Agent from their browsers. The >99% of
normal users (with normal browsers) will not notice a difference.

Sincerely,
DaB.

--
wp-blog.de
Re: User-Agent:
On 02/15/2010 07:25 PM, Domas Mituzas wrote:
>> Was there some urgent production impact that required doing this with no
>> notice?
>>
> Actually, we've had a User-Agent header requirement for ages; it just failed to do what it was supposed to do for a while. Consider this a bugfix.
>

Ok. I'm going to take that as "no". In the future, I think it would be
better to let people know in advance about non-urgent changes that may
break things for them.


>> Was any impact analysis done on this?
>>
> Yup!
>

Would you care to share the results with us?

In the future, I'd suggest giving basic info like that as part of an
announcement.

> Actually, at the moment this mostly affects crap sites that hot-load data from us to display spamvertisements on hacked sites on the internet.
>

That's another good thing to share as part of a change announcement:
motivation for the change.

> I don't know where your 'shoddy ISP' speculation fits in.
>

Last I looked, there were a lot of poorly maintained proxies out there,
some of which mangle headers. It seemed reasonable to me that some of
those are on low-rent ISPs in poor countries. If you have already done
the work to prove that no legitimate users anywhere in the world are
impacted by this change, then perhaps you could save us further
discussion and just explain that?

Thanks,

William



Re: User-Agent:
On 02/15/2010 06:50 PM, Steve Summit wrote:
> You're trying to detect / guard against malicious behavior using
> *User-Agent*?? Good grief. Have fun with the whack-a-mole game, then.
>


Yes, a simple restriction like this tends to create smarter villains
rather than less villainy. Filtering on an obvious, easy-to-change
characteristic also destroys a useful source of information on who the
bad people are, making future abuse prevention efforts harder.

William

Re: User-Agent:
William,

> Yes, a simple restriction like this tends to create smarter villains
> rather than less villainy. Filtering on an obvious, easy-to-change
> characteristic also destroys a useful source of information on who the
> bad people are, making future abuse prevention efforts harder.


Thanks for insights. But no.

We don't use the UA as the first step of analysis; it was a helpful tertiary tool that put these people into the "ignorant or malicious" category.
If they had spoofed their UAs, we'd have blocked the IPs and informed upstreams, treating it as fully malicious behavior.
If they had a nice UA, we might have attempted to contact them or isolated their workload until the issue was fixed ;-)

Domas
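The three-way triage described in the message above can be expressed as a decision function. This is a hypothetical illustration, not Wikimedia's actual tooling: it assumes the traffic has already been flagged as abusive by other analysis (the UA being only a tertiary tool), it folds in the earlier point that a meaningless value like `User-Agent: x` counts as malicious, and it treats a browser-style string on such traffic as spoofed, since the thread does not say how spoofing is actually detected. The regular expression deciding what counts as "descriptive" (a `name/version` product plus a URL or e-mail contact) is an invented heuristic.

```python
import re
from enum import Enum
from typing import Optional


class Action(Enum):
    IGNORANT_OR_MALICIOUS = "no User-Agent: reject, could be ignorance or malice"
    MALICIOUS = "meaningless or browser-spoofing User-Agent: block IPs, inform upstream"
    CONTACT_OR_ISOLATE = "descriptive User-Agent: contact operator or isolate workload"


# Invented heuristic: a name/version product followed somewhere by a URL or e-mail.
DESCRIPTIVE = re.compile(r"^[A-Za-z0-9._-]+/\S+.*(?:https?://|@)")


def triage(user_agent: Optional[str]) -> Action:
    """Classify traffic that other analysis has ALREADY flagged as abusive."""
    if user_agent is None or not user_agent.strip():
        return Action.IGNORANT_OR_MALICIOUS
    if DESCRIPTIVE.match(user_agent.strip()):
        return Action.CONTACT_OR_ISOLATE
    # Anything else on abusive traffic ("x", or a browser-like string) is
    # treated as deliberate: either uninformative or spoofed.
    return Action.MALICIOUS
```

The ordering mirrors the argument in the thread: only an operator who identified themselves can be contacted, so a descriptive UA is the one case that leads to a conversation rather than a block.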

Re: User-Agent:
On 02/15/2010 07:55 PM, Domas Mituzas wrote:
>> Yes, a simple restriction like this tends to create smarter villains
>> rather than less villainy. Filtering on an obvious, easy-to-change
>> characteristic also destroys a useful source of information on who the
>> bad people are, making future abuse prevention efforts harder.
>>
>
> Thanks for insights. But no.
>
> We don't use the UA as the first step of analysis; it was a helpful tertiary tool that put these people into the "ignorant or malicious" category.
> If they had spoofed their UAs, we'd have blocked the IPs and informed upstreams, treating it as fully malicious behavior.
> If they had a nice UA, we might have attempted to contact them or isolated their workload until the issue was fixed ;-)
>

I am saying that going forward you have eliminated WMF's ability to use
a tertiary tool that you agree was helpful.

Having spent a lot of time dealing with abuse early in the Web's
history, I wouldn't have done it that way. But it's not really my
problem and you don't appear to be looking for input, so godspeed.

William

Re: User-Agent:
"William Pietri" <william@scissor.com> wrote in message
news:4B7A141E.9000808@scissor.com...
> On 02/15/2010 07:25 PM, Domas Mituzas wrote:
>>> Was there some urgent production impact that required doing this with no
>>> notice?
>
> Ok. I'm going to take that as "no".

As best I understand the discussion in #wikimedia-tech last night, ~20% of
search server load was being taken by aforementioned spamvertisers. That
sounds like an "urgent production impact" to me.

--HM



Re: User-Agent:
> As best I understand the discussion in #wikimedia-tech last night, ~20% of
> search server load was being taken by aforementioned spamvertisers. That
> sounds like an "urgent production impact" to me.

50% of the load, which at that time was using ~20% of search server CPU.
It also cut our API node traffic in half (and some CPU too), and made our average API response times much nicer: http://www.nedworks.org/~mark/reqstats/svctimestats-daily.png ;-)
It also relieved some pressure on the API squids, which were misbehaving yesterday and caused an API outage.

Also, 20% of cluster CPU is currently being spent on generating Atom feeds for people who never really subscribed to them. We don't know why yet, though :)

Domas

Re: User-Agent:
> Yes, that's precisely the violation of Postel's Law I was
> thinking of.

Steve, someone is sending us this User-Agent, is that you?:))

User-Agent: Mozilla 5.0 (Compatible; with Safari, with Opera, Chrome, Netscape, MSIE etc. You get the idea. It's compatible with everything!) Let me tell you a story. Once upon a time, there was a browser named SeaMonkey. It was of noble inheritance, as it was a direct descendant of the famous Netscape of old. Oh, it could have been so proud, this browser, it could have stood up, radiant and tall and strong. But no, that was not to be. Many websites closed their doors for this browser, saying, We don't know who you are, go away! And poor SeaMonkey would have no other choice than to go in disguise, to make the websites believe it was some other browser. And you who read this, you are one of those websites that are putting SeaMonkey to shame. Listen. All browsers support HTML. If you just send the same HTML, all browsers will accept it. Only in some borderline cases will the page you send to the client need to be tailored to a specific client. In the majority of cases, the standard HTML you send to, say, Opera, can be sent to every other browser as well; K-Meleon, Galeon, etc. There is no need to scan the user agent string for keywords like Firefox, Konqueror, Midori or whatever. Just send the standard HTML, OK? But even if you do scan the user agent string, even if you do insist on sending different stuff to different browsers, you should look for distinguishing signs of the rendering engines: Gecko, Webkit, KHTML, Trident, Presto, and so on, not for different browser names. SeaMonkey works the same way as Firefox, Netscape and Flock; they all have Gecko/yyyymmdd in the user agent string. Similarly, Google Chrome works the same way as Safari, Midori and SRWare Iron; they can be identified by the word AppleWebKit in the string. And so on. Distinguishing browsers by name is not only overkill, but it even can backfire in cases such as this, when a misrepresentation is made. 
Anyway, even more important than that, above all, what you never ever should to is send fatal errors! The real error, incidentally, is not in the content part of the page you send, but in the header: putting info in the header that identifies the page as XHTML, while sending the content of the page as HTML, which will trigger the fatal error message. Errors like that will make you look silly, or worse, they make you look like you're doing it on purpose; sending bad stuff to some browsers, while sending perfectly OK looking pages to others. You're not REALLY doing it on purpose, are you? Trying to make people think that their browsers aren't good enough, that for instance MS Internet Explorer is a better browser than the ones they're using now, because MSIE can display the site while their own browsers can't? No, let's just give you the benefit of the doubt; you're not doing it on purpose. Blame your content management system if you want. Still, it's nothing I can help from here; you will have to make the change to your site to make sure to not saddle some browsers like SeaMonkey or Kazehakaze with your errors. So please, would you consider having a look? Thanks in advance!

Cheers,
Domas
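The quoted string's advice, to key on rendering-engine tokens rather than browser names, might look like the following Python sketch. The engine list comes from the string itself; the matching order is an assumption worth noting: WebKit browsers also send "KHTML, like Gecko", so `AppleWebKit/` is checked first, and Gecko is matched only in the `Gecko/` plus digits form (the `Gecko/yyyymmdd` build date the string mentions).

```python
import re
from typing import Optional

# Checked in order: WebKit UAs also say "KHTML, like Gecko", so WebKit goes first,
# and Gecko is only recognised in its "Gecko/<digits>" build-date form.
ENGINE_PATTERNS = [
    ("WebKit", re.compile(r"AppleWebKit/")),
    ("Presto", re.compile(r"Presto/")),
    ("Trident", re.compile(r"Trident/")),
    ("KHTML", re.compile(r"KHTML/")),
    ("Gecko", re.compile(r"Gecko/\d+")),
]


def rendering_engine(user_agent: str) -> Optional[str]:
    """Return the rendering engine named in the UA, or None if none is recognised."""
    for engine, pattern in ENGINE_PATTERNS:
        if pattern.search(user_agent):
            return engine
    return None
```

Requiring the "/" after each engine name keeps prose like the string above (which mentions "Gecko, Webkit, KHTML, Trident, Presto" in passing) from being misread as an engine claim.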

Re: User-Agent:
On Mon, Feb 15, 2010 at 8:54 PM, Domas Mituzas <midom.lists@gmail.com> wrote:

> Hi!
>
> From now on, a specific per-bot/per-software/per-client User-Agent header is
> mandatory for contacting Wikimedia sites.
>
> Domas
>

Hi,

Whose decision was this? Were Erik, Sue, or Danese involved?

Re: User-Agent:
Hi!

> Whose decision was this?

Mine.

> Were Erik, Sue, or Danese involved?

No.

Domas

Re: User-Agent:
On Tue, Feb 16, 2010 at 10:31 AM, Domas Mituzas <midom.lists@gmail.com> wrote:

> Hi!
>
> > Whose decision was this?
>
> Mine.
>
> > Were Erik, Sue, or Danese involved?
>
> No.
>

Cool. Who's your boss, and who's your boss's boss? Sorry, I couldn't find
you in the org chart or I'd just have looked that up myself.

Re: User-Agent:
William,

> I am saying that going forward you have eliminated WMF's ability to use
> a tertiary tool that you agree was helpful.

I can't say that we entirely eliminated it; we transformed it a bit, I guess.

> Having spent a lot of time dealing with abuse early in the Web's
> history, I wouldn't have done it that way. But it's not really my
> problem and you don't appear to be looking for input, so godspeed.

Oh, I'm observing all the input.

The decision made wasn't entirely "oh, we must do it", and of course
other courses of action could have been taken, like cherry-picking IPs to ban,
or combining subnet-wide bans with URL-based restrictions.

All of that needs work, and if the WMF is willing to spend resources on implementing
such restrictions, it can certainly work on it. None of my choices are binding; usually
all I do is keep the site up and in good shape without wasting too much money ;-)

Domas
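One of the alternatives mentioned above, combining subnet-wide bans with URL-based restrictions, could be sketched with Python's standard `ipaddress` module. The subnets below are reserved documentation ranges and the restricted paths are placeholders; both are hypothetical and do not reflect any real Wikimedia configuration.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical rules: (subnet, path prefixes denied to that subnet).
# 192.0.2.0/24 and 2001:db8::/32 are reserved documentation ranges.
RULES = [
    (ipaddress.ip_network("192.0.2.0/24"), ("/w/api.php",)),
    (ipaddress.ip_network("2001:db8::/32"), ("/w/api.php", "/w/index.php")),
]


def is_blocked(client_ip: str, url: str) -> bool:
    """True if the client's subnet is banned from the requested URL path."""
    addr = ipaddress.ip_address(client_ip)
    path = urlsplit(url).path
    for network, prefixes in RULES:
        # Membership is simply False when IPv4 and IPv6 are mixed.
        if addr in network and path.startswith(prefixes):
            return True
    return False
```

Pairing the two conditions means a banned subnet can still read ordinary pages; only the listed endpoints are refused. As the message notes, maintaining such a rule list is ongoing work compared with a single header check.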

Re: User-Agent:
> Cool. Who's your boss, and who's your boss's boss? Sorry, I couldn't find
> you in the org chart or I'd just have looked that up myself.

Nobody? It's been like that for ages, hasn't it?

Domas
