Mailing List Archive

should not web server logs (of requests) be published?
hello
should not web server logs (of requests) be published?

my native language is tatar, and i am going to write for the
tatar wikipedia and tell other people to write for it.
the authors/managers/administrators of tatar texts are tatar people, so
i think it is right that tatar people can see the web server logs. i
think this would not be bad for the privacy of readers, because they would
see that the logs are published and could access wikipedia through a proxy to
hide their ip address. the ip addresses of anonymous writers are already
published. if anonymous readers want to hide their referer or search
keywords, they can do that by copy-pasting the wikipedia article
url; this should also be stated briefly on every page and on the privacy
page.
another advantage is that people could create custom analysers
of the logs.

i think logs should be divided with directory structure by years,
months, days, and probably hours.
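
A minimal sketch of the layout proposed above, assuming one file per hour under a year/month/day directory tree. The function name `log_path`, the `logs` root and the `.log` suffix are illustrative assumptions, not anything Wikimedia is known to use.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def log_path(ts: datetime, root: str = "logs") -> PurePosixPath:
    """Return root/YYYY/MM/DD/HH.log for a timezone-aware timestamp.

    The timestamp is converted to UTC first so that every request
    maps to exactly one hourly file regardless of the sender's zone.
    """
    ts = ts.astimezone(timezone.utc)
    return PurePosixPath(root) / f"{ts:%Y}" / f"{ts:%m}" / f"{ts:%d}" / f"{ts:%H}.log"


# Timestamp of a request made on 28 November 2010 at 09:35:40 UTC.
example = log_path(datetime(2010, 11, 28, 9, 35, 40, tzinfo=timezone.utc))
print(example)
```

Splitting by hour keeps each file small enough to download and analyse separately, at the cost of many small files on quiet wikis.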

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Re: should not web server logs (of requests) be published? [ In reply to ]
It's against the privacy policy to publish logs like that, and there is
really no good reason given why people should see all the IP
information for all visitors on a wiki.

2010/11/28, dinar qorbanof <qdinar@gmail.com>:
> hello
> should not web server logs (of requests) be published?
>
> my native language is tatar, and i am going to write for the
> tatar wikipedia and tell other people to write for it.
> the authors/managers/administrators of tatar texts are tatar people, so
> i think it is right that tatar people can see the web server logs. i
> think this would not be bad for the privacy of readers, because they would
> see that the logs are published and could access wikipedia through a proxy to
> hide their ip address. the ip addresses of anonymous writers are already
> published. if anonymous readers want to hide their referer or search
> keywords, they can do that by copy-pasting the wikipedia article
> url; this should also be stated briefly on every page and on the privacy
> page.
> another advantage is that people could create custom analysers
> of the logs.
>
> i think logs should be divided with directory structure by years,
> months, days, and probably hours.

--
Sent from my mobile device

Regards,
Huib "Abigor" Laurens



Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate

Re: should not web server logs (of requests) be published? [ In reply to ]
On Sunday 28 November 2010 09:35:40, dinar qorbanof wrote:
> another advantage of this is that people could create custom analysers
> of the logs.

For now, see http://stats.wikimedia.org/EN/TablesWikipediaTT.htm and
http://stats.grok.se/tt/201009/ .

Re: should not web server logs (of requests) be published? [ In reply to ]
On Sunday 28 November 2010 09:53:06, Huib Laurens wrote:
> It's against the privacy policy to publish logs like that, and there is

It should be possible to anonymise the logs sufficiently that no private
information could be gained from them.

> really no good reason given why people should see all the IP
> information for all visitors on a wiki

Well, publishing them would make it possible to create custom analysers of the logs.


Re: should not web server logs (of requests) be published? [ In reply to ]
i do not think that an ip address is such important private information;
many people browse through dynamic ips and NAT.

Re: should not web server logs (of requests) be published? [ In reply to ]
Do you have a source showing that many people use dynamic IPs? Because I'm pretty sure
most of the regular visitors use one IP.

2010/11/28 dinar qorbanof <qdinar@gmail.com>

> i do not think that an ip address is such important private information;
> many people browse through dynamic ips and NAT.



--
Regards,
Huib "Abigor" Laurens



Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
Re: should not web server logs (of requests) be published? [ In reply to ]
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The Wikimedia Foundation believes otherwise. Take a look at their
Privacy Policy <http://meta.wikimedia.org/wiki/Privacy_Policy> (relevant
excerpt follows):

"=== IP and other technical information ===
When a visitor requests or reads a page, or sends email to a Wikimedia
server, no more information is collected than is typically collected by
web sites. The Wikimedia Foundation may keep raw logs of such
transactions, but these will not be published or used to track
legitimate users."

I find it extremely unlikely that the WMF will allow an exception to
this rule. While I don't care if people know my IP address(es), some
people are, quite frankly and understandably, scared by the idea of
broadcasting their IP address to the world, since very often, rather
accurate details about the location - amongst other things - of the user
can be found from checking the IP address. In the end, it pretty much
comes down to the fact that the WMF simply won't release this
information, short of a ruling from the Board of Trustees. Not very
likely to happen. In addition, by extension of that excerpt from the
privacy policy, I don't think the Foundation would agree to publish
anonymized logs either. You also point out that many users edit
anonymously, publishing their IP address instead of a username. I would
view this under the context of the Privacy Policy as voluntary release
of IP address by a user, much as if I posted the IP address I use on my
Wikipedia userpage.

As for NATs and dynamic IP addresses, NATs really don't mean anything
except at large corporations or schools (aside from a convenient way to
put multiple computers on one network); even then, the "external" IP
used by the NAT/Internet gateway is usually a sufficient privacy
concern. And dynamic IP addresses usually don't change very much - for
example, my dynamic IP doesn't actually change unless I shut off my DSL
modem for a good few minutes, which I haven't done since the last power
outage. And, of course, anyone editing from a school, business, or
other institution would most likely have a static IP address, which
could (should?) even, through RDNS, resolve back to the name of that
institution. As for open proxies for editing, they are generally
disallowed from editing.
- --
- --FastLizard4 (http://en.wikipedia.org/wiki/User:FastLizard4)

dinar qorbanof wrote:
> i do not think that an ip address is such important private information;
> many people browse through dynamic ips and NAT.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iD8DBQFM8iqzIUvvVwjDo7YRAs4hAKDGfnpsRk6iBkUf4C1jiIWSF1UCzQCePU2O
a/ji6Ujigzv/i9oDGNDlfKY=
=AM8U
-----END PGP SIGNATURE-----

Re: should not web server logs (of requests) be published? [ In reply to ]
i know something about our local providers. tattelecom, which is the
only adsl provider in most villages of the republic
of tatarstan, used to use nat and is now switching to dynamic ips.
the ip addresses of gprs providers are probably effectively anonymous. i do not
know much about other adsl and cable tv internet providers in cities
like kazan and chelny. there are also readers and writers around russia
and the world, and i do not know much about their providers. as far as i know,
russian providers do this to give ip addresses some anonymity, and
as far as i know there is a big provider in the usa (aol?), for example, that
connects a lot of people through each ip with nat. but that was because
of the shortage of ipv4 addresses. now ipv6 is coming.

2010/11/28 Huib Laurens <sterkebak@gmail.com>:
> Do you have a source showing that many people use dynamic IPs? Because I'm pretty sure
> most of the regular visitors use one IP.
>

Re: should not web server logs (of requests) be published? [ In reply to ]
i said "as i know it is used to make some anonymity of ip
address in russian providers", by which i meant "as i think". i think that
they probably use dynamic ips intentionally for some anonymity, and
partly just to connect many people through few ip addresses. i also
said "but that was because
of shortage of ipv4 addresses". but if it is done for anonymity,
that can be done with ipv6 as well.

Huib Laurens said "It's against the privacy policy to publish logs
like that" and FastLizard4 said "The Wikimedia Foundation believes
otherwise. Take a look at their Privacy Policy".
these arguments are not quite right, because i am talking about changing
that privacy policy itself, and am i not talking to the wikimedia
foundation?

FastLizard4 has said:
>some
>people are understandably quite frankly scared by the idea of
>broadcasting their IP address to the world, since very often, rather
>accurate details about the location - amongst other things - of the user
>can be found from checking the IP address.
i think it is quite safe for them if only their town or region
can be found. how many people think so? how many people have one ip
address for a family (home) or even a personal ip (if it is a personal
gprs/edge/3g modem for a personal notebook)? maybe they should use
a proxy or ask their provider to give them an anonymous ip?
FastLizard4 has said:
> As for open proxies for editing, they are generally
>disallowed from editing.
i had not known about that. i want to check that.

Huib Laurens has said:
>there is
>really no good reason given why people should see all the ip
>information for all visitors on a wiki
what about opening the ips not of all wikipedias, but only of several
language subdomains?

Re: should not web server logs (of requests) be published? [ In reply to ]
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

My reply inline with quoted message.

dinar qorbanof wrote:
> i said "as i know it is used to make some anonymity of ip
> address in russian providers", by which i meant "as i think". i think that
> they probably use dynamic ips intentionally for some anonymity, and
> partly just to connect many people through few ip addresses. i also
> said "but that was because
> of shortage of ipv4 addresses". but if it is done for anonymity,
> that can be done with ipv6 as well.

Here in the U.S., ISPs keep records of who used what IP address at what
time. So, let's say that I had a dynamic IP address that changed every
day. If I got arrested and the courts ordered my ISP to give them a
list of IP addresses I have used in the last month, they would do so,
complete with the times I used each IP address. At least here in the
U.S., dynamic IPs aren't used for anonymity, but simply because there
aren't enough IPv4 addresses left.

> Huib Laurens said "It's against the privacy policy to publish logs
> like that" and FastLizard4 said "The Wikimedia Foundation believes
> otherwise. Take a look at their Privacy Policy".
> these arguments are not quite right, because i am talking about changing
> that privacy policy itself, and am i not talking to the wikimedia
> foundation?

No, you aren't. You're talking to a mailing list of people interested
in Foundation affairs. You'll find that most of the people posting to
this list, including myself, are simply volunteer Wikipedia editors
interested in what's going on in the WMF. There are a few WMF staffers
that subscribe to this list, but this isn't the appropriate place for
requesting a change to the Privacy Policy, and I don't know where that
place is. And, as I have said, it is *extremely* unlikely that the
Privacy Policy will be changed. But, I believe to actually propose the
change, you need to go to
<http://meta.wikimedia.org/wiki/Talk:Privacy_policy>.

> FastLizard4 has said:
>> some
>> people are understandably quite frankly scared by the idea of
>> broadcasting their IP address to the world, since very often, rather
>> accurate details about the location - amongst other things - of the user
>> can be found from checking the IP address.
> i think it is quite safe for them if only their town or region
> can be found.

Although I am no longer really this way, for a few years as a Wikipedia
editor, when I was more active, I certainly didn't want people to know
what city I lived in. I live in a very small one, and there are probably
twelve or fewer Wikipedia editors who live there. Many editors
(especially administrators) have had threats of violence made against
them; all the more reason to keep your IP address secret to ensure one
less way for people to find out where you live.

Besides, the aim with keeping IP addresses confidential is not to be
convenient to people who want access to server logs, but to take
reasonable measures to protect users' privacy. Why should we even take
the risk of putting lists of IP addresses from server logs out in the
public?

> how many people think so?

You're missing the central point here: the fact that *some* editors do
believe that their IP address should be kept confidential means that IP
address info will be kept confidential for *all* users - it's simply too
much trouble to cherry-pick IPs that want and do not want to be kept
confidential; it's far easier (and makes the Foundation far less liable)
if they just keep all IPs secret. This is why the process for checking
the IP addresses of registered users is so complex and checked
<http://en.wikipedia.org/wiki/Wikipedia:CheckUser> - and even then, the
actual IP addresses are never given to anyone.

> how many people have one ip address for a family (home) or even
> personal ip (if it is personal modem of gprs/edge/3g for personal
> notebook)?

I'm not exactly sure what you're asking here, but if I do understand you
correctly, almost everyone here in the U.S. has only one external IP
address per household. Most families only need (and can afford) one
Internet connection, hence one IP address. The only exceptions, I'd
imagine, are people that run servers. Hence why I have two IP addresses
I use primarily - my home, and my server.

> may be they should use proxy
>
> FastLizard4 has said:
>> As for open proxies for editing, they are generally
>> disallowed from editing.
> i had not known about that. i want to check that.

http://en.wikipedia.org/wiki/Wikipedia:PROXY (Other WMF wikis may have
different policies on the matter, but the English Wikipedia's is pretty
common, I believe.)

> ...or ask their provider to make anonymous ip for them?

Some ISPs here in the U.S., such as AOL, do use anonymizing proxies
normally, but many (including AOL) have agreements with the WMF in which
the ISP will send X-Forwarded-For headers, which contain the original
user's IP address; XFF headers, if present and approved for use by the
WMF, are used instead of the external IP as seen by the servers. And,
as far as I know, in the U.S., requesting an anonymous IP from your ISP
is not a request a user can make.
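
The XFF rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the trusted range (an RFC 5737 documentation prefix), the function name `effective_client_ip` and the single-proxy setup are all hypothetical, and this is not MediaWiki's actual code.

```python
import ipaddress
from typing import Optional

# Hypothetical list of proxy ranges approved to supply XFF headers.
# 192.0.2.0/24 is a documentation-only range, used here as a stand-in.
TRUSTED_PROXIES = [ipaddress.ip_network("192.0.2.0/24")]


def effective_client_ip(peer_ip: str, xff_header: Optional[str]) -> str:
    """Pick the address to attribute a request to.

    The X-Forwarded-For value is honoured only when the connection
    itself comes from a trusted proxy; otherwise the header could be
    forged by anyone, so the connecting address is used instead.
    """
    peer = ipaddress.ip_address(peer_ip)
    trusted = any(peer in net for net in TRUSTED_PROXIES)
    if xff_header is None or not trusted:
        return peer_ip
    # The header is a comma-separated list; the last entry is the one
    # appended by the trusted proxy, i.e. the address that connected to it.
    candidate = xff_header.split(",")[-1].strip()
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return peer_ip  # malformed header: fall back to the proxy address
    return candidate


print(effective_client_ip("192.0.2.10", "198.51.100.7"))   # via trusted proxy
print(effective_client_ip("203.0.113.5", "198.51.100.7"))  # untrusted sender
```

Taking the last entry rather than the first is a deliberate choice in this sketch: earlier entries are supplied by the client side and cannot be verified.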

And, besides, what are we going to do? Put up a banner on top of every
WMF website saying "Hey, we're releasing your IP address information to
people! If you don't like this, go call your ISP to get an anonymous IP
address!" Half the people visiting probably don't even know what an IP
address is, and in this case, not knowing about it doesn't make it any
less dangerous to your privacy.

> Huib Laurens has said:
>> there is
>> really no good reason given why people should see all the ip
>> information for all visitors on a wiki
> what about opening the ips not of all wikipedias, but only of several
> language subdomains?

Subdomains are also covered under the WMF Privacy Policy, so it's really
a moot point. But, what exactly would you do with the IP address logs
for a few subdomains, as opposed to the entire Wikimedia farm?
- --
- --FastLizard4 (http://en.wikipedia.org/wiki/User:FastLizard4)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iD8DBQFM8kHtIUvvVwjDo7YRAjCwAJ4x95sEBCJtELPZzkhSTFWHzQL61wCeNVhw
9d8z49psxQJtVok0LpsRLOs=
=sX/O
-----END PGP SIGNATURE-----

Re: should not web server logs (of requests) be published? [ In reply to ]
Hello,

> should not web server logs (of requests) be published?


which intelligence service are you representing?
there are hourly page view statistics somewhere out there, so most of data is already out, drilling in more would mean violating privacy.

and no, I don't see this as a per-project negotiable issue.

Domas
Re: should not web server logs (of requests) be published? [ In reply to ]
2010/11/28 FastLizard4 <fastlizard4@gmail.com>:
> Here in the U.S., ISPs keep records of who used what IP address at what
> time.  So, let's say that I had a dynamic IP address that changed every
> day.  If I got arrested and the courts ordered my ISP to give them a
> list of IP addresses I have used in the last month, they would do so,
> complete with the times I used each IP address.
it is the same in russia. i am talking only about relative anonymity, not from the
government, but from other people.

>At least here in the
> U.S., dynamic IPs aren't used for anonymity, but simply because there
> aren't enough IPv4 addresses left.
but maybe not only for that? maybe partly also for
partial/relative anonymity?

> Besides, the aim with keeping IP addresses confidential is not to be
> convenient to people who want access to server logs, but to take
> reasonable measures to protect users' privacy.  Why should we even take
> the risk of putting lists of IP addresses from server logs out in the
> public?
maybe i do not understand this. how can keeping ips, which are part of the logs,
confidential be called a convenience to people who can see those logs in full? or
do you mean government people who may request the logs, or wikipedia
owners who want to look at them? in those cases too, i am not saying
that withholding them from everyone is a convenience to those
government people or owners.

you mentioned that a provider can give logs to the government; probably
wikipedia must also give its logs to the government if requested, must it
not?

>> FastLizard4 has said:
>>> As for open proxies for editing, they are generally
>>> disallowed from editing.
>> i had not known about that. i want to check that.
>
> http://en.wikipedia.org/wiki/Wikipedia:PROXY (Other WMF wikis may have
> different policies on the matter, but the English Wikipedia's is pretty
> common, I believe.)
ah, it is wikipedia itself that blocks them from editing! then there is no
problem! i had thought that proxies do not allow POST requests :) .

>> ...or ask their provider to make anonymous ip for them?
>
> Some ISPs here in the U.S., such as AOL, do use anonymizing proxies
> normally, but many (including AOL) have agreements with the WMF in which
> the ISP will send X-Forwarded-For headers, which contain the original
> user's IP address; XFF headers, if present and approved for use by the
> WMF, are used instead of the external IP as seen by the servers.
i think the ip from xff can be used only together with the
anonymous external nat ip, because the ip from xff is probably
unique only inside the provider's internal network. and is that xff ip logged
by the web server? i think it is not logged. how is it used/saved/shown in
mediawiki? if 2 ips are indeed needed, as an ip pair?

> And,
> as far as I know, in the U.S., requesting an anonymous IP from your ISP
> is not a request a user can make.
can users not ask, in the provider's official web forum, for dynamic
ip or nat? probably you mean that they cannot
require/demand/claim/request(?) it as a right that is written in
law.

> And, besides, what are we going to do?  Put up a banner on top of every
> WMF website saying "Hey, we're releasing your IP address information to
> people!  If you don't like this, go call your ISP to get an anonymous IP
> address!"  Half the people visiting probably don't even know what an IP
> address is, and in this case, not knowing about it doesn't make it any
> less dangerous to your privacy.
i do not mean writing "ask your provider for an anonymous ip".
maybe something like this: "your request, ip address, referer, user
agent are published, read more >>".

>> Huib Laurens has said:
>>> there is
>>> really no good reason given why people should see all the ip
>>> information for all visitors on a wiki
>> what about opening ips not of all wikipedias, but of only several
>> language subdomains?
>
> But, what exactly would you do with the IP address logs
> for a few subdomains, as opposed to the entire Wikimedia farm?
i say this because the tatar wikipedia, for example, is probably used mostly by
people whose provider is in russia, and i think they are probably on
dynamic ips or behind nat. this is unlike the english wikipedia, which is
used by almost all usa people, who, as you said, use one ip per
family, and by people in the uk, australia, etc., about whose providers and ips i do not
know.

2010/11/28 Domas Mituzas <midom.lists@gmail.com>:
> Hello,
>
>> should not web server logs (of requests) be published?
>
>
> which intelligence service are you representing?
i am not from an "intelligence service" :) . you mean something like a spy?
no, i am not. as i said, i ask this because i think that tatar people
should be managers/administrators/controllers of the texts they wrote, and
those texts are read mostly by tatar people. if the logs are not published,
that means they can be read by wikipedia owners and by the us government,
but not by tatar people.

> there are hourly page view statistics somewhere out there, so most of data is already out, drilling in more would mean violating privacy.
many sites open their statistics: countries and regions of ips, search
engine query strings. for example, sites on ucoz.ru have that
capability, as do other sites that use the counters of top.mail.ru ,
liveinternet.ru , statcounter.com , histats.com etc. do those
hourly statistics have search query strings? i have not seen that for
wikipedia. publishing full/raw logs is also not much of a violation of
privacy, i think. and wikipedia could say "if you do not want your ip
published, then do not use this", but take into account that there
is no problem with hiding ip and referer, and so there is no problem
with anonymous reading. anonymous writing is already generally blocked
by wikipedia itself.

2010/11/28 rupert THURNER <rupert.thurner@gmail.com>:
> what would you like to read out of the logs?

i would read how many people are reading certain articles; maybe i
would read what pages they browse, if i had an analyser that could show
that easily for me, and from what search engine queries they come. not
only i could do that then; all people could read that. and users who are
"tracked" would also know that their browsing is published.
and what do you think or can say, knowing what i would read out of the logs?

Re: should not web server logs (of requests) be published? [ In reply to ]
why are my messages not published at
http://lists.wikimedia.org/pipermail/foundation-l/ (in November 2010:
View by: [ Thread ] or [ Subject ] or [ Author ] or [ Date ]) ?

Re: should not web server logs (of requests) be published? [ In reply to ]
Hi!

> you have mentioned that provider can give logs to government, probably
> also wikipedia must give its logs to government, if requested, is not
> it?

Wikipedia cannot give logs to government, as it has none.

> users cannot request in provider's official web forum to make dynamic
> ip or nat? probably you mean that they cannot require/demand/claim/request(?) that as their right that is written in
> law.

No IP is anonymous - based on various usage patterns one can determine who is behind it :)

> i am not from "intelligence service" :) . you mean something like spy?

I meant someone who has some sarcasm detection skills.

> no, i am not. as i said, i ask this because i think that tatar people
> should be managers/administrators/controllers of the texts they wrote, and
> those texts are read mostly by tatar people. if the logs are not published,
> that means they can be read by wikipedia owners and by the us government,
> but not by tatar people.

Logs cannot be read by wikipedia owners or us government because they don't exist.
You're free to suggest aggregations of interest to you - now we provide hourly pageview counters for each article.
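
As a rough illustration of what such an aggregation involves, the sketch below reduces request records to per-article hourly counts and drops the client address along the way. The record layout (timestamp, client IP, article title) and the name `hourly_pageviews` are assumptions made for the example, not the format of any real Wikimedia data.

```python
from collections import Counter
from datetime import datetime


def hourly_pageviews(records):
    """Count requests per (hour, article).

    `records` is an iterable of (timestamp, client_ip, article) tuples.
    The client address is read but never stored, so the result holds
    only aggregate counts.
    """
    counts = Counter()
    for ts, _client_ip, article in records:
        counts[(ts.strftime("%Y-%m-%d %H:00"), article)] += 1
    return counts


# Made-up sample requests using documentation-only addresses.
sample = [
    (datetime(2010, 11, 28, 9, 5), "198.51.100.7", "Kazan"),
    (datetime(2010, 11, 28, 9, 40), "203.0.113.5", "Kazan"),
    (datetime(2010, 11, 28, 10, 1), "198.51.100.7", "Tatarstan"),
]
for (hour, article), n in sorted(hourly_pageviews(sample).items()):
    print(hour, article, n)
```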

Wikipedia does not track its readers, last time I checked.

> i have not seen that of
> wikipedia. publishing full/raw logs also is not much violence of
> privacy, i think.

I really, really would like to avoid any ad hominem attacks, but you're not capable of seeing much, then.

> and wikipedia could say "if you do not want to
> publish your ip, then do not use this" but take in account that there
> is no problem with hiding ip and referer. and so there is no problem
> with anonymous reading.

Wikipedia will not say "do not use this", because its primary goal is to spread knowledge, and that includes spreading knowledge to people who value their privacy.

> anonymous writing is already generally blocked by wikipedia itself.

You can edit under a pseudonym. That is already good enough. IPs identify real people way more than pseudonyms may do.

> and users who are "tracked" also will know that their browsing is published.

Sorry, disregard word 'intelligence' used before in any forms.

Domas


Re: should not web server logs (of requests) be published? [ In reply to ]
I'm afraid our Tatar is correct in some senses and others in this thread
are in a failing or failed mode.

Each web server, of which the WMF has a few, collects details on the
behaviour of IPs, in logs. Those logs can be and probably have been requested by
certain government officials, most likely for the purpose of tracking down
who is behind a certain "Bad" posting to a BLP.

In addition, courts can make such orders in order to identify an otherwise
"John Doe" named in a suit, such as for libel, etc. It has happened and it will
continue to happen; the WMF does keep such logs.

Knowing the IP, it can then be tracked back to that user's ISP and a log
again requested to determine the exact person, or at least business or
household, who used the IP at that exact time. So playing with words doesn't let
us get around that point.

I'm still not clear why we would want to know the IP exactly for analytical
purposes. Some intrepid programmer could write a program which would
simply collect detailed analysis of a person's in-world behaviour and call them
"Bob992" instead of 13.42.204.192 or whatever. Making the information
packets anonymous. That would still allow any sort of analysis the Tatars want to
make, and not reveal any private information.
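
The "Bob992" idea can be sketched as a keyed hash that maps each address to a stable label. This is a minimal illustration, assuming a secret key held only by whoever publishes the data; the function name `pseudonym` and the label format are hypothetical. A stable label still ties together everything one address did, so usage patterns alone may be enough to re-identify someone.

```python
import hashlib
import hmac


def pseudonym(ip: str, secret_key: bytes, prefix: str = "user") -> str:
    """Map an IP address to a stable label without exposing the address.

    HMAC with a secret key is used instead of a plain hash because the
    IPv4 space is small enough to enumerate: without the key, anyone
    could hash every address and reverse the mapping.
    """
    digest = hmac.new(secret_key, ip.encode("ascii"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


key = b"example-key-kept-private"  # placeholder; a real key must stay secret
print(pseudonym("198.51.100.7", key))
print(pseudonym("198.51.100.7", key))  # same address, same label
print(pseudonym("203.0.113.5", key))   # different address, different label
```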

W
Re: should not web server logs (of requests) be published? [ In reply to ]
On Sun, Nov 28, 2010 at 12:38 PM, Domas Mituzas <midom.lists@gmail.com> wrote:
> Logs cannot be read by wikipedia owners or us government because they don't exist.
There aren't any raw logs?

On Sun, Nov 28, 2010 at 2:30 PM, <WJhonson@aol.com> wrote:
> Each web server, of which the WMF has a few, collects details on the
> behaviour of IPs, in logs.  Those logs can be and probably have been requested by
> certain government officials, most likely for the purpose of tracking down
> who is behind a certain "Bad" posting to a BLP.
Presumably they would usually just use CheckUser data for that.

On Sun, Nov 28, 2010 at 2:30 PM, <WJhonson@aol.com> wrote:
> I'm still not clear why we would want to know the IP exactly for analytical
> purposes. Some intrepid programmer could write a program which would
> simply collect detailed analysis of a person's in-world behaviour and call them
> "Bob992" instead of 13.42.204.192 or whatever. Making the information
> packets anonymous. That would still allow any sort of analysis the Tatars want to
> make, and not reveal any private information.
It's a bit more complicated than that. Sometimes anonymous isn't
anonymous enough: http://en.wikipedia.org/wiki/AOL_search_data_scandal

Re: should not web server logs (of requests) be published? [ In reply to ]
2010/11/28 <WJhonson@aol.com>:
> I'm still not clear why we would want to know the IP exactly for analytical
> purposes.  Some intrepid programmer could write a program which would
> simply collect detailed analysis of a person's in-world behaviour and call them
> "Bob992" instead of 13.42.204.192 or whatever.  Making the information
> packets anonymous.  That would still allow any sort of analysis the Tatars want to
> make, and not reveal any private information.
i just had not thought about that as a threat. theoretically, ip
addresses can be used to count how many wikipedia readers are in each
region of russia. such statistics are made by russian counters:
liveinternet and, maybe, mail.ru. but i do not know whether any tatar
can get such a database to make such a counter for wikipedia logs.
probably some russian companies can make such an analysis for the russian,
tatar and other wikipedias in languages of russia.
>the Tatars want to make
on the one hand, i do not represent [all] tatars, and on the other hand,
i think i also represent native speakers of other languages.

Re: should not web server logs (of requests) be published? [ In reply to ]
On Sun, Nov 28, 2010 at 2:30 PM, <WJhonson@aol.com> wrote:

> I'm afraid our Tatar is correct in some senses and others in this thread
> are in a failing or failed mode.
>
> Each web server, of which the WMF has a few, collects details on the
> behaviour of IPs, in logs. Those logs can be and probably have been
> requested by
> certain government officials, most likely for the purpose of tracking down
> who is behind a certain "Bad" posting to a BLP.
>
>
CheckUser data (IPs of editors) are kept for 3 months.

http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CheckUser/CheckUser.php?view=markup

WMF does not keep apache logs which would track what pages people are
reading.

http://noc.wikimedia.org/conf/httpd.conf (see CustomLog which is commented
out, meaning that access logs are not kept)

There are some logs for the squid servers which are used to generate page
view stats, but those take a 1/1000 sample. There are also full squid logs
for click-throughs on the fundraising banners.

http://wikitech.wikimedia.org/view/Squid_logging

So, we do not have readership logs except for the sampled squid logs. For
performance reasons, it's not desirable to collect more detailed logs, nor
would we really want them.
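
A rough sketch of what a 1/1000 sample means for page view stats. It assumes the sample keeps every 1000th request by a simple counter, which is an assumption; the thread does not say how the squid sampling is actually done. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator

SAMPLE_RATE = 1000  # from the message above: a 1/1000 sample


def sample_every_nth(lines: Iterable[str], n: int = SAMPLE_RATE) -> Iterator[str]:
    """Yield every n-th request line and drop the rest."""
    for index, line in enumerate(lines, start=1):
        if index % n == 0:
            yield line


def estimate_page_views(sampled_urls: Iterable[str], n: int = SAMPLE_RATE) -> Counter:
    """Scale the counts seen in the sample back up to estimated totals."""
    counts = Counter(sampled_urls)
    return Counter({url: seen * n for url, seen in counts.items()})
```

A page seen twice in the sample is estimated at about 2000 views. Rarely read pages may not appear in the sample at all, so the estimates are only meaningful for popular pages.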

-Katie (@aude)


Re: should not web server logs (of requests) be published? [ In reply to ]
On Sun, Nov 28, 2010 at 3:21 PM, dinar qorbanof <qdinar@gmail.com> wrote:

> 2010/11/28 <WJhonson@aol.com>:
> > I'm still not clear why we would want to know the IP exactly for analytical
> > purposes. Some intrepid programmer could write a program which would
> > simply collect detailed analysis of a person's in-world behaviour and call them
> > "Bob992" instead of 13.42.204.192 or whatever. Making the information
> > packets anonymous. That would still allow any sort of analysis the Tatars want to
> > make, and not reveal any private information.
> i just had not thought about that as a threat. theoretically, ip
> addresses can be used to count how many wikipedia readers are in each
> region of russia. such statistics are made by russian counters:
> liveinternet and, maybe, mail.ru. but i do not know whether any tatar
> can get such a database to make such a counter for wikipedia logs.
> probably some russian companies can make such an analysis for the russian,
> tatar and other wikipedias in languages of russia.
> > the Tatars want to make
> on the one hand, i do not represent [all] tatars, and on the other hand,
> i think i also represent native speakers of other languages.
>
>
The sampled 1/1000 squid logs can be used for statistical purposes, such as
page view stats. Someone more techy can answer better than I can whether
the samples include IP addresses that could be used with geoip for geographic
analysis. (I think perhaps not.)
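
If the sampled records did include IP addresses (which, as said above, may well not be the case), a geographic count could look roughly like this sketch. The network-to-region table is entirely hypothetical and uses documentation-only address ranges; a real analysis would need an actual geoip database, which is not shown here. Only per-region totals come out, not the addresses themselves.

```python
import ipaddress
from collections import Counter

# Hypothetical lookup table built from documentation-only ranges (RFC 5737).
REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "Region A",
    ipaddress.ip_network("198.51.100.0/24"): "Region B",
}


def region_of(ip: str) -> str:
    """Return the region whose network contains the address, else 'unknown'."""
    address = ipaddress.ip_address(ip)
    for network, region in REGION_BY_NETWORK.items():
        if address in network:
            return region
    return "unknown"


def readers_per_region(ips) -> Counter:
    """Count requests per region; the IP addresses are not kept in the result."""
    return Counter(region_of(ip) for ip in ips)
```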

Here are the page view stats generated from the squid sample logs:

http://dammit.lt/wikistats/

http://stats.grok.se/

For other analysis of readership, we do get stats from comScore, but that's
survey data from panelists and nothing to do with logs.

http://meta.wikimedia.org/wiki/User:Stu/comScore_data_on_Wikimedia

-Katie (@aude)


Re: should not web server logs (of requests) be published? [ In reply to ]
:) ok then. thank you. i should have asked first whether wikipedia collects logs.

and i ask again: do you or anybody know why my messages are not
published in the official mail archive? am i not formatting my messages
correctly?

Re: should not web server logs (of requests) be published? [ In reply to ]
We should all be asking "Is there really a problem here that would justify creating a major exception to our privacy policies?" -- because I haven't seen one. Did anyone notice how some of the earlier posts were suggesting that it was OK because people can anonymize themselves with a proxy or some other option? That is a situation that would require a user (possibly one with no understanding of the concept of open proxies) to take technical steps simply to "opt in" to privacy. Also, did anyone think to ask the tech team whether they'd be OK shouldering the burden of releasing these logs? Or the OTRS team whether they're OK with dealing with the email burden that would come with that? Or Communications to see whether they agree with the negative PR of this?

Any one of these above steps would probably have revealed that it is a bad idea. Just sayin.


-Dan
Re: should not web server logs (of requests) be published? [ In reply to ]
On Sun, Nov 28, 2010 at 3:41 PM, dinar qorbanof <qdinar@gmail.com> wrote:

> and i write again, do not you or somebody know why my messages are not
> published in the official mail archive? i do not format my message
> correctly?
>
>
I don't know. :/

-Katie (aude)


Re: should not web server logs (of requests) be published? [ In reply to ]
My belief is that this is not so. Checkuser logs are not the same thing as
IP logs.

Are you suggesting that if a court, three months and a day after a
logged-in user made a libelous edit, ordered the WMF to release the IP address of
that user, they would not be able to do so? I suggest they would and
probably have.

I would like to see a clear citation to where, when and how the WMF retains
logs of user activity. Is there actually such an official statement
somewhere? And could anyone cite it with a link?

The issue with the AOL Search Scandal is a red herring. People are not
going to be searching for their own phone number or Social Security numbers
within Wikipedia. And even if someone searches for such a thing, there is no
way to know that they are looking for details on themselves, or on someone
else.

Regardless, our entry on that notes a lawsuit *four years old* with no
resolution:
http://en.wikipedia.org/wiki/AOL_search_data_scandal

Indicative, I suggest, of it being a non-story.
should not web server logs (of requests) be published? [ In reply to ]
WJhonson:
> The issue with the AOL Search Scandal is a red herring. People are not
> going to be searching for their own phone number or Social Security numbers
> within Wikipedia. And even if someone searches for such a thing, there is no
> way to know that they are looking for details on themselves, or on someone
> else.
>
> Our entry on that regardless notes a lawsuit *four years old* with no
> resolution
> http://en.wikipedia.org/wiki/AOL_search_data_scandal
>
> Indicative I suggest of it being a non-story.

Many people did occasionally search for their own name,
and relatively often searched for local shops and local news.
Each of these clues was ambiguous and insignificant by itself,
but put together they often painted a unique picture of one single person.
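
The narrowing effect described above can be sketched with a toy example. All names, towns and records below are made up for illustration; nothing here comes from the AOL data. Each clue on its own matches several people, but the combination matches only one.

```python
# Hypothetical population; every record is invented for this illustration.
POPULATION = [
    {"id": "person-1", "town": "Riverton", "surname": "Lee", "shop": "bakery"},
    {"id": "person-2", "town": "Riverton", "surname": "Lee", "shop": "garage"},
    {"id": "person-3", "town": "Riverton", "surname": "Kim", "shop": "bakery"},
    {"id": "person-4", "town": "Lakeside", "surname": "Lee", "shop": "bakery"},
]


def matching_people(population, clues):
    """Return the ids of everyone consistent with all of the given clues."""
    return [
        person["id"]
        for person in population
        if all(person.get(key) == value for key, value in clues.items())
    ]


one_clue = matching_people(POPULATION, {"town": "Riverton"})
two_clues = matching_people(POPULATION, {"town": "Riverton", "surname": "Lee"})
three_clues = matching_people(
    POPULATION, {"town": "Riverton", "surname": "Lee", "shop": "bakery"}
)
```

Here one_clue has three candidates, two_clues has two, and three_clues has a single candidate left.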

Apparently de-anonymization is a nice pursuit for some would-be detectives,
and quite possibly also for government officials in some parts of the world
where privacy is considered a risk to a state's stability.

The AOL data were taken offline very quickly (and the research team
disbanded), but copies had already been made, and you can still find the data
online now.

http://www.gregsadetsky.com/aol-data/

The following article paints a rather graphic picture of how search terms
came back to haunt their author.

http://www.zdnet.co.uk/news/networking/2006/08/08/search-history-gives-insight-into-lives-of-aol-users-39280576/

Erik Zachte





Re: should not web server logs (of requests) be published? [ In reply to ]
Repost with shortened url:

The following article paints a rather graphic picture of how search terms
came back to haunt their author.

http://tinyurl.com/322a5pk

Erik Zachte



