Mailing List Archive

Languages and numbers
While preparing Missing Wikipedias [1], I've compiled numbers of speakers
and languages not covered by Wikipedias, by area and by country with a
chapter.

The numbers are preliminary, and some of them need correction. I didn't
exclude the Han languages (most of which shouldn't be counted) or similar
cases. Note also that every language should be analyzed separately: many
languages are spoken in more than one country.

Please fix any errors and comment.

* * *

Areas. These approximate the usual definitions of world regions, but
differ from them because of linguistic adjustments.

* Afro-Asiatic Area: Area where Afro-Asiatic languages are dominant.
North Africa + Middle East + Sudan, Ethiopia, Eritrea and Somalia - Iran.
* Europe: Europe, including the Caucasus and Turkey.
* South Asia: South Asia + Iran. Predominantly Indo-European and Dravidian
languages.
* Sub-Saharan Africa: The rest of Africa.
* Polynesia, Australia and Oceania: Includes Malaysia and Taiwan
(Taiwanese languages not covered in Wikipedias are predominantly Austronesian.)
* East Asia: Han China "China (Central)", Korea and Japan.
* South-East Asia: Includes non-Han south China "China (South)".
* Latin America: Parts of America where Spanish and Portuguese are
official languages.
* Anglo-French America: Parts of America where English, French and Dutch
are official languages.
* North Asia: Asian part of former USSR, Mongolia and non-Han northern
and western China "China (North)".

The first column is the number of speakers, the second the number of
languages, and the third the area.

399259294 592 South Asia
353676706 1805 Sub-Saharan Africa
221855457 253 Afro-Asiatic Area
138979263 2198 Polynesia, Australia and Oceania
107363760 37 East Asia
99260271 447 South-East Asia
47901185 143 Europe
30361602 724 Latin America
8481452 227 Anglo-French America
3724384 45 North Asia
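
The area totals above are sums over per-language records. A minimal
Python sketch of that kind of aggregation, assuming a hypothetical record
format of (language, area, speakers) rather than the actual structure of
the Missing Wikipedias database; the sample rows reuse figures from the
Germany and Taiwan lists below:

```python
from collections import defaultdict


def summarize_by_area(records):
    """Aggregate (language, area, speakers) records into
    (total_speakers, language_count, area) rows, largest total first."""
    speakers = defaultdict(int)
    counts = defaultdict(int)
    for _language, area, n in records:
        speakers[area] += n
        counts[area] += 1
    rows = [(speakers[area], counts[area], area) for area in speakers]
    return sorted(rows, reverse=True)


# Sample rows taken from the Germany and Taiwan lists in this message.
records = [
    ("Mainfränkisch", "Europe", 4_910_000),
    ("Saxon, Upper", "Europe", 2_000_000),
    ("Swabian", "Europe", 819_000),
    ("Amis", "Polynesia, Australia and Oceania", 138_000),
]

for total, count, area in summarize_by_area(records):
    print(total, count, area)
```

Sorting the (speakers, count, area) tuples in reverse reproduces the
"largest population first" order of the table.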

* * *

Countries with chapters. (The numbers are not fully correct, as they
include some languages that were removed from the list below this one.)

If any chapter (or other interested group) would like the full list of
missing languages, I'll provide it on request before completing the
work. I suppose that some chapters are also interested in languages with
fewer than 100K speakers.

296,097,274 349 India
71,356,176 681 Indonesia
46,676,395 157 Philippines
7,994,871 76 Russian Federation
7,819,010 9 Germany
5,386,580 5 Serbia
4,785,299 6 South Africa
2,841,300 17 Israel
1,139,750 4 Ukraine
1,085,931 125 United States
832,000 3 Netherlands
705,967 70 Canada
472,470 1 Czech Republic
375,704 17 Taiwan
313,642 6 Chile
246,900 3 United Kingdom
200,500 4 Spain
191,430 5 Poland
151,240 7 Sweden
132,809 12 Argentina
86,390 155 Australia
50,000 1 France
30,000 1 Hungary
29,980 4 Switzerland
17,460 5 Finland
15,000 1 Portugal
10,500 2 Norway
5,000 1 Denmark
4,500 1 Estonia

Languages without a Wikipedia, in countries with a chapter, with at least
a million speakers (India, Indonesia) or at least 100,000 speakers
(elsewhere):

India (1,000,000 or more)
38261000 Awadhi
34700000 Maithili
17500000 Chhattisgarhi
13000000 Magahi
13000000 Haryanvi
12800000 Deccan
10400000 Malvi
9500000 Kanauji
9000000 Dhundari
7760000 Bagheli
6970000 Varhadi-Nagpuri
6170900 Santali
6000000 Lambadi
5622600 Marwari
5000000 Mewati
4730000 Hadothi
4004490 Konkani
3900000 Merwari
3800000 Mina
3633900 Konkani, Goan
3000000 Shekhawati
3000000 Godwari
2920000 Garhwali
2680000 Indian Sign Language
2360000 Kumaoni
2110000 Dogri
2100000 Bagri
2094200 Kurux
2000000 Mewari
1970000 Sadri
1950000 Tulu
1950000 Gondi, Northern
1930000 Waddar
1710000 Wagdi
1700000 Kangri
1580000 Khandesi
1560280 Mundari
1543300 Bodo
1500000 Ho
1430000 Nimadi
1391000 Meitei
1300000 Bhili
1200000 Vasavi
1150000 Bhilali
1045000 Panjabi, Mirpur
1000000 Pahari, Mahasu

Indonesia (1,000,000 or more)
13600900 Madura
5530000 Minangkabau
3930000 Musi
3502300 Banjar
3330000 Bali
2700000 Betawi
2350000 Malay, Central
2100000 Sasak
2000000 Batak Toba
1880000 Malay, Makassar
1600000 Makasar
1200000 Batak Simalungun
1200000 Batak Dairi
1100000 Batak Mandailing
1000000 Malay, Jambi

Philippines (100,000 or more)
5770000 Hiligaynon
2500000 Bicolano, Central
1900000 Bicolano, Albay
1062000 Tausug
1000000 Maguindanao
776000 Maranao
639000 Capiznon
540000 Bontoc, Central
500000 Ibanag
395000 Inakeanon
378000 Kinaray-a
350000 Masbatenyo
345000 Surigaonon
319000 Sama, Southern
293000 Chavacano
234000 Bicolano, Iriga
200000 Romblomanon
200000 Bantoanon
185000 Sorsogon, Waray
150000 Kankanaey
150000 Blaan, Koronadal
147000 Davawenyo
140000 Subanen, Central
134000 Itawit
123000 Cuyonon
122000 Bicolano, Northern Catanduanes
111000 Ibaloi
107000 Yakan
100000 Philippine Sign Language
100000 Binukid

Germany
4910000 Mainfränkisch
2000000 Saxon, Upper
819000 Swabian

Russian Federation
783720 Lezgi
696630 Erzya
614000 Moksha
516490 Dargwa
499300 Adyghe
460090 Mari, Meadow
422550 Kumyk
413000 Ingush
363000 Yakut
264400 Tuva
217000 Komi-Zyrian
164420 Lak
128900 Tabassaran
113710 Balkar

Serbia and Kosovo
4156090 Albanian, Gheg
709570 Romani, Balkan
318920 Romani, Sinte
172000 Romano-Serbian

South Africa
4101000 Sotho, Northern
640000 Ndebele

Israel
1762320 Yiddish, Eastern
352500 Arabic, Judeo-Tunisian
258930 Arabic, Judeo-Moroccan
110000 Bukharic
100130 Arabic, Judeo-Iraqi

United States
600000 Hawai’i Creole English
250000 Sea Island Creole English

Netherlands
592000 Gronings
220000 Zeeuws

Canada
402900 Plautdietsch

Czech Republic
472470 Romani, Carpathian

Taiwan
138000 Amis

Chile
300039 Mapudungun

United Kingdom
202900 Angloromani

Spain
102000 Spanish Sign Language

Sweden
109600 Finnish, Tornedalen
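
The lists above apply an inclusive cut-off that differs by country:
1,000,000 speakers for India and Indonesia and 100,000 elsewhere (entries
at exactly those values are included). A minimal Python sketch of that
selection, assuming a hypothetical `Entry` record with a `has_wikipedia`
flag; "Hypothetical language" is a made-up row used only to show the
India cut-off, and Lak is flagged as having a Wikipedia, as corrected
later in this thread:

```python
from typing import NamedTuple


class Entry(NamedTuple):
    country: str
    language: str
    speakers: int
    has_wikipedia: bool


# Inclusive cut-offs inferred from the list headings (assumption).
THRESHOLDS = {"India": 1_000_000, "Indonesia": 1_000_000}
DEFAULT_THRESHOLD = 100_000


def missing_languages(entries):
    """Entries without a Wikipedia that meet their country's cut-off,
    largest speaker count first."""
    selected = [
        e for e in entries
        if not e.has_wikipedia
        and e.speakers >= THRESHOLDS.get(e.country, DEFAULT_THRESHOLD)
    ]
    return sorted(selected, key=lambda e: e.speakers, reverse=True)


entries = [
    Entry("India", "Pahari, Mahasu", 1_000_000, False),
    Entry("India", "Hypothetical language", 500_000, False),  # below India's cut-off
    Entry("Philippines", "Binukid", 100_000, False),
    Entry("Russian Federation", "Lak", 164_420, True),  # already has a Wikipedia
]

for e in missing_languages(entries):
    print(e.speakers, e.language)
```

Keeping the cut-offs in one dictionary makes it easy to lower them for a
chapter that wants the languages with fewer than 100K speakers.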

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Re: Languages and numbers
Hi,

On 25 Jun 2011, at 05:52, Milos Rancic <millosh@gmail.com> wrote:

> While preparing Missing Wikipedias [1], I've compiled numbers of speakers
> and languages not covered by Wikipedias, by area and by country with a
> chapter.

Fascinating! Thanks for the work! :-)

Isabell.

Re: Languages and numbers
Forwarding Deryck Chan's email and my response, at his request.

-------- Original Message --------
Subject: Re: [Internal-l] Fwd: [Foundation-l] Languages and numbers
Date: Sat, 25 Jun 2011 13:55:58 +0200
From: Milos Rancic <millosh@gmail.com>
To: Deryck Chan <deryckchan@gmail.com>

On 06/25/2011 01:28 PM, Deryck Chan wrote:
> (sorry, am on mobile, can't post to list. Feel free to forward this onto
> the list)
>
> 2 obvious queries:
> 1. How are we going to do a Wikipedia on... Indian Sign Language?
> 2. If we exclude the Chinese languages from the table (which is a move I
> agree with), we should also exclude all other languages which defer to
> the standard written form of a related language that has a Wikipedia,
> eg. Mainfränkisch (because we have a standard German Wikipedia).

1. There are requests for Wikipedias in sign languages (search for "sign
language" here [1]). They intend to use SignWriting [2]. We are waiting
for top-to-bottom writing to be implemented before we can host sign
languages.

2. I didn't say that we should exclude the Chinese languages, only that
some of them likely should be excluded: yes, if they are so close to
Mandarin that there is no significant difference in writing; otherwise,
no. But I think that all of the Han languages not closely related to
Mandarin already have their own Wikipedias.

Note also that there is a request for a Swabian Wikipedia [3], and there
are already a number of Wikipedias in German varieties. So it's up to
their speakers to decide what they want. Besides that, Standard Chinese
and Standard German are different cases: a logographic script can cover
many more varieties than a phonetic one. For example, with a logographic
script, even Serbian and English could be written in one orthography,
while with a phonetic one, not even English and German, or Serbian and
Bulgarian, could be.

I also intentionally grouped Han China, Korea and Japan together (as
East Asia), because it is unlikely that the WMF needs to do anything
there. All of these countries are developed enough (OK, North Korea is
not, but there is South Korea), and the languages in those areas are
doing well enough. The same is true for most languages of OECD member
countries.

The main purpose of this document is to point to the large populations
without a Wikipedia in their native language. India, Indonesia and the
Philippines will obviously be in focus.

[1] http://meta.wikimedia.org/wiki/Requests_for_new_languages
[2] http://en.wikipedia.org/wiki/SignWriting
[3] http://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Swabian

Re: Languages and numbers
I posted this on the India list (many people are not subscribed to
foundation-l) - forwarding this question which just popped up.

Bishakha

---------- Forwarded message ----------
From: Vickram Crishna <vvcrishna@radiophony.com>
Date: Sat, Jun 25, 2011 at 6:08 PM
Subject: Re: [Wikimediaindia-l] Fwd: [Foundation-l] Languages and numbers
To: Wikimedia India Community list <wikimediaindia-l@lists.wikimedia.org>


It is fascinating, although I think I may not have understood the
classifications. Is there only one Indian Sign Language, for instance? I was
told by a user (in the UK) that several are in use in different parts of the
country. Still, perhaps the variants do not have sufficient numbers of users
to qualify for this listing. However, the context in which I was told was
precisely the severe lack of support materials for helping users become
self-sufficient and good communicators, so the list itself becomes a
barrier.

Unfortunately, I do not know at the moment how to fix the problem.

[296,097,274 349 India]

Does the population number mean that the existing Indic-language Wikipedias
cover the rest of the population, i.e. over 90 crore? Is this information
updated from the current census?


On Sat, Jun 25, 2011 at 10:22 AM, Milos Rancic <millosh@gmail.com> wrote:

> While preparing Missing Wikipedias [1], I've compiled numbers of speakers
> and languages not covered by Wikipedias, by area and by country with a
> chapter.

Re: Languages and numbers
On 06/25/2011 03:11 PM, Bishakha Datta wrote:
> I posted this on the India list (many people are not subscribed to
> foundation-l) - forwarding this question which just popped up.

First of all, although the numbers look fascinatingly precise, they are
far from it. When you add up approximations like ~1M + 800k + 30k + 4k +
700 + 20 + a language spoken by three individuals, you get the
fascinating number 1,834,723. So the numbers are far from census-level
precision.
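
The effect is easy to reproduce. A minimal Python sketch, assuming that
two or three significant figures is an honest precision for
Ethnologue-based totals (an illustrative choice, not a rule from the
source); `round_sig` is a hypothetical helper:

```python
def round_sig(n: int, sig: int = 2) -> int:
    """Round an integer to `sig` significant figures."""
    if n == 0:
        return 0
    digits = len(str(abs(n)))
    # A negative ndigits rounds to tens, hundreds, ...; Python's round()
    # sends exact halves to the even neighbour.
    return round(n, sig - digits)


# The approximations from the example above, including the three speakers.
total = sum([1_000_000, 800_000, 30_000, 4_000, 700, 20, 3])
print(total)                # 1834723
print(round_sig(total))     # 1800000
print(round_sig(total, 3))  # 1830000
```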

All of the numbers are based on Ethnologue data [1], which range from
very good to very bad approximations. Ethnologue varies a lot even in its
linguistic classification. (Having been educated in Serbian linguistics,
I know how poor its description of the South Slavic area is.) BUT it is
the best source on all the world's languages ever made, and it gives a
good general picture.

> [296,097,274 349 India]
>
> Does the population number mean that the existing Indic-language
> Wikipedias cover the rest of the population, i.e. over 90 crore? Is
> this information updated from the current census?

By quickly approximating the number of speakers of some large official
languages of India [2], not counting English, I reached ~650M and
stopped counting. (BTW, that includes the 1991 figure of 180M Hindi
speakers; given India's population growth, there should be at least 250M
Hindi speakers today.) Thus, I think that the remaining ~300M could be
accounted for by other languages with Wikipedias and by adjusting
existing numbers for population growth. (I could make a more precise
calculation if needed, but it would take some time.) It should also be
noted that the dates of Ethnologue entries vary a lot and that some of
them could be 20 or more years old.
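
For readers working in Indian numbering, and for the population-growth
adjustment mentioned above, a minimal Python sketch. `to_crore` and
`scale_for_growth` are hypothetical helpers; the growth ratio of 1.4 is
an assumed, illustrative value (going from 180M to "at least 250M"
implies a ratio of roughly 250/180 ≈ 1.39 or more), not census data:

```python
CRORE = 10_000_000  # 1 crore = 10 million in the Indian numbering system


def to_crore(n: int) -> float:
    """Express a head count in crore."""
    return n / CRORE


def scale_for_growth(old_count: int, growth_ratio: float) -> int:
    """Scale an old speaker count by an assumed growth ratio
    (population now / population at the time of the count)."""
    return round(old_count * growth_ratio)


print(to_crore(296_097_274))  # about 29.6 crore without a Wikipedia
print(to_crore(900_000_000))  # 90.0 crore, the figure in the question
# Hypothetical ratio of 1.4 applied to the 1991 Hindi figure:
print(scale_for_growth(180_000_000, 1.4))
```

Such scaling assumes every language grew at the national rate, which is
another reason to read the adjusted figures as rough guides only.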

And, again, this should be used as a very general guideline, not a
precise one. The list is good for telling that there are many more
speakers of Awadhi than of Merwari today (~38M vs. ~3.9M). However, it is
not good for comparing the numbers of speakers of Awadhi and Maithili
(~38M vs. ~35M). But anyway, that's not important: we know that we
should work to cover both Awadhi and Maithili.

On the other hand, I will indeed try to make those numbers more useful
(although I think their most important use is pointing to the large
populations without Wikipedias).

> It is fascinating, although I think I may not have understood the
> classifications. Is there only one Indian Sign Language, for
> instance? I was told by a user (in the UK) that several are in use in
> different parts of the country. Still, perhaps the variants do not
> have sufficient numbers of users to qualify for this listing.
> However, the context in which I was told was precisely the severe
> lack of support materials for helping users become self-sufficient
> and good communicators, so the list itself becomes a barrier.
>
> Unfortunately, I do not know at the moment how to fix the problem.

I've checked the whole database, and only one Indian Sign Language is
listed, which doesn't tell us much. The Ethnologue entry for Indian Sign
Language [3] says that it is also called "Indo-Pakistani Sign Language"
or "Urban Indian Sign Language". However, the statement that "Deaf
schools mainly do not use ISL..." could mean that dialectal divergence
is very high (so it could look like a number of different languages),
even though it is also used in Pakistan and Bangladesh.

That said, I have to admit that my knowledge of sign languages is very
limited.

[1] http://www.ethnologue.com/
[2] http://en.wikipedia.org/wiki/Languages_with_official_status_in_India
[3] http://www.ethnologue.com/show_language.asp?code=ins

Re: Languages and numbers
Some of these actually already have Wikipedias:

Meadow Mari
Yakut (aka Sakha)
Lak
Balkar (aka Karachay-Balkar)
Yiddish, Eastern (= "standard" Yiddish; "Western Yiddish" is the one we
are missing, but it has far fewer speakers: according to Ethnologue there
are only 5,400 around the world)

In addition, in another message you stated that we probably had Wikipedias
in every Sinitic language that was distinct enough from Mandarin to receive
its own Wikipedia. However, Min Bei has 10.3 million speakers, does not
have a Wikipedia and is definitely far removed from Mandarin; Xiang, with
30 million+ speakers, probably also deserves its own Wikipedia.


2011/6/24 Milos Rancic <millosh@gmail.com>

> While preparing Missing Wikipedias [1], I've compiled numbers of speakers
> and languages not covered by Wikipedias, by area and by country with a
> chapter.

Re: Languages and numbers
On 06/27/2011 12:30 AM, M. Williamson wrote:
> Some of these actually already have Wikipedias:
>
> Meadow Mari
> Yakut (aka Sakha)
> Lak
> Balkar (aka Karachay-Balkar)
> Yiddish, Eastern (= "standard" Yiddish; "Western Yiddish" is the one we
> are missing, but it has far fewer speakers: according to Ethnologue there
> are only 5,400 around the world)
>
> In addition, in another message you stated that we probably had Wikipedias
> in every Sinitic language that was distinct enough from Mandarin to receive
> its own Wikipedia. However, Min Bei has 10.3 million speakers, does not
> have a Wikipedia and is definitely far removed from Mandarin; Xiang, with
> 30 million+ speakers, probably also deserves its own Wikipedia.

Thanks for the corrections!

As for the Han languages, I intentionally left all of them in, precisely
because of languages like the ones you mentioned. Obviously, they will be
analyzed on a case-by-case basis.

But the Han languages are not endangered, China is a fairly developed
country, and their basic written-language needs are covered by CJK
characters, fonts etc. If their speakers want a Wikipedia, they will
likely get it, but it is not a priority.

If we are talking about the languages of China, the Hmong–Mien (or
Miao–Yao) languages, for example, should be more in focus, as some of
them have enough speakers to create viable Wikimedia projects if
supported (Chuanqiandian Cluster Miao has 1.4M speakers).

Re: Languages and numbers
More data can be found at [1]. It shows the coverage of languages by
Wikimedia projects, grouped by speaker population on a logarithmic scale.

Numbers are not a surprise.
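
A minimal Python sketch of grouping languages by order of magnitude of
speakers, as in a logarithmic coverage table. It assumes hypothetical
(speakers, has_project) pairs and positive speaker counts; the
spreadsheet's actual bucket boundaries are not known. The sample uses
Lak, Tabassaran and Mainfränkisch as reported in this thread:

```python
from collections import defaultdict


def log_bucket(speakers: int) -> int:
    """Order of magnitude: 5 for 100,000-999,999 speakers, 6 for millions."""
    return len(str(speakers)) - 1


def coverage_by_bucket(records):
    """records: iterable of (speakers, has_project) pairs.
    Returns {bucket: (covered, total)} in ascending bucket order."""
    covered = defaultdict(int)
    total = defaultdict(int)
    for speakers, has_project in records:
        bucket = log_bucket(speakers)
        total[bucket] += 1
        covered[bucket] += int(has_project)
    return {b: (covered[b], total[b]) for b in sorted(total)}


sample = [
    (164_420, True),     # Lak: has a Wikipedia
    (128_900, False),    # Tabassaran
    (4_910_000, False),  # Mainfränkisch
]
print(coverage_by_bucket(sample))
```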

[1] https://spreadsheets.google.com/spreadsheet/ccc?key=tCwO11tFPLPB-SJafDesypg&authkey=CPCE5pMB#gid=1


Re: Languages and numbers
Milosh, thanks for your work. Just to correct: Moksha, Erzya, Yakut
(=Sakha), Komi-Zyrian (=Komi) and Lak all have Wikipedias (though
admittedly for Lak I am the only active contributor). Adyge is almost
identical to Kabardino-Circassian, and Adyge speakers probably will never
have their own Wikipedia. Balkar is a part of Karachai-Balkar which has a
Wikipedia.

Cheers
Yaroslav


> Russian Federation
> 783720 Lezgi
> 696630 Erzya
> 614000 Moksha
> 516490 Dargwa
> 499300 Adyghe
> 460090 Mari, Meadow
> 422550 Kumyk
> 413000 Ingush
> 363000 Yakut
> 264400 Tuva
> 217000 Komi-Zyrian
> 164420 Lak
> 128900 Tabassaran
> 113710 Balkar

Re: Languages and numbers
2011/7/1 Yaroslav M. Blanter <putevod@mccme.ru>:
> Adyge is almost
> identical to Kabardino-Circassian, and Adyge speakers probably will never
> have their own Wikipedia.

From what I hear about this, Adyge and Kabardian may be two varieties
of a Circassian [[macrolanguage]]. Maybe someone who cares about it
will submit a request to ISO to consider redefining their codes
accordingly.

The recently created Kabardian Wikipedia (kbd.wikipedia.org) is
developing quite nicely. It already has contributors in both varieties
of this language and they get along well.

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
"We're living in pieces,
 I want to live in peace." - T. Moore

Re: Languages and numbers
On 07/01/2011 01:24 PM, Yaroslav M. Blanter wrote:
> Milosh, thanks for your work. Just to correct: Moksha, Erzya, Yakut
> (=Sakha), Komi-Zyrian (=Komi) and Lak all have Wikipedias (though
> admittedly for Lak I am the only active contributor). Adyge is almost
> identical to Kabardino-Circassian, and Adyge speakers probably will never
> have their own Wikipedia. Balkar is a part of Karachai-Balkar which has a
> Wikipedia.

Thanks! I've updated the database for those that have Wikipedias.

As Russia is a fairly developed country, reaching people who speak those
languages and teaching them how to use Wikimedia projects would likely
be a task for WM RU. Besides that, I think that all languages of Russia
have writing systems and Unicode support.

Re: Languages and numbers
2011/7/1 Milos Rancic <millosh@gmail.com>:
> As Russia is a fairly developed country, reaching people who speak those
> languages and teaching them how to use Wikimedia projects would likely
> be a task for WM RU. Besides that, I think that all languages of Russia
> have writing systems and Unicode support.

Actually, a few small languages in Northern and Eastern Russia don't
have writing systems, but at least for some of them one is being
developed by the government.

And all the current languages of Russia are indeed supported in
Unicode, but in a few discussions I had just a couple of weeks ago I
learned the shocking truth: while we have taken Unicode for granted for
about a decade, that is not so for quite a lot of people around the
globe. In less developed parts of Russia there are still computers
running Windows 98 and even earlier, where Unicode support is poor to
non-existent. Maybe in Russia WM-RU can indeed handle this, for
example, by organizing the shipment of donated second-hand computers to
key organizations in these regions (schools, libraries, local newspapers
etc.)
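
To illustrate why a pre-Unicode setup is a real barrier: a system limited
to a single-byte legacy code page can store Russian but not the extra
letters many minority languages of Russia use. A minimal Python sketch,
assuming Windows-1251 (`cp1251`) as a typical Cyrillic legacy code page;
`fits_codepage` is a hypothetical helper that checks encodability only
and does not model how any particular Windows version renders text:

```python
def fits_codepage(text: str, codec: str = "cp1251") -> bool:
    """True if `text` can be stored in the given legacy code page."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


print(fits_codepage("Москва"))  # True: Russian letters are in cp1251
print(fits_codepage("Ӧ"))       # False: Komi letter U+04E6 is not
print(fits_codepage("Ӏ"))       # False: palochka U+04C0 (used in Lak) is not
```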

The same happens, however, in many other countries, some of which need
Unicode even more desperately than these Russian regions and which
don't have a chapter, Ethiopia for example. There the Foundation or
other chapters will be able to help. WM-IL, for example, sent
second-hand computers pre-installed with Ubuntu and offline Wikipedia
to African countries, and maybe other chapters did similar things,
too.

Long story short: Unicode support cannot be taken for granted, but
something can be done about it.
