Sorry for coming into this discussion a bit late. I'm one of the members of
Google's translation team, and I wanted to make myself available for
Quoting some suggestions from Mark earlier in the thread:
1) Fix some of the formatting errors with GTTK. Would this really be so
difficult? It seems to me that the breaking of links is a bug that needs
fixing by Google.
We're working on various formatting errors based on our conversations with
members of the Tamil and Telugu Wikipedias. We're hoping to push those out
soon (in the coming weeks).
2) Implement spelling and punctuation check automatically within GTTK before
posting of the articles.
There is spell check in Translator Toolkit, although it's not available for
all languages. We don't have any punctuation checks today and I doubt that
we can release this anytime soon. (If it's not available in Google Docs or
Gmail, then it's unlikely that we'll have it for Translator Toolkit, as
well, since we use the same infrastructure.)
What's the proposal, though - would you like for us to prevent publishing of
articles if they have too many spelling errors, or simply warn the user that
there are X spelling errors? Any input you can provide on preferred
behavior would be great.
3) Have GTTK automatically remove broken templates and images, or require
users to translate any templates before a page may be posted.
Templates are a bit tricky. Sometimes, a template in one Wikipedia does not
exist in another Wikipedia. Other times, a template in one language maps to
a template in another language but the parameters are different.
Removing broken templates automatically may not work because some templates
come between words. If we remove them, the sentences or paragraph may
become invalid. We've also considered creating a custom interface for
localizing templates, but this requires a lot of work.
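To illustrate why automatic removal is risky, here is a minimal Python
sketch. The regular expression, the function name strip_templates, and the
sample sentence are all assumptions made up for illustration; this is not
how Translator Toolkit processes templates, and the pattern only handles
simple, non-nested templates.

```python
import re

# Matches simple, non-nested {{...}} templates only (illustrative assumption).
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")


def strip_templates(wikitext: str) -> str:
    """Naively delete every non-nested template from the wikitext."""
    return TEMPLATE_RE.sub("", wikitext)


# Hypothetical sentence with a template sitting between words.
sentence = "The bridge is {{convert|500|m|ft}} long."
stripped = strip_templates(sentence)
print(stripped)  # the measurement is gone, so the sentence no longer makes sense
```

Here the template carried the actual measurement, so deleting it leaves a
sentence with a hole in the middle rather than a cleaner article.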
In the interim, the approach we've taken is to have translators fix the
templates in Wikipedia when they post the article from Translator Toolkit.
When a user clicks on Share > Publish to source page in Translator Toolkit,
the Wikipedia article is in preview mode --- it's not live. The idea is
that if there are any errors, the translator can fix them before saving the
4) Include a list of most needed articles for people to create, rather than
random articles that will be of little use to local readers. Some articles,
such as those on local topics, have the added benefit of encouraging more
edits and community participation since they tend to generate more interest
from speakers of a language in my experience.
The articles we selected actually weren't really random. Here's how we
1. we looked at the top Google searches in the region (e.g., for Tamil, we
looked at searches in India and I believe Sri Lanka, as well)
2. from the top Google searches in the region, we looked at the top, clicked
Wikipedia articles --- regardless of the language (so we wound up with
Wikipedia source articles in English, Hindi, and other languages)
3. from the top, clicked Wikipedia articles, we looked for articles that
were either stubs or unavailable in the local language - these are the
articles that we sent for translation
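The three steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the Article record, the clicks
field, the local_status values, and the sample data are all hypothetical,
and the real selection ran on Google's internal search data rather than on
anything like this.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    clicks: int        # clicks coming from top regional searches (hypothetical metric)
    local_status: str  # "missing", "stub", or "full" in the local-language Wikipedia


def select_for_translation(articles, top_n):
    """Take the top_n most-clicked articles, then keep only those that are
    missing or stubs in the local-language Wikipedia."""
    top_clicked = sorted(articles, key=lambda a: a.clicks, reverse=True)[:top_n]
    return [a.title for a in top_clicked if a.local_status in ("missing", "stub")]


# Made-up placeholder data, only to exercise the function.
candidates = [
    Article("A", 900, "full"),
    Article("B", 700, "stub"),
    Article("C", 500, "missing"),
    Article("D", 100, "missing"),
]
print(select_for_translation(candidates, top_n=3))  # ['B', 'C']
```

Note that "D" is dropped even though it is missing locally, because it
falls outside the most-clicked set. Ranking purely by regional clicks is
what decides which articles are considered at all.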
This selection isn't perfect. For example, it assumes that the top, clicked
Wikipedia articles by all users in India/Sri Lanka --- who may be searching
in English, Hindi, Tamil, or some other language --- are relevant to the
Tamil community. Last month, we met with members of the Tamil and Telugu
Wikipedias to improve the article selection. The main changes that we
agreed on were:
1. the local Wikipedia community should give Google the final OK on which
articles should or should not be translated
2. the local Wikipedia community can add articles to Google's list
3. the local Wikipedia community can suggest titles for the articles
4. Google's translators will post the articles with their user names, and
they will monitor community feedback on their user pages until the
translation meets the community's standards
We're just getting started on this new process, and we'll keep refining this
with the Tamil and Telugu communities as we move forward. If it's
successful, we'll use it as our template for other projects.
As always, any feedback or suggestions are welcome. Also, while I plan to
check the foundation-l list periodically, if you have bugs, you can also
file them in our bug queue: translator-toolkit-support at google.com. While
the eng team may not monitor this list, they do look at the support queue.
On Wed, Aug 4, 2010 at 5:17 PM, Federico Leva (Nemo) <firstname.lastname@example.org> wrote:
> Aphaia, 27/07/2010 21:33:
> > I've noticed many of English Wikipedia articles cite only English
> > written articles even if the topics are of non-English world. And
> > normally, specially in the developing world, the most comprehend
> > sources are found in their own languages - how can those articles be
> > assured in NPOV when they ignore the majority of reliable sources?
> It's not only a matter of NPOV. There's even a policy for this:
> Obviously you can expect other language version to want the same for
> their language.
> foundation-l mailing list
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l