Mailing List Archive

Cross-wiki search API?
I seem to remember that all Wikimedia wikis now share a single search
index, and per-wiki searches are filtered through a tag for the respective
wiki.

If that is indeed the case, is there an API to search all wikis, but
omitting that tag?
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: Cross-wiki search API?
See also related bug:
https://phabricator.wikimedia.org/T71489

On Fri, Sep 11, 2015 at 12:45 PM, Magnus Manske <magnusmanske@googlemail.com> wrote:

> I seem to remember that all Wikimedia wikis now share a single search
> index, and per-wiki searches are filtered through a tag for the respective
> wiki.
>
> If that is indeed the case, is there an API to search all wikis, but
> omitting that tag?
Re: Cross-wiki search API?
Unfortunately there is no single index that is then filtered by tags. In
WMF production we currently have 1824 separate search indexes. Because
these indexes are all independent, the scores generated by queries against
one index are not directly comparable to the scores generated by querying
another index (we can't just naively merge the result sets together).
Keeping the indexes separate also means that queries against our busiest
index (enwiki_content) only have to search 150GB of data, rather than the
full cross-wiki data set of 2.5TB. This ends up being very important, as we
serve 1.5-3k queries per second against this index.
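
To illustrate the merging problem described above, here is a minimal
Python sketch. The hits, scores, and the "wiki" field are made up for the
example, and the rank-based interleave is just one simple alternative to
sorting by raw score, not something CirrusSearch is known to do. It only
shows that sorting by raw score lets the index with the larger score scale
dominate.

```python
from itertools import zip_longest


def naive_merge(result_sets):
    """Concatenate hits from every index and sort by raw score."""
    hits = [hit for hits in result_sets.values() for hit in hits]
    return sorted(hits, key=lambda h: h["score"], reverse=True)


def interleave_by_rank(result_sets):
    """Round-robin over the per-index rankings, ignoring raw scores."""
    merged = []
    for tier in zip_longest(*result_sets.values()):
        merged.extend(hit for hit in tier if hit is not None)
    return merged


# Made-up hits: each index scores on its own scale, so 12.0 in one index
# says nothing about how it compares to 0.9 in another.
results = {
    "enwiki_content": [
        {"wiki": "enwiki", "title": "Berlin", "score": 12.0},
        {"wiki": "enwiki", "title": "Berlin Wall", "score": 9.5},
    ],
    "dewiki_content": [
        {"wiki": "dewiki", "title": "Berlin", "score": 0.9},
        {"wiki": "dewiki", "title": "Berliner Mauer", "score": 0.4},
    ],
}

naive = [(h["wiki"], h["title"]) for h in naive_merge(results)]
ranked = [(h["wiki"], h["title"]) for h in interleave_by_rank(results)]
print(naive)   # every enwiki hit comes before every dewiki hit
print(ranked)  # top hit of each index first, then the second hits
```

With the naive merge, the best dewiki hit is ranked below the weakest
enwiki hit purely because of the score scale, which is the reason the
result sets cannot simply be combined.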

On Fri, Sep 11, 2015 at 4:25 AM, Eran Rosenthal <eranroz89@gmail.com> wrote:

> See also related bug:
> https://phabricator.wikimedia.org/T71489
Re: Cross-wiki search API?
I should add that we are also currently looking into
https://phabricator.wikimedia.org/T109715 which would make the full data
set available to labs users for raw queries (sent directly to
Elasticsearch). When issuing raw queries you can query every index in
Elasticsearch. It's not performant at all, but it would work.
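
As a rough sketch of what such a raw query could look like: Elasticsearch
accepts a comma-separated list of index names, wildcards, or `_all` in the
search URL. The Python below only builds the request and does not send it.
The host (localhost:9200), the `build_search_request` helper, and the
"title" field are assumptions for illustration; the actual labs endpoint
and index mapping are not described in this thread.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_search_request(base_url, indexes, text, size=10):
    """Build (but do not send) a multi-index Elasticsearch search request.

    An empty `indexes` list targets every index via `_all`.
    """
    if indexes:
        # Keep "*" unescaped so wildcard patterns such as "*_content" work.
        target = ",".join(quote(name, safe="*") for name in indexes)
    else:
        target = "_all"
    url = f"{base_url.rstrip('/')}/{target}/_search"
    # "title" is an assumed field name; adjust it to the real mapping.
    body = {"size": size, "query": {"match": {"title": text}}}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host; query two named indexes, then every index.
req = build_search_request(
    "http://localhost:9200", ["enwiki_content", "dewiki_content"], "Berlin"
)
req_all = build_search_request("http://localhost:9200/", [], "Berlin")
print(req.full_url)
print(req_all.full_url)
```

The request could then be sent with urllib.request.urlopen(req). As noted
above, a query across all indexes touches the full data set, so it should
be expected to be slow.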

On Fri, Sep 11, 2015 at 7:56 AM, Erik Bernhardson <
ebernhardson@wikimedia.org> wrote:

> Unfortunately there is no single index that is then filtered by tags. In
> WMF production we currently have 1824 separate search indexes. Because
> these indexes are all independent, the scores generated by queries against
> one index are not directly comparable to the scores generated by querying
> another index (we can't just naively merge the result sets together).
> Keeping the indexes separate also means that queries against our busiest
> index (enwiki_content) only have to search 150GB of data, rather than the
> full cross-wiki data set of 2.5TB. This ends up being very important, as we
> serve 1.5-3k queries per second against this index.