Mailing List Archive

Feedback requested on our search APIs
Do you use our search API? If so, I'd like to hear from you!

The Discovery Department
<https://wikimediafoundation.org/wiki/Staff_and_contractors#Discovery> at
the Wikimedia Foundation is tasked with building a path of discovery to
relevant and trusted knowledge. In line with that, one of our primary
responsibilities is to ensure that our search APIs are stable, fast, and
easy to use. We'd love to hear from the people that are using our APIs, so
we can learn what you love about them, what frustrates you, and what we can
do to improve them for you.

I'd prefer that you keep the comments about the API itself rather than the
relevance of the results it returns; I plan to start a separate thread
about the result relevance, since they're separate topics.

If you have some feedback, please reply in this thread or reach out to me
privately.

Thanks!

Dan

--
Dan Garry
Product Manager, Discovery
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: Feedback requested on our search APIs
On 6/8/15, Dan Garry <dgarry@wikimedia.org> wrote:
> Do you use our search API? If so, I'd like to hear from you!
>
> The Discovery Department
> <https://wikimediafoundation.org/wiki/Staff_and_contractors#Discovery> at
> the Wikimedia Foundation is tasked with building a path of discovery to
> relevant and trusted knowledge. In line with that, one of our primary
> responsibilities is to ensure that our search APIs are stable, fast, and
> easy to use. We'd love to hear from the people that are using our APIs, so
> we can learn what you love about them, what frustrates you, and what we can
> do to improve them for you.
>
> I'd prefer that you keep the comments about the API itself rather than the
> relevance of the results it returns; I plan to start a separate thread
> about the result relevance, since they're separate topics.
>
> If you have some feedback, please reply in this thread or reach out to me
> privately.
>
> Thanks!
>
> Dan
>
> --
> Dan Garry
> Product Manager, Discovery
> Wikimedia Foundation

The search API (by which I mean action=query&list=search in api.php)
is somewhat poorly documented. You have to dig to find
https://www.mediawiki.org/wiki/Help:CirrusSearch . I would much prefer
that the relevant documentation was included in the normal api.php
auto-generated help. Even better would be if that API allowed users to
specify the options using normal URL parameters (as a separate
option from using operators in the search string). It's also not
entirely clear from the API that the search options differ
depending on which extensions you have installed.

Additionally, the help page is not entirely clear about some of
the limitations, e.g. you can't do incategory:Foo OR intitle:bar.
Regexes on intitle don't seem to work over the whole title, only on
word-level tokens (I think, maybe? I'm a bit unclear on how the regex
operator works).

Cheers,
Brian

Re: Feedback requested on our search APIs
On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff <bawolff@gmail.com> wrote:

> The search api (by which I mean query=search in api.php) is somewhat
> poorly documented. You have to dig to find
> https://www.mediawiki.org/wiki/Help:CirrusSearch .


I recently added https://www.mediawiki.org/wiki/API:Search_and_discovery
which clarifies the connection with Help:CirrusSearch, and mentions other
kinds of searching like geosearch.

>
> I would much prefer
> that the relevant documentation was included in the normal api.php
> auto-generated help.


https://gerrit.wikimedia.org/r/216899 changes the
'apihelp-query+search-param-search' message in
https://www.mediawiki.org/wiki/Special:ApiHelp/query+search to:

*srsearch*

Search for page titles and page content that match this value. You can use
the search string to invoke special wiki search features, depending on what
its search backend implements.

But API query search can only use CirrusSearch features if it's installed.
I think Extension:CirrusSearch could handle the 'APIGetAllowedParams' hook
to modify this help text. If I understand correctly, it might be easier
to interpose WMF-specific help text that links to mw:Help:CirrusSearch in a
'wikimedia-apihelp-query+search-param-search' key in
extensions/WikimediaMessages/i18n/wikimediaoverrides/en.json ; I tried it
locally and it didn't work.



> Even better would be if that api allowed users to
> specify the options using normal url parameters, (as a separate
> option from using operators in the search string). It's also not
> entirely clear from the api that the search options differ
> depending on which extensions you have installed.
>

What do you mean? Beyond special terms in srsearch, I'm not aware of any
changes to query+search's sr parameters depending on extensions.


> Additionally, from the help page, it's not entirely clear about some of
> the limitations. e.g. You can't do incategory:Foo OR intitle:bar.
> regexes on intitle don't seem to work over the whole title, only word
> level tokens (I think, maybe? I'm a bit unclear on how the regex
> operator works).
>

Yes, it's not a full reference.

--
=S Page WMF Tech writer
Re: Feedback requested on our search APIs
On 6/8/15, S Page <spage@wikimedia.org> wrote:
> On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff <bawolff@gmail.com> wrote:
>
>> The search api (by which I mean query=search in api.php) is somewhat
>> poorly documented. You have to dig to find
>> https://www.mediawiki.org/wiki/Help:CirrusSearch .
>
>
> I recently added https://www.mediawiki.org/wiki/API:Search_and_discovery
> which clarifies the connection with Help:CirrusSearch, and mentions other
> kinds of searching like geosearch.
>

The last time I looked at the docs was about 6 months ago. Glad to hear
they're improving.

>>
>> I would much prefer
>> that the relevant documentation was included in the normal api.php
>> auto-generated help.
>
>
> https://gerrit.wikimedia.org/r/216899 changes the
> 'apihelp-query+search-param-search' message in
> https://www.mediawiki.org/wiki/Special:ApiHelp/query+search to
> *srsearch*
>
> Search for page titles and page content that match this value. You can use
> the search string to invoke special wiki search features, depending on what
> its search backend implements.
> But API query search can only use CirrusSearch features if it's installed.
> I think Extension:CirrusSearch could handle the 'APIGetAllowedParams' hook
> to modify this help text. If I understand correctly, it might be easier
> to interpose WMF-specific help text that links to mw:Help:CirrusSearch in a
> 'wikimedia-apihelp-query+search-param-search' key in
> extensions/WikimediaMessages/i18n/wikimediaoverrides/en.json ; I tried it
> locally and it didn't work.

It shouldn't be WMF-specific (it's not WMF-specific the way TOS
links are); it should be specific to CirrusSearch.

One possible implementation would be an override message (I
would note that the wikimediaoverride messages aren't direct
overrides; they are replacement messages used by other code that does
the overriding). In my original email I was thinking more from a user
perspective of what I'd like to see, without thought to how it would
be implemented. Without looking at the code, I would probably favour
an extra hook just for the search module, instead of using the generic
hook.

>
>
>> Even better would be if that api allowed users to
>> specify the options using normal url parameters, (as a separate
>> option from using operators in the search string). It's also not
>> entirely clear from the api that the search options differ
>> depending on which extensions you have installed.
>>
>
> What do you mean? Beyond special terms in srsearch I'm not aware of any
> changes to query+search's sr parameters depending on extensions.
>

Yeah, that doesn't happen currently. I think it should be the case; it
would mesh much better with the MediaWiki API if, instead of doing
https://commons.wikimedia.org/w/api.php?action=query&list=search&srsearch=Black+incategory:Felis_silvestris_catus&srnamespace=6
you could do something like
https://commons.wikimedia.org/w/api.php?action=query&list=search&srincategory=Felis_silvestris_catus&srsearch=Black&srnamespace=6
. Especially if all the parameters were documented in the normal API
way, I think it would be a big boon to discovering the hidden
features of search. (I appreciate it might be a lot of work to express
all the search options possible, but the original email sounded like
it wanted a wishlist.)
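
To make the difference between the two URLs concrete, here is a minimal
Python sketch that only builds the request URLs (it makes no network
call). The helper name search_url is made up for illustration, and
srincategory is the parameter wished for above, not something the API
is known to accept.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def search_url(text, namespace, incategory=None, as_parameter=False):
    """Build a list=search URL with an optional category filter.

    The filter goes either into the search string (the form used today)
    or into a separate parameter (the wishlist form).
    """
    params = {"action": "query", "list": "search", "srnamespace": namespace}
    if incategory is None:
        params["srsearch"] = text
    elif as_parameter:
        # HYPOTHETICAL: srincategory is only the parameter proposed in
        # this thread; it is not an existing API parameter.
        params["srsearch"] = text
        params["srincategory"] = incategory
    else:
        # Current form: the operator is embedded in the search string.
        params["srsearch"] = f"{text} incategory:{incategory}"
    return f"{API}?{urlencode(params)}"


current = search_url("Black", 6, incategory="Felis_silvestris_catus")
proposed = search_url("Black", 6, incategory="Felis_silvestris_catus",
                      as_parameter=True)
print(current)
print(proposed)
```

In the first URL the client has to know the operator syntax and embed
it in srsearch; in the second, the filter would be an ordinary
parameter that the auto-generated help could document.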

--
bawolff

Re: Feedback requested on our search APIs
On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff <bawolff@gmail.com> wrote:

> Additionally, from the help page, it's not entirely clear about some of
> the limitations. e.g. You can't do incategory:Foo OR intitle:bar.
> regexes on intitle don't seem to work over the whole title, only word
> level tokens (I think, maybe? I'm a bit unclear on how the regex
> operator works).
>

Being able to see a parse tree of the search expression would be nice, like
with the parse/expandtemplates APIs. That would make it easier to find out
whether the search fails because the query is parsed differently from what
you imagined, or because there really is nothing to return.
Re: Feedback requested on our search APIs
Dan Garry wrote:
>In line with that, one of our primary responsibilities is to ensure that
>our search APIs are stable, fast, and easy to use. We'd love to hear from
>the people that are using our APIs, so we can learn what you love about
>them, what frustrates you, and what we can do to improve them for you.

I have two recurring thoughts about search lately, since you asked.

First, multimedia search is absolutely horrible, basically non-existent.
If you go to Wikimedia Commons and try its search functionality and then
compare to any other media service on the Internet, you can quickly come
up with a list of a dozen features that are missing (search by file size,
by color, by image file format, etc.).

Second, Wikimedia still hasn't aggregated and released anonymized search
data. People use Special:Search daily and they encounter a page of search
results instead of having a redirect take them to the appropriate
destination. Or sometimes worse there's no coverage at all of what our
users are searching for. It's a long tail, yes, but we could start filling
in gaps if we had data about what users are looking for. We could save
users a lot of time and build better sites by analyzing what users are
looking for and not finding or what they're looking for and not
immediately being redirected toward. And yes, of course, there are privacy
considerations (the infamous AOL case, &c.), but nothing insurmountable.

Beyond these two points, it's vitally important that we be able to
arbitrarily query Wikidata soon. I'm hoping this functionality is live on
Wikimedia wikis by the end of 2015. And speaking to APIs specifically, we
really need to focus on projects such as Wiktionary and Wikisource that
are desperately in need of API support to serialize and add structure to
what are currently very fragile blobs of wikitext markup.

MZMcBride

P.S. RIP, SAD.



Re: Feedback requested on our search APIs
On Wed, 2015-06-10 at 02:01 -0400, MZMcBride wrote:
> a list of a dozen features that are missing (search by file size,
> by color, by image file format, etc.).

Also see https://phabricator.wikimedia.org/T101089 and
https://phabricator.wikimedia.org/T101087


--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/


Re: Feedback requested on our search APIs
On Wed, Jun 10, 2015 at 8:01 AM, MZMcBride <z@mzmcbride.com> wrote:
> I have two recurring thoughts about search lately, since you asked.
>
> First, multimedia search is absolutely horrible, basically non-existent.
> If you go to Wikimedia Commons and try its search functionality and then
> compare to any other media service on the Internet, you can quickly come
> up with a list of a dozen features that are missing (search by file size,
> by color, by image file format, etc.).

To really make this awesome we need structured data support for
Commons with Wikidata. We'll be making more progress on it in the
second half of this year but there is a lot to do.

<snip>

> Beyond these two points, it's vitally important that we be able to
> arbitrarily query Wikidata soon. I'm hoping this functionality is live on
> Wikimedia wikis by the end of 2015. And speaking to APIs specifically, we
> really need to focus on projects such as Wiktionary and Wikisource that
> are desperately in need of API support to serialize and add structure to
> what are currently very fragile blobs of wikitext markup.

Please give feedback on the latest proposal for Wikidata support for
Wiktionary: https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2015-05


Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

Re: Feedback requested on our search APIs
On Mon, Jun 8, 2015 at 7:16 PM, Brian Wolff <bawolff@gmail.com> wrote:

> You can't do incategory:Foo OR intitle:bar.
> regexes on intitle don't seem to work over the whole title, only word
> level tokens (I think, maybe? I'm a bit unclear on how the regex
> operator works).
>
>
intitle is word-level, though you can do phrase searching. It's pretty much
the same as a regular search but limited to the title field.
incategory:Foo OR intitle:Bar is a limitation I'm working on now. No idea
when it'll be available. The limitation comes from us trying to be cute with
the command parsing in Cirrus and not writing a whole grammar for the query
language.
Regexes only work for wikitext. This is a somewhat arbitrary decision on my
part - we need to make special ngram fields to accelerate the regex
searching and we only do that for wikitext. We _can_ do it for other fields
at the cost of update time and disk space.

Nik
Re: Feedback requested on our search APIs
On Tue, Jun 9, 2015 at 2:19 AM, Gergo Tisza <gtisza@wikimedia.org> wrote:

> On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff <bawolff@gmail.com> wrote:
>
> > Additionally, from the help page, it's not entirely clear about some of
> > the limitations. e.g. You can't do incategory:Foo OR intitle:bar.
> > regexes on intitle don't seem to work over the whole title, only word
> > level tokens (I think, maybe? I'm a bit unclear on how the regex
> > operator works).
> >
>
> Being able to see a parse tree of the search expression would be nice, like
> with the parse/expandtemplates APIs. That would make it easier to find out
> whether the search fails because the query is parsed differently from what
> you imagined, or because there really is nothing to return.
>
>
You can _kind of_ get that now by adding the cirrusDumpQuery URL parameter.
But it only dumps the query as sent by Cirrus to Elasticsearch, and that
contains a query_string query that Elasticsearch (Lucene really) parses on
its own.
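
As a rough illustration of adding that parameter, the sketch below
builds such a URL in Python without fetching it. The helper name
dump_query_url is made up, and using the index.php "search" entry
point and an empty value for cirrusDumpQuery are assumptions of this
sketch rather than documented behaviour.

```python
from urllib.parse import urlencode


def dump_query_url(base, search_text):
    """Return a search URL with the cirrusDumpQuery parameter appended."""
    # cirrusDumpQuery is the parameter named in the message above;
    # sending it with an empty value is an assumption of this sketch.
    params = {"search": search_text, "cirrusDumpQuery": ""}
    return f"{base}?{urlencode(params)}"


url = dump_query_url("https://commons.wikimedia.org/w/index.php",
                     "Black incategory:Felis_silvestris_catus")
print(url)
```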

One interesting option would be to make a way for Cirrus to return
Elasticsearch's explain results. It's not perfect because it only explains
why things are found and scored the way they are, but it doesn't explain why
things aren't found. Exporting the actual parsed query is more ambitious.

Nik
Re: Feedback requested on our search APIs
>
> To really make this awesome we need structured data support for
> Commons with Wikidata. We'll be making more progress on it in the
> second half of this year but there is a lot to do.
>

Sure, to really make that awesome, yeah, you need Wikidata. But we are
far away from hitting the point where we need Wikidata. In fact the
three examples McBride gave don't need Wikidata. MIME type and file
size are easily programmatically available already. And unless I'm
mistaken, functionally dependent metadata, like an algorithmically
determined main image colour, is out of scope for Wikidata.
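
For instance, size and MIME type can be read for a known file through
the existing prop=imageinfo query. A minimal Python sketch that only
builds the request URL follows; the helper name file_metadata_url is
made up and File:Example.jpg is a placeholder title. Note that this
looks up metadata for a given file; it is not a search by size or type.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def file_metadata_url(titles):
    """Build a prop=imageinfo URL asking for file size and MIME type."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "size|mime",        # request size and MIME type fields
        "titles": "|".join(titles),   # multiple titles are pipe-separated
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


url = file_metadata_url(["File:Example.jpg"])
print(url)
```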

--bawolff

Re: Feedback requested on our search APIs
On Wed, Jun 10, 2015 at 7:36 PM, Brian Wolff <bawolff@gmail.com> wrote:
>> To really make this awesome we need structured data support for
>> Commons with Wikidata. We'll be making more progress on it in the
>> second half of this year but there is a lot to do.
>
> Sure, to really make that awesome, yeah, you need Wikidata. But we are
> far away from hitting the point where we need Wikidata. In fact the
> three examples McBride gave don't need Wikidata. MIME type and file
> size are easily programmatically available already.

Yeah of course.

> And unless I'm
> mistaken, functionally dependent metadata, like an algorithmically
> determined main image colour, is out of scope for Wikidata.

We've been thinking about this a bit but no decision has been made.
It'd be nice to make these accessible in the same way as other
properties without needing to store and maintain them the same way.
We've been thinking about some kind of fake properties for example.
But we'll worry about that when we get there.
We're getting a bit off-topic. Sorry, Dan.


Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata