Help talk:CirrusSearch
This talk page is about the usage of CirrusSearch. See installation, or development, or bugs, or Wikipedia for related discussions.
See also /en.
Does boost force sort=relevance?
If I am using a defined sort order, such as, say, &sort=last_edit_desc, what happens if I choose a bounded or boosted search on, say, 100km,San Francisco? Does it turn off my sort selection and switch it to relevance instead? If not, what is the meaning of a search which contains both of those (or any sort option other than relevance)? (subscribed) Mathglot (talk) 05:47, 19 December 2024 (UTC)
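One way to check this empirically is to send the same boosted query once with the default order and once with an explicit sort, then compare the order of the results. The sketch below only builds the two request URLs. It assumes the standard MediaWiki search API (list=search with srsearch and srsort), assumes that the boosted search meant here is written with the boost-neartitle: keyword, and uses English Wikipedia and the extra search term "bridge" purely as placeholders.

```python
from urllib.parse import urlencode

# Assumption: any wiki running CirrusSearch with geo data would do;
# English Wikipedia is only a placeholder endpoint.
API = "https://en.wikipedia.org/w/api.php"


def search_url(query, sort=None):
    """Build a list=search API URL; srsort is left out when sort is None."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "format": "json",
    }
    if sort is not None:
        params["srsort"] = sort
    return API + "?" + urlencode(params)


# Assumption: the boosted geo search is expressed with boost-neartitle:,
# and "bridge" is a made-up search term for illustration.
boosted = 'boost-neartitle:"100km,San Francisco" bridge'

default_url = search_url(boosted)                         # default (relevance) order
sorted_url = search_url(boosted, sort="last_edit_desc")   # explicit sort

print(default_url)
print(sorted_url)
```

Fetching both URLs and comparing the returned titles would show whether the explicit sort is kept when a boost is present. The sketch itself does not answer that.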
How to search the fields of the File information template on Commons?
Nearly all of the over 110 million files on Commons use this standardized template, which specifies various useful metadata such as the date taken and the file description. How can one make use of this data and search it?
For example, how could one search the file description for a term like "Kathmandu" (as asked about by another user), the way one can search titles with intitle? description:"Kathmandu" does show some results, but I don't know what it does, and the results don't have that word in their description. I could not find info on this at mw:Help:CirrusSearch either. Info on how to search specific fields of c:Template:Information should be added here.
- One could also use this to infer categories (such as by reading the date field and then adding it to a category by date like "Videos of {year}") as proposed here.
- For example, I found that many files in deepcategory:"NASA videos from unidentified year" deepcategory:"Videos of 2020" have been miscategorized into Videos of 2020 (and thus should not be copied into "NASA videos in 2020" from there), although they have the correct date in the date field. That is why I'd like to use that field to correct this, as well as to copy the files to their year category in c:Category:Videos from NASA by year.
- This may also be needed for a date range filter, see phab:T329961. I'd like to search the date field, but there is no information on how to do that at Help:Searching, although I think it's already possible if I remember correctly.
- Another example: one could set subcats of c:Category:Media from scholarly journals depending on the link in the source field. For example, files with a URL starting with https://www.nature.com/ or a DOI that resolves to one should be in a respective subcat of c:Category:Media from Nature Publishing Group journals.
- One could also search the source field to put files into cats like c:Category:Audio files from Soundcloud.com and so on.
- Also how can one search for files from a specific uploader? (I'd like to check which of my video2commons uploads were imported below resolution at source.)
EBernhardson (WMF) said: "Unfortunately, the image description is simply an argument to a template. CirrusSearch doesn't do anything at that level and can't be that specific." I think the best workaround currently would be to use the insource search operator with the field name first; for example, I searched for insource:"|source=[https://soundcloud.com to identify files for c:Category:Audio files from Soundcloud.com. I think easily searching fields of the File pages' Information template could be enabled by:
- Developing some regex that searches for any content after e.g. |source=
- Creating some alias for it so that, instead of writing some complex regex query every time, one can simply enter e.g. info-source:"soundcloud.com"
Please comment what you think about this proposed way to make this possible and if you have any info on what would be needed for that. Would be great if somebody could develop such (a) regex(es) if there is no better way to search specific fields of the Information template. It's great that files have that structured metadata but it could be much more useful if it was searchable.
Previously asked here. Maybe c:Module:Information could be used for this somehow. rspective (talk) 16:39, 5 March 2025 (UTC)
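The two proposed steps can be sketched on the client side, without any change to CirrusSearch. The helper below expands the alias proposed above (info-source:"soundcloud.com", which is not an existing keyword) into a plain insource query. The function names, the set of escaped characters, and the exact regex shape are all assumptions made for illustration.

```python
# Characters given a backslash before being placed in the regex.
# Assumption: this set covers the operators of the regex dialect used by
# insource:/.../, plus "/" itself, which delimits the pattern.
SPECIAL = set('\\.?+*|{}[]()"#@&<>~/')


def escape_literal(text):
    """Backslash-escape every special character in a literal string."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def info_field_query(field, text):
    """Expand a hypothetical info-<field>:"text" alias into an insource query.

    The quoted insource: term narrows the candidates through the index.
    The regex then requires the text to appear after "|<field>=" and
    before the next pipe character.
    """
    regex = r"\| *" + escape_literal(field) + r" *=[^\|]*" + escape_literal(text)
    return 'insource:"{}" insource:/{}/'.format(text, regex)


print(info_field_query("source", "soundcloud.com"))
# insource:"soundcloud.com" insource:/\| *source *=[^\|]*soundcloud\.com/
```

This is only an approximation of searching one template field. A pipe inside the field value (for example in a wikilink or a nested template) ends the match early, the field name is matched case-sensitively, and a search text containing a double quote would break the quoted term.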
Regex search speed
In my experience bare regex searches seem to work even without any other terms. For example https://syl.wikipedia.org/w/index.php?go=Go&search=insource%3A%2F%5C%7B%5C%7BINTERWIKI%2F&title=%EA%A0%9B%EA%A0%A4%EA%A0%A1%EA%A0%A6%EA%A0%A1%3ASearch&ns0=1 completes quickly and returns 43 results. Why is that, and can the warning be removed? * Pppery * it has begun 14:24, 1 April 2025 (UTC)
- From my understanding, the regex search still scans the whole database unless the search area is narrowed with an index-based parameter or filter. The database for syl-wiki is simply not very large, so the regex search finishes before the timeout. However, the help page is for every possible MediaWiki installation, so any potential caveat has to be addressed. – Speravir (talk) – 23:47, 11 April 2025 (UTC)
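The narrowing mentioned in the reply can be illustrated with the query from the URL above. The sketch decodes the search parameter to show that it is a bare regex, then prefixes it with an index-based insource term. The helper name narrow and the choice of "INTERWIKI" as the indexed term are assumptions for illustration.

```python
from urllib.parse import quote_plus, unquote_plus

# The search parameter taken from the syl.wikipedia.org URL quoted above.
encoded = "insource%3A%2F%5C%7B%5C%7BINTERWIKI%2F"
bare = unquote_plus(encoded)
print(bare)  # insource:/\{\{INTERWIKI/


def narrow(regex_query, indexed_term):
    """Prefix a regex query with an index-based insource: term."""
    return 'insource:"{}" {}'.format(indexed_term, regex_query)


# Assumption: every page matching the regex also contains the word INTERWIKI,
# so the indexed term filters candidates without losing results.
narrowed = narrow(bare, "INTERWIKI")
print(narrowed)              # query text to type into the search box
print(quote_plus(narrowed))  # the same query encoded for a URL
```

On a small wiki both forms may finish within the timeout. The narrowed form is intended for larger wikis, where the regex would otherwise have to be run against every page.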
not displayable chars U+10FFFF
In the section "Substitutions for some metacharacters", in the columns "CirrusSearch" and "text", all characters explained as U+10FFFF are displayed as a default "pavement" char (the placeholder shown for characters that do not exist). Should this be adapted, or is it useless? Thanks. -- Christian 🇫🇷 FR (talk) 08:04, 7 April 2025 (UTC)
- You may see a "default pavement char"; I see a glyph which has the Unicode number (very small) imprinted. In general, you get a glyph from a font selected by your browser, if there is one; otherwise the behaviour apparently depends on the browser. At this very place it means that the character U+10FFFF is actually there, and you can copy and paste it. – Speravir (talk) – 00:05, 12 April 2025 (UTC)
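Whether a pasted character really is U+10FFFF can be checked independently of how the browser draws it. The following is a minimal sketch using only the Python standard library. The variable ch stands in for the character copied from the help page.

```python
import unicodedata

# Stand-in for the character copied from the help page.
ch = chr(0x10FFFF)  # the highest code point in Unicode

print("U+{:04X}".format(ord(ch)))  # U+10FFFF
print(unicodedata.category(ch))    # Cn (not an assigned character)
print(ch.encode("utf-8").hex())    # f48fbfbf
```

Because the code point is not an assigned character, fonts generally carry no real glyph for it, which is why each browser falls back to its own placeholder.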