Purpose: This UXPROD captures the need to handle sorting and diacritics. This is not a Swedish-specific problem but a more general one, surfaced by Theodor in the context of Swedish (Chalmers). In addressing this UXPROD, we need to look at the problem holistically and also review the related linked issues (see links).
Below are the details from the original bug (UISE-68). Lots of good discussion can also be found in that bug's comments:
Original Issue Summary: Codex search treats Swedish diacritics as ASCII equivalents
Overview: When conducting title level searches in Codex for titles containing Swedish diacritics (å,ä,ö) the search behaves as if those characters are reduced to their ASCII equivalents (a,o).
Steps to Reproduce:
- Create a couple of records in Inventory with titles containing words that start with a, å, ä or similar:
"Den äkta varan"
"Den åländska skärgården"
"The Åland archipelago"
"The Aland archipelago"
- Go to Codex and conduct a title search for åland
Expected Results: The title "The Åland archipelago" is shown.
(Another acceptable expected result is that "Den åländska skärgården" is also shown, since "åländska" is a form of "åland" that Swedish stemming algorithms might be able to catch.)
Actual Results: "The Aland archipelago" is returned together with the above and a few other items containing the string "aland". See attached image.
Additional Information: Will add these in separate issues.
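The behaviour described in the overview and reproduction steps can be illustrated with a minimal Python sketch. It assumes the search folds diacritics via Unicode decomposition before matching; this is an assumption made for illustration, not a description of how Codex is actually implemented. The helper names (fold_diacritics, search_folded, search_exact) are hypothetical.

```python
import unicodedata

# Titles from the reproduction steps in this issue.
TITLES = [
    "Den äkta varan",
    "Den åländska skärgården",
    "The Åland archipelago",
    "The Aland archipelago",
]


def fold_diacritics(text: str) -> str:
    """Casefold and strip combining marks, so 'Åland' becomes 'aland'."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_folded(query: str, titles: list[str]) -> list[str]:
    """Hypothetical search that folds diacritics (the reported behaviour)."""
    q = fold_diacritics(query)
    return [t for t in titles if q in fold_diacritics(t).split()]


def search_exact(query: str, titles: list[str]) -> list[str]:
    """Hypothetical search that keeps å, ä, ö distinct from a, o."""
    q = unicodedata.normalize("NFC", query).casefold()
    return [
        t for t in titles
        if q in unicodedata.normalize("NFC", t).casefold().split()
    ]


if __name__ == "__main__":
    print(search_folded("åland", TITLES))
    print(search_exact("åland", TITLES))
```

With folding, a search for åland matches both "The Åland archipelago" and "The Aland archipelago"; without folding it matches only the former. Neither variant performs stemming, so "Den åländska skärgården" is not matched by either.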
This particular issue might be solved by changing the collation on the relevant tables in Postgres to Swedish (see https://www.postgresql.org/docs/9.1/static/collation.html), but I believe that it is related to a bigger discussion on search technology.
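To make concrete what a Swedish collation changes for sorting, here is a rough Python sketch of a Swedish-style sort key in which å, ä and ö are separate letters placed after z. It is a hand-rolled approximation for illustration only, not how Postgres or the operating system implements collation; the names SWEDISH_ALPHABET and swedish_sort_key are hypothetical, and characters outside the listed alphabet are simply sorted first.

```python
import unicodedata

# Assumption: simplified Swedish alphabet, with å, ä, ö after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_sort_key(text: str) -> list[int]:
    """Map each character to its rank in the simplified Swedish alphabet.

    Characters not in the alphabet (spaces, punctuation, other accented
    letters) get rank -1 and therefore sort before 'a'. A real collation
    handles these cases far more carefully.
    """
    normalized = unicodedata.normalize("NFC", text).casefold()
    return [RANK.get(ch, -1) for ch in normalized]


if __name__ == "__main__":
    words = ["Öland", "Åland", "Zorn", "Aland", "Äpple"]
    print(sorted(words))                        # plain code point order
    print(sorted(words, key=swedish_sort_key))  # simplified Swedish order
```

Plain code point ordering puts "Äpple" before "Åland", whereas the Swedish-style key orders them Åland, Äpple, Öland after all words starting with a to z.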