  1. eZ Publish / Platform
  2. EZP-8997

Some words with accented chars not searchable - mysql collation problem

Details

    • Type: Bug
    • Resolution: Obsolete
    • Priority: Medium
    • None
    • 3.6.2
    • Misc
    • None
    • Version: 3.8.3 (updated from 3.6.2)
      PHP Version: 4.4
      Webserver: apache2
      Database: mysql5

    Description

      I discovered a very strange problem on a site with Norwegian content.

      Most words with accented characters were searchable, but I noticed that "kåfjord" gave 0 hits although it was used in hundreds of places. After the initial theory that the system just hates some parts of Norway was cast aside, I looked at the SQL debug output. Long story short, this is what happens when "kåfjord" is searched for:

      1. A select is done on ezsearch_word for "kåfjord". In this case it returns TWO rows, because the table uses the utf8_general_ci collation and the word "kafjord" (without the accent) is also present in the system.

      2. A select is done for objects containing "kåfjord".

      3. A select is done for objects containing "kafjord" AND JOINED WITH THE RESULT OF THE PREVIOUS SELECT.

      As a result, the search returns only those objects that contain both versions of the word.
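
The three steps above can be modelled roughly as follows. This is only a sketch in Python, not the actual eZ Publish code: the `INDEX` dictionary, the object ids and the `fold` helper are hypothetical, and `fold` merely approximates how an accent-insensitive collation treats "å" and "a" as equal.

```python
import unicodedata


def fold(word):
    """Approximate an accent- and case-insensitive comparison key."""
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


# Hypothetical word index: exact word -> ids of objects containing it.
INDEX = {
    "kåfjord": {1, 2, 3},
    "kafjord": {3, 4},
}


def matching_words(term):
    """Step 1: the word lookup matches every word that folds to the same key."""
    return [w for w in INDEX if fold(w) == fold(term)]


def search_intersect(term):
    """Steps 2 and 3: each matched word narrows the previous result."""
    result = None
    for word in matching_words(term):
        result = INDEX[word] if result is None else result & INDEX[word]
    return result or set()


print(matching_words("kåfjord"))    # both spellings match
print(search_intersect("kåfjord"))  # only objects containing both spellings
```

In this toy data only object 3 contains both spellings, so it is the only hit. If no object contains both, the result is empty, which matches the 0 hits seen for "kåfjord".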

      So the problem came from using the wrong collation, but I think this should be checked at install time, with the user warned or offered a better collation such as utf8_bin. The behaviour of eZ was also wrong in any case: if only one word is searched for, hits for both versions should be shown (the union), not just the common part (the intersection).
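
Both suggestions can be sketched in a few lines of Python. This is an illustration under assumptions, not eZ Publish code: the function names are hypothetical, and the install-time check relies only on the MySQL naming convention that binary collations end in `_bin`.

```python
def collation_warning(collation):
    """Return a warning string for non-binary collations, else None.

    Assumption: a collation whose name ends in '_bin' compares words
    byte by byte, so "kåfjord" and "kafjord" stay distinct.
    """
    if collation.lower().endswith("_bin"):
        return None
    return (
        f"Collation {collation!r} may treat accented and unaccented "
        "words as equal; consider utf8_bin for the search tables."
    )


def search_union(index, words):
    """Combine the hits of all word variants instead of intersecting them."""
    hits = set()
    for word in words:
        hits |= index.get(word, set())
    return hits


# Hypothetical index: exact word -> ids of objects containing it.
index = {"kåfjord": {1, 2, 3}, "kafjord": {3, 4}}

print(collation_warning("utf8_general_ci"))
print(search_union(index, ["kåfjord", "kafjord"]))
```

With the union, a search for one word returns every object that contains either spelling, which is the behaviour asked for above when the collation cannot be changed.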

          People

            unknown
            zurgutt