eZ Publish / Platform / EZP-2154

search-table / strips off special characters

Details

    • Type: Improvement
    • Resolution: Obsolete
    • Priority: Low
    • Misc
    • Version: 3.x

    Description

      Hi,

      I'm using the UTF-8 charset and I'm a bit unhappy with the eZSearchEngine::normalizeText method.

      The first line of the method is
      $text =& strToLower( $text );

      This breaks up multi-byte UTF-8 characters, so one of the following lines,
      $unicodeValueArray =& $codec->convertString( $text );

      won't give a reasonable result.

      Example:

      The German word "für" (a preposition) is converted to "f" after running through normalizeText, so the character "r" is stripped off as well.

      "Österreich" (= "Austria" in English) -> "terreich"
      "Schönfärberei" (= "garment dyeing" in English) -> "schfberei"

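The three results above are consistent with a byte-level explanation, sketched here under stated assumptions. In UTF-8, "ü", "Ö", "ö" and "ä" all start with the byte 0xC3. Where strToLower follows a Latin-1 locale, 0xC3 ("Ã") is lowercased to 0xE3 ("ã"), and 0xE3 announces a three-byte UTF-8 sequence. A decoder that trusts that length and discards malformed sequences then swallows the next ASCII letter as well. Both helpers below (`latin1ByteLower`, `lenientUtf8Filter`) are hypothetical stand-ins written for current PHP; they are not eZ Publish code, and the real codec may behave differently.

```php
<?php
// Illustration only: hypothetical helpers, not taken from eZ Publish.

// Lowercase byte by byte the way a Latin-1 (ISO-8859-1) locale would:
// A-Z and 0xC0-0xDE (except 0xD7, the multiplication sign) gain 0x20.
function latin1ByteLower(string $bytes): string
{
    $out = '';
    for ($i = 0; $i < strlen($bytes); $i++) {
        $b = ord($bytes[$i]);
        $isAsciiUpper  = $b >= 0x41 && $b <= 0x5A;
        $isLatin1Upper = $b >= 0xC0 && $b <= 0xDE && $b !== 0xD7;
        $out .= chr(($isAsciiUpper || $isLatin1Upper) ? $b + 0x20 : $b);
    }
    return $out;
}

// Assumed lenient decoder: takes the sequence length from the lead byte,
// then drops the whole sequence if it is not valid UTF-8.
function lenientUtf8Filter(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; ) {
        $b = ord($bytes[$i]);
        if ($b < 0x80)      { $n = 1; }
        elseif ($b >= 0xF0) { $n = 4; }
        elseif ($b >= 0xE0) { $n = 3; }
        elseif ($b >= 0xC0) { $n = 2; }
        else                { $n = 1; } // stray continuation byte
        $seq = substr($bytes, $i, $n);
        // preg_match with the /u modifier fails on malformed UTF-8 input.
        if (strlen($seq) === $n && preg_match('//u', $seq) === 1) {
            $out .= $seq;
        }
        $i += $n;
    }
    return $out;
}

// The words are written with byte escapes so the file encoding does not matter.
$words = ["f\xC3\xBCr", "\xC3\x96sterreich", "Sch\xC3\xB6nf\xC3\xA4rberei"];
foreach ($words as $word) {
    echo lenientUtf8Filter(latin1ByteLower($word)), "\n";
}
// prints "f", "terreich" and "schfberei", one per line
```
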
      At the moment this isn't a big problem, because the user's search string also runs through the same method. So someone who searches for "Österreich" will still find content that contains "Österreich".

      But you also offer a wildcard search, and I think this could be a problem there, because the search may return too much (or unexpected) content.
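
One possible direction, assuming the mbstring extension is installed and the text really is UTF-8: lowercase with an encoding-aware function so that multi-byte sequences stay intact. The helper name `lowercaseUtf8` is hypothetical, and this ticket does not say whether or how normalizeText was changed, so read this as a sketch for current PHP only.

```php
<?php
// Hypothetical helper, not part of eZ Publish.
// Assumes the mbstring extension is loaded and $text is valid UTF-8.
function lowercaseUtf8(string $text): string
{
    // mb_strtolower works on characters of the given encoding, not on
    // single bytes, so "Ö" (0xC3 0x96) becomes "ö" (0xC3 0xB6).
    return mb_strtolower($text, 'UTF-8');
}

// Byte escapes keep the example independent of the file encoding.
echo lowercaseUtf8("\xC3\x96sterreich"), "\n"; // prints "österreich" on a UTF-8 terminal
```

With this kind of lowercasing, the indexed form of "Österreich" keeps all of its letters, so a wildcard pattern is matched against the full word rather than against "terreich".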

      Kind regards,
      Emil.

          People

            Assignee: unknown
            Reporter: emil.webber