  1. eZ Publish / Platform
  2. EZP-24377

Solr multicore: penalize matches in secondary languages

    Details

    • Sprint:
      Pollux Platform S13, Pollux Platform S14

      Description

      The Solr multicore implementation uses field filters for language configuration, based on a list of prioritized languages. A query can match a Content item in the primary (most-prioritized) language, in one of the fallback languages, or through the always-available fallback.

      Since each translation is indexed in a separate core, relevancy statistics for a language are kept in that language's core. Because scores are computed from these per-core statistics, a match returned in a secondary language can be scored above matches in the primary language. Matches in secondary languages should be penalized with a negative scoring factor, so that they are scored below matches in the primary language.

      The penalization factor should be configurable.

        Issue Links

          Activity

          Paul Borgermans (Inactive) added a comment (edited)

          The filter mechanism cannot be used to penalize, since filters do not affect scoring. Boosting should therefore be the target mechanism.

          The available boosting mechanisms depend on the query type being used, which can be sent to the Solr backend as a parameter with every request or set as a default in solrconfig.xml (the main configuration file for Solr):

          1. "standard": this is raw Lucene. There are no additional boost options, so everything related to boosting has to be part of the query.
          2. "edismax", as used in eZ Find: much more versatile, with dedicated boost parameters.
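
As a rough illustration of the two options, the parser can be chosen per request through Solr's `defType` parameter (`lucene` selects the standard parser). The sketch below only builds query strings; the field name `title` and the search term are placeholders and do not come from this issue.

```python
from urllib.parse import urlencode


def build_params(query, parser):
    """Return URL-encoded Solr request parameters for the given query parser.

    parser: "lucene" (the standard parser) or "edismax".
    """
    return urlencode({"q": query, "defType": parser})


# Standard parser: any boost has to be written into the query itself.
standard = build_params("title:solr^2", "lucene")

# edismax: the query stays plain; boosts go into dedicated parameters.
edismax = build_params("solr", "edismax")

print(standard)
print(edismax)
```

With the standard parser the boost (`^2`) travels inside `q`, while with edismax the query text stays free of boost syntax.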

          For language-dependent boosting, the easiest way is to use the bq parameter on the language meta field. Where eZ Find used a (too) simple penalizing scheme, a better alternative is a reciprocal function of the position in the ordered list of siteaccess languages.

          So if the ordered list of languages used is:

          [0] => eng-GB
          [1] => nor-NO
          [2] => ger-DE
          [3] => fre-FR
          ...
          The formula 1 + a / (m*x + b), with x the array key and (a, m, b) tunable parameters, provides a good way to boost or penalize the prioritized languages.

          For example, with parameters (a, m, b) = (4, 3, 1), the boost values become:

          Array key   Boost value
          0           5.00
          1           2.00
          2           1.57
          3           1.40
          4           1.31
          5           1.25
          6           1.21
          7           1.18
          8           1.16
          9           1.14
          10          1.13

          ...
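
The boost values above can be reproduced with a few lines of code. The following is a minimal sketch, not the actual implementation: it assumes consecutive positions 0, 1, 2, ... in the language list, and the field name `language_code` is a hypothetical stand-in for the language meta field. Each generated clause uses the usual `field:value^boost` syntax accepted by the bq parameter.

```python
def language_boost(position, a=4.0, m=3.0, b=1.0):
    """Boost for the language at `position` in the prioritized list.

    Implements 1 + a / (m*x + b), with x the position (array key).
    """
    return 1.0 + a / (m * position + b)


def build_bq(languages, field="language_code", a=4.0, m=3.0, b=1.0):
    """Build one bq clause per language, most-prioritized first.

    `field` is a hypothetical name for the language meta field.
    """
    return [
        f'{field}:"{code}"^{language_boost(i, a, m, b):.2f}'
        for i, code in enumerate(languages)
    ]


languages = ["eng-GB", "nor-NO", "ger-DE", "fre-FR"]
for clause in build_bq(languages):
    print(clause)
# language_code:"eng-GB"^5.00
# language_code:"nor-NO"^2.00
# language_code:"ger-DE"^1.57
# language_code:"fre-FR"^1.40
```

Keeping (a, m, b) as arguments leaves the penalization tunable, in line with the requirement that the factor be configurable.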


            People

            • Assignee:
              Unassigned
              Reporter:
              Petar Spanja (Inactive)

                Time Tracking

                Estimated: 2d
                Remaining: 2d
                Logged: Not Specified