Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: High High
    • Resolution: Fixed
    • Affects Version/s: 5.3.7, 5.4.4.3
    • Fix Version/s: Customer request, 5.4.5, 2015.12
    • Component/s: Caching
    • Labels:
      None

      Description

      It seems that within the default VCL we provide for Varnish 4 and 3, when banning a specific page, the regexp is not good and should cause issue regarding cache.
      See Using Varnish.

      https://github.com/ezsystems/ezpublish-community/blob/master/doc/varnish/vcl/varnish4.vcl#L116

      The regex uses "~" and this means "contains", so if we modify location 123, it will also remove cache for location 1234, 1235, 1236 and so on. This must be changed to prevent such behavior.

      Steps to Reproduce

      1. Create an article and make sure it has a node_id similar to "142"
      2. Create more articles, and make sure one of them has a node_id similar to "142x"
      3. Open the article with the node_id "142x" and make sure you get a "HIT" on the header
      4. Edit the article with "142" and open the article with "142x" and confirm you get a "MISS" in the header

        Activity

        Hide
        Joao Inacio (Inactive) added a comment -

        Was this made possible by the previously bad regex? Because now if we're saying that the X-Location-Id header has to start with the node ID in question, that would no longer match a comma-separated list.

        To be honest I don't recall ever using comma-separated list; the fix here is to make sure the request header specifies the IDs exactly, separated by pipes (eg: ^(2|20)$) and not match anything that contains those characters (such as 2|12|22|120|200 ... ).

        Varnish vcl was not updated, issuing a ban for documents whose X-Location-Id header contains the values passed:

        ban("obj.http.X-Location-Id ~ " + req.http.X-Location-Id);
        

        Show
        Joao Inacio (Inactive) added a comment - Was this made possible by the previously bad regex? Because now if we're saying that the X-Location-Id header has to start with the node ID in question, that would no longer match a comma-separated list. To be honest I don't recall ever using comma-separated list; the fix here is to make sure the request header specifies the IDs exactly, separated by pipes (eg: ^(2|20)$ ) and not match anything that contains those characters (such as 2|12|22|120|200 ... ). Varnish vcl was not updated, issuing a ban for documents whose X-Location-Id header contains the values passed: ban("obj.http.X-Location-Id ~ " + req.http.X-Location-Id);
        Hide
        Peter Keung added a comment -

        The general instruction for adding multiple node IDs to a response was "separate them by commas" (ref: http://share.ez.no/forums/ez-publish-5-platform/connect-controller-cache-with-several-locations-ids)

        Was this made possible by the previously bad regex? Because now if we're saying that the X-Location-Id header has to start with the node ID in question, that would no longer match a comma-separated list.

        Show
        Peter Keung added a comment - The general instruction for adding multiple node IDs to a response was "separate them by commas" (ref: http://share.ez.no/forums/ez-publish-5-platform/connect-controller-cache-with-several-locations-ids ) Was this made possible by the previously bad regex? Because now if we're saying that the X-Location-Id header has to start with the node ID in question, that would no longer match a comma-separated list.
        Hide
        Paulo Nunes (Inactive) added a comment -

        QA Approved

        Show
        Paulo Nunes (Inactive) added a comment - QA Approved
        Show
        Joao Inacio (Inactive) added a comment - Merged in ezpublish-kernel/master: https://github.com/ezsystems/ezpublish-kernel/commit/dbeefb65f9e8497bf1e59770b1eb0c66f0da1454
        Show
        Joao Inacio (Inactive) added a comment - PR: https://github.com/ezsystems/ezpublish-kernel/pull/1427
        Hide
        Ivo Lukač added a comment -

        I can confirm this. It might not be very severe for ids with 3, 4 or more digits, but when the root node is changed with location id = 2, it will ban pages/blocks that are tagged with location id which has the number 2 in it. And that is on large site A LOT.

        It could be fixed by having a more precise regexp here:
        https://github.com/ezsystems/ezpublish-kernel/blob/master/eZ/Publish/Core/MVC/Symfony/Cache/Http/FOSPurgeClient.php#L49

        Show
        Ivo Lukač added a comment - I can confirm this. It might not be very severe for ids with 3, 4 or more digits, but when the root node is changed with location id = 2, it will ban pages/blocks that are tagged with location id which has the number 2 in it. And that is on large site A LOT. It could be fixed by having a more precise regexp here: https://github.com/ezsystems/ezpublish-kernel/blob/master/eZ/Publish/Core/MVC/Symfony/Cache/Http/FOSPurgeClient.php#L49
        Hide
        Eduardo Fernandes (Inactive) added a comment - - edited

        At first, there was a suggestion of using "==" instead "~" in the regex,

        The comparator to use should be "==" (please see https://gist.github.com/dimsemenov/10100415#file-vcl-regex-cheat-sheet-L10)

        But there were some other considerations that invalidated the first suggestion

        Simply changing it from "~" to "==" might not be the way to go. Yes, it would fix the issue of a match for "123" also matching "1234", but it would also prevent it from matching "87,123", when you have multiple X-Location-IDs on a single document, and need the document purged because of it.
        My initial thought is that you'd want to keep using the "~" operator, but improve the regexp, so you more accurately target what needs to be purged.
        Something along the lines of this:

        [^,] req.http.X-Location-Id[$,]
        

        The idea being to match the X-Location-Id, but only if it's either at the start of the value, or there's a comma in front of it, and also only if it's either at the end of the value, or there's a comma behind it. That way, you'd both match "87,123" (comma in front, end of value behind), and also not match "1234" (start of value in front, but neither comma nor end of value behind).
        There are multiple regexp dialects, so please take the exact regexp I suggested with a grain of salt, this should probably be tested properly before updating the default vcl.

        And

        This seems like a better regexp to use: (^|,)req.http.X-Location-Id(,|$)

        Show
        Eduardo Fernandes (Inactive) added a comment - - edited At first, there was a suggestion of using " == " instead " ~ " in the regex, The comparator to use should be "==" (please see https://gist.github.com/dimsemenov/10100415#file-vcl-regex-cheat-sheet-L10 ) But there were some other considerations that invalidated the first suggestion Simply changing it from "~" to "==" might not be the way to go. Yes, it would fix the issue of a match for "123" also matching "1234", but it would also prevent it from matching "87,123", when you have multiple X-Location-IDs on a single document, and need the document purged because of it. My initial thought is that you'd want to keep using the "~" operator, but improve the regexp, so you more accurately target what needs to be purged. Something along the lines of this: [^,] req.http.X-Location-Id[$,] The idea being to match the X-Location-Id, but only if it's either at the start of the value, or there's a comma in front of it, and also only if it's either at the end of the value, or there's a comma behind it. That way, you'd both match "87,123" (comma in front, end of value behind), and also not match "1234" (start of value in front, but neither comma nor end of value behind). There are multiple regexp dialects, so please take the exact regexp I suggested with a grain of salt, this should probably be tested properly before updating the default vcl. And This seems like a better regexp to use: (^|,)req.http.X-Location-Id(,|$)

          People

          • Assignee:
            Unassigned
            Reporter:
            Eduardo Fernandes (Inactive)
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: