eZ Publish / Platform
EZP-20563

Avoid database overload when fetching a node by remote_id

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Medium
    • Resolution: Fixed
    • Affects Version/s: 4.7.0, 5.0
    • Fix Version/s: Customer request, 5.1
    • Component/s: Database related
    • Labels:
      None

      Description

      Using the fetch operator with the parameter hash( 'remote_id', $some_param ) causes very slow queries on a large system: a single query can take 10 seconds or more when there are 1M+ objects in the database.

      Creating an index on ezcontentobject_tree.remote_id eliminates the problem and brings database usage back down from overloaded to running cool.
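
      For reference, the remote_id fetch boils down to a lookup of roughly this shape, and the fix is a plain secondary index (the index name below is illustrative; the official statement ships with 5.1, see the activity below):

          -- Without an index on remote_id, this is a full scan of ezcontentobject_tree:
          SELECT * FROM ezcontentobject_tree WHERE remote_id = 'abc123';

          -- A secondary index turns the lookup into a b-tree search:
          CREATE INDEX ezcontentobject_tree_remote_id ON ezcontentobject_tree ( remote_id );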

        Activity

        Gilles Ballini added a comment -

        Hi.

        We have the same issue on a project for which we have eZ Enterprise support.

        I have sent a ticket through our eZ Support Service interface.
        Hopefully this helps get it fixed soon.

        Philipp Kamps added a comment -

        I should check Jira before I try any new fetches. I had the same problem, crashing a high-traffic website...

        Peter Keung added a comment - edited

        Here's an answer from someone at eZ:

        The issue has already been created as an improvement. It is not a bug, since
        the fetch operator has so many functions that trying to foresee all use cases
        is simply not possible. Performance bottlenecks such as yours depend on the
        specific usage, case by case, and trying to make a "generic" optimization at
        design time is just not a good idea; actually, I think it's an anti-pattern
        (IMHO).

        So, enter MySQL's EXPLAIN. If a specific fetch operator (the remote_id one in
        your case) overloads the DB, you need to optimize the installation; as long
        as the kernel and DB schema are not modified, you can customize it to
        increase performance. In this case an improvement is already filed, because
        the optimization seems common enough to be useful to several users/clients.
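
        For example, running EXPLAIN on the underlying lookup shows whether it scans
        the whole table or hits an index (the output values in the comments are
        illustrative):

            EXPLAIN SELECT node_id FROM ezcontentobject_tree WHERE remote_id = 'abc123';
            -- Without the index: type=ALL, rows in the millions (full table scan).
            -- With the index:    type=ref, key=ezcontentobject_tree_remote_id, rows=1.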

        The consequences for your system should be none, apart from the big
        performance improvement. That is, assuming your DB server can cope with the
        index: it is loaded into RAM and can grow to hundreds of MB in size. So make
        sure the server has enough RAM, or MySQL will start thrash-swapping while
        trying to keep all that info in memory.

        A generic index would be created over the full 100-character length of the
        column, but since your DB is very large, you can tailor the index to the
        actual contents of your remote_id field, following this rule: index only the
        first N characters, where N depends on your data. Double N and you double the
        memory but halve the search time. Please refer to the MySQL docs on creating
        an index with an (n) prefix length.
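
        A minimal sketch of such a prefix index (N=32 is an arbitrary example; the
        name matches the illustrative one above), together with a query to gauge how
        selective a given prefix actually is:

            -- How selective is a 32-character prefix? A ratio close to 1.0 is good:
            SELECT COUNT(DISTINCT LEFT(remote_id, 32)) / COUNT(*) FROM ezcontentobject_tree;

            -- Index only the first 32 characters (use instead of, not alongside, the full index):
            CREATE INDEX ezcontentobject_tree_remote_id ON ezcontentobject_tree ( remote_id(32) );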

        You will not get a performance hit on updates; InnoDB's B-tree indexing in
        MySQL takes care of that. If over time you find that updates are getting
        slower, running ANALYZE or OPTIMIZE on the table is enough to rebuild the
        index. Even with massive updates, running that operation monthly should be
        enough (if you ever need it at all). Beware that these commands lock the
        tables; please check the MySQL documentation if you intend to use them.
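
        Concretely, each is a single statement (and each takes a table lock, as
        noted):

            ANALYZE TABLE ezcontentobject_tree;   -- refresh index statistics; fast
            OPTIMIZE TABLE ezcontentobject_tree;  -- rebuild the table and its indexes; slower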

        I recommend that you create the generic index on a staging setup, check how
        much RAM it eats on the server, and check how fast the queries run. Then you
        can fine-tune: set the index length to 50 (or increase server RAM) and repeat
        until you reach a balance that works.

        Philipp Kamps added a comment -

        Looks like eZ has somebody with great MySQL knowledge. Thanks for the details.

        I'm happy to call it an improvement, but I think eZ Systems should make sure that the official fetch functions

        http://doc.ez.no/eZ-Publish/Technical-manual/4.x/Reference/Modules/content/Fetch-functions/node

        cannot crash a website.

        Also, if I understand it right, the only downside of adding the index is that it increases the total index size and therefore the amount of RAM the MySQL server needs. I'm not sure how to calculate the index size, so I tried it on an ezcontentobject_tree table with 250,000 entries: the index size grew from 68.3 MiB to 78.8 MiB. So in my case I would need an extra 10.5 MiB of RAM (that's what my phpMyAdmin shows me; I hope relying on it isn't a bad idea).

        I don't think anybody would run a bigger site on a MySQL server with less than 4 GiB of RAM, and 10 MiB out of 4 GiB is about 0.25%. I'm willing to take the chance and add the index.

        Could I ask you to reconsider adding the index to eZ Publish? It's very unlikely to cause an issue through RAM consumption (if my calculation is correct), and there are already three people who have run into serious issues without it.
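
        For what it's worth, the figure phpMyAdmin shows can also be read straight
        from MySQL, which makes the before/after comparison easy to script
        (Index_length is reported in bytes):

            -- Total size of all indexes on the table, in bytes (Index_length column):
            SHOW TABLE STATUS LIKE 'ezcontentobject_tree';

            -- Or via information_schema, converted to MiB:
            SELECT INDEX_LENGTH / 1024 / 1024 AS index_mib
              FROM information_schema.TABLES
             WHERE TABLE_SCHEMA = DATABASE()
               AND TABLE_NAME = 'ezcontentobject_tree';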

        Philipp Kamps added a comment -

        I think the index was added in ezp 5.1.

        André Rømcke added a comment -

        Fixed in 5.1: https://github.com/ezsystems/ezpublish-legacy/blob/master/update/database/mysql/5.1/dbupdate-5.0.0-to-5.1.0.sql#L6

          People

          • Assignee:
            Unassigned
          • Reporter:
            Paulo Bras (Inactive)
          • Votes:
            1
          • Watchers:
            6
