Uploaded image for project: 'eZ Publish / Platform'
  1. eZ Publish / Platform
  2. EZP-16340

DB Deadlocks on ezcontentobject_tree when updating subnode

    Details

      Description

      From customer:
      -----------------------------------------------
      We are currently encountering a high number of db deadlocks resulting in fatal errors, usually when publishing content. This appears to be due to the update of the subnode modified time in the ezcontentobject_tree table.

      During publishing this sometimes fails to gain a lock and results in a fatal error which is very disconcerting for editors within the system.

      Is there a way to optimize this query using an IN statement instead of multiple node_id= in the WHERE clause? My explain query showed that the current statement is affecting all table rows, whereas the IN would affect only a handful.

      This is becoming a real issue from a scalability perspective, is there any other mysql settings or changes we can make to avoid or remedy this deadlock situation?
      -----------------------------------------------

      Attached a patch that changes multiple "OR node_id = " to "node_id IN()" using eZDB::generateSQLINStatement().

      1. ezcontentobjecttreenode.php-issue-6647.diff
        1 kB
        (inactive) Gunnstein Lye
      2. issue-16340-modified_subnode-master.diff
        3 kB
        (inactive) Gunnstein Lye

        Issue Links

          Activity

          Hide
          Bertrand Dunogier added a comment -

          As highlighted earlier by Gaetano, another possible track would be to limit the transaction that includes this query. A query that affects so many rows should NOT be part of such a huge transaction. Either this particular query could be moved outside the transaction (after, I'd say), or other queries can be moved out.

          The issue with delayed execution (e.g. asynchronous) of this query is that it affects clearing of subtree based cache blocks, and will delay their update. This is by itself not an issue, but ViewCaching would STILL be cleared during publishing, and this would lead to inconsistencies as ViewCache files could be updated with outdated cache blocks.

          If such a workaround improves things in an order of magnitude that resolves the current issues, eZ Engineering is thinking on the larger scales: what do we use this field for exactly; can we use something else instead ? if we need this field, what is the true purpose, and how can we make the feature behind it more efficient.

          I must also add that proper MySQL tuning will improve the situation; it won't remove the LOCK timeout issue, but will make it less likely to happen, as queries will execute faster.

          To conclude, about raising the wait timeout: yes, this is a valid workaround, but it has one major defect that makes it a non option from a product point of vue: it degrades the editor's experience. Publishing MUST be either fast or asynchronous.

          Show
          Bertrand Dunogier added a comment - As highlighted earlier by Gaetano, another possible track would be to limit the transaction that includes this query. A query that affects so many rows should NOT be part of such a huge transaction. Either this particular query could be moved outside the transaction (after, I'd say), or other queries can be moved out. The issue with delayed execution (e.g. asynchronous) of this query is that it affects clearing of subtree based cache blocks, and will delay their update. This is by itself not an issue, but ViewCaching would STILL be cleared during publishing, and this would lead to inconsistencies as ViewCache files could be updated with outdated cache blocks. If such a workaround improves things in an order of magnitude that resolves the current issues, eZ Engineering is thinking on the larger scales: what do we use this field for exactly; can we use something else instead ? if we need this field, what is the true purpose, and how can we make the feature behind it more efficient. I must also add that proper MySQL tuning will improve the situation; it won't remove the LOCK timeout issue, but will make it less likely to happen, as queries will execute faster. To conclude, about raising the wait timeout: yes, this is a valid workaround, but it has one major defect that makes it a non option from a product point of vue: it degrades the editor's experience. Publishing MUST be either fast or asynchronous.
          Hide
          Gaetano Giunta added a comment -

          http://dev.mysql.com/doc/refman/5.1/en/replication-sbr-rbr.html

          It says that when using replication, SBR mode needs more locking than RBR mode. Might be worth investigating...

          Show
          Gaetano Giunta added a comment - http://dev.mysql.com/doc/refman/5.1/en/replication-sbr-rbr.html It says that when using replication, SBR mode needs more locking than RBR mode. Might be worth investigating...
          Hide
          Bertrand Dunogier added a comment -

          This situation is now fixed by using the asynchronous publishing system introduced in eZ Publish 4.5:
          https://github.com/ezsystems/ezpublish/blob/master/doc/features/4.5/ezasynchronouspublisher.txt

          Show
          Bertrand Dunogier added a comment - This situation is now fixed by using the asynchronous publishing system introduced in eZ Publish 4.5: https://github.com/ezsystems/ezpublish/blob/master/doc/features/4.5/ezasynchronouspublisher.txt
          Hide
          Philipp Kamps added a comment -

          I don't think it's a problem of a single sql query. The entire publishing process (ezp core php code) is wrapped into a DB transaction. That transaction includes ez publishing pre/post trigger handling. So let's say you have a custom publishing post trigger which takes some time to execute, it will become more likely that you run into the "Lock wait timeout exceeded" error. The default setting for inno DB tables is 50 seconds.

          Show
          Philipp Kamps added a comment - I don't think it's a problem of a single sql query. The entire publishing process (ezp core php code) is wrapped into a DB transaction. That transaction includes ez publishing pre/post trigger handling. So let's say you have a custom publishing post trigger which takes some time to execute, it will become more likely that you run into the "Lock wait timeout exceeded" error. The default setting for inno DB tables is 50 seconds.
          Hide
          Geoff Bentley added a comment -

          Async publishing doesn't fix the problem - but it does delay it.

          Show
          Geoff Bentley added a comment - Async publishing doesn't fix the problem - but it does delay it.

            People

            • Assignee:
              Patrick Allaert
              Reporter:
              (inactive) Gunnstein Lye
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: