Sunday, June 30, 2013

Alfresco Two-Phase commit limitation - "Does the document really exist?"

Alfresco has a limitation: it offers no global transaction support across multiple database schemas (the limitation dates back to at least 4.1.2, possibly earlier).

The problem arises when Alfresco stores your document repository while a separate application database manages the application: Alfresco does not support a two-phase commit across those two database schemas. Here are two simple examples of operations you would want treated as "all or nothing" in a single transaction:

Example 1:
  • Create a document in Alfresco 
  • Insert row into non-Alfresco database schema 

Example 2:
  • Delete a document from Alfresco
  • Delete row from non-Alfresco database schema

A few questions come to mind:
  • Q1: What if I create a document in Alfresco, but the insert into the non-Alfresco database fails? 
  • A: You will have an orphan Alfresco document that is not associated with any row in the non-Alfresco database.
    - In essence, the application will have no awareness of the document. In practice, the application needs to report an error on the first attempt and then try to ingest the document again.

  • Q2: What if I insert into the non-Alfresco database, but the Alfresco document creation fails?
  • A: This scenario won't happen, because we decided to always handle Alfresco operations first, with the application operations last.
    - In essence, the application will return an error right away if it can't ingest the document into Alfresco, so the non-Alfresco database insert won't be reached.

  • Q3: What if I delete a document from Alfresco, but the delete from the non-Alfresco database fails?
  • A: You will have an orphan row in the non-Alfresco database that is not associated with any Alfresco document.
    - In essence, the non-Alfresco database will still know about the document, but there will be no actual document for the application to retrieve. In practice, the application needs to report an error on the first attempt and then try to delete the document again.

  • Q4: What if I delete from the non-Alfresco database, but the Alfresco document deletion fails?
  • A: This scenario won't happen, because we decided to always handle Alfresco operations first, with the application operations last.
    - In essence, the application will return an error right away if it can't delete the document from Alfresco, so the non-Alfresco database delete won't be reached.

  • Q5: What if I search for the document from the application, but the document doesn't exist in Alfresco?
  • A: The answer is that the application should not find the document!
    - This scenario can only occur as a side effect of Q3. In short, for scenarios Q1 and Q3, if the application receives an error, it must correct the situation by repeating the call until it succeeds.

    If the application doesn't correct the situation, all is not lost. The workaround for the limitation is the last line of defense.
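
The "Alfresco first" ordering from Q1 through Q4, together with the retry from Q5, can be sketched roughly as follows. This is a minimal Python sketch and not Alfresco's API: `StoreError`, `InMemoryAlfresco`, `InMemoryAppDb`, `ingest`, and `remove` are hypothetical names made up for illustration. It also assumes that both operations are safe to repeat (idempotent), which a real integration would have to guarantee on its own.

```python
class StoreError(Exception):
    """Raised by either store when an operation fails (hypothetical)."""


class InMemoryAlfresco:
    """Hypothetical stand-in for the Alfresco repository."""

    def __init__(self):
        self.docs = {}

    def create(self, doc_id, content):
        self.docs[doc_id] = content

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)  # deleting a missing document is a no-op


class InMemoryAppDb:
    """Hypothetical stand-in for the non-Alfresco schema."""

    def __init__(self, failures=0):
        self.rows = set()
        self.failures = failures  # how many upcoming calls should fail

    def _maybe_fail(self):
        if self.failures > 0:
            self.failures -= 1
            raise StoreError("simulated database failure")

    def insert(self, doc_id):
        self._maybe_fail()
        self.rows.add(doc_id)

    def delete(self, doc_id):
        self._maybe_fail()
        self.rows.discard(doc_id)


def ingest(alfresco, app_db, doc_id, content, attempts=3):
    """Create in Alfresco first, then insert the row; retry the whole call on error."""
    for _ in range(attempts):
        try:
            alfresco.create(doc_id, content)  # if this raises, the insert is never reached (Q2)
            app_db.insert(doc_id)             # if this raises, an orphan document remains (Q1)
            return True
        except StoreError:
            continue
    return False


def remove(alfresco, app_db, doc_id, attempts=3):
    """Delete from Alfresco first, then delete the row; retry the whole call on error."""
    for _ in range(attempts):
        try:
            alfresco.delete(doc_id)  # if this raises, the row delete is never reached (Q4)
            app_db.delete(doc_id)    # if this raises, an orphan row remains (Q3)
            return True
        except StoreError:
            continue
    return False
```

When `remove` returns `False` after exhausting its attempts, the stores are left in exactly the Q3 state: the row still exists, but the Alfresco document is gone.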

What is the limitation's workaround?
The goal of the workaround is that if a document exists in the non-Alfresco schema but not in Alfresco, the application should still work (that is, it should not throw an exception).

The proper steps, in order, are:
  • Part 1: Check if the document exists in the non-Alfresco database
  • Part 2: Then check if the document exists in Alfresco

If either part 1 or part 2 fails, simply tell the application that the document doesn't exist (and log the discrepancy in your log file). A document MUST exist in BOTH Alfresco and the non-Alfresco database to truly EXIST in the system!
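
A minimal Python sketch of this two-part check might look like the following. The function name `document_exists` and its parameters are hypothetical: `app_db_rows` stands in for a lookup against the non-Alfresco database and `alfresco_docs` for a lookup against Alfresco, both modelled here as plain in-memory collections. The sketch logs only when part 2 fails, since that is the case where the two stores are known to disagree.

```python
import logging

log = logging.getLogger("document_store")  # logger name is an arbitrary choice


def document_exists(doc_id, app_db_rows, alfresco_docs):
    """Return True only when doc_id is present in BOTH stores."""
    # Part 1: check the non-Alfresco database.
    if doc_id not in app_db_rows:
        return False
    # Part 2: check Alfresco. A miss here means the stores disagree (see Q3).
    if doc_id not in alfresco_docs:
        log.warning("Discrepancy: %s is in the database but not in Alfresco", doc_id)
        return False
    return True
```

The caller gets a plain `False` rather than an exception, so a search that hits an orphan row behaves the same as a search for a document that was never ingested.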
