I am working on integrating Apache ManifoldCF (MCF) with Alfresco CMS as the repository connector, using CMIS, and with Solr as the output connector where the entire index is stored. This works well, and I can search for documents in the Solr index.
Now, as part of the implementation, I plan to add several more repositories such as SharePoint, file systems, etc., so I will have three document repositories: Alfresco, SharePoint and a file system. I plan to have scheduled jobs that crawl each of the repositories at regular intervals. But I have the following questions and concerns.
- Although I will schedule the jobs at frequent intervals, I want to make sure that MCF jobs pick up only those documents that are newly added or updated. Say the current job run processes 100 documents and there are 110 the next time I start the job; I want the job to process just the 10 new documents, not all 110.
- Since there are relatively few MCF tutorials available, I have no way to confirm that MCF jobs behave this way. I assume MCF is intelligent enough to do so, but I have no evidence to back that up.
- I want to learn more about the MCF job schedule types: "Scan every document once" / "Rescan documents dynamically". Likewise, I want to know more about the job start options: full / minimal. Sorry for being a newbie.
- I am also considering some custom coding to ensure that only new / updated documents are picked up for processing, but that would mean going through the MCF code, since the documentation is limited.
- In this case, is it wise to write custom code, or does MCF provide all of this functionality OOTB?
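To make the requirement concrete, here is a minimal sketch of the behaviour I am after. It is plain Python and not MCF's API: `select_changed`, the `doc-NNN` IDs and the `v1` version strings are all hypothetical names made up for illustration. The idea is simply to compare each document's current version marker with the one recorded on the previous run and process only the ones that differ.

```python
from typing import Dict, List


def select_changed(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return the IDs of documents that are new or whose version marker changed.

    `previous` maps document ID -> version marker recorded on the last run.
    `current` maps document ID -> version marker seen on this run.
    (Both structures are hypothetical; they are not part of MCF.)
    """
    return sorted(
        doc_id
        for doc_id, version in current.items()
        # A missing ID yields None, which never equals a version string,
        # so brand-new documents are selected as well as modified ones.
        if previous.get(doc_id) != version
    )


# First run saw 100 documents.
previous = {f"doc-{i:03d}": "v1" for i in range(1, 101)}

# Second run sees the same 100 unchanged, plus 10 new ones.
current = dict(previous)
current.update({f"doc-{i:03d}": "v1" for i in range(101, 111)})

to_index = select_changed(previous, current)
print(len(to_index))  # prints 10
```

What I would like to know is whether an MCF job already does the equivalent of this on its own, so that I do not have to write and maintain it myself.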
Thank you very much in advance.