Last week a client asked me to help out, we had been creating a system that creates PDF files in Salesforce using Drawloop (today known as Nintex Document Generation which is a boring name).
Anyways, we had about 2000 PDF created in the system and after looking into it there doesn’t seem to be a way to download them in bulk. Sure you can use the Dataloader and download them but you’ll get the content in a CSV column and that doesn’t really fly with most customers.
Enough of small talk, I had to solve the problem so I went ahead and created a very simple Python script that lets you specify the query to find your ContentVersion objects and also filter the ContentDocuments if you need to ignore some ids.
My very specific use case was that I was to export all PDF files with a certain pattern in the filename but only those that were related to a custom object that had a certain status. Given that you can’t do certain queries like this one:
SELECT ContentDocumentId, Title, VersionData, CreatedDate FROM ContentVersion WHERE ContentDocumentId IN ( SELECT ContentDocumentId FROM ContentDocumentLink where LinkedEntityId IN (SELECT Id FROM Custom_Object__c))
It gives you a:
Entity 'ContentDocumentLink' is not supported for semi join inner selects
I had to implement the option for the second query which gives a list of valid ContentDocumentIds to include in the download.
The code is at https://github.com/snorf/salesforce-files-download, feel free to try it out and let me know if it works or doesn’t work out for you.
One more thing, keep in mind that even if you’re an administration with View All you will not see ContentDocuments that doesn’t belong to you or are explicitly shared with you. You’ll need to either change the ownership of the affected files or share them with the user running the Python script.