Salesforce Data Import Guidelines and Best Practices

Best Practices for Integrating Your Data with Salesforce

When we are building data integrations, we are setting up an automated data movements that run regularly. Sometimes we are pulling the data in real-time. Other times we might be loading only the updated records once a month. Regardless, we know we must maintain the code long term, watch it, and handle errors properly.
The data migration and integration tracks of a project are often viewed as the riskiest part of the project. They can be incredibly complex, and full of nuanced details.
Over the course of years and projects we‘ve been working on in aikon group, we adopted the process for handling data import tasks and would like to share it here.

1. Have a plan

Plan your data Integration like a project and track your progress against the plan. Choose the tool that best suits your needs.

2. Partner with your PM (Project Manager)

Work with your project manager, to include your plan in the master project plan.

3. Start Early

If you have data in Salesforce your business analyst can demo the system with real data which users can relate.

4. Understand the source Data, Ask Questions to Understand Intent, Analyze Data to Know The Reality

After you understand what needs to be imported, you need to understand how the data is used and then you can start your data analysis.

5. Understand the Salesforce Data Model and Hierarchy

After we figure out what needs to be imported we need to figure out how it is used in Salesforce. For Example: what custom fields and objects need to be created.

6. Document your Transformations

Now you can start planning and documenting the data transformations needed to move data to Salesforce.

Copy the object needed to a new spreadsheet. Add columns, for example, the source table and target Salesforce objects and another for notes. This is our suggestion for basic mapping information:

Figure 1. The Mapping Sheet

7. Include Source-To-Target Mapping

You should rename the columns to match the Salesforce object field API names.

8. Don’t Hard-code Salesforce Ids; They Change with Environments

You should not hard-code the record type Id and the Id of the objects in your code. Instead, store those Ids in configuration files or cross-reference tables. You will likely need to use them more than once.

9. Store Cross-reference data in Salesforce

By storing External ids on the corresponding Salesforce records, you essentially have built-in cross-reference tables that not only allow you to use Upsert functionality but also have solid traceability back to the source system.

10. Load Your Data in the Order of the Object Hierarchy

You need to load Accounts before Contacts, and both Accounts and Contacts before Opportunities and the three of those before you load Activities. You need the parent object to exist before you can create related records. The only exception to this is self-relations, which exist when an object is related to itself, such as accounts and parent accounts. In this case, you first create all the accounts, then do a second pass to update just the self-related field.

11. Delete Child Records First, Then Parents, When Deleting Data

You want to start at the bottom of the hierarchy and work your way up. If you start at the top (Account), you will probably trigger so many cascading deletes that the deletion will fail, either because of locking errors or other governor limitations.

12. Don’t Forget about Owners and Record Types

Define ownership and record types early in the process.

13. Partner with your BA (Business Analyst)

Use your data analysis and data manipulation skills to help the team better understand the source data and answer any questions team members might have. Validate assumptions being made about the source data.

14. Automate When Possible

Anything that can be automated should be automated.

15. Use Proper Tools. When choosing your toolset, make sure it meets all your needs

When choosing your toolset, make sure it can help you achieve all your requirements. We are using Data Loader, but I will point out the differences between Data Loader and Import Wizard so you can choose the one that meets your needs.

Data Loader:

For complex imports of any size more than 50.000 records.
It supports all standard and custom objects.
It doesn’t support scheduled export.
Delete operation is available.
It requires installation.
While importing, duplicates cannot be ignored.

Import Wizard:

For simple imports of up to 50.000 records.
It supports all custom objects and only a few standard objects like Account, Contact, Leads, and Solution.
It supports schedule export.
Delete operation is not available.
It doesn’t require installation.
While importing, duplicates can be ignored.

16. Build the Library of Reusable Code

There are some transformations you will need to perform repeatedly, not necessarily just within the same project, but for every Salesforce integration, you do. Some examples are Formatting phone numbers, breaking up name fields into First Name, Middle Name, and Last Name and Validation email addresses.

17. Turn off Duplicate Management

Salesforce duplicate management feature is a great feature but it has no place in the data integrations. Check for duplicates before you push the data, not during or after. If there is a duplicate, it should be fixed in the source system and the integration should then push the fix to Salesforce.

18. Fix Code, Not Data

It is always better to fix data before import in the source system rather than fixing it in Salesforce. Here are a few examples:

The source system has a single field for Name and it needs to be split into First Name and Last Name. You should either alter the source system to use two fields or write transformation code to split it.
The source system has e-mail addresses with spaces in them.
The source system stores address lines in separate fields (Salesforce wants them in one, with line breaks between the lines).
Names are all capital letters, but users want them converted to camel case.

For integrations, fixing the source data and adding validation to the source system is the best way to go, because if that system owns the data, it should also own the validation.

19. Fix Errors with Parent Objects before Moving on to Children

Depending on how the relationships are configured in Salesforce and how your code is written, if a parent fails to load, the child records will “error-out” (either because the parent field is not valid or is not found) or the record will be created but the parent field will be NULL.

20. Every Record You Insert Should Have and External Id

External Ids allow for the use of Upsert, but this is not the only reason for using them. External Ids make tracing Salesforce data back to the originating system incredibly easy. They are indexed automatically, so all you need to do to find the record in Salesforce is type the source system Id into the Salesforce search box.

21. Standardize External Ids and Always Enforce Uniqueness

Always Enforce Uniqueness. This practice prevents duplicates from being created accidentally. Also, if you make a mistake and calculate nonunique External Ids, any inserts will be rejected and you will catch your error.
External Ids should always be text. There is no benefit to creating an External Id as a number; with text, you can always add a prefix or a suffix later (with a delimiter) to make it compound key.
Use a different External Id field for each data source and include the source name as part of the field name. For example, if you have two data sources, ABC and XYZ, create two External Id fields – one for each of them (ABC_Id and XYZ_Id) – and name them as such.

22. Use Upsert When Possible

An Upsert is a combination of insert and update in a single call. If the record exists, Salesforce updates it; if not, Salesforce creates it. To update a record we require Record Id whereas for inserting a record we don’t need any Id, we simply create CSV file and upload through Data Loader.

23. Real Users Must Perform UAT (User Acceptance Testing) on the Data. Plan for an Extended UAT Period

It’s more common than not that users are given the task of performing UAT in addition to their regular job duties.

24. Build a Relationship with the Users

Explain why you are asking certain questions and answer all their questions fully.

25. Start Fresh for Each Round of QA (Quality Assurance) and UAT (User Acceptance Testing)

For each round of QA and UAT, reset your target Salesforce org back to its originating state – that is, the state will be in at the time of go-live. For integrations, by the time you get to UAT, the integration processes should run as they would in production.

26. When Errors Are Found, Patch the code Than Return it

When Issues are found, even small ones, fix your file and return it. Often, when a minor bug is found, the temptation is to patch data quickly, tell whoever logged the issue to retest it so they can close the issue, then go back and fix your code. This is the wrong approach because those small things add up and something can slip through the cracks quite easily.

Error Handling and Performance Tuning

Two Categories of Errors

There are two basic categories (or levels) of errors: job-level errors and row-level errors. Job-level errors are when the entire job fails for some reason. Row-level errors are errors returned from the Salesforce API stating that an action has failed for a particular row.

Job-level errors

Job-level errors occur because of some environmental change, such as network outage, Internet connectivity issues, or database or Salesforce schema changes. The job fails and stops executing the moment the error occurs. The error message for job-level errors should be self-explanatory and is likely thrown from your middleware or ETL tool rather than from Salesforce.

Row-level errors

Error log view will contain things like Salesforce object name, Salesforce Id or External Id, error code and error message. The primary causes (or most common) types of row-level errors we will encounter include the following:

Salesforce Validation errors
Salesforce code errors
Governor limit errors
Record Locking errors

Salesforce Validation Errors

Salesforce Validation Errors include errors caused by not following custom field validation rules, not populating required fields, pushing not valid data types, or trying to set not valid values for restricted picklists.

Salesforce Code Errors

Salesforce code errors are errors thrown by Workflows, Process Builder, triggers and other Salesforce code that runs automatically when we insert or update data into Salesforce.

Governor Limit Errors

Governor Limit Errors occur when we exceed some Salesforce Limit – for example, we run out of storage space, we uploaded a file larger than 25MB, or we trigger too many cascading deletes.

Record Locking Errors

They occur when Salesforce can’t obtain a lock on the record because it’s in use, so it can’t update it.

Performance Tuning

Here is a checklist of things you should do/check when trying to spread your code:

Don’t push unnecessary updates. When downloading cross-reference data from Salesforce, use a “where” clause to download only those records that you need, and download only those fields that you need.
If you are getting good performance but are getting some locking errors, consider leaving the code as-is, catching the errors, then reprocessing just the errors at a smaller batch size.
Sort your datasets.

API Operations

The following sections review the most common API operations. Those operations should cover all you need to build even the most complex Salesforce integrations.

Export

Export is your basic export data operation that passes in the SOQL query. Most tools have a build-in SOQL builder to abstract this completely.

Export All

Export All is exactly like Export except that it includes deleted (assuming they have not yet been deleted physically) and archived records. If you want to include archived but not deleted records, you can modify the SOQL code to specify “isDeleted=0”.

Insert

Insert is your basic create-new-record operation. The API returns the Ids of the newly created records and either success messages or error codes and messages for rows that failed.

Update

Update works exactly as Insert does except it updates existing records rather than inserts new ones. To use Update, you must pass in the Salesforce Ids of the records to be updated. If you don’t have the Ids, you first have to perform an export to get the Ids, then you can push the updates using the Ids.

Upsert

When using Upsert, you must specify on which field the match should be made. By doing so, Salesforce knows whether it should do an update or an insert.

Delete

It deletes records based on the passed Salesforce Ids.

Hard Delete

Hard Delete is similar to Delete, except the targeted records do not go to the recycling bin.

Undelete

Undelete restores records from the recycle bin based on the passed-in Salesforce Ids. It is rare to find a tool that supports Undelete.

Merge

Merge allows you to merge records that you consider duplicates (for example, you may have two records for the same company in the Account object). It is only supported for accounts, contacts, and leads.

Figure 2. The Apex Data Loader

We hope we have been able to introduce you to data import best practices, toolset differences, error handling, and API operations.
If you would like to learn more about data import or share your ideas feel free to contact us.
We would love to hear your thoughts on this topic.

References
David Masri: Developing Data Migrations and Integrations with Salesforce, Apress