Understanding Salesforce’s Data Architecture

Salesforce Database Access

Salesforce is an “API (Application Programming Interface) First” company. This means Salesforce made a decision that any functionality added to the system must first be exposed via an API, then Salesforce‘s own user interface (UI) must use that API to perform the function. So, anything we can do via the Salesforce UI can also be done via an API. Salesforce‘s API are all the HTTP (Hypertext Transfer Protocol) based and are exposed as SOAP (Simple Object Access Protocol) or REST (Representation State Transfer) web services. This includes the data APIs.

SQL vs. SOQL and the Data APIs

SQL is the standard for querying relational data. Salesforce has a custom language that looks a lot like SQL called SOQL, which stands for Salesforce Object Query Language. The following list presents the key differences between SQL and SOQL:

  • SOQL is a query-only language. It has no equivalent INSERT, UPDATE, and DELETE statements.
  • In the Salesforce world, data manipulation is handled using a set of methods known as DML (Data Manipulation Language). One big difference you’ll notice right away is that SOQL has no such thing as SELECT *.
  • With SQL, we can (and must) specify the Join criteria. With Salesforce, Joints are attributes of the data type. For example, the Salesforce Contacts object has an AccountId field. As part of that field definition, Salesforce knows that it joins to the Account object, so we don’t have to tell it to do so. Because of this, we can join only on Id fields-only on predetermined joins- so we can’t join on derived data or other non-Salesforce fields (such as Date field).
  • When selecting from a parent object, we can only join one level down. For example, we can join from Account to Contact, but not down another level to the child of Contact (A grandchild of Account). When joining up, we can only go five levels up-for example, from Case ⇨ Contact ⇨ Account ⇨ Owner (this is four levels).
  • We can’t join from a child to parent and then back down to another child – for example, from Contact ⇨ Account ⇨ Account Note.

Salesforce Data Types

  • Every data type has the option to be set as a required field.
  • Custom field API names are always suffixed with “__c”.
  • All strings are stored as Varchars, so there is never any extra space padding –neither at the beginning nor end of the string – even if we type in extra blank spaces purposely.

Note that the API name is like a database field name; it can be referenced in the code. The label is what is displayed next to the field when it’s being shown in the UI. We don’t have to worry about changing label values because they are generally not referenced in the code.

Lookups and Master Detail

Object relationships are a special field type that connects two objects.
Salesforce has two relationship data types: Lookups and Master-Details. In both of these cases, we add the field to the child object and specify the parent object as part of the setup. As of this writing, we can have a maximum of two Master-Detail fields per object and a maximum of 40 relationship fields (Master-Detail + Lookup) per object.

Lookup Relationship

With Lookup relationships field is not required but we can configure the field to be required. The lookup field has no impact on security and access. It can be multiple layers deep. We cannot create Rollup fields based on the Lookup relationship.

Master-Detail Relationship

Master-Detail Relationship is the Parent-child relationship. In which Master represents Parent and detail represents Child. Master-Detail relationship fields are always required. When deleting the parent of a Master-Detail the deletion cascades down automatically and deletes all child records. There are also settings to control the security of the child record based on a user’s access to the parent record. Master-Detail relationships allow us to create rollups on the parent object which will calculate the COUNT, SUM, MIN and MAX of the Child records. A child of one master-detail relationship cannot be the parent of another.

CRUD Fields

CRUD – Create, Read, Update, and Delete. Every object in Salesforce has native CRUD fields that are system maintained. Salesforce refers to these as Audit Fields.
These fields are

  • CreatedById
  • CreatedDate
  • LastModifiedDate
  • LastModifiedById
  • SystemModStamp

Record Types

Record types are a special kind of lookup relationship used to control page layouts and security configuration. For example, if we have two types of accounts (School and Companies) and we want a different set of fields to be displayed depending on which record is being displayed, we would create two record types for the Account object (Schools and Companies). When we add a record type to an object, Salesforce created the RecordTypeId field for us. It’s a lookup to the RecordType object and is always required.

Fifteen- and 18-Character Ids

All Id fields are stored as a 15-character alphanumeric, case-sensitive string. Because so many systems are not case sensitive, Salesforce added three additional characters (calculated), making it an 18-character case-insensitive string. Salesforce Ids are unique across the org (not just within the object), with the first three chars indicating the object type. For example, any Id that starts with 001 is an Account; 003 a Contact.


Salesforce AutoNumbers are number fields and should probably be called autoincrement fields because Salesforce allows us to add a text prefix to it or include date parts. The increment portion must be a number. When creating an AutoNumber, we can set the initial starting number as well as choose to populate all existing records.


A checkbox is the Salesforce equivalent of a Boolean, but it always displays as checkbox on-screen layouts and reports. Internally, it’s stored as zeros or ones (where 0 = False and 1 = True), and when we export the data, it exports as zeros and ones. When loading data (via the API), it also accepts the string values True or False. When creating checkboxes, Salesforce lets us choose a default value of either Checked or Unchecked. There is no option for required. Salesforce checkboxes are always required. If nothing is selected, it uses the default value.


Currency is a number field that represents money. We can specify the number of decimal places. When displaying currency, it is shown with the currency symbol based on our Salesforce org’s configured type. If we have multicurrency enabled, when we create a currency field, two fields are created – one for the currency value and one for the currency type and Salesforce displays the appropriate currency symbol next to the value.


Salesforce Date fields store dates only, without timestamps, and are not subject to time zones. This field type is ideal for things like Birthdays, which don’t change with the time zone.


Salesforce Date/Time fields store date and time data in UTC, but when displayed, it’s always displayed in the user’s configured time zone. When interacting with date and time via the API, the UTC data are always used. So, when we send data to Salesforce for insertion, we need to send UTC time. When pulling data out, the UTC time is returned.


It stores time without a date and does not adjust for the time zone. We can load data formatted as HH: MM using the 24-hour format or HH: MM AM/PM using the 12-hour format.


Salesforce e-mails are strings with a maximum length of 80 characters, Salesforce validates that e-mails are in a valid format. E-mails are displayed as hyperlinks, can have uniqueness enforced, and can be configured as External Ids.


Geolocation fields are used to store longitude and latitude and can be used to calculate distances.


All Salesforce objects have a Name field. This is the text that is displayed at the top of the record when viewing it in the UI. Names are always required. For custom objects, we have the option to configure Name either as a string, with a maximum length of 80 characters, or as an AutoNumber. For native objects, there is a lot of variation. For example, Account is a string (255) and Opportunity is a string (120). Cases use CaseNumber (AutoNumber). Contacts use a formula field that concatenates the various parts of the contact’s name (First Name, Middle Name, Last Name, and Suffix) with 121 characters which is the combined length of the concatenated fields. Activities use the subject fields.


Numbers are numeric fields. When defining the data type, we must specify the length of the number and how many decimal places it can have. The maximum total characters (not including the decimal point itself) is 18. So, for example, we can define the number with 16 long to the left of the decimal and two to the right (16 + 2 = 18). Numbers can be configured as External Ids and have the option to be enforced as unique.


Percentages are similar to numbers, but they display with the percent symbol. When interacting with percentage fields via an API, always use the decimal value (load in 0.2 as opposed to 20, for 20%).


Salesforce phone data types are nothing more than strings with a length of 40. If entering data through the UI, Salesforce formats the phone number (adding the parentheses and hyphens) and saves that formatting with the data. When phone numbers are exported via an API, they are exported with the formatting. When loading data through an API, we must format the data ourselves before loading it or it won’t be formatted, even when viewed in the UI.


Picklists are Salesforce equivalent of drop-down lists and are stored as strings with a length of 240 characters. We configure the available list values at the time of creation and can change them at any time. We can also configure our picklist values to be dependent on a previous picklist selection to create things like a type/subtype or category/subcategory hierarchy. Unlike a traditional “combo box”, when entering data into a picklist through the UI, a picklist does not allow for free text entry. When removing a value from the set of picklist options, we can either delete the option or deactivate it. If we choose to delete it Salesforce asks us to select a valid value to replace all records that have the old value with the new valid one. If we choose to deactivate the value, Salesforce leaves the record as is, but it does not allow any new records to be created or updated with the deactivated value through the UI because it’s not available for selection. When loading data into a picklist field through an API, if we load a value that does not exist, that value is created automatically as an inactive value and the update/insert succeeds. We can configure a picklist to be restricted. This action implements validation at the API level, so instead of creating new inactive value, Salesforce blocks the update and throws an error.

Multi-select Picklist

Multiselect picklists are the Salesforce equivalent of list boxes and are exactly what you would except a picklist with the option to select more than one value. Everything discussed on picklists applies to multi-select picklists. All values in multi-select picklists are stored as a single string, with the individual values separated by a semicolon. This is also how we have to load data into Salesforce when using an API. For example, if we want to update the Color multi-select picklist field, the value we pass looks something like this: Red;Blue;Green;Purple. Salesforce stores the list values in the order they are configured in the Picklist setup, not in the order they are loaded. So, when exporting the data from Salesforce, it may not come out in the same order in which we inserted it.


Text is a string that is displayed as a single line with a maximum length of 255 characters. Text fields can be configured as External Ids and have the option to be enforced as unique.

Text Area

A text area is a string that is displayed as a multiple-line text box with a maximum length of 255 characters.

Text Area (Long)

Text area (long) is a string that is displayed as a multiple-line text box with a maximum length of 131, 072 characters.

Text Area (Rich)

Text area (rich) is a string that is displayed as a multiple-line text box with a maximum length of 131, 072 characters and it allows for limited HTML formatting.

Text Encrypted

Encrypted text fields are just like text fields, which are stored encrypted at rest (single line; maximum length, 255 characters). From an API perspective, we treat them just like any other data type. Everything is done for us behind the scenes.


URLs are text fields that are displayed as and functions like a hyperlink. When we load data into a URL field, if we forget to include the protocol (the http://), Salesforce adds it for us. But what if we want to use HTTPS or FTP instead of HTTP? In this case, we must specify it when loading in the data.

Formula fields

Formula fields are calculated at runtime and can include the references to parent objects. For example, we can create a formula on the Contact object that references its parent Account (Account.Name). These types of formulas are referred to as cross-object formula fields. Formula fields are always read-only and are exposed via the API when exporting data. Because formula fields are calculated at runtime, the record’s audit information is not changed. This means that if we updated an Account name and we have a formula on our Contact object, the contacts field (LastModifyDate) is not updated nor is the SystemModStamp field), even though the data on the object have changed.


Rollups are similar to formula fields in that they are used to expose calculated data in real-time. The key differences include the following:

  • Rollups are created on the parent object and “roll-up” (summarize) data from its children. A formula field either references itself or its parent object.
  • Rollup works more like triggers because they are recalculated based on an update action, then stored physically on the parent object if possible. Some rollups may be non-deterministic because they include time-based data, so they are always changing. These types of rollups are calculated at runtime. When we create a rollup, Salesforce decides for us whether the rollup is deterministic or nondeterministic. There is nothing special we need to do for this.
  • If the rollup is deterministic when a record is updated, Salesforce updates SystemModStamp but not LastModifyDate or LastModifyBy.
  • Rollups can only be created on Master-Detail relationships (not Lookups).

Owner Id

Owner Ids are lookup relationships to the user object and are used for security. Depending on how security is configured (for the object in question), Salesforce may or may not add this field. Often, people use the owner field for things, such as a Sales Representative or Account Executive.


Addresses aren’t a data type, they are special UI construct that combines the following five fields:

  1. Street
  2. City
  3. State
  4. Postal Code
  5. Country

Users can’t create new addresses field groups, but they can use the ones created by Salesforce on the Account and Contact objects. Of course, users can create four new fields and display them separately in the UI.

The Salesforce Recycle Bin and Data Archiving

The Salesforce Recycle Bin

When you delete a record in Salesforce, it does a kind of a soft delete. It flags the record as deleted and hides it from users and the system as a whole (for example, it is not used in triggers, rollups, formula fields, and so on). If you want to view or restore the record, you can simply check the Salesforce recycle bin. When using the API, you can query using SOQL for records where “isDeleted = 1” (note that all Salesforce objects have this field). Salesforce recycle bin autoempties when it fills up (its capacity is 25 times your org total storage space). Salesforce autoempties the oldest records.

Data Archiving

Salesforce archives Events automatically that are more than a year old, and archives tasks that are more than a year old and are closed. These two objects have an IsArchived field that indicates whether the record is archived. Like the IsDeleted flag, you can query the Salesforce API using SOQL for archived records by using “IsArchived = 1”.

We hope we have been able to introduce you to data architecture in Salesforce.
If you would like to learn more about data import or share your ideas feel free to contact us.
We would love to hear your thoughts on this topic.

David Masri: Developing Data Migrations and Integrations with Salesforce, Apress

Leave a Reply