Migrating files from an external source into S-Drive

Introduction

When you use S-Drive, files are stored in an Amazon S3 bucket. A record in Salesforce (an S-Drive file record, such as an Account File or Case File) stores information about the file (name, size, and so on) and points to its location in the bucket.

Therefore, migrating files involves these two main steps:

Creating file records (Account File, Case File, etc) in Salesforce that contain the metadata and location of the file.
Uploading files from the external system to a specified location in the AWS bucket.

If you want your files placed in folders/subfolders, rather than at the Home/root level, there are extra steps.

How S-Drive finds the files in AWS

The file’s location in the S3 bucket is stored in a “key” field in the file record in Salesforce. During migration, the key field and the file's actual location in the bucket must match. We do this by first populating the key field, then uploading each file to the location in the bucket that the key specifies.

By default, a key is made up of 3 parts, separated by slashes:

parent record id--in the example below it’s an Account record
file record id--the record id of the Account File record
file name

It looks something like this: 0014x00000Hm3JlAAJ/a004x000006BG2bAAG/myfile.jpg

(The key can also be non-standard, for example AccountABC/myfile.jpg. The file should still open, but with non-standard keys some S-Drive functionality may not work correctly, and you could end up with duplicate keys.)
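To make the default key format concrete, here is a small Python sketch that builds a key from its three parts and checks whether an existing key has the standard shape. The function names are hypothetical and are not part of S-Drive. The check assumes that the first two parts are Salesforce record Ids, which are 15 or 18 alphanumeric characters long.

```python
import re

# Salesforce record Ids: 15 characters (case-sensitive) or 18 (case-insensitive form).
SF_ID = re.compile(r"[A-Za-z0-9]{15}(?:[A-Za-z0-9]{3})?")


def build_key(parent_id: str, file_record_id: str, file_name: str) -> str:
    """Build a default-format key: ParentRecordId/FileRecordId/FileName."""
    return f"{parent_id}/{file_record_id}/{file_name}"


def is_standard_key(key: str) -> bool:
    """True if the key is <record Id>/<record Id>/<non-empty file name>."""
    parts = key.split("/", 2)  # split on the first two slashes only
    if len(parts) != 3:
        return False
    parent_id, record_id, file_name = parts
    return bool(SF_ID.fullmatch(parent_id) and SF_ID.fullmatch(record_id) and file_name)


key = build_key("0014x00000Hm3JlAAJ", "a004x000006BG2bAAG", "myfile.jpg")
print(key)                                       # 0014x00000Hm3JlAAJ/a004x000006BG2bAAG/myfile.jpg
print(is_standard_key(key))                      # True
print(is_standard_key("AccountABC/myfile.jpg"))  # False (non-standard key)
```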

What about S-Drive Folders/subfolders?

S-Drive folder or subfolder locations don’t affect where the files are stored in AWS. The key does not change. The folder location on the S-Drive side is determined by a field in the file object called Parent Folder Id. This field must be filled in with the folder’s record id in order for the file to show inside the folder.

The Parent Folder Id can be filled in either during initial migration or afterwards. See Adding Folders and Subfolders during or after Initial Migration below.

Migration Instructions

To migrate files, we’ll use Salesforce Data Loader and the AWS Command Line Interface (AWS CLI).

You must use an unversioned bucket for this method. You can enable versioning later if you wish.

This method requires the following steps, which are described in greater detail below.

Use Data Loader to create file records with blank keys
Export the just-created file records
Fill in the key fields
Use Data Loader to update the file records to populate the key field
Create a spreadsheet of the files to use for uploading to AWS
Use AWS CLI to upload the files

If you have a large number of files and are working in batches, you can create a checkbox field on the file object (called New, for example) and set it to true when you first load the records in step 1. In step 2, export only the new records, the ones where the box is checked. In step 3, uncheck the box (set it to false) in the spreadsheet so that the update in step 4 clears it.

Detailed steps

1: Use Data Loader to create file records with blank keys (Spreadsheet 1)

Create a spreadsheet with the following columns.

Parent Id: Record Id of the record the file is attached to, for example, the Account record Id if you’re migrating files to an Account
Bucket Id: name of the S3 bucket where the file will be stored
Content Type: the file's MIME type, such as image/jpeg or application/pdf (folders use Folder)
File Name: Name of the file
File Size in Bytes: the size of the file in bytes
Key: can be blank when first importing but will need to be filled in later (steps 2-4 above)
Parent Folder Id: optional. Record Id of the folder this file will go in. If blank, the file will be at the root level. It’s simplest to store files at the root level. If you use folders, you can create them first and add the folder record Id here, or you can update the Parent Folder Id later. See Adding Folders and Subfolders during or after Initial Migration below.
Version Id: leave blank for migration (the column isn't needed), but the bucket must be unversioned
WIP: must be set to false
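If the files are on a local disk, you can fill in the size and content type columns of Spreadsheet 1 with a short Python script instead of typing them by hand. This is a minimal sketch built on several assumptions. The CSV headers are placeholders taken from the list above, so map them to your file object's fields in Data Loader. You must already know which parent record each file belongs to. Python's `mimetypes` guesses, which are based on the file extension, must be acceptable as content types. The script should run on the machine where the files are stored.

```python
import csv
import mimetypes
import os

# Placeholder headers taken from the column list above; map each one to the
# matching field of your file object when you set up the Data Loader insert.
COLUMNS = ["Parent Id", "Bucket Id", "Content Type", "File Name",
           "File Size in Bytes", "Key", "Parent Folder Id", "WIP"]


def insert_row(local_path: str, parent_id: str, bucket: str) -> dict:
    """Build one Spreadsheet 1 row for a local file, with the key left blank."""
    file_name = os.path.basename(local_path)
    content_type, _encoding = mimetypes.guess_type(file_name)
    return {
        "Parent Id": parent_id,
        "Bucket Id": bucket,
        # Guess from the file extension; fall back to a generic binary type.
        "Content Type": content_type or "application/octet-stream",
        "File Name": file_name,
        "File Size in Bytes": os.path.getsize(local_path),
        "Key": "",               # filled in during steps 2-4
        "Parent Folder Id": "",  # blank = root level
        "WIP": "false",
    }


def write_spreadsheet_1(files: dict, bucket: str, out_csv: str) -> None:
    """Write Spreadsheet 1; `files` maps each local file path to its parent record Id."""
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        for path, parent_id in files.items():
            writer.writerow(insert_row(path, parent_id, bucket))
```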

Save the spreadsheet as a .csv file and use the Data Loader Insert operation to load the records into Salesforce.

2: Use Data Loader to export the just-created file records (Spreadsheet 2)

Export the records you just imported. Be sure to choose the following fields to export:
- Id
- Parent Id
- File Name
- Key
- Bucket Id

3: Fill in the key fields

In the Key column, build each key in the format ParentRecordId/FileRecordId/FileName.
- You can use the CONCATENATE function as follows:
  =CONCATENATE(A1, "/", B1, "/", C1) where A1 is the cell with the Parent Id, B1 is the cell with the file record Id, and C1 is the cell with the file name. Type straight quotes ("), not curly quotes, or the formula will not work.
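As an alternative to the spreadsheet formula, you can apply the same concatenation to the exported CSV with Python. This sketch assumes the export's columns are labeled Id, Parent Id, File Name, Key and, optionally, New. These names are hypothetical: your Data Loader export will most likely use field API names as headers, so adjust the constants. If the optional New batching checkbox is present, the script also clears it, ready for the update in step 4.

```python
import csv

# Hypothetical header names; replace them with the headers in your Data Loader export.
ID_COL, PARENT_COL, NAME_COL, KEY_COL, NEW_COL = "Id", "Parent Id", "File Name", "Key", "New"


def fill_keys(rows: list) -> list:
    """Set Key = ParentRecordId/FileRecordId/FileName on each exported row."""
    for row in rows:
        row[KEY_COL] = f"{row[PARENT_COL]}/{row[ID_COL]}/{row[NAME_COL]}"
        if NEW_COL in row:
            # Optional batching checkbox: clear it so the next batch export skips this row.
            row[NEW_COL] = "false"
    return rows


def fill_keys_csv(in_csv: str, out_csv: str) -> None:
    """Read Spreadsheet 2, fill in the keys, and write a CSV for the step 4 update."""
    with open(in_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or [])
        rows = fill_keys(list(reader))
    if KEY_COL not in fieldnames:  # add a Key column if the export lacked one
        fieldnames.append(KEY_COL)
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```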

4: Using Data Loader, update the file records to populate the key field

Use the Data Loader Update operation to update the Key field (and the New checkbox, if you’re using one). The Id column identifies which records are updated.

5: Create a spreadsheet of the files to use for uploading to AWS

We will use a spreadsheet to create the AWS commands to copy local files to S3. The command has the following format:

aws s3 cp ".\filedirectory\test.txt" "s3://mybucket/parentId/recordId/test.txt"
where:
.\filedirectory\test.txt is the path of the local file, relative to the directory the command is run from
mybucket is your bucket name
parentId/recordId/test.txt is the key, which determines where the file is placed in the bucket

When you run the bat file to upload the files, each local file path is resolved relative to the current directory of the cmd window (the directory you run the bat file from). The paths must therefore be relative to that directory.
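If you're unsure what the relative path should be, Python's `ntpath.relpath` can compute it. `ntpath` is the Windows flavor of `os.path` and can be used on any operating system. The directories below are hypothetical: one is where the bat file will be run from, the other is where the files are stored.

```python
import ntpath

# Hypothetical Windows paths: the directory you will run the bat file from,
# and the directory that holds the files to upload.
run_dir = r"C:\Users\me\Desktop\Migration"
files_dir = r"C:\Users\me\Desktop\Files"

# Path of the files directory relative to the run directory.
rel_dir = ntpath.relpath(files_dir, start=run_dir)
print(rel_dir)                             # ..\Files
print(ntpath.join(rel_dir, "myfile.jpg"))  # ..\Files\myfile.jpg
```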

For our purposes, we’ll create the copy commands in a spreadsheet, then copy them to a bat file to run.

Using the following steps, create a spreadsheet with the local file path where the file is located, and the key field, which will determine where the file goes in S3. Then we’ll create the AWS cp command.

Copy Spreadsheet 2 (with the key fields) to a new spreadsheet (Spreadsheet 3)
Convert the Key column from formulas to plain text
1. Select the Key column and copy it
2. Paste it back into the same column using paste values only (in Excel, Paste Special > Values). Now the keys no longer depend on other columns.
Remove the columns for Id and Parent id
Populate the local path for each file. This can be done any way you prefer. One suggestion is as follows:
1. Create a column with the local path (This path must be relative to where you will run the AWS cp commands from. If you’re not sure yet, you can do a global change on the path later.)
2. Create another new column and use a formula to concatenate the local path with the file name. If the path is in cell A1 and the file name is in cell B1, you can use the formula =CONCATENATE(A1,"\",B1)
You should now have at least the following columns in Spreadsheet 3
1. Location of each file locally (path relative to where you will execute the aws copy command)
2. Bucket Id
3. Key field
Create the AWS copy command (aws s3 cp ".\filedirectory\test.txt" "s3://mybucket/parentId/recordId/test.txt") in a new column as follows:
1. In a new column, use the CONCATENATE formula to create the AWS copy command:
  =CONCATENATE("aws s3 cp ",CHAR(34), A1, CHAR(34)," ", CHAR(34),"s3://", B1, "/",C1, CHAR(34))
  where
  A1 is the local relative path to the file, including the file name
  B1 is the bucket name
  C1 is the key field
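You can also generate the command column in Python, which avoids building the CHAR(34) quoting by hand. This minimal sketch produces the same string as the CONCATENATE formula above and does not call AWS itself. The function name `cp_command` and the sample row are hypothetical.

```python
import ntpath


def cp_command(local_path: str, bucket: str, key: str) -> str:
    """Return: aws s3 cp "<local_path>" "s3://<bucket>/<key>" (same as the formula)."""
    return f'aws s3 cp "{local_path}" "s3://{bucket}/{key}"'


# Spreadsheet 3 rows: relative local directory, file name, bucket name, key.
rows = [
    (r"..\Files", "myfile.jpg", "mybucket",
     "0014x00000Hm3JlAAJ/a004x000006BG2bAAG/myfile.jpg"),
]

for local_dir, file_name, bucket, key in rows:
    # Equivalent of =CONCATENATE(A1,"\",B1) for the local path column.
    local_path = ntpath.join(local_dir, file_name)
    print(cp_command(local_path, bucket, key))
# aws s3 cp "..\Files\myfile.jpg" "s3://mybucket/0014x00000Hm3JlAAJ/a004x000006BG2bAAG/myfile.jpg"
```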

6: Use AWS CLI to upload the files

The AWS Command Line Interface (AWS CLI) will be used to upload your files to your S3 bucket. We’ll put the aws s3 cp commands from Spreadsheet 3 into a batch (.bat) file and run it.

Install AWS CLI
Refer to https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html for instructions
Configure AWS CLI
Refer to https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html for instructions
The following example shows sample values. Replace them with your own access key, secret access key, and region.
```
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: json
```
⚠️ Make sure the local paths in Spreadsheet 3 are relative to the directory you will run the bat file from (not the directory where AWS CLI is installed or configured)
For example, I’m running the commands from Desktop\Migration, and my files are in Desktop\Files, so the paths in my cp commands start with ..\Files
Create a bat file to run the AWS CLI copy command
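- Create a new text file (for example, in Notepad)
- From Spreadsheet 3, copy the command cells (without the header) created in step 5 above
- Paste them into the text file
- Use Save As, set the file type to All Files, and save the file with a .bat extension (for example, upload.bat)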
From a cmd window, change to the directory your local paths are relative to, then type the name of the bat file to run it. This executes all the aws s3 cp commands in the file, uploading the files to the S3 bucket.
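You can also write the bat file directly from a list of commands, which skips the copy, paste and Save As steps. This sketch assumes the commands are already built and that file names are plain ASCII. The function name `write_bat` and the default `upload.bat` are hypothetical. The script doubles any % sign, because cmd treats % as a variable marker inside batch files. It also writes Windows (CRLF) line endings.

```python
def write_bat(commands: list, bat_path: str = "upload.bat") -> None:
    """Write one aws s3 cp command per line to a Windows batch file.

    Assumes plain-ASCII file names: encoding="ascii" raises an error for any
    other character instead of silently writing a file cmd may misread.
    """
    # newline="\r\n" turns every "\n" written below into a Windows CRLF line ending.
    with open(bat_path, "w", newline="\r\n", encoding="ascii") as f:
        for cmd in commands:
            # Inside a batch file, a literal % must be written as %%.
            f.write(cmd.replace("%", "%%") + "\n")
```

For example, calling `write_bat(commands)` creates upload.bat in Python's current directory. Run that file from the directory your relative paths are based on.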

Adding Folders and Subfolders during or after Initial Migration

A file is displayed inside a folder when its Parent Folder Id field contains that folder’s record Id. To populate the field, the folders must be created first. You can do this with Data Loader in the same way you created the other file records.

Folder Creation

Folders are just file records (such as cg__AccountFile__c) with the following fields:

File Name--name of the folder
Parent or lookup field (such as cg__Account__c for an Account File or Parent__c for a custom file object)--set to the parent record id
Content Type--Folder (a text field that is set to “Folder” without the quotes)
WIP--set to false
File Size in Bytes--set to 0

Create a spreadsheet with the columns above and use the Data Loader Insert operation to create the folders.
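Here is a sketch of generating the folder rows for that spreadsheet. The column headers reuse the labels above and are placeholders for your object's field names. cg__Account__c is the Account File lookup named above. The function names are hypothetical. The optional Parent Folder Id column is left blank for top-level folders.

```python
import csv


def folder_row(folder_name: str, parent_record_id: str, parent_folder_id: str = "",
               parent_lookup: str = "cg__Account__c") -> dict:
    """One folder record: an ordinary file record with Content Type 'Folder' and size 0."""
    return {
        "File Name": folder_name,
        parent_lookup: parent_record_id,       # e.g. the Account the folder belongs to
        "Content Type": "Folder",
        "WIP": "false",
        "File Size in Bytes": 0,
        "Parent Folder Id": parent_folder_id,  # blank = top-level folder
    }


def write_folders(rows: list, out_csv: str) -> None:
    """Write folder rows to a CSV for a Data Loader insert."""
    if not rows:
        return
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```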

Putting Files in Folders

This step can be done during initial migration or afterwards.

Once the folders are created, get their record Ids. You can do this by exporting them with Data Loader.
Using Data Loader again, export the file records that need to be put in folders.
In the Parent Folder Id field of each file, fill in the record Id of the folder the file should be in.
Use the Data Loader Update operation to update the Parent Folder Id field.

The same process can be followed for subfolders. Subfolders are just folders whose Parent Folder Id is filled in with the record Id of the folder above them in the folder structure, so create and export the higher-level folders first.
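To find the record Id for a folder path such as Contracts/2023, you can walk the exported folder records from the root level down. This sketch rests on several assumptions. You have exported the folder records (Id, File Name, Parent Folder Id) for a single parent record. Folder names are unique within the same parent folder. Top-level folders export with a blank Parent Folder Id. The function name and sample Ids are hypothetical.

```python
def resolve_folder_id(folders: list, path: str) -> str:
    """Return the record Id of the last folder in a path like 'Contracts/2023'.

    folders: exported folder records for ONE parent record, as dicts with
    'Id', 'File Name' and 'Parent Folder Id' (blank for top-level folders).
    """
    current = ""  # blank Parent Folder Id = root level
    for name in path.split("/"):
        matches = [f["Id"] for f in folders
                   if f["File Name"] == name and f["Parent Folder Id"] == current]
        if len(matches) != 1:
            raise LookupError(f"expected exactly one folder {name!r} under {current or 'root'}")
        current = matches[0]
    return current


folders = [
    {"Id": "FOLDER-1", "File Name": "Contracts", "Parent Folder Id": ""},
    {"Id": "FOLDER-2", "File Name": "2023", "Parent Folder Id": "FOLDER-1"},
]
file_row = {"Id": "a004x000006BG2bAAG", "Parent Folder Id": ""}
file_row["Parent Folder Id"] = resolve_folder_id(folders, "Contracts/2023")
print(file_row["Parent Folder Id"])  # FOLDER-2
```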