Azure Data Factory for Data Migration
For advanced users: Learn how to use Power Automate / Azure Data Factory to migrate data from other platforms into CommCare with example scripts, detailed steps, and troubleshooting tips.
- 1 Parent Cases
- 2 Child Cases
- 2.1 Step 1: Create Household Member XML Data in ADF
- 2.2 Step 2: Combine with Parent XML and Finalize Submission
- 2.3 Step 2: Create Parent Case XML Block
- 2.4 Step 3: Generate Footer XML Block
- 2.5 Step 4: Join and Combine XML Blocks
- 2.6 Step 5: Upsert the Data to SQL Database
- 2.7 Step 6: Submit XML Data to CommCare
- 3 Repeat Group Flow Configuration
Parent Cases
Step 1: Create XML Data in ADF
Create a Data Flow:
In Azure Data Factory Studio, go to the Author tab.
Create a new Data Flow and name it
HouseholdCaseXML.
Add Source:
Select your Azure SQL Database dataset containing household data (
hh_sqldata).Ensure columns include
household_id,access_to_healthcare_services,availability_of_health_insurance,access_to_clean_drinking_water, andcase_id.
Add Derived Column Transformation:
Name the transformation
hhxml.Create a derived column
hh_xmlusing the following expression. This can be easily copied from the Raw XML section of any form submission within CommCare for a household registration.Replace sections where inputs from the SQL DB are required. For example, case id, household_id and others as seen in this example.
Ensure that the instance_id has a UUID value for creating a unique ID for the submission.
concat( '<?xml version="1.0" ?>', '<data xmlns:jrm="<http://dev.commcarehq.org/jr/xforms>" xmlns="<http://openrosa.org/formdesigner/4E761EC2-20EA-47EB-904E-557041DCAD1C>" uiVersion="1" version="77" name="Household Registration">', ' <household_id>', household_id, '</household_id>', ' <access_to_healthcare_services>', access_to_healthcare_services, '</access_to_healthcare_services>', ' <availability_of_health_insurance>', availability_of_health_insurance, '</availability_of_health_insurance>', ' <access_to_clean_drinking_water>', access_to_clean_drinking_water, '</access_to_clean_drinking_water>', ' <n0:case xmlns:n0="<http://commcarehq.org/case/transaction/v2>" case_id="', lower(case_id), '" date_modified="', currentTimestamp(), '+05:30" user_id="170794ebd70648668c3d8fa7728abfce">', ' <n0:create>', ' <n0:case_name>', household_id, '</n0:case_name>', ' <n0:owner_id>170794ebd70648668c3d8fa7728abfce</n0:owner_id>', ' <n0:case_type>household</n0:case_type>', ' </n0:create>', ' <n0:update>', ' <n0:access_to_clean_drinking_water>', access_to_clean_drinking_water, '</n0:access_to_clean_drinking_water>', ' <n0:access_to_healthcare_services>', access_to_healthcare_services, '</n0:access_to_healthcare_services>', ' <n0:availability_of_health_insurance>', availability_of_health_insurance, '</n0:availability_of_health_insurance>', ' <n0:household_id>', household_id, '</n0:household_id>', ' </n0:update>', ' </n0:case>', ' <n1:meta xmlns:n1="<http://openrosa.org/jr/xforms>">', ' <n1:deviceID>Formplayer</n1:deviceID>', ' <n1:timeStart>', currentTimestamp(), '+05:30</n1:timeStart>', ' <n1:timeEnd>', currentTimestamp(), '+05:30</n1:timeEnd>', ' <n1:username>migration</n1:username>', ' <n1:userID>170794ebd70648668c3d8fa7728abfce</n1:userID>', ' <n1:instanceID>', uuid(), '</n1:instanceID>', ' <n2:appVersion xmlns:n2="<http://commcarehq.org/xforms>">Formplayer Version: 2.53</n2:appVersion>', ' <n1:drift>0</n1:drift>', ' </n1:meta>', '</data>' )Add Sink:
Add a sink transformation to save the generated XML (
hhxml) to your Azure SQL Database in thehh_submissiontable under thefinal_xmlcolumn.
Publish the Data Flow:
Save and publish your Data Flow to make it available for execution.
Execute the Data Flow
Trigger the Data Flow in your pipeline to generate and store the household case XML in the Azure SQL Database.
Step 2: Submit XML Data to CommCare
Create a New Pipeline:
Name it
Household Submission.
Add Lookup Activity:
Add a Lookup activity to retrieve the
final_xmldata from thehh_submissiontable.Configure the SQL query as follows:
SELECT final_xml FROM hh_submission;Add Get Credentials Activity:
Use a Web Activity to retrieve the API key from Azure Key Vault.
Set the method to
GETand point to the Key Vault secret.
Add ForEach Activity:
Add a ForEach activity to iterate over the results of the Lookup activity.
In the ForEach activity, add a Web Activity to submit each
final_xmlto CommCare’s API.
Configure Web Activity for Submission:
Set the HTTP method to
POST.Add a header for
Authorizationusing the output from theGetCredentialsactivity.Set the body to
@item().final_xmlto send the XML data.
Publish and Execute the Pipeline:
Publish the pipeline and run it to submit your household case XMLs to CommCare.
Child Cases
This tutorial provides a comprehensive walkthrough on creating and submitting household member cases XML for CommCare using Azure Data Factory (ADF). The process entails generating XML data in ADF, storing it in an Azure SQL Database, and subsequently posting it to CommCare’s API.
Step 1: Create Household Member XML Data in ADF
Create a Data Flow:
In Azure Data Factory Studio, go to the Author tab.
Create a new Data Flow and name it
HouseholdMemberXML.
Add Source:
Select your Azure SQL Database dataset containing household member data (
hm_sqldata).Ensure columns include
Household_ID,Member_ID,Name,Age,Gender,Relation_to_Head,case_id, andparent_case_id.
Add Derived Column Transformation for Household Member Details Block:
Name the transformation
EditHouseholdMemberXMLBlock.Create a derived column
hm_xml_snippetusing the following expression. This can be easily copied from the Raw XML section of any form submission within CommCare for a household member registration.
concat( ' <household_member_details>', ' <n0:case xmlns:n0="<http://commcarehq.org/case/transaction/v2>" case_id="', lower(case_id), '" date_modified="', $formatted_datetime, '" user_id="', $mapped_userid, '">', ' <n0:create>', ' <n0:case_name>', Name, '</n0:case_name>', ' <n0:owner_id>', $mapped_ownerid, '</n0:owner_id>', ' <n0:case_type>household_member</n0:case_type>', ' </n0:create>', ' <n0:update>', ' <n0:age>', toString(Age), '</n0:age>', ' <n0:gender>', Gender, '</n0:gender>', ' <n0:member_id>', Member_ID, '</n0:member_id>', ' <n0:relation_to_head>', Relation_to_Head, '</n0:relation_to_head>', ' </n0:update>', ' <n0:index>', ' <n0:parent case_type="household">', lower(parent_case_id), '</n0:parent>', ' </n0:index>', ' </n0:case>', ' <member_id>', Member_ID, '</member_id>', ' <name>', Name, '</name>', ' <age>', toString(Age), '</age>', ' <gender>', Gender, '</gender>', ' <relation_to_head>', Relation_to_Head, '</relation_to_head>', ' </household_member_details>' )Add Aggregate Transformation:
Name the transformation
AggregatebyParentCaseID.Aggregate by
parent_case_idto group household members under their respective parents.Use
collect()to gather allhm_xml_snippetinto an array.
Convert Array to String:
Add a derived column to convert the aggregated array into a string and clean up the formatting.
Use the following expression:
replace( replace( replace( replace(hm_xml_string, '["', ''), '"]', ''), '\\\\\\\\n', ''), '\\\\', '')Add Sink:
Add a sink transformation to save the generated XML to your Azure SQL Database in the
hm_submissiontable under thechild_xml_blockcolumn.
Publish the Data Flow:
Save and publish your Data Flow to make it available for execution.
1.2 Execute the Data Flow
Trigger the Data Flow in your pipeline to generate and store the household member case XML in the Azure SQL Database.
Step 2: Combine with Parent XML and Finalize Submission
Create a Data Flow:
In Azure Data Factory Studio, go to the Author tab.
Create a new Data Flow and name it
ParentCaseBlock.
Add a Source Transformation:
Name the source
HouseholdMembers.Use your Azure SQL Database dataset (
hm_sqldata) as the source.Ensure the following columns are available:
Household_ID, Member_ID, Name, Age, Gender, Relation_to_Head, case_id, parent_case_id(or any other relevant fields)
Add a Window Transformation:
Name it
SplitNodes.This transformation will generate a row number for each household member under the same parent case, which is useful for uniquely identifying each XML block.
Configure the window to partition by
parent_case_idand order byMember_ID.
window(
over(parent_case_id),
asc(Member_ID, true),
node = rowNumber()-1
)Add an Aggregate Transformation:
Name it
AggregateByParent.Aggregate the data by
parent_case_idto find the maximum node number for each parent case.
aggregate(
groupBy(parent_case_id),
max_node = max(node)
)Join Aggregated Data with Source:
Name this transformation
AggregateCaseNodes.Join the output of
SplitNodeswithAggregateByParenton theparent_case_id.This will allow you to create a parent case XML block that contains all related household members.
join(
HouseholdMembers@parent_case_id == AggregateByParent@parent_case_id,
joinType: 'left',
matchType: 'exact'
)Add a Filter Transformation:
Name it
OnlyParentCases.Filter the rows to isolate the maximum node value for each parent case.
filter(node == max_node)Step 2: Create Parent Case XML Block
Add a Derived Column Transformation:
Name it
ParentCaseXMLBlock.Use this transformation to generate the XML for the parent case block.
The derived column should be named
parent_case_xmlblockand use the following expression:
concat( '<n1:case xmlns:n1="<http://commcarehq.org/case/transaction/v2>" case_id="', lower(AggregateByParent@parent_case_id), '" date_modified="', $formatted_datetime, '" user_id="', $mapped_userid, '"/>' )Finalize Parent Case XML:
Add another derived column transformation named
FinalParentCaseXMLBlock.This transformation will ensure that the node value in the XML is unique by replacing placeholders with the correct node numbers.
derive( finalparentxmlblock = replace(parent_case_xmlblock, 'n1', concat('n', toString(max_node + 1))) )
Step 3: Generate Footer XML Block
Derived Column for Footer XML:
Add a derived column transformation named
FooterXMLBlock.This block creates the footer XML for the submission.
derive( xml_footer = replace($xml_footer_hm, 'n3', concat('n', toString(max_node + 3))) )Final Footer XML Block:
Add a derived column named
FinalFooterXMLBlock.This will finalize the footer by ensuring all placeholders are replaced correctly.
derive( xml_footer = replace(xml_footer, 'n2', concat('n', toString(max_node + 2))) )
Step 4: Join and Combine XML Blocks
Join Transformation:
Add a join transformation named
JoinwithFooterXML.Combine the header, parent case, and footer XML blocks to create the final XML document.
join( CleanUp@parent_case_id == AggregateByParent@parent_case_id, joinType: 'outer', matchType: 'exact' )Combine Final XML:
Add a derived column transformation named
FinalParentandFooterXML.Combine all XML parts into the final submission XML.
derive( parent_case_xmlandxml_footer = concat(finalparentxmlblock, '\\n', xml_footer) )
Step 5: Upsert the Data to SQL Database
Add an Upsert Transformation:
Name it
UpsertCondition.Use an
alterRow()transformation to determine if the row should be upserted based on theparent_case_id.
alterRow( upsertIf(true()) )Configure the Sink:
Name the sink transformation
SQLDBSink.Configure it to upsert the XML data into the
hm_submissiontable, updating theparent_xml_blockcolumn.
sink( allowSchemaDrift: true, validateSchema: false, input( child_xml_block as string, parent_case_id as string, parent_xml_block as string, header_xml_block as string ), deletable: false, insertable: false, updateable: false, upsertable: true, keys: ['parent_case_id'], format: 'table', postSQLs: ['UPDATE hm_submission SET final_xml = ISNULL(header_xml_block, \\'\\') + ISNULL(child_xml_block, \\'\\') + ISNULL(parent_xml_block, \\'\\') WHERE final_xml IS NULL OR final_xml = \\'\\';'] )
Step 6: Submit XML Data to CommCare
Create a Pipeline:
Name it
Household Member Submission.
Add Lookup Activity:
Add a Lookup activity to retrieve the
final_xmldata from thehm_submissiontable.Configure the SQL query as follows:
SELECT final_xml FROM hm_submission;Add Get Credentials Activity:
Use a Web Activity to retrieve the API key from Azure Key Vault.
Set the method to
GETand point to the Key Vault secret.
Add ForEach Activity:
Add a ForEach activity to iterate over the results of the Lookup activity.
In the ForEach activity, add a Web Activity to submit each
final_xmlto CommCare’s Form Submission API.
Configure Web Activity for Submission:
Set the HTTP method to
POST.Add a header for
Authorizationusing the output from theGetCredentialsactivity.Set the body to
@item().final_xmlto send the XML data.
Publish and Execute the Pipeline:
Publish the pipeline and run it to submit your household member case XMLs to CommCare.
Repeat Group Flow Configuration
Step 1: Create Visit Cases XML Data in ADF
Create a Data Flow:
In Azure Data Factory Studio, go to the Author tab.
Create a new Data Flow and name it
VisitCaseXML.
Add Sources:
Visits Source: Add a source transformation and name it
Visits. Connect it to your Azure SQL Database dataset (visit_sqldata). This source will contain columns such asHousehold_ID,Member_ID,Visit_ID,Visit_Date,Health_Score,Weight,Vaccinated,case_id,parent_case_id, andVisit_Count.Meds Source: Add another source transformation and name it
Meds. Connect it to your Azure SQL Database dataset (med_sqldata). This source will contain columns such asMedication_Name,Dosage, andparent_case_id.
Join Transformations:
Join Visits and Meds: Add a join transformation named
JoinVisitsandMeds. Join theVisitsandMedssources on theparent_case_id. This step ensures that each visit is associated with the corresponding medications.
Create Visit and Medication XML Blocks:
Split Nodes: Add a window transformation named
SplitNodesto create row numbers for each visit associated with a household member.window( over(case_id), asc(Visit_ID, true), node = rowNumber()-1 )Edit Visit and Med XML Block: Add a derived column transformation named
EditVisitandMedXMLBlock. Use the following expressions to create the XML snippets for visits and medications. This can be easily copied from the Raw XML section of any form submission within CommCare for a visits registration.visitblock_xml = concat( '<visit_counter>', Visit_Count, '</visit_counter>', '<visit_date>', Visit_Date, '</visit_date>', '<health_score>', Health_Score, '</health_score>', '<weight>', Weight, '</weight>', '<vaccinated>', Vaccinated, '</vaccinated>', '<subcase_0>', ' <n0:case xmlns:n0="<http://commcarehq.org/case/transaction/v2>" case_id="', lower(case_id), '" date_modified="', $formatted_datetime, '" user_id="', $mapped_userid, '">', ' <n0:create>', ' <n0:case_name>', Visit_Count, '</n0:case_name>', ' <n0:owner_id>', $mapped_ownerid, '</n0:owner_id>', ' <n0:case_type>visit</n0:case_type>', ' </n0:create>', ' <n0:update>', ' <n0:health_score>', Health_Score, '</n0:health_score>', ' <n0:visit_date>', Visit_Date, '</n0:visit_date>', ' <n0:weight>', Weight, '</n0:weight>', ' </n0:update>', ' <n0:index>', ' <n0:parent case_type="household_member">', lower(parent_case_id), '</n0:parent>', ' </n0:index>', ' </n0:case>', '</subcase_0>', '<n1:case xmlns:n1="<http://commcarehq.org/case/transaction/v2>" case_id="', lower(case_id), '" date_modified="', $formatted_datetime, '" user_id="', $mapped_userid, '">', ' <n1:update>', ' <n1:visit_counter>', Visit_Count, '</n1:visit_counter>', ' </n1:update>', '</n1:case>' ) medblock_xml = concat( '<medication>', ' <medicine_name>', Medication_Name, '</medicine_name>', ' <dosage>', Dosage, '</dosage>', '</medication>' )
Aggregate and Convert to String:
Aggregate by Parent Case ID: Add an aggregate transformation named
AggregatebyParentCaseID. Group bycase_idand collect all visit and medication XML snippets into arrays.aggregate( groupBy(case_id), visitblock_xml_array = collect(visitblock_xml), medblock_xml_array = collect(medblock_xml) )Convert to String: Add a derived column transformation named
ConverttoStringto convert the collected arrays into strings.derive( visitblock_xml_string = toString(visitblock_xml_array), medblock_xml_string = toString(medblock_xml_array) )
Fix Formatting:
Fix Formatting: Add a derived column transformation named
FixFormattingto remove unwanted characters and add line breaks where necessary.replace( replace( replace( replace(visitblock_xml_string, '[\\"', ''), '\\"]', ''), '\\\\\\\\n', ''), '\\\\\\\\', '')
Combine XML Blocks:
Final Output: Add a derived column transformation named
FinalOutput. Combine the header, visit, medication, and footer XML blocks into the final XML.derive( final_xml = concat(header_xml_string, visitblock_xml_string, medblock_xml_string, footer_xml_string) )
Sink Transformation:
SQLDB Sink: Add a sink transformation named
SQLDBSink. Configure it to upsert the XML data into thevisit_submissiontable in your Azure SQL Database.
Step 2: Submit Visit XML Data to CommCare
Create a New Pipeline:
Name it
Visits Submission.
Add Lookup Activity:
Lookup SQL Data: Add a Lookup activity named
LookupSQLDB. Retrieve thefinal_xmldata from thevisit_submissiontable using the following SQL query:SELECT final_xml FROM visit_submission;
Add Get Credentials Activity:
Get API Key: Add a Web Activity named
GetCredentialsto retrieve the API key from Azure Key Vault. Set the method toGETand configure the URL to point to your Key Vault secret.
Add ForEach Activity:
Submit Forms: Add a ForEach activity named
SubmitFormto iterate over the results of the Lookup activity. Inside the ForEach activity, add a Web Activity namedCCFormSubmissionAPIto submit eachfinal_xmlto CommCare’s API.
Configure Web Activity for Submission:
CCFormSubmissionAPI: Set the HTTP method to
POST. Add a header forAuthorizationusing the output from theGetCredentialsactivity. Set the body to@item().final_xmlto send the XML data.
Publish and Execute the Pipeline:
Publish the pipeline and run it to submit your visit case XMLs to CommCare.