Azure Data Factory for Data Migration

Azure Data Factory for Data Migration

For advanced users: Learn how to use Power Automate / Azure Data Factory to migrate data from other platforms into CommCare with example scripts, detailed steps, and troubleshooting tips.

Parent Cases

Step 1: Create XML Data in ADF

  1. Create a Data Flow:

    • In Azure Data Factory Studio, go to the Author tab.

    • Create a new Data Flow and name it HouseholdCaseXML.

  2. Add Source:

    • Select your Azure SQL Database dataset containing household data (hh_sqldata).

    • Ensure columns include household_id, access_to_healthcare_services, availability_of_health_insurance, access_to_clean_drinking_water, and case_id.

  3. Add Derived Column Transformation:

    • Name the transformation hhxml.

    • Create a derived column hh_xml using the following expression. This can be easily copied from the Raw XML section of any form submission within CommCare for a household registration.

    • Replace sections where inputs from the SQL DB are required. For example, case id, household_id and others as seen in this example.

    • Ensure that the instance_id has a UUID value for creating a unique ID for the submission.

    concat( '<?xml version="1.0" ?>', '<data xmlns:jrm="<http://dev.commcarehq.org/jr/xforms>" xmlns="<http://openrosa.org/formdesigner/4E761EC2-20EA-47EB-904E-557041DCAD1C>" uiVersion="1" version="77" name="Household Registration">', ' <household_id>', household_id, '</household_id>', ' <access_to_healthcare_services>', access_to_healthcare_services, '</access_to_healthcare_services>', ' <availability_of_health_insurance>', availability_of_health_insurance, '</availability_of_health_insurance>', ' <access_to_clean_drinking_water>', access_to_clean_drinking_water, '</access_to_clean_drinking_water>', ' <n0:case xmlns:n0="<http://commcarehq.org/case/transaction/v2>" case_id="', lower(case_id), '" date_modified="', currentTimestamp(), '+05:30" user_id="170794ebd70648668c3d8fa7728abfce">', ' <n0:create>', ' <n0:case_name>', household_id, '</n0:case_name>', ' <n0:owner_id>170794ebd70648668c3d8fa7728abfce</n0:owner_id>', ' <n0:case_type>household</n0:case_type>', ' </n0:create>', ' <n0:update>', ' <n0:access_to_clean_drinking_water>', access_to_clean_drinking_water, '</n0:access_to_clean_drinking_water>', ' <n0:access_to_healthcare_services>', access_to_healthcare_services, '</n0:access_to_healthcare_services>', ' <n0:availability_of_health_insurance>', availability_of_health_insurance, '</n0:availability_of_health_insurance>', ' <n0:household_id>', household_id, '</n0:household_id>', ' </n0:update>', ' </n0:case>', ' <n1:meta xmlns:n1="<http://openrosa.org/jr/xforms>">', ' <n1:deviceID>Formplayer</n1:deviceID>', ' <n1:timeStart>', currentTimestamp(), '+05:30</n1:timeStart>', ' <n1:timeEnd>', currentTimestamp(), '+05:30</n1:timeEnd>', ' <n1:username>migration</n1:username>', ' <n1:userID>170794ebd70648668c3d8fa7728abfce</n1:userID>', ' <n1:instanceID>', uuid(), '</n1:instanceID>', ' <n2:appVersion xmlns:n2="<http://commcarehq.org/xforms>">Formplayer Version: 2.53</n2:appVersion>', ' <n1:drift>0</n1:drift>', ' </n1:meta>', '</data>' )
  4. Add Sink:

    • Add a sink transformation to save the generated XML (hhxml) to your Azure SQL Database in the hh_submission table under the final_xml column.

  5. Publish the Data Flow:

    • Save and publish your Data Flow to make it available for execution.

  6. Execute the Data Flow

  • Trigger the Data Flow in your pipeline to generate and store the household case XML in the Azure SQL Database.

Step 2: Submit XML Data to CommCare

  1. Create a New Pipeline:

    • Name it Household Submission.

  2. Add Lookup Activity:

    • Add a Lookup activity to retrieve the final_xml data from the hh_submission table.

    • Configure the SQL query as follows:

    SELECT final_xml FROM hh_submission;
  3. Add Get Credentials Activity:

    • Use a Web Activity to retrieve the API key from Azure Key Vault.

    • Set the method to GET and point to the Key Vault secret.

  4. Add ForEach Activity:

    • Add a ForEach activity to iterate over the results of the Lookup activity.

    • In the ForEach activity, add a Web Activity to submit each final_xml to CommCare’s API.

  5. Configure Web Activity for Submission:

    • Set the HTTP method to POST.

    • Add a header for Authorization using the output from the GetCredentials activity.

    • Set the body to @item().final_xml to send the XML data.

  6. Publish and Execute the Pipeline:

    • Publish the pipeline and run it to submit your household case XMLs to CommCare.

Child Cases

This tutorial provides a comprehensive walkthrough on creating and submitting household member cases XML for CommCare using Azure Data Factory (ADF). The process entails generating XML data in ADF, storing it in an Azure SQL Database, and subsequently posting it to CommCare’s API.

Step 1: Create Household Member XML Data in ADF

  1. Create a Data Flow:

    • In Azure Data Factory Studio, go to the Author tab.

    • Create a new Data Flow and name it HouseholdMemberXML.

  2. Add Source:

    • Select your Azure SQL Database dataset containing household member data (hm_sqldata).

    • Ensure columns include Household_ID, Member_ID, Name, Age, Gender, Relation_to_Head, case_id, and parent_case_id.

  3. Add Derived Column Transformation for Household Member Details Block:

    • Name the transformation EditHouseholdMemberXMLBlock.

    • Create a derived column hm_xml_snippet using the following expression. This can be easily copied from the Raw XML section of any form submission within CommCare for a household member registration.

    concat( ' <household_member_details>', ' <n0:case xmlns:n0="<http://commcarehq.org/case/transaction/v2>" case_id="', lower(case_id), '" date_modified="', $formatted_datetime, '" user_id="', $mapped_userid, '">', ' <n0:create>', ' <n0:case_name>', Name, '</n0:case_name>', ' <n0:owner_id>', $mapped_ownerid, '</n0:owner_id>', ' <n0:case_type>household_member</n0:case_type>', ' </n0:create>', ' <n0:update>', ' <n0:age>', toString(Age), '</n0:age>', ' <n0:gender>', Gender, '</n0:gender>', ' <n0:member_id>', Member_ID, '</n0:member_id>', ' <n0:relation_to_head>', Relation_to_Head, '</n0:relation_to_head>', ' </n0:update>', ' <n0:index>', ' <n0:parent case_type="household">', lower(parent_case_id), '</n0:parent>', ' </n0:index>', ' </n0:case>', ' <member_id>', Member_ID, '</member_id>', ' <name>', Name, '</name>', ' <age>', toString(Age), '</age>', ' <gender>', Gender, '</gender>', ' <relation_to_head>', Relation_to_Head, '</relation_to_head>', ' </household_member_details>' )
  4. Add Aggregate Transformation:

    • Name the transformation AggregatebyParentCaseID.

    • Aggregate by parent_case_id to group household members under their respective parents.

    • Use collect() to gather all hm_xml_snippet into an array.

  5. Convert Array to String:

    • Add a derived column to convert the aggregated array into a string and clean up the formatting.

    • Use the following expression:

    replace( replace( replace( replace(hm_xml_string, '["', ''), '"]', ''), '\\\\\\\\n', ''), '\\\\', '')
  6. Add Sink:

    • Add a sink transformation to save the generated XML to your Azure SQL Database in the hm_submission table under the child_xml_block column.

  7. Publish the Data Flow:

    • Save and publish your Data Flow to make it available for execution.

1.2 Execute the Data Flow

  • Trigger the Data Flow in your pipeline to generate and store the household member case XML in the Azure SQL Database.

Step 2: Combine with Parent XML and Finalize Submission

  1. Create a Data Flow:

    • In Azure Data Factory Studio, go to the Author tab.

    • Create a new Data Flow and name it ParentCaseBlock.

  2. Add a Source Transformation:

    • Name the source HouseholdMembers.

    • Use your Azure SQL Database dataset (hm_sqldata) as the source.

    • Ensure the following columns are available:

    • Household_ID, Member_ID, Name, Age, Gender, Relation_to_Head, case_id, parent_case_id (or any other relevant fields)

  3. Add a Window Transformation:

    • Name it SplitNodes.

    • This transformation will generate a row number for each household member under the same parent case, which is useful for uniquely identifying each XML block.

    • Configure the window to partition by parent_case_id and order by Member_ID.

window( over(parent_case_id), asc(Member_ID, true), node = rowNumber()-1 )
  1. Add an Aggregate Transformation:

    • Name it AggregateByParent.

    • Aggregate the data by parent_case_id to find the maximum node number for each parent case.

aggregate( groupBy(parent_case_id), max_node = max(node) )
  1. Join Aggregated Data with Source:

  • Name this transformation AggregateCaseNodes.

  • Join the output of SplitNodes with AggregateByParent on the parent_case_id.

  • This will allow you to create a parent case XML block that contains all related household members.

join( HouseholdMembers@parent_case_id == AggregateByParent@parent_case_id, joinType: 'left', matchType: 'exact' )
  1. Add a Filter Transformation:

    • Name it OnlyParentCases.

    • Filter the rows to isolate the maximum node value for each parent case.

filter(node == max_node)

Step 2: Create Parent Case XML Block

  1. Add a Derived Column Transformation:

    • Name it ParentCaseXMLBlock.

    • Use this transformation to generate the XML for the parent case block.

    • The derived column should be named parent_case_xmlblock and use the following expression:

    concat( '<n1:case xmlns:n1="<http://commcarehq.org/case/transaction/v2>" case_id="', lower(AggregateByParent@parent_case_id), '" date_modified="', $formatted_datetime, '" user_id="', $mapped_userid, '"/>' )
  2. Finalize Parent Case XML:

    • Add another derived column transformation named FinalParentCaseXMLBlock.

    • This transformation will ensure that the node value in the XML is unique by replacing placeholders with the correct node numbers.

    derive( finalparentxmlblock = replace(parent_case_xmlblock, 'n1', concat('n', toString(max_node + 1))) )

Step 3: Generate Footer XML Block

  1. Derived Column for Footer XML:

    • Add a derived column transformation named FooterXMLBlock.

    • This block creates the footer XML for the submission.

    derive( xml_footer = replace($xml_footer_hm, 'n3', concat('n', toString(max_node + 3))) )
  2. Final Footer XML Block:

    • Add a derived column named FinalFooterXMLBlock.

    • This will finalize the footer by ensuring all placeholders are replaced correctly.

    derive( xml_footer = replace(xml_footer, 'n2', concat('n', toString(max_node + 2))) )

Step 4: Join and Combine XML Blocks

  1. Join Transformation:

    • Add a join transformation named JoinwithFooterXML.

    • Combine the header, parent case, and footer XML blocks to create the final XML document.

    join( CleanUp@parent_case_id == AggregateByParent@parent_case_id, joinType: 'outer', matchType: 'exact' )
  2. Combine Final XML:

    • Add a derived column transformation named FinalParentandFooterXML.

    • Combine all XML parts into the final submission XML.

    derive( parent_case_xmlandxml_footer = concat(finalparentxmlblock, '\\n', xml_footer) )

Step 5: Upsert the Data to SQL Database

  1. Add an Upsert Transformation:

    • Name it UpsertCondition.

    • Use an alterRow() transformation to determine if the row should be upserted based on the parent_case_id.

    alterRow( upsertIf(true()) )
  2. Configure the Sink:

    • Name the sink transformation SQLDBSink.

    • Configure it to upsert the XML data into the hm_submission table, updating the parent_xml_block column.

    sink( allowSchemaDrift: true, validateSchema: false, input( child_xml_block as string, parent_case_id as string, parent_xml_block as string, header_xml_block as string ), deletable: false, insertable: false, updateable: false, upsertable: true, keys: ['parent_case_id'], format: 'table', postSQLs: ['UPDATE hm_submission SET final_xml = ISNULL(header_xml_block, \\'\\') + ISNULL(child_xml_block, \\'\\') + ISNULL(parent_xml_block, \\'\\') WHERE final_xml IS NULL OR final_xml = \\'\\';'] )

Step 6: Submit XML Data to CommCare

  1. Create a Pipeline:

    • Name it Household Member Submission.

  2. Add Lookup Activity:

    • Add a Lookup activity to retrieve the final_xml data from the hm_submission table.

    • Configure the SQL query as follows:

    SELECT final_xml FROM hm_submission;
  3. Add Get Credentials Activity:

    • Use a Web Activity to retrieve the API key from Azure Key Vault.

    • Set the method to GET and point to the Key Vault secret.

  4. Add ForEach Activity:

    • Add a ForEach activity to iterate over the results of the Lookup activity.

    • In the ForEach activity, add a Web Activity to submit each final_xml to CommCare’s Form Submission API.

  5. Configure Web Activity for Submission:

    • Set the HTTP method to POST.

    • Add a header for Authorization using the output from the GetCredentials activity.

    • Set the body to @item().final_xml to send the XML data.

  6. Publish and Execute the Pipeline:

    • Publish the pipeline and run it to submit your household member case XMLs to CommCare.

Repeat Group Flow Configuration

Step 1: Create Visit Cases XML Data in ADF

  1. Create a Data Flow:

    • In Azure Data Factory Studio, go to the Author tab.

    • Create a new Data Flow and name it VisitCaseXML.

  2. Add Sources:

    • Visits Source: Add a source transformation and name it Visits. Connect it to your Azure SQL Database dataset (visit_sqldata). This source will contain columns such as Household_ID, Member_ID, Visit_ID, Visit_Date, Health_Score, Weight, Vaccinated, case_id, parent_case_id, and Visit_Count.

    • Meds Source: Add another source transformation and name it Meds. Connect it to your Azure SQL Database dataset (med_sqldata). This source will contain columns such as Medication_Name, Dosage, and parent_case_id.

  3. Join Transformations:

    • Join Visits and Meds: Add a join transformation named JoinVisitsandMeds. Join the Visits and Meds sources on the parent_case_id. This step ensures that each visit is associated with the corresponding medications.

  4. Create Visit and Medication XML Blocks:

    • Split Nodes: Add a window transformation named SplitNodes to create row numbers for each visit associated with a household member.

      window( over(case_id), asc(Visit_ID, true), node = rowNumber()-1 )
    • Edit Visit and Med XML Block: Add a derived column transformation named EditVisitandMedXMLBlock. Use the following expressions to create the XML snippets for visits and medications. This can be easily copied from the Raw XML section of any form submission within CommCare for a visits registration.

      visitblock_xml = concat( '<visit_counter>', Visit_Count, '</visit_counter>', '<visit_date>', Visit_Date, '</visit_date>', '<health_score>', Health_Score, '</health_score>', '<weight>', Weight, '</weight>', '<vaccinated>', Vaccinated, '</vaccinated>', '<subcase_0>', ' <n0:case xmlns:n0="<http://commcarehq.org/case/transaction/v2>" case_id="', lower(case_id), '" date_modified="', $formatted_datetime, '" user_id="', $mapped_userid, '">', ' <n0:create>', ' <n0:case_name>', Visit_Count, '</n0:case_name>', ' <n0:owner_id>', $mapped_ownerid, '</n0:owner_id>', ' <n0:case_type>visit</n0:case_type>', ' </n0:create>', ' <n0:update>', ' <n0:health_score>', Health_Score, '</n0:health_score>', ' <n0:visit_date>', Visit_Date, '</n0:visit_date>', ' <n0:weight>', Weight, '</n0:weight>', ' </n0:update>', ' <n0:index>', ' <n0:parent case_type="household_member">', lower(parent_case_id), '</n0:parent>', ' </n0:index>', ' </n0:case>', '</subcase_0>', '<n1:case xmlns:n1="<http://commcarehq.org/case/transaction/v2>" case_id="', lower(case_id), '" date_modified="', $formatted_datetime, '" user_id="', $mapped_userid, '">', ' <n1:update>', ' <n1:visit_counter>', Visit_Count, '</n1:visit_counter>', ' </n1:update>', '</n1:case>' ) medblock_xml = concat( '<medication>', ' <medicine_name>', Medication_Name, '</medicine_name>', ' <dosage>', Dosage, '</dosage>', '</medication>' )
  5. Aggregate and Convert to String:

    • Aggregate by Parent Case ID: Add an aggregate transformation named AggregatebyParentCaseID. Group by case_id and collect all visit and medication XML snippets into arrays.

      aggregate( groupBy(case_id), visitblock_xml_array = collect(visitblock_xml), medblock_xml_array = collect(medblock_xml) )
    • Convert to String: Add a derived column transformation named ConverttoString to convert the collected arrays into strings.

      derive( visitblock_xml_string = toString(visitblock_xml_array), medblock_xml_string = toString(medblock_xml_array) )
  6. Fix Formatting:

    • Fix Formatting: Add a derived column transformation named FixFormatting to remove unwanted characters and add line breaks where necessary.

      replace( replace( replace( replace(visitblock_xml_string, '[\\"', ''), '\\"]', ''), '\\\\\\\\n', ''), '\\\\\\\\', '')
  7. Combine XML Blocks:

    • Final Output: Add a derived column transformation named FinalOutput. Combine the header, visit, medication, and footer XML blocks into the final XML.

      derive( final_xml = concat(header_xml_string, visitblock_xml_string, medblock_xml_string, footer_xml_string) )
  8. Sink Transformation:

    • SQLDB Sink: Add a sink transformation named SQLDBSink. Configure it to upsert the XML data into the visit_submission table in your Azure SQL Database.

Step 2: Submit Visit XML Data to CommCare

  1. Create a New Pipeline:

    • Name it Visits Submission.

  2. Add Lookup Activity:

    • Lookup SQL Data: Add a Lookup activity named LookupSQLDB. Retrieve the final_xml data from the visit_submission table using the following SQL query:

      SELECT final_xml FROM visit_submission;
  3. Add Get Credentials Activity:

    • Get API Key: Add a Web Activity named GetCredentials to retrieve the API key from Azure Key Vault. Set the method to GET and configure the URL to point to your Key Vault secret.

  4. Add ForEach Activity:

    • Submit Forms: Add a ForEach activity named SubmitForm to iterate over the results of the Lookup activity. Inside the ForEach activity, add a Web Activity named CCFormSubmissionAPI to submit each final_xml to CommCare’s API.

  5. Configure Web Activity for Submission:

    • CCFormSubmissionAPI: Set the HTTP method to POST. Add a header for Authorization using the output from the GetCredentials activity. Set the body to @item().final_xml to send the XML data.

  6. Publish and Execute the Pipeline:

    • Publish the pipeline and run it to submit your visit case XMLs to CommCare.