Create a pipeline in Azure Data Factory

Below is an Azure CLI script that creates an Azure Data Factory (ADF) instance and sets up a basic copy pipeline to move data from a source (Azure Blob Storage) to a destination (Azure SQL Database).


Prerequisites

1. Azure CLI installed and authenticated, with the datafactory extension added (az extension add --name datafactory), since the az datafactory commands ship as an extension.

2. Required Azure resources created (a sketch for creating them follows this list):

• Azure Blob Storage with a container and a sample file.

• Azure SQL Database with a table to hold the copied data.

3. Replace placeholders (e.g., <RESOURCE_GROUP_NAME>) with actual values.
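
If the storage and SQL resources don't exist yet, the sketch below creates minimal versions of them with standard Azure CLI commands. It reuses the placeholder variables from the main script; the SKU choices, the firewall rule, and the local sample file (./sample.csv) are assumptions to adjust for your environment.

# Sketch: create the prerequisite resources (assumes the variables from the main script are set)
az extension add --name datafactory

az group create --name $RESOURCE_GROUP --location $LOCATION

# Storage account, container, and a sample file to copy
az storage account create \
 --name $STORAGE_ACCOUNT \
 --resource-group $RESOURCE_GROUP \
 --location $LOCATION \
 --sku Standard_LRS
STORAGE_ACCOUNT_KEY=$(az storage account keys list \
 --account-name $STORAGE_ACCOUNT \
 --resource-group $RESOURCE_GROUP \
 --query "[0].value" -o tsv)
az storage container create \
 --name $BLOB_CONTAINER \
 --account-name $STORAGE_ACCOUNT \
 --account-key $STORAGE_ACCOUNT_KEY
az storage blob upload \
 --account-name $STORAGE_ACCOUNT \
 --account-key $STORAGE_ACCOUNT_KEY \
 --container-name $BLOB_CONTAINER \
 --name sample.csv \
 --file ./sample.csv  # hypothetical local sample file

# Logical SQL server, database, and a firewall rule so Azure services (including ADF) can connect
az sql server create \
 --name $SQL_SERVER_NAME \
 --resource-group $RESOURCE_GROUP \
 --location $LOCATION \
 --admin-user $SQL_USERNAME \
 --admin-password $SQL_PASSWORD
az sql db create \
 --resource-group $RESOURCE_GROUP \
 --server $SQL_SERVER_NAME \
 --name $SQL_DATABASE_NAME \
 --service-objective S0
az sql server firewall-rule create \
 --resource-group $RESOURCE_GROUP \
 --server $SQL_SERVER_NAME \
 --name AllowAzureServices \
 --start-ip-address 0.0.0.0 \
 --end-ip-address 0.0.0.0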


Script: Create Azure Data Factory and Copy Pipeline


# Variables
RESOURCE_GROUP="<RESOURCE_GROUP_NAME>"
LOCATION="<LOCATION>"
DATA_FACTORY_NAME="<DATA_FACTORY_NAME>"
STORAGE_ACCOUNT="<STORAGE_ACCOUNT_NAME>"
STORAGE_ACCOUNT_KEY="<STORAGE_ACCOUNT_KEY>"  # needed by the Blob Storage linked service
BLOB_CONTAINER="<BLOB_CONTAINER_NAME>"
SQL_SERVER_NAME="<SQL_SERVER_NAME>"
SQL_DATABASE_NAME="<SQL_DATABASE_NAME>"
SQL_USERNAME="<SQL_USERNAME>"
SQL_PASSWORD="<SQL_PASSWORD>"
PIPELINE_NAME="CopyPipeline"
DATASET_SOURCE_NAME="BlobDataset"
DATASET_DEST_NAME="SQLDataset"
LINKED_SERVICE_BLOB="BlobLinkedService"
LINKED_SERVICE_SQL="SQLLinkedService"


# Create Azure Data Factory
az datafactory create \
 --resource-group $RESOURCE_GROUP \
 --location $LOCATION \
 --factory-name $DATA_FACTORY_NAME


# Create Linked Service for Azure Blob Storage (the connection string needs the account key to authenticate)
az datafactory linked-service create \
 --resource-group $RESOURCE_GROUP \
 --factory-name $DATA_FACTORY_NAME \
 --linked-service-name $LINKED_SERVICE_BLOB \
 --properties "{\"type\": \"AzureBlobStorage\", \"typeProperties\": {\"connectionString\": \"DefaultEndpointsProtocol=https;AccountName=$STORAGE_ACCOUNT;AccountKey=$STORAGE_ACCOUNT_KEY;EndpointSuffix=core.windows.net\"}}"


# Create Linked Service for Azure SQL Database
az datafactory linked-service create \
 --resource-group $RESOURCE_GROUP \
 --factory-name $DATA_FACTORY_NAME \
 --linked-service-name $LINKED_SERVICE_SQL \
 --properties "{\"type\": \"AzureSqlDatabase\", \"typeProperties\": {\"connectionString\": \"Server=tcp:$SQL_SERVER_NAME.database.windows.net,1433;Initial Catalog=$SQL_DATABASE_NAME;User ID=$SQL_USERNAME;Password=$SQL_PASSWORD;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;\"}}"


# Create Dataset for Azure Blob Storage
az datafactory dataset create \
 --resource-group $RESOURCE_GROUP \
 --factory-name $DATA_FACTORY_NAME \
 --dataset-name $DATASET_SOURCE_NAME \
 --properties "{\"type\": \"AzureBlob\", \"linkedServiceName\": {\"referenceName\": \"$LINKED_SERVICE_BLOB\", \"type\": \"LinkedServiceReference\"}, \"typeProperties\": {\"folderPath\": \"$BLOB_CONTAINER\", \"format\": {\"type\": \"TextFormat\"}}}"


# Create Dataset for Azure SQL Database
az datafactory dataset create \
 --resource-group $RESOURCE_GROUP \
 --factory-name $DATA_FACTORY_NAME \
 --dataset-name $DATASET_DEST_NAME \
 --properties "{\"type\": \"AzureSqlTable\", \"linkedServiceName\": {\"referenceName\": \"$LINKED_SERVICE_SQL\", \"type\": \"LinkedServiceReference\"}, \"typeProperties\": {\"tableName\": \"<TABLE_NAME>\"}}"


# Create a Copy Pipeline (the pipeline JSON is passed via --pipeline, not --properties)
az datafactory pipeline create \
 --resource-group $RESOURCE_GROUP \
 --factory-name $DATA_FACTORY_NAME \
 --pipeline-name $PIPELINE_NAME \
 --pipeline "{\"activities\": [{\"name\": \"CopyFromBlobToSQL\", \"type\": \"Copy\", \"inputs\": [{\"referenceName\": \"$DATASET_SOURCE_NAME\", \"type\": \"DatasetReference\"}], \"outputs\": [{\"referenceName\": \"$DATASET_DEST_NAME\", \"type\": \"DatasetReference\"}], \"typeProperties\": {\"source\": {\"type\": \"BlobSource\"}, \"sink\": {\"type\": \"AzureSqlSink\"}}}]}"


# Trigger the Pipeline Run (capture the run ID so the run can be monitored later)
RUN_ID=$(az datafactory pipeline create-run \
 --resource-group $RESOURCE_GROUP \
 --factory-name $DATA_FACTORY_NAME \
 --pipeline-name $PIPELINE_NAME \
 --query runId -o tsv)
echo "Started pipeline run: $RUN_ID"


echo "Azure Data Factory and Copy Pipeline setup complete!"


Step-by-Step Breakdown

1. Create Data Factory: Sets up the ADF instance in the specified resource group and location.

2. Linked Services:

• Blob Storage: Connects ADF to Azure Blob Storage.

• SQL Database: Connects ADF to Azure SQL Database.

3. Datasets:

• Source Dataset: Represents the data in Azure Blob Storage.

• Destination Dataset: Represents the table in Azure SQL Database.

4. Pipeline: Defines a copy activity to transfer data from Blob to SQL.

5. Trigger: Starts the pipeline to execute the copy process (a status-check sketch follows this list).
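
To verify the run, you can poll its status with the run ID the script captures from create-run. A minimal sketch; the pipeline-run commands are part of the same datafactory extension:

# Sketch: check the status of the run started above ($RUN_ID is captured by the script)
az datafactory pipeline-run show \
 --resource-group $RESOURCE_GROUP \
 --factory-name $DATA_FACTORY_NAME \
 --run-id $RUN_ID \
 --query status -o tsv
# Prints a state such as Queued, InProgress, Succeeded, or Failed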


Customize

• Replace <TABLE_NAME> in the SQL dataset properties with the target SQL table.

• Update typeProperties for datasets and activities to match specific formats (e.g., CSV, JSON); see the CSV example below.
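
For instance, a CSV source is usually a cleaner fit for the newer DelimitedText dataset type than the legacy AzureBlob/TextFormat pair used above. A sketch, where the dataset name, file name, and header flag are assumptions:

# Sketch: a DelimitedText (CSV) source dataset; if you swap it in, also change the
# copy activity's source type from BlobSource to DelimitedTextSource
az datafactory dataset create \
 --resource-group $RESOURCE_GROUP \
 --factory-name $DATA_FACTORY_NAME \
 --dataset-name "BlobCsvDataset" \
 --properties "{\"type\": \"DelimitedText\", \"linkedServiceName\": {\"referenceName\": \"$LINKED_SERVICE_BLOB\", \"type\": \"LinkedServiceReference\"}, \"typeProperties\": {\"location\": {\"type\": \"AzureBlobStorageLocation\", \"container\": \"$BLOB_CONTAINER\", \"fileName\": \"sample.csv\"}, \"columnDelimiter\": \",\", \"firstRowAsHeader\": true}}"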


Let me know if you need adjustments or additional features!


