Search This Blog

September 10, 2019

Azure Data Factory - Execute Pipeline Activity

The Execute Pipeline activity is used to call another Azure Data Factory pipeline. This helps you develop reusable pipelines and invoke them from other pipelines.

For example, you can create a single pipeline that archives files after processing (moving the blobs or files to different folders) and call that pipeline from multiple other pipelines. This reduces development time and increases code reusability.

Similarly, you can create a separate pipeline for sending emails in Azure Data Factory and reuse it in multiple pipelines.


In the following screenshot, we call two different pipelines (Archiving Pipeline, Send_Error_Message).
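As a rough sketch, the pipeline in the screenshot could be defined in JSON along these lines (the pipeline name and reference names are illustrative, taken from the screenshot):

```json
{
  "name": "Process_Files_Pipeline",
  "properties": {
    "activities": [
      {
        "name": "Archiving Pipeline",
        "type": "ExecutePipeline",
        "typeProperties": {
          "pipeline": {
            "referenceName": "Archiving_Pipeline",
            "type": "PipelineReference"
          },
          "waitOnCompletion": true
        }
      },
      {
        "name": "Send_Error_Message",
        "type": "ExecutePipeline",
        "typeProperties": {
          "pipeline": {
            "referenceName": "Send_Error_Message",
            "type": "PipelineReference"
          },
          "waitOnCompletion": false
        }
      }
    ]
  }
}
```

Setting "waitOnCompletion" to true makes the parent pipeline block until the invoked pipeline finishes.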









For more information, please check the following
https://docs.microsoft.com/en-us/azure/data-factory/control-flow-execute-pipeline-activity




September 09, 2019

Azure Data Warehouse - Materialized View

Materialized View

Materialized Views are similar to Indexed Views in SQL Server. These are used to increase query performance and hide complex query logic from the users.

To re-enable a disabled materialized view, use the ALTER MATERIALIZED VIEW statement with the REBUILD option:

ALTER MATERIALIZED VIEW My_Indexed_View REBUILD;

Materialized views are visible in SQL Server Management Studio under the Views folder of the Azure SQL Data Warehouse instance.

Limitations

  • Materialized views are disabled during update or delete operations on the base tables.
  • Only the Clustered Columnstore Index is supported.
  • Only Hash and Round Robin distributions are supported.

Sample Code

CREATE MATERIALIZED VIEW mv_test2
WITH (DISTRIBUTION = HASH(i_category_id), FOR_APPEND)
AS
SELECT MAX(i.i_rec_start_date) AS max_i_rec_start_date,
       MIN(i.i_rec_end_date) AS min_i_rec_end_date,
       i.i_item_sk, i.i_item_id, i.i_category_id
FROM syntheticworkload.item i
GROUP BY i.i_item_sk, i.i_item_id, i.i_category_id;


For more information, check the following link

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-materialized-view-as-select-transact-sql?view=azure-sqldw-latest


September 08, 2019

Azure Data Factory - If Condition activity

The If Condition activity is used to run different activities based on an input expression. In the following example, we check whether a file exists using the GetMetadata activity. If the file exists, the activities defined under True are executed; if the file does not exist, the activities defined under False are executed.

Passing input values for If condition activity



Set validation expression in settings for If condition activity.


Adding activity for True or False conditions.
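A minimal JSON sketch of an If Condition activity wired to a GetMetadata activity (the activity names and the inner Wait activities are illustrative placeholders):

```json
{
  "name": "If File Exists",
  "type": "IfCondition",
  "dependsOn": [
    { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "expression": {
      "value": "@activity('Get Metadata1').output.exists",
      "type": "Expression"
    },
    "ifTrueActivities": [
      { "name": "Process File", "type": "Wait", "typeProperties": { "waitTimeInSeconds": 1 } }
    ],
    "ifFalseActivities": [
      { "name": "Handle Missing File", "type": "Wait", "typeProperties": { "waitTimeInSeconds": 1 } }
    ]
  }
}
```

Note that GetMetadata exposes output.exists only when the "Exists" field is selected in its field list.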


For more information, please check https://docs.microsoft.com/en-us/azure/data-factory/control-flow-if-condition-activity

September 07, 2019

Azure Data Factory - For Each Activity

The "ForEach" activity in Azure Data Factory is used to loop through each item of an input collection and perform actions based on the data.

The following screenshot shows two activities: GetMetadata and a ForEach loop. The output of GetMetadata is passed to the ForEach activity in its Settings. If the "Sequential" checkbox is selected, all the activities are performed in sequence for each child item. If parallel execution is needed, un-check "Sequential".


The "ForEach" activity can also iterate over the output of a Lookup activity.

You can perform multiple activities for each row of data received as input to the ForEach activity. You can add them by going to Activities and then Edit Activities.
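As a hedged sketch, a ForEach activity iterating over GetMetadata child items could look like the following JSON (the names and the inner Wait activity are illustrative):

```json
{
  "name": "ForEach File",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": false,
    "batchCount": 20,
    "items": {
      "value": "@activity('Get Metadata1').output.childItems",
      "type": "Expression"
    },
    "activities": [
      { "name": "Process Item", "type": "Wait", "typeProperties": { "waitTimeInSeconds": 1 } }
    ]
  }
}
```

Inside the loop, the current element is referenced with @item() (for example, @item().name for a child item's file name).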

Limitations and workarounds

Here are some limitations of the ForEach activity and suggested workarounds.

  • Limitation: You can't nest a ForEach loop inside another ForEach loop (or an Until loop). Workaround: Design a two-level pipeline where the outer pipeline, with the outer ForEach loop, iterates over an inner pipeline that contains the nested loop.
  • Limitation: The ForEach activity has a maximum batchCount of 50 for parallel processing, and a maximum of 100,000 items. Workaround: Design a two-level pipeline where the outer pipeline with the ForEach activity iterates over an inner pipeline.

For more information, please check the documentation at  https://docs.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity

September 06, 2019

Azure Analysis Services

Supported Features

  • AAS is available in 3 tiers (Developer for development, Basic, and Standard).
  • You can pause and resume the server when it is not in use to reduce cost.
  • Only the tabular model is supported.
  • Scale-out up to 8 nodes when the load on the model increases.
  • Partitions are supported.
  • Row-level permissions are supported.
  • Perspectives are supported.
  • Bi-directional relationships are supported.
  • Metadata is stored in Azure Blob storage.

Limitations
  • The multidimensional model is not supported.
  • Power Pivot for SharePoint is not supported.



September 05, 2019

Azure SQL Server VS Azure Data Warehouse

Azure SQL Server
  • Database size limit is 4 TB.
  • Optimized for OLTP loads.
  • Cross-database queries are supported.
  • Automatic performance tuning of a database is supported.


Azure Data Warehouse
  • Supports up to 1 petabyte (1,024 TB) of data.
  • Optimized for OLAP loads.
  • MPP (Massively Parallel Processing) system (data is processed on multiple parallel nodes).
  • PolyBase support to load data from multiple sources.
  • Cross-database queries are not supported.
  • Manual performance tuning.
  • You can pause and resume the database to reduce cost.

September 04, 2019

Azure Data Factory Stored Procedure Activity

The Stored Procedure activity is used to execute a stored procedure from Azure Data Factory. You can use it to maintain audit logs, update metadata information, track files in Azure blobs based on GetMetadata activity output, or create dynamic external tables in Azure Data Warehouse.

To pass parameter values to the stored procedure, you can use pipeline parameters. Use dynamic content to pass dynamic values. You can import all the stored procedure parameters using Import parameters.
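As an illustrative sketch (the linked service, procedure, and parameter names below are made up), a Stored Procedure activity definition looks roughly like this:

```json
{
  "name": "Write Audit Log",
  "type": "SqlServerStoredProcedure",
  "linkedServiceName": {
    "referenceName": "AzureSqlLinkedService",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "storedProcedureName": "usp_InsertAuditLog",
    "storedProcedureParameters": {
      "FileName": { "value": "@pipeline().parameters.FileName", "type": "String" },
      "LoadDate": { "value": "@utcnow()", "type": "DateTime" }
    }
  }
}
```

Parameter values that start with @ are evaluated as dynamic content at run time.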


For more information on Stored Procedure activity, Please check the following https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-stored-procedure

September 03, 2019

Azure Data Factory Execute Pipeline Activity

The Execute Pipeline activity is used to call a different pipeline in Azure Data Factory. It is similar to the Execute Package Task in SSIS.

The main purpose of the Execute Pipeline activity is to develop common pipelines that can be reused across multiple pipelines and called when needed. This helps reduce development time and code maintenance.

If the calling pipeline needs to wait until the called pipeline completes, set the "waitOnCompletion" property to true.

Check the following for more details https://docs.microsoft.com/en-us/azure/data-factory/control-flow-execute-pipeline-activity


September 02, 2019

Azure Data Factory Lookup activity

The Lookup activity in Azure Data Factory is different from the Lookup transformation in SSIS. In Azure Data Factory, the Lookup activity is used to execute a stored procedure or query and capture its output.

It can return the first row or multiple rows of data from the stored procedure. The output can be used as input to a ForEach loop.

The Lookup activity supports a maximum of 5,000 rows in its output.
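A rough JSON sketch of a Lookup activity that runs a stored procedure and returns all rows (the dataset and procedure names are illustrative):

```json
{
  "name": "Lookup Pending Files",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderStoredProcedureName": "usp_GetPendingFiles"
    },
    "dataset": {
      "referenceName": "AzureSqlDataset",
      "type": "DatasetReference"
    },
    "firstRowOnly": false
  }
}
```

With "firstRowOnly" set to false, downstream activities such as ForEach can consume the rows via @activity('Lookup Pending Files').output.value.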

For more details, please check the documentation at https://docs.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity

September 01, 2019

Azure Data Analytics: U-SQL Skipping headers and outputting headers

To skip the headers while reading the input files, use the skipFirstNRows parameter:

USING Extractors.Csv(skipFirstNRows:1)

For complete parameter details check the following
https://docs.microsoft.com/en-us/u-sql/functions/operators/extractors/extractor-parameters



To output the header row while writing the output file:

USING Outputters.Csv(outputHeader:true);

For complete parameter details check the following
https://docs.microsoft.com/en-us/u-sql/functions/operators/outputters/outputter-parameters
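Putting the two parameters together, a minimal end-to-end U-SQL script might look like the following (the file paths and schema are illustrative):

```usql
@input =
    EXTRACT CustomerId int,
            CustomerName string
    FROM "/input/customers.csv"
    USING Extractors.Csv(skipFirstNRows : 1);   // skip the header row on input

OUTPUT @input
TO "/output/customers_out.csv"
USING Outputters.Csv(outputHeader : true);      // emit a header row on output
```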