March 02, 2021

Data pipeline orchestration technology in Azure

 

Capability matrix

The following tables summarize the key differences in capabilities between Azure Data Factory, SQL Server Integration Services (SSIS), and Oozie on HDInsight.

General capabilities

| Capability | Azure Data Factory | SQL Server Integration Services (SSIS) | Oozie on HDInsight |
| --- | --- | --- | --- |
| Managed | Yes | No | Yes |
| Cloud-based | Yes | No (local) | Yes |
| Prerequisite | Azure Subscription | SQL Server | Azure Subscription, HDInsight cluster |
| Management tools | Azure Portal, PowerShell, CLI, .NET SDK | SSMS, PowerShell | Bash shell, Oozie REST API, Oozie web UI |
| Pricing | Pay per usage | Licensing / pay for features | No additional charge on top of running the HDInsight cluster |

Pipeline capabilities

| Capability | Azure Data Factory | SQL Server Integration Services (SSIS) | Oozie on HDInsight |
| --- | --- | --- | --- |
| Copy data | Yes | Yes | Yes |
| Custom transformations | Yes | Yes | Yes (MapReduce, Pig, and Hive jobs) |
| Azure Machine Learning scoring | Yes | Yes (with scripting) | No |
| HDInsight On-Demand | Yes | No | No |
| Azure Batch | Yes | No | No |
| Pig, Hive, MapReduce | Yes | No | Yes |
| Spark | Yes | No | No |
| Execute SSIS Package | Yes | Yes | No |
| Control flow | Yes | Yes | Yes |
| Access on-premises data | Yes | Yes | No |

Scalability capabilities

| Capability | Azure Data Factory | SQL Server Integration Services (SSIS) | Oozie on HDInsight |
| --- | --- | --- | --- |
| Scale up | Yes | No | No |
| Scale out | Yes | No | Yes (by adding worker nodes to cluster) |
| Optimized for big data | Yes | No | Yes |
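
One way to use the matrix is to shortlist tools against a set of requirements. The snippet below is a minimal sketch, not part of any Azure SDK: the names `TOOLS`, `MATRIX`, and `tools_supporting` are made up for illustration, and the data is transcribed from the pipeline and scalability tables above. Qualified answers such as "Yes (with scripting)" are simplified to a plain yes.

```python
# Column order matches the tables: ADF, SSIS, Oozie on HDInsight.
TOOLS = ("Azure Data Factory", "SSIS", "Oozie on HDInsight")

# True means "Yes" in the table; qualified answers ("Yes (with scripting)",
# "Yes (by adding worker nodes to cluster)") are recorded as True.
MATRIX = {
    "Copy data": (True, True, True),
    "Custom transformations": (True, True, True),
    "Azure Machine Learning scoring": (True, True, False),
    "HDInsight On-Demand": (True, False, False),
    "Azure Batch": (True, False, False),
    "Pig, Hive, MapReduce": (True, False, True),
    "Spark": (True, False, False),
    "Execute SSIS Package": (True, True, False),
    "Control flow": (True, True, True),
    "Access on-premises data": (True, True, False),
    "Scale up": (True, False, False),
    "Scale out": (True, False, True),
    "Optimized for big data": (True, False, True),
}


def tools_supporting(required):
    """Return the tools marked "Yes" for every capability in `required`."""
    return [
        tool
        for column, tool in enumerate(TOOLS)
        if all(MATRIX[capability][column] for capability in required)
    ]


if __name__ == "__main__":
    print(tools_supporting(["Pig, Hive, MapReduce", "Scale out"]))
    print(tools_supporting(["Execute SSIS Package", "Access on-premises data"]))
```

A lookup like this only narrows the field. The general capabilities (prerequisites, management tools, pricing) are not yes/no values and still need to be weighed by hand.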
