
August 31, 2019

Microsoft Azure: Free Microsoft Azure learning videos for different roles at Pluralsight


Go to the following link and register with your email. The learning paths for three roles are free of cost, and each one is a sequence of videos.

The Azure environment can be used for different purposes:
  1. Web-based platforms
  2. Big Data Platform
  3. AI platform

Choose the technologies based on the platform you are interested in.

August 30, 2019

Microsoft Azure: End to End technologies used in Azure for data storage and processing


Azure Storage (Blob storage) -- Used to store data files. It acts as the landing or storage area for data coming from multiple systems. Create separate containers for landing, staging, processing and archiving. Folders are virtual (they are part of the blob name). Permissions are set at the container level. Storage cost is low. Cost and data availability differ across the Hot, Cool and Archive tiers.
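Because Blob storage is flat, a "folder" is only a shared prefix in blob names. The sketch below uses plain Python and hypothetical blob names to show how a listing with a "/" delimiter turns those prefixes into virtual folders. It does not call Azure; in the azure-storage-blob SDK, `ContainerClient.walk_blobs` performs a comparable delimiter-based listing against a real container.

```python
from typing import Iterable, List, Tuple


def list_virtual_folder(blob_names: Iterable[str], prefix: str = "",
                        delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Return (virtual folders, blobs) found directly under prefix.

    Blob storage has no real folders: a name such as "sales/2019/08/orders.csv"
    is a single flat key, and the "/" characters are just part of it.
    """
    folders = set()
    blobs = []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The name continues past another delimiter, so report a virtual folder.
            folders.add(prefix + head + delimiter)
        else:
            blobs.append(name)
    return sorted(folders), sorted(blobs)


if __name__ == "__main__":
    # Hypothetical blob names in a "landing" container.
    names = [
        "sales/2019/08/orders.csv",
        "sales/2019/08/returns.csv",
        "sales/2019/07/orders.csv",
        "sales/readme.txt",
    ]
    print(list_virtual_folder(names, "sales/"))
```

Deleting every blob under "sales/2019/" makes that folder disappear as well, which is the practical difference from a real folder.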

Azure Data Lake (Gen-1) -- Optimized for big data storage and processing. Data is stored in a hierarchical file system, and security can be set at the folder level. Parallel reads and writes are supported. Storage cost is higher than Azure Storage.

Azure Data Lake (Gen-2) -- Optimized for big data storage and processing. Supports both object storage and hierarchical (file system) storage, combining the advantages of Azure Storage and Azure Data Lake (Gen-1).

Azure Data Lake Analytics -- Serverless, on-demand processing of big data with U-SQL jobs. You choose the degree of parallelism for each job instead of managing a cluster.

Azure Databricks -- Cluster-based processing of big data using Apache Spark.

Azure Data Factory -- Similar to SSIS and SQL Server Agent. Used to build control flows and the tasks (activities) that process data. Pipelines can be run once or on a schedule. At the time of writing, only UTC time is supported for schedules, so local run times must be converted to UTC. You can use SSIS packages, U-SQL scripts, Databricks Spark jobs and native Azure Data Factory activities to process the data.
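Since schedules are defined in UTC, a local run time has to be converted before it goes into the trigger. A minimal sketch, assuming a job that should run at 09:00 India Standard Time; that fixed UTC+05:30 offset is chosen only for illustration. The result is an ISO 8601 UTC string of the kind used for a schedule trigger's startTime.

```python
from datetime import datetime, timedelta, timezone

# Illustrative fixed offset (IST has no daylight saving). For zones that do
# observe daylight saving, use zoneinfo.ZoneInfo instead of a fixed offset.
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def to_utc_trigger_time(local_dt: datetime) -> str:
    """Convert a timezone-aware local datetime to a 'YYYY-MM-DDTHH:MM:SSZ' UTC string."""
    if local_dt.tzinfo is None:
        raise ValueError("local_dt must be timezone-aware")
    return local_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    run_at = datetime(2019, 8, 30, 9, 0, tzinfo=IST)
    print(to_utc_trigger_time(run_at))  # 2019-08-30T03:30:00Z
```

Rejecting naive datetimes is deliberate: a time without a zone is exactly the ambiguity that causes jobs to fire hours early or late.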

Azure SQL Database -- Supports up to 4 TB of data. Mainly used for OLTP systems. If you need a full SQL Server instance, use Azure SQL Database Managed Instance. You can also create a single database.

Azure SQL Data Warehouse -- Used for OLAP data marts, or when the data size is more than 4 TB. Uses a massively parallel processing (MPP) architecture: data is processed in parallel on multiple nodes. Data is stored in Azure Storage. You can create external tables (PolyBase) to access HDFS and other data sources. Not all SQL Server features are currently supported.

Azure Analysis Services -- Hosts Analysis Services tabular models in the cloud.

Power BI -- For reporting.

Azure DevOps -- For source control and automated deployment of code.

August 29, 2019

Azure Key Vault -- To store sensitive data


This service is used to hide sensitive information such as SQL connection strings, SQL user names and passwords. The advantage is that you store it as key-value pairs: you give the connection string a name (the secret name), and the actual connection string stays hidden from all the applications.

This helps you avoid storing connection strings in source control or in applications. Applications and source control refer only to the secret name.
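The "refer only to the secret name" pattern can be sketched in a few lines of Python. The setting keys and secret names are hypothetical, and `get_secret` is any function that maps a secret name to its value. In Azure it could wrap the azure-keyvault-secrets `SecretClient` (shown only as a comment, with a placeholder vault URL); here a local dictionary stands in for the vault so the sketch runs without Azure.

```python
from typing import Callable, Dict

# Hypothetical settings: only Key Vault secret *names* live here (and in source
# control); the connection strings themselves never do.
APP_SETTINGS: Dict[str, str] = {
    "SalesDb": "sales-db-connection",
    "ReportingDb": "reporting-db-connection",
}


def resolve_settings(settings: Dict[str, str],
                     get_secret: Callable[[str], str]) -> Dict[str, str]:
    """Swap each secret name for its value, fetched at runtime via get_secret."""
    return {key: get_secret(name) for key, name in settings.items()}


# In Azure, get_secret could wrap the azure-keyvault-secrets client, e.g.:
#   client = SecretClient(vault_url="https://<your-vault>.vault.azure.net",
#                         credential=DefaultAzureCredential())
#   get_secret = lambda name: client.get_secret(name).value

if __name__ == "__main__":
    # Local stand-in for Key Vault so the sketch runs without Azure.
    fake_vault = {
        "sales-db-connection": "Server=tcp:sales.example.com;Database=Sales;",
        "reporting-db-connection": "Server=tcp:reports.example.com;Database=Reports;",
    }
    settings = resolve_settings(APP_SETTINGS, fake_vault.__getitem__)
    print(settings["SalesDb"])
```

Passing the lookup in as a function keeps the application code identical whether the values come from Key Vault or from a test stand-in.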

For more information, check the following

August 28, 2019

Azure Data Factory - GetMetaData activity



The GetMetadata activity is used to get information about files stored in Azure Storage, such as the file size, column count, last modified date (lastModified), whether the file exists, and other properties.

The following screenshot shows how to get information about all the files in a particular folder. The folder name is passed as a pipeline parameter.
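As a text counterpart to the screenshot, the sketch below builds a pipeline definition as a Python dictionary and prints it as JSON. It declares a `FolderName` pipeline parameter and a Get Metadata activity that requests `childItems`. The pipeline, activity and dataset names are hypothetical, and the sketch assumes the dataset declares its own `FolderName` parameter that it uses as its folder path.

```python
import json
from typing import Any, Dict


def get_metadata_pipeline(dataset_name: str = "FolderDataset") -> Dict[str, Any]:
    """Build a pipeline with a FolderName parameter and one Get Metadata activity."""
    return {
        "name": "GetFolderMetadata",  # hypothetical pipeline name
        "properties": {
            "parameters": {"FolderName": {"type": "String"}},
            "activities": [
                {
                    "name": "GetFolderFiles",  # hypothetical activity name
                    "type": "GetMetadata",
                    "typeProperties": {
                        "dataset": {
                            "referenceName": dataset_name,
                            "type": "DatasetReference",
                            # Pass the pipeline parameter through to the dataset.
                            "parameters": {
                                "FolderName": "@pipeline().parameters.FolderName"
                            },
                        },
                        # childItems lists the files and subfolders in the folder.
                        "fieldList": ["childItems", "lastModified", "exists"],
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(get_metadata_pipeline(), indent=2))
```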



The output of this activity can be used as input to a Stored Procedure activity, for example to store the metadata in an Azure SQL Database.
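In the pipeline, the Stored Procedure activity would receive the output through an expression such as `@string(activity('GetFolderFiles').output.childItems)` (the activity name is hypothetical), and the stored procedure would parse it. The Python sketch below only mimics that last step locally. It takes a sample `childItems` output and writes one row per file, with sqlite3 standing in for Azure SQL Database. The table and column names are assumptions.

```python
import sqlite3
from typing import Any, Dict

# Hypothetical Get Metadata output with fieldList ["childItems"]: each child
# item has a "name" and a "type" ("File" or "Folder").
SAMPLE_OUTPUT: Dict[str, Any] = {
    "childItems": [
        {"name": "orders.csv", "type": "File"},
        {"name": "returns.csv", "type": "File"},
        {"name": "archive", "type": "Folder"},
    ]
}


def store_file_list(conn: sqlite3.Connection, folder: str,
                    output: Dict[str, Any]) -> int:
    """Insert one row per file in output["childItems"]; return the rows inserted."""
    conn.execute("CREATE TABLE IF NOT EXISTS FileMetadata (Folder TEXT, FileName TEXT)")
    rows = [(folder, item["name"])
            for item in output.get("childItems", [])
            if item["type"] == "File"]  # skip subfolders
    conn.executemany("INSERT INTO FileMetadata (Folder, FileName) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for Azure SQL Database
    print(store_file_list(conn, "landing/sales", SAMPLE_OUTPUT))  # 2
```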

For more information, check the following

August 26, 2019

Azure Storage: Azure Storage VS Azure Data Lake Storage (Gen-1 & Gen-2)

Azure Storage
· Object Storage
· Virtual folders only (no real folders)
· No folder level security
· No folder specific performance optimizations
· Data is stored in Containers as blobs
· Storage cost is based on the Hot, Cool and Archive tiers
· Available in all regions globally
· Data replication and redundancy options
Azure Data Lake Storage (Gen-1)
· Hierarchical file system
o Supports nesting of files within folders.
· Folder level security can be implemented
o Fine-grained security via access control lists
· Performance optimization at the folder level
· Parallel reads and writes
· Scaled out over multiple nodes
· Hadoop and big data optimizations
· Optimized for analytics workloads.
Azure Data Lake Storage (Gen-2)
· Multi-modal storage combining features from both Azure Storage and ADLS (Gen-1)
· Enable the hierarchical namespace to use the account as a file system.
· Data can be accessed using object storage endpoint or file system storage endpoint.
· Object Storage Endpoint – wasb://containername@accountname.blob.core.windows.net/
· File System Endpoint – abfs://filesystemname@accountname.dfs.core.windows.net/
· Optimized for analytics workloads.
· Leverage partition scans and partition pruning to improve query performance.
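The two endpoint formats above differ only in scheme and host suffix, so a small helper can build both for the same data. This is a minimal sketch: the account, container and path names are placeholders, and the `secure` flag (an assumption of this sketch) switches to the TLS variants `wasbs://` and `abfss://`. In Gen-2, a container and a file system are the same thing, so one name fills both slots.

```python
from typing import Dict


def adls_uris(account: str, container: str, path: str = "",
              secure: bool = True) -> Dict[str, str]:
    """Return the Blob (wasb) and DFS (abfs) URIs for the same Gen-2 location."""
    path = path.lstrip("/")
    wasb = "wasbs" if secure else "wasb"
    abfs = "abfss" if secure else "abfs"
    return {
        # Object storage endpoint (blob.core.windows.net)
        "object": f"{wasb}://{container}@{account}.blob.core.windows.net/{path}",
        # File system endpoint (dfs.core.windows.net), needs the hierarchical namespace
        "file_system": f"{abfs}://{container}@{account}.dfs.core.windows.net/{path}",
    }


if __name__ == "__main__":
    for kind, uri in adls_uris("mydatalake", "raw", "sales/2019").items():
        print(kind, uri)
```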

August 25, 2019

Azure Data Lake Analytics: How to access Azure Storage Blob data from Azure Data Lake Analytics

1. Register the Azure Storage Blob account in Azure Data Lake Analytics

Ensure that your Windows Azure Blob Storage account is registered with your Azure Data Lake Analytics account. I have copied the steps below from Registering Your Windows Azure Blob Storage account.
  1. Navigate to the Azure Portal and log in.
  2. Navigate to your Azure Data Lake Analytics Account.
  3. Select Data Sources under Settings.
  4. Verify whether your WABS account is listed. If it is, stop here; otherwise, continue to the next step.
  5. Click Add Data Source.
  6. Select Azure Storage from the Storage Type drop-down list.
  7. Select Select Account from the Selection Method drop-down list.
  8. Select your WABS account from the Azure Storage drop-down list.
  9. Click Add