Frequently Asked Questions on Lab 1 Lesson - Prepare Your Data

Here we answer common questions about Lab 1 - Prepare Your Data, part of the Crime Analytics & Predictions Labs course.

If you don't find your answer below, please raise a ticket in Safera Labs Support.

Browse through the commonly asked questions below:

What is a Resource Group?

A resource group is a container that holds related resources for an Azure solution. The resource group can include all the resources for the solution, or only those resources that you want to manage as a group. You decide how you want to allocate resources to resource groups based on what makes the most sense for your organization. Generally, add resources that share the same lifecycle to the same resource group so you can easily deploy, update, and delete them as a group.

What is a Storage Account?

An Azure storage account contains all of your Azure Storage data objects, including blobs, file shares, queues, tables, and disks. The storage account provides a unique namespace for your Azure Storage data that’s accessible from anywhere in the world over HTTP or HTTPS. Data in your storage account is durable and highly available, secure, and massively scalable.
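
To make the "unique namespace" idea concrete, the sketch below builds the HTTPS address of an object from a storage account name. The account, container, and file names are hypothetical, and the helper `storage_url` is made up for illustration; the endpoint suffixes are the standard ones for the public Azure cloud.

```python
from urllib.parse import quote

# Standard public-cloud endpoint suffix for each storage service.
SERVICE_SUFFIX = {
    "blob": "blob.core.windows.net",
    "file": "file.core.windows.net",
    "queue": "queue.core.windows.net",
    "table": "table.core.windows.net",
}


def storage_url(account: str, service: str, path: str = "") -> str:
    """Build the HTTPS URL for an object inside a storage account."""
    base = f"https://{account}.{SERVICE_SUFFIX[service]}"
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return f"{base}/{quote(path)}" if path else base


# Hypothetical account "saferalabdata", container "raw", blob "crime_data.csv".
print(storage_url("saferalabdata", "blob", "raw/crime_data.csv"))
```

Because the account name is part of the host name, it has to be unique across Azure, which is why the lab may give you an account name with a suffix.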

What is Azure Data Factory?

Azure Data Factory (ADF) is Azure’s cloud ETL service for scale-out serverless data integration and data transformation. It offers a code-free UI for intuitive authoring and single-pane-of-glass monitoring and management. You can also lift and shift existing SSIS packages to Azure and run them with full compatibility in ADF.

What is Dataflow?

Mapping data flows are visually designed data transformations in Azure Data Factory. Data flows allow data engineers to develop data transformation logic without writing code. The resulting data flows are executed as activities within Azure Data Factory pipelines that use scaled-out Apache Spark clusters. Data flow activities can be operationalized using existing Azure Data Factory scheduling, control flow, and monitoring capabilities.

What is a Linked Service?

Linked services are much like connection strings: they define the connection information the service needs to connect to external resources. Think of it this way: the dataset represents the structure of the data within the linked data stores, and the linked service defines the connection to the data source. For example, an Azure Storage linked service links a storage account to the service. An Azure Blob dataset represents the blob container and the folder within that Azure Storage account that contains the input blobs to be processed.
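
The relationship between the two can be sketched as a pair of definitions, written here as Python dictionaries. This is a simplified illustration of the JSON shape Data Factory uses, not a complete or authoritative definition; the names SaferaBlobStorage and CrimeInputBlobs, the container, and the folder are hypothetical, and the connection string is a placeholder.

```python
import json

# Linked service: holds the connection information for the storage account.
linked_service = {
    "name": "SaferaBlobStorage",  # hypothetical name
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": "<storage-connection-string>"},
    },
}

# Dataset: describes where the data lives inside the linked store.
dataset = {
    "name": "CrimeInputBlobs",  # hypothetical name
    "properties": {
        "type": "DelimitedText",
        # The dataset does not repeat the connection; it refers to the
        # linked service by name.
        "linkedServiceName": {
            "referenceName": linked_service["name"],
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "input",   # hypothetical container
                "folderPath": "crime",  # hypothetical folder
            }
        },
    },
}

print(json.dumps(dataset, indent=2))
```

Several datasets can point at the same linked service, so the connection details are maintained in one place.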

How do I undo a step?

As in most tools, standard Windows keyboard shortcuts work in the Azure portal, so you can press Ctrl+Z to undo.

What is Dataflow Debug mode?

The debug mode for mapping data flows in Azure Data Factory and Synapse Analytics lets you interactively watch the data shape transform while you build and debug your data flows. A debug session can be used both in data flow design sessions and during pipeline debug execution of data flows.

Getting a "Service already exists" error?

If you get a "name already taken" error while renaming a transformation or activity as instructed in the lab, use a different name for it. You have probably already used that name for another item.
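
One simple way to pick a replacement name is to append the first free number to the name you wanted. The helper below is a hypothetical illustration of that idea in plain Python; it is not an Azure API, and the activity names are made up.

```python
def next_free_name(desired, taken):
    """Return `desired`, or `desired` plus the first numeric suffix not in `taken`."""
    if desired not in taken:
        return desired
    suffix = 2
    while f"{desired}{suffix}" in taken:
        suffix += 1
    return f"{desired}{suffix}"


# Names already used in the (hypothetical) factory.
used = {"JoinCrimeData", "JoinCrimeData2"}
print(next_free_name("JoinCrimeData", used))
```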

Where can I get the login credentials for Azure?

The lab facilitator will share the credentials. Please contact them if you have any issues.

How do I search for a resource or service in Azure?

Use the global search box at the top of the Azure portal and type the name of the resource as given in the documentation. The results list the matching resources along with their types.

What is the use of the Publish option in Azure?

The Publish option commits your changes to Azure. After you modify anything in an Azure service, such as a pipeline or data flow in Azure Data Factory, you must publish the changes so that they take effect and are visible to the other members.

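What are the different types of Join supported in Azure Dataflow?

The Join transformation in mapping data flows supports five join types: inner join, left outer join, right outer join, full outer join, and custom (cross) join.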
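
The sketch below shows, on two tiny hypothetical tables, how the usual join types (inner, left outer, right outer, full outer, cross) differ in which rows they keep. It is plain Python for illustration only, not data flow code; the `join` helper and the sample rows are made up.

```python
from itertools import product


def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; a missing side is represented by None."""
    if how == "cross":
        # Every left row paired with every right row, no key comparison.
        return list(product(left, right))
    rows = []
    matched_right = set()
    for l in left:
        hits = [i for i, r in enumerate(right) if r[key] == l[key]]
        matched_right.update(hits)
        if hits:
            rows += [(l, right[i]) for i in hits]
        elif how in ("left", "full"):
            rows.append((l, None))  # keep unmatched left row
    if how in ("right", "full"):
        # Keep right rows that matched nothing on the left.
        rows += [(None, r) for i, r in enumerate(right) if i not in matched_right]
    return rows


incidents = [{"district": "A", "crime": "theft"}, {"district": "B", "crime": "fraud"}]
districts = [{"district": "A", "name": "Central"}, {"district": "C", "name": "Harbor"}]

for how in ("inner", "left", "right", "full", "cross"):
    print(how, len(join(incidents, districts, "district", how)))
```

With this sample data only district A appears in both tables, so the inner join keeps one row while the outer joins add the unmatched rows from one or both sides.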

Why don't the resource names I see in the lab match the documentation/instructions?

The documentation was written from a generic perspective, so the actual resource names will be similar but may not match exactly. For example, the document names the resource group SaferaLab, but you might see names such as SaferaLab1, SaferaLab2, or Safera_Lab. Select whichever visible resource most closely matches the one in the instruction steps or screenshots.

Why do we use an incognito tab for the lab?

Old logins saved in the browser can cause access issues with the Azure portal, so the lab instructs you to use incognito mode.

Can we resume our task steps after a day?

No. The labs run in a sandbox, and all changes are lost after a couple of hours, so please try to finish each lab in one sitting.

What is a “Substring”?

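Returns part of the source string, starting at the given zero-based index and running to the end of the string, or for at most length characters when the optional length argument is supplied. A negative starting index counts back from the end of the string.

Syntax:

substring(source, startingIndex [, length])

Examples:

substring("123456", 1) // 23456

substring("123456", 2, 2) // 34

substring("ABCD", 0, 2) // AB

substring("123456", -2, 2) // 56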
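
The behavior in the examples above can be reproduced with a few lines of Python. This is only a sketch that mirrors those four examples (zero-based start, negative start counted from the end); it is not the implementation used by any Azure service, and the expression language you use in the lab may index strings differently.

```python
def substring(source, start, length=None):
    """Emulate the substring examples: zero-based start, optional length."""
    if start < 0:
        # A negative start counts back from the end of the string.
        start = max(len(source) + start, 0)
    if length is None:
        return source[start:]
    if length < 0:
        # Assumption: a negative length is treated as invalid in this sketch.
        raise ValueError("length must not be negative")
    return source[start:start + length]


print(substring("123456", 1))      # 23456
print(substring("123456", 2, 2))   # 34
print(substring("ABCD", 0, 2))     # AB
print(substring("123456", -2, 2))  # 56
```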

Can Azure Data Factory be used to migrate data from on-premises systems to the cloud?

Yes. Azure Data Factory can be used for data migration and to lift and shift your existing jobs and processes from on-premises servers to the cloud.

What is the maximum size of data that can be processed by Azure Data Factory?

The limits documented for Azure Data Factory, such as the limit on payload size, do not relate to the amount of data you can move and process with Azure Data Factory.

To learn more about Azure Data Factory limits, refer to the link below:

https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/azure-subscription-service-limits#azure-data-factory-limits

Can the Azure portal be used for machine learning?

Yes, you can use Azure Machine Learning studio to develop machine learning solutions.

More details can be found at the link below:

https://azure.microsoft.com/en-us/products/machine-learning/

What is EDA and why are we using it?

Exploratory data analysis (EDA) is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods. It helps determine how best to manipulate data sources to get the answers you need, making it easier for data scientists to discover patterns, spot anomalies, test a hypothesis, or check assumptions.

The main purpose of EDA is to help you look at the data before making any assumptions. It can help you identify obvious errors, better understand patterns within the data, detect outliers or anomalous events, and find interesting relations among the variables.
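
As a small illustration of a first EDA pass, the sketch below summarizes a column of numbers and flags outliers with the common 1.5 x IQR rule. The `summarize` helper and the sample values are made up for this example, and the IQR rule is one of several possible outlier checks, not the method prescribed by the lab.

```python
import statistics


def summarize(values):
    """Return basic summary statistics and IQR-based outliers for a numeric list."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "min": min(values),
        "max": max(values),
        # Values far outside the middle 50% of the data.
        "outliers": [v for v in values if v < low or v > high],
    }


daily_incidents = [12, 15, 14, 13, 16, 15, 14, 95]  # made-up sample data
report = summarize(daily_incidents)
for name, value in report.items():
    print(f"{name}: {value}")
```

Here the value 95 is reported as an outlier, and it also pulls the mean well above the median, which is the kind of anomaly EDA is meant to surface before any modeling starts.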

Didn’t find the answer? Please raise a ticket in Safera Labs Support.