Frequently Asked Questions on Lab 1 Lesson - Prepare Your Data

Here, we answer your questions about Lab 1 - Prepare Your Data, part of the Crime Analytics & Predictions Labs course.

If you don’t find your answer below, please raise a ticket with Safera Labs Support.

 
 

Browse through the commonly asked questions below:

A resource group is a container that holds related resources for an Azure solution. The resource group can include all the resources for the solution, or only those resources that you want to manage as a group. You decide how you want to allocate resources to resource groups based on what makes the most sense for your organization. Generally, add resources that share the same lifecycle to the same resource group so you can easily deploy, update, and delete them as a group.

An Azure storage account contains all of your Azure Storage data objects, including blobs, file shares, queues, tables, and disks. The storage account provides a unique namespace for your Azure Storage data that’s accessible from anywhere in the world over HTTP or HTTPS. Data in your storage account is durable and highly available, secure, and massively scalable.

Azure Data Factory is Azure’s cloud ETL service for scale-out serverless data integration and data transformation. It offers a code-free UI for intuitive authoring and single-pane-of-glass monitoring and management. You can also lift and shift existing SSIS packages to Azure and run them with full compatibility in ADF.

Mapping data flows are visually designed data transformations in Azure Data Factory. Data flows allow data engineers to develop data transformation logic without writing code. The resulting data flows are executed as activities within Azure Data Factory pipelines that use scaled-out Apache Spark clusters. Data flow activities can be operationalized using existing Azure Data Factory scheduling, control flow, and monitoring capabilities.

Linked services are much like connection strings: they define the connection information the service needs to connect to external resources. Think of it this way: the dataset represents the structure of the data within the linked data stores, and the linked service defines the connection to the data source. For example, an Azure Storage linked service links a storage account to the service. An Azure Blob dataset represents the blob container and the folder within that Azure Storage account that contains the input blobs to be processed.
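As a rough illustration of that relationship, the sketch below models a linked service and a dataset as Python dictionaries shaped like Data Factory JSON definitions. The names (StorageLinkedService, InputBlobDataset, input-container, crime-data), the placeholder connection string, and the describe helper are all hypothetical, and the exact properties vary by connector, so treat this as a sketch under those assumptions rather than a definition to copy.

```python
import json

# Hypothetical linked service: holds only the connection information.
linked_service = {
    "name": "StorageLinkedService",  # assumed name, not from the lab
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            # Placeholder only; never hard-code real secrets.
            "connectionString": "<your-storage-connection-string>"
        },
    },
}

# Hypothetical dataset: describes where the data lives and points back
# to the linked service by name instead of repeating the connection.
dataset = {
    "name": "InputBlobDataset",  # assumed name
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": linked_service["name"],
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "input-container",  # assumed container
                "folderPath": "crime-data",      # assumed folder
            }
        },
    },
}


def describe(ds, ls):
    """Summarize which connection a dataset uses and where its data sits."""
    loc = ds["properties"]["typeProperties"]["location"]
    return "{} reads {}/{} via {}".format(
        ds["name"], loc["container"], loc["folderPath"], ls["name"]
    )


if __name__ == "__main__":
    print(describe(dataset, linked_service))
    print(json.dumps(dataset, indent=2))
```

The point of the split is that several datasets can reuse one linked service, so the connection details are kept in a single place.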

As with other tools, you can use the standard Windows keyboard shortcuts in the Azure portal, so CTRL+Z works for Undo.

The debug mode for mapping data flows in Azure Data Factory and Synapse Analytics allows you to interactively watch the data shape transform while you build and debug your data flows. The debug session can be used both in data flow design sessions and during pipeline debug execution of data flows.

While renaming a transformation or activity as instructed in the lab, you may get a "name taken" error. If so, use a different name for that activity, since you may have already used the same name for another service.

The credentials will be shared by the lab facilitator. Please contact them if you run into any issues.

Use the global search box in the Azure portal and type the name of the resource as given in the lab documentation. The results list the matching resources along with their types.

The Publish option commits your changes to Azure. After you modify anything in an Azure service, you must publish the changes so that they take effect and become visible to all other members.

  • Inner Join  
  • Left Outer  
  • Right Outer  
  • Full Outer  
  • Custom cross join  
  • Non-equi joins
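To make the differences between these join types concrete, here is a minimal sketch in plain Python rather than in a mapping data flow. The two small tables, their values, and the helper name join are made up for illustration only.

```python
# Hypothetical sample rows as (district_id, value) pairs.
crimes = [(1, "theft"), (2, "fraud"), (4, "arson")]
districts = [(1, "North"), (2, "South"), (3, "East")]


def join(left, right, how="inner"):
    """Equi-join two lists of (key, value) pairs on the key.

    how is one of "inner", "left", "right", "full".
    """
    rows = []
    matched_right = set()
    for lk, lv in left:
        hit = False
        for i, (rk, rv) in enumerate(right):
            if lk == rk:
                rows.append((lk, lv, rv))
                matched_right.add(i)
                hit = True
        # Outer joins keep unmatched left rows, padded with None.
        if not hit and how in ("left", "full"):
            rows.append((lk, lv, None))
    # Right and full outer joins also keep unmatched right rows.
    if how in ("right", "full"):
        for i, (rk, rv) in enumerate(right):
            if i not in matched_right:
                rows.append((rk, None, rv))
    return rows


# Cross join: every left row paired with every right row.
cross = [(l, r) for l in crimes for r in districts]

# Non-equi join: match on a condition other than equality (here, key < key).
non_equi = [(l, r) for l in crimes for r in districts if l[0] < r[0]]

if __name__ == "__main__":
    for how in ("inner", "left", "right", "full"):
        print(how, join(crimes, districts, how))
```

With this sample data, the inner join keeps only districts 1 and 2, the left outer join adds the unmatched crime in district 4, the right outer join adds the unmatched district 3, and the full outer join keeps both.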

The documentation and instructions were written from a generic perspective, so the actual resource names will be similar but may not match exactly. For example, the document uses SaferaLab as the resource group name, but you might see names such as SaferaLab1, SaferaLab2, or Safera_Lab. Select whichever visible resource most closely matches the one in the instruction steps or screenshots.

To avoid access issues with the Azure portal caused by old saved logins in the browser, use incognito mode for the lab.

No. The labs run in a sandbox, and all changes are lost after a couple of hours, so please try to finish the labs in one go.

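Extracts a substring from the source string, starting at the given index and running to the end of the string or, if the optional length is supplied, for at most that many characters. In the examples below, the index is zero-based and a negative index counts back from the end of the string.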

Syntax:

substring(source, startingIndex [, length])

Examples:

substring("123456", 1)        // 23456
substring("123456", 2, 2)     // 34
substring("ABCD", 0, 2)       // AB
substring("123456", -2, 2)    // 56
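The behavior shown in these examples can be mimicked in Python to check your understanding. The function below is a hypothetical emulation written for this FAQ, not the service's own implementation. It assumes a zero-based start index, a negative index that counts from the end of the string, and a non-negative length, which is all the four examples demonstrate.

```python
def substring(source, starting_index, length=None):
    """Emulate the substring examples above (zero-based index assumed)."""
    if starting_index < 0:
        # A negative index counts back from the end, clamped at the start.
        start = max(len(source) + starting_index, 0)
    else:
        start = starting_index
    if length is None:
        # No length given: take everything to the end of the string.
        return source[start:]
    # Treat a negative length as zero (an assumption of this sketch).
    return source[start:start + max(length, 0)]


if __name__ == "__main__":
    print(substring("123456", 1))      # 23456
    print(substring("123456", 2, 2))   # 34
    print(substring("ABCD", 0, 2))     # AB
    print(substring("123456", -2, 2))  # 56
```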

Yes, Azure Data Factory can be used for data migration and for lift and shift of your existing jobs and processes from on-premises servers to the cloud.

However, the payload size limit does not relate to the amount of data you can move and process with Azure Data Factory.

To learn more about Azure Data Factory limits, refer to the link below:

https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/azure-subscription-service-limits#azure-data-factory-limits

Yes, we can use Azure ML Studio to develop machine learning solutions.  

More details can be found at the link below:

https://azure.microsoft.com/en-us/products/machine-learning/

Exploratory data analysis (EDA) is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods. It helps determine how best to manipulate data sources to get the answers you need, making it easier for data scientists to discover patterns, spot anomalies, test a hypothesis, or check assumptions.  

The main purpose of EDA is to help you look at data before making any assumptions. It can help identify obvious errors, better understand patterns within the data, detect outliers or anomalous events, and find interesting relations among the variables.
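A first EDA pass often starts with summary statistics and a simple outlier check. The sketch below uses only Python's standard library. The incident counts are invented for illustration, the helper names summarize and iqr_outliers are hypothetical, and the 1.5 x IQR rule is just one common convention for flagging outliers.

```python
import statistics

# Made-up daily incident counts; the last value is deliberately extreme.
counts = [12, 15, 11, 14, 13, 16, 12, 95]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    print(summarize(counts))
    print("possible outliers:", iqr_outliers(counts))
```

In this sample, the mean (23.5) sits well above the median (13.5), which hints at a skewed distribution before any plot is drawn, and the outlier check flags the value 95 for a closer look.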
