21. Where do the users use ETL concepts? Explain it.
- Before ETL tools existed, users wrote hand-coded programs for everything from data transformation to data loading.
- ETL makes life more comfortable: one tool manages all the scenarios of transforming and loading the data.
Here is the following example where we are using the ETL:
Example: Data Warehousing
ETL is used in data warehousing. Here, we need to fetch data from multiple different systems and load it into the data warehouse database. The ETL concept is used to extract the data from the sources, transform it, and load it into the target system.
Example: Data Migration
Data migration is a difficult task if we use plain PL/SQL. If we want to migrate the data in a simpler way, we use one of the various ETL tools.
Example: Mergers and Acquisitions
Nowadays, many companies are merging into larger MNCs. To move the data from one company to another, the need for the ETL concept arises.
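The data warehousing scenario above can be sketched in a few lines of Python. This is a minimal, illustrative example, not any particular tool's API: the source systems, field names, and table schema are all hypothetical.

```python
import sqlite3

# Hypothetical source systems: two feeds with different field names.
billing_rows = [{"cust": "alice", "amount": 120.0}, {"cust": "bob", "amount": 75.5}]
crm_rows = [{"customer_name": "carol", "spend": 42.0}]

def extract():
    # Extract: pull raw records from each source system.
    yield from (("billing", r) for r in billing_rows)
    yield from (("crm", r) for r in crm_rows)

def transform(source, row):
    # Transform: map source-specific fields onto one warehouse schema.
    if source == "billing":
        return (row["cust"].title(), row["amount"])
    return (row["customer_name"].title(), row["spend"])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_sales (customer TEXT, amount REAL)")

# Load: insert the transformed records into the warehouse table.
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?)",
    (transform(s, r) for s, r in extract()),
)
print(conn.execute("SELECT COUNT(*) FROM dw_sales").fetchone()[0])  # 3
```

The same extract/transform/load split applies whether the target is a warehouse table or a migration destination.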
22. Mention the types of Data Warehouse applications. What is the difference between Data Mining and Data Warehousing?
Types of data warehouse applications are:
- Info Processing
- Analytical Processing
- Data Mining
Data mining can be defined as the process of extracting hidden predictive information from large databases and interpreting the data, while data warehousing may make use of data mining for faster analytical processing of the data. Data warehousing is the process of aggregating data from multiple sources into one common repository.
23. How do we use ETL in third party management?
Large organizations often give different application development work to different vendors, since a single vendor cannot manage everything. Take the example of a telecommunication project where billing is handled by one company and CRM by another. If the CRM company needs data from the company managing billing, it will receive a data feed from that company. The ETL process is used to load the data from that feed.
24. What is the difference between ETL Testing and Database Testing?
The differences between ETL testing and Database testing are:
| ETL Testing | Database Testing |
| --- | --- |
| Verifies that data is correctly extracted, transformed, and loaded from source to target. | Verifies the integrity and correctness of data within a single database. |
| Performed on analytical (OLAP) systems such as data warehouses. | Performed on transactional (OLTP) systems. |
| Works with denormalized data and fewer joins. | Works with normalized data and more joins. |
| Checks row counts, data completeness, and transformation rules between source and target. | Checks keys, constraints, and ACID properties. |
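A typical ETL test reconciles the source and target after a load, for example by comparing row counts and column totals. The sketch below uses sqlite3 with made-up table names purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER, amount REAL);
    CREATE TABLE tgt_orders (id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO tgt_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
""")

def count_check(table):
    # Row-count reconciliation between source and target.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def sum_check(table):
    # A simple checksum-style metric: total of a numeric column.
    return conn.execute(f"SELECT SUM(amount) FROM {table}").fetchone()[0]

assert count_check("src_orders") == count_check("tgt_orders")  # counts match
assert sum_check("src_orders") == sum_check("tgt_orders")      # amounts reconcile
print("ETL validation passed")
```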
25. What are the characteristics of a Data Warehouse?
- A data warehouse is a database that is kept separate from the operational database and stores historical data.
- A data warehouse database contains analytical as well as transactional data.
- A data warehouse is used for data analysis and reporting purposes.
- A data warehouse helps higher management take strategic and tactical decisions using historical or current data.
- A data warehouse helps business users analyze current trends to run the business.
26. What are the types of Data Warehouse systems?
- Online Analytical Processing (OLAP)
- Predictive Analysis
- Online Transactional Processing (OLTP)
- Data Mart
27. What are the steps followed in the ETL testing process?
The different steps followed in the ETL testing process are:
Step 1. Requirement Analysis
In this step, we understand the business structure and the requirements.
Step 2. Validation and Test Estimation
In this step, we estimate the time and expertise required for the testing.
Step 3. Test Planning and Designing the Testing Environment
Testing is planned according to the inputs from the test estimation, and the testing environment is set up accordingly.
Step 4. Test Data Preparation and Execution
Test data is prepared and the tests are executed as per the requirements.
Step 5. Summary Report
On the completion of the test run, a summary report is prepared for concluding and improvising.
28. How is ETL used in a Data Migration Project? Explain it.
ETL tools are generally used in data migration projects. For example, if an organization previously managed its data in Oracle 10g and now wants to use a SQL Server cloud database, the data needs to be moved from source to target. For this kind of movement, ETL tools are very useful, because writing the migration code by hand is a very time-consuming process. Using an ETL tool is much simpler than writing PL/SQL or T-SQL code. So the ETL process is useful in data migration projects.
29. What is an Operational Data Store?
- ODS stands for Operational Data Store.
- The ODS sits between the staging area and the data warehouse. The data in the ODS is kept at a low level of granularity.
- When data is inserted into the ODS, it is then loaded into the EDW (Enterprise Data Warehouse) through the ODS.
30. What is the data extraction phase in ETL?
Data extraction is nothing but extracting the data from multiple, different sources using ETL tools.
There are two types of data extraction:
- Full Extraction: All the data from the operational or source systems is loaded into the staging area.
- Partial Extraction: Only the data that has changed since the last run is extracted, based on notifications from the source system. This is also called a Delta Load.
One important consideration is source system performance: the extraction strategy should not affect the performance of the source system.
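The two extraction types can be sketched as follows. The table, column names, and watermark value are illustrative; a real delta load would track the last-run timestamp persistently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_orders (id INTEGER, amount REAL, updated_at TEXT);
    INSERT INTO source_orders VALUES
        (1, 10.0, '2024-01-01'),
        (2, 20.0, '2024-01-05'),
        (3, 30.0, '2024-01-09');
""")

def full_extract():
    # Full extraction: everything in the source lands in staging.
    return conn.execute("SELECT * FROM source_orders").fetchall()

def delta_extract(last_run):
    # Partial (delta) extraction: only rows changed since the last run.
    return conn.execute(
        "SELECT * FROM source_orders WHERE updated_at > ?", (last_run,)
    ).fetchall()

print(len(full_extract()))               # 3
print(len(delta_extract("2024-01-03")))  # 2
```

Filtering in the source query, as shown here, also helps with the performance consideration above: the source system only serves the changed rows.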
31. What are the ETL Tools?
The popular tools are:
- Enterprise ETL tools:
  - Informatica
  - Talend
  - IBM DataStage
  - Ab Initio
  - Microsoft SQL Server Integration Services (SSIS)
  - CloverETL
- Open-source ETL tools:
  - Pentaho
  - Kettle
32. What is partitioning in ETL?
For better performance, transactions need to be divided into smaller units; this process is known as partitioning. Partitioning ensures that the server can access the sources directly through multiple connections.
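A simple way to picture partitioning: split the rows into ranges and process each range in its own worker. This is an illustrative stdlib sketch; in a real ETL tool each partition would hold its own connection to the source.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical transaction rows keyed by id.
rows = [{"id": i, "amount": i * 1.5} for i in range(1, 101)]

def partition(data, n):
    # Range partitioning: split the rows into n roughly equal chunks.
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]

def process(chunk):
    # Each worker transforms its own partition independently.
    return sum(r["amount"] for r in chunk)

parts = partition(rows, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    totals = list(pool.map(process, parts))

print(len(parts), sum(totals))  # 4 7575.0 -- same total as a single pass
```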
33. What is the ETL Pipeline?
An ETL pipeline refers to a set of processes that extract the data from one system, transform it, and load it into some database or data warehouse. ETL pipelines are built for data warehousing applications, including both enterprise data warehouses and subject-specific data marts. ETL pipelines are also used for data migration solutions. Data warehouse / business intelligence engineers build ETL pipelines.
34. What is the Data Pipeline?
A data pipeline refers to any set of processing elements that move data from one system to another. A data pipeline can be built for any kind of application which uses data to bring value. It can be used for integrating data across applications, building data-driven web products, and carrying out data mining activities. Data engineers build data pipelines.
35. What is the staging area in ETL Testing?
The staging area is a temporary storage area used during the data integration process. In this area, data is analyzed carefully for redundancy and duplication.
36. What is an ETL mapping sheet? Define its significance.
An ETL mapping sheet contains all the necessary information about the source and target, stored in rows and columns. Mapping sheets help in writing the SQL queries needed to speed up the testing process.
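As a sketch of how a mapping sheet drives test queries, the snippet below represents a sheet as a list of rows and generates a source-side SELECT for a tester to diff against the target. The column and table names are purely illustrative.

```python
# A toy mapping sheet: one row per target column, as it might appear
# in the spreadsheet (all names here are hypothetical).
mapping_sheet = [
    {"source_table": "src_cust", "source_column": "cust_name",
     "target_table": "dim_customer", "target_column": "customer_name"},
    {"source_table": "src_cust", "source_column": "cust_city",
     "target_table": "dim_customer", "target_column": "city"},
]

def build_compare_sql(rows):
    # Generate a source-side SELECT covering every mapped column.
    src = rows[0]["source_table"]
    cols = ", ".join(r["source_column"] for r in rows)
    return f"SELECT {cols} FROM {src}"

print(build_compare_sql(mapping_sheet))
# SELECT cust_name, cust_city FROM src_cust
```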
37. What is the transformation in ETL Testing?
- A transformation is defined as a repository object that generates, modifies, or passes data. Transformations can be active or passive. Transformation is beneficial in many ways.
- It helps in getting values very quickly.
- The transformation can update the slowly changing dimension table.
- It checks or verifies whether the record exists or not inside the table.
38. What is the use of dynamic cache and static cache in transformation?
The dynamic cache is used when updating a slowly changing dimension or master table. The static cache is used with flat files.
39. What is a mapping, Session, Worklet, and Mapplet?
- Mapping: A mapping represents the data flow from source to target.
- Workflow: A workflow is a set of instructions which tells the Informatica server how to execute the tasks.
- Mapplet: A mapplet configures or creates a reusable set of transformations.
- Worklet: A worklet is an object that represents a set of tasks.
- Session: A session is a set of instructions that describes how and when to move the data from source to target.
40. What is full load and incremental or refresh load?
Full Load: A full load completely erases the contents of one or more tables and reloads them with fresh data.
Incremental Load: In an incremental load, we apply only the ongoing changes to one or more tables, based on a predefined schedule.
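The two load strategies can be sketched against a toy dimension table. The schema is illustrative; the incremental path here uses an upsert keyed on the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "old-a"), (2, "old-b")])

def full_load(rows):
    # Full load: wipe the table, then reload everything from scratch.
    conn.execute("DELETE FROM dim_product")
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", rows)

def incremental_load(changed_rows):
    # Incremental load: apply only new or changed rows (upsert on the key).
    conn.executemany("INSERT OR REPLACE INTO dim_product VALUES (?, ?)",
                     changed_rows)

full_load([(1, "a"), (2, "b"), (3, "c")])
incremental_load([(2, "b2"), (4, "d")])
print(conn.execute("SELECT COUNT(*) FROM dim_product").fetchone()[0])  # 4
```

The full load rewrites all three rows; the incremental load then touches only the two changed keys.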
41. What are joiner and lookup?
The joiner is used to join two or more tables to retrieve data from them.
The lookup is used to check and compare the source table and the target table.
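The difference can be shown with plain SQL: a join combines two tables on a common key, while a lookup probes a reference table for a single key. The tables below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, cust_id INTEGER, amount REAL);
    CREATE TABLE customers (cust_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10, 99.0), (2, 20, 45.0);
    INSERT INTO customers VALUES (10, 'Alice'), (20, 'Bob');
""")

# Joiner: combine two tables on a common key to retrieve related data.
joined = conn.execute("""
    SELECT o.id, c.name, o.amount
    FROM orders o JOIN customers c ON o.cust_id = c.cust_id
    ORDER BY o.id
""").fetchall()

# Lookup: check whether a key from the source exists in a reference table.
def lookup(cust_id):
    row = conn.execute(
        "SELECT name FROM customers WHERE cust_id = ?", (cust_id,)
    ).fetchone()
    return row[0] if row else None

print(joined)                  # [(1, 'Alice', 99.0), (2, 'Bob', 45.0)]
print(lookup(10), lookup(99))  # Alice None
```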
42. What is data purging?
Data purging is a term commonly used to describe methods which remove and permanently erase data from a storage space; in other words, deleting data from the data warehouse is known as data purging. Usually, we have to clean up junk data such as rows that contain null values or only spaces.
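A minimal purge of such junk rows can be expressed as a single DELETE. The staging table below is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_cust (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO stg_cust VALUES (?, ?)",
                 [(1, "Alice"), (2, None), (3, "  "), (4, "Bob")])

# Purge junk rows: NULL names or names that are only whitespace.
conn.execute("DELETE FROM stg_cust WHERE name IS NULL OR TRIM(name) = ''")
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM stg_cust").fetchone()[0])  # 2
```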
43. What is the difference between ETL tools and OLAP tools?
ETL tools are meant for extracting data from legacy systems and loading it into a specified database, with some data cleansing along the way.
For example: Informatica, DataStage, etc.
OLAP tools are used for reporting purposes on OLAP data available in a multidimensional model. We can write simple queries to extract the data from the database.
Example: BusinessObjects, Cognos, etc.