1.What is ETL? Explain it.
ETL stands for Extraction, Transformation, and Loading. It is an essential concept in Data Warehousing systems. There are three basic steps in the Data Integration Process. Extraction stands for extracting the data from different data sources such as transactional systems or applications. Transformation stands to apply the conversion rules on data so that it becomes suitable for analytical reporting. Loading process involves, to move the data into the target system, i.e., Data Warehouse.
2.Compare ETL Testing with Manual Testing
- Basic Procedure:
- ETL Testing: Involves automating the testing process by writing scripts.
- Manual Testing: Relies on direct observation and hands-on testing of the software.
- Requirements:
- ETL Testing: Only basic software knowledge is needed.
- Manual Testing: Requires technical skills, including SQL and Shell scripting knowledge.
- Efficiency:
- ETL Testing: Achieves results quickly and systematically with high effectiveness.
- Manual Testing: Demands more time and effort, and is more susceptible to errors.
3.Explain the concept of Extraction, Transformation, and Loading?
Extraction
Extracted the data from an external source and move it to the data Warehouse pre-processor database.
Transformation
Transform data task allows point to point generating, modifying, and transforming the data.
Loading
In this task, the data is added to the database table in a warehouse.
4. Why is ETL Testing required?
- To keep an eye on data that is being transferred from one system to another
- To keep track of the efficiency and speed of the process
- To achieve fast and the best results
5.What is the three-layer architecture of an ETL cycle?
- Staging Layer: Staging layer is used to store the data which is extracted from the different data source systems.
- Data Integration Layer: Integration layer transforms the data from the staging layer and moves the data to a database. In the database, the data is arranged into hierarchical groups, which is often called dimension, and into facts and aggregation facts. The combination of facts and dimension tables in a data warehouse system is called a schema.
- Access Layer: Access layer is used by the end-users to retrieve the data for analytical reporting.
6. What are the responsibilities of an ETL Tester?
An ETL Tester:
- Requires in-depth knowledge of the ETL tools and processes
- Needs to write SQL queries for various scenarios during the testing phase
- Should be able to carry out different types of tests and keep a check on the other functionalities of the process
- Needs to carry out quality checks on a regular basis
7. What is BI?
Business Intelligence is the process for collecting raw business data and transforming it into a meaningful vision that is more useful for business.
8. What are the various tools used in ETL?
- Cognos Decision Stream
- Oracle Warehouse Builder
- Business Objects XI
- SAS Business Warehouse
- SAS Enterprise ETL Server
9.What are the differences between ETL and BI tools?
ETL Tools | BI Tools |
ETL tools are used to extract data from various data sources, transform it, and load it into a data warehouse system. | BI tools are used to generate interactive and ad-hoc reports, and provide data visualization for monthly, quarterly, and annual board meetings. |
Common ETL tools include Informatica, SAP BO Data Services, Microsoft SSIS, Oracle Data Integrator (ODI), and CloverETL. | Common BI tools include SAP Lumira, IBM Cognos, Microsoft BI platform, Tableau, and Oracle Business Intelligence Enterprise Edition. |
10.What are the ETL tools available in the market?
The popular ETL tools available in the market are:
- IBM- Websphere DataStage
- Informatica- Power Center
- SAP- Business objects data service BODS
- SAS – Data Integration Studio
- Oracle- Warehouse Builder
- Open source Clover ETL
11. Define ETL Processing.
ETL Testing Process:
Although there are many ETL tools, there is a simple testing process commonly used in ETL Testing. It is as important as the implementation of the ETL tool into your business. Having a well-defined ETL Testing strategy can make the testing process much easier. Hence, this process needs to be completed before you start the data integration with the selected ETL tool.
In this ETL Testing process, a group of experts comprising the programming and developing team will start writing SQL statements. The development team may customize them according to the requirements.
ETL Testing process has the following stages:
- Analyzing requirements: Understanding the business structure and their particular requirements.
- Validation and test estimation: Estimating the time and expertise required to carry on with the procedure.
- Test planning and designing the testing environment: Based on the inputs from the estimation, an ETL environment is planned and worked out.
- Test data preparation and execution: Data for the test is prepared and executed as per the requirements.
- Summary report:Â Upon the completion of the test run, a brief summary report is prepared for improvising and concluding.
12. What is the difference between the data warehouse and data mining?
Data warehousing is a broad concept as compared to data mining. Data Mining involves extracting the hidden information from the data and interpreting it for future forecasting. In contrast, data warehousing includes operations such as analytical reporting to generate detailed reports and ad-hoc reports, information processing to generate interactive dashboards and charts.
13. What do ETL Testing operations include?
ETL Testing includes:
- Verifying whether the data is transformed accurately according to business requirements
- Verifying that the projected data is loaded into the data warehouse without any truncation or data loss
- Making sure that the ETL application reports any invalid data and replaces with default values
- Making sure that the data loads within the expected time frame to improve scalability and performance
14.What are the differences between OLTP and OLAP?
OLTP | OLAP |
OLTP stands for Online Transactional Processing. | OLAP stands for Online Analytical Processing. |
OLTP is a relational database used to manage day-to-day transactions. | OLAP is a multidimensional system, also known as a data warehouse. |
15.What is a dimension table and how is it different from the fact table?
A dimension table stores attributes or dimensions that describe the objects in a fact table. It acts as a companion to the fact table.
Example to illustrate the differences:
Suppose a company sells products to customers. Each sale is a fact, and the fact table records these facts. The fact table stores the primary keys that join the fact table with the dimension tables and measures/facts.
Example: Fact Table – Units Sold
Cust_ID | Prod_ID | Time_ID | No. of units sold |
101 | 24 | 1 | 25 |
102 | 25 | 2 | 15 |
103 | 26 | 3 | 30 |
Dimension Table – Customer Details
Cust_ID | Cust_Name | Gender |
101 | Sana | F |
102 | Jass | M |
The fact table (Fact Units) records quantitative data about sales transactions, while the dimension table (Customer Details) provides descriptive attributes related to those transactions, such as customer names and gender.
16. What is a Data Mart?
Data Mart is a simple form of Data Warehouse, and it is focused on a single functional area. It gets the only comes from a few sources.
For example: In an organization, data marts may exist for marketing, finance, human resource, and other individual departments which store the data related to their specific functions.
17. List a few ETL bugs.
- Calculation Bug
- User Interface Bug
- Source Bug
- Load Condition Bug
- ECP-related Bug
18. What is Fact? What are the types of Facts?
Fact is a central component of a multi-dimensional model that contains the measures to be analyzed. Facts are related to dimensions.
Types of facts are:
- Additive Facts
- Semi-additive Facts
- Non-additive Facts
19. What are Cubes and OLAP Cubes?
Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide a multi-dimensional analysis.
OLAP stands for ‘Online Analytics Processing,’ and OLAP Cubes store voluminous data in a multi-dimensional form for reporting purposes. They consist of facts called ‘measures’ categorized by dimensions.
20. What is the need for ETL Testing?
In today’s time, we are migrating lots of systems from old technology to new technology. At the time of migration activities, we also need to migrate the data as well from old DBMS to latest DBMS. So there is a lot of need to test whether the data is correct from the target side.
Here, are some important points where the need for ETL testing is arising:
- ETL testing used to keep an eye on the data which is being transferred from one system to another.
- The need for ETL testing is to keep a track on the efficiency and speed of the process.
- The need for ETL testing is arising to be familiar with the ETL process before we implement it into our business and production.