Now that we have discussed about the overview of Oracle data warehouse reference architecture in our earlier article, let me explain in detail about each layer starting with Staging.
Staging, as the name suggests is an area to stage your unprocessed or processed data in your ETL or ELT process. In most of the data warehousing projects, staging area is created as a separate schema in the warehouse. However, staging area can also refer to imaginary area within the same schema as that of your warehouse and separated by naming convention.
Staging can contain tables of two different types:
Persistent Staging: The tables are designed to hold data for extended period of time for archival or troubleshooting purposes. This data will be purged from database after a certain period of time called the staging retention period either automatically by ETL process or manually.
Non- Persistent Staging: This kind of staging area will be transient in nature, with their contents erased prior to running an ETL process or immediately after successful completion of an ETL process.
Staging area, in general sits between the Source and Target systems of warehouse architecture. In some architectures, staging area is designed closer to source systems where the structure of staging tables will be similar to source tables and in others, it will be designed closer to target systems.
In the reference architecture, staging is placed conveniently between source and foundation layers for doing a variety of data warehousing functions.
Data Cleansing: Data warehouse must contain data that is clean, consistent and complete, as far as is practical. This is the layer where the business rules are applied on data to achieve these objectives. Rejected data is retained in this layer for manual or automatic correction.
Data Transformation: This layer acts as a temporary storage area for data manipulation before it enters the foundation layer. Transforming data essentially means converting data to conform to business rules of the warehouse.
Change Data Capture: Changed data can be captured using real time event detection process like Oracle Streams, Oracle Golden Gate etc. or simply using traditional change data capture techniques like identifying the delta records based on last update time on source or a combination of both. This layer serves to isolate the rate at which data is received from source to the rate at which it is consumed by reports. The data can be received from the source system using Real time techniques but can be loaded to target on frequencies like daily, weekly, monthly, yearly ruled by the business requirements.
As with other layers in the architecture, staging layer is exposed to BI tools for reporting including data quality information.
Let us discuss Foundation layer and its functions in the next article.