Data Quality & Database Migration Checklist

Data is the oil of this century. The database is the storage place for this oil.

Interested in the world of data? Then this blog post is for you 🙂

Database migration testing is a vital aspect of software testing that helps in smooth and error-free transitions between different versions or structures of a database. It can help with data integrity, consistency, and functionality verification throughout the migration process. This blog post will present a comprehensive checklist of essential considerations for effective database migration testing. Following this checklist can enhance your test coverage, minimize risks, and ensure a successful migration.

Note: This checklist was initiated and majorly contributed by Ayan Paul. He holds majority rights to this work. So, if you want to refer to this or use this as a reference in your work, Please add your due credits to Ayan Paul.

Pre-Migration Checklist

Post Migration Checklist

The below section consists of a checklist for Data Quality by Amit Sharma | LinkedIn
These were originally posted as LinkedIn posts and have been added here with due permissions and credits.

Data Quality Checklist

How can we implement Data Quality checks within a Data Pipeline?

  1. Applications can expose their data in the form of various data sources like Flat files, APIs, Databases, etc
  2. Data Pipeline in the first stage will connect to the data sources, and extract and ingest the data in the staging area.
  3. In the staging area data will be in the same format as it was exposed by the data source.

The following Data Quality Test needs to be carried out here:

  1. Data Freshness Checks: These tests will check whether we have received data from all the sources. This can be performed by checking the latest timestamp of data obtained for a particular data source against the current date.
  2. Data Uniqueness Checks: These tests will verify that there are no duplicates in the data received from the source.
  3. Data Validity tests: These tests should check for null checks and accepted values check.
  4. Data Completeness Checks: These checks will verify that all rows that were exposed from the data source were ingested successfully in the staging area.
  5. Data Transformation: After raw data has been ingested in the staging layer and all tests have passed, the transformation task will begin. It involves converting raw data into a standardized format and writing it to the reporting layer of the Data Warehouse, where downstream applications and reporting systems can consume the data. Once data has been transformed, the following tests can be performed to check the quality of the data:
  • Business Logic Checks: These checks will ensure that data has been transformed in accordance with the dataset’s requirements. As per business rules, all expected columns, etc are present in the data.
  • Referential Integrity checks: The purpose of these checks is to ensure that transformed data is consistent across related tables. The foreign key must match the primary key in the related table. If a dataset contains two tables, Orders and Customers, this check will ensure all customers in the Customer table are present in the Orders table.
  • Data Consistency Checks: During these checks, it will be ensured that the data across different systems is the same. For instance, if there are two tables in a database that show impressions on a particular advertisement on a daily and hourly basis. As a result of this check, we will ensure that the total of all the hourly impressions for a particular advertisement matches the total of the daily impressions in the daily table for that advertisement
  • Data Profiling Checks: Data will be profiled and compared to some statistical values, like standard deviation, averages, etc., to identify trends. These checks should trigger an alert if the data received does not conform to the trend for a
  • Schema Checks: These tests will compare data received against the expected schema. Data types of received columns against expected columns can be checked here. Number of columns received in data against expected columns can also be checked here

Want to help us in extending this checklist? Share your thoughts in the comments. Thanks 🤩

Still wondering, What could go wrong while migrating data? Check out this resource from #RST by Michael Bolton and James Bach on Heuristics of Software Testability.

If you liked this article, you can check out this popular post from my blog hits:

Leave a Comment

Your email address will not be published. Required fields are marked *