In the last post, we discussed building the data extract to be consumed by the reporting tool of choice. If you haven’t read the previous posts, you can find them here:
- Phase 1: https://datastrides.wordpress.com/2019/02/02/your-report-2-0-a-data-engineers-guide-to-migrating-bi-reports-from-old-to-new-systems-part-1/
- Phase 2: https://datastrides.wordpress.com/2019/02/23/your-report-2-0-a-data-engineers-guide-to-migrating-bi-reports-from-old-to-new-systems-part-2-building-the-extract/
The next phase is crucial: unit testing and QA. If these steps are glossed over, the result will be many iterations of back and forth with business users, which is frustrating to all involved. A wise data leader once described the data mantra he lives by: “Get it working, get it right, get it fast”. If the extract is running and producing data, it is ‘working’. Now it is time to get it ‘right’.
Before even running the extract and loading data into the reporting tool, I like to validate the data first by simply looking at it and running some simple unit tests. The first step is finding a source of truth. If you are migrating an existing report, you can hopefully be provided with a historic version of that report and a data extract to compare against. If this is a brand new report, or you are sourcing from a brand new system, you might have to work with the business to manually validate your first pass. Once you have your source of truth, pick one specific record, or, if it is an aggregated report, drop to the lowest granularity possible, and make sure everything looks good. Pick a specific record by ID and ask the data some questions:
- Is the record there?
- Do all the fields match your source of truth?
- Do the datatypes and data formats match up (are dates in the same format)?
- If you are looking at an aggregate, are your totals at the lowest granularity accurate?

If not, you will have to dive in, look at how you are aggregating the data, and see if something is missing, a calculation is off, or your ETL is losing records or data. After you are happy with your first record, extend the checks to other records, and if you have other base ‘types’ of records, try those out as well.
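Here is a minimal sketch of what a record-level check might look like in Python with pandas, assuming the legacy report and the new extract are both available as CSV files. The file names, the `id` key, and the `order_date` column are all hypothetical placeholders for your own schema:

```python
import pandas as pd

# Hypothetical file names -- substitute your legacy report and new extract.
truth = pd.read_csv("legacy_report.csv", parse_dates=["order_date"])
extract = pd.read_csv("new_extract.csv", parse_dates=["order_date"])

record_id = 12345  # one specific record to interrogate

old_row = truth.loc[truth["id"] == record_id]
new_row = extract.loc[extract["id"] == record_id]

# Is the record there at all?
assert not new_row.empty, f"Record {record_id} is missing from the new extract"

# Do the fields, datatypes, and formats match the source of truth?
for col in truth.columns:
    assert col in extract.columns, f"Column '{col}' is missing from the extract"
    assert truth[col].dtype == extract[col].dtype, f"Datatype mismatch on '{col}'"
    old_val, new_val = old_row[col].iloc[0], new_row[col].iloc[0]
    assert old_val == new_val, f"{col}: expected {old_val!r}, got {new_val!r}"

print(f"Record {record_id} matches the source of truth")
```

Looping over columns rather than comparing whole rows means the assertion failure tells you exactly which field diverged, which saves a lot of squinting.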
During this time, you might discover bugs in the old report. I have seen this frequently. The best thing you can do in this scenario is to write up a detailed ‘report’ of the discrepancy, with one to three concrete examples and screenshots, and meet with the business to ‘prove’ your version is actually accurate. This will require stepping through the entire ETL process, from source to extract, so it can be quite time-consuming. It happens more often than you would expect, so getting good at explaining these technical discrepancies to the business is an invaluable skill.
Once you have unit tested at the record level, you should unit test within a specific time range. Some things to check: Do record counts match? Do aggregates across a specific range line up? Basically, this is a good time to look at the data across a larger span, to make sure you have all the data you need.
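A sketch of those range-level checks, with the same hypothetical file and column names as before (`amount` standing in for whatever measure your report aggregates):

```python
import pandas as pd

# Hypothetical files and columns -- the same sources as the record-level check.
truth = pd.read_csv("legacy_report.csv", parse_dates=["order_date"])
extract = pd.read_csv("new_extract.csv", parse_dates=["order_date"])

start, end = "2019-01-01", "2019-01-31"
old_slice = truth[truth["order_date"].between(start, end)]
new_slice = extract[extract["order_date"].between(start, end)]

# Do record counts match across the range?
assert len(old_slice) == len(new_slice), (
    f"Count mismatch: {len(old_slice)} legacy vs {len(new_slice)} new"
)

# Do aggregates across the range line up (within a rounding tolerance)?
old_total, new_total = old_slice["amount"].sum(), new_slice["amount"].sum()
assert abs(old_total - new_total) < 0.01, (
    f"Total mismatch: {old_total} legacy vs {new_total} new"
)

# Daily counts side by side make it easy to spot where records go missing.
daily = pd.concat(
    {
        "legacy": old_slice.groupby(old_slice["order_date"].dt.date).size(),
        "new": new_slice.groupby(new_slice["order_date"].dt.date).size(),
    },
    axis=1,
).fillna(0)
print(daily[daily["legacy"] != daily["new"]])
```

Printing the mismatched days, rather than just failing, makes it much faster to narrow down exactly where data is being dropped.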
After you have completed your unit testing and are fairly confident in the quality of your data extract, it is time to integrate with the reporting tool. Ideally, this is the point where a QA person jumps in and tests the report itself. It is always good to have a third party test your work, and remember not to take bugs personally! They are unavoidable. But if you do have to be your own QA, you will want to spin through the entire report. Are the charts populating properly? Do all the filters and functions work? If you have historical data and can compare an old version of the report to the new version, now is the time to do it! Another piece of integration testing is making sure the extract behaves properly when you are doing incremental loads. If you have dates as a parameter in your extract, try running it multiple times with overlapping dates. If you are getting duplicates, there is likely an issue in your source query itself, or in your merge statements; a prior post on this blog covers identifying issues with incremental merges.
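A minimal sketch of that duplicate check, assuming the extract lands in a CSV and `id` is the merge key (both hypothetical):

```python
import pandas as pd

# Hypothetical: the extract table after two runs with overlapping date windows.
extract = pd.read_csv("new_extract.csv")

# A correct merge updates existing rows in place on re-run; it never inserts
# a second copy of the same key. Any duplicates here point at the source
# query or the merge statement.
dupes = extract[extract.duplicated(subset=["id"], keep=False)]

if dupes.empty:
    print("No duplicate keys -- overlapping incremental runs merged cleanly")
else:
    print(f"{dupes['id'].nunique()} keys appear more than once; sample below:")
    print(dupes.sort_values("id").head(10))
</code>
```

If this check fails right after an overlapping run, the merge key and the match condition in your merge statement are the first places to look.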
The final step in the process should always be business validation (user acceptance testing). Once you are confident that the data looks good in the report, have a business user confirm the same. Have them test it over a period of time (perhaps days), so you can confirm that updates are working properly and that no issues arise over time. Once the business gives the green light, you are ready to push the report to production, thus beginning Phase 4: Support and Enhancements.