Why row-count checks miss data loss in a database migration

← Back to blog

The row counts match, so the migration went fine. It's the most common false comfort in a data migration — and it's wrong often enough to be dangerous. A source table and a target table can hold the same number of rows while the actual data has quietly drifted, corrupted, or disappeared.

A row-count check answers exactly one question: are there the same number of rows on both sides? That's a useful smoke test, but it says nothing about whether the values inside those rows survived the trip. Below are the failure modes that slip straight past a count, using the kind of Oracle-to-PostgreSQL migration where they show up most.

Nulls that appear or disappear

Oracle famously treats an empty string as NULL; PostgreSQL treats '' and NULL as distinct values. Migrate a column of mostly-empty VARCHAR2 fields and you can land a target where thousands of rows now hold an empty string where the source meant "no value" — or the reverse, depending on how your ETL coerces them. The row count is identical. A WHERE col IS NULL report downstream now returns a completely different set.

Default values are the mirror image of this: a column that was nullable in the source but has a DEFAULT in the target schema will silently backfill nulls with the default on insert. Every row is present. Every affected value is wrong.

Type and precision drift

Number types rarely map one-to-one across engines. Oracle's NUMBER is arbitrary-precision; map it carelessly to a PostgreSQL double precision or a too-small numeric(p,s) and you round or truncate values on the way in. A monetary column that loses its fourth decimal place still counts as one row per row — but your ledger no longer foots.

Timestamps are just as treacherous. A TIMESTAMP WITH TIME ZONE migrated into a plain timestamp drops the offset; a value stored in one session's timezone gets reinterpreted in another. Every appointment, transaction, or audit record is now off by hours, and nothing about the count reveals it.

Shifted joins and reordered keys

Migrations frequently rebuild relationships. If a foreign key is repointed, a surrogate key is regenerated, or a composite key's column order changes, rows can attach to the wrong parent. The child table still has N rows; they just describe different relationships than they used to. This is the most insidious category, because the data looks structurally healthy — it's the meaning that broke.

Encoding and collation

Character-set conversions — say a legacy WE8ISO8859P1 database into a UTF-8 target — can mangle accented characters, currency symbols, and anything outside plain ASCII. Names, addresses, and product descriptions come across as mojibake. Meanwhile collation differences change how strings sort and compare, which quietly alters the results of range queries and uniqueness checks even when every byte survived.

The common thread: every one of these preserves the row count. Counting rows verifies that the plumbing moved data. It cannot verify that the data is the same.

What actually catches these

The only reliable way to catch value-level drift is to compare the values — row by row, column by column — between source and target. That's what data reconciliation does: it reads both datasets, aligns them on a stable key, and returns the exact rows and columns that differ. Not a percentage, not an anomaly score. The specific cells.

A practical migration check looks less like SELECT COUNT(*) and more like: pick a comparison key that's stable across both systems, normalise the types you know will drift (trim, cast, canonicalise timezones), then diff every column. When it comes back clean, you have evidence you can hand to whoever signs off the cutover. When it doesn't, you get a worklist of precisely what to fix — before you switch the old system off.

Row counts still earn their place as a first, cheap gate. Just don't mistake them for proof. For the practical end-to-end checklist, see how to validate a database migration cutover with reconciliation, and for the deeper argument, reconciliation, not guesswork.

Nulls that appear or disappear

Type and precision drift

Shifted joins and reordered keys

Encoding and collation

What actually catches these

Prove your migration, don't just count it