From b706172d22a29962c42c90f2c3c704c2a4141660 Mon Sep 17 00:00:00 2001
From: Bruce Momjian
Date: Tue, 31 Oct 2023 11:42:02 -0400
Subject: [PATCH] C comment: improve statistics computation comment example

Discussion: https://postgr.es/m/CAKFQuwbD672Sc0EXv0ifx3pzfQ5UAEpiAeaBGKz_Ox-4d2NGCA@mail.gmail.com

Author: David G. Johnston

Backpatch-through: master
---
 doc/src/sgml/planstats.sgml | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index 43ad57253e..c7ec749d0a 100644
--- a/doc/src/sgml/planstats.sgml
+++ b/doc/src/sgml/planstats.sgml
@@ -389,18 +389,20 @@ tablename | null_frac | n_distinct | most_common_vals
    In this case there is no MCV information for
-   unique2 because all the values appear to be
-   unique, so we use an algorithm that relies only on the number of
-   distinct values for both relations together with their null fractions:
+   unique2 and all the values appear to be
+   unique (n_distinct = -1), so we use an algorithm that relies on the row
+   count estimates for both relations (num_rows, not shown, but "tenk")
+   together with the column null fractions (zero for both):
-selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1, 1/num_distinct2)
+selectivity = (1 - null_frac1) * (1 - null_frac2) / max(num_rows1, num_rows2)
             = (1 - 0) * (1 - 0) / max(10000, 10000)
             = 0.0001
    This is, subtract the null fraction from one for each of the relations,
-   and divide by the maximum of the numbers of distinct values.
+   and divide by the row count of the larger relation (this value does get
+   scaled in the non-unique case).
    The number of rows that the join is likely to emit is calculated as the
    cardinality of the Cartesian product of the two inputs, multiplied by the
-- 
2.30.2