git.postgresql.org Git - postgresql.git/log

Fix handling of orphaned 2PC files in the future at recovery

Before 728bd991c3c4, that has improved the support for 2PC files during
recovery, the initial logic scanning files in pg_twophase was done so as
files in the future of the transaction ID horizon were checked first,
followed by a check if a transaction ID is aborted or committed which
could involve a pg_xact lookup.  After this commit, these checks have
been done in reverse order.

Files detected as in the future do not have a state that can be checked
in pg_xact, hence this caused recovery to fail abruptly should an
orphaned 2PC file in the future of the transaction ID horizon exist in
pg_twophase at the beginning of recovery.

A test is added to check for this scenario, using an empty 2PC with a
transaction ID large enough to be in the future when running the test.
This test is added in 16 and older versions for now.  17 and newer
versions are impacted by a second bug caused by the addition of the
epoch in the 2PC file names.  An equivalent test will be added in these
branches in a follow-up commit, once the second set of issues reported
are fixed.

Author: Vitaly Davydov, Michael Paquier
Discussion: https://postgr.es/m/11e597-676ab680-8d-374f23c0@145466129
Back-through: 13

Exclude parallel workers from connection privilege/limit checks.

Cause parallel workers to not check datallowconn, rolcanlogin, and
ACL_CONNECT privileges.  The leader already checked these things
(except for rolcanlogin which might have been checked for a different
role).  Re-checking can accomplish little except to induce unexpected
failures in applications that might not even be aware that their query
has been parallelized.  We already had the principle that parallel
workers rely on their leader to pass a valid set of authorization
information, so this change just extends that a bit further.

Also, modify the ReservedConnections, datconnlimit and rolconnlimit
logic so that these limits are only enforced against regular backends,
and only regular backends are counted while checking if the limits
were already reached.  Previously, background processes that had an
assigned database or role were subject to these limits (with rather
random exclusions for autovac workers and walsenders), and the set of
existing processes that counted against each limit was quite haphazard
as well.  The point of these limits, AFAICS, is to ensure the
availability of PGPROC slots for regular backends.  Since all other
types of processes have their own separate pools of PGPROC slots, it
makes no sense either to enforce these limits against them or to count
them while enforcing the limit.

While edge-case failures of these sorts have been possible for a
long time, the problem got a good deal worse with commit 5a2fed911
(CVE-2024-10978), which caused parallel workers to make some of these
checks using the leader's current role where before we had used its
AuthenticatedUserId, thus allowing parallel queries to fail after
SET ROLE.  The previous behavior was fairly accidental and I have
no desire to return to it.

This  includes reverting 73c9f91a1, which was an emergency hack
to suppress these same checks in some cases.  It wasn't complete,
as shown by a recent bug report from Laurenz Albe.  We can also revert
fd4d93d26 and 492217301, which hacked around the same problems in one
regression test.

In passing, remove the special case for autovac workers in
CheckMyDatabase; it seems cleaner to have AutoVacWorkerMain pass
the INIT_PG_OVERRIDE_ALLOW_CONNS flag, now that that does what's
needed.

Like 5a2fed911, back- to supported branches (which sadly no
longer includes v12).

Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us

In REASSIGN OWNED of a database, lock the tuple as mandated.

Commit aac2c9b4fde889d13f859c233c2523345e72d32b mandated such locking
and attempted to fulfill that mandate, but it missed REASSIGN OWNED.
Hence, it remained possible to lose VACUUM's inplace update of
datfrozenxid if a REASSIGN OWNED processed that database at the same
time.  This didn't affect the other inplace-updated catalog, pg_class.
For pg_class, REASSIGN OWNED calls ATExecChangeOwner() instead of the
generic AlterObjectOwner_internal(), and ATExecChangeOwner() fulfills
the locking mandate.

Like in GRANT, implement this by following the locking protocol for any
catalog subject to the generic AlterObjectOwner_internal().  It would
suffice to do this for IsInplaceUpdateOid() catalogs only.  Back-
to v13 (all supported versions).

Kirill Reshke.  Reported by Alexander Kukushkin.

Discussion: https://postgr.es/m/CAFh8B=mpKjAy4Cuun-HP-f_vRzh2HSvYFG3rhVfYbfEBUhBAGg@mail.gmail.com

Fix memory  in pgoutput with publication list cache

The pgoutput module caches publication names in a list and frees it upon
invalidation.  However, the code forgot to free the actual publication
names within the list elements, as publication names are pstrdup()'d in
GetPublication().  This would cause memory to  in
CacheMemoryContext, bloating it over time as this context is not
cleaned.

This is a problem for WAL senders running for a long time, as an
accumulation of invalidation requests would bloat its cache memory
usage.  A second case, where this  is easier to see, involves a
backend calling SQL functions like pg_logical_slot_{get,peek}_changes()
which create a new decoding context with each execution.  More
publications create more bloat.

To address this, this commit adds a new memory context within the
logical decoding context and resets it each time the publication names
cache is invalidated, based on a suggestion from Amit Kapila.  This
ensures that the lifespan of the publication names aligns with that of
the logical decoding context.

Contrary to the HEAD-only commit f0c569d71515 that has changed
PGOutputData to track this new child memory context, the context is
tracked with a static variable whose state is reset with a MemoryContext
reset callback attached to PGOutputData->context, so as ABI
compatibility is preserved in stable branches.  This approach is based
on an suggestion from Amit Kapila.

Analyzed-by: Michael Paquier, Jeff Davis
Author: Masahiko Sawada
Reviewed-by: Amit Kapila, Michael Paquier, Euler Taveira, Hou Zhijie
Discussion: https://postgr.es/m/[email protected]
Back-through: 13

Update TransactionXmin when MyProc->xmin is updated

GetSnapshotData() set TransactionXmin = MyProc->xmin, but when
SnapshotResetXmin() advanced MyProc->xmin, it did not advance
TransactionXmin correspondingly. That meant that TransactionXmin could
be older than MyProc->xmin, and XIDs between than TransactionXmin and
the real MyProc->xmin could be vacuumed away. One known consequence is
in pg_subtrans lookups: we might try to look up the status of an XID
that was already truncated away.

Back- to all supported versions.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/d27a046d-a1e4-47d1-a95c-fbabe41debb4@iki.fi

Fix corruption when relation truncation fails.

RelationTruncate() does three things, while holding an
AccessExclusiveLock and preventing checkpoints:

1. Logs the truncation.
2. Drops buffers, even if they're dirty.
3. Truncates some number of files.

Step 2 could previously be canceled if it had to wait for I/O, and step
3 could and still can fail in file APIs.  All orderings of these
operations have data corruption hazards if interrupted, so we can't give
up until the whole operation is done.  When dirty pages were discarded
but the corresponding blocks were left on disk due to ERROR, old page
versions could come back from disk, reviving deleted data (see
pgsql-bugs #18146 and several like it).  When primary and standby were
allowed to disagree on relation size, standbys could panic (see
pgsql-bugs #18426) or revive data unknown to visibility management on
the primary (theorized).

Changes:

* WAL is now unconditionally flushed first
* smgrtruncate() is now called in a critical section, preventing
   interrupts and causing PANIC on file API failure
* smgrtruncate() has a new parameter for existing fork sizes,
   because it can't call smgrnblocks() itself inside a critical section

The changes apply to RelationTruncate(), smgr_redo() and
pg_truncate_visibility_map().  That last is also brought up to date with
other evolutions of the truncation protocol.

The VACUUM FileTruncate() failure mode had been discussed in older
reports than the ones referenced below, with independent analysis from
many people, but earlier theories on how to fix it were too complicated
to back-.  The more recently invented cancellation bug was
diagnosed by Alexander Lakhin.  Other corruption scenarios were spotted
by me while iterating on this  and earlier commit 75818b3a.

Back- to all supported releases.

Reviewed-by: Michael Paquier <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reported-by: [email protected]
Reported-by: Alexander Lakhin <[email protected]>
Discussion: https://postgr.es/m/18146-04e908c662113ad5%40postgresql.org
Discussion: https://postgr.es/m/18426-2d18da6586f152d6%40postgresql.org

Replace durable_rename_excl() by durable_rename(), take two

durable_rename_excl() attempts to avoid overwriting any existing files
by using link() and unlink(), and it falls back to rename() on some
platforms (aka WIN32), which offers no such overwrite protection.  Most
callers use durable_rename_excl() just in case there is an existing
file, but in practice there shouldn't be one (see below for more
details).

Furthermore, failures during durable_rename_excl() can result in
multiple hard links to the same file.  As per Nathan's tests, it is
possible to end up with two links to the same file in pg_wal after a
crash just before unlink() during WAL recycling.  Specifically, the test
produced links to the same file for the current WAL file and the next
one because the half-recycled WAL file was re-recycled upon restarting,
leading to WAL corruption.

This change replaces all the calls of durable_rename_excl() to
durable_rename().  This removes the protection against accidentally
overwriting an existing file, but some platforms are already living
without it and ordinarily there shouldn't be one.  The function itself
is left around in case any extensions are using it.  It will be removed
on HEAD via a follow-up commit.

Here is a summary of the existing callers of durable_rename_excl() (see
second discussion link at the bottom), replaced by this commit.  First,
basic_archive used it to avoid overwriting an archive concurrently
created by another server, but as mentioned above, it will still
overwrite files on some platforms.  Second, xlog.c uses it to recycle
past WAL segments, where an overwrite should not happen (origin of the
change at f0e37a8) because there are protections about the WAL segment
to select when recycling an entry.  The third and last area is related
to the write of timeline history files.  writeTimeLineHistory() will
write a new timeline history file at the end of recovery on promotion,
so there should be no such files for the same timeline.
What remains is writeTimeLineHistoryFile(), that can be used in parallel
by a WAL receiver and the startup process, and some digging of the
buildfarm shows that EEXIST from a WAL receiver can happen with an error
of "could not link file \"pg_wal/xlogtemp.NN\" to \"pg_wal/MM.history\",
which would cause an automatic restart of the WAL receiver as it is
promoted to FATAL, hence this should improve the stability of the WAL
receiver as rename() would overwrite an existing TLI history file
already fetched by the startup process at recovery.

This is the second time this change is attempted, ccfbd92 being the
first one, but this time no assertions are added for the case of a TLI
history file written concurrently by the WAL receiver or the startup
process because we can expect one to exist (some of the TAP tests are
able to trigger with a proper timing).

This commit has been originally applied on v16~ as of dac1ff30906b, and
we have received more reports of this issue, where clusters can become
corrupted at replay in older stable branches with multiple links
pointing to the same physical WAL segment file.  This back
addresses the problem for the v13~v15 range.

Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/CAJhEC04tBkYPF4q2uS_rCytauvNEVqdBAzasBEokfceFhF=KDQ@mail.gmail.com

Fix Assert failure in WITH RECURSIVE UNION queries

If the non-recursive part of a recursive CTE ended up using
TTSOpsBufferHeapTuple as the table slot type, then a duplicate value
could cause an Assert failure in CheckOpSlotCompatibility() when
checking the hash table for the duplicate value.  The expected slot type
for the deform step was TTSOpsMinimalTuple so the Assert failed when the
TTSOpsBufferHeapTuple slot was used.

This is a long-standing bug which we likely didn't notice because it
seems much more likely that the non-recursive term would have required
projection and used a TTSOpsVirtual slot, which CheckOpSlotCompatibility
is ok with.

There doesn't seem to be any harm done here other than the Assert
failure.  Both TTSOpsMinimalTuple and TTSOpsBufferHeapTuple slot types
require tuple deformation, so the EEOP_*_FETCHSOME ExprState step would
have properly existed in the ExprState.

The solution is to pass NULL for the ExecBuildGroupingEqual's 'lops'
parameter.  This means the ExprState's EEOP_*_FETCHSOME step won't
expect a fixed slot type.  This makes CheckOpSlotCompatibility() happy as
no checking is performed when the ExprEvalStep is not expecting a fixed
slot type.

Reported-by: Richard Guo
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAMbWs4-8U9q2LAtf8+ghV11zeUReA3AmrYkxzBEv0vKnDxwkKA@mail.gmail.com
Back-through: 13, all supported versions

Accommodate very large dshash tables.

If a dshash table grows very large (e.g., the dshash table for
cumulative statistics when there are millions of tables), resizing
it may fail with an error like:

ERROR: invalid DSA memory alloc request size 1073741824

To fix, permit dshash resizing to allocate more than 1 GB by
providing the DSA_ALLOC_HUGE flag.

Reported-by: Andreas Scherbaum
Author: Matthias van de Meent
Reviewed-by: Cédric Villemain, Michael Paquier, Andres Freund
Discussion: https://postgr.es/m/80a12d59-0d5e-4c54-866c-e69cd6536471%40pgug.de
Back-through: 13

Make 009_twophase.pl test pass with recovery_min_apply_delay set

The test failed if you ran the regression tests with TEMP_CONFIG with
recovery_min_apply_delay = '500ms'. Fix the race condition by waiting
for transaction to be applied in the replica, like in a few other
tests.

The failing test was introduced in commit cbfbda7841. Back to all
supported versions like that commit (except v12, which is no longer
supported).

Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/09e2a70a-a6c2-4b5c-aeae-040a7449c9f2@gmail.com

pgbench: fix misprocessing of some nested \if constructs.

An \if command appearing within a false (not-to-be-executed) \if
branch was incorrectly treated the same as \elif.  This could allow
statements within the inner \if to be executed when they should
not be.  Also the missing inner \if stack entry would result in an
assertion failure (in assert-enabled builds) when the final \endif
is reached.

Report and  by Michail Nikolaev.  Back- to all
supported branches.

Discussion: https://postgr.es/m/CANtu0oiA1ke=SP6tauhNqkUdv5QFsJtS1p=aOOf_iU+EhyKkjQ@mail.gmail.com

Fix possible crash in pg_dump with identity sequences.

If an owned sequence is considered interesting, force its owning
table to be marked interesting too.  This ensures, in particular,
that we'll fetch the owning table's column names so we have the
data needed for ALTER TABLE ... ADD GENERATED.  Previously there were
edge cases where pg_dump could get SIGSEGV due to not having filled in
the column names.  (The known case is where the owning table has been
made part of an extension while its identity sequence is not a member;
but there may be others.)

Also, if it's an identity sequence, force its dumped-components mask
to exactly match the owning table: dump definition only if we're
dumping the table's definition, dump data only if we're dumping the
table's data, etc.  This generalizes the code introduced in commit
b965f2617 that set the sequence's dump mask to NONE if the owning
table's mask is NONE.  That's insufficient to prevent failures,
because for example the table's mask might only request dumping ACLs,
which would lead us to still emit ALTER TABLE ADD GENERATED even
though we didn't create the table.  It seems better to treat an
identity sequence as though it were an inseparable aspect of the
table, matching the treatment used in the backend's dependency logic.
Perhaps this policy needs additional refinement, but let's wait to
see some field use-cases before changing it further.

While here, add a comment in pg_dump.h warning against writing tests
like "if (dobj->dump == DUMP_COMPONENT_NONE)", which was a bug in this
case.  There is one other example in getPublicationNamespaces, which
if it's not a bug is at least remarkably unclear and under-documented.
Changing that requires a separate discussion, however.

Per report from Artur Zakirov.  Back- to all supported branches.

Discussion: https://postgr.es/m/CAKNkYnwXFBf136=u9UqUxFUVagevLQJ=zGd5BsLhCsatDvQsKQ@mail.gmail.com

Fix elog(FATAL) before PostmasterMain() or just after fork().

Since commit 97550c0711972a9856b5db751539bbaf2f88884c, these failed with
"PANIC:  proc_exit() called in child process" due to uninitialized or
stale MyProcPid.  That was reachable if close() failed in
ClosePostmasterPorts() or setlocale(category, "C") failed, both
unlikely.  Back- to v13 (all supported versions).

Discussion: https://postgr.es/m/20241208034614 [email protected]

Fix unused-but-set-variable compiler warning in reorderbuffer.c.

On v13, this variable is only used for an assertion, so adding
PG_USED_FOR_ASSERTS_ONLY is sufficient to suppress this warning on
builds with assertions disabled. Older versions are unsupported,
and newer versions use the variable for more than the assertion, so
this only needs to be applied to REL_13_STABLE.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/Z1dCFnzrP24O8WNR%40nathan

Doc: fix incorrect EXPLAIN ANALYZE output for bloom indexes

It looks like the example case was once modified to increase the number
of rows but the EXPLAIN ANALYZE output wasn't updated to reflect that.

Also adjust the text which discusses the index sizes. With the example
table size, the bloom index isn't quite 8 times more space efficient
than the btree indexes.

Discussion: https://postgr.es/m/CAApHDvovx8kQ0=HTt85gFDAwmTJHpCgiSvRmQZ_6u_g-vQYM_w@mail.gmail.com
Back-through: 13, all supported versions

Simplify executor's determination of whether to use parallelism.

Our parallel-mode code only works when we are executing a query
in full, so ExecutePlan must disable parallel mode when it is
asked to do partial execution.  The previous logic for this
involved passing down a flag (variously named execute_once or
run_once) from callers of ExecutorRun or PortalRun.  This is
overcomplicated, and unsurprisingly some of the callers didn't
get it right, since it requires keeping state that not all of
them have handy; not to mention that the requirements for it were
undocumented.  That led to assertion failures in some corner
cases.  The only state we really need for this is the existing
QueryDesc.already_executed flag, so let's just put all the
responsibility in ExecutePlan.  (It could have been done in
ExecutorRun too, leading to a slightly shorter  -- but if
there's ever more than one caller of ExecutePlan, it seems better
to have this logic in the subroutine than the callers.)

This makes those ExecutorRun/PortalRun parameters unnecessary.
In master it seems okay to just remove them, returning the
API for those functions to what it was before parallelism.
Such an API break is clearly not okay in stable branches,
but for them we can just leave the parameters in place after
documenting that they do nothing.

Per report from Yugo Nagata, who also reviewed and tested
this .  Back- to all supported branches.

Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp

Ensure that pg_amop/amproc entries depend on their lefttype/righttype.

Usually an entry in pg_amop or pg_amproc does not need a dependency on
its amoplefttype/amoprighttype/amproclefttype/amprocrighttype types,
because there is an indirect dependency via the argument types of its
referenced operator or procedure, or via the opclass it belongs to.
However, for some support procedures in some index AMs, the argument
types of the support procedure might not mention the column data type
at all.  Also, the amop/amproc entry might be treated as "loose" in
the opfamily, in which case it lacks a dependency on any particular
opclass; or it might be a cross-type entry having a reference to a
datatype that is not its opclass' opcintype.

The upshot of all this is that there are cases where a datatype can
be dropped while leaving behind amop/amproc entries that mention it,
because there is no path in pg_depend showing that those entries
depend on that type.  Such entries are harmless in normal activity,
because they won't get used, but they cause problems for maintenance
actions such as dropping the operator family.  They also cause pg_dump
to produce bogus output.  The previous commit put a band-aid on the
DROP OPERATOR FAMILY failure, but a real fix is needed.

To fix, add pg_depend entries showing that a pg_amop/pg_amproc entry
depends on its lefttype/righttype.  To avoid bloating pg_depend too
much, skip this if the referenced operator or function has that type
as an input type.  (I did not bother with considering the possible
indirect dependency via the opclass' opcintype; at least in the
reported case, that wouldn't help anyway.)

Probably, the reason this has escaped notice for so long is that
add-on datatypes and relevant opclasses/opfamilies are usually
packaged as extensions nowadays, so that there's no way to drop
a type without dropping the referencing opclasses/opfamilies too.
Still, in the absence of pg_depend entries there's nothing that
constrains DROP EXTENSION to drop the opfamily entries before the
datatype, so it seems possible for a DROP failure to occur anyway.

The specific case that was reported doesn't fail in v13, because
v13 prefers to attach the support procedure to the opclass not the
opfamily.  But it's surely possible to construct other edge cases
that do fail in v13, so  that too.

Per report from Yoran Heling.  Back- to all supported branches.

Discussion: https://postgr.es/m/Z1MVCOh1hprjK5Sf@gmai021

Make getObjectDescription robust against dangling amproc type links.

Yoran Heling reported a case where a data type could be dropped
while references to its OID remain behind in pg_amproc.  This
causes getObjectDescription to fail, which blocks dropping the
operator family (since our DROP code likes to construct descriptions
of everything it's dropping).  The proper fix for this requires
adding more pg_depend entries.  But to allow DROP to go through with
already-corrupt catalogs, tweak getObjectDescription to print "???"
for the type instead of failing when it processes such an entry.

I changed the logic for pg_amop similarly, for consistency,
although it is not known that the problem can manifest in pg_amop.

Per report from Yoran Heling.  Back- to all supported
branches (although the problem may be unreachable in v13).

Discussion: https://postgr.es/m/Z1MVCOh1hprjK5Sf@gmai021

Fix is_digit labeling of to_timestamp's FFn format codes.

These format codes produce or consume strings of digits, so they
should be labeled with is_digit = true, but they were not.
This has effect in only one place, where is_next_separator()
is checked to see if the preceding format code should slurp up
all the available digits. Thus, with a format such as '...SSFF3'
with remaining input '12345', the 'SS' code would consume all
five digits (and then complain about seconds being out of range)
when it should eat only two digits.

Per report from Nick Davies. This bug goes back to d589f9446
where the FFn codes were introduced, so back- to v13.

Discussion: https://postgr.es/m/AM8PR08MB6356AC979252CFEA78B56678B6312@AM8PR08MB6356.eurprd08.prod.outlook.com

Avoid low-probability crash on out-of-memory.

check_restrict_nonsystem_relation_kind() correctly uses guc_malloc()
in v16 and later. But in older branches it must use malloc()
directly, and it forgot to check for failure return.
Faulty backing of 66e94448a.

Karina Litskevich

Discussion: https://postgr.es/m/CACiT8iZ=atkguKVbpN4HmJFMb4+T9yEowF5JuPZG8W+kkZ9L6w@mail.gmail.com

RelationTruncate() must set DELAY_CHKPT_START.

Previously, it set only DELAY_CHKPT_COMPLETE. That was important,
because it meant that if the XLOG_SMGR_TRUNCATE record preceded a
XLOG_CHECKPOINT_ONLINE record in the WAL, then the truncation would also
happen on disk before the XLOG_CHECKPOINT_ONLINE record was
written.

However, it didn't guarantee that the sync request for the truncation
was processed before the XLOG_CHECKPOINT_ONLINE record was written. By
setting DELAY_CHKPT_START, we guarantee that if an XLOG_SMGR_TRUNCATE
record is written to WAL before the redo pointer of a concurrent
checkpoint, the sync request queued by that operation must be processed
by that checkpoint, rather than being left for the following one.

This is a refinement of commit 412ad7a5563. Back- to all supported
releases, like that commit.

Author: Robert Haas <[email protected]>
Reported-by: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKG%2B-2rjGZC2kwqr2NMLBcEBp4uf59QT1advbWYF_uc%2B0Aw%40mail.gmail.com

Fix broken list-munging in ecpg's remove_variables().

The loops over cursor argument variables neglected to ever advance
"prevvar".  The code would accidentally do the right thing anyway
when removing the first or second list entry, but if it had to
remove the third or later entry then it would also remove all
entries between there and the first entry.  AFAICS this would
only matter for cursors that reference out-of-scope variables,
which is a weird Informix compatibility hack; between that and
the lack of impact for short lists, it's not so surprising that
nobody has complained.  Nonetheless it's a pretty obvious bug.

It would have been more obvious if these loops used a more standard
coding style for chasing the linked lists --- this business with the
"prev" pointer sometimes pointing at the current list entry is
confusing and overcomplicated.  So rather than just add a minimal
band-aid, I chose to rewrite the loops in the same style we use
elsewhere, where the "prev" pointer is NULL until we are dealing with
a non-first entry and we save the "next" pointer at the top of the
loop.  (Two of the four loops touched here are not actually buggy,
but it seems better to make them all look alike.)

Coverity discovered this problem, but not until 2b41de4a5 added code
to free no-longer-needed arguments structs.  With that, the incorrect
link updates are possibly touching freed memory, and it complained
about that.  Nonetheless the list corruption hazard is ancient, so
back- to all supported branches.

Fix MinGW %d vs %lu warnings in back branches.

Commit 352f6f2d used %d instead of %lu to format DWORD (unsigned long)
with psprintf().  The _WIN32_WINNT value recently changed for MinGW in
REL_15_STABLE (commit d700e8d7), so the code was suddenly being
compiled, with warnings from gcc.

The warnings were already fixed in 16+ by commits 495ed0ef and a9bc04b2
after the _WIN32_WINNT value was increase there.  14 and 13 didn't warn
because they still use a lower value for MinGW, and supported versions
of Visual Studio should compile the code in all live branches but don't
check our format string.

The change doesn't affect the result: sizeof(int) == sizeof(long) on
this platform, and the values are computed with expressions that cannot
exceed INT_MAX so were never interpreted as negative.

Back- the formatting change from those commits into 13-15.  This
should turn CI's 15 branch green again and stop fairywren from warning
about that on 15.

Reported-by: Andres Freund <[email protected]>
Reported-by: Peter Eisentraut <[email protected]>
Discussion: https://postgr.es/m/t2vjrcb3bloxf5qqvxjst6r7lvrefqyecxgt2koy5ho5b5glr2%40yuupmm6whgob

Skip SectionMemoryManager.h in cpluspluscheck.

Commit 9044fc1d45a0 skipped SectionMemoryManager.h in headerscheck, and
by extension also cpluspluscheck, because it's C++ and would fail both
tests. That worked in master and REL_17_STABLE due to 7b8e2ae2fd3b, but
older branches have a separate cpluspluscheck script. We need to copy
the filtering rule into there too.

This problem was being reported by CI's CompilerWarnings task in the 15
and 16 branches, but was a victim of alert fatigue syndrome (unrelated
problems in the back-branches).

Fix 16, and back- to 13, as those are the live branches that have a
separate cpluspluscheck script.

Revert "Handle better implicit transaction state of pipeline mode"

This reverts commit d77f91214fb7 on all stable branches, due to concerns
regarding the compatility side effects this could create in a minor
release. The change still exists on HEAD.

Discussion: https://postgr.es/m/CA+TgmoZqRgeFTg4+Yf_CMRRXiHuNz1u6ZC4FvVk+rxw0RmOPnw@mail.gmail.com
Back-through: 13

pgbench: Ensure previous progress message is fully cleared when updating.

During pgbench's table initialization, progress updates could display
leftover characters from the previous message if the new message
was shorter. This commit resolves the issue by appending spaces to
the current message to fully overwrite any remaining characters from
the previous line.

Back- to all the supported versions.

Author: Yushi Ogiwara, Tatsuo Ishii, Fujii Masao
Reviewed-by: Tatsuo Ishii, Fujii Masao
Discussion: https://postgr.es/m/9a9b8b95b6a709877ae48ad5b0c59bb9@oss.nttdata.com

Exclude LLVM files from whitespace checks

Commit 9044fc1d45a added some files from upstream LLVM. These files
have different whitespace rules, which make the git whitespace checks
powered by gitattributes fail. To fix, add those files to the exclude
list.

If a C23 compiler is detected, try asking for C17.

Branches before 16 can't be compiled with a C23 compiler (see
deprecation warnings silenced by commit f9a56e72, and non-back-able
changes made in 16 by commit 1c27d16e). Test __STDC_VERSION__, and if
it's above C17 then try appending -std=gnu17. The test is done with the
user's CFLAGS, so an acceptable language version can also be configured
manually that way.

This is done in branches 15 and older, back to 9.2, per policy of
keeping them buildable with modern tools.

Discussion: https://postgr.es/m/87o72eo9iu.fsf%40gentoo.org

Handle better implicit transaction state of pipeline mode

When using a pipeline, a transaction starts from the first command and
is committed with a Sync message or when the pipeline ends.

Functions like IsInTransactionBlock() or PreventInTransactionBlock()
were already able to understand a pipeline as being in a transaction
block, but it was not the case of CheckTransactionBlock().  This
function is called for example to generate a WARNING for SET LOCAL,
complaining that it is used outside of a transaction block.

The current state of the code caused multiple problems, like:
- SET LOCAL executed at any stage of a pipeline issued a WARNING, even
if the command was at least second in line where the pipeline is in a
transaction state.
- LOCK TABLE failed when invoked at any step of a pipeline, even if it
should be able to work within a transaction block.

The pipeline protocol assumes that the first command of a pipeline is
not part of a transaction block, and that any follow-up commands is
considered as within a transaction block.

This commit changes the backend so as an implicit transaction block is
started each time the first Execute message of a pipeline has finished
processing, with this implicit transaction block ended once a sync is
processed.  The checks based on XACT_FLAGS_PIPELINING in the routines
checking if we are in a transaction block are not necessary: it is
enough to rely on the existing ones.

Some tests are added to pgbench, that can be backed down to v17
when \syncpipeline is involved and down to v14 where \startpipeline and
\endpipeline are available.  This is unfortunately limited regarding the
error patterns that can be checked, but it provides coverage for various
pipeline combinations to check if these succeed or fail.  These tests
are able to capture the case of SET LOCAL's WARNING.  The author has
proposed a different feature to improve the coverage by adding similar
meta-commands to psql where error messages could be checked, something
more useful for the cases where commands cannot be used in transaction
blocks, like REINDEX CONCURRENTLY or VACUUM.  This is considered as
future work for v18~.

Author: Anthonin Bonnefoy
Reviewed-by: Jelte Fennema-Nio, Michael Paquier
Discussion: https://postgr.es/m/CAO6_XqrWO8uNBQrSu5r6jh+vTGi5Oiyk4y8yXDORdE2jbzw8xw@mail.gmail.com
Back-through: 13

Fix NULLIF()'s handling of read-write expanded objects.

If passed a read-write expanded object pointer, the EEOP_NULLIF
code would hand that same pointer to the equality function
and then (unless equality was reported) also return the same
pointer as its value.  This is no good, because a function that
receives a read-write expanded object pointer is fully entitled
to scribble on or even delete the object, thus corrupting the
NULLIF output.  (This problem is likely unobservable with the
equality functions provided in core Postgres, but it's easy to
demonstrate with one coded in plpgsql.)

To fix, make sure the pointer passed to the equality function
is read-only.  We can still return the original read-write
pointer as the NULLIF result, allowing optimization of later
operations.

Per bug #18722 from Alexander Lakhin.  This has been wrong
since we invented expanded objects, so back- to all
supported branches.

Discussion: https://postgr.es/m/18722-fd9e645448cc78b4@postgresql.org

Avoid "you don't own a lock of type ExclusiveLock" in GRANT TABLESPACE.

This WARNING appeared because SearchSysCacheLocked1() read
cc_relisshared before catcache initialization, when the field is false
unconditionally.  On the basis of reading false there, it constructed a
locktag as though pg_tablespace weren't relisshared.  Only shared
catalogs could be affected, and only GRANT TABLESPACE was affected in
practice.  SearchSysCacheLocked1() callers use one other shared-relation
syscache, DATABASEOID.  DATABASEOID is initialized by the end of
CheckMyDatabase(), making the problem unreachable for pg_database.

Back- to v13 (all supported versions).  This has no known impact
before v16, where ExecGrant_common() first appeared.  Earlier branches
avoid trouble by having a separate ExecGrant_Tablespace() that doesn't
use LOCKTAG_TUPLE.  However, leaving this unfixed in v15 could ensnare a
future back- of a SearchSysCacheLocked1() call.

Reported by Aya Iwata.

Discussion: https://postgr.es/m/OS7PR01MB11964507B5548245A7EE54E70EA212@OS7PR01MB11964.jpnprd01.prod.outlook.com

Update configure probes for CFLAGS needed for ARM CRC instructions.

On ARM platforms where the baseline CPU target lacks CRC instructions,
we need to supply a -march flag to persuade the compiler to compile
such instructions.  It turns out that our existing choice of
"-march=armv8-a+crc" has not worked for some time, because recent gcc
will interpret that as selecting software floating point, and then
will spit up if the platform requires hard-float ABI, as most do
nowadays.  The end result was to silently fall back to software CRC,
which isn't very desirable since in practice almost all currently
produced ARM chips do have hardware CRC.

We can fix this by using "-march=armv8-a+crc+simd" to enable the
correct ABI choice.  (This has no impact on the code actually
generated, since neither of the files we compile with this flag
does any floating-point stuff, let alone SIMD.)  Keep the test for
"-march=armv8-a+crc" since that's required for soft-float ABI,
but try that second since most platforms we're likely to build on
use hard-float.

Since this isn't working as-intended on the last several years'
worth of gcc releases, back- to all supported branches.

Discussion: https://postgr.es/m/4496616.iHFcN1HehY@portable-bastien

Add support for Tcl 9

Tcl 9 changed several API functions to take Tcl_Size, which is
ptrdiff_t, instead of int, for 64-bit enablement. We have to change a
few local variables to be compatible with that. We also provide a
fallback typedef of Tcl_Size for older Tcl versions.

The affected variables are used for quantities that will not approach
values beyond the range of int, so this doesn't change any
functionality.

Reviewed-by: Tristan Partin <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/bce0fe54-75b4-438e-b42b-8e84bc7c0e9c%40eisentraut.org

Assume that <stdbool.h> conforms to the C standard.

Previously we checked "for <stdbool.h> that conforms to C99" using
autoconf's AC_HEADER_STDBOOL macro.  We've required C99 since PostgreSQL
12, so the test was redundant, and under C23 it was broken: autoconf
2.69's implementation doesn't understand C23's new empty header (the
macros it's looking for went away, replaced by language keywords).
Later autoconf versions fixed that, but let's just remove the
anachronistic test.

HAVE_STDBOOL_H and HAVE__BOOL will no longer be defined, but they
weren't directly tested in core or likely extensions (except in 11, see
below).  PG_USE_STDBOOL (or USE_STDBOOL in 11 and 12) is still defined
when sizeof(bool) is 1, which should be true on all modern systems.
Otherwise we define our own bool type and values of size 1, which would
fail to compile under C23 as revealed by the broken test.  (We'll
probably clean that dead code up in master, but here we want a minimal
back-able change.)

This came to our attention when GCC 15 recently started using using C23
by default and failed to compile the replacement code, as reported by
Sam James and build farm animal alligator.

Back- to all supported releases, and then two older versions that
also know about <stdbool.h>, per the recently-out-of-support policy[1].
12 requires C99 so it's much like the supported releases, but 11 only
assumes C89 so it now uses AC_CHECK_HEADERS instead of the overly picky
AC_HEADER_STDBOOL.  (I could find no discussion of which historical
systems had <stdbool.h> but failed the conformance test; if they ever
existed, they surely aren't relevant to that policy's goals.)

[1] https://wiki.postgresql.org/wiki/Committing_checklist#Policies

Reported-by: Sam James <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]> (master version)
Reviewed-by: Tom Lane <[email protected]> (approach)
Discussion: https://www.postgresql.org/message-id/flat/87o72eo9iu.fsf%40gentoo.org

jit: Use -mno-outline-atomics for bitcode on ARM.

If the executable's .o files were produced by a compiler (probably gcc)
not using -moutline-atomics, and the corresponding .bc files were
produced by clang using -moutline-atomics (probably by default), then
the generated bitcode functions would have the target attribute
"+outline-atomics", and could fail at runtime when inlined.  If the
target ISA at bitcode generation time was armv8-a (the most conservative
aarch64 target, no LSE), then LLVM IR atomic instructions would generate
calls to functions in libgcc.a or libclang_rt.*.a that switch between
LL/SC and faster LSE instructions depending on a runtime AT_HWCAP check.
Since the corresponding .o files didn't need those functions, they
wouldn't have been included in the executable, and resolution would
fail.

At least Debian and Ubuntu are known to ship gcc and clang compilers
that target armv8-a but differ on the use of outline atomics by default.

Fix, by suppressing the outline atomics attribute in bitcode explicitly.
Inline LL/SC instructions will be generated for atomic operations in
bitcode built for armv8-a.  Only configure scripts are adjusted for now,
because the meson build system doesn't generate bitcode yet.

This doesn't seem to be a new phenomenon, so real cases of functions
using atomics that are inlined by JIT must be rare in the wild given how
long it took for a bug report to arrive.  The reported case could be
reduced to:

postgres=# set jit_inline_above_cost = 0;
SET
postgres=# set jit_above_cost = 0;
SET
postgres=# select pg_last_wal_receive_lsn();
WARNING:  failed to resolve name __aarch64_swp4_acq_rel
FATAL:  fatal llvm error: Program used external function
'__aarch64_swp4_acq_rel' which could not be resolved!

The change doesn't affect non-ARM systems or later target ISAs.

Back- to all supported releases.

Reported-by: Alexander Kozhemyakin <[email protected]>
Discussion: https://postgr.es/m/18610-37bf303f904fede3%40postgresql.org

Fix outdated bit in README.tuplock

Apparently this information has been outdated since first committed,
because we adopted a different implementation during development per
reviews and this detail was not updated in the README.

This has been wrong since commit 0ac5ad5134f2 introduced the file in
2013. Back to all live branches.

Reported-by: Will Mortensen <[email protected]>
Discussion: https://postgr.es/m/CAMpnoC6yEQ=c0Rdq-J7uRedrP7Zo9UMp6VZyP23QMT68n06cvA@mail.gmail.com

Avoid assertion failure if a setop leaf query contains setops.

Ordinarily transformSetOperationTree will collect all UNION/
INTERSECT/EXCEPT steps into the setOperations tree of the topmost
Query, so that leaf queries do not contain any setOperations.
However, it cannot thus flatten a subquery that also contains
WITH, ORDER BY, FOR UPDATE, or LIMIT.  I (tgl) forgot that in
commit 07b4c48b6 and wrote an assertion in rule deparsing that
a leaf's setOperations would always be empty.

If it were nonempty then we would want to parenthesize the subquery
to ensure that the output represents the setop nesting correctly
(e.g. UNION below INTERSECT had better get parenthesized).  So
rather than just removing the faulty Assert, let's change it into
an additional case to check to decide whether to add parens.  We
don't expect that the additional case will ever fire, but it's
cheap insurance.

Man Zeng and Tom Lane

Discussion: https://postgr.es/m/[email protected]

Compare collations before merging UNION operations.

In the dim past we figured it was okay to ignore collations
when combining UNION set-operation nodes into a single N-way
UNION operation.  I believe that was fine at the time, but
it stopped being fine when we added nondeterministic collations:
the semantics of distinct-ness are affected by those.  v17 made
it even less fine by allowing per-child sorting operations to
be merged via MergeAppend, although I think we accidentally
avoided any live bug from that.

Add a check that collations match before deciding that two
UNION nodes are equivalent.  I also failed to resist the
temptation to comment plan_union_children() a little better.

Back- to all supported branches (v13 now), since they
all have nondeterministic collations.

Discussion: https://postgr.es/m/3605568.1731970579@sss.pgh.pa.us

Stamp 13.18.

Fix recently-exposed portability issue in regex optimization.

fixempties() counts the number of in-arcs in the regex NFA and then
allocates an array of that many arc pointers.  If the NFA contains no
arcs, this amounts to malloc(0) for which some platforms return NULL.
The code mistakenly treats that as indicating out-of-memory.  Thus,
we can get a bogus "out of memory" failure for some unsatisfiable
regexes.

This happens only in v15 and earlier, since bea3d7e38 switched to
using palloc() rather than bare malloc().  And at least of the
platforms in the buildfarm, only AIX seems to return NULL.  So the
impact is pretty narrow.  But I don't especially want to ship code
that is failing its own regression tests, so let's fix this for
this week's releases.

A quick code survey says that there is only the one place in
src/backend/regex/ that is at risk of doing malloc(0), so we'll just
band-aid that place.  A more future-proof fix could be to install a
malloc() wrapper similar to pg_malloc().  But this code seems unlikely
to change much more in the affected branches, so that's probably
overkill.

The only known test case for this involves a complemented character
class in a bracket expression, for example [^\s\S], and we didn't
support that in v13.  So it may be that the problem is unreachable
in v13.  But I'm not 100% sure of that, so  v13 too.

Discussion: https://postgr.es/m/661fd81b-f069-8f30-1a41-e195c57449b4@gmail.com

Release notes for 17.2, 16.6, 15.10, 14.15, 13.18, 12.22.

Fix per-session activation of ALTER {ROLE|DATABASE} SET role.

After commit 5a2fed911a85ed6d8a015a6bafe3a0d9a69334ae, the catalog state
resulting from these commands ceased to affect sessions.  Restore the
longstanding behavior, which is like beginning the session with a SET
ROLE command.  If cherry-picking the CVE-2024-10978 fixes, default to
including this, too.  (This fixes an unintended side effect of fixing
CVE-2024-10978.)  Back- to v12, like that commit.  The release team
decided to include v12, despite the original intent to halt v12 commits
earlier this week.

Tom Lane and Noah Misch.  Reported by Etienne LAFARGE.

Discussion: https://postgr.es/m/CADOZwSb0UsEr4_UTFXC5k7=fyyK8uKXekucd+-uuGjJsGBfxgw@mail.gmail.com

Fix a possibility of logical replication slot's restart_lsn going backwards.

Previously LogicalIncreaseRestartDecodingForSlot() accidentally
accepted any LSN as the candidate_lsn and candidate_valid after the
restart_lsn of the replication slot was updated, so it potentially
caused the restart_lsn to move backwards.

A scenario where this could happen in logical replication is: after a
logical replication restart, based on previous candidate_lsn and
candidate_valid values in memory, the restart_lsn advances upon
receiving a subscriber acknowledgment. Then, logical decoding restarts
from an older point, setting candidate_lsn and candidate_valid based
on an old RUNNING_XACTS record. Subsequent subscriber acknowledgments
then update the restart_lsn to an LSN older than the current value.

In the reported case, after WAL files were removed by a checkpoint,
the retreated restart_lsn prevented logical replication from
restarting due to missing WAL segments.

This change essentially modifies the 'if' condition to 'else if'
condition within the function. The previous code had an asymmetry in
this regard compared to LogicalIncreaseXminForSlot(), which does
almost the same thing for different fields.

The WAL removal issue was reported by Hubert Depesz Lubaczewski.

Back to all supported versions, since the bug exists since 9.4
where logical decoding was introduced.

Reviewed-by: Tomas Vondra, Ashutosh Bapat, Amit Kapila
Discussion: https://postgr.es/m/Yz2hivgyjS1RfMKs%40depesz.com
Discussion: https://postgr.es/m/85fff40e-148b-4e86-b921-b4b846289132%40vondra.me
Back-through: 13

Count contrib/bloom index scans in pgstat view.

Maintain the pg_stat_user_indexes.idx_scan pgstat counter during
contrib/Bloom index scans.

Oversight in commit 9ee014fc, which added the Bloom index contrib
module.

Author: Masahiro Ikeda <[email protected]>
Reviewed-By: Peter Geoghegan <[email protected]>
Discussion: https://postgr.es/m/c48839d881388ee401a01807c686004d@oss.nttdata.com
Back: 13- (all supported branches).

Fix arrays comparison in CompareOpclassOptions()

The current code calls array_eq() and does not provide FmgrInfo. This commit
provides initialization of FmgrInfo and uses C collation as the safe option
for text comparison because we don't know anything about the semantics of
opclass options.

Back to 13, where opclass options were introduced.

Reported-by: Nicolas Maus
Discussion: https://postgr.es/m/18692-72ea398df3ec6712%40postgresql.org
Back-through: 13

Stamp 13.17.

Last-minute updates for release notes.

Security: CVE-2024-10976, CVE-2024-10977, CVE-2024-10978, CVE-2024-10979

Parallel workers use AuthenticatedUserId for connection privilege checks.

Commit 5a2fed911 had an unexpected side-effect: the parallel worker
launched for the new test case would fail if it couldn't use a
superuser-reserved connection slot. The reason that test failed
while all our pre-existing ones worked is that the connection
privilege tests in InitPostgres had been based on the superuserness
of the leader's AuthenticatedUserId, but after the rearrangements
of 5a2fed911 we were testing the superuserness of CurrentUserId,
which the new test case deliberately made to be a non-superuser.

This all seems very accidental and probably not the behavior we really
want, but a security is no time to be redesigning things.
Pending some discussion about desirable semantics, hack it so that
InitPostgres continues to pay attention to the superuserness of
AuthenticatedUserId when starting a parallel worker.

Nathan Bossart and Tom Lane, per buildfarm member sawshark.

Security: CVE-2024-10978

Fix cross-version upgrade tests.

TestUpgradeXversion knows how to make the main regression database's
references to pg_regress.so be version-independent. But it doesn't
do that for plperl's database, so that the C function added by
commit b7e3a52a8 is causing cross-version upgrade test failures.
Path of least resistance is to just drop the function at the end
of the new test.

In <= v14, also take the opportunity to clean up the generated
test files.

Security: CVE-2024-10979

src/tools/msvc: Respect REGRESS_OPTS in plcheck.

v16 commit 8fe3e697a1a83a722b107c7cb9c31084e1f4d077 used REGRESS_OPTS in
a way needing this. That broke "vcregress plcheck". Back-
v16..v12; newer versions don't have this build system.

Add needed .gitignore files in back branches.

v14 and earlier use generated test files, which require being
.gitignore'd to avoid git complaints when testing in-tree.

Security: CVE-2024-10979

Fix improper interactions between session_authorization and role.

The SQL spec mandates that SET SESSION AUTHORIZATION implies
SET ROLE NONE.  We tried to implement that within the lowest-level
functions that manipulate these settings, but that was a bad idea.
In particular, guc.c assumes that it doesn't matter in what order
it applies GUC variable updates, but that was not the case for these
two variables.  This problem, compounded by some hackish attempts to
work around it, led to some security-grade issues:

* Rolling back a transaction that had done SET SESSION AUTHORIZATION
would revert to SET ROLE NONE, even if that had not been the previous
state, so that the effective user ID might now be different from what
it had been.

* The same for SET SESSION AUTHORIZATION in a function SET clause.

* If a parallel worker inspected current_setting('role'), it saw
"none" even when it should see something else.

Also, although the parallel worker startup code intended to cope
with the current role's pg_authid row having disappeared, its
implementation of that was incomplete so it would still fail.

Fix by fully separating the miscinit.c functions that assign
session_authorization from those that assign role.  To implement the
spec's requirement, teach set_config_option itself to perform "SET
ROLE NONE" when it sets session_authorization.  (This is undoubtedly
ugly, but the alternatives seem worse.  In particular, there's no way
to do it within assign_session_authorization without incompatible
changes in the API for GUC assign hooks.)  Also, improve
ParallelWorkerMain to directly set all the relevant user-ID variables
instead of relying on some of them to get set indirectly.  That
allows us to survive not finding the pg_authid row during worker
startup.

In v16 and earlier, this includes back-ing 9987a7bf3 which
fixed a violation of GUC coding rules: SetSessionAuthorization
is not an appropriate place to be throwing errors from.

Security: CVE-2024-10978

Ensure cached plans are correctly marked as dependent on role.

If a CTE, subquery, sublink, security invoker view, or coercion
projection references a table with row-level security policies, we
neglected to mark the plan as potentially dependent on which role
is executing it. This could lead to later executions in the same
session returning or hiding rows that should have been hidden or
returned instead.

Reported-by: Wolfgang Walther
Reviewed-by: Noah Misch
Security: CVE-2024-10976
Back-through: 12

Block environment variable mutations from trusted PL/Perl.

Many process environment variables (e.g. PATH), bypass the containment
expected of a trusted PL.  Hence, trusted PLs must not offer features
that achieve setenv().  Otherwise, an attacker having USAGE privilege on
the language often can achieve arbitrary code execution, even if the
attacker lacks a database server operating system user.

To fix PL/Perl, replace trusted PL/Perl %ENV with a tied hash that just
replaces each modification attempt with a warning.  Sites that reach
these warnings should evaluate the application-specific implications of
proceeding without the environment modification:

  Can the application reasonably proceed without the modification?

    If no, switch to plperlu or another approach.

    If yes, the application should change the code to stop attempting
    environment modifications.  If that's too difficult, add "untie
    %main::ENV" in any code executed before the warning.  For example,
    one might add it to the start of the affected function or even to
    the plperl.on_plperl_init setting.

In passing, link to Perl's guidance about the Perl features behind the
security posture of PL/Perl.

Back- to v12 (all supported versions).

Andrew Dunstan and Noah Misch

Security: CVE-2024-10979

Translation updates

Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: be7f3c3a26b382c9d7c9d32c7a972e452b56f529

libpq: Bail out during SSL/GSS negotiation errors

This commit changes libpq so that errors reported by the backend during
the protocol negotiation for SSL and GSS are discarded by the client, as
these may include bytes that could be consumed by the client and write
arbitrary bytes to a client's terminal.

A failure with the SSL negotiation now leads to an error immediately
reported, without a retry on any other methods allowed, like a fallback
to a plaintext connection.

A failure with GSS discards the error message received, and we allow a
fallback as it may be possible that the error is caused by a connection
attempt with a pre-11 server, GSS encryption having been introduced in
v12. This was a problem only with v17 and newer versions; older
versions discard the error message already in this case, assuming a
failure caused by a lack of support for GSS encryption.

Author: Jacob Champion
Reviewed-by: Peter Eisentraut, Heikki Linnakangas, Michael Paquier
Security: CVE-2024-10977
Back-through: 12

Release notes for 17.1, 16.5, 15.9, 14.14, 13.17, 12.21.

Improve fix for not entering parallel mode when holding interrupts.

Commit ac04aa84a put the shutoff for this into the planner, which is
not ideal because it doesn't prevent us from re-using a previously
made parallel plan. Revert the planner change and instead put the
shutoff into InitializeParallelDSM, modeling it on the existing code
there for recovering from failure to allocate a DSM segment.

However, that code path is mostly untested, and testing a bit harder
showed there's at least one bug: ExecHashJoinReInitializeDSM is not
prepared for us to have skipped doing parallel DSM setup. I also
thought the Assert in ReinitializeParallelWorkers is pretty
ill-advised, and replaced it with a silent Min() operation.

The existing test case added by ac04aa84a serves fine to test this
version of the fix, so no change needed there.

by me, but thanks to Noah Misch for the core idea that we
could shut off worker creation when !INTERRUPTS_CAN_BE_PROCESSED.
Back- to v12, as ac04aa84a was.

Discussion: https://postgr.es/m/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com

Disallow partitionwise join when collations don't match

If the collation of any join key column doesn’t match the collation of
the corresponding partition key, partitionwise joins can yield incorrect
results. For example, rows that would match under the join key collation
might be located in different partitions due to the partitioning
collation. In such cases, a partitionwise join would yield different
results from a non-partitionwise join, so disallow it in such cases.

Reported-by: Tender Wang <[email protected]>
Author: Jian He <[email protected]>
Reviewed-by: Tender Wang <[email protected]>
Reviewed-by: Junwang Zhao <[email protected]>
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
Back-through: 12

Disallow partitionwise grouping when collations don't match

If the collation of any grouping column doesn’t match the collation of
the corresponding partition key, partitionwise grouping can yield
incorrect results. For example, rows that would be grouped under the
grouping collation may end up in different partitions under the
partitioning collation. In such cases, full partitionwise grouping
would produce results that differ from those without partitionwise
grouping, so disallowed that.

Partial partitionwise aggregation is still allowed, as the Finalize
step reconciles partition-level aggregates with grouping requirements
across all partitions, ensuring that the final output remains
consistent.

This commit also fixes group_by_has_partkey() by ensuring the
RelabelType node is stripped from grouping expressions when matching
them to partition key expressions to avoid false mismatches.

Bug: #18568
Reported-by: Webbo Han <[email protected]>
Author: Webbo Han <1105066510@qq.com>
Reviewed-by: Tender Wang <[email protected]>
Reviewed-by: Aleksander Alekseev <[email protected]>
Reviewed-by: Jian He <[email protected]>
Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
Back-through: 12

Message style improvement

Back the part of edee0c621de that applies to a90bdd7a44d, which
was also backed. That way, the message is consistent in all
branches.

Fix lstat() for broken junction points on Windows.

When using junction points to emulate symlinks on Windows, one edge case
was not handled correctly by commit c5cb8f3b: if a junction point is
broken (pointing to a non-existent path), we'd report ENOENT. This
doesn't break any known use case, but was noticed while developing a
test suite for these functions and is fixed here for completeness.

Also add translation ERROR_CANT_RESOLVE_FILENAME -> ENOENT, as that is
one of the errors Windows can report for some kinds of broken paths.

Discussion: https://postgr.es/m/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com
(cherry picked from commit 387803d81d6256fcb60b9192bb5b00042442b4e3)

Author: Thomas Munro <[email protected]>
Author: Alexandra Wang <[email protected]>

Provide lstat() for Windows.

Junction points will be reported with S_ISLNK(x.st_mode), simulating
POSIX lstat(). stat() will follow pseudo-symlinks, like in POSIX (but
only one level before giving up, unlike in POSIX).

This completes a TODO left by commit bed90759fcb.

Tested-by: Andrew Dunstan <[email protected]> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGLfOOeyZpm5ByVcAt7x5Pn-%3DxGRNCvgiUPVVzjFLtnY0w%40mail.gmail.com
(cherry picked from commit c5cb8f3b770c043509b61528664bcd805e1777e6)

Author: Thomas Munro <[email protected]>
Author: Alexandra Wang <[email protected]>

Make unlink() work for junction points on Windows.

To support harmonization of Windows and Unix code, teach our unlink()
wrapper that junction points need to be unlinked with rmdir() on
Windows.

Tested-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGLfOOeyZpm5ByVcAt7x5Pn-%3DxGRNCvgiUPVVzjFLtnY0w%40mail.gmail.com
(cherry picked from commit f357233c9db8be2a015163da8e1ab0630f444340)

Author: Thomas Munro <[email protected]>
Author: Alexandra Wang <[email protected]>

Add missing include guard to win32ntdll.h.

Oversight in commit e2f0f8ed. Also add this file to the exclusion lists
in headerscheck and cpluscpluscheck, because Unix systems don't have a
header it includes.

Reported-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/2760528.1641929756%40sss.pgh.pa.us
(cherry picked from commit af9e6331aeba149c93052c3549140082a85a3cf9)

Author: Thomas Munro <[email protected]>
Author: Alexandra Wang <[email protected]>

Check for STATUS_DELETE_PENDING on Windows.

1.  Update our open() wrapper to check for NT's STATUS_DELETE_PENDING
and translate it to Unix-like errors.  This is done with
RtlGetLastNtStatus(), which is dynamically loaded from ntdll.  A new
file win32ntdll.c centralizes lookup of NT functions, in case we decide
to add more in the future.

2.  Remove non-working code that was trying to do something similar for
stat(), and just reuse the open() wrapper code.  As a side effect,
stat() also gains resilience against "sharing violation" errors.

3.  Since stat() is used very early in process startup, remove the
requirement that the Win32 signal event has been created before
pgwin32_open_handle() is reached.  Instead, teach pg_usleep() to fall
back to a non-interruptible sleep if reached before the signal event is
available.

This could be back-ed, but for now it's in master only.  The
problem has apparently been with us for a long time and generated only a
few complaints.  Proposed es trigger it more often, which led to
this investigation and fix.

Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Alexander Lakhin <[email protected]>
Reviewed-by: Juan José Santamaría Flecha <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGJz_pZTF9mckn6XgSv69%2BjGwdgLkxZ6b3NWGLBCVjqUZA%40mail.gmail.com
(cherry picked from commit e2f0f8ed251d02c1eda79e1ca3cb3db2681e7a86)

Author: Thomas Munro <[email protected]>
Author: Alexandra Wang <[email protected]>

Disable clang 16's -Wcast-function-type-strict.

Clang 16 is still in development, but seawasp reveals that it has
started warning about many of our casts of function pointers (those
introduced by commit 1c27d16e, and some older ones). Disable the new
warning for now, since otherwise buildfarm animal seawasp fails, and we
have no current plans to change our strategy for these callback function
types.

May be back-ed with other Clang/LLVM 16 changes around release
time.

Discussion: https://postgr.es/m/CA%2BhUKGJvX%2BL3aMN84ksT-cGy08VHErRNip3nV-WmTx7f6Pqhyw%40mail.gmail.com
(cherry picked from commit 101c37cd342a3ae134bb3e5e0abb14ae46692b56)

Author: Thomas Munro <[email protected]>
Author: Alexandra Wang <[email protected]>

Fix -Wcast-function-type warnings

Three groups of issues needed to be addressed:

load_external_function() and related functions returned PGFunction,
even though not necessarily all callers are looking for a function of
type PGFunction.  Since these functions are really just wrappers
around dlsym(), change to return void * just like dlsym().

In dynahash.c, we are using strlcpy() where a function with a
signature like memcpy() is expected.  This should be safe, as the new
comment there explains, but the cast needs to be augmented to avoid
the warning.

In PL/Python, methods all need to be cast to PyCFunction, per Python
API, but this now runs afoul of these warnings.  (This issue also
exists in core CPython.)

To fix the second and third case, we add a new type pg_funcptr_t that
is defined specifically so that gcc accepts it as a special function
pointer that can be cast to any other function pointer without the
warning.

Also add -Wcast-function-type to the standard warning flags, subject
to configure check.

Reviewed-by: Tom Lane <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/1e97628e-6447-b4fd-e230-d109cec2d584%402ndquadrant.com
(cherry picked from commit de8feb1f3a23465b5737e8a8c160e8ca62f61339)

Author: Peter Eisentraut <[email protected]>
Author: Alexandra Wang <[email protected]>

Fix issues with Windows' stat() for files pending on deletion

The code introduced by bed9075 to enhance the stat() implementation on
Windows for file sizes larger than 4GB fails to properly detect files
pending for deletion with its method based on NtQueryInformationFile()
or GetFileInformationByHandleEx(), as proved by Alexander Lakhin in a
custom TAP test of his own.

The method used in the implementation of open() to sleep and loop when
when failing on ERROR_ACCESS_DENIED (EACCES) is showing much more
stability, so switch to this method.  This could still lead to issues if
the permission problem stays around for much longer than the timeout of
1 second used, but that should (hopefully) never happen in
performance-critical paths.  Still, there could be a point in increasing
the timeouts for the sake of machines that handle heavy loads.

Note that WIN32's open() now uses microsoft_native_stat() as it should
be similar to stat() when working around issues with concurrent file
deletions.

I have spent some time testing this  with pgbench in combination
of the SQL functions from genfile.c, as well as running the TAP test
provided on the thread with MSVC builds, and this looks much more
stable than the previous method.

Author: Alexander Lakhin
Reviewed-by: Tom Lane, Michael Paquier, Justin Pryzby
Discussion: https://postgr.es/m/c3427edf-d7c0-ff57-90f6-b5de3bb62709@gmail.com
Back-through: 14
(cherry picked from commit 54fb8c7ddf152629021cab3ac3596354217b7d81)

Author: Alexandra Wang <[email protected]>

Fix our Windows stat() emulation to handle file sizes > 4GB.

Hack things so that our idea of "struct stat" is equivalent to Windows'
struct __stat64, allowing it to have a wide enough st_size field.

Instead of relying on native stat(), use GetFileInformationByHandle().
This avoids a number of issues with Microsoft's multiple and rather
slipshod emulations of stat(). We still need to jump through hoops
to deal with ERROR_DELETE_PENDING, though :-(

Pull the relevant support code out of dirmod.c and put it into
its own file, win32stat.c.

Still TODO: do we need to do something different with lstat(),
rather than treating it identically to stat()?

Juan José Santamaría Flecha, reviewed by Emil Iggland;
based on prior work by Michael Paquier, Sergey Zubkovsky, and others

Discussion: https://postgr.es/m/1803D792815FC24D871C00D17AE95905CF5099@g01jpexmbkw24
Discussion: https://postgr.es/m/15858-9572469fd3b73263@postgresql.org
(cherry picked from commit bed90759fcbcd72d4d06969eebab81e47326f9a2)

Author: Alexandra Wang <[email protected]>

doc: Reword ALTER TABLE ATTACH restriction on NO INHERIT constraints

The previous wording is easy to read incorrectly; this change makes it
simpler, less ambiguous, and less prominent.

Back to all live branches.

Reviewed-by: Amit Langote <[email protected]>
Discussion: https://postgr.es/m/202411051201 [email protected]

Monkey- LLVM code to fix ARM relocation bug.

Supply a new memory manager for RuntimeDyld, to avoid crashes in
generated code caused by memory placement that can overflow a 32 bit
data type.  This is a drop-in replacement for the
llvm::SectionMemoryManager class in the LLVM library, with Michael
Smith's proposed fix from
https://www..com/llvm/llvm-project/pull/71968.

We hereby slurp it into our own source tree, after moving into a new
namespace llvm::backport and making some minor adjustments so that it
can be compiled with older LLVM versions as far back as 12.  It's harder
to make it work on even older LLVM versions, but it doesn't seem likely
that people are really using them so that is not investigated for now.

The problem could also be addressed by switching to JITLink instead of
RuntimeDyld, and that is the LLVM project's recommended solution as
the latter is about to be deprecated.  We'll have to do that soon enough
anyway, and then when the LLVM version support window advances far
enough in a few years we'll be able to delete this code.  Unfortunately
that wouldn't be enough for PostgreSQL today: in most relevant versions
of LLVM, JITLink is missing or incomplete.

Several other projects have already back-ported this fix into their fork
of LLVM, which is a vote of confidence despite the lack of commit into
LLVM as of today.  We don't have our own copy of LLVM so we can't do
exactly what they've done; instead we have a copy of the whole ed
class so we can pass an instance of it to RuntimeDyld.

The LLVM project hasn't chosen to commit the fix yet, and even if it
did, it wouldn't be back-ported into the releases of LLVM that most of
our users care about, so there is not much point in waiting any longer
for that.  If they make further changes and commit it to LLVM 19 or 20,
we'll still need this for older versions, but we may want to
resynchronize our copy and update some comments.

The changes that we've had to make to our copy can be seen by diffing
our SectionMemoryManager.{h,cpp} files against the ones in the tree of
the pull request.  Per the LLVM project's license requirements, a copy
is in SectionMemoryManager.LICENSE.

This should fix the spate of crash reports we've been receiving lately
from users on large memory ARM systems.

Back- to all supported releases.

Co-authored-by: Thomas Munro <[email protected]>
Co-authored-by: Anthonin Bonnefoy <[email protected]>
Reviewed-by: Anthonin Bonnefoy <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]> (license aspects)
Reported-by: Anthonin Bonnefoy <[email protected]>
Discussion: https://postgr.es/m/CAO6_Xqr63qj%3DSx7HY6ZiiQ6R_JbX%2B-p6sTPwDYwTWZjUmjsYBg%40mail.gmail.com

Suppress new "may be used uninitialized" warning.

Buildfarm member mamba fails to deduce that the function never uses this
variable without initializing it. Back- to v12, like commit
b412f402d1e020c5dac94f3bf4a005db69519b99.

Move I/O before the index_update_stats() buffer lock region.

Commit a07e03fd8fa7daf4d1356f7cb501ffe784ea6257 enlarged the work done
here under the pg_class heap buffer lock.  Two preexisting actions are
best done before holding that lock.  Both RelationGetNumberOfBlocks()
and visibilitymap_count() do I/O, and the latter might exclusive-lock a
visibility map buffer.  Moving these reduces contention and risk of
undetected LWLock deadlock.  Back- to v12, like that commit.

Discussion: https://postgr.es/m/20241031200139 [email protected]

Revert "For inplace update, send nontransactional invalidations."

This reverts commit 95c5acb3fc261067ab65ddc0b2dca8e162f09442 (v17) and
counterparts in each other non-master branch. If released, that commit
would have caused a worst-in-years minor release regression, via
undetected LWLock self-deadlock. This commit and its self-deadlock fix
warrant more bake time in the master branch.

Reported by Alexander Lakhin.

Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com

Revert "WAL-log inplace update before revealing it to other sessions."

This reverts commit bfd5c6e279c8e1702eea882439dc7ebdf4d4b3a5 (v17) and
counterparts in each other non-master branch. This unblocks reverting a
commit on which it depends.

Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com

doc: fix ALTER DOMAIN domain_constraint to spell out options

It used to refer to CREATE DOMAIN, but CREATE DOMAIN allows NULL, while
ALTER DOMAIN does not.

Reported-by: [email protected]
Discussion: https://postgr.es/m/172225092461.915373.6103973717483380183@wrigleys.postgresql.org

Back-through: 12

doc: remove mention of ActiveState for Perl and Tcl on Windows

Replace with Strawberry Perl and Magicsplat Tcl.

Reported-by: Yasir Hussain
Discussion: https://postgr.es/m/CAA9OW9fAAM_WDYYpAquqF6j1hmfRMzHPsFkRfP5E6oSfkF=dMA@mail.gmail.com

Back-through: 12

Unpin buffer before inplace update waits for an XID to end.

Commit a07e03fd8fa7daf4d1356f7cb501ffe784ea6257 changed inplace updates
to wait for heap_update() commands like GRANT TABLE and GRANT DATABASE.
By keeping the pin during that wait, a sequence of autovacuum workers
and an uncommitted GRANT starved one foreground LockBufferForCleanup()
for six minutes, on buildfarm member sarus. Prevent, at the cost of a
bit of complexity. Back- to v12, like the earlier commit. That
commit and heap_inplace_lock() have not yet appeared in any release.

Discussion: https://postgr.es/m/20241026184936 [email protected]

Update time zone data files to tzdata release 2024b.

Historical corrections for Mexico, Mongolia, and Portugal.
Notably, Asia/Choibalsan is now an alias for Asia/Ulaanbaatar
rather than being a separate zone, mainly because the differences
between those zones were found to be based on untrustworthy data.

doc: Add better description for rewrite functions in event triggers

There are two functions that can be used in event triggers to get more
details about a rewrite happening on a relation. Both had a limited
documentation:
- pg_event_trigger_table_rewrite_reason() and
pg_event_trigger_table_rewrite_oid() were not mentioned in the main
event trigger section in the paragraph dedicated to the event
table_rewrite.
- pg_event_trigger_table_rewrite_reason() returns an integer which is a
bitmap of the reasons why a rewrite happens. There was no explanation
about the meaning of these values, forcing the reader to look at the
code to find out that these are defined in event_trigger.h.

While on it, let's add a comment in event_trigger.h where the
AT_REWRITE_* are defined, telling to update the documentation when
these values are changed.

Back down to 13 as a consequence of 1ad23335f36b, where this area
of the documentation has been heavily reworked.

Author: Greg Sabino Mullane
Discussion: https://postgr.es/m/CAKAnmmL+Z6j-C8dAx1tVrnBmZJu+BSoc68WSg3sR+CVNjBCqbw@mail.gmail.com
Back-through: 13

Doc: clarify enable_indexscan=off also disabled Index Only Scans

Disabling enable_indexscan has always also disabled Index Only Scans.
Here we make that more clear in the documentation in an attempt to
prevent future complaints complaining about this expected behavior.

Reported-by: Melanie Plageman
Author: David G. Johnston, David Rowley
Back-through: 12, oldest supported version
Discussion: https://postgr.es/m/CAAKRu_atV=kovgpaLREyG68PB5+ncKvJ2UNoeRetEgyC3Yb5Sw@mail.gmail.com

WAL-log inplace update before revealing it to other sessions.

A buffer lock won't stop a reader having already checked tuple
visibility.  If a vac_update_datfrozenid() and then a crash happened
during inplace update of a relfrozenxid value, datfrozenxid could
overtake relfrozenxid.  That could lead to "could not access status of
transaction" errors.  Back- to v12 (all supported versions).  In
v14 and earlier, this also back-es the assertion removal from
commit 7fcf2faf9c7dd473208fd6d5565f88d7f733782b.

Discussion: https://postgr.es/m/20240620012908 [email protected]

For inplace update, send nontransactional invalidations.

The inplace update survives ROLLBACK.  The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update.  In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f.  That is a
source of index corruption.  Back- to v12 (all supported versions).
The back branch versions don't change WAL, because those branches just
added end-of-recovery SIResetAll().  All branches change the ABI of
extern function PrepareToInvalidateCacheTuple().  No PGXN extension
calls that, and there's no apparent use case in extensions.

Reviewed by Nitin Motiani and (in earlier versions) Andres Freund.

Discussion: https://postgr.es/m/20240523000548 [email protected]

At end of recovery, reset all sinval-managed caches.

An inplace update's invalidation messages are part of its transaction's
commit record.  However, the update survives even if its transaction
aborts or we stop recovery before replaying its transaction commit.
After recovery, a backend that started in recovery could update the row
without incorporating the inplace update.  That could result in a table
with an index, yet relhasindex=f.  That is a source of index corruption.

This bulk invalidation avoids the functional consequences.  A future
change can fix the !RecoveryInProgress() scenario without changing the
WAL format.  Back- to v17 - v12 (all supported versions).  v18 will
instead add invalidations to WAL.

Discussion: https://postgr.es/m/20240618152349 [email protected]

Stop reading uninitialized memory in heap_inplace_lock().

Stop computing a never-used value. This removes the read; the read had
no functional implications. Back- to v12, like commit
a07e03fd8fa7daf4d1356f7cb501ffe784ea6257.

Reported by Alexander Lakhin.

Discussion: https://postgr.es/m/6c92f59b-f5bc-e58c-9bdd-d1f21c17c786@gmail.com

Remove unnecessary word in a comment

Relations opened by the executor are only closed once in
ExecCloseRangeTableRelations(), so the word "again" in the comment
for ExecGetRangeTableRelation() is misleading and unnecessary.

Discussion: https://postgr.es/m/CA+HiwqHnw-zR+u060i3jp4ky5UR0CjByRFQz50oZ05de7wUg=Q@mail.gmail.com
Back-through: 12

ecpg: Fix out-of-bound read in DecodeDateTime()

It was possible for the code to read out-of-bound data from the
"day_tab" table with some crafted input data. Let's treat these as
invalid input as the month number is incorrect.

A test is added to test this case with a check on the errno returned by
the decoding routine. A test close to the new one added in this commit
was testing for a failure, but did not look at the errno generated, so
let's use this commit to also change it, adding a check on the errno
returned by DecodeDateTime().

Like the other test scripts, dt_test should likely be expanded to
include more checks based on the errnos generated in these code paths.
This is left as future work.

This issue exists since 2e6f97560a83, so back all the way down.

Reported-by: Pavel Nekrasov
Author: Bruce Momjian, Pavel Nekrasov
Discussion: https://postgr.es/m/18614-6bbe00117352309e@postgresql.org
Back-through: 12

Restructure foreign key handling code for ATTACH/DETACH

... to fix bugs when the referenced table is partitioned.

The catalog representation we chose for foreign keys connecting
partitioned tables (in commit f56f8f8da6af) is inconvenient, in the
sense that a standalone table has a different way to represent the
constraint when referencing a partitioned table, than when the same
table becomes a partition (and vice versa).  Because of this, we need to
create additional catalog rows on detach (pg_constraint and pg_trigger),
and remove them on attach.  We were doing some of those things, but not
all of them, leading to missing catalog rows in certain cases.

The worst problem seems to be that we are missing action triggers after
detaching a partition, which means that you could update/delete rows
from the referenced partitioned table that still had referencing rows on
that table, the server failing to throw the required errors.

!!!
Note that this means existing databases with FKs that reference
partitioned tables might have rows that break relational integrity, on
tables that were once partitions on the referencing side of the FK.

Another possible problem is that trying to reattach a table
that had been detached would fail indicating that internal triggers
cannot be found, which from the user's point of view is nonsensical.

In branches 15 and above, we fix this by creating a new helper function
addFkConstraint() which is in charge of creating a standalone
pg_constraint row, and repurposing addFkRecurseReferencing() and
addFkRecurseReferenced() so that they're only the recursive routine for
each side of the FK, and they call addFkConstraint() to create
pg_constraint at each partitioning level and add the necessary triggers.
These new routines can be used during partition creation, partition
attach and detach, and foreign key creation.  This reduces redundant
code and simplifies the flow.

In branches 14 and 13, we have a much simpler fix that consists on
simply removing the constraint on detach.  The reason is that those
branches are missing commit f4566345cf40, which reworked the way this
works in a way that we didn't consider back-able at the time.

We opted to leave branch 12 alone, because it's different from branch 13
enough that the fix doesn't apply; and because it is going in EOL mode
very soon, ing it now might be worse since there's no way to undo
the damage if it goes wrong.

Existing databases might need to be repaired.

In the future we might want to rethink the catalog representation to
avoid this problem, but for now the code seems to do what's required to
make the constraints operate correctly.

Co-authored-by: Jehan-Guillaume de Rorthais <[email protected]>
Co-authored-by: Tender Wang <[email protected]>
Co-authored-by: Alvaro Herrera <[email protected]>
Reported-by: Guillaume Lelarge <[email protected]>
Reported-by: Jehan-Guillaume de Rorthais <[email protected]>
Reported-by: Thomas Baehler (SBB CFF FFS) <[email protected]>
Discussion: https://postgr.es/m/20230420144344.40744130@karst
Discussion: https://postgr.es/m/20230705233028.2f554f73@karst
Discussion: https://postgr.es/m/GVAP278MB02787E7134FD691861635A8BC9032@GVAP278MB0278.CHEP278.PROD.OUTLOOK.COM
Discussion: https://postgr.es/m/18541-628a61bc267cd2d3@postgresql.org

Fix wrong assertion and poor error messages in "COPY (query) TO".

If the query is rewritten into a NOTIFY command by a DO INSTEAD
rule, we'd get an assertion failure, or in non-assert builds
issue a rather confusing error message. Improve that.

Also fix a longstanding grammar mistake in a nearby error message.

Per bug #18664 from Alexander Lakhin. Back- to all supported
branches.

Tender Wang and Tom Lane

Discussion: https://postgr.es/m/18664-ffd0ebc2386598df@postgresql.org

Fix race condition in committing a serializable transaction

The finished transaction list can contain XIDs that are older than the
serializable global xmin. It's a short-lived state;
ClearOldPredicateLocks() removes any such transactions from the list,
and it's called whenever the global xmin advances. But if another
backend calls SummarizeOldestCommittedSxact() in that window, it will
call SerialAdd() on an XID that's older than the global xmin, or if
there are no more transactions running, when global xmin is
invalid. That trips the assertion in SerialAdd().

Fixes bug #18658 reported by Andrew Bille. Thanks to Alexander Lakhin
for analysis. Back to all versions.

Discussion: https://www.postgresql.org/message-id/18658-7dab125ec688c70b%40postgresql.org

Note that index_name in ALTER INDEX ATTACH PARTITION can be schema-qualified

Missed in 8b08f7d4820f; back to all supported branches.

Reported-by: [email protected]
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/172924785099.698.15236991344616673753@wrigleys.postgresql.org

Fix extreme skew detection in Parallel Hash Join.

After repartitioning the inner side of a hash join that would have
exceeded the allowed size, we check if all the tuples from a parent
partition moved to one child partition.  That is evidence that it
contains duplicate keys and later attempts to repartition will also
fail, so we should give up trying to limit memory (for lack of a better
fallback strategy).

A thinko prevented the check from working correctly in partition 0 (the
one that is partially loaded into memory already).  After
repartitioning, we should check for extreme skew if the *parent*
partition's space_exhausted flag was set, not the child partition's.
The consequence was repeated futile repartitioning until per-partition
data exceeded various limits including "ERROR: invalid DSA memory alloc
request size 1811939328", OS allocation failure, or temporary disk space
errors.  (We could also do something about some of those symptoms, but
that's material for separate es.)

This problem only became likely when PostgreSQL 16 introduced support
for Parallel Hash Right/Full Join, allowing NULL keys into the hash
table.  Repartitioning always leaves NULL in partition 0, no matter how
many times you do it, because the hash value is all zero bits.  That's
unlikely for other hashed values, but they might still have caused
wasted extra effort before giving up.

Back- to all supported releases.

Reported-by: Craig Milhiser <[email protected]>
Reviewed-by: Andrei Lepikhov <[email protected]>
Discussion: https://postgr.es/m/CA%2BwnhO1OfgXbmXgC4fv_uu%3DOxcDQuHvfoQ4k0DFeB0Qqd-X-rQ%40mail.gmail.com

Further refine _SPI_execute_plan's rule for atomic execution.

Commit 2dc1deaea turns out to have been still a brick shy of a load,
because CALL statements executing within a plpgsql exception block
could still pass the wrong snapshot to stable functions within the
CALL's argument list.  That happened because standard_ProcessUtility
forces isAtomicContext to true if IsTransactionBlock is true, which
it always will be inside a subtransaction.  Then ExecuteCallStmt
would think it does not need to push a new snapshot --- but
_SPI_execute_plan didn't do so either, since it thought it was in
nonatomic mode.

The best fix for this seems to be for _SPI_execute_plan to operate
in atomic execution mode if IsSubTransaction() is true, even when the
SPI context as a whole is non-atomic.  This makes _SPI_execute_plan
have the same rules about when non-atomic execution is allowed as
_SPI_commit/_SPI_rollback have about when COMMIT/ROLLBACK are allowed,
which seems appropriately symmetric.  (If anyone ever tries to allow
COMMIT/ROLLBACK inside a subtransaction, this would all need to be
rethought ... but I'm unconvinced that such a thing could be logically
consistent at all.)

For further consistency, also check IsSubTransaction() in
SPI_inside_nonatomic_context.  That does not matter for its
one present-day caller StartTransaction, which can't be reached
inside a subtransaction.  But if any other callers ever arise,
they'd presumably want this definition.

Per bug #18656 from Alexander Alehin.  Back- to all
supported branches, like previous fixes in this area.

Discussion: https://postgr.es/m/18656-cade1780866ef66c@postgresql.org

Reduce memory block size for decoded tuple storage to 8kB.

Commit a4ccc1cef introduced the Generation Context and modified the
logical decoding process to use a Generation Context with a fixed
block size of 8MB for storing tuple data decoded during logical
decoding (i.e., rb->tup_context). Several reports have indicated that
the logical decoding process can be terminated due to
out-of-memory (OOM) situations caused by excessive memory usage in
rb->tup_context.

This issue can occur when decoding a workload involving several
concurrent transactions, including a long-running transaction that
modifies tuples. By design, the Generation Context does not free a
memory block until all chunks within that block are
released. Consequently, if tuples modified by the long-running
transaction are stored across multiple memory blocks, these blocks
remain allocated until the long-running transaction completes, leading
to substantial memory fragmentation. The memory usage during logical
decoding, tracked by rb->size, does not account for memory
fragmentation, resulting in potentially much higher memory consumption
than the value of the logical_decoding_work_mem parameter.

Various improvement strategies were discussed in the relevant
thread. This change reduces the block size of the Generation Context
used in rb->tup_context from 8MB to 8kB. This modification
significantly decreases the likelihood of substantial memory
fragmentation occurring and is relatively straightforward to
backport. Performance testing across multiple platforms has confirmed
that this change will not introduce any performance degradation that
would impact actual operation.

Backport to all supported branches.

Reported-by: Alex Richman, Michael Guissine, Avi Weinberg
Reviewed-by: Amit Kapila, Fujii Masao, David Rowley
Tested-by: Hayato Kuroda, Shlok Kyal
Discussion: https://postgr.es/m/CAD21AoBTY1LATZUmvSXEssvq07qDZufV4AF-OHh9VD2pC0VY2A%40mail.gmail.com
Back-through: 12

Correctly identify which EC members are computable at a plan node.

find_computable_ec_member() had the wrong mental model of what
its primary caller prepare_sort_from_pathkeys() would do with
the selected EquivalenceClass member expression.  We will not
compute the EC expression in a plan node atop the one returning
the passed-in targetlist; rather, the EC expression will be
computed as an additional column of that targetlist.  So any
Var or quasi-Var used in the given tlist is also available to the
EC expression.  In simple cases this makes no difference because
the given tlist is just a list of Vars or quasi-Vars --- but if
we are considering an appendrel member produced by flattening
a UNION ALL, the tlist may contain expressions, resulting in
failure to match and a "could not find pathkey item to sort"
error.

To fix, we can flatten both the tlist and the EC members with
pull_var_clause(), and then just check for subset-ness, so
that the code is actually shorter than before.

While this bug is quite old, the present  only works back to
v13.  We could possibly make it work in v12 by back-ing parts
of 375398244.  On the whole though I don't like the risk/reward
ratio of that idea.  v12's final release is next month, meaning
there would be no chance to correct matters if the  causes a
regression.  Since this failure has escaped notice for 14 years,
it's likely nobody will hit it in the field with v12.

Per bug #18652 from Alexander Lakhin.

Andrei Lepikhov and Tom Lane

Discussion: https://postgr.es/m/18652-deaa782ebcca85d1@postgresql.org

Remove incorrect function import from pgindent

Commit 149ac7d4559 which re-implemented pgindent in Perl explicitly
imported the devnull function from File::Spec, but the module does
not export anything. In recent versions of Perl calling a missing
import function cause a warning, which combined with warnings being
fatal cause pgindent to error out.

Back to all supported versions.

Author: Erik Wienhold <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Discusson: https://postgr.es/m/2372cd74-11b0-46f9-b28e-8f9627215d19@ewie.name
Back-through: v12

vacuumdb: Schema-qualify operator in catalog query's WHERE clause.

Commit 1ab67c9dfa, which modified this catalog query so that it
doesn't return temporary relations, forgot to schema-qualify the
operator. A comment earlier in the function implores us to fully
qualify everything in the query:

* Since we execute the constructed query with the default search_path
* (which could be unsafe), everything in this query MUST be fully
* qualified.

This commit fixes that. While at it, add a newline for consistency
with surrounding code.

Reviewed-by: Noah Misch
Discussion: https://postgr.es/m/ZwQJYcuPPUsF0reU%40nathan
Back-through: 12

Fix Y2038 issues with MyStartTime.

Several places treat MyStartTime as a "long", which is only 32 bits
wide on some platforms.  In reality, MyStartTime is a pg_time_t,
i.e., a signed 64-bit integer.  This will lead to interesting bugs
on the aforementioned systems in 2038 when signed 32-bit integers
are no longer sufficient to store Unix time (e.g., "pg_ctl start"
hanging).  To fix, ensure that MyStartTime is handled as a 64-bit
value everywhere.  (Of course, users will need to ensure that
time_t is 64 bits wide on their system, too.)

Co-authored-by: Max Johnson
Discussion: https://postgr.es/m/CO1PR07MB905262E8AC270FAAACED66008D682%40CO1PR07MB9052.namprd07.prod.outlook.com
Back-through: 12

Ignore not-yet-defined Portals in pg_cursors view.

pg_cursor() supposed that any Portal it finds in the hash table must
have sourceText set up, but there's an edge case where that is not so.
A newly-created Portal has sourceText = NULL, and that doesn't change
until PortalDefineQuery is called.  In SPI_cursor_open_internal,
we perform GetCachedPlan between CreatePortal and PortalDefineQuery,
and it's possible for user-defined code to execute during that
planning and cause a fetch from the pg_cursors view, resulting in a
null-pointer-dereference crash.  (It looks like the same could happen
in exec_bind_message, but I've not tried to provoke a failure there.)

I considered trying to fix this by setting sourceText sooner, but
there may be instances of this same calling pattern in extensions,
and we couldn't be sure they'd get the memo promptly.  It seems
better to redefine pg_cursor as not showing Portals that have
not yet had PortalDefineQuery called on them, which we can do by
just skipping them if sourceText is still NULL.

(Before a1c692358, pg_cursor would instead return a row with NULL
in the statement column.  We could revert to that behavior but it
doesn't really seem like a better definition, especially since our
documentation doesn't suggest that the column could be NULL.)

Per report from PetSerAl.  Back- to all supported branches.

Discussion: https://postgr.es/m/CAKygsHTBXLXjwV43kpZa+Cs+XTiaeeJiZdL4cPBm9f4MTdw7wg@mail.gmail.com