Álvaro Herrera [Thu, 21 Nov 2024 15:54:36 +0000 (16:54 +0100)] Fix outdated bit in README.tuplock
Apparently this information has been outdated since first committed,
because we adopted a different implementation during development per
reviews and this detail was not updated in the README.
This has been wrong since commit
0ac5ad5134f2 introduced the file in
2013. Back to all live branches.
Reported-by: Will Mortensen <[email protected]>Discussion: https://postgr.es/m/CAMpnoC6yEQ=c0Rdq-J7uRedrP7Zo9UMp6VZyP23QMT68n06cvA@mail.gmail.com
Peter Eisentraut [Thu, 21 Nov 2024 12:50:18 +0000 (13:50 +0100)] Fix ALTER TABLE / REPLICA IDENTITY for temporal tables
REPLICA IDENTITY USING INDEX did not accept a GiST index. This should
be allowed when used as a temporal primary key.
Author: Paul Jungwirth <
[email protected]>
Discussion: https://www.postgresql.org/message-id/
04579cbf-b134-45e1-8f2d-
8c54c849c1ee@illuminatedcomputing.com
Álvaro Herrera [Thu, 21 Nov 2024 09:54:30 +0000 (10:54 +0100)] Unify repetitive error messages
Michael Paquier [Thu, 21 Nov 2024 06:14:02 +0000 (15:14 +0900)] Fix memory in pgoutput for the WAL sender
RelationSyncCache, the hash table in charge of tracking the relation
schemas sent through pgoutput, was forgetting to free the TupleDesc
associated to the two slots used to store the new and old tuples,
causing some memory to be each time a relation is invalidated
when the slots of an existing relation entry are cleaned up.
This is rather hard to notice as the bloat is pretty minimal, but a
long-running WAL sender would be in trouble over time depending on the
workload. sysbench has proved to be pretty good at showing the problem,
coupled with some memory monitoring of the WAL sender.
Issue introduced in
52e4f0cd472d, that has added row filters for tables
logically replicated.
Author: Boyu Yang
Reviewed-by: Michael Paquier, Hou ZhijieDiscussion: https://postgr.es/m/DM3PR84MB3442E14B340E553313B5C816E3252@DM3PR84MB3442.NAMPRD84.PROD.OUTLOOK.COM
Back-through: 15
Bruce Momjian [Wed, 20 Nov 2024 22:09:17 +0000 (17:09 -0500)] More logically order libpq func. includes, e.g., group GUC vals
Reported-by: David ZhangDiscussion: https://postgr.es/m/
65909efe-97c6-4863-af4e-
21eb5a26dd1e@highgo.ca
Co-authored-by: David ZhangBack-through: master
Bruce Momjian [Wed, 20 Nov 2024 22:03:45 +0000 (17:03 -0500)] doc: clarify that jsonb_path_match() returns an SQL boolean
Not a JSON boolean. Also clarify that other predicate check expressions
functions return a JSON boolean, not an SQL boolean.
Reported-by: jian he
Discussion: https://postgr.es/m/CACJufxH7tP1NXCHN1bUBXcEB=dv7-qE+ZjB3UxwK6Em+9Qzb9Q@mail.gmail.com
Back-through: 17
Bruce Momjian [Wed, 20 Nov 2024 19:48:31 +0000 (14:48 -0500)] clarify --no-comments option in --help and SGML files
The previous commit,
b38bac26e20, missed these cases for dump/restore.
Reported-by: Tom LaneDiscussion: https://postgr.es/m/
3495698.
1731968093@sss.pgh.pa.us
Back-through: master
Peter Geoghegan [Wed, 20 Nov 2024 18:37:08 +0000 (13:37 -0500)] Refine nbtree = redundancy preprocessing comment.
Spell out how a = key associated with a SAOP array renders a > key
against the same index column redundant at the relevant point inside
_bt_preprocess_keys.
Follow-up to commit
5bf748b8.
Tom Lane [Wed, 20 Nov 2024 17:03:47 +0000 (12:03 -0500)] Avoid assertion failure if a setop leaf query contains setops.
Ordinarily transformSetOperationTree will collect all UNION/
INTERSECT/EXCEPT steps into the setOperations tree of the topmost
Query, so that leaf queries do not contain any setOperations.
However, it cannot thus flatten a subquery that also contains
WITH, ORDER BY, FOR UPDATE, or LIMIT. I (tgl) forgot that in
commit
07b4c48b6 and wrote an assertion in rule deparsing that
a leaf's setOperations would always be empty.
If it were nonempty then we would want to parenthesize the subquery
to ensure that the output represents the setop nesting correctly
(e.g. UNION below INTERSECT had better get parenthesized). So
rather than just removing the faulty Assert, let's change it into
an additional case to check to decide whether to add parens. We
don't expect that the additional case will ever fire, but it's
cheap insurance.
Man Zeng and Tom Lane
Discussion: https://postgr.es/m/
[email protected]Fujii Masao [Wed, 20 Nov 2024 14:53:19 +0000 (23:53 +0900)] file_fdw: Add REJECT_LIMIT option to file_fdw.
Commit
4ac2a9bece introduced the REJECT_LIMIT option for the COPY
command. This commit extends the support for this option to file_fdw.
As well as REJECT_LIMIT option for COPY, this option limits
the maximum number of erroneous rows that can be skipped.
If the number of data type conversion errors exceeds this limit,
accessing the file_fdw foreign table will fail with an error,
even when on_error = 'ignore' is specified.
Since the CREATE/ALTER FOREIGN TABLE commands require foreign
table options to be single-quoted, this commit updates
defGetCopyRejectLimitOption() to handle also string value for them,
in addition to int64 value for COPY command option.
Author: Atsushi Torikoshi
Reviewed-by: Fujii Masao, Yugo Nagata, Kirill ReshkeDiscussion: https://postgr.es/m/
bab68a9fc502b12693f0755b6f35f327@oss.nttdata.com
Michael Paquier [Wed, 20 Nov 2024 05:20:52 +0000 (14:20 +0900)] doc: Fix section of functions age(xid) and mxid_age(xid)
In 17~, age(xid) and mxid_age(xid) were listed as deprecated. Based on
the discussion that led to
48b5aa3143, this is not intentional as this
could break many existing monitoring queries. Note that vacuumdb also
uses both of them.
In 16, both functions were listed under "Control Data Functions", which
is incorrect, so let's move them to the list of functions related to
transaction IDs and snapshots.
Author: Bertrand Drouvot
Discussion: https://postgr.es/m/
[email protected]Discussion: https://postgr.es/m/
20231114013224[email protected]Back-through: 16
Tom Lane [Tue, 19 Nov 2024 23:26:19 +0000 (18:26 -0500)] Compare collations before merging UNION operations.
In the dim past we figured it was okay to ignore collations
when combining UNION set-operation nodes into a single N-way
UNION operation. I believe that was fine at the time, but
it stopped being fine when we added nondeterministic collations:
the semantics of distinct-ness are affected by those. v17 made
it even less fine by allowing per-child sorting operations to
be merged via MergeAppend, although I think we accidentally
avoided any live bug from that.
Add a check that collations match before deciding that two
UNION nodes are equivalent. I also failed to resist the
temptation to comment plan_union_children() a little better.
Back- to all supported branches (v13 now), since they
all have nondeterministic collations.
Discussion: https://postgr.es/m/
3605568.
1731970579@sss.pgh.pa.us
Fujii Masao [Tue, 19 Nov 2024 17:00:50 +0000 (02:00 +0900)] Improve error message for database object stats manipulation functions.
Previously, database object statistics manipulation functions like
pg_set_relation_stats() reported unclear error and hint messages
when executed during recovery. These messages were "internal",
making it difficult for users to understand the issue:
ERROR: cannot acquire lock mode ShareUpdateExclusiveLock on database objects while recovery is in progress
HINT: Only RowExclusiveLock or less can be acquired on database objects during recovery.
This commit updates the error handling so that, if these functions
are called during recovery, they produce clearer messages:
ERROR: recovery is in progress
HINT: Statistics cannot be modified during recovery.
The related documentation has also been updated to explicitly
clarify that these functions are not available during recovery.
Author: Fujii Masao
Reviewed-by: Heikki Linnakangas, Maxim OrlovDiscussion: https://postgr.es/m/
6d313829-5f56-4a28-ae4b-
bd01bf1ae791@oss.nttdata.com
Michael Paquier [Tue, 19 Nov 2024 04:27:42 +0000 (13:27 +0900)] libpq: Improve error message when parsing URI parameters and keywords
The error message showing up when parameters or keywords include too
many whitespaces was "trailing data found", which was confusing because
there was no hint about what was actually wrong.
Issue introduced in
430ce189fc45, hence there is no need for a
back.
Author: Yushi Ogiwara
Reviewed-by: Fujii Masao, Tom Lane, Bruce MomjianDiscussion: https://postgr.es/m/
645bd22a53c4da8a1bc7e1e52d9d3b52@oss.nttdata.com
Bruce Momjian [Mon, 18 Nov 2024 21:30:33 +0000 (16:30 -0500)] doc: clarify pg_dump --no-comments meaning as SQL comments
Discussion: https://postgr.es/m/
[email protected]Back-through: master
Bruce Momjian [Mon, 18 Nov 2024 20:34:59 +0000 (15:34 -0500)] doc: clarify text about combining row-level policies
Reported-by: [email protected]Discussion: https://postgr.es/m/
173045909386.700.
9231055113418242392@wrigleys.postgresql.org
Back-through: master
Peter Geoghegan [Mon, 18 Nov 2024 18:35:28 +0000 (13:35 -0500)] nbtree: consistently use minoff variable.
This was arguably an oversight in commit
29b64d1de7, which moved this
code from nbtutils.c to its nbtsearch.c caller.
Michael Paquier [Mon, 18 Nov 2024 04:41:10 +0000 (13:41 +0900)] Improve some code format in gist.c
Author: Tender Wang
Discussion: https://postgr.es/m/CAHewXNmD=K7XmsHq=L1SyyzZYvwU4oaMG9EKSSMe4OrXfykLzg@mail.gmail.com
Michael Paquier [Mon, 18 Nov 2024 02:44:11 +0000 (11:44 +0900)] Use pg_memory_is_all_zeros() in PageIsVerifiedExtended()
Relying on pg_memory_is_all_zeros(), which would apply SIMD instructions
when dealing with an aligned page, is proving to be at least three times
faster than the original size_t-based comparisons when checking if a
BLCKSZ page is full of zeros. Note that PageIsVerifiedExtended() is
called each time a page is read from disk, and making it faster is a
good thing.
Author: Bertrand Drouvot
Discussion: https://postgr.es/m/CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com
Michael Paquier [Mon, 18 Nov 2024 01:08:20 +0000 (10:08 +0900)] Optimize pg_memory_is_all_zeros() in memutils.h
pg_memory_is_all_zeros() is currently implemented to do only a
byte-per-byte comparison. While being sufficient for its existing
callers for pgstats entries, it could lead to performance regressions
should it be used for larger memory areas, like 8kB blocks, or even
future commits.
This commit optimizes the implementation of this function to be more
efficient for larger sizes, written in a way so that compilers can
optimize the code. This is portable across 32b and 64b architectures.
The implementation handles three cases, depending on the size of the
input provided:
* If less than sizeof(size_t), do a simple byte-by-byte comparison.
* If between sizeof(size_t) and (sizeof(size_t) * 8 - 1):
** Phase 1: byte-by-byte comparison, until the pointer is aligned.
** Phase 2: size_t comparisons, with aligned pointers, up to the last
aligned location possible.
** Phase 3: byte-by-byte comparison, until the end location.
* If more than (sizeof(size_t) * 8) bytes, this is the same as case 2
except that an additional phase is placed between Phase 1 and Phase 2,
with 8 * sizeof(size_t) comparisons using bitwise OR, to encourage
compilers to use SIMD instructions if available.
The last improvement proves to be at least 3 times faster than the
size_t comparisons, which is something currently used for the all-zero
page check in PageIsVerifiedExtended().
The optimization tricks that would encourage the use of SIMD
instructions have been suggested by David Rowley.
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Ranier Vilela
Discussion: https://postgr.es/m/CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com
Noah Misch [Sat, 16 Nov 2024 04:39:56 +0000 (20:39 -0800)] Fix per-session activation of ALTER {ROLE|DATABASE} SET role.
After commit
5a2fed911a85ed6d8a015a6bafe3a0d9a69334ae, the catalog state
resulting from these commands ceased to affect sessions. Restore the
longstanding behavior, which is like beginning the session with a SET
ROLE command. If cherry-picking the CVE-2024-10978 fixes, default to
including this, too. (This fixes an unintended side effect of fixing
CVE-2024-10978.) Back- to v12, like that commit. The release team
decided to include v12, despite the original intent to halt v12 commits
earlier this week.
Tom Lane and Noah Misch. Reported by Etienne LAFARGE.
Discussion: https://postgr.es/m/CADOZwSb0UsEr4_UTFXC5k7=fyyK8uKXekucd+-uuGjJsGBfxgw@mail.gmail.com
Masahiko Sawada [Sat, 16 Nov 2024 01:06:11 +0000 (17:06 -0800)] Fix a possibility of logical replication slot's restart_lsn going backwards.
Previously LogicalIncreaseRestartDecodingForSlot() accidentally
accepted any LSN as the candidate_lsn and candidate_valid after the
restart_lsn of the replication slot was updated, so it potentially
caused the restart_lsn to move backwards.
A scenario where this could happen in logical replication is: after a
logical replication restart, based on previous candidate_lsn and
candidate_valid values in memory, the restart_lsn advances upon
receiving a subscriber acknowledgment. Then, logical decoding restarts
from an older point, setting candidate_lsn and candidate_valid based
on an old RUNNING_XACTS record. Subsequent subscriber acknowledgments
then update the restart_lsn to an LSN older than the current value.
In the reported case, after WAL files were removed by a checkpoint,
the retreated restart_lsn prevented logical replication from
restarting due to missing WAL segments.
This change essentially modifies the 'if' condition to 'else if'
condition within the function. The previous code had an asymmetry in
this regard compared to LogicalIncreaseXminForSlot(), which does
almost the same thing for different fields.
The WAL removal issue was reported by Hubert Depesz Lubaczewski.
Back to all supported versions, since the bug exists since 9.4
where logical decoding was introduced.
Reviewed-by: Tomas Vondra, Ashutosh Bapat, Amit KapilaDiscussion: https://postgr.es/m/Yz2hivgyjS1RfMKs%40depesz.com
Discussion: https://postgr.es/m/
85fff40e-148b-4e86-b921-
b4b846289132%40vondra.me
Back-through: 13
Tom Lane [Fri, 15 Nov 2024 23:23:38 +0000 (18:23 -0500)] Avoid assertion due to disconnected NFA sub-graphs in regex parsing.
In commit
08c0d6ad6 which introduced "rainbow" arcs in regex NFAs,
I didn't think terribly hard about what to do when creating the color
complement of a rainbow arc. Clearly, the complement cannot match any
characters, and I took the easy way out by just not building any arcs
at all in the complement arc set. That mostly works, but Nikolay
Shaplov found a case where it doesn't: if we decide to delete that
sub-NFA later because it's inside a "{0}" quantifier, delsub()
suffered an assertion failure. That's because delsub() relies on
the target sub-NFA being fully connected. That was always true
before, and the best fix seems to be to restore that property.
Hence, invent a new arc type CANTMATCH that can be generated in
place of an empty color complement, and drop it again later when we
start NFA optimization. (At that point we don't need to do delsub()
any more, and besides there are other cases where NFA optimization can
lead to disconnected subgraphs.)
It appears that this bug has no consequences in a non-assert-enabled
build: there will be some transiently NFA states/arcs, but
they'll get cleaned up eventually. Still, we don't like assertion
failures, so back- to v14 where rainbow arcs were introduced.
Per bug #18708 from Nikolay Shaplov.
Discussion: https://postgr.es/m/18708-
f94f2599c9d2c005@postgresql.org
Fujii Masao [Fri, 15 Nov 2024 16:59:33 +0000 (01:59 +0900)] Remove unnecessary backslash from CopyFrom() code.
Commit
4ac2a9bece accidentally added an unnecessary backslash
to CopyFrom() code. This commit removes it.
Author: Yugo Nagata
Reviewed-by: Tender WangDiscussion: https://postgr.es/m/
20241112114609.
4175a2e175282edd1463dbc6@sraoss.co.jp
Peter Eisentraut [Fri, 15 Nov 2024 13:52:28 +0000 (14:52 +0100)] Fix collation handling for foreign keys
Allowing foreign keys where the referenced and the referencing columns
have collations with different notions of equality is problematic.
This can only happen when using nondeterministic collations, for
example, if the referencing column is case-insensitive and the
referenced column is not, or vice versa. It does not happen if both
collations are deterministic.
To show one example:
CREATE COLLATION case_insensitive (provider = icu, deterministic = false, locale = 'und-u-ks-level2');
CREATE TABLE pktable (x text COLLATE "C" PRIMARY KEY);
CREATE TABLE fktable (x text COLLATE case_insensitive REFERENCES pktable ON UPDATE CASCADE ON DELETE CASCADE);
INSERT INTO pktable VALUES ('A'), ('a');
INSERT INTO fktable VALUES ('A');
BEGIN; DELETE FROM pktable WHERE x = 'a'; TABLE fktable; ROLLBACK;
BEGIN; DELETE FROM pktable WHERE x = 'A'; TABLE fktable; ROLLBACK;
Both of these DELETE statements delete the one row from fktable. So
this means that one row from fktable references two rows in pktable,
which should not happen. (That's why a primary key or unique
constraint is required on pktable.)
When nondeterministic collations were implemented, the SQL standard
available to yours truly said that referential integrity checks should
be performed with the collation of the referenced column, and so
that's how we implemented it. But this turned out to be a mistake in
the SQL standard, for the same reasons as above, that was later
(SQL:2016) fixed to require both collations to be the same. So that's
what we are aiming for here.
We don't have to be quite so strict. We can allow different
collations if they are both deterministic. This is also good for
backward compatibility.
So the new rule is that the collations either have to be the same or
both deterministic. Or in other words, if one of them is
nondeterministic, then both have to be the same.
Users upgrading from before that have affected setups will need to
make changes to their schemas (i.e., change one or both collations in
affected foreign-key relationships) before the upgrade will succeed.
Some of the nice test cases for the previous situation in
collate.icu.utf8.sql are now obsolete. They are changed to just check
the error checking of the new rule. Note that collate.sql already
contained a test for foreign keys with different deterministic
collations.
A bunch of code in ri_triggers.c that added a COLLATE clause to
enforce the referenced column's collation can be removed, because both
columns now have to have the same notion of equality, so it doesn't
matter which one to use.
Reported-by: Paul Jungwirth <[email protected]>Reviewed-by: Jian He <[email protected]>Discussion: https://www.postgresql.org/message-id/flat/
78d824e0-b21e-480d-a252-
e4b84bc2c24b@illuminatedcomputing.com
Álvaro Herrera [Fri, 15 Nov 2024 11:53:12 +0000 (12:53 +0100)] Avoid deleting critical WAL segments during pg_rewind
Previously, in unlucky cases, it was possible for pg_rewind to remove
certain WAL segments from the rewound demoted primary. In particular
this happens if those files have been marked for archival (i.e., their
.ready files were created) but not yet archived; the newly promoted node
no longer has such files because of them having been recycled, but they
are likely critical for recovery in the demoted node. If pg_rewind
removes them, recovery is not possible anymore.
Fix this by maintaining a hash table of files in this situation in the
scan that looks for a checkpoint, which the decide_file_actions phase
can consult so that it knows to preserve them.
Back to 14. The problem also exists in 13, but that branch was not
blessed with commit
eb00f1d4bf96, so this is difficult to apply
there. Users of older releases will just have to continue to be extra
careful when rewinding.
Co-authored-by: Полина Бунгина (Polina Bungina) <[email protected]>Co-authored-by: Alexander Kukushkin <[email protected]>Reviewed-by: Kyotaro Horiguchi <[email protected]>Reviewed-by: Atsushi Torikoshi <[email protected]>Discussion: https://postgr.es/m/CAAtGL4AhzmBRsEsaDdz7065T+k+BscNadfTqP1NcPmsqwA5HBw@mail.gmail.com
Peter Eisentraut [Fri, 15 Nov 2024 09:53:54 +0000 (10:53 +0100)] Proper object locking for GRANT/REVOKE
Refactor objectNamesToOids() to use get_object_address() internally if
possible. Not only does this save a lot of code, it also allows us to
use the object locking provided by get_object_address() for
GRANT/REVOKE. There was previously a code comment that complained
about the lack of locking in objectNamesToOids(), which is now fixed.
The check in ExecGrant_Type_check() is obsolete because
get_object_address_type() already does the same check.
Reviewed-by: Bertrand Drouvot <[email protected]>Discussion: https://www.postgresql.org/message-id/flat/
bf72b82c-124d-4efa-a484-
bb928e9494e4@eisentraut.org
Heikki Linnakangas [Fri, 15 Nov 2024 08:06:36 +0000 (10:06 +0200)] jit: Stop emitting some unnecessary instructions
In EEOP_BOOL_AND_STEP* and EEOP_BOOL_OR_STEP*, we emitted pointlesss
store instructions to store to resnull/resvalue values that were just
loaded from the same fields in the previous instructions. They will
surely get optimized away by LLVM if any optimizations are enabled,
but it's better to not emit them in the first place. In
EEOP_BOOL_NOT_STEP, similar story with resnull.
In EEOP_NULLIF, when it returns NULL, there was also a redundant store
to resvalue just after storing a 0 to it. The value of resvalue
doesn't matter when resnull is set, so in fact even storing the 0 is
unnecessary, but I kept that because we tend to do that for general
tidiness.
Author: Xing Guo <
[email protected]>
Reviewed-by: Andreas Karlsson <[email protected]>Discussion: https://www.postgresql.org/message-id/CACpMh%2BC%
[email protected]Peter Eisentraut [Fri, 15 Nov 2024 07:42:59 +0000 (08:42 +0100)] Add an assertion in get_object_address()
Some places declared a Relation before calling get_object_address()
only to assert that the relation is NULL after the call.
The new assertion allows passing NULL as the relation argument at
those places making the code cleaner and easier to understand.
Author: Bertrand Drouvot <
[email protected]>
Discussion: https://www.postgresql.org/message-id/ZzG34eNrT83W/
[email protected]Michael Paquier [Fri, 15 Nov 2024 02:31:58 +0000 (11:31 +0900)] Fix race conditions with drop of reused pgstats entries
This fixes a set of race conditions with cumulative statistics where a
shared stats entry could be dropped while it should still be valid in
the event when it is reused: an entry may refer to a different object
but requires the same hash key. This can happen with various stats
kinds, like:
- Replication slots that compute internally an index number, for
different slot names.
- Stats kinds that use an OID in the object key, where a wraparound
causes the same key to be used if an OID is used for the same object.
- As of PostgreSQL 18, custom pgstats kinds could also be an issue,
depending on their implementation.
This issue is fixed by introducing a counter called "generation" in the
shared entries via PgStatShared_HashEntry, initialized at 0 when an
entry is created and incremented when the same entry is reused, to avoid
concurrent issues on drop because of other backends still holding a
reference to it. This "generation" is copied to the local copy that a
backend holds when looking at an object, then cross-checked with the
shared entry to make sure that the entry is not dropped even if its
"refcount" justifies that if it has been reused.
This problem could show up when a backend shuts down and needs to
discard any entries it still holds, causing statistics to be removed
when they should not, or even an assertion failure. Another report
involved a failure in a standby after an OID wraparound, where the
startup process would FATAL on a "can only drop stats once", stopping
recovery abruptly. The buildfarm has been sporadically complaining
about the problem, as well, but the window is hard to reach with the
in-core tests.
Note that the issue can be reproduced easily by adding a sleep before
dshash_find() in pgstat_release_entry_ref() to enlarge the problematic
window while repeating test_decoding's isolation test oldest_xmin a
couple of times, for example, as pointed out by Alexander Lakhin.
Reported-by: Alexander Lakhin, Peter SmithAuthor: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: Bertrand DrouvotDiscussion: https://postgr.es/m/CAA4eK1KxuMVyAryz_Vk5yq3ejgKYcL6F45Hj9ZnMNBS-g+PuZg@mail.gmail.com
Discussion: https://postgr.es/m/17947-
b9554521ad963c9c@postgresql.org
Back-through: 15
Heikki Linnakangas [Thu, 14 Nov 2024 14:12:32 +0000 (16:12 +0200)] Pass MyPMChildSlot as an explicit argument to child process
All the other global variables passed from postmaster to child have
the same value in all the processes, while MyPMChildSlot is more like
a parameter to each child process.
Reviewed-by: Andres Freund <[email protected]>Discussion: https://www.postgresql.org/message-id/
a102f15f-eac4-4ff2-af02-
f9ff209ec66f@iki.fi
Heikki Linnakangas [Thu, 14 Nov 2024 14:12:28 +0000 (16:12 +0200)] Assign a child slot to every postmaster child process
Previously, only backends, autovacuum workers, and background workers
had an entry in the PMChildFlags array. With this commit, all
postmaster child processes, including all the aux processes, have an
entry. Dead-end backends still don't get an entry, though, and other
processes that don't touch shared memory will never mark their
PMChildFlags entry as active.
We now maintain separate freelists for different kinds of child
processes. That ensures that there are always slots available for
autovacuum and background workers. Previously, pre-authentication
backends could prevent autovacuum or background workers from starting
up, by using up all the slots.
The code to manage the slots in the postmaster process is in a new
pmchild.c source file. Because postmaster.c is just so large.
Assigning pmsignal slot numbers is now pmchild.c's responsibility.
This replaces the PMChildInUse array in pmsignal.c.
Some of the comments in postmaster.c still talked about the "stats
process", but that was removed in commit
5891c7a8ed. Fix those while
we're at it.
Reviewed-by: Andres Freund <[email protected]>Discussion: https://www.postgresql.org/message-id/
a102f15f-eac4-4ff2-af02-
f9ff209ec66f@iki.fi
Heikki Linnakangas [Thu, 14 Nov 2024 14:12:04 +0000 (16:12 +0200)] Kill dead-end children when there's nothing else left
Previously, the postmaster would never try to kill dead-end child
processes, even if there were no other processes left. A dead-end
backend will eventually exit, when authentication_timeout expires, but
if a dead-end backend is the only thing that's preventing the server
from shutting down, it seems better to kill it immediately. It's
particularly important, if there was a bug in the early startup code
that prevented a dead-end child from timing out and exiting normally.
Includes a test for that case where a dead-end backend previously
prevented the server from shutting down.
Reviewed-by: Andres Freund <[email protected]>Discussion: https://www.postgresql.org/message-id/
a102f15f-eac4-4ff2-af02-
f9ff209ec66f@iki.fi
Heikki Linnakangas [Thu, 14 Nov 2024 14:06:16 +0000 (16:06 +0200)] Replace postc's own backend type codes with BackendType
Introduce a separate BackendType for dead-end children, so that we
don't need a separate dead_end flag.
Reviewed-by: Andres Freund <[email protected]>Discussion: https://www.postgresql.org/message-id/
a102f15f-eac4-4ff2-af02-
f9ff209ec66f@iki.fi
Peter Eisentraut [Thu, 14 Nov 2024 08:30:14 +0000 (09:30 +0100)] Remove a useless cast to (void *) in hash_search() call
This pattern was previously cleaned up in
54a177a948b, but a new
instance snuck in around the same time in
31966b151e6.
Michael Paquier [Thu, 14 Nov 2024 04:23:11 +0000 (13:23 +0900)] contrib/lo: Use SQL-standard function bodies
Author: Ronan Dunklau
Discussion: https://postgr.es/m/
3316564.aeNJFYEL58@aivenlaptop
Michael Paquier [Thu, 14 Nov 2024 04:10:36 +0000 (13:10 +0900)] xml2: Add tests for functions xpath_nodeset() and xpath_list()
These two functions with their different argument lists have never been
tested in this module, so let's add something.
Author: Ronan Dunklau
Discussion: https://postgr.es/m/
[email protected]Michael Paquier [Thu, 14 Nov 2024 03:24:00 +0000 (12:24 +0900)] contrib/lo: Add test for function lo_oid()
Author: Ronan Dunklau
Discussion: https://postgr.es/m/
[email protected]Peter Geoghegan [Wed, 13 Nov 2024 14:50:57 +0000 (09:50 -0500)] Add nbtree amgettuple return item function.
This makes it easier to add precondition assertions. We now assert that
the last call to _bt_readpage succeeded, and that the current item index
is within the bounds of the currPos items array.
Author: Peter Geoghegan <
[email protected]>
Reviewed-By: Masahiro Ikeda <[email protected]>Discussion: https://postgr.es/m/CAH2-WznFkEs9K1PtNruti5JjawY-dwj+gkaEh_k1ZE+1xLLGkA@mail.gmail.com
Álvaro Herrera [Wed, 13 Nov 2024 10:06:44 +0000 (11:06 +0100)] Fix pg_upgrade's cross-version tests when old < 18
Because in the 18 cycle we turned checksums on by default with commit
04bec894a04c, and pg_upgrade fails if the setting doesn't match in old
and new clusters, the built-in cross-version pg_upgrade test is failing
if the old version is older than 18. Fix the script so that it creates
the old cluster with checksums enabled (-k) in cross-version scenarios.
This went unnoticed because the buildfarm doesn't use the same test code
for cross-version testing.
Reviewed-by: Peter Eisentraut <[email protected]>Discussion: https://postgr.es/m/
202411071838[email protected]Peter Eisentraut [Wed, 13 Nov 2024 09:29:31 +0000 (10:29 +0100)] configure.ac: Remove useless AC_SUBST
No longer used since commit
805e431a386.
Peter Eisentraut [Wed, 13 Nov 2024 08:05:02 +0000 (09:05 +0100)] doc: Update pg_constraint.conexclop docs for WITHOUT OVERLAPS
Fixup for commit
fc0438b4e80.
Author: Paul A. Jungwirth <
[email protected]>
Discussion: https://www.postgresql.org/message-id/
57ea0668-5205-426e-b934-
efc89f2186c2@illuminatedcomputing.com
Peter Eisentraut [Wed, 13 Nov 2024 07:53:08 +0000 (08:53 +0100)] doc: Add PERIOD to ALTER TABLE reference docs
Commit
89f908a6d0a documented foreign keys with PERIOD in the CREATE
TABLE docs, but not in ALTER TABLE. This commit adds the new syntax
to the ALTER TABLE docs.
Author: Paul A. Jungwirth <
[email protected]>
Discussion: https://www.postgresql.org/message-id/
57ea0668-5205-426e-b934-
efc89f2186c2@illuminatedcomputing.com
Peter Eisentraut [Wed, 13 Nov 2024 07:51:23 +0000 (08:51 +0100)] doc: Small improvement in CREATE TABLE / PERIOD documentation
Use placeholders that are more consistent and match the description
better. Fixup for commit
89f908a6d0a.
Peter Eisentraut [Wed, 13 Nov 2024 07:42:34 +0000 (08:42 +0100)] doc: Add WITHOUT OVERLAPS to ALTER TABLE reference docs
Commit
fc0438b4e80 documented WITHOUT OVERLAPS in the CREATE TABLE
docs, but not in ALTER TABLE. This commit adds the new syntax to the
ALTER TABLE docs.
Author: Paul A. Jungwirth <
[email protected]>
Discussion: https://www.postgresql.org/message-id/
57ea0668-5205-426e-b934-
efc89f2186c2@illuminatedcomputing.com
Michael Paquier [Wed, 13 Nov 2024 04:58:09 +0000 (13:58 +0900)] Fix comment in injection_point.c
InjectionPointEntry->name was described as a hash key, which was fine
when introduced in
d86d20f0ba79, but it is not now.
Oversight in
86db52a5062a, that has changed the way injection points are
stored in shared memory from a hash table to an array.
Back-through: 17
Peter Geoghegan [Wed, 13 Nov 2024 03:09:00 +0000 (22:09 -0500)] Fix obsolete nbtree page reuse FSM comment.
Oversight in commit
d088ba5a.
Peter Geoghegan [Wed, 13 Nov 2024 01:57:45 +0000 (20:57 -0500)] Count contrib/bloom index scans in pgstat view.
Maintain the pg_stat_user_indexes.idx_scan pgstat counter during
contrib/Bloom index scans.
Oversight in commit
9ee014fc, which added the Bloom index contrib
module.
Author: Masahiro Ikeda <
[email protected]>
Reviewed-By: Peter Geoghegan <[email protected]>Discussion: https://postgr.es/m/
c48839d881388ee401a01807c686004d@oss.nttdata.com
Back: 13- (all supported branches).
Amit Langote [Tue, 12 Nov 2024 06:56:51 +0000 (15:56 +0900)] Add missing word in comment
Discussion: https://postgr.es/m/CA+HiwqFgdp8=0_gi+DU0fPWZbg7qY3KZ_c1Wj1DEvzXC4BCnMQ@mail.gmail.com
Álvaro Herrera [Tue, 12 Nov 2024 10:35:43 +0000 (11:35 +0100)] Silence compilers about extractNotNullColumn()
Multiple buildfarm animals warn that a newly added Assert() is
impossible to fail; remove it to avoid the noise. While at it, use
direct assignment to obtain the value we need, avoiding an unnecessary
memcpy().
(I decided to remove the "pfree" call for the detoasted short-datum;
because this is only used for DDL, it's not problematic to such a
small allocation.)
Noted by Tom Lane about
14e87ffa5c54.
Discussion: https://postgr.es/m/
3649828.
1731083171@sss.pgh.pa.us
Michael Paquier [Tue, 12 Nov 2024 08:28:03 +0000 (17:28 +0900)] pg_freespacemap: Use SQL-standard function bodies
72a5b1fc8804 was the piece missing for the conversion of this module.
pg_freespace is bumped to 1.3, with its function pg_freespace(regclass)
converted to this new style.
There are other modules in the tree that need a similar treatment; these
will be handled later.
Author: Tom Lane
Reviewed-by: Ronan DunklauDiscussion: https://postgr.es/m/
3395418.
1618352794@sss.pgh.pa.us
Alexander Korotkov [Mon, 11 Nov 2024 23:44:20 +0000 (01:44 +0200)] Fix arrays comparison in CompareOpclassOptions()
The current code calls array_eq() and does not provide FmgrInfo. This commit
provides initialization of FmgrInfo and uses C collation as the safe option
for text comparison because we don't know anything about the semantics of
opclass options.
Back to 13, where opclass options were introduced.
Reported-by: Nicolas MausDiscussion: https://postgr.es/m/18692-
72ea398df3ec6712%40postgresql.org
Back-through: 13
Tom Lane [Mon, 11 Nov 2024 22:05:53 +0000 (17:05 -0500)] Parallel workers use AuthenticatedUserId for connection privilege checks.
Commit
5a2fed911 had an unexpected side-effect: the parallel worker
launched for the new test case would fail if it couldn't use a
superuser-reserved connection slot. The reason that test failed
while all our pre-existing ones worked is that the connection
privilege tests in InitPostgres had been based on the superuserness
of the leader's AuthenticatedUserId, but after the rearrangements
of
5a2fed911 we were testing the superuserness of CurrentUserId,
which the new test case deliberately made to be a non-superuser.
This all seems very accidental and probably not the behavior we really
want, but a security is no time to be redesigning things.
Pending some discussion about desirable semantics, hack it so that
InitPostgres continues to pay attention to the superuserness of
AuthenticatedUserId when starting a parallel worker.
Nathan Bossart and Tom Lane, per buildfarm member sawshark.
Security: CVE-2024-10978
Tom Lane [Mon, 11 Nov 2024 18:57:21 +0000 (13:57 -0500)] Fix cross-version upgrade tests.
TestUpgradeXversion knows how to make the main regression database's
references to pg_regress.so be version-independent. But it doesn't
do that for plperl's database, so that the C function added by
commit
b7e3a52a8 is causing cross-version upgrade test failures.
Path of least resistance is to just drop the function at the end
of the new test.
In <= v14, also take the opportunity to clean up the generated
test files.
Security: CVE-2024-10979
Tom Lane [Mon, 11 Nov 2024 17:20:08 +0000 (12:20 -0500)] Avoid bizarre meson behavior with backslashes in command arguments.
meson makes the backslashes in text2macro.pl's --strip argument
into forward slashes, effectively disabling comment stripping.
That hasn't caused us issues before, but it breaks the test case
for
b7e3a52a8. We don't really need the pattern to be adjustable,
so just hard-wire it into the script instead.
Context: https://.com/mesonbuild/meson/issues/1564
Security: CVE-2024-10979
Tom Lane [Mon, 11 Nov 2024 15:29:54 +0000 (10:29 -0500)] Fix improper interactions between session_authorization and role.
The SQL spec mandates that SET SESSION AUTHORIZATION implies
SET ROLE NONE. We tried to implement that within the lowest-level
functions that manipulate these settings, but that was a bad idea.
In particular, guc.c assumes that it doesn't matter in what order
it applies GUC variable updates, but that was not the case for these
two variables. This problem, compounded by some hackish attempts to
work around it, led to some security-grade issues:
* Rolling back a transaction that had done SET SESSION AUTHORIZATION
would revert to SET ROLE NONE, even if that had not been the previous
state, so that the effective user ID might now be different from what
it had been.
* The same for SET SESSION AUTHORIZATION in a function SET clause.
* If a parallel worker inspected current_setting('role'), it saw
"none" even when it should see something else.
Also, although the parallel worker startup code intended to cope
with the current role's pg_authid row having disappeared, its
implementation of that was incomplete so it would still fail.
Fix by fully separating the miscinit.c functions that assign
session_authorization from those that assign role. To implement the
spec's requirement, teach set_config_option itself to perform "SET
ROLE NONE" when it sets session_authorization. (This is undoubtedly
ugly, but the alternatives seem worse. In particular, there's no way
to do it within assign_session_authorization without incompatible
changes in the API for GUC assign hooks.) Also, improve
ParallelWorkerMain to directly set all the relevant user-ID variables
instead of relying on some of them to get set indirectly. That
allows us to survive not finding the pg_authid row during worker
startup.
In v16 and earlier, this includes back-ing
9987a7bf3 which
fixed a violation of GUC coding rules: SetSessionAuthorization
is not an appropriate place to be throwing errors from.
Security: CVE-2024-10978
Nathan Bossart [Mon, 11 Nov 2024 15:00:00 +0000 (09:00 -0600)] Ensure cached plans are correctly marked as dependent on role.
If a CTE, subquery, sublink, security invoker view, or coercion
projection references a table with row-level security policies, we
neglected to mark the plan as potentially dependent on which role
is executing it. This could lead to later executions in the same
session returning or hiding rows that should have been hidden or
returned instead.
Reported-by: Wolfgang Walther
Reviewed-by: Noah Misch
Security: CVE-2024-10976
Back-through: 12
Noah Misch [Mon, 11 Nov 2024 14:23:43 +0000 (06:23 -0800)] Block environment variable mutations from trusted PL/Perl.
Many process environment variables (e.g. PATH), bypass the containment
expected of a trusted PL. Hence, trusted PLs must not offer features
that achieve setenv(). Otherwise, an attacker having USAGE privilege on
the language often can achieve arbitrary code execution, even if the
attacker lacks a database server operating system user.
To fix PL/Perl, replace trusted PL/Perl %ENV with a tied hash that just
replaces each modification attempt with a warning. Sites that reach
these warnings should evaluate the application-specific implications of
proceeding without the environment modification:
Can the application reasonably proceed without the modification?
If no, switch to plperlu or another approach.
If yes, the application should change the code to stop attempting
environment modifications. If that's too difficult, add "untie
%main::ENV" in any code executed before the warning. For example,
one might add it to the start of the affected function or even to
the plperl.on_plperl_init setting.
In passing, link to Perl's guidance about the Perl features behind the
security posture of PL/Perl.
Back- to v12 (all supported versions).
Andrew Dunstan and Noah Misch
Security: CVE-2024-10979
Amit Kapila [Mon, 11 Nov 2024 09:54:40 +0000 (15:24 +0530)] Doc: Add links to clarify the max_replication_slots.
The GUC max_replication_slots has a different meaning for sending servers
and subscribers. Add cross-links in each section for easy reference.
Author: Tristan Partin
Discussion: https://postgr.es/m/
[email protected]Michael Paquier [Mon, 11 Nov 2024 01:40:48 +0000 (10:40 +0900)] Add two attributes to pg_stat_database for parallel workers activity
Two attributes are added to pg_stat_database:
* parallel_workers_to_launch, counting the total number of parallel
workers that were planned to be launched.
* parallel_workers_launched, counting the total number of parallel
workers actually launched.
The ratio of both fields can provide hints that there are not enough
slots available when launching parallel workers, also useful when
pg_stat_statements is not deployed on an instance (i.e.
cf54a2c00254).
This commit relies on
de3a2ea3b264, that has added two fields to EState,
that get incremented when executing Gather or GatherMerge nodes.
A test is added in select_parallel, where parallel workers are spawned.
Bump catalog version.
Author: Benoit Lobréau
Discussion: https://postgr.es/m/
783bc7f7-659a-42fa-99dd-
ee0565644e25@dalibo.com
Michael Paquier [Mon, 11 Nov 2024 01:19:52 +0000 (10:19 +0900)] libpq: Bail out during SSL/GSS negotiation errors
This commit changes libpq so that errors reported by the backend during
the protocol negotiation for SSL and GSS are discarded by the client, as
these may include bytes that could be consumed by the client and write
arbitrary bytes to a client's terminal.
A failure with the SSL negotiation now leads to an error immediately
reported, without a retry on any other methods allowed, like a fallback
to a plaintext connection.
A failure with GSS discards the error message received, and we allow a
fallback as it may be possible that the error is caused by a connection
attempt with a pre-11 server, GSS encryption having been introduced in
v12. This was a problem only with v17 and newer versions; older
versions discard the error message already in this case, assuming a
failure caused by a lack of support for GSS encryption.
Author: Jacob Champion
Reviewed-by: Peter Eisentraut, Heikki Linnakangas, Michael Paquier
Security: CVE-2024-10977
Back-through: 12
Michael Paquier [Mon, 11 Nov 2024 00:02:30 +0000 (09:02 +0900)] pg_stat_statements: Avoid some locking during PGSS entry scans
A single PGSS entry's spinlock is used to be able to modify "counters"
without holding pgss->lock exclusively, as mentioned at the top of
pg_stat_statements.c and within pgssEntry.
Within a single pgssEntry, stats_since and minmax_stats_since are never
modified without holding pgss->lock exclusively, so there is no need to
hold an entry's spinlock when reading stats_since and
minmax_stats_since, as done when scanning all the PGSS entries for
function calls of pg_stat_statements().
This also restores the consistency between the code and the comments
about the entry's spinlock usage. This change is a performance
improvement (it can be argued that this is a logic bug), so there is no
need for a back. This saves two instructions from being read while
holding an entry's spinlock.
Author: Karina Litskevich
Reviewed-by: Michael Paquier, wenhui qiu
Discussion: https://postgr.es/m/CACiT8ibhCmzbcOxM0v4pRLH3abk-95LPkt7_uC2JMP+miPjxsg@mail.gmail.com
Thomas Munro [Sun, 10 Nov 2024 22:46:37 +0000 (11:46 +1300)] jit: Remove obsolete LLVM version guard.
Commit
9044fc1d needed a version guard when back-ed, but it is
redundant in master as of commit
972c2cd2, and I accidentally left it
in there.
Nathan Bossart [Fri, 8 Nov 2024 22:11:08 +0000 (16:11 -0600)] Fix sign-compare warnings in pg_iovec.h.
The code in question (pg_preadv() and pg_pwritev()) has been around
for a while, but commit
15c9ac3629 moved it to a header file. If
third-party code that includes this header file is built with
-Wsign-compare on a system without preadv() or pwritev(), warnings
ensue. This commit fixes said warnings by casting the result of
pg_pread()/pg_pwrite() to size_t, which should be safe because we
will have already checked for a negative value.
Author: Wolfgang Walther
Discussion: https://postgr.es/m/
16989737-1aa8-48fd-8dfe-
b7ada06509ab%40technowledgy.de
Back-through: 17
Peter Geoghegan [Fri, 8 Nov 2024 21:34:41 +0000 (16:34 -0500)] Assert consistency of currPage that ended scan.
When _bt_readnextpage is called with our nbtree parallel scan already
seized (i.e. when it is directly called by _bt_first), we never expect a
prior call to _bt_readpage for lastcurrblkno to already indicate that
the scan should end -- the _bt_first caller's blkno must always be read.
After all, the "prior" _bt_readpage call (the call for lastcurrblkno)
probably took place in some other backend (and it might not even have
finished by the time our backend reaches _bt_first/_bt_readnextpage).
Add a documenting assertion to the path where _bt_readnextpage ends the
parallel scan based on information about lastcurrblkno from so->currPos.
Assert that the most recent _bt_readpage call that set so->currPos is in
fact lastcurrblkno's _bt_readpage call.
Follow-up to bugfix commit
b5ee4e52.
Nathan Bossart [Fri, 8 Nov 2024 20:25:28 +0000 (14:25 -0600)] Move check for USE_AVX512_POPCNT_WITH_RUNTIME_CHECK.
Unlike TRY_POPCNT_FAST, which is defined in pg_bitutils.h, this
macro is defined in c.h (via pg_config.h), so we can check for it
earlier and avoid some unnecessary #includes on systems that lack
AVX-512 support.
Oversight in commit
f78667bd91.
Discussion: https://postgr.es/m/Zy5K5Qmlb3Z4dsd4%40nathan
Tom Lane [Fri, 8 Nov 2024 18:42:01 +0000 (13:42 -0500)] Improve fix for not entering parallel mode when holding interrupts.
Commit
ac04aa84a put the shutoff for this into the planner, which is
not ideal because it doesn't prevent us from re-using a previously
made parallel plan. Revert the planner change and instead put the
shutoff into InitializeParallelDSM, modeling it on the existing code
there for recovering from failure to allocate a DSM segment.
However, that code path is mostly untested, and testing a bit harder
showed there's at least one bug: ExecHashJoinReInitializeDSM is not
prepared for us to have skipped doing parallel DSM setup. I also
thought the Assert in ReinitializeParallelWorkers is pretty
ill-advised, and replaced it with a silent Min() operation.
The existing test case added by
ac04aa84a serves fine to test this
version of the fix, so no change needed there.
by me, but thanks to Noah Misch for the core idea that we
could shut off worker creation when !INTERRUPTS_CAN_BE_PROCESSED.
Back- to v12, as
ac04aa84a was.
Discussion: https://postgr.es/m/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com
Peter Geoghegan [Fri, 8 Nov 2024 18:10:10 +0000 (13:10 -0500)] Avoid nbtree parallel scan currPos confusion.
Commit
1bd4bc85, which refactored nbtree sibling link traversal, made
_bt_parallel_seize reset the scan's currPos so that things were
consistent with the state of a serial backend moving between pages.
This overlooked the fact that _bt_readnextpage relied on the existing
currPos state to decide when to end the scan -- even though it came from
before the scan was seized. As a result of all this, parallel nbtree
scans could needlessly behave like full index scans.
To fix, teach _bt_readnextpage to explicitly allow the use of an already
read page's so->currPos when deciding whether to end the scan -- even
during parallel index scans (allow it consistently now). This requires
moving _bt_readnextpage's seizure of the scan to earlier in its loop.
That way _bt_readnextpage either deals with the true so->currPos state,
or an initialized-by-_bt_parallel_seize currPos state set from when the
scan was seized. Now _bt_steppage (the most important _bt_readnextpage
caller) takes the same uniform approach to setting up its call using
details taken from so->currPos -- regardless of whether the scan happens
to be parallel or serial.
The new loop structure in _bt_readnextpage is prone to getting confused
by P_NONE blknos set when the rightmost or leftmost page was reached.
We could avoid that by adding an explicit check, but that would be ugly.
Avoid this problem by teaching _bt_parallel_seize to end the parallel
scan instead of returning a P_NONE next block/blkno. Doing things this
way was arguably a missed opportunity for commit
1bd4bc85. It allows us
to remove a similar "blkno == P_NONE" check from _bt_first.
Oversight in commit
1bd4bc85, which refactored sibling link traversal
(as part of optimizing nbtree backward scan locking).
Author: Peter Geoghegan <
[email protected]>
Reported-By: Masahiro Ikeda <[email protected]>Diagnosed-By: Masahiro Ikeda <[email protected]>Reviewed-By: Masahiro Ikeda <[email protected]>Discussion: https://postgr.es/m/
f8efb9c0f8d1a71b44fd7f8e42e49c25@oss.nttdata.com
Álvaro Herrera [Fri, 8 Nov 2024 12:28:48 +0000 (13:28 +0100)] Add pg_constraint rows for not-null constraints
We now create contype='n' pg_constraint rows for not-null constraints on
user tables. Only one such constraint is allowed for a column.
We propagate these constraints to other tables during operations such as
adding inheritance relationships, creating and attaching partitions and
creating tables LIKE other tables. These related constraints mostly
follow the well-known rules of conislocal and coninhcount that we have
for CHECK constraints, with some adaptations: for example, as opposed to
CHECK constraints, we don't match not-null ones by name when descending
a hierarchy to alter or remove it, instead matching by the name of the
column that they apply to. This means we don't require the constraint
names to be identical across a hierarchy.
The inheritance status of these constraints can be controlled: now we
can be sure that if a parent table has one, then all children will have
it as well. They can optionally be marked NO INHERIT, and then children
are free not to have one. (There's currently no support for altering a
NO INHERIT constraint into inheriting down the hierarchy, but that's a
desirable future feature.)
This also opens the door for having these constraints be marked NOT
VALID, as well as allowing UNIQUE+NOT NULL to be used for functional
dependency determination, as envisioned by commit
e49ae8d3bc58. It's
likely possible to allow DEFERRABLE constraints as followup work, as
well.
psql shows these constraints in \d+, though we may want to reconsider if
this turns out to be too noisy. Earlier versions of this hid
constraints that were on the same columns of the primary key, but I'm
not sure that that's very useful. If clutter is a problem, we might be
better off inventing a new \d++ command and not showing the constraints
in \d+.
For now, we omit these constraints on system catalog columns, because
they're unlikely to achieve anything.
The main difference to the previous attempt at this (
b0e96f311985) is
that we now require that such a constraint always exists when a primary
key is in the column; we didn't require this previously which had a
number of unpalatable consequences. With this requirement, the code is
easier to reason about. For example:
- We no longer have "throwaway constraints" during pg_dump. We needed
those for the case where a table had a PK without a not-null
underneath, to prevent a slow scan of the data during restore of the
PK creation, which was particularly problematic for pg_upgrade.
- We no longer have to cope with attnotnull being set spuriously in
case a primary key is dropped indirectly (e.g., via DROP COLUMN).
Some bits of code in this were authored by Jian He.
Author: Álvaro Herrera <
[email protected]>
Author: Bernd Helmle <
[email protected]>
Reviewed-by: 何建 (jian he) <[email protected]>Reviewed-by: 王刚 (Tender Wang) <[email protected]>Reviewed-by: Justin Pryzby <[email protected]>Reviewed-by: Peter Eisentraut <[email protected]>Reviewed-by: Dean Rasheed <[email protected]>Discussion: https://postgr.es/m/
202408310358[email protected]Amit Langote [Fri, 8 Nov 2024 07:27:24 +0000 (16:27 +0900)] Disallow partitionwise join when collations don't match
If the collation of any join key column doesn’t match the collation of
the corresponding partition key, partitionwise joins can yield incorrect
results. For example, rows that would match under the join key collation
might be located in different partitions due to the partitioning
collation. In such cases, a partitionwise join would yield different
results from a non-partitionwise join, so disallow it in such cases.
Reported-by: Tender Wang <[email protected]>Author: Jian He <
[email protected]>
Reviewed-by: Tender Wang <[email protected]>Reviewed-by: Junwang Zhao <[email protected]>Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
Back-through: 12
Amit Langote [Fri, 8 Nov 2024 07:07:22 +0000 (16:07 +0900)] Disallow partitionwise grouping when collations don't match
If the collation of any grouping column doesn’t match the collation of
the corresponding partition key, partitionwise grouping can yield
incorrect results. For example, rows that would be grouped under the
grouping collation may end up in different partitions under the
partitioning collation. In such cases, full partitionwise grouping
would produce results that differ from those without partitionwise
grouping, so disallowed that.
Partial partitionwise aggregation is still allowed, as the Finalize
step reconciles partition-level aggregates with grouping requirements
across all partitions, ensuring that the final output remains
consistent.
This commit also fixes group_by_has_partkey() by ensuring the
RelabelType node is stripped from grouping expressions when matching
them to partition key expressions to avoid false mismatches.
Bug: #18568
Reported-by: Webbo Han <[email protected]>Author: Webbo Han <
1105066510@qq.com>
Reviewed-by: Tender Wang <[email protected]>Reviewed-by: Aleksander Alekseev <[email protected]>Reviewed-by: Jian He <[email protected]>Discussion: https://postgr.es/m/18568-
2a9afb6b9f7e6ed3@postgresql.org
Discussion: https://postgr.es/m/
[email protected]Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
Back-through: 12
Richard Guo [Fri, 8 Nov 2024 02:21:11 +0000 (11:21 +0900)] Fix inconsistent RestrictInfo serial numbers
When we generate multiple clones of the same qual condition to cope
with outer join identity 3, we need to ensure that all the clones get
the same serial number. To achieve this, we reset the
root->last_rinfo_serial counter each time we produce RestrictInfo(s)
from the qual list (see deconstruct_distribute_oj_quals). This
approach works only if we ensure that we are not changing the qual
list in any way that'd affect the number of RestrictInfos built from
it.
However, with
b262ad440, an IS NULL qual on a NOT NULL column might
result in an additional constant-FALSE RestrictInfo. And different
versions of the same qual clause can lead to different conclusions
about whether it can be reduced to constant-FALSE. This would affect
the number of RestrictInfos built from the qual list for different
versions, causing inconsistent RestrictInfo serial numbers across
multiple clones of the same qual. This inconsistency can confuse
users of these serial numbers, such as rebuild_joinclause_attr_needed,
and lead to planner errors such as "ERROR: variable not found in
subplan target lists".
To fix, reset the root->last_rinfo_serial counter after generating the
additional constant-FALSE RestrictInfo.
Back- to v17 where the issue crept in. In v17, I failed to make
a test case that would expose this bug, so no test case for v17.
Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4-B6kafn+LmPuh-TYFwFyEm-vVj3Qqv7Yo-69CEv14rRg@mail.gmail.com
Nathan Bossart [Thu, 7 Nov 2024 21:27:32 +0000 (15:27 -0600)] Fix __attribute__((target(...))) usage.
The commonly supported way to specify multiple target options is to
surround the entire list with quotes and to use a comma (with no
extra spaces) as the delimiter.
Oversight in commit
f78667bd91.
Discussion: https://postgr.es/m/Zy0jya8nF8CPpv3B%40nathan
Nathan Bossart [Thu, 7 Nov 2024 19:58:43 +0000 (13:58 -0600)] Use __attribute__((target(...))) for AVX-512 support.
Presently, we check for compiler support for the required
intrinsics both with and without extra compiler flags (e.g.,
-mxsave), and then depending on the results of those checks, we
pick which files to compile with which flags. This is tedious and
complicated, and it results in unsustainable coding patterns such
as separate files for each portion of code may need to be built
with different compiler flags.
This commit introduces support for __attribute__((target(...))) and
uses it for the AVX-512 code. This simplifies both the
configure-time checks and the build scripts, and it allows us to
place the functions that use the intrinsics in files that we
otherwise do not want to build with special CPU instructions. We
are careful to avoid using __attribute__((target(...))) on
compilers that do not understand it, but we still perform the
configure-time checks in case the compiler allows using the
intrinsics without it (e.g., MSVC).
A similar change could likely be made for some of the CRC-32C code,
but that is left as a future exercise.
Suggested-by: Andres FreundReviewed-by: Raghuveer Devulapalli, Andres FreundDiscussion: https://postgr.es/m/
20240731205254.vfpap7uxwmebqeaf%40awork3.anarazel.de
Álvaro Herrera [Thu, 7 Nov 2024 13:06:24 +0000 (14:06 +0100)] doc: Reword ALTER TABLE ATTACH restriction on NO INHERIT constraints
The previous wording is easy to read incorrectly; this change makes it
simpler, less ambiguous, and less prominent.
Back to all live branches.
Reviewed-by: Amit Langote <[email protected]>Discussion: https://postgr.es/m/
202411051201[email protected]Peter Eisentraut [Thu, 7 Nov 2024 10:13:06 +0000 (11:13 +0100)] Clarify a foreign key error message
Clarify the message about type mismatch in foreign key definition to
indicate which column the referencing and which is the referenced one.
Reported-by: Jian He <[email protected]>Discussion: https://www.postgresql.org/message-id/CACJufxEL82ao-aXOa=d_-Xip0bix-qdSyNc9fcWxOdkEZFko8w@mail.gmail.com
Michael Paquier [Thu, 7 Nov 2024 06:13:50 +0000 (15:13 +0900)] Remove an obsolete comment in gistinsert()
This is inconsistent since
1f7ef548ec2e where the definition of
gistFormTuple() has changed.
Author: Tender Wang
Reviewed-by: Aleksander AlekseevDiscussion: https://postgr.es/m/CAHewXNkjU95_HdioDVU=5yBq_Xt=GfBv=Od-0oKtiA006pWW7Q@mail.gmail.com
Amit Kapila [Thu, 7 Nov 2024 03:28:49 +0000 (08:58 +0530)] Replicate generated columns when 'publish_generated_columns' is set.
This builds on the work done in commit
745217a051 by enabling the
replication of generated columns alongside regular column changes through
a new publication parameter: publish_generated_columns.
Example usage:
CREATE PUBLICATION pub1 FOR TABLE tab_gencol WITH (publish_generated_columns = true);
The column list takes precedence. If the generated columns are specified
in the column list, they will be replicated even if
'publish_generated_columns' is set to false. Conversely, if generated
columns are not included in the column list (assuming the user specifies a
column list), they will not be replicated even if
'publish_generated_columns' is true.
Author: Vignesh C, Shubham Khanna
Reviewed-by: Peter Smith, Amit Kapila, Hayato Kuroda, Shlok Kyal, Ajin Cherian, Hou Zhijie, Masahiko SawadaDiscussion: https://postgr.es/m/
B80D17B2-2C8E-4C7D-87F2-
E5B4BE3C069E@gmail.com
Michael Paquier [Thu, 7 Nov 2024 03:11:27 +0000 (12:11 +0900)] Improve handling of empty query results in BackgroundPsql::query()
A newline is not added at the end of an empty query result, causing the
banner of the hardcoded \echo to not be discarded. This would reflect
on scripts that expect an empty result by showing the "QUERY_SEPARATOR"
in the output returned back to the caller, which was confusing.
This commit changes BackgroundPsql::query() so as empty results are able
to work correctly, making the first newline before the banner optional,
bringing more flexibility.
Note that this change affects 037_invalid_database.pl, where three
queries generated an empty result, with the script relying on the data
from the hardcoded banner to exist in the expected output. These
queries are changed to use query_safe(), leading to a simpler script.
The author has also proposed a test in a different where empty
results would exist when using BackgroundPsql.
Author: Jacob Champion
Reviewed-by: Andrew Dunstan, Michael Paquier
Discussion: https://postgr.es/m/CAOYmi+=60deN20WDyCoHCiecgivJxr=98s7s7-C8SkXwrCfHXg@mail.gmail.com
Daniel Gustafsson [Wed, 6 Nov 2024 14:40:52 +0000 (15:40 +0100)] Find invalid databases during upgrade check stage
Before continuing with the check start by checking that all databases
allow connections to avoid a hard fail without proper error reporting.
Inspired by a larger by Thomas Krennwallner.
Discussion: https://postgr.es/m/
f9315bf0-e03e-4490-9f0d-
5b6f7a6d9908@postsubmeta.net
Daniel Gustafsson [Wed, 6 Nov 2024 14:11:14 +0000 (15:11 +0100)] Remove unused variable
The low variable has not been used since it was added in
d168b666823and can be safely removed. The variable is present in the Sedgewick
paper "Analysis of Shellsort and Related Algorithms" as a parameter
to the shellsort function, but our implementation does not use it.
Remove to improve readability of the code.
Author: Koki Nakamura <
[email protected]>
Discussion: https://postgr.es/m/
8aeb7b3eda53ca4c65fbacf8f43628fb@oss.nttdata.com
Peter Eisentraut [Wed, 6 Nov 2024 12:42:27 +0000 (13:42 +0100)] doc: Remove event trigger firing matrix
This is difficult to maintain accurately, and it was probably already
somewhat incorrect, especially in the sql_drop and table_rewrite
categories.
The prior section already documented which DDL commands are *not*
supported (which was also slightly outdated), so let's expand that a
bit and just rely on that instead of listing out each command in full
detail.
Reviewed-by: Daniel Gustafsson <[email protected]>Reviewed-by: Michael Paquier <[email protected]>Reviewed-by: Jian He <[email protected]>Discussion: https://www.postgresql.org/message-id/flat/CACJufxE_UAuxcM08BW5oVsg34v0cFWoEt8yBa5xSAoKLmL6LTQ%40mail.gmail.com
Thomas Munro [Wed, 6 Nov 2024 09:04:44 +0000 (22:04 +1300)] Monkey- LLVM code to fix ARM relocation bug.
Supply a new memory manager for RuntimeDyld, to avoid crashes in
generated code caused by memory placement that can overflow a 32 bit
data type. This is a drop-in replacement for the
llvm::SectionMemoryManager class in the LLVM library, with Michael
Smith's proposed fix from
https://www..com/llvm/llvm-project/pull/71968.
We hereby slurp it into our own source tree, after moving into a new
namespace llvm::backport and making some minor adjustments so that it
can be compiled with older LLVM versions as far back as 12. It's harder
to make it work on even older LLVM versions, but it doesn't seem likely
that people are really using them so that is not investigated for now.
The problem could also be addressed by switching to JITLink instead of
RuntimeDyld, and that is the LLVM project's recommended solution as
the latter is about to be deprecated. We'll have to do that soon enough
anyway, and then when the LLVM version support window advances far
enough in a few years we'll be able to delete this code. Unfortunately
that wouldn't be enough for PostgreSQL today: in most relevant versions
of LLVM, JITLink is missing or incomplete.
Several other projects have already back-ported this fix into their fork
of LLVM, which is a vote of confidence despite the lack of commit into
LLVM as of today. We don't have our own copy of LLVM so we can't do
exactly what they've done; instead we have a copy of the whole ed
class so we can pass an instance of it to RuntimeDyld.
The LLVM project hasn't chosen to commit the fix yet, and even if it
did, it wouldn't be back-ported into the releases of LLVM that most of
our users care about, so there is not much point in waiting any longer
for that. If they make further changes and commit it to LLVM 19 or 20,
we'll still need this for older versions, but we may want to
resynchronize our copy and update some comments.
The changes that we've had to make to our copy can be seen by diffing
our SectionMemoryManager.{h,cpp} files against the ones in the tree of
the pull request. Per the LLVM project's license requirements, a copy
is in SectionMemoryManager.LICENSE.
This should fix the spate of crash reports we've been receiving lately
from users on large memory ARM systems.
Back- to all supported releases.
Co-authored-by: Thomas Munro <[email protected]>Co-authored-by: Anthonin Bonnefoy <[email protected]>Reviewed-by: Anthonin Bonnefoy <[email protected]>Reviewed-by: Daniel Gustafsson <[email protected]> (license aspects)Reported-by: Anthonin Bonnefoy <[email protected]>Discussion: https://postgr.es/m/CAO6_Xqr63qj%3DSx7HY6ZiiQ6R_JbX%2B-p6sTPwDYwTWZjUmjsYBg%40mail.gmail.com
Peter Eisentraut [Wed, 6 Nov 2024 10:06:58 +0000 (11:06 +0100)] Remove unused #include's from bin .c files
as determined by IWYU
Similar to commit
dbbca2cf299, but for bin and some related files.
Discussion: https://www.postgresql.org/message-id/flat/
0df1d5b1-8ca8-4f84-93be-
121081bde049%40eisentraut.org
Michael Paquier [Wed, 6 Nov 2024 06:31:14 +0000 (15:31 +0900)] Extend Cluster.pm's background_psql() to be able to start asynchronously
This commit extends the constructor routine of BackgroundPsql.pm with a
new "wait" parameter. If set to 0, the routine returns without waiting
for psql to start, ready to consume input.
background_psql() in Cluster.pm gains the same "wait" parameter. The
default behavior is still to wait for psql to start. It becomes now
possible to not wait, giving to TAP scripts the possibility to perform
actions between a BackgroundPsql startup and its wait_connect() call.
Author: Jacob Champion
Discussion: https://postgr.es/m/CAOYmi+=60deN20WDyCoHCiecgivJxr=98s7s7-C8SkXwrCfHXg@mail.gmail.com
David Rowley [Tue, 5 Nov 2024 20:16:00 +0000 (09:16 +1300)] Fix hypothetical bug in ExprState building for hashing
adf97c156 gave ExprStates the ability to hash expressions and return a
single hash value. That commit supports seeding the hash value with an
initial value to have that blended into the final hash value.
Here we fix a hypothetical bug where if there are zero expressions to
hash, the initial value is stored in the wrong location. The existing
code stored the initial value in an intermediate location expecting that
when the expressions were hashed that those steps would store the final
hash value in the ExprState.resvalue field. However, that wouldn't happen
when there are zero expressions to hash. The correct thing to do instead
is to have a special case for zero expressions and when we hit that case,
store the initial value directly in the ExprState.resvalue. The reason
that this is a hypothetical bug is that no code currently calls
ExecBuildHash32Expr passing a non-zero initial value.
Discussion: https://postgr.es/m/CAApHDvpMAL_zxbMRr1LOex3O7Y7R7ZN2i8iUFLQhqQiJMAg3qw@mail.gmail.com
Daniel Gustafsson [Tue, 5 Nov 2024 12:56:02 +0000 (13:56 +0100)] Add a Git .mailmap file
This adds a Git .mailmap to unify spellings of committer names.
Discussion: https://postgr.es/m/
76773029-A7AD-4BAF-AFC2-
E511D26E866D@yesql.se
Heikki Linnakangas [Tue, 5 Nov 2024 10:25:25 +0000 (12:25 +0200)] Silence meson warning about PG_TEST_EXTRA in src/Makefile.global.in
Commit
99b937a44f introduced this warning when you run "meson setup":
Configuring Makefile.global using configuration
../src/meson.build:31: WARNING: The variable(s) 'PG_TEST_EXTRA' in the input file 'src/Makefile.global.in' are not present in the given configuration data.
To fix, add PG_TEST_EXTRA to the list of variables that are not needed
in the makefiles generated by meson. In meson builds, the makefiles
are only used for PGXS, not for building or testing the server itself.
Reported-by: Peter EisentrautDiscussion: https://www.postgresql.org/message-id/
5c380997-e270-425a-9542-
e4ef36a285de@eisentraut.org
Michael Paquier [Tue, 5 Nov 2024 00:39:43 +0000 (09:39 +0900)] Clear padding of PgStat_HashKey when handling pgstats entries
PgStat_HashKey is currently initialized in a way that could result in
random data if the structure has any padding bytes. The structure
has no padding bytes currently, fortunately, but it could become a
problem should the structure change at some point in the future.
The code is changed to use some memset(0) so as any padding would be
handled properly, as it would be surprising to see random failures in
the pgstats entry lookups. PgStat_HashKey is a structure internal to
pgstats, and an ABI change could be possible in the scope of a bug fix,
so back down to 15 where this has been introduced.
Author: Bertrand Drouvot
Reviewed-by: Jelte Fennema-Nio, Michael PaquierDiscussion: https://postgr.es/m/
[email protected]Back-through: 15
Tom Lane [Mon, 4 Nov 2024 23:08:48 +0000 (18:08 -0500)] Use portable diff options in pg_bsd_indent's regression test.
We had been using "diff -upd", which evidently works for most people,
but Solaris's diff doesn't like it. (We'd not noticed because the
Solaris buildfarm animals weren't running this test until they were
upgraded to the latest buildfarm client script.) Change to "diff -U3"
which is what pg_regress has used for ages.
Per buildfarm (and off-list discussion with Noah Misch).
Back- to v16 where this test was added. In v16,
also back- the relevant part of
628c1d1f2 so that
the test script looks about the same in all branches.
Alexander Korotkov [Mon, 4 Nov 2024 20:43:08 +0000 (22:43 +0200)] Revert pg_wal_replay_wait() stored procedure
This commit reverts
3c5db1d6b0, and subsequent improvements and fixes
including
8036d73ae3,
867d396ccd,
3ac3ec580c,
0868d7ae70,
85b98b8d5a,
2520226c95,
014f9f34d2,
e658038772,
e1555645d7,
5035172e4a,
6cfebfe88b,
73da6b8d1b, and
e546989a26.
The reason for reverting is a set of remaining issues. Most notably, the
stored procedure appears to need more effort than the utility statement
to turn the backend into a "snapshot-less" state. This makes an approach
to use stored procedures questionable.
Catversion is bumped.
Discussion: https://postgr.es/m/Zyhj2anOPRKtb0xW%40paquier.xyz
Bruce Momjian [Mon, 4 Nov 2024 20:07:38 +0000 (15:07 -0500)] doc: use more accurate URL for bug reporting
Reported-by: [email protected]Discussion: https://postgr.es/m/
172947609746.699.
14488791149769110078@wrigleys.postgresql.org
Back-through: master
Tom Lane [Mon, 4 Nov 2024 19:36:04 +0000 (14:36 -0500)] pg_basebackup, pg_receivewal: fix failure to find password in ~/.pgpass.
Sloppy refactoring in commit
cca97ce6a caused these programs
to pass dbname = NULL to libpq if there was no "--dbname" switch
on the command line, where before "replication" would be passed.
This didn't break things completely, because the source server doesn't
care about the dbname specified for a physical replication connection.
However, it did cause libpq to fail to match a ~/.pgpass entry that
has "replication" in the dbname field. Restore the previous behavior
of passing "replication".
Also, closer inspection shows that if you do specify a dbname
in the connection string, that is what will be matched to ~/.pgpass,
not "replication". This was the pre-existing behavior so we should
not change it, but the SGML docs were pretty misleading about it.
Improve that.
Per bug #18685 from Toshi Harada. Back- to v17 where the
error crept in.
Discussion: https://postgr.es/m/18685-
fee2dd142b9688f1@postgresql.org
Discussion: https://postgr.es/m/
2702546.
1730740456@sss.pgh.pa.us
Bruce Momjian [Mon, 4 Nov 2024 19:10:11 +0000 (14:10 -0500)] doc: remove check of SVG files, since they are derived
revert of change from commit
641a5b7a144Reported-by: Peter EisentrautDiscussion: https://postgr.es/m/
2c5dd601-b245-4092-9c27-
6d1ad51609df@eisentraut.org
Back-through: master
Tom Lane [Mon, 4 Nov 2024 18:30:30 +0000 (13:30 -0500)] pg_dump: provide a stable sort order for rules.
Previously, we sorted rules by schema name and then rule name;
if that wasn't unique, we sorted by rule OID. This can be
problematic for comparing dumps from databases with different
histories, especially since certain rule names like "_RETURN"
are very common. Let's make the sort key schema name, rule name,
table name, which should be unique. (This is the same behavior
we've long used for triggers and RLS policies.)
Andreas Karlsson
Discussion: https://postgr.es/m/
b4e468d8-0cd6-42e6-ac8a-
1d6afa6e0cf1@proxel.se
Masahiko Sawada [Mon, 4 Nov 2024 18:21:59 +0000 (10:21 -0800)] Fix typo in comment of gistdoinsert().
Author: Tender Wang
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAHewXN%3D3sH2sNw4nC3QGCEVw1Lftmw9m5y1Xje0bXK6ApDrsPQ%40mail.gmail.com
Peter Geoghegan [Mon, 4 Nov 2024 17:43:54 +0000 (12:43 -0500)] Fix obsolete _bt_first comments.
_bt_first doesn't necessarily hold onto a buffer pin on success exit.
Fix header comments that claimed that we'll always hold onto a pin.
Oversight in commit
2ed5b87f96.
Heikki Linnakangas [Mon, 4 Nov 2024 16:28:40 +0000 (18:28 +0200)] docs: Consistently use <optional> to indicate optional parameters
Some functions were using square brackets instead, replace them all
with <optional>.
Author: Dagfinn Ilmari Mannsåker <
[email protected]>
Reviewed-by: jian he <[email protected]>Discussion: https://www.postgresql.org/message-id/flat/CACJufxFfUbSph5UUSsZbL4SitbuPuW%
[email protected]Peter Geoghegan [Mon, 4 Nov 2024 16:04:30 +0000 (11:04 -0500)] nbtree: Remove useless 'strat' local variable.
Remove a local variable that was used to avoid overwriting strat_total
with the = operator strategy when a >= operator strategy key was already
included in the initial positioning/insertion scan keys by _bt_first
(for backwards scans it would have to be a <= key that was included).
_bt_first's strat_total local variable now simply tracks the operator
strategy of the final scan key that was included in the scan's insertion
scan key (barring the case where the !used_all_subkeys row compare path
adjusts strat_total in its own way).
_bt_first already treated >= keys (or <= keys) as = keys for initial
positioning purposes. There is no good reason to remember that that was
what happened; no later _bt_first step cares about the distinction.
Note, in particular, that the insertion scan key's 'nextkey' and
'backward' fields will be initialized the same way regardless.
Author: Peter Geoghegan <
[email protected]>
Reviewed-By: Tomas Vondra <[email protected]>Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
Heikki Linnakangas [Mon, 4 Nov 2024 15:59:24 +0000 (17:59 +0200)] Split ProcSleep function into JoinWaitQueue and ProcSleep
Split ProcSleep into two functions: JoinWaitQueue and ProcSleep.
JoinWaitQueue is called while holding the partition lock, and inserts
the current process to the wait queue, while ProcSleep() does the
actual sleeping. ProcSleep() is now called without holding the
partition lock, and it no longer re-acquires the partition lock before
returning. That makes the wakeup a little cheaper. Once upon a time,
re-acquiring the partition lock was needed to prevent a signal handler
from longjmping out at a bad time, but these days our signal handlers
just set flags, and longjmping can only happen at points where we
explicitly run CHECK_FOR_INTERRUPTS().
If JoinWaitQueue detects an "early deadlock" before even joining the
wait queue, it returns without changing the shared lock entry, leaving
the cleanup of the shared lock entry to the caller. This makes the
handling of an early deadlock the same as the dontWait=true case.
One small user-visible side-effect of this refactoring is that we now
only set the 'ps' title to say "waiting" when we actually enter the
sleep, not when the lock is skipped because dontWait=true, or when a
deadlock is detected early before entering the sleep.
This eliminates the 'lockAwaited' global variable in proc.c, which was
largely redundant with 'awaitedLock' in lock.c
Note: Updating the local lock table is now the caller's responsibility.
JoinWaitQueue and ProcSleep are now only responsible for modifying the
shared state. Seems a little nicer that way.
Based on Thomas Munro's earlier and observation that ProcSleep
doesn't really need to re-acquire the partition lock.
Reviewed-by: Maxim OrlovDiscussion: https://www.postgresql.org/message-id/
7c2090cd-a72a-4e34-afaa-
6dd2ef31440e@iki.fi