Just a Theory

Trans rights are human rights

Mini Summit: Universally Buildable Extensions

Well that snuck up on me. Tomorrow, May 1, 2024, Yurii Rashkovskii of Omnigres will be giving a presentation at the fifth installment of the Postgres extension ecosystem mini-summit. The talk, “Universally buildable extensions: dev to prod”, should provoke some interesting asymmetrical thinking and discussion. I hope to see you there.

Note! If you reserved a spot at a prior mini-summit, you will need to do so again for each subsequent event or get no reminders from Eventbrite. If it’s sold out, just email david@ this domain, ping me on Mastodon, or reach out via the #extensions channel on the Postgres Slack or the Postgres Discord for a link or, if you’d like, a calendar invitation.

Mini Summit Four

My thanks to Jonathan Katz for his presentation, “Trusted Language Extensions for PostgreSQL”, at last week’s Postgres Extension Ecosystem Mini-Summit. As usual I’ve collected the transcript here interspersed with comments from the chat window. First, links!

And now, rivers of text!

Introduction

Presentation

  • Thank you for having me. I’m very excited to talk about this, and extensions in general. I see a lot of folks here I know and recognize and some new folks or folks I’ve not met in person yet.

  • This is borrowed from the original presentation on TLEs from November of 2022, to level set on why we built it. I know this is a slightly more advanced audience, so some of this stuff might seem relatively introductory to some of you, though there is some material on the internals of extensions.

  • The premise is why we decided to build TLEs, what were the problems we were trying to solve. Part of it is just understanding Postgres extensions. In general this group is very familiar with extensions, but there are two points I want to hit on.

  • One of the things that excites me most about Postgres is that, when you look back at Postgres as the Berkeley database project researching how to create an object-relational database, an accidental or intentional feature is not just that Postgres is an object-relational database, but that Postgres is an extensible database, built from the get-go to be able to add functionality without necessarily having to fork it.

  • Early on you’d have to fork the database to add additional functionality, but the beauty of the Postgres design was the ability to keep adding functionality without forking.

  • It did require making changes to Postgres to further enhance that capability, which we’ll discuss in a bit, but that’s a really powerful concept.

  • The second point is that there is a large landscape of both open source and commercial extensions. I think a lot of folks in this group are very familiar with the available open source extensions, but there are entire businesses built on, effectively, commercial extensions on top of Postgres. Again, that’s a really powerful notion!

  • It’s kind of like the Postgres economy: we created something that’s so special that it’s able to spawn all these different businesses, whether it’s building things on top of Postgres or using Postgres as the heart of their business. Extensions have a very large role to play in that.

  • Which gets us to the history of extensions. The syntax for Postgres extensions has been around for over a decade, since Postgres 9.1. I know there are folks here well familiar with building extensions prior to that syntax! But we’re very grateful for the extension syntax because, as a recovering application developer, I would say it made things a lot easier.

  • Instead of having to futz around the file system to figure out where to install everything (wink wink nudge nudge on the topic today), you had a very simple syntax. Once an extension is installed, I can CREATE EXTENSION postgis (or whatever extension) and there you go! I have PostGIS installed.

  • Again, that’s really cool! Because anything we can do to make it simpler to install and use extensions further drives their adoption, and ultimately makes it even easier to develop and build applications with Postgres and continues to drive that forward.

  • So what can you build with Postgres, what extensions are available? It’s a whole range of things. For starters, there are extensions that I call “the ones that you take for granted”. If you’re using any monitoring tool, you’re likely running pg_stat_statements and you may not even realize it. It helps to aggregate statistics as queries execute and bubbles them up to whatever monitoring tool you use. It’s a great tool for performance tuning.
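
    For a rough sense of the usual pattern, here is a hedged sketch (pg_stat_statements must be in shared_preload_libraries, and the column names shown are those of Postgres 13 and later; older versions use total_time and mean_time):

      -- make the view available in this database
      CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

      -- the five statements consuming the most total execution time
      SELECT query, calls, mean_exec_time
        FROM pg_stat_statements
       ORDER BY total_exec_time DESC
       LIMIT 5;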

  • The example I like to give from my personal life was that, back when I was an application developer trying to manage my own databases, I had some query that was blocking my entire logical decoding system, so we weren’t keeping up with transactions. Looking in pg_stat_statements, I saw a recursive query where I should have had a UNION with SELECT DISTINCT instead of a gigantic query that was amassing so many rows. Fixed it: I had a 100x speedup in the query. Thank you pg_stat_statements!

  • Side note: I got to say, “hey, I sped up this query by 100x,” even though it was my fault it was slow to begin with.

  • There are utility functions to help with data types. uuid-ossp is very widely used. Newer versions of Postgres have a random UUID function, but previously, anytime you needed a UUID you would CREATE EXTENSION "uuid-ossp".

  • The other fun thing about this extension is that developers learned about SQL identifiers that require double quotes to install the extension.
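
    For example (both calls are standard; gen_random_uuid() is built in as of Postgres 13):

      -- the double quotes are required because of the hyphen in the name
      CREATE EXTENSION "uuid-ossp";
      SELECT uuid_generate_v4();

      -- on Postgres 13 and later the built-in function needs no extension
      SELECT gen_random_uuid();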

  • I think I saw Keith on here today. pg_partman! What’s really cool about pg_partman is that a lot of it is PL/pgSQL. This PL/pgSQL code provides a way to manage partitions across all your tables in your entire database. Again, that’s really powerful because Postgres added declarative partitioning in version 10, but pg_partman is still incredibly useful because there are all sorts of aspects of partition management not supported in Postgres today. This is another example where Postgres provides the core functionality and you can use the extension to package additional functionality that makes it easier to use.
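
    A hedged sketch of what that looks like, using a pg_partman 4.x-style create_parent call (parameter names and defaults differ across pg_partman releases, so treat this as illustrative only):

      -- the parent must already be declared partitioned for "native" partitioning
      CREATE TABLE public.events (
          id         bigint,
          created_at timestamptz NOT NULL
      ) PARTITION BY RANGE (created_at);

      SELECT partman.create_parent(
          p_parent_table := 'public.events',
          p_control      := 'created_at',
          p_type         := 'native',
          p_interval     := 'daily'
      );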

  • Foreign data wrappers. Postgres has a whole interface for talking to other databases. It could be other Postgres databases, other relational databases, non-relational databases, file systems, etc. Postgres has a C-level interface that allows you to design the calls to optimally use all these different databases. Again, it’s packaged up as an extension, so we’re able to keep adding things as we go on.
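
    The canonical example is postgres_fdw (the host, database, schema, and credentials here are placeholders):

      CREATE EXTENSION postgres_fdw;

      CREATE SERVER other_pg
          FOREIGN DATA WRAPPER postgres_fdw
          OPTIONS (host 'other.example.com', dbname 'appdb');

      CREATE USER MAPPING FOR CURRENT_USER
          SERVER other_pg
          OPTIONS (user 'app', password 'secret');

      CREATE SCHEMA remote;
      IMPORT FOREIGN SCHEMA public FROM SERVER other_pg INTO remote;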

  • I’ll wait till the end to answer questions. This will be a relatively short presentation, so we should have some time for discussion.

  • Last but not least, a little bit on PostGIS. I think this is one of the most powerful aspects of Postgres. PostGIS itself is incredibly powerful because you have a geospatial database that happens to be Postgres underneath. A lot of heavy PostGIS users don’t even realize they’re using Postgres! They think they’re using PostGIS. That is really the power of Postgres extensibility in a nutshell: it looks like you have a brand new, domain-specific database — and yet underneath it’s just boring old Postgres doing all the things you expect a database to do. That is also a very powerful notion.

    • Tobias Bussmann in chat: Many PostGIS users don’t know they are using PostgreSQL 🤣

    • nils in chat: 🎉

  • To add a coda to it, you have pgRouting, an extension built on top of PostGIS, which is built on top of Postgres. So you have a cascading effect of extensions building on top of extensions building on top of Postgres.

  • So we’re supposed to talk about trusted language extensions. To really get to TLEs, it’s important to understand the anatomy of an extension. There are certain things that you need in order to have an extension: You need a control file, which is kind of like your packaging manifest. It tells you what’s in the extension. It goes into a directory.

  • You have SQL files, which effectively map out the objects that you’re going to have in your database. If you have functions that need to map to a C function, or if you need to create a table access method in order to build your new storage layer, the SQL files are the building block.

  • If you have C-specific code, it goes in a library file or a shared object file that gets stored in a library directory.

  • It’s a very simple layout. What’s cool is if you go to create an extension, there’s a particular pattern that it forms: You need to know that when you have an extension, the information in the control file goes into the Postgres catalog. Then, if there are any functions or objects or whatever in that extension, we need to install the object itself, but we also need to make sure that there’s a dependency on the extension. That way, if we need to remove the extension or upgrade it, we know all the objects that we’ve collected.
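
    A hypothetical minimal extension, call it my_ext, makes that layout concrete (the name and function are made up for illustration):

      -- my_ext.control, the manifest, installed to SHAREDIR/extension/:
      --   comment         = 'a tiny example extension'
      --   default_version = '1.0'
      --   relocatable     = true

      -- my_ext--1.0.sql, the objects, also in SHAREDIR/extension/:
      CREATE FUNCTION my_ext_hello() RETURNS text
          LANGUAGE sql AS $$ SELECT 'hello from my_ext' $$;

      -- once both files are on the file system:
      CREATE EXTENSION my_ext;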

  • So why this “extension building 101”? This gets at the heart of why we built TLEs. Because the first thing to notice is that we install extensions directly on the file system. There are a lot of domains where that’s restricted — certainly managed service providers.

  • I worked a lot with containers previously, and a container is effectively an immutable file system: once you have things installed on it it’s installed. You typically don’t want to give your app developers access to your production systems, because your app developers are going to install everything under the sun on them, myself included. You certainly want to be able to restrict certain domains.

  • But we also don’t want to inhibit what developers want to build. We want to make it as easy as possible for them to manage their code and be able to install in different environments. That gets to another point beyond restricting the file system. Some extensions may not be universally available, depending on where you’re running them. You might be running an extension on an on-premise environment that might not work in a managed service provider. Or different managed service providers have different extensions available.

  • The final bit — and something that I’ve definitely personally experienced — is that, between major versions of Postgres, the API or the ABI will change. These are the interface points at the C layer. When they change it can break extensions. Ask any of the package managers how much they need to nag people to upgrade their extensions: they always want to make sure that they’re keeping it up-to-date, and ensuring that it’s compatible.

  • But this can also lead to other issues, because as the end user, this makes it challenging to perform major version upgrades — particularly if I’m dependent on an extension that hasn’t been updated to work with the latest version of Postgres. A subtle line of code change in Postgres could end up breaking an extension.

  • Quick story: that actually happened to me while I was managing pg_tle. I think it was a change in Postgres 15.1 that actually broke something in the pg_tle extension, and I had to fix it. I think that’s part of the point: if you’re able to use a language that’s on top of C, and you have the C layer abstracted away, in theory it could make it easier to perform major version upgrades.

  • That leads into TLE.

  • I think there were two notions behind trusted language extensions, or TLEs, when they were initially designed. The first is giving power to app developers to be able to build extensions. It’s actually one thing I noticed as I started making the journey from app developer to quasi-DBA to, ultimately, product manager not writing code. Part of that journey was the power of putting some of my business logic in the database.

  • There’s always this tension between how much business logic to put in application code versus the database. But there are certain things that were just clear wins for me when they were in the database. The first was a lot of search functions I wrote, where filtering data down to a very small set in the database and returning it to the application would save on network time, processing time on the app side, etc. There were some very clear wins by encapsulating them in functions.

  • But also solving things that were just much easier to solve in the database. Having specific data types that solve a particular problem — geospatial extensions keep coming to mind, pgvector dealing with vector data — and being able to store it in a database without having to delegate it out to an application is certainly a clear win.

  • The other thing was installing extensions. I think the notion of portability is very powerful. If I have a way to manage my extensions from a SQL interface, it makes it much easier to move them between different systems.

  • Now, we do need to be careful as soon as we start saying “SQL interface”. I don’t want to suggest that we should have a SQL interface to ship C code everywhere. We know there are some challenges with C code. C is highly performant, and you can effectively build anything under the sun using C, but it’s not memory-safe, and it’s very easy — if you’re not familiar with what you’re doing, and even if you are familiar with what you’re doing! — to make mistakes that could lead to crashes or possibly worse.

  • As we were thinking about all this with TLE, there were three things. First, we need an interface to be able to install and manage extension code effectively, regardless of environment. We need a SQL interface to do that. We also need to make sure there’s an appropriate trust boundary. Now, Postgres provides a trust boundary with the notion of a trusted language. But there are other things we need to build with trust, as well.

  • For example, you might not want everyone in your environment to be able to install a TLE, so we need to make sure there are appropriate access controls there.

  • Finally, we need a way to package it up — which I think is what we’re going to talk about at the Extension Summit.

  • If there are any takeaways from why we built TLE (I think this is the slide that encapsulates it), it’s that using Postgres’s built-in trusted language interface allows you to write extension code in languages that we know are going to respect Postgres security boundaries.

  • Postgres has this definition of a trusted language which, if you look for it, you effectively have to dance around the documentation to find.

  • But effectively I’d summarize it as: if you allow an unprivileged user to write code in a trusted language, they can’t do anything to escalate their privileges, access the file system directly, or do anything that would violate Postgres’s security boundary.

  • It’s a pretty good definition. Arguably, the easiest way to violate that definition is for you, as the database administrator, to mark an untrusted language as trusted in the catalog. But I strongly advise against doing that!
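
    The catalog makes that distinction easy to inspect:

      -- languages installed in the current database and whether each is trusted
      SELECT lanname, lanpltrusted FROM pg_language ORDER BY lanname;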

  • What trusted languages are available? There’s a great page called the “PL Matrix” on the Postgres wiki that shows the status of all the known PLs in Postgres and whether they’re trusted or not. I suggest looking at that.

    • David Wheeler (he/him) in chat: https://wiki.postgresql.org/wiki/PL_Matrix

    • Jeremy S in chat: Linux had kernel modules forever, but many people were rightfully hesitant because a kernel module could easily crash your entire system. One of the reasons eBPF is exploding today is because it’s verified and safe and enables code (like entire networking apps) to run directly in the linux kernel.

      I see TLE similarly

  • A big thing regarding a trusted language is performance. There are a variety of trusted languages, and they all have different properties around them. The ones I’m showing today are the ones available in RDS Postgres. But the reason I want to show them is that part of the idea of trusted language extensions is allowing app developers who may be less familiar with C to write extension code and access some of the same internals as a C extension, but from one of these languages.

  • Here are some of the known trusted languages today that work with TLE. If you’re using the TLE open source project, you can use any available trusted language — or you can use untrusted languages and just use the TLE packaging mechanism. In that case you lose the trusted part, but gain the extension installation aspect of TLE.

  • There are a few things included in TLE to make sure that TLE can be installed safely. It is an opt-in feature. We do have a shared preload library for pg_tle called “pg underscore TLE”, and you do need to have your database superuser install pg_tle initially. This ensures that we’re respecting your security boundary: if you’re going to use trusted language extensions, you do have an explicit opt-in to using it.

  • After that, an app developer can create their own trusted language extension.

  • Here’s a simple example from the TLE announcement with two functions packaged into an extension you can install. You can give it a name like any Postgres extension; this one is called “tle_test”. The code looks like the SQL file in any extension. And it’s effectively packaged up like an extension using the pgtle.install_extension command. If you go to the pg_tle GitHub project you can see the different interface points.

  • Once it’s installed you can use CREATE EXTENSION like any other extension: it follows all the usual Postgres semantics: extension installation, uninstallation, software life cycle management. pg_tle has its own interface for that functionality, but once you’ve installed it, managing the extension is just like managing any other Postgres extension, and follows those conventions.
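
    A minimal sketch of that flow, assuming the pgtle.install_extension(name, version, description, ext) interface described in the pg_tle documentation (the function here is made up for illustration):

      SELECT pgtle.install_extension(
          'tle_test',
          '0.1',
          'A tiny example TLE',
      $_pgtle_$
          CREATE FUNCTION add_one(a int) RETURNS int
              LANGUAGE sql AS $$ SELECT a + 1 $$;
      $_pgtle_$
      );

      CREATE EXTENSION tle_test;
      SELECT add_one(41);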

  • Effectively TLE is offering, loosely, a packaging mechanism (I think packaging has a lot more connotations): it’s a grouping mechanism for your code. One of the parts that I always found most useful in pg_tle was this part: effectively versioning my stored procedures.

  • When I talked about the example where I was putting business logic into the database, one part I would usually mess up is: what version of my stored procedures was running in a given database. Particularly if you have hundreds or thousands of databases that you’re managing, that can be a challenge.

  • Now, I had far fewer databases to manage; I was mostly dealing with our developer environments: staging and production. But I was managing the stored procedures within our migration scripts — which is totally fine, because if I know what version of the migration I ran, then I know what version of the stored procedures is on that database. Kind of. Sort of. Maybe. You know: unless someone manually modified it — in which case shame on me for giving access to the database. But there basically could be some gaps in knowing what version of a stored procedure was on a particular server.

  • With pg_tle we can significantly reduce the risk of that problem, because we have a way to version our stored procedures, know exactly what we’re running at any given time, and create a consistent packaging mechanism wherever we’re running our code. And it goes beyond stored procedures, because there’s far more that you can build with your code.
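
    A hedged sketch of the versioning flow, assuming the pgtle.install_update_path and pgtle.set_default_version functions documented in pg_tle, and continuing the tle_test example above:

      -- publish 0.2 as an upgrade path from 0.1, then make it the default
      SELECT pgtle.install_update_path('tle_test', '0.1', '0.2',
      $_pgtle_$
          CREATE OR REPLACE FUNCTION add_one(a int) RETURNS int
              LANGUAGE sql AS $$ SELECT a + 1 $$;
      $_pgtle_$
      );
      SELECT pgtle.set_default_version('tle_test', '0.2');

      -- existing databases upgrade with the normal extension machinery
      ALTER EXTENSION tle_test UPDATE TO '0.2';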

  • What else does TLE add? We discussed the packaging mechanism, but a lot of the power of Postgres extensions is the ability to use the underlying Postgres internals. One of these types of internals is called a “hook”.

    Hooks are the Postgres feature that you’ve never heard of, that are not well documented, and yet are a foundational part of many extensions. Hooks are almost everywhere in Postgres. You particularly see a lot of them during the query execution process. For example, the process utility hook allows you to modify any utility command — anything that’s not a direct SQL statement. There are all sorts of hooks: there are password check hooks, client authentication hooks, hooks called around shared memory allocation, hooks called at each step of the execution phase.

  • Hooks are very powerful, particularly for enabling a lot of extensions to add different semantic behavior to Postgres. We could probably do a whole series of talks just on all the different ways you can extend Postgres. I mean, that’s why David has organized the summit! But hooks are, very simply, a powerful mechanism to define behavior in Postgres.

  • Because they’re so powerful, for the hooks that we expose in TLE we make sure that there is a superuser opt-in. Remember, an unprivileged user can define this behavior, but you do need someone with privilege to be able to enable something like a hook.

  • For example, a password check hook probably means that you have the ability to evaluate a plain text password that’s coming through. Now, on that topic we could have a very long debate, but let’s save that for Vancouver. But with this hook you do have the ability to do password checks, so you want to make sure that, when you enable a function calling a password check hook, there’s a certain level of privilege to that function. Or you want to make sure you do your appropriate evaluation to make sure that you trust that function.

  • In addition to that check, there’s an additional check from the pg_tle admin role that requires someone with administrative privileges over your TLE to register that hook. The concept of “TLE features” is the way to map hooks into the TLE. We’ve been building it up since we launched TLE by adding a few hooks. There’s both the check password hook and the client authentication hook.

  • There’s also the ability to register custom data types — which is pretty cool, because data types are what attracted me to Postgres when I was an app developer: “Oh! There are all these data types! I can do all these rich comparisons against an index? Cool! Oh wait, you can even add custom data types? That’s even cooler!”

  • TLE allows you to create the base data type, so you can really expand the data types that you’re able to add. This is what TLE features does: it enables that safe mapping between trusted language code and the Postgres C internals.

  • In order to create a hook, you need to match the hook function definition. The TLE documentation describes how to create it appropriately, but it doesn’t need all the parameters that you would find in the hook function.

  • In this check password hook — I call this the “delay check password test” — you’re probably trying to prevent someone from guessing your password repeatedly, and if they keep failing, so what, because they’re not going to brute force it anyway. There are actually more practical examples of check password hooks. But what’s cool is that you can define everything around your hook behavior from within the hook function, and then it acts as if you wrote a C-based hook! You just happen to write it in a trusted language.
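
    A hedged sketch of the shape of such a function, assuming the passcheck signature and pgtle.register_feature call described in the pg_tle documentation (the policy itself is made up; enabling it also requires the pgtle.enable_password_check setting and the superuser opt-in discussed below):

      CREATE FUNCTION block_short_passwords(
          username      text,
          password      text,
          password_type pgtle.password_types,
          valid_until   timestamptz,
          valid_null    boolean
      ) RETURNS void LANGUAGE plpgsql AS $$
      BEGIN
          IF length(password) < 12 THEN
              RAISE EXCEPTION 'password must be at least 12 characters';
          END IF;
      END;
      $$;

      SELECT pgtle.register_feature('block_short_passwords', 'passcheck');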

  • Hooks do execute with elevated privileges, so particularly around authentication you want to be very careful. There are some safeguards built into TLE to make sure that you only enable hooks when you want to.

  • Last but not least: choosing a trusted language. I know this group is more focused on extension building, but I do want to talk about what an app developer goes through when choosing a trusted language.

  • Because everything has its trade-offs to consider. The golden rule (I actually took this from Jim Mlodgensky) is: when in doubt, use PL/pgSQL, because it has a lot of access to context that’s already available in Postgres. What’s interesting is that what we see today is based on PL/SQL: PL/pgSQL was developed to make it simpler to migrate from Oracle, but at the same time to provide a lot of rich functionality around Postgres.

  • As someone much more familiar with Ruby and Python, I can tell you that PL/pgSQL can be a little bit quirky. But it is very well documented, and it can solve all the problems that you need to in Postgres. And it already has a lot of very simple ways to directly access your data from Postgres. Certainly an easy choice to go with.

  • But wait, there’s more!

  • Like PL/v8 — writing JavaScript in your database — this is really cool! I remember when it came out and how mind-blowing it was, in particular for JSON processing. PL/v8 is awesome. PL/v8 came out right around the same time as the document database! So you kind of had a perfect storm of being able to process JSON and write it in JavaScript, all within your Postgres database, and it could be quite powerful.

  • Another really cool feature of PL/v8 is the ability to directly call another function or another PL/v8 function from within PL/v8, and not have to go through Postgres function processing, which adds a lot of additional overhead.

  • And now the one that’s all abuzz right now: PL/Rust. Being able to write and execute Rust code within Postgres. This is pretty cool, because Rust is a compiled language! There’s a trusted way to run PL/Rust within Postgres. There are a few techniques to do it. First, whenever you’re running Rust on your server, you want to make sure that you’re guarding against breakouts.

  • There is a library — I believe it’s called postgrestd — that effectively compiles out some of the less trusted parts of Rust, such as unsafe function calls. But you can still get everything that you want in PL/Rust today: you get the Rust standard library and the ability to run crates — and you do want to evaluate crates to make sure that you’re comfortable running them in your environment. But then you get this compiled language that is CPU efficient, memory efficient, and memory safe. (Well, a lot of Rust is memory safe.) It’s pretty cool!
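
    A hedged sketch of what a PL/Rust function looks like (with STRICT the arguments arrive as plain Rust types and the body evaluates to Ok(Some(value)); the function itself is made up for illustration):

      CREATE EXTENSION plrust;

      CREATE FUNCTION plrust_add(a int, b int) RETURNS int
          LANGUAGE plrust STRICT
      AS $$
          // the body is ordinary Rust
          Ok(Some(a + b))
      $$;

      SELECT plrust_add(1, 2);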

    • Steven Miller in chat: In PL/Rust, does it run the compilation when the function is created? Then if there is a compiler issue it just shows up right there?

  • I wrote a blog post last year that compared some different function calls between PL/pgSQL, PL/v8, and PL/Rust. First I was doing some array processing, and you could see that the PL/Rust calls were very comparable to the C calls. And then there was some additional vector processing, given that I’ve been obsessing over vectors for the past 14 months. Seeing Rust actually win against PL/pgSQL and PL/v8 (I don’t remember the numbers off the top of my head; I can look up that blog as soon as I switch windows). Pretty cool!

  • This brings us in some ways to the best of all worlds, because I can take an extension that normally I would write in C, particularly because I’m focused on performance, write it in PL/Rust instead, package it as a trusted language extension, and run it anywhere that TLE and PL/Rust are supported. Again, that is very powerful, because suddenly I have what I hope is the best of all worlds: I have this portability, and I don’t have to worry as much about major version upgrades, because pg_tle is acting as that abstraction layer between the Postgres C code and the application code that I’m writing.

    • Jeremy S in chat: Versioning of stored procedures is a very interesting use case

    • Darren Baldwin in chat: Agreed! Moving stuff to the database layer seems to be something very foreign and “scary” to most app devs I’ve talked to

    • Anup Sharma in chat: Is TLE a requirement for any PostgreSQL extension, or is it dependent?

    • Steven Miller in chat: So during a major version upgrade, the function declaration stays the same, so that’s why your application doesn’t need to change with respect to the extensions during a major version upgrade. And at some point during the migration, you create the function again, which recompiles. So it all works the same! That’s great

  • Last slide, then I’m certainly looking forward to discussion. pg_tle is open source, and it’s open source for a lot of reasons. A lot of it is because we want to make sure that trusted language extensions are as portable as possible. But in some ways the ideas behind TLE are not original. If you look at other databases, there is this notion of, let’s call it inline extensions, or inline SQL — or call them modules, or call them packages. But the idea is that I can take reusable chunks of code, package them together, and have them run anywhere. It doesn’t matter where the database is located or hosted.

  • This is something that I personally want to work with folks on figuring out how we can make possible in Postgres. Because even in Postgres this is not an original idea: Dimitri Fontaine was talking about this as far back as 2012 in terms of his vision of where the extension framework was going.

  • What I’m looking forward to about this Extension Summit — and hopefully I’m not in conflicting meetings while it’s going on — is talking about how we can allow app developers to leverage all the great parts of Postgres around function writing, function building, and ultimately packaging these functions, and making it simpler for them to be able to move it wherever their applications are running.

  • So it is open source, open to feedback, and under active development; we continue to add more features to support Postgres. Ultimately we want to hear what will make it easier for extension writers to be able to use TLE, both as a packaging mechanism and as a development mechanism.

  • So with that, that is the end of my slides, and I’m happy to get into a discussion about this.

Discussion

  • David Wheeler (he/him): Awesome, thank you Jonathan. There was one question about PL/Rust in the comments. Steven asks whether it compiles when you create the function, so if there are compiler issues they show up there.

  • Jonathan Katz: Correct. It compiles when you create the function, and that’s where you’ll get compile errors. I have definitely received my fair share of those [chuckles]. There is a Discord. PL/Rust is developed principally by the folks responsible for the pgrx project, the folks at ZomboDB, and they were super helpful in debugging all of my really poor Rust code.

  • David Wheeler (he/him): While people are thinking about their questions I’ll just jump in here. You mentioned using crates with PL/Rust. How does that work with pg_tle, since they have to be loaded from somewhere?

  • Jonathan Katz: That’s a good question. I kind of call it shifting the problem. TLE solves one problem in that you don’t necessarily need to have everything installed on your local file system outside of pg_tle itself. If you’re using PL/Rust and you need crates, you do need those crates available either within your file system or within whatever package management tools you’re using. So it shifts the problem. I think it’s going to be a good discussion, about what we can do to help ensure that there is a trusted way of loading those.

  • David Wheeler (he/him): Yeah I wonder if they could be vendored and then just included in the upload through the function call.

    Anup Sharma asked if pg_tle is a requirement for any extension or is it dependent.

  • Jonathan Katz: It’s not a requirement. This is a project that is making it possible to write Postgres extensions in trusted languages. There are plenty of extension authors on this call who have written very, very, very good extensions in C that do not use TLE.

  • David Wheeler (he/him): You can use trusted languages to write extensions without TLE as well. It’s just a way of getting it into the database without access to the file system, right?

  • Jonathan Katz: Correct. I think I saw Keith here. pg_partman is PL/pgSQL.

    • Anup Sharma in chat: Understood. Thanks

    • Tobias Bussmann in chat: I think it is important not to confuse Trusted Language Extensions TLE with “trusted extensions” which is a feature of Postgres 13

    • Keith Fiske in chat: Pretty much all of it is. Just the background worker isn’t

    • Jonathan Katz in chat: That’s what I thought but didn’t want to misspeak 🙂

  • David Wheeler (he/him): Right. Any other questions or comments, or any implications that you’re thinking through for extension distribution, extension packaging, extension development?

    • Steven Miller in chat: Is background worker the main thing that a TLE could not do in comparison to traditional extensions?

  • Jason Petersen: The crates thing kind of raised my interest. I don’t know if Python has the ability to bring in libraries, or if JavaScript has those dependencies as well. But has there been any thought within pg_tle of first-classing the idea of having a local subdirectory or a local file system layout for the “native” dependencies? I’m using “native” in quotes here because it could be JavaScript, it could be Python, whatever of those languages — so they could be installed in a way that’s not operating-system dependent.

    I know this is kind of a complex setup, but what I’m getting at is that a lot of times you’ll see someone say “you need to install this package, which is called this on Red Hat or this on Mac or this on Debian — and then you can install my extension.” Has there been any push towards solving that problem by having your TLE extensions load things from a sort of walled garden that you set up or something? So it’s specific to the database instead of the OS?

  • Jonathan Katz: That’s a good question. There has been thought around this. I think this is going to be probably something that requires a thorough discussion in Vancouver. Because if you look at the trusted languages that exist in Postgres today, the definition of trusted language is: thou shalt not access the file system. But if you look at all these different languages, they all have external dependencies in some way, shape, or form. Through Perl there’s everything in CPAN; through JavaScript there’s everything in npm. Let’s say you installed the appropriate CPAN libs and npm libs within your database environment; everything I recall from playing with trusted PL/v8 and PL/Perl is that you still can’t access those libraries. You can’t make the include or the require call to get them.

    Where PL/Rust is unique is that, first off, we just said, “yes, you can use your Cargo crates here.” But I think that also requires some more thinking in terms of how we make that available, whether it’s OS specific, vendor specific, or whether there’s something universal that we can build that helps to make that more of a trusted piece. Because I think at the end of the day, we still want to give administrators discretion in terms of what they ultimately install.

    With the trusted language extensions themselves, we’re able to say, “here’s the Postgres security boundary; we’re operating within that security boundary.” As soon as we start introducing additional dependencies, effectively that becomes a judgment call: are those dependencies going to operate within that security boundary or not? We need to make sure that administrators still have the ability to make that choice.

  • I think there are some very good discussion topics around this, not just for something like PL/Rust but for extension distribution in general. I think that’ll be one of the key discussions at the Extension Summit.

    • David Wheeler (he/him) in chat: What if the required modules/packages/whatever were in a table. e.g. in Perl I do use Foo::Bar and it has a hook to load a record with the ID Foo::Bar from a table

  • David G. Johnson: Has there been any thought to having the default version of an extension tied to the version of PostgreSQL? Instead of it just being 1.3 whether I’m on version 12 or 15 — because 1.3 might not even work on version 12 but it would work on version 15. The versioning of an extension and the versioning of PostgreSQL seem like they’re almost too independent.

  • Jonathan Katz: So David, I think what you need to do is chastise the extension developers to let them know they should be versioning appropriately to the version of Postgres that they’re using. [Chuckles]

    There is a good point in there, though. There is a lot of freedom in terms of how folks can build extensions. For example, just top of mind, pgvector supports all the supported versions of Postgres. Version 0.7.0 is going to be coming out soon, so it’s able to say, “pgvector 0.7.0 works with these versions.” PG plan meanwhile maintains several back releases; I think 1.6.0 is the latest release and it only supports Postgres 16. I don’t believe it supports the earlier versions (I have to double check), but there are effectively things of that nature.

    And then there are all sorts of different things out there; PostGIS, for example, has its own life cycles. So there’s something good in that, and maybe the answer is that it becomes part of the control file, saying what versions of Postgres an extension is compatible with. That way we’re not necessarily doing something to break some environment. I’m just brainstorming on live TV.

  • David G. Johnson: The other day I opened a bug report on this. PostgreSQL dump and restore will dump it without the version that’s in the source database, and when you restore it, it’s going to restore to whatever the current version in the control file is, even if you’re upgrading to a different database — versus restoring it to whatever the original version was. That dynamic just seemed problematic.

  • David Wheeler (he/him): I think it’s less problematic for trusted language extensions or extensions that have no C code in them, because pg_dump does dump the extension, so you should be able to load it up. I assume base backup and the others do the same thing.

  • David G. Johnson: I haven’t checked into that. It dumps CREATE EXTENSION and then it dumps any user tables that are marked by the extension. So these code tables are marked as being user tables for TLE?

  • David Wheeler (he/him): What do you mean by code tables?

  • Regina Obe: That’s a good point. For example my Tiger geocoder is all PL/pgSQL, but it’s only the CREATE EXTENSION thing that’s named. So for your TLE table, it would try to reload it from the original source, wouldn’t it? In which case it would be the wrong version.

  • Jonathan Katz: We had to add some things to TLE to make sure it worked appropriately with pg_dump. I know for a fact that if you dump and load the extension, it works fine. If it doesn’t, then there’s a bug and we need to fix it.

  • David G. Johnson: Okay yeah I haven’t played with this. Literally this is new to me for the most part. I found the whole fact that the control file is not updated when you do ALTER EXTENSION to be, at least in my mind, buggy.

  • Jonathan Katz: In the case of TLE, it’s in theory major version-agnostic. When I say “in theory,” it’s because we need to make sure the pg_tle library code itself is able to work with every major version. But once that’s abstracted away, the TLEs themselves can just be dumped and reloaded into different versions of Postgres. I think we have a TAP test for that; I have to double check. But major version upgrades were something we 100% tested for.

  • David Wheeler (he/him): I assume it’d be easier with pg_tle, since there’s no need to make sure the extension is installed on the file system of the new server.

  • Jonathan Katz: Yep. If you look at the internals of pg_tle, effectively the TLEs themselves are in a table. When you do a CREATE EXTENSION it gets loaded from that particular table.

  • David G. Johnson: Right, and when you do a pg_dump you make sure that table is dumped to the dump file.

  • Jonathan Katz: Yes. But this is a key thing that we had to make sure would work: when loading a pg_dump, a lot of the CREATE EXTENSION calls happen before the table data is loaded. So we needed to make sure that we created the appropriate dependency so that we load the TLE data before the CREATE EXTENSION — or before the CREATE EXTENSION for the TLE itself.

    • Jeremy S in chat, replying to “Is background worker the main…”: doing a background worker today, I think requires working in C, and I don’t think core PG exposes this yet. Maybe it could be possible to create a way to register with a hook to a rust procedure or something, but maybe a better way in many cases is using pg_cron

    • Jonathan Katz in chat: We can add support for BGWs via the TLE API; it’s just not present currently.

    • nils in chat: Creative thinking, if a background worker doesn’t work in TLE, how about create your UDF in tle and schedule with pg_cron 🤡

  • David Wheeler (he/him): You mentioned in the comments that you think that background workers could be added. How would that work?

  • Jonathan Katz: It would be similar to the other things that we’ve added, the data types and the hooks. It’s effectively creating the interface between the C API and what we’d expose as part of the TLE API. It’s similar to things like pgrx, where it’s binding to the Postgres C API but exposing it through a Rust API. We do something similar with the TLE API.

    • Steven Miller in chat: Thank you Jeremy. I like this idea to use TLE, then depend on cron for a recurring function call

    • Steven Miller in chat: Ah yes Nils same idea 😄

    • Jason Petersen in chat: Thumbs up to nils about pgcron. If you need a recurring BGW just write it in plpgsql and schedule it

    • nils in chat: Great hackers think alike

    • Jason Petersen in chat: (I know I do this)

  • David Wheeler (he/him): That makes sense. I just thought the background workers were literally applications that are started when the postmaster starts up and shut down when the postmaster shuts down.

  • Jonathan Katz: But there’s dynamic background workers.

  • David Wheeler (he/him): Oh, okay.

  • Jonathan Katz: That’s how a parallel query works.

    • Jeremy S in chat: Threading? 😰

  • David Wheeler (he/him): Gotcha, okay. Sorry, my information’s out of date. [chuckles]

  • Jonathan Katz: Well, maybe one day we’ll have some form of threading, too. I don’t think we’ll get a wholesale replacement with threads, but I think there are certain areas where threads would help and certain areas where workers are the way to go.

  • David Wheeler (he/him): Yeah, yeah that makes sense.

  • Jonathan Katz: Hot take!

  • David Wheeler (he/him): What other questions do you have about TLEs, or extensions more broadly, and packaging in relation to TLEs?

  • David G. Johnson: Just a random thought: Have you thought about incorporating foreign servers and pointing the TLE, instead of at a local database, at a master, company-wide foreign table?

  • David Wheeler (he/him): Like a TLE registry?

  • David G. Johnson: Right, yeah, something global would be nice. Like, okay, it’s hosted on PGXN and there’s a TLE registry. But for a company that wants to maintain code internally between projects and wants a shared library, they can publish it on one server, set up a link to it over a foreign server, and then just point at that.

  • Jonathan Katz: Could be!

  • David Wheeler (he/him): I mean, you could just use foreign tables for that — for the tables that TLE uses for its registry, right?

  • David G. Johnson: That’s what I’m thinking.

  • David Wheeler (he/him): Yeah that’s a cute idea.

  • Jonathan Katz: Just to go back a few more minutes: I was asked to talk about the vision. One way to view extensions is trying things out before they’re in core, or before they’re in Postgres. The aspect that I would ultimately like to see in core someday is the ability to do what’s called “inline modules.” There is a SQL standard syntax, CREATE MODULE, for this purpose. Some folks were trying to see if we could get it into, I believe, Postgres 15. There was some pushback on the design and it died on the vine for the time being.

    But I do think it’s something to consider, because when I talk to folks — whether it’s random Postgres users, RDS customers, etc. — and I go through TLE, one of the things that really stands out is one of the things that we discussed here and I saw in the chat, which is this aspect: being able to version your stored procedures. This is in part what modules aim to solve. One is just having a SQL interface to load all these things and group them together. But then, once you have that grouping, you have the ability to version it. This is the part that’s very powerful. As soon as I saw this I was like, “man, I could have used that; that would have saved me hours of debugging code in production.” Not saying that I was ever sloppy, you know, with random stored procedures in my production database!

    • David Wheeler (he/him) in chat: I see CREATE MODULE in the db2 docs.

  • Jonathan Katz: But that’s kind of the vision. The fact that Postgres is extensible has led to this very widely adopted database. But I think there are things that we can also learn in our extensions and bring back upstream. There are certainly reasons why we develop things in extensions! pgvector is an example of that, where we talked about it at PGCon last year. And part of the thought of not trying to add a vector data type to Postgres was, first, to make sure we could settle on what the binary format would be; and once that’s solidified, then we could add it.

    But I had an aside with Tom [Lane] where we talked about the fact that this is something we need to move fast on; the vector space is moving very quickly, and extensions are a way to be able to move quickly when something like Postgres moves more deliberately.

    This is in some ways where TLE is, our way to be able to see what kind of interface makes sense for being able to do inline extension loading and ultimately how we want that to look in core.

  • David Wheeler (he/him): Can you create data types with a binary representation in TLE?

  • Jonathan Katz: Yes, as of (I want to say) the 1.3 release; I have to double check the version. The way we’re able to do it safely is that it actually leverages the BYTEA type. When you create that representation it stores it as a BYTEA. What you get for free today is that, if you create your equality/inequality operators, you can use a b-tree lookup on these data types.

    So there’s a “dot dot dot” there. If we wanted to be able to use GiST and GIN and build data types for our other index interfaces, there’s more work to be done. That would require a TLE interface. I spent a lot of time playing with GiST and GIN, and the interface calls themselves involve pointers. So that will require some thought, yeah.

  • David Wheeler (he/him): I assume it’s a similar issue for Rust data types that are basically just serde-serialized.

  • Jonathan Katz: Yeah, we can at least store things in BYTEA, and that’s half the battle. It allows us to do a safe representation on disk, as opposed to just “here’s some random binary; good luck and don’t crash the database!”

    • Jason Petersen in chat: I also wondered about the function interface for things like storage features (table access methods).

      I assume they’re similarly hairy

  • David Wheeler (he/him): Any other last minute questions?

  • Jonathan Katz: Table access methods. Yes, table access methods are very hairy, as are index access methods. I’ve spent a lot of time the past 14 months looking at the index access method interface, which has a lot of brilliance in it, and certainly some more areas to develop. But it’s amazing! The fact that we can implement vector indexes and get all the best parts of Postgres is a phenomenal advantage.

  • Jeremy S: One last question. We’re leading up to Vancouver and we’re going to be starting to think about some of the topics that we want to make sure to talk about at the Summit. I think you mentioned one earlier (I should have written it down), but any final thoughts about topics that we should make sure to discuss?

  • Jonathan Katz: Just in general or TLE specific?

  • Jeremy S: Both. I mean, for sure TLE-specific, but also just generally related to extensions.

  • Jonathan Katz: My TLE-specific one dovetails into the general one. The first one is: is there ultimately a path forward to having some kind of inline extension management mechanism in core Postgres? That’s the top one; I spent the past five minutes talking about it.

    But I think the big thing, and why we’re all here today, is how we make it easier for developers to install extensions, manage extensions, etc. I think the notion of package management, thanks to the work of Andres finding the backdoor to xz, also shines in a new light, because there’s a huge security component to this. I remember, David, some of our earlier chats around this. I think, you know — again, being app-developer sympathetic — I definitely want to see ways to make it easier to load extensions.

    Having spent a lot of time on the other side, the first thing that comes to mind is security. How do we create a protocol for managing the extension ecosystem that also allows folks to opt into it and apply their own security or operational requirements, or whatever the requirements are, on top of it? That’s the thing that’s most top of mind. I don’t expect to have a full resolution from the Extension Summit on it, but at least the start of it. What is ultimately that universal packaging and distribution protocol for Postgres extensions that we can all agree on?

  • David Wheeler (he/him): Thank you so much! Before we go I just wanted to tee up that in two weeks Yurii Rashkovskii is going to talk about his idea for universally buildable extensions: dev to prod. That’ll be on May 1st at noon Eastern and 4pm UTC. Thank you everybody for coming.

Test and Release pgrx Extensions with pgxn-tools

Yesterday I released v1.6.0 of the pgxn/pgxn-tools Docker image with a new command: pgrx-build-test works much like the existing pg-build-test utility for PGXS extensions, but for pgrx extensions. Here’s an example from pg-jsonschema-boon, a pgrx extension I’ve been working on:

name: 🧪 Test
on:
  push:
jobs:
  test:
    runs-on: ubuntu-latest
    container: pgxn/pgxn-tools
    strategy:
      matrix:
        pg: [11, 12, 13, 14, 15, 16]
    name: 🐘 Postgres ${{ matrix.pg }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Start PostgreSQL ${{ matrix.pg }}
        run: pg-start ${{ matrix.pg }}
      - name: Setup Rust Cache
        uses: Swatinem/rust-cache@v2
      - name: Test on PostgreSQL ${{ matrix.pg }}
        run: pgrx-build-test

The format is the same as for pg-build-test, starting with installing a specific version of Postgres from the Postgres Apt repository (supporting versions 8.2 – 17). It then adds the Swatinem/rust-cache action to speed up Rust builds by caching dependencies, and then simply calls pgrx-build-test instead of pg-build-test. Here’s what it does:

  • Extracts the pgrx version from the Cargo.toml file and installs it (requires v0.11.4 or higher)
  • Initializes pgrx to use the Postgres installed by pg-start
  • Builds the extension with cargo pgrx package
  • Tests the extension with cargo pgrx test
  • Installs the extension with cargo pgrx install
  • Checks for a Makefile with installcheck configured and, if it exists, runs make installcheck

This last step allows one to include PGXS-style pg_regress tests in addition to Rust/pgrx tests, as pg-jsonschema-boon does. Here’s a successful run.

Special thanks to Eric Ridge and @Jubilee for all the help and improvements in pgrx v0.11.4 that enable this to work transparently.

pgrx Release Pattern

The pattern for releasing a pgrx extension on PGXN is the same as before, although you may want to generate the META.json file from a template. For example, the pg-jsonschema-boon Makefile creates META.json from META.json.in by reading the version from Cargo.toml and replacing @CARGO_VERSION@, like so:

DISTVERSION = $(shell perl -nE '/^version\s*=\s*"([^"]+)/ && do { say $$1; exit }' Cargo.toml)

META.json: META.json.in Cargo.toml
	@sed "s/@CARGO_VERSION@/$(DISTVERSION)/g" $< > $@

The release workflow uses it like so:

name: 🚀 Release on PGXN
on:
  push:
    # Release on semantic version tag.
    tags: ['v[0-9]+.[0-9]+.[0-9]+']
jobs:
  release:
    name: 🚀 Release on PGXN
    runs-on: ubuntu-latest
    container: pgxn/pgxn-tools
    env:
      PGXN_USERNAME: ${{ secrets.PGXN_USERNAME }}
      PGXN_PASSWORD: ${{ secrets.PGXN_PASSWORD }}
    steps:
    - name: Check out the repo
      uses: actions/checkout@v4
    - name: Bundle the Release
      env: { GIT_BUNDLE_OPTS: --add-file META.json }
      run: make META.json && pgxn-bundle
    - name: Release on PGXN
      run: pgxn-release

Note the “Bundle the Release” step, which first calls make META.json to generate the updated file, and tells pgxn-bundle to add the META.json via the GIT_BUNDLE_OPTS environment variable. The project also excludes the META.json.in file from the bundle in its .gitattributes file, and excludes META.json from the project repository in its .gitignore file.

Looking forward to seeing all your pgrx projects on PGXN!

PGXN V2 Architecture

Diagram of the extension distribution ecosystem vision, featuring “Root Registry” in the center and bidirectional lines to four of the surrounding nodes: “Web UX”, “Client”, “Packaging”, and “Interactions”. The “Packaging” and “Interactions” boxes also have a bi-directional arrow between them, while the fifth box, “Stats & Reports”, has a bi-directional arrow pointing to “Interactions” and another arrow pointing to “Root Registry”.

PGXN Future Architecture

High-level diagram of the six logical services making up the proposed future extension distribution architecture. The Root Registry sits at the center, providing APIs for the other services to consume for their own use cases. Trusted instances of those services submit additional data about extensions via the Interactions service to enhance and enrich the service to better inform and delight users.

Over on the Postgres Wiki I’ve published a new document for the PGXN v2 project: PGXN v2 Architecture. It has diagrams, such as the one above! From the introduction:

This document outlines the project to build extension distribution, discovery, and packaging tools and services to power the growth, accessibility, and utility of the Postgres extension ecosystem. Taking the overall Postgres community as its audience, it defines the services to be provided and the architecture to run them, as well as the strategic vision to guide project planning and decision-making.

With the goal to think strategically and plan pragmatically, this document describes the former to enable the latter. As such, it is necessarily high-level; details, scoping, and planning will be surfaced in more project-focused documents.

Bear in mind that this document outlines an ambitious, long-term strategy. If you’re thinking that there’s too much here, that we’re over-thinking and over-designing the system, rest assured that project execution will be fundamentally incremental and pragmatic. This document is the guiding light for the project, and subject to change as development proceeds and new wrinkles arise.

For those of you interested in the future of Postgres extension distribution, please give it a read! I expect it to guide the planning and implementation of the new services and tools in the coming year. Please do consider it a living document, however; it’s likely to need updates as new issues and patterns emerge. Log in and hit the “watch” tab to stay in the loop for those changes or the “discussion” tab to leave feedback.

I’ve also moved the previously-mentioned document Extension Ecosystem: Jobs and Tools to the wiki, and created a top-level PGXN v2 and PGXN category for all PGXN-related content. It also includes another new document, Service Disposition, which describes itself as:

A summary of the ambitiously-envisioned future PGXN services and architecture, followed by an examination of existing services and how they will gradually be refactored or replaced for the updated platform.

Check it out for how I expect existing services to evolve into or be replaced by the updated platform.

🎙️ Hacking Postgres s02e03

Screenshot from Hacking Postgres, with a classic TV-shaped view on the left labeled “Ry Walker, CEO Tembo” featuring a middle-aged white man in a black hoodie and large black headphones; and another TV-shaped view on the right labeled “David Wheeler, Principal Architect, Tembo” featuring a middle-aged white man with a grey beard, eyeglasses, blue collared shirt, and large beige headphones.

Hacking Postgres S2E3: David Wheeler, Principal Architect, Tembo

Last week I appeared on s02e03 of the Hacking Postgres podcast.

The experience I had after my independent Consulting gig for 10 years working in companies was, like, bringing up other people and being supportive of other people and hearing from a diversity of voices and perspectives makes everything better.

That’s part of why I want to get so much input on and feedback on the stuff that I’m hoping to do with PGXN v2 — or whatever we ultimately call it. But people matter, more than the technology, more than any of the rest of it.

I quite enjoyed this wide-ranging discussion. We covered my history with the Postgres community, a bunch of the projects I’ve worked on over the years, plans and hopes for the PGXN v2 project, perspectives on people and technology, and exciting new and anticipated features of Postgres. Find it wherever fine podcasts are streamed, including:

Mini Summit: Jonathan Katz on TLEs

Coming up this week: the fourth installment of the Postgres extension ecosystem mini-summit. The topic: Trusted Language Extensions, a.k.a. TLEs. Jonathan Katz of AWS will give a brief presentation on “TLE Vision and Specifics” followed by community discussion of the issues TLEs aim to address, what works and what doesn’t, and the role of TLEs in the future of the extension ecosystem.

Join us! Note! that if you reserved a spot at a prior mini-summit, sadly you will need to do so again for each subsequent summit or miss out on reminders from Eventbrite. And if Eventbrite says the event is sold out, rest assured we have plenty more virtual seats! Just email david@ this domain, ping me on Mastodon or via the #extensions channel on the Postgres Slack or the Postgres Discord.

Mini Summit Three

Terrific presentation and discussion at last week’s Postgres Extension Ecosystem Mini-Summit. In fact, I later learned that some people couldn’t make it because the Eventbrite event page showed it as sold out!

This is a limitation of the free level at Eventbrite, which caps tickets at 25. But this is a fully-remote event and we can host more people than that. We had over 30 last week! So if you’d like to attend one of the remaining three events and Eventbrite shows it as sold out, have a look at the bookmarks for the #extensions channel on the Postgres Slack, or email the username david at this domain and I will send you the link!

Okay, back to business. Last week, Devrim Gündüz kindly gave a super thorough presentation in his capacity as the maintainer of the Postgres community Yum and ZYpp repositories. This topic sparked lots of interesting discussion and commentary, which I detail below. But why read when you can look?

Still reading? Okay then, here we go!

Introduction

  • I opened the meeting and introduced Devrim Gündüz.

Presentation

  • Thank you for the introduction. I’m going to talk about Yum and ZYpp dot postgresql.org, the challenges they pose for us, and the challenge of making RPMs from extensions.

  • I work for EDB as a Postgres expert, and I’m a Postgres major contributor responsible for the Postgres Yum repo. If you’re using RPMs, I’m building them. I also contribute to Rocky, SUSE, and Fedora Linux. I’m a proud Postgres community member, I live in London, and I’m also doing some DJing, so if I get fired I have an extra job to do.

  • Agenda: Last year at PGConf.eu in Prague, I had 60 slides for 5 minutes, so expect a huge number of slides today. I want to leave some time for discussion and questions.

  • I’m going to start with how it started back in the 2000s, then talk about the contents of the repos, which distributions we support — because that’s another challenge — how we do the extension packaging, how to build RPMs of an extension, and how to update an extension RPM.

  • Then the horror story of what happens when a new Postgres beta is out, which is due in the next month or so for Postgres 17.

  • Then we are going to speak about the extension maintainers — a problem for us — and then the problems with relying on external repos.

  • So if you have any questions please ask as soon as you can. I may not be able to follow the chat during the presentation, but I’ll try as much as I can.

  • So let’s talk about how it started. When I started using Linux in 1996 and Postgres in 1998, we had Tom Lane and Lamar, who built RPMs for RedHat upstream. They were just building the distribution packages, nothing community. It was only specific to RedHat — not RedHat Enterprise Linux but RedHat 3, 4, 5, the non-enterprise version of RedHat back then — which was later renamed Fedora Core; the first version was released in November 2003, which was another challenge for packaging.

  • One of the problems with the distribution packaging was that it was slightly behind the minor Postgres releases, sometimes the major Postgres releases.

  • So there was one single Postgres version available for a given distro, say Postgres 6.4 or 7.0. Multiple versions were not available back then, and the minor version was slightly behind.

  • I started building packages for my laptop because I wanted to use Postgres and not all versions were available back then. So I started building packages for my laptop and my server. They were based on the packaging of Tom Lane and Lamar.

  • Then I uploaded them to my personal server and emailed the PG mailing lists and said, “I’m running my own packages, use at your own risk. This is just a rebuild of the upstream packages on the RedHat version that I’m using or the Fedora version that I’m using. Up to you! This is a personal project, nothing serious.”

  • So then people started downloading and using them. There was no signature, nothing official back then. Then Lamar said he didn’t have enough time for the RPMs. He sent an email to the mailing lists, I said I would step up to the plate, and I did. That was, I think, in 2004, which is about 20 years ago.

  • So 19 years ago we had the first domain, postgresql.rpm.org, and then we had more packages. In 2007 we had the first repository RPM and then we had yum.postgresql.org. This means that, starting in 2007, this began to be the official RPM repo of the community, which was a good thing because we could control everything under the community.

  • I was living in Canada back then. We had the first RPM build farm — instead of using my laptop and my VMs, we had an actual machine back then.

  • In 2010 we had what was then called multiple postmaster support, which means the parallel installation of the Postgres RPMs. That was a revolution, because even now the current Fedora or RedHat RPMs cannot be installed in parallel. So if you want to install Postgres 13, 14, 15, 16, and 17 or whatever, it is available in the community repo. This is a great feature because you may want to test or do something else. This is how we started; 14 years ago we had this feature in the community repo.
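
    For example, on a RHEL-compatible system with the community repo configured, the version-suffixed packages install side by side under per-version prefixes. A minimal sketch, assuming the standard community package names and paths:

    $ sudo dnf install -y postgresql15-server postgresql16-server
    $ ls -d /usr/pgsql-*
    /usr/pgsql-15  /usr/pgsql-16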

  • Then the next year we moved the repo to the community servers and unified the spec files. Our builds are extremely simple — you can start your own builds in less than 10 minutes: five minutes for pulling the git repo and then a few minutes for a package. Extremely simple builds, and now we are in 2024.

  • Let’s talk about the contents, because we have four different contents in the repo. The first one is the “common” repo. We call it “common” in the repository config file. It has the RPMs that work across all Postgres major releases: say pgBadger, or the ODBC driver, JDBC driver, or GDAL for PostGIS. There’s lots of stuff, including Python packages, which are not extensions but support things like Patroni, which works for all Postgres releases.

  • This was an effort to get rid of duplicates in the Postgres repo. I think we shaved lots of gigabytes back then, and we still save a lot.

  • Then, to address one of the topics of today’s talks, we have the “non-common” RPMs. (These are the names of the directories, by the way.) They are the RPMs which are Postgres version-specific; basically they are extensions. Say plpgsql_check 15 or whatever. Lots of extensions are there.

  • Then we have extras. They are not actually Postgres packages and they shouldn’t be included in our repo by default, but many people actually look for these packages because they want to use Patroni and they don’t have the supporting RPMs, or those RPMs aren’t up-to-date.

  • I’m building consul, consul-template, etcd, HAProxy, keepalived, and vip-manager. They are all open source; some of them are Go packages — which, actually, I don’t build, I just distribute the precompiled binaries via the repo. So that makes it easier for people to deploy the packages.

  • And then we have the “non-free” repo. These are the packages that depend on closed-source software like Oracle libraries, or that have license restrictions. For example, ora2pg depends on Perl DBD::Oracle, oracle_fdw depends on Oracle, pg_strom depends on NVIDIA CUDA stuff, timescaledb-tsl is actually TimescaleDB with the TSL license, informix_fdw and db2_fdw.

  • So we have some non-free packages which actually depend on non-free stuff as well. All of them are well-maintained: I’m trying to keep everything up to date — like real up-to-date! That brings some problems but we will get there.

  • We support RedHat Enterprise Linux and of course Rocky Linux. This year we started supporting Alma Linux as well. Of course they are more or less identical, but we test, install, and verify the packages on all three distributions.

  • We have x86_64, aarch64, ppc64le, and RedHat 9, 8, and 7. We also have RedHat 6 for Postgres 12, but it’s going to be killed by the end of this year. We have Fedora, but only two major releases, which matches the Fedora lifecycle. And SUSE — my pain in the neck — I’ll get there.

  • Since you all are here for extension packaging, let’s get there: what happens for extension packaging.

  • First of all, we have the in-core extensions. They are delivered with the contrib sub-package, which matches the directory name in the Postgres tarball. There are separate packages for each major version, so postgresql15-contrib, postgresql13-contrib, etc. These are the installation directories for each extension. We are going to use those directories for the other [non-core] extensions as well.

  • When we add a new extension to the repo, it’s going to use these directories: the binary, the extension config file, the library, and the other files are all installed under these directories. This magic is done by PGXS, which has been there forever. We just provide the initial stuff and then the rest is done by the PGXS magic. This is the base for a lot of the core extensions.
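
    For instance, building an out-of-tree extension with PGXS against a specific community Postgres installation typically looks something like this. A sketch, assuming the extension’s Makefile uses the conventional PG_CONFIG variable and a hypothetical source directory:

    $ cd pg_semver
    $ make PG_CONFIG=/usr/pgsql-16/bin/pg_config
    $ sudo make install PG_CONFIG=/usr/pgsql-16/bin/pg_config
    # PGXS drops the shared library under /usr/pgsql-16/lib and the
    # control and SQL files under /usr/pgsql-16/share/extension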

  • So what happens when we do non-common and non-free packages? First of all, they are distributed separately for each Postgres major version. Let’s go back to one of the extensions, say plpgsql_check. We have a separate package for Postgres 14, a separate package for Postgres 15, 16, 13, and 12. If they build against all the supported Postgres versions, we have separate packages for each.

  • Of course from time to time — as far as I remember Timescale DB does this — Timescale DB only supports Postgres 15 and 16 nowadays, so we drop support for the older versions in the Git repo. But they are separate packages; they are all installed in these directories along with the main contrib package.

  • This is the naming convention that I use: $extensionName_PGMajorVersion. Four or six years ago, some of the packages didn’t have an underscore before the PG major version. It was a problem, so someone complained in the hackers mailing list, and then I made the changes.
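
    In practice that means users pick the package matching their Postgres major version; for example (hypothetical but convention-following invocations):

    $ sudo dnf install plpgsql_check_16   # plpgsql_check built for Postgres 16
    $ sudo dnf install plpgsql_check_15   # the same extension built for Postgres 15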

  • Currently all the previous and all the new packages have this naming convention except a few packages like PostGIS — because in PostGIS we have multiple versions of the same extension. Let’s say we have PostGIS 3.0, 3.1, 3.2, 3.3, and 3.4. We have combinations of each — I think we have PostGIS 3.0 in some of the distributions, but mostly PostGIS 3.1, 3.2, 3.3, and 3.4 — and then all the Postgres versions they support. A lot of builds! So there are some exceptions where we have the extension name and extension major version before the Postgres major version.

    • Jeremy S in chat: question: curious if you have thoughts about relocatable rpm packages. ever tried? just best to avoid?
  • I have a question from Jeremy. This is a very basic question to answer: it is actually forbidden by the packaging rules. The RPM packaging rules forbid creating or distributing relocatable RPM packages. We stick to the packaging guidelines, so this cannot be done.

    • Jeremy S in chat: Thanks! (Didn’t realize packaging guidelines forbid this)
  • Let’s talk about how we build extensions. Often our devel package is enough: many of our packages just rely on Postgres itself. But of course packages like PostGIS may depend on some other packages, like GDAL, which requires lots of extra dependencies as well. The most problematic one is the GIS stack: they need EPEL on RHEL and its derivatives.

  • There has been discussion in the past about whether we should require EPEL by default. The answer is “no,” because not all of our users are installing, for example, the GIS stack or other packages. Most of our users rely on just our repo.

  • On the other hand, in order to provide completeness for our users, I added lots of Python packages in the past to support Patroni — because the upstream packages (well, maybe not upstream packages) were not enough, or the version was too low. So from time to time I add non-Postgres-related packages to the repo just to support the Postgres packages. In the past it was pgAdmin, but now it’s not in our repo so it’s not a problem: their upstream is building their own RPMs, which is a good thing for us. We are building extra packages mostly for Patroni.

  • However, this is a potential problem for some enterprise users, because large companies don’t even want to use the EPEL repo; they feel like it’s not a community repo, but a repo controlled by Fedora and RedHat. That’s why from time to time I try to add some of those packages to our repo.

  • If it’s a problem for enterprise users, does it mean we should maintain tons of extra EPEL packages for the GIS stack? The answer is “no, definitely no”. Not just because of the human power that we need to maintain those packages — I mean rebuilding them is easy: I just get the source RPM, commit the spec files into our repo, and rebuild them. But maintaining them is something else.

  • We will have a similar problem when we release our own ICU package in the next few years. Now that we have in-core collation — but just for the C locale — and people are trying to get rid of glibc, maybe we should have an ICU package. But maintaining an ICU library across the lifetime of a single Postgres major version is a really big challenge that I don’t know how to solve, for now at least.

  • And then SLES — my pain in the neck — the GIS stack requires lots of repos on SLES 15. They are well documented on our website.

  • Fedora is safe because Fedora is Fedora, everything is there, it’s easy to get a package there.

    • Jeremy S in chat: “Building them is easy. Maintaining them is something else.”
  • Yeah, that’s the problem, Jeremy.

  • So how do you RPMify an extension?

  • The first thing is to know that the extension exists. This is one of the big problems between developers and users. The developer creates a useful extension and then they don’t create a tarball, they don’t release anything. They expect people to install Git on their production databases and git pull, install make, gcc, all the development libraries and build a binary, blah, blah blah.

  • I’m sorry, that’s not going to work. It also doesn’t work for pip — pip is not a package manager, it just destroys your server. It downloads things to random places and then everything’s gone. That’s why I added lots of Python packages to support Patroni, because most users use the package manager to install Postgres and other packages on their servers. It’s the same for Debian, Ubuntu, for RPMs, for Windows, for macOS.

  • So first of all we have to know that the extension exists, and we have to have a tarball. If I see an extension that seems good enough, I’ll get there. PGXN is a good place: I go to pgxn.org a few times per day to see if there is a new version of an extension or a new extension; it’s a good resource. But there’s a problem: we have hundreds of extensions — maybe thousands — but not all of them are on PGXN. They should be!

    • David Wheeler (he/him) in chat: You should follow @pgxn@botsin.space for new releases :-)

    • nils in chat: pg_xz incoming

  • I don’t know how to solve this problem, but we should expect every extension author to announce their extensions on PGXN. I’m not just talking about installing everything through PGXN, but at least have an entry saying there’s a new extension, here is the repo, the website, the README, and here is a tarball. It doesn’t have to be hosted on PGXN, as long as we have something.

  • And then I check the version. If there is an extension that will kill your database and the version is 0.001, that’s not going to be added to the repo, because we don’t want to distribute an experimental feature.

    • David Wheeler (he/him) in chat: LOL, all my extensions start at 0.0.1

    • David G. Johnston in chat: If it isn’t on PGXN it doesn’t exist.

  • Another problem is that lots of people write extensions but some of them are just garbage. I’m sorry but that’s the truth. I mean they just release a version and then do nothing.

    • David Wheeler (he/him) in chat: LOLOLOL
  • From the chat, “pg_xz incoming”: that’s right! We have blackhole_fdw, which was written by Andrew Dunstan. When you create blackhole_fdw, it throws all of your data into a black hole, so everything is gone.

  • Yeah, “if it’s not on PGXN it doesn’t exist,” that’s what I hope we achieve in the next year or so.

    • Jimmy Angelakos in chat, replying to “If it isn’t on PGXN …”: I would correct that to “If it isn’t on PGXN it isn’t maintained.” Sometimes even ON PGXN…
  • Yeah Jimmy, that’s one of the big problems that we have: maintenance.

  • We create the spec file, just copy one of the existing ones and start editing. It’s easy but sometimes we have to add patches. We used to carry patches for each Postgres major version to change the Makefiles for the specific Postgres major version. But I realized that it was [not a great pattern]. Now we just export the path, which fixes the problem.
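
    The idea is roughly the following, a simplified sketch of what the spec files do rather than their literal contents: putting the target major version’s tooling first on the PATH lets PGXS resolve the right pg_config without patching the Makefile.

    $ export PATH=/usr/pgsql-16/bin:$PATH
    $ pg_config --pgxs
    /usr/pgsql-16/lib/pgxs/src/makefiles/pgxs.mk
    $ make && sudo make install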

    • David G. Johnston in chat: As a policy though, someone who cares and wants responsibility needs to apply to be part of the RPM community.
  • Then I initiate a scratch build for any missing requirements. If there are any missing build requirements, it fails to build. I only do it on Fedora latest, not for every package, because it doesn’t always help: some distros may not have the missing dependency.

    • Alvaro Hernandez in chat: Hi everybody!

    • David G. Johnston in chat: Delegated to PGXN for that directory.

  • Let’s say we rely on some really good feature that comes with the latest version of something, but that latest version may not appear in RedHat 7 or 8. So some distros may have it, but the version may be lower than required. Or some distros may have the dependency under a different name. Now in the spec file we have “if SUSE then this”, “if RedHat then this”, “if Fedora then this”, “if RedHat 9 then this”, etc. That’s okay, it’s expected. As long as we have the package, I don’t care.

  • Then I push it to the Git repo, which I use not just for the spec files and patches, but also for carrying the spec files and patches to the build instances.

    • Jorge in chat: How to handle extension versioning properly? I mean, in the control file the version could be anything, like for ex. citus default_version = ‘12.2-1’ where the “published version” is v12.1.2, then the “default version” could remain there forever.

      Also have seen in the wild extensions that the control file have a version 0.1 (forever) and the “released” version is 1.5

  • If something fails I go back to the drawing board. GCC may fail (GCC 14 has been released on Fedora 40 and is causing lots of issues for packaging nowadays), it could be cmake — too recent or too old. It could be LLVM — LLVM 18 is a problem for Postgres nowadays. I either try to fix it or ping upstream. I often ping upstream because the issue must be fixed anyway.

  • If everything is okay, just push the packages to the repo.

    • Ruohang Feng (Vonng) in chat: question: how about adding some good extensions written in Rust/pgrx to the repo? like pgml, pg_bm25, pg_analytics, pg_graphql….
  • One issue is that there is no proper announcement. Maybe I have an awesome extension available in the Postgres repo that people crave; we build the extension — it took a lot of time (thank you Jimmy, he helped me a lot) — and then I didn’t actually announce it that much. On the other hand, people can just use PG stat base [?] to install and start using it in a few seconds. This is something that we should improve.

    • Steven Miller in chat: How to handle system dependencies like libc versions being updated on the target system? Do extensions need to be complied against exactly a specific libc version?

    • From Nevzat in chat: how can we make sure bugfix or extension is safe before installing it

    • vrmiguel in chat: Interesting approach to identify build/runtime requirements

      Over at Tembo we have a project called trunk-packager which attempts to automatically create .deb packages given the extension’s shared object.

      We try to identify the dynamic libs the extension requires by parsing its ELF and then trying to map the required .so to the Debian package that provides it, saving this info in the .deb’s control file

  • From the chat: How to handle extension versions properly? That’s a good question, but the extension version and the release version don’t have to match. The extension version isn’t the same thing as the release version. It’s the version of the SQL file, or the functions, the tables, the views, stored procedures, or whatever. If it’s 0.1, it means nothing has changed with respect to the control file. They may bump up the package version because they add new features, but if they don’t add new features to the SQL file, then they don’t update the extension version. I hope that answers your question, Jorge.
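
    Taking Jorge’s Citus example from the chat, the two numbers live in different places: the control file’s default_version is what CREATE EXTENSION installs or upgrades to, while the package carries the software’s release version. A hypothetical illustration, assuming the community repo’s installation paths:

    $ grep default_version /usr/pgsql-16/share/extension/citus.control
    default_version = '12.2-1'
    # Meanwhile the RPM itself is versioned after the 12.1.2 software release.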

  • I have another question from Ruohang. Yaaaaay! I was afraid that someone would ask that one. We have no extensions written in Rust in the repo so far. It’s not like Go; there is a ban against Go because we don’t want to download the world, the whole internet, just to build an extension. If I recall correctly they’re rewriting pg_anonymizer in Rust. They will let me know when they release it, or when they’re ready to release it, and then I’ll build it. It’s not something I don’t like; it just hasn’t happened.

    • Keith Fiske in chat: I still update the control file with my extensions even if it’s only a library change. Makes it easier to know what version is installed from within the database, not just looking at the package version (which may not be accessible)

    • Ruohang Feng (Vonng) in chat: question: How to handle RPM and extension name conflicts, e.g., Hydra’s columnar and Citus’s columnar.

    • David Wheeler (he/him) in chat, replying to “I still update the c…”: Yeah I’ve been shifting to this pattern, it’s too confusing otherwise

  • If you think there are good extensions like these, just create a ticket on redmine.postgresql.org. I’m happy to take a look as long as I know them. That’s one of the problems: I have never heard about pg_analytics or pgml, because they’re not on PGXN. Or maybe they are. This is something that we should improve in the next few months.

    • Jimmy Angelakos in chat: Go is a pretty terrible ecosystem. It has its own package manager in the language, so it’s kinda incompatible with distro packages

    • Jason Petersen in chat: (but that means a build is safe within a single release, it doesn’t mean you can move a built extension from one Fedora version to another, right?)

    • David Wheeler (he/him) in chat, replying to “How to handle system…”: Libc is stable in each major version of the OS, and there are separate RPMs for each.

  • Another question from Steven: how to handle system dependencies like libc version updates. The answer is no: they don’t update the libc major version across the lifetime of the release, so we don’t need to rebuild the extension against libc.

    • Steven Miller in chat, replying to “How to handle system…”: Ok I see, TY

    • Jason Petersen in chat, replying to “How to handle system…”: Is that how we deploy in trunk, though?

    • David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Huh? You can build binaries in a sandbox and then you just need the binary in your package.

  • [Addressing Nevzat’s question]: That’s a great question. It’s up to you! It’s no different than installing Postgres or anything else. I just build RPMs. If you’re reading the hackers mailing list nowadays, people rely on me and Christoph and others not to inject any code into the RPMs or Debian packages. You just need to trust us not to add extra code to the packages. But if there’s a feature problem or any bug, then you should complain upstream, not to us. So you should just test.

    • Jimmy Angelakos in chat, replying to “Go is a pretty terri…”: Find me one person that does this.

    • Steven Miller in chat, replying to “How to handle system…”: We don’t have OS version as one of the dimensions of version packaging but should

  • [Addressing vrmiguel’s comment]: Yeah, that could be done but like I don’t like complex things, that’s why I’m an RPM packager.

    • Jason Petersen in chat, replying to “Go is a pretty terri…”: (doesn’t go statically link binaries, or did they drop that philosophy?)

    • vrmiguel in chat: I think citus has changed it to citus_columnar

    • David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Hugo: https://github.com/hugomods/docker/tree/main/docker/hugo

    • David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Jason: Static except for libc, yes

  • Another question from Ruohang: how to handle RPM and extension name conflicts. I think Citus came first, so you should complain to Hydra and ask them to change the name. They shouldn’t be identical. We have something similar with Pgpool: they are conflicting with the PCP library, which has been in Linux for the last 25 years. I think Pgpool has to change their name.

    • Jeremy S in chat, replying to “I still update the c…”: So you think people will run the “alter extension upgrade” eh?
  • [Addressing Keith Fiske’s comment]: I’m not saying I don’t agree with you, but it means every time I have to update my extension version in my running database — it’s some extra work, but that’s okay. It’s the user’s problem, not my problem.

  • Question from Jason [on moving an extension from one Fedora to another]: Right, it may not be safe because the GCC version may be different and other stuff may be different. One distro to another is not safe, Jason; sorry about that.

  • [Back to Steven’s question]: Yes, David’s answer is right.

  • [Addressing vrmiguel’s comment about citus_columnar]: You are right.

  • Jimmy, I’m not going to read your comment about Go because I don’t think you can swear enough here.

    • vrmiguel in chat, replying to “Go is a pretty terri…”: Are there known Postgres extensions written in Go? Not sure how Go is relevant here

    • Jason Petersen in chat: you said “gcc” and not “libc” there, are you implying that things like compiler versions and flags need to be identical between postgres and extensions

    • Keith Fiske in chat, replying to “I still update the c…”: I think they should …

    • David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Were some experiments a few years ago. https://pkg.go.dev/github.com/microo8/plgo

  • Let me continue now. First you have to know the extension exists, and then you also need to know that the extension has an update. Unfortunately it’s the same problem: the extension exists or has an update and they just don’t let us know.

    • Jimmy Angelakos in chat, replying to “Go is a pretty terri…”: @vrmiguel now you know why :D
  • This is a big challenge. Fedora has an in-house solution: when you add a new package to Fedora, I think they crawl the upstream repo once a day, and if there’s a new release they automatically create a ticket in their bug tracker, so that the maintainer knows there’s a new version. This could be done, but it would need a volunteer to do it. Or maybe the easiest thing is just to add everything to PGXN.

  • When we update an extension, we have to make sure it doesn’t break anything. It requires some testing. As I said earlier, building is one thing; maintaining the extension is a bigger thing. If you want to raise a baby, you are responsible until the end of your life. Consider this like your baby: either let us know if you can’t maintain an extension anymore, or please respond to the tickets that I open.

    • Steven Miller in chat: One other detail about compatibility dimensions. We have noticed some extensions can be complied with chip-specific instructions like AVX512, for example vector does this which optimizes performance in some cases

    • Alvaro Hernandez in chat, replying to “you said “gcc” and n…”: I’d reverse the question: do we have strong guarantees that there are no risks if versions and/or flags may differ?

      I believe extensions are already risky in several ways, and we should diminish any other risks, like packaging ones.

      So I’d say absolutely yes, compile extensions and Postgres in exactly the same way, versions and environments.

  • Sometimes a new minor version of an extension breaks a previous Postgres release. For example, an extension drops support for Postgres 12 even though Postgres 12 is still supported. Or they didn’t do the upgrade path work. I have to make sure everything is safe.

    • nils in chat, replying to “I think citus has ch…”: It was never changed, the extension has always either been embedded in Citus or later moved to a separate extension called citus_columner.

      I think the name conflict comes from the access method being called columnar, which Citus claimed first. (Hydra’s started actually as a fork from Citus’ codebase).

      (disclaimer; I work on Citus and its ecosystem)

    • Jason Petersen in chat, replying to “I think citus has ch…”: hi nils

  • Next month a new beta comes out. Everyone is happy, let’s start testing new features. For the packagers that means it’s time to start building extensions against beta-1. So a build might fail, we fix it, and then it may fail against beta-2. I understand if extension authors may want to wait until rc-1. That’s acceptable as long as they let us know. Many of them fail, and then Christoph and I create tickets against them and display them on wiki.postgresql.org. It’s a Hall of Shame!

    • Eric in chat: When you decide to package a new extension do you coordinate with upstream to make that decision?

    • David Wheeler (he/him) in chat, replying to “When you decide to p…”: I learned I had extensions in the yum repo only after the fact

    • Eric in chat, replying to “When you decide to p…”: I see

    • vrmiguel in chat: @Devrim Gündüz I’m curious about how RPM deals with extensions that depend on other Pg extensions

    • David Wheeler (he/him) in chat: You can test Postgres 17 in the pgxn-tools docker image today. Example: https://github.com/theory/pgtap/actions/runs/8502825052

  • This list pisses off the extension authors because they don’t respond to tickets. So what do we do next? It happens again and again and again, because they just don’t respond to us. On Monday I got a response from an extension maintainer. He said “you are talking like you are my boss!” I said, “I’m talking like I’m your user, I’m sorry. I just asked for a very specific thing.”

    • nils in chat: I love Devrim’s issues against our repo’s! They are like clockwork, every year 😄

    • David Wheeler (he/him) in chat, replying to “You can test Postgre…”: It relies on the community apt repo

    • Eric in chat, replying to “When you decide to p…”: Related: ever had upstream request you stop packaging an extension?

    • Steven Miller in chat, replying to “One other detail abo…”: Even if compiled inside a container, on a specific chip it can get chip-specific instructions inside the binary. For example building vector on linux/amd64 inside docker on a chip with AVX512, the container image will not work on another linux/amd64 system that does not have AVX512

    • David Wheeler (he/him) in chat: :boss:

  • Unresponsive maintainers are a challenge: they don’t respond to tickets, or emails, or they don’t update the extensions for recent Postgres versions.

  • Don’t get me wrong, even the big companies do this, or they don’t update the extensions for the new GCC versions. I don’t expect them to test everything against all the GCC versions; that’s my problem. But just respond, please.

  • What’s the responsibility of the packager in this case? Should we fork if they don’t respond at all? No, we are not forking it! But going to conferences helps, because if the extension author is there I can talk to them in person in a quiet place, in a good way, just “please update the package tomorrow or you’re going to die”. Of course not like that, but you see what I mean.

  • [Looking at chat]: I’m going to skip any word about containers; sorry about that.

  • [Addressing Eric’s question]: That’s a good question! No, actually they support us a lot, because that’s the way that people use their extensions. And do we coordinate with upstream? No, I coordinate with myself and try to build it. Of course upstream can just create a ticket, send me email, or find me at a conference. They can say, “hey, we have an extension, could you package an RPM?” Sure, why not. I don’t coordinate with upstream as long as there is no problem with the builds.

    • Eric in chat, replying to “When you decide to p…”: So you haven’t run into a situation where upstream did not want/appreciate you packaging for them?
  • [Responding to nils’s comment]: Thank you, thanks for responding!

  • [Responding to vrmiguel’s question about depending on other extensions]: We actually add a dependency on that one. That’s a bit of work: for example, pgRouting depends on PostGIS. In order to provide a seamless installation of the pgRouting package, I add an extra line in the PostGIS spec file that says it provides PostGIS without the version as part of the name. Then when we install pgRouting, it looks for any PostGIS package — which is fine because it can run against any PostGIS version. So I add the dependency on other extensions if we need them.
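
    You can inspect that kind of wiring with rpm’s query flags. A sketch with assumed, convention-following package names:

    $ rpm -q --provides postgis34_16     # includes an unversioned "postgis" entry
    $ rpm -q --requires pgrouting_16     # requires postgis, satisfied by any version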

    • David G. Johnston in chat: The tooling ideally would report, say to PGXN or whatever the directory location for the initial application is, the issues and remind them that if the build system cannot build their extension it will not be included in the final RPM. You are an unpaid service provider for them and if they don’t meet their obligations their don’t get the benefit of the service.
  • [Responding to Eric’s upstream follow-up question]: I haven’t seen any upstream where a person didn’t want me to package. But I haven’t seen many appreciations, either; I mean they don’t appreciate you. I’m being appreciated by EDB — money, money, money, must be funny — thanks EDB! But I haven’t had any rejections so far. Good question!

    • Eric in chat, replying to “When you decide to p…”: Fair. Cool. Thanks
  • Relying on external repos is a big problem for SUSE. Some of the maintainers just discontinue their repos. One problem with SUSE is that they don’t have an EPEL-like repo. EPEL is a great thing. The barrier to adding a package to EPEL is not low, but not high either. If you’re an advanced packager you can add a package quickly enough. Of course it requires review from others. But this is a big problem for SUSE.

  • Lack of maintenance is a problem. We have a repo but they don’t update it; so I have to go find another repo from build.opensuse.org, change it, update the website, change our build instance, etc. That’s a big problem.

    • David Wheeler (he/him) in chat, replying to “The tooling ideally …”: I want to add build and test success/fail matrices to extension pages on PGXN

    • Florents Tselai in chat: How do you handle Pl/Python-based extensions + pip dependencies? Especially with virtualenv-based installations. i.e. Pl/Python usually relies on a /usr/bin/python3, but people shouldn’t install dependencies there.

  • And then there’s costs! What’s the cost of RPMifying an extension? Hosting a build server? We have a very beefy bare metal build server hosted by Enterprise DB, just because I’m working for them and they have a spare machine. Hosting a build server is a cost.

  • I have to use some external resources for architecture reasons; for instance, some of our build instances, like ppc64le, are hosted somewhere else. There are some admin tasks to keep everything up and running; EDB’s IT team actually helped me fix an issue today in both of our PPC instances.

    • Jason Petersen in chat, replying to “How do you handle Pl…”: I think early on he said he makes the extensions rely on RPMs that provide those Python dependencies

    • David Wheeler (he/him) in chat, replying to “How do you handle Pl…”: I have used this pattern for RPMifying Perl tools

  • Then, maintaining build instances requires keeping them up-to-date, and making sure each update doesn’t break anything. It’s not just “dnf update and build a package”. No. It may be a problem with Fedora, because Fedora can update anything any time they want. It’s less of a problem for SUSE and RedHat, but we have to take care that the updates don’t break anything.

  • RedHat, the company, actually follows our release schedule. We release every three months. Unless something bad happens, we know the next release is in May, on a Thursday. So every Wednesday, one day before our minor release, RedHat releases their new maintenance releases. RedHat is going to release 9.4 on the Wednesday before our minor release. What does that mean for us as an RPM packager for RedHat?

  • RedHat releases a new version with a new LLVM, for example, and then it means we have to rebuild the packages against the new LLVM so that people can use it. That means I have to work until Thursday morning to build the packages. That’s fine, but another problem is for Rocky and Alma Linux users, because they’re not going to have the updated LLVM package, or any updated package, like GCC. It’s not like the old RedHat days; they change everything in minor versions.

  • So I have to rebuild GCC and LLVM on our instances, add them to our special repo “sysupdates”, which is in the config file, and this takes many hours because building GCC and LLVM is a big thing.

  • In the last two years I have not been able to build GCC from the source RPM. I had to edit everything in the spec files, blah blah, to be able to build it. I have no idea how they can break their own source RPM.

  • So that’s another cost: in May I’m going to spend lots of cycles to keep up with the latest RedHat release, and also to make the Rocky Linux and Alma Linux users happier. Maintaining build systems is not as easy as running yum or zypper update. It requires employing the packager — because I have bills to pay and beers to drink.

  • [Addressing Florents’s PL/Python question]: I don’t know what the PL/Python-based extensions are, but I tried to get rid of everything related to pip. I’m not a developer; a DBA isn’t a developer, a sysadmin isn’t a developer. They’re not supposed to use pip; they are supposed to use the package manager to keep up with everything. My point is, if someone needs pip then I should fix it. That’s what I did for Patroni: I added lots of packages to our Git repo just to be able to support Patroni.

    • Ian Stanton in chat: Need to drop, thank you Devrim!

    • Jeremy S in chat, replying to “How do you handle Pl…”: A lot of larger companies have inventory management and risk control processes that heavily leverage package management

    • Alvaro Hernandez in chat: Need to go, ttyl!

    • vrmiguel in chat, replying to “you said “gcc” and n…”: Do you think there are no guarantees at all? For instance, Postgres loads up the extension with dlopen, which could fail with version mismatch. If that doesn’t occur and the extension loads ‘fine’, how likely do you think an issue could be?

      Also I’m curious how often you’ve seen problems arise from libc itself (rather than any of the many things that could cause UB in a C program) and how these problems have manifested

    • Ahmet Melih Başbuğ in chat: Thank you

Conclusion

I thanked Devrim for the presentation and all the discussion, and pitched the next mini-summit, where I think Jonathan Katz will talk about the TLE vision and specifics.

Thank you all for coming!

Talk: Postgres Yum Extensions

Quick reminder that Devrim Gündüz of EnterpriseDB will be giving a talk at this week’s extension ecosystem mini-summit, an overview of the PostgreSQL Yum Repository architecture, how new RPMs are added, and issues and challenges with distributing RPMed extensions.

The community Yum and Apt repositories are the canonical distributors of PostgreSQL community builds on Linux platforms, with a long history of robust support for multiple OSes, OS versions, and PostgreSQL versions. Devrim will help us understand how the Yum repository works, the inclusion of extensions from the broader community, and the challenges for automatic binary packaging for all publicly-available extensions.

Join us! Note! that if you reserved a spot at a prior mini-summit, sadly you will need to do so again for each subsequent summit or miss out on reminders from Eventbrite.

If you need other information or just want an invitation without using Eventbrite, hit me up at david@ this domain, on Mastodon, or via the #extensions channel on the Postgres Slack.

PGXN v2: Go or Rust?

TL;DR: I’d like Postgres community input on a decision: Should we build PGXN v2 services and tools in Go or Rust? Context for the question and some weighing of options constitutes the rest of this post, but to skip to the end, 🗳️ Vote your choice! Poll closes April 12 at the end of the day (midnight) New York time.


The PGXN v2 project now under way requires developing or updating several services and tools, including:

And more. Historically, the PGXN tools were written in Perl, which was an ideal choice for me back in 2011, and quite familiar to many members of the core team, but also deeply foreign to most everyone else. Furthermore, its dynamic nature and need for dozens of dependencies in most apps makes installation and packaging a challenge, to say the least.1

In the last ten years I’ve become quite proficient in Go. I appreciate its near system-level performance, memory safety, robust standard library, concurrency design, and short learning curve — especially for web services. But perhaps most eye-opening to me, as a long-time user of dynamic languages, is that, like C, Go compiles an application into a single static binary. Not only that, but Go provides cross compilation natively. This makes distribution incredibly simple.

Distribution Digression

Compare, for example, the Dockerfile for Sqitch, the database change management system I wrote and maintain in Perl. It’s…a lot. Sure there are a ton of system dependencies, but what’s invisible in this file is the weeks of work that went into Module::Build::Sqitch, which performs a bunch of tricks to build the Sqitch “app” as a single directory with all of its Perl dependencies. Don’t get me wrong, the work was worth it for Sqitch, and powers the Homebrew formula, as well. But even there, I’ve not been able to get Sqitch into the Homebrew core because every single dependency requires a checksum, and I’ve not had the time (or energy) to figure out how to generate them.

Contrast with this Dockerfile for a Go service compiled into a binary named thinko:

FROM gcr.io/distroless/base-debian12:latest

# TARGETOS and TARGETARCH: https://docs.docker.com/build/guide/multi-platform/
ARG TARGETOS
ARG TARGETARCH

COPY "_build/${TARGETOS}-${TARGETARCH}/thinko" /thinko/bin/
USER nonroot:nonroot
ENTRYPOINT [ "/thinko/bin/thinko" ]

That’s the whole thing. There are no dependencies at all, aside from a few included in the distroless image. And where do those thinko binaries come from? This is the relevant bit from the project Makefile:

.PHONY: all # Build all binaries
all: local linux darwin windows freebsd

linux: thinko-linux
darwin: thinko-darwin
windows: thinko-windows
freebsd: thinko-freebsd

thinko-linux: _build/linux-amd64/thinko _build/linux-arm64/thinko
thinko-darwin: _build/darwin-amd64/thinko _build/darwin-arm64/thinko
thinko-windows: _build/windows-amd64/thinko _build/windows-arm64/thinko
thinko-freebsd: _build/freebsd-amd64/thinko _build/freebsd-arm64/thinko

# Build Thinko for specific platform
_build/%/thinko: cmd/thinko
	GOOS=$(word 1,$(subst -, ,$*)) GOARCH=$(word 2,$(subst -, ,$*)) $(GO) build -o $@ ./$<

This configuration allows me to build thinko for every OS and architecture at once:

$ make all
go build -o _build/local/thinko ./cmd/thinko
GOOS=linux GOARCH=amd64 go build -o _build/linux-amd64/thinko ./cmd/thinko
GOOS=linux GOARCH=arm64 go build -o _build/linux-arm64/thinko ./cmd/thinko
GOOS=darwin GOARCH=amd64 go build -o _build/darwin-amd64/thinko ./cmd/thinko
GOOS=darwin GOARCH=arm64 go build -o _build/darwin-arm64/thinko ./cmd/thinko
GOOS=windows GOARCH=amd64 go build -o _build/windows-amd64/thinko ./cmd/thinko
GOOS=windows GOARCH=arm64 go build -o _build/windows-arm64/thinko ./cmd/thinko
GOOS=freebsd GOARCH=amd64 go build -o _build/freebsd-amd64/thinko ./cmd/thinko
GOOS=freebsd GOARCH=arm64 go build -o _build/freebsd-arm64/thinko ./cmd/thinko

The Linux lines build thinko for amd64 and arm64, right where the Dockerfile expects them. Building the images is then easy; a separate make target runs the equivalent of:

$ docker buildx build --platform linux/arm64 -f dist/Dockerfile .
$ docker buildx build --platform linux/amd64 -f dist/Dockerfile .

The --platform flag sets the TARGETOS and TARGETARCH arguments in the Dockerfile, and because the directories into which each binary was compiled use these same terms, the binary compiled for the right OS and architecture can be copied right in.

And that’s it, it’s ready to ship! No mucking with dependencies, tweaking system issues, removing unneeded stuff from the image. It’s just the bare minimum.

This pattern works not just for Docker images, of course. See, for example, how Hugo, the Go blog generator, releases tarballs for a bunch of OSes and architectures, each containing nothing more than a README.md, LICENSE.md, and the hugo binary itself. This pattern allows both the Hugo Homebrew formula and its Dockerfile to be incredibly simple.

Back to PGXN

I very much want these advantages for the next generation of PGXN tools. Not only the services, but also the command-line client, which would become very easy to distribute to a wide variety of platforms with minimal effort.

But there are other variables to weigh in the choice of language for the PGXN servers and tools, including:

  • Familiarity to other developers: Ideally someone can quickly contribute to a project because they’re familiar with the language, or there’s a short learning curve.

  • Safety from common issues and vulnerabilities such as buffer overflows and dangling pointers.

  • Tooling for robust and integrated development, including dependency management, testing, distribution, and of course cross-compilation.

Decisions, Decisions

In my experience, there are two languages that fulfill these requirements very well:

Which should we use? Some relevant notes:

  • I expect to do the bulk of the initial development on PGXN v2, as the only person currently dedicated full time to the project, and I’m most familiar with Go — indeed I enjoy writing web services and CLIs in Go! I’d therefore be able to ship Go tools more quickly.

  • But I’ve played around with Rust a number of times over the years, and very much would like to learn more. Its syntax and long feature list steepen the learning curve, but given my background in Perl — another language with unique syntax and context-sensitive features — I’m certain I could become incredibly proficient in Rust after a few months.

  • My employer, Tembo, is a Rust shop, and we’ll likely borrow heavily from the trunk project, especially for the CLI and binary registry. It would also be easier for my coworkers to contribute.

  • pgrx, the tooling to build Postgres extensions in Rust, has taken the community by storm, rapidly building familiarity with the language among extensions developers. Perhaps some of those developers would also be willing to turn their expertise to PGXN Rust contributions, as well. It’s likely some features could be borrowed, as well.

  • Sadly, the plgo project appears to have stalled, so has not built up the same community momentum.

This leaves me torn! But it’s time to start coding, so it’s also time to make some decisions. Should PGXN v2 services and tools be:

  1. 🐿️ Written in Go
  2. 🦀 Written in Rust
  3. 🐿️ + 🦀 Some of each (e.g., Go for web services and Rust for CLIs)

What do you think? If you were to contribute to PGXN, what language would you like to work in? Do you think one language or the other would be more compatible with community direction or core development?3

Got an opinion? 🗳️ Vote! Poll closes April 12 at the end of the day (midnight) New York time.

And if those choices aren’t enough for you, please come yell at me on Mastodon, or via the #extensions channel on the Postgres Slack. Thanks!


  1. Ever wonder why PGXN isn’t hosted by community servers? It’s because I screwed up the installation trying to balance all the dependencies without wiping out Perl modules the systems depend on. 🤦🏻‍♂️ ↩︎

  2. Pity there’s no gopher emoji yet. ↩︎

  3. I can imagine a future where an extension CLI was included in core. ↩︎

Mini Summit Two

We had such thoughtful and engaged discussion at this week’s Postgres Extension Ecosystem Mini-Summit! I did learn that one has to reserve a spot for each mini-summit individually, however. Eventbrite sends reminders for each one you sign up for, not all of them.

To reserve a spot and be reminded for forthcoming meetings, hit the Eventbrite page and select a date and hit “Reserve a Spot” for each date you’d like to attend.

Back to this week’s meetup. My colleague Ian Stanton of Tembo gave a great talk, “Building Trunk: A Postgres Extension Registry and CLI”, that provided background on the motivations and problems that inspired the creation of trunk, a binary packaging system for Postgres extensions.

The presentation was followed by 35+ minutes of questions, discussion, and brainstorming, which I’ve summarized below. But first, links!

Now down to business.

Introduction

  • I opened the meeting and introduced Ian Stanton.

Presentation

  • Ian introduced himself and trunk, “an extension registry and CLI supporting Tembo Cloud.” Wants to tell a story, starting with backstory.

  • Tembo, founded November 2022, provides a managed Postgres solution called Tembo Cloud. Idea is Postgres can be used for so many different things through the power of extensions, so built use-case-optimized “stacks”: flavors of Postgres powered by extensions and configurations. Super proud of them, including Message Queue, for which we built an open-source extension.

  • Envisioned ability to install any extension, including user-provided extensions. Knew we’d need an extension management solution. So we built it.

  • It’s called trunk, an extension registry and CLI, an open-source app for the community that hosts binary packages for extensions, and powers Tembo Cloud’s extension management.

  • Q1 2023 had built Tembo Cloud v1 with all extensions bundled in containers. But wanted a way to install them on the fly, ideally with installable packages. Explored the ecosystem for a tool we could use.

  • PGXN was the first we found. Love it, backed by the community, been around since 2011, but hosts source code, not binaries. Also little development since 2012.

  • Apt and Yum repositories are community-backed and distribute binaries, just what we wanted, but a smaller subset of extensions relative to the 1000s available. Thought it would be too time-consuming to add them all through the community process.

    • Steven Miller in chat: “Also with apt packaging, it requires to install to a specific path, but we needed to customize the install path based on what pg_config shows for share lib and package lib dir. That way we could persist extension installations on tembo cloud”
  • Weighed pros and cons of building one. Pros:

    • Full control over integration with Tembo Cloud
    • Binary distribution
    • We could build new features quickly
    • We could publish new extensions quickly

    Cons:

    • How will the community react?
    • Recreating the wheel?
  • Expected to publish 2–3 extensions a day, only do-able with a solution we built.

  • Want to build something meaningful for Tembo Cloud and the community.

  • Astronomer Registry for Airflow: Built by Astronomer to find modules for Airflow, very well received by the community.

  • PGXN, Apt, and Yum repos: Wanted to take the best of them and build on it.

  • crates.io: Wanted a similar great experience for Postgres extensions.

  • Vision boiled down to discoverability, categories, ratings system, certification, and indexing of cloud provider support.

  • Want to package any extension, whether SQL, C/SQL, or pgrx.

  • Simple experience, like cargo publish and cargo install cargo-pgrx.

    Eric in chat: “❤️”

  • Hopes and Dreams: had idea people would magically show up, contribute to the code, and publish their extensions. Wanted to support multiple platforms, architectures, and Postgres versions, and for it to be a one-stop shop for Postgres extensions.

  • How it works.

  • CLI and Registry, written in Rust, uses Docker to build extensions. Packages named <trunk-project-name>-<version>-<pg-version>.tar.gz. Published with trunk publish and installed with trunk install, putting all the files in the right places.

    • Steven Miller in chat: “The approach to use docker for building has been nice. It allows for cross-compile, for example, building for any platform docker supports with the —platform flag”
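
    The command-line flow looks roughly like this; a sketch based on the description above, and the exact arguments may differ:

    $ trunk build                                  # compile the extension in Docker
    $ trunk publish pg_semver --version 0.32.1     # upload the tar.gz and metadata
    $ trunk install pg_semver                      # copy files into the pg_config locations
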
  • Registry stores metadata and serves the web site and API, and uses an S3 bucket for the tar-gzip files.

  • Example building semver extension:

    Create Trunk bundle:
    bitcode/src/semver/src/semver.bc
    bitcode/src/semver.index.bc
    semver.so
    licenses/LICENSE
    extension/semver--0.10.0--0.11.0.sql
    extension/semver--0.11.0--0.12.0.sql
    extension/semver--0.12.0--0.13.0.sql
    extension/semver--0.13.0--0.15.0.sql
    extension/semver--0.15.0--0.16.0.sql
    extension/semver--0.16.0--0.17.0.sql
    extension/semver--0.17.0--0.20.0.sql
    extension/semver--0.2.1--0.2.4.sql
    extension/semver--0.2.4--0.3.0.sql
    extension/semver--0.20.0--0.21.0.sql
    extension/semver--0.21.0--0.22.0.sql
    extension/semver--0.22.0--0.30.0.sql
    extension/semver--0.3.0--0.4.0.sql
    extension/semver--0.30.0--0.31.0.sql
    extension/semver--0.31.0--0.31.1.sql
    extension/semver--0.31.1--0.31.2.sql
    extension/semver--0.31.2--0.32.0.sql
    extension/semver--0.32.1.sql
    extension/semver--0.5.0--0.10.0.sql
    extension/semver--unpackaged--0.2.1.sql
    extension/semver.control
    extension/semver.sql
    manifest.json
    Packaged to ./.trunk/pg_semver-0.32.1-pg15.tar.gz
    

    Packages up the SQL files, control file, shared object files, and bitcode files into a gzip file.

  • Once it’s published, API surfaces all this information:

    [
      {
        "name": "pg_semver",
        "description": "A semantic version data type for PostgreSQL.",
        "documentation_link": "https://github.com/theory/pg-semver",
        "repository_link": "https://github.com/theory/pg-semver",
        "version": "0.32.0",
        "postgres_versions": [
          15
        ],
        "extensions": [
          {
            "extension_name": "semver",
            "version": "0.32.0",
            "trunk_project_name": "pg_semver",
            "dependencies_extension_names": null,
            "loadable_libraries": null,
            "configurations": null,
            "control_file": {
              "absent": false,
              "content": ""
            }
          }
        ],
        "downloads": [
          {
            "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg15-0.32.0.tar.gz",
            "pg_version": 15,
            "platform": "linux/amd64",
            "sha256": "016249a3aeec1dc431fe14b2cb3c252b76f07133ea5954e2372f1a9f2178091b"
          }
        ]
      },
      {
        "name": "pg_semver",
        "description": "A semantic version data type for PostgreSQL.",
        "documentation_link": "https://github.com/theory/pg-semver",
        "repository_link": "https://github.com/theory/pg-semver",
        "version": "0.32.1",
        "postgres_versions": [
          15,
          14,
          16
        ],
        "extensions": [
          {
            "extension_name": "semver",
            "version": "0.32.1",
            "trunk_project_name": "pg_semver",
            "dependencies_extension_names": null,
            "loadable_libraries": null,
            "configurations": null,
            "control_file": {
              "absent": false,
              "content": "# semver extension\ncomment = 'Semantic version data type'\ndefault_version = '0.32.1'\nmodule_pathname = '$libdir/semver'\nrelocatable = true\n"
            }
          }
        ],
        "downloads": [
          {
            "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg14-0.32.1.tar.gz",
            "pg_version": 14,
            "platform": "linux/amd64",
            "sha256": "f412cfb4722eac32a38dbcc7cd4201d95f07fd88b7abc623cd84c77aecc8d4bb"
          },
          {
            "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg15-0.32.1.tar.gz",
            "pg_version": 15,
            "platform": "linux/amd64",
            "sha256": "9213771ffc44fb5a88726770f88fd13e62118b0f861e23271c3eeee427a23be9"
          },
          {
            "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg16-0.32.1.tar.gz",
            "pg_version": 16,
            "platform": "linux/amd64",
            "sha256": "8ffe4fa491f13a1764580d274e9f9909af4461aacbeb15857ab2fa235b152117"
          }
        ]
      }
    ]
    

    Includes different tar-gzip files for different versions of Postgres, the contents of the control file, dependencies, loadable libraries and configurations, and the one extension in this package (some packages, like PostGIS, can have many). Then Postgres version support and some other metadata.

  • What it looks like on the web site, includes README contents, data from the last slide, install command, etc.

  • This is what installation looks like:

    $ trunk install pg_semver
    Using pkglibdir: "/usr/lib/postgresql/16/lib"
    Using sharedir: "/usr/share/postgresql/16"
    Using Postgres version: 16
    info: Downloading from: https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg16-0.32.1.tar.gz
    info: Dependent extensions to be installed: []
    info: Installing pg_semver 0.32.1
    [+] bitcode/src/semver/src/semver.bc => /usr/lib/postgresql/16/lib
    [+] bitcode/src/semver.index.bc => /usr/lib/postgresql/16/lib
    [+] semver.so => /usr/lib/postgresql/16/lib
    info: Skipping license file licenses/LICENSE
    [+] extension/semver--0.10.0--0.11.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.11.0--0.12.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.12.0--0.13.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.13.0--0.15.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.15.0--0.16.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.16.0--0.17.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.17.0--0.20.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.2.1--0.2.4.sql => /usr/share/postgresql/16
    [+] extension/semver--0.2.4--0.3.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.20.0--0.21.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.21.0--0.22.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.22.0--0.30.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.3.0--0.4.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.30.0--0.31.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.31.0--0.31.1.sql => /usr/share/postgresql/16
    [+] extension/semver--0.31.1--0.31.2.sql => /usr/share/postgresql/16
    [+] extension/semver--0.31.2--0.32.0.sql => /usr/share/postgresql/16
    [+] extension/semver--0.32.1.sql => /usr/share/postgresql/16
    [+] extension/semver--0.5.0--0.10.0.sql => /usr/share/postgresql/16
    [+] extension/semver--unpackaged--0.2.1.sql => /usr/share/postgresql/16
    [+] extension/semver.control => /usr/share/postgresql/16
    [+] extension/semver.sql => /usr/share/postgresql/16
    
    ***************************
    * POST INSTALLATION STEPS *
    ***************************
    
    Install the following system-level dependencies:
            On systems using apt:
                libc6
    
    Enable the extension with:
           CREATE EXTENSION IF NOT EXISTS semver CASCADE;
    

    The CLI pulls down the tar-gzip, unpacks it, puts the files in the right places, and tells the user what other commands are needed to enable the extension.

  • Pause to take a sip of water.

    • David Wheeler (he/him) in chat: “STAY HYDRATED PEOPLE!”
  • State of the project. Trunk powers extension management for Tembo Cloud, 200 extensions on the platform, install and enable on the fly. Tembo Cloud likely trunk’s #1 user.

  • Get lots of site traffic, especially around categorization, which addresses the discoverability problem we set out to solve.

    • Jeremy S in chat: “Interested in insights from site traffic - you mentioned that ‘categorization’ was popular - any other things that traffic patterns seem to suggest you might have done really well, or clearly is needed?”
  • But pretty minimal community involvement, our fault for not involving the community early on.

  • Did we solve the problem?

    • For Tembo Cloud: yes! Trunk is a core component of the Tembo Cloud platform that lets us offer a high number of extensions.
    • For the community: no! But helped bring more awareness to the opportunities to improve the ecosystem as a community.
    • Saw other solutions arise around the same time, including dbdev and pgxman, and Yurri at Omnigres is working on something as well. Huge opportunity to solve this together.
    • Steven Miller in chat: “I think it is very nice way to install other extensions via an extension how dbdev works”
    • David Wheeler (he/him) in chat: “GRANT!”
    • Grant Holly in chat: “Oh hi”
  • Lessons Learned

    • It’s a really hard problem to solve! As you add more layers of complexity, like different architectures, versions of Postgres, it gets harder and harder.

      • Steven Miller in chat, Replying to “The approach to use …”: “The downside for this approach includes missing chip-specific instruction support, for example AVX512, which optimizes performance on some extensions. However if you are building with docker on the same architecture as the host, then it still includes these instructions.”

      • David Wheeler (he/him) in chat, Replying to “The approach to use …” “Also presumably no support for building for non-Linux platforms, yes?”

    • The extension ecosystem truly is the wild west: there aren’t really best practices around building, versioning, and releasing, and when you’re collecting and housing them, that makes things difficult. A huge opportunity for us to come up with those standards and share them with the community.

    • Community involvement is crucial; wish we’d done it better early on, and that’s why we’re all here today! The solution we build together doesn’t happen if we don’t tackle it as a community.

    • Similarly, wish we’d reached out to folks like David and Devrim early on, to get more insight from them and bring the community into the project from the beginning

  • The future of trunk

    • Registry and CLI will continue to serve Tembo Cloud

    • Has paved the way for binary packaging and distribution in the PGXN v2 that David is spearheading; it will at least inform and perhaps be the basis for that part of the project.

  • That’s all, thank you, back to you, David!

Discussion

  • David Wheeler (he/him): Thanks for the history and context, Ian! Questions or discussion topics? Some comments in the thread from Steven and Tobias.

    • Tobias Bussmann in chat: speaking of paths: it would be super helpful if postgresql would support loading extensions from additional paths beside the $SHAREDIR/extension and $PKGLIBDIR directories. At least following directory symlinks within…

    • Steven Miller in chat, Replying to “The approach to use …”: I tried to make it work for Mac, for example, but the docker support didn’t work basically. I think it might work for Windows, since they have better container support. However I didn’t try that yet.

  • David Wheeler (he/him): Reads Tobias’s comment. You can specify a subdirectory in the sharedir and maybe the moduledir? But it’s a little hinky right now. Steve, do you want to talk about the use of Docker to build images?

  • Steven Miller: Yeah, I’d love to. To Tobias’s point, agree: on Tembo Cloud, we have a persistent directory where we’re installing extensions, but because there is no way to add an extra sharedir or package dir, we’re persisting all of the library files, including Postgres core. Not ideal, especially for upgrades.

    Approach for building in Docker: been nice, do the build in a Dockerfile, start the container, then install and compare the difference between layers and zip up all the new files. Great for cross-compile, but not working for Mac or other systems. Will need a fallback option to do a local build.

    • Jeremy S: in chat, Replying to “speaking of paths: i…”: Exactly same point was also mentioned just this morning on slack by Matthias
  • David Wheeler (he/him): Makes sense, thanks. What other bits do you feel like could be useful for packaging binaries at a community level?

  • Steven Miller: Sometimes we install binaries with trunk, but then it’s difficult to know what has been installed. Nothing like apt, where there is a history of what has been installed or uninstalled. Would be nice to do something like trunk list and see everything that has been installed. Also, the future should be not just install but management, including turning extensions on, and there are a lot of ways to turn them on.

  • Ian Stanton: Uninstall would be useful, too.

  • David Wheeler (he/him): Other questions about trunk or challenges to binary distribution it brings?

    • Tobias Bussmann in chat, Replying to “speaking of paths: i…”: this would allow an immutable PostgreSQL base package and still allow to install extensions on top. This is esp. important if you need to have singned packages like on macOS

    • nils in chat, Replying to “speaking of paths: i…”: Guess there is some prior art in how search_path in pg work, or the PATH in unix’s.

      Should be doable to allow to specify some kind of ordered search path, where Postgres will look for extensions. That way, Postgres can protect it’s own libs to no be overwritten by external libs, but allow for loading them from extra paths.

  • Yurri: There are CREATE EXTENSION extensions and other extensions like logical decoding plugins. Does trunk handle them?

  • Steven Miller: We think of it as four types of extensions in a 2×2 matrix: 1. does it require CREATE EXTENSION, true or false; and 2. does it have a loadable library, true or false. The false/false category is output plugins; the true/true category is, e.g., pg_partman and pg_cron; CREATE EXTENSION false and loadable library true is, e.g., auto_explain, just a library, no upgrade concerns; and then CREATE EXTENSION true and loadable library false is the default case. The matrix is laid out below.
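
    Laid out as described, with the examples named above:

                              loadable library: yes    loadable library: no
    CREATE EXTENSION: yes     pg_partman, pg_cron      the default case
    CREATE EXTENSION: no      auto_explain             output plugins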

  • Ian Stanton: Steven wrote a blog on this.

    • Eric in chat: Does pgrx make the process of building easier or harder and is there anything we can do today to make rust extension building better?

    • Jason Petersen in chat: Yeah, it sounds like we need some sort of system database like apt has; would enable management, uninstall, version list, whether upgrades are available, etc

  • Yurri: That would be great. What other modules are there without extensions, like auto_explain?

  • Ian Stanton: auth_delay is another, basebackup_to_shell; they’re considered parts of Postgres, but we have trouble categorizing them. There are 10-15 I’ve come across.

  • Yurri: Are these categories on Tembo, can you click a button?

  • Ian Stanton: Not a category, but would be a good one to add.

  • Yurri: Did you say output plugins are handled with Tembo?

  • Steven Miller: You can install them with trunk, yes.

  • Yurri: And you have the build pipeline that will work without plugins too, yeah, cool.

  • David Wheeler (he/him): Tobias, did you want to say more about the path issues?

  • Tobias Bussmann: Sure! We are building Postgres.app, a distribution for macOS, which works differently from Linux systems. We distribute some extensions directly, but also allow building and installing extensions on it. Works nicely, even with the pgxn client, but it’s built within the application, which breaks the code signature.

    We always have to fight against a breaking system to allow that. Possible, but would be much cleaner to specify an extra directory where extensions could be loaded, and we could distribute packages with binary extensions that the user could download and install separately from the Postgres.app.

  • David Wheeler (he/him): You’re not suggesting a different directory for every extension with a module, but just another path in the search path that’s not subject to the signature verification.

  • Tobias Bussmann: Yes, that would be an option, but with a flexible system could use one per extension or just specify a second directory. Contrib extensions sometimes seen as part of Postgres, and they’re all stuffed in the same directory with third party extensions, which gets confusing and hard to manage.

    • Steven Miller in chat: In the previous extensions mini summit, Yuri mentioned that he was working on a patch to postgres for extra libdir, extra share dir, but I have not been tracking this one
  • nils: That’s what I was saying in chat, there is prior art in Postgres and Unix systems, where you can specify a search path in Postgres for a list of schemas, and in Unix the path to find binaries. Give me a delimited list of directories on my system. Could be super user only, where they can specify where they’re installed, and we can go through the list to find an extension.

  • David Wheeler (he/him): I might be imagining this, but I seem to recall there was a proposal to have extensions in their own directories, which would be nice for packaging, but then every time you add one you have to add another directory to the list and there is some fear the lookup time could be too long.

    • Jeremy S in chat, replying to “speaking of paths: i…”: (Or like LD_LIBRARY_PATH )

    • David Wheeler (he/him) in chat, replying to “speaking of paths: i…”: LD_LIBRARY_PATH is all but dead on macOS

  • Jason Petersen: If it happens at startup I can’t imagine that being a concern. If the list changes you reboot. It’s not gonna be a performance problem, I hope.

  • David Wheeler (he/him): Or HUP it if you don’t want downtime.

  • Jason Petersen: Sure, but it doesn’t need to be on every command.
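
    [Ed. note: for the library half of this, Postgres already has dynamic_library_path, a colon-separated list of directories searched for loadable modules; there is no analogous setting for extension control and SQL files. A purely hypothetical postgresql.conf sketch of what the discussion describes, with an invented extension_search_path name:]

      # real, existing setting: where Postgres looks for shared libraries
      dynamic_library_path = '$libdir:/opt/extensions/lib'

      # hypothetical setting sketched in this discussion: extra directories to
      # search for extension control and SQL files beyond $SHAREDIR/extension
      extension_search_path = '/usr/share/postgresql/16/extension:/opt/extensions/share'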

  • David Wheeler (he/him): Eric, do you want to pose your question about pgrx?

  • Eric: Sure. Wanted to know, were there stumbling blocks to get pgrx support built into trunk, and does it make things easy or difficult? Different from C path, are there things we could do to make things easier today?

  • Ian Stanton: Yeah, I think the issue is mostly on our end. We have a separate image for each version of pgrx, and keeping up with the releases is challenging. We need to rethink our image building strategy. Shouldn’t be one image for each version of pgrx. That’s the biggest thing I’ve noticed, mostly on our side.

  • David Wheeler (he/him): Because you need to install the version of pgrx that the extension requires before you do the build, and that’s just too slow?

  • Ian Stanton: Could be too slow. We’ve known about this problem for some time, just hasn’t been addressed yet.

  • Eric: Okay, maybe we can talk about it offline one day, be happy to chat. I think we’re close to being able to have the CLI, cargo-pgrx, be a different version than whatever version the extension uses.

  • Ian Stanton: That would be super useful!

  • Eric: Yeah, I think we’re close to being at that point, if not there already. We can talk about that offline.

  • Ian Stanton: Nice! We’ll reach out in Discord.

  • David Wheeler (he/him): Other comments or questions, or people who have worked on other kinds of binary registry things, would love to hear more from other perspectives. Devrim is going to talk about the Yum repository next week [ed. correction: in two weeks].

  • Steven Miller in chat: Daniele last time mentioned Pip is good example of mixing source and binary distributions

  • Eric: I have a random question related to this. In the past and recent history, has hackers talked about some way of storing extensions in the database rather than relying on the file system?

  • David Wheeler (he/him): Yes! In this long thread from 2011 [ed. Correction: 2013] Dimitri was proposing a “unit”, a placeholder name, where the object would be stored in the database. Very long thread, I didn’t read the whole thing, lots of security challenges with it. If it needs a shared object library that has to be written to the file system, it’s just not going to happen. I don’t know whether that’d be required or not.

    Dimitri also worked on a project called pginstall where you could install extensions from the database like dbdev, but not just TLEs, but anything. The idea is a build farm would build binaries and the function in the database would go to the registry and pull down the binaries and put them in the right places on the file system.

    There were a lot of interesting ideas floating around, but because of the legacy of the PGXS stuff, it has always been a bit of a struggle to decide not to use it, to support something not just on the machine, but do something over libpq or in SQL. Lot of talk, not a lot of action.

    • Tobias Bussmann in chat in response to “In the previous ex…”: still searching on hacker for it. Meanwhile I found: https://commitfest.postgresql.org/5/170/

    • Steven Miller in chat: That approach is very awesome (install via extension)

  • Eric: I can see why it would take some time to sort it all out. It’s one thing to require super user privileges to create an extension, but also having root on the box itself? Yeah.

  • Yurri: TLE plugs into that a little bit, for non-shared-object extensions. Not exactly storing it in the database, but it does provide a SQL/function-based method of installing from inside the database, but only for trusted languages, not shared objects.

  • David Wheeler (he/him): dbdev install does download it from database.dev and stores it in the database, and has hooks into the CREATE EXTENSION command and pulls it out of its own catalog. Was a similar model with pginstall, but with binary support, too.

  • Yurri: Back to trunk. When you start building and have to deal with binaries: on PGXN you can put the source up there, but I want to get to the whole matrix of all the different versions. Every extension author does it a little differently. Some extensions have versions for Postgres 15, another for 14, some have the same version across all the majors, sometimes an extension works for some majors and not others. Has trunk expanded to other Postgres versions to support the whole exploding matrix of stuff that does and doesn’t work? With 5-6 majors it gets to be a large matrix, a lot to keep track of. How’s that working out for the builds and managing that matrix?

    • Steven Miller in chat: Dimensions I think are:
      • pg version
      • architecture
      • chip-specific instructions (edge case for native builds?)
  • Steven Miller in chat: We just announced support for 14 and 16

  • David Wheeler (he/him) in chat, replying to “Dimensions I think a…”: OS, OS version

  • Steven Miller in chat, replying to “Dimensions I think a…”: Ah right

  • Ian Stanton: Steven do you want to take that one?

  • Steven Miller: Oh yeah. We’ve started toe-dipping on this one. Started with Tembo Cloud’s platform, but have now released Postgres 14 and 16, and also trunk has built-in support for other architectures, such as arm, or whatever the Docker --platform flag supports. We looked at Mac builds, not working yet, might work for Windows, which has better container support, but I don’t know; and also there is an edge case for pgvector especially, which compiles to include chip-specific instructions for AVX512, which helps with vector. So that’s another dimension to consider.

  • Yurri: Part of the idea behind this forum is to see if we can chart a path forward, maybe not solve everything. What can we solve, how can we make something a little better for Postgres at large?

    • Eric in chat: Even as a Mac user I don’t know the answer to this… what’s the common Postgres package there? Postgres dot app, homebrew, something else?

    • David Wheeler (he/him) in chat: pgenv! (self-promotion)

    • Eric in chat: I assume folks don’t use macOS in prod but developers are important too

    • nils in chat, Replying to “Even as a Mac user I…”:

      $ git clone ..
      $ ./configure
      $ make
      $ make install
      

      At least that is what I do 😄

  • Steven Miller: In my opinion, the way to approach it is to know all the dimensions you need, and in the metadata API say which binaries are available. Then get through it with testing and badging. If we let things get built, to what extent is it tested and used? That can help. Daniele was in the previous call and said we could look to Pip and Wheel files for inspiration, and Adam on our team has said the same. This is something that has some binary and some source, and falls back on doing the build when it needs to.

  • David Wheeler (he/him): I’ve been thinking about this quite a bit lately. Can see needing to take advantage of multiple platforms available through GitHub workflow nodes or the community’s build farm, which has a vast array of different architectures and platforms to build stuff. There are precedents!

    I imagine a system where, when something is published on PGXN, another system is notified and queues it up to all its build farm members to build binaries, ideally without full paths like trunk, and making them available for those platforms. Building out that infrastructure will take a fair bit of effort, I think. Where cross-compiling is available it might be…doable? Mostly for SQL and maybe Rust or Go extensions, but a challenge for C extensions.

    This is a problem I’d like us to solve in the next year or two.

    • Steven Miller in chat, replying to “I assume folks don’t…”: Yeah exactly, like trunk install after brew install postgres

    • Tobias Bussmann in chat, replying to “Even as a Mac user…”: this seems to be quite spread. There are also people that prefer docker based installs

    • Eric in chat: pgrx supports cross compilation

      With a caveat or two!

    • Eric in chat, replying to “Even as a Mac user I…”: @nils same. For v9.3 though 16!

  • David Wheeler (he/him): What else? Reading the comments.

  • Yurri: I think maybe that PGXN JSON file, I know you’ve been spending time on it, David, including the proposal on namespacing a few days ago. That feels like it could be helpful to be part of this. If it could be something we could center around… The first time I wanted to put an extension on PGXN, it took me a long time to figure out that JSON file. I didn’t find the blog post that goes through it in nice detail till like two weeks after. If I’d found it sooner I could have skipped so many things I tried to figure out on my own.

    If we can center around that file, it’ll draw more attention to it, more links back to it, more examples people blog about here and there, it helps going forward. The trick is getting it right not being this massive thing no one can figure out, or has too many options, but hits all the points we need.

    • nils in chat, replying to “Even as a Mac user I…”: Well, mostly for extension, for Postgres I rely on David’s pgenv

    • Eric in chat, replying to “Even as a Mac user I…”: @Tobias Bussmann hmm. Makes it difficult to get an extension installed.

  • David Wheeler (he/him): I’ve been thinking about this a lot, drafted a doc some of my colleagues at Tembo have read over and I hope to publish soon [ed. Note: now published], thinking through what a v2 of the PGXN Meta Spec might include. I think we should extend it with a list of external libraries required, the architectures it supports, whether it’s a loadable library or an app that doesn’t even go into the database.

    I would like soon to draft an actual revision of the spec, and document it well but also turn it into a JSON Schema document so we can automate publication and verification in the same place. I also imagine building an eventual replacement or evolution of the PGXN client or trunk client, or some client that you can use to manage that thing. I think pgrx does that, adding metadata via the client rather than having to parse and understand the whole file.

    I’m with you it could get really complicated, but I’m not sure I see an alternative other than building good tooling to minimize the pain.

  • Ian Stanton: I think automatically pulling that information when it’s readily available would be super helpful. We use it as an app to just take care of things for people.

  • *David Wheeler (he/him): Right, and then if we’re successful in getting it done it’s getting people to take up the tools and start using them. There’s only so much we can infer. I can tell how to do a build if there’s a Makefile or a configure file or a cargo.toml, but that doesn’t reveal what libraries are required. This is why there’s a lot of hand-tuning of RPM and Apt spec files.

    • Steven Miller in chat: We are calling this “system dependencies”

      Ssl and glibc the main ones 🙂

    • Jason Petersen in chat: And sometimes the package names aren’t even 1—1 mappings

    • Eric in chat: Ha! Try relying on elasticsearch as a runtime dependency! 😞

  • Yurri: That’s another thing to touch on. A lot of extensions are just a thin layer of glue between Postgres and some OSS library that someone else maintains. But the trick, when you want to build a Yum package, is that the dependency has a different name in the RedHat ecosystem vs. the Debian ecosystem. So part of what Devrim has to do to maintain the RPM packages is manually sort all that out, because you can’t automatically… libc! It’s called glibc in RedHat and just libc in Debian, and every package has slightly different names. So how do you manage that in trunk? Do you pull the source for any dependencies? Does your Docker image…I don’t know how this is working.

    • David Wheeler (he/him) in chat: I want to build a json schema validation extension in Rust using https://github.com/Stranger6667/jsonschema-rs or something

    • Tobias Bussmann in chat, replying to “Ha! Try relying o…”: or V8 🤯

  • Ian Stanton: Two sides to that: one is build-time dependencies, and then there are runtime dependencies. I just dropped an example for some random extension. The way we’ve been building this is to write out a Dockerfile that can include build-time dependencies. [hunts for link…]

  • Ian Stanton: We specify them all there. But for runtime, we don’t know what’s required until we test the thing. We have stuff in our CI pipelines to install and enable the extension to see if it works. If it doesn’t, it will report a missing dependency. Then we know we need to add it to our Postgres images. Not the best flow for finding these dependencies. Steven, want to add anything more to the build time dependency piece?

    • David Wheeler (he/him) in chat, replying to “Ha! Try relying on …”: Next version of plv8 released on PGXN will have v8 bundled
  • Steven Miller: A lot share the same ones, SSL and glibc, so we just build with the same versions we run on Tembo Cloud. In the metadata we list all system dependencies; that’s what we build towards, and include them in the Docker image. If you pick a different stack, like the Machine Learning stack, it has all the Python stuff in the base image. We don’t really love this, but this is something where Python wheel might inspire us, because it has packaging and system dependencies.

    • Eric in chat, replying to “I want to build a js…”: I feel like I’ve seen one already?

    • David Wheeler (he/him) in chat, replying to “I want to build a js…”: GIMME

  • Yurri: If you really want to do this right, just like in the RPM repositories, you have to know what the dependencies are. David, I’m curious what your thoughts are: if this is to be done right, there has to be a way to indicate dependencies in the META.json file. But then I’m talking about Debian and RedHat; what about Mac? Windows doesn’t really have a packaging system. There are BSDs, other places Postgres can run; you probably have to narrow the scope a bit to solve something.

  • Tobias Bussmann in chat, responding to “Ha! Try relying o…”: Sounds promising, but for which architectures? I have good hope for pljs as replacement for plv8

  • Ian Stanton in chat: https://github.com/tembo-io/trunk/blob/d199346/contrib/fuzzystrmatch/Trunk.toml#L13

  • David Wheeler (he/him): Fortunately there are only around 1100 extensions in the world, a relatively low barrier at this point. Some of these other things have thousands or millions of extensions.

  • Yurri: I guess when you put it that way! But I wasn’t going to go through all 1000 of them one-at-a-time.

  • David Wheeler (he/him): No. I posted about this on Ivory a few weeks ago [ed. correction: he means on Mastodon]: how does one do this in a platform-neutral way? There are some emerging standards where people are trying to figure this stuff out. One is called purl, where you specify dependencies by package URLs, or “purls”, and then it’s up to the installing client to resolve them via whatever packaging system it depends on.

    I would assume on Windows we’d have to say “it works great as long as you use Chocolatey” or something like that. But it’s certainly a difficult problem. I’m looking forward to your talk about your unique approach to solving it, Yurrii [ed. note: that’s the May 1 mini-summit], that’s going to be super interesting.

  • David G. Johnston: Ultimately you’re just crowd-sourcing. If we just say “this is what we call this thing in PostgreSQL world”, then if people need to compile it on Chocolatey on Windows, they figure it out and contribute it. Or on Debian or RedHat. Just facilitate crowd-sourcing, metadata in a database.

  • David Wheeler (he/him): My initial idea was a global registry that people contribute to just by editing files in a GitHub repository.

  • David G. Johnston: HashiCorp has to have something like that already, there’s stuff out there, no need to reinvent the wheel. This is a global problem; if we open-source it, we can solve it.

  • David Wheeler (he/him): Right. Really appreciate everyone coming. Great discussion, I appreciate it. In two weeks, Devrim Gündüz is going to talk about the Yum Community Repository and the challenges of RPMifying extensions. I had this idea of automating adding extensions to the Yum and Apt repositories, and Devrim is a little skeptical. So I super look forward to his perspective on this stuff. Two weeks from today at noon [ed.: America/New_York]. Thanks for coming!

    • Eric in chat: Thanks a ton! This is exciting stuff.

    • Tobias Bussmann in chat: Thanks all!

    • Grant Holly in chat: Thanks everyone. Great discussion

    • Jeremy S in chat: Thanks david

    • Steven Miller in chat: Thanks all! Cya next time

    • Jeremy S in chat: Isn’t bagel supposed to come for the end

    • Ian Stanton in chat: Thanks all :)

RFC: PGXN Metadata Sketch

Update 2024-03-22: To complement high-level discussions on the #extensions channel on the Postgres Slack, I also created a pull request to make it easier to comment directly on this post point by point. Keep the feedback coming, and thank you!

Update 2024-03-25: Clarified the definition of “Release” and made “Maintainers” plural. Thanks to Matthias van de Meent for the suggestions!


This post proposes a new metadata standard for extension packaging, distribution, and delivery, building on the PGXN Meta Spec to address its shortcomings and emerging use cases 12 years on. The goals include:

  • Don’t break the existing standard, but add to it
  • Add explicit support for different types of Postgres extensions, such as background workers and loadable modules
  • Add additional metadata for curation and automated binary compilation
  • Add hooks for registry and third-party-contributed metadata, including binary distribution options and stats & reports.

There are two high level categories of metadata, split into two separate files:

  1. Package Metadata provided by extension package authors
  2. Registry Metadata aggregated by the root registry from various sources, including data derived from the extension source code or package metadata, but also trusted third parties such as packagers, smoke testers, security scanners, and more.

Following community discussion of this proposal, the Package Metadata will lead to a draft for PGXN Meta Spec version 2.0.0, while the Registry Metadata will guide the design and implementation of the Root Registry APIs required to provide it.

Terms

Definition of terms as used in this document, because every language and distribution system uses terms differently. These are ours.

  • Extension: a software component that extends the capabilities of a PostgreSQL database or cluster. Extensions may be CREATE EXTENSION extensions, background workers, command-line apps, loadable modules, shared libraries, and more.

  • Package: A collection of extensions that are released, versioned, and distributed together. Packages may be downloaded directly from version control repositories or in archive files generated by a release tag.

  • Package Path: Identifies a package, declared in the package metadata file. A package path should describe both what the package does and where to find it. Typically, a package path consists of a repository root path — the directory that contains the metadata file — and a directory within the repository.

  • Repository: The version control repository where the package is developed, referenced as package path or URL.

  • Repository root path: The portion of the package path that corresponds to the root directory of the version control repository where the package is developed, and which contains the metadata file. Most packages are defined in their repository’s root directory, so this is usually the entire path. For example, github.com/example/pgtap is the repository root path for the package of the same name.

  • Source Distribution: The contents of a single package bundled together with package metadata into a distributable archive file, usually named with the last part of the package path or the main extension, a dash, and the version, e.g., pgtap-1.14.3.zip.

  • Binary Distribution: A binary software distribution of a package, compiled and packaged for a particular OS version, platform, and hardware architecture. Examples include deb, RPM, and wheel.

  • Release: A single version of the package made available to the public on PGXN, expressed as the package path, an at sign, and the semver. Example: github.com/theory/pgtap@v1.14.3.

Package Metadata

The extension author provides this data as part of a release submitted to the root registry. The registry itself will not change this data in any way, but provide it as-is.

Essentials

Essential information about the package itself, including its path, name, version, list of authors, license, etc. Ecosystem applications use this data for indexing, naming conventions, and display information.

Fields (all required):

  • Package: A package path uniquely identifying the package (usually the source repo host name and path; see this RFC)
  • Name: The name of the extension package, usually the same as the main extension, not unique but conflicts are resolved by escalation to package
  • Version: The version of the package, expressed as a SemVer
  • Abstract: A short description of the purpose of the package
  • Description: A longer description of the package, answering the question “what is this thing and what value is it?”
  • Maintainers: List of maintainers, each an object with name and either email or url (or both)
  • License: An SPDX 2.3 license expression comprised of one or more licenses from the SPDX License List
  • Spec: Information about the metadata spec itself, a version and URL

Example:

{
  "package": "github.com/example/pg-pair",
  "name": "pair",
  "version": "1.1.0",
  "abstract": "A key/value pair data type",
  "description": "Adds a key/value pair data type to Postgres, useful for specifying arbitrary key/value function parameters.",
  "maintainers": [
    {
        "name": "Naomi Nagata",
        "email": "naomi@example.com",
        "url": "https://example.com/roci/naomi"
    }
  ],
  "license": "PostgreSQL OR MIT",
  "spec": {
    "version": "2.0.0",
    "url": "https://pgxn.org/meta/spec.txt"
  }
}

Other possibilities:

  • License File: An alternative to license for nonstandard licenses (from cargo)
  • Contributors: List of people who have contributed to the package
  • Emeriti: List of former maintainers and contributors
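
If adopted, these optional fields might look something like this sketch (field names and values are illustrative only):

{
  "license_file": "LICENSE.md",
  "contributors": [
    { "name": "Alex Kamal", "email": "alex@example.com" }
  ],
  "emeriti": [
    { "name": "Josephus Miller", "url": "https://example.com/roci/miller" }
  ]
}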

PGXN Compatibility:

  • Overloads the string representation of the spec v1 field, but it should be able to parse either one, especially with the spec version to differentiate.
  • Uniqueness of package and relationship to name
  • PGXN has maintainer; is it really worth pluralizing it? Maybe we don’t need it at all, given repository and issue links.

Artifacts

A list of links and checksums for downloading the extension package in one or more formats, including source code, binaries, system packages, and more. Apps use this information (along with registry metadata) to determine the best option for installing an extension on a particular system.

This information is optional, and provided in addition to the source code available from the root registry itself. The idea is that, on release, the release manager creates artifacts in one or more locations (such as GitHub releases), and it’s useful to link to them from the registry. Binary builds may be unlikely today, but leave the option open for now by allowing arbitrary text in the type field.

Each item must have:

  • Type: The type of artifact: Source, binary, RPM, etc.
  • URL: Link to the artifact
  • SHA: A SHA checksum

Each URL must properly resolve and the checksum must match.

[
  {
    "type": "source",
    "url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0.zip",
    "sha256": "2b9d2416096d2930be51e5332b70bcd97846947777a93e4a3d65fe1b5fd7b004"
  },
  {
    "type": "binary",
    "url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0-linux-amd64.tar.gz",
    "sha1": "12d9bc5cfb6bc3c453627eac69511f48be63cfc0"
  },
  {
    "type": "binary",
    "url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0-linux-arm64.tar.gz",
    "sha1": "787dc39137f7d1510a33ab0a1b8905cd5f3f72d1"
  }
]

PGXN Compatibility:

New object, no conflicts.

Resources

External information about the package, mostly links, including source code repository, bug reporting, documentation, badges, funding, etc. Apps use this data for links, of course, but also full text indexing, documentation rendering, and displaying useful information about the extension.

Fields:

  • Homepage: URL for the package’s home page
  • Issues: URL to issues
  • Documentation: URL to documentation
  • Support: URL to support
  • Repository: Source code repository URL
  • Badges: Array of badge link objects
  • Funding: Array of funding objects

Example:
{
  "homepage": "https://pair.example.com",
  "issues": "https://github.com/example/pair/issues",
  "documentation": "https://pair.example.com/docs",
  "support": "https://github.com/example/pair/discussions",
  "repository": "https://github.com/example/pair",
  "badges": [
    {
      "alt": "Test Status",
      "src": "https://test.packages.postgresql.org/github.com/example/pair.svg"
    }
  ],
  "funding": [
    {
      "type" : "individual",
      "url" : "http://example.com/donate"
    },
    {
      "type" : "patreon",
      "url" : "https://www.patreon.com/example"
    }
  ]
}

PGXN Compatibility:

  • issues replaces bugtracker and is just a URL. Can support either
  • repository is just a URL instead of an object. Can support either but with the package key referencing the repository path perhaps we don’t need to change it.

Contents

A description of what’s included in the package. Often a package consists of multiple extensions, such as PostGIS, which includes postgis, postgis_tiger_geocoder, address_standardizer, and more.

Furthermore, some extensions are not CREATE EXTENSION-type extensions at all, such as background workers, command-line apps, loadable modules, shared libraries, and more. Each should be listed along with documentation links where they differ from the package overall (or are simply more specific).

Fields:

  • Extensions: Object describing CREATE EXTENSION extensions. Keys are extension names, fields are objects with the fields sql, doc, abstract, control (same as control file, but can use the file itself), tle (meaning it can be used as a trusted language extension), and preload (to indicate whether its libraries need to be loaded in advance via shared_preload_libraries, session_preload_libraries, or local_preload_libraries).
  • Workers: Object describing background workers. Keys are worker names, fields are objects with the fields bin, doc, and abstract.
  • Apps: Object describing applications, command-line or otherwise. Keys are app names, fields are objects with the fields bin, doc, and abstract.
  • Modules: Object describing loadable modules that can be loaded into Postgres (not necessary for extensions that include libraries). Keys are module names, fields are objects with the fields lib (without file extension), doc, abstract, and preload.
  • Libraries: Other libraries that may ship in the package and need to be installed but are not loadable modules, such as a dynamic library used by an app. Keys are library names, fields are objects with the fields lib (without file extension) or dir (for a directory of files), doc, and abstract.

Example:
{
  "extensions": {
    "pair": {
      "sql": "sql/pair.sql",
      "doc": "doc/pair.md",
      "abstract": "A key/value pair data type",
      "preload": "session",
      "tle": true,
      "control": {
        "directory": "pair",
        "module_pathname": "$libdir/pair",
        "relocatable": true
      }
    }
  },
  "workers": {
    "pair_pruner": {
      "bin": "bin/pair_pruner",
      "doc": "doc/pair_pruner.md",
      "abstract": "A worker to periodically prune pairs"
    }
  },
  "apps": {
    "pair_rand": {
      "bin": "bin/pair_rand",
      "doc": "doc/pair_rand.md",
      "abstract": "Command to generate random pairs of strings"
    }
  },
  "modules": {
    "lib_pair": {
      "lib": "lib/lib_pair",
      "doc": "doc/lib_pair.md",
      "abstract": "A library hooking function calls to convert pairs to named parameters",
      "load": "shared_preload_libraries"
    }
  },
  "libraries": {
    "ruby_pair": {
      "dir": "lib/gems",
      "abstract": "Ruby libraries required to run the extension"
    }
  }
}

Other options:

  • Allow or require an array of docs?
  • Specify support for specific doc formats (markdown, manpage, plain text, MkDocs, etc.)?
  • Support glob patterns?
  • Add libraries as sub-key for workers and apps, e.g. for a Ruby app that requires a directory of gems?

PGXN Compatibility:

Significantly different from the existing provides object, so key this one under contents or manifest and continue to support both.

Dependencies

A list of external dependencies required to configure, build, test, install, and run the extensions in the package. These include not only other extension packages, but also external libraries and system dependencies. And let’s not forget the versions of Postgres required, as well as any OS and version dependencies (e.g., does it work on Windows? FreeBSD? What versions?) and architectures (arm64, amd64, etc.).

Fields:

  • platforms: Platforms supported by the package, defined as OS and architecture. Assume all platforms if none are listed.

  • postgres: Versions of PostgreSQL required by the package, specified as a version range, as well as compiled features (--with-* configure options)

  • pipeline: The build pipeline required to build the extension. Options: pgxs, meson (?), pgrx, gnu (./configure), gem, cpan, pip, go, etc.

  • prereqs: Object defining external dependencies required for different phases:

    • configure: Dependencies to configure the package (e.g., items required for ./configure or make to work)
    • build: Dependencies to build the package (in addition to run dependencies)
    • test: Dependencies to test the package (in addition to build and run dependencies)
    • run: Dependencies to run the package
    • develop: Dependencies to develop the package (in addition to all other phase dependencies)

    Each points to an object with at least one of these keys:

    • requires: Required to use the package
    • recommends: Not required, but recommended as a best practice
    • suggests: Not required, but nice to have
    • conflicts: Package will not work with these items

    These, in turn, contain at least one of these keys:

    • packages: Other packages, expressed by their package paths, pointing to a version expression
    • external: External dependencies not included with the package
    • contrib: List of Postgres contrib or development packages
  • variations: A list of dependency variations, each with two fields:

    • where: Specifies a configuration that, when matched, includes the dependencies. For example, to specify alternate dependencies for Linux arm64, it would be "platforms": { "linux": "arm64" }. For a specific version of Postgres, it would be something like "postgres": { "version": ">= 16, < 17" }. [This likely needs some deep thinking through.]
    • dependencies: An object containing the other key structures in this object, defining the additional dependencies for environments that match the where field.

The packages and external keys point to objects in which the keys are dependency names and the values are version range expressions (empty string means any version). The postgres.version field is also a version range expression.

The contrib key points to an array of Postgres-supplied apps and extensions. Versions are not necessary because they effectively have the same version as a supported Postgres version.

Example:

{
  "postgres": {
    "version": ">= 12, < 17",
    "with": [ "xml", "uuid", "perl" ]
  },
  "pipeline": "pgrx",
  "platforms": {
    "linux": [ "amd64", "arm64" ],
    "darwin": [ "amd64", "arm64" ],
    "windows": [ "amd64" ],
    "freebsd": [ "amd64" ]
  },
  "dependencies": {
    "configure": {
      "requires": {
        "external": { "cargo-pgrx": "" }
      }
    },
    "build": {
      "requires": {
        "external": {
          "curl": "",
          "awk": "",
          "sed": "",
          "perl": "5.20"
        }
      },
      "recommends": {
        "external": {
          "jq": "",
          "perl": "5.38"
        }
      }
    },
    "test": {
      "requires": {
        "contrib": [ "pg_regress", "plpgsql", "plperl" ],
        "packages": {
          "github.com/theory/pgtap": "1.1.0"
        }
      }
    },
    "run": {
      "requires": {
        "contrib": [ "plperl" ],
        "packages": {
          "github.com/theory/hostname": ""
        }
      }
    }
  }
}

Notes:

The external field is the tricky one. How does one provide a canonical dependency for a third-party library or app that may have loads of different names on different platforms, different packaging systems (apt vs rpm), and even different versions of them, or split apart for development and runtime use? Some possibilities:

  • Use purl and let downstream services handle the conversion (see the sketch after this list)
  • Create a rule set that allows one to predict dependency name changes across ecosystems, similar to repology-rules
  • Just use Repology?
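
As a sketch of the first option (illustrative only; the proposal does not yet commit to purl), the external keys could be purl strings, with version ranges as values:

{
  "external": {
    "pkg:generic/openssl": ">= 1.1",
    "pkg:pypi/pyarrow": "11.0.0"
  }
}

A downstream client would then resolve pkg:generic/openssl to libssl-dev on Debian-based systems or openssl-devel on RPM-based systems.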

Other notes:

  • Decide on a version range standard to adopt. Start with the PGXN version range, pep-0508, npm ranges, and Gem::Version (compared briefly below).
  • Is pipeline really necessary, given configure requirements? I think so, because it tells the client the preferred build system to use, in case it can’t detect it for some reason.
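
As a rough comparison (approximate; each system has additional operators and edge cases), the same “at least 1.2.0, below 2.0.0” constraint in each:

PGXN / CPAN-style:    >= 1.2.0, < 2.0.0
pep-0508 / PEP 440:   >=1.2.0,<2.0.0
npm ranges:           ^1.2.0
Gem::Version:         ~> 1.2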

PGXN Compatibility:

Significantly differs from the existing prereqs object, so this one is keyed under dependencies and we can support either.

Classification

Classification metadata lets the extension developer associate additional information to improve discovery, such as key words. It might also allow selections from a curated list of extension classifications, such as the category slugs supported for the cargo categories field.

Ecosystem apps use this data to organize extensions under key words or categories, making it easier for users to find extensions that are frequently used together or for specific workloads or tasks.

Example:

{
  "tags": [
    "testing",
    "pair",
    "parameter"
  ],
  "categories": [
    "Machine Learning"
  ]
}

PGXN Compatibility:

Conflicts with the existing root-level tags array, but we can support either one.

Example: PGML Extension

A hand-coded example that attempts to capture all the complexity of the pgml extension:

{
  "package": "github.com/postgresml/postgresml/pgml-extension",
  "name": "pgml",
  "version": "2.8.2",
  "abstract": "pgml: Created by the PostgresML team",
  "maintainers": [
    {
      "name": "the PostgresML team",
      "url": "https://github.com/postgresml/postgresml/"
    }
  ],
  "license": "MIT",
  "spec": {
    "version": "2.0.0",
    "url": "https://pgxn.org/meta/spec.txt"
  },
  "artifacts": [
    {
      "type": "source zip",
      "url": "https://github.com/postgresml/postgresml/archive/refs/tags/v2.8.2.zip",
      "sha256": "2b9d2416096d2930be51e5332b70bcd97846947777a93e4a3d65fe1b5fd7b004"
    },
    {
      "type": "source tgz",
      "url": "https://github.com/postgresml/postgresml/archive/refs/tags/v2.8.2.tar.gz",
      "sha256": "845f28339c6159ac32daccea1cd17b386ea083c3e60bb8d58fb737725afe7eb5"
    }
  ],
  "resources": {
    "homepage": "https://postgresml.org/",
    "issues": "https://github.com/postgresml/postgresml/issues",
    "documentation": "https://postgresml.org/docs/",
    "support": "https://discord.com/invite/DmyJP3qJ7U",
    "repository": "https://github.com/postgresml/postgresml",
    "badges": [
      {
        "alt": "Tests Passing",
        "src": "https://github.com/postgresml/postgresml/actions/workflows/ci.yml/badge.svg"
      },
      {
        "alt": "Chat Activity",
        "src": "https://camo.githubusercontent.com/1988e7aaaa6a0c8ddcf880faacf1ec3263e23fac4aeff4710922082fde84442a/68747470733a2f2f696d672e736869656c64732e696f2f646973636f72642f31303133383638323433303336393330303939",
        "link": "https://discord.gg/DmyJP3qJ7U"
      }
    ]
  },
  "manifest": {
    "extensions": {
      "pgml": {
        "sql": "sql/pgml--2.8.2.sql",
        "preload": "shared",
        "tle": false,
        "control": {
          "comment": "pgml:  Created by the PostgresML team",
          "default_version": "2.8.2",
          "module_pathname": "$libdir/pgml",
          "relocatable": false,
          "superuser": true,
          "schema": "pgml",
          "trusted": true
        }
      }
    }
  },
  "dependencies": {
    "postgres": {
      "version": ">= 15, < 16"
    },
    "pipeline": "pgxs",
    "platforms": {
      "linux": [
        "amd64",
        "arm64"
      ],
      "darwin": [
        "amd64",
        "arm64"
      ]
    },
    "dependencies": {
      "configure": {
        "requires": {
          "external": {
            "cargo-pgrx": "",
            "bison": "",
            "cmake": "",
            "flex": "",
            "libclang-dev": "",
            "libopenblas-dev": "",
            "libpython3-dev": "",
            "libreadline-dev": "",
            "libssl-dev": "",
            "pkg-config": ""
          }
        }
      },
      "run": {
        "requires": {
          "external": {
            "libopenblas": "",
            "libpython3": "",
            "libreadline": "",
            "libssl": "",
            "python3": ""
          }
        },
        "recommends": {
          "external": {
            "python(pyarrow)": "=11.0.0",
            "python(catboost)": "",
            "python(lightgbm)": "",
            "python(torch)": "",
            "python(torchaudio)": "",
            "python(torchvision)": "",
            "python(xgboost)": "",
            "python(accelerate)": "",
            "python(bitsandbytes)": "",
            "python(ctransformers)": "",
            "python(huggingface-hub)": "",
            "python(deepspeed)": "",
            "python(einops)": "",
            "python(optimum)": "",
            "python(peft)": "",
            "python(tokenizers)": "",
            "python(transformers)": "",
            "python(transformers-stream-generator)": "",
            "python(InstructorEmbedding)": "",
            "python(sentence-transformers)": "",
            "python(rouge)": "",
            "python(sacrebleu)": "",
            "python(sacremoses)": "",
            "python(datasets)": "",
            "python(orjson)": "",
            "python(langchain)": ""
          }
        }
      }
    },
    "variations": [
      {
        "where": {
          "platforms": {
            "linux": []
          }
        },
        "dependencies": {
          "prereqs": {
            "run": {
              "recommends": {
                "external": {
                  "python(auto-gptq)": "",
                  "python(xformers)": ""
                }
              }
            }
          }
        }
      }
    ]
  },
  "classification": {
    "tags": [
      "machine learning",
      "ml",
      "transformers"
    ],
    "categories": [
      "Machine Learning"
    ]
  }
}

Note that PostgresML also provides a dashboard app, but it does not ship with the extension, so it’s not listed here. It could have its own package metadata and be separately distributed.

For the Python dependencies, in the absence of a neutral standard for specifying packages, here they’re specified using a syntax borrowed from RPM to make clear that they’re Python dependencies. A couple of those dependencies are Linux-only, so they’re listed in variations where the OS is Linux.

Registry Metadata

The second metadata format is registry metadata. It starts with a copy of the package metadata, but builds from there, adding information (such as the artifact link, SHA, and cryptographic signature from the registry itself) and sections (defined below).
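
A sketch of what those registry-added fields might look like, layered on top of the package metadata (the key names here are hypothetical):

{
  "artifact": {
    "url": "https://root.pgxn.example.com/dist/pair/1.1.0/pair-1.1.0.zip",
    "sha256": "2b9d2416096d2930be51e5332b70bcd97846947777a93e4a3d65fe1b5fd7b004",
    "signature": "<cryptographic signature generated by the registry>"
  }
}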

Providing a separate metadata format enables broader community collaboration to augment the metadata for extension packages.

Ideally an ecosystem of trusted services will be developed by various parties, who will report back to the root registry as the source of record for all metadata about extension packages. For example, a binary distribution service like apt.postgresql.org or trunk will submit links and checksums for binary packages and perhaps periodic download stats.

Broadly speaking, there are three types of registry metadata: Distributions, Reports, and Classifications.

Distributions

A list of links and checksums for downloading binary distributions for the extension package. The root registry page for the package can therefore list all the ways to install an extension and expand the list as new binary registries add it.

Along with the Artifacts package metadata, the distribution metadata allows installer tools to select the best option to install based on the OS, Postgres version, and cluster configuration (e.g., install the community Apt package if the cluster was installed from the community Apt registry).

Each item must have:

  • Registry: The name of the distribution registry, unique
  • Type: The type of distribution: Apt, RPM, TLE, etc.
  • URL: Link to the registry’s page for the distribution
  • Command: In lieu of a URL and SHA it can provide instructions to install the distribution, e.g., apt-get install postgresql16-pair.

Each distribution registry must be trusted by the root registry to submit this information to add to the metadata, and have its own page with additional contextual information: its home page, docs, basic instructions, whatever. A README of sorts.

Example:

[
  {
    "registry": "trunk.pgxn.org",
    "type": "trunk",
    "command": "pgxn install github.com/example/pair"
  },
  {
    "registry": "apt.postgresql.org",
    "type": "apt",
    "command": "apt-get install postgresql16-pair"
  },
  {
    "registry": "yum.postgresql.org",
    "type": "rpm",
    "command": "yum install postgresql16-pair"
  },
  {
    "registry": "pgt.dev",
    "type": "apt",
    "url": "https://pgt.dev/extensions/pair",
    "command": "trunk install pair"
  },
  {
    "registry": "pgxman.com",
    "type": "apt",
    "url": "https://pgxman.com/x/pair",
    "command": "pgxman install pair"
  },
  {
    "registry": "database.dev",
    "type": "sql",
    "url": "https://database.dev/example/pair",
    "command": "SELECT dbdev.install('example-pair');"
  }
]

Other Options:

  • Information about installing from a registry that doesn’t offer URLs, like Homebrew, where one just needs to know what the formula is called. Maybe the same should apply to Apt?
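
A hypothetical entry along those lines might skip the URL entirely and rely on the command alone (the registry name and type here are purely illustrative):

{
  "registry": "formulae.brew.sh",
  "type": "brew",
  "command": "brew install pair"
}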

Metrics and Reports

This object includes data such as number of downloads, build and test status on various Postgres/OS/version combinations, binary packaging distributions, test coverage, security certification, vulnerability detection, quality metrics and user ratings, and more. Some of these bits of data might include badges, e.g., for ratings or security assessments.

In the broader ecosystem, it would be the responsibility of the root registry to ensure that data comes only from trusted sources. However this data will be publicly readable, allowing any downstream applications to extend and publish metrics and reports with their own information.

Example:

{
    "aggregates": {
      "downloads": 32
    },
    "sources": {
      "smoke-tester.example.com": {
        "stats": {
          "passes": 32,
          "fails": 8,
          "unknown": 0
        },
        "link": "https://smoke-tester.example.com/extension/pair",
        "badge": "https://smoke-tester.example.com/extension/pair.svg"
      },
      "pgt.dev": {
        "downloads": 12,
        "tags": ["two", "kv"],
        "categories": ["Analytics"]
      },
      "pgxman.com": {
        "downloads": 20
      },
      "ratings.example.com": {
        "stats": {
          "ratings": 324,
          "average": 3.2,
          "comments": 13
        },
        "link": "https://ratings.example.com/extension/pair",
        "badge": "https://ratings.example.com/extension/pair.svg"
      },
      "stats.example.com": {
        "contributors": 2,
        "stars": 14,
        "last_active": "2024-01-12",
      },
      "security-scan.example.com": {
        "advisories": {
          "high": 2,
          "moderate": 11,
          "low": 1,
          "cves": 0
        },
        "link": "https://security-scan.example.com/extension/pair",
        "badge": "https://security-scan.example.com/extension/pair.svg"
      }
    }
}

Notes:

  • The aggregates section aggregates results from multiple sources, for example summing all downloads or averaging ratings. The list of items to aggregate could evolve regularly.

  • Each key in sources identifies a trusted downstream source of information. Each would have its own schema describing its objects and their meaning, along with URI templates to link to. For example, stats.example.com might have these templates:

    {
      "contributors": {
        "alt": "List of people who have contributed to the {name} project",
        "badge": "https://stats.example.com/{ package }/contributors.svg",
        "uri": "https://stats.example.com/{ package }/contributors"
      },
      "stars": {
        "alt": "Star ratings for the {name} project",
        "badge": "https://stats.example.com/{ package }/stars.svg",
        "uri": "https://stats.example.com/{ package }/stars"
      }
    }
    
  • Might be worth adopting more of an API of multiple files, rather than one big file. Example: Homebrew analytics. Note also its support for date ranges for analytics (30/90/365 days, stats when directly installed vs. installed as a dependency).

Extension Registry Namespacing RFC

A few weeks ago I brainstormed about decentralized Postgres extension publishing, inspired in part by an examination of Go decentralized publishing. It was…a lot. I’ve been deeply pondering the future of PGXN and the broader extension ecosystem, and want to start to nail down some decisions. To that end, I’d like to propose an update to extension namespacing.

Status Quo

There are currently three ways in which an extension is considered unique:

  1. Only one extension can have a given name within a single Postgres cluster. Names are defined by the name of the control file. It is therefore not possible to have two extensions with the same name in the same Postgres cluster.
  2. PGXN follows this pattern: Only one extension can have a given name in the PGXN registry. The first person to release an extension then “owns” its name, and no one else can release an extension with the same name.1 I think dbdev follows the same pattern.
  3. Other registries like trunk and pgxman define an extension by the distribution name, at least for the purposes of selecting a binary to install. Thus when you trunk install postgis, you get all of the extensions included, as you’d expect, while trunk install address_standardizer wouldn’t work at all. In the few places that trunk supports installation by extension name, it prompts the user to use the appropriate package name if there’s a conflict.

A Modest Proposal

I’d like to propose the following changes to the PGXN Meta Spec to start to move away from extension uniqueness in the broader extension ecosystem and more toward package name.

  • Add a new field, call it module_path, project_path, or project_uri, that, if present, uniquely identifies an extension project and all of its parts. It should be a Go-style module path (or URI) that identifies the project repository path where a META.json file lives.
  • Retain the provides object where keys identify extensions, but those keys will no longer be globally unique to the registry. In other words, the combination of module_path and extension name uniquely identifies an extension, including an empty module_path.

How it Works

Some examples. Let’s say there is an existing extension named pair, included in the distribution named pg_pair:

{
  "name": "pg_pair",
  "version": "1.2.3",
  "provides": {
    "pair": {
      "file": "pair.sql",
      "version": "1.2.0"
    }
  }
}

The extension name pair is unique, and pgxn install pair will download the pg_pair v1.2.3 bundle and compile and install pair v1.2.0.

Now someone else comes along and wants to make their own pair with this metadata:

{
  "name": "my_pair",
  "version": "0.2.3",
  "provides": {
    "pair": {
      "file": "pair.sql",
      "version": "0.2.3"
    }
  }
}

Just like today, this upload would be rejected, because there is already a registered pair extension. Under my proposal, they can disambiguate by providing a module_path:

{
  "name": "my_pair",
  "module_path": "github/example/pair",
  "version": "0.2.3",
  "provides": {
    "pair": {
      "file": "pair.sql",
      "version": "0.2.3"
    }
  }
}

This upload would be allowed. With these two releases, someone attempting to install pair would see something like this:

$ pgxn install pair
ERROR: Duplicate extension name “pair”. Install one of these instead:
       * pgxn.org/dist/pair
       * github/example/pair

Note the module path pgxn.org/dist/pair in the first option. This is the default module path for distributions without a module path.2 But now the user can select the proper one to install:

$ pgxn install pgxn.org/dist/pair
INFO: latest version: pgxn.org/dist/pair@1.2.3
INFO: building extension
INFO: installing extension
INFO: done!

Furthermore, the PGXN client will prevent the user from later installing a conflicting extension. The failure would look something like:

$ pgxn install github/example/pair
INFO: latest version: github/example/pair@0.2.3
ERROR: Cannot install extension “pair” from github/example/pair:
ERROR: A conflicting extension named “pair” is already installed
ERROR: from pgxn.org/dist/pair

Features with Benefits

I see a number of benefits to this change:

  • Compatibility with the v1 metadata spec, so that no data migration or distribution indexing is required.
  • It loosens up extension namespacing (or name registration, if you prefer) while adding additional metadata to help users evaluate the quality of an extension. For example, does it come from a well-known developer? You can see it right in the module path.
  • It creates a pattern to eventually allow auto-indexing of extensions. For example, if you run pgxn install github.com/example/pew and PGXN doesn’t have it, it can look for a META.json file in that repository and, if it exists and there’s a semver release tag, it could try to index it and let the user install it; see the sketch after this list. There are ownership issues to be worked out, but it has possibilities.
  • It preserves the Postgres core concept of extension identity while putting in place a pattern, well established by Go modules and the widespread use of URIs in general, that the Postgres core could eventually adopt to allow more flexible extension namespacing.
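
To make the auto-indexing idea concrete, here’s a purely hypothetical session: none of this output exists today, and the exact flow would depend on how the ownership questions shake out.

$ pgxn install github.com/example/pew
INFO: “pew” not found in the index
INFO: checking github.com/example/pew for a META.json file
INFO: found release tag v1.0.0; indexing github.com/example/pew@1.0.0
INFO: building extension
INFO: installing extension
INFO: done!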

Request for Comments

What do you think? Good idea? Terrible idea? Please hit me with your thoughts on Mastodon, or via the #extensions channel on the Postgres Slack. I’d like to get this decision (and a few others, stay tuned!) nailed down soon and start development, so don’t hesitate! I need your help to prevent me from making a huge mistake.


  1. Unless the owner would like to share ownership with someone else, in which case they can email me to request that another user be granted “co-ownership”. They can also request to transfer ownership to another user, after which the original owner will no longer be able to release the extension. ↩︎

  2. Or, if the META.json file has a repository resource with a URL, PGXN could index it as the implied module path. Or, failing that, maybe it should fall back on the distribution name instead of a pgxn.org path, and prompt with pg_pair/pair. ↩︎

Talk: Building Trunk

Line drawing of a steamer trunk with three diamond stars above it.

Update: 2024-03-22: Slides and video linked below.

This week, my fellow Tembonaut Ian Stanton will present the extension ecosystem mini-summit talk, “Building Trunk: A Postgres Extension Registry and CLI”. We felt it important to get some insight from a couple of the recently-developed Postgres extension registries: what problems they set out to solve, how they were built and operate, their successes in addressing those problems, and what issues remain, both for the projects and the ecosystem overall. Ian plans to give us the low-down on trunk.

Join us! Need other information or just want an invitation without using Eventbrite, hit me up at david@ this domain, on Mastodon, or via the #extensions channel on the Postgres Slack.

Update: 2024-03-22: Such a great discussion! Missed it? I got you. Links:

Mini Summit One

Great turnout and discussion for the first in a series of community talks and discussions on the postgres extension ecosystem leading up to the Extension Ecosystem Summit at pgconf.dev on May 28. Thank you!

The talk, “State of the Extension Ecosystem”, was followed by 15 minutes or so of super interesting discussion. Here are the relevant links:

For posterity, I listened through my droning and tried to capture the general outline, posted here along with interspersed chat history and some relevant links. Apologies in advance for any inaccuracies or missed nuance; I’m happy to update these notes with your corrections.

And now, to the notes!

Introduction

  • Introduced myself, first Mini Summit, six leading up to the in-person summit on May 28 at PGConf.dev in Vancouver, Canada.

  • Thought I would get things started by providing a bit of history of extensions and context for what’s next.

Presentation

  • Postgres has a long history of extensibility, originally using pure SQL or shared preload libraries. Used by a few early adopters, perhaps a couple dozen, including …

  • Explicit extension support added in Postgres 9.1 by Dimitri Fontaine, with PGXS, CREATE EXTENSION, and pg_dump & pg_restore support.

  • Example pair--1.0.0.sql:

    -- complain if script is sourced in psql and not CREATE EXTENSION
    \echo Use "CREATE EXTENSION pair" to load this file. \quit
    
    CREATE TYPE pair AS ( k text, v text );
    
    CREATE FUNCTION pair(text, text)
    RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair;';
    
    CREATE OPERATOR ~> (LEFTARG = text, RIGHTARG = text, FUNCTION = pair);
    
  • Bagel makes an appearance.

  • Example pair.control:

    # pair extension
    comment = 'A key/value pair data type'
    default_version = '1.0'
    module_pathname = '$libdir/pair'
    relocatable = true
    
  • Example Makefile:

    EXTENSION    = pair
    MODULEDIR    = $(EXTENSION)
    DOCS         = README.md
    DATA         = sql/pair--1.0.sql
    TESTS        = test/sql/base.sql
    REGRESS      = base
    REGRESS_OPTS = --inputdir=test
    MODULES      = src/pair
    PG_CONFIG   ?= pg_config
    
    PGXS := $(shell $(PG_CONFIG) --pgxs)
    include $(PGXS)
    
  • Build and Install:

    $ make
    make: Nothing to be done for `all'.
    
    $ make install
    mkdir -p '/pgsql/share/extension'
    mkdir -p '/pgsql/share/pair'
    mkdir -p '/pgsql/share/doc/pair'
    install -c -m 644 pair.control '/pgsql/share/extension/'
    install -c -m 644 sql/pair--1.0.sql  '/pgsql/share/pair/'
    install -c -m 644 README.md '/pgsql/share/doc/pair/'
    
    $ make installcheck
    # +++ regress install-check in  +++
    # using postmaster on Unix socket, default port
    ok 1         - base                                       15 ms
    1..1
    # All 1 tests passed.
    
  • CREATE EXTENSION:

    $ psql -d try -c 'CREATE EXTENSION pair'
    CREATE EXTENSION
    
    $ pg_dump -d try
    --
    -- Name: pair; Type: EXTENSION; Schema: -; Owner: -
    --
    CREATE EXTENSION IF NOT EXISTS pair WITH SCHEMA public;
    
    --
    -- Name: EXTENSION pair; Type: COMMENT; Schema: -; Owner:
    --
    COMMENT ON EXTENSION pair IS 'A key/value pair data type';
    
  • Many of us saw opportunity in this new feature.

    PostgreSQL today is not merely a database, it’s an application development platform.

    — Me, 2010

  • Proposed to build PGXN. Raised funds to build it in late 2010. Launched the site in April 2011; Daniele Varrazzo released the CLI, and Dickson Guedes released the dev CLI.

  • Problems PGXN set out to solve:

    • Source code distribution with user registration and namespacing
    • Discovery: Search, read docs, browse tags
    • Installation: CLI to compile and install using PGXS or Configure
  • PGXN Components:

  • Problems out of scope for PGXN:

    • Binary packaging and distribution
      • Defer to apt/yum
    • Developer tooling (though dev utils helped)
    • Build tooling
      • Defer to core (PGXS)
  • PGXN Shortcomings:

    • Little development since 2012
    • Search limitations
    • Source of Record
      • Minority of available extensions on PGXN
      • Releases uneven or neglected

    In classic SDLC fashion, PGXN POC shipped as an MVP and was neglected.

    — Me, Just Now

  • Been peripheral to Postgres extensions for the last 10-12 years, but some things have happened.

  • Non-Core extension counts:

  • Daniele asks about that last source, which is just a list in a gist.

  • Joe Nelson links to the gist in Zoom chat. It is not his list, contrary to my off-the-cuff guess.

  • Why haven’t extensions taken off?

  • Lost Opportunities

    • No one canonical source to discover and install extensions
    • Difficult to find and discover extensions without canonical registry
    • Most extensions are under-documented and difficult to understand
    • They are also hard to configure and install; most people don’t want or need a compiler
    • The maturity of extensions can be difficult to gauge; it’s not systematized, so each must be independently researched
      • David Christensen in Chat “attention economy/awareness, NIH, etc”
      • Jeremy S in chat: “Maybe some people don’t know they are using extensions (I think that’s possible to some degree)”
    • There is no comprehensive binary packaging
    • Centralized source distribution is insufficient (even if it were complete)
      • jubilee in chat: Trust aspect?
      • David Johnson in chat: To seem legit you need to touch the repo at least annually to ensure it works on the newest major release. Even if you just do your compile and update the readme.
      • I mention using pgxn-utils and GitHub workflows to ensure my extensions continue working
    • There is insufficient developer tooling; pgxn-utils is not well-maintained and doesn’t build on recent Rubies, though pgrx has a lot of Rust-oriented tooling
      • Eric in chat: ❤️
      • jubilee in chat: 🦀 mentioned!
  • Filling the Gaps

  • dbdev: “The Database Package Manager for Trusted Language Extensions”: Includes only TLEs, no binary extensions

  • trunk: “A Postgres Extension Registry”: Binary distribution of curated extensions, desires to be comprehensive and cross-platform

  • pgxman: “npm for PostgreSQL”: Binary Apt package distribution of curated extensions; desires to be comprehensive and cross-platform

  • Emphases: Ease of Use. Screenshot from pgxman:

    $ curl -sfL https://install.pgx.sh | sh -
    👏🎉 pgxman successfully installed
    $ pgxman install pgvector
    The following Debian packages will be installed:
    postgresql-14-pgxman-pgvector=0.5.1
    Do you want to continue? [Y/n] y
    pgvector has been successfully installed.
    
    • Daniele in chat: “Missing a “curl | sudo sh” there…. 👀”
    • Greg Mullane (CrunchyData) [he/him] in chat: “Really not a fan of that “pipe curl stuff from internet into sh” system.”
    • Jeremy S in chat: “Someone recently reprimanded me for putting curl | psql in an extension README. From a security perspective it probably sets a better example to do curl >file.sql … psql file.sql (encourage users not to run from Internet but read/review first)”
    • jubilee in chat: “apt/yum install is just a better UI over curl | sh :^)”
    • Jeremy S in chat: “Yes and once you’re to that point there’s already more supply chain verification happening”
    • Jeremy S in chat: “It’s usually just the initial bootstrap into any system, if the setup wasn’t already in your distro”
  • Emphases: Platform neutrality. Screenshot from trunk:

    Architecture x86-64
    Operating system Debian/Ubuntu
  • Emphases: Stats. Screenshot from dbdev:

    Downloads


    20 all time downloads · 0 downloads in last 30 days · 1 download in last 90 days · 0 downloads in last 180 days

  • Emphases: Curation. Screenshot from trunk:

    Featured 7
    Analytics 13
    Auditing / Logging 7
    Data Change Capture 6
    Connectors 27
    Data / Transformations 49
    • Damien Clochard in chat: gtg, see you later guys !
  • MVPs

    • trunk: Manual integration, Currently Debian-only

    • pgxman: Form-based submission, Currently Apt-only

    • dbdev: TLEs only, CLI publishing

    • David Christensen in chat: “go has a pretty good extensions infra, imho, wrt discovery/docs, etc. also has the benefit of the package names being the URL to access it, which is a nice convention.”

  • New Opportunities Today

    What are the community opportunities for the extension ecosystem?

    Some ideas:

    • Improved dev tools: More than pgxn-utils and pgrx
    • Canonical registry: All publicly-available extensions in one place
    • Easy publishing: auto-discovery or CI/CD pipeline publishing
    • Continuous Delivery: CI/CD pipeline publishing
    • File-free installation: TLEs
    • Documentation: Something like Go docs or Rust docs
    • File management: Put all the files for an extension in one directory
    • Improved metadata
      • Library Dependencies: utilities used by extensions
      • Platform Dependencies: system packages
      • Build pipelines: PGXS, pgrx, make, cpan, pypi, etc.
      • Artifacts: Binaries built on release
      • Classification: Curated in addition to tags
      • Extension Types: Extensions, apps, background workers, loadable libraries
    • Derived Services
      • Binary Packaging: Distributed binaries for many platforms
      • Ratings & Reviews: Stars, thumbs, comments
      • Aggregated Stats: Repository stats, etc.
      • Smoke Testing: Matrix of multi-platform test results
      • Security Scanning: Reporting vulnerabilities
      • Badging & Curation: Third-party classification, badging various statuses
  • Extension Ecosystem Summit

    Collaborate to examine the ongoing work on PostgreSQL extension distribution, examine its challenges, identify questions, propose solutions, and agree on directions for execution.

  • 🏔️ Your Summit Organizers

  • Devrim Gunduz in chat: Thanks David!

  • Schedule:

    • March 6: David Wheeler, PGXN: “State of the Extension Ecosystem”
    • March 20: Ian Stanton, Tembo: “Building Trunk: A Postgres Extension Registry and CLI”
    • April 3: Devrim Gündüz: “yum.postgresql.org and the challenges RPMifying extensions”
    • April 17: Jonathan Katz: “TLE Vision and Specifics”
    • May 1: Yurii Rashkovskii, Omnigres: “Universally buildable extensions: dev to prod”
    • May 15: David Wheeler, PGXN: “Community Organizing Summit Topics”
  • Ultimately want to talk about what’s important to you, the members of the community, to make extensions successful.

Discussion

  • Eric: I’m Eric Ridge, one of the developers behind pgrx, as you’re going through this process of building a modern extension ecosystem, let us know what we can do on the Rust side to help make your lives easier, we’re happy to help any way we can.

  • Steven Miller in chat:

    These are some areas of interest we noticed building Tembo

    Binary packaging / distribution:

    • Variable installation location
    • System dependencies / uncommon system dependencies or versions
    • Chip specific instructions (e.g. vector compiled with avx512)
    • Extension-specific file types / extra data files (e.g. anonymizer .csv data)

    Turning on extensions automatically

    • Different ways to enable extensions
    • does it need load (shared_preload_libraries, session_… etc)?
    • Does it use create extension framework?
    • Does it require a specific schema?
    • What about turning on in multiple DBs at the same time in the same cluster, with background worker?
    • Disabling, what data will be lost?
    • Validating safety / user feedback on upgrade?

    In cloud / SaaS:

    • Installing + enabling extensions quickly, without restart
    • Persisting extension files
    • Extension-specific files (e.g. libraries) versus postgres’ libraries
    • Updating
    • Troubleshooting crashes / core dumps

    Anyone else have similar problems / tips?

  • Steven Miller: These were just things I noted during the presentation. Curious if these are interesting to others on the call.

  • Daniele in chat: “Regards binary distributions, python wheels might be a useful reference.”

  • Steven Miller: That’s a good point! What do people think of the idea to just install extensions onto servers, not packages, persisted on disk next to PGDATA so they go into a single persistent volume, while the rest is managed by an immutable container?

  • Daniele: Had an experience on Crunchy where we had to replace an image to get an extension. Looked for a feature to have a sidecar or a volume with the extension.

  • Steven Miller: Didn’t have a separate directory just for extensions; it’s just pg_config --libdir for everything. Had to persist the entire directory, including those files from the base build, their internal files. Would have been nice to have a separate directory, extra-libdir or extra-sharedir, something like that.

  • Yurii Rashkovskii: I was working on a patch to do exactly that, but haven’t completed it. Was going to introduce additional directories to search for this stuff.

  • Steven Miller: That would be really awesome.

  • Jeremy S in chat: “Advantage of that is that a single image can be shared among systems with different needs”

  • Eric in chat: “Thoughts around “enterprise repositories” that could be self-hosted and disconnected from the internet?”

    • Ian Stanton in chat: “I’ll touch on this in the next talk, it’s crossed our minds when building the Trunk registry”
  • Steven Miller: I think that’s a great idea.

  • Bagel reappears.

  • David Wheeler: PGXN was originally designed so anyone could run Manager and their own root mirror, and maybe rsync from the community one. Don’t know that anyone ever did; it’s a little complicated and most people don’t want to work with Perl. [Chuckles]. Definitely think there’s space for that. If you work with Java or Go or maybe Rust, lots of orgs use services like Artifactory that provide internal registries. Could be a cool use case for Postgres extensions.

  • David Christensen in chat: “something that could support extension batches; like groups of related extensions that could be installed in bulk or loaded in bulk (so could accommodate the shared systems with different individual extension needs, but could be shared_preload_library configured)”

  • “Steven Miller” in chat: “Sounds familiar”

  • Greg Mullane (CrunchyData) [he/him] in chat: “All these items remind me of CPAN. We should see what things it (and other similar systems) get right and wrong. I’ve learned from CPAN that ratings, reviews, and badging are going to be very difficult.”

    • David Christensen in chat: “I’d assumed at the time that it was largely lifted (at least in philosophy/design) from CPAN. 🙂”
    • David Wheeler (he/him) in chat: “yes”
  • Jeremy S: I think this is mostly focused on developers, but I had a recent experience where multiple people in the past few months, new to Postgres, were trying to understand extensions. They install a version and then see there are like 15 versions installed, and get confused. Goes back to the install file. There’s a bit of a UX angle: there are sharp edges where people are trying to make sense of extensions, and the flexibility makes it hard to understand. Some fixes might be nice guides, some architectural things explaining PGXS, or improvements to the design. Related, and not the main topic, but it’s good to keep in mind end-user UX and the devs who build on Postgres but are not Postgres developers, who are running their businesses.

  • David Wheeler: Yeah all the files can be confusing, which is why I think trunk and pgxman trying to simplify: Just run this command and then you have it.

  • Steven Miller in chat: “I really agree with what Jeremy is saying. Right now PGXN and Trunk are taking an approach like “whatever an extension could do / how it may work, it should work on this registry”. But I think more standards / “what is a normal extension” would make the UX much easier.”

  • Jeremy S: Even with that, the available extensions view is still there. Some of that is just schema management, and that’s how core is doing schema management.
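
    For reference, the view in question is pg_available_extensions, along with pg_available_extension_versions, which lists every extension version the server can see, installed or not:

    SELECT name, default_version, installed_version, comment
      FROM pg_available_extensions;

    SELECT name, version, installed
      FROM pg_available_extension_versions
     ORDER BY name, version;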

  • Steven Miller in chat: I exactly agree about the concern about multiple extensions within a package. Also version and name mismatches

  • David Wheeler: And not everything is an extension; sometimes you just want something to work, or it’s just a utility like pg_top that you want to use. Extensions, I think, were a tremendous contribution to Postgres itself, but a lot of it was wrangling the existing system for building Postgres itself to make it work for that. Could be very interesting, though quite long term — and I know Dimitri has tried this multiple times — to build a proper package management system within Postgres itself, to ease a lot of that pain and burden.

  • Tobias Bussmann in chat: “Thank you for the great overview and for taking this topic further! Unfortunately, I’ll not be able to join at pgConf.dev but will follow whatever results this will lead to. As a package maintainer, I am constantly looking for an easy way to allow users to use extensions without having to package everything ;)”

  • Steven Miller in chat: “Like auto explain for example right. i.e. a LOAD only “extension””

  • Yurii Rashkovskii: An interesting topic: what extensions are capable of doing and how they can be more self-contained. Like Steven was saying in chat: how easy is it to load and unload extensions? Example: you want an extension to hook into a part of Postgres: executor, planner, etc. How do you go about enabling them? How do you unload them, introspect them, get a list of hooks?

    The Omni extension provides a list of all hooks, and when you remove an extension it removes the hooks that the extension provides, but that’s still not part of the core. Hooks are one of the greatest ways to expand the functionality of Postgres; they allow us to experiment with Postgres before committing to the full cycle of getting a patch into Postgres. They let us get it to users today to try. If it makes a lot of sense and people want it, it’s time to commit to the process of getting a patch committed. But if we don’t have this venue to get extensions in, our ability to try things is limited.

  • jubilee in chat: Hmm. It seems my audio is not working.

  • David Wheeler: The next session is two weeks from today: Ian Stanton is going to talk about “Building Trunk: A Postgres Extension Registry and CLI”. Will be interesting because a number of people have decided to build a binary packaging system for extensions, just to air out what the challenges were, what problems they wanted to solve, what problems remain, and where they want to take it in the future.

  • Jeremy S in chat: “Bagel clearly has something to say”

  • David Wheeler: jubilee, I see your audio issues; do you just want to type your question into chat? We can also discuss things in the #extensions channel on the Postgres Slack.

  • David Wheeler: Thank you all for coming!

  • jubilee in chat: “I had a question which is about: Does Postgres actually support docs for extensions? Like, a lot of people don’t really WANT to read a README. Can you get docstrings for a function in psql?”

  • Ian Stanton in chat: “Thank you David!”

  • jubilee in chat: And if not, why not?

Post Presentation Discussion

From Slack:

  • David Wheeler: I see now that “jubilee” left their question in the Zoom chat.
  • David Wheeler: The closest attempt at this I’ve seen is pg_readme, which will generate Markdown for an extension from comments in the catalog and write it out to a file.
  • David G. Johnson: The COMMENT ON command adds in database comments that psql describe commands should display.
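
For illustration, comments added with COMMENT ON show up in psql describe output (\dx for extensions, \df+ for functions), so an extension can ship at least minimal in-database documentation:

COMMENT ON EXTENSION pair IS 'A key/value pair data type';
COMMENT ON FUNCTION pair(text, text) IS 'Construct a pair from two text values';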

Also on Slack, Greg Sabino Mullane started a longish thread on the things we want to do and build.

Talk: State of the Extension Ecosystem

Photo of the summit of Mount Hood

Update: 2024-03-06: Slides and video linked below.

A quick reminder that I’ll be giving a brief talk on the “State of the Extension Ecosystem” on Wednesday at noon US Eastern / 17:00 UTC. This talk is the first in a series of community talks and discussions on the postgres extension ecosystem leading up to the Extension Ecosystem Summit at pgconf.dev on May 28.

I plan to give a brief history of Postgres extension tools and distribution, the challenges encountered, recent developments, and opportunities for the future. It should take about 30 minutes, followed by discussion. Following this pattern for all the talks in the series, I hope to set up some engaging discussions and to surface significant topics ahead of the summit.

Join us! Need other information or just want an invitation without using Eventbrite, hit me up at david@ this domain, on Mastodon, or via the #extensions channel on the Postgres Slack.

Update: 2024-03-06: Great turnout and discussion, thank you! Links:

Extension Ecosystem Summit 2024

Logo for PGConf.dev

I’m pleased to announce that some pals and I have organized and will host the (first annual?) Extension Ecosystem Summit at PGConf.dev in Vancouver (and more, see below) on May 28:

Enabling comprehensive indexing, discovery, and binary distribution.

Participants will collaborate to examine the ongoing work on PostgreSQL extension distribution, examine its challenges, identify questions, propose solutions, and agree on directions for execution.

Going to PGConf? Select it as an “Additional Option” when you register, or update your registration if you’ve already registered. Hope to see you there!


Photo of the summit of Mount Hood

Extension Ecosystem Mini-Summit

But if you can’t make it, that’s okay, because in the lead-up to the Summit we’re hosting a series of six virtual gatherings, the Postgres Extension Ecosystem Mini-Summit.

Join us for an hour or so every other Wednesday starting March 6 to hear contributors to a variety of community and commercial extension initiatives outline the problems they want to solve, their attempts to do so, challenges discovered along the way, and dreams for an ideal extension ecosystem in the future. Tentative speaker lineup:

  • March 6: David Wheeler, PGXN: “State of the Extension Ecosystem”
  • March 20: Ian Stanton, Tembo: “Building Trunk: A Postgres Extension Registry and CLI”
  • April 3: Devrim Gündüz: “yum.postgresql.org and the challenges RPMifying extensions”
  • April 17: Jonathan Katz: “TLE Vision and Specifics”
  • May 1: Yurii Rashkovskii, Omnigres: “Universally buildable extensions: dev to prod”
  • May 15: David Wheeler, PGXN: “Community Organizing Summit Topics”

Hit the event page for details. Many thanks to my co-organizers Jeremy Schneider, David Christensen, Keith Fiske, and Devrim Gündüz, as well as the PGConf.dev organizers for making this all happen!

Update: 2024-03-06: Updated the talk schedule.