Prototype based on blackhole_fdw and c -> java calls #534

Open - wants to merge 2 commits into base: master

beargiles

This is a prototype based on the blackhole_fdw. That is - the provided methods are the minimum required for the FDW to be loaded. Quite a bit of extra code is required before you can use a real backend, even one that just happens to also return nothing. I have already implemented that elsewhere but for now it's fine to skip it.

That said most of the provided methods call the appropriate java method. THESE ARE NOT STATIC METHODS - it is assumed that everything except for the FDWValidator is associated with a Java object - FDWForeignDataWrapper, FDWServer, FDWTable, or the various internal states.

The fdw_private fields contain a jobject for the java instance. It is passed through to java via the JNI call in the same way that we can use method.invoke(Object obj,...) in Java.

There is also a dummy implementation of the java side of the FDW. It does nothing but log an acknowledgement that the method was called. (Is there a way to trigger elog(NOTICE,...) from the java side?)

BONUS - and probably a separate pull request

The pljava-so directory contains a Dockerfile and docker-compose.yml file that can be used to create a test image. The makefile(?) needs to add a stanza that calls docker build ... - I can provide the details later. For this project it may make more sense to move this to the packaging module - for the other project I'm working on it's entirely standalone so I can do everything in that module.

Once there's a docker image I can add an 'example' that uses TestContainers to run some actual java-based tests.

KNOWN ISSUES

There are a lot.

The most basic is that I'm not reusing existing bits of JNI. This should be easy to fix.

The most important, and the reason I haven't tried to actually build and run the .so, is that most of these objects need to stay in the persistent memory context since they live beyond the lifetime of the query. I know there's already one in the existing backend code but I don't know if I should use it or a per-FDW one.

It's also clear that there needs to be an OID associated with the Foreign Data Wrapper, Server, and Foreign Table objects since they're persistent database objects. It should be possible to reuse existing ones, or at least have a clean way to access them.

For now I'm using a simple tree - actually a one-to-one-to-one tree - but while the hooks are there it's not actually executing all of the steps yet. However there is enough that we can create a standalone object for testing.

However this has raised a second issue - the method signatures to create a new Server or Foreign Table are identical but both Foreign Data Wrappers and Servers can support multiple children. I'm sure there's an easy way to get the jclass from a jobject but I wanted to focus on other things since we can always fall back to assuming a single matching class.

There also needs to be correct mapping between int and jint, char * and jstring, etc. Once we have a solid foundation we can start adding conversions for the rest of the functions/methods.

Finally the user OID is available to us but I don't know how to retrieve the desired information from it. For maximum flexibility (and security) we want at least three things:

  • the database username
  • the authentication method used
  • the connection method used (via unix named file, TCP/IP (IP address), etc.)

The last item is important in highly secured environments, e.g., some critical operations may be limited to local users - and then the java class may still require additional authentication with something like a Yubico key. (Can you tell my other project is related to encryption keys and signing using hardened external devices?)

So, overall, this is enough to give a decent representation of what's required to have the C-based FDW call java classes for the actual work. There's quite a bit more work to be done on the FDW side but it's internal bookkeeping and doesn't affect the java SPI.

However I don't know what prep work needs to be done beyond what's already done for UDF and UDT, and there's definitely the question of how the different persistent nodes are created and used. I've looked at a few other implementations but this goes a few steps beyond them due to the desire to eventually support multiple servers and tables.

FDW Handler?...

I've been a bit confused about a single handler when there has always been a desire to provide multiple servers and foreign tables. I think I have an answer though - the two functions are persistent and will have nodes and OIDs associated with them. The initial definition can point to a 'blackhole' Handler but the handler could be replaced at the Server and Foreign Table level. That's the only thing that makes sense since different tables will require different implementations of the planner, scanner, modification methods, etc.

It's likely that the Validator can also change since the meaningful values for a Foreign Table may depend on the specific Server used.

BONUS ITEM #2

This is WAAAY out there but while digging through the deeply nested structures I came across a function pointer for performance analysis. I don't think it's limited to just FDW - it was pretty deep in the standard structures at this point.

On one hand I shudder to think of the impact of providing a JNI binding to allow a java class to collect performance metrics. On the other hand....

That said... I've looked at similar situations in the past and have largely concluded that the best solution - on a single-node system - is to use a named pipe, or IPC if you're more comfortable with it. As long as you're willing to accept data loss if the system is overwhelmed there's not much risk to slamming information into either pseudodevice and relying on a consumer to move the information somewhere else. For instance something like ApacheMQ so that the analysis can be performed on one or more remote machines.

However it still got me wondering... and I'm sure there's something similar for row and column-level authorization, auditing, perhaps even encryption.

@beargiles
Author

Here's a bit of info from my other project.

First, a quick descent through the obvious tables.

postgres=# select * from pg_catalog.pg_extension;
  oid  |  extname   | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition 
-------+------------+----------+--------------+----------------+------------+-----------+--------------
 13569 | plpgsql    |       10 |           11 | f              | 1.0        |           | 
 16384 | simple_fdw |       10 |         2200 | t              | 0.1.0      |           | 
(2 rows)

postgres=# select * from pg_catalog.pg_foreign_data_wrapper;
  oid  |  fdwname   | fdwowner | fdwhandler | fdwvalidator | fdwacl | fdwoptions 
-------+------------+----------+------------+--------------+--------+------------
 16387 | simple_fdw |       10 |      16385 |        16386 |        | 
(1 row)

postgres=# select * from pg_catalog.pg_foreign_server;
  oid  |    srvname    | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions 
-------+---------------+----------+--------+---------+------------+--------+------------
 16388 | simple_server |       10 |  16387 |         |            |        | 
(1 row)

postgres=# select * from pg_catalog.pg_foreign_table;
 ftrelid | ftserver | ftoptions 
---------+----------+-----------
   16391 |    16388 | 
(1 row)

The owner is postgres (10).

I'm a bit disappointed that the fdwhandler is only associated with the Foreign Data Wrapper but it's not a problem. The key functions have an Oid foreigntableoid parameter that can be used to get the options used for that table, server, and wrapper.

More importantly we can create a permanent cache from the table's ftrelid to the corresponding java classes and methods. We can't store the actual jclass and jmethodID but we can store the classname and method signature. For maximum consistency with the java structure we would probably want to cache the values with the corresponding table, server, or wrapper.

This would look something like

-- Sketch only: PostgreSQL does not accept foreign keys that reference system
-- catalogs, so the REFERENCES clauses below just document the intended link.
create table jsql.fdw_wrapper_jni (
    oid       oid  not null primary key references pg_catalog.pg_foreign_data_wrapper (oid),
    classname text not null
);

create table jsql.fdw_server_jni (
    oid       oid  not null primary key references pg_catalog.pg_foreign_server (oid),
    classname text not null
);

-- pg_foreign_table has no oid column of its own; it is keyed by ftrelid.
create table jsql.fdw_table_jni (
    oid                  oid  not null primary key references pg_catalog.pg_foreign_table (ftrelid),
    classname            text not null,
    plan_state_classname text not null,
    scan_state_classname text not null
);

A quick peek at the functions (fdwhandler, fdwvalidator) shows

postgres=# select oid, proname, proowner, prolang, pronargs, prorettype, proargnames, proargtypes from pg_catalog.pg_proc where oid in (16385, 16386);
  oid  |       proname        | proowner | prolang | pronargs | prorettype | proargnames | proargtypes 
-------+----------------------+----------+---------+----------+------------+-------------+-------------
 16385 | simple_fdw_handler   |       10 |      13 |        0 |       3115 |             | 
 16386 | simple_fdw_validator |       10 |      13 |        2 |       2278 |             | 1009 26
(2 rows)

and the types are

postgres=# select oid,typname,typnamespace,typinput,typoutput,typstorage,typtype from pg_catalog.pg_type where oid in (3115, 2278);
 oid  |   typname   | typnamespace |    typinput    |    typoutput    | typstorage | typtype 
------+-------------+--------------+----------------+-----------------+------------+---------
 2278 | void        |           11 | void_in        | void_out        | p          | p
 3115 | fdw_handler |           11 | fdw_handler_in | fdw_handler_out | p          | p
(2 rows)

@beargiles
Author

After a bit of thought I realized that classnames can be stored as standard options - no need for an extra table. I only thought of that because of my initial hope that we could cache jclass and jmethodID.

It's also occurred to me that I now know how to handle the one-to-many aspect.

Java side:

  • keep cache of the three key implementations indexed by OID.
  • each class has constructor taking OID (as long?) and list of options

Backend side:

  • start with the foreigntableoid. I don't think we have the options (classname) yet.
  • query pg_catalog to get the respective OIDs.
  • call java with the table oid and options
  • java creates new objects if necessary
  • java returns the target instance. It's stored as a jobject.
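
To make the Java-side cache concrete, here is a minimal sketch of "one instance per OID, created on demand". Every name in it (OidInstanceCache, DemoTable, invalidate) is hypothetical and not part of this pull request or of PL/Java. It assumes the OID is passed as a long and that the backend has some way to tell Java when an entry is stale, which is exactly the open question about deletions.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical sketch: keeps one Java instance per PostgreSQL OID.
final class OidInstanceCache<T> {
    private final Map<Long, T> instances = new ConcurrentHashMap<>();
    private final BiFunction<Long, Map<String, String>, T> factory;

    OidInstanceCache(BiFunction<Long, Map<String, String>, T> factory) {
        this.factory = factory;
    }

    // Returns the cached instance for this OID, creating it on first use.
    T getOrCreate(long oid, Map<String, String> options) {
        return instances.computeIfAbsent(oid, id -> factory.apply(id, options));
    }

    // Assumption: the backend calls this when the catalog object is dropped or altered.
    boolean invalidate(long oid) {
        return instances.remove(oid) != null;
    }

    int size() {
        return instances.size();
    }
}

// Stand-in for a foreign table implementation whose constructor takes OID and options.
final class DemoTable {
    final long oid;
    final Map<String, String> options;

    DemoTable(long oid, Map<String, String> options) {
        this.oid = oid;
        this.options = Collections.unmodifiableMap(new LinkedHashMap<>(options));
    }
}

final class OidInstanceCacheDemo {
    public static void main(String[] args) {
        OidInstanceCache<DemoTable> tables = new OidInstanceCache<>(DemoTable::new);
        Map<String, String> options = new LinkedHashMap<>();
        options.put("classname", "example.DemoTable"); // hypothetical option value

        DemoTable first = tables.getOrCreate(16391L, options);
        DemoTable second = tables.getOrCreate(16391L, options);
        System.out.println(first == second); // same instance is reused

        tables.invalidate(16391L);
        System.out.println(tables.getOrCreate(16391L, options) == first);
    }
}
```

The same cache type could be instantiated three times (wrapper, server, table), with the table factory looking up its parent server in the server cache.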

I know the java can update the local one-to-many relations when a new object is created but I don't know if there's a clean way to know that one has been deleted.

I think most of the FDWRoutine methods already use the JNI_FDW_Foreign_Table object, not the parent, but it would be easy to have the java side properly navigate to the appropriate server or wrapper.

I'll try to submit a revision later today.

@beargiles
Author

Quick update since I wasn't at a pull-request-ready stage but hope to have something in the next few days.

I've refactored the java side extensively to match my new information about what's created when. That said I still have some questions that can only be resolved by running some actual code.

So the current plan is two-pronged (and hence the delay.)

The first is to incorporate big chunks of my existing project that does successfully load a FDW. That FDW doesn't do much - but it doesn't blow chunks when I execute SELECT * FROM my_private_table. Wags will point out that I'm not actually providing any results but my main concern is that it doesn't fail and that I can use elog(NOTICE,...) to see what's called when.

The second is to have a preliminary implementation of more of the JNI methods while using Java constructs like Maps and enums. I'm also still evaluating our options when the jobject instance associated with each OID is no longer valid. I think we'll need to have a working "define and immediately use" implementation before we can start evaluating the costs and benefits of backend-side vs java-side caches, etc.

Finally I'll add huge

#ifdef NEVER

// any JNI code that I'm not 100% confident on

#endif

and verify that I can build a working FDW. That will make further development much easier since we'll have immediate feedback on whether the changes 1) broke anything and 2) worked as expected. I mentioned earlier that there's a standard location where you can put initialization scripts (both shell and sql) so the docker image can immediately create the extension, fdw, foreign user, server, and table. Or define a stored procedure to do this so we can see any NOTICE messages.

Java implementation snippet

I have a tentative implementation of a not-quite-blackhole table. It should give you a feel for what I'm thinking of, but of course many details are still up in the air.

// BlackholeTable.java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BlackholeTable {

    private final BlackholeServer server;
    private final Map<String, String> options;
    private final List<Map<String, Object>> data = new ArrayList<>();

    public BlackholeTable(BlackholeServer server, Map<String, String> options) {
        this.server = server;
        this.options = new LinkedHashMap<>(options);

        // do something based on options
        // e.g., they may define a mapping from the database column name to a class's field name and/or accessor.
        // ...
        // initialize data with predefined values.
    }

    // Note: this should also provide current database user and foreign user information
    // for any additional authorization checks.
    public BlackholeTableScanState newScanState() {
        return new BlackholeTableScanState(this, data);
    }
}

// BlackholeTableScanState.java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BlackholeTableScanState {

    private final BlackholeTable table;
    private final List<Map<String, Object>> data;
    private Iterator<Map<String, Object>> iter;
    private boolean isOpen;

    public BlackholeTableScanState(BlackholeTable table, List<Map<String, Object>> data) {
        this.table = table;
        this.data = new ArrayList<>(data); // private copy so close() can clear it
        this.isOpen = false;
    }

    // explainOnly is true when the scan is only being planned/explained and will not run.
    public boolean open(boolean explainOnly) {
        if (isOpen) {  } // log this for now

        if (iter != null) {
            // log this for now
            iter = null;
        }

        // do some safe stuff

        if (explainOnly) {
            return false;
        }

        // if necessary open and close external resources
        // ...

        isOpen = true;
        return true;
    }

    // Returns a copy of the next row, or an empty map once the data is exhausted.
    public Map<String, Object> next() {
        if (!isOpen) {  } // log this for now
        if (iter == null) {
            iter = data.iterator();
        }

        if (iter.hasNext()) {
            return new LinkedHashMap<>(iter.next());
        } else {
            return Collections.emptyMap();
        }
    }

    // Restarts the scan from the first row.
    public void reset() {
        if (!isOpen) { } // log this for now
        iter = data.iterator();
    }

    public void close() {
        if (!isOpen) { } // log this for now

        iter = null;
        data.clear();
        isOpen = false;
    }

    public void explain() {
        // provide information about the table, server, wrapper, possibly user
    }
}

Java annotations

The prior code assumes a very simple relationship, e.g., the map's keys have the same name as the foreign table's columns. This puts a nontrivial load on the developer.

This could be hidden with annotations. The class-level annotation would provide the table options (e.g., URL) while the field- or method-level annotations could provide an additional table option for mapping column names. This would make implementations cleaner (more DB-agnostic) and improve the potential for reuse.
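
As a rough illustration of the annotation idea, the sketch below maps annotated fields to column names via reflection. The annotation names (ForeignTableOptions, ForeignColumn), the Person class, and the url value are all made up for this example; nothing here exists in the pull request.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class-level annotation carrying table options such as a URL.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ForeignTableOptions {
    String url();
}

// Hypothetical field-level annotation naming the foreign table column.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ForeignColumn {
    String value();
}

// Example domain class; it knows nothing about the FDW beyond the annotations.
@ForeignTableOptions(url = "file:///tmp/people.csv")
final class Person {
    @ForeignColumn("full_name")
    String name = "Ada";

    @ForeignColumn("birth_year")
    int born = 1815;

    String notExported = "ignored"; // no annotation, so not part of the row
}

final class RowMapper {
    // Converts an annotated object into a column-name -> value row.
    static Map<String, Object> toRow(Object bean) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            ForeignColumn column = field.getAnnotation(ForeignColumn.class);
            if (column == null) {
                continue;
            }
            try {
                field.setAccessible(true);
                row.put(column.value(), field.get(bean));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read " + field.getName(), e);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(Person.class.getAnnotation(ForeignTableOptions.class).url());
        System.out.println(toRow(new Person()));
    }
}
```

With something like this the framework, not the developer, would be responsible for producing the Map<String, Object> rows that the scan state returns.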

Further abstraction

We could make the implementation of the ScanState much more reusable if we could use a Supplier<T> or Stream<T> instead of an explicit Collection, file reader, REST response, etc. The problem is that neither supports reset() - and I suspect it is mandatory since a database query may fail and need to be retried.

There may be a solution since Stream<T> includes this method:

   Stream<T> generate(Supplier<T> s);

and we would be covered if the ForeignTable provided an object that implemented this method and allowed it to be called more than once. It could use internal caching, re-read tables, resubmit REST calls, etc.

This seems like a weird requirement but it would allow the framework to hide massive amounts of implementation details from the java developer.
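
One possible reading of "a source that can be asked more than once" is a Supplier<Stream<T>>: reset() just drops the current stream and asks the supplier for a new one. The sketch below is only that interpretation, with a made-up class name (StreamScanState). It assumes the supplier is safe to call repeatedly, which is the requirement described above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Hypothetical scan state backed by a re-callable stream source.
final class StreamScanState<T> implements AutoCloseable {
    private final Supplier<Stream<T>> source;
    private Stream<T> stream;
    private Iterator<T> iter;

    StreamScanState(Supplier<Stream<T>> source) {
        this.source = source;
    }

    // Returns the next row, or null when the scan is exhausted.
    T next() {
        if (iter == null) {
            stream = source.get(); // may re-read a file, resubmit a REST call, etc.
            iter = stream.iterator();
        }
        return iter.hasNext() ? iter.next() : null;
    }

    // Discards the current stream; the next call to next() starts over.
    void reset() {
        close();
    }

    @Override
    public void close() {
        if (stream != null) {
            stream.close();
        }
        stream = null;
        iter = null;
    }
}

final class StreamScanStateDemo {
    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b");
        try (StreamScanState<String> scan = new StreamScanState<>(rows::stream)) {
            System.out.println(scan.next());
            System.out.println(scan.next());
            scan.reset();
            System.out.println(scan.next());
        }
    }
}
```

The implementer then only has to supply something like rows::stream, and the framework owns open, next, reset, and close.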

@jcflack
Contributor

jcflack commented May 31, 2025

> I've refactored the java side extensively to match my new information about what's created when.

I'll look forward to seeing that. I had questions about the way it is organized here.

As a kind of combination of useful information and cautionary tale, I offer this youtube link.

It's a 25-minute talk that was given at Postgres Extensions Day in Montréal earlier this month. The presenter describes a project where he used C, Java, and JNI inside a PostgreSQL backend, and he goes into everything he had to invent solutions for in dealing with JNI, memory management, signals, Java multithreading in single-threaded PostgreSQL, and so on.

He was one of three developers on the project, which they started a year ago, and somehow three developers worked for a year on a Java-with-JNI-in-PostgreSQL project and prepared a 25-minute talk about it without ever having looked at how PL/Java addresses any of those things, and so it was a 25-minute talk given in the year 2025 that could have come straight from the archived original documentation in the PL/Java repo from the year 2005.

I'm going to seriously recommend that you perhaps look at both: his video, and the archived PL/Java docs, in particular solutions.html. The combination ought to make clear that there is a collection of issues needing to be understood when hacking on PL/Java (or anything like PL/Java) and that they are as salient now as twenty years ago.

After time-traveling back to the archived docs, you will probably want to get a better sense of what in PL/Java's internals still fairly closely resembles what Thomas wrote 20 years ago, and which of his solutions have more significantly evolved.

For starters, I would recommend looking at two specific things: JNICalls.c (which is still, with minor changes, pretty much as Thomas designed it), and the doc for DualState (which is an evolution of his earlier approach).

While looking at JNICalls.c, you should satisfy yourself as to why it is that PL/Java uses those wrappers in preference to direct use of JNI. When looking at DualState, I think the class javadoc is thorough enough to generally cover what you need to be paying attention to.

These are not things you can hack on PL/Java (in the C parts, anyway) without knowing.

Fleshed out FDW a bit.

On the pljava-so side it fails to load (in the docker test) because of a dependency on
GLIBC_2.38.  I hadn't hit this with my sister project because it was much simpler and
didn't introduce additional dependencies.

I also found the bits of backend code that lets us access all five types of options at
the points where we need them. It's forcing me to continue to rethink some of my assumptions,
e.g., there's definitely a need to have a conceptual difference between FDW, server, and
table, but it looks like nearly all of the required functions only require a relation or
table.

Could the solution be having the java classes responsible for executing 'CREATE FOREIGN
DATA WRAPPER ...' etc? I had assumed the constructors would need to be called from the
backend - like all of the callbacks - but now think that's backwards. This would make it
much easier to ensure the options contain the correct classnames and method signatures. :-)

On the java side I have a big chunk of the API and thin implementation written. I haven't
had a chance to introduce some changes I mentioned earlier - ones that will provide a much
better abstraction between the FDW-specific bits and the actual implementation. That was because
I hoped to start seeing immediate feedback as I modified a working (but minimal) FDW.

The Dockerfile under pljava-so had always been a bit of a stopgap measure - a way to do a really
quick sanity check when modifying the backend code - but with the extension additions to the api
and examples modules the docker creation should probably be moved to 'packaging' anyway. At that
point it can continue to use the official postgresql images but use the locally built .so,
jars, etc., instead of trying to sneak in changes to a preconfigured docker image.
@beargiles
Author

In the most recent pull request the focus was on getting a working FDW - even if it meant all of the JNI was behind #ifdef blocks.
The docker image loads the newly built .so file but the CREATE EXTENSION PLJAVA fails due to different GLIBC versions. This docker image was always intended as a quick sanity check - not the final version - but the error is annoying because it keeps us from seeing logged messages showing what's being called when, what information is available, etc.

The Java side is still mostly a placeholder. The API and example code have been updated but I'm still learning more about how things are tied together.

The C side had a lot more code showing how to obtain the information that's already available in the callback methods, plus a #ifdef USE_JAVA block where I think we would need to make the actual call. However I haven't copied over some code that used advanced memory management.

There's also a lot of commented JNI that's essentially just a note to myself. It will probably all be wiped after I have a chance to review the stuff you mentioned above.

* @param options
* @return
*/
default boolean validateOptions(Map<String, String> options) { return true; };
Contributor

In this interface you have combined behaviors of a PostgreSQL catalog object (such as a getOptions method to report what options have been assigned in PostgreSQL DDL) with behaviors that belong to some chosen FDW implementation (such as a validateOptions method that knows which options are valid for that chosen implementation).

Those concerns are distinct. There should be (and already is) a catalog object interface that takes care of retrieving the foreign data wrapper's options and other particulars from the catalog. It's PL/Java's job to supply that and take care of the housekeeping, like caching it by its oid, making sure it isn't stale, and so on.

It's a particular FDW implementation's job to implement some different interface that specifies methods for validating the options, along with whatever other behavior an FDW implementation needs to supply.
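
A minimal sketch of that separation might look like the following. Both interface names are hypothetical stand-ins invented for this example: CatalogWrapper is not PL/Java's actual catalog object interface, and WrapperImplementation is not an interface from this pull request. The point is only that one type reports what the DDL assigned while the other decides whether those options are acceptable.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the catalog side: reports what PostgreSQL DDL assigned.
interface CatalogWrapper {
    long oid();
    Map<String, String> options();
}

// Hypothetical contract for the implementation side: knows which options are valid.
interface WrapperImplementation {
    boolean validateOptions(Map<String, String> options);
}

// Example implementation that accepts exactly one option named "url".
final class UrlOnlyWrapper implements WrapperImplementation {
    @Override
    public boolean validateOptions(Map<String, String> options) {
        return options.keySet().equals(Collections.singleton("url"));
    }
}

final class SeparationDemo {
    public static void main(String[] args) {
        // The catalog object only carries data; here it is faked with fixed values.
        CatalogWrapper catalog = new CatalogWrapper() {
            @Override
            public long oid() {
                return 16387L;
            }

            @Override
            public Map<String, String> options() {
                Map<String, String> assigned = new LinkedHashMap<>();
                assigned.put("url", "file:///tmp/data.csv");
                return assigned;
            }
        };

        WrapperImplementation implementation = new UrlOnlyWrapper();
        System.out.println(implementation.validateOptions(catalog.options()));
    }
}
```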

* @param options
* @return
*/
default boolean validateOptions(Map<String, String> options) { return true; };
Contributor

@jcflack jcflack Jun 3, 2025

Same comment as for the wrapper validateOptions

Indeed, most of the methods of this interface are methods proper to an FDW implementation. It's only methods like getId and getOptions that duplicate the methods of the catalog object.

* @param options
* @return
*/
default boolean validateOptions(Map<String, String> options) { return true; };
Contributor

Same comment as for wrapper validateOptions.

*
* @return
*/
default ResultSetMetaData getMetaData() { return null; }
Contributor

Are you assuming here that JDBC metadata will be adequate to describe the foreign table?

Are you assuming further that the foreign connection will involve JDBC? The JDBC ...MetaData interfaces will take many many lines of code to implement synthetically if it doesn't.

*
* @return
*/
default DatabaseMetaData getMetaData() { return null; }
Contributor

Same comment as for foreign table getMetaData.


#ifdef USE_JAVA
// JAVA CONSTRUCTOR based on foreigntable id
baserel->fdw_private = JNI_getForeignRelSize(user, root, baserel);
Contributor

Function names starting with JNI_ in PL/Java are JNICalls.c wrappers of Java JNI functions. That prefix shouldn't be used here.
