From 5b3502c82eb221eb420ac6143c24534c381b752d Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Thu, 24 Jul 2025 13:47:22 +0200
Subject: [PATCH 1/6] chore: Update Service exposition concepts page

---
 .../concepts/pages/service-exposition.adoc | 83 +++++++++----------
 1 file changed, 37 insertions(+), 46 deletions(-)

diff --git a/modules/concepts/pages/service-exposition.adoc b/modules/concepts/pages/service-exposition.adoc
index 0c0f8e86f..0deac2259 100644
--- a/modules/concepts/pages/service-exposition.adoc
+++ b/modules/concepts/pages/service-exposition.adoc
@@ -1,70 +1,61 @@
 = Service exposition
-:k8s-service: https://kubernetes.io/docs/concepts/services-networking/service/
-:k8s-service-types: https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
-:description: Explore Stackable's service exposition options: ClusterIP for internal access, NodePort for unstable external access, and LoadBalancer for stable external access.
-
+:listener-operator: xref:listener-operator:index.adoc
+:secret-operator: xref:secret-operator:index.adoc
+:listenerclass: xref:listener-operator:listenerclass.adoc
+:description: Explore how Stackable uses listener-operator to expose Services.
 
 Data products expose interfaces to the outside world.
 These interfaces (whether UIs, or APIs) can be accessed by other products or by end users.
-Other products accessing the interfaces can run inside or outside of the same Kubernetes cluster.
+Clients accessing the interfaces can run inside or outside of the same Kubernetes cluster.
 For example, xref:zookeeper:index.adoc[Apache ZooKeeper] is a dependency for other products, and it usually needs to be accessible only from within Kubernetes, while xref:superset:index.adoc[Apache Superset] is a data analysis product for end users and therefore needs to be accessible from outside the Kubernetes cluster.
 Users connecting to Superset can be restricted within the local company network, or they can connect over the internet depending on the company security policies and demands.
 This page gives an overview over the different options for service exposition, when to choose which option and how these options are configured.
 
-== Service exposition options
-
-The Stackable Data Platform supports three {k8s-service-types}[types of Kubernetes Service] for exposing data product endpoints:
+== Motivation
 
-* ClusterIP
-* NodePort
-* LoadBalancer
+Service exposition is such a complicated topic, that Stackable has build it's own operator for that: {listener-operator}[].
+The following section explains the motivation why we wrote such an operator over just using plain regular Kubernetes Services.
 
-All custom resources for data products provide a resource field named `spec.clusterConfig.listenerClass` which determines how the product can be accessed.
-There are three ListenerClasses, named after the goal for which they are used (more on this in the <>):
+=== Tools advertising their address
 
-* `cluster-internal` => Use ClusterIP (default)
-* `external-unstable` => Use NodePort
-* `external-stable` => Use LoadBalancer
+Some tools need to know how they are externally reachable.
+This is e.g. important for HDFS, where the namenode keeps track of which datanode serves which block or Kafka (used for client bootstrapping).
+A HDFS client asks the namenode "I want to read block 42, who is serving that?", the namenode responds with "block 42 is served by ".
+For that to work, the datanode needs to know it's external address on startup and tell it the namenode.
+(And yes, we needed to patch Hadoop source-code for that ;))
 
-The `cluster-internal` class exposes the interface of a product by using a ClusterIP Service.
-This service is only reachable from within the Kubernetes cluster.
-This setting is the most secure and was chosen as the default for that reason.
+The {listener-operator}[listener-operator] runs as CSI driver (same as the {secret-operator}[secret-operator]) and places files inside the CSI volume, which tell the tool how it is reachable.
 
-NOTE: Not all operators support all classes.
-Consult the operator specific documentation to find out about the supported service types.
+=== Integration with {secret-operator}[secret-operator]
 
-[#when-to-choose-which-option]
-== When to choose which option
+If a tool is secured using TLS or Kerberos, it does not only need to be reachable via the determined address, it also needs a TLS certificate/keytab issued on the determined address.
+{secret-operator}[secret-operator] integrated with to {listener-operator}[listener-operator], so that the platform takes care of provisioning certificates with the correct addresses (in the form of SAN entries).
 
-There are three options, one for internal traffic and two for external access, where internal and external refer to the Kubernetes cluster.
-Internal means inside of the Kuberenetes cluster, and external means access from outside of it.
+== {listenerclass}[ListenerClasses]
 
-=== Internal
+A {listenerclass}[] describes how a product should be exposed.
+Please read on {listenerclass}[it's documentation] before continuing on this page.
 
-`cluster-internal` is the default class and the Service behind it is only reachable from within Kubernetes.
-This is useful for middleware products such as xref:zookeeper:index.adoc[Apache ZooKeeper], xref:hive:index.adoc[Apache Hive metastore], or an xref:kafka:index.adoc[Apache Kafka] cluster used for internal data flow.
-Products using this ListenerClass are not accessible from outside Kubernetes.
+As a quick reminder, the platform ships with 3 default {listenerclass}[ListenerClasses]:
 
-=== External
+`cluster-internal`:: Used for listeners that are only accessible internally from the cluster. For example: communication between ZooKeeper nodes.
+`external-unstable`:: Used for listeners that are accessible from outside the cluster, but which do not require a stable address. For example: individual Kafka brokers.
+`external-stable`:: Used for listeners that are accessible from outside the cluster, and do require a stable address. For example: Kafka bootstrap.
 
-External access is needed when a product needs to be accessed from _outside_ of Kubernetes.
-This is necessary for all end user products such as xref:superset:index.adoc[Apache Superset].
-Some tools can expose APIs for data ingestion like xref:kafka:index.adoc[Apache Kafka] or xref:nifi:index.adoc[Apache NiFi].
-If data needs to be ingested from outside of the cluster, one of the external listener classes should be chosen.
+Keep in mind that you are not restricted to this list, you can configure your own custom {listenerclass}[ListenerClasses].
 
-When to use `stable` and when to use `unstable`?
-The `external-unstable` setting exposes a product interface via a Kuberneres NodePort.
-In this case the service's IP address and port can change if Kubernetes needs to restart or reschedule the Pod to another node.
+== Configuring the ListenerClass for a stacklet
 
-The `external-stable` class uses a LoadBalancer.
-The LoadBalancer is running at a fixed address and is therefore `stable`.
-Managed Kubernetes services in the cloud usually offer a LoadBalancer, but for an on premise cluster you have to configure a LoadBalancer yourself.
-For a production setup, it is recommended to use a LoadBalancer and the `external-stable` ListenerClass.
+We integrated {listener-operator}[listener-operator] into most of our products, currently only xref:opa:index.adoc[] and xref:spark-k8s:index.adoc[] are not using {listener-operator}[listener-operator].
 
-== Outlook
+Most of the tools configure the {listenerclass}[] at the role level as follows:
 
-For most of the Stackable operators, these listener classes are hardcoded to expose certain Service types and do not offer any additional configuration.
-However, some operators support specifying custom xref:listener-operator:listenerclass.adoc[ListenerClass]es with more granular configuration options, via the xref:listener-operator:index.adoc[listener-operator].
-In a future release, all Stackable operators are planned to be migrated over to this system.
+[source,yaml]
+----
+spec:
+  my-role:
+    roleConfig:
+      listenerClass: external-unstable
+----
 
-For more information on what is supported by any individual operator, please see that operator's documentation.
+Every operator has a documentation section called "Service exposition with ListenerClasses", which may provide details for the specific tool.

From f2e3b2c4153e003be5d175b80746f3d563d3f089 Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Thu, 24 Jul 2025 13:53:20 +0200
Subject: [PATCH 2/6] linter

---
 modules/concepts/pages/service-exposition.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/concepts/pages/service-exposition.adoc b/modules/concepts/pages/service-exposition.adoc
index 0deac2259..9d981c58b 100644
--- a/modules/concepts/pages/service-exposition.adoc
+++ b/modules/concepts/pages/service-exposition.adoc
@@ -58,4 +58,4 @@ spec:
       listenerClass: external-unstable
 ----
 
-Every operator has a documentation section called "Service exposition with ListenerClasses", which may provide details for the specific tool.
+Every operator has a documentation section called "Service exposition with ListenerClasses", which may provide details for the specific tool.
From 249e23ffda49ce14d32f9fb4b3caa97de7de25a7 Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Fri, 25 Jul 2025 10:18:11 +0200
Subject: [PATCH 3/6] Apply suggestions from code review

Co-authored-by: Malte Sander
---
 modules/concepts/pages/service-exposition.adoc | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/modules/concepts/pages/service-exposition.adoc b/modules/concepts/pages/service-exposition.adoc
index 9d981c58b..281c970c7 100644
--- a/modules/concepts/pages/service-exposition.adoc
+++ b/modules/concepts/pages/service-exposition.adoc
@@ -2,7 +2,7 @@
 :listener-operator: xref:listener-operator:index.adoc
 :secret-operator: xref:secret-operator:index.adoc
 :listenerclass: xref:listener-operator:listenerclass.adoc
-:description: Explore how Stackable uses listener-operator to expose Services.
+:description: Explore how Stackable utilizes the listener-operator to expose Services.
 
 Data products expose interfaces to the outside world.
 These interfaces (whether UIs, or APIs) can be accessed by other products or by end users.
@@ -14,12 +14,12 @@ This page gives an overview over the different options for service exposition, w
 == Motivation
 
 Service exposition is such a complicated topic, that Stackable has build it's own operator for that: {listener-operator}[].
-The following section explains the motivation why we wrote such an operator over just using plain regular Kubernetes Services.
+The following section explains the motivation behind implementing such an operator instead of using plain regular Kubernetes Services.
 
-=== Tools advertising their address
+=== Products advertising their addresses
 
-Some tools need to know how they are externally reachable.
-This is e.g. important for HDFS, where the namenode keeps track of which datanode serves which block or Kafka (used for client bootstrapping).
+Some products require information about their external accessibility.
+This is e.g. important for HDFS, where the namenode keeps track of which datanode serves which block. Another case is Kafka, where it is required for client bootstrapping.
 A HDFS client asks the namenode "I want to read block 42, who is serving that?", the namenode responds with "block 42 is served by ".
 For that to work, the datanode needs to know it's external address on startup and tell it the namenode.
 (And yes, we needed to patch Hadoop source-code for that ;))
@@ -34,7 +34,7 @@ If a tool is secured using TLS or Kerberos, it does not only need to be reachabl
 == {listenerclass}[ListenerClasses]
 
 A {listenerclass}[] describes how a product should be exposed.
-Please read on {listenerclass}[it's documentation] before continuing on this page.
+Please read on {listenerclass}[its documentation] before continuing on this page.
 
 As a quick reminder, the platform ships with 3 default {listenerclass}[ListenerClasses]:
 
@@ -44,9 +44,9 @@ As a quick reminder, the platform ships with 3 default {listenerclass}[ListenerC
 
 Keep in mind that you are not restricted to this list, you can configure your own custom {listenerclass}[ListenerClasses].
 
-== Configuring the ListenerClass for a stacklet
+== Configuring the ListenerClass for a Stacklet
 
-We integrated {listener-operator}[listener-operator] into most of our products, currently only xref:opa:index.adoc[] and xref:spark-k8s:index.adoc[] are not using {listener-operator}[listener-operator].
+The {listener-operator}[listener-operator] is integrated into most of the Stackable products, currently only xref:opa:index.adoc[] and xref:spark-k8s:index.adoc[] are not using {listener-operator}[listener-operator].
 
 Most of the tools configure the {listenerclass}[] at the role level as follows:
 

From ee2fb2b92ca1356cbe66f25964b0ee2236d022c7 Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Fri, 25 Jul 2025 10:18:32 +0200
Subject: [PATCH 4/6] Update modules/concepts/pages/service-exposition.adoc

Co-authored-by: Malte Sander
---
 modules/concepts/pages/service-exposition.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/concepts/pages/service-exposition.adoc b/modules/concepts/pages/service-exposition.adoc
index 281c970c7..a6f339ba0 100644
--- a/modules/concepts/pages/service-exposition.adoc
+++ b/modules/concepts/pages/service-exposition.adoc
@@ -20,7 +20,7 @@ The following section explains the motivation behind implementing such an operat
 
 Some products require information about their external accessibility.
 This is e.g. important for HDFS, where the namenode keeps track of which datanode serves which block. Another case is Kafka, where it is required for client bootstrapping.
-A HDFS client asks the namenode "I want to read block 42, who is serving that?", the namenode responds with "block 42 is served by ".
+A common use case is an HDFS client connecting to a namenode in order to read block 42. Therefore, the namenode needs to know which datanode is serving block 42. The namenode then responds with the IP or hostname of the datanode containing that block 42.
 For that to work, the datanode needs to know it's external address on startup and tell it the namenode.
 (And yes, we needed to patch Hadoop source-code for that ;))
 

From 0fb3949847923d29628dcda4b65692f36c9acc12 Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Fri, 25 Jul 2025 10:19:25 +0200
Subject: [PATCH 5/6] remove smiley :P

---
 modules/concepts/pages/service-exposition.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/concepts/pages/service-exposition.adoc b/modules/concepts/pages/service-exposition.adoc
index a6f339ba0..61d9ad4c4 100644
--- a/modules/concepts/pages/service-exposition.adoc
+++ b/modules/concepts/pages/service-exposition.adoc
@@ -22,7 +22,7 @@ Some products require information about their external accessibility.
 This is e.g. important for HDFS, where the namenode keeps track of which datanode serves which block. Another case is Kafka, where it is required for client bootstrapping.
 A common use case is an HDFS client connecting to a namenode in order to read block 42. Therefore, the namenode needs to know which datanode is serving block 42. The namenode then responds with the IP or hostname of the datanode containing that block 42.
 For that to work, the datanode needs to know it's external address on startup and tell it the namenode.
-(And yes, we needed to patch Hadoop source-code for that ;))
+(And yes, we needed to patch the Hadoop sourcecode for that)
 
 The {listener-operator}[listener-operator] runs as CSI driver (same as the {secret-operator}[secret-operator]) and places files inside the CSI volume, which tell the tool how it is reachable.
 
From a886b67ba3b01d9bb5cf5c96ba4c404bcf4fdce0 Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Fri, 25 Jul 2025 10:44:33 +0200
Subject: [PATCH 6/6] mention HDFS difference

---
 modules/concepts/pages/service-exposition.adoc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/modules/concepts/pages/service-exposition.adoc b/modules/concepts/pages/service-exposition.adoc
index 61d9ad4c4..f2025d04b 100644
--- a/modules/concepts/pages/service-exposition.adoc
+++ b/modules/concepts/pages/service-exposition.adoc
@@ -48,7 +48,9 @@ Keep in mind that you are not restricted to this list, you can configure your ow
 
 The {listener-operator}[listener-operator] is integrated into most of the Stackable products, currently only xref:opa:index.adoc[] and xref:spark-k8s:index.adoc[] are not using {listener-operator}[listener-operator].
 
-Most of the tools configure the {listenerclass}[] at the role level as follows:
+Most of the products configure the {listenerclass}[] at the role level as follows.
+However, there are some products that have this option at the rolegroup level.
+One example is HDFS, where some roles require a listener service per Pod, to individually access single instances.
 
 [source,yaml]
 ----