Skip to main content

Klustron Spatial Data Management

KlustronAbout 3 min

Klustron Spatial Data Management

Klustron supports managing spatial data using the PostGIS plugin, and all PostGIS features are available in Klustron. Users must use Klustron's built-in PostGIS plugin, which is developed by the Klustron team based on PostGIS-3.3.4 and is called Klustron-PostGIS. The Klustron-PostGIS plugin exchanges GIS data with storage nodes (Klustron-storage) in the format required by the storage nodes. Klustron can use the RTREE index of storage nodes to index GIS data, thus achieving better spatial data query performance. Especially since Klustron clusters can use hardware resources from multiple servers, this is very necessary for GIS data management because each GIS data is relatively large (at least tens of bytes, usually hundreds of bytes to tens of KB, or even larger), occupying significant storage space, and spatial calculations consume substantial resources, easily exhausting the hardware resources of a single server. Therefore, using centralized databases to manage GIS data inherently lacks scalability and cannot meet the needs of large-scale spatial data management scenarios. A Klustron cluster with Klustron-PostGIS mounted becomes a distributed PostGIS spatial database cluster with unlimited spatial data storage and computing capabilities.

This document demonstrates the functionality and usage of Klustron-PostGIS. In fact, the usage of Klustron-PostGIS is exactly the same as mounting PostGIS in PostgreSQL.


gis_test=# create extension postgis; -- Mount Klustron-PostGIS plugin
CREATE EXTENSION
gis_test=# create table village(id serial primary key, facility varchar(64) not null, coord geometry); -- geometry is the base type for all geometric objects; specific geometric types can also be used as needed, see PostGIS documentation.
CREATE TABLE
gis_test=# insert into village(facility, coord) values('school', st_geomfromtext('polygon((1 1, 3 1, 3 3, 1 3, 1 1))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('market', st_geomfromtext('polygon((-1 -1, -3 -1, -3 -3, -1 -3, -1 -1))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('post office', st_geomfromtext('polygon((-0.2 -0.2, 0.2 -0.2, 0.2 0.2, -0.2 0.2, -0.2 -0.2))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('hospital', st_geomfromtext('polygon((1 -1, 3 -1, 3 -3, 1 -3, 1 -1))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('warehouses', st_geomfromtext('polygon((-1 1, -5 1, -5 6, -1 6, -1 1))'));
INSERT 0 1
gis_test=# insert into village(facility, coord) values('unknown place', NULL);
INSERT 0 1
gis_test=# insert into village(facility, coord) values('available area', st_geomfromtext('polygon empty'));
INSERT 0 1

Let's look at the data retrieved from the village table. The GIS data is stored in binary encoding, and readers don't need to try to understand these binary GIS data. If users want to output GIS text data, they can use the ST_ASTEXT() function, for example: select id, facility, st_astext(coord) from village.

gis_test=# select*from village;
 id |    facility    |                                                                                           coord                                                                                        
     
----+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----
  1 | school         | 01030000000100000005000000000000000000F03F000000000000F03F0000000000000840000000000000F03F00000000000008400000000000000840000000000000F03F0000000000000840000000000000F03F000000000000
F03F
  2 | market         | 01030000000100000005000000000000000000F0BF000000000000F0BF00000000000008C0000000000000F0BF00000000000008C000000000000008C0000000000000F0BF00000000000008C0000000000000F0BF000000000000
F0BF
  3 | post office    | 010300000001000000050000009A9999999999C9BF9A9999999999C9BF9A9999999999C93F9A9999999999C9BF9A9999999999C93F9A9999999999C93F9A9999999999C9BF9A9999999999C93F9A9999999999C9BF9A9999999999
C9BF
  4 | hospital       | 01030000000100000005000000000000000000F03F000000000000F0BF0000000000000840000000000000F0BF000000000000084000000000000008C0000000000000F03F00000000000008C0000000000000F03F000000000000
F0BF
  5 | warehouses     | 01030000000100000005000000000000000000F0BF000000000000F03F00000000000014C0000000000000F03F00000000000014C00000000000001840000000000000F0BF0000000000001840000000000000F0BF000000000000
F03F
  6 | unknown place  | 
  7 | available area | 010300000000000000
(7 rows)

The following query finds geometric shapes that contain the point with coordinates (0,0). According to the data above, the query result is correct. Using the EXPLAIN statement to view the query plan, we can see that the ST_CONTAINS containment relationship is pushed down to the storage node and executed using MBRCONTAINS. This query can use the spatial index of the coord column on the storage node, if it exists. At this point, we haven't created a spatial index on the coord column. Querying for special value NULL and checking for emptiness can also be pushed down to storage nodes for execution. Note that NULL and empty spatial data are different, similar to how NULL and empty string '' are different for character type columns.

gis_test=# select*from village where st_contains(coord, st_geomfromtext('point(0 0)'));
 id |  facility   |                                                                                           coord                                                                                          
  
----+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----
  3 | post office | 010300000001000000050000009A9999999999C9BF9A9999999999C9BF9A9999999999C93F9A9999999999C9BF9A9999999999C93F9A9999999999C93F9A9999999999C9BF9A9999999999C93F9A9999999999C9BF9A9999999999C9B
F
(1 row)

gis_test=# explain select*from village where st_contains(coord, st_geomfromtext('point(0 0)'));
                                                                                                 QUERY PLAN                                                                                                  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 RemotePlan  (cost=101.02..101.02 rows=1 width=182)
   Filter: _st_contains(coord, '010100000000000000000000000000000000000000'::geometry)
   Shard: 3      Remote SQL: SELECT village.id,village.facility,village.coord FROM  `gis_test_$$_public`.`village`  WHERE (MBRCONTAINS(village.coord, 0x00000000010100000000000000000000000000000000000000))
(3 rows)


gis_test=# explain select*from village where coord is null;
                                                                    QUERY PLAN                                                                     
--------------------------------------------------------------------------------------------------------------------------------------------------
 RemotePlan  (cost=102.02..102.02 rows=1 width=182)
   Shard: 3      Remote SQL: SELECT village.id,village.facility,village.coord FROM  `gis_test_$$_public`.`village`  WHERE (village.coord IS NULL)
(2 rows)


gis_test=# select*from village where st_isempty(coord);
 id |    facility    |       coord        
----+----------------+--------------------
  7 | available area | 010300000000000000
(1 row)

gis_test=# explain select*from village where st_isempty(coord);
                                                                     QUERY PLAN                                                                      
----------------------------------------------------------------------------------------------------------------------------------------------------
 RemotePlan  (cost=102.02..102.02 rows=1 width=182)
   Shard: 3      Remote SQL: SELECT village.id,village.facility,village.coord FROM  `gis_test_$$_public`.`village`  WHERE ST_IsEmpty(village.coord)
(2 rows)

Create a spatial index on the village.coord column, then EXPLAIN the query plan. From the query plan, you can't see the difference between before and after creating the spatial index. However, as shown in the figure below, in the storage node, when explaining the query statement sent to the storage node, with the spatial index, the index is used to execute the query received by the storage node.

gis_test=# 
gis_test=# 
gis_test=# create index vil_coord_idx on village using gist(coord);
CREATE INDEX
gis_test=# explain select*from village where st_contains(coord, st_geomfromtext('point(0 0)'));
                                                                                                 QUERY PLAN                                                                                                  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 RemotePlan  (cost=102.02..102.02 rows=1 width=182)
   Filter: _st_contains(coord, '010100000000000000000000000000000000000000'::geometry)
   Shard: 3      Remote SQL: SELECT village.id,village.facility,village.coord FROM  `gis_test_$$_public`.`village`  WHERE (MBRCONTAINS(village.coord, 0x00000000010100000000000000000000000000000000000000))
(3 rows)

gis_test=# select*from village where coord is null;
 id |   facility    | coord 
----+---------------+-------
  6 | unknown place | 
(1 row)

Query plan for spatial queries executed on storage nodes