Clustering geospatial markers with Elasticsearch

I have several hundred thousand documents in an Elasticsearch index, each with a latitude and longitude (stored as a geo_point type). I would like to create a map visualization that looks something like this: http://leaflet.imtqy.com/Leaflet.markercluster/example/marker-clustering-realworld.388.html

So I think I want to run a query with a bounding box (i.e. the bounds of the map the user is looking at) and get back a summary of the clusters inside that bounding box. Is there a good way to achieve this in Elasticsearch? Perhaps a new indexing strategy? Something like geohash might work, but it would cluster things into a rectangular grid rather than into arbitrary polygons based on the density of points, as in the example above.


@kumetix - Good question. I am responding to your comment here because the text is too long to fit in another comment. The geohash_precision setting determines the maximum precision at which a geohash aggregation can return results. For example, if geohash_precision is set to 8, we can run a geohash aggregation on that field with a precision of at most 8. According to the reference, that groups results into geohash cells of approximately 38.2 m x 19 m. A precision of 7 or 8 would probably be accurate enough to display a web map like the one in the example mentioned above.
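To sanity-check the cell size quoted above, here is a rough sketch that estimates the width and height of a geohash cell for a given precision. It assumes the standard geohash layout (5 bits per character, alternating longitude and latitude bits, longitude first) and measures at the equator on a spherical approximation; the function name is made up for illustration.

```python
# Approximate equatorial circumference of the Earth in metres.
EQUATOR_M = 40075016.686


def geohash_cell_size(precision):
    """Return (width_m, height_m) of a geohash cell at the equator."""
    bits = 5 * precision
    lon_bits = (bits + 1) // 2  # longitude gets the extra bit when bits is odd
    lat_bits = bits // 2
    width_m = EQUATOR_M / 2 ** lon_bits          # 360 degrees of longitude
    height_m = (EQUATOR_M / 2) / 2 ** lat_bits   # 180 degrees of latitude
    return width_m, height_m


for p in (5, 6, 7, 8):
    w, h = geohash_cell_size(p)
    print(p, round(w, 1), round(h, 1))
```

Cells get narrower away from the equator, so treat these numbers as an upper bound on width.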

As for how geohash_precision affects the internals of the cluster, I assume this setting makes the geo_point store every geohash string of length <= geohash_precision. Say we have a point at the Statue of Liberty: 40.6892, -74.0444. Its 12-character geohash is dr5r7p4xb2ts. Setting geohash_precision on the geo_point to 8 would internally store the strings: d dr dr5 dr5r dr5r7 dr5r7p dr5r7p4 dr5r7p4x

and a geohash_precision of 12 would additionally store the strings: dr5r7p4xb dr5r7p4xb2 dr5r7p4xb2t dr5r7p4xb2ts

which means slightly more storage overhead for each geo_point. Setting geohash_precision to a distance value (1km, 1m, etc.) probably just stores it at the nearest geohash string length.
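The prefix lists above can be reproduced with a couple of lines of Python. This only illustrates my assumption about what gets stored; it is not taken from the Elasticsearch source, and the helper name is hypothetical.

```python
def geohash_prefixes(geohash, precision):
    """List every prefix of `geohash` up to `precision` characters."""
    limit = min(precision, len(geohash))
    return [geohash[:n] for n in range(1, limit + 1)]


full = "dr5r7p4xb2ts"  # Statue of Liberty, from the example above
print(geohash_prefixes(full, 8))
print(geohash_prefixes(full, 12)[8:])  # the extra strings at precision 12
```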

Note: how to calculate a geohash using Python:

$ pip install python-geohash
>>> import geohash
>>> geohash.encode(40.6892,-74.0444)
'dr5r7p4xb2ts'
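If you would rather not install a package, the encoding itself is short. Below is a minimal sketch of the standard geohash algorithm (repeatedly halve the longitude and latitude ranges, interleave the resulting bits, and emit one base-32 character per 5 bits). The function name is mine; it is not part of python-geohash.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=12):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value, nbits = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | bit
        nbits += 1
        use_lon = not use_lon
        if nbits == 5:  # 5 bits make one base-32 character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


print(geohash_encode(40.6892, -74.0444))
```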

Elasticsearch 1.0 includes the Geohash Grid aggregation.
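As a rough sketch, a request body for that aggregation could be built like this. It wraps geohash_grid in a filter aggregation with a geo_bounding_box so that only the visible map area is bucketed. The field name "Location", the bounding-box corners, and the helper and aggregation names ("viewport", "cells") are placeholders for illustration; adjust them to your own mapping.

```python
import json


def build_cluster_request(field, top_left, bottom_right, precision):
    """Build a search body that buckets points in a viewport by geohash cell."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "viewport": {
                "filter": {
                    "geo_bounding_box": {
                        field: {
                            "top_left": {"lat": top_left[0], "lon": top_left[1]},
                            "bottom_right": {"lat": bottom_right[0], "lon": bottom_right[1]},
                        }
                    }
                },
                "aggs": {
                    "cells": {
                        "geohash_grid": {"field": field, "precision": precision}
                    }
                },
            }
        },
    }


body = build_cluster_request("Location", (45.27, -34.45), (-35.32, 1.85), 5)
print(json.dumps(body, indent=2))
```

Each returned bucket carries a geohash key and a doc_count, which is enough to draw one cluster marker per cell. Picking the precision from the map zoom level is left to the client.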



You can also try this plugin:

https://github.com/triforkams/geohash-facet

Example query:

GET /things/thing/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "Location": {
            "top_left": {
              "lat": 45.274886437048941,
              "lon": -34.453125
            },
            "bottom_right": {
              "lat": -35.317366329237856,
              "lon": 1.845703125
            }
          }
        }
      }
    }
  },
  "facets": {
    "places": {
      "geohash": {
        "field": "Location",
        "factor": 0.85
      }
    }
  }
}