API Reference

atlas_profiler.process_dataset(data, geo_classifier=True, geo_classifier_threshold=0.5, include_sample=False, coverage=True, plots=False, indexes=True, load_max_size=None, metadata=None, nominatim=None, datamart_geo_data=None, **kwargs)

Compute all metafeatures from a dataset.

Parameters:
  • data – path to dataset, or file object, or DataFrame

  • geo_classifier – True to enable the geo classifier

  • geo_classifier_threshold – Confidence threshold for geo_classifier predictions (default: 0.5, as in the signature). Predictions below this threshold are treated as “non_spatial”.

  • include_sample – Set to True to include a few random rows in the result. Useful for presenting the data to a user.

  • coverage – Whether to compute data ranges

  • plots – Whether to compute plots

  • indexes – Whether to include indexes. If True (the default) and the input is a DataFrame with index(es) other than the default range, those indexes appear in the result alongside the columns.

  • load_max_size (bytes) – Target size of the data to be analyzed. The data will be randomly sampled if it is bigger. Defaults to MAX_SIZE, currently 5 MB (5000000). This is different from the sample data included in the result.

  • metadata – The metadata provided by the discovery plugin (might be very limited).

  • nominatim – URL of the Nominatim server

  • datamart_geo_data – True or a datamart_geo.GeoData instance to use to resolve named administrative territorial entities
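
The thresholding rule described for geo_classifier_threshold can be illustrated with a small sketch. This is not the library's code: apply_threshold is a hypothetical helper and the "latitude" label is made up for the example. Only the "non_spatial" fallback comes from the description above.

```python
def apply_threshold(label, confidence, threshold):
    """Keep a predicted label only if its confidence reaches the threshold.

    Hypothetical helper for illustration; the real classifier's internals
    are not documented here.
    """
    # Predictions below the threshold fall back to "non_spatial".
    return label if confidence >= threshold else "non_spatial"


# "latitude" is an invented label, used only to exercise the rule.
confident = apply_threshold("latitude", 0.9, 0.5)
uncertain = apply_threshold("latitude", 0.3, 0.5)
```

Raising the threshold makes the classifier more conservative: fewer columns are reported as spatial, and those that are come with higher confidence.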

Returns:

JSON structure (dict)
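
A minimal usage sketch, assuming atlas_profiler is installed and that the returned dict serializes with the standard json module, as the documented return type suggests. The profile_to_json helper and its profile_func hook are hypothetical names introduced here so the example can run without the package. They are not part of the library.

```python
import json


def profile_to_json(data, profile_func=None, **options):
    """Profile `data` and return the result as a JSON string.

    `profile_func` is a hypothetical hook added for this sketch; when it is
    omitted, atlas_profiler.process_dataset is imported and used.
    """
    if profile_func is None:
        # Assumes atlas_profiler is installed in the current environment.
        from atlas_profiler import process_dataset as profile_func
    # Options are forwarded unchanged, e.g. include_sample=True, plots=False.
    metadata = profile_func(data, **options)
    # The documented return value is a JSON structure (dict), so it is
    # expected to serialize directly.
    return json.dumps(metadata, indent=2, sort_keys=True)
```

With the package installed, a call such as profile_to_json("data.csv", include_sample=True, coverage=False) would profile the file, include a few sample rows, and skip computing data ranges. Here "data.csv" is a placeholder path.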