Atlas Profiler
Atlas Profiler is a dataset profiling library. Given a CSV/TSV file, a file-like object, or a pandas DataFrame, it returns JSON-style metadata describing the dataset: its columns, detected types, value ranges, optional plots, spatial and temporal coverage, and profiling runtime.
The package builds on the Datamart Profiler workflow and adds an ML-assisted spatial column classifier. That classifier is only one part of the profiler: non-spatial columns still go through the core rule-based type detection, statistics, plots, coverage, and dataset-summary pipeline.
What It Produces
process_dataset(...) returns a metadata dictionary with fields such as:
- Dataset size, row count, profiled row count, and column count.
- Per-column structural type, semantic types, missing/unclean value ratios, distinct counts, and optional plots.
- Dataset-level type summary: numerical, categorical, spatial, and temporal.
- Spatial coverage from lat/long pairs, WKT points, resolved addresses, and administrative areas.
- Temporal coverage and temporal resolution for datetime columns.
- Attribute keywords derived from column names.
- Optional random sample rows and per-step profiling timings.
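
To make the shape of the result more concrete, here is a minimal sketch that walks a hand-written metadata dictionary. The key names (`nb_rows`, `columns`, `structural_type`, `semantic_types`, `missing_values_ratio`) and the type labels are assumptions chosen for illustration, loosely modeled on Datamart Profiler-style output. Inspect the dictionary actually returned by process_dataset(...) before relying on any of them.

```python
# Hand-written stand-in for a profiler result. Every key name and label
# below is an assumption for illustration, not a documented field.
example_metadata = {
    "nb_rows": 4,
    "nb_columns": 2,
    "columns": [
        {
            "name": "pickup_lat",
            "structural_type": "float",
            "semantic_types": ["latitude"],
            "missing_values_ratio": 0.0,
        },
        {
            "name": "comment",
            "structural_type": "text",
            "semantic_types": [],
            "missing_values_ratio": 0.5,
        },
    ],
}


def summarize_columns(metadata):
    """Return one short summary line per column, tolerating absent keys."""
    lines = []
    for column in metadata.get("columns", []):
        semantic = ", ".join(column.get("semantic_types", [])) or "none"
        missing = column.get("missing_values_ratio", 0.0)
        lines.append(
            f"{column.get('name', '?')}: {column.get('structural_type', '?')} "
            f"(semantic: {semantic}; missing: {missing:.0%})"
        )
    return lines


summary = summarize_columns(example_metadata)
print("\n".join(summary))
```

Reading fields through `.get(...)` keeps the sketch tolerant of optional entries, which matters because several parts of the output (plots, samples, timings) are described as optional.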
Core Type System
The profiler detects broad structural types for all columns:
| Structural type | Meaning |
|---|---|
| | Empty column. |
| | Integer-like values. |
| | Floating point values. |
| | String/text values. |
| | Boolean-like values such as true/false, yes/no, 0/1. |
| | Point geometry or coordinate-pair strings. |
| | Polygon-like geometry. |
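
As a rough illustration of rule-based structural typing, the sketch below classifies a column of raw strings using ordered checks. It is not the profiler's implementation: the function name, the returned labels, and the choice to test boolean before integer are all assumptions, and the two geometry types are left out for brevity. It also requires every value to match, whereas a real profiler that reports unclean value ratios presumably tolerates some non-conforming values.

```python
# Hypothetical token set; the source only lists true/false, yes/no, 0/1.
BOOLEAN_TOKENS = {"true", "false", "yes", "no", "0", "1"}


def _all_parse(values, parser):
    """Return True if parser accepts every value without raising ValueError."""
    try:
        for value in values:
            parser(value)
    except ValueError:
        return False
    return True


def infer_structural_type(values):
    """Classify a column of raw strings with simple ordered rules.

    The labels returned here are placeholders for this sketch only.
    """
    cleaned = [v.strip() for v in values if v is not None and v.strip()]
    if not cleaned:
        return "empty"
    # Boolean is checked first so that a pure 0/1 column is not called integer.
    if all(v.lower() in BOOLEAN_TOKENS for v in cleaned):
        return "boolean"
    # Python's int() and float() accept a few extras (e.g. "1_000", "nan"),
    # which a stricter parser might want to reject.
    if _all_parse(cleaned, int):
        return "integer"
    if _all_parse(cleaned, float):
        return "float"
    return "text"


print(infer_structural_type(["1.5", "2", "3e2"]))
```

The order of the checks is the main design decision: each rule is strictly more permissive than the one before it, so the first rule that accepts every value gives the narrowest type.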
It also annotates semantic types when evidence is available:
| Semantic type | Examples |
|---|---|
| | Dates, timestamps, and year columns. |
| | Coordinate columns, paired after profiling. |
| | Address-like and admin-area text, optionally resolved with Nominatim or |
| | URLs, file paths, IDs, and categorical columns. |
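
The table says coordinate columns are paired after profiling but does not say how. One plausible approach, sketched below purely as an assumption, is to match latitude and longitude columns whose names share a stem once the coordinate keyword is removed. The function names and regular expressions are hypothetical, and the matching is deliberately naive (a name that contains "lat" inside another word would confuse it).

```python
import re

# Hypothetical keyword patterns for this sketch.
LAT_PATTERN = re.compile(r"lat(itude)?", re.IGNORECASE)
LONG_PATTERN = re.compile(r"lon(g(itude)?)?", re.IGNORECASE)


def _stem(name, pattern):
    """Remove the coordinate keyword so paired columns share a stem."""
    return pattern.sub("", name, count=1).strip("_- ").lower()


def pair_coordinates(latitude_columns, longitude_columns):
    """Pair latitude and longitude column names that share the same stem."""
    longitudes_by_stem = {_stem(n, LONG_PATTERN): n for n in longitude_columns}
    pairs, unmatched = [], []
    for lat_name in latitude_columns:
        long_name = longitudes_by_stem.pop(_stem(lat_name, LAT_PATTERN), None)
        if long_name is None:
            unmatched.append(lat_name)
        else:
            pairs.append((lat_name, long_name))
    return pairs, unmatched


pairs, unmatched = pair_coordinates(
    ["pickup_lat", "dropoff_latitude", "home_lat"],
    ["dropoff_lon", "pickup_longitude"],
)
print(pairs, unmatched)
```

Pairing as a separate pass makes sense because a latitude column carries no spatial coverage by itself; only the matched pair describes points.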
Spatial ML Classifier
When geo_classifier=True, Atlas Profiler creates a HybridGeoClassifier(GeoClassifier()). It samples values from each column, predicts spatial labels in one batch, validates sensitive predictions with rules, and passes accepted labels into the normal profiler type system.
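
The sample, batch-predict, validate, accept flow described above can be sketched with the standard library alone. Everything in this block is a stand-in: `stub_predict_batch` replaces the real model with a name-based guess, and the range rules, labels, and function names are assumptions rather than the behavior of HybridGeoClassifier or GeoClassifier.

```python
import random


def sample_values(values, k=5, seed=0):
    """Take up to k non-empty values from a column, reproducibly."""
    non_empty = [v for v in values if v not in (None, "")]
    rng = random.Random(seed)
    return non_empty if len(non_empty) <= k else rng.sample(non_empty, k)


def stub_predict_batch(samples_by_column):
    """Stand-in for a model: guesses a label from the column name only."""
    labels = {}
    for name in samples_by_column:
        lowered = name.lower()
        if "lat" in lowered:
            labels[name] = "latitude"
        elif "lon" in lowered:
            labels[name] = "longitude"
        else:
            labels[name] = None
    return labels


def in_range(samples, low, high):
    """Rule check: samples exist and all parse as floats inside [low, high]."""
    try:
        return bool(samples) and all(low <= float(v) <= high for v in samples)
    except ValueError:
        return False


# Hypothetical validation rules keyed by predicted label.
RULES = {
    "latitude": lambda s: in_range(s, -90.0, 90.0),
    "longitude": lambda s: in_range(s, -180.0, 180.0),
}


def classify_columns(table):
    """Sample, predict in one batch, then keep only rule-validated labels."""
    samples = {name: sample_values(vals) for name, vals in table.items()}
    predicted = stub_predict_batch(samples)  # one call covers all columns
    accepted = {}
    for name, label in predicted.items():
        if label is not None and RULES[label](samples[name]):
            accepted[name] = label
    return accepted


table = {
    "lat": ["40.7", "41.2"],
    "lon": ["-74.0", "-73.9"],
    "platform": ["ios", "android"],
    "latency_ms": ["120", "95"],
}
accepted = classify_columns(table)
print(accepted)
```

In this example the stub wrongly predicts "latitude" for `platform` and `latency_ms`, and the rule layer rejects both. That is the point of a hybrid design: a cheap deterministic check catches model predictions that the data itself contradicts, and rejected columns fall back to the normal type detection.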