POST https://api.us-east-1.relevance.ai/latest/datasets//aggregate
Aggregates (groups by) a collection using an aggregation query.
The aggregation query is a JSON body with the following schema:
{
"groupby" : [
{"name": <alias>, "field": <field in the collection>, "agg": "category"},
{"name": <alias>, "field": <another groupby field in the collection>, "agg": "numeric"}
],
"metrics" : [
{"name": <alias>, "field": <numeric field in the collection>, "agg": "avg"},
{"name": <alias>, "field": <another numeric field in the collection>, "agg": "max"},
{"name": <alias>, "fields": [<numeric field in the collection>, <another numeric field in the collection>], "agg": "correlation"}
]
}
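As a minimal sketch, the request below is built with the Python standard library but not sent. It assumes that the empty segment in the documented endpoint ("datasets//aggregate") holds the dataset ID and that the query is sent as the top-level JSON body; `build_aggregate_request` and "my-dataset" are hypothetical names, and authentication headers are not covered here.

```python
import json
import urllib.request

# Assumption: the dataset ID fills the empty segment in "datasets//aggregate".
BASE_URL = "https://api.us-east-1.relevance.ai/latest/datasets/{dataset_id}/aggregate"


def build_aggregate_request(dataset_id: str, groupby: list, metrics: list) -> urllib.request.Request:
    """Build (but do not send) a POST request whose JSON body follows the schema above.

    Hypothetical helper; auth headers are intentionally left out.
    """
    body = {"groupby": groupby, "metrics": metrics}
    return urllib.request.Request(
        BASE_URL.format(dataset_id=dataset_id),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_aggregate_request(
    "my-dataset",  # hypothetical dataset ID
    groupby=[{"name": "region", "field": "player_region", "agg": "category"}],
    metrics=[{"name": "average_score", "field": "final_score", "agg": "avg"}],
)
print(req.get_method(), req.full_url)
```

Serializing with `json.dumps` also guarantees valid JSON (double quotes, no trailing commas), which hand-written bodies easily get wrong.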
For example, the following query groups scores by region and player name:
{
"groupby" : [
{"name": "region", "field": "player_region", "agg": "category"},
{"name": "player_name", "field": "name", "agg": "category"}
],
"metrics" : [
{"name": "average_score", "field": "final_score", "agg": "avg"},
{"name": "max_score", "field": "final_score", "agg": "max"},
{"name": "total_score", "field": "final_score", "agg": "sum"},
{"name": "average_deaths", "field": "final_deaths", "agg": "avg"},
{"name": "highest_deaths", "field": "final_deaths", "agg": "max"},
{"name": "score_death_correlation", "fields": ["final_deaths", "final_score"], "agg": "correlation"}
]
}
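To make the semantics of this example concrete, here is a local approximation in plain Python over a few hypothetical documents. It is not the server's implementation: it only mirrors the two "category" groupby entries and the single-field metrics, and it leaves out the correlation metric, which needs at least two varying values per group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy documents; field names match the example query above.
docs = [
    {"player_region": "EU", "name": "ana", "final_score": 10, "final_deaths": 2},
    {"player_region": "EU", "name": "ana", "final_score": 30, "final_deaths": 4},
    {"player_region": "NA", "name": "bo", "final_score": 20, "final_deaths": 1},
]


def local_groupby(documents):
    """Approximate locally what the example query asks the server to compute."""
    groups = defaultdict(list)
    for doc in documents:
        # Two "category" groupby entries -> one group per (region, player_name) pair.
        groups[(doc["player_region"], doc["name"])].append(doc)
    results = []
    for (region, player), rows in groups.items():
        scores = [r["final_score"] for r in rows]
        deaths = [r["final_deaths"] for r in rows]
        results.append({
            "region": region,
            "player_name": player,
            "frequency": len(rows),  # the API adds a frequency to every aggregation
            "average_score": mean(scores),
            "max_score": max(scores),
            "total_score": sum(scores),
            "average_deaths": mean(deaths),
            "highest_deaths": max(deaths),
        })
    return results


for row in local_groupby(docs):
    print(row)
```

Each metric's "name" becomes a key in the group's result, which is why aliases should be unique within a query.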
- "groupby" lists the fields to split the data by. The available groupby types are:
  - "category": group by a categorical field
  - "numeric": group by a numeric field
- "metrics" lists the metrics to calculate for each group; every aggregation also includes a frequency metric. The available metric types are:
  - For single fields: "avg"/"average"/"mean", "cardinality", "count", "kurtosis", "max", "min", "percentiles", "std_deviation", "std_deviation_bounds", "sum", "sum_of_squares", "variance"
  - For multiple fields (use the "fields" attribute instead of "field"): "correlation", "covariance", "kurtosis", "mean", "skewness", "variance"
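Because some metrics use "field" and others "fields", a client-side check can catch mismatches before a request is sent. The sketch below copies the metric names from the lists above; `check_metric` is a hypothetical helper, and the server may accept or reject entries differently.

```python
# Metric names as listed in the documentation above.
SINGLE_FIELD_AGGS = {
    "avg", "average", "mean", "cardinality", "count", "kurtosis", "max", "min",
    "percentiles", "std_deviation", "std_deviation_bounds", "sum",
    "sum_of_squares", "variance",
}
MULTI_FIELD_AGGS = {"correlation", "covariance", "kurtosis", "mean", "skewness", "variance"}


def check_metric(metric: dict) -> None:
    """Raise ValueError if a metric entry pairs "field"/"fields" with the wrong agg.

    Hypothetical client-side check; it does not replace server validation.
    """
    agg = metric.get("agg")
    if "field" in metric and "fields" in metric:
        raise ValueError("use either 'field' or 'fields', not both")
    if "fields" in metric:
        if agg not in MULTI_FIELD_AGGS:
            raise ValueError(f"{agg!r} is not a multi-field metric")
    elif "field" in metric:
        if agg not in SINGLE_FIELD_AGGS:
            raise ValueError(f"{agg!r} is not a single-field metric")
    else:
        raise ValueError("metric needs 'field' or 'fields'")


# Passes: correlation uses "fields".
check_metric({"name": "score_death_correlation",
              "fields": ["final_deaths", "final_score"], "agg": "correlation"})
```

Names such as "kurtosis", "mean" and "variance" appear in both lists, so the check keys on which attribute is present rather than on the metric name alone.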
The response lists the groups in descending order of frequency.
To return documents for each group, add a "group_size" parameter to the groupby entry, and optionally a "select_fields" parameter to limit which fields are returned.
For example:
{
"groupby": [
{"name": "Manufacturer", "field": "manufacturer", "agg": "category",
"group_size": 10, "select_fields": ["name"]}
],
"metrics": [
{"name": "Price Average", "field": "price", "agg": "avg"}
]
}
Each group in the response then carries its documents, for example:
{"title": {"title": "books", "frequency": 200, "documents": [{...}, {...}]}, {"title": "books", "frequency": 100, "documents": [{...}, {...}]}}
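A minimal sketch of consuming such a response, assuming the groups arrive as a list of objects with a group key, a "frequency" and a "documents" list (the exact response shape is an assumption based on the example above). `top_groups` and the sample data are hypothetical; the sort simply re-applies the descending-frequency order client-side.

```python
# Hypothetical parsed response; the shape is assumed from the example above.
groups = [
    {"title": "music", "frequency": 100, "documents": [{"name": "b"}]},
    {"title": "books", "frequency": 200, "documents": [{"name": "a"}, {"name": "c"}]},
]


def top_groups(groups, n=None):
    """Return groups sorted by descending frequency, optionally keeping the top n."""
    ordered = sorted(groups, key=lambda g: g["frequency"], reverse=True)
    return ordered if n is None else ordered[:n]


for g in top_groups(groups):
    # With "select_fields": ["name"], each document should only carry "name".
    names = [d.get("name") for d in g.get("documents", [])]
    print(g["title"], g["frequency"], names)
```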
For array aggregations, add "agg": "array" to the aggregation query.