A dataset stores documents to be searched, retrieved, filtered and aggregated (similar to Collections in MongoDB, Tables in SQL, Indexes in Elasticsearch).
A powerful and core feature of VecDB is that you can store both your metadata and vectors in the same document.
When specifying the schema of a dataset and inserting your own vectors, use the suffix "_vector_" (i.e. the field name must end with "_vector_") and specify the length of the vector in dataset_schema.
For example:
{
"product_image_vector_": 1024,
"product_text_description_vector_" : 128
}
These are the field types supported in our datasets: ["text", "numeric", "date", "dict", "chunks", "vector", "chunkvector"].
For example:
{
"product_text_description" : "text",
"price" : "numeric",
"created_date" : "date",
"product_texts_chunk_": "chunks",
"product_text_chunkvector_" : 1024
}
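As an illustrative sketch (not part of the VecDB API), a schema dict like the one above can be sanity-checked client-side before creating the dataset; the helper name and rules below are assumptions drawn only from this page:

```python
# Supported scalar field types (from the list above). Vector fields instead
# map to a positive integer dimension and end with a "_vector_" suffix.
SUPPORTED_TYPES = {"text", "numeric", "date", "dict", "chunks", "vector", "chunkvector"}

def validate_schema(schema):
    """Return a list of problems found in a dataset schema dict (illustrative)."""
    problems = []
    for field, value in schema.items():
        if field == "_id":
            problems.append('"_id" is reserved for the document id')
        elif field.endswith("_vector_") or field.endswith("_chunkvector_"):
            # Vector fields declare their dimensionality as a positive integer.
            if not (isinstance(value, int) and value > 0):
                problems.append(f"{field}: vector length must be a positive int")
        elif value not in SUPPORTED_TYPES:
            problems.append(f"{field}: unknown field type {value!r}")
    return problems

schema = {
    "product_text_description": "text",
    "price": "numeric",
    "created_date": "date",
    "product_texts_chunk_": "chunks",
    "product_text_chunkvector_": 1024,
}
assert validate_schema(schema) == []
```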
You don't have to specify the schema of every field when creating a dataset; VecDB automatically detects the appropriate data type for each field (vectors are identified by their "_vector_" suffix). In fact, you don't always have to use this endpoint at all: /datasets/bulk_insert will infer the schema and create the dataset as you insert new documents.
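To make the inference idea concrete, here is a minimal sketch of how a client might guess field types from a sample document. This is purely illustrative of the convention described above, not VecDB's actual inference logic:

```python
def infer_field_type(field, value):
    """Guess a field type from a sample value (illustrative only)."""
    if field.endswith("_vector_"):
        # Vectors are declared in the schema by their length.
        return len(value)
    if isinstance(value, (bool, int, float)):
        return "numeric"
    if isinstance(value, dict):
        return "dict"
    if isinstance(value, list):
        return "chunks"
    return "text"

doc = {"title": "Blue shoes", "price": 59.0, "title_vector_": [0.1, 0.2, 0.3]}
inferred = {f: infer_field_type(f, v) for f, v in doc.items()}
# inferred == {"title": "text", "price": "numeric", "title_vector_": 3}
```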
Note:
- A dataset name/id can only contain lowercase letters, dashes, underscores and numbers.
- "_id" is reserved as the key and id of a document.
- Once a schema is set for a dataset it cannot be altered. If it must be changed, use the copy dataset endpoint.
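The naming rule above can be expressed as a simple regex check before attempting to create a dataset (a client-side sketch, not an official validator):

```python
import re

# Per the note above: lowercase letters, digits, dashes and underscores only.
DATASET_ID_RE = re.compile(r"^[a-z0-9_-]+$")

def is_valid_dataset_id(name):
    """Return True if the name satisfies the dataset naming rule (sketch)."""
    return bool(DATASET_ID_RE.match(name))

assert is_valid_dataset_id("product-catalog_v2")
assert not is_valid_dataset_id("Product Catalog")  # uppercase and space rejected
```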
For more information about vectors, check out the 'Vectorizing' section, /services/search/vector, or our blog at https://relevance.ai/blog.
For more information about chunks and chunk vectors check out /services/search/chunk.