bodo.pandas.BodoSeries.ai.embed

BodoSeries.ai.embed(
        api_key: str | None = None,
        model: str | None = None,
        base_url: str | None = None,
        request_formatter: Callable[[str], str] | None = None,
        response_formatter: Callable[[str], list[float]] | None = None,
        region: str | None = None,
        backend: Backend = Backend.OPENAI,
        **embedding_kwargs) -> BodoSeries

Embed a series of strings using the specified embedding backend.

Supports OpenAI-compatible endpoints and Amazon Bedrock via the backend parameter.

Parameters

api_key: str | None: The API key for authentication. Required for OpenAI backend. Must not be passed for Bedrock backend.

model: str | None: The model to use for embedding. For OpenAI, this should be the model name (e.g., "text-embedding-3-small"); if None, the backend's default model will be used. For Bedrock, this should be the model ID (e.g., "amazon.titan-embed-text-v2:0") and may not be None.

base_url: str | None: The URL of an OpenAI-compatible LLM endpoint (only applies to OpenAI-style backends).

request_formatter: Callable[[str], str] | None: Optional function to format the input text before sending to the model. This is only used for the Bedrock backend and must not be passed otherwise.

If None, a default formatter will be used for supported models (e.g., Nova, Titan, Claude, OpenAI).

For unsupported/custom models, this must be provided.

response_formatter: Callable[[str], list[float]] | None: Optional function to convert the model's raw response into an embedding vector (a list of floats). This is only used for the Bedrock backend and must not be passed otherwise.

If None, a default formatter will be used for supported models.

For unsupported/custom models, this must be provided.

region: str | None: The AWS region where the Bedrock model is hosted (only applies to Bedrock backend). If None, the default configured region will be used.

backend: bodo.ai.backend.Backend: The backend to use for embedding. Currently supports:

bodo.ai.backend.Backend.OPENAI – for OpenAI-compatible endpoints

bodo.ai.backend.Backend.BEDROCK – for Amazon Bedrock models

**embedding_kwargs: dict: Additional keyword arguments for the embedding API.
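The argument rules above differ by backend, so a small pre-flight check can make them easier to see in one place. The sketch below is illustrative only: `check_embed_args` is a hypothetical helper that is not part of Bodo, and it uses the plain strings "openai" and "bedrock" in place of the Backend enum so it runs without Bodo installed. It encodes only the constraints stated in the parameter descriptions.

```python
from typing import Callable, Optional


def check_embed_args(
    backend: str,
    api_key: Optional[str] = None,
    model: Optional[str] = None,
    request_formatter: Optional[Callable[[str], str]] = None,
    response_formatter: Optional[Callable[[str], list]] = None,
) -> list:
    """Hypothetical helper: return a list of problems (empty if none)."""
    problems = []
    if backend == "openai":
        # The OpenAI backend needs an API key and does not accept formatters.
        if api_key is None:
            problems.append("api_key is required for the OpenAI backend")
        if request_formatter is not None or response_formatter is not None:
            problems.append("formatters are only accepted for the Bedrock backend")
    elif backend == "bedrock":
        # The Bedrock backend rejects api_key and requires a model ID.
        if api_key is not None:
            problems.append("api_key must not be passed for the Bedrock backend")
        if model is None:
            problems.append("model may not be None for the Bedrock backend")
    else:
        problems.append(f"unknown backend: {backend!r}")
    return problems


# Valid OpenAI-style call: no problems reported.
print(check_embed_args("openai", api_key="your_api_key_here"))
# Invalid Bedrock-style call: api_key passed and model missing.
print(check_embed_args("bedrock", api_key="your_api_key_here"))
```

How Bodo itself reports these conditions (exception type, message, timing) is not described here, so treat the helper as a reading aid rather than a mirror of the library's behavior.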

Returns

BodoSeries: A series containing the embedding vectors, one per input string, as lists of doubles (float64).
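Because each element of the result is a plain list of floats, a common next step is comparing two embeddings. The following is a minimal sketch in pure Python, assuming the vectors have already been pulled out of the series as Python lists; `cosine_similarity` is a hypothetical helper and the three-dimensional vectors are made-up stand-ins for real embeddings.

```python
import math


def cosine_similarity(u: list, v: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Similarity is undefined for a zero vector; return 0.0 by convention.
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up vectors standing in for two rows of the embedded series.
vec_a = [1.0, 0.0, 0.0]
vec_b = [0.0, 1.0, 0.0]

print(cosine_similarity(vec_a, vec_a))
print(cosine_similarity(vec_a, vec_b))
```

For large series a vectorized approach would be preferable; the loop-based version is shown only to make the arithmetic explicit.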

Example — OpenAI-compatible backend

import bodo.pandas as pd
from bodo.ai.backend import Backend

# Example series
a = pd.Series(["bodo.ai will improve your workflows.", "This is a professional sentence."])
# Define the LLM base_url and API key
base_url = "https://api.example.com/v1"
api_key = "your_api_key_here"
# Embed the series using the model
b = a.ai.embed(
    api_key=api_key,
    model="text-embedding-3-small",
    base_url=base_url,
    backend=Backend.OPENAI
)
print(b)

Output:

0    [0.123, 0.456, 0.789, ...]
1    [0.234, 0.567, 0.890, ...]
dtype: list<item: float64>[pyarrow]

Example — Amazon Bedrock backend

import bodo.pandas as pd
from bodo.ai.backend import Backend

# Example series
a = pd.Series(["bodo.ai will improve your workflows.", "This is a professional sentence."])
# Generate embeddings using the Bedrock model
b = a.ai.embed(
    model="amazon.titan-embed-text-v2:0",
    backend=Backend.BEDROCK,
    region="us-west-2"
)
print(b)

Example — Amazon Bedrock backend with custom formatters

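import json

import bodo.pandas as pd
from bodo.ai.backend import Backend

# Build the JSON request body the custom model expects for one input string
def request_formatter(row: str) -> str:
    return json.dumps({"inputText": row})

# Extract the embedding vector from the model's raw JSON response
def response_formatter(response: str) -> list[float]:
    return json.loads(response)["embedding"]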

a = pd.Series([
    "What is the capital of France?",
    "Who wrote 'To Kill a Mockingbird'?",
    "What is the largest mammal?",
])

b = a.ai.embed(
    model="custom_embedding_model_id",
    backend=Backend.BEDROCK,
    region="us-east-1",
    request_formatter=request_formatter,
    response_formatter=response_formatter,
)

print(b)
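Custom formatters can be checked offline before any Bedrock call is made. The sketch below repeats the two formatters from the example and runs them against a hand-written payload. The response shape (a JSON object with an "embedding" key) is an assumption taken from the example's own response_formatter, not a documented Bedrock contract, and `sample_response` is invented for illustration.

```python
import json


def request_formatter(row: str) -> str:
    # Serialize one input string into the request body.
    return json.dumps({"inputText": row})


def response_formatter(response: str) -> list:
    # Parse the raw response and return the embedding vector.
    return json.loads(response)["embedding"]


# Hand-written payload imitating what a custom model might return (assumed shape).
sample_response = json.dumps({"embedding": [0.1, 0.2, 0.3]})

body = request_formatter("What is the largest mammal?")
vector = response_formatter(sample_response)

print(body)
print(vector)
```

If the real model returns a different structure, only response_formatter needs to change, for example by indexing into a nested key.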