DataFrameIt

Enrich DataFrames with LLMs in a simple and structured way

Requires Python 3.10+. License: MIT.

What is it?

DataFrameIt processes text in DataFrames using Large Language Models (LLMs) and extracts structured information defined by Pydantic models. One function, one model, one prompt — done.

from pydantic import BaseModel
from typing import Literal
import pandas as pd
from dataframeit import dataframeit

# Schema describing the structured fields to extract from each text
class Sentiment(BaseModel):
    sentiment: Literal['positive', 'negative', 'neutral']
    confidence: Literal['high', 'medium', 'low']

df = pd.DataFrame({'text': ['Excellent product!', 'Terrible service.']})
# Apply the prompt to the 'text' column; the extracted fields follow the Sentiment schema
result = dataframeit(df, Sentiment, "Analyze the sentiment of the text.", text_column='text')

Features

Multiple Providers

Google Gemini, OpenAI GPT-5, Anthropic Claude 4.5, Cohere, Mistral — all via LangChain.

Structured Output

Automatic validation with Pydantic. Define fields, types, and descriptions — the LLM respects them.
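As an illustration of what Pydantic validation provides, here is a minimal sketch that uses plain Pydantic only. It does not show DataFrameIt's internals; the field descriptions and the sample dictionaries are made up for the example.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Sentiment(BaseModel):
    # Descriptions are part of the schema, alongside the allowed values.
    sentiment: Literal['positive', 'negative', 'neutral'] = Field(
        description="Overall tone of the text"
    )
    confidence: Literal['high', 'medium', 'low'] = Field(
        description="How certain the classification is"
    )


# A well-formed answer parses into a typed object.
ok = Sentiment(**{'sentiment': 'positive', 'confidence': 'high'})

# A value outside the Literal options is rejected instead of passing through.
try:
    Sentiment(**{'sentiment': 'angry', 'confidence': 'high'})
    rejected = False
except ValidationError:
    rejected = True
```

Using `Literal` for categorical fields keeps the set of accepted answers closed, so malformed output surfaces as a `ValidationError` rather than as a silent bad value in the data.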

Resilience

Automatic retry with exponential backoff. Configurable rate limiting. Never lose progress.
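For readers unfamiliar with the pattern, the following is a generic sketch of retry with exponential backoff, using only the standard library. It is not DataFrameIt's implementation: `call_with_backoff`, its parameters, and the `flaky` function are hypothetical names chosen for this example.

```python
import random
import time


def call_with_backoff(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)
            # Up to 10% random jitter avoids many workers retrying in lockstep.
            sleep(delay + random.uniform(0, 0.1 * delay))


# Demo: a hypothetical flaky call that fails twice, then succeeds.
attempts = {'n': 0}


def flaky():
    attempts['n'] += 1
    if attempts['n'] < 3:
        raise RuntimeError("transient error")
    return "ok"


waits = []
# Record the waits instead of actually sleeping, so the demo runs instantly.
result = call_with_backoff(flaky, sleep=waits.append)
```

Each failed attempt doubles the wait before the next one, which gives a rate-limited API time to recover without retrying forever.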

Performance

Parallel processing with auto-adjustment. Real-time throughput metrics.
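The general idea of parallel row processing with a throughput measurement can be sketched with the standard library alone. This is an assumption-laden illustration, not the library's code: `classify` is a hypothetical stand-in for an LLM call, and the worker count is fixed rather than auto-adjusted.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def classify(text):
    # Hypothetical stand-in for an LLM call; real calls are I/O-bound,
    # which is why threads help here.
    time.sleep(0.01)
    return 'positive' if 'Excellent' in text else 'negative'


texts = ['Excellent product!', 'Terrible service.'] * 4

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves input order, so results line up with the input rows.
    labels = list(pool.map(classify, texts))
elapsed = time.perf_counter() - start

# Simple throughput metric: rows completed per second of wall-clock time.
rows_per_second = len(texts) / elapsed
```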

Web Search

Tavily integration to enrich data with information from the internet.

Multiple Inputs

DataFrame, Series, list, dictionary — everything works. Polars included.

Quick Installation

pip install "dataframeit[google]"     # Google Gemini 3 (recommended)
pip install "dataframeit[openai]"     # OpenAI GPT-5
pip install "dataframeit[anthropic]"  # Anthropic Claude 4.5

Next Steps