The InMemorySource class provides an in-memory implementation of the Source interface, designed for development, testing, and scenarios where data is supplied programmatically. All data is held in RAM, enabling real-time ingestion and immediate vector processing.
Constructor
Create a new in-memory source with the specified schema and optional parser.
InMemorySource(schema, parser=None)
Parameters

- schema (SchemaObject, required): The schema object that defines the structure of data this source will handle. All data added to this source must conform to this schema.
- parser (DataParser | None, default None): Optional data parser for processing input data. If None, defaults to JsonParser for handling JSON-formatted data.

Raises: InvalidInputException - If the schema is not an instance of SchemaObject.
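A minimal construction sketch is shown below, assuming the flat import style used in the examples on this page; importing JsonParser this way and passing it the schema is an illustration rather than a confirmed signature, so check the DataParser reference before relying on it.

from superlinked import InMemorySource, JsonParser, schema

@schema
class EventSchema:
    id: str
    payload: str

event_schema = EventSchema()

# Default: JSON-formatted dicts are handled by the built-in JsonParser
source = InMemorySource(event_schema)

# Explicit parser (illustrative call; see the DataParser reference for the exact signature)
parsed_source = InMemorySource(event_schema, parser=JsonParser(event_schema))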
Inheritance
The InMemorySource extends several classes to provide comprehensive functionality:
Inheritance Chain:

InMemorySource → InteractiveSource → OnlineSource → TransformerPublisher → Source → Generic
This inheritance provides:
- Base Source functionality from Source
- Online processing capabilities from OnlineSource
- Interactive data input from InteractiveSource
- Event publishing from TransformerPublisher
- Generic type support for schema types
Key Features
Memory-Based Storage
All data is stored in RAM, providing:
- Fast Access: No disk I/O overhead for data operations
- Immediate Processing: Data is immediately available for vector processing
- Simple Setup: No external dependencies or database configuration
Real-Time Ingestion
Inherited from InteractiveSource:
- Continuous Data Input: Add data while the application is running
- Immediate Processing: Data is processed into vectors as soon as it’s added
- Live Updates: Indices are updated in real-time with new data
Use Cases
Development and Testing
Perfect for initial development and unit testing:
from superlinked import InMemorySource, schema

@schema
class ProductSchema:
    id: str
    name: str
    description: str
    price: float

product_schema = ProductSchema()

# Create in-memory source
source = InMemorySource(product_schema)

# Add test data
test_products = [
    {"id": "1", "name": "Laptop", "description": "Gaming laptop", "price": 999.99},
    {"id": "2", "name": "Mouse", "description": "Wireless mouse", "price": 29.99},
]
source.put(test_products)
Rapid Prototyping
Ideal for quick experimentation and proof-of-concepts:
# Quick prototype setup
source = InMemorySource(document_schema)

# Add sample documents
sample_docs = [
    {"id": "doc1", "title": "AI Overview", "content": "Introduction to AI concepts"},
    {"id": "doc2", "title": "ML Basics", "content": "Machine learning fundamentals"},
]
source.put(sample_docs)

# Immediately available for vector search testing
Interactive Development
Great for Jupyter notebooks and interactive development:
# Start with empty source
source = InMemorySource(article_schema)
# Add data incrementally during development
source.put({"id": "1", "title": "First Article", "content": "Content here"})
# Test search functionality
# ... run queries ...
# Add more data as needed
source.put({"id": "2", "title": "Second Article", "content": "More content"})
Demo Applications
Excellent for demonstrations and training:
# Demo setup with realistic data
demo_source = InMemorySource(movie_schema)
# Load demo dataset
demo_movies = load_demo_movie_data() # Your demo data function
demo_source.put(demo_movies)
# Ready for live demonstration with pre-loaded data
Data Management
Data is added programmatically through the put() method inherited from InteractiveSource (both input shapes are sketched after this list):
- Single Items: Add individual data records
- Batch Input: Add multiple records at once
- Continuous Updates: Add data while the application is running
- Real-Time Processing: Data is immediately processed and available for search
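Continuing the product example above, the two input shapes look like this:

# Single item: one record as a dict
source.put({"id": "3", "name": "Keyboard", "description": "Mechanical keyboard", "price": 79.99})

# Batch input: a list of records ingested in one call
source.put([
    {"id": "4", "name": "Monitor", "description": "27-inch display", "price": 199.99},
    {"id": "5", "name": "Webcam", "description": "1080p webcam", "price": 49.99},
])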
Memory Considerations
Memory Usage: All data is stored in RAM. Monitor memory usage with large datasets to prevent out-of-memory errors.
Data Size Limits: Keep datasets reasonably sized (typically under 1GB) for optimal performance in development scenarios.
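One simple way to stay inside that guideline is to estimate the raw payload size before ingesting. The sketch below only measures the serialized JSON and ignores framework overhead and the computed vectors; it reuses the demo loader from the example above.

import json

payload = load_demo_movie_data()  # your data-loading function, as in the demo example

# Rough size of the raw JSON payload; internal structures and vectors add more on top
approx_mb = len(json.dumps(payload).encode("utf-8")) / 1_000_000
print(f"Payload is roughly {approx_mb:.1f} MB")

if approx_mb > 1_000:  # stay under the ~1GB guideline for development datasets
    raise RuntimeError("Dataset too large for comfortable in-memory development use")

demo_source.put(payload)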
Data Persistence
No Persistence: Data is lost when the application shuts down. Not suitable for production use cases requiring data durability.
Advantages
- Speed: Fastest possible data access and processing
- Simplicity: No external dependencies or setup required
- Flexibility: Easy to modify and test with different datasets
Limitations
- Memory Constraints: Limited by available RAM
- No Persistence: Data doesn’t survive application restarts
- Single Instance: Cannot share data across multiple application instances
Best Practices
Development Workflow
Incremental Development: Start with small datasets in InMemorySource, then migrate to production sources when ready for deployment.
Testing Strategy
Isolated Tests: Each test should create its own InMemorySource instance to ensure test isolation and prevent data contamination.
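A minimal pytest sketch of this pattern, assuming the imports and product_schema definition from the development example above:

import pytest

@pytest.fixture
def product_source():
    # A fresh source per test keeps test data fully isolated
    return InMemorySource(product_schema)

def test_put_accepts_a_batch(product_source):
    product_source.put([
        {"id": "1", "name": "Laptop", "description": "Gaming laptop", "price": 999.99},
    ])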
Data Management
Schema Validation: Always validate your test data against the schema before adding to the source to catch schema mismatches early.
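A lightweight pre-flight check along these lines can surface mismatches before put() is called; the helper below is hypothetical, not a framework API, and it assumes the decorated schema class still exposes plain type annotations.

from typing import get_type_hints

def assert_matches_schema(records, schema_cls):
    # Compare each record's keys against the fields declared on the schema class
    expected = set(get_type_hints(schema_cls))
    for record in records:
        missing = expected - record.keys()
        unexpected = record.keys() - expected
        if missing or unexpected:
            raise ValueError(f"Schema mismatch: missing={missing}, unexpected={unexpected}")

assert_matches_schema(test_products, ProductSchema)
source.put(test_products)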
Integration Example
from superlinked import (
    InMemorySource, InMemoryApp, Index,
    TextSimilaritySpace, InMemoryVectorDatabase, schema
)

# Complete development setup
@schema
class DocumentSchema:
    id: str
    title: str
    content: str
    category: str

document_schema = DocumentSchema()

# Create source and other components
source = InMemorySource(document_schema)
text_space = TextSimilaritySpace(text=document_schema.content)
index = Index([text_space])
vector_db = InMemoryVectorDatabase()

# Create application
app = InMemoryApp(
    sources=[source],
    indices=[index],
    vector_database=vector_db
)

# Add data for development
development_docs = [
    {
        "id": "1",
        "title": "Getting Started",
        "content": "Introduction to our system",
        "category": "tutorial"
    },
    {
        "id": "2",
        "title": "Advanced Features",
        "content": "Deep dive into advanced functionality",
        "category": "guide"
    }
]
source.put(development_docs)

# Ready for development and testing
Migration Path
When transitioning from development to production:
# Development with InMemorySource
dev_source = InMemorySource(schema)
# Production with RestSource
prod_source = RestSource(schema, parser=custom_parser)
# Same application logic works with both sources
app = ProductionApp(sources=[prod_source], indices=[index])