Storage Architecture & Data Directories

The trueblocks-dalle library organizes all data using a structured directory hierarchy managed by the pkg/storage package. This chapter explains the storage architecture, data directory resolution, and file organization patterns.

Data Directory Resolution

Default Location

The library resolves the data directory automatically. It honors the TB_DALLE_DATA_DIR environment variable when set and otherwise uses a platform-specific default:

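func DataDir() string {
    // An explicit override always wins.
    if dir := os.Getenv("TB_DALLE_DATA_DIR"); dir != "" {
        return dir
    }
    // Otherwise fall back to the platform-specific default listed below.
    // platformDefaultDir is a placeholder name for that elided logic.
    return platformDefaultDir()
}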

Platform defaults:

macOS: ~/Library/Application Support/TrueBlocks
Linux: ~/.local/share/TrueBlocks
Windows: %APPDATA%/TrueBlocks
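
The resolution order described above can be sketched as follows. This is a minimal illustration that assumes the three defaults listed above; the names platformDefault and resolveDataDir are hypothetical and are not taken from pkg/storage, and treating every non-macOS, non-Windows system like Linux is an assumption.

```go
package main

import (
	"os"
	"path/filepath"
	"runtime"
)

// platformDefault returns the default data directory for the given OS name.
// Hypothetical helper that mirrors the defaults listed in this chapter.
func platformDefault(goos, home, appData string) string {
	switch goos {
	case "darwin":
		return filepath.Join(home, "Library", "Application Support", "TrueBlocks")
	case "windows":
		return filepath.Join(appData, "TrueBlocks")
	default: // assumption: Linux and other Unix-like systems share one layout
		return filepath.Join(home, ".local", "share", "TrueBlocks")
	}
}

// resolveDataDir applies the TB_DALLE_DATA_DIR override first and only then
// consults the platform default.
func resolveDataDir() string {
	if dir := os.Getenv("TB_DALLE_DATA_DIR"); dir != "" {
		return dir
	}
	home, err := os.UserHomeDir()
	if err != nil {
		home = "." // assumption: fall back to the working directory
	}
	return platformDefault(runtime.GOOS, home, os.Getenv("APPDATA"))
}
```

Keeping platformDefault free of environment lookups makes the mapping easy to check for every platform from a single machine.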

Environment Override

Set TB_DALLE_DATA_DIR to use a custom location:

export TB_DALLE_DATA_DIR="/custom/path/to/dalle-data"

Directory Structure

The data directory contains several key subdirectories:

$TB_DALLE_DATA_DIR/
├── output/              # Generated artifacts (images, prompts, audio)
├── cache/               # Database and context caches
├── series/              # Series configuration files
└── metrics/             # Progress timing data

Output Directory

Generated artifacts are organized by series under output/:

output/
└── <series-name>/
    ├── data/            # Raw attribute data dumps
    ├── title/           # Human-readable titles
    ├── terse/           # Short captions
    ├── prompt/          # Full structured prompts
    ├── enhanced/        # OpenAI-enhanced prompts
    ├── generated/       # Raw DALL·E generated images
    ├── annotated/       # Images with caption overlays
    ├── selector/        # Complete DalleDress JSON metadata
    └── audio/           # Text-to-speech MP3 files

Each subdirectory contains files named <address>.ext where:

address is the input seed string (typically an Ethereum address)
ext is the appropriate file extension (.txt, .png, .json, .mp3)
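
As an illustration of this naming scheme, the sketch below assembles an artifact path from its parts. The function artifactPath and its parameters are hypothetical; the library may build these paths differently.

```go
package main

import "path/filepath"

// artifactPath builds <outputDir>/<series>/<kind>/<address><ext>, where kind
// is one of the output subdirectories (for example "annotated").
// Illustrative only; not part of pkg/storage.
func artifactPath(outputDir, series, kind, address, ext string) string {
	return filepath.Join(outputDir, series, kind, address+ext)
}
```

For example, artifactPath(outputDir, "default", "annotated", "0xabc", ".png") points at the annotated image for seed 0xabc in the default series.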

Cache Directory

The cache directory stores processed database indexes and temporary files:

cache/
├── databases.cache      # Binary database cache file
├── series.cache         # Series configuration cache
└── temp/                # Temporary files during processing

Series Directory

Series configurations are stored as JSON files:

series/
├── default.json         # Default series configuration
├── custom-series.json   # Custom series with filters
└── deleted/             # Soft-deleted series
    └── old-series.json

File Path Utilities

The storage package provides utilities for constructing paths:

Core Functions

// Base directories
func DataDir() string                    // Main data directory
func OutputDir() string                  // output/ subdirectory
func SeriesDir() string                  // series/ subdirectory
func CacheDir() string                   // cache/ subdirectory

// Path construction
func EnsureDir(path string) error        // Create directory if needed
func CleanPath(path string) string       // Sanitize file paths
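
A directory helper with the EnsureDir signature shown above could be as small as a wrapper around os.MkdirAll. The sketch below is a stand-in written for illustration (ensureDir and exampleEnsureDir are hypothetical names, and the 0o755 permission is an assumption); it is not the library's implementation.

```go
package main

import (
	"errors"
	"os"
	"path/filepath"
)

// ensureDir creates path, including any missing parents, if it does not
// already exist. Stand-in for illustration; the real EnsureDir may differ.
func ensureDir(path string) error {
	return os.MkdirAll(path, 0o755)
}

// exampleEnsureDir creates a nested output directory inside a throwaway
// temp directory. Calling ensureDir twice shows that it is idempotent.
func exampleEnsureDir() error {
	root, err := os.MkdirTemp("", "dalle-example-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)

	dir := filepath.Join(root, "output", "default", "annotated")
	for i := 0; i < 2; i++ {
		if err := ensureDir(dir); err != nil {
			return err
		}
	}
	info, err := os.Stat(dir)
	if err != nil {
		return err
	}
	if !info.IsDir() {
		return errors.New("expected a directory")
	}
	return nil
}
```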

Path Security

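File operations validate paths before use to guard against directory traversal. The annotation step, for example, only accepts image paths that contain a generated directory component: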

// Example from annotate.go
cleanName := filepath.Clean(fileName)
if !strings.Contains(cleanName, string(os.PathSeparator)+"generated"+string(os.PathSeparator)) {
    return "", fmt.Errorf("invalid image path: %s", fileName)
}
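
A more general containment check, shown here only as a sketch, joins a caller-supplied name onto a trusted base directory and rejects any result that escapes it. The function withinBase is hypothetical and is not part of the library; it uses filepath.Rel rather than the substring test above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// withinBase returns the cleaned join of base and name, or an error if the
// result would fall outside base. Hypothetical helper for illustration.
func withinBase(base, name string) (string, error) {
	full := filepath.Join(base, name) // Join also cleans the result
	rel, err := filepath.Rel(base, full)
	if err != nil {
		return "", err
	}
	// A relative path that is ".." or starts with "../" points outside base.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path escapes base directory: %s", name)
	}
	return full, nil
}
```

Names such as ../../etc/passwd are rejected, while a name that merely begins with two dots (for example ..notes.txt) is still accepted because it stays inside base.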

Artifact Lifecycle

Creation Flow

Directory Creation: Output directories are created as needed during generation
Incremental Writing: Artifacts are written as they're generated (prompts → image → annotation)
Atomic Operations: Files are written atomically to prevent corruption
Metadata Updates: JSON metadata is updated throughout the process
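
The source does not say how atomic writes are implemented. One common technique, sketched below under that caveat, writes to a temporary file in the destination directory and then renames it into place. The names writeFileAtomic and exampleAtomicWrite are hypothetical.

```go
package main

import (
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temp file next to path, then renames it
// into place so readers never observe a partially written file.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // fails harmlessly once the rename has succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmpName, perm); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// exampleAtomicWrite writes a prompt file into a temp directory and reads
// it back.
func exampleAtomicWrite() (string, error) {
	dir, err := os.MkdirTemp("", "dalle-example-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "0xabc.txt")
	if err := writeFileAtomic(path, []byte("a prompt"), 0o644); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}
```

Creating the temp file in the same directory as the target keeps the rename on a single filesystem, which is what makes the replacement atomic on POSIX systems.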

Caching Strategy

Existence Checks: If an annotated image exists, the pipeline returns immediately (cache hit)
Incremental Processing: Individual artifacts are cached, allowing partial resume
Selective Regeneration: Only missing or outdated artifacts are regenerated
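
The existence-check pattern above can be sketched as a small wrapper that runs a generation step only when its output file is missing. The names fileExists, generateIfMissing, and exampleCacheHit are hypothetical; the pipeline's actual checks may be more involved (for example, detecting outdated artifacts).

```go
package main

import (
	"errors"
	"io/fs"
	"os"
	"path/filepath"
)

// fileExists reports whether path exists and is a regular file.
func fileExists(path string) (bool, error) {
	info, err := os.Stat(path)
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return info.Mode().IsRegular(), nil
}

// generateIfMissing runs generate only when path is absent. The returned
// flag is true on a cache hit, that is, when nothing had to be generated.
func generateIfMissing(path string, generate func(path string) error) (bool, error) {
	ok, err := fileExists(path)
	if err != nil {
		return false, err
	}
	if ok {
		return true, nil
	}
	return false, generate(path)
}

// exampleCacheHit calls generateIfMissing twice for the same path and
// returns both hit flags plus how many times generation actually ran.
func exampleCacheHit() (first, second bool, runs int, err error) {
	dir, err := os.MkdirTemp("", "dalle-example-")
	if err != nil {
		return false, false, 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "0xabc.png")
	generate := func(p string) error {
		runs++
		return os.WriteFile(p, []byte("image bytes"), 0o644)
	}
	if first, err = generateIfMissing(path, generate); err != nil {
		return first, false, runs, err
	}
	second, err = generateIfMissing(path, generate)
	return first, second, runs, err
}
```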

Cleanup Operations

The Clean function removes all artifacts for a series/address pair:

func Clean(series, address string) {
    // Removes files from all output subdirectories
    // Clears cached DalleDress entries
    // Updates progress tracking
}
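
The file-removal part of such a cleanup might look like the sketch below. It covers only the first step (deleting files from the output subdirectories) and leaves out cache and progress updates. The names removeArtifacts and exampleRemove are hypothetical, and the glob pattern assumes the address contains no glob metacharacters.

```go
package main

import (
	"os"
	"path/filepath"
)

// artifactKinds lists the per-series output subdirectories described in
// the Output Directory section.
var artifactKinds = []string{
	"data", "title", "terse", "prompt", "enhanced",
	"generated", "annotated", "selector", "audio",
}

// removeArtifacts deletes <outputDir>/<series>/<kind>/<address>.* for every
// kind and returns the number of files removed. Hypothetical helper.
func removeArtifacts(outputDir, series, address string) (int, error) {
	removed := 0
	for _, kind := range artifactKinds {
		pattern := filepath.Join(outputDir, series, kind, address+".*")
		matches, err := filepath.Glob(pattern)
		if err != nil {
			return removed, err
		}
		for _, m := range matches {
			if err := os.Remove(m); err != nil {
				return removed, err
			}
			removed++
		}
	}
	return removed, nil
}

// exampleRemove creates three files, removes those belonging to one address,
// and reports how many were removed and whether the other address survived.
func exampleRemove() (removed int, otherKept bool, err error) {
	root, err := os.MkdirTemp("", "dalle-example-")
	if err != nil {
		return 0, false, err
	}
	defer os.RemoveAll(root)

	files := []string{"prompt/0xabc.txt", "annotated/0xabc.png", "prompt/0xdef.txt"}
	for _, f := range files {
		p := filepath.Join(root, "default", filepath.FromSlash(f))
		if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
			return 0, false, err
		}
		if err := os.WriteFile(p, nil, 0o644); err != nil {
			return 0, false, err
		}
	}
	removed, err = removeArtifacts(root, "default", "0xabc")
	if err != nil {
		return removed, false, err
	}
	_, statErr := os.Stat(filepath.Join(root, "default", "prompt", "0xdef.txt"))
	return removed, statErr == nil, nil
}
```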

Database Storage

Embedded Databases

Attribute databases are embedded in the binary as a compressed tar.gz archive:

pkg/storage/databases.tar.gz     # Compressed attribute databases
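
Reading such an archive from memory needs only the standard library. In the sketch below the archive bytes are passed in as a plain slice; in an embedding setup they would typically come from a //go:embed variable, which is an assumption about how the library wires this up. The function names and the example file name are hypothetical, and buildTarGz exists only so the example is self-contained.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"errors"
	"io"
)

// readTarGz decompresses a tar.gz archive held in memory and returns the
// contents of each regular file keyed by its name.
func readTarGz(archive []byte) (map[string][]byte, error) {
	gz, err := gzip.NewReader(bytes.NewReader(archive))
	if err != nil {
		return nil, err
	}
	defer gz.Close()

	files := make(map[string][]byte)
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return files, nil // end of archive
		}
		if err != nil {
			return nil, err
		}
		if hdr.Typeflag != tar.TypeReg {
			continue // skip directories and other non-file entries
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		files[hdr.Name] = data
	}
}

// buildTarGz packs files into a tar.gz archive. It is used here only to
// produce input for readTarGz.
func buildTarGz(files map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for name, data := range files {
		hdr := &tar.Header{Typeflag: tar.TypeReg, Name: name, Mode: 0o644, Size: int64(len(data))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(data); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```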

Cache Format

Processed databases are cached in binary format for fast loading:

type DatabaseCache struct {
    Version    string                   // Cache version
    Timestamp  int64                    // Creation time
    Databases  map[string]DatabaseIndex // Processed indexes
    Checksum   string                   // Validation checksum
    SourceHash string                   // Source data hash
}

Cache Validation

The cache system validates integrity on load:

Checksum Verification: Ensures cache file hasn't been corrupted
Source Hash Check: Detects if embedded databases have changed
Version Compatibility: Handles cache format changes
Automatic Rebuild: Rebuilds cache if validation fails
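
The validation steps above can be sketched with a simplified cache record. Everything here is an assumption made for illustration: cacheFile stands in for DatabaseCache with the indexes reduced to an opaque Payload, both hashes are taken to be hex-encoded SHA-256, and the version string is invented.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
)

const currentVersion = "v1" // hypothetical cache format version

// cacheFile is a simplified stand-in for DatabaseCache. Payload represents
// the serialized database indexes.
type cacheFile struct {
	Version    string
	Payload    []byte
	Checksum   string // hex SHA-256 of Payload (assumed scheme)
	SourceHash string // hex SHA-256 of the embedded archive (assumed scheme)
}

// hashHex returns the hex-encoded SHA-256 digest of b.
func hashHex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// validate returns nil if the cache can be trusted, or an error describing
// why it must be rebuilt.
func validate(c cacheFile, embedded []byte) error {
	switch {
	case c.Version != currentVersion:
		return errors.New("cache version mismatch")
	case c.SourceHash != hashHex(embedded):
		return errors.New("embedded databases changed")
	case c.Checksum != hashHex(c.Payload):
		return errors.New("cache checksum mismatch")
	}
	return nil
}

// loadOrRebuild returns c unchanged when it validates and otherwise builds
// a fresh cache from the embedded source.
func loadOrRebuild(c cacheFile, embedded []byte, rebuild func([]byte) []byte) cacheFile {
	if validate(c, embedded) == nil {
		return c
	}
	payload := rebuild(embedded)
	return cacheFile{
		Version:    currentVersion,
		Payload:    payload,
		Checksum:   hashHex(payload),
		SourceHash: hashHex(embedded),
	}
}
```

Checking the version and source hash before the payload checksum lets a stale cache be rejected without hashing its full contents.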

Performance Considerations

Directory Operations

Lazy Creation: Directories are created only when needed
Path Caching: Resolved paths are cached to avoid repeated filesystem calls
Batch Operations: Multiple files in the same directory are processed efficiently

Storage Optimization

Binary Caching: Database indexes use efficient binary serialization
Compression: Embedded databases are compressed to reduce binary size
Selective Loading: Only required database sections are loaded into memory

Cleanup Strategies

Automatic Cleanup: Temporary files are cleaned up on completion or failure
LRU Eviction: Context cache uses LRU eviction to prevent unbounded growth
Configurable Retention: TTL settings control how long contexts remain cached
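
To make the LRU and TTL behavior concrete, here is a minimal, non-concurrent sketch built on container/list. It is not the library's context cache: the type and method names are hypothetical, values are plain strings, and timestamps are passed in explicitly as Unix seconds so the behavior is deterministic.

```go
package main

import "container/list"

// lruEntry is one cached item together with the time it was last used.
type lruEntry struct {
	key, value string
	lastUsed   int64 // Unix seconds
}

// lruCache evicts the least recently used entry when full and treats
// entries idle for longer than ttl seconds as expired.
type lruCache struct {
	capacity int
	ttl      int64
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newLRUCache(capacity int, ttl int64) *lruCache {
	return &lruCache{
		capacity: capacity,
		ttl:      ttl,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the value for key if it is present and not expired at now.
func (c *lruCache) Get(key string, now int64) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	e := el.Value.(*lruEntry)
	if now-e.lastUsed > c.ttl {
		delete(c.items, key)
		c.order.Remove(el)
		return "", false
	}
	e.lastUsed = now
	c.order.MoveToFront(el)
	return e.value, true
}

// Put inserts or refreshes key, evicting the least recently used entry when
// the cache grows beyond its capacity.
func (c *lruCache) Put(key, value string, now int64) {
	if el, ok := c.items[key]; ok {
		e := el.Value.(*lruEntry)
		e.value, e.lastUsed = value, now
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&lruEntry{key: key, value: value, lastUsed: now})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		delete(c.items, oldest.Value.(*lruEntry).key)
		c.order.Remove(oldest)
	}
}
```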

Error Handling

Common Storage Errors

// Permission issues (errors.Is also matches wrapped errors)
if errors.Is(err, fs.ErrPermission) {
    // Handle insufficient filesystem permissions
}

// Disk space issues (ENOSPC is reported on Unix-like systems)
if errors.Is(err, syscall.ENOSPC) {
    // Handle disk space exhaustion
}

// Path traversal attempts (matches the "invalid image path" error from annotate.go)
if strings.Contains(err.Error(), "invalid image path") {
    // Handle security violations
}

Recovery Strategies

Graceful Degradation: Continue operation when non-critical files can't be written
Cache Rebuilding: Automatically rebuild corrupted caches
Alternative Paths: Fall back to temporary directories if primary locations fail
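
The alternative-path strategy can be sketched as a probe over candidate directories that ends with a location under the system temp directory. The function writableDir and the fallback directory name trueblocks-dalle are hypothetical, chosen for illustration.

```go
package main

import (
	"os"
	"path/filepath"
)

// writableDir returns the first candidate directory that can be created and
// written to. A directory under os.TempDir() is always tried last.
func writableDir(candidates ...string) (string, error) {
	fallback := filepath.Join(os.TempDir(), "trueblocks-dalle") // hypothetical name
	all := append(append([]string{}, candidates...), fallback)

	var lastErr error
	for _, dir := range all {
		if err := os.MkdirAll(dir, 0o755); err != nil {
			lastErr = err
			continue
		}
		// Creating a probe file confirms the directory is actually writable.
		probe, err := os.CreateTemp(dir, ".probe-*")
		if err != nil {
			lastErr = err
			continue
		}
		probe.Close()
		os.Remove(probe.Name())
		return dir, nil
	}
	return "", lastErr
}
```

A caller would pass the preferred data directory first, and should log when the fallback is used, since artifacts written there may not persist across reboots.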

Integration Points

With Context Management

Series configurations are loaded from the series directory
Context cache uses storage utilities for persistence
Database loading integrates with the cache management system

With Progress Tracking

Progress metrics are persisted to the data directory
Temporary run state is stored in cache directory
Completed runs can optionally archive detailed timing data

With Generation Pipeline

Each generation phase writes artifacts to appropriate subdirectories
File existence checks drive caching decisions
Path resolution ensures consistent artifact locations

This storage architecture provides a robust foundation for reproducible, auditable, and efficient artifact management throughout the generation pipeline.
