Skip to Content
Engineering11 Documentation πŸ”₯

Sync Service

Reliable data synchronization across database collections at scale.

On This Page

What It Does

The Sync Service manages data synchronization across database collections and documents, enabling bulk updates and deletions when source data changes. It handles denormalized data consistency, cascade operations, and large-scale batch processing through asynchronous task queues with built-in pagination and reliability.

Key Capabilities

CapabilityDescription
Collection PatchingBulk update all documents in a collection with new values
Collection Group PatchingUpdate documents across all subcollections with same name
Document PatchingUpdate individual documents at specific paths
Collection DeletionDelete all documents in a collection via paginated batch processing
Document DeletionDelete individual documents at specific paths
Paginated ProcessingProcess large collections in batches (default: 100 docs/page)
Task-Based ArchitectureReliable, asynchronous processing via task queues with retry logic

How It Fits Together

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ TaskService Entry Points β”‚ β”‚ requestCollectionPatch, requestDocumentPatch, β”‚ β”‚ requestCollectionDeletion, requestDocumentDeletion β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Cloud Task Queue β”‚ β”‚ (Reliable delivery) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ SyncCollectionPatchController β”‚ β”‚ (NestJS Task Processors) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β–Ό β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚PatcherTask β”‚ β”‚DeletionTask β”‚ β”‚Generator β”‚ β”‚Generator β”‚ β”‚ β”‚ β”‚ β”‚ β”‚- Streams pages β”‚ β”‚- Streams pages β”‚ β”‚- Creates tasks β”‚ β”‚- Creates delete tasksβ”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Firestore Updates β”‚ β”‚ (Atomic writes) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The Sync Service uses a two-phase approach: first, it streams through collections in pages, then creates individual document tasks for reliable atomic updates.

Common Use Cases

  • Denormalized data updates: When a product name changes, update all subscription documents that reference that product
  • Bulk data migrations: Apply schema changes or data transformations across entire collections
  • Cascading updates: Propagate user profile changes to all related records
  • Collection cleanup: Delete all member documents when a community is removed
  • Cross-collection consistency: Maintain referential integrity for duplicated data
  • Large-scale refactoring: Update field names or structures across thousands of documents

What You Don’t Have to Build

  • Cursor-based pagination for large collections
  • Memory-efficient collection streaming
  • Collection group query logic
  • Two-phase task decomposition
  • Independent document task retry
  • Task queue configuration and routing
  • Path normalization and validation
  • Merge-based partial updates
  • Atomic document operations
  • Transaction management
  • Command/handler pattern infrastructure
  • Reusable streaming abstractions
  • Firestore serialization handling
  • Denormalized data sync systems
  • Cascade deletion orchestration
  • Bulk update workflows
  • Pagination cursor management
  • Task deduplication logic
  • Rate limiting and throttling
  • Ordered collection traversal
  • Progress tracking across failures
  • Partial failure recovery
  • Cross-service update coordination

API Reference

class TaskService { /** * Update all documents in a collection * WARNING: Documents must have 'id' field for pagination */ requestCollectionPatch(request: { collectionId: string; // Collection path (leading/trailing slashes stripped) values: unknown; // Partial document update (merged with existing) collectionGroup?: boolean; // Query across all subcollections with this name }): Promise<void>; /** * Update a single document * For slow-changing data and high-volume updates */ requestDocumentPatch(request: { path: string; // Full document path values: unknown; // Partial document update (merged with existing) }): Promise<void>; /** * Delete all documents in a collection * Creates individual delete tasks for each document */ requestCollectionDeletion(request: { collectionPath: string; // Collection path }): Promise<void>; /** * Delete a single document */ requestDocumentDeletion(request: { path: string; // Full document path }): Promise<void>; }

Important Notes

Collection ID Requirement

All documents in collections processed by the Sync Service must have an id field. This field is used for ordering during pagination to ensure consistent, deterministic traversal of large collections.

Collection Group Behavior

When using collectionGroup: true, the service queries all subcollections with the specified name across the entire database hierarchy. Use this feature carefully as it can affect documents across many parent collections.

Task Processing Model

Collection operations (patch/delete) create individual document-level tasks. Each document is processed independently, allowing for:

  • Partial progress on failures
  • Independent retry per document
  • Natural rate limiting through task queue
  • No memory issues with large collections

Update Semantics

Document patches use Firestore’s update() operation, not set(). This means:

  • Only specified fields are modified
  • Unspecified fields remain unchanged
  • Partial updates are safe and efficient
  • No need to read document before updating

Technical Details

Heavy-Lifting Features

1. Intelligent Collection Streaming with Pagination

Efficient processing of collections of any size:

BaseFirestoreCollectionStreamer features: - Automatic cursor-based pagination - Configurable page size (default: 100 documents) - Orders by document 'id' field for consistent pagination - Processes pages sequentially to avoid memory issues - Supports both regular collections and collection groups Processing flow: 1. Fetch page of documents 2. Process each document in page 3. Move cursor to next page 4. Repeat until no more documents

What customers avoid:

  • Writing pagination logic for large collections
  • Managing cursor state across batches
  • Implementing memory-efficient streaming
  • Handling collection vs. collection group differences

2. Collection Group Query Support

Cross-subcollection updates with a single operation:

Collection Group features: - Query all subcollections with the same name - Example: Update all 'members' subcollections across communities - Strips leading/trailing slashes from collection paths - Removes all slashes for collection group queries - Processes documents regardless of parent hierarchy Use case: Collection: communities/{id}/members With collectionGroup=true: updates ALL members in ALL communities With collectionGroup=false: updates members in specific community

What customers avoid:

  • Writing complex multi-collection traversal logic
  • Managing hierarchical collection paths
  • Implementing collection group query syntax
  • Coordinating updates across subcollections

3. Two-Phase Task Architecture for Reliability

Decomposed task processing with independent retry:

Phase 1 - Collection Task: - Receives collection patch/delete request - Streams through collection in pages - Creates individual document task for each document - Each document task is independent Phase 2 - Document Tasks: - Process individual documents atomically - Can retry independently if failed - No cross-document dependencies - Guaranteed eventual completion Benefits: - Partial failures don't restart entire collection - Individual documents can retry without affecting others - Progress persists across task failures - Natural rate limiting through task queue

What customers avoid:

  • Building multi-phase task systems
  • Implementing partial failure recovery
  • Writing retry logic for large operations
  • Managing task dependencies and ordering

4. Automatic Task Queue Management

Built-in task queue integration with routing:

Task queues: - SYNC_COLLECTION_PATCH_REQUEST β†’ /tasks/sync/request-collection-patch - SYNC_DOCUMENT_PATCH_REQUEST β†’ /tasks/sync/request-document-patch - SYNC_COLLECTION_DELETION_REQUEST β†’ /tasks/sync/request-collection-deletion - SYNC_DOCUMENT_DELETE_REQUEST β†’ /tasks/sync/request-document-deletion Task infrastructure handles: - Automatic retry on failure - Rate limiting and throttling - Task deduplication - Scheduled execution - GCP project routing

What customers avoid:

  • Configuring task queue infrastructure
  • Writing task routing logic
  • Implementing retry mechanisms
  • Managing task queue names and endpoints

5. Path Normalization and Validation

Automatic path cleaning for consistency:

Path processing: - Strips leading slashes: /collection β†’ collection - Strips trailing slashes: collection/ β†’ collection - Removes all slashes for collection groups: a/b/c β†’ abc - Handles both top-level and nested collection paths Examples: - Input: /communities/123/members/ - Output: communities/123/members - Input: /members/ (collectionGroup=true) - Output: members

What customers avoid:

  • Writing path parsing logic
  • Handling inconsistent path formats
  • Validating collection path syntax
  • Converting between path formats

6. Document Update with Partial Values

Merge-based updates preserve existing fields:

Update behavior: - Uses Firestore update() not set() - Merges new values with existing document - Only specified fields are modified - Unspecified fields remain unchanged - Supports nested field updates with dot notation Example: Document: {id: '123', name: 'John', age: 30, city: 'NYC'} Update: {age: 31} Result: {id: '123', name: 'John', age: 31, city: 'NYC'}

What customers avoid:

  • Implementing merge logic
  • Reading documents before updating
  • Managing partial updates
  • Preserving unmodified fields

7. Atomic Document Operations

Transaction-safe single-document operations:

Document operations: - Each document patch is atomic - Each document deletion is atomic - No partial document states - Guaranteed consistency per document Controller handlers: - documentPatchRequested: FirestorePaginationFunctions.update() - documentDeletionRequest: FirestorePaginationFunctions.delete() Both use atomic Firestore operations

What customers avoid:

  • Writing transaction wrappers
  • Handling partial write failures
  • Implementing rollback logic
  • Managing document consistency

8. Command Pattern for Complex Operations

CQRS-based deletion handling with extensibility:

Deletion uses Command/Handler pattern: - DeleteCollectionCommand: Request representation - DeleteCollectionHandler: Business logic execution - DeletionTaskGenerator: Streams and creates delete tasks Benefits: - Separation of concerns - Easy to add pre/post-processing hooks - Testable business logic - Extensible for custom deletion logic

What customers avoid:

  • Implementing command pattern infrastructure
  • Writing command routing logic
  • Managing handler registration
  • Building extensible operation framework

9. Built-In Streaming Infrastructure

Reusable streaming abstraction for collection processing:

BaseFirestoreStreamer features: - Abstract consume() method for custom processing - Automatic page fetching and iteration - Cursor management between pages - Pages processed counter - Optional custom orderBy field - Deep copy of items before processing Implementations: - PatcherTaskGenerator: Creates patch tasks per document - DeletionTaskGenerator: Creates delete tasks per document Both inherit streaming logic, only implement consume()

What customers avoid:

  • Writing collection streaming boilerplate
  • Implementing cursor-based iteration
  • Managing page state across iterations
  • Building reusable collection processors

10. Real-World Integration Examples

Production usage patterns from other services:

Example 1 - Product name updates: UpdateProductSubscriptionsHandler: - Streams all subscriptions - Filters by productId - Creates patch tasks for matching subscriptions - Updates denormalized productName field - Processes in parallel with Promise.all Example 2 - Community member cleanup: DeleteAllMembersHandler: - Builds collection path from communityId - Requests collection deletion - Sync service handles pagination and deletion - Guarantees all members removed These demonstrate: - Filtering before patching for efficiency - Parallel task creation for performance - Integration with repository patterns - Cascade deletion coordination

What customers avoid:

  • Building denormalized data sync systems
  • Writing cascade deletion logic
  • Implementing bulk update workflows
  • Coordinating multi-service operations

11. Firestore Serialization Handling

Automatic data transformation for database operations:

Serialization features: - Converts JavaScript objects to Firestore format - Deserializes Firestore data back to objects - Handles special Firestore types (Timestamp, GeoPoint, etc.) - Preserves type information through round-trips - Used automatically by all update/delete operations FirestorePaginationFunctions handles: - serialize<T>() before writes - deserialize() after reads - Type-safe transformations

What customers avoid:

  • Writing serialization logic
  • Managing Firestore type conversions
  • Handling timestamp transformations
  • Building type-safe database operations

12. Document ID Ordering Requirement

Predictable pagination with indexed field:

Ordering requirement: - All documents MUST have 'id' field - Used for orderBy in pagination queries - Enables consistent cursor-based pagination - Prevents duplicate processing - Ensures deterministic iteration order WARNING in TaskService: "this collection data to patch must have an 'id' property which is used to sort paginated data" Impact: - Documents without 'id' may be skipped - Pagination relies on indexed 'id' field - Collection must support orderBy('id') query

What customers avoid:

  • Debugging pagination inconsistencies
  • Handling unordered collection traversal
  • Implementing custom sorting logic
  • Managing cursor state without indexes

πŸ€– This documentation was generated using AI and human-proofed for accuracy.

Last updated on