Files Service

Secure file storage, video processing, and intelligent lifecycle management.

What It Does

The Files Service provides complete file lifecycle management from upload to deletion. It handles storage organization, video hosting integration, time-limited signed URLs, automatic interest tracking, and scheduled cleanup of orphaned files. The service abstracts storage complexity and coordinates with external video processors while maintaining security through secret tokens and validation.

Key Capabilities

Capability	Description
File Receipts	Lightweight pointers with secret tokens for secure file references
Signed URL Generation	Time-limited URLs with caching for both storage and video files
Video Processing	Automatic upload to video hosting with composite key tracking
Interest-Based Deletion	Track file references across documents and schedule cleanup
File Cloning	Deep copy files and receipts for document duplication
Public/Private Files	Separate storage paths with access control
Stream Upload/Download	Efficient stream-based file operations with size limits
PDF Generation	URL-to-PDF conversion with streaming or bucket storage
Metadata Synchronization	Automatic metadata updates from storage events
Multi-Platform URLs	Separate signed URLs for web and mobile applications

How It Fits Together


┌─────────────────┐
│  Files Service  │
└────────┬────────┘
         │
    ┌────┴────────────┬──────────────┬─────────────┐
    ▼                 ▼              ▼             ▼
┌─────────┐   ┌──────────────┐  ┌───────┐  ┌──────────┐
│ Storage │   │Video Hosting │  │ Cache │  │ Database │
│(Public/ │   │  (External)  │  │(Redis)│  │          │
│Private) │   └──────────────┘  └───────┘  └──────────┘
└─────────┘

Files uses cloud storage for object persistence, external video hosting for video processing, Redis for signed URL caching, and database for metadata and interest tracking.

Common Use Cases

Media management: Product images, user avatars, and marketing assets with automatic video processing
Document storage: Contracts, invoices, and reports with secure time-limited access
Content publishing: Blog images, videos, and attachments with public CDN delivery
Multi-tenant SaaS: Customer file isolation with separate storage namespaces
File lifecycle: Temporary uploads, approval workflows, and automatic cleanup of unused files
Document cloning: Duplicate records with all associated files copied
PDF generation: Convert web pages to PDFs for reports, receipts, and documentation
Video hosting: Automatic upload to video processors with playback URL management

What You Don’t Have to Build

Secret-token based file security system
Batch signed URL generation with deduplication
Multi-layer caching (storage + video + Redis)
Automatic video hosting integration
Interest-based reference tracking across documents
Scheduled deletion workflows with cancellation
Deep document cloning with storage coordination
Public/private file access control
Stream-based upload/download with size limits
Temporary file management with cleanup
PDF generation from URLs with streaming
Metadata synchronization from storage events
Cascade deletion across multiple resources
Multi-tenant file organization with grouping
Event-driven architecture with pub/sub
Type-safe MIME type validation
Separate web and mobile URL formats
Storage path conventions and parsing
Customer-scoped access validation
Transaction-aware file operations
Video composite key management
Cache invalidation coordination
Throttling and rate limiting for URLs
Async file copy operations with tasks

Technical Details

Heavy-Lifting Features

1. File Receipt Security System

Secret-token based file references with validation:


File Receipt structure:
  - Unique ID and secret token for security
  - Customer key and grouping ID for organization
  - File name, MIME type, and size metadata
  - Public/private flag determines storage path
  - Optional videoCompositeKey for hosted videos
  - Optional deletionDelayDays for custom cleanup windows
 
buildFileReceiptObject():
  - Generates unique ID and secret token (UUID)
  - Validates MIME type against allowed types
  - Determines storage path based on public flag
  - Creates signature for receipt validation
 
Validation on signed URL requests:
  - Verifies secret token matches metadata
  - Checks customer key consistency
  - Validates file ID matches
  - Rejects tampered or invalid receipts

What customers avoid:

Building secure file reference systems
Generating and validating secret tokens
Implementing storage path conventions
Writing receipt validation logic
Managing customer-scoped access control

2. Intelligent Signed URL Management

Multi-layer caching with storage and video routing:


SignedUrlService provides:
  - Batch signed URL generation (deduplicates by ID)
  - Secret token validation before URL creation
  - Automatic routing to storage or video URLs
  - Separate web and mobile URL formats
  - Multi-status responses (207) for partial success
  - Silent error logging for failed individual files
 
Caching strategy:
  - Cache key: signed-url/{fileId} or signed-url/video/{videoId}
  - TTL: configurable expiration (default ~7 days)
  - Separate cache keys for web vs. mobile
  - Automatic cache invalidation on file deletion
  - Transaction-aware bypass for consistency
 
Throttling protection:
  - 500 requests per 5 seconds (100/sec rate)
  - Applied to batch operations to prevent abuse

What customers avoid:

Building batch URL generation with deduplication
Implementing two-tier caching (storage + video)
Managing separate web/mobile URL formats
Writing partial success response handling
Coordinating cache invalidation on deletions
Implementing rate limiting for URL endpoints

3. Video Processing Integration

Automatic video hosting with composite key management:


Video workflow:
  - Detects video MIME types (video/*)
  - Uploads to external video hosting provider
  - Receives composite key for hosted video
  - Stores composite key in file metadata
  - Generates signed playback URLs
  - Routes mobile vs. web video URLs differently
 
VideoService.sendToVideoHost():
  - Validates MIME type is video
  - Builds storage path to source file
  - Calls video provider API with bucket path
  - Sets private flag for access control
  - Returns composite key for metadata storage
 
Signed video URLs:
  - Cache separate from storage URLs
  - Support expiration time parameters
  - Automatic provider selection from composite key
  - Fallback to storage signing on video errors
  - Separate mobile URL generation for apps

What customers avoid:

Integrating with video hosting platforms
Managing video upload workflows
Implementing composite key tracking
Building video URL generation systems
Writing provider selection logic
Handling mobile vs. web video formats

4. Interest-Based File Lifecycle Management

Automatic reference tracking and scheduled deletion:


FileInterest tracking:
  - Global database triggers on ALL document writes
  - Automatic file receipt extraction from documents
  - Bidirectional interest mapping (file ↔ documents)
  - Array-based interested document tracking
  - Scheduled deletion when no documents interested
 
Register Interest functions:
  onCreate: Add document to interested array for all files
  onUpdate: Calculate diff and update interested arrays
  onDelete: Remove document from all file interest arrays
 
Deletion scheduling:
  - Triggers when interestedDocuments becomes empty
  - Default 14-day deletion window (configurable per file)
  - Creates scheduled Cloud Task for deletion
  - Stores task ID and deletion date in metadata
  - Cancels scheduled deletion if interest re-added
 
Transaction guarantees:
  - Read latest document state within transaction
  - Prevents race conditions on concurrent updates
  - Atomic interest array modifications
  - Rollback on scheduling failures

What customers avoid:

Building automatic file reference tracking
Writing global document change listeners
Implementing array-based interest tracking
Managing scheduled deletion workflows
Coordinating task creation and cancellation
Handling race conditions on interest changes
Building configurable deletion windows

5. Deep File Cloning with Storage Coordination

Recursive document cloning with storage object duplication:


FileCloningService features:
  - Deep document traversal to find all file receipts
  - Automatic receipt cloning with new IDs and tokens
  - Storage object copying with metadata preservation
  - Donor ID tracking for clone relationships
  - Synchronous or asynchronous copy modes
 
cloneDocument() workflow:
  - Recursively scan document for file receipts
  - Generate new receipts with cloneReceipts()
  - Deep copy entire document structure
  - Replace all file receipts with clones
  - Copy storage objects from donor to clone paths
  - Upload videos to hosting for cloned receipts
  - Create metadata entries for all clones
 
Copy operations:
  - Preserve MIME type and content type
  - Copy metadata as custom storage metadata
  - Handle public files (make public after copy)
  - Revoke access tokens for private files
  - Async copy via Cloud Tasks for large files

What customers avoid:

Writing recursive document cloning logic
Building file receipt extraction systems
Coordinating storage object copies
Managing donor-clone relationships
Implementing metadata preservation
Handling async copy operations
Synchronizing video hosting for clones

6. Public/Private File Separation

Storage path conventions with automatic access control:


Storage paths:
  - Private files: content/{customerKey}/{groupingId}/{id}/{filename}
  - Public files: assets/{customerKey}/{groupingId}/{id}/{filename}
 
Public file workflow:
  finalizeUpload() for public files:
    - Calls storage.makePublic(path)
    - Generates static CDN URL
    - Stores URL in file receipt and metadata
    - No signed URL required (direct access)
 
Private file workflow:
  finalizeUpload() for private files:
    - Revokes public access tokens
    - Requires signed URL for access
    - URLs expire after configured time
    - Cache signed URLs for performance
 
Path parsing:
  - Extract customerKey, groupingId, fileId from paths
  - Determine public/private from path prefix
  - Parse secret token for validation
  - Reconstruct full paths from file receipts

What customers avoid:

Designing storage path conventions
Implementing public/private access logic
Building CDN URL generation
Managing access token revocation
Writing path parsing utilities
Coordinating storage permissions

7. Stream-Based File Operations

Efficient upload/download with size limiting:


Upload streaming:
  - FileStream API for multipart form uploads
  - Progress tracking during upload
  - Size limit enforcement (default 500MB)
  - Automatic stream abortion on size exceeded
  - Direct streaming to cloud storage
  - File receipt creation during upload
 
uploadFileStream():
  - Creates writable stream to storage
  - Optional size limit with limitSize() pipe
  - Resolves on stream finish
  - Rejects on stream error
  - Automatic finalizeUpload() after stream
 
Download streaming:
  - Readable stream from storage
  - Efficient for large files
  - No memory buffering required
  - Supports range requests
 
File size validation:
  - Decorator-based validation (@FileReceiptSize)
  - Type-specific size limits
  - Async validation against metadata
  - Automatic metadata lookup if size missing

What customers avoid:

Implementing stream-based uploads
Building size limit enforcement
Writing progress tracking systems
Managing multipart form parsing
Creating storage stream adapters
Implementing size validation decorators

8. Temporary File Management

Time-limited files with automatic cleanup:


Temporary file workflow:
  - Created with saveAsTemp flag on finalize
  - Stored in separate temporary files collection
  - Contains file receipt and creation timestamp
  - No immediate storage deletion on save
 
Use cases:
  - Files pending approval or processing
  - Preview files before permanent storage
  - Staged uploads for multi-step workflows
  - Files with uncertain final destination
 
Cleanup (implementation pattern):
  - Scheduled task checks creation timestamp
  - Deletes temp record after configured window
  - Triggers normal file deletion cascade
  - Removes storage object and metadata

What customers avoid:

Building temporary file tracking
Implementing staged upload patterns
Writing cleanup scheduling logic
Managing separate temp file collections
Coordinating deletion cascades

9. PDF Generation from URLs

Headless browser integration with streaming:


PDFGeneratorController endpoints:
  V1: Stream PDF directly to client
    - Generates PDF from URL
    - Streams response with Transfer-Encoding: chunked
    - Works within Cloud Run 32MB limits
    - No Content-Length header for streaming
 
  V2: Generate PDF to bucket
    - Creates unique PDF filename (UUID)
    - Streams to configured PDF_BUCKET
    - Returns signed URL for download
    - Automatic lifecycle deletion after 1 day
 
PDF generation:
  - Validates URL to prevent SSRF
  - Calls headless browser service
  - Streams response data
  - Handles chunked transfer encoding
  - Throttled at 100 requests/second
 
Error handling:
  - Validates bucket configuration
  - Handles stream errors
  - Logs PDF generation failures
  - Returns appropriate error codes

What customers avoid:

Integrating headless browser services
Implementing URL-to-PDF conversion
Managing streaming responses
Handling Cloud Run size limits
Building SSRF protection
Writing PDF lifecycle management

10. Metadata Synchronization from Storage Events

Automatic metadata updates from cloud storage:


Storage event handlers:
  onFinalize: File written to storage
    - Triggered when upload completes
    - Extracts metadata from storage object
    - Merges with existing file metadata
    - Updates size, hashes, storage file ID
    - Only processes files with metadata
 
  onDelete: File deleted from storage
    - Triggered when storage object removed
    - Deletes corresponding file metadata
    - Only processes files with receipt metadata
 
Metadata merge:
  - Extracts file ID from storage metadata
  - Updates: size, crc32c, fileType, storageFileId
  - Updates: md5Hash, bucket, updated timestamp
  - Preserves existing file receipt data
  - Upserts to avoid overwriting
 
Event routing:
  Storage Event → Pub/Sub Topic → Cloud Task → Task Handler
  - Async processing prevents blocking
  - Retries on failure
  - Deduplication for deletions

What customers avoid:

Building storage event handlers
Implementing metadata synchronization
Writing hash and checksum tracking
Managing storage-database consistency
Coordinating async event processing
Building retry and deduplication logic

11. Cascade Deletion with Multi-Resource Cleanup

Comprehensive cleanup across storage, video, and database:


FileDeletedTaskService.delete():
  Parallel cleanup operations:
    - Delete storage object (if exists)
    - Delete signed URL cache entry
    - Delete video from hosting (if video file)
    - Delete video signed URL cache entries
    - Delete file interest tracking document
 
  All operations are non-throwing:
    - Storage deletion succeeds if not exists
    - Cache deletion succeeds if key missing
    - Video deletion handles missing videos
 
Video cleanup:
  - Extracts video provider from composite key
  - Calls provider-specific delete method
  - Removes both web and mobile cache entries
  - Handles deleted video hosting records
 
Storage deletion triggers:
  - Database file deletion → storage cleanup
  - Storage file deletion → database cleanup
  - Both paths converge to same cleanup logic

What customers avoid:

Writing cascade deletion logic
Coordinating multi-resource cleanup
Implementing non-throwing deletion
Managing video hosting deletion
Building cache cleanup coordination
Handling bidirectional deletion triggers

12. Multi-Tenant Organization with Grouping

Hierarchical file organization by customer and grouping:


File organization hierarchy:
  customerKey: Tenant/customer identifier
  groupingId: Document or resource ID
  fileId: Unique file identifier
 
Storage path structure:
  {root}/{customerKey}/{groupingId}/{fileId}/{filename}
 
Grouping-based operations:
  - deleteAllForGroup(groupingId): Delete all files for a document
  - getAll(groupingId): Query all files for a document
  - collectFileObjects(document): Extract all file receipts recursively
 
Use cases:
  - Product with multiple images (groupingId = productId)
  - User profile with avatar and documents (groupingId = userId)
  - Post with attachments (groupingId = postId)
 
Query patterns:
  - All files for a customer: query by customerKey
  - All files for a document: query by groupingId
  - Specific file: direct lookup by fileId

What customers avoid:

Designing multi-tenant storage structures
Building hierarchical file organization
Implementing grouping-based queries
Writing recursive file extraction
Managing customer isolation
Building batch deletion by group

13. Automatic Event-Driven Architecture

Transparent pub/sub for all file operations:


Cloud Functions automatically publish:
  Storage events:
    - FILE_UPLOADED → onFinalize trigger
    - FILE_DELETED_STORAGE → onDelete trigger
 
  Database events:
    - FILE_UPDATED → onUpdate trigger
    - FILE_DELETED → onDelete trigger
 
  Interest events:
    - FILE_INTEREST_CREATED → onCreate trigger
    - FILE_INTEREST_UPDATED → onUpdate trigger
    - FILE_INTEREST_DELETED → onDelete trigger
 
Event routing:
  Database/Storage → Pub/Sub Topic → Cloud Task Queue → Task Handler
 
  Benefits:
    - Async processing prevents blocking
    - Automatic retries on failure
    - Deduplication for deletions
    - Guaranteed execution
    - Decoupled architecture

What customers avoid:

Building event publishing infrastructure
Writing database and storage triggers
Implementing pub/sub integrations
Managing async task queues
Coordinating event routing
Building retry and deduplication logic

14. Type-Safe File Operations with Validation

Comprehensive MIME type support and validation:


Supported file types:
  Images: JPEG, PNG, GIF, HEIC, HEIF
  Videos: MP4, MPEG, OGG, WebM, AVI, QuickTime, 3GPP, M4V
  Audio: MPEG, WAV, 3GPP
  Documents: TXT, MD, DOC, DOCX, RTF, PDF, PPT, PPTX,
             EPUB, XLS, XLSX, generic binary
 
File receipt validation:
  - @FileReceiptSize() decorator for size limits
  - Type-specific size limits (configurable)
  - Default 500MB maximum size
  - Async validation against metadata
  - Automatic size lookup from storage
 
MIME type detection:
  - Automatic MIME lookup from filename
  - Validation against allowed types
  - Video type detection (video/* prefix)
  - Content-Type preservation on copy
 
Storage metadata validation:
  - Secret token verification
  - Customer key matching
  - File ID consistency checks
  - Reject tampered receipts

What customers avoid:

Building MIME type validation
Implementing size limit enforcement
Writing validation decorators
Managing allowed file type lists
Building automatic type detection
Implementing security validation

🤖 This documentation was generated using AI and human-proofed for accuracy.

Files Service

On This Page

What It Does

Key Capabilities

How It Fits Together

Common Use Cases

What You Don’t Have to Build

Technical Details

Heavy-Lifting Features

1. File Receipt Security System

2. Intelligent Signed URL Management

3. Video Processing Integration

4. Interest-Based File Lifecycle Management

5. Deep File Cloning with Storage Coordination

6. Public/Private File Separation

7. Stream-Based File Operations

8. Temporary File Management

9. PDF Generation from URLs

10. Metadata Synchronization from Storage Events

11. Cascade Deletion with Multi-Resource Cleanup

12. Multi-Tenant Organization with Grouping

13. Automatic Event-Driven Architecture

14. Type-Safe File Operations with Validation