survos / data-bundle
Shared data directory conventions and path utilities for dataset-driven apps (APP_DATA_DIR).
Requires
- php: ^8.4
- doctrine/dbal: ^4.2
- symfony/config: ^8.0
- symfony/console: ^8.0
- symfony/dependency-injection: ^8.0
- symfony/filesystem: ^8.0
- symfony/framework-bundle: ^8.0
- symfony/serializer: ^8.0
- symfony/ux-twig-component: ^2.0|^3.0
Requires (Dev)
- phpunit/phpunit: ^13.0
- roave/security-advisories: dev-latest
- survos/ai-workflow-bundle: ^2.0
- survos/import-bundle: ^2.0
- symfony/yaml: ^8.0
Suggests
- survos/ai-workflow-bundle: Wires DatasetIterateRowEvent → Subject creation for AI workflow pipelines
This package is auto-updated.
Last update: 2026-05-14 02:43:59 UTC
README
survos/data-bundle centralizes dataset filesystem conventions for
dataset-driven Symfony applications.
Despite the historical name, this bundle is not the owner of shared semantic metadata contracts. It manages where dataset files, provider metadata, Pixie databases, run artifacts, cache files, and related JSONL outputs live.
For shared vocabulary and typed metadata contracts, use
survos/data-contracts.
Scope
This bundle provides:
- DataPaths: root-level path resolution under APP_DATA_DIR
- DatasetPaths: dataset-scoped path helpers
- dataset metadata loading and ensuring
- DatasetInfo/Provider registry entities
- provider snapshot encoding
- dataset context helpers for console/import workflows
- commands for browsing, diagnosing, and resolving dataset paths
This bundle does not provide:
- Dublin Core vocabulary constants
- collection-object DTO contracts
- metadata claim storage
- AI workflow execution
- media upload, IIIF, or mediary publishing
- import/normalize/profile logic
Relationship to Other Packages
- survos/data-contracts: shared metadata vocabulary and DTO contracts.
- survos/data-bundle: dataset paths, provider storage, and dataset registry.
- survos/import-bundle: import/convert workflows that may ask this bundle for dataset paths.
- survos/ai-workflow-bundle: task execution in apps that own subject context.
- claims bundle: tracked metadata assertions with provenance and confidence.
- survos/media-bundle: media identity and mediary publishing.
Keep the dependency direction clean: packages that only need DcTerms, ContentType,
or metadata DTOs should require survos/data-contracts directly. Do not require this
bundle just to get vocabulary classes.
Core Idea
All dataset work lives under a single root directory:
APP_DATA_DIR=/absolute/path/to/data/root
The bundle avoids repository-relative paths and gives services and commands one place to ask for canonical locations.
Example layout:
$APP_DATA_DIR/
  work/
    <datasetKey>/
      00_meta/
        dataset.json
      10_extract/
        obj.jsonl
      20_normalize/
        obj.jsonl
      21_profile/
        obj.profile.json
      30_terms/
        *.jsonl
  pixie/
    tenants/
      <tenant>.db
    template/
  exports/
  runs/
  cache/
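The point of resolving everything under APP_DATA_DIR is that callers never build paths by hand. A minimal sketch of that idea in plain PHP, assuming POSIX-style absolute paths and the work/<datasetKey>/<stage> layout above; resolveStageDir() is a hypothetical helper for illustration, not part of the bundle's API. It also rejects empty, "." and ".." segments so a dataset key cannot escape the root.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of survos/data-bundle): resolves
// $root/work/<datasetKey>/<stage> and refuses keys that could escape $root.
function resolveStageDir(string $root, string $datasetKey, string $stage): string
{
    // Assumption: APP_DATA_DIR is a POSIX absolute path.
    if ($root === '' || $root[0] !== '/') {
        throw new InvalidArgumentException('APP_DATA_DIR must be an absolute path');
    }
    // Dataset keys may contain "/" (e.g. "dc/tb09jw350"), so check every segment.
    foreach ([...explode('/', $datasetKey), $stage] as $segment) {
        if ($segment === '' || $segment === '.' || $segment === '..') {
            throw new InvalidArgumentException("Invalid path segment in $datasetKey/$stage");
        }
    }
    return rtrim($root, '/') . '/work/' . $datasetKey . '/' . $stage;
}

// Prints /srv/data/work/dc/tb09jw350/20_normalize
echo resolveStageDir('/srv/data', 'dc/tb09jw350', '20_normalize'), PHP_EOL;
```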
Installation
composer require survos/data-bundle
Set the root directory:
export APP_DATA_DIR=/absolute/path/to/data/root
Usage
Inject DataPaths for root and dataset path resolution:
use Survos\DataBundle\Service\DataPaths;

final class SomeService
{
    public function __construct(
        private readonly DataPaths $paths,
    ) {
    }
}
Common dataset paths:
$paths->datasetDir('dc/tb09jw350');
$paths->extractDir('dc/tb09jw350');
$paths->extractFile('dc/tb09jw350');
$paths->normalizeDir('dc/tb09jw350');
$paths->normalizeFile('dc/tb09jw350');
$paths->profileDir('dc/tb09jw350');
$paths->profileFile('dc/tb09jw350');
$paths->termsDir('dc/tb09jw350');
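To make the mapping between these method names and the example layout concrete, here is a stand-in class written only for illustration. SketchDataPaths is hypothetical: it assumes each method returns the matching directory or file from the layout (for example extractFile() pointing at 10_extract/obj.jsonl), and the real DataPaths service may compute paths differently.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in, NOT the bundle's DataPaths: mirrors the example
// layout so the method names can be read against concrete paths.
final class SketchDataPaths
{
    public function __construct(private readonly string $root) {}

    public function datasetDir(string $key): string
    {
        return rtrim($this->root, '/') . '/work/' . $key;
    }

    // Stage directories and their (assumed) canonical files.
    public function extractDir(string $key): string    { return $this->datasetDir($key) . '/10_extract'; }
    public function extractFile(string $key): string   { return $this->extractDir($key) . '/obj.jsonl'; }
    public function normalizeDir(string $key): string  { return $this->datasetDir($key) . '/20_normalize'; }
    public function normalizeFile(string $key): string { return $this->normalizeDir($key) . '/obj.jsonl'; }
    public function profileDir(string $key): string    { return $this->datasetDir($key) . '/21_profile'; }
    public function profileFile(string $key): string   { return $this->profileDir($key) . '/obj.profile.json'; }
    public function termsDir(string $key): string      { return $this->datasetDir($key) . '/30_terms'; }

    // Pixie tenant databases live outside work/, under pixie/tenants/.
    public function pixieTenantDb(string $tenant): string
    {
        return rtrim($this->root, '/') . '/pixie/tenants/' . $tenant . '.db';
    }
}
```

Keeping every stage name in one class is the design point: renaming a stage directory touches one place instead of every command that reads it.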
Pixie paths:
$paths->pixieTenantDb('larco');
Operational directories:
$paths->runsDir;
$paths->cacheDir;
Commands
Current command names retain the historical data:* prefix:
bin/console data:path dc/tb09jw350 20_normalize
bin/console data:head dc/tb09jw350 20_normalize --limit=5
bin/console data:diag dc/tb09jw350
bin/console data:browse
bin/console data:scan-datasets
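data:head previews the first rows of a stage. Since the layout stores stage output as JSONL, a head-style preview boils down to reading and decoding a bounded number of lines. A rough plain-PHP sketch, assuming one JSON value per line; jsonlHead() is hypothetical and is not the command's implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: decode at most $limit records from a JSONL file,
 * skipping blank lines. Reads lazily, so large files are not loaded whole.
 *
 * @return list<mixed>
 */
function jsonlHead(string $file, int $limit = 5): array
{
    $handle = fopen($file, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $file");
    }
    $rows = [];
    try {
        // The count check runs first, so no line is read once $limit is reached.
        while (count($rows) < $limit && ($line = fgets($handle)) !== false) {
            $line = trim($line);
            if ($line === '') {
                continue;
            }
            // JSON_THROW_ON_ERROR reports malformed rows instead of returning null.
            $rows[] = json_decode($line, true, 512, JSON_THROW_ON_ERROR);
        }
    } finally {
        fclose($handle);
    }
    return $rows;
}
```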
These may eventually move to dataset:* aliases when the bundle is renamed.
Directory Creation
Ensure global roots exist:
$paths->ensureRootDirs();
Ensure standard dataset stage directories exist:
$paths->ensureDatasetDirs('dc/tb09jw350');
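"Ensure" here means create-if-missing and safe to call repeatedly. A minimal sketch of that behaviour using only the standard library; ensureStageDirs() and DATASET_STAGES are hypothetical, and the stage list is copied from the example layout rather than from the bundle's source.

```php
<?php
declare(strict_types=1);

// Assumption: stage names taken from the example layout; the bundle's list may differ.
const DATASET_STAGES = ['00_meta', '10_extract', '20_normalize', '21_profile', '30_terms'];

// Hypothetical helper: creates each stage directory (and missing parents),
// doing nothing for directories that already exist.
function ensureStageDirs(string $datasetDir): void
{
    foreach (DATASET_STAGES as $stage) {
        $dir = $datasetDir . '/' . $stage;
        // The second is_dir() tolerates another process creating $dir between the check and mkdir().
        if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
            throw new RuntimeException("Could not create $dir");
        }
    }
}
```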
Atomic File Writes
For small metadata files:
$paths->atomicWrite($path, $contents);
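The write goes to a temporary file in the same directory, which is then atomically renamed over the target path, so readers see either the old contents or the new contents, never a partially written file.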
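The temp-file-plus-rename pattern can be sketched with the standard library alone. atomicWriteSketch() is hypothetical and is not the bundle's atomicWrite() source; it assumes the target directory already exists and that permissions of 0644 are acceptable for the result.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of an atomic small-file write; not the bundle's code.
function atomicWriteSketch(string $path, string $contents): void
{
    $dir = dirname($path);
    // tempnam() silently falls back to the system temp dir if $dir is unusable,
    // which would break the same-filesystem guarantee, so check first.
    if (!is_dir($dir) || !is_writable($dir)) {
        throw new RuntimeException("Directory not writable: $dir");
    }
    // Same directory means same filesystem, so rename() replaces the target atomically.
    $tmp = tempnam($dir, '.tmp_');
    if ($tmp === false) {
        throw new RuntimeException("Cannot create temp file in $dir");
    }
    try {
        if (file_put_contents($tmp, $contents) !== strlen($contents)) {
            throw new RuntimeException("Short write to $tmp");
        }
        // tempnam() creates files as 0600; widen to 0644 (an assumption).
        chmod($tmp, 0644);
        if (!rename($tmp, $path)) {
            throw new RuntimeException("Cannot rename $tmp to $path");
        }
    } catch (Throwable $e) {
        if (is_file($tmp)) {
            unlink($tmp);
        }
        throw $e;
    }
}
```

There is no fsync here: readers never observe a half-written file, but the new contents are not guaranteed to survive a power loss.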
Design Principles
- Dataset path conventions are centralized.
- Paths are semantic, not stringly typed.
- Dataset/provider storage concerns stay separate from semantic metadata contracts.
- Import, AI workflow, claims, and media publishing remain in their own packages.
- The bundle should stay boring and infrastructure-focused.
Future Rename
The better long-term name is survos/dataset-bundle. See
docs/rename-to-dataset-bundle.md.