Generate Notebook
Overview
The Generate Notebook feature is at the heart of how the Osmos AI Data Engineer turns your configuration into powerful, reusable Python code. With one click, it produces a fully functional Spark-based notebook that is ready to run, schedule, version, and integrate into pipelines.
These notebooks do more than ingest and transform data. They are long-lived, production-grade workflows that evolve with your needs while keeping human reviewers in complete control.
What It Is
Generate Notebook triggers the AI Data Engineer to build a ready-to-run Python notebook based on your configuration instructions, source files, and destination schemas. The notebook is:
Execution-ready: Includes logic for ingestion, transformation, and validation
Reusable: Can be versioned, re-executed, and adapted for new data
Pipeline-ready: Built for integration into orchestration systems (e.g., Fabric, Airflow)
Autonomous but supervised: All actions are user-initiated, ensuring complete control
Think of it as saying: “Hey engineer, write me a Python notebook for this job.” The AI then does it, intelligently, iteratively, and at scale.
What the AI Does Behind the Scenes
When you click Generate Notebook, the AI Data Engineer will:
Sample & Analyze Files: It inspects your input data (CSV, JSON, XML, Parquet, etc.) to understand its schema, spot anomalies, and work out the transformations required.
Write the Code: It generates Spark-based Python code that:
Ingests your data
Transforms it according to your instructions
Includes built-in schema checks and validation logic
Write Its Tests: The notebook includes test cases to catch data issues, logic gaps, or structural inconsistencies.
Handle Errors Automatically: If tests fail, the AI:
Resamples the data
Revises the code
Regenerates the logic until a working solution is found
Add Bookkeeping: Built-in logic tracks what data has been processed, avoiding duplicates or reprocessing in future runs.
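To make the bookkeeping step concrete, here is a minimal sketch of one way processed-file tracking could work. It is not the code the AI Data Engineer generates: the real notebooks are Spark-based, while this example uses only the Python standard library so it runs anywhere. The ledger file, the function names (`file_fingerprint`, `load_ledger`, `process_new_files`), and the choice of a content hash as the "already processed" key are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_fingerprint(path: Path) -> str:
    """Return a SHA-256 digest of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_ledger(ledger_path: Path) -> dict:
    """Load the ledger of processed files, or start empty on the first run."""
    if ledger_path.exists():
        return json.loads(ledger_path.read_text())
    return {}


def process_new_files(source_dir: Path, ledger_path: Path) -> list:
    """Process only files not yet recorded in the ledger; return their names."""
    ledger = load_ledger(ledger_path)
    processed_now = []
    for path in sorted(source_dir.glob("*.csv")):
        digest = file_fingerprint(path)
        if ledger.get(path.name) == digest:
            continue  # same name and same contents as a previous run: skip
        # Placeholder for the real ingest / transform / validate work.
        processed_now.append(path.name)
        ledger[path.name] = digest
    # Persist the ledger so the next run knows what was already handled.
    ledger_path.write_text(json.dumps(ledger, indent=2))
    return processed_now


def demo() -> tuple:
    """Run twice over the same folder, then once more after a new file lands."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "incoming"
        source.mkdir()
        ledger = Path(tmp) / "ledger.json"
        (source / "a.csv").write_text("id,amount\n1,10\n")
        first = process_new_files(source, ledger)
        second = process_new_files(source, ledger)  # nothing new to do
        (source / "b.csv").write_text("id,amount\n2,20\n")
        third = process_new_files(source, ledger)  # only the new file
        return first, second, third


if __name__ == "__main__":
    print(demo())
```

Keying the ledger on a content hash rather than the file name alone means a file that is re-delivered with changed contents is picked up again, while an identical re-delivery is skipped. A generated notebook may use a different strategy.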
Iteration & Feedback
After reviewing the generated instructions, you can:
Edit them inline
Add edge-case handling
Strengthen constraints (e.g., "fail if source columns change")
Update your instructions and regenerate if the result isn't correct
Use real-time feedback to refine and guide the AI’s behavior
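As an illustration of what a constraint such as "fail if source columns change" might translate to in code, here is a small sketch. It is an assumption about the shape of such a check, not output from the AI Data Engineer, and it uses the standard-library `csv` module rather than Spark to stay self-contained. The column names, `SchemaChangedError`, `check_columns`, and `read_validated` are hypothetical.

```python
import csv
import io

# Hypothetical expected schema for the source file.
EXPECTED_COLUMNS = ["order_id", "customer", "amount"]


class SchemaChangedError(Exception):
    """Raised when the source header no longer matches the expected columns."""


def check_columns(header, expected=EXPECTED_COLUMNS):
    """Fail loudly if any expected column is missing or a new one appears."""
    missing = [c for c in expected if c not in header]
    unexpected = [c for c in header if c not in expected]
    if missing or unexpected:
        raise SchemaChangedError(f"missing={missing}, unexpected={unexpected}")


def read_validated(csv_text):
    """Parse CSV text, validating the header before returning any rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    check_columns(reader.fieldnames or [])
    return list(reader)


if __name__ == "__main__":
    print(read_validated("order_id,customer,amount\n1,Ada,9.50\n"))
    try:
        read_validated("order_id,client,amount\n1,Ada,9.50\n")
    except SchemaChangedError as exc:
        print("rejected:", exc)
```

Stopping the run before any rows are processed keeps a renamed or dropped column from silently producing nulls downstream, which is the kind of behavior a stricter instruction is meant to enforce.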
Key Capabilities
Reusable & Versionable: Notebooks are long-lived and can be stored, shared, and reused
Fully Tested: The AI includes test scripts and validation checks
Pipeline Integration: Designed to plug into workflows and orchestration platforms
Bookkeeping Logic: Automatically tracks processed files for repeatable, safe operations
Performance Optimized: Passes through an AI profiler for better runtime and scaling
Human-in-the-Loop: All notebook generation and execution are initiated and reviewed by users