Generate Notebook

Overview

The Generate Notebook feature is at the heart of how the Osmos AI Data Engineer turns your configuration into powerful, reusable Python code. With one click, it produces a fully functional Spark-based notebook that is ready to run, schedule, version, and integrate into pipelines.

These notebooks do more than ingest and transform data: they are long-lived, production-grade workflows that evolve with your needs while keeping human reviewers in control.

What It Is

Generate Notebook triggers the AI Data Engineer to build a ready-to-run Python notebook based on your configuration instructions, source files, and destination schemas. The notebook is:

  • Execution-ready: Includes logic for ingestion, transformation, and validation

  • Reusable: Can be versioned, re-executed, and adapted for new data

  • Pipeline-ready: Built for integration into orchestration systems (e.g., Fabric, Airflow)

  • Autonomous but supervised: All actions are user-initiated, ensuring full control

Think of it as telling an engineer, “Write me a Python notebook for this job,” and having the AI do it iteratively and at scale.

What the AI Does Behind the Scenes

When you click Generate Notebook, the AI Data Engineer will:

  1. Sample & Analyze Files: It inspects your input data (CSV, JSON, XML, Parquet, etc.) to understand schemas, spot anomalies, and work out the transformations required.

  2. Write the Code: It generates Spark-based Python code that:

    • Ingests your data

    • Transforms it according to your instructions

    • Includes built-in schema checks and validation logic

  3. Write Its Tests: The notebook includes test cases to catch data issues, logic gaps, or structural inconsistencies.

  4. Handle Errors Automatically: If tests fail, the AI:

    • Resamples the data

    • Revises the code

    • Regenerates the logic until a working solution is found

  5. Add Bookkeeping: Built-in logic tracks which data has been processed, avoiding duplicates or reprocessing in future runs.
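The test-and-revise behavior in step 4 can be pictured as a bounded retry loop. The sketch below is purely conceptual and is not Osmos's implementation: `generate_until_passing`, `toy_generate`, and `toy_run_tests` are hypothetical names, the "generator" is a toy stand-in for the AI, and the resampling step is not modeled.

```python
from typing import Callable, List, Optional


def generate_until_passing(
    generate: Callable[[List[str]], str],
    run_tests: Callable[[str], List[str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first candidate whose tests pass, or None if attempts run out."""
    feedback: List[str] = []  # test failures reported for the previous attempt
    for _ in range(max_attempts):
        candidate = generate(feedback)   # revise using the earlier failures
        feedback = run_tests(candidate)  # an empty list means every test passed
        if not feedback:
            return candidate
    return None


# Toy stand-ins (hypothetical): the generator switches to float parsing
# once it has seen a failure report.
def toy_generate(feedback: List[str]) -> str:
    return "amount = float(raw)" if feedback else "amount = int(raw)"


def toy_run_tests(code: str) -> List[str]:
    return [] if "float" in code else ["int() fails on '12.50'"]


result = generate_until_passing(toy_generate, toy_run_tests)
```

Capping the number of attempts keeps the loop from running forever when no revision passes; in that case the function returns `None`, leaving the decision to a human reviewer.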

🔄 Iteration & Control

You're always in the loop:

  • Preview the generated notebook

  • Edit the code or configuration as needed

  • Regenerate if something’s missing or incorrect

  • Schedule or manually trigger runs

  • Version in Git or any repo of your choice

Notebooks are written defensively. They handle schema shifts gracefully, raising clear errors for issues that need intervention.
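As an illustration of what "defensive" can mean here, the sketch below tolerates harmless schema drift (extra columns) but raises a descriptive error when a required column is missing or its type has changed. It is a minimal plain-Python example, not code taken from a generated notebook: `SchemaMismatchError`, `check_schema`, and the column names are all hypothetical, and a real notebook would more likely compare Spark schemas.

```python
from typing import Dict, List


class SchemaMismatchError(ValueError):
    """Raised when incoming columns cannot be reconciled with the expected schema."""


def check_schema(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Compare actual column types against the expected ones.

    Extra columns are tolerated and returned so the caller can log them.
    Missing columns or changed types raise SchemaMismatchError.
    """
    missing = sorted(set(expected) - set(actual))
    changed = sorted(
        col for col in expected
        if col in actual and actual[col] != expected[col]
    )
    if missing or changed:
        details = []
        if missing:
            details.append(f"missing columns: {missing}")
        if changed:
            details.append(
                "type changes: "
                + ", ".join(f"{c} ({expected[c]} -> {actual[c]})" for c in changed)
            )
        raise SchemaMismatchError("; ".join(details))
    return sorted(set(actual) - set(expected))


# Hypothetical destination schema used only for this example.
EXPECTED = {"order_id": "string", "amount": "double"}
extras = check_schema(
    EXPECTED, {"order_id": "string", "amount": "double", "note": "string"}
)
```

Here `extras` contains the unexpected `note` column, which the run can log and ignore, whereas a file lacking `amount` would stop the run with a message naming that column.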

🧩 Key Capabilities

| Capability | Description |
| --- | --- |
| Reusable & Versionable | Notebooks are long-lived and can be stored, shared, and reused |
| Fully Tested | AI includes test scripts and validation checks |
| Pipeline Integration | Designed to plug into workflows and orchestration platforms |
| Bookkeeping Logic | Automatically tracks processed files for repeatable, safe operations |
| Performance Optimized | Passes through an AI profiler for better runtime and scaling |
| Human-in-the-Loop | All notebook generation and execution are initiated and reviewed by users |
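To make the bookkeeping capability concrete, here is one way processed-file tracking could work: a small ledger records which files have been handled, and each run skips anything already listed. This is a sketch under stated assumptions, not the logic Osmos generates. `select_unprocessed`, `mark_processed`, and the JSON ledger file are hypothetical; a notebook running in Fabric might well keep this state in a table instead.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def _load_ledger(ledger_path: Path) -> Set[str]:
    """Read the set of already-processed file names (empty if no ledger yet)."""
    if ledger_path.exists():
        return set(json.loads(ledger_path.read_text()))
    return set()


def select_unprocessed(candidates: Iterable[str], ledger_path: Path) -> List[str]:
    """Return the candidate files not yet recorded in the ledger, in input order."""
    seen = _load_ledger(ledger_path)
    return [name for name in candidates if name not in seen]


def mark_processed(names: Iterable[str], ledger_path: Path) -> None:
    """Record files as processed; call this only after a successful write."""
    seen = _load_ledger(ledger_path)
    seen.update(names)
    ledger_path.write_text(json.dumps(sorted(seen)))
```

A run would call `select_unprocessed` on the files it finds, load only those, and call `mark_processed` once the destination write succeeds, so that a failed run leaves the ledger unchanged and the same files are picked up again next time.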

✅ Summary

The Generate Notebook feature takes you from configuration to code automatically. Whether you're managing a complex data lake or building a repeatable ingestion flow, Osmos AI writes high-quality notebooks for you.

No boilerplate. No hand coding. Just reliable, production-ready notebooks you can trust.


Last updated 4 days ago