AI Data Wrangler Overview

Overview

What is the AI Data Wrangler?

The Osmos AI Data Wrangler is an autonomous data agent that transforms messy, irregular files into clean, structured data with minimal manual effort. It is purpose-built to automate the wrangling of complex file formats found across SharePoint, FTP servers, Google Drive, and more, so teams can make faster, more reliable decisions without scaling up their data engineering team.

Currently available within Microsoft Fabric, the AI Data Wrangler helps organizations prepare lakehouse data with precision and minimal effort by leveraging generative AI.

🚀 What It Is

The AI Data Wrangler uses GenAI to intelligently and autonomously clean and reshape messy files into structured, SQL-ready datasets. Whether it's inconsistent Excel exports, broken PDFs, or fixed-width legacy system files, the Wrangler selects the most effective processing strategy—writing custom code or chunking through LLMs—so you don’t have to.

No rules. No templates. No manual rework.

👥 Who It’s For

  • Primary User Persona: Business, Operations, and Data Services Teams

🎯 Key Use Cases

  • Preparing messy source files to deliver SQL-ready data for downstream analytics

  • Wrangling input from:

    • Excel files with inconsistent headers and merged rows

    • PDFs with embedded or unstructured data

    • Fixed-width or custom-delimited exports

    • “Not really” CSVs from legacy tools

  • Mapping irregular data to a standardized schema (e.g., customer master table)
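To make the last use case concrete, here is a minimal hand-written sketch of what mapping irregular headers to a standardized schema amounts to. It is not how the Wrangler works internally (the Wrangler infers the mapping itself). The schema, the alias table, and the function names are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical golden schema for a customer master table (illustrative only).
GOLDEN_SCHEMA = ["customer_id", "customer_name", "email"]

# Hypothetical aliases that might appear in source files for each target column.
ALIASES = {
    "customer_id": {"cust id", "customer no", "id"},
    "customer_name": {"name", "customer", "client name"},
    "email": {"e mail", "email address"},
}


def normalize(header: str) -> str:
    """Lowercase a header and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", header.lower()).strip()


def map_headers(source_headers):
    """Return {source_header: target_column or None} for each source header."""
    mapping = {}
    for header in source_headers:
        key = normalize(header)
        target = None
        for column in GOLDEN_SCHEMA:
            if key == column.replace("_", " ") or key in ALIASES[column]:
                target = column
                break
        mapping[header] = target
    return mapping


def remap_rows(rows, mapping):
    """Rename keys in each row dict to the golden schema, dropping unmapped ones."""
    return [
        {mapping[key]: value for key, value in row.items() if mapping.get(key)}
        for row in rows
    ]


mapping = map_headers(["Cust. ID", "Client Name", "E-Mail", "Notes"])
print(mapping)
print(remap_rows([{"Cust. ID": "42", "Notes": "call back"}], mapping))
```

A fixed alias table like this is exactly the kind of rule set that breaks when a source file changes, which is the maintenance burden the Wrangler is meant to remove.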

🛠️ How It Works

1. Submit Your Files

Upload messy files from sources like SharePoint, GDrive, FTPs, or internal systems. The Wrangler supports a wide range of formats, including files with irregular structure.

2. Provide Instructions—or Don’t

You can:

  • Point to a golden schema or a Fabric destination table

  • Let the AI infer expectations from instructions, example files, or even code

  • Use Autoconfigure to ingest prior docs and extract transformation logic

3. Leave the Dirty Work to Osmos

The Wrangler decides:

  • Whether to generate transformation code

  • Whether to chunk and semantically analyze the file using LLMs

  • How to best get your clean, validated tabular data

Each file is processed independently. No brittle code. No manual tuning.

4. Review, Approve, Repeat

  • Review outputs before committing

  • Compare the output side-by-side with the input for validation

  • Request changes or reprocess with new instructions

  • Accept the result and move on

🧩 Key Capabilities

| Capability | Description |
| --- | --- |
| Fully Autonomous | AI decides optimal logic per file: LLM, code, or both |
| Flexible File Support | Handles PDFs, Excels, fixed-width, delimited, malformed CSVs, and more |
| Golden Schema Mapping | Aligns source data with your lakehouse schema and business expectations |
| Instant Review Cycles | See results in minutes, give feedback, or approve with a click |
| Built for Fabric | Seamlessly manages and prepares data in your Microsoft Fabric environment |

⚙️ AI Decision-Making Logic

The Wrangler processes each file independently and flexibly:

  • Infers structure and formatting quirks

  • Chooses between LLM chunking and custom code generation

  • Validates results through in-process checks

  • Supports multiple data types in a single run

The output is always clean tabular data, not reusable code, because messy files change constantly, and brittle code breaks.
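As a rough illustration of what an in-process check on tabular output could look like, the sketch below validates rows against expected columns and types. The column names, the `EXPECTED` table, and `validate_rows` are assumptions made for this example. The source does not describe the Wrangler's actual checks.

```python
from datetime import date

# Hypothetical expectations for an output table: column name -> parser.
EXPECTED = {
    "po_number": str,
    "quantity": int,
    "order_date": date.fromisoformat,
}


def validate_rows(rows):
    """Split rows into (valid, errors); each error is (row_index, column, reason)."""
    valid, errors = [], []
    for index, row in enumerate(rows):
        parsed, row_errors = {}, []
        for column, parser in EXPECTED.items():
            raw = row.get(column)
            if raw is None or raw == "":
                row_errors.append((index, column, "missing value"))
                continue
            try:
                parsed[column] = parser(raw)
            except (TypeError, ValueError):
                row_errors.append((index, column, f"cannot parse {raw!r}"))
        if row_errors:
            errors.extend(row_errors)
        else:
            valid.append(parsed)
    return valid, errors


rows = [
    {"po_number": "PO-1", "quantity": "3", "order_date": "2024-05-01"},
    {"po_number": "PO-2", "quantity": "ten", "order_date": ""},
]
valid, errors = validate_rows(rows)
print(valid)
print(errors)
```

Keeping rejected rows alongside a reason, rather than dropping them silently, fits the review step described earlier, where outputs are compared with inputs before being accepted.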

🧪 Example Scenarios

| Input Scenario | Wrangler Outcome |
| --- | --- |
| Broken Excel with multi-row headers | Extracted proper columns, standardized formats, aligned to schema |
| PDF invoices with nested info | Parsed PO numbers, product descriptions, and quantities cleanly |
| Fixed-width export with missing headers | Inferred headers, extracted fields by position, produced structured output |
| Custom-delimited file with inconsistent rows | Detected delimiters, normalized row lengths, created clean flat file |
| Semi-structured CSV with embedded fields | Split merged fields into columns, matched values to categories |
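For readers who want a feel for the fixed-width and custom-delimited scenarios, the following sketch shows the kind of transformation involved, written by hand. It is an assumption-laden illustration, not Osmos code. The field offsets, the candidate delimiters, and the function names are hypothetical, and the delimiter heuristic is deliberately simple.

```python
from collections import Counter


def parse_fixed_width(text, fields):
    """Extract fields by position; fields is a list of (name, start, end), end exclusive."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append({name: line[start:end].strip() for name, start, end in fields})
    return rows


def detect_delimiter(lines, candidates=(",", ";", "|", "\t")):
    """Pick the candidate whose most common per-line count covers the most lines."""
    best, best_score = None, 0
    for delim in candidates:
        counts = Counter(line.count(delim) for line in lines)
        counts.pop(0, None)  # lines without the delimiter give no evidence
        if not counts:
            continue
        score = counts.most_common(1)[0][1]
        if score > best_score:
            best, best_score = delim, score
    return best


def parse_delimited(text):
    """Detect the delimiter, then pad or truncate every row to the header width."""
    lines = [line for line in text.splitlines() if line.strip()]
    delim = detect_delimiter(lines)
    if delim is None:
        raise ValueError("no candidate delimiter found")
    rows = [line.split(delim) for line in lines]
    width = len(rows[0])  # assumption: the first non-blank line is the header
    # Truncation silently drops extra cells; a real pipeline would flag them.
    return [(row + [""] * width)[:width] for row in rows]


fixed = "0001" + "Ada".ljust(10) + "London\n" + "0002" + "Grace".ljust(10) + "Paris\n"
print(parse_fixed_width(fixed, [("id", 0, 4), ("name", 4, 14), ("city", 14, 24)]))
print(parse_delimited("id|name|city\n1|Ada|London\n2|Grace\n3|Linus|Helsinki|extra\n"))
```

Both functions depend on assumptions (known offsets, a header on the first line) that a real messy file may violate, which is why the Wrangler processes each file independently instead of relying on one reusable parser.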

Summary

The Osmos AI Data Wrangler turns unstructured, irregular data chaos into consistent, actionable insights—fast. With no need for templates or hand-written transformations, it autonomously learns what your data should look like and delivers results you can trust.

Whether you’re prepping data for analytics or just trying to get invoice PDFs into your lakehouse, the AI Data Wrangler offers a hands-free path, with a review step before results are committed.

  • From chaos to clean in minutes.

  • Powered by generative AI.

  • Available now in Microsoft Fabric.

