


🧹 AI Data Wrangler

Overview

The Osmos AI Data Wrangler is an AI agent built for the messy-file problem. Whether the source is an Excel workbook with broken headers, a pile of scattered PDFs, or a misbehaving CSV, the AI Data Wrangler turns it into clean, SQL-ready data with minimal manual work. It is designed for business and operations teams who regularly wrestle with irregular data, and it shifts your role from doing the cleanup to reviewing the results.


🚀 What It Is

The AI Data Wrangler autonomously processes irregular files (PDFs, Excel workbooks, fixed-width files, custom-delimited exports, and more) and converts them into clean tabular output. Each file is handled independently with an approach tailored to it: AI-generated transformation code, direct LLM-based parsing, or a blend of the two, depending on what suits the file best.


👥 Who It’s For

  • Primary User Persona: Business, Operations, and Data Services Teams


🎯 Key Use Cases

  • Extracting tabular data from:

    • Messy Excel files (merged headers, split rows)

    • PDFs with embedded or unstructured text

    • Fixed-width or custom-delimited files

    • Non-standard CSV exports from legacy systems

  • Cleaning and standardizing data to match predefined schemas

  • Delivering SQL-ready data for downstream analytics or machine learning
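To make the "merged headers" case concrete: Osmos does not publish the code the Wrangler generates, so the following is only a hypothetical plain-Python sketch of one common cleanup, collapsing stacked Excel header rows into SQL-safe column names. It assumes the header rows have already been read into lists of cell values and that the upper rows carry merged group labels while the bottom row carries leaf labels. `flatten_headers` is an illustrative name, not an Osmos function.

```python
import re


def _forward_fill(row):
    """Repeat the last non-blank value across blanks (how merged cells usually export)."""
    filled, last = [], ""
    for cell in row:
        last = cell or last
        filled.append(last)
    return filled


def flatten_headers(header_rows):
    """Collapse stacked header rows into one SQL-safe column name per column."""
    width = max(len(row) for row in header_rows)
    # Normalize cells to stripped strings (None becomes "") and pad short rows.
    rows = [["" if cell is None else str(cell).strip() for cell in row]
            + [""] * (width - len(row)) for row in header_rows]
    # Assumption: upper rows hold merged group labels; the bottom row holds leaf labels.
    rows = [_forward_fill(row) for row in rows[:-1]] + [rows[-1]]

    names, used = [], set()
    for i in range(width):
        parts = []
        for row in rows:
            # Skip blanks and labels repeated vertically by a merge.
            if row[i] and (not parts or parts[-1] != row[i]):
                parts.append(row[i])
        # Lowercase and collapse every run of non-alphanumerics into "_".
        name = re.sub(r"[^0-9a-z]+", "_", " ".join(parts).lower()).strip("_")
        if not name:
            name = f"column_{i + 1}"
        elif name[0].isdigit():
            name = f"col_{name}"
        # Suffix duplicates so every column stays addressable in SQL.
        candidate, n = name, 1
        while candidate in used:
            n += 1
            candidate = f"{name}_{n}"
        used.add(candidate)
        names.append(candidate)
    return names


if __name__ == "__main__":
    print(flatten_headers([["", "Contact", "", "Order"],
                           ["ID", "Name", "Email", "Total ($)"]]))
    # ['id', 'contact_name', 'contact_email', 'order_total']
```

The forward-fill assumption is the fragile part: a blank upper cell that is not actually part of a merge would inherit the label to its left, which is exactly the kind of case the review step is meant to catch.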


🛠️ How It Works

1. Submit Your Files

Upload your messiest files from FTP, SharePoint, or Google Drive. The Wrangler supports diverse formats and inconsistent structures.

2. Provide Instructions, or Don’t

Optionally, give the Wrangler a golden schema (the structure you expect the output to have), transformation instructions, or relevant documentation. It can also learn from examples, prior scripts, SQL, and more.

3. Leave the Dirty Work to Osmos

The AI autonomously determines the best processing approach:

  • Code generation for files that are messy but structurally consistent

  • LLM chunking for semantic extraction

  • Hybrid strategies for complex files

4. Review and Approve Clean Data

Inspect the cleaned outputs:

  • Validate field mappings and transformations

  • Approve, revise, or re-run

  • Export or load directly into your lakehouse or downstream systems
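The review step amounts to checking cleaned rows against the expected output. Osmos does not document how its validation works; the sketch below is a hypothetical plain-Python illustration that assumes a golden schema expressed as a list of `Column` definitions (a name, an expected Python type, and a required flag). `Column` and `validate_rows` are illustrative names, not part of any Osmos API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """One column of a hypothetical golden schema."""
    name: str
    py_type: type
    required: bool = True


def validate_rows(rows, schema):
    """Return human-readable issues for a reviewer; an empty list means every row passed."""
    issues = []
    expected = {col.name for col in schema}
    for i, row in enumerate(rows):
        # Columns the schema does not know about usually signal a mapping mistake.
        extra = sorted(set(row) - expected)
        if extra:
            issues.append(f"row {i}: unexpected columns {extra}")
        for col in schema:
            value = row.get(col.name)
            if value is None or value == "":
                if col.required:
                    issues.append(f"row {i}: missing required '{col.name}'")
                continue
            if not isinstance(value, col.py_type):
                issues.append(
                    f"row {i}: '{col.name}' expected {col.py_type.__name__}, "
                    f"got {type(value).__name__}"
                )
    return issues


if __name__ == "__main__":
    schema = [Column("sku", str), Column("quantity", int), Column("note", str, required=False)]
    rows = [{"sku": "A-1", "quantity": 5}, {"sku": "", "quantity": "7"}]
    for issue in validate_rows(rows, schema):
        print(issue)
    # row 1: missing required 'sku'
    # row 1: 'quantity' expected int, got str
```

Returning a list of messages rather than raising on the first failure fits a review-first workflow: the reviewer sees every problem in one pass and can approve, revise, or re-run.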


🧩 Key Capabilities

| Capability | Description |
|---|---|
| Autonomous File Handling | Determines the optimal processing logic for each file type and structure. |
| Semantic Understanding | Extracts structured data from even the most unstructured formats, such as PDFs. |
| Golden Schema Mapping | Learns the expected output structure and aligns raw data to it. |
| Rapid Review Cycles | Preview, approve, or rerun cleanups immediately, without waiting on scripts. |
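Golden Schema Mapping comes down to deciding which raw column feeds which target column. How the Wrangler does this is not documented; as a hypothetical sketch, the function below normalizes header names, checks exact matches and known aliases first, and falls back to fuzzy matching with Python's standard `difflib.get_close_matches`. The `synonyms` argument and the 0.8 cutoff are assumptions for illustration.

```python
import difflib
import re


def _key(name: str) -> str:
    """Normalize a header for comparison: lowercase alphanumerics only."""
    return re.sub(r"[^0-9a-z]", "", name.lower())


def map_columns(raw_headers, golden_columns, synonyms=None, cutoff=0.8):
    """Map each raw header to a golden-schema column, or None if nothing fits."""
    lookup = {_key(col): col for col in golden_columns}
    # Known aliases (e.g. gleaned from prior scripts or documentation) match exactly.
    for alias, target in (synonyms or {}).items():
        lookup[_key(alias)] = target
    mapping = {}
    for header in raw_headers:
        key = _key(header)
        if key in lookup:
            mapping[header] = lookup[key]
            continue
        # Fuzzy fallback absorbs typos and small spelling variants.
        close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        mapping[header] = lookup[close[0]] if close else None
    return mapping


if __name__ == "__main__":
    print(map_columns(
        ["Invoice No.", "SKU", "Qty", "Unit Pric", "Remarks"],
        ["invoice_number", "sku", "quantity", "unit_price"],
        synonyms={"Invoice No": "invoice_number", "Qty": "quantity"},
    ))
    # {'Invoice No.': 'invoice_number', 'SKU': 'sku', 'Qty': 'quantity',
    #  'Unit Pric': 'unit_price', 'Remarks': None}
```

Leaving unmatched headers as `None` instead of forcing a guess keeps ambiguous columns visible for review.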


⚙️ AI Decision-Making Logic

Each file is evaluated independently. The AI decides:

  • Whether to write transformation code

  • Whether to chunk and LLM-process the data

  • Whether to blend both approaches

The result? Clean data with minimal user intervention and no brittle, one-off scripts.
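The per-file decision described above can be pictured as a small router. The Wrangler's actual signals and thresholds are not published, so everything below is hypothetical: `FileProfile`, its two fields, and the 0.9 and 0.3 cut points are invented for illustration. The `chunk_lines` helper shows one common way to prepare text for LLM processing: split it into size-bounded chunks, with a small line overlap so each chunk carries a little context from the one before it. The LLM call itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileProfile:
    """Hypothetical signals about a file; not the Wrangler's real inputs."""
    structured_fraction: float  # share of lines that fit one consistent field layout
    has_free_text: bool         # e.g. PDF prose, notes, or narrative sections


def choose_strategy(profile: FileProfile) -> str:
    if profile.structured_fraction >= 0.9 and not profile.has_free_text:
        return "code"    # a deterministic transformation script is enough
    if profile.structured_fraction < 0.3:
        return "llm"     # chunk the file and extract fields semantically
    return "hybrid"      # script the regular parts, send the rest to an LLM


def chunk_lines(lines, max_chars=2000, overlap=1):
    """Group lines into chunks of about max_chars, repeating `overlap` lines between chunks."""
    chunks, current, size = [], [], 0
    for line in lines:
        if current and size + len(line) > max_chars:
            chunks.append(current)
            # Start the next chunk with the tail of the previous one for context.
            current = current[-overlap:] if overlap else []
            size = sum(len(x) for x in current)
        current.append(line)
        size += len(line)
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(choose_strategy(FileProfile(structured_fraction=0.95, has_free_text=False)))  # code
    print(choose_strategy(FileProfile(structured_fraction=0.6, has_free_text=True)))    # hybrid
    print(chunk_lines(["aaaa", "bbbb", "cccc", "dddd"], max_chars=8, overlap=1))
    # [['aaaa', 'bbbb'], ['bbbb', 'cccc'], ['cccc', 'dddd']]
```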


🧪 Example Scenarios

| Input Example | Wrangler Outcome |
|---|---|
| PDF invoice with embedded product details | Extracted SKUs, quantities, and invoice fields neatly tabulated |
| Excel with broken multi-row headers | Normalized columns, standardized categories, cleaned contact fields |
| Fixed-width files without headers | Parsed structure, inferred schema, aligned with destination table |
| Randomly delimited exports from mainframes | Detected and corrected delimiters, extracted consistent field values |
| Semi-structured CSV with merged fields | Split out embedded values, aligned rows with reference schema |
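Two of the scenarios above, headerless fixed-width files and custom-delimited exports, can be handled by simple structural inference. The sketch below is a hypothetical plain-Python illustration of that kind of logic, not the Wrangler's implementation: it picks the delimiter that occurs the same non-zero number of times on every row, and infers fixed-width columns from character positions that are blank on every line. The candidate delimiter list is an assumption, and the consistency check only handles a delimiter that is uniform within a file, not one that changes from row to row.

```python
CANDIDATE_DELIMITERS = ("|", ";", "\t", ",", "~", "^")


def detect_delimiter(lines, candidates=CANDIDATE_DELIMITERS):
    """Return the candidate that appears the same non-zero number of times on every row."""
    rows = [line for line in lines if line.strip()]
    best, best_count = None, 0
    for delim in candidates:
        counts = {line.count(delim) for line in rows}
        if len(counts) == 1:
            count = counts.pop()
            # Prefer the delimiter that splits each row into the most fields.
            if count > best_count:
                best, best_count = delim, count
    return best


def infer_fixed_width_spans(lines):
    """Infer (start, end) column spans from positions that are blank on every line."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    is_gutter = [all(line[i] == " " for line in padded) for i in range(width)]
    spans, start = [], None
    for i, gutter in enumerate(is_gutter):
        if not gutter and start is None:
            start = i                  # a column begins
        elif gutter and start is not None:
            spans.append((start, i))   # the column ends at the first gutter position
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def parse_fixed_width(lines, spans):
    """Slice each line by the inferred spans and trim padding."""
    return [[line[s:e].strip() for s, e in spans] for line in lines]


if __name__ == "__main__":
    print(detect_delimiter(["id~name~qty", "1~Widget~5", "2~Gadget, large~7"]))  # ~
    report = ["ACME      WIDGET    0005",
              "GLOBEX    GADGET    0012"]
    spans = infer_fixed_width_spans(report)  # [(0, 6), (10, 16), (20, 24)]
    print(parse_fixed_width(report, spans))
    # [['ACME', 'WIDGET', '0005'], ['GLOBEX', 'GADGET', '0012']]
```

A gutter-based split can break a value that contains a space at the same position on every line, so inferred spans are worth checking against the destination table before loading.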


✅ Summary

The Osmos AI Data Wrangler turns file chaos into data clarity. You don't need to write rules or templates to get clean, structured data fast. With built-in schema alignment, intelligent transformation decisions, and a review-first workflow, it's the easiest way to tame messy files at scale.


Last updated 12 days ago
