Instructions

What are Instructions?

Instructions provide guardrails that guide the AI, ensuring transformations stay within defined constraints and follow business intent. Users upload one or more files, such as Business Requirements Documents and Information Architecture documentation. The uploaded documentation is used to generate updates to the descriptors that guide the Wrangler.

Enter instructions specific to the Wrangler, such as cleaning rules, formatting guidelines, or preprocessing steps. These apply only to that Wrangler's operations and do not affect other Wranglers tied to the same destination table.

Types of Instructions

There are two types of Instructions in the Wrangler.

  1. Auto-configure Wrangler using the documentation.

    1. Select a folder with documents that define destination data, source data, and how to transform from source to destination. The Wrangler will use this information to create instructions and descriptors for review.

  2. Provide Instructions

    1. Manually enter specific instructions.

Unlike Descriptors, Instructions are scoped to the Wrangler.

What are Descriptors?

Descriptors define and enforce schema-level constraints to ensure structural consistency across datasets. Essentially, they give users a way to describe a field: users can explain details of the column definition, such as acronyms, and share any other relevant information. Field Descriptors are unaware of the source data. They are optional and scoped to a destination table. This means all Wranglers pointing to a table share a standard set of field descriptors.

Users can provide descriptions for column headings when they want to guide the column cleaning to achieve better results. These column header description fields are referred to as column descriptors. Column descriptors are applied during the file review process. Once a column descriptor has been entered, you must save and rerun the file to apply the changes.

Descriptors are scoped to the destination table, not a Wrangler.
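The scoping difference can be pictured with a small Python sketch. This is only an illustrative model, not the product's implementation: the class names `DestinationTable` and `Wrangler`, the table and Wrangler names, and the sample texts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DestinationTable:
    """Hypothetical model of a destination table."""
    name: str
    # Descriptors live on the table: column name -> descriptor text.
    descriptors: dict = field(default_factory=dict)


@dataclass
class Wrangler:
    """Hypothetical model of a Wrangler that points at one table."""
    name: str
    table: DestinationTable
    # Instructions live on the Wrangler itself.
    instructions: list = field(default_factory=list)


orders = DestinationTable("orders")
wrangler_a = Wrangler("vendor_a_feed", orders)
wrangler_b = Wrangler("vendor_b_feed", orders)

# An Instruction added to one Wrangler stays with that Wrangler.
wrangler_a.instructions.append("Trim whitespace from all text fields.")

# A Descriptor added to the table is seen by every Wrangler pointing at it.
orders.descriptors["order_total"] = "Order amount in USD, two decimal places."

print(wrangler_b.instructions)
print(wrangler_b.table.descriptors["order_total"])
```

In this model the second Wrangler has no instructions of its own, yet it reads the same `order_total` descriptor as the first, which is why editing a shared descriptor affects every Wrangler tied to the table.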

Here are suggestions for adding column descriptors to drive the most effective outcomes.

  • Describe valid data for this field.

  • Describe this field’s relationship to other fields in this table.

  • Describe any business rules that govern how this field should be populated.

Step 1: Access the Column Descriptors

  1. When the file is ready, select Ready for Review.

  2. In the review screen, select Retry (note that the default selection is Approve).

  3. A Retry file processing information message will pop up. Select Got It.

Step 2: Adding a Column Descriptor

  1. In the Retry screen, select Add Descriptor, which is located directly below the column header field.

  2. When you select the Descriptor, the instructions box will open on the right.

  3. In the box, describe how your data should be cleaned.

  4. Select Save Descriptor.

Note: The column descriptor will update from Add Descriptor to Edit Descriptor.

Step 3: Applying the Descriptor

  1. To apply the descriptors to the file, select Rerun File in the lower right-hand corner.

Step 4: Editing a Descriptor

  1. Descriptors can be modified by selecting Edit Descriptor.

  2. Update the instructions.

  3. Select Save Descriptor.

Best Practices for Column Descriptors

  1. Define valid data that can be stored in this column.

  2. Do not specify the source data. Focus on the destination column.

  3. Column Descriptors are tied to the destination table. If multiple Wranglers are pointed to the same table, they share the same descriptors.

  4. Provide examples.

  5. Tell us how you want null values handled.

  6. Tell us how you want errors handled, ideally with an example.

  7. Be careful not to create contradictions between the data type and the column descriptor. For instance, if the data type is int32, do not ask to round to zero decimal places.

  8. If information is already in a descriptor, be careful about deleting it, especially if the descriptor is shared across Wranglers. It is best practice to add to the existing text by editing the descriptor rather than deleting it.
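The data type contradiction described in item 7 can be caught with a quick check before saving a descriptor. The sketch below is a rough keyword heuristic written for illustration only; the function name `has_decimal_conflict` and the list of integer type names are assumptions, and nothing here reflects how the Wrangler itself validates descriptors.

```python
import re

# Assumed set of integer type names; adjust to the types your table uses.
INTEGER_TYPES = {"int8", "int16", "int32", "int64"}


def has_decimal_conflict(data_type: str, descriptor: str) -> bool:
    """Return True when an integer column's descriptor mentions rounding or decimals."""
    mentions_decimals = re.search(r"\bdecimals?\b|\bround", descriptor, re.IGNORECASE)
    return data_type.lower() in INTEGER_TYPES and mentions_decimals is not None


# Contradiction: an int32 column cannot hold decimals, so rounding is meaningless.
print(has_decimal_conflict("int32", "Round to zero decimal places."))

# No contradiction: a float column may legitimately be rounded.
print(has_decimal_conflict("float64", "Round to two decimal places."))

# No contradiction: the descriptor says how to handle null, not how to round.
print(has_decimal_conflict("int32", "Whole number of units; use 0 when null."))
```

A keyword match like this will miss differently worded conflicts, so treat it as a prompt to reread the descriptor rather than a guarantee.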
