# Document Import

## Overview

GoFigr's Document Import feature extracts figures from existing PowerPoint presentations and Word documents and adds them to your managed figure library. Whether you have years of presentations to organize or want to track figures from collaborators' documents, GoFigr extracts the images automatically, generates descriptive titles using AI, and links each figure back to its source document.

## Key Benefits for Users

### Rescue Figures from the Slide Deck Graveyard

* **Extract All Images**: Every image in your PowerPoint or Word document is automatically extracted
* **Organize Legacy Content**: Transform scattered presentations into a searchable figure library
* **No Manual Work**: Skip the tedious process of right-click-save-as for every figure

### AI-Powered Intelligence

* **Smart Figure Titles**: AI analyzes slide content and context to generate meaningful figure names
* **Context-Aware Naming**: Titles reflect what's on the slide, not generic names like "image1.png"
* **QR-Code Matching**: QR codes containing GoFigr UUIDs are detected and used to match imported images to existing tracked figures

### Full Document Provenance

* **Source Linking**: Every extracted figure links back to its source document
* **Slide/Page Context**: Know exactly which slide or page each figure came from
* **Duplicate Detection**: Identical figures are recognized and not duplicated in your library

## How It Works

### Importing PowerPoint Files (.pptx)

#### Step 1: Upload Your Presentation

1. Navigate to the Import page
2. Select your workspace
3. Drag and drop or browse to select your .pptx file
4. Click "Upload"

#### Step 2: Automatic Processing

GoFigr processes your presentation:

1. **Document Storage**: The full presentation is stored as an asset
2. **Slide Scanning**: Each slide is examined for images
3. **Image Extraction**: All images are extracted with metadata
4. **AI Title Generation**: Slide text is used to generate descriptive figure titles
5. **UUID Detection**: QR codes are scanned for GoFigr figure UUIDs
6. **Figure Creation**: Each image becomes a tracked figure in your library
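The image extraction step can be approximated with the Python standard library alone, because a .pptx file is a ZIP package in which embedded images are conventionally stored under `ppt/media/`. The sketch below is not GoFigr's implementation (the Technical Details section notes that python-pptx is used); it swaps in `zipfile` to stay dependency-free. The names `extract_pptx_media` and `IMAGE_EXTENSIONS` are illustrative, and this approach does not recover slide or shape positions.

```python
import io
import posixpath
import zipfile

# Extensions corresponding to the image formats listed on this page.
IMAGE_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".tif", ".tiff", ".bmp", ".wmf", ".emf",
}


def extract_pptx_media(pptx_bytes: bytes) -> dict[str, bytes]:
    """Return {package path: image bytes} for images stored under ppt/media/."""
    images = {}
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as package:
        for name in package.namelist():
            extension = posixpath.splitext(name)[1].lower()
            if name.startswith("ppt/media/") and extension in IMAGE_EXTENSIONS:
                images[name] = package.read(name)
    return images
```

Reading the package directly is a quick way to check how many images a presentation contains before uploading it; mapping each image to its slide would require walking the slide shapes, for example with python-pptx.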

#### Step 3: Review Results

* View the extracted figures in your workspace
* Each figure shows its source presentation
* Click through to the original slide location
* Edit titles if the AI suggestions need refinement

### Importing Word Documents (.docx)

#### Step 1: Upload Your Document

1. Navigate to the Import page
2. Select your workspace
3. Upload your .docx file
4. Processing begins automatically

#### Step 2: Automatic Processing

GoFigr processes your document:

1. **Document Storage**: The Word file is stored as an asset
2. **Structure Analysis**: Document hierarchy is traversed
3. **Image Extraction**: All embedded images are extracted
4. **Context Capture**: Surrounding text is used for AI title generation
5. **Figure Creation**: Each image becomes a tracked figure
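To illustrate the context capture step, the sketch below pairs each embedded image reference in a Word document with the text of the neighbouring paragraphs. It parses the raw `word/document.xml` part with the standard library rather than python-docx, and the "previous and next paragraph" heuristic is an assumption made for the example, not a description of what GoFigr actually captures. The function name `images_with_context` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard Office Open XML namespaces, in ElementTree's "{uri}" form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"


def images_with_context(document_xml: bytes) -> list[dict]:
    """Pair each image reference with the text of nearby paragraphs."""
    root = ET.fromstring(document_xml)
    paragraphs = list(root.iter(f"{W}p"))
    # Plain text of each paragraph, built from its w:t runs.
    texts = ["".join(t.text or "" for t in p.iter(f"{W}t")) for p in paragraphs]
    results = []
    for index, paragraph in enumerate(paragraphs):
        for blip in paragraph.iter(f"{A}blip"):
            before = texts[index - 1] if index > 0 else ""
            after = texts[index + 1] if index + 1 < len(texts) else ""
            nearby = (before, texts[index], after)
            results.append({
                # Relationship id that points at the image part in the package.
                "relationship_id": blip.get(f"{R}embed"),
                "context": " ".join(part for part in nearby if part),
            })
    return results
```

The returned `context` string is the kind of input a title generator could use; captions placed directly below a figure are picked up by the "next paragraph" lookup.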

#### Step 3: Review and Organize

* Extracted figures appear in your workspace
* Each figure is linked to the source Word document
* Figures are grouped under an analysis named after the document

## AI-Powered Title Generation

### How It Works

When AI title generation is enabled:

1. **Context Extraction**: For PowerPoint, the slide title and body text are captured; for Word, the text surrounding each image is captured
2. **Semantic Analysis**: The AI model infers what the slide or page is about
3. **Relevant Naming**: Titles describe the figure's content instead of using generic identifiers
4. **Batch Processing**: Multiple titles are generated in a single AI call
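The batching idea in the last step can be sketched as follows: several figure contexts are combined into one numbered prompt, and the reply is parsed back into one title per figure. The prompt wording, the JSON-array reply format, and the function names `build_batch_prompt` and `parse_batch_titles` are all assumptions made for illustration; the actual request GoFigr sends to its AI provider is not documented here, and no model call is made in this example.

```python
import json


def build_batch_prompt(contexts: list[str]) -> str:
    """Combine several figure contexts into a single numbered prompt."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(contexts, start=1))
    return (
        "Suggest one short, descriptive figure title for each numbered context. "
        "Reply with a JSON array of strings, one per context, in order.\n"
        + numbered
    )


def parse_batch_titles(response_text: str, expected: int) -> list[str]:
    """Parse the model reply and check that it has one title per context."""
    titles = json.loads(response_text)
    if not isinstance(titles, list) or len(titles) != expected:
        raise ValueError(f"expected a JSON array of {expected} titles")
    return [str(title).strip() for title in titles]
```

Validating the count on the way back matters in a batched design, because a reply with a missing or extra entry would otherwise shift every later title onto the wrong figure.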

### Examples

| Generic Name       | AI-Generated Title                       |
| ------------------ | ---------------------------------------- |
| image1.png         | "Survival curves by treatment group"     |
| Picture 3          | "Gene expression heatmap - top 50 genes" |
| Slide4\_shape2.png | "ROC curve comparison - Model A vs B"    |

### When AI Naming Helps Most

* Presentations with descriptive slide titles
* Documents with figure captions
* Scientific figures with contextual text nearby

## QR Code and UUID Detection

### Automatic Figure Matching

If your figures contain GoFigr QR codes:

1. **QR Scanning**: Images are scanned for QR codes
2. **UUID Extraction**: GoFigr UUIDs are extracted from detected codes
3. **Revision Matching**: UUIDs are matched against existing figure revisions
4. **Deduplication**: Matched figures link to existing revisions instead of creating duplicates
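Once a QR code has been decoded to text, the UUID extraction and revision matching steps reduce to a pattern search and a lookup. The sketch below assumes the decoded payload is a string that contains a UUID somewhere in it (for example, inside a URL); the actual payload format of GoFigr QR codes is not specified on this page, and `extract_uuid`, `match_revision`, and the `known_revisions` mapping are hypothetical names. Decoding the QR image itself requires a third-party library and is left out.

```python
import re
import uuid
from typing import Optional

# Canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def extract_uuid(payload: str) -> Optional[uuid.UUID]:
    """Return the first UUID found in a decoded QR payload, if any."""
    match = UUID_PATTERN.search(payload)
    return uuid.UUID(match.group(0)) if match else None


def match_revision(payload: str, known_revisions: dict[uuid.UUID, str]) -> Optional[str]:
    """Look up an existing revision by the UUID embedded in the payload."""
    found = extract_uuid(payload)
    return known_revisions.get(found) if found is not None else None
```

When `match_revision` returns `None`, the image would be treated as new; otherwise the import can link to the existing revision instead of creating a duplicate.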

### When This Helps

* Re-importing presentations that contain tracked GoFigr figures
* Maintaining a single source of truth for each figure
* Preserving figure history across document versions

## Source Document Linking

### Bidirectional Connections

Every imported figure maintains links to its source:

**From Figure View:**

* "Source: Q4\_Results.pptx" with clickable link
* "Slide 7, Shape 3" position metadata
* Direct navigation to the document

**From Document View:**

* List of all figures extracted from this document
* Thumbnails with links to full figure views
* Extraction status and metadata

### Document Preview

Imported documents are viewable within GoFigr:

* **PowerPoint Preview**: Navigate through slides
* **Word Preview**: Scroll through document content
* **Specialized Views**: Optimized rendering for each format

## Import Metadata

### What's Captured

For each imported figure:

| Metadata         | Description                     |
| ---------------- | ------------------------------- |
| Source Type      | "powerpoint" or "word"          |
| File Name        | Original document filename      |
| File Size        | Document size in bytes          |
| Slide/Page Index | Position in the document        |
| Shape Index      | Which shape on the slide (PPT)  |
| Surrounding Text | Text context used for AI naming |
| Import Timestamp | When the import occurred        |
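As a rough illustration, the fields in the table above could be modelled as a small record type. The field names and types below are assumptions chosen to mirror the table; they are not GoFigr's actual schema or API.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ImportMetadata:
    """Hypothetical record mirroring the metadata table."""
    source_type: str            # "powerpoint" or "word"
    file_name: str              # original document filename
    file_size: int              # document size in bytes
    position_index: int         # slide or page index within the document
    shape_index: Optional[int]  # shape on the slide; None for Word documents
    surrounding_text: str       # text context used for AI naming
    imported_at: datetime       # when the import occurred

    def __post_init__(self) -> None:
        if self.source_type not in ("powerpoint", "word"):
            raise ValueError(f"unknown source type: {self.source_type!r}")


record = ImportMetadata(
    source_type="powerpoint",
    file_name="Q4_Results.pptx",
    file_size=1_048_576,
    position_index=7,
    shape_index=3,
    surrounding_text="Q4 revenue by region",
    imported_at=datetime(2024, 1, 15, tzinfo=timezone.utc),
)
```

Calling `asdict(record)` yields a plain dictionary, which is convenient for searching or filtering by source document and import date.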

### Using Metadata

* Search for figures by source document
* Filter by import date
* Track provenance for compliance

## Handling Duplicates

### Hash-Based Detection

GoFigr uses content hashing to detect duplicates:

1. Each image's content is hashed
2. Hash is compared against existing figures in the workspace
3. If the hash matches, the existing figure is reused
4. If the hash is new, a new figure revision is created
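The steps above can be sketched with a dictionary keyed by content hash. SHA-256 is used here as an assumption, since the page does not say which hash function GoFigr uses; `content_hash`, `import_image`, and the in-memory `index` are illustrative names standing in for a workspace-level lookup.

```python
import hashlib


def content_hash(image_bytes: bytes) -> str:
    """Hash the raw image content (SHA-256 is an assumption)."""
    return hashlib.sha256(image_bytes).hexdigest()


def import_image(image_bytes: bytes, index: dict[str, str], new_id: str) -> tuple[str, bool]:
    """Return (figure id, created) and reuse the existing id on a hash match."""
    digest = content_hash(image_bytes)
    if digest in index:
        return index[digest], False  # identical content already imported
    index[digest] = new_id
    return new_id, True  # new content, so a new entry is recorded


index: dict[str, str] = {}
first = import_image(b"same pixels", index, "fig-1")
second = import_image(b"same pixels", index, "fig-2")
```

Because the hash covers the exact bytes, two images that look the same but were re-encoded or resized would not match under this scheme.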

### UUID-Based Detection

For GoFigr-watermarked figures:

1. QR codes are scanned for UUIDs
2. UUIDs identify specific figure revisions
3. A matching UUID links the import to the existing revision
4. No duplicate figures are created

### Benefits

* Clean, deduplicated figure library
* Single source of truth for each figure
* History preserved across imports

## How to Access

### Via the Import Page

1. Click "Import" in the main navigation
2. Select your workspace
3. Choose "Upload Files"
4. Drop your .pptx or .docx files
5. Monitor progress in the task modal

### Via Drag and Drop

1. Navigate to your workspace view
2. Drag documents directly onto the page
3. Import processing begins automatically

### Supported Formats

| Format            | Extension | Notes                   |
| ----------------- | --------- | ----------------------- |
| PowerPoint        | .pptx     | Modern XML format       |
| Word              | .docx     | Modern XML format       |
| Legacy PowerPoint | .ppt      | Limited support         |
| Legacy Word       | .doc      | Not currently supported |
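If you pre-screen files in a script before uploading, the table above maps naturally onto a lookup by file extension. This is a client-side convenience sketch only; `support_level` and the level labels are hypothetical and do not correspond to a GoFigr API.

```python
from pathlib import PurePath

# Support levels taken from the table above.
SUPPORT_BY_EXTENSION = {
    ".pptx": "supported",
    ".docx": "supported",
    ".ppt": "limited",
    ".doc": "unsupported",
}


def support_level(filename: str) -> str:
    """Classify a file by extension; anything not in the table is 'unlisted'."""
    extension = PurePath(filename).suffix.lower()
    return SUPPORT_BY_EXTENSION.get(extension, "unlisted")
```

For example, `support_level("Deck.PPTX")` returns `"supported"`, while `support_level("old.doc")` returns `"unsupported"`.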

## Best Practices

### Before Importing

1. **Use Modern Formats**: Convert .ppt to .pptx and .doc to .docx before importing; .ppt has only limited support and .doc is not currently supported
2. **Clean Up Decorative Images**: Remove logos, backgrounds, and non-figure images that you don't want tracked
3. **Add Descriptive Slide Titles**: More descriptive slide titles yield better AI-generated figure names

### After Importing

1. **Review AI Titles**: Check and edit any titles that need refinement
2. **Organize into Analyses**: Group related figures if they span multiple documents
3. **Set Up Tracking**: Enable figure tracking for ongoing updates

### For Large Presentations

1. **Import in Batches**: Break very large presentations into smaller files if needed
2. **Monitor Progress**: Use the task modal to track import status
3. **Check Results**: Review extracted figures for completeness

## Technical Details

### Image Extraction

* **PowerPoint**: Uses python-pptx to access slide shapes and embedded images
* **Word**: Uses python-docx to traverse document structure and extract images
* **Formats Supported**: PNG, JPEG, GIF, TIFF, BMP, WMF, EMF

### AI Integration

* Powered by Amazon Bedrock
* Uses slide/document context for intelligent naming
* Respects AI quotas and rate limits

### Storage

* Original documents stored as assets
* Extracted images stored as figure revisions
* Full provenance chain maintained

### Processing

* Asynchronous processing via task queue
* Progress tracking via WebSocket updates
* Error recovery for partial failures
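The idea behind error recovery for partial failures is that one bad image should not abort the whole import. The sketch below shows that pattern in simplified, synchronous form: each item is handled independently, failures are recorded, and a progress callback is invoked after every item. It does not model the task queue or WebSocket layer, and `process_all`, `handler`, and `on_progress` are hypothetical names.

```python
from typing import Callable


def process_all(
    items: list[str],
    handler: Callable[[str], str],
    on_progress: Callable[[int, int], None],
) -> tuple[dict[str, str], dict[str, str]]:
    """Process every item, recording failures instead of stopping at the first."""
    succeeded: dict[str, str] = {}
    failed: dict[str, str] = {}
    for done, item in enumerate(items, start=1):
        try:
            succeeded[item] = handler(item)
        except Exception as error:  # isolate the failure to this item
            failed[item] = str(error)
        on_progress(done, len(items))  # e.g. forwarded to a progress display
    return succeeded, failed
```

Returning the failures alongside the successes makes it possible to report exactly which images were skipped, which supports the "Check Results" practice described earlier.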

