Document Import

Overview

GoFigr's Document Import feature lets you extract figures from existing PowerPoint presentations and Word documents, bringing them into your managed figure library. Whether you have years of presentations to organize or want to track figures from collaborators' documents, GoFigr automatically extracts images, generates intelligent titles using AI, and links everything back to the source document.

Key Benefits for Users

Rescue Figures from the Slide Deck Graveyard

  • Extract All Images: Every image in your PowerPoint or Word document is automatically extracted

  • Organize Legacy Content: Transform scattered presentations into a searchable figure library

  • No Manual Work: Skip the tedious process of right-click-save-as for every figure

AI-Powered Intelligence

  • Smart Figure Titles: AI analyzes slide content and context to generate meaningful figure names

  • Context-Aware Naming: Titles reflect what's on the slide, not generic names like "image1.png"

  • QR-Based Matching: QR codes carrying GoFigr UUIDs are detected and used to match imported images to figures you already track

Full Document Provenance

  • Source Linking: Every extracted figure links back to its source document

  • Slide/Page Context: Know exactly which slide or page each figure came from

  • Duplicate Detection: Identical figures are recognized and not duplicated in your library

How It Works

Importing PowerPoint Files (.pptx)

Step 1: Upload Your Presentation

  1. Navigate to the Import page

  2. Select your workspace

  3. Drag and drop or browse to select your .pptx file

  4. Click "Upload"

Step 2: Automatic Processing

GoFigr processes your presentation:

  1. Document Storage: The full presentation is stored as an asset

  2. Slide Scanning: Each slide is examined for images

  3. Image Extraction: All images are extracted with metadata

  4. AI Title Generation: Slide text is used to generate descriptive figure titles

  5. UUID Detection: QR codes are scanned for GoFigr figure UUIDs

  6. Figure Creation: Each image becomes a tracked figure in your library
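The slide scanning and image extraction steps above can be illustrated with a minimal sketch. It is not GoFigr's implementation: it swaps in Python's standard-library zipfile and XML parser and relies only on the fact that a .pptx file is a ZIP archive whose slide relationship files point at the embedded media. The function name images_by_slide is hypothetical.

```python
import io
import posixpath
import re
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
IMAGE_REL_SUFFIX = "/relationships/image"


def images_by_slide(pptx_bytes: bytes) -> dict[int, list[tuple[str, bytes]]]:
    """Map slide part number -> [(media path, image bytes)] for one .pptx file.

    Hypothetical helper. The number comes from the slide's file name
    (slide3.xml -> 3), which may differ from the on-screen slide order.
    """
    found: dict[int, list[tuple[str, bytes]]] = {}
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as archive:
        for name in archive.namelist():
            match = re.fullmatch(r"ppt/slides/_rels/slide(\d+)\.xml\.rels", name)
            if not match:
                continue
            slide_number = int(match.group(1))
            root = ET.fromstring(archive.read(name))
            for rel in root.iter(f"{REL_NS}Relationship"):
                if not rel.get("Type", "").endswith(IMAGE_REL_SUFFIX):
                    continue  # layouts, notes, charts, etc.
                if rel.get("TargetMode") == "External":
                    continue  # linked (not embedded) images have no bytes here
                # Targets are relative to ppt/slides/, e.g. "../media/image1.png".
                media_path = posixpath.normpath(
                    posixpath.join("ppt/slides", rel.get("Target", ""))
                )
                found.setdefault(slide_number, []).append(
                    (media_path, archive.read(media_path))
                )
    return found
```

Calling images_by_slide(open("deck.pptx", "rb").read()) on a local file returns one entry per slide that embeds at least one image. Slides without images are omitted.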

Step 3: Review Results

  • View the extracted figures in your workspace

  • Each figure shows its source presentation

  • Click through to the original slide location

  • Edit titles if the AI suggestions need refinement

Importing Word Documents (.docx)

Step 1: Upload Your Document

  1. Navigate to the Import page

  2. Select your workspace

  3. Upload your .docx file

  4. Processing begins automatically

Step 2: Automatic Processing

GoFigr processes your document:

  1. Document Storage: The Word file is stored as an asset

  2. Structure Analysis: Document hierarchy is traversed

  3. Image Extraction: All embedded images are extracted

  4. Context Capture: Surrounding text is used for AI title generation

  5. Figure Creation: Each image becomes a tracked figure
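The context capture step can be sketched as follows. This is an illustration under stated assumptions, not GoFigr's code: it parses the document's main XML part (word/document.xml inside the .docx archive) with the standard library and treats the nearest non-empty paragraph before and after each image as its "surrounding text". The function name image_contexts and the returned keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML / DrawingML namespaces.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"


def image_contexts(document_xml: str) -> list[dict[str, str]]:
    """Return one record per embedded image with its neighbouring text."""
    root = ET.fromstring(document_xml)
    paragraphs = list(root.iter(f"{W}p"))
    # Plain text of each paragraph, joined from its w:t runs.
    texts = ["".join(t.text or "" for t in p.iter(f"{W}t")) for p in paragraphs]

    records = []
    for i, paragraph in enumerate(paragraphs):
        for blip in paragraph.iter(f"{A}blip"):
            before = next((t for t in reversed(texts[:i]) if t.strip()), "")
            after = next((t for t in texts[i + 1:] if t.strip()), "")
            records.append(
                {
                    "rel_id": blip.get(f"{R}embed", ""),  # relationship id of the image
                    "before": before,
                    "after": after,
                }
            )
    return records
```

A caption usually sits directly below its figure, so the "after" text is often the most useful input for title generation. Images nested inside text boxes may be reported more than once by this simple traversal.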

Step 3: Review and Organize

  • Extracted figures appear in your workspace

  • Each figure is linked to the source Word document

  • Figures are grouped into an analysis named after the document

AI-Powered Title Generation

How It Works

When AI title generation is enabled:

  1. Context Extraction: For PowerPoint, slide title and body text are captured; for Word, the text surrounding each image is captured

  2. Semantic Analysis: AI understands what the slide/page is about

  3. Relevant Naming: Titles describe the figure's content, not generic identifiers

  4. Batch Processing: Multiple titles generated efficiently in a single AI call
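The batching idea in step 4 can be sketched as below. The prompt wording, the JSON reply format, and both function names are assumptions made for illustration. The actual model call is deliberately left out, and the sketch falls back to the original generic names whenever the reply cannot be used.

```python
import json


def build_batch_prompt(contexts: list[str]) -> str:
    """Combine the context of several figures into a single prompt."""
    lines = [
        "Suggest a short, descriptive title for each figure below.",
        "Reply with a JSON array of strings, one title per figure, in order.",
        "",
    ]
    for number, context in enumerate(contexts, start=1):
        lines.append(f"Figure {number} context: {context}")
    return "\n".join(lines)


def parse_batch_reply(reply: str, fallbacks: list[str]) -> list[str]:
    """Turn the model reply into one title per figure.

    Falls back to the generic name for any missing or empty title, and to
    all generic names if the reply is not a JSON array of the right length.
    """
    try:
        titles = json.loads(reply)
    except json.JSONDecodeError:
        return list(fallbacks)
    if not isinstance(titles, list) or len(titles) != len(fallbacks):
        return list(fallbacks)
    return [
        title.strip() if isinstance(title, str) and title.strip() else fallback
        for title, fallback in zip(titles, fallbacks)
    ]
```

Sending one prompt for many figures keeps the number of AI calls low, which matters when quotas and rate limits apply.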

Examples

Generic Name | AI-Generated Title
image1.png | "Survival curves by treatment group"
Picture 3 | "Gene expression heatmap - top 50 genes"
Slide4_shape2.png | "ROC curve comparison - Model A vs B"

When AI Naming Helps Most

  • Presentations with descriptive slide titles

  • Documents with figure captions

  • Scientific figures with contextual text nearby

QR Code and UUID Detection

Automatic Figure Matching

If your figures contain GoFigr QR codes:

  1. QR Scanning: Images are scanned for QR codes

  2. UUID Extraction: GoFigr UUIDs are extracted from detected codes

  3. Revision Matching: UUIDs are matched against existing figure revisions

  4. Deduplication: Matched figures link to existing revisions instead of creating duplicates
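Steps 2 to 4 above can be sketched once a QR code has already been decoded to text (decoding itself needs an imaging library and is not shown). The payload format, the function names, and the known_revisions mapping are assumptions for illustration only.

```python
import re
import uuid
from typing import Optional

# Canonical 8-4-4-4-12 hexadecimal UUID form.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def extract_uuid(payload: str) -> Optional[uuid.UUID]:
    """Return the first UUID found in a decoded QR payload, if any."""
    match = UUID_PATTERN.search(payload)
    return uuid.UUID(match.group(0)) if match else None


def match_revision(
    payload: str, known_revisions: dict[uuid.UUID, str]
) -> Optional[str]:
    """Look up an existing revision for the payload.

    known_revisions is a hypothetical mapping of revision UUID -> revision
    name. None means no match, so the image would become a new figure.
    """
    found = extract_uuid(payload)
    return known_revisions.get(found) if found else None
```

Comparing uuid.UUID objects rather than raw strings makes the match insensitive to letter case in the QR payload.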

When This Helps

  • Re-importing presentations that contain tracked GoFigr figures

  • Maintaining a single source of truth for each figure

  • Preserving figure history across document versions

Source Document Linking

Bidirectional Connections

Every imported figure maintains links to its source:

From Figure View:

  • "Source: Q4_Results.pptx" with clickable link

  • "Slide 7, Shape 3" position metadata

  • Direct navigation to the document

From Document View:

  • List of all figures extracted from this document

  • Thumbnails with links to full figure views

  • Extraction status and metadata

Document Preview

Imported documents are viewable within GoFigr:

  • PowerPoint Preview: Navigate through slides

  • Word Preview: Scroll through document content

  • Specialized Views: Optimized rendering for each format

Import Metadata

What's Captured

For each imported figure:

Metadata | Description
Source Type | "powerpoint" or "word"
File Name | Original document filename
File Size | Document size in bytes
Slide/Page Index | Position in the document
Shape Index | Which shape on the slide (PowerPoint only)
Surrounding Text | Text context used for AI naming
Import Timestamp | When the import occurred
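One way to picture the metadata listed above is as a small record type. The class and field names below are hypothetical and chosen only to mirror the entries above; GoFigr's actual schema may differ.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ImportMetadata:
    """Hypothetical per-figure import record mirroring the listed metadata."""

    source_type: str             # "powerpoint" or "word"
    file_name: str               # original document filename
    file_size: int               # document size in bytes
    index: int                   # slide or page position in the document
    shape_index: Optional[int]   # shape on the slide; None for Word
    surrounding_text: str        # text context used for AI naming
    imported_at: datetime        # when the import occurred

    def __post_init__(self) -> None:
        if self.source_type not in ("powerpoint", "word"):
            raise ValueError(f"unknown source type: {self.source_type!r}")


example = ImportMetadata(
    source_type="powerpoint",
    file_name="Q4_Results.pptx",
    file_size=2_048_000,
    index=7,
    shape_index=3,
    surrounding_text="Q4 revenue by region",
    imported_at=datetime(2024, 1, 15, tzinfo=timezone.utc),
)
```

asdict(example) yields a plain dictionary, which is convenient for searching by source document or filtering by import date.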

Using Metadata

  • Search for figures by source document

  • Filter by import date

  • Track provenance for compliance

Handling Duplicates

Hash-Based Detection

GoFigr uses content hashing to detect duplicates:

  1. Each image's content is hashed

  2. Hash is compared against existing figures in the workspace

  3. If the hash matches, the existing figure is reused

  4. If the hash is new, a new figure revision is created
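The hash-based check can be sketched in a few lines. SHA-256 and the in-memory dictionary are assumptions: the text above does not say which hash function or storage GoFigr uses, and register_image is a hypothetical name.

```python
import hashlib


def content_hash(image_bytes: bytes) -> str:
    """Hash the raw image content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(image_bytes).hexdigest()


def register_image(
    image_bytes: bytes, index: dict[str, str], new_figure_id: str
) -> tuple[str, bool]:
    """Return (figure id, created) for an image within one workspace.

    index maps content hash -> figure id and stands in for the workspace's
    existing figures. A known hash reuses the existing id; an unknown hash
    records new_figure_id.
    """
    digest = content_hash(image_bytes)
    if digest in index:
        return index[digest], False
    index[digest] = new_figure_id
    return new_figure_id, True
```

Because the hash covers the exact bytes, only byte-identical images are treated as duplicates. A re-saved or resized copy of the same figure produces a different hash.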

UUID-Based Detection

For GoFigr-watermarked figures:

  1. QR codes are scanned for UUIDs

  2. UUIDs identify specific figure revisions

  3. Matching UUID links to existing revision

  4. No duplicate figures created

Benefits

  • Clean, deduplicated figure library

  • Single source of truth for each figure

  • History preserved across imports

How to Access

Via the Import Page

  1. Click "Import" in the main navigation

  2. Select your workspace

  3. Choose "Upload Files"

  4. Drop your .pptx or .docx files

  5. Monitor progress in the task modal

Via Drag and Drop

  1. Navigate to your workspace view

  2. Drag documents directly onto the page

  3. Import processing begins automatically

Supported Formats

Format | Extension | Notes
PowerPoint | .pptx | Modern XML format
Word | .docx | Modern XML format
Legacy PowerPoint | .ppt | Limited support
Legacy Word | .doc | Not currently supported
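If you script uploads, a pre-flight check against the formats above can save a failed import. The sketch below is a hypothetical client-side helper, not part of GoFigr. The three level names are invented for illustration and simply encode the formats listed above.

```python
from pathlib import PurePath

# Support levels taken from the formats listed above.
SUPPORT_LEVELS = {
    ".pptx": "supported",
    ".docx": "supported",
    ".ppt": "limited",
    ".doc": "unsupported",
}


def support_level(filename: str) -> str:
    """Classify a file by extension, ignoring letter case."""
    suffix = PurePath(filename).suffix.lower()
    return SUPPORT_LEVELS.get(suffix, "unsupported")
```

For example, support_level("old_deck.ppt") returns "limited", a hint to convert the file to .pptx before importing.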

Best Practices

Before Importing

  1. Use Modern Formats: Convert .ppt to .pptx and .doc to .docx for best results

  2. Clean Up Decorative Images: Remove logos, backgrounds, and non-figure images that you don't want tracked

  3. Add Descriptive Slide Titles: Clear slide titles give the AI better context and lead to better figure names

After Importing

  1. Review AI Titles: Check and edit any titles that need refinement

  2. Organize into Analyses: Group related figures if they span multiple documents

  3. Set Up Tracking: Enable figure tracking for ongoing updates

For Large Presentations

  1. Import in Batches: Break very large presentations into smaller files if needed

  2. Monitor Progress: Use the task modal to track import status

  3. Check Results: Review extracted figures for completeness

Technical Details

Image Extraction

  • PowerPoint: Uses python-pptx to access slide shapes and embedded images

  • Word: Uses python-docx to traverse document structure and extract images

  • Formats Supported: PNG, JPEG, GIF, TIFF, BMP, WMF, EMF

AI Integration

  • Powered by Amazon Bedrock

  • Uses slide/document context for intelligent naming

  • Respects AI quotas and rate limits

Storage

  • Original documents stored as assets

  • Extracted images stored as figure revisions

  • Full provenance chain maintained

Processing

  • Asynchronous processing via task queue

  • Progress tracking via WebSocket updates

  • Error recovery for partial failures
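The combination of progress tracking and recovery from partial failures can be sketched as follows. This is a simplified, in-process illustration using asyncio. The task queue and WebSocket transport mentioned above are not shown, and process_all, handler, and on_progress are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


async def process_all(
    items: list[str],
    handler: Callable[[str], Awaitable[str]],
    on_progress: Callable[[int, int], None],
) -> tuple[list[str], dict[str, str]]:
    """Process every item, recording failures instead of aborting.

    Returns (successful results, {failed item: error message}).
    on_progress(done, total) is called after each item; in a real system
    this is where a progress update would be pushed to the client.
    """
    succeeded: list[str] = []
    failed: dict[str, str] = {}
    for done, item in enumerate(items, start=1):
        try:
            succeeded.append(await handler(item))
        except Exception as exc:  # one bad image must not stop the import
            failed[item] = str(exc)
        on_progress(done, len(items))
    return succeeded, failed
```

With this shape, a single unreadable image is reported in the failures while the remaining figures are still imported.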
