Document Import
Overview
GoFigr's Document Import feature lets you extract figures from existing PowerPoint presentations and Word documents, bringing them into your managed figure library. Whether you have years of presentations to organize or want to track figures from collaborators' documents, GoFigr automatically extracts images, generates intelligent titles using AI, and links everything back to the source document.
Key Benefits for Users
Rescue Figures from the Slide Deck Graveyard
Extract All Images: Every image in your PowerPoint or Word document is automatically extracted
Organize Legacy Content: Transform scattered presentations into a searchable figure library
No Manual Work: Skip the tedious process of right-click-save-as for every figure
AI-Powered Intelligence
Smart Figure Titles: AI analyzes slide content and context to generate meaningful figure names
Context-Aware Naming: Titles reflect what's on the slide, not generic names like "image1.png"
QR-Based Matching: QR codes carrying GoFigr UUIDs are detected and used to match imported images to figures that are already tracked
Full Document Provenance
Source Linking: Every extracted figure links back to its source document
Slide/Page Context: Know exactly which slide or page each figure came from
Duplicate Detection: Identical figures are recognized and not duplicated in your library
How It Works
Importing PowerPoint Files (.pptx)
Step 1: Upload Your Presentation
Navigate to the Import page
Select your workspace
Drag and drop or browse to select your .pptx file
Click "Upload"
Step 2: Automatic Processing
GoFigr processes your presentation:
Document Storage: The full presentation is stored as an asset
Slide Scanning: Each slide is examined for images
Image Extraction: All images are extracted with metadata
AI Title Generation: Slide text is used to generate descriptive figure titles
UUID Detection: QR codes are scanned for GoFigr figure UUIDs
Figure Creation: Each image becomes a tracked figure in your library
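The image extraction step above can be approximated with the Python standard library alone, because a .pptx file is a ZIP package that keeps embedded images under ppt/media/. This is a minimal sketch under that assumption, not GoFigr's implementation (the Technical Details section states that GoFigr uses python-pptx). The function name extract_pptx_images is hypothetical, and the sketch does not recover which slide or shape an image belongs to.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Extensions for the image formats listed under "Formats Supported".
IMAGE_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".tif", ".tiff", ".bmp", ".wmf", ".emf",
}


def extract_pptx_images(pptx_bytes: bytes) -> dict:
    """Return {file name: image bytes} for every image stored under ppt/media/."""
    images = {}
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as package:
        for name in package.namelist():
            path = PurePosixPath(name)
            in_media_folder = path.parent == PurePosixPath("ppt/media")
            if in_media_folder and path.suffix.lower() in IMAGE_EXTENSIONS:
                images[path.name] = package.read(name)
    return images
```

Reading the package directly returns each stored image once, even if several slides reuse it. Mapping images back to slides would additionally require reading the slide relationship files, which this sketch leaves out.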
Step 3: Review Results
View the extracted figures in your workspace
Each figure shows its source presentation
Click through to the original slide location
Edit titles if the AI suggestions need refinement
Importing Word Documents (.docx)
Step 1: Upload Your Document
Navigate to the Import page
Select your workspace
Upload your .docx file
Processing begins automatically
Step 2: Automatic Processing
GoFigr processes your document:
Document Storage: The Word file is stored as an asset
Structure Analysis: Document hierarchy is traversed
Image Extraction: All embedded images are extracted
Context Capture: Surrounding text is used for AI title generation
Figure Creation: Each image becomes a tracked figure
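The context capture step above can be illustrated by reading word/document.xml (the main part of a .docx package) with the standard library. The sketch pairs each embedded image reference with the text of the paragraphs around it. It is an assumption-laden illustration rather than GoFigr's implementation: the function names are hypothetical, the one-paragraph window on each side is an arbitrary choice, and only top-level body paragraphs are considered (images inside tables are ignored).

```python
import xml.etree.ElementTree as ET

# Standard Office Open XML namespace URIs.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def paragraph_text(paragraph: ET.Element) -> str:
    """Concatenate the text runs (w:t elements) of one paragraph."""
    return "".join(t.text or "" for t in paragraph.iter(f"{{{W}}}t"))


def images_with_context(document_xml: str) -> list:
    """Return (relationship id, nearby text) for each embedded image."""
    root = ET.fromstring(document_xml)
    body = root.find(f"{{{W}}}body")
    paragraphs = [] if body is None else body.findall(f"{{{W}}}p")
    results = []
    for index, paragraph in enumerate(paragraphs):
        for blip in paragraph.iter(f"{{{A}}}blip"):
            rel_id = blip.get(f"{{{R}}}embed")
            if rel_id is None:
                continue  # linked (not embedded) image; skipped in this sketch
            # Context window: previous, current, and next body paragraph.
            window = paragraphs[max(index - 1, 0): index + 2]
            texts = (paragraph_text(p).strip() for p in window)
            results.append((rel_id, " ".join(t for t in texts if t)))
    return results
```

The relationship id (for example "rId5") identifies the image file through the package's relationship part, so the captured text can be attached to the right extracted image before titles are generated.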
Step 3: Review and Organize
Extracted figures appear in your workspace
Each figure is linked to the source Word document
Figures are grouped under an analysis named after the document
AI-Powered Title Generation
How It Works
When AI title generation is enabled:
Context Extraction: For PowerPoint, the slide title and body text are captured; for Word, the text surrounding each image is captured
Semantic Analysis: The AI model uses the captured text to infer what the slide or page is about
Relevant Naming: Titles describe the figure's content, not generic identifiers
Batch Processing: Multiple titles generated efficiently in a single AI call
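The batching idea above, one AI call covering many images, can be sketched as a prompt builder plus a defensive parser for the reply. Everything here is hypothetical: the prompt wording, the JSON reply format, and the function names are assumptions for illustration, and the actual model call (the document says GoFigr uses Amazon Bedrock) is deliberately left out.

```python
import json


def build_batch_prompt(contexts: dict) -> str:
    """Build one prompt that requests a title for every image id at once."""
    payload = json.dumps(contexts, indent=2, sort_keys=True)
    return (
        "For each image id below, propose a short, descriptive figure title "
        "based on the accompanying slide or page text. "
        "Reply with a JSON object mapping each image id to its title.\n\n"
        + payload
    )


def parse_batch_titles(response_text: str, contexts: dict) -> dict:
    """Keep titles for known ids only; fall back to the id if none is usable."""
    try:
        parsed = json.loads(response_text)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    titles = {}
    for image_id in contexts:
        title = parsed.get(image_id)
        usable = isinstance(title, str) and title.strip()
        titles[image_id] = title.strip() if usable else image_id
    return titles
```

Falling back to the original identifier means a malformed or partial reply never blocks the import; the affected figures simply keep their generic names until someone edits them.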
Examples
image1.png becomes "Survival curves by treatment group"
Picture 3 becomes "Gene expression heatmap - top 50 genes"
Slide4_shape2.png becomes "ROC curve comparison - Model A vs B"
When AI Naming Helps Most
Presentations with descriptive slide titles
Documents with figure captions
Scientific figures with contextual text nearby
QR Code and UUID Detection
Automatic Figure Matching
If your figures contain GoFigr QR codes:
QR Scanning: Images are scanned for QR codes
UUID Extraction: GoFigr UUIDs are extracted from detected codes
Revision Matching: UUIDs are matched against existing figure revisions
Deduplication: Matched figures link to existing revisions instead of creating duplicates
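The UUID extraction and revision matching steps above can be sketched as follows. The sketch assumes the QR code has already been decoded to text by some QR library (not shown) and that the payload contains a UUID in the canonical hyphenated form, either on its own or embedded in a longer string such as a URL. The payload format, the function names, and the dictionary of known revisions are all assumptions for illustration.

```python
import re
import uuid
from typing import Optional

# Canonical 8-4-4-4-12 hexadecimal UUID form.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def extract_uuid(qr_payload: str) -> Optional[uuid.UUID]:
    """Return the first UUID found in a decoded QR payload, if any."""
    match = UUID_PATTERN.search(qr_payload)
    return uuid.UUID(match.group(0)) if match else None


def match_revision(qr_payload: str, known_revisions: dict) -> Optional[str]:
    """Look up the existing revision for the payload's UUID, or return None."""
    found = extract_uuid(qr_payload)
    if found is None:
        return None
    return known_revisions.get(found)
```

A return value of None covers both "no UUID in the payload" and "UUID not known in this workspace"; in either case the image would be treated as a new figure.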
When This Helps
Re-importing presentations that contain tracked GoFigr figures
Maintaining a single source of truth for each figure
Preserving figure history across document versions
Source Document Linking
Bidirectional Connections
Every imported figure maintains links to its source:
From Figure View:
"Source: Q4_Results.pptx" with clickable link
"Slide 7, Shape 3" position metadata
Direct navigation to the document
From Document View:
List of all figures extracted from this document
Thumbnails with links to full figure views
Extraction status and metadata
Document Preview
Imported documents are viewable within GoFigr:
PowerPoint Preview: Navigate through slides
Word Preview: Scroll through document content
Specialized Views: Optimized rendering for each format
Import Metadata
What's Captured
For each imported figure:
Source Type: "powerpoint" or "word"
File Name: Original document filename
File Size: Document size in bytes
Slide/Page Index: Position in the document
Shape Index: Which shape on the slide (PowerPoint only)
Surrounding Text: Text context used for AI naming
Import Timestamp: When the import occurred
Using Metadata
Search for figures by source document
Filter by import date
Track provenance for compliance
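One way to picture the captured metadata and the searches above is a small record type with a filter helper. This is a sketch only: the field names mirror the list under "What's Captured", but the Python names, the types, and the filter_imports helper are hypothetical and do not describe GoFigr's actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ImportMetadata:
    source_type: str            # "powerpoint" or "word"
    file_name: str              # original document filename
    file_size: int              # document size in bytes
    slide_or_page_index: int    # position in the document
    shape_index: Optional[int]  # shape on the slide; None for Word
    surrounding_text: str       # text context used for AI naming
    imported_at: datetime       # when the import occurred

    def __post_init__(self) -> None:
        if self.source_type not in ("powerpoint", "word"):
            raise ValueError(f"unexpected source type: {self.source_type!r}")


def filter_imports(
    records: Iterable[ImportMetadata],
    file_name: Optional[str] = None,
    imported_since: Optional[datetime] = None,
) -> List[ImportMetadata]:
    """Select records by source document and/or earliest import time."""
    selected = []
    for record in records:
        if file_name is not None and record.file_name != file_name:
            continue
        if imported_since is not None and record.imported_at < imported_since:
            continue
        selected.append(record)
    return selected
```

For example, filter_imports(records, file_name="Q4_Results.pptx") would return every record extracted from that presentation, which is the kind of lookup a provenance audit needs.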
Handling Duplicates
Hash-Based Detection
GoFigr uses content hashing to detect duplicates:
Each image's content is hashed
Hash is compared against existing figures in the workspace
Matching hash = existing figure is reused
New hash = new figure revision created
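The hash comparison described above can be sketched in a few lines. The document does not say which hash function GoFigr uses, so SHA-256 is an assumption here, as are the function names and the in-memory dictionary standing in for the workspace's figure index.

```python
import hashlib


def content_hash(image_bytes: bytes) -> str:
    """Hash the raw image content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(image_bytes).hexdigest()


def import_image(image_bytes: bytes, figures_by_hash: dict, new_figure_id: str) -> tuple:
    """Return (figure id, created). Reuse the existing figure on a hash match."""
    digest = content_hash(image_bytes)
    existing = figures_by_hash.get(digest)
    if existing is not None:
        return existing, False           # matching hash: reuse
    figures_by_hash[digest] = new_figure_id
    return new_figure_id, True           # new hash: create
```

Because the hash is computed over the exact bytes, this detects only byte-identical images. A re-exported or resized copy of the same plot produces a different hash and would be treated as new.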
UUID-Based Detection
For GoFigr-watermarked figures:
QR codes are scanned for UUIDs
UUIDs identify specific figure revisions
Matching UUID links to existing revision
No duplicate figures created
Benefits
Clean, deduplicated figure library
Single source of truth for each figure
History preserved across imports
How to Access
Via the Import Page
Click "Import" in the main navigation
Select your workspace
Choose "Upload Files"
Drop your .pptx or .docx files
Monitor progress in the task modal
Via Drag and Drop
Navigate to your workspace view
Drag documents directly onto the page
Import processing begins automatically
Supported Formats
PowerPoint (.pptx): Modern XML format
Word (.docx): Modern XML format
Legacy PowerPoint (.ppt): Limited support
Legacy Word (.doc): Not currently supported
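A pre-upload check against the format list above might look like the sketch below. The support levels come from this section; the function name, the level strings, and the choice to treat any unlisted extension as unsupported are assumptions.

```python
from pathlib import Path

# Support levels as listed in this section, keyed by lowercase extension.
SUPPORT_LEVELS = {
    ".pptx": "supported",
    ".docx": "supported",
    ".ppt": "limited",
    ".doc": "unsupported",
}


def support_level(filename: str) -> str:
    """Classify a file by extension; unlisted extensions count as unsupported."""
    return SUPPORT_LEVELS.get(Path(filename).suffix.lower(), "unsupported")
```

A script preparing a batch of files could use this to flag .ppt and .doc files for conversion before upload.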
Best Practices
Before Importing
Use Modern Formats: Convert .ppt to .pptx (legacy .ppt has only limited support) and .doc to .docx (legacy .doc is not currently supported)
Clean Up Decorative Images: Remove logos, backgrounds, and non-figure images that you don't want tracked
Add Descriptive Slide Titles: Better slide titles = better AI-generated figure names
After Importing
Review AI Titles: Check and edit any titles that need refinement
Organize into Analyses: Group related figures if they span multiple documents
Set Up Tracking: Enable figure tracking for ongoing updates
For Large Presentations
Import in Batches: Break very large presentations into smaller files if needed
Monitor Progress: Use the task modal to track import status
Check Results: Review extracted figures for completeness
Technical Details
Image Extraction
PowerPoint: Uses python-pptx to access slide shapes and embedded images
Word: Uses python-docx to traverse document structure and extract images
Formats Supported: PNG, JPEG, GIF, TIFF, BMP, WMF, EMF
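Embedded images do not always carry a trustworthy file extension, so an extractor may want to identify the format from the leading bytes. The sketch below checks the well-known signatures of the raster formats listed above. It is an illustration, not a description of how GoFigr identifies formats; the function name is hypothetical, and WMF and EMF are left out because their headers need more than a simple prefix check.

```python
from typing import Optional

# (leading bytes, format name) for the raster formats listed above.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"BM", "BMP"),
]


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return the format name if the data starts with a known signature."""
    for prefix, name in SIGNATURES:
        if data.startswith(prefix):
            return name
    return None
```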
AI Integration
Powered by Amazon Bedrock
Uses slide/document context for intelligent naming
Respects AI quotas and rate limits
Storage
Original documents stored as assets
Extracted images stored as figure revisions
Full provenance chain maintained
Processing
Asynchronous processing via task queue
Progress tracking via WebSocket updates
Error recovery for partial failures
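The processing behavior above (asynchronous work, progress updates, and recovery from partial failures) can be sketched with asyncio. This is a simplified stand-in, not GoFigr's task queue: the function name is hypothetical, items are processed one at a time, and the on_progress callback takes the place of a WebSocket update.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Sequence, Tuple


async def process_items(
    items: Sequence[str],
    handler: Callable[[str], Awaitable[None]],
    on_progress: Callable[[int, int], None],
) -> Tuple[List[str], Dict[str, str]]:
    """Process every item; record failures instead of aborting the import."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for done, item in enumerate(items, start=1):
        try:
            await handler(item)
            succeeded.append(item)
        except Exception as exc:  # one bad image should not stop the rest
            failed[item] = str(exc)
        on_progress(done, len(items))  # stand-in for a WebSocket message
    return succeeded, failed
```

Collecting failures per item lets the caller report which images could not be imported while still keeping everything that succeeded.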