Git Repository Import

Overview

GoFigr's Git Repository Import feature connects your code repositories to your figure management workflow. Import Jupyter notebooks directly from GitHub, GitLab, Bitbucket, or any Git-compatible host, and GoFigr automatically extracts the figures from every commit on the branches you select, preserving your research history with version tracking and attribution.

Key Benefits for Users

Preserve Your Research History

  • Full Commit History: Import figures from all commits, not just the latest version

  • Version Tracking: Each notebook revision becomes a tracked figure revision in GoFigr

  • Git Metadata Preserved: Branch names, commit hashes, and timestamps are stored with each import

  • Author Attribution: Git commit authors are automatically mapped to GoFigr users

Seamless Integration with Your Workflow

  • No Workflow Changes: Keep using Git as you always have—GoFigr pulls from your repositories

  • Multiple Git Hosts: Support for GitHub, GitLab, Bitbucket, and any standard Git server

  • Both HTTPS and SSH: Use public repos via HTTPS or private repos with SSH key authentication

  • Branch Selection: Choose exactly which branches to import

Automatic Figure Extraction

  • Jupyter Notebook Processing: All output cells with images are automatically extracted

  • Code Association: Each figure is linked to the code cell that generated it

  • Source Document Linking: Figures are connected back to their source notebook files

  • Intelligent Deduplication: Identical figures are detected and not duplicated
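
One common way to detect identical figures is to compare content hashes of the image bytes. The sketch below illustrates that idea only. It is not GoFigr's actual implementation, and the function name unique_figures and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def unique_figures(images):
    """Return images with byte-identical duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:  # first time this exact content appears
            seen.add(digest)
            unique.append(data)
    return unique
```

A hash comparison like this only catches byte-for-byte duplicates. Two renders of the same plot that differ by a single pixel would count as different figures.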

How It Works

Step 1: Navigate to Import

  1. Go to the Import page from the main navigation

  2. Select your target workspace

  3. Choose the "Git Repository" import option

Step 2: Enter Repository URL

Enter your Git repository URL. GoFigr supports multiple formats:

HTTPS URLs (public repositories):

  • https://github.com/username/repository.git

  • https://gitlab.com/username/repository.git

  • https://bitbucket.org/username/repository.git

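SSH URLs (private repositories):

  • git@github.com:username/repository.git

  • git@gitlab.com:username/repository.git

  • git@bitbucket.org:username/repository.git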

Step 3: Configure SSH Key (for Private Repos)

For SSH-based URLs to private repositories:

  1. The SSH Key selector appears automatically

  2. Choose an existing SSH key or add a new one

  3. SSH keys are stored encrypted and used securely for authentication

  4. Click "Manage SSH Keys" to add, view, or remove keys

Note: SSH keys are not required for HTTPS URLs to public repositories.
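
To make the HTTPS versus SSH distinction concrete, here is a rough sketch of how a repository URL could be classified before deciding whether to show the SSH Key selector. The function classify_repo_url is hypothetical, and the rules are a plausible reading of standard Git URL syntax rather than GoFigr's documented logic.

```python
from urllib.parse import urlparse


def classify_repo_url(url: str) -> str:
    """Return "https", "ssh", or "other" for a Git repository URL."""
    scheme = urlparse(url).scheme.lower()
    if scheme == "https":
        return "https"
    if scheme == "ssh":
        return "ssh"
    # scp-like syntax that Git accepts for SSH: user@host:path
    host, sep, path = url.partition(":")
    if sep and "@" in host and "/" not in host and not path.startswith("//"):
        return "ssh"
    return "other"
```

For example, classify_repo_url("git@example.com:username/repository.git") returns "ssh", while a URL starting with https:// returns "https".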

Step 4: Select Branches

Once the repository is validated:

  1. GoFigr automatically fetches available branches

  2. Main/master branches are selected by default if present

  3. Use the multi-select dropdown to add or remove branches

  4. Each selected branch will be scanned for notebooks

Step 5: Start Import

Click "Import from Git" to begin:

  1. GoFigr clones the repository to a secure temporary location

  2. All commits in selected branches are scanned for .ipynb files

  3. Each notebook file at each commit is processed

  4. Figures are extracted from cell outputs

  5. Progress is displayed in real-time
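
The scan in steps 2 and 3 can be approximated with standard Git commands. The sketch below assumes an already cloned repository and uses git log and git show through subprocess. The helper names (notebook_commits, parse_log, notebook_at_commit) are hypothetical, and the exact commands GoFigr runs are not documented here.

```python
import subprocess

# Commit hash, author email, and ISO 8601 author date, separated by tabs.
LOG_FORMAT = "%H%x09%ae%x09%aI"


def parse_log(output: str):
    """Turn tab-separated `git log` lines into a list of dicts."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        commit, email, date = line.split("\t")
        commits.append({"commit": commit, "email": email, "date": date})
    return commits


def notebook_commits(repo_dir: str, branch: str):
    """List commits on `branch` that touched any .ipynb file, oldest first."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--reverse",
         f"--format={LOG_FORMAT}", branch, "--", "*.ipynb"],
        check=True, capture_output=True, text=True,
    )
    return parse_log(result.stdout)


def notebook_at_commit(repo_dir: str, commit: str, path: str) -> bytes:
    """Read one notebook file exactly as it was at the given commit."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "show", f"{commit}:{path}"],
        check=True, capture_output=True,
    )
    return result.stdout
```

Reading files with git show avoids checking out each commit, so the working tree of the temporary clone never has to change.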

Step 6: Monitor Progress

The import progress modal shows:

  • Overall Progress: Percentage complete across all files

  • Current File: Which notebook is being processed

  • Branch Progress: Which branch/commit is being scanned

  • Log Messages: Detailed status updates

  • Cancel Option: Stop the import at any time if needed

What Gets Imported

From Each Notebook

  • All Figure Outputs: PNG, JPEG, SVG, and other image outputs from cells

  • Cell Code: The code that generated each figure is preserved

  • Notebook Metadata: Kernel info, notebook version, and custom metadata

From Git History

  • Commit Timestamps: Each figure revision uses the original Git commit time

  • Author Information: Commit authors are mapped to GoFigr users by email

  • Branch Context: Which branch each version came from

  • Commit Hash: Links back to the exact commit for provenance

How Figures Are Organized

  1. One Analysis per Notebook Path: Notebooks with the same path share an analysis

  2. One Figure per Output: Each distinct figure in the notebook becomes a GoFigr figure

  3. Revisions by Commit: Different commits create different figure revisions

  4. Source Linking: Each figure links back to its source notebook asset
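
The hierarchy above (analysis, then figure, then revision) can be pictured as nested grouping. This sketch is illustrative only: the record keys "path", "figure", and "commit" are hypothetical, and how GoFigr decides that two outputs in different commits are the same figure is not specified here.

```python
from collections import defaultdict


def organize(records):
    """Group extracted outputs as analysis -> figure -> list of revisions.

    Each record is a dict with the hypothetical keys "path", "figure", "commit".
    """
    analyses = defaultdict(lambda: defaultdict(list))
    for rec in records:
        # Same notebook path -> same analysis; each commit adds a revision.
        analyses[rec["path"]][rec["figure"]].append(rec["commit"])
    return {path: dict(figures) for path, figures in analyses.items()}
```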

Author Attribution

GoFigr intelligently maps Git authors to GoFigr users:

Automatic Matching

  • Git commit author emails are matched against GoFigr user emails

  • Matching requires the importing user to have a confirmed email address

  • Matched figures show the original author in GoFigr

When No Match Is Found

  • The "on behalf of" field stores the Git author name and email

  • Full attribution is preserved even without a GoFigr account

  • Future users can claim their figures when they join

Importing User as Fallback

  • If author matching is disabled or fails, the importing user is credited

  • The original Git author info is still stored in metadata
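
The attribution rules above amount to a lookup with a fallback. The following is a minimal sketch under stated assumptions: the function attribute, the users_by_email dictionary (keyed by lower-case email), and the case-insensitive comparison are all hypothetical and do not describe GoFigr's internal API.

```python
def attribute(author_name, author_email, users_by_email, importing_user,
              matching_enabled=True):
    """Decide who is credited for a revision and what to keep as "on behalf of"."""
    if matching_enabled:
        user = users_by_email.get(author_email.strip().lower())
        if user is not None:
            # The Git author has a matching account and is credited directly.
            return {"user": user, "on_behalf_of": None}
    # No match, or matching disabled: credit the importing user and
    # keep the original Git author so attribution is not lost.
    return {
        "user": importing_user,
        "on_behalf_of": {"name": author_name, "email": author_email},
    }
```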

Real-Time Progress Tracking

Progress Indicators

  • File Count: "Processing file 5 of 23"

  • Branch Progress: "Scanning branch: feature/analysis"

  • Commit Info: "Processing commit a1b2c3d..."

  • Detailed Logs: Timestamped log messages for debugging

Error Handling

  • Graceful Failures: Individual file failures don't stop the entire import

  • Error Messages: Clear descriptions of what went wrong

  • Partial Success: Successfully imported figures are kept even if some fail

  • Retry Option: Failed imports can be retried after fixing issues
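
The behavior described above follows a familiar pattern: catch errors per file, keep what succeeded, and report the rest. The sketch below is an assumption about the general shape, not GoFigr's code. import_all and the import_one callback are hypothetical names, and the status values mirror the ones shown in Import History.

```python
def import_all(paths, import_one):
    """Run import_one on every path; collect failures instead of aborting."""
    imported, failed = [], {}
    for path in paths:
        try:
            imported.append(import_one(path))
        except Exception as exc:  # one bad notebook should not stop the rest
            failed[path] = str(exc)
    if not failed:
        status = "success"
    elif imported:
        status = "partial"
    else:
        status = "failed"
    return {"status": status, "imported": imported, "failed": failed}
```

A retry then only needs the keys of the "failed" dictionary, for example import_all(list(result["failed"]), import_one).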

Cancellation

  • Click "Cancel" at any time during import

  • Already-imported figures are preserved

  • The repository clone is cleaned up automatically

Import History

Recent imports are tracked and displayed:

  • Repository URL: Which repository was imported

  • Import Time: When the import occurred

  • Status: Success, partial, or failed

  • Figure Count: How many figures were extracted

Best Practices

Before Importing

  1. Clean Up Notebooks: Clear unnecessary output cells to reduce processing time

  2. Organize by Project: Use one repository per logical project for cleaner organization

  3. Tag Important Commits: Consider which commits contain meaningful figure changes

SSH Key Management

  1. Use Deploy Keys: GitHub/GitLab deploy keys limit access to specific repositories

  2. Read-Only Access: GoFigr only needs read access to clone

  3. Rotate Periodically: Update SSH keys regularly for security

  4. One Key Per Repository: Easier to manage and audit

Branch Selection

  1. Start with Main Branch: Begin with main/master for the primary history

  2. Add Feature Branches Selectively: Only import branches with meaningful figures

  3. Consider Tag-Based Workflows: Some teams may want to import only tagged releases

Security & Privacy

SSH Key Storage

  • SSH keys are encrypted at rest

  • Keys are only decrypted during clone operations

  • Temporary key files are securely deleted after use

  • Thread-safe handling prevents key leakage
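
As an illustration of short-lived key files, the sketch below writes a decrypted key to a private temporary file, points Git at it through the GIT_SSH_COMMAND environment variable, and removes the file afterwards. The helper temporary_ssh_key is hypothetical, and this is a generic pattern rather than GoFigr's implementation. Note that os.remove is an ordinary delete, not a secure wipe.

```python
import os
import shlex
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_ssh_key(private_key: str):
    """Write a key to a private temp file, yield env vars for Git, then delete it."""
    # mkstemp creates the file readable and writable only by the current user.
    fd, key_path = tempfile.mkstemp(prefix="import-key-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(private_key)
        # Git runs this command instead of plain `ssh` when GIT_SSH_COMMAND is set.
        ssh_command = f"ssh -i {shlex.quote(key_path)} -o IdentitiesOnly=yes"
        yield {**os.environ, "GIT_SSH_COMMAND": ssh_command}
    finally:
        os.remove(key_path)  # runs on success and on error
```

The yielded dictionary is meant to be passed as env= to a single subprocess call. Because the process-wide environment is never modified, concurrent imports in other threads do not see each other's keys.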

Repository Access

  • GoFigr clones to isolated temporary directories

  • Clone directories are deleted after processing

  • No repository data is stored except extracted figures

  • Network access is limited to the clone operation

Duplicate Import Prevention

  • GoFigr prevents simultaneous imports of the same repository

  • This avoids race conditions and duplicate figures

  • A clear error message is shown if an import of that repository is already in progress

Technical Details

Supported Notebook Versions

  • Jupyter Notebook (.ipynb) format version 4.x

  • JupyterLab notebooks

  • Google Colab exports

Processing Architecture

  • Parallel File Processing: Multiple notebooks processed simultaneously (configurable)

  • Sequential Commit Processing: Commits for each file processed in chronological order

  • Thread-Safe SSH: Isolated SSH key handling per operation

  • Automatic Cleanup: Temporary files removed even on errors
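
The combination of parallel files and sequential commits can be sketched with a thread pool in which each task owns one notebook's whole history. This is an illustrative sketch only. The functions, the handle callback, and the max_workers default of 4 are assumptions, not GoFigr's configuration.

```python
from concurrent.futures import ThreadPoolExecutor


def process_notebook_history(path, commits, handle):
    """Process one notebook's commits strictly in the order given (oldest first)."""
    return [handle(path, commit) for commit in commits]


def process_repository(history, handle, max_workers=4):
    """history maps notebook path -> list of commits in chronological order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per notebook: files run in parallel, commits within a file do not.
        futures = {
            path: pool.submit(process_notebook_history, path, commits, handle)
            for path, commits in history.items()
        }
        return {path: future.result() for path, future in futures.items()}
```

Keeping each file's commits on a single worker means revision order for that file never depends on thread scheduling.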

Git Operations

  • Full repository clone (not shallow) for complete history

  • Remote branch fetching when needed

  • Git protocol support: HTTPS, SSH, git://

Figure Extraction

  • All image MIME types from display_data outputs

  • execute_result outputs with image data

  • Embedded images in markdown cells (future enhancement)
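
For readers who want to see what extraction from display_data and execute_result outputs involves, here is a minimal sketch that reads a version 4 notebook as JSON. The function extract_images is hypothetical and is not GoFigr's extractor. It relies on the notebook format storing binary images such as PNG and JPEG as base64 text and SVG as plain text.

```python
import base64
import json


def extract_images(notebook_json: str):
    """Yield (cell_index, mime_type, image_bytes, code) for image outputs."""
    notebook = json.loads(notebook_json)
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # Multi-line fields may be stored as a list of strings.
        code = "".join(source) if isinstance(source, list) else source
        for output in cell.get("outputs", []):
            if output.get("output_type") not in ("display_data", "execute_result"):
                continue
            for mime, payload in output.get("data", {}).items():
                if not mime.startswith("image/"):
                    continue
                text = "".join(payload) if isinstance(payload, list) else payload
                if mime == "image/svg+xml":
                    data = text.encode("utf-8")    # SVG is stored as text
                else:
                    data = base64.b64decode(text)  # PNG, JPEG, etc. are base64 text
                yield index, mime, data, code
```

Returning the cell's source alongside each image is what makes it possible to link a figure to the code that produced it. An output that carries the same figure in two image formats yields two entries in this sketch.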

Troubleshooting

"Failed to clone repository"

  • Check that the URL format is correct

  • Verify SSH key has access (for private repos)

  • Ensure the repository exists and is accessible

"SSH key required"

  • Non-HTTPS URLs require an SSH key

  • Add a key via "Manage SSH Keys"

  • Verify the key has read access to the repository

"No notebooks found"

  • Ensure the repository contains .ipynb files

  • Check that the selected branches contain notebooks

  • Verify that notebooks are committed (not just present in the working directory)

Import Taking Too Long

  • Large repositories with many commits may take time

  • Consider importing specific branches only

  • Check network connectivity to the Git host
