# Git Repository Import

## Overview

GoFigr's Git Repository Import feature connects your code repositories to your figure management workflow. Import Jupyter notebooks directly from GitHub, GitLab, Bitbucket, or any Git-compatible host, and GoFigr automatically extracts the figures from every commit on the branches you select, preserving your research history with version tracking and attribution.

## Key Benefits for Users

### Preserve Your Research History

* **Full Commit History**: Import figures from all commits, not just the latest version
* **Version Tracking**: Each notebook revision becomes a tracked figure revision in GoFigr
* **Git Metadata Preserved**: Branch names, commit hashes, and timestamps are stored with each import
* **Author Attribution**: Git commit authors are automatically mapped to GoFigr users

### Seamless Integration with Your Workflow

* **No Workflow Changes**: Keep using Git as you always have; GoFigr pulls from your repositories
* **Multiple Git Hosts**: Support for GitHub, GitLab, Bitbucket, and any standard Git server
* **Both HTTPS and SSH**: Use public repos via HTTPS or private repos with SSH key authentication
* **Branch Selection**: Choose exactly which branches to import

### Automatic Figure Extraction

* **Jupyter Notebook Processing**: All output cells with images are automatically extracted
* **Code Association**: Each figure is linked to the code cell that generated it
* **Source Document Linking**: Figures are connected back to their source notebook files
* **Intelligent Deduplication**: Identical figures are detected and not duplicated

## How It Works

### Step 1: Navigate to Import

1. Go to the Import page from the main navigation
2. Select your target workspace
3. Choose the "Git Repository" import option

### Step 2: Enter Repository URL

Enter your Git repository URL. GoFigr supports multiple formats:

**HTTPS URLs (public repositories)**:

* `https://github.com/username/repository.git`
* `https://gitlab.com/username/repository.git`
* `https://bitbucket.org/username/repository.git`

**SSH URLs (private repositories)**:

* `git@github.com:username/repository.git`
* `git@gitlab.com:username/repository.git`
* `git+ssh://git@github.com/username/repository.git`
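The URL formats above can be told apart with a few lines of code. The following is a minimal sketch, not GoFigr's actual validation logic; the function name `classify_git_url` and the `"unknown"` result are hypothetical and only illustrate how the listed formats differ.

```python
import re
from urllib.parse import urlparse

# scp-like SSH syntax has no scheme: user@host:path
_SCP_LIKE = re.compile(r"^[\w.-]+@[\w.-]+:(?!//).+$")


def classify_git_url(url: str) -> str:
    """Return 'https', 'ssh', or 'unknown' for a Git remote URL."""
    url = url.strip()
    if _SCP_LIKE.match(url):
        return "ssh"
    scheme = urlparse(url).scheme.lower()
    if scheme == "https":
        return "https"
    if scheme in ("ssh", "git+ssh"):
        return "ssh"
    return "unknown"


for example in (
    "https://github.com/username/repository.git",
    "git@gitlab.com:username/repository.git",
    "git+ssh://git@github.com/username/repository.git",
):
    print(example, "->", classify_git_url(example))
```

A check like this is useful before pasting a URL into the import form: anything classified as `ssh` will need an SSH key in the next step.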

### Step 3: Configure SSH Key (for Private Repos)

For SSH-based URLs to private repositories:

1. The SSH Key selector appears automatically
2. Choose an existing SSH key or add a new one
3. SSH keys are stored encrypted and used securely for authentication
4. Click "Manage SSH Keys" to add, view, or remove keys

**Note**: No SSH key is needed for HTTPS URLs to public repositories.

### Step 4: Select Branches

Once the repository is validated:

1. GoFigr automatically fetches available branches
2. Main/master branches are selected by default if present
3. Use the multi-select dropdown to add or remove branches
4. Each selected branch will be scanned for notebooks

### Step 5: Start Import

Click "Import from Git" to begin:

1. GoFigr clones the repository to a secure temporary location
2. All commits in selected branches are scanned for `.ipynb` files
3. Each notebook file at each commit is processed
4. Figures are extracted from cell outputs
5. Progress is displayed in real time
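To preview which commits an import is likely to pick up, you can run a similar scan against a local clone. This is a rough sketch under stated assumptions: it relies on the standard `git log --name-only` command, the `MARKER` prefix and both function names are made up for illustration, and it lists only commits that changed a notebook, which may differ from how GoFigr walks history internally.

```python
import subprocess

# Hypothetical prefix used to recognise commit header lines in the log output.
MARKER = "@@commit "


def read_history(repo_dir: str, branch: str) -> str:
    """Run `git log` on a local clone and return its raw text (needs git on PATH)."""
    return subprocess.run(
        ["git", "-C", repo_dir, "log", "--name-only",
         f"--format={MARKER}%H", branch, "--"],
        check=True, capture_output=True, text=True,
    ).stdout


def notebooks_by_commit(log_text: str) -> dict[str, list[str]]:
    """Map each commit hash to the .ipynb paths it changed."""
    result: dict[str, list[str]] = {}
    current = None
    for line in log_text.splitlines():
        if line.startswith(MARKER):
            current = line[len(MARKER):].strip()
            result[current] = []
        elif line.strip() and current is not None and line.endswith(".ipynb"):
            result[current].append(line)
    # Drop commits that did not touch any notebook.
    return {commit: paths for commit, paths in result.items() if paths}
```

For example, `notebooks_by_commit(read_history("path/to/clone", "main"))` returns one entry per commit on `main` that changed at least one notebook.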

### Step 6: Monitor Progress

The import progress modal shows:

* **Overall Progress**: Percentage complete across all files
* **Current File**: Which notebook is being processed
* **Branch Progress**: Which branch/commit is being scanned
* **Log Messages**: Detailed status updates
* **Cancel Option**: Stop the import at any time if needed

## What Gets Imported

### From Each Notebook

* **All Figure Outputs**: PNG, JPEG, SVG, and other image outputs from cells
* **Cell Code**: The code that generated each figure is preserved
* **Notebook Metadata**: Kernel info, notebook version, and custom metadata

### From Git History

* **Commit Timestamps**: Each figure revision uses the original Git commit time
* **Author Information**: Commit authors are mapped to GoFigr users by email
* **Branch Context**: Which branch each version came from
* **Commit Hash**: Links back to the exact commit for provenance

### How Figures Are Organized

1. **One Analysis per Notebook Path**: Notebooks with the same path share an analysis
2. **One Figure per Output**: Each distinct figure in the notebook becomes a GoFigr figure
3. **Revisions by Commit**: Different commits create different figure revisions
4. **Source Linking**: Each figure links back to its source notebook asset
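The organization rules above can be pictured as a nested mapping: notebook path, then figure, then a list of revisions. The sketch below is only an illustration. How GoFigr identifies the same figure across commits is not documented here, so the `output_key` field, the SHA-256 comparison, and the `organize` function are all assumptions.

```python
import hashlib
from collections import defaultdict


def organize(extracted):
    """Group extracted images into analyses, figures, and revisions.

    `extracted` yields (notebook_path, output_key, commit_sha, image_bytes)
    tuples, oldest commit first. `output_key` is a hypothetical identifier
    for one output slot in a notebook.
    """
    analyses = defaultdict(lambda: defaultdict(list))
    for path, output_key, sha, data in extracted:
        digest = hashlib.sha256(data).hexdigest()
        revisions = analyses[path][output_key]
        if revisions and revisions[-1]["digest"] == digest:
            continue  # image unchanged since the latest revision: skip it
        revisions.append({"commit": sha, "digest": digest})
    return analyses


items = [
    ("nb.ipynb", "cell3:0", "c1", b"first render"),
    ("nb.ipynb", "cell3:0", "c2", b"first render"),   # unchanged, skipped
    ("nb.ipynb", "cell3:0", "c3", b"second render"),
]
for path, figures in organize(items).items():
    for key, revisions in figures.items():
        print(path, key, [r["commit"] for r in revisions])
```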

## Author Attribution

GoFigr maps Git commit authors to GoFigr users by email address:

### Automatic Matching

* Git commit author emails are matched against GoFigr user emails
* Matching requires the importing user to have a confirmed email address
* Matched figures show the original author in GoFigr

### When No Match Is Found

* The "on behalf of" field stores the Git author name and email
* Full attribution is preserved even without a GoFigr account
* Future users can claim their figures when they join

### Importing User as Fallback

* If author matching is disabled or fails, the importing user is credited
* The original Git author info is still stored in metadata
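The attribution rules in this section amount to a lookup with a fallback. The following sketch shows one way to express that logic; it is not GoFigr's implementation. The `Attribution` class, the `attribute` function, and the case-insensitive email comparison are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribution:
    credited_user: str           # GoFigr username credited with the revision
    on_behalf_of: Optional[str]  # "Name <email>" of the Git author when unmatched


def attribute(author_name, author_email, users_by_email, importing_user,
              matching_enabled=True):
    """Pick the credited user for one commit.

    `users_by_email` maps lower-cased email addresses to GoFigr usernames.
    """
    if matching_enabled:
        matched = users_by_email.get(author_email.strip().lower())
        if matched is not None:
            return Attribution(matched, None)
    # No match (or matching disabled): credit the importing user and keep
    # the original Git author for later reference.
    return Attribution(importing_user, f"{author_name} <{author_email}>")


users = {"ada@example.org": "ada"}
print(attribute("Ada L.", "ada@example.org", users, "importer"))
print(attribute("Bob", "bob@example.org", users, "importer"))
```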

## Real-Time Progress Tracking

### Progress Indicators

* **File Count**: "Processing file 5 of 23"
* **Branch Progress**: "Scanning branch: feature/analysis"
* **Commit Info**: "Processing commit a1b2c3d..."
* **Detailed Logs**: Timestamped log messages for debugging

### Error Handling

* **Graceful Failures**: Individual file failures don't stop the entire import
* **Error Messages**: Clear descriptions of what went wrong
* **Partial Success**: Successfully imported figures are kept even if some fail
* **Retry Option**: Failed imports can be retried after fixing issues

### Cancellation

* Click "Cancel" at any time during import
* Already-imported figures are preserved
* The repository clone is cleaned up automatically

## Import History

Recent imports are tracked and displayed:

* **Repository URL**: Which repository was imported
* **Import Time**: When the import occurred
* **Status**: Success, partial, or failed
* **Figure Count**: How many figures were extracted

## Best Practices

### Before Importing

1. **Clean Up Notebooks**: Clear unnecessary output cells to reduce processing time
2. **Organize by Project**: One repository = one logical project for cleaner organization
3. **Tag Important Commits**: Consider which commits contain meaningful figure changes

### SSH Key Management

1. **Use Deploy Keys**: GitHub/GitLab deploy keys limit access to specific repositories
2. **Read-Only Access**: GoFigr only needs read access to clone
3. **Rotate Periodically**: Update SSH keys regularly for security
4. **One Key Per Repository**: Easier to manage and audit

### Branch Selection

1. **Start with Main Branch**: Begin with main/master for the primary history
2. **Add Feature Branches Selectively**: Only import branches with meaningful figures
3. **Consider Tag-Based Workflows**: Some teams may want to import only tagged releases

## Security & Privacy

### SSH Key Storage

* SSH keys are encrypted at rest
* Keys are only decrypted during clone operations
* Temporary key files are securely deleted after use
* Thread-safe handling prevents key leakage
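A common way to get this kind of isolation is to write the decrypted key to a private temporary file, point Git at it through the standard `GIT_SSH_COMMAND` environment variable, and remove the file afterwards. The sketch below illustrates that pattern only; `ssh_key_env` is a hypothetical helper, and plain `os.remove` does not overwrite the file contents the way a secure-delete tool would.

```python
import os
import shlex
import tempfile
from contextlib import contextmanager


@contextmanager
def ssh_key_env(private_key: str):
    """Yield an environment dict that makes git use a throwaway key file."""
    fd, path = tempfile.mkstemp(prefix="import-key-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(private_key)
        os.chmod(path, 0o600)  # ssh rejects keys readable by other users
        env = dict(os.environ)
        env["GIT_SSH_COMMAND"] = f"ssh -i {shlex.quote(path)} -o IdentitiesOnly=yes"
        yield env
    finally:
        os.remove(path)  # runs even if the clone fails
```

Because each call creates its own file and its own environment dict, concurrent clones never share key material. The returned `env` would be passed to something like `subprocess.run(["git", "clone", url, target], env=env)`.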

### Repository Access

* GoFigr clones to isolated temporary directories
* Clone directories are deleted after processing
* No repository data is stored except extracted figures
* Network access is limited to the clone operation

### Duplicate Import Prevention

* GoFigr prevents simultaneous imports of the same repository
* This avoids race conditions and duplicate figures
* A clear error message is shown if an import of that repository is already in progress
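The idea behind this guard can be shown with a small in-process registry. This is a simplified sketch, assuming a single server process; a real deployment would more likely use a database row or distributed lock. The names `exclusive_import` and `ImportInProgress` are invented for the example.

```python
import threading
from contextlib import contextmanager


class ImportInProgress(RuntimeError):
    """Raised when the same repository is already being imported."""


_active = set()
_guard = threading.Lock()


@contextmanager
def exclusive_import(repo_url: str):
    """Allow only one running import per repository URL."""
    key = repo_url.strip().rstrip("/")
    with _guard:
        if key in _active:
            raise ImportInProgress(f"An import of {key} is already running")
        _active.add(key)
    try:
        yield
    finally:
        with _guard:
            _active.discard(key)  # released on success, failure, or cancel
```

Wrapping the whole import in `with exclusive_import(url):` means a second request for the same URL fails fast instead of producing duplicate figures.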

## Technical Details

### Supported Notebook Versions

* Jupyter Notebook (`.ipynb`) format version 4.x
* JupyterLab notebooks
* Google Colab exports

### Processing Architecture

* **Parallel File Processing**: Multiple notebooks processed simultaneously (configurable)
* **Sequential Commit Processing**: Commits for each file processed in chronological order
* **Thread-Safe SSH**: Isolated SSH key handling per operation
* **Automatic Cleanup**: Temporary files removed even on errors
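The combination of parallel files and sequential commits can be sketched with a thread pool. This is an illustrative outline rather than GoFigr's code: the function names, the commit dictionary shape (`sha`, `timestamp`), and the default of four workers are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def process_file(path, commits, handle_revision):
    """Process one notebook's commits oldest-first; return (path, count)."""
    count = 0
    for commit in sorted(commits, key=lambda c: c["timestamp"]):
        handle_revision(path, commit)
        count += 1
    return path, count


def run_import(history, handle_revision, max_workers=4):
    """Process notebooks in parallel, keeping each file's commits in order.

    `history` maps a notebook path to a list of commit dicts.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(process_file, path, commits, handle_revision)
            for path, commits in history.items()
        ]
        return dict(future.result() for future in futures)
```

Keeping all commits of one file on a single worker is what preserves chronological revision order, while different files are free to run side by side.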

### Git Operations

* Full repository clone (not shallow) for complete history
* Remote branch fetching when needed
* Git protocol support: HTTPS, SSH, git://

### Figure Extraction

* All image MIME types from `display_data` outputs
* `execute_result` outputs with image data
* Embedded images in markdown cells (future enhancement)
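To see what this extraction looks like against the notebook file format, here is a minimal sketch that reads a version 4 notebook as plain JSON and collects image outputs together with the code that produced them. It uses only the standard library; `extract_images` and the shape of its return value are illustrative and not part of GoFigr's API.

```python
import base64
import json


def extract_images(notebook_json: str) -> list[dict]:
    """Collect image outputs from the code cells of a v4 notebook."""
    nb = json.loads(notebook_json)
    found = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown cells are skipped in this sketch
        source = cell.get("source", "")
        code = "".join(source) if isinstance(source, list) else source
        for output in cell.get("outputs", []):
            if output.get("output_type") not in ("display_data", "execute_result"):
                continue
            for mime, payload in output.get("data", {}).items():
                if not mime.startswith("image/"):
                    continue
                text = "".join(payload) if isinstance(payload, list) else payload
                # Binary formats are stored as base64; SVG is stored as text.
                if mime == "image/svg+xml":
                    raw = text.encode("utf-8")
                else:
                    raw = base64.b64decode(text)
                found.append(
                    {"cell_index": index, "mime": mime, "code": code, "data": raw}
                )
    return found
```

Running this over a notebook before importing gives a quick estimate of how many figures to expect from a single revision.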

## Troubleshooting

### "Failed to clone repository"

* Check the URL format is correct
* Verify SSH key has access (for private repos)
* Ensure the repository exists and is accessible

### "SSH key required"

* Non-HTTPS URLs require an SSH key
* Add a key via "Manage SSH Keys"
* Verify the key has read access to the repository

### "No notebooks found"

* Ensure the repository contains `.ipynb` files
* Check selected branches contain notebooks
* Verify notebooks are committed (not just in working directory)

### Import Taking Too Long

* Large repositories with many commits may take time
* Consider importing specific branches only
* Check network connectivity to the Git host

