Version Control for Web Content: How to Manage Localization with Git + CMS

Managing a single-language website is straightforward. You write content, you publish it, you iterate. But as soon as you add a second or third language, the complexity doesn’t just double – it grows exponentially. Suddenly, you aren’t just managing content; you are managing the relationships between different versions of that content.

A common scenario: The marketing team updates the English pricing page. Two days later, the German team notices the change and manually updates their version. The French team misses the memo entirely and keeps the old pricing. Meanwhile, a developer pushes a code update that breaks the layout for long Japanese strings. The result is a fragmented, inconsistent user experience that damages trust.

The solution lies in treating content with the same rigor as code. By adopting a Git-based Content Management System (CMS) or a headless architecture that integrates with version control, you can apply the principles of DevOps – branching, merging, and traceability – to your localization workflow.

This article explores how to architect a content system where Git acts as the single source of truth, ensuring that every language stays in sync, every change is tracked, and no translation is ever lost in the void.

The Problem with Traditional CMS Localization

In a traditional monolithic CMS (like older WordPress setups), localization is often an afterthought handled by plugins. These plugins usually store translations in separate database tables or as “clones” of the original page.

The “Drift” Issue

When you clone a page to create a French version, you create a disconnect. If the original English page is updated, the French clone doesn’t “know” it needs to change. Over time, your localized sites drift further and further away from the source content, becoming stale and inaccurate artifacts.

The Database Black Box

In a database-driven CMS, changes are opaque. If someone accidentally deletes a paragraph in the Spanish version, there is often no easy way to see who did it, when it happened, or what the text used to say without digging into complex database backups. There is no git blame for a SQL row.

Why Git? Treating Content as Code

Moving content into a Git-based workflow (often using formats like Markdown, JSON, or YAML) solves these structural problems by providing a file-based history of every single change.

1. Granular History and Rollbacks

Every edit is a commit. If a translator introduces an error or deletes a critical section, you can revert that specific file to its previous state instantly. You have a complete audit trail of every modification across every language.

2. Branching Strategies for Content

Just as developers use feature branches to test code before merging it to main, content teams can use branches for major updates. You can create a release/Q2-campaign branch, translate all the content there, preview it in a staging environment, and merge it only when all languages are ready. This prevents the “half-translated” state where English is live but other languages are catching up.

3. Collaboration Without Collision

When multiple teams edit content at once, a structured localization workflow becomes critical: you need to track changes, prevent overwrites, and ship consistent updates across locales. In a Git workflow, if two people edit the same line of the same file, the system flags a “merge conflict,” forcing a human to decide which version is correct. This is far safer than the “last save wins” model of traditional databases.

Architecting the Workflow: Git + Headless CMS

You don’t need to force your marketing team to write raw Git commands in a terminal. The modern stack combines the power of Git with the usability of a headless CMS – either one that commits directly to a repository (like Decap CMS or CrafterCMS) or an API-driven CMS (like Strapi or Contentful) synced to Git through a pipeline.

Step 1: Content Modeling for i18n

The structure of your data determines the success of your localization. Avoid creating separate files for every page (e.g., about-us.en.md, about-us.fr.md) if they share the same layout. Instead, use a structured data approach where a single content object contains all language variations.

Example (JSON structure):

```json
{
  "id": "homepage-hero",
  "layout": "hero-banner",
  "content": {
    "en": {
      "title": "Welcome to the Future",
      "cta": "Get Started"
    },
    "de": {
      "title": "Willkommen in der Zukunft",
      "cta": "Loslegen"
    }
  }
}
```

This keeps all translations physically adjacent in the file system. If a developer changes the data structure (e.g., adds a “subtitle” field), it’s immediately obvious which languages are missing that new field.
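That “which languages are missing the new field” check is easy to automate. Here is a minimal sketch (the `missing_fields` helper and the `hero` sample data are illustrative, not part of any real CMS API) that compares every locale block against the source locale:

```python
# Detect locale blocks missing fields that exist in the source ("en") block.
# Sketch only: assumes the single-object content model shown above.

def missing_fields(content: dict, source: str = "en") -> dict:
    """Return {locale: [missing field names]} compared to the source locale."""
    source_keys = set(content[source])
    return {
        locale: sorted(source_keys - set(fields))
        for locale, fields in content.items()
        if locale != source and source_keys - set(fields)
    }

# A developer just added "subtitle" to English; German hasn't caught up yet.
hero = {
    "en": {"title": "Welcome to the Future", "subtitle": "Now live", "cta": "Get Started"},
    "de": {"title": "Willkommen in der Zukunft", "cta": "Loslegen"},
}
print(missing_fields(hero))  # {'de': ['subtitle']}
```

Run as a CI step, this turns “the German page silently lacks the new subtitle” into a failing check on the Pull Request.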

Step 2: The “Sync” Pipeline

Your Git repository is the source of truth, but your translators work in a Translation Management System (TMS) like Crowdin or Phrase. You need a bi-directional sync.

  1. Push: When a content editor saves a change in the CMS (or a developer pushes to Git), a CI/CD script extracts the new strings.
  2. Upload: These strings are automatically uploaded to the TMS via API.
  3. Translate: Translators (or AI) work in the TMS.
  4. Pull: Once approved, the TMS opens a Pull Request (PR) back to your Git repository with the updated translations.

This workflow means your repository always contains the latest, approved translations without anyone manually copying and pasting text files.
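The extraction half of the Push step can be sketched in a few lines. This is an assumption-laden illustration (the directory layout, the `id`/`content` field names, and the `"<id>.<field>"` key format all follow the JSON model above, not any specific TMS API):

```python
import json
import tempfile
from pathlib import Path

def extract_strings(content_dir: Path, source_locale: str = "en") -> dict:
    """Collect source-locale strings keyed as "<id>.<field>" (the Push step)."""
    strings = {}
    for path in sorted(content_dir.glob("**/*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        for field, text in data.get("content", {}).get(source_locale, {}).items():
            strings[f"{data['id']}.{field}"] = text
    return strings

# Demo against a throwaway content directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "hero.json").write_text(json.dumps({
        "id": "homepage-hero",
        "content": {"en": {"title": "Welcome to the Future", "cta": "Get Started"}},
    }), encoding="utf-8")
    extracted = extract_strings(root)
print(extracted)
```

The resulting dictionary is what the Upload step would post to the TMS API; the Pull step reverses the mapping when the translated PR comes back.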

Handling “Hard” vs. “Soft” Localization in Git

Not all localization is just text translation. A robust Git-based strategy must distinguish between Hard Localization (structural changes) and Soft Localization (text replacement).

Soft Localization is what we typically discuss: replacing “Hello” with “Bonjour.” In a Git workflow, this is handled via separate JSON/YAML keys or Markdown files. The page structure remains identical; only the strings change. This is efficient and should account for 90% of your localized content.

Hard Localization occurs when a region requires a fundamentally different page layout or flow. For example, a payment checkout flow in Germany might need an extra step for “SEPA Direct Debit” which doesn’t exist in the US version.
In a traditional CMS, you might hack this with if (lang == 'de') statements in your template code, which quickly becomes unmaintainable “spaghetti code.”

In a Git-based architecture, the cleaner approach is Component Swapping. Your content model can define a components array for each page.

  • The en-US JSON file might list: ['CreditCardForm', 'AddressInput'].
  • The de-DE JSON file might list: ['SepaForm', 'AddressInput', 'LegalDisclaimer'].

Because the page composition itself is defined in the data files (which are version-controlled), you can track exactly when the German team added that disclaimer component. You can revert the structure just as easily as you revert text. This separates the presentation logic (the code for the components) from the business logic (which components appear for which user), keeping your codebase clean and your regional variations explicit in the history log.
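The runtime side of Component Swapping reduces to a lookup with a fallback. A minimal sketch, assuming the per-locale lists above live in version-controlled data files (the `PAGE_COMPONENTS` dict and `components_for` helper are hypothetical names):

```python
# Resolve the component list for a locale, falling back to the default market.
# In practice PAGE_COMPONENTS would be loaded from the per-locale JSON files.
PAGE_COMPONENTS = {
    "en-US": ["CreditCardForm", "AddressInput"],
    "de-DE": ["SepaForm", "AddressInput", "LegalDisclaimer"],
}

def components_for(locale: str, default: str = "en-US") -> list:
    """Return the page composition for a locale, or the default market's."""
    return PAGE_COMPONENTS.get(locale, PAGE_COMPONENTS[default])

print(components_for("de-DE"))  # ['SepaForm', 'AddressInput', 'LegalDisclaimer']
print(components_for("fr-FR"))  # no fr-FR entry: falls back to the en-US list
```

The frontend then simply renders whatever the data file lists, so adding the German disclaimer is a data commit, not a template change.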

The Role of “Pseudo-Localization” in CI/CD

Before merging any new content branch, your CI pipeline should run a Pseudo-Localization build. This is a stress test for your UI.
A script automatically generates a “fake” language version where:

  1. Text Expansion: English strings are padded by 30-40% (e.g., “Save” becomes “Save [!!! !!!]”) to simulate verbose languages like German or Russian.
  2. Character Substitution: ASCII characters are replaced with accented versions (e.g., “Account” becomes “Åççôûñt”) to test font support and encoding issues.

By committing this pseudo-locale to Git and deploying it to a preview URL (e.g., pseudo.staging.yoursite.com), developers and QA teams can instantly spot broken layouts, overflowing buttons, or hardcoded strings (which won’t have the accents) before real translators even start working. This “Shift Left” approach catches i18n bugs at the code commit stage, saving expensive rounds of linguistic QA later on. Since these pseudo-files are just build artifacts, they don’t pollute your production deployment but serve as a crucial gatekeeper in your version control workflow.
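A pseudo-localization generator implementing both rules is only a few lines. This sketch pads by roughly 40% and substitutes a handful of accented characters (the exact padding token and accent table are arbitrary choices, not a standard):

```python
# Pseudo-localize a string: expand ~40% with padding and swap common ASCII
# letters for accented look-alikes to surface encoding and font issues.
ACCENTS = str.maketrans("aceounACEOUN", "åçéôûñÅÇÉÔÛÑ")

def pseudo(text: str, expansion: float = 0.4) -> str:
    padding = "!" * max(1, round(len(text) * expansion))
    return f"[{text.translate(ACCENTS)} {padding}]"

print(pseudo("Save"))     # [Såvé !!]
print(pseudo("Account"))  # [Åççôûñt !!!]
```

The brackets make truncation obvious: if QA sees “[Såvé !!” with the closing bracket cut off, the button is too small for German.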

Best Practices for Git-Based Localization

1. Lock the “Source” Language

In your repository settings, protect the files that contain your source language (usually English). Only allow changes via Pull Request. This prevents accidental direct edits that could break the synchronization with your TMS.

2. Use Automated Checks (Linters)

Add a step to your CI pipeline that checks your translation files for syntax errors. A missing comma in a JSON file can crash your entire build. Tools like jsonlint or custom scripts should run on every commit to ensure that even if the translation is wrong, the file format is valid.
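If you don't want an external tool, a linter like this can be a plain script in your pipeline. A minimal sketch using only the standard library (the directory layout and report format are illustrative):

```python
import json
import tempfile
from pathlib import Path

def lint_translations(content_dir: Path) -> list:
    """Return parse failures; an empty list means the CI gate passes."""
    errors = []
    for path in sorted(content_dir.glob("**/*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors.append(f"{path.name}: line {exc.lineno}")
    return errors

# Demo: one valid file, one with a trailing comma (invalid JSON).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "de.json").write_text('{"title": "Hallo"}', encoding="utf-8")
    (root / "fr.json").write_text('{"title": "Bonjour",}', encoding="utf-8")
    report = lint_translations(root)
print(report)  # ['fr.json: line 1']
```

In CI you would fail the build whenever the returned list is non-empty, blocking the broken file before it reaches a deploy.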

3. Atomic Commits for Translations

Configure your synchronization tool to create separate commits for each language (e.g., “Update German translations” and “Update French translations”). This makes it easier to revert a specific language if something goes wrong without rolling back the entire release.

Case Study: Handling “Zombie” Content

One of the hardest problems in localization is knowing when to delete translations. If you remove a feature from your English site, the German translations for that feature often linger in the database forever, bloating your system.

With a Git-based approach, this is solved automatically. If you delete the key feature.old_dashboard.title from your English source file, the next sync to your TMS will identify it as a “deleted key.” Depending on your configuration, the TMS can archive it, and the subsequent Pull Request will remove that line from all language files in your repository. Your codebase stays lean and clean.
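The pruning step a TMS performs on deleted keys can be sketched as a simple dictionary filter (the `prune_zombie_keys` helper and the flat key format are assumptions for illustration):

```python
# Remove "zombie" keys: entries still present in a target locale
# after the corresponding key was deleted from the source locale.
def prune_zombie_keys(source: dict, target: dict) -> dict:
    """Keep only target entries whose key still exists in the source."""
    return {key: value for key, value in target.items() if key in source}

en = {"feature.dashboard.title": "Dashboard"}          # old_dashboard was removed
de = {
    "feature.dashboard.title": "Übersicht",
    "feature.old_dashboard.title": "Alte Übersicht",   # zombie: gone from source
}
print(prune_zombie_keys(en, de))  # {'feature.dashboard.title': 'Übersicht'}
```

Running this across every locale file and committing the result is exactly what the cleanup Pull Request from the TMS contains.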

FAQ

Is a Git-based CMS too technical for marketers?

Not necessarily. Modern Git-based CMSs provide a “WYSIWYG” (What You See Is What You Get) interface that looks just like WordPress. Behind the scenes, when a marketer clicks “Save,” the CMS creates a Git commit. They get the user-friendly interface; developers get the version control.

How do we handle images in a Git workflow?

Avoid storing large binary files (images, videos) directly in Git, as it slows down the repository. Use a Digital Asset Management (DAM) system or a service like Cloudinary. In your content files, simply store the reference URL to the image. This also allows you to serve different images for different locales (e.g., a localized screenshot) by swapping the URL in the language-specific JSON block.

What happens if two translators edit the same file at the same time?

If they are working in a TMS, the system handles the concurrency and exports a clean file. If they are editing files directly in Git (rare for translators), Git’s merge conflict resolution comes into play. You will see exactly where the conflict is and can choose the correct version.

Can I use this with a website builder like Webflow or Squarespace?

Generally, no. Those platforms are “closed ecosystems” that store data in their own proprietary databases. To use a Git-based localization workflow, you typically need a Headless CMS (like Strapi, Sanity, or Decap) coupled with a frontend framework (like Next.js, Gatsby, or Nuxt).

Does this improve SEO?

Indirectly, yes. By ensuring that your localized content is always structurally identical to your source content (same HTML tags, same schema markup), you prevent the common SEO issue where localized pages lack the optimization of the main site. Furthermore, having all content in Git allows you to programmatically generate perfect hreflang sitemaps based on the actual files present in your repo.
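Generating those hreflang entries from the locales actually present in a content object can be sketched as follows (the URL scheme `https://example.com/<locale><path>` is a placeholder assumption; adapt it to your routing):

```python
# Build sitemap <xhtml:link> alternates for one page, one entry per locale
# that actually exists in the version-controlled content files.
def hreflang_links(base_url: str, path: str, locales: list) -> list:
    return [
        f'<xhtml:link rel="alternate" hreflang="{loc}" '
        f'href="{base_url}/{loc}{path}" />'
        for loc in locales
    ]

links = hreflang_links("https://example.com", "/pricing", ["en", "de"])
print("\n".join(links))
```

Because the locale list is derived from the files in the repo, a page that only exists in English never advertises a German alternate that would 404.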