Move directory from one repository to another, preserving history

I just moved one directory within a Git repository to a directory within another repository including its history. For example:

repositoryA/
.........../directoryToKeep
.........../otherDirectory
.........../someFile.ext

repositoryB/
.........../someStuff

The goal is to move directoryToKeep into repositoryB with its history, i.e., all commits that affect directoryToKeep. If you instead want to create a repository just for the contents of directoryToKeep, skip the last step of the preparation of the source repository.

If you have files tracked by git-lfs, please note the update at the bottom first.

Here is how I did it, based on this blog post and StackOverflow topic:

1. Prepare the source repository

    1. Clone repositoryA (make a copy, don’t use your already existing one)
    2. cd to it
    3. Delete the link to the original repository to avoid accidentally making any remote changes
      git remote rm origin
    4. Using filter-branch, go through the complete history and keep only the commits that affect directoryToKeep, removing all others.
      git filter-branch --subdirectory-filter <directoryToKeep> -- --all

      From the git documentation:

      Only look at the history which touches the given subdirectory. The result will contain that directory (and only that) as its project root.

      You might need to add --prune-empty to avoid empty commits; in my case it was not necessary.

      This means that the result will be repositoryA containing the contents of directoryToKeep directly, which is also reflected in all the commits. If you want to create a separate repository just for directoryToKeep, skip the next step. If instead you want to move directoryToKeep to repositoryB into its own directory, you basically have two options. You might be fine with the way the commits are and create an additional commit that moves all files into a directory. However, if you are a perfectionist like myself, you can perform the following command to move directoryToKeep into its own directory, which will update all remaining commits accordingly.

    5. Execute the following command, this time using index-filter, after replacing directoryToKeep in it with the name of your actual directory:
      git filter-branch --index-filter '
          git ls-files -sz | 
          perl -0pe "s{\t}{\tdirectoryToKeep/}" |
          GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
              git update-index --clear -z --index-info &&
              mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
      ' HEAD

      If you want to preserve tags and update them, you need to add --tag-name-filter cat.

      If you get the error “mv: cannot stat ‘.new’: No such file or directory”, you need to add the --prune-empty option to filter-branch to avoid empty commits.

    6. There might be old untracked files. You can clean up the repository with the following commands:
      git reset --hard
      git gc --aggressive
      git prune
      git clean -df
    7. If you just want a new repository for directoryToKeep, you should be able to just push it. Otherwise follow the second step.
      It’s also good at this point to make sure that the result is correct, e.g., using git log.

2. Merge into target repository

  1. Clone repositoryB (make a copy, don’t use your already existing one)
  2. cd into it
  3. Create a remote connection to repositoryA as a branch in repositoryB.
    git remote add <branch-name-repoA> /path/to/repositoryA
  4. Pull from the branch (this assumes you performed the changes above on master)
     git pull --allow-unrelated-histories <branch-name-repoA> master

    Note: Because your branch and master don’t have a common base, git 2.9+ will refuse to merge them without the --allow-unrelated-histories option.

  5. It will create a merge commit to merge the current HEAD with your branch. The editor for the commit message should appear. Enter a meaningful commit message and proceed.
  6. Now you’re done and can push.
  7. Personally, I would just delete the cloned repositories from step 1 and go back to the actual repository.
  8. If everything works, remove directoryToKeep from repositoryA.
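
The merge steps above can be rehearsed the same way. This is a sketch with assumptions, not a definitive script: both repositories are created from scratch in a temporary directory, the remote name repoA-import and the file names are hypothetical, --no-edit is used so that no editor opens, and --no-rebase is added because recent Git releases may refuse to pull divergent branches unless told whether to merge or rebase.

```shell
#!/bin/sh
# Sandbox sketch of step 2: merge a prepared repositoryA into repositoryB.
set -e
# Hypothetical identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
repoA="$work/repositoryA"
repoB="$work/repositoryB"

# Stand-in for the result of step 1: history already lives under directoryToKeep/.
git init -q "$repoA"
git -C "$repoA" symbolic-ref HEAD refs/heads/master
mkdir "$repoA/directoryToKeep"
echo keep > "$repoA/directoryToKeep/a.txt"
git -C "$repoA" add .
git -C "$repoA" commit -q -m "add kept file"
echo more >> "$repoA/directoryToKeep/a.txt"
git -C "$repoA" commit -q -am "change kept file"

# Target repository with its own, unrelated history.
git init -q "$repoB"
git -C "$repoB" symbolic-ref HEAD refs/heads/master
mkdir "$repoB/someStuff"
echo stuff > "$repoB/someStuff/readme.txt"
git -C "$repoB" add .
git -C "$repoB" commit -q -m "existing work in repositoryB"

# Steps 2.3 and 2.4: add the prepared repository as a remote and pull from it.
git -C "$repoB" remote add repoA-import "$repoA"
git -C "$repoB" pull -q --no-rebase --no-edit \
    --allow-unrelated-histories repoA-import master

total=$(git -C "$repoB" rev-list --count HEAD)
imported=$(git -C "$repoB" rev-list --count HEAD -- directoryToKeep)
echo "commits in repositoryB: $total"
echo "commits touching directoryToKeep: $imported"
git -C "$repoB" ls-tree -r --name-only HEAD
```

Counting the commits that touch directoryToKeep after the pull is a quick way to confirm that the history, and not just the files, arrived in repositoryB.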

Update 19.01.2017: Updated step 2.4 with additional option (Thanks, Paul!)

Update 18.12.2018: Updated step 1.5 with additional option to preserve tags (Thanks, Sandip!)

Update 28.10.2019: If you have files tracked by git-lfs, there is an additional step you need to perform. After cloning the repository at the beginning, perform git lfs fetch --all (source). As Evan pointed out in the comments below, the directory he kept did not contain any large files, so he ran git lfs uninstall --local to remove git-lfs from the resulting repository.

35 Comments

  1. Paul

    git 2+ will require an additional flag for the final pull flag:

    `git pull master --allow-unrelated-histories`

  2. Paul

    You’re awesome! Saved us a tonne of work!

  3. Gergely

    Would it be possible to write a script for this?

  4. Dimitris Pantazopoulos

    Excellent, thanks a lot.

    You can move the whole repo by skipping steps 1.4, 1.5 and (optionally) 1.6.

    Thanks again.

  5. Anil

    Excellent post, but this does not work when you are trying to move ‘repositoryA:/path/to/directoryToKeep’ to ‘repositoryB:/path/to/directoryToKeep’. Instead, ‘directoryToKeep’ is copied into the root of ‘repositoryB’ (e.g., ‘repositoryB:/directoryToKeep’) after I run `git pull --allow-unrelated-histories master`. What am I missing here? How do I make sure that git pull creates everything under ‘repositoryB:/path/to/directoryToKeep’?

    • Matthias Schoettle

      Have you tried what I mentioned above step 1.5? In the source repository you could simply move the contents (which at this point will be in the root) into its own folder with one commit.

      • Mukku

        Hi. Were you able to copy to a specific folder in the destination repo? I am struggling with the same problem. The file that I need to copy from the source repo to the target repo always ends up in the root folder, and I cannot find which option I must use to copy it to a specific folder in the destination repo. For example, I have a file ‘myfile.txt’ in /sourcerepo/srcfolder/srcsubfolder and want to copy it into /targetrepo/targetfolder/targetsubfolder. I tried many options and asked many friends, but the file was only ever copied into /targetrepo, not into /targetrepo/targetfolder/targetsubfolder. Please help me understand how to copy it into targetsubfolder, or where I should give the path of the target folder.
        Can you or anyone else please help me?

        • Matthias Schoettle

          Hi Mukku,

          Is srcfolder/srcsubfolder the same as targetfolder/targetsubfolder?

  6. Mike

    If repositoryA previously had history of the contents of directoryToKeep being moved from someOtherDirectory->directoryToKeep, this will lose all history prior to that move occurring.
    This solution literally just looks for all instances of the files under a directoryToKeep folder in all commits in the history, and only keeps the commits/portions of the commits, that affect directoryToKeep.
    A more robust solution would likely need to recursively consider all files currently under directoryToKeep, inspect them to determine all their previous locations based on the history of possible moves, and take the sum-total bundled set of individual files and request that they all be kept.

    • Matthias Schoettle

      That is correct.

      Do you know the name of the directory it was renamed from? If so, you could try what this post suggests.

      I am not sure if a general solution that follows renames exists.
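
To find out which earlier directory names are involved, `git log --follow` can list the paths a single file has had over time. The sketch below runs in a throwaway repository; the identity and the file f.txt are made up, and --follow only tracks one file at a time, so this is a way to discover old names rather than a general solution.

```shell
#!/bin/sh
# Sandbox sketch: list the paths one file has had, to spot an earlier directory name.
set -e
# Hypothetical identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
repo="$work/repositoryA"
git init -q "$repo"
mkdir "$repo/someOtherDirectory"
printf 'line one\nline two\n' > "$repo/someOtherDirectory/f.txt"
git -C "$repo" add .
git -C "$repo" commit -q -m "add file"
git -C "$repo" mv someOtherDirectory directoryToKeep
git -C "$repo" commit -q -m "rename directory"

# --follow continues across renames; --format= suppresses the commit headers,
# so only path names remain. grep drops blank lines, sort -u removes repeats.
paths=$(git -C "$repo" log --follow --format= --name-only -- directoryToKeep/f.txt |
    grep . | sort -u)
printf '%s\n' "$paths"
```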

  7. Andy Pippin

    Awesome! Still a valid procedure.

  8. raj

    Is there a way to do this without changing the commit IDs? When I follow the procedure, all is good, except that I have new commit IDs for all the commits.

    • Matthias Schoettle

      No, using this technique it is not possible since the parent commit id (among other things) is used to determine the commit id (SHA-1 hash).

      Is there a specific reason you need to preserve the same commit ids?
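
The dependency on the parent can be seen directly with Git's plumbing commands. In this sketch (temporary repository, made-up identity, fixed timestamps), two commits are created with the same tree, author, dates, and message; they differ only in their parent and still receive different ids.

```shell
#!/bin/sh
# Sandbox sketch: identical commits with different parents get different ids.
set -e
# Hypothetical identity and fixed dates, so the parent is the only difference.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_DATE="2020-01-01 00:00:00 +0000"
export GIT_COMMITTER_DATE="2020-01-01 00:00:00 +0000"

work=$(mktemp -d)
git init -q "$work/demo"
g() { git -C "$work/demo" "$@"; }   # helper: run git inside the sandbox

tree=$(g mktree </dev/null)         # writes the empty tree and prints its id
base1=$(g commit-tree "$tree" -m "base one")
base2=$(g commit-tree "$tree" -m "base two")
child1=$(g commit-tree "$tree" -p "$base1" -m "same change")
child2=$(g commit-tree "$tree" -p "$base2" -m "same change")

echo "child on base one: $child1"
echo "child on base two: $child2"
g cat-file -p "$child1" | sed -n 2p  # the "parent" line stored in the commit
```

Since filter-branch rewrites the earliest affected commit, every descendant gets a new parent id and therefore a new id of its own.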

  9. Prav

    Thank you! This was an easy to use tutorial – after wrestling with this issue for over 3 hours, you helped me solve it in 5 mins!

    • Matthias Schoettle

      Thank you, appreciate it! 🙂

  10. Olimpio

    Whenever I try to run this command I get this error:

    ```
    Cannot create a new backup.
    A previous backup already exists in refs/original/
    Force overwriting the backup with -f
    ```

    • Matthias Schoettle

      Which command are you referring to that results in this error?

      • Nash

        I’m getting this same error when running the command on Step# 5 using --index-filter.

        Thanks

        • Matthias Schoettle

          Ah, I see. My guess is it is due to a backup being created in step 4.

          From the filter branch documentation:

          Always verify that the rewritten version is correct: The original refs, if different from the rewritten ones, will be stored in the namespace refs/original/.

          Does the following before step 5 do the trick?

          git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
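
The effect of that cleanup can be checked in a sandbox. The sketch below (temporary repository, made-up identity and file names) runs the subdirectory filter once, counts the backup refs under refs/original/, deletes them with the command above, and counts again. As the error message itself says, passing -f to the second filter-branch run is the other way to get past the existing backup.

```shell
#!/bin/sh
# Sandbox sketch: filter-branch leaves backup refs that can be deleted afterwards.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's advisory pause
# Hypothetical identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
repo="$work/repositoryA"
git init -q "$repo"
mkdir "$repo/directoryToKeep" "$repo/otherDirectory"
echo keep  > "$repo/directoryToKeep/a.txt"
echo other > "$repo/otherDirectory/b.txt"
git -C "$repo" add .
git -C "$repo" commit -q -m "initial commit"

# Step 1.4 rewrites the branch and stores the old tip under refs/original/.
( cd "$repo" && git filter-branch --subdirectory-filter directoryToKeep -- --all >/dev/null )

count_backups() { git -C "$repo" for-each-ref refs/original/ | wc -l | tr -d ' '; }

before=$(count_backups)
git -C "$repo" for-each-ref --format="%(refname)" refs/original/ |
    xargs -n 1 git -C "$repo" update-ref -d
after=$(count_backups)
echo "backup refs before: $before, after: $after"
```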

  11. Sandip Bhattacharya

    `git filter-branch --index-filter` will need to have an additional `--tag-name-filter cat` if you want the tags to be preserved, else you will lose all your tags from the previous `--subdirectory-filter`

    • Matthias Schoettle

      Thanks! I added this as an optional part of step 1.5.

  12. Gyakuten

    Hi, I get this error on step 1.e :

    mv: cannot stat ‘C:/dev/ip/jahia-epi-sinistres/.git-rewrite/t/../index.new’: No such file or directory
    index filter failed:
    git ls-files -sz |
    perl -0pe "s{\t}{\tmyProject/}" |
    GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
    git update-index --clear -z --index-info &&
    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"

    Any idea how to fix it ?

    Thanks

    • tom932

      I had exactly the same problem. It was solved by adding --prune-empty, as mentioned above: `You might need to add --prune-empty to avoid empty commits, in my case it was not necessary.`

      • Matthias Schoettle

        Thanks for pointing that out!

  13. Evan

    Another case that needs to be taken into account is when the original repository uses LFS files: they then need to be removed from the attributes of the remaining git repository. I used:

    git lfs uninstall --local

    because the folder I kept did not use any LFS files itself, but I imagine that if there are LFS files in there as well, a more elaborate scheme should apply (e.g., remove all and add back, or remove only the files belonging to the parent).

    • Matthias Schoettle

      Thanks, good point! Updated the instructions.

  14. Bryan W

    I’m getting the error “mv: cannot stat ‘.new’: No such file or directory” on the index-filter step. It appears the variable $GIT_INDEX_FILE isn’t being set by anything. Do I need to set it manually to “`pwd`/.git/index” before executing the command? Or is this supposed to be an environment variable?

    If I do the above, the error changes to “mv: cannot stat ‘/c/Users/bwingert/Desktop/Git Transfer Test/for_export/repo_a/.git/index.new’: No such file or directory” Would I need to also create this file, or is the command supposed to handle it under the hood?

    Using Git 2.20.1.windows.1 via bash in MINGW64

    • Matthias Schoettle

      Hi Bryan, please check the comments by Gyakuten and tom932 above.

  15. Bryan W

    I have --prune-empty in both calls to filter-branch, as well as the “git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d” line suggested above, and the “mv: cannot stat ‘[repo path]/.git-rewrite/t/../index.new’: No such file or directory” error persists.

    • Matthias Schoettle

      This might be an OS-specific (Windows) issue. The commands might need to be adjusted. But instead, you might want to check out git-filter-repo which gets recommended when executing git filter-branch.

      If you do try it out and it works (or not), please let me know. I will update this post then.

    • Sandip Bhattacharya

      The steps given in the post worked for most repos I worked with, but for one repo I kept getting this error, even after adding --prune-empty. After a while, I gave up and updated that “mv” command like this to ignore the error.

      mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"

      to

      mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE" || true

      😛

  16. Joakim Brorsson

    Hi

    I have been following the steps, including git lfs fetch --all after cloning the source repository, but the last step, i.e. git pull --allow-unrelated-histories master, fails with the following error message:

    Downloading gpumd/binaries/bagage/figs/bgg_all_x_ba_pes_combined_fcp_dft_cage_center_ols.pdf (183KB)
    Error downloading object: gpumd/binaries/bagage/figs/bgg_all_x_ba_pes_combined_fcp_dft_cage_center_ols.pdf (f53cf05): Smudge error: Error downloading gpumd/binaries/bagage/figs/bgg_all_x_ba_pes_combined_fcp_dft_cage_center_ols.pdf (f53cf0538cbab0cc303d0f89e24c0f366cc7b6ee5e1d5da0b61ead6db9362590): [f53cf0538cbab0cc303d0f89e24c0f366cc7b6ee5e1d5da0b61ead6db9362590] Object does not exist on the server or you don't have permissions to access it: [404] Object does not exist on the server or you don't have permissions to access it
    
    Errors logged to /Users/joabro/clathrate-thermal-conductivity/.git/lfs/logs/20200209T144723.861288.log
    Use git lfs logs last to view the log.
    error: external filter 'git-lfs filter-process' failed
    fatal: gpumd/binaries/bagage/figs/bgg_all_x_ba_pes_combined_fcp_dft_cage_center_ols.pdf: smudge filter lfs failed
    

    Should git really try to download the lfs files when I did git lfs fetch --all?

    With regards

    Joakim

    • Matthias Schoettle

      Hi Joakim,

      You don’t get this error when doing git lfs fetch --all?

      Can you try running it with GIT_TRACE=1 git pull ..., maybe that will produce more information.

      Here is some information I found that might be helpful:

      git lfs fetch scans the working tree for LFS pointer files to download, whereas git lfs fetch --all scans the entire history. If the latter yields an error and the former does not, it is likely to indicate that a previous version of the file is missing on the server.

      Source: https://github.com/git-lfs/git-lfs/issues/1935#issuecomment-328209526

© 2020 Matthias Schoettle