
Description
Setup
- Which version of Git for Windows are you using? Is it 32-bit or 64-bit?
$ git --version --build-options
git version 2.17.0.windows.1
cpu: x86_64
built from commit: e7621d891d081acff6acd1f0ba6ae0adce06dd09
sizeof-long: 4
- Which version of Windows are you running? Vista, 7, 8, 10? Is it 32-bit or 64-bit?
Microsoft Windows 10.0.14393
- What options did you set as part of the installation? Or did you choose the
defaults?
Editor Option: Notepad++
Path Option: BashOnly
SSH Option: OpenSSH
CURL Option: OpenSSL
CRLF Option: CRLFAlways
Bash Terminal Option: MinTTY
Performance Tweaks FSCache: Enabled
Use Credential Manager: Enabled
Enable Symlinks: Disabled
Details
- Which terminal/shell are you running Git from? e.g Bash/CMD/PowerShell/other
Bash
- What commands did you run to trigger this issue? If you can provide a
Minimal, Complete, and Verifiable example
this will help us understand the issue.
git rebase --interactive
... kernel panic / BSOD
git rebase --abort
Explanation
The BSOD was more severe than usual: it involved the storage devices, specifically the very SSD that holds the OS and everything related to Git. The BSODs were not caused by git or any of its components.
After rebooting, I opened Bash and navigated to the same repository, which still showed that I was in the middle of a rebase. I did not check anything or run any other command; purely as a precaution, I aborted the rebase immediately. The index errors appeared after that. The index may already have been corrupted before I ran that command, but I have no way of knowing that anymore.
While I was in the middle of the interactive rebase, however, there was no obvious filesystem activity. The repository was idle, no code documents were open, and I was not actively editing files.
Knowledgeable people on IRC claim that the index file is not critically important. Even though they say it should be an easy fix with "read-tree", I still wanted to dig deeper out of my own curiosity, and I figured I might as well document my research here.
Every time the index file got corrupted (I have experienced it on Win7 as well as on Win10, which were on different storage devices), the name and the size of the index file remained the same, while the contents were entirely filled with zeroes: NULNULNULNUL. I think this raises the question of how this file is handled. After an unexpected shutdown, wouldn't the contents simply be mangled instead of replaced by zeroes, since there couldn't have been time to write them? I suspect the file is opened and left open, then written at a later stage, and the shutdown must have happened in between.
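To make the "same size, all zeroes" symptom easy to check, here is a minimal Python sketch that tells a zero-filled index apart from other states. It relies on the fact that a well-formed index file begins with the 4-byte signature DIRC; the function name and the returned labels are my own illustration, not anything git provides, and a passing signature check does not prove the rest of the file is intact.

```python
from pathlib import Path

INDEX_SIGNATURE = b"DIRC"  # first four bytes of a well-formed Git index file


def classify_index(path):
    """Return a coarse label for the state of an index file (hypothetical helper)."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    data = p.read_bytes()
    if len(data) == 0:
        return "empty"
    # Every byte is NUL: the symptom described in this report.
    if not any(data):
        return "zero-filled"
    # Some non-zero content, but not the expected header.
    if not data.startswith(INDEX_SIGNATURE):
        return "unrecognized"
    # Only the header was checked; the body may still be damaged.
    return "signature-ok"
```

Calling `classify_index(".git/index")` from the repository root would report which of these cases applies.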
Some of the under-the-hood behavior
Contrary to what knowledgeable and popular people on IRC have been saying, namely that mintty and bash have nothing to do with git, the process tree shows that everything in Git for Windows runs under a single git-bash.exe process; mintty.exe, git.exe, and the other smaller utilities are all its children.
This doesn't confirm one of the original theories, but it may at least support it: having a Bash window open at the moment of the BSOD could have affected and/or delayed the actual writing of the data to disk, with the OS or the SSD still holding it in its cache.
Furthermore, I was told on IRC that git only touches the index file in 3 or 4 commands, not including git status. I ran a scan with Process Monitor and found that even commands that don't change its contents still rewrite it. I don't know how many do, because as mentioned I haven't done more testing yet, but this is what git status does (tested on an unaffected repo):
AFAIK it's a 1:1 copy; nothing changed in practice. I compared hashes and the before and after match. The file's "date of modification" metadata also stays the same as before, so from the outside it gives a false appearance. The use of such .lock files and rewriting may be perfectly normal for all I know, and git may use it elsewhere when writing, but maybe there is some occasion where it doesn't work right for the index file, which may be handled a bit differently from other files. I don't have an idea yet, as I haven't done any more such testing due to lack of time: I wrote this issue about a month after the whole thing began, and most of that time I was troubleshooting the PC hardware/OS itself. It may be something the developers could look into.
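For readers unfamiliar with the .lock pattern, here is a minimal Python sketch of the general technique: write the new contents to a sibling lock file, ask the OS to flush it, then rename it over the real file. This is a generic illustration, not git's actual code; the function name is hypothetical, and whether `os.fsync` really pushes the bytes past the drive's own cache depends on the OS, the driver, and the device.

```python
import os


def write_via_lock(path, data):
    """Replace the file at path with data by way of a sibling '<path>.lock' file."""
    lock_path = path + ".lock"
    # O_EXCL makes creation fail if a lock file already exists (another writer).
    # O_BINARY only exists on Windows; elsewhere the getattr fallback is 0.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_BINARY", 0)
    fd = os.open(lock_path, flags, 0o644)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Ask the OS to write its cached copy out to the storage device.
            os.fsync(f.fileno())
    except BaseException:
        # The file is already closed here, so the stale lock can be removed.
        os.unlink(lock_path)
        raise
    # Swap the finished file into place; readers see the old or the new file.
    os.replace(lock_path, path)
```

If a crash happens before the rename, the old file is still in place; what happens when the rename completes but the data never reaches the disk is exactly the kind of case this report is asking about.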
Windows 10 is configured NOT to auto-reboot after a crash, and no power loss actually occurred. But if it wasn't able to write any crash dumps (including minidumps), it probably couldn't flush any of the write caches either, given what looked like a hardware problem. (I'm still not completely sure. I've cloned the exact same Win10 install to another very similar SSD, a model one year older from the same brand series and of the same size, with all new SATA cables. It is working so far, and I was not able to recreate the issue after 3 weeks of troubleshooting, SMART tests, and combinations.)
I hope people don't take this to mean that I have found the culprit. I haven't; I have simply shown where the culprit could be. It may turn out that none of these things had anything to do with it, but this may at least eventually lead to finding something in the code or, if that is all fine, to some extra code that strengthens integrity against such events, even if it's not git's fault at all.
At the end of the day the git index file might be a small and trivial thing, but maybe this leads to something bigger in terms of data-integrity robustness for git overall. I know there are other layers for securing the data, but here's the point: if git doesn't have actual rescue/repair tools to at least salvage things, then even with enterprise data solutions you'd still have to fall back to an earlier snapshot, which may not be optimal. For example, even if you have data duplication, you'd still have a broken duplicate if it synced before you discovered the issue.
The justification is that this is not entertainment software. It would probably not be justifiable for some kind of video game, but git is the basis of a lot of serious code. Even if the index file is a small, non-important thing, the mere interruption can cause delays in various businesses because of the time it may take to figure out when it happens out of the blue and there's no mention of it in the docs.
Another reason is safety-critical projects, of which there will be more and more in the future. In a wild coincidence, data corruption could delay the release of a vital patch for some real-life machinery. Intel is even developing a whole Linux distro with safety-critical coding in mind.
One more reason official repairability/maintenance is justified: the .git folder is hidden by default, which makes manual servicing harder for average (GUI) users, who would need an additional step in some cases (deleting the file, inspecting its contents).
Lack of git documentation surrounding this issue
There is no mention of index file corruption, or of how to repair/restore the index, in any of the documentation I have searched. I used grep; correct me if I have missed it.
For example, according to the git docs for fsck:
If no objects are given, git fsck defaults to using the index file
..etc.
There are many mentions of object corruption, but nothing about the index file. The book v2 docs at https://git-scm.com/book/en/v2/Git-Internals-Maintenance-and-Data-Recovery again make no mention of the index file itself, only of objects in the index package.
The documentation for the "read-tree" command likewise does not mention repairing/rebuilding the index file in case of corruption. The command appears to be meant mainly for something else, not maintenance, and none of its suboptions seem to be specifically for recovering a broken index file.
https://git-scm.com/docs/git-read-tree
Checklist
- strengthen index-file integrity by handling the file better, with or without FSCache enabled.
- IMO the same FS data-integrity strengthening should also be applied to other small .git files.
- decide whether the index file (and files containing hashes) should be completely excluded from the FSCache feature, so that it is immediately written physically to disk. If that requires a full OS cache flush command, it could be made optional, since such OS-wide flushing every time the index file is touched might affect other things.
- implement a backup index file: an inactive copy of the index as it was before the last command was executed. The contents of the primary index file would be copied to the backup at the beginning of every command that uses it, keeping in mind that even commands that appear only to read the index technically rewrite it with the same contents (git status).
- backup index support could be implemented globally for all commands that report a corrupt index: they would offer to use the backup instead if the user agrees, and/or point to the maintenance docs.
- in the case of NULNULNUL contents, the effect is practically the same, but technically this would be called an "unrecognized index file" rather than a corrupt one, since corruption usually means only partial damage.
- update the git maintenance tools to handle index file corruption, with commands to restore the index from backup, or add a "use backup index file" suboption where applicable.
- update the documentation regarding the index file corruption phenomenon and these changes where applicable, including official ways to get more information about its state.
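
To illustrate the backup-index item from the checklist, here is a minimal Python sketch of how it might behave. Everything in it is an assumption for illustration: git has no such feature, the file name index.bak and the function names are hypothetical, and the validity check only looks at the 4-byte DIRC signature that a well-formed index starts with.

```python
import os
import shutil

SIGNATURE = b"DIRC"        # first four bytes of a well-formed Git index file
BACKUP_NAME = "index.bak"  # hypothetical name; git itself has no such file


def looks_like_index(path):
    """Return True if the file exists and starts with the index signature."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == SIGNATURE
    except OSError:
        return False


def backup_index(git_dir):
    """Copy index to the backup, but only while the index still looks valid."""
    src = os.path.join(git_dir, "index")
    if not looks_like_index(src):
        return False  # never overwrite a good backup with a damaged index
    shutil.copy2(src, os.path.join(git_dir, BACKUP_NAME))
    return True


def restore_index(git_dir):
    """Replace an unrecognized index with the backup, if the backup looks valid."""
    src = os.path.join(git_dir, BACKUP_NAME)
    dst = os.path.join(git_dir, "index")
    if looks_like_index(dst) or not looks_like_index(src):
        return False
    shutil.copy2(src, dst)
    return True
```

In this sketch a command would call `backup_index(".git")` before touching the index, and a command that finds an unrecognized index would offer to run `restore_index(".git")`. The restored index reflects the state before the last command, so it may still be out of date relative to the working tree.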