How does git store duplicate files?


We have a git repository that contains SVM AI input data and results. Every time we run a new model, we create a new root folder for that model so we can organize our results over time:

    /run1.0
      /data
        ... 100 MB of data
      /classification.csv
      /results.csv
      ...
    /run2.0
      /data
        ... 200 MB of data (including run1.0/data)
      /classification.csv
      /results.csv
      ...

As we build new models, we may pull in data (large .wav files) from a previous run. This means our data folder for 2.0 may contain the files from 1.0/data plus any additional data we may have collected.

The repo is going to exceed a gigabyte if we keep this up.

Does git have a way to recognize duplicate binary files and store them only once (e.g. via a symlink)? If not, we will rework how the data is stored.

I may not explain this quite right, but my understanding is that every commit stores a tree structure representing the file structure of the project, with pointers to the actual files, which are stored in the objects subfolder. Git uses the SHA-1 hash of a file's contents to create the file name and subfolder; for example, if a file's contents hashed to:

0b064b56112cc80495ba59e2ef63ffc9e9ef0c77

then it would be stored as:

.git/objects/0b/064b56112cc80495ba59e2ef63ffc9e9ef0c77

The first two characters are used as the directory name and the rest as the file name.
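The object path above can be reproduced without git at all: a blob's SHA-1 is computed over a small header ("blob", the content length in bytes, a NUL) followed by the raw file contents. A minimal sketch (the function name `blob_path` is just illustrative):

```python
import hashlib

def blob_path(content: bytes) -> str:
    # Git hashes the header "blob <size>\0" followed by the raw contents.
    header = f"blob {len(content)}\0".encode()
    sha = hashlib.sha1(header + content).hexdigest()
    # The first two hex characters name the directory, the rest name the file.
    return f".git/objects/{sha[:2]}/{sha[2:]}"

# The well-known hash of a file containing "hello\n":
print(blob_path(b"hello\n"))
# .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

You can check this against `git hash-object` on a real file: it prints the same SHA-1 for the same contents.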

The result is that if you have multiple files with the same contents, whether under different names, in different locations, or in different commits, only one copy is ever saved, with several pointers to it from each commit tree.
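This deduplication falls out of the hashing scheme: the path and file name never enter the blob hash, only the contents do. A small sketch using your run1.0/run2.0 layout (file names here are hypothetical):

```python
import hashlib

def git_blob_sha(content: bytes) -> str:
    # The same hashing scheme git uses for blob objects.
    return hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()

wav = b"\x00\x01" * 1000  # stand-in for identical .wav bytes copied between runs

# Two distinct paths in the working tree, identical contents.
files = {
    "run1.0/data/sample.wav": wav,
    "run2.0/data/sample.wav": wav,
}
hashes = {path: git_blob_sha(data) for path, data in files.items()}
unique_objects = set(hashes.values())
print(len(files), len(unique_objects))  # 2 1 — two tree entries, one stored blob
```

So for the repository described in the question, the duplicated .wav files already cost only one blob each; the repository size is driven by the set of *distinct* file contents, not by how many paths point at them.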

