OK, I built a tool that cleans up the data folder and leaves only the files that are in use. What it does:
It collects all txt and c files (yes, scripts as well) inside data and all its subfolders, and reads every path mentioned inside them.
The paths it recognizes must point to files of these types: pcx, gif, png, bor, ogg, wav, txt, c. If I missed any file type, let me know.
Then it writes all of these paths into a log, and it applies the actual path of each file, so it's not just data/spriters/flash.gif from inside flash.txt; it adds the entire path to the extracted paths.
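To make the scanning step concrete, here is a rough sketch of how it could look in Python. This is not the tool's actual code. The names extract_paths and collect_references, the regex, and the assumption that the mod root is the folder containing data are all made up for illustration.

```python
import re
from pathlib import Path

# File types from the list above; extend this if your mod uses others.
EXTENSIONS = ("pcx", "gif", "png", "bor", "ogg", "wav", "txt", "c")

# Assumption: a path is a run of letters, digits, "_", "-", ".", "/" or "\"
# that ends in one of the known extensions.
PATH_RE = re.compile(
    r"[\w\-./\\]+\.(?:" + "|".join(EXTENSIONS) + r")\b",
    re.IGNORECASE,
)


def extract_paths(text):
    """Return every file path mentioned in text, with forward slashes."""
    return {match.replace("\\", "/") for match in PATH_RE.findall(text)}


def collect_references(mod_root):
    """Scan all .txt and .c files under mod_root/data and return the
    full paths of the files they mention."""
    mod_root = Path(mod_root)
    referenced = set()
    for source in (mod_root / "data").rglob("*"):
        if source.is_file() and source.suffix.lower() in (".txt", ".c"):
            text = source.read_text(encoding="utf-8", errors="ignore")
            for rel in extract_paths(text):
                # Turn "data/..." from inside the file into a full path.
                referenced.add(mod_root / rel)
    return referenced
```

The sketch compares paths exactly as written, so on a case-sensitive file system a path typed with different capitalization than the real file would not match.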
Once all paths have been collected from the txt files and written to the log, I add the required files to that log as well. These are files that should never be deleted (I think I'm missing some, like the alternative levels.txt for widescreen or PSP or something):
for file in [
'bgs/loading.png', 'bgs/loading.gif', 'bgs/loading2.png', 'bgs/loading2.gif',
'bgs/title.png', 'bgs/title.gif', 'bgs/titleb.png', 'bgs/titleb.gif', 'bgs/logo.png',
'bgs/logo.gif', 'sprites/font.gif', 'sprites/font2.gif', 'sprites/font3.gif',
'sprites/font4.gif', 'sprites/font5.gif', 'sprites/shadow1.png', 'sprites/shadow1.gif',
'sprites/shadow2.png', 'sprites/shadow2.gif', 'sprites/shadow3.png', 'sprites/shadow3.gif',
'sprites/shadow4.png', 'sprites/shadow4.gif', 'sprites/shadow5.png', 'sprites/shadow5.gif',
'sprites/shadow6.png', 'sprites/shadow6.gif', 'sprites/arrow.png', 'sprites/arrow.gif',
'sprites/arrowl.png', 'sprites/arrowl.gif', 'bgs/hiscore.gif', 'bgs/hiscore.png',
'bgs/complete.gif', 'bgs/complete.png', 'bgs/select.gif', 'bgs/select.png',
'bgs/unlockbg.gif', 'bgs/unlockbg.png', 'models.txt', 'levels.txt', 'script.txt',
'lifebar.txt', 'select.txt', 'touch.txt', 'video.txt', 'scenes/gameover.txt',
'scenes/logo.txt', 'scenes/intro.txt', 'scenes/howto.txt', 'data/sounds/beat1.wav',
'data/sounds/fall.wav', 'data/sounds/get.wav', 'data/sounds/money.wav', 'data/sounds/jump.wav',
'data/sounds/indirect.wav', 'data/sounds/punch.wav', 'data/sounds/1up.wav', 'data/sounds/go.wav',
'data/sounds/timeover.wav', 'data/sounds/beep.wav', 'data/sounds/beep2.wav', 'data/sounds/bike.wav',
'data/sounds/block.wav', 'data/sounds/pause.wav'
]
If I missed any required file (I don't think I did, because I checked the manual), let me know.
That's because not all paths are actually mentioned inside txt files. For example, there's no path to loading.gif mentioned anywhere, but the file must be kept, so it's excluded from cleanup and is on the list above.
There are no paths to the font files either, and no paths to the shadows, because you just use numbers for them in the header, etc. So all of this must be preserved, and you can remove it manually later if you want to.
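The list above mixes entries with and without a leading data/ (the sounds have it, the rest don't), so here is one possible way to turn both kinds into full paths before adding them to the keep list. This is only a sketch. The function name required_to_full_paths is hypothetical, and treating every entry as relative to the data folder is my assumption.

```python
from pathlib import Path


def required_to_full_paths(mod_root, required):
    """Map required-file entries to full paths under mod_root/data,
    accepting entries with or without a leading 'data/'."""
    data_dir = Path(mod_root) / "data"
    full = set()
    for entry in required:
        entry = entry.replace("\\", "/")
        # Assumption: every entry is meant relative to the data folder,
        # so a leading "data/" is dropped before joining.
        if entry.lower().startswith("data/"):
            entry = entry[len("data/"):]
        full.add(data_dir / entry)
    return full
```

The result can then be merged with the paths collected from the txt and c files, so that required files are never counted as unused.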
Then the process moves all unused files to a REMOVED folder (preserving the folder structure, so you can bring them back by just copy-pasting them back into the data folder).
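The move step could be sketched roughly like this. Again, this is not the tool's real code. The function name move_unused and its parameters are made up, and keep is assumed to be the set of full paths from the log.

```python
import shutil
from pathlib import Path


def move_unused(data_dir, keep, removed_dir):
    """Move every file under data_dir that is not in keep into
    removed_dir, mirroring the original subfolder layout."""
    data_dir = Path(data_dir).resolve()
    removed_dir = Path(removed_dir).resolve()
    keep = {Path(p).resolve() for p in keep}
    moved = []
    # sorted() takes a snapshot of the tree, so moving files
    # does not disturb the walk.
    for path in sorted(data_dir.rglob("*")):
        if not path.is_file() or path in keep:
            continue
        if removed_dir in path.parents:
            continue  # in case REMOVED was placed inside data
        target = removed_dir / path.relative_to(data_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Because the subfolders are mirrored, restoring a file is just a matter of copying the contents of REMOVED back over data.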
This also scans inside scripts! So it detects files even when you have something like: set(prevs, i, loadsprite("data/bgs/prev/"+(i+1))); and the code will make sure files from the prev folder are not removed.
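One possible heuristic for that kind of line, as a sketch only: treat any quoted string that ends in a slash and is followed by + as a folder whose contents must all be kept. The names dynamic_folders and is_protected are hypothetical, and the real tool may detect this differently.

```python
import re

# Assumption: a quoted string ending in "/" and followed by "+" is a
# folder prefix that the script completes at runtime.
PREFIX_RE = re.compile(r'"([^"\n]*/)"\s*\+')


def dynamic_folders(script_text):
    """Return folder prefixes that a script builds file names from."""
    return {m.replace("\\", "/") for m in PREFIX_RE.findall(script_text)}


def is_protected(rel_path, folders):
    """True if rel_path lies inside any of the detected folders."""
    rel = rel_path.replace("\\", "/")
    return any(rel.startswith(folder) for folder in folders)


line = 'set(prevs, i, loadsprite("data/bgs/prev/"+(i+1)));'
folders = dynamic_folders(line)
```

This only catches prefixes that end at a folder boundary. A script that builds names like "data/bgs/prev/img" + i would need a looser rule.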
Of course I can't predict all the scripts people write, but IMO it's possible to make it work with anything.
So sometimes this helps trim down the data folder by a lot.