"A Quick Guide to Organizing Computational Biology Projects" →
Very nice article from Dr. William Noble at the University of Washington. I’ve been working with a similar setup, but one tidbit caught my attention:
Within the data and results directories, it is often tempting to apply a similar, logical organization. For example, you may have two or three data sets against which you plan to benchmark your algorithms, so you could create one directory for each of them under data. In my experience, this approach is risky, because the logical structure of your final set of experiments may look drastically different from the form you initially designed. This is particularly true under the results directory, where you may not even know in advance what kinds of experiments you will need to perform. If you try to give your directories logical names, you may end up with a very long list of directories with names that, six months from now, you no longer know how to interpret.
At the moment, my directory structure is exactly the kind Dr. Noble cautions against: a logical layout of different data sets, each tagged with a descriptive name. This has worked okay so far, but I had not given due consideration to the inherent risk. Perhaps a restructuring is in order.
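
For reference, the alternative the article goes on to suggest is organizing the data and results directories chronologically rather than by logical name. A minimal sketch of what that could look like, assuming date-stamped subdirectories and a plain-text note per day; the `new_experiment_dir` helper, the `results` root, and the `README.txt` file name are illustrative choices here, not anything prescribed by the article:

```python
from datetime import date
from pathlib import Path


def new_experiment_dir(root, day=None, note=""):
    """Create and return root/YYYY-MM-DD, optionally logging a short note.

    Hypothetical helper: the layout and README.txt name are assumptions.
    """
    day = day or date.today()
    path = Path(root) / day.isoformat()  # e.g. results/2009-01-15
    path.mkdir(parents=True, exist_ok=True)
    if note:
        # Append, so several experiments on the same day share one log.
        with open(path / "README.txt", "a", encoding="utf-8") as fh:
            fh.write(note.rstrip() + "\n")
    return path
```

Calling `new_experiment_dir("results", note="benchmark against data set A")` would create today's directory under `results` and record what was run there, so the description lives in a note rather than in the directory name.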