Saturday, November 7, 2009

Data and Fungibility

Fungibility is an important concept in economics. Basically, it states that:

A given unit of some commodity is completely interchangeable with another unit of the same commodity.

That is, holding a given number of units of some good is equivalent to holding any other lot of the same quantity and type.

Ultimately, this loose coupling is what makes large-scale trading practical: when the trade is finally cleared, any tonne of wheat will suffice (I don't need the specific tonne harvested from some particular farm in Argentina or Australia).

In the world of computers, it is almost taken for granted that data is fungible. One does not actually move the bits from one process to another. Rather, the data representation in one process is precisely communicated to the receiving process, which constructs an equivalent representation within its own address space. Any such reconstruction will suffice.
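To make the idea concrete, here is a minimal Python sketch of that round trip, assuming JSON as the wire format. The `Shipment` type and the `send`/`receive` helpers are hypothetical, chosen only for illustration. The receiver ends up with a distinct object that compares equal to the original, which is all that fungibility asks for.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Shipment:
    # Hypothetical record type; both sender and receiver must share this definition.
    commodity: str
    tonnes: int
    origin: str


def send(obj: Shipment) -> bytes:
    # Sender: encode the in-memory representation as bytes for the wire.
    return json.dumps(asdict(obj)).encode("utf-8")


def receive(payload: bytes) -> Shipment:
    # Receiver: build a fresh, equivalent object in its own address space.
    return Shipment(**json.loads(payload.decode("utf-8")))


original = Shipment("wheat", 1, "Argentina")
received = receive(send(original))
print(received == original)  # True: an equivalent value
print(received is original)  # False: a different object
```

Note that `receive` works only because both ends share the `Shipment` definition and agree on the JSON layout. Two independently developed processes would not get that agreement for free.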

So it would seem that networked processes already benefit from the fungibility of data.

In actual implementations, however, the idealistic view sketched above holds only partially, and even that has generally been achieved only through painstaking and often very tedious effort on the part of developers. It is not an inherent property of computing. Various technologies try to simplify this work (e.g. RPC, CORBA, web services, object-relational mapping), but they often seem to hide the underlying problem behind additional layers rather than solve it fundamentally.

It seems to me that to address these issues we need to strive towards true fungibility of data. That is, we need data that can be moved automatically from one process to another and used directly there, even when the processes have been developed completely independently of each other. Likewise, we need data that can be passed repeatedly through different computing layers (communication, logic, persistence) automatically, without relying on developers to map the data between layers by hand.

Put more plainly, data fungibility should be an implicit part of computing: efficient, compatible across all computing layers, and spanning all cooperating processes.

In a world where data is truly fungible, developers could then focus their efforts on the real computational problems rather than on the task of moving data between systems.

How we get there is left as a cooperative exercise for the reader...