Monday, August 13, 2012

A Podcast, But With Data

“A mechanism for publishing data interfaces” was one of the agenda items at the summit, and it turned out to need less new work than it first appeared. The Finili Project tries to avoid creating something new when perfectly good technology already exists, and existing technology is what we found here.

First of all, it may be necessary to explain the proposed requirement. Users should be able to share data with other users without much risk of the data being incomplete or inadvertently transformed along the way. This means there should be a document format, human-readable but also Finili-readable, that can declare where the data can be found in enough detail that it can be retrieved correctly and automatically. That is, if I create a table of flying saucer records, I should be able to point you to it in email or on my blog in such a way that you can get the data with a single Finili command and begin your own analysis of it.

But that isn’t much to ask, is it? We already have XML to give a data document all the structure it might need. We have URLs, Internet addresses, to say where files are. All that is needed is some kind of minimal metadata document format to put everything together.

It turns out this exists too. The metadata document that declares an audio podcast is a small XML document, around 1 kilobyte in size, that serves to point audience members to the audio files of a podcast, either in their original locations or in other repositories that the podcasts or individual episodes might have been copied to. Finili supports audio data, so audio data delivered to Finili might literally be delivered in podcast format. Of course, most data is not audio data, but Finili could access data in almost the same way regardless of its content.
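To make the idea concrete, here is a minimal sketch of what reading such a document might look like, assuming an RSS 2.0 style feed in which each item carries an enclosure element with url, length, and type attributes, as podcast feeds do. The feed contents, the example.com URLs, the text/csv type, and the function name list_enclosures are all made up for illustration. None of this is Finili syntax or a settled Finili format.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed for illustration only: the titles, URLs, and the
# text/csv media type are assumptions, not a defined Finili format.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Flying Saucer Records</title>
    <link>http://example.com/saucers/</link>
    <description>Sighting records published as a data feed</description>
    <item>
      <title>Sightings table, first delivery</title>
      <pubDate>Mon, 13 Aug 2012 09:00:00 GMT</pubDate>
      <enclosure url="http://example.com/saucers/sightings-1.csv"
                 length="20480" type="text/csv"/>
    </item>
  </channel>
</rss>
"""


def list_enclosures(feed_xml):
    """Return one dict per item that has an enclosure, in document order."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # an item without a file attached is skipped
        found.append({
            "title": item.findtext("title"),
            "url": enclosure.get("url"),
            "type": enclosure.get("type"),
            "length": int(enclosure.get("length", "0")),
        })
    return found


if __name__ == "__main__":
    for entry in list_enclosures(SAMPLE_FEED):
        print(entry["title"], entry["type"], entry["url"])
```

The point of the sketch is that the same few lines work whether the enclosure is an audio file or a table of data. Only the type attribute changes.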

One objection to this idea is that a podcast is ordinarily meant as a series over time, while a data delivery may be a one-shot event. Still, any data delivery is potentially open to revision if errors are found in the originally published data. In business, we observe that most data files marked as “final” are revised again, perhaps several times, for one reason or another that no one thought of when the delivery was declared final. Based on this experience, the series mechanism could be helpful even for data that is meant as a one-file, one-time event, and in any case, it shouldn’t do any harm to mark a file as #1 of a series.
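As a rough illustration of how the series mechanism could handle revisions, the sketch below picks the most recently published item from a feed, assuming each item has an RSS style pubDate and an enclosure. The two-item feed, its URLs, and the function name latest_delivery are hypothetical. A real consumer might prefer an explicit revision number over dates.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical two-delivery feed: a first delivery and a later correction.
FEED = """<rss version="2.0">
  <channel>
    <title>Flying Saucer Records</title>
    <item>
      <title>Sightings table, delivery 1</title>
      <pubDate>Mon, 13 Aug 2012 09:00:00 GMT</pubDate>
      <enclosure url="http://example.com/saucers/sightings-1.csv"
                 length="20480" type="text/csv"/>
    </item>
    <item>
      <title>Sightings table, delivery 2 (corrected)</title>
      <pubDate>Mon, 20 Aug 2012 09:00:00 GMT</pubDate>
      <enclosure url="http://example.com/saucers/sightings-2.csv"
                 length="20512" type="text/csv"/>
    </item>
  </channel>
</rss>
"""


def latest_delivery(feed_xml):
    """Return the item with the newest pubDate, or None if there is none."""
    root = ET.fromstring(feed_xml)
    deliveries = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        pub_date = item.findtext("pubDate")
        if enclosure is None or pub_date is None:
            continue  # need both a file and a date to rank the item
        deliveries.append(
            (parsedate_to_datetime(pub_date),
             item.findtext("title"),
             enclosure.get("url"))
        )
    if not deliveries:
        return None
    published, title, url = max(deliveries, key=lambda d: d[0])
    return {"title": title, "url": url, "published": published}


if __name__ == "__main__":
    print(latest_delivery(FEED))
```

A one-time delivery is simply a feed with a single item, so marking a file as #1 of a series costs nothing here.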

There are other precedents we might look at for inspiration, including the various RSS formats. Note that “publishing” does not necessarily imply support for “subscribing” in the context of Finili. It is enough if the Finili program can go fetch the data as described, either in the program or in a project table. The details are left for later when we have a better sense of the range of use cases. For now, though, it is enough to say, “think of it as a podcast, but with data.”
