I've noticed something about the way things happen on the web: when a debate starts, people chime in from all sides in many different formats, and pretty soon the debate is (literally) all over the place. So there's no way to track a debate systematically, or to reconstruct its course in detail. You can probably get the main points OK, but you can't reconstruct the detailed history.
Of course, this is true for debates outside the web as well, and for the most part, it always has been. Except for the narrowest kind of technical debate, confined to the pages of one or two specialized academic journals, debates have always become highly distributed, with pieces of the field ignoring one another simply because they don't know of one another's existence.
Yet this seems to be more of an issue on the web; or at least, it feels that way. I started off idly imagining that all this wonderful technology of RSS feeds and aggregators and trackbacks and pings should somehow result in a log which catalogs all the "he said/she said" in some neat orderly fashion. And then I thought: How would I do it if I were watching a debate in the field? There's no conventional procedure; I could try some sort of sequence analysis (see Abbott 1995 for a review), but those techniques are about analyzing data in a formal way once you've got it in some appropriate form. So here's a nice methods problem: how can we capture, represent, and store the sequence and organization of a debate on the web, as automatically as possible? What would we have to do?
Ultimately, of course, we'd like to be able to say something about debates. When do they become polarized into irreconcilable camps, when do they get resolved quickly, and when do they drag on over long periods without much progress? When do they stay confined to a few participants, and when do we get "piling on", where seemingly everyone finds it necessary to have a public opinion? Lots of work to do.
Abbott, A. 1995. “Sequence analysis: New methods for old ideas.” Annual Review of Sociology 21: 93-113.
As far as technical solutions go, I think it would be pretty easy to keep track of who said what when. I wrote down some ideas on this topic the other day:
http://laniels.org/cgi-bin/weblog?rightframe=2003-07-31#2003-07-31_threading
It requires some changes to HTML, however, which are unlikely given how slowly the standards-making bodies work. But basically, I'd like a way to indicate how you found a weblog (call it a "via" attribute) and what you're responding to (just like in emails).
In fact, it surprises me that there isn't some generalized way of saying "I found this resource [email, web page, whatever] at location X, and I am replying to the resource at location Y." The Web wasn't designed with this sort of conversation in mind, so I think it'll soon need to be retrofitted to make it work. Or maybe when the Web moves to full XML, we won't need to wait for a standards-making body to add a new attribute to HTML links.
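To make the "via" and "replying to" idea concrete, here is a minimal Python sketch of what a tool could do if every post carried those two pieces of metadata. The `Post` record, its `via` and `in_reply_to` fields, and the `reply_chain` helper are all hypothetical names invented for illustration; nothing like them exists in HTML today, and real posts would have to be annotated by hand or by scraping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Post:
    url: str
    author: str
    posted: str  # ISO 8601 timestamp
    in_reply_to: Optional[str] = None  # hypothetical: URL this post answers
    via: Optional[str] = None          # hypothetical: URL where it was found


def reply_chain(posts, url):
    """Follow in_reply_to links from `url` back to the opening post.

    Returns the chain oldest-first. Stops at a missing post or a cycle.
    """
    by_url = {p.url: p for p in posts}
    chain = []
    seen = set()
    while url in by_url and url not in seen:
        seen.add(url)
        post = by_url[url]
        chain.append(post)
        url = post.in_reply_to
    return list(reversed(chain))
```

Given a pile of such records, calling `reply_chain(posts, some_url)` would recover the "who was answering whom" sequence for one branch of a debate, much as a mail reader threads messages from their reply headers. The `via` field is stored but not used here; it would feed a separate "how did this spread" analysis.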
Posted by: Steve Laniel | August 04, 2003 at 08:55 AM
Hi--
Your email address doesn't seem to work. Send me a working address via email. Thanks.
EMG
Posted by: EMG | August 05, 2003 at 01:40 PM
Even though it's not quite possible to get the "via" tag, it seems easy enough to get "close enough". After all, bloggers tend to write things like "I found over HERE this argument that ....."
If you scrape those messages (via trackback?), you're halfway there.
Now, no one is (to my knowledge) doing the nice "TAG = DISAGREE" that might be necessary, but that might be derivable just from the communities. Hm. Now I feel like I need to check out what the link graph actually looks like, but surely there are some useful features you could derive quickly: the "dogpile" (*everyone* links to Instapundit), the "thread" (tacitus responds to dailykos responds to tacitus), the "meme" (I hate that term, but it's the case in which everyone links to a non-blog source), and so on.
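As a rough illustration of how those three features might fall out of scraped links, here is a small Python sketch. Everything in it is an assumption made for the example: the `link_features` function, the `dogpile_min` cutoff, the idea that a "thread" can be approximated by two blogs linking to each other, and the requirement that someone supplies the set of sites that count as blogs.

```python
from collections import defaultdict


def link_features(links, blogs, dogpile_min=3):
    """Crude feature detection on a site-level link graph.

    links: iterable of (source, target) pairs, one per scraped hyperlink.
    blogs: set of site names treated as weblogs (supplied by hand here).
    dogpile_min: arbitrary cutoff for "everyone links to it".
    """
    pairs = {(s, d) for s, d in links if s != d}  # drop self-links

    inbound = defaultdict(set)
    for src, dst in pairs:
        inbound[dst].add(src)

    # Dogpile: a blog with many distinct sites linking to it.
    dogpile = sorted(t for t, srcs in inbound.items()
                     if t in blogs and len(srcs) >= dogpile_min)
    # Meme: a non-blog source with many distinct sites linking to it.
    meme = sorted(t for t, srcs in inbound.items()
                  if t not in blogs and len(srcs) >= dogpile_min)
    # Thread: two blogs that link to each other (a stand-in for back-and-forth).
    thread = sorted({tuple(sorted((s, d))) for s, d in pairs
                     if s in blogs and d in blogs and (d, s) in pairs})

    return {"dogpile": dogpile, "meme": meme, "thread": thread}
```

This says nothing about agreement or disagreement; it only sorts link patterns into the three shapes, which is about as far as link structure alone can go.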
Posted by: Dan | August 07, 2003 at 12:55 AM