Understanding digg: rate, not volume

This is a personal attempt to understand the digg front page. I am not a mathematician, nor a coder nor an Excel wiz (all of which will become obvious). Nonetheless, I wanted to understand digg better than I did and decided a tiny bit of analysis was in order.

This was the state of play on the front page at 16:25 BST today.

[Table: the digg front page at 16:25 BST]

(Time means the number of hours since a post was submitted, to the nearest hour.)

Things to note? A story’s ranking on the digg front page does not correspond to any of the three metrics available to users (diggs, comments and time since submission), nor to any combination of them that I was able to concoct. I tried a lot of formulas, and none of them worked. If the number of diggs were key, the number one story would be on page 3 or something. All the front-page stories have attracted a fair number of comments, but there’s no correlation between that and their rank. Comments are good and diggs are good, but there’s no way of knowing how much each counts. I could post some graphs at this point, but since they show no visible relationship, there’s no point. It’s clear that each story has a time-to-live on the front page, but it’s impossible to detect what that is.
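For anyone who wants to test “no visible relationship” more formally than by eye, a rank correlation is the usual tool. The sketch below is a minimal Spearman coefficient in plain Python: it ranks each list (tied values share the average rank) and takes the Pearson correlation of those ranks. The function names are my own, and the numbers in the example are made up for illustration; they are not the figures from the table. Because position 1 is the top of the page, a metric that drove the ranking would show up as a coefficient close to -1, while a value near 0 means no consistent relationship.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end j of the run of values tied with order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up illustration, not the data from the table above.
positions = [1, 2, 3, 4, 5]
diggs = [600, 1500, 900, 700, 1200]
print(round(spearman(positions, diggs), 2))  # 0.3
```

A weak coefficient like this is what “no visible relationship” looks like in numbers; swapping in comment counts or hours since submission tests the other two metrics the same way.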

However, we’ve learned that the stories on the digg front page all appear to have been submitted within the last day, and that on average they had been submitted around 15 hours earlier.

It makes sense for digg to use a secret algorithm for ranking posts, one that users can’t easily reconstruct from any of the information they’re given. Otherwise, the recipe for gaming digg would be publicised, the site would get spammed and the service would lose its users.

What we don’t have is the rate of either diggs or comments. I think it’s fair to assume that the ‘G-Meter in 1 minute’ story, number one in the list when I recorded it, had gained a large number of diggs or comments in the recent past; otherwise the number two story, and all the rest, would have been placed higher. The rate of diggs or comments seems to matter more than their number, although both are important. Since this is just one moment in time, I can’t say how quickly or slowly a post rises and falls as a result. Still, I am sure rate is a key indicator.
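Rate is easy enough to estimate if you’re willing to look at the page twice. The sketch below assumes you record each story’s digg count and its hours since submission at two moments, say an hour apart; the `Snapshot` structure and the function names are hypothetical, and the numbers are invented. It compares a story’s lifetime average (total diggs divided by hours since submission) with its recent rate over the window between the two snapshots. With times rounded to the nearest hour, as in my table, any rate worked out this way is rough.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One story's digg count at one moment (hypothetical structure)."""
    story: str
    hours_since_submission: float
    diggs: int


def lifetime_rate(s):
    """Average diggs per hour since the story was submitted."""
    return s.diggs / s.hours_since_submission


def recent_rate(earlier, later):
    """Diggs per hour gained between two snapshots of the same story."""
    elapsed = later.hours_since_submission - earlier.hours_since_submission
    if elapsed <= 0:
        raise ValueError("later snapshot must come after the earlier one")
    return (later.diggs - earlier.diggs) / elapsed


def rank_by_recent_rate(pairs):
    """Story names ordered from fastest- to slowest-growing."""
    ordered = sorted(pairs, key=lambda p: recent_rate(*p), reverse=True)
    return [earlier.story for earlier, _ in ordered]


# Invented numbers: story A has more diggs in total, B is gaining faster.
a = (Snapshot("A", 20.0, 1500), Snapshot("A", 21.0, 1520))
b = (Snapshot("B", 10.0, 400), Snapshot("B", 11.0, 520))
print(lifetime_rate(a[0]), recent_rate(*a))  # 75.0 20.0
print(lifetime_rate(b[0]), recent_rate(*b))  # 40.0 120.0
print(rank_by_recent_rate([a, b]))  # ['B', 'A']
```

Ranked by total diggs, A comes first; ranked by recent rate, B does. That is the difference between volume and rate in miniature.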

So I failed to reverse-engineer the digg algorithm. I’m frankly not up to the job, and I don’t believe we have the information available. So then I looked for a more folksy way to understand the page. Folksy is my forte. I tried to come up with tags that would categorise the stories I was seeing. I know this is not academically rigorous in any way, but this is how it panned out according to my own categorisation:

[Chart: front-page stories by topic, using my own tags]

So all of these topics do well. Again, this is just one moment in the constantly circulating digg nexus, and I wouldn’t want to draw many rules from it that you couldn’t work out for yourself. Web, Major Vendors (especially MS & Apple) and Conspiracies seem especially good.

I look forward to future posts about ‘How To Beat Microsoft’s Planned Web Conspiracy About Melons’.

P.S. Working out how ‘Friends’ or voting blocs might contribute to any of this is well beyond my reach. Alex has some thoughts on this. Thanks to David for the link.

P.P.S. Maths geniuses are welcome to the data here. (.xls file)


5 comments to Understanding digg: rate, not volume

  • Rate intuitively feels right as an important ingredient of popularity. If my website gets 1000 hits, it’s more popular if it happens in an hour than if it happens over the course of a year. I’m fairly sure that the homepage is also designed to be inclusive of members of each of the categories where possible. I see a lot of sport articles on there that have far fewer diggs for instance. Rate as a proportion of total diggs in that category might be interesting to look at.

  • The order of items on the front page is chronological based on the time they are promoted, which is not actually shown (just the date they are posted). The promotion algorithm itself has nothing to do with the order on the front page.

  • That’s good information, Owen & Dave. And obviously, the next comparison that needs to be made is between page one and page two. Were I a statistician, I’m sure I’d relish that…

  • […] My first attempts to understand digg, the news-voting site, were a bit of a shambles, to be honest. I tried to work out the order and content of the front page and ended up in a tangle of half-remembered Maths lessons. Owen Byrne, senior software engineer at the service, put me out of my misery by commenting that the order was actually chronological according to the time they were promoted to the top. I also commented on the importance of rate and topic. […]

  • […] This is a post with more writing and also a picture. My first attempts to understand digg, the news-voting site, were a bit of a shambles, to be honest. I tried to work out the order and content of the front page and ended up in a tangle of half-remembered Maths lessons. Owen Byrne, senior software engineer at the service, put me out of my misery by commenting that the order was actually chronological according to the time stories were promoted to the top. I also commented on the importance of rate and topic, which may have been less useless. Yesterday, Fred Stutzman posted something to revive my interest. He was talking about the moaning and groaning about the power of top users and the voting blocs around them. Essentially, he says the reason for this is because we need some way to sort through the thousands of stories submitted to digg. Users can’t read them all, a lot of them are spam anyway, and so we develop coping mechanisms. […]
