
Improved Tag Clouds

I wrote before about creating tag clouds with TurboGears, though the idea is the same in pretty much any language: split the bucket of things you have into smaller buckets, assign each bucket a size, and then display them.

Really it's mostly about how you split the categories into the buckets and how many elements end up in each one.

Let's say you have a bunch of categories and each one has a number of elements within it. You want to display the category names with each name in a size relative to the number of elements in it.

My last attempt looked at the minimum and maximum number of elements per category and divided that range into 5 buckets, each identified by its own CSS class. Here are those classes:

    .smallestTag { font-size: xx-small; }
    .smallTag { font-size: small; }
    .mediumTag { font-size: medium; }
    .largeTag { font-size: large; }
    .largestTag { font-size: xx-large; }

So far so good. Which bucket a category landed in was determined by whether it had the minimum value (smallestTag), the maximum value (largestTag), or fell somewhere in between (smallTag, mediumTag and largeTag). Here's the code:

    allcats = GiftCategory.select()
    cats = [cat for cat in allcats]

    # Find out what the max and min clicks are
    nums = [cat.clicks for cat in cats]
    maxn = max(nums)
    minn = min(nums)
    # Width of each third of the range (float division, so small ranges don't collapse to 0)
    diff = (maxn - minn) / 3.0

    # Work out which category deserves which tag
    l = []
    for cat in cats:
        if cat.clicks == minn:
            klass = 'smallestTag'
        elif cat.clicks == maxn:
            klass = 'largestTag'
        elif cat.clicks > (minn + (diff * 2)):
            klass = 'largeTag'
        elif cat.clicks > (minn + diff):
            klass = 'mediumTag'
        else:
            klass = 'smallTag'
        # Assumed final step (cut off in the original listing): keep the pairing
        l.append((cat, klass))

The problem is that you can get a lot of bunching when your distribution is uneven. If your largest category has 100 items but the others are in the range 1-10, your tag cloud isn't going to have a lot of variation.
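
To make the bunching concrete, here is a small sketch that applies the range-based rule to made-up click counts: ten categories with 1 to 10 clicks plus one outlier with 100. The function name `range_bucket` and the data are illustrative assumptions, not part of the original code.

```python
from collections import Counter

def range_bucket(clicks, minn, maxn):
    """Range-based rule for a single click count (illustrative helper)."""
    diff = (maxn - minn) / 3.0
    if clicks == minn:
        return 'smallestTag'
    elif clicks == maxn:
        return 'largestTag'
    elif clicks > minn + diff * 2:
        return 'largeTag'
    elif clicks > minn + diff:
        return 'mediumTag'
    else:
        return 'smallTag'

# Hypothetical click counts: 1..10 plus a single outlier at 100
counts = list(range(1, 11)) + [100]
minn, maxn = min(counts), max(counts)

# Count how many categories end up with each CSS class
tally = Counter(range_bucket(c, minn, maxn) for c in counts)
print(dict(tally))
```

With this data nine of the eleven categories end up as `smallTag`, and `mediumTag` and `largeTag` are never used at all.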

This happened to me a lot, so I figured it would be better to split the values into equal-sized buckets regardless of how they are distributed.

Enter Recipe 425397, "Split a list into roughly equal-sized pieces", on ASPN. The code is short, if not exactly a simple read:

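    def split_seq(seq, size):
        newseq = []
        # Average (fractional) number of items per piece
        splitsize = 1.0/size*len(seq)
        for i in range(size):
            # Loop body missing from the listing; reconstructed as slicing at rounded multiples of splitsize
            newseq.append(seq[int(round(i*splitsize)):int(round((i+1)*splitsize))])
        return newseq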
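
To see why the pieces come out roughly equal, it can help to look only at where the cuts fall. The sketch below assumes the recipe slices at rounded multiples of `splitsize`. `split_bounds` is a hypothetical helper name used purely for illustration, and the item list is made up.

```python
def split_bounds(n, size):
    """Return (start, end) index pairs for cutting n items into `size` pieces."""
    splitsize = 1.0 / size * n
    return [(int(round(i * splitsize)), int(round((i + 1) * splitsize)))
            for i in range(size)]

# Hypothetical data: 11 sorted values cut into 5 pieces
items = list(range(1, 11)) + [100]
bounds = split_bounds(len(items), 5)
pieces = [items[start:end] for start, end in bounds]
print(bounds)
print(pieces)
```

Each piece starts where the previous one ended, so every item lands in exactly one piece. Here the piece sizes are 2, 2, 3, 2, 2, and the outlier 100 just shares the last piece with 10 instead of stretching the whole scale.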

So now we can easily split our categories into however many buckets we want, and the old code changes to something like this (not tested):

    tag_names = ['smallestTag','smallTag','mediumTag','largeTag','largestTag']

    cats = [cat for cat in GiftCategory.select()]

    # Get all the numbers in a big list
    nums = [cat.clicks for cat in cats]

    # Make a unique list of the numbers so [1,1,1,2,2,3] = [1,2,3]
    n = {}
    for num in nums:
        n[num] = 1
    nums = list(n.keys())

    # Get them ordered
    nums.sort()

    # Split them into our buckets (result is a list of lists [[1,2],[3,4],...,[45,60],[100]])
    num_seq = split_seq(nums, len(tag_names))

    # Assign each bucket a tag name (list() so it can be iterated once per category)
    num_seq = list(zip(num_seq, tag_names))

    l = []
    # Look through each category
    for cat in cats:
        # Check each sequence to see if the count is in that sequence-list
        for seq, tagname in num_seq:
            if cat.clicks in seq:
                # Assumed final step (cut off in the original listing): keep the pairing
                l.append((cat, tagname))
                break

Overall the result should be better for uneven distributions and give a more pleasing effect.
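
For anyone who wants to try the idea without a database, here is a minimal sketch of the same approach on plain tuples. It builds a dictionary from click count to tag name rather than searching the buckets for every category. The name `tag_lookup` and the sample data are assumptions made for illustration, standing in for `GiftCategory` rows.

```python
def tag_lookup(counts, tag_names):
    """Map each distinct count to a tag name using roughly equal-sized buckets."""
    unique = sorted(set(counts))
    splitsize = 1.0 / len(tag_names) * len(unique)
    lookup = {}
    for i, tagname in enumerate(tag_names):
        # Same rounded slice boundaries as the list-splitting step
        start = int(round(i * splitsize))
        end = int(round((i + 1) * splitsize))
        for num in unique[start:end]:
            lookup[num] = tagname
    return lookup

tag_names = ['smallestTag', 'smallTag', 'mediumTag', 'largeTag', 'largestTag']

# Hypothetical (category name, clicks) pairs
cats = [('books', 1), ('toys', 2), ('music', 2), ('games', 5),
        ('garden', 7), ('kitchen', 10), ('gadgets', 100)]

lookup = tag_lookup([clicks for _, clicks in cats], tag_names)
tagged = [(name, lookup[clicks]) for name, clicks in cats]
for name, klass in tagged:
    print(name, klass)
```

With this sample data all five classes get used even though one category has ten times the clicks of the next. If there are fewer distinct counts than tag names, some buckets will simply be empty.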


© 2006 - 2013 Automatic Romantic
