| Download the AOL dataset |
| How to use the dataset |
aol"
has clicked 10 times on the URL "http://www.aol.com" in the AOL query log.
We then create an edge between the corresponding query and URL vertices, set its
click attribute to 10, and associate 10 distinct values with each of the attributes session,
time and rank. Within each attribute, these values are separated by the special character
sequence "::". The values are also aligned, in the
sense that the i-th values of the three attributes refer to the same user click.
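
The alignment described above can be illustrated with a minimal Java sketch. It assumes the three attribute strings of one edge have already been fetched as plain strings; the class `AlignedClickAttributes`, its `parse` method and the sample values are hypothetical and are not part of the dataset's tools.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, shown for illustration only; not shipped with the dataset. */
final class AlignedClickAttributes {

    /** One user click, rebuilt from the i-th value of each attribute. */
    static final class Click {
        final String session;
        final String time;
        final String rank;

        Click(String session, String time, String rank) {
            this.session = session;
            this.time = time;
            this.rank = rank;
        }
    }

    /**
     * Splits the three "::"-separated attribute strings of a single edge
     * and pairs up the values found at the same position.
     */
    static List<Click> parse(String session, String time, String rank) {
        // A limit of -1 keeps trailing empty values, so positions stay aligned.
        String[] sessions = session.split("::", -1);
        String[] times = time.split("::", -1);
        String[] ranks = rank.split("::", -1);

        if (sessions.length != times.length || times.length != ranks.length) {
            throw new IllegalArgumentException("attributes are not aligned");
        }

        List<Click> clicks = new ArrayList<>(sessions.length);
        for (int i = 0; i < sessions.length; i++) {
            clicks.add(new Click(sessions[i], times[i], ranks[i]));
        }
        return clicks;
    }

    public static void main(String[] args) {
        // Made-up placeholder values; the real value formats may differ.
        List<Click> clicks = parse("s1::s2::s3", "t1::t2::t3", "1::1::2");
        for (Click c : clicks) {
            System.out.println(c.session + " " + c.time + " " + c.rank);
        }
    }
}
```

For an edge whose click attribute is 10, each of the three strings would hold 10 values, so `parse` would return 10 `Click` objects.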
| Dataset | Original Size | Compressed Size | Compression Ratio | TextDB Compressor |
|---|---|---|---|---|
| Query-node labels | 108MB | 43MB | 40% | it.unipi.di.textdb.BucketedHuffword |
| URL-node labels | 46MB | 22MB | 48% | it.unipi.di.textdb.FrontCoding |
| click edge attribute | 42MB | 15MB | 36% | it.unipi.di.textdb.BucketedHuffword |
| rank edge attribute | 97MB | 36MB | 37% | it.unipi.di.textdb.BucketedHuffword |
| session edge attribute | 316MB | 164MB | 52% | it.unipi.di.textdb.BucketedHuffword |
| time edge attribute | 759MB | 278MB | 36% | it.unipi.di.textdb.BucketedHuffword |
| Total | 1.4GB | 557MB | 40% | |
click.
click for all of its outgoing edges
still using GraphLabeler.
stdout as: the number
of the fetched strings, the rate at which these strings have been fetched from disk, and the overall time to complete.
Running this experiment on our machine (an SMP architecture with two Intel(R) Pentium(R) 4 CPU 3.00GHz processors),
we measured the following times:
click attributes)
aol-dataset, you can run the class
by doing:
$ cd aol-dataset/
$ java -Xmx512m AOLGraphExample webgraph/wg-aol graphlabeler/graphlabeler.conf