March 21, 2008
Object models and data sets for a social network .. a technique for data
Following on from my last post: Object models and data sets for a social network .. crawlers vs simulation ..
(Read the previous email first to get the context.)
I was thinking about this last night, and this may be a solution.
The question is: can we extrapolate social network data based on existing data patterns?
We have three components:
a) A knowledge base (typical number of friends, number of blog posts, etc.),
b) Parameterization (the setup configuration used to run the generator), and
c) The generation itself.
We need a large volume of data for it to be relevant.
We need parameters that mirror real life.
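To make the three components a little more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the names KnowledgeEntry, GeneratorParams, lookup and target_rows are hypothetical, the two figures are only the illustrative examples given later in this post (not measured data), and each figure is treated as a per-profile value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeEntry:
    """One contributed insight about a network (hypothetical structure)."""
    network: str
    metric: str
    typical_value: float


@dataclass(frozen=True)
class GeneratorParams:
    """Setup configuration for one generator run (hypothetical structure)."""
    network: str
    n_profiles: int


# Placeholder figures borrowed from the post's own examples, not real measurements.
KNOWLEDGE_BASE = [
    KnowledgeEntry("facebook", "friends_per_user", 100),
    KnowledgeEntry("myspace", "blog_entries_per_week", 40),
]


def lookup(network, metric):
    """Return the typical value recorded for a network/metric pair."""
    for entry in KNOWLEDGE_BASE:
        if entry.network == network and entry.metric == metric:
            return entry.typical_value
    raise KeyError((network, metric))


def target_rows(params, metric):
    """Rows the generator should aim for: profiles times the typical value,
    assuming the knowledge-base figure is a per-profile value."""
    return int(params.n_profiles * lookup(params.network, metric))
```

With these placeholder numbers, a run configured for 1,000 Facebook profiles would aim for 100,000 friend rows.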
Here is my plan.
The objective is to 'clone' the transactions from a core set first and then apply the parameters (the intelligence).
a) Create configuration tables (for instance, a profiles table).
b) Create transaction tables (blog entries, Facebook pokes, etc.) and
populate them with the base entries.
c) Create a cartesian join between the profiles table and the blog
entries table. Normally, cartesian joins are not
desirable. However, they are a good way to create a very large number of
records very quickly.
For instance:
select profiles.profile_id, blogs.blog_id from profiles, blogs
If profiles has profile_id P1, P2 and
blogs has blog_id B1, B2,
the query will give 4 rows:
P1, B1
P1, B2
P2, B1
P2, B2
d) We thus get a large number of rows.
e) We then 'apply' the rules as a series of update statements on the
base data (post cartesian join).
f) This gives us the 'real' data.
g) To make this work, I plan to 'open source' the whole thing:
the tables and, more importantly, the knowledge base.
h) So, I see many people contributing insights (a typical user on FB
has 100 friends on average, Myspace has 40 blog entries per
week, that sort of thing).
i) So, we can now create a set of data based on parameters, and it is
all open sourced.
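The plan above could be sketched roughly as follows, using Python's built-in sqlite3 module with an in-memory database. This is only a minimal sketch under assumptions: the generated table, its keep column, the build_dataset function and the "keep at most N blog entries per profile" rule are all hypothetical stand-ins, since the actual rules would come from the knowledge base.

```python
import sqlite3


def build_dataset(profile_ids, blog_ids, max_blogs_per_profile):
    """Clone base rows with a cartesian join, then apply one rule via UPDATE."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Steps a and b: configuration and transaction tables with base entries.
    cur.execute("CREATE TABLE profiles (profile_id TEXT PRIMARY KEY)")
    cur.execute("CREATE TABLE blogs (blog_id TEXT PRIMARY KEY)")
    cur.executemany("INSERT INTO profiles VALUES (?)", [(p,) for p in profile_ids])
    cur.executemany("INSERT INTO blogs VALUES (?)", [(b,) for b in blog_ids])

    # Steps c and d: the cartesian join pairs every profile with every blog entry.
    cur.execute(
        """
        CREATE TABLE generated AS
        SELECT profiles.profile_id AS profile_id,
               blogs.blog_id AS blog_id,
               1 AS keep
        FROM profiles CROSS JOIN blogs
        """
    )

    # Step e: a hypothetical rule applied as an UPDATE statement. Each profile
    # keeps only its first N blog entries (ordered by blog_id); the rest are
    # flagged with keep = 0.
    cur.execute(
        """
        UPDATE generated
        SET keep = 0
        WHERE (SELECT COUNT(*)
               FROM generated AS g
               WHERE g.profile_id = generated.profile_id
                 AND g.blog_id <= generated.blog_id) > ?
        """,
        (max_blogs_per_profile,),
    )
    conn.commit()
    return conn
```

Calling build_dataset(["P1", "P2"], ["B1", "B2"], 1) produces the four joined rows from the example above, and only the (P1, B1) and (P2, B1) rows remain flagged with keep = 1. Flagging rows rather than deleting them is just one design choice; it leaves the full join available for other rules to act on.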
Thoughts?
Kind regards,
Ajit
Posted by ajit at March 21, 2008 11:36 AM