Sunday, September 8, 2013

Correlation of investment funds - python pandas

While playing around with pandas, the Python data analysis library, I really liked how easy it is to calculate the pairwise correlation between data series. Let's look at an example: we download historical investment fund data and calculate the correlation between the funds.

note 1.: If you are using Windows, the easiest way to install Python with all the necessary packages is the Anaconda distribution. Just download and run the installer from the Anaconda website and you are ready to start :)


note 2.: It wasn't easy to find historical data about investment funds. In the end I got the data from the Bloomberg website. It was a kind of reverse engineering: I watched the network traffic while the site was drawing its graphs, so there is no guarantee that the data format won't change in the future. If somebody knows a better way to get this data, I would be happy to hear about it.

First, let's download the data into an array:
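A minimal sketch of this step in Python 3, using only the standard library. The URL is a hypothetical placeholder, and the response layout (a `data_values` key holding `[timestamp_ms, price]` pairs) is an assumption for illustration, not a documented Bloomberg format.

```python
import json
from urllib.request import urlopen

# Hypothetical placeholder: substitute the endpoint you found in the network traffic.
FUND_URL = "https://example.com/fund-history.json"

def download_json(url):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))

def extract_prices(payload):
    """Turn an ASSUMED payload of the form {"data_values": [[timestamp_ms, price], ...]}
    into a list of (timestamp_ms, price) tuples."""
    return [(int(ts), float(price)) for ts, price in payload["data_values"]]

# Usage (requires network access and a real URL):
# prices = extract_prices(download_json(FUND_URL))
```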


Now we load the data into a pandas DataFrame and plot it (you may need to import matplotlib first):
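One way to do this, assuming each fund's data is a list of `(timestamp_ms, price)` pairs with millisecond Unix timestamps (an assumption about the raw format). The helper names `to_frame` and `plot_prices` are hypothetical; matplotlib is imported only inside the plotting helper.

```python
import pandas as pd

def to_frame(prices, name):
    """Build a one-column DataFrame indexed by date from (timestamp_ms, price) pairs."""
    df = pd.DataFrame(prices, columns=["timestamp", name])
    # Convert millisecond Unix timestamps to pandas Timestamps and use them as the index
    df["date"] = pd.to_datetime(df["timestamp"], unit="ms")
    return df.set_index("date")[[name]]

def plot_prices(df):
    """Plot every column as a line chart (needs matplotlib installed)."""
    import matplotlib.pyplot as plt
    df.plot()
    plt.show()
```

For example, `fund_a = to_frame(prices, "fund_a")` followed by `plot_prices(fund_a)`.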


Both DataFrames have to contain data for the same period of time, so we can simply merge them:
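A sketch of the merge, assuming both DataFrames are indexed by date as in the previous step. It joins on the index with an inner join, so only dates present in both funds are kept; this is a choice made here, and it guards against small differences in the two date ranges.

```python
import pandas as pd

def merge_on_date(left, right):
    """Inner-join two date-indexed DataFrames, keeping only dates present in both."""
    return pd.merge(left, right, left_index=True, right_index=True, how="inner")
```

Usage: `merged = merge_on_date(fund_a, fund_b)` gives one column per fund.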

The result is:

Finally, let's calculate the correlation:
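In pandas this is a single call to `DataFrame.corr()`, which computes the pairwise correlation of all numeric columns (Pearson by default; `"kendall"` and `"spearman"` are also accepted). The wrapper name `fund_correlation` is only for illustration.

```python
import pandas as pd

def fund_correlation(merged, method="pearson"):
    """Return the pairwise correlation matrix of the DataFrame's columns."""
    return merged.corr(method=method)
```

Note that this correlates price levels. If you are interested in how the funds move day to day, `merged.pct_change().corr()` correlates daily returns instead.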

And we get the pairwise correlation of the numeric columns:

1 comment:

Allen said...

Hi there, thank you so much for this!
I have a question: Where did you get the REST API url from? Also, where can I find a list of parameters and their descriptions (i.e. PR005-H etc.)?

Thanks! :)