Monday, June 24, 2019

development - Switching from C++ to R - limitations/applications


I've only recently begun exploring and learning R (especially since Dirk recommended RStudio, and a lot of people here speak highly of R). I'm rather C(++)-oriented, so it got me thinking: what are the limitations of R, particularly in terms of performance?


I'm trying to weigh the C++/Python/R alternatives for research, and I'm wondering whether getting to know R well is worth the time investment.


Available packages look quite promising, but there are some issues in my mind that keep me at bay for the time being:



  • How efficient is R at importing big datasets? And first of all, what counts as big in R development? I used to process a couple hundred CSV files in C++ (around 0.5M values, I suppose) and I remember the performance being merely acceptable. What can I expect from R here? Judging by Jeff's spectacular results, I assume that with a proper long-term storage format (not CSV) I should even be able to switch to tick processing without hindrance. But what about ad-hoc data mangling? Is the performance difference (compared to lower-level implementations) really that visible, or is it just an urban legend?

  • What are the options for GUI development? Say I wanted to go beyond research-oriented analysis and build full-blown UIs for investment analytics/trading, etc. From what I've found here and on StackOverflow, with proper bindings I'm free to use Python's GUI frameworks, and could even chain further into Qt if the need arises. But deploying such a beast must be a real nuisance. How do you cope with it?



In general, I see that R's flexibility lets me mix and match it with a plethora of other languages, either way round: writing low-level extensions for R, or embedding/invoking R in projects written in another language. That seems nice, but does it make sense to plan for it from the start/concept phase, rather than just extending preexisting solutions? Or is it better to stick with one and only one language (insert whatever you like/have experience with)?
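
For concreteness, the usual "low-level additions in R" route is Rcpp, which compiles C++ inline and exposes it as an R function. A minimal sketch, with the function and its inputs purely illustrative (not anything from this post):

    library(Rcpp)

    # Compile a small C++ loop and expose it to R; the tight loop
    # runs at native speed while the calling code stays in R.
    cppFunction('
    double cumPnl(NumericVector px, NumericVector pos) {
      double pnl = 0.0;
      for (int i = 1; i < px.size(); ++i)
        pnl += pos[i - 1] * (px[i] - px[i - 1]);
      return pnl;
    }')

    px <- 100 + cumsum(rnorm(1e6))   # a synthetic price path
    cumPnl(px, rep(1, length(px)))   # P&L of a constant unit position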


So to sum up: in which quant finance applications is R a (really) bad choice, or at least risks being one?



Answer



R can be pretty slow, and it's very memory-hungry. My data set is only 8 GB or so, and I have a machine with 96 GB of RAM, yet I'm always wrestling with R's memory management. Many of the model estimation functions capture a reference to their enclosing environment, which means you can end up keeping every subset of the data you've touched alive. SAS was much better at handling large-ish data sets, but R is much nicer to work with. (This is in the context of mortgage prepayment and default modeling.)
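
To make the environment-capture point concrete, here is a sketch of the classic gotcha (names and sizes are illustrative): a formula remembers the environment it was created in, so a fitted model can keep large incidental objects reachable long after you expected them to be collected.

    fit_subset <- function(df) {
      big_scratch <- rnorm(1e7)   # ~76 MB of incidental working data
      lm(y ~ x, data = df)        # the formula captures this environment
    }

    d   <- data.frame(x = rnorm(100), y = rnorm(100))
    fit <- fit_subset(d)

    # The scratch vector is still reachable through the formula,
    # so it survives garbage collection as long as the fit does:
    exists("big_scratch", envir = environment(formula(fit)))   # TRUE
    format(object.size(environment(formula(fit))$big_scratch), units = "MB")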


Importing the data sets is pretty easy and fast enough, in my experience. It's the ballooning memory requirements for actually processing that data that are the problem.
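
For what it's worth, a typical fast-import pattern for a directory of CSVs looks like this (the data/ path is hypothetical); data.table::fread is multi-threaded and usually far quicker than base read.csv:

    library(data.table)

    files <- list.files("data", pattern = "\\.csv$", full.names = TRUE)
    dt    <- rbindlist(lapply(files, fread))   # one table from all files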


Anything that isn't easily vectorizable seems like it would be a problem. P&L backtesting for a strategy that depends on the current portfolio state seems hard. If you're looking at the residual P&L from hedging a fixed-income portfolio, with full risk metrics, that's going to be hard.
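
A sketch of why: when position sizing depends on the running equity, each step needs the previous one, so the loop can't be replaced by vectorized operations (all names below are illustrative):

    backtest <- function(returns, risk_frac = 0.01) {
      n         <- length(returns)
      equity    <- numeric(n + 1)
      equity[1] <- 1
      for (t in seq_len(n)) {
        pos           <- risk_frac * equity[t]         # depends on current state
        equity[t + 1] <- equity[t] + pos * returns[t]  # forces sequential evaluation
      }
      equity[-1]
    }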


I doubt many people would want to write a term structure model or a Monte Carlo engine in R.
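
To illustrate the concern (purely a sketch, not anything from the answer): a naive per-path, per-step Monte Carlo loop is idiomatic in C++ but painfully slow in interpreted R; vectorizing across paths recovers most of the speed, until path-dependence forces the loops back in.

    # Naive GBM simulation, C++-style nested loops: slow in R.
    mc_gbm_loop <- function(n_paths, n_steps, s0 = 100,
                            mu = 0.05, sigma = 0.2, dt = 1 / 252) {
      s <- matrix(s0, n_paths, n_steps + 1)
      for (i in seq_len(n_paths))
        for (j in seq_len(n_steps))
          s[i, j + 1] <- s[i, j] *
            exp((mu - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * rnorm(1))
      s
    }

    # Vectorized across paths: one rnorm(n_paths) draw per time step.
    mc_gbm_vec <- function(n_paths, n_steps, s0 = 100,
                           mu = 0.05, sigma = 0.2, dt = 1 / 252) {
      z    <- matrix(rnorm(n_paths * n_steps), n_paths, n_steps)
      incr <- (mu - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * z
      s0 * cbind(1, exp(t(apply(incr, 1, cumsum))))
    }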


Even with all that, though, R is a very useful tool to have in your toolbox. But it's not exactly a computational powerhouse.


I don't know anything about the GUI options.

