The first ‘Functional Conference’ happened in Bangalore over Oct 9-11. I had been keenly looking forward to it. This is a quick post on the sessions I attended and the conference itself. As the lineup of speakers and topics shaped up on their website in the buildup to the conference, my expectations kept rising. As a younger engineer I went through the cycle of expecting too much from conferences and consequently not learning enough from what was actually on offer. Time has had a mellowing effect… I find it much better to keep an open mind, try to absorb all that is on offer, and only a little later retain what is useful and pertinent. With that mindset and approach I found ‘Functional Conference’ a very fulfilling technology experience - plenty of technical richness to absorb and enough ideas to retain for a long time.
These days I mostly program in Scala. A few weeks ago I ran into the problem of searching for data within fairly large XMLs. XPath and XQuery are the standard technologies for querying XML. JVM programmers have multiple libraries to choose from when it comes to XPath. One constraint in my problem was that the program crunching these XMLs was a long-running one. So, apart from trying to make the search fast, I had to make sure that the CPU/memory requirements stayed sane. If, on submitting an XPath search, a library forked many hundred threads and broke the XML into many hundred stubs, consuming every single ounce of CPU/RAM at my machine’s disposal, then it was simply a no-go - even if such a library turned out to be an order of magnitude faster than the rest.
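For reference, here is a minimal sketch of the kind of single-threaded, predictable-footprint search one can do with only the JDK’s built-in XPath support; the file name and expression are made up for illustration, not taken from my actual program.

```scala
import java.io.File
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList

object XmlSearch {
  def main(args: Array[String]): Unit = {
    // Parse once; the DOM stays in memory and can be queried repeatedly.
    val doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new File("orders.xml")) // hypothetical input file

    // Compile the expression once and reuse it; evaluation runs on the calling
    // thread, so the CPU/memory footprint stays bounded and predictable.
    val xpath = XPathFactory.newInstance().newXPath()
    val expr  = xpath.compile("//order[@status='shipped']/id")

    val nodes = expr.evaluate(doc, XPathConstants.NODESET).asInstanceOf[NodeList]
    (0 until nodes.getLength).foreach(i => println(nodes.item(i).getTextContent))
  }
}
```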
The latest offering of Coursera’s popular Machine Learning course by Andrew Ng started in the first week of March. The course requires GNU Octave (a free, Matlab-like environment) to be used for the assignments. Apart from solving the problems in Octave, I decided to also solve the assignments in the programming language of my choice - Scala. This is a quick post on the whys and the hows.
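As a flavour of what the Scala versions look like, here is a sketch of the linear regression cost function from the first assignment, assuming the Breeze library for linear algebra; the names and the tiny dataset are purely illustrative.

```scala
import breeze.linalg.{DenseMatrix, DenseVector}

object LinearRegression {
  // J(theta) = 1/(2m) * sum((X*theta - y)^2), as defined in the course notes.
  def cost(x: DenseMatrix[Double], y: DenseVector[Double], theta: DenseVector[Double]): Double = {
    val m = y.length
    val errors = x * theta - y        // vector of residuals
    (errors dot errors) / (2.0 * m)   // sum of squared residuals, scaled
  }

  def main(args: Array[String]): Unit = {
    // Made-up data: a bias column plus one feature, three training examples.
    val x = DenseMatrix((1.0, 1.0), (1.0, 2.0), (1.0, 3.0))
    val y = DenseVector(1.0, 2.0, 3.0)
    println(cost(x, y, DenseVector(0.0, 1.0))) // 0.0 for a perfect fit
  }
}
```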
Last Saturday I went to my first code retreat. Hosted by Multunus and conducted by the master programmer Venkat Subramaniam, it was a great learning experience. Here’s a sneak peek of the event through my eyes.
JavaEE (v5 and v6) has a commanding presence in both market share and (developer) mind share in the enterprise software world. The specifications are well thought-out, battle-tested and highly relied upon. I started using JavaEE (v5) way back in 2007 with JBoss 4.x. The latest release, JavaEE-7, which came out close to a year ago, brings with it a lot of worthy changes to the specs and implementations. To bring myself up to speed on it I went through a few books and attended a conference (JUDCon, Bangalore). But I have also been coding with and acquainting myself with Typesafe’s Scala reactive stack. These two stacks are bound to compete with each other more and more in the coming days. However, I feel they can be used in complementary ways when applications are carefully designed. The competition and challenge to JavaEE-7 stems from two tough requirements -
The Indian general elections are around the corner. For software engineers, this time around, there is data to play with to try and predict the outcome. Among all the sources, the data from the social media giants - Twitter and Facebook - is the most easily accessible for analysis. Though social media may not be the right barometer of voter sentiment in a country as big and diverse as India, it is nonetheless a very tempting datasource for anyone curious. So a couple of days ago I decided to do a small project - to simply chart the volume of tweets containing strings like “modi”, “rahul”, “kejri” and “india”. I thought just a graph of volumes would by itself be interesting to see. So here I present v1.0 of my Indian-general-elections-social-media-tracker!
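The core of the tracker is deliberately simple: bucket tweets by the keyword(s) they contain and count. A stripped-down sketch of that counting step, run over a plain list of tweet texts rather than the live Twitter stream, looks something like this.

```scala
object TweetVolumes {
  val keywords = Seq("modi", "rahul", "kejri", "india")

  // Count how many tweets mention each keyword (a tweet can count towards several).
  def volumes(tweets: Seq[String]): Map[String, Int] =
    keywords.map { kw =>
      kw -> tweets.count(_.toLowerCase.contains(kw))
    }.toMap

  def main(args: Array[String]): Unit = {
    // Made-up sample tweets, just to show the shape of the output.
    val sample = Seq(
      "Modi addresses a rally in Delhi",
      "Kejri files nomination",
      "India goes to the polls next month"
    )
    volumes(sample).foreach { case (kw, n) => println(s"$kw -> $n") }
  }
}
```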
There are plenty of laws, theorems and aphorisms out there, apart from Moore’s and Murphy’s, that computer people could use. They sometimes come in handy in meetings and emails! Just using them can, at times, mean standing on the shoulders of giants. I plan to keep this as an ongoing post… I shall keep adding to it till it gets too long!
While working on Scalog, I decided a couple of days ago to write a quick program to watch the super-trending Twitter hashtag - #ArnabVsRahul. I initially tried to follow the hashtag on TweetDeck but found that the arrival rate of new tweets simply did not allow me to read them. I wanted a way to read the tweets page by page, with each page reloading when I refresh. So I wrote a program to do just that - a Twitter stream listener! Yesterday I pushed the code to GitHub, and this is a quick post on it. The code itself can be found here.
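The interesting bit is the paging: tweets keep arriving on a background thread while the reader pulls one fixed-size page per refresh. Here is a simplified, Twitter-API-free sketch of such a buffer; the class and method names are mine for illustration, not taken from the repo.

```scala
import scala.collection.mutable

// Collects incoming tweets and hands them out one page at a time.
class PagedTweetBuffer(pageSize: Int = 20) {
  private val queue = mutable.Queue.empty[String]

  // Called by the stream listener for every new tweet that arrives.
  def push(tweet: String): Unit = synchronized { queue.enqueue(tweet) }

  // Called on every page refresh: returns (up to) one page of unread tweets.
  def nextPage(): Seq[String] = synchronized {
    (1 to math.min(pageSize, queue.size)).map(_ => queue.dequeue())
  }
}
```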
I reluctantly started to write this post some 6 months ago. As an application developer my knowledge of the internals of DBMS design was (and still is) very limited. It is one thing to work with a DBMS during development and quite another to keep it running as part of IT operations. My motivation here is to share a few specific ideas with fellow application developers. The attempt is to make a value judgement of the two systems from a development standpoint and steer clear of a value judgement of the deployed scenario. After all, DBMSs are probably at the heart of more Apps vs. Ops debates than anything else.
Over the last year I often heard my friends say the era of the MOOC was truly upon us. It was only on taking up a couple of Coursera courses that I realised it fully. They have been eye-opening many times over (and extremely rigorous). I would particularly recommend these two to anyone wanting to understand programming for the multicore, realtime, big-data world -
It was a long time ago that I read this masterpiece by the software engineering guru Peter Norvig - Teach Yourself Programming In Ten Years. Peter advises wannabe programmers to learn at least half a dozen programming languages. Taking stock of myself earlier this year, I realised I had missed out terribly. In my decade-long career, I have worked deeply in only 4 languages - C++, Java, JavaScript and Perl. And none of them strongly functional, per Peter’s advice (functional JavaScript hasn’t come to me yet). This led me to pose two questions to myself -
A couple of recent incidents triggered me to write this one. A few weeks ago, I met an old friend. A fellow software industry man. But unlike me, a people manager. As we shared our experiences in software development, my friend picked on my recently acquired MBA. Give me something to read, my friend demanded. I promised him this blog.
Over the last few months I amused myself with an interesting pursuit. I spoke to a large number of people about concurrency. I spoke to ex-colleagues. I spoke to engineers and architects at hackathons/meetups. And I interviewed a large number of senior engineers for a job at my company. I spoke to them about building a highly-concurrent, high-volume, real-time data-aggregation engine. I gave examples of simple, textbook-style projects to drive home the requirements. Like a stock trading platform with 1000s of users, 1000s of stocks and 100s of stock exchanges. Or an IPL ticket bidding site with 1000s of users, many seating categories, many venues etc. This small article is about my perspectives at the end of it.
I have been dabbling with Scala for a few months now. One of the things that strikes me about functional programming is the beauty of the finished code. It sometimes feels like just the right mix of art and science! Gone are the dirty null/empty-checking if statements. Gone are the dumb for/while loops. I haven’t progressed far enough to be using actors, but the very thought that variables in my program are not getting mutated while being thrashed across many cores and caches is enough to sometimes give me a high!
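To make that concrete, here is a tiny, made-up example: the kind of thing that would be a null-checked for loop in Java collapses into a collection pipeline, and a possibly-absent result becomes an Option instead of a null.

```scala
// `User` is a made-up domain class, purely for illustration.
case class User(name: String, active: Boolean)

// Instead of looping with index variables and null checks:
def activeNames(users: List[User]): List[String] =
  users.filter(_.active).map(_.name)

// Instead of returning null when nothing matches:
def firstActiveName(users: List[User]): Option[String] =
  users.find(_.active).map(_.name)
```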
I did my engineering in electronics and communication systems. But my very first job was in software development. Having not studied theory of computation, databases, compilers or even algorithms/data-structures as part of my graduation, I went on to self-study these. However, deep down, I have felt the need for more structured education. I don’t remember when I first heard of Coursera. But my early tryst with online education had been dismal (at my previous employer they would make me go through online trainings mandatorily… and those used to absolutely suck). So even as I kept track of the courses offered on Coursera since early this year, I did not enroll. A couple of months ago I decided to give it a serious try… and I enrolled for the first course on Algorithms by Professor Robert Sedgewick. I finished my final exam for the course yesterday. And it feels great to be done with all the tests and programming assignments. The course was structured in the undergraduate-training way… which is exactly what I wanted. The learning has been enormous. Anyone who has spent a decade in software development like me would know MergeSort and QuickSort anyway… but the scientific treatment of the subject, both in the videos and the textbook, gives me a sense of closure. And by the way, I think algorithms and data-structures is a field which a practicing engineer has to seriously brush up on every few years, just to keep up…
I had been to the wonderful JavaScript conference JSFoo last week. The tremendous enthusiasm in the web development community for server-side JavaScript was on full display. Personally, I have spent a lot of time coding visualizations with JavaScript. However, only recently did I write some tidbits of code with NodeJS. And I hadn’t spent any time properly studying it. The conference has spurred me to do better. I started reading up on NodeJS and working on a small project to create my first non-trivial NodeJS application (which I shall share in this blog). But those details are for a little later… let me start with JSFoo…
Most web applications have the well-known 3-tiered structure - WebTier > ApplicationTier > DataTier. Both the WebTier and the ApplicationTier have a web-layer to parse incoming HTTP requests. It’s in the WebTier that one deploys load-balancing (L4/L7) routers like Apache/Nginx or NetScaler-like appliances. HTTP requests are forwarded by the WebTier to the ApplicationTier, which is generally served by a much bigger farm of servers. The web-layer in the ApplicationTier is the focus of this blog. It’s a challenging area of software development for the following reasons and more -
Graph depictions are common for problems like computer networks, social networks etc. Some time ago, I came across the use case of representing software application topologies as graphs. This post covers the few things I discovered on the topic of application topologies and their graphical representation.
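To make “application topology as a graph” concrete: the nodes are deployed components and the edges are the calls between them. A toy adjacency-map sketch, with component names invented for illustration:

```scala
object AppTopology {
  // Each component maps to the components it calls.
  val topology: Map[String, Set[String]] = Map(
    "load-balancer" -> Set("web-1", "web-2"),
    "web-1"         -> Set("app-server"),
    "web-2"         -> Set("app-server"),
    "app-server"    -> Set("postgres", "cache")
  )

  // Everything reachable from a component, e.g. to estimate a failure's blast radius.
  def reachable(from: String): Set[String] = {
    def go(frontier: Set[String], seen: Set[String]): Set[String] =
      if (frontier.isEmpty) seen
      else {
        val next = frontier.flatMap(topology.getOrElse(_, Set.empty)) -- seen
        go(next, seen ++ next)
      }
    go(Set(from), Set.empty)
  }

  def main(args: Array[String]): Unit =
    println(reachable("load-balancer")) // web-1, web-2, app-server, postgres, cache
}
```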
I read this beautifully written article a few days ago - “I will not do your tech interview”. I couldn’t agree more with the author. Every single time I have had to give or take a technical interview, more than a sense of being inadequately prepared, I feel like I am carrying an inexplicable psychological burden. And I have met no one who does not fear what Ellis beautifully calls the “bear-trap of a stupid brainteaser” :-).
My champion-hacker friend Sumanth and I spent a little time a few weeks ago digging into whether there is a data-collection challenge for system and application health metrics at a typical small data center. Here is the little that we discovered…
I have built real-time ‘stock-ticker’ like dashboards before. There are many ways to build them. A few months ago I had the opportunity to design one afresh for an enterprise product. I did a quick sweep of the different technology stacks that can be used to build a highly scalable (design/code and performance scalability) real-time dashboard. Many technologies for real-time in the browser (like BlazeDS) are either outdated or on their way out. I came across this very interesting presentation, code and blog by Charles Moulliard, which I found to be a very exciting design. So I sat down to extend what Charles had done to suit my usecase. I would recommend this nice book by Apress as a good introduction to the subject of WebSockets. But before getting to the real usecase and seeing why one would use Camel or ActiveMQ, here is a quick primer on the different techniques one could use to build a real-time dashboard.
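To anchor the WebSocket part of that primer, here is what a bare JSR-356 server endpoint looks like when written in Scala. It is only a sketch of the mechanism - not the Camel/ActiveMQ pipeline from Charles’ design - and the endpoint path and payload format are made up.

```scala
import javax.websocket.{OnMessage, OnOpen, Session}
import javax.websocket.server.ServerEndpoint

// Deployable in any JSR-356 container (Tomcat 7+, Jetty 9+, any JavaEE-7 server).
@ServerEndpoint("/ticker")
class TickerEndpoint {

  @OnOpen
  def onOpen(session: Session): Unit =
    println(s"dashboard client connected: ${session.getId}")

  // The browser subscribes by sending a symbol; the server pushes quotes back
  // over the same long-lived connection (here just a dummy reply for illustration).
  @OnMessage
  def onSubscribe(symbol: String, session: Session): Unit =
    session.getBasicRemote.sendText(s"""{"symbol":"$symbol","price":0.0}""")
}
```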
Doing Java early in the morning makes for a good day. I got up early today to put together a few slides for a talk to developer folk. Not comprehensive. Maybe not even very accurate. And much too opinionated. If you don’t mind my ego you may click here.
I have been a Dojo user for many years now. I also use many JavaScript libraries (jQuery, Backbone, Bootstrap, D3, Highsoft) all the while, but Dojo is what I really love. I would not embark on any “professional” development work without being armed with Dojo. But I’ll save my opinions and comparisons of different JS libraries for a different blog. Here the context is how to “build” Dojo. After all, every professional project should do a build of its JS - compilers like Google Closure can find bugs, obfuscate, and eventually make execution faster.
A few years ago I was a product developer at a big software (but non-database) company. We were writing v2 of a new product after a fairly successful development round of v1. For everything OLTP, we used the wonderful open-source database - Postgres. But by v2, we had new, high-volume data like NetFlow coming in. This would have severely tested Postgres’s scalability and read/write performance. And we had some datawarehousing and OLAP requirements too. A hard look at our queries told us that column-stores would be a great fit. Looking back, the options for a new product to store and query massive data volumes boiled down to these few -
For the last couple of years I have been in search of theories of Data Visualization - wanting to educate myself on the fundamentals. My search has taken me to many books and blogs. But none as remarkable as Edward Tufte’s seminal work on the subject. This is a short refresher of the core concepts. Even as I write it for myself, it may be of some use to a passing busy programmer.