Carpe diem (Felix's blog)

I am a happy developer

Writing a Damn Fast Hash Table With Tiny Memory Footprints

Hash table is probably the most commonly used data structure in software industry. Most implementations focus on its speed instead of memory usage, yet small memory footprint has significant impact on large in-memory tables and database hash indexes.

In this post, I”ll provide a step by step guide for writing a modern hash table that optimize for both speed and memory efficiency. I’ll also give some mathematical bounds on how well the hash table could achieve, and shows how close we are to the optimal.

Autoconf Tutorial Part-3

In this post I’ll show an example of how to write a cross-plaform OpenGL program. We’ll explore more autoconf features, including config.h, third party libraries, and many more.

Autoconf Tutorial Part-2

This is the second post of the autoconf tutorial series. In this post I’ll cover some fundamental units in autoconf and automake, and an example cross platform X11 program that uses the concepts in this post. After reading this post, you should be able to write your own build script for small scope projects.

Autoconf Tutorial Part-1

It’s been more than a year since my last update to my blog. I learnt a lot new stuffs in last year, but was too busy on work to write down what I’ve learnt. Luckily I got some breaks recently, and I’ll pick up some of the posts that I’ve wanted to write about. First I’ll start with a autoconf tutorial series. This is one of the difficult material to learn, but I’ll try to re-bottle it to make it more accessible to everyone.

Writing 64 Bit Assembly on Mac OS X

Many assembly tutorials and books doesn’t cover how to write a simple assembly program on the Mac OS X. Here are some baby steps that can help people who are also interested in assembly to get started easier.

Integer Promotion Part 2

It’s been a while since I wrote my last article. I recently read an old book Expert C Programming and found there are many C langauge details that I never think about. Review and rethink what C integer promotion rules meant to C is one of the interesting topic that I want to share with you.

Introducing Hadoop-FieldFormat

Hadoop FieldFormat is the new library I released that is flexible and robust for reading and setting schema information in Hadoop map-reduce program. We use this library to record the meta information for the data, and improve the semantic when building large map-reduce pipe-lined tasks. The project is quite stable now and we already used it in our production system. Any suggestion is welcome!

Hadoop Performance Tuning Best Practices

I have been working on Hadoop in production for a while. Here are some of the performance tuning tips I learned from work. Many of my tasks had performance improved over 50% in general. Those guide lines work perfectly in my work place; hope it can help you as well.

Setting Up Jasper Server on Linux

Jasper is one of the standard report generator in the industry. However, setting up Jasper is a pain of ass. This post is my note for setting up Jasper on Linux, in case I have to do it again in the future…

Capture Path Info in Hadoop InputFormat Class

On the last post I presented how to use Mapper context object to obtain Path information. This is a nice way to hack for ad-hoc jobs; however, it’s not really reusable and abstract. In this post, I’ll show you how to subclass Text, TextInputFormat, and LineRecordReader and create reusable components across all of your hadoop tasks.