Carpe diem (Felix's blog)

I am a happy developer

Autoconf Tutorial Part-3

In this post I’ll show an example of how to write a cross-plaform OpenGL program. We’ll explore more autoconf features, including config.h, third party libraries, and many more.

Autoconf Tutorial Part-2

This is the second post of the autoconf tutorial series. In this post I’ll cover some fundamental units in autoconf and automake, and an example cross platform X11 program that uses the concepts in this post. After reading this post, you should be able to write your own build script for small scope projects.

Autoconf Tutorial Part-1

It’s been more than a year since my last update to my blog. I learnt a lot new stuffs in last year, but was too busy on work to write down what I’ve learnt. Luckily I got some breaks recently, and I’ll pick up some of the posts that I’ve wanted to write about. First I’ll start with a autoconf tutorial series. This is one of the difficult material to learn, but I’ll try to re-bottle it to make it more accessible to everyone.

Writing 64 Bit Assembly on Mac OS X

Many assembly tutorials and books doesn’t cover how to write a simple assembly program on the Mac OS X. Here are some baby steps that can help people who are also interested in assembly to get started easier.

Integer Promotion Part 2

It’s been a while since I wrote my last article. I recently read an old book Expert C Programming and found there are many C langauge details that I never think about. Review and rethink what C integer promotion rules meant to C is one of the interesting topic that I want to share with you.

Introducing Hadoop-FieldFormat

Hadoop FieldFormat is the new library I released that is flexible and robust for reading and setting schema information in Hadoop map-reduce program. We use this library to record the meta information for the data, and improve the semantic when building large map-reduce pipe-lined tasks. The project is quite stable now and we already used it in our production system. Any suggestion is welcome!

Hadoop Performance Tuning Best Practices

I have been working on Hadoop in production for a while. Here are some of the performance tuning tips I learned from work. Many of my tasks had performance improved over 50% in general. Those guide lines work perfectly in my work place; hope it can help you as well.

Setting Up Jasper Server on Linux

Jasper is one of the standard report generator in the industry. However, setting up Jasper is a pain of ass. This post is my note for setting up Jasper on Linux, in case I have to do it again in the future…

Capture Path Info in Hadoop InputFormat Class

On the last post I presented how to use Mapper context object to obtain Path information. This is a nice way to hack for ad-hoc jobs; however, it’s not really reusable and abstract. In this post, I’ll show you how to subclass Text, TextInputFormat, and LineRecordReader and create reusable components across all of your hadoop tasks.

Capture Directory Context in Hadoop Mapper

I have been using hadoop for data processing and datawarehousing for a while. One of the problem we encountered was map-reduce framework abstracts the input from files to lines, and thus it’s really difficult to apply logic based on different file or directories. Things got worse when we need to aggregate data across various versions of input sources. After digging in Hadoop source code, here is my solution.