There are many programming languages that support recursion. However, recursion is hard to write, and even harder to write well.
To iterate is human, to recurse, divine.
If you survey the functional programming world deeply, you will find a lot of patterns: tail recursion, continuation-passing style, combinations of higher-order functions, the fixed-point Y combinator, memoization, and many more. A simple factorial can be written in 23 different forms. How do we pick a good one?
In short, the only pattern you should use heavily is tail recursion. Use other patterns only when necessary.
Straight (body) recursion: the problem
Let’s begin with the two most commonly seen recursive functions: factorial and fibonacci.
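In Haskell, the familiar body-recursive definitions look roughly like this (a sketch; note that each recursive call's result is used inside a larger expression, so the call is not in tail position):

```haskell
-- Body recursion: each call must return to its caller,
-- which still has a multiplication (or addition) left to do.
fac :: Integer -> Integer
fac 0 = 1
fac n = n * fac (n - 1)

-- Naive fibonacci, starting 1, 1, 2, 3, 5, 8, ...
fib :: Integer -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)
```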

When you are programming in a functional style, keep the “non-functional” execution model in mind.
fac (5) = 5 * fac (4)
= 5 * 4 * fac (3)
= 5 * 4 * 3 * fac (2)
= 5 * 4 * 3 * 2 * fac (1)
= 5 * 4 * 3 * 2 * 1 * fac (0)
= 5 * 4 * 3 * 2 * 1 * 1
= 5 * 4 * 3 * 2 * 1
= 5 * 4 * 3 * 2
= 5 * 4 * 6
= 5 * 24
= 120
The problem is that the function has to use the stack to hold each number and multiply it by the value returned by the recursive call.^{1}
Also look at the fibonacci execution model:
fib (5) = fib (4) + fib (3)
= (fib (3) + fib (2)) + (fib (2) + fib (1))
= ((fib (2) + fib (1)) + (fib (1) + fib (0))) + ((fib (1) + fib (0)) + fib (1))
= ...
This is even worse: the naive fibonacci costs $O(\phi^n)$ time, where $\phi=\frac{1+\sqrt{5}}{2}$ is the golden ratio.
Whenever you use a returned value in your function body, there is a cost.
We can rewrite both factorial and fibonacci in tail-recursive style using accumulators.
Tail recursion
The common way to translate a body recursion into a tail recursion is to add an accumulator to the argument list.
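A tail-recursive factorial, sketched in Haskell with the accumulator as a second argument (called as fac n 1, matching the trace below):

```haskell
-- Tail-recursive factorial: the accumulator holds the partial
-- product, so the recursive call is the very last operation.
fac :: Integer -> Integer -> Integer
fac 0 acc = acc
fac n acc = fac (n - 1) (n * acc)
```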

fac 5 1 = fac 4 5
= fac 3 20
= fac 2 60
= fac 1 120
= fac 0 120
= 120
We reduce the number of execution steps from $2n$ to $n$, and there are no stack variables any more! Moreover, at the assembly level the call can simply goto the front of the function, with no need to set up a new stack frame.
Fibonacci can be transformed into a tail-recursive function like this^{2}:
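A sketch in Haskell, with the two accumulators seeded with 1 and 1, matching the trace below:

```haskell
-- Tail-recursive fibonacci: n counts down while f1 and f2
-- slide along the sequence.
fib' :: Integer -> Integer -> Integer -> Integer
fib' 0 f1 _  = f1
fib' n f1 f2 = fib' (n - 1) f2 (f1 + f2)
```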

fib' 5 1 1 = fib' 4 1 2
= fib' 3 2 3
= fib' 2 3 5
= fib' 1 5 8
= fib' 0 8 13
= 8
This time we use two accumulators, f1 and f2, to record the state and make it more “iterative”. The original input n is treated as a “counter.”
Use multiple accumulators to make double recursion (like fibonacci) tail recursive
Different functional programming languages let you abstract the interface and hide the accumulators in their own suggested ways:
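In Haskell, for example, the accumulators can be hidden in a where clause so that callers see only a one-argument function (a sketch; the helper go carries the accumulators):

```haskell
-- Public interface takes one argument; the accumulators are
-- an implementation detail hidden in the where clause.
fib :: Integer -> Integer
fib n = go n 1 1
  where
    go 0 f1 _  = f1
    go k f1 f2 = go (k - 1) f2 (f1 + f2)
```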

Returning a list
When using tail recursion, we can also construct and return a list instead of an atom:
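For instance, a Haskell sketch (the function name upto is my own, for illustration) that builds the list [1..n] tail-recursively with (:):

```haskell
-- Build the result with (:) onto an accumulator; counting down
-- while consing yields the list in ascending order.
upto :: Integer -> [Integer]
upto n = go n []
  where
    go 0 acc = acc
    go k acc = go (k - 1) (k : acc)
```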

Or, in Common Lisp:

(defun upto (n &optional acc)   ; a sketch: build the list with CONS
  (if (zerop n)
      acc
      (upto (1- n) (cons n acc))))
The key is to use cons (: in haskell, | in erlang) instead of the list concatenation operator (++ in haskell and erlang). Otherwise the left-hand side of the operator would be copied again and again, causing quadratic space and time complexity.
In common practice, use cons to build a reversed list, then reverse it at the end of the recursion
For example, if we want to return a list of fibonacci numbers^{3}:
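A sketch in Haskell following that practice: cons each Fibonacci number onto the accumulator, then reverse once at the end (the name fibs is assumed):

```haskell
-- Two accumulators drive the sequence; a third collects the
-- numbers in reverse, and one final reverse fixes the order.
fibs :: Int -> [Integer]
fibs n = reverse (go n 1 1 [])
  where
    go 0 _  _  acc = acc
    go k f1 f2 acc = go (k - 1) f2 (f1 + f2) (f1 : acc)
```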

Manipulating a tree
This is the best part of this article; it might be a little hard to understand at first, but once you get the idea, you’ll love it as I did. I first saw this idea in Paul Graham’s On Lisp, where he named the pattern “doubly recursion.” Later on I saw people use this pattern in Quicksort (Erlang) – LiteratePrograms without naming it. For convenience I’ll use the term “doubly recursive” to express the idea :)
The key idea of doubly recursive is to use a returned accumulator as another recursion’s accumulator
I wanted to write a flattening function that takes a nested list and returns a flat one in Haskell; however, lists of different depths are different types, so you cannot simply write one general function to deal with them all. So I turned back to Lisp to express it:
(defun flatten (x)
  (labels ((rec (x acc)             ; REC returns the flattened list in reverse
             (cond ((null x) acc)
                   ((atom x) (cons x acc))
                   (t (rec (cdr x) (rec (car x) acc))))))
    (nreverse (rec x nil))))
Note that labels is like a where clause in haskell. The rec function takes an input x and an accumulator, and returns the flattened list in reversed order.
You can also apply this pattern to quicksort^{4}:
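A Haskell sketch of the doubly recursive quicksort: the sorted larger half (with the accumulator already appended) becomes the accumulator for sorting the smaller half:

```haskell
import Data.List (partition)

-- Doubly recursive quicksort: the result of one recursive call
-- (go larger acc) is fed in as the accumulator of the other.
qsort :: Ord a => [a] -> [a]
qsort xs = go xs []
  where
    go []     acc = acc
    go (p:ys) acc = go smaller (p : go larger acc)
      where
        (smaller, larger) = partition (< p) ys
```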

Other patterns
1. Continuation-passing style (CPS)
CPS means passing an exit function (the continuation) to a function and letting the function call it at the end. The concept is similar to tail recursion, but I think it is better to “make no assumptions about what users will do with your functions.” Leave your function interface simple and elegant, because using CPS is a micro-optimization.
CPS is a powerful technique, but you must use it wisely. For example, you can use it in regular-expression backtracking or in the generation step of a garbage collector. In those examples the function has to jump back to somewhere else in the control flow, and CPS gets the job done nicely and cleanly.
But please don’t use CPS like this:
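A sketch of the kind of code meant here: fibonacci in CPS, where every call receives a continuation k but the call tree still branches in two:

```haskell
-- CPS fibonacci: the continuations make every call a tail call,
-- yet the work (and the chain of closures) is still exponential.
fibk :: Integer -> (Integer -> r) -> r
fibk 0 k = k 1
fibk 1 k = k 1
fibk n k = fibk (n - 1) (\a -> fibk (n - 2) (\b -> k (a + b)))
```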

The CPS version above will still cost $O(\phi^n)$ time and space to do the calculation!
2. Fixed-point Y combinator
If you don’t know about the Y combinator, just skip this section.
The Evolution of a Haskell Programmer suggests that the fixed-point Y combinator is the fastest way to write factorial in haskell, even faster than tail recursion. However, it depends: in my benchmark it made no difference for the factorial function. And why would you want to make your function hard to debug by wrapping it in a Y combinator? It is even hard to trace the calls with Lisp’s trace command.
3. Memoization
Memoization is also a powerful technique that can speed up repeated function calls; it trades memory for speed. As with CPS, use it wisely and not just for cool.
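As an illustration (a common Haskell idiom, not code from the sections above), fibonacci can be memoized through a lazily built, shared list:

```haskell
-- Memoized fibonacci: 'table' is computed lazily and shared,
-- so each value is calculated only once.
fibMemo :: Int -> Integer
fibMemo = (table !!)
  where
    table = map f [0 ..]
    f 0 = 1
    f 1 = 1
    f k = fibMemo (k - 1) + fibMemo (k - 2)
```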
Conclusion
There are not many design patterns in functional programming. I hope these guidelines can be a start:
- Whenever you use a returned value in your function body, there is a cost.
- Use multiple accumulators to make double recursion (like fibonacci) tail recursive.
- In common practice, use cons to build a reversed list, then reverse it at the end of the recursion.
- The key idea of doubly recursive is to use a returned accumulator as another recursion’s accumulator.
- CPS and memoization are powerful, but use them wisely.
Any discussion and suggestions are welcome! Thanks!

In modern compilers, trivial straight recursion such as factorial will be optimized, making no big performance difference compared to the tail-recursive form.↩

Take a look at how many possible ways there are to write fibonacci in common lisp. There is also a way to compute fibonacci in $O(\log n)$ time.↩

The example is taken from Erlang performance tuning – List handling.↩

Example taken from Quicksort (Haskell) – LiteratePrograms.↩