Lecture 6
Intertemporal Labor Supply - Continued
Outline
1. aside on method of moments estimation, in preparation for PS#4
2. structural estimation of intertemporal labor supply - a short introduction
References
For estimation of micro wage models:
John Abowd and David Card (1989). "On the Covariance Structure of
Earnings and Hours Changes". Econometrica 57 (2): 411-445.
Joseph G. Altonji and Lewis Segal (1996). "Small-Sample Bias in GMM Estimation of Covariance Structures". Journal of Business & Economic Statistics 14 (3): 353-366. We can only touch on this topic: for a lot more, see Econ 244.
For labor supply:
Pierre-Olivier Gourinchas and Jonathan Parker (2002). "Consumption Over the Life Cycle". Econometrica 70 (1): 47-89.
Eric French. "The Effects of Health, Wealth and Wages on Labor Supply and Retirement Behavior". Review of Economic Studies 73: 395-427.
Michael Keane and Kenneth Wolpin (1994). "The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence". Review of Economics and Statistics 76 (November): 648-672.
Michael Keane, Petra Todd and Kenneth Wolpin. "The Structural Estimation of Behavioral Models: Discrete Choice Dynamic Programming Methods and Applications". Chapter 4 in Handbook of Labor Economics, Volume 4a.
Part 1: Background on Method of Moments Estimation
As discussed in lecture 4, an important question for interpreting the reaction
of hours to wage changes is to what extent wage innovations are expected to
persist. Pistaferri assumes that innovations are "permanent": i.e., that an
appropriate model for individual wages is:
$$\log w_{it} = \omega_i + u_{it}, \qquad u_{it} = u_{it-1} + \zeta_{it},$$
where the $\zeta_{it}$'s are uncorrelated over time. This is a "pure random walk" model, in which $E[\log w_{it+j} \mid \log w_{it}] = \log w_{it}$. A more general model is
$$\log w_{it} = \omega_i + x_{it}\beta_t + u_{it} + e_{it}, \qquad u_{it} = \rho u_{it-1} + \zeta_{it},$$
where $e_{it}$ and $\zeta_{it}$ are serially uncorrelated and uncorrelated with each other.
This model includes a fixed component $\omega_i$, a component attributable to observables $x_{it}$, an AR(1) component $u_{it}$, and a "purely transitory" component $e_{it}$. We will discuss how to estimate the parameters of this model using simple method of moments. A standard method is to first regress $\log w_{it}$ on $x_{it}$, and treat the residuals $r_{it}$ as estimates of the combined error component $\omega_i + u_{it} + e_{it}$. (There is a more sophisticated approach which we may discuss briefly in class.) Then we form the covariance matrix $C$ of the residuals and fit a model to the vector of elements of $C$. Let
$$\sigma^2_\omega = \text{var}[\omega_i], \qquad \sigma^2_{u0} = \text{var}[u_{i0}], \qquad v_t = \text{var}[\zeta_{it}].$$
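Notice that we can write
$$r_{it} = \omega_i + \rho^{t}u_{i0} + \rho^{t-1}\zeta_{i1} + \dots + \rho\,\zeta_{it-1} + \zeta_{it} + e_{it},$$
which implies that
$$\text{var}[r_{i1}] = \sigma^2_\omega + \rho^{2}\sigma^2_{u0} + v_1 + \text{var}[e_{i1}],$$
$$\text{var}[r_{it}] = \sigma^2_\omega + \rho^{2t}\sigma^2_{u0} + v_t + \rho^{2}v_{t-1} + \dots + \rho^{2(t-1)}v_1 + \text{var}[e_{it}],$$
$$\text{cov}[r_{it}, r_{is}] = \sigma^2_\omega + \rho^{s+t}\sigma^2_{u0} + \rho^{t-s}v_s + \rho^{t-s+2}v_{s-1} + \dots + \rho^{s+t-2}v_1, \qquad (s < t).$$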
The term $\sigma^2_{u0}$ represents an "initial conditions" effect: it is the effect of the dispersion in the pre-sample value of $u_{it}$, which gradually fades out if $\rho < 1$. It is a matter of algebra to show that if $\text{var}[e_{it}]$ is constant, all the $v_t$'s are constant (i.e., $v_t = v$), and $\sigma^2_{u0} = v/(1-\rho^2)$ (its "steady state" value), then the variances of $r_{it}$ are all constant. If $\text{var}[e_{it}]$ and all the $v_t$'s are constant but $\sigma^2_{u0} < v/(1-\rho^2)$, the variances of $r_{it}$ rise over time. A model with learning, in which people enter the labor market getting paid roughly the same wage and employers gradually learn who is who, will lead to a pattern like this of rising cross-sectional variation in wages.
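
The variance and covariance formulas above can be checked numerically. The following is a minimal sketch, assuming constant innovation variance (v_t = v) and constant transitory variance; the function name `implied_cov` and all parameter values are hypothetical choices for illustration, not part of the lecture.

```python
def implied_cov(T, sigma2_omega, sigma2_u0, rho, v, sigma2_e):
    """Model-implied covariance matrix of r_i1..r_iT.

    Assumes v_t = v and var[e_it] = sigma2_e for all t (illustrative case).
    """
    C = [[0.0] * T for _ in range(T)]
    for t in range(1, T + 1):
        for s in range(1, t + 1):
            # Innovations zeta_i1..zeta_is appear in both r_is and r_it (s <= t).
            shared = sum(rho ** (s + t - 2 * k) * v for k in range(1, s + 1))
            cov = sigma2_omega + rho ** (s + t) * sigma2_u0 + shared
            if s == t:
                cov += sigma2_e  # the transitory component only enters variances
            C[t - 1][s - 1] = C[s - 1][t - 1] = cov
    return C


rho, v = 0.8, 0.05
# Initial dispersion at its steady-state value: variances should be flat.
steady = implied_cov(5, 0.10, v / (1 - rho ** 2), rho, v, 0.02)
# No initial dispersion (everyone starts at the same u): variances should rise.
low_start = implied_cov(5, 0.10, 0.0, rho, v, 0.02)
print([round(steady[t][t], 4) for t in range(5)])
print([round(low_start[t][t], 4) for t in range(5)])
```

The first printed list is flat and the second is increasing, which is the "learning" pattern of rising cross-sectional dispersion described above.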
In general we can write
$$\text{vecltr}[C] = m = f(\theta),$$
where $\theta$ represents the parameters in the wage generating model. The method of moments idea is to find a value for $\theta$ that gives the "best fit" to the empirical estimates of $m$. Call $\hat m$ the estimate of $m$. In general an element of $\hat m$ is some term in the empirical covariance matrix $\hat C$, say
$$\hat m_k = \widehat{\text{cov}}[r_{it}, r_{is}] = \frac{1}{N}\sum_i r_{it}r_{is} = \frac{1}{N}\sum_i m_{ki}$$
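(since the residuals have zero mean by construction we do not have to deviate from means). The variance of $m_{ki}$ across individuals can be estimated by
$$\frac{1}{N}\sum_i (m_{ki} - \hat m_k)^2,$$
which is just the variance of the second moment in the sample; dividing it by $N$ gives the sampling variance of the element $\hat m_k$. Likewise, the sampling covariance between estimates of any two elements $\hat m_k$ and $\hat m_h$ is obtained by dividing by $N$ the sample covariance
$$\frac{1}{N}\sum_i (m_{ki} - \hat m_k)(m_{hi} - \hat m_h).$$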
Under regularity conditions (basically, iid sampling and finite fourth moments), the vector of estimates of the second moments will be asymptotically normal, with
$$\sqrt{N}(\hat m - m) \to N(0, V).$$
Moreover, the matrix
$$\hat V = \frac{1}{N}\sum_i (m_i - \hat m)(m_i - \hat m)'$$
is a consistent estimate of $V$.
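
As an illustration of these formulas, here is a minimal sketch that builds the moment vector and its estimated variance matrix from a panel of residuals. The function name `moment_estimates`, the row-by-row ordering of the lower-triangular elements, and the toy data are assumptions made for the example.

```python
def moment_estimates(resid):
    """Return (m_hat, V_hat) for the lower-triangular second moments.

    resid: list of N rows, each a list of T residuals r_i1..r_iT, assumed to
    have zero cross-sectional mean in every period.
    """
    N, T = len(resid), len(resid[0])
    # One possible vecltr ordering: (1,1), (2,1), (2,2), (3,1), ...
    pairs = [(t, s) for t in range(T) for s in range(t + 1)]
    K = len(pairs)
    # m_i[i][k] is person i's contribution r_it * r_is to moment k.
    m_i = [[row[t] * row[s] for (t, s) in pairs] for row in resid]
    m_hat = [sum(m[k] for m in m_i) / N for k in range(K)]
    V_hat = [[sum((m[k] - m_hat[k]) * (m[h] - m_hat[h]) for m in m_i) / N
              for h in range(K)] for k in range(K)]
    return m_hat, V_hat


# Toy panel: N = 4 people, T = 2 periods, each column has mean zero.
resid = [[1.0, 0.5], [-1.0, -0.5], [0.5, -1.0], [-0.5, 1.0]]
m_hat, V_hat = moment_estimates(resid)
# Sampling variance of each element of m_hat is the diagonal of V_hat over N.
samp_var = [V_hat[k][k] / len(resid) for k in range(len(m_hat))]
print(m_hat, samp_var)
```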
For estimation, one simple choice is "least squares"
$$\min_{\theta}\; [\hat m - f(\theta)]'[\hat m - f(\theta)].$$
Various GLS variants are also possible. Consider a positive definite matrix $A$ (of the right dimension): then we can use the objective
$$\min_{\theta}\; [\hat m - f(\theta)]'A[\hat m - f(\theta)]. \qquad (1)$$
Chamberlain (1982) presented the following theorem. Assume:
1. $\hat m \to f(\theta_0)$ almost surely
2. $f$ is continuous in $\theta$ in some neighborhood $\Theta$ that contains $\theta_0$
3. $f(\theta) = f(\theta_0)$ for $\theta$ in $\Theta$ $\Rightarrow$ $\theta = \theta_0$ (i.e., we have identification)
4. $A \to \Psi$, a positive definite matrix
Then the GLS estimator $\hat\theta$ based on equation (1) converges almost surely to $\theta_0$.
If in addition:
5. $\sqrt{N}(\hat m - f(\theta_0)) \to N(0, V)$
6. $f$ is twice continuously differentiable for $\theta$ in some neighborhood of $\theta_0$, and $F = F(\theta_0) \equiv \partial f(\theta_0)/\partial\theta'$ has full rank,
then
$$\sqrt{N}(\hat\theta - \theta_0) \to N(0, \Lambda),$$
where
$$\Lambda = (F'\Psi F)^{-1}F'\Psi V\Psi F(F'\Psi F)^{-1}.$$
It can also be shown that the "optimal" choice for $A$ is one such that $A \to V^{-1}$, in which case $\Lambda = (F'V^{-1}F)^{-1}$. Notice that the "least squares" choice $A = I$ leads to the var-cov:
$$\Lambda_{ols} = (F'F)^{-1}F'VF(F'F)^{-1},$$
which looks just like the variance matrix you get in a regression model with non-spherical errors when you use OLS. In applications we need to estimate $F$ and $V$: we will use $\hat F = F(\hat\theta)$ and some estimate $\hat V$ of $V$.
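
To make the least squares estimator and its sandwich variance concrete, here is a minimal sketch for a deliberately simple special case that is assumed only for illustration: residuals with a fixed effect and a transitory error but no AR(1) component, so that the moments are linear in the two parameters (var of the fixed effect, var of the transitory error) and the minimizer has a closed form. All function names are hypothetical.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def ls_min_distance(m_hat, V_hat, N, T):
    """Least squares (A = I) fit of m = F theta, theta = (var[omega], var[e]).

    Illustrative special case: r_it = omega_i + e_it, so variances equal
    var[omega] + var[e] and all covariances equal var[omega].
    """
    pairs = [(t, s) for t in range(T) for s in range(t + 1)]
    # Row k of F holds the derivatives of moment k with respect to theta.
    F = [[1.0, 1.0 if t == s else 0.0] for (t, s) in pairs]
    Ft = transpose(F)
    bread = inv2(matmul(Ft, F))                       # (F'F)^{-1}
    theta = matmul(bread, matmul(Ft, [[m] for m in m_hat]))
    meat = matmul(matmul(Ft, V_hat), F)               # F'VF
    avar = matmul(matmul(bread, meat), bread)         # sandwich variance
    se = [(avar[j][j] / N) ** 0.5 for j in range(2)]  # standard errors
    return [row[0] for row in theta], se


# Moments generated exactly by theta_0 = (0.10, 0.02) with T = 3.
m_true = [0.12, 0.10, 0.12, 0.10, 0.10, 0.12]
V_toy = [[0.01 if k == h else 0.0 for h in range(6)] for k in range(6)]
theta_hat, se = ls_min_distance(m_true, V_toy, N=100, T=3)
print(theta_hat, se)
```

With a nonlinear $f(\theta)$, such as the AR(1) model above, the same sandwich formula applies with $F$ evaluated at the estimate, but the minimization itself has to be done numerically.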
A nice feature of the "optimal" weight matrix is that under the null, the minimized value of the objective,
$$N[\hat m - f(\hat\theta)]'V^{-1}[\hat m - f(\hat\theta)],$$
has an asymptotic $\chi^2$ distribution, with degrees of freedom equal to the difference between the number of moments and the number of elements of $\theta$. This provides a general specification test of the validity of the model $m = f(\theta)$. For other weighting matrices there is a similar overall goodness of fit statistic:
$$N[\hat m - f(\hat\theta)]'R^{-}[\hat m - f(\hat\theta)],$$
where $R^{-}$ is a generalized inverse of the matrix $R = (I - F(F'AF)^{-1}F'A)\,V\,(I - F(F'AF)^{-1}F'A)'$. (This matrix has rank at most equal to the difference between the number of moments and the number of columns of $F$, which is the number of elements in $\theta$.)
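
As a worked example of the degrees-of-freedom count, suppose (purely for illustration) that there are $T = 10$ periods of residuals and that the wage model is the version with constant innovation and transitory variances, so the parameter vector has five elements. Then:

```latex
\[
\#\text{moments} = \frac{T(T+1)}{2} = \frac{10 \cdot 11}{2} = 55, \qquad
\theta = (\sigma^2_\omega,\ \sigma^2_{u0},\ \rho,\ v,\ \sigma^2_e), \qquad
\text{d.f.} = 55 - 5 = 50 .
\]
```

Here $\sigma^2_e$ denotes the constant value of $\text{var}[e_{it}]$, a label introduced only for this example.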
As a practical matter the "optimal" choice for the weighting matrix can lead to substantial problems in small samples. This was not well understood at the time of Abowd-Card, but was pointed out in the paper by Altonji and Segal. It is generally agreed that when the moments of interest are all (roughly) scaled the same (as is true when we consider covariances of log wage residuals) the least squares objective is sensible.
Part 2: Structural Methods
The idea of "fully structural" modelling is to estimate the parameters of the utility function that drives choices within and between periods. Some advantages of this approach:
1) the model can be solved for the value of the marginal utility of wealth for an agent in a given period, conditional on the state variables he or she sees at that point. This makes it possible to assess the wealth effects of wage changes, and the net effect (via intertemporal substitution and wealth effects) on labor supply.
2) the model can be used to assess "out of sample" policy changes, like a revision in social security, on outcomes at all stages of the lifecycle.
There are also some costs:
3) because of computational complexity many simplifications have to be made.
4) it is often very hard to understand where identification is "coming from" - in most cases parameters are identified by a combination of functional form assumptions and general features of the data. There is rarely "local identification" based on specific design features, as occurs in IV or RD approaches to estimation of simpler reduced-form models.
A basic example.
We will discuss a simple dynamic labor supply model that illustrates the idea of interpolation of the value function (or, actually, the derivative of the value function) using a regression approximation. To keep things very simple, we will assume that wages take on only a limited set of values (say $w_1, w_2, \dots, w_J$) and that the transition probabilities $\pi_{ij} = P(w_t = w_i \mid w_{t-1} = w_j)$ are known. There will be two state variables: the wage, and assets. The value function at time $t$ will be denoted $V_t(A_t, w_t)$. When the wage takes on only discrete values this is just a set of $J$ functions $V_t(A_t, w_j)$. What is relevant for dynamic consumption and hours choices are the derivatives $\partial V_t(A, w_j)/\partial A = \lambda_t(A, w_j)$. The solution method will involve working backward from the retirement period, and at each period solving for the optimal choices of consumption and hours in that period, as a function of the wage in that period, assets, and the approximations to $\partial V_{t+1}(A, w_j)/\partial A$. With these in hand we can then compute $\partial V_t(A, w_j)/\partial A$ at each of a finite set of values for $A$. We will then fit a regression model to these points to get an approximating model for $\partial V_t(A, w_j)/\partial A$ at every level of $A$. We then continue working backward to obtain the optimal consumption and hours functions in each period for each wage and level of assets,
$$c^*_t(A_t, w_t), \qquad h^*_t(A_t, w_t).$$
In applications these functions can be used to compute a likelihood for the observed data for a sample of people who are observed at various points in time, or to compute hours and consumption profiles that are matched to observed profiles. We defer a discussion of how to use the estimated optimal response functions till the end of the lecture.
Let's assume the within-period utility function is separable:
$$U(c, h) = u(c) - d(h),$$
with $d(0) = 0$. Let's also assume that agents work until an exogenous age $R$, then retire. At that point the agent becomes eligible for a pension $p$. In addition to the pension amount, an agent with (beginning-of-period) wealth $A_R$ buys an annuity and receives a per-period payment of $rA_R$ for the rest of his/her life. For purposes of modeling labor supply at earlier ages we can therefore consider the value function for period $R$:
$$V_R(A_R) = \sum_{j=1}^{\infty}\frac{U(p + rA_R, 0)}{(1+r)^j} = \frac{1}{r}u(p + rA_R),$$
where $U(c, h)$ is the within-period utility function, and I have simplified things by assuming that the agent's discount rate and the annuity price are equivalent (with separable preferences this means that the agent wants to set consumption constant for all remaining periods). A similar setup is used by Gourinchas and Parker (2002). Note that the function $V_R(A_R)$ inherits properties from $u(\cdot)$, so if $u$ depends on some parameter $\alpha$ then the same parameter shifts $V_R$.
Now let's go back to period $R-1$. In this period the agent faces a wage $w_{R-1}$, and has assets $A_{R-1}$. The value function for this period is
$$V_{R-1}(A_{R-1}, w_{R-1}) = \max_{c_{R-1},\,h_{R-1}}\; u(c_{R-1}) - d(h_{R-1}) + \frac{1}{1+r}\Big[\frac{1}{r}u\big(p + r(1+r)(A_{R-1} + w_{R-1}h_{R-1} - c_{R-1})\big)\Big].$$
Note that there is no uncertainty left once we get to $R-1$. So we can solve for the optimal choice in this period very easily, to get a "starting value function" for our backward recursion.
The f.o.c.'s for period $R-1$ are:
$$u'(c_{R-1}) = \lambda_{R-1} = u'\big(p + r(1+r)(A_{R-1} + w_{R-1}h_{R-1} - c_{R-1})\big),$$
$$d'(h_{R-1}) = \lambda_{R-1}w_{R-1}.$$
Now let's assume
$$d(h) = \frac{1}{1 + 1/\eta}h^{1+1/\eta}, \qquad u(c) = \log c,$$
so the f.o.c. for hours implies:
$$h_{R-1} = w_{R-1}^{\eta}\,c_{R-1}^{-\eta},$$
which means optimal earnings in period $R-1$ are
$$w_{R-1}h_{R-1} = w_{R-1}^{1+\eta}\,c_{R-1}^{-\eta}.$$
Now all we have to do is find an optimal choice for $c_{R-1}$. Equating marginal utility of consumption in periods $R-1$ and $R$ means that the levels of consumption are equal, so (writing $c$ for $c_{R-1}$):
$$c = p + r(1+r)\big(A_{R-1} + w_{R-1}^{1+\eta}c^{-\eta} - c\big)$$
$$\Rightarrow\; c = \frac{r(1+r)}{1 + r(1+r)}A_{R-1} + \frac{1}{1 + r(1+r)}p + \frac{r(1+r)}{1 + r(1+r)}w_{R-1}^{1+\eta}c^{-\eta}.$$
This has to be solved numerically. It has the form
$$c = f(c) = k + \delta c^{-\eta},$$
where $k$ collects the terms in $A_{R-1}$ and $p$ and $\delta$ is the coefficient on $c^{-\eta}$; notice that $k$ is pretty big and $\delta$ is small. It's not hard to solve this by iterative methods (see footnote 1). With this we have now obtained numerically
$$c^*_{R-1}(A_{R-1}, w_{R-1})$$
(this also depends on $\eta, p, r$). We can then obtain $h^*_{R-1}(A_{R-1}, w_{R-1})$.
Footnote 1: I used this method: start with the initial guess $c_1 = k$. Now $f(c) \approx f(c_1) + (c - c_1)f'(c_1)$, so setting $c = f(c)$ gives a new guess
$$c_2 = \frac{f(c_1) - c_1f'(c_1)}{1 - f'(c_1)}.$$
This converges in 3-4 iterations.
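
The period R-1 solution can be sketched in a few lines. This is a minimal illustration of the linearization described in the footnote, under the log-utility and isoelastic-disutility assumptions above; the function name `solve_last_working_period`, the tolerance, and the parameter values in the usage line are hypothetical.

```python
def solve_last_working_period(A, w, p, r, eta, tol=1e-12, max_iter=50):
    """Return (c, h) for period R-1 given assets A and wage w.

    Solves c = k + delta * c**(-eta) by repeatedly linearizing f around the
    current guess and setting c = f(c), starting from c = k.
    """
    rr = r * (1 + r)
    k = (rr * A + p) / (1 + rr)               # terms in assets and the pension
    delta = rr * w ** (1 + eta) / (1 + rr)    # coefficient on c**(-eta)
    c = k                                     # initial guess c_1 = k
    for _ in range(max_iter):
        f = k + delta * c ** (-eta)
        f_prime = -eta * delta * c ** (-eta - 1)
        c_new = (f - c * f_prime) / (1 - f_prime)
        converged = abs(c_new - c) < tol * (1 + abs(c))
        c = c_new
        if converged:
            break
    h = (w / c) ** eta                        # hours f.o.c.: h = w^eta * c^(-eta)
    return c, h


c_star, h_star = solve_last_working_period(A=100.0, w=50.0, p=20.0, r=0.05, eta=0.5)
print(c_star, h_star)
```

Evaluating this function on a grid of asset values for each wage gives the points to which the approximating polynomial for $1/c^*_{R-1}$ is then fitted.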
Now notice that
$$\partial V_{R-1}(A_{R-1}, w_{R-1})/\partial A_{R-1} = \lambda^*_{R-1}(A_{R-1}, w_{R-1}) = \frac{1}{c^*_{R-1}(A_{R-1}, w_{R-1})}.$$
This is the function we are going to need to take expectations over in solving for optimal choices at period $R-2$. In particular, if in period $R-2$ the wage is $w_{R-2} = w_i$ then we are going to need to calculate
$$E_{R-2}[\partial V_{R-1}(A, w_{R-1})/\partial A \mid w_{R-2} = w_i] = \sum_j \frac{1}{c^*_{R-1}(A, w_j)}\pi_{ji},$$
treating $A$ as an endogenous variable that depends on $c_{R-2}$, $w_{R-2}$, $h_{R-2}$, and $A_{R-2}$.
Our method is as follows. First, using the procedure above, we calculate $c^*_{R-1}(A, w_j)$ for a grid of values of $A$ and each possible value of $w_j$. In a "test" program, I measured all monetary units in 1,000s and assumed that the possible values for $A$ are $1, 2, \dots, 1{,}000$ (i.e., up to a million). I assumed that $w$ takes on values of $10, 20, \dots, 100$ (i.e., 10,000, 20,000, ..., 100,000), and that $p = 20$ (i.e., 20,000). Then I formed a simple $n$th-order polynomial approximation:
$$\frac{1}{c^*_{R-1}(A, w_j)} = b_{0j} + b_{1j}A + b_{2j}A^2 + \dots + b_{nj}A^n.$$
For my test program I found that $n = 4$ gets an extremely good fit. Now notice that once we have these coefficients, the expected derivative of the $R-1$ value function is:
$$\begin{aligned}
E_{R-2}[\partial V_{R-1}(A, w_{R-1})/\partial A \mid w_{R-2} = w_i] &= \sum_j (b_{0j} + b_{1j}A + b_{2j}A^2 + \dots + b_{nj}A^n)\pi_{ji} \\
&= \sum_j b_{0j}\pi_{ji} + \sum_j b_{1j}\pi_{ji}A + \dots + \sum_j b_{nj}\pi_{ji}A^n \\
&= b^i_0 + b^i_1A + b^i_2A^2 + \dots + b^i_nA^n,
\end{aligned}$$
where the coefficients $b^i_0, b^i_1, \dots, b^i_n$ depend on the wage in $R-2$ via the "weights" $\pi_{ji}$. Notice the benefit of having a discrete first-order process for wages: given the $J$ approximating polynomials, all we have to do to form the expectation for a given wage in $R-2$ is weight the approximating polynomials by the appropriate transition probabilities.
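
The weighting step is simple enough to show directly. The sketch below assumes the polynomial coefficients have already been fitted for each wage state; the coefficient values, the two-state transition matrix, and the function names are made up for illustration.

```python
def expected_coeffs(b, pi):
    """Weight the wage-specific polynomials by transition probabilities.

    b[j][k]  : coefficient on A**k in the polynomial for next-period wage w_j.
    pi[j][i] : P(next wage = w_j | current wage = w_i); each column sums to 1.
    Returns bbar with bbar[i][k] = sum_j b[j][k] * pi[j][i].
    """
    J, n_coef = len(b), len(b[0])
    return [[sum(b[j][k] * pi[j][i] for j in range(J)) for k in range(n_coef)]
            for i in range(J)]


def poly_eval(coeffs, A):
    """Evaluate sum_k coeffs[k] * A**k using Horner's rule."""
    total = 0.0
    for coef in reversed(coeffs):
        total = total * A + coef
    return total


# Two wage states and first-order polynomials, purely for illustration.
b = [[0.10, -0.001], [0.06, -0.0005]]
pi = [[0.8, 0.3], [0.2, 0.7]]
bbar = expected_coeffs(b, pi)
# Expected marginal value of assets A = 10 when the current wage is state 0.
print(poly_eval(bbar[0], 10.0))
```

Because the expectation is linear in the coefficients, weighting the coefficients once gives the same answer as evaluating every wage-specific polynomial and then averaging.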
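Now we are ready to solve the optimal choices for $c$ and $h$ in $R-2$. Specifically, the Bellman equation is:
$$V_{R-2}(A_{R-2}, w_{R-2}) = \max_{c_{R-2},\,h_{R-2}}\; u(c_{R-2}) - d(h_{R-2}) + \frac{1}{1+r}E_{R-2}[V_{R-1}(A_{R-1}, w_{R-1}) \mid w_{R-2}].$$
And the f.o.c. are:
$$u'(c_{R-2}) = \lambda_{R-2} = E_{R-2}[\partial V_{R-1}(A_{R-1}, w_{R-1})/\partial A_{R-1} \mid w_{R-2}]$$
$$d'(h_{R-2}) = \lambda_{R-2}w_{R-2}$$
$$\Rightarrow\; h_{R-2} = w_{R-2}^{\eta}\,c_{R-2}^{-\eta}$$
$$\Rightarrow\; w_{R-2}h_{R-2} = w_{R-2}^{1+\eta}\,c_{R-2}^{-\eta}$$
So we need to solve
$$\frac{1}{c_{R-2}} = b^i_0 + b^i_1A + b^i_2A^2 + \dots + b^i_nA^n,$$
where
$$A = (1+r)\big(A_{R-2} + w_{R-2}^{1+\eta}c_{R-2}^{-\eta} - c_{R-2}\big).$$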
Thus for each value of $A_{R-2}$ and each possible value of the wage $w_i$ we need to find the root of the function $g(c; A_{R-2}, w_i)$, where:
$$g(c; A_{R-2}, w_i) = \frac{1}{c} - \sum_k b^i_k\big((1+r)(A_{R-2} + w_i^{1+\eta}c^{-\eta} - c)\big)^k = 0.$$
Again, a numerical solution is needed (see footnote 2). The solution is
$$c^*_{R-2}(A_{R-2}, w_{R-2})$$
(which also depends on $\eta, p, r$). We can then get $h^*_{R-2}(A_{R-2}, w_{R-2})$.
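
A minimal Newton-Raphson sketch for this root-finding step follows. It assumes the weighted polynomial coefficients for the current wage are already available; the function name `solve_consumption`, the starting value, and the first-order polynomial used in the usage line are hypothetical. Starting from a small value of c is a choice made here because, for this example, g is decreasing and convex, so the iterates approach the root from below.

```python
def solve_consumption(A, w, bbar, r, eta, c0=1.0, tol=1e-12, max_iter=200):
    """Find c such that 1/c equals the polynomial in next-period assets.

    bbar[k] is the coefficient on A_next**k, where
    A_next = (1 + r) * (A + w**(1 + eta) * c**(-eta) - c).
    """
    def g_and_slope(c):
        A_next = (1 + r) * (A + w ** (1 + eta) * c ** (-eta) - c)
        dA_dc = (1 + r) * (-eta * w ** (1 + eta) * c ** (-eta - 1) - 1)
        lam = sum(bk * A_next ** k for k, bk in enumerate(bbar))
        dlam = sum(k * bk * A_next ** (k - 1) for k, bk in enumerate(bbar) if k > 0)
        # g(c) and its analytical derivative with respect to c
        return 1 / c - lam, -1 / c ** 2 - dlam * dA_dc

    c = c0
    for _ in range(max_iter):
        g, slope = g_and_slope(c)
        step = g / slope
        c -= step
        if abs(step) < tol * (1 + abs(c)):
            break
    return c


c_opt = solve_consumption(A=10.0, w=5.0, bbar=[0.08, -0.001], r=0.05, eta=0.5)
h_opt = (5.0 / c_opt) ** 0.5  # hours from h = w^eta * c^(-eta)
print(c_opt, h_opt)
```

For higher-order polynomials g need not be globally monotone, so in practice the starting value and the economically sensible range for c deserve some care.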
Finally, going backward one step we will need to evaluate
$$E_{R-3}[\partial V_{R-2}(A_{R-2}, w_{R-2})/\partial A_{R-2} \mid w_{R-3} = w_i] = \sum_j \frac{1}{c^*_{R-2}(A_{R-2}, w_j)}\pi_{ji}.$$
Thus we can proceed backwards, by estimating the approximating polynomial functions and repeating the previous steps.
Some comments:
1) Notice in this algorithm, everything is summarized by the approximating polynomial coefficients for $\lambda^*_t(A_t, w_j)$. For example, if we use a fourth order polynomial, and have 10 possible wage values, the relevant information for period $t$ (given the transition matrix elements $\pi_{ij}$, and the parameters $\eta, p, r$) is contained in 50 numbers (5 coefficients for each of the 10 wage values). The algorithm proceeds by getting the numbers sequentially from $R-2$ back to some earliest possible period (e.g., $R-40$).
Footnote 2: The standard method is Newton-Raphson. Recall that if you are trying to find a $c$ such that $g(c) = 0$ you can normally start with an initial guess $c_1$ and iterate: $c_j = c_{j-1} - g(c_{j-1})/g_c(c_{j-1})$. In the case where we are approximating the marginal utility of income with polynomials, the analytical derivative is easy.
2) We could introduce tastes in one of several ways. One way is to allow the marginal utilities of consumption or leisure to change with age in some way, e.g.,
$$d_t(h_t) = f(t)\frac{1}{1 + 1/\eta}h_t^{1+1/\eta},$$
where $f(t)$ is a simple function like $f(t) = \exp(\gamma t)$. For a given value of $\gamma$ it is possible to solve for the optimal consumption and hours functions in each period, and then search for a "best fitting" choice. Another way is to assume there are discrete types $\tau \in \{\tau_1, \tau_2, \dots, \tau_K\}$, and assume
$$d_k(h) = \exp(\tau_k)\frac{1}{1 + 1/\eta}h^{1+1/\eta}.$$
Then we have to solve the problem for each "type", and think of how to map the behavior we see into an average across the types.
3) How do we get the $\pi_{ij}$ elements?
Suppose that we want to approximate a first-order serially correlated continuous process by a first-order Markov process. G. Tauchen (1986, Economics Letters) described a simple algorithm. For example, suppose we want to approximate an AR(1) wage process:
$$w_t = a + \rho w_{t-1} + \epsilon_t,$$
where $\epsilon_t \sim N(0, \sigma^2)$. Note that for this process $E[w_t] = \mu_w = a/(1-\rho)$, and $\text{var}[w_t] = \sigma^2_w = \sigma^2/(1-\rho^2)$. To approximate this with a discrete first-order Markov model with $N$ points of support, first find $N-1$ cut points $k_j$ ($j = 1, \dots, N-1$) such that
$$\Phi\Big[\frac{k_{j+1} - \mu_w}{\sigma_w}\Big] - \Phi\Big[\frac{k_j - \mu_w}{\sigma_w}\Big] = \frac{1}{N}$$
with $k_0 = -\infty$ and $k_N = \infty$. (This defines the boundaries so that the probability a draw from $N(\mu_w, \sigma^2_w)$ falls in each bin is $1/N$.) Next, find the mean value of a $N(\mu_w, \sigma^2_w)$ within each bin. These values will be the points of support for the discrete process. If $\rho = 0$ we can stop. Otherwise, the last step is to define "transition probabilities" $\pi_{ij}$ such that
$$\pi_{ij} = P(k_i < w_t < k_{i+1} \mid k_j < w_{t-1} < k_{j+1}),$$
assuming that
$$\begin{pmatrix} w_{t-1} \\ w_t \end{pmatrix} \sim N\left(\begin{pmatrix} \mu_w \\ \mu_w \end{pmatrix},\; \sigma^2_w\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right).$$
This can be computed using the usual formulas (e.g. in Johnson and Kotz) (or using simple simulation methods).
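
The discretization steps above can be sketched as follows. Instead of bivariate normal formulas or simulation, this sketch computes each transition probability by a simple midpoint-rule average over the conditioning bin; that numerical shortcut, the function name `discretize_ar1`, and the parameter values are assumptions made for illustration.

```python
from math import erf, exp, pi, sqrt
from statistics import NormalDist


def Phi(z):
    """Standard normal cdf (handles +/- infinity)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def phi(z):
    """Standard normal pdf (returns 0 at +/- infinity)."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def discretize_ar1(a, rho, sigma, n_points, n_quad=200):
    """Equal-probability discretization of w_t = a + rho*w_{t-1} + eps_t.

    Returns (points, trans), where trans[i][j] approximates
    P(w_t in bin i | w_{t-1} in bin j), so each column sums to 1.
    """
    std = NormalDist()
    mu_w = a / (1 - rho)
    sd_w = sigma / sqrt(1 - rho ** 2)
    inf = float("inf")
    # Standardized cut points z_0 = -inf < z_1 < ... < z_N = +inf
    z = [-inf] + [std.inv_cdf(j / n_points) for j in range(1, n_points)] + [inf]
    cuts = [mu_w + sd_w * zj for zj in z]
    # Mean of N(mu_w, sd_w^2) within each bin; each bin has probability 1/N.
    points = [mu_w + sd_w * n_points * (phi(z[j]) - phi(z[j + 1]))
              for j in range(n_points)]
    trans = [[0.0] * n_points for _ in range(n_points)]
    for j in range(n_points):
        for m in range(n_quad):
            # Evenly spaced quantiles of the stationary distribution inside bin j
            x = mu_w + sd_w * std.inv_cdf((j + (m + 0.5) / n_quad) / n_points)
            for i in range(n_points):
                upper = Phi((cuts[i + 1] - a - rho * x) / sigma)
                lower = Phi((cuts[i] - a - rho * x) / sigma)
                trans[i][j] += (upper - lower) / n_quad
    return points, trans


points, trans = discretize_ar1(a=1.0, rho=0.9, sigma=0.5, n_points=5)
print([round(x, 3) for x in points])
print([round(trans[i][i], 3) for i in range(5)])
```

The indexing follows the convention used above for $\pi_{ij}$: the first index is the wage state at $t$ and the second is the state at $t-1$.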
4) How do we use the optimal consumption and hours functions, $c^*_t(A_t, w_t)$, $h^*_t(A_t, w_t)$?
A huge obstacle to micro research on consumption and labor supply is the absence of reliable data on assets. For example, the well known structural study of retirement by Rust and Phelan, "How Social Security and Medicare Affect Retirement Behavior In a World of Incomplete Markets", Econometrica 65 (July 1997), assumes no savings, in part because of the low quality of the asset information in their data set. As a result, almost no studies have tried to estimate structural labor supply models that are directly based on observed data on consumption, hours, wages, and assets. One of the few is Imai and Keane, IER 2004, which solves the problem by evaluating the value function at a discrete number of points and interpolating (rather than interpolating the marginal utility of wealth function). Imai and Keane allow for mismeasurement in assets and hours.