You are doing field work in India. You go into rural villages and you interview a large number of
families who are farmers. For each family, you record the following information:
Y = annual family income (in rupees)
C = number of children
L = whether or not the male adult is literate
F = the amount of land the family farms (in hectares)
B = whether or not the family has a bank account
a. Say whether each variable is discrete, continuous, or approximately continuous (and briefly
explain why).
b. The distributions of Y and F are strongly positively skewed. The distribution of C is only
slightly positively skewed. All three distributions are unimodal.
For C and F, say which measure of central tendency you think is most appropriate, and give
your reason. For Y, rank the mean, median, and mode from largest to smallest.
c. Here are the sample mean and sample standard deviation of Y, C, and F, along with the
pairwise covariances between these variables:
Variable   Mean    Standard deviation
Y          19000   3000
C          4.7     1.4
F          1.2     0.2

Pairwise covariances: s_YC = 1680, s_CF = 0.196, s_YF = 360
Originally, Y is measured in rupees. You decide to add to each family’s income a government
subsidy for education of 500 rupees. You also decide to change the units so that income is
measured in US dollars, at an exchange rate of 40 rupees per US dollar. Call the new variable
“net US dollar family income” (N). Give the mean and standard deviation of N.
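The linear-transformation rules behind this part can be sketched in a few lines of Python. This is a minimal sketch, assuming the 500-rupee subsidy is added before converting, so that N = (Y + 500) / 40. Adding a constant shifts the mean but leaves the standard deviation alone; dividing by the exchange rate divides both. The function name to_net_dollars and the small sample of incomes are hypothetical, for illustration only.

```python
import math
import statistics


def to_net_dollars(mean_rupees, sd_rupees, subsidy_rupees=500, rupees_per_dollar=40):
    """Mean and sd of N = (Y + subsidy) / rate, given the mean and sd of Y in rupees."""
    # Adding the same subsidy to every family shifts the mean by that amount
    # but leaves the spread (and so the standard deviation) unchanged.
    shifted_mean = mean_rupees + subsidy_rupees
    # Dividing every value by the exchange rate divides both the mean and the sd by it.
    return shifted_mean / rupees_per_dollar, sd_rupees / rupees_per_dollar


# Summary statistics for Y taken from the table (rupees).
mean_n, sd_n = to_net_dollars(mean_rupees=19000, sd_rupees=3000)
print(mean_n, sd_n)  # 487.5 75.0

# Sanity check of the rules on a small hypothetical sample (made-up incomes, not survey data).
sample_y = [15000, 17500, 19000, 21000, 26000]
sample_n = [(y + 500) / 40 for y in sample_y]
print(math.isclose(statistics.mean(sample_n), (statistics.mean(sample_y) + 500) / 40))  # True
print(math.isclose(statistics.stdev(sample_n), statistics.stdev(sample_y) / 40))  # True
```

Working from the summary statistics rather than the raw data is the point: the transformation rules give the new mean and standard deviation without revisiting any individual family.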
d. Originally, F is measured in “hectares” (one hectare is about 2.5 acres). You decide to rescale
farm size so it is measured in acres. Call the new variable A (for area). Give the value of the
covariance between A and N. Give the value of the correlation between A and N.
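The same idea extends to covariance and correlation. A minimal sketch, assuming A = 2.5 F (using the problem's approximation of 2.5 acres per hectare) and N = (Y + 500) / 40: additive constants drop out, the covariance is multiplied by both scale factors, and the correlation is unchanged because both factors are positive. The helper name rescaled_cov_corr is hypothetical.

```python
import math

# The problem's approximation; the exact factor is about 2.471 acres per hectare.
ACRES_PER_HECTARE = 2.5
RUPEES_PER_DOLLAR = 40


def rescaled_cov_corr(cov_xy, sd_x, sd_y, scale_x, scale_y):
    """Covariance and correlation of (scale_x * X + b, scale_y * Y + d).

    The additive constants b and d never enter: shifting a variable does not
    change how it co-varies with another. The covariance is multiplied by both
    scale factors, while the correlation is unchanged for positive scale factors.
    """
    new_cov = scale_x * scale_y * cov_xy
    new_corr = new_cov / (abs(scale_x) * sd_x * abs(scale_y) * sd_y)
    return new_cov, new_corr


# X = F (hectares, sd 0.2), Y = income in rupees (sd 3000), s_YF = 360 from the table.
# A = 2.5 * F and N = (Y + 500) / 40, so the 500-rupee subsidy drops out.
cov_an, corr_an = rescaled_cov_corr(
    cov_xy=360, sd_x=0.2, sd_y=3000,
    scale_x=ACRES_PER_HECTARE, scale_y=1 / RUPEES_PER_DOLLAR,
)
original_corr = 360 / (0.2 * 3000)  # correlation between F and Y before rescaling
print(round(cov_an, 4), round(corr_an, 4))  # 22.5 0.6
print(math.isclose(corr_an, original_corr))  # True
```

Because correlation is unit-free, it can be read straight off the original F and Y values; only the covariance needs the scale factors.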
e. Suppose you decide to drop the 10% of observations that have the lowest family income, and
also the 10% of observations that have the highest family income. What direction of effect
(increase, decrease, approximately no effect) will this have on the mean? What direction of
effect will this have on the standard deviation? The median? The interquartile range?
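The direction of each effect can be checked numerically. The sketch below is an illustration, not an analysis of the survey: it assumes a lognormal shape for income (one common way to model positive skew) with a hypothetical scale of 19000 rupees, and the helper names trim_both_tails and summarize are hypothetical. Removing equal numbers of observations from each end always leaves the median exactly unchanged; the mean and standard deviation results rely on the positive skew.

```python
import random
import statistics


def trim_both_tails(values, fraction):
    """Drop the lowest and highest `fraction` of observations (equal counts from each end)."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    return ordered[k:len(ordered) - k]


def summarize(values):
    """Mean, sample sd, median, and interquartile range of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


# Hypothetical right-skewed "incomes": lognormal draws, for illustration only (not survey data).
rng = random.Random(0)
incomes = [19000 * rng.lognormvariate(0, 0.5) for _ in range(1000)]

before = summarize(incomes)
after = summarize(trim_both_tails(incomes, 0.10))
for stat in before:
    print(f"{stat:>6}: {before[stat]:10.1f} -> {after[stat]:10.1f}")
```

In a positively skewed sample the top 10% lies farther from the center than the bottom 10%, so dropping both tails pulls the mean down; the remaining middle 80% is more tightly packed, so the standard deviation and interquartile range shrink.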