Need some help implementing a linear program with a parameter and deviational variables for quantile regression areal interpolation

Question

I'm trying to implement an areal interpolation technique described in https://www.tandfonline.com/doi/abs/10.1080/00045608.2011.627054 but I'm struggling to understand how the linear programming formulation in the paper translates to something like lpSolve using R.

Basically the task is to interpolate population from a set of source zones, each with an aggregate population value, to an arbitrary set of target zones. By using ancillary land cover data that classifies different land uses, and obtaining the area of intersection of each land use class with each source zone and using these as independent variables in a quantile regression, the coefficients of the quantile regression at the exact percentile of each source zone should give an estimate of the population density for each land class for each source zone. These can then be summed for each area of intersection for each class in each source zone and each target zone to get an estimate for the target zone.

The formulation is:

Minimise $\sum_{i}p\lambda_{i}^{-}+(1-p)\lambda_{i}^{+}$

Subject to:

$\sum_{j}\beta_{j}X_{ij}+\lambda_{i}^{-}+\lambda_{i}^{+} = P_{i}$ (for all i)

$\beta_{j},\lambda_{i}^{-},\lambda_{i}^{+} >= 0$

Where:

$p$ is the quantile parameter (0..1)

$\beta_{j}$ is the estimated population density for land class $j$

$X_{ij}$ is the amount of the $j^{th}$ land cover class associated with the $i^{th}$ observation

$P_{i}$ is the population count for the $i^{th}$ observation

$\lambda_{i}^{-}$ and $\lambda_{i}^{+}$ are deviational variables representing the amount of under and over estimation for the $i^{th}$ observation by the regression model

Essentially it's quantile regression but without an intercept term, and I can see how the formulation is meant to minimise the absolute deviation from the regression line for the chosen quantile, I'm just not sure how that quantile parameter fits in with the standard formulation for a linear program, or how exactly the deviational variables fit in and need to be treated (even if they are actually different in any way from a standard decision variable?).

I'm also slightly confused in that $\beta_{j}$, which is actually the value I'm interested in, is not part of the objective function, but only appears in the constraints - is this 'allowed' in linear programming? I guess it's just the same as it having a zero coefficient in the objective function, but I haven't found any other examples of linear programs that look like this, and as a geography student this is all fairly new to me!

Any help appreciated!

EDIT

OK, thanks to David M's response confirming what the decision variables are, I think I now know what vectors and matrices I need to pass to the lp function of the lpSolve package in R to carry out the optimisation!

Does the following look sensible?

direction: min

objective.in (vector of coefficients of the objective function): a vector of alternating p, (1-p), p, (1-p) ... for as many i as I have

const.mat (matrix of constraint coefficients, one row per constraint, one column per decision variable):

$X_{i,j}$, $X_{i,j+1}$ ... for as many j as there are for i ... , 1, 1

$X_{i+1,j}$, $X_{i+1,j+1}$ ... for as many j as there are for i+1 ... , 1, 1 ... for as many i as there are

A row of 1s of length: number of combinations of i and j (for $\beta_{ij}$) + number of i (for $\lambda_{i}^{-}$) + number of i (for $\lambda_{i}^{+}$) - this last row is for the final constraint specifying that $\beta_{ij}$, $\lambda_{i}^{-}$ and $\lambda_{i}^{+}$ must be >= 0 - or should this be defined elsewhere?

const.dir (vector of directions for each constraint): == for all except the last element which is >= for the final constraint as above?

const.rhs (vector of numeric values for the right hand side of the constraints): $P_{i}$, $P_{i+1}$, ... etc. for all i then zero for the final constraint as above?

I hope that makes sense, apologies if it's not a standard way of describing this stuff!

I'm assuming also that the +$\lambda_{i}^{-}$+$\lambda_{i}^{+}$ in $\sum_{j}\beta_{j}X_{ij}+\lambda_{i}^{-}+\lambda_{i}^{+} = P_{i}$ (for all i) is meant to be outside the summation? It's not very clear the way it's written in the paper but I'm pretty sure it wouldn't make any sense otherwise!

Thanks again for any help.

Dan

EDIT

Ah, I just spotted the following in the lpSolve docs:

"Note that every variable is assumed to be >= 0!"

Which answers my question about the final constraints above - I can leave those bits out, which makes things a bit simpler.

Welcome to Math.SE! The proposed problem is a standard linear program with decisions variables $\beta$ and $\lambda$. The fact that the variables $\beta$ do not appear in the objective function definitely not a problem—that occurs very frequently in linear programming. Are you unsure of how to implement this LP in R? — David M., Jun 09 '18 at 17:51
Hi David, thanks so much for the quick reply!
I wasn't sure, after reading through the documentation and some examples for the lpSolve package for R, how to implement this, but knowing what the decision variables are and that it's ok to have one in the constraint but not in the objective function, I am slightly more confident of being able to figure it out!

I see looking again that there are two R packages based on lp_solve, lpSolve and lpSolveAPI, with lpSolve being slightly simplified and lpSolveAPI being more full featured - could you advise which would be best? Or some other library? — Dan Evans, Jun 09 '18 at 18:25
I don’t use R so I’m not so familiar with the different packages available. If these are relatively small LPs (i.e. you’re not solving them on an industrial scale) then the solver choice probably doesn’t matter too much—just pick what you can code up the easiest. If you’re looking for performance, commercial solvers like Gurobi and CPLEX are good — David M., Jun 09 '18 at 18:29
I can't get around the paywall to see the paper right now, but are you sure the constraint isn't supposed to be $\sum_j\beta_jX_{ij}+\lambda_i^+-\lambda_i^-=P_i$ (note the minus sign)? I think this would make more sense... — David M., Jun 10 '18 at 14:52
The paper does have a second plus there. I noticed that early on and thought it seemed strange but then I thought it might be because it's trying to minimise the total absolute deviation. But thinking about it again now I think you're right, it should be minus. I think for any i only one of those lambda terms will be non-zero (since you can't have both an under and over estimation for the same observation). If the model over estimates for an observation it should subtract to get to the original value for that observation and if it under estimates it should add. — Dan Evans, Jun 10 '18 at 16:37
I think it should be a negative. Did you have any luck with this? — David M., Jun 12 '18 at 12:10
For various reasons I've only just got around to trying this. I'm having problems with the lp function in the lpSolve library crashing R when I try to run it! I have asked another question here if anyone is interested in following or has any suggestions: https://stackoverflow.com/questions/51803669/r-lpsolve-crashing-r — Dan Evans, Aug 11 '18 at 20:54

Need some help implementing a linear program with a parameter and deviational variables for quantile regression areal interpolation

EDIT

EDIT

0 Answers0