[1] R. S. Acosta-Abreu, O. Hernandez-Lerma: 
Iterative adaptive control of denumerable state average-cost Markov systems. Control Cybernet. 14 (1985), 313-322.
MR 0842780 
[2] V. V. Baranov: 
Recursive algorithms of adaptive control in stochastic systems. Cybernetics 17 (1981), 815-824. 
MR 0689427 
[4] A. Federgruen, P. J. Schweitzer: 
Nonstationary Markov decision problems with converging parameters. J. Optim. Theory Appl. 34 (1981), 207-241.
MR 0625228 | 
Zbl 0426.90091 
[5] A. Federgruen, P. J. Schweitzer, H. C. Tijms:
Contraction mappings underlying undiscounted Markov decision problems. J. Math. Anal. Appl. 65 (1978), 711-730.
MR 0510481 
[6] A. Federgruen, H. C. Tijms:
The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms. J. Appl. Probab. 15 (1978), 356-373. 
MR 0475896 | 
Zbl 0386.90060 
[7] O. Hernandez-Lerma: 
Adaptive Control Processes. Springer-Verlag, Berlin-Heidelberg-New York 1989.
MR 0995463 
[8] K. Hinderer: 
On approximate solutions of finite-stage dynamic programs. In: Dynamic Programming and Its Applications (M. L. Puterman, ed.), Academic Press, New York 1978, pp. 289-317.
MR 0537885 | 
Zbl 0461.90075 
[9] G. Hübner:
Contraction properties of Markov decision models with applications to the elimination of non-optimal actions. In: Dynamische Optimierung, Bonner Math. Schriften 98 (1977), 57-65.
MR 0524411 
[10] G. Hübner:
A unified approach to adaptive control of average reward Markov decision processes. OR Spektrum 10 (1988), 161-166.
MR 0961229 
[11] M. Kurano: 
Discrete-time Markovian decision processes with an unknown parameter - average return criterion. J. Oper. Res. Soc. Japan 15 (1972), 67-76.
MR 0343942 | 
Zbl 0238.90006 
[12] M. Kurano:
Adaptive policies in Markov decision processes with uncertain matrices. J. Inf. Optim. 4 (1983), 21-40.
MR 0697991 
[13] M. Kurano:
Learning algorithms for Markov decision processes. J. Appl. Probab. 24 (1987), 270-276. 
MR 0876190 | 
Zbl 0631.90085 
[14] P. Mandl: 
Estimation and control of Markov chains. Adv. in Appl. Probab. 6 (1974), 40-60. 
MR 0339876 
[15] P. Mandl: 
On the adaptive control of countable Markov chains. In: Probability Theory, Banach Centre Publications, Warsaw 1979, pp. 159-173. 
MR 0561478 | 
Zbl 0439.60069