### References

[1] R. S. Sutton, A. G. Barto. Reinforcement Learning: An Introduction, Cambridge, MA, USA: MIT Press, 1998.

[2] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, New York, NY, USA: John Wiley & Sons, Inc., 1994.

[3] R. E. Bellman. Dynamic Programming, Princeton, NJ, USA: Princeton University Press, 1957.

[4] C. Szepesvári. Algorithms for Reinforcement Learning, San Rafael, CA, USA: Morgan & Claypool Publishers, 2010.

[5] P. J. Werbos. Approximate dynamic programming for real-time control and neural modeling. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, D. A. White, D. A. Sofge, Eds., New York, USA: Van Nostrand Reinhold, 1992.

[6] D. P. Bertsekas, J. N. Tsitsiklis. Neuro-dynamic Programming, Belmont, MA, USA: Athena Scientific, 1996.

[7] J. Si, A. G. Barto, W. B. Powell, D. C. Wunsch. Handbook of Learning and Approximate Dynamic Programming, New York, USA: Wiley-IEEE Press, 2004.

[8] W. B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality, New York, USA: Wiley-Interscience, 2007.

[9] F. Y. Wang, H. G. Zhang, D. R. Liu. Adaptive dynamic programming: An introduction. IEEE Computational Intelligence Magazine, vol. 4, no. 2, pp. 39-47, 2009.

[10] F. L. Lewis, D. R. Liu. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, Hoboken, NJ, USA: Wiley-IEEE Press, 2013.

[11] F. Y. Wang, N. Jin, D. R. Liu, Q. L. Wei. Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound. IEEE Transactions on Neural Networks, vol. 22, no. 1, pp. 24-36, 2011.

[12] D. Wang, D. R. Liu, Q. L. Wei, D. B. Zhao, N. Jin. Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming. Automatica, vol. 48, no. 8, pp. 1825-1832, 2012.

[13] D. R. Liu, D. Wang, X. Yang. An iterative adaptive dynamic programming algorithm for optimal control of unknown discrete-time nonlinear systems with constrained inputs. Information Sciences, vol. 220, pp. 331-342, 2013.

[14] H. Li, D. Liu. Optimal control for discrete-time affine non-linear systems using general value iteration. IET Control Theory and Applications, vol. 6, no. 18, pp. 2725-2736, 2012.

[15] A. Gosavi. Simulation-based Optimization: Parametric Optimization Techniques and Reinforcement Learning, Secaucus, NJ, USA: Springer Science & Business Media, 2003.

[16] V. S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, New Delhi, India: Hindustan Book Agency, 2008.

[17] S. Lange, T. Gabel, M. Riedmiller. Batch reinforcement learning. Reinforcement Learning: State-of-the-Art, Adaptation, Learning, and Optimization, M. Wiering, M. van Otterlo, Eds., Berlin, Germany: Springer-Verlag, pp. 45-73, 2012.

[18] D. P. Bertsekas. Approximate policy iteration: A survey and some new methods. Journal of Control Theory and Applications, vol. 9, no. 3, pp. 310-335, 2011.

[19] L. Busoniu, R. Babuska, B. De Schutter, D. Ernst. Reinforcement Learning and Dynamic Programming Using Function Approximators (Automation and Control Engineering), Boca Raton, FL, USA: CRC Press, 2010.

[20] L. Busoniu, D. Ernst, B. De Schutter, R. Babuska. Approximate reinforcement learning: An overview. In Proceedings of IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, IEEE, Paris, France, 2011.

[21] M. Geist, O. Pietquin. Algorithmic survey of parametric value function approximation. IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 6, pp. 845-867, 2013.

[22] G. J. Gordon. Approximate Solutions to Markov Decision Processes, Ph.D. dissertation, Carnegie Mellon University, USA, 1999.

[23] D. Ormoneit, Ś. Sen. Kernel-based reinforcement learning. Machine Learning, vol. 49, no. 2-3, pp. 161-178, 2002.

[24] D. Ernst, P. Geurts, L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, vol. 6, pp. 503-556, 2005.

[25] M. Riedmiller. Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method. In Proceedings of the 16th European Conference on Machine Learning, Springer, Porto, Portugal, pp. 317-328, 2005.

[26] S. J. Bradtke, A. G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, vol. 22, no. 1-3, pp. 33-57, 1996.

[27] J. A. Boyan. Technical update: Least-squares temporal difference learning. Machine Learning, vol. 49, no. 2-3, pp. 233-246, 2002.

[28] A. Nedić, D. P. Bertsekas. Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems, vol. 13, no. 1-2, pp. 79-110, 2003.

[29] M. G. Lagoudakis, R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, vol. 4, pp. 1107-1149, 2003.

[30] A. Antos, C. Szepesvári, R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, vol. 71, no. 1, pp. 89-129, 2008.

[31] A. Antos, C. Szepesvári, R. Munos. Value-iteration based fitted policy iteration: Learning with a single trajectory. In Proceedings of IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, IEEE, Honolulu, Hawaii, USA, pp. 330-337, 2007.

[32] M. Puterman, M. Shin. Modified policy iteration algorithms for discounted Markov decision problems. Management Science, vol. 24, no. 11, pp. 1127-1137, 1978.

[33] J. N. Tsitsiklis. On the convergence of optimistic policy iteration. Journal of Machine Learning Research, vol. 3, pp. 59-72, 2002.

[34] B. Scherrer, V. Gabillon, M. Ghavamzadeh, M. Geist. Approximate modified policy iteration. In Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, pp. 1207-1214, 2012.

[35] A. M. Farahmand, M. Ghavamzadeh, C. Szepesvári, S. Mannor. Regularized policy iteration. Advances in Neural Information Processing Systems, D. Koller, D. Schuurmans, Y. Bengio, L. Bottou, Eds., Cambridge, MA, USA: MIT Press, pp. 441-448, 2008.

[36] A. M. Farahmand, M. Ghavamzadeh, C. Szepesvári, S. Mannor. Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems. In Proceedings of American Control Conference, IEEE, St. Louis, MO, USA, pp. 725-730, 2009.

[37] A. M. Farahmand, C. Szepesvári. Model selection in reinforcement learning. Machine Learning, vol. 85, no. 3, pp. 299-332, 2011.

[38] M. Loth, M. Davy, P. Preux. Sparse temporal difference learning using LASSO. In Proceedings of IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, IEEE, Honolulu, Hawaii, USA, pp. 352-359, 2007.

[39] J. Z. Kolter, A. Y. Ng. Regularization and feature selection in least-squares temporal difference learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ACM, New York, NY, USA, pp. 521-528, 2009.

[40] J. Johns, C. Painter-Wakefield, R. Parr. Linear complementarity for regularized policy evaluation and improvement. In Proceedings of Neural Information Processing Systems, Curran Associates, New York, USA, pp. 1009-1017, 2010.

[41] M. Ghavamzadeh, A. Lazaric, R. Munos, M. W. Hoffman. Finite-sample analysis of Lasso-TD. In Proceedings of the 28th International Conference on Machine Learning, Bellevue, USA, pp. 1177-1184, 2011.

[42] B. Liu, S. Mahadevan, J. Liu. Regularized off-policy TD-learning. In Proceedings of Advances in Neural Information Processing Systems 25, pp. 845-853, 2012.

[43] S. Mahadevan, B. Liu. Sparse Q-learning with mirror descent. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA, USA, pp. 564-573, 2012.

[44] M. Petrik, G. Taylor, R. Parr, S. Zilberstein. Feature selection using regularization in approximate linear programs for Markov decision processes. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, pp. 871-878, 2010.

[45] M. Geist, B. Scherrer. ℓ_{1}-penalized projected Bellman residual. In Proceedings of the 9th European Workshop on Reinforcement Learning, Athens, Greece, pp. 89-101, 2011.

[46] M. Geist, B. Scherrer, A. Lazaric, M. Ghavamzadeh. A Dantzig selector approach to temporal difference learning. In Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, pp. 1399-1406, 2012.

[47] Z. W. Qin, W. C. Li, F. Janoos. Sparse reinforcement learning via convex optimization. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, pp. 424-432, 2014.

[48] M. W. Hoffman, A. Lazaric, M. Ghavamzadeh, R. Munos. Regularized least squares temporal difference learning with nested ℓ_{2} and ℓ_{1} penalization. In Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning, Athens, Greece, pp. 102-114, 2012.

[49] J. Johns, S. Mahadevan. Sparse Approximate Policy Evaluation Using Graph-based Basis Functions, Technical Report UM-CS-2009-041, University of Massachusetts, Amherst, USA, 2009.

[50] C. Painter-Wakefield, R. Parr. Greedy algorithms for sparse reinforcement learning. In Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, pp. 1391-1398, 2012.

[51] A. M. Farahmand, D. Precup. Value pursuit iteration. In Proceedings of Advances in Neural Information Processing Systems 25, Stateline, NV, USA, pp. 1349-1357, 2012.

[52] M. Ghavamzadeh, A. Lazaric, O. A. Maillard, R. Munos. LSTD with random projections. In Proceedings of Advances in Neural Information Processing Systems 23, Vancouver, Canada, pp. 721-729, 2010.

[53] B. Liu, S. Mahadevan. Compressive Reinforcement Learning with Oblique Random Projections, Technical Report UM-CS-2011-024, University of Massachusetts, Amherst, USA, 2011.

[54] G. Taylor, R. Parr. Kernelized value function approximation for reinforcement learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ACM, New York, NY, USA, pp. 1017-1024, 2009.

[55] T. Jung, D. Polani. Least squares SVM for least squares TD learning. In Proceedings of the 17th European Conference on Artificial Intelligence, Trento, Italy, pp. 499-503, 2006.

[56] X. Xu, D. W. Hu, X. C. Lu. Kernel-based least squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks, vol. 18, no. 4, pp. 973-992, 2007.

[57] P. W. Keller, S. Mannor, D. Precup. Automatic basis function construction for approximate dynamic programming and reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning, ACM, New York, NY, USA, pp. 449-456, 2006.

[58] R. Parr, C. Painter-Wakefield, L. H. Li, M. L. Littman. Analyzing feature generation for value-function approximation. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, USA, pp. 737-744, 2007.

[59] R. Parr, L. Li, G. Taylor, C. Painter-Wakefield, M. L. Littman. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In Proceedings of the 25th International Conference on Machine Learning, ACM, New York, NY, USA, pp. 752-759, 2008.

[60] M. M. Fard, Y. Grinberg, A. M. Farahmand, J. Pineau, D. Precup. Bellman error based feature generation using random projections on sparse spaces. In Proceedings of Advances in Neural Information Processing Systems 26, Stateline, NV, USA, pp. 3030-3038, 2013.

[61] M. Belkin, P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, vol. 15, no. 6, pp. 1373-1396, 2003.

[62] S. T. Roweis, L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, vol. 290, no. 5500, pp. 2323-2326, 2000.

[63] J. Tenenbaum, V. de Silva, J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, vol. 290, no. 5500, pp. 2319-2323, 2000.

[64] S. Mahadevan. Proto-value functions: Developmental reinforcement learning. In Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, pp. 553-560, 2005.

[65] S. Mahadevan. Representation policy iteration. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, Edinburgh, Scotland, pp. 372-379, 2005.

[66] S. Mahadevan, M. Maggioni, K. Ferguson, S. Osentoski. Learning representation and control in continuous Markov decision processes. In Proceedings of the 21st National Conference on Artificial Intelligence, Boston, USA, pp. 1194-1199, 2006.

[67] S. Mahadevan, M. Maggioni. Value function approximation with diffusion wavelets and Laplacian eigenfunctions. In Proceedings of Advances in Neural Information Processing Systems 18, Vancouver, Canada, pp. 843-850, 2005.

[68] S. Mahadevan, M. Maggioni. Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research, vol. 8, no. 10, pp. 2169-2231, 2007.

[69] S. Mahadevan. Learning representation and control in Markov decision processes: New frontiers. Foundations and Trends in Machine Learning, vol. 1, no. 4, pp. 403-565, 2009.

[70] S. Osentoski, S. Mahadevan. Learning state-action basis functions for hierarchical MDPs. In Proceedings of the 24th International Conference on Machine Learning, ACM, New York, NY, USA, pp. 705-712, 2007.

[71] J. Johns, S. Mahadevan. Constructing basis functions from directed graphs for value function approximation. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, USA, pp. 385-392, 2007.

[72] J. Johns, S. Mahadevan, C. Wang. Compact spectral bases for value function approximation using Kronecker factorization. In Proceedings of the 22nd National Conference on Artificial Intelligence, AAAI, California, USA, pp. 559-564, 2007.

[73] M. Petrik. An analysis of Laplacian methods for value function approximation in MDPs. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, pp. 2574-2579, 2007.

[74] J. H. Metzen. Learning graph-based representations for continuous reinforcement learning domains. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Prague, Czech Republic, pp. 81-96, 2013.

[75] X. Xu, Z. H. Huang, D. Graves, W. Pedrycz. A clustering-based graph Laplacian framework for value function approximation in reinforcement learning. IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2613-2625, 2014.

[76] K. Rohanimanesh, N. Roy, R. Tedrake. Towards feature selection in actor-critic algorithms. In Proceedings of Workshop on Abstraction in Reinforcement Learning, Montreal, Canada, pp. 1-9, 2009.

[77] H. Sprekeler. On the relation of slow feature analysis and Laplacian eigenmaps. Neural Computation, vol. 23, no. 12, pp. 3287-3302, 2011.

[78] L. Wiskott, T. Sejnowski. Slow feature analysis: Unsupervised learning of invariances. Neural Computation, vol. 14, no. 4, pp. 715-770, 2002.

[79] M. Luciw, J. Schmidhuber. Low complexity proto-value function learning from sensory observations with incremental slow feature analysis. In Proceedings of the 22nd International Conference on Artificial Neural Networks and Machine Learning, Lausanne, Switzerland, pp. 279-287, 2012.

[80] R. Legenstein, N. Wilbert, L. Wiskott. Reinforcement learning on slow features of high-dimensional input streams. PLoS Computational Biology, vol. 6, no. 8, Article number e1000894, 2010.

[81] W. Böhmer, S. Grünewälder, Y. Shen, M. Musial, K. Obermayer. Construction of approximation spaces for reinforcement learning. Journal of Machine Learning Research, vol. 14, pp. 2067-2118, 2013.

[82] G. E. Hinton, R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, vol. 313, no. 5786, pp. 504-507, 2006.

[83] Y. Bengio, A. Courville, P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.

[84] I. Arel, D. C. Rose, T. P. Karnowski. Deep machine learning - A new frontier in artificial intelligence research. IEEE Computational Intelligence Magazine, vol. 5, no. 4, pp. 13-18, 2010.

[85] G. E. Hinton, S. Osindero, Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.

[86] R. Salakhutdinov, G. E. Hinton. A better way to pretrain deep Boltzmann machines. In Proceedings of Advances in Neural Information Processing Systems 25, MIT Press, Cambridge, MA, pp. 2456-2464, 2012.

[87] Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle. Greedy layer-wise training of deep networks. In Proceedings of Advances in Neural Information Processing Systems 19, Stateline, NV, USA, pp. 153-160, 2007.

[88] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P. A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, vol. 11, pp. 3371-3408, 2010.

[89] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.

[90] G. E. Hinton. A practical guide to training restricted Boltzmann machines. Neural Networks: Tricks of the Trade, 2nd ed., G. Montavon, G. B. Orr, K. R. Müller, Eds., Berlin, Germany: Springer, pp. 599-619, 2012.

[91] B. Sallans, G. E. Hinton. Reinforcement learning with factored states and actions. Journal of Machine Learning Research, vol. 5, pp. 1063-1088, 2004.

[92] M. Otsuka, J. Yoshimoto, K. Doya. Free-energy-based reinforcement learning in a partially observable environment. In Proceedings of the 18th European Symposium on Artificial Neural Networks, Bruges, Belgium, pp. 541-546, 2010.

[93] S. Elfwing, M. Otsuka, E. Uchibe, K. Doya. Free-energy based reinforcement learning for vision-based navigation with high-dimensional sensory inputs. In Proceedings of the 17th International Conference on Neural Information Processing: Theory and Algorithms, Sydney, Australia, pp. 215-222, 2010.

[94] N. Heess, D. Silver, Y. W. Teh. Actor-critic reinforcement learning with energy-based policies. In Proceedings of the 10th European Workshop on Reinforcement Learning, pp. 43-58, 2012.

[95] F. Abtahi, I. Fasel. Deep belief nets as function approximators for reinforcement learning. In Proceedings of the IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), Frankfurt, Germany, 2011.

[96] P. D. Djurdjevic, D. M. Huber. Deep belief network for modeling hierarchical reinforcement learning policies. In Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, IEEE, Manchester, UK, pp. 2485-2491, 2013.

[97] R. Faulkner, D. Precup. Dyna planning using a feature based generative model. In Proceedings of Neural Information Processing Systems Workshop on Deep Learning and Unsupervised Feature Learning, Vancouver, Canada, pp. 1-9, 2010.

[98] S. Lange, M. Riedmiller, A. Voigtlander. Autonomous reinforcement learning on raw visual input data in a real world application. In Proceedings of International Joint Conference on Neural Networks, Brisbane, Australia, pp. 1-8, 2012.

[99] S. Lange, M. Riedmiller. Deep auto-encoder neural networks in reinforcement learning. In Proceedings of International Joint Conference on Neural Networks, IEEE, Barcelona, Spain, 2010.

[100] J. Mattner, S. Lange, M. Riedmiller. Learn to swing up and balance a real pole based on raw visual input data. In Proceedings of Advances on Neural Information Processing, Springer-Verlag, Stateline, USA, pp. 126-133, 2012.

[101] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller. Playing Atari with deep reinforcement learning. In Proceedings of Neural Information Processing Systems Workshop on Deep Learning and Unsupervised Feature Learning, Nevada, USA, pp. 1-9, 2013.

[102] D. P. Bertsekas. Weighted Sup-norm Contractions in Dynamic Programming: A Review and Some New Applications, Technical Report LIDS-P-2884, Laboratory for Information and Decision Systems, MIT, USA, 2012.

[103] R. Munos. Error bounds for approximate policy iteration. In Proceedings of the 20th International Conference on Machine Learning, Washington DC, USA, pp. 560-567, 2003.

[104] R. Munos. Performance bounds in L_{p}-norm for approximate value iteration. SIAM Journal on Control and Optimization, vol. 46, no. 2, pp. 541-561, 2007.

[105] R. Munos, C. Szepesvári. Finite-time bounds for fitted value iteration. Journal of Machine Learning Research, vol. 9, pp. 815-857, 2008.

[106] S. A. Murphy. A generalization error for Q-learning. Journal of Machine Learning Research, vol. 6, pp. 1073-1097, 2005.

[107] O. Maillard, R. Munos, A. Lazaric, M. Ghavamzadeh. Finite-sample analysis of Bellman residual minimization. In Proceedings of the 2nd Asian Conference on Machine Learning, Tokyo, Japan, pp. 299-314, 2010.

[108] A. Lazaric, M. Ghavamzadeh, R. Munos. Analysis of classification-based policy iteration algorithms. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, pp. 607-614, 2010.

[109] A. Farahmand, R. Munos, C. Szepesvári. Error propagation for approximate policy and value iteration. In Proceedings of Advances in Neural Information Processing Systems 23, Vancouver, Canada, pp. 568-576, 2010.

[110] A. Almudevar, E. F. de Arruda. Optimal approximation schedules for a class of iterative algorithms, with an application to multigrid value iteration. IEEE Transactions on Automatic Control, vol. 57, no. 12, pp. 3132-3146, 2012.

[111] A. Antos, R. Munos, C. Szepesvári. Fitted Q-iteration in continuous action-space MDPs. In Proceedings of Advances in Neural Information Processing Systems 20, pp. 1-8, 2007.

[112] A. Lazaric, M. Ghavamzadeh, R. Munos. Finite-sample analysis of LSTD. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, pp. 615-622, 2010.

[113] A. Lazaric, M. Ghavamzadeh, R. Munos. Finite-sample analysis of least-squares policy iteration. Journal of Machine Learning Research, vol. 13, no. 1, pp. 3041-3074, 2012.

[114] A. Lazaric. Transfer in reinforcement learning: A framework and a survey. Reinforcement Learning: State-of-the-Art, Adaptation, Learning, and Optimization, M. Wiering, M. van Otterlo, Eds., Berlin, Germany: Springer-Verlag, pp. 143-173, 2012.

[115] Y. X. Li, D. Schuurmans. MapReduce for parallel reinforcement learning. In Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning, Athens, Greece, pp. 309-320, 2011.