Abstract Landing guidance for a reusable launch vehicle must ensure accurate landing position and velocity while minimizing fuel consumption. Landing guidance methods based on optimal control rely on an accurate rocket dynamics model, which limits their scalability. To address this problem, a neural-network landing guidance policy is developed using a model-free iterative reinforcement learning approach. First, a Markov decision process (MDP) model of the rocket landing guidance problem is established, and a staged reward function is designed according to the terminal constraints and the fuel-consumption index. Next, a multilayer-perceptron guidance policy network is constructed, and a model-free proximal policy optimization (PPO) algorithm is adopted to iteratively optimize the policy network through interaction with the landing-guidance MDP. Finally, the guidance policy is validated in simulations of a reusable launch vehicle landing scenario. The results show that the proposed reinforcement learning landing guidance policy achieves high landing accuracy, near-optimal fuel consumption, and adaptability to parameter uncertainty in the rocket model.
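To make the staged-reward idea concrete, the sketch below shows one plausible shape such a function could take: a per-step shaping stage that penalizes fuel use and descent speed, plus a terminal stage that rewards meeting position and velocity constraints at touchdown. All state variables, thresholds, and weights here are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical staged reward for a rocket-landing MDP (illustrative sketch;
# the thresholds and weights are assumptions, not the paper's values).
def staged_reward(altitude, velocity, fuel_used, landed):
    # Shaping stage, applied every step: penalize fuel consumption
    # and large descent speed to encourage efficient, gentle flight.
    r = -0.1 * fuel_used - 0.01 * abs(velocity)
    if landed:
        # Terminal stage: check the terminal constraints at touchdown.
        if abs(altitude) < 1.0 and abs(velocity) < 2.0:
            r += 100.0   # accurate, soft landing -> large bonus
        else:
            r -= 100.0   # constraint violation (crash) -> large penalty
    return r
```

A staged structure like this lets the dense shaping term guide early exploration while the sparse terminal term encodes the hard landing constraints.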