A systematic control-oriented model for the HTV was built.
A Markov chain model recursively learns the power transition probabilities.
The Kullback–Leibler divergence rate determines the transition probability update.
Reinforcement learning (RL) was applied to optimize the control strategy.
The strategy improves fuel efficiency and runs in real time.
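
As a rough illustration of the second and third highlights, the sketch below estimates a transition matrix recursively from observed state transitions and computes the Kullback–Leibler divergence rate between two Markov chains. It is a minimal sketch under stated assumptions, not the implementation used in this work. The power signal is assumed to be already discretized into integer states. The class `TransitionEstimator`, the uniform pseudo-count `prior`, the sample sequence, and the `THRESHOLD` value are all hypothetical. Using the divergence rate as a threshold test for refreshing the model is one plausible reading of "determines the transition probability update".

```python
import numpy as np


class TransitionEstimator:
    """Count-based recursive estimate of a Markov chain transition matrix."""

    def __init__(self, n_states, prior=1.0):
        # Hypothetical uniform pseudo-count: keeps every entry strictly
        # positive, so each row is a valid distribution from the start.
        self.counts = np.full((n_states, n_states), prior, dtype=float)

    def update(self, i, j):
        # One recursive step: record a transition from state i to state j.
        self.counts[i, j] += 1.0

    @property
    def P(self):
        # Row-normalised counts give the current transition probabilities.
        return self.counts / self.counts.sum(axis=1, keepdims=True)


def stationary_distribution(P):
    """Stationary distribution of a chain with strictly positive entries."""
    vals, vecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(vals - 1.0))  # eigenvalue closest to 1
    pi = np.real(vecs[:, k])
    return pi / pi.sum()  # normalising also fixes the sign


def kl_divergence_rate(P, Q):
    """KL divergence rate between Markov chains with matrices P and Q.

    Computed as sum_i pi_i * sum_j P_ij * log(P_ij / Q_ij), where pi is
    the stationary distribution of P. Assumes all entries are positive.
    """
    pi = stationary_distribution(P)
    return float(np.sum(pi[:, None] * P * np.log(P / Q)))


# Usage: learn from a short, made-up sequence of discretized power states.
est = TransitionEstimator(n_states=3)
reference = est.P.copy()  # matrix the controller is assumed to use now
observed = [0, 1, 2, 2, 1, 0, 0, 1, 2, 2]
for i, j in zip(observed[:-1], observed[1:]):
    est.update(i, j)

rate = kl_divergence_rate(est.P, reference)
THRESHOLD = 0.05  # hypothetical tuning value, not taken from the source
needs_update = rate > THRESHOLD
print(f"KL divergence rate: {rate:.4f}, update model: {needs_update}")
```

The pseudo-count is a design choice that avoids zero probabilities, which would otherwise make the logarithm in the divergence rate undefined.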