This paper proposes a computationally efficient method for learning features that are robust to image variations for facial expression recognition (FER). The proposed method minimizes the feature difference between an image subject to a variation (e.g., non-frontal pose or non-uniform illumination) and a corresponding target image captured under the best conditions for FER (i.e., a frontal face with uniform illumination). This is achieved by regularizing the objective function during training, where a Siamese network is employed. At test time, the learned network parameters are transferred to a convolutional neural network (CNN), which then extracts features robust to image variations. Experiments have been conducted on the Multi-PIE dataset to evaluate the proposed method under a large number of variations, including pose and illumination. The results show that the proposed method improves FER performance under these variations without requiring extra computational complexity at test time.
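The training objective described above can be sketched as follows. This is a minimal NumPy stand-in for the Siamese setup, not the paper's actual architecture: the layer sizes, the `lam` trade-off weight, and all helper names are illustrative assumptions. The key ideas it shows are (a) both branches share the same weights and (b) the loss adds a feature-difference regularizer to an ordinary classification loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny shared feature extractor: one linear layer + ReLU.
# Both branches of the Siamese network use the SAME weights W, b.
W = rng.standard_normal((64, 32)) * 0.1   # 64-dim input -> 32-dim feature
b = np.zeros(32)

def extract(x):
    """Shared-weight feature extractor (stand-in for the CNN trunk)."""
    return np.maximum(x @ W + b, 0.0)

def siamese_loss(x_var, x_target, head_W, y, lam=0.5):
    """Cross-entropy on the varied image, plus a regularizer pulling its
    features toward those of the ideal (frontal, uniformly lit) target.
    lam is a hypothetical trade-off weight."""
    f_var = extract(x_var)
    f_tgt = extract(x_target)
    # Softmax cross-entropy on the varied image's features.
    logits = f_var @ head_W
    p = np.exp(logits - logits.max())
    p /= p.sum()
    ce = -np.log(p[y] + 1e-12)
    # Feature-difference term (mean squared L2 distance).
    fd = np.mean((f_var - f_tgt) ** 2)
    return ce + lam * fd

x_var = rng.standard_normal(64)      # e.g. profile face, side lighting
x_target = rng.standard_normal(64)   # same subject, frontal, uniform light
head_W = rng.standard_normal((32, 6)) * 0.1  # 6 expression classes
loss = siamese_loss(x_var, x_target, head_W, y=2)
```

At test time only `extract` and the classifier head are evaluated on a single image, so the extra branch adds no cost outside training, matching the abstract's claim of no added test-time complexity.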