We designed a novel method that converts a word image into a sequential signal. We designed an ensembling RNN for word-level scene text recognition which obtained superior recognition accuracy. Our method uses publicly available datasets in training which provides a baseline for benchmarking of the future works. Our method uses word instead character level annotations, which reduces the efforts in ground truth generation greatly.