Evaluating Long Range Reordering with Permutation-Forests

Milos Stanojevic, Khalil Sima'an

Abstract: Automatically evaluating the word order quality of MT systems is challenging yet crucial for MT evaluation. Existing approaches employ string-based metrics, which are computed over the permutations of word positions in system output relative to a reference translation. We introduce a new metric computed over Permutation Forests (PEFs), tree-based representations that decompose permutations recursively. Relative to string-based metrics, PEFs offer advantages for evaluating long range reordering. We compare the PEFs metric against five known reordering metrics on WMT13 data for ten language pairs. The PEFs metric correlates better with human ranking than the other metrics on almost all language pairs. None of the other metrics behaves as stably across language pairs.
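As an illustration of what a string-based metric over a permutation looks like, the following minimal sketch computes a normalized Kendall's tau score, a standard pairwise-order statistic. It is used here only as a familiar example: the abstract does not name the five baseline metrics, and the function name `kendall_tau_score` and the 0-indexed permutation encoding are assumptions of this sketch. It is not the PEFs metric.

```python
from typing import Sequence


def kendall_tau_score(perm: Sequence[int]) -> float:
    """Fraction of word pairs that keep their reference order.

    `perm[i]` is the reference position of the i-th word in the system
    output (hypothetical encoding, 0-indexed). Returns 1.0 for a
    monotone permutation and 0.0 for a fully inverted one.
    """
    n = len(perm)
    if n < 2:
        return 1.0  # zero or one word: nothing can be out of order
    # Count pairs (i, j), i < j, whose relative order is inverted.
    discordant = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if perm[i] > perm[j]
    )
    total_pairs = n * (n - 1) // 2
    return 1.0 - discordant / total_pairs


if __name__ == "__main__":
    print(kendall_tau_score([0, 1, 2, 3]))  # monotone order -> 1.0
    print(kendall_tau_score([3, 2, 1, 0]))  # fully inverted -> 0.0
    print(kendall_tau_score([3, 0, 1, 2]))  # one word moved to the front -> 0.5
```

In the last call a single word moved across three positions inverts three of the six pairs, so the score drops to 0.5 even though the rest of the sequence is in order. A score of this kind sees only the flat sequence of positions, whereas the abstract describes PEFs as decomposing the permutation recursively.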