I'm currently writing a simple web-based TPEGML editor. I used the information available at http://www.bbc.co.uk/travelnews/xml/
To accommodate future entity changes, I wrote regex code to parse the entities into an array. While parsing the English RTM entities, I noticed the results were weird. I thought there was something wrong with my code. After hours of debugging, I found out that the cause was 3 non-ASCII characters in the rtm45_33 and rtm35_7 messages.
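
For anyone hitting the same problem, a quick scan for non-ASCII characters can save a lot of debugging time. The following is only a rough sketch in Python, not the code from my editor: it assumes the entities are declared as `<!ENTITY name "value">`, and the names `parse_entities`, `find_non_ascii`, and the sample entities `rtm_a` and `rtm_b` are made up for illustration.

```python
import re

# Assumed declaration format: <!ENTITY name "value">
ENTITY_RE = re.compile(r'<!ENTITY\s+(\S+)\s+"([^"]*)"\s*>')


def parse_entities(dtd_text):
    """Return a list of (name, value) pairs found in dtd_text."""
    return ENTITY_RE.findall(dtd_text)


def find_non_ascii(entities):
    """Map entity name -> list of (index, character) for every non-ASCII character."""
    report = {}
    for name, value in entities:
        hits = [(i, ch) for i, ch in enumerate(value) if ord(ch) > 127]
        if hits:
            report[name] = hits
    return report


# Hypothetical sample data, not taken from the real RTM entity files.
SAMPLE = '''
<!ENTITY rtm_a "plain text">
<!ENTITY rtm_b "caf\u00e9 closed">
'''

if __name__ == "__main__":
    print(find_non_ascii(parse_entities(SAMPLE)))
```

Running something like this over the entity file before the real parsing step lists which entity names contain unexpected characters and at which position, so they can be fixed or escaped first.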