Home Page
  • April 24, 2024, 07:03:27 am *
  • Welcome, Guest
Please login or register.

Login with username, password and session length
Advanced search  

News:

Official site launch very soon, hurrah!



Post reply

Warning: this topic has not been posted in for at least 120 days.
Unless you're sure you want to reply, please consider starting a new topic.

Note: this post will not display until it's been approved by a moderator.

Name:
Email:
Subject:
Message icon:

Attach:
(Clear Attachment)
(more attachments)
Restrictions: 10 per post, maximum total size 8192KB, maximum individual size 5120KB
Note that any files attached will not be displayed until approved by a moderator.
Verification:
Type the letters shown in the picture
Listen to the letters / Request another image

Type the letters shown in the picture:
Please stop spamming. Your spam posts are moderated and will never be displayed on the internet. What is eighty-eight minus eighty-six (spell out the answer):
Пожалуйста, прекратите спамить. Ваши спам-сообщения модерируются и никогда не будут отображаться в Интернете. What color is grass.:

shortcuts: hit alt+s to submit/post or alt+p to preview


Topic Summary

Posted by: Dakusan
« on: February 08, 2016, 12:52:16 am »


So somehow all of the file names in my Rammstein music directory, and some in my Daft Punk, had characters with diacritics replaced with an invalid character. I pasted one of such filenames into a hex editor to evaluate what the problem was. First, I should note that Windows encodes its filenames (and pretty much everything) in UTF16. Everything else in the world (mostly) has settled on UTF8, which is a much better encoding for many reasons. So during some file copy/conversion at some point in the directories’ lifetime, the file names had done a freakish (utf16*)(utf16->utf8) rename, or something to that extent. I had noticed that all I needed to do was to replace the first 2 bytes of the diacritic character with a different byte. Namely “EF 8x” to “Cx”, and the rest of the bytes for the character were fine. So if anyone ever needs it, here is the bash script.


LANG=;
IFS=$'\n'
for i in `find -type f | grep -P '\xEF[\x80-\x8F]'`; do
   FROM="$i";
   TO=$(echo "$i" | perl -pi -e 's/\xEF([\x80-\x8F])/pack("C", ord($1)+(0xC0-0x80))/e');
   echo Renaming "'$FROM'" to "'$TO'"
   mv "$FROM" "$TO"
done

I may need to expand the range beyond the x80-x8F range, but am unsure at this point. I only confirmed the range x82-x83.