I have gotten the “This diff includes files which are not valid UTF-8 (they contain invalid byte sequences). You can either stop this workflow and fix these files, or continue. If you continue, these files will be marked as binary.” message quite a few times lately when running arc diff. I tried various things, like using SublimeText to re-save the file as UTF-8, and nothing seemed to work. I was convinced an invalid character was being appended somewhere in the file on save. Then I stumbled across this post about stripping invalid UTF-8 characters.
Following that suggestion, I tried this:
iconv -c -f UTF8 -t UTF8 /Users/Levi/Desktop/file.php > /Users/Levi/Desktop/file.php
It kept spitting out an empty file, though. After a bit of head scratching I realized it was writing to the same file it was reading from. The shell sets up the > redirect before iconv even starts, so file.php gets truncated to zero bytes first, and iconv then reads and converts an already empty file. So what you need to do is write the output to a new file name.
iconv -c -f UTF8 -t UTF8 /Users/Levi/Desktop/file.php > /Users/Levi/Desktop/file-new.php
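If you want the cleaned result to end up back under the original file name, one option is to go through a temporary file. This is only a rough sketch: the function name strip_invalid_utf8 is something I made up for illustration, and it assumes mktemp and an iconv that supports -c are available.

```shell
#!/bin/sh
# strip_invalid_utf8 is a hypothetical helper name, not part of iconv or arc.
strip_invalid_utf8() {
    src=$1
    tmp=$(mktemp) || return 1

    # Write the cleaned copy to a temporary file, never to the file being read.
    # -c tells iconv to drop bytes it cannot convert. Some iconv builds exit
    # non-zero after dropping bytes, so the exit status is ignored on purpose.
    iconv -c -f UTF-8 -t UTF-8 "$src" > "$tmp" || true

    # Only overwrite the original if iconv actually produced some output.
    if [ -s "$tmp" ]; then
        cat "$tmp" > "$src"   # copy the contents back; keeps the original permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Calling strip_invalid_utf8 /Users/Levi/Desktop/file.php would then clean the file without the truncation problem above. Note that this sketch refuses to touch the file if the cleaned output comes back empty, which also means it will not "clean" a file that was empty to begin with.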
After all that you may be thinking that fixed it, right? It didn’t. I was still getting the same error message, so I decided to do some debugging and try to isolate the line Arcanist had an issue with. After var_dump’ing my way down the path the diff takes through libphutil and Arcanist, I found out why I kept getting the error: the string being sent to Phabricator is the git diff of the file. That diff includes the lines from the old revision as well as the new one, so even though I had removed the invalid character in the new revision, the old revision’s invalid character was still in the diff.
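In hindsight, a quicker way to spot this would have been to check the old revision of the file directly instead of only the working copy. Here is a minimal sketch of that idea. The names is_valid_utf8 and check_revision are hypothetical helpers of my own, and it relies on iconv (without -c) exiting non-zero when it hits an invalid byte sequence.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if stdin is valid UTF-8.
# Without -c, iconv stops with a non-zero exit status on an invalid sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1
}

# Hypothetical helper: check one file as it exists in a given revision.
# $1 is a revision (branch, tag, or commit), $2 is the path inside the repo.
# Returns 0 if valid, 1 if invalid, 2 if git could not read the file.
check_revision() {
    tmp=$(mktemp) || return 2
    if git show "$1:$2" > "$tmp" 2>/dev/null; then
        is_valid_utf8 < "$tmp"
        status=$?
    else
        status=2
    fi
    rm -f "$tmp"
    return "$status"
}
```

For example, check_revision origin/master path/to/file.php (both the branch and the path are placeholders here) returning 1 would mean the old revision itself carries the bad bytes, no matter how clean the working copy is.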
So how do you get around this?
The only way I could get around this issue was to:
– edit the original file, remove the invalid character
– push that live
– merge that branch in with my dev branch
– arc diff
So after all the debugging, the issue was not with the current file but with the original file I was editing… which was, to say the least, a little frustrating to find out. If the original file has an invalid character, your best bet is to edit the original to remove that character and push that live before diff’ing your branch.