Fetch: Your best friend for file transfer.
Issue with Unicode normalization in filenames (5 posts)
- Started 5 years ago by Chad
- Latest reply 4 years ago from Scott McGuire
Hi! We are having issues with filenames containing non-ASCII characters. I don't know whether this is a Fetch issue. Perhaps it's a bug, but more likely it's a setting somewhere that I am missing.
Several users share an FTP space on a Linux server. Most of us use OS X on the client side, so filenames are mostly normalized to NFD (canonical decomposition). Occasionally, though, non-NFD filenames make their way onto the server. Linux doesn't appear to take canonical equivalence into account in filenames, so it's possible to have multiple files or folders with canonically equivalent names (names that look identical and that a user would consider the same).
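To make the distinction concrete, here is a small Python sketch (the filename "café" is just an example) showing that the NFC and NFD spellings of the same name are different code point sequences and different UTF-8 bytes. A filesystem that compares names byte-for-byte, as Linux filesystems typically do, will therefore treat them as two separate names.

```python
import unicodedata

# "é" can be stored as one precomposed code point (NFC) or as "e" followed
# by a combining acute accent U+0301 (NFD, which OS X tends to produce).
name = "caf\u00e9"
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# The two strings render identically but differ in length and in UTF-8 bytes.
print(len(nfc), len(nfd))    # 4 5
print(nfc.encode("utf-8"))   # b'caf\xc3\xa9'
print(nfd.encode("utf-8"))   # b'cafe\xcc\x81'
print(nfc == nfd)            # False

# They are canonically equivalent: normalizing both to one form makes them equal.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```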
Question 1: It would be *really* nice if Fetch would handle this for us, i.e., when I upload a file or folder and Fetch sees that one with a canonically equivalent name is already there, it would treat the two names as the same. Is this possible? Is this Fetch's responsibility, or the server's?
Question 2: Fetch 5.5.3 apparently cannot open (read or write) the contents of folders on the server that have NFD (decomposed) filenames. However, with Fetch 5.3.1 I can read both NFD and NFC names without any issues. Is this a bug?
Scott McGuire Administrator
We have done some work on how Fetch handles Unicode filenames and their canonicalization, to make it as compatible as possible. I'm not sure there's anything we can do for your situation, but we will look into it.
The first thing I'd suggest trying is changing Fetch's "Preferred encoding" setting, which, among other things, controls how Fetch interprets filenames from servers. If it is not already set to "Unicode (UTF-8)", change it to that and let us know whether that fixes either of the problems you're experiencing. To find and change this setting:
* Close any open connections in Fetch.
* Go to the Fetch menu, and choose Preferences.
* Click the Miscellaneous tab.
* Click the "Preferred encoding" menu and choose "Unicode (UTF-8)".
Then try working with the server in question and let us know whether that helps (or whether that was the setting you were already using).
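As a rough illustration of why an encoding setting affects filenames (this is not a description of Fetch's internals), the sketch below decodes the same filename bytes two ways in Python. The `mac_roman` codec stands in for a non-UTF-8 setting; the bytes are assumed to be the UTF-8 encoding of "café".

```python
# Bytes a server might send for a filename: "café" encoded as UTF-8.
raw = "caf\u00e9".encode("utf-8")

# Decoding with the matching encoding recovers the intended name...
as_utf8 = raw.decode("utf-8")
# ...while decoding the same bytes as Mac OS Roman yields mojibake.
as_mac_roman = raw.decode("mac_roman")

print(as_utf8)        # café
print(as_mac_roman)   # caf√©
```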
Hi Scott, thanks for the quick reply!
Yes, we're already set to "Unicode (UTF-8)" as the preferred encoding.
What seems to be happening, in the case of my second question above, is that Fetch gets a directory listing from the server and converts the filenames from NFD to NFC in the process. When it subsequently requests the NFC version of a filename, the server reports that no such file exists. This was not the behavior in 5.3 but is new in 5.5 (I'm not sure about the intermediate versions). I'd call this a bug: Fetch should request the filename exactly as the server originally reported it, not as internally converted to another normalization form.
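A minimal Python simulation of the behavior described above, assuming a server that matches names byte-for-byte. `server_files` and `server_open` are hypothetical stand-ins, not Fetch or FTP APIs. The sketch shows why requesting the NFC-converted name fails and what keeping the server's original name looks like.

```python
import unicodedata

# Hypothetical server that compares names byte-for-byte, as a typical Linux
# filesystem does. The folder name was stored in NFD form.
server_files = {
    unicodedata.normalize("NFD", "caf\u00e9").encode("utf-8"): "folder contents",
}

def server_open(name_bytes: bytes) -> str:
    """Hypothetical lookup: exact byte match, otherwise 'no such file'."""
    if name_bytes not in server_files:
        raise FileNotFoundError(name_bytes)
    return server_files[name_bytes]

# The client receives the raw names in a listing.
listing = list(server_files)

# Problematic client: converts to NFC, then requests the converted bytes.
display_name = unicodedata.normalize("NFC", listing[0].decode("utf-8"))
try:
    server_open(display_name.encode("utf-8"))
except FileNotFoundError:
    print("NFC request: no such file")

# Safer client: keeps the raw name for every request and uses the
# normalized form only for display or comparison.
entry = {"raw": listing[0], "display": display_name}
print(server_open(entry["raw"]))  # folder contents
```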
(Whether the server considers those two filenames to be distinct is another issue.)
Ben Artin Administrator
Chad, you are right: there is a problem with accessing NFD filenames on a server that is not Unicode-aware. Thanks for reporting this; I'll look into it further.
Right now, Fetch leaves it up to the server to decide what to do with files that have the same name. This is largely because it's impossible for Fetch to know exactly which file names the server considers to be the same: different servers use different text encodings, and some also disagree about whether file names are case-sensitive. I agree that it would be nice if Fetch could do this, but because of these differences between servers, I am not sure we can do it in a way that doesn't cause other confusion.
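To illustrate why "the same name" depends on rules only the server knows, here is a hypothetical Python sketch of a pre-upload clash check run under three assumed server policies: byte-exact, Unicode-normalized but case-sensitive, and normalized plus case-insensitive. The policy functions are illustrations, not descriptions of any particular server, and they ignore the separate problem of servers using different text encodings.

```python
import unicodedata

def key_bytes(name: str) -> bytes:
    # Byte-exact comparison (typical Linux filesystem).
    return name.encode("utf-8")

def key_nfc(name: str) -> str:
    # Unicode-aware (canonical equivalence) but case-sensitive.
    return unicodedata.normalize("NFC", name)

def key_casefold(name: str) -> str:
    # Canonical caseless matching: NFD(casefold(NFD(name))).
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", name).casefold())

def clashes(new_name, existing, key):
    """Return the existing names that `key` treats as the same as new_name."""
    k = key(new_name)
    return [e for e in existing if key(e) == k]

# Already on the server, both stored in NFD form.
existing = [unicodedata.normalize("NFD", "caf\u00e9"),
            unicodedata.normalize("NFD", "Caf\u00e9")]
upload = "caf\u00e9"  # NFC, lowercase

for policy in (key_bytes, key_nfc, key_casefold):
    print(policy.__name__, len(clashes(upload, existing, policy)))
```

The same upload clashes with zero, one, or two existing names depending on the policy, which is why a client cannot safely guess on the server's behalf.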
Thanks for the feedback,
Scott McGuire Administrator
We've revisited how Fetch handles Unicode canonicalization, and while it isn't a full solution, we have incorporated a workaround that may help somewhat in your situation. Please email us at email@example.com if you'd like to help us test it and give us further feedback.