Posts Tagged unison

Native Unison Unicode support

As already mentioned in the comments to my last Unison Unicode post, native Unicode support is now available in the development version of Unison. Until now, however, I had stuck with my home-brewed version, in the spirit of “if it ain’t broken, don’t fix it”. But when problems with AppleDouble files started to surface (a bug that has been fixed in later versions), it was time to update.

Step 1, OSX binary: I couldn’t find any precompiled binaries, but as I already had OCaml installed, I just checked out the latest trunk and compiled it with “make UISTYLE=text”. If you want to avoid this hassle, you can just get my binary here: unison-2.39.0-osx.zip (compiled under MacOSX 10.5.3, 568 KB).

Step 2, Win32 binary: There are actually two options here. Initially I tried the precompiled binaries provided by Jérôme Vouillon, but the problem with the native Win32 version is that it doesn’t support symbolic links! It turns out I had lots of those, so that wasn’t really an option. The other alternative is to run Unison from Cygwin (which I need for sshd on the Windows side anyway). That way symbolic links are supported (Cygwin creates Windows-style “.lnk” files, but Unison doesn’t know this and treats them as real symbolic links).

That means I have to compile Unison myself under Cygwin. Luckily this was surprisingly simple: just install Cygwin with the “make” and “ocaml” packages, check out the Unison trunk, and run “make UISTYLE=text” again! As before, if you want to avoid this hassle, you can just get my binary here: unison-2.39.0-cygwin.zip (compiled under Cygwin 1.7.1/Win2k3 32bit SP2, 564 KB).

Step 3, update scripts and synchronize: Once this was done, I just modified my scripts to use the new binaries (note that I’m no longer using the Unicode hack for Cygwin) and everything worked! Unicode support is completely automatic!

My thanks go out to all the developers of Unison!


I can haz success! Unison hack to enable Unicode normalization of filenames

NOTE: The latest development version of Unison now has built-in Unicode support. Check this post for how to compile and use it!
DISCLAIMER: This is a very ugly hack! It’s been tested to work in MY setup, but might not work in yours. I really don’t know OCaml, or makefiles for that matter. You have been warned!

After much agony I’ve finally managed to build a hacked version of Unison that makes my file sync setup work. The problem, as explained earlier, is that Unison doesn’t support Unicode, and that I have to synchronize files between Mac OSX machines (using UTF8 NFD-normalized filenames) and Windows machines (using latin1 or UTF8 NFKC-normalized filenames). For filenames containing non-ASCII characters to transfer correctly, some kind of conversion has to be made, and as of now Unison does not support this.

In my file sync setup, I have three OSX machines synchronizing files using a Windows server as the central node (all OSX machines sync with the Windows machine). Synchronization is always initiated from one of the OSX machines. What I have done is install Cygwin on the Windows machine, along with a hack for Cygwin that enables UTF8 support.

When I first did this I thought it would be enough, but since Windows/Cygwin and OSX use different Unicode normalization forms (NFKC and NFD), the bit-by-bit representation of the filenames is different. This is what I set out to fix. I have inserted a few lines of code in the function that preprocesses filenames before Unison compares them. Those lines use the Camomile Unicode library to normalize the filename to NFKC, so when the OSX and Windows filenames are compared a little later, they are bit-wise identical.
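The actual hack is a few lines of OCaml using Camomile inside Unison. Purely as an illustration of the idea, here is a minimal Python sketch using the standard unicodedata module: normalize both names to NFKC, then compare. The helpers normalize_name and names_match are hypothetical and do not exist in Unison.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Normalize a filename to NFKC, the form the hack converts to."""
    return unicodedata.normalize("NFKC", name)


def names_match(a: str, b: str) -> bool:
    """Hypothetical helper: compare two filenames after normalizing both."""
    return normalize_name(a) == normalize_name(b)


# The same visible name, as OSX (decomposed) and Windows/Cygwin (composed) store it
osx_name = unicodedata.normalize("NFD", "räksmörgås.txt")
win_name = unicodedata.normalize("NFKC", "räksmörgås.txt")

print(names_match(osx_name, win_name))  # True, although osx_name != win_name
```

One design note: NFKC also folds compatibility characters (for example the “ﬁ” ligature becomes “fi”), so two names that differ only in such characters would be treated as the same file. NFC is the more conservative choice if that matters to you.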

This is DEFINITELY not the best way to do this, and it by no means fixes all of Unison’s encoding problems. What one should really do is rewrite all of the filename handling to support Unicode and other encodings as well. But I don’t know OCaml very well; in fact I find it quite confusing and frustrating, so for the moment this will have to do.

And it seems this is enough to fix my problems. The hack only needs to be applied to the OSX side of Unison to work, even though it would probably be better to apply it to both sides (but I’m WAY too lazy to try to compile Unison in Cygwin if it seems I don’t have to :P).

So, if anyone needs to sync an OSX machine with a Windows machine, or perhaps with a Linux machine with a UTF8 filesystem, this might be of some help. (Note that while OSX and Windows/Cygwin enforce NFD and NFKC respectively, Linux does NOT. So on Linux it is possible to have two different files with seemingly identical names but different normalization. This obviously would not work well with this hack, but that would probably be a less than ideal situation anyway.)
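To spot that situation on a Linux machine before syncing, something like the following Python sketch could list names that only differ by normalization. find_normalization_collisions is a hypothetical helper; in practice you would feed it os.listdir() for each directory.

```python
import unicodedata
from collections import defaultdict


def find_normalization_collisions(names, form="NFKC"):
    """Group names that are distinct as stored but identical after normalization."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize(form, name)].append(name)
    # Keep only keys that more than one stored name maps to
    return {key: group for key, group in groups.items() if len(group) > 1}


# Simulated directory listing: the same name stored composed and decomposed
listing = [
    unicodedata.normalize("NFC", "räksmörgås.txt"),
    unicodedata.normalize("NFD", "räksmörgås.txt"),
    "notes.txt",
]
collisions = find_normalization_collisions(listing)
for key, group in collisions.items():
    print(key, [ascii(n) for n in group])
```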

Quick install:

This is the quick install for people who don’t want to compile stuff.

  1. Download my precompiled (OSX Leopard) Unison binary here: unison-unicode.zip (600KB, based on Unison 2.27). You only need the modified binary on the OSX side (as long as synchronization is initiated from that side), but all other machines must use the same version of Unison (2.27).
  2. Download the Camomile data files (5MB). These files must be extracted into /usr/local/share/camomile on your OSX machine (hardcoded, sorry!).

Build yourself:

These are instructions for how to build the modified Unison version yourself (for OSX, but might work on other architectures as well):

  1. Download and install OCaml.
  2. Download and install/build Camomile (follow instructions and use the default installation directory).
  3. Check out a version of Unison with Subversion (I’m using /branches/2.27, but I think it will work with the latest beta version as well).
  4. Replace the files src/case.ml and src/Makefile.OCaml with these files.
  5. Compile using “make UISTYLE=text”.
  6. The new Unison binary will be at src/unison. I would recommend you rename it to unison-unicode or something to tell it apart from your regular Unison version.

Your modified binary (from either the quick or full install) will let you synchronize files with Unicode filenames between an OSX machine and another machine with a UTF8 filesystem (for example Linux). If you want to sync with Windows, you also need to install Cygwin (make sure to select the unison package during installation) and the Cygwin UTF8 hack. Make sure that it is the Cygwin Unison binary that is used during synchronization by passing the parameter “-servercmd /usr/bin/unison”.
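If you drive the sync from a script, the command line just needs that extra parameter. A minimal Python sketch; the host name winserver, both root paths and the binary name unison-unicode are placeholders for your own setup, not values from my configuration:

```python
import shlex
import subprocess


def build_unison_command(local_root, remote_root,
                         unison_binary="unison-unicode",
                         server_cmd="/usr/bin/unison"):
    """Build the argument list for a sync initiated from the OSX side."""
    return [
        unison_binary,
        local_root,
        remote_root,
        "-servercmd", server_cmd,  # run the Cygwin Unison on the Windows side
    ]


def run_sync(cmd):
    """Run Unison; raises CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)


# Placeholder roots: a local folder and an absolute path on the Cygwin server
cmd = build_unison_command("/Users/me/Documents",
                           "ssh://winserver//cygdrive/d/sync/Documents")
print(shlex.join(cmd))
```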

Note that this version of Unison requires both file systems being synchronized to use UTF8; if it encounters a filename that is not valid UTF8, it will probably crash!
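If you want to check a directory for offending names first, Python can read a listing as raw bytes (os.listdir(b".") returns bytes names) and attempt a strict UTF8 decode. This sketch (decode_filename is a hypothetical helper) returns None for invalid names instead of crashing:

```python
import unicodedata


def decode_filename(raw: bytes):
    """Return the NFKC-normalized name, or None if raw is not valid UTF8."""
    try:
        name = raw.decode("utf-8")  # strict: raises on invalid byte sequences
    except UnicodeDecodeError:
        return None
    return unicodedata.normalize("NFKC", name)


good = "räksmörgås.txt".encode("utf-8")
bad = "räksmörgås.txt".encode("latin-1")  # 'ä' becomes the lone byte 0xE4
print(decode_filename(good))
print(decode_filename(bad))  # None
```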

If anyone actually tries this, please post your comments below! Thanks ;)


Unison Unicode problems

Unison is a pretty awesome file synchronizing utility. It’s free, open source, highly customizable and scriptable. It does, however, have one big flaw: it doesn’t support Unicode. As long as you synchronize between file systems of identical encoding, it doesn’t matter. Unfortunately, however, Windows, Linux and MacOSX all use different encodings by default.

My setup synchronizes files between three different OSX machines using a Windows server as the central node. Filenames containing non-ASCII characters like ÅÄÖ get messed up when transferred, e.g. the OSX file räksmörgås.txt will appear as räksmörgaÌŠs.txt on the Windows machine.
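That kind of garbling is what you get when the UTF8 bytes of a decomposed name are read as a single-byte Windows encoding. A small Python sketch, assuming the bytes are passed through unchanged and then interpreted as cp1252 (the Windows variant of latin1):

```python
import unicodedata

# OSX stores the name decomposed (NFD); encode it to the raw UTF8 bytes
osx_bytes = unicodedata.normalize("NFD", "räksmörgås.txt").encode("utf-8")

# Reading those bytes as cp1252 turns each combining mark into two stray characters
garbled = osx_bytes.decode("cp1252")
print(garbled)  # raÌˆksmoÌˆrgaÌŠs.txt
```

Each combining mark (U+0308 for ä and ö, U+030A for å) is two UTF8 bytes and therefore shows up as two characters; for å that is the “aÌŠ” above.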

This is very annoying. I really like my synchronization setup, and this is the only problem I have with it. What to do? Windows uses latin1 encoding for filenames, and OSX uses UTF8. What if you could trick Windows into using UTF8 too? Linux supports UTF8 filenames, so maybe Cygwin can help. Nope, it turns out Cygwin does not support Unicode… I googled “cygwin unicode” and found a hack for Cygwin that enables Unicode and UTF8 support for filenames. My hopes rose as räksmörgås.txt seemed to appear correctly on the Windows side. Yes, I had done it! I ran Unison again to double-check, and for some reason the file was now flagged as new on the Windows side. The whole operation then failed when Unison tried to copy the file back to the OSX side and discovered that the file was already there.

So, it turns out that there is such a thing as Unicode Normalization. Short story: The same character can be represented in different ways in Unicode, namely composed or decomposed. And, to make matters worse, OSX uses the decomposed form (NFD), and Windows/hacked Cygwin uses the composed form (NFKC). So even though the file is called räksmörgås.txt on both machines, the exact bit representation of the name is different. If I had used a Unicode-aware program, this wouldn’t have been a problem and the filenames would have been recognized as identical. But as I said, Unison is NOT such a program…
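To see the difference concretely, this Python sketch prints the code points and UTF8 bytes of the composed and decomposed forms of å:

```python
import unicodedata

composed = unicodedata.normalize("NFC", "å")    # one code point: U+00E5
decomposed = unicodedata.normalize("NFD", "å")  # 'a' + U+030A COMBINING RING ABOVE

for label, s in (("composed", composed), ("decomposed", decomposed)):
    # Code points, then the UTF8 bytes as space-separated hex
    print(label, [f"U+{ord(c):04X}" for c in s], s.encode("utf-8").hex(" "))

print(composed == decomposed)  # False, although both render as "å"
```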

I’ve done some research (i.e., googled), and there don’t seem to be any plans to incorporate Unicode support in Unison. It turns out Unison is written in OCaml, which doesn’t natively support Unicode, so according to Unison’s developers, adding support for this would be pretty hard.

But how hard can it really be? I just need to make sure that both filenames are normalized before they are compared. And there are third-party libraries that enable Unicode support in OCaml. So I went off and downloaded the Unison source code, the OCaml binaries, and the Unicode library (Camomile). It was pretty easy to locate the piece of code where the normalization should, or at least could, be done. Only one problem remains: Camomile is very poorly documented and comes with absolutely no example code! Right, two problems: OCaml is a functional language (like Haskell), and it turns out I hate functional languages!

To be continued (hopefully)…

UPDATE: Problem kind of solved!


Syncing OSX Stickies

I’ve been missing the ability to sync notes since I abandoned my last Palm… I used the Palm/Outlook synchronized notes for everything from remembering song lyrics to keeping track of how much money my friends borrowed to buy ice cream.

In OSX there is a bundled program called Stickies, which displays lots of colored post-it-like notes on the screen. It’s not really what I want, since you have to have all your notes on screen at the same time (if you close one, it counts as deleting it!), but I guess it will do for now. At least as long as I can sync the notes between my OSX machines!

And it turns out it’s pretty simple. The notes are stored in ~/Library/StickiesDatabase, so all I needed to do was add this file to my OSX-prefs Unison sync profile! Of course, you can’t go around editing your notes on two computers at the same time without syncing, but most of the time I’m only using one computer at a time, so it shouldn’t be much of a problem.
