#============================================================================
# Enca v1.12 (2009-10-29)  guess and convert encoding of text files
# Copyright (C) 2000-2003 David Necas (Yeti)
# Copyright (C) 2009 Michal Cihar
#============================================================================

Contents

0. Developing programs utilizing libenca
1. How to add a new charset/encoding to libenca
2. How to add a new surface to libenca
3. How to add a new language to libenca
4. Automake, autoconf, libtool, ... note


0. Developing programs utilizing libenca
****************************************

* Look at the libenca API documentation in devel-docs/html.

* Look at the enca source to see how it uses libenca.  Note that enca is
  quite a simple application (practically all libenca interaction is in
  src/enca.c).  It is single-threaded and uses one language and one
  analyser all the time.  Provided each thread has its own analyser,
  libenca should be thread-safe (untested).

* Take names starting with ENCA, Enca, enca, _ENCA, _Enca, and _enca as
  reserved.

* pkgconfig is supported; you can use PKG_CHECK_MODULES to check for
  libenca in your configure scripts.


1. How to add a new charset/encoding
************************************

(optional steps are marked `[optional]'):

iconvcap.c:
* Add a new test (even if you are 100% sure iconv will never support it).
  Please see the top of iconvcap.c for some documentation on how it works.

tools/encodings.dat:
* Add a new entry.
* Use @ICONV_NAME_@ (as it will appear in iconvcap output) for iconv
  names.

tools/iconvenc.null:
* Add it (with NULL).

Specifically, for regular 8bit (language dependent) charsets:

lib/unicodemap.c:
* Add a new map to Unicode (UCS-2) unicode_map_...[].
* Add a new UNICODE_MAP[] entry.

lib/filters.c: [optional]
* Create a new filter or make an alias of an existing filter.

lib/lang_??.c:
* Add the new encoding to some existing language(s).
* Add appropriate filters or hooks [optional].

data/maps/??.map:
* Add a new map to Unicode (UCS-2).

Specifically, for multibyte encodings:

lib/multibyte.c:
* Create a new check function.
* Put it into the appropriate ascii/8bit/binary test group
  ENCA_MULTIBYTE_TESTS_ASCII[], ENCA_MULTIBYTE_TESTS_8BIT[],
  ENCA_MULTIBYTE_TESTS_BINARY[].
* Put strict tests (i.e. tests which may fail) first, looks-like tests
  last.


2. How to add a new surface
***************************

* Try to ask the author what to do, since this may be complicated, or

* Hack.  Basically, it must be added to the EncaSurface enum in
  lib/enca.h and to SURFACE_INFO[] in lib/encnames.c, and a detection
  method must be added to lib/guess.c.  Now the most complicated part:
  this new method must be used ``in the right places'' in make_guess()
  in lib/guess.c.


3. How to add a new language
****************************

Create a new language file:
* Create a new lib/lang_....c file by copying an existing one (use the
  locale code for the name).
* Fill in all encoding and occurrence data, and create filters and hooks
  (see filters.c too).  You can do it manually, but look at how it is
  done for existing languages in data/* and read data/README.

lib/internal.h:
* Add a new ENCA_LANGUAGE_....

src/lang.c:
* Add a new LANGUAGE_LIST[] entry pointing to the ENCA_LANGUAGE_....


4. Automake, autoconf, libtool, ... note
****************************************

If you run ./autogen.sh and it finishes OK, you are lucky and can expect
things to work.

You have to give --enable-maintainer-mode to ./configure (or
./autogen.sh) to build dists and/or the strange stuff in tools/, data/,
tests/, and devel-docs/.
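
Illustration for section 1 (multibyte check ordering): the rule "strict
tests first, looks-like tests last" can be sketched in a small
self-contained C fragment.  This is only a sketch under stated
assumptions, not libenca code: check_func, is_valid_utf8(),
looks_like_ucs2be(), CHECKS[] and first_match() are hypothetical names
made up for this example, and the real check functions and test groups
in lib/multibyte.c use libenca's own types and signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature: returns nonzero when the buffer is accepted. */
typedef int (*check_func)(const unsigned char *buf, size_t len);

/* Strict test: any malformed sequence makes it fail.
 * Simplified: overlong forms and surrogates are not rejected. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0, k, extra, multibyte = 0;

    while (i < len) {
        unsigned char c = buf[i];

        if (c < 0x80) { i++; continue; }       /* plain ASCII byte */
        if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return 0;                         /* invalid lead byte */
        if (extra > len - i - 1) return 0;     /* truncated sequence */
        for (k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
        i += extra + 1;
        multibyte++;
    }
    return multibyte > 0;   /* pure ASCII is not claimed as UTF-8 here */
}

/* Looks-like test: a heuristic only.  Accepts even-length input in
 * which at least half of the high-order bytes are zero. */
static int looks_like_ucs2be(const unsigned char *buf, size_t len)
{
    size_t i, zeros = 0;

    if (len < 2 || len % 2 != 0) return 0;
    for (i = 0; i < len; i += 2)
        if (buf[i] == 0) zeros++;
    return 2 * zeros >= len / 2;
}

/* Strict test first, looks-like test last. */
static const check_func CHECKS[] = { is_valid_utf8, looks_like_ucs2be };

/* Returns the index of the first accepting check, or -1 if none. */
static int first_match(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < sizeof(CHECKS) / sizeof(CHECKS[0]); i++)
        if (CHECKS[i](buf, len)) return (int)i;
    return -1;
}
```

With these definitions, first_match() on the bytes 'a' 0xC4 0x8D selects
index 0 (the strict test), while 0x00 'h' 0x00 'i' is rejected by the
strict test and falls through to index 1 (the looks-like test).  Putting
the heuristic first would let it claim input that a strict test could
have identified with more certainty.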