#============================================================================
# Enca v1.12 (2009-10-29) guess and convert encoding of text files
# Copyright (C) 2000-2003 David Necas (Yeti) <yeti@physics.muni.cz>
# Copyright (C) 2009 Michal Cihar <michal@cihar.com>
#============================================================================

0. Developing programs utilizing libenca
1. How to add a new charset/encoding to libenca
2. How to add a new surface to libenca
3. How to add a new language to libenca
4. Automake, autoconf, libtool, ... note

0. Developing programs utilizing libenca
****************************************

* Look at the libenca API documentation in devel-docs/html.
* Look at the enca source to see how it uses libenca.
  Note that enca is quite a simple application (practically all libenca
  interaction is in src/enca.c). It is single-threaded and uses one
  language and one analyser all the time. Provided each thread has its own
  analyser, libenca should be thread-safe (untested).
* Take names starting with ENCA, Enca, enca, _ENCA, _Enca, and _enca
  as reserved.
* pkg-config is supported: you can use PKG_CHECK_MODULES to check for
  libenca in your configure scripts.

1. How to add a new charset/encoding
************************************

(optional steps are marked `[optional]'):
* Add a new test (even if you are 100% sure iconv will never support it);
  see the top of iconvcap.c for some documentation on how it works.
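The real test lives in iconvcap.c and is documented at its top. Purely to illustrate the idea, the sketch below assumes a POSIX iconv and asks whether it accepts a charset name and can convert one sample string to UTF-8. `iconv_probe()` is a hypothetical helper, not code taken from iconvcap.c.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical probe (not iconvcap.c's logic): return 1 if this iconv
 * knows CHARSET and converts all LEN bytes of SAMPLE to UTF-8 without
 * error, 0 otherwise. */
static int iconv_probe(const char *charset, const char *sample, size_t len)
{
  iconv_t cd = iconv_open("UTF-8", charset);
  if (cd == (iconv_t)-1)
    return 0;                          /* name not recognised */

  char out[256];                       /* enough for short samples */
  char *inp = (char *)sample;          /* iconv() wants a non-const char ** */
  char *outp = out;
  size_t inleft = len, outleft = sizeof out;

  size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
  iconv_close(cd);
  return r != (size_t)-1 && inleft == 0;
}
```

A fuller test would also compare the converted bytes with the expected UTF-8 output instead of only checking that the conversion succeeded.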
* Use @ICONV_NAME_<name>@ (as it will appear in iconvcap output) for

Specifically, for regular 8bit (language-dependent) charsets:

* Add a new map to Unicode (UCS-2) unicode_map_...[].
* Add a new UNICODE_MAP[] entry.

lib/filters.c: [optional]
* Create a new filter or make an alias of an existing filter.

* Add the new encoding to some existing language(s).
* Add appropriate filters or hooks [optional].

* Add a new map to Unicode (UCS-2).
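To illustrate what a map to Unicode (UCS-2) holds, here is a sketch that assumes a 128-entry table for the upper half (0x80-0xFF) of an 8bit charset, with the lower half taken as ASCII. `example_map_upper` and `example_to_ucs2()` are hypothetical names; follow the layout of an existing unicode_map_...[] array rather than this one. Only bytes 0xA0-0xAF are filled in, with their ISO-8859-2 values.

```c
#include <stddef.h>

/* Hypothetical partial map: entry i holds the UCS-2 value of byte 0x80 + i.
 * Only 0xA0-0xAF are filled in (ISO-8859-2 values); all other entries are
 * left 0, which this sketch treats as "no mapping". */
static const unsigned short example_map_upper[128] = {
  [0xA0 - 0x80] = 0x00A0, [0xA1 - 0x80] = 0x0104, [0xA2 - 0x80] = 0x02D8,
  [0xA3 - 0x80] = 0x0141, [0xA4 - 0x80] = 0x00A4, [0xA5 - 0x80] = 0x013D,
  [0xA6 - 0x80] = 0x015A, [0xA7 - 0x80] = 0x00A7, [0xA8 - 0x80] = 0x00A8,
  [0xA9 - 0x80] = 0x0160, [0xAA - 0x80] = 0x015E, [0xAB - 0x80] = 0x0164,
  [0xAC - 0x80] = 0x0179, [0xAD - 0x80] = 0x00AD, [0xAE - 0x80] = 0x017D,
  [0xAF - 0x80] = 0x017B,
};

/* Return the UCS-2 value of byte B, or -1 when the sketch has no entry. */
static long example_to_ucs2(unsigned char b)
{
  if (b < 0x80)
    return b;                      /* lower half assumed ASCII-compatible */
  unsigned short u = example_map_upper[b - 0x80];
  return u ? (long)u : -1;
}
```

Storing only the upper half keeps each table at 128 entries, since the lower half of these charsets is assumed to coincide with ASCII.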

Specifically, for multibyte encodings:

* Create a new check function.
* Put it into the appropriate ascii/8bit/binary test group:
  ENCA_MULTIBYTE_TESTS_ASCII[], ENCA_MULTIBYTE_TESTS_8BIT[], or
  ENCA_MULTIBYTE_TESTS_BINARY[].
* Put strict tests (i.e. tests which may fail) first and looks-like tests
  last.
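Here is a sketch of a strict check function together with an ordered, NULL-terminated test group. It assumes a hypothetical signature (buffer plus length); libenca's own check functions may use a different one, so follow the existing functions when adding a real test. The strict UTF-8 test can reject its input outright, which is the kind of test that goes before any looks-like test.

```c
#include <stddef.h>

/* Hypothetical check-function type: nonzero means "BUF is in this encoding". */
typedef int (*ExampleCheckFunc)(const unsigned char *buf, size_t len);

/* Strict UTF-8 test: rejects malformed or overlong sequences, surrogates and
 * code points above U+10FFFF, and requires at least one multibyte sequence
 * (pure ASCII is no evidence for UTF-8). */
static int example_is_utf8(const unsigned char *buf, size_t len)
{
  size_t i = 0;
  int multibyte = 0;

  while (i < len) {
    unsigned char c = buf[i];
    size_t n;                 /* sequence length */
    unsigned long cp;         /* decoded code point */

    if (c < 0x80) { i++; continue; }
    if (c >= 0xC2 && c <= 0xDF)      { n = 2; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { n = 3; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { n = 4; cp = c & 0x07; }
    else return 0;            /* 0x80-0xC1 and 0xF5-0xFF never start a sequence */

    if (len - i < n)
      return 0;               /* truncated sequence */
    for (size_t k = 1; k < n; k++) {
      if ((buf[i + k] & 0xC0) != 0x80)
        return 0;             /* missing continuation byte */
      cp = (cp << 6) | (buf[i + k] & 0x3F);
    }
    if ((n == 3 && cp < 0x800) || (n == 4 && (cp < 0x10000 || cp > 0x10FFFF))
        || (cp >= 0xD800 && cp <= 0xDFFF))
      return 0;               /* overlong, out of range, or surrogate */

    multibyte = 1;
    i += n;
  }
  return multibyte;
}

/* Hypothetical group in the spirit of ENCA_MULTIBYTE_TESTS_8BIT[]:
 * strict tests first, looks-like tests after them, NULL terminator last. */
static const ExampleCheckFunc EXAMPLE_TESTS_8BIT[] = {
  example_is_utf8,
  NULL
};

/* Run a NULL-terminated group in order; return the index of the first test
 * that accepts BUF, or -1 if none does. */
static int example_run_tests(const ExampleCheckFunc *tests,
                             const unsigned char *buf, size_t len)
{
  for (int i = 0; tests[i] != NULL; i++)
    if (tests[i](buf, len))
      return i;
  return -1;
}
```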

2. How to add a new surface
***************************

* Try asking the author what to do, since this may be complicated, or
* Hack it yourself. Basically, the surface must be added to the
  EncaSurface enum in lib/enca.h and to SURFACE_INFO[] in lib/encnames.c,
  and a detection method must be added to lib/guess.c. Then comes the most
  complicated part: the new method must be called ``in the right places''
  in make_guess() in lib/guess.c.
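To show the shape of the work, here is a sketch of surface flags, a name table and a detection function for an assumed line-terminator surface. Everything in it (`ExSurface`, `EX_SURFACE_INFO[]`, `example_detect_eol()`) is hypothetical; the real names and layout are whatever lib/enca.h and lib/encnames.c already use, and wiring the detector into make_guess() is not shown.

```c
#include <stddef.h>

/* Hypothetical surface flags, one bit each so that they can be combined. */
typedef enum {
  EX_SURFACE_NONE     = 0,
  EX_SURFACE_EOL_LF   = 1 << 0,
  EX_SURFACE_EOL_CR   = 1 << 1,
  EX_SURFACE_EOL_CRLF = 1 << 2,
  EX_SURFACE_EOL_MIX  = 1 << 3
} ExSurface;

/* Hypothetical name table, one row per flag (compare SURFACE_INFO[]). */
static const struct {
  ExSurface bit;
  const char *name;
} EX_SURFACE_INFO[] = {
  { EX_SURFACE_EOL_LF,   "LF line terminators" },
  { EX_SURFACE_EOL_CR,   "CR line terminators" },
  { EX_SURFACE_EOL_CRLF, "CRLF line terminators" },
  { EX_SURFACE_EOL_MIX,  "Mixed line terminators" },
};

/* Detection method: classify the line terminators found in BUF. */
static ExSurface example_detect_eol(const unsigned char *buf, size_t len)
{
  size_t lf = 0, cr = 0, crlf = 0;

  for (size_t i = 0; i < len; i++) {
    if (buf[i] == '\r') {
      if (i + 1 < len && buf[i + 1] == '\n') {
        crlf++;
        i++;                    /* skip the LF of the CRLF pair */
      } else {
        cr++;
      }
    } else if (buf[i] == '\n') {
      lf++;
    }
  }

  int kinds = (lf > 0) + (cr > 0) + (crlf > 0);
  if (kinds == 0)
    return EX_SURFACE_NONE;
  if (kinds > 1)
    return EX_SURFACE_EOL_MIX;
  return lf ? EX_SURFACE_EOL_LF : cr ? EX_SURFACE_EOL_CR : EX_SURFACE_EOL_CRLF;
}

/* Look up the name of a single flag; NULL if it has none. */
static const char *example_surface_name(ExSurface s)
{
  for (size_t i = 0; i < sizeof EX_SURFACE_INFO / sizeof EX_SURFACE_INFO[0]; i++)
    if (EX_SURFACE_INFO[i].bit == s)
      return EX_SURFACE_INFO[i].name;
  return NULL;
}
```

Using one bit per surface lets several independent surfaces be reported together as a bitwise OR, while the name table stays a flat list.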

3. How to add a new language
****************************

Create a new language file:
* Create a new lib/lang_....c file by copying an existing one (use locale code
* Fill in all encoding and occurrence data, and create filters and hooks (see
  filters.c too). You can do it manually, but look at how it's done for
  existing languages in data/* and read data/README.

* Add a new ENCA_LANGUAGE_....

* Add a new LANGUAGE_LIST[] entry pointing to the ENCA_LANGUAGE_....
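The sketch below illustrates, under heavy simplification, how a language entry ties a locale code to its charsets and occurrence data, and how a LANGUAGE_LIST[]-style table is searched. All names are hypothetical: the code "xx" and the charsets "XX-A"/"XX-B" are placeholders, and `example_rate()` is a toy scoring rule, not libenca's actual algorithm. Copy a real lib/lang_....c file for the actual structure.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical occurrence datum: a frequent letter's byte value in one
 * charset, with a weight saying how characteristic it is. */
typedef struct {
  unsigned char byte;
  unsigned weight;
} ExampleOccurrence;

typedef struct {
  const char *csname;
  const ExampleOccurrence *occ;
  size_t nocc;
} ExampleCharset;

typedef struct {
  const char *name;                /* locale code */
  const char *humanname;
  const ExampleCharset *charsets;
  size_t ncharsets;
} ExampleLanguage;

/* Placeholder data: the same two letters sit at different byte values. */
static const ExampleOccurrence OCC_XX_A[] = { { 0xB9, 5 }, { 0xE8, 3 } };
static const ExampleOccurrence OCC_XX_B[] = { { 0x9A, 5 }, { 0xE8, 3 } };

static const ExampleCharset CHARSETS_XX[] = {
  { "XX-A", OCC_XX_A, 2 },
  { "XX-B", OCC_XX_B, 2 },
};

static const ExampleLanguage EXAMPLE_LANGUAGE_XX = {
  "xx", "Example", CHARSETS_XX, 2
};

/* Hypothetical counterpart of LANGUAGE_LIST[]: pointers to every language. */
static const ExampleLanguage *const EXAMPLE_LANGUAGE_LIST[] = {
  &EXAMPLE_LANGUAGE_XX,
};

/* Find a language by locale code; NULL if it is not in the list. */
static const ExampleLanguage *example_find_language(const char *name)
{
  size_t n = sizeof EXAMPLE_LANGUAGE_LIST / sizeof EXAMPLE_LANGUAGE_LIST[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp(EXAMPLE_LANGUAGE_LIST[i]->name, name) == 0)
      return EXAMPLE_LANGUAGE_LIST[i];
  return NULL;
}

/* Toy rating: sum the weights of characteristic bytes per charset and
 * return the index of the best one, or -1 if nothing scored. */
static int example_rate(const ExampleLanguage *lang,
                        const unsigned char *buf, size_t len)
{
  int best = -1;
  unsigned long best_score = 0;

  for (size_t c = 0; c < lang->ncharsets; c++) {
    const ExampleCharset *cs = &lang->charsets[c];
    unsigned long score = 0;
    for (size_t i = 0; i < len; i++)
      for (size_t k = 0; k < cs->nocc; k++)
        if (buf[i] == cs->occ[k].byte)
          score += cs->occ[k].weight;
    if (score > best_score) {
      best_score = score;
      best = (int)c;
    }
  }
  return best;
}
```

Registering the language is then just one more pointer in the list, which is why the LANGUAGE_LIST[] step above is a one-line change.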

4. Automake, autoconf, libtool, ... note
****************************************

If you run ./autogen.sh and it finishes OK, you are lucky and can expect

You have to pass --enable-maintainer-mode to ./configure (or ./autogen.sh) to
build dists and/or the strange stuff in tools/, data/, tests/, and