1 #============================================================================
2 # Enca v1.12 (2009-10-29) guess and convert encoding of text files
3 # Copyright (C) 2000-2003 David Necas (Yeti) <yeti@physics.muni.cz>
4 # Copyright (C) 2009 Michal Cihar <michal@cihar.com>
5 #============================================================================
7 List of user-visible changes in Enca
8 More detailed log can be obtained from older changelogs or git log.
11 * change of behaviour (including disappearing of a feature)
15 - Fixes some minor memory leaks.
16 - Fixes little problems in autoconf scripts.
19 - Dropped scanf configure test which is not used at all.
20 - Fixes some wrong format strings.
23 + Enca is back alive or at least in maintenance mode.
24 * Enca now lives in a git repository, see <http://gitorious.org/enca>.
25 - Add missing charset koi8u to belarussian language.
26 - Fixed some typos in program and documentation.
29 + support for HZ encoding
30 * Big5 and GBK detection improved
31 - enca.spec no longer installs docs to world-unreadable directory
34 + Chinese (Big5 and GBK) support (thanks to Zuxy)
35 * deb/ subdirectory is gone as there is finally an Enca package in Debian
36 (thanks to Michal Cihar)
37 - manual page clean-up (thanks to Michal Cihar)
40 + new name type: preferred MIME name (option -m)
41 - broken iconv detection on some systems was fixed
44 * English language names (--list=languages, enca_language_english_name())
45 were changed to lowercase to match common locale aliases
46 - Win32, i.e. MinGW and Cygwin, build problems were fixed
49 - crash on impossible recovery after iconv failure in pipe was fixed
50 - rpm building problems on Mandrake Linux were fixed
53 - dependency of guessing API on locales (via ctype functions) was fixed
54 - --help text generation failure on some systems was fixed
57 + [libenca] it's possible to get analyser option values, not just set them
58 * a good BOM (byte order mark) increases the chance of being recognized for
60 * external converter wrappers were moved from bin to libexec and the b-
61 prefix was removed (though it still works)
62 * external converters are no longer searched in PATH, nonstandard ones
63 have to be specified with full path
66 - fixed segfault in language detection for some locale setups
69 - fixed losing data at the end of file when using external converters in a
70 pipe (and maybe in other situations)
71 - [libenca] enca_analyser_free() not freeing analyser completely was fixed
74 * deprecated options -T, -R, -S, -u, -U, -m, and -M were finally removed
75 * default HTML API docs installation path changed to the new gtk-doc style
76 (DATADIR/gtk-doc/html/enca)
77 * debian/ subdir moved to deb/ to allow official deb creation w/o too much
80 enca-0.99.4 2003-07-15
81 - several race conditions in librecode and iconv interfaces were fixed
82 - temporary file names are much less predictable now
84 enca-0.99.3 2003-06-30
85 * Debian package is back from death
86 * failure to find external converter is now fatal
87 - fixed build problems on FreeBSD (and probably other Unices)
88 - libiconv is not used for `conversion to ASCII' since it never does the
89 Right Thing, whatever it is
90 - when conversion with libiconv fails, the file should now survive intact
91 - fixed build problems on systems w/o libiconv (hopefully)
92 - fixed distclean and uninstall targets to really clean and uninstall
94 - fixed builds with separate source (read-only) and build directories
95 - fixed builds with --without-libiconv and --without-librecode on GNU/Linux
96 - external converter is not checked when it's not going to be used
98 enca-0.99.2 2003-06-25
99 + EOL type is used to decide ambiguous cases, e.g. CP1250 is reported
100 instead of ISO-8859-2/CRLF
101 * --list languages by default prints English names, instead of ISO-639a
102 codes, use -e or -r to get the old listing
103 * if LC_CTYPE is something like en_US, more locale categories are examined
104 to detect the language
105 * cork charset was modified to contain \n, \r and \t in the same places as
107 * some heuristics tuning
109 enca-0.99.1 2003-06-22
110 + libenca pkg-config support
111 * all libenca tuning parameters (-T, -R, -S, -u, -U, -m, and -M) were
112 marked deprecated and are noop, Enca should DWIM
113 * ambiguity is now always OK when the sample has the same meaning in all the
115 * deprecated `built-in-encodings' and `encodings' lists were removed
116 * PAGER feature was removed
117 - exchanged `latvian' and `lithuanian' language names were fixed (`lv' and
119 - missing tests for the new languages were added to the test suite
121 enca-0.99.0 2003-06-14
122 + added some support for: Bulgarian, Croatian, Estonian, Hungarian, Latvian,
124 + a new algorithm for 8bit-dense languages (cyrillics), the old one is used
126 * removed support for non-transitive iconv (such a thing should not exist)
127 * auxiliary tools in data are no longer built in regular builds,
128 use --enable-maintainer-mode to rebuild them, create dists, etc.
129 - fixed iconv interface surface check pickier than iconv itself inhibiting
130 some otherwise possible conversions
131 - fixed u+x permissions on temporary files (from 0.10.7)
132 - fixed not deleting temporary files in iconv interface
133 - fixed broken iconv interface behaviour in pipes
134 - fixed iconvcap misdetecting Latin5 as ISO-8859-5
135 - fixed casual `make distclean' failures
137 enca-0.10.7 2003-01-28
138 - fixed interchanged iconv and cstocs encoding names
139 - corrected(?) librecode surface interaction
140 - fixed a temporary file creation race condition
141 * added tex and utf8 to cstocs (names and b-cstocs)
143 enca-0.10.6 2002-10-22
144 + enconv uses DEFAULT_CHARSET variable, exactly as recode
145 - ENCAOPT works everywhere, albeit imperfectly
146 - options -P and -p no longer imply -M too
147 - ambiguous mode (-M) works again
148 - pager is run so that help text doesn't disappear
149 - standard input is printed as STDIN with -d, not as null
150 - make check works again
151 - it compiles without recode again
153 enca-0.10.5 2002-10-13
154 + UTF-8 recognition in binary and otherwise messy files
155 + detection of double-encoding from some 8bit charset to UTF-8
156 + Cork encoding conversion
157 * librecode interaction was (hopefully) improved
158 - fixed some build-time problems
160 enca-0.10.4 2002-10-10
161 + added Cork encoding support for Czech, Slovak and Polish
162 - empty files are now considered convertible to any encoding
163 - removed the so-called faster (in fact slower) I/O
164 - fixed some more compile-time search path issues
166 enca-0.10.3 2002-09-22
167 * added support for perl umap as external converter
168 - fixed external converter wrappers to work with standard sh
169 - fixed some compile-time library search path issues
171 enca-0.10.2 2002-09-15
172 + target charset is automatically obtained from locales when called as
173 enconv, new options --guess, --auto-convert
174 + English language names can be used instead of ISO-639 codes everywhere
175 - cs_SK and ru_UA locales are properly recognised as Slovak and Ukrainian
177 enca-0.10.1 2002-08-29
179 * external converters can be disabled at build time
180 - `-' is accepted for standard input
181 - fixed broken built-in converter
182 - fixed crashing on an unknown language
183 - trivial (identity) conversions are not performed any more
184 - help is now printed when input is a terminal and no argument specified
185 - changed braindamaged <STDIN>, <STDOUT> to STDIN, STDOUT in messages
186 - various small fixes and build-time improvements
188 enca-0.10.0 2002-08-26
189 + added support for Ukrainian (CP1251, IBM855, ISO-8859-5, KOI8-U, maccyr,
190 CP1125), Belarussian (CP1251, IBM866, ISO-8859-5, KOI8-UNI, maccyr,
191 IBM855) and Polish (ISO-8859-2, ISO-8859-13, ISO-8859-16, Baltic, macce,
193 + Enca library introduced
194 * dropped native Debian package
195 * --details no longer prints guessing details (now is mostly like --human)
196 * --list=encodings, --list=built-in-encodings corrected to --list=charsets,
197 --list=built-in-charsets (old names supported with a warning)
198 * improved Czech and Slovak charsets detection
200 enca-0.9.4: 2002-03-03
201 - built-in converter didn't convert more than first 64kB of a file
203 enca-0.9.3: 2001-07-16
204 + a native Debian package
205 - fixed random reporting of nonsense results
206 - fixed self-contradictory --details output when file was quoted-printable
208 - fixed poor performance on non-GNU/Linux
209 - made pager less intrusive (instead of intrusive `less' ;-)
210 - --list=encodings prints only `known' encodings
211 - fixed several compile-time/portability problems
213 enca-0.9.2: 2001-07-13
214 * --help and --license are displayed through pager (when possible)
215 - fixed broken language hooks--they were never activated (from 0.9.1)
216 - fixed reporting ASCII when a 7bit encoding was detected
217 - fixed boundary-case behaviour when recovering from librecode failures
219 enca-0.9.1: 2001-06-25
220 + support for Macintosh Cyrillic, including conversion
221 + support for unusual UCS-4 byte orders (3412 and 2143)
222 + new option --license printing full enca license
223 * exit codes now make sense (0, 1, 2; where 2 means serious troubles)
224 - temporary files are no longer world-readable
226 enca-0.9.0: 2001-03-26
227 Serious incompatibilities:
228 * -E and -C option letters exchanged (much better mnemonics)
229 * converter wrappers renamed to b-cstocs and b-recode
230 * finding only 7bit ASCII is no longer considered failure
231 * need to use --language to set language (sometimes)
232 * dull converter behaviour no longer supported, -x syntax changed
233 * option -g removed (try --name=aliases)
234 * option -c changed to --list=converters, listing format changed
235 * option -l changed to --list=encodings, listing format changed
236 * converter names are no longer case insensitive
237 * no longer uses cstocs names as canonical
238 * external converters are called with Enca's names, not cstocs's
241 + support for slovak and russian (and `none') language
242 + support for CP1251, IBM866, ISO-8859-5 and KOI8-R, including conversion
243 + UCS-2, UCS-4, UTF-8, UTF-7 and LaTeX encoding recognition
244 + many more encoding aliases accepted
245 + long `GNU style' command line options
246 + new output types: --enca-name, --iconv-name
247 + output type --name=WORD allowing selection of the output type by name
248 + ENCAOPT environment variable
249 + language detection from locales
250 + support for surfaces (experimental)
251 + new option --list printing various listings
252 + new converter wrapper b-map (for perl `map')
253 + new option -m to reset -M back
254 + new language filters
255 + new options -u and -U to control multibyte encoding checks
256 + included [generated] enca.spec into the tarball to allow `rpm -tb'
258 * read limit changed to 16MB
259 * librecode now run with flags diacritics_only and ascii_graphics
260 - fixed broken -P options
261 - fixed several build problems on non-GNU/Linux systems
262 - fixed some missing and wrong characters in Unicode data
263 - temporary copy of damaged original file is not deleted when rescue fails
265 enca-0.8.x: Since features planned for 0.8 and 0.9 happened to be developed
266 simultaneously, this version number has been skipped.
268 enca-0.7.7: 2001-01-01
269 + ability to use UNIX98 iconv conversion functions
270 + the word `none' can be used as -E parameter causing clearing of converter
272 - fixed disarranged help text, misspelled word `European' in macce long
273 name, obsolete statements in manual page and other stuff of this kind
275 enca-0.7.6: 2000-11-20
276 + any converter combination/order can be now specified with -E, old -E
277 meaning is no longer valid
278 + new option -c (list all valid converter names)
279 * cork encoding not supported anymore
281 * `/' is added to recode recoding requests thus partially solving the
282 surface problem---surface never changes
283 * some errors like specifying invalid value of threshold are no longer fatal,
284 the bad values are ignored instead
285 * handling of some exotic characters in built-in converter slightly changed
286 - fixed several fatal bugs regarding stdin to stdout conversion
287 - stdin is copied to stdout in case of failure whenever possible/applicable
289 enca-0.7.5: 2000-10-25
290 * license changed to GNU GPL Version 2 (i.e. license version is explicitly
292 * prints error message when conversion is impossible
293 * binary data filter improved/changed
294 - falls back to external converter when GNU recode library cannot convert
295 due to erroneous request
296 - '' no longer causes enca to read from stdin
297 - tries to restore files damaged by GNU recode library
299 enca-0.7.4: 2000-10-12
300 + box-drawing characters are (carefully) filtered out when guessing
301 - fixed intermixed behaviour in SMS/nonSMS modes
303 enca-0.7.3: 2000-10-09
304 + blocks of probably binary data are filtered out when guessing
305 * standard input is copied to standard output when its encoding is unknown
306 - fixed reading only 4096 bytes from pipe (from 0.7.1)
308 enca-0.7.2: has never been released
309 + GNU recode recoding chains made possible by starting -x (convert) parameter
311 + second best guess is marked with `-' in -d (print details) output
313 enca-0.7.1: 2000-10-02
314 * in case of nonfatal i/o failure enca continues processing remaining files
316 enca-0.7.0: 2000-09-26
317 + standard input to standard output conversion
318 + short message mode -M
319 + ability to use GNU recode library
320 + new output type -r (encoding name after RFC1345)
321 + ability to convert cork internally
322 + new external converter brecode (recode wrapper)
323 + new output type -g (list of aliases)
324 + new option -V (verbose)
325 * -x (convert) parameter syntax changed to in_enc..out_enc (old syntax still
326 supported, will be removed in 0.8.x)
327 * option -e (disable external) no longer supported, empty string as -C
328 (external converter) parameter can be used instead
329 * encoding names specified as -x (convert) parameters are case insensitive
330 * ascii is not considered unknown encoding (i.e. failure) so enca returns 0
331 * -d (print details) output improved/changed/updated
332 * -p (prefix result with file name) no longer prints conversion details
333 * by default result is prefixed by file name when enca is run on more than
336 enca-0.6.2: 2000-08-17
337 + help texts (-h and -v) made usable (thanx to Halef)
339 enca-0.6.1: 2000-08-15
342 enca-0.6.0: 2000-07-20
343 + built-in converter
344 + -x (convert) can now take form -x in_enc,out_enc causing enca to behave
345 like a dull converter
346 + new options -e and -E (disable internal/external converter)
347 + new option -l (print internally-convertible encodings)
349 enca-0.5.0: 2000-07-17
350 * -p (prefix result with file name) causes enca to print what is converted
352 * iso8859-2/cp1250 recognition improved
353 - doesn't spawn external converters as fast as is possible, but waits for
355 - fixed `Unrecognized encoding' when winner is 1250 (from 0.4.3)
356 - corrected -d (print details) table alignment
358 enca-0.4.3: 2000-07-14
359 * -d (print details) prints encodings alphabetically sorted
360 - corrected short encoding name t1 -> cork
361 - division-by-zero bugfixes
363 enca-0.4.2: has never been released
364 * options -m/-M ([don't] use iso8859-2/cp1250 hack) no longer supported
365 - fixed showing standard input as empty string (<STDIN> is printed now)
367 enca-0.4.1: 2000-07-12
368 * default of 60 significant characters changed to 10
370 enca-0.4.0: 2000-07-10
371 + first public release