Topic: CONVERTCP  (Read 124 times)

CONVERTCP
« on: May 18, 2020, 01:48:39 PM »

Nikky

  • Chef
  • ***
  • Location: Croatia
  • Date Registered: Jan 2016
  • Posts: 131
While playing with the VB script I noticed one awkward detail with the IniWrite function:
if the file is initially in unicode (utf-8), it simply deletes the entire contents and only the newly written lines remain.  :ohmy:

OK, we have the ConvertToAnsi function; once the file is converted, IniWrite works fine.

At least as far as I know, we don't have a reverse conversion function (ANSI to Unicode  or some other).
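To make clear what such a reverse conversion would have to do, here is a minimal Python sketch. It is not part of any existing project tool; the function names are made up for illustration, and cp1252 is only an assumed ANSI code page (yours may differ).

```python
import codecs

def ansi_to_utf16le_bom(data: bytes, ansi_codepage: str = "cp1252") -> bytes:
    """Re-encode ANSI bytes as UTF-16 LE, prefixed with the FF FE byte order mark.

    ansi_codepage is an assumption; pass the code page your system really uses.
    """
    text = data.decode(ansi_codepage)
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

def ansi_to_utf8(data: bytes, ansi_codepage: str = "cp1252", bom: bool = False) -> bytes:
    """Re-encode ANSI bytes as UTF-8, optionally with the EF BB BF mark."""
    text = data.decode(ansi_codepage)
    prefix = codecs.BOM_UTF8 if bom else b""
    return prefix + text.encode("utf-8")
```

For example, `ansi_to_utf16le_bom(b"[A]")` gives the bytes FF FE followed by two bytes per character.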

I found this little cmd util: CONVERTCP.exe - Convert text from one code page to another
https://www.dostips.com/forum/viewtopic.php?t=7570

Code: [Select]
CONVERTCP v.7.4. Converts a stream of characters to another code page.

Usage:
CONVERTCP CP_In CP_Out [/i "infile.txt"] [/o "outfile.txt"] [/v] [/f] [/b|/a]
CONVERTCP /?|/l

CP_In     Code Page Identifier of the input stream
CP_Out    Code Page Identifier of the output stream
 To get a list of supported Code Page Identifiers use option /l
 Alternatively you can use 0 for the ANSI Code Page
  and 1 for the OEM Code Page of your system default settings.
 Instead of the Code Page Identifier you may pass the related
  MIME type, or the name of a custom *.sbcs file.

/i        Introduces the source file
/o        Introduces the destination file
           (the content of an existing file will be truncated
           unless option /a was passed)
 Redirections to or from CONVERTCP can be used instead of /i and /o

/v        Verify that all characters have been converted without
           using the replacement character or approximated ASCII
           characters
           Only in this case CONVERTCP returns a zero value
           NOTE Option /v is supported on Windows Vista and later
/f        Flush the stream buffer before CONVERTCP terminates
           in case the new file shall be accessed immediately
/b        Add the Byte Order Mark to the output stream
           (will be ignored if CP_Out was not one of
           65001, 1200, 1201, 12000, 12001, or 54936)
/a        Append the output stream to the destination file
           (always use the same CP_Out)
 Do not combine options /b and /a

/?        Display this help message
/l        Display a list of supported Code Page Identifiers
           installed on this computer

infile    Path of a text file whose content shall be converted
outfile   Path of a text file where the converted stream
           shall be written
 Input file and output file must not be the same


sourceforge files > https://sourceforge.net/projects/convertcp/files/
Look in bin/x86 or bin/x64 for the corresponding OS architecture.
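As a rough illustration of the usage text above, this Python sketch only builds the argument list for a CONVERTCP call; it does not run anything. The helper name and its parameters are hypothetical, and the switches are taken from the help message as quoted, not tested against the tool.

```python
def build_convertcp_args(cp_in, cp_out, infile, outfile,
                         bom=False, verify=False, flush=False, append=False):
    """Build a CONVERTCP command line as a list (hypothetical helper)."""
    # The help text says: do not combine options /b and /a.
    if bom and append:
        raise ValueError("do not combine /b and /a")
    # The help text says: input file and output file must not be the same.
    if infile == outfile:
        raise ValueError("input file and output file must not be the same")
    args = ["CONVERTCP", str(cp_in), str(cp_out), "/i", infile, "/o", outfile]
    if verify:
        args.append("/v")
    if flush:
        args.append("/f")
    if bom:
        args.append("/b")
    if append:
        args.append("/a")
    return args
```

With code page 0 (ANSI) as input and 1200 (UTF-16 Little Endian) as output, `build_convertcp_args(0, 1200, "in.ini", "out.ini", bom=True)` yields a list that could be handed to `subprocess.run` on a machine where CONVERTCP.exe is on the PATH.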

Maybe it wouldn't be bad to add a routine to projects / tools  :thumbsup:


Re: CONVERTCP
« Reply #1 on: May 18, 2020, 02:26:30 PM »

Lancelot

  • Gena Baker
  • Grand Chef
  • *****
  • Date Registered: Sep 2010
  • Posts: 10350
Thanks for CONVERTCP

I am not sure if it is good for utf-8!

ps: I use notepad2 to convert the ascii code page; good to have a cmd line tool

+
iniwrite tools that use the windows api always cause trouble, and for now I do not know of an application that does iniwrite without the windows api.  :wink:

It would be great to have a non-windows-api iniwrite tool that supports multiple txt formats (ascii utf-8 utf16 ....) without reading the whole file into memory (and no .net).
 Let me know if you find one some day. :cheers:

ps: and last time I also failed with iniwrite AutoIT on a special utf-8 case .......

Another idea, if you would like to test, try this:
add a line and a section to the top of the utf-8 file before iniwrite
Code: [Select]
;Dummy
[Dummy]
....
tip: like .reg files which are utf-8 / unicode-bom .....

After the iniwrite, txtremove the top lines.
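The dummy-section idea can be sketched in Python as plain text handling. This only shows adding and later removing the two top lines; the function names are invented for illustration, and the actual iniwrite step in between is left out.

```python
DUMMY_LINES = [";Dummy", "[Dummy]"]

def add_dummy_header(text: str, newline: str = "\r\n") -> str:
    """Put the dummy comment and dummy section at the top of the ini text."""
    return newline.join(DUMMY_LINES) + newline + text

def remove_dummy_header(text: str) -> str:
    """Remove the two dummy lines again, but only if they are really on top."""
    lines = text.splitlines(keepends=True)
    if [line.rstrip("\r\n") for line in lines[:2]] == DUMMY_LINES:
        lines = lines[2:]
    return "".join(lines)
```

Whether this avoids the lost-content problem has to be tested on the real file; the sketch only covers the text handling before and after the iniwrite.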


*
Further, for a small number of utf-8 files to iniwrite, try Call,IniWrite and see if it works fine.

:turtle:

Re: CONVERTCP
« Reply #2 on: May 18, 2020, 04:21:58 PM »

Nikky

Seems to support utf-8:
Code: [Select]
/l        Display a list of supported Code Page Identifiers
           installed on this computer

h:\Win10PESE\Projects\Tools\Win10PESE>convertcp /l
Code   |     Supported As     | Description
Page ID| Input Stream >511 MB |
-------+----------------------+--------------------------------------------------
    37 |         Yes          | 37    (IBM EBCDIC - U.S./Canada)
   437 |         Yes          | 437   (OEM - United States)
   500 |         Yes          | 500   (IBM EBCDIC - International)
   708 |         Yes          | 708   (Arabic - ASMO)
   720 |         Yes          | 720   (Arabic - Transparent ASMO)
   737 |         Yes          | 737   (OEM - Greek 437G)
   775 |         Yes          | 775   (OEM - Baltic)
   850 |         Yes          | 850   (OEM - Multilingual Latin I)
   852 |         Yes          | 852   (OEM - Latin II)
   855 |         Yes          | 855   (OEM - Cyrillic)
   857 |         Yes          | 857   (OEM - Turkish)
   858 |         Yes          | 858   (OEM - Multilingual Latin I + Euro)
   860 |         Yes          | 860   (OEM - Portuguese)
   861 |         Yes          | 861   (OEM - Icelandic)
   862 |         Yes          | 862   (OEM - Hebrew)
   863 |         Yes          | 863   (OEM - Canadian French)
   864 |         Yes          | 864   (OEM - Arabic)
   865 |         Yes          | 865   (OEM - Nordic)
   866 |         Yes          | 866   (OEM - Russian)
   869 |         Yes          | 869   (OEM - Modern Greek)
   870 |         Yes          | 870   (IBM EBCDIC - Multilingual/ROECE (Latin-2))
   874 |         Yes          | 874   (ANSI/OEM - Thai)
   875 |         Yes          | 875   (IBM EBCDIC - Modern Greek)
   932 |         No           | 932   (ANSI/OEM - Japanese Shift-JIS)
   936 |         No           | 936   (ANSI/OEM - Simplified Chinese GBK)
   949 |         No           | 949   (ANSI/OEM - Korean)
   950 |         No           | 950   (ANSI/OEM - Traditional Chinese Big5)
  1026 |         Yes          | 1026  (IBM EBCDIC - Turkish (Latin-5))
  1047 |         Yes          | 1047  (IBM EBCDIC - Latin-1/Open System)
  1140 |         Yes          | 1140  (IBM EBCDIC - U.S./Canada (37 + Euro))
  1141 |         Yes          | 1141  (IBM EBCDIC - Germany (20273 + Euro))
  1142 |         Yes          | 1142  (IBM EBCDIC - Denmark/Norway (20277 + Euro))
  1143 |         Yes          | 1143  (IBM EBCDIC - Finland/Sweden (20278 + Euro))
  1144 |         Yes          | 1144  (IBM EBCDIC - Italy (20280 + Euro))
  1145 |         Yes          | 1145  (IBM EBCDIC - Latin America/Spain (20284 + Euro))
  1146 |         Yes          | 1146  (IBM EBCDIC - United Kingdom (20285 + Euro))
  1148 |         Yes          | 1148  (IBM EBCDIC - International (500 + Euro))
  1149 |         Yes          | 1149  (IBM EBCDIC - Icelandic (20871 + Euro))
  1200 |         Yes          | 1200  (UTF-16 Little Endian Byte Order)
  1201 |         Yes          | 1201  (UTF-16 Big Endian Byte Order)
  1250 |         Yes          | 1250  (ANSI - Central Europe)
  1251 |         Yes          | 1251  (ANSI - Cyrillic)
  1252 |         Yes          | 1252  (ANSI - Latin I)
  1253 |         Yes          | 1253  (ANSI - Greek)
  1254 |         Yes          | 1254  (ANSI - Turkish)
  1255 |         Yes          | 1255  (ANSI - Hebrew)
  1256 |         Yes          | 1256  (ANSI - Arabic)
  1257 |         Yes          | 1257  (ANSI - Baltic)
  1258 |         Yes          | 1258  (ANSI/OEM - Viet Nam)
  1361 |         No           | 1361  (Korean - Johab)
 10000 |         Yes          | 10000 (MAC - Roman)
 10001 |         No           | 10001 (MAC - Japanese)
 10002 |         No           | 10002 (MAC - Traditional Chinese Big5)
 10003 |         No           | 10003 (MAC - Korean)
 10004 |         Yes          | 10004 (MAC - Arabic)
 10005 |         Yes          | 10005 (MAC - Hebrew)
 10006 |         Yes          | 10006 (MAC - Greek I)
 10007 |         Yes          | 10007 (MAC - Cyrillic)
 10008 |         No           | 10008 (MAC - Simplified Chinese GB 2312)
 10010 |         Yes          | 10010 (MAC - Romania)
 10017 |         Yes          | 10017 (MAC - Ukraine)
 10021 |         Yes          | 10021 (MAC - Thai)
 10029 |         Yes          | 10029 (MAC - Latin II)
 10079 |         Yes          | 10079 (MAC - Icelandic)
 10081 |         Yes          | 10081 (MAC - Turkish)
 10082 |         Yes          | 10082 (MAC - Croatia)
 12000 |         Yes          | 12000 (UTF-32 Little Endian Byte Order)
 12001 |         Yes          | 12001 (UTF-32 Big Endian Byte Order)
 20000 |         No           | 20000 (CNS - Taiwan)
 20001 |         No           | 20001 (TCA - Taiwan)
 20002 |         No           | 20002 (Eten - Taiwan)
 20003 |         No           | 20003 (IBM5550 - Taiwan)
 20004 |         No           | 20004 (TeleText - Taiwan)
 20005 |         No           | 20005 (Wang - Taiwan)
 20105 |         Yes          | 20105 (IA5 IRV International Alphabet No.5)
 20106 |         Yes          | 20106 (IA5 German)
 20107 |         Yes          | 20107 (IA5 Swedish)
 20108 |         Yes          | 20108 (IA5 Norwegian)
 20127 |         Yes          | 20127 (US-ASCII)
 20261 |         No           | 20261 (T.61)
 20269 |         Yes          | 20269 (ISO 6937 Non-Spacing Accent)
 20273 |         Yes          | 20273 (IBM EBCDIC - Germany)
 20277 |         Yes          | 20277 (IBM EBCDIC - Denmark/Norway)
 20278 |         Yes          | 20278 (IBM EBCDIC - Finland/Sweden)
 20280 |         Yes          | 20280 (IBM EBCDIC - Italy)
 20284 |         Yes          | 20284 (IBM EBCDIC - Latin America/Spain)
 20285 |         Yes          | 20285 (IBM EBCDIC - United Kingdom)
 20290 |         Yes          | 20290 (IBM EBCDIC - Japanese Katakana Extended)
 20297 |         Yes          | 20297 (IBM EBCDIC - France)
 20420 |         Yes          | 20420 (IBM EBCDIC - Arabic)
 20423 |         Yes          | 20423 (IBM EBCDIC - Greek)
 20424 |         Yes          | 20424 (IBM EBCDIC - Hebrew)
 20833 |         Yes          | 20833 (IBM EBCDIC - Korean Extended)
 20838 |         Yes          | 20838 (IBM EBCDIC - Thai)
 20866 |         Yes          | 20866 (Russian - KOI8)
 20871 |         Yes          | 20871 (IBM EBCDIC - Icelandic)
 20880 |         Yes          | 20880 (IBM EBCDIC - Cyrillic (Russian))
 20905 |         Yes          | 20905 (IBM EBCDIC - Turkish)
 20924 |         Yes          | 20924 (IBM EBCDIC - Latin-1/Open System (1047 +Euro))
 20932 |         No           | 20932 (JIS X 0208-1990 & 0212-1990)
 20936 |         No           | 20936 (Simplified Chinese GB2312)
 21025 |         Yes          | 21025 (IBM EBCDIC - Cyrillic (Serbian, Bulgarian))
 21027 |         Yes          | 21027 (Ext Alpha Lowercase)
 21866 |         Yes          | 21866 (Ukrainian - KOI8-U)
 28591 |         Yes          | 28591 (ISO 8859-1 Latin I)
 28592 |         Yes          | 28592 (ISO 8859-2 Central Europe)
 28593 |         Yes          | 28593 (ISO 8859-3 Latin 3)
 28594 |         Yes          | 28594 (ISO 8859-4 Baltic)
 28595 |         Yes          | 28595 (ISO 8859-5 Cyrillic)
 28596 |         Yes          | 28596 (ISO 8859-6 Arabic)
 28597 |         Yes          | 28597 (ISO 8859-7 Greek)
 28598 |         Yes          | 28598 (ISO 8859-8 Hebrew: Visual Ordering)
 28599 |         Yes          | 28599 (ISO 8859-9 Latin 5)
 28603 |         Yes          | 28603 (ISO 8859-13 Latin 7)
 28605 |         Yes          | 28605 (ISO 8859-15 Latin 9)
 38598 |         Yes          | 38598 (ISO 8859-8 Hebrew: Logical Ordering)
 50220 |         No           | 50220 (ISO-2022 Japanese with no halfwidth Katakana)
 50221 |         No           | 50221 (ISO-2022 Japanese with halfwidth Katakana)
 50222 |         No           | 50222 (ISO-2022 Japanese JIS X 0201-1989)
 50225 |         No           | 50225 (ISO-2022 Korean)
 50227 |         No           | 50227 (ISO-2022 Simplified Chinese)
 50229 |         No           | 50229 (ISO-2022 Traditional Chinese)
 51949 |         No           | 51949 (EUC-Korean)
 52936 |         No           | 52936 (HZ-GB2312 Simplified Chinese)
 54936 |         No           | 54936 (GB18030 Simplified Chinese)
 57002 |         No           | 57002 (ISCII - Devanagari)
 57003 |         No           | 57003 (ISCII - Bengali)
 57004 |         No           | 57004 (ISCII - Tamil)
 57005 |         No           | 57005 (ISCII - Telugu)
 57006 |         No           | 57006 (ISCII - Assamese)
 57007 |         No           | 57007 (ISCII - Oriya)
 57008 |         No           | 57008 (ISCII - Kannada)
 57009 |         No           | 57009 (ISCII - Malayalam)
 57010 |         No           | 57010 (ISCII - Gujarati)
 57011 |         No           | 57011 (ISCII - Punjabi (Gurmukhi))
 65000 |         No           | 65000 (UTF-7)
 65001 |         Yes          | 65001 (UTF-8)
65001 (UTF-8)

I'll try what you suggested and let you know the result.

PS: The original ini file has this at the top:
; UNICODE FILE - edit with care ;-)

[Others]
...

PS2: Call,IniWrite (without an explicit ConvertToAnsi) seems to use the ConvertToAnsi function itself:
the whole content is there, probably in ansi format, but without the ; UNICODE FILE - edit with care ;-) line on top.
« Last Edit: May 18, 2020, 04:59:06 PM by Nikky »

Re: CONVERTCP
« Reply #3 on: May 19, 2020, 12:51:03 PM »

Nikky

As they say, simple solutions are the best  :tongue:

After trying multiple variants, this is the solution:

"original" ini - correct unicode - two bytes for each character - FF FE hex at the beginning of the file
.
use ConvertToAnsi in the VB script and change to the desired values
.
burn / boot winpe and start that application
.
the application accepts the ansi ini file and the changes made
.
on closing, the application automatically recreates the unicode ini file.


No need for an external converter to unicode.
By the way, the above CONVERTCP did not create the same structure when converting to unicode (two bytes for each character - FF FE hex at the beginning of the file).
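To compare two files on that point, the first bytes can be checked for a byte order mark. Below is a small Python sketch under the assumption that only the standard marks matter; `detect_bom` is a made-up name. Per the help text quoted earlier, CONVERTCP adds the Byte Order Mark only when /b is passed, so a missing FF FE is one thing worth checking; that is a guess, not a confirmed cause of the difference.

```python
import codecs

# Longer marks first: the UTF-32 LE mark (FF FE 00 00) begins with
# the UTF-16 LE mark (FF FE), so it has to be tested before it.
_BOMS = [
    ("utf-32-le", codecs.BOM_UTF32_LE),
    ("utf-32-be", codecs.BOM_UTF32_BE),
    ("utf-8", codecs.BOM_UTF8),
    ("utf-16-le", codecs.BOM_UTF16_LE),
    ("utf-16-be", codecs.BOM_UTF16_BE),
]

def detect_bom(data: bytes):
    """Return the encoding name for a leading byte order mark, or None."""
    for name, bom in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

Reading the first four bytes of a file in binary mode and passing them to `detect_bom` is enough; a file with the FF FE structure described above returns "utf-16-le".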

For now, the case is closed.  :grin:

Re: CONVERTCP
« Reply #4 on: May 20, 2020, 10:42:09 AM »

Lancelot

If the ini file does not have non-ascii chars and the app is happy with ascii, case closed.  :thumbsup:
We did that many times in plugins.

But if one day a non-ansi/ascii ini file has non-ascii chars (probably for multilanguage support),
check Horst Schaeffer's inifile (which I used in some plugins in the distant past)
https://www.horstmuc.de/wbat32.htm#inifile

or AutoIT with /AutoIt3ExecuteLine (with iniwrite) like NIKZZZZ wrote here:
http://theoven.org/index.php?topic=2703.msg30750#msg30750

and if you have trouble with a UnicodeBom file with these tools like you wrote (whole file lost etc.), read my previous tips. :wink:

See you
:turtle:

Re: CONVERTCP
« Reply #5 on: May 20, 2020, 11:22:31 AM »

Nikky

They probably switched to unicode because of multilanguage support.
As I wrote above, there is a solution that works in this case.
We will not complicate things, let's move on, next   :grin:

 
