STDIN and Powershell - How do I make an encoding?

I have a problem with Ruby (1.9.3) and Powershell.

I need to write an interactive console application that will deal with offers in Polish. They helped me and can extract ARGV elements with Polish diacritics, but the standard input does not work the way I want.

Code designation:

# encoding: UTF-8
target = ARGV[0].dup.force_encoding('CP1250').encode('UTF-8')
puts "string constant = dupą"
puts "dupą".bytes.to_a.to_s
puts "dupą".encoding

puts "target = " +target
puts target.bytes.to_a.to_s
puts target.encoding
puts target.eql? "dupą"

STDIN.set_encoding("CP1250", "UTF-8") 
# the line above changes nothing, it can be removed and the result is still the same
# I obviously wanted to mimic the ARGV solution

target2 = STDIN.gets
puts "target2 = " +target2
puts target2.bytes.to_a.to_s
puts target2.encoding
puts target2.eql? "dupą"

Output:

string constant = dupą
[100, 117, 112, 196, 133]
UTF-8
target = dupą
[100, 117, 112, 196, 133]
UTF-8
true
dupą //this is fed to STDIN.gets
target2 = dup
[100, 117, 112]
UTF-8
false

Apparently, Ruby never gets the fourth character from STDIN.gets. If I write a longer line, for example dupąlalala, only three initial bytes still appear inside the program.

  • I tried to list the bytes and the loop using getc, but they never reach Ruby (where are they lost?)
  • I used chcp 65001 (it seems nothing changes)
  • I changed my $ OutputEncoding to [Console] :: OutputEncoding; now it looks like this:

     IsSingleByte      : True
     BodyName          : ibm852
     EncodingName      : Środkowoeuropejski (DOS)
     HeaderName        : ibm852 
     WebName           : ibm852
     WindowsCodePage   : 1250
     IsBrowserDisplay  : True
     IsBrowserSave     : True
     IsMailNewsDisplay : False
     IsMailNewsSave    : False
     EncoderFallback   : System.Text.InternalEncoderBestFitFallback
     DecoderFallback   : System.Text.InternalDecoderBestFitFallback
     IsReadOnly        : True
     CodePage          : 852
    
  • Consolas

, - Powershell?

+5
1

. , . , , .

# Get "encoding" for code page 1250 (Central European)
$en=[System.Text.Encoding]::GetEncoding(1250)
# Looks like this:
IsSingleByte      : True
BodyName          : iso-8859-2
EncodingName      : Central European (Windows)
HeaderName        : windows-1250
WebName           : windows-1250
WindowsCodePage   : 1250
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : True
CodePage          : 1250

# Change STDIN input encoding
[console]::InputEncoding=$en
$x = Read-Host 
# I typed in dupą 
#  (I set Polish in Languate Bar. 
#   Final letter is apostrophe on US English keyboard)
[int[]][char[]]$x
# output is: 100 117 112 261 (in hex): 64 75 70 105
# the final character (261) is "Latin Small Letter A with Ogonek" 
+1

All Articles