From time to time i get a request from a dept head to do a query in a DB and dump the results then alter it and submit it for processing. I dump it as a csv, which comes out as a csv (pipe delimeted values). The file is normally several hundred lines, one in awhile over 1000 (today’s was 1500).

I am trying to write a PS script to read the file, make the changes and puke it back out, so i don’t have to go line by line, or even a search and replace in excel, to make the desired altercations in the file. The issue I run into is, each line is a total of 79 characters total, but for some reason, someone along the line must have inputted a value wrong (or maybe the system did it, not sure) and about half way through, it drops to 78 characters long. My script goes through and uses the substring method to parse values in each line and assign them to variables to create a new string line and save it to a new file. I discovered the location of the error, which is near the end of each line, near one of the values that I parse out. I am having trouble trying to come up with a solution for this. Right now its throwing an " Exception calling “Substring” with “2” argument(s): “Index and length must refer to a location within the string. Parameter name: length” error message. Which i found is because of the irregular length of some of the lines, causing the substring length values, which are located at the end of each line, to be less than what is define for length in the substring, if that makes sense.

So sample of a valid line is:
23|SPARROW Jack J |W32100 | | |SAGE | | |SAGE |X|X

example of an invalid line:
23|SPARROW Jack J |W32100 | | |SAGE | | |SAGE|X|X

notice the late of a space between the ‘E’ at the end of ‘SAGE’ and the proceeding pipe.

Here is my code to parse values out of each line:

Get-Content -Path ".\InputFile.txt" | ForEach-Object {
      
            $DistrictNum         =   $_.substring(0,2)
        
            $CaseNum             =   $_.substring(32,10) 

            $OldOffice           =   $_.substring(43,3)  
 
            $OldUnit             =   $_.substring(47,5)
           
            $OldWorker           =   $_.substring(53,5)
        
            $NewOffice           =   $_.substring(59,3)

            $NewUnit             =   $_.substring(63,5)
           
            $NewWorker           =   $_.substring(69,5)
            if ( ($NewWorker.EndsWith('\|') ) ) {
                $NewWorker = $NewWorker.replace('\|',' ')
               

            }                   
            $Tape                =   $_.substring(75,1)
                 
            $Filler              =   $_.substring(77,1)


                $Dataline = $DistrictNum  + $CaseNum + $OldOffice + $OldUnit +  $OldWorker + $NewOffice + $NewUnit + $NewWorker + $Tape +  $Filler 
                $Dataline | Out-File -FilePath "$PSScriptRoot\OutputFile.txt" -Append 
            } | Out-Null

I can’t seem to figure out how to get that extra space between the ‘E’ at the end of ‘SAGE’ and the proceeding pipe, to be able to push the invalid lines out from 78 characters to the needed 79. For some reason the replace method i am using is not working. It seems to read in the file then throw an error once it reads in the first instance of an invalid line.

4 Spice ups

I would use -split, -join, and -replace for this. Let’s say the column you want to modify is third from the last, you can do something like this:

get-content .\inputfile.txt | Foreach-Object {
    # split columns into an array using pipe delimiter
    $s = $_ -split '\|'
    # replace the final character (if it is nonspace) of third to last item with the character plus a space
    $s[-3] = $s[-3] -replace '\S$','$0 '
    # join items back together with pipe separators
    $s -join '|'
} | Set-Content "$PSScriptRoot\OutputFile.txt"
1 Spice up

You can import it as a CSV ? just tell it to use ‘|’ as a delimiter?
and then you can trim the input, which will get rid of the space?

import-csv "somefile.txt" -delimiter "|"

maybe I’m missing something?

1 Spice up

I like Neally’s idea if you are using PowerShell 7. Then you can write back to the text file using Export-Csv -UseQuotes Never. Otherwise, you are dealing with quoted fields in your output. Or you are building your output using property member references rather than substrings. I’m not a fan of the substring stuff as it is overly brittle if checks aren’t in place.

2 Spice ups

I would just use MS word then open the csv file…

  • Replace “SAGE |” to “SAGE|”

But the more obvious or permanent solution is to change the data in DB itself to “Sage” instead of "Sage " or to find out what is causing the space ?

Or just RTRIM() the values in the query before it even dumps to the report file. Is what I’d do.

Unfortunately, each piece of information I am parsing out has to be a specific number of characters long, so if the values that are in that range, fall short of the specified length, the remaining place holders need to be filled with spaces. Example: “$NewWorker” has a predetermined length of 5 characters. The values in alot of the lines for where this variable is referencing, is only 4 characters long and then has 1 space. The system I upload this file too, is very specific and it will ingest 5 characters, from character number 53-57. So if i trim the space, the next character after that, will be put into spot 57 (which most likely will be the first character of the next value) and everything in the system will be thrown off by one character. Thank you for the suggestion though.

sadly, this system is owned by a third party and i don’t have access to do this or I would. Thank you for though.

1 Spice up

Can you provide the required schema for all columns? You’ve presented new information that wasn’t originally present. I think the substring method on the whole line is not the way to go. You should split on the pipes and then manipulate elements. The PadRight method is great for ensuring short columns are certain widths if your filler characters are the same. Combine that with substring to ensure long columns are certain widths. Here is an example, but since I don’t know your full column schema, I can’t provide something complete.

get-content .\inputfile.txt | Foreach-Object {
    # split columns into an array using pipe delimiter
    $s = $_ -split '\|'
    # first column should be two characters
    $s[0] = $s[0].PadRight(2,' ').SubString(0,2)
    # ninth column should be 5 characters
    $s[8] = $s[8].PadRight(5,' ').SubString(0,5)
    $s -join '|'
} | Set-Content "$PSScriptRoot\OutputFile.txt"
2 Spice ups

@adminofthings, this is what the original csv file (modified for security) looks like (at the point where the issue starts):

image

These are the variables I need to extract from the each line in the original file and their length. Then when constructing the new file, i also have to reference this chart (using the starting position field):

then the final layout needs to look like this (in Notepad++ i had it show dots for spaces):
image

Each line has to come out to exact 40 characters long and last two positions with be filled with ‘X’.

This is one way to apply that schema:

get-content .\inputfile.txt | Foreach-Object {
    # split columns into an array using pipe delimiter
    $s = $_ -split '\|'
    # district number
    $s[0] = $s[0].PadRight(2,' ').SubString(0,2)
    # case number
    $s[2] = $s[2].PadRight(10,' ').SubString(0,10)
    # old office
    $s[3] = $s[3].PadRight(3,' ').SubString(0,3)
    # old unit
    $s[4] = $s[4].PadRight(5,' ').SubString(0,5)
    # old worker
    $s[5] = $s[5].PadRight(5,' ').SubString(0,5)
    # new office
    $s[6] = $s[6].PadRight(3,' ').SubString(0,3)   
    # new unit
    $s[7] = $s[7].PadRight(5,' ').SubString(0,5)  
    # new worker
    $s[8] = $s[8].PadRight(5,' ').SubString(0,5)
    # extra X values
    $extra = 'XX'
    # combine array elements and extra values   
    -join (@($s[0]) + $s[2..8] + $extra)
} | Set-Content "$PSScriptRoot\OutputFile.txt"

Of course you can replace the variable names with something more descriptive and add all the elements together like you did in your code. You could also Import-Csv and add column names. But that does mean you will need PowerShell 7 to avoid quotes and remove the header line from the output.

@adminofthings, OMG that worked! Thank you so much for your help!

I think most of the other suggestions here would work. Another way would be to do like:

$DistrictNum         =   $_.substring(0,2).Trim()

The other suggestions would not work because this simply isn’t a “trim white space” problem. Columns have fixed width requirements where small columns need extra spaces and large columns need to be trimmed of any extra characters

1 Spice up

Fair. I overlooked the part about this needing to be fixed-length. One might wonder if fixed-length is a requirement for the processing-end of this.