Wednesday, October 8, 2014

Reading Massive Files In PowerShell

It's a known fact that PowerShell is rubbish at reading large files, and that's putting it politely.  Why, you ask?

The Get-Content cmdlet, used on its own, displays the content of a file as it reads it.  If, however, you capture that output in a variable, or pipe it to a command that has to collect everything before it can act (Sort-Object, for example), the whole file is buffered in memory first.  Get-Content also wraps every line in an object with extra properties attached, which adds overhead per line.  On a large file this eats your memory and swap, and that is why PowerShell struggles so badly with large files.

So we need to step back to good old VBScript, where we made use of a COM object called Scripting.FileSystemObject.  It copes well with large files and lets you work through a file line by line, processing as you go, rather than killing your system by loading the whole thing into memory.

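# Windows only: Scripting.FileSystemObject is a COM object.
$fso = New-Object -ComObject Scripting.FileSystemObject

# 1 = ForReading. COM resolves a relative path against the process working
# directory, which may differ from the PowerShell location, so prefer a full path.
$file = $fso.OpenTextFile("SomeTextFile.txt", 1)

# Read one line at a time so only the current line is held in memory.
while ( -not $file.AtEndOfStream ) {
    $line = $file.ReadLine()

    if ( $line -match "findsomething" ) {
        Write-Host $line
    }
}

$file.Close()
$file = $null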
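
If you use this a lot, the loop above can be wrapped in a small function.  What follows is only a sketch under a few assumptions: the function name Select-LargeFileLine and its -Path and -Pattern parameters are made up for illustration, it is Windows only because it relies on the same COM object, and it assumes the file is plain ASCII text, which is what OpenTextFile reads by default.  It resolves the path to a full path first, emits matching lines to the pipeline instead of writing them to the host, and closes the file even if something goes wrong part way through.

```powershell
# Hypothetical helper: the name and parameters are illustrative only.
function Select-LargeFileLine {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [string] $Pattern
    )

    # Stop on the first error so a failed open cannot fall through into the loop.
    $ErrorActionPreference = 'Stop'

    # COM resolves relative paths against the process working directory,
    # so convert to a full file system path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    $fso  = New-Object -ComObject Scripting.FileSystemObject
    $file = $null
    try {
        $file = $fso.OpenTextFile($fullPath, 1)   # 1 = ForReading

        # Only the current line is held in memory at any time.
        while (-not $file.AtEndOfStream) {
            $line = $file.ReadLine()
            if ($line -match $Pattern) {
                $line   # emit the matching line to the pipeline
            }
        }
    }
    finally {
        # Close the file and release the COM references even on failure.
        if ($file) {
            $file.Close()
            [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($file)
        }
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($fso)
    }
}
```

You would call it along the lines of: Select-LargeFileLine -Path .\SomeTextFile.txt -Pattern "findsomething".  Because the matches go to the pipeline one at a time, you can pipe them into another command without the whole file ever being held in memory.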


Now you can work effectively with large files in PowerShell without the cumbersome Get-Content, at least until the developers of PowerShell sort out its memory management and stop killing our Windows systems.