Wednesday, October 8, 2014

Reading Massive Files In PowerShell

It's a known fact that PowerShell is rubbish at reading large files, and that's putting it politely.  Why, you ask?

The Get-Content cmdlet, when used on its own, displays the content of the file as soon as it reads the data.  If, however, you then send that content down a pipeline, it gets buffered before the next command can use it.  On a large file this consumes memory and swap at an alarming rate, and that is why PowerShell is useless with large files.
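To see the difference, compare these patterns (big.log is a hypothetical large file; -ReadCount is a real Get-Content parameter that batches lines, which is a partial mitigation rather than a full fix):

```powershell
# Fine: output streams to the console as it is read
Get-Content big.log

# Trouble on a big file: the pipeline adds heavy per-line overhead
Get-Content big.log | Select-String "findsomething"

# Partial mitigation: -ReadCount sends lines down the pipeline in
# batches of 1000, greatly reducing the overhead
Get-Content big.log -ReadCount 1000 | Select-String "findsomething"
```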

So, we need to go back a step to good old VBScript, where we made use of a COM object called Scripting.FileSystemObject.  It works well with large files and lets you work through a file and process as you go, rather than killing your system trying to load the whole thing into memory.

# Use the COM FileSystemObject to stream the file line by line
$fso = New-Object -ComObject Scripting.FileSystemObject

# Open the file for reading (mode 1 = ForReading)
$file = $fso.OpenTextFile("SomeTextFile.txt",1)

while ( ! $file.AtEndOfStream ) {
    $line = $file.ReadLine()

    if ( $line -match "findsomething" ) {
        Write-Host $line
    }
}

# Tidy up the handle when done
$file.Close()
$file = $Null
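If you'd rather skip COM, .NET's System.IO.StreamReader gives the same streaming behaviour; a minimal sketch, assuming the same SomeTextFile.txt:

```powershell
# Stream the file with .NET instead of COM; memory use stays flat.
# Use a full path if in doubt: .NET resolves relative paths against
# the process working directory, not the PowerShell location.
$reader = New-Object System.IO.StreamReader("SomeTextFile.txt")

while ( ! $reader.EndOfStream ) {
    $line = $reader.ReadLine()

    if ( $line -match "findsomething" ) {
        Write-Host $line
    }
}

$reader.Close()
```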


Now you can work effectively with large files in PowerShell without the cumbersome Get-Content, at least until the developers of PowerShell understand memory management and stop killing our Windows systems.

Tuesday, September 23, 2014

MySQL Data Import

I always forget this piece: MySQL, unlike SQL Server, doesn't have a convenient data import tool, but it does provide a flexible command-line import feature.

To import CSV data (for example) into MySQL you need:
- A database container
- A table

Let's start with a new database:

create database myDataImportExample;

use myDataImportExample;

Now we need a table for the data:

create table myData (
  ticker varchar(14),
  tradeDate varchar(8),
  openPrice decimal(17,4),
  volume bigint
);

Then import the file, which contains four columns of data:

LOAD DATA INFILE '/home/user1/myData.csv'
INTO TABLE myData
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
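For reference, here is the shape of file the table above expects; the values are made up, and the header row is what IGNORE 1 ROWS skips:

```
ticker,tradeDate,openPrice,volume
ABC,20140801,101.2500,1500000
XYZ,20140801,55.1000,230000
```

One gotcha: if the server rejects the path with a secure_file_priv error, either place the file in the directory that setting permits or run LOAD DATA LOCAL INFILE from the mysql client instead.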

Thursday, August 21, 2014

Java worse than ever

Java, the coffee is better than the language.

A language that is meant to make writing code for multiple platforms easier.  Even with Oracle's own implementation of Java, you would think you could compile once on one platform, take the compiled byte code, drop it onto another platform with technically the same JVM, and have it work straight away.  So, Java folk, tell me: what is all the fuss about a language that doesn't do what it says it should?

1. I have to recompile code on Windows and then again on Linux if it involves more than "Hello world".
So why not just do it in C++?
Or better still Perl, a true write-once-run-anywhere language.

2. Memory leaks. So many Java programmers are under the impression this language can't have memory leaks.
Can we make a law that says those who think this should be banned from writing software?
The number of times I've had a Java dev ask for JVM values to be increased makes me laugh. Sorry, sort your code out.

3. Frameworks. A fancy word for bloatware. I have to include a massive library of crap, filling up memory, when I only use two functions from it.

Come on, people: with all this processing power we have today, our computer systems really could be thinking for themselves. They could certainly be writing better code.

If we can't have a language that can compile once and run everywhere, then let's just go back to C++ and Perl, which, let's be honest, are still the only two real languages.

Write once, compile everywhere, as with Java, is a poor excuse for software development.

A rant from a Perl and C programmer and system admin who's seen far too much poor coding in Java and is fed up with having to keep recompiling a language that should only need compiling once to run anywhere.

The end, I'm off for a cup of Java :-)