Thursday, December 10, 2020

Pandas change string to new dataframe

 Today I was given a challenge where someone was extracting JSON data and wanted to write it to a database.  Thanks to https://www.dataquest.io/blog/sql-insert-tutorial/ I had the relevant information to place the JSON data into a Pandas DataFrame, then grab the column names and the data and generate the necessary SQL.
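
The column-names-to-SQL step can be sketched roughly as follows (a minimal sketch: the table name ec2_instances and the sample columns are my assumptions, and the cursor call is left commented out since it needs a live connection);

```python
import pandas as pd

# Hypothetical sample data standing in for the extracted JSON
df = pd.DataFrame([
    {"instance_id": "i-0abc", "state": "running"},
    {"instance_id": "i-0def", "state": "stopped"},
])

# Build a parameterised INSERT from the DataFrame's column names
cols = ", ".join(df.columns)
placeholders = ", ".join(["%s"] * len(df.columns))
sql = "INSERT INTO ec2_instances ({}) VALUES ({})".format(cols, placeholders)

# One tuple per DataFrame row, ready for executemany
rows = [tuple(r) for r in df.itertuples(index=False)]

print(sql)
# cursor.executemany(sql, rows)   # with a live database connection
```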

HOWEVER

The data that was given had an array of dictionaries within it, causing the DataFrame to show it as a string!

So this information was dropped from the DataFrame so that the core data for that table could be stored.

A second DataFrame was then required so that the Security Group data for each instance could be stored into the database in its own table.

For this to happen I first created a new DataFrame that would contain;

  • Instance ID
  • Security Group array of dictionaries as a string
From here it was then a case of breaking the string down so that each security group, paired with its instance ID, could become a new record in a new DataFrame.


import pandas as pd

data = pd.read_json('ec2_data.json')
sg = data[['instance_id','sec_groups']]
a = []
for x in sg.values:
    for y in x[1]:
        b = {"instance_id": x[0]}
        b.update(y)
        a.append(b)
df2 = pd.DataFrame.from_dict(a)
print(df2)

  • The idea was to create the new DataFrame called sg from the original, taking only the 2 columns we needed.
  • An empty list was then created to collect the records, since pandas can build a DataFrame (df2) from a list of dictionaries, one dictionary per record.
  • We then iterate over the values of the sg DataFrame, and within that iterate over the security groups, since x[1] contains an array of security group dictionary objects.
  • A temporary dictionary (b) is then created to store the instance ID along with the security group data, such as the group ID and group name.
  • That dictionary is then appended to the list a.
  • This continues for all the data, and if an instance happens to have more than 1 security group then its instance ID will appear more than once in the final DataFrame, with the different security group data in each record.
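
For what it's worth, a newer pandas (0.25+ for explode, 1.0+ for pd.json_normalize) can do the same flattening without the explicit loops. A sketch with made-up sample data; the GroupId/GroupName field names are my assumptions about what the security group dictionaries contain;

```python
import pandas as pd

# Made-up records matching the shape described above:
# each row has an instance ID plus a list of security-group dicts
records = [
    {"instance_id": "i-0abc",
     "sec_groups": [{"GroupId": "sg-1", "GroupName": "web"},
                    {"GroupId": "sg-2", "GroupName": "ssh"}]},
    {"instance_id": "i-0def",
     "sec_groups": [{"GroupId": "sg-1", "GroupName": "web"}]},
]
sg = pd.DataFrame(records)

# explode gives one row per security group; json_normalize then
# expands each dictionary into its own columns
exploded = sg.explode("sec_groups").reset_index(drop=True)
df2 = pd.concat([exploded[["instance_id"]],
                 pd.json_normalize(exploded["sec_groups"].tolist())],
                axis=1)
print(df2)  # i-0abc appears twice, once per security group
```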

Thursday, July 27, 2017

Windows PowerShell wget equivalent

Windows lacks useful tools that Linux and Unix have, but with the introduction of PowerShell things become more useful and easier to work with.

To implement wget in PowerShell and have it download any type of file use the following;

$webConnect = new-object System.Net.WebClient

$webConnect.DownloadFile("http://your.web.location/somefile.ext",".\localfile.ext")


The above example will download the file into the current directory.  somefile.ext could be for example a zip file such as mydocs.zip, and localfile.ext would simply be mydocs.zip.

Note that DownloadFile always needs both arguments.  If you want the content sent to the screen (stdout) instead of a file, use the DownloadString method, e.g. $webConnect.DownloadString("http://your.web.location/somefile.ext").

Wednesday, June 3, 2015

Dynamically adding Elements to forms and then submitting

Recently I decided to write a web application using PHP, for a change, rather than Perl.  As part of my application I wanted to dynamically add some fields to a form, by allowing the user to decide how many fields they wanted to add.

The fields needed to be added to a table layout which would include the detail for the user in one column and the input fields in the other, example code;
<script>
function addExtraBoxes(numBoxes) {

        var container = document.getElementById('extragrid');
        // Clear container
        while ( container.hasChildNodes() ) {
                container.removeChild(container.lastChild);
        }
        
        // Add the question boxes
        for ( var x = 1; x <= numBoxes; x++ ) {
                var newrow = document.createElement('tr');
                newrow.innerHTML = '<td>Extra Box ' + x + ': </td><td> <input type="text" name="extra' + x + '" size="60"></td>';
                container.appendChild(newrow);
        }
}
</script>

<table>
<form name='dynamic' id='dynamic' action='somepage.php' method='post'>
<tr><td>Number extra fields:</td><td><select name='extras' onChange='addExtraBoxes(this.options[this.selectedIndex].value)'>
<option value='Choose' selected>---Choose one---</option>
<option value='1'>1</option>
<option value='2'>2</option>
<option value='3'>3</option>
<option value='4'>4</option>
<option value='5'>5</option>
</select></td></tr>
<tr><td colspan='2'>
<table id='extragrid'></table>
</td></tr>
</form>
</table>

The above code will conveniently add the new input text boxes and field info to the table based on the user's selection from the select box.

A lot of web sites will show you only this piece of code, and then proceed to tell you that if the form elements are showing within the <form> elements in your developer view (e.g. Firebug or Chrome's developer tools), the fields should be submitted with the rest of your form.  This sadly is not true, since what they are doing is adding the new form elements to either a DIV, or in my case above a table.  The table or DIV itself is not associated with the form (you have to know about the DOM to understand this).

So, let me explain:

I have a <form> called dynamic.  That form, at the time of the web page rendering, only has a <select> element and nothing else.
If we then select one of the numbers from the select element, it fires the JavaScript function addExtraBoxes(), which adds the chosen number of text field elements to the extragrid table.  The table is not in the same location as the form in the DOM, so we get the following;

document.dynamic.extras      This is the form and the select box

document.extragrid           This is the table element
document.extragrid.extra1
document.extragrid.extra2    These extra elements are not associated with the form, but with the table, since that is how JavaScript added them.

So you can see that the DOM does not associate the new form elements with the form, so although you will see them showing nicely in the developer tools, when you come to submit the form the elements are not submitted and are missing from the form data.

Therefore those people saying that if they are showing up in the developer tools inside the <form> and </form> elements when viewing the source, who say that the fields should be contained in the form data, couldn't be more wrong and don't understand the JavaScript DOM.

How do you overcome this, I hear you ask.  Well, if you have an understanding of the engineering of things, then from the above you'll realise that you need some more JavaScript to add these disjoint elements to the form before submitting it.  Below is the piece of code that will do that;

function addExtras() {
    try {
        // Snapshot the live collection first: appendChild moves each node,
        // which would otherwise re-order the collection mid-loop
        var inputs = document.getElementsByTagName('input');
        var elems = [];
        for ( var i = 0; i < inputs.length; i++ ) {
            elems.push(inputs.item(i));
        }
        for ( var j = 0; j < elems.length; j++ ) {
            if ( elems[j].type == 'text' ) {
                document.getElementById('dynamic').appendChild(elems[j]);
            }
        }
    } catch(e) {
        alert(e.message);
    }
    document.getElementById('dynamic').submit();
    return false;
}

Our <form> element needs to have the onSubmit event added to call the function as follows;
<form name='dynamic' id='dynamic' action='somepage.php' method='post' onSubmit='return addExtras()'>

By using the onSubmit event we are able to use the JavaScript to add the form elements to the correct location in the DOM, and therefore when the submit occurs the data is correctly posted with the rest of the form.

Here is the final code in the correct order;
<script>
function addExtraBoxes(numBoxes) {

        var container = document.getElementById('extragrid');
        // Clear container
        while ( container.hasChildNodes() ) {
                container.removeChild(container.lastChild);
        }
        
        // Add the question boxes
        for ( var x = 1; x <= numBoxes; x++ ) {
                var newrow = document.createElement('tr');
                newrow.innerHTML = '<td>Extra Box ' + x + ': </td><td> <input type="text" name="extra' + x + '" size="60"></td>';
                container.appendChild(newrow);
        }
}

function addExtras() {
    try {
        // Snapshot the live collection first: appendChild moves each node,
        // which would otherwise re-order the collection mid-loop
        var inputs = document.getElementsByTagName('input');
        var elems = [];
        for ( var i = 0; i < inputs.length; i++ ) {
            elems.push(inputs.item(i));
        }
        for ( var j = 0; j < elems.length; j++ ) {
            if ( elems[j].type == 'text' ) {
                document.getElementById('dynamic').appendChild(elems[j]);
            }
        }
    } catch(e) {
        alert(e.message);
    }
    document.getElementById('dynamic').submit();
    return false;
}
</script>

<table>
<form name='dynamic' id='dynamic' action='somepage.php' method='post' onSubmit='return addExtras()'>
<tr><td>Number extra fields:</td><td><select name='extras' onChange='addExtraBoxes(this.options[this.selectedIndex].value)'>
<option value='Choose' selected>---Choose one---</option>
<option value='1'>1</option>
<option value='2'>2</option>
<option value='3'>3</option>
<option value='4'>4</option>
<option value='5'>5</option>
</select></td></tr>
<tr><td colspan='2'>
<table id='extragrid'></table>
</td></tr>
</form>
</table>

Wednesday, October 8, 2014

Reading Massive Files In PowerShell

It's a known fact that PowerShell is rubbish at reading large files, and that's putting it politely.  Why, you ask?

The Get-Content cmdlet, when used on its own, appears to display the content of the file as soon as it reads the data.  If, however, you pipe the content on to something else, it is buffered first before the next command can use the data.  This in turn consumes huge amounts of memory and swap, and hence why PowerShell is useless with large files.

So, we need to go back a step to good old VBScript, whereby we made use of a system object called Scripting.FileSystemObject, which works well with large files and allows you to work through a file and process it as you go, rather than killing your system trying to load it all into memory.

$fso=New-Object -ComObject Scripting.FileSystemObject

$file=$fso.OpenTextFile("SomeTextFile.txt",1)

while ( ! $file.AtEndOfStream ) {
    $line = $file.ReadLine()

    if ( $line -match "findsomething" ) {
        write-host $line
    }
}

$file.Close()
$file=$Null


Now you can work effectively with large files in PowerShell and not use the cumbersome Get-Content, until the developers of PowerShell understand memory management and stop killing our Windows systems.

Tuesday, September 23, 2014

MySQL Data Import

I always forget this piece, as MySQL, unlike SQL Server, doesn't have a convenient data import tool, but instead provides a flexible command line import feature.

To import CSV (for example) data into MySQL you need;
- A database container
- A table

Let's start with a new database;

create database myDataImportExample;

use myDataImportExample;

Now we need a table for the data;

create table myData (
  ticker varchar(14),
  tradeDate varchar(8),
  openPrice decimal(17,4),
  volume bigint
);

Then import the data, which has the same 4 columns;

LOAD DATA INFILE '/home/user1/myData.csv'
INTO TABLE myData
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
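
If LOAD DATA INFILE isn't permitted on your server (the secure_file_priv setting commonly blocks it), the same import can be sketched from Python.  The table and columns are from the example above; the connection details are placeholders and the connector calls are commented out since they need a live server;

```python
import csv
import io

# In-memory stand-in for /home/user1/myData.csv (same 4 columns)
raw = """ticker,tradeDate,openPrice,volume
ABC,20140923,10.5000,1200
XYZ,20140923,99.2500,300
"""

# Parse the CSV, skipping the header row automatically via DictReader
rows = [(r["ticker"], r["tradeDate"], r["openPrice"], r["volume"])
        for r in csv.DictReader(io.StringIO(raw))]

sql = ("INSERT INTO myData (ticker, tradeDate, openPrice, volume) "
       "VALUES (%s, %s, %s, %s)")
print(rows[0])
# import MySQLdb
# conn = MySQLdb.connect(host="localhost", user="root", db="myDataImportExample")
# conn.cursor().executemany(sql, rows)
# conn.commit()
```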

Thursday, August 21, 2014

Java worse than ever

Java, the coffee is better than the language.

A language that is meant to make writing code for multiple platforms easier.  You would think that with Oracle's implementation of Java I could compile once on any platform, then take the compiled byte code and place it on another platform with technically the same JVM, and it would work straight away.  So, Java folk, tell me: what is all the fuss about having a language that doesn't do what it says it should?

1. I have to recompile code on Windows and then again on Linux if it involves more than "Hello world".
So why not just do it in C++?
Or better still Perl, the true write once, run anywhere language.

2. Memory leaks. So many Java programmers are under the impression this language can't have memory leaks.
Can we make a law that says those who think this should be banned from writing software?
The number of times I've had a Java dev ask for JVM values to be increased makes me laugh. Sorry, sort your code out.

3. Frameworks. A fancy word for bloatware. I have to include a massive library of crap to fill up memory when I only use two functions from it.

Come on people, with all this processing power we have today our computer systems really could be thinking for themselves. They could certainly be writing better code.

If we can't have a language that can compile once and run everywhere then let's just go back to C++ and Perl, which, let's be honest, are still the only two real languages.

Write once, compile everywhere, such as Java, is a poor excuse for software development.

A rant from a Perl and C programmer and system admin who's seen far too much poor coding in Java and is fed up with having to recompile a language that should only need compiling once to run anywhere.

The end, I'm off for a cup of Java :-)

Thursday, May 2, 2013

Python DB Abstraction

Whilst teaching Python this week (29 Apr - 2 May 2013), I decided to show those on the course how to make a script that would work between different databases, whilst keeping the amount of code in the main program to a minimum.

The program makes use of conditional import through Python's try/except capability, and uses an alias on the module import to ensure that the abstraction layer for the database calls is always available under the same name (actions).

Main Program
#!/usr/bin/python

import sys

if len(sys.argv) < 2:
  sys.stderr.write("Usage error\n")
  sys.stderr.write("Usage: "+sys.argv[0]+" dbtype [sqliteFile]\n")
  sys.exit(1)

# User chooses mysql
if sys.argv[1] == 'mysql':
  try:
    import MySQLdb
    import mysqlcond as actions  # Ensure that the module namespace is actions
    pyconn = MySQLdb.connect(host="localhost",user="root",db="abc")
  except:
    sys.stderr.write("MySQLdb does not exist\n")
    sys.exit(1)
else:
  try:
    import sqlite3
    import sqlitecond as actions
    pyconn = sqlite3.connect(sys.argv[2])
  except:
    sys.stderr.write("SQLite does not exist\n")
    sys.exit(1)

pycur=pyconn.cursor()
myList=[2,'Shil','Steve','abc@xyz.com','blah blah']
actions.build(pycur)
actions.insert(pycur,myList)
print actions.list(pycur)
pyconn.commit()
pyconn.close()



SQLite DB code layer "sqlitecond.py"
def build(pycur):
  sqlstmnt='''
        CREATE TABLE abook (
                id int primary key,
                surname varchar(50),
                firstname varchar(50),
                email varchar(150),
                notes text
        )
'''
  pycur.execute(sqlstmnt)

def insert(pycur,values):
  sql="INSERT INTO abook VALUES(?,?,?,?,?)"
  pycur.execute(sql,values)

def list(pycur,table='abook',fields=None,values=None):
  sql="SELECT * FROM "+table
  pycur.execute(sql)
  return pycur.fetchall()
 
 




MySQL DB code layer "mysqlcond.py"
def build(pycur):
  sqlstmnt='''
        CREATE TABLE abook (
                id int primary key,
                surname varchar(50),
                firstname varchar(50),
                email varchar(150),
                notes text
        ) ENGINE = INNODB;
'''
  pycur.execute(sqlstmnt)

def insert(pycur,values):
  sql="INSERT INTO abook (id,surname,firstname,email,notes) VALUES(%s,%s,%s,%s,%s)"
  pycur.execute(sql,values)

def list(pycur,table='abook',fields=None,values=None):
  sql="SELECT * FROM "+table
  pycur.execute(sql)
  return pycur.fetchall()
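
One detail worth knowing when writing these layers: the two drivers expect different DB-API placeholder styles.  sqlite3 uses qmark (?), while MySQLdb uses format (%s), so each layer's INSERT has to match its driver.  You can check the style at runtime;

```python
import sqlite3

# Every DB-API 2.0 module declares its placeholder style
print(sqlite3.paramstyle)   # sqlite3 uses qmark: placeholders written as ?
# MySQLdb.paramstyle is 'format': placeholders written as %s
```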