R 2.15 is out

R 2.15 is out; detailed release information is here: http://www.r-bloggers.com/r-2-15-0-is-released/

Lots of small improvements and tweaks across the landscape. Some of the new load-balancing features (a new argument to clusterMap, plus parLapplyLB and parSapplyLB) are worth digging into a bit more. As I get into this release I may have more to say.

TidBits

Networking things I use rarely enough that I cannot remember them

sudo tcpdump -i eth0 -s 65535 -w tcpoutput

tshark -i eth1 -f 'host 1.2.3.4' -R 'http' -S -V -l | awk '/^[HL]/ {p=30} /^[^ HL]/ {p=0} /^ / {--p} {if (p>0) print}'

Start a terminal session other people can join and watch

screen -d -R watch_me_code

to join the session:

screen -x watch_me_code

Hadoop and Map/Reduce

It was coming eventually: a post about the beast that is dominating this space, Hadoop. This isn’t that post, however. This post merely sets out the motivation for the work to come.

I have a few large projects I am working on and ultimately want to evaluate the Mahout machine learning libraries within those environments.  To get there I will need to evaluate Hadoop and the associated environments.  Performance, reliability, durability and operational integration and complexity will be the areas of concern.

I have downloaded virtual machine environments for Cloudera and MapR (still waiting on HortonWorks), the latest Apache Hadoop release (from source), and this one (http://developer.yahoo.com/hadoop/tutorial/module3.html#vm) from Yahoo YDN. I may also spend some time with Amazon’s AWS EMR (Elastic MapReduce). Did I miss any Hadoop-focused vendors?

HCP encode script

A one-liner for encoding credentials for HCP HTTP-based access:

echo `echo -n $1 | base64`:`echo -n $2 | md5sum` | awk '{print $1}'
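The same encoding can be wrapped in a small shell function, which makes the two steps easier to see: the first argument is base64-encoded, the second is MD5-hashed, and the two are joined with a colon. This is only a sketch that mirrors the one-liner above. The function name hcp_token and the sample credentials are made up for illustration, and it assumes GNU base64 and md5sum are on the PATH.

```shell
# Hypothetical helper (name is illustrative): prints base64(arg1):md5(arg2),
# the same output as the one-liner above.
hcp_token() {
    # printf '%s' avoids the trailing newline, like `echo -n`.
    user_b64=$(printf '%s' "$1" | base64)
    # md5sum prints "<hash>  -"; awk keeps only the hash field.
    pass_md5=$(printf '%s' "$2" | md5sum | awk '{print $1}')
    printf '%s:%s\n' "$user_b64" "$pass_md5"
}

# Example call with made-up credentials.
hcp_token "myuser" "mypassword"
```

Quoting "$1" and "$2" keeps credentials containing spaces or shell metacharacters intact, which the bare one-liner does not.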

Machine Generated Data: TempDuino II

This is the second article on MGD, the first is here.  In that article I had set up a simple sensor to capture temperature and was recording that value every minute into a file.  We left off with the sensor running.  Now that we have some data, let's get into it a bit and see what we can learn.

$ wc -l raw_temp_data.csv
43948 raw_temp_data.csv

Nice, almost 44,000 observations. Keep in mind that the majority of time in analysis is spent on data preparation and cleaning, especially if you have data from different sources in different formats.  In this simplified example we begin to see some of what that data preparation and cleaning will look like using some basic Linux shell commands.
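As a rough idea of what that kind of cleaning can look like, here is a minimal sketch of a filter that drops blank or malformed rows. It assumes each line has exactly two comma-separated fields, a timestamp and a numeric temperature. That layout is a guess for illustration, not necessarily the actual format of raw_temp_data.csv, and the function name clean_temps and the output file name are hypothetical.

```shell
# Hypothetical filter: keep only rows with exactly two comma-separated
# fields whose second field is a plain number (e.g. 21.5 or -3).
# ASSUMPTION: rows look like "timestamp,temperature".
clean_temps() {
    awk -F, 'NF == 2 && $2 ~ /^-?[0-9]+(\.[0-9]+)?$/'
}

# Possible usage (file names are illustrative):
#   clean_temps < raw_temp_data.csv > clean_temp_data.csv
#   wc -l raw_temp_data.csv clean_temp_data.csv   # compare row counts
```

Comparing the line counts before and after gives a quick sense of how many observations were dropped.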

The core components

I believe in abstractions.  Here is one I use, though it may seem obvious to some.

When I am building a technology system, there are three fundamental things I want to do with information: move it, compute on it, and save it for later.  I like to decompose how I think of systems into those basic pieces, and build up from there. It is a simple method, but one I have come to rely on.

Machine Generated Data: TempDuino

I’ve been hearing a LOT about machine generated data (MGD) lately.  I am incredibly interested in getting into the nuts and bolts of what the value of all of this data is, how to manage it, and how to draw insights from it. So in that light, I started wondering how I could better get my head around this problem space and form a fresher perspective.

Data Analysis

I recently started to read Data Analysis from O’Reilly Media.  As the space for analytics and “Big Data” grows, the quality and quantity of our (the product and practitioner side) learning resources will need to expand, and this book is a significant entrant.  It is broad enough in its applicability and provides enough high-level references to concrete tools that it should find broad appeal.

Big Data. And apps

It’s been a while since I was actively blogging. Hopefully this will change as I begin to look into the wide world of enterprise computing on this blog.  It seems fitting as I move further into my tenure as a technologist with Hitachi Data Systems in the File and Content areas.

I will also probably post random bits of technology here as well.