Oracle gets behind R!!

Interesting news for the R community, and another data point confirming that R is mainstream and has an impact in the enterprise.

http://www.oracle.com/technetwork/topics/bigdata/r-offerings-1566363.html

Soccer + Pagerank

Great post on how soccer can be analyzed using network theory.

http://www.technologyreview.com/view/428399/pagerank-algorithm-reveals-soccer-teams/

Quick-R

Great site for getting up to speed on R:

http://www.statmethods.net/index.html

R 2.15 is out

R 2.15 is out; detailed release information is here: http://www.r-bloggers.com/r-2-15-0-is-released/

Lots of small improvements and tweaks across the board. Some of the new load-balancing features in the parallel package (clusterMap's new argument, plus parLapplyLB and parSapplyLB) are worth digging into a bit more. As I get into this release I may have more to say.

TidBits

Networking things I use rarely enough that I cannot remember them

sudo tcpdump -i eth0 -s 65535 -w tcpoutput

tshark -i eth1 -f 'host 1.2.3.4' -R 'http' -S -V -l | awk '/^[HL]/ {p=30} /^[^ HL]/ {p=0} /^ / {--p} {if (p>0) print}'

Start a terminal session other people can join and watch

screen -d -R watch_me_code

To join the session:

screen -x watch_me_code

HCP encode script

One-liner for encoding credentials for HCP HTTP-based access:

echo "$(echo -n "$1" | base64):$(echo -n "$2" | md5sum)" | awk '{print $1}'
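For repeated use, the same encoding can be wrapped in a small shell function. This is only a minimal sketch: the function name hcp_token is hypothetical, and it assumes the base64 and md5sum utilities from GNU coreutils are on the PATH. It produces the same base64(user):md5(password) string as the one-liner.

```shell
#!/bin/sh
# hcp_token USER PASSWORD
# Prints base64(USER):md5(PASSWORD), the same string the one-liner builds.
# The function name is hypothetical; base64 and md5sum are assumed to be
# the GNU coreutils versions.
hcp_token() {
    # printf avoids the trailing newline without relying on "echo -n".
    user_b64=$(printf '%s' "$1" | base64)
    # md5sum prints "<hash>  -"; keep only the hash field.
    pass_md5=$(printf '%s' "$2" | md5sum | awk '{print $1}')
    printf '%s:%s\n' "$user_b64" "$pass_md5"
}
```

Calling hcp_token with a user name and password as its two arguments prints the encoded pair on one line. Quoting both arguments keeps passwords with spaces or shell metacharacters intact.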

Machine Generated Data: TempDuino II

This is the second article on MGD; the first is here.  In that article I had set up a simple sensor to capture temperature and was recording the value to a file every minute.  We left off with the sensor running.  Now that we have some data, let's get into it a bit and see what we can learn.

$ wc -l raw_temp_data.csv
43948 raw_temp_data.csv

Nice, almost 44,000 observations. Keep in mind that the majority of time in analysis is spent on data preparation and cleaning, especially if you have data from different sources in different formats.  In this simplified example we begin to see what that data preparation and cleaning looks like using some basic Linux shell commands.
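As a rough illustration of that kind of shell-level cleaning, here is a minimal sketch. It assumes a layout the excerpt does not confirm: one "timestamp,temperature" pair per line. The function names clean_temps and temp_summary are hypothetical, and the real file may need different rules.

```shell
#!/bin/sh
# Assumed layout (hypothetical): one "timestamp,temperature" pair per line.

# clean_temps FILE
# Keep only rows with exactly two fields whose second field is a plain number;
# blank lines and malformed readings are dropped.
clean_temps() {
    awk -F',' 'NF == 2 && $2 ~ /^-?[0-9]+(\.[0-9]+)?$/' "$1"
}

# temp_summary FILE
# Print the count, minimum, maximum and mean of the cleaned readings.
temp_summary() {
    clean_temps "$1" | awk -F',' '
        { v = $2 + 0 }                      # force numeric comparison
        NR == 1 { min = v; max = v }
        { sum += v; if (v < min) min = v; if (v > max) max = v }
        END { if (NR > 0) printf "n=%d min=%.1f max=%.1f mean=%.2f\n", NR, min, max, sum / NR }'
}
```

Running temp_summary raw_temp_data.csv would give a quick sanity check on the range of readings before any deeper analysis, and comparing wc -l on the raw file against the output of clean_temps shows how many rows were rejected.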


The core components

I believe in abstractions.  Here is one I use, though it may seem obvious to some.

When I am building a technology system, there are three fundamental things I want to do with information: move it, compute it, and save it for later.  I like to decompose how I think of systems into those basic pieces and build up from there. It is a simple method, but one I have come to rely on.


Machine Generated Data: TempDuino

I’ve been hearing a LOT about machine generated data (MGD) lately.  I am incredibly interested in getting into the nuts and bolts of what the value of all this data is, how to manage it, and how to draw insights from it. In that light, I started wondering how I could better get my head around this problem space and form a fresher perspective.
