Enter the Pyed Piper via (@deferraz)

"The pyed piper, or pyp, is a python-centric command line text manipulation tool. It allows you to format, replace, augment and otherwise mangle text using standard python syntax with a few golden-oldie tricks from unix commands of the past. You can pipe data into pyp just like any other unix command line tool. After it's in, you can use the standard repertoire of python string and list methods to modify the text..."

http://code.google.com/p/pyp/wiki/intro
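
The idea in the excerpt above, piping lines in and applying ordinary Python string methods to each one, can be approximated in plain Python. This is a minimal sketch of the concept, not pyp itself: `transform_lines` and `swap_extension` are hypothetical names chosen for illustration.

```python
def transform_lines(lines, func):
    """Apply func to every input line (trailing newline removed)."""
    for line in lines:
        yield func(line.rstrip("\n"))


def swap_extension(name):
    # Plain str methods do the work: change the extension, then upper-case.
    return name.replace(".txt", ".bak").upper()


# Sample input standing in for text piped from another command.
sample = ["notes.txt\n", "todo.txt\n"]
renamed = list(transform_lines(sample, swap_extension))
print(renamed)
```

Passing `sys.stdin` instead of `sample` would give the same behaviour for text piped in from the shell.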

Threads and fork(): think twice before mixing them

"When debugging a program I came across a bug that was caused by using fork(2) in a multi-threaded program. I thought it worth writing a few words about mixing POSIX threads with fork(2), because doing so causes non-obvious problems.

What happens after fork() in a multi-threaded program

The fork(2) function creates a copy of the process: all memory pages are copied, open file descriptors are copied, etc. All this is intuitive for a UNIX programmer. One important difference between the child process and the parent is that the child has only one thread. Cloning the whole process with all its threads would be problematic and in most cases not what the programmer wants. Just think about it: what should happen to threads that are suspended while executing a system call? So the fork(2) call clones just the thread that executed it.

What are the problems

Critical sections, mutexes

The non-obvious problem with this approach is that at the moment of the fork(2) call some threads may be in critical sections of code, doing non-atomic operations protected by mutexes. In the child process those threads simply disappear and leave data half-modified with no possibility of "fixing" it: there is no way to tell what the other threads were doing or what should be done to make the data consistent. Moreover, the state of mutexes is undefined; they might be unusable, and the only way to use them in the child is to call pthread_mutex_init() to reset them to a usable state. How mutexes behave after fork(2) is implementation dependent. On my Linux machine locked mutexes are locked in the child..."

http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them
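
The two effects described in the excerpt can be observed from Python on a POSIX system. The sketch below assumes CPython on Linux; `fork_snapshot` is a hypothetical helper. It forks while a second thread holds a lock, then reports how many threads the child sees and whether the child could take the lock. The lock result is printed rather than relied upon, since the excerpt itself notes that this behaviour is implementation dependent. Recent CPython versions also emit a DeprecationWarning when forking a multi-threaded process.

```python
import os
import threading


def fork_snapshot():
    """Fork while a helper thread holds a lock; report what the child observes."""
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def worker():
        with lock:              # critical section that is open during the fork
            holding.set()
            release.wait()

    helper = threading.Thread(target=worker)
    helper.start()
    holding.wait()              # from here on the helper owns the lock

    read_fd, write_fd = os.pipe()
    parent_threads = threading.active_count()
    pid = os.fork()
    if pid == 0:
        # Child process: only the thread that called fork() exists here.
        try:
            os.close(read_fd)
            got_lock = lock.acquire(blocking=False)  # never block: the owner is gone
            message = f"{threading.active_count()} {int(got_lock)}"
            os.write(write_fd, message.encode())
        finally:
            os._exit(0)         # leave without running the parent's cleanup code

    os.close(write_fd)
    report = b""
    while True:
        chunk = os.read(read_fd, 64)
        if not chunk:
            break
        report += chunk
    os.close(read_fd)
    os.waitpid(pid, 0)
    release.set()               # let the helper leave its critical section
    helper.join()

    child_threads, got_lock = report.decode().split()
    return {
        "parent_threads": parent_threads,
        "child_threads": int(child_threads),
        "child_got_lock": bool(int(got_lock)),
    }


snapshot = fork_snapshot()
print(snapshot)
```

The non-blocking acquire matters: a blocking acquire in the child could wait forever, because the thread that would release the lock does not exist there.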

Fast, easy, realtime metrics using Redis bitmaps

"At Spool, we calculate our key metrics in real time. Traditionally, metrics are performed by a batch job (running hourly, daily, etc.). Redis backed bitmaps allow us to perform such calculations in realtime and are extremely space efficient. In a simulation of 128 million users, a typical metric such as “daily unique users” takes less than 50 ms on a MacBook Pro and only takes 16 MB of memory. Spool doesn’t have 128 million users yet but it’s nice to know our approach will scale. We thought we’d share how we do it, in case other startups find our approach useful..."

http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps 
https://github.com/antirez/redis/commit/9ee55e8e583a139b35628c47ef8d2c9f1e65936b
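
The bitmap idea in the excerpt is one bit per user ID, set when the user is active, with the metric obtained by counting set bits. It can be sketched without a Redis server. The `Bitmap` class below is a hypothetical in-memory stand-in for the Redis SETBIT and BITCOUNT commands, not the Spool implementation, and `bitmap_bytes` just checks the quoted memory figure.

```python
class Bitmap:
    """In-memory stand-in for a Redis string used as a bitmap."""

    def __init__(self):
        self.data = bytearray()

    def setbit(self, offset, value=1):
        byte, bit = divmod(offset, 8)
        if byte >= len(self.data):
            # Grow with zero bytes, as Redis does when setting a high offset.
            self.data.extend(bytes(byte + 1 - len(self.data)))
        mask = 0x80 >> bit  # Redis numbers bits from the most significant bit
        if value:
            self.data[byte] |= mask
        else:
            self.data[byte] &= ~mask & 0xFF

    def bitcount(self):
        return sum(bin(b).count("1") for b in self.data)


def bitmap_bytes(users):
    """Bytes needed to hold one bit per user."""
    return (users + 7) // 8


daily_active = Bitmap()
for user_id in (3, 17, 3, 42):  # user 3 shows up twice but is counted once
    daily_active.setbit(user_id)

print(daily_active.bitcount())      # daily unique users in this toy sample
print(bitmap_bytes(128_000_000))    # 16,000,000 bytes, i.e. the quoted 16 MB
```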

Batch Acknowledged Pipelines with ZeroMQ

"Parallel processing with a task ventilator is a common pattern with ZeroMQ.  The basics of this pattern are outlined in the “Divide and Conquer” section of the ZeroMQ guide.  The pattern consists of the following components:

  • A task ventilator that produces tasks.
  • A number of workers that do the processing work.
  • A sink that collects results from the worker processes..." 

http://blog.aggregateknowledge.com/tag/pyzmq
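
The shape of the ventilator/worker/sink pattern can be sketched with the standard library alone. Note the swap: this uses `queue.Queue` and threads in place of ZeroMQ sockets and processes, so it shows only the data flow, not the pyzmq API or the batch acknowledgement the linked post adds. `run_pipeline` and the squaring "work" are hypothetical.

```python
import queue
import threading


def run_pipeline(tasks, num_workers=3):
    task_q = queue.Queue()    # ventilator -> workers
    result_q = queue.Queue()  # workers -> sink

    def ventilator():
        for task in tasks:
            task_q.put(task)
        for _ in range(num_workers):
            task_q.put(None)  # one stop marker per worker

    def worker():
        while True:
            task = task_q.get()
            if task is None:
                break
            result_q.put((task, task * task))  # stand-in for real processing

    threads = [threading.Thread(target=ventilator)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Sink: collect exactly one result per task, in whatever order they arrive.
    results = dict(result_q.get() for _ in range(len(tasks)))
    for t in threads:
        t.join()
    return results


squares = run_pipeline([1, 2, 3, 4])
print(sorted(squares.items()))
```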

Alchemy Database: A Hybrid RDBMS/NOSQL-Datastore

"Alchemy Database is a low-latency high-TPS NewSQL RDBMS embedded in the NOSQL datastore redis. Extensive datastore-side-scripting is provided via deeply embedded Lua. Unstructured data can also be stored, as there are no limits on the number of tables, indexes, or columns, and sparsely populated rows use minimal memory.

AlchemyDB believes OLTP traffic's needs are best served by extending SQL and has recently added the following experimental functionalities:

  • LuaTable - A column type that is a Lua Table, that single handedly adds both Document-Store & Object-DB functionality by mixing Lua into SQL.
  • GraphDB - so brand new it is not even fully documented :) A GraphDB was created on top of AlchemyDB using SQL for indexes and Lua for graph-traversal logic. AlchemyDB is a customizable data platform.
  • AppStack - AlchemyDB uses a REST API and already had Lua embedded, so creating a dynamic HTTP server that serves Lua webpages was a logical step. It is the fastest dynamic webserver I have ever benchmarked (probably because it can only make internal AlchemyDB calls, i.e. NO backend calls :)

Alchemy Database is optimised for top notch memory efficiency and top notch TPS for OLTP requests:

  • Speed is achieved by being an event driven network server that stores 100% of data in RAM, achieving disk persistence by using a spare cpu-core to periodically log data changes (i.e. no threads, no locks, no undo-logs, no disk-seeks, serving data over a network at RAM speed)
  • Storage data structures w/ very low memory overhead and data compression, via algorithms w/ insignificant performance hits, greatly increase the amount of data you can fit in RAM
  • Optimising to the SQL statements most commonly used in OLTP workloads yields a lightweight RDBMS designed for low latency at high concurrency (i.e. world class speed/thruput).

RAM is CHEAP these days

RAM is now affordable enough to be able to store ENTIRE OLTP Databases in a single machine's RAM (e.g. Wikipedia's English DB is 30GB and a Dell T610 w/ 32GB RAM costs $2100). Data can be asynchronously replicated over the wire (providing high availability) and written to disk via snapshots and appending log files (providing durability) and data I/O is done at RAM speed.

FAST ON COMMODITY HARDWARE:

Client/Server using 1GbE LAN to/from a single core running at 3.0GHz, RAM PC3200 (400MHz)

  • 95K INSERT/sec, 95K SELECT/sec, 90K UPDATE/sec, 100K DELETE/sec
  • Range Queries returning 10 rows: 40K/sec
  • 2 Table Joins returning 10 rows: 20K/sec
  • Lua script performing read and write: 85K/sec

MEMORY EFFICIENT:

  1. Each row has very little overhead when stored (20-30 bytes) and insert speed does not significantly degrade as more indices are added
    • Simple row (PK+TEXT->16 bytes), 1GB stores 40 million rows, insert speed: 70K/sec
    • Complex row (10 Indices+TEXT->48 bytes), 1GB stores 9 million rows, insert speed: 40K/sec
    • TEXT fields are compressed. If a 100 character column compresses down to 80 bytes, the row can be stored w/ ZERO storage overhead (e.g. 1 million rows of 100 chars will take up 100MB)
  2. Sparse-Rows: tables w/ 1000s of columns, that are sparsely populated, use a serialised hash table in the row's stream to store column offsets. Sparse-Rows can be Orders-Of-Magnitude smaller than full rows

EASY TO USE:

http://code.google.com/p/alchemydatabase
https://github.com/JakSprats/Alchemy-Database
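
The memory figures quoted above can be turned into an average footprint per row with a little arithmetic. This is only a back-of-the-envelope check, assuming "1GB" means 2**30 bytes; the excerpt does not say which definition it uses, and `bytes_per_row` is a hypothetical helper.

```python
GIB = 2 ** 30  # assumption: "1GB" read as 2**30 bytes


def bytes_per_row(rows_per_gb):
    """Average bytes consumed per stored row, data plus overhead."""
    return GIB / rows_per_gb


simple_row = bytes_per_row(40_000_000)   # PK + 16-byte TEXT
complex_row = bytes_per_row(9_000_000)   # 10 indices + 48-byte TEXT

# 1 million rows of 100 characters, stored with zero overhead, in MB (10**6 bytes).
text_only_mb = 1_000_000 * 100 / 10 ** 6

print(round(simple_row, 1), round(complex_row, 1), text_only_mb)
```

Under that assumption a simple row averages roughly 27 bytes in total and a complex row roughly 119 bytes.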

An implementation of Clojure in pure Python

"Why Python?

It is our belief that static virtual machines make very poor runtimes for dynamic languages. They constrain the languages to their view of what the "world should look like" and limit the options available to language implementors. We are attempting to prove this by writing an implementation of Clojure that runs on the Python VM. We believe that with a proper dynamic JIT (like pypy) a version of clojure running on a dynamic VM can outperform its JVM and CLR counterparts.

Aside from that, there are many Python libraries like PySide (Qt GUI), numpy, scipy, and stackless that do not have JVM counterparts, or at least the Python implementations are easier to use and learn. clojure-py will integrate tightly with the Python VM and will be able to use all of these libraries..."

https://github.com/halgari/clojure-py

Modeling Time Series Data on top of Cassandra

"At RockMelt we collect data from various sources: server logs, web site logs, browser metrics, etc. Data from these sources gets processed via Hadoop, Splunk or Hive and permanently stored in HDFS or as compressed files in Amazon EBS storage. As it turns out, almost all of our performance, product and business metrics are time-based, and different metrics have different data types/structures. One common use case arises: we need to store and retrieve time series data on any schema. We use this to drive various dashboards displaying the latest metrics, data trends and other interesting numbers.

For example, our crash dashboard displays the number of crashes per hour per browser version for the past 60 days. We use this to track the stability of new releases and to help drive down crashes over time.

Why not use RRD?

RRD is a commonly used tool for storing time series data. One major limitation of RRD is that it deals only with numerical values. As you can see in the above example we would like the flexibility to store and retrieve time series data on any schema, be it an array or a complex JSON object..."

http://engineering.rockmelt.com/post/17229017779/modeling-time-series-data-on-top-of-cassandra
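
The crash dashboard mentioned in the excerpt (crashes per hour per browser version) comes down to bucketing events by hour. The sketch below shows only that bucketing step in memory. The event layout and the names `hour_bucket` and `crashes_per_hour` are assumptions for illustration; the excerpt does not describe RockMelt's actual Cassandra schema.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_bucket(ts):
    """Truncate a Unix timestamp to the start of its hour (UTC)."""
    moment = datetime.fromtimestamp(ts, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:00Z")


def crashes_per_hour(events):
    """Count crash events per (hour, browser version) pair."""
    counts = Counter()
    for ts, version in events:
        counts[(hour_bucket(ts), version)] += 1
    return counts


# Hypothetical crash events: (unix timestamp, browser version).
events = [(0, "0.9.1"), (1800, "0.9.1"), (3600, "0.9.1"), (3700, "0.9.2")]
counts = crashes_per_hour(events)
for key in sorted(counts):
    print(key, counts[key])
```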

 

Advanced Time Series with Cassandra

"Cassandra is an excellent fit for time series data, and it’s widely used for storing many types of data that follow the time series pattern: performance metrics, fleet tracking, sensor data, logs, financial data (pricing and ratings histories), user activity, and so on.

A great introduction to this topic is Kelley Reynolds’ Basic Time Series with Cassandra. If you haven’t read that yet, I highly recommend starting with it. This post builds on that material, covering a few more details, corner cases, and advanced techniques.

Indexes vs Materialized Views

When working with time series data, one of two strategies is typically employed: either the column values contain row keys pointing to a separate column family which contains the actual data for events, or the complete set of data for each event is stored in the timeline itself. The latter strategy can be implemented by serializing the entire event into a single column value or by using composite column names of the form <timestamp>:<event_field>.

With the first strategy, which is similar to building an index, you first fetch a set of row keys from a timeline and then multiget the matching data rows from a separate column family. This approach is appealing to many at first because it is more normalized; it allows for easy updates of events, doesn’t require you to repeat the same data in multiple timelines, and lets you easily add built-in secondary indexes to your main data column family. However, the second step of the data fetching process, the multiget, is fairly expensive and slow. It requires querying many nodes where each node will need to perform many disk seeks to fetch the rows if they aren’t well cached. This approach will not scale well with large data sets.

The second strategy, which resembles maintaining a materialized view, provides much more efficient reads. Fetching a time slice of events only requires reading a contiguous portion of a row on one set of replicas. If the same event is tracked in multiple timelines, it’s okay to denormalize and store all of the event data in each of those timelines. One of the main principles that Cassandra was built on is that disk space is a very cheap resource; minimizing disk seeks at the cost of higher space consumption is a good tradeoff. Unless the data for each event is very large, I always prefer this strategy over the index strategy..."

http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
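
The difference between the two strategies in the excerpt shows up in the read path. The sketch below models a timeline row as a sorted list of columns in memory. `Timeline`, `read_via_index` and `read_view` are hypothetical names, and nothing here talks to Cassandra; it only illustrates why the index strategy needs a second lookup per event while the materialized view is a single contiguous slice.

```python
import bisect


class Timeline:
    """One wide row whose columns are kept sorted by timestamp."""

    def __init__(self):
        self._names = []   # column names: timestamps, sorted
        self._values = []  # column values, in the same order

    def insert(self, ts, value):
        i = bisect.bisect_right(self._names, ts)
        self._names.insert(i, ts)
        self._values.insert(i, value)

    def slice(self, start, end):
        """Return values for start <= ts <= end: one contiguous read."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return self._values[lo:hi]


def read_via_index(timeline, store, start, end):
    keys = timeline.slice(start, end)    # step 1: fetch row keys
    return [store[k] for k in keys]      # step 2: the "multiget", one lookup each


def read_view(timeline, start, end):
    return timeline.slice(start, end)    # the events are already in the row


# Hypothetical events, stored once (index) or copied into the timeline (view).
events = {"e1": {"type": "login"}, "e2": {"type": "click"}, "e3": {"type": "logout"}}
index_tl, view_tl = Timeline(), Timeline()
for ts, key in [(10, "e1"), (20, "e2"), (30, "e3")]:
    index_tl.insert(ts, key)             # strategy 1: column value is a row key
    view_tl.insert(ts, events[key])      # strategy 2: column value is the event

print(read_via_index(index_tl, events, 15, 30))
print(read_view(view_tl, 15, 30))
```

Both calls return the same events; the cost difference is that the index path performs a separate lookup for every key in the slice.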

Salt: a remote execution and configuration management tool

"What is Salt?

 

Salt is a powerful remote execution manager that can be used to administer servers in a fast and efficient way.

Salt allows commands to be executed across large groups of servers. This means systems can be easily managed, but data can also be easily gathered. Quick introspection into running systems becomes a reality.

Remote execution is usually used to set up a certain state on a remote system. Salt addresses this problem as well: the salt state system uses salt state files to define the state a server needs to be in.

Between the remote execution system and state management, Salt addresses the backbone of cloud and data center management.

Distributed remote execution

Salt is a distributed remote execution system used to execute commands and query data. It was developed in order to bring the best solutions found in the world of remote execution together and make them better, faster and more malleable. Salt accomplishes this through its ability to handle large loads of information across not just dozens but hundreds or even thousands of individual servers, quickly and through a simple and manageable interface.

Simplicity

Versatility between massive scale deployments and smaller systems may seem daunting, but Salt is very simple to set up and maintain, regardless of the size of the project. The architecture of Salt is designed to work with any number of servers, from a handful of local network systems to international deployments across disparate datacenters. The topology is a simple server/client model with the needed functionality built into a single set of daemons. While the default configuration will work with little to no modification, Salt can be fine tuned to meet specific needs.
 

Parallel execution

The core function of Salt is to enable remote commands to be called in parallel rather than in serial, using a secure and encrypted protocol, the smallest and fastest network payloads possible, and a simple programming interface. Salt also introduces more granular controls to the realm of remote execution, allowing commands to be executed in parallel and systems to be targeted based not just on hostname but on system properties.

Building on proven technology

Salt takes advantage of a number of technologies and techniques. The networking layer is built with the excellent ZeroMQ networking library, so Salt itself contains a viable, and transparent, AMQ broker inside the daemon. Salt uses public keys for authentication with the master daemon, then uses faster AES encryption for payload communication. This means that authentication and encryption are also built into Salt. Salt takes advantage of communication via msgpack, enabling fast and light network traffic.

Python client interface

In order to allow for simple expansion, Salt execution routines can be written as plain Python modules and the data collected from Salt executions can be sent back to the master server, or to any arbitrary program. Salt can be called from a simple Python API, or from the command line, so that Salt can be used to execute one-off commands as well as operate as an integral part of a larger application.

 

Fast, flexible, scalable

The result is a system that can execute commands across groups of varying size, from very few to very many servers, at considerably high speed. The system is very fast, easy to set up and amazingly malleable, able to suit the needs of any number of servers working within the same system. Salt’s unique architecture brings together the best of the remote execution world, amplifies its capabilities and expands its range, resulting in a system that is as versatile as it is practical, able to suit any network.

Open

Salt is developed under the Apache 2.0 licence, and can be used for open and proprietary projects. Please submit your expansions back to the Salt project so that we can all benefit together as Salt grows. So, please feel free to sprinkle some of this around your systems and let the deliciousness come forth..."

http://saltstack.org
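
Two ideas from the excerpt, targeting systems by their properties rather than by hostname and running a command on all targets in parallel, can be sketched with the standard library. This is not Salt's API: the `MINIONS` inventory, `target`, `ping` and `run_parallel` are hypothetical, and the "remote call" is a local stand-in.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inventory: server name -> system properties.
MINIONS = {
    "web1": {"os": "Ubuntu", "role": "web"},
    "web2": {"os": "CentOS", "role": "web"},
    "db1": {"os": "Ubuntu", "role": "db"},
}


def target(minions, **props):
    """Select servers whose properties match every given key/value pair."""
    return sorted(
        name
        for name, facts in minions.items()
        if all(facts.get(key) == value for key, value in props.items())
    )


def ping(name):
    return True  # stand-in for a command executed on the remote server


def run_parallel(names, func):
    """Call func for every name concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        return dict(zip(names, pool.map(func, names)))


ubuntu_hosts = target(MINIONS, os="Ubuntu")
results = run_parallel(ubuntu_hosts, ping)
print(results)
```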