This page belongs to EPFL's web archive and is no longer updated.

Daniel Tralamazza

Idea for lock extension for Java
I have a "simple" idea for lock composition. It is well known that locks scale poorly as a programming abstraction because they don't compose, at least not without breaking encapsulation.
Modern JVMs can merge adjacent locks on the same object when they are used inside the same method, an optimization known as lock coarsening (see http://www.ibm.com/developerworks/java/library/j-jtp10185/).
The idea is to let the programmer specify explicitly which locks are composable, something like:
composable synchronized void add(int _v) {}
Both compiler and runtime would treat these synchronized methods & blocks differently. There are at least two possible scenarios: (#1) When composable locks are enclosed by a common parent lock, and (#2) no parent lock but just sibling locks.
 
(#1)
synchronized (foo) {
    add(1);
    add(2);
}
  The VM would force a thread to always acquire the parent lock of a composable lock (recursively).
 
(#2)
add(1);
add(2); 
  All sibling composable locks should be acquired in the same order.
 
This is just a pre-draft of the idea; any comments are welcome.
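Scenario #2 can be approximated in today's Java without any language change. The sketch below is only an emulation under stated assumptions: `ComposableLock`, `withAll` and `Counter` are hypothetical names invented for illustration, and the "same order" rule is implemented by giving every lock a global rank and always acquiring in ascending rank.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not part of any JVM: a reentrant lock with a global rank.
final class ComposableLock {
    private static final AtomicLong NEXT_RANK = new AtomicLong();
    private final long rank = NEXT_RANK.getAndIncrement();
    private final ReentrantLock lock = new ReentrantLock();

    // Scenario #2: take all sibling locks in ascending rank order,
    // run the body, then release them in reverse order.
    static void withAll(Runnable body, ComposableLock... locks) {
        ComposableLock[] sorted = locks.clone();
        Arrays.sort(sorted, Comparator.comparingLong((ComposableLock l) -> l.rank));
        int held = 0;
        try {
            for (ComposableLock l : sorted) {
                l.lock.lock();
                held++;
            }
            body.run();
        } finally {
            for (int i = held - 1; i >= 0; i--) {
                sorted[i].lock.unlock();
            }
        }
    }
}

final class Counter {
    final ComposableLock guard = new ComposableLock();
    private int total;

    // Stands in for "composable synchronized void add(int _v)".
    void add(int v) {
        ComposableLock.withAll(() -> total += v, guard);
    }

    int total() {
        int[] out = new int[1];
        ComposableLock.withAll(() -> out[0] = total, guard);
        return out[0];
    }
}
```

With two counters `a` and `b`, a caller can write `ComposableLock.withAll(() -> { a.add(1); b.add(2); }, b.guard, a.guard)`. The order in which the guards are listed does not matter, because they are sorted by rank before being acquired, and the inner `add` calls simply re-enter locks the thread already holds. What the proposal would add over this emulation is that the compiler and VM discover the set of locks themselves, so the caller does not have to name them.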
Posted by Daniel Tralamazza at 14:34
OProfile + VMware = ?

Recently I had a strange problem with OProfile on a VM. OProfile is pretty straightforward; here is an example of a typical session:

 

$ opcontrol --event=default

$ opcontrol --vmlinux=/usr/lib/debug/lib/modules/`uname -r`/vmlinux

$ opcontrol --start

(do something, benchmark etc)

$ opcontrol --stop

$ opcontrol --dump

$ opreport -l

 

You should see a pretty detailed output of where your CPU is spending time.

Back to the problem: running "opcontrol --list-events" lists all the performance counters (events) that the CPU supports. This command works inside a guest OS in VMware (Fusion at least) as if it ran on the host OS. Great! Unfortunately, no data ever comes from those counters. VMware doesn't support performance counters, nor many other special registers for that matter, but it lets you list them anyway. So how can we use OProfile inside VMware? Answer: timer interrupt mode.

First, unload the OProfile kernel module:

$ opcontrol --deinit

Now load it manually, passing an argument:

$ modprobe oprofile timer=1

Now you are good to go!

 

If opcontrol returns the error "You cannot specify any performance counter events because OProfile is in timer mode", just remove the file ~/.oprofile/daemonrc and try again.

 

NOTE: This was done on Fedora 10; check your distro's documentation to find out where its vmlinux image lives.

Posted by Daniel Tralamazza at 16:13
Concurrency Paradigms: A Comparison in Scala

Mohsen Lesani gave a talk today on his concurrency project using Scala. He gave a great introduction to the different concurrency control mechanisms:

  • Locks
  • STM
  • Actors
  • Wait-free algorithms

To compare them he implemented two different problems: bank account transfer and producer/consumer. The former is a clear isolation problem, while the latter poses a signaling issue. These simple problems already show that there is no silver bullet in concurrent programming, i.e. no single technique can easily implement both problems and yield good performance at the same time.
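To make the two problems concrete, here is a minimal lock-based sketch of each. It is not Mohsen's Scala code: it is written in Java using plain monitors, and the class names `Bank` and `BoundedBuffer` are my own assumptions for illustration. The bank needs isolation (no thread may observe a half-finished transfer), while the buffer needs signaling (a consumer must sleep until a producer wakes it up).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Isolation problem: a transfer must appear atomic to other threads.
final class Bank {
    private final long[] balances;

    Bank(int accounts, long initialBalance) {
        balances = new long[accounts];
        Arrays.fill(balances, initialBalance);
    }

    // One coarse lock keeps the sketch short, at the cost of
    // serializing all transfers.
    synchronized boolean transfer(int from, int to, long amount) {
        if (balances[from] < amount) {
            return false; // insufficient funds, nothing changes
        }
        balances[from] -= amount;
        balances[to] += amount;
        return true;
    }

    synchronized long balance(int account) {
        return balances[account];
    }
}

// Signaling problem: threads must wait for a state change made by others.
final class BoundedBuffer<T> {
    private final Queue<T> items = new ArrayDeque<>();
    private final int capacity;

    BoundedBuffer(int capacity) {
        this.capacity = capacity;
    }

    synchronized void put(T item) throws InterruptedException {
        while (items.size() == capacity) {
            wait(); // releases the monitor until a consumer signals
        }
        items.add(item);
        notifyAll();
    }

    synchronized T take() throws InterruptedException {
        while (items.isEmpty()) {
            wait(); // releases the monitor until a producer signals
        }
        T item = items.remove();
        notifyAll();
        return item;
    }
}
```

The `wait()` calls are the interesting part: a thread gives up the lock in the middle of a critical section and resumes later. That is exactly the step that has no obvious counterpart inside an atomic block.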

For me it was interesting to discover that STMs lack signaling capabilities. After the presentation I talked to Mohsen and suggested that the signaling could perhaps be done at the end of an atomic block (although I don't know how a thread could wait inside an atomic region).

You can see the presentation below.

Posted by Daniel Tralamazza at 15:31
Linux I/O schedulers vs workloads

This is a quick example of why one size doesn't fit all. The picture shows the TPC-C performance of MySQL under Linux running different I/O schedulers.

[Figure: TPC-C results]

The test machine was a 16-core Dell PowerEdge R900 with 8 disks in RAID 10.

For this kind of workload the anticipatory scheduler is expected to perform poorly (there is no idleness). I think it is safe to assume that both noop and deadline are fast because the RAID controller does most of the work of ordering requests. The surprise comes from CFQ's terrible performance.

Source: http://www.mysqlperformanceblog.com/2009/01/30/linux-schedulers-in-tpcc-like-benchmark/

Posted by Daniel Tralamazza at 16:06
XtraDB scalability

Recently I discovered a new storage engine for MySQL called XtraDB. Since Oracle's acquisition of InnoDB, MySQL has been trying to find a substitute, but so far nothing has come close to InnoDB's maturity and stability.

XtraDB is basically an InnoDB fork focused on scalability. What's so special about that? Well, its preliminary patches and benchmarks are centered precisely on locks. Most of the patches actually come from google-mysql-tools.

During a benchmark session on a 24-core machine the XtraDB team discovered a concurrency problem on the rollback segment.

Posted by Daniel Tralamazza at 13:54
Early Experience with a Commercial Hardware Transactional Memory Implementation

Finally (for me at least), Sun has published a paper with performance measurements of their shiny new CPU, Rock.

For those currently living under a rock (pun intended), this is Sun's newest, and delayed, SPARC processor, which implements Hardware Transactional Memory (HTM).

The paper is available from Sun Labs' Scalable Synchronization Research Group and will soon be published at ASPLOS'09.

Posted by Daniel Tralamazza at 14:27
FreeBSD kernel lock profiling

I recently discovered that FreeBSD has a neat way of doing kernel lock profiling. To use it you just need to add a single line to your kernel configuration file:

options  LOCK_PROFILING

Rebuild and install your new kernel, then reboot into it. To check that it's working, run sysctl debug.lock; you should see some options. Now try:

> sysctl debug.lock.prof.enable=1

> (launch a program or just wait a few seconds)

> sysctl debug.lock.prof.enable=0

> sysctl debug.lock.prof.stats

The stats include: file:line and lock type, maximum time held, total time held, total wait time, count, average time held, average wait time, contention while holding, and contention while locking.

Posted by Daniel Tralamazza at 15:38
Lock profiling #1

My current research involves profiling locking primitives (e.g. mutexes, condition variables, etc.).

I search the internet almost every day for profiling tools and papers, but sometimes I forget to search my own machine.

Today I "discovered" that my everyday MacBook can do lock profiling with one simple command:

tralamazza$ sudo plockstat -n 5 -A -s 8 -e 10 -p <app pid here>

This simple tool, plockstat, is implemented using DTrace (long live Leopard) and can display lock contention and wait time, along with a backtrace, for any application.

Even better, you can instruct plockstat to show the generated DTrace script so you can customize it.

Not bad at all.

Posted by Daniel Tralamazza at 5:10
Catching disk latency in the act

Q: What happens when you scream at a bunch of disks ?

A: http://blogs.sun.com/bmc/entry/catching_disk_latency_in_the

Posted by Daniel Tralamazza at 15:21