First 2013 Day

Jan 2nd, 2013

Phone call at 8 am

This morning I received a call from an operator telling me that the services were down.

The system is made up of:

* two Sun servers (X4450) running CentOS 5.X as Xen dom0,
* one DRBD resource for each service,
* a first cluster that controls DRBD and Xen,
* a second cluster that controls the services (on top of the first level).

The servers are connected using trunked/bonded pairs of NICs, both to a layer 3 switch (HP ProCurve 2824) and directly to one another.
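With bonded links like these, a quick look at the state of each slave NIC is a cheap first check. What follows is only a minimal sketch: it assumes the Linux bonding driver and a bond named `bond0`, and the helper name `bond_slave_status` is made up for this example (none of this is taken from the servers described here).

```shell
#!/bin/bash
# Print "<slave> <MII status>" for every slave of a bond.
# $1: file in /proc/net/bonding/<bond> format
#     (defaults to /proc/net/bonding/bond0, an assumed bond name)
bond_slave_status() {
    awk -F': ' '
        # remember the slave whose section we are entering
        /^Slave Interface:/ { slave = $2 }
        # the first "MII Status" line belongs to the bond itself: skip it
        /^MII Status:/ && slave != "" { print slave, $2; slave = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

A slave reported as `down`, or one whose link failure count keeps growing, points at the NIC, the cable or the switch port rather than at DRBD.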

All the DRBD resources were in the correct state except one, which was in the ‘stalled’ state.

I put the level-2 cluster into ‘standby’ mode, stopping all its services, with:

    # crm node standby <vm-domU-lev2>

then I did the same for the level-1 cluster:

    # crm node standby <vm-domU-lev1>

Then I tried to restart the ‘stalled’ DRBD resource manually. I followed the instructions for “manual split brain recovery”, although this case is different because the two servers have distinct roles:

    dom0-a# /etc/init.d/drbd start
    dom0-b# /etc/init.d/drbd start

Set the resource to primary/secondary:

    dom0-a# drbdadm primary <resource>
    dom0-b# drbdadm secondary <resource>

Disconnect the “wrong” side and reconnect it (to force a resync):

    dom0-b# drbdadm disconnect <resource>
    dom0-b# drbdadm -- --discard-my-data connect <resource>

I saw the sync process start, but after a short time the resource went back into the ‘stalled’ state. I repeated the last steps a few times, always with the same result.
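To follow the state of each resource without reading `/proc/drbd` by eye, the connection state can be pulled out with a small helper. This is only a sketch: it assumes the DRBD 8.x `/proc/drbd` layout (one ` N: cs:<state> ...` line per minor), and the function name `drbd_conn_states` is made up for this example.

```shell
#!/bin/bash
# Print "<minor> <connection state>" for every DRBD device.
# $1: file in /proc/drbd format (defaults to /proc/drbd)
drbd_conn_states() {
    awk '
        # device lines start with the minor number followed by a colon
        $1 ~ /^[0-9]+:$/ {
            minor = $1
            sub(/:$/, "", minor)
            state = "unknown"
            # the connection state is the field prefixed with "cs:"
            for (i = 2; i <= NF; i++)
                if ($i ~ /^cs:/) state = substr($i, 4)
            print minor, state
        }
    ' "${1:-/proc/drbd}"
}
```

Called without arguments it reads `/proc/drbd`. Any state other than `Connected`, `SyncSource` or `SyncTarget` deserves a closer look.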

After 2 hours of testing I suspected that the switch had got into a confused state, so I forced a reboot. This action was decisive :)

After some investigation I discovered that in December 2012 HP released a firmware upgrade that fixes some problems with this switch …

After Maya Release

Dec 31st, 2012

Released the Netkit augmented Knoppix DVD

After Mr. Knopper released Knoppix 7.0.5 as “The Final 21.12.2012 Release”, we released the “After Maya” Netkit DVD :)

Testing Octopress

Dec 31st, 2012

How to detect the CD-ROM device

$ wodim --devices
wodim: Overview of accessible drives (1 found) :
-------------------------------------------------------------------------
0  dev='/dev/sg1'      rwrw-- : 'TSSTcorp' 'DVD+-RW TS-H653B'
-------------------------------------------------------------------------

Gist embedding

Include Code Snippets

(bigG.sh)

#!/bin/bash

APK=../apk

# start emulator
# emulator -avd am -partition-size 1000

# wait until emulator is ready:
adb -s emulator-5554 wait-for-device

# now the loading of the apks
adb -s emulator-5554 shell mount -o remount,rw -t yaffs2 /dev/block/mtdblock0 /system
adb shell chmod 777 /system/app
adb push $APK/GoogleLoginService.apk      /system/app/
adb push $APK/GoogleServicesFramework.apk /system/app/
adb push $APK/Phonesky.apk                /system/app/
adb shell rm /system/app/SdkSetup*

# maps:
#adb push $APK/com.google.android.apps.maps-1.apk /system/app/
#adb push $APK/com.google.android.gms-2.apk /system/app/

# bot

(Nodo.java)

public class Nodo {
  char info;
  Nodo next;

  public Nodo (char c) {
    info = c;
    next = null;
  }
}

(dhcprelay.startup)

ip link set eth0 up
ip address add 10.2.0.2/16 brd + dev eth0

ip route add default via 10.2.0.1

#DEBIAN_FRONTEND=noninteractive dpkg -i /root/*.deb
debconf-set-selections /root/sel.txt
dpkg -i /root/dh*.deb

(lab.conf)

machines="router2 dhcprelay pc3 pc1 router1 pc2"

router1[0]=A
router1[1]=B
router1[mem]=64

router2[0]=C
router2[1]=A
router2[mem]=64

dhcprelay[0]=C
dhcprelay[mem]=64

pc1[0]=A
pc2[0]=B
pc3[0]=C

(pc1.startup)

dhclient eth0

(router1.startup)

ip link set eth0 up
ip link set eth1 up
ip address add 10.0.0.1/16 broadcast 10.0.255.255 dev eth0
ip address add 10.1.0.1/16 brd       +            dev eth1

ip route add 10.2.0.0/16 via 10.0.0.2

/etc/init.d/dhcp3-server start

echo 1 > /proc/sys/net/ipv4/ip_forward

sdoro Blog
