2011
05.26

Google Data API, Google Apps, Provisioning and OAuth

The following info is blatantly stolen from this precious post, but the issue I faced is so odd that I wanted to stress it myself. All credit goes to Philip Hofstetter and his blog.

[Edit: the following still applies even when, as described here, the Provisioning API is enabled.]
[Edit: the script below doesn’t handle results pagination, so it will only return the first 200 or so queried objects. I’ve yet to complete it… Depending on your needs, you may just use Google Apps Manager instead.]

I was trying to fetch some info using Google Data API and Python. At some point, I decided to move from simple authentication with user supplied credentials to two-legged OAuth. The Contact feed remained accessible while trying to read Groups, Users or Nicknames (by means of the Provisioning API) failed with “Internal Error” 500 or “Authentication Failure”.

As Philip discovered, some feeds just won’t work unless access to them is explicitly granted.
Inconsistency 1: Your API client name (the one you’ll use as “consumer_key” in OAuth, and which will most likely match your Google Apps domain name) is already listed under “Manage this domain”, “Advanced tools”, “Manage third party OAuth Client access”. The wording “This client has access to all APIs” is clearly a lie.
Inconsistency 2: I followed Philip’s advice and manually added the (readonly) feeds/scopes, except that they don’t show up under “Manage API client access”. They’re being honored nonetheless (i.e.: without the tweak, my script won’t work). Moreover, the “Authorize” operation should be done just once and encompass all of the scopes you need: you can’t add one later, because adding a single scope revokes access to the previous ones. This behaviour is different from what Philip saw (in his screenshots, authorized scopes are indeed visible in the Google Apps Domain Control Panel).
This is what I used:

https://apps-apis.google.com/a/feeds/group/#readonly,https://apps-apis.google.com/a/feeds/user/#readonly,https://apps-apis.google.com/a/feeds/nickname/#readonly
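
Since all scopes have to be authorized in one go, it may help to build the string from a single list of feed names. A minimal Python 3 sketch; `build_scope_list` is a hypothetical helper, and the base URL and “#readonly” suffix are simply the ones from the string above:

```python
# Base URL and suffix are copied from the scope string shown above.
BASE = "https://apps-apis.google.com/a/feeds/"
SUFFIX = "#readonly"

def build_scope_list(feeds):
    """Return the comma-separated scope string for the given feed names."""
    return ",".join("%s%s/%s" % (BASE, feed, SUFFIX) for feed in feeds)

print(build_scope_list(["group", "user", "nickname"]))
```

Paste the printed value into the scopes field when authorizing the client.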

And this is the script:

#!/usr/bin/python

# $Id: list_groups_emails_oauth.py,v 1.3 2011/05/26 16:12:42 giuliano Exp giuliano $

# Python 2 script, written against the gdata-python-client library.
import gdata.auth
import gdata.apps.service
import gdata.apps.groups.service

consumer_key = 'yourdomain.com'      # most likely your Google Apps domain name
consumer_secret = 'yourOAuthkey'
sig_method = gdata.auth.OAuthSignatureMethod.HMAC_SHA1

# Groups are handled by their own service class.
service = gdata.apps.groups.service.GroupsService(domain=consumer_key)
service.SetOAuthInputParameters(sig_method, consumer_key, consumer_secret=consumer_secret, two_legged_oauth=True)
res = service.RetrieveAllGroups()
for entry in res:
    print 'group;' + entry['groupId'].lower()

# Users and nicknames are read through the Provisioning API service.
service = gdata.apps.service.AppsService(domain=consumer_key)
service.SetOAuthInputParameters(sig_method, consumer_key, consumer_secret=consumer_secret, two_legged_oauth=True)

res = service.RetrieveAllUsers()
for entry in res.entry:
    print 'email;' + entry.login.user_name.lower() + '@' + consumer_key

res = service.RetrieveAllNicknames()
for entry in res.entry:
    if hasattr(entry, 'nickname'):
        print 'alias;' + entry.nickname.name.lower() + '@' + consumer_key
2011
05.24

Slow write performance and partition alignment

Or how a single sector can make you ten times happier.
Today I’ll talk about a client issue: getting (extremely) slow write performance when backing up my laptop to a USB drive (a 200GB Samsung S1 Mini). All I usually do is boot the PC with SystemRescueCd, plug a USB disk in, run “ddrescue /dev/sda /mnt/externaldisk/laptop_disk_image.dd” and let it run overnight. Except that this morning the backup wasn’t finished yet. What’s wrong? Long post (for a simple solution) ahead…

The USB disk (shown below as /dev/sdb*) “feels” fast when reading and awfully slow when writing. The simplest way to do an HDD benchmark is, of course, dd. Use it along with dstat (an essential tool when pinpointing performance issues, whatever they may be) and you’ll quickly gather some useful figures. Beware! dd can ruin all your data just by mistaking a “b” for an “a”: triple-check and make sure that you’re running it on the right devices!

A sequential write test:

balrog ~ # dd if=/dev/zero of=/mnt/temp/x.bin bs=16384 count=$((100*1024))
102400+0 records in
102400+0 records out
1677721600 bytes (1.7 GB) copied, 455.334 s, 3.7 MB/s

3.7 MB/s only, definitely slow. :( Note that you shouldn’t use /dev/{random,urandom} as input file, they’re a bottleneck by themselves. /dev/zero, on the other hand, is super-fast. “dd if=/dev/zero of=/dev/null bs=16384 count=$((10000*1024))” (shove zeros to /dev/null) is bound only by the CPU, running at about 9.3 GB/s here.

A sequential read test:

balrog ~ # dd if=/mnt/temp/x.bin of=/dev/null bs=16384 count=$((100*1024))
102400+0 records in
102400+0 records out
1677721600 bytes (1.7 GB) copied, 52.1106 s, 32.2 MB/s

30 MB/s, that’s the order of magnitude I was expecting (confirmed here).
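
As a sanity check, the figures dd prints can be recomputed from the byte count and the elapsed time. A small Python 3 sketch; it assumes dd’s “MB” means 10^6 bytes, which is consistent with 1677721600 bytes being reported as 1.7 GB:

```python
def throughput_mb_s(num_bytes, seconds):
    """Throughput in MB/s, with 1 MB = 10**6 bytes."""
    return num_bytes / seconds / 1e6

SIZE = 16384 * 100 * 1024  # bs * count, i.e. the 1677721600 bytes dd reports

write_speed = throughput_mb_s(SIZE, 455.334)   # sequential write test
read_speed = throughput_mb_s(SIZE, 52.1106)    # sequential read test

print("write: %.1f MB/s" % write_speed)
print("read:  %.1f MB/s" % read_speed)
print("read/write ratio: %.1f" % (read_speed / write_speed))
```

The ratio makes the gap between reads and writes explicit: close to one order of magnitude.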

If I repeat the write test and, at the same time, run dstat, I notice that there are no bursts or drops: speed is constant.

balrog linux-2.6.36-gentoo-r5 # dstat -p -d -D sdb
---procs--- --dsk/sdb--
run blk new| read  writ
  0   0 1.0| 362k  888k    # <-- ignore the first sample
  0   0   0|   0     0
4.0 1.0 1.0|   0     0
1.0 2.0   0|   0   360k    # <-- "dd" starts
  0 2.0   0|   0  3360k
  0 2.0   0|   0  3240k
1.0 2.0   0|   0  3240k
  0 2.0   0|   0  3240k

Since reading works, the kernel and the USB Host Controller seem to get along well. The issue should lie on the disk’s side. I had no clue of what was happening until I tried writing straight to the disk instead of the first primary partition (i.e.: /dev/sdb instead of /dev/sdb1), thus trashing the filesystem (I’ve got no data to lose on that disk: no worries).

balrog ~ # dd if=/dev/zero of=/dev/sdb bs=16384 count=$((100*1024))
1677721600 bytes (1.7 GB) copied, 63.0382 s, 26.6 MB/s

Even though the difference between read and write throughput seems to be too much (almost one order of magnitude), this is starting to look like a FS blocksize/partition alignment issue. Well, some disks use a physical sector size (PSS) of 512 bytes. Others use 4096 bytes (4 KiB). Others use the latter but tell the OS that they’re using 512 bytes, or the OS simply can’t figure out the right physical sector size… And USB mass storage devices tell the OS almost nothing (hdparm won’t help this time)…

Filesystems (or other “structured storage” systems like, for instance, datafiles in databases) organize their data in blocks. The block size can sometimes be adjusted. 4096 bytes is quite a common value:

balrog ~ # tune2fs -l /dev/sdb1 | grep -i block.size
Block size:               4096

A sector represents the smallest chunk of data that can be read from or written to a disk. If its size is 512 bytes and the filesystem block size is 4096, the filesystem driver will read/write batches of 8 sectors. Better said: the FS thinks it is dealing with 4k blocks, not knowing that lower level functions will further split them in eight (if only logically).
Consider another example: the PSS is 4096, but the drive acts as if it was 512. Physical sectors can be found at absolute offset sector_number*512*8 (0, 4096, 8192, …). What if a 4k write operation happens at offset 1*512*8-512? (That is 3584, which doesn’t look like a “bad” offset: as far as the OS is concerned, any multiple of 512 is fine.) The drive, being unable to write less than 4k or at arbitrary locations, will: read sector 0, read sector 1, modify the last 512-byte chunk of sector 0, modify the first seven chunks of sector 1, then write both sectors back (or something similar). If things were properly aligned, a single write operation would’ve sufficed. Read speed, on the other hand, may be almost unaffected. Think about it: unless you’re dealing with tons of 4k files straddling sector pairs (i.e.: two sectors read instead of one), large chunks of data are (hopefully) laid out sequentially on the disk. Reading 1GB plus 512 bytes, instead of 1GB alone, won’t change anything in benchmarks.
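
The example above can be turned into a few lines of Python 3. This is only a sketch of the arithmetic, under the same assumption of a 4096-byte physical sector; `sectors_touched` is a made-up name and says nothing about what a given drive’s firmware really does:

```python
PHYSICAL = 4096  # assumed physical sector size in bytes, as in the example

def sectors_touched(offset, length, sector_size=PHYSICAL):
    """Return (physical sector numbers touched, needs read-modify-write)."""
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    # A write is "clean" only if it both starts and ends on a sector boundary.
    partial = offset % sector_size != 0 or (offset + length) % sector_size != 0
    return list(range(first, last + 1)), partial

print(sectors_touched(3584, 4096))  # the misaligned 4k write from the text
print(sectors_touched(4096, 4096))  # the same write, properly aligned
```

The misaligned write spans physical sectors 0 and 1 and covers neither completely, while the aligned one maps to exactly one sector.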

What’s up with my partition?

balrog ~ # sfdisk -uS -l /dev/sdb    

Disk /dev/sdb: 24321 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1            63 390716864  390716802  83  Linux

sfdisk (one of fdisk‘s cousins) shows that the partition starts at byte 63*512=32256. This value isn’t divisible by 4096, yielding a non integer result. Sector 64, instead, is a good place to start an aligned partition:

63*512/4096 = 7.875
64*512/4096 = 8.00

Similarly, other partitions should start at sectors that are multiples of 8 (because 512*8=4096).
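
The same check is easy to script for any start sector. A minimal Python 3 sketch, assuming 512-byte logical sectors and a 4096-byte physical sector as above (`is_aligned` and `next_aligned` are hypothetical helper names):

```python
LOGICAL = 512    # sector size as reported to the OS
PHYSICAL = 4096  # assumed physical sector size

def is_aligned(start_sector):
    """True if the partition's first byte falls on a physical sector boundary."""
    return (start_sector * LOGICAL) % PHYSICAL == 0

def next_aligned(start_sector):
    """Round a start sector up to the next multiple of 8 (4096 / 512)."""
    per_physical = PHYSICAL // LOGICAL
    return -(-start_sector // per_physical) * per_physical

for sector in (63, 64):
    print(sector, is_aligned(sector), next_aligned(sector))
```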
This is the corrected partition table. Moving the partition forward (by a mere 512 bytes) causes a 10x write speed increase.

balrog ~ # sfdisk -uS -l /dev/sdb

Disk /dev/sdb: 24321 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1            64 390721967  390721904  83  Linux

You may still have a question though. Does aligning a partition mean that the contained filesystem is aligned too? You’re right, that assumption should not be taken for granted.

A filesystem is made up of “your” data and “its” data (the latter being internal structures necessary to organize the former). In any case, a FS will try to pad/align stuff to the block size. In other words, if a partition starts on a given boundary, the filesystem (all of its blocks) will be aligned to that boundary too.

You can find a description of the ext2 layout here. The partition starts with two sectors reserved for the boot loader. Then comes a 1k chunk holding the ext2 “superblock”. At offset 56 within the superblock, we should find the ext2 magic number (0xEF53, stored little-endian, i.e. as the bytes 53 EF), and here it is:

giuliano@giuliano ~ $ dd if=/dev/sdb1 bs=512 skip=2 count=1 2>/dev/null | xxd -s +56 -l 2
0000038: 53ef                                     S.

The first block after the one holding the superblock starts at byte 4096. From then on, everything happens (from the FS point of view) in chunks as big as the configured block size. My disk is a 4k sector disk, formatted with a single partition aligned (as the FS) to a 4K block. FS block size is 4K too. Can’t really do any better than that besides choosing a filesystem that manages to handle the given workload with fewer read/write operations, but I digress…
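
The dd/xxd check can also be done in Python 3. The ext2 magic is defined as 0xEF53 and stored little-endian, which is why the dump shows the bytes 53 ef. The sketch below runs against a synthetic in-memory buffer; pointing it at a real partition (opened in binary mode, as root) is left as an assumption:

```python
import io
import struct

SUPERBLOCK_OFFSET = 1024  # the superblock starts 1 KiB into the partition
MAGIC_OFFSET = 56         # offset of the magic number within the superblock
EXT2_MAGIC = 0xEF53

def has_ext2_magic(stream):
    """Check for the ext2 magic number in a seekable binary stream."""
    stream.seek(SUPERBLOCK_OFFSET + MAGIC_OFFSET)
    raw = stream.read(2)
    if len(raw) < 2:
        return False
    (magic,) = struct.unpack("<H", raw)  # little-endian unsigned 16-bit
    return magic == EXT2_MAGIC

# Synthetic stand-in for a partition: zeros, with the two magic bytes in place.
fake = bytearray(4096)
fake[1080:1082] = b"\x53\xef"
print(has_ext2_magic(io.BytesIO(bytes(fake))))
```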

2011
04.20

About path thrashing and why you should always zone

So, Customer starts updating all of his VMware ESX hosts and things turn out for the worst. VMs are crawling (ping response times anywhere from 0 to 1000ms), console access through vSphere client doesn’t always work, and hosts’ CPU percentage is unnaturally high. The cause is apparent: path thrashing.
Path thrashing happens when, for some reason, SCSI LUNs are being continuously reassigned from one controller (Target) to another. ESX has a hard time “bouncing” I/O back and forth on the right Fibre Channel path. On Active/Passive SAN arrays a LUN can be “owned” by just one controller at a time. If the LUN owner has to be changed because of a hardware failure (path, Controller, SFP/GBIC, FC switch, …) or because the Initiator would like to, the LUN itself has to “trespass” (in EMC parlance), i.e. transition to another controller. The “command” to do so can be issued by the Initiator or internally by the storage subsystem.
Back to today’s case, I was dealing with an IBM DS4800 where LUNs flipped like mad between controller A and B. How to stop it quickly?

  • If anything, the flipping shows that failover works as expected (VMs don’t crash despite the chaos).
  • That said, I could just disconnect a controller. Well, not really: the same storage system hosts an Oracle RAC cluster, humming along happily, unaffected by the issue.
  • I need a way to selectively “hide” a controller from one or more hosts. I can do it easily by tweaking the SAN zoning configuration.

A Zone (much like a VLAN) is basically a group of WWNs (or ports). Objects in the Zone can only talk to each other. While creating Zones, it is common practice to “go minimal”: they should contain as little as possible. I usually name them like this:
    Z_HOSTNAME_P1_DS4800_CA1_CB1
HBA Port 1 of HOSTNAME can see Controller A/Port 1 and Controller B/Port 1 of the DS4800.
Thus, going through each ESX server’s Zone, I just remove the Controller that the host shouldn’t see. Path thrashing is temporarily stopped.
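
To make the idea concrete, here is a toy Python 3 model of what the zoning change does. It is only an illustration: the aliases stand in for WWNs, `visible_from` is a made-up helper, and real zoning is of course configured on the FC switches, not in Python:

```python
# Hypothetical zone set: zone name -> members (aliases stand in for WWNs).
zones = {
    "Z_HOSTNAME_P1_DS4800_CA1_CB1": {"HOSTNAME_P1", "DS4800_CA1", "DS4800_CB1"},
}

def visible_from(member, zones):
    """Everything `member` can talk to: the union of its zones, minus itself."""
    peers = set()
    for members in zones.values():
        if member in members:
            peers |= members
    peers.discard(member)
    return peers

print(sorted(visible_from("HOSTNAME_P1", zones)))  # both controllers

# Hide controller B from the host by dropping it from the host's zone.
zones["Z_HOSTNAME_P1_DS4800_CA1_CB1"].discard("DS4800_CB1")
print(sorted(visible_from("HOSTNAME_P1", zones)))  # controller A only
```

Other hosts (the Oracle RAC nodes, in this case) live in their own zones and are untouched by the change.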
The above rant serves mainly as a pro-zoning argument. “If every HBA port has to access every Controller’s port, why implement zoning?”. As you just read, zoning saved me from serious trouble, today.
About the “real” issue, it was ultimately caused by a thing called “Auto Volume Transfer” (AVT)1. Let’s say that a LUN is assigned to controller A, but I/O for the LUN is issued to controller B. With AVT switched on, the storage system will automatically transfer the LUN from A to B.
The Customer’s ESX servers are all (correctly) configured to use the “Most Recently Used” (MRU) path to a LUN. It seems that ESX, from a certain version on, issues I/O on the standby path, causing havoc if AVT is on. I can’t tell if that’s because it is fooled into thinking that the storage is an Active/Active one or if it just sort of periodically “probes” standby paths.
How do you switch AVT off? By using the DS “Storage Manager” and changing the ESX Hosts’ type from “Linux” (or whatever) to “LNXCLVMWARE”. This applies to all of the LSI derived Storage Systems (IBM, SUN StorageTek, Engenio, …). The latter host type is the right one to use when hooking an ESX cluster to an IBM DS Storage System. But “Linux” seems to do just fine on not so new ESX hosts version 4.1.x … When AVT is off, the Storage will decide to trespass LUNs only in the event of an internal hardware failure while, normally, LUN ownership will be handled by the multipathing software on the Host.

More reading on the subject:

[1] Differences between the “Linux” and “LNXCLVMWARE” host types.
[2] How does Auto Volume Transfer (AVT) work? Courtesy of Google’s cache. Lists which SCSI commands trigger AVT.
[3] A really nice blog post about the same issue described here. (Found, of course, when I was writing mine)

  1. or even “Auto Disk Transfer” (ADT)
2011
04.12

Cloning DHCP reservations between Windows servers

Quick post to show you how DHCP reservations can be replicated between Windows servers. Why would you want to do that? Because often, to achieve DHCP service high availability, DHCP scopes are equally divided between servers. When a client PC is connected to the network, it sends out a broadcast to discover which DHCP servers are active on that particular ethernet segment. Depending on their number, the PC will receive one or more answers, each offering an IP address. If a client is to be assigned a fixed IP, all of those offers should bear the same IP address. Hence, DHCP reservations need to be configured the same for every DHCP server in the given scope. As far as I know, this needs to be done by hand. To speed up the process, I use netsh (see Netsh commands for DHCP).

The command below will dump all of the reservations to a file named “reservations.txt”. findstr filters netsh output keeping just the info we need.

C:\Documents and Settings\Administrator> netsh dhcp server \\dhcpsrv1 scope 10.4.0.0 dump | findstr Add.reservedip > reservations.txt

Each line in “reservations.txt” should look like this:

Dhcp Server 10.4.1.1 Scope 10.4.0.0 Add reservedip 10.4.5.3 58b04576339a "pcname.domain.lan" "Reservation Comment" "BOTH"

10.4.1.1 is the IP address for dhcpsrv1, the “source” DHCP server.

Open “reservations.txt” in a text editor, check that everything is fine and substitute the source DHCP server IP with the target’s one (i.e.: 10.4.1.1 becomes 10.4.1.2), save the file and run:

C:\Documents and Settings\Administrator> netsh < reservations.txt
netsh>
Changed the current scope context to 10.4.0.0 scope.

Command completed successfully.
netsh>
Command completed successfully.
netsh>
[..]

That’s it; not a fancy trick, but it may be useful nonetheless. Just beware that, when there are thousands of clients, netsh could take a while to complete its job (especially the “dump” step)…
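
The substitution step can be scripted instead of done in a text editor. A hedged Python 3 sketch (`retarget` is a hypothetical helper): it rewrites only the “Dhcp Server <ip>” prefix, since a blind search and replace on 10.4.1.1 could also mangle a reserved address such as 10.4.1.10:

```python
def retarget(line, source_ip, target_ip):
    """Swap the server IP in the leading 'Dhcp Server <ip> ' prefix only."""
    prefix = "Dhcp Server %s " % source_ip
    if line.startswith(prefix):
        return "Dhcp Server %s " % target_ip + line[len(prefix):]
    return line  # leave anything unexpected untouched

line = ('Dhcp Server 10.4.1.1 Scope 10.4.0.0 Add reservedip 10.4.5.3 '
        '58b04576339a "pcname.domain.lan" "Reservation Comment" "BOTH"')
print(retarget(line, "10.4.1.1", "10.4.1.2"))
```

Apply it to every line of “reservations.txt”, write the result to a new file and feed that one to netsh.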

2011
03.30

JavaScript for Sysadmins, again

Following up on the previous post, let me show you other ways to trick web applications into doing what they’ve not been designed to do: saving the Sysadmin some typing and avoiding errors. I’ll use JavaScript, jQuery, Greasemonkey and Perl to automate Firefox and implement a sort of dynamic form filling. Let’s start with a screencast:

The config I had to do (on a SonicWALL firewall), involves about 70 subnets, each similar to the other. Only the subnet’s addressing scheme changes, making creation of VLANs/objects/rules a repetitive and error-prone task.

What’s happening in the screencast? VLAN sub-interfaces are being automatically created without me having to type anything at all. How could that be? A simple Greasemonkey script calls a “web service” (AJAX style), fetching the needed data and filling the form for me (I just click the “OK” or “Cancel” buttons). Why this whole Greasemonkey/web service mess? Because here, as in the previous post, JavaScript code is basically being injected into a “page”. Pages (or tabs, or windows) are run by the browser in a sandbox: they can’t exchange data with each other. Thus page A (the webapp) can’t access code/data in page B (our code). Moreover, injected JavaScript gets lost when the page is closed (think ugly GUIs where dialog windows pop up just to be destroyed shortly after). We need a way (Greasemonkey) to re-inject the code each time our page is shown and some external, long lived entity to keep/update status (the web service).

Thanks to the HTTP::Server::Simple Perl module, building the web service is trivial. The only logic behind it is keeping track of the current VLAN’s index, iterating through each value on subsequent web service calls:

#!/usr/bin/perl
use strict;
package LameServer;
use base qw(HTTP::Server::Simple::CGI);

my @VLANS = qw(10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98);

my $COUNTER = 0;

sub handle_request {
    my ($self, $cgi) = @_;

    print "HTTP/1.0 200 OK\r\n";
    print "Content-type:text/plain\r\n\r\n";
    print STDERR "$VLANS[$COUNTER] index $COUNTER\n";
    print $VLANS[$COUNTER];
    $COUNTER++;
    $COUNTER = 0 if $COUNTER >= @VLANS;
}

1;

package main;

my $server = LameServer->new(3333);
$server->run();

And here’s the Greasemonkey script. It runs automatically when the “add VLAN” page pops up, does the sort-of-AJAX call, uses jQuery to properly fill the form.

// ==UserScript==
// @name           CurrentVLAN
// @namespace      dontcare
// @require        http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js
// @include        https://192.168.168.168/*
// ==/UserScript==

if (window.location.toString().match(/192\.168\.168\.168.*editInterface/)) {
    GM_xmlhttpRequest({
        method: "GET",
        url: "http://localhost:3333/",
        onload: function(response) {
            window.GM_CurrentVLAN = parseInt(response.responseText, 10);
            GM_log(window.GM_CurrentVLAN);
            }
    });
    function GM_wait() {
        if(typeof unsafeWindow.jQuery == 'undefined') {
            window.setTimeout(GM_wait,100);
        }
        else {
            $ = unsafeWindow.jQuery;
        }
    }
    GM_wait();

    window.setTimeout(function() {
            $('select[name=interface_Zone]').val('CZ');
            unsafeWindow.setIfaceContent(); // "Zone" onchange function.
            $('input[name=iface_vlan_tag]').val(window.GM_CurrentVLAN+100);
            $('select[name=iface_vlan_parent]').val('X0');
            $('input[name=lan_iface_lan_ip]').val("10.0."+window.GM_CurrentVLAN+".1");
            $('input[name=lan_iface_ping_mgmt]').attr('checked', true);
            }, 1000);
}

I built similar scripts to create static ARP entries and routes. Another one took care of firewall symbolic objects (names) but, as that part of the config is itself carried out by AJAX (in a non reloading window), I didn’t need Greasemonkey, just Firebug.
No kidding, the above tricks saved me half a day of tedium…

2011
03.19

FireQuery fun

Or how to toggle a thousand checkboxes clicking none.
Have you ever wondered whether jQuery may be relevant to your everyday sysadmin job? It is: web-based GUIs proliferate and the Command Line is sooo nineties (this one’s not mine)…
Working on a SonicWALL/Aventail SSL VPN box, I was asked to simplify how permissions were mapped to Users. What a User could or couldn’t do, was defined at the User level. Ok, let’s just:

  • Create an A/D group for each role/profile.
  • Configure permissions (rules, resources access, …) on various Communities (Aventail parlance for roles/profiles).
  • Put the right Users into the right A/D group.
  • Cleanup: generally, each Community should just have A/D groups assigned to it. Get rid of unnecessary Community memberships.

The first three tasks were easy (thanks also to the DS* commands). The fourth, uhm:

Removing members from a Community means unchecking each User. In my case, more than one thousand clicks and a beginning of carpal tunnel syndrome. At page three I started to think about a less saddening way.

FireQuery is a Firefox extension that lets you “inject” jQuery into any webpage. Here’s what I did:

  • Went to the page depicted above (the one that lets you assign members to a Community).
  • Launched Firebug.
  • Inspected the DOM and noticed that each checkbox’s value begins with “AV”.
  • Used the debugger to see what’s going on when checking/unchecking members. Nothing really strange: just a bit of JavaScript to highlight a row depending on its checkbox’s status.
  • Hit the jQuerify button. FireQuery is needed because Aventail’s web-based GUI doesn’t use jQuery.
  • Went to Console, typed the JavaScript one-liner below and hit Run
$('input[value^="AV"]').attr('checked', false)

which translates to: use jQuery to select all of the checkboxes whose value starts with “AV”. Uncheck the selected checkboxes.

See? No useless clicking: a GUI has been CLI-fied.

2011
02.09

Oracle Grid Control – crashing runInstaller

Today I ran into a weird issue while installing Oracle Grid Control Agent 10.2.0.3 on Linux. Right after typing “runInstaller”, OUI crashed because of a segmentation fault… Let me talk about some of the troubleshooting maneuvers you may need to perform should you find yourself in similar trouble.

Here are the relevant details:

  • OS: Red Hat Enterprise Linux Server 5.3 x86-64
  • GC Agent: Oracle Enterprise Manager 10g Grid Control Release 3 (10.2.0.3) for Linux x86-64
  • GC Console: Oracle Enterprise Manager 10g Release 5 (10.2.0.5) Grid Control for Microsoft Windows 32-bit

And here’s the error message (the most interesting portions):

An unexpected exception has been detected in native code outside the VM.
Unexpected Signal : 11 occurred at PC=0xE44F46A7
Function=[Unknown.]
Library=(N/A)

[..]

Current Java thread:
        at sun.awt.motif.MToolkit.init(Native Method)
        at sun.awt.motif.MToolkit.<init>(Unknown Source)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)

[..]

Heap at VM Abort:
Heap
 def new generation   total 576K, used 84K [0xe6510000, 0xe65b0000, 0xe7090000)
  eden space 512K,   4% used [0xe6510000, 0xe65152f8, 0xe6590000)
  from space 64K, 100% used [0xe65a0000, 0xe65b0000, 0xe65b0000)
  to   space 64K,   0% used [0xe6590000, 0xe6590000, 0xe65a0000)
 tenured generation   total 6212K, used 4461K [0xe7090000, 0xe76a1000, 0xefb10000)
   the space 6212K,  71% used [0xe7090000, 0xe74eb5f8, 0xe74eb600, 0xe76a1000)
 compacting perm gen  total 5632K, used 5398K [0xefb10000, 0xf0090000, 0xf3b10000)
   the space 5632K,  95% used [0xefb10000, 0xf00558b0, 0xf0055a00, 0xf0090000)

Local Time = Tue Feb  8 09:45:48 2011
Elapsed Time = 1
#
# The exception above was detected in native code outside the VM
#
# Java VM: Java HotSpot(TM) Client VM (1.4.2_08-b03 mixed mode)
#

To go past this show-stopper I tried a few things…

The Heap report produced by java at crash time, seemed to indicate a memory shortage. By editing the “install/oraparam.ini” file, you can tweak how much RAM is available for OUI’s JVM. Just alter “JRE_MEMORY_OPTIONS” value.

#JRE_MEMORY_OPTIONS=" -mx150m"
JRE_MEMORY_OPTIONS=" -Xms512m -Xmx2048m"

This is also a safe place to put additional command line parameters: they’ll mostly be passed to java’s command line. I said “mostly” because the OUI wrapper/launcher seems to check some sort of allowed parameters list and may refuse to go on if something doesn’t look right.

The “-XX:MaxPermSize=32m” is one of the knobs that doesn’t pass the sanity check. In order to run OUI’s JVM by hand, with the right parameters, just keep the first lines of runInstaller (the ones starting with ‘Arg:‘):

Arg:0:/tmp/OraInstall2011-02-08_04-55-33PM/jre/1.4.2/bin/java:
Arg:1:-Doracle.installer.library_loc=/tmp/OraInstall2011-02-08_04-55-33PM/oui/lib/linux:
Arg:2:-Doracle.installer.oui_loc=/tmp/OraInstall2011-02-08_04-55-33PM/oui:
Arg:3:-Doracle.installer.bootstrap=TRUE:
[..]
Arg:20:-timestamp:
Arg:21:2011-02-08_04-55-33PM:
Arg:22:-nowelcome:

Strip “^Arg:“, “^\d*:“, “:$“, add a trailing “ \” and you’ll have an OUI launching shell script you can alter at will.
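
If you’d rather not do the stripping by hand, a few lines of Python 3 will do. This is a sketch, not what I actually ran: `args_to_script` is a made-up name, lines that aren’t “Arg:” lines are simply skipped, and the trailing “ \” is left off the last argument so the command ends cleanly:

```python
import re
import shlex

ARG_LINE = re.compile(r"^Arg:\d+:(.*):$")

def args_to_script(lines):
    """Turn 'Arg:N:value:' lines into one backslash-continued shell command."""
    args = []
    for line in lines:
        match = ARG_LINE.match(line.strip())
        if match:  # anything else is ignored
            args.append(shlex.quote(match.group(1)))
    return " \\\n".join(args) + "\n"

sample = [
    "Arg:0:/tmp/OraInstall2011-02-08_04-55-33PM/jre/1.4.2/bin/java:",
    "Arg:3:-Doracle.installer.bootstrap=TRUE:",
    "Arg:22:-nowelcome:",
]
print(args_to_script(sample), end="")
```

shlex.quote only adds quotes when an argument contains characters the shell would interpret, so plain paths and switches come out unchanged.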

Increasing the JVM’s memory had no effect. The Heap report looked fine (usage percentages went down) but the crash was still there.

Another useful switch is “-XX:+ShowMessageBoxOnError“. It makes java halt on error, allowing us to attach a debugger and perform a stack backtrace, e.g.:

Unexpected Signal: 11, PC: 0x6d4626a7, PID: 4866
An error has just occurred.
To debug, use 'gdb /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/bin/java 4866'; then switch to thread -136623920
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xf7e462b6 in nanosleep () from /lib/libc.so.6
#2  0xf7e460df in sleep () from /lib/libc.so.6
#3  0xf7bdc6d7 in os::message_box ()
   from /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/lib/i386/client/libjvm.so
#4  0xf7bd9c52 in os::handle_unexpected_exception ()
   from /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/lib/i386/client/libjvm.so
#5  0xf7bddbf6 in JVM_handle_linux_signal ()
   from /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/lib/i386/client/libjvm.so
#6  0xf7bdc9d8 in signalHandler ()
   from /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/lib/i386/client/libjvm.so
#7  <signal handler called>
#8  0x6d4626a7 in ?? ()
#9  0x6d6d75b9 in XtToolkitInitialize () from /usr/lib/libXt.so.6

I also tried to “inject” a couple of newer JVMs into the stage directory. The quickest way is to borrow one from another installer.

[oracle@racnode01 orastage]$ find . -type d -name oracle.swd.jre -exec echo {} \; -exec ls {} \;
./Linux_x86_64_Grid_Control_full_102030/Disk1/stage/Components/oracle.swd.jre
1.4.2.8.0
./p6810189_10204_Linux-x86-64/Disk1/stage/Components/oracle.swd.jre
1.4.2.14.0

The server has a “working” directory where Oracle patches/products are stored before use. In my case, changing OUI’s JVM from 1.4.2.8 to 1.4.2.14 is a matter of copying:

./p6810189_10204_Linux-x86-64/Disk1/stage/Components/oracle.swd.jre/1.4.2.14.0

to:

./Linux_x86_64_Grid_Control_full_102030/Disk1/stage/Components/oracle.swd.jre

Then modifying the same “oraparam.ini” file mentioned before.

#JRE_LOCATION=../stage/Components/oracle.swd.jre/1.4.2.8.0/1/DataFiles
JRE_LOCATION=../stage/Components/oracle.swd.jre/1.4.2.14.0/1/DataFiles

You could as well download a specific JRE from http://java.sun.com (sorry: from Oracle) and:

  • install the new JRE somewhere
  • unzip (-t) the “filegroup1.jar” file that corresponds to OUI’s “factory” JRE. Note how the directories are laid out (something like: “jre/1.4.2″). Modify the new JRE accordingly.
  • zip the new JRE, rename the resulting file to “filegroup1.jar”, copy it in the right place.
  • modify oraparam.ini and choose the JVM version you’ll boot OUI into.
[oracle@racnode01 oracle.swd.jre]$ pwd
/opt/orastage/Linux_x86_64_Grid_Control_full_102030/Disk1/stage/Components/oracle.swd.jre
[oracle@racnode01 oracle.swd.jre]$ find . -type f
./1.4.2.8.0/1/DataFiles/filegroup1.jar   # <-- factory
./1.4.2.8.0/1/DataFiles/filegroup2.jar
./1.4.2.8.0/1/DataFiles/filegroup3.jar
./1.4.2.8.0/1/DataFiles/filegroup4.jar
./1.4.2.8.0/1/DataFiles/filegroup5.jar
./1.4.2.14.0/1/DataFiles/filegroup1.jar  # <-- stolen from patchset p6810189
./1.4.2.14.0/1/DataFiles/filegroup2.jar
./1.4.2.14.0/1/DataFiles/filegroup3.jar
./1.4.2.14.0/1/DataFiles/filegroup4.jar
./1.4.2.14.0/1/DataFiles/filegroup5.jar
./1.4.2.19.0/1/DataFiles/filegroup1.jar  # <-- downloaded by hand

Three different JREs, each of them segfaulting in the same spot, as we saw in the backtrace:

#9  0x6d6d75b9 in XtToolkitInitialize () from /usr/lib/libXt.so.6

Who’s the owner of libXt?

[root@racnode01 ~]# rpm -q --queryformat '%{NAME}-%{VERSION}-%{RELEASE} %{ARCH}\n' -f /usr/lib/libXt.so.6
libXt-1.0.2-3.1.fc6 i386

After making sure that none of the running processes was using that package’s contents, I decided to remove it (rpm -e --nodeps libXt-1.0.2-3.1.i386) and reinstall it. Surprisingly, OUI worked flawlessly after this last action. Too bad I can’t really explain why. :( The libXt version didn’t change before/after the reinstall. I should diff it anyway with what’s left untouched on the other RAC cluster members. I’ll update the post when I have a more rigorous explanation…

2011
01.20

Extracting data from a CCTV DVR hard disk

Today I was handed a hard disk, removed from a CCTV DVR about which we knew nothing (no model/make). The request was to extract as much footage as possible from it. I hooked the disk to my laptop via a SATA/USB adaptor but, unsurprisingly, wasn’t able to find/mount a filesystem. I dumped the first 30 megabytes off the device (dd if=/dev/sdb of=x.bin bs=1M count=30) and opened the resulting file with a hex editor:

# od -t x1 -a x.bin | sed 's/nul/   /g' | head
0000000  44  48  46  53  34  2e  31  00  00  00  00  00  00  00  00  00
          D   H   F   S   4   .   1                                    
0000020  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00
                                                                       
*
0036040  00  00  00  00  00  00  00  00  03  00  00  00  01  00  00  00
                                        etx             soh            
0036060  03  00  00  00  00  00  00  00  00  00  00  00  00  00  00  00
        etx                                                            
0036100  00  00  00  00  00  00  00  00  22  00  00  00  00  00  00  00

The first bytes read “DHFS4.1”. Various Google searches return just one English hit: a machine translated page from Chinese mentioning a company named “Dahua” that manufactures some video surveillance equipment. On their support page an HDD Download Tool can be found (filename: HDD Download Tool.rar). It can look up video clips stored on a disk by date, time and channel (input number), and extract them as “.dav” files (e.g.: 01.23.03-01.33.21[M][@2a2ed][0].dav). You can convert .dav files to .avi using “dhavi.exe” (from bahamassecurity.com). I was able to run the latter with Wine, which probably means that .dav content can be packed into an .avi container without transcoding (even though the website says something about H.264 codecs). The former tool, instead, requires Windows because of the raw disk I/O going on.
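
Should you end up with a pile of unlabeled disks, checking for that signature is easy to script. A minimal Python 3 sketch; all it knows about the format is the seven bytes seen in the dump above, and `looks_like_dhfs` is a made-up name:

```python
import io

SIGNATURE = b"DHFS4.1"  # the first seven bytes seen in the dump above

def looks_like_dhfs(stream):
    """True if a seekable binary stream starts with the DHFS4.1 signature."""
    stream.seek(0)
    return stream.read(len(SIGNATURE)) == SIGNATURE

# In-memory stand-in for the start of the disk; on a real system this would
# be the dump file (x.bin) opened in binary mode instead.
print(looks_like_dhfs(io.BytesIO(b"DHFS4.1" + bytes(25))))
```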

Hopefully, this post gathers some hard-to-find info, coming either from Chinese pages or from sites not indexed by Google…

Size    | md5sum                           | Filename
--------+----------------------------------+------------------------
240168  | cbe0912a78074226060d4101f4f67902 | *HDD Download Tool.rar
87552   | 18b79f0827dfa27cf4e068f02d78f02b | *HDD Download Tool User's Manua 2009-6.doc
216081  | 993c1bd56e03427351bb8cdfea142803 | *General_DiskCopy_Eng_TS_V1.00.0.R.090611.rar
299008  | 624721057cd0857ef5ecbde9643debfd | *General_DiskCopy_Eng_TS_V1.00.0.R.090611.exe
208896  | e5e3fa834ccf7e6b1ad222b1aa38b91c | *DiskIOCtl.dll
1037312 | 4a5b234d673e5d10fbddddae3bb777a1 | *dhavi.exe
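
If you find these files elsewhere, the sizes and checksums listed above let you check that you got the same ones. A small sketch of that check follows; the helper names are mine, and the file is assumed to be in the current directory.

```python
import hashlib
import os


def file_size_and_md5(path, chunk_size=1 << 20):
    """Return (size in bytes, hex MD5 digest) for the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def matches_listing(path, expected_size, expected_md5):
    """Compare a local file against one row of the table above."""
    size, md5 = file_size_and_md5(path)
    return size == expected_size and md5 == expected_md5.lower()
```

For example, matches_listing("dhavi.exe", 1037312, "4a5b234d673e5d10fbddddae3bb777a1") should return True for an identical copy.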
2011
01.12

Cloning an Active Directory group


Or how to use the dsquery/dsget/dsmod commands to copy all the members from an
Active Directory group (source), to another one (destination).

If, like me, you are on a neverending quest to click less and script more, you can solve the problem this way:

  • Create the destination group, should it not exist.
  • Find the source group’s DN:
    >dsquery group -samid sourcegroup
    "CN=sourcegroup,OU=Groups,DC=contoso,DC=com"

    The “-samid” argument is the name of the group whose DN you’re looking for. You can use “*” as a wildcard.

  • Ditto for the destination group:
    >dsquery group -samid destinationgroup
    "CN=destinationgroup,OU=Groups,DC=contoso,DC=com"
  • On with the copy itself:
    >dsget group "CN=sourcegroup,OU=Groups,DC=contoso,DC=com" -members -expand | dsmod group "CN=destinationgroup,OU=Groups,DC=contoso,DC=com" -addmbr -c
    dsmod succeeded:CN=destinationgroup,OU=Groups,DC=contoso,DC=com

    These are two commands: “dsget group” and “dsmod group”. Output from the first is piped to the second. “-members” causes the group members’ DNs to be listed on standard output (one per line, quoted). “-expand” makes dsget recursively expand any sub-groups that sourcegroup may hold.
    Conversely, dsmod modifies destinationgroup, adding the members it reads from standard input (“-addmbr” with no explicit DNs).
    Very cool, so far. The only caveat is that the “-c” switch doesn’t work as advertised. It should keep adding members to destinationgroup even when some of them already exist there, but it doesn’t. If you need to re-sync source and dest, first remove from dest the members it already shares with source.
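
A different way around the “-c” caveat is to add only the members that dest is still missing. The following is an untested sketch, not something I ran against a real domain: it assumes dsget prints nothing but quoted DNs, that DNs can be compared case-insensitively, and that the list of missing members fits on one command line. All function names are made up.

```python
import subprocess


def parse_dns(output):
    """Turn dsget output (one quoted DN per line) into a list of bare DNs."""
    return [line.strip().strip('"') for line in output.splitlines() if line.strip()]


def missing_members(source_dns, dest_dns):
    """DNs present in source but not in dest, compared case-insensitively."""
    already = {dn.lower() for dn in dest_dns}
    return [dn for dn in source_dns if dn.lower() not in already]


def group_members(group_dn, expand=False):
    """Run 'dsget group ... -members' and return the member DNs."""
    cmd = ["dsget", "group", group_dn, "-members"]
    if expand:
        cmd.append("-expand")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_dns(result.stdout)


def sync_group(source_dn, dest_dn):
    """Add to dest only the members it does not have yet; return what was added."""
    to_add = missing_members(group_members(source_dn, expand=True),
                             group_members(dest_dn))
    if to_add:
        subprocess.run(["dsmod", "group", dest_dn, "-addmbr", *to_add], check=True)
    return to_add
```

Only sync_group touches the directory; parse_dns and missing_members are plain string handling and can be tried anywhere.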

Bonus note: here’s a quick way to discover a user’s DN given his username:

>dsquery user -samid jdoe
"CN=John Doe,CN=Users,DC=contoso,DC=com"
2010
12.14

Symantec Endpoint Protection – crypt32 errors


One of the issues I procrastinated on the longest at a customer’s site was the proliferation of errors like these (as shown in the Event Viewer of both servers and clients):

Event Type: Error
Event Source:   crypt32
Event Category: None
Event ID:   8
Description:
Failed auto update retrieval of third-party root list sequence number from: <http://www.download.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootseq.txt> with error: This network connection does not exist.
Event Type: Error
Event Source:   crypt32
Event Category: None
Event ID:   11
Description:
Failed extract of third-party root list from auto update cab at: <http://www.download.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab> with error: A required certificate is not within its validity period when verifying against the current system clock or the timestamp in the signed file.

Several posts mention the issue; this one pointed me in the right direction. Basically, because of how SEP components communicate, Windows is triggered into updating its list of trusted root Certification Authorities. It tries to do so over the Internet using the Computer account, which may not have any proxy configured. Being unable to reach the outside, the host gets flooded with crypt32 errors.

In order to solve the issue, I decided to deploy a valid proxy configuration for the Computer account (SYSTEM user) on a subset of the Domain’s hosts.
One of the ways to script that is the “proxycfg -u” command [1], which works by copying the current user’s proxy settings to the SYSTEM’s registry. Sounds cool, but if the current user is not a member of the local Administrators group, he won’t have the necessary rights. The following script, instead, can be launched via Group Policy [2] during operating system startup, and since it’s a startup script rather than a logon one, it will run with administrative privileges.

Nothing fancy in the source below. It creates the registry key if it doesn’t exist, then sets the right value for WinHttpSettings, which I obtained this way:

  • use “proxycfg -u” on a test host
  • use the Registry editor to export the contents of HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings\Connections

The value is of type REG_BINARY. Since the RegWrite API (method of class WScript.Shell) cannot deal with binary values, WMI (StdRegProv registry provider) needs to be used. Also, SetBinaryValue expects an array of decimal values, while Regedit exports them as hexadecimal digits (you’ll have to take care of the conversion yourself).
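
The hexadecimal-to-decimal conversion is easy to get wrong by hand, so here is a small sketch of it. It assumes the usual Regedit .reg layout (a “hex:” prefix, comma-separated byte pairs, backslash line continuations); the function name and the shortened sample are mine.

```python
import re


def reg_hex_to_decimal(exported):
    """Convert the data of a Regedit 'hex:' export to a comma-separated decimal string."""
    # Keep only what follows "hex:" in case the value name is still attached.
    data = exported.split("hex:", 1)[-1]
    # Drop line-continuation backslashes and all whitespace.
    data = re.sub(r"[\\\s]", "", data)
    return ",".join(str(int(pair, 16)) for pair in data.split(",") if pair)


# Shortened sample: the header plus the first five bytes of the proxy name.
sample = '"WinHttpSettings"=hex:18,00,00,00,00,00,00,00,03,00,00,00,13,00,00,00,70,\\\n  72,6f,78,79'
print(reg_hex_to_decimal(sample))
# -> 24,0,0,0,0,0,0,0,3,0,0,0,19,0,0,0,112,114,111,120,121
```

The resulting string is in the comma-separated decimal form that the script’s strValue holds.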

On Error Resume Next
Const HKEY_LOCAL_MACHINE = &H80000002

strPath = "SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings\Connections"
strKey = "WinHttpSettings"
strValue = "24,0,0,0,0,0,0,0,3,0,0,0,19,0,0,0,112,114,111,120,121,46,99,117,115,116,46,108,97,110,58,56,48,56,48,47,0,0,0,49,48,46,42,46,42,46,42,59,115,101,114,118,101,114,50,48,59,115,101,114,118,101,114,50,48,46,42,59,42,46,99,117,115,116,46,108,97,110,59,60,108,111,99,97,108,62"
strMachineName = "."

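' Split() returns strings: convert each element to a number for SetBinaryValue
arrValues = Split(strValue, ",")
For i = 0 To UBound(arrValues)
    arrValues(i) = CInt(arrValues(i))
Next

' Connect to the WMI registry provider on the target machine
strMoniker = "winMgmts:\\" & strMachineName & "\root\default:StdRegProv"
Set oReg = GetObject(strMoniker)
' Create the key if it is missing, then write the binary value
rv = oReg.CreateKey(HKEY_LOCAL_MACHINE, strPath)
rv = oReg.SetBinaryValue(HKEY_LOCAL_MACHINE, strPath, strKey, arrValues)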

If the script works as it should, you’ll be greeted by these events:

Event Type: Information
Event Source:   crypt32
Event Category: None
Event ID:   7
Description:
Successful auto update retrieval of third-party root list sequence number from: <http://www.download.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootseq.txt>
Event Type: Information
Event Source:   crypt32
Event Category: None
Event ID:   2
Description:
Successful auto update retrieval of third-party root list cab from: <http://www.download.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab>

And, hopefully, crypt32 errors will be gone for good.

  1. See Using the WinHTTP Proxy Configuration Utility
  2. Computer Configuration, Windows Settings, Scripts, Startup