2011
02.09

Today I ran into a weird issue while installing Oracle Grid Control Agent 10.2.0.3 on Linux. Right after I typed “runInstaller”, OUI crashed with a segmentation fault. Here are some of the troubleshooting steps you may need should you find yourself in similar trouble.

Here are the relevant details:

  • OS: Red Hat Enterprise Linux Server 5.3 x86-64
  • GC Agent: Oracle Enterprise Manager 10g Grid Control Release 3 (10.2.0.3) for Linux x86-64
  • GC Console: Oracle Enterprise Manager 10g Release 5 (10.2.0.5) Grid Control for Microsoft Windows 32-bit

And here’s the error message (the most interesting portions):

An unexpected exception has been detected in native code outside the VM.
Unexpected Signal : 11 occurred at PC=0xE44F46A7
Function=[Unknown.]
Library=(N/A)

[..]

Current Java thread:
        at sun.awt.motif.MToolkit.init(Native Method)
        at sun.awt.motif.MToolkit.<init>(Unknown Source)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)

[..]

Heap at VM Abort:
Heap
 def new generation   total 576K, used 84K [0xe6510000, 0xe65b0000, 0xe7090000)
  eden space 512K,   4% used [0xe6510000, 0xe65152f8, 0xe6590000)
  from space 64K, 100% used [0xe65a0000, 0xe65b0000, 0xe65b0000)
  to   space 64K,   0% used [0xe6590000, 0xe6590000, 0xe65a0000)
 tenured generation   total 6212K, used 4461K [0xe7090000, 0xe76a1000, 0xefb10000)
   the space 6212K,  71% used [0xe7090000, 0xe74eb5f8, 0xe74eb600, 0xe76a1000)
 compacting perm gen  total 5632K, used 5398K [0xefb10000, 0xf0090000, 0xf3b10000)
   the space 5632K,  95% used [0xefb10000, 0xf00558b0, 0xf0055a00, 0xf0090000)

Local Time = Tue Feb  8 09:45:48 2011
Elapsed Time = 1
#
# The exception above was detected in native code outside the VM
#
# Java VM: Java HotSpot(TM) Client VM (1.4.2_08-b03 mixed mode)
#

To get past this show-stopper I tried a few things…

The heap report produced by java at crash time seemed to indicate a memory shortage. By editing the “install/oraparam.ini” file, you can tweak how much RAM is available to OUI’s JVM. Just alter the “JRE_MEMORY_OPTIONS” value.

#JRE_MEMORY_OPTIONS=" -mx150m"
JRE_MEMORY_OPTIONS=" -Xms512m -Xmx2048m"

This is also a safe place to put additional command line parameters: they’ll mostly be passed to java’s command line. I say “mostly” because the OUI wrapper/launcher seems to check them against some sort of allowed-parameters list and may refuse to go on if something doesn’t look right.

“-XX:MaxPermSize=32m” is one of the knobs that doesn’t pass the sanity check. To run OUI’s JVM by hand, with exactly the parameters you want, save the first lines printed by runInstaller (the ones starting with “Arg:”):

Arg:0:/tmp/OraInstall2011-02-08_04-55-33PM/jre/1.4.2/bin/java:
Arg:1:-Doracle.installer.library_loc=/tmp/OraInstall2011-02-08_04-55-33PM/oui/lib/linux:
Arg:2:-Doracle.installer.oui_loc=/tmp/OraInstall2011-02-08_04-55-33PM/oui:
Arg:3:-Doracle.installer.bootstrap=TRUE:
[..]
Arg:20:-timestamp:
Arg:21:2011-02-08_04-55-33PM:
Arg:22:-nowelcome:

Strip the leading “Arg:” and the argument number (“^Arg:\d+:”), strip the trailing colon (“:$”), add a trailing “ \” to every line but the last, and you’ll have an OUI launching shell script you can alter at will.

Increasing the JVM’s memory had no effect. The heap report looked healthier (usage percentages went down), but the crash was still there.
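
The stripping rules above can be sketched as a small sed filter. This is only a sketch: the function name args_to_script is made up for illustration, and it assumes that no argument contains spaces or shell metacharacters (those would need quoting by hand).

```shell
# Turn runInstaller's "Arg:N:value:" lines into one backslash-continued command.
# Assumption: no value needs shell quoting (no spaces or metacharacters).
args_to_script() {
    sed -e 's/^Arg:[0-9][0-9]*://' \
        -e 's/:$//' \
        -e '$!s/$/ \\/'
}

# Demo on three of the lines shown above.
args_to_script <<'EOF'
Arg:3:-Doracle.installer.bootstrap=TRUE:
Arg:20:-timestamp:
Arg:22:-nowelcome:
EOF
```

Fed with the complete “Arg:” listing, the output starts with the path to java and can be saved as a script. Extra JVM switches then go on a new line right after the java path.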

Another useful switch is “-XX:+ShowMessageBoxOnError“. It makes java halt on error, allowing us to attach a debugger and perform a stack backtrace, e.g.:

Unexpected Signal: 11, PC: 0x6d4626a7, PID: 4866
An error has just occurred.
To debug, use 'gdb /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/bin/java 4866'; then switch to thread -136623920
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xf7e462b6 in nanosleep () from /lib/libc.so.6
#2  0xf7e460df in sleep () from /lib/libc.so.6
#3  0xf7bdc6d7 in os::message_box ()
   from /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/lib/i386/client/libjvm.so
#4  0xf7bd9c52 in os::handle_unexpected_exception ()
   from /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/lib/i386/client/libjvm.so
#5  0xf7bddbf6 in JVM_handle_linux_signal ()
   from /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/lib/i386/client/libjvm.so
#6  0xf7bdc9d8 in signalHandler ()
   from /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/lib/i386/client/libjvm.so
#7  <signal handler called>
#8  0x6d4626a7 in ?? ()
#9  0x6d6d75b9 in XtToolkitInitialize () from /usr/lib/libXt.so.6
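
Once the JVM is paused by the message box, the backtrace can also be captured non-interactively. A minimal sketch, assuming gdb is installed: the function name jvm_backtrace and the GDB override variable are illustrative only, not part of OUI or the JVM.

```shell
# Dump a backtrace of every thread of a paused JVM, then let gdb exit.
# $1 = path to the java binary, $2 = PID; both are printed in the
# "To debug, use 'gdb ...'" message. GDB may be overridden (hypothetical knob).
jvm_backtrace() {
    java_bin=$1
    pid=$2
    "${GDB:-gdb}" -batch -ex 'thread apply all bt' "$java_bin" "$pid"
}
```

With the values from the message above, that would be: jvm_backtrace /tmp/OraInstall2011-02-08_11-01-42AM/jre/1.4.2/bin/java 4866 > oui_backtrace.txt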

I also tried to “inject” a couple of newer JVMs into the stage directory. The quickest way is to borrow one from another installer.

[oracle@racnode01 orastage]$ find . -type d -name oracle.swd.jre -exec echo {} \; -exec ls {} \;
./Linux_x86_64_Grid_Control_full_102030/Disk1/stage/Components/oracle.swd.jre
1.4.2.8.0
./p6810189_10204_Linux-x86-64/Disk1/stage/Components/oracle.swd.jre
1.4.2.14.0

The server has a “working” directory where Oracle patches/products are stored before use. In my case, changing OUI’s JVM from 1.4.2.8 to 1.4.2.14 is a matter of copying:

./p6810189_10204_Linux-x86-64/Disk1/stage/Components/oracle.swd.jre/1.4.2.14.0

to:

./Linux_x86_64_Grid_Control_full_102030/Disk1/stage/Components/oracle.swd.jre

Then modify the same “oraparam.ini” file mentioned before:

#JRE_LOCATION=../stage/Components/oracle.swd.jre/1.4.2.8.0/1/DataFiles
JRE_LOCATION=../stage/Components/oracle.swd.jre/1.4.2.14.0/1/DataFiles
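
Put together, the copy and the oraparam.ini change might be scripted roughly as follows. This is a sketch, not a tested procedure: swap_oui_jre is a made-up name, all three paths are supplied by the caller, and it assumes oraparam.ini has one active JRE_LOCATION line pointing below ../stage/Components/oracle.swd.jre.

```shell
# Copy a JRE component directory into another installer's stage tree and
# point that installer's oraparam.ini at it. The old line is kept, commented out.
swap_oui_jre() {
    src_component=$1   # e.g. <patchset>/Disk1/stage/Components/oracle.swd.jre/1.4.2.14.0
    dst_jre_dir=$2     # e.g. <installer>/Disk1/stage/Components/oracle.swd.jre
    oraparam=$3        # path to the installer's oraparam.ini
    version=$(basename "$src_component")

    cp -r "$src_component" "$dst_jre_dir/" || return 1
    cp "$oraparam" "$oraparam.bak" || return 1   # keep a backup of the original

    awk -v ver="$version" '
        index($0, "JRE_LOCATION=") == 1 {
            print "#" $0
            print "JRE_LOCATION=../stage/Components/oracle.swd.jre/" ver "/1/DataFiles"
            next
        }
        { print }
    ' "$oraparam.bak" > "$oraparam"
}
```

Going back to the factory JRE is then just a matter of restoring oraparam.ini.bak.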

You could as well download a specific JRE from http://java.sun.com (sorry: from Oracle) and:

  • install the new JRE somewhere
  • unzip (-t) the “filegroup1.jar” file that corresponds to OUI’s “factory” JRE. Note how the directories are laid out (something like: “jre/1.4.2”). Modify the new JRE accordingly.
  • zip the new JRE, rename the resulting file to “filegroup1.jar”, copy it in the right place.
  • modify oraparam.ini and choose the JVM version you’ll boot OUI into.

This is what my stage directory looked like in the end:

[oracle@racnode01 oracle.swd.jre]$ pwd
/opt/orastage/Linux_x86_64_Grid_Control_full_102030/Disk1/stage/Components/oracle.swd.jre
[oracle@racnode01 oracle.swd.jre]$ find . -type f
./1.4.2.8.0/1/DataFiles/filegroup1.jar   # <-- factory
./1.4.2.8.0/1/DataFiles/filegroup2.jar
./1.4.2.8.0/1/DataFiles/filegroup3.jar
./1.4.2.8.0/1/DataFiles/filegroup4.jar
./1.4.2.8.0/1/DataFiles/filegroup5.jar
./1.4.2.14.0/1/DataFiles/filegroup1.jar  # <-- stolen from patchset p6810189
./1.4.2.14.0/1/DataFiles/filegroup2.jar
./1.4.2.14.0/1/DataFiles/filegroup3.jar
./1.4.2.14.0/1/DataFiles/filegroup4.jar
./1.4.2.14.0/1/DataFiles/filegroup5.jar
./1.4.2.19.0/1/DataFiles/filegroup1.jar  # <-- downloaded by hand

Three different JREs, each of them segfaulting in the same spot, as we saw in the backtrace:

#9  0x6d6d75b9 in XtToolkitInitialize () from /usr/lib/libXt.so.6

Who’s the owner of libXt?

[root@racnode01 ~]# rpm -q --queryformat '%{NAME}-%{VERSION}-%{RELEASE} %{ARCH}\n' -f /usr/lib/libXt.so.6
libXt-1.0.2-3.1.fc6 i386

After making sure that no running process was using the package’s contents, I decided to remove it (rpm -e --nodeps libXt-1.0.2-3.1.fc6.i386) and reinstall it. Surprisingly, OUI worked flawlessly after this last step. Too bad I can’t really explain why. 🙁 The libXt version was the same before and after the reinstall. I should still diff it against the untouched copies on the other RAC cluster members. I’ll update the post when I have a more precise explanation…
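
The comparison with the other cluster members could look something like this. A sketch only: compare_lib is a made-up helper, and racnode02 in the usage line is a hypothetical node name.

```shell
# Byte-compare two copies of a library and report the result.
# Returns 0 if they are identical, 1 otherwise.
compare_lib() {
    local_lib=$1
    other_lib=$2
    if cmp -s "$local_lib" "$other_lib"; then
        echo "identical: $local_lib $other_lib"
    else
        echo "DIFFERENT: $local_lib $other_lib"
        return 1
    fi
}
```

For example: scp racnode02:/usr/lib/libXt.so.6 /tmp/libXt.racnode02 && compare_lib /usr/lib/libXt.so.6 /tmp/libXt.racnode02. On a node that was left untouched, “rpm -V libXt” is another option: it lists any packaged file whose size or checksum no longer matches the RPM database.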
