MINOS Computing at the University of Minnesota



Principal

First you'll need a kerberos principal, in this case one from the Fermilab computing division. A kerberos user principal is the same as your unix username when you log in to a kerberized machine. Don't ask why it's called a principal instead of a username, nobody really knows. To get a Fermilab principal, you must first have a visitor ID. I won't go into the whole process behind this, as I've blocked it from my memory. For MINOS, follow this link for instructions on how to get a visitor ID and kerberos principal.

If you've been using a cryptocard for a while to attach to Fermilab systems, chances are that you have forgotten your password. Don't panic. You can have your password reset by the fine computing division folks with a phone call. All of this is described in the FNAL computing division kerberos user page.


Tickets

Once you have your principal ready to use, you can obtain and locally store a kerberos ticket on one of the UMN linux machines. The command to get a ticket is 'kinit', with 'klist' to display your locally stored ticket and 'kdestroy' to remove your locally stored ticket. The relevant options to using kinit on UMN linux are:

kinit [options] principal@REALM
  -l [lifetime] *** Specify the lifetime for the ticket (e.g. '-l 1d' for a ticket with a 1 day lifetime)
  -f *** Makes the ticket forwardable

To obtain a kerberos ticket, I will type "kinit -f -l 1d bspeak@FNAL.GOV" and enter in my kerberized password. After this, the klist command shows my ticket.

[bspeak@speakman ~]$ klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_7025)


Kerberos 4 ticket cache: /tmp/tkt7025
klist: You have no tickets cached
[bspeak@speakman ~]$ kinit -f -l 1d bspeak@FNAL.GOV
Password for bspeak@FNAL.GOV:
[bspeak@speakman ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_7025
Default principal: bspeak@FNAL.GOV

Valid starting     Expires            Service principal
10/26/05 14:54:31  10/27/05 14:54:26  krbtgt/FNAL.GOV@FNAL.GOV


Kerberos 4 ticket cache: /tmp/tkt7025
klist: You have no tickets cached
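
The check-then-acquire sequence shown in the transcript can be wrapped in a small helper. This is only a minimal sketch: the function name ensure_ticket is made up for illustration, and it assumes the MIT Kerberos "klist -s" option, which prints nothing and reports through its exit status whether a valid ticket is cached.

```shell
# ensure_ticket PRINCIPAL [LIFETIME]
# Hypothetical helper: request a forwardable ticket only when no valid
# ticket is already cached.
ensure_ticket() {
    principal=$1
    lifetime=${2:-1d}
    # "klist -s" is silent; exit status 0 means a valid ticket is cached.
    if klist -s 2>/dev/null; then
        echo "valid ticket already cached"
    else
        kinit -f -l "$lifetime" "$principal"
    fi
}

# Example (prompts for the kerberos password if a ticket is needed):
#   ensure_ticket bspeak@FNAL.GOV 1d
```

Calling this at the top of a script avoids a second password prompt when a ticket from earlier in the day is still valid.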

Once you have obtained tickets, you can then use them to log in to a kerberized machine with no other authentication. This can be done by a variety of means:

  • secure shell -- This is the recommended method of connecting with encryption to a kerberized machine. Use "ssh -1 [principal]@[machine]", ie "ssh -1 bspeak@minos01.fnal.gov" to login to the machine minos01.fnal.gov as user bspeak.
    If you are logged into a SL4 machine, the standard install of ssh doesn't authenticate krb5 tickets to the current configuration of fermi-linux machines. I've compiled an older version of openssh and placed it in /local/minos/bin/kssh. Read here for more details
  • rsh -- This connection is not encrypted, and thus not encouraged. Use "/usr/kerberos/bin/krsh -l [principal] [machine]", ie "/usr/kerberos/bin/krsh -l bspeak minos01.fnal.gov"
  • ftp -- An unencrypted way to transfer files back and forth with a kerberized machine. Use "/usr/kerberos/bin/ftp [machine]". This will ask for your username, and open a typical ftp session.
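
Since the right ssh client depends on which machine you are sitting at, a script can pick it at run time. The following is a sketch under assumptions: pick_ssh is a hypothetical name, and the KSSH variable simply defaults to the /local/minos/bin/kssh path mentioned in the secure shell item.

```shell
# pick_ssh: print the ssh client to use (hypothetical helper).
# Prefers the locally built kssh when it is installed and executable,
# and falls back to the system ssh otherwise.
pick_ssh() {
    kssh=${KSSH:-/local/minos/bin/kssh}
    if [ -x "$kssh" ]; then
        echo "$kssh"
    else
        echo ssh
    fi
}

# Example:
#   "$(pick_ssh)" -1 bspeak@minos01.fnal.gov
```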

aklog on SL41

2005/11/16

The command aklog will take existing kerberos tickets and convert them into afs tokens. I've always just cheated by using a FNAL compiled aklog on the SL3 machines, but this isn't an option for SL4. So, I'm going to try to compile aklog and keep it in the /local/minos area.

I first grabbed the openafs src rpm:

ftp://linux1.fnal.gov/linux/scientific/41/SRPMS/openafs-1.3.82-3.SL.src.rpm

This was unpacked with rpm -ivh [file] into ~/build/RPM by virtue of the file ~/.rpmmacros containing the line "%_topdir /home/hep/bspeak/build/RPM". I used the spec file ~/build/RPM/SPECS/openafs-1.3.82-3.SL.spec with the command rpmbuild -bc [file]. (This had to be done on minos-pc2, since minos-pc7 asks for the kernel-smp-devel packages: minos-pc2 is a single-processor machine, while minos-pc7 is a hyper-threaded P4.) This should just build the package, without making a binary rpm or attempting any install. The build completed, and I copied the file ~/build/RPM/BUILD/openafs-1.3.82/afs-krb5/src/aklog to /local/minos/bin/aklog.
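
For anyone repeating this, the steps can be written down as a script. This is a sketch rather than the exact session: the run helper, the build_aklog name, and the DRY_RUN switch are my own additions, and with DRY_RUN=1 (the default here) the commands are only printed, not executed.

```shell
# DRY_RUN=1 prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_aklog: hypothetical wrapper around the steps described above.
build_aklog() {
    topdir=$HOME/build/RPM
    srpm=openafs-1.3.82-3.SL.src.rpm
    # ~/.rpmmacros must already contain "%_topdir $topdir" so that the
    # source rpm unpacks under $topdir.
    run rpm -ivh "$srpm"
    # -bc stops after the %build stage: no binary rpm, no install.
    run rpmbuild -bc "$topdir/SPECS/openafs-1.3.82-3.SL.spec"
    run cp "$topdir/BUILD/openafs-1.3.82/afs-krb5/src/aklog" /local/minos/bin/aklog
}
```

Run build_aklog once with DRY_RUN=1 to check the paths, then with DRY_RUN=0 from the directory holding the source rpm.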

openssh on SL41

2005/11/16

When connecting via SSH to FNAL linux systems, no version of openssh past 3.6.1 will connect to a kerberized fermi-linux machine with krb5 ticket authentication. I've become rather detached from my cryptocard, and would like to remain so. Unfortunately, SL4 comes with openssh 3.9 (SL3 had 3.6.1), so I've decided to compile a version of openssh 3.6.1 on an SL4 machine, and make an executable /local/minos/bin/kssh.

I fetched the tar archive with wget from:

ftp://mirror.sg.depaul.edu/pub/OpenBSD/OpenSSH/portable/openssh-3.6.1p2.tar.gz

I unpacked this in ~/build/, and went into ~/build/openssh-3.6.1p2/. There I configured as:

./configure --with-kerberos=/usr/kerberos >& cfg.log
gmake >& bld.log

The ssh and scp binaries built in this directory were copied to /local/minos/bin as kssh and kscp, respectively.


Root Releases

Root Version Built for OS Products enabled Products enabled
SL4OSF1 SL4OSF1 SL4OSF1 SL4OSF1
HEAD nono yesnono --- --- 2004--
v2_25_03 yesnono yesnono --- --- 2004--
v3_05_06 yesnono yesnono --- --- 2004--
v3_05_07 yesnono yesnono --- --- 2004--
v3_10_02 yesnono yesnono --- --- 2004--
v4_00_03 yesnono yesnono --- --- 2004--
v4_00_04 yesnono yesnono --- --- 2004--
v4_00_08 yesnono yesnono --- --- 2004--
v4_00_08e yesnono yesnono --- --- 2004--
v4_01_02 yesnono yesnono --- --- 2004--
v4_01_04 yesnono yesnono --- --- 2004--
v4_02_00 yesyesno yesnono --- --- 2004--
v4_03_02 yesnono yesnono --- --- 2004--
v4_03_04 yesnono yesnono --- --- 2004--
v4_04_02 yesnono yesnono --- --- 2004--
v4_04_02d yesnono yesnono --- --- 2004--
v4_04_02f yesyesno yesnono --- --- 2004--
v5_04_00 yesyesno yesnono --- --- 2004--

Minossoft Releases

Release      Root Version   Last Build Date
                            SL3            SL4
development  HEAD           Nov 27         Nov 20
R1.20        v5_08_00       Dec 22         Dec 29
R1.18        v4_02_00       Sep 8          Oct 3
R1.18.0      v4_02_00       Sep 8          Oct 3
R1.18.1      v4_02_00       Sep 8          Oct 3
R1.18.2      v4_02_00       Sep 8          Oct 3
R1.16        v4_02_00       Sep 8          Oct 3
R1.15        v4_02_00       Sep 8, 2005    Dec 8, 2005
R1.14        v4_02_00       Sep 8          -
R1.13        v4_02_00       Sep 8          -
R1.12        v4_01_04       Sep 8          -
R1.11        v4_01_02       Sep 8          -
R1.10        v4_00_08e      Sep 8          -
R1.9         v4_00_08       Sep 8          -
R1.8         v4_00_04       Sep 8          -
R1.7         v4_00_03       Sep 8          -
R1.0         v3_05_07       Sep 8          -

Problems and Solutions

2005/11/12 -- root v5_06_00

Tried compiling root v5_06_00 on SL4 with:

root_build.sh -v -m -F -u -C -O "--enable-roofit" -O "--enable-minuit2" -z /local/ups/db -j 2 v5_06_00

The build dies in the middle of building the xrootd package, with the following message: (complete log file)

g++ -D_ALL_SOURCE -D_REENTRANT -D_GNU_SOURCE -fPIC -rdynamic -Wall -Wno-deprecated -D__linux__ ../../obj/XrdSectestClient.o -lnsl -lpthread -lrt -ldl -lc -L../../lib -lXrdSec -lXrdOuc -lXrdNet -o ../../bin/testclient
../../obj/XrdSectestClient.o(.text+0x46f): In function `main':
: undefined reference to `XrdSecGetProtocol'
collect2: ld returned 1 exit status
gmake[5]: *** [../../bin/testclient] Error 1

Since this might be a problem with a parallel build outrunning itself, I removed the '-j 2' option and ran the build as a single job. This fixed the problem. Caution should be taken in the future when compiling with multiple parallel jobs.
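
The same "try parallel, fall back to serial" approach can be scripted for any make-based build. This is a generic sketch, not something root_build.sh provides: make_with_fallback is a hypothetical wrapper, and it assumes the build tool accepts "-j N" the way gmake does.

```shell
# make_with_fallback TOOL JOBS [ARGS...]
# Hypothetical wrapper: try a parallel build first, then retry with a
# single job if the parallel attempt fails.
make_with_fallback() {
    tool=$1
    jobs=$2
    shift 2
    if "$tool" -j "$jobs" "$@"; then
        return 0
    fi
    echo "parallel build failed; retrying with one job" >&2
    "$tool" "$@"
}

# Example:
#   make_with_fallback gmake 2
```

A serial retry after a failed parallel run picks up whatever was already built, so it usually costs less than starting over.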


Which options should be enabled when compiling a given version of root? Good question, you can tell what has been done from the root prompt with "gROOT->GetConfigOptions()". The standard options have been changing progressively by version, so there is unfortunately no standard to follow. What I'll do for future generations is to start a table with versions and the added options, along with those that are placed by default in the root_build.sh script.

Still no movement on this issue, no time left for you.


2006/04/16 -- minossoft S06-04-15-R1-21

The initial checkout of this snapshot release was flaky, and as a result the release/include/RDBC link pointed to ../RDBC/include rather than ../RDBC/include/RDBC, as is necessary to find the appropriate headers for the DB packages. I manually changed the soft link and recompiled. This release will be built against root v5_08_00b, v5_10_00, and d2006-04-15.
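
The link repair amounts to one ln command. This is a minimal sketch, assuming GNU ln (for the -n option); fix_rdbc_link is a made-up name and the release directory is passed in as an argument.

```shell
# fix_rdbc_link RELEASE_DIR
# Hypothetical helper: repoint include/RDBC at ../RDBC/include/RDBC and
# print where the link now points.
fix_rdbc_link() {
    release=$1
    # -s symbolic, -f replace the existing link, -n treat the existing
    # link as a file instead of descending into the directory it points to.
    ln -sfn ../RDBC/include/RDBC "$release/include/RDBC"
    readlink "$release/include/RDBC"
}
```

Without -n, ln would follow the old link and create the new one inside the directory it points to, leaving the broken link in place.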


Neutrino Condor Configuration

Each of the neutrino machines has a local configuration file (/export/scratch/users/condor/condor_config.local) that controls how jobs are spooled to, and dealt with when running on, that machine.

CondorGroup             = "neutrino"
SUBMIT_EXPRS            = CondorGroup, $(SUBMIT_EXPRS)

BaseMachineBusy = (VirtualMachineID == 1 && \
                   (KeyboardIdle < 10 || ConsoleIdle < 10))
BaseMachineNotBusy = (VirtualMachineID != 1 || \
                      (KeyboardIdle > 120 && ConsoleIdle > 120))


WANT_SUSPEND            = TRUE
WANT_VACATE             = TRUE
START                   = $(CPUIdle) && (CondorGroup =?= "neutrino") && \
                          $(BaseMachineNotBusy)
SUSPEND                 = $(ActivationTimer) > 90 && \
                          (CpuBusyTime > 30 || $(BaseMachineBusy))
CONTINUE                = $(CPUIdle) && ($(ActivityTimer) > 30) && \
                          $(BaseMachineNotBusy)
PREEMPT                 = ( ((Activity == "Suspended") && \
                            ($(ActivityTimer) > 2 * $(ActivationTimer))) )
KILL                    = $(ActivityTimer) > 2 * $(ActivationTimer)
PERIODIC_CHECKPOINT     = FALSE
PREEMPTION_REQUIREMENTS = FALSE
PREEMPTION_RANK         = 0

What does this all mean? From the perspective of a Vanilla Universe job, there are four important values: START, SUSPEND, CONTINUE, and KILL. START determines the conditions under which a job will start. SUSPEND determines the conditions under which a job will be suspended. CONTINUE determines the conditions under which a suspended job will continue to run. KILL determines the conditions under which a suspended job will be killed and run on another machine.

The BaseMachineBusy and BaseMachineNotBusy macros describe the state of the first VirtualMachineID (i.e. the first processor on a multi-processor machine). If the keyboard or console has been used within the last 10 seconds, new jobs will not start on the first processor, and jobs currently running on the first processor will be suspended. When the user has been away from both the keyboard and the console for more than two minutes, suspended jobs on the first processor will resume, and new jobs may start there if requested.

In the configuration there are a total of six configurable values tuned to optimize the distributed usage of a desktop machine:

  • Keyboard Busy Time = 10 seconds
  • Keyboard Idle Time = 120 seconds
  • Minimum Activation Time for Suspend = 90 seconds
  • Minimum CPU Busy Time for Suspend = 30 seconds
  • Minimum Suspend -> Continue Time = 30 seconds
  • Maximum Suspend Time to Kill = 2 * Time Spent Running
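
To see how the 10 second and 120 second thresholds interact, the two BaseMachine macros can be restated as shell tests. This is only an illustration: Condor evaluates the real expressions itself, and the function names below are made up.

```shell
# Arguments for all three functions:
#   VirtualMachineID  KeyboardIdle(seconds)  ConsoleIdle(seconds)

# Mirrors BaseMachineBusy: first VM and recent keyboard or console use.
base_machine_busy() {
    [ "$1" -eq 1 ] && { [ "$2" -lt 10 ] || [ "$3" -lt 10 ]; }
}

# Mirrors BaseMachineNotBusy: any other VM, or both idle for over 120 s.
base_machine_not_busy() {
    [ "$1" -ne 1 ] || { [ "$2" -gt 120 ] && [ "$3" -gt 120 ]; }
}

# describe_state prints which of the two conditions holds, if either.
describe_state() {
    if base_machine_busy "$@"; then
        echo busy
    elif base_machine_not_busy "$@"; then
        echo not-busy
    else
        echo neither
    fi
}

# Example:
#   describe_state 1 60 60
```

With idle times between 10 and 120 seconds on the first processor, neither macro is true, so as far as these two macros are concerned a running job is left alone while no new job starts and no suspended job resumes.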

Submitting Jobs

The data files for the MINOS experiment are held on the FNAL tape robot, which can be accessed with ftp. A script has been written to aid in accessing these data files while working on the UMN systems.

get_data.sh

This script contains a few bash functions that can be used either from the command line of a bash shell or from a shell script if you are using tcsh instead of bash. Before using any of these functions, there are several bash variables that can be modified to alter the function behavior:

  • ROOT_DATA=/data/minos/root_data #Local data directory where files will be downloaded
  • AFSHOME=/afs/fnal.gov/files/home/room3/bspeak #User's AFS home area
  • AFSDATA=/afs/fnal.gov/files/data/minos #Area in AFS where data is kept
  • AFSINDEX=$AFSDATA/d10/indexes #Area in AFS where indices are kept
  • RENEWLISTINGS=false #If true, the directory listings are grabbed again even if they already exist.
  • RECURSE=false #If true, the get_data_dir function will get all subdirectories as well as the current directory
  • NLIMIT=-1 #Limit the number of files that will be grabbed (no limit if =-1)
  • USEAFS=false #Try to find the files in $AFSDATA with indices in $AFSINDEX, currently this is not working
  • MAILHOST=physics.umn.edu #Mail is sent to $USER@$MAILHOST in case of download failure
  • KSSH=/local/minos/bin/kssh #Point KSSH to the kerberized ssh client

The get_data.sh script contains the following primary functions:

  • get_data_dir "directoryname" "suffix" # This function called with a directory name as it appears in PNFS (/pnfs/minos/directory_name) will download that directory to $ROOT_DATA as defined above. Only files that end in suffix will be grabbed. In order for this to work, the current account must have valid krb5 tickets to get a directory listing.
    i.e. if I wanted to grab the month of 2007-10 for the far detector raw data, I would execute the following:
    #!/bin/bash
    . /data/minos/root_data/scripts/get_data.sh
    get_data_dir fardet_data/2007-10
    
  • get_data filename #This function grabs filename, given with the path name as it would appear without /pnfs/minos, and places it in the current directory
    i.e. If I wanted to get the file /pnfs/minos/neardet_data/2007-01/N00011669_0001.mdaq.root, I would do the following.
    #!/bin/bash
    . /data/minos/root_data/scripts/get_data.sh
    get_data neardet_data/2007-01/N00011669_0001.mdaq.root
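
Since the data directories are organized by month, several months can be fetched with a loop around get_data_dir. This is a sketch: fetch_months is a hypothetical helper, and it assumes get_data.sh has already been sourced and that valid krb5 tickets are in place.

```shell
# fetch_months DIR YEAR FIRST_MONTH LAST_MONTH   (hypothetical helper)
# Calls get_data_dir once per month directory, e.g. fardet_data/2007-10.
# Give the months without a leading zero (9, not 09).
fetch_months() {
    local fm_dir=$1 fm_year=$2 fm_month=$3 fm_last=$4
    while [ "$fm_month" -le "$fm_last" ]; do
        # %02d pads the month so the name matches the PNFS layout.
        get_data_dir "$(printf '%s/%s-%02d' "$fm_dir" "$fm_year" "$fm_month")" || return 1
        fm_month=$((fm_month + 1))
    done
}

# Example:
#   . /data/minos/root_data/scripts/get_data.sh
#   fetch_months fardet_data 2007 9 11
```

The loop stops at the first month that fails, so a partial download is easy to spot and resume.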
    

Here are a couple of example scripts that use the functions in get_data.sh:

example1.sh
example2.sh
example3.sh