overhead of ROOT I/O?

Pasha Murat (murat@cdfsga.fnal.gov)
Sun, 26 Oct 1997 05:20:18 GMT


Hello,

after experimenting a little with ROOT I/O, I ran into a problem which
might be of interest to everybody on this list. The full source code of
an example that reproduces it (including the makefile) is enclosed below.

What this example is doing:
--------------------------
I'm writing out 10000 events; each event (EVENT_AB) is a very simple object.
It contains 2 pointers to dynamically allocated objects (A and B);
each of A and B consists of 2 integers plus a TObject (see the header files below).
The event holds A and B through pointers so that events can be written
out in split mode.

I intentionally use a small buffer (2000 bytes), which is nevertheless much
larger than the event size, so the buffer size shouldn't be a factor in this exercise.

Next, I'm trying to estimate the expected size of the output file.
The size of each of A and B is 12 bytes (2*4+sizeof(TObject)=12), and
the size of the event header (EVENT_AB) is also 12 bytes:
4+4+sizeof(TObject) = 12 bytes (the last number may be even smaller).

With 10000 events written to it, the output file should be about
10000*(12+12+12) = 360000 bytes in size. ROOT, however, creates a file
of 712959 bytes, i.e. almost 2 times larger!

Here is what the output says:

/cdf/upgrade/tracking/murat/test_root>ab.exe
TFile Writing Name=ab.root Title=ab test
************************************************************************************
*Tree    :AB         : AB tree                                                     *
*Entries :     10000 : Total Size = 712644 bytes  File Size = 712644 bytes         *
*        :           : Tree compression factor = 1.00                              *
************************************************************************************
*Branch  :Event      : Event                                                       *
*Entries :     10000 : BranchObject (see below)                                    *
*..................................................................................*
*Branch  :fA         : fA                                                          *
*Entries :     10000 : Total Size = 290002 bytes  File Size = 290002 bytes         *
*Baskets :       166 : Basket Size = 2000 bytes  EvOffsetLen= 1000                 *
*        :           : Branch compression factor = 1.00                            *
*..................................................................................*
*Branch  :fB         : fB                                                          *
*Entries :     10000 : Total Size = 290002 bytes  File Size = 290002 bytes         *
*Baskets :       166 : Basket Size = 2000 bytes  EvOffsetLen= 1000                 *
*        :           : Branch compression factor = 1.00                            *
*..................................................................................*
*Branch  :fUniqueID  : fUniqueID                                                   *
*Entries :     10000 : Total Size =  39960 bytes  File Size =  39960 bytes         *
*Baskets :        20 : Basket Size = 2000 bytes  EvOffsetLen=    0                 *
*        :           : Branch compression factor = 1.00                            *
*..................................................................................*
*Branch  :fBits      : fBits                                                       *
*Entries :     10000 : Total Size =  39960 bytes  File Size =  39960 bytes         *
*Baskets :        20 : Basket Size = 2000 bytes  EvOffsetLen=    0                 *
*        :           : Branch compression factor = 1.00                            *
*..................................................................................*
/cdf/upgrade/tracking/murat/test_root>dir -l *.root
-rw-r--r-- 1 murat cdfupg 456479 Aug 29 11:19 Event.root
-rw-r--r-- 1 murat cdfupg 712959 Oct 25 23:36 ab.root

*********************

- my 1st observation is that the size of each of the fA and fB branches
(290002 bytes) is about 2.4 times larger than the 12*10000 = 120000 bytes
one would expect

- the size of the fBits and fUniqueID branches (constituents of TObject) is
about right: 4*10000 = 40000, and the branch size is 39960

- then there is some missing component which accounts for
712644-2*(290002+39960) = 52720 bytes (about 5 bytes per event)

The resulting question is: where is the overhead (a factor of 1.98) coming from?
It seems to be unacceptably large...

I'd appreciate any comments and hope that I did something wrong,

thanks, Pasha.

******************************************************
source code and makefile of the example which produced the output above
*****************************************************

------------------------------ A.hh
#ifndef __A_HH__
#define __A_HH__
#include "TObject.h"

class A: public TObject {
public:
  Int_t Word[2];

  A();
  virtual ~A();
  ClassDef(A,1)
};

#endif
----------------------------- B.hh
#ifndef __B_HH__
#define __B_HH__
#include "TObject.h"

class B: public TObject {
public:
  Int_t Word[2];

  B();
  virtual ~B();
  ClassDef(B,1)
};

#endif
----------------------------- EVENT_AB.hh
#ifndef __EVENT_AB_HH__
#define __EVENT_AB_HH__
#include "A.hh"
#include "B.hh"

class EVENT_AB : public TObject {
  A* fA;
  B* fB;
public:
  EVENT_AB();
  virtual ~EVENT_AB();
  void init();
  ClassDef(EVENT_AB,1)
};
#endif
------------------------------- A.cc
#include "A.hh"

ClassImp(A)

A::A() {
  Word[0] = 101;
  Word[1] = 202;
}

A::~A() {
}
-------------------------------- B.cc
#include "B.hh"

ClassImp(B)

B::B() {
  Word[0] = 101;
  Word[1] = 202;
}

B::~B() {
}
---------------------------------- EVENT_AB.cc
#include "EVENT_AB.hh"

ClassImp(EVENT_AB)

EVENT_AB::EVENT_AB() {
  fA = 0;
  fB = 0;
}

void EVENT_AB::init() {
  if (fA) delete fA;
  fA = new A();
  if (fB) delete fB;
  fB = new B();
}

EVENT_AB::~EVENT_AB() {
  if (fA) delete fA;
  if (fB) delete fB;
}
---------------------------------- test_write1.cc
// -*- Mode: C++ -*-
//------------------------------------------------------------------------------
// Oct 03 1997 P.Murat
//
// revision history :
// ------------------
// *0000 Oct 14 1997 P.Murat: creation date
//------------------------------------------------------------------------------
#ifdef __GNUG__
#pragma implementation
#endif

#include <stdlib.h>

#include "TROOT.h"
#include "TFile.h"
#include "TRandom.h"
#include "TTree.h"
#include "TBranch.h"
#include "TClonesArray.h"
#include "TStopwatch.h"

#include "EVENT_AB.hh"

EVENT_AB* Event;
TROOT* Root;
TFile* File;
TTree* Tree;
TBranch* BranchAB;

int main() {

  int split, bufsize;
  int comp = 0;

  Root = ::new TROOT("root","root");

  File = new TFile("ab.root","RECREATE","ab test");

  File->SetCompressionLevel(comp);
  Tree = new TTree("AB","AB tree");

  // autosave when 1 Mbyte written
  Tree->SetAutoSave(1000000);
  bufsize = 2000;
  split = 1;
  Event = new EVENT_AB();
  BranchAB = Tree->Branch("Event","EVENT_AB",&Event,bufsize,split);

  for (int i=0; i<10000; i++) {
    Event->init();
    Tree->Fill();
  }

  File->Write();
  Tree->Print();
  // tree should be deleted BEFORE
  // the file is closed
  delete Tree;
  delete Event;
  File->Close();
  delete File;
  delete Root;
}
------------------------------------------- Makefile
ROOTLIBS = -L$(ROOTSYS)/lib -lBase -lRint -lCint -lClib -lCont -lFunc -lGraf \
-lGraf3d -lHist -lHtml -lMeta -lMinuit -lNet -lPostscript \
-lProof -lTree -lUnix -lZip
ROOTGLIBS = -lGpad -lGX11 -lMotif -lWidgets -lX3d

# SGI with GCC
CXX = gcc
CXXFLAGS = -fsigned-char -fPIC -w -g -I$(ROOTSYS)/include
LD = gcc
LDFLAGS = -Wl,-u,__builtin_new -Wl,-u,__builtin_delete -Wl,-u,__nw__FUiPv
SOFLAGS = -Wl,-soname,libEvent.so -shared
LIBS = $(ROOTLIBS) -lg++ -lm -ldl
#GLIBS = $(ROOTLIBS) $(ROOTGLIBS) -lXm -lXt -lX11 -lg++ -lm -lPW -ldl

# don't need -ldl on IRIX 6.2

GLIBS = $(ROOTLIBS) $(ROOTGLIBS) -lXm -lXt -lX11 -lg++ -lm -lPW

w1:
	rootcint -f ab_cint.cc -c A.hh B.hh EVENT_AB.hh
	gcc -o ab.exe $(CPPFLAGS) $(CXXFLAGS) A.cc B.cc EVENT_AB.cc \
		ab_cint.cc test_write1.cc $(LDFLAGS) $(GLIBS)
********************************************************************************
********************************************************************************
/cdf/upgrade/tracking/murat/test_root>make w1
rootcint -f ab_cint.cc -c A.hh B.hh EVENT_AB.hh
Note: operator new() masked c
gcc -o ab.exe -fsigned-char -fPIC -w -g -I/cdf/upgrade/root/v1_03/include A.cc B.cc EVENT_AB.cc \
ab_cint.cc test_write1.cc -Wl,-u,__builtin_new -Wl,-u,__builtin_delete -Wl,-u,__nw__FUiPv -L/cdf/upgrade/root/v1_03/lib -lBase -lRint -lCint -lClib -lCont -lFunc -lGraf -lGraf3d -lHist -lHtml -lMeta -lMinuit -lNet -lPostscript -lProof -lTree -lUnix -lZip -lGpad -lGX11 -lMotif -lWidgets -lX3d -lXm -lXt -lX11 -lg++ -lm -lPW
10.251u 1.871s 0:18.02 67.2% 0+0k 344+36io 96pf+0w