[subexp-daq] Report of a possible bug of the CAEN_V560 module

Wed Feb 21 15:28:01 CET 2024

Dear Günter,

The most important thing is that you get reasonable values with these 
reads, the actual values don't mean a whole lot.

One of the manual reads that you did (ofs=0xfa) is what 'map_map' does 
for "poke reading". The macros

MAP_POKE_ARGS(fixed_code), or the older
MAP_POKE_ARGS(*v560->read, fixed_code)

tell 'map_map' what address offset to poke, and it depends on each module.

The next thing that happens in 'map_map' is the "poke writing". Could 
you try to write to the 'scale_clear' register next? That would be:

rwdump -a0x33333350 -w16,0

---

In case you would like to look deeper in 'map_map', you can find it in 
module/map/map.c around line-number 103. It's not a very complicated 
function that does the following:

-) Checks user-mapped memory, you don't need to worry about this, it's 
mainly for simulating module memory for tests.

-) Performs the poke-read.

-) Performs the poke-write.

-) If it's a BLT mapping, asks the platform-specific code to do that 
without further tests.

-) Otherwise times the poke registers many times to get an idea about 
the speed of every single-cycle access.

If you want to dig even deeper, you can look in 
module/map/map_xpc_3310.c which is what is used in the most recent Linux 
Rio4's. It's mainly a wrapper around a proprietary black-box library, so 
not scary and scary at the same time.

Best regards,
Hans

On 2024-02-21 14:32, Weber, Guenter Dr. wrote:
> Dear Hans,
> 
> 
> with the different register addresses it works.
> 
> 
> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fa -r16
> Address=0x333333fa
> Raw-read value=0xfaf5
> 
> 
> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fc -r16
> Address=0x333333fc
> Raw-read value=0x083a
> 
> 
> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fe -r16
> Address=0x333333fe
> Raw-read value=0x01bc
> 
> What can we learn from these numbers?
> 
> 
> 
> 
> Best greetings
> 
> Günter
> 
> 
> 
> ------------------------------------------------------------------------
> *Von:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
> *Gesendet:* Mittwoch, 21. Februar 2024 12:43:06
> *An:* Weber, Guenter Dr.
> *Betreff:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560 
> module
> Hmm, looks like address offset 0 is "not used", could you try 
> -a0x333333fa? Or fe and fc at the end,they should be some read-only 
> registers.
> 
> 
> "Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> skrev: (21 februari 2024 
> 12:06:00 CET)
> 
>     Different VME slot of the V560 module, same result. :-(
> 
>     ------------------------------------------------------------------------
>     *Von:* subexp-daq <subexp-daq-bounces at lists.chalmers.se> im Auftrag
>     von Weber, Guenter Dr. <g.weber at hi-jena.gsi.de>
>     *Gesendet:* Mittwoch, 21. Februar 2024 11:40:25
>     *An:* Hans Toshihide Törnqvist; Discuss use of Nurdlib, TRLO II,
>     drasi and UCESB.
>     *Betreff:* Re: [subexp-daq] Report of a possible bug of the
>     CAEN_V560 module
> 
>     Dear Hans,
> 
> 
>     the output from manual reading of the module indeed shows a problem:
> 
> 
>     RIO4-MCAL-1 mbsdaq > rwdump -a0x33333300 -r16
>     Address=0x33333300
>     Raw-read value=rwdump: line 28:   593 Bus error              
>     $PREFIX $f "$@"
> 
> 
>     The module was working with this address in the other DAQ system (as
>     we did not know the order of the individual switches, we set them
>     all to "3"). But I can take it our and put it in again at a
>     different slot, if maybe this particular slot has a hardware
>     problem. (But I never heard of such thing.)
> 
> 
> 
> 
>     Best greetings
> 
>     Günter
> 
> 
>     ------------------------------------------------------------------------
>     *Von:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
>     *Gesendet:* Mittwoch, 21. Februar 2024 11:14:44
>     *An:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber,
>     Guenter Dr.
>     *Betreff:* Re: [subexp-daq] Report of a possible bug of the
>     CAEN_V560 module
>     Dear Günter,
> 
>     map_map before mapping tries to read and write some given registers
>     with a "safe" but slower method of accessing registers, which is
>     called "poking" in nurdlib. Maybe the method of access on the rio4
>     you have is not safe enough and one of the two pokes fails horribly...
> 
>     Could you please double check the module address? Could you also try
>     using bin/rwdump to read any register in the v560 to see if it's
>     accessible at all and not a problem with the module implementation
>     in nurdlib?
> 
>     Something like bin/rwdump -a0x33333300 -r16
> 
>     Actually the address 0x33333300 looks weird to me, maybe it should
>     be 0x33330000?
>     Also for reading, try register offsets fa, fc, fe, with 16 bits
>     accesseses, they should have some interesting values.
> 
>     Cheers,
>     Hans
> 
> 
>     "Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> skrev: (21 februari
>     2024 10:18:29 CET)
> 
>         Dear Håkan,
> 
> 
>         thanks for the hint to flush and sleep. Indeed, I now see that
>         the crash happens in init_slow of V560 at this line:
> 
> 
>         v560->sicy_map=map_map(v560->address, MAP_SIZE, KW_NOBLT,
>         0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));
> 
> 
>         Maybe the code is accessing/writing into a memory location that
>         it should better not touch?
> 
>         This problematic line is then followed by:
> 
> 
>         id=MAP_READ(v560->sicy_map, fixed_code);
> 
>         The corresponding line in the V560 code on the system that was
>         running with this module looks like this:
> 
> 
>         v560->sicy_map=map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT,
>         0, 0, MAP_POKE_ARGS(*v560->read, fixed_code),
>         MAP_POKE_ARGS(*v560->write, scale_clear));
> 
>         And is followed by:
> 
> 
>              mapped_ptr =map_get_mapped_ptr(v560->sicy_map);
>         v560->read=mapped_ptr;
>         v560->write=mapped_ptr;
> 
>         Maybe you already have an idea what causes the problem here?
> 
> 
>         I will now go to the system that was running with V560 and make
>         a push of the NURDLIB.
> 
> 
> 
> 
>         Best greetings
> 
>         Günter
> 
> 
> 
>         ------------------------------------------------------------------------
>         *Von:* subexp-daq <subexp-daq-bounces at lists.chalmers.se> im
>         Auftrag von Håkan T Johansson <f96hajo at chalmers.se>
>         *Gesendet:* Dienstag, 20. Februar 2024 20:13:32
>         *An:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.
>         *Betreff:* Re: [subexp-daq] Report of a possible bug of the
>         CAEN_V560 module
> 
>         Dear Günter,
> 
>         I took the files you provided and for comparison put them in a
>         branch
>         'old_caen_v560'.
> 
>         git diff origin/old_caen_v560..origin/master
> 
>         however does not show anything which is suspicious to me. 
>         Perhaps Hans
>         can spot something.
> 
>         Otherwise, the only idea I can come up with is to continue to
>         bisect the
>         code inside slow init.
> 
>         However, before that, I would suggest to add
> 
>            fflush(stdout); sleep(1);
> 
>         after each printf statement, such that one can be quite sure
>         that the
>         printout is not eaten when the RIO crash happens.  I.e. that it
>         actually
>         had gotten further than shown by the prints.
> 
>         Best regards,
>         Håkan
> 
> 
> 
> 
>         On Tue, 20 Feb 2024, Weber, Guenter Dr. wrote:
> 
>         > 
>         > Dear friends,
>         > 
>         > 
>         > I now had a look at the system where the V560 was running. It was also setup
>         > by Bastian. And there the code for the V560 module is slightly different
>         > from the one included in the NURDLIB branch that I am using on the test
>         > system.
>         > 
>         > 
>         > Maybe you can have a look at it.
>         > 
>         > 
>         > I also could push the complete NURDLIB from this system, if this helps.
>         > 
>         > 
>         > 
>         > 
>         > Best greetings
>         > 
>         > Günter
>         > 
>         > 
>         > 
>         > 
>         > ____________________________________________________________________________
>         > Von: subexp-daq <subexp-daq-bounces at lists.chalmers.se> im Auftrag von Weber,
>         > Guenter Dr. <g.weber at hi-jena.gsi.de>
>         > Gesendet: Dienstag, 20. Februar 2024 10:58:27
>         > An: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
>         > Betreff: [subexp-daq] Report of a possible bug of the CAEN_V560 module  
>         > 
>         > Dear friends,
>         > 
>         > 
>         > I now grabbed a V560 module that was working fine in another DAQ system and
>         > put it into our test system.
>         > 
>         > 
>         > The main.cfg looks like this:
>         > 
>         > 
>         > log_level=spam # info, verbose, debug, spam
>         > 
>         > CRATE("MCAL") {
>         >     GSI_VULOM(0x03000000) {
>         >         timestamp = true # needed to get timestamps in the data output
>         >     #   ecl=0..15
>         >     }
>         >     BARRIER
>         >     CAEN_V560(0x333333300) {
>         >         use_veto = true
>         >     }  
>         > #   CAEN_V767A(0x03100000) {
>         > #   }
>         > }
>         > 
>         > Starting the DAQ now results in a freeze of the RIO4. A reset of the crate
>         > is necessary to talk to it again.
>         > 
>         > 
>         > The problem occurs in the first slow init of the V560 module. To find the
>         > exact line, I added some output to CRATE.C:
>         > 
>         > 
>         > 10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
>         > before push_log_level(module)
>         > before a_crate->module_init_id = module->id
>         > before module->props->init_slow(a_crate, module)
>         > LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
>         > before module_init_id_mark(a_crate, module)
>         > before pop_log_level(module)
>         > 10: crate/crate.c:923: .Slow-init module[1]=CAEN_V560.
>         > before push_log_level(module)
>         > before a_crate->module_init_id = module->id
>         > before module->props->init_slow(a_crate, module)
>         > 
>         > 
>         > The CRATE.C code now looks like this:
>         > 
>         > 
>         >     TAILQ_FOREACH(module, &a_crate->module_list, next) {
>         >         if (NULL == module->props) {
>         >             continue;
>         >         }
>         >         LOGF(info)(LOGL, "Slow-init module[%u]=%s.", module->id,
>         >             keyword_get_string(module->type));
>         >         printf("before push_log_level(module) \n");
>         >         push_log_level(module);
>         >         printf("before a_crate->module_init_id = module->id \n");
>         >         a_crate->module_init_id = module->id;
>         >         printf("before module->props->init_slow(a_crate, module) \n");
>         >         if (!module->props->init_slow(a_crate, module)) {
>         >             printf("before pop_log_level(module) \n");
>         >             pop_log_level(module);
>         >             printf("before goto crate_init_done \n");
>         >             goto crate_init_done;
>         >         }
>         >         printf("before module_init_id_mark(a_crate, module) \n");
>         >         module_init_id_mark(a_crate, module);
>         >         printf("before pop_log_level(module) \n");
>         >         pop_log_level(module);
>         >     }
>         > 
>         > Thus, to me it looks like the check "if (!module->props->init_slow(a_crate,
>         > module)) ..." is doing something quite horrible to the RIO4.
>         > 
>         > 
>         > This is unfortunate, because my original aim was to show that there is also
>         > a bug/mistake in readout_dt of the V560 module. But I did not come this far.
>         > 
>         > 
>         > Do you have any idea what might cause the freezing of the RIO4?
>         > 
>         > 
>         > 
>         > 
>         > Best greetings and many thanks
>         > 
>         > Günter
>         > 
>         > 
>         > 
>         > 
>         > 
>         >
> 
>