[subexp-daq] Report of a possible bug of the CAEN_V560 module

Wed Feb 21 16:46:25 CET 2024

Dear Günter,

I cannot see anything important having changed in the v560 code, and in 
any case the freeze happens inside map_map.

Aha, but I do see a bug in 'map_map'! In the switch statement around 
line 195, where the bit depth is chosen, 'map_sicy_write' writes to 
'poke_r_ofs'; it must be 'poke_w_ofs'. Please try that. (Says a lot 
about this piece of code... Cleanup action added to the todo.)
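
To make the mix-up concrete, here is a minimal sketch of that kind of
switch. It is not the actual nurdlib code: the struct, the simulated byte
buffer and the function names (PokeArgs, sim_write, poke_write) are made
up for illustration. Only the names 'poke_r_ofs' and 'poke_w_ofs' and the
offsets 0xfa and 0x50 come from this thread. The point is just that every
branch of the width switch has to target the write offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the poke description handed to map_map. */
struct PokeArgs {
    size_t   poke_r_ofs;   /* offset used for the poke-read */
    size_t   poke_w_ofs;   /* offset used for the poke-write */
    unsigned poke_w_bits;  /* width of the poke-write: 16 or 32 */
    uint32_t poke_w_value; /* value to write */
};

/* Simulated single-cycle write; a byte buffer plays the module. */
static void sim_write(uint8_t *mem, size_t ofs, unsigned bits, uint32_t value)
{
    assert(16 == bits || 32 == bits);
    if (16 == bits) {
        uint16_t v16 = (uint16_t)value;
        memcpy(mem + ofs, &v16, sizeof v16);
    } else {
        memcpy(mem + ofs, &value, sizeof value);
    }
}

/* Poke-write: the switch picks the width, and each supported branch
 * writes to poke_w_ofs. Writing to poke_r_ofs here would be the bug. */
static int poke_write(uint8_t *mem, const struct PokeArgs *a)
{
    switch (a->poke_w_bits) {
    case 16:
    case 32:
        sim_write(mem, a->poke_w_ofs, a->poke_w_bits, a->poke_w_value);
        return 1;
    default:
        return 0; /* unsupported width, nothing written */
    }
}
```

With poke_r_ofs=0xfa and poke_w_ofs=0x50, only the bytes at 0x50 should
change; if the bytes at 0xfa change instead, the offsets are swapped.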

Best regards,
Hans

On 2024-02-21 16:14, Weber, Guenter Dr. wrote:
> Dear Hans,
> 
> 
> writing into the register works fine (I tried it several times):
> 
> 
> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
> Address=0x33333350
> Raw-write done.
> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
> Address=0x33333350
> Raw-write done.
> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
> Address=0x33333350
> Raw-write done.
> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
> Address=0x33333350
> Raw-write done.
> 
> I could now litter map_map with printf() outputs to see where execution of
> 
> v560->sicy_map=map_map(v560->address, MAP_SIZE, KW_NOBLT,
> 0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));
> 
> is failing. Should I proceed this way? Or is there anything else that I 
> could check?
> 
> (As I understand it, the slightly different implementation of V560 on our 
> running system is not indicative of a specific issue, but is just due to 
> the fact that it is a deprecated version of NURDLIB. Right?)
> 
> 
> 
> Best greetings
> Günter
> 
> 
> ------------------------------------------------------------------------
> *From:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
> *Sent:* Wednesday, 21 February 2024 15:28:01
> *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber, Guenter Dr.
> *Subject:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560 
> module
> Dear Günter,
> 
> The most important thing is that you get reasonable values with these
> reads, the actual values don't mean a whole lot.
> 
> One of the manual reads that you did (ofs=0xfa) is what 'map_map' does
> for "poke reading". The macros
> 
> MAP_POKE_ARGS(fixed_code), or the older
> MAP_POKE_ARGS(*v560->read, fixed_code)
> 
> tell 'map_map' what address offset to poke, and it depends on each module.
> 
> The next thing that happens in 'map_map' is the "poke writing". Could
> you try to write to the 'scale_clear' register next? That would be:
> 
> rwdump -a0x33333350 -w16,0
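
As a small sketch of where these addresses come from: the register
address is simply the module base plus the register offset. The base
0x33333300 and the offsets 0xfa (fixed_code) and 0x50 (scale_clear) are
the ones used in this thread; the macro and function names below are
invented for the example, and the rwdump options are only those already
shown above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Values taken from the rwdump commands in this thread. */
#define V560_BASE            0x33333300u
#define V560_OFS_SCALE_CLEAR 0x50u
#define V560_OFS_FIXED_CODE  0xfau

/* Address of a 16-bit register; 16-bit accesses need an even offset. */
static uint32_t reg_addr16(uint32_t base, uint32_t ofs)
{
    assert(0 == (ofs & 1u));
    return base + ofs;
}

/* Builds the rwdump command line for a 16-bit read of base+ofs. */
static int format_rwdump_read16(char *buf, size_t len, uint32_t base,
    uint32_t ofs)
{
    return snprintf(buf, len, "rwdump -a0x%08lx -r16",
        (unsigned long)reg_addr16(base, ofs));
}

/* Builds the rwdump command line for a 16-bit write of value. */
static int format_rwdump_write16(char *buf, size_t len, uint32_t base,
    uint32_t ofs, unsigned value)
{
    return snprintf(buf, len, "rwdump -a0x%08lx -w16,%u",
        (unsigned long)reg_addr16(base, ofs), value);
}
```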
> 
> ---
> 
> In case you would like to look deeper in 'map_map', you can find it in
> module/map/map.c around line-number 103. It's not a very complicated
> function that does the following:
> 
> -) Checks user-mapped memory, you don't need to worry about this, it's
> mainly for simulating module memory for tests.
> 
> -) Performs the poke-read.
> 
> -) Performs the poke-write.
> 
> -) If it's a BLT mapping, asks the platform-specific code to do that
> without further tests.
> 
> -) Otherwise accesses the poke registers many times and times this, to
> get an idea about the speed of each single-cycle access.
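
The steps above can be sketched roughly as follows. This is only an
illustration against a simulated byte buffer, not the real 'map_map':
all names (map_sketch, SketchMode, SketchResult, sim_read16, sim_write16)
are hypothetical, the user-mapped memory check is left out, the written
value 0 is the one from the rwdump test, and the timing uses plain
clock() instead of whatever the library actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define SIM_SIZE 0x100u /* size of the simulated register window */

enum SketchMode { SKETCH_SICY, SKETCH_BLT };

struct SketchResult {
    uint16_t poke_read_value; /* value seen by the poke-read */
    int      did_timing;      /* 1 if the timing loop ran */
    double   ns_per_access;   /* rough average per access */
};

static uint16_t sim_read16(const uint8_t *mem, size_t ofs)
{
    uint16_t v;
    assert(ofs + sizeof v <= SIM_SIZE);
    memcpy(&v, mem + ofs, sizeof v);
    return v;
}

static void sim_write16(uint8_t *mem, size_t ofs, uint16_t v)
{
    assert(ofs + sizeof v <= SIM_SIZE);
    memcpy(mem + ofs, &v, sizeof v);
}

static struct SketchResult
map_sketch(uint8_t *mem, enum SketchMode mode, size_t r_ofs, size_t w_ofs)
{
    struct SketchResult res = {0, 0, 0.0};
    enum { N_ACCESS = 1000 };
    volatile uint16_t sink = 0;
    clock_t t0;
    int i;

    /* Poke-read: one careful read of the given read offset. */
    res.poke_read_value = sim_read16(mem, r_ofs);
    /* Poke-write: one careful write to the given write offset. */
    sim_write16(mem, w_ofs, 0);
    /* BLT mapping: handed to platform code, no further tests here. */
    if (SKETCH_BLT == mode) {
        return res;
    }
    /* Single-cycle mapping: repeat the access to estimate its speed. */
    t0 = clock();
    for (i = 0; i < N_ACCESS; ++i) {
        sink = sim_read16(mem, r_ofs);
    }
    (void)sink;
    res.ns_per_access =
        1e9 * (double)(clock() - t0) / CLOCKS_PER_SEC / N_ACCESS;
    res.did_timing = 1;
    return res;
}
```

On real hardware a freeze during the first two steps points at the pokes
themselves, which is why testing the same offsets by hand with rwdump is
a useful cross-check.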
> 
> If you want to dig even deeper, you can look in
> module/map/map_xpc_3310.c which is what is used in the most recent Linux
> Rio4's. It's mainly a wrapper around a proprietary black-box library, so
> not scary and scary at the same time.
> 
> Best regards,
> Hans
> 
> On 2024-02-21 14:32, Weber, Guenter Dr. wrote:
>> Dear Hans,
>> 
>> 
>> with the different register addresses it works.
>> 
>> 
>> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fa -r16
>> Address=0x333333fa
>> Raw-read value=0xfaf5
>> 
>> 
>> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fc -r16
>> Address=0x333333fc
>> Raw-read value=0x083a
>> 
>> 
>> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fe -r16
>> Address=0x333333fe
>> Raw-read value=0x01bc
>> 
>> What can we learn from these numbers?
>> 
>> 
>> 
>> 
>> Best greetings
>> 
>> Günter
>> 
>> 
>> 
>> ------------------------------------------------------------------------
>> *From:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
>> *Sent:* Wednesday, 21 February 2024 12:43:06
>> *To:* Weber, Guenter Dr.
>> *Subject:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560 
>> module
>> Hmm, looks like address offset 0 is "not used", could you try 
>> -a0x333333fa? Or fe and fc at the end, they should be some read-only 
>> registers.
>> 
>> 
>> "Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> wrote: (21 February 2024 
>> 12:06:00 CET)
>> 
>>     Different VME slot of the V560 module, same result. :-(
>> 
>>     ------------------------------------------------------------------------
>>     *From:* subexp-daq <subexp-daq-bounces at lists.chalmers.se> on behalf
>>     of Weber, Guenter Dr. <g.weber at hi-jena.gsi.de>
>>     *Sent:* Wednesday, 21 February 2024 11:40:25
>>     *To:* Hans Toshihide Törnqvist; Discuss use of Nurdlib, TRLO II,
>>     drasi and UCESB.
>>     *Subject:* Re: [subexp-daq] Report of a possible bug of the
>>     CAEN_V560 module
>> 
>>     Dear Hans,
>> 
>> 
>>     the output from manual reading of the module indeed shows a problem:
>> 
>> 
>>     RIO4-MCAL-1 mbsdaq > rwdump -a0x33333300 -r16
>>     Address=0x33333300
>>     Raw-read value=rwdump: line 28:   593 Bus error              
>>     $PREFIX $f "$@"
>> 
>> 
>>     The module was working with this address in the other DAQ system (as
>>     we did not know the order of the individual switches, we set them
>>     all to "3"). But I can take it out and put it in again at a
>>     different slot, in case this particular slot has a hardware
>>     problem. (But I have never heard of such a thing.)
>> 
>> 
>> 
>> 
>>     Best greetings
>> 
>>     Günter
>> 
>> 
>>     ------------------------------------------------------------------------
>>     *From:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
>>     *Sent:* Wednesday, 21 February 2024 11:14:44
>>     *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber,
>>     Guenter Dr.
>>     *Subject:* Re: [subexp-daq] Report of a possible bug of the
>>     CAEN_V560 module
>>     Dear Günter,
>> 
>>     Before mapping, map_map tries to read and write some given registers
>>     with a "safe" but slower method of accessing registers, which is
>>     called "poking" in nurdlib. Maybe the method of access on the rio4
>>     you have is not safe enough and one of the two pokes fails horribly...
>> 
>>     Could you please double check the module address? Could you also try
>>     using bin/rwdump to read any register in the v560 to see if it's
>>     accessible at all and not a problem with the module implementation
>>     in nurdlib?
>> 
>>     Something like bin/rwdump -a0x33333300 -r16
>> 
>>     Actually the address 0x33333300 looks weird to me, maybe it should
>>     be 0x33330000?
>>     Also for reading, try register offsets fa, fc, fe, with 16-bit
>>     accesses, they should have some interesting values.
>> 
>>     Cheers,
>>     Hans
>> 
>> 
>>     "Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> wrote: (21 February
>>     2024 10:18:29 CET)
>> 
>>         Dear Håkan,
>> 
>> 
>>         thanks for the hint to flush and sleep. Indeed, I now see that
>>         the crash happens in init_slow of V560 at this line:
>> 
>> 
>>         v560->sicy_map=map_map(v560->address, MAP_SIZE, KW_NOBLT,
>>         0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));
>> 
>> 
>>         Maybe the code is accessing/writing into a memory location that
>>         it should better not touch?
>> 
>>         This problematic line is then followed by:
>> 
>> 
>>         id=MAP_READ(v560->sicy_map, fixed_code);
>> 
>>         The corresponding line in the V560 code on the system that was
>>         running with this module looks like this:
>> 
>> 
>>         v560->sicy_map=map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT,
>>         0, 0, MAP_POKE_ARGS(*v560->read, fixed_code),
>>         MAP_POKE_ARGS(*v560->write, scale_clear));
>> 
>>         And is followed by:
>> 
>> 
>>         mapped_ptr = map_get_mapped_ptr(v560->sicy_map);
>>         v560->read=mapped_ptr;
>>         v560->write=mapped_ptr;
>> 
>>         Maybe you already have an idea what causes the problem here?
>> 
>> 
>>         I will now go to the system that was running with V560 and make
>>         a push of the NURDLIB.
>> 
>> 
>> 
>> 
>>         Best greetings
>> 
>>         Günter
>> 
>> 
>> 
>>         ------------------------------------------------------------------------
>>         *From:* subexp-daq <subexp-daq-bounces at lists.chalmers.se> on
>>         behalf of Håkan T Johansson <f96hajo at chalmers.se>
>>         *Sent:* Tuesday, 20 February 2024 20:13:32
>>         *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.
>>         *Subject:* Re: [subexp-daq] Report of a possible bug of the
>>         CAEN_V560 module
>> 
>>         Dear Günter,
>> 
>>         I took the files you provided and for comparison put them in a
>>         branch
>>         'old_caen_v560'.
>> 
>>         git diff origin/old_caen_v560..origin/master
>> 
>>         however does not show anything which is suspicious to me.
>>         Perhaps Hans can spot something.
>> 
>>         Otherwise, the only idea I can come up with is to continue to
>>         bisect the code inside slow init.
>> 
>>         However, before that, I would suggest adding
>> 
>>            fflush(stdout); sleep(1);
>> 
>>         after each printf statement, so that one can be quite sure
>>         that the printout is not eaten when the RIO crash happens.
>>         Otherwise execution may actually have gotten further than
>>         shown by the prints.
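
A minimal sketch of that suggestion, wrapped so it does not have to be
repeated by hand after every printf. The names TRACE, trace_here,
g_trace_delay_s and g_trace_count are made up for this example; the
delay is adjustable only so that the helper can be tried without
waiting, and sleep() assumes a POSIX system.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h> /* sleep(), POSIX */

/* Seconds to wait after each trace line; 1 matches the suggestion. */
static unsigned g_trace_delay_s = 1;
/* Number of trace lines emitted so far. */
static unsigned g_trace_count = 0;

/* Prints one trace line, flushes it, then waits so the text has time
 * to reach the terminal before a possible crash. */
static void trace_here(const char *file, int line, const char *msg)
{
    assert(NULL != msg);
    printf("%s:%d: %s\n", file, line, msg);
    fflush(stdout);
    if (g_trace_delay_s > 0) {
        sleep(g_trace_delay_s);
    }
    ++g_trace_count;
}

#define TRACE(msg) trace_here(__FILE__, __LINE__, (msg))
```

Usage would be TRACE("before map_map"); directly in front of the
suspicious call and TRACE("after map_map"); directly behind it. If only
the first line shows up, the freeze is inside the call.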
>> 
>>         Best regards,
>>         Håkan
>> 
>> 
>> 
>> 
>>         On Tue, 20 Feb 2024, Weber, Guenter Dr. wrote:
>> 
>>         > 
>>         > Dear friends,
>>         > 
>>         > 
>>         > I now had a look at the system where the V560 was running. It was also set up
>>         > by Bastian. And there the code for the V560 module is slightly different
>>         > from the one included in the NURDLIB branch that I am using on the test
>>         > system.
>>         > 
>>         > 
>>         > Maybe you can have a look at it.
>>         > 
>>         > 
>>         > I also could push the complete NURDLIB from this system, if this helps.
>>         > 
>>         > 
>>         > 
>>         > 
>>         > Best greetings
>>         > 
>>         > Günter
>>         > 
>>         > 
>>         > 
>>         > 
>>         > ____________________________________________________________________________
>>         > From: subexp-daq <subexp-daq-bounces at lists.chalmers.se> on behalf of Weber,
>>         > Guenter Dr. <g.weber at hi-jena.gsi.de>
>>         > Sent: Tuesday, 20 February 2024 10:58:27
>>         > To: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
>>         > Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module 
>>         > 
>>         > Dear friends,
>>         > 
>>         > 
>>         > I now grabbed a V560 module that was working fine in another DAQ system and
>>         > put it into our test system.
>>         > 
>>         > 
>>         > The main.cfg looks like this:
>>         > 
>>         > 
>>         > log_level=spam # info, verbose, debug, spam
>>         > 
>>         > CRATE("MCAL") {
>>         >     GSI_VULOM(0x03000000) {
>>         >         timestamp = true # needed to get timestamps in the data output
>>         >     #   ecl=0..15
>>         >     }
>>         >     BARRIER
>>         >     CAEN_V560(0x333333300) {
>>         >         use_veto = true
>>         >     }  
>>         > #   CAEN_V767A(0x03100000) {
>>         > #   }
>>         > }
>>         > 
>>         > Starting the DAQ now results in a freeze of the RIO4. A reset of the crate
>>         > is necessary to talk to it again.
>>         > 
>>         > 
>>         > The problem occurs in the first slow init of the V560 module. To find the
>>         > exact line, I added some output to CRATE.C:
>>         > 
>>         > 
>>         > 10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
>>         > before push_log_level(module)
>>         > before a_crate->module_init_id = module->id
>>         > before module->props->init_slow(a_crate, module)
>>         > LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
>>         > before module_init_id_mark(a_crate, module)
>>         > before pop_log_level(module)
>>         > 10: crate/crate.c:923: .Slow-init module[1]=CAEN_V560.
>>         > before push_log_level(module)
>>         > before a_crate->module_init_id = module->id
>>         > before module->props->init_slow(a_crate, module)
>>         > 
>>         > 
>>         > The CRATE.C code now looks like this:
>>         > 
>>         > 
>>         >     TAILQ_FOREACH(module, &a_crate->module_list, next) {
>>         >         if (NULL == module->props) {
>>         >             continue;
>>         >         }
>>         >         LOGF(info)(LOGL, "Slow-init module[%u]=%s.", module->id,
>>         >             keyword_get_string(module->type));
>>         >         printf("before push_log_level(module) \n");
>>         >         push_log_level(module);
>>         >         printf("before a_crate->module_init_id = module->id \n");
>>         >         a_crate->module_init_id = module->id;
>>         >         printf("before module->props->init_slow(a_crate, module) \n");
>>         >         if (!module->props->init_slow(a_crate, module)) {
>>         >             printf("before pop_log_level(module) \n");
>>         >             pop_log_level(module);
>>         >             printf("before goto crate_init_done \n");
>>         >             goto crate_init_done;
>>         >         }
>>         >         printf("before module_init_id_mark(a_crate, module) \n");
>>         >         module_init_id_mark(a_crate, module);
>>         >         printf("before pop_log_level(module) \n");
>>         >         pop_log_level(module);
>>         >     }
>>         > 
>>         > Thus, to me it looks like the check "if (!module->props->init_slow(a_crate,
>>         > module)) ..." is doing something quite horrible to the RIO4.
>>         > 
>>         > 
>>         > This is unfortunate, because my original aim was to show that there is also
>>         > a bug/mistake in readout_dt of the V560 module. But I did not get that far.
>>         > 
>>         > 
>>         > Do you have any idea what might cause the freezing of the RIO4?
>>         > 
>>         > 
>>         > 
>>         > 
>>         > Best greetings and many thanks
>>         > 
>>         > Günter
>>         > 
>>         > 
>>         > 
>>         > 
>>         > 
>>         >
>> 
>> 
>