[subexp-daq] Report of a possible bug of the CAEN_V560 module
Hans Toshihide Törnqvist
hans.tornqvist at chalmers.se
Wed Feb 21 16:46:25 CET 2024
Dear Günter,
I cannot see anything important having changed in the v560 code, and in
any case the freeze happens inside map_map.
Aha, but I do see a bug in 'map_map'! Starting with the switch statement
around line 195, where the bit depth is chosen, 'map_sicy_write' writes
to 'poke_r_ofs' but it must be 'poke_w_ofs'. Please try that. (Says a lot
about this piece of code... A cleanup action goes on the todo list.)
Best regards,
Hans
On 2024-02-21 16:14, Weber, Guenter Dr. wrote:
> Dear Hans,
>
>
> writing into the register works fine (I tried it several times):
>
>
> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
> Address=0x33333350
> Raw-write done.
> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
> Address=0x33333350
> Raw-write done.
> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
> Address=0x33333350
> Raw-write done.
> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
> Address=0x33333350
> Raw-write done.
>
> I could now litter map_map with printf() outputs to see where execution of
>
> v560->sicy_map=map_map(v560->address, MAP_SIZE, KW_NOBLT,
> 0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));
>
> is failing. Should I proceed this way? Or is there anything else that I
> could check?
>
> (As I understand it, the slightly different implementation of V560 on our
> running system is not indicative of a specific issue, but just due to
> the fact that this is a deprecated version of NURDLIB. Right?)
>
>
>
> Best greetings
> Günter
>
>
> ------------------------------------------------------------------------
> *From:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
> *Sent:* Wednesday, 21 February 2024 15:28:01
> *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber, Guenter Dr.
> *Subject:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560
> module
> Dear Günter,
>
> The most important thing is that you get reasonable values with these
> reads, the actual values don't mean a whole lot.
>
> One of the manual reads that you did (ofs=0xfa) is what 'map_map' does
> for "poke reading". The macros
>
> MAP_POKE_ARGS(fixed_code), or the older
> MAP_POKE_ARGS(*v560->read, fixed_code)
>
> tell 'map_map' what address offset to poke, and it depends on each module.
>
> The next thing that happens in 'map_map' is the "poke writing". Could
> you try to write to the 'scale_clear' register next? That would be:
>
> rwdump -a0x33333350 -w16,0
>
> ---
>
> In case you would like to look deeper in 'map_map', you can find it in
> module/map/map.c around line-number 103. It's not a very complicated
> function that does the following:
>
> -) Checks user-mapped memory, you don't need to worry about this, it's
> mainly for simulating module memory for tests.
>
> -) Performs the poke-read.
>
> -) Performs the poke-write.
>
> -) If it's a BLT mapping, asks the platform-specific code to do that
> without further tests.
>
> -) Otherwise times the poke registers many times to get an idea about
> the speed of every single-cycle access.
>
> If you want to dig even deeper, you can look in
> module/map/map_xpc_3310.c which is what is used in the most recent Linux
> Rio4's. It's mainly a wrapper around a proprietary black-box library, so
> not scary and scary at the same time.
>
> Best regards,
> Hans
>
> On 2024-02-21 14:32, Weber, Guenter Dr. wrote:
>> Dear Hans,
>>
>>
>> with the different register addresses it works.
>>
>>
>> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fa -r16
>> Address=0x333333fa
>> Raw-read value=0xfaf5
>>
>>
>> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fc -r16
>> Address=0x333333fc
>> Raw-read value=0x083a
>>
>>
>> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fe -r16
>> Address=0x333333fe
>> Raw-read value=0x01bc
>>
>> What can we learn from these numbers?
>>
>>
>>
>>
>> Best greetings
>>
>> Günter
>>
>>
>>
>> ------------------------------------------------------------------------
>> *From:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
>> *Sent:* Wednesday, 21 February 2024 12:43:06
>> *To:* Weber, Guenter Dr.
>> *Subject:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560
>> module
>> Hmm, looks like address offset 0 is "not used", could you try
>> -a0x333333fa? Or fe and fc at the end, they should be some read-only
>> registers.
>>
>>
>> "Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> wrote: (21 February 2024
>> 12:06:00 CET)
>>
>> Different VME slot of the V560 module, same result. :-(
>>
>> ------------------------------------------------------------------------
>> *From:* subexp-daq <subexp-daq-bounces at lists.chalmers.se> on behalf
>> of Weber, Guenter Dr. <g.weber at hi-jena.gsi.de>
>> *Sent:* Wednesday, 21 February 2024 11:40:25
>> *To:* Hans Toshihide Törnqvist; Discuss use of Nurdlib, TRLO II,
>> drasi and UCESB.
>> *Subject:* Re: [subexp-daq] Report of a possible bug of the
>> CAEN_V560 module
>>
>> Dear Hans,
>>
>>
>> the output from manual reading of the module indeed shows a problem:
>>
>>
>> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333300 -r16
>> Address=0x33333300
>> Raw-read value=rwdump: line 28: 593 Bus error
>> $PREFIX $f "$@"
>>
>>
>> The module was working with this address in the other DAQ system (as
>> we did not know the order of the individual switches, we set them
>> all to "3"). But I can take it out and put it in again at a
>> different slot, if maybe this particular slot has a hardware
>> problem. (But I never heard of such thing.)
>>
>>
>>
>>
>> Best greetings
>>
>> Günter
>>
>>
>> ------------------------------------------------------------------------
>> *From:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
>> *Sent:* Wednesday, 21 February 2024 11:14:44
>> *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber,
>> Guenter Dr.
>> *Subject:* Re: [subexp-daq] Report of a possible bug of the
>> CAEN_V560 module
>> Dear Günter,
>>
>> map_map before mapping tries to read and write some given registers
>> with a "safe" but slower method of accessing registers, which is
>> called "poking" in nurdlib. Maybe the method of access on the rio4
>> you have is not safe enough and one of the two pokes fails horribly...
>>
>> Could you please double check the module address? Could you also try
>> using bin/rwdump to read any register in the v560 to see if it's
>> accessible at all and not a problem with the module implementation
>> in nurdlib?
>>
>> Something like bin/rwdump -a0x33333300 -r16
>>
>> Actually the address 0x33333300 looks weird to me, maybe it should
>> be 0x33330000?
>> Also for reading, try register offsets fa, fc, fe, with 16-bit
>> accesses, they should have some interesting values.
>>
>> Cheers,
>> Hans
>>
>>
>> "Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> wrote: (21 February
>> 2024 10:18:29 CET)
>>
>> Dear Håkan,
>>
>>
>> thanks for the hint to flush and sleep. Indeed, I now see that
>> the crash happens in init_slow of V560 at this line:
>>
>>
>> v560->sicy_map=map_map(v560->address, MAP_SIZE, KW_NOBLT,
>> 0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));
>>
>>
>> Maybe the code is accessing/writing into a memory location that
>> it should better not touch?
>>
>> This problematic line is then followed by:
>>
>>
>> id=MAP_READ(v560->sicy_map, fixed_code);
>>
>> The corresponding line in the V560 code on the system that was
>> running with this module looks like this:
>>
>>
>> v560->sicy_map=map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT,
>> 0, 0, MAP_POKE_ARGS(*v560->read, fixed_code),
>> MAP_POKE_ARGS(*v560->write, scale_clear));
>>
>> And is followed by:
>>
>>
>> mapped_ptr =map_get_mapped_ptr(v560->sicy_map);
>> v560->read=mapped_ptr;
>> v560->write=mapped_ptr;
>>
>> Maybe you already have an idea what causes the problem here?
>>
>>
>> I will now go to the system that was running with V560 and make
>> a push of the NURDLIB.
>>
>>
>>
>>
>> Best greetings
>>
>> Günter
>>
>>
>>
>> ------------------------------------------------------------------------
>> *From:* subexp-daq <subexp-daq-bounces at lists.chalmers.se> on
>> behalf of Håkan T Johansson <f96hajo at chalmers.se>
>> *Sent:* Tuesday, 20 February 2024 20:13:32
>> *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.
>> *Subject:* Re: [subexp-daq] Report of a possible bug of the
>> CAEN_V560 module
>>
>> Dear Günter,
>>
>> I took the files you provided and for comparison put them in a
>> branch 'old_caen_v560'.
>>
>> git diff origin/old_caen_v560..origin/master
>>
>> however does not show anything which is suspicious to me.
>> Perhaps Hans can spot something.
>>
>> Otherwise, the only idea I can come up with is to continue to
>> bisect the code inside slow init.
>>
>> However, before that, I would suggest to add
>>
>> fflush(stdout); sleep(1);
>>
>> after each printf statement, such that one can be quite sure
>> that the printout is not eaten when the RIO crash happens. I.e.
>> that it actually had gotten further than shown by the prints.
>>
>> Best regards,
>> Håkan
>>
>>
>>
>>
>> On Tue, 20 Feb 2024, Weber, Guenter Dr. wrote:
>>
>> >
>> > Dear friends,
>> >
>> >
>> > I now had a look at the system where the V560 was running. It was also set up
>> > by Bastian. And there the code for the V560 module is slightly different
>> > from the one included in the NURDLIB branch that I am using on the test
>> > system.
>> >
>> >
>> > Maybe you can have a look at it.
>> >
>> >
>> > I also could push the complete NURDLIB from this system, if this helps.
>> >
>> >
>> >
>> >
>> > Best greetings
>> >
>> > Günter
>> >
>> >
>> >
>> >
>> > ____________________________________________________________________________
>> > From: subexp-daq <subexp-daq-bounces at lists.chalmers.se> on behalf of Weber,
>> > Guenter Dr. <g.weber at hi-jena.gsi.de>
>> > Sent: Tuesday, 20 February 2024 10:58:27
>> > To: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
>> > Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
>> >
>> > Dear friends,
>> >
>> >
>> > I now grabbed a V560 module that was working fine in another DAQ system and
>> > put it into our test system.
>> >
>> >
>> > The main.cfg looks like this:
>> >
>> >
>> > log_level=spam # info, verbose, debug, spam
>> >
>> > CRATE("MCAL") {
>> > GSI_VULOM(0x03000000) {
>> > timestamp = true # needed to get timestamps in the data output
>> > # ecl=0..15
>> > }
>> > BARRIER
>> > CAEN_V560(0x333333300) {
>> > use_veto = true
>> > }
>> > # CAEN_V767A(0x03100000) {
>> > # }
>> > }
>> >
>> > Starting the DAQ now results in a freeze of the RIO4. A reset of the crate
>> > is necessary to talk to it again.
>> >
>> >
>> > The problem occurs in the first slow init of the V560 module. To find the
>> > exact line, I added some output to CRATE.C:
>> >
>> >
>> > 10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
>> > before push_log_level(module)
>> > before a_crate->module_init_id = module->id
>> > before module->props->init_slow(a_crate, module)
>> > LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
>> > before module_init_id_mark(a_crate, module)
>> > before pop_log_level(module)
>> > 10: crate/crate.c:923: .Slow-init module[1]=CAEN_V560.
>> > before push_log_level(module)
>> > before a_crate->module_init_id = module->id
>> > before module->props->init_slow(a_crate, module)
>> >
>> >
>> > The CRATE.C code now looks like this:
>> >
>> >
>> > TAILQ_FOREACH(module, &a_crate->module_list, next) {
>> > if (NULL == module->props) {
>> > continue;
>> > }
>> > LOGF(info)(LOGL, "Slow-init module[%u]=%s.", module->id,
>> > keyword_get_string(module->type));
>> > printf("before push_log_level(module) \n");
>> > push_log_level(module);
>> > printf("before a_crate->module_init_id = module->id \n");
>> > a_crate->module_init_id = module->id;
>> > printf("before module->props->init_slow(a_crate, module) \n");
>> > if (!module->props->init_slow(a_crate, module)) {
>> > printf("before pop_log_level(module) \n");
>> > pop_log_level(module);
>> > printf("before goto crate_init_done \n");
>> > goto crate_init_done;
>> > }
>> > printf("before module_init_id_mark(a_crate, module) \n");
>> > module_init_id_mark(a_crate, module);
>> > printf("before pop_log_level(module) \n");
>> > pop_log_level(module);
>> > }
>> >
>> > Thus, to me it looks like the check "if (!module->props->init_slow(a_crate,
>> > module)) ..." is doing something quite horrible to the RIO4.
>> >
>> >
>> > This is unfortunate, because my original aim was to show that there is also
>> > a bug/mistake in readout_dt of the V560 module. But I did not come this far.
>> >
>> >
>> > Do you have any idea what might cause the freezing of the RIO4?
>> >
>> >
>> >
>> >
>> > Best greetings and many thanks
>> >
>> > Günter
>> >
>> >
>> >
>> >
>> >
>> >
>>
>>
>