From g.weber at hi-jena.gsi.de  Thu Feb 15 20:31:00 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Thu, 15 Feb 2024 19:31:00 +0000
Subject: [subexp-daq] Report of a possible bug with "log_level=spam"
Message-ID: <560c111aa5e343a9aa679dde1eeb3855@hi-jena.gsi.de>

Dear friends,


while playing around with the DAQ, I got the following problem right at the start of the DAQ:


5: util/log.c:319: ..........Log indent overflow: "..........Gsi Vulom readout_dt {".
5: util/log.c:319: ..........Calling abort()...


In my main.cfg I had just a single VULOM active.


log_level=spam # info, verbose, debug, spam

CRATE("MCAL") {
    GSI_VULOM(0x03000000) {
        timestamp = true # needed to get timestamps in the data output
    #   ecl=0..15
    }
#   BARRIER
#   DUMMY(0x01000000) {
#   }
}


If the log_level is reduced to debug, the error does not occur and the system runs without problems.


I am using the most recent version of NURDLIB.


Attached please find the full output of the RIO when the DAQ is started.


Best greetings

Günter


-------------- next part --------------
10: lwroc_hostname_util.c:108: Host 'lyserv' known as 192.168.1.1 (port: 56583).
Thread has no error buffer yet...
CPUS: 1
delay: 1
10: lwroc_hostname_util.c:108: Host 'lyserv' known as 192.168.1.1 (port: 56583).
Thread has no error buffer yet...
HOST: RIO4-MCAL-1
Token: d6d68d7b (d6d68d7b:d6d68d7b) [/mbsusr/mbsdaq/.drasi_tokens/mcal]
10: lwroc_hostname_util.c:457: Own address: 192.168.1.71/255.255.255.0 (eth1).
10: lwroc_data_pipe.c:145: Data buffer READOUT_PIPE, size 419430400 = 0x19000000, 1 consumers.
10: lwroc_triva_readout.c:66: Silence TRIVA  (HALT)
10: lwroc_net_io.c:167: Started server on port 56583 (data port 43514).
client union size: 244 208 188 508 640 204 204  => 640
10: lwroc_udp_awaken_hints.c:159: UDP awaken hints file: /tmp/drasi.u1001/drasi.hints.u1001.RIO4-MCAL-1:56583
10: lwroc_main.c:706: Log message rate limit not in effect.
10: lwroc_readout.c:112: call readout_init...
10: lwroc_thread_util.c:117: This is the triva control thread!
10: lwroc_thread_util.c:117: This is the net io thread!
10: lwroc_thread_util.c:117: This is the slow_async thread!
10: lwroc_thread_util.c:117: This is the data server thread!
10: lwroc_message_internal.c:472: Message client connected!
10: lwroc_net_trans.c:1156: [drasi] Transport client connected (data) [192.168.1.1].
10: lwroc_triva_control.c:370: Setup TRIVA  (DISBUS, HALT, MASTER, RESET)
10: lwroc_triva_control.c:418: Minimum event time ctime(5000)+1*rd(694)+3*wr(634)+fctime(1000)=8596 ns (116.333 kHz)
10: lwroc_triva_state.c:1486: (Re)send ident messages...
10: lwroc_triva_control.c:495: START TEST ACQ: HALT, CLEAR=RESET, MT=1
9: lwroc_triva_control.c:507: TEST: GO
10: lwroc_triva_control.c:725: RUN: RESET
10: lwroc_triva_control.c:729: RUN: MT=14
9: lwroc_triva_control.c:737:   GO (1 good test triggers done) (max 116.3 kHz)
10: lwroc_triva_readout.c:376: Trigger 14 seen.
10: config/config.c:181: Will try default cfg path='/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default', can be set with NURDLIB_DEF_PATH.
10: config/parser.c:287: Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/global.cfg' {
10: config/parser.c:299: Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/global.cfg' }
10: config/parser.c:287: Opened './main.cfg' {
8: lwroc_triva_state.c:2399: Master: deadtime: 1.  Status: 0x10 (IN_READOUT).  EC: 1
10: lwroc_triva_state.c:2428: [EB lyserv] EB: Status: 0x0.
8: lwroc_triva_state.c:2488: Node(s) busy in readout, waiting...
10: config/config.c:1299: .Global log level=spam.
10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/crate.cfg' {
10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/crate.cfg' }
10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/gsi_vulom.cfg' {
10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/gsi_vulom.cfg' }
10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' {
10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' }
10: config/parser.c:299: Closed './main.cfg' }
10: crate/crate.c:347: crate_create {
10: crate/crate.c:673: crate_create(MCAL) }
10: crate/crate.c:899: crate_init(MCAL) {
10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
10: crate/crate.c:976: .Fast-init module[0]=GSI_VULOM.
10: crate/crate.c:1073: crate_init(MCAL) }
10: ctrl/ctrl.c:788: Control server online.
Thread has no error buffer yet...
10: f_user.c:559: WR ID=0x200.
10: f_user.c:565: TS offset unset. Will not modify stamp.
10: f_user.c:572: TPAT: No.
10: f_user.c:573: Sync-check: No.
10: f_user.c:575: Spill triggers: No.
10: f_user.c:576: LMU: No.
10: f_user.c:577: Timer latches: No.
10: f_user.c:578: Spill shape: No.
10: f_user.c:579: Micro-structure: No.
10: f_user.c:581: Multi-event flag: No.
10: f_user.c:586: UDP destination: None.
5: util/log.c:319: ..........Log indent overflow: "..........Gsi Vulom readout_dt {".
5: util/log.c:319: ..........Calling abort()...

From hans.tornqvist at chalmers.se  Thu Feb 15 23:25:18 2024
From: hans.tornqvist at chalmers.se (=?ISO-8859-1?Q?Hans_Toshihide_T=F6rnqvist?=)
Date: Thu, 15 Feb 2024 23:25:18 +0100
Subject: [subexp-daq] Report of a possible bug with "log_level=spam"
In-Reply-To: <560c111aa5e343a9aa679dde1eeb3855@hi-jena.gsi.de>
References: <560c111aa5e343a9aa679dde1eeb3855@hi-jena.gsi.de>
Message-ID: <55CF5C51-002D-448D-ACEE-DE4C22F90D0B@chalmers.se>

Dear Günter,

That is definitely a bug in nurdlib, thanks for finding it!

It looks like some log call which opens a new scope with "{" does not have a proper corresponding closing curly bracket log. Is there more information in the drasi log file by any chance? I will look through the relevant code meanwhile, but if you find more lines before the error appears it would help a lot.

(This also reminds me to evaluate nurdlib log scopes again, since it requires very careful handling of actual C scope...)
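
A minimal sketch of how such an indent overflow can arise, assuming a
logger that enters a scope when a message ends in "{" and leaves it when
a message ends in "}". The names log_line, g_indent and MAX_INDENT are
hypothetical and are not taken from util/log.c; the real implementation
may well differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_INDENT 10 /* Hypothetical limit, matching the 10 dots in the report. */

static int g_indent = 0; /* Current nesting depth of log scopes. */

/*
 * Prints one log line with one '.' per open scope.
 * Returns 0 on success, -1 if opening another scope would exceed
 * MAX_INDENT (the point where a real logger might call abort()).
 */
static int
log_line(char const *a_msg)
{
    size_t len;
    int i;

    assert(NULL != a_msg);
    len = strlen(a_msg);
    /* A closing message leaves its scope before it is printed. */
    if (len > 0 && '}' == a_msg[len - 1] && g_indent > 0) {
        --g_indent;
    }
    for (i = 0; i < g_indent; ++i) {
        putchar('.');
    }
    puts(a_msg);
    /* An opening message enters a new scope after it is printed. */
    if (len > 0 && '{' == a_msg[len - 1]) {
        if (g_indent >= MAX_INDENT) {
            return -1;
        }
        ++g_indent;
    }
    return 0;
}
```

In this sketch, balanced "{" and "}" messages keep the depth at zero
indefinitely, while a repeated "{" message without its "}" counterpart
reaches the limit after MAX_INDENT calls.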

Best regards,
Hans


From g.weber at hi-jena.gsi.de  Fri Feb 16 01:50:30 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Fri, 16 Feb 2024 00:50:30 +0000
Subject: [subexp-daq] Question on COUNTER_DIFF in crate.c
Message-ID: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de>

Dear friends,


we are now trying to add a new module into NURDLIB. At the beginning, we just want to have a 'software version' of the module, so no hardware involved yet.


We are stuck with the following code in crate.c (line 1262 and following):


            diff_module = COUNTER_DIFF(*module->crate_counter,
                module->event_counter, module->this_minus_crate);
            /* TODO: Clean this. */
            shadow_counter.value =
                module->shadow.data_counter_value;
            shadow_counter.mask = module->event_counter.mask;
            diff_shadow = COUNTER_DIFF(*module->crate_counter,
                shadow_counter, module->this_minus_crate);


As we understand it, the idea here is to check, with a call to readout_dt, whether the module's internal counter agrees with the crate's counter for the given module. Basically, when the crate thinks it has accessed the module n times, the module should report the same number of access attempts. Is this understanding correct?


If yes, how exactly should the module increment its internal counter? If this is done in readout_dt, a mismatch between the crate and the internal counter occurs as soon as we stop the acquisition.


I can explain in more detail, but maybe you can first explain to us what the whole idea behind this check is? To us it looks like the dummy module implementation will also have problems with it.


Thank you very much!


Best greetings

Günter


From f96hajo at chalmers.se  Fri Feb 16 13:25:59 2024
From: f96hajo at chalmers.se (=?ISO-8859-15?Q?H=E5kan_T_Johansson?=)
Date: Fri, 16 Feb 2024 13:25:59 +0100
Subject: [subexp-daq] Question on COUNTER_DIFF in crate.c
In-Reply-To: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de>
References: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de>
Message-ID: <1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se>


Dear Günter,

I think Hans might have to correct me.

This is not exactly what this code does, but typically the modules have a 
trigger/event counter, which is incremented for each gate/common signal 
they receive on the front-panel.

When the readout is event-by-event, these counters are checked strictly by 
nurdlib, in order to detect cabling issues, double-triggers and so on. 
This is especially important for modules that have multi-event buffers, 
that otherwise easily can become desynchronised.  (It takes of course some 
time to read these counters, which contributes to the overall deadtime. 
But the number of times this has 'saved' data-taking by detecting issues
early makes it very worthwhile.)


What the code you refer to is doing is, I think, 'abusing' this a bit.
There is typically some time (on the order of µs) between the trigger and
when the signals have been digitised and the data becomes available.  Often,
those counters are only updated after that is the case.  To me it looks
like this function uses the counters (which we want to check anyhow) to
wait for the modules to have finished converting one event.
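
The waiting described here can be sketched as a polling loop. This is
only an illustration: the callback type ReadCounter, the function
wait_for_event and the simulated FakeModule are hypothetical names, and
a poll budget stands in for a real time-based timeout. None of this is
taken from crate.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns the module's current event counter. */
typedef uint32_t (*ReadCounter)(void *a_user);

/*
 * Polls the module counter until it equals a_expected or a_max_polls
 * is used up. Returns the number of polls needed, or -1 on timeout.
 */
static int
wait_for_event(ReadCounter a_read, void *a_user, uint32_t a_expected,
    int a_max_polls)
{
    int i;

    assert(NULL != a_read);
    for (i = 1; i <= a_max_polls; ++i) {
        if (a_expected == a_read(a_user)) {
            return i;
        }
    }
    return -1;
}

/* Simulated module: the counter only updates once conversion is done. */
struct FakeModule {
    uint32_t counter;
    int polls_until_ready;
};

static uint32_t
fake_read(void *a_user)
{
    struct FakeModule *m = (struct FakeModule *)a_user;

    if (m->polls_until_ready > 0) {
        --m->polls_until_ready;
        if (0 == m->polls_until_ready) {
            ++m->counter;
        }
    }
    return m->counter;
}
```

A module that is still converting keeps returning the old counter value,
so the loop either sees the expected value after a few polls or gives up
when the budget is exhausted.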


Best regards,
Håkan



From hans.tornqvist at chalmers.se  Fri Feb 16 13:36:28 2024
From: hans.tornqvist at chalmers.se (=?UTF-8?Q?Hans_Toshihide_T=C3=B6rnqvist?=)
Date: Fri, 16 Feb 2024 13:36:28 +0100
Subject: [subexp-daq] Question on COUNTER_DIFF in crate.c
In-Reply-To: <1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se>
References: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de>
	<1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se>
Message-ID: <e15926ab-2476-4192-9871-8ff196936bca@chalmers.se>

Dear all,

Everything looks fine, but I thought I could add a few more points, 
which were written in parallel to Håkan's reply, I'll try to update my 
text properly :)

---
The crate struct keeps a counter for every permutation of module tags. 
The counters are modified either by:

-) 'crate_tag_counter_increase' which should be called by the user code 
(e.g. the f-user) for every tag that should fire for a particular event, or

-) a scaler channel which counts the number of events, e.g. MASTER_START 
in a TRLO II vulom, or a v830 which has an accepted trigger cabled to an 
input, which is fully specified in main.cfg.

The latter overrides the former if configured, so the nurdlib f-user and 
r3bfuser always call 'crate_tag_counter_increase' per readout. For 
multi-event readout one would need to set up the latter approach. 
(Putting a description of this on the docu todo, if it's not already in...)

---
Best is when the "hardware" increments. For example, a v775 counts the 
number of signals sent on its trigger input, and nurdlib reads this 
value in readout_dt, which the crate then uses to compare with its 
software counters.
Some modules do not provide such counters directly before payload 
readout (e.g. gsi_tamex and similar) in which case the counting is done 
in software, which is mostly a test of the library logic. The module 
payload can carry a counter and trigger number however, which are 
checked later in '*parse_data'.

---
I will have a look at the dummy module, I thought we have a test with a 
few software triggers for it, otherwise I will add that.

Best regards,
Hans


From g.weber at hi-jena.gsi.de  Fri Feb 16 13:50:02 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Fri, 16 Feb 2024 12:50:02 +0000
Subject: [subexp-daq] Question on COUNTER_DIFF in crate.c
In-Reply-To: <1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se>
References: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de>,
	<1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se>
Message-ID: <5e78793bb00a4cb8840fef372e3ebb07@hi-jena.gsi.de>


Dear Håkan,


thank you. This explanation makes some sense.


Could you also explain the concept of "shadow module"? What is it good for?


Also I would like to point out that the implementation of readout_dt for the V560 module that we used as a reference looks weird to us:


uint32_t
caen_v560_readout_dt(struct Crate *a_crate, struct Module *a_module)
{
    (void)a_crate;
    LOGF(spam)(LOGL, NAME" readout_dt {");
    a_module->event_counter.value++;
    LOGF(spam)(LOGL, NAME" readout_dt(ctr=0x%08x) }",
        a_module->event_counter.value);
    return 0;
}


Here the counter is incremented every time the module is 'touched' by the readout_dt function. In the loop that starts at line 1240 in crate.c, the function readout_dt is executed for the module until the test around line 1270 is passed or the timeout around line 1280 happens. Thus an initial mismatch between the (software) counter of module V560 and the crate that prevents the loop from being exited on the first try will grow steadily as the module is accessed via readout_dt many, many times until it runs into the timeout. What is this good for?
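

The concern can be made concrete with a small simulation. It assumes a hypothetical polling loop (poll_until_match) that calls a readout_dt-like function until the module counter equals the crate counter; this is not the loop from crate.c, and SoftModule and soft_readout_dt are made-up names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software-only counter, incremented on every call as in the V560 code. */
struct SoftModule {
    uint32_t event_counter;
};

static void
soft_readout_dt(struct SoftModule *a_module)
{
    ++a_module->event_counter;
}

/*
 * Hypothetical polling loop: calls soft_readout_dt until the module
 * counter equals a_crate_counter or a_max_polls is reached.
 * Returns the number of calls made, or -1 on timeout.
 */
static int
poll_until_match(struct SoftModule *a_module, uint32_t a_crate_counter,
    int a_max_polls)
{
    int i;

    assert(NULL != a_module);
    for (i = 1; i <= a_max_polls; ++i) {
        soft_readout_dt(a_module);
        if (a_module->event_counter == a_crate_counter) {
            return i;
        }
    }
    return -1;
}
```

In this sketch, a module counter that is one behind the crate counter agrees after a single call. A module counter that already equals or exceeds the crate counter moves further away with every call until the poll limit is reached.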


To my (current) understanding it is pointless to try to mimic the function of a true hardware counter within the module by a counter that only exists in software. The better way would be to tell crate.c that this module does not have such a counter so that the check is pointless. Is this understanding correct? And if yes, how can I tell NURDLIB to skip this check?


Best greetings

Günter



From hans.tornqvist at chalmers.se  Fri Feb 16 15:39:42 2024
From: hans.tornqvist at chalmers.se (=?UTF-8?Q?Hans_Toshihide_T=C3=B6rnqvist?=)
Date: Fri, 16 Feb 2024 15:39:42 +0100
Subject: [subexp-daq] Question on COUNTER_DIFF in crate.c
In-Reply-To: <5e78793bb00a4cb8840fef372e3ebb07@hi-jena.gsi.de>
References: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de>
	<1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se>
	<5e78793bb00a4cb8840fef372e3ebb07@hi-jena.gsi.de>
Message-ID: <68326b40-e0e4-493a-9287-12e07819070e@chalmers.se>

Dear Günter,

On 2024-02-16 13:50, Weber, Guenter Dr. wrote:
> 
> Dear Håkan,
> 
> thank you. This explanation makes some sense.
> 
> Could you also explain the concept of "shadow module". What is it good for?

It's rather "shadow readout mode". The idea is to read data from a 
module continuously in parallel to conversion and buffering, instead of 
performing every task of acquisition in sequence where every task has to 
wait for all the others to finish.

The advantage is that this can significantly reduce the time where a 
module is unable to convert and buffer signals.

The potential disadvantage is that the data traffic on for example a VME 
backplane could induce noise in the analog measurement.

The non-shadow mode is the default and best tested mode, due to present 
module support, cooperation of modules in experiments, and historical 
reasons.

> Also I would like to point out that the implementation of readout_dt for 
> the V560 module that we used as a reference looks weird to us:
> 
> uint32_t
> caen_v560_readout_dt(struct Crate *a_crate, struct Module *a_module)
> {
>     (void)a_crate;
>     LOGF(spam)(LOGL, NAME" readout_dt {");
>     a_module->event_counter.value++;
>     LOGF(spam)(LOGL, NAME" readout_dt(ctr=0x%08x) }",
>         a_module->event_counter.value);
>     return 0;
> }
> 
> Here the counter is incremented every time the module is 'touched' by 
> the readout_dt function. In the loop that starts at line 1240 in crate.c 
> the function readout_dt is executed for the module until the test around 
> line 1270 is passed or the timeout around line 1280 happens. Thus an 
> initial mismatch between the (software) counter of module V560 and the 
> crate that prevents the loop from being exited on the first try will 
> grow steadily as the module is accessed via readout_dt many, many times 
> until it runs into the timeout. What is this good for?

Nurdlib avoids resetting counters and instead latches and saves counter 
values (e.g. soft crate counters and module counters) during dead-time 
when counters should not change. After the first event, both counters 
must have incremented by 1. Any particular start value such as 0 or 1 
has no deeper meaning, it's the progression that is important.
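
The latching idea can be sketched as follows. The names LatchedCounter,
counter_latch, counter_progress and counters_in_sync are hypothetical
and do not appear in nurdlib; the sketch only shows that comparing the
progression since a latch makes the start values irrelevant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A raw counter value plus the value latched at a reference point. */
struct LatchedCounter {
    uint32_t raw;
    uint32_t latch;
};

/* Remember the current raw value instead of resetting the counter. */
static void
counter_latch(struct LatchedCounter *a_ctr)
{
    assert(NULL != a_ctr);
    a_ctr->latch = a_ctr->raw;
}

/* Events counted since the latch; unsigned arithmetic handles wrap-around. */
static uint32_t
counter_progress(struct LatchedCounter const *a_ctr)
{
    assert(NULL != a_ctr);
    return a_ctr->raw - a_ctr->latch;
}

/* Two counters agree if they progressed equally, whatever their start values. */
static int
counters_in_sync(struct LatchedCounter const *a_a,
    struct LatchedCounter const *a_b)
{
    return counter_progress(a_a) == counter_progress(a_b);
}
```

A crate counter that starts at 0 and a module counter that starts close
to its maximum value are therefore still in sync as long as both have
advanced by the same amount since they were latched.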

Resetting tends to carry with it the idea of "after a reset, or setting 
something to 0, everything is fine". It moves the focus away from 
getting the complete logic correct, at least that is the feeling I get.

Now, to actually discuss your case :)
Are you seeing that the mismatch between the crate counter and the v560 
keeps increasing for new events? Or do you have such a problem with the 
dummy module? (I still did not find the slot to look at it...)

> To my (current) understanding it is pointless to try to mimic the 
> function of a true hardware counter within the module by a counter that 
> only exists in software. The better way would be to tell crate.c that 
> this module does not have such a counter so that the check is pointless. 
> Is this understanding correct? And if yes, how can I tell NURDLIB to 
> skip this check?

A software counter would be a consistency check of the implementation, 
but you are correct that it has little to do with the signals that are 
recorded. One can send the same random signal to different modules and 
verify correlation at some point after digitisation, and definitely no 
later than online monitoring while data is recorded.

I think it's possible to give the module counter a mask of 0. The 
counter check should in principle be:

mask = ctr_a_mask & ctr_b_mask;
ctr_a = ctr_a_raw - ctr_a_latch;
ctr_b = ctr_b_raw - ctr_b_latch;
if ( (ctr_a & mask) == (ctr_b & mask) ) { all good! }

If either mask is 0 the condition will always pass. There is maybe a 
better way to make this clear than built into the masking, but I would 
say that using a module without any kind of sync-check in 
event-per-event analysis is overall dangerous...
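
The check outlined above can be written out as a small C function. This
is a sketch under the assumptions of that outline, not the actual
COUNTER_DIFF macro from nurdlib; MaskedCounter and
masked_counters_match are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct MaskedCounter {
    uint32_t raw;   /* Value read from hardware or kept in software. */
    uint32_t latch; /* Raw value saved at the reference point. */
    uint32_t mask;  /* Valid bits; 0 disables the comparison. */
};

/* Returns 1 if both counters progressed equally within the common mask. */
static int
masked_counters_match(struct MaskedCounter const *a_a,
    struct MaskedCounter const *a_b)
{
    uint32_t mask, ctr_a, ctr_b;

    assert(NULL != a_a && NULL != a_b);
    mask = a_a->mask & a_b->mask;
    ctr_a = a_a->raw - a_a->latch;
    ctr_b = a_b->raw - a_b->latch;
    return (ctr_a & mask) == (ctr_b & mask);
}
```

With a 10-bit module counter (mask 0x3ff) and a 32-bit crate counter,
only the lowest 10 bits are compared. With a mask of 0 on either side,
both operands reduce to 0 and the function always reports a match.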

Hope that helps!

Best regards,
Hans


From hans.tornqvist at chalmers.se  Fri Feb 16 16:01:49 2024
From: hans.tornqvist at chalmers.se (=?UTF-8?Q?Hans_Toshihide_T=C3=B6rnqvist?=)
Date: Fri, 16 Feb 2024 16:01:49 +0100
Subject: [subexp-daq] Question on COUNTER_DIFF in crate.c
In-Reply-To: <68326b40-e0e4-493a-9287-12e07819070e@chalmers.se>
References: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de>
	<1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se>
	<5e78793bb00a4cb8840fef372e3ebb07@hi-jena.gsi.de>
	<68326b40-e0e4-493a-9287-12e07819070e@chalmers.se>
Message-ID: <1059b12a-8a34-4aef-a6c6-06e60871ea84@chalmers.se>

Dear Günter,

I had a quick look in the Caen v767 manual and thought I could mention a 
few of my thoughts.

There is an event-counter register on offset 0x4c which holds the number 
of events transferred to the output buffer. This can be read out and 
returned in readout_dt.

Note here that the time from accepting a trigger to starting the readout 
of the module must be sufficiently long, otherwise you might get a value 
before the complete event has been finished! If the module behaves 
correctly this counter should only update once the last word of an event 
is ready to be read from the output buffer.
This waiting time is typically controlled with the "conversion time" 
that can be set in the TRIMI, or the trigger module in general. Waking 
up the readout computer which accesses the module can take an 
"arbitrary" time on top of the conversion time, so one can play around 
with this value a bit.

There is an adaptive conversion time (acvt) feature in nurdlib that I 
have not tested myself for quite some time. It polls event-counters 
until some timeout, and adjusts the CVT on-the-fly to reduce the overall 
waiting time and polling calls. Could be interesting if you would like 
to go that far in optimising your setup, but I do not know how well the 
Sis 3316 supports this feature.
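
To illustrate the polling idea only, here is a minimal sketch in C. It is not nurdlib's acvt code: the struct, the simulated register read and the function names are all hypothetical, and the "hardware" is faked so that the new counter value shows up only after a few reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a module whose event counter only shows the
 * new value after a number of register reads, i.e. after conversion. */
struct FakeModule {
	uint32_t counter;           /* value the module will eventually show */
	unsigned reads_until_ready; /* reads before the new value appears */
};

/* Simulated register read: returns the previous value until ready. */
uint32_t
fake_read_counter(struct FakeModule *a_module)
{
	if (a_module->reads_until_ready > 0) {
		--a_module->reads_until_ready;
		return a_module->counter - 1;
	}
	return a_module->counter;
}

/* Poll until the counter equals a_expected or a_max_polls is used up.
 * Returns the number of polls needed, or -1 on timeout. */
int
poll_counter(struct FakeModule *a_module, uint32_t a_expected,
    unsigned a_max_polls)
{
	unsigned i;

	assert(a_max_polls > 0);
	for (i = 1; i <= a_max_polls; ++i) {
		if (fake_read_counter(a_module) == a_expected) {
			return (int)i;
		}
	}
	return -1;
}
```

The number of polls that were needed is the kind of feedback an adaptive scheme could use to lengthen or shorten the conversion time; how nurdlib actually does this is not shown here.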

Back to the v767. It looks like the headers in the payload have the same 
event-counter in the lowest 10 bits. I would suggest that parse_data 
checks this value with the module event-counter. The End-of-block word 
has an event-size value which should also be checked with the size of 
the payload, but I would not check every time measurement word on a 
typically slow VME controller.
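
A rough sketch of the header check, under assumptions: the helper name is hypothetical and not part of nurdlib, the first payload word is assumed to be the header, and only the "lowest 10 bits" statement above is used. The exact word layout should be taken from the manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EVENT_COUNTER_MASK 0x3ffu /* lowest 10 bits, as described above */

/*
 * Hypothetical helper, not part of nurdlib: compares the event counter in
 * the first payload word (assumed here to be the header) with the counter
 * read from the module register. Returns 1 on match, 0 on mismatch.
 */
int
header_counter_ok(const uint32_t *a_payload, size_t a_n_words,
    uint32_t a_module_counter)
{
	assert(a_payload != NULL && a_n_words > 0);
	return (a_payload[0] & EVENT_COUNTER_MASK) ==
	    (a_module_counter & EVENT_COUNTER_MASK);
}
```

Only the masked bits are compared, so a module counter that is wider than 10 bits still matches as long as its low bits agree with the header.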

That's enough for now :)

Best regards,
Hans

On 2024-02-16 15:39, Hans Toshihide Törnqvist wrote:
> Dear Günter,
> 
> On 2024-02-16 13:50, Weber, Guenter Dr. wrote:
>>
>> Dear Håkan,
>>
>> thank you. This explanation makes some sense.
>>
>> Could you also explain the concept of "shadow module". What is it good 
>> for?
> 
> It's rather "shadow readout mode". The idea is to read data from a 
> module continuously in parallel to conversion and buffering, instead of 
> performing every task of acquisition in sequence where every task has to 
> wait for all the others to finish.
> 
> The advantage is that this can significantly reduce the time where a 
> module is unable to convert and buffer signals.
> 
> The potential disadvantage is that the data traffic on for example a VME 
> backplane could induce noise in the analog measurement.
> 
> The non-shadow mode is the default and best tested mode, due to present 
> module support, cooperation of modules in experiments, and historical 
> reasons.
> 
>> Also I would like to point out that the implementation of readout_dt 
>> for the V560 module that we used as a reference looks weird to us:
>>
>> uint32_t
>> caen_v560_readout_dt(struct Crate *a_crate, struct Module *a_module)
>> {
>>     (void)a_crate;
>>     LOGF(spam)(LOGL, NAME" readout_dt {");
>>     a_module->event_counter.value++;
>>     LOGF(spam)(LOGL, NAME" readout_dt(ctr=0x%08x) }",
>>         a_module->event_counter.value);
>>     return 0;
>> }
>>
>> Here the counter is incremented every time the module is 'touched' by 
>> the readout_dt function. In the loop that starts at line 1240 in 
>> crate.c the function readout_dt is executed for the module until the 
>> test around line 1270 is passed or the timeout around line 1280 
>> happens. Thus an initial mismatch between the (software) counter of 
>> module V560 and the crate that prevents the loop from being exited at 
>> the first attempt will grow steadily as the module is accessed via 
>> readout_dt many, many times until it runs into the timeout. What is 
>> this good for?
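
The concern raised above can be made concrete with a simplified model. This is only a sketch of the situation as described in the question, not the actual loop in crate.c; the function name and the retry limit are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model: every retry calls a readout_dt-like function that bumps
 * a software counter, and the loop ends when the counters agree or after
 * a_max_tries. Returns the final difference (module minus crate).
 */
int32_t
retry_until_match(uint32_t a_crate_counter, uint32_t a_module_counter,
    unsigned a_max_tries)
{
	unsigned i;

	assert(a_max_tries > 0);
	for (i = 0; i < a_max_tries; ++i) {
		++a_module_counter; /* the software "readout_dt" increment */
		if (a_module_counter == a_crate_counter) {
			break;
		}
	}
	return (int32_t)(a_module_counter - a_crate_counter);
}
```

In this model a module counter that starts one behind the crate catches up on the first call, while one that starts level overshoots and ends up ahead by the full number of retries.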
> 
> Nurdlib avoids resetting counters and instead latches and saves counter 
> values (e.g. soft crate counters and module counters) during dead-time 
> when counters should not change. After the first event, both counters 
> must have incremented by 1. Any particular start value such as 0 or 1 
> has no deeper meaning, it's the progression that is important.
> 
> Resetting tends to carry with it the idea of "after a reset, or setting 
> something to 0, everything is fine". It moves the focus away from 
> getting the complete logic correct, at least that is the feeling I get.
> 
> Now, to actually discuss your case :)
> Are you seeing that the mismatch between the crate counter and the v560 
> keeps increasing for new events? Or do you have such a problem with the 
> dummy module? (I still did not find the slot to look at it...)
> 
>> To my (current) understanding it is pointless to try to mimic the 
>> function of a true hardware counter within the module by a counter 
>> that only exists in software. The better way would be to tell crate.c 
>> that this module does not have such a counter so that the check is 
>> pointless. Is this understanding correct? And if yes, how can I tell 
>> NURDLIB to skip this check?
> 
> A software counter would be a consistency check of the implementation, 
> but you are correct that it has little to do with the signals that are 
> recorded. One can send the same random signal to different modules and 
> verify correlation at some point after digitisation, and definitely no 
> later than online monitoring while data is recorded.
> 
> I think it's possible to give the module counter a mask of 0. The 
> counter check should in principle be:
> 
> mask = ctr_a_mask & ctr_b_mask;
> ctr_a = ctr_a_raw - ctr_a_latch;
> ctr_b = ctr_b_raw - ctr_b_latch;
> if ( (ctr_a & mask) == (ctr_b & mask) ) { all good! }
> 
> If either mask is 0 the condition will always pass. There is maybe a 
> better way to make this clear than built into the masking, but I would 
> say that using a module without any kind of sync-check in 
> event-per-event analysis is overall dangerous...
> 
> Hope that helps!
> 
> Best regards,
> Hans
> 
>> Best greetings
>>
>> Günter
>>
>>
>>
>> ------------------------------------------------------------------------
>> *From:* subexp-daq <subexp-daq-bounces at lists.chalmers.se> on behalf 
>> of Håkan T Johansson <f96hajo at chalmers.se>
>> *Sent:* Friday, 16 February 2024 13:25:59
>> *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.
>> *Subject:* Re: [subexp-daq] Question on COUNTER_DIFF in crate.c
>>
>> Dear Günter,
>>
>> I think Hans might have to correct me.
>>
>> This is not exactly what this code does, but typically the modules have a
>> trigger/event counter, which is incremented for each gate/common signal
>> they receive on the front-panel.
>>
>> When the readout is event-by-event, these counters are checked strictly by
>> nurdlib, in order to detect cabling issues, double-triggers and so on.
>> This is especially important for modules that have multi-event buffers,
>> which can otherwise easily become desynchronised. (It takes of course some
>> time to read these counters, which contributes to the overall deadtime.
>> But the number of times this has 'saved' data-taking by detecting issues
>> early makes it very worthwhile.)
>>
>>
>> What the code you refer to is doing is, I think, 'abusing' this a bit.
>> There is typically some time (on the order of µs) between the trigger and when
>> the signals have been digitised and the data becomes available. Often,
>> those counters are only updated after that is the case. To me it looks
>> like this function uses the counters (which we want to check anyhow) to
>> wait for the modules to have finished converting one event.
>>
>>
>> Best regards,
>> Håkan
>>
>>
>>
>>
>>
>> On Fri, 16 Feb 2024, Weber, Guenter Dr. wrote:
>>
>>>
>>> Dear friends,
>>>
>>>
>>> we are now trying to add a new module into NURDLIB. At the beginning, 
>>> we just want to have a 'software version' of the module, so no
>>> hardware involved yet.
>>>
>>>
>>> We are stuck with the following code in crate.c (line 1262 and 
>>> following):
>>>
>>>
>>>             diff_module = COUNTER_DIFF(*module->crate_counter,
>>>                 module->event_counter, module->this_minus_crate);
>>>             /* TODO: Clean this. */
>>>             shadow_counter.value =
>>>                 module->shadow.data_counter_value;
>>>             shadow_counter.mask = module->event_counter.mask;
>>>             diff_shadow = COUNTER_DIFF(*module->crate_counter,
>>>                 shadow_counter, module->this_minus_crate);
>>>
>>>
>>> As we understand, the idea here is to check with a call of readout_dt 
>>> if the module's internal counter agrees with the counter of the
>>> crate for the given module. Basically, when the crate thinks it has 
>>> accessed the module n times, the module should report the same number of
>>> access attempts. Is this understanding correct?
>>>
>>>
>>> If yes, how exactly should the module increment its internal 
>>> counter? If this is done on readout_dt, a mismatch between the crate 
>>> and the internal counter occurs as soon as we stop the acquisition.
>>>
>>>
>>> I can explain in more details, but maybe first you can explain to us 
>>> what the whole idea behind this check is? To us it looks like also
>>> the dummy module implementation will have problems with it.
>>>
>>>
>>>
>>>
>>> Thank you very much!
>>>
>>>
>>>
>>>
>>> Best greetings
>>>
>>> Günter
>>>
>>>
>>>
>>>
>>

From g.weber at hi-jena.gsi.de  Mon Feb 19 10:15:37 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Mon, 19 Feb 2024 09:15:37 +0000
Subject: [subexp-daq] Question on default firmware for VULOM
Message-ID: <2a93957274cc4503bb7915c804182801@hi-jena.gsi.de>

Dear friends,


I restarted the crate and now get the following error message when starting the DAQ:


10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
LOG: TRLO: MD5SUM: 0x426cb99c (CT: 5bba6507 = 2018-10-07 19:56:55 UTC)
WARNING: Known firmware (alias): 0x6e4ba1a9.
WARNING: Known firmware (alias): 0x1409285e.
WARNING: Known firmware (alias): 0xa73c5093.
WARNING: Known firmware (alias): 0x6e4ba1a9.
WARNING: Known firmware (alias): 0x1409285e.
FATAL: TRLO firmware wrong: 0x426cb99c, expected 0x6e4ba1a9 or alias.


I assume that upon power-on the VULOM is booting with a default firmware that is not the one necessary to run with the most recent version of NURDLIB, etc.


How can I tell the VULOM which firmware version from its memory it should load as default?


Thank you very much!


Best greetings

Günter


From hans.tornqvist at chalmers.se  Mon Feb 19 10:37:37 2024
From: hans.tornqvist at chalmers.se (=?ISO-8859-1?Q?Hans_Toshihide_T=F6rnqvist?=)
Date: Mon, 19 Feb 2024 10:37:37 +0100
Subject: [subexp-daq] Question on default firmware for VULOM
In-Reply-To: <2a93957274cc4503bb7915c804182801@hi-jena.gsi.de>
References: <2a93957274cc4503bb7915c804182801@hi-jena.gsi.de>
Message-ID: <32E3DE33-8FAE-40A3-A3FE-44C14924D590@chalmers.se>

Dear Günter,

As I understand it, the firmware you would like to have has been flashed and tested quite a lot and seems to work?

Then you can flash program area 0, but you will need the --force option too. Area 0 will only be flashed if the firmware in question has been flashed into another area, as in your present case.

After that it will load after every power cycle.

Cheers,
Hans 


From g.weber at hi-jena.gsi.de  Tue Feb 20 10:46:52 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Tue, 20 Feb 2024 09:46:52 +0000
Subject: [subexp-daq] Question on default firmware for VULOM
In-Reply-To: <32E3DE33-8FAE-40A3-A3FE-44C14924D590@chalmers.se>
References: <2a93957274cc4503bb7915c804182801@hi-jena.gsi.de>,
	<32E3DE33-8FAE-40A3-A3FE-44C14924D590@chalmers.se>
Message-ID: <de5cf41191864c8083f002af44d57248@hi-jena.gsi.de>

Dear Hans,


just to close this issue, here is what we are doing now to set the VULOM to the right firmware. As it is possible that a DAQ system is operating with different software versions that also require different VULOM firmware, we now load the right firmware 'on-the-fly' as part of the startup of the DAQ system. After loading of the firmware a short waiting time is necessary before the VULOM is ready again.


---------------------------------------------------

# Obtain the right firmware

VULOM4_FW=`cat ${TRLOII_PATH}/fw/vulom4b_trlo/trlo_defs.h | grep MD5SUM_STAMP | sed 's/.*0x//'`
export VULOM4_CTRL=$trloiipath/trloctrl/fw_${VULOM4_FW}_trlo/bin_${machine}_${version}/trlo_ctrl


# setup local constants
addr=3
firmware_region=2

# restarting the VULOM to set the correct firmware version
echo "Restarting VULOM with firmware region" $firmware_region
$TRLOII_PATH/bin/vulomflash --addr=$addr --restart=$firmware_region
echo "Waiting for VULOM to answer..."
exit_code=1
while [ $exit_code -ne 0 ] ; do
    $TRLOII_PATH/bin/vulomflash --addr=$addr --read > /dev/null 2>&1
    exit_code=$?
    if [ $exit_code -ne 0 ] ; then
        sleep 1 #sleep for 1 sec before retrying
    fi
done


# Trigger on VULOM pulser
$VULOM4_CTRL --addr=$addr --clear-setup --config=vulom.trlo standalone module_trigger

----------------------------------------------------


If we understood the naming convention in the output of "bin/vulomflash --addr=$ADDR --readprogs" a bit better, we could probably derive the right firmware region on the VULOM from the name/number of the firmware (and check whether the right firmware is already loaded). At the moment, we set firmware_region by hand.


In the ideal world, we would then have an init script for the DAQ that takes care of the VULOM settings just by looking at the state of the software under TRLOII_PATH. This could help our guys a lot who have no idea about all this stuff :-)


Best greetings

Günter




From g.weber at hi-jena.gsi.de  Tue Feb 20 10:58:27 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Tue, 20 Feb 2024 09:58:27 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
Message-ID: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>

Dear friends,


I now grabbed a V560 module that was working fine in another DAQ system and put it into our test system.


The main.cfg looks like this:


log_level=spam # info, verbose, debug, spam

CRATE("MCAL") {
    GSI_VULOM(0x03000000) {
        timestamp = true # needed to get timestamps in the data output
    #   ecl=0..15
    }
    BARRIER
    CAEN_V560(0x333333300) {
        use_veto = true
    }
#   CAEN_V767A(0x03100000) {
#   }
}


Starting the DAQ now results in a freeze of the RIO4. A reset of the crate is necessary to talk to it again.


The problem occurs in the first slow init of the V560 module. To find the exact line, I added some output to CRATE.C:


10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
before push_log_level(module)
before a_crate->module_init_id = module->id
before module->props->init_slow(a_crate, module)
LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
before module_init_id_mark(a_crate, module)
before pop_log_level(module)
10: crate/crate.c:923: .Slow-init module[1]=CAEN_V560.
before push_log_level(module)
before a_crate->module_init_id = module->id
before module->props->init_slow(a_crate, module)


The CRATE.C code now looks like this:


    TAILQ_FOREACH(module, &a_crate->module_list, next) {
        if (NULL == module->props) {
            continue;
        }
        LOGF(info)(LOGL, "Slow-init module[%u]=%s.", module->id,
            keyword_get_string(module->type));
        printf("before push_log_level(module) \n");
        push_log_level(module);
        printf("before a_crate->module_init_id = module->id \n");
        a_crate->module_init_id = module->id;
        printf("before module->props->init_slow(a_crate, module) \n");
        if (!module->props->init_slow(a_crate, module)) {
            printf("before pop_log_level(module) \n");
            pop_log_level(module);
            printf("before goto crate_init_done \n");
            goto crate_init_done;
        }
        printf("before module_init_id_mark(a_crate, module) \n");
        module_init_id_mark(a_crate, module);
        printf("before pop_log_level(module) \n");
        pop_log_level(module);
    }


Thus, to me it looks like the check "if (!module->props->init_slow(a_crate, module)) ..." is doing something quite horrible to the RIO4.


This is unfortunate, because my original aim was to show that there is also a bug/mistake in readout_dt of the V560 module. But I did not get that far.


Do you have any idea what might cause the freezing of the RIO4?


Best greetings and many thanks

Günter



From g.weber at hi-jena.gsi.de  Tue Feb 20 13:54:22 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Tue, 20 Feb 2024 12:54:22 +0000
Subject: [subexp-daq] Question on default firmware for VULOM
In-Reply-To: <de5cf41191864c8083f002af44d57248@hi-jena.gsi.de>
References: <2a93957274cc4503bb7915c804182801@hi-jena.gsi.de>,
	<32E3DE33-8FAE-40A3-A3FE-44C14924D590@chalmers.se>,
	<de5cf41191864c8083f002af44d57248@hi-jena.gsi.de>
Message-ID: <bf091547e1914eb28608374f74496804@hi-jena.gsi.de>

Dear friends,


attached please find a script that automatically chooses the right firmware for the VULOM.


If you find time, please have a look at it and tell me if you think it is useful or if there is any mistake.


On my system it works fine so far.


Best greetings

Günter

-------------- next part --------------
#!/bin/sh

# hardware address of the VULOM
addr=3

# The following script checks whether the VULOM is running the firmware required by the software in TRLOII_PATH.
# If necessary, it looks up the memory region of the VULOM that holds the required firmware and restarts the VULOM from that region.
VULOM4_FW=`cat ${TRLOII_PATH}/fw/vulom4b_trlo/trlo_defs.h | grep MD5SUM_STAMP | sed 's/.*0x//'`
export VULOM4_CTRL=$trloiipath/trloctrl/fw_${VULOM4_FW}_trlo/bin_${machine}_${version}/trlo_ctrl
VULOM_FW_NOW=`${TRLOII_PATH}/bin/vulomflash --addr=$addr --read | grep "VOLUM+0 =>" | sed "s/.*0x//"`
echo "Current firmware on VULOM -> " $VULOM_FW_NOW
echo "Necessary firmware on VULOM -> " $VULOM4_FW
# quote both sides so that an empty value does not break the test
if [ "$VULOM4_FW" != "$VULOM_FW_NOW" ] ; then
    firmware_region=`${TRLOII_PATH}/bin/vulomflash --addr=$addr --readprogs | sed -n -e "/$VULOM4_FW/{s/^.*Rng \([0-9]\+\):.*$/\1/p;q}"`
    if [ -z "$firmware_region" ] ; then
        echo "Necessary firmware not found on VULOM!"
        exit 1
    else
        # restarting the VULOM to set the correct firmware version
        echo "Restarting VULOM with firmware region" $firmware_region
        $TRLOII_PATH/bin/vulomflash --addr=$addr --restart=$firmware_region
        echo "Waiting for VULOM to answer..."
        exit_code=1
        while [ $exit_code -ne 0 ] ; do
            # "&>" is a bash extension; a plain /bin/sh would run the command in the background
            $TRLOII_PATH/bin/vulomflash --addr=$addr --read > /dev/null 2>&1
            exit_code=$?
            if [ $exit_code -ne 0 ] ; then
                sleep 1 #sleep for 1 sec before retrying
            fi
        done
    fi
fi

From g.weber at hi-jena.gsi.de  Tue Feb 20 15:33:49 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Tue, 20 Feb 2024 14:33:49 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
Message-ID: <c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>

Dear friends,


I have now had a look at the system where the V560 was running. It was also set up by Bastian. There, the code for the V560 module is slightly different from the one included in the NURDLIB branch that I am using on the test system.


Maybe you can have a look at it.


I also could push the complete NURDLIB from this system, if this helps.


Best greetings

Günter



-------------- next part --------------
#ifndef MODULE_CAEN_V560_INTERNAL_H
#define MODULE_CAEN_V560_INTERNAL_H

#include <module/caen_v560/structs.h>

struct CaenV560Module {
	struct	Module module;
	uint32_t	address;
	struct	Map *sicy_map;
	struct	CaenV560Read volatile const *read;
	struct	CaenV560Write volatile *write;
	unsigned	use_veto;
};

#endif
-------------- next part --------------
#include <module/caen_v560/caen_v560.h>
#include <module/caen_v560/internal.h>
#include <module/map/map.h>
#include <nurdlib/config.h>
#include <nurdlib/log.h>
#include <util/math.h>
#include <util/string.h>

#define NAME "Caen v560"

MODULE_PROTOTYPES(caen_v560);

int
caen_v560_are_distinguishable(enum Keyword a_type)
{
	(void)a_type;
	LOGF(verbose)(LOGL, NAME" are_distinguishable.");
	return 1;
}

uint32_t
caen_v560_check_empty(struct Module *a_module)
{
	(void)a_module;
	return 0;
}

struct Module *
caen_v560_create_(struct Crate *a_crate, struct ConfigBlock const *a_block)
{
	struct CaenV560Module *v560;

	LOGF(verbose)(LOGL, NAME" create {");

	(void)a_crate;
	MODULE_CREATE(v560);
	v560->module.event_max = 32; /* no event buffer, arbitrary > 0 */
	v560->address = config_get_block_param_int32(a_block, 0);
	LOGF(verbose)(LOGL, "Address=%08x.", v560->address);

	LOGF(verbose)(LOGL, NAME" create }");

	return (void *)v560;
}

void
caen_v560_deinit(struct Module *a_module)
{
	struct CaenV560Module *v560;

	LOGF(verbose)(LOGL, NAME" deinit {");
	MODULE_CAST(KW_CAEN_V560, v560, a_module);
	map_unmap(&v560->sicy_map);
	LOGF(verbose)(LOGL, NAME" deinit }");
}

void
caen_v560_destroy(struct Module *a_module)
{
	(void)a_module;
	LOGF(verbose)(LOGL, NAME" destroy.");
}

uintptr_t
caen_v560_get_module_base(struct Module const *a_module)
{
	struct CaenV560Module *v560;
	uintptr_t base;

	LOGF(verbose)(LOGL, NAME" get_module_base {");
	MODULE_CAST(KW_CAEN_V560, v560, a_module);
	base = (uintptr_t)map_get_mapped_ptr(v560->sicy_map);
	LOGF(verbose)(LOGL, NAME" get_module_base(%p) }", (void *)base);
	return base;
}

int
caen_v560_init_fast(struct Crate *a_crate, struct Module *a_module)
{
	struct CaenV560Module *v560;

	(void)a_crate;
	LOGF(verbose)(LOGL, NAME" init_fast {");

	MODULE_CAST(KW_CAEN_V560, v560, a_module);

	v560->use_veto = config_get_boolean(a_module->config, KW_USE_VETO);
	LOGF(verbose)(LOGL, "use_veto = %s", v560->use_veto ? "yes" : "no");

	v560->write->scale_clear = 1;
	v560->write->vme_veto_reset = 1;

	LOGF(verbose)(LOGL, NAME" init_fast }");
	return 1;
}

int
caen_v560_init_slow(struct Crate *a_crate, struct Module *a_module)
{
	struct CaenV560Module *v560;
	void volatile *mapped_ptr;
	uint16_t id;

	(void)a_crate;
	LOGF(verbose)(LOGL, NAME" init_slow {");

	MODULE_CAST(KW_CAEN_V560, v560, a_module);

	v560->sicy_map = map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT,
	    0, 0, MAP_POKE_ARGS(*v560->read, fixed_code),
	    MAP_POKE_ARGS(*v560->write, scale_clear));
	mapped_ptr = map_get_mapped_ptr(v560->sicy_map);
	v560->read = mapped_ptr;
	v560->write = mapped_ptr;

	id = v560->read->fixed_code;
	if (0xfaf5 != id) {
		log_die(LOGL, "Fixed code=0x%04x != 0xfaf5, module borked?",
		    id);
	}
	id = v560->read->manufacturer_module_type;
	LOGF(verbose)(LOGL, "Manufacturer=%02x, module type=%03x (0x%04x).",
	    (0xfc00 & id) >> 10, 0x3ff & id, id);
	id = v560->read->version_serial_number;
	LOGF(verbose)(LOGL, "Version=%x, S/N=%03x (0x%04x).",
	    (0xf000 & id) >> 12, 0xfff & id, id);

	LOGF(verbose)(LOGL, NAME" init_slow }");
	return 1;
}

void
caen_v560_memtest(struct Module *a_module, enum Keyword a_mode)
{
	(void)a_module;
	(void)a_mode;
}

uint32_t
caen_v560_parse_data(struct Crate const *a_crate, struct Module *a_module,
    struct EventConstBuffer const *a_event_buffer, int a_do_pedestals)
{
	(void)a_crate;
	(void)a_module;
	(void)a_event_buffer;
	(void)a_do_pedestals;
	return 0;
}

uint32_t
caen_v560_readout(struct Crate *a_crate, struct Module *a_module, struct
    EventBuffer *a_event_buffer)
{
	struct CaenV560Module *v560;
	uint32_t *outp;
	uint32_t result = 0;
	int ch;

	(void)a_crate;
	LOGF(spam)(LOGL, NAME" readout {");

	MODULE_CAST(KW_CAEN_V560, v560, a_module);
	outp = a_event_buffer->ptr;

	if (v560->use_veto) {
		v560->write->vme_veto_set = 1;
	}

	*outp++ = 0xc560c560;
	for (ch = 0; ch < 16; ch++) {
		*outp++ = v560->read->counter[ch];
	}

	if (v560->use_veto) {
		v560->write->vme_veto_reset = 1;
	}

	EVENT_BUFFER_ADVANCE(*a_event_buffer, outp);
	LOGF(spam)(LOGL, NAME" readout(0x%08x) }", result);
	return result;
}

uint32_t
caen_v560_readout_dt(struct Crate *a_crate, struct Module *a_module)
{
	(void)a_crate;
	LOGF(spam)(LOGL, NAME" readout_dt {");
	a_module->event_counter.value++;
	LOGF(spam)(LOGL, NAME" readout_dt(ctr=0x%08x) }",
	    a_module->event_counter.value);
	return 0;
}

void
caen_v560_setup_()
{
	MODULE_SETUP(caen_v560, 0);
}
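
The readout function in the file above writes a fixed 17-word payload: the marker 0xc560c560 followed by the 16 counter values, while caen_v560_parse_data is still a stub. A minimal sketch of how such a payload could be checked offline, assuming nothing else is interleaved with it. The function name and return convention are hypothetical and not part of nurdlib:

```c
#include <stddef.h>
#include <stdint.h>

#define V560_HEADER 0xc560c560u /* marker written by caen_v560_readout */
#define V560_CHANNELS 16

/*
 * Hypothetical helper, not nurdlib API: checks that a_buf starts with the
 * marker and is long enough to hold all counters, then copies the counters
 * to a_counters.  Returns the number of 32-bit words consumed, 0 on failure.
 */
size_t
v560_payload_unpack(uint32_t const *a_buf, size_t a_words,
    uint32_t a_counters[V560_CHANNELS])
{
	size_t ch;

	if (a_words < 1 + V560_CHANNELS || V560_HEADER != a_buf[0]) {
		return 0;
	}
	for (ch = 0; ch < V560_CHANNELS; ++ch) {
		a_counters[ch] = a_buf[1 + ch];
	}
	return 1 + V560_CHANNELS;
}
```

A caller would pass the event payload and its length in 32-bit words, and treat a return value of 0 as a malformed or truncated block.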
-------------- next part --------------
#ifndef MODULE_CAEN_V560_CAEN_V560_H
#define MODULE_CAEN_V560_CAEN_V560_H

#include <module/module.h>

MODULE_INTERFACE(caen_v560);

#endif
-------------- next part --------------
nabm

Interrupt Vector   0x0004 16 RW
Interrupt Level    0x0006 16 RW
Enable Interrupt   0x0008 16 RW
Disable Interrupt  0x000A 16 RW
Clear Interrupt    0x000C 16 RW
Request            0x000E 16 RW

Counter            0x0010..0x004C 32 R

Scale clear        0x0050 16 RW
VME VETO set       0x0052 16 RW
VME VETO reset     0x0054 16 RW
Scale Increase     0x0056 16 RW
Scale Status       0x0058 16 R

Fixed code                0x00FA 16 R
Manufacturer Module Type  0x00FC 16 R
Version Serial Number     0x00FE 16 R
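
The register list above can be written down as a C struct whose field offsets match the listed addresses. This is only a sketch under the assumption that the unlisted ranges 0x0000..0x0003 and 0x005A..0x00F9 are reserved; the struct and function names are hypothetical and are not the definitions used by nurdlib:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout derived from the register list, not nurdlib code. */
struct V560Registers {
	uint16_t reserved_0000[2];         /* 0x0000..0x0003, not listed */
	uint16_t interrupt_vector;         /* 0x0004 */
	uint16_t interrupt_level;          /* 0x0006 */
	uint16_t enable_interrupt;         /* 0x0008 */
	uint16_t disable_interrupt;        /* 0x000A */
	uint16_t clear_interrupt;          /* 0x000C */
	uint16_t request;                  /* 0x000E */
	uint32_t counter[16];              /* 0x0010..0x004C */
	uint16_t scale_clear;              /* 0x0050 */
	uint16_t vme_veto_set;             /* 0x0052 */
	uint16_t vme_veto_reset;           /* 0x0054 */
	uint16_t scale_increase;           /* 0x0056 */
	uint16_t scale_status;             /* 0x0058 */
	uint16_t reserved_005a[80];        /* 0x005A..0x00F9, not listed */
	uint16_t fixed_code;               /* 0x00FA */
	uint16_t manufacturer_module_type; /* 0x00FC */
	uint16_t version_serial_number;    /* 0x00FE */
};

/* Returns 1 if the compiler placed the fields at the listed offsets. */
int
v560_layout_ok(void)
{
	return 0x0004 == offsetof(struct V560Registers, interrupt_vector) &&
	    0x000E == offsetof(struct V560Registers, request) &&
	    0x0010 == offsetof(struct V560Registers, counter) &&
	    0x0050 == offsetof(struct V560Registers, scale_clear) &&
	    0x0058 == offsetof(struct V560Registers, scale_status) &&
	    0x00FA == offsetof(struct V560Registers, fixed_code) &&
	    0x00FE == offsetof(struct V560Registers, version_serial_number);
}
```

For real hardware access the struct would be reached through a volatile pointer into the mapped VME window; the offset check is cheap enough to run once at start-up.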
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rules.mk
Type: application/octet-stream
Size: 161 bytes
Desc: rules.mk
URL: <http://lists.chalmers.se/pipermail/subexp-daq/attachments/20240220/8708cd77/attachment-0001.obj>

From f96hajo at chalmers.se  Tue Feb 20 19:41:06 2024
From: f96hajo at chalmers.se (=?ISO-8859-15?Q?H=E5kan_T_Johansson?=)
Date: Tue, 20 Feb 2024 19:41:06 +0100
Subject: [subexp-daq] Question on default firmware for VULOM
In-Reply-To: <bf091547e1914eb28608374f74496804@hi-jena.gsi.de>
References: <2a93957274cc4503bb7915c804182801@hi-jena.gsi.de>,
	<32E3DE33-8FAE-40A3-A3FE-44C14924D590@chalmers.se>,
	<de5cf41191864c8083f002af44d57248@hi-jena.gsi.de>
	<bf091547e1914eb28608374f74496804@hi-jena.gsi.de>
Message-ID: <9a62763e-663d-3303-2130-d1f878bdaf62@chalmers.se>


Dear Günter,

question: are you using different firmwares due to the need to have 
different amounts of various kinds of logic inside?

Or is it just to run 'older' versions, due to software incompatibilities?

If the latter, then the long-term approach would be to try to rectify that 
by moving to newer versions when possible.

We have been working on forward-porting the sis3316 branch you are using. 
All the nurdlib-common things have been merged with the master branch 
already.

The sis3316 changes are also done, but need testing.  We have no 
experience with, or direct access to, such modules.  We have tried to be 
careful, but it is easy to overlook things.  A separate mail will follow.

Cheers,
Håkan


On Tue, 20 Feb 2024, Weber, Guenter Dr. wrote:

> 
> Dear friends,
> 
> 
> attached please find a script that automatically chooses the right firmware
> for the VULOM.
> 
> 
> If you find time, please have a look at it and tell me if you think it is
> useful or if there is any mistake.
> 
> 
> On my system it works fine so far.
> 
> 
> 
> 
> Best greetings
> 
> Günter
> 
> ____________________________________________________________________________
> From: subexp-daq <subexp-daq-bounces at lists.chalmers.se> on behalf of Weber,
> Guenter Dr. <g.weber at hi-jena.gsi.de>
> Sent: Tuesday, 20 February 2024 10:46:52
> To: Hans Toshihide Törnqvist; Discuss use of Nurdlib, TRLO II, drasi and
> UCESB.
> Subject: Re: [subexp-daq] Question on default firmware for VULOM
> 
> Dear Hans,
> 
> 
> just to close this issue, here is what we are doing now to set the VULOM to
> the right firmware. As it is possible that a DAQ system is operating with
> different software versions that also require different VULOM firmware, we
> now load the right firmware 'on-the-fly' as part of the startup of the DAQ
> system. After loading of the firmware a short waiting time is necessary
> before the VULOM is ready again.
> 
> 
> ---------------------------------------------------
> 
> # Obtain the right firmware
> 
> VULOM4_FW=`cat ${TRLOII_PATH}/fw/vulom4b_trlo/trlo_defs.h | grep MD5SUM_STAMP | sed 's/.*0x//'`
> export VULOM4_CTRL=$trloiipath/trloctrl/fw_${VULOM4_FW}_trlo/bin_${machine}_${version}/trlo_ctrl
> 
> # setup local constants
> addr=3
> firmware_region=2
> 
> # restarting the VULOM to set the correct firmware version
> echo "Restarting VULOM with firmware region" $firmware_region
> $TRLOII_PATH/bin/vulomflash --addr=$addr --restart=$firmware_region
> echo "Waiting for VULOM to answer..."
> exit_code=1
> while [ $exit_code -ne 0 ] ; do
>     $TRLOII_PATH/bin/vulomflash --addr=$addr --read &> /dev/null
>     exit_code=$?
>     if [ $exit_code -ne 0 ] ; then
>         sleep 1 # sleep for 1 sec before retrying
>     fi
> done
> 
> 
> # Trigger on VULOM pulser
> $VULOM4_CTRL --addr=$addr --clear-setup --config=vulom.trlo standalone module_trigger
> 
> ----------------------------------------------------
> 
> 
> If we did understand the naming convention in "bin/vulomflash --addr=$ADDR
> --readprogs" a bit better, we would probably also be able to estimate from
> the name/number of the firmware the right firmware region on the VULOM (and
> we could check if the right firmware was already loaded). But at the moment,
> we set firmware_region by hand.
> 
> 
> In the ideal world, we would then have an init script for the DAQ that takes
> care of the VULOM settings just by looking at the state of the software
> under TRLOII_PATH. This could help our guys a lot who have no idea about all
> this stuff :-)
> 
> 
> 
> 
> Best greetings
> 
> Günter
> 
> 
> 
> ____________________________________________________________________________
> From: Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
> Sent: Monday, 19 February 2024 10:37:37
> To: Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber, Guenter Dr.;
> Discuss use of Nurdlib, TRLO II, drasi and UCESB.
> Subject: Re: [subexp-daq] Question on default firmware for VULOM
> Dear Günter,
> 
> As I understand it, the firmware you would like to have has been flashed and
> tested quite a lot and seems to work?
> 
> Then you can flash program area 0, but you will need the --force option too.
> Area 0 will only be flashed if the firmware in question has been flashed
> into another area, as in your present case.
> 
> After that it will load after every power cycle.
> 
> Cheers,
> Hans
> 
> 
> "Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> wrote: (19 February 2024
> 10:15:37 CET)
>
>       Dear friends,
> 
>
>       I restarted the crate and now get the following error message
>       when starting the DAQ:
> 
>
>       10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
>       LOG: TRLO: MD5SUM: 0x426cb99c (CT: 5bba6507 = 2018-10-07
>       19:56:55 UTC)
>       WARNING: Known firmware (alias): 0x6e4ba1a9.
>       WARNING: Known firmware (alias): 0x1409285e.
>       WARNING: Known firmware (alias): 0xa73c5093.
>       WARNING: Known firmware (alias): 0x6e4ba1a9.
>       WARNING: Known firmware (alias): 0x1409285e.
>       FATAL: TRLO firmware wrong: 0x426cb99c, expected 0x6e4ba1a9 or
>       alias.
> 
> 
> I assume that upon power-on the VULOM is booting with a default
> firmware that is not the one necessary to run with the most recent
> version of NURDLIB, etc.
> 
> 
> How can I tell the VULOM which firmware version from its memory it
> should load as default?
> 
> 
> 
> 
> Thank you very much!
> 
> 
> 
> 
> Best greetings
> 
> Günter
> 
> 
> 
>
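
The log quoted above shows the start-up check rejecting firmware 0x426cb99c because it is neither the expected stamp 0x6e4ba1a9 nor one of the listed aliases. A minimal sketch of that kind of comparison, using only the stamps printed in the log. The table and function names are hypothetical, and this is not how TRLO II or nurdlib actually implement the check:

```c
#include <stddef.h>
#include <stdint.h>

/* Expected stamp and aliases, copied from the quoted log output. */
static uint32_t const c_known_fw[] = {
	0x6e4ba1a9u, 0x1409285eu, 0xa73c5093u
};

/* Returns 1 if a_md5sum matches the expected stamp or an alias, else 0. */
int
fw_is_known(uint32_t a_md5sum)
{
	size_t i;

	for (i = 0; i < sizeof c_known_fw / sizeof c_known_fw[0]; ++i) {
		if (c_known_fw[i] == a_md5sum) {
			return 1;
		}
	}
	return 0;
}
```

With this table the power-on firmware 0x426cb99c is rejected, while 0x1409285e, which appears in a later log on this list, is accepted.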

From f96hajo at chalmers.se  Tue Feb 20 19:59:11 2024
From: f96hajo at chalmers.se (=?ISO-8859-15?Q?H=E5kan_T_Johansson?=)
Date: Tue, 20 Feb 2024 19:59:11 +0100
Subject: [subexp-daq] sis3316 updates
Message-ID: <ac4a724e-1e3c-0ac2-9b94-d09bd5942431@chalmers.se>


Dear Günter, all sis3316 nurdlib users,

the changes to the nurdlib sis3316 code that have been used at Jena (and 
possibly other places) lived on a branch whose branch point was about 
three years ago.  They have now been forward-ported to approximately 
current master.

It is available as the 'rebasing_sis3316' branch at
https://gitlab.com/chalmers-subexp/nurdlib

Since we have neither direct access to nor experience of our own with those 
modules, the testing needs to be done by someone with access to sis3316 
hardware.

Note: it is not necessary to have used the forked branch to provide 
helpful test results!  Knowing that the new changes do not break other 
sis3316-behaviour would also be very helpful.

Thus, this is a call for help!  ;)

As can be seen in the repository graph

https://gitlab.com/chalmers-subexp/nurdlib/-/network/master?ref_type=heads

there are about 20 commits.  Some of them are followed by fixup commits: 
we first kept each merge minimal and then fixed compilation issues 
separately, so that any mistakes are easier to trace.  I.e.: when a commit 
is followed by fixup commits, it only makes sense to test the last fixup 
commit in that sequence.

I would suggest the following test strategy:

0) First test the 'rebasing_sis3316' branch.
    If we are lucky - it just works!

1) If 0) fails, then test the fork point, i.e. currently commit
    e2163738.  This is a close ancestor of the nurdlib master branch, and
    thus contains no sis3316 changes beyond those that have been in the
    master branch so far.

    When testing that, it will probably be necessary to comment out some
    settings, which have been implemented in the new branch.

    If this fails, nurdlib master has a problem, which I think should be
    looked into before proceeding further.

2) Move forward, commit by commit (stepping directly to the last fixup
    commit where fixups follow a commit).  For each such commit, test and
    see if it still works.  If a commit implements a new option, also test
    that one.

This way, we should hopefully be able to pin-point any issues.
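
The rule in step 2, that only the last commit of a commit-plus-fixups sequence is worth testing, can be sketched as a small selection routine. This is an illustration only; the function name and the array-of-flags representation of the commit history are assumptions, not something provided by the repository:

```c
#include <stddef.h>

/*
 * Hypothetical helper.  a_is_fixup[i] is non-zero if commit i (oldest
 * first) is a fixup of the commit before it.  Writes the indices of the
 * commits worth testing to a_out (room for a_n entries) and returns how
 * many were written.
 */
size_t
test_points(int const *a_is_fixup, size_t a_n, size_t *a_out)
{
	size_t i, n_out = 0;

	for (i = 0; i < a_n; ++i) {
		/* A commit directly followed by a fixup is not tested. */
		if (i + 1 < a_n && a_is_fixup[i + 1]) {
			continue;
		}
		a_out[n_out++] = i;
	}
	return n_out;
}
```

For a history of six commits where the second, third and sixth are fixups, the selected test points are the third, fourth and sixth commits.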

Best regards,
Håkan

From f96hajo at chalmers.se  Tue Feb 20 20:13:32 2024
From: f96hajo at chalmers.se (=?ISO-8859-15?Q?H=E5kan_T_Johansson?=)
Date: Tue, 20 Feb 2024 20:13:32 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>
Message-ID: <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>


Dear Günter,

I took the files you provided and for comparison put them in a branch 
'old_caen_v560'.

git diff origin/old_caen_v560..origin/master

however does not show anything which is suspicious to me.  Perhaps Hans 
can spot something.

Otherwise, the only idea I can come up with is to continue to bisect the 
code inside slow init.

However, before that, I would suggest adding

  fflush(stdout); sleep(1);

after each printf statement, so that one can be quite sure that no 
printout is lost when the RIO crashes, i.e. that execution had not 
actually gotten further than the prints show.
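
The flush-and-wait idea can be wrapped in a small helper so that every trace line is forced out before the next statement runs. A minimal sketch, assuming a POSIX system for sleep(); the function name and the delay parameter are hypothetical and not part of nurdlib:

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: prints a_msg on its own line, flushes the stream
 * and optionally waits, so the line has a chance to reach the terminal
 * or log before a subsequent crash.  Returns 1 on success, 0 on error.
 */
int
trace_point(FILE *a_out, char const *a_msg, unsigned a_delay_s)
{
	if (fprintf(a_out, "%s\n", a_msg) < 0) {
		return 0;
	}
	if (0 != fflush(a_out)) {
		return 0;
	}
	if (a_delay_s > 0) {
		sleep(a_delay_s);
	}
	return 1;
}
```

A call such as trace_point(stdout, "before map_map", 1) placed before each suspect statement then gives the same effect as the printf, fflush and sleep triple.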

Best regards,
Håkan


On Tue, 20 Feb 2024, Weber, Guenter Dr. wrote:

> 
> Dear friends,
> 
> 
> I now had a look at the system where the V560 was running. It was also set up
> by Bastian. And there the code for the V560 module is slightly different
> from the one included in the NURDLIB branch that I am using on the test
> system.
> 
> 
> Maybe you can have a look at it.
> 
> 
> I also could push the complete NURDLIB from this system, if this helps.
> 
> 
> 
> 
> Best greetings
> 
> Günter
> 
> 
> 
> 
> ____________________________________________________________________________
> From: subexp-daq <subexp-daq-bounces at lists.chalmers.se> on behalf of Weber,
> Guenter Dr. <g.weber at hi-jena.gsi.de>
> Sent: Tuesday, 20 February 2024 10:58:27
> To: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
> Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
> 
> Dear friends,
> 
> 
> I now grabbed a V560 module that was working fine in another DAQ system and
> put it into our test system.
> 
> 
> The main.cfg looks like this:
> 
> 
> log_level=spam # info, verbose, debug, spam
> 
> CRATE("MCAL") {
>     GSI_VULOM(0x03000000) {
>         timestamp = true # needed to get timestamps in the data output
>     #   ecl=0..15
>     }
>     BARRIER
>     CAEN_V560(0x333333300) {
>         use_veto = true
>     }
> #   CAEN_V767A(0x03100000) {
> #   }
> }
> 
> Starting the DAQ now results in a freeze of the RIO4. A reset of the crate
> is necessary to talk to it again.
> 
> 
> The problem occurs in the first slow init of the V560 module. To find the
> exact line, I added some output to CRATE.C:
> 
> 
> 10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
> before push_log_level(module)
> before a_crate->module_init_id = module->id
> before module->props->init_slow(a_crate, module)
> LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
> before module_init_id_mark(a_crate, module)
> before pop_log_level(module)
> 10: crate/crate.c:923: .Slow-init module[1]=CAEN_V560.
> before push_log_level(module)
> before a_crate->module_init_id = module->id
> before module->props->init_slow(a_crate, module)
> 
> 
> The CRATE.C code now looks like this:
> 
> 
>     TAILQ_FOREACH(module, &a_crate->module_list, next) {
>         if (NULL == module->props) {
>             continue;
>         }
>         LOGF(info)(LOGL, "Slow-init module[%u]=%s.", module->id,
>             keyword_get_string(module->type));
>         printf("before push_log_level(module) \n");
>         push_log_level(module);
>         printf("before a_crate->module_init_id = module->id \n");
>         a_crate->module_init_id = module->id;
>         printf("before module->props->init_slow(a_crate, module) \n");
>         if (!module->props->init_slow(a_crate, module)) {
>             printf("before pop_log_level(module) \n");
>             pop_log_level(module);
>             printf("before goto crate_init_done \n");
>             goto crate_init_done;
>         }
>         printf("before module_init_id_mark(a_crate, module) \n");
>         module_init_id_mark(a_crate, module);
>         printf("before pop_log_level(module) \n");
>         pop_log_level(module);
>     }
> 
> Thus, to me it looks like the check "if (!module->props->init_slow(a_crate,
> module)) ..." is doing something quite horrible to the RIO4.
> 
> 
> This is unfortunate, because my original aim was to show that there is also
> a bug/mistake in readout_dt of the V560 module. But I did not get that far.
> 
> 
> Do you have any idea what might cause the freezing of the RIO4?
> 
> 
> 
> 
> Best greetings and many thanks
> 
> Günter
> 
> 
> 
> 
> 
>
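
One detail in the quoted main.cfg may be worth checking: the address given for CAEN_V560, 0x333333300, has nine hex digits and therefore needs 34 bits, while the driver source attached earlier in this archive reads the address with config_get_block_param_int32. Whether this is related to the freeze is not established here. A minimal sketch of a range check for such a value; the function name is hypothetical and this is not nurdlib's config parser:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parses a_str as an unsigned integer (0x prefix
 * selects hex) and stores it in *a_out if it fits in 32 bits.
 * Returns 1 on success, 0 on a parse error or out-of-range value.
 */
int
parse_addr32(char const *a_str, uint32_t *a_out)
{
	char *end;
	unsigned long long value;

	errno = 0;
	value = strtoull(a_str, &end, 0);
	if (0 != errno || end == a_str || '\0' != *end) {
		return 0;
	}
	if (value > 0xffffffffULL) {
		return 0;
	}
	*a_out = (uint32_t)value;
	return 1;
}
```

With this check the VULOM address 0x03000000 is accepted and 0x333333300 is rejected as too wide for a 32-bit address.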

From f96hajo at chalmers.se  Tue Feb 20 20:15:06 2024
From: f96hajo at chalmers.se (=?ISO-8859-15?Q?H=E5kan_T_Johansson?=)
Date: Tue, 20 Feb 2024 20:15:06 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>
Message-ID: <7aa42eb8-7a25-a1d0-3179-1c9853922807@chalmers.se>


Oh, and please do also push the complete nurdlib branch from that system.
Who knows what other changes it might contain :-)

Cheers,
Håkan



From g.weber at hi-jena.gsi.de  Wed Feb 21 10:18:29 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Wed, 21 Feb 2024 09:18:29 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>,
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
Message-ID: <aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>

Dear Håkan,


thanks for the hint to flush and sleep. Indeed, I now see that the crash happens in init_slow of V560 at this line:


    v560->sicy_map = map_map(v560->address, MAP_SIZE, KW_NOBLT,
        0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));


Maybe the code is reading from or writing to a memory location that it had better not touch?

This problematic line is then followed by:


    id = MAP_READ(v560->sicy_map, fixed_code);


The corresponding line in the V560 code on the system that was running with this module looks like this:


    v560->sicy_map = map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT,
        0, 0, MAP_POKE_ARGS(*v560->read, fixed_code),
        MAP_POKE_ARGS(*v560->write, scale_clear));


And is followed by:


    mapped_ptr = map_get_mapped_ptr(v560->sicy_map);
    v560->read = mapped_ptr;
    v560->write = mapped_ptr;


Maybe you already have an idea what causes the problem here?


I will now go to the system that was running with V560 and make a push of the NURDLIB.


Best greetings

Günter



From g.weber at hi-jena.gsi.de  Wed Feb 21 11:03:41 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Wed, 21 Feb 2024 10:03:41 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>,
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>,
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
Message-ID: <1a521b6bf7a64588b0bc8754952eb565@hi-jena.gsi.de>

Ok. Push was done.


Counting objects: 13558, done.
Delta compression using up to 32 threads.
Compressing objects: 100% (3619/3619), done.
Writing objects: 100% (13558/13558), 2.63 MiB | 4.29 MiB/s, done.
Total 13558 (delta 9974), reused 13407 (delta 9863)
remote: Resolving deltas: 100% (9974/9974), done.
remote:
remote: To create a merge request for caen_v560, visit:
remote:   https://gitlab.com/chalmers-subexp/nurdlib/-/merge_requests/new?merge_request%5Bsource_branch%5D=caen_v560
remote:
To gitlab.com:chalmers-subexp/nurdlib.git
 * [new branch]      caen_v560 -> caen_v560


Best greetings
Günter


________________________________
Von: subexp-daq <subexp-daq-bounces at lists.chalmers.se> im Auftrag von Weber, Guenter Dr. <g.weber at hi-jena.gsi.de>
Gesendet: Mittwoch, 21. Februar 2024 10:18:29
An: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
Betreff: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module


Dear H?kan,


thanks for the hint to flush and sleep. Indeed, I now see that the crash happens in init_slow of V560 at this line:


    v560->sicy_map = map_map(v560->address, MAP_SIZE, KW_NOBLT,
        0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));


Maybe the code is accessing/writing into a memory location that it should better not touch?

This problematic line is then followed by:


    id = MAP_READ(v560->sicy_map, fixed_code);


The corresponding line in the V560 code on the system that was running with this module looks like this:


    v560->sicy_map = map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT,
        0, 0, MAP_POKE_ARGS(*v560->read, fixed_code),
        MAP_POKE_ARGS(*v560->write, scale_clear));


And is followed by:


    mapped_ptr = map_get_mapped_ptr(v560->sicy_map);
    v560->read = mapped_ptr;
    v560->write = mapped_ptr;


Maybe you already have an idea what causes the problem here?


I will now go to the system that was running with V560 and make a push of the NURDLIB.


Best greetings

G?nter


________________________________
Von: subexp-daq <subexp-daq-bounces at lists.chalmers.se> im Auftrag von H?kan T Johansson <f96hajo at chalmers.se>
Gesendet: Dienstag, 20. Februar 2024 20:13:32
An: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
Betreff: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module


Dear G?nter,

I took the files you provided and for comparison put them in a branch
'old_caen_v560'.

git diff origin/old_caen_v560..origin/master

however does not show anything which is suspicious to me.  Perhaps Hans
can spot something.

Otherwise, the only idea I can come up with is to continue to bisect the
code inside slow init.

However, before that, I would suggest to add

  fflush(stdout); sleep(1);

after each printf statement, such that one can be quite sure that the
printout is not eaten when the RIO crash happens.  I.e. that it actually
had gotten further than shown by the prints.

Best regards,
H?kan


On Tue, 20 Feb 2024, Weber, Guenter Dr. wrote:

>
> Dear friends,
>
>
> I now had a look at the system where the V560 was running. It was also setup
> by Bastian. And there the code for the V560 module is slightly different
> from the one included in the NURDLIB branch that I am using on the test
> system.
>
>
> Maybe you can have a look at it.
>
>
> I also could push the complete NURDLIB from this system, if this helps.
>
>
>
>
> Best greetings
>
> G?nter
>
>
>
>
> ____________________________________________________________________________
> Von: subexp-daq <subexp-daq-bounces at lists.chalmers.se> im Auftrag von Weber,
> Guenter Dr. <g.weber at hi-jena.gsi.de>
> Gesendet: Dienstag, 20. Februar 2024 10:58:27
> An: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
> Betreff: [subexp-daq] Report of a possible bug of the CAEN_V560 module
>
> Dear friends,
>
>
> I now grabbed a V560 module that was working fine in another DAQ system and
> put it into our test system.
>
>
> The main.cfg looks like this:
>
>
> log_level=spam # info, verbose, debug, spam
>
> CRATE("MCAL") {
>     GSI_VULOM(0x03000000) {
>         timestamp = true # needed to get timestamps in the data output
>     #   ecl=0..15
>     }
>     BARRIER
>     CAEN_V560(0x333333300) {
>         use_veto = true
>     }
> #   CAEN_V767A(0x03100000) {
> #   }
> }
>
> Starting the DAQ now results in a freeze of the RIO4. A reset of the crate
> is necessary to talk to it again.
>
>
> The problem occurs in the first slow init of the V560 module. To find the
> exact line, I added some output to CRATE.C:
>
>
> 10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
> before push_log_level(module)
> before a_crate->module_init_id = module->id
> before module->props->init_slow(a_crate, module)
> LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
> before module_init_id_mark(a_crate, module)
> before pop_log_level(module)
> 10: crate/crate.c:923: .Slow-init module[1]=CAEN_V560.
> before push_log_level(module)
> before a_crate->module_init_id = module->id
> before module->props->init_slow(a_crate, module)
>
>
> The CRATE.C code now looks like this:
>
>
>     TAILQ_FOREACH(module, &a_crate->module_list, next) {
>         if (NULL == module->props) {
>             continue;
>         }
>         LOGF(info)(LOGL, "Slow-init module[%u]=%s.", module->id,
>             keyword_get_string(module->type));
>         printf("before push_log_level(module) \n");
>         push_log_level(module);
>         printf("before a_crate->module_init_id = module->id \n");
>         a_crate->module_init_id = module->id;
>         printf("before module->props->init_slow(a_crate, module) \n");
>         if (!module->props->init_slow(a_crate, module)) {
>             printf("before pop_log_level(module) \n");
>             pop_log_level(module);
>             printf("before goto crate_init_done \n");
>             goto crate_init_done;
>         }
>         printf("before module_init_id_mark(a_crate, module) \n");
>         module_init_id_mark(a_crate, module);
>         printf("before pop_log_level(module) \n");
>         pop_log_level(module);
>     }
>
> Thus, to me it looks like the check "if (!module->props->init_slow(a_crate,
> module)) ..." is doing something quite horrible to the RIO4.
>
>
> This is unfortunate, because my original aim was to show that there is also
> a bug/mistake in readout_dt of the V560 module. But I did not get that far.
>
>
> Do you have any idea what might cause the freezing of the RIO4?
>
>
>
>
> Best greetings and many thanks
>
> Günter
>
>
>
>
>
>

From hans.tornqvist at chalmers.se  Wed Feb 21 11:14:44 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Wed, 21 Feb 2024 11:14:44 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>,
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
Message-ID: <E401E502-273E-4E8F-9474-95DAF1F5767A@chalmers.se>

Dear Günter,

Before mapping, map_map tries to read and write the given registers using a "safe" but slower access method, which nurdlib calls "poking". Maybe this access method is not safe enough on the RIO4 you have, and one of the two pokes fails horribly...
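
To illustrate what a guarded probe access can look like in general, here is a generic POSIX sketch. It is not nurdlib's map_map or poke code, and all names (probe_guarded, access_ok, access_faulting) are hypothetical. It assumes that a failed access is delivered to the process as SIGBUS, which may not hold on every platform; the failing access is only simulated here with raise(SIGBUS).

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Jump target used to leave the SIGBUS handler. */
static sigjmp_buf g_probe_env;

static void
probe_on_sigbus(int signum)
{
	(void)signum;
	siglongjmp(g_probe_env, 1);
}

/*
 * Run 'access' with SIGBUS temporarily redirected to a local handler.
 * Returns 1 if the access completed, 0 if a bus error was caught,
 * and -1 if the handler could not be installed.
 */
static int
probe_guarded(void (*access)(void *), void *arg)
{
	struct sigaction sa, old;

	sa.sa_handler = probe_on_sigbus;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	if (sigaction(SIGBUS, &sa, &old) != 0) {
		return -1;
	}
	/* savemask = 1 so that SIGBUS is unblocked again after the jump. */
	if (sigsetjmp(g_probe_env, 1) == 0) {
		access(arg);
		sigaction(SIGBUS, &old, NULL);
		return 1;
	}
	sigaction(SIGBUS, &old, NULL);
	return 0;
}

/* Stand-ins for real register accesses, for demonstration only. */
static void
access_ok(void *arg)
{
	*(int *)arg = 1;
}

static void
access_faulting(void *arg)
{
	(void)arg;
	raise(SIGBUS); /* simulates a failing bus access */
}
```

With these stand-ins, probe_guarded(access_ok, &flag) returns 1 and probe_guarded(access_faulting, NULL) returns 0. If the hardware freezes the CPU instead of raising a signal, a guard like this cannot help.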

Could you please double-check the module address? Could you also try using bin/rwdump to read any register in the V560, to see whether the module is accessible at all and to rule out a problem with the module implementation in nurdlib?

Something like bin/rwdump -a0x33333300 -r16

Actually, the address 0x33333300 looks weird to me; maybe it should be 0x33330000?
Also, for reading, try register offsets fa, fc, fe with 16-bit accesses; they should have some interesting values.

Cheers,
Hans


"Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> wrote: (21 February 2024 10:18:29 CET)
>Dear Håkan,
>
>
>thanks for the hint to flush and sleep. Indeed, I now see that the crash happens in init_slow of V560 at this line:
>
>
>    v560->sicy_map = map_map(v560->address, MAP_SIZE, KW_NOBLT,
>        0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));
>
>
>Maybe the code is reading from or writing to a memory location that it should not touch?
>
>This problematic line is then followed by:
>
>
>    id = MAP_READ(v560->sicy_map, fixed_code);
>
>
>The corresponding line in the V560 code on the system that was running with this module looks like this:
>
>
>    v560->sicy_map = map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT,
>        0, 0, MAP_POKE_ARGS(*v560->read, fixed_code),
>        MAP_POKE_ARGS(*v560->write, scale_clear));
>
>
>And is followed by:
>
>
>    mapped_ptr = map_get_mapped_ptr(v560->sicy_map);
>    v560->read = mapped_ptr;
>    v560->write = mapped_ptr;
>
>
>Maybe you already have an idea what causes the problem here?
>
>
>I will now go to the system that was running with V560 and make a push of the NURDLIB.
>
>
>
>
>Best greetings
>
>Günter
>
>
>________________________________
>From: subexp-daq <subexp-daq-bounces at lists.chalmers.se> on behalf of Håkan T Johansson <f96hajo at chalmers.se>
>Sent: Tuesday, 20 February 2024 20:13:32
>To: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
>Subject: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module
>
>
>Dear Günter,
>
>I took the files you provided and for comparison put them in a branch
>'old_caen_v560'.
>
>git diff origin/old_caen_v560..origin/master
>
>however does not show anything which is suspicious to me.  Perhaps Hans
>can spot something.
>
>Otherwise, the only idea I can come up with is to continue to bisect the
>code inside slow init.
>
>However, before that, I would suggest adding
>
>  fflush(stdout); sleep(1);
>
>after each printf statement, so that one can be reasonably sure that no
>printout is lost when the RIO crashes, i.e. that the code had not actually
>gotten further than the prints show.
>
>Best regards,
>Håkan

From hans.tornqvist at chalmers.se  Wed Feb 21 11:21:25 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Wed, 21 Feb 2024 11:21:25 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>,
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
Message-ID: <A46D5A85-EF5F-4224-8CF7-D119E9ED0108@chalmers.se>

Ah, sorry, I see now that the V560 has six address selectors, so 0x33333300 is actually possible. Still, please double-check the address setting and try using rwdump to poke the module manually.
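
As a small worked example of how six selectors can give 0x33333300: the sketch below assumes that each selector is one hexadecimal digit, that the six digits form the upper 24 bits of the 32-bit base address (most significant digit first), and that the low 8 bits are left for register offsets. The switch order and the function name base_from_switches are assumptions for illustration only, so the module manual should be checked.

```c
#include <stdint.h>

#define V560_N_SWITCHES 6

/*
 * Assumption for illustration: six hexadecimal selectors set the upper
 * 24 bits of the base address, most significant digit first, and the low
 * 8 bits are left for register offsets.
 * Returns 0 and stores the base on success, -1 if a digit is above 0xf.
 */
static int
base_from_switches(const uint8_t sw[V560_N_SWITCHES], uint32_t *base)
{
	uint32_t value = 0;
	int i;

	for (i = 0; i < V560_N_SWITCHES; ++i) {
		if (sw[i] > 0xf) {
			return -1;
		}
		value = (value << 4) | sw[i];
	}
	*base = value << 8;
	return 0;
}
```

With all six selectors set to 3, this gives 0x33333300 regardless of the switch order.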

Cheers,
Hans

From g.weber at hi-jena.gsi.de  Wed Feb 21 11:40:25 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Wed, 21 Feb 2024 10:40:25 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <E401E502-273E-4E8F-9474-95DAF1F5767A@chalmers.se>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>,
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>,
	<E401E502-273E-4E8F-9474-95DAF1F5767A@chalmers.se>
Message-ID: <a1c630bddacc45e1815496f2e5e0d763@hi-jena.gsi.de>

Dear Hans,


the output from manual reading of the module indeed shows a problem:


RIO4-MCAL-1 mbsdaq > rwdump -a0x33333300 -r16
Address=0x33333300
Raw-read value=rwdump: line 28:   593 Bus error               $PREFIX $f "$@"


The module was working with this address in the other DAQ system (as we did not know the order of the individual switches, we set them all to "3"). I can take it out and put it into a different slot, in case this particular slot has a hardware problem. (But I have never heard of such a thing.)


Best greetings

Günter


From g.weber at hi-jena.gsi.de  Wed Feb 21 14:32:14 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Wed, 21 Feb 2024 13:32:14 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>,
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>,
	<E401E502-273E-4E8F-9474-95DAF1F5767A@chalmers.se>,
	<a1c630bddacc45e1815496f2e5e0d763@hi-jena.gsi.de>
	<0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>,
	<743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
Message-ID: <5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>

Dear Hans,


with the different register addresses it works.


RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fa -r16
Address=0x333333fa
Raw-read value=0xfaf5


RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fc -r16
Address=0x333333fc
Raw-read value=0x083a


RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fe -r16
Address=0x333333fe
Raw-read value=0x01bc


What can we learn from these numbers?
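
One possible reading of these three values, as a hedged sketch: it assumes the identification layout used by several older CAEN VME boards, namely a fixed code of 0xfaf5 at offset fa, manufacturer number (upper 6 bits) and module type (lower 10 bits) at fc, and version (upper 4 bits) and serial number (lower 12 bits) at fe. This layout and the names v560_id and decode_id are assumptions, so the bit fields should be checked against the V560 manual.

```c
#include <stdint.h>

/* Decoded identification fields (layout is an assumption, see above). */
struct v560_id {
	int fixed_code_ok;     /* 1 if offset fa reads the expected 0xfaf5 */
	unsigned manufacturer; /* offset fc, bits 15..10 */
	unsigned module_type;  /* offset fc, bits 9..0 */
	unsigned version;      /* offset fe, bits 15..12 */
	unsigned serial;       /* offset fe, bits 11..0 */
};

static struct v560_id
decode_id(uint16_t reg_fa, uint16_t reg_fc, uint16_t reg_fe)
{
	struct v560_id id;

	id.fixed_code_ok = (reg_fa == 0xfaf5);
	id.manufacturer = (reg_fc >> 10) & 0x3f;
	id.module_type = reg_fc & 0x3ff;
	id.version = (reg_fe >> 12) & 0xf;
	id.serial = reg_fe & 0xfff;
	return id;
}
```

Under these assumptions, decode_id(0xfaf5, 0x083a, 0x01bc) reports a matching fixed code, manufacturer 2, module type 0x3a, version 0 and serial number 0x1bc. That would suggest the module answers correctly at base 0x33333300 for these offsets.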


Best greetings

Günter


________________________________
From: Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
Sent: Wednesday, 21 February 2024 12:43:06
To: Weber, Guenter Dr.
Subject: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module

Hmm, it looks like address offset 0 is "not used". Could you try -a0x333333fa? Or fe and fc at the end; they should be read-only registers.


"Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> wrote: (21 February 2024 12:06:00 CET)

Different VME slot of the V560 module, same result. :-(

________________________________
Von: subexp-daq <subexp-daq-bounces at lists.chalmers.se> im Auftrag von Weber, Guenter Dr. <g.weber at hi-jena.gsi.de>
Gesendet: Mittwoch, 21. Februar 2024 11:40:25
An: Hans Toshihide T?rnqvist; Discuss use of Nurdlib, TRLO II, drasi and UCESB.
Betreff: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module


Dear Hans,


the output from manual reading of the module indeed shows a problem:


RIO4-MCAL-1 mbsdaq > rwdump -a0x33333300 -r16
Address=0x33333300
Raw-read value=rwdump: line 28:   593 Bus error               $PREFIX $f "$@"


The module was working with this address in the other DAQ system (as we did not know the order of the individual switches, we set them all to "3"). But I can take it our and put it in again at a different slot, if maybe this particular slot has a hardware problem. (But I never heard of such thing.)


Best greetings

G?nter

________________________________
Von: Hans Toshihide T?rnqvist <hans.tornqvist at chalmers.se>
Gesendet: Mittwoch, 21. Februar 2024 11:14:44
An: Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber, Guenter Dr.
Betreff: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module

Dear G?nter,

map_map before mapping tries to read and write some given registers with a "safe" but slower method of accessing registers, which is called "poking" in nurdlib. Maybe the method of access on the rio4 you have is not safe enough and one of the two pokes fails horribly...

Could you please double check the module address? Could you also try using bin/rwdump to read any register in the v560 to see if it's accessible at all and not a problem with the module implementation in nurdlib?

Something like bin/rwdump -a0x33333300 -r16

Actually the address 0x33333300 looks weird to me, maybe it should be 0x33330000?
Also for reading, try register offsets fa, fc, fe, with 16 bits accesseses, they should have some interesting values.

Cheers,
Hans


"Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> skrev: (21 februari 2024 10:18:29 CET)

Dear H?kan,


thanks for the hint to flush and sleep. Indeed, I now see that the crash happens in init_slow of V560 at this line:


    v560->sicy_map = map_map(v560->address, MAP_SIZE, KW_NOBLT,
        0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));


Maybe the code is accessing/writing into a memory location that it should better not touch?

This problematic line is then followed by:


    id = MAP_READ(v560->sicy_map, fixed_code);


The corresponding line in the V560 code on the system that was running with this module looks like this:


    v560->sicy_map = map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT,
        0, 0, MAP_POKE_ARGS(*v560->read, fixed_code),
        MAP_POKE_ARGS(*v560->write, scale_clear));


And is followed by:


    mapped_ptr = map_get_mapped_ptr(v560->sicy_map);
    v560->read = mapped_ptr;
    v560->write = mapped_ptr;


Maybe you already have an idea what causes the problem here?


I will now go to the system that was running with V560 and make a push of the NURDLIB.


Best greetings

G?nter


________________________________
Von: subexp-daq <subexp-daq-bounces at lists.chalmers.se> im Auftrag von H?kan T Johansson <f96hajo at chalmers.se>
Gesendet: Dienstag, 20. Februar 2024 20:13:32
An: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
Betreff: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module


Dear G?nter,

I took the files you provided and for comparison put them in a branch
'old_caen_v560'.

git diff origin/old_caen_v560..origin/master

however does not show anything which is suspicious to me.  Perhaps Hans
can spot something.

Otherwise, the only idea I can come up with is to continue to bisect the
code inside slow init.

However, before that, I would suggest to add

  fflush(stdout); sleep(1);

after each printf statement, such that one can be quite sure that the
printout is not eaten when the RIO crash happens.  I.e. that it actually
had gotten further than shown by the prints.

Best regards,
H?kan


On Tue, 20 Feb 2024, Weber, Guenter Dr. wrote:

>
> Dear friends,
>
>
> I now had a look at the system where the V560 was running. It was also setup
> by Bastian. And there the code for the V560 module is slightly different
> from the one included in the NURDLIB branch that I am using on the test
> system.
>
>
> Maybe you can have a look at it.
>
>
> I also could push the complete NURDLIB from this system, if this helps.
>
>
>
>
> Best greetings
>
> G?nter
>
>
>
>
> ____________________________________________________________________________
> Von: subexp-daq <subexp-daq-bounces at lists.chalmers.se> im Auftrag von Weber,
> Guenter Dr. <g.weber at hi-jena.gsi.de>
> Gesendet: Dienstag, 20. Februar 2024 10:58:27
> An: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
> Betreff: [subexp-daq] Report of a possible bug of the CAEN_V560 module
>
> Dear friends,
>
>
> I now grabbed a V560 module that was working fine in another DAQ system and
> put it into our test system.
>
>
> The main.cfg looks like this:
>
>
> log_level=spam # info, verbose, debug, spam
>
> CRATE("MCAL") {
>     GSI_VULOM(0x03000000) {
>         timestamp = true # needed to get timestamps in the data output
>     #   ecl=0..15
>     }
>     BARRIER
>     CAEN_V560(0x333333300) {
>         use_veto = true
>     }
> #   CAEN_V767A(0x03100000) {
> #   }
> }
>
> Starting the DAQ now results in a freeze of the RIO4. A reset of the crate
> is necessary to talk to it again.
>
>
> The problem occurs in the first slow init of the V560 module. To find the
> exact line, I added some output to CRATE.C:
>
>
> 10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
> before push_log_level(module)
> before a_crate->module_init_id = module->id
> before module->props->init_slow(a_crate, module)
> LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
> before module_init_id_mark(a_crate, module)
> before pop_log_level(module)
> 10: crate/crate.c:923: .Slow-init module[1]=CAEN_V560.
> before push_log_level(module)
> before a_crate->module_init_id = module->id
> before module->props->init_slow(a_crate, module)
>
>
> The crate/crate.c code now looks like this:
>
>
>     TAILQ_FOREACH(module, &a_crate->module_list, next) {
>         if (NULL == module->props) {
>             continue;
>         }
>         LOGF(info)(LOGL, "Slow-init module[%u]=%s.", module->id,
>             keyword_get_string(module->type));
>         printf("before push_log_level(module) \n");
>         push_log_level(module);
>         printf("before a_crate->module_init_id = module->id \n");
>         a_crate->module_init_id = module->id;
>         printf("before module->props->init_slow(a_crate, module) \n");
>         if (!module->props->init_slow(a_crate, module)) {
>             printf("before pop_log_level(module) \n");
>             pop_log_level(module);
>             printf("before goto crate_init_done \n");
>             goto crate_init_done;
>         }
>         printf("before module_init_id_mark(a_crate, module) \n");
>         module_init_id_mark(a_crate, module);
>         printf("before pop_log_level(module) \n");
>         pop_log_level(module);
>     }
>
> Thus, to me it looks like the check "if (!module->props->init_slow(a_crate,
> module)) ..." is doing something quite horrible to the RIO4.
>
>
> This is unfortunate, because my original aim was to show that there is also
> a bug/mistake in readout_dt of the V560 module. But I did not get that far.
>
>
> Do you have any idea what might cause the freezing of the RIO4?
>
>
>
>
> Best greetings and many thanks
>
> Günter
>
>
>
>
>
>

From hans.tornqvist at chalmers.se  Wed Feb 21 15:28:01 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Wed, 21 Feb 2024 15:28:01 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
	<E401E502-273E-4E8F-9474-95DAF1F5767A@chalmers.se>
	<a1c630bddacc45e1815496f2e5e0d763@hi-jena.gsi.de>
	<0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>
	<743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
	<5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
Message-ID: <93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>

Dear Günter,

The most important thing is that you get reasonable values with these
reads; the actual values don't mean a whole lot.

One of the manual reads that you did (ofs=0xfa) is what 'map_map' does 
for "poke reading". The macros

MAP_POKE_ARGS(fixed_code), or the older
MAP_POKE_ARGS(*v560->read, fixed_code)

tell 'map_map' what address offset to poke, and it depends on each module.

The next thing that happens in 'map_map' is the "poke writing". Could 
you try to write to the 'scale_clear' register next? That would be:

rwdump -a0x33333350 -w16,0
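The address passed to rwdump is simply the module base address plus the register offset. A small sketch of that arithmetic, assuming the base 0x33333300 used in this thread and the offsets implied by the rwdump addresses (0xfa for fixed_code, 0x50 for scale_clear); reg_address and the enum names are hypothetical, not nurdlib identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets as implied by the rwdump addresses in this thread. */
enum {
	V560_OFS_SCALE_CLEAR = 0x50,
	V560_OFS_FIXED_CODE  = 0xfa
};

/* Hypothetical helper: absolute address = module base + register offset. */
static uint32_t
reg_address(uint32_t a_base, uint32_t a_ofs)
{
	/* Guard against wrap-around in the 32-bit address space. */
	assert(a_ofs <= UINT32_MAX - a_base);
	return a_base + a_ofs;
}
```

With base 0x33333300 this gives 0x33333350 for scale_clear and 0x333333fa for fixed_code, matching the addresses used with rwdump.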

---

In case you would like to look deeper in 'map_map', you can find it in 
module/map/map.c around line-number 103. It's not a very complicated 
function that does the following:

-) Checks user-mapped memory, you don't need to worry about this, it's 
mainly for simulating module memory for tests.

-) Performs the poke-read.

-) Performs the poke-write.

-) If it's a BLT mapping, asks the platform-specific code to do that 
without further tests.

-) Otherwise accesses the poke registers many times, with timing, to get
an idea of the speed of each single-cycle access.

If you want to dig even deeper, you can look in 
module/map/map_xpc_3310.c which is what is used in the most recent Linux 
Rio4's. It's mainly a wrapper around a proprietary black-box library, so 
not scary and scary at the same time.
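The order of steps listed above can be sketched against simulated register memory. This is not the code in module/map/map.c: every name below is hypothetical, the user-mapped-memory check is left out, and the timing loop only counts accesses where a real implementation would take timestamps around it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names are hypothetical; this is not the nurdlib API. */
enum sketch_mode { SKETCH_NOBLT, SKETCH_BLT };

struct sketch_map {
	uint8_t		mem[0x100];	/* Simulated module registers. */
	enum sketch_mode mode;
	unsigned	timed_accesses;	/* Accesses done in the timing loop. */
};

static uint16_t
sketch_read16(struct sketch_map const *a_map, size_t a_ofs)
{
	uint16_t v;

	assert(a_ofs + sizeof v <= sizeof a_map->mem);
	memcpy(&v, a_map->mem + a_ofs, sizeof v);
	return v;
}

static void
sketch_write16(struct sketch_map *a_map, size_t a_ofs, uint16_t a_v)
{
	assert(a_ofs + sizeof a_v <= sizeof a_map->mem);
	memcpy(a_map->mem + a_ofs, &a_v, sizeof a_v);
}

/* Poke-read, poke-write, then either BLT hand-off or repeated pokes. */
static void
sketch_map_setup(struct sketch_map *a_map, enum sketch_mode a_mode,
    size_t a_poke_r_ofs, size_t a_poke_w_ofs)
{
	unsigned i;

	a_map->mode = a_mode;
	a_map->timed_accesses = 0;
	(void)sketch_read16(a_map, a_poke_r_ofs);	/* Poke-read. */
	sketch_write16(a_map, a_poke_w_ofs, 0);		/* Poke-write. */
	if (SKETCH_BLT == a_mode) {
		return;	/* BLT: left to platform code, no further tests. */
	}
	for (i = 0; i < 100; ++i) {	/* Repeated pokes, to be timed. */
		(void)sketch_read16(a_map, a_poke_r_ofs);
		sketch_write16(a_map, a_poke_w_ofs, 0);
		a_map->timed_accesses += 2;
	}
}
```

The point of the sketch is only the ordering: a module whose poke offsets are not safely accessible fails at the first two accesses, before any mapping or timing is attempted.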

Best regards,
Hans

On 2024-02-21 14:32, Weber, Guenter Dr. wrote:
> Dear Hans,
> 
> 
> with the different register addresses it works.
> 
> 
> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fa -r16
> Address=0x333333fa
> Raw-read value=0xfaf5
> 
> 
> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fc -r16
> Address=0x333333fc
> Raw-read value=0x083a
> 
> 
> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fe -r16
> Address=0x333333fe
> Raw-read value=0x01bc
> 
> What can we learn from these numbers?
> 
> 
> 
> 
> Best greetings
> 
> Günter
> 
> 
> 
> ------------------------------------------------------------------------
> *From:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
> *Sent:* Wednesday, 21 February 2024 12:43:06
> *To:* Weber, Guenter Dr.
> *Subject:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560
> module
> Hmm, looks like address offset 0 is "not used", could you try
> -a0x333333fa? Or fe and fc at the end, they should be some read-only
> registers.
>
>
> "Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> wrote: (21 February 2024
> 12:06:00 CET)
> 
>     Different VME slot of the V560 module, same result. :-(
> 
>     ------------------------------------------------------------------------
>     *From:* subexp-daq <subexp-daq-bounces at lists.chalmers.se> on behalf
>     of Weber, Guenter Dr. <g.weber at hi-jena.gsi.de>
>     *Sent:* Wednesday, 21 February 2024 11:40:25
>     *To:* Hans Toshihide Törnqvist; Discuss use of Nurdlib, TRLO II,
>     drasi and UCESB.
>     *Subject:* Re: [subexp-daq] Report of a possible bug of the
>     CAEN_V560 module
> 
>     Dear Hans,
> 
> 
>     the output from manual reading of the module indeed shows a problem:
> 
> 
>     RIO4-MCAL-1 mbsdaq > rwdump -a0x33333300 -r16
>     Address=0x33333300
>     Raw-read value=rwdump: line 28:   593 Bus error
>     $PREFIX $f "$@"
>
>
>     The module was working with this address in the other DAQ system (as
>     we did not know the order of the individual switches, we set them
>     all to "3"). But I can take it out and put it in again at a
>     different slot, if maybe this particular slot has a hardware
>     problem. (But I never heard of such a thing.)
> 
> 
> 
> 
>     Best greetings
> 
>     Günter
> 
> 
>     ------------------------------------------------------------------------
>     *From:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
>     *Sent:* Wednesday, 21 February 2024 11:14:44
>     *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber,
>     Guenter Dr.
>     *Subject:* Re: [subexp-daq] Report of a possible bug of the
>     CAEN_V560 module
>     Dear Günter,
> 
>     map_map before mapping tries to read and write some given registers
>     with a "safe" but slower method of accessing registers, which is
>     called "poking" in nurdlib. Maybe the method of access on the rio4
>     you have is not safe enough and one of the two pokes fails horribly...
> 
>     Could you please double check the module address? Could you also try
>     using bin/rwdump to read any register in the v560 to see if it's
>     accessible at all and not a problem with the module implementation
>     in nurdlib?
> 
>     Something like bin/rwdump -a0x33333300 -r16
> 
>     Actually the address 0x33333300 looks weird to me, maybe it should
>     be 0x33330000?
>     Also for reading, try register offsets fa, fc, fe, with 16-bit
>     accesses, they should have some interesting values.
> 
>     Cheers,
>     Hans
> 
> 
>     "Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> wrote: (21 February
>     2024 10:18:29 CET)
>
>         Dear Håkan,
> 
> 
>         thanks for the hint to flush and sleep. Indeed, I now see that
>         the crash happens in init_slow of V560 at this line:
> 
> 
>         v560->sicy_map=map_map(v560->address, MAP_SIZE, KW_NOBLT,
>         0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));
> 
> 
>         Maybe the code is accessing/writing into a memory location that
>         it should better not touch?
> 
>         This problematic line is then followed by:
> 
> 
>         id=MAP_READ(v560->sicy_map, fixed_code);
> 
>         The corresponding line in the V560 code on the system that was
>         running with this module looks like this:
> 
> 
>         v560->sicy_map=map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT,
>         0, 0, MAP_POKE_ARGS(*v560->read, fixed_code),
>         MAP_POKE_ARGS(*v560->write, scale_clear));
> 
>         And is followed by:
> 
> 
>         mapped_ptr = map_get_mapped_ptr(v560->sicy_map);
>         v560->read = mapped_ptr;
>         v560->write = mapped_ptr;
> 
>         Maybe you already have an idea what causes the problem here?
> 
> 
>         I will now go to the system that was running with V560 and make
>         a push of the NURDLIB.
> 
> 
> 
> 
>         Best greetings
> 
>         Günter
> 
> 
> 
>         ------------------------------------------------------------------------
>         *From:* subexp-daq <subexp-daq-bounces at lists.chalmers.se> on
>         behalf of Håkan T Johansson <f96hajo at chalmers.se>
>         *Sent:* Tuesday, 20 February 2024 20:13:32
>         *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.
>         *Subject:* Re: [subexp-daq] Report of a possible bug of the
>         CAEN_V560 module
>
>         Dear Günter,
> 
>         I took the files you provided and for comparison put them in a
>         branch 'old_caen_v560'.
>
>         git diff origin/old_caen_v560..origin/master
>
>         however does not show anything which is suspicious to me.
>         Perhaps Hans can spot something.
>
>         Otherwise, the only idea I can come up with is to continue to
>         bisect the code inside slow init.
>
>         However, before that, I would suggest adding
>
>           fflush(stdout); sleep(1);
>
>         after each printf statement, so that one can be quite sure
>         that no printout is eaten when the RIO crashes, i.e. that
>         execution had not actually gotten further than the prints show.
>
>         Best regards,
>         Håkan
> 
> 
> 
> 
> 
> 

From g.weber at hi-jena.gsi.de  Wed Feb 21 16:14:45 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Wed, 21 Feb 2024 15:14:45 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
	<E401E502-273E-4E8F-9474-95DAF1F5767A@chalmers.se>
	<a1c630bddacc45e1815496f2e5e0d763@hi-jena.gsi.de>
	<0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>
	<743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
	<5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>,
	<93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>
Message-ID: <b2c00173e725403fa0a9867634851cd3@hi-jena.gsi.de>

Dear Hans,


writing into the register works fine (I tried it several times):


RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
Address=0x33333350
Raw-write done.
RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
Address=0x33333350
Raw-write done.
RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
Address=0x33333350
Raw-write done.
RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
Address=0x33333350
Raw-write done.

I could now litter map_map with printf() outputs to see where execution of

    v560->sicy_map = map_map(v560->address, MAP_SIZE, KW_NOBLT,
        0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));

is failing. Should I proceed this way? Or is there anything else that I could check?

(As I understand it, the slightly different implementation of the V560 on our running system is not indicative of a specific issue, but is just due to the fact that it uses a deprecated version of NURDLIB. Right?)


Best greetings
Günter



From hans.tornqvist at chalmers.se  Wed Feb 21 16:46:25 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Wed, 21 Feb 2024 16:46:25 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <b2c00173e725403fa0a9867634851cd3@hi-jena.gsi.de>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
	<E401E502-273E-4E8F-9474-95DAF1F5767A@chalmers.se>
	<a1c630bddacc45e1815496f2e5e0d763@hi-jena.gsi.de>
	<0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>
	<743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
	<5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
	<93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>
	<b2c00173e725403fa0a9867634851cd3@hi-jena.gsi.de>
Message-ID: <66bdd6e1-93a1-4f32-b1a7-9880cc35ab09@chalmers.se>

Dear Günter,

I cannot see anything important having changed in the v560 code, and in 
any case the freeze happens inside map_map.

Aha, but I do see a bug in 'map_map'! Starting with the switch statement
around line 195, where the bit depth is chosen, 'map_sicy_write' writes
to 'poke_r_ofs'; it must be 'poke_w_ofs'. Please try that. (Says a lot
about this piece of code... Cleanup added to the todo list.)
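To illustrate the class of bug described here (not the actual code in map.c), the sketch below runs a buggy and a fixed write poke against simulated register memory. All function names are hypothetical; the offsets 0xfa (fixed_code) and 0x50 (scale_clear) are the V560 values used earlier in this thread. Simulated RAM accepts the stray write silently, whereas a real module may not tolerate a write to a read-only register; whether that explains the freeze is not established at this point in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated register space; offsets follow the values in this thread. */
enum { OFS_SCALE_CLEAR = 0x50, OFS_FIXED_CODE = 0xfa, MEM_SIZE = 0x100 };

static void
write16(uint8_t *a_mem, size_t a_ofs, uint16_t a_v)
{
	assert(a_ofs + sizeof a_v <= (size_t)MEM_SIZE);
	memcpy(a_mem + a_ofs, &a_v, sizeof a_v);
}

/* Buggy variant: the write poke lands on the read offset. */
static void
poke_write_buggy(uint8_t *a_mem, size_t a_poke_r_ofs, size_t a_poke_w_ofs)
{
	(void)a_poke_w_ofs;
	write16(a_mem, a_poke_r_ofs, 0);
}

/* Fixed variant: the write poke uses the write offset. */
static void
poke_write_fixed(uint8_t *a_mem, size_t a_poke_r_ofs, size_t a_poke_w_ofs)
{
	(void)a_poke_r_ofs;
	write16(a_mem, a_poke_w_ofs, 0);
}
```

In the buggy variant the write ends up at the fixed_code offset instead of scale_clear, which is the mix-up between 'poke_r_ofs' and 'poke_w_ofs' described above.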

Best regards,
Hans

On 2024-02-21 16:14, Weber, Guenter Dr. wrote:
> Dear Hans,
> 
> 
> writing into the register works fine (I tried it several times):
> 
> 
> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
> Address=0x33333350
> Raw-write done.
> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
> Address=0x33333350
> Raw-write done.
> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
> Address=0x33333350
> Raw-write done.
> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
> Address=0x33333350
> Raw-write done.
> 
> I could now litter map_map with printf() outputs to see where execution of
> 
> v560->sicy_map=map_map(v560->address, MAP_SIZE, KW_NOBLT,
> 0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));
> 
> is failing. Should I proceed this way? Or is there anything else that I 
> could check?
> 
> (As I understand, the slightly different implementation of V560 on our 
> running system is not indicative of a specific issue, but just due to 
> the fact that this is a deprecated version of NURDLIB. Right?)
> 
> 
> 
> Best greetings
> Günter
> 
> 
> ------------------------------------------------------------------------
> *From:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
> *Sent:* Wednesday, 21 February 2024 15:28:01
> *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber, Guenter Dr.
> *Subject:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560 
> module
> Dear Günter,
> 
> The most important thing is that you get reasonable values with these
> reads, the actual values don't mean a whole lot.
> 
> One of the manual reads that you did (ofs=0xfa) is what 'map_map' does
> for "poke reading". The macros
> 
> MAP_POKE_ARGS(fixed_code), or the older
> MAP_POKE_ARGS(*v560->read, fixed_code)
> 
> tell 'map_map' what address offset to poke, and it depends on each module.
> 
> The next thing that happens in 'map_map' is the "poke writing". Could
> you try to write to the 'scale_clear' register next? That would be:
> 
> rwdump -a0x33333350 -w16,0
> 
> ---
> 
> In case you would like to look deeper in 'map_map', you can find it in
> module/map/map.c around line-number 103. It's not a very complicated
> function that does the following:
> 
> -) Checks user-mapped memory, you don't need to worry about this, it's
> mainly for simulating module memory for tests.
> 
> -) Performs the poke-read.
> 
> -) Performs the poke-write.
> 
> -) If it's a BLT mapping, asks the platform-specific code to do that
> without further tests.
> 
> -) Otherwise times the poke registers many times to get an idea about
> the speed of every single-cycle access.
> 
> If you want to dig even deeper, you can look in
> module/map/map_xpc_3310.c which is what is used in the most recent Linux
> Rio4's. It's mainly a wrapper around a proprietary black-box library, so
> not scary and scary at the same time.
> 
> Best regards,
> Hans
> 
> On 2024-02-21 14:32, Weber, Guenter Dr. wrote:
>> Dear Hans,
>> 
>> 
>> with the different register addresses it works.
>> 
>> 
>> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fa -r16
>> Address=0x333333fa
>> Raw-read value=0xfaf5
>> 
>> 
>> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fc -r16
>> Address=0x333333fc
>> Raw-read value=0x083a
>> 
>> 
>> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fe -r16
>> Address=0x333333fe
>> Raw-read value=0x01bc
>> 
>> What can we learn from these numbers?
>> 
>> 
>> 
>> 
>> Best greetings
>> 
>> Günter
>> 
>> 
>> 
>> ------------------------------------------------------------------------
>> *From:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
>> *Sent:* Wednesday, 21 February 2024 12:43:06
>> *To:* Weber, Guenter Dr.
>> *Subject:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560 
>> module
>> Hmm, looks like address offset 0 is "not used", could you try 
>> -a0x333333fa? Or fe and fc at the end, they should be some read-only 
>> registers.
> 

From g.weber at hi-jena.gsi.de  Thu Feb 22 10:04:28 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Thu, 22 Feb 2024 09:04:28 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <b2c00173e725403fa0a9867634851cd3@hi-jena.gsi.de>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
	<E401E502-273E-4E8F-9474-95DAF1F5767A@chalmers.se>
	<a1c630bddacc45e1815496f2e5e0d763@hi-jena.gsi.de>
	<0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>
	<743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
	<5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>,
	<93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>,
	<b2c00173e725403fa0a9867634851cd3@hi-jena.gsi.de>
Message-ID: <c985e7923d944c1598475550f150c042@hi-jena.gsi.de>

Dear friends,


after the bug in map_map was fixed, the freeze no longer occurs. Very good!


Now back to my original concern regarding the V560 module ...


readout_dt looks like this:


uint32_t
caen_v560_readout_dt(struct Crate *a_crate, struct Module *a_module)
{
    (void)a_crate;
    LOGF(spam)(LOGL, NAME" readout_dt {");
    a_module->event_counter.value++;
    LOGF(spam)(LOGL, NAME" readout_dt(ctr=0x%08x) }",
        a_module->event_counter.value);
    return 0;
}


The module counter is incremented by one every time readout_dt is executed. This results in a problem in crate.c:


            diff_module = COUNTER_DIFF(*module->crate_counter,
                module->event_counter, module->this_minus_crate);
            /* TODO: Clean this. */
            shadow_counter.value =
                module->shadow.data_counter_value;
            shadow_counter.mask = module->event_counter.mask;
            diff_shadow = COUNTER_DIFF(*module->crate_counter,
                shadow_counter, module->this_minus_crate);

            create_do_shad = !crate_get_do_shadow(a_crate);
            printf("%s: diff_module: %u, module_crate_counter: %u, module_event_counter: %u, module_this_minus_crate: %u \n", keyword_get_string(module->type), diff_module, (*module->crate_counter).value, (module->event_counter).value, module->this_minus_crate);
            if (0 == diff_module &&
                ( create_do_shad ||
                 NULL == module->props->readout_shadow ||
                 0 == diff_shadow)) {
                ok = 1;
                printf("%u \n", ok);
                break;
            }
            getchar();


When the difference between (*module->crate_counter).value and (module->event_counter).value is evaluated, the latter has already been incremented, because readout_dt for the module has already been executed, while the former has not been incremented.


This is the output:


CAEN_V560: diff_module: 4294967295, module_crate_counter: 0, module_event_counter: 1, module_this_minus_crate: 0

Note that diff_module shows the result of a "0 - 1" operation on unsigned 32-bit integers, i.e. 2^32 - 1 = 4294967295.
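
The wrap-around can be reproduced in isolation with a minimal sketch. The function name below is made up for illustration; the real COUNTER_DIFF macro in nurdlib also takes a counter mask and the this_minus_crate offset into account and is not reproduced here.

```c
#include <stdint.h>

/*
 * Difference "crate - module" of two 32-bit counters.
 * The result is converted back to uint32_t, so it is reduced modulo 2^32:
 * if the module counter is ahead by one, the result is 0xffffffff.
 */
uint32_t
counter_diff_u32(uint32_t a_crate, uint32_t a_module)
{
	return a_crate - a_module;
}
```

With a crate counter of 0 and a module counter of 1 this gives 4294967295, matching the printout above, and it gives 0 only when both counters agree.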

The original version of the crate.c code would now execute readout_dt of module V560 again, incrementing the module counter another time, so diff_module would be "0 - 2". This would repeat until the timeout condition a bit later in the code is reached.
Then the modules would be re-initialized, setting (module->event_counter).value of V560 back to zero, while the crate counter would be incremented. Thus, by sheer luck, the next try of the same test would have (*module->crate_counter).value and (module->event_counter).value both equal to 1. From this point on the DAQ runs as intended.
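
The sequence described above can be played through with a toy model. This is only a sketch of the behaviour as described in this mail, not nurdlib code: all names are made up, and the assumption that re-initialization resets the module counter while the crate counter has advanced by one is taken from the description, not checked against crate.c.

```c
#include <stdint.h>

/* Toy state: one crate counter and one software module counter. */
struct Sim {
	uint32_t crate_counter;
	uint32_t module_counter;
};

/* Models readout_dt of a module that just counts its own calls. */
static void
sim_readout_dt(struct Sim *a_sim)
{
	a_sim->module_counter++;
}

/*
 * Repeats readout_dt until the counters agree.
 * Returns the number of calls needed, or 0 if a_max_tries was reached
 * first (the "timeout" case).
 */
unsigned
sim_try_sync(struct Sim *a_sim, unsigned a_max_tries)
{
	unsigned i;

	for (i = 1; i <= a_max_tries; ++i) {
		sim_readout_dt(a_sim);
		if (0 == (uint32_t)(a_sim->crate_counter -
		    a_sim->module_counter)) {
			return i;
		}
	}
	return 0;
}

/* Models the re-init: module counter back to 0, crate counter advanced. */
void
sim_reinit(struct Sim *a_sim)
{
	a_sim->module_counter = 0;
	a_sim->crate_counter++;
}
```

Starting from both counters at 0, sim_try_sync() runs into the timeout because the module counter is always ahead. After sim_reinit() the first readout_dt brings both counters to 1 and the check passes.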

Ok, I hope the explanation was clear and I understood correctly what is happening.


Best greetings
Günter


----------------

Günter Weber

Helmholtz-Institut Jena
Fröbelstieg 3
07743 Jena
Germany
Phone: +49-3641-947605
www.hi-jena.de

GSI Helmholtzzentrum für Schwerionenforschung
Planckstrasse 1
64291 Darmstadt
Germany
www.gsi.de
________________________________
From: subexp-daq <subexp-daq-bounces at lists.chalmers.se> on behalf of Weber, Guenter Dr. <g.weber at hi-jena.gsi.de>
Sent: Wednesday, 21 February 2024 16:14:45
To: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
Subject: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module


Dear Hans,


writing into the register works fine (I tried it several times):


RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
Address=0x33333350
Raw-write done.
RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
Address=0x33333350
Raw-write done.
RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
Address=0x33333350
Raw-write done.
RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
Address=0x33333350
Raw-write done.

I could now litter map_map with printf() outputs to see where execution of

    v560->sicy_map = map_map(v560->address, MAP_SIZE, KW_NOBLT,
        0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));

is failing. Should I proceed this way? Or is there anything else that I could check?

(As I understand, the slightly different implementation of V560 on our running system is not indicative of a specific issue, but just due to the fact that this is a deprecated version of NURDLIB. Right?)


Best greetings
Günter


________________________________
From: Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
Sent: Wednesday, 21 February 2024 15:28:01
To: Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber, Guenter Dr.
Subject: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module

Dear Günter,

The most important thing is that you get reasonable values with these
reads, the actual values don't mean a whole lot.

One of the manual reads that you did (ofs=0xfa) is what 'map_map' does
for "poke reading". The macros

MAP_POKE_ARGS(fixed_code), or the older
MAP_POKE_ARGS(*v560->read, fixed_code)

tell 'map_map' what address offset to poke, and it depends on each module.

The next thing that happens in 'map_map' is the "poke writing". Could
you try to write to the 'scale_clear' register next? That would be:

rwdump -a0x33333350 -w16,0

---

In case you would like to look deeper in 'map_map', you can find it in
module/map/map.c around line-number 103. It's not a very complicated
function that does the following:

-) Checks user-mapped memory, you don't need to worry about this, it's
mainly for simulating module memory for tests.

-) Performs the poke-read.

-) Performs the poke-write.

-) If it's a BLT mapping, asks the platform-specific code to do that
without further tests.

-) Otherwise times the poke registers many times to get an idea about
the speed of every single-cycle access.

If you want to dig even deeper, you can look in
module/map/map_xpc_3310.c which is what is used in the most recent Linux
Rio4's. It's mainly a wrapper around a proprietary black-box library, so
not scary and scary at the same time.

Best regards,
Hans

On 2024-02-21 14:32, Weber, Guenter Dr. wrote:
> Dear Hans,
>
>
> with the different register addresses it works.
>
>
> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fa -r16
> Address=0x333333fa
> Raw-read value=0xfaf5
>
>
> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fc -r16
> Address=0x333333fc
> Raw-read value=0x083a
>
>
> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fe -r16
> Address=0x333333fe
> Raw-read value=0x01bc
>
> What can we learn from these numbers?
>
>
>
>
> Best greetings
>
> Günter
>
>
>
> ------------------------------------------------------------------------
> *From:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
> *Sent:* Wednesday, 21 February 2024 12:43:06
> *To:* Weber, Guenter Dr.
> *Subject:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560
> module
> Hmm, looks like address offset 0 is "not used", could you try
> -a0x333333fa? Or fe and fc at the end, they should be some read-only
> registers.
>
>
> "Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> wrote: (21 February 2024
> 12:06:00 CET)
>
>     Different VME slot of the V560 module, same result. :-(
>
>     ------------------------------------------------------------------------
>     *From:* subexp-daq <subexp-daq-bounces at lists.chalmers.se> on behalf
>     of Weber, Guenter Dr. <g.weber at hi-jena.gsi.de>
>     *Sent:* Wednesday, 21 February 2024 11:40:25
>     *To:* Hans Toshihide Törnqvist; Discuss use of Nurdlib, TRLO II,
>     drasi and UCESB.
>     *Subject:* Re: [subexp-daq] Report of a possible bug of the
>     CAEN_V560 module
>
>     Dear Hans,
>
>
>     the output from manual reading of the module indeed shows a problem:
>
>
>     RIO4-MCAL-1 mbsdaq > rwdump -a0x33333300 -r16
>     Address=0x33333300
>     Raw-read value=rwdump: line 28:   593 Bus error
>     $PREFIX $f "$@"
>
>
>     The module was working with this address in the other DAQ system (as
>     we did not know the order of the individual switches, we set them
>     all to "3"). But I can take it out and put it in again at a
>     different slot, in case this particular slot has a hardware
>     problem. (But I never heard of such a thing.)
>
>
>
>
>     Best greetings
>
>     Günter
>
>
>     ------------------------------------------------------------------------
>     *From:* Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
>     *Sent:* Wednesday, 21 February 2024 11:14:44
>     *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber,
>     Guenter Dr.
>     *Subject:* Re: [subexp-daq] Report of a possible bug of the
>     CAEN_V560 module
>     Dear Günter,
>
>     map_map before mapping tries to read and write some given registers
>     with a "safe" but slower method of accessing registers, which is
>     called "poking" in nurdlib. Maybe the method of access on the rio4
>     you have is not safe enough and one of the two pokes fails horribly...
>
>     Could you please double check the module address? Could you also try
>     using bin/rwdump to read any register in the v560 to see if it's
>     accessible at all and not a problem with the module implementation
>     in nurdlib?
>
>     Something like bin/rwdump -a0x33333300 -r16
>
>     Actually the address 0x33333300 looks weird to me, maybe it should
>     be 0x33330000?
>     Also for reading, try register offsets fa, fc, fe, with 16-bit
>     accesses, they should have some interesting values.
>
>     Cheers,
>     Hans
>
>
>     "Weber, Guenter Dr." <g.weber at hi-jena.gsi.de> wrote: (21 February
>     2024 10:18:29 CET)
>
>         Dear Håkan,
>
>
>         thanks for the hint to flush and sleep. Indeed, I now see that
>         the crash happens in init_slow of V560 at this line:
>
>
>         v560->sicy_map=map_map(v560->address, MAP_SIZE, KW_NOBLT,
>         0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));
>
>
>         Maybe the code is accessing/writing into a memory location that
>         it should better not touch?
>
>         This problematic line is then followed by:
>
>
>         id=MAP_READ(v560->sicy_map, fixed_code);
>
>         The corresponding line in the V560 code on the system that was
>         running with this module looks like this:
>
>
>         v560->sicy_map=map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT,
>         0, 0, MAP_POKE_ARGS(*v560->read, fixed_code),
>         MAP_POKE_ARGS(*v560->write, scale_clear));
>
>         And is followed by:
>
>
>         mapped_ptr = map_get_mapped_ptr(v560->sicy_map);
>         v560->read = mapped_ptr;
>         v560->write = mapped_ptr;
>
>         Maybe you already have an idea what causes the problem here?
>
>
>         I will now go to the system that was running with V560 and make
>         a push of the NURDLIB.
>
>
>
>
>         Best greetings
>
>         Günter
>
>
>
>         ------------------------------------------------------------------------
>         *From:* subexp-daq <subexp-daq-bounces at lists.chalmers.se> on
>         behalf of Håkan T Johansson <f96hajo at chalmers.se>
>         *Sent:* Tuesday, 20 February 2024 20:13:32
>         *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.
>         *Subject:* Re: [subexp-daq] Report of a possible bug of the
>         CAEN_V560 module
>
>         Dear Günter,
>
>         I took the files you provided and for comparison put them in a
>         branch
>         'old_caen_v560'.
>
>         git diff origin/old_caen_v560..origin/master
>
>         however does not show anything which is suspicious to me.
>         Perhaps Hans
>         can spot something.
>
>         Otherwise, the only idea I can come up with is to continue to
>         bisect the
>         code inside slow init.
>
>         However, before that, I would suggest to add
>
>            fflush(stdout); sleep(1);
>
>         after each printf statement, such that one can be quite sure
>         that the
>         printout is not eaten when the RIO crash happens.  I.e. that it
>         actually
>         had gotten further than shown by the prints.
>
>         Best regards,
>         Håkan
>
>
>
>
>         On Tue, 20 Feb 2024, Weber, Guenter Dr. wrote:
>
>         >
>         > Dear friends,
>         >
>         >
>         > I now had a look at the system where the V560 was running. It was also setup
>         > by Bastian. And there the code for the V560 module is slightly different
>         > from the one included in the NURDLIB branch that I am using on the test
>         > system.
>         >
>         >
>         > Maybe you can have a look at it.
>         >
>         >
>         > I also could push the complete NURDLIB from this system, if this helps.
>         >
>         >
>         >
>         >
>         > Best greetings
>         >
>         > Günter
>         >
>         >
>         >
>         >
>         > ____________________________________________________________________________
>         > From: subexp-daq <subexp-daq-bounces at lists.chalmers.se> on behalf of Weber,
>         > Guenter Dr. <g.weber at hi-jena.gsi.de>
>         > Sent: Tuesday, 20 February 2024 10:58:27
>         > To: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
>         > Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
>         >
>         > Dear friends,
>         >
>         >
>         > I now grabbed a V560 module that was working fine in another DAQ system and
>         > put it into our test system.
>         >
>         >
>         > The main.cfg looks like this:
>         >
>         >
>         > log_level=spam # info, verbose, debug, spam
>         >
>         > CRATE("MCAL") {
>         >     GSI_VULOM(0x03000000) {
>         >         timestamp = true # needed to get timestamps in the data output
>         >     #   ecl=0..15
>         >     }
>         >     BARRIER
>         >     CAEN_V560(0x333333300) {
>         >         use_veto = true
>         >     }
>         > #   CAEN_V767A(0x03100000) {
>         > #   }
>         > }
>         >
>         > Starting the DAQ now results in a freeze of the RIO4. A reset of the crate
>         > is necessary to talk to it again.
>         >
>         >
>         > The problem occurs in the first slow init of the V560 module. To find the
>         > exact line, I added some output to CRATE.C:
>         >
>         >
>         > 10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
>         > before push_log_level(module)
>         > before a_crate->module_init_id = module->id
>         > before module->props->init_slow(a_crate, module)
>         > LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
>         > before module_init_id_mark(a_crate, module)
>         > before pop_log_level(module)
>         > 10: crate/crate.c:923: .Slow-init module[1]=CAEN_V560.
>         > before push_log_level(module)
>         > before a_crate->module_init_id = module->id
>         > before module->props->init_slow(a_crate, module)
>         >
>         >
>         > The CRATE.C code now looks like this:
>         >
>         >
>         >     TAILQ_FOREACH(module, &a_crate->module_list, next) {
>         >         if (NULL == module->props) {
>         >             continue;
>         >         }
>         >         LOGF(info)(LOGL, "Slow-init module[%u]=%s.", module->id,
>         >             keyword_get_string(module->type));
>         >         printf("before push_log_level(module) \n");
>         >         push_log_level(module);
>         >         printf("before a_crate->module_init_id = module->id \n");
>         >         a_crate->module_init_id = module->id;
>         >         printf("before module->props->init_slow(a_crate, module) \n");
>         >         if (!module->props->init_slow(a_crate, module)) {
>         >             printf("before pop_log_level(module) \n");
>         >             pop_log_level(module);
>         >             printf("before goto crate_init_done \n");
>         >             goto crate_init_done;
>         >         }
>         >         printf("before module_init_id_mark(a_crate, module) \n");
>         >         module_init_id_mark(a_crate, module);
>         >         printf("before pop_log_level(module) \n");
>         >         pop_log_level(module);
>         >     }
>         >
>         > Thus, to me it looks like the check "if (!module->props->init_slow(a_crate,
>         > module)) ..." is doing something quite horrible to the RIO4.
>         >
>         >
>         > This is unfortunate, because my original aim was to show that there is also
>         > a bug/mistake in readout_dt of the V560 module. But I did not come this far.
>         >
>         >
>         > Do you have any idea what might cause the freezing of the RIO4?
>         >
>         >
>         >
>         >
>         > Best greetings and many thanks
>         >
>         > Günter
>         >
>         >
>         >
>         >
>         >
>         >
>
>

From hans.tornqvist at chalmers.se  Thu Feb 22 13:10:29 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Thu, 22 Feb 2024 13:10:29 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <c985e7923d944c1598475550f150c042@hi-jena.gsi.de>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
	<E401E502-273E-4E8F-9474-95DAF1F5767A@chalmers.se>
	<a1c630bddacc45e1815496f2e5e0d763@hi-jena.gsi.de>
	<0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>
	<743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
	<5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
	<93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>
	<b2c00173e725403fa0a9867634851cd3@hi-jena.gsi.de>
	<c985e7923d944c1598475550f150c042@hi-jena.gsi.de>
Message-ID: <53bcb3db-1ea5-43db-9e4e-7a77238dd899@chalmers.se>

On 2024-02-22 10:04, Weber, Guenter Dr. wrote:
> Dear friends,
> 
> after the bug in map_map was fixed, the freeze does not happen again. 
> Very good!

Thanks for testing, I'm saving that fix myself!

> Now back to my original concern regarding the V560 module ...
> 
> readout_dt looks like this:
> 
> uint32_t
> caen_v560_readout_dt(struct Crate *a_crate, struct Module *a_module)
> {
>     (void)a_crate;
>     LOGF(spam)(LOGL, NAME" readout_dt {");
>     a_module->event_counter.value++;
>     LOGF(spam)(LOGL, NAME" readout_dt(ctr=0x%08x) }",
>         a_module->event_counter.value);
>     return 0;
> }
> 
> The module counter is incremented by one every time readout_dt is 
> executed. This results in a problem in crate.c:
> 
> diff_module = COUNTER_DIFF(*module->crate_counter,
>     module->event_counter, module->this_minus_crate);
> /* TODO: Clean this. */
> shadow_counter.value = module->shadow.data_counter_value;
> shadow_counter.mask = module->event_counter.mask;
> diff_shadow = COUNTER_DIFF(*module->crate_counter,
>     shadow_counter, module->this_minus_crate);
>
> create_do_shad = !crate_get_do_shadow(a_crate);
> printf("%s: diff_module: %u, module_crate_counter: %u, "
>     "module_event_counter: %u, module_this_minus_crate: %u\n",
>     keyword_get_string(module->type), diff_module,
>     (*module->crate_counter).value, (module->event_counter).value,
>     module->this_minus_crate);
> if (0 == diff_module &&
>     (create_do_shad ||
>      NULL == module->props->readout_shadow ||
>      0 == diff_shadow)) {
>     ok = 1;
>     printf("%u\n", ok);
>     break;
> }
> getchar();
> 
> When the difference between (*module->crate_counter).value and 
> (module->event_counter).value is evaluated, the latter has already been 
> incremented because readout_dt for the module has already run, while the 
> former counter has not been incremented.

The crate counter should have been incremented by the readout function 
that calls 'crate_readout_dt'. If I remember correctly you used the 
r3bfuser, so somewhere in fuser.c there's a function fuser_readout which 
calls crate_tag_counter_increase. The crate counter increment is 
"abstracted" away a bit, due to module tagging and multi-event support, 
where it can increase by an arbitrary value between events.

> This is the output:
> 
> CAEN_V560: diff_module: 4294967295, module_crate_counter: 0, 
> module_event_counter: 1, module_this_minus_crate: 0
> 
> Note that diff_module shows the result of an "0 - 1" operation when 
> working with unsigned integers.

Looks like the crate counter stays still on 0 indeed. Do you have a 
snippet of the drasi log around this error message?
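
For illustration only, the wrap-around in the quoted output can be reproduced in isolation. The sketch below is not nurdlib code: `counter_diff` and `counters_match` are hypothetical stand-ins, and the real COUNTER_DIFF macro also takes masks and an offset into account.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the counter comparison (NOT the nurdlib
 * COUNTER_DIFF macro): crate counter minus module counter as uint32_t.
 */
static uint32_t
counter_diff(uint32_t a_crate_value, uint32_t a_module_value)
{
    /* Unsigned subtraction is defined modulo 2^32, so 0 - 1 wraps. */
    return a_crate_value - a_module_value;
}

/* The check only passes when both counters agree. */
static int
counters_match(uint32_t a_crate_value, uint32_t a_module_value)
{
    return 0 == counter_diff(a_crate_value, a_module_value);
}

/* Usage example with the values from the quoted printf output. */
static void
counter_diff_example(void)
{
    assert(4294967295u == counter_diff(0, 1));
    assert(!counters_match(0, 1));
    assert(counters_match(1, 1));
}
```

With crate=0 and module=1 the difference is 4294967295, which is the value printed for CAEN_V560 above.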

> The original version of the crate.c code would now again execute 
> readout_dt of module V560, thus incrementing the module counter another 
> time. Thus, diff_module would be "0 - 2". This would repeat until the 
> timeout condition a bit later in the code is reached.
> Then the modules would be re-initialized, thus setting 
> (module->event_counter).value of V560 back to zero. But the crate 
> counter would be incremented. Thus, by sheer luck the next try of the 
> same test would have (*module->crate_counter).value and 
> (module->event_counter).value both equal to 1. And from this point the 
> DAQ is running as intended.

It sounds to me like the old version was very broken and should be 
buried, deep.

Best regards,
Hans

> Ok, I hope the explanation was clear and I understood correctly what is 
> happening.
> 
> Best greetings
> Günter

From f96hajo at chalmers.se  Thu Feb 22 13:46:29 2024
From: f96hajo at chalmers.se (=?ISO-8859-15?Q?H=E5kan_T_Johansson?=)
Date: Thu, 22 Feb 2024 13:46:29 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <53bcb3db-1ea5-43db-9e4e-7a77238dd899@chalmers.se>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
	<E401E502-273E-4E8F-9474-95DAF1F5767A@chalmers.se>
	<a1c630bddacc45e1815496f2e5e0d763@hi-jena.gsi.de>
	<0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>
	<743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
	<5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
	<93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>
	<b2c00173e725403fa0a9867634851cd3@hi-jena.gsi.de>
	<c985e7923d944c1598475550f150c042@hi-jena.gsi.de>
	<53bcb3db-1ea5-43db-9e4e-7a77238dd899@chalmers.se>
Message-ID: <2a4cfd01-8c50-2721-c4e2-89f1a3e55a12@chalmers.se>


Dear Günter,

just a side-note (since I'm not deep into r3bfuser...):

perhaps you already have, but if not, I suspect it would be good if you 
could push the versions of the code you are using (unless it is plain 
master branches).  Just to avoid some guesswork.

Cheers,
Håkan



From g.weber at hi-jena.gsi.de  Thu Feb 22 16:09:25 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Thu, 22 Feb 2024 15:09:25 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <423e14dd-73a9-4fbc-b106-4856cb1e1997@chalmers.se>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
	<E401E502-273E-4E8F-9474-95DAF1F5767A@chalmers.se>
	<a1c630bddacc45e1815496f2e5e0d763@hi-jena.gsi.de>
	<0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>
	<743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
	<5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
	<93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>
	<b2c00173e725403fa0a9867634851cd3@hi-jena.gsi.de>
	<c985e7923d944c1598475550f150c042@hi-jena.gsi.de>
	<53bcb3db-1ea5-43db-9e4e-7a77238dd899@chalmers.se>
	<62267106fb284e50b805b9dba09b8483@hi-jena.gsi.de>,
	<423e14dd-73a9-4fbc-b106-4856cb1e1997@chalmers.se>
Message-ID: <3c091e0197404e4b85562bb5a0559906@hi-jena.gsi.de>

Dear Hans,


many thanks! And in particular for all the detailed explanations.


For the VULOM "sleep 1" did not do the trick, but "sleep 10" worked. Is there any chance to ask the VULOM if it feels ready to do the job, instead of using a random waiting time?


Also I noticed that when asking the VULOM which firmware it is using, we get a slightly different reply than the actual firmware number:

RIO4-MCAL-1 mbsdaq > vulomflash --addr=3 --read
VULOM base address: 0x03000000
hwmap_mapvme.c:398: LOG: Virtual address for VULOM/TRIDI @ VME 0x03000000 is 0x3005e000.
Performing command 'read'...
VOLUM+0 => 0x14091f20
VOLUM+RANGE_REG(0x800000) => 0x0000006a
Released vme ptr.

But the actual firmware number is 1409285e.

For comparison, should one look only at the first four hex digits? Or is there more to take into account?


For the V560 module, misusing the bitmask for the counter resolved the issue. I attach the new log at the end of this mail. Maybe you will find something notable, but to me it looks fine now.


Our next steps would be as follows:


1) Wait for you to implement the bugfixes of the last days into NURDLIB.

2) Setting up the test system with the most recent version of NURDLIB and checking whether our minimal system with VULOM and V560 is now running smoothly.

3) Hammering the V767 TDC into NURDLIB.

4) Once we have achieved this, we would go back to testing the SIS3316 modules.


Best greetings

Günter


10: lwroc_hostname_util.c:108: Host 'lyserv' known as 192.168.1.1 (port: 56583).
Thread has no error buffer yet...
CPUS: 1
delay: 1
10: lwroc_hostname_util.c:108: Host 'lyserv' known as 192.168.1.1 (port: 56583).
Thread has no error buffer yet...
HOST: RIO4-MCAL-1
Token: d6d68d7b (d6d68d7b:d6d68d7b) [/mbsusr/mbsdaq/.drasi_tokens/mcal]
10: lwroc_hostname_util.c:457: Own address: 192.168.1.71/255.255.255.0 (eth1).
10: lwroc_data_pipe.c:145: Data buffer READOUT_PIPE, size 419430400 = 0x19000000, 1 consumers.
10: lwroc_triva_readout.c:66: Silence TRIVA  (HALT)
10: lwroc_net_io.c:167: Started server on port 56583 (data port 39534).
client union size: 244 208 188 508 640 204 204  => 640
10: lwroc_udp_awaken_hints.c:159: UDP awaken hints file: /tmp/drasi.u1001/drasi.hints.u1001.RIO4-MCAL-1:56583
10: lwroc_main.c:706: Log message rate limit not in effect.
10: lwroc_readout.c:112: call readout_init...
10: lwroc_thread_util.c:117: This is the triva control thread!
10: lwroc_thread_util.c:117: This is the net io thread!
10: lwroc_thread_util.c:117: This is the slow_async thread!
10: lwroc_thread_util.c:117: This is the data server thread!
8: lwroc_message_wait.c:86: Waited 1 seconds for msg client.
8: lwroc_triva_state.c:414: Waited 1 seconds for initial slave and EB connection(s):
8: lwroc_triva_state.c:422: [EB lyserv] (state 0)
10: lwroc_message_internal.c:472: Message client connected!
10: lwroc_net_trans.c:1156: [drasi] Transport client connected (data) [192.168.1.1].
10: lwroc_triva_control.c:370: Setup TRIVA  (DISBUS, HALT, MASTER, RESET)
10: lwroc_triva_control.c:418: Minimum event time ctime(5000)+1*rd(686)+3*wr(634)+fctime(1000)=8588 ns (116.442 kHz)
10: lwroc_triva_state.c:1486: (Re)send ident messages...
10: lwroc_triva_control.c:495: START TEST ACQ: HALT, CLEAR=RESET, MT=1
9: lwroc_triva_control.c:507: TEST: GO
10: lwroc_triva_control.c:725: RUN: RESET
10: lwroc_triva_control.c:729: RUN: MT=14
9: lwroc_triva_control.c:737:   GO (1 good test triggers done) (max 116.4 kHz)
10: lwroc_triva_readout.c:376: Trigger 14 seen.
10: config/config.c:181: Will try default cfg path='/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default', can be set with NURDLIB_DEF_PATH.
8: lwroc_triva_state.c:2399: Master: deadtime: 1.  Status: 0x10 (IN_READOUT).  EC: 1
10: lwroc_triva_state.c:2428: [EB lyserv] EB: Status: 0x0.
8: lwroc_triva_state.c:2488: Node(s) busy in readout, waiting...
10: config/parser.c:287: Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/global.cfg' {
10: config/parser.c:299: Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/global.cfg' }
10: config/parser.c:287: Opened './main.cfg' {
10: config/config.c:1299: .Global log level=debug.
10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/crate.cfg' {
10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/crate.cfg' }
10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/gsi_vulom.cfg' {
10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/gsi_vulom.cfg' }
10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' {
10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' }
10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/caen_v560.cfg' {
10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/caen_v560.cfg' }
10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' {
10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' }
10: config/parser.c:299: Closed './main.cfg' }
10: crate/crate.c:348: crate_create {
10: crate/crate.c:674: crate_create(MCAL) }
10: crate/crate.c:900: crate_init(MCAL) {
10: crate/crate.c:924: .Slow-init module[0]=GSI_VULOM.
LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
10: crate/crate.c:924: .Slow-init module[1]=CAEN_V560.
10: module/map/map.c:224: ...rd(0x33333300+0xfa/16)=963ns wr(0x33333300+0x50/16)=713ns.
10: crate/crate.c:977: .Fast-init module[0]=GSI_VULOM.
10: crate/crate.c:977: .Fast-init module[1]=CAEN_V560.
10: crate/crate.c:1074: crate_init(MCAL) }
10: ctrl/ctrl.c:788: Control server online.
Thread has no error buffer yet...
10: f_user.c:559: WR ID=0x200.
10: f_user.c:565: TS offset unset. Will not modify stamp.
10: f_user.c:572: TPAT: No.
10: f_user.c:573: Sync-check: No.
10: f_user.c:575: Spill triggers: No.
10: f_user.c:576: LMU: No.
10: f_user.c:577: Timer latches: No.
10: f_user.c:578: Spill shape: No.
10: f_user.c:579: Micro-structure: No.
10: f_user.c:581: Multi-event flag: No.
10: f_user.c:586: UDP destination: None.
GSI_VULOM: diff_module: 0, module_crate_counter: 0, module_event_counter: 0, module_this_minus_crate: 0
CAEN_V560: diff_module: 0, module_crate_counter: 0, module_event_counter: 0, module_this_minus_crate: 0
...


________________________________
From: Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
Sent: Thursday, 22 February 2024 15:26:15
To: Weber, Guenter Dr.; Håkan T Johansson
Subject: Re: AW: [subexp-daq] Report of a possible bug of the CAEN_V560 module

Dear Günter,

I have some ideas after your two e-mails; logs and config files are
really useful :) I'll shorten the log for my comments. For the tl;dr
version, just skip to the code change suggestion at the bottom.

On 2024-02-22 14:21, Weber, Guenter Dr. wrote:
> Dear Hans,
>
> here is the output of the DRASI log from the test system. I am using the
> most recent (or almost most recent) versions of the various software
> packages from GITLAB. The system currently only has a VULOM and a V560
> module.
>
> I added some comments to the output. To me it looks like the VULOM has some
> problems at the beginning. And, separate from the VULOM issue, the math
> of the difference in counters does not work out for the V560 module.
>
> I did not notice the VULOM issue before. So, maybe by adding a lot of
> output lines to crate.c and then removing them I have broken
> something. But it is also possible that I simply overlooked these
> error messages before, as in the end it looks like the DAQ is working fine.
>
> Best greetings
>
> Günter
>
> 10: f_user.c:559: WR ID=0x200.
> 10: f_user.c:565: TS offset unset. Will not modify stamp.
> 10: f_user.c:572: TPAT: No.
> 10: f_user.c:573: Sync-check: No.
> 10: f_user.c:575: Spill triggers: No.
> 10: f_user.c:576: LMU: No.
> 10: f_user.c:577: Timer latches: No.
> 10: f_user.c:578: Spill shape: No.
> 10: f_user.c:579: Micro-structure: No.
> 10: f_user.c:581: Multi-event flag: No.
> 10: f_user.c:586: UDP destination: None.
> ***** looks like the VULOM has a problem. I did not notice this before *****
> 5: module/gsi_vulom/gsi_vulom.c:12: .Serial timestamp receiver has had
> sync failure, status=0x000a8000.
> 5: module/gsi_vulom/gsi_vulom.c:12: .Serial timestamp receiver not
> synced, status=0x000a8000.
> 5: crate/crate.c:1250: .MCAL[0]=GSI_VULOM: readout_dt failed = 0x00000004.
> 5: crate/crate.c:1322: .MCAL[0]=GSI_VULOM: Event counter:
> crate=0x00000000/32, this-crate=0x00000000, module=0x00000000/31
> diff=0xdeadbeef, shadow=0x00000000/31 diff=0xdeadbeef.

The sync could be bad if the DAQ starts too fast after setting up the
timestamp source. Try "sleep 1" after setting up the vulom so the
timestamp receiver in the vulom can latch onto its input, even if it's
wired internally.

> ***** here we see the counter mismatch - I did add a 250 ms delay to
> crate.c, before it does readout_dt again to avoid having thousands of
> output lines here *****
> CAEN_V560: diff_module: 4294967295, module_crate_counter: 0,
> module_event_counter: 1, module_this_minus_crate: 0
> CAEN_V560: diff_module: 4294967294, module_crate_counter: 0,
> module_event_counter: 2, module_this_minus_crate: 0
> CAEN_V560: diff_module: 4294967293, module_crate_counter: 0,
> module_event_counter: 3, module_this_minus_crate: 0
> CAEN_V560: diff_module: 4294967292, module_crate_counter: 0,
> module_event_counter: 4, module_this_minus_crate: 0
> ***** after four trials of readout_dt of V560, we reach the timeout of 1
> second*****
> 5: crate/crate.c:1287: .MCAL[1]=CAEN_V560: readout_dt timeout.
> 5: crate/crate.c:1322: .MCAL[1]=CAEN_V560: Event counter:
> crate=0x00000000/32, this-crate=0x00000000, module=0x00000004/32
> diff=0xfffffffc, shadow=0x00000000/32 diff=0x00000000.

This is an artifact of the soft counter in the v560, obviously we don't
expect the module to have more accepted events while polling it, but the
real problem comes a bit later.
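
To make the polling artifact concrete, here is a small model of it. It is a sketch with hypothetical names (`soft_module`, `soft_readout_dt`, `poll_until_match`), not the crate.c loop: each poll calls a readout_dt that bumps a software counter, so the difference to a crate counter that stays at 0 can only grow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a module whose event counter lives in software. */
struct soft_module {
    uint32_t event_counter;
};

/* Mimics the quoted readout_dt: every call bumps the soft counter. */
static void
soft_readout_dt(struct soft_module *a_module)
{
    assert(NULL != a_module);
    ++a_module->event_counter;
}

/*
 * Polls until the counters agree or a_max_polls is used up.
 * Returns the number of polls needed, or -1 on timeout.
 */
static int
poll_until_match(struct soft_module *a_module, uint32_t a_crate_counter,
    unsigned a_max_polls)
{
    unsigned i;

    for (i = 0; i < a_max_polls; ++i) {
        soft_readout_dt(a_module);
        if (0 == (uint32_t)(a_crate_counter - a_module->event_counter)) {
            return (int)(i + 1);
        }
    }
    return -1;
}
```

Starting from zero with a crate counter of 0 and four polls, the model times out with the module counter at 4 and a difference of 0xfffffffc, matching the log lines quoted above. With a crate counter of 1 the first poll already matches.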

> 5: crate/crate.c:1394: .MCAL: readout_dt failed!
> 5: crate/crate.c:1501: .MCAL: had problems, re-initializing.
> 10: crate/crate.c:684: .crate_deinit(MCAL) {
> 10: crate/crate.c:708: .crate_deinit(MCAL) }
> 8: lwroc_triva_state.c:2028: Master TRIVA/MI no progress last second,
> and in deadtime.
> 8: lwroc_triva_state.c:2399: Master: deadtime: 1.  Status: 0x10
> (IN_READOUT).  EC: 2
> 10: lwroc_triva_state.c:2428: [EB lyserv] EB: Status: 0x0.
> 8: lwroc_triva_state.c:2488: Node(s) busy in readout, waiting...
> 10: crate/crate.c:900: .crate_init(MCAL) {
> 10: crate/crate.c:924: ..Slow-init module[0]=GSI_VULOM.
> LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
> 10: crate/crate.c:924: ..Slow-init module[1]=CAEN_V560.
> 10: module/map/map.c:224: ....rd(0x33333300+0xfa/16)=963ns
> wr(0x33333300+0x50/16)=713ns.
> 10: crate/crate.c:977: ..Fast-init module[0]=GSI_VULOM.
> 10: crate/crate.c:977: ..Fast-init module[1]=CAEN_V560.
> 10: crate/crate.c:1074: .crate_init(MCAL) }
> 5: f_user.c:1257: .had readout error, ret=0x14, trigger=14, prev=0

This is the last log from the readout part, it says trigger 14 was
handled. This trigger is always fired in MBS-like DAQs before starting
the event loop, but no master start is delivered to the modules. No
physical event is associated with it, so we expect no counter increases
and no event data.

In general we read out "everything" for every trigger and rely on
modules reporting their status/content properly. Modules like the v560
ruin this since we always check it for all events and the counter always
increments, and it clearly shouldn't for trigger 14. So, in this case it
really did test the incorrect software logic...

I had a look in the v560 manual once more and only now did I realize
that it is not trigger based, the scalers are only available on-the-fly.
The event counter makes no sense then, so I will concede and suggest you
set the module counter mask to 0.

Have a look in module/gsi_vftx2/gsi_vftx2.c line 79, another module
without an event-counter which skips the whole counting stuff:

vftx2->module.event_counter.mask = 0;

Put something similar in module/caen_v560/caen_v560.c line 41, and feel
free to remove the increment in readout_dt.
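
As a rough picture of why a zero mask takes a module out of the counter check, consider the sketch below. It rests on an assumption that is not confirmed in this thread: that the comparison reduces the difference with the counter masks. The names `counter_sketch` and `masked_diff` are hypothetical, and the real COUNTER_DIFF macro may well differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value/mask pair, modelled on the fields named in the thread. */
struct counter_sketch {
    uint32_t value;
    uint32_t mask;
};

/*
 * ASSUMPTION: only bits present in both masks take part in the
 * comparison. This is an illustration, not the nurdlib macro.
 */
static uint32_t
masked_diff(struct counter_sketch a_crate, struct counter_sketch a_module)
{
    return (a_crate.value - a_module.value) & a_crate.mask & a_module.mask;
}

/* Usage example: the same counter values with a full and a zero mask. */
static void
masked_diff_example(void)
{
    struct counter_sketch crate = {0, 0xffffffffu};
    struct counter_sketch counting = {4, 0xffffffffu};
    struct counter_sketch ignored = {4, 0};

    assert(0xfffffffcu == masked_diff(crate, counting));
    assert(0 == masked_diff(crate, ignored));
}
```

Under that assumption a module with mask 0 always compares as "in sync", whatever its soft counter holds.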

Hope the extra info isn't too verbose...

Best regards,
Hans

From f96hajo at chalmers.se  Thu Feb 22 17:06:19 2024
From: f96hajo at chalmers.se (=?ISO-8859-15?Q?H=E5kan_T_Johansson?=)
Date: Thu, 22 Feb 2024 17:06:19 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <3c091e0197404e4b85562bb5a0559906@hi-jena.gsi.de>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
	<E401E502-273E-4E8F-9474-95DAF1F5767A@chalmers.se>
	<a1c630bddacc45e1815496f2e5e0d763@hi-jena.gsi.de>
	<0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>
	<743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
	<5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
	<93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>
	<b2c00173e725403fa0a9867634851cd3@hi-jena.gsi.de>
	<c985e7923d944c1598475550f150c042@hi-jena.gsi.de>
	<53bcb3db-1ea5-43db-9e4e-7a77238dd899@chalmers.se>
	<62267106fb284e50b805b9dba09b8483@hi-jena.gsi.de>,
	<423e14dd-73a9-4fbc-b106-4856cb1e1997@chalmers.se>
	<3c091e0197404e4b85562bb5a0559906@hi-jena.gsi.de>
Message-ID: <21a0486d-cd94-a93a-f677-2bd1e2bf4bb5@chalmers.se>


On Thu, 22 Feb 2024, Weber, Guenter Dr. wrote:

> 
> Dear Hans,
> 
> 
> many thanks! And in particular for all the detailed explanations.
> 
> 
> For the VULOM "sleep 1" did not do the trick, but "sleep 10" worked. Is
> there any chance to ask the VULOM if it feels ready to do the job, instead
> of using a random waiting time?

Now there is!  Update trloii and recompile trlo_ctrl,

   --trig-status   will at the end show some extra lines:

Serial timestamp status:(0x000a8004) words:  4 badbits: 0 CHKsum:0x00
Serial timestamp: Sync: no  Bitstr. sync: no, had loss  Data ptn: no, had loss

Where it should say "Sync: ok" when the receiver has locked.
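
A start-up wrapper could then test this output instead of sleeping for a fixed time. The helper below is only a sketch: `status_reports_sync` is a hypothetical name, and it assumes the status text has exactly the layout of the two sample lines above. How the text is captured (for example from a pipe) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if the captured status text contains a line of the form
 * "Serial timestamp: Sync: ok ...", and 0 otherwise.
 * ASSUMPTION: the layout matches the sample output shown above.
 */
static int
status_reports_sync(const char *a_output)
{
    static const char key[] = "Serial timestamp: Sync: ";
    const char *p;

    assert(NULL != a_output);
    p = strstr(a_output, key);
    if (NULL == p) {
        return 0;
    }
    /* Compare the two characters right after the key with "ok". */
    return 0 == strncmp(p + sizeof key - 1, "ok", 2);
}
```

A caller would rerun the status query every second or so and start the DAQ once the function returns 1, giving up after some maximum number of tries.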

The 'had loss' and bad bits count can be cleared (when locked) by issuing
   "pulse = SERIAL_TSTAMP_FAIL_CLEAR"

> Also I noticed that when asking the VULOM which firmware it is using, we get
> a slightly different reply than the actual firmware number:
> 
> RIO4-MCAL-1 mbsdaq > vulomflash --addr=3 --read
> VULOM base address: 0x03000000
> hwmap_mapvme.c:398: LOG: Virtual address for VULOM/TRIDI @ VME 0x03000000 is
> 0x3005e000.
> Performing command 'read'...
> VOLUM+0 => 0x14091f20
> VOLUM+RANGE_REG(0x800000) => 0x0000006a
> Released vme ptr.
> But the actual firmware number is 1409285e.
> 
> For comparison should one look only at the first four hex numbers? Or is
> there more to take into account?

Yes, vulomflash --read reads at offset 0, and at that offset is also a 
TRIVA module mimic, which only uses the low 16 bits however.  So the high 
16 bits give part of the firmware hash.
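
In code, the comparison amounts to looking at the upper halves of the two 32-bit words. The helper name `fw_high16_match` is made up for this sketch; the values in the example are the two numbers quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if the high 16 bits of the word read at offset 0 agree
 * with the high 16 bits of the reported firmware hash. The low 16
 * bits are ignored, since the TRIVA mimic occupies them.
 */
static int
fw_high16_match(uint32_t a_reg0, uint32_t a_md5sum)
{
    return (a_reg0 >> 16) == (a_md5sum >> 16);
}

/* Usage example: vulomflash value vs. the MD5SUM printed by TRLO. */
static void
fw_high16_example(void)
{
    assert(fw_high16_match(0x14091f20u, 0x1409285eu));
}
```

Both words start with 0x1409, so they are consistent as far as this partial check can tell.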

> For the V560 module, misusing the bitmask for the counter resolved the
> issue. At the end of this mail, I attach the new log. Maybe you find
> something notable, but to me it looks fine now.
> 
> 
> Our next steps would be as follows:
> 
> 
> 1) Wait for you to implement the bugfixes of the last days into NURDLIB.
> 
> 2) Setting up the test system with the most recent version of NURDLIB and
> checking, if our minimal system with VULOM and V560 is now running smoothly.
> 
> 3) Hammering the V767 TDC into NURDLIB.
> 
> 4) Once we have achieved this, we would go back to testing the SIS3316
> modules.
> 
> 
> 
> Best greetings
> 
> Günter

Cheers,
Håkan


> ____________________________________________________________________________
> From: Hans Toshihide Törnqvist <hans.tornqvist at chalmers.se>
> Sent: Thursday, 22 February 2024 15:26:15
> To: Weber, Guenter Dr.; Håkan T Johansson
> Subject: Re: AW: [subexp-daq] Report of a possible bug of the CAEN_V560
> module
> Dear Günter,
> 
> I have some ideas after your two e-mails; logs and config files are
> really useful :) I'll shorten the log for my comments. For the tl;dr
> version, just skip to the code change suggestion at the bottom.
> 
> On 2024-02-22 14:21, Weber, Guenter Dr. wrote:
> > Dear Hans,
> >
> > here is the output of the DRASI log from the test system. I am using the
> > most recent (or almost most recent) versions of the various software
> > packages from GITLAB. The system currently only has a VULOM and a V560
> > module.
> >
> > I added some comments to the output. To me it looks like the VULOM has some
> > problems at the beginning. And, separate from the VULOM issue, the math
> > of the difference in counters does not work out for the V560 module.
> >
> > I did not notice the VULOM issue before. So, maybe by adding a lot of
> > output lines into crate.c and then removing them I have broken
> > something. But it is also possible that I simply overlooked these
> > error messages before, as in the end it looks like the DAQ is working fine.
> >
> > Best greetings
> >
> > Günter
> >
> > 10: f_user.c:559: WR ID=0x200.
> > 10: f_user.c:565: TS offset unset. Will not modify stamp.
> > 10: f_user.c:572: TPAT: No.
> > 10: f_user.c:573: Sync-check: No.
> > 10: f_user.c:575: Spill triggers: No.
> > 10: f_user.c:576: LMU: No.
> > 10: f_user.c:577: Timer latches: No.
> > 10: f_user.c:578: Spill shape: No.
> > 10: f_user.c:579: Micro-structure: No.
> > 10: f_user.c:581: Multi-event flag: No.
> > 10: f_user.c:586: UDP destination: None.
> > ***** looks like the VULOM has a problem. I did not notice this before *****
> > 5: module/gsi_vulom/gsi_vulom.c:12: .Serial timestamp receiver has had
> > sync failure, status=0x000a8000.
> > 5: module/gsi_vulom/gsi_vulom.c:12: .Serial timestamp receiver not
> > synced, status=0x000a8000.
> > 5: crate/crate.c:1250: .MCAL[0]=GSI_VULOM: readout_dt failed = 0x00000004.
> > 5: crate/crate.c:1322: .MCAL[0]=GSI_VULOM: Event counter:
> > crate=0x00000000/32, this-crate=0x00000000, module=0x00000000/31
> > diff=0xdeadbeef, shadow=0x00000000/31 diff=0xdeadbeef.
> 
> The sync could be bad if the DAQ starts too fast after setting up the
> timestamp source. Try "sleep 1" after setting up the vulom so the
> timestamp receiver in the vulom can latch onto its input, even if it's
> wired internally.
> 
> > ***** here we see the counter mismatch - I did add a 250 ms delay to
> > crate.c, before it does readout_dt again to avoid having thousands of
> > output lines here *****
> > CAEN_V560: diff_module: 4294967295, module_crate_counter: 0,
> > module_event_counter: 1, module_this_minus_crate: 0
> > CAEN_V560: diff_module: 4294967294, module_crate_counter: 0,
> > module_event_counter: 2, module_this_minus_crate: 0
> > CAEN_V560: diff_module: 4294967293, module_crate_counter: 0,
> > module_event_counter: 3, module_this_minus_crate: 0
> > CAEN_V560: diff_module: 4294967292, module_crate_counter: 0,
> > module_event_counter: 4, module_this_minus_crate: 0
> > ***** after four trials of readout_dt of V560, we reach the timeout of 1
> > second*****
> > 5: crate/crate.c:1287: .MCAL[1]=CAEN_V560: readout_dt timeout.
> > 5: crate/crate.c:1322: .MCAL[1]=CAEN_V560: Event counter:
> > crate=0x00000000/32, this-crate=0x00000000, module=0x00000004/32
> > diff=0xfffffffc, shadow=0x00000000/32 diff=0x00000000.
> 
> This is an artifact of the soft counter in the v560. Obviously we don't
> expect the module to have more accepted events while polling it, but the
> real problem comes a bit later.
> 
> > 5: crate/crate.c:1394: .MCAL: readout_dt failed!
> > 5: crate/crate.c:1501: .MCAL: had problems, re-initializing.
> > 10: crate/crate.c:684: .crate_deinit(MCAL) {
> > 10: crate/crate.c:708: .crate_deinit(MCAL) }
> > 8: lwroc_triva_state.c:2028: Master TRIVA/MI no progress last second,
> > and in deadtime.
> > 8: lwroc_triva_state.c:2399: Master: deadtime: 1.  Status: 0x10
> > (IN_READOUT).  EC: 2
> > 10: lwroc_triva_state.c:2428: [EB lyserv] EB: Status: 0x0.
> > 8: lwroc_triva_state.c:2488: Node(s) busy in readout, waiting...
> > 10: crate/crate.c:900: .crate_init(MCAL) {
> > 10: crate/crate.c:924: ..Slow-init module[0]=GSI_VULOM.
> > LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
> > 10: crate/crate.c:924: ..Slow-init module[1]=CAEN_V560.
> > 10: module/map/map.c:224: ....rd(0x33333300+0xfa/16)=963ns
> > wr(0x33333300+0x50/16)=713ns.
> > 10: crate/crate.c:977: ..Fast-init module[0]=GSI_VULOM.
> > 10: crate/crate.c:977: ..Fast-init module[1]=CAEN_V560.
> > 10: crate/crate.c:1074: .crate_init(MCAL) }
> > 5: f_user.c:1257: .had readout error, ret=0x14, trigger=14, prev=0
> 
> This is the last log from the readout part; it says trigger 14 was
> handled. This trigger is always fired in MBS-like DAQs before starting
> the event loop, but no master start is delivered to the modules. No
> physical event is associated with it, so we expect no counter increases
> and no event data.
> 
> In general we read out "everything" for every trigger and rely on
> modules reporting their status/content properly. Modules like the v560
> ruin this since we always check it for all events and the counter always
> increments, and it clearly shouldn't for trigger 14. So, in this case it
> really did test the incorrect software logic...
> 
> I had a look in the v560 manual once more and only now did I realize
> that it is not trigger-based; the scalers are only available on-the-fly.
> The event counter makes no sense then, so I will concede and suggest you
> set the module counter mask to 0.
> 
> Have a look in module/gsi_vftx2/gsi_vftx2.c line 79, another module
> without an event-counter which skips the whole counting stuff:
> 
> vftx2->module.event_counter.mask = 0;
> 
> Put something similar in module/caen_v560/caen_v560.c line 41, and feel
> free to remove the increment in readout_dt.
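
To make the counter arithmetic in the log above concrete, here is a minimal sketch in C. It is not nurdlib code: counter_diff() is a hypothetical helper, and it assumes that the crate compares its own counter with the module counter in unsigned 32-bit arithmetic and applies the module's counter mask to the result.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of nurdlib: difference between the crate
 * counter and the module counter, limited to the bits selected by mask.
 * Unsigned 32-bit subtraction wraps around modulo 2^32.
 */
static uint32_t
counter_diff(uint32_t crate_counter, uint32_t module_counter, uint32_t mask)
{
	return (crate_counter - module_counter) & mask;
}
```

Under these assumptions counter_diff(0, 1, 0xffffffff) gives 4294967295 and counter_diff(0, 4, 0xffffffff) gives 0xfffffffc, which are the diff values printed for the CAEN_V560 above. With a mask of 0 the result is always 0, so a module without a meaningful event counter can no longer produce a mismatch.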
> 
> Hope the extra info isn't too verbose...
> 
> Best regards,
> Hans
> 
>

From hans.tornqvist at chalmers.se  Thu Feb 22 18:04:07 2024
From: hans.tornqvist at chalmers.se (=?UTF-8?Q?Hans_Toshihide_T=C3=B6rnqvist?=)
Date: Thu, 22 Feb 2024 18:04:07 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <3c091e0197404e4b85562bb5a0559906@hi-jena.gsi.de>
References: <f26289233d474198a724c44292bee7ed@hi-jena.gsi.de>
	<c83a1fc0e25d49bd93f145e32715f21a@hi-jena.gsi.de>
	<36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
	<aef881cc33704546835a4e37051bb4ac@hi-jena.gsi.de>
	<E401E502-273E-4E8F-9474-95DAF1F5767A@chalmers.se>
	<a1c630bddacc45e1815496f2e5e0d763@hi-jena.gsi.de>
	<0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>
	<743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
	<5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
	<93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>
	<b2c00173e725403fa0a9867634851cd3@hi-jena.gsi.de>
	<c985e7923d944c1598475550f150c042@hi-jena.gsi.de>
	<53bcb3db-1ea5-43db-9e4e-7a77238dd899@chalmers.se>
	<62267106fb284e50b805b9dba09b8483@hi-jena.gsi.de>
	<423e14dd-73a9-4fbc-b106-4856cb1e1997@chalmers.se>
	<3c091e0197404e4b85562bb5a0559906@hi-jena.gsi.de>
Message-ID: <a565ec44-a090-49df-ad4b-4bfd11b01d64@chalmers.se>

Dear Günter,

The fixes are up, I hope I didn't forget any. Try it, and in case of 
issues please do a "git diff" so we can see all the things that changed.

Thanks for helping out!

Cheers,
Hans

On 2024-02-22 16:09, Weber, Guenter Dr. wrote:
> Dear Hans,
> 
> 
> many thanks! And in particular for all the detailed explanations.
> 
> 
> For the VULOM "sleep 1" did not do the trick, but "sleep 10" worked. Is 
> there any chance to ask the VULOM if it feels ready to do the job, 
> instead of using a random waiting time?
> 
> 
> Also I noticed that when asking the VULOM which firmware it is using, we 
> get a slightly different reply than the actual firmware number:
> 
> RIO4-MCAL-1 mbsdaq > vulomflash --addr=3 --read
> VULOM base address: 0x03000000
> hwmap_mapvme.c:398: LOG: Virtual address for VULOM/TRIDI @ VME 
> 0x03000000 is 0x3005e000.
> Performing command 'read'...
> *VOLUM+0 => 0x14091f20*
> VOLUM+RANGE_REG(0x800000) => 0x0000006a
> Released vme ptr.
> But the actual firmware number is *1409285e*.
> 
> For comparison, should one look only at the first four hex digits? Or is 
> there more to take into account?
> 
> 
> For the V560 module, misusing the bitmask for the counter resolved the 
> issue. At the end of this mail, I attach the new log. Maybe you find 
> something notable, but to me it looks fine now.
> 
> 
> Our next steps would be as follows:
> 
> 
> 1) Wait for you to implement the bugfixes of the last days into NURDLIB.
> 
> 2) Setting up the test system with the most recent version of NURDLIB 
> and checking whether our minimal system with VULOM and V560 is now running 
> smoothly.
> 
> 3) Hammering the V767 TDC into NURDLIB.
> 
> 4) Once we have achieved this, we would go back to testing the SIS3316 
> modules.
> 
> 
> 
> Best greetings
> 
> Günter
> 
> 
> 
> 
> 10: lwroc_hostname_util.c:108: Host 'lyserv' known as 192.168.1.1 (port: 
> 56583).
> Thread has no error buffer yet...
> CPUS: 1
> delay: 1
> 10: lwroc_hostname_util.c:108: Host 'lyserv' known as 192.168.1.1 (port: 
> 56583).
> Thread has no error buffer yet...
> HOST: RIO4-MCAL-1
> Token: d6d68d7b (d6d68d7b:d6d68d7b) [/mbsusr/mbsdaq/.drasi_tokens/mcal]
> 10: lwroc_hostname_util.c:457: Own address: 192.168.1.71/255.255.255.0 
> (eth1).
> 10: lwroc_data_pipe.c:145: Data buffer READOUT_PIPE, size 419430400 = 
> 0x19000000, 1 consumers.
> 10: lwroc_triva_readout.c:66: Silence TRIVA  (HALT)
> 10: lwroc_net_io.c:167: Started server on port 56583 (data port 39534).
> client union size: 244 208 188 508 640 204 204  => 640
> 10: lwroc_udp_awaken_hints.c:159: UDP awaken hints file: 
> /tmp/drasi.u1001/drasi.hints.u1001.RIO4-MCAL-1:56583
> 10: lwroc_main.c:706: Log message rate limit not in effect.
> 10: lwroc_readout.c:112: call readout_init...
> 10: lwroc_thread_util.c:117: This is the triva control thread!
> 10: lwroc_thread_util.c:117: This is the net io thread!
> 10: lwroc_thread_util.c:117: This is the slow_async thread!
> 10: lwroc_thread_util.c:117: This is the data server thread!
> 8: lwroc_message_wait.c:86: Waited 1 seconds for msg client.
> 8: lwroc_triva_state.c:414: Waited 1 seconds for initial slave and EB 
> connection(s):
> 8: lwroc_triva_state.c:422: [EB lyserv] (state 0)
> 10: lwroc_message_internal.c:472: Message client connected!
> 10: lwroc_net_trans.c:1156: [drasi] Transport client connected (data) 
> [192.168.1.1].
> 10: lwroc_triva_control.c:370: Setup TRIVA  (DISBUS, HALT, MASTER, RESET)
> 10: lwroc_triva_control.c:418: Minimum event time 
> ctime(5000)+1*rd(686)+3*wr(634)+fctime(1000)=8588 ns (116.442 kHz)
> 10: lwroc_triva_state.c:1486: (Re)send ident messages...
> 10: lwroc_triva_control.c:495: START TEST ACQ: HALT, CLEAR=RESET, MT=1
> 9: lwroc_triva_control.c:507: TEST: GO
> 10: lwroc_triva_control.c:725: RUN: RESET
> 10: lwroc_triva_control.c:729: RUN: MT=14
> 9: lwroc_triva_control.c:737:   GO (1 good test triggers done) (max 
> 116.4 kHz)
> 10: lwroc_triva_readout.c:376: Trigger 14 seen.
> 10: config/config.c:181: Will try default cfg 
> path='/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default', can be set with NURDLIB_DEF_PATH.
> 8: lwroc_triva_state.c:2399: Master: deadtime: 1.  Status: 0x10 
> (IN_READOUT).  EC: 1
> 10: lwroc_triva_state.c:2428: [EB lyserv] EB: Status: 0x0.
> 8: lwroc_triva_state.c:2488: Node(s) busy in readout, waiting...
> 10: config/parser.c:287: Opened 
> '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/global.cfg' {
> 10: config/parser.c:299: Closed 
> '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/global.cfg' }
> 10: config/parser.c:287: Opened './main.cfg' {
> 10: config/config.c:1299: .Global log level=debug.
> 10: config/parser.c:287: .Opened 
> '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/crate.cfg' {
> 10: config/parser.c:299: .Closed 
> '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/crate.cfg' }
> 10: config/parser.c:287: .Opened 
> '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/gsi_vulom.cfg' {
> 10: config/parser.c:299: .Closed 
> '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/gsi_vulom.cfg' }
> 10: config/parser.c:287: .Opened 
> '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' {
> 10: config/parser.c:299: .Closed 
> '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' }
> 10: config/parser.c:287: .Opened 
> '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/caen_v560.cfg' {
> 10: config/parser.c:299: .Closed 
> '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/caen_v560.cfg' }
> 10: config/parser.c:287: .Opened 
> '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' {
> 10: config/parser.c:299: .Closed 
> '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' }
> 10: config/parser.c:299: Closed './main.cfg' }
> 10: crate/crate.c:348: crate_create {
> 10: crate/crate.c:674: crate_create(MCAL) }
> 10: crate/crate.c:900: crate_init(MCAL) {
> 10: crate/crate.c:924: .Slow-init module[0]=GSI_VULOM.
> LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
> 10: crate/crate.c:924: .Slow-init module[1]=CAEN_V560.
> 10: module/map/map.c:224: ...rd(0x33333300+0xfa/16)=963ns 
> wr(0x33333300+0x50/16)=713ns.
> 10: crate/crate.c:977: .Fast-init module[0]=GSI_VULOM.
> 10: crate/crate.c:977: .Fast-init module[1]=CAEN_V560.
> 10: crate/crate.c:1074: crate_init(MCAL) }
> 10: ctrl/ctrl.c:788: Control server online.
> Thread has no error buffer yet...
> 10: f_user.c:559: WR ID=0x200.
> 10: f_user.c:565: TS offset unset. Will not modify stamp.
> 10: f_user.c:572: TPAT: No.
> 10: f_user.c:573: Sync-check: No.
> 10: f_user.c:575: Spill triggers: No.
> 10: f_user.c:576: LMU: No.
> 10: f_user.c:577: Timer latches: No.
> 10: f_user.c:578: Spill shape: No.
> 10: f_user.c:579: Micro-structure: No.
> 10: f_user.c:581: Multi-event flag: No.
> 10: f_user.c:586: UDP destination: None.
> GSI_VULOM: diff_module: 0, module_crate_counter: 0, 
> module_event_counter: 0, module_this_minus_crate: 0
> CAEN_V560: diff_module: 0, module_crate_counter: 0, 
> module_event_counter: 0, module_this_minus_crate: 0
> ...
> 
> 
> 

From f96hajo at chalmers.se  Mon Feb 26 10:05:21 2024
From: f96hajo at chalmers.se (=?ISO-8859-15?Q?H=E5kan_T_Johansson?=)
Date: Mon, 26 Feb 2024 10:05:21 +0100
Subject: [subexp-daq] drasi option --log-ack-wait
Message-ID: <27b8f559-3a3e-ab54-db7f-4901bc1fd998@chalmers.se>


Hi!

While I hope this option will very rarely be needed, drasi now has an 
option

   --log-ack-wait

which makes it wait for an acknowledgement from the log client before 
proceeding after each log message.  This is intended to help debug 
hardware lockups, by sprinkling the code with log messages before and 
after each suspicious point.  (Or perhaps first just enable verbose 
logging.)
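
As a rough illustration of what "sprinkling" could look like, here is a minimal sketch in C. It does not use drasi's real log calls: TRACE_POINT is a hypothetical stand-in that just prints to stderr, and read_suspicious_register() is an invented example of an access that might lock up on real hardware. In a real readout one would use the DAQ's own log macros, so that each message goes through the log client.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the DAQ's own log call. The idea is that, with
 * acknowledged logging, the last message that reached the log client tells
 * you which access never returned.
 */
#define TRACE_POINT(msg) \
	fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, (msg))

/* Invented example of a hardware access that might hang. */
static uint32_t
read_suspicious_register(volatile uint32_t const *reg)
{
	uint32_t value;

	TRACE_POINT("before register read");
	value = *reg;
	TRACE_POINT("after register read");
	return value;
}
```

If the "before" message arrives but the "after" message never does, the access in between is the likely place of the lockup.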

Cheers,
Håkan

From g.weber at hi-jena.gsi.de  Thu Feb 29 17:00:40 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Thu, 29 Feb 2024 16:00:40 +0000
Subject: [subexp-daq] NURDLIB: Init fast vs Init slow for modules
Message-ID: <8a32f9e3be684e1892036bd445c14181@hi-jena.gsi.de>

Dear friends,


Is there a clear rule for what should happen in the two init routines? In which cases is INIT SLOW executed, and how does that differ from INIT FAST?


Thanks a lot!


Best greetings

Günter



From hans.tornqvist at chalmers.se  Thu Feb 29 18:04:19 2024
From: hans.tornqvist at chalmers.se (=?UTF-8?Q?Hans_Toshihide_T=C3=B6rnqvist?=)
Date: Thu, 29 Feb 2024 18:04:19 +0100
Subject: [subexp-daq] NURDLIB: Init fast vs Init slow for modules
In-Reply-To: <8a32f9e3be684e1892036bd445c14181@hi-jena.gsi.de>
References: <8a32f9e3be684e1892036bd445c14181@hi-jena.gsi.de>
Message-ID: <9e7ff158-ca44-40f4-9326-2186f3a4d909@chalmers.se>

Dear Günter,


Here is a hopefully quick summary of the not so long nurdlib history 
(tl;dr at the bottom):


The idea of two init functions came from the v1290, which has a 
controller for many settings that is really slow to talk to. 'init_slow' 
would do the slow configs, which one does not want to sit through too 
often, and 'init_fast' the faster ones.

The crate calls 'init_slow' for all modules, does some checks, calls 
'init_fast' for all modules, and for certain modules also calls an 
optional 'postinit'.

Later came the idea to do online configuration while a DAQ is running. 
Rather than splitting the init functions (or my mind) even further, 
'init_slow' was taken as the non-online part (e.g. mapping) and 
'init_fast' the online part (e.g. writing thresholds). Some controller 
drivers have been buggy and reducing re-maps has been important.

Obviously, it turned out that some, or most, slow writes on the v1290 
were useful to do online, so lots of things in 'init_slow' moved over to 
'init_fast'. Voilà, the names don't really make sense any longer...

The online feature has higher priority than the re-initialisation 
nowadays, since the latter should be rare in a properly working setup. 
Eventually there should be a refactoring which is great since it changes 
so much at once. I even have another one ready to go into 'master', but 
I didn't dare to push that onto others yet.


Now for the useful tl;dr part :)

Put mapping and things that should not be changed online in init_slow, 
and everything else in init_fast. Everything that comes from 
'config_get_*' could be changed online, I think.
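
As a rough illustration of this rule and of the call order described earlier, here is a minimal sketch in C. It is not nurdlib code: struct Module, its fields and all function names are hypothetical, the threshold stands in for any value read from the configuration, and the optional 'postinit' step is left out.

```c
#include <stddef.h>

/* Hypothetical module state; nurdlib's real structures differ. */
struct Module {
	char const *name;
	int is_mapped;        /* set in init_slow, not changed online */
	unsigned threshold;   /* config value, may be changed online */
	int init_slow_calls;
	int init_fast_calls;
};

/* Non-online part: mapping and anything that must not change online. */
static void
module_init_slow(struct Module *m)
{
	m->is_mapped = 1;
	++m->init_slow_calls;
}

/* Online part: write values that come from the configuration. */
static void
module_init_fast(struct Module *m, unsigned threshold)
{
	m->threshold = threshold;
	++m->init_fast_calls;
}

/* Full init: init_slow for all modules, then init_fast for all modules. */
static void
crate_init(struct Module *modules, size_t count, unsigned threshold)
{
	size_t i;

	for (i = 0; i < count; ++i) {
		module_init_slow(&modules[i]);
	}
	/* The crate's checks between the two passes would go here. */
	for (i = 0; i < count; ++i) {
		module_init_fast(&modules[i], threshold);
	}
}

/* Online reconfiguration repeats only the fast part, so nothing is re-mapped. */
static void
crate_reconfigure_online(struct Module *modules, size_t count,
    unsigned threshold)
{
	size_t i;

	for (i = 0; i < count; ++i) {
		module_init_fast(&modules[i], threshold);
	}
}
```

In this sketch, one crate_init() followed by one crate_reconfigure_online() calls init_slow once and init_fast twice per module, which is the reason for keeping the mapping out of the fast part.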


Cheers,
Hans
