From g.weber at hi-jena.gsi.de Thu Feb 15 20:31:00 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Thu, 15 Feb 2024 19:31:00 +0000
Subject: [subexp-daq] Report of a possible bug with "log_level=spam"
Message-ID: <560c111aa5e343a9aa679dde1eeb3855@hi-jena.gsi.de>

Dear friends,

while playing around with the DAQ, I got the following problem right at
the start of the DAQ:

5: util/log.c:319: ..........Log indent overflow: "..........Gsi Vulom readout_dt {".
5: util/log.c:319: ..........Calling abort()...

In my main.cfg I had just a single VULOM active.

log_level=spam # info, verbose, debug, spam

CRATE("MCAL") {
    GSI_VULOM(0x03000000) {
        timestamp = true # needed to get timestamps in the data output
        # ecl=0..15
    }
#   BARRIER
#   DUMMY(0x01000000) {
#   }
}

If the log_level is reduced to debug, the error does not occur and the
system is running without problems.

I am using the most recent version of NURDLIB.

Attached please find the full output of the RIO when the DAQ is started.

Best greetings

Günter

-------------- next part --------------
10: lwroc_hostname_util.c:108: Host 'lyserv' known as 192.168.1.1 (port: 56583).
Thread has no error buffer yet...
CPUS: 1 delay: 1
10: lwroc_hostname_util.c:108: Host 'lyserv' known as 192.168.1.1 (port: 56583).
Thread has no error buffer yet...
HOST: RIO4-MCAL-1 Token: d6d68d7b (d6d68d7b:d6d68d7b) [/mbsusr/mbsdaq/.drasi_tokens/mcal]
10: lwroc_hostname_util.c:457: Own address: 192.168.1.71/255.255.255.0 (eth1).
10: lwroc_data_pipe.c:145: Data buffer READOUT_PIPE, size 419430400 = 0x19000000, 1 consumers.
10: lwroc_triva_readout.c:66: Silence TRIVA (HALT)
10: lwroc_net_io.c:167: Started server on port 56583 (data port 43514).
client union size: 244 208 188 508 640 204 204 => 640
10: lwroc_udp_awaken_hints.c:159: UDP awaken hints file: /tmp/drasi.u1001/drasi.hints.u1001.RIO4-MCAL-1:56583
10: lwroc_main.c:706: Log message rate limit not in effect.
10: lwroc_readout.c:112: call readout_init...
10: lwroc_thread_util.c:117: This is the triva control thread!
10: lwroc_thread_util.c:117: This is the net io thread!
10: lwroc_thread_util.c:117: This is the slow_async thread!
10: lwroc_thread_util.c:117: This is the data server thread!
10: lwroc_message_internal.c:472: Message client connected!
10: lwroc_net_trans.c:1156: [drasi] Transport client connected (data) [192.168.1.1].
10: lwroc_triva_control.c:370: Setup TRIVA (DISBUS, HALT, MASTER, RESET)
10: lwroc_triva_control.c:418: Minimum event time ctime(5000)+1*rd(694)+3*wr(634)+fctime(1000)=8596 ns (116.333 kHz)
10: lwroc_triva_state.c:1486: (Re)send ident messages...
10: lwroc_triva_control.c:495: START TEST ACQ: HALT, CLEAR=RESET, MT=1
9: lwroc_triva_control.c:507: TEST: GO
10: lwroc_triva_control.c:725: RUN: RESET
10: lwroc_triva_control.c:729: RUN: MT=14
9: lwroc_triva_control.c:737: GO (1 good test triggers done) (max 116.3 kHz)
10: lwroc_triva_readout.c:376: Trigger 14 seen.
10: config/config.c:181: Will try default cfg path='/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default', can be set with NURDLIB_DEF_PATH.
10: config/parser.c:287: Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/global.cfg' {
10: config/parser.c:299: Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/global.cfg' }
10: config/parser.c:287: Opened './main.cfg' {
8: lwroc_triva_state.c:2399: Master: deadtime: 1. Status: 0x10 (IN_READOUT). EC: 1
10: lwroc_triva_state.c:2428: [EB lyserv] EB: Status: 0x0.
8: lwroc_triva_state.c:2488: Node(s) busy in readout, waiting...
10: config/config.c:1299: .Global log level=spam.
10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/crate.cfg' {
10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/crate.cfg' }
10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/gsi_vulom.cfg' {
10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/gsi_vulom.cfg' }
10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' {
10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' }
10: config/parser.c:299: Closed './main.cfg' }
10: crate/crate.c:347: crate_create {
10: crate/crate.c:673: crate_create(MCAL) }
10: crate/crate.c:899: crate_init(MCAL) {
10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
10: crate/crate.c:976: .Fast-init module[0]=GSI_VULOM.
10: crate/crate.c:1073: crate_init(MCAL) }
10: ctrl/ctrl.c:788: Control server online.
Thread has no error buffer yet...
10: f_user.c:559: WR ID=0x200.
10: f_user.c:565: TS offset unset. Will not modify stamp.
10: f_user.c:572: TPAT: No.
10: f_user.c:573: Sync-check: No.
10: f_user.c:575: Spill triggers: No.
10: f_user.c:576: LMU: No.
10: f_user.c:577: Timer latches: No.
10: f_user.c:578: Spill shape: No.
10: f_user.c:579: Micro-structure: No.
10: f_user.c:581: Multi-event flag: No.
10: f_user.c:586: UDP destination: None.
5: util/log.c:319: ..........Log indent overflow: "..........Gsi Vulom readout_dt {".
5: util/log.c:319: ..........Calling abort()...

From hans.tornqvist at chalmers.se Thu Feb 15 23:25:18 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Thu, 15 Feb 2024 23:25:18 +0100
Subject: [subexp-daq] Report of a possible bug with "log_level=spam"
In-Reply-To: <560c111aa5e343a9aa679dde1eeb3855@hi-jena.gsi.de>
References: <560c111aa5e343a9aa679dde1eeb3855@hi-jena.gsi.de>
Message-ID: <55CF5C51-002D-448D-ACEE-DE4C22F90D0B@chalmers.se>

Dear Günter,

That is definitely a bug in nurdlib, thanks for finding it!

It looks like some log call which opens a new scope with "{" does not
have a proper corresponding closing curly bracket log.

Is there more information in the drasi log file by any chance? I will
look through the relevant code meanwhile, but if you find more lines
before the error appears it would help a lot.

(This also reminds me to evaluate nurdlib log scopes again, since it
requires very careful handling of actual C scope...)

Best regards,
Hans
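
To make the suspected failure mode concrete, here is a minimal sketch of a scoped logger of the kind the messages in this thread suggest. It is not nurdlib's code: `log_scoped`, `log_indent_reset`, `run_readouts` and the limit `MAX_INDENT` are hypothetical names, and the value 10 is only an assumption taken from the ten dots in the reported overflow message. The sketch shows that a "{" message without a matching "}" message leaks one indent level per call, so a function that is called once per readout hits the limit after a handful of events.

```c
#include <stdio.h>
#include <string.h>

/* Assumed limit; it matches the ten dots in the report, but the real
 * nurdlib constant is not shown in this thread. */
#define MAX_INDENT 10

static int g_indent = 0;

/* Forget all open scopes (hypothetical helper for this sketch). */
static void
log_indent_reset(void)
{
	g_indent = 0;
}

/*
 * Print a_msg behind one dot per open scope. A trailing '}' closes a
 * scope before printing, a trailing '{' opens one after printing.
 * Returns 0 normally and -1 if a new scope would exceed MAX_INDENT.
 */
static int
log_scoped(char const *a_msg)
{
	size_t len = strlen(a_msg);
	int i;

	if (len > 0 && '}' == a_msg[len - 1] && g_indent > 0) {
		--g_indent;
	}
	for (i = 0; i < g_indent; ++i) {
		putchar('.');
	}
	puts(a_msg);
	if (len > 0 && '{' == a_msg[len - 1]) {
		if (g_indent >= MAX_INDENT) {
			return -1;
		}
		++g_indent;
	}
	return 0;
}

/*
 * Simulate a_n readouts. With a_balanced == 0 the closing log call is
 * "forgotten". Returns the index of the readout that overflowed, or -1.
 */
static int
run_readouts(int a_n, int a_balanced)
{
	int i;

	for (i = 0; i < a_n; ++i) {
		if (0 != log_scoped("Gsi Vulom readout_dt {")) {
			return i;
		}
		if (a_balanced) {
			log_scoped("Gsi Vulom readout_dt }");
		}
	}
	return -1;
}
```

With balanced calls the indent returns to zero after every readout; with the closing call missing, the sketch fails on the call that would open an eleventh scope. That would also fit the observation that the problem only shows up at log_level=spam, if the unbalanced pair is only logged at that level.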

From g.weber at hi-jena.gsi.de Fri Feb 16 01:50:30 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Fri, 16 Feb 2024 00:50:30 +0000
Subject: [subexp-daq] Question on COUNTER_DIFF in crate.c
Message-ID: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de>

Dear friends,

we are now trying to add a new module into NURDLIB. At the beginning, we
just want to have a 'software version' of the module, so no hardware
involved yet.

We are stuck with the following code in crate.c (line 1262 and
following):

    diff_module = COUNTER_DIFF(*module->crate_counter,
        module->event_counter, module->this_minus_crate);
    /* TODO: Clean this. */
    shadow_counter.value =
        module->shadow.data_counter_value;
    shadow_counter.mask = module->event_counter.mask;
    diff_shadow = COUNTER_DIFF(*module->crate_counter,
        shadow_counter, module->this_minus_crate);

As we understand, the idea here is to check with a call of readout_dt if
the module's internal counter agrees with the counter of the crate for
the given module. Basically, when the crate thinks it has accessed the
module n times, the module should report the same number of access
attempts. Is this understanding correct?

If yes, how exactly should the module increment its internal counter?
If this is done on readout_dt, a mismatch between the crate and the
internal counter occurs as soon as we stop the acquisition.

I can explain in more detail, but maybe first you can explain to us what
the whole idea behind this check is? To us it looks like the dummy
module implementation will also have problems with it.

Thank you very much!

Best greetings

Günter

From f96hajo at chalmers.se Fri Feb 16 13:25:59 2024
From: f96hajo at chalmers.se (Håkan T Johansson)
Date: Fri, 16 Feb 2024 13:25:59 +0100
Subject: [subexp-daq] Question on COUNTER_DIFF in crate.c
In-Reply-To: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de>
References: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de>
Message-ID: <1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se>

Dear Günter,

I think Hans might have to correct me.

This is not exactly what this code does, but typically the modules have
a trigger/event counter, which is incremented for each gate/common
signal they receive on the front-panel.

When the readout is event-by-event, these counters are checked strictly
by nurdlib, in order to detect cabling issues, double-triggers and so
on. This is especially important for modules that have multi-event
buffers, that otherwise easily can become desynchronised. (It takes of
course some time to read these counters, which contributes to the
overall deadtime. But the number of times this has 'saved' data-taking
by detecting issues early makes it very worthwhile.)

What the code you refer to is doing is I think 'abusing' this a bit.
There is typically some time (on the order of microseconds) between the
trigger, and when the signals have been digitised and the data becomes
available. Often, those counters are only updated after that is the
case. To me it looks like this function uses the counters (which we
anyhow want to check) to wait for the modules to have finished
converting one event.

Best regards,
Håkan

From hans.tornqvist at chalmers.se Fri Feb 16 13:36:28 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Fri, 16 Feb 2024 13:36:28 +0100
Subject: [subexp-daq] Question on COUNTER_DIFF in crate.c
In-Reply-To: <1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se>
References: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de> <1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se>
Message-ID:

Dear all,

Everything looks fine, but I thought I could add a few more points,
which were written in parallel to Håkan's reply, I'll try to update my
text properly :)

---

The crate struct keeps a counter for every permutation of module tags.
The counters are modified either by:

-) 'crate_tag_counter_increase' which should be called by the user code
   (e.g.
   the f-user) for every tag that should fire for a particular event, or
-) a scaler channel which counts the number of events, e.g. MASTER_START
   in a TRLO II vulom, or a v830 which has an accepted trigger cabled to
   an input, which is fully specified in main.cfg.

The latter overrides the former if configured, so the nurdlib f-user and
r3bfuser always call 'crate_tag_counter_increase' per readout.

For multi-event readout one would need to set up the latter approach.

(Putting a description of this on the docu todo, if it's not already
in...)

---

Best is when the "hardware" increments.

For example, a v775 counts the number of signals sent on its trigger
input, and nurdlib reads this value in readout_dt, which the crate then
uses to compare with its software counters.

Some modules do not provide such counters directly before payload
readout (e.g. gsi_tamex and similar) in which case the counting is done
in software, which is mostly a test of the library logic. The module
payload can carry a counter and trigger number however, which are
checked later in '*parse_data'.

---

I will have a look at the dummy module, I thought we have a test with a
few software triggers for it, otherwise I will add that.

Best regards,
Hans

From g.weber at hi-jena.gsi.de Fri Feb 16 13:50:02 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Fri, 16 Feb 2024 12:50:02 +0000
Subject: [subexp-daq] Question on COUNTER_DIFF in crate.c
In-Reply-To: <1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se>
References: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de>, <1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se>
Message-ID: <5e78793bb00a4cb8840fef372e3ebb07@hi-jena.gsi.de>

Dear Håkan,

thank you. This explanation makes some sense.

Could you also explain the concept of "shadow module". What is it good
for?

Also I would like to point out that the implementation of readout_dt for
the V560 module that we used as a reference looks weird to us:

    uint32_t
    caen_v560_readout_dt(struct Crate *a_crate, struct Module *a_module)
    {
        (void)a_crate;
        LOGF(spam)(LOGL, NAME" readout_dt {");
        a_module->event_counter.value++;
        LOGF(spam)(LOGL, NAME" readout_dt(ctr=0x%08x) }",
            a_module->event_counter.value);
        return 0;
    }

Here the counter is incremented every time the module is 'touched' by
the readout_dt function. In the loop that starts at line 1240 in crate.c
the function readout_dt is executed for the module until the test around
line 1270 is passed or the timeout around line 1280 happens. Thus an
initial mismatch between the (software) counter of module V560 and the
crate that prevents the loop from being exited on the first try will
grow steadily as the module is accessed via readout_dt many, many times
until it runs into the timeout. What is this good for?

To my (current) understanding it is pointless to try to mimic the
function of a true hardware counter within the module by a counter that
only exists in software.
The better way would be to tell crate.c that this module does not have
such a counter so that the check is pointless. Is this understanding
correct? And if yes, how can I tell NURDLIB to skip this check?

Best greetings

Günter

From hans.tornqvist at chalmers.se Fri Feb 16 15:39:42 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Fri, 16 Feb 2024 15:39:42 +0100
Subject: [subexp-daq] Question on COUNTER_DIFF in crate.c
In-Reply-To: <5e78793bb00a4cb8840fef372e3ebb07@hi-jena.gsi.de>
References: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de> <1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se> <5e78793bb00a4cb8840fef372e3ebb07@hi-jena.gsi.de>
Message-ID: <68326b40-e0e4-493a-9287-12e07819070e@chalmers.se>

Dear Günter,

On 2024-02-16 13:50, Weber, Guenter Dr. wrote:
>
> Dear Håkan,
>
> thank you. This explanation makes some sense.
>
> Could you also explain the concept of "shadow module". What is it good
> for?

It's rather "shadow readout mode". The idea is to read data from a
module continuously in parallel to conversion and buffering, instead of
performing every task of acquisition in sequence where every task has to
wait for all the others to finish.

The advantage is that this can significantly reduce the time where a
module is unable to convert and buffer signals.

The potential disadvantage is that the data traffic on for example a VME
backplane could induce noise in the analog measurement.

The non-shadow mode is the default and best tested mode, due to present
module support, cooperation of modules in experiments, and historical
reasons.

> Also I would like to point out that the implementation of readout_dt for
> the V560 module that we used as a reference looks weird to us:
>
>     uint32_t
>     caen_v560_readout_dt(struct Crate *a_crate, struct Module *a_module)
>     {
>         (void)a_crate;
>         LOGF(spam)(LOGL, NAME" readout_dt {");
>         a_module->event_counter.value++;
>         LOGF(spam)(LOGL, NAME" readout_dt(ctr=0x%08x) }",
>             a_module->event_counter.value);
>         return 0;
>     }
>
> Here the counter is incremented every time the module is 'touched' by
> the readout_dt function. In the loop that starts at line 1240 in crate.c
> the function readout_dt is executed for the module until the test around
> line 1270 is passed or the timeout around line 1280 happens. Thus an
> initial mismatch between the (software) counter of module V560 and the
> crate that prevents the loop from being exited on the first try will
> grow steadily as the module is accessed via readout_dt many, many times
> until it runs into the timeout. What is this good for?

Nurdlib avoids resetting counters and instead latches and saves counter
values (e.g. soft crate counters and module counters) during dead-time
when counters should not change. After the first event, both counters
must have incremented by 1.
Any particular start value such as 0 or 1 has no deeper meaning, it's
the progression that is important.

Resetting tends to carry with it the idea of "after a reset, or setting
something to 0, everything is fine". It moves the focus away from
getting the complete logic correct, at least that is the feeling I get.

Now, to actually discuss your case :)
Are you seeing that the mismatch between the crate counter and the v560
keeps increasing for new events? Or do you have such a problem with the
dummy module? (I still did not find the slot to look at it...)

> To my (current) understanding it is pointless to try to mimic the
> function of a true hardware counter within the module by a counter that
> only exists in software. The better way would be to tell crate.c that
> this module does not have such a counter so that the check is pointless.
> Is this understanding correct? And if yes, how can I tell NURDLIB to
> skip this check?

A software counter would be a consistency check of the implementation,
but you are correct that it has little to do with the signals that are
recorded. One can send the same random signal to different modules and
verify correlation at some point after digitisation, and definitely no
later than online monitoring while data is recorded.

I think it's possible to give the module counter a mask of 0. The
counter check should in principle be:

    mask = ctr_a_mask & ctr_b_mask;
    ctr_a = ctr_a_raw - ctr_a_latch;
    ctr_b = ctr_b_raw - ctr_b_latch;
    if ( (ctr_a & mask) == (ctr_b & mask) ) { /* all good! */ }

If either mask is 0 the condition will always pass. There may be a
clearer way to express this than building it into the masking, but I
would say that using a module without any kind of sync-check in
event-per-event analysis is overall dangerous...

Hope that helps!
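
The check outlined above can be written out as a small, self-contained C function. This is only a sketch under stated assumptions: `struct LatchedCounter` and `counters_agree` are hypothetical names, not nurdlib's types or API, and the real COUNTER_DIFF macro may well differ. It illustrates two points from the text: only the progression since the latch matters, not the start value, and a mask of 0 on either side makes the comparison pass unconditionally.

```c
#include <stdint.h>

/* Hypothetical stand-in for a counter with latch and valid-bit mask. */
struct LatchedCounter {
	uint32_t raw;   /* Most recent value read. */
	uint32_t latch; /* Value saved at the reference point. */
	uint32_t mask;  /* Valid bits; 0 means "no usable counter". */
};

/*
 * Returns 1 if both counters progressed by the same amount since their
 * latches, compared only in the bits both counters provide.
 * Unsigned subtraction keeps this correct across counter wrap-around.
 */
static int
counters_agree(struct LatchedCounter const *a_a,
    struct LatchedCounter const *a_b)
{
	uint32_t mask = a_a->mask & a_b->mask;
	uint32_t diff_a = a_a->raw - a_a->latch;
	uint32_t diff_b = a_b->raw - a_b->latch;

	return (diff_a & mask) == (diff_b & mask);
}
```

For example, a crate counter that went from 0 to 5 agrees with a 10-bit module counter that went from 1020 to 1 (wrapping past 1023), because both advanced by 5 in the shared 10 bits.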

Best regards,
Hans

From hans.tornqvist at chalmers.se Fri Feb 16 16:01:49 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Fri, 16 Feb 2024 16:01:49 +0100
Subject: [subexp-daq] Question on COUNTER_DIFF in crate.c
In-Reply-To: <68326b40-e0e4-493a-9287-12e07819070e@chalmers.se>
References: <6ce6f77452774e7b9a053044e2e0b35c@hi-jena.gsi.de> <1744d4fc-b1e8-ed71-3aa2-dbf8a799a77b@chalmers.se> <5e78793bb00a4cb8840fef372e3ebb07@hi-jena.gsi.de> <68326b40-e0e4-493a-9287-12e07819070e@chalmers.se>
Message-ID: <1059b12a-8a34-4aef-a6c6-06e60871ea84@chalmers.se>

Dear Günter,

I had a quick look in the Caen v767 manual and thought I could mention a
few of my thoughts.
There is an event-counter register on offset 0x4c which holds the number of events transferred to the output buffer. This can be read out and returned in readout_dt. Note here that the time from accepting a trigger to starting the readout of the module must be sufficiently long, otherwise you might get a value before the complete event has been finished! If the module behaves correctly this counter should only update once the last word of an event is ready to be read from the output buffer. This waiting time is typically controlled with the "conversion time" that can be set in the TRIMI, or the trigger module in general. Waking up the readout computer which accesses the module can take an "arbitrary" time on top of the conversion time, so one can play around with this value a bit. There is an adaptive conversion time (acvt) feature in nurdlib that I have not tested myself for quite some time. It polls event-counters until some timeout, and adjusts the CVT on-the-fly to reduce the overall waiting time and polling calls. Could be interesting if you would like to go that far in optimising your setup, but I do not know how well the Sis 3316 supports this feature. Back to the v767. It looks like the headers in the payload have the same event-counter in the lowest 10 bits. I would suggest that parse_data checks this value with the module event-counter. The End-of-block word has an event-size value which should also be checked with the size of the payload, but I would not check every time measurement word on a typically slow VME controller. That's enough for now :) Best regards, Hans On 2024-02-16 15:39, Hans Toshihide T?rnqvist wrote: > Dear G?nter, > > On 2024-02-16 13:50, Weber, Guenter Dr. wrote: >> >> Dear H?kan, >> >> thank you. This explanation makes some sense. >> >> Could you also explain the concept of "shadow module". What is it good >> for? > > It's rather "shadow readout mode". 
> The idea is to read data from a
> module continuously in parallel to conversion and buffering, instead of
> performing every task of acquisition in sequence where every task has to
> wait for all the others to finish.
>
> The advantage is that this can significantly reduce the time where a
> module is unable to convert and buffer signals.
>
> The potential disadvantage is that the data traffic on for example a VME
> backplane could induce noise in the analog measurement.
>
> The non-shadow mode is the default and best tested mode, due to present
> module support, cooperation of modules in experiments, and historical
> reasons.
>
>> Also I would like to point out that the implementation of readout_dt
>> for the V560 module that we used as a reference looks weird to us:
>>
>> uint32_t
>> caen_v560_readout_dt(struct Crate *a_crate, struct Module *a_module)
>> {
>>     (void)a_crate;
>>     LOGF(spam)(LOGL, NAME" readout_dt {");
>>     a_module->event_counter.value++;
>>     LOGF(spam)(LOGL, NAME" readout_dt(ctr=0x%08x) }",
>>         a_module->event_counter.value);
>>     return 0;
>> }
>>
>> Here the counter is incremented every time the module is 'touched' by
>> the readout_dt function. In the loop that starts at line 1240 in
>> crate.c the function readout_dt is executed for the module until the
>> test around line 1270 is passed or the timeout around line 1280
>> happens. Thus an initial mismatch between the (software) counter of
>> module V560 and the crate that prevents the loop from being exited at
>> the first attempt will grow steadily as the module is accessed via
>> readout_dt many, many times until it runs into the timeout. What is
>> this good for?
>
> Nurdlib avoids resetting counters and instead latches and saves counter
> values (e.g. soft crate counters and module counters) during dead-time
> when counters should not change. After the first event, both counters
> must have incremented by 1.
> Any particular start value such as 0 or 1
> has no deeper meaning, it's the progression that is important.
>
> Resetting tends to carry with it the idea of "after a reset, or setting
> something to 0, everything is fine". It moves the focus away from
> getting the complete logic correct, at least that is the feeling I get.
>
> Now, to actually discuss your case :)
> Are you seeing that the mismatch between the crate counter and the v560
> keeps increasing for new events? Or do you have such a problem with the
> dummy module? (I still did not find the slot to look at it...)
>
>> To my (current) understanding it is pointless to try to mimic the
>> function of a true hardware counter within the module by a counter
>> that only exists in software. The better way would be to tell crate.c
>> that this module does not have such a counter so that the check is
>> pointless. Is this understanding correct? And if yes, how can I tell
>> NURDLIB to skip this check?
>
> A software counter would be a consistency check of the implementation,
> but you are correct that it has little to do with the signals that are
> recorded. One can send the same random signal to different modules and
> verify correlation at some point after digitisation, and definitely no
> later than online monitoring while data is recorded.
>
> I think it's possible to give the module counter a mask of 0. The
> counter check should in principle be:
>
> mask = ctr_a_mask & ctr_b_mask;
> ctr_a = ctr_a_raw - ctr_a_latch;
> ctr_b = ctr_b_raw - ctr_b_latch;
> if ( (ctr_a & mask) == (ctr_b & mask) ) { all good! }
>
> If either mask is 0 the condition will always pass. There is maybe a
> better way to make this clear than built into the masking, but I would
> say that using a module without any kind of sync-check in
> event-per-event analysis is overall dangerous...
>
> Hope that helps!
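
The latch-and-mask counter check quoted above can be written out as a small C sketch. This is an illustration under assumptions, not nurdlib's actual code: the names `struct SketchCounter`, `sketch_latch` and `sketch_counters_agree` are hypothetical, and the real COUNTER_DIFF macro in crate.c may treat masks and offsets differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter type for illustration; not nurdlib's own struct. */
struct SketchCounter {
	uint32_t raw;   /* Value as currently read from hardware/software. */
	uint32_t latch; /* Value saved at the reference point. */
	uint32_t mask;  /* Valid bits; 0 means "no usable counter". */
};

/* Save the current value as reference instead of resetting the counter. */
void
sketch_latch(struct SketchCounter *a_ctr)
{
	assert(a_ctr != 0);
	a_ctr->latch = a_ctr->raw;
}

/*
 * Return 1 if both counters progressed by the same amount since they were
 * latched, compared only in the bits that both counters provide.
 * Unsigned subtraction keeps this correct across counter wrap-around.
 */
int
sketch_counters_agree(struct SketchCounter const *a_a,
    struct SketchCounter const *a_b)
{
	uint32_t mask, a, b;

	assert(a_a != 0 && a_b != 0);
	mask = a_a->mask & a_b->mask;
	a = a_a->raw - a_a->latch;
	b = a_b->raw - a_b->latch;
	return (a & mask) == (b & mask);
}
```

With a 10-bit module counter latched at 0x3fe and a 32-bit crate counter, three events on both sides still compare equal after the module counter wraps around, one extra trigger seen only by the crate is flagged as a mismatch, and a module mask of 0 passes unconditionally, which is why a zero mask effectively disables the check.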
>
> Best regards,
> Hans
>
>> Best greetings
>>
>> Günter
>>
>> ------------------------------------------------------------------------
>> *From:* subexp-daq on behalf of
>> Håkan T Johansson
>> *Sent:* Friday, 16 February 2024 13:25:59
>> *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.
>> *Subject:* Re: [subexp-daq] Question on COUNTER_DIFF in crate.c
>>
>> Dear Günter,
>>
>> I think Hans might have to correct me.
>>
>> This is not exactly what this code does, but typically the modules have a
>> trigger/event counter, which is incremented for each gate/common signal
>> they receive on the front-panel.
>>
>> When the readout is event-by-event, these counters are checked strictly by
>> nurdlib, in order to detect cabling issues, double-triggers and so on.
>> This is especially important for modules that have multi-event buffers,
>> that otherwise easily can become desynchronised. (It takes of course some
>> time to read these counters, which contributes to the overall deadtime.
>> But the number of times this has 'saved' data-taking by detecting issues
>> early makes it very worthwhile.)
>>
>> What the code you refer to is doing is I think 'abusing' this a bit.
>> There is typically some time (order of us) between the trigger, and when
>> the signals have been digitised and the data becomes available. Often,
>> those counters are only updated after that is the case. To me it looks
>> like this function uses the counters (which we anyhow want to check) to
>> wait for the modules to have finished converting one event.
>>
>> Best regards,
>> Håkan
>>
>> On Fri, 16 Feb 2024, Weber, Guenter Dr. wrote:
>>
>>> Dear friends,
>>>
>>> we are now trying to add a new module into NURDLIB. At the beginning,
>>> we just want to have a 'software version' of the module, so no
>>> hardware involved yet.
>>>
>>> We are stuck with the following code in crate.c (line 1262 and
>>> following):
>>>
>>>             diff_module = COUNTER_DIFF(*module->crate_counter,
>>>                 module->event_counter, module->this_minus_crate);
>>>             /* TODO: Clean this. */
>>>             shadow_counter.value =
>>>                 module->shadow.data_counter_value;
>>>             shadow_counter.mask = module->event_counter.mask;
>>>             diff_shadow = COUNTER_DIFF(*module->crate_counter,
>>>                 shadow_counter, module->this_minus_crate);
>>>
>>> As we understand, the idea here is to check with a call of readout_dt
>>> if the module's internal counter agrees with the counter of the
>>> crate for the given module. Basically when the crate thinks it has
>>> accessed the module n times, the module should report the same number of
>>> access attempts. Is this understanding correct?
>>>
>>> If yes, how exactly should the module increment its internal
>>> counter? If this is done on readout_dt, a mismatch between the crate
>>> and the
>>> internal counter occurs as soon as we stop the acquisition.
>>>
>>> I can explain in more detail, but maybe first you can explain to us
>>> what the whole idea behind this check is? To us it looks like also
>>> the dummy module implementation will have problems with it.
>>>
>>> Thank you very much!
>>>
>>> Best greetings
>>>
>>> Günter
>>>
>>

From g.weber at hi-jena.gsi.de Mon Feb 19 10:15:37 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Mon, 19 Feb 2024 09:15:37 +0000
Subject: [subexp-daq] Question on default firmware for VULOM
Message-ID: <2a93957274cc4503bb7915c804182801@hi-jena.gsi.de>

Dear friends,

I restarted the crate and now get the following error message when starting the DAQ:

10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
LOG: TRLO: MD5SUM: 0x426cb99c (CT: 5bba6507 = 2018-10-07 19:56:55 UTC)
WARNING: Known firmware (alias): 0x6e4ba1a9.
WARNING: Known firmware (alias): 0x1409285e.
WARNING: Known firmware (alias): 0xa73c5093.
WARNING: Known firmware (alias): 0x6e4ba1a9.
WARNING: Known firmware (alias): 0x1409285e.
FATAL: TRLO firmware wrong: 0x426cb99c, expected 0x6e4ba1a9 or alias.

I assume that upon power-on the VULOM is booting with a default firmware that is not the one necessary to run with the most recent version of NURDLIB, etc.

How can I tell the VULOM which firmware version from its memory it should load as default?

Thank you very much!

Best greetings
Günter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hans.tornqvist at chalmers.se Mon Feb 19 10:37:37 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Mon, 19 Feb 2024 10:37:37 +0100
Subject: [subexp-daq] Question on default firmware for VULOM
In-Reply-To: <2a93957274cc4503bb7915c804182801@hi-jena.gsi.de>
References: <2a93957274cc4503bb7915c804182801@hi-jena.gsi.de>
Message-ID: <32E3DE33-8FAE-40A3-A3FE-44C14924D590@chalmers.se>

Dear Günter,

As I understand it, the firmware you would like to have has been flashed and tested quite a lot and seems to work?

Then you can flash program area 0, but you will need the --force option too. Area 0 will only be flashed if the firmware in question has been flashed into another area, as in your present case.

After that it will load after every power cycle.

Cheers,
Hans
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From g.weber at hi-jena.gsi.de Tue Feb 20 10:46:52 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Tue, 20 Feb 2024 09:46:52 +0000
Subject: [subexp-daq] Question on default firmware for VULOM
In-Reply-To: <32E3DE33-8FAE-40A3-A3FE-44C14924D590@chalmers.se>
References: <2a93957274cc4503bb7915c804182801@hi-jena.gsi.de>,
 <32E3DE33-8FAE-40A3-A3FE-44C14924D590@chalmers.se>
Message-ID:

Dear Hans,

just to close this issue, here is what we are doing now to set the VULOM to the right firmware. As it is possible that a DAQ system is operating with different software versions that also require different VULOM firmware, we now load the right firmware 'on-the-fly' as part of the startup of the DAQ system. After loading of the firmware a short waiting time is necessary before the VULOM is ready again.

---------------------------------------------------
# Obtain the right firmware
VULOM4_FW=`cat ${TRLOII_PATH}/fw/vulom4b_trlo/trlo_defs.h | grep MD5SUM_STAMP | sed 's/.*0x//'`
export VULOM4_CTRL=$trloiipath/trloctrl/fw_${VULOM4_FW}_trlo/bin_${machine}_${version}/trlo_ctrl

# setup local constants
addr=3
firmware_region=2

# restarting the VULOM to set the correct firmware version
echo "Restarting VULOM with firmware region" $firmware_region
$TRLOII_PATH/bin/vulomflash --addr=$addr --restart=$firmware_region
echo "Waiting for VULOM to answer..."
exit_code=1
while [ $exit_code -ne 0 ] ; do
    $TRLOII_PATH/bin/vulomflash --addr=$addr --read &> /dev/null
    exit_code=$?
    if [ $exit_code -ne 0 ] ; then
        sleep 1 #sleep for 1 sec before retrying
    fi
done

# Trigger on VULOM pulser
$VULOM4_CTRL --addr=$addr --clear-setup --config=vulom.trlo standalone module_trigger
----------------------------------------------------

If we did understand the naming convention in "bin/vulomflash --addr=$ADDR --readprogs" a bit better, we would probably also be able to estimate from the name/number of the firmware the right firmware region on the VULOM (and we could check if the right firmware was already loaded). But at the moment, we set firmware_region by hand.

In the ideal world, we would then have an init script for the DAQ that takes care of the VULOM settings just by looking at the state of the software under TRLOII_PATH. This could help our guys a lot who have no idea about all this stuff :-)

Best greetings
Günter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From g.weber at hi-jena.gsi.de Tue Feb 20 10:58:27 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Tue, 20 Feb 2024 09:58:27 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
Message-ID:

Dear friends,

I now grabbed a V560 module that was working fine in another DAQ system and put it into our test system. The main.cfg looks like this:

log_level=spam # info, verbose, debug, spam

CRATE("MCAL") {
    GSI_VULOM(0x03000000) {
        timestamp = true # needed to get timestamps in the data output
        # ecl=0..15
    }
    BARRIER
    CAEN_V560(0x333333300) {
        use_veto = true
    }
    # CAEN_V767A(0x03100000) {
    # }
}

Starting the DAQ now results in a freeze of the RIO4. A reset of the crate is necessary to talk to it again. The problem occurs in the first slow init of the V560 module. To find the exact line, I added some output to CRATE.C:

10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
before push_log_level(module)
before a_crate->module_init_id = module->id
before module->props->init_slow(a_crate, module)
LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
before module_init_id_mark(a_crate, module)
before pop_log_level(module)
10: crate/crate.c:923: .Slow-init module[1]=CAEN_V560.
before push_log_level(module)
before a_crate->module_init_id = module->id
before module->props->init_slow(a_crate, module)

The CRATE.C code now looks like this:

TAILQ_FOREACH(module, &a_crate->module_list, next) {
	if (NULL == module->props) {
		continue;
	}
	LOGF(info)(LOGL, "Slow-init module[%u]=%s.", module->id,
	    keyword_get_string(module->type));
	printf("before push_log_level(module) \n");
	push_log_level(module);
	printf("before a_crate->module_init_id = module->id \n");
	a_crate->module_init_id = module->id;
	printf("before module->props->init_slow(a_crate, module) \n");
	if (!module->props->init_slow(a_crate, module)) {
		printf("before pop_log_level(module) \n");
		pop_log_level(module);
		printf("before goto crate_init_done \n");
		goto crate_init_done;
	}
	printf("before module_init_id_mark(a_crate, module) \n");
	module_init_id_mark(a_crate, module);
	printf("before pop_log_level(module) \n");
	pop_log_level(module);
}

Thus, to me it looks like the check "if (!module->props->init_slow(a_crate, module)) ..." is doing something quite horrible to the RIO4.

This is unfortunate, because my original aim was to show that there is also a bug/mistake in readout_dt of the V560 module. But I did not get that far.

Do you have any idea what might cause the freezing of the RIO4?

Best greetings and many thanks
Günter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From g.weber at hi-jena.gsi.de Tue Feb 20 13:54:22 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Tue, 20 Feb 2024 12:54:22 +0000
Subject: [subexp-daq] Question on default firmware for VULOM
In-Reply-To:
References: <2a93957274cc4503bb7915c804182801@hi-jena.gsi.de>,
 <32E3DE33-8FAE-40A3-A3FE-44C14924D590@chalmers.se>,
Message-ID:

Dear friends,

attached please find a script that automatically chooses the right firmware for the VULOM.

If you find time, please have a look at it and tell me if you think it is useful or if there is any mistake.

On my system it works fine so far.

Best greetings
Günter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
#!/bin/sh

# hardware address of the VULOM
addr=3

# the following script checks if the VULOM is running with the firmware required by the software in TRLOII_PATH
# if necessary, the script looks for the location of the required firmware on the memory of the VULOM and sets the VULOM to the desired firmware version

VULOM4_FW=`cat ${TRLOII_PATH}/fw/vulom4b_trlo/trlo_defs.h | grep MD5SUM_STAMP | sed 's/.*0x//'`
export VULOM4_CTRL=$trloiipath/trloctrl/fw_${VULOM4_FW}_trlo/bin_${machine}_${version}/trlo_ctrl
VULOM_FW_NOW=`${TRLOII_PATH}/bin/vulomflash --addr=$addr --read | grep "VOLUM+0 =>" | sed "s/.*0x//"`

echo "Current firmware on VULOM -> " $VULOM_FW_NOW
echo "Necessary firmware on VULOM -> " $VULOM4_FW

# quoting keeps the tests valid even if one of the values is empty
if [ "$VULOM4_FW" != "$VULOM_FW_NOW" ] ; then
    firmware_region=`${TRLOII_PATH}/bin/vulomflash --addr=$addr --readprogs | sed -n -e "/$VULOM4_FW/{s/^.*Rng \([0-9]\+\):.*$/\1/p;q}"`
    if [ -z "$firmware_region" ] ; then
        echo "Necessary firmware not found on VULOM!"
        exit 1
    else
        # restarting the VULOM to set the correct firmware version
        echo "Restarting VULOM with firmware region" $firmware_region
        $TRLOII_PATH/bin/vulomflash --addr=$addr --restart=$firmware_region
        echo "Waiting for VULOM to answer..."
        exit_code=1
        while [ $exit_code -ne 0 ] ; do
            # POSIX redirection; "&>" is a bashism that a plain /bin/sh may not support
            $TRLOII_PATH/bin/vulomflash --addr=$addr --read > /dev/null 2>&1
            exit_code=$?
            if [ $exit_code -ne 0 ] ; then
                sleep 1 #sleep for 1 sec before retrying
            fi
        done
    fi
fi

From g.weber at hi-jena.gsi.de Tue Feb 20 15:33:49 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Tue, 20 Feb 2024 14:33:49 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To:
References:
Message-ID:

Dear friends,

I now had a look at the system where the V560 was running. It was also set up by Bastian. And there the code for the V560 module is slightly different from the one included in the NURDLIB branch that I am using on the test system. Maybe you can have a look at it.
I also could push the complete NURDLIB from this system, if this helps.

Best greetings
Günter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
#ifndef MODULE_CAEN_V560_INTERNAL_H
#define MODULE_CAEN_V560_INTERNAL_H

#include

struct CaenV560Module {
	struct Module module;
	uint32_t address;
	struct Map *sicy_map;
	struct CaenV560Read volatile const *read;
	struct CaenV560Write volatile *write;
	unsigned use_veto;
};

#endif
-------------- next part --------------
#include
#include
#include
#include
#include
#include
#include

#define NAME "Caen v560"

MODULE_PROTOTYPES(caen_v560);

int
caen_v560_are_distinguishable(enum Keyword a_type)
{
	(void)a_type;
	LOGF(verbose)(LOGL, NAME" are_distinguishable.");
	return 1;
}

uint32_t
caen_v560_check_empty(struct Module *a_module)
{
	(void)a_module;
	return 0;
}

struct Module *
caen_v560_create_(struct Crate *a_crate, struct ConfigBlock const *a_block)
{
	struct CaenV560Module *v560;

	LOGF(verbose)(LOGL, NAME" create {");
	(void)a_crate;
	MODULE_CREATE(v560);
	v560->module.event_max = 32; /* no event buffer, arbitrary > 0 */
	v560->address = config_get_block_param_int32(a_block, 0);
	LOGF(verbose)(LOGL, "Address=%08x.", v560->address);
	LOGF(verbose)(LOGL, NAME" create }");
	return (void *)v560;
}

void
caen_v560_deinit(struct Module *a_module)
{
	struct CaenV560Module *v560;

	LOGF(verbose)(LOGL, NAME" deinit {");
	MODULE_CAST(KW_CAEN_V560, v560, a_module);
	map_unmap(&v560->sicy_map);
	LOGF(verbose)(LOGL, NAME" deinit }");
}

void
caen_v560_destroy(struct Module *a_module)
{
	(void)a_module;
	LOGF(verbose)(LOGL, NAME" destroy.");
}

uintptr_t
caen_v560_get_module_base(struct Module const *a_module)
{
	struct CaenV560Module *v560;
	uintptr_t base;

	LOGF(verbose)(LOGL, NAME" get_module_base {");
	MODULE_CAST(KW_CAEN_V560, v560, a_module);
	base = (uintptr_t)map_get_mapped_ptr(v560->sicy_map);
	LOGF(verbose)(LOGL, NAME" get_module_base(%p) }", (void *)base);
	return base;
}

int
caen_v560_init_fast(struct Crate *a_crate, struct Module *a_module)
{
	struct CaenV560Module *v560;

	(void)a_crate;
	LOGF(verbose)(LOGL, NAME" init_fast {");
	MODULE_CAST(KW_CAEN_V560, v560,
	    a_module);
	v560->use_veto = config_get_boolean(a_module->config, KW_USE_VETO);
	LOGF(verbose)(LOGL, "use_veto = %s", v560->use_veto ? "yes" : "no");
	v560->write->scale_clear = 1;
	v560->write->vme_veto_reset = 1;
	LOGF(verbose)(LOGL, NAME" init_fast }");
	return 1;
}

int
caen_v560_init_slow(struct Crate *a_crate, struct Module *a_module)
{
	struct CaenV560Module *v560;
	void volatile *mapped_ptr;
	uint16_t id;

	(void)a_crate;
	LOGF(verbose)(LOGL, NAME" init_slow {");
	MODULE_CAST(KW_CAEN_V560, v560, a_module);
	v560->sicy_map = map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT,
	    0, 0,
	    MAP_POKE_ARGS(*v560->read, fixed_code),
	    MAP_POKE_ARGS(*v560->write, scale_clear));
	mapped_ptr = map_get_mapped_ptr(v560->sicy_map);
	v560->read = mapped_ptr;
	v560->write = mapped_ptr;
	id = v560->read->fixed_code;
	if (0xfaf5 != id) {
		log_die(LOGL, "Fixed code=0x%04x != 0xfaf5, module borked?", id);
	}
	id = v560->read->manufacturer_module_type;
	LOGF(verbose)(LOGL, "Manufacturer=%02x, module type=%03x (0x%04x).",
	    (0xfc00 & id) >> 10, 0x3ff & id, id);
	id = v560->read->version_serial_number;
	LOGF(verbose)(LOGL, "Version=%x, S/N=%03x (0x%04x).",
	    (0xf000 & id) >> 12, 0xfff & id, id);
	LOGF(verbose)(LOGL, NAME" init_slow }");
	return 1;
}

void
caen_v560_memtest(struct Module *a_module, enum Keyword a_mode)
{
	(void)a_module;
	(void)a_mode;
}

uint32_t
caen_v560_parse_data(struct Crate const *a_crate, struct Module *a_module,
    struct EventConstBuffer const *a_event_buffer, int a_do_pedestals)
{
	(void)a_crate;
	(void)a_module;
	(void)a_event_buffer;
	(void)a_do_pedestals;
	return 0;
}

uint32_t
caen_v560_readout(struct Crate *a_crate, struct Module *a_module,
    struct EventBuffer *a_event_buffer)
{
	struct CaenV560Module *v560;
	uint32_t *outp;
	uint32_t result = 0;
	int ch;

	(void)a_crate;
	LOGF(spam)(LOGL, NAME" readout {");
	MODULE_CAST(KW_CAEN_V560, v560, a_module);
	outp = a_event_buffer->ptr;
	if (v560->use_veto) {
		v560->write->vme_veto_set = 1;
	}
	*outp++ = 0xc560c560;
	for (ch = 0; ch < 16; ch++) {
		*outp++ =
		    v560->read->counter[ch];
	}
	if (v560->use_veto) {
		v560->write->vme_veto_reset = 1;
	}
	EVENT_BUFFER_ADVANCE(*a_event_buffer, outp);
	LOGF(spam)(LOGL, NAME" readout(0x%08x) }", result);
	return result;
}

uint32_t
caen_v560_readout_dt(struct Crate *a_crate, struct Module *a_module)
{
	(void)a_crate;
	LOGF(spam)(LOGL, NAME" readout_dt {");
	a_module->event_counter.value++;
	LOGF(spam)(LOGL, NAME" readout_dt(ctr=0x%08x) }",
	    a_module->event_counter.value);
	return 0;
}

void
caen_v560_setup_()
{
	MODULE_SETUP(caen_v560, 0);
}
-------------- next part --------------
#ifndef MODULE_CAEN_V560_CAEN_V560_H
#define MODULE_CAEN_V560_CAEN_V560_H

#include

MODULE_INTERFACE(caen_v560);

#endif
-------------- next part --------------
nabm
Interrupt Vector          0x0004          16  RW
Interrupt Level           0x0006          16  RW
Enable Interrupt          0x0008          16  RW
Disable Interrupt         0x000A          16  RW
Clear Interrupt           0x000C          16  RW
Request                   0x000E          16  RW
Counter                   0x0010..0x004C  32  R
Scale clear               0x0050          16  RW
VME VETO set              0x0052          16  RW
VME VETO reset            0x0054          16  RW
Scale Increase            0x0056          16  RW
Scale Status              0x0058          16  R
Fixed code                0x00FA          16  R
Manufacturer Module Type  0x00FC          16  R
Version Serial Number     0x00FE          16  R
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rules.mk
Type: application/octet-stream
Size: 161 bytes
Desc: rules.mk
URL:

From f96hajo at chalmers.se Tue Feb 20 19:41:06 2024
From: f96hajo at chalmers.se (Håkan T Johansson)
Date: Tue, 20 Feb 2024 19:41:06 +0100
Subject: [subexp-daq] Question on default firmware for VULOM
In-Reply-To:
References: <2a93957274cc4503bb7915c804182801@hi-jena.gsi.de>,
 <32E3DE33-8FAE-40A3-A3FE-44C14924D590@chalmers.se>,
Message-ID: <9a62763e-663d-3303-2130-d1f878bdaf62@chalmers.se>

Dear Günter,

question: are you using different firmwares due to the need to have different amounts of various kinds of logic inside? Or is it just to run 'older' versions, due to software incompatibilities?
If the latter, then the long-term approach would be to try to rectify that by maving to newer versions when possible. We have been working on forward-porting the sis3316 branch you are using. All the nurdlib-common things have been merged with the master branch already. The sis3316 changes are also done, but needs testing. We have no experience or direct access to such modules. We have tried to be careful, but it is easy to overlook things. Separate mail to come. Cheers, H?kan On Tue, 20 Feb 2024, Weber, Guenter Dr. wrote: > > Dear friends, > > > attached please find a script that automatically choses the right firmware > for the VULOM. > > > If you find time, please have a look at it and tell me if you think it is > useful or if there is any mistake. > > > On my system it works fine so far. > > > > > Best greetings > > G?nter > > ____________________________________________________________________________ > Von: subexp-daq im Auftrag von Weber, > Guenter Dr. > Gesendet: Dienstag, 20. Februar 2024 10:46:52 > An: Hans Toshihide T?rnqvist; Discuss use of Nurdlib, TRLO II, drasi and > UCESB. > Betreff: Re: [subexp-daq] Question on default firmware for VULOM ? > > Dear Hans, > > > just to close this issue, here is what we are doing now to set the VULOM to > the right firmware. As it is possible that a DAQ system is operating with > different software versions that also require different VULOM firmware, we > now load the right firmware 'on-the-fly' as part of the startup of the DAQ > system. After loading of the firmware a short waiting time is necessary > before the VULOM is ready again. 
>
> ---------------------------------------------------
>
> # Obtain the right firmware
>
> VULOM4_FW=`cat ${TRLOII_PATH}/fw/vulom4b_trlo/trlo_defs.h | grep MD5SUM_STAMP | sed 's/.*0x//'`
> export VULOM4_CTRL=$trloiipath/trloctrl/fw_${VULOM4_FW}_trlo/bin_${machine}_${version}/trlo_ctrl
>
> # setup local constants
> addr=3
> firmware_region=2
>
> # restarting the VULOM to set the correct firmware version
> echo "Restarting VULOM with firmware region" $firmware_region
> $TRLOII_PATH/bin/vulomflash --addr=$addr --restart=$firmware_region
> echo "Waiting for VULOM to answer..."
> exit_code=1
> while [ $exit_code -ne 0 ] ; do
>     $TRLOII_PATH/bin/vulomflash --addr=$addr --read &> /dev/null
>     exit_code=$?
>     if [ $exit_code -ne 0 ] ; then
>         sleep 1 #sleep for 1 sec before retrying
>     fi
> done
>
> # Trigger on VULOM pulser
> $VULOM4_CTRL --addr=$addr --clear-setup --config=vulom.trlo standalone module_trigger
>
> ----------------------------------------------------
>
> If we understood the naming convention in "bin/vulomflash --addr=$ADDR
> --readprogs" a bit better, we would probably also be able to infer from
> the name/number of the firmware the right firmware region on the VULOM (and
> we could check if the right firmware was already loaded). But at the moment,
> we set firmware_region by hand.
>
> In the ideal world, we would then have an init script for the DAQ that takes
> care of the VULOM settings just by looking at the state of the software
> under TRLOII_PATH. This could help our guys a lot who have no idea about all
> this stuff :-)
>
> Best greetings
>
> Günter
>
> ____________________________________________________________________________
> From: Hans Toshihide Törnqvist
> Sent: Monday, 19 February 2024 10:37:37
> To: Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber, Guenter Dr.;
> Discuss use of Nurdlib, TRLO II, drasi and UCESB.
> Subject: Re: [subexp-daq] Question on default firmware for VULOM
>
> Dear Günter,
>
> As I understand it, the firmware you would like to have has been flashed and
> tested quite a lot and seems to work?
>
> Then you can flash program area 0, but you will need the --force option too.
> Area 0 will only be flashed if the firmware in question has been flashed
> into another area, as in your present case.
>
> After that it will load after every power cycle.
>
> Cheers,
> Hans
>
> "Weber, Guenter Dr." wrote: (19 February 2024 10:15:37 CET)
>
> Dear friends,
>
> I restarted the crate and now get the following error message
> when starting the DAQ:
>
> 10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
> LOG: TRLO: MD5SUM: 0x426cb99c (CT: 5bba6507 = 2018-10-07 19:56:55 UTC)
> WARNING: Known firmware (alias): 0x6e4ba1a9.
> WARNING: Known firmware (alias): 0x1409285e.
> WARNING: Known firmware (alias): 0xa73c5093.
> WARNING: Known firmware (alias): 0x6e4ba1a9.
> WARNING: Known firmware (alias): 0x1409285e.
> FATAL: TRLO firmware wrong: 0x426cb99c, expected 0x6e4ba1a9 or alias.
>
> I assume that upon power-on the VULOM is booting with a default
> firmware that is not the one necessary to run with the most recent
> version of NURDLIB, etc.
>
> How can I tell the VULOM which firmware version from its memory it
> should load as default?
>
> Thank you very much!
>
> Best greetings
>
> Günter

From f96hajo at chalmers.se Tue Feb 20 19:59:11 2024
From: f96hajo at chalmers.se (Håkan T Johansson)
Date: Tue, 20 Feb 2024 19:59:11 +0100
Subject: [subexp-daq] sis3316 updates
Message-ID:

Dear Günter, all sis3316 nurdlib users,

the changes to the nurdlib sis3316 code that have been used at Jena (and
possibly other places), which lived on a branch that had its branch point
about three years ago, have been forward-ported to approximately current
master.
It is available as the 'rebasing_sis3316' branch at

https://gitlab.com/chalmers-subexp/nurdlib

Since we have neither direct access to nor experience of our own with those
modules, the testing needs to be done by someone with access to sis3316
hardware.

Note: it is not necessary to have used the forked branch to provide helpful
test results! Knowing that the new changes do not break other sis3316
behaviour would also be very helpful.

Thus, this is a call for help! ;)

As can be seen in the repository graph

https://gitlab.com/chalmers-subexp/nurdlib/-/network/master?ref_type=heads

there are about 20 commits. Some of them are followed by fixup commits,
where we just kept a minimal merge first, and then fixed compilation issues
separately, in order to more easily track down any mistakes. I.e.: when a
commit is followed by fixup commits, it only makes sense to test the last
fixup commit in that sequence.

I would suggest the following test strategy:

0) First test the 'rebasing_sis3316' branch. If we are lucky, it just
   works!

1) If 0) fails, then test the fork point, i.e. currently the commit
   e2163738. This is a close ancestor of the nurdlib master branch, and
   thus contains no sis3316 changes beyond those that have been in the
   master branch so far. When testing that, it will probably be necessary
   to comment out some settings which have only been implemented in the
   new branch.

   If this fails, nurdlib master has a problem, which I think should be
   looked into before proceeding further.

2) Move forward, commit by commit (stepping straight to the last fixup
   commit where fixups follow a commit). For each such commit, test and
   see if it still works. If a commit implements a new option, also test
   that one.

This way, we should hopefully be able to pin-point any issues.
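
For steps 1) and 2), something along these lines could be used to get the
list of commits to walk through. It is only a sketch: the helper name
list_commits_to_test is made up here, it assumes the branch has been fetched
as origin/rebasing_sis3316, and it does not group fixup commits, so skipping
to the last fixup of a sequence still has to be done by eye.

```shell
#!/bin/sh
# Hypothetical helper, not part of nurdlib: list the commits between a fork
# point and a branch tip, oldest first, as "<short-hash> <subject>".
# The fork point itself is not listed (it is tested separately in step 1).
list_commits_to_test() {
    fork_point=$1   # e.g. e2163738
    branch_tip=$2   # e.g. origin/rebasing_sis3316 (assumes remote "origin")
    git log --reverse --format='%h %s' "${fork_point}..${branch_tip}"
}

# Inside a nurdlib checkout, for each listed commit one would then do:
#   git checkout <short-hash>
#   ...rebuild, start the DAQ, note whether the sis3316 readout still works...
# Usage:
#   list_commits_to_test e2163738 origin/rebasing_sis3316
```

The first commit in that list at which the readout stops working is then the
one to look at more closely.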
Best regards,
Håkan

From f96hajo at chalmers.se Tue Feb 20 20:13:32 2024
From: f96hajo at chalmers.se (Håkan T Johansson)
Date: Tue, 20 Feb 2024 20:13:32 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To:
References:
Message-ID: <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>

Dear Günter,

I took the files you provided and for comparison put them in a branch
'old_caen_v560'.

git diff origin/old_caen_v560..origin/master

however does not show anything which is suspicious to me. Perhaps Hans
can spot something.

Otherwise, the only idea I can come up with is to continue to bisect the
code inside slow init.

However, before that, I would suggest to add

  fflush(stdout); sleep(1);

after each printf statement, such that one can be quite sure that no
printout is eaten when the RIO crash happens, i.e. that the code had not
actually gotten further than shown by the prints.

Best regards,
Håkan

On Tue, 20 Feb 2024, Weber, Guenter Dr. wrote:

> Dear friends,
>
> I now had a look at the system where the V560 was running. It was also set up
> by Bastian. And there the code for the V560 module is slightly different
> from the one included in the NURDLIB branch that I am using on the test
> system.
>
> Maybe you can have a look at it.
>
> I also could push the complete NURDLIB from this system, if this helps.
>
> Best greetings
>
> Günter
>
> ____________________________________________________________________________
> From: subexp-daq on behalf of Weber, Guenter Dr.
> Sent: Tuesday, 20 February 2024 10:58:27
> To: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
> Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
>
> Dear friends,
>
> I now grabbed a V560 module that was working fine in another DAQ system and
> put it into our test system.
>
> The main.cfg looks like this:
>
> log_level=spam # info, verbose, debug, spam
>
> CRATE("MCAL") {
>     GSI_VULOM(0x03000000) {
>         timestamp = true # needed to get timestamps in the data output
>     #   ecl=0..15
>     }
>     BARRIER
>     CAEN_V560(0x333333300) {
>         use_veto = true
>     }
> #   CAEN_V767A(0x03100000) {
> #   }
> }
>
> Starting the DAQ now results in a freeze of the RIO4. A reset of the crate
> is necessary to talk to it again.
>
> The problem occurs in the first slow init of the V560 module. To find the
> exact line, I added some output to CRATE.C:
>
> 10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM.
> before push_log_level(module)
> before a_crate->module_init_id = module->id
> before module->props->init_slow(a_crate, module)
> LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
> before module_init_id_mark(a_crate, module)
> before pop_log_level(module)
> 10: crate/crate.c:923: .Slow-init module[1]=CAEN_V560.
> before push_log_level(module)
> before a_crate->module_init_id = module->id
> before module->props->init_slow(a_crate, module)
>
> The CRATE.C code now looks like this:
>
>     TAILQ_FOREACH(module, &a_crate->module_list, next) {
>         if (NULL == module->props) {
>             continue;
>         }
>         LOGF(info)(LOGL, "Slow-init module[%u]=%s.", module->id,
>             keyword_get_string(module->type));
>         printf("before push_log_level(module) \n");
>         push_log_level(module);
>         printf("before a_crate->module_init_id = module->id \n");
>         a_crate->module_init_id = module->id;
>         printf("before module->props->init_slow(a_crate, module) \n");
>         if (!module->props->init_slow(a_crate, module)) {
>             printf("before pop_log_level(module) \n");
>             pop_log_level(module);
>             printf("before goto crate_init_done \n");
>             goto crate_init_done;
>         }
>         printf("before module_init_id_mark(a_crate, module) \n");
>         module_init_id_mark(a_crate, module);
>         printf("before pop_log_level(module) \n");
>         pop_log_level(module);
>     }
>
> Thus, to me it looks like the check "if (!module->props->init_slow(a_crate,
> module)) ..." is doing something quite horrible to the RIO4.
>
> This is unfortunate, because my original aim was to show that there is also
> a bug/mistake in readout_dt of the V560 module. But I did not get that far.
>
> Do you have any idea what might cause the freezing of the RIO4?
>
> Best greetings and many thanks
>
> Günter

From f96hajo at chalmers.se Tue Feb 20 20:15:06 2024
From: f96hajo at chalmers.se (Håkan T Johansson)
Date: Tue, 20 Feb 2024 20:15:06 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To:
References:
Message-ID: <7aa42eb8-7a25-a1d0-3179-1c9853922807@chalmers.se>

Ohh, and please do push also the complete nurdlib branch from that system.
Who knows what other changes it might contain :-)

Cheers,
Håkan

From g.weber at hi-jena.gsi.de Wed Feb 21 10:18:29 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Wed, 21 Feb 2024 09:18:29 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
References: , <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
Message-ID:

Dear Håkan,

thanks for the hint to flush and sleep. Indeed, I now see that the crash
happens in init_slow of V560 at this line:

    v560->sicy_map = map_map(v560->address, MAP_SIZE, KW_NOBLT,
        0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));

Maybe the code is accessing/writing into a memory location that it should
better not touch?

This problematic line is then followed by:

    id = MAP_READ(v560->sicy_map, fixed_code);

The corresponding line in the V560 code on the system that was running with
this module looks like this:

    v560->sicy_map = map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT,
        0, 0, MAP_POKE_ARGS(*v560->read, fixed_code),
        MAP_POKE_ARGS(*v560->write, scale_clear));

And is followed by:

    mapped_ptr = map_get_mapped_ptr(v560->sicy_map);
    v560->read = mapped_ptr;
    v560->write = mapped_ptr;

Maybe you already have an idea what causes the problem here?

I will now go to the system that was running with V560 and make a push of
the NURDLIB.

Best greetings

Günter

From g.weber at hi-jena.gsi.de Wed Feb 21 11:03:41 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Wed, 21 Feb 2024 10:03:41 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To:
References: , <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>,
Message-ID: <1a521b6bf7a64588b0bc8754952eb565@hi-jena.gsi.de>

Ok. Push was done.

Counting objects: 13558, done.
Delta compression using up to 32 threads.
Compressing objects: 100% (3619/3619), done.
Writing objects: 100% (13558/13558), 2.63 MiB | 4.29 MiB/s, done.
Total 13558 (delta 9974), reused 13407 (delta 9863)
remote: Resolving deltas: 100% (9974/9974), done.
remote:
remote: To create a merge request for caen_v560, visit:
remote:   https://gitlab.com/chalmers-subexp/nurdlib/-/merge_requests/new?merge_request%5Bsource_branch%5D=caen_v560
remote:
To gitlab.com:chalmers-subexp/nurdlib.git
 * [new branch]      caen_v560 -> caen_v560

Best greetings

Günter

From hans.tornqvist at chalmers.se Wed Feb 21 11:14:44 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Wed, 21 Feb 2024 11:14:44 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To:
References: , <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
Message-ID:

Dear Günter,

Before mapping, map_map tries to read and write some given registers with a
"safe" but slower method of accessing registers, which is called "poking"
in nurdlib. Maybe the method of access on the rio4 you have is not safe
enough and one of the two pokes fails horribly...

Could you please double check the module address?

Could you also try using bin/rwdump to read any register in the v560 to see
if it's accessible at all and not a problem with the module implementation
in nurdlib? Something like

bin/rwdump -a0x33333300 -r16

Actually the address 0x33333300 looks weird to me, maybe it should be
0x33330000?

Also for reading, try register offsets fa, fc, fe, with 16-bit accesses,
they should have some interesting values.

Cheers,
Hans

From hans.tornqvist at chalmers.se Wed Feb 21 11:21:25 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Wed, 21 Feb 2024 11:21:25 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To:
References: , <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
Message-ID:

Ah sorry I see now that the v560 has six address selectors so 0x33333300 is
actually possible.
Still, please double check the adress setting and try using rwdump to poke the module manually. Cheers, Hans "Weber, Guenter Dr." skrev: (21 februari 2024 10:18:29 CET) >Dear H?kan, > > >thanks for the hint to flush and sleep. Indeed, I now see that the crash happens in init_slow of V560 at this line: > > > v560->sicy_map = map_map(v560->address, MAP_SIZE, KW_NOBLT, > 0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear)); > > >Maybe the code is accessing/writing into a memory location that it should better not touch? > >This problematic line is then followed by: > > > id = MAP_READ(v560->sicy_map, fixed_code); > > >The corresponding line in the V560 code on the system that was running with this module looks like this: > > > v560->sicy_map = map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT, > 0, 0, MAP_POKE_ARGS(*v560->read, fixed_code), > MAP_POKE_ARGS(*v560->write, scale_clear)); > > >And is followed by: > > > mapped_ptr = map_get_mapped_ptr(v560->sicy_map); > v560->read = mapped_ptr; > v560->write = mapped_ptr; > > >Maybe you already have an idea what causes the problem here? > > >I will now go to the system that was running with V560 and make a push of the NURDLIB. > > > > >Best greetings > >G?nter > > >________________________________ >Von: subexp-daq im Auftrag von H?kan T Johansson >Gesendet: Dienstag, 20. Februar 2024 20:13:32 >An: Discuss use of Nurdlib, TRLO II, drasi and UCESB. >Betreff: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module > > >Dear G?nter, > >I took the files you provided and for comparison put them in a branch >'old_caen_v560'. > >git diff origin/old_caen_v560..origin/master > >however does not show anything which is suspicious to me. Perhaps Hans >can spot something. > >Otherwise, the only idea I can come up with is to continue to bisect the >code inside slow init. 
> >However, before that, I would suggest to add > > fflush(stdout); sleep(1); > >after each printf statement, such that one can be quite sure that the >printout is not eaten when the RIO crash happens. I.e. that it actually >had gotten further than shown by the prints. > >Best regards, >H?kan > > > > >On Tue, 20 Feb 2024, Weber, Guenter Dr. wrote: > >> >> Dear friends, >> >> >> I now had a look at the system where the V560 was running. It was also setup >> by Bastian. And there the code for the V560 module is slightly different >> from the one included in the NURDLIB branch that I am using on the test >> system. >> >> >> Maybe you can have a look at it. >> >> >> I also could push the complete NURDLIB from this system, if this helps. >> >> >> >> >> Best greetings >> >> G?nter >> >> >> >> >> ____________________________________________________________________________ >> Von: subexp-daq im Auftrag von Weber, >> Guenter Dr. >> Gesendet: Dienstag, 20. Februar 2024 10:58:27 >> An: Discuss use of Nurdlib, TRLO II, drasi and UCESB. >> Betreff: [subexp-daq] Report of a possible bug of the CAEN_V560 module >> >> Dear friends, >> >> >> I now grabbed a V560 module that was working fine in another DAQ system and >> put it into our test system. >> >> >> The main.cfg looks like this: >> >> >> log_level=spam # info, verbose, debug, spam >> >> CRATE("MCAL") { >> GSI_VULOM(0x03000000) { >> timestamp = true # needed to get timestamps in the data output >> # ecl=0..15 >> } >> BARRIER >> CAEN_V560(0x333333300) { >> use_veto = true >> } >> # CAEN_V767A(0x03100000) { >> # } >> } >> >> Starting the DAQ now results in a freeze of the RIO4. A reset of the crate >> is necessary to talk to it again. >> >> >> The problem occurs in the first slow init of the V560 module. To find the >> exact line, I added some output to CRATE.C: >> >> >> 10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM. 
>> before push_log_level(module) >> before a_crate->module_init_id = module->id >> before module->props->init_slow(a_crate, module) >> LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC) >> before module_init_id_mark(a_crate, module) >> before pop_log_level(module) >> 10: crate/crate.c:923: .Slow-init module[1]=CAEN_V560. >> before push_log_level(module) >> before a_crate->module_init_id = module->id >> before module->props->init_slow(a_crate, module) >> >> >> The CRATE.C code now looks like this: >> >> >> TAILQ_FOREACH(module, &a_crate->module_list, next) { >> if (NULL == module->props) { >> continue; >> } >> LOGF(info)(LOGL, "Slow-init module[%u]=%s.", module->id, >> keyword_get_string(module->type)); >> printf("before push_log_level(module) \n"); >> push_log_level(module); >> printf("before a_crate->module_init_id = module->id \n"); >> a_crate->module_init_id = module->id; >> printf("before module->props->init_slow(a_crate, module) \n"); >> if (!module->props->init_slow(a_crate, module)) { >> printf("before pop_log_level(module) \n"); >> pop_log_level(module); >> printf("before goto crate_init_done \n"); >> goto crate_init_done; >> } >> printf("before module_init_id_mark(a_crate, module) \n"); >> module_init_id_mark(a_crate, module); >> printf("before pop_log_level(module) \n"); >> pop_log_level(module); >> } >> >> Thus, to me it looks like the check "if (!module->props->init_slow(a_crate, >> module)) ..." is doing something quite horrible to the RIO4. >> >> >> This is unfortunate, because my original aim was to show that there is also >> a bug/mistake in readout_dt of the V560 module. But I did not come this far. >> >> >> Do you have any idea what might cause the freezing of the RIO4? >> >> >> >> >> Best greetings and many thanks >> >> G?nter >> >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
From g.weber at hi-jena.gsi.de Wed Feb 21 11:40:25 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Wed, 21 Feb 2024 10:40:25 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To:
References: , <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se> ,
Message-ID:

Dear Hans,

the output from manually reading the module indeed shows a problem:

RIO4-MCAL-1 mbsdaq > rwdump -a0x33333300 -r16
Address=0x33333300
Raw-read value=rwdump: line 28: 593 Bus error $PREFIX $f "$@"

The module was working with this address in the other DAQ system (as we did not know the order of the individual switches, we set them all to "3"). But I can take it out and put it in again at a different slot, in case this particular slot has a hardware problem. (But I have never heard of such a thing.)

Best greetings
Günter

From g.weber at hi-jena.gsi.de Wed Feb 21 14:32:14 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Wed, 21 Feb 2024 13:32:14 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
References: , <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se> , , <0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>, <743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
Message-ID: <5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>

Dear Hans,

with the different register addresses it works.

RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fa -r16
Address=0x333333fa
Raw-read value=0xfaf5

RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fc -r16
Address=0x333333fc
Raw-read value=0x083a

RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fe -r16
Address=0x333333fe
Raw-read value=0x01bc

What can we learn from these numbers?

Best greetings
Günter

________________________________
From: Hans Toshihide Törnqvist
Sent: Wednesday, 21 February 2024 12:43:06
To: Weber, Guenter Dr.
Subject: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module

Hmm, looks like address offset 0 is "not used", could you try -a0x333333fa? Or fe and fc at the end, they should be some read-only registers.

"Weber, Guenter Dr." wrote (21 February 2024 12:06:00 CET):

Different VME slot of the V560 module, same result. :-(

From hans.tornqvist at chalmers.se Wed Feb 21 15:28:01 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Wed, 21 Feb 2024 15:28:01 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
References: <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se> <0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de> <743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se> <5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
Message-ID: <93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>

Dear Günter,

The most important thing is that you get reasonable values with these reads, the actual values don't mean a whole lot.

One of the manual reads that you did (ofs=0xfa) is what 'map_map' does for "poke reading". The macros MAP_POKE_ARGS(fixed_code), or the older MAP_POKE_ARGS(*v560->read, fixed_code) tell 'map_map' what address offset to poke, and it depends on each module.

The next thing that happens in 'map_map' is the "poke writing". Could you try to write to the 'scale_clear' register next? That would be:

rwdump -a0x33333350 -w16,0

---

In case you would like to look deeper in 'map_map', you can find it in module/map/map.c around line-number 103. It's not a very complicated function that does the following:

-) Checks user-mapped memory, you don't need to worry about this, it's mainly for simulating module memory for tests.
-) Performs the poke-read.
-) Performs the poke-write.
-) If it's a BLT mapping, asks the platform-specific code to do that without further tests.
-) Otherwise times the poke registers many times to get an idea about the speed of every single-cycle access.

If you want to dig even deeper, you can look in module/map/map_xpc_3310.c which is what is used in the most recent Linux Rio4's. It's mainly a wrapper around a proprietary black-box library, so not scary and scary at the same time.

Best regards,
Hans

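The probing order described above (poke-read, then poke-write) can be sketched in C as follows. This is only an illustration under stated assumptions, not nurdlib's 'map_map': the `sketch_*` names, the `struct sketch_map` type and the 0x100-byte window are hypothetical, and the "module" is plain memory rather than VME hardware. The offsets 0xfa and 0x50 and the value 0xfaf5 are the ones seen with rwdump in this thread. The user-mapped memory check, the BLT branch and the timing loop are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Register offsets as used with rwdump in this thread. */
#define OFS_SCALE_CLEAR 0x50u  /* target of the poke-write */
#define OFS_FIXED_CODE  0xfau  /* target of the poke-read */
#define SKETCH_MAP_SIZE 0x100u /* hypothetical size of the window */

/* Hypothetical stand-in for a mapped module: ordinary memory. */
struct sketch_map {
	uint32_t base;
	uint8_t mem[SKETCH_MAP_SIZE];
};

/* VME address of a register, i.e. what one would pass to rwdump -a. */
static uint32_t
sketch_reg_address(uint32_t base, uint32_t ofs)
{
	return base + ofs;
}

/* 16-bit "poke" write; reports failure instead of touching bad memory. */
static int
sketch_poke_w16(struct sketch_map *m, uint32_t ofs, uint16_t val)
{
	assert(m != NULL);
	if (ofs > SKETCH_MAP_SIZE - sizeof val) {
		return 0;
	}
	memcpy(m->mem + ofs, &val, sizeof val);
	return 1;
}

/* 16-bit "poke" read with the same bounds check. */
static int
sketch_poke_r16(struct sketch_map const *m, uint32_t ofs, uint16_t *out)
{
	assert(m != NULL && out != NULL);
	if (ofs > SKETCH_MAP_SIZE - sizeof *out) {
		return 0;
	}
	memcpy(out, m->mem + ofs, sizeof *out);
	return 1;
}

/* Prepare the simulated module with the fixed code read in this thread. */
static void
sketch_map_init(struct sketch_map *m, uint32_t base)
{
	memset(m, 0, sizeof *m);
	m->base = base;
	(void)sketch_poke_w16(m, OFS_FIXED_CODE, 0xfaf5);
}

/* Poke-read first, then poke-write; stop at the first failure. */
static int
sketch_map_probe(struct sketch_map *m, uint16_t *fixed_code)
{
	if (!sketch_poke_r16(m, OFS_FIXED_CODE, fixed_code)) {
		fprintf(stderr, "poke-read failed at 0x%08x\n",
		    (unsigned)sketch_reg_address(m->base, OFS_FIXED_CODE));
		return 0;
	}
	if (!sketch_poke_w16(m, OFS_SCALE_CLEAR, 0)) {
		fprintf(stderr, "poke-write failed at 0x%08x\n",
		    (unsigned)sketch_reg_address(m->base, OFS_SCALE_CLEAR));
		return 0;
	}
	return 1;
}
```

With a base of 0x33333300, `sketch_reg_address` gives 0x333333fa for the poke-read and 0x33333350 for the poke-write, which are the two addresses used with rwdump in this thread. If both manual accesses succeed, a failure inside the real 'map_map' would have to come from a later step or from the access method itself.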
a_crate->module_init_id = module->id; > > ? ? ? ? printf("before module->props->init_slow(a_crate, module) \n"); > > ? ? ? ? if (!module->props->init_slow(a_crate, module)) { > > ? ? ? ? ? ? printf("before pop_log_level(module) \n"); > > ? ? ? ? ? ? pop_log_level(module); > > ? ? ? ? ? ? printf("before goto crate_init_done \n"); > > ? ? ? ? ? ? goto crate_init_done; > > ? ? ? ? } > > ? ? ? ? printf("before module_init_id_mark(a_crate, module) \n"); > > ? ? ? ? module_init_id_mark(a_crate, module); > > ? ? ? ? printf("before pop_log_level(module) \n"); > > ? ? ? ? pop_log_level(module); > > ? ? } > > > > Thus, to me it looks like the check "if (!module->props->init_slow(a_crate, > > module)) ..." is doing something quite horrible to the RIO4. > > > > > > This is unfortunate, because my original aim was to show that there is also > > a bug/mistake in readout_dt of the V560 module. But I did not come this far. > > > > > > Do you have any idea what might cause the freezing of the RIO4? > > > > > > > > > > Best greetings and many thanks > > > > G?nter > > > > > > > > > > > > > > From g.weber at hi-jena.gsi.de Wed Feb 21 16:14:45 2024 From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.) Date: Wed, 21 Feb 2024 15:14:45 +0000 Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module In-Reply-To: <93276dae-2037-4c8a-b247-5b065336089d@chalmers.se> References: <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se> <0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de> <743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se> <5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>, <93276dae-2037-4c8a-b247-5b065336089d@chalmers.se> Message-ID: Dear Hans, writing into the register works fine (I tried it several times): RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0 Address=0x33333350 Raw-write done. RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0 Address=0x33333350 Raw-write done. RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0 Address=0x33333350 Raw-write done. 
RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
Address=0x33333350
Raw-write done.

I could now litter map_map with printf() outputs to see where execution of

v560->sicy_map = map_map(v560->address, MAP_SIZE, KW_NOBLT,
    0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));

is failing. Should I proceed this way? Or is there anything else that I could check?

(As I understand, the slightly different implementation of V560 on our running system is not indicative of a specific issue, but just due to the fact that this is a deprecated version of NURDLIB. Right?)

Best greetings
Günter

________________________________
From: Hans Toshihide Törnqvist
Sent: Wednesday, 21 February 2024 15:28:01
To: Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber, Guenter Dr.
Subject: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module

Dear Günter,

The most important thing is that you get reasonable values with these reads, the actual values don't mean a whole lot.

One of the manual reads that you did (ofs=0xfa) is what 'map_map' does for "poke reading". The macros

MAP_POKE_ARGS(fixed_code), or the older
MAP_POKE_ARGS(*v560->read, fixed_code)

tell 'map_map' what address offset to poke, and it depends on each module.

The next thing that happens in 'map_map' is the "poke writing". Could you try to write to the 'scale_clear' register next? That would be:

rwdump -a0x33333350 -w16,0

---

In case you would like to look deeper in 'map_map', you can find it in module/map/map.c around line 103. It's not a very complicated function that does the following:

-) Checks user-mapped memory, you don't need to worry about this, it's mainly for simulating module memory for tests.

-) Performs the poke-read.

-) Performs the poke-write.

-) If it's a BLT mapping, asks the platform-specific code to do that without further tests.

-) Otherwise times the poke registers many times to get an idea about the speed of every single-cycle access.
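The sequence listed above can be sketched in isolation. This is only an illustration under stated assumptions: `struct SimModule`, `sim_read16`, `sim_write16` and `sketch_map` are hypothetical names that simulate a 16-bit register window in plain memory; they are not nurdlib's 'map_map' or its platform code. The user-mapped-memory check is left out, and the timing step only repeats the accesses without measuring them.

```c
#include <stddef.h>
#include <stdint.h>

enum { SIM_SIZE = 0x100 };  /* hypothetical size of the register window */

/* Hypothetical simulated module: registers in RAM plus access counters. */
struct SimModule {
    uint16_t reg[SIM_SIZE / 2];
    unsigned read_num;
    unsigned write_num;
};

/* Simulated single-cycle 16-bit read at byte offset a_ofs. */
static uint16_t
sim_read16(struct SimModule *a_m, size_t a_ofs)
{
    ++a_m->read_num;
    return a_m->reg[a_ofs / 2];
}

/* Simulated single-cycle 16-bit write at byte offset a_ofs. */
static void
sim_write16(struct SimModule *a_m, size_t a_ofs, uint16_t a_value)
{
    ++a_m->write_num;
    a_m->reg[a_ofs / 2] = a_value;
}

/*
 * Sketch of the order of operations described in the mail.
 * Returns 1 when the sequence completed, 0 for unusable offsets.
 */
static int
sketch_map(struct SimModule *a_m, int a_is_blt, size_t a_poke_r_ofs,
    size_t a_poke_w_ofs, unsigned a_timing_reps)
{
    unsigned i;

    if (a_poke_r_ofs >= SIM_SIZE || a_poke_w_ofs >= SIM_SIZE ||
        0 != a_poke_r_ofs % 2 || 0 != a_poke_w_ofs % 2) {
        return 0;
    }
    /* Poke-read: one read of the register named for reading. */
    (void)sim_read16(a_m, a_poke_r_ofs);
    /* Poke-write: one write to the register named for writing. */
    sim_write16(a_m, a_poke_w_ofs, 0);
    if (a_is_blt) {
        /* Real code would hand over to platform-specific BLT setup. */
        return 1;
    }
    /* Single-cycle mapping: repeat both pokes (timing itself omitted). */
    for (i = 0; i < a_timing_reps; ++i) {
        (void)sim_read16(a_m, a_poke_r_ofs);
        sim_write16(a_m, a_poke_w_ofs, 0);
    }
    return 1;
}
```

With the offsets used in the rwdump tests of this thread, a call would look like `sketch_map(&m, 0, 0xfa, 0x50, 10)`: the read offset is only ever read and the write offset only ever written.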
If you want to dig even deeper, you can look in module/map/map_xpc_3310.c which is what is used in the most recent Linux Rio4's. It's mainly a wrapper around a proprietary black-box library, so not scary and scary at the same time.

Best regards,
Hans

On 2024-02-21 14:32, Weber, Guenter Dr. wrote:
> Dear Hans,
>
> with the different register addresses it works.
>
> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fa -r16
> Address=0x333333fa
> Raw-read value=0xfaf5
>
> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fc -r16
> Address=0x333333fc
> Raw-read value=0x083a
>
> RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fe -r16
> Address=0x333333fe
> Raw-read value=0x01bc
>
> What can we learn from these numbers?
>
> Best greetings
>
> Günter
>
> ------------------------------------------------------------------------
> *From:* Hans Toshihide Törnqvist
> *Sent:* Wednesday, 21 February 2024 12:43:06
> *To:* Weber, Guenter Dr.
> *Subject:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module
> Hmm, looks like address offset 0 is "not used", could you try
> -a0x333333fa? Or fe and fc at the end, they should be some read-only
> registers.
>
> "Weber, Guenter Dr." wrote: (21 February 2024 12:06:00 CET)
>
> Different VME slot of the V560 module, same result. :-(
>
> ------------------------------------------------------------------------
> *From:* subexp-daq on behalf of Weber, Guenter Dr.
> *Sent:* Wednesday, 21 February 2024 11:40:25
> *To:* Hans Toshihide Törnqvist; Discuss use of Nurdlib, TRLO II, drasi and UCESB.
> *Subject:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module
>
> Dear Hans,
>
> the output from manual reading of the module indeed shows a problem:
>
> RIO4-MCAL-1 mbsdaq > rwdump -a0x33333300 -r16
> Address=0x33333300
> Raw-read value=rwdump: line 28:   593 Bus error
> $PREFIX $f "$@"
>
> The module was working with this address in the other DAQ system (as
> we did not know the order of the individual switches, we set them
> all to "3"). But I can take it out and put it in again at a
> different slot, if maybe this particular slot has a hardware
> problem. (But I never heard of such a thing.)
>
> Best greetings
>
> Günter
>
> ------------------------------------------------------------------------
> *From:* Hans Toshihide Törnqvist
> *Sent:* Wednesday, 21 February 2024 11:14:44
> *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber, Guenter Dr.
> *Subject:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module
> Dear Günter,
>
> map_map before mapping tries to read and write some given registers
> with a "safe" but slower method of accessing registers, which is
> called "poking" in nurdlib. Maybe the method of access on the rio4
> you have is not safe enough and one of the two pokes fails horribly...
>
> Could you please double check the module address? Could you also try
> using bin/rwdump to read any register in the v560 to see if it's
> accessible at all and not a problem with the module implementation
> in nurdlib?
>
> Something like bin/rwdump -a0x33333300 -r16
>
> Actually the address 0x33333300 looks weird to me, maybe it should
> be 0x33330000?
> Also for reading, try register offsets fa, fc, fe, with 16-bit
> accesses, they should have some interesting values.
>
> Cheers,
> Hans
>
> "Weber, Guenter Dr." wrote: (21 February 2024 10:18:29 CET)
>
> Dear Håkan,
>
> thanks for the hint to flush and sleep.
> Indeed, I now see that
> the crash happens in init_slow of V560 at this line:
>
> v560->sicy_map=map_map(v560->address, MAP_SIZE, KW_NOBLT,
>     0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));
>
> Maybe the code is accessing/writing into a memory location that
> it should better not touch?
>
> This problematic line is then followed by:
>
> id=MAP_READ(v560->sicy_map, fixed_code);
>
> The corresponding line in the V560 code on the system that was
> running with this module looks like this:
>
> v560->sicy_map=map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT,
>     0, 0, MAP_POKE_ARGS(*v560->read, fixed_code),
>     MAP_POKE_ARGS(*v560->write, scale_clear));
>
> And is followed by:
>
> mapped_ptr =map_get_mapped_ptr(v560->sicy_map);
> v560->read=mapped_ptr;
> v560->write=mapped_ptr;
>
> Maybe you already have an idea what causes the problem here?
>
> I will now go to the system that was running with V560 and make
> a push of the NURDLIB.
>
> Best greetings
>
> Günter
>
> ------------------------------------------------------------------------
> *From:* subexp-daq on behalf of Håkan T Johansson
> *Sent:* Tuesday, 20 February 2024 20:13:32
> *To:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.
> *Subject:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module
>
> Dear Günter,
>
> I took the files you provided and for comparison put them in a branch
> 'old_caen_v560'.
>
> git diff origin/old_caen_v560..origin/master
>
> however does not show anything which is suspicious to me. Perhaps Hans
> can spot something.
>
> Otherwise, the only idea I can come up with is to continue to bisect the
> code inside slow init.
>
> However, before that, I would suggest to add
>
> fflush(stdout); sleep(1);
>
> after each printf statement, such that one can be quite sure that the
> printout is not eaten when the RIO crash happens. I.e. that it actually
> had gotten further than shown by the prints.
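The flush-and-sleep advice above can be wrapped in one small helper so that every debug print is handled the same way. A minimal sketch, assuming a POSIX environment with `sleep()` from `<unistd.h>`; `trace_checkpoint` is a hypothetical name and not part of nurdlib.

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of nurdlib: print a message, flush the
 * stream, then wait so the text can leave the machine before a possible
 * freeze. Returns the number of characters printed, as vfprintf does.
 */
static int
trace_checkpoint(FILE *a_out, unsigned a_delay_s, char const *a_fmt, ...)
{
    va_list args;
    int n;

    va_start(args, a_fmt);
    n = vfprintf(a_out, a_fmt, args);
    va_end(args);
    /* Push buffered output out of the process right away. */
    fflush(a_out);
    /* Give the console or network link time to deliver the text. */
    if (a_delay_s > 0) {
        sleep(a_delay_s);
    }
    return n;
}
```

A call such as `trace_checkpoint(stdout, 1, "before init_slow(module %u)\n", id);` would then replace each printf, fflush and sleep triple. The delay only makes it more likely that the last message is seen; it cannot guarantee it.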
> > Best regards, > H?kan > > > > > On Tue, 20 Feb 2024, Weber, Guenter Dr. wrote: > > > > > Dear friends, > > > > > > I now had a look at the system where the V560 was running. It was also setup > > by Bastian. And there the code for the V560 module is slightly different > > from the one included in the NURDLIB branch that I am using on the test > > system. > > > > > > Maybe you can have a look at it. > > > > > > I also could push the complete NURDLIB from this system, if this helps. > > > > > > > > > > Best greetings > > > > G?nter > > > > > > > > > > ____________________________________________________________________________ > > Von: subexp-daq im Auftrag von Weber, > > Guenter Dr. > > Gesendet: Dienstag, 20. Februar 2024 10:58:27 > > An: Discuss use of Nurdlib, TRLO II, drasi and UCESB. > > Betreff: [subexp-daq] Report of a possible bug of the CAEN_V560 module > > > > Dear friends, > > > > > > I now grabbed a V560 module that was working fine in another DAQ system and > > put it into our test system. > > > > > > The main.cfg looks like this: > > > > > > log_level=spam # info, verbose, debug, spam > > > > CRATE("MCAL") { > > GSI_VULOM(0x03000000) { > > timestamp = true # needed to get timestamps in the data output > > # ecl=0..15 > > } > > BARRIER > > CAEN_V560(0x333333300) { > > use_veto = true > > } > > # CAEN_V767A(0x03100000) { > > # } > > } > > > > Starting the DAQ now results in a freeze of the RIO4. A reset of the crate > > is necessary to talk to it again. > > > > > > The problem occurs in the first slow init of the V560 module. To find the > > exact line, I added some output to CRATE.C: > > > > > > 10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM. 
> > before push_log_level(module) > > before a_crate->module_init_id = module->id > > before module->props->init_slow(a_crate, module) > > LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC) > > before module_init_id_mark(a_crate, module) > > before pop_log_level(module) > > 10: crate/crate.c:923: .Slow-init module[1]=CAEN_V560. > > before push_log_level(module) > > before a_crate->module_init_id = module->id > > before module->props->init_slow(a_crate, module) > > > > > > The CRATE.C code now looks like this: > > > > > > TAILQ_FOREACH(module, &a_crate->module_list, next) { > > if (NULL == module->props) { > > continue; > > } > > LOGF(info)(LOGL, "Slow-init module[%u]=%s.", module->id, > > keyword_get_string(module->type)); > > printf("before push_log_level(module) \n"); > > push_log_level(module); > > printf("before a_crate->module_init_id = module->id \n"); > > a_crate->module_init_id = module->id; > > printf("before module->props->init_slow(a_crate, module) \n"); > > if (!module->props->init_slow(a_crate, module)) { > > printf("before pop_log_level(module) \n"); > > pop_log_level(module); > > printf("before goto crate_init_done \n"); > > goto crate_init_done; > > } > > printf("before module_init_id_mark(a_crate, module) \n"); > > module_init_id_mark(a_crate, module); > > printf("before pop_log_level(module) \n"); > > pop_log_level(module); > > } > > > > Thus, to me it looks like the check "if (!module->props->init_slow(a_crate, > > module)) ..." is doing something quite horrible to the RIO4. > > > > > > This is unfortunate, because my original aim was to show that there is also > > a bug/mistake in readout_dt of the V560 module. But I did not come this far. > > > > > > Do you have any idea what might cause the freezing of the RIO4? > > > > > > > > > > Best greetings and many thanks > > > > G?nter > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From hans.tornqvist at chalmers.se Wed Feb 21 16:46:25 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Wed, 21 Feb 2024 16:46:25 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To:
References: <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se> <0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de> <743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se> <5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de> <93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>
Message-ID: <66bdd6e1-93a1-4f32-b1a7-9880cc35ab09@chalmers.se>

Dear Günter,

I cannot see anything important having changed in the v560 code, and in any case the freeze happens inside map_map.

Aha, but I do see a bug in 'map_map'! Starting with the switch statement around line 195 where the bit depth is chosen, 'map_sicy_write' writes to 'poke_r_ofs', must be 'poke_w_ofs', please try that.

(Says a lot about this piece of code... Cleanup action to the todo.)

Best regards,
Hans

From g.weber at hi-jena.gsi.de Thu Feb 22 10:04:28 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Thu, 22 Feb 2024 09:04:28 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To:
References: <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se> <0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de> <743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se> <5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>, <93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>,
Message-ID:

Dear friends,

after the bug in map_map was fixed, the freeze does not happen again. Very good!

Now back to my original concern regarding the V560 module ...

readout_dt looks like this:

uint32_t
caen_v560_readout_dt(struct Crate *a_crate, struct Module *a_module)
{
    (void)a_crate;
    LOGF(spam)(LOGL, NAME" readout_dt {");
    a_module->event_counter.value++;
    LOGF(spam)(LOGL, NAME" readout_dt(ctr=0x%08x) }", a_module->event_counter.value);
    return 0;
}

The module counter is incremented by one every time readout_dt is executed.
This results in a problem in crate.c:

diff_module = COUNTER_DIFF(*module->crate_counter,
    module->event_counter, module->this_minus_crate);
/* TODO: Clean this. */
shadow_counter.value = module->shadow.data_counter_value;
shadow_counter.mask = module->event_counter.mask;
diff_shadow = COUNTER_DIFF(*module->crate_counter,
    shadow_counter, module->this_minus_crate);
create_do_shad = !crate_get_do_shadow(a_crate);
printf("%s: diff_module: %u, module_crate_counter: %u, module_event_counter: %u, module_this_minus_crate: %u \n",
    keyword_get_string(module->type), diff_module,
    (*module->crate_counter).value, (module->event_counter).value,
    module->this_minus_crate);
if (0 == diff_module && (
    create_do_shad ||
    NULL == module->props->readout_shadow ||
    0 == diff_shadow)) {
    ok = 1;
    printf("%u \n", ok);
    break;
}
getchar();

When the difference between (*module->crate_counter).value and (module->event_counter).value is evaluated, the latter has already been incremented, because readout_dt for the module has already been executed, while the former has not been incremented yet. This is the output:

CAEN_V560: diff_module: 4294967295, module_crate_counter: 0, module_event_counter: 1, module_this_minus_crate: 0

Note that diff_module shows the result of a "0 - 1" operation on unsigned integers.

The original version of the crate.c code would now execute readout_dt of the V560 module again, thus incrementing the module counter another time. Thus, diff_module would be "0 - 2". This would repeat until the timeout condition a bit later in the code is reached. Then the modules would be re-initialized, thus setting (module->event_counter).value of the V560 back to zero. But the crate counter would be incremented. Thus, by sheer luck the next try of the same test would have (*module->crate_counter).value and (module->event_counter).value both equal to 1. And from this point on the DAQ is running as intended.

Ok, I hope the explanation was clear and I understood correctly what is happening.
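The "0 - 1" wraparound described above can be reproduced on its own. A minimal sketch, assuming plain 32-bit unsigned counters; `counter_diff` and `print_counter_state` are hypothetical helpers for illustration only. They are not nurdlib's COUNTER_DIFF macro, which additionally takes the counter mask and the this_minus_crate offset into account.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the comparison: crate counter minus module
 * counter in unsigned 32-bit arithmetic, which wraps modulo 2^32.
 */
static uint32_t
counter_diff(uint32_t a_crate_counter, uint32_t a_module_counter)
{
    return a_crate_counter - a_module_counter;
}

/* Print both counters and their difference in the style of the mail. */
static void
print_counter_state(uint32_t a_crate_counter, uint32_t a_module_counter)
{
    printf("diff_module: %" PRIu32 ", crate_counter: %" PRIu32
        ", event_counter: %" PRIu32 "\n",
        counter_diff(a_crate_counter, a_module_counter),
        a_crate_counter, a_module_counter);
}
```

Calling `print_counter_state(0, 1)` reports a difference of 4294967295, matching the output above, and `print_counter_state(0, 2)` reports 4294967294, the "0 - 2" case after a second readout_dt call. Only when both counters are equal, for example both 1, is the difference 0.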
Best greetings
Günter

----------------
Günter Weber

Helmholtz-Institut Jena
Fröbelstieg 3
07743 Jena
Germany
Phone: +49-3641-947605
www.hi-jena.de

GSI Helmholtzzentrum für Schwerionenforschung
Planckstrasse 1
64291 Darmstadt
Germany
www.gsi.de

________________________________
From: subexp-daq on behalf of Weber, Guenter Dr.
Sent: Wednesday, 21 February 2024 16:14:45
To: Discuss use of Nurdlib, TRLO II, drasi and UCESB.
Subject: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module

Dear Hans,

writing into the register works fine (I tried it several times):

RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
Address=0x33333350
Raw-write done.
RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
Address=0x33333350
Raw-write done.
RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
Address=0x33333350
Raw-write done.
RIO4-MCAL-1 mbsdaq > rwdump -a0x33333350 -w16,0
Address=0x33333350
Raw-write done.

I could now litter map_map with printf() outputs to see where execution of

v560->sicy_map = map_map(v560->address, MAP_SIZE, KW_NOBLT,
    0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear));

is failing. Should I proceed this way? Or is there anything else that I could check?

(As I understand, the slightly different implementation of V560 on our running system is not indicative of a specific issue, but just due to the fact that this is a deprecated version of NURDLIB. Right?)

Best greetings
Günter

________________________________
From: Hans Toshihide Törnqvist
Sent: Wednesday, 21 February 2024 15:28:01
To: Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber, Guenter Dr.
Subject: Re: [subexp-daq] Report of a possible bug of the CAEN_V560 module

Dear Günter,

The most important thing is that you get reasonable values with these reads, the actual values don't mean a whole lot.

One of the manual reads that you did (ofs=0xfa) is what 'map_map' does for "poke reading".
The macros MAP_POKE_ARGS(fixed_code), or the older MAP_POKE_ARGS(*v560->read, fixed_code) tell 'map_map' what address offset to poke, and it depends on each module. The next thing that happens in 'map_map' is the "poke writing". Could you try to write to the 'scale_clear' register next? That would be: rwdump -a0x33333350 -w16,0 --- In case you would like to look deeper in 'map_map', you can find it in module/map/map.c around line-number 103. It's not a very complicated function that does the following: -) Checks user-mapped memory, you don't need to worry about this, it's mainly for simulating module memory for tests. -) Performs the poke-read. -) Performs the poke-write. -) If it's a BLT mapping, asks the platform-specific code to do that without further tests. -) Otherwise times the poke registers many times to get an idea about the speed of every single-cycle access. If you want to dig even deeper, you can look in module/map/map_xpc_3310.c which is what is used in the most recent Linux Rio4's. It's mainly a wrapper around a proprietary black-box library, so not scary and scary at the same time. Best regards, Hans On 2024-02-21 14:32, Weber, Guenter Dr. wrote: > Dear Hans, > > > with the different register addresses it works. > > > RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fa -r16 > Address=0x333333fa > Raw-read value=0xfaf5 > > > RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fc -r16 > Address=0x333333fc > Raw-read value=0x083a > > > RIO4-MCAL-1 mbsdaq > rwdump -a0x333333fe -r16 > Address=0x333333fe > Raw-read value=0x01bc > > What can we learn from these numbers? > > > > > Best greetings > > G?nter > > > > ------------------------------------------------------------------------ > *Von:* Hans Toshihide T?rnqvist > *Gesendet:* Mittwoch, 21. Februar 2024 12:43:06 > *An:* Weber, Guenter Dr. > *Betreff:* Re: [subexp-daq] Report of a possible bug of the CAEN_V560 > module > Hmm, looks like address offset 0 is "not used", could you try > -a0x333333fa? 
Or fe and fc at the end,they should be some read-only > registers. > > > "Weber, Guenter Dr." skrev: (21 februari 2024 > 12:06:00 CET) > > Different VME slot of the V560 module, same result. :-( > > ------------------------------------------------------------------------ > *Von:* subexp-daq im Auftrag > von Weber, Guenter Dr. > *Gesendet:* Mittwoch, 21. Februar 2024 11:40:25 > *An:* Hans Toshihide T?rnqvist; Discuss use of Nurdlib, TRLO II, > drasi and UCESB. > *Betreff:* Re: [subexp-daq] Report of a possible bug of the > CAEN_V560 module > > Dear Hans, > > > the output from manual reading of the module indeed shows a problem: > > > RIO4-MCAL-1 mbsdaq > rwdump -a0x33333300 -r16 > Address=0x33333300 > Raw-read value=rwdump: line 28: 593 Bus error > $PREFIX $f "$@" > > > The module was working with this address in the other DAQ system (as > we did not know the order of the individual switches, we set them > all to "3"). But I can take it our and put it in again at a > different slot, if maybe this particular slot has a hardware > problem. (But I never heard of such thing.) > > > > > Best greetings > > G?nter > > > ------------------------------------------------------------------------ > *Von:* Hans Toshihide T?rnqvist > *Gesendet:* Mittwoch, 21. Februar 2024 11:14:44 > *An:* Discuss use of Nurdlib, TRLO II, drasi and UCESB.; Weber, > Guenter Dr. > *Betreff:* Re: [subexp-daq] Report of a possible bug of the > CAEN_V560 module > Dear G?nter, > > map_map before mapping tries to read and write some given registers > with a "safe" but slower method of accessing registers, which is > called "poking" in nurdlib. Maybe the method of access on the rio4 > you have is not safe enough and one of the two pokes fails horribly... > > Could you please double check the module address? Could you also try > using bin/rwdump to read any register in the v560 to see if it's > accessible at all and not a problem with the module implementation > in nurdlib? 
> > Something like bin/rwdump -a0x33333300 -r16 > > Actually the address 0x33333300 looks weird to me, maybe it should > be 0x33330000? > Also for reading, try register offsets fa, fc, fe, with 16 bits > accesseses, they should have some interesting values. > > Cheers, > Hans > > > "Weber, Guenter Dr." skrev: (21 februari > 2024 10:18:29 CET) > > Dear H?kan, > > > thanks for the hint to flush and sleep. Indeed, I now see that > the crash happens in init_slow of V560 at this line: > > > v560->sicy_map=map_map(v560->address, MAP_SIZE, KW_NOBLT, > 0, 0, MAP_POKE_ARGS(fixed_code), MAP_POKE_ARGS(scale_clear)); > > > Maybe the code is accessing/writing into a memory location that > it should better not touch? > > This problematic line is then followed by: > > > id=MAP_READ(v560->sicy_map, fixed_code); > > The corresponding line in the V560 code on the system that was > running with this module looks like this: > > > v560->sicy_map=map_map(v560->address, MAP_SIZE_MAX(*v560), KW_NOBLT, > 0, 0, MAP_POKE_ARGS(*v560->read, fixed_code), > MAP_POKE_ARGS(*v560->write, scale_clear)); > > And is followed by: > > > mapped_ptr =map_get_mapped_ptr(v560->sicy_map); > v560->read=mapped_ptr; > v560->write=mapped_ptr; > > Maybe you already have an idea what causes the problem here? > > > I will now go to the system that was running with V560 and make > a push of the NURDLIB. > > > > > Best greetings > > G?nter > > > > ------------------------------------------------------------------------ > *Von:* subexp-daq im > Auftrag von H?kan T Johansson > *Gesendet:* Dienstag, 20. Februar 2024 20:13:32 > *An:* Discuss use of Nurdlib, TRLO II, drasi and UCESB. > *Betreff:* Re: [subexp-daq] Report of a possible bug of the > CAEN_V560 module > > Dear G?nter, > > I took the files you provided and for comparison put them in a > branch > 'old_caen_v560'. > > git diff origin/old_caen_v560..origin/master > > however does not show anything which is suspicious to me. > Perhaps Hans > can spot something. 
> > Otherwise, the only idea I can come up with is to continue to > bisect the > code inside slow init. > > However, before that, I would suggest to add > > fflush(stdout); sleep(1); > > after each printf statement, such that one can be quite sure > that the > printout is not eaten when the RIO crash happens. I.e. that it > actually > had gotten further than shown by the prints. > > Best regards, > H?kan > > > > > On Tue, 20 Feb 2024, Weber, Guenter Dr. wrote: > > > > > Dear friends, > > > > > > I now had a look at the system where the V560 was running. It was also setup > > by Bastian. And there the code for the V560 module is slightly different > > from the one included in the NURDLIB branch that I am using on the test > > system. > > > > > > Maybe you can have a look at it. > > > > > > I also could push the complete NURDLIB from this system, if this helps. > > > > > > > > > > Best greetings > > > > G?nter > > > > > > > > > > ____________________________________________________________________________ > > Von: subexp-daq im Auftrag von Weber, > > Guenter Dr. > > Gesendet: Dienstag, 20. Februar 2024 10:58:27 > > An: Discuss use of Nurdlib, TRLO II, drasi and UCESB. > > Betreff: [subexp-daq] Report of a possible bug of the CAEN_V560 module > > > > Dear friends, > > > > > > I now grabbed a V560 module that was working fine in another DAQ system and > > put it into our test system. > > > > > > The main.cfg looks like this: > > > > > > log_level=spam # info, verbose, debug, spam > > > > CRATE("MCAL") { > > GSI_VULOM(0x03000000) { > > timestamp = true # needed to get timestamps in the data output > > # ecl=0..15 > > } > > BARRIER > > CAEN_V560(0x333333300) { > > use_veto = true > > } > > # CAEN_V767A(0x03100000) { > > # } > > } > > > > Starting the DAQ now results in a freeze of the RIO4. A reset of the crate > > is necessary to talk to it again. > > > > > > The problem occurs in the first slow init of the V560 module. 
To find the > > exact line, I added some output to CRATE.C: > > > > > > 10: crate/crate.c:923: .Slow-init module[0]=GSI_VULOM. > > before push_log_level(module) > > before a_crate->module_init_id = module->id > > before module->props->init_slow(a_crate, module) > > LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC) > > before module_init_id_mark(a_crate, module) > > before pop_log_level(module) > > 10: crate/crate.c:923: .Slow-init module[1]=CAEN_V560. > > before push_log_level(module) > > before a_crate->module_init_id = module->id > > before module->props->init_slow(a_crate, module) > > > > > > The CRATE.C code now looks like this: > > > > > > TAILQ_FOREACH(module, &a_crate->module_list, next) { > > if (NULL == module->props) { > > continue; > > } > > LOGF(info)(LOGL, "Slow-init module[%u]=%s.", module->id, > > keyword_get_string(module->type)); > > printf("before push_log_level(module) \n"); > > push_log_level(module); > > printf("before a_crate->module_init_id = module->id \n"); > > a_crate->module_init_id = module->id; > > printf("before module->props->init_slow(a_crate, module) \n"); > > if (!module->props->init_slow(a_crate, module)) { > > printf("before pop_log_level(module) \n"); > > pop_log_level(module); > > printf("before goto crate_init_done \n"); > > goto crate_init_done; > > } > > printf("before module_init_id_mark(a_crate, module) \n"); > > module_init_id_mark(a_crate, module); > > printf("before pop_log_level(module) \n"); > > pop_log_level(module); > > } > > > > Thus, to me it looks like the check "if (!module->props->init_slow(a_crate, > > module)) ..." is doing something quite horrible to the RIO4. > > > > > > This is unfortunate, because my original aim was to show that there is also > > a bug/mistake in readout_dt of the V560 module. But I did not come this far. > > > > > > Do you have any idea what might cause the freezing of the RIO4? 
> >
> >
> > Best greetings and many thanks
> >
> > Günter
> >

From hans.tornqvist at chalmers.se Thu Feb 22 13:10:29 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Thu, 22 Feb 2024 13:10:29 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To:
References: <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
 <0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>
 <743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
 <5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
 <93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>
Message-ID: <53bcb3db-1ea5-43db-9e4e-7a77238dd899@chalmers.se>

On 2024-02-22 10:04, Weber, Guenter Dr. wrote:
> Dear friends,
>
> after the bug in map_map was fixed, the freeze does not happen again.
> Very good!

Thanks for testing, I'm saving that fix myself!

> Now back to my original concern regarding the V560 module ...
>
> readout_dt looks like this:
>
> uint32_t
> caen_v560_readout_dt(struct Crate *a_crate, struct Module *a_module)
> {
>     (void)a_crate;
>     LOGF(spam)(LOGL, NAME" readout_dt {");
>     a_module->event_counter.value++;
>     LOGF(spam)(LOGL, NAME" readout_dt(ctr=0x%08x) }",
>         a_module->event_counter.value);
>     return 0;
> }
>
> The module counter is incremented by one every time readout_dt is
> executed. This results in a problem in crate.c:
>
> diff_module = COUNTER_DIFF(*module->crate_counter,
>     module->event_counter, module->this_minus_crate);
> /* TODO: Clean this. */
> shadow_counter.value =
>     module->shadow.data_counter_value;
> shadow_counter.mask = module->event_counter.mask;
> diff_shadow = COUNTER_DIFF(*module->crate_counter,
>     shadow_counter, module->this_minus_crate);
>
> create_do_shad = !crate_get_do_shadow(a_crate);
> printf("%s: diff_module: %u, module_crate_counter: %u, module_event_counter: %u, module_this_minus_crate: %u\n",
>     keyword_get_string(module->type), diff_module,
>     (*module->crate_counter).value, (module->event_counter).value,
>     module->this_minus_crate);
> if (0 == diff_module &&
>     (create_do_shad ||
>      NULL == module->props->readout_shadow ||
>      0 == diff_shadow)) {
>     ok = 1;
>     printf("%u\n", ok);
>     break;
> }
> getchar();
>
> When the difference between (*module->crate_counter).value and
> (module->event_counter).value is evaluated, the latter was already
> incremented as readout_dt for the module was already executed, while the
> former counter was not incremented.

The crate counter should have been incremented by the readout function
that calls 'crate_readout_dt'. If I remember correctly you used the
r3bfuser, so somewhere in fuser.c there's a function fuser_readout which
calls crate_tag_counter_increase. The crate counter increment is
"abstracted" away a bit, due to module tagging and multi-event support
when it can increase by an arbitrary value between events.

> This is the output:
>
> CAEN_V560: diff_module: 4294967295, module_crate_counter: 0,
> module_event_counter: 1, module_this_minus_crate: 0
>
> Note that diff_module shows the result of a "0 - 1" operation when
> working with unsigned integers.

Looks like the crate counter stays still on 0 indeed. Do you have a
snippet of the drasi log around this error message?

> The original version of the crate.c code would now again execute
> readout_dt of module V560, thus incrementing the module counter another
> time. Thus, diff_module would be "0 - 2". This would repeat until the
> timeout condition a bit later in the code is reached.
> Then the modules would be re-initialized, thus setting
> (module->event_counter).value of V560 back to zero. But the crate
> counter would be incremented. Thus, by sheer luck the next try of the
> same test would have (*module->crate_counter).value and
> (module->event_counter).value both equal to 1. And from this point the
> DAQ is running as intended.

It sounds to me like the old version was very broken and should be
buried, deep.

Best regards,
Hans

> Ok, I hope the explanation was clear and I understood correctly what is
> happening.
>
> Best greetings
> Günter

From f96hajo at chalmers.se Thu Feb 22 13:46:29 2024
From: f96hajo at chalmers.se (Håkan T Johansson)
Date: Thu, 22 Feb 2024 13:46:29 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <53bcb3db-1ea5-43db-9e4e-7a77238dd899@chalmers.se>
References: <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
 <0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>
 <743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
 <5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
 <93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>
 <53bcb3db-1ea5-43db-9e4e-7a77238dd899@chalmers.se>
Message-ID: <2a4cfd01-8c50-2721-c4e2-89f1a3e55a12@chalmers.se>

Dear Günter,

just a side-note (since I'm not deep into r3bfuser...): perhaps you
already have, but if not, I suspect it would be good if you could push
the versions of the code you are using (unless it is plain master
branches). Just to avoid some guesswork.

Cheers,
Håkan

On Thu, 22 Feb 2024, Hans Toshihide Törnqvist wrote:

> On 2024-02-22 10:04, Weber, Guenter Dr. wrote:
>> Dear friends,
>>
>> after the bug in map_map was fixed, the freeze does not happen again.
>> Very good!
>
> Thanks for testing, I'm saving that fix myself!
>
>> Now back to my original concern regarding the V560 module ...
>>
>> readout_dt looks like this:
>>
>> uint32_t
>> caen_v560_readout_dt(struct Crate *a_crate, struct Module *a_module)
>> {
>>     (void)a_crate;
>>     LOGF(spam)(LOGL, NAME" readout_dt {");
>>     a_module->event_counter.value++;
>>     LOGF(spam)(LOGL, NAME" readout_dt(ctr=0x%08x) }",
>>         a_module->event_counter.value);
>>     return 0;
>> }
>>
>> The module counter is incremented by one every time readout_dt is
>> executed. This results in a problem in crate.c:
>>
>> diff_module = COUNTER_DIFF(*module->crate_counter,
>>     module->event_counter, module->this_minus_crate);
>> /* TODO: Clean this. */
>> shadow_counter.value =
>>     module->shadow.data_counter_value;
>> shadow_counter.mask = module->event_counter.mask;
>> diff_shadow = COUNTER_DIFF(*module->crate_counter,
>>     shadow_counter, module->this_minus_crate);
>>
>> create_do_shad = !crate_get_do_shadow(a_crate);
>> printf("%s: diff_module: %u, module_crate_counter: %u, module_event_counter: %u, module_this_minus_crate: %u\n",
>>     keyword_get_string(module->type), diff_module,
>>     (*module->crate_counter).value, (module->event_counter).value,
>>     module->this_minus_crate);
>> if (0 == diff_module &&
>>     (create_do_shad ||
>>      NULL == module->props->readout_shadow ||
>>      0 == diff_shadow)) {
>>     ok = 1;
>>     printf("%u\n", ok);
>>     break;
>> }
>> getchar();
>>
>> When the difference between (*module->crate_counter).value and
>> (module->event_counter).value is evaluated, the latter was already
>> incremented as readout_dt for the module was already executed, while the
>> former counter was not incremented.
>
> The crate counter should have been incremented by the readout function
> that calls 'crate_readout_dt'. If I remember correctly you used the
> r3bfuser, so somewhere in fuser.c there's a function fuser_readout which
> calls crate_tag_counter_increase. The crate counter increment is
> "abstracted" away a bit, due to module tagging and multi-event support
> when it can increase by an arbitrary value between events.
>
>> This is the output:
>>
>> CAEN_V560: diff_module: 4294967295, module_crate_counter: 0,
>> module_event_counter: 1, module_this_minus_crate: 0
>>
>> Note that diff_module shows the result of a "0 - 1" operation when
>> working with unsigned integers.
>
> Looks like the crate counter stays still on 0 indeed. Do you have a
> snippet of the drasi log around this error message?
>
>> The original version of the crate.c code would now again execute
>> readout_dt of module V560, thus incrementing the module counter another
>> time. Thus, diff_module would be "0 - 2". This would repeat until the
>> timeout condition a bit later in the code is reached.
>> Then the modules would be re-initialized, thus setting
>> (module->event_counter).value of V560 back to zero. But the crate
>> counter would be incremented. Thus, by sheer luck the next try of the
>> same test would have (*module->crate_counter).value and
>> (module->event_counter).value both equal to 1. And from this point the
>> DAQ is running as intended.
>
> It sounds to me like the old version was very broken and should be
> buried, deep.
>
> Best regards,
> Hans
>
>> Ok, I hope the explanation was clear and I understood correctly what is
>> happening.
>>
>> Best greetings
>> Günter
> --
> subexp-daq mailing list
> subexp-daq at lists.chalmers.se
> https://lists.chalmers.se/mailman/listinfo/subexp-daq
>

From g.weber at hi-jena.gsi.de Thu Feb 22 16:09:25 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Thu, 22 Feb 2024 15:09:25 +0000
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <423e14dd-73a9-4fbc-b106-4856cb1e1997@chalmers.se>
References: <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
 <0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>
 <743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
 <5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
 <93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>
 <53bcb3db-1ea5-43db-9e4e-7a77238dd899@chalmers.se>
 <62267106fb284e50b805b9dba09b8483@hi-jena.gsi.de>,
 <423e14dd-73a9-4fbc-b106-4856cb1e1997@chalmers.se>
Message-ID: <3c091e0197404e4b85562bb5a0559906@hi-jena.gsi.de>

Dear Hans,

many thanks! And in particular for all the detailed explanations.

For the VULOM "sleep 1" did not do the trick, but "sleep 10" worked. Is
there any chance to ask the VULOM if it feels ready to do the job, instead
of using a random waiting time?

Also I noticed that when asking the VULOM which firmware it is using, we
get a slightly different reply than the actual firmware number:

RIO4-MCAL-1 mbsdaq > vulomflash --addr=3 --read
VULOM base address: 0x03000000
hwmap_mapvme.c:398: LOG: Virtual address for VULOM/TRIDI @ VME 0x03000000 is 0x3005e000.
Performing command 'read'...
VOLUM+0 => 0x14091f20
VOLUM+RANGE_REG(0x800000) => 0x0000006a
Released vme ptr.

But the actual firmware number is 1409285e.

For comparison, should one look only at the first four hex digits? Or is
there more to take into account?

For the V560 module, misusing the bitmask for the counter resolved the
issue. At the end of this mail, I attach the new log. Maybe you find
something notable, but to me it looks fine now.

Our next steps would be as follows:

1) Wait for you to implement the bugfixes of the last days into NURDLIB.
2) Set up the test system with the most recent version of NURDLIB and
   check whether our minimal system with VULOM and V560 is now running
   smoothly.
3) Hammering the V767 TDC into NURDLIB.
4) Once we have achieved this, we would go back to testing the SIS3316 modules. Best greetings G?nter 10: lwroc_hostname_util.c:108: Host 'lyserv' known as 192.168.1.1 (port: 56583). Thread has no error buffer yet... CPUS: 1 delay: 1 10: lwroc_hostname_util.c:108: Host 'lyserv' known as 192.168.1.1 (port: 56583). Thread has no error buffer yet... HOST: RIO4-MCAL-1 Token: d6d68d7b (d6d68d7b:d6d68d7b) [/mbsusr/mbsdaq/.drasi_tokens/mcal] 10: lwroc_hostname_util.c:457: Own address: 192.168.1.71/255.255.255.0 (eth1). 10: lwroc_data_pipe.c:145: Data buffer READOUT_PIPE, size 419430400 = 0x19000000, 1 consumers. 10: lwroc_triva_readout.c:66: Silence TRIVA (HALT) 10: lwroc_net_io.c:167: Started server on port 56583 (data port 39534). client union size: 244 208 188 508 640 204 204 => 640 10: lwroc_udp_awaken_hints.c:159: UDP awaken hints file: /tmp/drasi.u1001/drasi.hints.u1001.RIO4-MCAL-1:56583 10: lwroc_main.c:706: Log message rate limit not in effect. 10: lwroc_readout.c:112: call readout_init... 10: lwroc_thread_util.c:117: This is the triva control thread! 10: lwroc_thread_util.c:117: This is the net io thread! 10: lwroc_thread_util.c:117: This is the slow_async thread! 10: lwroc_thread_util.c:117: This is the data server thread! 8: lwroc_message_wait.c:86: Waited 1 seconds for msg client. 8: lwroc_triva_state.c:414: Waited 1 seconds for initial slave and EB connection(s): 8: lwroc_triva_state.c:422: [EB lyserv] (state 0) 10: lwroc_message_internal.c:472: Message client connected! 10: lwroc_net_trans.c:1156: [drasi] Transport client connected (data) [192.168.1.1]. 10: lwroc_triva_control.c:370: Setup TRIVA (DISBUS, HALT, MASTER, RESET) 10: lwroc_triva_control.c:418: Minimum event time ctime(5000)+1*rd(686)+3*wr(634)+fctime(1000)=8588 ns (116.442 kHz) 10: lwroc_triva_state.c:1486: (Re)send ident messages... 
10: lwroc_triva_control.c:495: START TEST ACQ: HALT, CLEAR=RESET, MT=1 9: lwroc_triva_control.c:507: TEST: GO 10: lwroc_triva_control.c:725: RUN: RESET 10: lwroc_triva_control.c:729: RUN: MT=14 9: lwroc_triva_control.c:737: GO (1 good test triggers done) (max 116.4 kHz) 10: lwroc_triva_readout.c:376: Trigger 14 seen. 10: config/config.c:181: Will try default cfg path='/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default', can be set with NURDLIB_DEF_PATH. 8: lwroc_triva_state.c:2399: Master: deadtime: 1. Status: 0x10 (IN_READOUT). EC: 1 10: lwroc_triva_state.c:2428: [EB lyserv] EB: Status: 0x0. 8: lwroc_triva_state.c:2488: Node(s) busy in readout, waiting... 10: config/parser.c:287: Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/global.cfg' { 10: config/parser.c:299: Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/global.cfg' } 10: config/parser.c:287: Opened './main.cfg' { 10: config/config.c:1299: .Global log level=debug. 
10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/crate.cfg' { 10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/crate.cfg' } 10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/gsi_vulom.cfg' { 10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/gsi_vulom.cfg' } 10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' { 10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' } 10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/caen_v560.cfg' { 10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/caen_v560.cfg' } 10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' { 10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' } 10: config/parser.c:299: Closed './main.cfg' } 10: crate/crate.c:348: crate_create { 10: crate/crate.c:674: crate_create(MCAL) } 10: crate/crate.c:900: crate_init(MCAL) { 10: crate/crate.c:924: .Slow-init module[0]=GSI_VULOM. LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC) 10: crate/crate.c:924: .Slow-init module[1]=CAEN_V560. 10: module/map/map.c:224: ...rd(0x33333300+0xfa/16)=963ns wr(0x33333300+0x50/16)=713ns. 10: crate/crate.c:977: .Fast-init module[0]=GSI_VULOM. 10: crate/crate.c:977: .Fast-init module[1]=CAEN_V560. 10: crate/crate.c:1074: crate_init(MCAL) } 10: ctrl/ctrl.c:788: Control server online. Thread has no error buffer yet... 10: f_user.c:559: WR ID=0x200. 10: f_user.c:565: TS offset unset. Will not modify stamp. 
10: f_user.c:572: TPAT: No. 10: f_user.c:573: Sync-check: No. 10: f_user.c:575: Spill triggers: No. 10: f_user.c:576: LMU: No. 10: f_user.c:577: Timer latches: No. 10: f_user.c:578: Spill shape: No. 10: f_user.c:579: Micro-structure: No. 10: f_user.c:581: Multi-event flag: No. 10: f_user.c:586: UDP destination: None. GSI_VULOM: diff_module: 0, module_crate_counter: 0, module_event_counter: 0, module_this_minus_crate: 0 CAEN_V560: diff_module: 0, module_crate_counter: 0, module_event_counter: 0, module_this_minus_crate: 0 ... ________________________________ Von: Hans Toshihide T?rnqvist Gesendet: Donnerstag, 22. Februar 2024 15:26:15 An: Weber, Guenter Dr.; H?kan T Johansson Betreff: Re: AW: [subexp-daq] Report of a possible bug of the CAEN_V560 module Dear G?nter, I have some ideas after your two e-mails, logs and config files are really useful :) I'll shorten the log for my comments. for the tl;dr version, just skip to the bottom code change suggestion. On 2024-02-22 14:21, Weber, Guenter Dr. wrote: > Dear Hans, > > here is the output of the DRASI log from the test system. I am using the > most recent (or almost most recent) versions of the various software > packages from GITLAB. The system currently only has a VULOM and a V560 > module. > > I added some comments to the output. To me it looks, the VULOM has some > problems at the beginning. And, separate from the VULOM issue, the math > of the difference in counters does not work out for the V560 module. > > The VULOM issue I did not notice before. So, maybe by adding a lot of > output lines into crate.c and then removing them I have broken > something. But it also possible that before I simply overlooked these > error message as in the end it looks like the DAQ is working fine. > > Best greetings > > G?nter > > 10: f_user.c:559: WR ID=0x200. > 10: f_user.c:565: TS offset unset. Will not modify stamp. > 10: f_user.c:572: TPAT: No. > 10: f_user.c:573: Sync-check: No. > 10: f_user.c:575: Spill triggers: No. 
> 10: f_user.c:576: LMU: No. > 10: f_user.c:577: Timer latches: No. > 10: f_user.c:578: Spill shape: No. > 10: f_user.c:579: Micro-structure: No. > 10: f_user.c:581: Multi-event flag: No. > 10: f_user.c:586: UDP destination: None. > ***** looks like the VULOM has a problem. I did not notice this before ***** > 5: module/gsi_vulom/gsi_vulom.c:12: .Serial timestamp receiver has had > sync failure, status=0x000a8000. > 5: module/gsi_vulom/gsi_vulom.c:12: .Serial timestamp receiver not > synced, status=0x000a8000. > 5: crate/crate.c:1250: .MCAL[0]=GSI_VULOM: readout_dt failed = 0x00000004. > 5: crate/crate.c:1322: .MCAL[0]=GSI_VULOM: Event counter: > crate=0x00000000/32, this-crate=0x00000000, module=0x00000000/31 > diff=0xdeadbeef, shadow=0x00000000/31 diff=0xdeadbeef. The sync could be bad if the DAQ starts too fast after setting up the timestamp source. Try "sleep 1" after setting up the vulom so the timestamp receiver in the vulom can latch onto its input, even if it's wired internally. > ***** here we see the counter mismatch - I did add a 250 ms delay to > crate.c, before it does readout_dt again to avoid having thousands of > output lines here****** > * > CAEN_V560: diff_module: 4294967295, module_crate_counter: 0, > module_event_counter: 1, module_this_minus_crate: 0 > CAEN_V560: diff_module: 4294967294, module_crate_counter: 0, > module_event_counter: 2, module_this_minus_crate: 0 > CAEN_V560: diff_module: 4294967293, module_crate_counter: 0, > module_event_counter: 3, module_this_minus_crate: 0 > CAEN_V560: diff_module: 4294967292, module_crate_counter: 0, > module_event_counter: 4, module_this_minus_crate: 0 > ***** after four trials of readout_dt of V560, we reach the timeout of 1 > second***** > 5: crate/crate.c:1287: .MCAL[1]=CAEN_V560: readout_dt timeout. > 5: crate/crate.c:1322: .MCAL[1]=CAEN_V560: Event counter: > crate=0x00000000/32, this-crate=0x00000000, module=0x00000004/32 > diff=0xfffffffc, shadow=0x00000000/32 diff=0x00000000. 
This is an artifact of the soft counter in the v560; obviously we don't
expect the module to have more accepted events while polling it, but the
real problem comes a bit later.

> 5: crate/crate.c:1394: .MCAL: readout_dt failed!
> 5: crate/crate.c:1501: .MCAL: had problems, re-initializing.
> 10: crate/crate.c:684: .crate_deinit(MCAL) {
> 10: crate/crate.c:708: .crate_deinit(MCAL) }
> 8: lwroc_triva_state.c:2028: Master TRIVA/MI no progress last second,
> and in deadtime.
> 8: lwroc_triva_state.c:2399: Master: deadtime: 1. Status: 0x10
> (IN_READOUT). EC: 2
> 10: lwroc_triva_state.c:2428: [EB lyserv] EB: Status: 0x0.
> 8: lwroc_triva_state.c:2488: Node(s) busy in readout, waiting...
> 10: crate/crate.c:900: .crate_init(MCAL) {
> 10: crate/crate.c:924: ..Slow-init module[0]=GSI_VULOM.
> LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
> 10: crate/crate.c:924: ..Slow-init module[1]=CAEN_V560.
> 10: module/map/map.c:224: ....rd(0x33333300+0xfa/16)=963ns
> wr(0x33333300+0x50/16)=713ns.
> 10: crate/crate.c:977: ..Fast-init module[0]=GSI_VULOM.
> 10: crate/crate.c:977: ..Fast-init module[1]=CAEN_V560.
> 10: crate/crate.c:1074: .crate_init(MCAL) }
> 5: f_user.c:1257: .had readout error, ret=0x14, trigger=14, prev=0

This is the last log from the readout part; it says trigger 14 was
handled. This trigger is always fired in MBS-like DAQs before starting
the event loop, but no master start is delivered to the modules. No
physical event is associated with it, so we expect no counter increases
and no event data.

In general we read out "everything" for every trigger and rely on modules
reporting their status/content properly. Modules like the v560 ruin this,
since we always check it for all events and the counter always
increments, and it clearly shouldn't for trigger 14. So, in this case it
really did test the incorrect software logic...
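As a rough illustration of the mismatch described above (a sketch only, not the nurdlib code: `SketchCounter`, `sketch_counter_diff` and `sketch_poll` are made-up names, and the assumption that the mask is simply ANDed onto the difference is mine and may not match what COUNTER_DIFF really does), a soft counter that is bumped on every poll while the crate counter stays at 0 gives the wrapped values seen in the log:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an event counter: a value plus a bit mask
 * that selects which bits of the value are significant. */
struct SketchCounter {
	uint32_t value;
	uint32_t mask;
};

/* Masked difference "crate - module" in unsigned 32-bit arithmetic.
 * Assumption: the module mask is ANDed onto the difference, so a mask
 * of 0 always yields 0 and takes the module out of the comparison. */
static uint32_t
sketch_counter_diff(struct SketchCounter a_crate,
    struct SketchCounter a_module)
{
	return (a_crate.value - a_module.value) & a_module.mask;
}

/* Mimics a readout_dt that increments a software counter on every
 * poll, whether or not a physical event happened. Returns the new
 * counter value. */
static uint32_t
sketch_poll(struct SketchCounter *a_module)
{
	assert(NULL != a_module);
	return ++a_module->value;
}
```

With both counters starting at 0 and full 32-bit masks, one call to `sketch_poll` gives a difference of 4294967295 (0xffffffff) and a second call gives 4294967294, as in the quoted output. Under the same assumption, setting the module mask to 0 forces the difference to 0 no matter how often the module is polled.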
I had a look in the v560 manual once more and only now did I realize
that it is not trigger based; the scalers are only available on-the-fly.
The event counter makes no sense then, so I will concede and suggest you
set the module counter mask to 0. Have a look in
module/gsi_vftx2/gsi_vftx2.c line 79, another module without an
event-counter which skips the whole counting stuff:

    vftx2->module.event_counter.mask = 0;

Put something similar in module/caen_v560/caen_v560.c line 41, and feel
free to remove the increment in readout_dt.

Hope the extra info isn't too verbose...

Best regards,
Hans

From f96hajo at chalmers.se Thu Feb 22 17:06:19 2024
From: f96hajo at chalmers.se (Håkan T Johansson)
Date: Thu, 22 Feb 2024 17:06:19 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <3c091e0197404e4b85562bb5a0559906@hi-jena.gsi.de>
References: <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se>
 <0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de>
 <743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se>
 <5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de>
 <93276dae-2037-4c8a-b247-5b065336089d@chalmers.se>
 <53bcb3db-1ea5-43db-9e4e-7a77238dd899@chalmers.se>
 <62267106fb284e50b805b9dba09b8483@hi-jena.gsi.de>,
 <423e14dd-73a9-4fbc-b106-4856cb1e1997@chalmers.se>
 <3c091e0197404e4b85562bb5a0559906@hi-jena.gsi.de>
Message-ID: <21a0486d-cd94-a93a-f677-2bd1e2bf4bb5@chalmers.se>

On Thu, 22 Feb 2024, Weber, Guenter Dr. wrote:

>
> Dear Hans,
>
>
> many thanks! And in particular for all the detailed explanations.
>
>
> For the VULOM "sleep 1" did not do the trick, but "sleep 10" worked. Is
> there any chance to ask the VULOM if it feels ready to do the job, instead
> of using a random waiting time?

Now there is!
Update trloii and recompile trlo_ctrl, --trig-status will at the end show some extra lines: Serial timestamp status:(0x000a8004) words: 4 badbits: 0 CHKsum:0x00 Serial timestamp: Sync: no Bitstr. sync: no, had loss Data ptn: no, had loss Where it should say "Sync: ok" when the receiver has locked. The 'had loss' and bad bits count can be cleared (when locked) by issuing "pulse = SERIAL_TSTAMP_FAIL_CLEAR" > Also I noticed that when aksing the VULOM which firmware it is using, we get > a slightly different reply than the actual firmware number: > > RIO4-MCAL-1 mbsdaq > vulomflash --addr=3 --read > VULOM base address: 0x03000000 > hwmap_mapvme.c:398: LOG: Virtual address for VULOM/TRIDI @ VME 0x03000000 is > 0x3005e000. > Performing command 'read'... > VOLUM+0 => 0x14091f20 > VOLUM+RANGE_REG(0x800000) => 0x0000006a > Released vme ptr. > But the actual firmware number is 1409285e. > > For comparison should one look only at the first four hex numbers? Or is > there more to take into account? Yes, vulomflash --read reads at offset 0, and at that offset is also a TRIVA module mimic, which only uses the low 16 bits however. So the high 16 bits give part of the firmware hash. > For the V560 module, misusing the bitmask for the counter?resolved the > issue. At the end of this mail, I attach the new log. Maybe you find > something notable, but to me it looks fine now. > > > Our next steps would be as follows: > > > 1) Wait for you to implement the bugfixes of the last days into NURDLIB. > > 2) Setting up the test system with the most recent version of NURDLIB and > checking, if our minimal system with VULOM and V560 is now running smoothly. > > 3) Hammering the V767 TDC into NURDLIB. > > 4) Once we have achieved this, we would go back to testing the SIS3316 > modules. > > > > Best greetings > > G?nter Cheers, H?kan > > > > > 10: lwroc_hostname_util.c:108: Host 'lyserv' known as 192.168.1.1 (port: > 56583). > Thread has no error buffer yet... 
> CPUS: 1
> delay: 1
> 10: lwroc_hostname_util.c:108: Host 'lyserv' known as 192.168.1.1 (port: 56583).
> Thread has no error buffer yet...
> HOST: RIO4-MCAL-1
> Token: d6d68d7b (d6d68d7b:d6d68d7b) [/mbsusr/mbsdaq/.drasi_tokens/mcal]
> 10: lwroc_hostname_util.c:457: Own address: 192.168.1.71/255.255.255.0 (eth1).
> 10: lwroc_data_pipe.c:145: Data buffer READOUT_PIPE, size 419430400 = 0x19000000, 1 consumers.
> 10: lwroc_triva_readout.c:66: Silence TRIVA (HALT)
> 10: lwroc_net_io.c:167: Started server on port 56583 (data port 39534).
> client union size: 244 208 188 508 640 204 204 => 640
> 10: lwroc_udp_awaken_hints.c:159: UDP awaken hints file: /tmp/drasi.u1001/drasi.hints.u1001.RIO4-MCAL-1:56583
> 10: lwroc_main.c:706: Log message rate limit not in effect.
> 10: lwroc_readout.c:112: call readout_init...
> 10: lwroc_thread_util.c:117: This is the triva control thread!
> 10: lwroc_thread_util.c:117: This is the net io thread!
> 10: lwroc_thread_util.c:117: This is the slow_async thread!
> 10: lwroc_thread_util.c:117: This is the data server thread!
> 8: lwroc_message_wait.c:86: Waited 1 seconds for msg client.
> 8: lwroc_triva_state.c:414: Waited 1 seconds for initial slave and EB connection(s):
> 8: lwroc_triva_state.c:422: [EB lyserv] (state 0)
> 10: lwroc_message_internal.c:472: Message client connected!
> 10: lwroc_net_trans.c:1156: [drasi] Transport client connected (data) [192.168.1.1].
> 10: lwroc_triva_control.c:370: Setup TRIVA (DISBUS, HALT, MASTER, RESET)
> 10: lwroc_triva_control.c:418: Minimum event time ctime(5000)+1*rd(686)+3*wr(634)+fctime(1000)=8588 ns (116.442 kHz)
> 10: lwroc_triva_state.c:1486: (Re)send ident messages...
> 10: lwroc_triva_control.c:495: START TEST ACQ: HALT, CLEAR=RESET, MT=1
> 9: lwroc_triva_control.c:507: TEST: GO
> 10: lwroc_triva_control.c:725: RUN: RESET
> 10: lwroc_triva_control.c:729: RUN: MT=14
> 9: lwroc_triva_control.c:737: GO (1 good test triggers done) (max 116.4 kHz)
> 10: lwroc_triva_readout.c:376: Trigger 14 seen.
> 10: config/config.c:181: Will try default cfg path='/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default', can be set with NURDLIB_DEF_PATH.
> 8: lwroc_triva_state.c:2399: Master: deadtime: 1. Status: 0x10 (IN_READOUT). EC: 1
> 10: lwroc_triva_state.c:2428: [EB lyserv] EB: Status: 0x0.
> 8: lwroc_triva_state.c:2488: Node(s) busy in readout, waiting...
> 10: config/parser.c:287: Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/global.cfg' {
> 10: config/parser.c:299: Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/global.cfg' }
> 10: config/parser.c:287: Opened './main.cfg' {
> 10: config/config.c:1299: .Global log level=debug.
> 10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/crate.cfg' {
> 10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/crate.cfg' }
> 10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/gsi_vulom.cfg' {
> 10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/gsi_vulom.cfg' }
> 10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' {
> 10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' }
> 10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/caen_v560.cfg' {
> 10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/caen_v560.cfg' }
> 10: config/parser.c:287: .Opened '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' {
> 10: config/parser.c:299: .Closed '/LynxOS/mbsusr/mbsdaq/mbsrun/rio4/2024_mcalstruck/nurdlib/cfg/default/module_log_level.cfg' }
> 10: config/parser.c:299: Closed './main.cfg' }
> 10: crate/crate.c:348: crate_create {
> 10: crate/crate.c:674: crate_create(MCAL) }
> 10: crate/crate.c:900: crate_init(MCAL) {
> 10: crate/crate.c:924: .Slow-init module[0]=GSI_VULOM.
> LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
> 10: crate/crate.c:924: .Slow-init module[1]=CAEN_V560.
> 10: module/map/map.c:224: ...rd(0x33333300+0xfa/16)=963ns wr(0x33333300+0x50/16)=713ns.
> 10: crate/crate.c:977: .Fast-init module[0]=GSI_VULOM.
> 10: crate/crate.c:977: .Fast-init module[1]=CAEN_V560.
> 10: crate/crate.c:1074: crate_init(MCAL) }
> 10: ctrl/ctrl.c:788: Control server online.
> Thread has no error buffer yet...
> 10: f_user.c:559: WR ID=0x200.
> 10: f_user.c:565: TS offset unset. Will not modify stamp.
> 10: f_user.c:572: TPAT: No.
> 10: f_user.c:573: Sync-check: No.
> 10: f_user.c:575: Spill triggers: No.
> 10: f_user.c:576: LMU: No.
> 10: f_user.c:577: Timer latches: No.
> 10: f_user.c:578: Spill shape: No.
> 10: f_user.c:579: Micro-structure: No.
> 10: f_user.c:581: Multi-event flag: No.
> 10: f_user.c:586: UDP destination: None.
> GSI_VULOM: diff_module: 0, module_crate_counter: 0, module_event_counter: 0, module_this_minus_crate: 0
> CAEN_V560: diff_module: 0, module_crate_counter: 0, module_event_counter: 0, module_this_minus_crate: 0
> ...
>
> ____________________________________________________________________________
> From: Hans Toshihide Törnqvist
> Sent: Thursday, 22 February 2024 15:26:15
> To: Weber, Guenter Dr.; Håkan T Johansson
> Subject: Re: AW: [subexp-daq] Report of a possible bug of the CAEN_V560 module
>
> Dear Günter,
>
> I have some ideas after your two e-mails; logs and config files are
> really useful :) I'll shorten the log for my comments. For the tl;dr
> version, just skip to the bottom code change suggestion.
>
> On 2024-02-22 14:21, Weber, Guenter Dr. wrote:
> > Dear Hans,
> >
> > here is the output of the DRASI log from the test system. I am using the
> > most recent (or almost most recent) versions of the various software
> > packages from GITLAB. The system currently only has a VULOM and a V560
> > module.
> >
> > I added some comments to the output. To me it looks like the VULOM has some
> > problems at the beginning. And, separate from the VULOM issue, the math
> > of the difference in counters does not work out for the V560 module.
> >
> > The VULOM issue I did not notice before. So, maybe by adding a lot of
> > output lines into crate.c and then removing them I have broken
> > something. But it is also possible that before I simply overlooked these
> > error messages, as in the end it looks like the DAQ is working fine.
> >
> > Best greetings
> >
> > Günter
> >
> > 10: f_user.c:559: WR ID=0x200.
> > 10: f_user.c:565: TS offset unset. Will not modify stamp.
> > 10: f_user.c:572: TPAT: No.
> > 10: f_user.c:573: Sync-check: No.
> > 10: f_user.c:575: Spill triggers: No.
> > 10: f_user.c:576: LMU: No.
> > 10: f_user.c:577: Timer latches: No.
> > 10: f_user.c:578: Spill shape: No.
> > 10: f_user.c:579: Micro-structure: No.
> > 10: f_user.c:581: Multi-event flag: No.
> > 10: f_user.c:586: UDP destination: None.
> > ***** looks like the VULOM has a problem. I did not notice this before *****
> > 5: module/gsi_vulom/gsi_vulom.c:12: .Serial timestamp receiver has had sync failure, status=0x000a8000.
> > 5: module/gsi_vulom/gsi_vulom.c:12: .Serial timestamp receiver not synced, status=0x000a8000.
> > 5: crate/crate.c:1250: .MCAL[0]=GSI_VULOM: readout_dt failed = 0x00000004.
> > 5: crate/crate.c:1322: .MCAL[0]=GSI_VULOM: Event counter: crate=0x00000000/32, this-crate=0x00000000, module=0x00000000/31 diff=0xdeadbeef, shadow=0x00000000/31 diff=0xdeadbeef.
>
> The sync could be bad if the DAQ starts too fast after setting up the
> timestamp source. Try "sleep 1" after setting up the vulom so the
> timestamp receiver in the vulom can latch onto its input, even if it's
> wired internally.
>
> > ***** here we see the counter mismatch - I did add a 250 ms delay to
> > crate.c, before it does readout_dt again to avoid having thousands of
> > output lines here *****
> > CAEN_V560: diff_module: 4294967295, module_crate_counter: 0, module_event_counter: 1, module_this_minus_crate: 0
> > CAEN_V560: diff_module: 4294967294, module_crate_counter: 0, module_event_counter: 2, module_this_minus_crate: 0
> > CAEN_V560: diff_module: 4294967293, module_crate_counter: 0, module_event_counter: 3, module_this_minus_crate: 0
> > CAEN_V560: diff_module: 4294967292, module_crate_counter: 0, module_event_counter: 4, module_this_minus_crate: 0
> > ***** after four trials of readout_dt of V560, we reach the timeout of 1 second *****
> > 5: crate/crate.c:1287: .MCAL[1]=CAEN_V560: readout_dt timeout.
> > 5: crate/crate.c:1322: .MCAL[1]=CAEN_V560: Event counter: crate=0x00000000/32, this-crate=0x00000000, module=0x00000004/32 diff=0xfffffffc, shadow=0x00000000/32 diff=0x00000000.
>
> This is an artifact of the soft counter in the v560; obviously we don't
> expect the module to have more accepted events while polling it, but the
> real problem comes a bit later.
>
> > 5: crate/crate.c:1394: .MCAL: readout_dt failed!
> > 5: crate/crate.c:1501: .MCAL: had problems, re-initializing.
> > 10: crate/crate.c:684: .crate_deinit(MCAL) {
> > 10: crate/crate.c:708: .crate_deinit(MCAL) }
> > 8: lwroc_triva_state.c:2028: Master TRIVA/MI no progress last second, and in deadtime.
> > 8: lwroc_triva_state.c:2399: Master: deadtime: 1. Status: 0x10 (IN_READOUT). EC: 2
> > 10: lwroc_triva_state.c:2428: [EB lyserv] EB: Status: 0x0.
> > 8: lwroc_triva_state.c:2488: Node(s) busy in readout, waiting...
> > 10: crate/crate.c:900: .crate_init(MCAL) {
> > 10: crate/crate.c:924: ..Slow-init module[0]=GSI_VULOM.
> > LOG: TRLO: MD5SUM: 0x1409285e (CT: 63bb1d44 = 2023-01-08 19:45:08 UTC)
> > 10: crate/crate.c:924: ..Slow-init module[1]=CAEN_V560.
> > 10: module/map/map.c:224: ....rd(0x33333300+0xfa/16)=963ns wr(0x33333300+0x50/16)=713ns.
> > 10: crate/crate.c:977: ..Fast-init module[0]=GSI_VULOM.
> > 10: crate/crate.c:977: ..Fast-init module[1]=CAEN_V560.
> > 10: crate/crate.c:1074: .crate_init(MCAL) }
> > 5: f_user.c:1257: .had readout error, ret=0x14, trigger=14, prev=0
>
> This is the last log from the readout part, it says trigger 14 was
> handled. This trigger is always fired in MBS-like DAQs before starting
> the event loop, but no master start is delivered to the modules. No
> physical event is associated with it, so we expect no counter increases
> and no event data.
>
> In general we read out "everything" for every trigger and rely on
> modules reporting their status/content properly. Modules like the v560
> ruin this since we always check it for all events and the counter always
> increments, and it clearly shouldn't for trigger 14. So, in this case it
> really did test the incorrect software logic...
>
> I had a look in the v560 manual once more and only now did I realize
> that it is not trigger based, the scalers are only available on-the-fly.
> The event counter makes no sense then, so I will concede and suggest you
> set the module counter mask to 0.
>
> Have a look in module/gsi_vftx2/gsi_vftx2.c line 79, another module
> without an event-counter which skips the whole counting stuff:
>
>     vftx2->module.event_counter.mask = 0;
>
> Put something similar in module/caen_v560/caen_v560.c line 41, and feel
> free to remove the increment in readout_dt.
>
> Hope the extra info isn't too verbose...
>
> Best regards,
> Hans
>

From hans.tornqvist at chalmers.se Thu Feb 22 18:04:07 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Thu, 22 Feb 2024 18:04:07 +0100
Subject: [subexp-daq] Report of a possible bug of the CAEN_V560 module
In-Reply-To: <3c091e0197404e4b85562bb5a0559906@hi-jena.gsi.de>
References: <36c79ac1-dcce-052e-ce02-1783a3fb383e@chalmers.se> <0ff5bfa61f5544fa93742c4ff2da59a3@hi-jena.gsi.de> <743E497D-9BBD-418C-A40C-1EA2FDBBFCCA@chalmers.se> <5cdf2fcf92bf432bad243eca8e917b4a@hi-jena.gsi.de> <93276dae-2037-4c8a-b247-5b065336089d@chalmers.se> <53bcb3db-1ea5-43db-9e4e-7a77238dd899@chalmers.se> <62267106fb284e50b805b9dba09b8483@hi-jena.gsi.de> <423e14dd-73a9-4fbc-b106-4856cb1e1997@chalmers.se> <3c091e0197404e4b85562bb5a0559906@hi-jena.gsi.de>
Message-ID:

Dear Günter,

The fixes are up, I hope I didn't forget any. Try it, and in case of issues please do a "git diff" so we can see all the things that changed.

Thanks for helping out!

Cheers,
Hans

On 2024-02-22 16:09, Weber, Guenter Dr. wrote:
> Our next steps would be as follows:
>
> 1) Wait for you to implement the bugfixes of the last days into NURDLIB.
>
> 2) Setting up the test system with the most recent version of NURDLIB
> and checking, if our minimal system with VULOM and V560 is now running
> smoothly.
>
> 3) Hammering the V767 TDC into NURDLIB.
>
> 4) Once we have achieved this, we would go back to testing the SIS3316
> modules.

From f96hajo at chalmers.se Mon Feb 26 10:05:21 2024
From: f96hajo at chalmers.se (Håkan T Johansson)
Date: Mon, 26 Feb 2024 10:05:21 +0100
Subject: [subexp-daq] drasi option --log-ack-wait
Message-ID: <27b8f559-3a3e-ab54-db7f-4901bc1fd998@chalmers.se>

Hi!
while I hope this option will very rarely be needed, drasi now has an option --log-ack-wait which will make it wait for an acknowledgement from the log client before proceeding after each log message.

This is intended to help debugging hardware lockups, by sprinkling the code with log messages before and after each suspicious point. (Or perhaps first just enable verbose logging.)

Cheers,
Håkan

From g.weber at hi-jena.gsi.de Thu Feb 29 17:00:40 2024
From: g.weber at hi-jena.gsi.de (Weber, Guenter Dr.)
Date: Thu, 29 Feb 2024 16:00:40 +0000
Subject: [subexp-daq] NURDLIB: Init fast vs Init slow for modules
Message-ID: <8a32f9e3be684e1892036bd445c14181@hi-jena.gsi.de>

Dear friends,

is there a clear rule what should happen in the two init routines? In which cases is INIT SLOW executed, and how is that different from INIT FAST?

Thanks a lot!

Best greetings

Günter

From hans.tornqvist at chalmers.se Thu Feb 29 18:04:19 2024
From: hans.tornqvist at chalmers.se (Hans Toshihide Törnqvist)
Date: Thu, 29 Feb 2024 18:04:19 +0100
Subject: [subexp-daq] NURDLIB: Init fast vs Init slow for modules
In-Reply-To: <8a32f9e3be684e1892036bd445c14181@hi-jena.gsi.de>
References: <8a32f9e3be684e1892036bd445c14181@hi-jena.gsi.de>
Message-ID: <9e7ff158-ca44-40f4-9326-2186f3a4d909@chalmers.se>

Dear Günter,

Here is a hopefully quick summary of the not so long nurdlib history (tl;dr at the bottom):

The idea of two init functions came from the v1290, which has a controller for many settings that is really slow to talk to. 'init_slow' would do the slow configs which one does not want to sit through too often, and 'init_fast' the faster ones.

The crate calls 'init_slow' for all modules, does some checks, calls 'init_fast' for all modules, and for certain modules also calls an optional 'postinit'.

Later came the idea to do online configuration while a DAQ is running. Rather than splitting the init functions (or my mind) even further, 'init_slow' was taken as the non-online part (e.g. mapping) and 'init_fast' as the online part (e.g. writing thresholds). Some controller drivers have been buggy, so reducing re-maps has been important.

Obviously, it turned out that some, or most, slow writes on the v1290 were useful to do online, so lots of things in 'init_slow' moved over to 'init_fast'. Voila, the names don't really make sense any longer...

The online feature has higher priority than the re-initialisation nowadays, since the latter should be rare in a properly working setup.

Eventually there should be a refactoring which is great since it changes so much at once. I even have another one ready to go into 'master', but I didn't dare to push that onto others yet.

Now for the useful tl;dr part :)

Put mapping and things that should not be changed online in init_slow, and everything else in init_fast. Everything that comes from 'config_get_*' could be changed online, I think.

Cheers,
Hans

On 2024-02-29 17:00, Weber, Guenter Dr. wrote:
> Dear friends,
>
> is there a clear rule what should happen in the two init routines? In
> which cases INIT SLOW is executed and how is that different from INIT FAST?
>
> Thanks a lot!
>
> Best greetings
>
> Günter
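
The tl;dr rule above (mapping and fixed things in init_slow, online-changeable settings in init_fast) could be sketched roughly as follows. This is only an illustration and not nurdlib code: struct DemoModule, demo_init_slow and demo_init_fast are made-up names, the "mapping" is just a flag, and the threshold stands in for any value that would come from a 'config_get_*' call.

```c
#include <stdint.h>

/* Hypothetical module state, NOT nurdlib's real module struct. */
struct DemoModule {
	uint32_t address;   /* VME base address, fixed during a run. */
	int is_mapped;      /* Non-zero once the pretend mapping exists. */
	unsigned map_count; /* How often the module has been mapped. */
	uint16_t threshold; /* Setting that may be changed online. */
};

/*
 * Non-online part: mapping and anything that must not change while
 * the DAQ is running. Meant to be called rarely.
 */
void
demo_init_slow(struct DemoModule *a_module, uint32_t a_address)
{
	a_module->address = a_address;
	a_module->is_mapped = 1;
	++a_module->map_count;
}

/*
 * Online part: settings that come from the config and may be
 * rewritten while running. Returns 0 if the module is not mapped
 * yet, 1 if the setting was applied.
 */
int
demo_init_fast(struct DemoModule *a_module, uint16_t a_threshold)
{
	if (!a_module->is_mapped) {
		return 0;
	}
	a_module->threshold = a_threshold;
	return 1;
}
```

With this split, an online config change would call only demo_init_fast again, so the mapping is left alone and map_count stays at 1.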