Xilinx MPSoC PS DDR Performance Monitor
If you are using Xilinx MPSoC devices, you should know about the PS DDR AXI Performance Monitor. It is not a new block, but I have found it very useful when working with the PS DDR.
Xilinx has had the AXI Performance Monitor (APM) IP since the early 7 series days in 2012 and you can read all about the IP in this link.
This IP allows the user to monitor different metrics on the AXI bus it’s connected to in order to understand the throughput and latency on the bus.
MPSoC DDR APM
In the MPSoC devices, Xilinx added a hard IP instance of this block in the PS DDR controller (marked in red) connected to all six DDR inputs (marked in green): 1 from PS LPD, 2 from PS HPD and 3 from PL AXI interfaces HP0 to HP3 (important to note that HP1 and HP2 share a single DDR interface).
Using this IP, you can gather the following information on the AXI traffic:
This information is very useful when debugging issues on the PS DDR, to which we have no direct connection and so cannot use ChipScope.
The DDR APM has 10 metric slots, meaning it can count a total of 10 metrics across all 5 AXI slots. The counters can be divided freely: for example, 10 different metrics on a single slot, or 2 different metrics on each of the 5 AXI slots.
Each metric slot also has the following features:
- A metric accumulator which counts the total number of events during a period of time.
- A metric incrementer which can be configured to increment the counter on a specific range of values set between Low and High.
- An AXI ID filter which can be used to count events from a specific AXI ID source.
What is it good for?
A few usage examples:
- Estimating the CPU DDR bandwidth usage: This figure is usually estimated at the design stage but hardly ever verified. Using the APM, you can simply count how many bytes were written and read by the CPU during any period of time. If you like, you can even use the AXI ID filter to count events from a specific A53/R5 core.
- Analysis of AXI transaction efficiency: Using the APM you can record both the number of bytes written/read and the corresponding AXI transaction count, from which you can calculate the average AXI transaction size. An AXI master issuing many small transactions to the DDR can severely degrade the overall DDR performance, and this can be detected using the APM.
- Video data latency: If you are using the PS DDR to read real-time video data from the PL, you might, for example, have a tight latency constraint on the time from when the AXI read command is issued until you expect to get the first pixel from the DDR. Using the APM you can monitor the minimum and maximum AXI read latency in real time. This information is useful for detecting a problem if something isn’t working as expected, or just for checking how much latency margin you have when everything is working correctly.
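The transaction efficiency calculation above is simple integer arithmetic. Here is a minimal sketch in shell, assuming you have already read a byte count and a transaction count for the same slot and the same interval. The function name and the sample values are made up for illustration.

```shell
#!/bin/sh
# Average AXI transaction size = bytes transferred / number of transactions.
# Both arguments may be decimal or 0x-prefixed hex (as devmem prints them).
avg_txn_bytes() {
    bytes=$(($1))
    txns=$(($2))
    if [ "$txns" -eq 0 ]; then
        echo 0                      # no traffic in this interval
    else
        echo $((bytes / txns))
    fi
}

# Hypothetical readings: 1 MiB moved in 16384 transactions
avg_txn_bytes 0x00100000 0x00004000   # prints 64
```

An average far below what the master's burst settings should produce is a hint that it is issuing many small transactions.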
How to use it
I will first point out the official ways to use this IP:
- Linux:
If you are using Linux, you can use this link for APM driver instructions and this link for an APM driver usage code sample. This requires adding the driver before building the kernel, and I find it less useful for existing projects unless you intend to include the APM monitors in your design.
- Bare-Metal:
If you are using a Bare-Metal design, use this link for APM driver instructions and this link for code examples.
Personally, I use the DDR APM for debugging and verifying existing projects, and therefore I usually opt for direct register access without any driver.
There are 4 APM blocks in the MPSoC. This article focuses on the DDR APM and using the MPSoC Register Reference (UG1087), available at this link, you can find that the base address for this APM is 0xFD0B0000.
The register offsets are available both in the IP documentation at this link, in the Register Space chapter (page 22), and in the MPSoC Register Reference mentioned above.
I personally like to use the MPSoC Register Reference, as it is more interactive and gives an easier view of each bit’s description.
Below is a simple example used to periodically measure the number of bytes written to the PS DDR from the PL on PL AXI HP0 every second.
Configuration steps using Linux Shell:
1.
Reset the APM using Control Register (CR) — 0x0300
bit 1 — Resets all metric counters and sampled metric counters in the monitor
bit 17 — Resets the free-running Global Clock Counter
devmem 0xFD0B0300 32 0x00020002
2.
Set up the wanted metrics and the slots to monitor using the Metric Selector Registers (MSR_0 to MSR_2) at offsets 0x0044, 0x0048 and 0x004C
Each counter is configured with 8 bits: the metric selector (5 bits) and the slot select (3 bits), as shown below for the MSR_0 register.
For example, to make metric counter 0 count “write byte count” on slot 3 (PL AXI HP0), you would write 0x62 (0x2 in the lower 5 bits and 0x3 in the upper 3 bits) to the lowest 8 bits of the register at offset 0x0044.
devmem 0xFD0B0044 32 0x00000062
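The 0x62 value can also be computed rather than worked out by hand. The sketch below uses hypothetical helper names, and it assumes that counters 0 to 3 occupy the four bytes of MSR_0 starting from the lowest byte, counters 4 to 7 those of MSR_1, and counters 8 and 9 the low bytes of MSR_2. Check that layout against the register reference before relying on it.

```shell
#!/bin/sh
# 8-bit selector for one metric counter:
# slot select in bits [7:5], metric select in bits [4:0].
msr_field() {
    # $1 = slot number (0-7), $2 = metric number (0-31)
    printf '0x%02X\n' $(( (($1 & 0x7) << 5) | ($2 & 0x1F) ))
}

# Assumed layout: one byte per counter, four counters per register.
msr_offset() {
    # $1 = metric counter number (0-9)
    printf '0x%04X\n' $(( 0x44 + 4 * ($1 / 4) ))
}
msr_shift() {
    # bit position of this counter's byte inside its MSR register
    echo $(( 8 * ($1 % 4) ))
}

msr_field 3 2    # slot 3 (PL AXI HP0), metric 2 (write byte count): 0x62
msr_offset 0     # counter 0 lives in MSR_0: 0x0044
```

Keep in mind that devmem writes the whole 32-bit register, so when several counters share one MSR register their shifted fields have to be OR-ed together into a single write.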
Some people found this step confusing so it is explained in-depth below…
Counter 0 can be found on MSR_0 as shown in the image below:
PL AXI HP0 is on slot number 3 (numbering starts at 0 on the left side) as shown in the image below:
Metric “write byte count” has a value of 2 as shown in the image below:
3.
Setup the sample interval using the Sample Interval Register (SIR) — 0x0024
This is the number of clock cycles to count before latching the Metric Counters into the Sampled Metric Counters. This allows you to count metrics over a specified amount of time. For example, in my design the DDR reference clock is 599,994,019 Hz, so to measure events occurring in 1-second intervals I write 599,994,019 clocks, or 0x23C32EA3, to the SIR register.
devmem 0xFD0B0024 32 0x23C32EA3
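If your clock or the interval you want is different, the SIR value is just the clock frequency multiplied by the interval. This is a small sketch with a hypothetical helper name. It assumes the register is 32 bits wide, as the write above suggests, and rejects intervals that would not fit.

```shell
#!/bin/sh
# SIR value = clock frequency (Hz) * interval (ms) / 1000
sir_value() {
    # $1 = clock frequency in Hz, $2 = interval in milliseconds
    cycles=$(( $1 * $2 / 1000 ))
    if [ "$cycles" -gt 4294967295 ]; then
        echo "interval does not fit in 32 bits" >&2
        return 1
    fi
    printf '0x%08X\n' "$cycles"
}

sir_value 599994019 1000    # 1 second at 599,994,019 Hz: 0x23C32EA3
sir_value 599994019 100     # 100 ms at the same clock
```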
4.
Load the sample interval value to the counter using the Sample Interval Control Register (SICR) — 0x0028
bit 1 — Loads the Sample Interval register value into the Sample Interval Counter.
devmem 0xFD0B0028 32 0x00000002
5.
Enable the sample interval counter using the Sample Interval Control Register (SICR), offset 0x0028
bit 0 — Enables the down counter
bit 8: When set to ‘1’, resets the metric counters when the sample interval timer expires or when the sample register is read
I set bit 8 to ‘1’ because I want a “clean” count from 0 every 1 second.
devmem 0xFD0B0028 32 0x00000101
6.
Enable the metric counters using the Control Register (CR), offset 0x0300
bit 0: Enables all metric counters in the monitor
bit 16: Enables the free-running Global Clock Counter
devmem 0xFD0B0300 32 0x00010001
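The six steps can be collected into one small script. This is only a sketch: the function names and the DRY_RUN switch are my own additions, and the SIR value is the one for my clock, so adjust it for yours. By default the script only prints the devmem commands, which lets you check the addresses before touching the hardware.

```shell
#!/bin/sh
APM_BASE=0xFD0B0000          # DDR APM base address
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the commands (default)

apm_write() {
    # $1 = register offset, $2 = 32-bit value
    addr=$(printf '0x%08X' $(( $APM_BASE + $1 )))
    if [ "$DRY_RUN" = "1" ]; then
        echo "devmem $addr 32 $2"
    else
        devmem "$addr" 32 "$2"
    fi
}

apm_setup_hp0_write_bytes() {
    apm_write 0x0300 0x00020002   # CR: reset metric counters and global clock counter
    apm_write 0x0044 0x00000062   # MSR_0: counter 0 = write byte count on slot 3
    apm_write 0x0024 0x23C32EA3   # SIR: 599,994,019 cycles (1 s in my design)
    apm_write 0x0028 0x00000002   # SICR: load the interval into the counter
    apm_write 0x0028 0x00000101   # SICR: enable down counter, reset metrics on expiry
    apm_write 0x0300 0x00010001   # CR: enable metric counters and global clock counter
}

apm_setup_hp0_write_bytes
```

Run it with DRY_RUN=0 on the target to actually perform the writes.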
From this point, the counter is running and measuring the requested Write Byte Count on slot 3. The count is latched every 1 second and can be read from the Sampled Metric Counter Register (SMCR_0) at offset 0x0200.
If desired, this can be read periodically using:
watch -n 1 "devmem 0xFD0B0200"
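The raw value is a byte count in hex, which is not very readable. A minimal sketch for turning it into MiB/s follows. It assumes the BusyBox flavour of devmem, which prints a 0x-prefixed hex value, and the helper name and the sample reading are hypothetical.

```shell
#!/bin/sh
# Convert a sampled byte count into MiB/s (integer division, rounds down).
hex_to_mib_per_s() {
    # $1 = byte count as printed by devmem (e.g. 0x0BEBC200)
    # $2 = sample interval in seconds
    echo $(( $1 / $2 / 1048576 ))
}

# Hypothetical reading: 200,000,000 bytes in a 1 second interval
hex_to_mib_per_s 0x0BEBC200 1    # prints 190

# On the target, something like this polls once per second:
# while true; do hex_to_mib_per_s "$(devmem 0xFD0B0200)" 1; sleep 1; done
```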
By repeating the steps described above, you can configure the remaining 9 metric counters for other metrics on the same slot or on other slots.
Notes:
- All metric counters use the same sample interval counter
- If you are measuring latency, note that the start and end measurement points of the metric counters can be set up using bits 4–7 of the CR register (0x0300).
For example, if you want to measure read latency, you might want to change the measurement to end on the first word read as opposed to the default last word read.
- You can create a histogram-type measurement by assigning all 10 counters to the same slot and the same metric but setting a different range value per counter using the Range Registers (RR_0 to RR_9). There is a nice example of this in the IP user guide, in the Programming Sequence chapter, titled “Write and Read Latency Distribution of Monitor Slot for Five Ranges”.
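As an illustration of the latency note, the CR value from step 6 can be combined with one of the latency point bits. The bit position used below (bit 7 selecting "end on first read") is from my recollection of the IP register description and is an assumption here, as is the helper name. Verify it in the register reference before using it.

```shell
#!/bin/sh
CR_ENABLE=0x00010001                 # bits 0 and 16, as written in step 6
CR_RD_LAT_END_FIRST=$((1 << 7))      # ASSUMED: 1 = read latency ends on first read

# OR any number of flag values into a base register value.
cr_with_flags() {
    value=$(($1))
    shift
    for flag in "$@"; do
        value=$(( value | flag ))
    done
    printf '0x%08X\n' "$value"
}

cr_with_flags "$CR_ENABLE" "$CR_RD_LAT_END_FIRST"   # prints 0x00010081
```

The resulting value would replace 0x00010001 in the final CR write.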