This manual page describes events specific to AMD Family 1ah Zen5
processors. For more information, please consult the appropriate AMD BIOS
and Kernel Developer's guide or Open-Source Register Reference.
Each of the events listed below includes the AMD mnemonic which
matches the name found in the AMD manual and a brief summary of the event.
If available, a more detailed description of the event follows and then any
additional unit values that modify the event. Each unit can be combined to
create a new event in the system by placing the '.' character between the
event name and the unit name.
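As an illustration of the naming rule above, the following Python sketch joins an event name and a unit name with the '.' character. The helper name qualified_event is hypothetical and not part of any library; the event and unit names come from the Retired_x87_FP_Ops entry in this page.

```python
def qualified_event(event: str, unit: str) -> str:
    """Join an event name and a unit name with '.' (hypothetical helper)."""
    if not event or not unit:
        raise ValueError("both an event name and a unit name are required")
    return f"{event}.{unit}"


# Units listed for Retired_x87_FP_Ops in this page.
X87_UNITS = ["DivSqrROps", "MulOps", "AddSubOps"]

names = [qualified_event("Retired_x87_FP_Ops", unit) for unit in X87_UNITS]
for name in names:
    print(name)
```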
- Retired_x87_FP_Ops
- Core::X86::Pmc::Core::Retired_x87_FP_Ops
- FP retired x87 uops
Number of retired x87 arithmetic operations. Can be used to
calculate x87 FLOPs.
This event has the following units which may be used to modify
the behavior of the event:
- DivSqrROps
- x87 Divide or square root uops.
- MulOps
- x87 Multiply uops.
- AddSubOps
- x87 Add/subtract uops.
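A minimal sketch of the x87 FLOPs calculation mentioned above. It assumes the three unit counts were collected over the same measurement interval and that each counted op stands for one floating-point operation. The function name and the sample numbers are illustrative only.

```python
def x87_flops(div_sqrt_ops: int, mul_ops: int, add_sub_ops: int,
              seconds: float) -> float:
    """Return x87 floating-point operations per second.

    Assumption: every counted op is exactly one floating-point operation.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (div_sqrt_ops + mul_ops + add_sub_ops) / seconds


# Made-up counts for DivSqrROps, MulOps and AddSubOps over a 2 s interval.
print(x87_flops(100, 300, 600, seconds=2.0))
```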
- Retired_SSE_AVX_FLOPs
- Core::X86::Pmc::Core::Retired_SSE_AVX_FLOPs
- FP retired SSE and AVX FLOPs
Number of SSE and AVX floating point arithmetic operations
retired. Number of arithmetic operations retired is dependent on number
of uops retired, data size (scalar/128/256/512), data type
(BF16/FP16/FP32/FP64) and type of operation (add/sub/mul/mac/...). Use
MergeEvent feature for accurate results.
- Retired_FP_uOps
- Core::X86::Pmc::Core::Retired_FP_uOps
- FP uops retired by size
Report number of FP uops retired by size. Can be used to
determine how vectorized code is and how much MMX / x87 content is in
the code.
This event has the following units which may be used to modify
the behavior of the event:
- Pack512uOpsRetired
- Packed 512-bit uops retired.
- Pack256uOpsRetired
- Packed 256-bit uops retired.
- Pack128uOpsRetired
- Packed 128-bit uops retired.
- ScalaruOpsRetired
- Scalar uops retired.
- MMXuOpsRetired
- MMX uops retired.
- x87uOpsRetired
- x87 uops retired.
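One possible way to turn the per-size counts above into a vectorization figure is to divide the packed uop counts by the total. This is a sketch with made-up counts; the function name packed_fraction is hypothetical, and treating the sum of all six units as the total is an assumption.

```python
PACKED_UNITS = ("Pack512uOpsRetired", "Pack256uOpsRetired", "Pack128uOpsRetired")


def packed_fraction(counts: dict) -> float:
    """Return the share of retired FP uops that were packed (vector) uops."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    packed = sum(counts.get(unit, 0) for unit in PACKED_UNITS)
    return packed / total


# Made-up counts, one per unit of Retired_FP_uOps.
sample = {
    "Pack512uOpsRetired": 40,
    "Pack256uOpsRetired": 20,
    "Pack128uOpsRetired": 15,
    "ScalaruOpsRetired": 20,
    "MMXuOpsRetired": 0,
    "x87uOpsRetired": 5,
}
print(f"packed share: {packed_fraction(sample):.2f}")
```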
- FP_Ops_Retired
- Core::X86::Pmc::Core::FP_Ops_Retired
- FP uops retired sorted by vector or scalar
Number of FP uops retired of selected type sorted by vector
(AVX/SSE packed) or scalar (x87, AVX/SSE scalar). Can be used to profile
FP codes.
- INT_Ops_Retired
- Core::X86::Pmc::Core::INT_Ops_Retired
- FP executed integer type uops sorted by vector or scalar
Number of integer uops executed in the FP unit and retired, of the
selected type, sorted by vector (SSE/AVX) or scalar (MMX). Can be used to
profile vector INT / MMX codes.
- Packed_FP_Ops_Retired
- Core::X86::Pmc::Core::Packed_FP_Ops_Retired
- FP uops retired sorted by packed 128 or packed 256
Number of FP uops retired of selected type sorted by 128-bit
packed dest (XMM) or 256-bit packed dest (YMM). Can be used to profile
FP codes.
- Packed_INT_Ops_Retired
- Core::X86::Pmc::Core::Packed_INT_Ops_Retired
- FP executed packed integer uops sorted by packed 128 or packed
256
Number of integer uops executed in the FP unit and retired, of the
selected type, sorted by 128-bit packed dest (XMM) or 256-bit packed dest
(YMM). Can be used to profile FP codes.
- FP_Dispatch_Faults
- Core::X86::Pmc::Core::FP_Dispatch_Faults
- FP Dispatch Faults
Number of FP dispatch faults triggered by type. Dispatch
fill/spill faults occur when FP either does not have the data needed to
operate on in its local registers (fill), or FP needs to empty out upper
register data for proper SSE merging behavior when executing AVX code
(spill).
This event has the following units which may be used to modify
the behavior of the event:
- YmmSpillFault
- YMM spill fault
- YmmFillFault
- YMM fill fault
- XmmFillFault
- XMM Fill fault
- x87FillFault
- x87 Fill fault
- Bad_Status_2_STLI
- Core::X86::Pmc::Core::Bad_Status_2_STLI
- Bad Status 2
Store To Load Interlock (STLI) events are loads that were unable to
complete because of a possible match with an older store, and the older
store could not do Store To Load Forwarding (STLF) for some reason.
This event has the following units which may be used to modify
the behavior of the event:
- StliOther
- Store-to-load conflicts: A load was unable to complete due to a
non-forwardable conflict with an older store. Most commonly, a load's
address range partially but not completely overlaps with an
uncompleted older store. Software can avoid this problem by using
same-size and same-alignment loads and stores when accessing the same
data. Vector/SIMD code is particularly susceptible to this problem;
software should construct wide vector stores by manipulating vector
elements in registers using shuffle/blend/swap instructions prior to
storing to memory, instead of using narrow element-by-element
stores.
- Retired_Lock_Instructions
- Core::X86::Pmc::Core::Retired_Lock_Instructions
- Retired Lock Instructions
Counts retired atomic read-modify-write instructions with a
LOCK prefix.
- CLFLUSH
- Core::X86::Pmc::Core::CLFLUSH
- Retired CLFLUSH Instructions
The number of retired CLFLUSH instructions. This is a
non-speculative event.
- CPUID
- Core::X86::Pmc::Core::CPUID
- Retired CPUID Instructions
The number of CPUID instructions retired.
- LS_Dispatch
- Core::X86::Pmc::Core::LS_Dispatch
- LS Dispatch
Counts the number of operations dispatched to the LS unit.
Counts for the selected unit masks are added together.
This event has the following units which may be used to modify
the behavior of the event:
- LdOpSt
- Dispatch of a single op that performs a load from and store to the
same memory address.
- PureSt
- Dispatch of a single op that performs a memory store.
- PureLd
- Dispatch of a single op that performs a memory load.
- SMI_or_SMM_cycles
- Core::X86::Pmc::Core::SMI_or_SMM_cycles
- SMIs Received
Counts the number of System Management Interrupts (SMIs)
received.
- Interrupts_Taken
- Core::X86::Pmc::Core::Interrupts_Taken
- Interrupts Taken
Counts the number of interrupts taken.
This event has the following units which may be used to modify
the behavior of the event:
- NumInterrupts
- Number of interrupts taken. This event is also counted when
UnitMask[7:0]=0.
- Store_to_Load_Forward
- Core::X86::Pmc::Core::Store_to_Load_Forward
- Store to Load Forward
Number of STLF hits.
- Store_Globally_Visible_Cancels_2
- Core::X86::Pmc::Core::Store_Globally_Visible_Cancels_2
- Store Globally Visible Cancels 2
Counts reasons why a Store Coalescing Buffer (SCB) commit is
canceled.
This event has the following units which may be used to modify
the behavior of the event:
- OlderStVisibleDepCancel
- Older SCB we are waiting on to become globally visible was unable to
become globally visible.
- LS_MAB_Allocates_by_Type
- Core::X86::Pmc::Core::LS_MAB_Allocates_by_Type
- LS MAB Allocates by Type
Counts when an LS pipe allocates a Miss Address Buffer (MAB)
entry to make a miss request.
- Demand_DC_Fills_by_Data_Source
- Core::X86::Pmc::Core::Demand_DC_Fills_by_Data_Source
- Demand Data Cache Fills by Data Source
Counts fills into the DC that were initiated by demand ops,
per data source.
This event has the following units which may be used to modify
the behavior of the event:
- AlternateMemories_NearFar
- Requests that return from Extension Memory.
- DramIO_Far
- Requests that target another NUMA node and return from DRAM or
MMIO.
- NearFarCache_Far
- Requests that target another NUMA node and return from another CCX's
cache.
- DramIO_Near
- Requests that target the same NUMA node and return from DRAM or
MMIO.
- NearFarCache_Near
- Requests that target the same NUMA node and return from another CCX's
cache.
- LocalCcx
- Data returned from L3 or different L2 in the same CCX.
- LocalL2
- Data returned from local L2.
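The per-source counts above can be combined into a locality summary, for example the share of demand fills served from another NUMA node. The sketch below assumes each unit was counted separately over the same interval. The name far_fill_fraction and the sample counts are hypothetical. AlternateMemories_NearFar is included in the total but not classified as far, since the unit does not distinguish near from far.

```python
FAR_UNITS = ("DramIO_Far", "NearFarCache_Far")


def far_fill_fraction(fills: dict) -> float:
    """Return the fraction of DC fills whose data came from another NUMA node."""
    total = sum(fills.values())
    if total == 0:
        return 0.0
    far = sum(fills.get(unit, 0) for unit in FAR_UNITS)
    return far / total


# Made-up counts, one per unit of Demand_DC_Fills_by_Data_Source.
sample = {
    "LocalL2": 6,
    "LocalCcx": 2,
    "NearFarCache_Near": 0,
    "DramIO_Near": 0,
    "NearFarCache_Far": 1,
    "DramIO_Far": 1,
    "AlternateMemories_NearFar": 0,
}
print(far_fill_fraction(sample))
```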
- Any_DC_Fills_by_Data_Source
- Core::X86::Pmc::Core::Any_DC_Fills_by_Data_Source
- Any Data Cache Fills by Data Source
Counts all fills into the DC, per data source.
This event has the following units which may be used to modify
the behavior of the event:
- AlternateMemories_NearFar
- Requests that return from Extension Memory.
- DramIO_Far
- Requests that target another NUMA node and return from DRAM or
MMIO.
- NearFarCache_Far
- Requests that target another NUMA node and return from another CCX's
cache.
- DramIO_Near
- Requests that target the same NUMA node and return from DRAM or
MMIO.
- NearFarCache_Near
- Requests that target the same NUMA node and return from another CCX's
cache.
- LocalCcx
- Data returned from L3 or different L2 in the same CCX.
- LocalL2
- Data returned from local L2.
- L1_DTLB_Reloads
- Core::X86::Pmc::Core::L1_DTLB_Reloads
- L1 DTLB Reloads
Counts L1DTLB reloads
This event has the following units which may be used to modify
the behavior of the event:
- TlbReload1GL2Miss
- DTLB reload to a 1G page that missed in the L2DTLB.
- TlbReload2ML2Miss
- DTLB reload to a 2M page that missed in the L2DTLB.
- TlbReloadCoalescedPageMiss
- DTLB reload to a coalesced page that missed in the L2DTLB.
- TlbReload4KL2Miss
- DTLB reload to a 4K page that missed in the L2DTLB.
- TlbReload1GL2Hit
- DTLB reload to a 1G page that hit in the L2DTLB.
- TlbReload2ML2Hit
- DTLB reload to a 2M page that hit in the L2DTLB.
- TlbReloadCoalescedPageHit
- DTLB reload to a coalesced page that hit in the L2DTLB.
- TlbReload4KL2Hit
- DTLB reload to a 4K page that hit in the L2DTLB.
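Because every unit above is either an L2DTLB hit or an L2DTLB miss, the counts can be folded into a single hit ratio. The following is a sketch under that reading; the function name is hypothetical, and it relies on the unit names in this page ending in "Hit" or "Miss".

```python
def l2_dtlb_hit_ratio(counts: dict) -> float:
    """Return the share of L1 DTLB reloads that hit in the L2DTLB."""
    hits = sum(v for name, v in counts.items() if name.endswith("Hit"))
    misses = sum(v for name, v in counts.items() if name.endswith("Miss"))
    total = hits + misses
    return hits / total if total else 0.0


# Made-up counts for a subset of the L1_DTLB_Reloads units.
sample = {
    "TlbReload4KL2Hit": 500,
    "TlbReload2ML2Hit": 250,
    "TlbReload4KL2Miss": 200,
    "TlbReload2ML2Miss": 50,
}
print(l2_dtlb_hit_ratio(sample))
```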
- Misaligned_Load_Flows
- Core::X86::Pmc::Core::Misaligned_Load_Flows
- Misaligned Load Flows
The number of misaligned load flows.
This event has the following units which may be used to modify
the behavior of the event:
- MA4K
- The number of 4KB misaligned (i.e., page crossing) loads or
LdOpSt.
- MA64
- The number of 64B misaligned (i.e., cacheline crossing) loads or
LdOpSt.
- Software_Prefetch_Dispatched
- Core::X86::Pmc::Core::Software_Prefetch_Dispatched
- Prefetch Instructions Dispatched
Software Prefetch Instructions Dispatched (speculative)
This event has the following units which may be used to modify
the behavior of the event:
- PREFETCHNTA
- PrefetchNTA instruction. See docAPM3 PREFETCHlevel.
- PREFETCHW
- PrefetchW instruction. See docAPM3 PREFETCHlevel.
- PREFETCH
- PrefetchT0, T1, and T2 instructions. See docAPM3 PREFETCHlevel.
- WCB_Close
- Core::X86::Pmc::Core::WCB_Close
- Write Combining Buffer Close
Counts events that cause a Write Combining Buffer (WCB) entry
to close.
This event has the following units which may be used to modify
the behavior of the event:
- FullLine64B
- All 64 bytes of the WCB entry have been written.
- Ineffective_Software_Prefetches
- Core::X86::Pmc::Core::Ineffective_Software_Prefetches
- Ineffective Software Prefetches
The number of software prefetches that did not fetch data
outside of the processor core.
This event has the following units which may be used to modify
the behavior of the event:
- MabHit
- Software PREFETCH instruction saw a match on an already-allocated miss
request.
- DcHit
- Software PREFETCH instruction saw a DC hit.
- Software_Prefetch_Data_Cache_Fills
- Core::X86::Pmc::Core::Software_Prefetch_Data_Cache_Fills
- Software Prefetch Data Cache Fills by Data Source
Counts fills into the DC that were initiated by software
prefetch instructions, per data source.
This event has the following units which may be used to modify
the behavior of the event:
- AlternateMemories_NearFar
- Requests that return from Extension Memory.
- DramIO_Far
- Requests that target another NUMA node and return from DRAM or
MMIO.
- NearFarCache_Far
- Requests that target another NUMA node and return from another CCX's
cache.
- DramIO_Near
- Requests that target the same NUMA node and return from DRAM or
MMIO.
- NearFarCache_Near
- Requests that target the same NUMA node and return from another CCX's
cache.
- LocalCcx
- Data returned from L3 or different L2 in the same CCX.
- LocalL2
- Data returned from local L2.
- Hardware_Prefetch_Data_Cache_Fills
- Core::X86::Pmc::Core::Hardware_Prefetch_Data_Cache_Fills
- Hardware Prefetch Data Cache Fills by Data Source
Counts fills into the DC that were initiated by hardware
prefetches, per data source.
This event has the following units which may be used to modify
the behavior of the event:
- AlternateMemories_NearFar
- Requests that return from Extension Memory.
- DramIO_Far
- Requests that target another NUMA node and return from DRAM or
MMIO.
- NearFarCache_Far
- Requests that target another NUMA node and return from another CCX's
cache.
- DramIO_Near
- Requests that target the same NUMA node and return from DRAM or
MMIO.
- NearFarCache_Near
- Requests that target the same NUMA node and return from another CCX's
cache.
- LocalCcx
- Data returned from L3 or different L2 in the same CCX.
- LocalL2
- Data returned from local L2.
- Allocated_DC_misses
- Core::X86::Pmc::Core::Allocated_DC_misses
- Allocated DC misses
Counts the number of in-flight DC misses each cycle.
- Cycles_Not_in_Halt
- Core::X86::Pmc::Core::Cycles_Not_in_Halt
- Cycles Not in Halt
Counts cycles when the thread is not in a HALTed state
- TLB_Flush_Events
- Core::X86::Pmc::Core::TLB_Flush_Events
- All TLB Flushes
TLB flush events.
- P0_frequency_Cycles_Not_in_Halt
- Core::X86::Pmc::Core::P0_frequency_Cycles_Not_in_Halt
- P0 Freq Cycles not in Halt
Counts cycles not in Halt, at the P0 P-state frequency,
regardless of the current Pstate.
This event has the following units which may be used to modify
the behavior of the event:
- P0_frequency_Cycles_Not_in_Halt
- Counts at the P0 frequency (same as Core::X86::Msr::MPERF) when not in
Halt.
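One possible use of this event is to estimate the average unhalted core frequency. The sketch below assumes that Cycles_Not_in_Halt increments at the current core clock (this page does not state that explicitly) and that the P0 frequency is known from another source. The function name and the numbers are illustrative only.

```python
def effective_frequency_mhz(cycles_not_in_halt: int,
                            p0_cycles_not_in_halt: int,
                            p0_mhz: float) -> float:
    """Estimate the average unhalted frequency in MHz.

    Assumption: cycles_not_in_halt ticks at the actual core clock, while
    p0_cycles_not_in_halt ticks at the fixed P0 frequency.
    """
    if p0_cycles_not_in_halt <= 0:
        raise ValueError("p0_cycles_not_in_halt must be positive")
    return p0_mhz * cycles_not_in_halt / p0_cycles_not_in_halt


# Made-up counts: the core ran 1.5x faster than P0 on average.
print(effective_frequency_mhz(1500, 1000, p0_mhz=3000.0))
```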
- Instruction_Cache_Refills_from_L2
- Core::X86::Pmc::Core::Instruction_Cache_Refills_from_L2
- Instruction Cache Refills From L2
The number of 64 byte instruction cache lines fulfilled from
the L2 cache.
- Instruction_Cache_Refills_from_System
- Core::X86::Pmc::Core::Instruction_Cache_Refills_from_System
- Instruction Cache Refills from System
The number of 64 byte instruction cache lines fulfilled from
system memory or another cache.
- L1_ITLB_Miss_L2_ITLB_Hit
- Core::X86::Pmc::Core::L1_ITLB_Miss_L2_ITLB_Hit
- L1 ITLB Miss, L2ITLB Hit
The number of instruction fetches that miss in the L1 ITLB but
hit in the L2 ITLB.
- ITLB_Reload_from_Page_Table_walk
- Core::X86::Pmc::Core::ITLB_Reload_from_Page_Table_walk
- L1 ITLB Miss, L2 ITLB Miss
The number of instruction fetches that miss in both the L1
ITLB and L2 ITLB.
This event has the following units which may be used to modify
the behavior of the event:
- Coalesced_4k
- Walk for >4k Coalesced page (implemented as 16k)
- walk_1G
- Walk for 1G page
- walk_2M
- Walk for 2M page
- walk_4K
- Walk for 4K page
- BP_Correct
- Core::X86::Pmc::Core::BP_Correct
- BP Pipe Correction or Cancel
The Branch Predictor flushed its own pipeline due to internal
conditions such as a second level prediction structure. Does not count
the number of bubbles caused by these internal flushes.
- Variable_Target_Predictions
- Core::X86::Pmc::Core::Variable_Target_Predictions
- Variable Target Predictions
The number of times a branch used the indirect predictor to
make a prediction.
- Decoder_Overrides_Existing_Branch_Prediction_Speculative
- Core::X86::Pmc::Core::Decoder_Overrides_Existing_Branch_Prediction_Speculative
- Early Redirects
Number of times that an Early Redirect is sent to Branch
Predictor. This happens when either the decoder or dispatch logic is
able to detect that the Branch Predictor needs to be redirected.
- ITLB_Hits
- Core::X86::Pmc::Core::ITLB_Hits
- ITLB Instruction Fetch Hits
The number of instruction fetches that hit in the L1ITLB.
This event has the following units which may be used to modify
the behavior of the event:
- IF1G
- L1 Instruction TLB Hit (1G page size)
- IF2M
- L1 Instruction TLB Hit (2M page size)
- IF4K
- L1 Instruction TLB Hit (4k or 16k coalesced page size)
- BP_redirects
- Core::X86::Pmc::Core::BP_redirects
- BP Redirects
Counts redirects of the branch predictor. To support legacy
software, counts both EX mispredict and resyncs when unit_mask[7:0] is
set to 0.
This event has the following units which may be used to modify
the behavior of the event:
- ExRedir
- Mispredict redirect from EX (execution-time)
- Resync
- Resync redirect (Retire-time) from RT
- Fetch_IBS_events
- Core::X86::Pmc::Core::Fetch_IBS_events
- Fetch IBS events
Counts significant Fetch IBS State transitions.
This event has the following units which may be used to modify
the behavior of the event:
- SampleVal
- Counts the number of valid Fetch Instruction Based Sampling (fetch
IBS) samples that were collected. Each valid sample also created an
IBS interrupt.
- SampleFiltered
- Counts the number of Fetch IBS tagged fetches that were discarded due
to IBS filtering. When a tagged fetch is discarded the Fetch IBS
facility will automatically tag a new fetch.
- SampleDiscarded
- Counts when the Fetch IBS facility discards an IBS tagged fetch for
reasons other than IBS filtering. When a tagged fetch is discarded the
Fetch IBS facility will automatically tag a new fetch.
- FetchTagged
- Counts the number of fetches tagged for Fetch IBS. Not all tagged
fetches create an IBS interrupt and valid fetch sample.
- IC_Tag_Hit_Miss_events
- Core::X86::Pmc::Core::IC_Tag_Hit_Miss_events
- IC Tag Hit and Miss Events
Counts the number of microtag and full tag events as selected
by unit mask.
- Op_Cache_hit_miss
- Core::X86::Pmc::Core::Op_Cache_hit_miss
- Op Cache Hit or Miss
Counts Op Cache micro-tag hit/miss events.
- Dispatch_Empty
- Core::X86::Pmc::Core::Dispatch_Empty
- Op Queue Empty
Cycles where the Op Queue is empty.
- Source_of_Op_Dispatched_From_Decoder
- Core::X86::Pmc::Core::Source_of_Op_Dispatched_From_Decoder
- Source of Op Dispatched From Decoder
Counts the number of ops dispatched from the decoder
classified by op source.
This event has the following units which may be used to modify
the behavior of the event:
- Op_Cache
- Count of ops dispatched from OpCache
- x86_decoder
- Count of ops dispatched from x86 decoder
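The two units above can be combined into an Op Cache coverage figure. This is a sketch with made-up counts; the function name is hypothetical, and it assumes the two units together account for all ops counted by this event.

```python
def op_cache_share(op_cache_ops: int, x86_decoder_ops: int) -> float:
    """Return the fraction of dispatched ops that came from the Op Cache."""
    total = op_cache_ops + x86_decoder_ops
    return op_cache_ops / total if total else 0.0


# Made-up counts for the Op_Cache and x86_decoder units.
print(op_cache_share(750, 250))
```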
- Types_of_Ops_Dispatched_From_Decoder
- Core::X86::Pmc::Core::Types_of_Ops_Dispatched_From_Decoder
- Types of Ops Dispatched From Decoder
Counts the number of ops dispatched from the decoder
classified by op type. The UnitMask value encodes which types of ops are
counted.
- Dispatch_Stall_Cycles_Dynamic_Tokens_Part_1
- Core::X86::Pmc::Core::Dispatch_Stall_Cycles_Dynamic_Tokens_Part_1
- Dynamic Tokens Dispatch Stall Cycles 1
Cycles where a dispatch group is valid but does not get
dispatched due to a Token Stall. UnitMask bits select the stall types
included in the count.
This event has the following units which may be used to modify
the behavior of the event:
- FPSchRsrcStall
- FP NSQ token stall
- TakenBrnchBufferRsrc
- taken branch buffer resource stall.
- StoreQueueRsrcStall
- STQ Tokens unavailable
- LoadQueueRsrcStall
- Load Queue Token Stall.
- IntPhyRegFileRsrcStall
- Integer Physical Register File resource stall.
- Dispatch_Stall_Cycles_Dynamic_Tokens_Part_2
- Core::X86::Pmc::Core::Dispatch_Stall_Cycles_Dynamic_Tokens_Part_2
- Dynamic Tokens Dispatch Stall Cycles 2
Cycles where a dispatch group is valid but does not get
dispatched due to a token stall. UnitMask bits select the stall types
included in the count.
This event has the following units which may be used to modify
the behavior of the event:
- RetQ
- Retire queue tokens unavailable
- EX_Flush_recovery
- Integer Execution flush recovery pending
- AGTokens
- Agen tokens unavailable
- ALTokens
- ALU tokens unavailable
- No_Dispatch_per_Slot
- Core::X86::Pmc::Core::No_Dispatch_per_Slot
- No_Dispatch_per_Slot
Counts the number of dispatch slots (each cycle) that remained
unused for reasons selected by UnitMask.
- Additional_Resource_Stalls
- Core::X86::Pmc::Core::Additional_Resource_Stalls
- Dispatch Additional Resource Stalls
This PMC event counts additional resource stalls that are not
captured by Dispatch_Stall_Cycles_Dynamic_Tokens_Part_1 or
Dispatch_Stall_Cycles_Dynamic_Tokens_Part_2.
- Retired_Instructions
- Core::X86::Pmc::Core::Retired_Instructions
- Retired Instructions
The number of instructions retired.
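A common derived metric is instructions per cycle (IPC). The sketch below divides a Retired_Instructions count by a Cycles_Not_in_Halt count taken over the same interval. Pairing these two events is a conventional choice rather than something this page prescribes, and the function name is hypothetical.

```python
def instructions_per_cycle(retired_instructions: int,
                           cycles_not_in_halt: int) -> float:
    """Return retired instructions per unhalted cycle."""
    if cycles_not_in_halt <= 0:
        raise ValueError("cycles_not_in_halt must be positive")
    return retired_instructions / cycles_not_in_halt


# Made-up counts.
print(instructions_per_cycle(2000, 1000))
```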
- Retired_Macro_Ops
- Core::X86::Pmc::Core::Retired_Macro_Ops
- Retired Macro-Ops
The number of macro-ops retired.
- Retired_Branch_Instructions
- Core::X86::Pmc::Core::Retired_Branch_Instructions
- Retired Branch Instructions
The number of branch instructions retired. This includes all
types of architectural control flow changes, including exceptions and
interrupts.
- Retired_Branch_Instructions_Mispredicted
- Core::X86::Pmc::Core::Retired_Branch_Instructions_Mispredicted
- Retired Branch Instructions Mispredicted.
The number of retired branch instructions that were
mispredicted. Note that only EX mispredicts are counted.
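The mispredict count is usually read relative to another count. The sketch below shows two conventional ratios: mispredicts per retired branch and mispredicts per thousand retired instructions (MPKI). The function names and sample counts are hypothetical, and the inputs are assumed to cover the same interval.

```python
def mispredict_rate(mispredicted: int, branches: int) -> float:
    """Return mispredicted branches as a fraction of retired branches."""
    return mispredicted / branches if branches else 0.0


def mispredicts_per_kilo_instruction(mispredicted: int, instructions: int) -> float:
    """Return mispredicted branches per 1000 retired instructions (MPKI)."""
    return 1000 * mispredicted / instructions if instructions else 0.0


# Made-up counts for Retired_Branch_Instructions_Mispredicted,
# Retired_Branch_Instructions and Retired_Instructions.
print(mispredict_rate(5, 100))
print(mispredicts_per_kilo_instruction(5, 10000))
```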
- Retired_Taken_Branch_Instructions
- Core::X86::Pmc::Core::Retired_Taken_Branch_Instructions
- Retired Taken Branch Instructions
The number of taken branches that were retired. This includes
all types of architectural control flow changes, including exceptions
and interrupts.
- Retired_Taken_Branch_Instructions_Mispredicted
- Core::X86::Pmc::Core::Retired_Taken_Branch_Instructions_Mispredicted
- Retired Taken Branch Instructions Mispredicted.
The number of retired taken branch instructions that were
mispredicted. Note that only EX mispredicts are counted.
- Retired_Far_Control_Transfers
- Core::X86::Pmc::Core::Retired_Far_Control_Transfers
- Retired Far Control Transfers
The number of far control transfers retired including far
call/jump/return, IRET, SYSCALL and SYSRET, plus exceptions and
interrupts. Far control transfers are not subject to branch
prediction.
- Retired_Near_Return_Branch_Instructions
- Core::X86::Pmc::Core::Retired_Near_Return_Branch_Instructions
- Retired Near Return Branch Instructions
The number of near return instructions (RET [C3] or RET Iw
[C2]) retired.
- Retired_Near_Return_Branch_Instructions_Mispredicted
- Core::X86::Pmc::Core::Retired_Near_Return_Branch_Instructions_Mispredicted
- Retired Near Return Branch Instructions Mispredicted
The number of near returns retired that were not correctly
predicted by the return address predictor. Each such mispredict incurs
the same penalty as a mispredicted conditional branch instruction. Note
that only EX mispredicts are counted.
- Retired_Indirect_Branch_Instructions_Mispredicted
- Core::X86::Pmc::Core::Retired_Indirect_Branch_Instructions_Mispredicted
- Retired Indirect Branch Instructions Mispredicted
The number of indirect branches retired that were not
correctly predicted. Each such mispredict incurs the same penalty as a
mispredicted conditional branch instruction. Note that only EX
mispredicts are counted.
- Retired_MMX_FP_Instructions
- Core::X86::Pmc::Core::Retired_MMX_FP_Instructions
- Retired MMX FP Instructions
The number of MMX, SSE or x87 instructions retired. The
UnitMask allows the selection of the individual classes of instructions
as given in the table. Each increment represents one complete
instruction. Since this event includes non-numeric instructions, it is
not suitable for measuring MFLOPs.
This event has the following units which may be used to modify
the behavior of the event:
- SSE
- SSE instructions (SSE, SSE2, SSE3, SSSE3, SSE4A, SSE41, SSE42,
AVX).
- MMX
- MMX instructions
- X87
- x87 instructions
- Retired_Indirect_Branch_Instructions
- Core::X86::Pmc::Core::Retired_Indirect_Branch_Instructions
- Retired Indirect Branch Instructions
The number of indirect branches retired.
- Retired_Conditional_Branch_Instructions
- Core::X86::Pmc::Core::Retired_Conditional_Branch_Instructions
- Retired Conditional Branch Instructions
Count of conditional branch instructions that retired
- Div_Cycles_Busy_count
- Core::X86::Pmc::Core::Div_Cycles_Busy_count
- Div Cycles Busy count
Counts cycles when the divider is busy
- Div_Op_Count
- Core::X86::Pmc::Core::Div_Op_Count
- Div Op Count
Counts number of divide ops
- Cycles_with_no_retire
- Core::X86::Pmc::Core::Cycles_with_no_retire
- Cycles with no retire
This event counts cycles when the hardware thread does not
retire any ops for reasons selected by UnitMask[4:0]. UnitMask events
[4:0] are mutually exclusive. If multiple reasons apply for a given
cycle, the lowest numbered UnitMask event is counted.
This event has the following units which may be used to modify
the behavior of the event:
- ThreadNotSelected
- The number of cycles where ops could have retired (i.e. did not fall into
the sub-events [0]...[3]) but did not retire because the thread
arbitration did not select the thread for retire.
- Other
- The number of cycles where ops could have retired (self and older ops
are complete), but were stopped from retirement for other reasons:
retire breaks, traps, faults, etc.
- NotCompleteSelf
- The number of cycles where the oldest retire slot did not have its
completion bits set.
- Empty
- The number of cycles when there were no valid ops in the retire queue.
This may be caused by front-end bottlenecks or pipeline
redirects.
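Since the unit mask events are mutually exclusive, the per-reason counts can be expressed as shares of the total cycle count without double counting. A minimal sketch follows, assuming each listed unit was counted separately and that the total comes from a cycle event such as Cycles_Not_in_Halt. The function name and the numbers are made up.

```python
def no_retire_breakdown(counts: dict, total_cycles: int) -> dict:
    """Return each no-retire reason as a fraction of total cycles."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return {reason: cycles / total_cycles for reason, cycles in counts.items()}


# Made-up counts, one per unit listed for Cycles_with_no_retire.
sample = {
    "Empty": 250,
    "NotCompleteSelf": 500,
    "Other": 125,
    "ThreadNotSelected": 125,
}
breakdown = no_retire_breakdown(sample, total_cycles=2000)
for reason, share in breakdown.items():
    print(f"{reason}: {share:.4f}")
```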
- Retired_Microcoded_Instructions
- Core::X86::Pmc::Core::Retired_Microcoded_Instructions
- Retired Microcoded Instructions
The number of retired microcoded instructions.
- Retired_Microcode_Ops
- Core::X86::Pmc::Core::Retired_Microcode_Ops
- Retired Microcode Ops
The number of microcode ops that have retired.
- Retired_Conditional_Branch_Instructions_Mispredicted
- Core::X86::Pmc::Core::Retired_Conditional_Branch_Instructions_Mispredicted
- Retired Conditional Branch Instructions Mispredicted
The number of retired conditional branch instructions that
were not correctly predicted because of a branch direction mismatch.
- Retired_Unconditional_Branch_Instructions_Mispredicted
- Core::X86::Pmc::Core::Retired_Unconditional_Branch_Instructions_Mispredicted
- Retired Unconditional Branch Instructions Mispredicted
The number of retired unconditional indirect branch
instructions that were mispredicted.
- Retired_Unconditional_Branch_Instructions
- Core::X86::Pmc::Core::Retired_Unconditional_Branch_Instructions
- Retired Unconditional Branch Instructions
Retired Unconditional Branch Instructions
- Tagged_IBS_Ops
- Core::X86::Pmc::Core::Tagged_IBS_Ops
- Tagged IBS Ops
Counts Op IBS related events
This event has the following units which may be used to modify
the behavior of the event:
- IbsCountRollover
- Number of times an op could not be tagged by IBS because of a previous
tagged op that has not yet signaled interrupt.
- IbsTaggedOpsRet
- Number of Ops tagged by IBS that retired
- IbsTaggedOps
- Number of Ops tagged by IBS
- Retired_fused_instructions
- Core::X86::Pmc::Core::Retired_fused_instructions
- Retired Fused Instructions
Counts retired fused instructions.
- L2RequestG1
- Core::X86::Pmc::L2::L2RequestG1
- Requests to L2 Group1
All L2 Cache Requests (Breakdown 1 - Common)
This event has the following units which may be used to modify
the behavior of the event:
- RdBlkL
- Data Cache Reads (including hardware and software prefetch).
- RdBlkX
- Data Cache Stores
- LsRdBlkC_S
- Data Cache Shared Reads
- CacheableIcRead
- Instruction Cache Reads.
- LsPrefetchL2Cmd
-
- L2HwPf
- All prefetches accepted by L2 pipeline, hit or miss. Types of PF and
L2 hit/miss broken out in a separate perfmon event
- Group2
- Various Noncacheable requests. Non-cached Data Reads, Non-cached
Instruction Reads, Self-modifying code checks.
- L2RequestG2
- Core::X86::Pmc::L2::L2RequestG2
- Requests to L2 Group2
All L2 Cache Requests (Breakdown 2 - Rare).
This event has the following units which may be used to modify
the behavior of the event:
- LsRdSized
- LS sized read, coherent non-cacheable.
- LsRdSizedNC
- LS sized read, non-coherent, non-cacheable.
- L2WcbReq
- Core::X86::Pmc::L2::L2WcbReq
- Write Combining Buffer Requests
Write Combining Buffer operations. For information on Write
Combining see docAPM2 sections: Memory System, Memory Types, Buffering
and Combining Memory Writes.
This event has the following units which may be used to modify
the behavior of the event:
- WcbClose
- Write Combining Buffer close
- L2CacheReqStat
- Core::X86::Pmc::L2::L2CacheReqStat
- Core to L2 Cacheable Request Access Status
L2 Cache Request Outcomes (not including L2 Prefetch).
This event has the following units which may be used to modify
the behavior of the event:
- LsRdBlkCS
- Data Cache Shared Read Hit in L2.
- LsRdBlkLHitX
- Data Cache Read Hit in L2. Modifiable.
- LsRdBlkLHitS
- Data Cache Read Hit Non-Modifiable Line in L2.
- LsRdBlkX
- Data Cache Store Hit in L2.
- LsRdBlkC
- Data Cache Req Miss in L2.
- IcFillHitX
- Instruction Cache Hit Modifiable Line in L2.
- IcFillHitS
- Instruction Cache Hit Non-Modifiable Line in L2.
- IcFillMiss
- Instruction Cache Req Miss in L2.
- L2PfHitL2
- Core::X86::Pmc::L2::L2PfHitL2
- L2 Prefetch Hit in L2
Counts all L2 prefetches accepted by L2 pipeline which hit in
the L2 cache.
- L2PfMissL2HitL3
- Core::X86::Pmc::L2::L2PfMissL2HitL3
- L2 Prefetcher Hits in L3
Counts all L2 prefetches accepted by the L2 pipeline which
miss the L2 cache and hit the L3.
- L2PfMissL2L3
- Core::X86::Pmc::L2::L2PfMissL2L3
- L2 Prefetcher Misses in L3
Counts all L2 prefetches accepted by the L2 pipeline which
miss the L2 and the L3 caches
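The three L2 prefetch events above (L2PfHitL2, L2PfMissL2HitL3 and L2PfMissL2L3) can be read together as an outcome distribution. The sketch below assumes the three counts cover the same interval and together account for all accepted L2 prefetches. The function name is hypothetical.

```python
def l2_prefetch_outcomes(hit_l2: int, hit_l3: int, miss_l3: int) -> tuple:
    """Return (L2 hit, L3 hit, L3 miss) shares of accepted L2 prefetches."""
    total = hit_l2 + hit_l3 + miss_l3
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (hit_l2 / total, hit_l3 / total, miss_l3 / total)


# Made-up counts for L2PfHitL2, L2PfMissL2HitL3 and L2PfMissL2L3.
print(l2_prefetch_outcomes(50, 25, 25))
```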
- L2FillRspSrc
- Core::X86::Pmc::L2::L2FillRspSrc
- L2 Fill Response Source
Counts fill responses based on their source. Selecting an
event mask of 0xfe counts all L3 responses to fill requests. This event
is similar to LS PMC 0x44.
This event has the following units which may be used to modify
the behavior of the event:
- AlternateMemories_NearFar
- Requests that return from Extension Memory
- DramIO_Far
- Requests that target another NUMA node and return from either DRAM or
MMIO from another NUMA node.
- NearFarCache_Far
- Requests that target another NUMA node and return from another CCX's
cache.
- DramIO_Near
- Requests that target the same NUMA node and return from either DRAM or
MMIO from the same NUMA node.
- NearFarCache_Near
- Requests that target the same NUMA node and return from another CCX's
cache.
- LocalCcx
- Data returned from L3 or different L2 in the same CCX.