This manual page describes events specific to the following Intel
CPU models and is derived from Intel's perfmon data. For more information,
please consult the Intel Software Developer's Manual or Intel's perfmon
website.
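Several event descriptions below ask for counter modifiers such as Cmask, Edge or invert to be set. On Intel cores these modifiers are bit fields in the same raw configuration word that holds the event select and unit mask. This is the IA32_PERFEVTSELx layout in the Software Developer's Manual, and Linux perf exposes the same fields as the event, umask, edge, any, inv and cmask format terms of the cpu PMU. The following Python sketch packs those fields. It assumes that layout. The helper names (encode_raw, perf_event_string) and the event/umask values in the usage note are hypothetical, because the raw codes are not listed on this page.

```python
# Bit positions of the IA32_PERFEVTSELx fields used by the modifiers on this
# page (assumed Intel SDM layout, also used by perf's "cpu" PMU format).
EVENT_SHIFT = 0    # event select, bits 0-7
UMASK_SHIFT = 8    # unit mask, bits 8-15
EDGE_BIT = 18      # edge detect: count rising edges of the cmask condition
ANY_BIT = 21       # AnyThread: count for both logical processors of the core
INV_BIT = 23       # invert: count cycles where the event count is below cmask
CMASK_SHIFT = 24   # counter mask, bits 24-31


def encode_raw(event, umask, cmask=0, edge=False, inv=False, any_thread=False):
    """Pack event fields into one raw config value (hypothetical helper)."""
    for name, value in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value:#x}")
    config = (event << EVENT_SHIFT) | (umask << UMASK_SHIFT) | (cmask << CMASK_SHIFT)
    if edge:
        config |= 1 << EDGE_BIT
    if inv:
        config |= 1 << INV_BIT
    if any_thread:
        config |= 1 << ANY_BIT
    return config


def perf_event_string(event, umask, cmask=0, edge=False, inv=False, any_thread=False):
    """Build the equivalent perf 'cpu/.../' event term (hypothetical helper)."""
    encode_raw(event, umask, cmask)  # reuse the 8-bit range checks
    parts = [f"event={event:#x}", f"umask={umask:#x}"]
    if cmask:
        parts.append(f"cmask={cmask}")
    if edge:
        parts.append("edge=1")
    if inv:
        parts.append("inv=1")
    if any_thread:
        parts.append("any=1")
    return "cpu/" + ",".join(parts) + "/"
```

For a hypothetical event 0xAB with umask 0x01, "Set Cmask=1 and Edge=1 to count occurrences" corresponds to encode_raw(0xAB, 0x01, cmask=1, edge=True), which returns 0x10401ab, or to the perf term cpu/event=0xab,umask=0x1,cmask=1,edge=1/. Look up the real event select and umask in Intel's perfmon data before using either form.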
- ld_blocks.store_forward
- This event counts loads that followed a store to the same address, where
the data could not be forwarded inside the pipeline from the store to the
load. The most common reason why store forwarding would be blocked is when
a load's address range overlaps with a preceding smaller uncompleted
store. The penalty for blocked store forwarding is that the load must wait
for the store to write its value to the cache before it can be
issued.
- ld_blocks.no_sr
- The number of times that split load operations are temporarily blocked
because all resources for handling the split accesses are in use.
- misalign_mem_ref.loads
- Speculative cache-line split load uops dispatched to L1D.
- misalign_mem_ref.stores
- Speculative cache-line split store-address uops dispatched to L1D.
- ld_blocks_partial.address_alias
- Aliasing occurs when a load is issued after a store and their memory
addresses are offset by 4K. This event counts the number of loads that
aliased with a preceding store, resulting in an extended address check in
the pipeline which can have a performance impact.
- dtlb_load_misses.miss_causes_a_walk
- Misses in all TLB levels that cause a page walk of any page size.
- dtlb_load_misses.walk_completed_4k
- Completed page walks due to demand load misses that caused 4K page walks
at any TLB level.
- dtlb_load_misses.walk_completed_2m_4m
- Completed page walks due to demand load misses that caused 2M/4M page
walks at any TLB level.
- dtlb_load_misses.walk_completed_1g
- Load misses in all TLB levels that cause a completed page walk (1G pages).
- dtlb_load_misses.walk_completed
- Completed page walks in any TLB of any page size due to demand load
misses.
- dtlb_load_misses.walk_duration
- This event counts cycles when the page miss handler (PMH) is servicing
page walks caused by DTLB load misses.
- dtlb_load_misses.stlb_hit_4k
- This event counts load operations from a 4K page that miss the first DTLB
level but hit the second and do not cause page walks.
- dtlb_load_misses.stlb_hit_2m
- This event counts load operations from a 2M page that miss the first DTLB
level but hit the second and do not cause page walks.
- dtlb_load_misses.stlb_hit
- Number of load operations that miss the first DTLB level but hit the
STLB, so that no page walk occurs.
- dtlb_load_misses.pde_cache_miss
- DTLB demand load misses where the low part of the linear-to-physical
address translation missed.
- int_misc.recovery_cycles
- This event counts the number of cycles spent waiting for a recovery after
an event such as a processor nuke, JEClear, assist, or HLE/RTM
abort.
- int_misc.recovery_cycles_any
- Core cycles the allocator was stalled due to recovery from an earlier clear
event for any thread running on the physical core (e.g. misprediction or
memory nuke).
- uops_issued.any
- This event counts the number of uops issued by the Front-end of the
pipeline to the Back-end. This event is counted at the allocation stage
and will count both retired and non-retired uops.
- uops_issued.stall_cycles
- Cycles when Resource Allocation Table (RAT) does not issue Uops to
Reservation Station (RS) for the thread.
- uops_issued.core_stall_cycles
- Cycles when Resource Allocation Table (RAT) does not issue Uops to
Reservation Station (RS) for all threads.
- uops_issued.flags_merge
- Number of flags-merge uops allocated. Such uops add delay.
- uops_issued.slow_lea
- Number of slow LEA or similar uops allocated. Such a uop has 3 sources (for
example, 2 sources plus an immediate), regardless of whether it results
from an LEA instruction or not.
- uops_issued.single_mul
- Number of multiply packed/scalar single precision uops allocated.
- arith.divider_uops
- Any uop executed by the Divider. (This includes all divide uops, sqrt,
...)
- l2_rqsts.demand_data_rd_miss
- Demand data read requests that missed L2, no rejects.
The following errata may apply to this: HSD78, HSM80
- l2_rqsts.rfo_miss
- Counts the number of store RFO requests that miss the L2 cache.
- l2_rqsts.code_rd_miss
- Number of instruction fetches that missed the L2 cache.
- l2_rqsts.all_demand_miss
- Demand requests that miss L2 cache.
The following errata may apply to this: HSD78, HSM80
- l2_rqsts.l2_pf_miss
- Counts all L2 HW prefetcher requests that missed L2.
- l2_rqsts.miss
- All requests that missed L2.
The following errata may apply to this: HSD78, HSM80
- l2_rqsts.demand_data_rd_hit
- Counts the number of demand Data Read requests, initiated by load
instructions, that hit the L2 cache.
The following errata may apply to this: HSD78, HSM80
- l2_rqsts.rfo_hit
- Counts the number of store RFO requests that hit the L2 cache.
- l2_rqsts.code_rd_hit
- Number of instruction fetches that hit the L2 cache.
- l2_rqsts.l2_pf_hit
- Counts all L2 HW prefetcher requests that hit L2.
- l2_rqsts.all_demand_data_rd
- Counts any demand and L1 HW prefetch data load requests to L2.
The following errata may apply to this: HSD78, HSM80
- l2_rqsts.all_rfo
- Counts all L2 store RFO requests.
- l2_rqsts.all_code_rd
- Counts all L2 code requests.
- l2_rqsts.all_demand_references
- Demand requests to L2 cache.
The following errata may apply to this: HSD78, HSM80
- l2_rqsts.all_pf
- Counts all L2 HW prefetcher requests.
- l2_rqsts.references
- All requests to L2 cache.
The following errata may apply to this: HSD78, HSM80
- l2_demand_rqsts.wb_hit
- Not rejected writebacks that hit L2 cache.
- longest_lat_cache.miss
- This event counts each cache miss condition for references to the last
level cache.
- longest_lat_cache.reference
- This event counts requests originating from the core that reference a
cache line in the last level cache.
- cpu_clk_unhalted.thread_p
- Counts the number of thread cycles while the thread is not in a halt
state. The thread enters the halt state when it is running the HLT
instruction. The core frequency may change from time to time due to power
or thermal throttling.
- cpu_clk_unhalted.thread_p_any
- Core cycles when at least one thread on the physical core is not in halt
state.
- cpu_clk_thread_unhalted.ref_xclk
- Increments at the frequency of XCLK (100 MHz) when not halted.
- cpu_clk_thread_unhalted.ref_xclk_any
- Reference cycles when at least one thread on the physical core is
unhalted (counts at 100 MHz rate).
- cpu_clk_unhalted.ref_xclk
- Reference cycles when the thread is unhalted (counts at 100 MHz
rate).
- cpu_clk_unhalted.ref_xclk_any
- Reference cycles when at least one thread on the physical core is
unhalted (counts at 100 MHz rate).
- cpu_clk_thread_unhalted.one_thread_active
- Count XClk pulses when this thread is unhalted and the other thread is
halted.
- cpu_clk_unhalted.one_thread_active
- Count XClk pulses when this thread is unhalted and the other thread is
halted.
- l1d_pend_miss.pending
- Increments by the number of outstanding L1D misses every cycle. Set
Cmask=1 and Edge=1 to count occurrences.
- l1d_pend_miss.pending_cycles
- Cycles with L1D load Misses outstanding.
- l1d_pend_miss.pending_cycles_any
- Cycles with L1D load Misses outstanding from any thread on physical
core.
- l1d_pend_miss.request_fb_full
- Number of times a request needed a fill buffer (FB) entry but no entry
was available for it; that is, FB unavailability was the dominant reason
for blocking the request. A request includes cacheable and uncacheable
demands, that is, loads, stores, or SW prefetches. HWP are e.
- l1d_pend_miss.fb_full
- Cycles a demand request was blocked due to Fill Buffer
unavailability.
- dtlb_store_misses.miss_causes_a_walk
- Miss in all TLB levels causes a page walk of any page size
(4K/2M/4M/1G).
- dtlb_store_misses.walk_completed_4k
- Completed page walks due to store misses in one or more TLB levels of 4K
page structure.
- dtlb_store_misses.walk_completed_2m_4m
- Completed page walks due to store misses in one or more TLB levels of
2M/4M page structure.
- dtlb_store_misses.walk_completed_1g
- Store misses in all DTLB levels that cause completed page walks. (1G)
- dtlb_store_misses.walk_completed
- Completed page walks due to store misses at any TLB level, of any page
size (4K/2M/4M/1G).
- dtlb_store_misses.walk_duration
- This event counts cycles when the page miss handler (PMH) is servicing
page walks caused by DTLB store misses.
- dtlb_store_misses.stlb_hit_4k
- This event counts store operations from a 4K page that miss the first DTLB
level but hit the second and do not cause page walks.
- dtlb_store_misses.stlb_hit_2m
- This event counts store operations from a 2M page that miss the first DTLB
level but hit the second and do not cause page walks.
- dtlb_store_misses.stlb_hit
- Store operations that miss the first TLB level but hit the second and do
not cause page walks.
- dtlb_store_misses.pde_cache_miss
- DTLB store misses where the low part of the linear-to-physical address
translation missed.
- load_hit_pre.sw_pf
- Non-SW-prefetch load dispatches that hit fill buffer allocated for S/W
prefetch.
- load_hit_pre.hw_pf
- Non-SW-prefetch load dispatches that hit fill buffer allocated for H/W
prefetch.
- ept.walk_cycles
- Cycle count for Extended Page Table (EPT) walks.
- l1d.replacement
- This event counts when new data lines are brought into the L1 Data cache,
which cause other lines to be evicted from the cache.
- tx_mem.abort_conflict
- Number of times a transactional abort was signaled due to a data conflict
on a transactionally accessed address.
- tx_mem.abort_capacity_write
- Number of times a transactional abort was signaled due to a data capacity
limitation for transactional writes.
- tx_mem.abort_hle_store_to_elided_lock
- Number of times an HLE transactional region aborted due to a
non-XRELEASE-prefixed instruction writing to an elided lock in the elision
buffer.
- tx_mem.abort_hle_elision_buffer_not_empty
- Number of times an HLE transactional execution aborted due to
NoAllocatedElisionBuffer being non-zero.
- tx_mem.abort_hle_elision_buffer_mismatch
- Number of times an HLE transactional execution aborted due to XRELEASE
lock not satisfying the address and value requirements in the elision
buffer.
- tx_mem.abort_hle_elision_buffer_unsupported_alignment
- Number of times an HLE transactional execution aborted due to an
unsupported read alignment from the elision buffer.
- tx_mem.hle_elision_buffer_full
- Number of times an HLE lock could not be elided due to
ElisionBufferAvailable being zero.
- move_elimination.int_eliminated
- Number of integer move elimination candidate uops that were
eliminated.
- move_elimination.simd_eliminated
- Number of SIMD move elimination candidate uops that were eliminated.
- move_elimination.int_not_eliminated
- Number of integer move elimination candidate uops that were not
eliminated.
- move_elimination.simd_not_eliminated
- Number of SIMD move elimination candidate uops that were not
eliminated.
- cpl_cycles.ring0
- Unhalted core cycles when the thread is in ring 0.
- cpl_cycles.ring0_trans
- Number of intervals between processor halts while the thread is in ring
0.
- cpl_cycles.ring123
- Unhalted core cycles when the thread is not in ring 0.
- tx_exec.misc1
- Counts the number of times a class of instructions that may cause a
transactional abort was executed. Since this is the count of execution, it
may not always cause a transactional abort.
- tx_exec.misc2
- Counts the number of times a class of instructions (e.g., vzeroupper) that
may cause a transactional abort was executed inside a transactional
region.
- tx_exec.misc3
- Counts the number of times an instruction execution caused the supported
transactional nesting count to be exceeded.
- tx_exec.misc4
- Counts the number of times an XBEGIN instruction was executed inside an
HLE transactional region.
- tx_exec.misc5
- Counts the number of times an HLE XACQUIRE instruction was executed inside
an RTM transactional region.
- rs_events.empty_cycles
- This event counts cycles when the Reservation Station (RS) is empty for
the thread. The RS is a structure that buffers allocated micro-ops from
the Front-end. If there are many cycles when the RS is empty, it may
represent an underflow of instructions delivered from the Front-end.
- rs_events.empty_end
- Counts end of periods where the Reservation Station (RS) was empty. Could
be useful to precisely locate Frontend Latency Bound issues.
- offcore_requests_outstanding.demand_data_rd
- Offcore outstanding demand data read transactions in SQ to uncore. Set
Cmask=1 to count cycles.
The following errata may apply to this: HSD78, HSD62, HSD61,
HSM63, HSM80
- offcore_requests_outstanding.cycles_with_demand_data_rd
- Cycles when offcore outstanding Demand Data Read transactions are present
in SuperQueue (SQ), queue to uncore.
The following errata may apply to this: HSD78, HSD62, HSD61,
HSM63, HSM80
- offcore_requests_outstanding.demand_data_rd_ge_6
- Cycles with at least 6 offcore outstanding Demand Data Read transactions
in uncore queue.
The following errata may apply to this: HSD78, HSD62, HSD61,
HSM63, HSM80
- offcore_requests_outstanding.demand_code_rd
- Offcore outstanding Demand code Read transactions in SQ to uncore. Set
Cmask=1 to count cycles.
The following errata may apply to this: HSD62, HSD61,
HSM63
- offcore_requests_outstanding.demand_rfo
- Offcore outstanding RFO store transactions in SQ to uncore. Set Cmask=1 to
count cycles.
The following errata may apply to this: HSD62, HSD61,
HSM63
- offcore_requests_outstanding.cycles_with_demand_rfo
- Cycles with offcore outstanding demand RFO transactions in SuperQueue
(SQ), queue to uncore.
The following errata may apply to this: HSD62, HSD61,
HSM63
- offcore_requests_outstanding.all_data_rd
- Offcore outstanding cacheable data read transactions in SQ to uncore. Set
Cmask=1 to count cycles.
The following errata may apply to this: HSD62, HSD61,
HSM63
- offcore_requests_outstanding.cycles_with_data_rd
- Cycles when offcore outstanding cacheable Core Data Read transactions are
present in SuperQueue (SQ), queue to uncore.
The following errata may apply to this: HSD62, HSD61,
HSM63
- lock_cycles.split_lock_uc_lock_duration
- Cycles in which the L1D and L2 are locked, due to a UC lock or split
lock.
- lock_cycles.cache_lock_duration
- Cycles in which the L1D is locked.
- idq.empty
- Counts cycles the IDQ is empty.
The following errata may apply to this: HSD135
- idq.mite_uops
- Increments each cycle by the number of uops delivered to IDQ from the MITE
path. Set Cmask=1 to count cycles.
- idq.mite_cycles
- Cycles when uops are being delivered to Instruction Decode Queue (IDQ)
from MITE path.
- idq.dsb_uops
- Increments each cycle by the number of uops delivered to IDQ from the DSB
path. Set Cmask=1 to count cycles.
- idq.dsb_cycles
- Cycles when uops are being delivered to Instruction Decode Queue (IDQ)
from Decode Stream Buffer (DSB) path.
- idq.ms_dsb_uops
- Increments each cycle by the number of uops delivered to IDQ by the DSB
while the MS is busy. Set Cmask=1 to count cycles; add Edge=1 to count the
number of deliveries.
- idq.ms_dsb_cycles
- Cycles when uops initiated by Decode Stream Buffer (DSB) are being
delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS)
is busy.
- idq.ms_dsb_occur
- Deliveries to Instruction Decode Queue (IDQ) initiated by Decode Stream
Buffer (DSB) while Microcode Sequencer (MS) is busy.
- idq.all_dsb_cycles_4_uops
- Counts cycles in which the DSB delivers four uops. Set Cmask=4.
- idq.all_dsb_cycles_any_uops
- Counts cycles in which the DSB delivers at least one uop. Set Cmask=1.
- idq.ms_mite_uops
- Increments each cycle by the number of uops delivered to IDQ by MITE
while the MS is busy. Set Cmask=1 to count cycles.
- idq.all_mite_cycles_4_uops
- Counts cycles in which MITE delivers four uops. Set Cmask=4.
- idq.all_mite_cycles_any_uops
- Counts cycles in which MITE delivers at least one uop. Set Cmask=1.
- idq.ms_uops
- This event counts uops delivered by the Front-end with the assistance of
the microcode sequencer. Microcode assists are used for complex
instructions or scenarios that can't be handled by the standard decoder.
Using other instructions, if possible, will usually improve
performance.
- idq.ms_cycles
- This event counts cycles during which the microcode sequencer assisted the
Front-end in delivering uops. Microcode assists are used for complex
instructions or scenarios that can't be handled by the standard decoder.
Using other instructions, if possible, will usually improve
performance.
- idq.ms_switches
- Number of switches from DSB (Decode Stream Buffer) or MITE (legacy decode
pipeline) to the Microcode Sequencer.
- idq.mite_all_uops
- Number of uops delivered to IDQ from any path.
- icache.hit
- Number of Instruction Cache, Streaming Buffer and Victim Cache reads, both
cacheable and noncacheable, including UC fetches.
- icache.misses
- This event counts Instruction Cache (ICACHE) misses.
- icache.ifetch_stall
- Cycles where a code fetch is stalled due to L1 instruction-cache
miss.
- icache.ifdata_stall
- Cycles where a code fetch is stalled due to L1 instruction-cache
miss.
- itlb_misses.miss_causes_a_walk
- Misses in ITLB that cause a page walk of any page size.
- itlb_misses.walk_completed_4k
- Completed page walks due to misses in ITLB 4K page entries.
- itlb_misses.walk_completed_2m_4m
- Completed page walks due to misses in ITLB 2M/4M page entries.
- itlb_misses.walk_completed_1g
- ITLB misses in all TLB levels that cause a completed page walk (1G pages).
- itlb_misses.walk_completed
- Completed page walks in ITLB of any page size.
- itlb_misses.walk_duration
- This event counts cycles when the page miss handler (PMH) is servicing
page walks caused by ITLB misses.
- itlb_misses.stlb_hit_4k
- ITLB misses that hit STLB (4K).
- itlb_misses.stlb_hit_2m
- ITLB misses that hit STLB (2M).
- itlb_misses.stlb_hit
- ITLB misses that hit STLB. No page walk.
- ild_stall.lcp
- This event counts cycles where the decoder is stalled on an instruction
with a length changing prefix (LCP).
- ild_stall.iq_full
- Stall cycles due to the IQ being full.
- br_inst_exec.nontaken_conditional
- Not taken macro-conditional branches.
- br_inst_exec.taken_conditional
- Taken speculative and retired macro-conditional branches.
- br_inst_exec.taken_direct_jump
- Taken speculative and retired macro-unconditional branch instructions
excluding calls and indirects.
- br_inst_exec.taken_indirect_jump_non_call_ret
- Taken speculative and retired indirect branches excluding calls and
returns.
- br_inst_exec.taken_indirect_near_return
- Taken speculative and retired indirect branches with return mnemonic.
- br_inst_exec.taken_direct_near_call
- Taken speculative and retired direct near calls.
- br_inst_exec.taken_indirect_near_call
- Taken speculative and retired indirect calls.
- br_inst_exec.all_conditional
- Speculative and retired macro-conditional branches.
- br_inst_exec.all_direct_jmp
- Speculative and retired macro-unconditional branches excluding calls and
indirects.
- br_inst_exec.all_indirect_jump_non_call_ret
- Speculative and retired indirect branches excluding calls and
returns.
- br_inst_exec.all_indirect_near_return
- Speculative and retired indirect return branches.
- br_inst_exec.all_direct_near_call
- Speculative and retired direct near calls.
- br_inst_exec.all_branches
- Counts all near branches executed (not necessarily retired).
- br_misp_exec.nontaken_conditional
- Not taken speculative and retired mispredicted macro conditional
branches.
- br_misp_exec.taken_conditional
- Taken speculative and retired mispredicted macro conditional
branches.
- br_misp_exec.taken_indirect_jump_non_call_ret
- Taken speculative and retired mispredicted indirect branches excluding
calls and returns.
- br_misp_exec.taken_return_near
- Taken speculative and retired mispredicted indirect branches with return
mnemonic.
- br_misp_exec.taken_indirect_near_call
- Taken speculative and retired mispredicted indirect calls.
- br_misp_exec.all_conditional
- Speculative and retired mispredicted macro conditional branches.
- br_misp_exec.all_indirect_jump_non_call_ret
- Mispredicted indirect branches excluding calls and returns.
- br_misp_exec.all_branches
- Counts all mispredicted near branches executed (not necessarily retired).
- idq_uops_not_delivered.core
- This event counts the number of undelivered (unallocated) uops from the
Front-end to the Resource Allocation Table (RAT) while the Back-end of the
processor is not stalled. The Front-end can allocate up to 4 uops per
cycle so this event can increment 0-4 times per cycle depending on the
number of unallocated uops. This event is counted on a per-core basis.
The following errata may apply to this: HSD135
- idq_uops_not_delivered.cycles_0_uops_deliv.core
- This event counts the number of cycles during which the Front-end allocated
exactly zero uops to the Resource Allocation Table (RAT) while the
Back-end of the processor is not stalled. This event is counted on a
per-core basis.
The following errata may apply to this: HSD135
- idq_uops_not_delivered.cycles_le_1_uop_deliv.core
- Cycles per thread when 3 or more uops are not delivered to Resource
Allocation Table (RAT) when backend of the machine is not stalled.
The following errata may apply to this: HSD135
- idq_uops_not_delivered.cycles_le_2_uop_deliv.core
- Cycles with at most 2 uops delivered by the front end.
The following errata may apply to this: HSD135
- idq_uops_not_delivered.cycles_le_3_uop_deliv.core
- Cycles with at most 3 uops delivered by the front end.
The following errata may apply to this: HSD135
- idq_uops_not_delivered.cycles_fe_was_ok
- Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) was
stalling FE.
The following errata may apply to this: HSD135
- uops_executed_port.port_0
- Cycles in which a uop is dispatched on port 0 in this thread.
- uops_executed_port.port_0_core
- Cycles per core when uops are executed in port 0.
- uops_dispatched_port.port_0
- Cycles per thread when uops are executed in port 0.
- uops_executed_port.port_1
- Cycles in which a uop is dispatched on port 1 in this thread.
- uops_executed_port.port_1_core
- Cycles per core when uops are executed in port 1.
- uops_dispatched_port.port_1
- Cycles per thread when uops are executed in port 1.
- uops_executed_port.port_2
- Cycles in which a uop is dispatched on port 2 in this thread.
- uops_executed_port.port_2_core
- Cycles per core when uops are dispatched to port 2.
- uops_dispatched_port.port_2
- Cycles per thread when uops are executed in port 2.
- uops_executed_port.port_3
- Cycles in which a uop is dispatched on port 3 in this thread.
- uops_executed_port.port_3_core
- Cycles per core when uops are dispatched to port 3.
- uops_dispatched_port.port_3
- Cycles per thread when uops are executed in port 3.
- uops_executed_port.port_4
- Cycles in which a uop is dispatched on port 4 in this thread.
- uops_executed_port.port_4_core
- Cycles per core when uops are executed in port 4.
- uops_dispatched_port.port_4
- Cycles per thread when uops are executed in port 4.
- uops_executed_port.port_5
- Cycles in which a uop is dispatched on port 5 in this thread.
- uops_executed_port.port_5_core
- Cycles per core when uops are executed in port 5.
- uops_dispatched_port.port_5
- Cycles per thread when uops are executed in port 5.
- uops_executed_port.port_6
- Cycles in which a uop is dispatched on port 6 in this thread.
- uops_executed_port.port_6_core
- Cycles per core when uops are executed in port 6.
- uops_dispatched_port.port_6
- Cycles per thread when uops are executed in port 6.
- uops_executed_port.port_7
- Cycles in which a uop is dispatched on port 7 in this thread.
- uops_executed_port.port_7_core
- Cycles per core when uops are dispatched to port 7.
- uops_dispatched_port.port_7
- Cycles per thread when uops are executed in port 7.
- resource_stalls.any
- Cycles allocation is stalled due to resource-related reasons.
The following errata may apply to this: HSD135
- resource_stalls.rs
- Cycles stalled due to no eligible RS entry available.
- resource_stalls.sb
- This event counts cycles during which no instructions were allocated
because no Store Buffers (SB) were available.
- resource_stalls.rob
- Cycles stalled due to re-order buffer full.
- cycle_activity.cycles_l2_pending
- Cycles with pending L2 miss loads. Set Cmask=2 to count cycles.
The following errata may apply to this: HSD78, HSM63,
HSM80
- cycle_activity.cycles_ldm_pending
- Cycles with pending memory loads. Set Cmask=2 to count cycles.
- cycle_activity.cycles_no_execute
- This event counts cycles during which no instructions were executed in the
execution stage of the pipeline.
- cycle_activity.stalls_l2_pending
- Number of loads that missed L2.
The following errata may apply to this: HSM63, HSM80
- cycle_activity.stalls_ldm_pending
- This event counts cycles during which no instructions were executed in the
execution stage of the pipeline and there were memory instructions pending
(waiting for data).
- cycle_activity.cycles_l1d_pending
- Cycles with pending L1 data cache miss loads. Set Cmask=8 to count
cycles.
- cycle_activity.stalls_l1d_pending
- Execution stalls due to L1 data cache miss loads. Set Cmask=0x0C.
- lsd.uops
- Number of uops delivered by the LSD.
- lsd.cycles_active
- Cycles in which uops were delivered by the LSD rather than the decoder.
- lsd.cycles_4_uops
- Cycles in which 4 uops were delivered by the LSD rather than the decoder.
- dsb2mite_switches.penalty_cycles
- Decode Stream Buffer (DSB)-to-MITE switch true penalty cycles.
- itlb.itlb_flush
- Counts the number of ITLB flushes, includes 4k/2M/4M pages.
- offcore_requests.demand_data_rd
- Demand data read requests sent to uncore.
The following errata may apply to this: HSD78, HSM80
- offcore_requests.demand_code_rd
- Demand code read requests sent to uncore.
- offcore_requests.demand_rfo
- Demand RFO read requests sent to uncore, including regular RFOs, locks,
ItoM.
- offcore_requests.all_data_rd
- Data read requests sent to uncore (demand and prefetch).
- uops_executed.stall_cycles
- Counts the number of cycles no uops were dispatched to be executed on this
thread.
The following errata may apply to this: HSD144, HSD30,
HSM31
- uops_executed.cycles_ge_1_uop_exec
- This event counts the cycles where at least one uop was executed. It is
counted per thread.
The following errata may apply to this: HSD144, HSD30,
HSM31
- uops_executed.cycles_ge_2_uops_exec
- This event counts the cycles where at least two uops were executed. It is
counted per thread.
The following errata may apply to this: HSD144, HSD30,
HSM31
- uops_executed.cycles_ge_3_uops_exec
- This event counts the cycles where at least three uops were executed. It
is counted per thread.
The following errata may apply to this: HSD144, HSD30,
HSM31
- uops_executed.cycles_ge_4_uops_exec
- Cycles where at least 4 uops were executed per-thread.
The following errata may apply to this: HSD144, HSD30,
HSM31
- uops_executed.core
- Counts the total number of uops to be executed per core each cycle.
The following errata may apply to this: HSD30, HSM31
- uops_executed.core_cycles_ge_1
- Cycles at least 1 micro-op is executed from any thread on physical core.
The following errata may apply to this: HSD30, HSM31
- uops_executed.core_cycles_ge_2
- Cycles at least 2 micro-ops are executed from any thread on physical core.
The following errata may apply to this: HSD30, HSM31
- uops_executed.core_cycles_ge_3
- Cycles at least 3 micro-ops are executed from any thread on physical core.
The following errata may apply to this: HSD30, HSM31
- uops_executed.core_cycles_ge_4
- Cycles at least 4 micro-ops are executed from any thread on physical core.
The following errata may apply to this: HSD30, HSM31
- uops_executed.core_cycles_none
- Cycles with no micro-ops executed from any thread on physical core.
The following errata may apply to this: HSD30, HSM31
- offcore_requests_buffer.sq_full
- Offcore requests buffer cannot take more entries for this thread
core.
- page_walker_loads.dtlb_l1
- Number of DTLB page walker loads that hit in the L1+FB.
- page_walker_loads.dtlb_l2
- Number of DTLB page walker loads that hit in the L2.
- page_walker_loads.dtlb_l3
- Number of DTLB page walker loads that hit in the L3.
The following errata may apply to this: HSD25
- page_walker_loads.dtlb_memory
- Number of DTLB page walker loads from memory.
The following errata may apply to this: HSD25
- page_walker_loads.itlb_l1
- Number of ITLB page walker loads that hit in the L1+FB.
- page_walker_loads.itlb_l2
- Number of ITLB page walker loads that hit in the L2.
- page_walker_loads.itlb_l3
- Number of ITLB page walker loads that hit in the L3.
The following errata may apply to this: HSD25
- page_walker_loads.itlb_memory
- Number of ITLB page walker loads from memory.
The following errata may apply to this: HSD25
- page_walker_loads.ept_dtlb_l1
- Counts the number of Extended Page Table walks from the DTLB that hit in
the L1 and FB.
- page_walker_loads.ept_dtlb_l2
- Counts the number of Extended Page Table walks from the DTLB that hit in
the L2.
- page_walker_loads.ept_dtlb_l3
- Counts the number of Extended Page Table walks from the DTLB that hit in
the L3.
- page_walker_loads.ept_dtlb_memory
- Counts the number of Extended Page Table walks from the DTLB that hit in
memory.
- page_walker_loads.ept_itlb_l1
- Counts the number of Extended Page Table walks from the ITLB that hit in
the L1 and FB.
- page_walker_loads.ept_itlb_l2
- Counts the number of Extended Page Table walks from the ITLB that hit in
the L2.
- page_walker_loads.ept_itlb_l3
- Counts the number of Extended Page Table walks from the ITLB that hit in
the L3.
- page_walker_loads.ept_itlb_memory
- Counts the number of Extended Page Table walks from the ITLB that hit in
memory.
- tlb_flush.dtlb_thread
- DTLB flush attempts of the thread-specific entries.
- tlb_flush.stlb_any
- Count number of STLB flush attempts.
- inst_retired.any_p
- Number of instructions at retirement.
The following errata may apply to this: HSD11, HSD140
- inst_retired.prec_dist
- Precise instruction retired event with HW to reduce effect of PEBS shadow
in IP distribution.
The following errata may apply to this: HSD140
- inst_retired.x87
- This is a non-precise version (that is, it does not use PEBS) of the event
that counts FP operations retired. For x87 FP operations that have no
exceptions, counting also includes flows that have several x87 uops, or
flows that use x87 uops in the exception handling.
- other_assists.avx_to_sse
- Number of transitions from AVX-256 to legacy SSE when penalty applicable.
The following errata may apply to this: HSD56, HSM57
- other_assists.sse_to_avx
- Number of transitions from SSE to AVX-256 when penalty applicable.
The following errata may apply to this: HSD56, HSM57
- other_assists.any_wb_assist
- Number of microcode assists invoked by HW upon uop writeback.
- uops_retired.all
- Counts the number of micro-ops retired. Use Cmask=1 and invert to count
active cycles or stalled cycles.
- uops_retired.stall_cycles
- Cycles without actually retired uops.
- uops_retired.total_cycles
- Cycles with less than 10 actually retired uops.
- uops_retired.core_stall_cycles
- Cycles without actually retired uops.
- uops_retired.retire_slots
- This event counts the number of retirement slots used each cycle. There
are potentially 4 slots that can be used each cycle - meaning, 4 uops or 4
instructions could retire each cycle.
- machine_clears.cycles
- Cycles there was a nuke. Accounts for both thread-specific and all-thread
nukes.
- machine_clears.memory_ordering
- This event counts the number of memory ordering machine clears detected.
Memory ordering machine clears can result from memory address aliasing or
snoops from another hardware thread or core to data inflight in the
pipeline. Machine clears can have a significant performance impact if they
are happening frequently.
- machine_clears.smc
- This event is incremented when self-modifying code (SMC) is detected,
which causes a machine clear. Machine clears can have a significant
performance impact if they are happening frequently.
- machine_clears.maskmov
- This event counts the number of executed Intel AVX masked load operations
that refer to an illegal address range with the mask bits set to 0.
- br_inst_retired.all_branches
- Branch instructions at retirement.
- br_inst_retired.conditional
- Counts the number of conditional branch instructions retired.
- br_inst_retired.near_call
- Direct and indirect near call instructions retired.
- br_inst_retired.near_call_r3
- Direct and indirect macro near call instructions retired (captured in ring
3).
- br_inst_retired.all_branches_pebs
- All (macro) branch instructions retired.
- br_inst_retired.near_return
- Counts the number of near return instructions retired.
- br_inst_retired.not_taken
- Counts the number of not taken branch instructions retired.
- br_inst_retired.near_taken
- Number of near taken branches retired.
- br_inst_retired.far_branch
- Number of far branches retired.
- br_misp_retired.all_branches
- Mispredicted branch instructions at retirement.
- br_misp_retired.conditional
- Mispredicted conditional branch instructions retired.
- br_misp_retired.all_branches_pebs
- This event counts all mispredicted branch instructions retired. This is a
precise event.
- br_misp_retired.near_taken
- Number of near branch instructions retired that were taken but
mispredicted.
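A common derived metric is the share of retired branches that were
mispredicted. The minimal C sketch below computes it from
br_misp_retired.all_branches and br_inst_retired.all_branches; it assumes
both counters were read over the same interval, and the function name is
hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: share of retired branches that were mispredicted.
 * mispredicted: value read from br_misp_retired.all_branches.
 * retired:      value read from br_inst_retired.all_branches.
 * Returns 0.0 when no branches retired, to avoid dividing by zero. */
static double branch_mispredict_ratio(uint64_t mispredicted, uint64_t retired)
{
    if (retired == 0)
        return 0.0;
    return (double)mispredicted / (double)retired;
}
```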
- avx_insts.all
- Note that a whole rep string counts only once toward AVX_INSTS.ALL.
- hle_retired.start
- Number of times an HLE execution started.
- hle_retired.commit
- Number of times an HLE execution successfully committed.
- hle_retired.aborted
- Number of times an HLE execution aborted for any reason (multiple
categories may count as one).
- hle_retired.aborted_misc1
- Number of times an HLE execution aborted due to various memory events
(e.g., read/write capacity and conflicts).
- hle_retired.aborted_misc2
- Number of times an HLE execution aborted due to uncommon conditions.
- hle_retired.aborted_misc3
- Number of times an HLE execution aborted due to HLE-unfriendly
instructions.
- hle_retired.aborted_misc4
- Number of times an HLE execution aborted due to incompatible memory type.
The following errata may apply to this: HSD65
- hle_retired.aborted_misc5
- Number of times an HLE execution aborted for a reason outside the previous
four categories (e.g., interrupts).
- rtm_retired.start
- Number of times an RTM execution started.
- rtm_retired.commit
- Number of times an RTM execution successfully committed.
- rtm_retired.aborted
- Number of times an RTM execution aborted for any reason (multiple
categories may count as one).
- rtm_retired.aborted_misc1
- Number of times an RTM execution aborted due to various memory events
(e.g., read/write capacity and conflicts).
- rtm_retired.aborted_misc2
- Number of times an RTM execution aborted due to various memory events
(e.g., read/write capacity and conflicts).
- rtm_retired.aborted_misc3
- Number of times an RTM execution aborted due to HLE-unfriendly
instructions.
- rtm_retired.aborted_misc4
- Number of times an RTM execution aborted due to incompatible memory type.
The following errata may apply to this: HSD65
- rtm_retired.aborted_misc5
- Number of times an RTM execution aborted for a reason outside the previous
four categories (e.g., interrupts).
- fp_assist.x87_output
- Number of X87 FP assists due to output values.
- fp_assist.x87_input
- Number of X87 FP assists due to input values.
- fp_assist.simd_output
- Number of SIMD FP assists due to output values.
- fp_assist.simd_input
- Number of SIMD FP assists due to input values.
- fp_assist.any
- Cycles with any input/output SSE* or FP assists.
- rob_misc_events.lbr_inserts
- Counts cases in which hardware saves new LBR records.
- mem_uops_retired.stlb_miss_loads
- Retired load uops that miss the STLB.
The following errata may apply to this: HSD29, HSM30
- mem_uops_retired.stlb_miss_stores
- Retired store uops that miss the STLB.
The following errata may apply to this: HSD29, HSM30
- mem_uops_retired.lock_loads
- Retired load uops with locked access.
The following errata may apply to this: HSD76, HSD29,
HSM30
- mem_uops_retired.split_loads
- Retired load uops that split across a cacheline boundary.
The following errata may apply to this: HSD29, HSM30
- mem_uops_retired.split_stores
- Retired store uops that split across a cacheline boundary.
The following errata may apply to this: HSD29, HSM30
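Whether a given access splits across a cache-line boundary, as counted by
mem_uops_retired.split_loads and mem_uops_retired.split_stores, depends only
on its address and size. The minimal C sketch below assumes a 64-byte cache
line; the helper name is hypothetical and is not part of any counter
interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cache-line size assumed for this sketch. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper: returns true if an access of `size` bytes starting
 * at `addr` touches more than one cache line, i.e. its bytes extend past
 * the end of the line that contains the first byte. */
static bool access_splits_cache_line(uintptr_t addr, size_t size)
{
    size_t offset = (size_t)(addr % CACHE_LINE_SIZE); /* position within line */
    return offset + size > CACHE_LINE_SIZE;
}
```

For example, an 8-byte load at offset 60 within a line spans bytes 60..67
and therefore splits, while the same load at offset 56 ends exactly at the
line boundary and does not.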
- mem_uops_retired.all_loads
- All retired load uops.
The following errata may apply to this: HSD29, HSM30
- mem_uops_retired.all_stores
- All retired store uops.
The following errata may apply to this: HSD29, HSM30
- mem_load_uops_retired.l1_hit
- Retired load uops with L1 cache hits as data sources.
The following errata may apply to this: HSD29, HSM30
- mem_load_uops_retired.l2_hit
- Retired load uops with L2 cache hits as data sources.
The following errata may apply to this: HSD76, HSD29,
HSM30
- mem_load_uops_retired.l3_hit
- Retired load uops with L3 cache hits as data sources.
The following errata may apply to this: HSD74, HSD29, HSD25,
HSM26, HSM30
- mem_load_uops_retired.l1_miss
- Retired load uops that missed the L1 cache.
The following errata may apply to this: HSM30
- mem_load_uops_retired.l2_miss
- Retired load uops that missed L2, excluding unknown data sources.
The following errata may apply to this: HSD29, HSM30
- mem_load_uops_retired.l3_miss
- Retired load uops that missed L3, excluding unknown data sources.
The following errata may apply to this: HSD74, HSD29, HSD25,
HSM26, HSM30
- mem_load_uops_retired.hit_lfb
- Retired load uops that missed L1 but hit the fill buffer (FB) because a
preceding miss to the same cache line had not yet returned its data.
The following errata may apply to this: HSM30
- mem_load_uops_l3_hit_retired.xsnp_miss
- Retired load uops whose data sources were L3 hits where the cross-core
snoop missed in the on-package core caches.
The following errata may apply to this: HSD29, HSD25, HSM26,
HSM30
- mem_load_uops_l3_hit_retired.xsnp_hit
- Retired load uops whose data sources were L3 hits where the cross-core
snoop hit in an on-package core cache.
The following errata may apply to this: HSD29, HSD25, HSM26,
HSM30
- mem_load_uops_l3_hit_retired.xsnp_hitm
- Retired load uops whose data sources were HitM responses from the shared L3.
The following errata may apply to this: HSD29, HSD25, HSM26,
HSM30
- mem_load_uops_l3_hit_retired.xsnp_none
- Retired load uops whose data sources were L3 hits that required no
snoops.
The following errata may apply to this: HSD74, HSD29, HSD25,
HSM26, HSM30
- mem_load_uops_l3_miss_retired.local_dram
- This event counts retired load uops where the data came from local DRAM.
This does not include hardware prefetches.
The following errata may apply to this: HSD74, HSD29, HSD25,
HSM30
- mem_load_uops_l3_miss_retired.remote_dram
- Retired load uops whose data source was remote DRAM, where either no
snoop was needed or the snoop missed (RspI).
The following errata may apply to this: HSD29, HSM30
- mem_load_uops_l3_miss_retired.remote_hitm
- Retired load uops whose data source was a HITM response from a remote cache.
The following errata may apply to this: HSM30
- mem_load_uops_l3_miss_retired.remote_fwd
- Retired load uops whose data was forwarded from a remote cache.
The following errata may apply to this: HSM30
- baclears.any
- Number of front end re-steers due to BPU misprediction.
- l2_trans.demand_data_rd
- Demand data read requests that access L2 cache.
- l2_trans.rfo
- RFO requests that access L2 cache.
- l2_trans.code_rd
- L2 cache accesses when fetching instructions.
- l2_trans.all_pf
- Any MLC or L3 HW prefetch accessing L2, including rejects.
- l2_trans.l1d_wb
- L1D writebacks that access L2 cache.
- l2_trans.l2_fill
- L2 fill requests that access L2 cache.
- l2_trans.l2_wb
- L2 writebacks that access L2 cache.
- l2_trans.all_requests
- Transactions accessing L2 pipe.
- l2_lines_in.i
- L2 cache lines in I state filling L2.
- l2_lines_in.s
- L2 cache lines in S state filling L2.
- l2_lines_in.e
- L2 cache lines in E state filling L2.
- l2_lines_in.all
- This event counts the number of L2 cache lines brought into the L2 cache.
Lines are filled into the L2 cache when there was an L2 miss.
- l2_lines_out.demand_clean
- Clean L2 cache lines evicted by demand.
- l2_lines_out.demand_dirty
- Dirty L2 cache lines evicted by demand.
- sq_misc.split_lock
- tbd