NAME

hsx_events — processor model specific performance counter events

DESCRIPTION

This manual page describes events specific to the following Intel CPU models and is derived from Intel's perfmon data. For more information, please consult the Intel Software Developer's Manual or Intel's perfmon website.

CPU models described by this document:

Family 0x6, Model 0x3f
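
To check whether a given processor is the model listed above, the family and model can be derived from the signature that the CPUID instruction returns in EAX for leaf 1. The following is a minimal sketch, not part of the original page. The function names decode_signature and describes_cpu and the sample signature value are illustrative assumptions; the bit layout follows the CPUID description in the Intel Software Developer's Manual.

```python
# Sketch: decode a CPUID leaf 1 EAX signature into display family/model
# and compare it with the model this page describes (Family 0x6, Model 0x3f).
# Function names and the sample value below are illustrative assumptions.

HSX_FAMILY = 0x6
HSX_MODEL = 0x3F


def decode_signature(eax):
    """Return (family, model, stepping) from a CPUID.1:EAX value."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # The extended model supplies the high nibble for families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


def describes_cpu(eax):
    """True if the signature matches the model covered by this page."""
    family, model, _ = decode_signature(eax)
    return family == HSX_FAMILY and model == HSX_MODEL


if __name__ == "__main__":
    sample = 0x306F2  # assumed example signature, for illustration only
    family, model, stepping = decode_signature(sample)
    print(hex(family), hex(model), stepping, describes_cpu(sample))
```

How the raw signature is obtained (for example, from an operating system utility) is outside the scope of this sketch.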

The following events are supported:

ld_blocks.store_forward: This event counts loads that followed a store to the same address, where the data could not be forwarded inside the pipeline from the store to the load. The most common reason why store forwarding would be blocked is when a load's address range overlaps with a preceding smaller uncompleted store. The penalty for blocked store forwarding is that the load must wait for the store to write its value to the cache before it can be issued.
ld_blocks.no_sr: The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.
misalign_mem_ref.loads: Speculative cache-line split load uops dispatched to L1D.
misalign_mem_ref.stores: Speculative cache-line split store-address uops dispatched to L1D.
ld_blocks_partial.address_alias: Aliasing occurs when a load is issued after a store and their memory addresses are offset by 4K. This event counts the number of loads that aliased with a preceding store, resulting in an extended address check in the pipeline which can have a performance impact.
dtlb_load_misses.miss_causes_a_walk: Misses in all TLB levels that cause a page walk of any page size.
dtlb_load_misses.walk_completed_4k: Completed page walks due to demand load misses that caused 4K page walks in any TLB levels.
dtlb_load_misses.walk_completed_2m_4m: Completed page walks due to demand load misses that caused 2M/4M page walks in any TLB levels.
dtlb_load_misses.walk_completed_1g: Load miss in all TLB levels causes a page walk that completes. (1G)
dtlb_load_misses.walk_completed: Completed page walks in any TLB of any page size due to demand load misses.
dtlb_load_misses.walk_duration: This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB load misses.
dtlb_load_misses.stlb_hit_4k: This event counts load operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
dtlb_load_misses.stlb_hit_2m: This event counts load operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
dtlb_load_misses.stlb_hit: Number of cache load STLB hits. No page walk.
dtlb_load_misses.pde_cache_miss: DTLB demand load misses with low part of linear-to-physical address translation missed.
int_misc.recovery_cycles: This event counts the number of cycles spent waiting for a recovery after an event such as a processor nuke, JEClear, assist, hle/rtm abort etc.
int_misc.recovery_cycles_any: Core cycles the allocator was stalled due to recovery from earlier clear event for any thread running on the physical core (e.g. misprediction or memory nuke).
uops_issued.any: This event counts the number of uops issued by the Front-end of the pipeline to the Back-end. This event is counted at the allocation stage and will count both retired and non-retired uops.
uops_issued.stall_cycles: Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for the thread.
uops_issued.core_stall_cycles: Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for all threads.
uops_issued.flags_merge: Number of flags-merge uops allocated. Such uops add delay.
uops_issued.slow_lea: Number of slow LEA or similar uops allocated. Such a uop has 3 sources (for example, 2 sources + immediate) regardless of whether it is the result of an LEA instruction or not.
uops_issued.single_mul: Number of multiply packed/scalar single precision uops allocated.
arith.divider_uops: Any uop executed by the Divider. (This includes all divide uops, sqrt, ...)
l2_rqsts.demand_data_rd_miss: Demand data read requests that missed L2, no rejects.
The following errata may apply to this: HSD78, HSM80
l2_rqsts.rfo_miss: Counts the number of store RFO requests that miss the L2 cache.
l2_rqsts.code_rd_miss: Number of instruction fetches that missed the L2 cache.
l2_rqsts.all_demand_miss: Demand requests that miss L2 cache.
The following errata may apply to this: HSD78, HSM80
l2_rqsts.l2_pf_miss: Counts all L2 HW prefetcher requests that missed L2.
l2_rqsts.miss: All requests that missed L2.
The following errata may apply to this: HSD78, HSM80
l2_rqsts.demand_data_rd_hit: Counts the number of demand Data Read requests, initiated by load instructions, that hit L2 cache.
The following errata may apply to this: HSD78, HSM80
l2_rqsts.rfo_hit: Counts the number of store RFO requests that hit the L2 cache.
l2_rqsts.code_rd_hit: Number of instruction fetches that hit the L2 cache.
l2_rqsts.l2_pf_hit: Counts all L2 HW prefetcher requests that hit L2.
l2_rqsts.all_demand_data_rd: Counts any demand and L1 HW prefetch data load requests to L2.
The following errata may apply to this: HSD78, HSM80
l2_rqsts.all_rfo: Counts all L2 store RFO requests.
l2_rqsts.all_code_rd: Counts all L2 code requests.
l2_rqsts.all_demand_references: Demand requests to L2 cache.
The following errata may apply to this: HSD78, HSM80
l2_rqsts.all_pf: Counts all L2 HW prefetcher requests.
l2_rqsts.references: All requests to L2 cache.
The following errata may apply to this: HSD78, HSM80
l2_demand_rqsts.wb_hit: Not rejected writebacks that hit L2 cache.
longest_lat_cache.miss: This event counts each cache miss condition for references to the last level cache.
longest_lat_cache.reference: This event counts requests originating from the core that reference a cache line in the last level cache.
cpu_clk_unhalted.thread_p: Counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.
cpu_clk_unhalted.thread_p_any: Core cycles when at least one thread on the physical core is not in halt state.
cpu_clk_thread_unhalted.ref_xclk: Increments at the frequency of XCLK (100 MHz) when not halted.
cpu_clk_thread_unhalted.ref_xclk_any: Reference cycles when at least one thread on the physical core is unhalted (counts at 100 MHz rate).
cpu_clk_unhalted.ref_xclk: Reference cycles when the thread is unhalted. (counts at 100 MHz rate)
cpu_clk_unhalted.ref_xclk_any: Reference cycles when at least one thread on the physical core is unhalted (counts at 100 MHz rate).
cpu_clk_thread_unhalted.one_thread_active: Count XClk pulses when this thread is unhalted and the other thread is halted.
cpu_clk_unhalted.one_thread_active: Count XClk pulses when this thread is unhalted and the other thread is halted.
l1d_pend_miss.pending: Increments the number of outstanding L1D misses every cycle. Set Cmask = 1 and Edge = 1 to count occurrences.
l1d_pend_miss.pending_cycles: Cycles with L1D load Misses outstanding.
l1d_pend_miss.pending_cycles_any: Cycles with L1D load Misses outstanding from any thread on physical core.
l1d_pend_miss.request_fb_full: Number of times a request needed a FB entry but there was no entry available for it. That is, FB unavailability was the dominant reason for blocking the request. A request includes cacheable/uncacheable demands, that is, load, store or SW prefetch. HWP are e.
l1d_pend_miss.fb_full: Cycles a demand request was blocked due to Fill Buffers unavailability.
dtlb_store_misses.miss_causes_a_walk: Miss in all TLB levels causes a page walk of any page size (4K/2M/4M/1G).
dtlb_store_misses.walk_completed_4k: Completed page walks due to store misses in one or more TLB levels of 4K page structure.
dtlb_store_misses.walk_completed_2m_4m: Completed page walks due to store misses in one or more TLB levels of 2M/4M page structure.
dtlb_store_misses.walk_completed_1g: Store misses in all DTLB levels that cause completed page walks. (1G)
dtlb_store_misses.walk_completed: Completed page walks due to store miss in any TLB levels of any page size (4K/2M/4M/1G).
dtlb_store_misses.walk_duration: This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB store misses.
dtlb_store_misses.stlb_hit_4k: This event counts store operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
dtlb_store_misses.stlb_hit_2m: This event counts store operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
dtlb_store_misses.stlb_hit: Store operations that miss the first TLB level but hit the second and do not cause page walks.
dtlb_store_misses.pde_cache_miss: DTLB store misses with low part of linear-to-physical address translation missed.
load_hit_pre.sw_pf: Non-SW-prefetch load dispatches that hit fill buffer allocated for S/W prefetch.
load_hit_pre.hw_pf: Non-SW-prefetch load dispatches that hit fill buffer allocated for H/W prefetch.
ept.walk_cycles: Cycle count for an Extended Page table walk.
l1d.replacement: This event counts when new data lines are brought into the L1 Data cache, which cause other lines to be evicted from the cache.
tx_mem.abort_conflict: Number of times a transactional abort was signaled due to a data conflict on a transactionally accessed address.
tx_mem.abort_capacity_write: Number of times a transactional abort was signaled due to a data capacity limitation for transactional writes.
tx_mem.abort_hle_store_to_elided_lock: Number of times a HLE transactional region aborted due to a non XRELEASE prefixed instruction writing to an elided lock in the elision buffer.
tx_mem.abort_hle_elision_buffer_not_empty: Number of times an HLE transactional execution aborted due to NoAllocatedElisionBuffer being non-zero.
tx_mem.abort_hle_elision_buffer_mismatch: Number of times an HLE transactional execution aborted due to XRELEASE lock not satisfying the address and value requirements in the elision buffer.
tx_mem.abort_hle_elision_buffer_unsupported_alignment: Number of times an HLE transactional execution aborted due to an unsupported read alignment from the elision buffer.
tx_mem.hle_elision_buffer_full: Number of times HLE lock could not be elided due to ElisionBufferAvailable being zero.
move_elimination.int_eliminated: Number of integer move elimination candidate uops that were eliminated.
move_elimination.simd_eliminated: Number of SIMD move elimination candidate uops that were eliminated.
move_elimination.int_not_eliminated: Number of integer move elimination candidate uops that were not eliminated.
move_elimination.simd_not_eliminated: Number of SIMD move elimination candidate uops that were not eliminated.
cpl_cycles.ring0: Unhalted core cycles when the thread is in ring 0.
cpl_cycles.ring0_trans: Number of intervals between processor halts while thread is in ring 0.
cpl_cycles.ring123: Unhalted core cycles when the thread is not in ring 0.
tx_exec.misc1: Counts the number of times a class of instructions that may cause a transactional abort was executed. Since this is the count of execution, it may not always cause a transactional abort.
tx_exec.misc2: Counts the number of times a class of instructions (e.g., vzeroupper) that may cause a transactional abort was executed inside a transactional region.
tx_exec.misc3: Counts the number of times an instruction execution caused the transactional nest count supported to be exceeded.
tx_exec.misc4: Counts the number of times a XBEGIN instruction was executed inside an HLE transactional region.
tx_exec.misc5: Counts the number of times an HLE XACQUIRE instruction was executed inside an RTM transactional region.
rs_events.empty_cycles: This event counts cycles when the Reservation Station (RS) is empty for the thread. The RS is a structure that buffers allocated micro-ops from the Front-end. If there are many cycles when the RS is empty, it may represent an underflow of instructions delivered from the Front-end.
rs_events.empty_end: Counts end of periods where the Reservation Station (RS) was empty. Could be useful to precisely locate Frontend Latency Bound issues.
offcore_requests_outstanding.demand_data_rd: Offcore outstanding demand data read transactions in SQ to uncore. Set Cmask=1 to count cycles.
The following errata may apply to this: HSD78, HSD62, HSD61, HSM63, HSM80
offcore_requests_outstanding.cycles_with_demand_data_rd: Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore.
The following errata may apply to this: HSD78, HSD62, HSD61, HSM63, HSM80
offcore_requests_outstanding.demand_data_rd_ge_6: Cycles with at least 6 offcore outstanding Demand Data Read transactions in uncore queue.
The following errata may apply to this: HSD78, HSD62, HSD61, HSM63, HSM80
offcore_requests_outstanding.demand_code_rd: Offcore outstanding Demand code Read transactions in SQ to uncore. Set Cmask=1 to count cycles.
The following errata may apply to this: HSD62, HSD61, HSM63
offcore_requests_outstanding.demand_rfo: Offcore outstanding RFO store transactions in SQ to uncore. Set Cmask=1 to count cycles.
The following errata may apply to this: HSD62, HSD61, HSM63
offcore_requests_outstanding.cycles_with_demand_rfo: Offcore outstanding demand RFO read transactions in SuperQueue (SQ), queue to uncore, every cycle.
The following errata may apply to this: HSD62, HSD61, HSM63
offcore_requests_outstanding.all_data_rd: Offcore outstanding cacheable data read transactions in SQ to uncore. Set Cmask=1 to count cycles.
The following errata may apply to this: HSD62, HSD61, HSM63
offcore_requests_outstanding.cycles_with_data_rd: Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore.
The following errata may apply to this: HSD62, HSD61, HSM63
lock_cycles.split_lock_uc_lock_duration: Cycles in which the L1D and L2 are locked, due to a UC lock or split lock.
lock_cycles.cache_lock_duration: Cycles in which the L1D is locked.
idq.empty: Counts cycles the IDQ is empty.
The following errata may apply to this: HSD135
idq.mite_uops: Increments each cycle by the number of uops delivered to IDQ from MITE path. Set Cmask = 1 to count cycles.
idq.mite_cycles: Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path.
idq.dsb_uops: Increments each cycle by the number of uops delivered to IDQ from DSB path. Set Cmask = 1 to count cycles.
idq.dsb_cycles: Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path.
idq.ms_dsb_uops: Increments each cycle by the number of uops delivered to IDQ when MS_busy by DSB. Set Cmask = 1 to count cycles. Add Edge = 1 to count the number of deliveries.
idq.ms_dsb_cycles: Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy.
idq.ms_dsb_occur: Deliveries to Instruction Decode Queue (IDQ) initiated by Decode Stream Buffer (DSB) while Microcode Sequencer (MS) is busy.
idq.all_dsb_cycles_4_uops: Counts cycles DSB is delivered four uops. Set Cmask = 4.
idq.all_dsb_cycles_any_uops: Counts cycles DSB is delivered at least one uop. Set Cmask = 1.
idq.ms_mite_uops: Increments each cycle by the number of uops delivered to IDQ when MS_busy by MITE. Set Cmask = 1 to count cycles.
idq.all_mite_cycles_4_uops: Counts cycles MITE is delivered four uops. Set Cmask = 4.
idq.all_mite_cycles_any_uops: Counts cycles MITE is delivered at least one uop. Set Cmask = 1.
idq.ms_uops: This event counts uops delivered by the Front-end with the assistance of the microcode sequencer. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
idq.ms_cycles: This event counts cycles during which the microcode sequencer assisted the Front-end in delivering uops. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
idq.ms_switches: Number of switches from DSB (Decode Stream Buffer) or MITE (legacy decode pipeline) to the Microcode Sequencer.
idq.mite_all_uops: Number of uops delivered to IDQ from any path.
icache.hit: Number of Instruction Cache, Streaming Buffer and Victim Cache reads, both cacheable and noncacheable, including UC fetches.
icache.misses: This event counts Instruction Cache (ICACHE) misses.
icache.ifetch_stall: Cycles where a code fetch is stalled due to L1 instruction-cache miss.
icache.ifdata_stall: Cycles where a code fetch is stalled due to L1 instruction-cache miss.
itlb_misses.miss_causes_a_walk: Misses in ITLB that cause a page walk of any page size.
itlb_misses.walk_completed_4k: Completed page walks due to misses in ITLB 4K page entries.
itlb_misses.walk_completed_2m_4m: Completed page walks due to misses in ITLB 2M/4M page entries.
itlb_misses.walk_completed_1g: Completed page walks due to misses in ITLB 1G page entries.
itlb_misses.walk_completed: Completed page walks in ITLB of any page size.
itlb_misses.walk_duration: This event counts cycles when the page miss handler (PMH) is servicing page walks caused by ITLB misses.
itlb_misses.stlb_hit_4k: ITLB misses that hit STLB (4K).
itlb_misses.stlb_hit_2m: ITLB misses that hit STLB (2M).
itlb_misses.stlb_hit: ITLB misses that hit STLB. No page walk.
ild_stall.lcp: This event counts cycles where the decoder is stalled on an instruction with a length changing prefix (LCP).
ild_stall.iq_full: Stall cycles because the IQ is full.
br_inst_exec.nontaken_conditional: Not taken macro-conditional branches.
br_inst_exec.taken_conditional: Taken speculative and retired macro-conditional branches.
br_inst_exec.taken_direct_jump: Taken speculative and retired macro-conditional branch instructions excluding calls and indirects.
br_inst_exec.taken_indirect_jump_non_call_ret: Taken speculative and retired indirect branches excluding calls and returns.
br_inst_exec.taken_indirect_near_return: Taken speculative and retired indirect branches with return mnemonic.
br_inst_exec.taken_direct_near_call: Taken speculative and retired direct near calls.
br_inst_exec.taken_indirect_near_call: Taken speculative and retired indirect calls.
br_inst_exec.all_conditional: Speculative and retired macro-conditional branches.
br_inst_exec.all_direct_jmp: Speculative and retired macro-unconditional branches excluding calls and indirects.
br_inst_exec.all_indirect_jump_non_call_ret: Speculative and retired indirect branches excluding calls and returns.
br_inst_exec.all_indirect_near_return: Speculative and retired indirect return branches.
br_inst_exec.all_direct_near_call: Speculative and retired direct near calls.
br_inst_exec.all_branches: Counts all near executed branches (not necessarily retired).
br_misp_exec.nontaken_conditional: Not taken speculative and retired mispredicted macro conditional branches.
br_misp_exec.taken_conditional: Taken speculative and retired mispredicted macro conditional branches.
br_misp_exec.taken_indirect_jump_non_call_ret: Taken speculative and retired mispredicted indirect branches excluding calls and returns.
br_misp_exec.taken_return_near: Taken speculative and retired mispredicted indirect branches with return mnemonic.
br_misp_exec.taken_indirect_near_call: Taken speculative and retired mispredicted indirect calls.
br_misp_exec.all_conditional: Speculative and retired mispredicted macro conditional branches.
br_misp_exec.all_indirect_jump_non_call_ret: Mispredicted indirect branches excluding calls and returns.
br_misp_exec.all_branches: Counts all near executed branches (not necessarily retired).
idq_uops_not_delivered.core: This event counts the number of undelivered (unallocated) uops from the Front-end to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. The Front-end can allocate up to 4 uops per cycle, so this event can increment 0-4 times per cycle depending on the number of unallocated uops. This event is counted on a per-core basis.
The following errata may apply to this: HSD135
idq_uops_not_delivered.cycles_0_uops_deliv.core: This event counts the number of cycles during which the Front-end allocated exactly zero uops to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. This event is counted on a per-core basis.
The following errata may apply to this: HSD135
idq_uops_not_delivered.cycles_le_1_uop_deliv.core: Cycles per thread when 3 or more uops are not delivered to Resource Allocation Table (RAT) when backend of the machine is not stalled.
The following errata may apply to this: HSD135
idq_uops_not_delivered.cycles_le_2_uop_deliv.core: Cycles with less than 2 uops delivered by the front end.
The following errata may apply to this: HSD135
idq_uops_not_delivered.cycles_le_3_uop_deliv.core: Cycles with less than 3 uops delivered by the front end.
The following errata may apply to this: HSD135
idq_uops_not_delivered.cycles_fe_was_ok: Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) was stalling FE.
The following errata may apply to this: HSD135
uops_executed_port.port_0: Cycles in which a uop is dispatched on port 0 in this thread.
uops_executed_port.port_0_core: Cycles per core when uops are executed in port 0.
uops_dispatched_port.port_0: Cycles per thread when uops are executed in port 0.
uops_executed_port.port_1: Cycles in which a uop is dispatched on port 1 in this thread.
uops_executed_port.port_1_core: Cycles per core when uops are executed in port 1.
uops_dispatched_port.port_1: Cycles per thread when uops are executed in port 1.
uops_executed_port.port_2: Cycles in which a uop is dispatched on port 2 in this thread.
uops_executed_port.port_2_core: Cycles per core when uops are dispatched to port 2.
uops_dispatched_port.port_2: Cycles per thread when uops are executed in port 2.
uops_executed_port.port_3: Cycles in which a uop is dispatched on port 3 in this thread.
uops_executed_port.port_3_core: Cycles per core when uops are dispatched to port 3.
uops_dispatched_port.port_3: Cycles per thread when uops are executed in port 3.
uops_executed_port.port_4: Cycles in which a uop is dispatched on port 4 in this thread.
uops_executed_port.port_4_core: Cycles per core when uops are executed in port 4.
uops_dispatched_port.port_4: Cycles per thread when uops are executed in port 4.
uops_executed_port.port_5: Cycles in which a uop is dispatched on port 5 in this thread.
uops_executed_port.port_5_core: Cycles per core when uops are executed in port 5.
uops_dispatched_port.port_5: Cycles per thread when uops are executed in port 5.
uops_executed_port.port_6: Cycles in which a uop is dispatched on port 6 in this thread.
uops_executed_port.port_6_core: Cycles per core when uops are executed in port 6.
uops_dispatched_port.port_6: Cycles per thread when uops are executed in port 6.
uops_executed_port.port_7: Cycles in which a uop is dispatched on port 7 in this thread.
uops_executed_port.port_7_core: Cycles per core when uops are dispatched to port 7.
uops_dispatched_port.port_7: Cycles per thread when uops are executed in port 7.
resource_stalls.any: Cycles allocation is stalled due to resource related reason.
The following errata may apply to this: HSD135
resource_stalls.rs: Cycles stalled due to no eligible RS entry available.
resource_stalls.sb: This event counts cycles during which no instructions were allocated because no Store Buffers (SB) were available.
resource_stalls.rob: Cycles stalled due to re-order buffer full.
cycle_activity.cycles_l2_pending: Cycles with pending L2 miss loads. Set Cmask=2 to count cycles.
The following errata may apply to this: HSD78, HSM63, HSM80
cycle_activity.cycles_ldm_pending: Cycles with pending memory loads. Set Cmask=2 to count cycles.
cycle_activity.cycles_no_execute: This event counts cycles during which no instructions were executed in the execution stage of the pipeline.
cycle_activity.stalls_l2_pending: Number of loads missed L2.
The following errata may apply to this: HSM63, HSM80
cycle_activity.stalls_ldm_pending: This event counts cycles during which no instructions were executed in the execution stage of the pipeline and there were memory instructions pending (waiting for data).
cycle_activity.cycles_l1d_pending: Cycles with pending L1 data cache miss loads. Set Cmask=8 to count cycles.
cycle_activity.stalls_l1d_pending: Execution stalls due to L1 data cache miss loads. Set Cmask=0CH.
lsd.uops: Number of uops delivered by the LSD.
lsd.cycles_active: Cycles in which uops are delivered by the LSD but did not come from the decoder.
lsd.cycles_4_uops: Cycles in which 4 uops are delivered by the LSD but did not come from the decoder.
dsb2mite_switches.penalty_cycles: Decode Stream Buffer (DSB)-to-MITE switch true penalty cycles.
itlb.itlb_flush: Counts the number of ITLB flushes, includes 4k/2M/4M pages.
offcore_requests.demand_data_rd: Demand data read requests sent to uncore.
The following errata may apply to this: HSD78, HSM80
offcore_requests.demand_code_rd: Demand code read requests sent to uncore.
offcore_requests.demand_rfo: Demand RFO read requests sent to uncore, including regular RFOs, locks, ItoM.
offcore_requests.all_data_rd: Data read requests sent to uncore (demand and prefetch).
uops_executed.stall_cycles: Counts number of cycles no uops were dispatched to be executed on this thread.
The following errata may apply to this: HSD144, HSD30, HSM31
uops_executed.cycles_ge_1_uop_exec: This event counts the cycles where at least one uop was executed. It is counted per thread.
The following errata may apply to this: HSD144, HSD30, HSM31
uops_executed.cycles_ge_2_uops_exec: This event counts the cycles where at least two uops were executed. It is counted per thread.
The following errata may apply to this: HSD144, HSD30, HSM31
uops_executed.cycles_ge_3_uops_exec: This event counts the cycles where at least three uops were executed. It is counted per thread.
The following errata may apply to this: HSD144, HSD30, HSM31
uops_executed.cycles_ge_4_uops_exec: Cycles where at least 4 uops were executed per-thread.
The following errata may apply to this: HSD144, HSD30, HSM31
uops_executed.core: Counts total number of uops to be executed per-core each cycle.
The following errata may apply to this: HSD30, HSM31
uops_executed.core_cycles_ge_1: Cycles at least 1 micro-op is executed from any thread on physical core.
The following errata may apply to this: HSD30, HSM31
uops_executed.core_cycles_ge_2: Cycles at least 2 micro-ops are executed from any thread on physical core.
The following errata may apply to this: HSD30, HSM31
uops_executed.core_cycles_ge_3: Cycles at least 3 micro-ops are executed from any thread on physical core.
The following errata may apply to this: HSD30, HSM31
uops_executed.core_cycles_ge_4: Cycles at least 4 micro-ops are executed from any thread on physical core.
The following errata may apply to this: HSD30, HSM31
uops_executed.core_cycles_none: Cycles with no micro-ops executed from any thread on physical core.
The following errata may apply to this: HSD30, HSM31
offcore_requests_buffer.sq_full: Offcore requests buffer cannot take more entries for this thread core.
page_walker_loads.dtlb_l1: Number of DTLB page walker loads that hit in the L1+FB.
page_walker_loads.dtlb_l2: Number of DTLB page walker loads that hit in the L2.
page_walker_loads.dtlb_l3: Number of DTLB page walker loads that hit in the L3.
The following errata may apply to this: HSD25
page_walker_loads.dtlb_memory: Number of DTLB page walker loads from memory.
The following errata may apply to this: HSD25
page_walker_loads.itlb_l1: Number of ITLB page walker loads that hit in the L1+FB.
page_walker_loads.itlb_l2: Number of ITLB page walker loads that hit in the L2.
page_walker_loads.itlb_l3: Number of ITLB page walker loads that hit in the L3.
The following errata may apply to this: HSD25
page_walker_loads.itlb_memory: Number of ITLB page walker loads from memory.
The following errata may apply to this: HSD25
page_walker_loads.ept_dtlb_l1: Counts the number of Extended Page Table walks from the DTLB that hit in the L1 and FB.
page_walker_loads.ept_dtlb_l2: Counts the number of Extended Page Table walks from the DTLB that hit in the L2.
page_walker_loads.ept_dtlb_l3: Counts the number of Extended Page Table walks from the DTLB that hit in the L3.
page_walker_loads.ept_dtlb_memory: Counts the number of Extended Page Table walks from the DTLB that hit in memory.
page_walker_loads.ept_itlb_l1: Counts the number of Extended Page Table walks from the ITLB that hit in the L1 and FB.
page_walker_loads.ept_itlb_l2: Counts the number of Extended Page Table walks from the ITLB that hit in the L2.
page_walker_loads.ept_itlb_l3: Counts the number of Extended Page Table walks from the ITLB that hit in the L3.
page_walker_loads.ept_itlb_memory: Counts the number of Extended Page Table walks from the ITLB that hit in memory.
tlb_flush.dtlb_thread: DTLB flush attempts of the thread-specific entries.
tlb_flush.stlb_any: Count number of STLB flush attempts.
inst_retired.any_p: Number of instructions at retirement.
The following errata may apply to this: HSD11, HSD140
inst_retired.prec_dist: Precise instruction retired event with HW to reduce effect of PEBS shadow in IP distribution.
The following errata may apply to this: HSD140
inst_retired.x87: This is a non-precise version (that is, does not use PEBS) of the event that counts FP operations retired. For X87 FP operations that have no exceptions counting also includes flows that have several X87, or flows that use X87 uops in the exception handling.
other_assists.avx_to_sse: Number of transitions from AVX-256 to legacy SSE when penalty applicable.
The following errata may apply to this: HSD56, HSM57
other_assists.sse_to_avx: Number of transitions from SSE to AVX-256 when penalty applicable.
The following errata may apply to this: HSD56, HSM57
other_assists.any_wb_assist: Number of microcode assists invoked by HW upon uop writeback.
uops_retired.all: Counts the number of micro-ops retired. Use Cmask=1 and invert to count active cycles or stalled cycles.
uops_retired.stall_cycles: Cycles without actually retired uops.
uops_retired.total_cycles: Cycles with less than 10 actually retired uops.
uops_retired.core_stall_cycles: Cycles without actually retired uops.
uops_retired.retire_slots: This event counts the number of retirement slots used each cycle. There are potentially 4 slots that can be used each cycle - meaning, 4 uops or 4 instructions could retire each cycle.
machine_clears.cycles: Cycles in which there was a Nuke. Accounts for both thread-specific and All Thread Nukes.
machine_clears.memory_ordering: This event counts the number of memory ordering machine clears detected. Memory ordering machine clears can result from memory address aliasing or snoops from another hardware thread or core to data inflight in the pipeline. Machine clears can have a significant performance impact if they are happening frequently.
machine_clears.smc: This event is incremented when self-modifying code (SMC) is detected, which causes a machine clear. Machine clears can have a significant performance impact if they are happening frequently.
machine_clears.maskmov: This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0.
br_inst_retired.all_branches: Branch instructions at retirement.
br_inst_retired.conditional: Counts the number of conditional branch instructions retired.
br_inst_retired.near_call: Direct and indirect near call instructions retired.
br_inst_retired.near_call_r3: Direct and indirect macro near call instructions retired (captured in ring 3).
br_inst_retired.all_branches_pebs: All (macro) branch instructions retired.
br_inst_retired.near_return: Counts the number of near return instructions retired.
br_inst_retired.not_taken: Counts the number of not taken branch instructions retired.
br_inst_retired.near_taken: Number of near taken branches retired.
br_inst_retired.far_branch: Number of far branches retired.
br_misp_retired.all_branches: Mispredicted branch instructions at retirement.
br_misp_retired.conditional: Mispredicted conditional branch instructions retired.
br_misp_retired.all_branches_pebs: This event counts all mispredicted branch instructions retired. This is a precise event.
br_misp_retired.near_taken: Number of near branch instructions retired that were taken but mispredicted.
avx_insts.all: Note that a whole rep string only counts AVX_INST.ALL once.
hle_retired.start: Number of times an HLE execution started.
hle_retired.commit: Number of times an HLE execution successfully committed.
hle_retired.aborted: Number of times an HLE execution aborted for any reason (multiple categories may count as one).
hle_retired.aborted_misc1: Number of times an HLE execution aborted due to various memory events (e.g., read/write capacity and conflicts).
hle_retired.aborted_misc2: Number of times an HLE execution aborted due to uncommon conditions.
hle_retired.aborted_misc3: Number of times an HLE execution aborted due to HLE-unfriendly instructions.
hle_retired.aborted_misc4: Number of times an HLE execution aborted due to incompatible memory type.
The following errata may apply to this: HSD65
hle_retired.aborted_misc5: Number of times an HLE execution aborted due to none of the previous 4 categories (e.g. interrupts).
rtm_retired.start: Number of times an RTM execution started.
rtm_retired.commit: Number of times an RTM execution successfully committed.
rtm_retired.aborted: Number of times an RTM execution aborted for any reason (multiple categories may count as one).
rtm_retired.aborted_misc1: Number of times an RTM execution aborted due to various memory events (e.g. read/write capacity and conflicts).
rtm_retired.aborted_misc2: Number of times an RTM execution aborted due to uncommon conditions.
rtm_retired.aborted_misc3: Number of times an RTM execution aborted due to HLE-unfriendly instructions.
rtm_retired.aborted_misc4: Number of times an RTM execution aborted due to incompatible memory type.
The following errata may apply to this: HSD65
rtm_retired.aborted_misc5: Number of times an RTM execution aborted due to none of the previous 4 categories (e.g. interrupt).
fp_assist.x87_output: Number of X87 FP assists due to output values.
fp_assist.x87_input: Number of X87 FP assists due to input values.
fp_assist.simd_output: Number of SIMD FP assists due to output values.
fp_assist.simd_input: Number of SIMD FP assists due to input values.
fp_assist.any: Cycles with any input/output SSE* or FP assists.
rob_misc_events.lbr_inserts: Counts cases in which hardware saves new LBR records.
mem_uops_retired.stlb_miss_loads: Retired load uops that miss the STLB.
The following errata may apply to this: HSD29, HSM30
mem_uops_retired.stlb_miss_stores: Retired store uops that miss the STLB.
The following errata may apply to this: HSD29, HSM30
mem_uops_retired.lock_loads: Retired load uops with locked access.
The following errata may apply to this: HSD76, HSD29, HSM30
mem_uops_retired.split_loads: Retired load uops that split across a cacheline boundary.
The following errata may apply to this: HSD29, HSM30
mem_uops_retired.split_stores: Retired store uops that split across a cacheline boundary.
The following errata may apply to this: HSD29, HSM30
mem_uops_retired.all_loads: All retired load uops.
The following errata may apply to this: HSD29, HSM30
mem_uops_retired.all_stores: All retired store uops.
The following errata may apply to this: HSD29, HSM30
mem_load_uops_retired.l1_hit: Retired load uops with L1 cache hits as data sources.
The following errata may apply to this: HSD29, HSM30
mem_load_uops_retired.l2_hit: Retired load uops with L2 cache hits as data sources.
The following errata may apply to this: HSD76, HSD29, HSM30
mem_load_uops_retired.l3_hit: Retired load uops with L3 cache hits as data sources.
The following errata may apply to this: HSD74, HSD29, HSD25, HSM26, HSM30
mem_load_uops_retired.l1_miss: Retired load uops that missed the L1 cache.
The following errata may apply to this: HSM30
mem_load_uops_retired.l2_miss: Retired load uops missed L2. Unknown data source excluded.
The following errata may apply to this: HSD29, HSM30
mem_load_uops_retired.l3_miss: Retired load uops missed L3. Excludes unknown data source.
The following errata may apply to this: HSD74, HSD29, HSD25, HSM26, HSM30
mem_load_uops_retired.hit_lfb: Retired load uops that missed L1 but hit the fill buffer (FB) because of a preceding miss to the same cache line whose data was not yet ready.
The following errata may apply to this: HSM30
mem_load_uops_l3_hit_retired.xsnp_miss: Retired load uops whose data source was an L3 hit where the cross-core snoop missed in the on-package core cache.
The following errata may apply to this: HSD29, HSD25, HSM26, HSM30
mem_load_uops_l3_hit_retired.xsnp_hit: Retired load uops whose data source was an L3 hit where the cross-core snoop hit in the on-package core cache.
The following errata may apply to this: HSD29, HSD25, HSM26, HSM30
mem_load_uops_l3_hit_retired.xsnp_hitm: Retired load uops whose data sources were HitM responses from the shared L3.
The following errata may apply to this: HSD29, HSD25, HSM26, HSM30
mem_load_uops_l3_hit_retired.xsnp_none: Retired load uops whose data sources were hits in L3 with no snoops required.
The following errata may apply to this: HSD74, HSD29, HSD25, HSM26, HSM30
mem_load_uops_l3_miss_retired.local_dram: This event counts retired load uops where the data came from local DRAM. This does not include hardware prefetches.
The following errata may apply to this: HSD74, HSD29, HSD25, HSM30
mem_load_uops_l3_miss_retired.remote_dram: Retired load uops whose data source was remote DRAM, with either no snoop needed or a snoop miss (RspI).
The following errata may apply to this: HSD29, HSM30
mem_load_uops_l3_miss_retired.remote_hitm: Retired load uops whose data source was a remote cache HITM.
The following errata may apply to this: HSM30
mem_load_uops_l3_miss_retired.remote_fwd: Retired load uops whose data source was forwarded from a remote cache.
The following errata may apply to this: HSM30
baclears.any: Number of front end re-steers due to BPU misprediction.
l2_trans.demand_data_rd: Demand data read requests that access L2 cache.
l2_trans.rfo: RFO requests that access L2 cache.
l2_trans.code_rd: L2 cache accesses when fetching instructions.
l2_trans.all_pf: Any MLC or L3 HW prefetch accessing L2, including rejects.
l2_trans.l1d_wb: L1D writebacks that access L2 cache.
l2_trans.l2_fill: L2 fill requests that access L2 cache.
l2_trans.l2_wb: L2 writebacks that access L2 cache.
l2_trans.all_requests: Transactions accessing L2 pipe.
l2_lines_in.i: L2 cache lines in I state filling L2.
l2_lines_in.s: L2 cache lines in S state filling L2.
l2_lines_in.e: L2 cache lines in E state filling L2.
l2_lines_in.all: This event counts the number of L2 cache lines brought into the L2 cache. Lines are filled into the L2 cache when there was an L2 miss.
l2_lines_out.demand_clean: Clean L2 cache lines evicted by demand.
l2_lines_out.demand_dirty: Dirty L2 cache lines evicted by demand.
sq_misc.split_lock: tbd
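
EXAMPLES

Several of the events above are most useful in pairs, as ratios rather than raw counts. The following is a minimal sketch of that arithmetic, not part of any counter tool: the helper names, the sample counts, and the separately supplied cycle count are all assumptions made for illustration. The only values taken from the event descriptions are the event names and the figure of 4 retirement slots per cycle.

```python
# Illustrative only: the counts below are made-up numbers, not measurements.
# Real values would come from whatever counter tool is in use.

SLOTS_PER_CYCLE = 4  # from the uops_retired.retire_slots description


def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_metrics(counts, cycles):
    """Combine raw event counts into a few ratios.

    `counts` maps event names to raw counts. `cycles` is a core cycle
    count for the same interval; it is passed in separately (an
    assumption of this sketch) rather than read from a named event.
    """
    return {
        # Fraction of retired branches that were mispredicted.
        "branch_mispredict_ratio": ratio(
            counts["br_misp_retired.all_branches"],
            counts["br_inst_retired.all_branches"],
        ),
        # Fraction of started RTM executions that aborted.
        "rtm_abort_ratio": ratio(
            counts["rtm_retired.aborted"],
            counts["rtm_retired.start"],
        ),
        # Fraction of available retirement slots that were used.
        "retire_slot_utilization": ratio(
            counts["uops_retired.retire_slots"],
            SLOTS_PER_CYCLE * cycles,
        ),
    }


# Hypothetical sample counts.
sample = {
    "br_inst_retired.all_branches": 1000,
    "br_misp_retired.all_branches": 25,
    "rtm_retired.start": 200,
    "rtm_retired.aborted": 50,
    "uops_retired.retire_slots": 6000,
}

metrics = derived_metrics(sample, cycles=2000)
for name, value in sorted(metrics.items()):
    print(f"{name}: {value:.3f}")
```

Both events in a ratio need to be counted over the same interval for the result to be meaningful. Events that list errata may be miscounted under the conditions those errata describe.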
