Components
Overview
This is an overview of the components available in timemory. For detailed info on the member functions, etc. please refer to the Doxygen.
The component documentation below is categorized into some general subsections and then sorted alphabetically.
In general, which member functions are present is not that important as long
as you use the variadic component bundlers. These bundlers skip the call to start()
on any component that does not have a start() member function but is bundled
alongside other components that do (and for which the start() call was intended).
Component Basics
Timemory components are C++ structs (classes whose members default to public
instead of private), each of which defines a single collection instance. For example,
the wall_clock component can be thought of as a simple class with two 64-bit integers
and start() and stop() member functions.
// This "component" is for conceptual demonstration only
// It is not intended to be copy+pasted
struct wall_clock
{
    int64_t m_value = 0;
    int64_t m_accum = 0;
    void start();
    void stop();
};
The start() member function records a timestamp and temporarily assigns it to the
first integer (m_value). The stop() member function records another timestamp,
computes the difference, assigns the difference to the first integer, and adds the
difference to the second integer (m_accum).
void wall_clock::start()
{
    m_value = get_timestamp();
}
void wall_clock::stop()
{
    // compute difference b/t when start and stop were called
    m_value = (get_timestamp() - m_value);
    // accumulate the difference
    m_accum += m_value;
}
Thus, after start() and stop() are invoked twice on the object:
wall_clock foo;
foo.start();
sleep(1); // sleep for 1 second
foo.stop();
foo.start();
sleep(1); // sleep for 1 second
foo.stop();
The first integer (m_value
) represents the most recent timing interval of 1 second
and the second integer (m_accum
) represents the accumulated timing interval totaling 2 seconds.
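The conceptual component above can be made compilable with a few additions. The following is a minimal sketch, not timemory's implementation: get_timestamp() is assumed here to return nanoseconds from std::chrono::steady_clock, and measure_twice() is a hypothetical helper that repeats the start/stop sequence shown above with a shorter sleep.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Assumption: timestamps are nanoseconds read from a monotonic clock
inline int64_t get_timestamp()
{
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
}

struct wall_clock
{
    int64_t m_value = 0;  // start timestamp while running, then the latest interval
    int64_t m_accum = 0;  // sum of all completed intervals

    void start() { m_value = get_timestamp(); }
    void stop()
    {
        // compute the difference between the start and stop timestamps
        m_value = get_timestamp() - m_value;
        // accumulate the difference
        m_accum += m_value;
    }
};

// Hypothetical helper: two start/stop cycles around a 10 ms sleep
inline wall_clock measure_twice()
{
    wall_clock foo;
    for(int i = 0; i < 2; ++i)
    {
        foo.start();
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        foo.stop();
    }
    return foo;
}
```

After measure_twice(), m_value holds only the second interval while m_accum holds the sum of both, which is the same relationship as the 1 second and 2 second values described above.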
This design not only encapsulates how to take the measurement, but also provides its own
data storage model. With this design, timemory measurements naturally support asynchronous
data collection. Additionally, as part of the design for generating the call-graph,
call-graphs are accumulated locally on each thread and on each process and merged at
the termination of the thread or process. This allows parallel data to be collected
free from synchronization overheads. On the worker threads, there is a concept of being
at “sea-level”: the call-graph’s relative position based on the baseline of the
primary thread in the application. When a worker thread is at sea-level, it reads the
position of the call-graph on the primary thread and creates a copy of that entry
in its call-graph, ensuring that when merged into the primary thread at the end,
the accumulated call-graph across all threads is inserted into the appropriate
location. This approach has been found to produce the fewest number of artifacts.
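The "accumulate locally, merge at termination" idea can be illustrated with a much simpler data structure than a call-graph. The sketch below is an assumption-laden illustration, not timemory's storage model: local_storage, record_map, and collect() are hypothetical names, a flat map stands in for the call-graph, and the "sea-level" bookkeeping is not modeled at all.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

using record_map = std::map<std::string, int64_t>;

// Hypothetical per-thread storage: records locally, merges once on destruction
struct local_storage
{
    record_map  data;
    record_map& global;
    std::mutex& lock;

    local_storage(record_map& g, std::mutex& m) : global(g), lock(m) {}

    ~local_storage()
    {
        // the only synchronization point: one merge when the thread finishes
        std::lock_guard<std::mutex> guard(lock);
        for(const auto& entry : data)
            global[entry.first] += entry.second;
    }
};

// Runs nthreads workers that each record nrecords entries into local storage
inline record_map collect(int nthreads, int nrecords)
{
    record_map               global;
    std::mutex               lock;
    std::vector<std::thread> workers;
    for(int i = 0; i < nthreads; ++i)
    {
        workers.emplace_back([&global, &lock, nrecords]() {
            local_storage storage(global, lock);
            for(int j = 0; j < nrecords; ++j)
                storage.data["work"] += 1;  // no locking while measuring
        });
    }
    for(auto& worker : workers)
        worker.join();
    return global;
}
```

The design choice being illustrated is that the mutex is taken once per thread rather than once per measurement, so the cost of synchronization does not scale with the number of recorded entries.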
In general, components do not need to conform to a specific interface. This is a relatively unique approach. Most performance analysis tools which allow user extensions use callbacks and dynamic polymorphism to integrate the user extensions into their workflow. It should be noted that there is nothing preventing a component from creating a similar system, but timemory is designed to query the presence of member function names for feature detection and adapts accordingly to the overloads of that function name and its return type. This is all possible due to the template-based design, which makes extensive use of variadic functions to accept any arguments at a high level and SFINAE to decide at compile-time which function to invoke (if a function is invoked at all). For example:
component A can contain these member functions:
    void start()
    int get()
    void set_prefix(const char*)
component B can contain these member functions:
    void start()
    void start(cudaStream_t)
    double get()
component C can contain these member functions:
    void start()
    void set_prefix(const std::string&)
And for a given bundle component_tuple<A, B, C> obj:
- When obj is created, a string identifier, an instance of a source_location struct, or a hash is required
  - This is the label for the measurement
  - If a string is passed, obj generates the hash and adds the hash and the string to a hash-map if it didn’t previously exist
  - A::set_prefix(const char*) will be invoked with the underlying const char* from the string that the hash maps to in the hash-map
  - C::set_prefix(const std::string&) will be invoked with the string that the hash maps to in the hash-map
  - It will be detected that B does not have a member function named set_prefix and no member function will be invoked
- Invoking obj.start() calls the following member functions on instances of A, B, and C:
  - A::start()
  - B::start()
  - C::start()
- Invoking obj.start(cudaStream_t) calls the following member functions on instances of A, B, and C:
  - A::start()
  - B::start(cudaStream_t)
  - C::start()
- Invoking obj.get():
  - Returns std::tuple<int, double> because it detects the two return types from A and B and the lack of a get() member function in component C.
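The dispatch rules listed above can be reproduced in a small, self-contained sketch. This is not timemory's component_tuple: bundle, priority, do_start, do_prefix, and do_get are hypothetical names, an int stands in for cudaStream_t, and the detection uses expression SFINAE with ranked overloads, which is one possible technique and not necessarily the one timemory uses internally.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Overload ranking: priority<2> is tried first, then priority<1>, then priority<0>
template <int N> struct priority : priority<N - 1> {};
template <> struct priority<0> {};

// Hypothetical components mirroring A, B, and C (int stands in for cudaStream_t)
struct A
{
    int         count = 0;
    std::string prefix;
    void start() { ++count; }
    int  get() const { return count; }
    void set_prefix(const char* p) { prefix = p; }
};
struct B
{
    int    count = 0, stream = 0;
    void   start() { ++count; }
    void   start(int s) { ++count; stream = s; }
    double get() const { return 0.5 * count; }
};
struct C
{
    int         count = 0;
    std::string prefix;
    void start() { ++count; }
    void set_prefix(const std::string& p) { prefix = p; }
};

// start: prefer start(args...), fall back to start(), otherwise do nothing
template <typename T, typename... Args>
auto do_start(priority<2>, T& obj, Args&... args) -> decltype(obj.start(args...), void())
{ obj.start(args...); }
template <typename T, typename... Args>
auto do_start(priority<1>, T& obj, Args&...) -> decltype(obj.start(), void())
{ obj.start(); }
template <typename T, typename... Args>
void do_start(priority<0>, T&, Args&...) {}

// set_prefix: try std::string, then const char*, otherwise do nothing
template <typename T>
auto do_prefix(priority<2>, T& obj, const std::string& s) -> decltype(obj.set_prefix(s), void())
{ obj.set_prefix(s); }
template <typename T>
auto do_prefix(priority<1>, T& obj, const std::string& s) -> decltype(obj.set_prefix(s.c_str()), void())
{ obj.set_prefix(s.c_str()); }
template <typename T>
void do_prefix(priority<0>, T&, const std::string&) {}

// get: one-element tuple if get() exists, empty tuple otherwise
template <typename T>
auto do_get(priority<1>, const T& obj) -> decltype(std::make_tuple(obj.get()))
{ return std::make_tuple(obj.get()); }
template <typename T>
std::tuple<> do_get(priority<0>, const T&) { return std::tuple<>{}; }

template <typename... Types>
struct bundle
{
    std::tuple<Types...> data;

    explicit bundle(const std::string& label)
    {
        int expand[] = { 0, (do_prefix(priority<2>{}, std::get<Types>(data), label), 0)... };
        (void) expand;
    }
    template <typename... Args>
    void start(Args&&... args)
    {
        int expand[] = { 0, (do_start(priority<2>{}, std::get<Types>(data), args...), 0)... };
        (void) expand;
    }
    auto get() const
    {
        return std::tuple_cat(do_get(priority<1>{}, std::get<Types>(data))...);
    }
};
```

With bundle<A, B, C> obj("label"), calling obj.start(7) reaches B::start(int) but falls back to start() for A and C, and obj.get() has the type std::tuple<int, double>. Every decision is made during overload resolution, so a component without a matching member function contributes no code at the call site.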
This design has several benefits and one downside in particular. The benefits are that timemory: (1) makes it extremely easy to create a unified interface between two or more components which have different interfaces/capabilities, (2) invokes the different interfaces efficiently since no feature detection logic is required at runtime, and (3) lets components define their own interface.
With respect to #2, consider the two more traditional implementations. If callbacks are used, a function pointer exists and a component which does not implement a feature will either have a null function pointer (requiring a check at runtime) or the tool will implement an array of function pointers with a size that is unknown at compile-time. In the latter case, this will require heap allocations (which are expensive operations) and, in both cases, the loop over the function pointers will likely be quite inefficient since function pointers have a very high probability of thrashing the instruction cache. If dynamic polymorphism is used, then virtual table look-ups are required during every iteration. In the timemory approach, none of these additional overheads are present and there isn’t even a loop: the bundle either expands into a direct call to the member function without any abstractions or into nothing.
With respect to #1 and #3, this has some interesting implications with regard to a universal instrumentation interface and is discussed in the following section and the CONTRIBUTING.md documentation.
The aforementioned downside is a byproduct of all this flexibility and adaptation to the custom interfaces of each component: directly using the template interface can take quite a long time to compile.
Component Metadata
-
template<int
Idx
>
struct tim::component
::
enumerator
: public tim::component::properties<placeholder<nothing>>¶ This is a critical specialization for mapping strings and integers to component types at runtime (it should always be specialized alongside tim::component::properties) and it is also critical for performing template metaprogramming “loops” over all the components. E.g.:
template <size_t Idx>
using Enumerator_t = typename tim::component::enumerator<Idx>::type;

template <size_t... Idx>
auto init(std::index_sequence<Idx...>)
{
    // expand for [0, TIMEMORY_COMPONENTS_END)
    TIMEMORY_FOLD_EXPRESSION(tim::storage_initializer::get<Enumerator_t<Idx>>());
}

void init()
{
    init(std::make_index_sequence<TIMEMORY_COMPONENTS_END>{});
}
- tparam Idx
Enumeration value
Public Functions
-
inline bool
operator==
(int) const¶
-
inline bool
operator==
(const char*) const¶
-
inline bool
operator==
(const std::string&) const¶
-
inline void
serialize
(Archive&, const unsigned int)¶
-
inline TIMEMORY_COMPONENT
operator()
()¶
-
inline constexpr
operator TIMEMORY_COMPONENT
() const¶
Public Static Functions
-
static inline constexpr bool
specialized
()¶
-
static inline constexpr const char *
enum_string
()¶
-
static inline constexpr const char *
id
()¶
-
static inline idset_t
ids
()¶
Public Static Attributes
-
static constexpr bool
value
= false¶
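The template metaprogramming "loop" that the enumerator enables can be shown without timemory itself. In the sketch below, the two components, the enumerator specializations, COMPONENTS_END, and component_labels() are all hypothetical stand-ins; only the std::index_sequence expansion pattern is taken from the example in the enumerator description.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical components, each exposing a static label
struct wall_clock { static const char* label() { return "wall_clock"; } };
struct cpu_clock  { static const char* label() { return "cpu_clock"; } };

// Stand-in for TIMEMORY_COMPONENTS_END
constexpr std::size_t COMPONENTS_END = 2;

// Maps an integer index to a component type
template <std::size_t Idx> struct enumerator;
template <> struct enumerator<0> { using type = wall_clock; };
template <> struct enumerator<1> { using type = cpu_clock; };

template <std::size_t Idx>
using enumerator_t = typename enumerator<Idx>::type;

// Expands once per index in [0, COMPONENTS_END)
template <std::size_t... Idx>
std::array<std::string, sizeof...(Idx)> component_labels(std::index_sequence<Idx...>)
{
    return { { enumerator_t<Idx>::label()... } };
}

inline std::array<std::string, COMPONENTS_END> component_labels()
{
    return component_labels(std::make_index_sequence<COMPONENTS_END>{});
}
```

Calling component_labels() visits every registered index at compile time, so adding a component only requires one more enumerator specialization and a larger COMPONENTS_END.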
-
template<typename
Tp
>
struct tim::component
::
metadata
¶ Provides forward declaration support for assigning static metadata properties. This is most useful for specialization of template components. If this class is specialized for a component, then the component does not need to provide the static member functions label() and description().
Public Static Functions
-
static std::string
name
()¶
-
static std::string
label
()¶
-
static std::string
description
()¶
-
static inline std::string
extra_description
()¶
-
static inline constexpr bool
specialized
()¶
Public Static Attributes
-
static constexpr TIMEMORY_COMPONENT
value
= TIMEMORY_COMPONENTS_END¶
-
template<typename
Tp
>
struct tim::component
::
properties
: public tim::component::static_properties<Tp>¶ This is a critical specialization for mapping strings and integers to component types at runtime. The enum_string() function is the enum id as a string. The id() function is (typically) the name of the C++ component as a string. The ids() function returns a set of strings which are alternative string identifiers to the enum string or the string ID. Additionally, it provides serialization of these values.
A macro is provided to simplify this specialization:
TIMEMORY_PROPERTY_SPECIALIZATION(wall_clock, TIMEMORY_WALL_CLOCK, "wall_clock", "real_clock", "virtual_clock")
In the above, the first parameter is the C++ type, the second is the enumeration id, the enum string is automatically generated via the preprocessor # operator on the second parameter, the third parameter is the string ID, and the remaining values are placed in ids(). Additionally, this macro specializes tim::component::enumerator.
- tparam Tp
Component type
Public Functions
-
inline TIMEMORY_COMPONENT
operator()
()¶
-
inline constexpr
operator TIMEMORY_COMPONENT
() const¶
Public Static Functions
-
static inline constexpr bool
specialized
()¶
-
static inline constexpr const char *
enum_string
()¶
-
static inline constexpr const char *
id
()¶
-
static inline idset_t
ids
()¶
Public Static Attributes
-
static constexpr TIMEMORY_COMPONENT
value
= TIMEMORY_COMPONENTS_END¶
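To make the roles of enum_string(), id(), and ids() concrete, here is a hand-written specialization of a properties-like trait. It is a sketch under stated assumptions: demo_component_id, DEMO_STRINGIZE, and the matches() helper are hypothetical, and the real macro generates more than is shown here.

```cpp
#include <cassert>
#include <set>
#include <string>

// Preprocessor stringification, as used to derive the enum string
#define DEMO_STRINGIZE(X) #X

// Hypothetical enumeration standing in for TIMEMORY_COMPONENT
enum demo_component_id { DEMO_WALL_CLOCK = 0, DEMO_COMPONENTS_END };

struct wall_clock {};

// Primary template: an unspecialized type matches nothing
template <typename Tp>
struct properties
{
    static constexpr demo_component_id value = DEMO_COMPONENTS_END;
    static bool matches(const std::string&) { return false; }
};

// Specialization written by hand instead of via a macro
template <>
struct properties<wall_clock>
{
    static constexpr demo_component_id value = DEMO_WALL_CLOCK;
    static const char* enum_string() { return DEMO_STRINGIZE(DEMO_WALL_CLOCK); }
    static const char* id() { return "wall_clock"; }
    static std::set<std::string> ids() { return { "real_clock", "virtual_clock" }; }

    // True if key is the enum string, the string ID, or one of the aliases
    static bool matches(const std::string& key)
    {
        return key == enum_string() || key == id() || ids().count(key) > 0;
    }
};
```

A runtime lookup can then accept any of the three spellings, for example properties<wall_clock>::matches("real_clock"), and map it back to the same enumeration value.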
-
template<typename
Tp
, bool PlaceHolder
= concepts::is_placeholder<Tp>::value>
struct static_properties
¶ Provides three variants of a matches function for determining if a component is identified by a given string or enumeration value.
- tparam Tp
Component type
- tparam Placeholder
Whether or not the component type is a placeholder type that should be ignored during runtime initialization.
Subclassed by tim::component::properties< placeholder< nothing > >, tim::component::properties< Tp >
Timing Components
-
struct
cpu_clock
: public tim::component::base<cpu_clock>¶ This component extracts only the CPU time spent in both user- and kernel-mode. It is only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct
cpu_util
: public tim::component::base<cpu_util, std::pair<int64_t, int64_t>>¶ This computes the CPU utilization percentage for the calling process and child processes. It is only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct
kernel_mode_time
: public tim::component::base<kernel_mode_time, int64_t>¶ This is the total amount of time spent executing in kernel mode.
-
struct
monotonic_clock
: public tim::component::base<monotonic_clock>¶ clock that increments monotonically, tracking the time since an arbitrary point, and will continue to increment while the system is asleep.
-
struct
monotonic_raw_clock
: public tim::component::base<monotonic_raw_clock>¶ clock that increments monotonically, tracking the time since an arbitrary point like CLOCK_MONOTONIC. However, this clock is unaffected by frequency or time adjustments. It should not be compared to other system time sources.
-
struct
process_cpu_clock
: public tim::component::base<process_cpu_clock>¶ This clock measures the CPU time within the current process (excludes child processes). It is only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct
process_cpu_util
: public tim::component::base<process_cpu_util, std::pair<int64_t, int64_t>>¶ This computes the CPU utilization percentage for ONLY the calling process (excludes child processes). It is only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct
system_clock
: public tim::component::base<system_clock>¶ This component extracts only the CPU time spent in kernel-mode. It is only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct
thread_cpu_clock
: public tim::component::base<thread_cpu_clock>¶ This clock measures the CPU time within the current thread (excludes sibling/child threads). It is only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct
thread_cpu_util
: public tim::component::base<thread_cpu_util, std::pair<int64_t, int64_t>>¶ This computes the CPU utilization percentage for ONLY the calling thread (excludes sibling and child threads). It is only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct
user_clock
: public tim::component::base<user_clock>¶ This component extracts only the CPU time spent in user-mode. It is only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct
user_mode_time
: public tim::component::base<user_mode_time, int64_t>¶ This is the total amount of time spent executing in user mode.
-
struct
wall_clock
: public tim::component::base<wall_clock, int64_t>¶
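The repeated caution that a CPU time is only meaningful as a difference can be seen in a small sketch of a utilization calculation. The names time_sample, sample_now(), and cpu_utilization() are hypothetical, and std::clock() is assumed here as the source of process CPU time; the clocks timemory actually reads are not stated in this section.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// One reading of the process CPU time and the wall-clock time, in seconds
struct time_sample
{
    double cpu_sec;
    double wall_sec;
};

// Assumption: std::clock() for CPU time, steady_clock for wall time
inline time_sample sample_now()
{
    using namespace std::chrono;
    double cpu  = static_cast<double>(std::clock()) / CLOCKS_PER_SEC;
    double wall = duration_cast<duration<double>>(steady_clock::now().time_since_epoch()).count();
    return time_sample{ cpu, wall };
}

// Utilization (%) = CPU time consumed / wall time elapsed between two samples
inline double cpu_utilization(const time_sample& begin, const time_sample& end)
{
    double wall = end.wall_sec - begin.wall_sec;
    return (wall > 0.0) ? 100.0 * (end.cpu_sec - begin.cpu_sec) / wall : 0.0;
}
```

A single time_sample has an arbitrary origin for both fields, so neither number means anything alone. Only the pair of differences between a start sample and a stop sample yields an interval or a percentage.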
Resource Usage Components
-
struct
current_peak_rss
: public tim::component::base<current_peak_rss, std::pair<int64_t, int64_t>>¶ this struct extracts the absolute value of the high-water mark of the resident set size (RSS) at start and stop points. RSS is the current amount of memory in RAM.
-
struct
num_io_in
: public tim::component::base<num_io_in>¶ the number of times the file system had to perform input.
-
struct
num_io_out
: public tim::component::base<num_io_out>¶ the number of times the file system had to perform output.
-
struct
num_major_page_faults
: public tim::component::base<num_major_page_faults>¶ the number of page faults serviced that required I/O activity.
-
struct
num_minor_page_faults
: public tim::component::base<num_minor_page_faults>¶ the number of page faults serviced without any I/O activity; here I/O activity is avoided by reclaiming a page frame from the list of pages awaiting reallocation.
-
struct
page_rss
: public tim::component::base<page_rss, int64_t>¶ this struct measures the resident set size (RSS) currently allocated in pages of memory. Unlike the peak_rss, this value will fluctuate as memory gets freed and allocated
-
struct
peak_rss
: public tim::component::base<peak_rss>¶ this struct extracts the high-water mark (or a change in the high-water mark) of the resident set size (RSS). RSS is the current amount of memory in RAM. When used on a system with swap enabled, this value may fluctuate but should not on an HPC system.
-
struct
priority_context_switch
: public tim::component::base<priority_context_switch>¶ the number of times a context switch resulted due to a higher priority process becoming runnable or because the current process exceeded its time slice
-
struct
virtual_memory
: public tim::component::base<virtual_memory>¶ this struct extracts the virtual memory usage
-
struct
voluntary_context_switch
: public tim::component::base<voluntary_context_switch>¶ the number of times a context switch resulted due to a process voluntarily giving up the processor before its time slice was completed (usually to await availability of a resource).
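Several of the quantities in this section correspond to fields reported by the POSIX getrusage() call. The sketch below assumes getrusage() as the data source, which this section does not state, and the mapping from component names to rusage fields is inferred from the descriptions above. usage_snapshot and take_snapshot() are hypothetical names.

```cpp
#include <cassert>
#include <sys/resource.h>

// Hypothetical snapshot of the counters described in this section
struct usage_snapshot
{
    long peak_rss;        // ru_maxrss (kilobytes on Linux, bytes on macOS)
    long minor_faults;    // ru_minflt: page faults serviced without I/O
    long major_faults;    // ru_majflt: page faults that required I/O
    long io_in;           // ru_inblock: file system input operations
    long io_out;          // ru_oublock: file system output operations
    long voluntary_cs;    // ru_nvcsw: voluntary context switches
    long involuntary_cs;  // ru_nivcsw: context switches due to priority or time slice
};

// Reads the counters for the calling process
inline usage_snapshot take_snapshot()
{
    struct rusage ru = {};
    getrusage(RUSAGE_SELF, &ru);
    usage_snapshot snap;
    snap.peak_rss       = ru.ru_maxrss;
    snap.minor_faults   = ru.ru_minflt;
    snap.major_faults   = ru.ru_majflt;
    snap.io_in          = ru.ru_inblock;
    snap.io_out         = ru.ru_oublock;
    snap.voluntary_cs   = ru.ru_nvcsw;
    snap.involuntary_cs = ru.ru_nivcsw;
    return snap;
}
```

Taking one snapshot in start() and another in stop() and subtracting the fields would give per-region values, with the high-water mark being the exception since it can only stay the same or grow.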
I/O Components
-
struct
read_bytes
: public tim::component::base<read_bytes, std::pair<int64_t, int64_t>>¶ I/O counter for bytes read. Attempt to count the number of bytes which this process really did cause to be fetched from the storage layer. Done at the submit_bio() level, so it is accurate for block-backed filesystems.
-
struct
read_char
: public tim::component::base<read_char, std::pair<int64_t, int64_t>>¶ I/O counter for chars read. The number of bytes which this task has caused to be read from storage. This is simply the sum of bytes which this process passed to read() and pread(). It includes things like tty IO and it is unaffected by whether or not actual physical disk IO was required (the read might have been satisfied from pagecache)
-
struct
written_bytes
: public tim::component::base<written_bytes, std::array<int64_t, 2>>¶ I/O counter for bytes written. Attempt to count the number of bytes which this process caused to be sent to the storage layer. This is done at page-dirtying time.
-
struct
written_char
: public tim::component::base<written_char, std::array<int64_t, 2>>¶ I/O counter for chars written. The number of bytes which this task has caused, or shall cause to be written to disk. Similar caveats apply here as with tim::component::read_char (rchar).
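The four counters above use the same terminology as the Linux /proc/[pid]/io file (rchar, wchar, read_bytes, write_bytes). Assuming that file is the data source, which is an inference and Linux-specific, a minimal parser could look like the following. parse_io_counters() is a hypothetical helper that accepts any input stream so it can be exercised without touching /proc.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parses "key: value" lines such as those found in /proc/self/io
inline std::map<std::string, int64_t> parse_io_counters(std::istream& in)
{
    std::map<std::string, int64_t> counters;
    std::string                    key;
    int64_t                        value = 0;
    while(in >> key >> value)
    {
        // drop the trailing ':' from the key
        if(!key.empty() && key.back() == ':')
            key.pop_back();
        counters[key] = value;
    }
    return counters;
}
```

On Linux, passing a std::ifstream opened on /proc/self/io at the start and stop points and subtracting the two results would give the bytes read or written within the measured region.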
User Bundle Components
Timemory provides the user_bundle component as a generic component bundler
that the user can use to insert components at runtime. This component is
heavily used when mapping timemory to languages other than C++. Timemory
implements many specializations of this template class for various tools.
For example, user_mpip_bundle is the bundle used by the MPI wrappers,
user_profiler_bundle is used by the Python function profiler,
user_trace_bundle is used by the dynamic instrumentation tool timemory-run and
the Python line tracing profiler, etc. These specializations are
all individually configurable and it is recommended that applications create
their own specialization specific to their project. This will ensure that
the desired set of components configured by your application will not be
affected by a third-party library configuring their own set of components.
The general design is that each user-bundle:
- Has its own unique environment variable for exclusive configuration, usually "TIMEMORY_<LABEL>_COMPONENTS", e.g.:
  - "TIMEMORY_TRACE_COMPONENTS" for user_trace_bundle
  - "TIMEMORY_MPIP_COMPONENTS" for user_mpip_bundle
- If the unique environment variable is set, only the components in the variable are used
  - Thus making the bundle uniquely configurable
- If the unique environment variable is not set, it searches one or more backup environment variables, the last of which being "TIMEMORY_GLOBAL_COMPONENTS"
  - Thus, if no specific environment variables are set, all user bundles collect the components specified in "TIMEMORY_GLOBAL_COMPONENTS"
- If the unique environment variable is set to "none", it terminates searching the backup environment variables
  - Thus, "TIMEMORY_GLOBAL_COMPONENTS" can be set but the user can suppress a specific bundle from being affected by this configuration
- If the unique environment variable contains "fallthrough", it will continue adding the components specified by the backup environment variables
  - Thus, the components specified in "TIMEMORY_GLOBAL_COMPONENTS" and "TIMEMORY_<LABEL>_COMPONENTS" will be added
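The lookup rules above amount to a short search over an ordered list of variables. The sketch below models them against an in-memory map rather than the real process environment so that it can be tested deterministically. resolve_components(), split_list(), and env_map are hypothetical names, and the comma delimiter is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using env_map = std::map<std::string, std::string>;

// Splits a comma-separated list (the delimiter is an assumption)
inline std::vector<std::string> split_list(const std::string& text)
{
    std::vector<std::string> out;
    std::istringstream       iss(text);
    std::string              item;
    while(std::getline(iss, item, ','))
        if(!item.empty())
            out.push_back(item);
    return out;
}

// variables: the unique variable first, followed by the backup variables
inline std::vector<std::string>
resolve_components(const env_map& env, const std::vector<std::string>& variables)
{
    std::vector<std::string> result;
    for(const auto& name : variables)
    {
        auto itr = env.find(name);
        if(itr == env.end() || itr->second.empty())
            continue;  // not set: try the next backup variable
        if(itr->second == "none")
            break;  // explicitly disabled: stop searching
        bool fallthrough = false;
        for(const auto& entry : split_list(itr->second))
        {
            if(entry == "fallthrough")
                fallthrough = true;
            else
                result.push_back(entry);
        }
        if(!fallthrough)
            break;  // exclusive configuration: ignore the backups
    }
    return result;
}
```

For example, with "TIMEMORY_TRACE_COMPONENTS" set to "cpu_clock,fallthrough" and "TIMEMORY_GLOBAL_COMPONENTS" set to "wall_clock", the function returns both components, whereas dropping "fallthrough" returns only cpu_clock.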
-
template<size_t
Idx
, typename Tag
>
struct user_bundle
: public tim::component::base<user_bundle<Idx, Tag>, void>, public tim::concepts::runtime_configurable, private tim::component::internal::user_bundle¶
Third-Party Interface Components
-
struct
allinea_map
: public tim::component::base<allinea_map, void>, private tim::policy::instance_tracker<allinea_map, false>¶ Controls the AllineaMap sampling profiler.
-
struct
caliper_marker
: public tim::component::base<caliper_marker, void>, public tim::component::caliper_common¶ Standard marker for the Caliper Performance Analysis Toolbox.
-
struct
caliper_config
: public tim::component::base<caliper_config, void>, private tim::policy::instance_tracker<caliper_config, false>¶ Component which provides the Caliper cali::ConfigManager.
-
struct
caliper_loop_marker
: public tim::component::base<caliper_loop_marker, void>, public tim::component::caliper_common¶ Loop marker for the Caliper Performance Analysis Toolbox.
-
struct
craypat_counters
: public tim::component::base<craypat_counters, std::vector<unsigned long>>¶ Retrieves the names and value of any counter events that have been set to count on the hardware category.
-
struct
craypat_flush_buffer
: public tim::component::base<craypat_flush_buffer, unsigned long>¶ Writes all the recorded contents in the data buffer. Returns the number of bytes flushed.
-
struct
craypat_heap_stats
: public tim::component::base<craypat_heap_stats, void>¶ Dumps the craypat heap statistics.
-
struct
craypat_record
: public tim::component::base<craypat_record, void>, private tim::policy::instance_tracker<craypat_record>¶ Provides scoping for the CrayPAT profiler. Global initialization stops the profiler; the first call to start() starts the profiler again on the calling thread. Instance counting is enabled per-thread and each call to start() increments the counter. All calls to stop() have no effect until the counter reaches zero, at which point the profiler is turned off again.
-
struct
craypat_region
: public tim::component::base<craypat_region, void>, private tim::policy::instance_tracker<craypat_region, false>¶ Adds a region label to the CrayPAT profiling output.
-
struct
gperftools_cpu_profiler
: public tim::component::base<gperftools_cpu_profiler, void>¶
-
struct
gperftools_heap_profiler
: public tim::component::base<gperftools_heap_profiler, void>¶
-
struct
likwid_marker
: public tim::component::base<likwid_marker, void>¶ Provides likwid perfmon marker forwarding. Requires.
-
struct
likwid_nvmarker
: public tim::component::base<likwid_nvmarker, void>¶ Provides likwid nvmon marker forwarding. Requires.
-
template<typename
Api
>
struct ompt_handle
: public tim::component::base<ompt_handle<Api>, void>, private tim::policy::instance_tracker<ompt_handle<Api>>¶
-
struct
tau_marker
: public tim::component::base<tau_marker, void>¶ Forwards timemory labels to the TAU (Tuning and Analysis Utilities)
-
struct
vtune_event
: public tim::component::base<vtune_event, void>¶ Implements
__itt_event
-
struct
vtune_frame
: public tim::component::base<vtune_frame, void>¶ Implements
__itt_domain
-
struct
vtune_profiler
: public tim::component::base<vtune_profiler, void>, private tim::policy::instance_tracker<vtune_profiler, false>¶ Implements __itt_pause() and __itt_resume() to control where the vtune profiler is active.
Hardware Counter Components
-
template<int...
EventTypes
>
struct papi_tuple
: public tim::component::base<papi_tuple<EventTypes...>, std::array<long long, sizeof...(EventTypes)>>, private tim::policy::instance_tracker<papi_tuple<EventTypes...>>, private tim::component::papi_common¶ This component is useful for bundling together a fixed set of hardware counter identifiers which require no runtime configuration.
// the "Instructions" alias below explicitly collects the total instructions,
// the number of load instructions, the number of store instructions
using Instructions = papi_tuple<PAPI_TOT_INS, PAPI_LD_INS, PAPI_SR_INS>;

Instructions inst{};
inst.start();
...
inst.stop();
std::vector<double> data = inst.get();
- tparam EventTypes
Compile-time constant list of PAPI event identifiers
-
template<typename
RateT
, int... EventTypes
>
struct papi_rate_tuple
: public tim::component::base<papi_rate_tuple<RateT, EventTypes...>, std::pair<papi_tuple<EventTypes...>, RateT>>, private tim::component::papi_common¶ This component pairs a tim::component::papi_tuple with a component which will provide an interval over which the hardware counters will be reported, e.g. if RateT is tim::component::wall_clock, the reported values will be the hardware counters w.r.t. the wall-clock time. If RateT is tim::component::cpu_clock, the reported values will be the hardware counters w.r.t. the cpu time.
// the "Instructions" alias below explicitly collects the total instructions per second,
// the number of load instructions per second, the number of store instructions per second
using Instructions = papi_rate_tuple<wall_clock, PAPI_TOT_INS, PAPI_LD_INS, PAPI_SR_INS>;

Instructions inst{};
inst.start();
...
inst.stop();
std::vector<double> data = inst.get();
- tparam RateT
Component whose value will be the divisor for all the hardware counters
- tparam EventTypes
Compile-time constant list of PAPI event identifiers
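The arithmetic behind a rate component is a division of each counter by the interval reported by RateT. The sketch below shows only that step and does not use PAPI: to_rates() is a hypothetical helper, and the zero-interval guard is a choice made for this sketch, not documented behavior.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Divides each hardware counter value by the measured interval (e.g. seconds)
template <std::size_t N>
std::array<double, N> to_rates(const std::array<long long, N>& counts, double interval)
{
    std::array<double, N> rates{};
    for(std::size_t i = 0; i < N; ++i)
        rates[i] = (interval > 0.0) ? counts[i] / interval : 0.0;
    return rates;
}
```

For example, 2000 total instructions counted over a 0.5 second wall-clock interval are reported as 4000 instructions per second.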
-
template<size_t
MaxNumEvents
>
struct papi_array
: public tim::component::base<papi_array<MaxNumEvents>, std::array<long long, MaxNumEvents>>, private tim::policy::instance_tracker<papi_array<MaxNumEvents>>, private tim::component::papi_common¶
-
struct
papi_vector
: public tim::component::base<papi_vector, std::vector<long long>>, private tim::policy::instance_tracker<papi_vector>, private tim::component::papi_common¶
Miscellaneous Components
-
template<typename ...
Types
>
struct cpu_roofline
: public tim::component::base<cpu_roofline<Types...>, std::pair<std::vector<long long>, double>>¶ Combines hardware counters and timers and executes the empirical roofline toolkit during application termination to estimate the peak possible performance for the machine.
- tparam Types
Variadic list of data types for roofline analysis
-
typedef cpu_roofline<double>
tim::component
::
cpu_roofline_dp_flops
¶ A specialization of tim::component::cpu_roofline for 64-bit floating point operations.
-
using
tim::component
::
cpu_roofline_flops
= cpu_roofline<float, double>¶
-
typedef cpu_roofline<float>
tim::component
::
cpu_roofline_sp_flops
¶ A specialization of tim::component::cpu_roofline for 32-bit floating point operations.
GPU Components
-
struct
cuda_event
: public tim::component::base<cuda_event, float>¶ Records the time interval between two points in a CUDA stream. Less accurate than ‘cupti_activity’ for kernel timing but does not require linking to the CUDA driver.
-
struct
cupti_activity
: public tim::component::base<cupti_activity, intmax_t>¶ CUPTI activity tracing component for high-precision kernel timing. For low-precision kernel timing, use tim::component::cuda_event component.
-
struct
cupti_counters
: public tim::component::base<cupti_counters, cupti::profiler::results_t>¶ NVprof-style hardware counters via the CUpti callback API. Collecting these hardware counters has a higher overhead than the new CUpti Profiling API (tim::component::cupti_profiler). However, there are currently some issues with nesting the Profiling API and it is currently recommended to use this component for NVIDIA hardware counters in timemory. The callback API / NVprof is quite specific about the distinction between an “event” and a “metric”. For your convenience, timemory removes this distinction and events can be specified arbitrarily as metrics and vice-versa and this component will sort them into their appropriate category. For the full list of the available events/metrics, use
timemory-avail -H
from the command-line.
-
template<typename ...
Types
>
struct gpu_roofline
: public tim::component::base<gpu_roofline<Types...>, std::tuple<cupti_activity::value_type, cupti_counters::value_type>>¶ Combines hardware counters and timers and executes the empirical roofline toolkit during application termination to estimate the peak possible performance for the machine.
- tparam Types
Variadic list of data types for roofline analysis
-
typedef gpu_roofline<double>
tim::component
::
gpu_roofline_dp_flops
¶ A specialization of tim::component::gpu_roofline for 64-bit floating point operations.
-
using
tim::component
::
gpu_roofline_flops
= gpu_roofline<float, double>¶
-
typedef gpu_roofline<cuda::fp16_t>
tim::component
::
gpu_roofline_hp_flops
¶ A specialization of tim::component::gpu_roofline for 16-bit floating point operations (depending on availability).
-
typedef gpu_roofline<float>
tim::component
::
gpu_roofline_sp_flops
¶ A specialization of tim::component::gpu_roofline for 32-bit floating point operations.
-
struct
tim::component
::
nvtx_marker
: public tim::component::base<nvtx_marker, void>¶ Inserts NVTX markers with the current timemory prefix. The default color scheme is a round-robin of red, blue, green, yellow, purple, cyan, pink, and light_green. These colors.
Public Functions
-
inline explicit
nvtx_marker
(const nvtx::color::color_t &_color)¶ construct with a specific color
-
inline explicit
nvtx_marker
(cuda::stream_t _stream)¶ construct with a specific CUDA stream
-
inline
nvtx_marker
(const nvtx::color::color_t &_color, cuda::stream_t _stream)¶ construct with a specific color and CUDA stream
-
inline void
start
()¶ start an nvtx range. Equivalent to
nvtxRangeStartEx
-
inline void
stop
()¶ stop the nvtx range. Equivalent to nvtxRangeEnd. Depending on settings::nvtx_marker_device_sync() this will either call cudaDeviceSynchronize() or cudaStreamSynchronize(m_stream) before stopping the range.
-
inline void
mark_begin
()¶ asynchronously add a marker. Equivalent to
nvtxMarkA
-
inline void
mark_end
()¶ asynchronously add a marker. Equivalent to
nvtxMarkA
-
inline void
mark_begin
(cuda::stream_t _stream)¶ asynchronously add a marker for a specific stream. Equivalent to
nvtxMarkA
-
inline void
mark_end
(cuda::stream_t _stream)¶ asynchronously add a marker for a specific stream. Equivalent to
nvtxMarkA
-
inline void
set_stream
(cuda::stream_t _stream)¶ set the current CUDA stream
-
inline void
set_color
(nvtx::color::color_t _color)¶ set the current color
Data Tracking Components
-
template<typename
InpT
, typename Tag
>
struct tim::component
::
data_tracker
: public tim::component::base<data_tracker<InpT, Tag>, InpT>¶ This component is provided to facilitate data tracking. The first template parameter is the type of data to be tracked, the second is a custom tag for differentiating trackers which handle the same data types but record different high-level data.
Usage:
// declarations
struct myproject {};

using itr_tracker_type = data_tracker<uint64_t, myproject>;
using err_tracker_type = data_tracker<double, myproject>;

// add statistics capabilities
TIMEMORY_STATISTICS_TYPE(itr_tracker_type, int64_t)
TIMEMORY_STATISTICS_TYPE(err_tracker_type, double)

// set the label and descriptions
TIMEMORY_METADATA_SPECIALIZATION(
    itr_tracker_type, "myproject_iterations", "short desc", "long description")
TIMEMORY_METADATA_SPECIALIZATION(
    err_tracker_type, "myproject_convergence", "short desc", "long description")

// this is the generic bundle pairing a timer with an iteration tracker
// using this and not updating the iteration tracker will create entries
// in the call-graph with zero iterations.
using bundle_t = tim::auto_tuple<wall_clock, itr_tracker_type>;

// this is a dedicated bundle for adding data-tracker entries. This style
// can also be used with the iteration tracker or you can bundle
// both trackers together. The auto_tuple will call start on construction
// and stop on destruction so one can construct a nameless temporary of
// this bundle type and call store(...) on the nameless tmp. This will
// ensure that the statistics are updated for each entry
//
using err_bundle_t = tim::auto_tuple<err_tracker_type>;

// usage in a function is implied below
double       err       = std::numeric_limits<double>::max();
const double tolerance = 1.0e-6;

bundle_t t("iteration_time");

while(err > tolerance)
{
    // store the starting error
    double initial_err = err;

    // add 1 for each iteration. Stats only updated when t is destroyed or t.stop() is
    // called
    t.store(std::plus<uint64_t>{}, 1);

    // ... do something ...

    // construct a nameless temporary which records the change in the error and
    // update the statistics <-- "foo" will have mean/min/max/stddev of the
    // error
    err_bundle_t{ "foo" }.store(err - initial_err);

    // NOTE: std::plus is used with t above bc it has multiple components so std::plus
    // helps ensure 1 doesn't get added to some other component with `store(int)`
    // In above err_bundle_t, there is only one component so there is no concern.
}
When creating new data trackers, it is recommended to have this in a header:
TIMEMORY_DECLARE_EXTERN_COMPONENT(custom_data_tracker_t, true, data_type)
And this in one source file (preferably one that is not re-compiled often):
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(custom_data_tracker_t, true, data_type)
TIMEMORY_INITIALIZE_STORAGE(custom_data_tracker_t)
where custom_data_tracker_t is the custom data tracker type (or an alias to the type) and data_type is the data type being tracked.
Public Functions
- inline auto get() const
  get the data in the final form after unit conversion
- inline auto get_display() const
  get the data in a form suitable for display
- inline auto get_secondary() const
  map of the secondary entries. When TIMEMORY_ADD_SECONDARY is enabled, the contents of this map will be added as direct children of the current node in the call-graph.
- template<typename T> void store(T &&val, enable_if_acceptable_t<T, int> = 0)
  store some data. Uses tim::data::handler for the type.
- template<typename T> void store(handler_type&&, T &&val, enable_if_acceptable_t<T, int> = 0)
  overload which takes a handler to ensure proper overload resolution
- template<typename FuncT, typename T> auto store(FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0) -> decltype(std::declval<handler_type>().store(*this, std::forward<FuncT>(f), std::forward<T>(val)), void())
  overload which uses a lambda to bypass the default behavior of how the handler updates the values
- template<typename FuncT, typename T> auto store(handler_type&&, FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0) -> decltype(std::declval<handler_type>().store(*this, std::forward<FuncT>(f), std::forward<T>(val)), void())
  overload which uses a lambda to bypass the default behavior of how the handler updates the values and takes a handler to ensure proper overload resolution
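The functor overloads of store let the caller decide how a new value is combined with the one already held. The following is only a minimal sketch of that idea: mini_tracker and count_iterations are hypothetical names, and the default "overwrite" behavior is an assumption for illustration, not a statement about what tim::data::handler does.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for a data tracker; NOT timemory's implementation.
template <typename T>
struct mini_tracker
{
    T value{};

    // default behavior assumed for this sketch: overwrite the stored value
    void store(T v) { value = v; }

    // functor overload: the caller decides how the old and new values combine
    template <typename FuncT>
    void store(FuncT&& f, T v)
    {
        value = std::forward<FuncT>(f)(value, v);
    }
};

// Count loop iterations by adding 1 per pass, in the spirit of
// t.store(std::plus<uint64_t>{}, 1) from the usage example.
inline std::uint64_t count_iterations(int n)
{
    assert(n >= 0);  // a negative iteration count makes no sense here
    mini_tracker<std::uint64_t> itr;
    for(int i = 0; i < n; ++i)
        itr.store(std::plus<std::uint64_t>{}, 1);
    return itr.value;
}
```

Any binary callable works in place of std::plus, e.g. a lambda returning the larger of its two arguments to keep a running maximum.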
- template<typename T> void mark_begin(T &&val, enable_if_acceptable_t<T, int> = 0)
  The combination of mark_begin(...) and mark_end(...) can be used to store some initial data which may be needed later. When mark_end(...) is called, the value is updated with the difference of the value provided to mark_end and the temporary stored during mark_begin.
- template<typename T> void mark_begin(handler_type&&, T &&val, enable_if_acceptable_t<T, int> = 0)
  overload which takes a handler to ensure proper overload resolution
- template<typename FuncT, typename T> void mark_begin(FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0)
  overload which uses a lambda to bypass the default behavior of how the handler updates the values
- template<typename FuncT, typename T> void mark_begin(handler_type&&, FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0)
  overload which uses a lambda to bypass the default behavior of how the handler updates the values and takes a handler to ensure proper overload resolution
- template<typename T> void mark_end(T &&val, enable_if_acceptable_t<T, int> = 0)
  The combination of mark_begin(...) and mark_end(...) can be used to store some initial data which may be needed later. When mark_end(...) is called, the value is updated with the difference of the value provided to mark_end and the temporary stored during mark_begin. It may be valid to call mark_end without calling mark_begin but the result will effectively be a more expensive version of calling store.
- template<typename T> void mark_end(handler_type&&, T &&val, enable_if_acceptable_t<T, int> = 0)
  overload which takes a handler to ensure proper overload resolution
- template<typename FuncT, typename T> void mark_end(FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0)
  overload which uses a lambda to bypass the default behavior of how the handler updates the values
- template<typename FuncT, typename T> void mark_end(handler_type&&, FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0)
  overload which uses a lambda to bypass the default behavior of how the handler updates the values and takes a handler to ensure proper overload resolution
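The mark_begin/mark_end pairing can be pictured as "remember a starting value, then record how far the final value moved from it". Below is a small conceptual sketch using a hypothetical mini_delta_tracker; the accumulator field is an assumption modeled on the wall_clock illustration earlier in this document, not a description of data_tracker's internals.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in illustrating the begin/end pattern; NOT timemory's code.
struct mini_delta_tracker
{
    std::int64_t value = 0;  // most recent difference
    std::int64_t accum = 0;  // sum of all recorded differences (assumed)
    std::int64_t begin = 0;  // temporary held between mark_begin and mark_end

    void mark_begin(std::int64_t v) { begin = v; }

    void mark_end(std::int64_t v)
    {
        value = v - begin;  // difference between the end and begin values
        accum += value;
    }
};

// Record how much a counter grew across two consecutive regions.
inline mini_delta_tracker track_growth()
{
    mini_delta_tracker t;
    t.mark_begin(100);
    t.mark_end(250);  // difference of 150
    t.mark_begin(250);
    t.mark_end(300);  // difference of 50
    assert(t.accum == t.value + 150);
    return t;
}
```

In this sketch, calling mark_end without a preceding mark_begin subtracts the zero-initialized temporary, which is why it behaves like a plain store.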
- template<typename T> this_type *add_secondary(const std::string &_key, T &&val, enable_if_acceptable_t<T, int> = 0)
  add a secondary value to the current node in the call-graph. When TIMEMORY_ADD_SECONDARY is enabled, the contents of this map will be added as direct children of the current node in the call-graph. This is useful for finer-grained details that might not always be desirable to display
- template<typename T> this_type *add_secondary(const std::string &_key, handler_type &&h, T &&val, enable_if_acceptable_t<T, int> = 0)
  overload which takes a handler to ensure proper overload resolution
- template<typename FuncT, typename T> this_type *add_secondary(const std::string &_key, FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0)
  overload which uses a lambda to bypass the default behavior of how the handler updates the values
- template<typename FuncT, typename T> this_type *add_secondary(const std::string &_key, handler_type &&h, FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0)
  overload which uses a lambda to bypass the default behavior of how the handler updates the values and takes a handler to ensure proper overload resolution
- inline void set_value(const value_type &v)
  set the current value
- inline void set_value(value_type &&v)
  set the current value via move
Public Static Functions
- static inline std::string &label()
  a reference is returned here so that it can be easily updated
- static std::string &description()
  a reference is returned here so that it can be easily updated
- static inline auto &get_unit()
  this returns a reference so that it can be easily modified
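Returning a reference from a static accessor is what makes "easily updated" possible: the caller assigns through the accessor instead of needing a separate setter. A minimal sketch of the pattern, using a hypothetical mini_component type (the function-local static is one common way to implement this and is an assumption here, not a claim about data_tracker):

```cpp
#include <cassert>
#include <string>

// Hypothetical type; only the accessor pattern is the point here.
struct mini_component
{
    // function-local static: initialized once on first use, shared afterwards
    static std::string& label()
    {
        static std::string _instance = "mini_component";
        return _instance;
    }
};

// Assign a new label through the returned reference and read it back.
inline std::string relabel(const std::string& name)
{
    assert(!name.empty());           // an empty label would be unhelpful in output
    mini_component::label() = name;  // write through the reference
    return mini_component::label();
}
```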
- typedef data_tracker<intmax_t, TIMEMORY_API> tim::component::data_tracker_integer
- typedef data_tracker<size_t, TIMEMORY_API> tim::component::data_tracker_unsigned
- using tim::component::data_tracker_floating = data_tracker<double, TIMEMORY_API>
Function Wrapping Components¶
- template<size_t Nt, typename BundleT, typename DiffT> struct tim::component::gotcha : public tim::component::base<gotcha<Nt, BundleT, DiffT>, void>, public tim::concepts::external_function_wrapper
  The gotcha component rewrites the global offset table such that calling the wrapped function actually invokes either a version of the function which is wrapped by timemory instrumentation or a timemory component with a function call operator (operator()) whose return value and arguments exactly match the original function. This component is only available on Linux and can only be applied to external, dynamically-linked functions (i.e. functions defined in a shared library). If the BundleT template parameter is a non-empty component bundle, this component will surround the original function call with:
  bundle_type _obj{ "<NAME-OF-ORIGINAL-FUNCTION>" };
  _obj.construct(_args...);
  _obj.start();
  _obj.audit("<NAME-OF-ORIGINAL-FUNCTION>", _args...);
  Ret _ret = <CALL-ORIGINAL-FUNCTION>
  _obj.audit("<NAME-OF-ORIGINAL-FUNCTION>", _ret);
  _obj.stop();
- tparam Nt
  Max number of functions which will be wrapped by this component
- tparam BundleT
  Component bundle to wrap around the function(s)
- tparam DiffT
  Differentiator type to distinguish different sets of wrappers with identical values of Nt and BundleT (or to provide the function call operator if replacing functions instead of wrapping functions)
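The start/audit/call/audit/stop sequence described for a non-empty BundleT can be mimicked without GOTCHA by an ordinary function template. The sketch below is illustrative only: logging_bundle, wrap_call, and twice are hypothetical names, the audit calls omit the arguments and return value, and nothing here touches the global offset table.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical bundle that just logs which steps ran; NOT a timemory type.
struct logging_bundle
{
    std::vector<std::string> events;

    void start() { events.push_back("start"); }
    void stop() { events.push_back("stop"); }
    void audit(const std::string& name) { events.push_back("audit:" + name); }
};

// Surround a call to `func` with start/audit ... audit/stop and return its result.
template <typename BundleT, typename Ret, typename... Args>
Ret wrap_call(BundleT& obj, const std::string& name, Ret (*func)(Args...), Args... args)
{
    assert(func != nullptr);  // there must be an original function to call
    obj.start();
    obj.audit(name);          // audit on entry (arguments omitted in this sketch)
    Ret ret = func(args...);  // call the original function
    obj.audit(name);          // audit on exit (return value omitted in this sketch)
    obj.stop();
    return ret;
}

// Example "original" function to be wrapped.
inline int twice(int x) { return 2 * x; }
```

The difference with the real component is that callers here must invoke wrap_call explicitly, whereas a GOT rewrite redirects existing call sites without source changes.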
If the BundleT template parameter is an empty variadic class, e.g. std::tuple<>, tim::component_tuple<>, etc., and the DiffT template parameter is a timemory component, the assumption is that the DiffT component has a function call operator which should replace the original function call. For example, void* malloc(size_t) can be replaced with a component providing void* operator()(size_t). The following replaces double exp(double) with a single-precision computation:
// replace 'double exp(double)'
struct exp_replace : base<exp_replace, void>
{
    double operator()(double value)
    {
        float result = expf(static_cast<float>(value));
        return static_cast<double>(result);
    }
};
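To see what such a replacement costs in accuracy, the call operator can be exercised on its own, outside of any GOTCHA machinery. This is a standalone sketch: exp_replace_sketch deliberately has no timemory base class, and exp_replacement_error is a hypothetical helper added for illustration.

```cpp
#include <cassert>
#include <cmath>

// Standalone version of the replacement idea: same signature as
// 'double exp(double)' but computed in single precision.
struct exp_replace_sketch
{
    double operator()(double value) const
    {
        float result = std::exp(static_cast<float>(value));  // float overload
        return static_cast<double>(result);
    }
};

// Absolute difference between the double-precision and single-precision results.
inline double exp_replacement_error(double value)
{
    assert(std::isfinite(value));  // the comparison is only meaningful for finite input
    return std::fabs(std::exp(value) - exp_replace_sketch{}(value));
}
```

The signature match is the important part: the call operator takes and returns exactly what the original function does, so existing callers are unaffected apart from the reduced precision.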
Example usage:
#include <timemory/timemory.hpp>

#include <cassert>
#include <cmath>
#include <cstdio>  // printf
#include <tuple>

using empty_tuple_t = std::tuple<>;
using base_bundle_t = tim::component_tuple<wall_clock, cpu_clock>;
using gotcha_wrap_t = tim::component::gotcha<2, base_bundle_t, void>;
using gotcha_repl_t = tim::component::gotcha<2, empty_tuple_t, exp_replace>;
using impl_bundle_t = tim::mpl::append_type_t<base_bundle_t,
                                              tim::type_list<gotcha_wrap_t, gotcha_repl_t>>;

void init_wrappers()
{
    // wraps the sin and cos math functions
    gotcha_wrap_t::get_initializer() = []() {
        TIMEMORY_C_GOTCHA(gotcha_wrap_t, 0, sin);  // index 0 wraps sin
        TIMEMORY_C_GOTCHA(gotcha_wrap_t, 1, cos);  // index 1 wraps cos
    };

    // replaces the 'exp' function which may be 'exp' in symbols table
    // or '__exp_finite' in symbols table (use `nm <binary>` to determine)
    gotcha_repl_t::get_initializer() = []() {
        TIMEMORY_C_GOTCHA(gotcha_repl_t, 0, exp);
        TIMEMORY_DERIVED_GOTCHA(gotcha_repl_t, 1, exp, "__exp_finite");
    };
}

// the following is useful to avoid having to call 'init_wrappers()' explicitly:
// use comma operator to call 'init_wrappers' and return true
static auto called_init_at_load = (init_wrappers(), true);

int main()
{
    assert(called_init_at_load == true);

    double angle = 45.0 * (M_PI / 180.0);

    impl_bundle_t _obj{ "main" };

    // gotcha wrappers not activated yet
    printf("cos(%f) = %f\n", angle, cos(angle));
    printf("sin(%f) = %f\n", angle, sin(angle));
    printf("exp(%f) = %f\n", angle, exp(angle));

    // gotcha wrappers are reference counted according to start/stop
    _obj.start();

    printf("cos(%f) = %f\n", angle, cos(angle));
    printf("sin(%f) = %f\n", angle, sin(angle));
    printf("exp(%f) = %f\n", angle, exp(angle));

    _obj.stop();

    // gotcha wrappers will be deactivated
    printf("cos(%f) = %f\n", angle, cos(angle));
    printf("sin(%f) = %f\n", angle, sin(angle));
    printf("exp(%f) = %f\n", angle, exp(angle));

    return 0;
}
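The example notes that the wrappers are reference counted according to start/stop. The general idea of a reference-counted on/off switch can be sketched as follows; wrapper_switch is a hypothetical class, and how timemory actually stores and applies this count is not shown in this document.

```cpp
#include <cassert>

// Hypothetical reference-counted switch; NOT how timemory stores its state.
class wrapper_switch
{
public:
    // each start() adds one holder of the "active" state
    void start() { ++m_count; }

    // each stop() releases one holder; the switch turns off at zero
    void stop()
    {
        assert(m_count > 0);  // stop() must pair with an earlier start()
        --m_count;
    }

    bool active() const { return m_count > 0; }

private:
    int m_count = 0;
};
```

With this scheme, nested bundles that each call start() keep the wrappers active until the outermost one calls stop().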
Public Static Functions
- static inline get_select_list_t &get_permit_list()
  when a permit list is provided, only these functions are wrapped by GOTCHA
- static inline get_select_list_t &get_reject_list()
  functions in the reject list are never wrapped by GOTCHA
- static inline void add_global_suppression(const std::string &func)
  add function names at runtime to suppress wrappers
- static inline auto get_ready()
  get an array of whether the wrappers are filled and ready
- static inline auto set_ready(bool val)
  set filled wrappers to array of ready values
- template<size_t N, typename Ret, typename ...Args> struct instrument
- struct tim::component::malloc_gotcha : public tim::component::base<malloc_gotcha, double>, public tim::concepts::external_function_wrapper
Public Functions
- struct memory_allocations : public tim::component::base<memory_allocations, void>, public tim::concepts::external_function_wrapper, private tim::policy::instance_tracker<memory_allocations, true>
  This component wraps malloc, calloc, free, and CUDA/HIP malloc/free via GOTCHA and tracks the number of bytes requested/freed in each call. This component is useful for detecting the locations where memory re-use would provide a performance benefit.
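The bookkeeping behind "bytes requested/freed" can be illustrated without GOTCHA by routing allocations through explicit wrapper functions. This is only a sketch under stated assumptions: allocation_stats, allocate, and release are hypothetical names, and the pointer-to-size map is one simple way to learn how many bytes a free call releases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical bookkeeping; explicit wrapper functions stand in for GOTCHA.
struct allocation_stats
{
    std::size_t bytes_requested = 0;
    std::size_t bytes_freed     = 0;
    std::unordered_map<void*, std::size_t> live;  // size of each live allocation

    void* allocate(std::size_t n)
    {
        void* p = std::malloc(n);
        if(p != nullptr)
        {
            bytes_requested += n;
            live[p] = n;
        }
        return p;
    }

    void release(void* p)
    {
        auto itr = live.find(p);
        if(itr != live.end())
        {
            bytes_freed += itr->second;  // look up how large the block was
            live.erase(itr);
        }
        std::free(p);
    }
};
```

A region that repeatedly requests and frees similar amounts shows up as large, matching totals, which is the kind of pattern where re-using a buffer could help.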
Base Components¶
- template<typename Tp, typename Value> struct tim::component::base : public tim::component::empty_base, private tim::component::base_state, private base_data_t<Tp, Value>, public tim::concepts::component
Public Types
- using EmptyT = std::tuple<>
- using dynamic_type = typename trait::dynamic_base<Tp>::type
- using statistics_policy = policy::record_statistics<Tp, Value>
- using fmtflags = std::ios_base::fmtflags
Public Functions
- ~base() = default
- void set_started()
  store that start has been called
- void set_stopped()
  store that stop has been called
- void reset()
  reset the values
- void get(void *&ptr, size_t _typeid_hash) const
  assign type to a pointer
- inline auto get() const
  retrieve the current measurement value in the units for the type
- inline auto get_display() const
  retrieve the current measurement value in the units for the type in a format that can be piped to the output stream operator (‘<<’)
- template<typename Up = Tp> void print(std::ostream&, enable_if_t<trait::uses_value_storage<Up, Value>::value, int> = 0) const
- template<typename Up = Tp> void print(std::ostream&, enable_if_t<!trait::uses_value_storage<Up, Value>::value, long> = 0) const
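The two print overloads differ only in an enable_if_t parameter, which is a common way to pick one of two implementations from a trait at compile time. A minimal sketch of that pattern follows; has_value, printer, and describe are hypothetical names standing in for the real trait and class, and the printed strings are placeholders.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical trait: true when T is a type that holds a printable value.
template <typename T>
struct has_value : std::integral_constant<bool, !std::is_void<T>::value>
{};

template <typename T>
struct printer
{
    // participates in overload resolution only when the trait is true
    template <typename U = T>
    void print(std::ostream& os,
               typename std::enable_if<has_value<U>::value, int>::type = 0) const
    {
        os << "value";
    }

    // participates in overload resolution only when the trait is false
    template <typename U = T>
    void print(std::ostream& os,
               typename std::enable_if<!has_value<U>::value, long>::type = 0) const
    {
        os << "no value";
    }
};

// Report which overload was selected for a given T.
template <typename T>
std::string describe()
{
    std::ostringstream oss;
    printer<T>{}.print(oss);
    return oss.str();
}
```

The defaulted template parameter U makes the condition dependent, so the disabled overload is silently removed instead of causing a compile error.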
- template<typename Archive, typename Up = Type, enable_if_t<!trait::custom_serialization<Up>::value, int> = 0> void load(Archive &ar, unsigned int)
  serialization load (input)
- template<typename Archive, typename Up = Type, enable_if_t<!trait::custom_serialization<Up>::value, int> = 0> void save(Archive &ar, unsigned int version) const
  serialization store (output)
- inline int64_t get_laps() const
  get the number of measurements
- inline auto get_iterator() const
- inline void set_laps(int64_t v)
- inline void set_iterator(graph_iterator itr)
- inline decltype(auto) load()
- inline decltype(auto) load() const
- inline bool get_depth_change() const
- inline bool get_is_flat() const
- inline bool get_is_invalid() const
- inline bool get_is_on_stack() const
- inline bool get_is_running() const
- inline bool get_is_transient() const
- inline void set_depth_change(bool v)
- inline void set_is_flat(bool v)
- inline void set_is_invalid(bool v)
- inline void set_is_on_stack(bool v)
- inline void set_is_running(bool v)
- inline void set_is_transient(bool v)
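Components derive from base<Tp, Value> with themselves as the first template argument, i.e. the curiously recurring template pattern (CRTP). The sketch below shows how a CRTP base could keep an is-running flag and a lap count around a derived type's measurement. All names (mini_base, call_counter, start_impl, stop_impl) are hypothetical and the logic is an assumption for illustration; timemory's own bookkeeping may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CRTP base; NOT timemory's base<Tp, Value>.
template <typename Tp>
struct mini_base
{
    void start()
    {
        if(m_is_running)
            return;  // already running: nothing to do
        m_is_running = true;
        static_cast<Tp*>(this)->start_impl();  // derived type takes the measurement
    }

    void stop()
    {
        if(!m_is_running)
            return;  // stop without start: nothing to do
        static_cast<Tp*>(this)->stop_impl();
        m_is_running = false;
        ++m_laps;  // one completed start/stop pair
    }

    bool         get_is_running() const { return m_is_running; }
    std::int64_t get_laps() const { return m_laps; }

private:
    bool         m_is_running = false;
    std::int64_t m_laps       = 0;
};

// Derived component that counts how many measurements completed.
struct call_counter : mini_base<call_counter>
{
    std::int64_t value = 0;

    void start_impl() {}
    void stop_impl() { ++value; }
};
```

Because the base knows the derived type at compile time, the calls to start_impl and stop_impl need no virtual dispatch.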
Public Static Functions
- template<typename Vp, typename Up = Tp, enable_if_t<trait::sampler<Up>::value, int> = 0> static void add_sample(Vp&&)
- static base_storage_type *get_storage()
- template<typename Up = Type, typename UnitT = typename trait::units<Up>::type, enable_if_t<std::is_same<UnitT, int64_t>::value, int> = 0> static int64_t unit()
- template<typename Up = Type, typename UnitT = typename trait::units<Up>::display_type, enable_if_t<std::is_same<UnitT, std::string>::value, int> = 0> static std::string display_unit()
- template<typename Up = Type, typename UnitT = typename trait::units<Up>::type, enable_if_t<std::is_same<UnitT, int64_t>::value, int> = 0> static int64_t get_unit()
- template<typename Up = Type, typename UnitT = typename trait::units<Up>::display_type, enable_if_t<std::is_same<UnitT, std::string>::value, int> = 0> static std::string get_display_unit()
- static short get_width()
- static short get_precision()
- static std::string label()
- static std::string description()
- static std::string get_label()
- static std::string get_description()
Public Static Attributes
- static constexpr bool is_component = true
- static constexpr bool timing_category_v = trait::is_timing_category<Type>::value
- static constexpr bool memory_category_v = trait::is_memory_category<Type>::value
- static constexpr bool timing_units_v = trait::uses_timing_units<Type>::value
- static constexpr bool memory_units_v = trait::uses_memory_units<Type>::value
- static constexpr bool percent_units_v = trait::uses_percent_units<Type>::value
- static constexpr auto ios_fixed = std::ios_base::fixed
- static constexpr auto ios_decimal = std::ios_base::dec
- static constexpr auto ios_showpoint = std::ios_base::showpoint
- static const fmtflags format_flags = ios_fixed | ios_decimal | ios_showpoint
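The format_flags value is the combination of three standard std::ios_base flags. Their effect on an output stream can be seen with the standard library alone; format_value below is a hypothetical helper, and how timemory combines these flags with get_width() and get_precision() is not shown here.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Format a value using fixed | dec | showpoint and a chosen precision.
inline std::string format_value(double v, int precision)
{
    assert(precision >= 0);  // negative precision is not meaningful
    std::ostringstream oss;
    // fixed: no scientific notation; dec: base 10; showpoint: keep trailing zeros
    oss.setf(std::ios_base::fixed | std::ios_base::dec | std::ios_base::showpoint);
    oss.precision(precision);
    oss << v;
    return oss.str();
}
```

With these flags, a value such as 2.0 at precision 3 is rendered with its trailing zeros instead of being shortened to "2".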
Friends
- friend struct node::graph< Tp >
- friend struct operation::init_storage< Tp >
- friend struct operation::fini_storage< Tp >
- friend struct operation::cache< Tp >
- friend struct operation::construct< Tp >
- friend struct operation::set_prefix< Tp >
- friend struct operation::push_node< Tp >
- friend struct operation::pop_node< Tp >
- friend struct operation::record< Tp >
- friend struct operation::reset< Tp >
- friend struct operation::measure< Tp >
- friend struct operation::start< Tp >
- friend struct operation::stop< Tp >
- friend struct operation::set_started< Tp >
- friend struct operation::set_stopped< Tp >
- friend struct operation::minus< Tp >
- friend struct operation::plus< Tp >
- friend struct operation::multiply< Tp >
- friend struct operation::divide< Tp >
- friend struct operation::base_printer< Tp >
- friend struct operation::print< Tp >
- friend struct operation::print_storage< Tp >
- friend struct operation::copy< Tp >
- friend struct operation::sample< Tp >
- friend struct operation::serialization< Tp >
- friend struct operation::finalize::get< Tp, true >
- friend struct operation::finalize::get< Tp, false >
- friend struct operation::finalize::merge< Tp, true >
- friend struct operation::finalize::merge< Tp, false >
- friend struct operation::finalize::print< Tp, true >
- friend struct operation::finalize::print< Tp, false >
- friend struct operation::compose
- struct tim::component::empty_base
  A very lightweight base which provides no storage.
Subclassed by tim::component::base< mpi_trace_gotcha, void >, tim::component::base< pthread_gotcha, void >, tim::component::base< allinea_map, void >, tim::component::base< caliper_config, void >, tim::component::base< caliper_loop_marker, void >, tim::component::base< caliper_marker, void >, tim::component::base< cpu_clock >, tim::component::base< cpu_roofline< Types… >, std::pair< std::vector< long long >, double > >, tim::component::base< cpu_util, std::pair< int64_t, int64_t > >, tim::component::base< craypat_counters, std::vector< unsigned long > >, tim::component::base< craypat_flush_buffer, unsigned long >, tim::component::base< craypat_heap_stats, void >, tim::component::base< craypat_record, void >, tim::component::base< craypat_region, void >, tim::component::base< cuda_event, float >, tim::component::base< cuda_profiler, void >, tim::component::base< cupti_activity, intmax_t >, tim::component::base< cupti_counters, cupti::profiler::results_t >, tim::component::base< cupti_pcsampling, cupti::pcsample >, tim::component::base< current_peak_rss, std::pair< int64_t, int64_t > >, tim::component::base< data_tracker< InpT, Tag >, InpT >, tim::component::base< gotcha< Nt, BundleT, DiffT >, void >, tim::component::base< gperftools_cpu_profiler, void >, tim::component::base< gperftools_heap_profiler, void >, tim::component::base< gpu_roofline< Types… >, std::tuple< cupti_activity::value_type, cupti_counters::value_type > >, tim::component::base< hip_event, float >, tim::component::base< kernel_mode_time, int64_t >, tim::component::base< likwid_marker, void >, tim::component::base< likwid_nvmarker, void >, tim::component::base< malloc_gotcha, double >, tim::component::base< memory_allocations, void >, tim::component::base< monotonic_clock >, tim::component::base< monotonic_raw_clock >, tim::component::base< mpip_handle< Toolset, Tag >, void >, tim::component::base< ncclp_handle< Toolset, Tag >, void >, tim::component::base< network_stats, cache::network_stats >, 
tim::component::base< nothing, skeleton::base >, tim::component::base< num_io_in >, tim::component::base< num_io_out >, tim::component::base< num_major_page_faults >, tim::component::base< num_minor_page_faults >, tim::component::base< nvtx_marker, void >, tim::component::base< ompt_data_tracker< Api >, void >, tim::component::base< ompt_handle< Api >, void >, tim::component::base< page_rss, int64_t >, tim::component::base< papi_array< MaxNumEvents >, std::array< long long, MaxNumEvents > >, tim::component::base< papi_rate_tuple< RateT, EventTypes… >, std::pair< papi_tuple< EventTypes… >, RateT > >, tim::component::base< papi_tuple< EventTypes… >, std::array< long long, sizeof…(EventTypes)> >, tim::component::base< papi_vector, std::vector< long long > >, tim::component::base< peak_rss >, tim::component::base< perfetto_trace, void >, tim::component::base< placeholder< Types… >, void >, tim::component::base< priority_context_switch >, tim::component::base< process_cpu_clock >, tim::component::base< process_cpu_util, std::pair< int64_t, int64_t > >, tim::component::base< read_bytes, std::pair< int64_t, int64_t > >, tim::component::base< read_char, std::pair< int64_t, int64_t > >, tim::component::base< roctx_marker, void >, tim::component::base< system_clock >, tim::component::base< tau_marker, void >, tim::component::base< thread_cpu_clock >, tim::component::base< thread_cpu_util, std::pair< int64_t, int64_t > >, tim::component::base< timestamp, timestamp_entry_t >, tim::component::base< trip_count >, tim::component::base< user_bundle< Idx, Tag >, void >, tim::component::base< user_clock >, tim::component::base< user_mode_time, int64_t >, tim::component::base< virtual_memory >, tim::component::base< voluntary_context_switch >, tim::component::base< vtune_event, void >, tim::component::base< vtune_frame, void >, tim::component::base< vtune_profiler, void >, tim::component::base< wall_clock, int64_t >, tim::component::base< written_bytes, std::array< int64_t, 2 > >, 
tim::component::base< written_char, std::array< int64_t, 2 > >, tim::component::base< kernel_logger, void >, tim::component::base< sampler< CompT< Types… >, N, SigIds… >, void >, tim::component::base< Tp, Value >, tim::component::base< Tp, void >, tim::component::printer
Public Functions
- inline void get() const
-
inline void