AskOverflow.Dev

Assa Yeroslaviz
Asked: 2024-07-23 21:49:06 +0800 CST

How can I monitor system usage while a script runs on multiple cores/threads?

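I wrote a script on Ubuntu 22.04 (I have tested it and can say it also runs on 24.04) that monitors certain system parameters while a specific command is running. The command in question runs another tool that is distributed across multiple cores/threads. I want to monitor memory usage, the size of a specific folder, and which cores and threads are used. The script is attached below.

I am happy with the results so far, but I wonder whether it can be done better.

One thing I have not been able to monitor is the per-core %CPU that is shown when you run `top` and then press `1`. There I can see the %CPU usage of the command for each core, but I cannot find a way to get this programmatically. I am not sure whether it is feasible, but I would like to know which of my processes is running on which thread and, if possible, quantify it.

I would be very grateful for any suggestions or comments that make the script more elegant.

Thanks

Assa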

#!/bin/bash

process_name="dotnet"
folder_path="Output/combined"
log_file="usage_log.csv"

#Rename log file if exists
if [ -e "$log_file" ]; then
    # Get the current timestamp
    timestamp=$(date +"%Y%m%d_%H%M%S")
    # Construct the new file name with the timestamp
    new_log_file="${log_file%.*}_$timestamp.${log_file##*.}"
    
    # Rename the log file
    mv "$log_file" "$new_log_file"
    echo "Log file renamed to $new_log_file"
else
    echo "Log file does not exist."
fi

echo "Timestamp,PID,%MEM %CPU RSS_GB VSZ_GB,Threads,Folder_Size" > $log_file

while true; do
    timestamp=$(date +"%Y-%m-%d %H:%M:%S")
    
    # Get all PIDs for the process name
    pids=$(pgrep -f $process_name)
    
    if [ -z "$pids" ]; then
        echo "Process not found"
        break
    fi
    
    folder_size=$(du -sh $folder_path | cut -f1)
    
    for pid in $pids; do
        # Extract memory info and convert RSS/VSZ from KiB to GiB
        mem_info=$(ps -o %mem,%cpu,rss,vsz -p "$pid" --no-headers | \
                   awk '{printf "%s %s %.2f %.2f", $1, $2, $3/1048576, $4/1048576}')
        # Extract number of threads
        threads=$(ps -o nlwp -p "$pid" --no-headers)
        # Extract the CPU affinity list (the cores the process is allowed to run on)
        cores=$(taskset -pc "$pid" | awk -F: '{print $2}' | tr -d ' ')
        # Same value as above; reused to avoid a second taskset call
        allowed_cores="$cores"

        # Count rows where this PID shows a non-zero value in column 10.
        # Note: pidstat prints a single row per process here, so the result is 0 or 1
        # rather than a core count, and the column positions depend on the time format.
        cores_used=$(pidstat -p "$pid" | awk -v pid="$pid" '
        BEGIN { cores = 0; }
        $4 == pid && $10 > 0 { cores++; }
        END { print cores; }
        ')

        echo "$timestamp, $pid ,$mem_info,$threads,$cores,$cores_used, $folder_size" >> "$log_file"
    done
    
    sleep 30
done

The output looks like this:

Timestamp,PID,%MEM %CPU RSS_GB VSZ_GB,Threads,Folder_Size
...
2024-07-23 13:42:40, 2798428 ,0.0 29.2 0.35 1018.40,   7,0-63,1, 2.7G
2024-07-23 13:42:40, 2798434 ,0.0 30.5 0.37 1018.40,   7,0-63,1, 2.7G
2024-07-23 13:42:40, 2798436 ,0.0 31.5 0.36 1018.40,   7,0-63,1, 2.7G
2024-07-23 13:42:40, 2798438 ,0.0 32.2 0.34 1018.40,   7,0-63,1, 2.7G
2024-07-23 13:42:40, 2798441 ,0.0 33.8 0.39 1018.40,   7,0-63,1, 2.7G
2024-07-23 13:42:40, 2798447 ,0.0 33.0 0.40 1018.40,   7,0-63,1, 2.7G
2024-07-23 13:42:40, 2798452 ,0.0 26.9 0.38 1018.40,   7,0-63,1, 2.7G
...
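
On the question of which thread runs on which core, here is a minimal sketch. It assumes the procps `ps` shipped with Ubuntu, where `-L` lists one row per thread and the `psr` column reports the processor a thread was last scheduled on. The helper name `summarise_threads_per_core` is hypothetical; it only aggregates text passed on standard input, so it can be tried on captured output first.

```shell
#!/bin/bash
# Sketch: summarise how many threads of a process were last scheduled on each core.
# Expects lines of "TID PSR %CPU" on stdin, for example from:
#   ps -L -o tid,psr,pcpu --no-headers -p "$pid"
summarise_threads_per_core() {
    awk '
        # Group by the second column (core id); add up thread count and %CPU.
        NF >= 3 { threads[$2]++; cpu[$2] += $3 }
        END {
            for (core in threads)
                printf "core %d: %d thread(s), %.1f %%CPU\n", core, threads[core], cpu[core]
        }
    ' | sort -t' ' -k2,2n   # order the summary by core number
}
```

A possible call would be `ps -L -o tid,psr,pcpu --no-headers -p "$pid" | summarise_threads_per_core`. Note that `ps` reports %CPU as CPU time divided by elapsed run time, so the figures are lifetime averages rather than the instantaneous values `top` shows; `pidstat -t -p "$pid" 1 1` may be closer to that view.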
monitoring

1 Answer

  1. Best Answer
    Assa Yeroslaviz
    2024-08-26T14:31:32+08:00

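    Since posting this question I have modified my script to report some additional information, such as the number of active and idle cores and the size of the intermediate folders, which I want to keep as small as possible (as long as my machine has storage space left).

    The script is below. All comments and improvements (and of course some criticism) are welcome. 🤗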

    #!/bin/bash
    
    process_name="dotnet"
    folder_path="/home/ubuntu/raw_data/" # this is the path in the ubuntu AWS machine
    #folder_path="/fs/pool/pool-cox-projects-bioinformatics/AG_Cox/AWS/big_runs/PXD041421_timstof/DIA/combined"
    log_file="TT_DIA_usage_m7i.48xlarge_log.csv"
    #log_file="TT_DIA_usage_log.csv"
    mqpar_file="mqpar_timstof_DIA_m7i.48xlarge.xml"
    batch_size=100
    # interval=1  # Interval in seconds for sampling CPU usage
    
    #Rename log file if exists
    if [ -e "$log_file" ]; then
        # Get the current timestamp
        timestamp=$(date +"%Y%m%d_%H%M%S")
        # Construct the new file name with the timestamp
        new_log_file="${log_file%.*}_$timestamp.${log_file##*.}"
        
        # Rename the log file
        mv "$log_file" "$new_log_file"
        echo "Log file renamed to $new_log_file"
        touch $log_file
    else
        echo "Log file does not exist."
        touch $log_file
    fi
    
    # First step is to calculate the batch size of the raw files. The data is taken from the mqpar file
    ## extract the names to a tmp file
    sed -n '/filePaths/,/filePaths/{ /filePaths/b; /filePaths/b; p }' ${mqpar_file} > filesNames
    ## remove the prefix and suffix patterns
    sed -i -e 's/<string>//g' -e 's/<\/string>//g' filesNames
    # split the file into the separated batches based on the batch number used for the run
    split -l ${batch_size} --numeric-suffixes=1 filesNames batch_
    # Go through the created files in each of the lists and calculate the total size for each of them. The information is added to the head of the log file.
    for list in batch_*; 
    do
     echo -n $list": " >> $log_file
     find $(cat "$list") -type f -exec du -b {} + | awk '{sum += $1} END {printf "Total: %.2f GB\n", sum / (1024 * 1024 * 1024)}' >> $log_file
    done
    # Clean tmp files
    rm -f batch_* filesNames
    
    #echo "Timestamp,PID,%MEM,%CPU,RSS_GB,VSZ_GB,Threads,Cores_Used,Allowed_Cores,Folder_Size" > $log_file
    echo "Timestamp,PID,%MEM %CPU RSS_GB VSZ_GB,Threads,Folder_Size,Core_Usage">> $log_file
    
    while true; do
        timestamp=$(date +"%Y-%m-%d %H:%M:%S")
        
        # Get all PIDs for the process name
        pids=$(pgrep -f $process_name)
        
        if [ -z "$pids" ]; then
            echo "Process not found"
            break
        fi
        # This was modified to also include the temporary folders, which are not in the 'combined' folder.
        # This must be adjusted each time a different machine is used. For tims-tof data the folder names end with .d; the intermediate ones do not.
        folder_size=$(find $folder_path  -maxdepth 1 -type d ! -name '*.d' ! -name '.' ! -name 'raw_data' -exec du -sb {} + | awk '{total += $1} END {print total}' | numfmt --to=iec)
    #    folder_size=$(du -sh $folder_path | cut -f1)
    
    # -maxdepth 1 ensures it looks only at the top-level subdirectories.
    # ! -name '*.d' excludes directories ending with .d; ! -name '.' and ! -name 'raw_data' exclude the top-level folder itself from the calculation. This leaves only the directories without a '.d' suffix.
    # du -sb computes the sizes.
    # awk sums up the sizes, 
    # numfmt --to=iec converts the result to human-readable format.
    
        for pid in $pids; do
        # Extract memory info and convert RSS/VSZ from KiB to GiB
        mem_info=$(ps -o %mem,%cpu,rss,vsz -p "$pid" --no-headers | \
                   awk '{printf "%s %s %.2f %.2f", $1, $2, $3/1048576, $4/1048576}')
        # Extract number of threads
        threads=$(ps -o nlwp -p "$pid" --no-headers)
        # Extract the CPU affinity list (the cores the process is allowed to run on)
        cores=$(taskset -pc "$pid" | awk -F: '{print $2}' | tr -d ' ')
    
        # Same value as above; reused to avoid a second taskset call
        allowed_cores="$cores"
         
        # Count rows where this PID shows a non-zero value in column 10.
        # Note: pidstat prints a single row per process here, so the result is 0 or 1
        # rather than a core count, and the column positions depend on the time format.
        cores_used=$(pidstat -p "$pid" | awk -v pid="$pid" ' 
        BEGIN { cores = 0; }
        $4 == pid && $10 > 0 { cores++; }
        END { print cores; }
        ')
    
        # Check if cores_used is empty and print debug info
        if [ -z "$cores_used" ]; then
            echo "Debug: cores_used is empty for PID $pid"
            # Optionally print raw pidstat output for debugging
            # $interval is not defined in this script (it is commented out above), so take one 1-second sample
            pidstat -p "$pid" -t 1 1 2>/dev/null
        fi
        
        #cores_high_activity=$(mpstat -P ALL 1 1 | awk '/^[0-9]/ { if ($(NF) < 90) printf "CPU" $2 "," }')
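        # Count rows whose %idle (last column) is below 30, i.e. more than 70% busy; print 0 if none match
        total_active_cores=$(mpstat -P ALL 1 1 | awk '/^[0-9]/ { if ($(NF) < 30) count++ } END { print count+0 }')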
        #cores_zero_activity=$(mpstat -P ALL 1 1 | awk '/^[0-9]/ { if ($(NF) == 100) printf "CPU" $2 "," }')
        # Count rows that are 100% idle; print 0 if none match
        total_idle_cores=$(mpstat -P ALL 1 1 | awk '/^[0-9]/ { if ($(NF) == 100) count++ } END { print count+0 }')
    
        # Remove trailing commas from the lists
        cores_high_activity=${cores_high_activity%,}
        cores_zero_activity=${cores_zero_activity%,}
    
        # Count the number of cores in each category
        count_high=$(echo "$cores_high_activity" | grep -o "," | wc -l)
        count_zero=$(echo "$cores_zero_activity" | grep -o "," | wc -l)
    
        # Adjust counts for cases where no cores are found
        count_high=$((count_high + 1)) # Adds 1 to account for the first core in the list
        count_zero=$((count_zero + 1))
    
        # Handle empty lists (if no cores meet the conditions)
        [ -z "$cores_high_activity" ] && count_high=0
        #[ -z "$cores_zero_activity" ] && count_zero=0
    
    #    core_stats=$(echo "Cores with >70% activity: $cores_high_activity ($count_high cores), Total idle cores: $total_idle_cores")
        core_stats=$(echo "#Cores with >70% activity: $total_active_cores, Total idle cores: $total_idle_cores")
        
    #    echo "$timestamp,$pid,$mem_info,$threads,$cores_used,$allowed_cores,$folder_size" >> $log_file
        echo "$timestamp, $pid ,$mem_info,$threads,$cores,$cores_used, $folder_size, $core_stats" >> $log_file
        done
        
        sleep 30
    done
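
The two `mpstat` calls in the loop each block for one second and run once per PID, although their result is system-wide. Their `/^[0-9]/` filter also matches the column-header and `all` rows, which can inflate the counts. A sketch of one alternative, assuming sysstat's `mpstat` layout in which the `Average:` rows carry the CPU id in the second column and `%idle` in the last column: take a single sample per loop iteration and classify it once. The function name `classify_cores` and its threshold argument are illustrative, not part of the original script.

```shell
#!/bin/bash
# Sketch: classify cores from one mpstat sample read on stdin.
# Prints "<busy> <idle>": the number of cores whose %idle is below the
# threshold, and the number of cores that are 100% idle.
classify_cores() {
    local idle_threshold="${1:-30}"
    awk -v thr="$idle_threshold" '
        # Use only the per-core summary rows: "Average:  <cpu id> ... <%idle>".
        # This skips the header row and the aggregated "all" row.
        $1 == "Average:" && $2 ~ /^[0-9]+$/ {
            idle = $NF + 0
            if (idle < thr)  busy++
            if (idle >= 100) free++
        }
        END { printf "%d %d\n", busy, free }
    '
}
```

Placed before the `for pid` loop, it could be used as `read -r total_active_cores total_idle_cores < <(LC_ALL=C mpstat -P ALL 1 1 | classify_cores 30)`, so every PID row of one iteration reports the same core counts. `LC_ALL=C` is there to keep the decimal separator a dot.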
    

    The output looks like this:

    $ tail -f TT_DIA_usage_m7i.48xlarge_log.csv
    Timestamp,PID,%MEM %CPU RSS_GB VSZ_GB,Threads,Folder_Size
    2024-08-21 08:32:05, 2529250 ,0.0 0.0 0.05 1516.96, 193,0-191,0, 1.5G, #Cores with >70% activity: 182, Total idle cores: 9
    2024-08-21 08:32:05, 2539965 ,0.0 102 0.11 1511.50, 197,0-191,0, 1.5G, #Cores with >70% activity: 183, Total idle cores: 7
    2024-08-21 08:32:05, 2539968 ,0.0 102 0.11 1511.50, 197,0-191,0, 1.5G, #Cores with >70% activity: 182, Total idle cores: 11
    2024-08-21 08:32:05, 2539971 ,0.0 101 0.11 1511.50, 197,0-191,0, 1.5G, #Cores with >70% activity: 183, Total idle cores: 10
    2024-08-21 08:32:05, 2539982 ,0.0 99.6 0.12 1511.50, 197,0-191,0, 1.5G, #Cores with >70% activity: 183, Total idle cores: 8
    2024-08-21 08:32:05, 2539986 ,0.0 99.5 0.12 1511.50, 198,0-191,0, 1.5G, #Cores with >70% activity: 182, Total idle cores: 12
    2024-08-21 08:32:05, 2539993 ,0.0 101 0.11 1511.50, 197,0-191,0, 1.5G, #Cores with >70% activity: 183, Total idle cores: 9
    2024-08-21 08:32:05, 2540001 ,0.0 101 0.11 1511.50, 197,0-191,0, 1.5G, #Cores with >70% activity: 182, Total idle cores: 8
    2024-08-21 08:32:05, 2540010 ,0.0 99.7 0.11 1511.50, 197,0-191,0, 1.5G, #Cores with >70% activity: 183, Total idle cores: 8
    2024-08-21 08:32:05, 2540018 ,0.0 100 0.12 1511.50, 197,0-191,0, 1.5G, #Cores with >70% activity: 182, Total idle cores: 9
    ...
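
The header line and the data rows above do not line up: the `ps` values share one space-separated field, and the affinity and `cores_used` values have no header of their own. A sketch of one way to keep them in sync, assuming it is acceptable to change the log layout. The column names in `csv_header` and the helper `csv_row` are illustrative, not part of the original script.

```shell
#!/bin/bash
# Sketch: write every value as its own comma-separated column.
csv_header="Timestamp,PID,%MEM,%CPU,RSS_GB,VSZ_GB,Threads,Allowed_Cores,Folder_Size"

# Join all arguments with commas, trimming surrounding whitespace from each.
csv_row() {
    local out="" sep="" field
    for field in "$@"; do
        # Strip leading and trailing whitespace (ps pads nlwp with spaces).
        field="${field#"${field%%[![:space:]]*}"}"
        field="${field%"${field##*[![:space:]]}"}"
        out+="$sep$field"
        sep=","
    done
    printf '%s\n' "$out"
}
```

In the monitoring loop this could be called as `csv_row "$timestamp" "$pid" $mem_info "$threads" "$cores" "$folder_size" >> "$log_file"`, with `$mem_info` left unquoted on purpose so that its four values become four columns. An affinity list that itself contains commas (for example `0-3,8-11`) would still need quoting or a different separator.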
    
