For an introduction to SkyWalking, see the Chinese documentation.

Figure: a simple SkyWalking environment

Installation

Environment:

linux: Ubuntu 18.04 LTS arm64

elasticsearch: 7.11.0

skywalking: 8.4.0

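1. Install Elasticsearch

See ELK Best Practices.

2. Install SkyWalking

2.1 Download the package

Go to the download page and pick a release (the latest is usually fine). The version downloaded for these notes is: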

https://www.apache.org/dyn/closer.cgi/skywalking/8.4.0/apache-skywalking-apm-es7-8.4.0.tar.gz
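If you would rather do this from a terminal, the sketch below fetches the same tarball and checks its SHA-512 digest before you unpack it. It assumes that 8.4.0 is served from archive.apache.org/dist/skywalking/ (Apache moves older releases there) and that GNU sha512sum is installed. verify_sha512 is a hypothetical helper written for this example.

```shell
#!/bin/sh
# Sketch only: fetch the release and verify it before unpacking.
# Assumption: old releases are served from archive.apache.org/dist/skywalking/.
SW_VERSION=8.4.0
SW_TARBALL="apache-skywalking-apm-es7-${SW_VERSION}.tar.gz"
SW_URL="https://archive.apache.org/dist/skywalking/${SW_VERSION}/${SW_TARBALL}"

# verify_sha512 FILE EXPECTED_HEX (hypothetical helper)
# Returns 0 when the SHA-512 digest of FILE equals EXPECTED_HEX, non-zero otherwise.
verify_sha512() {
  actual=$(sha512sum "$1" | awk '{print $1}')
  [ "$actual" = "$2" ]
}

# Usage (needs network access):
#   wget "$SW_URL" "$SW_URL.sha512"
#   cat "$SW_TARBALL.sha512"        # copy the hex digest from this file
#   verify_sha512 "$SW_TARBALL" "<hex digest>" && echo "checksum OK"
```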

2.2 Extract

tar -xf apache-skywalking-apm-es7-8.4.0.tar.gz
# Move it to /opt/server/ and rename the directory to skywalking
mkdir -p /opt/server
mv apache-skywalking-apm-bin-es7 /opt/server/skywalking

2.3 Edit the configuration

 vim /opt/server/skywalking/config/application.yml
storage:
  selector: ${SW_STORAGE:elasticsearch7}
  elasticsearch7:
    nameSpace: ${SW_NAMESPACE:"elasticsearch7"}
    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:192.168.1.13:9200,192.168.1.14:9200,192.168.1.15:9200}
    protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
    #trustStorePath: ${SW_STORAGE_ES_SSL_JKS_PATH:""}
    #trustStorePass: ${SW_STORAGE_ES_SSL_JKS_PASS:""}
    dayStep: ${SW_STORAGE_DAY_STEP:1} # Represent the number of days in the one minute/hour/day index.
    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:1} # Shard number of new indexes
    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:1} # Replicas number of new indexes
    # Super data set has been defined in the codes, such as trace segments.The following 3 config would be improve es performance when storage super size data in es.
    superDatasetDayStep: ${SW_SUPERDATASET_STORAGE_DAY_STEP:-1} # Represent the number of days in the super size dataset record index, the default value is the same as dayStep when the value is less than 0
    superDatasetIndexShardsFactor: ${SW_STORAGE_ES_SUPER_DATASET_INDEX_SHARDS_FACTOR:5} #  This factor provides more shards for the super data set, shards number = indexShardsNumber * superDatasetIndexShardsFactor. Also, this factor effects Zipkin and Jaeger traces.
    superDatasetIndexReplicasNumber: ${SW_STORAGE_ES_SUPER_DATASET_INDEX_REPLICAS_NUMBER:0} # Represent the replicas number in the super size dataset record index, the default value is 0.
    user: ${SW_ES_USER:"elastic"}
    password: ${SW_ES_PASSWORD:"elastic"}
    secretsManagementFile: ${SW_ES_SECRETS_MANAGEMENT_FILE:""} # Secrets management file in the properties format includes the username, password, which are managed by 3rd party tool.
    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:1000} # Execute the async bulk record data every ${SW_STORAGE_ES_BULK_ACTIONS} requests
    syncBulkActions: ${SW_STORAGE_ES_SYNC_BULK_ACTIONS:50000} # Execute the sync bulk metrics data every ${SW_STORAGE_ES_SYNC_BULK_ACTIONS} requests
    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10} # flush the bulk every 10 seconds whatever the number of requests
    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
    resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
    metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
    segmentQueryMaxSize: ${SW_STORAGE_ES_QUERY_SEGMENT_SIZE:200}
    profileTaskQueryMaxSize: ${SW_STORAGE_ES_QUERY_PROFILE_TASK_SIZE:200}
    oapAnalyzer: ${SW_STORAGE_ES_OAP_ANALYZER:"{\"analyzer\":{\"oap_analyzer\":{\"type\":\"stop\"}}}"} # the oap analyzer.
    oapLogAnalyzer: ${SW_STORAGE_ES_OAP_LOG_ANALYZER:"{\"analyzer\":{\"oap_log_analyzer\":{\"type\":\"standard\"}}}"} # the oap log analyzer. It could be customized by the ES analyzer configuration to support more language log formats, such as Chinese log, Japanese log and etc.
    advanced: ${SW_STORAGE_ES_ADVANCED:""}

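Only the storage section needs to be changed.

storage.selector: which storage backend to use. Here we choose elasticsearch7.

In the elasticsearch7 block, change the following settings:

nameSpace: the namespace, used as the prefix of the index names

clusterNodes: the Elasticsearch cluster nodes, as comma-separated host:port pairs

user: Elasticsearch username

password: Elasticsearch password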
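Every value in application.yml is written as ${ENV_NAME:default}: the OAP uses the environment variable of that name if it is set and the default otherwise. So the same changes can also be made without editing the file, by exporting the variables before starting the OAP. The resolve_placeholder function below is a hypothetical, simplified shell emulation of that lookup (not SkyWalking's code) that makes the rule concrete; the usage comment shows the export-based alternative.

```shell
#!/bin/sh
# Hypothetical helper: emulate how a ${NAME:default} placeholder is resolved.
#   resolve_placeholder '${SW_STORAGE:elasticsearch7}'
resolve_placeholder() {
  # Strip the leading "${" and the trailing "}" to get NAME:default.
  inner=$(printf '%s' "$1" | sed -e 's/^\${//' -e 's/}$//')
  name=${inner%%:*}       # text before the first ":"
  default=${inner#*:}     # text after the first ":" (may itself contain ":")
  eval "is_set=\${$name+x}"
  if [ -n "$is_set" ]; then
    eval "value=\${$name}"  # variable is set: use its value
    printf '%s\n' "$value"
  else
    printf '%s\n' "$default"
  fi
}

# Usage: override the storage settings without editing application.yml.
#   export SW_STORAGE=elasticsearch7
#   export SW_STORAGE_ES_CLUSTER_NODES=192.168.1.13:9200,192.168.1.14:9200,192.168.1.15:9200
#   export SW_ES_USER=elastic SW_ES_PASSWORD='your-password'
#   /opt/server/skywalking/bin/oapService.sh
```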

2.4 Start the OAP service

/opt/server/skywalking/bin/oapService.sh
# Follow the log
tail -100f /opt/server/skywalking/logs/skywalking-oap-server.log

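The first startup takes a while, because the OAP has to initialize its storage (for example, create the Elasticsearch indexes).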
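Because the OAP cannot start without its storage, it is worth confirming that Elasticsearch is reachable and healthy first, especially if startup seems to hang. The sketch below reads the response of Elasticsearch's _cluster/health API and prints its status field. es_status is a hypothetical helper, the node address and credentials are the ones assumed in the config above, and the sed parsing assumes the compact single-line JSON that the API returns by default.

```shell
#!/bin/sh
# es_status (hypothetical helper): read _cluster/health JSON on stdin,
# print the value of its "status" field (green, yellow or red).
es_status() {
  sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

# Usage (node address and credentials taken from application.yml above):
#   curl -s -u elastic:elastic "http://192.168.1.13:9200/_cluster/health" | es_status
```

green means all shards are allocated, yellow means some replicas are unassigned, and red means some primary shards are unassigned; on red, fix the cluster before starting the OAP.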

2.5 Start the UI service

/opt/server/skywalking/bin/webappService.sh

To change the UI settings (such as its port), edit webapp/webapp.yml.

2.6 Open the console

http://localhost:8080

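3. Service integration

SkyWalking is up and running, so let's integrate it into our services.

3.1 Preparation

Let's pretend you already know that services report data through the SkyWalking agent (if not, see the documentation linked at the top). The agent files are in the /opt/server/skywalking/agent directory.

3.2 Modify the service startup script

  • Java startup script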

    export SW_AGENT_NAME=demo
    export SW_AGENT_SPAN_LIMIT=2000
    export SW_AGENT_COLLECTOR_BACKEND_SERVICES=122.9.35.11:21800
    JAVA_AGENT="-javaagent:/opt/server/skywalking/agent/skywalking-agent.jar"
    # JVM options such as -javaagent go before -jar
    java ${JAVA_AGENT} -jar demo.jar
    
  • Docker

    FROM openjdk:8-jdk-alpine3.8
    ENV SW_AGENT_NAME=demo \
        SW_AGENT_SPAN_LIMIT=2000 \
        SW_AGENT_COLLECTOR_BACKEND_SERVICES=122.9.35.11:21800 \
        JAVA_AGENT=-javaagent:/app/agent/skywalking-agent.jar
    # The application jar is expected at /app/app.jar; the agent is mounted at /app/agent
    ENTRYPOINT ["sh","-c","java ${JAVA_AGENT} -jar /app/app.jar"]
    

    In docker-compose, mount the agent directory as a volume:

    volumes:
      - /opt/server/skywalking/agent:/app/agent
    

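SW_AGENT_NAME: the service name

SW_AGENT_SPAN_LIMIT: the maximum number of spans recorded in a single trace segment

SW_AGENT_COLLECTOR_BACKEND_SERVICES: the address of the SkyWalking OAP gRPC receiver (port 11800 by default)

All of these settings correspond to entries in agent/config/agent.config.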
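In 8.x, agent.config refers to these variables through placeholders, for example agent.service_name=${SW_AGENT_NAME:Your_ApplicationName} and collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:127.0.0.1:11800} (check your own copy of the file). The agent also accepts any agent.config key as a JVM system property with a skywalking. prefix, which helps when environment variables are awkward to set. The sketch below only prints such a command line; sw_java_cmd is a hypothetical helper and the paths are the ones used above.

```shell
#!/bin/sh
# sw_java_cmd SERVICE_NAME BACKEND JAR (hypothetical helper)
# Prints a java command that configures the agent through -Dskywalking.*
# system properties instead of environment variables.
SW_AGENT_JAR=/opt/server/skywalking/agent/skywalking-agent.jar

sw_java_cmd() {
  printf '%s\n' "java -javaagent:${SW_AGENT_JAR} -Dskywalking.agent.service_name=$1 -Dskywalking.collector.backend_service=$2 -jar $3"
}

# Usage: inspect the command, then run it.
#   cmd=$(sw_java_cmd demo 122.9.35.11:21800 demo.jar)
#   echo "$cmd"
#   $cmd
```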

4. Testing

I have already written an endpoint, /oauth/login.

A request to it passes through apiserver (gateway) -> auth (authentication center) -> user (user service) -> mysql | redis

Send a request:

curl http://localhost:9001/oauth/login
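One request is enough to produce a trace, but the dashboard metrics are aggregated per minute, so a short burst of requests makes the charts easier to read. The sketch below sends N requests to the gateway endpoint above and prints each HTTP status code; send_requests is a hypothetical helper.

```shell
#!/bin/sh
# send_requests URL COUNT (hypothetical helper):
# issue COUNT GET requests and print one HTTP status code per line.
send_requests() {
  i=1
  while [ "$i" -le "$2" ]; do
    # -s: quiet, -o /dev/null: discard the body, -w: print only the status code
    curl -s -o /dev/null -w '%{http_code}\n' "$1"
    i=$((i + 1))
  done
}

# Usage:
#   send_requests http://localhost:9001/oauth/login 20
```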

Check the UI:

Topology view:

Trace view:

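5. Performance profiling

There is also a Profile tab. How is it used?

When creating a profiling task, you need the endpoint name, which can be found in the trace view.

Click Analyze and the thread stacks appear, together with the time spent in each method.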

6. Alarms

Alarm rules

Default alarm rules

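For convenience, the SkyWalking distribution ships a default config/alarm-settings.yml containing the following rules:

1. Average service response time over 1 s in the last 3 minutes.

2. Service success rate below 80% in the last 2 minutes.

3. Service response time percentile over 1000 ms in the last 3 minutes.

4. Average service instance response time over 1 s in the last 2 minutes.

5. Average endpoint response time over 1 s in the last 2 minutes.

6. Average database access response time over 1 s in the last 2 minutes.

7. Average response time between endpoints (endpoint relation) over 1 s in the last 2 minutes.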
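Each of these rules is an entry in alarm-settings.yml built from a metric name, an operator, a threshold, a period and a count. Per the official docs, period is the checked time window (in minutes) and the alarm fires once the condition holds in at least count of those minutes; the default service response-time rule uses threshold 1000, period 10 and count 3, which is what "over 1 s in the last 3 minutes" summarizes. The awk-based should_alarm below is a hypothetical, simplified emulation of that counting for the ">" operator only (it ignores silence-period and everything else), fed with per-minute values.

```shell
#!/bin/sh
# should_alarm THRESHOLD PERIOD COUNT < per-minute values (one per line, oldest first)
# Prints "fire" if at least COUNT of the last PERIOD values are > THRESHOLD,
# otherwise "ok". Simplified emulation, not SkyWalking's implementation.
#
# Roughly the default rule it mimics (config/alarm-settings.yml):
#   service_resp_time_rule:
#     metrics-name: service_resp_time
#     op: ">"
#     threshold: 1000
#     period: 10
#     count: 3
should_alarm() {
  awk -v threshold="$1" -v period="$2" -v count="$3" '
    { values[NR] = $1 + 0 }
    END {
      start = NR - period + 1          # first minute inside the window
      if (start < 1) start = 1
      hits = 0
      for (i = start; i <= NR; i++)
        if (values[i] > threshold + 0) hits++
      result = (hits >= count + 0) ? "fire" : "ok"
      print result
    }'
}

# Usage: ten minutes of average response times (ms), three of them above 1000.
#   printf '%s\n' 200 300 1500 250 1200 400 1100 300 200 100 | should_alarm 1000 10 3   # -> fire
```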

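Custom alarm handling has to be implemented yourself (for example, by editing the rules and receiving the notifications through a webhook); see the official documentation for details.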
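A common hook is the webhooks list in alarm-settings.yml: the OAP POSTs each batch of alarms as a JSON array to every URL listed there. To exercise your own receiver before any real alarm fires, you can post a hand-written payload shaped like the AlarmMessage example in the official 8.x docs. The receiver URL, the helper name sample_alarm_payload and the field values are all made up for illustration.

```shell
#!/bin/sh
# sample_alarm_payload (hypothetical helper): print a hand-written alarm batch
# (a JSON array) for testing a webhook receiver. Field values are placeholders.
sample_alarm_payload() {
  now_ms=$(( $(date +%s) * 1000 ))
  cat <<EOF
[{
  "scopeId": 1,
  "scope": "SERVICE",
  "name": "demo",
  "id0": "12",
  "id1": "",
  "ruleName": "service_resp_time_rule",
  "alarmMessage": "Response time of service demo is more than 1000ms in 3 minutes of last 10 minutes.",
  "startTime": ${now_ms}
}]
EOF
}

# Usage, assuming alarm-settings.yml contains:
#   webhooks:
#     - http://127.0.0.1:8088/alarm        # hypothetical receiver
#   curl -s -H 'Content-Type: application/json' -d "$(sample_alarm_payload)" http://127.0.0.1:8088/alarm
```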

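7. Integrating with ELK

In the trace view there is a trace ID that is global to the whole call chain: with it we can find every hop of a request. If we also write this trace ID into the logs and ship them to ELK, we can pull up all the log lines of one request across services, heh~

  • Add the dependency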

    <!-- skywalking -->
    <dependency>
      <groupId>org.apache.skywalking</groupId>
      <artifactId>apm-toolkit-logback-1.x</artifactId>
      <version>8.4.0</version>
    </dependency>
    
  • Modify logback-spring.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration scan="true" scanPeriod="60 seconds" debug="false">
        <include resource="org/springframework/boot/logging/logback/defaults.xml"/>
        <springProperty name="applicationName" scope="context" source="spring.application.name" />
        <property name="LOG_FILE_NAME_PATTERN" value="logs/${applicationName}/log.out"/>
        <!-- Log formats -->
        <property name="CONSOLE_LOG_PATTERN"
                  value="%clr(%d{${LOG_DATEFORMAT_PATTERN:-yyyy-MM-dd HH:mm:ss.SSS}}){faint} %clr(${LOG_LEVEL_PATTERN:-%5p}) %clr(${PID:- }){magenta} %clr(---){faint} %clr([%15.15t]){faint} %clr(%c){cyan} %clr(:){faint} %m%n${LOG_EXCEPTION_CONVERSION_WORD:-%wEx}"/>
        <property name="FILE_LOG_PATTERN"
                  value="%d{${LOG_DATEFORMAT_PATTERN:-yyyy-MM-dd HH:mm:ss.SSS}} ${applicationName} [%tid] ${LOG_LEVEL_PATTERN:-%5p} ${PID:- } --- [%t] %c : %m%n${LOG_EXCEPTION_CONVERSION_WORD:-%wEx}"/>
    
        <!-- Output to the console -->
        <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
            <encoder>
                <pattern>${CONSOLE_LOG_PATTERN}</pattern>
            </encoder>
        </appender>
    
        <!-- Output to a file -->
        <appender name="file" class="ch.qos.logback.core.rolling.RollingFileAppender">
            <file>${LOG_FILE_NAME_PATTERN}</file>
            <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
                <fileNamePattern>${LOG_FILE_NAME_PATTERN}.%d{yyyy-MM-dd}.%i.gz</fileNamePattern>
                <!-- Number of days of logs to keep -->
                <maxHistory>7</maxHistory>
                <!-- Maximum size of each log file -->
                <maxFileSize>10MB</maxFileSize>
    
            </rollingPolicy>
            <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
                <layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.TraceIdPatternLogbackLayout">
                    <pattern>${FILE_LOG_PATTERN}</pattern>
                </layout>
            </encoder>
        </appender>
    
        <!-- Per-environment log levels: set a different log output level for each environment -->
        <springProfile name="local">
            <root level="info">
                <appender-ref ref="console"/>
            </root>
        </springProfile>
        <springProfile name="dev">
            <root level="info">
                <appender-ref ref="file"/>
            </root>
        </springProfile>
    
        <springProfile name="staging">
            <root level="info">
                <appender-ref ref="file"/>
            </root>
        </springProfile>
    
        <springProfile name="online">
            <root level="info">
                <appender-ref ref="console"/>
                <appender-ref ref="file"/>
            </root>
        </springProfile>
    </configuration>
    

    Main changes:

    Add [%tid] to FILE_LOG_PATTERN

    Set the layout class inside the encoder to org.apache.skywalking.apm.toolkit.log.logback.v1.x.TraceIdPatternLogbackLayout (required!!)
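With the agent attached, the SkyWalking layout renders %tid as TID:<trace id> (and a placeholder such as TID: N/A when no trace is active), so each file log line carries a [TID:...] tag. The sketch below pulls that ID out of a line and greps the local log files for it, which is the same ID you would later search for in Kibana. The helper names extract_tid and find_by_tid are hypothetical, and the exact TID text format is an assumption worth checking against your own logs.

```shell
#!/bin/sh
# extract_tid (hypothetical helper): read log lines on stdin and print the
# trace id of every line tagged [TID:<id>]; lines like [TID: N/A] are skipped.
extract_tid() {
  sed -n 's/.*\[TID:\([^] ][^] ]*\)].*/\1/p'
}

# find_by_tid TID [DIR] (hypothetical helper): list every log line under DIR
# (default: logs, matching LOG_FILE_NAME_PATTERN above) that contains TID.
find_by_tid() {
  grep -rF -- "$1" "${2:-logs}"
}

# Usage:
#   tid=$(tail -n 1 logs/auth/log.out | extract_tid)
#   find_by_tid "$tid"
```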

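For how to view these logs in ELK, see ELK Best Practices: Java Project Hands-on Summary.

Finally, for everything else, see the official documentation: it is the best and fastest way to learn.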