Collecting Nginx logs in real time with Logstash and storing them in MySQL

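To support SEO work, we often analyze Nginx access logs to see how search engine spiders (crawlers) crawl the site. Doing this by hand is tedious and slow. Logstash, the collection component of the ELK log analysis stack, can automate it: it reads the Nginx log as it is written, keeps only spider requests, and stores them in MySQL, where they are easy to query.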

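Step 1: Install OpenJDK

Logstash 8.x ships with a bundled JDK and uses it unless LS_JAVA_HOME is set, so this step is optional.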

sudo yum -y install java-1.8.0-openjdk
java -version

Download Logstash and extract it into /opt

[root@prod logstash-8.7.0]# cd /opt/
[root@prod opt]# pwd
/opt
[root@prod opt]# ls
logstash-8.7.0-linux-x86_64.tar.gz
[root@prod opt]# tar -xvf logstash-8.7.0-linux-x86_64.tar.gz
[root@prod opt]# ls
logstash-8.7.0  logstash-8.7.0-linux-x86_64.tar.gz

Modify the nginx.conf configuration

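Replace the log_format directive in nginx.conf with the one below. log_format defines the layout of each access-log line. Writing it as JSON lets Logstash parse every field directly, which is essential for analyses such as spider crawl records.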

# escape=json (nginx 1.11.8+) escapes quotes and backslashes in variable
# values so that every log line stays valid JSON.
log_format main escape=json '{"@timestamp":"$time_iso8601",'
                            '"@version":"1",'
                            '"client":"$remote_addr",'
                            '"url":"$request_uri",'
                            '"status":"$status",'
                            '"domain":"$host",'
                            '"host":"$server_addr",'
                            '"size":$body_bytes_sent,'
                            '"responsetime":$request_time,'
                            '"referer": "$http_referer",'
                            '"ua": "$http_user_agent",'
                            '"request_time": "$request_time",'
                            '"request_method": "$request_method"'
                            '}';
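
As a quick sanity check on this format, the Python sketch below parses one access-log line the way a JSON codec would. The sample line is made up for illustration (the IP addresses, domain, and timestamp are assumptions, not real traffic). It shows that the unquoted fields (size, responsetime) arrive as numbers while the quoted ones (status, request_time) arrive as strings.

```python
import json

# A made-up log line in the shape produced by the log_format above.
# All values are illustrative assumptions, not taken from a real log.
SAMPLE_LINE = (
    '{"@timestamp":"2023-04-01T12:00:00+08:00",'
    '"@version":"1",'
    '"client":"203.0.113.7",'
    '"url":"/index.html",'
    '"status":"200",'
    '"domain":"example.com",'
    '"host":"192.0.2.10",'
    '"size":612,'
    '"responsetime":0.005,'
    '"referer": "https://example.com/",'
    '"ua": "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",'
    '"request_time": "0.005",'
    '"request_method": "GET"}'
)


def parse_access_line(line: str) -> dict:
    """Parse one JSON-formatted access-log line; raises ValueError if it is not valid JSON."""
    return json.loads(line)


event = parse_access_line(SAMPLE_LINE)
print(event["status"], type(event["size"]).__name__, event["ua"])
```

If a real line from the log fails to parse here, Logstash's json codec will most likely fail on it as well, so this is a cheap way to test the format before starting the pipeline.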

The logstash.conf configuration file

vim /opt/logstash-8.7.0/config/logstash.conf

# Pipeline: Nginx access log (file input) -> spider filter -> MySQL (jdbc output)

input {
  file {
        type => "nginx-access-log"
        path => ["/var/log/nginx/gocsgo.com.log"]
        start_position => "beginning"
        stat_interval => "2"
        codec => json
   }

}


filter{
  if ([ua] !~ "Baiduspider|360Spider|Sogou web spider|bingbot|Sosospider|YandexBot|Bytespider|Googlebot|YisouSpider|YoudaoBot|Yahoo! Slurp China|YandexBot|DNSPOD|AspiegelBot") {
     drop{}
  }

  mutate {
    remove_field => ["event", "log"]
    # copy the client IP into the field that the dns filter reverse-resolves below
    add_field => { "domain_lookup" => "%{client}" }
  }

  useragent {
    source => "ua"   # field to parse
    target => "ua"   # field to write the result to; overwritten if present, created if missing
  }

  dns {
    reverse => [ "domain_lookup" ]
    action => "replace"
  }


   #mutate
   # {
   #     add_field => { "@ua" => "%{ua}" } # create a new field first and assign the value of ua to it
   # }

   #json {
   #    source => "@ua"
       #remove_field => [ "@alert","alert" ]
   #}

}


output {
   jdbc {
     driver_jar_path => "/opt/logstash-8.7.0/vendor/jar/jdbc/mysql-connector-j-8.0.32.jar"
     connection_string => "jdbc:mysql://10.138.0.2:3306/spider?user=spider&password=fNd3JyshAXdS47mD&useUnicode=true&characterEncoding=UTF8&useSSL=false"
     statement => ["INSERT INTO spider(status,domain,client,ua,request_uri,responsetime,request_time,request_method,timestamp,referer,domain_lookup)values(?,?,?,?,?,?,?,?,?,?,?)","status","domain","client","ua","url","responsetime","request_time","request_method","@timestamp","referer","domain_lookup"]
   }
   stdout{}


}
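
To see which requests survive the conditional in the filter section, the following Python sketch mimics it with the same alternation pattern. It is only an approximation of Logstash's behavior: it assumes the match is an unanchored, case-sensitive regular expression search, and the sample user agents are made up for illustration.

```python
import re

# Same alternation as the pipeline's conditional (the duplicate YandexBot entry is dropped).
SPIDER_PATTERN = re.compile(
    "Baiduspider|360Spider|Sogou web spider|bingbot|Sosospider|YandexBot|"
    "Bytespider|Googlebot|YisouSpider|YoudaoBot|Yahoo! Slurp China|"
    "DNSPOD|AspiegelBot"
)


def is_spider(ua: str) -> bool:
    """Return True if the user agent contains one of the spider tokens."""
    return SPIDER_PATTERN.search(ua) is not None


def keep_spider_events(events):
    """Mimic `if [ua] !~ pattern { drop {} }`: keep only events whose ua matches."""
    return [e for e in events if is_spider(e.get("ua", ""))]


# Hypothetical events; only the ua field matters for this check.
sample = [
    {"ua": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"ua": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"},
    {"ua": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"},
]
kept = keep_spider_events(sample)
print(len(kept), "of", len(sample), "events kept")
```

Because the match relies only on the user agent string, a client that fakes a spider user agent also passes the filter. The reverse DNS lookup stored in domain_lookup is what lets you tell such requests apart afterwards.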

Install the logstash-output-jdbc plugin

[root@VM-16-4-centos logstash-8.7.0]# bin/logstash-plugin install logstash-output-jdbc

Notes

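1. Change the database IP address, user, and password to match your environment. The configuration above uses the address 10.138.0.2:3306, the database spider, the user spider, and the password fNd3JyshAXdS47mD.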

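2. Logstash does not ship with a database driver, so you need to upload the JDBC driver jar yourself to /opt/logstash-8.7.0/vendor/jar/jdbc. The driver_jar_path setting in the jdbc output must point at that file.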

[root@prod jdbc]# pwd
/opt/logstash-8.7.0/vendor/jar/jdbc
[root@prod jdbc]# ls
mysql-connector-j-8.0.32.jar  mysql-connector-java-8.0.23.jar

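3. This setup has Logstash read the log file at regular intervals and insert the records into MySQL through the logstash-output-jdbc output plugin. The plugin is not installed by default; the command bin/logstash-plugin install logstash-output-jdbc downloads it from the official source. That does not work on a machine without internet access. The workaround is to install the plugin on a connected machine and then copy the entire Logstash directory to the offline machine.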

Create the database table

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for spider
-- ----------------------------
DROP TABLE IF EXISTS `spider`;
CREATE TABLE `spider`  (
  `domain` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'Domain name',
  `client` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'Client IP',
  `request_uri` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'Request URI',
  `timestamp` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'Request timestamp',
  `ua` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'User agent (browser and client information)',
  `id` int(0) NOT NULL AUTO_INCREMENT,
  `status` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'HTTP response status',
  `responsetime` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'Response time',
  `request_time` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'Request processing time',
  `request_method` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'Request method',
  `referer` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'Referring URL',
  `domain_lookup` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 701068 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = DYNAMIC;

SET FOREIGN_KEY_CHECKS = 1;
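
The statement setting of the jdbc output lists the SQL first and then one event field per ? placeholder, in order. The Python sketch below illustrates that positional mapping. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for MySQL so the example is self-contained, the table is simplified to TEXT columns, and the sample event is made up.

```python
import sqlite3

# SQL and field order copied from the `statement` setting of the jdbc output.
INSERT_SQL = (
    "INSERT INTO spider(status,domain,client,ua,request_uri,responsetime,"
    "request_time,request_method,timestamp,referer,domain_lookup)"
    "values(?,?,?,?,?,?,?,?,?,?,?)"
)
EVENT_FIELDS = [
    "status", "domain", "client", "ua", "url", "responsetime",
    "request_time", "request_method", "@timestamp", "referer", "domain_lookup",
]


def bind_params(event: dict) -> tuple:
    """Pick the event fields in placeholder order; missing fields become NULL."""
    return tuple(event.get(name) for name in EVENT_FIELDS)


# SQLite stand-in for the MySQL table (assumption: simplified column types).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spider (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "domain TEXT, client TEXT, request_uri TEXT, timestamp TEXT, ua TEXT, "
    "status TEXT, responsetime TEXT, request_time TEXT, request_method TEXT, "
    "referer TEXT, domain_lookup TEXT)"
)

# Hypothetical event, shaped like one parsed access-log line.
event = {
    "status": "200", "domain": "example.com", "client": "203.0.113.7",
    "ua": "Baiduspider", "url": "/index.html", "responsetime": 0.005,
    "request_time": "0.005", "request_method": "GET",
    "@timestamp": "2023-04-01T12:00:00+08:00",
    "referer": "https://example.com/", "domain_lookup": "crawler.example.net",
}
conn.execute(INSERT_SQL, bind_params(event))
row = conn.execute("SELECT request_uri, timestamp, status FROM spider").fetchone()
print(row)
```

Note that the event field url lands in the column request_uri and @timestamp lands in timestamp. The mapping is purely positional, so the field list must stay in the same order as the column list.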

Start Logstash in the foreground and check for errors

/opt/logstash-8.7.0/bin/logstash -f /opt/logstash-8.7.0/config/logstash.conf

If everything works, install supervisor to manage the process so that Logstash keeps running in the background.

[root@prod jdbc]# yum -y install epel-release
[root@prod jdbc]# yum -y install supervisor
[root@prod jdbc]# cat /etc/supervisord.d/logstash.ini
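[program:logstash]
; Logstash 8.x reads JVM heap settings from LS_JAVA_OPTS (or config/jvm.options), not LS_HEAP_SIZE
environment=LS_JAVA_OPTS="-Xms5000m -Xmx5000m"
directory=/opt/logstash-8.7.0
; -w sets the number of pipeline workers; -l (--path.logs) expects a directory
command=/opt/logstash-8.7.0/bin/logstash -f /opt/logstash-8.7.0/config/logstash.conf -w 10 -l /var/log/logstash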

Start the service

[root@prod jdbc]# systemctl start supervisord
[root@prod jdbc]# systemctl status supervisord
[root@prod jdbc]# supervisorctl update
[root@prod jdbc]# supervisorctl status
logstash                         RUNNING   pid 1724408, uptime 0:15:53

Check that access records are arriving in the table
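
One way to check is to count the stored rows per domain and HTTP status. The sketch below runs such a query against an in-memory SQLite stand-in so that it is self-contained. The query itself uses only standard SQL and should also work on the MySQL spider table, though that is an assumption rather than something tested here. The inserted rows are made up.

```python
import sqlite3

# Plain SQL aggregate over the spider table: hits per domain and status.
HITS_PER_DOMAIN_SQL = (
    "SELECT domain, status, COUNT(*) AS hits "
    "FROM spider GROUP BY domain, status "
    "ORDER BY hits DESC, domain, status"
)


def hits_per_domain(conn):
    """Return (domain, status, hits) rows, most frequent first."""
    return conn.execute(HITS_PER_DOMAIN_SQL).fetchall()


# SQLite stand-in with only the columns this query needs (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spider (id INTEGER PRIMARY KEY, domain TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO spider(domain, status) VALUES (?, ?)",
    [("example.com", "200"), ("example.com", "200"), ("example.com", "404")],
)

for domain, status, hits in hits_per_domain(conn):
    print(domain, status, hits)
```

An empty result after spiders have visited the site usually points at the filter or the jdbc output, and the stdout output of the pipeline shows whether events reach the output stage at all.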
