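To support our SEO work, we often analyze the Nginx access logs to see how spiders (search-engine crawlers) crawl our sites. Doing this by hand is tedious and time-consuming. Fortunately, Logstash, the collection component of the ELK log analysis stack, simplifies the job: it can collect and filter the Nginx logs efficiently and give us the crawl records that our SEO decisions rely on.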
Step 1: Install OpenJDK (optional, since Logstash 8.x ships with a bundled JDK and uses it by default)
sudo yum -y install java-1.8.0-openjdk
java -version
Download Logstash and extract it into the /opt directory
[root@prod logstash-8.7.0]# cd /opt/
[root@prod opt]# pwd
/opt
[root@prod opt]# ls
logstash-8.7.0-linux-x86_64.tar.gz
[root@prod opt]# tar -xvf logstash-8.7.0-linux-x86_64.tar.gz
[root@prod opt]# ls
logstash-8.7.0 logstash-8.7.0-linux-x86_64.tar.gz
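Modify the nginx.conf configuration
Replace the log_format directive in nginx.conf with the code below. The log_format directive defines the format of each log entry. Writing every entry as one JSON object lets Logstash parse the access log, spider crawl records included, without any extra pattern matching.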
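# escape=json (nginx 1.11.8+) escapes quotes and backslashes in variable values so that every line stays valid JSON
log_format main escape=json '{"@timestamp":"$time_iso8601",'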
'"@version":"1",'
'"client":"$remote_addr",'
'"url":"$request_uri",'
'"status":"$status",'
'"domain":"$host",'
'"host":"$server_addr",'
'"size":$body_bytes_sent,'
'"responsetime":$request_time,'
'"referer": "$http_referer",'
'"ua": "$http_user_agent",'
'"request_time": "$request_time",'
'"request_method": "$request_method"'
'}';
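Once Nginx has been reloaded with the new format, it can help to confirm that the access log really contains the fields the pipeline relies on. The following is only a minimal sketch and not part of the original setup: missing_keys is a hypothetical helper, the key list is our assumption about which fields matter most, and the check is a plain substring match rather than real JSON parsing.

```shell
# Hypothetical helper: print the name of every expected key that is
# absent from one JSON log line (plain substring check, no JSON parser).
missing_keys() {
  line=$1
  for key in @timestamp client url status domain ua; do
    case $line in
      *"\"$key\":"*) ;;           # key present, nothing to report
      *) printf '%s\n' "$key" ;;  # key missing
    esac
  done
}

# Example: inspect the newest entry of the log file used in this article.
# missing_keys "$(tail -n 1 /var/log/nginx/gocsgo.com.log)"
```

If the function prints nothing, the newest entry carries all of the listed keys.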
The logstash.conf configuration file
vim /opt/logstash-8.7.0/config/logstash.conf
# Pipeline: Nginx access log (JSON lines) -> spider filter -> MySQL
# (written through the logstash-output-jdbc plugin).
input {
file {
type => "nginx-access-log"
path => ["/var/log/nginx/gocsgo.com.log"]
start_position => "beginning"
stat_interval => "2"
codec => json
}
}
filter{
if ([ua] !~ "Baiduspider|360Spider|Sogou web spider|bingbot|Sosospider|YandexBot|Bytespider|Googlebot|YisouSpider|YoudaoBot|Yahoo! Slurp China|DNSPOD|AspiegelBot") {
drop{}
}
mutate{
remove_field => ["event", "log"]
add_field => { "domain_lookup" => "%{client}" }
}
useragent {
source => "ua" # field to parse
target => "ua" # field to write the result to (overwritten if it exists, created if it does not)
}
dns {
reverse => [ "domain_lookup" ]
action => "replace"
}
#mutate
# {
# add_field => { "@ua" => "%{ua}" } # first create a new field and copy the ua value into it
# }
#json {
# source => "@ua"
#remove_field => [ "@alert","alert" ]
#}
}
output {
jdbc {
driver_jar_path => "/opt/logstash-8.7.0/vendor/jar/jdbc/mysql-connector-j-8.0.32.jar"
connection_string => "jdbc:mysql://10.138.0.2:3306/spider?user=spider&password=fNd3JyshAXdS47mD&useUnicode=true&characterEncoding=UTF8&useSSL=false"
statement => ["INSERT INTO spider(status,domain,client,ua,request_uri,responsetime,request_time,request_method,timestamp,referer,domain_lookup)values(?,?,?,?,?,?,?,?,?,?,?)","status","domain","client","ua","url","responsetime","request_time","request_method","@timestamp","referer","domain_lookup"]
}
stdout{}
}
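Before starting Logstash, we can preview how the conditional in the filter block would classify traffic. This is a rough sketch that uses grep -E with the same alternation. is_spider and count_spider_lines are hypothetical helper names, and grep's extended regular expressions are only assumed to behave like Logstash's regex engine for a plain alternation such as this one.

```shell
# Same alternation as the Logstash conditional above.
SPIDER_PATTERN='Baiduspider|360Spider|Sogou web spider|bingbot|Sosospider|YandexBot|Bytespider|Googlebot|YisouSpider|YoudaoBot|Yahoo! Slurp China|DNSPOD|AspiegelBot'

# Succeeds (exit status 0) when the given User-Agent string matches a spider.
is_spider() {
  printf '%s\n' "$1" | grep -Eq "$SPIDER_PATTERN"
}

# Prints how many lines of a log file match the pattern. The match is made
# against the whole line, not only the "ua" field, so it is an upper bound.
count_spider_lines() {
  grep -Ec "$SPIDER_PATTERN" "$1"
}

# Example:
# count_spider_lines /var/log/nginx/gocsgo.com.log
```

A count of zero on a busy site suggests that the pattern or the log format needs another look before the pipeline goes live.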
Install the logstash-output-jdbc plugin
[root@VM-16-4-centos logstash-8.7.0]# bin/logstash-plugin install logstash-output-jdbc
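After the install finishes, it is worth confirming that the plugin is actually registered. The sketch below assumes that bin/logstash-plugin list prints one plugin name per line; has_plugin is a hypothetical helper that only inspects that listing.

```shell
# Hypothetical helper: read a plugin listing on stdin and succeed only if
# the named plugin appears as a complete line.
has_plugin() {
  grep -Fqx -- "$1"
}

# Example, run from /opt/logstash-8.7.0:
# bin/logstash-plugin list | has_plugin logstash-output-jdbc && echo "jdbc output installed"
```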
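Notes
1. Change the database IP address, account, and password to your own. The configuration above uses the database address 10.138.0.2:3306, the database spider, the account spider, and the password fNd3JyshAXdS47mD.
2. The JDBC driver jar: Logstash does not ship with a database driver jar, so you need to upload one yourself to the /opt/logstash-8.7.0/vendor/jar/jdbc directory.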
[root@prod jdbc]# pwd
/opt/logstash-8.7.0/vendor/jar/jdbc
[root@prod jdbc]# ls
mysql-connector-j-8.0.32.jar mysql-connector-java-8.0.23.jar
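3. This project uses Logstash to read the log file periodically and insert the records into a MySQL database, with the logstash-output-jdbc plugin as the output. The plugin is not installed by default; fetch it from the official source with the command bin/logstash-plugin install logstash-output-jdbc. On a machine without internet access that command cannot work. Workaround: install the plugin on a machine that is online, then copy the whole Logstash directory to the offline machine.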
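Note 2 is an easy one to miss, so a quick check that the jar named in the pipeline file really exists can save a failed start. This is a minimal sketch under the assumption that driver_jar_path is written on one line with a double-quoted value, as in the configuration above; driver_jar_path and check_driver_jar are hypothetical helper names.

```shell
# Print the first driver_jar_path value found in a Logstash pipeline file.
driver_jar_path() {
  sed -n 's/^[[:space:]]*driver_jar_path[[:space:]]*=>[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed only if a driver jar is configured and the file exists on disk.
check_driver_jar() {
  jar=$(driver_jar_path "$1")
  [ -n "$jar" ] && [ -f "$jar" ]
}

# Example:
# check_driver_jar /opt/logstash-8.7.0/config/logstash.conf || echo "driver jar missing"
```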
Create the database table
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for spider
-- ----------------------------
DROP TABLE IF EXISTS `spider`;
CREATE TABLE `spider` (
`domain` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'domain name',
`client` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'client IP',
`request_uri` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'request URI',
`timestamp` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'request timestamp',
`ua` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'user agent (browser or crawler) information',
`id` int(0) NOT NULL AUTO_INCREMENT,
`status` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'HTTP status code',
`responsetime` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'response time',
`request_time` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'request processing time',
`request_method` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'request method',
`referer` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'referring URL',
`domain_lookup` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 701068 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = DYNAMIC;
SET FOREIGN_KEY_CHECKS = 1;
Run the command below to start Logstash and check whether it reports any errors
/opt/logstash-8.7.0/bin/logstash -f /opt/logstash-8.7.0/config/logstash.conf
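A full start takes a while, so it can be quicker to check the pipeline syntax first. Logstash provides the --config.test_and_exit flag for this, which parses the configuration and exits. The wrapper below is only a sketch: validate_pipeline is a hypothetical helper, and LOGSTASH_HOME defaults to the install path used in this article.

```shell
# Install location used in this article; override it if yours differs.
LOGSTASH_HOME=${LOGSTASH_HOME:-/opt/logstash-8.7.0}

# Parse the given pipeline file and exit without starting the pipeline.
# The exit status is whatever Logstash itself returns.
validate_pipeline() {
  "$LOGSTASH_HOME/bin/logstash" -f "$1" --config.test_and_exit
}

# Example:
# validate_pipeline /opt/logstash-8.7.0/config/logstash.conf
```

A passing syntax check does not prove that the database connection works, so the full start above is still needed.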
If everything works, install supervisor as a process manager so that Logstash keeps running in the background.
[root@prod jdbc]# yum -y install epel-release
[root@prod jdbc]# yum -y install supervisor
[root@prod jdbc]# cat /etc/supervisord.d/logstash.ini
[program:logstash]
environment=LS_JAVA_OPTS="-Xms5000m -Xmx5000m"
directory=/opt/logstash-8.7.0
command=/opt/logstash-8.7.0/bin/logstash -f /opt/logstash-8.7.0/config/logstash.conf -w 10 -l /var/log/logstash
Start the service
[root@prod jdbc]# systemctl start supervisord
[root@prod jdbc]# systemctl status supervisord
[root@prod jdbc]# supervisorctl update
[root@prod jdbc]# supervisorctl status
logstash RUNNING pid 1724408, uptime 0:15:53
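For scripted checks (for example from a cron job), the status line above can be tested instead of read by eye. This is a small sketch that assumes the supervisorctl status layout shown here, with the program name in the first column and the state in the second; is_running is a hypothetical helper.

```shell
# Read `supervisorctl status` output on stdin and succeed only if the named
# program is listed in the RUNNING state.
is_running() {
  awk -v name="$1" '$1 == name && $2 == "RUNNING" { found = 1 } END { exit !found }'
}

# Example:
# supervisorctl status | is_running logstash || echo "logstash is not running"
```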
Check whether access records have appeared in the database table
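One way to do this is a small aggregate query against the spider table. The sketch below only prints the SQL so that it can be piped into the mysql client; spider_hits_sql is a hypothetical helper, and the grouping by domain and status is our choice of summary, not something the setup requires.

```shell
# Print a query that summarizes the stored spider hits per domain and status.
spider_hits_sql() {
  cat <<'SQL'
SELECT domain, status, COUNT(*) AS hits
FROM spider
GROUP BY domain, status
ORDER BY hits DESC
LIMIT 20;
SQL
}

# Example, using the connection details from the notes above
# (the client prompts for the password):
# spider_hits_sql | mysql -h 10.138.0.2 -u spider -p spider
```

An empty result right after startup is normal until a crawler visits the site; rows should appear as matching requests are logged.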