CodeQL Study Notes (code-learning)

Published: 2025-02-02 01:42:55

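Preface

CodeQL has been called the next generation of code-audit tools. These notes record my scan tests with CodeQL and my study of custom rules. Finding problematic code in a huge codebase can be pictured like this: a murder has happened in a village, and the CodeQL detective has to use the available clues to pick the most likely culprit out of a large register of villagers.

That requires inputs, conditions to test, and an output of likely candidates.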

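Scan test

First, let's run CodeQL and see how it works.

Installation

Download the release for your platform:

https://github.com/github/codeql-cli-binaries/releases

Create a directory named codeql-home and put the CLI binaries in it, which makes it easy to add them to the PATH environment variable.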

Download the rule files:

https://github.com/github/codeql

This repository contains the rules for Python, Java, JavaScript, and other languages.

Go rule files:

github/codeql-go: github.com/github/codeql-go/

This repository contains the Go rules.

Usage

1. Create the code database

codeql database create <database> --language=<language-identifier>

The language identifiers are as follows:

Language	Identifier
C/C++	cpp
C#	csharp
Go	go
Java	java
JavaScript/TypeScript	javascript
Python	python

Example: create a database for code scanning

Scan Python code:

codeql database create ./dbs/pythondbs --language=python --source-root=./codescan/smart-master

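Note: source-root is the path to the source code, and the database directory being created (dbs) must be an empty folder.

2. Upgrade the database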

codeql database upgrade dbs/pythondbs

3. Scan with the rules

codeql database analyze ./dbs/pythondbs codeql-repo/python --format=csv --output=analysis/python-results.csv

codeql-repo/python: the Python scanning rules downloaded earlier
--format: the output format of the results, including CSV, SARIF, and graph formats
--output: the path of the results file
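The three steps above can be chained in a small script. The sketch below only assembles the argument lists for the commands shown in this section and does not run anything. The helper name build_scan_commands and the idea of wrapping the CLI in Python are my own assumptions, not part of CodeQL.

```python
def build_scan_commands(db_dir, language, source_root, rules_dir, output_csv):
    """Return the three CodeQL CLI invocations from this section as argument lists.

    Hypothetical helper: it only builds the commands, it does not execute them.
    """
    create = [
        "codeql", "database", "create", db_dir,
        f"--language={language}",
        f"--source-root={source_root}",
    ]
    upgrade = ["codeql", "database", "upgrade", db_dir]
    analyze = [
        "codeql", "database", "analyze", db_dir, rules_dir,
        "--format=csv",
        f"--output={output_csv}",
    ]
    return [create, upgrade, analyze]


commands = build_scan_commands(
    "./dbs/pythondbs",
    "python",
    "./codescan/smart-master",
    "codeql-repo/python",
    "analysis/python-results.csv",
)
```

Each list can then be handed to subprocess.run(cmd, check=True) so that a failing step stops the pipeline.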

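Compiled and non-compiled languages

For compiled languages, a build step has to be included when the database is created; this mainly concerns Java. See the section "Creating databases for compiled languages" in:

Creating CodeQL databases: help.semmle.com/codeql/codeql-cli/procedures/create-codeql-database.html

Non-compiled languages can simply be scanned directly.

Go can be scanned either with or without a build.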

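Basics

With the environment set up, the next question is how the search for the culprit actually works inside it.

I. The QL language

As usual, start with hello world. The QL file that outputs hello world is: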

import javascript
select "Hello World"

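II. Database overview

The database describes the project's code. By analogy, it is as if every villager had been entered into a database, and CodeQL has to work through that database step by step to find the culprit.

A database is created in two steps:

· Data extraction: converts the source files into the underlying hierarchical structure defined by the code. For compiled languages this step requires building the source. LGTM first determines which files need to be processed, then picks the matching extractor based on the programming language of each file and converts the source files into a relational representation, the so-called TRAP files.

· Data import: loads all extracted data into the database so that it can be queried. The result consists of the database itself plus a copy of every analyzed source file.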
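To make the two steps more concrete, here is a toy analogy in Python. It is not how the CodeQL extractors actually work: the "extraction" step turns source text into rows of a relation, and the "import" step stores those rows next to a copy of the source so they can be queried. The function names and the row layout (node id, node kind, line number) are hypothetical.

```python
import ast


def extract(source):
    """Toy 'extractor': turn Python source into relational rows (id, kind, line)."""
    rows = []
    for node_id, node in enumerate(ast.walk(ast.parse(source))):
        line = getattr(node, "lineno", None)  # not every node carries a line number
        rows.append((node_id, type(node).__name__, line))
    return rows


def import_rows(rows, source):
    """Toy 'import': keep the rows together with a copy of the analyzed source."""
    return {"nodes": rows, "source_copy": source}


SOURCE = "x = f(1)\n"
db = import_rows(extract(SOURCE), SOURCE)

# A "query" is then just a filter over the rows.
calls = [row for row in db["nodes"] if row[1] == "Call"]
```

The point of the analogy is only that, once code has become rows, finding things in it is an ordinary data query.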

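III. Treating code like data: the QL library

A QL predicate is a mini-query that expresses a relation between data items and describes certain properties of the data. "Query" here means a program, a piece of code, or a function written in QL whose job is to express relations between data or to describe properties. For our purposes, predicates can be seen as library functions for accessing a villager's personal details, such as height or age.
A predicate without a result must start with the keyword predicate.
A predicate with a result replaces the keyword predicate with the type of the result.

A rather interesting reference example: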

import javascript

// select "hhhhhhhhh"
string getANeighbor(string country) {
    country = "France" and result = "Belgium"
    or
    country = "France" and result = "Germany"
    or
    country = "Germany" and result = "Austria"
    or
    country = "Germany" and result = "Belgium"
    or
    country = getANeighbor(result)
}

select getANeighbor("Belgium")

The predicate is getANeighbor. Its result type is string, its parameter is country, and it is defined recursively.

The output is France and Germany.
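The recursive clause can look odd at first, so here is a Python sketch of how I read it: the predicate denotes a set of (country, result) pairs, and the last clause says that a pair holds whenever the swapped pair holds. The sketch computes that set by repeating the rule until nothing new is added. This is only an illustration of the meaning, not a description of how the QL evaluator works internally, and the function names are my own.

```python
# The four base facts from the predicate body, as (country, result) pairs.
BASE = {
    ("France", "Belgium"),
    ("France", "Germany"),
    ("Germany", "Austria"),
    ("Germany", "Belgium"),
}


def neighbor_relation():
    """Apply `country = getANeighbor(result)` until a fixed point is reached."""
    relation = set(BASE)
    while True:
        # The recursive clause: (country, result) holds if (result, country) holds.
        extended = relation | {(result, country) for (country, result) in relation}
        if extended == relation:
            return relation
        relation = extended


def get_a_neighbor(country):
    """All results paired with `country`, sorted for stable output."""
    return sorted(result for (c, result) in neighbor_relation() if c == country)


belgium_neighbors = get_a_neighbor("Belgium")
```

For "Belgium" none of the four base facts match directly, so both answers come from the recursive clause.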

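IV. Logical connectives

In QL, exists corresponds to the existential quantifier of logic and means "there is some" or "at least one". QL also provides a universal quantifier, forall, meaning "all" or "every".

These can be thought of as conditions relating the data. For example, if the answer to "Is the murderer taller than 150 cm?" is yes, the QL is: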

from Person p
where p.getHeight() > 150
select p

Now add another condition: "Is the murderer male?" The answer is yes, so the query gains one more constraint:

from Person p
where p.getHeight() > 150 and p.getGender()="man"
select p

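The conditions above are exact facts. What if a value is unknown? For example, nobody can say what colour the murderer's hair is, but it is certain that the murderer is not bald. How should that be queried?

Introduce a variable c: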

from Person t, string c
where t.getHairColor() = c
select t

Since c is never used in select, it is better to write this with the exists keyword:

from Person t
where exists(string c | t.getHairColor() = c)
select t

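exists introduces a temporary variable c of type string, and the where clause holds only if at least one string c satisfies t.getHairColor() = c.

Add another condition: is the murderer the oldest person? The answer is no. The corresponding formula is: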

exists(Person t | t.getAge() > p.getAge())

That is, there exists another person whose age is greater than the murderer's.

And what if we want the age of the oldest person:

max(int i | exists(Person p | p.getAge() = i) | i)

Had the answer been yes, we would instead look among all villagers for the one whose age equals that maximum:

from Person t
where t.getAge() = max(int i | exists(Person p | p.getAge() = i) | i)
select t

min(Person p | p.getLocation() = "east" | p order by p.getHeight())
count(Person p | p.getLocation() = "south" | p)
avg(Person p | | p.getHeight())
sum(Person p | p.getHairColor() = "brown" | p.getAge())
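The aggregates listed above (min, count, avg, sum) follow the same shape as max. To check my understanding I mirrored each formula in Python over a small made-up list of villagers. The Person fields and the sample data are hypothetical, and a hair colour of None stands in for "getHairColor() has no result".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    name: str
    age: int
    height: int
    location: str
    hair_color: Optional[str]  # None models a bald villager


VILLAGERS = [
    Person("Ann", 70, 160, "south", "brown"),
    Person("Bob", 45, 175, "east", None),
    Person("Cid", 30, 150, "east", "black"),
    Person("Dee", 8, 120, "south", "brown"),
]

# exists(string c | t.getHairColor() = c): the person has some hair colour
not_bald = [p.name for p in VILLAGERS if p.hair_color is not None]

# exists(Person t | t.getAge() > p.getAge()): somebody is older than p
not_oldest = [p.name for p in VILLAGERS if any(t.age > p.age for t in VILLAGERS)]

# max(int i | exists(Person p | p.getAge() = i) | i)
max_age = max(p.age for p in VILLAGERS)
oldest = [p.name for p in VILLAGERS if p.age == max_age]

# min(Person p | p.getLocation() = "east" | p order by p.getHeight())
shortest_in_east = min(
    (p for p in VILLAGERS if p.location == "east"), key=lambda p: p.height
)

# count(Person p | p.getLocation() = "south" | p)
south_count = sum(1 for p in VILLAGERS if p.location == "south")

# avg(Person p | | p.getHeight())
avg_height = sum(p.height for p in VILLAGERS) / len(VILLAGERS)

# sum(Person p | p.getHairColor() = "brown" | p.getAge())
brown_age_sum = sum(p.age for p in VILLAGERS if p.hair_color == "brown")
```

In every case the three parts of the QL aggregate map onto the same three pieces: the variables being ranged over, the filter, and the expression being aggregated.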

The final query for finding the murderer is:

from Person p
where p.getHeight() > 150 and
p.getGender()="man" and
exists(string c | p.getHairColor() = c) and
exists(Person t | t.getAge() > p.getAge())
select p

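Suddenly a new clue arrives: the murderer lives in the south of the village, which narrows the search to one specific group.

Add a new predicate that states this condition on its own: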

predicate southern(Person p) {
    p.getLocation() = "south"
}

The filter now becomes:

from Person p
where southern(p)
select p

Beyond that, a new type can be defined:

class Southerner extends Person {
    Southerner() { southern(this) }
}

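The expression southern(this) defines the logical property that this class represents; it is called the characteristic predicate of the class. Note the special variable this: here it stands for a value of type Person, that is, a villager. If this satisfies the constraint southern(this), the villager it denotes belongs to the class Southerner, meaning a villager who lives in the south.

Queries can now range over Southerner directly. Add one more condition: children cannot have committed the crime. Define a class for children and, at the same time, restrict where they may go: because of the murder, children are confined to their own region. This is done by overriding the isAllowedIn predicate: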

class Child extends Person {
    /* the characteristic predicate */
    Child() { this.getAge() < 10 }
 
    /* a member predicate */
    override predicate isAllowedIn(string region) {
        region = this.getLocation()
    }
}

The resulting query is:

import tutorial
 
predicate southern(Person p) {
    p.getLocation() = "south"
}
 
class Southerner extends Person {
    /* the characteristic predicate */
    Southerner() { southern(this) }
}
 
class Child extends Person {
    /* the characteristic predicate */
    Child() { this.getAge() < 10 }
 
    /* a member predicate */
    override predicate isAllowedIn(string region) {
        region = this.getLocation()
    }
}
 
predicate bald(Person p) {
    not exists (string c | p.getHairColor() = c)
}
 
from Southerner s
where s.isAllowedIn("north") and bald(s)
select s
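To see how the characteristic predicates and the override interact, I find it useful to read each class as a filter over Person values. The Python sketch below mirrors the query above under two assumptions of mine: the villager data is made up, and the default isAllowedIn (the one that Child overrides) is taken to allow all four regions, which I have not verified against the tutorial library.

```python
from dataclasses import dataclass
from typing import Optional

REGIONS = ("north", "south", "east", "west")


@dataclass(frozen=True)
class Person:
    name: str
    age: int
    location: str
    hair_color: Optional[str]  # None models a bald villager


def southern(p):
    """predicate southern(Person p); also Southerner's characteristic predicate."""
    return p.location == "south"


def is_child(p):
    """Child's characteristic predicate: this.getAge() < 10."""
    return p.age < 10


def bald(p):
    """not exists(string c | p.getHairColor() = c)"""
    return p.hair_color is None


def is_allowed_in(p, region):
    # The override applies to any Person that also satisfies Child's predicate.
    if is_child(p):
        return region == p.location
    # Assumed default behaviour: adults may enter every region.
    return region in REGIONS


VILLAGERS = [
    Person("Ann", 70, "south", None),
    Person("Dee", 8, "south", None),
    Person("Bob", 40, "north", None),
    Person("Eve", 35, "south", "red"),
]

# from Southerner s where s.isAllowedIn("north") and bald(s) select s
suspects = [
    p.name for p in VILLAGERS
    if southern(p) and is_allowed_in(p, "north") and bald(p)
]
```

The child from the south is filtered out only because of the override: she satisfies Southerner and bald, but isAllowedIn("north") no longer holds for her.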

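Input and output data flow

Consider a classic logic puzzle: ferry a goat, a cabbage, and a wolf across a river.

Breaking it down into basic elements, there are two kinds of input: cargo and shores.

The cargo items are the goat, the cabbage, and the wolf.

There are two shores: left and right.

Define the cargo class: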

class Cargo extends string {
  Cargo() {
    this = "Nothing" or
    this = "Goat" or
    this = "Cabbage" or
    this = "Wolf"
  }
}

Define the shore class:

class Shore extends string {
  Shore() {
    this = "Left" or
    this = "Right"
  }
}

Ferrying cargo changes which shore it is on, so define a member predicate of Shore that returns the opposite shore:

  Shore other() {
    this = "Left" and result = "Right"
    or
    this = "Right" and result = "Left"
  }

Next, a state records the shore on which the man and each piece of cargo currently are:

class State extends string {
  Shore manShore;
  Shore goatShore;
  Shore cabbageShore;
  Shore wolfShore;
  State() { this = manShore + "," + goatShore + "," + cabbageShore + "," + wolfShore }
}

Everything starts on the left shore and must end up on the right shore, which gives two subclasses:

class InitialState extends State {
  InitialState() { this = "Left" + "," + "Left" + "," + "Left" + "," + "Left" }
}

class GoalState extends State {
  GoalState() { this = "Right" + "," + "Right" + "," + "Right" + "," + "Right" }
}

The QL code so far:

class Cargo extends string {
  Cargo() {
    this = "Nothing" or
    this = "Goat" or
    this = "Cabbage" or
    this = "Wolf"
  }
}
 
/** One of two shores. */
class Shore extends string {
  Shore() {
    this = "Left" or
    this = "Right"
  }
 
  /** Returns the other shore. */
  Shore other() {
    this = "Left" and result = "Right"
    or
    this = "Right" and result = "Left"
  }
}
 
/** Renders the state as a string. */
string renderState(Shore manShore, Shore goatShore, Shore cabbageShore, Shore wolfShore) {
  result = manShore + "," + goatShore + "," + cabbageShore + "," + wolfShore
}
 
/** A record of where everything is. */
class State extends string {
  Shore manShore;
  Shore goatShore;
  Shore cabbageShore;
  Shore wolfShore;
 
  State() { this = renderState(manShore, goatShore, cabbageShore, wolfShore) }
}
 
/** The initial state, where everything is on the left shore. */
class InitialState extends State {
  InitialState() { this = renderState("Left", "Left", "Left", "Left") }
}
 
/** The goal state, where everything is on the right shore. */
class GoalState extends State {
  GoalState() { this = renderState("Right", "Right", "Right", "Right") }
}

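Now the ferrying starts. Every crossing changes the man's shore, and when a cargo item is ferried, its shore changes too. This is a member predicate of State: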

State ferry(Cargo cargo) {
    cargo = "Nothing" and
    result = renderState(manShore.other(), goatShore, cabbageShore, wolfShore)
    or
    cargo = "Goat" and
    result = renderState(manShore.other(), goatShore.other(), cabbageShore, wolfShore)
    or
    cargo = "Cabbage" and
    result = renderState(manShore.other(), goatShore, cabbageShore.other(), wolfShore)
    or
    cargo = "Wolf" and
    result = renderState(manShore.other(), goatShore, cabbageShore, wolfShore.other())
  }

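To keep the cargo safe, add two more member predicates: isSafe holds when nothing can be eaten in a state, and safeFerry only yields a ferry move whose resulting state is safe: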

predicate isSafe() {
    // The goat can't eat the cabbage.
    (goatShore != cabbageShore or goatShore = manShore) and
    // The wolf can't eat the goat.
    (wolfShore != goatShore or wolfShore = manShore)
  }
State safeFerry(Cargo cargo) { result = this.ferry(cargo) and result.isSafe() }
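To convince myself that isSafe encodes the two rules correctly, I enumerated all 16 possible states in Python and counted how many pass. The tuple layout (man, goat, cabbage, wolf) and the function name are my own choices for this sketch.

```python
from itertools import product

SHORES = ("Left", "Right")


def is_safe(man, goat, cabbage, wolf):
    """Same two conditions as the QL isSafe() member predicate."""
    # The goat can't eat the cabbage: they are apart, or the man is with them.
    goat_cabbage_ok = goat != cabbage or goat == man
    # The wolf can't eat the goat: they are apart, or the man is with them.
    wolf_goat_ok = wolf != goat or wolf == man
    return goat_cabbage_ok and wolf_goat_ok


all_states = list(product(SHORES, repeat=4))
safe_states = [state for state in all_states if is_safe(*state)]
unsafe_states = [state for state in all_states if not is_safe(*state)]
```

Of the 16 states, 10 are safe and 6 are not. For example, the man crossing alone from the start leaves the goat with the cabbage (and with the wolf), so that state is rejected.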

Find the path, that is, the data flow (I have not fully understood this part yet):

/**
   * Returns all states that are reachable via safe ferrying.
   * `path` keeps track of how it is achieved.
   * `visitedStates` keeps track of previously visited states and is used to avoid loops.
   */
  State reachesVia(string path, string visitedStates) {
    // Trivial case: a state is always reachable from itself.
    this = result and
    visitedStates = this and
    path = ""
    or
    // A state is reachable using pathSoFar and then safely ferrying cargo.
    exists(string pathSoFar, string visitedStatesSoFar, Cargo cargo |
      result = this.reachesVia(pathSoFar, visitedStatesSoFar).safeFerry(cargo) and
      // The resulting state has not yet been visited.
      not exists(int i | i = visitedStatesSoFar.indexOf(result)) and
      visitedStates = visitedStatesSoFar + "/" + result and
      path = pathSoFar + "\n Ferry " + cargo
    )
  }

Output the data flow:

from string path
where any(InitialState i).reachesVia(path, _) = any(GoalState g)
select path
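Since reachesVia was the part I struggled with, I rewrote the search in Python. Note the swap: this sketch uses an ordinary breadth-first search with a visited set, which is not how QL evaluates the recursive predicate; the QL version yields every loop-free path, while this one returns a single shortest path. ferry and is_safe mirror the member predicates above, and all names here are my own.

```python
from collections import deque

CARGO = ("Nothing", "Goat", "Cabbage", "Wolf")
INITIAL = ("Left", "Left", "Left", "Left")  # (man, goat, cabbage, wolf)
GOAL = ("Right", "Right", "Right", "Right")


def other(shore):
    return "Right" if shore == "Left" else "Left"


def ferry(state, cargo):
    """Mirror of State.ferry: the man always crosses, plus the chosen cargo.

    Like the QL version, it does not check that the cargo starts on the man's shore.
    """
    man, goat, cabbage, wolf = state
    if cargo == "Goat":
        goat = other(goat)
    elif cargo == "Cabbage":
        cabbage = other(cabbage)
    elif cargo == "Wolf":
        wolf = other(wolf)
    return (other(man), goat, cabbage, wolf)


def is_safe(state):
    man, goat, cabbage, wolf = state
    return (goat != cabbage or goat == man) and (wolf != goat or wolf == man)


def find_path(start=INITIAL, goal=GOAL):
    """Breadth-first search over safe states; returns the list of cargo moves."""
    queue = deque([(start, [])])
    visited = {start}  # plays the role of visitedStates: never revisit a state
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for cargo in CARGO:
            next_state = ferry(state, cargo)
            if is_safe(next_state) and next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, path + [cargo]))
    return None


path = find_path()
```

Reading the two versions side by side, path corresponds to the QL path string and visited to visitedStates; the trivial case "a state is reachable from itself" is the initial queue entry with an empty path.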

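Hands-on exercises

I. Environment setup

The tools used are:

Tool: VS Code. Extension: CodeQL

After a successful installation, the CodeQL view shows up in VS Code.

One part of that view is for adding CodeQL code databases; the database folder can be created with the method described above.

For JavaScript, the following database can be loaded for practice:

esbena_bootstrap-pre-27047_javascript CodeQL database: github.com/githubsatelliteworkshops/codeql/releases/download/v1.0/esbena_bootstrap-pre-27047_javascript.zip

Once loaded, the database is listed in the view. It is effectively our source code, just turned into data by CodeQL for analysis.

Download the following project; note that it must be cloned with git clone --recursive:

github/vscode-codeql-starter: github.com/github/vscode-codeql-starter

This project is meant for practising query writing. Its ql folder contains the SDKs for C/C++, C#, Java, JavaScript, Python, and other languages, which the queries written later will import; codeql-go contains the Go standard library; custom holds demo queries.

Once it has loaded, the project shows up in VS Code.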

II. Find calls to the $ function in the project

Now let's write the first rule: find the calls to the $ function in the project.

Edit the file calls-to-dollar.ql:

import javascript

from CallExpr dollarCall
where dollarCall.getCalleeName() = "$"
select dollarCall

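Right-click and run the query; the results are shown directly in the panel on the right.

Clicking a result jumps straight to the corresponding code.

Here is what the query does:

import brings in the JavaScript standard library;
from picks the class to query from the standard library; hovering over a name in the editor shows its documentation.

dollarCall declares a variable of that class, and where then constrains it. getCalleeName() returns the name of the function being called; here we are looking for $().

select outputs the results.

The official explanation is:

from /* ... variable declarations ... */
where /* ... logical formula ... */
select /* ... expressions ... */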

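III. Find the first argument of calls to $

How can QL find first-argument in $(<first-argument>)? Edit calls-to-dollar-arg.ql. First find the $() calls, which is the query above, then extend it to look up the argument: declare a variable for the argument and bind it to the value that is found.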

import javascript

from CallExpr dollarCall, Expr dollarArg
where dollarCall.getCalleeName() = "$" and
dollarArg=dollarCall.getArgument(0)
select dollarArg

Run it again; this finds all the argument values.
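The from / where / select shape of these two queries maps quite directly onto walking a syntax tree and filtering nodes. As an analogy only, the Python sketch below does the same thing with the standard ast module on a made-up snippet. Since $ is not a valid Python identifier, the stand-in function is called jq; that name, the sample source, and the helper names are all hypothetical. The sketch only matches calls whose callee is a plain name.

```python
import ast

SOURCE = """
jq("#menu").show()
jq(selector, context)
other("ignored")
"""


def calls_to(tree, name):
    """Analog of: from CallExpr c where c.getCalleeName() = name select c"""
    matches = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    ]
    # ast.walk gives no ordering guarantee, so sort by source position.
    return sorted(matches, key=lambda call: (call.lineno, call.col_offset))


def first_arguments(tree, name):
    """Analog of: dollarArg = dollarCall.getArgument(0)"""
    return [call.args[0] for call in calls_to(tree, name) if call.args]


tree = ast.parse(SOURCE)
dollar_calls = calls_to(tree, "jq")
first_args = [ast.unparse(arg) for arg in first_arguments(tree, "jq")]
```

The class in from corresponds to the isinstance check, the where clause to the remaining conditions, and select to whatever the list comprehension returns.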

IV. Using AST node data

CodeQL ships with a built-in jquery() predicate that can stand in for the search for $(). Rewrite the argument query above with it:

import javascript

from DataFlow::Node dollarArg
where dollarArg=jquery().getACall().getArgument(0)
select dollarArg

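DataFlow::Node can loosely be thought of as an AST node; strictly speaking it is a node in the data flow graph, and many such nodes correspond to expressions in the AST.

Finding a property: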

import javascript

from DataFlow::Node n
where n=jquery().getAPropertyRead("fn")
select n

V. Finding data flow

I have not yet learned how to show how the data flow links things together.