✨ Add file analyzers for multiple languages
Introduce file analyzers for Python, JavaScript, JSON, YAML, and Markdown.
Update Rust file analyzer to detect modified structs and traits. Enhance the
central module to include the new analyzers and ensure proper analyzer
selection based on file extensions.

Add tests to confirm functionality and accuracy of the new analyzers.
hyperb1iss committed Jul 29, 2024
1 parent 9310eb7 commit 2fb592c
Showing 9 changed files with 821 additions and 7 deletions.
156 changes: 156 additions & 0 deletions docs/file-analyzers-docs.md
@@ -0,0 +1,156 @@
# File Analyzers System Documentation

## Table of Contents
1. Introduction
2. Rationale
3. System Architecture
4. File Type Analyzers
   - 4.1 Rust Analyzer
   - 4.2 JavaScript/TypeScript Analyzer
   - 4.3 Python Analyzer
   - 4.4 YAML Analyzer
   - 4.5 JSON Analyzer
   - 4.6 Markdown Analyzer
5. Default Analyzer
6. Extensibility
7. Conclusion

## 1. Introduction

The File Analyzers System is a component of the Git-Iris project that analyzes the changes made to different types of files in a Git repository. It feeds file-type-aware information about those modifications into the commit message generation process.

## 2. Rationale

The primary goal of the File Analyzers System is to improve the quality and specificity of generated commit messages. By analyzing the changes made to different file types, the system can provide more detailed and accurate information about the nature of the modifications. This context-rich analysis enables the AI to generate more meaningful and descriptive commit messages, enhancing the overall clarity and usefulness of the Git history.

Key benefits of the File Analyzers System include:
- Language-specific analysis for more accurate insights
- Detection of structural changes (e.g., function modifications, class updates)
- Identification of changes to configuration files and documentation
- Improved context for AI-generated commit messages

## 3. System Architecture

The File Analyzers System is built on a modular architecture, allowing for easy extension and maintenance. The core components include:

1. `FileAnalyzer` trait: Defines the interface for all file analyzers
2. `get_analyzer` function: Factory function that returns the appropriate analyzer based on the file extension
3. Individual analyzer implementations for each supported file type
4. Default analyzer for unsupported file types

The system is designed to be easily extendable, allowing new file type analyzers to be added with minimal changes to the existing codebase.
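As an illustration, the trait and the factory could fit together roughly as below. This is a minimal sketch, not the code in `mod.rs`: the two trait methods match the signatures used by the analyzers in this commit, but `FileChange` is a simplified stand-in for `crate::git::FileChange`, and `StubAnalyzer` and the exact `get_analyzer` signature are hypothetical.

```rust
use std::path::Path;

/// Simplified stand-in for `crate::git::FileChange` (hypothetical):
/// only the `diff` field used by the analyzers is modeled.
pub struct FileChange {
    pub diff: String,
}

/// Interface shared by all analyzers (method signatures as used in this commit).
pub trait FileAnalyzer {
    fn analyze(&self, file: &str, change: &FileChange) -> Vec<String>;
    fn get_file_type(&self) -> &'static str;
}

/// Hypothetical placeholder analyzer that only reports a file-type label.
struct StubAnalyzer(&'static str);

impl FileAnalyzer for StubAnalyzer {
    fn analyze(&self, _file: &str, _change: &FileChange) -> Vec<String> {
        Vec::new()
    }

    fn get_file_type(&self) -> &'static str {
        self.0
    }
}

/// Hypothetical factory: chooses an analyzer from the file extension and
/// falls back to the default analyzer for anything unrecognized.
pub fn get_analyzer(file: &str) -> Box<dyn FileAnalyzer> {
    let ext = Path::new(file)
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("");
    match ext {
        "js" | "ts" => Box::new(StubAnalyzer("JavaScript/TypeScript source file")),
        "json" => Box::new(StubAnalyzer("JSON configuration file")),
        "md" => Box::new(StubAnalyzer("Markdown file")),
        _ => Box::new(StubAnalyzer("Unknown file type")),
    }
}
```

Returning a boxed trait object keeps callers independent of the concrete analyzer types, so adding a file type only touches the `match`.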

## 4. File Type Analyzers

### 4.1 Rust Analyzer

The Rust Analyzer is responsible for analyzing changes in Rust source files (`.rs`).

Key features:
- Detects modifications to functions, structs, and traits
- Identifies changes to import statements
- Provides information about added, modified, or removed Rust-specific constructs

Implementation details:
- Uses regex patterns to identify Rust-specific syntax
- Extracts names of modified functions, structs, and traits
- Checks for changes in use statements and extern crate declarations
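The sketch below shows one way to pull item names out of changed diff lines. It is not the analyzer's implementation: to stay dependency-free it swaps the regex patterns for plain string matching, and `extract_rust_items` is a hypothetical helper.

```rust
/// Hypothetical helper: returns the names of items introduced by `keyword`
/// (e.g. "fn", "struct", "trait") on added or removed diff lines.
fn extract_rust_items(diff: &str, keyword: &str) -> Vec<String> {
    let mut names = Vec::new();
    for line in diff.lines() {
        // Skip unified-diff file headers such as `+++ b/src/lib.rs`.
        if line.starts_with("+++") || line.starts_with("---") {
            continue;
        }
        // Keep only added (`+`) or removed (`-`) lines.
        let Some(body) = line.strip_prefix('+').or_else(|| line.strip_prefix('-')) else {
            continue;
        };
        // Skip leading qualifiers such as `pub`, `pub(crate)`, `async`, `unsafe`.
        let mut tokens = body
            .split_whitespace()
            .skip_while(|t| t.starts_with("pub") || *t == "async" || *t == "unsafe");
        if tokens.next() == Some(keyword) {
            if let Some(tok) = tokens.next() {
                // Cut at the first non-identifier character, e.g. `area(` or `Shape<T>`.
                let name: String = tok
                    .chars()
                    .take_while(|c| c.is_alphanumeric() || *c == '_')
                    .collect();
                if !name.is_empty() && !names.contains(&name) {
                    names.push(name);
                }
            }
        }
    }
    names
}
```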

### 4.2 JavaScript/TypeScript Analyzer

The JavaScript/TypeScript Analyzer handles changes in JavaScript and TypeScript files (`.js`, `.ts`).

Key features:
- Detects modifications to functions and classes
- Identifies changes to import/export statements
- Recognizes updates to React components (both class and functional)

Implementation details:
- Utilizes regex patterns to capture JavaScript/TypeScript syntax
- Extracts names of modified functions, classes, and React components
- Checks for changes in import and export statements
- Distinguishes between regular functions and React functional components
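The distinction between regular functions and React functional components can be sketched as follows. This is an illustration under stated assumptions, not the analyzer's regex: the hypothetical `classify_js_line` combines a JSX check similar in spirit to the commit's pattern (a `<` right after `=>` or `return` on the same line) with a PascalCase naming check that the commit's code does not use.

```rust
/// Kind of declaration found on a changed line (hypothetical type).
#[derive(Debug, PartialEq)]
enum JsDecl {
    Function(String),
    Component(String),
}

/// Hypothetical classifier for `function Name` / `const Name =` lines.
/// Heuristic (an assumption): a PascalCase name plus same-line JSX means
/// a React functional component; anything else is a regular function.
fn classify_js_line(line: &str) -> Option<JsDecl> {
    // Only added (`+`) or removed (`-`) diff lines are considered.
    let body = line
        .strip_prefix('+')
        .or_else(|| line.strip_prefix('-'))?
        .trim_start();
    // Drop optional `export` / `default` modifiers.
    let rest = body.strip_prefix("export ").unwrap_or(body);
    let rest = rest.strip_prefix("default ").unwrap_or(rest);
    // Accept `function Name` or `const Name = ...`.
    let rest = rest
        .strip_prefix("function ")
        .or_else(|| rest.strip_prefix("const "))?;
    let name: String = rest
        .trim_start()
        .chars()
        .take_while(|c| c.is_alphanumeric() || *c == '_' || *c == '$')
        .collect();
    if name.is_empty() {
        return None;
    }
    // JSX check: a `<` directly after `=>` or `return` on the same line.
    let returns_jsx = ["=>", "return"].iter().any(|&kw| {
        body.find(kw)
            .map_or(false, |i| body[i + kw.len()..].trim_start().starts_with('<'))
    });
    let pascal_case = name.chars().next().map_or(false, |c| c.is_ascii_uppercase());
    if returns_jsx && pascal_case {
        Some(JsDecl::Component(name))
    } else {
        Some(JsDecl::Function(name))
    }
}
```

A single-line check like this misses components whose JSX starts on a later line; the commit's regex tolerates that case by allowing the `return` to sit anywhere inside the function body.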

### 4.3 Python Analyzer

The Python Analyzer is responsible for analyzing changes in Python source files (`.py`).

Key features:
- Detects modifications to functions and classes
- Identifies changes to import statements
- Recognizes updates to decorators

Implementation details:
- Uses regex patterns to identify Python-specific syntax
- Extracts names of modified functions and classes
- Checks for changes in import statements
- Identifies modifications to decorator usage
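A dependency-free sketch of these Python checks follows. `PythonChanges`, `analyze_python_diff`, and `push_name` are hypothetical names, and simple prefix matching stands in for the analyzer's regex patterns.

```rust
/// Summary of Python constructs touched by a diff (hypothetical type).
#[derive(Debug, Default)]
struct PythonChanges {
    functions: Vec<String>,
    classes: Vec<String>,
    imports_changed: bool,
    decorators_changed: bool,
}

/// Appends the identifier at the start of `rest`, skipping duplicates.
fn push_name(names: &mut Vec<String>, rest: &str) {
    let name: String = rest
        .trim_start()
        .chars()
        .take_while(|c| c.is_alphanumeric() || *c == '_')
        .collect();
    if !name.is_empty() && !names.contains(&name) {
        names.push(name);
    }
}

/// Hypothetical analyzer: scans added/removed lines for `def`, `class`,
/// `import`/`from`, and decorator (`@`) lines.
fn analyze_python_diff(diff: &str) -> PythonChanges {
    let mut out = PythonChanges::default();
    for line in diff.lines() {
        if line.starts_with("+++") || line.starts_with("---") {
            continue; // unified-diff file headers
        }
        let Some(body) = line.strip_prefix('+').or_else(|| line.strip_prefix('-')) else {
            continue; // context line
        };
        let code = body.trim_start();
        // `async def name(` also counts as a function.
        let without_async = code.strip_prefix("async ").unwrap_or(code);
        if let Some(rest) = without_async.strip_prefix("def ") {
            push_name(&mut out.functions, rest);
        } else if let Some(rest) = code.strip_prefix("class ") {
            push_name(&mut out.classes, rest);
        } else if code.starts_with("import ") || code.starts_with("from ") {
            out.imports_changed = true;
        } else if code.starts_with('@') {
            out.decorators_changed = true;
        }
    }
    out
}
```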

### 4.4 YAML Analyzer

The YAML Analyzer handles changes in YAML configuration files (`.yaml`, `.yml`).

Key features:
- Detects modifications to top-level keys
- Identifies changes to list structures
- Recognizes updates to nested structures

Implementation details:
- Utilizes regex patterns to capture YAML syntax
- Extracts names of modified top-level keys
- Checks for changes in list structures and nested objects
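One plausible reading of "top-level key" for YAML is a key with no indentation. The sketch below assumes that rule; `YamlChanges` and `analyze_yaml_diff` are hypothetical, and string matching replaces the analyzer's regexes.

```rust
/// Summary of YAML changes in a diff (hypothetical type).
#[derive(Debug, Default)]
struct YamlChanges {
    top_level_keys: Vec<String>,
    lists_changed: bool,
    nested_changed: bool,
}

/// Hypothetical analyzer: unindented `key:` lines are top-level keys,
/// indented ones belong to nested mappings, and `- ` lines are list items.
fn analyze_yaml_diff(diff: &str) -> YamlChanges {
    let mut out = YamlChanges::default();
    for line in diff.lines() {
        if line.starts_with("+++") || line.starts_with("---") {
            continue; // unified-diff file headers
        }
        let Some(body) = line.strip_prefix('+').or_else(|| line.strip_prefix('-')) else {
            continue; // context line
        };
        let trimmed = body.trim_start();
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue; // blank line or comment
        }
        if trimmed.starts_with("- ") || trimmed == "-" {
            out.lists_changed = true;
        } else if let Some((key, _)) = trimmed.split_once(':') {
            let key = key.trim();
            let looks_like_key = !key.is_empty()
                && key.chars().all(|c| c.is_alphanumeric() || c == '_' || c == '-');
            if looks_like_key && trimmed.len() == body.len() {
                // No indentation: treat as a top-level key.
                if !out.top_level_keys.iter().any(|k| k == key) {
                    out.top_level_keys.push(key.to_string());
                }
            } else if looks_like_key {
                // Indented key: part of a nested mapping.
                out.nested_changed = true;
            }
        }
    }
    out
}
```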

### 4.5 JSON Analyzer

The JSON Analyzer is responsible for analyzing changes in JSON files (`.json`).

Key features:
- Detects modifications to top-level keys
- Identifies changes to array structures
- Recognizes updates to nested objects

Implementation details:
- Uses regex patterns to identify JSON syntax
- Extracts names of modified top-level keys
- Checks for changes in array structures and nested objects
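Deciding which JSON keys are top-level from a diff hunk alone is a heuristic. The sketch below assumes the file is pretty-printed with a fixed indent width (the hypothetical `indent_width` parameter) and treats keys at exactly that indentation as top-level; it is an alternative illustration, not the neighbouring-line approach used in `json.rs`.

```rust
/// Hypothetical helper: top-level keys on added/removed lines of a
/// pretty-printed JSON diff, assuming one indent level of `indent_width`.
/// Escaped quotes inside keys are not handled.
fn top_level_json_keys(diff: &str, indent_width: usize) -> Vec<String> {
    let mut keys = Vec::new();
    for line in diff.lines() {
        if line.starts_with("+++") || line.starts_with("---") {
            continue; // unified-diff file headers
        }
        let Some(body) = line.strip_prefix('+').or_else(|| line.strip_prefix('-')) else {
            continue; // context line
        };
        // Keys nested deeper than the first level are ignored.
        let indent = body.len() - body.trim_start().len();
        if indent != indent_width {
            continue;
        }
        // Expect `"key":` at the start of the trimmed line.
        let Some(rest) = body.trim_start().strip_prefix('"') else {
            continue;
        };
        if let Some((key, after)) = rest.split_once('"') {
            if after.trim_start().starts_with(':') && !keys.iter().any(|k| k == key) {
                keys.push(key.to_string());
            }
        }
    }
    keys
}
```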

### 4.6 Markdown Analyzer

The Markdown Analyzer handles changes in Markdown documentation files (`.md`).

Key features:
- Detects modifications to headers
- Identifies changes to list structures
- Recognizes updates to code blocks and links

Implementation details:
- Utilizes regex patterns to capture Markdown syntax
- Extracts modified headers
- Checks for changes in list structures, code blocks, and links
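For headers, the sketch below follows the CommonMark ATX heading rules (one to six `#` characters followed by a space or tab, at most three spaces of indentation) and also reports each heading's level. `changed_headers` is hypothetical, and closing `#` sequences are not stripped.

```rust
/// Hypothetical helper: (level, text) for ATX headings on added or
/// removed diff lines.
fn changed_headers(diff: &str) -> Vec<(usize, String)> {
    let mut headers = Vec::new();
    for line in diff.lines() {
        if line.starts_with("+++") || line.starts_with("---") {
            continue; // unified-diff file headers
        }
        let Some(body) = line.strip_prefix('+').or_else(|| line.strip_prefix('-')) else {
            continue; // context line
        };
        // CommonMark allows up to three spaces of indentation before `#`.
        let indent = body.len() - body.trim_start_matches(' ').len();
        if indent > 3 {
            continue;
        }
        let text = &body[indent..];
        let level = text.chars().take_while(|&c| c == '#').count();
        if level == 0 || level > 6 {
            continue;
        }
        // The `#` run must be followed by whitespace or end the line (`#tag` is not a heading).
        let rest = &text[level..];
        if !rest.is_empty() && !rest.starts_with(' ') && !rest.starts_with('\t') {
            continue;
        }
        headers.push((level, rest.trim().to_string()));
    }
    headers
}
```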

## 5. Default Analyzer

The Default Analyzer is used for file types that the system does not specifically support. It performs no file-type-specific analysis.

Key features:
- Acts as a fallback for any file type not recognized by the system
- Reports its file type as "Unknown file type"

Implementation details:
- Returns an empty vector, so no analysis results are added for the file

## 6. Extensibility

The File Analyzers System is designed to be easily extendable. To add support for a new file type:

1. Create a new struct implementing the `FileAnalyzer` trait
2. Implement the `analyze` and `get_file_type` methods for the new analyzer
3. Update the `get_analyzer` function in `mod.rs` to return the new analyzer for the appropriate file extension

This modular design allows for easy addition of new file type support without modifying existing analyzers.
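The three steps can be sketched with a hypothetical TOML analyzer. Everything here beyond the two trait method signatures is an assumption for illustration: `FileChange` is a simplified stand-in for `crate::git::FileChange`, and `TomlAnalyzer`, its output wording, and the `get_analyzer` signature are invented for this example.

```rust
use std::path::Path;

/// Simplified stand-in for `crate::git::FileChange` (hypothetical).
pub struct FileChange {
    pub diff: String,
}

pub trait FileAnalyzer {
    fn analyze(&self, file: &str, change: &FileChange) -> Vec<String>;
    fn get_file_type(&self) -> &'static str;
}

/// Step 1: a new (hypothetical) analyzer struct for TOML files.
pub struct TomlAnalyzer;

/// Step 2: implement both trait methods.
impl FileAnalyzer for TomlAnalyzer {
    fn analyze(&self, _file: &str, change: &FileChange) -> Vec<String> {
        // Collect `[table]` headers that appear on added or removed lines.
        let tables: Vec<&str> = change
            .diff
            .lines()
            .filter_map(|l| l.strip_prefix('+').or_else(|| l.strip_prefix('-')))
            .map(str::trim)
            .filter(|l| l.starts_with('[') && l.ends_with(']'))
            .collect();
        if tables.is_empty() {
            Vec::new()
        } else {
            vec![format!("Modified tables: {}", tables.join(", "))]
        }
    }

    fn get_file_type(&self) -> &'static str {
        "TOML configuration file"
    }
}

pub struct DefaultAnalyzer;

impl FileAnalyzer for DefaultAnalyzer {
    fn analyze(&self, _file: &str, _change: &FileChange) -> Vec<String> {
        Vec::new()
    }

    fn get_file_type(&self) -> &'static str {
        "Unknown file type"
    }
}

/// Step 3: route the new extension in the factory (other arms elided).
pub fn get_analyzer(file: &str) -> Box<dyn FileAnalyzer> {
    match Path::new(file).extension().and_then(|e| e.to_str()) {
        Some("toml") => Box::new(TomlAnalyzer),
        _ => Box::new(DefaultAnalyzer),
    }
}
```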

## 7. Conclusion

The File Analyzers System gives the Git-Iris project language-specific analysis of file changes, which adds context for generating meaningful commit messages. Its modular architecture keeps the individual analyzers easy to maintain and makes it straightforward to support new file types as needed.

104 changes: 104 additions & 0 deletions src/file_analyzers/javascript.rs
@@ -0,0 +1,104 @@
use super::FileAnalyzer;
use crate::git::FileChange;
use regex::Regex;
use std::collections::HashSet;

pub struct JavaScriptAnalyzer;

impl FileAnalyzer for JavaScriptAnalyzer {
    fn analyze(&self, _file: &str, change: &FileChange) -> Vec<String> {
        let mut analysis = Vec::new();

        if let Some(functions) = extract_modified_functions(&change.diff) {
            analysis.push(format!("Modified functions: {}", functions.join(", ")));
        }

        if let Some(classes) = extract_modified_classes(&change.diff) {
            analysis.push(format!("Modified classes: {}", classes.join(", ")));
        }

        if has_import_changes(&change.diff) {
            analysis.push("Import statements have been modified".to_string());
        }

        if let Some(components) = extract_modified_react_components(&change.diff) {
            analysis.push(format!(
                "Modified React components: {}",
                components.join(", ")
            ));
        }

        analysis
    }

    fn get_file_type(&self) -> &'static str {
        "JavaScript/TypeScript source file"
    }
}

fn extract_modified_functions(diff: &str) -> Option<Vec<String>> {
    // `function name` declarations, or `const name = (...) =>` / `const name = function`.
    let re = Regex::new(
        r"(?m)^[+-]\s*(function\s+(\w+)|const\s+(\w+)\s*=\s*(\([^)]*\)\s*=>|function))",
    )
    .unwrap();
    let functions: Vec<String> = re
        .captures_iter(diff)
        .filter_map(|cap| cap.get(2).or(cap.get(3)).map(|m| m.as_str().to_string()))
        .collect();

    if functions.is_empty() {
        None
    } else {
        Some(functions)
    }
}

fn extract_modified_classes(diff: &str) -> Option<Vec<String>> {
    let re = Regex::new(r"(?m)^[+-]\s*class\s+(\w+)").unwrap();
    let classes: Vec<String> = re
        .captures_iter(diff)
        .filter_map(|cap| cap.get(1).map(|m| m.as_str().to_string()))
        .collect();

    if classes.is_empty() {
        None
    } else {
        Some(classes)
    }
}

fn has_import_changes(diff: &str) -> bool {
    let re = Regex::new(r"(?m)^[+-]\s*(import|export)").unwrap();
    re.is_match(diff)
}

fn extract_modified_react_components(diff: &str) -> Option<Vec<String>> {
    // Class components: `class Name extends React.Component` or `extends Component`.
    let class_re =
        Regex::new(r"(?m)^[+-]\s*class\s+(\w+)\s+extends\s+(?:React\.)?Component\b").unwrap();
    // Functional components: a `function` or `const` declaration whose body
    // returns JSX (a `<` or `(` after `return` or `=>`).
    let func_re = Regex::new(r"(?m)^[+-]\s*(?:function\s+(\w+)|const\s+(\w+)\s*=)(?:\s*\([^)]*\))?\s*(?:=>)?\s*(?:\{[^}]*return|=>)\s*(?:<|\()").unwrap();

    let mut components = HashSet::new();

    for cap in class_re.captures_iter(diff) {
        if let Some(m) = cap.get(1) {
            components.insert(m.as_str().to_string());
        }
    }

    for cap in func_re.captures_iter(diff) {
        if let Some(m) = cap.get(1).or(cap.get(2)) {
            components.insert(m.as_str().to_string());
        }
    }

    if components.is_empty() {
        None
    } else {
        Some(components.into_iter().collect())
    }
}
66 changes: 66 additions & 0 deletions src/file_analyzers/json.rs
@@ -0,0 +1,66 @@
use super::FileAnalyzer;
use crate::git::FileChange;
use regex::Regex;
use std::collections::HashSet;

pub struct JsonAnalyzer;

impl FileAnalyzer for JsonAnalyzer {
    fn analyze(&self, _file: &str, change: &FileChange) -> Vec<String> {
        let mut analysis = Vec::new();

        if let Some(keys) = extract_modified_top_level_keys(&change.diff) {
            analysis.push(format!("Modified top-level keys: {}", keys.join(", ")));
        }

        if has_array_changes(&change.diff) {
            analysis.push("Array structures have been modified".to_string());
        }

        if has_nested_object_changes(&change.diff) {
            analysis.push("Nested objects have been modified".to_string());
        }

        analysis
    }

    fn get_file_type(&self) -> &'static str {
        "JSON configuration file"
    }
}

fn extract_modified_top_level_keys(diff: &str) -> Option<Vec<String>> {
    let lines: Vec<&str> = diff.lines().collect();
    let re = Regex::new(r#"^[+-]\s*"(\w+)"\s*:"#).unwrap();
    let mut keys = HashSet::new();

    for (i, line) in lines.iter().enumerate() {
        if let Some(cap) = re.captures(line) {
            let key = cap.get(1).unwrap().as_str();
            let prev_line = if i > 0 { lines[i - 1] } else { "" };
            let next_line = lines.get(i + 1).unwrap_or(&"");

            // Heuristic: skip keys that directly follow a line ending in `{`
            // or directly precede a line starting with `}`.
            if !prev_line.trim().ends_with('{') && !next_line.trim().starts_with('}') {
                keys.insert(key.to_string());
            }
        }
    }

    if keys.is_empty() {
        None
    } else {
        Some(keys.into_iter().collect())
    }
}

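fn has_array_changes(diff: &str) -> bool {
    // An added/removed line that opens an array (`"key": [` or `[`) or is a
    // string array element (`"item",`). Both alternatives are anchored to
    // the start of the diff line.
    let re = Regex::new(r#"(?m)^[+-]\s*(?:(?:"[^"]+"\s*:\s*)?\[|"[^"]+",)"#).unwrap();
    re.is_match(diff)
}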

fn has_nested_object_changes(diff: &str) -> bool {
    let re = Regex::new(r#"(?m)^[+-]\s*"[^"]+"\s*:\s*\{|\s*[+-]\s*"[^"]+"\s*:"#).unwrap();
    re.is_match(diff)
}
67 changes: 67 additions & 0 deletions src/file_analyzers/markdown.rs
@@ -0,0 +1,67 @@
use super::FileAnalyzer;
use crate::git::FileChange;
use regex::Regex;

/// Analyzer for Markdown files
pub struct MarkdownAnalyzer;

impl FileAnalyzer for MarkdownAnalyzer {
    fn analyze(&self, _file: &str, change: &FileChange) -> Vec<String> {
        let mut analysis = Vec::new();

        // Check for new or modified headers
        if let Some(headers) = extract_modified_headers(&change.diff) {
            analysis.push(format!("Modified headers: {}", headers.join(", ")));
        }

        // Check for changes in lists
        if has_list_changes(&change.diff) {
            analysis.push("List structures have been modified".to_string());
        }

        // Check for changes in code blocks
        if has_code_block_changes(&change.diff) {
            analysis.push("Code blocks have been modified".to_string());
        }

        // Check for changes in links
        if has_link_changes(&change.diff) {
            analysis.push("Links have been modified".to_string());
        }

        analysis
    }

    fn get_file_type(&self) -> &'static str {
        "Markdown file"
    }
}

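fn extract_modified_headers(diff: &str) -> Option<Vec<String>> {
    // ATX headings (`#` to `######`) at the start of added or removed lines.
    let re = Regex::new(r"(?m)^[+-][ \t]*(#{1,6})[ \t]+(.+)").unwrap();
    let headers: Vec<String> = re
        .captures_iter(diff)
        .filter_map(|cap| cap.get(2).map(|m| m.as_str().to_string()))
        .collect();

    if headers.is_empty() {
        None
    } else {
        Some(headers)
    }
}

fn has_list_changes(diff: &str) -> bool {
    // Bullet list items (`-`, `*`, `+`) on added or removed lines; the anchor
    // keeps `--- a/...` and `+++ b/...` diff headers from matching.
    let re = Regex::new(r"(?m)^[+-][ \t]*[-*+][ \t]+").unwrap();
    re.is_match(diff)
}

fn has_code_block_changes(diff: &str) -> bool {
    // Opening or closing ``` fences on added or removed lines.
    let re = Regex::new(r"(?m)^[+-][ \t]*```").unwrap();
    re.is_match(diff)
}

fn has_link_changes(diff: &str) -> bool {
    // Inline links `[text](url)` anywhere on an added or removed line.
    let re = Regex::new(r"(?m)^[+-].*\[[^\]]+\]\([^)]+\)").unwrap();
    re.is_match(diff)
}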