✨ Add file analyzers for multiple languages

Introduce file analyzers for Python, JavaScript, JSON, YAML, and Markdown. Update Rust file analyzer to detect modified structs and traits. Enhance the central module to include the new analyzers and ensure proper analyzer selection based on file extensions. Add tests to confirm functionality and accuracy of the new analyzers.
hyperb1iss · Jul 29, 2024 · 2fb592c
1 parent 9310eb7
Showing 9 changed files with 821 additions and 7 deletions.
diff --git a/docs/file-analyzers-docs.md b/docs/file-analyzers-docs.md
@@ -0,0 +1,156 @@
+# File Analyzers System Documentation
+
+## Table of Contents
+1. Introduction
+2. Rationale
+3. System Architecture
+4. File Type Analyzers
+   - 4.1 Rust Analyzer
+   - 4.2 JavaScript/TypeScript Analyzer
+   - 4.3 Python Analyzer
+   - 4.4 YAML Analyzer
+   - 4.5 JSON Analyzer
+   - 4.6 Markdown Analyzer
+5. Default Analyzer
+6. Extensibility
+7. Conclusion
+
+## 1. Introduction
+
+The File Analyzers System is the Git-Iris component that inspects the diff of each changed file in a Git repository and reports what kind of change was made, for example which functions, classes, or top-level keys were touched. These findings feed the commit message generation process as file-type-aware context.
+
+## 2. Rationale
+
+The File Analyzers System exists to make generated commit messages more specific. Each analyzer summarizes what changed in a file of its type, and the AI uses that summary as context when writing the commit message, which makes the Git history clearer and more useful.
+
+Key benefits of the File Analyzers System include:
+- Language-specific analysis for more accurate insights
+- Detection of structural changes (e.g., function modifications, class updates)
+- Identification of changes to configuration files and documentation
+- Improved context for AI-generated commit messages
+
+## 3. System Architecture
+
+The File Analyzers System is built on a modular architecture, allowing for easy extension and maintenance. The core components include:
+
+1. `FileAnalyzer` trait: Defines the interface for all file analyzers
+2. `get_analyzer` function: Factory function that returns the appropriate analyzer based on file extension
+3. Individual analyzer implementations for each supported file type
+4. Default analyzer for unsupported file types
+
+The system is designed to be easily extendable, allowing new file type analyzers to be added with minimal changes to the existing codebase.
+
+## 4. File Type Analyzers
+
+### 4.1 Rust Analyzer
+
+The Rust Analyzer is responsible for analyzing changes in Rust source files (`.rs`).
+
+Key features:
+- Detects modifications to functions, structs, and traits
+- Identifies changes to import statements
+- Provides information about added, modified, or removed Rust-specific constructs
+
+Implementation details:
+- Uses regex patterns to identify Rust-specific syntax
+- Extracts names of modified functions, structs, and traits
+- Checks for changes in use statements and extern crate declarations
+
+### 4.2 JavaScript/TypeScript Analyzer
+
+The JavaScript/TypeScript Analyzer handles changes in JavaScript and TypeScript files (`.js`, `.ts`).
+
+Key features:
+- Detects modifications to functions and classes
+- Identifies changes to import/export statements
+- Recognizes updates to React components (both class and functional)
+
+Implementation details:
+- Utilizes regex patterns to capture JavaScript/TypeScript syntax
+- Extracts names of modified functions, classes, and React components
+- Checks for changes in import and export statements
+- Distinguishes between regular functions and React functional components
+
+### 4.3 Python Analyzer
+
+The Python Analyzer is responsible for analyzing changes in Python source files (`.py`).
+
+Key features:
+- Detects modifications to functions and classes
+- Identifies changes to import statements
+- Recognizes updates to decorators
+
+Implementation details:
+- Uses regex patterns to identify Python-specific syntax
+- Extracts names of modified functions and classes
+- Checks for changes in import statements
+- Identifies modifications to decorator usage
+
+### 4.4 YAML Analyzer
+
+The YAML Analyzer handles changes in YAML configuration files (`.yaml`, `.yml`).
+
+Key features:
+- Detects modifications to top-level keys
+- Identifies changes to list structures
+- Recognizes updates to nested structures
+
+Implementation details:
+- Utilizes regex patterns to capture YAML syntax
+- Extracts names of modified top-level keys
+- Checks for changes in list structures and nested objects
+
+### 4.5 JSON Analyzer
+
+The JSON Analyzer is responsible for analyzing changes in JSON files (`.json`).
+
+Key features:
+- Detects modifications to top-level keys
+- Identifies changes to array structures
+- Recognizes updates to nested objects
+
+Implementation details:
+- Uses regex patterns to identify JSON syntax
+- Extracts names of modified top-level keys
+- Checks for changes in array structures and nested objects
+
+### 4.6 Markdown Analyzer
+
+The Markdown Analyzer handles changes in Markdown documentation files (`.md`).
+
+Key features:
+- Detects modifications to headers
+- Identifies changes to list structures
+- Recognizes updates to code blocks and links
+
+Implementation details:
+- Utilizes regex patterns to capture Markdown syntax
+- Extracts modified headers
+- Checks for changes in list structures, code blocks, and links
+
+## 5. Default Analyzer
+
+The Default Analyzer is used for file types that are not specifically supported by the system. It provides a basic analysis without any file-type-specific insights.
+
+Key features:
+- Provides a generic analysis for unsupported file types
+- Returns the file type as "Unknown file type"
+
+Implementation details:
+- Returns an empty vector of analysis results
+- Acts as a fallback for any file type not recognized by the system
+
+## 6. Extensibility
+
+The File Analyzers System is designed to be easily extendable. To add support for a new file type:
+
+1. Create a new struct implementing the `FileAnalyzer` trait
+2. Implement the `analyze` and `get_file_type` methods for the new analyzer
+3. Update the `get_analyzer` function in `mod.rs` to return the new analyzer for the appropriate file extension
+
+This modular design allows for easy addition of new file type support without modifying existing analyzers.
+
+## 7. Conclusion
+
+The File Analyzers System gives Git-Iris language-specific summaries of file changes, which add context for generating meaningful commit messages. Because each file type has its own analyzer behind a common trait, new file types can be supported without modifying existing analyzers.
+
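The three extensibility steps in section 6 of the documentation can be sketched as follows. This is a minimal, regex-free sketch under stated assumptions: `FileChange` is reduced to the `diff` field that the analyzers in this commit read, `TomlAnalyzer` is a hypothetical analyzer that is not part of the commit, and the `get_analyzer` signature is assumed because `mod.rs` is not shown here.

```rust
use std::path::Path;

/// Stand-in for `crate::git::FileChange`; only the `diff` field used by the
/// analyzers in this commit is modeled (assumption).
pub struct FileChange {
    pub diff: String,
}

/// Interface with the two methods the analyzers in this commit implement.
pub trait FileAnalyzer {
    fn analyze(&self, file: &str, change: &FileChange) -> Vec<String>;
    fn get_file_type(&self) -> &'static str;
}

/// Steps 1 and 2: a hypothetical new analyzer (not part of this commit).
pub struct TomlAnalyzer;

impl FileAnalyzer for TomlAnalyzer {
    fn analyze(&self, _file: &str, change: &FileChange) -> Vec<String> {
        // Report whether any added or removed line opens a `[table]` header.
        let touched = change.diff.lines().any(|line| {
            line.strip_prefix('+')
                .or_else(|| line.strip_prefix('-'))
                .map_or(false, |rest| rest.trim_start().starts_with('['))
        });
        if touched {
            vec!["Table headers have been modified".to_string()]
        } else {
            Vec::new()
        }
    }

    fn get_file_type(&self) -> &'static str {
        "TOML configuration file"
    }
}

/// Fallback analyzer as described in section 5 of the documentation.
pub struct DefaultAnalyzer;

impl FileAnalyzer for DefaultAnalyzer {
    fn analyze(&self, _file: &str, _change: &FileChange) -> Vec<String> {
        Vec::new()
    }

    fn get_file_type(&self) -> &'static str {
        "Unknown file type"
    }
}

/// Step 3: dispatch on the file extension (assumed signature).
pub fn get_analyzer(file: &str) -> Box<dyn FileAnalyzer> {
    match Path::new(file).extension().and_then(|ext| ext.to_str()) {
        Some("toml") => Box::new(TomlAnalyzer),
        _ => Box::new(DefaultAnalyzer),
    }
}
```

Returning a boxed trait object keeps callers independent of the concrete analyzer types, so adding a file type only touches the `match` arms.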
diff --git a/src/file_analyzers/javascript.rs b/src/file_analyzers/javascript.rs
@@ -0,0 +1,104 @@
+use super::FileAnalyzer;
+use crate::git::FileChange;
+use regex::Regex;
+use std::collections::HashSet;
+
+pub struct JavaScriptAnalyzer;
+
+impl FileAnalyzer for JavaScriptAnalyzer {
+    fn analyze(&self, _file: &str, change: &FileChange) -> Vec<String> {
+        let mut analysis = Vec::new();
+
+        if let Some(functions) = extract_modified_functions(&change.diff) {
+            println!("JavaScript Debug: Detected functions: {:?}", functions);
+            analysis.push(format!("Modified functions: {}", functions.join(", ")));
+        }
+
+        if let Some(classes) = extract_modified_classes(&change.diff) {
+            analysis.push(format!("Modified classes: {}", classes.join(", ")));
+        }
+
+        if has_import_changes(&change.diff) {
+            analysis.push("Import statements have been modified".to_string());
+        }
+
+        if let Some(components) = extract_modified_react_components(&change.diff) {
+            println!(
+                "JavaScript Debug: Detected React components: {:?}",
+                components
+            );
+            analysis.push(format!(
+                "Modified React components: {}",
+                components.join(", ")
+            ));
+        }
+
+        println!("JavaScript Debug: Final analysis: {:?}", analysis);
+        analysis
+    }
+
+    fn get_file_type(&self) -> &'static str {
+        "JavaScript/TypeScript source file"
+    }
+}
+
+fn extract_modified_functions(diff: &str) -> Option<Vec<String>> {
+    let re = Regex::new(
+        r"(?m)^[+-]\s*(function\s+(\w+)|const\s+(\w+)\s*=\s*(\([^)]*\)\s*=>|function))",
+    )
+    .unwrap();
+    let functions: Vec<String> = re
+        .captures_iter(diff)
+        .filter_map(|cap| cap.get(2).or(cap.get(3)).map(|m| m.as_str().to_string()))
+        .collect();
+
+    if functions.is_empty() {
+        None
+    } else {
+        Some(functions)
+    }
+}
+
+fn extract_modified_classes(diff: &str) -> Option<Vec<String>> {
+    let re = Regex::new(r"(?m)^[+-]\s*class\s+(\w+)").unwrap();
+    let classes: Vec<String> = re
+        .captures_iter(diff)
+        .filter_map(|cap| cap.get(1).map(|m| m.as_str().to_string()))
+        .collect();
+
+    if classes.is_empty() {
+        None
+    } else {
+        Some(classes)
+    }
+}
+
+fn has_import_changes(diff: &str) -> bool {
+    let re = Regex::new(r"(?m)^[+-]\s*(import|export)").unwrap();
+    re.is_match(diff)
+}
+
+fn extract_modified_react_components(diff: &str) -> Option<Vec<String>> {
+    let class_re = Regex::new(r"(?m)^[+-]\s*class\s+(\w+)\s+extends\s+React\.Component").unwrap();
+    let func_re = Regex::new(r"(?m)^[+-]\s*(?:function\s+(\w+)|const\s+(\w+)\s*=)(?:\s*\([^)]*\))?\s*(?:=>)?\s*(?:\{[^}]*return|=>)\s*(?:<|\()").unwrap();
+
+    let mut components = HashSet::new();
+
+    for cap in class_re.captures_iter(diff) {
+        if let Some(m) = cap.get(1) {
+            components.insert(m.as_str().to_string());
+        }
+    }
+
+    for cap in func_re.captures_iter(diff) {
+        if let Some(m) = cap.get(1).or(cap.get(2)) {
+            components.insert(m.as_str().to_string());
+        }
+    }
+
+    if components.is_empty() {
+        None
+    } else {
+        Some(components.into_iter().collect())
+    }
+}
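One property of the extraction in `javascript.rs` is worth illustrating: `extract_modified_functions` collects matches straight into a `Vec`, so a function edited in place (present on both a `-` and a `+` line) is listed twice, and `extract_modified_react_components` converts a `HashSet` into a `Vec`, whose order is unspecified. The sketch below shows one possible way to get deduplicated, deterministically ordered names with a `BTreeSet`. It is a hand-rolled stand-in rather than the commit's regex: the helper name `changed_function_names` is hypothetical, and only `function <name>` declarations are handled (no arrow functions).

```rust
use std::collections::BTreeSet;

/// Collects names declared with `function <name>` on added or removed diff
/// lines, deduplicated and sorted. Hypothetical helper; it approximates only
/// the `function\s+(\w+)` branch of the regex in the commit.
fn changed_function_names(diff: &str) -> Vec<String> {
    let mut names = BTreeSet::new();
    for line in diff.lines() {
        // Only lines starting with `+` or `-` are changes; skip context lines.
        let body = match line.strip_prefix('+').or_else(|| line.strip_prefix('-')) {
            Some(body) => body.trim_start(),
            None => continue,
        };
        // Require whitespace after the `function` keyword.
        let rest = match body.strip_prefix("function") {
            Some(rest) if rest.starts_with(char::is_whitespace) => rest.trim_start(),
            _ => continue,
        };
        // The name is the leading run of word-like characters.
        let name: String = rest
            .chars()
            .take_while(|c| c.is_alphanumeric() || *c == '_')
            .collect();
        if !name.is_empty() {
            names.insert(name);
        }
    }
    // A BTreeSet iterates in sorted order, so the output is stable.
    names.into_iter().collect()
}
```

With this approach a function that appears on both the removed and the added side of a hunk is reported once, and repeated runs over the same diff produce the same message text.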
diff --git a/src/file_analyzers/json.rs b/src/file_analyzers/json.rs
@@ -0,0 +1,66 @@
+use super::FileAnalyzer;
+use crate::git::FileChange;
+use regex::Regex;
+use std::collections::HashSet;
+
+pub struct JsonAnalyzer;
+
+impl FileAnalyzer for JsonAnalyzer {
+    fn analyze(&self, _file: &str, change: &FileChange) -> Vec<String> {
+        let mut analysis = Vec::new();
+
+        if let Some(keys) = extract_modified_top_level_keys(&change.diff) {
+            println!("JSON Debug: Detected keys: {:?}", keys);
+            analysis.push(format!("Modified top-level keys: {}", keys.join(", ")));
+        }
+
+        if has_array_changes(&change.diff) {
+            analysis.push("Array structures have been modified".to_string());
+        }
+
+        if has_nested_object_changes(&change.diff) {
+            analysis.push("Nested objects have been modified".to_string());
+        }
+
+        println!("JSON Debug: Final analysis: {:?}", analysis);
+        analysis
+    }
+
+    fn get_file_type(&self) -> &'static str {
+        "JSON configuration file"
+    }
+}
+
+fn extract_modified_top_level_keys(diff: &str) -> Option<Vec<String>> {
+    let lines: Vec<&str> = diff.lines().collect();
+    let re = Regex::new(r#"^[+-]\s*"(\w+)"\s*:"#).unwrap();
+    let mut keys = HashSet::new();
+
+    for (i, line) in lines.iter().enumerate() {
+        if let Some(cap) = re.captures(line) {
+            let key = cap.get(1).unwrap().as_str();
+            let prev_line = if i > 0 { lines[i - 1] } else { "" };
+            let next_line = lines.get(i + 1).unwrap_or(&"");
+
+            if !prev_line.trim().ends_with("{") && !next_line.trim().starts_with("}") {
+                keys.insert(key.to_string());
+            }
+        }
+    }
+
+    if keys.is_empty() {
+        None
+    } else {
+        Some(keys.into_iter().collect())
+    }
+}
+
+fn has_array_changes(diff: &str) -> bool {
+    let re = Regex::new(r#"(?m)^[+-]\s*(?:"[^"]+"\s*:\s*)?\[|\s*[+-]\s*"[^"]+","#).unwrap();
+    re.is_match(diff)
+}
+
+fn has_nested_object_changes(diff: &str) -> bool {
+    let re = Regex::new(r#"(?m)^[+-]\s*"[^"]+"\s*:\s*\{|\s*[+-]\s*"[^"]+"\s*:"#).unwrap();
+    re.is_match(diff)
+}
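The neighbor-line heuristic in `extract_modified_top_level_keys` (skip a changed key when the previous line ends with `{` or the next line starts with `}`) can be traced on a small diff. The sketch below re-creates that heuristic without the `regex` crate so it runs with the standard library alone. The helper names `changed_key` and `top_level_keys` are hypothetical, the key test only approximates `\w+`, and a `BTreeSet` is used here instead of the commit's `HashSet` so the result order is predictable.

```rust
use std::collections::BTreeSet;

/// Returns the key if `line` looks like `[+-] "key":`, approximating the
/// pattern `^[+-]\s*"(\w+)"\s*:` by hand (hypothetical helper).
fn changed_key(line: &str) -> Option<&str> {
    let body = line.strip_prefix('+').or_else(|| line.strip_prefix('-'))?;
    let after_quote = body.trim_start().strip_prefix('"')?;
    let end = after_quote.find('"')?;
    let (key, rest) = after_quote.split_at(end);
    let is_word = !key.is_empty() && key.chars().all(|c| c.is_alphanumeric() || c == '_');
    // `rest` begins with the closing quote; a colon must follow it.
    if is_word && rest[1..].trim_start().starts_with(':') {
        Some(key)
    } else {
        None
    }
}

/// Applies the same previous-line / next-line check as the commit's code.
fn top_level_keys(diff: &str) -> Vec<String> {
    let lines: Vec<&str> = diff.lines().collect();
    let mut keys = BTreeSet::new();
    for (i, line) in lines.iter().enumerate() {
        if let Some(key) = changed_key(line) {
            let prev = if i > 0 { lines[i - 1] } else { "" };
            let next = lines.get(i + 1).copied().unwrap_or("");
            if !prev.trim().ends_with('{') && !next.trim().starts_with('}') {
                keys.insert(key.to_string());
            }
        }
    }
    keys.into_iter().collect()
}
```

Because the check looks only at adjacent lines rather than nesting depth, a first key directly after `{` is skipped even when it is top-level, while a nested key in the middle of an object is reported. Tracking brace depth, or parsing both versions of the file, would be one way to tighten this.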
diff --git a/src/file_analyzers/markdown.rs b/src/file_analyzers/markdown.rs
@@ -0,0 +1,67 @@
+use super::FileAnalyzer;
+use crate::git::FileChange;
+use regex::Regex;
+
+/// Analyzer for Markdown files
+pub struct MarkdownAnalyzer;
+
+impl FileAnalyzer for MarkdownAnalyzer {
+    fn analyze(&self, _file: &str, change: &FileChange) -> Vec<String> {
+        let mut analysis = Vec::new();
+
+        // Check for new or modified headers
+        if let Some(headers) = extract_modified_headers(&change.diff) {
+            analysis.push(format!("Modified headers: {}", headers.join(", ")));
+        }
+
+        // Check for changes in lists
+        if has_list_changes(&change.diff) {
+            analysis.push("List structures have been modified".to_string());
+        }
+
+        // Check for changes in code blocks
+        if has_code_block_changes(&change.diff) {
+            analysis.push("Code blocks have been modified".to_string());
+        }
+
+        // Check for changes in links
+        if has_link_changes(&change.diff) {
+            analysis.push("Links have been modified".to_string());
+        }
+
+        analysis
+    }
+
+    fn get_file_type(&self) -> &'static str {
+        "Markdown file"
+    }
+}
+
+fn extract_modified_headers(diff: &str) -> Option<Vec<String>> {
+    let re = Regex::new(r"(?m)^[+-]\s*(#{1,6})\s+(.+)").unwrap();
+    let headers: Vec<String> = re
+        .captures_iter(diff)
+        .filter_map(|cap| cap.get(2).map(|m| m.as_str().to_string()))
+        .collect();
+
+    if headers.is_empty() {
+        None
+    } else {
+        Some(headers)
+    }
+}
+
+fn has_list_changes(diff: &str) -> bool {
+    let re = Regex::new(r"(?m)^[+-]\s*[-*+]\s+").unwrap();
+    re.is_match(diff)
+}
+
+fn has_code_block_changes(diff: &str) -> bool {
+    let re = Regex::new(r"(?m)^[+-]\s*```").unwrap();
+    re.is_match(diff)
+}
+
+fn has_link_changes(diff: &str) -> bool {
+    let re = Regex::new(r"(?m)^[+-].*\[.+\]\(.+\)").unwrap();
+    re.is_match(diff)
+}
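A line-oriented reading of the header detection in `markdown.rs`, where a header counts only when the `#` run opens an added or removed line, can be sketched with the standard library alone. This is an illustrative alternative under assumptions, not the commit's implementation: the helper names `changed_header` and `changed_headers` are hypothetical.

```rust
/// Returns the header text if `line` is an added or removed Markdown header
/// (hypothetical helper).
fn changed_header(line: &str) -> Option<&str> {
    // Only lines starting with `+` or `-` are changes.
    let body = line.strip_prefix('+').or_else(|| line.strip_prefix('-'))?;
    let body = body.trim_start();
    // Count the leading `#` run; Markdown allows levels 1 through 6.
    let level = body.chars().take_while(|c| *c == '#').count();
    if !(1..=6).contains(&level) {
        return None;
    }
    // `#` is one byte, so the char count is also the byte offset.
    let text = &body[level..];
    // A header needs whitespace between the `#` run and its text.
    if !text.starts_with(char::is_whitespace) {
        return None;
    }
    let text = text.trim();
    if text.is_empty() {
        None
    } else {
        Some(text)
    }
}

/// Collects the text of every changed header in the diff, in diff order.
fn changed_headers(diff: &str) -> Vec<String> {
    diff.lines()
        .filter_map(|line| changed_header(line))
        .map(String::from)
        .collect()
}
```

Checking each line from its first character means unchanged context lines and `#` characters that appear mid-sentence are never reported as headers.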