API Reference¶
mmocr.apis¶
-
mmocr.apis.model_inference(model, imgs, batch_mode=False)[source]¶ Inference image(s) with the detector.
- Parameters
model (nn.Module) – The loaded detector.
imgs (str/ndarray or list[str/ndarray] or tuple[str/ndarray]) – Either image files or loaded images.
batch_mode (bool) – If True, use batch mode for inference.
- Returns
Predicted results.
- Return type
result (dict)
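model_inference accepts either one image (a path or ndarray) or a list/tuple of them. A minimal pure-Python sketch of that input handling, using a hypothetical helper name that is not part of mmocr:

```python
def normalize_imgs(imgs):
    """Hypothetical helper illustrating how a model_inference-style API
    can accept a single str/ndarray or a list/tuple of them."""
    if isinstance(imgs, (list, tuple)):
        return list(imgs), True   # already a batch
    return [imgs], False          # wrap a single image into a batch of one

batch, was_batch = normalize_imgs("sample.jpg")
print(batch, was_batch)  # ['sample.jpg'] False
```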
mmocr.core¶
evaluation¶
-
mmocr.core.evaluation.eval_hmean_ic13(det_boxes, gt_boxes, gt_ignored_boxes, precision_thr=0.4, recall_thr=0.8, center_dist_thr=1.0, one2one_score=1.0, one2many_score=0.8, many2one_score=1.0)[source]¶ Evaluate hmean of text detection using the ICDAR 2013 standard.
- Parameters
det_boxes (list[list[list[float]]]) – List of arrays of shape (n, 2k). Each element is the det_boxes for one img. k>=4.
gt_boxes (list[list[list[float]]]) – List of arrays of shape (m, 2k). Each element is the gt_boxes for one img. k>=4.
gt_ignored_boxes (list[list[list[float]]]) – List of arrays of (l, 2k). Each element is the ignored gt_boxes for one img. k>=4.
precision_thr (float) – Precision threshold of the iou of one (gt_box, det_box) pair.
recall_thr (float) – Recall threshold of the iou of one (gt_box, det_box) pair.
center_dist_thr (float) – Distance threshold of one (gt_box, det_box) center point pair.
one2one_score (float) – Reward when one gt matches one det_box.
one2many_score (float) – Reward when one gt matches many det_boxes.
many2one_score (float) – Reward when many gts match one det_box.
- Returns
Tuple of dicts which encode the hmean for the dataset and all images.
- Return type
hmean (tuple[dict])
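The eval_hmean_* functions all report the harmonic mean (hmean) of precision and recall. A pure-Python sketch of that formula, not mmocr code:

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (a.k.a. F-measure / hmean)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# The harmonic mean is pulled toward the weaker of the two values.
print(hmean(0.8, 0.4))  # 0.5333...
```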
-
mmocr.core.evaluation.eval_hmean_iou(pred_boxes, gt_boxes, gt_ignored_boxes, iou_thr=0.5, precision_thr=0.5)[source]¶ Evaluate hmean of text detection using the IoU standard.
- Parameters
pred_boxes (list[list[list[float]]]) – Text boxes for an img list. Each box has 2k (>=8) values.
gt_boxes (list[list[list[float]]]) – Ground truth text boxes for an img list. Each box has 2k (>=8) values.
gt_ignored_boxes (list[list[list[float]]]) – Ignored ground truth text boxes for an img list. Each box has 2k (>=8) values.
iou_thr (float) – IoU threshold when one (gt_box, det_box) pair is matched.
precision_thr (float) – Precision threshold when one (gt_box, det_box) pair is matched.
- Returns
Tuple of dicts indicating the hmean for the dataset and all images.
- Return type
hmean (tuple[dict])
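The IoU standard matches a (gt_box, det_box) pair when their intersection-over-union exceeds iou_thr. A sketch of IoU for axis-aligned [x1, y1, x2, y2] boxes; the real function operates on 2k-point polygons, so this simplification is for illustration only:

```python
def iou_xyxy(a, b):
    # a, b: [x1, y1, x2, y2] axis-aligned boxes
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

iou = iou_xyxy([0, 0, 10, 10], [5, 0, 15, 10])
print(iou, iou >= 0.5)  # 0.333..., so not matched at iou_thr=0.5
```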
-
mmocr.core.evaluation.eval_ocr_metric(pred_texts, gt_texts)[source]¶ Evaluate the text recognition performance with metrics: word accuracy and 1-N.E.D. See https://rrc.cvc.uab.es/?ch=14&com=tasks for details.
- Parameters
pred_texts (list[str]) – Text strings of prediction.
gt_texts (list[str]) – Text strings of ground truth.
- Returns
Metric dict for text recognition, including:
word_acc: Accuracy at word level.
word_acc_ignore_case: Accuracy at word level, ignoring letter case.
word_acc_ignore_case_symbol: Accuracy at word level, ignoring letter case and symbols (the default metric for academic evaluation).
char_recall: Recall at character level, ignoring letter case and symbols.
char_precision: Precision at character level, ignoring letter case and symbols.
1-N.E.D: 1 - normalized_edit_distance.
- Return type
eval_res (dict[str, float])
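A pure-Python sketch of the two headline metrics, word accuracy and 1-N.E.D. Here N.E.D. is taken as edit distance normalized by the longer string's length (one common convention); mmocr's exact normalization and case/symbol handling may differ:

```python
def edit_distance(a, b):
    """Levenshtein distance via single-row dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def word_acc(preds, gts):
    """Fraction of exactly-matching predicted words."""
    return sum(p == g for p, g in zip(preds, gts)) / len(gts)

def one_minus_ned(preds, gts):
    """1 - mean normalized edit distance over all samples."""
    neds = [edit_distance(p, g) / max(len(p), len(g), 1)
            for p, g in zip(preds, gts)]
    return 1.0 - sum(neds) / len(neds)

preds, gts = ["hello", "world"], ["hello", "word"]
print(word_acc(preds, gts))       # 0.5
print(one_minus_ned(preds, gts))  # 0.9
```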
-
mmocr.core.evaluation.eval_hmean(results, img_infos, ann_infos, metrics={'hmean-iou'}, score_thr=0.3, rank_list=None, logger=None, **kwargs)[source]¶ Evaluate with the hmean metric.
- Parameters
results (list[dict]) – Each dict corresponds to one image, containing the following keys: boundary_result
img_infos (list[dict]) – Each dict corresponds to one image, containing the following keys: filename, height, width
ann_infos (list[dict]) – Each dict corresponds to one image, containing the following keys: masks, masks_ignore
score_thr (float) – Score threshold of prediction map.
metrics (set{str}) – Hmean metric set, should be one or all of {‘hmean-iou’, ‘hmean-ic13’}
- Returns
The evaluation results.
- Return type
dict[str, float]
-
mmocr.core.evaluation.compute_f1_score(preds, gts, ignores=[])[source]¶ Compute the F1-score of prediction.
- Parameters
preds (Tensor) – The predicted probability NxC map with N and C being the sample number and class number respectively.
gts (Tensor) – The ground truth vector of size N.
ignores (list) – The index set of classes that are ignored when reporting results. Note: all samples participate in the computation.
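The per-class F1 logic behind compute_f1_score can be sketched in pure Python on already-argmaxed labels (the real function takes an NxC probability map and torch tensors); all samples contribute to the counts, and ignored classes are only dropped from the report:

```python
def f1_scores(preds, gts, num_classes, ignores=()):
    """Per-class F1 from predicted labels; a sketch of the idea
    behind compute_f1_score, not its torch implementation."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, g in zip(preds, gts):   # all samples participate
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    out = []
    for c in range(num_classes):
        if c in ignores:           # ignored classes are not reported
            continue
        denom = 2 * tp[c] + fp[c] + fn[c]
        out.append(2 * tp[c] / denom if denom else 0.0)
    return out

print(f1_scores([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3))
```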
mmocr.utils¶
-
mmocr.utils.get_root_logger(log_file=None, log_level=20)[source]¶ Use the get_logger method in mmcv to get the root logger.
The logger will be initialized if it has not been initialized. By default a StreamHandler will be added. If log_file is specified, a FileHandler will also be added. The name of the root logger is the top-level package name, e.g., "mmocr".
- Parameters
log_file (str | None) – The log filename. If specified, a FileHandler will be added to the root logger.
log_level (int) – The root logger level. Note that only the process of rank 0 is affected, while other processes will set the level to “Error” and be silent most of the time.
- Returns
The root logger.
- Return type
logging.Logger
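get_root_logger delegates to mmcv's get_logger, and the default log_level=20 is logging.INFO. The behavior described above can be sketched with the standard logging module (a simplified stand-in, not mmcv's implementation):

```python
import logging

def get_logger_sketch(name="mmocr", log_file=None, log_level=logging.INFO):
    """Simplified stand-in: StreamHandler always; FileHandler only
    when log_file is given; handlers are attached only once."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        logger.addHandler(logging.StreamHandler())
        if log_file is not None:
            logger.addHandler(logging.FileHandler(log_file))
    logger.setLevel(log_level)
    return logger

logger = get_logger_sketch()
logger.info("logger initialized")  # emitted via the StreamHandler
```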
-
mmocr.utils.drop_orientation(img_file)[source]¶ Check if the image has orientation information. If yes, ignore it by converting the image format to png and return the new filename; otherwise return the original filename.
- Parameters
img_file (str) – The image path.
- Returns
The converted image filename with the proper suffix.
mmocr.models¶
common_backbones¶
-
class mmocr.models.common.backbones.UNet(in_channels=3, base_channels=64, num_stages=5, strides=(1, 1, 1, 1, 1), enc_num_convs=(2, 2, 2, 2, 2), dec_num_convs=(2, 2, 2, 2), downsamples=(True, True, True, True), enc_dilations=(1, 1, 1, 1, 1), dec_dilations=(1, 1, 1, 1), with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, upsample_cfg={'type': 'InterpConv'}, norm_eval=False, dcn=None, plugins=None)[source]¶ UNet backbone. U-Net: Convolutional Networks for Biomedical Image Segmentation. https://arxiv.org/pdf/1505.04597.pdf
- Parameters
in_channels (int) – Number of input image channels. Default: 3.
base_channels (int) – Number of base channels of each stage. The output channels of the first stage. Default: 64.
num_stages (int) – Number of stages in encoder, normally 5. Default: 5.
strides (Sequence[int 1 | 2]) – Strides of each stage in encoder. len(strides) is equal to num_stages. Normally the stride of the first stage in encoder is 1. If strides[i]=2, it uses strided convolution to downsample in the corresponding encoder stage. Default: (1, 1, 1, 1, 1).
enc_num_convs (Sequence[int]) – Number of convolutional layers in the convolution block of the corresponding encoder stage. Default: (2, 2, 2, 2, 2).
dec_num_convs (Sequence[int]) – Number of convolutional layers in the convolution block of the corresponding decoder stage. Default: (2, 2, 2, 2).
downsamples (Sequence[bool]) – Whether to use MaxPool to downsample the feature map after the first stage of the encoder (stages: [1, num_stages)). If the corresponding encoder stage uses strided convolution (strides[i]=2), it will never use MaxPool to downsample, even if downsamples[i-1]=True. Default: (True, True, True, True).
enc_dilations (Sequence[int]) – Dilation rate of each stage in encoder. Default: (1, 1, 1, 1, 1).
dec_dilations (Sequence[int]) – Dilation rate of each stage in decoder. Default: (1, 1, 1, 1).
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
conv_cfg (dict | None) – Config dict for convolution layer. Default: None.
norm_cfg (dict | None) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict | None) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’).
upsample_cfg (dict) – The upsample config of the upsample module in decoder. Default: dict(type=’InterpConv’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: this only affects Batch Norm and its variants. Default: False.
dcn (bool) – Use deformable convolution in convolutional layer or not. Default: None.
plugins (dict) – plugins for convolutional layers. Default: None.
- Notice:
The input image size should be divisible by the whole downsample rate of the encoder. More details on the whole downsample rate can be found in UNet._check_input_divisible.
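The Notice above means the input height and width must be divisible by the encoder's overall downsample rate. With the defaults (four downsampling stages, each halving resolution) that rate is 2**4 = 16. A sketch of the idea behind the check, not mmocr's implementation:

```python
def check_input_divisible(h, w, strides=(1, 1, 1, 1, 1),
                          downsamples=(True, True, True, True)):
    """Sketch of the idea behind UNet._check_input_divisible: each stage
    after the first halves resolution if it uses stride-2 conv or MaxPool."""
    rate = 1
    for i, down in enumerate(downsamples, start=1):
        if strides[i] == 2 or down:
            rate *= 2
    return h % rate == 0 and w % rate == 0, rate

print(check_input_divisible(256, 256))  # (True, 16)
print(check_input_divisible(250, 256))  # (False, 16)
```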
-
class mmocr.models.common.losses.FocalLoss(gamma=2, weight=None, ignore_index=-100)[source]¶ Multi-class Focal loss implementation.
- Parameters
gamma (float) – The larger the gamma, the smaller the loss weight of easier samples.
weight (float) – A manual rescaling weight given to each class.
ignore_index (int) – Specifies a target value that is ignored and does not contribute to the input gradient.
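The gamma behavior described above comes from the focal term (1 - p_t)**gamma, which down-weights well-classified (easy) samples. A scalar sketch of the loss for one sample, assuming p_t is the predicted probability of the true class:

```python
import math

def focal_loss_term(p_t, gamma=2.0, class_weight=1.0):
    """FL = -w * (1 - p_t)**gamma * log(p_t) for a single sample."""
    return -class_weight * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss_term(0.9)  # confident and correct -> tiny loss
hard = focal_loss_term(0.1)  # true class got low probability -> large loss
print(easy, hard)
```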
textdet_dense_heads¶
textdet_necks¶
textdet_detectors¶
textdet_losses¶
textdet_postprocess¶
textrecog_recognizer¶
textrecog_backbones¶
textrecog_necks¶
textrecog_heads¶
textrecog_convertors¶
textrecog_encoders¶
textrecog_decoders¶
textrecog_losses¶
textrecog_layers¶
kie_extractors¶
-
class mmocr.models.kie.extractors.SDMGR(backbone, neck=None, bbox_head=None, extractor={'featmap_strides': [1], 'roi_layer': {'output_size': 7, 'type': 'RoIAlign'}, 'type': 'SingleRoIExtractor'}, visual_modality=False, train_cfg=None, test_cfg=None, pretrained=None, class_list=None)[source]¶ The implementation of the paper: Spatial Dual-Modality Graph Reasoning for Key Information Extraction. https://arxiv.org/abs/2103.14470.
- Parameters
visual_modality (bool) – Whether to use the visual modality.
class_list (None | str) – Mapping file from class index to class name. If None, the class index is shown in show_results; otherwise the class name is shown.
-
forward_train(img, img_metas, relations, texts, gt_bboxes, gt_labels)[source]¶
- Parameters
img (tensor) – Input images of shape (N, C, H, W). Typically these should be mean centered and std scaled.
img_metas (list[dict]) – A list of image info dicts where each dict contains: 'img_shape', 'scale_factor', 'flip', and may also contain 'filename', 'ori_shape', 'pad_shape', and 'img_norm_cfg'. For details of the values of these keys, please see mmdet.datasets.pipelines.Collect.
relations (list[tensor]) – Relations between bboxes.
texts (list[tensor]) – Texts in bboxes.
gt_bboxes (list[tensor]) – Each item is the ground truth boxes for each image in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[tensor]) – Class indices corresponding to each box.
- Returns
A dictionary of loss components.
- Return type
dict[str, tensor]
-
show_result(img, result, boxes, win_name='', show=False, wait_time=0, out_file=None, **kwargs)[source]¶ Draw result on img.
- Parameters
img (str or tensor) – The image to be displayed.
result (dict) – The results to draw on img.
boxes (list) – Bbox of img.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param. Default: 0.
show (bool) – Whether to show the image. Default: False.
out_file (str or None) – The output filename. Default: None.
- Returns
The drawn image; only returned when neither show nor out_file is set.
- Return type
img (tensor)
kie_heads¶
mmocr.datasets¶
datasets¶
-
class mmocr.datasets.base_dataset.BaseDataset(ann_file, loader, pipeline, img_prefix='', test_mode=False)[source]¶ Custom dataset for text detection, text recognition, and their downstream tasks.
The text detection annotation format is as follows (the annotations field is optional for testing; the example below is one line of ann_file, with the line-json-str converted to a dict for visualization only):
{
    "file_name": "sample.jpg",
    "height": 1080,
    "width": 960,
    "annotations": [
        {
            "iscrowd": 0,
            "category_id": 1,
            "bbox": [357.0, 667.0, 804.0, 100.0],
            "segmentation": [[361, 667, 710, 670, 72, 767, 357, 763]]
        }
    ]
}
The two text recognition annotation formats are as follows. The x1,y1,x2,y2,x3,y3,x4,y4 field is used for online crop augmentation during training.
format1: sample.jpg hello
format2: sample.jpg 20 20 100 20 100 40 20 40 hello
- Parameters
ann_file (str) – Annotation file path.
pipeline (list[dict]) – Processing pipeline.
loader (dict) – Dictionary to construct loader to load annotation infos.
img_prefix (str, optional) – Image prefix to generate full image path.
test_mode (bool, optional) – If set True, try…except will be turned off in __getitem__.
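The annotation file holds one JSON object per line (the line-json-str above). A sketch of turning one such detection line into the info dict the loader produces, using the example fields from the docstring:

```python
import json

# One line of ann_file, in the detection format shown above
# (segmentation omitted here for brevity).
line = ('{"file_name": "sample.jpg", "height": 1080, "width": 960, '
        '"annotations": [{"iscrowd": 0, "category_id": 1, '
        '"bbox": [357.0, 667.0, 804.0, 100.0]}]}')

info = json.loads(line)  # one line -> one annotation-info dict
print(info["file_name"], info["height"], len(info["annotations"]))
```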
-
evaluate(results, metric=None, logger=None, **kwargs)[source]¶ Evaluate the dataset.
- Parameters
results (list) – Testing results of the dataset.
metric (str | list[str]) – Metrics to be evaluated.
logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.
- Returns
The evaluation results.
- Return type
dict[str, float]
-
class mmocr.datasets.icdar_dataset.IcdarDataset(ann_file, pipeline, classes=None, data_root=None, img_prefix='', seg_prefix=None, proposal_file=None, test_mode=False, filter_empty_gt=True, select_first_k=-1)[source]¶
-
evaluate(results, metric='hmean-iou', logger=None, score_thr=0.3, rank_list=None, **kwargs)[source]¶ Evaluate the hmean metric.
- Parameters
results (list[dict]) – Testing results of the dataset.
metric (str | list[str]) – Metrics to be evaluated.
logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.
rank_list (str) – JSON file used to save the eval result of each image after ranking.
- Returns
The evaluation results.
- Return type
dict[str, dict[str, float]]